Ben,
You do need a separate instance of those 3 classes for each index,
yes. But this really amounts to something like:
IndexWriter writer = new IndexWriter(indexDir, analyzer, create);
So it's the normal code-writing process: you don't have to create
anything new, just use the existing Lucene API. As for locking,
Make sure you are not indexing your documents using the compound index
format (default in the newer versions of Lucene). Then you will see
the .frq file. Here is an example from one of Simpy's Lucene indices:
-rw-r--r--  1 simpy simpy  629073 Feb 26 13:14 _1ao.frq
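(A minimal sketch of turning the compound format off, assuming the
Lucene 1.4-era API; the index path is a placeholder:)

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class NonCompoundIndex {
    public static void main(String[] args) throws Exception {
        // true = create a new index at this path
        IndexWriter writer =
            new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
        // disable the compound (.cfs) format so the individual
        // .frq, .prx, .tis, ... files are written instead
        writer.setUseCompoundFile(false);
        writer.close();
    }
}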
Otis
--
The most obvious answer is that the full-text indexing features of
RDBMS's are not as good (as fast) as Lucene. MySQL, PostgreSQL,
Oracle, MS SQL Server etc. all have full-text indexing/searching
features, but I always hear people complaining about the speed. A
person from a well-known online
Or you could just open a new IndexSearcher, forget the old one, and
have GC collect it when everyone is done with it.
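(A minimal sketch of that approach; the class and names are mine, not
Lucene API, and note the file-handle caveat raised later in this thread:)

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;

public class SearcherHolder {
    private volatile IndexSearcher current;

    public SearcherHolder(String indexDir) throws IOException {
        current = new IndexSearcher(indexDir);
    }

    public IndexSearcher get() {
        return current;
    }

    // swap in a fresh searcher; the old one is left for GC
    // once in-flight searches let go of their reference
    public void refresh(String indexDir) throws IOException {
        current = new IndexSearcher(indexDir);
    }
}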
Otis
--- Chris Lamprecht [EMAIL PROTECTED] wrote:
I should have mentioned, the reason for not doing this the obvious,
simple way (just close the Searcher and reopen it if a
Matt,
Erik and I have some code for this in Lucene in Action, but David
Spencer did this since the book was published:
http://www.lucenebook.com/blog/announcements/more_like_this.html
Otis
--- Matt Chaput [EMAIL PROTECTED] wrote:
Is there a simple, efficient way to compute similarity of
this leave open file handles? I had a problem where there
were lots of open file handles for deleted index files, because the
old searchers were not being closed.
On Fri, 18 Feb 2005 13:41:37 -0800 (PST), Otis Gospodnetic
[EMAIL PROTECTED] wrote:
Or you could just open a new IndexSearcher
Hi Paul,
If I understand your setup correctly, it looks like you are running
multiple threads that create an IndexWriter for the same directory.
That's a no-no.
This section (first hit) describes the various concurrency issues with
regard to adds, updates, optimization, and searches:
Hi,
lucene.apache.org seems to work now.
Here is the query syntax:
http://lucene.apache.org/queryparsersyntax.html
[] is used as [BEGIN-RANGE-STRING TO END-RANGE-STRING]
Otis
--- Jim Lynch [EMAIL PROTECTED] wrote:
First I'm getting a
The requested URL could not be retrieved
The QueryParser is analyzing your Field.Keyword (genre field) fields,
because it doesn't know that genre is a Keyword field and should not be
analyzed.
Check section 4.4. here:
http://www.lucenebook.com/search?query=queryparser+keyword
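(One hedged sketch of the usual fix: wrap the main Analyzer in a
PerFieldAnalyzerWrapper so the keyword field isn't run through
StandardAnalyzer. WhitespaceAnalyzer is only a stand-in here; a true
keyword analyzer, which the book shows how to write, is better:)

import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class GenreSearch {
    public static Query parse(String userQuery) throws Exception {
        PerFieldAnalyzerWrapper analyzer =
            new PerFieldAnalyzerWrapper(new StandardAnalyzer());
        // keep the genre keyword field from being tokenized like free text
        analyzer.addAnalyzer("genre", new WhitespaceAnalyzer());
        return QueryParser.parse(userQuery, "contents", analyzer);
    }
}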
Otis
--- Mike Rose [EMAIL PROTECTED] wrote:
Perhaps
Get and try Lucene 1.4.3. One of the older versions had a bug that
prevented old index files from being deleted.
Otis
--- [EMAIL PROTECTED] wrote:
Hi,
When I run an optimize in our production environment, old index files
are left in the directory and are not deleted.
My understanding is that an
Using different analyzers for indexing and searching is not
recommended.
Your numbers are not even in the index because you are using
StandardAnalyzer. Use Luke to look at your index.
Otis
--- Hetan Shah [EMAIL PROTECTED] wrote:
Hello,
How can one search for a document based on the query
If you are not married to Java:
http://search.cpan.org/~kilinrax/HTML-Strip-1.04/Strip.pm
Otis
--- sergiu gordea [EMAIL PROTECTED] wrote:
Karl Koch wrote:
I am in control of the HTML, which means it is well-formatted HTML. I
use only HTML files which I have transformed from XML. No
Adam,
Dawid posted some code that lets you use Carrot2 locally with Lucene,
without the componentized pipeline system described on the Carrot2 site.
Otis
--- Adam Saltiel [EMAIL PROTECTED] wrote:
David, Hi,
Would you be able to comment on coincidentally recent thread RE: -
Grouping Search
I don't think there is a direct way to get the number of (unique) terms
in the index, so yes, I think you'll have to loop through TermEnum and
count.
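(A minimal sketch of that loop, using the 1.4-era TermEnum API; the
index path is a placeholder:)

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermEnum;

public class TermCounter {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/path/to/index");
        TermEnum terms = reader.terms();
        int count = 0;
        while (terms.next()) {
            count++; // one unique term per step
        }
        terms.close();
        reader.close();
        System.out.println(count + " unique terms");
    }
}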
Otis
--- Jonathan Lasko [EMAIL PROTECTED] wrote:
I'm looking for the total number of unique terms in the index. I see
that I can get a
Edwin,
--- Edwin Tang [EMAIL PROTECTED] wrote:
I have three indices really that I search via ParallelMultiSearcher.
All three are being updated constantly. We would like to be able to
perform a search on the indices and have the results reflect the
latest documents indexed. However, that
Morus,
that description of 3 sets of index files is what I was imagining, too.
I'll have to test and add to the book errata, it seems.
Thanks for the info,
Otis
--- Morus Walter [EMAIL PROTECTED] wrote:
Otis Gospodnetic writes:
Hello,
Yes, that is how optimize works - copies all
Is Lucene-in-Action being sold anywhere in Singapore?
thanks!
Otis Gospodnetic [EMAIL PROTECTED] wrote: Gospodnetić
sounds like Gospodnetich and Eric is Erik :)
Otis
--- John Haxby wrote:
Otis Gospodnetic wrote:
I contacted both the US and UK Amazon sites and asked them
Karl,
This is completely fine. You can have documents with different fields
in the same index.
Otis
--- Karl Koch [EMAIL PROTECTED] wrote:
Hello all,
perhaps not such a sophisticated question:
I would like to have a very diverse set of documents in one index.
Depending
on the inside
Luke,
Boosting is only one of the factors involved in Document/Query scoring.
Assuming that by applying your boosts to Document A or a single field
of Document A increases the total score enough, yes, that Document A
may have the highest score. But just because you boost a single
Document and
Hello Karl,
Grab the source code for Lucene in Action, it's got code that parses
and indexes XML with DOM and SAX. You can see the coverage of that
stuff here:
http://lucenebook.com/search?query=indexing+XML+section%3A7*
I haven't used kXML, but I imagine the LIA code should get you going
Hello,
Yes, that is how optimize works - copies all existing index segments
into one unified index segment, thus optimizing it.
see hit #1: http://www.lucenebook.com/search?query=optimize+disk+space
However, three times the space sounds a bit too much, or I made a
mistake in the book. :)
You
I discuss this with myself a lot inside my head... :)
Seriously, I agree with Erik. I think this is a business opportunity.
How many people are hating me now and going shh? Raise your
hands!
Otis
--- David Spencer [EMAIL PROTECTED] wrote:
This reminds me, has anyone every discussed
files are: the .cfs (46.8MB), deletable (4 bytes), and segments (29 bytes).
--Leto
-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Hello,
Yes, that is how optimize works - copies all existing index
segments into one unified index segment, thus
500 times the original data? Not true! :)
Otis
--- Xiaohong Yang (Sharon) [EMAIL PROTECTED] wrote:
Hi,
I agree that the Google Mini is quite expensive. It might be similar to
the desktop version in quality. Does anyone know Google's ratio of
index size to text size? Is it true that Lucene's index is
The publisher-to-Amazon information feed seems to be a fairly manual
process, and Amazon takes a while to update book information on their
site, including prices.
I contacted both the US and UK Amazon sites and asked them to fix my
last name (the last character in my name has a little slash (not an
Hi Luke,
That's not hard with RangeQuery (supported by QueryParser), take a look
at this:
http://www.lucenebook.com/search?query=date+range
The grayed-out text has the section name and page number, so you can
quickly locate this stuff in your ebook.
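(A hedged sketch of both forms, assuming dates were indexed as
sortable YYYYMMDD strings -- an assumption about how the field was
built:)

import org.apache.lucene.index.Term;
import org.apache.lucene.search.RangeQuery;

public class DateRange {
    public static RangeQuery year2004() {
        // third argument 'true' makes the range inclusive;
        // QueryParser equivalent: date:[20040101 TO 20041231]
        return new RangeQuery(
            new Term("date", "20040101"),
            new Term("date", "20041231"),
            true);
    }
}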
Otis
P.S.
Do you know if Indigo/Chapters
Gospodnetić sounds like Gospodnetich and Eric is Erik :)
Otis
--- John Haxby [EMAIL PROTECTED] wrote:
Otis Gospodnetic wrote:
I contacted both the US and UK Amazon sites and asked them to fix my
last name (the last character in my name has a little slash (not an
accent) above
Hello Simeon,
Heterogeneous Documents/indices are OK - check out the second hit:
http://www.lucenebook.com/search?query=heterogenous+different
Otis
--- Simeon Koptelov [EMAIL PROTECTED] wrote:
Hello all. I'm new to lucene and think about using it in my project.
I have prices with dynamic
I don't have a document with chinese characters to verify this, but it
looks right, so I'll add your change to SearchFiles.java.
Thanks,
Otis
--- Eric Chow [EMAIL PROTECTED] wrote:
Search not really correct with UTF-8 !!!
The following is the search result that I used the SearchFiles in
That would be a partial solution. Accents will not be a problem any
more, but if you use an Analyzer that stems tokens, they will not
really be tokenized properly. Searches will probably work, but if you
look at the index you will see that some terms were not analyzed
properly. But it may be
A number of people have tried putting Lucene indices in RDBMS. As far
as I know, all were slower than FSDirectory.
Otis
--- nafise hassani [EMAIL PROTECTED] wrote:
Hi
I want to know, from a performance point of view, whether it is
better to save Lucene indexes in a database or use them as files?
It would be interesting to know _what_exactly_ uses your memory.
Running under a profiler should tell you that.
The only thing that comes to mind is... can't remember the details now,
but when the index is opened, I believe every 128th term is read into
memory. This, I believe, helps with
There, Kevin, that's what I was referring to: the .tii file.
Otis
--- Paul Elschot [EMAIL PROTECTED] wrote:
On Saturday 22 January 2005 01:39, Kevin A. Burton wrote:
Kevin A. Burton wrote:
We have one large index right now... its about 60G ... When I
open it
the Java VM used 940M
Hi Ansi,
If you want the print version, I would guess you could order it from
the publisher (http://www.manning.com/hatcher2) or from Amazon and they
will ship it to you in China. The electronic version (a PDF file) is
also available from the above URL.
I'll ask Manning Publications and see
Yes, I remember your email about the large number of Terms. If it can
be avoided and you figure out how to do it, I'd love to patch
something. :)
Otis
--- Kevin A. Burton [EMAIL PROTECTED] wrote:
Otis Gospodnetic wrote:
It would be interesting to know _what_exactly_ uses your memory
Hi Kevin,
Stemming is an optional operation and is done in the analysis step.
Lucene comes with a Porter stemmer and a Filter that you can use in an
Analyzer:
./src/java/org/apache/lucene/analysis/PorterStemFilter.java
./src/java/org/apache/lucene/analysis/PorterStemmer.java
You can find more
This:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/BooleanQuery.TooManyClauses.html
?
You can control that limit via
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/BooleanQuery.html#maxClauseCount
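(A sketch, assuming the 1.4-era static setter; the limit is global, so
set it before running the query -- more clauses means more memory and
slower searches:)

import org.apache.lucene.search.BooleanQuery;

public class ClauseLimit {
    public static void main(String[] args) {
        BooleanQuery.setMaxClauseCount(4096); // default is 1024
    }
}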
Otis
--- Jerry Jalenak [EMAIL PROTECTED] wrote:
OK.
Hi Ranjan,
It sounds like you should look at and use Nutch:
http://www.nutch.org
Otis
--- Ranjan K. Baisak [EMAIL PROTECTED] wrote:
I am planning to move to Lucene but not have much
knowledge on the same. The search engine which I had
developed is searching some extranet URLs e.g.
Hello Ashley,
You can read/search while modifying the index, but you have to ensure
only one thread or only one process is modifying an index at any given
time. Both IndexReader and IndexWriter can be used to modify an index.
The former to delete Documents and the latter to add them. You have
If you are hosting the code somewhere (e.g. your site, SF, java.net,
etc.), we should link to them from one of the Lucene pages where we
link to related external tools, apps, and such.
Otis
--- Safarnejad, Ali (AFIS) [EMAIL PROTECTED] wrote:
I've written a Chinese Analyzer for Lucene that
Hello Chetan,
The code that comes with the Lucene book contains a little framework
for indexing rich-text documents. It sounds like you may be able to
use it as-is, extending it with a parser for Excel files, which we
didn't include in the code (should we include it in the next edition?).
The Wiki has some info about Lucene 2.0, but that is all there is about
2.0.
Regarding transactions - have you tried DbDirectory? I believe that
will provide XA support and it won't require Lucene changes.
Otis
--- John Wang [EMAIL PROTECTED] wrote:
Hi:
When is lucene 2.0 scheduled to
No, you can't add documents to an index once you close the IndexWriter.
You can re-open the IndexWriter and add more documents, of course.
Otis
--- Oscar Picasso [EMAIL PROTECTED] wrote:
Hi,
Is it safe to add documents to an IndexWriter that has been closed?
From what I have seen, the
Going for the segments file like that is not a recommended practice,
or at least not something I'd recommend. The 'segments' file is really
something that a caller should not know anything about. One day
Lucene may choose to rename the segments file or some such, and the
code that uses this trick
I didn't pay full attention to this thread, but it sounds like somebody
may be interested in RuntimeShutdownHook (or some similar name) as a
place to try to release the locks.
Otis
--- Joseph Ottinger [EMAIL PROTECTED] wrote:
On Tue, 11 Jan 2005, Doug Cutting wrote:
Joseph Ottinger wrote:
Eh, that exactly :) When I read my emails in reverse order
--- Chris Lamprecht [EMAIL PROTECTED] wrote:
What about a shutdown hook?
Runtime.getRuntime().addShutdownHook(new Thread() {
    public void run() { /* whatever */ }
});
see also
Use one index; working with a single index is simpler. Also, once you
pull a Document from the Hits object, all Fields are read off of the
disk. There was some discussion about selective Field reading about a
week ago; check the list archives. Also keep in mind Field compression
is now possible
Hello,
If you search for India OR Test, you will find both; if you use AND,
you will find none. Lucene can search any text, not just files. It
sounds like you are using Lucene's demo as a real application (not a
good practice). I suggest you take a look at the Resources page on the
Lucene Wiki
Hi John,
There is no API for this, but I recall somebody talking about adding
support for this a few months back. I even think that somebody might
have contributed a patch for this. I am not certain about this, but
check the patch queue (link on Lucene site). If there is a patch
there, even if
The book is $44.95 USD - it's printed on the back cover. Amazon had
the correct price (minus their discount) until recently. They are just
very slow with their site/book info updates, but I'm sure they'll fix
it eventually.
Otis
--- Erik Hatcher [EMAIL PROTECTED] wrote:
On Jan 6, 2005, at
Nutch (nutch.org) has a pretty sophisticated infrastructure for
distributed searching, but it doesn't use RemoteSearcher.
Otis
--- Yura Smolsky [EMAIL PROTECTED] wrote:
Hello.
Does anyone know application which based on RemoteSearcher to
distribute index on many servers?
Yura Smolsky,
That's the correct place to look and it includes code samples.
Yes, it's a Jar file that you add to the CLASSPATH and use ... hm,
normally programmatically, yes :).
Otis
--- Hetan Shah [EMAIL PROTECTED] wrote:
Has anyone used NekoHTML? If so, how do I use it? Is it a standalone
jar
Hello,
--- mahaveer jain [EMAIL PROTECTED] wrote:
I am looking out to implement sorting in my lucene application. This
is what my code look like.
I am using StandardAnalyzer() analyzer.
Query query = QueryParser.parse(keyword, contents, analyzer);
Sort sortCol = new Sort(new
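(The quoted code breaks off above; a hedged guess at how the sort
setup continues, assuming the 1.4 Sort/SortField API and a placeholder
field name -- note the sort field must be indexed but not tokenized:)

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

public class SortedSearch {
    public static void main(String[] args) throws Exception {
        Query query =
            QueryParser.parse("keyword", "contents", new StandardAnalyzer());
        Sort sortCol = new Sort(new SortField("date"));
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        Hits hits = searcher.search(query, sortCol);
        System.out.println(hits.length() + " hits");
    }
}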
Correct.
The self-maintenance you are referring to is Lucene's periodic segment
merging. The frequency of that can be controlled through IndexWriter's
mergeFactor.
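(Concretely, something like this -- in the 1.4-era API mergeFactor is
a public field on IndexWriter:)

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class MergeTuning {
    public static void main(String[] args) throws Exception {
        IndexWriter writer =
            new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
        writer.mergeFactor = 10;   // segments accumulated before a merge (default 10)
        writer.minMergeDocs = 100; // documents buffered in RAM before a flush
        writer.close();
    }
}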
Otis
--- aurora [EMAIL PROTECTED] wrote:
Are non-optimized indices causing you any problems (e.g. slow
searches, high number
WhitespaceAnalyzer will let you have it. It just breaks the input on
whitespace.
Otis
--- Jim [EMAIL PROTECTED] wrote:
I've seen some discussion on this and the answer seems to be write
your own. Hasn't someone already done that by now who would share? I
really have to be able to include
Most definitely Jetty. I can't believe you're using Tomcat for Rojo!
;)
Otis
--- Erik Hatcher [EMAIL PROTECTED] wrote:
Wrong list.
Though perhaps you should be using Jetty ;)
Erik
On Dec 23, 2004, at 4:17 PM, Kevin A. Burton wrote:
What in the world is up with this
Martijn, have you seen the Highlighter in the Lucene Sandbox?
If you've stored your text in the Lucene index, there is no need to go
back to the DB to pull out the blog, parse it, and highlight it - the
Highlighter in the Sandbox will do this for you.
Otis
--- M. Smit [EMAIL PROTECTED] wrote:
If you are not tied to Java, see 'unac' at http://www.senga.org/.
It's old, but if nothing else you could see how it works and rewrite it
in Java. And if you can, you can donate it to Lucene Sandbox.
Otis
--- Peter Pimley [EMAIL PROTECTED] wrote:
Hi everyone,
The Question:
In Java
I suspect Martijn really wants that snippet dynamically generated, with
KWIC, as on the lucenebook.com screen shot. Thus, he can't generate
and store the snippet at index time, and has to construct it at search
time.
Otis
--- Mike Snare [EMAIL PROTECTED] wrote:
But for the other issue on
For simpy.com I store the full text of web pages in Lucene, in order to
provide full-text web searches. Nutch (nutch.org) does the same. You
can set the maximum number of tokens you want indexed via IndexWriter.
You can also compress fields in the newest version of Lucene (or maybe
just the one
I _think_ you'd be better off doing it all at once, but I wouldn't
trust myself on this and would instead construct a small 3-index set
and test, looking at a) maximal disk usage, b) time, and c) RAM usage.
:)
Otis
--- Ryan Aslett [EMAIL PROTECTED] wrote:
Hi there, I'm about to embark on a
Another possibility is that you are using an older version of Lucene,
which was known to have a bug with similar symptoms. Get the latest
version of Lucene.
You shouldn't really have multiple .cfs files after optimizing your
index. Also, optimize only at the end, if you care about indexing
Hello,
I think some of these questions may be answered in the jGuru FAQ.
So my question is would it be overkill to optimize every day?
Only if lots of documents are being added/deleted, and you end up with
a lot of index segments.
Is there any guideline on how often to optimize?
When searching for phrases, what's important is the position of each
token/word extracted by the Analyzer.
WhitespaceAnalyzer/LowerCaseFilter don't do anything with the
positional information. There is nothing else in your Analyzer?
In any case, the following should help you see what your Analyzer
actually produces:
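(A minimal sketch, assuming the 1.4-era TokenStream API; the field
name and sample text are placeholders:)

import java.io.StringReader;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceAnalyzer;

public class TokenDump {
    public static void main(String[] args) throws Exception {
        TokenStream ts = new WhitespaceAnalyzer()
            .tokenStream("f", new StringReader("Some Sample Phrase"));
        // print each token and its position increment --
        // phrase queries depend on these positions
        for (Token t = ts.next(); t != null; t = ts.next()) {
            System.out.println(t.termText() + " +" + t.getPositionIncrement());
        }
    }
}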
Alex, I think you want this:
+city:London +city:Amsterdam +address:1_street +address:2_street
Otis
--- Alex Kiselevski [EMAIL PROTECTED] wrote:
Thanks Morus
So if I understand right
If the second query is:
+city(London) +city(Amsterdam) +address(1_street) +address(2_street)
Both
The only place where you have to specify that you are using the
compound index format is on the IndexWriter instance. Nothing needs to
be done at search time on the IndexSearcher.
Otis
--- Hetan Shah [EMAIL PROTECTED] wrote:
Thanks Chuck,
I now understand why I see only one file. Another question
The exact disk space usage depends on the number of fields in the index
and on how many of them store the original text. You should also keep
in mind that the call to IndexWriter's optimize() will result in your
index directory size doubling while the optimization is in progress, so
if you want
Hello,
As Erik already said - that Analyzer is really there to get people
going quickly and as a 'does pretty good' Analyzer. There is no
Analyzer that will work for everyone, and Analyzers are meant to be
custom-made. It looks like you already got that figured out and have
your own Analyzer.
--- Otis Gospodnetic [EMAIL PROTECTED] wrote:
Hello,
There are a few things you can do:
1) Don't just pull all rows from the DB at once. Do
that in batches.
2) If you can get a Reader from your SqlDataReader,
consider this:
http://jakarta.apache.org/lucene/docs
Note that this really includes some extra steps.
You don't need a temp index. Add everything to a single index using a
single IndexWriter instance. No need to call addIndexes or optimize
until the end. Adding Documents to an index takes a constant amount of
time, regardless of the index size,
There is one case that I can think of where this 'constant' scoring
would be useful, and I think Chuck already mentioned this 1-2 months
ago. For instance, having such scores would allow one to create alert
applications where queries run by some scheduler would trigger an alert
whenever the score
On Mon, 13 Dec 2004 22:24:12 -0800 (PST), Otis Gospodnetic
[EMAIL PROTECTED] wrote:
Hello John,
I believe you didn't get any replies to this. What you are
describing
cannot be done using the public API, but maaay (no source code on this
machine, so I can't double-check that) be doable
You can also see 'Books like this' example from here
https://secure.manning.com/catalog/view.php?book=hatcher2&item=source
Otis
--- Bruce Ritchie [EMAIL PROTECTED] wrote:
Christoph,
I'm not entirely certain if this is what you want, but a while back
David Spencer did code up a 'More Like
Well, one could always partition an index, distribute pieces of it
horizontally across multiple 'search servers' and use the built-in
RMI-based parallel search features. Nutch uses something similar
for search scaling.
Otis
--- Monsur Hossain [EMAIL PROTECTED] wrote:
My concern is that
Hello John,
I believe you didn't get any replies to this. What you are describing
cannot be done using the public API, but maaay (no source code on this
machine, so I can't double-check that) be doable if you use some of the
'internal' methods.
I don't have the need for this, but others might, so
You can see a Flickr-like tag (lookup) system at my Simpy site (
http://www.simpy.com ). It uses Lucene as the backend for lookups, but
still uses an RDBMS as the primary storage.
I find that keeping the RDBMS and Lucene indices in sync is a bit of a
pain and error prone, so a _thin_ storage layer with
Hello,
There are a few things you can do:
1) Don't just pull all rows from the DB at once. Do that in batches.
2) If you can get a Reader from your SqlDataReader, consider this:
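(The link got cut off above; as a sketch of the Reader idea, assuming
the 1.4-era Field.Text(String, Reader) factory -- the field names are
placeholders:)

import java.io.Reader;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class ReaderField {
    // index straight from the column's Reader so the full value
    // never has to be materialized as one big String
    public static Document fromReader(String id, Reader contents) {
        Document doc = new Document();
        doc.add(Field.Keyword("id", id));
        doc.add(Field.Text("contents", contents)); // tokenized, not stored
        return doc;
    }
}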
Hello,
This is probably due to some bad HTML. The application you are using
is just a demo, and uses a JavaCC-based HTML parser, which may not be
resilient to invalid HTML. For Lucene in Action we developed a little
extensible indexing framework, and for HTML indexing we used 2 tools to
handle
polluted.
*
* TODO: this tool should really lock the directory for writing before
* removing any Lucene segment files, otherwise this tool itself may
* corrupt the index.
*
* @author Otis Gospodnetic
* @version $Id$
*/
public class SegmentPurger
{
// TODO: copied from SegmentMerger
Ying,
You should follow this finally block advice below. In addition, I
think you can just close the reader, and it will close the underlying
stream (I'm not sure about that, double-check it).
You are not running out of file handles, though. Your JVM is running
out of memory. You can play
Hello Garrett,
Share some code, it will be easier for others to help you that way.
Obviously, this would be a huge bug if the problem were within Lucene.
Otis
--- Garrett Heaver [EMAIL PROTECTED] wrote:
Can anyone please explain to me why maxDoc returns 0 when Luke shows
239,473
documents?
There is no need to reindex. However, I also don't quite get what the
problem is :)
Otis
--- Santosh [EMAIL PROTECTED] wrote:
hi,
when I restart Tomcat, the index is getting corrupted. If I take a
backup of the index and then restart Tomcat, the index is not
working properly.
Leading wildcard character (*) is not allowed if you use QueryParser
that comes with Lucene. Reason: performance. See many discussions
about this on the lucene-user mailing list. Also see the search syntax
document on the Lucene site. What other characters are you having
trouble with?
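(If you really need it, a WildcardQuery built programmatically does
accept a leading wildcard -- a hedged sketch; expect it to be slow,
since it has to scan every term in the field:)

import org.apache.lucene.index.Term;
import org.apache.lucene.search.WildcardQuery;

public class LeadingWildcard {
    public static WildcardQuery endingIn(String suffix) {
        return new WildcardQuery(new Term("contents", "*" + suffix));
    }
}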
Otis
---
?
But then is there a point in putting an empty value in it, if an
application will never search for empty values?
thanks
-pedja
Otis Gospodnetic said the following on 12/8/2004 1:31 AM:
Empty fields won't add any value, you can skip them. Documents in
an
index don't have
Hello,
You can use BooleanQuery for that.
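(A minimal sketch: OR together one TermQuery per value. The 1.4-era
add() signature is add(query, required, prohibited), so false/false
means "should match":)

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class InQuery {
    public static BooleanQuery in(String field, String[] values) {
        BooleanQuery bq = new BooleanQuery();
        for (int i = 0; i < values.length; i++) {
            bq.add(new TermQuery(new Term(field, values[i])), false, false);
        }
        return bq;
    }
}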
Otis
--- Ravi [EMAIL PROTECTED] wrote:
Hi
How do you get all documents in lucene where a particular field
value
is in a given list of values (like SQL IN). What kind of Query class
should I use?
Thanks in advance.
Ravi.
Hello,
Yes, Lucene in Action has been listed on Amazon for a while now (I
think I recorded this in my blog some time back). The publish date is,
I believe, the date provided by publishers, but things almost always
take longer than predicted, so 31.12.2004 may be a bit off. :(
However, the ebook
If you run the same query again, the IndexSearcher will go all the way
to the index again - no caching. Some caching will be done by your
file system, possibly, but that's it. Lucene is fast, so don't
optimize early.
Otis
--- Ben Rooney [EMAIL PROTECTED] wrote:
thanks chris,
you are
Empty fields won't add any value, you can skip them. Documents in an
index don't have to be uniform. Each Document could have a different
set of fields. Of course, that has some obvious implications for
search, but is perfectly fine technically.
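(For instance -- two Documents with different field sets, added to the
same index; names and values are placeholders:)

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class TwoShapes {
    public static Document book() {
        Document doc = new Document();
        doc.add(Field.Keyword("isbn", "0000000000"));
        doc.add(Field.Text("title", "Some Book"));
        return doc;
    }

    public static Document webPage() {
        Document doc = new Document();
        doc.add(Field.Keyword("url", "http://example.com/"));
        doc.add(Field.Text("contents", "page body"));
        return doc;
    }
}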
Otis
--- [EMAIL PROTECTED] [EMAIL PROTECTED]
If I were you, I would first use Luke to peek at the index. You may
find something obvious there, like multiple copies of the same
Document.
Does your temp index 'overlap' with A index in terms of Documents? If
so, you will end up with multiple copies, as the addIndexes method doesn't
detect and
This smells like a Windows issue. It is possible that something in
your JVM is still holding onto the index directory (for example,
FSDirectory), and Winblows is not letting you remove the directory. I
bet this will work if you exit the JVM and run java.io.File.delete()
without calling Lucene.
Hm, if you can index 11, you should be able to index 8 as well. In any
case, you most likely want to make sure that your Analyzer is not just
throwing your numbers out. This may still be up to date:
http://www.jguru.com/faq/view.jsp?EID=538308
See also:
Hello,
Try changing IndexWriter's mergeFactor variable. It's 10 by default.
Change it to 1, for instance.
Otis
--- [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
Greetings,
Ok, so maybe this is common knowledge to most of you but I'm a layman
when it comes to Lucene and
I couldn't find any
This is entirely application-specific. As the simplest approach, you
can index each user's documents in a separate index and use
(Parallel)MultiSearcher to search appropriate indices (which ones are
appropriate to search has to be a part of your app's access control
logic).
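(A hedged sketch of the per-user search, assuming the 1.4-era
ParallelMultiSearcher; which index directories are "allowed" is your
app's access-control decision, and the paths are placeholders:)

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ParallelMultiSearcher;
import org.apache.lucene.search.Searchable;

public class PerUserSearch {
    public static ParallelMultiSearcher searcherFor(String[] allowedIndexDirs)
            throws Exception {
        Searchable[] searchables = new Searchable[allowedIndexDirs.length];
        for (int i = 0; i < allowedIndexDirs.length; i++) {
            searchables[i] = new IndexSearcher(allowedIndexDirs[i]);
        }
        return new ParallelMultiSearcher(searchables);
    }
}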
Otis
--- Paul
In my experiments with mergeFactor I found the point of diminishing/no
returns. If I remember correctly, I hit the limit at a mergeFactor of 50.
But here is something from Lucene in Action that you can use to play
with various index tuning factors and see their effect on indexing
performance.
Yes, it's not wise to just pull all Document instances from the Hits
instance, unless you really need them all. I don't do that; I really
just provide a wrapper, like this:
/**
* A simple List implementation wrapping a Hits object.
*
* @author Otis Gospodnetic
* @version $Id: HitList.java,v 1.4
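(The class is cut off above; a minimal sketch in the same spirit --
not the original HitList source -- where Documents are fetched from
Hits lazily, one per get():)

import java.util.AbstractList;
import org.apache.lucene.search.Hits;

public class LazyHitList extends AbstractList {
    private final Hits hits;

    public LazyHitList(Hits hits) {
        this.hits = hits;
    }

    public Object get(int index) {
        try {
            return hits.doc(index); // only this Document is read off disk
        } catch (java.io.IOException e) {
            throw new RuntimeException(e.toString());
        }
    }

    public int size() {
        return hits.length();
    }
}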
Hello and quick answers:
See IndexWriter javadoc and in particular mergeFactor, minMergeDocs,
and maxMergeDocs. This will let you control the size of your segments,
the frequency of segment merges, the amount of buffered Documents in
RAM between segment merges and such. Also, you ask about
This is very similar to what I do - I create a List of Maps from Hits
and its Documents. So I think this change may be handy, if doable (I
didn't look into changing the two Lucene classes, actually).
Otis
--- petite_abeille [EMAIL PROTECTED] wrote:
On Dec 01, 2004, at 13:37, Karthik N S
Hello,
Lucene indexing completes in 13-15 hours on the desktop system while
it completes in about 29-33 hours on the notebook.
Now, combine it with the DROP INDEX tests completing in the same
amount of time on both, and find out why the search is only slightly
faster :)
Until then, all
Hello,
I don't think Lucene can spit out the similarity matrix for you, but
perhaps you can use Lucene's Term Vector support to help you build the
matrix yourself:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/TermFreqVector.html
The other relevant sections of the Lucene API
QueryParser does use an Analyzer, see this:

static public Query parse(String query, String field, Analyzer analyzer)
    throws ParseException {
  QueryParser parser = new QueryParser(field, analyzer);
  return parser.parse(query);
}
Otis
P.S.
Use lucene-user list, please.
--- Ricardo