Eric Jain wrote:
Just to clarify things: Does the current solution require all fields
that can be used for sorting to be loaded and kept in memory? (I guess
you can answer this question faster than I can figure it out by myself
:-)
Field values are loaded into memory. But values are kept in an arr
Eric Jain wrote:
That's reasonable. What I didn't quite understand yet: If I sort on a
string field, will Lucene need to keep all values in memory all the
time, or only during startup?
It will cache one instance of each unique value. So if you have a
million documents and string sort results on a
Eric Jain wrote:
I will need to have a look at the code, but I assume that in principle
it should be possible to replace the strings with sequential integers
once the sorting is done?
I don't understand the question.
Doug
Chad Small wrote:
Thanks Erik. OK, this is my official lobby effort for the release of 1.4 to final status. Anyone else need/want a 1.4 release?
Does anyone have any information on 1.4 release plans?
I'd like to make an RC once I manage to fix bug #27799, which will
hopefully be soon.
Doug
--
[EMAIL PROTECTED] wrote:
I have not been able to work out how to get custom coordination going to
demote results based on a specific term [ ... ]
Yeah, it's a little more complicated than perhaps it should be.
I've attached a class which does this. I think it's faster and more
effective than wh
Charlie Smith wrote:
I'll vote yes please release new version with "too many files open" fixed.
There is no "too many files open bug", except perhaps in your
application. It is, however, an easy problem to encounter if you don't
close indexes or if you change Lucene's default parameters. It will
Kevin A. Burton wrote:
We're using lucene with one large target index which right now is 5G.
Every night we take sub-indexes which are about 500M and merging them
into this main index. This merge (done via
IndexWriter.addIndexes(Directory[])) is taking way too much time.
Looking at the stats f
Boris Goldowsky wrote:
I have a situation where I'm querying for something in several fields,
with a clause similar to this:
(title:(two words)^20 keywords:(two words)^10 body:(two words))
Some good documents are being scored too low if the query terms do not
occur in the "body" field. I naive
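A clause like the one quoted can also be assembled programmatically before handing it to QueryParser. A minimal sketch; the helper class and method name below are hypothetical, not part of Lucene:

```java
// Builds a "(field:(words)^boost ...)" clause string for QueryParser.
// Field names and boost values are illustrative.
public class BoostedQueryBuilder {
    public static String build(String[] fields, int[] boosts, String words) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(fields[i]).append(":(").append(words).append(')');
            if (boosts[i] != 1) sb.append('^').append(boosts[i]);
        }
        return sb.append(')').toString();
    }
}
```

Building the string this way keeps the per-field boosts in one table instead of hand-editing the query syntax.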
[EMAIL PROTECTED] wrote:
Thanks for the post. BoostingQuery looks to be cleaner, faster and more generally useful than my
implementation :-)
Great! Glad to hear it was useful.
BTW, I've had a thought about your suggestion for making the highlighter use some form of RAMindex of sentence fragments
Kevin A. Burton wrote:
One way to force larger read-aheads might be to pump up Lucene's input
buffer size. As an experiment, try increasing InputStream.BUFFER_SIZE
to 1024*1024 or larger. You'll want to do this just for the merge
process and not for searching and indexing. That should help yo
Lucene 1.4 has not been released. Until it is released, you need to
check out the sources from CVS and build them, including javadoc.
Doug
Stephane James Vaucher wrote:
Are the javadocs available on the site?
I'd like to see the javadocs for lucene-1.4 (specifically SpanQuery)
somewhere on the
Esmond Pitt wrote:
Don't want to start a buffer size war, but these have always seemed too
small to me. I'd recommend upping both InputStream and OutputStream buffer
sizes to at least 4k, as this is the cluster size on most disks these days,
and also a common VM page size.
Okay.
Reading and writin
Kevin A. Burton wrote:
I'm playing with this package:
http://home.clara.net/markharwood/lucene/highlight.htm
Trying to do hit highlighting. This implementation uses another
Analyzer to find the positions for the result terms.
This seems very inefficient.
Does it just seem inefficient,
[EMAIL PROTECTED] wrote:
As a note of warning: I did find StandardTokenizer to be the major culprit in my
tokenizing benchmarks (avg 75ms for 16k sized docs).
I have found I can live without StandardTokenizer in my apps.
FYI, the message with Mark's timings can be found at:
http://nagoya.apache.o
Doug Cutting wrote:
According to these, if your documents average 16k, then a 10-hit result
page would require just 66ms to generate highlights using SimpleAnalyzer.
Oops. That should be 110ms.
Doug
Kevin A. Burton wrote:
Doug Cutting wrote:
http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=1413989
According to these, if your documents average 16k, then a 10-hit
result page would require just 66ms to generate highlights using
SimpleAnalyzer.
The whole search takes only 3
Terry,
Can you please try to develop a reproducible test case? Otherwise it's
impossible to verify and debug this.
For something like this it would suffice to provide:
1. The initial index, which satisfies the test queries;
2. The new index you add;
3. Your merge and test code, as a s
Joe Rayguy wrote:
So, assuming that sort as implemented in 1.4 doesn't
work for me, my original question still stands. Do I
have to worry about merges that occur as documents are
added, or do I only have to rebuild my array after
optimizations? Or, alternatively, how did everyone
sort before 1.4?
peters marcus wrote:
Is there a way to get all the words stored in the index for a given document?
Yes, in the 1.4 release:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#getTermFreqVectors(int)
Doug
Chad Small wrote:
We have a requirement to return documents with a "title" field that starts with a certain letter. Is there a way to do something like this? We're using the StandardAnalyzer.
Example title fields:
This is the title of a document.
And this is a title of a different document.
Magnus Mellin wrote:
I would like to partition an index over X number of remote searchers.
Any ideas, or suggestions, on how to use the same term dictionary (one
that represents the terms and frequencies for the whole document
collection) over all my indices?
Try using a ParallelMultiSearcher com
Weir, Michael wrote:
I assume that it is possible to corrupt an index by crashing at just the right
time.
It should not be possible to corrupt an index this way.
I notice that there's a method IndexReader.unlock(). Does this method
ensure that the index has not been corrupted?
If you use this met
Weir, Michael wrote:
So if our server is the only process that ever opens the index, I should be
able to run through the indexes at startup and simply unlock them?
Yes.
Doug
Francesco Bellomi wrote:
we are experiencing some difficulties in using Lucene with a NFS filesystem.
Basically, locking seems not to work properly, since it appears that
attempted concurrent writes to the index (from different VMs) are not
blocked, and this often causes the index to be corrupted.
Francesco Bellomi wrote:
The only problem is that, as of Lucene 1.4rc2, FSDirectory is 'final'.
Please submit a patch to lucene-dev to make FSDirectory non-final.
In fact, a third architectural approach would be to define an API for
"pluggable" lock implementations: IMHO that would be more robust to
Version 1.4 RC3 of Lucene is available for download from:
http://cvs.apache.org/dist/jakarta/lucene/v1.4-rc3/
Changes are described at:
http://cvs.apache.org/viewcvs.cgi/*checkout*/jakarta-lucene/CHANGES.txt?rev=1.85
Doug
Leonid Portnoy wrote:
Am I misunderstanding something here, or is the documentation unclear?
The documentation is unclear. Can you propose an improvement?
Doug
code. ( see test code )
2.) The first search is always really slow as everything initializes and
the cache fills ;) so don't let that discourage you.
-vito
On Mon, 2004-04-26 at 14:59, Doug Cutting wrote:
Anthony Vito wrote:
I noticed some talk on SQLDirectory a month or so ago. .
Di
hui wrote:
I am getting the exactly same score like 0. 04809519 for different size
documents for some queries and this happens quite frequently. Based on the
score formula, it seems this should rarely happen. Or do I misunderstand the
formula?
Normalization factors (& document boosts) are represented
Win32 seems to sometimes not permit one to delete a file immediately
after it has been closed. Because of this, Lucene keeps a list of files
that need to be deleted in the 'deleteable' file. Are your files listed
in this file? If so, Lucene will again try to delete these files the
next time
Anthony Vito wrote:
I noticed some talk on SQLDirectory a month or so ago. ( I just joined
the list :) ) I have a JDBC implementation that stores the "files" in a
couple of tables and stores the data for the files as blocks (BLOBs) of
a certain size ( 16k by default ). It also has an LRU cache fo
Yukun Song wrote:
As you know, Lucene currently uses flat files to store information for
indexing.
Does anyone have ideas or resources for combining a database (like MySQL or
PostgreSQL) with Lucene instead of the current flat index file formats?
A few folks have implemented an SQL-based Lucene Directory, but n
Ioan Miftode wrote:
I recently upgraded to lucene 1.4 RC2 because I needed some
sorting capabilities. However some phrase searches don't
work anymore (the hits don't even have the terms I'm searching on).
Try the latest CVS. There were some bugs in 1.4RC2 that have been fixed.
(We'll probably do
Incze Lajos wrote:
Could anybody summarize what would be the technical pros/cons of a DB-based
directory over the flat files? (What I see at the moment is that for some
- significant? - performance penalty you'll get an index available over the
network for multiple lucene engines -- if I'm right.)
h
Please don't crosspost to lucene-user and lucene-dev!
Tate Avery wrote:
3) The maxClauseCount threshold appears not to care whether or not my
clauses are 'required' or 'prohibited'... only how many of them there are in
total.
That's correct. It is an attempt to stop out-of-memory errors which can
Matthew W. Bilotti wrote:
We suspect the coordination term is driving down
these documents' ranks and we would like to bring those documents back up
to where they should be.
That sounds right to me.
Is there a relatively easy way to implement what we want using Lucene?
Would it be better to t
James Dunn wrote:
Also I search across about 50 fields but I don't use
wildcard or range queries.
Lucene uses one byte of RAM per document per searched field, to hold the
normalization values. So if you search a 10M document collection with
50 fields, then you'll end up using 500MB of RAM.
If
requirements for a search. Does this memory
get used only during the search operation itself,
or is it referenced by the Hits object or anything
else after the actual search completes?
Thanks again,
Jim
--- Doug Cutting <[EMAIL PROTECTED]> wrote:
James Dunn wrote:
Also I search across ab
Jayant Kumar wrote:
We recently tested lucene with an index size of 2 GB
which has about 1,500,000 documents, each document
having about 25 fields. The frequency of search was
about 20 queries per second. This resulted in an
average response time of about 20 seconds
per search.
That sounds s
Jayant Kumar wrote:
Please find enclosed jvmdump.txt which contains a dump
of our search program after about 20 seconds of
starting the program.
Also enclosed is the file queries.txt which contains
few sample search queries.
Thanks for the data. This is exactly what I was looking for.
"Thread-14"
Doug Cutting wrote:
Please tell me if you are able to simplify your queries and if that
speeds things. I'll look into a ThreadLocal-based solution too.
I've attached a patch that should help with the thread contention,
although I've not tested it extensively.
I still don't
Jayant Kumar wrote:
Thanks for the patch. It helped in increasing the
search speed to a good extent.
Good. I'll commit it. Thanks for testing it.
But when we tried to
give about 100 queries in 10 seconds, then again we
found that after about 15 seconds, the response time
per query increased.
This
David Spencer wrote:
Does it ever make sense to set the Similarity obj in either (only one
of..) IndexWriter or IndexSearcher? i.e. If I set it in IndexWriter can
I avoid setting it in IndexSearcher? Also, can I avoid setting it in
IndexWriter and only set it in IndexSearcher? I noticed Nutch s
Otis Gospodnetic wrote:
Can anyone comment on performance differences?
I'd expect multi-threaded performance to be a bit worse with the
compound format, but single-threaded performance should be nearly identical.
Doug
Erik Hatcher wrote:
If you want something that does "quick fox*" where "quick" must be
followed by something starting with "fox", you'll have to do this
through the API, perhaps using the awkwardly named PhrasePrefixQuery,
which does support slop also. It would be up to you to do the term
expa
> The best example that I've been able to find is the Yahoo research
> lab - as I understand it, this is a Nutch (i.e. Lucene)
> implementation that's providing impressive performance over a
> 100 million document repository.
This demo runs on a handful of boxes. It was originally running on
thre
> What do your queries look like? The memory required
> for a query can be computed by the following equation:
>
> 1 Byte * Number of fields in your query * Number of
> docs in your index
>
> So if your query searches on all 50 fields of your 3.5
> Million document index then each search would tak
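That back-of-the-envelope equation is simple enough to encode. The helper below is hypothetical and just restates the quoted formula (1 byte per document per searched field, for the norms):

```java
// Estimates the RAM held by norms during a search:
// 1 byte * number of searched fields * number of docs in the index.
public class NormMemory {
    public static long bytes(long numDocs, int numSearchedFields) {
        return numDocs * (long) numSearchedFields;
    }
}
```

For the 10M-document, 50-field case discussed earlier in the thread, this comes out to 500MB.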
A mergeFactor of 5000 is a bad idea. If you want to index faster, try
increasing minMergeDocs instead. If you have lots of memory this can
probably be 5000 or higher.
Also, why do you optimize before you're done? That only slows things.
Perhaps you have to do it because you've set mergeFacto
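As a sketch, the advice above against the Lucene 1.4-era IndexWriter API, where mergeFactor and minMergeDocs are public fields (the values and the dir/analyzer variables are illustrative, not from the original mail):

```java
// Assumes `dir` and `analyzer` are already set up; values are illustrative.
IndexWriter writer = new IndexWriter(dir, analyzer, true);
writer.mergeFactor = 10;      // keep near the default; huge values just defer all merge work
writer.minMergeDocs = 5000;   // buffer more documents in RAM before flushing a segment
// ... addDocument() calls ...
writer.optimize();            // optimize once, after all documents are added
writer.close();
```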
Julien,
Thanks for the excellent explanation.
I think this thread points to a documentation problem. We should
improve the javadoc for these parameters to make it easier for folks to
In particular, the javadoc for mergeFactor should mention that very
large values (>100) are not recommended, sin
John Wang wrote:
While lucene tokenizes the words in the document, it counts the
frequency and figures out the position, we are trying to bypass this
stage: For each document, I have a set of words with a known frequency,
e.g. java (5), lucene (6) etc. (I don't care about the position, so it
ca
John Wang wrote:
The solution you proposed is still a derivative of creating a
dummy document stream. Taking the same example, java (5), lucene (6),
VectorTokenStream would create a total of 11 Tokens whereas only 2 are
necessary.
That's easy to fix. We just need to reuse the token:
public cl
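The reuse idea can be sketched without Lucene's classes: emit each word freq times while allocating only one mutable holder (in real code this would be one reused Token instance inside a TokenStream; the class below is hypothetical):

```java
// For input java(5), lucene(6): emits 11 tokens but allocates a single
// mutable holder instead of 11 separate token objects.
class ReusingTokenStream {
    private final String[] words;
    private final int[] freqs;
    private int wordIndex = 0, emitted = 0;
    private final StringBuilder token = new StringBuilder(); // the one reused holder

    ReusingTokenStream(String[] words, int[] freqs) {
        this.words = words;
        this.freqs = freqs;
    }

    /** Returns the shared holder filled with the next token's text, or null when exhausted. */
    CharSequence next() {
        while (wordIndex < words.length && emitted >= freqs[wordIndex]) {
            wordIndex++;
            emitted = 0;
        }
        if (wordIndex >= words.length) return null;
        if (emitted == 0) {           // refill the holder only when the word changes
            token.setLength(0);
            token.append(words[wordIndex]);
        }
        emitted++;
        return token;
    }
}
```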
Kevin A. Burton wrote:
Also... what can I do to speed up this optimize? Ideally it wouldn't
take 6 hours.
Was this the index with the mergeFactor of 5000? If so, that's why it's
so slow: you've delayed all of the work until the end. Indexing on a
ramfs will make things faster in general, howe
Kevin A. Burton wrote:
So is it possible to fix this index now? Can I just delete the most
recent segment that was created? I can find this by ls -alt
Sorry, I forgot to answer your question: this should work fine. I don't
think you should even have to delete that segment.
Also, to elaborate
MATL (Mats Lindberg) wrote:
When i copied the lucene jar file to the solaris machine from the
windows machine i used a ftp program.
FTP probably mangled the file. You need to use FTP's binary mode.
Doug
Kevin A. Burton wrote:
No... I changed the mergeFactor back to 10 as you suggested.
Then I am confused about why it should take so long.
Did you by chance set the IndexWriter.infoStream to something, so that
it logs merges? If so, it would be interesting to see that output,
especially the last e
Kevin A. Burton wrote:
This is why I think it makes more sense to use our own java.io.tmpdir to
be on the safe side.
I think the bug is that Tomcat changes java.io.tmpdir. I thought that
the point of the system property java.io.tmpdir was to have a portable
name for /tmp on unix, c:\windows\tmp
John Wang wrote:
Just for my education, can you maybe elaborate on using the
"implement an IndexReader that delivers a
synthetic index" approach?
IndexReader is an abstract class. It has few data fields, and few
non-static methods that are not implemented in terms of abstract
methods. So, in ef
Kevin A. Burton wrote:
During an optimize I assume Lucene starts writing to a new segment and
leaves all others in place until everything is done and THEN deletes them?
That's correct.
The only settings I use are:
targetIndex.mergeFactor=10;
targetIndex.minMergeDocs=1000;
the resulting index has
Kevin A. Burton wrote:
With the typical handful of fields, one should never see more than
hundreds of files.
We only have 13 fields... Though to be honest I'm worried that even if I
COULD do the optimize that it would run out of file handles.
Optimization doesn't open all files at once. The mos
Armbrust, Daniel C. wrote:
The problem I ran into the other day with the new lock location is that Person A had started an index, ran into problems, erased the index and asked me to look at it. I tried to rebuild the index (in the same place on a Solaris machine) and found out that A) - her locks
Kevin A. Burton wrote:
I was going to create a new IDField class which just calls super( name,
value, false, true, false) but noticed I was prevented because
Field.java is final?
You don't need to subclass to do this, just a static method somewhere.
Why is this? I can't see any harm in making it
Kevin A. Burton wrote:
So I added a few constants to my class:
new Field( "name", "value", NOT_STORED, INDEXED, NOT_TOKENIZED );
which IMO is a lot easier to maintain.
Why not add these constants to Field.java:
public static final boolean STORED = true;
public static final boolean NOT_STORED
Doug Cutting wrote:
The calls would look like:
new Field("name", "value", Stored.YES, Indexed.NO, Tokenized.YES);
Stored could be implemented as the nested class:
public final class Stored {
private Stored() {}
public static final Stored YES = new Stored();
public static final Stored NO = new Stored();
}
[EMAIL PROTECTED] wrote:
What I really would like to see are some best practices or some advice from
some users who are working with really large indices how they handle this
situation, or why they don't have to care about it or maybe why I am
completely missing the point ;-))
Many folks with re
Aviran wrote:
First let me explain what I found out. I'm running Lucene on a 4 CPU server.
While doing some stress tests I've noticed (by doing full thread dump) that
searching threads are blocked on the method: public FieldInfo fieldInfo(int
fieldNumber) This causes for a significant cpu idle time
Aviran wrote:
I use Lucene 1.4 final
Here is the thread dump for one blocked thread (If you want a full thread
dump for all threads I can do that too)
Thanks. I think I get the point. I recently removed a synchronization
point higher in the stack, so that now this one shows up!
Whether or not y
Aviran wrote:
I changed the Lucene 1.4 final source code and yes this is the source
version I changed.
Note that this patch won't produce a speedup on earlier releases,
since there was another multi-thread bottleneck higher up the stack that
was only recently removed, revealing this lower-lev
Kevin A. Burton wrote:
Doug Cutting wrote:
Field and Document are not designed to be extensible. They are
persisted in such a way that added methods are not available when the
field is restored. In other words, when a field is read, it always
constructs an instance of Field, not a subclass
John Wang wrote:
On the same thought, how about the org.apache.lucene.analysis.Token
class. Can we make it non-final?
Sure, if you make a case for why it should be non-final.
What would your subclasses do? Which methods would you override?
Doug
--
Whether this will make a difference depends on the size of the index.
If your index is relatively small, then this patch will help more. If
your index is large, it will help less.
Aviran wrote:
Try compiling these code changes into Lucene:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg06116.h
Have you looked at:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html
in particular, at:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#lengthNorm(java.lang.String,%20int)
http://jakarta.apache.org/lucene/docs/api/org/apache/lucen
Florian Sauvin wrote:
Everywhere in the documentation (and it seems logical) you say to use
the same analyzer for indexing and querying... how is this handled on
not tokenized fields?
Imperfectly.
The QueryParser knows nothing about the index, so it does not know which
fields were tokenized and wh
fp235-5 wrote:
I am looking at the code to implement setIndexInterval() in IndexWriter. I'd
like to have your opinion on the best way to do it.
Currently the creation of an instance of TermInfosWriter requires the following
steps:
...
IndexWriter.addDocument(Document)
IndexWriter.addDocument(Docume
You can define a subclass of FilterIndexReader that re-sorts documents
in TermPositions(Term) and document(int), then use
IndexWriter.addIndexes() to write this in Lucene's standard format. I
have done this in Nutch, with the (as yet unused) IndexOptimizer.
http://cvs.sourceforge.net/viewcvs.p
Optimization should not require huge amounts of memory. Can you tell a
bit more about your configuration: What JVM? What OS? How many
fields? What mergeFactor have you used?
Also, please attach the output of 'ls -l' of your index directory, as
well as the stack trace you see when OutOfMemo
The key in the WeakHashMap should be the IndexReader, not the Entry. I
think this should become a two-level cache, a WeakHashMap of HashMaps,
the WeakHashMap keyed by IndexReader, the HashMap keyed by Entry. I
think the Entry class can also be changed to not include an IndexReader
field. Doe
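A minimal sketch of that two-level structure, using only java.util classes (the real FieldCache entry and value types are more involved; this just shows the WeakHashMap-of-HashMaps shape):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

// Outer map keyed weakly by reader: when a reader is garbage collected,
// its whole inner cache goes with it. Inner map keyed by entry (field + type).
class TwoLevelCache {
    private final Map cache = new WeakHashMap();

    synchronized Object get(Object reader, Object entry) {
        Map perReader = (Map) cache.get(reader);
        return perReader == null ? null : perReader.get(entry);
    }

    synchronized void put(Object reader, Object entry, Object value) {
        Map perReader = (Map) cache.get(reader);
        if (perReader == null) {
            perReader = new HashMap();
            cache.put(reader, perReader);
        }
        perReader.put(entry, value);
    }
}
```

Keying the outer map by the reader also means the entry objects no longer need to carry a reader field themselves.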
Ernesto De Santis wrote:
If a field has a boost value set at index time, and at search time
the query has another boost value for that field, what happens?
Which value is used for the boost?
The two boosts are both multiplied into the score.
Doug
--
Lucene scores are not percentages. They really only make sense compared
to other scores for the same query. If you like percentages, you can
divide all scores by the first score and multiply by 100.
Doug
lingaraju wrote:
Dear All
How the score method works(logic) in Hits class
For 100% match
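The divide-by-the-first-score trick, as a hypothetical helper (assumes the scores array is already sorted by descending score, so scores[0] is the top hit):

```java
// Normalizes raw Lucene scores to pseudo-percentages relative to the top hit.
// Scores are only comparable within one query, so the "percentage" is too.
public class ScorePercent {
    public static float[] toPercent(float[] scores) {
        float[] pct = new float[scores.length];
        for (int i = 0; i < scores.length; i++) {
            pct[i] = scores[i] / scores[0] * 100f; // top hit becomes 100
        }
        return pct;
    }
}
```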
Rob Clews wrote:
I want to do the same, set a boost for a field containing a date that
lowers as the date is further from now, is there any way I could do
this?
You could implement Similarity.idf(Term, Searcher) to, when
Term.field().equals("date"), return a value that is greater for more
recent
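One possible shape for such a factor; the decay constant below is arbitrary, and the real hook would be a Similarity subclass overriding idf(Term, Searcher) for the date field, as described above:

```java
// Boost that decays as a document's date recedes into the past.
// The 30-day half-life is an illustrative choice, not from the original mail.
public class RecencyBoost {
    public static float boost(long daysOld) {
        return 1.0f / (1.0f + daysOld / 30.0f);
    }
}
```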
Vincent Le Maout wrote:
I have to index a huge, huge amount of data: about 10 million documents
making up about 300 GB. Is there any technical limitation in Lucene that
could prevent me from processing such amount (I mean, of course, apart
from the external limits induced by the hardware: RAM, disks
John Patterson wrote:
I would like to hold a significant amount of the index in memory but use the
disk index as a spill over. Obviously the best situation is to hold in
memory only the information that is likely to be used again soon. It seems
that caching TermDocs would allow popular search ter
You could instead use a HitCollector to gather only documents with
scores in that range.
Doug
Karthik N S wrote:
Hi
Apologies
If I want to get all the hits for Scores between 0.5f to 0.8f,
I usually use
query = QueryParser.parse(srchkey,Fields, analyzer);
int tothits = searcher.search(q
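The HitCollector approach amounts to a callback that keeps only documents whose score falls inside the band; sketched here without Lucene's HitCollector base class (the class is hypothetical, and the bounds mirror the question):

```java
import java.util.ArrayList;
import java.util.List;

// Collects doc ids whose score lies within [minScore, maxScore],
// the way a HitCollector subclass would in its collect() method.
class RangeCollector {
    private final float minScore, maxScore;
    final List docs = new ArrayList();

    RangeCollector(float minScore, float maxScore) {
        this.minScore = minScore;
        this.maxScore = maxScore;
    }

    void collect(int doc, float score) {
        if (score >= minScore && score <= maxScore) {
            docs.add(new Integer(doc));
        }
    }
}
```

This avoids materializing a Hits object for documents outside the band.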
Terry Steichen wrote:
But if, in the future, I or someone else took on this task of enhancing QueryParser, I'd like to be assured that the underlying Lucene engine will accept and support negative boosting. Is that the case?
Lucene will multiply negative boosts into scores just like positive
ones
Kevin A. Burton wrote:
Is it possible to take an existing index (say 1G) and break it up into a
number of smaller indexes (say 10 100M indexes)...
I don't think there's currently an API for this but it's certainly
possible (I think).
Yes, it is theoretically possible but not yet implemented.
An ea
Looks to me like you're using an older version of Lucene on your Linux
box. The code is back-compatible, it will read old indexes, but Lucene
1.3 cannot read indexes created by Lucene 1.4, and will fail in the way
you describe.
Doug
Sven wrote:
Hi!
I have a problem to port a Lucene based knowl
I can successfully use gcc 3.4.0 with Lucene as follows:
ant jar jar-demo
gcj -O3 build/lucene-1.5-rc1-dev.jar build/lucene-demos-1.5-rc1-dev.jar
-o indexer --main=org.apache.lucene.demo.IndexHTML
./indexer -create docs
It runs pretty snappy too! However I don't know if there's much mileage
in p
Yonik Seeley wrote:
Setup info & Stats:
- 4.3M documents, 12 keyword fields per document, 11
[ ... ]
"field1:4 AND field2:188453 AND field3:1"
field1:4 done alone selects around 4.2M records
field2:188453 done alone selects around 1.6M records
field3:1 done alone selects around 1K record
Bill Janssen wrote:
Hi.
Hey, Bill. It's been a long time!
I've got a Lucene application that's been in use for about two years.
Some users are using Lucene 1.2, some 1.3, and some are moving to 1.4.
The indices seem to behave differently under each version. I'd like
to add code to my application
Kevin A. Burton wrote:
My problem is that I have two machines... one for searching, one for
indexing.
The searcher has an existing index.
The indexer found an UPDATED document and then adds it to a new index
and pushes that new index over to the searcher.
The searcher then reloads and when some
Kevin A. Burton wrote:
It looks like Document.java uses its own implementation of a LinkedList..
Why not use a HashMap to enable O(1) lookup... right now field lookup is
O(N) which is certainly no fun.
Was this benchmarked? Perhaps there's the assumption that since
documents often have few field
Ben Litchfield wrote:
PDFBox: slow PDF text extraction for Java applications
http://www.pdfbox.org
Shouldn't that read, "PDFBox: *free* slow PDF text extraction for Java
applications, with Lucene integration"?
Doug
Chris Fraschetti wrote:
I've seen throughout the list mentions of millions of documents.. 8
million, 20 million, etc etc.. but can lucene potentially handle
billions of documents and still efficiently search through them?
Lucene can currently handle up to 2^31 documents in a single index. To
a la
Bill Janssen wrote:
I'd think that if a user specified a query "cutting lucene", with an
implicit AND and the default fields "title" and "author", they'd
expect to see a match in which both "cutting" and "lucene" appears. That is,
(title:cutting OR author:cutting) AND (title:lucene OR author:lucen
Aad Nales wrote:
Before I start reinventing wheels I would like to do a short check to
see if anybody else has already tried this. A customer has requested us
to look into the possibility to perform a spell check on queries. So far
the most promising way of doing this seems to be to create an Analy
David Spencer wrote:
Good heuristics but are there any more precise, standard guidelines as
to how to balance or combine what I think are the following possible
criteria in suggesting a better choice:
Not that I know of.
- ignore(penalize?) terms that are rare
I think this one is easy to threshol
It sounds like the ThreadLocal in TermInfosReader is not getting
correctly garbage collected when the TermInfosReader is collected.
Researching a bit, this was a bug in JVMs prior to 1.4.2, so my guess is
that you're running in an older JVM. Is that right?
I've attached a patch which should fi
Daniel Naber wrote:
On Thursday 09 September 2004 18:52, Doug Cutting wrote:
I have not been
able to construct a two-word query that returns a page without both
words in either the content, the title, the url or in a single anchor.
Can you?
Like this one?
konvens leitseite
Leitseite is only in
David Spencer wrote:
Doug Cutting wrote:
And one should not try correction at all for terms which occur in a
large proportion of the collection.
I keep thinking over this one and I don't understand it. If a user
misspells a word and the "did you mean" spelling correction algori
Andrzej Bialecki wrote:
I was wondering about the way you build the n-gram queries. You
basically don't care about their position in the input term. Originally
I thought about using PhraseQuery with a slop - however, after checking
the source of PhraseQuery I realized that this probably wouldn't
David Spencer wrote:
[1] The user enters a query like:
recursize descent parser
[2] The search code parses this and sees that the 1st word is not a term
in the index, but the next 2 are. So it ignores the last 2 terms
("descent" and "parser") and suggests alternatives to
"recursize"...thu