See http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/144/changes
Changes:
[mikemccand] LUCENE-843: add missing 'synchronized' to allThreadsIdle() method
[yonik] replace div with shift since idiv takes ~40 cycles and compiler can't
do strength reduction w/o knowing ops are non-negat
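
The point about non-negative operands can be illustrated with a minimal sketch. The class and method names below are hypothetical and are not taken from the Lucene commit. It only shows that signed division and an arithmetic right shift agree for non-negative values and can differ for negative ones, which is why a compiler cannot swap one for the other without knowing the sign.

```java
final class ShiftVsDivide {
    // Signed integer division truncates toward zero.
    static int divideBy8(int x) {
        return x / 8;
    }

    // Arithmetic right shift rounds toward negative infinity.
    static int shiftBy3(int x) {
        return x >> 3;
    }

    public static void main(String[] args) {
        // The two forms agree for non-negative inputs and may differ for negative ones.
        for (int x : new int[] {64, 7, 0, -7, -64}) {
            System.out.println(x + ": x / 8 = " + divideBy8(x) + ", x >> 3 = " + shiftBy3(x));
        }
    }
}
```

For -7 the division yields 0 while the shift yields -1, so the replacement is only safe where the operand is known to be non-negative.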
Hi Everybody,
We are building a search infrastructure using Lucene to scale up to 500
million documents with search times under 500 ms.
Here is my rough math on the size of the content and index:
Total Documents = 500 million documents
Size / Document = 10k / document
Index Size / Million = 2 GB / million docum
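
The figures quoted above can be multiplied out as a rough sketch. It assumes that "10k" means 10 KB per document and uses decimal units (1 GB = 1,000,000 KB). The class and constant names are hypothetical.

```java
final class CapacityEstimate {
    static final long TOTAL_DOCS = 500_000_000L;
    // Assumption: "10k / document" is read as 10 KB per document.
    static final long KB_PER_DOC = 10L;
    static final long INDEX_GB_PER_MILLION_DOCS = 2L;

    // Raw content size in GB, using decimal units (1 GB = 1,000,000 KB).
    static long contentSizeGb() {
        return TOTAL_DOCS * KB_PER_DOC / 1_000_000L;
    }

    // Index size in GB, scaled linearly from the per-million figure.
    static long indexSizeGb() {
        return TOTAL_DOCS / 1_000_000L * INDEX_GB_PER_MILLION_DOCS;
    }

    public static void main(String[] args) {
        System.out.println("Content: " + contentSizeGb() + " GB");
        System.out.println("Index:   " + indexSizeGb() + " GB");
    }
}
```

Under these assumptions the content comes to about 5,000 GB and the index to about 1,000 GB. The linear scaling of index size is itself an assumption.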
Hi,
We are facing a strange problem with RAMDirectory for indices greater than 8
GB. We have indexed around 6.5 million Lucene documents and the index size
is around 8 GB. Below are the contents of the index directory.
2236964197 _1x.fdt
51811488 _1x.fdx
293 _1x.fnm
2234929832 _1x.f
You can implement a FilterIndexReader that returns only a subset of an
index. Then use IndexWriter#addIndexes() to add this to a new, empty
index. Do this for each range of terms.
This is somewhat similar to Nutch's IndexSorter:
http://svn.apache.org/viewvc/lucene/nutch/trunk/src/java/org/ap
DM Smith wrote:
My question is whether contrib should have a separate policy?
If the @author is removed from the file, should we make sure that there
is a CREDITS.txt for the contrib with the info in it.
Credit isn't file-by-file, it's commit-by-commit and recorded in both
Jira and in CHANGES
Instead of "overriding" the trigram approach you may want to do a
combination. That is, create trigrams out of the list of words from the
dictionary and weight those matches much higher than the ones coming from the
index, or even do an exact dictionary lookup first and then a trigram/index
based lookup
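
The combination suggested here could look roughly like the sketch below. It is not the contrib SpellChecker's code: the class name, the boost value, and the Jaccard overlap used as the trigram score are all assumptions made for illustration. It tries an exact dictionary lookup first, then ranks candidates by trigram overlap, weighting dictionary words above index terms.

```java
import java.util.HashSet;
import java.util.Set;

final class TrigramSuggest {
    // Hypothetical weight given to dictionary words over index terms.
    static final double DICTIONARY_BOOST = 2.0;

    // All distinct 3-character substrings of a word.
    static Set<String> trigrams(String word) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 3 <= word.length(); i++) {
            grams.add(word.substring(i, i + 3));
        }
        return grams;
    }

    // Jaccard overlap of the two trigram sets, in the range [0, 1].
    static double overlap(String a, String b) {
        Set<String> ga = trigrams(a);
        Set<String> gb = trigrams(b);
        if (ga.isEmpty() || gb.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(ga);
        common.retainAll(gb);
        Set<String> union = new HashSet<>(ga);
        union.addAll(gb);
        return (double) common.size() / union.size();
    }

    // Returns the input if it is a dictionary word, else the best-scoring
    // candidate, or null when no candidate shares a trigram with the input.
    static String suggest(String input, Set<String> dictionary, Set<String> indexTerms) {
        if (dictionary.contains(input)) {
            return input; // exact dictionary lookup comes first
        }
        String best = null;
        double bestScore = 0.0;
        for (String word : dictionary) {
            double score = DICTIONARY_BOOST * overlap(input, word);
            if (score > bestScore) {
                bestScore = score;
                best = word;
            }
        }
        for (String term : indexTerms) {
            double score = overlap(input, term);
            if (score > bestScore) {
                bestScore = score;
                best = term;
            }
        }
        return best;
    }
}
```

Words shorter than three characters produce no trigrams in this sketch, so they can only be matched by the exact lookup.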
Currently, SpellChecker uses the trigram algorithm to find similar words. It
works well for keyboard fumbles, but not well enough for short words
or for languages like French, where the same sound can be written
differently.
Spellchecking is a classical computer task, and aspell provides some
nice and
: Interesting question... I guess we haven't had one contrib depend on
: another yet, or at least, I haven't checked to see if we have.
we do actually, a good example is xml-query-parser depending on the
queries contrib. the current best practice for doing this is to set a
property in your cont
As a user of Lucene, I can go either way.
With an active developer community their need is lessened.
The greatest value I have found in them is being able to track down
"duplicate" bugs. If I find a particular bug in one piece of code, I
try to find other places where the same bug exists, on
[
https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless reopened LUCENE-843:
---
Lucene Fields: [New, Patch Available] (was: [Patch Available, New])
Re-opening this
+1
I think it makes sense to remove them at one fell swoop and also
discourage adding them going forward?
Mike
"Tom White" <[EMAIL PROTECTED]> wrote:
> Hadoop recently removed all @author tags:
> https://issues.apache.org/jira/browse/HADOOP-1147.
>
> Tom
>
> On 05/07/07, Grant Ingersoll <[EMA
Hadoop recently removed all @author tags:
https://issues.apache.org/jira/browse/HADOOP-1147.
Tom
On 05/07/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
Solr just suggested (http://www.mail-archive.com/solr-
[EMAIL PROTECTED]/msg04883.html) that they remove Author tags for
a variety of good rea