This can happen in some cases: for example, if you are doing a
disjunction of foo and bar with the coordination factor disabled, and
the segment has no postings for bar.
In this case the optimal scorer to return is just a TermScorer for foo.
On Thu, Aug 7, 2014 at 12:42 PM, Christian Reuschling
You can put this thing before your stemmer, with a custom map of exceptions:
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/StemmerOverrideFilter.html
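For reference, a minimal sketch of wiring this into an analysis chain (the class name and the example mappings below are mine, not from the thread; assumes the Lucene 4.9-era API):

```java
import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter;
import org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter.StemmerOverrideMap;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

public class StemmerOverrideExample {
  public static TokenStream buildChain(Reader input) throws IOException {
    // Exceptions that bypass the stemmer: map the surface form to the
    // token you want emitted instead of the stemmer's output.
    StemmerOverrideFilter.Builder builder = new StemmerOverrideFilter.Builder(true);
    builder.add("monkies", "monkey");
    builder.add("lucene", "lucene"); // identity mapping = "don't stem this"
    StemmerOverrideMap map = builder.build();

    TokenStream ts = new StandardTokenizer(Version.LUCENE_4_9, input);
    ts = new StemmerOverrideFilter(ts, map); // marks overridden terms as keywords
    ts = new PorterStemFilter(ts);           // skips anything marked above
    return ts;
  }
}
```

The override filter sets the KeywordAttribute on the terms it rewrites, which is why a keyword-aware stemmer placed after it leaves them alone.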
On Tue, Jul 29, 2014 at 10:03 AM, Robert Nikander
rob.nikan...@gmail.com wrote:
Hi,
I created
On Wed, Jul 23, 2014 at 6:03 AM, Harald Kirsch
harald.kir...@raytion.com wrote:
Hi,
below is an exception I get from one Solr core. According to
https://issues.apache.org/jira/browse/LUCENE-5617 the check that leads to
the exception was introduced recently.
Two things are worth mentioning:
On Wed, Jul 23, 2014 at 7:29 AM, Harald Kirsch
harald.kir...@raytion.com wrote:
(As a side note: after truncating the file to the expected size+16, at least
the core starts up again. Have not tested anything else yet.)
After applying your truncation fix, is it possible for you to run the
On Wed, Jul 23, 2014 at 7:29 AM, Harald Kirsch
harald.kir...@raytion.com wrote:
File system is xfs hosted on a corporate file share somewhere.
Sorry, I forgot to ask: how do you access this? Is it mounted over NFS?
Hey, thank you for following up!
On Wed, Jul 23, 2014 at 8:46 AM, Harald Kirsch
harald.kir...@raytion.com wrote:
On 23.07.2014 13:38, Robert Muir wrote:
On Wed, Jul 23, 2014 at 7:29 AM, Harald Kirsch
harald.kir...@raytion.com wrote:
(As a side note: after truncating the file
Your code isn't doing what you think it is doing. You need to ensure
things aren't eliminated by the compiler.
On Mon, Jul 14, 2014 at 5:57 AM, wangzhijiang999
wangzhijiang...@aliyun.com wrote:
Hi everybody, I found a problem confused me when I tested the mmap
feature in lucene. I
Don't use RAMDirectory: it's not very performant and really intended
for e.g. testing and so on.
Also, using a RAMDirectory here defeats the purpose: the idea behind
using a DocValuesField in most cases is to keep (most of) such
data structures out of heap memory. The data structures and even the
25 June 2014, Apache Lucene™ 4.9.0 available
The Lucene PMC is pleased to announce the release of Apache Lucene 4.9.0
Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for nearly
any application that requires
No they do not. The method is:
public abstract long computeNorm(FieldInvertState state);
On Thu, Jun 19, 2014 at 1:54 PM, Nalini Kartha nalinikar...@gmail.com wrote:
Thanks for the info!
We're more interested in changing the lengthNorm function vs. using
additional stats for scoring, so
methods on the
TFIDFSimilarity class -
public byte encodeNormValue(float f)
public float decodeNormValue(byte b)
On Thu, Jun 19, 2014 at 12:08 PM, Robert Muir rcm...@gmail.com wrote:
No they do not. The method is:
public abstract long computeNorm(FieldInvertState state);
On Thu, Jun 19
Again, because merging is based on byte size, you have to be careful how
you measure (hint: use LogDocMergePolicy).
Otherwise you are comparing apples and oranges.
Separately, your configuration is using experimental codecs like
disk/memory which aren't as heavily benchmarked etc. as the default
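For an apples-to-apples benchmark, the doc-count-based merge policy can be set like this (class name mine; assumes Lucene 4.9-era constants):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogDocMergePolicy;
import org.apache.lucene.util.Version;

public class DocCountMergeConfig {
  public static IndexWriterConfig build() {
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_4_9,
        new StandardAnalyzer(Version.LUCENE_4_9));
    // Merge by document count rather than byte size, so indexes built
    // with different codecs end up with comparable segment structure.
    LogDocMergePolicy mp = new LogDocMergePolicy();
    mp.setMergeFactor(10);
    iwc.setMergePolicy(mp);
    return iwc;
  }
}
```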
PM, Robert Muir rcm...@gmail.com wrote:
Can you just use the TokenStream API? That's the one we maintain and support...
On Sat, Jun 14, 2014 at 10:42 AM, Michal Lopuszynski lop...@gmail.com wrote:
Dear all,
I am not much into searching, however, I used Lucene to do some text
postprocessing, (esp. stemming) using low level tools
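The supported consumer workflow is reset() -> incrementToken() loop -> end() -> close(). A self-contained sketch for text postprocessing like stemming (class name mine; assumes Lucene 4.9-era APIs):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class StemText {
  // Run text through an analyzer and collect the stemmed terms.
  public static List<String> stems(String text) throws IOException {
    Analyzer analyzer = new EnglishAnalyzer(Version.LUCENE_4_9);
    List<String> out = new ArrayList<String>();
    TokenStream ts = analyzer.tokenStream("field", text);
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    try {
      ts.reset();
      while (ts.incrementToken()) {
        out.add(term.toString());
      }
      ts.end();
    } finally {
      ts.close();
    }
    return out;
  }
}
```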
They are still encoded the same way: so likely you aren't testing apples to
apples (e.g. different number of segments or whatever).
On Fri, Jun 13, 2014 at 8:28 PM, Zhao, Gang gz...@ea.com wrote:
I used lucene 4.4 to create index for some documents. One of the indexing
fields is
Check and make sure you are not opening an IndexReader for every search.
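One sketch of sharing a single reader across searches, using SearcherManager (class name SharedSearcher is mine; assumes the Lucene 4.x API):

```java
import java.io.IOException;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;

public class SharedSearcher {
  private final SearcherManager manager;

  public SharedSearcher(Directory dir) throws IOException {
    // One reader shared across all searches; null = default SearcherFactory.
    this.manager = new SearcherManager(dir, null);
  }

  public TopDocs search(String field, String text, int n) throws IOException {
    IndexSearcher searcher = manager.acquire();
    try {
      return searcher.search(new TermQuery(new Term(field, text)), n);
    } finally {
      manager.release(searcher); // release, never close, an acquired searcher
    }
  }

  public void refresh() throws IOException {
    manager.maybeRefresh(); // call periodically or after commits
  }
}
```

The manager refcounts the underlying reader, so acquire/release is cheap per search while reopening only happens in refresh().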
On Mon, Jun 2, 2014 at 2:51 AM, Jamie ja...@mailarchiva.com wrote:
Greetings
Despite following all the recommended optimizations (as described at
No, you are incorrect. The point of a search engine is to return top-N
most relevant.
If you insist you need to open an indexreader on every single search,
and then return huge amounts of docs, maybe you should use a database
instead.
On Tue, Jun 3, 2014 at 6:42 AM, Jamie ja...@mailarchiva.com
May 2014, Apache Lucene™ 4.8.1 available
The Lucene PMC is pleased to announce the release of Apache Lucene 4.8.1
Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for
nearly any application that requires full-text
addIndexes doesn't call maybeMerge, so I think you are just getting into
a situation with too many segments, so applying deletes is slow.
Can you try calling IndexWriter.maybeMerge() after you call
addIndexes? (It won't have immediate impact; you have to do some merges
to get your index healthy
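The suggested pattern, as a sketch (class name mine; assumes the Lucene 4.x IndexWriter API):

```java
import java.io.IOException;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;

public class BulkAdd {
  // addIndexes copies the source segments but never triggers merging,
  // so a large bulk add can leave the target with many small segments.
  public static void addAndMerge(IndexWriter writer, Directory... sources)
      throws IOException {
    writer.addIndexes(sources);
    writer.maybeMerge(); // ask the merge policy to look at the new segment count
    writer.commit();
  }
}
```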
This looks like a bug in ICU? I'll try to reproduce it. We are also a
little out of date, maybe they've already fixed it.
Thank you for reporting this.
On Fri, May 9, 2014 at 12:14 PM, feedly team feedly...@gmail.com wrote:
I am using the 4.7.0 ICU analyzer (via elastic search) and noticed this
fyi: this bug was already found and fixed in ICU's trunk:
http://bugs.icu-project.org/trac/ticket/10767
On Wed, May 14, 2014 at 4:32 AM, Robert Muir rcm...@gmail.com wrote:
This looks like a bug in ICU? I'll try to reproduce it. We are also a
little out of date, maybe they've already fixed
I opened https://issues.apache.org/jira/browse/LUCENE-5671
for now, if you are able to use the latest release of ICU, it should
prevent the bug.
On Wed, May 14, 2014 at 11:47 AM, Robert Muir rcm...@gmail.com wrote:
fyi: this bug was already found and fixed in ICU's trunk:
http://bugs.icu
I think NoMergePolicy.NO_COMPOUND_FILES and
NoMergePolicy.COMPOUND_FILES should be removed, and replaced with
NoMergePolicy.INSTANCE
If you want to change whether CFS is used by indexwriter flush, you
need to set that in IndexWriterConfig.
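A sketch of the split described above: merge-time CFS on the (no-op) merge policy, flush-time CFS on the config (class name mine; assumes the Lucene 4.8-era constants under discussion, before they were replaced by NoMergePolicy.INSTANCE):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.NoMergePolicy;
import org.apache.lucene.util.Version;

public class NoMergeConfig {
  public static IndexWriterConfig build(boolean compoundOnFlush) {
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_4_8,
        new StandardAnalyzer(Version.LUCENE_4_8));
    // The merge policy only governs merged segments, and NoMergePolicy
    // never merges, so the CFS flag on it never takes effect.
    iwc.setMergePolicy(NoMergePolicy.COMPOUND_FILES);
    // Whether *flushed* segments use the compound format is an
    // IndexWriterConfig setting, not a merge-policy one.
    iwc.setUseCompoundFile(compoundOnFlush);
    return iwc;
  }
}
```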
On Tue, Apr 29, 2014 at 8:03 AM, Varun Thacker
On Tue, Apr 29, 2014 at 8:14 AM, Shai Erera ser...@gmail.com wrote:
If we only offer NoMP.INSTANCE, what would it do w/ merged segments? always
compound? always not-compound?
It doesn't merge, though.
April 2014, Apache Lucene™ 4.7.2 available
The Lucene PMC is pleased to announce the release of Apache Lucene 4.7.2
Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for
nearly any application that requires
Did you really mean to shingle twice? (ShingleAnalyzerWrapper just
wraps the analyzer with a ShingleFilter, then the code wraps that with
another ShingleFilter again.)
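A sketch of a single layer of shingling, which avoids the doubled output: either use ShingleAnalyzerWrapper or add one ShingleFilter in a custom chain, not both (class name mine; assumes the Lucene 4.7-era Analyzer API):

```java
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.shingle.ShingleFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

public class BigramAnalyzer extends Analyzer {
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer source = new StandardTokenizer(Version.LUCENE_4_7, reader);
    ShingleFilter shingles = new ShingleFilter(source, 2, 2);
    shingles.setOutputUnigrams(false); // emit bigrams only
    return new TokenStreamComponents(source, shingles);
  }
}
```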
On Wed, Apr 2, 2014 at 1:42 PM, Natalia Connolly
natalia.v.conno...@gmail.com wrote:
Hello,
I am very confused about what
This, this is, is, is a, and so
on. Is there another way I could do it?
Thank you,
Natalia
On Wed, Apr 2, 2014 at 2:40 PM, Robert Muir rcm...@gmail.com wrote:
Did you really mean to shingle twice (shingleanalyzerwrapper just
wraps the analyzer with a shinglefilter, then the code wraps that with
another
January 2014, Apache Lucene™ 4.6.1 available
The Lucene PMC is pleased to announce the release of Apache Lucene 4.6.1.
Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for nearly
any application that requires
See Tokenizer.java for the state machine logic. In general you should
not have to do anything if the tokenizer is well-behaved (e.g. close
calls super.close() and so on).
On Tue, Jan 7, 2014 at 2:50 PM, Benson Margulies bimargul...@gmail.com wrote:
In 4.6.0,
is:
public MyTokenizer(Reader reader, ) {
super(reader);
myWrappedInputDevice = new MyWrappedInputDevice(this.input);
}
On Tue, Jan 7, 2014 at 2:59 PM, Robert Muir rcm...@gmail.com wrote:
See Tokenizer.java for the state machine logic. In general you should
not have
It is correct. The format of normalization factors has not changed since 4.2.
On Tue, Dec 17, 2013 at 10:49 AM, Torben Greulich
torben.greul...@s24.com wrote:
Hi,
we had an OOM error in Solr and were confused about one part of the
stack trace where Lucene42DocValuesProducer.ramBytesUsed is
the usual thing
and performance is not a particular concern).
Thanks,
Jon
On Mon, Oct 14, 2013 at 9:58 PM, Robert Muir rcm...@gmail.com wrote:
are your documents large?
try PostingsHighlighter(int) ctor with a larger value than
DEFAULT_MAX_LENGTH.
sounds like the passages you see with matches
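A sketch of raising the limit (class name mine; assumes the Lucene 4.x PostingsHighlighter API):

```java
import org.apache.lucene.search.postingshighlight.PostingsHighlighter;

public class LargeDocHighlighter {
  // PostingsHighlighter only inspects the first maxLength characters of
  // each document (DEFAULT_MAX_LENGTH is 10000); matches past that point
  // produce documents with zero highlighted passages.
  public static PostingsHighlighter create(int maxDocChars) {
    return new PostingsHighlighter(maxDocChars);
  }
}
```

Use it as usual with highlighter.highlight("body", query, searcher, topDocs); the highlighted field must be indexed with offsets (DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS).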
offhand that it's
silently enforcing a limit ...
Mike McCandless
http://blog.mikemccandless.com
On Tue, Oct 15, 2013 at 9:31 AM, Robert Muir rcm...@gmail.com wrote:
Thanks Jon. I'll add some stuff to the javadocs here to try to make it
more obvious.
On Tue, Oct 15, 2013 at 5:54 AM, Jon
On Tue, Oct 15, 2013 at 9:59 AM, Michael McCandless
luc...@mikemccandless.com wrote:
Well, unfortunately, this is a trap that users do hit.
By requiring the user to think about the limit on creating
PostingsHighlighter, he/she would think about it and realize they are
in fact setting a limit.
On Tue, Oct 15, 2013 at 10:57 AM, Michael McCandless
luc...@mikemccandless.com wrote:
On Tue, Oct 15, 2013 at 10:11 AM, Robert Muir rcm...@gmail.com wrote:
On Tue, Oct 15, 2013 at 9:59 AM, Michael McCandless
luc...@mikemccandless.com wrote:
Well, unfortunately, this is a trap that users do hit
did you try the latest release? There are some bugs fixed...
On Mon, Oct 14, 2013 at 2:11 PM, Jon Stewart
j...@lightboxtechnologies.com wrote:
Hello,
I've observed that when using PostingsHighlighter in Lucene 4.4 that
some of the responsive documents in TopDocs will have zero matches in
the
is greater than zero.
Jon
On Mon, Oct 14, 2013 at 5:24 PM, Robert Muir rcm...@gmail.com wrote:
did you try the latest release? There are some bugs fixed...
On Mon, Oct 14, 2013 at 2:11 PM, Jon Stewart
j...@lightboxtechnologies.com wrote:
Hello,
I've observed that when using PostingsHighlighter
Mostly because our tokenizers like StandardTokenizer will tokenize the
same way regardless of normalization form or whether it's normalized at
all?
But for other tokenizers, such a charfilter should be useful: there is
a JIRA for it, but it has some unresolved issues
That would be great!
On Mon, Sep 16, 2013 at 1:41 PM, Benson Margulies ben...@basistech.com wrote:
Thanks, I might pitch in.
On Mon, Sep 16, 2013 at 12:58 PM, Robert Muir rcm...@gmail.com wrote:
Mostly because our tokenizers like StandardTokenizer will tokenize the
same way regardless
On Sat, Sep 7, 2013 at 7:44 AM, Benson Margulies ben...@basistech.com wrote:
In Japanese, compounds are just decompositions of the input string. In
other languages, compounds can manufacture entire tokens from thin
air. In those cases, it's something of a question how to decide on the
offsets.
On Fri, Sep 6, 2013 at 8:03 PM, Benson Margulies ben...@basistech.com wrote:
I'm confused by the comment about compound components here.
If a single token fissions into multiple tokens, then what belongs in
the PositionLengthAttribute? I'm wanting to store a fraction in here!
Or is the idea
On Fri, Sep 6, 2013 at 9:32 PM, Benson Margulies ben...@basistech.com wrote:
On Fri, Sep 6, 2013 at 9:28 PM, Robert Muir rcm...@gmail.com wrote:
It's the latter. The way it's designed to work, I think, is illustrated
best in the Kuromoji analyzer, where it heuristically decompounds nouns
FieldType myType = new FieldType(TextField.TYPE_NOT_STORED);
myType.setIndexOptions(IndexOptions.DOCS_ONLY);
document.add(new Field("title", "some title", myType));
document.add(new Field("body", "some contents", myType));
...
On Sat, Aug 24, 2013 at 3:27 AM, Airway Wong airwayw...@gmail.com wrote:
Hi,
On Thu, Aug 22, 2013 at 1:48 AM, Sean Bridges sean.brid...@gmail.com wrote:
Is there a supported DocValuesFormat that doesn't load all the values into
ram?
Not with any current release, but in lucene 4.5 if all goes well, the
official implementation will work that way (I spent essentially the
On Wed, Aug 21, 2013 at 11:30 AM, Sean Bridges sean.brid...@gmail.com wrote:
What is the recommended way to use DiskDocValuesFormat in production if we
can't reindex when we upgrade?
I'm not going to recommend using any experimental codecs in production, but...
1. with 4.3 jar file:
There is a unit test demonstrating this at a very basic level here:
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/search/TestDocValuesScoring.java
On Mon, Aug 12, 2013 at 10:43 AM, Ross Woolf r...@rosswoolf.com wrote:
The JavaDocs for
, 2013 at 8:54 AM, Robert Muir rcm...@gmail.com wrote:
There is a unit test demonstrating this at a very basic level here:
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/search/TestDocValuesScoring.java
On Mon, Aug 12, 2013 at 10:43 AM, Ross
On Mon, Aug 12, 2013 at 8:48 AM, Ross Woolf r...@rosswoolf.com wrote:
Okay, just for clarity's sake, what you are saying is that if I make the
FieldCache call it won't actually create and impose the loading time of the
FieldCache, but rather just use the NumericDocValuesField instead. Is this
On Mon, Aug 12, 2013 at 11:06 AM, Shai Erera ser...@gmail.com wrote:
Or, you'd like to keep FieldCache API for sort of back-compat with existing
features, and let the app control the caching by using an explicit
RamDVFormat?
Yes. In the future ideally fieldcache goes away and is a
On Thu, Aug 8, 2013 at 11:31 AM, Michael McCandless
luc...@mikemccandless.com wrote:
A number of users have complained about the apparent RAM usage of
WeakIdentityMap, and it adds complexity to ByteBufferIndexInput to do
this tracking ... I think defaulting the unmap hack to off is best for
, Robert Muir rcm...@gmail.com wrote:
On Thu, Aug 8, 2013 at 11:18 AM, Tom Burton-West tburt...@umich.edu
wrote:
Sure I should be able to build a lucene core and give it a try. I
probably
won't run it until tomorrow night though because right now I'm running
some
other tests on the machine
Hi Tom, I committed a fix for the root cause
(https://issues.apache.org/jira/browse/LUCENE-5156).
Thanks for reporting this!
I don't know if it's feasible for you to build a lucene-core.jar from
branch_4x and run CheckIndex with that jar file to confirm it really
addresses the issue: if this is
On Thu, Aug 8, 2013 at 11:18 AM, Tom Burton-West tburt...@umich.edu wrote:
Sure I should be able to build a lucene core and give it a try. I probably
won't run it until tomorrow night though because right now I'm running some
other tests on the machine I would run CheckIndex from and disk I/O
Thanks, this is what I expected. I opened an issue to remove seek by Ord
from this vectors format.
On Aug 2, 2013 2:13 PM, Tom Burton-West tburt...@umich.edu wrote:
Thanks Robert,
Looks like it switches between seekCeil and seekExact:
main prio=10 tid=0x0e79a000 nid=0x5fe5 runnable
On Thu, Aug 1, 2013 at 6:40 PM, Tom Burton-West tburt...@umich.edu wrote:
Hi all,
OK, I really should have titled the post, CheckIndex limit with large tvd
files?
I started a new CheckIndex run about 1:00 pm on Tuesday and it seems to be
stuck again looking at termvectors.
I gave
On Tue, Jul 30, 2013 at 8:41 AM, Michael McCandless
luc...@mikemccandless.com wrote:
I think that's ~ 110 billion, not trillion, tokens :)
Are you certain you don't have any term vectors?
Even if your index has no term vectors, CheckIndex goes through all
docIDs trying to load them, but
Open a bug with the Android team... the problem is Android isn't Java (and
doesn't implement/follow the spec).
On Sat, Jul 13, 2013 at 4:31 AM, VIGNESH S vigneshkln...@gmail.com wrote:
Hi,
I did not strip META-INF/services and it contains the files.
Even when I combined it with other jars, I
:51 PM, VIGNESH S vigneshkln...@gmail.com
wrote:
Hi Robert,
Thanks for your reply.
If possible, can you please explain why this new class loading mechanism
was
introduced in Lucene 4.
Thanks and Regards
Vignesh
On Sat, Jul 13, 2013 at 6:56 PM, Robert Muir rcm...@gmail.com
I don't see anything abnormal.
This is what happens when it downloads dependencies: replicator must pull in
2MB of jars from various places.
If you are impatient during this process and press ^C, then that really
only makes matters worse as it then leaves a .lck file in your ivy cache,
and future
On Tue, Jun 18, 2013 at 9:48 AM, Tom Burton-West tburt...@umich.edu wrote:
Hello,
I'm trying to understand BlockGroupingCollector. I thought I would start
by running the tests in the debugger. However the only test I can find is
FieldCacheTermsFilter will use your docvalues field.
It's confusing: I think we should rename FieldCacheXXX to DocValuesXXX.
On Thu, Jun 6, 2013 at 2:22 AM, Arun Kumar K arunk...@gmail.com wrote:
Hi Guys,
I was trying to better the filtering mechanism for my use case.
When i use the existing
the filter will create a
DocValues for that field using FieldCache.
Arun
On Thu, Jun 6, 2013 at 3:49 PM, Michael McCandless
luc...@mikemccandless.com wrote:
On Thu, Jun 6, 2013 at 5:35 AM, Robert Muir rcm...@gmail.com wrote:
Its confusing: I think we should rename FieldCacheXXX
It's a bug: it's already fixed for 4.3 (coming soon):
https://issues.apache.org/jira/browse/LUCENE-4888
On Fri, Apr 19, 2013 at 1:09 PM, Ravikumar Govindarajan
ravikumar.govindara...@gmail.com wrote:
When writing a custom codec, I encountered an issue in SloppyPhraseScorer.
I am using
Your stack trace is incomplete: it doesn't even show where the OOM occurred.
On Sun, Apr 14, 2013 at 7:48 PM, Wei Wang welshw...@gmail.com wrote:
Unfortunately, I got another problem. My index has 9 segments (9 dvdd
files) with total size is about 22GB. The merging step eventually failed
and
Merging BinaryDocValues doesn't use any RAM; it streams the values from the
segments it's merging directly to the newly written segment.
So if you have this problem, it's unrelated to merging: it means you don't
have enough RAM to support all the stuff you are putting in these
BinaryDocValues
On Tue, Apr 9, 2013 at 8:22 AM, Wei Wang welshw...@gmail.com wrote:
DocValues makes fast per doc value lookup possible, which is nice. But it
brings other interesting issues.
Assume there are 100M docs and 200 NumericDocValuesFields, this ends up
with huge number of disk and memory usage,
On Tue, Apr 9, 2013 at 9:06 AM, Wei Wang welshw...@gmail.com wrote:
Thanks for the hint. Could you point to some Codec that might do this for
some types, even just as an side effect as you mentioned? It will be
helpful to have something to start with.
Have a look at diskdv/ codec in the
On Sat, Mar 16, 2013 at 12:57 AM, Steve Rowe sar...@gmail.com wrote:
Thanks for the explanation.
I ran a lucene/benchmark alg comparing the two stemmers on trunk on my
Macbook Pro with Oracle Java 1.7.0_13, and it looks like the situation hasn't
changed much.
The original-algorithm
Porter says the Porter2 stemmer is better[1]. Robert
Muir (who wrote EnglishAnalyzer), if you're reading, what do you think?
This was intentional actually. The default was a tradeoff of
benefits (which affect less than 5% of English vocabulary, if you
read around the Snowball site), versus a much
On Thu, Mar 14, 2013 at 7:22 AM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de wrote:
Hi list,
a stupid question about the naming of the index files.
While using lucene (and solr) 4.2 I still see files with Lucene41 in the
name.
This is somewhat confusing if lucene 4.x produces files with
March 2013, Apache Lucene™ 4.2 available
The Lucene PMC is pleased to announce the release of Apache Lucene 4.2
Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for
nearly any application that requires full-text
The underlying data formats are different. For example, because
Lucene42Codec will load terms into RAM, it uses an FST. But DiskDV
uses a more simplistic storage for the terms that's more suitable for
being disk-resident.
There are also different compression block sizes and so on in use.
you can
On Fri, Mar 1, 2013 at 11:16 AM, Oliver Christ ochr...@ebscohost.com wrote:
I've seen some changes in trunk regarding the data format of Lucene's
FST-based suggesters, and wonder whether the automata created by trunk
builds/next Lucene version are/will be binary-compatible to the ones
created
Just to be sure what you are trying to do:
A) compare the relevance of different similarities? this is something
the benchmark.quality package (actually pretty much unrelated from the
rest of the benchmark package!) does, if you have some e.g. TREC
collection or whatever to test with.
B) compare
The top method here is your random string generation.
Are you indexing random data?
On Thu, Jan 31, 2013 at 12:46 AM, arun k arunk...@gmail.com wrote:
Hi,
Please find the snapshots here.
http://picpaste.com/Lucene3.0.2-G00Z5FfX.png
http://picpaste.com/Lucene4.1-LsxpcQk0.png
Arun
On
On Thu, Jan 24, 2013 at 9:25 AM, Jerome Lanneluc
jerome_lanne...@fr.ibm.com wrote:
Note the 2 tokens in the second sample when I would expect to have only one
token with the (55401 57046) characters.
I could not figure out if I'm doing something wrong, or if this is a bug in
the Chinese
On Thu, Jan 24, 2013 at 10:53 AM, Jerome Lanneluc
jerome_lanne...@fr.ibm.com wrote:
It looks like my attachment was lost. It referred to
org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer.
I think this analyzer will not properly tokenize text outside of the
BMP: it pretty much only works
Which statistics in particular (which methods)?
On Thu, Jan 17, 2013 at 5:10 AM, Jon Stewart
j...@lightboxtechnologies.com wrote:
Thanks very much for your reply, Ian.
I am using SlowCompositeReaderWrapper because I am also retrieving the
term frequency statistics for the corpus (at the end
On Thu, Jan 3, 2013 at 12:16 PM, Alon Muchnick a...@datonics.com wrote:
value org.apache.lucene.index.TermInfosReader$ThreadResources ---
termInfoCache |org.apache.lucene.util.cache.SimpleLRUCache
termEnum |org.apache.lucene.index.SegmentTermEnum
You aren't using lucene 3.6.2 if
On Tue, Dec 25, 2012 at 11:30 PM, Vishwas Goel vishw...@gmail.com wrote:
Hi,
I am looking to get a bit more information back from SOLR/Lucene about the
query/document pair scores. This would include field level scores, overall
text relevance score, Boost value, BF value etc.
Use
On Wed, Dec 26, 2012 at 6:15 AM, Vishwas Goel vishw...@gmail.com wrote:
Use Scorer.getChildren()/freq()/getWeight()
in your collector you can walk the scorer hierarchy, associate scorers
with specific terms and queries, and determine which scorers matched
which documents and with what
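A hedged sketch of such a collector (class name mine; assumes the Lucene 4.x Collector/Scorer APIs). Returning false from acceptsDocsOutOfOrder forces in-order scoring, where getChildren() on the top-level scorer is meaningful:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.Scorer.ChildScorer;

// Per hit, walk the scorer tree and record which sub-scorers are
// positioned on the current document, with their query and freq().
public class ScorerWalkCollector extends Collector {
  private final List<String> matches = new ArrayList<String>();
  private Scorer scorer;

  public List<String> getMatches() {
    return matches;
  }

  @Override
  public void setScorer(Scorer scorer) {
    this.scorer = scorer;
  }

  @Override
  public void collect(int doc) throws IOException {
    walk(scorer, doc);
  }

  private void walk(Scorer s, int doc) throws IOException {
    if (s.docID() == doc) { // this sub-scorer matched the current doc
      matches.add(s.getWeight().getQuery() + " freq=" + s.freq());
    }
    for (ChildScorer child : s.getChildren()) {
      walk(child.child, doc);
    }
  }

  @Override
  public void setNextReader(AtomicReaderContext context) {}

  @Override
  public boolean acceptsDocsOutOfOrder() {
    return false; // keep getChildren() meaningful
  }
}
```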
25 December 2012, Apache Lucene™ 3.6.2 available
The Lucene PMC and Santa Claus are pleased to announce the release of
Apache Lucene 3.6.2.
Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for nearly
any
Maybe get lucene-test-framework.jar, and extends LuceneTestCase, using
newDirectory and so on.
if you have files still open this will fail the test and give you a
stacktrace of where you initially opened the file.
On Sun, Dec 9, 2012 at 12:28 PM, Clemens Wyss DEV clemens...@mysign.chwrote:
Hi
On Wed, Dec 5, 2012 at 1:30 PM, Tom Burton-West tburt...@umich.edu wrote:
java.version=1.6.0_16
Tom, can you use a newer Java version for this? That's pretty old, and
seeing such a crazy field number worries me that it's some JVM bug.
You could even try to run the CheckIndex itself with a newer
I'm particularly thinking its something like
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5091921
We tried to add workarounds to lucene to dodge problems from this, but
really a newer unaffected version would be safer.
On Wed, Dec 5, 2012 at 1:47 PM, Robert Muir rcm...@gmail.com wrote
On Wed, Dec 5, 2012 at 2:27 PM, Tom Burton-West tburt...@umich.edu wrote:
Thanks Robert,
I've asked our sysadmins to install a more recent Java version for testing.
I'll report back if it fails with the newer Java version.
Please let us know either way!
On Tue, Nov 27, 2012 at 6:17 AM, Trejkaz trej...@trypticon.org wrote:
Ah, yeah... I should have been clearer on what I meant there.
If you want to make a filter which relies on data that isn't in the
index, there is no mechanism for invalidation. One example of it is if
you have a filter
On Wed, Nov 28, 2012 at 12:27 AM, Trejkaz trej...@trypticon.org wrote:
On Wed, Nov 28, 2012 at 2:09 AM, Robert Muir rcm...@gmail.com wrote:
I don't understand how a filter could become invalid even though the
reader
has not changed.
I did state two ways in my last email, but just to re
On Thu, Nov 22, 2012 at 11:10 PM, Trejkaz trej...@trypticon.org wrote:
As for actually doing the invalidation, CachingWrapperFilter itself
doesn't appear to have any mechanism for invalidation at all, so I
imagine I will be building a variation of it with additional methods
to invalidate
On Tue, Nov 20, 2012 at 6:26 AM, Carsten Schnober
schno...@ids-mannheim.dewrote:
Thanks, Uwe!
I think what changed in comparison to Lucene 3.6 is that reset() is
called upon initialization, too, instead of after processing the first
document only, right?
There is no such change: this step
On Tue, Nov 20, 2012 at 6:18 PM, Trejkaz trej...@trypticon.org wrote:
I have a feature I wanted to implement which required a quick way to
check whether an individual document matched a query or not.
IndexSearcher.explain seemed to be a good fit for this.
The query I tested was just a
On Fri, Nov 16, 2012 at 5:18 PM, Tom Burton-West tburt...@umich.edu wrote:
Hi Otis,
I hope this is not off-topic,
Apparently in Lucene similarity does not have to be set at index time:
Actually in the general case it does. IndexWriter calls the Similarity's
computeNorm method at
On Wed, Nov 14, 2012 at 4:04 AM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de wrote:
Hi list,
while walking through the code with debugger (eclipse juno) I get the
following:
com.sun.jdi.InvocationException occurred invoking method.
This is while trying to see
On Wed, Nov 14, 2012 at 5:38 AM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de wrote:
AFAIK Eclipse is just an IDE using the Java debugger, so this is then a
Java debugger problem?
http://stackoverflow.com/questions/4123628/com-sun-jdi-invocationexception-occurred-invoking-method
I have
On Wed, Nov 14, 2012 at 9:47 AM, Scott Smith ssm...@mainstreamdata.com wrote:
Reading the documentation for these two filters seems to imply that
CJKWidthFilter is a subset of ICUFoldingFilter. Is that true? I'm basically
using the CjkAnalyzer (from Lucene 4.0) but adding ICUFoldingFilter
On Mon, Nov 12, 2012 at 10:47 PM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de wrote:
By the way, why does the TrimFilter option updateOffset default to false,
just to keep it backwards compatible?
In my opinion this option should be removed.
TokenFilters shouldn't muck with offsets, for a lot of
coord() and queryNorm() work on the query as a whole, which may span
multiple fields.
On Wed, Nov 7, 2012 at 5:23 PM, Joel Barry jmb...@gmail.com wrote:
Hi folks,
I have a question on PerFieldSimilarityWrapper. It seems that it is
not possible to get per-field behavior on queryNorm() and
Hi Christoph: in my opinion, (ICU)Collation should actually be
implemented as DocValues just as you propose: e.g. we'd deprecate the
Analyzer and just offer a (ICU)CollationFields that provide an easy
way to do this, so you would just add one of these to your Lucene
Document.
I started a
On Sat, Nov 3, 2012 at 7:35 PM, Igal @ getRailo.org i...@getrailo.org wrote:
hi,
I want to make sure that every comma (,) and semi-colon (;) is followed by a
space prior to tokenizing.
the idea is to then use a WhitespaceTokenizer which will keep commas but
still split the phrase in a case
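One way this is commonly done is a CharFilter in front of the tokenizer; a hedged sketch (class name mine; assumes the Lucene 4.x MappingCharFilter API):

```java
import java.io.Reader;

import org.apache.lucene.analysis.charfilter.MappingCharFilter;
import org.apache.lucene.analysis.charfilter.NormalizeCharMap;

// Ensure ',' and ';' are always followed by a space before a
// whitespace tokenizer sees the text. An already-present space just
// becomes a double space, which whitespace tokenization ignores.
public class PunctuationSpacer {
  private static final NormalizeCharMap MAP;
  static {
    NormalizeCharMap.Builder b = new NormalizeCharMap.Builder();
    b.add(",", ", ");
    b.add(";", "; ");
    MAP = b.build();
  }

  public static Reader wrap(Reader in) {
    return new MappingCharFilter(MAP, in);
  }
}
```

A CharFilter also keeps offset corrections, so highlighting still points at the original text.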