I'm having trouble with the IndexReader class, as per below (using
lucene 2.9.1):
RAMDirectory dir = new RAMDirectory();
createIndex(dir);
IndexReader reader = IndexReader.open(dir);
IndexReader reader2 = reader.reopen();
reader.close();
reader2.terms(); // AlreadyClosedException: "this IndexReader is closed"
http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/index/IndexReader.html#reopen%28%29
...
If the index has not changed since this instance was (re)opened, then
this call is a NOOP and returns this instance
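Since the index hasn't changed between open() and reopen() in your snippet,
reader2 is the very same instance as reader, so closing reader closes reader2
too. A minimal sketch of the usual reopen idiom (only close the old reader
when reopen() actually hands back a new instance):

IndexReader reader = IndexReader.open(dir);
// ... later, after the index may have been modified ...
IndexReader newReader = reader.reopen();
if (newReader != reader) {
    // reopen() returned a fresh instance, so the old one can be released
    reader.close();
}
reader = newReader;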
--
Ian.
On Fri, Jul 30, 2010 at 9:16 AM, Gregory Tarr wrote:
> I'm having t
Hi all,
I'd like to do a very simple change to the idf computation, but I can't seem
to wrap my head around it.
There are very useful hints in the javadocs under "Changing Similarity" for
new tf() and lengthNorm() behavior, but they are a little blurrier for
idf().
http://lucene.apache.org/java/3_0
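For reference, here is roughly what I'm attempting, as a sketch only. I'm
assuming DefaultSimilarity.idf(int docFreq, int numDocs) is the right hook to
override, and that installing it on the searcher is enough since idf is a
query-time factor:

public class MyIdfSimilarity extends DefaultSimilarity {
    @Override
    public float idf(int docFreq, int numDocs) {
        // placeholder change: flatten idf so every term gets the same weight
        return 1.0f;
    }
}

// ...
searcher.setSimilarity(new MyIdfSimilarity());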
Hello everyone,
I'm using lucene for obvious purposes and I'm trying to highlight
search-term results.
libraries I use:
lucene-core version: 3.0.2
lucene-highlighter version: 3.0.2
Dev system:
WinXP Pro 32-bit, jdk1.6.0_20 (java version "1.6.0_20")
Your Linux setup is evidently missing a jar file - the one that contains
org/apache/lucene/index/memory/MemoryIndex. Or it is there but not in the
CLASSPATH, or something else along those lines.
--
Ian.
On Fri, Jul 30, 2010 at 2:30 PM, Markus Roth wrote:
>
>
> Hello everyone,
>
> I'm using
First of all, thanks for your response.
But how can that be true, if a search term without a wildcard (and the
highlighting of its results) works fine?
Greetings,
Markus
Ian Lea
Because the highlighter only uses MemoryIndex if wildcards are involved? I
don't use the highlighter package so have no idea if that is correct or not,
but the message
java.lang.NoClassDefFoundError: org/apache/lucene/index/memory/MemoryIndex
is clear. The jvm can't find that class.
--
Ian.
Well it turns out that your suggestion was true. I added
lucene-memory-3.0.2.jar from the contrib/memory folder to the CLASSPATH and
it works.
The odd thing is that I most definitely have not added the jar to the CP in
Windows - and there wildcards work (with just core and highlight).
Thanks
Any hints on making something like an InverseWildcardQuery?
We're trying to find all documents that have at least one field that doesn't
match the wildcard query.
Or is there a way to inverse any particular query?
I can't get my head round exactly what you want, but a standard lucene
technique is a BooleanQuery holding a MatchAllDocsQuery plus a second
query (which can be anything) with Occur.MUST_NOT. I guess that is a way
of inverting the second query.
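Something like this sketch, for example (the field name and wildcard are only
illustrative; the second clause can be any query you want to invert):

Query toInvert = new WildcardQuery(new Term("myfield", "foo*"));

BooleanQuery inverted = new BooleanQuery();
inverted.add(new MatchAllDocsQuery(), BooleanClause.Occur.MUST);
inverted.add(toInvert, BooleanClause.Occur.MUST_NOT);
// inverted now matches every document that does NOT match toInvert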
--
Ian.
On Fri, Jul 30, 2010 at 3:29 PM, Justin wrote:
Hey,
I was wondering if we can search info from a subset of papers
instead of from the whole index pool.
Thanks,
Shuai
I think you're suggesting, for example, "*:* AND -myfield:foo*".
If my document contains "myfield:foobar" and "myfield:dog", the document would
be thrown out because of the first field. I want to keep the document because
the second field does not match.
Related, is there a way to use wildcards
Working on the nightly build of solr and lucene -
MultiPhraseQuery throws ArrayIndexOutOfBounds Exception for the words
defined as synonyms
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 5
at
org.apache.lucene.search.MultiPhraseQuery$MultiPhraseWeight.scorer(MultiPhraseQuery.java:191)
> I think you're suggesting, for example, "*:* AND -myfield:foo*".
Yes, I think that is equivalent.
> If my document contains "myfield:foobar" and "myfield:dog", the document would
> be thrown out because of the first field. I want to keep the document because
> the second field does not match.
With all these requirements you slow down your queries immensely. You should
think about indexing your terms differently:
- if you need leading wildcards, think about indexing your terms in reverse!
Wildcards starting with * need to iterate over all terms, so it's very slow (and
because of this defaults t
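A rough sketch of what I mean by indexing in reverse (field names here are
only examples): store an extra field holding the reversed term, and a leading
wildcard then becomes a cheap prefix query on that field.

// index time: store the reversed form alongside the original
String value = "foobar";
Document doc = new Document();
doc.add(new Field("myfield", value, Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("myfield_rev",
    new StringBuilder(value).reverse().toString(),
    Field.Store.NO, Field.Index.NOT_ANALYZED));

// search time: "*bar" on myfield becomes a prefix query on the reversed field
String suffix = "bar";
Query q = new PrefixQuery(new Term("myfield_rev",
    new StringBuilder(suffix).reverse().toString()));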
> indexing your terms in reverse
Unfortunately the suffix requires a wildcard as well in our case. There are a
limited number of prefixes though (10ish), so perhaps we could combine them all
into one query. We'd still need some sort of InverseWildcardQuery
implementation.
> use another analyze
Hi Justin,
> [...] "*:* AND -myfield:foo*".
>
> If my document contains "myfield:foobar" and "myfield:dog", the document
> would be thrown out because of the first field. I want to keep the
> document because the second field does not match.
I'm assuming that you mistakenly used the same field n
> assuming that you mistakenly used the same field name
Nope, wasn't a mistake. We'd have to dynamically iterate through an unknown
number of fields if we didn't use the same one.
- Original Message
From: Steven A Rowe
To: "java-user@lucene.apache.org"
Sent: Fri, July 30, 2010 11:1
Depending on what exactly you mean by "subset" and "index pool", then yes.
If you've got one lucene index containing docs
docno: 1
category: computers
text: some words about computers
docno: 2
category: computers
text: some more words about computers
docno: 3
category: finance
text: some words
Hi Justin,
> Unfortunately the suffix requires a wildcard as well in our case. There
> are a limited number of prefixes though (10ish), so perhaps we could
> combine them all into one query. We'd still need some sort of
> InverseWildcardQuery implementation.
>
> > use another analyzer so you don'
> an example
PerFieldAnalyzerWrapper analyzers =
    new PerFieldAnalyzerWrapper(new KeywordAnalyzer());
// myfield falls back to the default KeywordAnalyzer
analyzers.addAnalyzer("content",
    new SnowballAnalyzer(luceneVersion, "English"));
// the wrapper controls how each indexed field value is analyzed
IndexWriter writer = new IndexWriter(dir, analyzers,  // dir: your Directory instance
    IndexWriter.MaxFieldLength.UNLIMITED);
Nice catch -- thanks! I will fix.
Mike
On Fri, Jul 30, 2010 at 11:20 AM, jayendra patil
wrote:
> Working on the nightly build of solr and lucene -
>
> MultiPhraseQuery throws ArrayIndexOutOfBounds Exception for the words
> defined as synonyms
>
> SEVERE: java.lang.ArrayIndexOutOfBoundsException
Sorry for the confusion.
Currently we have a total of 7000 fulltext papers (with the pubmed IDs stored
as the unique IDs) in the lucene index. We were wondering if we can search for
a given term in a subset of these papers (e.g. 30 papers, by providing a list
of the pubmed IDs) instead of search
Hi Justin,
> > an example
>
> PerFieldAnalyzerWrapper analyzers =
> new PerFieldAnalyzerWrapper(new KeywordAnalyzer());
> // myfield defaults to KeywordAnalyzer
> analyzers.addAnalyzer("content", new SnowballAnalyzer(luceneVersion,
> "English"));
> // analyzers affects the indexed field valu
Hi Ian,
In your example below, how do we set the parameters so we can search for
"category:computers" AND "text:words"?
Thanks,
Shuai
On Jul 30, 2010, at 9:56 AM, Ian Lea wrote:
> Depending on what exactly you mean by "subset" and "index pool", then yes.
>
> If you've got one lucene index co
Mike,
We took your suggestion and refactored like this:
TermEnum termEnum = indexReader.terms(new Term(field, "0"));
TermDocs allTermDocs = indexReader.termDocs();
while (termEnum.next() && termEnum.term().field().equals(field)) {
    allTermDocs.seek(termEnum);
    while (allTermDocs.next()) {
        // process allTermDocs.doc() here
    }
}
> you want what Lucene already does, but that's clearly not true
Hmmm, let's pretend the "contents" field in my example wasn't analyzed at index
time. The unstemmed forms of the terms will be indexed. But if I query with a
stemmed form, or use QueryParser with the SnowballAnalyzer, I'm not going to
get any matches.
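(To make the mismatch concrete, a rough sketch; I'm assuming Lucene 3.0's
QueryParser and the contrib SnowballAnalyzer, with made-up field and term
values:)

// "running" was indexed verbatim (field not analyzed), but SnowballAnalyzer
// stems the query text to "run", so the resulting TermQuery matches nothing.
Analyzer queryAnalyzer = new SnowballAnalyzer(Version.LUCENE_30, "English");
QueryParser parser = new QueryParser(Version.LUCENE_30, "contents", queryAnalyzer);
Query q = parser.parse("running");   // effectively contents:run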
> > you want what Lucene already does, but that's clearly not true
>
> Hmmm, let's pretend that "contents" field in my example wasn't analyzed at
> index
> time. The unstemmed form of terms will be indexed. But if I query with a
> stemmed
> form or use QueryParser with the SnowballAnalyzer, I'm
> make both a stemmed field and an unstemmed field
While this approach is easy and would work, it means increasing the size of the
index and reindexing every document. However, the information is already
available in the existing field, and runtime analysis is certainly faster than
more disk I/O.
Yes, you can do that. Make a Query for the 30 papers and use it
with your main query in a BooleanQuery if you are doing it programmatically.
Or, with so few papers to match, just build one long query string
and run it through QueryParser. See
http://lucene.apache.org/java/3_0_2/queryparsersyntax.html for details
on
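Programmatically it would look roughly like this (a sketch only, assuming the
pubmed ID is indexed in a field called "pmid"; adjust the names to your
schema):

BooleanQuery idFilter = new BooleanQuery();
for (String pmid : pubmedIds) {                  // the ~30 IDs to restrict to
    idFilter.add(new TermQuery(new Term("pmid", pmid)), BooleanClause.Occur.SHOULD);
}

BooleanQuery full = new BooleanQuery();
full.add(mainQuery, BooleanClause.Occur.MUST);   // whatever the user searched for
full.add(idFilter, BooleanClause.Occur.MUST);    // restrict to the listed papers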
I just tried the long query string as you suggested and it works great.
Thanks,
Shuai
On Jul 30, 2010, at 1:35 PM, Ian Lea wrote:
> Yes, you can do that. Make a Query for the 30 papers and use that
> with your main query in a BooleanQuery if doing it programatically.
> Or with so few documents