, but if they don't produce the same token stream at
both the token and character/byte level, your queries may fail.
Rule #1 with Lucene and Solr - always be prepared to completely reindex your
data, precisely because ideas about analysis evolve over time.
-- Jack Krupansky
-Original Message-
From
/PerFieldSimilarityWrapper.html
This appears to be the change from that original design:
https://issues.apache.org/jira/browse/LUCENE-3749
-- Jack Krupansky
-Original Message-
From: Erick Erickson
Sent: Sunday, December 02, 2012 9:54 AM
To: java-user
Subject: Re: Using alternative scoring
analyzer.
-- Jack Krupansky
-Original Message-
From: Johannes.Lichtenberger
Sent: Friday, November 30, 2012 10:15 AM
To: java-user@lucene.apache.org
Cc: Michael McCandless
Subject: Re: What is flexible indexing in Lucene 4.0 if it's not the
ability to make new postings codecs?
On 11/28/2012
aren't appropriate on
user lists.
-- Jack Krupansky
-Original Message-
From: sri krishna
Sent: Tuesday, November 27, 2012 12:36 PM
To: java-user@lucene.apache.org
Subject: How does lucene handle the wildcard and fuzzy queries ?
How does lucene handle the prefix queries(wild card
calculations.
Some of these complex queries are constant score for performance reasons.
-- Jack Krupansky
-Original Message-
From: sri krishna
Sent: Tuesday, November 27, 2012 12:38 PM
To: java-user
Subject: handling different scores related to queries
for a search string hello*~ how
Add debugQuery=true to your query and look at the explain section to see
how the scoring is calculated for each document. Sometimes it is
counter-intuitive and some factors may differ but those differences can be
overwhelmed by other, unrelated factors.
-- Jack Krupansky
-Original
#explain(org.apache.lucene.search.Query,
int)
-- Jack Krupansky
-Original Message-
From: Jack Krupansky
Sent: Wednesday, November 21, 2012 11:44 AM
To: java-user@lucene.apache.org
Subject: Re: Question about ordering rule of SpanNearQuery
Add debugQuery=true to your query and look
quite well, without
highlighting the actual indexed term, which can be quite ugly.
-- Jack Krupansky
-Original Message-
From: Elmer van Chastelet
Sent: Wednesday, November 21, 2012 8:49 AM
To: java-user@lucene.apache.org
Subject: Re: Which stemmer?
I've just created a small web
into a
single string, line delimiters and all. Be careful about encoding though.
-- Jack Krupansky
-Original Message-
From: Mansour Al Akeel
Sent: Tuesday, November 20, 2012 1:19 PM
To: java-user
Subject: Line feed on windows
Hello all,
We are indexing and storing files contents in lucene
Unfortunately, there doesn't appear to be any Javadoc that discusses what
factors are used to score spans. For example, how to relate the number of
times a span matches in a document vs. the exactness of each span match.
-- Jack Krupansky
-Original Message-
From: 杨光
Sent: Monday
Sounds like you want path to be a unique key field. So, just do a Lucene
search with a TermQuery for the path, which should return one document. No
need to mess with Lucene internal doc ids.
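A minimal sketch of that lookup, assuming Lucene 4.0 APIs and a unique key field literally named "path" (adjust names to your schema):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;

class PathLookup {
    // Return the single document whose unique "path" field matches, or null.
    static Document findByPath(Directory dir, String path) throws Exception {
        DirectoryReader reader = DirectoryReader.open(dir);
        try {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(new TermQuery(new Term("path", path)), 1);
            return hits.totalHits == 0 ? null : searcher.doc(hits.scoreDocs[0].doc);
        } finally {
            reader.close();
        }
    }
}
```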
-- Jack Krupansky
-Original Message-
From: wgggfiy
Sent: Saturday, November 17, 2012 8:08 AM
, and
this is independent of what gets returned for a stored field. The stem is
simply the means to THAT end.
The fact that dog and dogs are not equivalent in KStem is in fact
disheartening, at least to me, but it may not be problematic in some use
cases.
-- Jack Krupansky
-Original Message
gave a handful of examples
that illustrated how some common words are stemmed.
-- Jack Krupansky
-Original Message-
From: Scott Smith
Sent: Wednesday, November 14, 2012 10:55 AM
To: java-user@lucene.apache.org
Subject: Which stemmer?
Does anyone have any experience with the stemmers
/lucene/analysis/en/EnglishMinimalStemFilterFactory.html
http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/en/EnglishMinimalStemmer.html
-- Jack Krupansky
-Original Message-
From: Scott Smith
Sent: Wednesday, November 14, 2012 5:17 PM
To: java-user
(or
author_sorted or author_string) field that you copy the name to:
<copyField source="author" dest="author_s"/>
Query on author, but sort on author_s.
-- Jack Krupansky
-Original Message-
From: Erick Erickson
Sent: Monday, November 12, 2012 5:28 AM
To: java-user
Subject: Re: content
.
Tell us the full problem and then we can focus on legitimate solutions.
-- Jack Krupansky
-Original Message-
From: Erick Erickson
Sent: Sunday, November 04, 2012 8:06 AM
To: java-user
Subject: Re: using CharFilter to inject a space
Ahh, I don't know of a better way. I can imagine complex
:
http://wiki.apache.org/solr/ExtractingRequestHandler
Whether the Azure distribution is full Solr including Solr Cell or not, I
cannot answer.
Note: For future reference, Solr questions should be asked on the
solr-user mailing list.
-- Jack Krupansky
-Original Message-
From: Aloke
How about DirectoryReader.html#openIfChanged?
See:
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/DirectoryReader.html#openIfChanged(org.apache.lucene.index.DirectoryReader)
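The refresh idiom looks roughly like this (Lucene 4.0; openIfChanged returns null when nothing has changed, so the old reader stays valid):

```java
import org.apache.lucene.index.DirectoryReader;

class ReaderRefresher {
    // Swap in a fresh reader only if the index has changed since "current"
    // was opened; otherwise keep using the existing reader.
    static DirectoryReader refresh(DirectoryReader current) throws Exception {
        DirectoryReader newer = DirectoryReader.openIfChanged(current);
        if (newer == null) {
            return current;   // no changes; existing reader is still current
        }
        current.close();      // release the stale reader
        return newer;
    }
}
```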
-- Jack Krupansky
-Original Message-
From: Scott Smith
Sent: Friday, October 26, 2012 7:54
to score them.
-- Jack Krupansky
-Original Message-
From: Paul Taylor
Sent: Thursday, October 25, 2012 7:11 AM
To: java-user@lucene.apache.org
Subject: Is there anything in Lucene 4.0 that provides 'absolute' scoring so
that i can compare the scoring results of different searches
Lucene document id.
-- Jack Krupansky
-Original Message-
From: Ravikumar Govindarajan
Sent: Thursday, October 25, 2012 6:10 AM
To: java-user@lucene.apache.org
Subject: App supplied docID in lucene possible?
We have the need to re-index some fields in our application frequently.
Our
With edismax in Solr 3.6/4.0 field aliases are supported:
The syntax for aliasing is f.myalias.qf=realfield. A user query for
myalias:foo will be queried as realfield:foo.
See:
http://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2BAC8_renaming
-- Jack Krupansky
-Original Message
OR allergies IS NULL would be OR (*:* -allergies:[* TO *]) in
Lucene/Solr.
-- Jack Krupansky
-Original Message-
From: Vitaly Funstein
Sent: Thursday, October 25, 2012 8:25 PM
To: java-user@lucene.apache.org
Subject: Re: query for documents WITHOUT a field?
Sorry for resurrecting
://issues.apache.org/jira/browse/LUCENE-4386
I think it is:
new ConstantScoreQuery(new FieldValueFilter(fieldname, false))
Use a SHOULD of that rather than a second level of BooleanQuery. Let us know
if it actually works!
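Spelled out, that might look like the sketch below. The field name is illustrative, and note that FieldValueFilter's second constructor argument is the negate flag, so which boolean you pass depends on whether you want documents with or without a value in the field:

```java
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.FieldValueFilter;
import org.apache.lucene.search.Query;

class NoFieldClause {
    // Add one SHOULD clause keyed on presence/absence of a field, rather
    // than nesting a second level of BooleanQuery.
    static BooleanQuery build(String fieldName, boolean negate) {
        BooleanQuery bq = new BooleanQuery();
        Query clause = new ConstantScoreQuery(new FieldValueFilter(fieldName, negate));
        bq.add(clause, BooleanClause.Occur.SHOULD);
        return bq;
    }
}
```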
-- Jack Krupansky
-Original Message-
From: Vitaly Funstein
Annex #29. That is a standard.
See:
http://lucene.apache.org/core/4_0_0-ALPHA/analyzers-common/org/apache/lucene/analysis/standard/StandardTokenizer.html
http://lucene.apache.org/core/4_0_0-BETA/analyzers-common/org/apache/lucene/analysis/standard/ClassicTokenizer.html
-- Jack Krupansky
I didn't explicitly say it, but ClassicAnalyzer does do exactly what you
want it to do - work break plus email and URL, or StandardAnalyzer plus
email and URL.
-- Jack Krupansky
-Original Message-
From: kiwi clive
Sent: Wednesday, October 24, 2012 1:27 PM
To: java-user
s/work break/word break/
-- Jack Krupansky
-Original Message-
From: Jack Krupansky
Sent: Wednesday, October 24, 2012 3:52 PM
To: java-user@lucene.apache.org ; kiwi clive
Subject: Re: StandardAnalyzer functionality change
I didn't explicitly say it, but ClassicAnalyzer does do exactly
source for all non-stored
fields, or at least non-stored fields which are not copied from another
field which is stored.
-- Jack Krupansky
-Original Message-
From: Shaya Potter
Sent: Monday, October 22, 2012 3:47 PM
To: java-user@lucene.apache.org
Subject: Re: understanding the need
The scape merely assures that the slash will not be parsed as query syntax
and will be passed directly to the analyzer, but the standard analyzer will
in fact always remove it. Maybe you want the white space analyzer or keyword
analyzer (no characters removed.)
-- Jack Krupansky
That's The escape merely...
-- Jack Krupansky
-Original Message-
From: Jack Krupansky
Sent: Monday, October 01, 2012 9:58 AM
To: java-user@lucene.apache.org
Subject: Re: Searching for a search string containing a literal slash
doesn't work with QueryParser
The scape merely assures
You can apply the lower case filter to the whitespace or other analyzer and
use that as the analyzer.
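One way to wire that up in Lucene 4.0 (a sketch; LowerCaseFilter applied on top of WhitespaceTokenizer, so punctuation such as slashes survives inside tokens):

```java
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

class Analyzers {
    // Whitespace tokenization plus lower-casing, and nothing else.
    static Analyzer lowercasingWhitespace() {
        return new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String field, Reader in) {
                Tokenizer tok = new WhitespaceTokenizer(Version.LUCENE_40, in);
                return new TokenStreamComponents(tok,
                        new LowerCaseFilter(Version.LUCENE_40, tok));
            }
        };
    }
}
```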
-- Jack Krupansky
-Original Message-
From: Jochen Hebbrecht
Sent: Monday, October 01, 2012 10:34 AM
To: java-user@lucene.apache.org
Subject: Re: Searching for a search string
Sorry, I meant apply the filter to the TOKENIZER that the analyzer uses.
-- Jack Krupansky
-Original Message-
From: Jack Krupansky
Sent: Monday, October 01, 2012 10:44 AM
To: java-user@lucene.apache.org
Subject: Re: Searching for a search string containing a literal slash
doesn't
it to override the createComponents method
that creates the StopFilter, so you would essentially have to copy the
source for SnowballAnalyzer and then add in the code to invoke
StopFilter.setEnablePositionIncrements the way StopFilterFactory does.
-- Jack Krupansky
-Original Message-
From
Could you suggest the code for a mock field cache? I mean, what would an
anonymous instance look like.
-- Jack Krupansky
-Original Message-
From: karsten-s...@gmx.de
Sent: Tuesday, September 18, 2012 9:07 AM
To: java-user@lucene.apache.org
Subject: Re: how to disable the field cache
should escape all special characters, and then add the fuzzy
query. Note: In 4.0 the fuzzy query is limited to an editing distance of 2.
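A sketch of that combination, assuming Lucene 4.0 and single-term user input (field name and analyzer are illustrative):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

class FuzzyEscape {
    // Escape the raw user text first, then append the fuzzy operator.
    // In 4.0, fuzzy matching is capped at an edit distance of 2.
    static Query build(String field, String userText) throws Exception {
        QueryParser qp = new QueryParser(Version.LUCENE_40, field,
                new StandardAnalyzer(Version.LUCENE_40));
        return qp.parse(QueryParser.escape(userText) + "~2");
    }
}
```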
-- Jack Krupansky
-Original Message-
From: Ilya Zavorin
Sent: Monday, September 17, 2012 10:41 AM
To: java-user@lucene.apache.org
Subject: how to fully
Either is fine. In fact just escape based on the individual character, not
the context. The multi-character context is telling you places where escape
is not essential, but that doesn't mean it would hurt.
-- Jack Krupansky
-Original Message-
From: Ilya Zavorin
Sent: Monday
My personal estimate is that it will likely be within a week or two, but
there is no official date.
-- Jack Krupansky
-Original Message-
From: sausarkar
Sent: Friday, September 14, 2012 1:05 PM
To: java-user@lucene.apache.org
Subject: Re: Lucene 4.0 GA time frame
Now that the BETA
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
</fieldType>
That mapping-ISOLatin1Accent.txt file maps or folds all the accented
characters into the base ASCII letter.
-- Jack Krupansky
-Original Message-
From: Robert Streitberger
Sent: Wednesday, September 12, 2012 8:45
MappingCharFilter can do all of that. The file I referenced already has ae,
oe, and ss. That default file handles your umlauts differently, but you can
change the rules to suit your exact needs.
-- Jack Krupansky
-Original Message-
From: Robert Streitberger
Sent: Wednesday
Follow the instruction here:
http://lucene.apache.org/core/discussion.html
-- Jack Krupansky
-Original Message-
From: Noopur Julka
Sent: Monday, September 10, 2012 12:43 PM
To: java-user@lucene.apache.org
Cc: Dhananjeyan Balaretnaraja
Subject: Re: How to create a Lucene in-memory
See:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/201208.mbox/%3c42b9aa72526143469066683a1a2fe86a95a...@clg-exch1.clg.Local%3E
Just plug in whatever analyzer you use for indexing.
-- Jack Krupansky
From: Damian Birchler
Sent: Monday, August 27, 2012 10:30 AM
To: java-user
file format (LUCENE-3082).
Bottom line: Write a test and see for yourself.
-- Jack Krupansky
-Original Message-
From: Sitowitz, Paul
Sent: Wednesday, August 22, 2012 1:31 PM
To: java-user@lucene.apache.org
Cc: sitow...@gmail.com
Subject: Lucene Index backward compatibility related question
I can't speak for any non-Latin languages, but how about simply using the
StandardAnalyzer plus the EdgeNGramFilter for indexing (but not query.) The
latter would allow a query of run to match running.
-- Jack Krupansky
-Original Message-
From: Ilya Zavorin
Sent: Friday, August 24
BooleanQuery at the Lucene Query level.)
Read more detail at:
http://searchhub.org/dev/2011/12/28/why-not-and-or-and-not/
-- Jack Krupansky
-Original Message-
From: heikki
Sent: Thursday, August 23, 2012 8:09 AM
To: java-user@lucene.apache.org
Subject: Question about BooleanQuery
causing the problem since your
nested query expression was not fully specified - it needed
booleanField_1:true.
But, of course, I am guessing what you really want.
-- Jack Krupansky
-Original Message-
From: heikki
Sent: Thursday, August 23, 2012 8:38 AM
To: java-user@lucene.apache.org
And (NOT booleanField_2 = true ) is really just booleanField_2 = false,
right? Unless you are looking for fields that are not populated with any
value in addition to an explicit false value.
-- Jack Krupansky
-Original Message-
From: heikki
Sent: Thursday, August 23, 2012 9:13 AM
that distance. That gives you a basic BooleanQuery with AND clauses
converted to spans.
-- Jack Krupansky
-Original Message-
From: Dave Seltzer
Sent: Tuesday, August 21, 2012 6:53 PM
To: java-user@lucene.apache.org
Subject: Creating Span Queries from Boolean Queries
Hi Everyone
You could index the values in both a text and a separate string field.
Then you can query the text field by keyword as well as the string field by
the full literal value, or as a wildcard or prefix query (e.g.,
Microsoft*), or as a range query with the full literal string values.
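A sketch of that dual indexing under Lucene 4.0 (the field names here are made up; use whatever fits your schema):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

class DualField {
    // Index the same value twice: tokenized for keyword search, and as one
    // untokenized string for exact, wildcard/prefix, and range queries.
    static Document build(String value) {
        Document doc = new Document();
        doc.add(new TextField("company", value, Field.Store.YES));
        doc.add(new StringField("company_exact", value, Field.Store.NO));
        return doc;
    }
}
```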
-- Jack
Although not directly related to UI, the syntax for span queries supported
by the LucidWorks Search platform may give you some ideas for how spans can
be composed into larger queries:
http://lucidworks.lucidimagination.com/display/lweug/Proximity+Operations
-- Jack Krupansky
-Original
to make such a large request work, but it may not be worth
the effort.
-- Jack Krupansky
-Original Message-
From: Ralf Heyde
Sent: Tuesday, August 14, 2012 7:45 AM
To: java-user@lucene.apache.org
Subject: Solr adding Documents / Commit in different Threads
Hello,
we currently facing
Please send such inquiries to the Solr user email list, not the Lucene user
list.
-- Jack Krupansky
-Original Message-
From: Ralf Heyde
Sent: Tuesday, August 14, 2012 7:45 AM
To: java-user@lucene.apache.org
Subject: Solr adding Documents / Commit in different Threads
Hello,
we
Add qp.setAutoGeneratePhraseQueries(true) before calling qp.parse.
Otherwise, the query (clause of the larger BooleanQuery) will be the same as
"cla" OR "war", which will match all "war" documents, plus any "cla"
documents you may have.
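In context, the setting is used like this (a sketch; the field name and analyzer are assumptions):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

class PhraseAuto {
    static Query parse(String input) throws Exception {
        QueryParser qp = new QueryParser(Version.LUCENE_40, "body",
                new StandardAnalyzer(Version.LUCENE_40));
        // A single query term that analyzes into several tokens now becomes
        // a phrase query instead of an OR of the individual tokens.
        qp.setAutoGeneratePhraseQueries(true);
        return qp.parse(input);
    }
}
```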
-- Jack Krupansky
-Original Message-
From
= new SpanMultiTermQueryWrapper<WildcardQuery>(wq);
// will only match jumps over extremely very lazy broxn dog
SpanFirstQuery sfq = new SpanFirstQuery(swq, 3);
assertEquals(1, searcher.search(sfq, 10).totalHits);
}
Or, is the issue simply a peculiarity of getSpans?
-- Jack Krupansky
-Original
got no hits with autoGeneratePhraseQueries - which
suggests that maybe the index didn't use the same analyzer or maybe the
literal text in the title is not exactly what you think it is.
You could use the WhitespaceAnalyzer, but that would leave leading and
trailing punctuation.
-- Jack
really do need a wiki page for Lucene term analysis.
-- Jack Krupansky
-Original Message-
From: Bill Chesky
Sent: Friday, August 03, 2012 9:19 AM
To: simon.willna...@gmail.com ; java-user@lucene.apache.org
Subject: RE: Analyzer on query question
Thanks Simon,
Unfortunately, I'm using Lucene
such as stemming)
becomes unnecessary and risky if you are not very careful or very lucky.
-- Jack Krupansky
-Original Message-
From: Ian Lea
Sent: Friday, August 03, 2012 1:12 PM
To: java-user@lucene.apache.org
Subject: Re: Analyzer on query question
Bill
You're getting the snowball
);
BytesRef bytes = termAtt.getBytesRef();
return new Term(BytesRef.deepCopyOf(bytes));
} else
return null;
// TODO: Close the StringReader
// TODO: Handle terms that analyze into multiple terms (e.g., embedded
punctuation)
}
-- Jack Krupansky
-Original Message-
From: Bill Chesky
Sent
hope). In theory, this will guarantee the fidelity of the query and
improve performance (the toString/parse round-trip is not cheap/free.)
As I said, the toString/reparse may indeed work for your specific use-case,
but isn't quite ideal for general use.
-- Jack Krupansky
-Original Message
.
The bottom line is that Lucene/Solr 4.0 GA in the fall (before December) is
a reasonable expectation, based on the code's current trajectory.
-- Jack Krupansky
-Original Message-
From: Erick Erickson
Sent: Tuesday, July 31, 2012 8:46 AM
To: java-user@lucene.apache.org
Subject: Re
was created and scan for files near that same date.
They may also have a cron job that does incremental updates - look for the
crontab.
-- Jack Krupansky
-Original Message-
From: Rodrigo P. Bregalanti
Sent: Saturday, July 28, 2012 10:09 AM
To: java-user@lucene.apache.org
Subject
The filter should work (remove the letter and apostrophe).
Could you supply an exact code fragment that shows the literal term, the
code invoking the filter, and the exact literal output?
And, which release of Lucene?
-- Jack Krupansky
-Original Message-
From: yamo93
Sent
, such as to assure that the match occurs at
the beginning, near the beginning, or end of a document.
See:
http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/search/spans/SpanPositionRangeQuery.html
-- Jack Krupansky
-Original Message-
From: Levin, Ilya
Sent: Sunday, July 22
The query parser/analyzer is lower-casing the query terms automatically. You
have to do the same with the terms for BooleanQuery - Term("cs-method",
"GET") should be Term("cs-method", "get").
StandardAnalyzer is doing the lower-casing.
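In code, that lower-casing might look like this sketch (the field and value come from the question above):

```java
import java.util.Locale;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

class LowercasedTerm {
    // Lower-case the literal before building the Term, to match what
    // StandardAnalyzer wrote into the index.
    static BooleanQuery build(String method) {
        BooleanQuery bq = new BooleanQuery();
        bq.add(new TermQuery(new Term("cs-method", method.toLowerCase(Locale.ROOT))),
               BooleanClause.Occur.MUST);
        return bq;
    }
}
```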
-- Jack Krupansky
-Original Message-
From: Deepak
Yes, I failed to notice that the removal of the slash was yet another
instance of the analyzer transforming its input. But the bottom line is that
you must do 100% of the same steps that analysis performs. If in doubt, pass
your literals through the standard analyzer itself.
-- Jack Krupansky
situations, but maybe it would work well for your case.
-- Jack Krupansky
-Original Message-
From: Ian Lea
Sent: Wednesday, July 04, 2012 4:00 AM
To: java-user@lucene.apache.org
Subject: Re: Starts with Query - Return like search
In fact there is an FAQ entry Can I combine wildcard
Oops... that's EdgeNGramTokenFilter in Lucene.
-- Jack Krupansky
-Original Message-
From: Jack Krupansky
Sent: Wednesday, July 04, 2012 4:52 PM
To: java-user@lucene.apache.org
Subject: Re: Starts with Query - Return like search
Here's a Solr field type that supports edge n-grams
You didn't show us your luceneQuery, but the gist of the solution is to
use MUST clauses for each of the individual terms and then a SHOULD of the
phrase. You can add an additional boost to the phrase, but lucene should
naturally boost documents containing the phrase.
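A sketch of that structure (field name and boost value are illustrative; Lucene 3.x/4.0 APIs assumed):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.TermQuery;

class MustPlusPhrase {
    // Require every individual term, then reward documents that also
    // contain the exact phrase via a boosted SHOULD clause.
    static BooleanQuery build(String field, String... terms) {
        BooleanQuery bq = new BooleanQuery();
        PhraseQuery phrase = new PhraseQuery();
        for (String t : terms) {
            bq.add(new TermQuery(new Term(field, t)), BooleanClause.Occur.MUST);
            phrase.add(new Term(field, t));
        }
        phrase.setBoost(2.0f);  // optional extra weight on the phrase match
        bq.add(phrase, BooleanClause.Occur.SHOULD);
        return bq;
    }
}
```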
-- Jack Krupansky
through the field analyzer for the desired
field type.
-- Jack Krupansky
-Original Message-
From: lis...@alphamatrix.org
Sent: Monday, June 25, 2012 4:12 PM
To: java-user@lucene.apache.org
Subject: Re: how to remove the dash
More information...
If I change
System.out.println(Query
Oops... I was mistaken to suggest that a simple term query would invoke
the field analyzer - it passes the literal text without invoking any field
analyzer.
-- Jack Krupansky
-Original Message-
From: Jack Krupansky
Sent: Monday, June 25, 2012 10:14 PM
To: java-user@lucene.apache.org
and
highlighting on the limited body field.
-- Jack Krupansky
-Original Message-
From: Paul Hill
Sent: Friday, June 22, 2012 2:23 PM
To: java-user@lucene.apache.org
Subject: Fast way to get the start of document
Our Hit highlighting (Using the older Highlighter) is wired with a too
huge limit, so we
FunctionQuery, ValueSource, and TermFreqValueSource.
See:
http://lucene.apache.org/solr/api/org/apache/solr/search/function/FunctionQuery.html
-- Jack Krupansky
-Original Message-
From: Mike Sokolov
Sent: Saturday, June 16, 2012 2:33 PM
To: java-user@lucene.apache.org
Subject: filter by term
Look at this code: QueryTermExtractor.getTerms(Query query)
http://lucene.apache.org/core/3_6_0/api/contrib-highlighter/org/apache/lucene/search/highlight/QueryTermExtractor.html
-- Jack Krupansky
-Original Message-
From: Ilya Zavorin
Sent: Thursday, June 14, 2012 2:36 PM
To: java
Try putting the phone number in quotes in the query:
String qstr = "\"800-555-1212\"";
And check query.toString to see how the query parser analyzed the term, both
with and without quotes.
with and without quotes.
And make sure you initialized the query parser with contents as the
default field.
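Put together, a small sketch of that check (Lucene 4.0 and StandardAnalyzer assumed):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

class PhoneQuery {
    public static void main(String[] args) throws Exception {
        // "contents" as the default field, quotes around the number.
        QueryParser qp = new QueryParser(Version.LUCENE_40, "contents",
                new StandardAnalyzer(Version.LUCENE_40));
        Query q = qp.parse("\"800-555-1212\"");
        // Shows how the analyzer split the number (most likely into a phrase).
        System.out.println(q.toString("contents"));
    }
}
```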
-- Jack Krupansky
size is
reached.
Then the code compares the number of segments eligible for merge to that
limit. If over that limit, the code then scores each merge and selects the
merge with the best score.
-- Jack Krupansky
-Original Message-
From: thomas
Sent: Tuesday, June 12, 2012 4:43 AM
.
That said, maybe you could clarify your specific intent with an example.
Maybe you simply want to internally call some existing stemmer filter and
output both the original and stemmed term at the same location?
-- Jack Krupansky
-Original Message-
From: Paul Hill
Sent: Tuesday, June 12, 2012
I forgot about the Solr/Lucene code shuffling. Back in 3.4, WDF was in Solr
rather than Lucene. Here's the code:
http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_3_4/solr/core/src/java/org/apache/solr/analysis/WordDelimiterFilter.java?revision=1166268view=markup
-- Jack Krupansky
Is the process taking longer or even a lot longer than with the earlier
release of Lucene?
Is the amount of available JVM memory so low that repeated garbage
collections might be occurring?
-- Jack Krupansky
-Original Message-
From: Paul Hill
Sent: Monday, June 11, 2012 12:56 PM
uses the default set of stopwords
(which includes “or” and “in”)?
Try passing null as the second argument to ClassicAnalyzer – it disables the
default stop word list.
-- Jack Krupansky
From: Bob Rhodes
Sent: Thursday, June 07, 2012 1:50 PM
To: java-user@lucene.apache.org
Subject: easy one
, so copying data
to Java heap space is not useful.
-- Jack Krupansky
-Original Message-
From: Cheng
Sent: Monday, June 04, 2012 10:08 AM
To: java-user@lucene.apache.org
Subject: RAMDirectory unexpectedly slows
Hi,
My apps need to read from and write to some big indexes frequently. So
consuming memory.
-- Jack Krupansky
-Original Message-
From: nishesh.gu...@emc.com
Sent: Friday, June 01, 2012 7:53 PM
To: java-user@lucene.apache.org
Subject: OOM during IndexReader open
Hi,
I am getting the following OOM consistently whenever the index is opened.
Is it because now
have individual fuzzy terms within a
span - which can be more complex than a simple phrase:
https://issues.apache.org/jira/browse/LUCENE-2754
-- Jack Krupansky
-Original Message-
From: harish...@thomsonreuters.com
Sent: Friday, June 01, 2012 2:50 AM
To: java-user@lucene.apache.org
retrieve the text to be re-indexed - if and only if it was indexed in stored
fields.
-- Jack Krupansky
-Original Message-
From: 黄靖宇
Sent: Friday, January 21, 2011 4:04 AM
To: java-user@lucene.apache.org
Subject: How to rebuild index
Hi,
I am new to lucene. Recently I was assigned
do
exactly what you asked.
-- Jack Krupansky
-Original Message-
From: Camden Daily
Sent: Friday, January 21, 2011 10:15 AM
To: java-user@lucene.apache.org
Subject: Performing a query on token length
Hello all,
Does anyone know if it is possible in Lucene to do a query based
Oops... I only solved half the problem, the other half was to limit length
to 20, which would be done with a negated leading wildcard of 21 question
marks:
first_name:??* -first_name:?*
-- Jack Krupansky
-Original Message-
From: Jack Krupansky
Sent
that query
term will fail to match anything because the existence of the wildcard
suppresses all analysis, including the lowercasing. Once again, different
query parsers may behave differently.
-- Jack Krupansky
-Original Message-
From: Amin Mohammed-Coleman
Sent: Thursday, January