You failed to disclose up front that you are using such an old release of
Lucene. Lucene is now on 6.0. I'll defer to others if they wish to provide
support for such an old release.
-- Jack Krupansky
On Mon, Apr 18, 2016 at 8:01 AM, PK C <tech.kumar...@gmail.com> wrote:
> Hi,
>
The standard analyzer/tokenizer should do a decent job of splitting on dot,
hyphen, and underscore, in addition to whitespace and other punctuation.
Can you post some specific test cases you are concerned with? (You should
always run some test cases.)
-- Jack Krupansky
On Tue, Apr 12, 2016
There is no simple, direct way to do this "Boolean Reverse Query" in
Lucene, but I suggest filing a Jira to request this as a feature
improvement/new feature.
-- Jack Krupansky
On Fri, Mar 25, 2016 at 11:43 AM, Ahmet Arslan <iori...@yahoo.com.invalid>
wrote:
> Hi Otmar,
>
Are you calling the IndexSearcher#explain method to get the details of the
score calculation?
How exactly are your results not what you expect?
What Similarity are you using? Scores will be the product of the underlying
calculated scores and your term boost values.
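As a sketch of what that looks like (in-memory index, Lucene 5.x-era API; the field name `body` and the indexed text are purely illustrative), `IndexSearcher#explain` returns the full tree of factors behind a hit's score:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.RAMDirectory;

public class ExplainHits {
    // Index one tiny document, run a TermQuery, and return the score
    // explanation for the top hit.
    static Explanation demo() throws Exception {
        RAMDirectory dir = new RAMDirectory();
        try (IndexWriter w = new IndexWriter(dir,
                new IndexWriterConfig(new StandardAnalyzer()))) {
            Document d = new Document();
            d.add(new TextField("body", "lucene scoring example", Store.NO));
            w.addDocument(d);
        }
        try (DirectoryReader r = DirectoryReader.open(dir)) {
            IndexSearcher s = new IndexSearcher(r);
            Query q = new TermQuery(new Term("body", "lucene"));
            ScoreDoc top = s.search(q, 1).scoreDocs[0];
            // The Explanation tree shows every factor (tf, idf, boost, norm)
            // that the Similarity combined to produce this document's score.
            return s.explain(q, top.doc);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```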
-- Jack Krupansky
On Thu, Mar
BooleanQuery can be nested, so you do a top-level BQ that has two clauses,
the first a TQ for a:x and the second another BQ that itself has two
clauses, both SHOULD.
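For example, a:x AND (b:y OR b:z) might be built like this (a sketch using the BooleanQuery.Builder API from Lucene 5.3+; on earlier releases you would call add() on a BooleanQuery directly):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class NestedBooleanQuery {
    public static BooleanQuery build() {
        // Inner BQ: two SHOULD clauses, i.e. b:y OR b:z.
        BooleanQuery.Builder inner = new BooleanQuery.Builder();
        inner.add(new TermQuery(new Term("b", "y")), Occur.SHOULD);
        inner.add(new TermQuery(new Term("b", "z")), Occur.SHOULD);

        // Top-level BQ: the TermQuery for a:x plus the nested BQ, both MUST.
        BooleanQuery.Builder top = new BooleanQuery.Builder();
        top.add(new TermQuery(new Term("a", "x")), Occur.MUST);
        top.add(inner.build(), Occur.MUST);
        return top.build();
    }

    public static void main(String[] args) {
        System.out.println(build()); // +a:x +(b:y b:z)
    }
}
```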
-- Jack Krupansky
On Tue, Mar 8, 2016 at 4:38 AM, sandeep das <yarnhad...@gmail.com> wrote:
> Hi,
>
> I'm usi
LE binding.
-- Jack Krupansky
source line. And then there is the issue of code sequences that span source
lines.
-- Jack Krupansky
On Mon, Feb 15, 2016 at 8:30 AM, Kudrettin Güleryüz <kudret...@gmail.com>
wrote:
> Since documents are source code, I am considering matching on operators
> too.
>
> Using whitespa
ate string (not
tokenized text) field and then you can do a complex regex that spans terms
(and only do that if normal span queries don't do what you need.)
What does your typical cross-term regex actually look like?
-- Jack Krupansky
On Sat, Feb 13, 2016 at 1:25 PM, Uwe Schindler <u...@theta
no code that would tell the
analyzer that "tag" is a defined field.
Also, I see no value to having the single-clause BooleanQuery wrapped
around the actual query.
-- Jack Krupansky
On Wed, Jan 27, 2016 at 12:52 PM, G.Long <jde...@gmail.com> wrote:
> Hi :)
>
> I would like to
Be sure to check and see if your app is compute or I/O bound during this
process - whether too little of your index is cached in system memory and
each query requires I/O, lots of it.
-- Jack Krupansky
On Thu, Jan 21, 2016 at 1:52 PM, Doug Turnbull <
dturnb...@opensourceconnections.com>
It looks like you attempted to quote the URL in your query using
apostrophes (sometimes referred to as single quotes), but you need to use
quotes (sometimes referred to as double quotes).
Change:
id:'http://www.yahoo.com'
to:
id:"http://www.yahoo.com"
-- Jack Krupansky
On Sun, Dec 27, 2015
, but is deprecated and has been relegated to the sand box,
so it is not really usable going forward:
http://lucene.apache.org/core/5_4_0/sandbox/index.html?org/apache/lucene/sandbox/queries/SlowFuzzyQuery.html
-- Jack Krupansky
On Tue, Dec 22, 2015 at 4:02 AM, Yonghui Zhao <zhaoyong...@gmail.
The standard answer is that you need to reindex all of your data.
-- Jack Krupansky
On Thu, Dec 17, 2015 at 6:10 AM, Kumaran Ramasubramanian <kums@gmail.com
> wrote:
> Dear All
>
> i am using lucene 4.10.4. Is there any more information i missed to
> provide?
Delete the full index and create from scratch with the correct field type,
re-adding all documents. Any remnants of the old field must be removed.
-- Jack Krupansky
On Thu, Dec 17, 2015 at 11:48 AM, Kumaran R <kums@gmail.com> wrote:
> I am facing this problem only while reindexing.
You could certainly read your stored values from your current index and
then write new documents to a new index and then use the new index. That's
if all of the indexed field values are stored.
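A rough sketch of that copy loop (Lucene 5.x-era API; the `body` field name is illustrative, and a production version should also skip deleted documents via the live-docs bits):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class ReindexFromStored {
    // Copy stored values from the old index into a new index, rebuilding each
    // field with the *correct* field type. (The old Document's Field instances
    // carry the old, wrong type, so build fresh Fields instead of re-adding.)
    static void reindex(Directory oldDir, Directory newDir) throws Exception {
        try (DirectoryReader reader = DirectoryReader.open(oldDir);
             IndexWriter writer = new IndexWriter(newDir,
                     new IndexWriterConfig(new StandardAnalyzer()))) {
            for (int i = 0; i < reader.maxDoc(); i++) {
                // Note: a real run should skip deleted docs (liveDocs).
                Document old = reader.document(i); // stored fields only
                Document fresh = new Document();
                fresh.add(new TextField("body", old.get("body"), Store.YES));
                writer.addDocument(fresh);
            }
        }
    }

    // Tiny end-to-end demo on in-memory directories.
    static int demo() throws Exception {
        RAMDirectory oldDir = new RAMDirectory();
        try (IndexWriter w = new IndexWriter(oldDir,
                new IndexWriterConfig(new StandardAnalyzer()))) {
            Document d = new Document();
            d.add(new TextField("body", "hello stored world", Store.YES));
            w.addDocument(d);
        }
        RAMDirectory newDir = new RAMDirectory();
        reindex(oldDir, newDir);
        try (DirectoryReader r = DirectoryReader.open(newDir)) {
            return r.numDocs();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```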
-- Jack Krupansky
On Thu, Dec 17, 2015 at 2:10 PM, Kumaran Ramasubramanian <kums@gmail.com
>
/DictionaryCompoundWordTokenFilterFactory.html
The doc is weak. I do have some examples in my old Solr 4.x Deep Dive
e-book:
http://www.lulu.com/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html
You might also be able to achieve a similar effect with synonyms, but again
only
/similarities/TFIDFSimilarity.html
https://lucene.apache.org/core/5_3_0/core/org/apache/lucene/search/similarities/BM25Similarity.html
-- Jack Krupansky
On Sun, Dec 13, 2015 at 8:30 AM, Shay Hummel <shay.hum...@gmail.com> wrote:
> Hi
>
> I need help to implement similarity between query mod
You didn't post your code that creates the index. Make sure you are using a
tokenized TextField rather than a single-token StringField.
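The difference is easy to see with a tiny in-memory index (a Lucene 5.x-era sketch; field names and text are illustrative):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.RAMDirectory;

public class TextVsStringField {
    // Returns {hits on the tokenized field, hits on the single-token field}.
    static long[] counts() throws Exception {
        RAMDirectory dir = new RAMDirectory();
        try (IndexWriter w = new IndexWriter(dir,
                new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            doc.add(new TextField("text", "hello world", Store.NO));     // analyzed
            doc.add(new StringField("string", "hello world", Store.NO)); // one verbatim token
            w.addDocument(doc);
        }
        try (DirectoryReader r = DirectoryReader.open(dir)) {
            IndexSearcher s = new IndexSearcher(r);
            // "hello" is a term in the TextField, but the StringField's only
            // term is the whole string "hello world", so it cannot match.
            long onText = s.search(new TermQuery(new Term("text", "hello")), 1).totalHits;
            long onString = s.search(new TermQuery(new Term("string", "hello")), 1).totalHits;
            return new long[] { onText, onString };
        }
    }

    public static void main(String[] args) throws Exception {
        long[] c = counts();
        System.out.println("TextField hits: " + c[0] + ", StringField hits: " + c[1]);
    }
}
```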
-- Jack Krupansky
On Fri, Nov 27, 2015 at 4:06 PM, Kunzman, Douglas * <
douglas.kunz...@fda.hhs.gov> wrote:
> Hi -
>
> This is my first Luc
for a
significant sample of realistic data and then you can empirically deduce
what the big-O function is for your particular application data and data
model.
-- Jack Krupansky
On Fri, Nov 20, 2015 at 4:38 AM, Adrien Grand <jpou...@gmail.com> wrote:
> I don't think the big-O notation is ap
, so if you need to keep that entire string as one term,
use the whitespace tokenizer. That said, treating hyphen as a word break is
usually not a problem as long as you enable auto phrase generation for the
query parser.
-- Jack Krupansky
On Mon, Oct 5, 2015 at 4:06 AM, Bhaskar <bhask
Technically, there is no such thing as a "sentence search" in Lucene.
Please provide an example of how you wish to search, and then we can
determine whether a phrase query or a span query might accomplish the task.
-- Jack Krupansky
On Thu, Oct 1, 2015 at 11:53 AM, Bhaskar <bhaskar1
Phrase query for a tokenized text field should do it.
-- Jack Krupansky
On Thu, Oct 1, 2015 at 10:04 PM, Bhaskar <bhaskar1...@gmail.com> wrote:
> Hi Jack,
>
> my searching is working like this.
>
> if i give input as "SD RAM Bhaskar" then which ever strings are
was really how to get case-sensitive query, simply create
your own analyzer without the lower case filter.
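A minimal case-sensitive analyzer might look like this (a sketch using the Lucene 5.x `createComponents` signature; older releases also take a Reader argument). It is just the StandardAnalyzer chain minus LowerCaseFilter (and StopFilter):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class CaseSensitiveAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // Standard tokenization, but no LowerCaseFilter, so "Java" and
        // "java" remain distinct terms.
        return new TokenStreamComponents(new StandardTokenizer());
    }

    // Helper: run the analyzer over some text and collect the tokens.
    static List<String> tokens(String text) throws IOException {
        List<String> out = new ArrayList<>();
        try (Analyzer a = new CaseSensitiveAnalyzer();
             TokenStream ts = a.tokenStream("f", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) out.add(term.toString());
            ts.end();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokens("Java and java")); // [Java, and, java]
    }
}
```

Remember the same analyzer must be used at both index and query time for the case-sensitive terms to line up.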
-- Jack Krupansky
On Fri, Aug 14, 2015 at 10:07 AM, Erick Erickson erickerick...@gmail.com
wrote:
Add LowercaseFilterFactory to your analysis chain for the fieldType
both at query and index time
ConstantScoreQuery is the proper approach. What specific failure did you
encounter?
-- Jack Krupansky
On Wed, Jul 29, 2015 at 7:09 AM, 丁儒 bfore...@126.com wrote:
Hi, all
Currently I'm using Lucene, but I don't care about the score and weight; I
just need the documents that meet the query. I tried
/QueryParserBase.html#setAutoGeneratePhraseQueries(boolean)
-- Jack Krupansky
On Fri, Jul 17, 2015 at 4:41 AM, Diego Socaceti socac...@gmail.com wrote:
Hi all,
i'm new to lucene and tried to write my own analyzer to support
hyphenated words like wi-fi, jean-pierre, etc.
For our customer it is important
://lucene.apache.org/core/5_2_0/analyzers-common/org/apache/lucene/analysis/core/KeywordAnalyzer.html
You can also simply escape the spaces with a backslash rather than quote
the entire term, but you still need to use the keyword analyzer.
-- Jack Krupansky
On Fri, Jun 19, 2015 at 2:31 AM, Gimantha
the sentence boundaries are? Be specific, because that determines
what your queries should look like, which determines what the indexed text
should look like, which determines how the text should be analyzed.
-- Jack Krupansky
On Wed, Apr 15, 2015 at 8:12 AM, Shay Hummel shay.hum...@gmail.com wrote:
Hi
/org/apache/lucene/search/IndexSearcher.html#explain(org.apache.lucene.search.Query,
int)
-- Jack Krupansky
On Fri, Apr 10, 2015 at 4:15 PM, Gregory Dearing gregdear...@gmail.com
wrote:
Hi Ali,
The short answer to your question is... there's no good way to create a
score from your result
/browse/ACCUMULO-3698
The SQRRL commercial product has (or at least had before the company
shifted its corporate strategy) Lucene indexing of Accumulo data, but
that's a proprietary product:
http://sqrrl.com/product/search/
-- Jack Krupansky
On Thu, Apr 9, 2015 at 6:33 AM, madhvi madhvi.gu
is always a great contribution.
-- Jack Krupansky
On Thu, Mar 26, 2015 at 8:15 PM, Erick Erickson erickerick...@gmail.com
wrote:
You really have to just pick a problem, dive into the code and learn
it bit by bit through exploration. The code base changes fast enough
that anything published
, everything runs
great on commodity hardware! Kool-Aid. IOW, running a 32GB index on a 16
GB box is probably not a great idea if you need low latency.
-- Jack Krupansky
On Tue, Mar 24, 2015 at 8:37 AM, Gaurav gupta gupta.gaurav0...@gmail.com
wrote:
Erick,
When further testing the index sizes using Lucene
This is the first mention that I have seen for that corpus on this list.
There seem to be more than a few references when I google for brown
corpus lucene, such as:
https://github.com/INL/BlackLab/wiki/Blacklab-query-tool
-- Jack Krupansky
On Tue, Feb 24, 2015 at 1:40 AM, Koji Sekiguchi
You could store the length of the field (in terms) in a second field and
then add a MUST term to the BooleanQuery which is a RangeQuery with an
upper bound that is the maximum length that can match.
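A sketch of that combination (Lucene 4.x-era API, matching this thread's timeframe; the field names and the `body:foo` term are illustrative):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.IntField;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.TermQuery;

public class LengthBoundedQuery {
    // At index time: record the field's length in terms in a companion field.
    static void addLength(Document doc, int lengthInTerms) {
        doc.add(new IntField("body_len", lengthInTerms, Store.NO));
    }

    // At query time: require body:foo AND body_len <= maxLen, so only
    // documents whose field is short enough can match.
    static BooleanQuery build(int maxLen) {
        BooleanQuery bq = new BooleanQuery();
        bq.add(new TermQuery(new Term("body", "foo")), Occur.MUST);
        bq.add(NumericRangeQuery.newIntRange("body_len", 0, maxLen, true, true),
               Occur.MUST);
        return bq;
    }

    public static void main(String[] args) {
        System.out.println(build(5)); // +body:foo +body_len:[0 TO 5]
    }
}
```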
-- Jack Krupansky
On Wed, Feb 18, 2015 at 4:54 AM, Ian Lea ian@gmail.com wrote:
You mean
documents have different capitalization of
Java/java.
-- Jack Krupansky
On Fri, Jan 23, 2015 at 9:54 AM, Rajendra Rao rajendra@launchship.com
wrote:
Hello
Replying to the mail sent by Nitin: we tried and this is what we got:
My query was dotNet^10.0 Resume:jdbc Resume:C# Resume:MVC
Documents
.
-- Jack Krupansky
On Thu, Jan 15, 2015 at 11:23 AM, danield danield...@gmail.com wrote:
Hi Mike,
Thank you for your reply. Yes, I had thought of this, but it is not a
solution to my problem, and this is because the Term Frequency and
therefore
the results will still be wrong, as prepending
/lucene/facet/FacetsCollector.java?revision=1634013view=markup
Any other particular features of Lucene 5 that you are particularly
interested in?
-- Jack Krupansky
On Sat, Jan 10, 2015 at 3:01 PM, Elad Margalit eladm...@gmail.com wrote:
Hi,
I would like to ask regarding Lucene 5,
Do you
Oops... I take that back! After I clicked Send I realized that this is the
Lucene list - what I said is true for Solr queries, but that is because
Solr added a hack to do things properly, but the Lucene query parser
doesn't have that hack, so Erick is correct.
-- Jack Krupansky
On Wed, Jan 7
that the above strategy would be reasonable, or do you need to process
large numbers of large documents.
-- Jack Krupansky
-Original Message-
From: ryanb
Sent: Tuesday, November 25, 2014 7:39 PM
To: java-user@lucene.apache.org
Subject: OutOfMemoryError indexing large documents
Hello,
We
Oops... you sent this to the wrong list - this is the Lucene user list, send
it to the Solr user list.
-- Jack Krupansky
-Original Message-
From: Peter Keegan
Sent: Thursday, November 6, 2014 3:21 PM
To: java-user
Subject: Exceptions during batch indexing
How are folks handling Solr
Pure negative queries are not supported, but all you need to do is include
*:*, which translates into MatchAllDocsQuery.
hello dolly is the same as hello dolly~0
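The *:* workaround for a pure negative query can be sketched like this (Lucene 4.x-era API; the `status:deleted` term is illustrative):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.TermQuery;

public class PureNegativeQuery {
    // A MUST_NOT clause alone matches nothing; pair it with
    // MatchAllDocsQuery (what *:* parses to) as a MUST clause.
    static BooleanQuery allExcept(String field, String term) {
        BooleanQuery bq = new BooleanQuery();
        bq.add(new MatchAllDocsQuery(), Occur.MUST);             // *:*
        bq.add(new TermQuery(new Term(field, term)), Occur.MUST_NOT);
        return bq;
    }

    public static void main(String[] args) {
        System.out.println(allExcept("status", "deleted")); // +*:* -status:deleted
    }
}
```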
-- Jack Krupansky
-Original Message-
From: Prad Nelluru
Sent: Monday, October 27, 2014 8:57 PM
To: java-user
What is the value of the qf parameter? You don't have an explicit field
name such as title in your query string, q.
-- Jack Krupansky
-Original Message-
From: Aleksander Sadecki
Sent: Thursday, October 16, 2014 11:46 AM
To: java-user@lucene.apache.org
Subject: How to properly use
Oops... for future reference, please post Solr questions to the *Solr* user
list, not the *Lucene* (java) user list!
-- Jack Krupansky
-Original Message-
From: Jack Krupansky
Sent: Saturday, October 18, 2014 7:50 AM
To: java-user@lucene.apache.org
Subject: Re: How to properly use
-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html
The free Solr Reference Guide has a short section on the Solr Term Vector
component. You could check it out before buying my $10 e-book.
See:
https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component
-- Jack
Yeah, I can be a moderator, for both Lucene and Solr.
-- Jack Krupansky
-Original Message-
From: Chris Hostetter
Sent: Tuesday, September 30, 2014 12:51 PM
To: java-user@lucene.apache.org
Cc: java-user-ow...@lucene.apache.org
Subject: NOTICE: Seeking Moderators for java-user@lucene
Yes, most special characters are treated as term delimiters, except that
underscores, dots, and commas have some special rules.
See the details under Standard Tokenizer in my Solr e-book:
http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product
since
ES has some special things they do, so a raw Lucene index will likely not
be compatible with ES, and to simply reindex your source data directly
into ES to take full advantage of ES.
-- Jack Krupansky
-Original Message-
From: Aditya
Sent: Friday, September 26, 2014 3:55 AM
on a refine results
button to re-do the search with the more expensive cross-corpus df-based
scoring.
Thoughts?
-- Jack Krupansky
-Original Message-
From: Baldwin, David
Sent: Friday, September 5, 2014 8:05 PM
To: java-user@lucene.apache.org
Subject: How to properly correlate relevance
/lucene/queryparser/complexPhrase/TestComplexPhraseQuery.java?revision=1622067view=markup
-- Jack Krupansky
-Original Message-
From: Erick Erickson
Sent: Wednesday, September 3, 2014 7:14 PM
To: java-user
Subject: Re: Question regarding complex queries and long tail suggestions
Take a look
Use the ngram token filter, and then a query of 512 would match by itself:
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenFilter.html
-- Jack Krupansky
-Original Message-
From: Erick Erickson
Sent: Thursday, August 28, 2014 11:52 PM
-common/org/apache/lucene/analysis/miscellaneous/WordDelimiterFilter.html
-- Jack Krupansky
-Original Message-
From: Michael Sokolov
Sent: Wednesday, August 27, 2014 10:26 AM
To: java-user@lucene.apache.org
Subject: Re: Why does this search fail?
Tokenization is tricky. You might
://support.google.com/websearch/answer/136861?hl=en
It also seems to support ** in a quoted phrase to mean one or more
arbitrary terms. This isn't documented, but seems to work.
-- Jack Krupansky
-Original Message-
From: Milind
Sent: Wednesday, August 27, 2014 10:51 AM
To: java-user
are defined as multi-term, so they will be
performed, but the standard tokenizer is not being called, so the dot
remains and this whole term is treated as one term, unlike the index
analysis.
-- Jack Krupansky
-Original Message-
From: Milind
Sent: Tuesday, August 26, 2014 12:24 PM
on it:
https://issues.apache.org/jira/browse/LUCENE-5785
-- Jack Krupansky
-Original Message-
From: Sheng
Sent: Thursday, August 14, 2014 11:38 PM
To: java-user@lucene.apache.org
Subject: WhiteSpaceTokenizer
The length of token has to be shorter than 255, otherwise there will
be unpredictable
Sure, that should be a configurable option.
Oh, and I neglected to mention a workaround: use the pattern tokenizer,
which doesn't have a limit (yet.) But it might be slower.
-- Jack Krupansky
-Original Message-
From: Sheng
Sent: Friday, August 15, 2014 8:13 AM
To: java-user
The standard analyzer will discard most special characters as punctuation.
What analyzer are you using?
-- Jack Krupansky
-Original Message-
From: Scott Selvia
Sent: Thursday, August 14, 2014 7:42 PM
To: java-user@lucene.apache.org
Subject: Searching with String that Represents
And unfiltered. So even if you use the keyword tokenizer that only generates
a single token, you still want token filtering, such as lower case.
-- Jack Krupansky
-Original Message-
From: Christoph Kaser
Sent: Tuesday, August 12, 2014 3:07 AM
To: java-user@lucene.apache.org
Subject
The default changed to false in Lucene 3.1. Before that it was true.
-- Jack Krupansky
-Original Message-
From: Chris Salem
Sent: Tuesday, August 12, 2014 8:34 AM
To: java-user@lucene.apache.org
Subject: RE: escaping characters
Thanks! That worked.
We recently upgraded from 2.9
#setAutoGeneratePhraseQueries(boolean)
-- Jack Krupansky
-Original Message-
From: Chris Salem
Sent: Monday, August 11, 2014 1:03 PM
To: java-user@lucene.apache.org
Subject: RE: escaping characters
I'm not using Solr. Here's my code:
FSDirectory fsd = FSDirectory.open(new File(C
need to manually filter your query terms. Sounds like
maybe a term got stemmed.
-- Jack Krupansky
-Original Message-
From: Bianca Pereira
Sent: Thursday, August 7, 2014 7:28 AM
To: java-user@lucene.apache.org
Subject: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency
Hi
Also, usually query-time analysis is done by a query parser, so if you
aren't going through a query parser, you have to call the analyzer yourself.
The stemming is very likely the culprit here.
-- Jack Krupansky
-Original Message-
From: Uwe Schindler
Sent: Thursday, August 7, 2014
The standard tokenizer will strip off those escaped quotes at query time.
Ditto for the hyphen at index time.
Try constructing your own analyzer using the white space tokenizer instead
of the standard tokenizer.
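Such an analyzer might look like this (a sketch using the Lucene 5.x `createComponents` signature; the input text is illustrative). Whitespace tokenization preserves hyphens and quote characters inside a token verbatim:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class WhitespaceLowercaseAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // Split on whitespace only; hyphens, quotes, etc. stay in the token.
        WhitespaceTokenizer src = new WhitespaceTokenizer();
        return new TokenStreamComponents(src, new LowerCaseFilter(src));
    }

    // Helper: run the analyzer over some text and collect the tokens.
    static List<String> tokens(String text) throws IOException {
        List<String> out = new ArrayList<>();
        try (Analyzer a = new WhitespaceLowercaseAnalyzer();
             TokenStream ts = a.tokenStream("f", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) out.add(term.toString());
            ts.end();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokens("Wi-Fi rocks")); // [wi-fi, rocks]
    }
}
```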
-- Jack Krupansky
-Original Message-
From: itisismail
Sent: Friday
of your stop words,
or possibly a pattern that matches stop words plus a short suffix that might
get stemmed.
-- Jack Krupansky
-Original Message-
From: Arjen van der Meijden
Sent: Sunday, July 6, 2014 2:47 PM
To: java-user@lucene.apache.org
Subject: How to handle words that stem to stop
I'll defer to the hard-core Lucene committers for the technical details,
but I would suggest that a very large term with dozens of wildcards is a
known limitation (albeit not well-documented.) IOW, to use wildcards in
Lucene in a performant manner, they need to be brief.
-- Jack Krupansky
introduces
a regex query term. It is added by the escape method you call, but the
escaping will be gone by the time your analyzer is called.
So, just try a simple, unescaped slash in your char mapping table.
-- Jack Krupansky
-Original Message-
From: Luis Pureza
Sent: Tuesday, June 17
.
-- Jack Krupansky
-Original Message-
From: Jamie
Sent: Monday, June 9, 2014 6:56 AM
To: java-user@lucene.apache.org
Subject: Re: searching with stemming
To me, it seems strange that these default analyzers don't provide
constructors that enable one to override stemming, etc.?
On 2014/06
Please do file a Jira. I'm sure the discussion will be interesting.
-- Jack Krupansky
-Original Message-
From: Jamie
Sent: Monday, June 9, 2014 9:33 AM
To: java-user@lucene.apache.org
Subject: Re: searching with stemming
Jack
Thanks. I figured as much.
I'm modifying each analyzer
to
be indexed.
-- Jack Krupansky
-Original Message-
From: Johan Tibell
Sent: Tuesday, June 3, 2014 9:32 PM
To: java-user@lucene.apache.org
Subject: How to approach indexing source code?
Hi,
I'd like to index (Haskell) source code. I've run the source code through a
compiler (GHC) to get rich
a 256GB machine?
How frequent are your commits for updates while doing queries?
-- Jack Krupansky
-Original Message-
From: Jamie
Sent: Monday, June 2, 2014 2:51 AM
To: java-user@lucene.apache.org
Subject: search performance
Greetings
Despite following all the recommended optimizations
(Was this supposed to be a java-user/Lucene question or a Solr question?!)
-- Jack Krupansky
-Original Message-
From: Erick Erickson
Sent: Wednesday, May 21, 2014 10:58 AM
To: java-user
Subject: Re: Multi-thread indexing, should the commit be called from each
thread?
I'll be more
for a batch
update model as opposed to a true real-time database (it's a search engine,
not a database!), but... the original goals and requirements might give us
some insight.
Thanks.
-- Jack Krupansky
-Original Message-
From: Michael McCandless
Sent: Monday, May 19, 2014 6:10 AM
To: Lucene
Does your index fit fully in system memory - the OS file cache? If not,
there could be a lot of thrashing (I/O) as Lucene accesses the index.
-- Jack Krupansky
-Original Message-
From: Liviu Matei
Sent: Monday, May 19, 2014 4:21 PM
To: java-user@lucene.apache.org
Subject: Performance
The explain section of the debug response when you set the debugQuery=true
parameter will give you the final terms that were matched for each document.
-- Jack Krupansky
-Original Message-
From: venkatesham.gu...@igate.com
Sent: Saturday, May 17, 2014 2:28 AM
To: java-user
Oops... I just noticed that you sent this request to the java-user list,
which is primarily for developers using the Lucene library directly. Try
sending it to the solr-user list, which is for users and developers working
with Solr.
-- Jack Krupansky
-Original Message-
From
by having a tokenizer that simply
ignored punctuation and whitespace and generated one big original token and
then n-grammed it based on some maximal query phrase size. And... the
original requirement spec didn't list that as a use case anyway.
-- Jack Krupansky
-Original Message-
From
.
In truth, Lucene/Solr doesn't have a good out of the box solution for this
use case.
-- Jack Krupansky
-Original Message-
From: teko
Sent: Thursday, May 8, 2014 9:03 AM
To: java-user@lucene.apache.org
Subject: How to locate a Phrase inside text (like a Browser text searcher)
Hi
. Average users just get annoyed when the search engine is being
so picky.
-- Jack Krupansky
-Original Message-
From: Jose Carlos Canova
Sent: Wednesday, April 16, 2014 12:53 PM
To: java-user@lucene.apache.org
Subject: Re: is there a historical reason why default conjunction operator
it.
-- Jack Krupansky
-Original Message-
From: Adrien Grand
Sent: Friday, April 4, 2014 4:50 PM
To: java-user@lucene.apache.org
Subject: Re: Stored fields and OS file caching
Hi Vitaly,
Doc values are indeed well-suited for grouping and sorting. However
stored fields remain better at returning
/houses?/
-- Jack Krupansky
-Original Message-
From: Uwe Schindler
Sent: Tuesday, March 25, 2014 11:34 AM
To: java-user@lucene.apache.org
Subject: RE: Lucene Wildcard for zero or one character
The default WildcardQuery only supports:
'*' (star) is the wildcard in WildcardQuery
now, but otherwise, that's the limit for now.
-- Jack Krupansky
-Original Message-
From: Artem Gayardo-Matrosov
Sent: Friday, March 21, 2014 12:41 PM
To: java-user@lucene.apache.org
Subject: Re: maxDoc/numDocs int fields
Hi Oli,
Thanks for your reply,
I thought about this, but it feels
Of course - you need to use the same analyzer for both indexing and query.
So, just reindex your data with this new analyzer.
-- Jack Krupansky
-Original Message-
From: Natalia Connolly
Sent: Tuesday, March 18, 2014 10:37 AM
To: java-user@lucene.apache.org
Subject: Re: How to search
of
that info is hanging around as part of the query matching process.
Still, that is a reasonable feature to want and it has been requested
before. Worth a Jira.
-- Jack Krupansky
-Original Message-
From: Christian Reuschling
Sent: Thursday, March 6, 2014 1:34 PM
To: java-user
picking a PU Unicode
character?
-- Jack Krupansky
-Original Message-
From: G.Long
Sent: Monday, March 3, 2014 12:09 PM
To: java-user@lucene.apache.org
Subject: encoding problem when retrieving document field value
Hi :)
My index (Lucene 3.5) contains a field called title. Its value
Please elaborate on what you expect will be in this payload. Is it
information derived from the indexing process itself or is it external
information to be added to the indexed terms?
-- Jack Krupansky
-Original Message-
From: Mrugendra
Sent: Sunday, March 2, 2014 5:15 AM
To: java
Be careful with very short terms and fuzzy query. The rounding when
converting from a fraction to an edit distance can make the match exact
rather than fuzzy.
What terms does your index have? XV, Xv, xV, xv? XV~0.7 may only match XV.
-- Jack Krupansky
-Original Message-
From: G.Long
Sounds like a custom filter.
Or maybe an option for stop filter or a specialization of stop filter.
Or maybe it could be even more generalized.
What are some practical example token sequences?
-- Jack Krupansky
-Original Message-
From: Furkan KAMACI
Sent: Wednesday, February 26
If this is primarily an issue with the document input, as opposed to
queries, you might be better off simply preprocessing the text before it is
given to Lucene to be indexed.
-- Jack Krupansky
-Original Message-
From: Furkan KAMACI
Sent: Wednesday, February 26, 2014 1:37 PM
in the native
file system for greater performance. Solrandra stored the Lucene indexes in
Cassandra, but the performance penalty was too high.
-- Jack Krupansky
-Original Message-
From: Jason Wee
Sent: Friday, February 14, 2014 3:13 AM
To: java-user@lucene.apache.org
Subject: codec mismatch
be as simple as whether the data file should have DOS or UNIX
or Mac line endings (CRLF vs. NL vs. CR.) Be sure to use an editor that
satisfies the requirements of ICU.
To be clear, Lucene itself does not have a published API for modifying the
mappings of ICU.
-- Jack Krupansky
-Original Message
Take a look at the complex phrase query parser.
See:
http://lucene.apache.org/core/4_6_0/queryparser/org/apache/lucene/queryparser/complexPhrase/ComplexPhraseQueryParser.html
See also:
https://issues.apache.org/jira/browse/LUCENE-1486
-- Jack Krupansky
-Original Message-
From
In theory, the query with holes (position increments) for stop words should
work... unless you originally indexed the data without the stop word filter.
Any time you change the filters, you typically need to reindex the data.
-- Jack Krupansky
-Original Message-
From: Jean-Claude
The analyzer is generating holes for the stop words - the position of the
subsequent term is incremented an extra time for each stop word so that
their positions are maintained.
-- Jack Krupansky
-Original Message-
From: Jean-Claude Dauphin
Sent: Monday, December 09, 2013 4:15 PM
at the start
or end.
-- Jack Krupansky
-Original Message-
From: Stephane Nicoll
Sent: Thursday, November 21, 2013 9:42 AM
To: java-user@lucene.apache.org
Subject: tokenizer to strip a set of characters
Hi,
I am using lucene 3.6 and I am looking to a tokenized that would remove
certain
As I indicated in my previous message, we need actual queries and the actual
indexed data where matches are failing.
Note that *NALYZE will not match ANALYZER. So, it might be that you have
composed queries in which some of the terms match properly and only some
fail.
-- Jack Krupansky
, and what does the indexed data look
like?
The simple answer to your question is that wildcards don't behave any
differently between the two analyzers - simply because they are not used at
all for the wildcard terms.
-- Jack Krupansky
-Original Message-
From: raghavendra.k@barclays.com
protWords)
See:
http://lucene.apache.org/core/4_5_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/WordDelimiterFilter.html
-- Jack Krupansky
-Original Message-
From: Stéphane Nicoll
Sent: Tuesday, November 05, 2013 2:40 AM
To: java-user@lucene.apache.org
Subject: Twitter analyser
the use of curly braces for exclusive end points.
-- Jack Krupansky
-Original Message-
From: Umashanker, Srividhya
Sent: Tuesday, October 29, 2013 3:57 AM
To: java-user@lucene.apache.org
Subject: DateQuery with comparison operators
HI -
I am using Lucene 4.5 and want to support date
using at query time?
-- Jack Krupansky
-Original Message-
From: saisantoshi
Sent: Sunday, October 20, 2013 12:47 PM
To: java-user@lucene.apache.org
Subject: Handling special characters in Lucene 4.0
I have created strings like the below
searchtext
+sampletext
and when I try to search
characters with a
backslash, and then leave the asterisk unescaped to perform a wildcard
query.
-- Jack Krupansky
-Original Message-
From: saisantoshi
Sent: Sunday, October 20, 2013 6:02 PM
To: java-user@lucene.apache.org
Subject: Re: Handling special characters in Lucene 4.0
characters that you don't want to keep (e.g., period, comma,
semicolon, parentheses, etc.)
The query parser itself knows nothing about what your chosen analyzer does.
But the query parser does specially interpret the special characters that
the escape method mentions.
-- Jack Krupansky
-Original
.
-- Jack Krupansky
-Original Message-
From: saisantoshi
Sent: Sunday, October 20, 2013 7:43 PM
To: java-user@lucene.apache.org
Subject: Re: Handling special characters in Lucene 4.0
what about other characters like ','( quote) characters. We have a
requirement that a text can start