Hi Otmar,
Shouldn't Occur.SHOULD alone do what you ask? Documents that match all
terms in the query would be scored higher than documents that match only
some of them.
-sujit
On Fri, Mar 25, 2016 at 2:20 AM, Otmar Caduff wrote:
> Hi all
> In Lucene, I know of the
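A toy illustration of the point above (not Lucene's actual scoring formula; it just counts matched SHOULD terms, the way the coord factor rewards fuller matches):

```java
import java.util.*;

// Toy illustration: with Occur.SHOULD clauses, a document matching more
// query terms accumulates a higher score, so full matches outrank partial ones.
public class ShouldScoring {
    // Count how many distinct query terms appear among the document's tokens.
    public static int score(Set<String> queryTerms, List<String> docTokens) {
        int matched = 0;
        for (String t : queryTerms) {
            if (docTokens.contains(t)) matched++;
        }
        return matched;
    }

    public static void main(String[] args) {
        Set<String> q = new HashSet<>(Arrays.asList("lucene", "search", "index"));
        int full = score(q, Arrays.asList("lucene", "search", "index", "engine"));
        int partial = score(q, Arrays.asList("lucene", "engine"));
        System.out.println(full + " > " + partial); // 3 > 1
    }
}
```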
Hi Ali,
I agree with the others that there is no good way to do what you are
looking for if you want to assign lucene-like scores to your external
results, but if you have some objective measure of goodness that doesn't
depend on your lucene scores, you can apply it to both result sets and
merge
I did something like this sometime back. The objective was to find patterns
surrounding some keywords of interest so I could find keywords similar to
the ones I was looking for, sort of like a poor man's word2vec. It uses
SpanQuery as Jigar said, and you can find the code here (I believe it was
Hi John,
Take a look at the PerFieldAnalyzerWrapper. As the name suggests, it allows
you to create different analyzers per field.
-sujit
On Fri, Sep 19, 2014 at 6:50 AM, John Cecere john.cec...@oracle.com wrote:
I've considered this, but there are two problems with it. First of all, it
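The PerFieldAnalyzerWrapper idea can be sketched in plain Java as a field-to-analyzer map with a default fallback (a toy sketch, not the Lucene API):

```java
import java.util.*;
import java.util.function.Function;

// Toy sketch of the PerFieldAnalyzerWrapper idea: a default analyzer plus
// per-field overrides, looked up by field name at tokenization time.
public class PerFieldAnalyzers {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> overrides = new HashMap<>();

    public PerFieldAnalyzers(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    public void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        overrides.put(field, analyzer);
    }

    public List<String> tokenize(String field, String text) {
        return overrides.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    public static void main(String[] args) {
        // default: whitespace split + lowercase; "id" field: keep value as one token
        PerFieldAnalyzers pfa = new PerFieldAnalyzers(
            s -> Arrays.asList(s.toLowerCase().split("\\s+")));
        pfa.addAnalyzer("id", s -> Collections.singletonList(s));
        System.out.println(pfa.tokenize("body", "Hello World")); // [hello, world]
        System.out.println(pfa.tokenize("id", "ABC 123"));       // [ABC 123]
    }
}
```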
Hi Shouvik, not sure if you have already considered this, but you could put
the database primary key for the record into the index - ie, reverse your
insert to do DB first, get the record_id and then add this to the Lucene
index as record_id field. During retrieval you can minimize the network
Hi Arjen,
This is kind of a spin on your last observation that your list of stop
words doesn't change frequently: you could write a custom filter that
attempts to stem the incoming token and, only if it stems to the same form
as a stopword, sets the keyword attribute on the original token.
That way
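A sketch of that check (the suffix-stripping stemmer here is a toy stand-in for a real stemmer, and the stopword list is hypothetical):

```java
import java.util.*;

// Sketch of the idea above: if stemming a token would turn it into a
// stopword, mark the original token as a keyword so a downstream stemmer
// passes it through unchanged.
public class StemToStopwordGuard {
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("be", "the", "is"));

    // Toy stemmer standing in for a real one: just strips a trailing "ing".
    static String stem(String token) {
        return token.endsWith("ing") ? token.substring(0, token.length() - 3) : token;
    }

    // True when the token should carry the keyword attribute.
    static boolean shouldMarkKeyword(String token) {
        return STOPWORDS.contains(stem(token));
    }

    public static void main(String[] args) {
        System.out.println(shouldMarkKeyword("being"));   // true: stems to stopword "be"
        System.out.println(shouldMarkKeyword("running")); // false: "runn" is no stopword
    }
}
```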
Hi Arjen,
You could also mark a token as keyword so the stemmer passes it through
unchanged. For example, per the Javadocs for PorterStemFilter:
http://lucene.apache.org/core/4_6_0/analyzers-common/org/apache/lucene/analysis/en/PorterStemFilter.html
Note: This filter is aware of the
Hi Rafaela,
I built something along these lines as a proof of concept. All data in the
index was unstored and only fields which were searchable (tokenized and
indexed) were kept in the index. The full record was encrypted and stored in a
MongoDB database. A custom Solr component did the search
Hi Michael,
Instead of putting the annotation in Payloads, why not put them in as
synonyms, ie at the same spot as the original string (see SynonymFilter in
the LIA book). So your string would look like (to the index):
W. A. Mozart was born in Salzburg
(with "artist" indexed at the same position as "Mozart", and "city" at the
same position as "Salzburg")
so you can
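The same-position trick can be sketched with position increments (a toy model of what SynonymFilter does; the Token class here is not Lucene's):

```java
import java.util.*;

// Sketch of position-0 synonym injection: an annotation token is emitted
// with positionIncrement 0 so it occupies the same position as the original.
public class AnnotationAsSynonym {
    static class Token {
        final String term; final int posIncr;
        Token(String term, int posIncr) { this.term = term; this.posIncr = posIncr; }
        public String toString() { return term + "/" + posIncr; }
    }

    // annotations: original term -> annotation to stack at the same position
    static List<Token> annotate(String[] terms, Map<String, String> annotations) {
        List<Token> out = new ArrayList<>();
        for (String t : terms) {
            out.add(new Token(t, 1));                    // original advances position
            String ann = annotations.get(t);
            if (ann != null) out.add(new Token(ann, 0)); // annotation stays on it
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> ann = new HashMap<>();
        ann.put("Mozart", "artist");
        ann.put("Salzburg", "city");
        System.out.println(annotate(
            new String[] {"Mozart", "was", "born", "in", "Salzburg"}, ann));
        // [Mozart/1, artist/0, was/1, born/1, in/1, Salzburg/1, city/0]
    }
}
```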
SUJIT PAL wrote:
Hi Carsten,
Why not use your idea of the BooleanQuery but wrap it in a Filter instead?
Since you are not doing any scoring (only filtering), the max boolean clauses
limit should not apply to a filter.
Hi Sujit,
thanks for your suggestion! I wasn't aware that the max
Hi Uwe,
I see, makes sense, thanks very much for the info. Sorry about giving you
the wrong info, Carsten.
-sujit
On Apr 15, 2013, at 1:06 PM, Uwe Schindler wrote:
Hi,
-----Original Message-----
From: Sujit Pal [mailto:sujitatgt...@gmail.com] On Behalf Of SUJIT PAL
Sent: Monday, April 15
Hi Carsten,
Why not use your idea of the BooleanQuery but wrap it in a Filter instead?
Since you are not doing any scoring (only filtering), the max boolean clauses
limit should not apply to a filter.
-sujit
On Apr 12, 2013, at 7:34 AM, Carsten Schnober wrote:
Dear list,
I would like to
Hi Jerome,
How about this one?
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ISOLatin1AccentFilterFactory
Regards,
Sujit
On Mar 22, 2013, at 9:22 AM, Jerome Blouin wrote:
Hello,
I'm looking for an analyzer that allows performing accent insensitive search
in latin
Hi Glen,
I don't believe you can attach a single payload to multiple tokens. What I did
for a similar requirement was to combine the tokens into a single
underscore-delimited token and attach the payload to it. For example:
The Big Bad Wolf huffed and puffed and blew the house of the Three
.
Is this the best way? I can't see a way to compute the Sim() metric at
any other time, because in scorePayload(), we don't have access to the
full payload, nor to the query.
Thanks again,
Steve
On Mon, Nov 28, 2011 at 1:51 PM, Sujit Pal sujit@comcast.net wrote:
Hi Stephen,
We
Hi Stephen,
We are doing something similar, and we store as a multifield with each
document as (d,z) pairs where we store the z's (scores) as payloads for
each d (topic). We have had to build a custom similarity which
implements the scorePayload function. So to find docs for a given d
(topic), we
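A toy model of this payload scheme (plain Java maps standing in for the index and for the custom Similarity's scorePayload):

```java
import java.util.*;

// Toy model of the (topic, score) payload scheme: each doc stores
// topic -> score pairs; querying a topic ranks docs by the stored payload
// score, the role scorePayload plays in the custom Similarity.
public class TopicPayloadSearch {
    // docId -> (topic -> payload score)
    private final Map<String, Map<String, Float>> index = new HashMap<>();

    void add(String docId, String topic, float score) {
        index.computeIfAbsent(docId, k -> new HashMap<>()).put(topic, score);
    }

    // Docs matching the topic, best payload score first.
    List<String> search(String topic) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, Float>> e : index.entrySet())
            if (e.getValue().containsKey(topic)) hits.add(e.getKey());
        hits.sort((a, b) -> Float.compare(index.get(b).get(topic), index.get(a).get(topic)));
        return hits;
    }

    public static void main(String[] args) {
        TopicPayloadSearch s = new TopicPayloadSearch();
        s.add("doc1", "cardiology", 0.9f);
        s.add("doc2", "cardiology", 0.4f);
        s.add("doc3", "oncology", 0.7f);
        System.out.println(s.search("cardiology")); // [doc1, doc2]
    }
}
```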
Hi Grant,
Not sure if this qualifies as a bet you didn't know, but one could use
Lucene term vectors to construct document vectors for similarity,
clustering and classification tasks. I found this out recently (although
I am probably not the first one), and I think this could be quite
useful.
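Term vectors boil down to per-document term-frequency maps, which can then be compared with cosine similarity (plain-Java sketch, not the Lucene term-vector API):

```java
import java.util.*;

// Sketch of using term-frequency vectors for document similarity: build a
// term -> count map per document (the information Lucene term vectors hold)
// and compare documents by cosine similarity.
public class TermVectorSimilarity {
    static Map<String, Integer> termVector(String text) {
        Map<String, Integer> tv = new HashMap<>();
        for (String t : text.toLowerCase().split("\\s+"))
            tv.merge(t, 1, Integer::sum);
        return tv;
    }

    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            na += e.getValue() * e.getValue();
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        for (int v : b.values()) nb += v * v;
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        System.out.println(cosine(termVector("lucene index search"),
                                  termVector("lucene index search"))); // 1.0
        System.out.println(cosine(termVector("lucene index search"),
                                  termVector("cooking pasta recipes"))); // 0.0
    }
}
```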
Hi Paul,
Since you have modified the StandardAnalyzer (I presume you mean
StandardFilter), why not do a check on the term.text() and, if it's all
punctuation, skip the analysis for that term? Something like this in
your StandardFilter:
public final boolean incrementToken() throws IOException {
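The check itself could look something like this in plain Java (a sketch of the condition only, outside the TokenStream API):

```java
// Sketch of the punctuation check suggested above: treat a term as
// skippable when no character in it is a letter or digit.
public class PunctuationCheck {
    static boolean isAllPunctuation(String term) {
        if (term.isEmpty()) return false;
        for (char c : term.toCharArray())
            if (Character.isLetterOrDigit(c)) return false;
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAllPunctuation("!!?"));   // true
        System.out.println(isAllPunctuation("can't")); // false: contains letters
    }
}
```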
Hi Mead,
You may want to check out the permuterm index idea.
http://www-nlp.stanford.edu/IR-book/html/htmledition/permuterm-indexes-1.html
Basically you write a custom filter that takes a term and generates all
word permutations off it. On the query side, you convert your query so
it's always a
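The rotation step can be sketched as follows (the '$' end marker is the convention from the IR-Book page above):

```java
import java.util.*;

// Sketch of permuterm expansion: append an end marker '$' and emit every
// rotation of the term. A wildcard query like he*lo is rotated so the
// wildcard lands at the end (lo$he*), turning it into a prefix query.
public class Permuterm {
    static List<String> rotations(String term) {
        String marked = term + "$";
        List<String> out = new ArrayList<>();
        for (int i = 0; i < marked.length(); i++)
            out.add(marked.substring(i) + marked.substring(0, i));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(rotations("hello"));
        // [hello$, ello$h, llo$he, lo$hel, o$hell, $hello]
    }
}
```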
Hi,
Question about Payload Query and Document Boosts. We are using Lucene
3.2 and Payload queries, with our own PayloadSimilarity class which
overrides the scorePayload method like so:
{code}
@Override
public float scorePayload(int docId, String fieldName,
int start, int end, byte[]
Depending on what you wanted to do with the Javabean (I assume you want
to make some or all its fields searchable since you are writing to
Lucene), you could use reflection to break it up into field name value
pairs and write them out to the IndexWriter using something like this:
Document d = new
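A sketch of the reflection step (the Article bean is hypothetical; each resulting name/value pair would become a Field added to the Lucene Document):

```java
import java.lang.reflect.Method;
import java.util.*;

// Sketch of the reflection approach: walk the bean's getters and collect
// (field name, value) pairs to be written out as Lucene Fields.
public class BeanFields {
    public static Map<String, String> toFields(Object bean) {
        Map<String, String> fields = new TreeMap<>();
        for (Method m : bean.getClass().getMethods()) {
            String name = m.getName();
            if (name.startsWith("get") && name.length() > 3
                    && m.getParameterCount() == 0 && !name.equals("getClass")) {
                try {
                    Object value = m.invoke(bean);
                    // getTitle -> title
                    String field = Character.toLowerCase(name.charAt(3)) + name.substring(4);
                    fields.put(field, String.valueOf(value));
                } catch (ReflectiveOperationException e) {
                    throw new RuntimeException(e);
                }
            }
        }
        return fields;
    }

    // Example bean (hypothetical)
    public static class Article {
        public String getTitle() { return "Lucene in Action"; }
        public String getAuthor() { return "Otis"; }
    }

    public static void main(String[] args) {
        System.out.println(toFields(new Article()));
        // {author=Otis, title=Lucene in Action}
    }
}
```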
+0200, Simon Willnauer wrote:
On Wed, Jun 22, 2011 at 8:53 PM, Sujit Pal s...@healthline.com wrote:
Hello,
I am currently in need of a LowerCaseFilter and StopFilter that will
recognize KeywordAttribute, similar to the way PorterStemFilter
currently does (on trunk). Specifically, in case
Hello,
I am currently in need of a LowerCaseFilter and StopFilter that will
recognize KeywordAttribute, similar to the way PorterStemFilter
currently does (on trunk). Specifically, in case the term is a
KeywordAttribute.isKeyword(), it should not lowercase and remove
respectively.
This can be
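A plain-Java sketch of the requested behavior (toy Token class and stopword list, not Lucene's TokenStream; a real filter would consult KeywordAttribute.isKeyword()):

```java
import java.util.*;

// Sketch of keyword-aware filtering: a token flagged as a keyword is
// neither lowercased nor removed as a stopword; all others are.
public class KeywordAwareFilters {
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("the", "a", "of"));

    static class Token {
        final String term; final boolean keyword;
        Token(String term, boolean keyword) { this.term = term; this.keyword = keyword; }
    }

    static List<String> filter(List<Token> in) {
        List<String> out = new ArrayList<>();
        for (Token t : in) {
            if (t.keyword) { out.add(t.term); continue; }     // pass through untouched
            String lower = t.term.toLowerCase();
            if (!STOPWORDS.contains(lower)) out.add(lower);   // lowercase + stop
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(
            new Token("The", false), new Token("Who", true), new Token("Band", false));
        System.out.println(filter(tokens)); // [Who, band]
    }
}
```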
Hi Leroy,
Would it make sense to index as Lucene documents the unit to be
searched? So if you want paragraphs to be shown in search results, you
could parse the source document during indexing into paragraphs and
index them as separate Lucene documents.
-sujit
On Wed, 2011-05-25 at 15:46 -0400,
Thank you Koji. I opened LUCENE-3141 for this.
https://issues.apache.org/jira/browse/LUCENE-3141
-sujit
On Tue, 2011-05-24 at 22:33 +0900, Koji Sekiguchi wrote:
(11/05/24 3:28), Sujit Pal wrote:
Hello,
My version: Lucene 3.1.0
I've had to customize the snippet for highlighting
Hello,
My version: Lucene 3.1.0
I've had to customize the snippet for highlighting based on our
application requirements. Specifically, instead of the snippet being a
set of relevant fragments in the text, I need it to be the first
sentence where a match occurs, with a fixed size from the
Hi Deepak,
Would something like this work in your case?
"Arcos Bioscience"^2.0 Arcos Bioscience
ie, a BooleanQuery with the full phrase boosted OR'd with a query on
each word?
-sujit
On Tue, 2011-04-26 at 14:46 -0400, Deepak Konidena wrote:
Hi,
Currently when I type in Arcos Bioscience in
I don't know if there is already an analyzer available for this, but you
could use GATE or UIMA for Named Entity Extraction against names and
expand the query to include the extra names that are used synonymously.
You could do this outside Lucene or inline using a custom Lucene
tokenizer that
One way to do this currently is to build a per field similarity wrapper
(that triggers off the field name). I believe there is some work going
on with Lucene Similarity that would make it pluggable for this sort of
stuff, but in the meantime, this is what I did:
public class
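A sketch of the per-field delegation pattern described (interface and method names are hypothetical stand-ins for Lucene's Similarity):

```java
// Sketch of a per-field similarity wrapper: one method is overridden for a
// specific field; everything else is forwarded to a default implementation.
public class PerFieldSimilarity {
    interface Similarity {
        float lengthNorm(String field, int numTerms);
    }

    static class DefaultSim implements Similarity {
        public float lengthNorm(String field, int numTerms) {
            return (float) (1.0 / Math.sqrt(numTerms));
        }
    }

    static class FieldAwareSim implements Similarity {
        private final Similarity delegate = new DefaultSim();
        public float lengthNorm(String field, int numTerms) {
            // disable length normalization for the "title" field only
            if ("title".equals(field)) return 1.0f;
            return delegate.lengthNorm(field, numTerms);
        }
    }

    public static void main(String[] args) {
        Similarity sim = new FieldAwareSim();
        System.out.println(sim.lengthNorm("title", 100)); // 1.0
        System.out.println(sim.lengthNorm("body", 100));  // 0.1
    }
}
```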
the other methods that are calculating the similarity scores... those
methods are called, and they have the implementation you have in the
DefaultSimilarity class, right?
On 1 March 2011 21:12, Sujit Pal sujit@comcast.net wrote:
One way to do this currently is to build a per field