Re: Difference between '-' and 'NOT' in Lucene Query.

2024-05-06 Thread Paul Libbrecht
A weighted OR, of course. On 6 May 2024, at 12:43, Paul Libbrecht wrote: Am I mistaken, or does “ “ (a space) act as an OR when there is no other operator? On 6 May 2024, at 12:41, Saha, Rajib wrote: Hi Experts, As per the definition in https://lucene.apache.org/core/2_9_4/queryparsersyntax.html '-' an

Re: Difference between '-' and 'NOT' in Lucene Query.

2024-05-06 Thread Paul Libbrecht
Am I mistaken, or does “ “ (a space) act as an OR when there is no other operator? On 6 May 2024, at 12:41, Saha, Rajib wrote: Hi Experts, As per the definition in https://lucene.apache.org/core/2_9_4/queryparsersyntax.html '-' and 'NOT' in a query string theoretically serve the same purpose.

Re: Exact KNN

2024-01-30 Thread Paul Libbrecht
Isn’t that what Semantic-Vectors is doing? E.g. https://github.com/Ontotext-AD/semanticvectors Paul On 30 Jan 2024, at 20:50, William Zhou wrote: > Is there a way of directly executing an exact nearest neighbor search? It > seems like the API provides some general functionality, and

Re: Search results/criteria validation

2021-03-17 Thread Paul Libbrecht
queries and in the score’s fineness, it was indicating which sub-query was used. This was used to attempt highlighting the matching parts of a formula. Paul On 17 Mar 2021, at 20:24, Diego Ceccarelli (BLOOMBERG/ LONDON) wrote: Maybe using explain? https://chrisperks.co/2017/06/06/explaining

Re: Document metadata in ranking?

2021-02-25 Thread Paul Libbrecht
that the influence of a positive category takes precedence over the different orderings (TF-IDF by default). In the end you can write a custom scoring engine, but I can only imagine that ruining the performance... paul On 26 Feb 2021, at 3:40, Philip Warner wrote: I am sorry if this has

[ANNOUNCE] Apache Lucene 8.8.0 released

2021-02-01 Thread Noble Paul
you are using may not have replicated the release yet. If that is the case, please try another mirror. This also applies to Maven access. - Noble Paul

Re: Using Lucene for technical documentation

2020-11-23 Thread Paul Libbrecht
parametrisation). But I’d be happy to read of others’ works on this! In the Math working group of W3C at the time, work stopped when considering the complexity of compound documents: the alternatives as above (mix words or recognise math pieces?) certainly made things difficult. paul PS: [paper for

Re: [VOTE] Lucene logo contest, third time's a charm

2020-09-02 Thread Noble Paul
e_logo_green_300.png > >> > >> Please vote for one of the above choices. This vote will close about one > >> week from today, Mon, Sept 7, 2020 at 11:59PM. > >> > >> Thanks! > >> > >> [jira-issue] https://issues.apache.org/jira/browse/LUCENE-9221 > >> [first-vote] > >> http://mail-archives.apache.org/mod_mbox/lucene-dev/202006.mbox/%3cCA+DiXd74Mz4H6o9SmUNLUuHQc6Q1-9mzUR7xfxR03ntGwo=d...@mail.gmail.com%3e > >> [second-vote] > >> http://mail-archives.apache.org/mod_mbox/lucene-dev/202009.mbox/%3cCA+DiXd7eBrQu5+aJQ3jKaUtUTJUqaG2U6o+kUZfNe-m=smn...@mail.gmail.com%3e > >> [rank-choice-voting] https://en.wikipedia.org/wiki/Instant-runoff_voting > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > -- - Noble Paul - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: [VOTE] Lucene logo contest, here we go again

2020-09-01 Thread Noble Paul
tps://issues.apache.org/jira/browse/LUCENE-9221 > [first-vote] > http://mail-archives.apache.org/mod_mbox/lucene-dev/202006.mbox/%3cCA+DiXd74Mz4H6o9SmUNLUuHQc6Q1-9mzUR7xfxR03ntGwo=d...@mail.gmail.com%3e > [rank-choice-voting] https://en.wikipedia.org/wiki/Instant-runoff_voting > > -- - Noble Paul

CVE-2018-11802: Apache Solr authorization bug vulnerability disclosure

2019-04-24 Thread Noble Paul
CVE-2018-11802: Apache Solr authorization bug disclosure Severity: Important Vendor: The Apache Software Foundation Versions Affected: Apache Solr 7.6 or less Description: jira ticket : https://issues.apache.org/jira/browse/SOLR-12514 In apache Solr the cluster can be partitioned into multiple co

Re: Environmental Protection Agency: Stop Deforesting in Sri Lanka

2019-03-21 Thread Noble Paul
re and sign the petition here: > > > > http://chng.it/vY78rzGf8G > > > > Thanks! > > Janaka

Re: Lucene commit

2016-08-22 Thread Paul Masurel
Awesome! Thank you very much! On Mon, Aug 22, 2016 at 3:45 PM, Christoph Kaser wrote: > Hello Paul, > > this is already possible using > DirectoryReader.openIfChanged(indexReader,indexWriter). > This will give you an indexreader that already "sees" all changes ma

Lucene commit

2016-08-21 Thread Paul Masurel
hable after another one even though it was added before. The benefit would be to reduce the average latency for a document to become searchable, without hurting throughput by calling commit() too frequently. Regards, Paul

Re: 5.3.1 artifacts in maven central

2015-09-30 Thread Noble Paul
have this info all on > ReleaseTodo. > > Maybe instead the item should say: "go here and follow the steps: > PublishMavenArtifacts.” That way it’s clear it’s not optional. > > Steve > > > On Sep 30, 2015, at 7:39 AM, Noble Paul wrote: > > > > I have been edi

Re: 5.3.1 artifacts in maven central

2015-09-30 Thread Noble Paul
Everything looks good now, thank you. >> >> --Terry >> >> >> On Tue, Sep 29, 2015 at 1:26 AM, Noble Paul wrote: >> >>> Please check now >>> >>> On Mon, Sep 28, 2015 at 8:42 PM, Noble Paul wrote: >>> > Looks like I missed it

Re: 5.3.1 artifacts in maven central

2015-09-28 Thread Noble Paul
Please check now On Mon, Sep 28, 2015 at 8:42 PM, Noble Paul wrote: > Looks like I missed it , I shall upload it soon > > > On Mon, Sep 28, 2015 at 7:59 PM, Terry Smith wrote: >> Guys, >> >> I'm unable to find the 5.3.1 artifacts in maven cen

Re: 5.3.1 artifacts in maven central

2015-09-28 Thread Noble Paul
> 5.3.0. > > http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.apache.lucene%22%20AND%20a%3A%22lucene-core%22 > > Am I doing something wrong or are the artifacts not yet published? > > --Terry -- --

[ANNOUNCE] Apache Lucene 5.3.1 released

2015-09-24 Thread Noble Paul
the case, please try another mirror. This also goes for Maven access. Noble Paul on behalf of Lucene PMC

Re: Loading Solr Analyzer from RuntimeLib Blob

2015-09-10 Thread Noble Paul
om code. Is > there a way to specify runtimeLib="true" on the schema or perhaps an > alternate method to make sure that jar is loaded on the classpath before > the schema is loaded? > > Thanks for the help, > > -Steve -

[ANNOUNCE] Apache Lucene 5.3.0 released

2015-08-24 Thread Noble Paul
://lucene.apache.org/core/discussion.html) -- Noble Paul

Re: is this lucene 4.1.0 bug in PerFieldPostingsFormat

2015-03-06 Thread Paul Taylor
On 06/03/2015 15:07, Michael McCandless wrote: On Thu, Mar 5, 2015 at 4:27 PM, Paul Taylor wrote: On 05/03/2015 19:01, Michael McCandless wrote: On Thu, Mar 5, 2015 at 12:12 PM, Paul Taylor wrote: On 05/03/2015 15:53, Paul Taylor wrote: On 05/03/2015 14:43, Michael McCandless wrote: It

Re: is this lucene 4.1.0 bug in PerFieldPostingsFormat

2015-03-06 Thread Paul Taylor
On 06/03/2015 17:34, Michael McCandless wrote: On Fri, Mar 6, 2015 at 11:03 AM, Paul Taylor wrote: Right, did you see my last post the query parser does trap the exception if you enable assertions. I thought this was what you were saying was fixed in a later version, but assume you actually

Re: is this lucene 4.1.0 bug in PerFieldPostingsFormat

2015-03-06 Thread Paul Taylor
On 05/03/2015 21:27, Paul Taylor wrote: FWIW if I do enable assertions then parse does throw an assertion before actually trying to do the search. java.lang.AssertionError at org.apache.lucene.search.MultiTermQuery.<init>(MultiTermQuery.java:252) at org.apache.lucene.search.AutomatonQuery

Re: is this lucene 4.1.0 bug in PerFieldPostingsFormat

2015-03-05 Thread Paul Taylor
On 05/03/2015 19:01, Michael McCandless wrote: On Thu, Mar 5, 2015 at 12:12 PM, Paul Taylor wrote: On 05/03/2015 15:53, Paul Taylor wrote: On 05/03/2015 14:43, Michael McCandless wrote: It looks like field was null? Back in 4.1.0 we just assert field != null, but in newer releases it

Re: is this lucene 4.1.0 bug in PerFieldPostingsFormat

2015-03-05 Thread Paul Taylor
On 05/03/2015 15:53, Paul Taylor wrote: On 05/03/2015 14:43, Michael McCandless wrote: It looks like field was null? Back in 4.1.0 we just assert field != null, but in newer releases it's a real check. Mike McCandless Hi, thank you, I'll try to get the query logged for when it next ha

Re: is this lucene 4.1.0 bug in PerFieldPostingsFormat

2015-03-05 Thread Paul Taylor
On 05/03/2015 14:43, Michael McCandless wrote: It looks like field was null? Back in 4.1.0 we just assert field != null, but in newer releases it's a real check. Mike McCandless Hi, thank you, I'll try to get the query logged for when it next happens
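
A minimal sketch of the distinction Mike describes, using only the JDK. The class and method names are hypothetical stand-ins, not Lucene code: an `assert` is skipped unless the JVM runs with `-ea`, so a null field can slip through and fail much later, whereas an unconditional check rejects it immediately.

```java
import java.util.Objects;

public class NullFieldCheck {

    // Hypothetical stand-in for a constructor guarded only by an assertion.
    // Without -ea on the JVM command line the assert is not evaluated.
    static String withAssert(String field) {
        assert field != null : "field must not be null";
        return "query(" + field + ")";
    }

    // The same constructor guarded by a check that always runs.
    static String withRealCheck(String field) {
        Objects.requireNonNull(field, "field must not be null");
        return "query(" + field + ")";
    }

    public static void main(String[] args) {
        try {
            // Passes silently when assertions are disabled (the JVM default).
            System.out.println(withAssert(null));
        } catch (AssertionError e) {
            System.out.println("assertions enabled: " + e.getMessage());
        }
        try {
            withRealCheck(null);
        } catch (NullPointerException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

This may explain why the problem only surfaced as an `AssertionError` once assertions were switched on, as reported later in the thread.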

is this lucene 4.1.0 bug in PerFieldPostingsFormat

2015-03-05 Thread Paul Taylor
t.doSearch(SearchServerServlet.java:616) at org.musicbrainz.search.servlet.SearchServerServlet.doGet(SearchServerServlet.java:551) at javax.servlet.http.HttpServlet.service(HttpServlet.java:618) thanks Paul |

Re: How best to compare tow sentences

2014-12-05 Thread Paul Taylor
I no longer need to keep the song titles, and if I did it would eventually end up consuming too much memory so I really do need to abandon lucene for this task and just find the best way to compare two strings with each other. It already works to a degree using CosineS

Re: How best to compare tow sentences

2014-12-03 Thread Paul Taylor
On 03/12/2014 15:14, Barry Coughlan wrote: Hi Paul, I don't have much expertise in this area so hopefully others will answer, but maybe this is better than nothing. I don't know many out-of-the-box solutions for this problem, but I'm sure they exist. Mahout and Carrot2

How best to compare tow sentences

2014-12-02 Thread Paul Taylor
lucene experts on how to proceed! Paul

Re: [ANN] word2vec for Lucene

2014-11-20 Thread Paul Libbrecht
ore a question of presentation. Paul On 20 nov. 2014, at 16:24, Koji Sekiguchi wrote: > Hi Paul, > > I cannot compare it to SemanticVectors as I don't know SemanticVectors. > But word vectors that are produced by word2vec have interesting properties. > > Here is the

Re: [ANN] word2vec for Lucene

2014-11-20 Thread Paul Libbrecht
Hello Koji, how would you compare that to SemanticVectors? paul On 20 nov. 2014, at 10:10, Koji Sekiguchi wrote: > Hello, > > It's my pleasure to share that I have an interesting tool "word2vec for > Lucene" > available at https://github.com/kojisekig/wor

Re: Document Term matrix

2014-11-11 Thread Paul Libbrecht
The project semanticvectors might be doing what you are looking for. paul On 11 nov. 2014, at 22:37, parnab kumar wrote: > hi, > > While indexing the documents , store the Term Vectors for the content > field. Now for each document you will have an array of terms and their >

Re: how to ignore full stop for specific word

2014-11-06 Thread Paul Libbrecht
My trick would be to replace .net with dotNet (or use some funky Unicode letter to replace the dot). If you consistently use the same analyzer chain, then it will match cleanly. paul On 6 nov. 2014, at 12:42, Rajendra Rao wrote: > I have some word which contain full stop (.) its
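
A minimal sketch of that trick, assuming the replacement is done as a plain string rewrite before the text reaches the analyzer. The class name and the exact regular expression are illustrative choices, not something from the thread.

```java
public class DotNetNormalizer {

    // Rewrites ".net" (any case) as "dotNet" when it is not followed by
    // another letter or digit, so the tokenizer no longer sees a full stop.
    static String normalize(String text) {
        return text.replaceAll("(?i)\\.net\\b", "dotNet");
    }

    public static void main(String[] args) {
        System.out.println(normalize("C# and .net developer"));
        System.out.println(normalize("www.netflix.com"));
    }
}
```

The same `normalize` call would have to be applied to the indexed text and to every query string; otherwise the two sides no longer agree on the token.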

Format version is not supported:What is it trying to tell me here

2014-10-03 Thread Paul Taylor
ndexes are built with a later version of lucene than the reading code or something else ? paul

Re: Does StandardTokenizer remove punctuation (in Lucene 4.1)

2014-10-01 Thread Paul Taylor
On 01/10/2014 18:42, Steve Rowe wrote: Paul, Boilerplate upgrade recommendation: consider using the most recent Lucene release (4.10.1) - it’s the most stable, performant, and featureful release available, and many bugs have been fixed since the 4.1 release. Yeah sure, I did try this and hit

Re: Does StandardTokenizer remove punctuation (in Lucene 4.1)

2014-10-01 Thread Paul Taylor
ead: relatively easy) to create an analyzer (or a modification of the standard one's lexer) so that punctuation is returned as a separate token type? Dawid On Wed, Oct 1, 2014 at 7:01 AM, Steve Rowe wrote: Hi Paul, StandardTokenizer implements the Word Boundaries rules in the Unicode Text S

Does StandardTokenizer remove punctuation (in Lucene 4.1)

2014-09-30 Thread Paul Taylor
Does StandardTokenizer remove punctuation (in Lucene 4.1)? I'm just trying to move back to StandardTokenizer from my own old custom implementation, because the newer version seems to have much better support for Asian languages. However, this code excerpt fails on incrementToken(), implying that the

Re: Case sensitivity

2014-09-19 Thread Paul Libbrecht
two fields? paul On 19 sept. 2014, at 15:07, John Cecere wrote: > Is there a way to set up Lucene so that both case-sensitive and > case-insensitive searches can be done without having to generate two indexes? > > -- > John Cecere > Principal Engineer - Oracle Corporat
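
A minimal sketch of the two-field idea, using only the JDK: each value is stored once as-is and once lowercased, within the same index, and the query picks the field that matches the desired case behaviour. The `_exact` and `_lower` suffixes and the helper names are hypothetical; in Lucene itself the lowercasing would normally be done by a per-field analyzer (for example via `PerFieldAnalyzerWrapper`) rather than by hand.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class TwoFieldCase {

    // Produces the two field values to index for one logical field.
    static Map<String, String> fieldsFor(String name, String value) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put(name + "_exact", value);                          // case preserved
        fields.put(name + "_lower", value.toLowerCase(Locale.ROOT)); // case folded
        return fields;
    }

    // Builds a field:term query string against the appropriate field.
    static String queryFor(String name, String term, boolean caseSensitive) {
        return caseSensitive
                ? name + "_exact:" + term
                : name + "_lower:" + term.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(fieldsFor("title", "Lucene"));
        System.out.println(queryFor("title", "LUCENE", false));
    }
}
```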

Can Lucene based application be made to work with Scaled Elastic Beanstalk environemnt on Amazon Web Services

2014-06-27 Thread Paul Taylor
have any experience of this,please ? Paul

eDismax for Lucene 4

2014-06-06 Thread Paul Taylor
ticed that Solr now has eDismax, which allows both types to be supported in one syntax. This would be very useful, but I don't want to move to Solr. Is eDismax now available in Lucene 4? Paul

Indexing integer ranges for point search

2014-06-04 Thread Paul Tyson
use case for Lucene, and if so can someone sketch out a solution so I can connect the dots? Or is there example code, or documentation for this sort of thing? I've studied dynamic range facets, but those don't seem right

Re: Lucene for Log file indexing and search

2013-09-19 Thread Paul Libbrecht
Ashok, I would look at Solr, which has many more field types to support more queries. E.g. there you have a nice query syntax for time spans and fantastic caching. I think there are very few initiatives for indexing logs and I would be interested to see the results of your undertaking.

Distinction between AtomicReader and CompositeReader

2013-04-24 Thread Paul Taylor
ed by the end user application. Paul

Re: Why does index boosting a field to 2.0f on a document have such a dramatic effect

2013-04-04 Thread Paul Taylor
On 04/04/2013 23:26, Chris Hostetter wrote: : At index time I boost the alias field of a small set of documents, setting the : boost to 2.0f, which I thought meant equivalent to doubling the score this doc : would get over another doc, everything else being equal. 1) you haven't shown us enough

Re: Uable to extends TopTermsRewrite in Lucene 4.1

2013-04-04 Thread Paul Taylor
On 04/04/2013 10:59, Paul Taylor wrote: On 27/02/2013 10:28, Uwe Schindler wrote: Hi Paul, QueryParser and MTQ's rewrite method have nothing to do with each other. The rewrite method is (explained as simple as possible) a class that is responsible to "rewrite" a MultiTermQ

Re: Uable to extends TopTermsRewrite in Lucene 4.1

2013-04-04 Thread Paul Taylor
On 27/02/2013 10:28, Uwe Schindler wrote: Hi Paul, QueryParser and MTQ's rewrite method have nothing to do with each other. The rewrite method is (explained as simple as possible) a class that is responsible to "rewrite" a MultiTermQuery to another query type (generally a query

Re: How to use concurrency efficiently

2013-04-03 Thread Paul Bell
All, Sorry, but I inadvertently put my post re MultiFieldQueryParser in the wrong thread (wrong subject via cut and paste). Igor, thank you for the reply. I will look into what you suggest. -Paul On Wed, Apr 3, 2013 at 6:58 AM, Igor Shalyminov wrote: > I personally use SpanNearQuery (s

Re: How to use concurrency efficiently

2013-04-02 Thread Paul
something about the abstract class MultiTermQuery, but I don't really understand whether or not it would help with this problem. Thank you. -Paul

Re: Indexing a long list

2013-03-31 Thread Paul Bell
quite crystallized the generic query syntax and don't know how best to map it to both a Lucene query and to an appropriate Lucene index structure. Please let me know if you've any suggestions! Thanks again. -Paul On Sun, Mar 31, 2013 at 8:33 AM, Jack Krupansky wrote: > The first q

Re: Indexing a long list

2013-03-31 Thread Paul Bell
uld it make more sense to index on 'inEdges.v123'? The problem with this, I think, is that I can no longer ask about multiple edges with a single query, right? Thanks, Jack. -Paul On Sun, Mar 31, 2013 at 9:00 AM, Jack Krupansky wrote: > Multivalued fields are the other approach to keyw

Indexing a long list

2013-03-31 Thread Paul Bell
are really two parts to this question: 1. Lucene "best practices" for long list 2. Where to store such a list Thanks for your help. -Paul

Types of Queries

2013-03-29 Thread Paul Bell
l" type queries? For example, suppose I want to find all documents whose "name" field is not equal to a given value... Thank you. -Paul

Re: Beginner's questions

2013-03-29 Thread Paul Bell
t using the provided analyzer for tokenization. But be careful! In order for searches to work correctly, you need the analyzer used at search time to "match" the tokens produced by the analyzers at indexing time." Is this warning from the author of a piece with what you're warning me

Storing Documents in Lucene

2013-03-28 Thread Paul
out the indexing and less about storing documents. Thank you. -Paul

Re: Beginner's questions

2013-03-27 Thread Paul Bell
okenized> But if I'm interested in just obtaining the "content" field ("This is a test; for the next..."), what should I do? -Paul public class TestLucene { public static int id = 0; public static void main(String[] args) { RAMDirectory idx = new

Re: Beginner's questions

2013-03-27 Thread Paul Bell
fy the vertex (or vertices) that matched the query. Can you shed any light on this issue? Thanks again. -Paul On Tue, Mar 26, 2013 at 11:46 PM, Sashidhar Guntury < sashidhar.mo...@gmail.com> wrote: > hi, > > I think this stack overflow question might be of some help to you- >

Beginner's questions

2013-03-26 Thread Paul
t as coherent as I can make them. Thank you. -Paul

Using MappingCharFIlter in analyzer breaking wildcard matches

2013-03-25 Thread Paul Taylor
I created this simple StripSpacesAndSeparatorsAnalyzer so that it ignores certain characters such as hyphens in the field, so that I can search for catno:WRATHCD25 or catno:WRATHCD-25 and get the same results, and that works (the original value of the field added to the index was WRATHCD-25). How

Re: Uable to extends TopTermsRewrite in Lucene 4.1

2013-02-27 Thread Paul Taylor
On 26/02/2013 18:01, Paul Taylor wrote: On 26/02/2013 17:22, Uwe Schindler wrote: Hi, You cannot override rewrite() because you could easily break the logic behind TopTermsRewrite. If you want another behavior, subclass another base class and wrap the TopTermsRewrite instead of subclassing it

Re: Uable to extends TopTermsRewrite in Lucene 4.1

2013-02-26 Thread Paul Taylor
l try out your recommendations Paul -----Original Message- From: Paul Taylor [mailto:paul_t...@fastmail.fm] Sent: Tuesday, February 26, 2013 5:34 PM To: java-user@lucene.apache.org Subject: Uable to extends TopTermsRewrite in Lucene 4.1 In Lucene 3.6 I had code that replicated a Dismax Qu

Uable to extends TopTermsRewrite in Lucene 4.1

2013-02-26 Thread Paul Taylor
In Lucene 3.6 I had code that replicated a Dismax Query, and the search used fuzzy queries in some cases to match values. But I was finding the score attributed to matches on fuzzy searches was completely different to the score attributed to matches on exact searches so the total score returned

In Lucene 4.1 FuzzyQuery constructor now takes parameter maxEdits instead of parameter minSimilarity

2013-02-26 Thread Paul Taylor
FuzzyQuery constructor now takes parameter maxEdits instead of parameter minSimilarity. But I'm unclear how to map from the old value to the new value or whether they are unrelated and can no longer be compared. I was previously using a minsimilarity of 0.5f thanks
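
One plausible way to relate the two values, offered as an assumption rather than as the library's definition: if the old similarity is read as one minus the edit distance divided by the term length, then the number of allowed edits is roughly (1 - minSimilarity) * termLength, capped at the two edits that Lucene 4 fuzzy queries support. I believe Lucene 4.x also ships a deprecated helper, `FuzzyQuery.floatToEdits`, along these lines, but check its Javadoc before relying on it. The class below is a JDK-only sketch with hypothetical names.

```java
public class FuzzyEdits {

    // Assumption: Lucene 4 fuzzy matching supports at most 2 edits.
    static final int MAX_SUPPORTED_EDITS = 2;

    // Converts a fractional similarity in (0, 1) into an edit count
    // for a term of the given length.
    static int toMaxEdits(float minSimilarity, int termLength) {
        if (minSimilarity <= 0f || minSimilarity >= 1f) {
            throw new IllegalArgumentException("minSimilarity must be in (0, 1)");
        }
        int edits = (int) ((1d - minSimilarity) * termLength);
        return Math.min(edits, MAX_SUPPORTED_EDITS);
    }

    public static void main(String[] args) {
        // The old setting from the question, applied to terms of various lengths.
        for (int len : new int[] {1, 3, 8}) {
            System.out.println("length " + len + " -> " + toMaxEdits(0.5f, len));
        }
    }
}
```

Under this reading a minSimilarity of 0.5f does not map to a single maxEdits value; it depends on how long the query term is, and long terms all collapse to the cap.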

Re: NullPointerException thrown on tokenizer in 4.1, worked okay in 3.6

2013-02-26 Thread Paul Taylor
On 26/02/2013 12:29, Paul Taylor wrote: This code worked in 3.6 but now throws a NullPointerException in 4.1. I'm not expecting there to be a token created, but surely it shouldn't throw a NullPointerException. Tokenizer tokenizer = new org.apache.lucene.analysis.standard.StandardToke

Re: ArrayIndexOutOfBoundsException trying to use tokenizer in Lucene 4.1

2013-02-26 Thread Paul Taylor
On 26/02/2013 13:29, Alan Woodward wrote: Hi Paul, You need to call tokenizer.reset() before you call incrementToken() Alan Woodward www.flax.co.uk <http://www.flax.co.uk> Hi, thanks that fixes it

NullPointerException thrown on tokenizer in 4.1, worked okay in 3.6

2013-02-26 Thread Paul Taylor
This code worked in 3.6 but now throws a NullPointerException in 4.1. I'm not expecting there to be a token created, but surely it shouldn't throw a NullPointerException. Tokenizer tokenizer = new org.apache.lucene.analysis.standard.StandardTokenizer(Version.LUCENE_41, new StringReader("!!!")); to

ArrayIndexOutOfBoundsException trying to use tokenizer in Lucene 4.1

2013-02-26 Thread Paul Taylor
This works in 3.6 but fails in 4.1. What's wrong with the code? public void testTokenization() throws IOException { StringBuffer sb = new StringBuffer(); for(char i=0;i<100;i++) { Character c = new Character(i); if(!Character.isWhitespace(c)) {

Re: What replaces the computeNorm method in DefaultSimilarity in 4.1 now that the method is final

2013-02-26 Thread Paul Taylor
On 19/02/2013 11:42, Paul Taylor wrote: What replaces the computeNorm method in DefaultSimilarity in 4.1? I've always subclassed DefaultSimilarity to resolve an issue whereby, when a document has multiple values in a field (because it has a one-to-many relationship), it scores worse than a document which

Re: Not getting matches for analyzers using CharMappingFilter with Lucene 4.1

2013-02-26 Thread Paul Taylor
On 25/02/2013 11:24, Thomas Matthijs wrote: On Mon, Feb 25, 2013 at 12:19 PM, Thomas Matthijs <mailto:li...@selckin.be>> wrote: On Mon, Feb 25, 2013 at 11:30 AM, Thomas Matthijs mailto:li...@selckin.be>> wrote: On Mon, Feb 25, 2013 at 11:24

Re: Not getting matches for analyzers using CharMappingFilter with Lucene 4.1

2013-02-25 Thread Paul Taylor
On 20/02/2013 11:28, Paul Taylor wrote: I'm just updating my codebase from Lucene 3.6 to Lucene 4.1, and it seems my tests that use NormalizeCharMap for replacing characters in the analyzers are not working. Below I've created a self-contained test case; this is the output when I run it --term

Do you still have to override QueryParser to allow numeric range searches in Lucene 4.1

2013-02-25 Thread Paul Taylor
In my 3.6 code I was adding a numeric field to my index as follows: public void addNumericField(IndexField field, Integer value) { addField(field, NumericUtils.intToPrefixCoded(value)); } but I've changed it to (work in progress) public void addNumericField(IndexField field, Integer

Re: Not getting matches for analyzers using CharMappingFilter with Lucene 4.1

2013-02-25 Thread Paul Taylor
On 20/02/2013 11:28, Paul Taylor wrote: I'm just updating my codebase from Lucene 3.6 to Lucene 4.1, and it seems my tests that use NormalizeCharMap for replacing characters in the analyzers are not working. Bump, anybody? I thought a self-contained test case would be enough to pique somebody's interest

Not getting matches for analyzers using CharMappingFilter with Lucene 4.1

2013-02-20 Thread Paul Taylor
ch using same analyzer. Maybe the problem is with the query parser, but it's certainly related to 4.1 because it worked previously. thanks Paul package org.musicbrainz.search.analysis; import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.

Re: Field seems to have become binary field on update to Lucene 4.1

2013-02-19 Thread Paul Taylor
On 19/02/2013 20:56, Paul Taylor wrote: Strange test failure after converting code from Lucene 3.6 to Lucene 4.1 public void testIndexPuid() throws Exception { addReleaseOne(); RAMDirectory ramDir = new RAMDirectory(); createIndex(ramDir); IndexReader ir

Field seems to have become binary field on update to Lucene 4.1

2013-02-19 Thread Paul Taylor
Strange test failure after converting code from Lucene 3.6 to Lucene 4.1 public void testIndexPuid() throws Exception { addReleaseOne(); RAMDirectory ramDir = new RAMDirectory(); createIndex(ramDir); IndexReader ir = IndexReader.open(ramDir); Fields fiel

What replaces the computeNorm method in DefaultSimilarity in 4.1 now that the method is final

2013-02-19 Thread Paul Taylor
What replaces the computeNorm method in DefaultSimilarity in 4.1? I've always subclassed DefaultSimilarity to resolve an issue whereby, when a document has multiple values in a field (because it has a one-to-many relationship), it scores worse than a document which just has a single value, but the computeNorm

Re: What is equivalent to Document.setBoost() from Lucene 3.6 inLucene 4.1 ?

2013-02-18 Thread Paul Taylor
ValueSource referring to the DocValues field, 5 lines of code -> and it will return consistent results! Uwe Thanks, a bit clearer now, but a 5-line example would be nice. And if this is the way to do things, isn't the migration doc incorrect? Paul - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bre

Re: What is equivalent to Document.setBoost() from Lucene 3.6 inLucene 4.1 ?

2013-02-18 Thread Paul Taylor
different to what the migration guide says so I don't see that as an improvement. Paul - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Paul Taylor [mailto:paul_t...@fastmail.fm] Sent: Monday, Februar

Re: What is equivalent to Document.setBoost() from Lucene 3.6 inLucene 4.1 ?

2013-02-18 Thread Paul Taylor
tance sharing the same field name should only include their per-field boost and not the document level boost) as the boost for multi-valued field instances are multiplied together by Lucene." -- Ian. On Mon, Feb 18, 2013 at 12:17 PM, Paul Taylor wrote: What is equivalent to Document.setBo

What is equivalent to Document.setBoost() from Lucene 3.6 inLucene 4.1 ?

2013-02-18 Thread Paul Taylor
What is the equivalent of Document.setBoost() from Lucene 3.6 in Lucene 4.1?

Spatial indexing: IndexOutOfBounds in QuadPrefixTree

2013-02-14 Thread Paul Alexandrow
, that zero-length is not checked in getNode()). Unfortunately I don't understand checkBattenberg() well enough to delve any further - so I'm stuck. Any help is very much appreciated, thanks, Paul Find below a stacktrace of the Exception: SEVERE: org.apache.solr.common.SolrExcept

Migration to Lucene 4.1

2013-01-29 Thread Paul Sitowitz
redDocs( conf.getInt( "indexer.minMergeDocs", 1000 ) ); .. this.luceneWriter = new IndexWriter( directory, iwconf ); Thanks, Paul -- -- Paul Sitowitz sitow...@gmail.com 703-626-3593

Re: Is there a problem with my Analyzer subclass ?

2013-01-22 Thread Paul Taylor
seableThreadLocal) of the superclass; shouldn't there just be one per instance of the analyzer? (Using Lucene 3.6.0) Paul

Is there a problem with my Analyzer subclass ?

2013-01-22 Thread Paul Taylor
I've been investigating potential memory leaks in my Lucene-based application that runs on Jetty. I did a memory dump with jmap, and one thing I've noticed is that for any subclass of Analyzer that I have created there are a lot of instances of the $SavedStream inner class. So for example I c

RE: Upgrade Lucene to latest version (4.0) from 2.4.0

2013-01-09 Thread Paul Hill
is only a thought, because I haven't upgraded to 4.0 yet. -Paul > -Original Message- > From: Igal Sapir [mailto:i...@getrailo.org] > Sent: Wednesday, January 09, 2013 11:42 AM > To: java-user@lucene.apache.org > Subject: Re: Upgrade Lucene to latest version (4.0) from 2.4.0

RE: Is StandardAnalyzer good enough for multi languages...

2013-01-09 Thread Paul Hill
added. That would be one of the main points of the whole ICU infrastructure. -Paul

RE: Is StandardAnalyzer good enough for multi languages...

2013-01-08 Thread Paul Hill
The ICU project ( http://site.icu-project.org/ ) has Analyzers for Lucene and it has been ported to ElasticSearch. Maybe those integrate better. As to not doing some tokenization, I would think an extra tokenizer in your chain would be just the thing. -Paul > -Original Message- >

Re: international stop set?

2012-10-27 Thread Paul Libbrecht
e. Using this gives you a real head start in expanding to a useful set of available languages. Also, you could use the whitespace tokenizer as a simple analyzer for an "exact" field. paul

Is there anything in Lucene 4.0 that provides 'absolute' scoring so that i can compare the scoring results of different searches ?

2012-10-25 Thread Paul Taylor
indexes of artists, usually the user just searches artists or releases. But sometimes they want to search all and interleave the results from the two indexes, but it's not sensible for me to interleave them based on their score at the moment. t

Re: Lucene index on NFS

2012-10-02 Thread Paul Libbrecht
ures being based on NFS and corruption is something I've never heard about. Note: no concurrent access to a lucene index, right? Paul On 2 Oct 2012, at 14:01, Jong Kim wrote: > Thank you all for reply. > > So it sounds like it is a known fact that the performance would suffe

Re: Lucene index on NFS

2012-10-02 Thread Paul Libbrecht
My experience in the Lucene 1.x days was a slowdown by a factor of at least four when writing to NFS and about two when reading from it. I'd discourage this as much as possible! (rsync is much more your friend for transport, and replication à la Solr should also be considered) paul On 2 Oct. 2

Re: Memory issues with Lucene deployment

2012-09-27 Thread Paul Taylor
elease IndexSearchers or something, anyway trying to get a stack dump done Paul

Memory issues with Lucene deployment

2012-09-25 Thread Paul Taylor
this is running on, I know this question is a bit vague but I would appreciate some ideas where to investigate next thanks Paul

Re: let's use our native language

2012-09-14 Thread Paul Libbrecht
> most sentences around Lucene what I searched out aren't compiled correctly. > wondering if we build our local mailing list... Which language? paul

Lucene Index backward compatibility related question

2012-08-27 Thread Sitowitz, Paul
index created using Lucene 3.01 by our first product? OR, will we have to "bite the bullet" and upgrade BOTH products to use the latest version of Lucene? Thanks in advance for your response. Sincerely, Paul Sitowitz

Re: RAM or SSD...

2012-07-18 Thread Paul Jakubik
If only 30GB, go with RAM and MMapDirectory (as long as you have the budget for that hardware). My understanding is that RAMDirectory is intended for unit tests, not for production indexes. On Wed, Jul 18, 2012 at 10:50 AM, Dragon Fly wrote: > > Hi, > > If I want to improve performance, which of

RE: any good idea for loading fields into memory?

2012-06-25 Thread Paul Hill
faster. But I can't say what the tradeoff would be if you wanted most fields at each step in the search. Good luck, -Paul

RE: Fast way to get the start of document

2012-06-25 Thread Paul Hill
response try using less common words and a few more of them and you won't run into your too-huge documents unless you really want to see them. So is there NO way to read the "all_text" field and only read _the_start_ of it? Otherwise, I'm thinking I'll go with an e

RE: any good idea for loading fields into memory?

2012-06-22 Thread Paul Hill
y (or closer to) what you need in it. -Paul > -Original Message- > From: Li Li [mailto:fancye...@gmail.com] > our old map implementation use about 10 ms, while newer one is 40 > ms. the reason is we need to return some fields of all hitted documents. the > fields are not ver
