Re: Lucene 9.0.0 inconsistent index options

2021-12-14 Thread Ian Lea
xact same options (index options, points dimensions, norms, > doc values type, etc.) as already indexed documents that also have > this field. > > However it's a bug that Lucene fails to open an index that was legal > in Lucene 8. Can you file a JIRA issue? > > On Mon, Dec 13,

Lucene 9.0.0 inconsistent index options

2021-12-13 Thread Ian Lea
Hi We have a long-standing index with some mandatory fields and some optional fields that has been through multiple lucene upgrades without a full rebuild. On testing an upgrade from version 8.11.0 to 9.0.0, when opening an IndexWriter we are hitting the exception Exception in thread "main"

Re: [VOTE] Lucene logo contest

2020-06-18 Thread Ian Lea
A. Non-PMC. -- Ian. On Wed, Jun 17, 2020 at 1:28 PM jim ferenczi wrote: > I vote option A (PMC vote) > > On Wed, Jun 17, 2020 at 14:24, Felix Kirchner < > felix.kirch...@uni-wuerzburg.de> wrote: > > > A > > > > non-PMC > > > > On 16.06.2020 at 00:08, Ryan Ernst wrote: > > > Dear Lucene

Re: lucene Input and Output format

2017-08-02 Thread Ian Lea
What are the full package names for these interfaces? I don't think they are org.apache.lucene. -- Ian. On Wed, Aug 2, 2017 at 9:00 AM, Ranganath B N wrote: > Hi, > > It's not about the file formats. Rather It is about LuceneInputFormat > and LuceneOutputFormat

Re: join on two txt files data using apache lucene

2017-07-14 Thread Ian Lea
Looks like your screenshot didn't make it, but never mind: I'm sure we all know what text files look like. A join on two ID fields sounds more like SQL database territory rather than lucene. Lucene is not an SQL database. But I typed "lucene join" into a well known search engine and the top hit

Re: Un-used index files are not getting released

2017-05-09 Thread Ian Lea
er of files in that index folder using java > (File.listFiles()) it lists 1761 files in that folder. This count goes down > to a double digit number when I restart the tomcat. > > Thanks for looking into it. > > -- > Regards > -Siraj Haider > (212) 306-0154 > > -O

Re: Un-used index files are not getting released

2017-05-05 Thread Ian Lea
The most common cause is unclosed index readers. If you run lsof against the tomcat process id and see that some deleted files are still open, that's almost certainly the problem. Then all you have to do is track it down in your code. -- Ian. On Thu, May 4, 2017 at 10:09 PM, Siraj Haider

Re: unable to delete document via the IndexWriter.deleteDocuments(term) method

2017-02-17 Thread Ian Lea
on of lucene, but not found in > version 5.x > > Any suggestion to bypass that? > > Sorry for my bad English. > > 2017-02-17 19:40 GMT+08:00 Ian Lea <ian@gmail.com>: > > Hi > > > > > > SimpleAnalyzer uses LetterTokenizer which divides text a

Re: unable to delete document via the IndexWriter.deleteDocuments(term) method

2017-02-17 Thread Ian Lea
Hi SimpleAnalyzer uses LetterTokenizer which divides text at non-letters. Your add and search methods use the analyzer but the delete method doesn't. Replacing SimpleAnalyzer with KeywordAnalyzer in your program fixes it. You'll need to make sure that your id field is left alone. Good to see
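To illustrate why the analyzed add and the un-analyzed delete can disagree, here is a minimal plain-Java sketch. It is not Lucene code: it only imitates "split at non-letters" plus lower-casing (SimpleAnalyzer also lower-cases), and the class name, method name and the id value "Doc-42" are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java imitation of "split at non-letters, then lower-case".
// NOT the Lucene implementation; names here are hypothetical.
class LetterSplitSketch {

    // Splits text into runs of letters and lower-cases each run.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ends the current run of letters.
                terms.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            terms.add(current.toString());
        }
        return terms;
    }

    public static void main(String[] args) {
        String id = "Doc-42";                 // hypothetical id value
        List<String> indexed = analyze(id);   // roughly what an analyzed add would index
        System.out.println(indexed);
        // A delete by the raw, un-analyzed value looks for a term that was never indexed.
        System.out.println(indexed.contains(id));
    }
}
```

The digits and the hyphen never reach the index in this sketch, which is why the id field has to be left alone (kept as a single token) if a delete by exact term is going to find it.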

Re: Disabling Lucene Scoring/Ranking

2017-01-09 Thread Ian Lea
oal.search.ConstantScoreQuery? "A query that wraps another query and simply returns a constant score equal to the query boost for every document that matches the query. It therefore simply strips of all scores and returns a constant one." -- Ian. On Mon, Jan 9, 2017 at 11:39 AM, Taher Galal

Re: Favoring Terms Occurring in Close Proximity

2016-06-27 Thread Ian Lea
No, it implies that Lucene is a low level library that allows people like you and me, application developers, to develop applications that meet our business and technical needs. Like you, most of the things I work with prefer documents where the search terms are close together, often preferably

Re: LockFactory issue observed in lucene while getting instance of indexWriter

2016-06-16 Thread Ian Lea
Sounds to me like it's related to the index not having been closed properly or still being updated or something. I'd worry about that. -- Ian. On Thu, Jun 16, 2016 at 11:19 AM, Mukul Ranjan wrote: > Hi, > > I'm observing below exception while getting instance of

Re: Using Lucene to model ownership of documents

2016-06-16 Thread Ian Lea
I'd definitely go for b). The index will of course be larger for every extra bit of data you store but it doesn't sound like this would make much difference. Likewise for speed of indexing. -- Ian. On Wed, Jun 15, 2016 at 2:25 PM, Geebee Coder wrote: > Hi there, > I

Re: Selective Output fields in Search Result. Lucene 5.5.0

2016-05-16 Thread Ian Lea
Would http://lucene.apache.org/core/5_5_0/core/org/apache/lucene/index/IndexReader.html#document(int,%20java.util.Set) be what you are looking for? -- Ian. On Mon, May 16, 2016 at 1:39 PM, wrote: > Hello, > > I am storing close to 100 fields in a single document which

Re: Need help in alphanumeric search

2015-10-01 Thread Ian Lea
. >> > > >> > > Uwe >> > > >> > > - >> > > Uwe Schindler >> > > H.-H.-Meier-Allee 63, D-28213 Bremen >> > > http://www.thetaphi.de >> > > eMail: u...@thetaphi.de >> > > >> > > >> > > > -Original

Re: Need help in alphanumeric search

2015-09-28 Thread Ian Lea
Hi Can you provide a few examples of values of cpn that a) are and b) are not being found, for indexing and searching. You may also find some of the tips at http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F useful. You haven't shown the code that

Re: Improvement performance of my indexing with Lucene

2015-09-09 Thread Ian Lea
See also http://wiki.apache.org/lucene-java/ImproveIndexingSpeed Also double check that it's Lucene that you should be concentrating on. In my experience it's often the reading of the data from a database, if that's what you are doing, that is the bottleneck. -- Ian. On Wed, Sep 9, 2015 at

Re: Improvement performance of my indexing with Lucene

2015-09-09 Thread Ian Lea
mro...@gmail.com> wrote: > Thanks a lot ! > > But do you know some links that helps implement these optimization options > without the Solr (using only lucene) ? > > I am using lucene 4.9. > > More thanks. > > Humberto > > > On Wed, Sep 9, 2015 at 5:23 AM, Ian

Re: Improvement performance of my indexing with Lucene

2015-09-09 Thread Ian Lea
data-source-2) ... t1.start() t2.start() ... wait ... iw.close() -- Ian. > On Wed, Sep 9, 2015 at 11:23 AM, Ian Lea <ian@gmail.com> wrote: > >> The link that I sent, >> http://wiki.apache.org/lucene-java/ImproveIndexingSpeed is for Lucene, >> not

Re: IndexWriter is not closing the FDs (deleted files)

2015-09-01 Thread Ian Lea
From a glance, you need to close the old reader after calling openIfChanged if it gives you a new one. See https://lucene.apache.org/core/5_3_0/core/org/apache/lucene/index/DirectoryReader.html#openIfChanged(org.apache.lucene.index.DirectoryReader). You may wish to pay attention to the words

Re: IndexReader returns all fields, but IndexSearcher does not

2015-06-02 Thread Ian Lea
Hi - I suggest you narrow the problem down to a small self-contained example and if you still can't get it to work, show us the code. And tell us what version of Lucene you are using. -- Ian. On Mon, Jun 1, 2015 at 5:20 PM, Rahul Kotecha kotecha.rahul...@gmail.com wrote: Hi All, I am

Re: Filtering question

2015-03-11 Thread Ian Lea
Can you use a BooleanFilter (or ChainedFilter in 4.x) alongside your BooleanQuery? Seems more logical and I suspect would solve the problem. Caching filters can be good too, depending on how often your data changes. See CachingWrapperFilter. -- Ian. On Tue, Mar 10, 2015 at 12:45 PM, Chris

Re: Difference between StoredField vs Other Fields with Field.Store.YES

2015-03-11 Thread Ian Lea
Is there a difference between using StoredField and using other types of fields with Field.Store.YES? It will depend on what the other type of field is. As the javadoc for Field states, the xxxField classes are sugar. If you are doing standard things on standard data it's generally easier to

Re: Indexing an IntField but getting SotredField from found Document

2015-02-19 Thread Ian Lea
I think if you follow the Field.fieldType().numericType() chain you'll end up with INT or DOUBLE or whatever. But if you know you stored it as an IntField then surely you already know it's an integer? Unless you sometimes store different things in the one field. I wouldn't do that. -- Ian.

Re: Indexing Query

2015-02-18 Thread Ian Lea
, and I want to make sure I match only index entries that do not have more than 2 tokens, is there a way to do that too? Thanks On Wed, Feb 18, 2015 at 2:23 AM, Ian Lea ian@gmail.com wrote: Break the query into words then add them as TermQuery instances as optional clauses

Re: URL/Email tokenizer

2015-02-17 Thread Ian Lea
, Ian Lea ian@gmail.com wrote: Sounds like a job for org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper. -- Ian. On Tue, Feb 17, 2015 at 8:51 AM, Ravikumar Govindarajan ravikumar.govindara...@gmail.com wrote: We have a requirement in that E-mail addresses need

Re: Indexing Query

2015-02-17 Thread Ian Lea
Break the query into words then add them as TermQuery instances as optional clauses to a BooleanQuery with a call to setMinimumNumberShouldMatch(2) somewhere along the line. You may want to do some parsing or analysis on the query terms to avoid problems of case matching and the like. -- Ian.
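As a rough picture of what "optional clauses with a minimum of 2 matching" means, here is a plain-Java sketch. It does not use Lucene at all; the class, the word-splitting rule and the sample strings are assumptions made purely for illustration.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Plain-Java picture of "at least N of the optional terms must match".
// NOT Lucene code; a real BooleanQuery works on the index, not on raw strings.
class MinShouldMatchSketch {

    // Lower-cases the text and splits it into a set of words.
    static Set<String> words(String text) {
        Set<String> out = new HashSet<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    // True when at least minimumShouldMatch distinct query words occur in the doc.
    static boolean matches(String query, String doc, int minimumShouldMatch) {
        Set<String> docTerms = words(doc);
        int hits = 0;
        for (String term : words(query)) {
            if (docTerms.contains(term)) {
                hits++;
            }
        }
        return hits >= minimumShouldMatch;
    }
}
```

With a minimum of 2, the query "red fast car" matches "A fast red bicycle" (two words in common) but not "a slow red bicycle" (only one). The lower-casing step stands in for the "parsing or analysis on the query terms" mentioned in the reply.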

Re: URL/Email tokenizer

2015-02-17 Thread Ian Lea
Sounds like a job for org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper. -- Ian. On Tue, Feb 17, 2015 at 8:51 AM, Ravikumar Govindarajan ravikumar.govindara...@gmail.com wrote: We have a requirement in that E-mail addresses need to be added in a tokenized form to one field

Re: occurrence of two terms with the highest frequency

2015-02-13 Thread Ian Lea
...@gmail.com wrote: Thanks Ian for your help. But I didn't get aol search, what it is ? tried searching in google but couldn't find. Thanks On Fri, Feb 13, 2015 at 3:00 AM, Ian Lea ian@gmail.com wrote: I think you can do it with 4 simple queries: 1) +flying +shooting 2) +flying

Re: occurrence of two terms with the highest frequency

2015-02-12 Thread Ian Lea
I think you can do it with 4 simple queries: 1) +flying +shooting 2) +flying +fighting etc. or BooleanQuery equivalents with MUST clauses. Use org.apache.lucene.search.TotalHitCountCollector and it should be blazingly fast, even if you have more than 100 docs. -- Ian. On Thu, Feb 12, 2015 at 5:42 PM,

Re: StandardQueryParser with date/time fields stored as longs

2015-02-11 Thread Ian Lea
logic and boosts and whatever else I wanted. -- Ian. On Wed, Feb 11, 2015 at 2:37 PM, Jon Stewart j...@lightboxtechnologies.com wrote: Ok... so how does anyone ever use date-time queries in lucene with the new recommended way of using longs? Jon On Wed, Feb 11, 2015 at 9:26 AM, Ian Lea ian

Re: StandardQueryParser with date/time fields stored as longs

2015-02-11 Thread Ian Lea
To the best of my knowledge you are spot on with everything you say, except that the component to parse the strings doesn't exist. I suspect that a contribution to add that to StandardQueryParser might well be accepted. -- Ian. On Wed, Feb 11, 2015 at 4:21 AM, Jon Stewart

Re: StandardQueryParser with date/time fields stored as longs

2015-02-11 Thread Ian Lea
that gets handed a field name and query components (e.g., created, 2010-01-01, 2014-12-31), which I can derive from, parse the timestamp strings, and then turn the whole thing into a numeric range query component? Jon On Wed, Feb 11, 2015 at 9:10 AM, Ian Lea ian@gmail.com wrote

Re: search on a field by a single word

2015-02-11 Thread Ian Lea
If you only ever want to retrieve based on exact match you could index the name field using org.apache.lucene.document.StringField. Do be aware that it is exact: if you do nothing else, a search for "a" will not match "A" or "A ". Or you could do something with start and end markers e.g. index your

Re: combine to MultiTermQuery with OR

2015-02-10 Thread Ian Lea
org.apache.lucene.search.BooleanQuery. -- Ian. On Tue, Feb 10, 2015 at 3:28 PM, Sascha Janz sascha.j...@gmx.net wrote: Hi, i want to combine two MultiTermQueries. One searches over FieldA, one over FieldB. Both queries should be combined with OR operator. so in lucene Syntax i want

Re: Re: combine to MultiTermQuery with OR

2015-02-10 Thread Ian Lea
, BooleanClause.Occur.SHOULD); bquery.add(queryFieldB, BooleanClause.Occur.SHOULD); this is the correct way? Sent: Tuesday, 10 February 2015 at 17:31 From: Ian Lea ian@gmail.com To: java-user@lucene.apache.org Subject: Re: combine to MultiTermQuery

Re: Boolean Search Query is not workng

2015-01-23 Thread Ian Lea
How about home~10 house~10 flat. See http://lucene.apache.org/core/4_10_3/queryparser/index.html -- Ian. On Fri, Jan 23, 2015 at 7:17 AM, Priyanka Tufchi priyanka.tuf...@launchship.com wrote: Hi ALL I am working on a project which uses lucene for searching . I am struggling with boolean

Re: Boolean Search Query is not workng

2015-01-23 Thread Ian Lea
by query it is giving same score ..It is not working. Thanks Priyanka On Fri, Jan 23, 2015 at 3:19 PM, Ian Lea ian@gmail.com wrote: How about home~10 house~10 flat. See http://lucene.apache.org/core/4_10_3/queryparser/index.html -- Ian. On Fri, Jan 23, 2015 at 7:17 AM, Priyanka

Re: MultiPhraseQuery:Rewrite to BooleanQuery

2015-01-21 Thread Ian Lea
Are you asking if your two suggestions 1) a MultiPhraseQuery or 2) a BooleanQuery made up of multiple PhraseQuery instances are equivalent? If so, I'd say that they could be if you build them carefully enough. For the specific examples you show I'd say not and would wonder if you get correct

Re: forceMerge(1) grows index and does not shrink back

2015-01-20 Thread Ian Lea
by some java process. Jürgen. On 19.01.2015 at 13:36, Ian Lea wrote: Do you need to call forceMerge(1) at all? The javadoc, certainly for recent versions of lucene, advises against it. What version of lucene are you running? It might be helpful to run lsof against the index directory before

Re: forceMerge(1) grows index and does not shrink back

2015-01-19 Thread Ian Lea
Do you need to call forceMerge(1) at all? The javadoc, certainly for recent versions of lucene, advises against it. What version of lucene are you running? It might be helpful to run lsof against the index directory before/during/after the merge to see what files are coming or going, or if

Re: trouble with Collector and FieldCache

2015-01-15 Thread Ian Lea
How are you storing the id field? A wild guess might be that this error might be caused by having some documents with id stored, perhaps, as a StringField or TextField and some as an IntField. -- Ian. On Wed, Jan 14, 2015 at 2:07 PM, Sascha Janz sascha.j...@gmx.net wrote: hello, i am

Re: AlreadyClosedException on new index

2015-01-06 Thread Ian Lea
Presumably no exception is thrown from the new IndexWriter() call? I'd double check that, and try some harmless method call on the writer and make sure that works. And run CheckIndex against the index. -- Ian. On Tue, Jan 6, 2015 at 5:05 PM, Brian Call brian.c...@soterawireless.com

Re: batch-update-pattern, NoMergeScheduler?

2014-12-23 Thread Ian Lea
Hi I can't give an exact answer to your question but my experience has been that it's best to leave all the merge/buffer/etc settings alone. If you are doing a bulk update of a large number of docs then it's no surprise that you are seeing a heavy IO load. If you can, it's likely to be worth

Re: Index keeps growing, then shrinks on restart

2014-11-11 Thread Ian Lea
Telling us the version of lucene and the OS you're running on is always a good idea. A guess here is that you aren't closing index readers, so the JVM will be holding on to deleted files until it exits. A combination of du, ls, and lsof commands should prove it, or just lsof: run it against the

Re: SpanTermQuery Issue

2014-10-03 Thread Ian Lea
Toronto != toronto. From the javadocs for StandardAnalyzer: "Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter". LowerCaseFilter does what you would expect. -- Ian. On Fri, Oct 3, 2014 at 3:52 AM, Xu Chu 1989ch...@gmail.com wrote: Hi everyone In the following

Re: Case sensitivity

2014-09-19 Thread Ian Lea
PerFieldAnalyzerWrapper is the way to mix and match fields and analyzers. Personally I'd simply store the case-insensitive field with a call to toLowerCase() on the value and equivalent on the search string. You will of course use more storage, but you don't need to store the text contents for

4.10.0: java.lang.IllegalStateException: cannot write 3x SegmentInfo unless codec is Lucene3x (got: Lucene40)

2014-09-10 Thread Ian Lea
Hi On running a quick test after a handful of minor code changes to deal with 4.10 deprecations, a program that updates an existing index failed with Exception in thread main java.lang.IllegalStateException: cannot write 3x SegmentInfo unless codec is Lucene3x (got: Lucene40) at

Re: 4.10.0: java.lang.IllegalStateException: cannot write 3x SegmentInfo unless codec is Lucene3x (got: Lucene40)

2014-09-10 Thread Ian Lea
at 7:01 AM, Ian Lea ian@gmail.com wrote: Hi On running a quick test after a handful of minor code changes to deal with 4.10 deprecations, a program that updates an existing index failed with Exception in thread main java.lang.IllegalStateException: cannot write 3x SegmentInfo unless

Re: 4.10.0: java.lang.IllegalStateException: cannot write 3x SegmentInfo unless codec is Lucene3x (got: Lucene40)

2014-09-10 Thread Ian Lea
.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Ian Lea [mailto:ian@gmail.com] Sent: Wednesday, September 10, 2014 1:01 PM To: java-user@lucene.apache.org Subject: 4.10.0: java.lang.IllegalStateException: cannot write 3x

Re: Fetching stored data takes more time

2014-08-04 Thread Ian Lea
Retrieving stored data is always likely to take longer than not doing so. There are some tips in http://wiki.apache.org/lucene-java/ImproveSearchingSpeed. But taking over a minute to retrieve data for 50 hits sounds excessive. Are you sure about those figures? -- Ian. On Thu, Jul 31, 2014

Re: How does Lucene decide which fields to index?

2014-08-04 Thread Ian Lea
You tell it what you want. See the javadocs for org.apache.lucene.document.Field and friends such as TextField. -- Ian. On Mon, Aug 4, 2014 at 2:43 PM, Sachin Kulkarni kulk...@hawk.iit.edu wrote: Hi, I am using lucene 4.6.0 to index a dataset. I have the following fields: doctitle,

Re: More Like This query is not working.

2014-07-21 Thread Ian Lea
, Jul 18, 2014 at 7:34 PM, Ian Lea ian@gmail.com wrote: You need to supply more info. Tell us what version of lucene you are using and provide a very small completely self-contained example or test case showing exactly what you expect to happen and what is happening instead. -- Ian

Re: Lucene Query Wrong Result for phrase.

2014-07-18 Thread Ian Lea
Probably because something in the analysis chain is removing the hyphen. Check out the javadocs. Generally you should also make sure you use the same analyzer at index and search time. -- Ian. On Fri, Jul 18, 2014 at 6:52 AM, itisismail it.is.ism...@gmail.com wrote: Hi I have created index

Re: More Like This query is not working.

2014-07-18 Thread Ian Lea
You need to supply more info. Tell us what version of lucene you are using and provide a very small completely self-contained example or test case showing exactly what you expect to happen and what is happening instead. -- Ian. On Fri, Jul 18, 2014 at 11:50 AM, Rajendra Rao

Re: Can phrasequery allow mismatch?

2014-07-17 Thread Ian Lea
Might be able to do it with some combination of SpanNearQuery, with suitable values for slop and inOrder, combined into a BooleanQuery with setMinimumNumberShouldMatch = number of SpanNearQuery instances - 1. So, making this up as I go along, you'd have SpanNearQuery sn1 = B after A, slop 0, in

Re: Warm up IndexReader

2014-07-14 Thread Ian Lea
There's no magic to it - just build a query or six and fire them at your newly opened reader. If you want to put the effort in you could track recent queries and use them, or make sure you warm up searches on particular fields. Likewise, if you use Lucene's sorting and/or filters, it might be

Re: IndexSearcher.doc thread safe problem

2014-07-09 Thread Ian Lea
It's more likely to be a demonstration that concurrent programming is hard, results often hard to predict and debugging very hard. Or perhaps you simply need to add acceptsDocsOutOfOrder() to your collector, returning false. Either way, hard to see any evidence of a thread-safety problem in

Re: re-use IndexWriter

2014-07-08 Thread Ian Lea
Read the javadocs to understand the difference between commit() and flush(). You need commit(), or close(). There are no hard and fast rules and it depends on how much data you are indexing, how fast, how many searches you're getting and how up to date they need to be. And how much you worry

Re: Lucene Upgrade from 2.9.x to 4.7.x

2014-05-29 Thread Ian Lea
The migration guide that came out with 4.0 is probably the best place to start. http://lucene.apache.org/core/4_8_1/MIGRATE.html is from the current release but probably hasn't changed since 4.0. There's also the changes file with every release. And if you browse the list archives I expect

Re: Which is better ,Search through query and whole text document or search through query with document field.

2014-02-13 Thread Ian Lea
The one that meets your requirements most easily will be the best. If people will want to search for words in particular fields you'll need to split it but if they only ever want to search across all fields there's no point. A common requirement is to want both, in which case you can split it

Re: Delete a field in old documents

2014-01-07 Thread Ian Lea
You'll have to reindex. -- Ian. On Mon, Jan 6, 2014 at 2:11 PM, manoj raj manojluc...@gmail.com wrote: Hi, I have stored fields. I want to delete a single field in all documents. Can i do that without reindexing? if yes, is it costly operations..? Thanks, Manoj.

Re: Slow Index Writes

2014-01-07 Thread Ian Lea
an NRTManager class? In Lucene 4.5 I cannot find the class (missing a maven dependency?). Can anyone point me to a working example? Cheers, Klaus On Fri, Jan 3, 2014 at 11:49 AM, Ian Lea ian@gmail.com wrote: You will indeed get poor performance if you commit for every doc. Can you compromise

Re: Slow Index Writes

2014-01-03 Thread Ian Lea
You will indeed get poor performance if you commit for every doc. Can you compromise and commit every, say, 1000 docs, or once every few minutes, or whatever makes sense for your app. Or look at lucene's near-real-time search features. Google Lucene NRT for info. Or use Elastic Search. --
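The "commit every N docs" compromise can be sketched in plain Java as below. The Sink interface is a hypothetical stand-in for whatever accepts documents and commits them (it is not a Lucene type), and the batch size of 1000 is just the figure mentioned in the reply.

```java
import java.util.List;

// Sketch of batching commits instead of committing after every document.
// Sink and CountingSink are hypothetical; they are not part of Lucene.
class BatchedCommitSketch {

    // Stand-in for the component that receives documents and commits them.
    interface Sink {
        void add(String doc);
        void commit();
    }

    // Adds every doc, committing once per full batch and once more
    // for any remainder. Returns the number of commits issued.
    static int indexAll(List<String> docs, Sink sink, int batchSize) {
        int commits = 0;
        int pending = 0;
        for (String doc : docs) {
            sink.add(doc);
            pending++;
            if (pending == batchSize) {
                sink.commit();
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) {
            sink.commit();
            commits++;
        }
        return commits;
    }

    // Sink that only counts calls, for demonstration.
    static class CountingSink implements Sink {
        int added = 0;
        int committed = 0;
        public void add(String doc) { added++; }
        public void commit() { committed++; }
    }
}
```

With 2500 docs and a batch size of 1000 this issues three commits rather than 2500. The trade-off is the one the reply implies: docs added since the last commit are not yet durable or visible to newly opened readers.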

Re: Deletion of Index not happening in Lucene 4.3

2013-11-29 Thread Ian Lea
How do you know it's not working? My favourite suggestion: post a very small self-contained RAMDirectory based program or test case, or maybe 2 in this case, for 3.6 and 4.3, that demonstrates the problem. -- Ian. On Fri, Nov 29, 2013 at 6:00 AM, VIGNESH S vigneshkln...@gmail.com wrote: Hi,

Re: java.lang.NoSuchFieldError: STOP_WORDS_SET

2013-11-13 Thread Ian Lea
Pasting that line into a chunk of code works fine for me, with 4.5 rather than 4.3 but I don't expect that matters. Have you got a) all the right jars in your classpath and b) none of the wrong jars? -- Ian. On Wed, Nov 13, 2013 at 11:20 AM, Hang Mang gucko.gu...@googlemail.com wrote: Hi

Re: subscribe

2013-11-11 Thread Ian Lea
Have you set an analyzer when you create your IndexWriter? -- Ian. P.S. Please start new questions in new messages with sensible subjects. On Mon, Nov 11, 2013 at 9:00 AM, Rohit Girdhar rohit.ii...@gmail.com wrote: Hi I was trying to use the lucene JAVA API to create an index. I am

Re: IndexWriter.addDocument() gives NullPointerException when used with a doc containing TextField

2013-11-11 Thread Ian Lea
still not sure what went wrong in using the other constructor for TextField... Thanks PS: Sorry about that, didn't realize that while posting :( . Updated the message subject now. On Mon, Nov 11, 2013 at 10:00 PM, Ian Lea ian@gmail.com wrote: Have you set an analyzer when you create

Re: Search sentence from document based on keyword as input using lucene

2013-10-17 Thread Ian Lea
If you're using Solr you'd be better off asking this on the Solr list: http://lucene.apache.org/solr/discussion.html. You might also like to clarify what you want with regard to sentence vs document. If you want to display the sentences of a matched doc, surely you just do it: store what you

Re: Optimizing Filters

2013-10-17 Thread Ian Lea
. On Oct 11, 2013, at 7:33 AM, Ian Lea ian@gmail.com wrote: Are you going to be caching and reusing the filters e.g. by CachingWrapperFilter? The main benefit of filters is in reuse. It takes time to build them in the first place, likely roughly equivalent to running the underlying query

Re: PhraseQuery boost doesn't affect ScoreDoc.score

2013-10-17 Thread Ian Lea
Boosting query clauses means "this clause is more important than that clause" rather than "make the score for this search higher". I use it for biblio searching when I want to search across multiple fields and want matches in titles to be more important than matches in blurbs. Amended version of

Re: QueryParser stripping off Hyphen from query

2013-10-15 Thread Ian Lea
If you want to keep hyphens you could try WhitespaceAnalyzer. But that may of course have knock on effects on other searches. Don't forget to use the same analyzer for indexing and searching, unless you're doing clever things. An alternative is to create the queries directly in code, but you'll

Re: Calculating min, max and sum of a field in docs returned by search [SEC=UNOFFICIAL]

2013-10-14 Thread Ian Lea
I'd start with the simple approach of a stored field and only worry about performance if you needed to. Field caching would likely help if you did need to. -- Ian. On Mon, Oct 14, 2013 at 2:04 AM, Stephen GRAY stephen.g...@immi.gov.au wrote: UNOFFICIAL Hi everyone, I'd appreciate some

Re: wildcard search not working on file paths

2013-10-14 Thread Ian Lea
Do some googling on leading wildcards and read things like http://www.gossamer-threads.com/lists/lucene/java-user/175732 and pick an option you like. -- Ian. On Mon, Oct 14, 2013 at 9:12 AM, nischal reddy nischal.srini...@gmail.com wrote: Hi, I have problem with doing wild card search on

Re: wildcard search not working on file paths

2013-10-14 Thread Ian Lea
On Mon, Oct 14, 2013 at 4:55 PM, Ian Lea ian@gmail.com wrote: Do some googling on leading wildcards and read things like http://www.gossamer-threads.com/lists/lucene/java-user/175732 and pick an option you like. -- Ian. On Mon, Oct 14, 2013 at 9:12 AM, nischal reddy nischal.srini

Re: wildcard search not working on file paths

2013-10-14 Thread Ian Lea
path. TIA, Nischal Y On Mon, Oct 14, 2013 at 10:31 PM, Ian Lea ian@gmail.com wrote: Seems to me that it should work. I suggest you show us a complete self-contained example program that demonstrates the problem. -- Ian. On Mon, Oct 14, 2013 at 12:42 PM, nischal reddy

Re: Multiple Keywords - Regular and Any Order Search

2013-10-11 Thread Ian Lea
Looks like you can achieve most of what you want by using AND rather than OR. I think that all the should/should not examples you give will work if you use AND on your content field. For ordering, I suggest you look at SpanNearQuery. That can consider order and slop, the distance between the

Re: Optimizing Filters

2013-10-11 Thread Ian Lea
Are you going to be caching and reusing the filters e.g. by CachingWrapperFilter? The main benefit of filters is in reuse. It takes time to build them in the first place, likely roughly equivalent to running the underlying query although with variations as you describe. Or are you saying that

Re: Performance/scoring impacts with multiple occurrences of a field

2013-10-11 Thread Ian Lea
With multiple fields of the same name vs a single field I doubt you'd be able to tell the difference in performance or matching or scoring in normal use. There may be some matching/ranking effect if you are looking at, say, span queries across the multiple fields. Try it out and see what

Re: queries with doesn't work but AND does

2013-10-10 Thread Ian Lea
Looks like you've got some XML processing in there somewhere. Nothing to do with lucene. This code: public static void main(String[] _args) throws Exception { QueryParser qp = new QueryParser(Version.LUCENE_44, "x", new StandardAnalyzer(Version.LUCENE_44)); for (String s : _args) {

Re: Multiphrase Query in Lucene 4.3

2013-10-03 Thread Ian Lea
) (!Character.isWhitespace(cn)). My analyzer will use a lower case filter on top of the tokenizer. This works perfectly in case of 3.6. In 4.3 it is creating problems in offsets of tokens. On Mon, Sep 30, 2013 at 8:21 PM, Ian Lea ian@gmail.com wrote: Whenever someone says they are using

Re: Multiphrase Query in Lucene 4.3

2013-10-03 Thread Ian Lea
vigneshkln...@gmail.com wrote: Ian, Thanks for your reply.. I am facing the same problem if i use whiteSpaceTokenizer also. My analyzer works perfect in case of Lucene 3.6. Thanks and Regards Vignesh Srinivasan On Thu, Oct 3, 2013 at 3:23 PM, Ian Lea ian@gmail.com wrote: Certainly sounds like

Re: Handling abrupt shutdown while indexing

2013-10-03 Thread Ian Lea
I'd write a shutdown method that calls close() in a controlled manner and invoke it at 23:55. You could also call commit() at whatever interval makes sense to you but if you carried on killing the JVM you'd still be liable to lose any docs indexed since the last commit. This is standard stuff

Re: Problem with MultiPhrase Query in Lucene 4.3

2013-10-03 Thread Ian Lea
Are you sure it's not failing because adhoc != ad-hoc? -- Ian. On Thu, Oct 3, 2013 at 3:07 PM, VIGNESH S vigneshkln...@gmail.com wrote: Hi, I am Trying to do Multiphrase Query in Lucene 4.3. It is working Perfect for all scenarios except the below scenario. When I try to Search for a

Re: Problem with MultiPhrase Query in Lucene 4.3

2013-10-03 Thread Ian Lea
because of that On Thu, Oct 3, 2013 at 8:17 PM, Ian Lea ian@gmail.com wrote: Are you sure it's not failing because adhoc != ad-hoc? -- Ian. On Thu, Oct 3, 2013 at 3:07 PM, VIGNESH S vigneshkln...@gmail.com wrote: Hi, I am Trying to do Multiphrase Query in Lucene 4.3

Re: Rendexing problem: Indexing folder size is keep on growing for same remote folder

2013-10-02 Thread Ian Lea
Yes, as I suggested, you could search on your unique id and not index if already present. Or, as Uwe suggested, call updateDocument instead of add, again using the unique id. -- Ian. On Tue, Oct 1, 2013 at 6:41 PM, gudiseashok gudise.as...@gmail.com wrote: I am really sorry if something made

Re: Multi server

2013-10-01 Thread Ian Lea
I'm not aware of a lucene rather than Solr or whatever tutorial. A search for something like lucene sharding will get hits. Why don't you want to use Solr or Katta or similar? They've already done much of the hard work. How much data are you talking about? What are your master-master

Re: Rendexing problem: Indexing folder size is keep on growing for same remote folder

2013-10-01 Thread Ian Lea
milliseconds as unique keys are a bad idea unless you are 100% certain you'll never be creating 2 docs in the same millisecond. And are you saying the log record A1 from file a.log indexed at 14:00 will have the same unique id as the same record from the same file indexed at 14:30 or will it be
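A small plain-Java sketch of the millisecond problem follows. The class and method names are hypothetical, and the counter-based variant is only one possible alternative, not something proposed in the thread.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

// Shows why a bare millisecond timestamp is a fragile unique key.
// All names are made up for this sketch.
class UniqueIdSketch {
    private static final AtomicLong SEQUENCE = new AtomicLong();

    // Timestamp-only id: two calls in the same millisecond return the same value.
    static String timestampId() {
        return Long.toString(System.currentTimeMillis());
    }

    // Timestamp plus a process-wide counter: unique within one JVM run.
    static String sequencedId() {
        return System.currentTimeMillis() + "-" + SEQUENCE.incrementAndGet();
    }

    public static void main(String[] args) {
        Set<String> byTime = new HashSet<>();
        Set<String> bySequence = new HashSet<>();
        for (int i = 0; i < 10_000; i++) {
            byTime.add(timestampId());
            bySequence.add(sequencedId());
        }
        // The first count is typically far below 10000; the second is always 10000.
        System.out.println("distinct timestamp ids: " + byTime.size());
        System.out.println("distinct sequenced ids: " + bySequence.size());
    }
}
```

Note that neither variant gives the same record the same id on a later run. If re-indexing is meant to replace an existing document rather than add a duplicate, the id would need to be derived from the record itself, which is the question the reply goes on to ask.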

Re: Rendexing problem: Indexing folder size is keep on growing for same remote folder

2013-10-01 Thread Ian Lea
I'm still a bit confused about exactly what you're indexing, when, but if you have a unique id and don't want to add or update a doc that's already present, add the unique id to the index and search (TermQuery probably) for each one and skip if already present. Can't you change the log

Re: Multiphrase Query in Lucene 4.3

2013-09-30 Thread Ian Lea
, whether I need to add any other parameter in addition to this while indexing? Is there any MultiPhraseQueryTest java file for Lucene 4.3? I checked in Lucene branch and i was not able to find..Please kindly help. On Thu, Sep 26, 2013 at 2:55 PM, Ian Lea ian@gmail.com wrote

Re: Multiphrase Query in Lucene 4.3

2013-09-30 Thread Ian Lea
{ break; } } while (trm.next() != null); } On Mon, Sep 30, 2013 at 3:01 PM, Ian Lea ian@gmail.com wrote: Whenever someone says something along the lines of a search for geoffrey not matching Geoffrey the case difference

Re: Multiphrase Query in Lucene 4.3

2013-09-26 Thread Ian Lea
I use the code below to do something like this. Not exactly what you want but should be easy to adapt. public List<String> findTerms(IndexReader _reader, String _field) throws IOException { List<String> l = new ArrayList<String>(); Fields ff =

Re: Lucene 4.4.0 mergeSegments OutOfMemoryError

2013-09-26 Thread Ian Lea
Is this OOM happening as part of your early morning optimize or at some other point? By optimize do you mean IndexWriter.forceMerge(1)? You really shouldn't have to use that. If the index grows forever without it then something else is going on which you might wish to report separately. -- Ian.

Re: Strange behaviour of tokenizer with wildcard queries

2013-09-20 Thread Ian Lea
It's reasonable that block-major won't find anything. block-major-57 should match. The split into block and major-57 will be because, from the javadocs for ClassicTokenizer, Splits words at hyphens, unless there's a number in the token, in which case the whole token is interpreted as a product

Re: Strange behaviour of tokenizer with wildcard queries

2013-09-20 Thread Ian Lea
with a PrefixQuery? I think that would work. -- Ian. On Fri, Sep 20, 2013 at 1:48 PM, Ramprakash Ramamoorthy youngestachie...@gmail.com wrote: On Fri, Sep 20, 2013 at 6:11 PM, Ian Lea ian@gmail.com wrote: It's reasonable that block-major won't find anything. block-major-57 should match. Thank you

Re: Multiple field instances and Field.Store.NO

2013-09-16 Thread Ian Lea
Not exactly dumb, and I can't tell you exactly what is happening here, but lucene stores some info at the index level rather than the field level, and things can get confusing if you don't use the same Field definition consistently for a field. From the javadocs for

Re: Regarding Compression Tool

2013-09-13 Thread Ian Lea
Are you talking about CompressionTools as in http://lucene.apache.org/core/3_0_3/api/core/org/apache/lucene/document/CompressionTools.html? They've long been superseded by a completely different, low-level, transparent compression method. Anyway, use them to compress stored fields, not fields

Re: Check if Term present in Existing Index before Merging indexes from Directory.

2013-09-11 Thread Ian Lea
If you want to stick with the approach of multiple indexes you'll have to add some logic to work round it. Option 1. Post merge, loop through all docs identifying duplicates and deleting the one(s) you don't want. Option 2. Pre merge, read all indexes in parallel, identifying and deleting as

Re: Lucene Concurrent Search

2013-09-06 Thread Ian Lea
for lots of Web apps that use resources like Lucene. On Thu, Sep 5, 2013 at 12:05 PM, Ian Lea ian@gmail.com wrote: I use a singleton class but there are other ways in tomcat. Can't remember what - maybe application scope. -- Ian. On Thu, Sep 5, 2013 at 4:46 PM, David Miranda

Re: Lucene Concurrent Search

2013-09-05 Thread Ian Lea
Take a look at org.apache.lucene.search.SearcherManager. From the javadocs: "Utility class to safely share IndexSearcher instances across multiple threads, while periodically reopening." -- Ian. On Thu, Sep 5, 2013 at 2:16 AM, David Miranda david.b.mira...@gmail.com wrote: Hi, I'm developing
