xact same options (index options, points dimensions, norms,
> doc values type, etc.) as already indexed documents that also have
> this field.
>
> However it's a bug that Lucene fails to open an index that was legal
> in Lucene 8. Can you file a JIRA issue?
>
> On Mon, Dec 13,
Hi
We have a long-standing index with some mandatory fields and some optional
fields that has been through multiple lucene upgrades without a full
rebuild. On testing out an upgrade from version 8.11.0 to 9.0.0, when
opening an IndexWriter we hit the exception
Exception in thread "main"
A. Non-PMC.
--
Ian.
On Wed, Jun 17, 2020 at 1:28 PM jim ferenczi wrote:
> I vote option A (PMC vote)
>
> On Wed. 17 June 2020 at 14:24, Felix Kirchner <
> felix.kirch...@uni-wuerzburg.de> wrote:
>
> > A
> >
> > non-PMC
> >
> > On 16.06.2020 at 00:08, Ryan Ernst wrote:
> > > Dear Lucene
What are the full package names for these interfaces? I don't think they
are org.apache.lucene.
--
Ian.
On Wed, Aug 2, 2017 at 9:00 AM, Ranganath B N
wrote:
> Hi,
>
> It's not about the file formats. Rather, it is about LuceneInputFormat
> and LuceneOutputFormat
Looks like your screenshot didn't make it, but never mind: I'm sure we all
know what text files look like.
A join on two ID fields sounds more like SQL database territory than
Lucene; Lucene is not an SQL database. But I typed "lucene join" into a
well known search engine and the top hit
er of files in that index folder using java
> (File.listFiles()) it lists 1761 files in that folder. This count goes down
> to a double digit number when I restart the tomcat.
>
> Thanks for looking into it.
>
> --
> Regards
> -Siraj Haider
> (212) 306-0154
>
> -O
The most common cause is unclosed index readers. If you run lsof against
the tomcat process id and see that some deleted files are still open,
that's almost certainly the problem. Then all you have to do is track it
down in your code.
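The leak-free pattern is to reuse one reader and refresh it via openIfChanged, closing the old instance whenever a new one comes back. A sketch only (Lucene 8.x-era API assumed; the class and demo are made up, a real app would open an FSDirectory):

```java
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class ReaderRefresh {
    // openIfChanged returns null when the index is unchanged, otherwise a
    // NEW reader; the old one must then be closed, or the JVM keeps its
    // (possibly already deleted) segment files open -- exactly what lsof shows.
    public static DirectoryReader refresh(DirectoryReader current) throws Exception {
        DirectoryReader newer = DirectoryReader.openIfChanged(current);
        if (newer == null) {
            return current;      // nothing changed, keep the old reader
        }
        current.close();         // releases the old segment files
        return newer;
    }

    public static void main(String[] args) throws Exception {
        // In-memory demo; a real app would use FSDirectory.open(path).
        Directory dir = new ByteBuffersDirectory();
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig());
        writer.commit();
        DirectoryReader reader = DirectoryReader.open(dir);
        writer.addDocument(new org.apache.lucene.document.Document());
        writer.commit();
        reader = refresh(reader);   // picks up the new commit, closes the old reader
        System.out.println("numDocs=" + reader.numDocs());
        reader.close();
        writer.close();
        dir.close();
    }
}
```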
--
Ian.
On Thu, May 4, 2017 at 10:09 PM, Siraj Haider
on of lucene, but not found in
> version 5.x
>
> Any suggestion to bypass that?
>
> Sorry for my bad English.
>
> 2017-02-17 19:40 GMT+08:00 Ian Lea <ian@gmail.com>:
> > Hi
> >
> >
> > SimpleAnalyzer uses LetterTokenizer which divides text a
Hi
SimpleAnalyzer uses LetterTokenizer which divides text at non-letters.
Your add and search methods use the analyzer but the delete method doesn't.
Replacing SimpleAnalyzer with KeywordAnalyzer in your program fixes it.
You'll need to make sure that your id field is left alone.
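For reference, a delete keyed on a Term goes straight to the index and never through an analyzer, so it cannot suffer the mismatch described above. A sketch (Lucene 8.x-era API assumed; the field name and value are made up):

```java
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class DeleteById {
    // Index one doc, delete it by its id term, return the remaining doc count.
    public static int demo() throws Exception {
        Directory dir = new ByteBuffersDirectory();
        // KeywordAnalyzer keeps a whole value as one token; SimpleAnalyzer
        // would split and lowercase it, so a parsed delete query would never match.
        IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new KeywordAnalyzer()));

        Document doc = new Document();
        doc.add(new StringField("id", "DOC-42", Field.Store.YES));  // indexed verbatim
        w.addDocument(doc);
        w.commit();

        // deleteDocuments(Term) bypasses analysis entirely.
        w.deleteDocuments(new Term("id", "DOC-42"));
        w.commit();

        int n;
        try (DirectoryReader r = DirectoryReader.open(dir)) {
            n = r.numDocs();
        }
        w.close();
        dir.close();
        return n;
    }
}
```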
Good to see
oal.search.ConstantScoreQuery?
"A query that wraps another query and simply returns a constant score equal
to the query boost for every document that matches the query. It therefore
simply strips off all scores and returns a constant one."
--
Ian.
On Mon, Jan 9, 2017 at 11:39 AM, Taher Galal
No, it implies that Lucene is a low level library that allows people like
you and me, application developers, to develop applications that meet our
business and technical needs.
Like you, most of the things I work with prefer documents where the search
terms are close together, often preferably
Sounds to me like it's related to the index not having been closed properly
or still being updated or something. I'd worry about that.
--
Ian.
On Thu, Jun 16, 2016 at 11:19 AM, Mukul Ranjan wrote:
> Hi,
>
> I'm observing below exception while getting instance of
I'd definitely go for b). The index will of course be larger for every
extra bit of data you store but it doesn't sound like this would make much
difference. Likewise for speed of indexing.
--
Ian.
On Wed, Jun 15, 2016 at 2:25 PM, Geebee Coder wrote:
> Hi there,
> I
Would
http://lucene.apache.org/core/5_5_0/core/org/apache/lucene/index/IndexReader.html#document(int,%20java.util.Set)
be what you are looking for?
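i.e. something like the sketch below (field names invented): only the named stored fields are loaded for the hit, the rest are skipped.

```java
import java.util.Set;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;

public class LoadSomeFields {
    // Load only the named stored fields for a hit instead of all ~100;
    // any stored field not in the set is simply not returned.
    public static Document load(IndexReader reader, int docID) throws Exception {
        return reader.document(docID, Set.of("title", "date"));  // hypothetical fields
    }
}
```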
--
Ian.
On Mon, May 16, 2016 at 1:39 PM, wrote:
> Hello,
>
> I am storing close to 100 fields in a single document which
.
>> > >
>> > > Uwe
>> > >
>> > > -
>> > > Uwe Schindler
>> > > H.-H.-Meier-Allee 63, D-28213 Bremen
>> > > http://www.thetaphi.de
>> > > eMail: u...@thetaphi.de
>> > >
>> > >
>> > > > -Original
Hi
Can you provide a few examples of values of cpn that a) are and b) are
not being found, for indexing and searching.
You may also find some of the tips at
http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F
useful.
You haven't shown the code that
See also http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
Also double check that it's Lucene that you should be concentrating
on. In my experience it's often the reading of the data from a
database, if that's what you are doing, that is the bottleneck.
--
Ian.
On Wed, Sep 9, 2015 at
mro...@gmail.com> wrote:
> Thanks a lot !
>
> But do you know some links that helps implement these optimization options
> without the Solr (using only lucene) ?
>
> I am using lucene 4.9.
>
> More thanks.
>
> Humberto
>
>
> On Wed, Sep 9, 2015 at 5:23 AM, Ian
data-source-2)
...
t1.start()
t2.start()
...
wait ...
iw.close()
--
Ian.
> On Wed, Sep 9, 2015 at 11:23 AM, Ian Lea <ian@gmail.com> wrote:
>
>> The link that I sent,
>> http://wiki.apache.org/lucene-java/ImproveIndexingSpeed is for Lucene,
>> not
>From a glance, you need to close the old reader after calling
openIfChanged if it gives you a new one.
See
https://lucene.apache.org/core/5_3_0/core/org/apache/lucene/index/DirectoryReader.html#openIfChanged(org.apache.lucene.index.DirectoryReader).
You may wish to pay attention to the words
Hi - I suggest you narrow the problem down to a small self-contained
example and if you still can't get it to work, show us the code. And
tell us what version of Lucene you are using.
--
Ian.
On Mon, Jun 1, 2015 at 5:20 PM, Rahul Kotecha
kotecha.rahul...@gmail.com wrote:
Hi All,
I am
Can you use a BooleanFilter (or ChainedFilter in 4.x) alongside your
BooleanQuery? Seems more logical and I suspect would solve the problem.
Caching filters can be good too, depending on how often your data changes.
See CachingWrapperFilter.
--
Ian.
On Tue, Mar 10, 2015 at 12:45 PM, Chris
Is there a difference between using StoredField and using other types of
fields with Field.Store.YES?
It will depend on what the other type of field is. As the javadoc for
Field states, the xxxField classes are sugar. If you are doing
standard things on standard data it's generally easier to
I think if you follow the Field.fieldType().numericType() chain you'll
end up with INT or DOUBLE or whatever.
But if you know you stored it as an IntField then surely you already
know it's an integer? Unless you sometimes store different things in
the one field. I wouldn't do that.
--
Ian.
, and I want to make sure
I match only index entries that do not have more than 2 tokens, is there a
way to do that too?
Thanks
Break the query into words then add them as TermQuery instances as
optional clauses to a BooleanQuery with a call to
setMinimumNumberShouldMatch(2) somewhere along the line. You may want
to do some parsing or analysis on the query terms to avoid problems of
case matching and the like.
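As a sketch, using the Builder API from Lucene 5.x+ (in the 4.x versions current when this was written the setter lives on BooleanQuery itself); field and term values are placeholders, and the words are assumed to be already analysed/lowercased:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class TwoOfThree {
    // Require at least 2 of the given words to match in the field.
    public static BooleanQuery build(String field, String... words) {
        BooleanQuery.Builder b = new BooleanQuery.Builder();
        for (String word : words) {
            // SHOULD = optional clause
            b.add(new TermQuery(new Term(field, word)), BooleanClause.Occur.SHOULD);
        }
        b.setMinimumNumberShouldMatch(2);
        return b.build();
    }
}
```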
--
Ian.
Sounds like a job for
org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper.
--
Ian.
On Tue, Feb 17, 2015 at 8:51 AM, Ravikumar Govindarajan
ravikumar.govindara...@gmail.com wrote:
We have a requirement in that E-mail addresses need to be added in a
tokenized form to one field
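Setting that up might look like the sketch below (field names are made up; the idea is one tokenized copy of the address and one verbatim copy in a second field):

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class PerFieldSetup {
    public static Analyzer build() {
        Map<String, Analyzer> perField = new HashMap<>();
        // Hypothetical field names: "email" is tokenized by the default
        // analyzer; "email_raw" is kept as a single whole token.
        perField.put("email_raw", new KeywordAnalyzer());
        // Any field not listed falls back to the default analyzer.
        return new PerFieldAnalyzerWrapper(new StandardAnalyzer(), perField);
    }
}
```

Use the same wrapper for both the IndexWriter and the query side so index-time and search-time analysis agree.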
...@gmail.com wrote:
Thanks Ian for your help. But I didn't get "aol search", what is it? I
tried searching in Google but couldn't find it.
Thanks
On Fri, Feb 13, 2015 at 3:00 AM, Ian Lea ian@gmail.com wrote:
I think you can do it with 4 simple queries:
1) +flying +shooting
2) +flying +fighting
etc.
or BooleanQuery equivalents with MUST clauses. Use
oal.search.TotalHitCountCollector and it should be blazingly fast,
even if you have more than 100 docs.
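For reference, that collector lives in org.apache.lucene.search ("oal" is common shorthand for org.apache.lucene). A sketch of the count-only search (Lucene 5-8 era API; field and term values illustrative):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TotalHitCountCollector;

public class CountOnly {
    // Count docs matching ALL of the given terms, without scoring or
    // collecting the documents themselves.
    public static int count(IndexSearcher searcher, String field, String... words)
            throws Exception {
        BooleanQuery.Builder b = new BooleanQuery.Builder();
        for (String w : words) {
            b.add(new TermQuery(new Term(field, w)), BooleanClause.Occur.MUST);
        }
        TotalHitCountCollector collector = new TotalHitCountCollector();
        searcher.search(b.build(), collector);
        return collector.getTotalHits();
    }
}
```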
--
Ian.
On Thu, Feb 12, 2015 at 5:42 PM,
logic and boosts and whatever else I wanted.
--
Ian.
On Wed, Feb 11, 2015 at 2:37 PM, Jon Stewart
j...@lightboxtechnologies.com wrote:
Ok... so how does anyone ever use date-time queries in lucene with the
new recommended way of using longs?
Jon
On Wed, Feb 11, 2015 at 9:26 AM, Ian Lea ian
To the best of my knowledge you are spot on with everything you say,
except that the component to parse the strings doesn't exist. I
suspect that a contribution to add that to StandardQueryParser might
well be accepted.
--
Ian.
On Wed, Feb 11, 2015 at 4:21 AM, Jon Stewart
that gets handed a
field name and query components (e.g., created, 2010-01-01,
2014-12-31), which I can derive from, parse the timestamp strings,
and then turn the whole thing into a numeric range query component?
Jon
On Wed, Feb 11, 2015 at 9:10 AM, Ian Lea ian@gmail.com wrote
If you only ever want to retrieve based on exact match you could index
the name field using org.apache.lucene.document.StringField. Do be
aware that it is exact: if you do nothing else, a search for "a" will
not match "A" or "A ".
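A sketch of the StringField approach (field name invented; 5.x+ API):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;

public class ExactName {
    // StringField is indexed verbatim: no tokenizing, no lowercasing, no trimming.
    public static Document doc(String name) {
        Document d = new Document();
        d.add(new StringField("name", name, Field.Store.YES));
        return d;
    }

    // The query term must therefore match byte-for-byte.
    public static TermQuery exact(String name) {
        return new TermQuery(new Term("name", name));
    }
}
```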
Or you could do something with start and end markers e.g. index your
org.apache.lucene.search.BooleanQuery.
--
Ian.
On Tue, Feb 10, 2015 at 3:28 PM, Sascha Janz sascha.j...@gmx.net wrote:
Hi,
i want to combine two MultiTermQueries.
One searches over FieldA, one over FieldB. Both queries should be combined
with OR operator.
so in lucene Syntax i want
, BooleanClause.Occur.SHOULD);
bquery.add(queryFieldB, BooleanClause.Occur.SHOULD);
Is this the correct way?
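Yes; spelled out with the Builder API that later replaced direct clause adds (Lucene 5.x+; field names and term values are placeholders):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.WildcardQuery;

public class OrOverTwoFields {
    // Two multi-term queries, one per field, combined with OR:
    // a document matches if either SHOULD clause matches.
    public static BooleanQuery build() {
        BooleanQuery.Builder b = new BooleanQuery.Builder();
        b.add(new PrefixQuery(new Term("fieldA", "foo")), BooleanClause.Occur.SHOULD);
        b.add(new WildcardQuery(new Term("fieldB", "ba*r")), BooleanClause.Occur.SHOULD);
        return b.build();
    }
}
```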
Sent: Tuesday, 10 February 2015 at 17:31
From: Ian Lea ian@gmail.com
To: java-user@lucene.apache.org
Subject: Re: combine to MultiTermQuery
How about home~10 house~10 flat. See
http://lucene.apache.org/core/4_10_3/queryparser/index.html
--
Ian.
On Fri, Jan 23, 2015 at 7:17 AM, Priyanka Tufchi
priyanka.tuf...@launchship.com wrote:
Hi ALL
I am working on a project which uses lucene for searching . I am
struggling with boolean
by query it is giving the same score. It is not working.
Thanks
Priyanka
Are you asking if your two suggestions
1) a MultiPhraseQuery or
2) a BooleanQuery made up of multiple PhraseQuery instances
are equivalent? If so, I'd say that they could be if you build them
carefully enough. For the specific examples you show I'd say not and
would wonder if you get correct
by
some javaprocess.
Jürgen.
On 19.01.2015 at 13:36, Ian Lea wrote:
Do you need to call forceMerge(1) at all? The javadoc, certainly for
recent versions of lucene, advises against it. What version of lucene
are you running?
It might be helpful to run lsof against the index directory
before/during/after the merge to see what files are coming or going,
or if
How are you storing the id field? A wild guess might be that this
error might be caused by having some documents with id stored,
perhaps, as a StringField or TextField and some as an IntField.
--
Ian.
On Wed, Jan 14, 2015 at 2:07 PM, Sascha Janz sascha.j...@gmx.net wrote:
hello,
i am
Presumably no exception is thrown from the new IndexWriter() call?
I'd double check that, and try some harmless method call on the
writer and make sure that works. And run CheckIndex against the
index.
--
Ian.
On Tue, Jan 6, 2015 at 5:05 PM, Brian Call
brian.c...@soterawireless.com
Hi
I can't give an exact answer to your question but my experience has
been that it's best to leave all the merge/buffer/etc settings alone.
If you are doing a bulk update of a large number of docs then it's no
surprise that you are seeing a heavy IO load. If you can, it's likely
to be worth
Telling us the version of lucene and the OS you're running on is
always a good idea.
A guess here is that you aren't closing index readers, so the JVM will
be holding on to deleted files until it exits.
A combination of du, ls, and lsof commands should prove it, or just
lsof: run it against the
Toronto != toronto. From the javadocs for StandardAnalyzer:
"Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter."
LowerCaseFilter does what you would expect.
--
Ian.
On Fri, Oct 3, 2014 at 3:52 AM, Xu Chu 1989ch...@gmail.com wrote:
Hi everyone
In the following
PerFieldAnalyzerWrapper is the way to mix and match fields and analyzers.
Personally I'd simply store the case-insensitive field with a call to
toLowerCase() on the value and equivalent on the search string.
You will of course use more storage, but you don't need to store the
text contents for
Hi
On running a quick test after a handful of minor code changes to deal
with 4.10 deprecations, a program that updates an existing index
failed with
Exception in thread "main" java.lang.IllegalStateException: cannot
write 3x SegmentInfo unless codec is Lucene3x (got: Lucene40)
at
.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
-Original Message-
From: Ian Lea [mailto:ian@gmail.com]
Sent: Wednesday, September 10, 2014 1:01 PM
To: java-user@lucene.apache.org
Subject: 4.10.0: java.lang.IllegalStateException: cannot write 3x
Retrieving stored data is always likely to take longer than not doing
so. There are some tips in
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed.
But taking over a minute to retrieve data for 50 hits sounds
excessive. Are you sure about those figures?
--
Ian.
On Thu, Jul 31, 2014
You tell it what you want. See the javadocs for
org.apache.lucene.document.Field and friends such as TextField.
--
Ian.
On Mon, Aug 4, 2014 at 2:43 PM, Sachin Kulkarni kulk...@hawk.iit.edu wrote:
Hi,
I am using lucene 4.6.0 to index a dataset.
I have the following fields:
doctitle,
, Jul 18, 2014 at 7:34 PM, Ian Lea ian@gmail.com wrote:
You need to supply more info. Tell us what version of lucene you are
using and provide a very small completely self-contained example or
test case showing exactly what you expect to happen and what is
happening instead.
--
Ian
Probably because something in the analysis chain is removing the
hyphen. Check out the javadocs. Generally you should also make sure
you use the same analyzer at index and search time.
--
Ian.
On Fri, Jul 18, 2014 at 6:52 AM, itisismail it.is.ism...@gmail.com wrote:
Hi I have created index
Might be able to do it with some combination of SpanNearQuery, with
suitable values for slop and inOrder, combined into a BooleanQuery
with setMinimumNumberShouldMatch = number of SpanNearQuery instances -
1.
So, making this up as I go along, you'd have
SpanNearQuery sn1 = B after A, slop 0, in
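Continuing that sketch with real API: the span classes live in org.apache.lucene.search.spans up to 8.x (they moved to the queries module in 9.x). The field and terms are placeholders:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class AdjacentPair {
    // "b immediately after a": slop 0, in order.
    public static SpanNearQuery pair(String field, String a, String b) {
        return new SpanNearQuery(
                new SpanQuery[] {
                    new SpanTermQuery(new Term(field, a)),
                    new SpanTermQuery(new Term(field, b))
                },
                0,      // slop: no intervening positions allowed
                true);  // inOrder: a must precede b
    }
}
```

Several such pairs can then go into a BooleanQuery as SHOULD clauses with setMinimumNumberShouldMatch set to n - 1, as described above.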
There's no magic to it - just build a query or six and fire them at
your newly opened reader. If you want to put the effort in you could
track recent queries and use them, or make sure you warm up searches
on particular fields. Likewise, if you use Lucene's sorting and/or
filters, it might be
It's more likely to be a demonstration that concurrent programming is
hard, results often hard to predict and debugging very hard.
Or perhaps you simply need to add acceptsDocsOutOfOrder() to your
collector, returning false.
Either way, hard to see any evidence of a thread-safety problem in
Read the javadocs to understand the difference between commit() and
flush(). You need commit(), or close().
There are no hard and fast rules and it depends on how much data you
are indexing, how fast, how many searches you're getting and how up to
date they need to be. And how much you worry
The migration guide that came out with 4.0 is probably the best place to start.
http://lucene.apache.org/core/4_8_1/MIGRATE.html is from the current
release but probably hasn't changed since 4.0. There's also the
changes file with every release. And if you browse the list archives
I expect
The one that meets your requirements most easily will be the best.
If people will want to search for words in particular fields you'll
need to split it but if they only ever want to search across all
fields there's no point.
A common requirement is to want both, in which case you can split it
You'll have to reindex.
--
Ian.
On Mon, Jan 6, 2014 at 2:11 PM, manoj raj manojluc...@gmail.com wrote:
Hi,
I have stored fields. I want to delete a single field in all documents. Can
i do that without reindexing? if yes, is it costly operations..?
Thanks,
Manoj.
an NRTManager
class? In Lucene 4.5 I cannot find the class (missing a maven dependency?).
Can anyone point me to a working example?
Cheers,
Klaus
On Fri, Jan 3, 2014 at 11:49 AM, Ian Lea ian@gmail.com wrote:
You will indeed get poor performance if you commit for every doc. Can
you compromise and commit every, say, 1000 docs, or once every few
minutes, or whatever makes sense for your app.
Or look at lucene's near-real-time search features. Google Lucene
NRT for info.
Or use Elastic Search.
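The compromise might look like this sketch (batch size arbitrary; addDocument is cheap, commit is expensive because it fsyncs and writes a new index generation):

```java
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

public class BatchedIndexer {
    private static final int COMMIT_EVERY = 1000;  // tune for your app

    // Commit once per batch instead of once per document.
    public static void index(IndexWriter writer, List<Document> docs) throws Exception {
        int sinceCommit = 0;
        for (Document doc : docs) {
            writer.addDocument(doc);
            if (++sinceCommit >= COMMIT_EVERY) {
                writer.commit();
                sinceCommit = 0;
            }
        }
        writer.commit();  // make the tail of the batch durable too
    }
}
```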
--
How do you know it's not working? My favourite suggestion: post a
very small self-contained RAMDirectory based program or test case, or
maybe 2 in this case, for 3.6 and 4.3, that demonstrates the problem.
--
Ian.
On Fri, Nov 29, 2013 at 6:00 AM, VIGNESH S vigneshkln...@gmail.com wrote:
Hi,
Pasting that line into a chunk of code works fine for me, with 4.5
rather than 4.3 but I don't expect that matters. Have you got a) all
the right jars in your classpath and b) none of the wrong jars?
--
Ian.
On Wed, Nov 13, 2013 at 11:20 AM, Hang Mang gucko.gu...@googlemail.com wrote:
Hi
Have you set an analyzer when you create your IndexWriter?
--
Ian.
P.S. Please start new questions in new messages with sensible subjects.
On Mon, Nov 11, 2013 at 9:00 AM, Rohit Girdhar rohit.ii...@gmail.com wrote:
Hi
I was trying to use the lucene JAVA API to create an index. I am
still not sure what went wrong in using the other constructor
for TextField...
Thanks
PS: Sorry about that, didn't realize that while posting :( . Updated the
message subject now.
On Mon, Nov 11, 2013 at 10:00 PM, Ian Lea ian@gmail.com wrote:
Have you set an analyzer when you create
If you're using Solr you'd be better off asking this on the Solr list:
http://lucene.apache.org/solr/discussion.html.
You might also like to clarify what you want with regard to sentence
vs document. If you want to display the sentences of a matched doc,
surely you just do it: store what you
Boosting query clauses means "this clause is more important than
that clause" rather than "make the score for this search higher". I
use it for biblio searching when I want to search across multiple fields
and want matches in titles to be more important than matches in
blurbs. Amended version of
If you want to keep hyphens you could try WhitespaceAnalyzer. But
that may of course have knock on effects on other searches. Don't
forget to use the same analyzer for indexing and searching, unless
you're doing clever things.
An alternative is to create the queries directly in code, but you'll
I'd start with the simple approach of a stored field and only worry
about performance if you needed to. Field caching would likely help
if you did need to.
--
Ian.
On Mon, Oct 14, 2013 at 2:04 AM, Stephen GRAY stephen.g...@immi.gov.au wrote:
UNOFFICIAL
Hi everyone,
I'd appreciate some
Do some googling on leading wildcards and read things like
http://www.gossamer-threads.com/lists/lucene/java-user/175732 and pick
an option you like.
--
Ian.
On Mon, Oct 14, 2013 at 9:12 AM, nischal reddy
nischal.srini...@gmail.com wrote:
Hi,
I have problem with doing wild card search on
path.
TIA,
Nischal Y
On Mon, Oct 14, 2013 at 10:31 PM, Ian Lea ian@gmail.com wrote:
Seems to me that it should work. I suggest you show us a complete
self-contained example program that demonstrates the problem.
--
Ian.
On Mon, Oct 14, 2013 at 12:42 PM, nischal reddy
Looks like you can achieve most of what you want by using AND rather
than OR. I think that all the should/should not examples you give
will work if you use AND on your content field.
For ordering, I suggest you look at SpanNearQuery. That can consider
order and slop, the distance between the
Are you going to be caching and reusing the filters e.g. by
CachingWrapperFilter? The main benefit of filters is in reuse. It
takes time to build them in the first place, likely roughly equivalent
to running the underlying query although with variations as you
describe. Or are you saying that
With multiple fields of the same name vs a single field I doubt you'd
be able to tell the difference in performance or matching or scoring
in normal use. There may be some matching/ranking effect if you are
looking at, say, span queries across the multiple fields.
Try it out and see what
Looks like you've got some XML processing in there somewhere. Nothing
to do with lucene. This code:
public static void main(String[] _args) throws Exception {
QueryParser qp = new QueryParser(Version.LUCENE_44,
"x",
new StandardAnalyzer(Version.LUCENE_44));
for (String s : _args) {
) (!Character.isWhitespace(cn)).
My analyzer uses a lower case filter on top of the tokenizer. This works
perfectly in 3.6.
In 4.3 it creates problems with the offsets of tokens.
On Mon, Sep 30, 2013 at 8:21 PM, Ian Lea ian@gmail.com wrote:
Whenever someone says they are using
vigneshkln...@gmail.com wrote:
Ian,
Thanks for your reply..
I am facing the same problem if i use whiteSpaceTokenizer also.
My analyzer works perfect in case of Lucene 3.6.
Thanks and Regards
Vignesh Srinivasan
On Thu, Oct 3, 2013 at 3:23 PM, Ian Lea ian@gmail.com wrote:
Certainly sounds like
I'd write a shutdown method that calls close() in a controlled manner
and invoke it at 23:55. You could also call commit() at whatever
interval makes sense to you but if you carried on killing the JVM
you'd still be liable to lose any docs indexed since the last commit.
This is standard stuff
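A minimal shutdown-hook sketch (class name made up; assumes the default commit-on-close behaviour, so close() makes pending changes durable and releases the write lock):

```java
import org.apache.lucene.index.IndexWriter;

public class CleanShutdown {
    // Close the writer in a controlled manner instead of killing the JVM;
    // anything not committed when the process is killed is lost.
    public static void install(final IndexWriter writer) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                writer.close();   // commits pending changes, releases the lock
            } catch (Exception e) {
                e.printStackTrace();
            }
        }));
    }
}
```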
Are you sure it's not failing because adhoc != ad-hoc?
--
Ian.
On Thu, Oct 3, 2013 at 3:07 PM, VIGNESH S vigneshkln...@gmail.com wrote:
Hi,
I am Trying to do Multiphrase Query in Lucene 4.3. It is working Perfect
for all scenarios except the below scenario.
When I try to Search for a
because of that
Yes, as I suggested, you could search on your unique id and not index
if already present. Or, as Uwe suggested, call updateDocument instead
of add, again using the unique id.
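The updateDocument route, sketched (the "uid" field name is made up; updateDocument is an atomic "delete any doc with this term, then add"):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class Upsert {
    // Indexing the same uid twice leaves exactly one live copy.
    public static void upsert(IndexWriter writer, String uid, Document doc)
            throws Exception {
        doc.add(new StringField("uid", uid, Field.Store.YES));  // not analyzed
        writer.updateDocument(new Term("uid", uid), doc);
    }
}
```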
--
Ian.
On Tue, Oct 1, 2013 at 6:41 PM, gudiseashok gudise.as...@gmail.com wrote:
I am really sorry if something made
I'm not aware of a lucene rather than Solr or whatever tutorial. A
search for something like lucene sharding will get hits.
Why don't you want to use Solr or Katta or similar? They've already
done much of the hard work.
How much data are you talking about?
What are your master-master
milliseconds as unique keys are a bad idea unless you are 100% certain
you'll never be creating 2 docs in the same millisecond. And are you
saying the log record A1 from file a.log indexed at 14:00 will have
the same unique id as the same record from the same file indexed at
14:30 or will it be
I'm still a bit confused about exactly what you're indexing, when, but
if you have a unique id and don't want to add or update a doc that's
already present, add the unique id to the index and search (TermQuery
probably) for each one and skip if already present.
Can't you change the log
, whether I need to add any other parameter in
addition to this while indexing?
Is there any MultiPhraseQueryTest java file for Lucene 4.3? I checked in
Lucene branch and i was not able to find..Please kindly help.
On Thu, Sep 26, 2013 at 2:55 PM, Ian Lea ian@gmail.com wrote
{
break;
}
}
while (trm.next() != null);
}
On Mon, Sep 30, 2013 at 3:01 PM, Ian Lea ian@gmail.com wrote:
Whenever someone says something along the lines of "a search for
geoffrey not matching Geoffrey" the case difference
I use the code below to do something like this. Not exactly what you
want but should be easy to adapt.
public List<String> findTerms(IndexReader _reader,
String _field) throws IOException {
List<String> l = new ArrayList<String>();
Fields ff =
Is this OOM happening as part of your early morning optimize or at
some other point? By optimize do you mean IndexWriter.forceMerge(1)?
You really shouldn't have to use that. If the index grows forever
without it then something else is going on which you might wish to
report separately.
--
Ian.
It's reasonable that block-major won't find anything.
block-major-57 should match.
The split into block and major-57 will be because, from the javadocs
for ClassicTokenizer, Splits words at hyphens, unless there's a
number in the token, in which case the whole token is interpreted as a
product
with a PrefixQuery? I think
that would work.
--
Ian.
On Fri, Sep 20, 2013 at 1:48 PM, Ramprakash Ramamoorthy
youngestachie...@gmail.com wrote:
On Fri, Sep 20, 2013 at 6:11 PM, Ian Lea ian@gmail.com wrote:
It's reasonable that block-major won't find anything.
block-major-57 should match.
Thank you
Not exactly dumb, and I can't tell you exactly what is happening here,
but lucene stores some info at the index level rather than the field
level, and things can get confusing if you don't use the same Field
definition consistently for a field.
From the javadocs for
Are you talking about CompressionTools as in
http://lucene.apache.org/core/3_0_3/api/core/org/apache/lucene/document/CompressionTools.html?
They've long been superseded by a completely different, low-level,
transparent compression method.
Anyway, use them to compress stored fields, not fields
If you want to stick with the approach of multiple indexes you'll have
to add some logic to work round it.
Option 1.
Post merge, loop through all docs identifying duplicates and deleting
the one(s) you don't want.
Option 2.
Pre merge, read all indexes in parallel, identifying and deleting as
for lots of Web apps that use
resources like Lucene.
On Thu, Sep 5, 2013 at 12:05 PM, Ian Lea ian@gmail.com wrote:
I use a singleton class but there are other ways in tomcat. Can't
remember what - maybe application scope.
--
Ian.
On Thu, Sep 5, 2013 at 4:46 PM, David Miranda
Take a look at org.apache.lucene.search.SearcherManager.
From the javadocs: "Utility class to safely share IndexSearcher
instances across multiple threads, while periodically reopening."
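Typical usage, sketched (the wrapper class is made up; the two-argument SearcherManager constructor is the 6.x+ shape, earlier versions take an extra applyAllDeletes flag):

```java
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;

public class SharedSearcher {
    private final SearcherManager manager;

    public SharedSearcher(IndexWriter writer) throws Exception {
        // Tracks the writer; refreshes pick up near-real-time changes.
        this.manager = new SearcherManager(writer, new SearcherFactory());
    }

    public int count(Query q) throws Exception {
        manager.maybeRefresh();                     // cheap if nothing changed
        IndexSearcher searcher = manager.acquire(); // ref-counted: must release
        try {
            return searcher.count(q);
        } finally {
            manager.release(searcher);              // never close() an acquired searcher
        }
    }
}
```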
--
Ian.
On Thu, Sep 5, 2013 at 2:16 AM, David Miranda david.b.mira...@gmail.com wrote:
Hi,
I'm developing