Re: Lucene 4.0 MemoryIndex Bug?

2011-12-08 Thread Uwe Schindler
Hi, You mixed incompatible jar file versions of Lucene 4.0 modules. Try to recompile everything from source. Uwe -- Uwe Schindler H.-H.-Meier-Allee 63, 28213 Bremen http://www.thetaphi.de Stephen Howe wrote: I've been playing around with Lucene's MemoryIndex and anytime I

RE: IndexReader.openIfChanged Doesn't Work on MultiReader

2011-12-10 Thread Uwe Schindler
boolean parameter (which defaults to read-only mode). Reopening in R/W mode is not supported at all. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Uwe Schindler [mailto:u...@thetaphi.de] > Sent:

RE: Broken link in Lucene 3.5 JavaDoc?

2011-12-15 Thread Uwe Schindler
If you remove the useless CSS in the HTML it looks perfect in package.html! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Shai Erera [mailto:ser...@gmail.com] > Sent: Thursday, December 15,

RE: Broken link in Lucene 3.5 JavaDoc?

2011-12-15 Thread Uwe Schindler
really fine), leave an unformatted first sentence in the docs and then copy the plain HTML without CSS after it (removing the from the first title) Should I provide a patch? Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de

RE: Broken link in Lucene 3.5 JavaDoc?

2011-12-15 Thread Uwe Schindler
Yes, I could attach the patch there! Will you open it? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Shai Erera [mailto:ser...@gmail.com] > Sent: Thursday, December 15, 2011 1:47 PM > T

RE: Why is the old value still in the index

2011-12-16 Thread Uwe Schindler
Hi, > I'm adding documents to an index, at a later date I modify a document and > update the index, close the writer and open a new IndexReader. My > indexreader iterates over terms for that field and docFreq() returns one as I > would expect, however the iterator returns both the old value of the

RE: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

2011-12-16 Thread Uwe Schindler
Do you have NumericFields? If yes, how are they configured? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Thushara Wijeratna [mailto:thu...@gmail.com] > Sent: Saturday, December 17, 2011 12:2

RE: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

2011-12-16 Thread Uwe Schindler
provide us your full Java version (java -version) and ideally the full source code you use during indexing. The only chance you can get this Exception is by a JVM bug. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Messag

RE: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

2011-12-18 Thread Uwe Schindler
iew_bug.do?bug_id=5091921 (see also http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2011-March/00494 2.html) Otherwise, is all fine, if you remove the numeric field? The code you are using can never cause such behavior, this is extensively tested. Uwe - Uwe Schindler H.-H

RE: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

2011-12-19 Thread Uwe Schindler
> > ("valSize must be 32 or 64"); } typeAtt.setType((shift == 0) ? TOKEN_TYPE_FULL_PREC : TOKEN_TYPE_LOWER_PREC); posIncrAtt.setPositionIncrement((shift == 0) ? 1 : 0); shift += precisionStep;
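The quoted loop body advances `shift` by `precisionStep` until it reaches `valSize`, so the number of tokens per numeric value follows directly from those two numbers. A minimal sketch of that arithmetic, assuming nothing beyond what the quoted code shows; the class and method names are hypothetical and this is not Lucene's NumericTokenStream itself:

```java
import java.util.ArrayList;
import java.util.List;

public class PrecisionStepSketch {
    /** Lists the shift values a value of valSize bits would be indexed at. */
    static List<Integer> shifts(int valSize, int precisionStep) {
        if (valSize != 32 && valSize != 64) {
            throw new IllegalArgumentException("valSize must be 32 or 64");
        }
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<Integer> result = new ArrayList<>();
        // long counter avoids int overflow for very large precision steps
        for (long shift = 0; shift < valSize; shift += precisionStep) {
            result.add((int) shift);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(shifts(64, 16));                       // prints [0, 16, 32, 48]
        System.out.println(shifts(64, Integer.MAX_VALUE).size()); // prints 1
    }
}
```

A smaller step yields more tokens per value; a step at or above `valSize` yields a single full-precision token.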

RE: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

2011-12-19 Thread Uwe Schindler
flexible indexing talk done by me in Berlin, Barcelona or San Francisco). Lucene moved to binary terms in 4.0 and no longer uses character based terms, so the code is different. BytesRef is just a wrapper around a byte[]. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de

RE: Query that returns all docs that contain a field

2011-12-19 Thread Uwe Schindler
Hi, There is also a Query/Filter based on that FieldCache: o.a.l.search.FieldValueFilter, possibly wrapped with ConstantScoreQuery Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: M

RE: Lucene 4.0 questions, was: shift bug in possibly invalid use of NumericTokenStream

2011-12-19 Thread Uwe Schindler
case you should RTFM of BytesRef and related classes (possibly > > watch the flexible indexing talk done by me in Berlin, Barcelona or > > San Francisco). Lucene moved to binary terms in 4.0 and no longer uses > > character based terms, so the code is different. BytesRef is just

RE: inspecting chinese index using luke

2011-12-19 Thread Uwe Schindler
headers still track which thread you replied to and your question is "hidden" in that thread and gets less attention. It makes following discussions in the mailing list archives - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de >

Re: Query that returns all docs that contain a field

2011-12-20 Thread Uwe Schindler
Hi, No. But the corresponding filter/query does. The bits are just for lookup, if you already have a valid document. The remaining bits are undefined (like the rest of FieldCache). Uwe -- Uwe Schindler H.-H.-Meier-Allee 63, 28213 Bremen http://www.thetaphi.de Paul Taylor wrote: On 19/12

RE: Query that returns all docs that contain a field

2011-12-20 Thread Uwe Schindler
() directly, you have to countercheck IndexReader.isDeleted(), e.g.: Bits.get(doc) && !reader.isDeleted(doc); Uwe ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Paul Taylor [mailto:paul_t.
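The countercheck can be sketched without Lucene on the classpath. In this minimal sketch `java.util.BitSet` stands in for both the field's bits and the reader's deletion state; the class and method names are hypothetical, and only the `get(doc) && !isDeleted(doc)` condition comes from the message:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class FieldBitsSketch {
    /** Returns the ids of documents that have the field and are not deleted. */
    static List<Integer> liveDocsWithField(BitSet hasField, BitSet deleted, int maxDoc) {
        List<Integer> result = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            // the bit alone is not enough: a deleted document may still have it set
            if (hasField.get(doc) && !deleted.get(doc)) {
                result.add(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet hasField = new BitSet();
        hasField.set(0);
        hasField.set(2);
        hasField.set(3);
        BitSet deleted = new BitSet();
        deleted.set(2);
        System.out.println(liveDocsWithField(hasField, deleted, 4)); // prints [0, 3]
    }
}
```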

RE: Can't get a hit

2011-12-29 Thread Uwe Schindler
- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Cheng [mailto:zhoucheng2...@gmail.com] > Sent: Thursday, December 29, 2011 5:27 PM > To: java-user@lucene.apache.org > Subject: Can'

RE: Indexing product keys with and without spaces in them

2012-01-03 Thread Uwe Schindler
eywordTokenizer only for this field. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Christoph Kaser [mailto:lucene_l...@iconparc.de] > Sent: Tuesday, January 03, 2012 2:23 PM > To:

RE: Indexing product keys with and without spaces in them

2012-01-03 Thread Uwe Schindler
Hi, > Has somebody ever tried something like this? Is there a way to do this without > increasing the index to about 15 times (1+2+3+4+5) its original size? The index will not have 15 times the size as it is inverted index and only indexes the unique parts of your tokens. In most cases it will ha
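Why an inverted index grows with the number of unique terms rather than with the raw token count can be shown with a toy term dictionary. This is a minimal sketch with hypothetical names and made-up sample tokens, not Lucene's on-disk format; postings lists still grow with every document that contains a term:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvertedIndexSketch {
    /** Builds term -> sorted doc ids; each distinct term is stored only once. */
    static Map<String, TreeSet<Integer>> build(List<List<String>> docs) {
        Map<String, TreeSet<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String token : docs.get(docId)) {
                index.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("ab", "12", "ab12"),
                Arrays.asList("ab", "34", "ab34"),
                Arrays.asList("ab", "12", "ab12"));
        int tokens = docs.stream().mapToInt(List::size).sum();
        Map<String, TreeSet<Integer>> index = build(docs);
        System.out.println("tokens=" + tokens + " uniqueTerms=" + index.size());
        // prints tokens=9 uniqueTerms=5
    }
}
```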

RE: Boolean OR does not work as described

2012-01-03 Thread Uwe Schindler
el:13? schluessel:23?" (first two are MUST clauses, third is should). If you read this query using MUST/SHOULD you will understand what happens: This means the first 2 terms MUST be in the result, the third MIGHT be in the result. So it's not a union, it's something more complex :-
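The MUST/SHOULD reading can be made concrete with a small matcher over term sets. This is a minimal sketch with hypothetical names that models only matching, not scoring, and it is not BooleanQuery's implementation: every MUST term is required, and SHOULD terms are optional as soon as at least one MUST clause exists.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BooleanClauseSketch {
    /** True if the document satisfies the MUST clauses (and a SHOULD clause when no MUST exists). */
    static boolean matches(Set<String> docTerms, List<String> must, List<String> should) {
        if (!docTerms.containsAll(must)) {
            return false; // a missing MUST term always rejects the document
        }
        if (!must.isEmpty()) {
            return true; // SHOULD clauses are optional next to MUST clauses
        }
        for (String term : should) {
            if (docTerms.contains(term)) {
                return true; // pure SHOULD query: at least one clause has to match
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> must = Arrays.asList("a", "b");
        List<String> should = Arrays.asList("c");
        System.out.println(matches(new HashSet<>(Arrays.asList("a", "b")), must, should)); // prints true
        System.out.println(matches(new HashSet<>(Arrays.asList("c")), must, should));      // prints false
    }
}
```

The second call shows why the result is not a union: a document containing only the SHOULD term is rejected.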

RE: Boolean OR does not work as described

2012-01-03 Thread Uwe Schindler
Hi Hoss, Hey, nice blog article - it makes my previous mail obsolete :-) Explained perfectly! Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Chris Hostetter [mailto:hossman_luc...@fuc

RE: Comparing Indexing Speed of Lucene 3.5 and 4.0

2012-01-07 Thread Uwe Schindler
Hi, > > I mean my benchmarks show up > > to 300% improvement with 4.x versus older versions so something is > > weird ie. non-realistic here or there is a bug so lets figure this > > out. Can you profile your app and see if you find something suspicious? > > I'll try now and report back. > > It s

RE: any tips for upgrading Lucene 3.0.3 -> 3.5.0?

2012-01-19 Thread Uwe Schindler
Lucene 3.5 can read any index going back to 2.0. The IndexUpgrader is only needed to "forcefully" upgrade indexes for maximum performance and safe migration to Lucene 4.0 (that can only read indexes >= 3.0). - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaph

RE: any tips for upgrading Lucene 3.0.3 -> 3.5.0?

2012-01-19 Thread Uwe Schindler
> -Original Message- > From: earlh...@gmail.com [mailto:earlh...@gmail.com] On Behalf Of Earl > Hood > Sent: Friday, January 20, 2012 12:41 AM > To: java-user@lucene.apache.org > Subject: Re: any tips for upgrading Lucene 3.0.3 -> 3.5.0? > > On Thu, Jan 19, 201

RE: Lucene 4 getSpans not retrieving spans

2012-01-25 Thread Uwe Schindler
Hi, > Goofing off with my index, I ran across this example > http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a- > positional-match-in-lucene/ > for > using span queries to see what else is around a word that hits. Noticeably, > there's a nice getSpans(IndexReader) method th

RE: Query term counting, again...

2012-01-26 Thread Uwe Schindler
You have to take care that BooleanScorer2 is used, by requesting docsInOrder. Then it's very nice, I have a customer using this. The important thing is that your Collector returns the right thing :-) Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u

RE: Null scorer constructed by TermQuery

2012-01-27 Thread Uwe Schindler
UCENE-3442 But still: null is a valid return value for scorer()!!! It may return null, if no document can match this query. Means the term does not exist at all. Uwe ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original

RE: Null scorer constructed by TermQuery

2012-01-27 Thread Uwe Schindler
Lucene 3.5, so theoretically your > code should work on 3.5: > https://issues.apache.org/jira/browse/LUCENE-3442 > > But still: null is a valid return value for scorer()!!! It may return null, if no > document can match this query. Means the term does not exist at all. > > Uw

RE: deprecated optimize()!

2012-01-27 Thread Uwe Schindler
Hi, > After reading all about the renaming of optimize() and updating my Lucene > libraries to 3.4, I was surprised and confused by what I found. > > I have a 1 segment index (all files are named _1*.*) that had been created > with 3.0.1 code which had been optimized many times (all 3.0.1 code)

RE: deprecated optimize()!

2012-01-28 Thread Uwe Schindler
Hi, > > > The first time my code used the 3.4 libraries with version level set > > > to 3.4 and it tried to optimize() (still using this now deprecated > > > old call), the new code > > went wild! > > > It took up more memory than the heap was limited to, so I believe it > > > is taking > > > up s

RE: Does Fuzzy Search scores the same as Exact Match

2012-01-28 Thread Uwe Schindler
Hi, > -Original Message- > From: Paul Taylor [mailto:paul_t...@fastmail.fm] > Sent: Saturday, January 28, 2012 10:33 AM > To: 'java-user@lucene.apache.org' > Subject: Does Fuzzy Search scores the same as Exact Match > > All things being equal does a fuzzy match give the same score as an e

RE: Does Fuzzy Search scores the same as Exact Match

2012-01-28 Thread Uwe Schindler
- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Paul Taylor [mailto:paul_t...@fastmail.fm] > Sent: Saturday, January 28, 2012 11:01 AM > Cc: java-user@lucene.apache.org > Subject: Re: Do

RE: Does Fuzzy Search scores the same as Exact Match

2012-01-28 Thread Uwe Schindler
> > >> -Original Message- > > >> From: Paul Taylor [mailto:paul_t...@fastmail.fm] > > >> Sent: Saturday, January 28, 2012 10:33 AM > > >> To: 'java-user@lucene.apache.org' > > >> Subject: Does Fuzzy Search scores the same as Exact Match > > >> > > >> All things being equal does a fuzzy matc

RE: How to avoid filtering stop words like "IS" in StandardAnalyzer

2012-01-28 Thread Uwe Schindler
Right, but Collections.emptySet() should be used :-) - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Pedro Lacerda [mailto:pslace...@gmail.com] > Sent: Saturday, January 28, 2012 12:49 PM

RE: How to avoid filtering stop words like "IS" in StandardAnalyzer

2012-01-28 Thread Uwe Schindler
Or even better: CharArraySet.EMPTY_SET - sorry for noise. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Uwe Schindler [mailto:u...@thetaphi.de] > Sent: Saturday, January 28, 2012 12:52 PM

RE: How to avoid filtering stop words like "IS" in StandardAnalyzer

2012-01-29 Thread Uwe Schindler
Hi, If you want to disable *all* stop words, then CharArraySet.EMPTY_SET is the right choice. For performance reasons you should also use CharArraySet for non-empty stop words instead of simple HashSet. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u

RE: Null scorer constructed by TermQuery

2012-01-31 Thread Uwe Schindler
sses. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Michael Kazekin [mailto:michael.kaze...@mediainsight.info] > Sent: Tuesday, January 31, 2012 9:30 AM > To: java-user@lucene.apache.o

RE: Searching a string using lucene

2012-01-31 Thread Uwe Schindler
MemoryIndex only allows *one* document! So it is mostly for lookup if a term is contained in a document and where (used internally by the highlighter). - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- >

RE: too many boolean clauses

2012-02-01 Thread Uwe Schindler
her.search(queryParser.parse("content: (hello world)"), filter,...); Or you use the wrapped as ConstantScore and combine it with the query: BooleanQuery bq = new BooleanQuery(); bq.add(queryParser.parse("content: (hello world)"), BooleanClause.Occur.MUST); bq.add(wrapped, Bo

RE: Lucene 2.9.4 Wildcard Search, Boost and Sorting

2012-02-01 Thread Uwe Schindler
wildcard. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Lutz Fechner [mailto:lfech...@hubwoo.com] > Sent: Wednesday, February 01, 2012 11:42 AM > To: java-user@lucene.apache.org > Subjec

RE: Lucene appears to use memory maps after unmapping them

2012-02-01 Thread Uwe Schindler
have IR.getRefCount()==0). Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Trejkaz [mailto:trej...@trypticon.org] > Sent: Wednesday, February 01, 2012 2:33 AM > To: java-user@lucen

RE: Configure writer to write to FSDirectory?

2012-02-05 Thread Uwe Schindler
" space is outside your JVM heap, so it does not slow down the garbage collector. So be sure to *not* allocate too much heap space (-Xmx) to your search app, only the minimum needed to execute it and leave the rest of your RAM available for the OS kernel to manage FS cache. Uwe - Uwe Schindler H

RE: Configure writer to write to FSDirectory?

2012-02-06 Thread Uwe Schindler
Please review the following articles about NRT; absolutely instant updates that are visible as they are done are almost impossible (even with RAMDirectory): http://goo.gl/mzAHt http://goo.gl/5RoPx http://goo.gl/vSJ7x Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http

RE: Configure writer to write to FSDirectory?

2012-02-06 Thread Uwe Schindler
Hi Cheng, all pros and cons are explained in those articles written by Mike! As soon as there are harddisks in the game, there is a slowdown, what do you expect? If you need it faster, buy SSDs! :-) Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u

RE: Filter and IndexSearcher in Lucene 4.0 (trunk)

2012-02-10 Thread Uwe Schindler
What's the problem? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Hany Azzam [mailto:h...@eecs.qmul.ac.uk] > Sent: Friday, February 10, 2012 6:43 PM > To: java-user@lucene.apache.org

RE: Paid Job: Looking for a developer to create a small java application to extract url's from .fdt files

2012-02-13 Thread Uwe Schindler
Hi, > as for Trunk 4.x, I can't find the isDeleted(int) method. any one could tell me > why this method is removed? See MIGRATE.txt... Hint: AtomicReader.getLiveDocs() Uwe - To unsubscribe, e-mail: java-user-unsubscr...@lucene

RE: Merging results from two searches on two separate Searchers

2012-02-14 Thread Uwe Schindler
Scores are only compatible if the query is the same, which is not the case for you. So you cannot merge hits from different queries. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Trejkaz [mailt

RE: Merging results from two searches on two separate Searchers

2012-02-14 Thread Uwe Schindler
Hi, To merge TopDocs with "compatible scores", you can use the method TopDocs.merge(), new since Lucene 3.3. Just execute the query on the different indexes with the same top-docs count and then call this method. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaph
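The idea behind merging per-index top hits can be sketched in plain Java. This is a minimal sketch, not the implementation of TopDocs.merge(): the `Hit` class and `merge` method are hypothetical, and it simply concatenates the lists and re-sorts by descending score, breaking ties by index number.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TopHitsMergeSketch {
    /** One search hit: which index it came from, its doc id there, and its score. */
    static final class Hit {
        final int shard;
        final int doc;
        final float score;

        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }

        @Override
        public String toString() {
            return "shard" + shard + ":doc" + doc + "=" + score;
        }
    }

    /** Merges per-index hit lists into one global top-N list, highest score first. */
    static List<Hit> merge(int topN, List<List<Hit>> shardHits) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : shardHits) {
            all.addAll(hits);
        }
        // scores are only comparable because the same query ran on every index
        all.sort(Comparator.comparingDouble((Hit h) -> -h.score)
                .thenComparingInt(h -> h.shard));
        return new ArrayList<>(all.subList(0, Math.min(topN, all.size())));
    }

    public static void main(String[] args) {
        List<Hit> first = Arrays.asList(new Hit(0, 1, 3.0f), new Hit(0, 5, 1.0f));
        List<Hit> second = Arrays.asList(new Hit(1, 2, 2.5f), new Hit(1, 7, 0.5f));
        System.out.println(merge(3, Arrays.asList(first, second)));
        // prints [shard0:doc1=3.0, shard1:doc2=2.5, shard0:doc5=1.0]
    }
}
```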

RE: Empty numeric field

2012-02-15 Thread Uwe Schindler
separately indexed. So what do you mean by equal length? Why must this "length" be identical? The only suggestion is to index a "fake" placeholder value (like -1, infinity, NaN). If you only need it in the "stored" fields, just store it but don't index it. Uwe

RE: Empty numeric field

2012-02-15 Thread Uwe Schindler
alue and > max_value are offered. Further, we have to convert this value back into null > or > something empty, whereby the empty string value field integrates seamless as > it should (with respect to the exceptions ;) ) But maybe there is no other > possibility to take the max/min va

RE: Empty numeric field

2012-02-15 Thread Uwe Schindler
Hi again, I just have to remind you that sorting on multi-valued fields is not supported by Lucene! This has nothing to do with numeric, it just does not work and may throw other exceptions depending on the version you use. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http

RE: Short circuit AND or subquerying in lucene for performance

2012-02-15 Thread Uwe Schindler
> : Basically for queries such as field1:foo AND field2:*bar, I think it > : would be highly beneficial to restrict evaluation of the second field on > : the result of the first to avoid scanning the index in its entirety due > : to the leading wildcard. > > This is exactly how the BooleanQuery cl

RE: query for documents WITHOUT a field?

2012-02-16 Thread Uwe Schindler
Lucene 3.6 will have a FieldValueFilter that can be negated: Query q = new ConstantScoreQuery(new FieldValueFilter("field", true)); (see http://goo.gl/wyjxn) Lucene 3.5 does not yet have it; you can download 3.6 snapshots from Jenkins: http://goo.gl/Ka0gr - Uwe Schindler H.-H.-M

RE: query for documents WITHOUT a field?

2012-02-16 Thread Uwe Schindler
ngWrapperFilter with PrefixFilter instead. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Tim Eck [mailto:tim...@gmail.com] > Sent: Thursday, February 16, 2012 10:14 PM > To: java-user@lucene.apa

Re: query for documents WITHOUT a field?

2012-02-16 Thread Uwe Schindler
I already mentioned that pseudo NULL term, but the user asked for another solution... -- Uwe Schindler H.-H.-Meier-Allee 63, 28213 Bremen http://www.thetaphi.de Jamie Johnson wrote: Another possible solution is while indexing insert a custom token which is impossible to show up in the

RE: Counting all the hits with parallel searching

2012-02-19 Thread Uwe Schindler
the 2 million's result page, so pass a small number of top hits. To simply count all hits like you seem to do, there is a separate collector available: http://goo.gl/XsPVR ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -

RE: Hanging with fixed thread pool in the IndexSearcher multithread code

2012-02-19 Thread Uwe Schindler
execute new callables from within another running callable) Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Benson Margulies [mailto:bimargul...@gmail.com] > Sent: Monday, February 20, 201

RE: Multiple CFS files are generated

2012-02-20 Thread Uwe Schindler
in the MergePolicy passed to IndexWriterConfig. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Ganesh [mailto:emailg...@yahoo.co.in] > Sent: Monday, February 20, 2012 9:46 AM > T

RE: How to separate one index into multiple?

2012-02-20 Thread Uwe Schindler
more effective as it does not first copy the whole index, it just "merges" a subset of all documents using a FilterIndexReader to another directory. For your case, the filter could be a QueryWrapperFilter(TermQuery(new Term("category", categoryToFilter))). See http://goo.gl/v

RE: Is it possible to create an index with lucene core version 3.3+ by using Version.2_3 that I can then open an index with the original lucene core 2.3 version?

2012-02-20 Thread Uwe Schindler
'c Codec API, it would be eventually possible to write a "codec" that emits the very ancient index format, but that's not planned and up to the user to do this if he has time and money :-). - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMa

RE: Question about CustomScoreQuery

2012-02-21 Thread Uwe Schindler
multiplication). Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Dominika Puzio [mailto:dominika.pu...@gmail.com] > Sent: Tuesday, February 21, 2012 10:27 AM > To: java-user@luce

RE: Here a merge thread, there a merge thread ...

2012-02-25 Thread Uwe Schindler
long time). The MergePolicy simply tells under which conditions and how segments are merged. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Lance Norskog [mailto:goks...@gmail.com] > Sen

RE: StandardAnalyzer and Email Addresses

2012-02-26 Thread Uwe Schindler
StandardTokenizer). I am not sure why there is no Analyzer implementation already available, maybe Steven Rowe knows more. The trick with the phrase is of lower performance as it uses a PhraseQuery internally, which is more expensive. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de

RE: Date time as String or Numeric field

2012-02-28 Thread Uwe Schindler
token, but search with NumericRangeQuery is then as slow as a TermRangeQuery. Uwe ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Ganesh [mailto:emailg...@yahoo.co.in] > Sent: Tuesday, February 28,

RE: RE: Date time as String or Numeric field

2012-02-28 Thread Uwe Schindler
data is encoded on disk. Uwe ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Ganesh [mailto:emailg...@yahoo.co.in] > Sent: Tuesday, February 28, 2012 12:17 PM > To: java-user@lucene.apache.

RE: In Lucene 3.5 is it always better to not optimize indexes ?

2012-03-02 Thread Uwe Schindler
Why not simply forceMerge your index one time and compare with unoptimized? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Paul Taylor [mailto:paul_t...@fastmail.fm] > Sent: Friday, March 02,

RE: What replaces IndexReader.openIfChanged in Lucene 4.0?

2012-03-05 Thread Uwe Schindler
DirectoryReader.openIfChanged - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Benson Margulies [mailto:bimargul...@gmail.com] > Sent: Monday, March 05, 2012 4:54 PM > To: java-user@lucen

RE: Is Java 7 now safe with Lucene?

2012-03-06 Thread Uwe Schindler
one is explicitly tested with Java 7u1+ Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Chris Bamford [mailto:chris.bamf...@talktalk.net] > Sent: Tuesday, March 06, 2012 2:13 PM > T

RE: A little more CHANGES.txt help on terms(), please

2012-03-06 Thread Uwe Schindler
AtomicReader.fields() - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Benson Margulies [mailto:bimargul...@gmail.com] > Sent: Tuesday, March 06, 2012 2:50 PM > To: java-user@lucene.apache.o

RE: A little more CHANGES.txt help on terms(), please

2012-03-06 Thread Uwe Schindler
HANGES.txt help on terms(), please > > On Tue, Mar 6, 2012 at 8:56 AM, Uwe Schindler wrote: > > AtomicReader.fields() > > I went and read up AtomicReader in CHANGES.txt. Should I call > SegmentReader.getReader(IOContext)? > > I just posted a patch to CHANGES.txt to cl

RE: A little more CHANGES.txt help on terms(), please

2012-03-06 Thread Uwe Schindler
Hi, The recommended way to get an atomic reader from a composite reader is to use SlowCompositeReaderWrapper.wrap(reader). MultiFields is now purely internal. I think it's only public because the codecs package may need it, otherwise it should be pkg-private. ----- Uwe Schindler H.-H.-

RE: Problem with updating a document or TermQuery with current trunk

2012-03-06 Thread Uwe Schindler
String field is analyzed, but with KeywordTokenizer, so all should be fine. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Michael McCandless [mailto:luc...@mikemccandless.com] > Sent: Tuesd

RE: MemoryIndex "field must not be added more than once"

2012-03-07 Thread Uwe Schindler
Hi, Can you open a bug report in JIRA about this? The IndexWriter/IndexReader contract allows adding the same field several times (internally concatenating, but with a positionIncrement gap). If you append the fields before, phrase queries may behave differently. Uwe - Uwe Schindler H.-H

RE: More About NOT Optimizing

2012-03-08 Thread Uwe Schindler
Hi, > Interesting coincidence, just last night one of our in-house indexes must have > decided it could use some merging and dropped 5 segments (of ~30+) and 4-5 > GB (of a total ~20-25 GB). So it was great to see it in action. > > I'm in no hurry, but I'll be eventually looking into using TieredM

RE: lots of .cfs (compound files) in the index directory

2012-03-15 Thread Uwe Schindler
Hi, It will always commit before closing! But the result depends on how you call close: depending on the boolean parameter, the background merges can be interrupted, so it closes as soon as possible, leaving pending merges undone. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http

RE: lots of .cfs (compound files) in the index directory

2012-03-15 Thread Uwe Schindler
Close calls and always did call commit in 3.x? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Michael McCandless [mailto:luc...@mikemccandless.com] > Sent: Thursday, March 15, 2012 5:29 PM

RE: A key value field storing

2012-03-21 Thread Uwe Schindler
You can use a CustomScoreQuery wrapping your scored query to multiply the "confidence level" (as a DocValues field in Lucene trunk, or an indexed NumericField with precisionStep=Integer.MAX_VALUE using FieldCache) into the score. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 B

RE: NumericField exception java.lang.IllegalStateException: call set???Value() before usage in lucene 3.5

2012-03-27 Thread Uwe Schindler
Hi, Are you sure that you are not reusing the same NumericField instances across different threads? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: jianwen lou [mailto:loujan...@gmail.com]

RE: NumericField exception java.lang.IllegalStateException: call set???Value() before usage in lucene 3.5

2012-03-27 Thread Uwe Schindler
The bug mentioned in this link was a multithread bug (what I asked you). If you reuse Documents and Fields this can happen, otherwise not. This code is heavily tested and the code you sent cannot fail. Maybe it's different from the one you actually use? - Uwe Schindler H.-H.-Meier-Allee 63, D

RE: NumericField exception java.lang.IllegalStateException: call set???Value() before usage in lucene 3.5

2012-03-27 Thread Uwe Schindler
ntime Environment (build 1.6.0_25-b06) Java HotSpot(TM) > > 64-Bit Server VM (build 20.0-b11, mixed mode) > > > > > > > > On Tue, Mar 27, 2012 at 3:24 PM, Uwe Schindler wrote: > > > >> Hi, > >> > >> Are you sure that you are not reusing

RE: TVD, TVX and TVF files

2012-03-27 Thread Uwe Schindler
Maybe you only see CFS files? If this is the case, your index is in compound file format. In that case (the default), to get the raw files, disable compound files in the merge policy! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de

RE: weird multifile problems

2012-04-06 Thread Uwe Schindler
To enforce creation of CFS files, you have to set the CFS percentage to 100% (1.0) in the MergePolicy: http://goo.gl/X9pF3, http://goo.gl/QFKGf By default Lucene only created CFS files, if the segment size is larger than 10% of the whole index. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D

RE: weird multifile problems

2012-04-06 Thread Uwe Schindler
> To enforce creation of CFS files, you have to set the CFS percentage to 100% > (1.0) in the MergePolicy: http://goo.gl/X9pF3, http://goo.gl/QFKGf By default > Lucene only created CFS files, if the segment size is larger than 10% of the > whole index. Sorry, other way round of course. Large segme
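The corrected rule (small segments get a compound file, large ones do not, and a ratio of 1.0 forces CFS everywhere) can be written as a single predicate. This is a minimal sketch of that reading with hypothetical names; it assumes compound files are enabled in the first place and is not the merge policy's actual code:

```java
public class CfsRatioSketch {
    /** True if a segment of the given size would be written as a compound file. */
    static boolean useCompoundFile(long segmentBytes, long totalIndexBytes, double noCFSRatio) {
        if (noCFSRatio >= 1.0) {
            return true; // 100%: every segment becomes a CFS file
        }
        // segments larger than the given share of the whole index stay non-compound
        return segmentBytes <= noCFSRatio * totalIndexBytes;
    }

    public static void main(String[] args) {
        System.out.println(useCompoundFile(5, 100, 0.1));  // prints true
        System.out.println(useCompoundFile(50, 100, 0.1)); // prints false
        System.out.println(useCompoundFile(50, 100, 1.0)); // prints true
    }
}
```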

RE: Lucene 4 - POS and Syntactic Tagging

2012-04-10 Thread Uwe Schindler
he query and the indexing side, possibly with PerFieldAnalyzerWrapper to limit it to specific fields. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Mark McGuire [mailto:mmcgu...@hawk.iit.edu]

RE: [regression] Lucene 3.6.0 tells me "ant jflex" target does not exist

2012-04-17 Thread Uwe Schindler
Hi, You have to go into subdirectory "lucene/core"; there you can execute the jflex target. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Torsten Krah [mailto:tk...@fac

RE: Field value vs TokenStream

2012-04-18 Thread Uwe Schindler
is really needed (I prefer to have the stored and indexed fields completely separated with different field names; e.g. stored fields can also be used to store an XML file for search result display in the index that has nothing to do with the field used for retrieval, but tokenizing and indexing t

RE: Two questions on RussianAnalyzer

2012-04-19 Thread Uwe Schindler
> My questions are: 1) is this change by design (not a mistake) and > 2) is the only option to achieve the old behaviour to use > Version.LUCENE_30 for creating the analyzer? This is why this option is there!

RE: DisjunctionMaxQuery and scoring

2012-04-19 Thread Uwe Schindler
Hi, > I think > BooleanQuery bq = new BooleanQuery(false); doesn't quite accomplish the > desired "name IN (dick, rich)" scoring behavior. This is because (name:dick | > name:rich) with coord=false would score the 'document' "Dick Rich" higher > than "Rich" because the former has two term matches

RE: DisjunctionMaxQuery and scoring

2012-04-19 Thread Uwe Schindler
), but not add them at all, DisjunctionMaxQuery is fine. I think this is what Benson asked for. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Uwe Schindler [mailto:u...@thetaphi.de] > Sent: Frid
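
The difference between adding clause scores and taking only the best one can be sketched with plain arithmetic. The Java below does not use Lucene; the class name, method names and the per-term scores are hypothetical, made-up values. It assumes the documented DisjunctionMaxQuery rule: the maximum sub-score plus a tie-breaker multiplied by the remaining sub-scores.

```java
import java.util.Arrays;

// Illustration only: plain arithmetic, no Lucene classes involved.
class DisMaxSketch {

    // BooleanQuery-style combination without coord: clause scores are added up.
    static double sumScore(double[] clauseScores) {
        return Arrays.stream(clauseScores).sum();
    }

    // DisjunctionMaxQuery-style combination: best clause wins; the other
    // clauses contribute only through the tie-breaker factor.
    static double disMaxScore(double[] clauseScores, double tieBreaker) {
        double max = Arrays.stream(clauseScores).max().orElse(0.0);
        double sum = Arrays.stream(clauseScores).sum();
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        double[] dickRich = {0.5, 0.5}; // made-up scores: both terms match
        double[] rich = {0.5};          // made-up score: one term matches
        System.out.println("sum:    " + sumScore(dickRich) + " vs " + sumScore(rich));
        System.out.println("dismax: " + disMaxScore(dickRich, 0.0) + " vs " + disMaxScore(rich, 0.0));
    }
}
```

With a tie-breaker of 0.0 both documents get the same combined score in this toy setup, whereas summing ranks the two-term match higher.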

RE: lucene algorithm ?

2012-04-26 Thread Uwe Schindler
Hi, > I read the paper by Doug "Space optimizations for total ranking", > > since it was written a long time ago, I wonder what algorithms lucene uses > (regarding postings list traversal and score calculation, ranking) The algorithms described in that paper are still in use to merge posting li

Re: new feature in lucene3.6

2012-05-15 Thread Uwe Schindler
> Hello > I have a question about this new feature in lucene 3.6 : > " The QueryParser now interprets * as an open end for range queries. > Literal asterisks may be represented by quoting or escaping (i.e. \* or > "*") Custom QueryParser subclasses overriding getRangeQuery() will be > passed null f

RE: Memory question

2012-05-15 Thread Uwe Schindler
It mmaps the files into virtual memory if it runs on a 64 bit JVM. Because of that you see the mmapped CFS files. This is outside the Java heap and is all *virtual*; no RAM is explicitly occupied except the O/S cache. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
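
What "mmapped, outside the Java heap" means can be sketched with the JDK's own file mapping API. This is not Lucene's MMapDirectory code, only a minimal illustration using java.nio; the class name, method names and temp-file prefix are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustration only: maps a small temp file into virtual memory.
class MmapSketch {

    // Writes the bytes to a temp file and returns a read-only mapping of it.
    static MappedByteBuffer map(byte[] data) {
        try {
            Path file = Files.createTempFile("mmap-sketch", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, data);
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                // The mapping stays valid after the channel is closed.
                return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads through a duplicate so the caller's buffer position is untouched.
    static long sum(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate();
        long total = 0;
        while (view.hasRemaining()) {
            total += view.get();
        }
        return total;
    }

    public static void main(String[] args) {
        MappedByteBuffer buf = map(new byte[] {1, 2, 3});
        // isDirect() is true: the content is not stored in a heap byte[].
        System.out.println("direct=" + buf.isDirect() + " sum=" + sum(buf));
    }
}
```

The mapped region counts towards the process's virtual address space; the pages behind it are served from the operating system's file cache rather than from the heap sized with -Xmx.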

RE: Boosting numerical field

2012-05-19 Thread Uwe Schindler
You can use CustomScoreQuery with ValueSources. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Meeraj Kunnumpurath [mailto:meeraj.kunnumpur...@asyska.com] > Sent: Saturday, May 19, 2012 1:2

RE: IndexReader.deleteDocument(Term) in Lucene 3.6/4.0

2012-05-25 Thread Uwe Schindler
To change the behaviour of IndexReaders use FilterIndexReader, don't subclass IndexReader directly. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Nikolay Zamosenchuk [mailto:niko

Re: Directory, IndexInput and IndexOutput concurrency

2012-05-29 Thread Uwe Schindler
In addition, IndexInput.clone must create another view on the same file, usable from another thread. -- Uwe Schindler H.-H.-Meier-Allee 63, 28213 Bremen http://www.thetaphi.de Michael McCandless schrieb: Multiple threads are free to interact with Directory. But it will be only one thread
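
The idea of a clone being a second view on the same data with its own read position has a close analogue in the JDK, ByteBuffer.duplicate(). The sketch below uses only java.nio and is not Lucene's IndexInput; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Illustration only: two views over the same bytes, each with its own position.
class CloneViewSketch {

    // Returns {byte read via the duplicate, byte read via the original}.
    static int[] readFromTwoViews() {
        ByteBuffer original = ByteBuffer.wrap(new byte[] {10, 20, 30, 40});
        ByteBuffer view = original.duplicate(); // shares content, not position

        original.position(3);                   // seek in the original only
        int fromView = view.get();              // still reads at offset 0
        int fromOriginal = original.get();      // reads at offset 3
        return new int[] {fromView, fromOriginal};
    }

    public static void main(String[] args) {
        int[] r = readFromTwoViews();
        System.out.println("view read " + r[0] + ", original read " + r[1]);
    }
}
```

Because the positions are independent, each thread can be handed its own view and seek freely without disturbing the others, which is the property a custom clone implementation has to preserve.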

Re: my rangequery problem

2012-05-30 Thread Uwe Schindler
You don't need to rewrite queries, it's done automatically inside Lucene's search logic, just run the query with IndexSearcher.search. What Lucene rewrites queries to is internal and can change at any time. So what is the problem? -- Uwe Schindler H.-H.-Meier-Allee 63, 28213 B

Re: RAMDirectory unexpectedly slows

2012-06-04 Thread Uwe Schindler
will then utilize the remaining memory for caching. Please read docs of MMapDirectory and inform yourself about mmap in e.g. Wikipedia. Uwe -- Uwe Schindler H.-H.-Meier-Allee 63, 28213 Bremen http://www.thetaphi.de Cheng schrieb: Please shed more insight into the difference between JVM heap

RE: RAMDirectory unexpectedly slows

2012-06-04 Thread Uwe Schindler
This is managed by your operating system. In general OS kernels like Linux or Windows use all free memory to cache disk accesses. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Cheng [mailto:zhou

RE: Support for NumericRangeQuery in QueryParser

2012-06-12 Thread Uwe Schindler
yParser by overriding the factory method that creates range queries (I assume you are German and know this IT magazine). Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Jochen Hebbrecht [mailto:jo

RE: need to find locations of query hits in doc: works fine for regular text but not for phone numbers

2012-06-14 Thread Uwe Schindler
Just take the BooleanQuery returned by the QueryParser and get its clauses (sub-queries like TermQuery, PhraseQuery, other BooleanQuery...). By that you get all query components. In most cases some recursive instanceof checking for various Query subclasses can do this. Uwe - Uwe Schindler H
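
The recursive instanceof walk can be sketched as follows. To keep the example self-contained it uses small hypothetical stand-in types (Q, TermQ, PhraseQ, BoolQ) instead of Lucene's Query, TermQuery, PhraseQuery and BooleanQuery; with Lucene on the classpath the same shape would apply to the real classes and their accessors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: stand-in query types, NOT Lucene classes.
class QueryWalkSketch {

    interface Q {}

    static final class TermQ implements Q {
        final String text;
        TermQ(String text) { this.text = text; }
    }

    static final class PhraseQ implements Q {
        final List<String> words;
        PhraseQ(String... words) { this.words = Arrays.asList(words); }
    }

    static final class BoolQ implements Q {
        final List<Q> clauses;
        BoolQ(Q... clauses) { this.clauses = Arrays.asList(clauses); }
    }

    // Collects the term texts of all leaf queries, descending into nested booleans.
    static List<String> terms(Q query) {
        List<String> out = new ArrayList<>();
        collect(query, out);
        return out;
    }

    private static void collect(Q q, List<String> out) {
        if (q instanceof TermQ) {
            out.add(((TermQ) q).text);
        } else if (q instanceof PhraseQ) {
            out.addAll(((PhraseQ) q).words);
        } else if (q instanceof BoolQ) {
            for (Q clause : ((BoolQ) q).clauses) {
                collect(clause, out); // recurse into sub-queries
            }
        }
        // Any other query type is skipped in this sketch.
    }

    public static void main(String[] args) {
        Q q = new BoolQ(new TermQ("phone"), new BoolQ(new PhraseQ("555", "1234")));
        System.out.println(terms(q));
    }
}
```

Query types the walker does not recognise are silently ignored here; a real implementation would need a branch for every subclass the parser can produce.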
