Re: Exceptions during batch indexing

2014-11-10 Thread Peter Keegan
--Original Message- From: Peter Keegan > Sent: Thursday, November 6, 2014 3:21 PM > To: java-user > Subject: Exceptions during batch indexing > > > How are folks handling Solr exceptions that occur during batch indexing? > Solr (4.6) stops parsing the docs stream when an er

Exceptions during batch indexing

2014-11-06 Thread Peter Keegan
How are folks handling Solr exceptions that occur during batch indexing? Solr (4.6) stops parsing the docs stream when an error occurs (e.g. a doc with a missing mandatory field), and stops indexing. The bad document is not identified, so it would be hard for the client to recover by skipping over

Re: DocValues memory usage

2013-03-28 Thread Peter Keegan
the same experiment and got same result. Then I used per-field > codec with DiskDocValuesFormat, it works like DirectSource in 4.0.0, but > I'm not feeling confident with this usage. Anyone can say more about > removing DirectSource API? > > > > > > > >

DocValues memory usage

2013-03-26 Thread Peter Keegan
Inspired by this presentation of DocValues: http://www.slideshare.net/lucenerevolution/willnauer-simon-doc-values-column-stride-fields-in-lucene I decided to try them out in 4.2. I created a 1M document index with one DocValues field: BinaryDocValuesField conceptsDV = new BinaryDocValuesField("con

Re: PayloadNearQuery and AveragePayloadFunction

2012-02-03 Thread Peter Keegan
All term queries, including payload queries, deal only with words from the query that exist in a document. They don't know what other terms are in a matching document, due to the inverted nature of the index. Peter On Fri, Feb 3, 2012 at 11:50 AM, shyama wrote: > Hi Peter > Thanks for your repl

Re: PayloadNearQuery and AveragePayloadFunction

2012-02-03 Thread Peter Keegan
AveragePayloadFunction is just what it sounds like: return numPayloadsSeen > 0 ? (payloadScore / numPayloadsSeen) : 1; What values are you seeing returned from PayloadHelper.decodeFloat? Peter On Fri, Feb 3, 2012 at 4:13 AM, shyama wrote: > Hi Peter > I have checked payload associated with term
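
The averaging expression quoted in this post can be tried on its own, without Lucene on the classpath. This is only a sketch: the class and method names (`AveragePayloadSketch`, `averagePayload`) are made up for illustration, and only the return expression is taken from the post.

```java
// Hypothetical stand-alone sketch; this is not the Lucene class itself.
class AveragePayloadSketch {

    // Same expression as quoted in the post: the summed payload score divided
    // by the number of payloads seen, falling back to 1 when none were seen.
    static float averagePayload(int numPayloadsSeen, float payloadScore) {
        return numPayloadsSeen > 0 ? (payloadScore / numPayloadsSeen) : 1;
    }

    public static void main(String[] args) {
        // Four payloads whose decoded values sum to 10.
        System.out.println(averagePayload(4, 10.0f));
        // No payloads on the matching terms: the function returns the neutral value 1.
        System.out.println(averagePayload(0, 0.0f));
    }
}
```

If the average comes back as exactly 1 for every match, that may point to no payloads being seen at all rather than to the averaging itself.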

Re: PayloadNearQuery and AveragePayloadFunction

2012-02-02 Thread Peter Keegan
I don't quite follow what you're doing, but is it possible that your payloads are not on the desired terms when you indexed them? The first explanation shows that the matching document contained "luteinizing hormone" in both fields 'AbstractText' and 'AbstractTitle'. The average payload value was '

Re: Search within a sentence (revisited)

2011-07-26 Thread Peter Keegan
y to work on the > fixes. > > > > I can likely look at this later today. > > > > - Mark Miller > > lucidimagination.com > > > > On Jul 25, 2011, at 10:14 AM, Peter Keegan wrote: > > > >> Hi Mark, > >> > >> Sorry to bug you a

Re: Search within a sentence (revisited)

2011-07-25 Thread Peter Keegan
y(String text) { return new SpanTermQuery(new Term(field, text)); } public TermQuery makeTermQuery(String text) { return new TermQuery(new Term(field, text)); } } Peter On Thu, Jul 21, 2011 at 5:23 PM, Mark Miller wrote: > > I just uploaded a patch for 3X that will work for 3.2.

Re: Search within a sentence (revisited)

2011-07-21 Thread Peter Keegan
: > > > Yeah, it's off trunk - I'll submit a 3X patch in a bit - just have to > change that to an IndexReader I believe. > > > > - Mark > > > > On Jul 21, 2011, at 4:01 PM, Peter Keegan wrote: > > > >> Does this patch require th

Re: Search within a sentence (revisited)

2011-07-21 Thread Peter Keegan
atch to > https://issues.apache.org/jira/browse/LUCENE-777 > > Further tests may be needed though. > > - Mark > > > On Jul 21, 2011, at 9:28 AM, Peter Keegan wrote: > > > Hi Mark, > > > > Here is a unit test using a version of 'SpanWithinQuery&#x

Re: Search within a sentence (revisited)

2011-07-21 Thread Peter Keegan
, text)); } public TermQuery makeTermQuery(String text) { return new TermQuery(new Term(field, text)); } } Peter On Wed, Jul 20, 2011 at 9:22 PM, Mark Miller wrote: > > On Jul 20, 2011, at 7:44 PM, Mark Miller wrote: > > > > > On Jul 20, 2011, at 11:27 AM, Peter Keegan wrote:

Re: Search within a sentence (revisited)

2011-07-20 Thread Peter Keegan
nto sentences and put those in a multi-valued field > and then search that. > > On Wed, 20 Jul 2011 11:27:38 -0400, Peter Keegan > wrote: > > I have browsed many suggestions on how to implement 'search within a > > sentence', but all seem to have drawbacks. For example

Search within a sentence (revisited)

2011-07-20 Thread Peter Keegan
I have browsed many suggestions on how to implement 'search within a sentence', but all seem to have drawbacks. For example, from http://lucene.472066.n3.nabble.com/Issue-with-sentence-specific-search-td1644352.html#a1645072 Steve Rowe writes: -- One common technique, instead of using a l

Re: how to index large number of files?

2010-10-22 Thread Peter Keegan
> running eclipse with -Xmx2G parameter. This only affects the Eclipse JVM, not the JVM launched by Eclipse to run your application. Did you add -Xmx2G to the 'VM arguments' of your Debug or Run configuration? Peter On Thu, Oct 21, 2010 at 3:26 PM, Sahin Buyrukbilen < sahin.buyrukbi...@gmail.com
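
One way to confirm which heap limit the application's JVM actually received is to print it from inside the application. The sketch below uses only `Runtime.getRuntime().maxMemory()`; the class name `HeapCheckSketch` is made up for illustration.

```java
// Hypothetical helper: reports the maximum heap of the JVM it runs in.
class HeapCheckSketch {

    // maxMemory() returns the limit in bytes; convert it to megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMb());
    }
}
```

Run from the same Debug or Run configuration as the indexer, this should report something close to 2048 MB once -Xmx2G is in that configuration's VM arguments.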

Re: Relevancy Practices

2010-05-05 Thread Peter Keegan
levant? How formal was that > process? > > -Grant > > On May 3, 2010, at 11:08 AM, Peter Keegan wrote: > > > We discovered very soon after going to production that Lucene's scores > were > > often 'too precise'. For example, a page of 25 results may ha

Re: Relevancy Practices

2010-05-03 Thread Peter Keegan
We discovered very soon after going to production that Lucene's scores were often 'too precise'. For example, a page of 25 results may have several different score values, and all within 15% of each other, but to the end user all 25 results were equally relevant. Thus we wanted the secondary sort f

Re: Combining TopFieldCollector with custom Collector

2010-03-12 Thread Peter Keegan
ier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: Peter Keegan [mailto:peterlkee...@gmail.com] > > Sent: Thursday, March 11, 2010 9:41 PM > > To: java-user@lucene.apache.org > > Subject: R

Re: Combining TopFieldCollector with custom Collector

2010-03-11 Thread Peter Keegan
I want the TFC to do all the cool things it does like custom sorting, saving the field values, max score, etc. I suppose the custom Collector could explicitly delegate all TFC's methods, but this doesn't seem right. Peter On Thu, Mar 11, 2010 at 3:40 PM, Peter Keegan wrote: > Yes

Re: Combining TopFieldCollector with custom Collector

2010-03-11 Thread Peter Keegan
Then in each of Collectors methods that you implement, do your own > stuff (setting the bit) but also then call tfc.XXX (eg tfc.collect). > > That should work? > > Mike > > On Thu, Mar 11, 2010 at 2:57 PM, Peter Keegan > wrote: > > Yes. Could you give me a hint on how t
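
The delegation described here can be sketched without Lucene. The `DocCollector` interface below is a hypothetical stand-in with a single method; Lucene's real `Collector` has further methods (I believe `setScorer`, `setNextReader` and `acceptsDocsOutOfOrder` in 2.9), each of which would be forwarded to the wrapped collector in the same way.

```java
import java.util.BitSet;

class DelegatingCollectorSketch {

    // Hypothetical stand-in for a collector API, reduced to one method.
    interface DocCollector {
        void collect(int doc);
    }

    // Records every collected doc ID in a BitSet, then forwards the call
    // to the wrapped collector so its own bookkeeping still happens.
    static final class BitSetCollector implements DocCollector {
        private final DocCollector delegate;
        private final BitSet bits = new BitSet();

        BitSetCollector(DocCollector delegate) {
            this.delegate = delegate;
        }

        @Override
        public void collect(int doc) {
            bits.set(doc);          // our own work
            delegate.collect(doc);  // then delegate
        }

        BitSet bits() {
            return bits;
        }
    }
}
```

With a real multi-segment index, the doc IDs passed to `collect` are segment-relative, so the wrapper would also need to add the current docBase before setting the bit.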

Re: Combining TopFieldCollector with custom Collector

2010-03-11 Thread Peter Keegan
Yes. Could you give me a hint on how to delegate? On Thu, Mar 11, 2010 at 2:50 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > Can you make your own collector and then just delegate internally to TFC? > > Mike > > On Thu, Mar 11, 2010 at 2:30 PM, Peter Keega

Combining TopFieldCollector with custom Collector

2010-03-11 Thread Peter Keegan
Is it possible to issue a single search that combines a TopFieldCollector (MultiComparatorScoringMaxScoreCollector) with a custom Collector? The custom Collector just collects the doc IDs into a BitSet (or DocIdSet). The collect() methods of the various TopFieldCollectors cannot be overridden. Tha

Re: IndexWriter.getReader.getVersion behavior

2010-02-26 Thread Peter Keegan
Can IW.waitForMerges be called between 'prepareCommit' and 'commit'? That's when the app calls 'getReader' to create external data. Peter On Fri, Feb 26, 2010 at 3:15 PM, Peter Keegan wrote: > Great, I'll give it a try. > Thanks! > > &

Re: IndexWriter.getReader.getVersion behavior

2010-02-26 Thread Peter Keegan
call .getReader().getVersion(), then close & > open the writer, I think (but you better test to be sure!) the next > .getReader().getVersion() should always match. > > Mike > > On Fri, Feb 26, 2010 at 2:40 PM, Peter Keegan > wrote: > > Is there a way for the applicat

Re: IndexWriter.getReader.getVersion behavior

2010-02-26 Thread Peter Keegan
t for BG merges to > finish, but IW.close does (by default), this means you'll pick up an > extra version whenever a merge is running when you call close. > > Mike > > On Thu, Feb 25, 2010 at 2:52 PM, Peter Keegan > wrote: > > I'm pretty sure this output oc

Re: IndexWriter.getReader.getVersion behavior

2010-02-25 Thread Peter Keegan
[Indexer]: commit: already prepared IW 10 [Indexer]: commit: pendingCommit != null IW 10 [Indexer]: commit: wrote segments file "segments_e" IFD [Indexer]: now checkpoint "segments_e" [4 segments ; isCommit = true] IFD [Indexer]: deleteCommits: now decRef commit "segmen

Re: IndexWriter.getReader.getVersion behavior

2010-02-25 Thread Peter Keegan
I've reproduced this and I have a bunch of infoStream log files. Since the messages have no timestamps, it's hard to tell where the relevant entries are. What should I be looking for? Peter On Mon, Feb 22, 2010 at 3:58 PM, Peter Keegan wrote: > I'm pretty sure there are f

Re: IndexWriter.getReader.getVersion behavior

2010-02-22 Thread Peter Keegan
ote: > Well I'm at a loss then. The version should only increment on commit. > > Can you make it all happen when infoStream is on, and post back? > > Mike > > On Mon, Feb 22, 2010 at 12:35 PM, Peter Keegan > wrote: > > Only one writer thread and one writer proc

Re: IndexWriter.getReader.getVersion behavior

2010-02-22 Thread Peter Keegan
s. > > It's only on prepareCommit (or, commit, if you didn't first prepare, > since that will call prepareCommit internally) that this version > should increase. > > Is there only 1 thread doing this? > > Oh, and, are you passing false for autoCommit? > > Mike

IndexWriter.getReader.getVersion behavior

2010-02-22 Thread Peter Keegan
Using Lucene 2.9.1, I have the following pseudocode which gets repeated at regular intervals: 1. FSDirectory dir = FSDirectory.open(java.io.File); 2. dir.setLockFactory(new SingleInstanceLockFactory()); 3. IndexWriter writer = new IndexWriter(dir, Analyzer, false, maxFieldLen) 4. writer.getReader(

Re: PayloadNearSpanScorer explain method

2010-02-22 Thread Peter Keegan
Patch is in JIRA: LUCENE-2272 On Wed, Feb 17, 2010 at 8:40 PM, Peter Keegan wrote: > Yes, I will provide a patch. Our new proxy server has broken my access to > the svn repository, though :-( > > > On Tue, Feb 16, 2010 at 1:12 PM, Grant Ingersoll wrote: > >> That s

Re: PayloadNearSpanScorer explain method

2010-02-17 Thread Peter Keegan
Yes, I will provide a patch. Our new proxy server has broken my access to the svn repository, though :-( On Tue, Feb 16, 2010 at 1:12 PM, Grant Ingersoll wrote: > That sounds reasonable. Patch? > > On Feb 15, 2010, at 10:29 AM, Peter Keegan wrote: > > > The &

Re: Can you use reduced sized test indexes to predict performance gains for a larger index?

2010-02-15 Thread Peter Keegan
Same experience here as Tom. Disk I/O becomes the bottleneck with large indexes (or multiple shards per server) and less memory. Frequent updates to indexes can make the I/O bottleneck worse. Peter On Mon, Feb 15, 2010 at 2:17 PM, Tom Burton-West wrote: > > Hi Chris, > > In our experience with larg

PayloadNearSpanScorer explain method

2010-02-15 Thread Peter Keegan
The 'explain' method in PayloadNearSpanScorer assumes the AveragePayloadFunction was used. I don't see an easy way to override this because 'payloadsSeen' and 'payloadScore' are private/protected. It seems like the 'PayloadFunction' interface should have an 'explain' method that the Scorer could ca

Re: searchWithFilter bug?

2009-12-04 Thread Peter Keegan
il.com> wrote: > Peter, which filter do you use, do you respect the IndexReaders > maxDoc() and the docBase? > > simon > > On Fri, Dec 4, 2009 at 4:47 PM, Peter Keegan > wrote: > > I think the Filter's docIdSetIterator is using the top level reader for > each &

Re: searchWithFilter bug?

2009-12-04 Thread Peter Keegan
mal case). > So I'm not [yet] seeing where the issue is... > > Can you boil it down to a smallish test case? > > Mike > > On Fri, Dec 4, 2009 at 10:32 AM, Peter Keegan > wrote: > > I'm having a problem with 'searchWithFilter' on Lucene 2.9.1. The F

searchWithFilter bug?

2009-12-04 Thread Peter Keegan
I'm having a problem with 'searchWithFilter' on Lucene 2.9.1. The Filter wraps a simple BitSet. When doing a 'MatchAllDocs' query with this filter, I get only a subset of the expected results, even accounting for deletes. The index has 10 segments. In IndexSearcher->searchWithFilter, it looks like

Re: Use of AllTermDocs with custom scorer

2009-11-17 Thread Peter Keegan
o something else. Thanks, Peter On Tue, Nov 17, 2009 at 11:51 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Tue, Nov 17, 2009 at 8:58 AM, Peter Keegan > wrote: > > The external data is just an array of fixed-length records, one for each > > Lucene

Re: Use of AllTermDocs with custom scorer

2009-11-17 Thread Peter Keegan
ader->docBase map lookup once when the custom scorer is created? No need to access the map for every doc this way. Peter On Tue, Nov 17, 2009 at 8:58 AM, Peter Keegan wrote: > The external data is just an array of fixed-length records, one for each > Lucene document. Indexes are updated a

Re: Use of AllTermDocs with custom scorer

2009-11-17 Thread Peter Keegan
ctor finds the docBase. Thanks, Peter On Tue, Nov 17, 2009 at 5:49 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Mon, Nov 16, 2009 at 6:38 PM, Peter Keegan > wrote: > > >>Can you remap your external data to be per segment? > > > > Th

Re: Use of AllTermDocs with custom scorer

2009-11-16 Thread Peter Keegan
your own map from SegmentReader -> > docBase, by calling IndexReader.getSequentialSubReaders() and stepping > through adding up the maxDoc. Then, in your search, you can lookup > the SegmentReader you're working on to get the docBase? > > Mike > > On Mon, Nov 16, 2009
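
The docBase bookkeeping suggested here amounts to a running sum. In the sketch below, the `int[] maxDocs` array is an assumption standing in for the `maxDoc()` values of the sub-readers in order, and the class and method names are made up for illustration.

```java
// Hypothetical sketch of the docBase arithmetic, without Lucene readers.
class DocBaseSketch {

    // For each segment, the docBase is the sum of maxDoc over all earlier segments.
    static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int base = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = base;
            base += maxDocs[i];
        }
        return bases;
    }

    // Translate a segment-relative doc ID into an index-wide doc ID.
    static int toIndexWideDocId(int[] bases, int segment, int segmentDocId) {
        return bases[segment] + segmentDocId;
    }

    public static void main(String[] args) {
        int[] bases = docBases(new int[] {100, 50, 25});
        // Doc 7 of the third segment, which follows 100 + 50 earlier docs.
        System.out.println(toIndexWideDocId(bases, 2, 7));
    }
}
```

Computing the bases once, when the scorer or comparator is created, avoids a map lookup per document.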

Re: Use of AllTermDocs with custom scorer

2009-11-16 Thread Peter Keegan
The same thing is occurring in my custom sort comparator. The ScoreDocs passed to the 'compare' method have docIds that seem to be relative to the segment. Is there any way to translate these into index-wide docIds? Peter On Mon, Nov 16, 2009 at 2:06 PM, Peter Keegan wrote: >

Re: Use of AllTermDocs with custom scorer

2009-11-16 Thread Peter Keegan
I forgot to mention that this is with V2.9.1 On Mon, Nov 16, 2009 at 1:39 PM, Peter Keegan wrote: > I have a custom query object whose scorer uses the 'AllTermDocs' to get all > non-deleted documents. AllTermDocs returns the docId relative to the > segment, but I need the a

Use of AllTermDocs with custom scorer

2009-11-16 Thread Peter Keegan
I have a custom query object whose scorer uses the 'AllTermDocs' to get all non-deleted documents. AllTermDocs returns the docId relative to the segment, but I need the absolute (index-wide) docId to access external data. What's the best way to get the unique, non-deleted docId? Thanks, Peter

Re: building lucene-core from source

2009-11-09 Thread Peter Keegan
ficial source, it doesn't mean you will create something > identical to the official jars that were released. > > -- > - Mark > > http://www.lucidimagination.com > > > > Peter Keegan wrote: > > The -dev version is confusing when it's the target of a build

Re: building lucene-core from source

2009-11-09 Thread Peter Keegan
rmula is always in flux - we likely hard coded the > change in 2.9.0 when releasing - we likely won't again in the future. > Some discussion about it came up recently on the list. > > > -- > - Mark > > http://www.lucidimagination.com > > > Peter Keegan wrote: >

Re: building lucene-core from source

2009-11-09 Thread Peter Keegan
it during build), use "ant -Dversion=2.9.1" > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: Peter Keegan [mailto:peterlkee...@gmail

building lucene-core from source

2009-11-09 Thread Peter Keegan
I know this has been asked before, but I couldn't find the thread. The jar file produced from a build of 2.9.0 is 'lucene-core-2.9.jar'. For 2.9.1, it is 'lucene-core-2.9.1-dev.jar'. When does the '-dev' get removed? Peter

Re: 2 phase commit with external data

2009-11-08 Thread Peter Keegan
>Are you using Lucene 2.9? Yes Peter On Sun, Nov 8, 2009 at 6:23 PM, Peter Keegan wrote: > Here is some stand-alone code that reproduces the problem. There are 2 > classes. jvm1 creates the index, jvm2 reads the index. The system console > input is used to synchronize the 4 ste

Re: 2 phase commit with external data

2009-11-08 Thread Peter Keegan
gotten "true" back from isCurrent. > You're sure there were no intervening calls to IndexWriter.commit? > Are you using Lucene 2.9? If not, you have to make sure autoCommit > is false when opening the IndexWriter. > > Mike > > On Fri, Nov 6, 2009 at 2:46 PM, Pe

Re: IO exception during merge/optimize

2009-10-29 Thread Peter Keegan
:49 PM, Mark Miller wrote: > Thanks a lot Peter! Really appreciate it. > > Peter Keegan wrote: > > Mark, > > > > With 1.9G, I had to increase the JVM heap significantly (to 8G) to avoid > > paging and GC hits. Here is a table comparing indexing times, optimizing &

Re: IO exception during merge/optimize

2009-10-29 Thread Peter Keegan
> with that, at over a gig it starts to page and the performance gets hit. > I'd love to see what kind of benefit you see going from around a gig to > just under 2. > > Peter Keegan wrote: > > Btw, this 2.9 indexer is fast! I indexed 4Gb (1.07 million docs) with >

Re: IO exception during merge/optimize

2009-10-29 Thread Peter Keegan
Btw, this 2.9 indexer is fast! I indexed 4Gb (1.07 million docs) with optimization in just under 30 min. I used setRAMBufferSizeMB=1.9G Peter On Thu, Oct 29, 2009 at 3:46 PM, Peter Keegan wrote: > A handful of the source documents did contain the U+ character. The > patch from *LUCEN

Re: IO exception during merge/optimize

2009-10-29 Thread Peter Keegan
x shows the problem? > > OK I'm attaching more mods. Can you re-run your CheckIndex? It will > produce an enormous amount of output, but if you can excise the few > lines around when that warning comes out & post back that'd be great. > > Mike > > On Wed, Oct 28,

Re: IO exception during merge/optimize

2009-10-28 Thread Peter Keegan
with this index. Peter On Wed, Oct 28, 2009 at 11:29 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Wed, Oct 28, 2009 at 10:58 AM, Peter Keegan > wrote: > > The only change I made to the source code was the patch for > PayloadNearQuery > > (LUCENE-1986

Re: IO exception during merge/optimize

2009-10-28 Thread Peter Keegan
nd post somewhere that I could download? That's the smallest > of the broken segments I think. > > I don't need the full IW output just yet, thanks. > > Mike > > On Wed, Oct 28, 2009 at 10:21 AM, Peter Keegan > wrote: > > Yes, I used JDK 1.6.0_16 when running C

Re: IO exception during merge/optimize

2009-10-28 Thread Peter Keegan
My last post got truncated - probably exceeded max msg size. Let me know if you want to see more of the IndexWriter log. Peter

Re: IO exception during merge/optimize

2009-10-27 Thread Peter Keegan
) detected WARNING: would write new segments file, and 663862 documents would be lost, if -fix were specified Do the unit tests create multi-segment indexes? Peter On Tue, Oct 27, 2009 at 3:08 PM, Peter Keegan wrote: > It's reproducible with a large no. of docs (>1 million), but no

Re: IO exception during merge/optimize

2009-10-27 Thread Peter Keegan
dless.com> wrote: > This is odd -- is it reproducible? > > Can you narrow it down to a small set of docs that when indexed > produce a corrupted index? > > If you attempt to optimize the index, does it fail? > > Mike > > On Tue, Oct 27, 2009 at 1:40 PM, Peter Kee

Re: IO exception during merge/optimize

2009-10-27 Thread Peter Keegan
_3.nrm" IFD [Indexer]: delete "_4.tii" IFD [Indexer]: delete "_0.nrm" IFD [Indexer]: delete "_5.fnm" IFD [Indexer]: delete "_1.tis" IFD [Indexer]: delete "_0.fnm" IFD [Indexer]: delete "_2.prx" IFD [Indexer]: delete "_6.tii" IFD

Re: IO exception during merge/optimize

2009-10-27 Thread Peter Keegan
0 JREs. > Yes, I remember this problem - that's why we stayed at _03 Thanks. > > Mike > > On Tue, Oct 27, 2009 at 10:00 AM, Peter Keegan > wrote: > > After rebuilding the corrupted indexes, the low disk space exception is > now > > occurring as expected. Sorr

Re: IO exception during merge/optimize

2009-10-27 Thread Peter Keegan
Clarification: this CheckIndex is on the index from which the merge/optimize failed. Peter On Tue, Oct 27, 2009 at 10:07 AM, Peter Keegan wrote: > Running CheckIndex after the IOException did produce an error in a term > frequency: > > Opening index @ D:\mnsavs\lresumes3\lresumes3.l

Re: IO exception during merge/optimize

2009-10-27 Thread Peter Keegan
CHANCE TO CTRL+C! 5... 4... 3... 2... 1... Writing... OK Wrote new segments file "segments_5" Peter On Tue, Oct 27, 2009 at 10:00 AM, Peter Keegan wrote: > After rebuilding the corrupted indexes, the low disk space exception is now > occurring as expected. Sorry for

Re: IO exception during merge/optimize

2009-10-27 Thread Peter Keegan
"_0.tis" IFD [Indexer]: delete "_0.fnm" IFD [Indexer]: delete "_0.tii" IFD [Indexer]: delete "_0.frq" IFD [Indexer]: delete "_0.fdx" IFD [Indexer]: delete "_0.prx" IFD [Indexer]: delete "_0.fdt" Peter On Mon, Oct 26, 2009 at 3:

Re: IO exception during merge/optimize

2009-10-26 Thread Peter Keegan
On Mon, Oct 26, 2009 at 3:00 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Mon, Oct 26, 2009 at 2:55 PM, Peter Keegan > wrote: > > On Mon, Oct 26, 2009 at 2:50 PM, Michael McCandless < > > luc...@mikemccandless.com> wrote: > > > >> O

Re: IO exception during merge/optimize

2009-10-26 Thread Peter Keegan
On Mon, Oct 26, 2009 at 2:50 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Mon, Oct 26, 2009 at 10:44 AM, Peter Keegan > wrote: > > Even running in console mode, the exception is difficult to interpret. > > Here's an exception that I think oc

Re: IO exception during merge/optimize

2009-10-26 Thread Peter Keegan
at org.apache.lucene.index.IndexWriter.addIndexesNoOptimize(IndexWriter.java:3695) I guess this is just the nature of a low disk space condition on Windows. I expected to see a 'no space left on device' IO exception. Peter On Sun, Oct 25, 2009 at 8:54 PM, Peter Keegan wrote: > The environm

Re: IO exception during merge/optimize

2009-10-25 Thread Peter Keegan
> On Sun, Oct 25, 2009 at 7:35 PM, Peter Keegan > wrote: > >>Did you get any traceback printed at all? > > no, only what I reported. > > > >>Did you see any BG thread exceptions on wherever your System.err is > > directed to? > > The jvm was running a

Re: IO exception during merge/optimize

2009-10-25 Thread Peter Keegan
ny traceback printed at all? It should include one > traceback into Lucene's optimized method, and then another (under > "caused by") showing the exception from the BG merge thread. > > Did you see any BG thread exceptions on wherever your System.err is > directed

Re: IO exception during merge/optimize

2009-10-24 Thread Peter Keegan
btw, this is with Lucene 2.9 On Sat, Oct 24, 2009 at 5:20 PM, Peter Keegan wrote: > I'm sometimes seeing the following exception from an operation that does a > merge and optimize: > java.io.IOException: background merge hit exception: _0:C1082866 _1:C79 > into _2 [optimize

IO exception during merge/optimize

2009-10-24 Thread Peter Keegan
I'm sometimes seeing the following exception from an operation that does a merge and optimize: java.io.IOException: background merge hit exception: _0:C1082866 _1:C79 into _2 [optimize] [mergeDocStores] I'm pretty sure that it's caused by a temporary low disk space condition, but I'd like to be ab

Re: NPE in NearSpansUnordered

2009-10-16 Thread Peter Keegan
I can reproduce this with a unit test - will post to JIRA shortly. Peter On Fri, Oct 16, 2009 at 8:06 AM, Peter Keegan wrote: > next() is called in PayloadNearQuery->setFreqCurrentDoc: > super.setFreqCurrentDoc(); > But, I think it should be called before 'getPayloads'

Re: NPE in NearSpansUnordered

2009-10-16 Thread Peter Keegan
that it fails? > > -Grant > > > > > On Oct 15, 2009, at 1:28 PM, Peter Keegan wrote: > > The query is: >> +payloadNear([spanNear([contents:insurance, contents:agent], 1, >> false), >> spanNear([contents:winston, contents:salem], 1, false)], 10, false)

Re: NPE in NearSpansUnordered

2009-10-15 Thread Peter Keegan
is happened on) would be greatly appreciated. > > -Yonik > http://www.lucidimagination.com > > > On Thu, Oct 15, 2009 at 1:17 PM, Peter Keegan > wrote: > > I'm using Lucene 2.9 and sometimes get a NPE in NearSpansU

NPE in NearSpansUnordered

2009-10-15 Thread Peter Keegan
I'm using Lucene 2.9 and sometimes get a NPE in NearSpansUnordered: java.lang.NullPointerException at org.apache.lucene.search.spans.NearSpansUnordered.start(NearSpansUnordered.java:219) at org.apache.lucene.search.payloads.PayloadNearQuery$PayloadNearSpanScorer.processPayloads(PayloadNearQuery.j

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Peter Keegan
9, 2009 at 9:40 AM, Peter Keegan > wrote: > > IndexSearcher.search is calling my custom scorer's 'next' and 'doc' > methods > > 64% fewer times. I see no 'advance' method in any of the hot spots'. I am > > getting the same number of hits

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Peter Keegan
he gains. > > Mike > > On Wed, Sep 9, 2009 at 9:44 AM, Mark Miller wrote: > > How about the new score inorder/out of order stuff? It was an option > > before, but I think now it uses whats best by default? And pairs with > > the collector? I didn't follow any of that

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Peter Keegan
ed, Sep 9, 2009 at 9:17 AM, Yonik Seeley < yonik.see...@lucidimagination.com> wrote: > On Wed, Sep 9, 2009 at 8:57 AM, Peter Keegan > wrote: > > Using JProfiler, I observe that the improvement > > is due to a huge reduction in the number of calls to TermDocs.next and > >

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Peter Keegan
I've been testing 2.9 RC2 lately and comparing query performance to 2.3.2. I'm seeing a huge increase in throughput (2x-10x) on an index that was built with 2.3.2. The queries have a lot of BoostingTermQuerys and boolean clauses containing a custom scorer. Using JProfiler, I observe that the improv

Re: MatchAllDocsQuery concurrency issue

2009-08-06 Thread Peter Keegan
Or you could try this patch: LUCENE-1316 Peter On Thu, Aug 6, 2009 at 8:51 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > Opening your IndexReader with readOnly=true should also fix it, I think. > > Mike > > On Thu, Aug 6, 200

Re: Lucene performance: is search time linear to the index size?

2009-06-17 Thread Peter Keegan
There is a similar discussion on this topic here: http://www.gossamer-threads.com/lists/lucene/java-user/42824?search_string=Lucene%20search%20performance%3A%20linear%3F;#42824 or: http://tinyurl.com/lpp3hf On Wed, Jun 17, 2009 at 1:18 PM, Teruhiko Kurosaka wrote: > Thank you, Ian and Erick,

Re: How to het the score in percentage

2009-05-05 Thread Peter Keegan
Maybe joseph means 'percentage of the theoretical maximum score' for the query? See this thread: http://www.gossamer-threads.com/lists/lucene/java-user/61075?search_string=theoretical%20maximum%20score;#61075 Peter On Tue, May 5, 2009 at 8:36 AM, Erick Erickson wrote: > But to echo Chris, what

Re: sloppyFreq question

2009-03-20 Thread Peter Keegan
Sorry, here's the example I meant to show. Doc 1 and doc 2 both contain the terms "hey look, the quick brown fox jumped very high", but in Doc 1 all the terms are indexed at the same position. In doc 2, the terms are indexed in adjacent positions (normal way). For the query "the quick brown fox", d

Re: sloppyFreq question

2009-03-11 Thread Peter Keegan
> I suppose SpanTermQuery could override the weight/scorer methods so that > it behaved more like a TermQuery if it was executed directly ... but > that's really not what it's intended for. This is currently the only way to boost a term via payloads. BoostingTermQuery extends SpanTermQuery. > if

Re: sloppyFreq question

2009-03-09 Thread Peter Keegan
Any comments? Thanks, Peter On Tue, Mar 3, 2009 at 2:42 PM, Peter Keegan wrote: > The DefaultSimilarity class defines sloppyFreq as: > > public float sloppyFreq(int distance) { > return 1.0f / (distance + 1); > } > > For a 'SpanNearQuery', this reduces the eff

sloppyFreq question

2009-03-03 Thread Peter Keegan
The DefaultSimilarity class defines sloppyFreq as: public float sloppyFreq(int distance) { return 1.0f / (distance + 1); } For a 'SpanNearQuery', this reduces the effect of the term frequency on the score as the number of terms in the span increases. So, for a simple phrase query (using spans),
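
To see how quickly the quoted formula falls off, it can be evaluated on its own. The class name `SloppyFreqSketch` is made up for illustration; the method body is the one quoted in the post.

```java
// Stand-alone copy of the quoted formula, for experimenting with distances.
class SloppyFreqSketch {

    // Same body as the sloppyFreq quoted from DefaultSimilarity in the post.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    public static void main(String[] args) {
        // Print the contribution of a match at increasing distances.
        for (int distance = 0; distance <= 4; distance++) {
            System.out.println(distance + " -> " + sloppyFreq(distance));
        }
    }
}
```

A match at distance 0 contributes a full 1.0, while a match at distance 3 contributes only 0.25.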

Re: queryNorm affect on score

2009-03-02 Thread Peter Keegan
her fields". > > Best > Erick > > On Sun, Mar 1, 2009 at 8:57 PM, Peter Keegan > wrote: > > > As suggested, I added a query-time boost of 0.0f to the 'literals' field > > (with index-time boost still there) and I did get the same scores for > both &

Re: queryNorm affect on score

2009-03-01 Thread Peter Keegan
hat had no effect on the score, when combined with the above. This seems ok in this example since the matching terms had boost = 0. Thanks Yonik, Peter On Sat, Feb 28, 2009 at 6:02 PM, Yonik Seeley wrote: > On Sat, Feb 28, 2009 at 3:02 PM, Peter Keegan > wrote: > >> in situat

Re: queryNorm affect on score

2009-02-28 Thread Peter Keegan
> in situations where you deal with simple query types, and matching query structures, the queryNorm > *can* be used to make scores semi-comparable. Hmm. My example used matching query structures. The only difference was a single term in a field with zero weight that didn't exist in the matching

Re: queryNorm affect on score

2009-02-27 Thread Peter Keegan
Got it. This is another example of why scores can't be compared between (even similar) queries. (we don't) Thanks. On Fri, Feb 27, 2009 at 11:39 AM, Yonik Seeley wrote: > On Fri, Feb 27, 2009 at 9:15 AM, Peter Keegan > wrote: > > Any comments about this? Is this just t

Re: queryNorm affect on score

2009-02-27 Thread Peter Keegan
Any comments about this? Is this just the way queryNorm works or is this a bug? Thanks, Peter On Fri, Feb 20, 2009 at 4:03 PM, Peter Keegan wrote: > > The explanation of scores from the same document returned from 2 similar > queries differ in an unexpected way. There are 2 fields

queryNorm affect on score

2009-02-20 Thread Peter Keegan
The explanations of scores from the same document returned from 2 similar queries differ in an unexpected way. There are 2 fields involved, 'contents' and 'literals'. The 'literals' field has setBoost = 0. As you can see from the explanations below, the total weight of the matching terms from the 'li

Re: Payloads

2008-12-29 Thread Peter Keegan
Hi Karl, I use payloads for weight only, too, with BoostingTermQuery (see: http://www.nabble.com/BoostingTermQuery-scoring-td20323615.html#a20323615) A custom tokenizer looks for the reserved character '\b' followed by a 2 byte 'boost' value. It then creates a special Token type for a custom filt

Re: BoostingTermQuery scoring

2008-11-07 Thread Peter Keegan
cting performance? (I haven't tried it yet). Thanks, Peter On Thu, Nov 6, 2008 at 6:56 PM, Steven A Rowe <[EMAIL PROTECTED]> wrote: > Hi Peter, > > On 11/06/2008 at 4:25 PM, Peter Keegan wrote: > > I've discovered another flaw in using this technique: > > &

Re: Boosting results

2008-11-07 Thread Peter Keegan
If you sort first by score, keep in mind that the raw scores are very precise and you could see many unique values in the result set. The secondary sort field would only be used to break equal scores. We had to use a custom comparator to 'smooth out' the scores to allow the second field to take eff
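
The post does not say how the scores were smoothed. One possible approach, shown here purely as an assumption, is to round scores into fixed-width buckets so that near-equal scores tie and the secondary key decides the order. All names below (`SmoothedScoreSortSketch`, `Hit`, `bucketWidth`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SmoothedScoreSortSketch {

    // Minimal hit: an ID, a raw score, and a secondary sort key (e.g. a timestamp).
    static final class Hit {
        final String id;
        final double score;
        final long secondaryKey;

        Hit(String id, double score, long secondaryKey) {
            this.id = id;
            this.score = score;
            this.secondaryKey = secondaryKey;
        }
    }

    // Scores in the same bucket are treated as equal.
    static int bucket(double score, double bucketWidth) {
        return (int) Math.floor(score / bucketWidth);
    }

    // Higher bucket first; within a bucket, higher secondary key first.
    static Comparator<Hit> comparator(final double bucketWidth) {
        return (a, b) -> {
            int byBucket = Integer.compare(bucket(b.score, bucketWidth),
                                           bucket(a.score, bucketWidth));
            if (byBucket != 0) {
                return byBucket;
            }
            return Long.compare(b.secondaryKey, a.secondaryKey);
        };
    }

    // Sorts a copy of the hits and returns their IDs in result order.
    static List<String> sortedIds(List<Hit> hits, double bucketWidth) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(comparator(bucketWidth));
        List<String> ids = new ArrayList<>();
        for (Hit h : copy) {
            ids.add(h.id);
        }
        return ids;
    }
}
```

The bucket width is the tuning knob: wider buckets let the secondary field decide more often, at the cost of ignoring real score differences.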

Re: BoostingTermQuery scoring

2008-11-06 Thread Peter Keegan
score that doc. Yet another reason to use BoostingTermQuery. Peter On Thu, Nov 6, 2008 at 1:08 PM, Peter Keegan <[EMAIL PROTECTED]> wrote: > Let me give some background on the problem behind my question. > > Our index contains many fields (title, body, date, city, etc). Most queries

Re: BoostingTermQuery scoring

2008-11-06 Thread Peter Keegan
in a higher level Query, > kind of like the BooleanQuery, but then part of it sounds like it is per > document, right? Is it that you want to deal with multiple payloads in a > document, or multiple BTQs in a bigger query? > > On Nov 4, 2008, at 9:42 AM, Peter Keegan wrote: > > I&#

BoostingTermQuery scoring

2008-11-04 Thread Peter Keegan
I'm using BoostingTermQuery to boost the score of documents with terms containing payloads (boost value > 1). I'd like to change the scoring behavior such that if a query contains multiple BoostingTermQuery terms (either required or optional), documents containing more matching terms with payloads

Re: Payloads and SpanScorer

2008-07-19 Thread Peter Keegan
-sourced somewhere. Feel free to throw darts at it :) Peter On Thu, Jul 10, 2008 at 2:09 PM, Peter Keegan <[EMAIL PROTECTED]> wrote: > I may take a crack at this. Any more thoughts you may have on the > implementation are welcome, but I don't want to distract you too much. > >

Re: Payloads and SpanScorer

2008-07-10 Thread Peter Keegan
terms have payloads that boost their scores >> b) the terms are positionally next to each other (minimal slop - as it >> works >> now) >> >> >> Does this make sense? >> >> Peter >> >> On Thu, Jul 10, 2008 at 9:21 AM, Grant Ingersoll <[EMA
