Re: Request for clarification on unordered SpanNearQuery

2010-03-04 Thread Paul Elschot
is not really straightforward to implement, especially when different terms can be indexed in the same position. Perhaps the javadocs for the unordered case should be improved to mention that in the unordered case the first subspans is always the one that is advanced first. Regards, Paul Elschot

Re: Request for clarification on unordered SpanNearQuery

2010-03-04 Thread Paul Elschot
impossible) and, last but not least, nested SpanNearQueries. As Mark said, spans are funny beasts. Before starting these 40 hours, you could try and discuss design ideas here. Could you elaborate on what you need to achieve? Regards, Paul Elschot On Thursday 04 March 2010 21:03:09, Goddard wrote

Re: Request for clarification on unordered SpanNearQuery

2010-03-05 Thread Paul Elschot
iations that your users are used to are slightly different from these two, so you might end up reimplementing the ones that your users are used to on top of Lucene. When you have test cases for them, start from these. It could also be useful to have a look at the Surround language in contrib. That

Precedence parser: NOT/AND, disableCoord

2005-03-13 Thread Paul Elschot
However, from what I see now in the precedence parser, giving up might have been premature. It seems to be possible to make the mix after all. I also noticed a BooleanQuery(disableCoord) constructor. This would be straightforward to implement in the new BooleanScorer2 by dropping the Coordinator t

Re: Precedence parser: NOT/AND, disableCoord

2005-03-15 Thread Paul Elschot
On Tuesday 15 March 2005 01:55, Erik Hatcher wrote: > > On Mar 13, 2005, at 2:35 AM, Paul Elschot wrote: > > I had a short look through the new precedence parser > > and noticed a possible issue. > > > > Adding this in the TestPrecedenceParser testSimple() method:

Re: DO NOT REPLY [Bug 32965] - [PATCH] Use filter bits for next() and skipTo() in FilteredQuery

2005-04-04 Thread Paul Elschot
java 1.4 is not acceptable? That would leave them useable for later. Regards, Paul Elschot - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: DO NOT REPLY [Bug 32965] - [PATCH] Use filter bits for next() and skipTo() in FilteredQuery

2005-04-04 Thread Paul Elschot
nt? The FilteredQuery as posted there requires jdk 1.4 because it uses BitSet.nextSetBit(): http://issues.apache.org/bugzilla/show_bug.cgi?id=32965#c2 Regards, Paul Elschot

Re: UnscoredRangeQuery

2005-04-15 Thread Paul Elschot
it currently impersonates a BooleanQuery because of > > http://issues.apache.org/bugzilla/show_bug.cgi?id=34407 > > - no per-doc scoring (a small constant is returned). we don't have > > any range queries where scorin

Re: Troubling with StandarTokenizer/QueryParser code generate in JavaCC

2005-04-18 Thread Paul Elschot
cc.dev.java.net/ Regards, Paul Elschot

Re: Troubling with StandarTokenizer/QueryParser code generate in JavaCC

2005-04-18 Thread Paul Elschot
ectory. Regards, Paul Elschot

Re: lucene 2.0?

2005-04-20 Thread Paul Elschot
ghts? I agree that it shouldn't be released as it is, so pulling it out and putting it back in when its development continues after 2.0 seems the right way to go. I've just renamed my copy with a _2_1 suffix to keep it ou

Re: UnscoredRangeQuery

2005-04-21 Thread Paul Elschot
ou could also open a new bug (for lucene search) and then post them there, that would leave you free to choose the bug title. Regards, Paul Elschot.

Re: search on stored, unindexed fields?

2005-04-21 Thread Paul Elschot
e have a look at BooleanScorer and ConjunctionScorer in package org.apache.lucene.search for the details. Regards, Paul Elschot.

Re: svn commit: r165500 - in /lucene/java/trunk/docs: images/asf-logo.gif lucene-sandbox/index.html

2005-05-01 Thread Paul Elschot
that just follows the svn repository I don't use it. Regards, Paul Elschot

Re: [Performance] Streaming main memory indexing of single strings

2005-05-02 Thread Paul Elschot
es. Yes, the svn trunk uses skipTo more often than 1.4.3. However, your implementation of skipTo() needs some improvement. See the javadoc of skipTo of class Scorer: http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Scorer.html#skipTo(int) In case the underlying scorers provide sk

Re: [Performance] Streaming main memory indexing of single strings

2005-05-02 Thread Paul Elschot
n something else? Since 0 is the only document number in the index, a return target == 0; might be nice for skipTo(). It doesn't really help performance, though, and the next() works just as well. Regards, Paul Elschot.

Fwd: svn commit: r168454 - /lucene/java/trunk/src/test/org/apache/lucene/search/CheckHits.java

2005-05-06 Thread Paul Elschot
Daniel, When using the static TestCase methods it is quite likely that the testCase attribute of CheckHits can be removed completely. I suppose JUnit assumes a single thread for reporting errors/failures of a test case? Regards, Paul Elschot -- Forwarded Message -- Subject

Re: constant scoring queries

2005-05-12 Thread Paul Elschot
;s, so it can only provide an iterator over document numbers and not a random access as in contains(int) above. It needs only be used when it uses less memory than a bitset, which is the case when it allows roughly less than 1/8 of the docs of the reader. > Thoughts? Yes, thanks for combining th
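
The 1/8 figure above can be checked with a little arithmetic. The following is a minimal sketch, assuming the compact filter stores the gaps between successive document numbers as VInts (7 data bits per byte, as in the Lucene index format) while the bitset uses one bit per document. The class FilterMemory and its method names are hypothetical, not part of Lucene.

```java
/** Hypothetical helper comparing the memory of a bit set with a gap coded VInt list. */
class FilterMemory {
    /** Bytes needed for one non-negative int in VInt form: 7 data bits per byte. */
    static int vIntBytes(int value) {
        int bytes = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    /** Bytes for a sorted doc list stored as VInt coded gaps between successive docs. */
    static int vIntListBytes(int[] sortedDocs) {
        int total = 0;
        int last = 0;
        for (int doc : sortedDocs) {
            total += vIntBytes(doc - last);
            last = doc;
        }
        return total;
    }

    /** Bytes for a bit set with one bit per document of the reader. */
    static int bitSetBytes(int maxDoc) {
        return (maxDoc + 7) / 8;
    }
}
```

As long as every gap is below 128, each listed document costs one byte, so the list is smaller than the bitset only when fewer than one document in eight passes the filter. Sparser filters with gaps of 128 or more pay two or more bytes per document, which is why the bound is only approximate.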

Re: constant scoring queries

2005-05-12 Thread Paul Elschot
On Thursday 12 May 2005 17:11, Paul Elschot wrote: > On Tuesday 10 May 2005 22:39, Doug Cutting wrote: ... > > > Thoughts? > > Yes, thanks for combining the constant score with the one by one approach :) That should have been: Thanks for combining the constant score with the

Re: constant scoring queries

2005-05-15 Thread Paul Elschot
a maximum. This could also provide for changes in the underlying reader. A callback interface (as mentioned by Robert Engels) similar to updateQuery(Term t,BitSet newdocs); could be used for added documents, but when docs are deleted this would not work well on the level of a reader. It might

Re: [Performance]: IndexWriter again...

2005-05-16 Thread Paul Elschot
ment after the bug is opened for the first time: http://issues.apache.org/bugzilla/enter_bug.cgi Regards, Paul Elschot

Re: Lucene vs. Ruby/Odeum

2005-05-17 Thread Paul Elschot
p://www.suse.de/~bastian/Export/linking.txt Another performance benchmark with lucene, gcj and wikipedia: http://www.spindazzle.org/green/index.php?m=20050511 Regards, Paul Elschot

Re: constant scoring queries

2005-05-18 Thread Paul Elschot
non-score sort methods use HitCollector? That might be wasteful because it provides the score value for each doc. As for implementing all of this constant scoring and filter caching, I would like to make sure that the new BooleanScorer works close to perfection before continuing. In particular there are some patches for which I'd like to know whether they are acceptable: a patch for DisjunctionSumScorer: http://issues.apache.org/bugzilla/show_bug.cgi?id=34193 and a split off of the coordination: http://issues.apache.org/bugzilla/show_bug.cgi?id=34154 In fact, since this builds on the new scorer, I'd prefer to have that used widely before continuing. Also, I have no idea about the order in which this constant scoring and filter caching could be easily implemented. Any ideas? Regards, Paul Elschot

Re: constant scoring queries

2005-05-19 Thread Paul Elschot
the SortVIntList is that it does not have a fast skip. This forces the skipTo method in the posted filtered query to use a linear search. More inspiration from the Lucene index data structures could be used to build in the forwarding information to allow a faster skipTo on the compact filt
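
To illustrate why a gap coded list has no fast skip, here is a minimal sketch under stated assumptions: the class CompactDocList is hypothetical and is not the SortVIntList from the thread, and the -1 end marker is an illustrative choice. Because each document number is only known as the sum of all preceding gaps, skipTo has to decode every entry in front of the target.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical compact doc list: VInt coded gaps, readable only front to back. */
class CompactDocList {
    private final byte[] bytes;
    private int pos = 0; // read position in bytes
    private int doc = 0; // last decoded document number; gaps are relative to it

    CompactDocList(int[] sortedDocs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = 0;
        for (int d : sortedDocs) {
            int gap = d - last;
            while ((gap & ~0x7F) != 0) { // low 7 bits first, high bit set means more follows
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);
            last = d;
        }
        bytes = out.toByteArray();
    }

    /** Decodes the next gap and returns the next doc number, or -1 when exhausted. */
    int next() {
        if (pos >= bytes.length) return -1;
        int b = bytes[pos++];
        int gap = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = bytes[pos++];
            gap |= (b & 0x7F) << shift;
        }
        doc += gap;
        return doc;
    }

    /** Linear skip: every gap before the target has to be decoded and summed. */
    int skipTo(int target) {
        int d;
        do {
            d = next();
        } while (d >= 0 && d < target);
        return d;
    }
}
```

Forwarding information of the kind mentioned above would store occasional absolute document numbers together with byte offsets, so that skipTo could jump over whole runs of gaps instead of decoding them one by one.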

Re: Submission, btree BooleanScorer

2005-05-22 Thread Paul Elschot
lean query and scorers. The version posted here: http://issues.apache.org/bugzilla/show_bug.cgi?id=34154 splits off the coordination computations in case they are not needed, which might affect performance also. Regards, Paul Elschot

Re: Submission, btree BooleanScorer

2005-05-22 Thread Paul Elschot
uld also be considered as a complete replacement of the rewritten code because of the simplicity of the implementation. There is a recent thread on constant scoring queries for which the subscorers can be used one by one. Regards, Paul Elsc

Re: One Byte is Seven bits too many? - A Design suggestion

2005-05-22 Thread Paul Elschot
nice fit for the recently discussed constant scoring queries. For (b) the relative variance and the influence on the score is still high. Perhaps a mixed form with a minimum field length in a single bit could be considered there, but addressing that might be costly. Regards, Paul Elschot.

Re: svn commit: r178059 - /lucene/java/trunk/src/java/org/apache/lucene/search/spans/SpanNearQuery.java

2005-05-25 Thread Paul Elschot
On Tuesday 24 May 2005 09:05, Paul Elschot wrote: > Erik, > > On Tuesday 24 May 2005 03:35, [EMAIL PROTECTED] wrote: > > Author: ehatcher > > Date: Mon May 23 18:35:13 2005 > > > + /** Returns true iff o is equal to this. */ > > + public boolean equal

Which scorer to use for disjunctions?

2005-05-25 Thread Paul Elschot
erval in the array. Could someone indicate a few typical cases to use for selecting the best disjunction scorer? Regards, Paul Elschot P.S. I also tried getting this to work under gcj, but I'm having problems with class loading from shared libraries. I got gcj/gij to work for another proje

contrib/queryParsers/surround

2005-05-28 Thread Paul Elschot
ase holler. Regards, Paul Elschot

Re: contrib/queryParsers/surround

2005-05-28 Thread Paul Elschot
On Saturday 28 May 2005 17:06, Erik Hatcher wrote: > > On May 28, 2005, at 10:04 AM, Paul Elschot wrote: > > Dear readers, > > > > I've started moving the surround query language > > http://issues.apache.org/bugzilla/show_bug.cgi?id=34331 > > into the d

Re: contrib/queryParsers/surround

2005-05-28 Thread Paul Elschot
On Saturday 28 May 2005 21:26, Erik Hatcher wrote: > > On May 28, 2005, at 1:07 PM, Paul Elschot wrote: > > A little bit of deprecation is left in the CharStream (getLine and > > getColumn) in the parser. Would you have any idea how to deal with > > that? > >

Re: contrib/surround

2005-05-28 Thread Paul Elschot
On Saturday 28 May 2005 21:26, Erik Hatcher wrote: > > On May 28, 2005, at 1:07 PM, Paul Elschot wrote: ... > > I'll leave the build.xml stand alone with constants for the > > environment. > > It was derived from a lucene build.xml of a few eons ago, so > > I

Re: contrib/queryParsers/surround

2005-05-29 Thread Paul Elschot
ay 29, 2005, at 9:33 AM, Otis Gospodnetic wrote: > > > > --- Erik Hatcher <[EMAIL PROTECTED]> wrote: > > > > > >> > >> On May 28, 2005, at 10:04 AM, Paul Elschot wrote: > >> > >> > >>> Dear readers, > >>&

Re: contrib/surround

2005-06-04 Thread Paul Elschot
y, and this throws an exception when rewriting causes too many terms to be used, much like the TooManyClauses for BooleanQuery. Regards, Paul Elschot

Re: contrib/surround

2005-06-05 Thread Paul Elschot
How about putting this here: http://wiki.apache.org/general/SummerOfCode2005 It seems to be a nice fit for the sponsor. Regards, Paul Elschot On Saturday 04 June 2005 22:25, Paul Elschot wrote: > On Monday 30 May 2005 02:44, Erik Hatcher wrote: > > I concur with Daniel on this.

Re: Possible bug in TermInfosReader/Writer

2005-06-05 Thread Paul Elschot
((df % skipInterval) == 0) { bufferSkip(lastDoc); } Regards, Paul Elschot. > > Regards > Daniel > > -- > http://www.danielnaber.de > > - > To unsubscribe, e

Re: Unexpected: ordered

2005-07-03 Thread Paul Elschot
On Sunday 03 July 2005 17:42, Dave Kor wrote: > Quoting Paul Elschot <[EMAIL PROTECTED]>: > > > On Sunday 03 July 2005 15:27, Dave Kor wrote: > > > I have a system that automatically generate span queries to Lucene. > > Sometimes, > > > the system ge

Re: Unexpected: ordered

2005-07-03 Thread Paul Elschot
On Sunday 03 July 2005 17:42, Dave Kor wrote: > Quoting Paul Elschot <[EMAIL PROTECTED]>: > > > On Sunday 03 July 2005 15:27, Dave Kor wrote: > > > I have a system that automatically generate span queries to Lucene. > > Sometimes, > > > the system ge

Re: Unexpected: ordered

2005-07-03 Thread Paul Elschot
On Sunday 03 July 2005 23:23, Paul Elschot wrote: Please forget about the last message, I thought I had lost the earlier one. Paul Elschot

Re: Unexpected: ordered

2005-07-05 Thread Paul Elschot
ual subquery. The conclusion is that the current span code is not capable of handling such cases. It probably chokes at the moment the matches for such subqueries concur. The question is whether you would consider such a concurrence to be a match for the query. If so, the fix might be to return tru

Re: Unexpected: ordered

2005-07-05 Thread Paul Elschot
On Tuesday 05 July 2005 14:35, Dave Kor wrote: > Quoting Paul Elschot <[EMAIL PROTECTED]>: > > > On Monday 04 July 2005 22:51, Dave Kor wrote: > > > > I had another look at the code, and my guess now is that this is > > > > related to the spanNear wi

Re: Unexpected: ordered

2005-07-17 Thread Paul Elschot
Dave, On Tuesday 05 July 2005 20:54, Paul Elschot wrote: > On Tuesday 05 July 2005 14:35, Dave Kor wrote: ... > > > > Hopefully, this explains what I am trying to achieve with Lucene and why I > > need > > to match repeated sub-queries. I would really appreciate it i

Re: BooleanScorer2 ArrayIndexOutOfBoundsException

2005-07-21 Thread Paul Elschot
happens in two or three places, so don't hold your breath... Regards, Paul Elschot > > > [java] Caused by: java.lang.ArrayIndexOutOfBoundsException: 4 > > [java] at org.apache.lucene.search.BooleanScorer2 > > $Coordinat

Re: DO NOT REPLY [Bug 34154] - Further improvements to BooleanScorer2

2005-07-24 Thread Paul Elschot
nation, mostly for performance. This is (or at least should be) independent of the bug in the current trunk that causes the array index out of bounds. It's best to fix the bug in the current trunk first, and then #34154 might be r

Re: DO NOT REPLY [Bug 34154] - Further improvements to BooleanScorer2

2005-07-26 Thread Paul Elschot
On Tuesday 26 July 2005 03:01, Erik Hatcher wrote: > > On Jul 24, 2005, at 11:23 AM, Paul Elschot wrote: > > > On Friday 22 July 2005 21:18, Erik Hatcher wrote: > > > >> Paul, > >> > >> I don't have a test case handy (yet), but we're sti

Re: DO NOT REPLY [Bug 34154] - Further improvements to BooleanScorer2

2005-07-26 Thread Paul Elschot
corer14(true) Regards, Paul Elschot

Re: Uneffective writeBytes and readBytes [FIX]

2005-09-08 Thread Paul Elschot
e cases are when many terms are used in a query. Would it be easily possible to make the buffer size for a term iterator depend on the numbers of documents to be iterated? Many terms only occur in a few documents, so this could be a nice win

Delaying buffer allocation in BufferedIndexInput

2005-09-10 Thread Paul Elschot
On Friday 09 September 2005 00:34, Doug Cutting wrote: > Paul Elschot wrote: > > I suppose one of these cases are when many terms are used in a query. > > Would it be easily possible to make the buffer size for a term iterator > > depend on the numbers of documents to be ite

Re: Normalization Techniques

2005-09-27 Thread Paul Elschot
On Wednesday 28 September 2005 02:35, Ira Goldstein wrote: > Hi. I’m working on a project to compare various normalization techniques > and want to make sure that I understand the code before I begin making > changes. It appears that while the tf’s are being stored in DocumentWriter, > the actual

Re: [jira] Commented: (LUCENE-395) CoordConstrainedBooleanQuery + QueryParser support

2005-10-06 Thread Paul Elschot
r by default the coordination factor can be easily "overwhelmed" by other factors. Regards, Paul Elschot.

Re: Eliminating norms ... completley

2005-10-08 Thread Paul Elschot
y as boolean would avoid reading the norms from disk. For really large indexes the norms might become a bottleneck when building them, but iirc this was improved recently. Regards, Paul Elschot

Re: regex-based query contribution

2005-10-13 Thread Paul Elschot
s in SrndTruncQuery.java here: http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/surround/src/java/org/apache/lucene/queryParser/surround/query/ So, with an addition to the javadocs that the length of the prefix is important for performance, I think a regular expression based query term would b

Re: regex-based query contribution

2005-10-13 Thread Paul Elschot
y that provides a Weight that flattens all the weights of the subqueries, for example to the maximum weight, and for the rest works like the usual Weight of BooleanQuery. The choice between these two depends on how special the flattening mechani

Re: regex-based query contribution

2005-10-13 Thread Paul Elschot
common Terms. MaxDisjunctionQuery scores with the maximum of the scores of the subqueries, but idf "flattening" affects the weights of the subqueries. Regards, Paul Elschot

Re: Fwd: skipInterval

2005-10-16 Thread Paul Elschot
ing the complexity from linear to logarithmic. Does the sqrt() also apply in the case of searching for two required terms and returning all the documents in which they both occur? Regards, Paul Elschot

Re: svn commit: r329384 - /lucene/java/trunk/src/test/org/apache/lucene/search/spans/TestSpans.java

2005-10-30 Thread Paul Elschot
r ages > For the record, some of these (most?, I don't recall the details) are from here: http://issues.apache.org/jira/browse/LUCENE-405 Regards, Paul Elschot

Re: Faking index merge by modifying segments file?

2005-11-01 Thread Paul Elschot
one should be fairly straightforward. Some care will be needed to avoid clashes in the segment names. Also what should happen with the index from which the segments are taken? Should the shared segments be copied between indexes? It's possible to share segments between indexes when the file sy

Re: Faking index merge by modifying segments file?

2005-11-02 Thread Paul Elschot
On Wednesday 02 November 2005 12:47, Otis Gospodnetic wrote: > Hello, > > --- Paul Elschot <[EMAIL PROTECTED]> wrote: ... > > > It's possible to share segments between indexes when the file system > > allows files to be present in multiple directories. >

Re: svn commit: r330900 - in /lucene/java/trunk/src/test/org/apache/lucene/search: CheckHits.java TestBoolean2.java

2005-11-04 Thread Paul Elschot
Yonik, On Friday 04 November 2005 22:15, [EMAIL PROTECTED] wrote: >  /** Test BooleanQuery2 against BooleanQuery by overriding the standard query > parser. I think you meant testing BooleanScorer2 against BooleanScorer ... Thanks for extending TestBoolean2, regards, Paul E

Re: svn commit: r331111 - /lucene/java/trunk/src/test/org/apache/lucene/search/TestBoolean2.java

2005-11-07 Thread Paul Elschot
> > > Modified: > > lucene/java/trunk/src/test/org/apache/lucene/search/ > > TestBoolean2.java > > Regards, Paul Elschot

Re: [jira] Commented: (LUCENE-395) CoordConstrainedBooleanQuery + QueryParser support

2005-11-10 Thread Paul Elschot
eparate issue that can be dealt with later. > > So it looks to me like this is ready to be committed. Does anyone see a reason why it shouldn't be? I don't see any, Regards, Paul Elschot

Contrib in oblivion

2005-11-11 Thread Paul Elschot
ucene jar. Using ant -v it also complained that some targets are defined both in build.xml and common-build.xml (iirc). Is there anyone else with code in contrib that knows enough of ant to know how to fix this? Regards, Paul Elschot

Re: Contrib in oblivion

2005-11-11 Thread Paul Elschot
On Friday 11 November 2005 22:51, Erik Hatcher wrote: > > On 11 Nov 2005, at 14:52, Paul Elschot wrote: > > Quoting Doug on nutch-dev yesterday: > > > > > >> In 1.9 and beyond the plan is to build and distribute the contrib > >> with > >> Lucen

Re: Contrib in oblivion

2005-11-12 Thread Paul Elschot
On Saturday 12 November 2005 02:12, Erik Hatcher wrote: > > On 11 Nov 2005, at 17:26, Paul Elschot wrote: > >> Paul - what version of Ant are you using? I'm (always) swamped, but > >> I'll put this on my list of things to look at as soon as possible

Re: Contrib in oblivion

2005-11-12 Thread Paul Elschot
On Saturday 12 November 2005 10:55, Erik Hatcher wrote: > > On 12 Nov 2005, at 04:41, Paul Elschot wrote: > >> Could you try using Ant 1.6.5? I just built contrib/surround > >> > > > > The joy of developing at the cutting edge... > > Was the upgrade

Re: Contrib in oblivion: no more, ant test-contrib working now.

2005-11-13 Thread Paul Elschot
iling any test code during the build? Btw. ant 1.6.2 seems to work the same way as 1.6.5 on the test-contrib target, so the absolute/relative path change in subant for 1.6.3 was not a problem, it must have been my version of the various build xml files. Regards, Paul Elschot

Re: svn commit: r332747 - in /lucene/java/trunk: ./ src/java/org/apache/lucene/search/regex/ src/test/org/apache/lucene/search/regex/

2005-11-14 Thread Paul Elschot
nce on the score and the order of the results in Hits. TermQuery relies on field boost and document term frequency, so having PrefixQuery ignore these would also lead to unexpected surprises. Regards, Paul Elschot

Re: svn commit: r332747 - in /lucene/java/trunk: ./ src/java/org/apache/lucene/search/regex/ src/test/org/apache/lucene/search/regex/

2005-11-15 Thread Paul Elschot
On Tuesday 15 November 2005 19:35, Yonik Seeley wrote: > On 11/15/05, Doug Cutting <[EMAIL PROTECTED]> wrote: > > Paul Elschot wrote: > > > I think losing the field boosts for PrefixQuery and friends would not be > > > advisable. Field boosts have a very b

Re: svn commit: r332747 - in /lucene/java/trunk: ./ src/java/org/apache/lucene/search/regex/ src/test/org/apache/lucene/search/regex/

2005-11-15 Thread Paul Elschot
On Tuesday 15 November 2005 20:30, Doug Cutting wrote: > Paul Elschot wrote: > > Not using the document term frequencies in PrefixQuery would still > > leave these as a surprise factor between PrefixQuery and TermQuery. > > Should we dynamically decide to switch t

Re: svn commit: r332747 - in /lucene/java/trunk: ./ src/java/org/apache/lucene/search/regex/ src/test/org/apache/lucene/search/regex/

2005-11-16 Thread Paul Elschot
ld work for all disjunctions, not only terms. I think it is possible to implement this hook for DisjunctionSumScorer with a scores[] array, iterating over the subscorers one by one. Getting that hook called through BooleanScorer2 is no problem when the coordination factor can be left out. Regards, Pa

Test code for regex queries

2005-11-23 Thread Paul Elschot
und parser? Kind regards, Paul Elschot

Re: Test code for regex queries

2005-11-24 Thread Paul Elschot
On Thursday 24 November 2005 00:06, Erik Hatcher wrote: > > On 23 Nov 2005, at 15:42, Paul Elschot wrote: > > I refactored it to have a few more tests, and all seems to work well. > > It also includes the tests from TestSpanRegexQuery.java . > > ... > > >

Re: Test code for regex queries

2005-11-24 Thread Paul Elschot
On Thursday 24 November 2005 10:25, Erik Hatcher wrote: > > On 24 Nov 2005, at 03:17, Paul Elschot wrote: > >> I must admit that I haven't used the surround parser. For my custom > >> parser (a legacy syntax that no one here would want), I take any term > >>

Re: Test code for regex queries

2005-11-26 Thread Paul Elschot
hen be possible by overriding some methods in the parser. I hope the regex compiler and matcher from java.util.regex have interfaces, otherwise interfaces will have to be made to allow different regex implementations. To have term rotation built into a query parser requires some way to know whi

Re: "Advanced" query language

2005-12-02 Thread Paul Elschot
ipsum dolor sit amet)) for practical use, this could be simplified to: mlt(3, 30, (Lorem ipsum dolor sit amet)) Such additions are a bit of work, but the query possibilities of Lucene do not change that fast. Adding infix operators with operators in between their arguments (infix) is a bit more involved. Regards, Paul Elschot

Re: "Advanced" query language

2005-12-03 Thread Paul Elschot
a GUI automatically (by introspection) given a set of Query classes of which objects can be mixed to form a query? Regards, Paul Elschot

Re: "Advanced" query language

2005-12-04 Thread Paul Elschot
On Sunday 04 December 2005 05:17, Yonik Seeley wrote: > On 12/3/05, Paul Elschot <[EMAIL PROTECTED]> wrote: > > Indeed, this is a disadvantage of the "function call" syntax. > > It depends on the language. Take Python for example: > > >>> def foo(a,

Re: "Advanced" query language

2005-12-04 Thread Paul Elschot
On Sunday 04 December 2005 15:26, Erik Hatcher wrote: > > On Dec 4, 2005, at 6:52 AM, Paul Elschot wrote: > > I tried rewriting the XML query in exactly this way, with a > > few property=.. constructs: > > > > boostingQuery(

Re: "Advanced" query language

2005-12-04 Thread Paul Elschot
out passing mandatory parameters. Perhaps > in these cases it would be better to preserve the existing class and > provide a "parser wrapper bean" used purely to integrate the existing > Query class with the new parser framework. That sounds like some good reasons for a layer

Re: "Advanced" query language

2005-12-05 Thread Paul Elschot
a colon at the end of the previous line. To be read with a fixed width font: boosting: match: moreLikeThis(percent="0.25", docId="44"): compareField("contents") compareField("title") downgrade(demote="0.5"): simple("contents&

Re: "Advanced" query language

2005-12-05 Thread Paul Elschot
On Monday 05 December 2005 23:36, Erik Hatcher wrote: > On Dec 5, 2005, at 3:18 PM, Paul Elschot wrote: .. > > > > boosting: > > match: > > moreLikeThis(percent="0.25", docId="44"): > > compareField("contents&quo

Re: "Advanced" query language

2005-12-06 Thread Paul Elschot
cation and aliasing in one go. > > Especially if I can convince Yonik's boss to pay him to do all the hard > work. :) > My strategy now is to wait and see what XML structures will be introduced and then try and define some XS

Re: Query modifier

2005-12-16 Thread Paul Elschot
a single purpose. There is no visitor pattern like this in the surround parser, because there the only real purpose of visiting is to create lucene queries. There is also a recursive toString() implementation in this ComposedQuery. Regards, Paul Elschot

Re: "Advanced" query language

2005-12-17 Thread Paul Elschot
all the results for a single XML document. This is not provided by default, but has been done with extension to this code." Regards, Paul Elschot On Friday 16 December 2005 03:45, Wolfgang Hoschek wrote: > I think implementing an XQuery Full-Text engine is far beyond the > scope

Re: BooleanQuery: static setMaxClauseCount(int)?

2006-01-11 Thread Paul Elschot
imum clause count only needs to be "local" to Query.rewrite(), so it might be necessary to move setMaxClauseCount() from BooleanQuery to Query later on, for example when the "Advanced" query language discussed recently runs into this problem. Regards, Paul Elschot

Re: NearSpans issue

2006-01-25 Thread Paul Elschot
ply all those > patches (if they still work) and give it another try? I would certainly like to get the NearSpans code correct, and I could continue bug hunting as earlier. I would prefer to start from the NearSpansOrdered/Unor

Re: Filter

2006-01-26 Thread Paul Elschot
is, now score it", > ala... > > public interface SearchFilter { > /** returns doc ids that pass the filter, in increasing order. >* returns 0 once there are no more docs. >*/ > i

Re: Filter

2006-01-27 Thread Paul Elschot
re is no next document number. Yonik mentioned that it would be good for performance to add this: int nextDocNr() implicitly using the last document number without the need to pass it. The advantage of using -1 instead of boolean is that only one return value needs to be used, saving a method call in many cases during query search, but at the expense of an extra test for the return value being positive. Regards, Paul Elschot
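
The trade-off described above can be sketched as follows. This is an illustration only: the class DocNrIterator, its call counter and the two summing helpers are hypothetical, and only the method name nextDocNr() and the -1 convention come from the message.

```java
/** Hypothetical doc iterator offering both calling conventions over one sorted array. */
class DocNrIterator {
    private final int[] docs; // sorted document numbers
    private int pos = -1;     // before the first document
    int calls = 0;            // counts calls made by the caller, for comparison only

    DocNrIterator(int[] sortedDocs) {
        docs = sortedDocs;
    }

    /** Boolean style, first half: advance and report whether a document is available. */
    boolean next() {
        calls++;
        if (pos < docs.length) pos++;
        return pos < docs.length;
    }

    /** Boolean style, second half: fetch the current document number. */
    int doc() {
        calls++;
        return docs[pos];
    }

    /** Sentinel style: advance and return the document number, or -1 when there is none. */
    int nextDocNr() {
        calls++;
        if (pos < docs.length) pos++;
        return pos < docs.length ? docs[pos] : -1;
    }

    static int sumBooleanStyle(DocNrIterator it) {
        int sum = 0;
        while (it.next()) sum += it.doc();
        return sum;
    }

    static int sumSentinelStyle(DocNrIterator it) {
        int sum = 0;
        // The extra cost is this comparison of the returned value on every step.
        for (int d = it.nextDocNr(); d >= 0; d = it.nextDocNr()) sum += d;
        return sum;
    }
}
```

For the three documents 1, 4 and 6 the boolean style makes seven calls (four to next(), three to doc()), while the sentinel style makes four. Since document number 0 is a valid document, the loop in this sketch tests for a non-negative return value.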

Re: NearSpans issue

2006-01-27 Thread Paul Elschot
gs up locally with those JIRA patches > and see where that takes it. We've not had success building a > generic index that we can share that duplicates this issue, > unfortunately. I hope those patches solve the problem. Regards, Paul Elschot

Re: Tree based BitSet (aka IntegerSet, DocSet...)

2006-01-28 Thread Paul Elschot
> distribution of set bits is typically exponential... > usage in caching, Filter... This gives some context, a performance comparison program, and indicates that the licence for the context is LGPL: http://www.iis.uni-stuttgart.de/personen/lippold/MathCollection/index-en.htm

Re: Implementing new scoring algorithms in lucene

2006-02-18 Thread Paul Elschot
the moment I only have time to answer with links: http://issues.apache.org/jira/browse/LUCENE-293 http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200410.mbox/<200410172050.24372.paul.elschot%40xs4all.nl> http://www.loc.gov/standards/sru/cql/ http://svn.apache.org/viewcvs.c

Re: Implementing new scoring algorithms in lucene

2006-02-21 Thread Paul Elschot
ghtly knit together that > make swapping a new algorithm in quite difficult. Ideally one should > only have to extend the Similarity, Query and Scorer classes. It's possible to implement another way of scoring. To keep the efficiency of Lucene,

Re: Lucene 1.9 RC1 release available: surround package.html files

2006-02-25 Thread Paul Elschot
SrndQuery.makeLuceneQueryField method. For this, TermQuery, BooleanQuery and SpanQuery are used from Lucene. Regards, Paul Elschot

Re: ArrayIndexOutOfBoundsException in org.apache.lucene.search.BooleanScorer2$Coordinator.coordFactor

2006-03-04 Thread Paul Elschot
Robin, This could be a duplicate of https://issues.apache.org:443/jira/browse/LUCENE-413 Regards, Paul Elschot. On Saturday 04 March 2006 00:37, Robin H. Johnson wrote: > On Fri, Mar 03, 2006 at 03:28:22PM -0800, Robin H. Johnson wrote: > > I've been developing a search ap

Re: [jira] Commented: (LUCENE-330) [PATCH] Use filter bits for next() and skipTo() in FilteredQuery

2006-03-05 Thread Paul Elschot
)); } Regards, Paul Elschot

Re: [jira] Commented: (LUCENE-330) [PATCH] Use filter bits for next() and skipTo() in FilteredQuery

2006-03-06 Thread Paul Elschot
fails for both hunks. Could you post your version of FilteredQuery.java? Regards, Paul Elschot

Re: [jira] Commented: (LUCENE-413) [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans

2006-04-03 Thread Paul Elschot
ethods in TestSpansAdvanced to be repeated at the TestSpansAdvanced2 test. Correcting this will need some refactoring. The problem is that I don't know what these two classes are supposed to test. Anyway, the warnings for the unexpected score values might be used to f

Re: Query.extractTerms - a poor introspection API?

2006-04-06 Thread Paul Elschot
> desired. There is a visitMatchingTerms method here: http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/surround/src/java/org/apache/lucene/queryParser/surround/query/SimpleTerm.java?rev=209183&view=log This is the superclass for the term queries such as truncated terms.

Re: SpanNearQuery with minimum slop

2006-04-12 Thread Paul Elschot
ch as an alternative to (a b) in the example above. Without nesting, a "flat" phrase query on the terms can be used. Nesting Scorers takes method calls and these bring some loss of search performance. When nesting and exact slop matching are both needed, a simplified NearSpansOrdered from LUCENE-413 could be considered. Regards, Paul Elschot
