[jira] Updated: (LUCENE-1888) Provide Option to Store Payloads on the Term Vector

2009-10-05 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch updated LUCENE-1888: -- Fix Version/s: (was: 3.0) > Provide Option to Store Payloads on the Term Vec

Re: Modifying payloads

2009-09-23 Thread Jason Rutherglen
> serving read requests from the buffer (for NRT readers). > >  Michael > > On Wed, Sep 23, 2009 at 3:40 PM, Grant Ingersoll > wrote: >> >> Has anyone done any work on modifying payloads "inline" in the index?  The >> idea being that if

Re: Modifying payloads

2009-09-23 Thread Michael Busch
ers). Michael On Wed, Sep 23, 2009 at 3:40 PM, Grant Ingersoll wrote: > Has anyone done any work on modifying payloads "inline" in the index? The > idea being that if you know the length of the payload isn't changing, you > can modify it w/o reindexing. Some concern

Modifying payloads

2009-09-23 Thread Grant Ingersoll
Has anyone done any work on modifying payloads "inline" in the index? The idea being that if you know the length of the payload isn't changing, you can modify it w/o reindexing. Some concerns that come to mind are thread-safety
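The idea in this post (overwrite a payload in place when its length is unchanged) can be sketched roughly as below. This is only an illustration, not Lucene code: `InlinePayloadPatch`, `overwrite`, and the flat byte buffer standing in for the stored payload bytes are all hypothetical, and the sketch does nothing about the thread-safety concern the post raises.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch: NOT Lucene's API or file format.
// A flat buffer stands in for wherever payload bytes are stored.
final class InlinePayloadPatch {

    /**
     * Overwrites the payload stored at [offset, offset + length) with newPayload.
     * Rejects any payload whose length differs, since a size change would shift
     * every byte that follows and so could not be done in place.
     */
    static void overwrite(ByteBuffer store, int offset, int length, byte[] newPayload) {
        if (newPayload.length != length) {
            throw new IllegalArgumentException(
                "payload length must not change: expected " + length
                    + ", got " + newPayload.length);
        }
        if (offset < 0 || offset + length > store.limit()) {
            throw new IndexOutOfBoundsException("payload range is outside the buffer");
        }
        // Absolute puts leave the buffer's position untouched.
        for (int i = 0; i < length; i++) {
            store.put(offset + i, newPayload[i]);
        }
    }

    public static void main(String[] args) {
        ByteBuffer store = ByteBuffer.wrap(new byte[] {9, 1, 2, 3, 9});
        overwrite(store, 1, 3, new byte[] {7, 7, 7});
        System.out.println(Arrays.toString(store.array())); // prints [9, 7, 7, 7, 9]
    }
}
```

The length check is the whole point of the sketch: only same-size replacements leave the surrounding bytes, and any offsets pointing at them, valid.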

Re: SpanNearQuery's spans & payloads

2009-09-12 Thread Mark Miller
So to modify the rules a bit to account for the ordered case (again, I am sure of nothing): 1. Only one span can start from a term. 2. Start matching from the left and work right. 3. If the Span is ordered, upon finding a match, shrink the start position to the same term closest to the end term

Re: SpanNearQuery's spans & payloads

2009-09-12 Thread Mark Miller
"denormalize") all possible expansions. Each leaf node would hold > > > actual data (position, term, payload, etc.), and then the tree nodes > > > would express how they are and/ord/near'd together. My app could then > > > walk the tree to compute any co

Re: SpanNearQuery's spans & payloads

2009-09-12 Thread Paul Elschot
, that mirrors the query's > > tree structure, to hold the spans, rather than try to enumerate > > ("denormalize") all possible expansions. Each leaf node would hold > > actual data (position, term, payload, etc.), and then the tree nodes > > would express ho

Re: SpanNearQuery's spans & payloads

2009-09-12 Thread Michael McCandless
On Sat, Sep 12, 2009 at 8:40 AM, Mark Miller wrote: >>> They start at the left and march right - each Span always starting >>> after the last started, >> >> That's not quite always true -- eg I got span 1-8, twice, once I >> added "b" as a clause to the SNQ. > > Mmm - right - depends on how you l

Re: SpanNearQuery's spans & payloads

2009-09-12 Thread Mark Miller
Mark Miller wrote: > >> Yeah I think you do, except each payload is only returned once. So >> it's only the first span that hits a payload that will return it. >> >> So it sounds like SNQ just isn't guaranteed to be exhaustive in how it >> enumerates the spans, eg I'll never see that 2nd occurrenc

Re: SpanNearQuery's spans & payloads

2009-09-12 Thread Mark Miller
Sorry for the spam - type of '8' instead of 'a' - hard enough to follow without that - read this one below instead: Mark Miller wrote: > Mark Miller wrote: > >>> Yeah I think you do, except each payload is only returned once. So >>> it's only the first span that hits a payload that will return

Re: SpanNearQuery's spans & payloads

2009-09-12 Thread Mark Miller
(position, term, payload, etc.), and then the tree nodes >> would express how they are and/ord/near'd together. My app could then >> walk the tree to compute any combination I wanted. >> >> >> >>> In the end, I accepted my definition of works as - w

Re: SpanNearQuery's spans & payloads

2009-09-12 Thread Mark Miller
tion, term, payload, etc.), and then the tree nodes > would express how they are and/ord/near'd together. My app could then > walk the tree to compute any combination I wanted. > > >> In the end, I accepted my definition of works as - when I ask for >> the payloads back, wi

Re: SpanNearQuery's spans & payloads

2009-09-12 Thread Grant Ingersoll
(position, term, payload, etc.), and then the tree nodes would express how they are and/ord/near'd together. My app could then walk the tree to compute any combination I wanted. In the end, I accepted my definition of works as - when I ask for the payloads back, will I end up with a bag of all t

Re: SpanNearQuery's spans & payloads

2009-09-12 Thread Michael McCandless
nodes would express how they are and/ord/near'd together. My app could then walk the tree to compute any combination I wanted. > In the end, I accepted my definition of works as - when I ask for > the payloads back, will I end up with a bag of all the payloads that > the Spans tou

Re: SpanNearQuery's spans & payloads

2009-09-11 Thread Mark Miller
" twice: > > span 0 to 8 >payload: pos: 7 >payload: pos: 1 >payload: pos: 0 > span 1 to 8 > payload: pos: 0 > span 1 to 8 >payload: pos: 3 > span 3 to 8 >payload: pos: 6 > span 6 to 8 >

Re: SpanNearQuery's spans & payloads

2009-09-11 Thread Grant Ingersoll
ight now trunk does this: span 0 to 8 payload: pos: 0 payload: pos: 7 span 1 to 8 payload: pos: 0 span 3 to 8 payload: pos: 3 span 6 to 8 payload: pos: 6 The first span properly includes the payload for "a" (pos: 0) and for "k" (pos:

Re: SpanNearQuery's spans & payloads

2009-09-11 Thread Michael McCandless
span 1 to 8 payload: pos: 3 span 3 to 8 payload: pos: 6 span 6 to 8 payload: pos: 6 Also, the payloads sort of shifted down (eg "pos: 3" now shows up in the "span 1 to 8" but before showed up in "span 3 to 8"), and "pos: 1" (for b) wa

Re: SpanNearQuery's spans & payloads

2009-09-11 Thread Mark Miller
econd question I am less sure about without looking at code. I think it's because each payload can only be loaded once. So the first time you hit 0 to 8, you get both payloads - but every other span that hits 8, that payload was already loaded ? So you get all of the payloads you should, you're just not dupli

SpanNearQuery's spans & payloads

2009-09-11 Thread Michael McCandless
payload: pos: 3 span 6 to 8 payload: pos: 6 The first span properly includes the payload for "a" (pos: 0) and for "k" (pos: 7), but the subsequent three do not include the payload for "k". Shouldn't you get all payloads associated w/ the span? Mi

[jira] Created: (LUCENE-1888) Provide Option to Store Payloads on the Term Vector

2009-09-03 Thread Grant Ingersoll (JIRA)
Provide Option to Store Payloads on the Term Vector --- Key: LUCENE-1888 URL: https://issues.apache.org/jira/browse/LUCENE-1888 Project: Lucene - Java Issue Type: Improvement

[jira] Updated: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-08-18 Thread Michael Busch (JIRA)
ent Payloads) > > > Key: LUCENE-1231 > URL: https://issues.apache.org/jira/browse/LUCENE-1231 > Project: Lucene - Java > Issue Type: New Feature > Components: Ind

[jira] Updated: (LUCENE-1585) Allow to control how payloads are merged

2009-08-18 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch updated LUCENE-1585: -- Fix Version/s: 3.1 > Allow to control how payloads are mer

[jira] Resolved: (LUCENE-1776) NearSpansOrdered does not lazy load payloads as the PayloadSpans javadoc implies

2009-08-03 Thread Mark Miller (JIRA)
red does not lazy load payloads as the PayloadSpans javadoc > implies > > > Key: LUCENE-1776 > URL: https://issues.apache.org/jira/browse/LUCENE-1776 > P

[jira] Updated: (LUCENE-1776) NearSpansOrdered does not lazy load payloads as the PayloadSpans javadoc implies

2009-08-03 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1776: Attachment: LUCENE-1776.patch > NearSpansOrdered does not lazy load payloads as the PayloadSp

[jira] Created: (LUCENE-1776) NearSpansOrdered does not lazy load payloads as the PayloadSpans javadoc implies

2009-08-02 Thread Mark Miller (JIRA)
NearSpansOrdered does not lazy load payloads as the PayloadSpans javadoc implies Key: LUCENE-1776 URL: https://issues.apache.org/jira/browse/LUCENE-1776 Project: Lucene

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-17 Thread Grant Ingersoll
On Jun 15, 2009, at 2:11 PM, Grant Ingersoll wrote: More questions: 1. What about Highlighter and MoreLikeThis? They have not been converted. Also, what are they going to do if the attributes they need are not available? Caveat emptor? 2. Same for TermVectors. What if the user specif

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-17 Thread Michael Busch
On 6/15/09 10:10 AM, Grant Ingersoll wrote: But, as Michael M reminded me, it is complex, so please accept my apologies. No worries, Grant! I was not really offended, but rather confused... Thanks for clarifying. Michael

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Mark Miller
Grant Ingersoll wrote: 1. What about Highlighter I would guess Highlighter has not been updated because its kind of a royal * :) -- - Mark http://www.lucidimagination.com - To unsubscribe, e-mail: java-dev-unsubscr

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Mark Miller
Mark Miller wrote: Grant Ingersoll wrote: On Jun 14, 2009, at 8:05 PM, Michael Busch wrote: I'd be happy to discuss other API proposals that anybody brings up here, that have the same advantages and are more intuitive. We could also beef up the documentation and give a better example about

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Mark Miller
Grant Ingersoll wrote: On Jun 14, 2009, at 8:05 PM, Michael Busch wrote: I'd be happy to discuss other API proposals that anybody brings up here, that have the same advantages and are more intuitive. We could also beef up the documentation and give a better example about how to convert a st

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Michael Busch
*Sent:* Monday, June 15, 2009 10:39 PM *To:* java-dev@lucene.apache.org *Subject:* Re: New Token API was Re: Payloads and TrieRangeQuery I have implemented most of that actually (the interface part and Token implementing all of them). The problem is a paradigm change with the new API: the assum

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Robert Muir
yeah about 5 seconds in I saw that and decided to stick with what I know :) On Mon, Jun 15, 2009 at 5:10 PM, Mark Miller wrote: > I may do the Highlighter. Its annoying though - I'll have to break back > compat because Token is part of the public API (Fragmenter, etc). > > Robert Muir wrote: >> >>

Some SVN cleanup, was: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Uwe Schindler
rg > Subject: Re: New Token API was Re: Payloads and TrieRangeQuery > > On Mon, Jun 15, 2009 at 4:21 PM, Uwe Schindler wrote: > > > And, in tests: test/o/a/l/index/store is somehow wrong placed. The class > > inside should be in test/o/a/l/store. Shou

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Mark Miller
I may do the Highlighter. Its annoying though - I'll have to break back compat because Token is part of the public API (Fragmenter, etc). Robert Muir wrote: Michael OK, I plan on adding some tests for the analyzers that don't have any. I didn't try to migrate things such as highlighter, which

RE: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Uwe Schindler
@lucene.apache.org Subject: Re: New Token API was Re: Payloads and TrieRangeQuery I have implemented most of that actually (the interface part and Token implementing all of them). The problem is a paradigm change with the new API: the assumption is that there is always only one single instance of an Attribute

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Robert Muir
Michael OK, I plan on adding some tests for the analyzers that don't have any. I didn't try to migrate things such as highlighter, which are definitely just as important, only because I'm not familiar with that territory. But I think I can figure out what the various language analyzers are trying

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Michael Busch
I agree. It's my fault, the task of changing the contribs (LUCENE-1460) is assigned to me for a while now - I just haven't found the time to do it yet. It's great that you started the work on that! I'll try to review the patch in the next couple of days and help with fixing the remaining ones

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Michael McCandless
On Mon, Jun 15, 2009 at 4:21 PM, Uwe Schindler wrote: > And, in tests: test/o/a/l/index/store is somehow wrong placed. The class > inside should be in test/o/a/l/store. Should I move? Please do! Mike - To unsubscribe, e-mail: j

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Michael Busch
I have implemented most of that actually (the interface part and Token implementing all of them). The problem is a paradigm change with the new API: the assumption is that there is always only one single instance of an Attribute. With the old API, it is recommended to reuse the passed-in token
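The single-instance assumption described in this message can be illustrated with a small standalone sketch. The classes below (`TermAttr`, `WordStream`, `SingleAttributeDemo`) are hypothetical stand-ins written for this example, not Lucene's actual classes. They only mimic the pattern of one attribute object being overwritten on every `incrementToken()` call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Lucene classes.
final class TermAttr {
    String term = "";
}

final class WordStream {
    private final String[] words;
    private int next = 0;

    // One attribute instance for the lifetime of the stream.
    final TermAttr termAttr = new TermAttr();

    WordStream(String... words) {
        this.words = words;
    }

    /** Advances to the next word by overwriting the shared attribute in place. */
    boolean incrementToken() {
        if (next >= words.length) {
            return false;
        }
        termAttr.term = words[next++];
        return true;
    }
}

final class SingleAttributeDemo {

    /** Safe: captures the current value on each step. */
    static List<String> collectValues(WordStream stream) {
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) {
            out.add(stream.termAttr.term);
        }
        return out;
    }

    /** Pitfall: every list entry is the same object, so all show the last term. */
    static List<TermAttr> collectReferences(WordStream stream) {
        List<TermAttr> out = new ArrayList<>();
        while (stream.incrementToken()) {
            out.add(stream.termAttr);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collectValues(new WordStream("a", "b", "c"))); // prints [a, b, c]
        List<TermAttr> refs = collectReferences(new WordStream("a", "b", "c"));
        System.out.println(refs.get(0).term); // prints c
    }
}
```

Code that wants to keep a token around after advancing the stream has to copy the value out, because holding a reference to the attribute only ever shows the latest state.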

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Robert Muir
Michael, again I am terrible with such things myself... Personally I am impressed that you have the back compat, even if you don't change any code at all I think some reformatting of javadocs might make the situation a lot friendlier. I just listed everything that came to my mind immediately. I g

RE: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Uwe Schindler
> And I don't like the *useNewAPI*() methods either. I spent a lot of time > thinking about backwards compatibility for this API. It's tricky to do > without sacrificing performance. In API patches I find myself spending > more time for backwards-compatibility than for the actual new feature! :(

RE: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Uwe Schindler
hetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Uwe Schindler [mailto:u...@thetaphi.de] > Sent: Monday, June 15, 2009 10:18 PM > To: java-dev@lucene.apache.org > Subject: RE: New Token API was Re: Payloads and TrieRangeQuery > > > there's also

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Michael Busch
This is excellent feedback, Robert! I agree this is confusing; especially having a deprecated API and only an experimental one that replaces the old one. We need to change that. And I don't like the *useNewAPI*() methods either. I spent a lot of time thinking about backwards compatibility for th

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Mark Miller
Some great points - especially the decision between a deprecated API, and a new experimental one subject to change. Bit of a rock and a hard place for a new user. Perhaps we should add a little note with some guidance. - Mark Robert Muir wrote: let me try some slightly more constructive fee

RE: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Uwe Schindler
> there's also a stray bold tag gone haywire somewhere, possibly > .incrementToken() I fixed this. This was getting on my nerves the whole day when I wrote javadocs for NumericTokenStream... Uwe - To unsubscribe, e-mail: java-

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Robert Muir
let me try some slightly more constructive feedback: new user looks at TokenStream javadocs: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/analysis/TokenStream.html immediately they see deprecated, text in red with the words "experimental", warnings in bold, the

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Robert Muir
Mark, I'll see if I can get tests produced for some of those analyzers. as a new user of the new api myself, I think I can safely say the most confusing thing about it is having the old deprecated API mixed in the javadocs with it :) On Mon, Jun 15, 2009 at 2:53 PM, Mark Miller wrote: > Robert Mu

RE: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Uwe Schindler
> If you understood that, you'd be able to look > at the actual token value if you were interested in what shift was > used. So it's redundant, has a runtime cost, it's not currently used > anywhere, and it's not useful to fields other than Trie. Perhaps it > shouldn't exist (yet)? You are right

RE: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Uwe Schindler
redundant, has a runtime cost, it's not currently used > anywhere, and it's not useful to fields other than Trie. Perhaps it > shouldn't exist (yet)? The idea was to make the indexing process controllable. You were the one, who asked e.g. for the possibility to add payloads to tr

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Yonik Seeley
On Mon, Jun 15, 2009 at 3:00 PM, Uwe Schindler wrote: > There is a new Attribute called ShiftAttribute (or NumericShiftAttribute), > when trie range is moved to core. This attribute contains the shifted-away > bits from the prefix encoded value during trie indexing. I was wondering about this

RE: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Uwe Schindler
ed to core. This attribute contains the shifted-away bits from the prefix encoded value during trie indexing. The idea is to e.g. have TokenFilters that may add additional payloads or others to trie values, but only do this for specific precisions. In future, it may also be interesting to automatically add t

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Mark Miller
Robert Muir wrote: Mark, I created an issue for this. Thanks Robert, great idea. I just think you know, converting an analyzer to the new api is really not that bad. I don't either. I'm really just complaining about the initial readability. Once you know whats up, its not too much differ

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Robert Muir
Mark, I created an issue for this. I just think you know, converting an analyzer to the new api is really not that bad. reverse engineering what one of them does is not necessarily obvious, and is completely unrelated but necessary if they are to be migrated. I'd be willing to assist with some o

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Mark Miller
Robert Muir wrote: As Lucene's contrib hasn't been fully converted either (and its been quite some time now), someone has probably heard that groan before. hope this doesn't sound like a complaint, Complaints are fine in any case. Every now and then, it might cause a little rant from me o

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Grant Ingersoll
On Jun 14, 2009, at 8:05 PM, Michael Busch wrote: I'd be happy to discuss other API proposals that anybody brings up here, that have the same advantages and are more intuitive. We could also beef up the documentation and give a better example about how to convert a stream/filter from the

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Robert Muir
> > As Lucene's contrib hasn't been fully converted either (and its been quite > some time now), someone has probably heard that groan before. hope this doesn't sound like a complaint, but in my opinion this is because many do not have any tests. I converted a few of these and its just grunt work

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Mark Miller
Yonik Seeley wrote: The high-level description of the new API looks good (being able to add arbitrary properties to tokens), unfortunately, I've never had the time to try and use it and give any constructive feedback. As far as difficulty of use, I assume this only applies to implementing your o

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Yonik Seeley
The high-level description of the new API looks good (being able to add arbitrary properties to tokens), unfortunately, I've never had the time to try and use it and give any constructive feedback. As far as difficulty of use, I assume this only applies to implementing your own TokenFilter? It see

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Grant Ingersoll
On Jun 14, 2009, at 8:05 PM, Michael Busch wrote: I'm not sure why this (currently having to implement next() too) is such an issue for you. You brought it up at the Lucene meetup too. No user will ever have to implement both (the new API and the old) in their streams/filters. The only reas

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Grant Ingersoll
On Jun 15, 2009, at 12:19 PM, Michael McCandless wrote: I don't think anything was "held back" in this effort. Grant, are you referring to LUCENE-1458? That's "held back" simply because the only person working on it (me) got distracted by other things to work on. I'm sorry, I didn't mean to

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Michael McCandless
I thought the primary goal of switching to AttributeSource (yes, the name is very generic...) was to allow extensibility to what's created per-Token, so that an app could add their own attrs without costly subclassing/casting per Token, independent of other "things" adding their tokens, etc.

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-14 Thread Shai Erera
The "old" API is deprecated, and therefore when we release 2.9 there might be some people who'd think they should move away from it, to better prepare for 3.0 (while in fact this may not be the case). Also, we should make sure that when we remove all the deprecations, this will still exist (and th

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-14 Thread Mark Miller
Mark Miller wrote: I don't know how I feel about rolling the new token api back. I will say that I originally had no issue with it because I am very excited about Lucene-1458. At the same time though, I'm thinking Lucene-1458 is a very advanced issue that will likely be for really expert usa

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-14 Thread Mark Miller
I don't know how I feel about rolling the new token api back. I will say that I originally had no issue with it because I am very excited about Lucene-1458. At the same time though, I'm thinking Lucene-1458 is a very advanced issue that will likely be for really expert usage (though I can see

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-14 Thread Michael Busch
On 6/14/09 5:17 AM, Grant Ingersoll wrote: Agreed. I've been bringing it up for a while now and made the same comments when it was first introduced, but felt like the lone voice in the wilderness on it and gave way [1], [2], [3]. Now that others are writing/converting, I think it is worth rev

New Token API was Re: Payloads and TrieRangeQuery

2009-06-14 Thread Grant Ingersoll
Agreed. I've been bringing it up for a while now and made the same comments when it was first introduced, but felt like the lone voice in the wilderness on it and gave way [1], [2], [3]. Now that others are writing/converting, I think it is worth revisiting. That being said, I did just wr

Re: Payloads and TrieRangeQuery

2009-06-14 Thread Earwin Burrfoot
> Just to throw something out, the new Token API is not very consumable in my > experience. The old one was very intuitive and very easy to follow the code. > > I've had to refigure out what the heck was going on with the new one more > than once now. Writing some example code with it is hard to fo

Re: Payloads and TrieRangeQuery

2009-06-13 Thread Mark Miller
Yonik Seeley wrote: Even non-API changes have tradeoffs... the indexing improvements (err, total rewrite) made that code *much* harder to understand and debug. It's a net win since the indexing performance improvements were so fantastic. I agree - very hard to follow, worth the improvements.

Re: Payloads and TrieRangeQuery

2009-06-13 Thread Grant Ingersoll
I do it approach", but as is obvious, not everyone has that luxury b/c they aren't committers on both projects. Integrating Tika into Solr was logical, while the DelimitedPayload stuff logically belonged in contrib/analyzers (to me anyway, and one of my primary motivations for tha

Re: Payloads and TrieRangeQuery

2009-06-13 Thread Yonik Seeley
Of course consumability (good APIs) is important, but rational people can disagree when it comes to the specifics... many things come with tradeoffs. Even non-API changes have tradeoffs... the indexing improvements (err, total rewrite) made that code *much* harder to understand and debug. It's a

Re: Payloads and TrieRangeQuery

2009-06-13 Thread Simon Willnauer
Very true write up Grant! On Sat, Jun 13, 2009 at 2:58 PM, Michael McCandless wrote: > OK, good points Grant.  I now agree that it's not a simple task, > moving stuff core stuff from Solr -> Lucene.  So summing this all up: > >  * Some feel Lucene should only aim to be the core "expert" engine >  

Re: Payloads and TrieRangeQuery

2009-06-13 Thread Michael McCandless
OK, good points Grant. I now agree that it's not a simple task, moving stuff core stuff from Solr -> Lucene. So summing this all up: * Some feel Lucene should only aim to be the core "expert" engine used by Solr/Nutch/etc., so things like moving trie to core (with consumable naming, go

[jira] Resolved: (LUCENE-1676) New Token filter for adding payloads "in-stream"

2009-06-12 Thread Grant Ingersoll (JIRA)
. > New Token filter for adding payloads "in-stream" > > > Key: LUCENE-1676 > URL: https://issues.apache.org/jira/browse/LUCENE-1676 > Project: Lucene - Java >

Re: Payloads and TrieRangeQuery

2009-06-12 Thread Grant Ingersoll
On Jun 12, 2009, at 12:20 PM, Michael McCandless wrote: On Thu, Jun 11, 2009 at 4:58 PM, Yonik Seeley> wrote: In Solr land we can quickly hack something together, spend some time thinking about the external HTTP interface, and immediately make it available to users (those using nightlies at l

[jira] Commented: (LUCENE-1676) New Token filter for adding payloads "in-stream"

2009-06-12 Thread Grant Ingersoll (JIRA)
arser.java https://svn.apache.org/repos/asf/harmony/enhanced/classlib/archive/java6/modules/luni/src/main/java/java/lang/Integer.java > New Token filter for adding payloads "in-stream" > > > Key: LUCENE-1676 >

Re: Payloads and TrieRangeQuery

2009-06-12 Thread Michael McCandless
On Thu, Jun 11, 2009 at 4:58 PM, Yonik Seeley wrote: > In Solr land we can quickly hack something together, spend some time > thinking about the external HTTP interface, and immediately make it > available to users (those using nightlies at least). It would be a > huge burden to say to Solr that

[jira] Commented: (LUCENE-1676) New Token filter for adding payloads "in-stream"

2009-06-12 Thread Grant Ingersoll (JIRA)
le have a better way to convert from char[] to byte[] for encoding the payloads (see FloatEncoder), other than going through Strings. > New Token filter for adding payloads "in-stream" > > > Key: LUCENE-16

[jira] Commented: (LUCENE-1676) New Token filter for adding payloads "in-stream"

2009-06-12 Thread Grant Ingersoll (JIRA)
I'm going to commit this today. > New Token filter for adding payloads "in-stream" > > > Key: LUCENE-1676 > URL: https://issues.apache.org/jira/browse/LUCENE-1676 > Project

Re: Payloads and TrieRangeQuery

2009-06-11 Thread Yonik Seeley
In Solr land we can quickly hack something together, spend some time thinking about the external HTTP interface, and immediately make it available to users (those using nightlies at least). It would be a huge burden to say to Solr that anything of interest to the Lucene community should be pulled

[jira] Commented: (LUCENE-1676) New Token filter for adding payloads "in-stream"

2009-06-11 Thread Michael McCandless (JIRA)
. I would lean towards always using contrib/CHANGES. And then we should double-check all core CHANGES entries in 2.9 and move them to contrib if needed. > New Token filter for adding payloads "in-stream" > > >

Re: Payloads and TrieRangeQuery

2009-06-11 Thread Michael McCandless
On Thu, Jun 11, 2009 at 9:20 AM, Uwe Schindler wrote: > In my opinion, solr and lucene should exchange technology much more. Solr > should concentrate on the "search server" and lucene should provide the > technology. +1 > All additional implementations inside solr like faceting and so > on, cou

Re: Payloads and TrieRangeQuery

2009-06-11 Thread Michael McCandless
On Thu, Jun 11, 2009 at 8:46 AM, Yonik Seeley wrote: >>> Really goes into Solr land... my pref for Lucene is to remain a core >>> expert-level full-text search library and keep out things that are >>> easy to do in an application or at another level. >> >> I think this must be the crux of our disa

[jira] Commented: (LUCENE-1676) New Token filter for adding payloads "in-stream"

2009-06-11 Thread Mark Miller (JIRA)
r for adding payloads "in-stream" > > > Key: LUCENE-1676 > URL: https://issues.apache.org/jira/browse/LUCENE-1676 > Project: Lucene - Java > Issue Type: New Feature

[jira] Commented: (LUCENE-1676) New Token filter for adding payloads "in-stream"

2009-06-11 Thread Michael McCandless (JIRA)
tent about it in the past... it's very much a chicken/egg thing, though. If we consistently use contrib's CHANGES then presumably it'd get more visibility. But I really don't feel strongly one way or another... > New Token filter for

RE: Payloads and TrieRangeQuery

2009-06-11 Thread Uwe Schindler
From: Michael McCandless [mailto:luc...@mikemccandless.com] > On Wed, Jun 10, 2009 at 6:07 PM, Yonik Seeley > wrote: > > > Really goes into Solr land... my pref for Lucene is to remain a core > > expert-level full-text search library and keep out things that are > > easy to do in an application or

[jira] Commented: (LUCENE-1676) New Token filter for adding payloads "in-stream"

2009-06-11 Thread Grant Ingersoll (JIRA)
entry in this patch go into contrib/CHANGES? It can, I've never quite been sure. I think more people read the top-level CHANGES, thus it is more likely to be noticed, but I'm fine either way. > New Token filter for addi

[jira] Commented: (LUCENE-1676) New Token filter for adding payloads "in-stream"

2009-06-11 Thread Mark Miller (JIRA)
past. I have seen an occasion or two where contrib changes have made core changes. I think it's inconsistent, and we should keep those changes in their respective changes.txt or make one for them, but it has happened. > New Token filter for adding payloads "

[jira] Commented: (LUCENE-1676) New Token filter for adding payloads "in-stream"

2009-06-11 Thread Michael McCandless (JIRA)
entry in this patch go into contrib/CHANGES? > New Token filter for adding payloads "in-stream" > > > Key: LUCENE-1676 > URL: https://issues.apache.org/jira/browse/LUCENE-1676 >

Re: Payloads and TrieRangeQuery

2009-06-11 Thread Yonik Seeley
On Thu, Jun 11, 2009 at 7:01 AM, Michael McCandless wrote: > On Wed, Jun 10, 2009 at 6:07 PM, Yonik Seeley > wrote: > >> Really goes into Solr land... my pref for Lucene is to remain a core >> expert-level full-text search library and keep out things that are >> easy to do in an application or at

Re: Payloads and TrieRangeQuery

2009-06-11 Thread Michael McCandless
On Wed, Jun 10, 2009 at 6:07 PM, Yonik Seeley wrote: > Really goes into Solr land... my pref for Lucene is to remain a core > expert-level full-text search library and keep out things that are > easy to do in an application or at another level. I think this must be the crux of our disagreement.

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Yonik Seeley
On Wed, Jun 10, 2009 at 5:45 PM, Michael McCandless wrote: > But, I realize this is a stretch... eg we'd have to fix rewrite to be > per-segment, which certainly seems spooky.  A top-level schema would > definitely be cleaner. Really goes into Solr land... my pref for Lucene is to remain a core e

RE: Payloads and TrieRangeQuery

2009-06-10 Thread Uwe Schindler
> > Another question not so simple to answer: When embedding these > TermPositions > > into the whole process, how would this work with MultiTermQuery? > > There's no reason why Trie has to use MultiTermQuery, right? No but is elegant and simplifies much (see current code in trunk). Uwe --

RE: Payloads and TrieRangeQuery

2009-06-10 Thread Uwe Schindler
> I think we'd need richer communication between MTQ and its subclasses, > so that eg your enum would return a Query instead of a Term? > > Then you'd either return a TermQuery, or, a BooleanQuery that's > filtering the TermQuery? > > But yes, doing after 3.0 seems good! There is one other thing

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Yonik Seeley
> Another question not so simple to answer: When embedding these TermPositions > into the whole process, how would this work with MultiTermQuery? There's no reason why Trie has to use MultiTermQuery, right? -Yonik http://www.lucidimagination.com --

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Michael McCandless
On Wed, Jun 10, 2009 at 5:24 PM, Yonik Seeley wrote: > On Wed, Jun 10, 2009 at 5:03 PM, Michael McCandless > wrote: >> On Wed, Jun 10, 2009 at 4:04 PM, Earwin Burrfoot wrote: >> * Was the field even indexed w/ Trie, or indexed as "simple text"? > > Why the special treatment for Trie? So that at

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Michael McCandless
>> these >> classes for the different data types seems to be more elegant and simpler >> to maintain than the current way (having a class for each bit size). >> >> So I think I will start with 1673 and try to present something useable, >> soon >> (but without

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Earwin Burrfoot
>  * Was the field even indexed w/ Trie, or indexed as "simple text"? >    It's useful to know this "automatically" at search time, so eg a >    RangeQuery can do the right thing by default.  FieldInfos seems >    like the natural place to store this.  It's basically Lucene's >    per-segment write

RE: Payloads and TrieRangeQuery

2009-06-10 Thread Uwe Schindler
s seems to be more elegant and simpler > to maintain than the current way (having a class for each bit size). > > So I think I will start with 1673 and try to present something useable, > soon > (but without payloads, so the payload/position-bits setting is "0"). Another quest

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Yonik Seeley
On Wed, Jun 10, 2009 at 5:03 PM, Michael McCandless wrote: > On Wed, Jun 10, 2009 at 4:04 PM, Earwin Burrfoot wrote: >  * Was the field even indexed w/ Trie, or indexed as "simple text"? Why the special treatment for Trie? >    It's useful to know this "automatically" at search time, so eg a >  

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Michael McCandless
having a class for each bit size). +1 > So I think I will start with 1673 and try to present something useable, soon > (but without payloads, so the payload/position-bits setting is "0"). > Now the one question: Which name for the numeric range queries/fields? :-( How about:
