: I apologize for cross-posting but I believe both Solr and Lucene users and : developers should be concerned with this. I am not aware of a better way to : reach both communities.
some of these questions strike me as being largely unrelated. if anyone wishes to followup on them further, let's do it in (new) seperate threads for each topic, on the specific list appropriate to the topic... : * Do TokenFilters belong in the Solr code base at all? Yes, in so much as any java code belongs in the Solr code base (or the nutch code base for that matter). They are seperate projects with seperate communities and seperate needs -- that doesn't mean that there isn't code in Solr which could be useful to the broader community of lucene-java; in that case the appropriate course of action is to open a LUCENE issue to "promote" the code up into lucene-java, and a dependent issue in SOLR to deprecate the current code and use the newer code instead. as some people may be aware, there was a discussion aboutthis sort of thing at ApacheCon during the Lucene BOF -- some reasons this doesn't happen as often as it seems like it should are: * the code may have subtle dependency tendrals that make it hard to refactor from one code base to the other. * the tests are frequently harder to "promote" then the code (in the case of most Solr tests that use the TestHarness, it's probably easier to write new tests from scratch) * when promoting the code, it's the best time to consider wether the existing API is really the "best" API before a lot of new people start using it (compare Solr's FunctionQuery and Lucenes CustomScoreQuery for example) * someone needs to care enough to follow through on the promotion. ...further discussion is best suited for java-dev since the topic is not Solr specific (there's a lot of Nutch code out there that people have sked about promoting as well) : * How to deal with TokenFilters that add new Tokens to the stream? This is specificly regarding Payloads yes? also a pretty clear cut java-dev discussion (and one possibly already being discussed in the monolithic Payload API thread i haven't started reading yet). lucene-java sets the API and the semantics ... Solr code will follow them. : * How to patch TokenFilters and Tokenizers using the model of : LUCENE-969 in the Solr code base and in Lucene contrib? open SOLR issues containing a patchs for any Solr code that needs changed, and LUCENE issues containing patches for contrib code that needs changed. : I thought it might be useful to figure out which existing TokenFilters need to : know about Payloads. To this end I have taken an inventory of the : TokenFilters out there. I think it is fair to categorize them by Add (A), : Delete (D), Modify (M), Observe (O): again: this is a straight forward luence-java question ... once the semantics have been worked out, then there can be a Solr specific discussion about following them. (which is not to say that the Solr classes/use-cases shouldn't be considered in the discussion, just that java-dev is the right place to have the conversation) -Hoss