[jira] Updated: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

2009-11-16 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1526: - Attachment: LUCENE-1526.patch Inlined into SegmentTermDocs. If there's an issue with the

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
> > That's an amazing number of changes, even when you ignore name changes. > DM, for your reference, I created another diff from 4.0->5.1, showing what will happen with JDK7 here: http://people.apache.org/~rmuir/unicodeDiff2.txt the problem is that as a search engine library, lucene cares about

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
actually i thought about this. i change my story. deprecating anything is stupid, because its still not back compatible, i.e. Character.isLetter(char) even returns different results now, even if we invoke it. hard break is the only solution. we should have done this deprecation in 2.9, but its c
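
The point that `Character.isLetter` can answer differently depending on the JVM can be checked with a small probe. This is only a sketch: the class name `CharProbe` and the sample code points are illustrative choices, not anything from the thread's patches (the soft hyphen and the Arabic end of ayah are the two characters named as examples elsewhere in this thread). For some code points the output depends on the Unicode tables of the JVM that runs it, so no expected output is given.

```java
import java.util.Locale;

/** Hypothetical helper: reports how the running JVM classifies a code point. */
final class CharProbe {

    /** Formats the code point with the JVM's isLetter() answer and general category. */
    static String describe(int codePoint) {
        return String.format(Locale.ROOT, "U+%04X letter=%b type=%d",
                codePoint,
                Character.isLetter(codePoint),   // code point overload, available since Java 5
                Character.getType(codePoint));   // general category as an int constant
    }

    public static void main(String[] args) {
        // Sample code points (assumed for illustration): 'a', SOFT HYPHEN,
        // ARABIC END OF AYAH, and one supplementary-plane letter.
        int[] samples = { 'a', 0x00AD, 0x06DD, 0x10400 };
        for (int cp : samples) {
            System.out.println(describe(cp));
        }
    }
}
```

Running the same class under two different JDKs and diffing the output is one way to see which classifications moved between Unicode versions.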

[jira] Updated: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

2009-11-16 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1526: - Attachment: LUCENE-1526.patch Here's a working version of this. The page size is statica
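
The idea named in the issue title, a paged copy-on-write bit vector, can be sketched roughly as follows. This is not the code from LUCENE-1526.patch: the class name `PagedCowBits`, the page size of 1024 bits, and the method names are all assumptions made for illustration. The sketch is not thread-safe.

```java
import java.util.Arrays;

/** Hypothetical sketch of a paged copy-on-write bit set; not the patch's class. */
final class PagedCowBits {
    private static final int PAGE_BITS = 1024;             // assumed fixed page size, in bits
    private static final int WORDS_PER_PAGE = PAGE_BITS / 64;

    private final long[][] pages;   // page table; individual pages may be shared between copies
    private final boolean[] owned;  // true if this instance may write the page in place

    PagedCowBits(int numBits) {
        int numPages = (numBits + PAGE_BITS - 1) / PAGE_BITS;
        pages = new long[numPages][WORDS_PER_PAGE];
        owned = new boolean[numPages];
        Arrays.fill(owned, true);
    }

    private PagedCowBits(long[][] sharedPages) {
        pages = sharedPages.clone();        // shallow copy: only the page table is duplicated
        owned = new boolean[pages.length];  // the copy owns no page yet
    }

    /** Returns a copy that shares every page with this instance until one side writes. */
    PagedCowBits snapshotCopy() {
        Arrays.fill(owned, false);          // the original must now copy before writing too
        return new PagedCowBits(pages);
    }

    boolean get(int bit) {
        long word = pages[bit / PAGE_BITS][(bit % PAGE_BITS) / 64];
        return (word & (1L << (bit % 64))) != 0;
    }

    void set(int bit) {
        int p = bit / PAGE_BITS;
        if (!owned[p]) {                    // first write to a shared page: copy just that page
            pages[p] = pages[p].clone();
            owned[p] = true;
        }
        pages[p][(bit % PAGE_BITS) / 64] |= 1L << (bit % 64);
    }
}
```

With this layout, a write after a copy costs one page copy rather than a copy of the whole vector, which is the trade-off that makes paging attractive when copies are taken frequently.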

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
completely ignoring the difficulty, I would propose to fix everything to correspond with the java 1.5 unicode version, for consistency. I would exempt StandardTokenizer, because its completely inside our control. we can fix it at our leisure. for the rest of this stuff, its already a 'change in ru

Re: Why release 3.0?

2009-11-16 Thread Mark Miller
So whats your best recommendation? Ignoring the difficulty and just considering whats best for users? Robert Muir wrote: > well, in all honesty there is a bit of complexity. > i leave the StandardTokenizer out of this, it gives the same results > regardless of JVM version. > it may not be correct,

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
well, in all honesty there is a bit of complexity. i leave the StandardTokenizer out of this, it gives the same results regardless of JVM version. it may not be correct, but its consistent, we could wait till 5.0 or 10.0 to make it correct :) Also, because it gives the same results regardless of JV

Re: Why release 3.0?

2009-11-16 Thread Mark Miller
Robert Muir wrote: > > >> and I think it sucks they might have to reindex twice with the >> current status of things (we did not complete unicode 4 support >> in lucene 3.0) >> which is why i mentioned this problem on the unicode 4 issues im >> trying to work. > > Whether 3.

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
On Mon, Nov 16, 2009 at 8:17 PM, DM Smith wrote: > > thanks DM, I hope to work on it more soon... > > I've been reading the thread and at first my response was. No big deal, it > won't affect me (i.e. awareness of the problem). And now my thought is "I'm > hosed" (i.e. understanding) > I guess

[jira] Created: (LUCENE-2076) Add org.apache.lucene.store.FSDirectory.getDirectory()

2009-11-16 Thread George Aroush (JIRA)
Add org.apache.lucene.store.FSDirectory.getDirectory() -- Key: LUCENE-2076 URL: https://issues.apache.org/jira/browse/LUCENE-2076 Project: Lucene - Java Issue Type: Wish Component

[jira] Issue Comment Edited: (LUCENE-2039) Regex support and beyond in JavaCC QueryParser

2009-11-16 Thread Luis Alves (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778685#action_12778685 ] Luis Alves edited comment on LUCENE-2039 at 11/17/09 1:46 AM: --

[jira] Commented: (LUCENE-2039) Regex support and beyond in JavaCC QueryParser

2009-11-16 Thread Luis Alves (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778685#action_12778685 ] Luis Alves commented on LUCENE-2039: +1 I'll work on changing the queryparser on Con

Re: Why release 3.0?

2009-11-16 Thread DM Smith
On Nov 16, 2009, at 7:53 PM, Robert Muir wrote: > right, the only way you could really contain it would be to do something like > that. I'm looking forward to your ICU analyzer! IMHO, it would be great to have it be a pluggable replacement for its counterparts in core. That is, using reflection, i

[jira] Commented: (LUCENE-2075) Share the Term -> TermInfo cache across threads

2009-11-16 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778675#action_12778675 ] Earwin Burrfoot commented on LUCENE-2075: - There's no such thing in Google Collect

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
right, the only way you could really contain it would be to do something like that. I just think we should make users aware of this, thats all. and I think it sucks they might have to reindex twice with the current status of things (we did not complete unicode 4 support in lucene 3.0) which is why

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
> Is core lucene really affected by the change? Or is it only contrib? I > mean, if we couldn't create an index using core with surrogate pairs and > other Unicode 4.0 stuff (though I'm not clear on the changes), how can it > change reading/searching the index? > > Sure, especially core analyzers l

Re: Why release 3.0?

2009-11-16 Thread DM Smith
On Nov 16, 2009, at 6:43 PM, Robert Muir wrote: > DM, in this case I'm not referring to surrogates, etc, but instead the idea > that properties for an existing character can change (the soft hyphen and > arabic ayah were two examples), also new characters are introduced. > > these will affect

[jira] Commented: (LUCENE-2075) Share the Term -> TermInfo cache across threads

2009-11-16 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778645#action_12778645 ] Jason Rutherglen commented on LUCENE-2075: -- Solr used CHM as an LRU, however it t

[jira] Commented: (LUCENE-2071) Allow updating of IndexWriter SegmentReaders

2009-11-16 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778641#action_12778641 ] Jason Rutherglen commented on LUCENE-2071: -- I suspect there's apps out in the wil

[jira] Commented: (LUCENE-2071) Allow updating of IndexWriter SegmentReaders

2009-11-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778624#action_12778624 ] Michael McCandless commented on LUCENE-2071: I would rather not open up such a

[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2074: -- Description: The current trunk version of StandardTokenizerImpl was generated by Java 1.4 (ac

[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2074: -- Fix Version/s: 3.1 > Use a separate JFlex generated Unicode 4 by Java 5 compatible > Standard

[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2074: -- Attachment: LUCENE-2074.patch Patch for trunk using Version.LUCENE_31 > Use a separate JFlex

[jira] Commented: (LUCENE-2047) IndexWriter should immediately resolve deleted docs to docID in near-real-time mode

2009-11-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778620#action_12778620 ] Michael McCandless commented on LUCENE-2047: bq. Reopening after every doc cou

[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2074: -- Attachment: LUCENE-2074-lucene30.patch This is the patch for version 3.0, that keeps the old j

[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778615#action_12778615 ] Robert Muir commented on LUCENE-2074: - Uwe, we could fix in 3.1 (but we should commit

[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778614#action_12778614 ] Uwe Schindler commented on LUCENE-2074: --- Should we fix this for 3.0 or not? The curr

[jira] Issue Comment Edited: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778598#action_12778598 ] Uwe Schindler edited comment on LUCENE-2074 at 11/16/09 10:18 PM: --

[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778598#action_12778598 ] Uwe Schindler commented on LUCENE-2074: --- It uses hardcoded char ranges, the parser is

[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2009-11-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778586#action_12778586 ] Michael McCandless commented on LUCENE-1458: Thanks Mark! Hopefully, once 3.0

[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778585#action_12778585 ] Robert Muir commented on LUCENE-2074: - well, the wikipediatokenizer at least is simila

[jira] Issue Comment Edited: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778582#action_12778582 ] Uwe Schindler edited comment on LUCENE-2074 at 11/16/09 10:01 PM: --

[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778583#action_12778583 ] Michael McCandless commented on LUCENE-2074: bq. I feel bad about this whole V

[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778582#action_12778582 ] Uwe Schindler commented on LUCENE-2074: --- bq. Uwe, also, just checking, i don't know

[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778580#action_12778580 ] Simon Willnauer commented on LUCENE-2074: - bq. The problem is, these are the hard

[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778579#action_12778579 ] Robert Muir commented on LUCENE-2074: - Uwe, also, just checking, i don't know javacc a

RE: 3.0.0-rc1 build, please check before I post to java-user

2009-11-16 Thread Uwe Schindler
I removed the artifacts from p.a.o, they were not made really public to java-user. Let's build new ones after the jflex thing is committed. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Uwe Schindl

[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778575#action_12778575 ] Uwe Schindler commented on LUCENE-2074: --- I add the warning to my patch! Thanks. What

[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778574#action_12778574 ] Simon Willnauer commented on LUCENE-2074: - nothing against the patch! I just used

[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2074: -- Attachment: LUCENE-2074.patch Updated patch with comment fixed and dead Token-related code rem

[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778571#action_12778571 ] Mark Miller commented on LUCENE-2074: - {quote} We should really try hard to find diffe

[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778570#action_12778570 ] Simon Willnauer commented on LUCENE-2074: - bq. For this one it's not new, it was t

[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-2074: Attachment: jflexwarning.patch I still think we also still need a more prominent warning system.

[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778569#action_12778569 ] Robert Muir commented on LUCENE-2074: - I am anti-Version too in a lot of ways. I worry

[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778562#action_12778562 ] Uwe Schindler commented on LUCENE-2074: --- For this one it's not new, it was there bef

Re: Efficient Query Evaluation using a Two-Level Retrieval Process

2009-11-16 Thread Paul Elschot
Op maandag 16 november 2009 19:09:52 schreef J. Delgado: > On Mon, Nov 16, 2009 at 9:44 AM, Earwin Burrfoot wrote: > > This algo is strictly tied to sort-by-score, if I understand it correctly. > > Lucene has queries and sorting decoupled (except for allowOutOfOrder > > mess), so implementing it w

[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778559#action_12778559 ] Simon Willnauer commented on LUCENE-2074: - This might be the wrong place to mentio

[jira] Created: (LUCENE-2075) Share the Term -> TermInfo cache across threads

2009-11-16 Thread Michael McCandless (JIRA)
Share the Term -> TermInfo cache across threads --- Key: LUCENE-2075 URL: https://issues.apache.org/jira/browse/LUCENE-2075 Project: Lucene - Java Issue Type: Improvement Components: Inde

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
I opened https://issues.apache.org/jira/browse/LUCENE-2074 It fixes the problem, the patch uses a different impl depending on matchVersion. If I commit it now, I would regenerate the rc1 artifacts and release the tomorrow to java-user. Currently the ones on people.apache.org are only "known" to j

[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2074: -- Attachment: LUCENE-2074.patch Here the patch. It uses an interface containing the needed metho

[jira] Assigned: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned LUCENE-2074: - Assignee: Uwe Schindler > Use a separate JFlex generated Unicode 4 by Java 5 compatible

[jira] Created: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2009-11-16 Thread Uwe Schindler (JIRA)
Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer --- Key: LUCENE-2074 URL: https://issues.apache.org/jira/browse/LUCENE-2074 Project: Lucene - J

[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2009-11-16 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778548#action_12778548 ] Mark Miller commented on LUCENE-1458: - Merged up - I've got to say - that was a nasty o

[jira] Commented: (LUCENE-2073) Document issues involved in building your index with one jdk version and then searching/updating with another

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778547#action_12778547 ] Robert Muir commented on LUCENE-2073: - Mark, I agree, there are two issues I know of:

[jira] Created: (LUCENE-2073) Document issues involved in building your index with one jdk version and then searching/updating with another

2009-11-16 Thread Mark Miller (JIRA)
Document issues involved in building your index with one jdk version and then searching/updating with another - Key: LUCENE-2073 URL: https://issues.apache

[jira] Commented: (LUCENE-1689) supplementary character handling

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778536#action_12778536 ] Robert Muir commented on LUCENE-1689: - bq. Then thats what I am saying we should be do

[jira] Commented: (LUCENE-1689) supplementary character handling

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778532#action_12778532 ] Robert Muir commented on LUCENE-1689: - Steven, no its definitely the right place to po

[jira] Commented: (LUCENE-1689) supplementary character handling

2009-11-16 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778531#action_12778531 ] Mark Miller commented on LUCENE-1689: - bq. Mark honestly, I do not yet know how this o

[jira] Commented: (LUCENE-1689) supplementary character handling

2009-11-16 Thread Steven Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778528#action_12778528 ] Steven Rowe commented on LUCENE-1689: - I don't know if this is the right place to poin

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
OK, I checked. The JFlex file in trunk was generated with 1.4. I regenerated it with 1.5 and the result was different (completely!). I saved the old version and renamed it to StandardTokenizerImplJava14 extends StandardTokenizerImpl. This way the impl is exchanged depending on version. The 1.4 version can no longer be

[jira] Commented: (LUCENE-1689) supplementary character handling

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778527#action_12778527 ] Robert Muir commented on LUCENE-1689: - bq. We can fix that too? If so, I think we shou

[jira] Commented: (LUCENE-1689) supplementary character handling

2009-11-16 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778526#action_12778526 ] Mark Miller commented on LUCENE-1689: - I'm speaking in regards to: {quote} btw, its w

[jira] Commented: (LUCENE-1689) supplementary character handling

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778524#action_12778524 ] Robert Muir commented on LUCENE-1689: - bq. If there is nothing we can do here, then we

Re: Why release 3.0?

2009-11-16 Thread Mark Miller
I still recommend we add a file then, HowToRegenJflex.txt or something, that specifically says to use 1.5 or 1.6. I don't think changing the current notice/warning is visible enough to ensure someone doesn't break this. Robert Muir wrote: > no. its still 4.0, but i hear 1.7 will be 5.1 or 5.2 > > the on

[jira] Commented: (LUCENE-1689) supplementary character handling

2009-11-16 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778516#action_12778516 ] Mark Miller commented on LUCENE-1689: - If there is nothing we can do here, then we jus

[jira] Assigned: (LUCENE-2072) Upgrade contrib/regex to jakarta-regex 1.5

2009-11-16 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-2072: --- Assignee: Simon Willnauer > Upgrade contrib/regex to jakarta-regex 1.5 > --

[jira] Commented: (LUCENE-2069) fix LowerCaseFilter for unicode 4.0

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778515#action_12778515 ] Robert Muir commented on LUCENE-2069: - Uwe, we can use matchVersion for all of this, t

[jira] Updated: (LUCENE-2072) Upgrade contrib/regex to jakarta-regex 1.5

2009-11-16 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2072: Attachment: jakarta-regexp-1.5.jar LUCENE-2072.patch > Upgrade contrib/reg

[jira] Created: (LUCENE-2072) Upgrade contrib/regex to jakarta-regex 1.5

2009-11-16 Thread Simon Willnauer (JIRA)
Upgrade contrib/regex to jakarta-regex 1.5 --- Key: LUCENE-2072 URL: https://issues.apache.org/jira/browse/LUCENE-2072 Project: Lucene - Java Issue Type: Improvement Components: contrib/*

[jira] Commented: (LUCENE-2069) fix LowerCaseFilter for unicode 4.0

2009-11-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778514#action_12778514 ] Uwe Schindler commented on LUCENE-2069: --- we can change it whenever we want, we must

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
no. its still 4.0, but i hear 1.7 will be 5.1 or 5.2 the only way to truly control this, would be to use something like ICU to control the unicode version being used (and actually be faster, and support higher version). see http://site.icu-project.org/home/why-use-icu4j the issue is that lucene d

[jira] Commented: (LUCENE-2069) fix LowerCaseFilter for unicode 4.0

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778510#action_12778510 ] Robert Muir commented on LUCENE-2069: - Simon, yes see LUCENE-1689. this is my questio

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
Did 1.6 change the unicode version? Robert? - UWE SCHINDLER Webserver/Middleware Development PANGAEA - Publishing Network for Geoscientific and Environmental Data MARUM - University of Bremen Room 2500, Leobener Str., D-28359 Bremen Tel.: +49 421 218 65595 Fax: +49 421 218 65505 http://www.pa

[jira] Commented: (LUCENE-2069) fix LowerCaseFilter for unicode 4.0

2009-11-16 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778509#action_12778509 ] Simon Willnauer commented on LUCENE-2069: - we might need a changes.txt entry here

Re: Why release 3.0?

2009-11-16 Thread Mark Miller
And what happens when someone regenerates it with 1.6 without knowing? Uwe Schindler wrote: > I check this by generating the file with 1.4 and 1.5. The 1.4 version will > not change anymore, so we just leave the java file no jflex anymore. The old > one is used for Lucene until 2.9, if you use mat

[jira] Commented: (LUCENE-2069) fix LowerCaseFilter for unicode 4.0

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778508#action_12778508 ] Robert Muir commented on LUCENE-2069: - Simon, those "weird" chars are indeed real code

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
mark these are similar to my concerns with us doing unicode 4.0 (suppl. characters, etc) support in 3.1. this is why i left a comment on LUCENE-1689, I'm pretty confused about what approach we should take, because technically, fixing this will break things. and again, I do believe we should have f

[jira] Commented: (LUCENE-2068) fix reverseStringFilter for unicode 4.0

2009-11-16 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778507#action_12778507 ] Simon Willnauer commented on LUCENE-2068: - We will get this in once 3.0 is out. I

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
I check this by generating the file with 1.4 and 1.5. The 1.4 version will not change anymore, so we just leave the java file no jflex anymore. The old one is used for Lucene until 2.9, if you use matchVersion=LUCENE_30, the new one is used, which can also be regenerated. - Uwe Schindler H.-H.
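
The selection described here, a frozen scanner for indexes built up to 2.9 and a regenerable one from matchVersion=LUCENE_30 on, could look roughly like the sketch below. Every type in it is a stand-in invented for illustration (`MatchVersion`, `TokenScanner`, `FrozenJava14Scanner`, `RegenerableJava5Scanner`, `ScannerSelector`); none of them is Lucene's actual `Version` or the classes in the LUCENE-2074 patch.

```java
/** Stand-in for a version constant type; declaration order defines the ordering. */
enum MatchVersion { LUCENE_29, LUCENE_30 }

/** Stand-in for the small interface both generated scanners would implement. */
interface TokenScanner {
    String describe();
}

/** Hypothetical frozen scanner: kept as a plain Java file, never regenerated. */
final class FrozenJava14Scanner implements TokenScanner {
    public String describe() { return "frozen scanner generated under Java 1.4"; }
}

/** Hypothetical current scanner: regenerated from the grammar under Java 5. */
final class RegenerableJava5Scanner implements TokenScanner {
    public String describe() { return "scanner regenerated under Java 5"; }
}

final class ScannerSelector {
    /** Picks the implementation from the requested compatibility version. */
    static TokenScanner forVersion(MatchVersion matchVersion) {
        // Versions at or after LUCENE_30 get the new scanner; older ones keep the frozen one.
        return matchVersion.compareTo(MatchVersion.LUCENE_30) >= 0
                ? new RegenerableJava5Scanner()
                : new FrozenJava14Scanner();
    }
}
```

Keeping the decision in one factory method means the tokenizer itself only ever sees the interface, so an index built with the old rules keeps tokenizing the same way after an upgrade.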

[jira] Commented: (LUCENE-2069) fix LowerCaseFilter for unicode 4.0

2009-11-16 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778504#action_12778504 ] Simon Willnauer commented on LUCENE-2069: - Robert, I assume you did use those weir

Re: Why release 3.0?

2009-11-16 Thread Mark Miller
Good point - and that likely means the current warning is not working - what can we do to improve it? Perhaps a new text file called jflexregen or something, and it specifically says you must use java 1.5? Uwe Schindler wrote: > > I think the regenerated code in Standard is since years no longer

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
I would rename the java file/class and write a big warning on it: for version < 3.0. Do not recreate (which cannot be done, because jflex file is missing). The current jflex file is recreated and is now the official support 1.5 version. The 1.4 version will never change! - Uwe Schindler H.-

Re: Why release 3.0?

2009-11-16 Thread Mark Miller
This is a big deal, whether it's JDK or Lucene related. We are forcing those on 1.4 to move to 1.5 - any problems you face with that with the JDK are Lucene problems if they affect Lucene. We need big clear warnings about this - we should have had them before we pushed users to 1.5 as well if I a

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
We support 3.0, why do you tend to say something other? I will always fix the bug first in 3.0 and then merge (perhaps) back to 2.9. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de _ From: Erick Erickson [mailto:erickerick...@gmail.

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
Steven, I think we can be almost sure of no latin-1 changes. what do you think about this jflex situation though? it seems like a mess, is there anything we can do before the jflex 1.5 stuff that is going on now (where we could actually link Version to the unicode version jflex uses explicitly?)

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
I have to regenerate the JFlex files to be sure that they are Java 5. Should I do that and recreate the artifacts? They are not yet released. The correct approach would be to copy the current generated Java file and use it if matchVersion < Version.LUCENE_30. For 3.0++ we have a new one. If the old one is really

Re: Why release 3.0?

2009-11-16 Thread Erick Erickson
Oops, stupid mouse made me send a blank message. Ok, I withdraw the question since there *are* good reasons to put 3.0 in a prod environment . It's also an easier thing to say "new Lucene users should start with 3.0" rather than "new Lucene users should start with 3.1. Use 3.0 until we release 3.1

Re: Why release 3.0?

2009-11-16 Thread Erick Erickson
On Mon, Nov 16, 2009 at 2:03 PM, Uwe Schindler wrote: > Hi Erick, > > > > 3.0 is **not** an unsupported or beta release, it is the cleaned up 2.9.1 > release. You are right, it is not needed for 2.9.1 users to upgrade (but > they can), but for new users starting with Lucene, the recommendation is t

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
i suppose we are ok then, except for the fact that now StandardTokenizer is working with a unicode 3.0 definition, instead of the unicode version (4.0) that corresponds to our required minimum jre (1.5)... sorry if i raised a stink about nothing, but you see my concerns maybe? On Mon, Nov 16, 200

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
JFlex was not regenerated as far as I know, but if somebody did, it's already broken. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de _ From: Robert Muir [mailto:rcm...@gmail.com] Sent: Monday, November 16, 2009 8:53 PM To: java-

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
I think the generated code in Standard has not been generated with 1.4 for years :-) Most developers use 1.5 or even 1.6. So it has already changed incompatibly. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de _ From: Robert Muir

[jira] Assigned: (LUCENE-2068) fix reverseStringFilter for unicode 4.0

2009-11-16 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-2068: --- Assignee: Simon Willnauer > fix reverseStringFilter for unicode 4.0 > --

[jira] Updated: (LUCENE-2068) fix reverseStringFilter for unicode 4.0

2009-11-16 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2068: Attachment: LUCENE_2068.patch removed static import > fix reverseStringFilter for unicode

[jira] Commented: (LUCENE-2068) fix reverseStringFilter for unicode 4.0

2009-11-16 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778487#action_12778487 ] Simon Willnauer commented on LUCENE-2068: - bq. I just think we should use a consis

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
btw, so here's a great example. you are backwards broken regardless of JVM for StandardTokenizer, because we used 1.4 JRE to run jflex in 2.9, but 1.5 in 3.0, right? On Mon, Nov 16, 2009 at 2:51 PM, Robert Muir wrote: > Uwe, that's probably a good solution I think. just as long as we document > so

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
Uwe, that's probably a good solution I think. just as long as we document somewhere, I think there is some warning verbiage in StandardTokenizer already about this. NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate the tokenizer, remember to use JRE 1.4 to run jflex (befor

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
But it is a general warning that should be placed in the Wiki: If you upgrade from Java 1.4 to Java 5, think about reindexing. It has definitely nothing to do with 3.0, because users could have changed (and most of them have) before. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
right, my point is it's true it's nothing to do with Lucene at all, really. but the reality is we should clarify this to users I think. It's especially complex in the current StandardTokenizer, which uses a mix of hardcoded ranges and properties, can you tell me if you should reindex for given langu

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
We tried out Character.getType() for these two chars. Java 5: '\u00AD' = 16, '\u06DD' = 16. Java 1.4: '\u00AD' = 20, '\u06DD' = 7. The first is the soft hyphen. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de _ From: Robert Mu
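For anyone who wants to repeat this check on their own JVM, a minimal sketch follows. The class and method names are made up for illustration; only `Character.getType(int)` and the `Character` category constants are real JDK API. The numbers in the message line up with those constants: 16 is `FORMAT`, 20 is `DASH_PUNCTUATION`, and 7 is `ENCLOSING_MARK`.

```java
public class CharTypeProbe {
    // Returns the general-category constant the running JVM assigns to a code point.
    public static int typeOf(int codePoint) {
        return Character.getType(codePoint);
    }

    public static void main(String[] args) {
        // U+00AD is the soft hyphen; U+06DD is the second character from the message.
        int[] probes = { 0x00AD, 0x06DD };
        for (int cp : probes) {
            System.out.printf("U+%04X -> %d%n", cp, typeOf(cp));
        }
        // Print the constants so the raw numbers can be read as categories.
        System.out.println("FORMAT = " + Character.FORMAT);
        System.out.println("DASH_PUNCTUATION = " + Character.DASH_PUNCTUATION);
        System.out.println("ENCLOSING_MARK = " + Character.ENCLOSING_MARK);
    }
}
```

Running the same class under two different JREs and comparing the output is a quick way to see whether a character's category moved between the Unicode versions they ship.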

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
But most people already use 1.5 or 1.6 even with 2.9. They could also have switched before. The problem is the JVM used, not the Lucene version used. And you can also run Lucene 1.4.3 with Java 5 -> same problem. If people change their Java version, they have to take care of what changed. The only thing:
