[jira] Commented: (LUCENE-2396) remove version from core and contrib analyzers.

2010-04-15 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857595#action_12857595 ] DM Smith commented on LUCENE-2396: -- Humor me. I think I'm not seeing the fores

Re: Proposal about Version API "relaxation"

2010-04-15 Thread DM Smith
sure I understand. Is JFlex used by every tokenizer? > > Alternatively, we can think of writing an ICU analyzer/tokenizer, but we're > still using JFlex, so I don't know how much control we have on that ... Robert has already started one. (1488 I think). > > Shai >

[jira] Commented: (LUCENE-2396) remove version from core and contrib analyzers.

2010-04-15 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857543#action_12857543 ] DM Smith commented on LUCENE-2396: -- Hmmm. If we are moving stuff out of core and

Re: Proposal about Version API "relaxation"

2010-04-15 Thread DM Smith
On Apr 15, 2010, at 4:50 PM, Shai Erera wrote: > Robert ... I'm sorry but changes to Analyzers don't *force* people to > reindex. They can simply choose not to use the latest version. They can > choose not to upgrade a Unicode version. They can copy the entire Analyzer > code to match their ne

Re: Proposal about Version API "relaxation"

2010-04-15 Thread DM Smith
On 04/15/2010 03:25 PM, Shai Erera wrote: We should create a migrate() API on IW which will touch just those segments and not incur a full optimize. That API can also be used for an offline migration tool, if we decide that's what we want. What about an index that has already called optimize

Re: Proposal about Version API "relaxation"

2010-04-15 Thread DM Smith
On 04/15/2010 03:12 PM, Earwin Burrfoot wrote: On Thu, Apr 15, 2010 at 23:07, DM Smith wrote: On 04/15/2010 03:04 PM, Earwin Burrfoot wrote: BTW Earwin, we can come up w/ a migrate() method on IW to accomplish manual migration on the segments that are still on old versions. That's

[jira] Commented: (LUCENE-2396) remove version from contrib/analyzers.

2010-04-15 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857498#action_12857498 ] DM Smith commented on LUCENE-2396: -- {quote} bq. One mechanism that would wor

Re: Proposal about Version API "relaxation"

2010-04-15 Thread DM Smith
On 04/15/2010 03:04 PM, Earwin Burrfoot wrote: BTW Earwin, we can come up w/ a migrate() method on IW to accomplish manual migration on the segments that are still on old versions. That's not the point about whether optimize() is good or not. It is the difference between telling the customer to r
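The migrate() idea in this thread is to rewrite only the segments that are still on an old format, rather than optimizing the whole index. The selection step can be sketched as below. Everything here is hypothetical: Segment, formatVersion and segmentsToMigrate are illustrative names, not Lucene's IndexWriter or SegmentInfos API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MigrateSketch {

    // Hypothetical stand-in for per-segment metadata: a name plus the
    // format version that wrote the segment.
    static final class Segment {
        final String name;
        final int formatVersion;

        Segment(String name, int formatVersion) {
            this.name = name;
            this.formatVersion = formatVersion;
        }
    }

    // A migrate() pass would rewrite only segments older than the current format;
    // optimize() would merge every segment, including ones already up to date.
    static List<String> segmentsToMigrate(List<Segment> segments, int currentFormat) {
        List<String> stale = new ArrayList<String>();
        for (Segment s : segments) {
            if (s.formatVersion < currentFormat) {
                stale.add(s.name);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(
                new Segment("_0", 2),
                new Segment("_1", 3),
                new Segment("_2", 2));
        System.out.println(segmentsToMigrate(segments, 3)); // [_0, _2]
    }
}
```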

[jira] Commented: (LUCENE-2396) remove version from contrib/analyzers.

2010-04-15 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857487#action_12857487 ] DM Smith commented on LUCENE-2396: -- bq. Well, I think asking for a well-def

[jira] Issue Comment Edited: (LUCENE-2396) remove version from contrib/analyzers.

2010-04-15 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857456#action_12857456 ] DM Smith edited comment on LUCENE-2396 at 4/15/10 2:1

[jira] Commented: (LUCENE-2396) remove version from contrib/analyzers.

2010-04-15 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857456#action_12857456 ] DM Smith commented on LUCENE-2396: -- {quote} So I think we should instead use

Re: Proposal about Version API "relaxation"

2010-04-15 Thread DM Smith
On 04/15/2010 01:50 PM, Earwin Burrfoot wrote: First, the index format. IMHO, it is a good thing for a major release to be able to read the prior major release's index. And the ability to convert it to the current format via optimize is also good. Whatever is decided on this thread should take th

[jira] Commented: (LUCENE-2396) remove version from contrib/analyzers.

2010-04-15 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857427#action_12857427 ] DM Smith commented on LUCENE-2396: -- Robert, I think this is a red-herring. There

Re: Proposal about Version API "relaxation"

2010-04-15 Thread DM Smith
what technologies are used to do searches. If the latest Lucene jar does not let me use Version (or some other mechanism) to maintain compatibility with an older index, the user will have to re-index. Or I can forgo any future upgrades with Lucene. Neither is very palatable. -- DM Smith

Re: Proposal about Version API "relaxation"

2010-04-14 Thread DM Smith
On 04/14/2010 09:13 AM, Robert Muir wrote: It's not sidetracked at all. There seem to be more compelling alternatives to achieve the same thing, so we should consider alternative solutions, too. Maybe have the index store the version(s) and use that when constructing a reader or writer? Given en

Re: Proposal about Version API "relaxation"

2010-04-13 Thread DM Smith
I like the concept of version, but I'm concerned about it too. The current Version mechanism allows one to use more than one Version in their code. Imagine that we are at 3.2 and one was unable to upgrade to a most version for a particular feature. Let's also suppose that at 3.2 a new feature
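With the current Version mechanism, one application can mix versions, pinning each component to the behavior its index was built with. A toy enum illustrates the idea. MatchVersion, its constants, onOrAfter and normalize are hypothetical names for this sketch, and the behavior change they gate is invented for illustration only. It does not correspond to any real Lucene change.

```java
import java.util.Locale;

public class VersionSketch {

    // Hypothetical version constants; declaration order defines "newer".
    enum MatchVersion {
        V_2_9, V_3_0, V_3_1;

        boolean onOrAfter(MatchVersion other) {
            return compareTo(other) >= 0;
        }
    }

    // Hypothetical component whose output changed in V_3_1. Callers pinned to an
    // older version keep the old output, so tokens still match an existing index.
    static String normalize(String token, MatchVersion matchVersion) {
        if (matchVersion.onOrAfter(MatchVersion.V_3_1)) {
            return token.toLowerCase(Locale.ROOT);
        }
        return token;
    }

    public static void main(String[] args) {
        // One application can pin different components to different versions.
        System.out.println(normalize("Lucene", MatchVersion.V_2_9)); // Lucene
        System.out.println(normalize("Lucene", MatchVersion.V_3_1)); // lucene
    }
}
```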

Re: [DISCUSS] Do away with Contrib Committers and make core committers

2010-03-15 Thread DM Smith
My 2 cents as one who has no aspirations of ever being a committer. I think with the pending re-org of contrib and the value of contrib, it doesn't make much sense to have the distinction between core and contrib let alone for contributors. Regarding the former low bar, either prune the list

Re: Lucene Query Parser Syntax document

2010-02-28 Thread DM Smith
hetaphi.de > >> -Original Message- >> From: DM Smith [mailto:dmsmith...@gmail.com] >> Sent: Sunday, February 28, 2010 2:12 PM >> To: java-dev@lucene.apache.org >> Subject: Lucene Query Parser Syntax document >> >> Earlier I had linked to >> h

Re: SegmentInfos extends Vector

2010-02-28 Thread DM Smith
IIRC: The early implementation of Vector did not extend AbstractList and thus did not have remove. On Feb 28, 2010, at 8:04 AM, Shai Erera wrote: > Why do you say remove was unsupported before? I don't see it in the class's > impl. It just inherits from Vector and so remove is supported by inhe

Lucene Query Parser Syntax document

2010-02-28 Thread DM Smith
Earlier I had linked to http://lucene.apache.org/java/docs/queryparsersyntax.html in my product manual. That no longer works. Searching I found that the document is per release. Not sure when that changed, but having found it at http://lucene.apache.org/java/2_3_2/queryparsersyntax.html I not

Re: Having a default constructor in Analyzers

2010-02-07 Thread DM Smith
On Feb 7, 2010, at 5:32 PM, Sanne Grinovero wrote: > Does it make sense to use different values across the same > application? Obviously in the unlikely case you want to treat > different indexes in a different way, but does it make sense when > working all on the same index? I think it entirel

[jira] Commented: (LUCENE-2226) move contrib/snowball to contrib/analyzers

2010-01-18 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802040#action_12802040 ] DM Smith commented on LUCENE-2226: -- bq. But i think this concept doesn't even m

[jira] Commented: (LUCENE-2226) move contrib/snowball to contrib/analyzers

2010-01-18 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802004#action_12802004 ] DM Smith commented on LUCENE-2226: -- Robert, I'm suggesting that you move it. Bu

[jira] Commented: (LUCENE-2226) move contrib/snowball to contrib/analyzers

2010-01-18 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801912#action_12801912 ] DM Smith commented on LUCENE-2226: -- +1 However this is a very minor break in bw co

[jira] Commented: (LUCENE-2055) Fix buggy stemmers and Remove duplicate analysis functionality

2010-01-18 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801909#action_12801909 ] DM Smith commented on LUCENE-2055: -- I think it is right to fix bad behavior, but su

Re: Dynamic array reallocation algorithms

2010-01-13 Thread DM Smith
On Jan 13, 2010, at 1:00 AM, Marvin Humphrey wrote: > On Tue, Jan 12, 2010 at 10:46:29PM -0500, DM Smith wrote: > >> So starting at 0, the size is 0. >> 0 => 0 >> 0 + 1 => 4 >> 4 + 1 => 8 >> 8 + 1 => 16 >> 16 + 1 => 25 >> 25

Re: Dynamic array reallocation algorithms

2010-01-12 Thread DM Smith
On Jan 12, 2010, at 6:27 PM, Marvin Humphrey wrote: > Greets, > > I've been trying to understand this comment regarding ArrayUtil.getNextSize(): > > * The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ... > > Maybe I'm missing something, but I can't see how the formula yields su
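The quoted growth pattern appears if the next capacity is computed from the required size (current capacity + 1, as in the "0 + 1 => 4" steps), plus an over-allocation of about one eighth and a small constant. The formula below is an assumption modeled on the over-allocation rule in CPython's list_resize(). It is not quoted from ArrayUtil, though it does reproduce the sequence exactly.

```java
public class GrowthPattern {

    // Assumed rule: add targetSize/8, plus 3 for small sizes or 6 otherwise.
    static int nextSize(int targetSize) {
        return (targetSize >> 3) + (targetSize < 9 ? 3 : 6) + targetSize;
    }

    // Starts at capacity 0 and grows each time one more slot is needed than exists.
    static String sequence(int count) {
        StringBuilder sb = new StringBuilder();
        int size = 0;
        for (int i = 0; i < count; i++) {
            if (i > 0) {
                size = nextSize(size + 1);
                sb.append(", ");
            }
            sb.append(size);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(sequence(10)); // 0, 4, 8, 16, 25, 35, 46, 58, 72, 88
    }
}
```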

Re: Compound File Default

2010-01-12 Thread DM Smith
I'm not sure that it's safe to assume that production use of Lucene is not on a laptop or that it is always on big iron. It makes sense that Lucene is embedded in all sorts of desktop applications that might run on small machines. That certainly describes the application that I work on. I'm

Re: LUCENE-1515

2010-01-02 Thread DM Smith
On Jan 2, 2010, at 7:46 AM, Robert Muir wrote: >> I also want backward compatibility. Or at least control over it. That is, I >> need for indexes to work fully but want an easy path to upgrade/replace an >> index with better analyzer/filter combos. This stemmer is not backward >> compatible. >

Re: LUCENE-1515

2010-01-02 Thread DM Smith
Just my 2 cents from a user perspective to the whole thread: I want the best and an easy way to identify the best. Preferably, it will be the default by current version. The best should also have the best name. Because of the backward compatibility policy, we're painted into a box, into name hel

[jira] Commented: (LUCENE-1343) A replacement for ISOLatin1AccentFilter that does a more thorough job of removing diacritical marks or non-spacing modifiers.

2009-12-07 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786968#action_12786968 ] DM Smith commented on LUCENE-1343: -- {quote} bq. Robert Muir, Would it make sens

[jira] Commented: (LUCENE-1343) A replacement for ISOLatin1AccentFilter that does a more thorough job of removing diacritical marks or non-spacing modifiers.

2009-12-07 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786941#action_12786941 ] DM Smith commented on LUCENE-1343: -- I also am dubious about a general purpose fol
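A common way to strip diacritics more thoroughly than a fixed Latin-1 table is to decompose text to NFD, drop the nonspacing marks, and recompose. This technique is swapped in for illustration and may not be what the LUCENE-1343 patch does. It also shows why a general-purpose folder is debatable: letters with no canonical decomposition, such as ø, pass through unchanged.

```java
import java.text.Normalizer;

public class DiacriticFolding {

    // Decompose to NFD (base letter + combining marks), drop nonspacing marks
    // (general category Mn), then recompose whatever remains to NFC.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        String stripped = decomposed.replaceAll("\\p{Mn}+", "");
        return Normalizer.normalize(stripped, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        System.out.println(fold("Cr\u00E8me br\u00FBl\u00E9e")); // Creme brulee
        System.out.println(fold("\u00C5ngstr\u00F6m"));          // Angstrom
        // U+00F8 (o with stroke) has no canonical decomposition, so it is unchanged.
        System.out.println(fold("\u00F8").equals("\u00F8"));      // true
    }
}
```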

Re: Lots of results

2009-12-05 Thread DM Smith
On Dec 5, 2009, at 5:22 PM, Grant Ingersoll wrote: > At ScaleCamp yesterday in the UK, I was listening to a talk on Xapian and the > speaker said one of the optimizations they do when retrieving a large result > set is that instead of managing a Priority Queue, they just allocate a large > arr
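The Xapian description is cut off mid-sentence, so this sketch assumes "allocate a large array" means keeping every hit's score and sorting once. That is compared here with the bounded priority queue normally used for top-k collection. TopScores and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

public class TopScores {

    // Priority-queue approach: a min-heap holding at most k scores, so the
    // weakest kept score is always at the head and can be evicted cheaply.
    static float[] topKWithHeap(float[] scores, int k) {
        if (k <= 0) {
            return new float[0];
        }
        PriorityQueue<Float> heap = new PriorityQueue<Float>(k);
        for (float s : scores) {
            if (heap.size() < k) {
                heap.add(s);
            } else if (s > heap.peek()) {
                heap.poll();
                heap.add(s);
            }
        }
        float[] best = new float[heap.size()];
        for (int i = best.length - 1; i >= 0; i--) {
            best[i] = heap.poll(); // smallest comes out first, so fill from the end
        }
        return best;
    }

    // Array approach: keep every score, sort once, read the top k from the end.
    static float[] topKWithArray(float[] scores, int k) {
        float[] all = scores.clone();
        Arrays.sort(all);
        int n = Math.max(0, Math.min(k, all.length));
        float[] best = new float[n];
        for (int i = 0; i < n; i++) {
            best[i] = all[all.length - 1 - i];
        }
        return best;
    }

    public static void main(String[] args) {
        float[] scores = {0.5f, 2.0f, 1.0f, 3.0f};
        System.out.println(Arrays.toString(topKWithHeap(scores, 2)));  // [3.0, 2.0]
        System.out.println(Arrays.toString(topKWithArray(scores, 2))); // [3.0, 2.0]
    }
}
```

The heap does O(log k) work per hit. When k is a large fraction of all hits, one sort over a plain array may be cheaper, which is presumably the appeal of the approach described in the talk.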

Release artifacts

2009-12-05 Thread DM Smith
I'm wondering about the size of the builds, which are surprisingly big to me. The src is 12M/13M and the bin is 17M/26M (tar.gz/zip) for 2.9.1, similar for 3.0.0. In looking at the binary artifact I see the following: * Every contrib jar has a corresponding javadoc jar, but there is no core-jav

[jira] Commented: (LUCENE-2105) Lucene does not support Unicode Normalization Forms

2009-12-03 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785302#action_12785302 ] DM Smith commented on LUCENE-2105: -- Is this a duplicate or solved by LUCENE-1488

[jira] Commented: (LUCENE-1488) multilingual analyzer based on icu

2009-12-02 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785036#action_12785036 ] DM Smith commented on LUCENE-1488: -- Robert, just finished reviewing the code. L

[jira] Commented: (LUCENE-2034) Massive Code Duplication in Contrib Analyzers - unifly the analyzer ctors

2009-12-02 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784812#action_12784812 ] DM Smith commented on LUCENE-2034: -- bq. But I do not see the benefit compared to

[jira] Commented: (LUCENE-2034) Massive Code Duplication in Contrib Analyzers - unifly the analyzer ctors

2009-12-01 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784506#action_12784506 ] DM Smith commented on LUCENE-2034: -- {quote} bq.How about splitting out the

[jira] Commented: (LUCENE-2102) LowerCaseFilter for Turkish language

2009-12-01 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784423#action_12784423 ] DM Smith commented on LUCENE-2102: -- For new classes, would it be helpful to add @s

[jira] Commented: (LUCENE-2102) LowerCaseFilter for Turkish language

2009-12-01 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784421#action_12784421 ] DM Smith commented on LUCENE-2102: -- bq. but non-NFC text doesn't work

[jira] Commented: (LUCENE-1581) LowerCaseFilter should be able to be configured to use a specific locale.

2009-12-01 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784358#action_12784358 ] DM Smith commented on LUCENE-1581: -- bq. ultimately I still think case folding is
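Turkish needs two lowercasing special cases: I becomes dotless ı, and İ becomes i. A per-character filter only sees them when the text is in NFC, because in decomposed form İ is I followed by U+0307 COMBINING DOT ABOVE. The class and method names below are hypothetical. This is a sketch of the problem raised in these comments, not Lucene's LowerCaseFilter.

```java
import java.text.Normalizer;

public class TurkishLowerCase {

    // Per-char lowercasing with the two Turkish special cases:
    // U+0049 'I' -> U+0131 dotless i, U+0130 dotted capital I -> 'i'.
    static String lowerTurkishPerChar(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == 'I') {
                sb.append('\u0131');
            } else if (c == '\u0130') {
                sb.append('i');
            } else {
                sb.append(Character.toLowerCase(c));
            }
        }
        return sb.toString();
    }

    // Composing to NFC first turns "I" + U+0307 into U+0130,
    // so the per-char rule above sees the single precomposed letter.
    static String lowerTurkish(String s) {
        return lowerTurkishPerChar(Normalizer.normalize(s, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String composed = "\u0130stanbul";    // NFC
        String decomposed = "I\u0307stanbul"; // same text, NFD
        System.out.println(lowerTurkishPerChar(composed).equals("istanbul"));   // true
        System.out.println(lowerTurkishPerChar(decomposed).equals("istanbul")); // false
        System.out.println(lowerTurkish(decomposed).equals("istanbul"));        // true
    }
}
```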

[jira] Commented: (LUCENE-2034) Massive Code Duplication in Contrib Analyzers - unifly the analyzer ctors

2009-12-01 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784338#action_12784338 ] DM Smith commented on LUCENE-2034: -- Robert: bq. DM, I think we can have both? A me

[jira] Commented: (LUCENE-2034) Massive Code Duplication in Contrib Analyzers - unifly the analyzer ctors

2009-12-01 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784327#action_12784327 ] DM Smith commented on LUCENE-2034: -- Robert, I'd like them to be in files as

[jira] Commented: (LUCENE-2034) Massive Code Duplication in Contrib Analyzers - unifly the analyzer ctors

2009-12-01 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784303#action_12784303 ] DM Smith commented on LUCENE-2034: -- Patch looks good. I like how this simplifies
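Keeping stopword lists in files, as suggested in these comments, needs only a small loader. The format assumed below (one word per line, '#' comment lines, blank lines skipped) and the StopwordFile class are hypothetical. No Lucene release is claimed to use this format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

public class StopwordFile {

    // Reads the assumed format: one word per line, surrounding whitespace
    // trimmed, blank lines and lines starting with '#' ignored.
    static Set<String> load(Reader in) throws IOException {
        Set<String> words = new HashSet<String>();
        BufferedReader lines = new BufferedReader(in);
        String line;
        while ((line = lines.readLine()) != null) {
            String word = line.trim();
            if (word.length() > 0 && !word.startsWith("#")) {
                words.add(word);
            }
        }
        return words;
    }

    // Convenience wrapper for in-memory text; StringReader does not actually fail.
    static Set<String> parse(String text) {
        try {
            return load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Set<String> stop = parse("# English stopwords\nthe\n  and  \n\nof\n");
        System.out.println(stop.size());          // 3
        System.out.println(stop.contains("and")); // true
    }
}
```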

[jira] Commented: (LUCENE-2094) Prepare CharArraySet for Unicode 4.0

2009-12-01 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784182#action_12784182 ] DM Smith commented on LUCENE-2094: -- In reviewing Simon's latest patch, I see

[jira] Commented: (LUCENE-2094) Prepare CharArraySet for Unicode 4.0

2009-12-01 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784175#action_12784175 ] DM Smith commented on LUCENE-2094: -- bq. I would like to open another issue for rob

[jira] Commented: (LUCENE-2094) Prepare CharArraySet for Unicode 4.0

2009-11-30 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783842#action_12783842 ] DM Smith commented on LUCENE-2094: -- bq. If you create the StopFilter
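Preparing for Unicode 4.0 largely means treating supplementary characters (code points above U+FFFF, stored as surrogate pairs) as single units. The sketch below uses hypothetical method names. It shows how lowercasing one char at a time, as a case-insensitive set or filter might, misses the Deseret letter U+10400, while a code-point loop maps it to U+10428.

```java
public class CodePointLowerCase {

    // Lowercases one UTF-16 code unit at a time; surrogate halves are left unchanged.
    static String lowerPerChar(String s) {
        char[] buf = s.toCharArray();
        for (int i = 0; i < buf.length; i++) {
            buf[i] = Character.toLowerCase(buf[i]);
        }
        return new String(buf);
    }

    // Lowercases whole code points, so characters outside the BMP are handled.
    static String lowerPerCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String upper = new String(Character.toChars(0x10400)); // DESERET CAPITAL LETTER LONG I
        String lower = new String(Character.toChars(0x10428)); // DESERET SMALL LETTER LONG I
        System.out.println(lowerPerChar(upper).equals(lower));      // false
        System.out.println(lowerPerCodePoint(upper).equals(lower)); // true
    }
}
```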

[jira] Commented: (LUCENE-2034) Massive Code Duplication in Contrib Analyzers - unifly the analyzer ctors

2009-11-30 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783737#action_12783737 ] DM Smith commented on LUCENE-2034: -- I was trying to lurk, but I'm not able to

[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2009-11-24 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781947#action_12781947 ] DM Smith commented on LUCENE-1458: -- bq. Yes, this (customizing comparator for term
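One reason to customize the term comparator is that Java's String.compareTo orders by UTF-16 code unit. That order disagrees with Unicode code point order (and with UTF-8 byte order) once supplementary characters are involved. The comparator below is a sketch, and the name CODE_POINT_ORDER is hypothetical.

```java
import java.util.Comparator;

public class CodePointOrder {

    // Orders strings by Unicode code point instead of by UTF-16 code unit.
    static final Comparator<String> CODE_POINT_ORDER = new Comparator<String>() {
        public int compare(String a, String b) {
            int i = 0;
            int j = 0;
            while (i < a.length() && j < b.length()) {
                int ca = a.codePointAt(i);
                int cb = b.codePointAt(j);
                if (ca != cb) {
                    return ca < cb ? -1 : 1;
                }
                i += Character.charCount(ca);
                j += Character.charCount(cb);
            }
            // All compared code points were equal: the shorter string sorts first.
            if (i < a.length()) {
                return 1;
            }
            if (j < b.length()) {
                return -1;
            }
            return 0;
        }
    };

    public static void main(String[] args) {
        String bmp = "\uFFFD";                                          // U+FFFD
        String supplementary = new String(Character.toChars(0x10000)); // U+10000, a surrogate pair
        // UTF-16 code-unit order: 0xD800 < 0xFFFD, so U+10000 sorts first.
        System.out.println(supplementary.compareTo(bmp) < 0);                 // true
        // Code point order: U+FFFD < U+10000.
        System.out.println(CODE_POINT_ORDER.compare(supplementary, bmp) > 0); // true
    }
}
```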

Re: [jira] Commented: (LUCENE-2092) BooleanQuery.hashCode and equals ignore isCoordDisabled

2009-11-23 Thread DM Smith
Since this is a bug fix, please mark it for 2.9.2 if there ever is one. On Nov 23, 2009, at 7:08 PM, Michael McCandless (JIRA) wrote: > >[ > https://issues.apache.org/jira/browse/LUCENE-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781706#acti

Re: Hiding JIRA issues

2009-11-21 Thread DM Smith
A couple of thoughts: JIRA allows for administrative export of the database to XML. If these don't export then something is really bad. Contact atlassian with the problem after searching their forums for the problem. -- DM On Nov 21, 2009, at 9:57 AM, Simon Willnauer wrote: > On Sat, Nov 21, 2

[jira] Commented: (LUCENE-1799) Unicode compression

2009-11-19 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780129#action_12780129 ] DM Smith commented on LUCENE-1799: -- The sample code is probably what is on this

Re: Why release 3.0?

2009-11-16 Thread DM Smith
perhaps the best advice is to skip 3.0 and take the pain once. > > btw, i created a diff from unicode 3's UCD to unicode 4's UCD, in case you > want to see the changes: http://people.apache.org/~rmuir/unicodeDiff.txt That's an amazing number of changes, even when you i

Re: Why release 3.0?

2009-11-16 Thread DM Smith
On Nov 16, 2009, at 6:43 PM, Robert Muir wrote: > DM, in this case I'm not referring to surrogates, etc, but instead the idea > that properties for an existing character can change (the soft hyphen and > arabic ayah were two examples), also new characters are introduced. > > these will affect

[jira] Commented: (LUCENE-2023) Improve performance of SmartChineseAnalyzer

2009-11-01 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772350#action_12772350 ] DM Smith commented on LUCENE-2023: -- Internals are internals. Anyone digging

Re: contrib and lucene 3.0

2009-10-30 Thread DM Smith
I don't see any reason to freeze new contributions from any release. On 10/30/2009 03:19 PM, Robert Muir wrote: thanks Michael. does anyone else have any opinion on this issue? fyi we already have several new features committed to 3.0 contrib already (see contrib/CHANGES), but I don't too much

[jira] Commented: (LUCENE-2023) Improve performance of SmartChineseAnalyzer

2009-10-30 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772050#action_12772050 ] DM Smith commented on LUCENE-2023: -- Robert, You have in BigramDictionary: {

[jira] Commented: (LUCENE-2023) Improve performance of SmartChineseAnalyzer

2009-10-30 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772034#action_12772034 ] DM Smith commented on LUCENE-2023: -- Thanks, Mark. I'm stuck with 1.4 for

[jira] Commented: (LUCENE-2023) Improve performance of SmartChineseAnalyzer

2009-10-30 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772022#action_12772022 ] DM Smith commented on LUCENE-2023: -- I fully understand that at some point, "ju

[jira] Commented: (LUCENE-2023) Improve performance of SmartChineseAnalyzer

2009-10-30 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772003#action_12772003 ] DM Smith commented on LUCENE-2023: -- If we have a 2.9.2 release, can this be there

Re: Lucene as projects in Eclipse

2009-10-28 Thread DM Smith
On Oct 28, 2009, at 1:45 PM, Robert Muir wrote: DM, I create one project (new project, checkout projects from SVN, and let it set it as a java project). I then set the source folders like you mentioned below. I add lib/junit*whatever.jar to library classpath, and set UTF-8 default encodin

Re: Lucene as projects in Eclipse

2009-10-28 Thread DM Smith
ork ... I'm not looking for anything in particular though it does make the dependencies between contribs obvious. It was more a pattern from habit on another project. -- DM DM Smith wrote: On 10/28/2009 01:03 PM, Mark Miller wrote: DM Smith wrote: Is there any guid

Re: Lucene as projects in Eclipse

2009-10-28 Thread DM Smith
On 10/28/2009 01:03 PM, Mark Miller wrote: DM Smith wrote: Is there any guidance on how to set up Lucene for development within Eclipse. Perhaps a wiki page or an old email thread? I looked but didn't find one. I've done it manually twice now and it was time-consuming and ultima

Lucene as projects in Eclipse

2009-10-28 Thread DM Smith
Is there any guidance on how to set up Lucene for development within Eclipse. Perhaps a wiki page or an old email thread? I looked but didn't find one. I've done it manually twice now and it was time-consuming and ultimately I did it differently each time, not liking any way I have done it. Or

[jira] Commented: (LUCENE-2012) Add @Override annotations

2009-10-27 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770705#action_12770705 ] DM Smith commented on LUCENE-2012: -- Uwe, what did you use to generate the @over

[jira] Commented: (LUCENE-1998) Use Java 5 enums

2009-10-21 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768304#action_12768304 ] DM Smith commented on LUCENE-1998: -- bq. changing the order of enum constants is bad,

[jira] Commented: (LUCENE-1998) Use Java 5 enums

2009-10-21 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768270#action_12768270 ] DM Smith commented on LUCENE-1998: -- I just noticed that enums are comparable. For
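Enum constants implement Comparable, and compareTo() follows declaration order rather than name. That is why reordering constants can silently change behavior. The Level enum below is a hypothetical example.

```java
public class EnumOrder {

    // Hypothetical enum: compareTo() follows declaration order (ordinal),
    // not the alphabetical order of the constant names.
    enum Level { LOW, MEDIUM, HIGH }

    public static void main(String[] args) {
        System.out.println(Level.LOW.compareTo(Level.HIGH) < 0); // true: LOW declared first
        System.out.println(Level.HIGH.ordinal());                // 2
        System.out.println("HIGH".compareTo("LOW") < 0);         // true: by name, HIGH sorts first
        // Reordering the constants, e.g. to { HIGH, LOW, MEDIUM }, would silently
        // flip the first comparison for any code that relies on it.
    }
}
```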

[jira] Issue Comment Edited: (LUCENE-1998) Use Java 5 enums

2009-10-21 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768215#action_12768215 ] DM Smith edited comment on LUCENE-1998 at 10/21/09 2:2

[jira] Commented: (LUCENE-1998) Use Java 5 enums

2009-10-21 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768215#action_12768215 ] DM Smith commented on LUCENE-1998: -- bq. I only added the license header back in

[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-20 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DM Smith updated LUCENE-1257: - Attachment: (was: LUCENE-1257_enum.patch) > Port to Java5 > - > >

[jira] Updated: (LUCENE-1998) Use Java 5 enums

2009-10-20 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DM Smith updated LUCENE-1998: - Attachment: LUCENE-1998_enum.patch This issue and patch were part of LUCENE-1257, but may have backward

[jira] Created: (LUCENE-1998) Use Java 5 enums

2009-10-20 Thread DM Smith (JIRA)
Use Java 5 enums Key: LUCENE-1998 URL: https://issues.apache.org/jira/browse/LUCENE-1998 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.0 Reporter: DM Smith Priority: Minor

[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-20 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DM Smith updated LUCENE-1257: - Attachment: LUCENE-1257_enum.patch Migrates to Java 5 enums in core and contrib. All tests pass

Parameter class and Java 5 Enums

2009-10-19 Thread DM Smith
Should the Parameter class be replaced with Java 5 enums? My only concern is backward compatibility. I noticed that Parameter is serializable. Is this used by Lucene? I wasn't able to see any place that depended on it. The only public method, Parameter.toString() results in the same value as a
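A serializable pre-Java 5 typesafe-enum class needs several pieces: a private constructor, fixed instances, a toString() that returns the name, and a readResolve() so deserialization does not create duplicate instances. A Java 5 enum provides all of that. StoreOld and Store are illustrative names, and StoreOld is modeled on the general pattern rather than copied from Lucene's Parameter source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ParameterVsEnum {

    // Pre-Java 5 typesafe-enum style (illustrative only).
    static final class StoreOld implements Serializable {
        private static final long serialVersionUID = 1L;
        static final StoreOld YES = new StoreOld("YES");
        static final StoreOld NO = new StoreOld("NO");

        private final String name;

        private StoreOld(String name) {
            this.name = name;
        }

        @Override
        public String toString() {
            return name;
        }

        // Without this, deserialization would create a new, non-identical instance.
        private Object readResolve() {
            return YES.name.equals(name) ? YES : NO;
        }
    }

    // Java 5 equivalent: fixed instances, name-based toString() and
    // identity-preserving serialization come built in.
    enum Store { YES, NO }

    // Serializes and deserializes an object in memory.
    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(o);
            out.close();
            ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            return in.readObject();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(StoreOld.YES.toString().equals(Store.YES.toString())); // true
        System.out.println(roundTrip(StoreOld.NO) == StoreOld.NO); // true, via readResolve
        System.out.println(roundTrip(Store.NO) == Store.NO);       // true, built into enums
    }
}
```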

[jira] Commented: (LUCENE-1963) ArabicAnalyzer: Lowercase before Stopfilter

2009-10-08 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763554#action_12763554 ] DM Smith commented on LUCENE-1963: -- can you commit it to 2.9.1 too? (For those stuc

Re: Arabic Analyzer: possible bug

2009-10-08 Thread DM Smith
suppose. But this is a tricky subject: what if you have mixed Arabic / German or something like that? For some other languages written in the Latin script, English stopwords could be bad :) I think that Lowercasing non-Arabic (also Cyrillic, etc)

Re: Arabic Analyzer: possible bug

2009-10-08 Thread DM Smith
owercasing non-Arabic (also Cyrillic, etc), is pretty safe across the board though. On Thu, Oct 8, 2009 at 9:29 AM, DM Smith <dmsmith...@gmail.com> wrote: On 10/08/2009 09:23 AM, Uwe Schindler wrote: Just an addition: The lowercase filter is only for the case

Re: Arabic Analyzer: possible bug

2009-10-08 Thread DM Smith
c Analyzer: possible bug DM, there is no upper/lower cases in Arabic, so don't worry, but the stop word list needs some corrections and may miss some common/stop Arabic words. Best, On Thu, Oct 8, 2009 at 4:14 PM, DM Smith wrote: Robert, Thanks for the info. As I said, I am illiterate

Re: Arabic Analyzer: possible bug

2009-10-08 Thread DM Smith
Oct 8, 2009 at 7:24 AM, DM Smith wrote: I'm wondering if there is a bug in ArabicAnalyzer in 2.9. (I don't know Arabic or Farsi, but have some texts to index in those languages.) The tokenizer/filter chain for ArabicAnalyzer is: TokenStream result = new ArabicLetterTo

Arabic Analyzer: possible bug

2009-10-08 Thread DM Smith
I'm wondering if there is a bug in ArabicAnalyzer in 2.9. (I don't know Arabic or Farsi, but have some texts to index in those languages.) The tokenizer/filter chain for ArabicAnalyzer is: TokenStream result = new ArabicLetterTokenizer( reader ); result = new StopFilter( result
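Arabic has no letter case, so filter order only matters for Latin-script tokens mixed into the text when the stop list contains such words. The toy pipeline below shows the effect of running the stop filter before versus after lowercasing. It uses plain lists, not Lucene's TokenStream API, and the stop list is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class FilterOrder {

    // Hypothetical lowercase stop list containing Latin-script words.
    static final Set<String> STOPWORDS = new HashSet<String>(Arrays.asList("the", "and"));

    // Toy stand-in for a stop filter: drops tokens that are exactly in the set.
    static List<String> removeStopwords(List<String> tokens) {
        List<String> kept = new ArrayList<String>();
        for (String t : tokens) {
            if (!STOPWORDS.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Toy stand-in for a lowercase filter.
    static List<String> lowercase(List<String> tokens) {
        List<String> out = new ArrayList<String>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("The", "Arabic", "and", "Farsi");
        // Stop filter first: capitalized "The" is not in the stop list and survives.
        System.out.println(lowercase(removeStopwords(tokens))); // [the, arabic, farsi]
        // Lowercase first: "The" becomes "the" and is removed.
        System.out.println(removeStopwords(lowercase(tokens))); // [arabic, farsi]
    }
}
```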

Re: [jira] Created: (LUCENE-1956) Fix javadoc comments in search package

2009-10-07 Thread DM Smith
On Oct 7, 2009, at 2:59 PM, Michael Busch (JIRA) wrote: Fix javadoc comments in search package -- Key: LUCENE-1956 URL: https://issues.apache.org/jira/browse/LUCENE-1956 Project: Lucene - Java Issue Type: T

Re: [jira] Created: (LUCENE-1948) Deprecating InstantiatedIndexWriter

2009-10-05 Thread DM Smith
On 10/05/2009 12:22 PM, Karl Wettin (JIRA) wrote: Deprecating InstantiatedIndexWriter --- Key: LUCENE-1948 URL: https://issues.apache.org/jira/browse/LUCENE-1948 Project: Lucene - Java Issue Type: Task

Re: Searcher javadoc problem

2009-10-03 Thread DM Smith
working up something with a Collector on your own would be better though - why compute the score if you don't need it. Hits caching was rarely that useful either. DM Smith wrote: It makes sense if you understand the context. We make each verse of a Bible a document. There are about 36000 docs in a B

Re: svn commit: r821434 - /lucene/java/trunk/src/java/org/apache/lucene/search/Searcher.java

2009-10-03 Thread DM Smith
On Oct 3, 2009, at 6:56 PM, Mark Miller wrote: No bug fixes for the lazy! Not having 1.5 on Mac OS X Tiger is the issue. Do you recommend that 2.9.0 is really not for 1.4 users? Then there was no point in waiting on Java 1.5. :) Yes, I see that tongue in your cheek. We should also fix

Re: svn commit: r821440 - /lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/search/Searcher.java

2009-10-03 Thread DM Smith
Please apply all bug fixes to 2.9.0, as some of us have it as our last Java 1.4.2 release. On Oct 3, 2009, at 6:55 PM, "Uwe Schindler" wrote: Should we now commit all fixes also to 2.9, which should go into 2.9.1, if it will be released as a bugfix release together with 3.0 (e.g. the highl

Re: svn commit: r821434 - /lucene/java/trunk/src/java/org/apache/lucene/search/Searcher.java

2009-10-03 Thread DM Smith
On Oct 3, 2009, at 6:51 PM, Michael Busch wrote: On 10/4/09 12:42 AM, Mark Miller wrote: Why will 3.0 be work to upgrade? 2.9 was supposed to be the work, 3.0 no work ... With 2.9 you can be lazy and live with deprecation warnings. With 3.0 you *have* to switch to undeprecated APIs. M

Re: Searcher javadoc problem

2009-10-03 Thread DM Smith
h the JavaDoc warns you that's a major speed trap, everyone still did it ... use a Collector. You're right though - it shouldn't point to IndexSearcher.search(Query) after that - it should point to IndexSearcher.search(Query, int). Got to fix that. DM Smith wrote: I'm working on migrating my c

Searcher javadoc problem

2009-10-03 Thread DM Smith
I'm working on migrating my code to 2.9. And I'm trying to figure out what to do. Along the way I found a circular argument in the JavaDoc for Searcher. BTW, this is not a user question. My current code calls: Hits hits = searcher.search(query); The JavaDoc for it says: /**

Re: Deprecated class in spatial contrib

2009-08-30 Thread DM Smith
+1 How obvious!! On Aug 30, 2009, at 3:04 PM, Mark Miller wrote: The spatial contrib has not been in a release before, so just wondering why there are deprecated classes in it - should we remove those, or was there a good reason to keep them? In general, it seems we should just deprecate

Lucene 3.0 and Java 5 (was Re: Finishing Lucene 2.9)

2009-08-23 Thread DM Smith
the name/api problems and making the API of Lucene be what it should have been for a 3.0 release. I'd also suggest that repackaging, suggested in a prior thread, be tackled also. This could follow a 3.0 release quickly. -- DM Smith

[jira] Commented: (LUCENE-1813) Add option to ReverseStringFilter to mark reversed tokens

2009-08-17 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744092#action_12744092 ] DM Smith commented on LUCENE-1813: -- I like the idea of a constant and it presented
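Marking reversed tokens lets them share a field with forward tokens, and reversed terms make leading-wildcard searches cheap, which is a common motivation for reversed indexing. The marker value U+0001 and the class below are assumptions for illustration. This is not ReverseStringFilter's code.

```java
public class ReverseWithMarker {

    // Hypothetical marker prepended to reversed tokens so they cannot collide
    // with ordinary forward terms indexed in the same field.
    static final char MARKER = '\u0001';

    static String reverseAndMark(String token) {
        // StringBuilder.reverse() keeps surrogate pairs intact.
        return MARKER + new StringBuilder(token).reverse().toString();
    }

    public static void main(String[] args) {
        String indexed = reverseAndMark("lucene");
        System.out.println(indexed.equals("\u0001enecul")); // true
        // A leading-wildcard query like *cene can be rewritten as a prefix
        // match against the marked, reversed form.
        String prefix = MARKER + new StringBuilder("cene").reverse().toString();
        System.out.println(indexed.startsWith(prefix)); // true
    }
}
```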

Re: Java 5 creeped again to Benchmark?

2009-08-13 Thread DM Smith
On 08/13/2009 01:56 PM, Mark Miller wrote: Shai Erera wrote: So far Mike has resolved the issue again, so it sounds like we go w/ it? Lazy consensus - so it's looking good so far - but someone could still derail us I suppose. I've been a stick-in-the-mud wrt migrating to Java 5 in the past.

Re: who clears attributes?

2009-08-11 Thread DM Smith
Uwe, Is this example available? I think that an example like this would help the user community see the current value in the change. At least, I'd love to see the code for it. -- DM On 08/10/2009 06:49 PM, Uwe Schindler wrote: > UIMA The new API looks like UIMA, you have streams that

Beta (was Re: who clears attributes?)

2009-08-11 Thread DM Smith
On 08/11/2009 08:22 AM, Michael McCandless wrote: I do still think a longish 2.9 beta is warranted, if we can succeed in getting users outside the dev group to kick the tires and uncover stuff. I think a beta would be a great idea. Not sure it needs to be "longish." Having not looked at it

[jira] Commented: (LUCENE-1794) implement reusableTokenStream for all contrib analyzers

2009-08-11 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741835#action_12741835 ] DM Smith commented on LUCENE-1794: -- If CachingTokenFilter.reset() means rewind, then

[jira] Created: (LUCENE-1799) Unicode compression

2009-08-10 Thread DM Smith (JIRA)
Unicode compression --- Key: LUCENE-1799 URL: https://issues.apache.org/jira/browse/LUCENE-1799 Project: Lucene - Java Issue Type: New Feature Components: Store Affects Versions: 2.4.1 Reporter: DM
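The truncated entry does not say which compression scheme was proposed, but the motivation is easy to quantify. In UTF-8, most CJK characters take three bytes versus two in UTF-16, while ASCII takes one. The counts below cover only those two standard encodings. Unicode-specific schemes such as SCSU and BOCU-1 are well-known examples of Unicode compression, not a claim about this issue's patch.

```java
import java.nio.charset.Charset;

public class EncodedSizes {

    static int utf8Bytes(String s) {
        return s.getBytes(Charset.forName("UTF-8")).length;
    }

    // UTF-16BE is used so that no byte-order mark is added.
    static int utf16Bytes(String s) {
        return s.getBytes(Charset.forName("UTF-16BE")).length;
    }

    public static void main(String[] args) {
        String latin = "lucene";
        String cjk = "\u4E2D\u6587"; // two CJK ideographs
        System.out.println(utf8Bytes(latin) + " " + utf16Bytes(latin)); // 6 12
        System.out.println(utf8Bytes(cjk) + " " + utf16Bytes(cjk));     // 6 4
    }
}
```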

[jira] Commented: (LUCENE-1793) remove custom encoding support in Greek/Russian Analyzers

2009-08-09 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741173#action_12741173 ] DM Smith commented on LUCENE-1793: -- I wasn't thinking about any encoding in p

[jira] Commented: (LUCENE-1793) remove custom encoding support in Greek/Russian Analyzers

2009-08-09 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741109#action_12741109 ] DM Smith commented on LUCENE-1793: -- bq.If this is the concern, then I think a be

Re: IndexWriter.getReader usage

2009-08-03 Thread DM Smith
On 08/03/2009 08:21 AM, Earwin Burrfoot wrote: The biggest win for NRT was switching to per-segment Collector because that meant we could re-use FieldCache entries for all segments that hadn't changed. In my opinion, this switch was enough to get as NRT-ey, as you want. Fusing IR/IW togeth

Re: IndexWriter.getReader usage

2009-08-01 Thread DM Smith
On Aug 1, 2009, at 7:52 AM, Grant Ingersoll wrote: In many NRT cases, it seems the traditional approach has been to have two RAM directories and a write-through FS Directory (for example Zoie does this, and it has also been discussed a fair number of times on the various lists). I'm wonde
