Re: Having a default constructor in Analyzers

2010-02-07 Thread DM Smith

On Feb 7, 2010, at 5:32 PM, Sanne Grinovero wrote:

> Does it make sense to use different values across the same
> application? Obviously in the unlikely case you want to treat
> different indexes in a different way, but does it make sense when
> working all on the same index?

I think it entirely depends on the use case. In my use case, my app indexes 
one book per index, with each sentence or paragraph (depending on the book) as a 
document. The app lives on a user's desktop; users can download books on an 
as-needed basis and then index them in that app.

I don't have this yet, but need it: imagine that each index maintains a manifest 
of the toolchain that built it, including the version of each part of the 
chain. Since the index is created all at once, this is probably the same as the 
version of Lucene. When the user searches the index, the manifest is consulted 
to recreate the toolchain.
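
A rough sketch of the kind of manifest I have in mind, kept as a plain properties 
file alongside the index (the file name and component keys here are hypothetical, 
not anything Lucene provides):

import java.io.*;
import java.util.Properties;

// Hypothetical per-index manifest recording the toolchain (and versions) that built the index.
public class IndexManifest {
    private static final String FILE_NAME = "toolchain.properties"; // made-up name
    private final Properties props = new Properties();

    // e.g. record("lucene", "3.0.1"); record("StandardTokenizer", "3.0");
    public void record(String component, String version) {
        props.setProperty(component, version);
    }

    public String versionOf(String component) {
        return props.getProperty(component);
    }

    public void save(File indexDir) throws IOException {
        OutputStream out = new FileOutputStream(new File(indexDir, FILE_NAME));
        try {
            props.store(out, "Toolchain used to build this index");
        } finally {
            out.close();
        }
    }

    public static IndexManifest load(File indexDir) throws IOException {
        IndexManifest m = new IndexManifest();
        InputStream in = new FileInputStream(new File(indexDir, FILE_NAME));
        try {
            m.props.load(in);
        } finally {
            in.close();
        }
        return m;
    }
}

At search time the application would load the manifest and rebuild each part of 
the chain with the recorded version.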

Suppose the user has updated the application a couple of times and now is 
sitting at Lucene 4.7. Any index at VERSION 1.9.x (not that we go back that 
far) has been obsoleted, but all the 2.x and 3.x are still in play, based upon 
the backward compatibility policy. (2.x is in play from an index compatibility 
perspective, but not an API perspective.)

But what does Version 3.2 mean at 4.7? For a given filter, it may not have 
changed from 3.2 to 3.6; those versions, and the ones in between, are equivalent 
for that filter. But another filter in the same toolchain may have changed at 3.4.

> If not, why not introduce a value like "Version.BY_ENVIRONMENT" which
> is statically initialized to be one of the other values, reading from
> an environment parameter?

Environment parameters are not per index, but per JVM.

> So you get the latest at first deploy, and can then keep compatibility
> as long as you need, even when updating Lucene.
> This way I could still have the safety of pinning down a specific
> version and yet avoid rebuilding the app when changing it.
> Of course the default would be LUCENE_CURRENT, so that people trying
> out Lucene get all features out of the box, and warn about setting it
> (maybe log a warning when not set).
> 
> Also, wouldn't it make sense to be able to read the recommended
> version from the Index?

Absolutely!

> I'd like to have the hypothetical AnalyzerFactory to find out what it
> needs to build getting information from the relevant IndexReader; so
> in the case I have two indexes using different versions I won't get
> mistakes. (For a query on index A I'm creating a QueryParser, so let's
> ask the index which kind of QueryParser I should use...)

IIRC, this is something that Marvin has implemented in Lucy, and it is what I was 
talking about above.

> 
> just some ideas, forgive me if I misunderstood this usage (should
> avoid writing late in the night..)
> Regards,
> Sanne
> 
> 
> 
> 2010/2/7 Simon Willnauer :
>> On Sun, Feb 7, 2010 at 8:38 PM, Robert Muir  wrote:
>>> Simon, can you explain how removing CURRENT makes it harder for users to
>>> upgrade? If you mean for the case of people that always re-index all
>>> documents when upgrading lucene jar, then this makes sense to me.
>> That is what I was alluding to!
>> Not much of a deal, though: most IDEs let you upgrade via refactoring
>> easily, and we can document this too. Yet we won't have a drop-in
>> upgrade anymore.
>> 
>>> 
>>> I guess as a step we can at least deprecate this thing and strongly
>>> discourage its use, please see the patch at LUCENE-2080.
>>> 
>>> Not to pick on Sanne, but his wording about: "Of course more advanced use
>>> cases would need to pass parameters but please make the advanced usage
>>> optional", this really caused me to rethink CURRENT, because CURRENT itself
>>> should be the advanced use case!!!
>>> 
>>> On Sun, Feb 7, 2010 at 2:34 PM, Simon Willnauer
>>>  wrote:
 
 Sanne, I would recommend building a Factory pattern around your
 Analyzers / TokenStreams, similar to what Solr does. That way you can
 load your own "default ctor" interface via reflection and obtain your
 analyzers from those factories. That makes more sense anyway, as you
 only load the factory via reflection and not the analyzers.
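
A minimal sketch of the factory approach described here, assuming the 3.x 
StandardAnalyzer(Version) constructor; the AnalyzerFactory interface and the 
other class names are hypothetical:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

// Hypothetical factory interface: implementations have a default constructor,
// so only the factory is loaded via reflection, never the analyzer itself.
public interface AnalyzerFactory {
    Analyzer create(Version matchVersion);
}

class StandardAnalyzerFactory implements AnalyzerFactory {
    public Analyzer create(Version matchVersion) {
        // The Version-taking constructor stays hidden behind the factory.
        return new StandardAnalyzer(matchVersion);
    }
}

class AnalyzerFactories {
    // Load a factory by class name (from configuration) and obtain an analyzer from it.
    static Analyzer analyzerFor(String factoryClassName, Version matchVersion) throws Exception {
        AnalyzerFactory factory = (AnalyzerFactory) Class.forName(factoryClassName).newInstance();
        return factory.create(matchVersion);
    }
}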
 
 @Robert: I don't know if removing LUCENE_CURRENT is the way to go. On
 the one hand it would make our lives easier over time but would make it
 harder for our users to upgrade. I would totally agree that for
 upgrade safety it would be much better to enforce an explicit version
 number so upgrading can be done step by step. Yet, if we deprecate
 LUCENE_CURRENT people will use it for at least the next 3 to 5 years
 (until 4.0) anyway :)
 
 simon
 
 On Sun, Feb 7, 2010 at 8:17 PM, Sanne Grinovero
  wrote:
> Thanks for all the quick answers;
> 
> finding the ctor having only a Version parameter is fine for me; I had
> noticed this "frequent pattern" but didn't understand that it was a
> general rule.

Lucene Query Parser Syntax document

2010-02-28 Thread DM Smith
Earlier I had linked to 
http://lucene.apache.org/java/docs/queryparsersyntax.html in my product manual. 
That no longer works.

Searching, I found that the document is now per release. Not sure when that 
changed, but having found it at

http://lucene.apache.org/java/2_3_2/queryparsersyntax.html

I noticed that it declares that the syntax is that of 1.9.

The documents for 2.9 and later have this corrected.

I'm wondering whether there is a durable link to the current documentation that 
does not change with each release.

-- DM



Re: SegmentInfos extends Vector

2010-02-28 Thread DM Smith
IIRC: The early implementation of Vector did not extend AbstractList and thus 
did not have remove.

On Feb 28, 2010, at 8:04 AM, Shai Erera wrote:

> Why do you say remove was unsupported before? I don't see it in the class's 
> impl. It just inherits from Vector and so remove is supported by inheritance. 
> Since the class is public, someone may have called it.
> 
> Even if we change the class to impl List, period, we'll break back-compat, 
> just because of the synchronization Vector offers. If anyone out there relies 
> on that, it's a problem.
> 
> On one hand, the best way would be to impl Collection, as then someone 
> will be able to use Collections.synchronizedCollection if one needs it, or 
> call toArray etc. But Collection does not have a get(index) method, which 
> might be required and useful ...
> 
> All in all, I don't feel like SegmentInfos is a true collection (even though 
> its Javadoc starts with "a collection ..."). It adds lots of segment-related 
> methods. The collection ones are really just get and iterator? So maybe we 
> should just impl Iterable and expose whatever API we feel is necessary? 
> Back-compat wise, if we change anything in this class's extension/implements 
> details, we break it.
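
Purely as an illustration of that shape (SegmentInfoStub stands in for the real 
SegmentInfo class, and the method set is invented for the sketch):

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Stand-in for org.apache.lucene.index.SegmentInfo.
class SegmentInfoStub {
    String name;
}

// Sketch of a SegmentInfos that is Iterable but no longer extends Vector.
class SegmentInfosSketch implements Iterable<SegmentInfoStub> {
    private final List<SegmentInfoStub> segments = new ArrayList<SegmentInfoStub>();

    public SegmentInfoStub info(int i) { return segments.get(i); }   // the segment-oriented accessor
    public int size() { return segments.size(); }
    void add(SegmentInfoStub si) { segments.add(si); }               // mutation stays package-private
    public Iterator<SegmentInfoStub> iterator() { return segments.iterator(); }
}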
> 
> Unless the folks here don't think we should go to great lengths w/ this 
> class, and do whatever changes we deem are necessary, even at the cost of 
> breaking back-compat. And I'd vote that whether with this class or the new 
> one, we mark it as @lucene.internal.
> 
> Shai
> 
> On Sun, Feb 28, 2010 at 2:49 PM, Uwe Schindler  wrote:
> Hi Shai,
> 
>  
> I forgot to mention: Iterable is always a good idea. E.g. during my 3.0 
> generification, I made "BooleanQuery implements Iterable<BooleanClause>" and 
> so on. That makes the code look nice :-). Also other classes got this interface 
> in Lucene. Also adding j.io.Closeable everywhere was a good idea.
> 
>  
> Uwe
> 
>  
> -
> 
> Uwe Schindler
> 
> H.-H.-Meier-Allee 63, D-28213 Bremen
> 
> http://www.thetaphi.de
> 
> eMail: u...@thetaphi.de
> 
>  
> From: Shai Erera [mailto:ser...@gmail.com] 
> Sent: Sunday, February 28, 2010 1:38 PM
> 
> 
> To: java-dev@lucene.apache.org
> Subject: Re: SegmentInfos extends Vector
> 
>  
> I would rather avoid implementing List .. we should implement Iterable for 
> sure, but I'd like to keep the API open to either iterating in order or getting 
> a particular SegmentInfo. Another thing: I haven't seen anywhere that remove 
> is called. In general I don't like to impl an interface just to throw UOE 
> everywhere ...
> 
> I will open an issue. I usually investigate the code first before I open an 
> issue. Also, what about back-compat? Are we even allowed to change that 
> class? If not, then we can deprecate it and introduce a new one ...
> 
> Shai
> 
> On Sun, Feb 28, 2010 at 2:25 PM, Uwe Schindler  wrote:
> 
> I think you should open an issue! I like this refactoring. Maybe we can still 
> let it implement List, but only deprecated, and most methods 
> should throw UOE. Just keep get() and so on.
> 
>  
> -
> 
> Uwe Schindler
> 
> H.-H.-Meier-Allee 63, D-28213 Bremen
> 
> http://www.thetaphi.de
> 
> eMail: u...@thetaphi.de
> 
>  
> From: Shai Erera [mailto:ser...@gmail.com] 
> Sent: Sunday, February 28, 2010 1:20 PM
> 
> 
> To: java-dev@lucene.apache.org
> 
> Subject: Re: SegmentInfos extends Vector
> 
>  
> Yes, that's what I've been thinking as well - SegmentInfos should have a 
> segments-related API, not a List-related one. Whether the infos inside are kept 
> in a Map, List, Collection or array is an implementation detail. In fact, I 
> have code which uses the API and could really benefit from a Map-like 
> interface, but perhaps other code needs things ordered (which is why we can 
> keep a TreeMap inside, or a LinkedHashMap). That's a great example of why it 
> should have its own API.
> 
> The Lucene code usually calls SegmentInfos.info(int), but some places call 
> get(int) (which is inherited from Vector). That's bad.
> 
> SegmentInfos is public, though it's tagged with @lucene.experimental. I think 
> it should be tagged with @lucene.internal as there's nothing experimental 
> about it?
> 
> I don't mind doing the refactoring. Not sure how this will affect back-compat 
> (is it acceptable for this class?). I've touched SegmentInfos in 
> LUCENE-2289, so I'll wait for someone to pick it up first, so that I don't 
> work on it in parallel.
> 
> Thanks,
> Shai
> 
> On Sun, Feb 28, 2010 at 1:37 PM, Uwe Schindler  wrote:
> 
> I think this is historical. I have seen this in my big 3.0 generification 
> patches, too. But I did not want to change it, as Vector has a different 
> allocation scheme than ArrayList. But maybe we should simply change it; it's 
> a package-private class, right?
> 
>  
> But in general, subclassing those implementations is not the best thing you 
> can do. In general the class should extend Object or something else and just 
> have a final field of type List<…>. Exposing the whole API of List…

Re: Lucene Query Parser Syntax document

2010-02-28 Thread DM Smith

On Feb 28, 2010, at 8:17 AM, Uwe Schindler wrote:

> We may add a symbolic link to the actual version?

I would like something like http://lucene.apache.org/java/current to point to 
the latest release's docs.

> But you are right, the syntax is now per-release. Also the system 
> requirements since 3.0 are per-release. The top-level site now only contains 
> general info about Lucene, no longer any details that may change between 
> releases.
> 
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
> 
>> -Original Message-
>> From: DM Smith [mailto:dmsmith...@gmail.com]
>> Sent: Sunday, February 28, 2010 2:12 PM
>> To: java-dev@lucene.apache.org
>> Subject: Lucene Query Parser Syntax document
>> 
>> Earlier I had linked to
>> http://lucene.apache.org/java/docs/queryparsersyntax.html in my product
>> manual. That no longer works.
>> 
>> Searching I found that the document is per release. Not sure when that
>> changed, but having found it at
>> 
>> http://lucene.apache.org/java/2_3_2/queryparsersyntax.html
>> 
>> I noticed that it declares that the syntax is that of 1.9.
>> 
>> The documents for 2.9 and later have this corrected.
>> 
>> I'm wondering whether there is a durable link to the current
>> documentation that does not change with each release.
>> 
>> -- DM
> 
> 
> 
> 





Re: [DISCUSS] Do away with Contrib Committers and make core committers

2010-03-15 Thread DM Smith

My 2 cents as one who has no aspirations of ever being a committer.

I think, with the pending re-org of contrib and the value of contrib, it 
doesn't make much sense to keep the distinction between core and contrib, 
let alone between their committers.


Regarding the former low bar: either prune the list (voluntarily or 
forcefully), prune individuals when they commit something they really, 
really shouldn't have (e.g. no discussion, no consensus), or give 
several opportunities to do right and then prune.


But in any case, spell out the expectations and document them (perhaps in 
the wiki).


I think it can work and there will be little if any problem with it.

-- DM

On 03/15/2010 02:33 PM, Grant Ingersoll wrote:

On Mar 15, 2010, at 1:25 PM, Mark Miller wrote:

   

On 03/15/2010 08:33 AM, Grant Ingersoll wrote:
 

Right, Mark.  I think we would be effectively raising the bar to some extent 
for what it takes to be a committer.
   

That's part of my point though - some are contrib committers with a lower bar - 
now they are core/solr committers with that lower bar, but someone else that 
came along would not get to the same position now?
 

I think they may just have a little more work to do, either that or maybe we 
just have a little more faith that the right things will be done.

   
 

  We'd also be making contrib a first class citizen (not that it ever wasn't, 
but some people have that perception).
   

I think because it was kind of true. I could come along before and donate 
contrib x, and never show I worked well with the community or build up the 
merit needed to be a committer, and be made a contrib committer simply to 
maintain my module. That's happened plenty.
 

True.  I guess what I'm saying is we can still make them committers and it may be that they still only will 
work on "their" module, but we should base our vote on them being "full" committers.  I 
don't like the notion of modules belonging to someone (not that you were implying that, I know.)  I guess I 
just see it as you either have earned merit or not.  That's how we do it in Solr and Mahout and they both 
have modules/contribs and it also fits more with the notion of "one project, one set of committers".

   
 

  Finally, I think we need to recognize that not everyone needs to be a 
McCandless in order to contribute in a helpful way.
   

We obviously recognize that or else I wouldn't be here! I think it's more about 
fitting in - showing you get and follow the Apache way. Showing that ideas and 
changes you might push are in line with what the other committers think is 
appropriate for a core/solr committer. Talent is not key here - community is. 
The bar for this has been *much* higher for core than for contrib in the past. And 
contrib has had different bars over time - I think it was even lower in the 
past at points.
 

Agreed.

   
 

  I think sometimes we forget that you can do svn revert.
   

I hate to have to do that. I don't think it's a great way to handle this - we 
could make everyone a committer at the drop of a hat and say we can just revert. 
I wouldn't call for a revert except in exceptional circumstances. I don't think 
that's the point.
 

Right, obviously I wasn't implying we'd want to do it, but we can if it is 
absolutely necessary.

   






Re: Proposal about Version API "relaxation"

2010-04-13 Thread DM Smith

I like the concept of version, but I'm concerned about it too.

The current Version mechanism allows one to use more than one Version in 
their code. Imagine that we are at 3.2 and one was unable to upgrade to 
the most recent version for a particular feature. Let's also suppose that at 3.2 
a new feature was introduced and was taken advantage of. But at 3.5 that 
new feature's behavior changes and is versioned again, and one is unable to 
upgrade for it, too. Now what? Use 3.0 for the one feature and 3.2 for the other?


What about the interoperability of versioned features? Does a version 
3.0 class play well with a 3.2 versioned class? How do we test that?


A long-term issue is that of bw compat for the version itself. The bw 
compat contract is twofold: API and index. The API has a shorter 
lifetime of compatibility than that of an index. How does one deprecate 
a particular version for the API but not the index? How does one know 
whether one versioned feature impacts the index and another does not?


I'm hoping that I'm imagining a problem that will never actually arise.

Shai, to your suggestion: because the version mechanism is not a single 
value for the entire library but rather applies feature by feature, I don't see 
how a global setter can help.
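
For example (a sketch only, using the 3.x constructors that take a Version; the 
field names are made up), the same application might legitimately hold:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.util.Version;

public class PerFeatureVersions {
    // Index A was built against 2.9 behavior and has never been re-indexed.
    Analyzer analyzerForIndexA = new StandardAnalyzer(Version.LUCENE_29);
    QueryParser parserForIndexA = new QueryParser(Version.LUCENE_29, "text", analyzerForIndexA);

    // Index B is new and uses the current 3.0 behavior.
    Analyzer analyzerForIndexB = new StandardAnalyzer(Version.LUCENE_30);
}

A single JVM-wide default could only express one of those two choices.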


-- DM

On 04/13/2010 11:27 AM, Shai Erera wrote:

Hi

I'd like to propose a relaxation on the Version API. Uwe, please read 
the entire email before you reply :).


I was thinking, following a question on the user list, that the 
Version-based API may not be very intuitive to users, especially those 
who don't care about versioning, as well as very inconvenient. So 
there are two issues here:
1) How should one use Version smartly so that he keeps backwards 
compatibility. I think we all know the answer, but a Wiki page with 
some "best practices" tips would really help users use it.
2) How can one write sane code, which doesn't pass versions all over 
the place if: (1) he doesn't care about versions, or (2) he cares, and 
sets the Version to the same value in his app, in all places.


Also, I think that today we offer a flexibility to users, to set 
different Versions on different objects in the life span of their 
application - which is a good flexibility but can also lead people to 
shoot themselves in the foot if they're not careful -- e.g. upgrading 
Version across their app, but failing to do so in one or two places ...


So the change I'd like to propose is to mostly alleviate (2) and 
better protect users - I DO NOT PROPOSE TO GET RID OF Version :).


I was thinking that we can add on Version a DEFAULT version, which the 
caller can set. So Version.setDefault and Version.getDefault will be 
added, as static members (more on the static-ness of it later). We 
then change the API which requires Version to also expose an API which 
doesn't require it, and that API will call Version.getDefault(). 
People can use it if they want to ...


Few points:
1) As a default DEFAULT Version is controversial, I don't want to 
propose it, even though I think Lucene can define the DEFAULT to be 
the latest. Instead, I propose that Version.getDefault throw a 
DefaultVersionNotSetException if it wasn't set, while an API which 
relies on the default Version is called (I don't want to return null, 
not sure how safe it is).
2) That DEFAULT Version is static, which means it will affect all 
indexing code running inside the JVM. Which is fine:

2.1) Perhaps all the indexing code should use the same Version
2.2) If you know that's not the case, then pass Version to the API 
which requires it - you cannot use the 'default Version' API -- 
nothing changes for you.
One case is missing -- you might not know if your code is the only 
indexing code which runs in the JVM ... I don't have a solution to 
that, but I think it'll be revealed pretty quickly, and you can change 
your code then ...


So to summarize - the current Version API will remain and people can 
still use it. The DEFAULT Version API is meant for convenience for 
those who don't want to pass Version everywhere, for the reasons I 
outlined above. This will also clean our test code significantly, as 
the tests will set the DEFAULT version to TEST_VERSION_CURRENT at 
start ...


The changes to the Version class will be very simple.
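
As a standalone sketch of what the proposed additions might look like (the real 
change would live on Version itself, and DefaultVersionNotSetException does not 
exist; IllegalStateException stands in for it here):

import org.apache.lucene.util.Version;

public final class DefaultVersionHolder {
    private static volatile Version defaultVersion;   // no default until the application sets one

    public static void setDefault(Version version) {
        defaultVersion = version;
    }

    public static Version getDefault() {
        Version version = defaultVersion;
        if (version == null) {
            // Stands in for the proposed DefaultVersionNotSetException.
            throw new IllegalStateException("default Version has not been set");
        }
        return version;
    }
}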

If people think that's acceptable, I can open an issue and work on it.

Shai






Re: Proposal about Version API "relaxation"

2010-04-14 Thread DM Smith

On 04/14/2010 09:13 AM, Robert Muir wrote:
It's not sidetracked at all. There seem to be more compelling 
alternatives to achieve the same thing, so we should consider 
alternative solutions, too.
Maybe have the index store the version(s) and use that when constructing 
a reader or writer?
Given enough minor releases, it is likely that different analyzers would 
use different versions. So each feature would need to be represented.




On Wed, Apr 14, 2010 at 8:54 AM, Earwin Burrfoot wrote:


The thread somehow got sidetracked. So, let's get this carriage back
on its rails?

Let me remind you - we have an API on our hands that is mandatory and tends to
be cumbersome.
Proposed solution does indeed have ultrascary word "static" in it. But
if you brace yourself and look closer - the use of said static is
opt-in and heavily guarded.
So even a long-standing hater of everything static like me is tempted.


On Wed, Apr 14, 2010 at 16:30, Grant Ingersoll <gsing...@apache.org> wrote:
>
> On Apr 14, 2010, at 12:49 AM, Robert Muir wrote:
>
>>
>> On Wed, Apr 14, 2010 at 12:06 AM, Marvin Humphrey <mar...@rectangular.com> wrote:
>> New class names would work, too.
>>
>> I only mention that for the sake of completeness, though --
it's not a
>> suggestion.
>>
>> Right, to me this is just as bad.
>> In my eyes, the Version thing really shows the problem with the
analysis stuff:
>> * Used by QueryParsers, etc at search and index time, with no
real clean way to do back-compat
>> * Concepts like Version and class-naming push some of the
burden to the user: users decide the back-compat level, but it
still leaves devs with back-compat management hassle.
>>
>> The idea of having a real versioned-module is the same as
Version and class-naming, except it both pushes the burden to the
user in a more natural way (people are used to versioned jar files
and things like that... not Version constants), and it relieves
devs of the back compat
>>
>> In all honesty with the current scheme, release schedules of
Lucene, and Lucene's policy, the analysis stuff will soon deadlock
into being nearly unmaintainable, and to many users, the API is
already unconsumable: its difficult to write reusable analyzers
due to historical relics in the API, methods are named
inappropriately, e.g. Tokenizer.reset(Reader) and
TokenStream.reset(), they don't understand Version, and probably a
few other things I am forgetting that are basically impossible to
fix right now with the current state of affairs.
>
>
> The thing I keep going back to is that somehow Lucene has
managed for years (and I mean lots of years) w/o stuff like
Version and all this massive back compatibility checking.  I'm
still undecided as to whether that is a good thing or not.  I also
am not sure whether in the past we just missed/ignored more
back compatibility issues or whether now we are creating more back
compat. issues due to more rapid change.  I agree, though, that
all of this stuff is making it harder and harder to develop (and I
don't mean for us committers, I mean for end consumers.)
>
> I also agree about Robert's point about the incorrectness of
naming something 3.0 versus 3.1 when 3.1 is the thing that has all
the new features and is really the "major" release.
>
> -Grant
>

>
>



--
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785






--
Robert Muir
rcm...@gmail.com 




Re: Proposal about Version API "relaxation"

2010-04-15 Thread DM Smith

On 04/15/2010 09:49 AM, Robert Muir wrote:

wrong, it doesn't fix the analyzers problem.

you need to reindex.

On Thu, Apr 15, 2010 at 9:39 AM, Earwin Burrfoot <ear...@gmail.com> wrote:


On Thu, Apr 15, 2010 at 17:17, Yonik Seeley <yo...@lucidimagination.com> wrote:
> Seamless online upgrades have their place too... say you are
upgrading
> one server at a time in a cluster.

Nothing here that can't be solved with an upgrade tool. Down one
server, upgrade index, upgrade software, up.



Having read the thread, I have a few comments. Much of it is summary.

The current proposal requires re-index on every upgrade to Lucene. Plain 
and simple.


Robert is right about the analyzers.

There are three levels of backward compatibility, though we usually talk about two.

First, the index format. IMHO, it is a good thing for a major release to 
be able to read the prior major release's index. And the ability to 
convert it to the current format via optimize is also good. Whatever is 
decided on this thread should take this seriously.


Second, the API. The current mechanism to use deprecations to migrate 
users to a new API is both a blessing and a curse. It is a blessing to 
end users so that they have a clear migration path. It is a curse to 
development because the API is bloated with the old and the new. Further 
it causes unfortunate class naming, with the tendency to migrate away 
from the good name. It is a curse to end users because it can cause 
confusion.


While I like the mechanism of deprecations to migrate me from one 
release to another, I'd be open to another mechanism.  So much effort is 
put into API bw compat that it might be better spent on another mechanism, 
e.g. thorough documentation.


Third, the behavior. WRT Analyzers (consisting of tokenizers, stemmers, 
stop words, ...), if the token stream changes, the index is no longer 
valid. It may appear to work, but it is broken. The token stream applies 
not only to the indexed documents, but also to the user supplied query. 
A simple example, if from one release to another the stop word 'a' is 
dropped, then phrase searches including 'a' won't work as 'a' is not in 
the index. Even a simple, obvious bug fix that changes the stream is bad.


Another behavior change is an upgrade in Java version. By forcing users 
to go to Java 5 with Lucene 3, the version of Unicode changed. This in 
itself causes a change in some token streams.


With a change to a token stream, the index must be re-created to ensure 
expected behavior. If the original input is no longer available or the 
index cannot be rebuilt for whatever reason, then lucene should not be 
upgraded.


It is my observation, though possibly not correct, that core only has 
rudimentary analysis capabilities, handling English very well. To handle 
other languages well "contrib/analyzers" is required. Until recently it 
did not get much love. There have been many bw compat breaking changes 
(though w/ version one can probably get the prior behavior). IMHO, most 
of contrib/analyzers should be core. My guess is that most non-trivial 
applications will use contrib/analyzers.


The other problem I have is the assumption that re-index is feasible and 
that indexes are always server based. Re-index feasibility has already 
been well-discussed on this thread from a server side perspective. There 
are many client side applications, like mine, where the index is built 
and used on the clients computer. In my scenario the user builds indexes 
individually for books. From the index perspective, the sentence is the 
Lucene document and the book is the index. Building an index is 
voluntary and takes time proportional to the size of the document and 
inversely proportional to the power of the computer. Our user base 
consists of those with ancient, underpowered laptops in third-world countries. On 
those machines it might take 10 minutes to create an index and during 
that time the machine is fairly unresponsive. There is no opportunity to 
"do it in the background."


So what are my choices? (rhetorical) With each new release of my app, 
I'd like to exploit the latest and greatest features of Lucene. And I'm 
going to change my app with features which may or may not be related to 
the use of Lucene. Those latter features are what matter the most to my 
user base. They don't care what technologies are used to do searches. If 
the latest Lucene jar does not let me use Version (or some other 
mechanism) to maintain compatibility with an older index, the user will 
have to re-index. Or I can forgo any future upgrades with Lucene. 
Neither are very palatable.


-- DM Smith







Re: Proposal about Version API "relaxation"

2010-04-15 Thread DM Smith

On 04/15/2010 01:50 PM, Earwin Burrfoot wrote:

First, the index format. IMHO, it is a good thing for a major release to be
able to read the prior major release's index. And the ability to convert it
to the current format via optimize is also good. Whatever is decided on this
thread should take this seriously.
 

Optimize is a bad way to convert to current.
1. conversion is not guaranteed; optimizing an already optimized index is a no-op
2. it merges all your segments. if you use BalancedSegmentMergePolicy,
that destroys your segment size distribution

Dedicated upgrade tool (available both from command-line and
programmatically) is a good way to convert to current.
1. conversion happens exactly when you need it, conversion happens for
sure, no additional checks needed
2. it should leave all your segments as is, only changing their format

   

It is my observation, though possibly not correct, that core only has
rudimentary analysis capabilities, handling English very well. To handle
other languages well "contrib/analyzers" is required. Until recently it did
not get much love. There have been many bw compat breaking changes (though
w/ version one can probably get the prior behavior). IMHO, most of
contrib/analyzers should be core. My guess is that most non-trivial
applications will use contrib/analyzers.
 

I counter - most non-trivial applications will use their own analyzers.
The more modules - the merrier. You can choose precisely what you need.
   
By and large an analyzer is a simple wrapper for a tokenizer and some 
filters. Are you suggesting that most non-trivial apps write their own 
tokenizers and filters?


I'd find that hard to believe. For example, I don't know enough Chinese, 
Farsi, Arabic, Polish, ... to come up with anything better than what 
Lucene has to tokenize, stem or filter these.


   

Our user base are those with ancient,
underpowered laptops in 3-rd world countries. On those machines it might
take 10 minutes to create an index and during that time the machine is
fairly unresponsive. There is no opportunity to "do it in the background."
 

Major Lucene releases (feature-wise, not version-wise) happen like
once in a year, or year-and-a-half.
Is it that hard for your users to wait ten minutes once a year?
   
 I said that was for one index. Multiply that by the number of books 
available (300+) and yes, it is too much to ask. Even if a small subset 
is indexed, say 30, that's around 5 hours of waiting.


Under consideration is the frequency of breakage. Some are suggesting a 
greater frequency than yearly.


DM




Re: Proposal about Version API "relaxation"

2010-04-15 Thread DM Smith

On 04/15/2010 03:04 PM, Earwin Burrfoot wrote:

BTW Earwin, we can come up w/ a migrate() method on IW to accomplish
manual migration on the segments that are still on old versions.
That's not the point about whether optimize() is good or not. It is
the difference between telling the customer to run a 5-day migration
process, or a couple of hours. At the end of the day, the same
migration code will need to be written whether for the manual or
automatic case. And probably by the same developer which changed the
index format. It's the difference of when does it happen.
 

Converting stuff is easier than emulating, that's exactly why I want a
separate tool.
There's no need to support cross-version merging, nor to emulate old APIs.

I also don't understand why offline migration is going to take days
instead of hours for online migration??
WTF, it's gonna be even faster, as it doesn't have to merge things.

   
Will it be able to be used within a client application that creates and 
uses local indexes?


I'm assuming it will be faster than re-indexing.




Re: Proposal about Version API "relaxation"

2010-04-15 Thread DM Smith

On 04/15/2010 03:12 PM, Earwin Burrfoot wrote:

On Thu, Apr 15, 2010 at 23:07, DM Smith  wrote:
   

On 04/15/2010 03:04 PM, Earwin Burrfoot wrote:
 

BTW Earwin, we can come up w/ a migrate() method on IW to accomplish
manual migration on the segments that are still on old versions.
That's not the point about whether optimize() is good or not. It is
the difference between telling the customer to run a 5-day migration
process, or a couple of hours. At the end of the day, the same
migration code will need to be written whether for the manual or
automatic case. And probably by the same developer which changed the
index format. It's the difference of when does it happen.

 

Converting stuff is easier than emulating, that's exactly why I want a
separate tool.
There's no need to support cross-version merging, nor to emulate old APIs.

I also don't understand why offline migration is going to take days
instead of hours for online migration??
WTF, it's gonna be even faster, as it doesn't have to merge things.


   

Will it be able to be used within a client application that creates and uses
local indexes?

> I'm assuming it will be faster than re-indexing.
 

As I said earlier in the topic, it is obvious the tool has to have
both programmatic and command-line interfaces.
I will also reiterate - it only upgrades the index structurally. If
you changed your analyzers - that's your problem and you have to deal
with it.

Good. (Sorry I missed that. There's just too much in the thread to keep 
track of ;)


As long as my "old" analyzers will still work with the new lucene-core 
jar, I'm fat, dumb and happy with the upgraded index.






Re: Proposal about Version API "relaxation"

2010-04-15 Thread DM Smith

On 04/15/2010 03:25 PM, Shai Erera wrote:

We should create a migrate() API on IW which will touch just those
segments and not incur a full optimize. That API can also be used for
an offline migration tool, if we decide that's what we want.

   
What about an index that has already called optimize()? I presume it 
will be upgraded with whatever is decided?






Re: Proposal about Version API "relaxation"

2010-04-15 Thread DM Smith

On Apr 15, 2010, at 4:50 PM, Shai Erera wrote:

> Robert ... I'm sorry but changes to Analyzers don't *force* people to 
> reindex. They can simply choose not to use the latest version. They can 
> choose not to upgrade a Unicode version. They can copy the entire Analyzer 
> code to match their needs. Index format changes is what I'm worried about 
> because that *forces* people to reindex.

In several threads and issues it has been pointed out that upgrading Unicode 
versions is not an obvious choice or even controllable. It is dictated by the 
version of Java, the version of the OS and any Unicode specific libraries.

A desktop application which internally uses Lucene has no control over the 
automatic update of Java (yes, it can detect the version change and refuse to 
run, or force an upgrade) or over when the user feels like upgrading the OS (not 
sure how to detect the Unicode version of an arbitrary OS; not sure I want to).

Even with server applications, some shared servers have one version of Java 
that all use. And the owner of an individual application might have no say in 
if or when that is upgraded.

This is to say that one needs to be ready to re-index at all times unless it 
can be controlled.

One way to handle the Java/Unicode is to use ICU at a specific version and 
control its upgrade.

One way to handle the OS problem (which really is one of user input) is to keep 
up with the changes to Unicode and create a filter that handles the differences 
normalizing to the Unicode version of the index (if that's even possible).
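
A rough sketch of that kind of normalization step, using the JDK's 
java.text.Normalizer (Java 6+). Note this only pins down the composition form; 
it does not by itself paper over behavior changes between Unicode versions.

import java.text.Normalizer;

public class NormalizeBeforeIndexing {
    // Apply the same composition form to text at index time and at query time.
    static String normalize(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "re\u0301sume\u0301";          // "résumé" built with combining accents
        String composed = normalize(decomposed);           // precomposed form
        System.out.println(composed.equals("r\u00e9sum\u00e9"));  // true
    }
}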

Still goes to your point. The onus is on the application not on Lucene.

-- DM



Re: Proposal about Version API "relaxation"

2010-04-15 Thread DM Smith

On Apr 15, 2010, at 5:28 PM, Shai Erera wrote:

> DM I think ICU is great. But currently we use JFlex and you can run Java 10 
> if you want, but as long as JFlex is compiled w/ Java 1.4, that's what you'll 
> get. Luckily Uwe and Robert recently bumped it up to Java 1.5. Such a change 
> should be clearly documented in CHANGES so people are aware of this, and at 
> least until they figure out what they want to do with it, they should take 
> the pre-3.1 analyzers (assuming that's the next release w/ JFlex 1.5 
> tokenizers) and use them.

I'm not sure I understand. Is JFlex used by every tokenizer?

> 
> Alternatively, we can think of writing an ICU analyzer/tokenizer, but we're 
> still using JFlex, so I don't know how much control we have on that ...

Robert has already started one. (1488 I think).

> 
> Shai
> 
> On Fri, Apr 16, 2010 at 12:21 AM, DM Smith  wrote:
> 
> On Apr 15, 2010, at 4:50 PM, Shai Erera wrote:
> 
> > Robert ... I'm sorry but changes to Analyzers don't *force* people to 
> > reindex. They can simply choose not to use the latest version. They can 
> > choose not to upgrade a Unicode version. They can copy the entire Analyzer 
> > code to match their needs. Index format changes is what I'm worried about 
> > because that *forces* people to reindex.
> 
> In several threads and issues it has been pointed out that upgrading Unicode 
> versions is not an obvious choice or even controllable. It is dictated by the 
> version of Java, the version of the OS and any Unicode specific libraries.
> 
> A desktop application which internally uses lucene has no control over the 
> automatic update of Java (yes it can detect the version change and refuse to 
> run or force an upgrade) or when the user feels like upgrading the OS (not 
> sure how to detect the Unicode version of an arbitrary OS. Not sure I want 
> to).
> 
> Even with server applications, some shared servers have one version of Java 
> that all use. And the owner of an individual application might have no say in 
> if or when that is upgraded.
> 
> This is to say that one needs to be ready to re-index at all times unless it 
> can be controlled.
> 
> One way to handle the Java/Unicode is to use ICU at a specific version and 
> control its upgrade.
> 
> One way to handle the OS problem (which really is one of user input) is to 
> keep up with the changes to Unicode and create a filter that handles the 
> differences normalizing to the Unicode version of the index (if that's even 
> possible).
> 
> Still goes to your point. The onus is on the application not on Lucene.
> 
> -- DM
> 
> 



Document proximity

2005-03-30 Thread DM Smith
Hi,
I hope I am posting to the right list.
We (sword and jsword at crosswire.org) are indexing bibles with each 
verse becoming a document, with the verse text being indexed and the 
verse reference being stored. This way we can search the text and get 
which verses have hits.

The problem is that the verse is an artificial document boundary.
Frequently, verses cut a paragraph into parts, a poem into stanzas, ... 
and the significant parts run across verses. (But we usually don't have 
these structures in our markup.)

Is there any thought of adding a NEAR operator that will work across 
documents?

Specifically, find x NEAR y, where the distance given to NEAR is understood 
not as words but as documents.

(We do have a solution that stands entirely outside of lucene, but it 
would be better (for us :) if Lucene had the capability.)

It would also be good to have the ability to have search automatically 
consider that adjacent documents flow together unless some token in the 
document interrupts the flow. In this case, search would return a 
compound document as a hit.

Thanks,
   DM Smith


Re: Document proximity

2005-03-30 Thread DM Smith
We already have a solution, and it is external to Lucene. We look for
hits on the things that should be adjacent, get their "canonical"
reference and then compare the distances between these. While this
works well, I was hoping for a solution within Lucene.
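
A stripped-down illustration of that kind of post-search check (hypothetical 
names; the hit lists here are just verse ordinals):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DocumentNear {
    // Keep pairs of hits whose canonical ordinals are within maxDistance documents.
    static List<int[]> near(List<Integer> hitsForX, List<Integer> hitsForY, int maxDistance) {
        List<int[]> pairs = new ArrayList<int[]>();
        for (int x : hitsForX) {
            for (int y : hitsForY) {
                if (Math.abs(x - y) <= maxDistance) {
                    pairs.add(new int[] { x, y });
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Integer> x = Arrays.asList(100, 250);  // verse ordinals where "x" matched
        List<Integer> y = Arrays.asList(102, 500);  // verse ordinals where "y" matched
        System.out.println(near(x, y, 3).size());   // 1: only verses 100 and 102 are near
    }
}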

This does not give us the ability to look for phrases across verse boundaries.

As to storing book or chapter in the index, we don't do that, just the
whole reference.
This is worth looking into as it would help in doing range restricted
searches. Today, we do the restriction after the search.


On Wed, 30 Mar 2005 15:02:53 +0200, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> DM Smith wrote:
> > Hi,
> >
> > I hope I am posting to the right list.
> 
> Yes.
> 
> >
> > We (sword and jsword at crosswire.org) are indexing bibles with each
> > verse becoming a document, with the verse text being indexed and the
> > verse reference being stored. This way we can search the text and get
> > which verses have hits.
> >
> > The problem is that verse is an artifical document boundary.
> 
> You could "smear" the document boundary by adding a number of tokens
> from adjacent verses, directly preceding or following a given verse.
> Perhaps even adding a full verse from each side.
> 
> If you wish, you could also artificially lower their score by adding
> gaps (token.setPositionIncrement()), but then exact matches would not
> work across boundaries, in such case you would have to add a phrase
> query with a slop to your main query.
> 
> >
> > Frequently, verses cut a paragraph into parts, a poem into stanzas, ...
> > and the significant parts are across verses. (But we usually don't have
> > these in our markup)
> >
> > Is there any thought of adding a NEAR operator that will work across
> > documents?
>  >
>  > Specifically, find x NEAR y, where the distance given to near is not
>  > understood as words but documents.
>  >
> 
> I assume that you also add fields for books and chapters. While the
> chapter boundary is sometimes disputed, the book boundaries are pretty
> accurate ;-). You could create an equivalent of the "near" operator by
> limiting your search within a single book (by adding a required clause),
> and then from the list of hits (which should be pretty small in that
> case) you could programmatically select verses that match your proximity
> criteria.
> 
> > It would also be good to have the ability to have search automatically
> > consider that adjacent documents are flowing unless some token in the
> > doucment interrupts the flow. In this case, search would return a
> > compound document as a hit.
> 
> Lucene doesn't have a notion of compound documents, it's up to the
> application to do that. However, it's easy to retrieve documents that
> precede or follow a given document. It's also easy to retrieve documents
> that contain a given term (similar to a primary key), let's say "John
> 1:12". You could also add a field to flag a given document as the "end
> of chapter", or "end of book".
> 
> I would be more than happy to help you find a good solution - I'm a
> born-again Christian, and I use the Sword application from time to time...
> 
> --
> Best regards,
> Andrzej Bialecki
>   ___. ___ ___ ___ _ _   __
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
> 
> 
>




Re: Lucene Benchmark - Wintel faster than Unix (?)

2005-04-21 Thread DM Smith
At home, running a dual-boot WinXP SP2 and Fedora Core 3, I found that 
FC3 was faster. At least initially.
The difference was staggering. Indexing a Bible, creating one doc per 
verse and storing the verse reference but not storing the verse, took a 
couple of minutes under FC3 and 2.5+ hours under Windows.

Then I turned off WinXP fast index support and also turned off active 
virus scanning. Then the times were comparable.

It seems that Windows was fast-indexing and virus-scanning each 
transient file created by Lucene.

Lesson: Comparisons are difficult to make.
Anthony Vito wrote:

_Not_ to start wars over pentium/vs/opteron/vs/sparc or
unix/vs/linux/vs/windows. I thought this was a very valid observation.
That confuses many a good man. Also, there are most likely many people
making hardware decisions for Lucene going into production ( I know I
made one ) The better the decisions are, the faster Lucene will run, and
be perceived as the high quality piece of software it is.

On Mon, 2005-04-04 at 07:09, Philipp Breuss wrote:
 

Hello,
we were doing Lucene Perfomance tests with the same index and the same
amount of data on Sun Unix machines and Windows machines with following
results:
Index: 3.1 Mio documents
Index in RAM
Server 1:
Sun V880, 4 CPUs, 8 GB RAM; OS: Unix
Server 2:
HP Proliant DL560 G1", 4 CPUs mit je 2,7 GHz, 1 GB RAM; OS Windows 2000
Results: 
Average search time Server 1: 5,5s
Average search time Server 2: 1,6s

The windows machine (server 2) is about 5 times faster than the quite a bit
more expensive unix machine (server 1). 

Can anybody explain this?
   

Sure. There are many many factors at work here. 

1.) pure clock speed. Get the obvious out of the way first. 2.7Ghz
clocks are going to beat the crap out of SparcIII 925Mhz ( That's what's
in the V880 right? ) all day when running little Java programs.
2.) RAM IO subsystem. Those 2.7Ghz clocks are fed by a _much_ faster
bus (although a shared bus) than the V880 has, and you're blowing the
8Mb caches on the Sparcs.
3.) Getting more theoretical now... The Sparcs have 32 general purpose
registers. When running Java over a JIT on these chips you lose on the
initialization, and on the execution. It takes longer for the JIT to
paint the registers, and it doesn't do as good of a job because it
doesn't have the time. This has always been a problem with Java on
Sparcs.
 

Did anybody make similar experiences? 
   

Yes. I've developed and run many Java programs on an 8-way V880 with
32 gigs of main memory. You _only_ win if your programs are highly
concurrent, and you start to need better than 3+ gig heap sizes. I also
did not have access to a VM with a 64bit data model for the Sparc. I
suspect you don't either. Is this true?
 

Which HW+OS confirgurations deliver the best perfomance?
   

To answer this from a theoretical standpoint, not having lots of
different machines to test: we can think about what Java needs, and
specifically what Lucene needs. The fastest chips for _most_ Java
applications (especially concurrent ones) is the Opteron. Low latency IO
subsystem, point to point bus, hidden hardware optimized registers that
the JIT doesn't have to paint, MOESI cache coherency... yadda yadda.
Lucene specifically will benefit from a large cache, and a low latency
main memory setup, since its datasets will almost always blow the cache
except in the tiniest applications. Granted, the difference between Xeon
setups probably isn't enough to warrant _new_ hardware purchases... but
it's something to think about next time when you're trying to get ever
last bit for your dollar.
On a side note about the above benchmark: you could probably turn the
tables very quickly if you timed several (hundreds of) concurrent
searches and started to load the machines down. This is what you pay for
when you buy a V880: a multi-user system that can take one hell of a
beating and remain stable and responsive.
-vito

 



Re: svn commit: r164695 - in /lucene/java/trunk: CHANGES.txt src/java/org/apache/lucene/search/Hit.java src/java/org/apache/lucene/search/HitIterator.java src/java/org/apache/lucene/search/Hits.java s

2005-04-26 Thread DM Smith
Erik Hatcher wrote:
On Apr 26, 2005, at 2:38 PM, Daniel Naber wrote:
On Tuesday 26 April 2005 02:21, [EMAIL PROTECTED] wrote:
+  public String toString() {
+    try {
+      return getDocument().toString();
+    } catch (IOException e) {
+      return null;
+    }
+  }

Wouldn't it be better here to re-throw the exception as a 
RuntimeException?

I don't know, would it?  I have no preference, though it seems ok 
to me to simply return null since this is the toString method.  For a 
Document, the toString is only useful for debugging anyway.
Two thoughts:
If getDocument().toString() cannot possibly throw an IOException, but it 
is part of the signature, then it does not matter.

Once Lucene requires Java 1.4, it would be better to use an assert in the catch 
and not throw an error but return "" instead of null. Assertions are disabled at 
runtime unless enabled by passing the -ea flag to the JVM. Assertions are best 
used for situations that should never happen.

public String toString()
{
    try {
        return getDocument().toString();
    } catch (IOException e) {
        // Should never happen; flag it when assertions are enabled (-ea).
        assert false : e;
        return "";
    }
}


Re: Proposal for change to DefaultSimilarity's lengthNorm to fix "short document" problem

2005-07-08 Thread DM Smith
At crosswire.org we are using Lucene to index Bibles, with each Bible having 
its own index and each verse in the Bible being a document in the index. So 
each document is short. Length depends upon the language of translation, but 
the lengths are from 2 to less than 100.

In our case the existing bias seems appropriate, and it does not appear to 
break down for extremely short documents.

I would suggest that if the bias is changed, it be based upon the length and 
distribution of documents in the index, or be driven by programmer-supplied 
parameters.
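
For instance, a hedged sketch of a programmer-parameterized lengthNorm (the 
target and spread values are made up, and the curve is just one possibility), 
overriding the 1.4-era Similarity.lengthNorm(String, int):

import org.apache.lucene.search.DefaultSimilarity;

// Replaces the default 1/sqrt(numTerms) bias with a smooth curve that peaks
// at a caller-supplied target length and falls off on both sides.
public class TargetLengthSimilarity extends DefaultSimilarity {
    private final int targetLength;  // e.g. the typical document length for this index
    private final double spread;     // how quickly the norm falls off around the target

    public TargetLengthSimilarity(int targetLength, double spread) {
        this.targetLength = targetLength;
        this.spread = spread;
    }

    public float lengthNorm(String fieldName, int numTerms) {
        // Bell curve on a log scale: 1.0 at the target length, approaching 0 far from it.
        double diff = Math.log((double) numTerms / targetLength);
        return (float) Math.exp(-(diff * diff) / (2 * spread * spread));
    }
}

The same instance would need to be set on both the IndexWriter and the Searcher 
so that indexing-time norms and search-time scoring agree.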

Mark Bennett wrote:


Our client, Rojo, is considering overriding the default implementation of
lengthNorm to fix the bias towards extremely short RSS documents.

The general idea put forth by Doug was that longer documents tend to have
more instances of matching words simply because they are longer, whereas
shorter documents tend to be more precise and should therefore be considered
more authoritative.

While we generally agree with this idea, it seems to break down for
extremely short documents.  For example, one and two word documents tend to
be test messages, error messages, or simple answers with no accompanying
context.

I've seen discussions of this before from Doug, Chuck, Kevin and Sanji;
likely others have posted as well.  We'd like to get your feedback on our
current idea for a new implementation, and perhaps eventually see about
getting the default Lucene formula changed.

Pictures speak louder than words.  I've attached a graph of what I'm about
to talk about, and if the attachment is not visible, I've also posted it
online at:
http://ideaeng.com/customers/rojo/lucene-doclength-normalization.gif

Looking at the graph, the default Lucene implementation is represented by
the dashed dark-purple line.  As you can see it's giving the highest scores
for documents with less than 5 words, with the max score going to single
word documents.  Doug's quick fix for clipping the score for documents with
less than 100 terms is shown in light purple.

Rojo's idea was to target documents of a particular length (we've chosen 50
for this graph), and then have a smooth curve that slopes away from there
for larger and smaller documents.  The red, green and blue curves are some
experiments I did trying to stretch out the standard "bell curve" (see
http://en.wikipedia.org/wiki/Normal_distribution)

The "flat" and "stretch" factors are specific to my formula.  I've tried
playing around with how gradual the curve slopes away for smaller and larger
documents; for example, the red curve really "punishes" documents with less
than 5 words.

We'd really appreciate your feedback on this, as we do plan to do
"something".  After figuring out what the curve "should be", the next items
on our end are implementation and fixing our existing indices, which I'll
save for a later post.

Thanks in advance for your feedback,
Mark Bennett
[EMAIL PROTECTED]
(on behalf of rojo.com)





 










Re: IndexWriter and system properties

2005-07-12 Thread DM Smith

From the perspective of a user of Lucene:
IMHO, having system properties for a third-party library is not good:
1) System properties are not explicit in the library's api.
2) System properties are applied non-local to the use of a library's api.
3) System properties represents global variables, not local control.
4) System properties may or may not be cached. If cached, the read may happen 
early enough to make it hard to set the property after application startup.
5) System properties force an application to have additional mechanisms 
for startup.

6) System properties are seldom documented well.
...

Daniel Naber wrote:


Hi,

there's a bug report (#34359) asking to catch and ignore access exceptions 
when reading system properties so Lucene can be used in an applet. I 
wanted to apply that patch, but now I'm not sure anymore: does it make 
sense for Lucene to read settings from system properties? Shouldn't that 
be left to the application that uses Lucene? There are set... calls for 
most of these settings, so it's trivial to implement this for a user of 
Lucene.


Regards
Daniel

 






Re: DO NOT REPLY [Bug 35838] New: - Java NIO patch against Lucene 1.9

2005-07-24 Thread DM Smith
I tried to use bugzilla to record the comment, but I don't have an id
yet. See below for the comment:

On 7/23/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> 
> http://issues.apache.org/bugzilla/show_bug.cgi?id=35838
> 
>Summary: Java NIO patch against Lucene 1.9
>Product: Lucene
>Version: unspecified
>   Platform: All
> OS/Version: All
> Status: NEW
>   Severity: normal
>   Priority: P2
>  Component: Store
> AssignedTo: java-dev@lucene.apache.org
> ReportedBy: [EMAIL PROTECTED]
> 
> 
> Robert Engels previously submitted a patch against Lucene 1.4 for a Java NIO-
> based Directory implementation.  It also included some changes to FSDirectory
> to allow better concurrency when searching from multiple threads.  The
> complete thread is at:
> 
> http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200505.mbox/%
> [EMAIL PROTECTED]
> 
> This thread ended with Doug Cutting suggesting that someone port Robert's
> changes to the SVN trunk.  This is what I've done in this patch.
> 
> There are two parts to the patch.  The first part modifies FieldsReader,
> CompoundFileReader, and SegmentReader, to allow better concurrency when
> reading an index.  The second part includes the new NioFSDirectory
> implementation, and makes small changes to FSDirectory and IndexInput to
> accommodate this change.  I'll put a more detailed outline of the changes to
> each file in a separate message.
> 
> To use the new NioFSDirectory, set the system property
> org.apache.lucene.FSDirectory.class to
> org.apache.lucene.store.NioFSDirectory.  This will cause
> FSDirectory.getDirectory() to return an NioFSDirectory instance.  By default,
> NioFile limits the number of concurrent channels to 4, but you can override
> this by setting the system property org.apache.lucene.nio.channels.

It has been noted in another thread that System properties are being replaced.
Should this mechanism be used here or not?
Wouldn't a static setter on FSDirectory work as well?
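
To make the contrast concrete, here is a minimal sketch of the two styles; the
system property is the mechanism the patch describes, while the static setter
is hypothetical and not an existing Lucene API:

import org.apache.lucene.store.FSDirectory;

public class DirectorySelection {
    public static void main(String[] args) throws Exception {
        // 1) JVM-wide selection via a system property, as the patch proposes.
        // Note: FSDirectory may read this property only once, when the class is
        // first loaded, which is part of the objection to system properties.
        System.setProperty("org.apache.lucene.FSDirectory.class",
                           "org.apache.lucene.store.NioFSDirectory");
        FSDirectory dir = FSDirectory.getDirectory("/path/to/index", false);
        System.out.println(dir.getClass().getName());

        // 2) The explicit alternative suggested above, a hypothetical static setter:
        // FSDirectory.setImplementation(NioFSDirectory.class);  // hypothetical API
        // FSDirectory dir2 = FSDirectory.getDirectory("/path/to/index", false);
    }
}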

> 
> I did some performance tests with these patches.  The biggest improvement came
> from the concurrency improvements.  NioFSDirectory performed about the same as
> FSDirectory (with the concurrency improvements).
> 
> I ran my tests under Fedora Core 1; uname -a reports:
> Linux myhost 2.4.22-1.2199.nptlsmp #1 SMP Wed Aug 4 11:48:29 EDT 2004 i686
> i686 i386 GNU/Linux
> 
> The machine is a dual xeon 2.8GHz with 4GB RAM, and the tests were run against
> a 9GB compound index file.  The tests were run "hot" -- with everything
> already cached by linux's filesystem cache.  The numbers are:
> 
> FSDirectory without patch:  13.3 searches per second
> FSDirectory WITH concurrency patch: 14.3 searches per second
> 
> Both tests were run with 6 concurrent threads, which gave the highest numbers
> in each case.  I suspect that the concurrency improvements would make a bigger
> difference on a more realistic test where the index isn't all cached in RAM
> already, since the I/O happens whild holding the sychronized lock.  Patches to
> follow...
> 
> Thoughts?
> 



Re: Lucene does NOT use UTF-8

2005-08-30 Thread DM Smith

Daniel Naber wrote:


On Monday 29 August 2005 19:56, Ken Krugler wrote:
 


"Lucene writes strings as a VInt representing the length of the
string in Java chars (UTF-16 code units), followed by the character
data."
   

But wouldn't UTF-16 mean 2 bytes per character? That doesn't seem to be the 
case.


UTF-16 uses 2 bytes per code unit, and each Java char is one UTF-16 code 
unit. But one cannot equate the character count with the byte count. I think 
all that is being said is that the VInt is equal to str.length() as java 
gives it.
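
A small illustration of that point (not from the original thread): 
String.length() counts UTF-16 code units, which is neither a byte count nor a 
count of user-visible characters.

public class LengthDemo {
    public static void main(String[] args) throws Exception {
        String ascii = "abc";
        String supplementary = "\uD835\uDD0A"; // one character outside the BMP, stored as a surrogate pair

        System.out.println(ascii.length());                          // 3 code units
        System.out.println(supplementary.length());                  // 2 code units, 1 visible character
        System.out.println(ascii.getBytes("UTF-8").length);          // 3 bytes
        System.out.println(supplementary.getBytes("UTF-8").length);  // 4 bytes
    }
}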


On an unrelated project we are determining whether we should use a 
decomposed form (a letter followed by combining accents) or a composed form 
(a letter with its accents built in) for accented characters as we present 
the text to a GUI. We have found that font support varies but appears to be 
better for the decomposed form. This is not an issue for storage, as the text 
can be transformed before it goes to screen. However, it is useful to know 
which form it is in.


The reason I mention this is that I seem to remember that the length of 
the java string varies with the representation. So then the count would 
not be the number of glyphs that the user sees. Please correct me if I 
am wrong.
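
That recollection is essentially right for composed versus decomposed text: 
the same visible string can have different lengths depending on the form, so 
the char count is not a glyph count. A small illustration (not from the 
thread):

public class NormalizationDemo {
    public static void main(String[] args) {
        String composed = "caf\u00E9";    // 'e with acute' as one precomposed character
        String decomposed = "cafe\u0301"; // 'e' followed by a combining acute accent

        System.out.println(composed.length());           // 4
        System.out.println(decomposed.length());         // 5
        System.out.println(composed.equals(decomposed)); // false, despite identical rendering
    }
}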





Re: Lucene does NOT use UTF-8.

2005-08-30 Thread DM Smith



Ken Krugler wrote:

I think the VInt should be the number of bytes to be stored using 
the UTF-8

encoding.

It is trivial to use the String methods identified before to do the
conversion. The String(char[]) allocates a new char array.

For performance, you can use the actual CharSet encoding classes - 
avoiding

all of the lookups performed by the String class.



Regardless of what underlying support is used, if you want to write 
out the VInt value as UTF-8 bytes versus Java chars, the Java String 
has to either be converted to UTF-8 in memory first, or pre-scanned. 
The first is a memory hit, and the second is a performance hit. I 
don't know the extent of either, but it's there.


Note that since the VInt is a variable size, you can't write out the 
bytes first and then fill in the correct value later.


Sure you can. Do a "tell" to get the position. Write any number. Write 
the text. Do another "tell" to note the position. Based on the 
difference between the two "tells", you have the length. Rewind to the 
first "tell" and write out the number. Then advance to the end.


I am not recommending this, but it can be done.

There may be other ways.
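
Here is a minimal sketch of the "two tells" idea, assuming a fixed-width 
4-byte length slot rather than a VInt; a variable-width VInt could not be 
patched in place without reserving its maximum width up front.

import java.io.IOException;
import java.io.RandomAccessFile;

public class LengthBackpatch {
    public static void writeLengthPrefixed(RandomAccessFile out, byte[] payload) throws IOException {
        long lengthPos = out.getFilePointer();         // first "tell"
        out.writeInt(0);                               // placeholder for the length
        out.write(payload);                            // write the text
        long endPos = out.getFilePointer();            // second "tell"

        out.seek(lengthPos);                           // rewind to the placeholder
        out.writeInt((int) (endPos - lengthPos - 4));  // back-patch the real length
        out.seek(endPos);                              // advance to the end again
    }
}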





Re: Version 1.9

2005-09-18 Thread DM Smith
On 9/18/05, Jeff Breidenbach <[EMAIL PROTECTED]> wrote:
> 
> Putting on my Debian maintainer hat:
> 
> 1) Does it make sense for Linux distributions to ship
> Lucene 1.9, or simply wait for 2.0? (I'm thinking 2.0...)


I think it should include what is available. If pre-2.0, it should be 1.4.3 and 
1.9. If post-2.0, then 1.4.3 and 2.0, possibly 1.9. I don't think it would 
be good to presume that software has upgraded from 1.4.3 to 2.0, at least 
for a while.


Re: Basic Question on Documents and File Format

2005-11-11 Thread DM Smith

Ashwin Satyanarayana wrote:


Hello,

I am new to Lucene. I was trying to use Lucene with TREC-6 data. The dataset 
for TREC-6 used in 1997 contains many input files. Each input file has multiple 
documents (some files contain over 200 documents) tagged by DOCNO. The result 
given by Lucene to a query is a list of files and not documents.

Q1) Is there a way of getting the query results in terms of documents
within the files rather than files (without modifying the code)?
 

In Lucene a Document object is the unit of search/storage/indexing. It 
may or may not correspond to a user's view of files or documents.




Q2) If the above is not possible, what would be the best way to modify
the code?
 

To achieve what you want, I think you need to store and/or index each of 
your documents as a lucene Document. You may also want to store the file 
name and document identifier as a lucene field in the lucene Document.




Thanks and Regards,
Ashwin

Questions on how to use lucene should be addressed to the lucene users 
mailing list. This one is for developers developing lucene itself.





Lucene 1.9 Field implementation

2005-11-13 Thread DM Smith
I was looking at how compatible 1.9 is with 1.4.3 wrt my implementation.
Turns out it works without problem. There were only three deprecations that
would need to be taken care of to migrate to 2.0.

When I looked into the changes that would need to be made to go to 2.0, I
noticed an opportunity for change. (I could make the changes with
appropriate junit tests, if the following is acceptable.)

In the implementation of Field there are several "type safe enumerations"
such as Store. While the implementation is a good way to constrain
parameters to acceptable values, I think that it could be improved a bit. In
the routines that actually care about what kind of Store was passed there is
a cascading if...then...else, which compares the passed parameter to the
known set of Store objects. For each possible match there is a block of code
that interprets the meaning of that Store. Finally, if there is no match an
exception is thrown in the final else clause.

The problem with this idiom is that cascading if...then...else blocks are needed
whenever a Store object is used. Further, if another kind of Store were to
be added (say, a JDBC store) then potentially a lot of code would need to be
changed.

Example:

if (store == Store.YES){
  this.isStored = true;
  this.isCompressed = false;
}
else if (store == Store.COMPRESS) {
  this.isStored = true;
  this.isCompressed = true;
}
else if (store == Store.NO){
  this.isStored = false;
  this.isCompressed = false;
}
else
  throw new IllegalArgumentException("unknown store parameter " + store);



If each Store object were to have behavior, then the usage of Store could be
simplified. So the above code would look like:

this.isStored = store.isStored();
this.isCompressed = store.isCompressed();


And if performance and resources are not too much of an issue then rather
than copying out the values, one could use the Store object directly:

this.store = store;


If there were more behavior that needed to be added to a Store, changes
would be minimized. Rather than locating all the cascading if...then...else
blocks, one would merely add a new Store, new behavior to all Store objects
and only add calls where needed.

The code to the Store class would look something like:
(This change should not *force* any changes elsewhere in 1.9.)

// Note: subclassing Parameter has no real value in this implementation,
//   except perhaps to house a name
// Note: This class could be abstract just as easily.
public static class Store implements Serializable {

// Note: we are not going to serialize the name
private transient String name;

private Store(String name) {
  this.name  = name;
}

// Note: these could be abstract just as easily.
public boolean isStored() { return false; }
public boolean isCompressed() { return false; }

/** Store the original field value in the index in a compressed
form. This is
 * useful for long documents and for binary valued fields.
 */
public static final Store COMPRESS = new Store("COMPRESS")
{
public boolean isStored() { return true;  }
public boolean isCompressed() { return true;  }
};

/** Store the original field value in the index. This is useful
for short texts
 * like a document's title which should be displayed with the results. The
 * value is stored in its original form, i.e. no analyzer is used
before it is
 * stored.
 */
public static final Store YES = new Store("YES")
{
public boolean isStored() { return true;  }
public boolean isCompressed() { return false; }
};

/** Do not store the field value in the index. */
public static final Store NO = new Store("NO")
{
public boolean isStored() { return false; }
public boolean isCompressed() { return false; }
};

// Note: the serialization in Parameter will not work:
// We have to build the correct object.
// Rather than serializing the whole object, use a number to optimize storage
// Support for serialization
private static int nextObj;
private final int obj = nextObj++;
private static final Store[] VALUES =
{
COMPRESS,
YES,
NO,
};

Object readResolve()
{
return VALUES[obj];
}

// To deserialize from a config file allow for lookup based on name:
/**
 * Lookup method to convert from a String
 * @throws ClassCastException if aName is not a valid Store.
 */
public static Store fromString(String aName)
{
for (int i = 0; i < VALUES.length; i++)
{
Store store = VALUES[i];
if (store.name.equalsIgnoreCase(aName))
{
return store;
}
}

throw new ClassCastException("unknown Store " + aName);
}

// Just for the sake of completeness
/**
 * Prevent subclasses from overri

Re: "Advanced" query language

2005-12-06 Thread DM Smith
One thing I like about the possibility of XML (as opposed to other 
syntax) is that I could create query templates and process them with 
XSLT. And I can do this client side and also in most modern browsers.





Re: NioFile cache performance

2005-12-09 Thread DM Smith

John Haxby wrote:


Robert Engels wrote:

Using a 4mb file (so I could "guarantee" the disk data would be in 
the OS cache as well), the test shows the following results.



Which OS?   If it's Linux, what kernel version and distro?   What 
hardware (disk type, controller etc).


It's important to know: I/O (and caching) is very different between 
Linux 2.4 and 2.6.   The choice of I/O scheduler can also make a 
significant difference on 2.6, depending on the workload.   The type 
of disk and its controller is also important -- and when you get 
really picky, the mobo model number.


I don't dispute your finding for a second, but it would be good to run 
the same test on other platforms to get comparative data: not least 
because you can get the kind of I/O time improvement you're seeing on 
some workloads on different versions of the Linux kernel.


I think that the results were informative from a comparative basis on a 
single machine. It compared different techniques and showed their 
relative performance on that machine.


I also agree that the architecture of the machine can play an important 
part in how code performs. I wrote a piece of software that ran well on 
a 4-way, massive raid configuration, with gobs of ram only to have it 
re-targeted to a 1-way, small ram box, where it had to be rewritten to 
run at all.


Perhaps, it would be good to establish guidelines for reporting 
performance, including the posting of test data and test code.


This may encourage others to download the data and code, perform the 
test and report the results.





Re: NioFile cache performance

2005-12-09 Thread DM Smith

Robert Engels wrote:


As stated in a previous email - good idea.

All of the code and testcases were attached to the original email.

The testcases were the answer to a request for such (at least a month ago if
not longer).
 


I am sorry, if I gave you the wrong impression.

I was merely suggesting a formalization of the process and that there be 
documentation on the Lucene website that outlines how performance tests 
and data should be provided, and how people can participate and provide 
their results.


I have seen this issue come up several times (perhaps the following is 
an oversimplification):
Someone will suggest a performance enhancement and perhaps supply the 
code. Then there will be a general discussion about the merits of the 
change and the validity of the results, with questions about the factors 
involved and statements about how widely architectures differ and how 
the outcomes can be significantly different. If enough "voters" like the 
change, then it is committed.


Should there be a representative set of architectures to which 
performance tests should be targeted? (For example, I have written an 
application that uses Lucene to index and search Bibles. And the minimum 
hardware requirement is a Win98 laptop, which many of our users have.)









Re: [jira] Commented: (LUCENE-486) Core Test should not have dependencies on the Demo code

2006-01-11 Thread DM Smith

I have a principle that I code by:
   The Principle of Least Surprise - Write code in such a way that it 
minimizes surprises.


It is surprising to me that test cases would have a dependency on demo 
code. IMHO, the dependency should be removed.


Yesterday I installed lucene from jpackage using yum. It also installed 
lucene-demos as that was a dependency. This made no sense to me and 
probably is a problem in the jpackage RPM and not here. But perhaps it 
is a side effect of this issue.


Erik Hatcher (JIRA) wrote:

   [ http://issues.apache.org/jira/browse/LUCENE-486?page=comments#action_12362429 ] 


Erik Hatcher commented on LUCENE-486:
-

I concur with Grant on this - the dependency from test to demo has caused me 
annoyance as well.   I'm in favor of a fix to it, but haven't looked at Grant's 
solution yet.

 


Core Test should not have dependencies on the Demo code
---

Key: LUCENE-486
URL: http://issues.apache.org/jira/browse/LUCENE-486
Project: Lucene - Java
   Type: Test
   Versions: 1.4
   Reporter: Grant Ingersoll
   Priority: Minor
Attachments: FileDocument.java, testdoc.patch

The TestDoc.java Test file has a dependency on the Demo FileDocument code.  
Some of us don't keep the Demo code around after downloading, so this breaks 
the build.
Patch will be along shortly
   



 






Re: Tree based BitSet (aka IntegerSet, DocSet...)

2006-01-30 Thread DM Smith

Andrzej Bialecki wrote:

Paul Elschot wrote:

On Saturday 28 January 2006 19:27, eks dev wrote:
 

might be interesting:
http://www.iis.uni-stuttgart.de/intset/

Another way to represent a Bit(Integer)Set. Should
nicely outperform BitSet or HashBitSet as far as
iteration speed and memory are concerned. In Lucene, where the
distribution of set bits is typically exponential...
usage in caching, Filter... 


This gives some context, a performance comparison program,
and indicates that the licence for the context is LGPL:

http://www.iis.uni-stuttgart.de/personen/lippold/MathCollection/index-en.html 

  


Unfortunately, the license distributed with the JAR (which we must 
assume takes precedence over whatever is stated on the web pages) is 
much more restrictive, it's the Java Research License, which 
specifically disallows any commercial use. So, short of reimplementing 
it from scratch it's of no use except for academic study. Pity.
The idea is fairly simple: If most uses of BitSet are sparse over a 
large universe, then using a tree of BitSets would be cheaper.


The implementation needs an interface abstraction of BitSet to work, 
with a BitSet-backed implementation derived from that interface. Thus the code 
uses IntegerSet as the fundamental interface. This is essentially all the 
methods in BitSet, with BitSet changed to IntegerSet wherever it 
occurs. BitIntegerSet is a copy of Sun's BitSet, trivially 
modified to implement IntegerSet.


It is this usage of BitSet that is causing the basic problem.
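
For what it is worth, here is a rough two-level sketch of the underlying idea 
(not the LGPL/JRL code referenced above): when set bits are sparse over a 
large universe, keep small BitSet "pages" in a lazily filled array instead of 
one huge, mostly empty BitSet.

import java.util.BitSet;

public class PagedBitSet {
    private static final int PAGE_BITS = 1 << 16; // 64K bits per page
    private final BitSet[] pages;                 // a page is created only when a bit in it is set

    public PagedBitSet(int universeSize) {
        pages = new BitSet[(universeSize + PAGE_BITS - 1) / PAGE_BITS];
    }

    public void set(int index) {
        int page = index / PAGE_BITS;
        if (pages[page] == null) {
            pages[page] = new BitSet(PAGE_BITS);
        }
        pages[page].set(index % PAGE_BITS);
    }

    public boolean get(int index) {
        BitSet page = pages[index / PAGE_BITS];
        return page != null && page.get(index % PAGE_BITS);
    }
}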










Re: 1.9 RC1

2006-02-15 Thread DM Smith
Not to get too far ahead, but what is the schedule relation between 1.9 
and 2.0?

What are the dependencies on releasing 2.0?

Doug Cutting wrote:
I'd like to push out a 1.9 release candidate in the next week or so. 
Are there any patches folks are really hoping to sneak into 1.9?  If 
so, now's the time.


Doug





Re: 1.9 RC1

2006-02-15 Thread DM Smith

Erik Hatcher wrote:

On Feb 15, 2006, at 9:11 AM, DM Smith wrote:
Not to get too far ahead, but what is the schedule relation between 
1.9 and 2.0?

What are the dependencies on releasing 2.0?


My understanding is that 2.0 will be 1.9 with all the deprecated API 
removed.  Maybe there are other features planned?


Would that mean that 1.9 and 2.0 will be released at the same time?



Erik




Doug Cutting wrote:
I'd like to push out a 1.9 release candidate in the next week or so. 
Are there any patches folks are really hoping to sneak into 1.9?  If 
so, now's the time.


Doug





Re: wildcard search with variable length

2006-02-22 Thread DM Smith

Andrzej Bialecki wrote:

Tiago Silveira wrote:
IMHO, using "cat cat?" or even "cat cat? cat??" is so simple that it 
doesn't

justify keeping the old, undocumented, arguably incorrect behavior.
  


I have a different view on this issue - IMHO treating "?" as "exactly 
one character" is counterintuitive for people familiar with the use of 
wildcards: in all popular regular expression languages, and also in 
DTD/XML world, a single "?" metacharacter means "zero or one", which 
is probably why the original behavior was introduced (or at least it 
was more compatible with the use of "?" in other contexts).


There are two distinctly different traditions for ?, *, and +. One is 
globbing (standard in UNIX shells) and the other is regular expressions. 
In globbing, ? has always stood for exactly one character, * stands for 
zero or more characters, and + is not defined. In regular expressions, 
these modify the prior expression to mean 0 or 1, 0 or more, and 
1 or more, respectively.
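
A small illustration of the two traditions using java.util.regex (not Lucene 
query syntax): in a regex, ? makes the preceding element optional, whereas a 
glob-style ? demands exactly one character, which behaves like the regex '.' 
metacharacter.

import java.util.regex.Pattern;

public class WildcardTraditions {
    public static void main(String[] args) {
        // Regex: "cats?" matches "cat" (zero 's') and "cats" (one 's').
        System.out.println(Pattern.matches("cats?", "cat"));  // true
        System.out.println(Pattern.matches("cats?", "cats")); // true

        // Glob-style "cat?" (exactly one character after "cat") behaves like regex "cat.":
        System.out.println(Pattern.matches("cat.", "cat"));   // false, a character is required
        System.out.println(Pattern.matches("cat.", "cats"));  // true
    }
}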


Lucene seems to support globbing (trailing) and not regex. To me this is 
clear in the documentation.


That said, a search seems to be a kind of regex and blending these two 
traditions leads to confusion. Though the first time I tried lucene to 
do a search, I used these metacharacters as if they were regex modifiers 
not globbing characters. (Natural behavior of a perl programmer!) It did 
not work as expected. This led me to read the docs and then I understood 
the errors of my ways.


Personally, I don't want an either/or. I want a both/and. Modern unix 
shells provide both/and, albeit with different syntax.


I see this more as a feature request than an argument as to the 
usefulness or properness of either. Both are useful. Both are proper. 
Both are intuitive. Both are counterintuitive. It all depends on your 
"tradition".







Re: wildcard search with variable length

2006-02-22 Thread DM Smith

John Haxby wrote:

Doug Cutting wrote:


DM Smith wrote:

Personally, I don't want an either/or. I want a both/and. Modern 
unix shells provide both/and, albeit with different syntax.


I see this more as a feature request than an argument as to the 
usefulness or properness of either. Both are useful. Both are 
proper. Both are intuitive. Both are counterintuitive. It all 
depends on your "tradition".


+1

Doug


Doesn't the RegexQuery do this for you?

jch


I have not looked at it (yet). If so, that would be the "both/and".




Re: Lucene 1.9 RC1 release available

2006-02-27 Thread DM Smith



Erik Hatcher wrote:


On Feb 25, 2006, at 3:24 PM, Daniel Naber wrote:

On Freitag 24 Februar 2006 00:50, Doug Cutting wrote:


Are these all modules that don't need external libs?


So far as I know!


I found another module that requires external libraries: regex. These are
even defined in the additional.dependencies property in the build.xml, but
it seems it's not used (at least not for copying the libs to the
distribution).


I personally don't think we should be distributing any external 
dependencies.  Whoever builds the releases needs to have the 
dependencies locally, but 3rd party JARs, even Apache ones, should not 
go along for the .tar/zip ride IMO.  In the same manner that Ant 
doesn't ship with junit.jar or any other 3rd party dependencies, it 
still was compiled with them.


I'm happy to go with the flow of the consensus though, and if folks 
want the other JARs to go along then that's fine also.  There should 
definitely be some docs that explain these 3rd party dependencies, and 
I'll add that to the regex docs that I'm going to work on tomorrow.

My opinion as a user of the lucene:

If I understand correctly, there are no dependencies for lucene itself, 
but only for contrib? If so, please don't package jars. If not, document 
them and let us get them if we use the classes that require them.


On Linux I use jpackage for installs, and I expect the dependencies to 
be broken out as separate installs.
As far as Windows goes, I don't have any problem getting jars as I need 
them.






Re: Lucene 1.9-final release available

2006-02-28 Thread DM Smith

To ask the obvious question: When will 2.0 be released?
I would favor an earlier release date so that there won't be much 
difference between 1.9 and 2.0 except what's stated below.


Doug Cutting wrote:

Release 1.9-final of Lucene is now available from:

http://www.apache.org/dyn/closer.cgi/lucene/java/

This release has many improvements since release 1.4.3, including new 
features, performance improvements, bug fixes, etc.  For details, see:


http://svn.apache.org/viewcvs.cgi/*checkout*/lucene/java/tags/lucene_1_9_final/CHANGES.txt 



1.9 will be the last 1.x release. It is both back-compatible with 
1.4.3 and forward-compatible with the upcoming 2.0 release. Many 
methods and classes in 1.4.3 have been deprecated in 1.9 and will be 
removed in 2.0.  Applications must compile against 1.9 without 
deprecation warnings before they are compatible with 2.0.


Doug




Index compatibility question

2006-03-01 Thread DM Smith

I have just upgraded to 1.9-final and am now testing my use of it.
One question regarding compatibility.
Does 1.4.3 search 1.9-final built indexes?
I find that 1.9 reads my 1.4.3 built indexes just fine. But not the 
other way around.





Re: this == that

2006-05-01 Thread DM Smith

karl wettin wrote:
The code is filled with string equality code using == rather than 
equals(). I honestly don't think it saves a single clock tick as the 
JIT takes care of it when the first line of code in the equals method 
is if (this == that) return true;

If the strings are interned then it should be a touch faster.
If the strings are not interned then I think it may be a premature 
optimization.


IMHO, using intern to optimize space is a reasonable optimization, but 
using == to compare such strings is error prone as it is possible that 
the comparison is looking at strings that have not been interned.


Unless object identity is what is being tested, or interning is an 
invariant, I think it is dangerous. It is easy to forget to intern or to 
propagate the pattern via cut and paste to an inappropriate context.
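
A minimal illustration of the pitfall (not Lucene code): == on strings is only 
safe when every string involved is guaranteed to have been interned.

public class InternDemo {
    public static void main(String[] args) {
        String literal = "contents";           // string literals are interned
        String built = new String("contents"); // a distinct, non-interned instance

        System.out.println(literal == built);          // false, identity differs
        System.out.println(literal.equals(built));     // true, contents are equal
        System.out.println(literal == built.intern()); // true, identity matches after interning
    }
}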


Please correct me if I'm wrong.

I can commit to do the changes to the core code if it is considered 
interesting.





Re: J2ME

2006-05-10 Thread DM Smith
I have an application I'd like to move to J2ME which uses lucene for 
creating and searching indexes. I can get by with the capabilities of 
search.


karl wettin wrote:

On Sun, 2006-05-07 at 12:55 +0100, [EMAIL PROTECTED] wrote:

  

Somewhat off topic, but I've started looking into porting Lucene to
J2ME (that leaves me with only pre-JCF collections and no floats). I
have absolutely no idea what to use it for, but imagine something along the
lines of distributed collaborative filtering could be fun.
  

We have done this two years ago, for Lucene 1.2.




I would love to take a look at the code if it is available.





Lucene 2.0

2006-05-18 Thread DM Smith

Could someone enumerate what needs to be done before 2.0 is released.
From following this thread, it was stated that 2.0 was 1.9 with 
deprecations removed.

Recently it appears to be becoming much more than that.

Personally, I'd like to see a 2.0 now and if there are changes then 
subsequent releases, say 2.0.1 for bugs and 2.1 for changes to API or 
file structure (e.g. byte count vs char count)





Re: Lucene 2.0

2006-05-18 Thread DM Smith



Chris Hostetter wrote:

: Could someone enumerate what needs to be done before 2.0 is released.
:  From following this thread, it was stated that 2.0 was 1.9 with
: deprecations removed.
: Recently it appears to be becoming much more than that.

I believe Doug's suggestion was to hold off just long enough to fix any
egregious bugs, or apply any "safe" patches for bugs that have already
been fixed but not yet applied.

the "2.0 is the same as 1.9 but with deprecations removed" policy was
about features, but if there are bugs that can be fixed easily, let's fix
them.

at the moment, there are two Jira issues with a "Fix" version of 2.0 still
unresolved: LUCENE-556 and LUCENE-546 ... presumably, unless someone marks
any other bugs as "Fix for 2.0", 2.0 can be released once those bugs have
been fixed, or sooner if it's decided that no one has the time to fix
those (or the existing patches aren't safe to apply hastily)
  


Both of these are unassigned. LUCENE-546 has an attached patch. LUCENE-556 
mentions a workaround.


Well, I am looking forward to someone else taking this on;) and getting 
2.0 out the door!




...or at least, that's the way I understand it ... I'm no policy maker.


-Hoss


I think this is in keeping with what was said here.




JPackage was Re: [VOTE] 2.0 release this Friday?

2006-05-23 Thread DM Smith
I read the ReleaseTodo and saw a task to push to maven, but not to 
JPackage. Any possibility of adding a task to notify the JPackage folks 
of the release and perhaps in the future maintaining the package?


Otis Gospodnetic wrote:

+1

If you want, I can try doing this, but I'm likely going to have some questions. 
 I think the first one would be:
Is http://wiki.apache.org/jakarta-lucene/ReleaseTodo up to date?

Otis

- Original Message 
From: Doug Cutting <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Monday, May 22, 2006 12:42:01 PM
Subject: [VOTE] 2.0 release this Friday?

I propose to make Lucene release 2.0.0 this Friday, the 26th of May.

If there are bugs whose patches that you feel should be included in this 
release, please lobby to have them committed prior to this date.


Doug




Re: Lucene and Java 1.5

2006-05-30 Thread DM Smith

Please don't move to Java 5.

My reasons are simple (and some perhaps stem out of old information or 
misinformation):
   MacOS 9 does not run Java 1.5, which is one of my target platforms. 
Has Java 5 been ported to all target platforms?
   Java 5 has nice syntax sugar but no real substance other than the 
stronger type checking.

   (My opinion based on porting to Java 5 and then back to Java 1.4.2.)
   Not all support tooling (e.g. java2html, checkstyle, findbugs, ...) 
supports Java 5 syntax. This reduces my ability to qa code using these 
tools.

   Java 5 moves Lucene away from the possibility of ever working on J2ME.
   Java 5 moves away from running on an open source Java, e.g. gcj.
   The performance benefits of a Java 5 JVM are independent of Java 5 
source.
   Going to Java 5 requires all applications using Lucene to upgrade to 
Java 5.


Sure Java 5 has been out for a while and Java 6 is around the corner, 
but ask yourself why it is not yet the de facto standard version of Java.


karl wettin wrote:

Will code with 1.5 syntax be committed?





Re: Lucene and Java 1.5

2006-05-30 Thread DM Smith

Robert Engels wrote:

If you need to run on OS9 then run Lucene 1.9 (or it seems 2.0, just not
2.1).

You have a working, stable release that runs under 1.4. There are MANY
applications that don't run under OS9 now (they require OSX). Why should
Lucene be any different? I am fairly certain you cannot even purchase OS9
from Apple anymore.
  
Lucene should be treated differently from other applications because it is 
not an application at all: it is a service library. It is 
used by applications.


Java applications are different than other applications. The promise of 
"write once, run anywhere" begins to fizzle when "anywhere" is no longer 
a goal.


I am using a fair number of other Apache libraries and none of them 
require Java 5. Do any of the Apache libraries require Java 5? Will Lucene 
be the first?


For my application (www.crosswire.org/bibledesktop) we provide software 
that reads Lucene indexed Bibles. Our clientèle are pastors, 
missionaries, students, churches and lay people that have a history of 
running very old machines. We are also toying with porting to PDAs and 
cell phones.



1.9 is a fine Lucene release. I suggest stopping 1.4 JDK support at 1.9. 2.0
is bound to have many bug fixes, etc. and having the developers work in 1.4
when everyone else is in 1.5 seems crazy.
  
2.0 has been released as Java 1.4 compatible. One of the tenets espoused 
here is that only a major release will break existing code. It will 
otherwise maintain backward compatibility. Based on that Lucene 2.0 
should stay at Java 1.4. (I do believe that is what the other emails in 
the thread are agreeing on)


Personally I don't care if contrib maintains backward compatibility and 
I don't think that the "backward compatibility rule" was applied to 
contrib (at least in the years that I have been lurking here).


When it comes to 2.1, to me it all depends on when it is released. If it 
were to be released in the next year, I would not like it as I assume 
that 2.1 will provide significant performance gains just as 2.0 did and 
I would want to provide those advantages to my user base.


Also, the other discussion of gcj seems to suggest that the hopes of a 
Lucene 2.1 are pinned on gcj supporting Java 5 features. If gcj 
compatibility is a reasonable prerequisite, then I don't think that it is 
reasonable for Lucene to get ahead of it.



I think the issues like ThreadLocal, etc. which are fixed in the 1.5
libraries are reason enough to move. 


Are you saying that these were fixed in the API? If so, I think your 
argument is a good one. And I think you hit the nail on the head: Java 5 
should be considered when it solves a real problem. In this thread the 
primary arguments have been about how nice it would be to write in Java 5. 
While I agree it would be nicer, syntactic sugar does not solve problems.


But if it is fixed in the JVM, then it can be documented that merely 
running Java 5 fixes the ThreadLocal issues. I wholeheartedly recommend 
running  Java 1.4 applications under a Java 5 jvm. And while I don't 
like it, I don't mind telling my user base that the problem they 
encountered has been reported and fixed and the solution for them is to 
put up with the problem or upgrade.



You can't get Sun to fix these old JDK
issues, why should we be attempting to work around them.
  


I think that Lucene should be very clear in what it sees as its mission 
and its supported community. From this mailing list, I get the 
impression that it is primarily one of server side applications. In this 
case there is some level of control over the execution environment. My 
use is client side application and I don't have much control over the 
execution environment. I think it may be constructive to poll the lucene 
users mailing list as to what they need.




Re: Lucene and Java 1.5

2006-05-30 Thread DM Smith
By stating that I needed to run on Mac OS 9, this also implies that I 
need to run on OSX prior to Tiger (10.4) which does not have Java 5 and 
according to everything that I read, won't. OSX 10.3 does not seem like 
an unreasonable target platform for Lucene applications.


Robert Engels wrote:

If you need to run on OS9 then run Lucene 1.9 (or it seems 2.0, just not
2.1).

You have a working, stable release that runs under 1.4. There are MANY
applications that don't run under OS9 now (they require OSX). Why should
Lucene be any different? I am fairly certain you cannot even purchase OS9
from Apple anymore.

1.9 is a fine Lucene release. I suggest stopping 1.4 JDK support at 1.9. 2.0
is bound to have many bug fixes, etc. and having the developers work in 1.4
when everyone else is in 1.5 seems crazy.

I think the issues like ThreadLocal, etc. which are fixed in the 1.5
libraries are reason enough to move. You can't get Sun to fix these old JDK
issues, why should we be attempting to work around them.




Re: Lucene and Java 1.5

2006-05-30 Thread DM Smith

Robert Engels wrote:

If you can control them to run 1.4, you can probably control them to run
1.5.
  


I cannot force my application's users to run Java 1.4. We moved from 
Java 1.3 to Java 1.4 only after all the platforms our users were running had 
a Java 1.4 JVM available. We made a conscious decision to continue 
support for platforms that our application actually ran on and not to 
worry about platforms on which our software did not actually run.



Any performance gains offered by 2.1 would pale in comparison to your user's
upgrading their machines. If not, they stick with 2.0 based Lucene, and run
it under 1.4


That's fine if 2.0 is actively supported with regard to bug fixes 
that are found after 2.1 is released. That is, will the Lucene committers 
allow contributions of bug-fix patches against a 2.0 maintenance branch?


 


Why don't your users run MS-DOS? It would be the fastest on their machines?
They don't because it is impractical. It is also impractical to continue to
develop software against 1.4 when Sun does not actively support it (they
don't fix bugs in it), so the code in Lucene becomes "unclean" when we need
to add "fixes" for JDK issues which are already fixed in a later JVM.

  
  





Re: Lucene and Java 1.5

2006-05-30 Thread DM Smith

What features should be encouraged? discouraged? not allowed?

For example, should annotations be used where possible, or are they purely 
optional?


What about "static import"? I find that DoHicky.WHATS_IT a bit more 
revealing than WHATS_IT, as it tells me its origin.


Should enhanced for loops be used where possible? Does that mean that 
existing loops should change? If not, does the mix of old loop and new 
loop create a point of confusion or a maintenance problem?


Should autoboxing/unboxing be used? Under what circumstances is the 
performance hit acceptable?


What have other projects done? Can we learn from their experiences?

Chris Hostetter wrote:

: important new facilities. Repeating my earlier question, why should a
: platform that is 2 years behind for java expect to be at the latest and
: greatest level for lucene? I'd propose 2.0 (+ branched patches) be the
: 1.4 release distribution, with 2.1 free to move up to 1.5.

I would amend that proposal slightly...

1a) Lucene Core 2.0.* releases guarantee Java 1.4 compatibility
1b) Lucene Contrib modules in 2.0.* releases are free to require any Java
version they choose.

2a) Lucene Core 2.1.* releases guarantee Java 1.5 compatibility.
2b) Lucene Contrib modules in 2.1.* releases are free to require any Java
version they choose.

-Hoss





Re: Lucene and Java 1.5

2006-05-30 Thread DM Smith

Erik Hatcher wrote:

On May 30, 2006, at 11:45 AM, DM Smith wrote:
By stating that I needed to run on Mac OS 9, this also implies that I 
need to run on OSX prior to Tiger (10.4) which does not have Java 5 
and according to everything that I read, won't. OSX 10.3 does not 
seem like an unreasonable target platform for Lucene applications.


for all such arguments, my take is (as a fervent Mac-head myself) that 
we allow folks to innovate using whatever technical details they want 
and let lucene evolve as the state of the art of languages changes.  
there are always older versions of lucene that work quite well enough 
on other versions of java, etc.  those that need to maintain back 
compatibility should step forward to work on that as things evolve.


certainly we are not suggesting that we go crazy using features of a 
newer JDK "just because"... but if there is a performance advantage 
then we have an obligation to pursue it.


I agree. But, there is a difference between the performance of a JDK and 
that of a JVM. We need to be certain that the JDK is required for the 
performance boost and not just the JVM before we pursue it.


  for new development like the GData server, Solr, etc, we should be 
loose and allow the creative individuals to do their own thing.  for 
lucene core, we need compelling reasons to jump to a higher JDK 
requirement.


Agreed. But, I have not heard one compelling argument for the JDK 5 for 
core. (JVM certainly)





we will not hold up progress because of the few that don't upgrade 
their macs when steve jobs waves his magic wand.


LOL! Thanks!


Erik - from a snazzy speedy MacBook Pro running OS X 10.4.6 and 
not looking back.








Re: Lucene and Java 1.5

2006-05-30 Thread DM Smith

Robert Engels wrote:

I think the code "cleanliness" of 1.5
Perhaps, but only if it is retroactively applied to the entire code 
base. It creates confusion when there is a blend of coding styles: some 
enhanced for loops, some old-fashioned iterators; some new collections, 
some old.


Some of the features make code harder to read, e.g. some generic 
implementations can be downright obtuse.



 and the better concurrent classes are
a huge benefit.
  


Yes. And Lucene could have been using them for years: 
http://g.oswego.edu/dl/cpj/index.html
It was Doug Lea's work that was recast and incorporated into Java 5. And 
according to a note on that site, the differences are fairly trivial and 
easy to figure out.

I have used it since Java 1.3. Lucene could too.

I don't find this to be a compelling reason to abandon 1.4 for core Lucene.


I know in our project we developed many similar classes, and these can now be
replaced by core JDK classes. I also find 1.5 code far easier to read and
work with.

I just don't understand why a "few' voices can hold back progress. These
"few" can just run older versions of Lucene.


Do you really have a handle on your user community? The developers here 
all seem to be chomping at the bit to upgrade to Java 5 (cautiously, I 
hope). But they are power users. How about the regular schmoes that use 
Lucene?


And I don't think that a few days of discussion on a developers list 
(with me weighing in today) constitutes holding back progress.


Progress should be measured by attainment of goals. Going to Java 5 should 
provide measurable attainment of one or more goals. And if there are only a 
few, it should be weighed to determine whether it is really worth it.



 1.5 has been released for
almost 4 years on most major platforms. Not using 1.5 for such a high
profile project is absurd.
  
I'm not sure what constitutes a major platform? Is it Windows, Linux and 
Solaris?

AFAIK, there has not been an open source implementation of Java 5 yet.

Anyway, I've said my piece and unless there  is something new to say, 
I'll be quiet ;)







Re: svn commit: r410680 - in /lucene/java/branches/lucene_2_0: CHANGES.txt src/jsp/results.jsp

2006-06-01 Thread DM Smith
Just my opinion based on working with SVN with an entirely different 
development model:
Trunk is production and branches are different people's individual or 
collaborative efforts, with one branch for maintenance changes. When 
enough maintenance changes are ready for a release, we merge the branch into 
trunk, cut a tag and release the tag. Merging from trunk into 
development branches occurs prior to releasing the development 
branch. When it happens is at a time that is agreeable to the 
developers for the branch. Most developers merge trunk into their 
branch shortly after a release. In this model, all merges are into and 
out of trunk and, apart from merges, trunk never changes. One 
shortcoming of this model is that the changes within a branch are not 
readily visible via svn info.


Given the nature of this project, I am not recommending this model. 
There are simply too few releases and the interval between them is too 
long. The other nature of this project is that "developer branches" 
are people's personal work areas and they check in via patches.


What I wanted to comment on was how merging behaves. I found merging 
works very well provided that a change is not applied to two 
different copies that are to be merged (e.g. a branch and trunk). In 
that case, it will almost always produce a conflict. Applying the changes 
to one copy and then merging that copy into trunk works very well. 
When a merge completes without a conflict, it does so silently.


Since the merge is purely textual, it cannot determine whether the  
merge preserves the semantic meaning of the code. If care is taken to  
only change what is necessary, this generally works as expected. If  
the code is significantly refactored and then receives a merge, it  
will require careful review.


So if trunk (2.1 development) is going to diverge significantly from 
the branch (2.0 maintenance), sooner or later a change won't make 
sense to merge. At that point it won't make sense to merge ever 
again. It may also be that a patch only applies to 2.0 and not 2.1. 
So I guess I recommend not using merge at all.


If a file has changed between the 2.1 copy and the 2.0 copy, then a 
patch will probably only apply cleanly to the one from which it was made. If 
the changes are not too great, then applying the patch to that copy and 
using a merge tool (such as the one that comes with TortoiseSVN) to carry 
the change over to the other is easiest.


-- DM


On Jun 1, 2006, at 4:34 PM, Daniel Naber wrote:


On Donnerstag 01 Juni 2006 01:12, Otis Gospodnetic wrote:

I saw this commit on trunk.  Did you simply make the same change  
in both

branches/lucene_2_0 and in trunk?


Yes, I copied the changes manually. I would have thought that the  
person
who commits can also best decide whether something should be  
backported.

When someone later merges a large number of patches he would need to
rely on the changelog entry or study the patch, or am I missing 
something?


Regards
 Daniel

--
http://www.danielnaber.de




Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-19 Thread DM Smith
Just got back from a long weekend vacation without any net access.  
Talk about withdrawal:)


I have just gotten through reading this entire thread... Whew.


On Jun 19, 2006, at 8:48 PM, Robert Engels wrote:



People making these arguments against 1.5 sound really ill-informed, or
lazy. Neither of which is good for open-source development.



Ouch. I'm not sure which I am: Is it ill-informed or lazy?

I lurk here to see what is being developed and am impressed with the  
care and the thoughtfulness that goes into the code. I'm probably  
better served by joining the user's mailing list, but I find this  
more educational.


So, my comment is that of a user.

I'll repeat myself. I am a contributor to the open source project 
BibleDesktop, which allows a user to search Bibles using boolean 
logic. We have settled on Java 1.4 because all of our user community 
has Java 1.4 available. Our user community consists of people and 
groups that use hand-me-down hardware that was past its prime when they 
got it. Most of these users are not computer literate, but use their 
computer as a tool to do their work. So even if their hardware could 
be upgraded to a newer OS, it is not likely. (The vast majority of 
our user base uses Windows 98, but a few use MacOS 9!)


When will we stop supporting Win98 and MacOS 9? When our users no 
longer use them. (No, a lone holdout won't stop progress... And yes, 
Win98 runs Java 1.5 just fine! But if it weren't for those reliable 
Mac machines, we might not have to stay with Java 1.4!)


We use quite a few Apache and Jakarta libraries and we upgrade to the 
latest and greatest as soon as we can. So far, there have been no 
Java 5.0 libraries, and the new libraries have not caused any 
stability/performance problems.


Can I stick with 2.0.x? Certainly. However, I'd rather not. I keep  
reading about refactoring providing a significant, incremental  
improvement, and I'd like to provide that, especially for those older  
machines!


Can Lucene's going to Java 1.5 change or influence a migration of 
BibleDesktop to Java 1.5? Nope. The only thing that can influence 
that is "business decisions".


So, which is it: Ill-informed or lazy?




Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-20 Thread DM Smith
I would like to suggest that a central core of Lucene be identified and 
that this core be maintained as compatible with Java 1.4.


It has also been stated that J2ME compatibility is a future goal. It 
would be nice to consider that in defining a central core. (BTW, there 
are two J2ME standards, one is a subset of Java 1.3 and the newer is a 
subset of Java 1.4, but it is not widely implemented on target devices. 
Since Lucene does not support J2ME today, perhaps the 1.4  version is 
appropriate.)


Outside of that central core, I think Java 5, even Java 6 is just fine. 
And probably this is where most of the contributions will be.


As I said earlier, I am not a Lucene developer, but a Lucene user.

Here is an example of what I use (very trivial):
For indexing:

IndexWriter writer = new IndexWriter(tempPath.getCanonicalPath(), new 
SimpleAnalyzer(), true);
// for each (verseReference, verseText) pair:
   Document doc = new Document();
   doc.add(new Field(FIELD_NAME, verseReference, Field.Store.YES, 
Field.Index.NO));
   doc.add(new Field(FIELD_BODY, new StringReader(verseText)));
   writer.addDocument(doc);
// end loop
writer.optimize();
writer.close();

And for searching:

IndexSearcher searcher = new IndexSearcher(path);
Analyzer analyzer = new SimpleAnalyzer();
QueryParser parser = new QueryParser(LuceneIndex.FIELD_BODY, analyzer);
Query query = parser.parse(search);
Hits hits = searcher.search(query);
for (int i = 0; i < hits.length(); i++)
{
   Verse verse = 
VerseFactory.fromString(hits.doc(i).get(LuceneIndex.FIELD_NAME));
   // PassageTally understands a score of 0 as the verse not participating
   int score = (int) (hits.score(i) * 100 + 1);
   tally.add(verse, score);
}

I just don't see why Java 5 needs to be behind this kind of usage.

See below for more responses.

Robert Engels wrote:

To set the record straight, I think the Lucene product and community are
fantastic. Period.

I was also not the one who starting in with what could be termed
'aggressive' language.

Our company does not fully support 1.5. I was the loudest voice against the
move to 1.5.

After almost 2 years I now back the move. Why? Several reasons:

1. Sun is very slow, if it acts at all, to fix bugs in 1.4 (of which there are many). For example, the current problems in Lucene regarding ThreadLocals. Although this is not a bug per se, it is probably not intuitive or desired behavior. The Lucene developers have been forced to both diagnose and create workarounds for "problems" already fixed in 1.5. The licensing of Java does not allow for the easy fixing of bugs by non-Sun developers.
  


I'm not certain, but IIRC from earlier messages in the first Java 5 thread, this was not a change that prevented compiling under Java 5 for a Java 1.4 target.


I think that this is an example of where we need to be clear about 
runtime compatibility. Java 1.4 programs compiled with a Java 1.4 
compiler run better under Java 5. Programs that don't use Java 5 
features can be compiled with Java 1.4 compatibility using the Java 5 
compiler.


As long as the bugs are fixed in Java 5 and it can be cross-compiled for 
Java 1.4, the fix becomes available under a Java 1.4 jre.



2. The type safe collections are far more efficient to program/debug with.
  


I personally find this to be the case, but it does not change "business 
requirements" of a target application.



3. The standardized concurrent facilities can be of great benefit to
multithreaded programs.
  


These can be used without going to Java 5. I have been using them since 
Java 1.3. Granted it would be a dependency and a first for Lucene. The 
license clearly places the cpj code in the public domain. This means 
that it can be distributed within the Lucene jar.
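For illustration, a minimal sketch of what using the backported concurrency utilities could look like on a 1.3/1.4 JRE, assuming Doug Lea's util.concurrent jar (the "cpj" code) is on the classpath; the package name below is the one that library used, and the class itself is a made-up example:

import java.util.Map;

import EDU.oswego.cs.dl.util.concurrent.ConcurrentHashMap;

public class VerseCache {
    // Thread-safe map that runs on a 1.3/1.4 JRE; under Java 5 the
    // equivalent class is java.util.concurrent.ConcurrentHashMap.
    private final Map cache = new ConcurrentHashMap();

    public void put(String reference, Object verse) {
        cache.put(reference, verse);
    }

    public Object get(String reference) {
        return cache.get(reference);
    }
}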



4. It is what students graduating from college understand and use.
5. It is what the currently available books explain and use.
  
True, but they are still taught the "old" way. Anyone needing to maintain or enhance existing pre-Java 5 code will have to know the old way. Given the caliber of developers here and those who provide patches, I doubt that there is anyone who would have any difficulty writing Java 1.4 code.



It just seems that many people believe that if there is ONE person (or a
minority) that can't switch, then Lucene cannot switch. It seems that Bob is
in this category of never being able to switch (I am fairly certain 1.5 will
probably never be released for OS 9) - does that mean that Lucene developers
can never use 1.5 features? What about the argument I made of using
alternative algorithms that may not be as useable on older, slower machines.
  


Java 5 will never be released for MacOS 10.3. This OS is still current, 
supported and widely used.
I mentioned MacOS 9 because it is a business requirement for me. And yes, it will be many years before we will drop support for it. And during those years, I would like to have the benefit of bug fixes, performance enhancements and new features (such as leading wild card searches, if

Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-20 Thread DM Smith

On 6/20/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:


Sorry, for some reason my Yahoo email doesn't prepend ">" on replies, so
I'll use "OG" for my lines.

- Original Message 
From: Dan Armbrust <[EMAIL PROTECTED]>




Also, at my place of employment we have about 40,000 desktop computers

that are all centrally managed - down to every point release of every
single piece of software.  There are multiple applications using java
that are installed on these machines.  Each application has to be
certified and fully tested with a newer version of java before a newer
version of java can be installed.  As you can imagine, that severely
hampers the pace of java updates.  We are just getting 1.4 installed on
these machines now.  When you are managing that many machines in a
clinical environment - you have to play it safe.  There are no upgrades
for an upgrade's sake, or for syntactic sugar.  There has to be a real
problem to even get the process started.  I'm sure many other people
have similar situations.

OG: Again, exactly.  If this is an environment where upgrades are very
carefully planned and thus probably rare, why does this environment care SO
much to have the cutting edge Lucene, when at the same time they are ok with
an old version of Java?




In my situation, I am constantly working on improving an open source
application. Our use of Lucene is very trivial (from a lucene perspective)
but critical to the application. If there are bug fixes, enhancements and
performance improvements, I want to use them to improve my users' experience. So, each time there is a release of Lucene, I get it, test it, and if it in itself offers an improvement, I release our application with just the Lucene jar upgraded.


Also - I don't know much about the Java mobile platform - but I thought

I had read before that they are limited to the 1.3 or 1.4 feature set?
If this is true, do we really want to remove an entire ecosystem of
potential users?  Over syntactic sugar?

OG: It is NOT syntactic sugar only.  This is FUD! :)  Really.  I just found a bug in my code that was hidden for several weeks because I was using a raw List instead of a parameterized List, for instance!




I think I was the first to suggest that some of Java 5's features were
syntactic sugar. In saying this, I had no intention of spreading "Fear",
"Uncertainty" or "Doubt". That strong type checking is valuable is certainly
beyond a doubt. It is a syntax addition to the language that's really sweet!
However, one can carefully write correct code without it.

The other features, such as autoboxing, have a runtime cost that probably would be best to avoid. But boy, is it easier to write code when you don't have to convert an int to an Integer!

Using the new iterator construct is simple and much more straightforward,
but it does not add much to the code.
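To make the comparison concrete, here is the same loop written both ways (my sketch, not code from any project); the 1.5 form drops the explicit Iterator and the cast, and turns a wrong element type into a compile-time error rather than a runtime ClassCastException:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class LoopStyles {
    // Java 1.4 style: raw collection, explicit Iterator, cast on retrieval.
    static int totalLength14(List words) {
        int total = 0;
        for (Iterator it = words.iterator(); it.hasNext();) {
            String word = (String) it.next(); // would fail at runtime if a non-String slipped in
            total += word.length();
        }
        return total;
    }

    // Java 5 style: parameterized type and the enhanced for loop.
    static int totalLength15(List<String> words) {
        int total = 0;
        for (String word : words) {
            total += word.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<String>();
        words.add("lucene");
        words.add("java");
        System.out.println(totalLength14(words) + " " + totalLength15(words));
    }
}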


While I'm not completely opposed to the argument that I should just have

to stay with the Lucene 2.0.x release with applications that need to run
in 1.4 environments - Lucene is an integral part of that code.  If
performance improvements are made to the core, I want those in my code.
  If bugs are found and fixed - I want those fixes too.  As a matter of
fact - until the 2.0 release, I was using a build from the trunk because
of a bug that I found in Lucene, (and someone else was gracious enough
to fix for me).

OG: But I benchmarked Java 1.4 and 1.5 a few weeks ago.  1.5 is
_substantially_ faster.  If you want performance improvements, why not also
upgrade Java then?  This really bugs me.  People want the latest and greatest
Lucene, but are okay with the old Java, yet they claim they want
performance, bug fixes, etc.




One can get the performance gains just by using the Java 5 jre.




I don't think that the caliber of developers that are working on the
Lucene core are going to be slowed down any by using 1.4 syntax over
1.5.  (It actually takes longer to type in all of those generics :)  All
of my tools - Eclipse and Java 1.5 - have a check box that will cause
them to generate 1.4 compatible code.  It's really _not_ a big deal to
write 1.4 code even if you are used to 1.5.  This particular argument
just isn't compelling to me.

OG: Please read what I wrote all the way up.  In my mind, it is not so
much about core Lucene developers, as it is about external
contributions.  Core developer will know what we agreed on and will write
the code to suit our agreement.  External contributor will contribute code
she/he wrote for work.  As the poll shows, more people use 1.5 at work,
thus...

My personal opinion for the path that Lucene should take:

Core bug fixes must be 1.4 compatible.
Core improvements must be 1.4 compatible.
Contrib / sandbox can be 1.5 or 1.6.



How many external contributions are there to the "core" of Lucene?
If a "core" Lucene contribution can be applied and then "downgraded" to Java 1.4 easily, what harm is there in that?


Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-20 Thread DM Smith

This poll does not indicate anything about Lucene. It is open to anyone who
goes to quimble and searches on Java.

On 6/16/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:


It looks like I would have won a beer had anyone wagered me.

1.5 IS the Java version that the majority of Lucene users use, not 1.4!

Does this mean we can now start accepting 1.5 code?

Otis

- Original Message 
From: Otis Gospodnetic <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Friday, June 16, 2006 11:48:15 AM
Subject: Survey: Lucene and Java 1.4 vs. 1.5

Hello everyone,

If you have 15 seconds to spare, please let us (Lucene developers) know
which version of Java you are using with Lucene: 1.4 or 1.5

All it takes is 1 click on one of the two choices:
  http://www.quimble.com/poll/view/2156

No cheating, please.  Thanks!
Otis



Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-20 Thread DM Smith

On 6/20/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:




In any case, there is still GCJ too.  If GCJ supported 1.5, and we
could make a 1.4 library with Retrotranslator, that should cover most
users, right?



I just took a look at GCJ again.

If I am not mistaken: future support for 1.5 in gcj is ambiguous and/or will
be incomplete.

GCJ uses Classpath to provide open implementations of Java. Its stated goal is to be Java 1.2 compliant by the time it gets to a 1.0 release. It is largely 1.4 compliant. Significantly for me, the Swing classes don't support all the Look and Feels I support, e.g. no MacOS LaF. And while it says that
there is some support for some Java 5 classes, I could not find an
enumeration of them. I would speculate that incorporating the concurrency
classes will be relatively easy since the cpj, on which Sun's implementation
is based, is public domain. However, I would also speculate that all bets
are off on the other classes as to when they will show up.

GCJ will be using Eclipse's java compiler as its new compiler. When this
happens it will have the support for the new language features.
Interestingly this is dependent upon the GPLv3 which is stated as being a
year away from being finalized (I don't know when the page was written, so
it could be some time soon) and I don't know if it needs to be finalized
before the compiler is released.


Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-20 Thread DM Smith


On Jun 20, 2006, at 5:09 PM, Otis Gospodnetic wrote:


 - Original Message 
From: DM Smith


On 6/20/06, Otis Gospodnetic  wrote: Sorry, for some reason my  
Yahoo email doesn't prepend ">" on replies, so I'll use "OG" for my  
lines.


In my situation, I am constantly working on improving an open  
source application. Our use of Lucene is very trivial (from a  
lucene perspective) but critical to the application. If there are  
bug fixes, enhancements and performance improvements, I want to use  
them to improve my user's experience. So, each time there is a  
release of Lucene, I get it, test it and if it in itself offers an  
improvement, I release our application just upgrading the lucene jar.


OG: Again, there have been a LOT of JVM and JDK improvements since  
1.4, too, but you are still using 1.4.



I am using the Java 5 compiler to build a 1.4 compatible binary. So I  
get the compiler improvements for all my users.






OG: But I benchmarked Java 1.4 and 1.5 a few weeks ago.  1.5 is  
_substantially_ faster.  If you want performance improvements, why  
not also upgrade Java then?  Ths really bugs me.  People want the  
latest and greatest Lucene, but are okay with the old Java, yet  
they claim they want performance, bug fixes, etc.



It's not up to me. Each user of BibleDesktop has to decide for  
themselves. Users of MacOS 10.3 and earlier are stuck using Java 1.4.  
Users that have upgraded to Java 5 get the advantages of that  
runtime. As for me I am running Java 5.





One can get the performance gains just by using the Java 5 jre.

OG: Correct.  But one can also not get a performance improvement or  
a bug fix if it comes as part of an external contribution that  
happens to use 1.5 because the contributor uses 1.5 in his/her work  
and doesn't have time to "downgrade" the code, just so it can be  
accepted in Lucene.



That's the core argument that you are making and it is a good one. If it could be designated in Jira whether an attachment is Java 5, then others (perhaps myself) could take the patch, downgrade it, and attach it to the same issue. It sure would beat forking the project.






How many external contributions are to the "core" Lucene?
If the "core" Lucene contribution can be applied and then  
"downgraded" to Java 1.4 easily, what harm is in that?


  OG: I don't know the number, but JIRA would be the place to  
look.  My guess is about a dozen or more people.
Steve Rowe found something that can "downgrade" 1.5 code to 1.4 and it looks promising.


If so, then perhaps the committers could run the code through it after applying the patch. Then the contributors would not be adversely affected.








Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-20 Thread DM Smith


On Jun 20, 2006, at 5:21 PM, Yonik Seeley wrote:


On 6/20/06, DM Smith <[EMAIL PROTECTED]> wrote:

> In any case, there is still GCJ too.  If GCJ supported 1.5, and we
> could make a 1.4 library with Retrotranslator, that should cover  
most

> users, right?

If I am not mistaken: future support for 1.5 in gcj is ambiguous  
and/or will

be incomplete.


You don't use GCJ right?



Correct. We couldn't because of our use of Swing. It appears that it  
is sufficiently far along that it is worth trying again. The problem  
we have is trying to explain to users how to install java in order to  
get our application to work. If we could redistribute java as a  
seamless part of our application we would.


We are planning to migrate from Swing to Eclipse's RCP/JFace/SWT and  
then we can and would use GCJ. If Lucene goes to Java 5, we will need  
to re-examine those plans.





GCJ is currently incomplete, and needs patches to get it to work with
lucene (and lucene committers have accepted patches to ease this
porting in the past).  GCJ support in Lucene isn't  as much for the
end-user IMO, but for developers who maintain other Lucene ports.

Time will tell how good the Java5 support is for GCJ.  Hopefully less
time rather than more ;-)


-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene  
search server









Re: Core vs Contrib

2006-06-20 Thread DM Smith

I think that it might be good to define 3 levels:
fundamental - what all programs probably will use
useful - what many programs might use
and
contrib - mostly examples and code that is not quite ready to be  
classed as useful


On Jun 16, 2006, at 6:03 PM, Chris Hostetter wrote:



Are there any written (or unwritten) guidelines on when something should be committed to the core code base vs when a contrib module should be used?


Obviously if a new feature requires changing APIs or modifying one of the existing core classes, then that kind of needs to be in the core -- and there is precedent for the idea that language specific analyzers should go in contrib; and then of course there are things like the Span queries which seem like they would have been a prime candidate for a contrib module, but they aren't (possibly just because when they were added there was no "contrib" -- just the sandbox, and it didn't rev with lucene core).

...But I'm just wondering if, as we move forward, there should be some stated policy "unless there is a specific reason why it must be in the core, put it in a contrib" to help keep the core small -- or if I'm wrong about the general sentiment of the Lucene core.


(FYI: my impetus for asking this question is LUCENE-406 -- I think it's a pretty handy feature that everyone might want, but that doesn't mean it's not just as useful committed in contrib/miscellaneous)


-Hoss









Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-21 Thread DM Smith


On Jun 20, 2006, at 6:43 PM, Yonik Seeley wrote:


On 6/20/06, DM Smith <[EMAIL PROTECTED]> wrote:

The problem
we have is trying to explain to users how to install java in order to
get our application to work.


Ahh... if you didn't already have a large code base, I'd suggest
trying PyLucene in conjunction with freeze (which can make a python
program into a standalone executable).

Some kind of native-code thing does seem ideal for wide distribution
to non-technical end users.


Yes. We have a large code base, so rewriting it is non-trivial.

We are using NSIS for Windows in a non-GUI mode to check for Java, download and install it if it is not present (and if possible), and then launch the application. Very nice when it works, but many report that the download does not work.





Re: Core vs Contrib

2006-06-21 Thread DM Smith


On Jun 21, 2006, at 3:43 AM, Chris Hostetter wrote:



: I think that it might be good to define 3 levels:
: fundamental - what all programs probably will use
: useful - what many programs might use
: contrib - mostly examples and code that is not quite ready to be
: classed as useful

Those three levels make sense -- but they don't map to what's  
currently

available in the Subversion repository.  Unless I create a new
"useful" directory and make the neccessary changes to the build  
system to

build everything in it, my current choices are to put new features in
contrib, or add them to the core.


My reasoning for this suggestion was simple:
We have been discussing Java 5 and it was suggested (not just by me)  
that core lucene be kept at Java 1.4. The current model is as you  
stated, but it suggests that as time goes on more and more very  
useful stuff is added to the core without adding to fundamental, base  
functionality. The nature of contrib seems to be "use at your own risk," "your mileage may vary" kind of code, and more example code than production quality.


My guess is that the very heart of lucene won't change much from an  
API perspective.


And yes, it does represent a bit of work and a bit more management.



: On Jun 16, 2006, at 6:03 PM, Chris Hostetter wrote:

: > Are there any written (or unwritten) guidelines on when something
: > should be committed to the core code base vs when a contrib module
: > should be used?
: >
: > Obviously if a new feature requires changing APIs or modifying one of
: > the existing core classes, then that kind of needs to be in the core --
: > and there is precedent for the idea that language specific analyzers
: > should go in contrib; and then of course there are things like the Span
: > queries which seem like they would have been a prime candidate for a
: > contrib module, but they aren't (possibly just because when they were
: > added there was no "contrib" -- just the sandbox, and it didn't rev
: > with lucene core).
: >
: > ...But I'm just wondering if, as we move forward, there should be some
: > stated policy "unless there is a specific reason why it must be in the
: > core, put it in a contrib" to help keep the core small -- or if I'm
: > wrong about the general sentiment of the Lucene core.
: >
: >
: > (FYI: my impetus for asking this question is LUCENE-406 -- I think
: > it's a pretty handy feature that everyone might want, but that doesn't
: > mean it's not just as useful committed in contrib/miscellaneous)


-Hoss









Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-21 Thread DM Smith


On Jun 20, 2006, at 6:42 PM, Robert Engels wrote:

Finding good SWT support on anything but the latest (and major)  
OS's is

going to be rather poor and inconsistent. Just check the SWT bugs
(especially for things like printing).

For a company that seems to want to allow their users to stay in  
the dark

ages - good luck with SWT.



This is the major reason we have not done it. We are still in the planning phase. SWT keeps getting better, and only through testing on the platforms our users are currently on will we make the final decision to switch.





-Original Message-
From: DM Smith [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 20, 2006 5:24 PM
To: java-dev@lucene.apache.org
Subject: Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)


On Jun 20, 2006, at 5:21 PM, Yonik Seeley wrote:


On 6/20/06, DM Smith <[EMAIL PROTECTED]> wrote:

In any case, there is still GCJ too.  If GCJ supported 1.5, and we
could make a 1.4 library with Retrotranslator, that should cover

most

users, right?


If I am not mistaken: future support for 1.5 in gcj is ambiguous
and/or will be incomplete.


You don't use GCJ right?



Correct. We couldn't because of our use of Swing. It appears that  
it is
sufficiently far along that it is worth trying again. The problem  
we have is

trying to explain to users how to install java in order to get our
application to work. If we could redistribute java as a seamless  
part of our

application we would.

We are planning to migrate from Swing to Eclipse's RCP/JFace/SWT  
and then we
can and would use GCJ. If Lucene goes to Java 5, we will need to re-examine those plans.




GCJ is currently incomplete, and needs patches to get it to work with
lucene (and lucene committers have accepted patches to ease this
porting in the past).  GCJ support in Lucene isn't  as much for the
end-user IMO, but for developers who maintain other Lucene ports.

Time will tell how good the Java5 support is for GCJ.  Hopefully less
time rather than more ;-)


-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search
server














Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-21 Thread DM Smith


On Jun 21, 2006, at 7:52 AM, Grant Ingersoll wrote:


This sounds reasonable to me...


But it is not at all reasonable for us. Our application is designed  
with a write one, run anywhere mentality for the hardware/OS base  
that our users currently have. Again, many of our users use old-beyond-belief machines that anyone in their right mind would have gotten rid of. Wait, that's precisely why they have them. They are hand-me-downs from those who were in their right mind.


Recently we went through an exercise to "migrate" to Java 5. We  
upgraded all our iterator loops, made the use of collections type-safe, added annotations, refactored our type-safe enums..., every
last new language feature was examined and applied if it did not  
affect performance. There was really no necessity to do it. We did it  
for "fun."


That release was not received well and we found out that we have a  
much larger base of users on Mac 10.3 and earlier.


Unfortunately, we also made other material changes and going back to  
Java 1.4 was a fall forward rather than a revert. But we did go back  
to Java 1.4.


All releases of BibleDesktop for the last 4 years support MacOS 9 and  
higher, Windows 98 and higher (don't know whether it runs on Win95)  
and Linux as far back as I know.





Robert Engels wrote:

I don't follow...

If a user came to you and said I want to run BibleDesktop, and  
they have
MS-DOS, you would tell them you can't (or you might have to run  
the very old

BibleDesktop 1.0).

If they told you they have Windows 98 with Java 1.4 and 256mb of
memory, you

would say you can run BibleDesktop 2.0 (which includes Lucene 2.0).

If they told you they have Windows XP with Java 1.5, you would say  
you can

run BibleDesktop 3.0 (which includes Lucene 2.1).

Certainly seems like a packaging/marketing issue for you. Your  
users would
not know if they were running Lucene 1.4, 1.9 2.0 or 2.1, nor  
would they

care.



-Original Message-
From: DM Smith [mailto:[EMAIL PROTECTED] Sent: Tuesday, June  
20, 2006 5:17 PM

To: java-dev@lucene.apache.org
Subject: Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)


On Jun 20, 2006, at 5:09 PM, Otis Gospodnetic wrote:



- Original Message 
From: DM Smith


On 6/20/06, Otis Gospodnetic  wrote: Sorry, for some reason my  
Yahoo email doesn't prepend ">" on replies, so I'll use "OG" for  
my lines.


In my situation, I am constantly working on improving an open  
source application. Our use of Lucene is very trivial (from a  
lucene perspective) but critical to the application. If there are  
bug fixes, enhancements and performance improvements, I want to  
use them to improve my user's experience. So, each time there is  
a release of Lucene, I get it, test it and if it in itself offers  
an improvement, I release our application just upgrading the  
lucene jar.


OG: Again, there have been a LOT of JVM and JDK improvements  
since 1.4, too, but you are still using 1.4.





I am using the Java 5 compiler to build a 1.4 compatible binary.  
So I  get the compiler improvements for all my users.




OG: But I benchmarked Java 1.4 and 1.5 a few weeks ago.  1.5 is   
_substantially_ faster.  If you want performance improvements,  
why  not also upgrade Java then?  Ths really bugs me.  People  
want the  latest and greatest Lucene, but are okay with the old  
Java, yet  they claim they want performance, bug fixes, etc.





It's not up to me. Each user of BibleDesktop has to decide for   
themselves. Users of MacOS 10.3 and earlier are stuck using Java  
1.4.  Users that have upgraded to Java 5 get the advantages of  
that  runtime. As for me I am running Java 5.





One can get the performance gains just by using the Java 5 jre.

OG: Correct.  But one can also not get a performance improvement  
or  a bug fix if it comes as part of an external contribution  
that  happens to use 1.5 because the contributor uses 1.5 in his/ 
her work  and doesn't have time to "downgrade" the code, just so  
it can be  accepted in Lucene.





That's the core argument that you are making and it is a good one.  
If  it could be designated in Jira whether the attachment were  
Java 5  then others (perhaps myself) could take the patch,  
downgrade it and  attach it to the same issue. It sure would beat  
forking the project.






How many external contributions are to the "core" Lucene?
If the "core" Lucene contribution can be applied and then   
"downgraded" to Java 1.4 easily, what harm is in that?


  OG: I don't know the number, but JIRA would be the place to   
look.  My guess is about a dozen or more people.
Steve Rowe found something that can "downgrade" 1.5 code to 1.4  
and  looks promising.




If so then perhaps the committers could run the code through it  
after  applying the patch. Then the contri

Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-21 Thread DM Smith

On 6/21/06, Robert Engels <[EMAIL PROTECTED]> wrote:


It sounds like you did something ill-advised. Why change your code to 1.5 if
a significant portion of your users can run it, and the previous release
was
not essentially bug free (if it was, your users would not have seen any
difference).



It was ill-advised. When we heard that Java 5 was released for MacOS, we did not read the fine print (i.e. for 10.4 and higher only). Shame on us. It was a fun experiment though :) Our thought was to have a consistent code
style throughout. We did not want Java 5 syntax to creep in gradually only
as code was changed.

The previous release was essentially bug free.


It also seems very unlikely you need any significant changes to Lucene (I

reviewed your projects),




I appreciate that you took a look at it!


and if Lucene progresses along with the current

state of hardware your users won't be able to  run it anyway.




Correct. Our use of Lucene is very simple, but very central to our product. That is why I have suggested in a separate thread that the central core of Lucene be maintained separately from all the other great additions.

I am not sure what you mean by "if Lucene progresses." I am impressed with how Lucene performs on an old Windows box and on an old iMac. Are you saying that future releases will have a bigger resource footprint?


I still don't understand the harm in BibleDesktop staying at 2.0 (even

forever if you'd like - so you'd have one version).




The only harm would be that I could not provide them with new features that are implemented in a Java 1.5 Lucene. The same goes for bug fixes and performance enhancements.

And we may very well have to do that at the point that Lucene embraces 1.5 for its fundamental features.


At some point Lucene

WILL BE 1.5, your users will still not be able to run it - what would you
do
then - you would run the last version of Lucene that worked with 1.4.2.




Yes. That will be the case.

Your users obviously don't use the latest and greatest software, so why

should Lucene be any different.




Our users want the latest and greatest BibleDesktop (at any rate, that's
what we tell ourselves). Lucene is immaterial to them.

In the meantime, maybe the good lord will see fit to perform some miracle

and upgrade your user's systems.




I would like that very much!


-Original Message-

From: DM Smith [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 21, 2006 7:58 AM
To: java-dev@lucene.apache.org
Subject: Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)


On Jun 21, 2006, at 7:52 AM, Grant Ingersoll wrote:

> This sounds reasonable to me...

But it is not at all reasonable for us. Our application is designed with a
write one, run anywhere mentality for the hardware/OS base that our users
currently have. Again many of our users use old, beyond belief machines
that
anyone in their right mind would have gotten rid of. Wait, that's
precisely
why they have them. They are hand me downs from those who were in their
right mind

Recently we went through an exercise to "migrate" to Java 5. We upgraded
all
our iterator loops, made the use of collections type- safe, added
annotations, refactored our type-safe enums..., every last new language
feature was examined and applied if it did not affect performance. There
was
really no necessity to do it. We did it for "fun."

That release was not received well and we found out that we have a much
larger base of users on Mac 10.3 and earlier.

Unfortunately, we also made other material changes and going back to Java
1.4 was a fall forward rather than a revert. But we did go back to Java
1.4.

All releases of BibleDesktop for the last 4 years support MacOS 9 and
higher, Windows 98 and higher (don't know whether it runs on Win95) and
Linux as far back as I know.


>
> Robert Engels wrote:
>> I don't follow...
>>
>> If a user came to you and said I want to run BibleDesktop, and they
>> have MS-DOS, you would tell them you can't (or you might have to run
>> the very old BibleDesktop 1.0).
>>
>> If they told you they have Windows 98 with Java 1.4 and 256mb or
>> memory, you would say you can run BibleDesktop 2.0 (which includes
>> Lucene 2.0).
>>
>> If they told you they have Windows XP with Java 1.5, you would say
>> you can run BibleDesktop 3.0 (which includes Lucene 2.1).
>>
>> Certainly seems like a packaging/marketing issue for you. Your users
>> would not know if they were running Lucene 1.4, 1.9 2.0 or 2.1, nor
>> would they care.
>>
>>
>>
>> -Original Message-
>> From: DM Smith [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 20,
>> 2006 5:17 PM
>> To: java-dev@lucene.apache.org
>> Subject: Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)
>>
>>
>> On Jun 20, 20

Re: Core vs Contrib

2006-06-21 Thread DM Smith

Otis,
I should have taken a closer look before I made my comment. I have just now.
So never mind.
DM

On 6/21/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:


Chris,

Judging from a cursory look at LUCENE-406, that should be in the core (my
reasoning was: well, if it's simply the direct opposite of what's already in
the core, it makes sense to sit right next to its opposite in the core with
the "opposite name", or else who'll think to look in contrib, especially
contrib/misc).

We currently have only 2 places for code - core and contrib.  Where new
stuff should go can be discussed, but I think it's almost always very
obvious.

I think of this as:
core/fundamental - what all programs use by using the Lucene API
contrib/useful - what many programs might use - current contrib

I think "mostly examples and code that is not quite ready to be classed as
useful"-type code typically doesn't get committed.  Sometimes it sits in
JIRA and rots, but typically if code is not useful people don't contribute
it, or at least that's what I've observed.

Otis

- Original Message 
From: Chris Hostetter <[EMAIL PROTECTED]>
To: Lucene Dev 
Sent: Wednesday, June 21, 2006 3:43:23 AM
Subject: Re: Core vs Contrib


: I think that it might be good to define 3 levels:
: fundamental - what all programs probably will use
: useful - what many programs might use
: contrib - mostly examples and code that is not quite ready to be
: classed as useful

Those three levels make sense -- but they don't map to what's currently
available in the Subversion repository.  Unless I create a new
"useful" directory and make the neccessary changes to the build system to
build everything in it, my current choices are to put new features in
contrib, or add them to the core.

: On Jun 16, 2006, at 6:03 PM, Chris Hostetter wrote:

: > Are there any written (or unwritten) guidelines on when something
: > should be committed to the core code base vs when a contrib module
: > should be used?
: >
: > Obviously if a new feature requires changing APIs or modifying one of
: > the existing core classes, then that kind of needs to be in the core --
: > and there is precedent for the idea that language specific analyzers
: > should go in contrib; and then of course there are things like the Span
: > queries which seem like they would have been a prime candidate for a
: > contrib module, but they aren't (possibly just because when they were
: > added there was no "contrib" -- just the sandbox, and it didn't rev
: > with lucene core).
: >
: > ...But I'm just wondering if, as we move forward, there should be some
: > stated policy "unless there is a specific reason why it must be in the
: > core, put it in a contrib" to help keep the core small -- or if I'm
: > wrong about the general sentiment of the Lucene core.
: >
: >
: > (FYI: my impetus for asking this question is LUCENE-406 -- I think
: > it's a pretty handy feature that everyone might want, but that doesn't
: > mean it's not just as useful committed in contrib/miscellaneous)


-Hoss











Bug in BooleanScorer2

2006-06-21 Thread DM Smith

Hi,
I have just tried to compile lucene with ecj, Eclipse's compiler, and it
complains of errors with BooleanScorer2. The problematic construction is
present 2x in the class:
   if (doc() > lastScoredDoc) {
     lastScoredDoc = doc();
     coordinator.nrMatchers += super.nrMatchers;
   }
It complains about the calls to doc(), with the following error message:
  The method doc is defined in an inherited type and in an enclosing
scope.

Not sure what the solution should be:
this.doc();
BooleanScorer2.this.doc();
or
super.doc();

If I did, I'd send a patch.
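For anyone wanting to reproduce the complaint outside of Lucene, here is a minimal, self-contained reduction of the construction (my sketch, not the actual BooleanScorer2 code): a method inherited from a superclass and a method of the same name in the enclosing class are both in scope inside the inner class.

class Base {
    int doc() { return 1; }
}

public class Outer {
    int doc() { return 2; }

    class Inner extends Base {
        int lastScoredDoc = -1;

        void score() {
            // ecj (at stricter compliance settings) reports that doc() is
            // defined both in an inherited type and in an enclosing scope;
            // other compilers resolve the call to the inherited Base.doc().
            if (doc() > lastScoredDoc) {
                lastScoredDoc = doc();
            }
        }
    }
}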

-- DM Smith


Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-21 Thread DM Smith

On 6/21/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:


Hi,

- Original Message 
From: DM Smith <[EMAIL PROTECTED]>

On Jun 21, 2006, at 7:52 AM, Grant Ingersoll wrote:

> This sounds reasonable to me...

OG: It sounded reasonable to me, too.  I don't quite understand why that's
SO hard to accept.  Btw., I run Lucene 1.4.3 on Java 1.4.2 as well, so I'm
not trying to push 1.5 because of my personal interests.

OG: To be perfectly honest, and that may turn out to be a mistake, this
argument is starting to sound kind of selfish - a handful of people want the
latest Lucene, but won't let it advance with 1.5 contributions because 
1.5 wouldn't work for them.  At the same time they are not thinking of the
majority that would benefit from new contributions.




I'm not trying to be selfish. I guess that part of the problem is that I am
not a contributor, but a user. Perhaps I need to "put my money where my mouth is." I'd be happy to maintain the core code as 1.4, if that would be
acceptable. That is, when a patch is provided that is Java 5, I'd supply one
that is 1.4.

Again, we still run Lucene 1.4.3 on Java 1.4.2 on technorati.com , for

example, and I'm purely arguing this case because I don't want to reject the
nice people who contribute 1.5 code.




It has already been agreed upon that Java 5 is appropriate for contrib.

Generally open source projects have a policy to change as little of the file
as possible, only changing what is necessary. I have not heard it stated
here, but it appears that this is followed here too. If that is the case, I
expect that most of what is currently Lucene will continue to be binary
compatible with Java 1.4. (All of it is still binary compatible with 1.2,
but requires a 1.4 jre to run. Under Java 1.3 I get about 65 errors and most
of those are cascading, errors because of errors of classes not found).

IMHO, changes to a file should not look out of place. That is they should
follow the same style as the rest of the file. So, i would argue, if a file
uses old fashioned iteration for all of its loops then a new loop should not
use the enhanced for loop. If collections are not parameterized then a new
collection added to a file should not be parameterized. If the API does not
currently expose parameterized collections then a new method should not.
Unless the whole file is changed to be consistent.

What I have noticed is that these changes frequently have a ripple effect if
they hit the api.

So I think that some guidelines should be provided before any Java 5 code is
allowed.

If the guidelines become that existing code's api is retained as much as
possible, that existing code changes as little as possible, that each file
needs to maintain an internal consistency of style, that packages need to
maintain a consistent style of the api, ... then I don't see the existing
core of Lucene changing much from an external perspective.

Where I see it changing is when there is a new feature to be added to core
that adds a new class and it takes advantage of Java 5 in such a way that it
does not make sense to do it in Java 1.4. In the first thread on this
subject, it was suggested (not by me) that this would be the point at which
Java 5 would be seriously considered. IMHO, at that point it should not be a
discussion about Java 5 but whether the feature and its implementation are
appropriate at that point in time. And I think that prior to that the debate
concerning Java 5 vs Java 1.4 generates lots of heat and little light.

Anyway, I think I have said all the more that I will say on this thread.

But it is not at all reasonable for us. Our application is designed

with a write one, run anywhere mentality for the hardware/OS base
that our users currently have. Again many of our users use old,

OG: Not to be mean or anything like that, but it sounds like that's a
problem with the design of this particular application.  It has to run on
customers' computers that you are not in control of, yet designed in a
monolithic write one, run anywhere fashion.  Isn't this always going to be a
problem for you?




I meant "write once" not "write one". Not that it changes anything :)

The application is not monolithic, but plugin-based. We hope that the core libraries that we provide could be used in other contexts by providing a different UI, e.g. web, PDA, phone, ...

I think that before you comment on its design, it would be worth taking a
look at first. And because of that I don't take your comment as mean.


If your users are already users of hand-me-down computers and are very used

to running old software and hardware, do they really need the latest and
greatest Lucene?  Why?




The biggest reason is for performance enhancements. They have had the
greatest effect on these older machines. As I have said in another post, I
am very impressed with Lucene's performance.

Re: Bug in BooleanScorer2

2006-06-21 Thread DM Smith

On 6/21/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:



: Not sure what the solution should be:

I haven't looked at the code so I'm not sure ... but my first thought would
be to try both this.doc() and BooleanScorer2.this.doc(), and see which one
passes all the unit tests.  Of course, if both of them pass the unit
tests, then I'd be *really* worried.



Start worrying!
this.doc() and BooleanScorer2.this.doc() both passed the test.

super.doc() would be the same as this.doc()


Re: Bug in BooleanScorer2

2006-06-21 Thread DM Smith

On 6/21/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:



: > I haven't looked at the code so I'm not sure ... but my first thought
would
: > be to try both this.doc() and BooleanScorer2.this.doc(), and see which
one
: > passes all the unit tests.  Of course, if both of them pass the unit
: > tests, then I'd be *really* worried.

: Start worrying!
: this.doc() and BooleanScorer2.this.doc() both passed the test.

Hmmm... that is somewhat ominous isn't it?

I would suggest opening a bug on this to track it (with specific compiler
version, file version and line number info if you could) .. and if you
have more time to help debug all the better



I have entered a bug for it and tried to debug it to see if there were an
obvious solution. Because I don't understand the code, I think someone else
will need to determine what it is doing.


Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-22 Thread DM Smith


On Jun 22, 2006, at 4:31 AM, John Haxby wrote:


DM Smith wrote:
Generally open source projects have a policy to change as little  
of the file

as possible, only changing what is necessary.
Hmmm.   Necessary by what criterion?   Necessary to make, say, Lucene exploit the new iterator constructs to avoid run-time type-checking?   Necessary to make the code more readable?   Necessary to prevent use with Java 1.4? :-)


I'm not sure I've ever seen a policy expressed in that way --  
patches generally should be clear, concise and do what they're  
intended to do, but that doesn't necessarily mean minimising the  
size of the patch and it doesn't necessarily mean keeping the  
source compatible with some old compiler or environment.


I wasn't trying to be argumentative (with this statement). I probably  
stated it badly.


I simply meant that the change that is being made should be done in  
such a way that one applying the patch can readily see what is being  
changed. The most common case of unnecessary change is that of  
whitespace. Changing indentation, changing the placement of curly  
braces, reordering methods and variables and so forth are all  
unnecessary.


There are structural changes that can be made to code without changing its behavior. Some developers feel strongly about one construct over another. For example, take the following:

public int f(int x)
{
    if (x > 0)
        return x * 2;
    else
        return x * -2;
}

Some think it's dumb that there is an else clause, as above:
public int f(int x)
{
    if (x > 0)
        return x * 2;
    return x * -2;
}

Others feel strongly that there should only be one exit in a method:
public int f(int x)
{
    int ret = 0;
    if (x > 0)
        ret = x * 2;
    else
        ret = x * -2;
    return ret;
}

Some like brevity:
public int f(int x) { return x * (x > 0 ? 2 : -2); }

Such a change is most likely unnecessary.





Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-22 Thread DM Smith

On 6/22/06, John Haxby <[EMAIL PROTECTED]> wrote:


DM Smith wrote:
> I simply meant that the change that is being made should be done in
> such a way that one applying the patch can readily see what is being
> changed. The most common case of unnecessary change is that of
> whitespace. Changing indentation, changing the placement of curly
> braces, reordering methods and variables and so forth are all
> unnecessary.
>
> [snip]
> Such a change is most likely unnecessary.
Others, probably including me, would disagree.   Changes to make the
source have a consistent style and a consistent layout are not
uncommon.




I agree with you 100% that consistent style and layout are important.


  Look through the Linux kernel change logs for "whitespace

clean up" (or "white space" and "cleanup", spaces are optional :-)).
The GNU glibc maintainers will reject patches that do not conform to the
coding style for glibc -- and that includes stylistic choices like the
ones you mentioned (that I cut in the interests of brevity).




And I also agree here that the committers have the responsibility to be the
gatekeepers of that.



Similarly, and I'm struggling to keep vaguely on-topic here, the Java
1.5 iteration constructs are functionally no different to their 1.4
equivalent.   But to dismiss the 1.5 changes as "syntactic sugar" or
"fluff" is to denigrate their importance to the reliability and
maintenance of software.



In an earlier note, I suggested that there needs to be guidance as to how
Java 5 constructs are to be incorporated into the code, contrib and core. (Sooner or later, core will change to Java 5.) Or does anything go?


Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-22 Thread DM Smith

On 6/22/06, Doug Cutting <[EMAIL PROTECTED]> wrote:


DM Smith wrote:
> In an earlier note, I suggested that there needs to be guidance as to
how
> Java 5 constructs are to be incorporated into code, contrib and core.
> (Sooner or later, core will change to Java 5) Or does anything go?

Once we decide to accept Java 5 code, we should of course encourage new
contributions to use new language features that improve, e.g., type
safety and maintainability.  If someone wishes to upgrade existing code
to use new language features, these should be done as separate
contributions.

We could state a goal of upgrading all existing code, but that won't
make it happen.  I prefer not to make ambitious roadmaps, but rather
have things driven bottom-up, by contributors.  So my bottom-up-oriented
guideline is that I would not reject contributions that do nothing but
upgrade existing code to use new language features.

Is that the sort of guidance you seek, or do you think we need something
more specific, with feature-by-feature guidelines?  Developing such
guidelines collaboratively might be difficult.




One of the things I liked about 2.0 was that 1.9 was a bridge to it from
1.4.3 via deprecations. It made migration fairly straightforward. I would
like to see this going forward to 2.1. If so there needs to be some thought
to how and whether the existing API will be deprecated in the same fashion
as Java 5 is introduced.
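As a sketch of the kind of bridge being described (a made-up class, not actual Lucene API): the 1.9-style approach keeps the old signature as a deprecated delegate, so existing callers compile with a warning and have a full release to migrate before the old method is removed.

import java.io.Reader;
import java.io.StringReader;

public class TextIndexer {
    /**
     * @deprecated Replaced by {@link #index(Reader)}; kept for one major
     * release so existing callers get a warning rather than a compile error.
     */
    public void index(String text) {
        index(new StringReader(text));
    }

    public void index(Reader reader) {
        // new implementation goes here
    }
}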

I was thinking more specific, feature-by-feature guidelines. There are not
that many new language features, so I don't think that it would be too
onerous. Since the committers are ultimately the ones to accept or reject a
contribution, I think they can decide.

For example (my opinions),

Static Import
   Try to avoid. It tends to make code unreadable.

Autoboxing/unboxing
   Use sparingly, and if used, provide test cases showing that performance
sensitive code is not adversely affected.

Varargs
   Don't use as a substitute for overloading.
   Use as a replacement for an array of objects. e.g. void f(T[] t) becomes
void f(T... t).

Enhanced for loops:
   Use whenever possible.
   When modifying existing code to add one, change all others, if possible.

Typesafe Enum
   Use wherever possible, internally.
   Associate behavior with the enumerations when it makes sense.
   Replace existing enumerations that are exposed via the API cautiously,
maintaining compatibility with 2.0 as was done with 1.9's deprecations.

Generics
   Use for all internal collections.
   When changing a class, ensure that the class is self consistent in its
usage of generics.
   Enhance existing collections that are exposed via the API cautiously,
maintaining compatibility with 2.0 as was done with 1.9's deprecations.

Annotations:
   ??
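A compact sketch of a few of the guidelines above in code (illustrative only; the class and method names are made up):

import java.util.ArrayList;
import java.util.List;

public class GuidelineExamples {

    // Varargs replacing an array parameter: callers that pass a String[]
    // still compile, while new callers can inline the values.
    static List<String> toList(String... words) {
        // Generics for an internal collection; enhanced for loop for iteration.
        List<String> result = new ArrayList<String>();
        for (String word : words) {
            result.add(word);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toList("a", "an", "the"));             // inlined values
        System.out.println(toList(new String[] { "of", "to" }));  // old array-style call
    }
}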


Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-22 Thread DM Smith


On Jun 22, 2006, at 5:46 PM, Doug Cutting wrote:


DM Smith wrote:
One of the things I liked about 2.0 was that 1.9 was a bridge to  
it from
1.4.3 via deprecations. It made migration fairly straightforward.  
I would
like to see this going forward to 2.1. If so there needs to be  
some thought
to how and whether the existing API will be deprecated in the same  
fashion

as Java 5 is introduced.


If 2.1 will break API compatibility then it should be called 3.0.   
Major release numbers are all about API compatibility.  If code  
works with Lucene X.Y, then it should also compile against Lucene  
X.Y+1, but it may not work with Lucene X+1.Z.


The third digit in release numbers is for bugfix releases.  In  
effect, X.Y.0 is a beta, or release candidate.  We should never add  
new features or do anything but fix serious bugs in these releases.


I guess this is what has been bugging me. I see the introduction of  
Java 5 into the API as a compatibility break. (Using it internally is  
a client problem for me) I probably would not have spoken up (as  
loudly) if it was declared that 3.0 would be Java 5 and break  
compatibility. (I would still like the 2.9 bridge of deprecations as  
1.9 was.) 3.0 just sounds a lot further off than 2.1 :-)


I have been surveying the code to see what the impact would be to  
using the new Java 5 language features on existing code. So far I  
have gotten through a bit more than half of it. So what I am about to  
say is "half" baked.


The first obvious use of it is the various "enums". My experience  
with them is that the client code does not need to change. (Unless  
methods are added to the enum members and even then probably no  
change will be needed.)


Internally, there are quite a number of collections being used. Using generics for these makes sense. I have only seen one place where collections are used as part of the interface, and that is with the new Fieldable stuff, i.e. List getFields(). Since this is new, it would not be subject to breaking the API.


The only place I saw that varargs might be used is with the String[] stopWords parameter to several methods. But my guess is that it does not offer much of an advantage there, as one won't likely inline a list of stop words.


I think the place where it will really come into play will be new code.





JVM version is orthogonal to this, no?

Doug








Re: Java 1.5 (was ommented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-07 Thread DM Smith

Otis,
	First let me say, I don't want to rehash the arguments for or  
against Java 1.5. We can all go back and read the last two major  
threads on the issue. I don't think there is anything new to say.


However, I think statements like:
"no strong arguments" (I think the arguments were reasonable)
"only a few people argued for it" (Only a few argued against it)
		"very little interest" (Very few votes are on any Jira issue, so  
what does that say)
		"adversaries" (I am not an adversary, I am a very interested party  
with a personal interest in the outcome)

are inflammatory.

	I am willing to do the back port if it is possible and if it does  
not do violence to the implementation.


	There are a number of patches sitting in Jira and it is not clear to  
me which are even close to being applied. I am not interested in  
doing work on patches that are old or might sit around for a while  
until they are applied (and therefore become out of sync).


	If the patches are identified as being worthy of being applied and are also identified as being Java 1.5, I will port them and their tests if it makes sense.


	It has already been granted that contrib allows Java 1.5. So I presume that the build has been updated to allow for 1.5 in contrib and not in core. If this is not the case, I think that the first committer (or submitter) of Java 1.5 code to contrib has the responsibility to change the build system (or at least ensure that it is done).


	As to the build system, I am not the right person to see that it  
works. I am using Eclipse to do the builds. I maintain 2 workspaces,  
one with core only and that is Java 1.4.2 and the other is core and  
contrib and that is Java 1.5. I have done this so I can help "back  
port" to Java 1.4.


	However, I think you have identified that the core people need to  
make a decision and the rest of us need to go with it. So, I suggest  
that Doug convene such a meeting of the minds and communicate the  
decision to the rest of us.


DM



On Jul 7, 2006, at 1:17 PM, Otis Gospodnetic wrote:


Hi Chuck,

I think bulk update would be good (although I'm not sure how it  
would be different from batching deletes and adds, but I'm sure  
there is a difference, or else you wouldn't have done it).

Java 1.5 - no conclusion, but personally I felt:
- no strong arguments for 1.4, only a few people argued for it
- very little interest from 1.4 adversaries in helping with  
backporting to 1.4 or updating the build system to do the retro  
thing with 1.5 code


So I think you should contribute your code.  This will give us a  
real example of having something possibly valuable, and written  
with 1.5 features, so we can finalize 1.4 vs. 1.5 discussion,  
probably with a vote on lucene-dev.


Otis

- Original Message 
From: Chuck Williams <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Thursday, July 6, 2006 5:07:41 PM
Subject: Re: [jira] Commented: (LUCENE-565) Supporting  
deleteDocuments in IndexWriter (Code and Performance Results Provided)


robert engels wrote on 07/06/2006 12:24 PM:

I guess we just chose a much simpler way to do this...

Even with you code changes, to see the modification made using the
IndexWriter, it must be closed, and a new IndexReader opened.

So a far simpler way is to get the collection of updates first, then

using opened indexreader,
for each doc in collection
  delete document using "key"
endfor

open indexwriter
for each doc in collection
  add document
endfor

open indexreader


I don't see how your way is any faster. You must always flush to disk
and open the indexreader to see the changes.
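Spelled out against the 1.9/2.0-era API, the pattern being described looks roughly like this (my illustration, not code from the thread; the "key" field, the class, and the variable names are made up, and SimpleAnalyzer is just the analyzer from the earlier example):

import java.util.Iterator;
import java.util.List;

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class BatchUpdater {
    // updates: Documents that each carry a unique "key" field.
    static void applyUpdates(String indexPath, List updates) throws Exception {
        // pass 1: delete the old copy of each updated document by its key
        IndexReader reader = IndexReader.open(indexPath);
        for (Iterator it = updates.iterator(); it.hasNext();) {
            Document doc = (Document) it.next();
            reader.deleteDocuments(new Term("key", doc.get("key")));
        }
        reader.close();

        // pass 2: re-add the updated documents
        IndexWriter writer = new IndexWriter(indexPath, new SimpleAnalyzer(), false);
        for (Iterator it = updates.iterator(); it.hasNext();) {
            writer.addDocument((Document) it.next());
        }
        writer.close();

        // a new IndexReader/IndexSearcher must be opened to see the changes
    }
}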




Bulk updates however require yet another approach.  Sorry to change
topics here, but I'm wondering if there was a final decision on the
question of java 1.5 in the core.  If I submitted a bulk update
capability that required java 1.5, would it be eligible for  
inclusion in

the core or not?

Chuck














Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-08 Thread DM Smith


On Jul 8, 2006, at 12:41 PM, Doug Cutting wrote:



Since GCJ is effectively available on all platforms, we could say  
that we will start accepting 1.5 features when a GCJ release  
supports those features.  Does that seem reasonable?


I have been doing a bit of reading on GCJ compatibility. I think it  
is going to come in two parts:

1) It supports all the new language features of Java 1.5.
2) It has an implementation of all the new classes and methods that  
Lucene uses.

And, for me, there is a third: that it is released for Mac OS X.

With these three things, I'd be happy :)

DM Smith, stick in the mud :)




Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-08 Thread DM Smith


On Jul 8, 2006, at 12:56 PM, Chuck Williams wrote:



I prefer to contribute to Lucene, but my workload simply
does not allow time to be spent on backporting.


I'll stand by my offer to do the backporting when it is possible and  
does not do violence to the implementation.


I'd prefer to wait until the patch that is in Jira is ready to be  
applied. At that point post the request here and I'll see if it is  
doable.






Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-11 Thread DM Smith


On Jul 11, 2006, at 12:17 AM, Daniel John Debrunner wrote:


Doug Cutting wrote:

Since GCJ is effectively available on all platforms, we could say  
that
we will start accepting 1.5 features when a GCJ release supports  
those

features.  Does that seem reasonable?


Seems potentially a little strange to me. Does this mean Lucene  
would be

limited to the set of 1.5 features actually implemented by GCJ? So if
there is a 1.5 feature that is not supported by GCJ (while others are)
it cannot be used?

Seems more natural to support the complete 1.5 as defined by Sun/Java,
not the subset implemented by one open source compiler.



Eclipse has a built-in compiler called ecj, and it can compile Java  
1.6 code today. However, unless classes are provided at runtime for  
linking, one will get build errors.


The same is true with gcj. It still does not fully support the Java  
1.4 class libraries (almost there...), though it supports all the  
language features. However, on Fedora, Eclipse is built with ecj, and  
to me this demonstrates that it is close enough for most use cases.


Gcj will have support for the language features before it supports  
all the new classes.


In terms of Lucene, I believe that the most important classes that  
are wanted are the concurrency ones. (At least that is how I have  
read the posts here.)


I think the measure of readiness is not that it compiles today with  
gcj, but that the Java 1.5 classes and features that are likely to be  
used by Lucene are implemented and pass all the Lucene tests.







Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-11 Thread DM Smith


On Jul 11, 2006, at 3:51 AM, Doug Cutting wrote:


Andi Vajda wrote:
I'd be interested in doing this but what is it that we're after in  
'supporting gcj' actually ?


I think it would be sufficient to:

1. Compile only .jar and .class with gcj (not .java).
2. Pass all unit tests on a single platform.

This would provide an existence proof that Lucene can run under  
GCJ, and doesn't require solving GCJ's porting issues.




For me the platform of choice would be MacOS X, since 10.3 will never  
have Java 5. (IIRC, 10.4 has only been out for about a year.)

Most of the other platforms will.




Re: System Requirements

2006-12-09 Thread DM Smith


On Dec 9, 2006, at 9:27 AM, Grant Ingersoll wrote:


  Do we claim to support J2ME?



I don't think so. I think that J2ME is a subset of Java 1.3.  
Personally, I'd like to see support for it even if it were a limited  
subset of Lucene. Perhaps one that could search a Lucene index.





Re: [VOTE] release Lucene 2.1

2007-02-15 Thread DM Smith
I've been reading this thread and here is my take as I will be  
updating jpackage for a 2.1 release.


The content of a binary distribution differs widely as to what is  
included in it. It obviously needs to have one or more jar files that  
represent the product. Beyond that I have never really cared. Some  
include the source; others the javadoc; others readme, licenses,  
changes and the like; etc.


I don't think that one should ever expect to build from a binary  
package. Even if one could.


What is necessary for JPackage (or any other build system) is the  
pristine SOURCE package from which the jars, javadoc, readmes,  
licenses and the like can be generated or obtained.


For JPackage, I have needed to patch the source so that it can build  
(e.g. JPackage 1.6 and 1.7 are Java 1.4.2 distributions and the Java  
5 specific stuff needs to be excluded or modified).


So for me, the question of a release is whether I can package the  
source package for JPackage and whether I can use the binary package  
as is when I don't grab it from JPackage.


From this thread, I gather Lucene 2.1 is ready enough for me.


On Feb 14, 2007, at 12:20 PM, Yonik Seeley wrote:


Release artifacts for review are at
http://people.apache.org/~yonik/staging_area/lucene/
Please vote to officially release these packages as Lucene 2.1.

-Yonik








Re: Modularization (was: Re: New flexible query parser)

2009-03-21 Thread DM Smith


On Mar 21, 2009, at 7:23 AM, Grant Ingersoll wrote:



On Mar 21, 2009, at 11:26 AM, Michael McCandless wrote:

What if (maybe for 3.0, since we can mix in 1.5 sources at that
point?) we change how Lucene is bundled, such that core queries and
contrib/query/* are in one JAR (lucene-query-3.0.jar)?  And
lucene-analyzers-3.0.jar would include contrib/analyzers/* and
org/apache/lucene/analysis/*.  And lucene-queryparser.jar, etc.



Since we are just talking about packaging, why can't we have both/ 
all of the above?  Individual jars, as well as one "big" jar, that  
contains everything (or, everything that has only dependencies we  
can ship, or "everything" that we deem important for an OOTB  
experience).  I, for one, find it annoying to have to go get  
snowball, analyzers, spellchecking and highlighting separate in most  
cases b/c I almost always use all of them and don't particularly  
care if there are extra classes in a JAR, but can appreciate the  
need to do that in specific instances where leaner versions are  
needed.  After all, the Ant magic to do all of this is pretty  
trivial given we just need to combine the various jars into a single  
jar (while keeping the indiv. ones)


If there is a sense that some contribs aren't maintained or aren't  
as "good", then we need to ask ourselves whether they are:
1. stable and solid and don't need much care and are doing just fine  
thank you very much, or,

2. need to be archived, since they only serve as a distraction, or
3. in need of a new champion to maintain/promote them


From a user's perspective (i.e. mine):
I like the idea of having more jars. Specifically, I'd like a  
jar devoted solely to reading an index. Ultimately, I'd like  
it to work in a J2ME environment, but that is entirely a different  
thread.


There are parts that are needed for both reading and writing  
(directory, analyzers, tokens, and such). And there are parts dealing  
with writing.


There is a distinction between core and contrib regarding backward  
compatibility and quality (perhaps perceived quality).


To me the hardest part in wrapping my head around contrib is that I am  
not clear on why something is in contrib, what it can do, and whether it  
is just an example, an alternate way of doing something, or something  
useful exactly as provided.


There are parts of contrib that I see as essential to my application  
(pretty much Grant's list), that I can use as is. While there are many  
different applications of Lucene, my guess is that a non-trivial  
application of Lucene needs to use various contribs. Some contribs are  
high quality and I think deserve the kind of attention that core gets.


What I'd like to see is not more stuff moved into core from contrib,  
but rather that we have two levels of contrib: one recommended for use  
and maintained at the same level as core, the other stuff that is  
"use if you find it useful, and at your own risk". That is, as it is  
today.


I understand the desire to have one jar do it all. Nothing wrong with  
having that too, perhaps lucene-essentials.jar that holds all useful,  
recommended, highly maintained, well-explained stuff.


As to the whole question of the out-of-the-box experience for  
reviewers: today it is "what does lucene-core.jar do?" With more jars  
it would be "what does this core collection of jars do?" or "what does  
lucene-essentials do?"


-- DM Smith








Re: Is TopDocCollector's collect() implementation correct?

2009-03-26 Thread DM Smith


On Mar 26, 2009, at 6:55 AM, Michael McCandless wrote:


I think we need to deprecate HC, in favor of MRHC (or if we can think
of a better name... ResultCollector?).


I like your suggestion for the name.





Re: [jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-01 Thread DM Smith


On Apr 1, 2009, at 5:29 AM, Shai Erera (JIRA) wrote:



   [ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694443 
#action_12694443 ]


Shai Erera commented on LUCENE-1575:


I did not do any "svn copy", just used Eclipse refactoring to change  
the name of the class to Collector. I did not understand though from  
your comment if I should do it differently and post another patch,  
or is that a hint to how someone can still apply the patch?


Assuming you have Subclipse installed into Eclipse:

Subclipse will do an svn rename (aka move) when refactoring names. It  
does an add of the new name and a delete of the old, but retains  
history of the file. This is equal to svn copy followed by svn delete.


If you create a copy of a file inside of Eclipse, Subclipse will do an  
svn copy.


There is a Subclipse property, DeferFileDelete, that, when set to  
"true" on a folder, changes the delete behavior of all  
files below it. I set it on the root of my Eclipse projects, because I  
don't like Subclipse's delete behavior.


I don't understand how it could mess up patching.

I think what was suggested was to go to the file system and do an OS  
copy of the file. Or use Eclipse without Subclipse.


-- DM






Re: ArabicAnalyzer

2009-05-02 Thread DM Smith


On May 2, 2009, at 6:43 PM, Ahmed Al-Obaidy wrote:

I've written a simple (yet useful) ArabicAnalyzer, ArabicTokenizer  
and ArabicFilter. It can handle Arabic text very well.


I've tested it with a large set of Arabic documents and it worked OK  
both in terms of accuracy and performance.


The code is released under Apache 2.0 license. And I would be very  
happy if you include it with the code tree.


Sounds super. Do you know if it will handle Farsi as well?

-- DM Smith



Re: ArabicAnalyzer

2009-05-03 Thread DM Smith


On May 3, 2009, at 2:56 AM, Robert Muir wrote:


have you looked at the existing ar analyzer in contrib?
I like your analyzer but glancing at your code I think you can get  
the same behavior with the existing one (it also has stopwords &  
stemming but you can disable that). lemme know if i am missing  
something!


wrt farsi i wouldnt recommend using an arabic analyzer
for example on hamshari trec data:

simpleanalyzer: Average Precision:  0.374
arabicanalyzer: Average Precision:  0.316 <-- inappropriate  
stemming/stopwords
persianalyzer:   Average Precision:  0.481 <-- i can contrib  
this if someone needs it.


Please do contribute it. While I don't know Persian at all, the  
program I am working on is translated into Farsi and we have several  
indexed texts.






thanks,
robert

On Sun, May 3, 2009 at 2:09 AM, Ahmed Al-Obaidy wrote:

Well I don't know really... but it shouldn't be hard to support it.

--- On Sun, 5/3/09, DM Smith  wrote:

From: DM Smith 
Subject: Re: ArabicAnalyzer
To: java-dev@lucene.apache.org
Date: Sunday, May 3, 2009, 4:05 AM



On May 2, 2009, at 6:43 PM, Ahmed Al-Obaidy wrote:

I've written a simple (yet useful) ArabicAnalyzer,  
ArabicTokenizer and ArabicFilter. It can handle Arabic text very  
well.


I've tested it with a large set of Arabic documents and it worked OK  
both in terms of accuracy and performance.


The code is released under Apache 2.0 license. And I would be very  
happy if you include it with the code tree.


Sounds super. Do you know if it will handle Farsi as well?

-- DM Smith





--
Robert Muir
rcm...@gmail.com




Re: [jira] Commented: (LUCENE-1629) contrib intelligent Analyzer for Chinese

2009-05-07 Thread DM Smith
I'd prefer it to stay 1.4 for now and would be willing to make the  
change, if needed.


-- DM

On May 7, 2009, at 3:04 PM, Michael McCandless (JIRA) wrote:



   [ https://issues.apache.org/jira/browse/LUCENE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707042 
#action_12707042 ]


Michael McCandless commented on LUCENE-1629:


bq. There is lots of code depending on Java 1.5, I use enum,  
generalization frequently. Because I saw these points on apache wiki:


Well... "in general" contrib packages can be 1.5, but the analyzers  
contrib package is widely used, and is not 1.5 now, so it's a  
biggish change to force it to 1.5 with this.  We should at least  
separately discuss it on java-dev if we want to consider allowing 1.5  
code into contrib-analyzers.


We could hold off on committing this until 3.0?


contrib intelligent Analyzer for Chinese


   Key: LUCENE-1629
   URL: https://issues.apache.org/jira/browse/LUCENE-1629
   Project: Lucene - Java
Issue Type: Improvement
Components: contrib/analyzers
  Affects Versions: 2.4.1
   Environment: for java 1.5 or higher, lucene 2.4.1
  Reporter: Xiaoping Gao
   Attachments: analysis-data.zip, LUCENE-1629.patch


I wrote an Analyzer for Apache Lucene for analyzing sentences in  
the Chinese language. It's called "imdict-chinese-analyzer"; the  
project on Google Code is here: http://code.google.com/p/imdict-chinese-analyzer/
In Chinese, "我是中国人"(I am Chinese), should be tokenized as  
"我"(I)   "是"(am)   "中国人"(Chinese), not "我" "是中" "国 
人". So the analyzer must handle each sentence properly, or there  
will be mis-understandings everywhere in the index constructed by  
Lucene, and the accuracy of the search engine will be affected  
seriously!
Although there are two analyzer packages in the Apache repository which  
can handle Chinese, ChineseAnalyzer and CJKAnalyzer, they take each  
character or every two adjoining characters as a single word. This  
is obviously not true in reality, and this strategy will also increase  
the index size and hurt performance badly.
The algorithm of imdict-chinese-analyzer is based on the Hidden Markov  
Model (HMM), so it can tokenize Chinese sentences in a really  
intelligent way. Tokenization accuracy of this model is above 90%  
according to the paper "HHMM-based Chinese Lexical Analyzer  
ICTCLAL", while other analyzers' is about 60%.
As imdict-chinese-analyzer is really fast and intelligent, I want  
to contribute it to the Apache Lucene repository.


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.









Re: Lucene's default settings & back compatibility

2009-05-19 Thread DM Smith


On May 18, 2009, at 11:31 PM, Robert Muir wrote:

I am curious about this, do you think its a better default because  
it avoids the max boolean clauses problem? or because for a lot of  
these scoring doesn't make much sense anyway?


I ran tests on a pretty big index, you pay a price for the constant  
score/filter method. Its slower for the common case searches, it  
only starts to win for queries that return > 10% or so the index,  
but its significantly slower for narrow queries...


I'm just trying to imagine a case where queries that return > 10% or  
so of the index are actually the common/default...?


It is common in my application, a Bible program, which indexes each  
verse (think of a verse as a numbered sentence) as a separate  
document. We index everything, including words that are typically stop  
words, as those might be important to our end users. Besides this, the  
top 280 word roots represent 90% of the occurrences.


And on searches, we return everything in book order, unless the user  
wants to score the result. In that case, we return a small, user  
configurable amount of hits ordered by score.
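
(A minimal sketch of that kind of search, with hits sorted on a stored field
rather than by score; the "ordinal" field name, maxHits, and the existing
searcher and query objects are assumptions for illustration:)

    // Book order: sort on an indexed per-verse ordinal instead of relevance.
    Sort bookOrder = new Sort(new SortField("ordinal", SortField.INT));
    TopFieldDocs inBookOrder = searcher.search(query, null, maxHits, bookOrder);

    // If the user asks for ranked results, drop the Sort and take the top N:
    TopDocs ranked = searcher.search(query, null, smallUserConfiguredLimit);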


And we are using Lucene out of the box for the most part. We've  
deviated only to incrementally solve performance problems.







 * Constant score rewrite ought to be the default for most multi-term
   queries




--
Robert Muir
rcm...@gmail.com




Re: Lucene's default settings & back compatibility

2009-05-19 Thread DM Smith


On May 19, 2009, at 6:39 AM, Michael McCandless wrote:


On Mon, May 18, 2009 at 8:51 PM, Yonik Seeley
 wrote:

On Mon, May 18, 2009 at 5:06 PM, Michael McCandless
 wrote:

* StopFilter should enable position increments by default


Is this one an actual improvement in the general case?
A query of "foo bar" then wouldn't match a document with "foo and
bar", but a query of "foo the bar" would.


Well... I think I'd argue that this is an improvement, ie the query
"foo bar" should not in fact match a doc with "foo and bar" (unless
your PhraseQuery is using slop).  If you really want slop in your
matching, you should just use slop.

Query "foo the bar" will match document "foo and bar" in either case,
so it's non-differentiating here.

Also, it's bothersome that by default StopFilter throws away more
information than it needs to.  Ie, it's already discarding words
(that's its purpose) but the fact that it then also discards the holes
left behind, by default, is not good, I think.

I went and re-read http://issues.apache.org/jira/browse/LUCENE-1095.
Since both QueryParser and StopFilter can now preserve position
increments, I'd think we would want to change both to do so (in the
*Settings classes)?

(And, QueryParser is another great example where a *Settings class
would give us much more freedom to fix its quirks w/o breaking back
compat.)

Anyway, this is a great debate, in that any defaults set in Lucene
over time should be scrutinized, through discussions like this, rather
than simply always forcefully left on their back-compat defaults.  The
Settings class would give us this freedom.


I really like the idea of a settings class. Another benefit,  
*especially if it is documented well*, is that users would be led to  
the tuning parameters.


In this settings class, would there be setters/getters so that one  
could take particular defaults and tweak them? E.g. I like one default  
from 2.4 but will take everything else from 3.0. Therefore, I use the  
3.0 defaults, but change one of the settings to match 2.4, as in:


LuceneSettings myDefaults = LuceneSettings.defaults3_0();
myDefaults.setXXX(LuceneSettings.defaults2_4().getXXX());
LuceneSettings.useDefaults(myDefaults);


-- DM





Re: Lucene's default settings & back compatibility

2009-05-19 Thread DM Smith


On May 19, 2009, at 7:45 AM, Michael McCandless wrote:

On Tue, May 19, 2009 at 6:47 AM, DM Smith   
wrote:


It is common in my application, a Bible program, that indexes each  
verse
(think of a verse as a numbered sentence) as a separate document.  
We index
everything, including words that are typically stop words as those  
might be
important to our end users. Besides this, the top 280 word roots  
represent

90% of the occurrences.
And on searches, we return everything in book order, unless the  
user wants
to score the result. In that case, we return a small, user  
configurable

amount of hits ordered by score.


The ability to turn off scoring when sorting by field, new in 2.9,
should be a good performance boost for your use case (if performance
is important).

And we are using Lucene out of the box for the most part. We've  
deviated

only to incrementally solve performance problems.


Right, my impression is most people will stick w/ Lucene's defaults,
incrementally changing only limited settings they come across, which
is why selecting good defaults is vital to Lucene's growth/adoption
(new users especially simply start w/ our defaults).

But we can't pick good defaults when we're so heavily bound by back- 
compat.


Which is why I find the Settings approach so appealing :)  Suddenly,
on all improvements to Lucene, we have the freedom to change our
defaults so a new user sees all such improvements.


From my perspective as a user:
Backward compatibility is important, but it is not a be-all and end-all.

To me, if I can drop in the new jar and get bug fixes that's great. My  
expectation is that searches against an existing index will still  
return the same or, in the case of bug fixes, better results.


What I need to know is when that is not the case. Today, we use a  
naming convention of the Lucene jars to indicate whether that is true.  
I'd be just as happy if there were a compatibility level that I could  
check (I'm having to do that in our code as I change our analyzers  
frequently enough to be embarrassed).


The problem, which might be addressed in the "fixing" of core vs  
contrib, is that we use lots of contrib (analyzers, snowball,  
highlighting) and want it to maintain backward compatibility too. (I'm  
happy that has been the case!) So, perhaps a compatibility level per  
contribution.


The packagers for jpackage consider nearly every release of Lucene to  
break backward compatibility, because they treat Lucene as a whole.  
Perhaps that is the same with other Linux distributions. But because  
backward compatibility does not apply to contrib in a strict fashion,  
one cannot reliably use Lucene from distributions unless such a policy  
is the case.


In any case, I don't think anyone should just drop in a new jar  
without some testing. At a minimum, they should compile with  
deprecations turned on.
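
(For example, with the plain JDK compiler that is just javac -deprecation, or  
deprecation="on" on Ant's <javac> task; an IDE will surface the same warnings.)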


Regarding deprecations, I'd also be just as happy if a method was marked
	@deprecated This behavior has changed with this release, 2.4.3.

That is, as a warning of changed behavior.

And then on the 3.0 release the warning could be removed.

But then again, my use of Lucene, while very important to my  
application, is very simple and easy to change.


-- DM







Re: Lucene's default settings & back compatibility

2009-05-20 Thread DM Smith
I like the idea of settings, however it is implemented. With the  
blurring of core and contrib in the repackaging of Lucene, the issue  
of backward compatibility becomes more difficult. (Maybe I'm  
imagining problems where they don't exist.)


My concern with any of these mechanisms is codifying past behavior. What  
would be the expectation and policy regarding keeping such  
settings? Do these now become deprecated? Do we keep the 2.1 settings  
when we are releasing 2.4?


There is an even simpler solution, using existing policy: Frequent  
releases.

Would this be a big issue if we had frequent releases?

To go from 2.0 to 3.0 there is a 2.9 release, where the difference  
between the 2.9 release and the 3.0 release is the removal of  
deprecations. (Though with this release, it will be a bit bigger as it  
will also require Java 5.)


Every time we approach a release, there is a flurry of activity and  
the release gets pushed, for all practical purposes, indefinitely.


Pushed to absurdity: only have x.0 (perhaps x.0.1) and x.9 releases.  
That is, don't have an x.1 minor release. And have releases once a week,  
so that two times a month we have a major release. So twice a month we  
can break API compatibility and once a month we can break index  
compatibility.


The stability of the API over time is important to users. Having  
infrequent releases with a great product is a plus. (I'm really glad  
as I'm still stuck using Java 1.4!) Having the bridge via deprecation  
to newness is a great transitional help.


IMHO, the real challenge is to manage the release process. Managing  
that will help manage backward compatibility.


If you were to look at the schedules for Fedora, Eclipse,  
OpenOffice, ..., you'd find that each has a release plan with distinct  
stages. At each stage there is a release (testing/alpha/beta/RC1/ 
RC2/...). As the release process is entered, generally a release  
branch is created. New development continues on trunk and something of  
perceived value may be ported to the branch. At some point there is a  
feature freeze and only bug fixes are accepted on the release branch.  
Having a branch with parallel development is a very strong  
encouragement to have a quick release, as maintaining the branch is a pain.


-- DM

On May 20, 2009, at 7:22 AM, Michael McCandless wrote:


On Tue, May 19, 2009 at 4:50 PM, Yonik Seeley
 wrote:

Right, that's exactly why I want to fix it (only one behavior  
allowed

and so for all of 2.* we must match the 2.0 behavior).


I meant one jar per per-jvm gives you one behavior (as is the case  
now).
But by setting a static actsAs version number, you could get a 2.*  
jar

to behave as if it were 2.0, even as behaviors evolve.


So I think you're suggesting something like this: when you use Lucene,
if you want "latest and greatest" defaults, do nothing.

If instead you want defaults to match a particular past minor release,
you must call (say) LuceneVersions.setVersion(VERSION_21).

Any place inside Lucene that has defaults that need to vary by version
would then check this, and act accordingly.
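
(A rough sketch of what such a static "actsAs" switch might look like; every
name below is hypothetical, no such class exists in Lucene:)

    // Hypothetical sketch only.
    public final class LuceneVersions {
      public static final int VERSION_20 = 200;
      public static final int VERSION_21 = 210;
      public static final int VERSION_29 = 290;

      // Default is "latest and greatest" for new users.
      private static int actsAs = VERSION_29;

      private LuceneVersions() {}

      public static void setVersion(int version) { actsAs = version; }
      public static int getVersion() { return actsAs; }
    }

    // An app needing strict back-compat calls, once at startup:
    //   LuceneVersions.setVersion(LuceneVersions.VERSION_21);
    // and defaults inside Lucene branch on LuceneVersions.getVersion().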

I absolutely love the simplicity of this solution (far simpler than
*Settings classes).  It would achieve what I'm aiming for, which is to
always be free on every minor release to set the defaults for new
users to the latest & greatest.

But:

 1) It means any usage of Lucene inside the JRE must share that same
version default

 2) It's a change to our back-compat policy, in that it requires the
app to declare what version compatibility it requires.

On #1, maybe this is in fact just fine, since as you pointed out
that's de-facto what we have today; it's just that the "actsAs" is
hardwired to 2.0 for all 2.x releases.

On #2, I think shifting the burden onto those apps that do in fact
need strict back-compat on upgrading, to have to set the actsAs is a
good change to our policy.  After all, we think such users are the
minority and putting the burden on new users of Lucene seems
unreasonable.

So net/net I'm +1!

Mike








Re: Lucene's default settings & back compatibility

2009-05-21 Thread DM Smith


On May 21, 2009, at 7:17 AM, Michael McCandless wrote:


 1) Default settings can change; we will always choose defaults based
on "latest & greatest for new users".  This only affects "runtime
behavior".  EG in 2.9, when sorting by field you won't get scores
by default.  When we do this we should clearly document the
change, and what settings one could use to get back to the old
behavior, in CHANGES.txt.


I'd reverse 1 and 2 and note in 1 that the old behavior might be  
deprecated.




 2) An API, once released as deprecated, is fair game to be removed
in the next minor release.


I presume you mean that it will be present for at least one full minor  
release. So, if a deprecation is introduced at 3.1.5, then it won't be  
removed until 3.3 at the earliest, because 3.2 would be the first minor  
release in which it was present from the start. I don't think it is fair to  
expect users to get every last point release.


If so +1 from a user.



We still only make bug fixes on point releases, support the index file
format until the next major release -- those don't change.


Is it just the index file format? I would hope that the behavior of  
filters, analyzers and such would not change so as to invalidate an  
index.


-- DM





Re: SegmentReader instantiation

2009-05-21 Thread DM Smith

Michael McCandless wrote:

On Thu, May 21, 2009 at 10:53 AM, Earwin Burrfoot  wrote:

  

I agree we should probably remove it, unless there are users relying
on this.  Maintaining side-by-side sources is difficult with time.
  

As I said in the initial message, this feature introduces no runtime
behaviour changes, so you can't really 'rely' on it and break if it's
removed.



Well maybe someone loves the performance improvement... and took
it further by making their own native code extensions.  I'm not
sure how much these gains are.  But people can get quite crazy when
it comes to performance :)

  

Can you send an email to java-user to take a quick survey on whether
anyone is somehow needing this?
  

Never subscribed there. Too low signal-to-noise ratio. I can, but ..
is it a must? :)



In fact I find many good ideas for improving Lucene come from our
users, and one can't really understand what's important in Lucene
without being grounded on how it's used.  "Development" and "using" go
hand in hand.

The discussions that take place there spawn still more ideas, and
following those dicussions causes me to think harder about the areas
being discussed, so I learn more myself about Lucene and find
more things to improve and ponder.

Not to mention when there's a sneaky bug, it usually appears on the
users list first.  I jump at those ;)

So, yeah, I think it is a must.  It's likely nobody will respond after
a few days, then we should remove gcj.

I'll go ask if anyone is relying on gcj native code on java-user.


Fedora uses Lucene for Eclipse and uses gcj for Eclipse. It might be 
used elsewhere. I don't know if that means they need the gcj stuff in 
Lucene. I just wish they'd rework it to use OpenJDK.


-- DM




Re: Lucene's default settings & back compatibility

2009-05-21 Thread DM Smith

Michael McCandless wrote:

On Thu, May 21, 2009 at 8:24 AM, DM Smith  wrote:
  

On May 21, 2009, at 7:17 AM, Michael McCandless wrote:



 1) Default settings can change; we will always choose defaults based
   on "latest & greatest for new users".  This only affects "runtime
   behavior".  EG in 2.9, when sorting by field you won't get scores
   by default.  When we do this we should clearly document the
   change, and what settings one could use to get back to the old
   behavior, in CHANGES.txt.
  

I'd reverse 1 and 2 and note in 1 that the old behavior might be deprecated.



OK.

  

 2) An API, once released as deprecated, is fair game to be removed
   in the next minor release.
  

I presume you mean that it will be present for at least one full minor
release. So, if at 3.1.5 a deprecation is introduced, then it won't be
removed until 3.3 at the earliest, because 3.2 was the first minor release
in which it appeared at the start. I don't think it is fair to expect users
to get every last point release.



Right.

  

We still only make bug fixes on point releases, support the index file
format until the next major release -- those don't change.
  

Is it just the index file format? I would hope that the behavior of filters,
analyzers and such would not change so as to invalidate an index.



Can you give an example of such changes?  EG if we fix a bug in
StandardAnalyzer, we will default it to fixed for new users and expect
you on upgrading to read CHANGES.txt and change your app to set that
setting to its non-defaulted value.
  
I guess I'm not too concerned with bug fixes. I'm kind of a nut when it 
comes to correctness. But I'd want to know when such a bug fix breaks strict 
backward compatibility. I guess I don't want backward compatibility to 
get too much in the way of fixing bugs. (I think sometimes it has.) I 
wouldn't expect a compatibility flag to preserve buggy behavior. I guess 
I'm willing to go to extra effort to work with bug fixes. But I wouldn't 
expect others to feel the same way.


Off the top of my head, in addition to Robert's stop word list, let's 
say that the filter that strips accents (I can't remember the name) is 
changed to be more than Latin-1 to ASCII folding. That would invalidate 
existing indexes.


Or a new and improved filter is created to replace a class I use and the 
old class is deprecated. If that old class goes away, my index is 
invalidated.
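
(If the accent filter in question is the ISOLatin1AccentFilter /
ASCIIFoldingFilter pair -- that identification is my assumption -- the kind of
swap being described looks roughly like this, given a java.io.Reader named
reader:)

    // Old chain: folds only ISO Latin-1 accented characters.
    TokenStream oldChain =
        new ISOLatin1AccentFilter(new LowerCaseFilter(new StandardTokenizer(reader)));

    // Newer chain: ASCIIFoldingFilter folds a much wider range of characters.
    TokenStream newChain =
        new ASCIIFoldingFilter(new LowerCaseFilter(new StandardTokenizer(reader)));

    // Terms indexed with one chain and queried with the other may no longer
    // match, so in practice the index has to be rebuilt.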


So if the stream of tokens out of an analyzer changes or the results of 
a filter are different, an index built with them is invalidated. If the 
output remains the same, I shouldn't care what has changed internally 
and probably don't care if the API has changed.


I don't know if it matters to this discussion, but there's a lot in 
contrib that people (of which I am one :) expect to be stable. I'm 
looking forward to the repackaging effort.


-- DM






Re: Lucene's default settings & back compatibility

2009-05-21 Thread DM Smith

Michael McCandless wrote:

On Thu, May 21, 2009 at 12:19 PM, Robert Muir  wrote:
  

even as simple as changing default stopword list for some analyzer could be
an issue, if the user doesn't re-index in response to that change.



OK, right.

So say we forgot to include "the" in the default English stopwords
list (yes, an extreme example...).
  
"The" would be a bug fix. I think most users would expect that to be 
fixed. They might be willing, as I would be, to require all their 
indexes using that stopword list to be rebuilt.


How about a change that would be a bit more controversial, to which some 
would agree and others would not.


I wonder how many people are creating metadata about indexes so that 
they can track when an index could/should/must be rebuilt? Some kind of 
"versioned tool chain info" for the index. If analyzers and filters can 
change output then it needs to be tracked.


-- DM





Re: Lucene's default settings & back compatibility

2009-05-22 Thread DM Smith

Yonik Seeley wrote:

On Fri, May 22, 2009 at 1:22 PM, Michael McCandless
 wrote:
  

(That said, unrelated to this discussion, I would actually like to
record per-segment which version of Lucene wrote the segment; this
would be very helpful when debugging issues like LUCENE-1474 where I
need to know if the segments were written by 2.4.0 or 2.4.1).



That's a great idea, if for debugging only, and it shouldn't be
limited  to just the version that wrote the segment.  I could see a
debug section or file that could even contain more info if the right
flags are set.


I would like to see this, too. In addition, I'd like to store what was 
used to create the index, that is the ordered chain of analyzers and 
filters on a per field basis.


But whether it is baked into the index or a separate file, or not part 
of Lucene, I'm in the process of figuring out how/where to add it to my 
code.
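
(One possible shape for that, outside of Lucene proper, is a small manifest
written next to the index. Everything here -- the file name, the property
keys, the chain description, the indexDir File -- is made up for illustration:)

    // Hypothetical sketch: record the per-field tool chain beside the index.
    Properties manifest = new Properties();
    manifest.setProperty("lucene.version", "2.4.1");
    manifest.setProperty("field.text.chain",
        "StandardTokenizer > LowerCaseFilter > StopFilter(english)");
    OutputStream out = new FileOutputStream(new File(indexDir, "toolchain.properties"));
    try {
      manifest.store(out, "index tool chain manifest");
    } finally {
      out.close();
    }
    // At open time, read it back and rebuild (or refuse to open) if the chain
    // the application would use today no longer matches.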


-- DM




Re: Lucene's default settings & back compatibility

2009-05-22 Thread DM Smith

Michael McCandless wrote:

On Fri, May 22, 2009 at 12:52 PM, Marvin Humphrey
 wrote:
  

when working on 3.1 if we make some great improvement, I'd like new users in
3.1 to see the improvement by default.
  

Sounds like an argument for more frequent major releases.



Yeah.  Or "rebranding" what we now call minor as major releases, by
changing our policy ;) Or "rebranding" to Lucene 2009.

But: localized improvements (like the sizable performance gain from
turning off scoring when sorting by field) should not have to wait for
a major release to benefit new users.  I think they should be on by
default on the next release.


This proposed policy change, allowing API backward compatibility to be 
broken within a major release, is nothing more than smoke and 
mirrors. But I see two side effects:
1) Debian, Fedora, and perhaps other Linux distributions, see minor 
releases as maintaining backward compatibility. With Debian, they bump 
their major revision number with each break in backward compatibility. I 
didn't check, but my guess is that the version name of Lucene in Debian 
corresponds with that of Lucene itself. I'd hate for that to change. How 
would you like to see Debian to name it Lucene 4 or Lucene 5, when we 
are doing Lucene 3.x. It gets confusing. (Real example:  libsword7, 
which corresponds to the 1.5.11 release of SWORD and libsword8 
corresponds to 1.6.0.)


2) Backward compatibility of the index is at least 2 major revisions and 
that is not proposed to change. Now with this, we effectively postpone 
it indefinitely. Rather than the index being allowed to change when the 
API has broken compatibility at most 2 times, with this proposed change, 
we can break API compatibility a dozen times. At the future point where 
this policy is brought into question, with something like "Now that we 
can break backward compatibility in the API frequently, we need to 
change our policy for the index to match", then we will have come full 
circle.


At first, I liked the idea a lot, but now less so. Now I'm leaning 
toward changing the major revision number when backward compatibility 
changes, and toward more frequent major releases if that is what it takes.


This was the thrust of my tongue-in-cheek proposal of weekly minor and 
monthly major releases.


I also share Marvin's and others' concerns about sneaky bugs introduced 
by globals. In my situation, Lucene is part of a desktop application and 
the user can create hundreds of indexes and use them within the 
application. With a *.deb or *.rpm, we'll have to specify that they 
cannot use anything but the minor release for which the application was 
designed. Before, we could say that one could drop in anything of the 
same major release number.


I don't think I am alone or unique in embedding Lucene into a desktop 
application. I know it is a part of Eclipse (at least on Fedora).


This change might have the opposite effect of making people's perception 
of Lucene one of instability. Guard carefully against that, please!


-- DM






Re: Lucene's default settings & back compatibility

2009-05-22 Thread DM Smith

Marvin Humphrey wrote:

I feel the opposite: I'd like new users to see improvements by
default, and users that require strict back-compate to ask for that.



By "strict back-compat", do you mean "people who would like their search app to
not fail silently"? ;)  A "new user" who follows your advice...

   // haha stupid noob 
   StandardAnalyzer analyzer = new StandardAnalyzer(Versons.LATEST);


... is going to get screwed when the default tokenization behavior changes.
And it would be much worse if we follow my preference for making the arg
optional without following my preference for keeping defaults intact:

   // haha eat it luser 
   StandardAnalyzer analyzer = new StandardAnalyzer();


It's either make the arg mandatory when changing default behavior and
recommend that new users pass a fixed argument, or make it optional but keep
defaults intact between major releases.

I think I see your point: A new user is such only for the first release 
in which they use Lucene. For a first use, there is no backward 
compatibility problem. On a subsequent release, their code 
still gets the latest and greatest, and now, by the choice they were 
guided to make, they may have broken backward compatibility.


So for any user, the only safe, and thus acceptable, use is to never use 
Versions.LATEST, but only a specific version.
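
(In code, the "pin a specific version" style being settled on here looks
roughly like the following, assuming the Version-taking constructors that were
being proposed; the field name is invented:)

    // Always pass an explicit version; never a LATEST/CURRENT style constant.
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_24);
    QueryParser parser = new QueryParser(Version.LUCENE_24, "text", analyzer);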


-- DM





Re: Lucene's default settings & back compatibility

2009-05-22 Thread DM Smith

Michael McCandless wrote:

On Fri, May 22, 2009 at 2:27 PM, DM Smith  wrote:
  

Marvin Humphrey wrote:


I feel the opposite: I'd like new users to see improvements by
default, and users that require strict back-compate to ask for that.



By "strict back-compat", do you mean "people who would like their search
app to
not fail silently"? ;)  A "new user" who follows your advice...

  // haha stupid noob   StandardAnalyzer analyzer = new
StandardAnalyzer(Versons.LATEST);

... is going to get screwed when the default tokenization behavior
changes.
And it would be much worse if we follow my preference for making the arg
optional without following my preference for keeping defaults intact:

  // haha eat it luser   StandardAnalyzer analyzer = new
StandardAnalyzer();

It's either make the arg mandatory when changing default behavior and
recommend that new users pass a fixed argument, or make it optional but
keep
defaults intact between major releases.
  

I think I see your point: A new user is such only for the first release that
they use Lucene. For a first use, there is no backward compatibility
problem. On the use of a subsequent release, their code still gets the
latest and greatest and now by the choice they were guided to make, they may
have broken backward compatibility.

So for any user, the only safe, thus acceptable use is to never have
Versions.LATEST, but only a specific version.



Right, we would have to not provide Versions.LATEST, ie if you want
latest, you'd pick Versions.LUCENE_29 (in 2.9).


Why go to all this trouble for a new user?

Let's pretend that there are 1,000 new users every release. After 12 
releases, there are still only 1000 new users but now 11000 old users.


How does it help an old user?

Those 11000 old users now have to update their code to 
Versions.Lucene_301 (or whatever the latest is) to get the latest 
changes, but they are also going to have to understand what that means 
and figure out what parts of their application now behave in a broken 
manner. Where are they to go to find out that info? CHANGES.txt?


When I was a new user, I had to look at example code, read FAQs, the wiki, 
javadoc, java-users... It was a learning curve, fortunately not a steep one.


Don't those resources need to be maintained so as to match the 
best/recommended practices? Can't that be the place where new users are 
informed?


-- DM




Re: Lucene's default settings & back compatibility

2009-05-22 Thread DM Smith

Michael McCandless wrote:

Well... I would expect & hope Lucene's adoption is growing with time,
so the number of new users should increase on each release.  For a
healthy project that's relatively young compared to its potential user
base, that growth should be exponential.

And, I'd expect the vast majority of old users don't ever upgrade.

Furthermore, when a reviewer (typically a "new user") tests Lucene
against other search engines, and fails to check our Wiki for all the
things we all know you have to do to get good search or indexing
performance, and then reports in a well-read blog somewhere that
Lucene's performance isn't great when compared to other search
engines, and lots of other people read that, cite it, share it, etc.,
those people are less inclined to try Lucene.  This then stunts
Lucene's growth.
  
I would think a reviewer would have to read something other than just 
javadocs to figure out how to set up Lucene. While the javadocs are 
good, and getting better, I did not find them helpful at first. The 
class-at-a-time approach to documentation is too fragmented for me. So, 
what is it that they use that leads to such unfavorable results?



Yes, we all sit here and say "well that's not a fair review because
you didn't properly tune Lucene", yet, this kind of thing happens all
the time.  If Lucene had better defaults out of the box it'd reduce
how often that happens.

Mike

On Fri, May 22, 2009 at 2:49 PM, DM Smith  wrote:
  

Michael McCandless wrote:


On Fri, May 22, 2009 at 2:27 PM, DM Smith  wrote:

  

Marvin Humphrey wrote:



I feel the opposite: I'd like new users to see improvements by
default, and users that require strict back-compate to ask for that.




By "strict back-compat", do you mean "people who would like their search
app to
not fail silently"? ;)  A "new user" who follows your advice...

 // haha stupid noob   StandardAnalyzer analyzer = new
StandardAnalyzer(Versons.LATEST);

... is going to get screwed when the default tokenization behavior
changes.
And it would be much worse if we follow my preference for making the arg
optional without following my preference for keeping defaults intact:

 // haha eat it luser   StandardAnalyzer analyzer = new
StandardAnalyzer();

It's either make the arg mandatory when changing default behavior and
recommend that new users pass a fixed argument, or make it optional but
keep
defaults intact between major releases.

  

I think I see your point: A new user is such only for the first release
that
they use Lucene. For a first use, there is no backward compatibility
problem. On the use of a subsequent release, their code still gets the
latest and greatest and now by the choice they were guided to make, they
may
have broken backward compatibility.

So for any user, the only safe, thus acceptable use is to never have
Versions.LATEST, but only a specific version.



Right, we would have to not provide Versions.LATEST, ie if you want
latest, you'd pick Versions.LUCENE_29 (in 2.9).
  

Why go to all this trouble for a new user?

Let's pretend that there are 1,000 new users every release. After 12
releases, there are still only 1000 new users but now 11000 old users.

How does it help an old user?

Those 11000 old users now have to update their code to Versions.Lucene_301
(or whatever the latest is) to get the latest changes, but they are also
going to have to understand what that means and figure out what parts of
their application now behave in a broken manner. Where are they to go to
find out that info? CHANGES.txt?

When I was a new user, I had to look at example code, read faqs, wiki,
javadoc, java-users  It was a learning curve, fortunately not steep.

Don't those resources need to be maintained so as to match the
best/recommended practices? Can't that be the place where new users are
informed?

-- DM







  






Re: Lucene's default settings & back compatibility

2009-05-22 Thread DM Smith

Earwin Burrfoot wrote:



 4. [Maybe?] Allow certain limited changes that will require source
code changes in your app on upgrading to a new minor release:
adding a new method to an interface, adding a new abstract method
to an abstract class, renaming of deprecated methods.


Yahoo! The right to rename deprecated things makes the need to
deprecate VS simply remove bearable.


I've also noticed the ugly name problem. I would be in favor of a 
cleanup of ugly names.


Using the existing policy mechanism, one could (I haven't thought this 
through):


In 3.0, remove the deprecations.

Do a 3.9 release with:
a) add methods and classes with the good names. These should be an exact 
copy of the ugly named code.

b) deprecate the ugly names.
c) no other changes.

Release 4.0 with deprecations removed.

These three releases could happen simultaneously.

(Of course, if we want to do this, we could have a policy that we have a 
2.9.0 and a 2.9.1 (rather than 3.9) followed by a 3.0 with good names.)
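
(In code, steps (a) and (b) above amount to something like this; the class
names are invented:)

    /** The good name: an exact copy of the implementation behind the old name. */
    public class WellNamedFilterFactory {
      // ... unchanged implementation ...
    }

    /** The ugly name, kept only as a bridge.
     *  @deprecated use {@link WellNamedFilterFactory}; removed in the next major release. */
    public class WlNmdFltrFactry extends WellNamedFilterFactory {
    }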


Now we are back to good names. And drifting can start all over again.

-- DM



