[jira] Commented: (LUCENE-2399) Add support for ICU's Normalizer2

2010-04-17 Thread Uwe Schindler (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858104#action_12858104 ]

Uwe Schindler commented on LUCENE-2399:
---

Hurrah! You used the StringBuilder as a buffer to avoid creating a new String instance each time; only the buffer needs to be copied. This could also be a good trick for the PatternReplaceFilter from Solr.
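
For reference, the trick looks roughly like this (a minimal sketch against the Lucene 3.x TokenFilter/CharTermAttribute APIs; the filter name and the no-op transform step are made up):

{code:java}
import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class BufferReusingFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final StringBuilder buffer = new StringBuilder(); // reused for every token

  public BufferReusingFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    buffer.setLength(0);               // clear the buffer without reallocating it
    buffer.append(termAtt);            // CharTermAttribute is a CharSequence
    // ... transform the buffer in place here ...
    termAtt.setEmpty().append(buffer); // copy back; no intermediate String is created
    return true;
  }
}
{code}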

bq. i made this filter final, to avoid a ticket from the policeman. 

How did you get the filter through the assert statement without final? 
Strange...

 Add support for ICU's Normalizer2
 -

 Key: LUCENE-2399
 URL: https://issues.apache.org/jira/browse/LUCENE-2399
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/*
Affects Versions: 3.1
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 3.1

 Attachments: LUCENE-2399.patch, LUCENE-2399.patch


 While there are separate Case Folding, Normalization, and Ignorable-removal 
 filters in LUCENE-1488,
 the new ICU Normalizer2 API does this all at once with nfkc_cf (based on the 
 new NFKC_Casefold property in Unicode).
 This is great, because it provides a ton of Unicode functionality that is really needed.
 And the new Normalizer2 API takes CharSequence and writes to Appendable...
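
For illustration, basic usage of the new API could look like this (a sketch assuming ICU4J 4.4; the sample input and output are mine):

{code:java}
import com.ibm.icu.text.Normalizer2;

public class NfkcCfDemo {
  public static void main(String[] args) {
    // obtain the combined NFKC + case folding normalizer (nfkc_cf, ICU 4.4)
    Normalizer2 n2 = Normalizer2.getInstance(null, "nfkc_cf", Normalizer2.Mode.COMPOSE);
    StringBuilder out = new StringBuilder();  // any StringBuilder/Appendable-style sink works
    n2.normalize("Grüße", out);               // case folding + normalization in one pass
    System.out.println(out);                  // prints "grüsse"
  }
}
{code}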

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2399) Add support for ICU's Normalizer2

2010-04-17 Thread Uwe Schindler (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858108#action_12858108 ]

Uwe Schindler commented on LUCENE-2399:
---

I know, you were running the test without assertions enabled from Eclipse! :-)

{noformat}
[junit] TokenStream implementation classes or at least their incrementToken() implementation must be final
[junit] junit.framework.AssertionFailedError: TokenStream implementation classes or at least their incrementToken() implementation must be final
[junit] at org.apache.lucene.analysis.TokenStream.assertFinal(TokenStream.java:117)
{noformat}

So for me the assertion worked. The *second* patch of course works with icu-4_4.jar! Great, and I am happy about the cool interfaces on CharTermAttribute.

I just wanted to check that my deputy sheriff did not miss something because of wrong instructions.

 Add support for ICU's Normalizer2
 -

 Key: LUCENE-2399
 URL: https://issues.apache.org/jira/browse/LUCENE-2399
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/*
Affects Versions: 3.1
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 3.1

 Attachments: LUCENE-2399.patch, LUCENE-2399.patch






RE: official GIT repository / switch to GIT?

2010-04-17 Thread Uwe Schindler
Hi,

In my opinion: Definitely NOT!

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Thomas Koch [mailto:tho...@koch.ro]
 Sent: Saturday, April 17, 2010 9:21 AM
 To: solr-dev; java-dev@lucene.apache.org
 Subject: official GIT repository / switch to GIT?
 
 Hi,
 
 at least since August 2009 nobody has dared to ask this question, so let's start a flamewar:
 Don't you think it's time for Lucene and Solr to switch to Git?
 
 And now seriously:
 I did the last packaging of Solr 1.4 for Debian and I intend to continue doing so. Since I'm doing the packaging in Git, I'm asking myself whether I should base the packaging Git repository on the Solr repo found at git.apache.org. However, if the one from git.a.o is not stable and may crash at any given time, this would not be a good idea. And the best thing for packagers like me would of course be if the Git repo were the official one.
 
 And I wonder whether there are really people using SVN and downloading dozens of patch files from JIRA. Isn't it the case that everybody already uses git-svn?
 
 Best regards,
 
 Thomas Koch, http://www.koch.ro
 






[jira] Created: (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters

2010-04-15 Thread Uwe Schindler (JIRA)
Add a scoring DistanceQuery that does not need caches and separate filters
--

 Key: LUCENE-2395
 URL: https://issues.apache.org/jira/browse/LUCENE-2395
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/spatial
Reporter: Uwe Schindler
 Fix For: 3.1


In a chat with Chris Male, and from my own ideas when implementing for PANGAEA, I thought about the broken distance query in contrib. It lacks the following features:
- It needs a query for the enclosing bbox (which is constant score)
- It needs a separate filter for filtering out distances
- It has no scoring, so if somebody wants to sort by distance, he needs to use the custom sort. For that to work, spatial caches the distance calculation (which is broken for multi-segment search)

The idea is now to combine all three things into one query, but customizable:

We first thought about extending CustomScoreQuery and calculating the distance from FieldCache in the customScore method, returning a score of 1 for distance=0, a score of 0 at the max distance, and a score<0 for farther hits that are in the bounding box but not in the distance circle. To filter out such negative scores, we would need to override the scorer in CustomScoreQuery, which is private.

My proposal is now to use a very stripped down CustomScoreQuery (but not extend it) that calls a method getDistance(docId) in its scorer's advance() and nextDoc() that calculates the distance for the current doc. It stores this distance also in the scorer. If the distance > maxDistance it throws away the hit and calls nextDoc() again. The score() method will by default return weight.value*(maxDistance - distance)/maxDistance and uses the precalculated distance. So the distance is only calculated one time in nextDoc()/advance().

To be able to plug in custom scoring, the following methods in the query can be overridden:
- float getDistanceScore(double distance) - returns by default (maxDistance - distance)/maxDistance; allows score customization
- DocIdSet getBoundingBoxDocIdSet(Reader, LatLng sw, LatLng ne) - returns a DocIdSet for the bounding box. By default it returns e.g. the DocIdSet of an NRF or a cartesian tier filter. You can even plug in any other DocIdSet, e.g. wrap a Query with QueryWrapperFilter
- support a setter for the GeoDistanceCalculator that is used by the scorer to get the distance.

This query is almost finished in my head, it just needs coding :-)
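
A rough sketch of the proposed scorer loop (getDistance() stands in for the pluggable distance calculation described above; the class around it is hypothetical):

{code:java}
import java.io.IOException;

import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.Similarity;

abstract class DistanceScorer extends Scorer {
  private final DocIdSetIterator bboxDocs; // iterator of getBoundingBoxDocIdSet(...)
  private final float weightValue;         // weight.value
  private final double maxDistance;
  private double distance;                 // calculated exactly once per hit
  private int doc = -1;

  DistanceScorer(Similarity sim, DocIdSetIterator bboxDocs,
      float weightValue, double maxDistance) {
    super(sim);
    this.bboxDocs = bboxDocs;
    this.weightValue = weightValue;
    this.maxDistance = maxDistance;
  }

  /** the pluggable part: distance of the given doc from the query center */
  protected abstract double getDistance(int docId);

  @Override
  public int docID() { return doc; }

  @Override
  public int nextDoc() throws IOException {
    do {
      doc = bboxDocs.nextDoc();         // candidates come from the bbox DocIdSet
      if (doc == NO_MORE_DOCS) return doc;
      distance = getDistance(doc);      // computed once, stored in the scorer
    } while (distance > maxDistance);   // throw away hits outside the circle
    return doc;
  }

  @Override
  public int advance(int target) throws IOException {
    doc = bboxDocs.advance(target);
    if (doc != NO_MORE_DOCS && (distance = getDistance(doc)) > maxDistance) {
      return nextDoc();                 // skip forward until inside the circle
    }
    return doc;
  }

  @Override
  public float score() {
    // default scoring: 1.0 at the center, 0.0 at maxDistance, scaled by the weight
    return weightValue * (float) ((maxDistance - distance) / maxDistance);
  }
}
{code}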




[jira] Updated: (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters

2010-04-15 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2395:
--

Attachment: DistanceQuery.java

A first idea of the query; it does not even compile, as some classes are missing (coming with Chris' later patches), but it shows how it should work and how it is customizable.

 Add a scoring DistanceQuery that does not need caches and separate filters
 --

 Key: LUCENE-2395
 URL: https://issues.apache.org/jira/browse/LUCENE-2395
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/spatial
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: DistanceQuery.java


 In a chat with Chris Male, and from my own ideas when implementing for PANGAEA, I thought about the broken distance query in contrib. It lacks the following features:
 - It needs a query/filter for the enclosing bbox (which is constant score)
 - It needs a separate filter for filtering out hits too far away (inside bbox but outside the distance limit)
 - It has no scoring, so if somebody wants to sort by distance, he needs to use the custom sort. For that to work, spatial caches the distance calculation (which is broken for multi-segment search)
 The idea is now to combine all three things into one query, but customizable:
 We first thought about extending CustomScoreQuery and calculating the distance from FieldCache in the customScore method, returning a score of 1 for distance=0, a score of 0 at the max distance, and a score<0 for farther hits that are in the bounding box but not in the distance circle. To filter out such negative scores, we would need to override the scorer in CustomScoreQuery, which is private.
 My proposal is now to use a very stripped down CustomScoreQuery (but not extend it) that calls a method getDistance(docId) in its scorer's advance() and nextDoc() that calculates the distance for the current doc. It stores this distance also in the scorer. If the distance > maxDistance it throws away the hit and calls nextDoc() again. The score() method will by default return weight.value*(maxDistance - distance)/maxDistance and uses the precalculated distance. So the distance is only calculated one time in nextDoc()/advance().
 To be able to plug in custom scoring, the following methods in the query can be overridden:
 - float getDistanceScore(double distance) - returns by default (maxDistance - distance)/maxDistance; allows score customization
 - DocIdSet getBoundingBoxDocIdSet(Reader, LatLng sw, LatLng ne) - returns a DocIdSet for the bounding box. By default it returns e.g. the DocIdSet of an NRF or a cartesian tier filter. You can even plug in any other DocIdSet, e.g. wrap a Query with QueryWrapperFilter
 - support a setter for the GeoDistanceCalculator that is used by the scorer to get the distance.
 - a LatLng provider (similar to CustomScoreProvider/ValueSource) that returns the lat/lng for a given doc id. This method is called once per IndexReader at scorer creation and will retrieve the coordinates. By that we support FieldCache or whatever.
 This query is almost finished in my head, it just needs coding :-)
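
 The provider from the last bullet could be as small as this sketch (the interface and method names are made up, mirroring the CustomScoreProvider/ValueSource pattern):

{code:java}
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.spatial.geometry.LatLng;

public interface LatLngProvider {
  /** called once per IndexReader when the scorer is created */
  Source getSource(IndexReader reader) throws IOException;

  interface Source {
    /** returns the coordinates of a document, e.g. backed by FieldCache */
    LatLng latLng(int docId);
  }
}
{code}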




[jira] Commented: (LUCENE-2396) remove version from contrib/analyzers.

2010-04-15 Thread Uwe Schindler (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857384#action_12857384 ]

Uwe Schindler commented on LUCENE-2396:
---

Are you sure you want to use LUCENE_CURRENT in some ctors?

 remove version from contrib/analyzers.
 --

 Key: LUCENE-2396
 URL: https://issues.apache.org/jira/browse/LUCENE-2396
 Project: Lucene - Java
  Issue Type: Task
  Components: contrib/analyzers
Affects Versions: 3.1
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-2396.patch


 Contrib/analyzers has no backwards-compatibility policy, so let's remove 
 Version so the API is consumable.
 If you think we shouldn't do this, then instead explicitly state and vote on what the backwards compatibility policy for contrib/analyzers should be, or move it all to core.




[jira] Commented: (LUCENE-2396) remove version from contrib/analyzers.

2010-04-15 Thread Uwe Schindler (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857402#action_12857402 ]

Uwe Schindler commented on LUCENE-2396:
---

bq. Static? Weren't you against that!? 

He meant a static final! It is just there to fix the analyzers that depend on core stuff to a specific version, until we have no analyzers left in core except Whitespace.

 remove version from contrib/analyzers.
 --

 Key: LUCENE-2396
 URL: https://issues.apache.org/jira/browse/LUCENE-2396
 Project: Lucene - Java
  Issue Type: Task
  Components: contrib/analyzers
Affects Versions: 3.1
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-2396.patch






RE: Proposal about Version API relaxation

2010-04-15 Thread Uwe Schindler
Hi Earwin,

I am strongly +1 on this. I would also act as the Release Manager for 3.1 if nobody else wants to do it. I would like to take the preflex tag, or some revisions before it (maybe without the IndexWriterConfig, which is a really new API), as the 3.1 branch, and after that port some of my post-flex changes back, like the StandardTokenizer refactoring (so we can still produce the old analyzer without Java 1.4).

So +1 on branching pre-flex and releasing it as 3.1 soon. The Unicode improvements justify a new release. I think s1monw also wants this.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Earwin Burrfoot [mailto:ear...@gmail.com]
 Sent: Thursday, April 15, 2010 8:15 PM
 To: java-dev@lucene.apache.org
 Subject: Re: Proposal about Version API relaxation
 
 I'd like to remind that Mike's proposal has stable branches.
 
 We can branch off preflex trunk right now and wrap it up as 3.1.
 Current trunk is declared as future 4.0 and all backcompat cruft is
 removed from it.
 If some new features/bugfixes appear in trunk, and they don't break
 stuff - we backport them to 3.x branch, eventually releasing 3.2, 3.3,
 etc
 
 Thus, devs are free to work without back-compat burden, bleeding edge
 users get their blood, conservative users get their stability + a
 subset of new features from stable branches.
 
 
  On Thu, Apr 15, 2010 at 22:02, DM Smith dmsmith...@gmail.com wrote:
   On 04/15/2010 01:50 PM, Earwin Burrfoot wrote:
  
     First, the index format. IMHO, it is a good thing for a major release to be able to read the prior major release's index. And the ability to convert it to the current format via optimize is also good. Whatever is decided on this thread should take this seriously.
  
    Optimize is a bad way to convert to current.
    1. conversion is not guaranteed, optimizing an already optimized index is a noop
    2. it merges all your segments. if you use BalancedSegmentMergePolicy, that destroys your segment size distribution
  
    Dedicated upgrade tool (available both from command-line and programmatically) is a good way to convert to current.
    1. conversion happens exactly when you need it, conversion happens for sure, no additional checks needed
    2. it should leave all your segments as is, only changing their format
  
     It is my observation, though possibly not correct, that core only has rudimentary analysis capabilities, handling English very well. To handle other languages well contrib/analyzers is required. Until recently it did not get much love. There have been many bw compat breaking changes (though w/ version one can probably get the prior behavior). IMHO, most of contrib/analyzers should be core. My guess is that most non-trivial applications will use contrib/analyzers.
  
    I counter - most non-trivial applications will use their own analyzers. The more modules - the merrier. You can choose precisely what you need.
  
   By and large an analyzer is a simple wrapper for a tokenizer and some filters. Are you suggesting that most non-trivial apps write their own tokenizers and filters?
  
   I'd find that hard to believe. For example, I don't know enough Chinese, Farsi, Arabic, Polish, ... to come up with anything better than what Lucene has to tokenize, stem or filter these.
  
     Our user base are those with ancient, underpowered laptops in third-world countries. On those machines it might take 10 minutes to create an index and during that time the machine is fairly unresponsive. There is no opportunity to do it in the background.
  
    Major Lucene releases (feature-wise, not version-wise) happen like once in a year, or year-and-a-half. Is it that hard for your users to wait ten minutes once a year?
  
   I said that was for one index. Multiply that times the number of books available (300+) and yes, it is too much to ask. Even if a small subset is indexed, say 30, that's around 5 hours of waiting.
  
   Under consideration is the frequency of breakage. Some are suggesting a greater frequency than yearly.
  
   DM
 
 
 
 
 
 
 --
 Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
 Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
 ICQ: 104465785
 






RE: Proposal about Version API relaxation

2010-04-15 Thread Uwe Schindler
I wish we could have a face-to-face talk like in the evenings at ApacheCon :(

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant
 Ingersoll
 Sent: Thursday, April 15, 2010 9:46 PM
 To: java-dev@lucene.apache.org
 Subject: Re: Proposal about Version API relaxation
 
 From IRC:
 why do I get the feeling that everyone is in heated agreement on the
 Version thread?
 there are some cases that mean people will have to reindex
 in those cases, we should tell people they will have to reindex
 then they can decide to upgrade or not
 all other cases, just do the sensible thing and test first
 I have yet to meet anyone who simply drops a new version into
 production and says go
 
 So, as I said earlier, why don't we just move forward with it, strive
 to support reading X-1 index format in X and let the user know the
 cases in which they will have to re-index. If a migration tool is
 necessary, then someone can write it at the appropriate time.  Just as
 was said w/ the Solr merge, it's software.  If it doesn't work, we can
 change it.  Thank goodness we don't have a back compatibility policy
 for our policies!
 
 -Grant
 
 
 
 
 On Apr 15, 2010, at 3:35 PM, Michael McCandless wrote:
 
  Unfortunately, live searching against an old index can get very hairy. EG look at what I had to do for the flex API on a pre-flex index (the flex emulation layer).
 
  It's also not great because it gives the illusion that all is good, yet you've taken a silent hit (up to ~10% or so) in your search perf.
 
  Whereas building & maintaining a one-time index migration tool, in contrast, is much less work.
 
  I realize the migration tool has issues -- it fixes the hard changes but silently allows the soft changes to break (ie, your analyzers may not produce the same tokens, until we move all core analyzers outside of core, so they are separately versioned), but it seems like a good compromise here?
 
  Mike
 
  2010/4/15 Shai Erera ser...@gmail.com:
   The reason, Earwin, why online migration is faster is because when u finally need to *fully* migrate your index, most chances are that most of the segments are already on the newer format. Offline migration will just keep the application idle for some amount of time until ALL segments are migrated.
 
   During the lifecycle of the index, segments are merged anyway, so migrating them on the fly virtually costs nothing. At the end, when u upgrade to a Lucene version which doesn't support the previous index format, you'll in the worst case need to migrate a few large segments which were never merged. I don't know how many of those there will be, as it really depends on the application, but I'd bet this process will touch just a few segments. And hence, throughput wise it will be a lot faster.
 
   We should create a migrate() API on IW which will touch just those segments and not incur a full optimize. That API can also be used for an offline migration tool, if we decide that's what we want.
 
   Shai
 
   On Thursday, April 15, 2010, jm jmugur...@gmail.com wrote:
   Not sure if plain users are allowed/encouraged to post in this list, but wanted to mention (just an opinion from a happy user), as other users have, that not all of us can reindex just like that. It would not be 10 min for one of our installations for sure...
 
   First, I would need to implement some code to reindex, cause my source data is postprocessed/compressed/encrypted/moved after it arrives to the application, so I would need to retrieve it all etc. And then reindexing it would take days.
   javier
 
   On Thu, Apr 15, 2010 at 9:04 PM, Earwin Burrfoot ear...@gmail.com wrote:
   BTW Earwin, we can come up w/ a migrate() method on IW to accomplish manual migration on the segments that are still on old versions. That's not the point about whether optimize() is good or not. It is the difference between telling the customer to run a 5-day migration process, or a couple of hours. At the end of the day, the same migration code will need to be written, whether for the manual or the automatic case, and probably by the same developer who changed the index format. It's the difference of when it happens.
 
   Converting stuff is easier than emulating, that's exactly why I want a separate tool. There's no need to support cross-version merging, nor to emulate old APIs.
 
   I also don't understand why offline migration is going to take days instead of hours for online migration?? WTF, it's gonna be even faster, as it doesn't have to merge things.
 
  --
  Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
  Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
  ICQ: 104465785
 

[jira] Updated: (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters

2010-04-15 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2395:
--

Attachment: DistanceQuery.java

small updates to Chris' patches.

 Add a scoring DistanceQuery that does not need caches and separate filters
 --

 Key: LUCENE-2395
 URL: https://issues.apache.org/jira/browse/LUCENE-2395
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/spatial
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: DistanceQuery.java






[jira] Updated: (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters

2010-04-15 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2395:
--

Attachment: (was: DistanceQuery.java)

 Add a scoring DistanceQuery that does not need caches and separate filters
 --

 Key: LUCENE-2395
 URL: https://issues.apache.org/jira/browse/LUCENE-2395
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/spatial
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: DistanceQuery.java






[jira] Updated: (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters

2010-04-15 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2395:
--

Attachment: DistanceQuery.java

Added Weight.explain() and fixed a missing replacement.

 Add a scoring DistanceQuery that does not need caches and separate filters
 --

 Key: LUCENE-2395
 URL: https://issues.apache.org/jira/browse/LUCENE-2395
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/spatial
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: DistanceQuery.java, DistanceQuery.java






RE: issues.apache.org compromised: please update your passwords

2010-04-14 Thread Uwe Schindler
   Hi Grant,
  
   It is that user, who is assigned to the very early JIRA issues, e.g.:
   https://issues.apache.org/jira/browse/LUCENE-1
  
   I changed the password of this user in response to that email (for security), but I think we should simply let infra remove it. The problem is, almost anybody can instruct JIRA to reset the password and let JIRA send it again to the email, which is the public java-dev list. And then it is public again.
 
  If the user is still needed (for whatever reason) maybe the user can be disabled, or maybe they can be removed from the list of users who have update access to the JIRA.
 
  But so long as the user is not an administrator, then it's no different really from any other account that can be created by Joe Public.
 
 Yes, that account has no special access. If someone wants to unassign the 319 issues this user is the 'assignee' of, then the account can be deleted:
 
 https://issues.apache.org/jira/secure/IssueNavigator.jspa?sorter/order=ASC&sorter/field=priority&assignee=java-dev%40lucene.apache.org&reset=true&assigneeSelect=specificuser&mode=hide
 

I disabled the account by assigning a dummy e-mail address and gave it a random password.

I was not able to unassign the issues, as most issues were Closed, where no 
modifications can be done anymore. Reopening and changing assignment and 
reverting to closed is too risky, as after reopening you don’t know anymore 
which issues you need to revert to closed after unassignment...

Uwe





RE: Proposal about Version API relaxation

2010-04-14 Thread Uwe Schindler
+1, thanks for this detailed explanation! In my apps I have no problem defining a static default myself. And passing it to every ctor is easy, so where is the problem? Look at Solr: since we introduced the version param to solrconfig, you have exactly that behavior, but it is limited to the Solr installation using that solrconfig. And you can still override it.

Lucene is a library, not an application, so it is not Lucene's responsibility to handle such things. Passing configuration and configuration objects around is an application responsibility.
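
For example, an application can centralize the version itself (a sketch; the class name is made up):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

public final class MyApp {
  // one app-wide constant instead of a library-global default
  public static final Version MATCH_VERSION = Version.LUCENE_31;

  static Analyzer newAnalyzer() {
    return new StandardAnalyzer(MATCH_VERSION); // pass the constant to every ctor
  }
}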

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: Mark Miller [mailto:markrmil...@gmail.com]
 Sent: Wednesday, April 14, 2010 6:58 PM
 To: java-dev@lucene.apache.org
 Subject: Re: Proposal about Version API relaxation
 
 On 04/14/2010 12:29 PM, Marvin Humphrey wrote:
  On Wed, Apr 14, 2010 at 08:30:14AM -0400, Grant Ingersoll wrote:
 
  The thing I keep going back to is that somehow Lucene has managed
 for years
  (and I mean lots of years) w/o stuff like Version and all this
 massive back
  compatibility checking.
 
  Non-constant global variables are an anti-pattern.
 
 
 I think clinging to such rules in the face of all situations is an
 anti-pattern :) I take it as a rule of thumb.
 
 In regards to this discussion:
 
 I agree that the Version stuff is a bit of a mess. I also agree that
 many users will want to just use one version across their app that is
 easy to change.
 
 I disagree that we should allow that behavior by just using a
 constructor without the Version param - or that you would be forced to
 set the static Version setting by trying to run your app and seeing an
 exception happen. That is all a bit ugly.
 
 Too many users will not understand Version or care to if they see they
 can skip passing it. IMO, you should have to specify that you are
 looking for this behavior. In which case, why not just specify it using
 the version param itself :) E.g. if a user wants to get this kind of
 static behavior, they can just choose to do it on their own, and pass
 their *own* static Version constant to all the constructors.
 
 I don't think we need to go through this hassle and introduce a less
 than ideal solution just so that users can pass one less param -
 especially when I think you should explicitly choose this behavior
 rather than get it by ignoring the Version param.
 
 --
 - Mark
 
 http://www.lucidimagination.com
 
 
 
 






RE: Proposal about Version API relaxation

2010-04-14 Thread Uwe Schindler
 And 2.9's backwards compatibility layer in TokenStream was significantly slower.

I protest! No, it was not slower; it was only slower at the beginning, because of missing reflection caching, and that also affected the *new* API. With 2.9.x and old TokenStreams there is no speed difference, really.

Uwe





RE: Proposal about Version API relaxation

2010-04-13 Thread Uwe Schindler
Hi Shai,

 

One of the problems I have is: that is a static default! We want to get rid of those (and mostly did; only some relics remain), so there are no plans to reimplement such a thing. The worst one is BooleanQuery.maxClauseCount. The same applies to all types of sysprops. As Lucene and Solr mostly run in servlet containers, this kind of thing makes web applications no longer isolated. This is also a general contract for libraries: never ever rely on sysprops or statics.

 

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

 http://www.thetaphi.de/ http://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Shai Erera [mailto:ser...@gmail.com] 
Sent: Tuesday, April 13, 2010 5:27 PM
To: java-dev@lucene.apache.org
Subject: Proposal about Version API relaxation

 

Hi

I'd like to propose a relaxation on the Version API. Uwe, please read the 
entire email before you reply :).

I was thinking, following a question on the user list, that the Version-based API may not be very intuitive to users, especially those who don't care about versioning, and can also be very inconvenient. So there are two issues here:
1) How should one use Version smartly so that he keeps backwards compatibility. 
I think we all know the answer, but a Wiki page with some best practices tips 
would really help users use it.
2) How can one write sane code, which doesn't pass versions all over the place 
if: (1) he doesn't care about versions, or (2) he cares, and sets the Version 
to the same value in his app, in all places.

Also, I think that today we offer users the flexibility to set different Versions on different objects over the life span of their application - which is good flexibility, but can also lead people to shoot themselves in the foot if they're not careful -- e.g. upgrading Version across their app, but failing to do so in one or two places ...

So the change I'd like to propose is to mostly alleviate (2) and better protect 
users - I DO NOT PROPOSE TO GET RID OF Version :).

I was thinking that we can add on Version a DEFAULT version, which the caller 
can set. So Version.setDefault and Version.getDefault will be added, as static 
members (more on the static-ness of it later). We then change the API which 
requires Version to also expose an API which doesn't require it, and that API 
will call Version.getDefault(). People can use it if they want to ...

Few points:
1) As a default DEFAULT Version is controversial, I don't want to propose one, even though I think Lucene could define the DEFAULT to be the latest. Instead, I propose that Version.getDefault throw a DefaultVersionNotSetException if it wasn't set when an API which relies on the default Version is called (I don't want to return null; I'm not sure how safe that would be).
2) That DEFAULT Version is static, which means it will affect all indexing code 
running inside the JVM. Which is fine:
2.1) Perhaps all the indexing code should use the same Version
2.2) If you know that's not the case, then pass Version to the API which 
requires it - you cannot use the 'default Version' API -- nothing changes for 
you.
One case is missing -- you might not know if your code is the only indexing 
code which runs in the JVM ... I don't have a solution to that, but I think 
it'll be revealed pretty quickly, and you can change your code then ...

So to summarize - the current Version API will remain and people can still use 
it. The DEFAULT Version API is meant for convenience for those who don't want 
to pass Version everywhere, for the reasons I outlined above. This will also 
clean our test code significantly, as the tests will set the DEFAULT version to 
TEST_VERSION_CURRENT at start ...

The changes to the Version class will be very simple.
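
To illustrate, the additions could look roughly like this (a sketch following the names proposed above, not committed code; the existing enum constants are abbreviated):

public enum Version {
  LUCENE_29, LUCENE_30, LUCENE_31; // existing constants, abbreviated here

  private static volatile Version defaultVersion; // unset until the app decides

  public static void setDefault(Version version) {
    defaultVersion = version;
  }

  public static Version getDefault() {
    final Version v = defaultVersion;
    if (v == null) {
      throw new DefaultVersionNotSetException(
          "call Version.setDefault() before using the version-less APIs");
    }
    return v;
  }

  public static final class DefaultVersionNotSetException extends RuntimeException {
    DefaultVersionNotSetException(String message) { super(message); }
  }
}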

If people think that's acceptable, I can open an issue and work on it.

Shai



RE: [jira] Account password

2010-04-13 Thread Uwe Schindler
LOL!

This user is assigned to very old bugzilla issues :-)

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: j...@apache.org [mailto:j...@apache.org]
 Sent: Tuesday, April 13, 2010 10:54 PM
 To: java-dev@lucene.apache.org
 Subject: [jira] Account password
 
 
   You (or someone else) has reset your password.
 
 -
 
 Your password has been changed to: MCwqNr
 
 You can change your password here:
 
https://issues.apache.org/jira/secure/ViewProfile.jspa
 
 Here are the details of your account:
 -
 Username: java-dev@lucene.apache.org
Email: java-dev@lucene.apache.org
Full Name: Lucene Developers
 Password: MCwqNr
 (You can always retrieve these via the Forgot Password link on the
 signup page)






RE: [jira] Account password

2010-04-13 Thread Uwe Schindler
I changed the password, so it's no longer public.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Uwe Schindler [mailto:u...@thetaphi.de]
 Sent: Tuesday, April 13, 2010 11:59 PM
 To: java-dev@lucene.apache.org
 Subject: RE: [jira] Account password
 
 LOL!
 
 This user is assigned to very old bugzilla issues :-)
 
 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
 






RE: issues.apache.org compromised: please update your passwords

2010-04-13 Thread Uwe Schindler
Hi Grant,

It is that user, who is assigned to the very early JIRA issues, e.g.:
https://issues.apache.org/jira/browse/LUCENE-1

I changed the password of this user in response to that email (for security), 
but I think we should simply let infra remove it. The problem is, almost 
anybody can instruct JIRA to reset the password and let JIRA send it again to 
the email which is the public java-dev list. And then it is public again.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant
 Ingersoll
 Sent: Wednesday, April 14, 2010 1:50 AM
 To: java-dev@lucene.apache.org
 Subject: Re: issues.apache.org compromised: please update your
 passwords
 
  FYI, this is for real. Some have asked me if it is made up. I don't know who owns that user, so we should ask on infra, I suspect. Also, this applies to all user accounts on JIRA too.
 
 On Apr 13, 2010, at 12:25 PM, r...@apache.org wrote:
 
  Dear Lucene Developers,
 
   You are receiving this email because you have a login, 'java-d...@lucene.apache.org', on the Apache JIRA installation, https://issues.apache.org/jira/
  
   On April 6 the issues.apache.org server was hacked. The attackers were able to install a trojan JIRA login screen and later get full root access:
  
   https://blogs.apache.org/infra/entry/apache_org_04_09_2010
  
   We are assuming that the attackers have a copy of the JIRA database, which includes a hash (SHA-512 unsalted) of the password you set when signing up as 'java-dev@lucene.apache.org' to JIRA. If the password you set was not of great quality (eg. based on a dictionary word), it should be assumed that the attackers can guess your password from the password hash via brute force.
  
   The upshot is that someone malicious may know both your email address and a password of yours.
  
   This is a problem because many people reuse passwords across online services. If you reuse passwords across systems, we urge you to change your passwords on ALL SYSTEMS that might be using the compromised JIRA password. Prime examples might be gmail or hotmail accounts, online banking sites, or sites known to be related to your email's domain, lucene.apache.org.
  
   Naturally we would also like you to reset your JIRA password. That can be done at:
  
   https://issues.apache.org/jira/secure/ForgotPassword!default.jspa?username=java-...@lucene.apache.org
  
   We (the Apache JIRA administrators) sincerely apologize for this security breach. If you have any questions, please let us know by email. We are also available on the #asfinfra IRC channel on irc.freenode.net.
 
 
  Regards,
 
  The Apache Infrastructure Team
 



RE: svn commit: r932773 - /lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java

2010-04-11 Thread Uwe Schindler
Robert,

as the comment says, it's a hack. How about simply adding a public getter method for the matchVersion to the base class StopwordAwareAna?
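
Something like this would do (a sketch, assuming the base class keeps the field under the name matchVersion):

/** Returns the Lucene match version this analyzer was configured with. */
public Version getMatchVersion() {
  return matchVersion;
}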

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: rm...@apache.org [mailto:rm...@apache.org]
 Sent: Saturday, April 10, 2010 7:52 PM
 To: java-comm...@lucene.apache.org
 Subject: svn commit: r932773 - /lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java
 
 Author: rmuir
 Date: Sat Apr 10 17:51:30 2010
 New Revision: 932773
 
 URL: http://svn.apache.org/viewvc?rev=932773&view=rev
 Log:
 fix failing test, StdAnalyzer now stores this in its superclass
 
 Modified:
 lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java
 
 Modified: lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java
 URL: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java?rev=932773&r1=932772&r2=932773&view=diff
 ==============================================================================
 --- lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java (original)
 +++ lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java Sat Apr 10 17:51:30 2010
 @@ -68,8 +68,8 @@ public class TestLuceneMatchVersion exte
  tok = (StandardTokenizer) tsi.getTokenizer();
  assertFalse(tok.isReplaceInvalidAcronym());
 
 -// this is a hack to get the private matchVersion field in StandardAnalyzer, may break in later lucene versions - we have no getter :(
 -final Field matchVersionField = StandardAnalyzer.class.getDeclaredField("matchVersion");
 +// this is a hack to get the private matchVersion field in StandardAnalyzer's superclass, may break in later lucene versions - we have no getter :(
 +final Field matchVersionField = StandardAnalyzer.class.getSuperclass().getDeclaredField("matchVersion");
  matchVersionField.setAccessible(true);
 
  type = schema.getFieldType("textStandardAnalyzerDefault");
 






RE: svn commit: r932773 - /lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java

2010-04-11 Thread Uwe Schindler
This is why I added the comment. But I forgot about it when I committed the Lucene refactoring :-) So let's fix it with a simple getter!

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

 http://www.thetaphi.de/ http://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Robert Muir [mailto:rcm...@gmail.com] 
Sent: Sunday, April 11, 2010 11:47 AM
To: java-dev@lucene.apache.org
Subject: Re: svn commit: r932773 - /lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java

 

I agree we should do something better, I do not like the way the test looks now 
(no offense) as it is prone to break... 

On Sun, Apr 11, 2010 at 5:39 AM, Uwe Schindler u...@thetaphi.de wrote:

Robert,

as the comment says, it's a hack. How about simply adding a public getter method for the matchVersion to the base class StopwordAwareAna?

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de



 -Original Message-
 From: rm...@apache.org [mailto:rm...@apache.org]
 Sent: Saturday, April 10, 2010 7:52 PM
 To: java-comm...@lucene.apache.org
 Subject: svn commit: r932773 - /lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java

 Author: rmuir
 Date: Sat Apr 10 17:51:30 2010
 New Revision: 932773

 URL: http://svn.apache.org/viewvc?rev=932773&view=rev
 Log:
 fix failing test, StdAnalyzer now stores this in its superclass

 Modified:
 lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java

 Modified: lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java
 URL: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java?rev=932773&r1=932772&r2=932773&view=diff
 ==============================================================================
 --- lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java (original)
 +++ lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java Sat Apr 10 17:51:30 2010
 @@ -68,8 +68,8 @@ public class TestLuceneMatchVersion exte
  tok = (StandardTokenizer) tsi.getTokenizer();
  assertFalse(tok.isReplaceInvalidAcronym());

 -// this is a hack to get the private matchVersion field in StandardAnalyzer, may break in later lucene versions - we have no getter :(
 -final Field matchVersionField = StandardAnalyzer.class.getDeclaredField("matchVersion");
 +// this is a hack to get the private matchVersion field in StandardAnalyzer's superclass, may break in later lucene versions - we have no getter :(
 +final Field matchVersionField = StandardAnalyzer.class.getSuperclass().getDeclaredField("matchVersion");
  matchVersionField.setAccessible(true);

  type = schema.getFieldType("textStandardAnalyzerDefault");









-- 
Robert Muir
rcm...@gmail.com



[jira] Resolved: (LUCENE-2389) Enforce TokenStream impl / Analyzer finalness by an assertion

2010-04-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-2389.
---

Resolution: Fixed

Committed revision: 932864

 Enforce TokenStream impl / Analyzer finalness by an assertion
 -

 Key: LUCENE-2389
 URL: https://issues.apache.org/jira/browse/LUCENE-2389
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2389.patch, LUCENE-2389.patch


 As noted in LUCENE-1753 and other issues, TokenStream and Analyzers are based 
 on the decorator pattern. At least all TokenStream and Analyzer 
 implementations in Lucene and Solr should be final.
 The attached patch adds an assertion to the ctors of both classes that does 
 the corresponding checks:
 - Analyzers must be final or private classes or anonymous inner classes
 - TokenStreams must be final or private classes or anonymous inner classes or 
 have a final incrementToken()
 I will commit this after Robert has fixed the Solr streams.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2154) Need a clean way for Dir/MultiReader to merge the AttributeSources of the sub-readers

2010-04-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2154:
--

Attachment: LUCENE-2154-Jakarta-BCEL.patch

Slightly improved patch to work correctly with CharTermAttribute (it defines 
some methods, which ProxyAttributeImpl also defines, as final, so overriding 
them failed).

 Need a clean way for Dir/MultiReader to merge the AttributeSources of the 
 sub-readers
 ---

 Key: LUCENE-2154
 URL: https://issues.apache.org/jira/browse/LUCENE-2154
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: Flex Branch
Reporter: Michael McCandless
 Fix For: 3.1

 Attachments: LUCENE-2154-cglib.patch, LUCENE-2154-Jakarta-BCEL.patch, 
 LUCENE-2154-Jakarta-BCEL.patch, LUCENE-2154-javassist.patch, 
 LUCENE-2154-javassist.patch, LUCENE-2154.patch, LUCENE-2154.patch


 The flex API allows extensibility at the Fields/Terms/Docs/PositionsEnum 
 levels, for a codec to set custom attrs.
 But, it's currently broken for Dir/MultiReader, which must somehow share 
 attrs across all the sub-readers.  Somehow we must make a single attr source, 
 and tell each sub-reader's enum to use that instead of creating its own.  
 Hopefully Uwe can work some magic here :)

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855739#action_12855739
 ] 

Uwe Schindler commented on LUCENE-2386:
---

I don't understand the whole issue either.

For me it is perfectly fine that, if I open an IndexWriter with create=true, 
the index is created empty first. This has the big advantage that IndexReaders 
can open it and will not fail with "index not found". OK, this can be done by a 
commit directly after creating, but for code like "create an IndexWriter with 
create=true if the index does not exist, else append", this is more work to do.

The question is also: what happens if you call IndexWriter.getReader() without 
the initial commit? Does this work with your patch?

For me this patch is too heavy for the small improvement, and it's a behaviour 
change, not a real bug.

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, 
 LUCENE-2386.patch


 I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh 
 Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems 
 unnecessary, and kind of brings back an autoCommit mode, in a strange way 
 ... why do we need that commit? Do we really expect people to open an 
 IndexReader on an empty Directory which they just passed to an IW w/ 
 create=true? If they want, they can simply call commit() right away on the IW 
 they created.
 I ran into this when writing a test which committed N times, then compared 
 the number of commits (via IndexReader.listCommits) and was surprised to see 
 N+1 commits.
 Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter 
 jumping on me .. so the change might not be that simple. But I think it's 
 manageable, so I'll try to attack it (and IFD specifically!) back :).

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855792#action_12855792
 ] 

Uwe Schindler commented on LUCENE-2386:
---

Thanks Earwin, that's exactly my opinion, too. For me the whole behaviour is 
defined and correct. The create param in the ctor is just an initialization of 
the directory to be a defined index (empty at the beginning).

Maybe we should remove the create param from the IndexWriter ctor/config 
altogether, and just define a static utility method in IW that initializes an 
empty directory. The standard ctors in IW should then throw IndexNotFound if 
the directory is not yet initialized. This way, we don't need those strange 
create params.
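
Just to make the proposal concrete, a minimal sketch of such a utility, under 
the 3.x API as I understand it (IndexInitializer and createEmptyIndex are 
hypothetical names, not an actual Lucene API):

{code}
import java.io.IOException;

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;

public final class IndexInitializer {
  private IndexInitializer() {}

  /** Hypothetical helper: turn a fresh Directory into a defined (empty) index. */
  public static void createEmptyIndex(Directory dir) throws IOException {
    if (!IndexReader.indexExists(dir)) {
      // create=true writes the initial commit; close() makes it durable
      new IndexWriter(dir, new WhitespaceAnalyzer(), true,
          IndexWriter.MaxFieldLength.UNLIMITED).close();
    }
  }
}
{code}

The standard ctors could then drop the create param and throw when the 
directory was never initialized this way.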

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, 
 LUCENE-2386.patch, LUCENE-2386.patch


 I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh 
 Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems 
 unnecessary, and kind of brings back an autoCommit mode, in a strange way 
 ... why do we need that commit? Do we really expect people to open an 
 IndexReader on an empty Directory which they just passed to an IW w/ 
 create=true? If they want, they can simply call commit() right away on the IW 
 they created.
 I ran into this when writing a test which committed N times, then compared 
 the number of commits (via IndexReader.listCommits) and was surprised to see 
 N+1 commits.
 Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter 
 jumping on me .. so the change might not be that simple. But I think it's 
 manageable, so I'll try to attack it (and IFD specifically!) back :).

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute

2010-04-10 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2372:
--

Attachment: LUCENE-2372.patch

Updated patch: now KeywordAnalyzer and PerFieldAnalyzerWrapper are also made 
final and the backwards layer is removed.

I will commit this later today and proceed with contrib. Robert, we should 
talk about who does which one!

 Replace deprecated TermAttribute by new CharTermAttribute
 -

 Key: LUCENE-2372
 URL: https://issues.apache.org/jira/browse/LUCENE-2372
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2372.patch, LUCENE-2372.patch, LUCENE-2372.patch, 
 LUCENE-2372.patch


 After LUCENE-2302 is merged to trunk with flex, we need to carry over all 
 tokenizers and consumers of the TokenStreams to the new CharTermAttribute.
 We should also think about adding an AttributeFactory that creates a subclass 
 of CharTermAttributeImpl that returns collation keys in the toBytesRef() 
 accessor. CollationKeyFilter is then obsolete; instead, you can simply convert 
 every TokenStream to index only CollationKeys by changing the attribute 
 implementation.
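
As a small illustration of what consuming the new attribute looks like (a 
sketch against the 3.1-era interfaces, not part of any patch here; the naive 
per-char lowercasing is just for demonstration):

{code}
import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class LowerCaseSketchFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

  public LowerCaseSketchFilter(TokenStream in) {
    super(in);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    final char[] buffer = termAtt.buffer(); // direct access to the term chars
    final int length = termAtt.length();    // CharSequence-style length
    for (int i = 0; i < length; i++) {
      buffer[i] = Character.toLowerCase(buffer[i]); // naive, ignores surrogates
    }
    return true;
  }
}
{code}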

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute

2010-04-10 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2372:
--

Attachment: LUCENE-2372.patch

Updated patch after last commit.

 Replace deprecated TermAttribute by new CharTermAttribute
 -

 Key: LUCENE-2372
 URL: https://issues.apache.org/jira/browse/LUCENE-2372
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2372.patch, LUCENE-2372.patch, LUCENE-2372.patch, 
 LUCENE-2372.patch, LUCENE-2372.patch


 After LUCENE-2302 is merged to trunk with flex, we need to carry over all 
 tokenizers and consumers of the TokenStreams to the new CharTermAttribute.
 We should also think about adding an AttributeFactory that creates a subclass 
 of CharTermAttributeImpl that returns collation keys in the toBytesRef() 
 accessor. CollationKeyFilter is then obsolete; instead, you can simply convert 
 every TokenStream to index only CollationKeys by changing the attribute 
 implementation.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute

2010-04-10 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855590#action_12855590
 ] 

Uwe Schindler commented on LUCENE-2372:
---

Committed core part in revision: 932749

 Replace deprecated TermAttribute by new CharTermAttribute
 -

 Key: LUCENE-2372
 URL: https://issues.apache.org/jira/browse/LUCENE-2372
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2372.patch, LUCENE-2372.patch, LUCENE-2372.patch, 
 LUCENE-2372.patch, LUCENE-2372.patch


 After LUCENE-2302 is merged to trunk with flex, we need to carry over all 
 tokenizers and consumers of the TokenStreams to the new CharTermAttribute.
 We should also think about adding an AttributeFactory that creates a subclass 
 of CharTermAttributeImpl that returns collation keys in the toBytesRef() 
 accessor. CollationKeyFilter is then obsolete; instead, you can simply convert 
 every TokenStream to index only CollationKeys by changing the attribute 
 implementation.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2389) Enforce TokenStream impl / Analyzer finalness by an assertion

2010-04-10 Thread Uwe Schindler (JIRA)
Enforce TokenStream impl / Analyzer finalness by an assertion
-

 Key: LUCENE-2389
 URL: https://issues.apache.org/jira/browse/LUCENE-2389
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler


As noted in LUCENE-1753 and other issues, TokenStream and Analyzers are based 
on the decorator pattern. At least all TokenStream and Analyzer implementations 
in Lucene and Solr should be final.

The attached patch adds an assertion to the ctors of both classes that does the 
corresponding checks:
- Analyzers must be final or private classes or anonymous inner classes
- TokenStreams must be final or private classes or anonymous inner classes or 
have a final incrementToken()

I will commit this after Robert has fixed the Solr streams.
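
A minimal sketch of what such a ctor assertion can look like (illustrative 
only; class, method, and message here are my own, the actual patch may differ):

{code}
import java.io.IOException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public abstract class SketchTokenStream {

  protected SketchTokenStream() {
    assert assertFinal(); // only runs when assertions (-ea) are enabled
  }

  public abstract boolean incrementToken() throws IOException;

  private boolean assertFinal() {
    try {
      final Class<?> clazz = getClass();
      if (!clazz.isAnonymousClass()
          && (clazz.getModifiers() & (Modifier.FINAL | Modifier.PRIVATE)) == 0) {
        // not final/private/anonymous, so at least incrementToken() must be final
        final Method m = clazz.getMethod("incrementToken");
        assert Modifier.isFinal(m.getModifiers())
            : "TokenStream implementations must be final or have a final incrementToken()";
      }
      return true;
    } catch (NoSuchMethodException nsme) {
      return false;
    }
  }
}
{code}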

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2389) Enforce TokenStream impl / Analyzer finalness by an assertion

2010-04-10 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2389:
--

Fix Version/s: 3.1

 Enforce TokenStream impl / Analyzer finalness by an assertion
 -

 Key: LUCENE-2389
 URL: https://issues.apache.org/jira/browse/LUCENE-2389
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1


 As noted in LUCENE-1753 and other issues, TokenStream and Analyzers are based 
 on the decorator pattern. At least all TokenStream and Analyzer 
 implementations in Lucene and Solr should be final.
 The attached patch adds an assertion to the ctors of both classes that does 
 the corresponding checks:
 - Analyzers must be final or private classes or anonymous inner classes
 - TokenStreams must be final or private classes or anonymous inner classes or 
 have a final incrementToken()
 I will commit this after Robert has fixed the Solr streams.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2389) Enforce TokenStream impl / Analyzer finalness by an assertion

2010-04-10 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2389:
--

Attachment: LUCENE-2389.patch

Patch.

 Enforce TokenStream impl / Analyzer finalness by an assertion
 -

 Key: LUCENE-2389
 URL: https://issues.apache.org/jira/browse/LUCENE-2389
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2389.patch


 As noted in LUCENE-1753 and other issues, TokenStream and Analyzers are based 
 on the decorator pattern. At least all TokenStream and Analyzer 
 implementations in Lucene and Solr should be final.
 The attached patch adds an assertion to the ctors of both classes that does 
 the corresponding checks:
 - Analyzers must be final or private classes or anonymous inner classes
 - TokenStreams must be final or private classes or anonymous inner classes or 
 have a final incrementToken()
 I will commit this after Robert has fixed the Solr streams.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2389) Enforce TokenStream impl / Analyzer finalness by an assertion

2010-04-10 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2389:
--

Attachment: LUCENE-2389.patch

Improved patch that also allows Analyzers whose tokenStream()/reusableTokenStream() 
methods are final.

 Enforce TokenStream impl / Analyzer finalness by an assertion
 -

 Key: LUCENE-2389
 URL: https://issues.apache.org/jira/browse/LUCENE-2389
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2389.patch, LUCENE-2389.patch


 As noted in LUCENE-1753 and other issues, TokenStream and Analyzers are based 
 on the decorator pattern. At least all TokenStream and Analyzer 
 implementations in Lucene and Solr should be final.
 The attached patch adds an assertion to the ctors of both classes that does 
 the corresponding checks:
 - Analyzers must be final or private classes or anonymous inner classes
 - TokenStreams must be final or private classes or anonymous inner classes or 
 have a final incrementToken()
 I will commit this after Robert has fixed the Solr streams.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute

2010-04-09 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2372:
--

Attachment: LUCENE-2372.patch

Here is a first patch for the core tokenstreams. Tests are not yet changed.

The following things were additionally fixed:
- StandardAnalyzer was made final (backwards break; we forgot to make it final 
in the 3.0 TS finalization issue). This enabled me to subclass 
StopwordAnalyzerBase and remove heavy code duplication (see the sketch below). 
The original code also contained a bug in the tokenStream method (no 
setReplaceInvalidAcronym) which was correct in reusableTokenStream. Now it is 
correct.

I will post further patches for core.
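
For readers who have not seen the base class yet, a minimal sketch of the 
subclassing pattern mentioned above, assuming the 3.1-era StopwordAnalyzerBase 
API (the analyzer class and its contents are my own illustration, not the 
actual patch):

{code}
import java.io.Reader;
import java.util.Set;

import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.StopwordAnalyzerBase;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

// final, as the finalness assertion from LUCENE-2389 demands
public final class SimpleStopwordAnalyzer extends StopwordAnalyzerBase {

  public SimpleStopwordAnalyzer(Version matchVersion, Set<?> stopWords) {
    super(matchVersion, stopWords); // base class stores matchVersion + stopwords
  }

  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    final Tokenizer source = new WhitespaceTokenizer(matchVersion, reader);
    return new TokenStreamComponents(source,
        new StopFilter(matchVersion, source, stopwords));
  }
}
{code}

The duplication disappears because tokenStream/reusableTokenStream live in the 
base class and both go through the same createComponents, so bugs like the 
missing setReplaceInvalidAcronym cannot diverge between the two code paths.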

 Replace deprecated TermAttribute by new CharTermAttribute
 -

 Key: LUCENE-2372
 URL: https://issues.apache.org/jira/browse/LUCENE-2372
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2372.patch


 After LUCENE-2302 is merged to trunk with flex, we need to carry over all 
 tokenizers and consumers of the TokenStreams to the new CharTermAttribute.
 We should also think about adding an AttributeFactory that creates a subclass 
 of CharTermAttributeImpl that returns collation keys in the toBytesRef() 
 accessor. CollationKeyFilter is then obsolete; instead, you can simply convert 
 every TokenStream to index only CollationKeys by changing the attribute 
 implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2302) Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable)

2010-04-09 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2302:
--

Attachment: LUCENE-2302-toString.patch

Patch that fixes the toString() problems in Token, adds the missing CHANGES.txt 
entry, fixes the backwards tests, and updates the javadocs to document the 
backwards break.

Deprecating Token should be done in another issue.

I will commit this soon, to be able to go forward with tokenstream conversion!

 Replacement for TermAttribute+Impl with extended capabilities (byte[] 
 support, CharSequence, Appendable)
 

 Key: LUCENE-2302
 URL: https://issues.apache.org/jira/browse/LUCENE-2302
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2302-toString.patch, LUCENE-2302.patch, 
 LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch


 For flexible indexing, terms can be simple byte[] arrays, while the current 
 TermAttribute only supports char[]. This is fine for plain text, but e.g. 
 NumericTokenStream should work directly on the byte[] array.
 Also, TermAttribute lacks some interfaces that would make it simpler for 
 users to work with terms: Appendable and CharSequence.
 I propose to create a new interface CharTermAttribute with a clean new API 
 that concentrates on CharSequence and Appendable.
 The implementation class will simply support the old and new interface 
 working on the same term buffer. DEFAULT_ATTRIBUTE_FACTORY will take care of 
 this. So if somebody adds a TermAttribute, he will get an implementation 
 class that can also be used as CharTermAttribute. As both attributes create 
 the same impl instance, both calls to addAttribute are equal. So a TokenFilter 
 that adds CharTermAttribute to the source will work with the same instance as 
 the Tokenizer that requested the (deprecated) TermAttribute.
 To also support byte[]-only terms, like Collation or NumericField needs, a 
 separate getter-only interface will be added that returns a reusable 
 BytesRef, e.g. BytesRefGetterAttribute. The default implementation class will 
 also support this interface. For backwards compatibility with old 
 self-made TermAttribute implementations, the indexer will check with 
 hasAttribute() if the BytesRef getter interface is there, and if not, will 
 wrap an old-style TermAttribute (a deprecated wrapper class will be provided): 
 new BytesRefGetterAttributeWrapper(TermAttribute), which is then used by the 
 indexer.
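
To make the "same impl instance" point concrete, a tiny sketch (assumes the 
3.1-era classes; WhitespaceTokenizer stands in for any tokenizer):

{code}
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public class SharedImplDemo {
  public static void main(String[] args) {
    TokenStream ts = new WhitespaceTokenizer(new StringReader("hello world"));
    CharTermAttribute charTerm = ts.addAttribute(CharTermAttribute.class);
    TermAttribute term = ts.addAttribute(TermAttribute.class); // deprecated view
    // DEFAULT_ATTRIBUTE_FACTORY backs both interfaces with one impl instance:
    System.out.println(charTerm == (Object) term); // expected: true
  }
}
{code}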

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2302) Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable)

2010-04-09 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-2302.
---

   Resolution: Fixed
Lucene Fields: [New, Patch Available]  (was: [New])

Committed revision: 932369

 Replacement for TermAttribute+Impl with extended capabilities (byte[] 
 support, CharSequence, Appendable)
 

 Key: LUCENE-2302
 URL: https://issues.apache.org/jira/browse/LUCENE-2302
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2302-toString.patch, LUCENE-2302.patch, 
 LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch


 For flexible indexing, terms can be simple byte[] arrays, while the current 
 TermAttribute only supports char[]. This is fine for plain text, but e.g. 
 NumericTokenStream should work directly on the byte[] array.
 Also, TermAttribute lacks some interfaces that would make it simpler for 
 users to work with terms: Appendable and CharSequence.
 I propose to create a new interface CharTermAttribute with a clean new API 
 that concentrates on CharSequence and Appendable.
 The implementation class will simply support the old and new interface 
 working on the same term buffer. DEFAULT_ATTRIBUTE_FACTORY will take care of 
 this. So if somebody adds a TermAttribute, he will get an implementation 
 class that can also be used as CharTermAttribute. As both attributes create 
 the same impl instance, both calls to addAttribute are equal. So a TokenFilter 
 that adds CharTermAttribute to the source will work with the same instance as 
 the Tokenizer that requested the (deprecated) TermAttribute.
 To also support byte[]-only terms, like Collation or NumericField needs, a 
 separate getter-only interface will be added that returns a reusable 
 BytesRef, e.g. BytesRefGetterAttribute. The default implementation class will 
 also support this interface. For backwards compatibility with old 
 self-made TermAttribute implementations, the indexer will check with 
 hasAttribute() if the BytesRef getter interface is there, and if not, will 
 wrap an old-style TermAttribute (a deprecated wrapper class will be provided): 
 new BytesRefGetterAttributeWrapper(TermAttribute), which is then used by the 
 indexer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2364) Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery & Co.

2010-04-09 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855358#action_12855358
 ] 

Uwe Schindler commented on LUCENE-2364:
---

+1

Term is still used in a lot of places in internal code, but that can be changed 
easily. One of those places is MTQ :-)

 Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery 
 & Co.
 -

 Key: LUCENE-2364
 URL: https://issues.apache.org/jira/browse/LUCENE-2364
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: Flex Branch
Reporter: Uwe Schindler
 Fix For: 3.1


 It would be good to directly allow BytesRefs in TermQuery and TermRangeQuery 
 (as both queries convert the strings to BytesRef internally). For 
 NumericRange support in Solr it will be necessary to support numerics as 
 BytesRef in single-term queries.
 When this is added, don't forget to change TestNumericRangeQueryXX to 
 use the BytesRef ctor of TRQ.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute

2010-04-09 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2372:
--

Attachment: LUCENE-2372.patch

Patch that removes deprecated usage of TermAttribute from Lucene Core 
completely; all tests are also fixed.

 Replace deprecated TermAttribute by new CharTermAttribute
 -

 Key: LUCENE-2372
 URL: https://issues.apache.org/jira/browse/LUCENE-2372
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2372.patch, LUCENE-2372.patch


 After LUCENE-2302 is merged to trunk with flex, we need to carry over all 
 tokenizers and consumers of the TokenStreams to the new CharTermAttribute.
 We should also think about adding an AttributeFactory that creates a subclass 
 of CharTermAttributeImpl that returns collation keys in the toBytesRef() 
 accessor. CollationKeyFilter is then obsolete; instead, you can simply convert 
 every TokenStream to index only CollationKeys by changing the attribute 
 implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute

2010-04-09 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2372:
--

Attachment: LUCENE-2372.patch

Small updates.

Just one question: the only non-final Analyzer left is KeywordAnalyzer. If I 
make it final and also use ReusableTokenizerBase, can we remove the 
overridesTokenStream check completely? The question is who would ever want to 
override this class.

StandardAnalyzer was made final in this patch, so why not this one too?

 Replace deprecated TermAttribute by new CharTermAttribute
 -

 Key: LUCENE-2372
 URL: https://issues.apache.org/jira/browse/LUCENE-2372
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2372.patch, LUCENE-2372.patch, LUCENE-2372.patch


 After LUCENE-2302 is merged to trunk with flex, we need to carry over all 
 tokenizers and consumers of the TokenStreams to the new CharTermAttribute.
 We should also think about adding an AttributeFactory that creates a subclass 
 of CharTermAttributeImpl that returns collation keys in the toBytesRef() 
 accessor. CollationKeyFilter is then obsolete; instead, you can simply convert 
 every TokenStream to index only CollationKeys by changing the attribute 
 implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute

2010-04-09 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855493#action_12855493
 ] 

Uwe Schindler commented on LUCENE-2372:
---

Did it already for StandardAna (see patch).

 Replace deprecated TermAttribute by new CharTermAttribute
 -

 Key: LUCENE-2372
 URL: https://issues.apache.org/jira/browse/LUCENE-2372
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2372.patch, LUCENE-2372.patch, LUCENE-2372.patch


 After LUCENE-2302 is merged to trunk with flex, we need to carry over all 
 tokenizers and consumers of the TokenStreams to the new CharTermAttribute.
 We should also think about adding an AttributeFactory that creates a subclass 
 of CharTermAttributeImpl that returns collation keys in the toBytesRef() 
 accessor. CollationKeyFilter is then obsolete; instead, you can simply convert 
 every TokenStream to index only CollationKeys by changing the attribute 
 implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute

2010-04-09 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855498#action_12855498
 ] 

Uwe Schindler commented on LUCENE-2372:
---

One more: PerFieldAnalyzerWrapper :( - Sorry

 Replace deprecated TermAttribute by new CharTermAttribute
 -

 Key: LUCENE-2372
 URL: https://issues.apache.org/jira/browse/LUCENE-2372
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2372.patch, LUCENE-2372.patch, LUCENE-2372.patch


 After LUCENE-2302 is merged to trunk with flex, we need to carry over all 
 tokenizers and consumers of the TokenStreams to the new CharTermAttribute.
 We should also think about adding an AttributeFactory that creates a subclass 
 of CharTermAttributeImpl that returns collation keys in the toBytesRef() 
 accessor. CollationKeyFilter is then obsolete; instead, you can simply convert 
 every TokenStream to index only CollationKeys by changing the attribute 
 implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2010-04-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854882#action_12854882
 ] 

Uwe Schindler commented on LUCENE-2074:
---

As requested on the mailing list, I will look into resetting the zzBuffer on 
Tokenizer.reset(Reader).

 Use a separate JFlex generated Unicode 4 by Java 5 compatible 
 StandardTokenizer
 ---

 Key: LUCENE-2074
 URL: https://issues.apache.org/jira/browse/LUCENE-2074
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
 LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch


 The current trunk version of StandardTokenizerImpl was generated by Java 1.4 
 (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should 
 regenerate the file.
 After regeneration the Tokenizer behaves differently for some characters. 
 Because of that we should only use the new TokenizerImpl when 
 Version.LUCENE_30 or LUCENE_31 is used as matchVersion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2010-04-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854886#action_12854886
 ] 

Uwe Schindler commented on LUCENE-2074:
---

I plan to commit this soon! So any patch will get outdated; that's why I want 
to fix this here. And as this patch removes direct access from the Tokenizer to 
the lexer (it is only accessible through an interface now), we have to 
change the jflex file to do it correctly.

 Use a separate JFlex generated Unicode 4 by Java 5 compatible 
 StandardTokenizer
 ---

 Key: LUCENE-2074
 URL: https://issues.apache.org/jira/browse/LUCENE-2074
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
 LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch


 The current trunk version of StandardTokenizerImpl was generated by Java 1.4 
 (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should 
 regenerate the file.
 After regeneration the Tokenizer behaves differently for some characters. 
 Because of that we should only use the new TokenizerImpl when 
 Version.LUCENE_30 or LUCENE_31 is used as matchVersion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2010-04-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854890#action_12854890
 ] 

Uwe Schindler commented on LUCENE-2074:
---

You don't need the JFlex binaries in general, only if you regenerate the source 
files (using ant jflex). And it's easy to build: check out JFlex and run mvn 
install.

 Use a separate JFlex generated Unicode 4 by Java 5 compatible 
 StandardTokenizer
 ---

 Key: LUCENE-2074
 URL: https://issues.apache.org/jira/browse/LUCENE-2074
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
 LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch


 The current trunk version of StandardTokenizerImpl was generated by Java 1.4 
 (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should 
 regenerate the file.
 After regeneration the Tokenizer behaves differently for some characters. 
 Because of that we should only use the new TokenizerImpl when 
 Version.LUCENE_30 or LUCENE_31 is used as matchVersion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2010-04-08 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2074:
--

Attachment: LUCENE-2074.patch

Here is a new patch, with the zzBuffer reset to the default size implemented in 
a separate reset(Reader) method. As yyReset is generated as final, I had to 
choose a different name.

Before applying the patch, run:

{noformat}
svn copy StandardTokenizerImpl.* to StandardTokenizerImplOrig.* 
svn move StandardTokenizerImpl.* to StandardTokenizerImpl31.* 
{noformat}

I will commit this in a day or two!

 Use a separate JFlex generated Unicode 4 by Java 5 compatible 
 StandardTokenizer
 ---

 Key: LUCENE-2074
 URL: https://issues.apache.org/jira/browse/LUCENE-2074
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
 LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch


 The current trunk version of StandardTokenizerImpl was generated by Java 1.4 
 (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should 
 regenerate the file.
 After regeneration the Tokenizer behaves differently for some characters. 
 Because of that we should only use the new TokenizerImpl when 
 Version.LUCENE_30 or LUCENE_31 is used as matchVersion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2010-04-08 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2074:
--

Attachment: LUCENE-2074.patch

Also updated the error message about missing JFlex when calling ant jflex to 
regenerate the lexers. The message now contains instructions for downloading 
and building JFlex. Also added a CHANGES.txt entry.

 Use a separate JFlex generated Unicode 4 by Java 5 compatible 
 StandardTokenizer
 ---

 Key: LUCENE-2074
 URL: https://issues.apache.org/jira/browse/LUCENE-2074
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
 LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch


 The current trunk version of StandardTokenizerImpl was generated by Java 1.4 
 (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should 
 regenerate the file.
 After regeneration the Tokenizer behaves differently for some characters. 
 Because of that we should only use the new TokenizerImpl when 
 Version.LUCENE_30 or LUCENE_31 is used as matchVersion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2010-04-08 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2074:
--

Attachment: LUCENE-2074.patch

 Use a separate JFlex generated Unicode 4 by Java 5 compatible 
 StandardTokenizer
 ---

 Key: LUCENE-2074
 URL: https://issues.apache.org/jira/browse/LUCENE-2074
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
 LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch


 The current trunk version of StandardTokenizerImpl was generated by Java 1.4 
 (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should 
 regenerate the file.
 After regeneration the Tokenizer behaves differently for some characters. 
 Because of that we should only use the new TokenizerImpl when 
 Version.LUCENE_30 or LUCENE_31 is used as matchVersion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2010-04-08 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2074:
--

Attachment: (was: LUCENE-2074.patch)

 Use a separate JFlex generated Unicode 4 by Java 5 compatible 
 StandardTokenizer
 ---

 Key: LUCENE-2074
 URL: https://issues.apache.org/jira/browse/LUCENE-2074
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
 LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch


 The current trunk version of StandardTokenizerImpl was generated by Java 1.4 
 (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should 
 regenerate the file.
 After regeneration the Tokenizer behaves differently for some characters. 
 Because of that we should only use the new TokenizerImpl when 
 Version.LUCENE_30 or LUCENE_31 is used as matchVersion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.

2010-04-08 Thread Uwe Schindler (JIRA)
Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
-

 Key: LUCENE-2384
 URL: https://issues.apache.org/jira/browse/LUCENE-2384
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: Analysis
Affects Versions: 3.0.1
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1


When indexing large documents, the lexer buffer may stay large forever. This 
sub-issue resets the lexer buffer back to the default on reset(Reader).

This is done as part of the enclosing issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2010-04-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854900#action_12854900
 ] 

Uwe Schindler commented on LUCENE-2074:
---

Created sub-issue: LUCENE-2384

 Use a separate JFlex generated Unicode 4 by Java 5 compatible 
 StandardTokenizer
 ---

 Key: LUCENE-2074
 URL: https://issues.apache.org/jira/browse/LUCENE-2074
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
 LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch


 The current trunk version of StandardTokenizerImpl was generated by Java 1.4 
 (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should 
 regenerate the file.
 After regeneration the Tokenizer behaves differently for some characters. 
 Because of that we should only use the new TokenizerImpl when 
 Version.LUCENE_30 or LUCENE_31 is used as matchVersion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.

2010-04-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854903#action_12854903
 ] 

Uwe Schindler commented on LUCENE-2384:
---

For JFlex this does not help, as the JFlex-generated code always needs a Reader. 
This is special here: the lexer does not need to load the whole document into 
the buffer, it just sometimes needs a large look-forward/backwards buffer.

 Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
 -

 Key: LUCENE-2384
 URL: https://issues.apache.org/jira/browse/LUCENE-2384
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: Analysis
Affects Versions: 3.0.1
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1


 When indexing large documents, the lexer buffer may stay large forever. This 
 sub-issue resets the lexer buffer back to the default on reset(Reader).
 This is done as part of the enclosing issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.

2010-04-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854908#action_12854908
 ] 

Uwe Schindler commented on LUCENE-2384:
---

{quote}
patch to reset the zzBuffer when the input is reset. The code is really taken 
from 
https://sourceforge.net/mailarchive/message.php?msg_id=444070.38422...@web38901.mail.mud.yahoo.com
 so I can't really grant license to use it, but I think the guy released it as 
public domain by posting it to the mailing list. 
I tested it and it seems to work for me. Just including it here in case 
somebody wants to apply the patch directly to 3.0.1 (although it's better to 
wait for 3.1)
{quote}

Your fix adds additional complexity. Just reset the buffer back to the 
default ZZ_BUFFERSIZE on reset if it has grown. Your patch always reallocates a 
new buffer.

Use this:
{code}
public final void reset(Reader r) {
  // reset to default buffer size, if buffer has grown
  if (zzBuffer.length > ZZ_BUFFERSIZE) {
    zzBuffer = new char[ZZ_BUFFERSIZE];
  }
  yyreset(r);
}
{code}

 Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
 -

 Key: LUCENE-2384
 URL: https://issues.apache.org/jira/browse/LUCENE-2384
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: Analysis
Affects Versions: 3.0.1
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: reset.diff


 When indexing large documents, the lexer buffer may stay large forever. This 
 sub-issue resets the lexer buffer back to the default on reset(Reader).
 This is done as part of the enclosing issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core

2010-04-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855136#action_12855136
 ] 

Uwe Schindler commented on LUCENE-2385:
---

The patch does not look like you svn-moved the files. To preserve history, you 
should do an svn move of the files in your local checkout and then modify them 
to reflect the package changes (if any).

Did you do this?

 Move NoDeletionPolicy from benchmark to core
 

 Key: LUCENE-2385
 URL: https://issues.apache.org/jira/browse/LUCENE-2385
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark, Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-2385.patch


 As the subject says, but I'll also make it a singleton + add some unit tests, 
 as well as some documentation. I'll post a patch hopefully today.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core

2010-04-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855150#action_12855150
 ] 

Uwe Schindler commented on LUCENE-2385:
---

In general we place a list of all svn move/copy commands together with the 
patch, executable from the root dir. If you paste those commands into your 
terminal and then apply the patch, it works. One example is the jflex issue 
(ok, the commands there are shortened).

Another possibility is to have a second checkout where you arrange the files 
correctly (svn moved/copied), and one for creating the patches.

 Move NoDeletionPolicy from benchmark to core
 

 Key: LUCENE-2385
 URL: https://issues.apache.org/jira/browse/LUCENE-2385
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark, Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-2385.patch


 As the subject says, but I'll also make it a singleton + add some unit tests, 
 as well as some documentation. I'll post a patch hopefully today.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core

2010-04-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855164#action_12855164
 ] 

Uwe Schindler commented on LUCENE-2385:
---

Yeah, that's fine!

 Move NoDeletionPolicy from benchmark to core
 

 Key: LUCENE-2385
 URL: https://issues.apache.org/jira/browse/LUCENE-2385
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark, Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-2385.patch, LUCENE-2385.patch


 As the subject says, but I'll also make it a singleton + add some unit tests, 
 as well as some documentation. I'll post a patch hopefully today.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



RE: IndexWriter memory leak?

2010-04-08 Thread Uwe Schindler
There is one possibility that could be fixed:

As Tokenizers are reused, the analyzer holds a reference to the last used 
Reader. The easy fix would be to unset the Reader in Tokenizer.close(). If this 
is the case for you, that may be easy to do. Tokenizer.close() would then look 
like this:

  /** By default, closes the input Reader. */
  @Override
  public void close() throws IOException {
    input.close();
    input = null; // <-- new!
  }

Any comments from other committers?

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Ruben Laguna [mailto:ruben.lag...@gmail.com]
 Sent: Thursday, April 08, 2010 2:50 PM
 To: java-u...@lucene.apache.org
 Subject: Re: IndexWriter memory leak?
 
  I will double-check the heapdump.hprof in the afternoon. But I think that
  *some* readers are indeed held by
  docWriter.threadStates[0].consumer.fieldHash[1].fields[],
  as shown in [1] (this heapdump contains only live objects). The heapdump
  was taken after IndexWriter.commit()/IndexWriter.optimize() and all the
  Documents were already indexed and GCed (I will double check).
 
  So that would mean that the Reader is retained in memory by the following
  chain of references,
 
  DocumentsWriter -> DocumentsWriterThreadState -> DocFieldProcessorPerThread
  -> DocFieldProcessorPerField -> Fieldable -> Field (fieldsData)
 
  I'll double check with Eclipse MAT, as I said, that this chain is actually
  made of hard references only (no SoftReferences, WeakReferences, etc). I
  will also double check that there is no live Document that is referencing
  the Reader via the Field.
 
 [1] http://img.skitch.com/20100407-b86irkp7e4uif2wq1dd4t899qb.jpg
 
 On Thu, Apr 8, 2010 at 2:16 PM, Uwe Schindler u...@thetaphi.de wrote:
 
   Readers are not held. If you indexed the document and GCed the document
   instance, the readers are gone.
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
   -Original Message-
   From: Ruben Laguna [mailto:ruben.lag...@gmail.com]
   Sent: Thursday, April 08, 2010 1:28 PM
   To: java-u...@lucene.apache.org
   Subject: Re: IndexWriter memory leak?
  
    Now that the zzBuffer issue is solved...
   
    what about the references to the Readers held by docWriter? Tika's
    ParsingReaders are quite heavyweight, so retaining those in memory
    unnecessarily is also a hidden memory leak. Should I open a bug report
    on that one?
  
   /Rubén
  
   On Thu, Apr 8, 2010 at 12:11 PM, Shai Erera ser...@gmail.com
 wrote:
  
Guess we were replying at the same time :).
   
On Thu, Apr 8, 2010 at 1:04 PM, Uwe Schindler u...@thetaphi.de
   wrote:
   
 I already answered, that I will take care of this!

 Uwe

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de


  -Original Message-
  From: Shai Erera [mailto:ser...@gmail.com]
  Sent: Thursday, April 08, 2010 12:00 PM
  To: java-u...@lucene.apache.org
  Subject: Re: IndexWriter memory leak?
 
   Yes, that's the trimBuffer version I was thinking about, only this guy
   created a reset(Reader, int) and does both ops (resetting + trim) in one
   method call. More convenient. Can you please open an issue to track that?
   People will have a chance to comment on whether we (Lucene) should handle
   that, or whether it should be a JFlex fix. Based on the number of replies
   this guy received (0 !), I doubt JFlex would consider it a problem. But we
   can do some small service to our user base by protecting against such
   problems.
  
   And while you're opening the issue, if you want to take a stab at fixing
   it and post a patch, it'd be great :).
 
  Shai
 
  On Thu, Apr 8, 2010 at 12:51 PM, Ruben Laguna
  ruben.lag...@gmail.comwrote:
 
    I was investigating this a little further and in the JFlex mailing list
    I found [1]
   
    I don't know much about flex / JFlex but it seems that this guy resets
    the zzBuffer to 16384 or less when setting the input for the lexer
  
  
   Quoted from  shef she...@ya...
  
  
    I set
   
    %buffer 0
   
    in the options section, and then added this method to the lexer:
   
    /**
     * Set the input for the lexer. The size parameter really speeds things
     * up, because by default, the lexer allocates an internal buffer of 16k.
     * For most strings, this is unnecessarily large. If the size param is 0
     * or greater than 16k, then the buffer is set to 16k. If the size param
     * is smaller, then the buf will be set to the exact size
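
In rough Java, the trimming reset described there would look something like 
this (a sketch only: zzBuffer and yyreset(Reader) are the usual JFlex-generated 
members, the 16k cap and the method name are assumptions):

{code}
public final void reset(Reader reader, int size) {
  if (size <= 0 || size > 16384) {
    size = 16384;               // cap at the former default buffer size
  }
  if (zzBuffer.length != size) {
    zzBuffer = new char[size];  // trim (or grow) the internal buffer
  }
  yyreset(reader);              // standard generated reset, reuses zzBuffer
}
{code}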

[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2010-04-08 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2074:
--

Attachment: LUCENE-2074.patch

New patch that replaces the deprecated TermAttribute with CharTermAttribute. It 
also fixes the reset()/reset(Reader) methods to conform to all other Tokenizers 
and the documentation; the current one was resetting multiple times. This has 
no effect on backwards compatibility. Also improved the JFlex classpath 
detection to work with svn checkouts or future release zips.

I will commit this soon, once all tests have run.

 Use a separate JFlex generated Unicode 4 by Java 5 compatible 
 StandardTokenizer
 ---

 Key: LUCENE-2074
 URL: https://issues.apache.org/jira/browse/LUCENE-2074
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
 LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch


 The current trunk version of StandardTokenizerImpl was generated by Java 1.4 
 (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should 
 regenerate the file.
 After regeneration the Tokenizer behaves differently for some characters. 
 Because of that we should only use the new TokenizerImpl when 
 Version.LUCENE_30 or LUCENE_31 is used as matchVersion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2376) java.lang.OutOfMemoryError:Java heap space

2010-04-07 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854396#action_12854396
 ] 

Uwe Schindler commented on LUCENE-2376:
---

You mean an insane number of fields with norms...?

 java.lang.OutOfMemoryError:Java heap space
 --

 Key: LUCENE-2376
 URL: https://issues.apache.org/jira/browse/LUCENE-2376
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.9.1
 Environment: Windows
Reporter: Shivender Devarakonda
 Attachments: InfoStreamOutput.txt


 I see an OutOfMemory error in our product and it is happening when we have 
 some data objects on which we built the index. I see the following 
 OutOfmemory error, this is happening after we call Indexwriter.optimize():
 4/06/10 02:03:42.160 PM PDT [ERROR] [Lucene Merge Thread #12]  In thread 
 Lucene Merge Thread #12 and the message is 
 org.apache.lucene.index.MergePolicy$MergeException: 
 java.lang.OutOfMemoryError: Java heap space
 4/06/10 02:03:42.207 PM PDT [VERBOSE] [Lucene Merge Thread #12] [Manager] 
 Uncaught Exception in thread Lucene Merge Thread #12
 org.apache.lucene.index.MergePolicy$MergeException: 
 java.lang.OutOfMemoryError: Java heap space
   at 
 org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:351)
   at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:315)
 Caused by: java.lang.OutOfMemoryError: Java heap space
   at java.util.HashMap.resize(HashMap.java:462)
   at java.util.HashMap.addEntry(HashMap.java:755)
   at java.util.HashMap.put(HashMap.java:385)
   at org.apache.lucene.index.FieldInfos.addInternal(FieldInfos.java:256)
   at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:366)
   at org.apache.lucene.index.FieldInfos.init(FieldInfos.java:71)
   at 
 org.apache.lucene.index.SegmentReader$CoreReaders.init(SegmentReader.java:116)
   at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638)
   at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:608)
   at 
 org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:686)
   at 
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4979)
   at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4614)
   at 
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
   at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)
 4/06/10 02:03:42.895 PM PDT [ERROR]  this writer hit an OutOfMemoryError; 
 cannot complete optimize

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2380) Add FieldCache.getTermBytes, to load term data as byte[]

2010-04-07 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854594#action_12854594
 ] 

Uwe Schindler commented on LUCENE-2380:
---

The structure should look like String and StringIndex, but I am not sure if we 
need real BytesRefs. In my opinion, it should be an array of byte[], where each 
byte[] is allocated with the term size from the enum's BytesRef and copied 
over. This is no problem, as the terms need to be copied either way, because 
the BytesRef from the enum is reused. The only problem is that byte[] is 
missing the cool BytesRef methods like utf8ToString() that may be needed by 
consumers.

getStrings and getStringIndex should be deprecated. We cannot emulate them 
using BytesRef.utf8ToString, as the String[] arrays are raw and allow no 
wrapping. If FieldCache used accessor methods and not raw arrays, we would not 
have that problem...
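
A rough sketch of the copy-over idea (a sketch only, against the flex TermsEnum 
that returns a reused BytesRef; numTerms and termsEnum are assumed to come from 
the surrounding loading code):

{code}
byte[][] termBytes = new byte[numTerms][];
int upto = 0;
BytesRef term;
while ((term = termsEnum.next()) != null) {
  // the enum reuses its BytesRef, so each term must be copied into its own byte[]
  byte[] copy = new byte[term.length];
  System.arraycopy(term.bytes, term.offset, copy, 0, term.length);
  termBytes[upto++] = copy;
}
{code}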

 Add FieldCache.getTermBytes, to load term data as byte[]
 

 Key: LUCENE-2380
 URL: https://issues.apache.org/jira/browse/LUCENE-2380
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 3.1


 With flex, a term is now an opaque byte[] (typically, utf8 encoded unicode 
 string, but not necessarily), so we need to push this up the search stack.
 FieldCache now has getStrings and getStringIndex; we need corresponding 
 methods to load terms as native byte[], since in general they may not be 
 representable as String.  This should be quite a bit more RAM efficient too, 
 for US ascii content since each character would then use 1 byte not 2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2380) Add FieldCache.getTermBytes, to load term data as byte[]

2010-04-07 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854639#action_12854639
 ] 

Uwe Schindler commented on LUCENE-2380:
---

This goes again in the direction of not having arrays in FieldCache anymore, 
but instead having accessor methods that take a docid and give back the data 
(possibly as a reference). So getBytes(docid) returns a reused BytesRef with 
offset and length of the requested term. For native types we should also move 
away from arrays and only provide accessor methods. Java is fast and the JIT 
possibly inlines the method call. So for native types we could also use a 
FloatBuffer or ByteBuffer or whatever from java.nio.
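
A minimal sketch of what such an accessor-style API could look like (interface 
name and signatures are assumptions, not a committed design):

{code}
public interface DocTermsAccessor {
  /** Fills the reusable BytesRef with the term bytes of the given doc and returns it. */
  BytesRef getBytes(int docID, BytesRef reuse);

  /** Accessor-style replacement for the raw float[] of getFloats(). */
  float getFloat(int docID);
}
{code}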

 Add FieldCache.getTermBytes, to load term data as byte[]
 

 Key: LUCENE-2380
 URL: https://issues.apache.org/jira/browse/LUCENE-2380
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 3.1


 With flex, a term is now an opaque byte[] (typically, utf8 encoded unicode 
 string, but not necessarily), so we need to push this up the search stack.
 FieldCache now has getStrings and getStringIndex; we need corresponding 
 methods to load terms as native byte[], since in general they may not be 
 representable as String.  This should be quite a bit more RAM efficient too, 
 for US ascii content since each character would then use 1 byte not 2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2383) Some small fixes after the flex merge...

2010-04-07 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854681#action_12854681
 ] 

Uwe Schindler commented on LUCENE-2383:
---

FCRF looks ok, I would only change the nextDoc() loop in the deletions-aware 
iterator to:

{code}
do {
  doc++;
  if (doc >= maxDoc) return NO_MORE_DOCS;
} while (skipDocs.get(doc) || !matchDoc(doc));
return doc;
{code}

and the same in advance(), changed a little:

{code}
for (int doc = target; doc < maxDoc; doc++) {
  if (!skipDocs.get(doc) && matchDoc(doc))
    return doc;
}
return NO_MORE_DOCS;
{code}

The try/catch is then unneeded. This seems clearer to me. The non-skipDocs 
iterator performs better with the try...catch, as it saves one bounds check. 
But here we need the bounds check in all cases, so why not do it up-front?

 Some small fixes after the flex merge...
 

 Key: LUCENE-2383
 URL: https://issues.apache.org/jira/browse/LUCENE-2383
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2383.patch


 Changes:
   * Re-introduced specialization optimization to FieldCacheRangeQuery;
 also fixed bug (was failing to check deletions in advance)
   * Changes 2 checkIndex methods from protected -> public
   * Add some missing null checks when calling MultiFields.getFields or
 IndexReader.fields()
   * Tweak'd CHANGES a bit
   * Removed some small dead code

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-2383) Some small fixes after the flex merge...

2010-04-07 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854681#action_12854681
 ] 

Uwe Schindler edited comment on LUCENE-2383 at 4/7/10 8:23 PM:
---

FCRF looks ok, I would only change the nextDoc() loop in the deletions-aware 
iterator to:

{code}
do {
  doc++;
  if (doc >= maxDoc) return NO_MORE_DOCS;
} while (skipDocs.get(doc) || !matchDoc(doc));
return doc;
{code}

and the same in advance(), changed a little:

{code}
for (doc = target; doc < maxDoc; doc++) {
  if (!skipDocs.get(doc) && matchDoc(doc))
    return doc;
}
return NO_MORE_DOCS;
{code}

The try/catch is then unneeded. This seems clearer to me. The non-skipDocs 
iterator performs better with the try...catch, as it saves one bounds check. 
But here we need the bounds check in all cases, so why not do it up-front?

  was (Author: thetaphi):
FCRF looks ok, I would only change the nextDoc() loop in the 
deletions-aware iterator to:

{code}
do {
  doc++;
  if (doc >= maxDoc) return NO_MORE_DOCS;
} while (skipDocs.get(doc) || !matchDoc(doc));
return doc;
{code}

and the same in advance(), changed a little:

{code}
for (int doc = target; doc < maxDoc; doc++) {
  if (!skipDocs.get(doc) && matchDoc(doc))
    return doc;
}
return NO_MORE_DOCS;
{code}

The try/catch is then unneeded. This seems clearer to me. The non-skipDocs 
iterator performs better with the try...catch, as it saves one bounds check. 
But here we need the bounds check in all cases, so why not do it up-front?
  
 Some small fixes after the flex merge...
 

 Key: LUCENE-2383
 URL: https://issues.apache.org/jira/browse/LUCENE-2383
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2383.patch


 Changes:
   * Re-introduced specialization optimization to FieldCacheRangeQuery;
 also fixed bug (was failing to check deletions in advance)
   * Changes 2 checkIndex methods from protected -> public
   * Add some missing null checks when calling MultiFields.getFields or
 IndexReader.fields()
   * Tweak'd CHANGES a bit
   * Removed some small dead code

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-2383) Some small fixes after the flex merge...

2010-04-07 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854681#action_12854681
 ] 

Uwe Schindler edited comment on LUCENE-2383 at 4/7/10 8:24 PM:
---

FCRF looks ok, I would only change the nextDoc() loop in the deletions-aware 
iterator to:

{code}
do {
  doc++;
  if (doc >= maxDoc)
    return doc = NO_MORE_DOCS;
} while (skipDocs.get(doc) || !matchDoc(doc));
return doc;
{code}

and the same in advance(), changed a little:

{code}
for (doc = target; doc < maxDoc; doc++) {
  if (!skipDocs.get(doc) && matchDoc(doc))
    return doc;
}
return doc = NO_MORE_DOCS;
{code}

The try/catch is then unneeded. This seems clearer to me. The non-skipDocs 
iterator performs better with the try...catch, as it saves one bounds check. 
But here we need the bounds check in all cases, so why not do it up-front?

  was (Author: thetaphi):
FCRF looks ok, I would only change the nextDoc() loop in the 
deletions-aware iterator to:

{code}
do {
  doc++;
  if (doc >= maxDoc) return NO_MORE_DOCS;
} while (skipDocs.get(doc) || !matchDoc(doc));
return doc;
{code}

and the same in advance(), changed a little:

{code}
for (doc = target; doc < maxDoc; doc++) {
  if (!skipDocs.get(doc) && matchDoc(doc))
    return doc;
}
return NO_MORE_DOCS;
{code}

The try/catch is then unneeded. This seems clearer to me. The non-skipDocs 
iterator performs better with the try...catch, as it saves one bounds check. 
But here we need the bounds check in all cases, so why not do it up-front?
  
 Some small fixes after the flex merge...
 

 Key: LUCENE-2383
 URL: https://issues.apache.org/jira/browse/LUCENE-2383
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2383.patch


 Changes:
   * Re-introduced specialization optimization to FieldCacheRangeQuery;
 also fixed bug (was failing to check deletions in advance)
   * Changes 2 checkIndex methods from protected -> public
   * Add some missing null checks when calling MultiFields.getFields or
 IndexReader.fields()
   * Tweak'd CHANGES a bit
   * Removed some small dead code

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



RE: Commit freeze in flex branch

2010-04-07 Thread Uwe Schindler
Thanks for praise! And also thanks to Mike for scanning 20K patch lines :-)

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Wednesday, April 07, 2010 10:13 PM
 To: java-dev@lucene.apache.org
 Subject: Re: Commit freeze in flex branch
 
 Yes +1 to that -- thanks Uwe!!
 
 And thanks for the many other people who helped out on flex.  It's a
 big and exciting improvement :)
 
 Mike
 
 On Wed, Apr 7, 2010 at 4:11 PM, Michael Busch busch...@gmail.com
 wrote:
  Uwe, thanks for doing all the svn work!  Was a smooth transition!
 
   Michael
 
  On 4/6/10 12:27 PM, Uwe Schindler wrote:
 
  The freeze is over, we merged successfully.
 
  If you had a flex branch checked out:
   svn switch https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene
 
  Uwe
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
 
  -Original Message-
  From: Uwe Schindler [mailto:u...@thetaphi.de]
  Sent: Tuesday, April 06, 2010 12:51 PM
  To: java-dev@lucene.apache.org
  Subject: Commit freeze in flex branch
 
  I am trying to reintegrate the flex branch into current trunk.
 After
  this has done, no more commits to flex! (after a reintegrate, the
 svn
  book says, that you should not touch the branch anymore) - Flex
  development can then proceed in trunk. It may happen that solr
  compilation/tests fail (because of recent changes in flex branch),
 I
  will fix this separately, so please do not complain, just let solr
  broken for a short time!
 
  It would be good if nobody would commit anything to flex anymore!
 After
  the merge, you can switch your flex checkouts.
 
  Before committing the merge, I will post a mega patch for review,
 that
  we have not missed anything during trunk-flex merges.
 
  Commits to trunk are OK, but should be spare.
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
 
 
  -
  To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: java-dev-h...@lucene.apache.org
 
 
 
  
 -
  To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: java-dev-h...@lucene.apache.org
 
 
 
 
 
  -
  To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: java-dev-h...@lucene.apache.org
 
 
 
 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Commit freeze in flex branch

2010-04-06 Thread Uwe Schindler
I am trying to reintegrate the flex branch into current trunk. After this is 
done, no more commits to flex! (After a reintegrate, the svn book says you 
should not touch the branch anymore.) Flex development can then proceed in 
trunk. It may happen that Solr compilation/tests fail (because of recent 
changes in the flex branch); I will fix this separately, so please do not 
complain, just leave Solr broken for a short time!

It would be good if nobody committed anything to flex anymore! After the 
merge, you can switch your flex checkouts.

Before committing the merge, I will post a mega patch for review, so that we 
can verify we have not missed anything during the trunk-to-flex merges.

Commits to trunk are OK, but should be sparse.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de




-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2370) Reintegrate flex branch into trunk

2010-04-06 Thread Uwe Schindler (JIRA)
Reintegrate flex branch into trunk
--

 Key: LUCENE-2370
 URL: https://issues.apache.org/jira/browse/LUCENE-2370
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1


This issue is for reintegrating the flex branch into current trunk. I will post 
the patch here for review, and commit it when all contributors to flex have 
reviewed the patch.

Before committing, I will tag both trunk and flex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2370) Reintegrate flex branch into trunk

2010-04-06 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2370:
--

Attachment: LUCENE-2370.patch

Here is the patch, just for review!

You cannot really apply it, as it does not contain changes that are simply svn 
copied from flex (i.e., all new files added by flex). The idea behind this 
patch is only that everybody working on flex should scroll through it and 
verify that the actually changed files are fine; e.g. that we did not miss a 
trunk change in flex (such a missing merge would appear as a revert in the 
patch).

My working copy tests fine; only Solr is not compiling anymore, because of 
recent changes to the internal NumericUtils class that are not backwards 
compatible. I will commit this patch first and break Solr, but will fix it 
soon!

 Reintegrate flex branch into trunk
 --

 Key: LUCENE-2370
 URL: https://issues.apache.org/jira/browse/LUCENE-2370
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2370.patch


 This issue is for reintegrating the flex branch into current trunk. I will 
 post the patch here for review and commit, when all contributors to flex have 
 reviewed the patch.
 Before committing, I will tag both trunk and flex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2370) Reintegrate flex branch into trunk

2010-04-06 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2370:
--

Attachment: (was: LUCENE-2370.patch)

 Reintegrate flex branch into trunk
 --

 Key: LUCENE-2370
 URL: https://issues.apache.org/jira/browse/LUCENE-2370
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1


 This issue is for reintegrating the flex branch into current trunk. I will 
 post the patch here for review and commit, when all contributors to flex have 
 reviewed the patch.
 Before committing, I will tag both trunk and flex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2370) Reintegrate flex branch into trunk

2010-04-06 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2370:
--

Attachment: LUCENE-2370.patch

Sorry, new patch.

The flex branch still contains some whitespace problems in contrib, but this is 
OK for now. I will check them and fix what I find.

 Reintegrate flex branch into trunk
 --

 Key: LUCENE-2370
 URL: https://issues.apache.org/jira/browse/LUCENE-2370
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2370.patch


 This issue is for reintegrating the flex branch into current trunk. I will 
 post the patch here for review and commit, when all contributors to flex have 
 reviewed the patch.
 Before committing, I will tag both trunk and flex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2370) Reintegrate flex branch into trunk

2010-04-06 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2370:
--

Attachment: LUCENE-2370.patch

Here is a new patch with lots of cleanups, thanks rmuir. Also reverted files 
with whitespace-only changes.

 Reintegrate flex branch into trunk
 --

 Key: LUCENE-2370
 URL: https://issues.apache.org/jira/browse/LUCENE-2370
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2370.patch, LUCENE-2370.patch


 This issue is for reintegrating the flex branch into current trunk. I will 
 post the patch here for review and commit, when all contributors to flex have 
 reviewed the patch.
 Before committing, I will tag both trunk and flex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2370) Reintegrate flex branch into trunk

2010-04-06 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2370:
--

Attachment: LUCENE-2370-solrfixes.patch

Here are some fixes for Solr:
- makes it compile after the flex merge
- has some really dirty hacks: numeric field contents should no longer be seen 
as Strings, they are now BytesRefs. This affects AnalysisRequestHandler and 
also the converter methods in the TrieField type. They should use BytesRefs 
after flex has landed.

 Reintegrate flex branch into trunk
 --

 Key: LUCENE-2370
 URL: https://issues.apache.org/jira/browse/LUCENE-2370
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2370-solrfixes.patch, LUCENE-2370.patch, 
 LUCENE-2370.patch


 This issue is for reintegrating the flex branch into current trunk. I will 
 post the patch here for review and commit, when all contributors to flex have 
 reviewed the patch.
 Before committing, I will tag both trunk and flex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2370) Reintegrate flex branch into trunk

2010-04-06 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2370:
--

Attachment: LUCENE-2370.patch

New patch, reverted all files with whitespace-only changes.

 Reintegrate flex branch into trunk
 --

 Key: LUCENE-2370
 URL: https://issues.apache.org/jira/browse/LUCENE-2370
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2370-solrfixes.patch, LUCENE-2370.patch, 
 LUCENE-2370.patch, LUCENE-2370.patch


 This issue is for reintegrating the flex branch into current trunk. I will 
 post the patch here for review and commit, when all contributors to flex have 
 reviewed the patch.
 Before committing, I will tag both trunk and flex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2370) Reintegrate flex branch into trunk

2010-04-06 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2370:
--

Attachment: LUCENE-2370.patch

Here is the final patch after cooperative review on IRC. I will now commit the 
merge for Solr+Lucene.

The following points are still broken:
- DirectoryReader re-added a bug (Mike McCandless knows about it)
- TestIndexWriterReader in trunk and backwards has some tests commented out; 
they relate to the above problem

 Reintegrate flex branch into trunk
 --

 Key: LUCENE-2370
 URL: https://issues.apache.org/jira/browse/LUCENE-2370
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2370-solrfixes.patch, LUCENE-2370.patch, 
 LUCENE-2370.patch, LUCENE-2370.patch, LUCENE-2370.patch


 This issue is for reintegrating the flex branch into current trunk. I will 
 post the patch here for review and commit, when all contributors to flex have 
 reviewed the patch.
 Before committing, I will tag both trunk and flex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2370) Reintegrate flex branch into trunk

2010-04-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854134#action_12854134
 ] 

Uwe Schindler commented on LUCENE-2370:
---

Committed revision: 931278

I leave the issue open until the bugs are fixed.

 Reintegrate flex branch into trunk
 --

 Key: LUCENE-2370
 URL: https://issues.apache.org/jira/browse/LUCENE-2370
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2370-solrfixes.patch, LUCENE-2370.patch, 
 LUCENE-2370.patch, LUCENE-2370.patch, LUCENE-2370.patch


 This issue is for reintegrating the flex branch into current trunk. I will 
 post the patch here for review and commit, when all contributors to flex have 
 reviewed the patch.
 Before committing, I will tag both trunk and flex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



RE: Commit freeze in flex branch

2010-04-06 Thread Uwe Schindler
The freeze is over, we merged successfully.

If you had a flex branch checked out:
 svn switch https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: Uwe Schindler [mailto:u...@thetaphi.de]
 Sent: Tuesday, April 06, 2010 12:51 PM
 To: java-dev@lucene.apache.org
 Subject: Commit freeze in flex branch
 
 I am trying to reintegrate the flex branch into current trunk. After
 this has done, no more commits to flex! (after a reintegrate, the svn
 book says, that you should not touch the branch anymore) - Flex
 development can then proceed in trunk. It may happen that solr
 compilation/tests fail (because of recent changes in flex branch), I
 will fix this separately, so please do not complain, just let solr
 broken for a short time!
 
 It would be good if nobody would commit anything to flex anymore! After
 the merge, you can switch your flex checkouts.
 
 Before committing the merge, I will post a mega patch for review, that
 we have not missed anything during trunk-flex merges.
 
 Commits to trunk are OK, but should be spare.
 
 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
 
 
 
 
 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2370) Reintegrate flex branch into trunk

2010-04-06 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-2370.
---

Resolution: Fixed

Mike fixed the missing merges! Thanks.

 Reintegrate flex branch into trunk
 --

 Key: LUCENE-2370
 URL: https://issues.apache.org/jira/browse/LUCENE-2370
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2370-solrfixes.patch, LUCENE-2370.patch, 
 LUCENE-2370.patch, LUCENE-2370.patch, LUCENE-2370.patch


 This issue is for reintegrating the flex branch into current trunk. I will 
 post the patch here for review and commit, when all contributors to flex have 
 reviewed the patch.
 Before committing, I will tag both trunk and flex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-2332) Merge CharTermAttribute and deprecations to trunk

2010-04-06 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler closed LUCENE-2332.
-

Resolution: Invalid

Flex was merged, so this is no longer needed.

 Merge CharTermAttribute and deprecations to trunk
 

 Key: LUCENE-2332
 URL: https://issues.apache.org/jira/browse/LUCENE-2332
 Project: Lucene - Java
  Issue Type: New Feature
Affects Versions: 3.1
Reporter: Uwe Schindler
Assignee: Uwe Schindler

 This should be merged to trunk before flex lands, so the analyzers can be 
 ported to the new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute

2010-04-06 Thread Uwe Schindler (JIRA)
Replace deprecated TermAttribute by new CharTermAttribute
-

 Key: LUCENE-2372
 URL: https://issues.apache.org/jira/browse/LUCENE-2372
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1
Reporter: Uwe Schindler
 Fix For: 3.1


After LUCENE-2302 is merged to trunk with flex, we need to carry over all 
tokenizers and consumers of TokenStreams to the new CharTermAttribute.

We should also think about adding an AttributeFactory that creates a subclass 
of CharTermAttributeImpl that returns collation keys from the toBytesRef() 
accessor. CollationKeyFilter would then be obsolete; instead you could convert 
any TokenStream to index only CollationKeys by changing the attribute 
implementation.
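
A minimal sketch of that factory idea (the class name and the toBytesRef() 
override follow the proposal above; nothing here is a committed API):

{code}
import java.text.Collator;
import org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl;
import org.apache.lucene.util.Attribute;
import org.apache.lucene.util.AttributeImpl;
import org.apache.lucene.util.AttributeSource;
import org.apache.lucene.util.BytesRef;

public final class CollationAttributeFactory extends AttributeSource.AttributeFactory {
  private final AttributeSource.AttributeFactory delegate;
  private final Collator collator;

  public CollationAttributeFactory(AttributeSource.AttributeFactory delegate, Collator collator) {
    this.delegate = delegate;
    this.collator = collator;
  }

  @Override
  public AttributeImpl createAttributeInstance(Class<? extends Attribute> attClass) {
    // only the term attribute gets the collating subclass; everything else is delegated
    if (attClass.isAssignableFrom(CharTermAttributeImpl.class)) {
      return new CharTermAttributeImpl() {
        @Override
        public BytesRef toBytesRef() { // accessor name taken from the proposal above
          return new BytesRef(collator.getCollationKey(toString()).toByteArray());
        }
      };
    }
    return delegate.createAttributeInstance(attClass);
  }
}
{code}

A Tokenizer constructed with such a factory would then index collation keys 
directly, without an extra CollationKeyFilter in the chain.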

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2302) Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable)

2010-04-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854199#action_12854199
 ] 

Uwe Schindler commented on LUCENE-2302:
---

I will create a patch with option #2 and lots of documentation and changed 
backwards tests.

 Replacement for TermAttribute+Impl with extended capabilities (byte[] 
 support, CharSequence, Appendable)
 

 Key: LUCENE-2302
 URL: https://issues.apache.org/jira/browse/LUCENE-2302
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch, 
 LUCENE-2302.patch, LUCENE-2302.patch


 For flexible indexing terms can be simple byte[] arrays, while the current 
 TermAttribute only supports char[]. This is fine for plain text, but e.g 
 NumericTokenStream should directly work on the byte[] array.
 Also TermAttribute lacks of some interfaces that would make it simplier for 
 users to work with them: Appendable and CharSequence
 I propose to create a new interface CharTermAttribute with a clean new API 
 that concentrates on CharSequence and Appendable.
 The implementation class will simply support the old and new interface 
 working on the same term buffer. DEFAULT_ATTRIBUTE_FACTORY will take care of 
 this. So if somebody adds a TermAttribute, he will get an implementation 
 class that can be also used as CharTermAttribute. As both attributes create 
 the same impl instance both calls to addAttribute are equal. So a TokenFilter 
 that adds CharTermAttribute to the source will work with the same instance as 
 the Tokenizer that requested the (deprecated) TermAttribute.
 To also support byte[] only terms like Collation or NumericField needs, a 
 separate getter-only interface will be added, that returns a reusable 
 BytesRef, e.g. BytesRefGetterAttribute. The default implementation class will 
 also support this interface. For backwards compatibility with old 
 self-made-TermAttribute implementations, the indexer will check with 
 hasAttribute(), if the BytesRef getter interface is there and if not will 
 wrap a old-style TermAttribute (a deprecated wrapper class will be provided): 
 new BytesRefGetterAttributeWrapper(TermAttribute), that is used by the 
 indexer then.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2374) Add introspection API to AttributeSource/AttributeImpl

2010-04-06 Thread Uwe Schindler (JIRA)
Add introspection API to AttributeSource/AttributeImpl
--

 Key: LUCENE-2374
 URL: https://issues.apache.org/jira/browse/LUCENE-2374
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis, Other
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1


AttributeSource/TokenStream inspection in Solr needs to have some insight into 
the contents of AttributeImpls. As LUCENE-2302 has some problems with 
toString() [which is not structured and conflicts with CharSequence's 
definition for CharTermAttribute], I propose a simple API that gets a default 
implementation in AttributeImpl (just like toString() currently):

- Iterator<Map.Entry<String,?>> AttributeImpl.contentsIterator() returns an 
iterator (for most attributes it is a singleton) of key-value pairs, e.g. 
term -> "foobar", startOffset -> Integer.valueOf(0), ...
- AttributeSource gets the same method; it just concatenates the iterators of 
each AttributeImpl from getAttributeImplsIterator()

No backwards problems occur, as the default toString() method will work like 
before (it just gets the iterator and lists the entries), but we simply remove 
the documentation for the format. (Char)TermAttribute gets a special impl of 
toString() according to CharSequence and a corresponding iterator.

I also want to remove the abstract hashCode() and equals() methods from 
AttributeImpl, as they are not needed and just create work for the implementor.
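
For illustration, a concrete attribute could implement the proposed method 
roughly like this (a sketch under the proposed signature; the raw cast just 
keeps the example short):

{code}
// e.g. in OffsetAttributeImpl (sketch, not a patch; imports from java.util):
@SuppressWarnings({"unchecked", "rawtypes"})
public Iterator<Map.Entry<String, ?>> contentsIterator() {
  Map<String, Object> contents = new LinkedHashMap<String, Object>();
  contents.put("startOffset", Integer.valueOf(startOffset));
  contents.put("endOffset", Integer.valueOf(endOffset));
  return (Iterator) contents.entrySet().iterator();
}
{code}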

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2375) Add introspection API to AttributeSource/AttributeImpl

2010-04-06 Thread Uwe Schindler (JIRA)
Add introspection API to AttributeSource/AttributeImpl
--

 Key: LUCENE-2375
 URL: https://issues.apache.org/jira/browse/LUCENE-2375
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis, Other
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1


AttributeSource/TokenStream inspection in Solr needs to have some insight into 
the contents of AttributeImpls. As LUCENE-2302 has some problems with 
toString() [which is not structured and conflicts with CharSequence's 
definition for CharTermAttribute], I propose a simple API that gets a default 
implementation in AttributeImpl (just like toString() currently):

- Iterator<Map.Entry<String,?>> AttributeImpl.contentsIterator() returns an 
iterator (for most attributes it is a singleton) of key-value pairs, e.g. 
term -> "foobar", startOffset -> Integer.valueOf(0), ...
- AttributeSource gets the same method; it just concatenates the iterators of 
each AttributeImpl from getAttributeImplsIterator()

No backwards problems occur, as the default toString() method will work like 
before (it just gets the iterator and lists the entries), but we simply remove 
the documentation for the format. (Char)TermAttribute gets a special impl of 
toString() according to CharSequence and a corresponding iterator.

I also want to remove the abstract hashCode() and equals() methods from 
AttributeImpl, as they are not needed and just create work for the implementor.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Deleted: (LUCENE-2375) Add introspection API to AttributeSource/AttributeImpl

2010-04-06 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler deleted LUCENE-2375:
--


 Add introspection API to AttributeSource/AttributeImpl
 --

 Key: LUCENE-2375
 URL: https://issues.apache.org/jira/browse/LUCENE-2375
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler

 AttributeSource/TokenStream inspection in Solr needs to have some insight 
 into the contents of AttributeImpls. As LUCENE-2302 has some problems with 
 toString() [which is not structured and conflicts with CharSequence's 
 definition for CharTermAttribute], I propose a simple API that gets a default 
 implementation in AttributeImpl (just like toString() currently):
 - Iterator<Map.Entry<String,?>> AttributeImpl.contentsIterator() returns an 
 iterator (for most attributes it is a singleton) of key-value pairs, e.g. 
 term -> "foobar", startOffset -> Integer.valueOf(0), ...
 - AttributeSource gets the same method; it just concatenates the iterators of 
 each AttributeImpl from getAttributeImplsIterator()
 No backwards problems occur, as the default toString() method will work like 
 before (it just gets the iterator and lists the entries), but we simply 
 remove the documentation for the format. (Char)TermAttribute gets a special 
 impl of toString() according to CharSequence and a corresponding iterator.
 I also want to remove the abstract hashCode() and equals() methods from 
 AttributeImpl, as they are not needed and just create work for the 
 implementor.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]

2010-04-05 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-2354.
---

   Resolution: Fixed
Lucene Fields: [New, Patch Available]  (was: [New])

Committed revision: 930821

 Convert NumericUtils and NumericTokenStream to use BytesRef instead of 
 Strings/char[]
 -

 Key: LUCENE-2354
 URL: https://issues.apache.org/jira/browse/LUCENE-2354
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: Flex Branch

 Attachments: LUCENE-2354.patch, LUCENE-2354.patch, LUCENE-2354.patch


 After LUCENE-2302, we should use TermToBytesRefAttribute to index using 
 NumericTokenStream. This also should convert the whole NumericUtils to use 
 BytesRef when converting numerics.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2364) Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery Co.

2010-04-05 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12853336#action_12853336
 ] 

Uwe Schindler commented on LUCENE-2364:
---

This would also make MTQ's rewrite-mode internal collectors better, as they 
currently convert BytesRef terms from the enums to String Terms, pass them to 
TermQuery, and convert back inside TermScorer. With real binary terms (numerics 
are not yet truly binary, they are UTF-8-conformant ASCII bytes), this would 
break.

 Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery  
 Co.
 -

 Key: LUCENE-2364
 URL: https://issues.apache.org/jira/browse/LUCENE-2364
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: Flex Branch
Reporter: Uwe Schindler
 Fix For: Flex Branch


 It would be good to directly allow BytesRefs in TermQuery and TermRangeQuery 
 (as both queries convert the strings to BytesRef internally). For 
 NumericRange support in Solr it will be needed to support numerics as ByteRef 
 in single-term queries.
 When this will be added, don't forget to change TestNumericRangeQueryXX to 
 use the BytesRef ctor of TRQ.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]

2010-04-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2354:
--

Attachment: LUCENE-2354.patch

Here is an updated patch with a cleaned-up NumericUtils (no String methods 
anymore). For now I just commented them out, in case we want to reactivate 
parts of them. Before the release the methods should be removed.

I changed all tests (and deactivated tests in backwards) that used those String 
methods. Also rewrote the CartesianShapeFilter in contrib/spatial to use the 
flex API (optimized for the one-term case without OpenBitSet allocation). Also 
changed the spatial tests to use NumericField.
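
For reference, iterating one term's docs via the flex API looks roughly like 
this (a sketch; "field"/"42" and the bit set are placeholders, and exact 
signatures may differ):

{code}
Terms terms = MultiFields.getFields(reader).terms("field");
if (terms != null) {
  TermsEnum termsEnum = terms.iterator();
  if (termsEnum.seek(new BytesRef("42")) == TermsEnum.SeekStatus.FOUND) {
    DocsEnum docsEnum = termsEnum.docs(null, null); // null skipDocs: deletions ignored here
    int doc;
    while ((doc = docsEnum.nextDoc()) != DocsEnum.NO_MORE_DOCS) {
      bits.set(doc); // mark the match in the filter's bit set
    }
  }
}
{code}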

 Convert NumericUtils and NumericTokenStream to use BytesRef instead of 
 Strings/char[]
 -

 Key: LUCENE-2354
 URL: https://issues.apache.org/jira/browse/LUCENE-2354
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: Flex Branch

 Attachments: LUCENE-2354.patch, LUCENE-2354.patch


 After LUCENE-2302, we should use TermToBytesRefAttribute to index using 
 NumericTokenStream. This also should convert the whole NumericUtils to use 
 BytesRef when converting numerics.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2364) Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery Co.

2010-04-04 Thread Uwe Schindler (JIRA)
Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery  
Co.
-

 Key: LUCENE-2364
 URL: https://issues.apache.org/jira/browse/LUCENE-2364
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: Flex Branch
Reporter: Uwe Schindler
 Fix For: Flex Branch


It would be good to directly allow BytesRefs in TermQuery and TermRangeQuery 
(as both queries convert the strings to BytesRef internally). For NumericRange 
support in Solr we will need to support numerics as BytesRef in single-term 
queries.

When this is added, don't forget to change TestNumericRangeQueryXX to use the 
BytesRef ctor of TRQ.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]

2010-04-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2354:
--

Attachment: LUCENE-2354.patch

Updated patch with lots of javadoc cleanups and new getPrefixCodedXxxShift() 
methods. Also optimized some methods.

I will commit this tomorrow!

 Convert NumericUtils and NumericTokenStream to use BytesRef instead of 
 Strings/char[]
 -

 Key: LUCENE-2354
 URL: https://issues.apache.org/jira/browse/LUCENE-2354
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: Flex Branch

 Attachments: LUCENE-2354.patch, LUCENE-2354.patch, LUCENE-2354.patch


 After LUCENE-2302, we should use TermToBytesRefAttribute to index using 
 NumericTokenStream. This also should convert the whole NumericUtils to use 
 BytesRef when converting numerics.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2363) Classes BooleanFilter and FilterClause missing in 2.2

2010-04-03 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-2363.
---

Resolution: Invalid

These classes are in the queries contrib, not in lucene-core. So you have to 
add lucene-queries.jar to your classpath (it's in the contrib subfolder). Also, 
bugs in version 2.2 will no longer be fixed; current versions are 2.9.2 and 
3.0.1.

 Classes BooleanFilter and FilterClause missing in 2.2
 -

 Key: LUCENE-2363
 URL: https://issues.apache.org/jira/browse/LUCENE-2363
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.2
 Environment: Windows
Reporter: Amit Wamburkar

 I downloaded lucene-core-2.2.0.jar and started using it. But when i tried to 
 created objects of the classes: BooleanFilter and FilterClause , could not 
 find them in the jar. In fact i want to use them so that i can get rid of 
 BooleanQuery which is causing exception BooleanQuery$TooManyClauses. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-2363) Classes BooleanFilter and FilterClause missing in 2.2

2010-04-03 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler closed LUCENE-2363.
-





RE: Landing the flex branch

2010-04-01 Thread Uwe Schindler
Hi,

we should think about how to merge the changes back to trunk. I can try this 
out during the weekend, but merging back can be very hard. So we have the 
following options:

Try a merge back: This would let flex appear as a single commit on trunk, so 
the history of trunk would be preserved. If somebody wants to see the 
individual changes in the flex branch, he can ask for them (e.g. in 
TortoiseSVN there is a checkbox "Include merged revisions"). If this is not 
easy or fails, we can do one of the following:

- Create a big diff between current trunk and flex (after flex is merged up to 
trunk), attach this patch to an issue, and let everybody review it. After that 
we can apply the patch to trunk. This would result in the same behavior for 
trunk with no changes lost, but the individual changes in flex could not be 
reviewed.
- Delete the current trunk and svn move the branch to trunk (after flex is 
merged up to trunk): This would make the history of flex the current history. 
The drawback: you lose the latest trunk changes since the split of flex; 
instead you will only see the merge messages. Therefore we should see this 
only as a last resort.

Comments?

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Tuesday, March 30, 2010 5:35 PM
 To: java-dev@lucene.apache.org
 Subject: Landing the flex branch
 
 I think the time has finally come!  Pending one issue (LUCENE-2354 --
 Uwe), I think flex is ready to land. The other issues with Fix
 Version = Flex Branch can be moved to 3.1 after we land.
 
 We still use the pre-flex APIs in a number of places... I think this
 is actually good (so we continue to test the back-compat emulation
 layer).  With time we can cut them over.
 
 After flex, there are a number of fun things to explore.  EG, we need
 to make attributes work well with codecs & indexing/searching (with
 Multi/DirReader, serialize/deserialize, etc.); we need a BytesRef +
 packed ints FieldCache StringIndex variant which should use much less
 RAM in certain cases; we should build a fast core PForDelta codec;
 more queries can cut over to operating directly on byte[] terms, etc.
 But these can all come with time...
 
 Thoughts/issues/objections?
 
 Mike
 



[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2010-03-31 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851849#action_12851849
 ] 

Uwe Schindler commented on LUCENE-2310:
---

I am also +1 on the indexer interface.

I just repeat myself: We still need TokenStream; an AttributeSource alone is 
not enough. But that is beyond this issue: Indexable provides an iterator of 
fields that consist of a name, a TokenStream, and some options (possibly like 
omitNorms). If you just don't want close() in TokenStream, let's remove it. 
end() is needed for offsets; the indexer needs to take care of it. 
incrementToken() is the iterator approach. What else is there? Reset may be 
invisible to the indexer (I would refactor that and make a subclass of 
TokenStream that supports reset, ResetableTokenStream - just like Tokenizer, 
also a subclass, supports reset(Reader)). The abstract TokenStream then 
consists only of incrementToken() and end() plus the AttributeSource access 
methods. The attributes needed by the indexer are only 
TermToBytesRefAttribute, PositionIncrementAttribute, OffsetAttribute, and 
PayloadAttribute.
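
To make that shape concrete, here is a minimal sketch of a stream reduced to 
exactly those parts, written against the current attribute API (the 
single-token stream and its use of TermAttribute are made up for illustration, 
not taken from any patch):

{noformat}
import java.io.IOException;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

// The indexer's whole contract: call incrementToken() until it returns
// false, then call end() once to pick up the final offset state.
public final class SingleTokenStream extends TokenStream {
  private final TermAttribute termAtt = addAttribute(TermAttribute.class);
  private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
  private final PositionIncrementAttribute posIncrAtt =
      addAttribute(PositionIncrementAttribute.class);
  private final String value;
  private boolean exhausted = false;

  public SingleTokenStream(String value) {
    this.value = value;
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (exhausted) return false;
    exhausted = true;
    termAtt.setTermBuffer(value);
    offsetAtt.setOffset(0, value.length());
    posIncrAtt.setPositionIncrement(1);
    return true;
  }

  @Override
  public void end() throws IOException {
    // the final offset, e.g. for correct offsets on multi-valued fields
    offsetAtt.setOffset(value.length(), value.length());
  }
}
{noformat}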

 Reduce Fieldable, AbstractField and Field complexity
 

 Key: LUCENE-2310
 URL: https://issues.apache.org/jira/browse/LUCENE-2310
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: Index
Reporter: Chris Male
 Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields-core.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields.patch


 In order to move field-type-like functionality into its own class, we really 
 need to try to tackle the hierarchy of Fieldable, AbstractField and Field. 
 Currently AbstractField depends on Field, and does not provide much more 
 functionality than storing fields, most of which is being moved over to 
 FieldType. Therefore it seems ideal to try to deprecate AbstractField (and 
 possibly Fieldable), moving much of the functionality into Field and 
 FieldType.




[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2010-03-31 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851856#action_12851856
 ] 

Uwe Schindler commented on LUCENE-2310:
---

Yeah!




[jira] Commented: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]

2010-03-30 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851598#action_12851598
 ] 

Uwe Schindler commented on LUCENE-2354:
---

Will work on this in the next days and rewrite the tests.

One problem: Solr at the moment uses the deprecated String API for building a 
TermQuery. This should be replaced by an NRQ with upper == lower (both 
inclusive), as that disables scoring, which makes no sense for numeric fields.
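
A sketch of the suggested replacement, using the stock NumericRangeQuery 
factory (the field name and precision step are illustrative, not from Solr's 
code):

{noformat}
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;

public class NumericEqualsQueryExample {
  // An "equals" query on a numeric field: upper == lower, both inclusive.
  // NumericRangeQuery rewrites to a constant-score query, so no term
  // scoring is applied, which is the right behavior for numeric fields.
  public static Query intEquals(String field, int value) {
    return NumericRangeQuery.newIntRange(field, 4, value, value, true, true);
  }
}
{noformat}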




[jira] Commented: (LUCENE-2302) Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable)

2010-03-30 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851596#action_12851596
 ] 

Uwe Schindler commented on LUCENE-2302:
---

Will add the javadocs and think about the CharSequence problems again. It's 
tricky :(

I have little time at the moment; hopefully I will get to it by the weekend. 
The same goes for LUCENE-2354, which needs some test rewriting.

 Replacement for TermAttribute+Impl with extended capabilities (byte[] 
 support, CharSequence, Appendable)
 

 Key: LUCENE-2302
 URL: https://issues.apache.org/jira/browse/LUCENE-2302
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: Flex Branch

 Attachments: LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch, 
 LUCENE-2302.patch, LUCENE-2302.patch


 For flexible indexing, terms can be simple byte[] arrays, while the current 
 TermAttribute only supports char[]. This is fine for plain text, but e.g. 
 NumericTokenStream should work directly on the byte[] array.
 Also, TermAttribute lacks some interfaces that would make it simpler for 
 users to work with: Appendable and CharSequence.
 I propose to create a new interface CharTermAttribute with a clean new API 
 that concentrates on CharSequence and Appendable.
 The implementation class will simply support the old and new interface, 
 working on the same term buffer. DEFAULT_ATTRIBUTE_FACTORY will take care of 
 this. So if somebody adds a TermAttribute, he will get an implementation 
 class that can also be used as a CharTermAttribute. As both attributes create 
 the same impl instance, both calls to addAttribute are equal. So a 
 TokenFilter that adds CharTermAttribute to the source will work with the same 
 instance as the Tokenizer that requested the (deprecated) TermAttribute.
 To also support byte[]-only terms, as Collation or NumericField need, a 
 separate getter-only interface will be added that returns a reusable 
 BytesRef, e.g. BytesRefGetterAttribute. The default implementation class will 
 also support this interface. For backwards compatibility with old 
 self-made TermAttribute implementations, the indexer will check with 
 hasAttribute() if the BytesRef getter interface is there, and if not, will 
 wrap an old-style TermAttribute (a deprecated wrapper class will be 
 provided): new BytesRefGetterAttributeWrapper(TermAttribute), which is then 
 used by the indexer.
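
A usage sketch of the proposed interface (hedged: method names like setEmpty() 
and append() assume the CharSequence/Appendable design lands as described 
above; they are not part of any released API yet):

{noformat}
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class CharTermAttributeUsage {
  // Writable like an Appendable, readable like a CharSequence; a request
  // for the deprecated TermAttribute returns the very same impl instance.
  public static void fillTerm(TokenStream stream) {
    CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
    termAtt.setEmpty().append("foo");  // Appendable-style write
    CharSequence view = termAtt;       // CharSequence-style read
    assert view.length() == 3 && view.charAt(0) == 'f';
  }
}
{noformat}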




[jira] Commented: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]

2010-03-29 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851010#action_12851010
 ] 

Uwe Schindler commented on LUCENE-2354:
---

bq. But the encoding is unchanged right? (Ie only using 7 bits per byte, same 
as trunk).

Yes. And I think we should keep it at 7 bits for now. Problems start when the 
sort order of terms is needed (which is the case for NRQ). As the default term 
comparator in flex is the UTF-8 one, it would not sort correctly for numeric 
fields using the full 8 bits.

bq. And you cutover to BytesRef TermsEnum API too - great. Presumably search 
perf would improve but only a tiny bit since NRQ visits so few terms?

I don't think you will notice a difference. A standard int range contains 
maybe 10 to 20 sub-ranges (at maximum), so converting between String and 
TermRef should not matter. But the new implementation is cleaner. In principle 
we could remove the whole char[]/String-based API in NumericUtils - I only 
have to rewrite the tests and remove the NumericUtils test in backwards (as it 
then no longer applies, either).
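
To illustrate why the 7-bit encoding sorts correctly under a plain 
lexicographic comparator, here is a small sketch against the String-based 
NumericUtils API on trunk (the values are arbitrary):

{noformat}
import org.apache.lucene.util.NumericUtils;

public class PrefixCodedOrderDemo {
  public static void main(String[] args) {
    // longToPrefixCoded flips the sign bit and emits only 7 bits per
    // char, so lexicographic order of the full-precision encoded terms
    // matches the numeric order of the values themselves.
    long[] values = { Long.MIN_VALUE, -42L, 0L, 42L, Long.MAX_VALUE };
    for (int i = 1; i < values.length; i++) {
      String a = NumericUtils.longToPrefixCoded(values[i - 1]);
      String b = NumericUtils.longToPrefixCoded(values[i]);
      assert a.compareTo(b) < 0 : "sort order must be preserved";
    }
  }
}
{noformat}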




[jira] Issue Comment Edited: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]

2010-03-29 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851010#action_12851010
 ] 

Uwe Schindler edited comment on LUCENE-2354 at 3/29/10 5:23 PM:


bq. But the encoding is unchanged right? (Ie only using 7 bits per byte, same 
as trunk).

Yes. And I think we should keep it at 7 bits for now. Problems start when the 
sort order of terms is needed (which is the case for NRQ). As the default term 
comparator in flex is the UTF-8 one, it would not sort correctly for numeric 
fields using the full 8 bits.

By the way, the recently added backwards test checks that an old index with 
NumericField behaves as before! This is why I added a new zip file to 
TestBackwardCompatibility.

bq. And you cutover to BytesRef TermsEnum API too - great. Presumably search 
perf would improve but only a tiny bit since NRQ visits so few terms?

I don't think you will notice a difference. A standard int range contains 
maybe 10 to 20 sub-ranges (at maximum), so converting between String and 
TermRef should not matter. But the new implementation is cleaner. In principle 
we could remove the whole char[]/String-based API in NumericUtils - I only 
have to rewrite the tests and remove the NumericUtils test in backwards (as it 
then no longer applies, either).




[jira] Assigned: (LUCENE-2315) AttributeSource's methods for accessing attributes should be final, else its easy to corrupt the internal states

2010-03-28 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-2315:
-

Assignee: Uwe Schindler

 AttributeSource's methods for accessing attributes should be final, else its 
 easy to corrupt the internal states
 

 Key: LUCENE-2315
 URL: https://issues.apache.org/jira/browse/LUCENE-2315
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 2.9, 2.9.1, 2.9.2, 3.0, 3.0.1
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.1


 The methods that operate on and modify the internal maps of AttributeSource 
 should be final, which is a backwards break. But anybody who overrides such 
 methods simply creates a buggy AS in either case.
 I want to make all impls final (in general the class should be final 
 altogether, but it is made for extension in TokenStream). So it's important 
 that the implementations are final!
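
 As an illustration of the risk (a hypothetical broken subclass, not code from 
 this issue): while the methods are still overridable, a subclass can bypass 
 the internal attribute map that every consumer relies on:

{noformat}
import org.apache.lucene.util.Attribute;
import org.apache.lucene.util.AttributeImpl;
import org.apache.lucene.util.AttributeSource;

// Hypothetical broken subclass: it returns a fresh impl on every call
// instead of registering one in the internal map, so two filters asking
// for the same attribute no longer share state - a corrupted AS.
public class BrokenAttributeSource extends AttributeSource {
  @Override
  @SuppressWarnings("unchecked")
  public <A extends Attribute> A addAttribute(Class<A> attClass) {
    try {
      AttributeImpl impl = (AttributeImpl)
          Class.forName(attClass.getName() + "Impl").newInstance();
      return (A) impl; // never stored, never shared
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}
{noformat}

 Making addAttribute() and friends final forbids exactly this kind of override 
 at compile time.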




[jira] Updated: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]

2010-03-28 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2354:
--

Attachment: LUCENE-2354.patch

Here is a first preview patch.

NumericUtils still contains lots of unused String-based methods; I think we 
should remove them, as the class is expert-only and also experimental. 
Backwards compatibility is broken even with those backwards layers (as the 
split functions were changed to use BytesRefs). Also, these backwards methods 
are simply slow now (as the byte[] is copied to char[] and vice versa).

The new NumericTokenStream now uses a special NumericTermAttribute, so filters 
coming later have access to the shift value and so on. This attribute also 
implements TermToBytesRefAttribute for the indexer. Please note: this 
attribute is a hack and does not support copyTo/clone/..., so you cannot put 
tokens aside (which is not needed), but it is still possible to add further 
attributes to numeric tokens (which is why the attribute is there).

The NumericTokenStream backwards test was removed, because the new stream no 
longer contains a TermAttribute, so the test always fails.

TODO: better inline hashCode generation for the numeric-to-BytesRef 
transformation




[jira] Reopened: (LUCENE-2306) contrib/xml-query-parser: NumericRangeFilter support

2010-03-27 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reopened LUCENE-2306:
---


I will commit my changes to the package names and a missing super.tearDown() 
soon.

But I found one other thing:
NRQ allows one or both of the bounds to be null (like TermRangeQuery), but the 
builder enforces both attributes to be present.

Also, I don't like the default type of int; I would instead enforce the type. 
Will post a patch soon.
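
For the builder fix, a sketch of what should be expressible, using the 
standard factory methods (field name, precision step, and bound are 
illustrative):

{noformat}
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;

public class OpenEndedNumericRangeExample {
  // A half-open numeric range: a null lower bound means "unbounded
  // below", which the XML builder currently refuses to accept.
  public static Query upTo(String field, long max) {
    return NumericRangeQuery.newLongRange(field, 4, null, max, false, true);
  }
}
{noformat}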

 contrib/xml-query-parser: NumericRangeFilter support
 

 Key: LUCENE-2306
 URL: https://issues.apache.org/jira/browse/LUCENE-2306
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Affects Versions: 3.0.1
Reporter: Jingkei Ly
Assignee: Mark Harwood
 Fix For: 3.1

 Attachments: LUCENE-2306.patch, LUCENE-2306.patch


 Create a FilterBuilder for NumericRangeFilter so that it may be used with the 
 XML query parser.




[jira] Commented: (LUCENE-2306) contrib/xml-query-parser: NumericRangeFilter support

2010-03-27 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850495#action_12850495
 ] 

Uwe Schindler commented on LUCENE-2306:
---

Committed package and test fixes in revision: 928177




[jira] Updated: (LUCENE-2306) contrib/xml-query-parser: NumericRangeQuery and -Filter support

2010-03-27 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2306:
--

Summary: contrib/xml-query-parser: NumericRangeQuery and -Filter support  
(was: contrib/xml-query-parser: NumericRangeFilter support)

 contrib/xml-query-parser: NumericRangeQuery and -Filter support
 ---

 Key: LUCENE-2306
 URL: https://issues.apache.org/jira/browse/LUCENE-2306
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Affects Versions: 3.0.1
Reporter: Jingkei Ly
Assignee: Mark Harwood
 Fix For: 3.1

 Attachments: LUCENE-2306.patch, LUCENE-2306.patch


 Create a FilterBuilder for NumericRangeFilter so that it may be used with the 
 XML query parser.



