[jira] Commented: (LUCENE-2399) Add support for ICU's Normalizer2

2010-04-17 Thread Uwe Schindler (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858104#action_12858104 ]

Uwe Schindler commented on LUCENE-2399:
---

Hurrah! You used the StringBuilder as a buffer to avoid creating a new String instance each time; only the buffer needs to be copied. This could also be a good trick for the PatternReplaceFilter from Solr.
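
For reference, the trick looks roughly like this (a minimal sketch against the Lucene 3.x TokenFilter/CharTermAttribute APIs; the filter name and the no-op transform step are made up):

{code:java}
import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class BufferReusingFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final StringBuilder buffer = new StringBuilder(); // reused for every token

  public BufferReusingFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    buffer.setLength(0);               // clear the buffer without reallocating it
    buffer.append(termAtt);            // CharTermAttribute is a CharSequence
    // ... transform the buffer in place here ...
    termAtt.setEmpty().append(buffer); // copy back; no intermediate String is created
    return true;
  }
}
{code}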

bq. i made this filter final, to avoid a ticket from the policeman. 

How did you get the filter through the assert statement without final? 
Strange...

 Add support for ICU's Normalizer2
 -

 Key: LUCENE-2399
 URL: https://issues.apache.org/jira/browse/LUCENE-2399
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/*
Affects Versions: 3.1
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 3.1

 Attachments: LUCENE-2399.patch, LUCENE-2399.patch


 While there are separate Case Folding, Normalization, and Ignorable-removal 
 filters in LUCENE-1488,
 the new ICU Normalizer2 API does this all at once with nfkc_cf (based on the 
 new NFKC_Casefold property in Unicode).
 This is great, because it provides a ton of Unicode functionality that is really needed.
 And the new Normalizer2 API takes CharSequence and writes to Appendable...
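
For illustration, basic usage of the new API could look like this (a sketch assuming ICU4J 4.4; the sample input and output are mine):

{code:java}
import com.ibm.icu.text.Normalizer2;

public class NfkcCfDemo {
  public static void main(String[] args) {
    // obtain the combined NFKC + case folding normalizer (nfkc_cf, ICU 4.4)
    Normalizer2 n2 = Normalizer2.getInstance(null, "nfkc_cf", Normalizer2.Mode.COMPOSE);
    StringBuilder out = new StringBuilder();  // any StringBuilder/Appendable-style sink works
    n2.normalize("Grüße", out);               // case folding + normalization in one pass
    System.out.println(out);                  // prints "grüsse"
  }
}
{code}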

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2399) Add support for ICU's Normalizer2

2010-04-17 Thread Uwe Schindler (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858108#action_12858108 ]

Uwe Schindler commented on LUCENE-2399:
---

I know, you were running the test without assertions enabled from Eclipse! :-)

{noformat}
[junit] TokenStream implementation classes or at least their incrementToken() implementation must be final
[junit] junit.framework.AssertionFailedError: TokenStream implementation classes or at least their incrementToken() implementation must be final
[junit] at org.apache.lucene.analysis.TokenStream.assertFinal(TokenStream.java:117)
{noformat}

So for me the assertion worked. The *second* patch of course works with icu-4_4.jar! Great, and I am happy about the cool interfaces on CharTermAttribute.

I just wanted to check that my deputy sheriff did not miss something because of wrong instructions.

 Add support for ICU's Normalizer2
 -

 Key: LUCENE-2399
 URL: https://issues.apache.org/jira/browse/LUCENE-2399
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/*
Affects Versions: 3.1
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 3.1

 Attachments: LUCENE-2399.patch, LUCENE-2399.patch






RE: official GIT repository / switch to GIT?

2010-04-17 Thread Uwe Schindler
Hi,

In my opinion: Definitely NOT!

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Thomas Koch [mailto:tho...@koch.ro]
 Sent: Saturday, April 17, 2010 9:21 AM
 To: solr-dev; java-dev@lucene.apache.org
 Subject: official GIT repository / switch to GIT?
 
 Hi,
 
 at least since August 2009 nobody has dared to ask this question, so let's start a flamewar:
 Don't you think it's time for Lucene and Solr to switch to Git?
 
 And now seriously:
 I did the last packaging of Solr 1.4 for Debian and I intend to continue doing so. Since I'm doing the packaging in Git, I'm asking myself whether I should base the packaging Git repository on the Solr repo found at git.apache.org. However, if the one from git.a.o is not stable and may crash at any given time, this would not be a good idea. And the best thing for packagers like me would of course be if the Git repo were the official one.
 
 And I wonder whether there are really people using SVN and downloading dozens of patch files from JIRA. Isn't it the case that everybody already uses git-svn?
 
 Best regards,
 
 Thomas Koch, http://www.koch.ro
 






[jira] Created: (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters

2010-04-15 Thread Uwe Schindler (JIRA)
Add a scoring DistanceQuery that does not need caches and separate filters
--

 Key: LUCENE-2395
 URL: https://issues.apache.org/jira/browse/LUCENE-2395
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/spatial
Reporter: Uwe Schindler
 Fix For: 3.1


In a chat with Chris Male, and from my own ideas when implementing for PANGAEA, I thought about the broken distance query in contrib. It lacks the following features:
- It needs a query for the enclosing bbox (which is constant score)
- It needs a separate filter for filtering out distances
- It has no scoring, so if somebody wants to sort by distance, he needs to use the custom sort. For that to work, spatial caches the distance calculation (which is broken for multi-segment search)

The idea is now to combine all three things into one query, but customizable:

We first thought about extending CustomScoreQuery and calculating the distance from FieldCache in the customScore method, returning a score of 1 for distance=0, a score of 0 at the max distance, and a score<0 for farther hits that are in the bounding box but not in the distance circle. To filter out such negative scores, we would need to override the scorer in CustomScoreQuery, which is private.

My proposal is now to use a very stripped down CustomScoreQuery (but not extend it) that calls a method getDistance(docId) in its scorer's advance() and nextDoc() that calculates the distance for the current doc. It stores this distance also in the scorer. If the distance > maxDistance it throws away the hit and calls nextDoc() again. The score() method will by default return weight.value*(maxDistance - distance)/maxDistance and uses the precalculated distance. So the distance is only calculated one time in nextDoc()/advance().

To be able to plug in custom scoring, the following methods in the query can be overridden:
- float getDistanceScore(double distance) - returns by default (maxDistance - distance)/maxDistance; allows score customization
- DocIdSet getBoundingBoxDocIdSet(Reader, LatLng sw, LatLng ne) - returns a DocIdSet for the bounding box. By default it returns e.g. the DocIdSet of an NRF or a cartesian tier filter. You can even plug in any other DocIdSet, e.g. wrap a Query with QueryWrapperFilter
- support a setter for the GeoDistanceCalculator that is used by the scorer to get the distance.

This query is almost finished in my head, it just needs coding :-)
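
A rough sketch of the proposed scorer loop (getDistance() stands in for the pluggable distance calculation described above; the class around it is hypothetical):

{code:java}
import java.io.IOException;

import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.Similarity;

abstract class DistanceScorer extends Scorer {
  private final DocIdSetIterator bboxDocs; // iterator of getBoundingBoxDocIdSet(...)
  private final float weightValue;         // weight.value
  private final double maxDistance;
  private double distance;                 // calculated exactly once per hit
  private int doc = -1;

  DistanceScorer(Similarity sim, DocIdSetIterator bboxDocs,
      float weightValue, double maxDistance) {
    super(sim);
    this.bboxDocs = bboxDocs;
    this.weightValue = weightValue;
    this.maxDistance = maxDistance;
  }

  /** the pluggable part: distance of the given doc from the query center */
  protected abstract double getDistance(int docId);

  @Override
  public int docID() { return doc; }

  @Override
  public int nextDoc() throws IOException {
    do {
      doc = bboxDocs.nextDoc();         // candidates come from the bbox DocIdSet
      if (doc == NO_MORE_DOCS) return doc;
      distance = getDistance(doc);      // computed once, stored in the scorer
    } while (distance > maxDistance);   // throw away hits outside the circle
    return doc;
  }

  @Override
  public int advance(int target) throws IOException {
    doc = bboxDocs.advance(target);
    if (doc != NO_MORE_DOCS && (distance = getDistance(doc)) > maxDistance) {
      return nextDoc();                 // skip forward until inside the circle
    }
    return doc;
  }

  @Override
  public float score() {
    // default scoring: 1.0 at the center, 0.0 at maxDistance, scaled by the weight
    return weightValue * (float) ((maxDistance - distance) / maxDistance);
  }
}
{code}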




[jira] Updated: (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters

2010-04-15 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2395:
--

Attachment: DistanceQuery.java

A first idea of the query; it does not even compile, as some classes are missing (coming with Chris' later patches), but it shows how it should work and how it is customizable.

 Add a scoring DistanceQuery that does not need caches and separate filters
 --

 Key: LUCENE-2395
 URL: https://issues.apache.org/jira/browse/LUCENE-2395
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/spatial
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: DistanceQuery.java


 In a chat with Chris Male, and from my own ideas when implementing for PANGAEA, I thought about the broken distance query in contrib. It lacks the following features:
 - It needs a query/filter for the enclosing bbox (which is constant score)
 - It needs a separate filter for filtering out hits too far away (inside bbox but outside the distance limit)
 - It has no scoring, so if somebody wants to sort by distance, he needs to use the custom sort. For that to work, spatial caches the distance calculation (which is broken for multi-segment search)
 The idea is now to combine all three things into one query, but customizable:
 We first thought about extending CustomScoreQuery and calculating the distance from FieldCache in the customScore method, returning a score of 1 for distance=0, a score of 0 at the max distance, and a score<0 for farther hits that are in the bounding box but not in the distance circle. To filter out such negative scores, we would need to override the scorer in CustomScoreQuery, which is private.
 My proposal is now to use a very stripped down CustomScoreQuery (but not extend it) that calls a method getDistance(docId) in its scorer's advance() and nextDoc() that calculates the distance for the current doc. It stores this distance also in the scorer. If the distance > maxDistance it throws away the hit and calls nextDoc() again. The score() method will by default return weight.value*(maxDistance - distance)/maxDistance and uses the precalculated distance. So the distance is only calculated one time in nextDoc()/advance().
 To be able to plug in custom scoring, the following methods in the query can be overridden:
 - float getDistanceScore(double distance) - returns by default (maxDistance - distance)/maxDistance; allows score customization
 - DocIdSet getBoundingBoxDocIdSet(Reader, LatLng sw, LatLng ne) - returns a DocIdSet for the bounding box. By default it returns e.g. the DocIdSet of an NRF or a cartesian tier filter. You can even plug in any other DocIdSet, e.g. wrap a Query with QueryWrapperFilter
 - support a setter for the GeoDistanceCalculator that is used by the scorer to get the distance.
 - a LatLng provider (similar to CustomScoreProvider/ValueSource) that returns the lat/lng for a given doc id. This method is called once per IndexReader at scorer creation and will retrieve the coordinates. By that we support FieldCache or whatever.
 This query is almost finished in my head, it just needs coding :-)
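
 The provider from the last bullet could be as small as this sketch (the interface and method names are made up, mirroring the CustomScoreProvider/ValueSource pattern):

{code:java}
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.spatial.geometry.LatLng;

public interface LatLngProvider {
  /** called once per IndexReader when the scorer is created */
  Source getSource(IndexReader reader) throws IOException;

  interface Source {
    /** returns the coordinates of a document, e.g. backed by FieldCache */
    LatLng latLng(int docId);
  }
}
{code}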




[jira] Commented: (LUCENE-2396) remove version from contrib/analyzers.

2010-04-15 Thread Uwe Schindler (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857384#action_12857384 ]

Uwe Schindler commented on LUCENE-2396:
---

Are you sure you want to use LUCENE_CURRENT in some ctors?

 remove version from contrib/analyzers.
 --

 Key: LUCENE-2396
 URL: https://issues.apache.org/jira/browse/LUCENE-2396
 Project: Lucene - Java
  Issue Type: Task
  Components: contrib/analyzers
Affects Versions: 3.1
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-2396.patch


 Contrib/analyzers has no backwards-compatibility policy, so let's remove 
 Version so the API is consumable.
 If you think we shouldn't do this, then instead explicitly state and vote on what the backwards compatibility policy for contrib/analyzers should be, or move it all to core.




[jira] Commented: (LUCENE-2396) remove version from contrib/analyzers.

2010-04-15 Thread Uwe Schindler (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857402#action_12857402 ]

Uwe Schindler commented on LUCENE-2396:
---

bq. Static? Weren't you against that!? 

He meant a static final! It is just there to fix the analyzers that depend on core stuff to a specific version, until we have no analyzers left in core except Whitespace.

 remove version from contrib/analyzers.
 --

 Key: LUCENE-2396
 URL: https://issues.apache.org/jira/browse/LUCENE-2396
 Project: Lucene - Java
  Issue Type: Task
  Components: contrib/analyzers
Affects Versions: 3.1
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-2396.patch






RE: Proposal about Version API relaxation

2010-04-15 Thread Uwe Schindler
Hi Earwin,

I am strongly +1 on this. I would also act as the Release Manager for 3.1 if nobody else wants to do it. I would like to take the preflex tag, or some revisions before it (maybe without the IndexWriterConfig, which is a really new API), as the 3.1 branch, and after that port some of my post-flex changes back, like the StandardTokenizer refactoring (so we can still produce the old analyzer without Java 1.4).

So +1 on branching pre-flex and releasing it as 3.1 soon. The Unicode improvements justify a new release. I think s1monw also wants this.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Earwin Burrfoot [mailto:ear...@gmail.com]
 Sent: Thursday, April 15, 2010 8:15 PM
 To: java-dev@lucene.apache.org
 Subject: Re: Proposal about Version API relaxation
 
 I'd like to remind that Mike's proposal has stable branches.
 
 We can branch off preflex trunk right now and wrap it up as 3.1.
 Current trunk is declared as future 4.0 and all backcompat cruft is
 removed from it.
 If some new features/bugfixes appear in trunk, and they don't break
 stuff - we backport them to 3.x branch, eventually releasing 3.2, 3.3,
 etc
 
 Thus, devs are free to work without back-compat burden, bleeding edge
 users get their blood, conservative users get their stability + a
 subset of new features from stable branches.
 
 
  On Thu, Apr 15, 2010 at 22:02, DM Smith dmsmith...@gmail.com wrote:
   On 04/15/2010 01:50 PM, Earwin Burrfoot wrote:
  
     First, the index format. IMHO, it is a good thing for a major release to be able to read the prior major release's index. And the ability to convert it to the current format via optimize is also good. Whatever is decided on this thread should take this seriously.
  
    Optimize is a bad way to convert to current.
    1. conversion is not guaranteed, optimizing an already optimized index is a noop
    2. it merges all your segments. if you use BalancedSegmentMergePolicy, that destroys your segment size distribution
  
    Dedicated upgrade tool (available both from command-line and programmatically) is a good way to convert to current.
    1. conversion happens exactly when you need it, conversion happens for sure, no additional checks needed
    2. it should leave all your segments as is, only changing their format
  
     It is my observation, though possibly not correct, that core only has rudimentary analysis capabilities, handling English very well. To handle other languages well contrib/analyzers is required. Until recently it did not get much love. There have been many bw compat breaking changes (though w/ version one can probably get the prior behavior). IMHO, most of contrib/analyzers should be core. My guess is that most non-trivial applications will use contrib/analyzers.
  
    I counter - most non-trivial applications will use their own analyzers. The more modules - the merrier. You can choose precisely what you need.
  
   By and large an analyzer is a simple wrapper for a tokenizer and some filters. Are you suggesting that most non-trivial apps write their own tokenizers and filters?
  
   I'd find that hard to believe. For example, I don't know enough Chinese, Farsi, Arabic, Polish, ... to come up with anything better than what Lucene has to tokenize, stem or filter these.
  
     Our user base are those with ancient, underpowered laptops in third-world countries. On those machines it might take 10 minutes to create an index and during that time the machine is fairly unresponsive. There is no opportunity to do it in the background.
  
    Major Lucene releases (feature-wise, not version-wise) happen like once in a year, or year-and-a-half. Is it that hard for your users to wait ten minutes once a year?
  
   I said that was for one index. Multiply that times the number of books available (300+) and yes, it is too much to ask. Even if a small subset is indexed, say 30, that's around 5 hours of waiting.
  
   Under consideration is the frequency of breakage. Some are suggesting a greater frequency than yearly.
  
   DM
 
 
 
 
 
 
 --
 Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
 Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
 ICQ: 104465785
 






RE: Proposal about Version API relaxation

2010-04-15 Thread Uwe Schindler
I wish we could have a face-to-face talk like in the evenings at ApacheCon :(

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant
 Ingersoll
 Sent: Thursday, April 15, 2010 9:46 PM
 To: java-dev@lucene.apache.org
 Subject: Re: Proposal about Version API relaxation
 
 From IRC:
 why do I get the feeling that everyone is in heated agreement on the
 Version thread?
 there are some cases that mean people will have to reindex
 in those cases, we should tell people they will have to reindex
 then they can decide to upgrade or not
 all other cases, just do the sensible thing and test first
 I have yet to meet anyone who simply drops a new version into
 production and says go
 
 So, as I said earlier, why don't we just move forward with it, strive
 to support reading X-1 index format in X and let the user know the
 cases in which they will have to re-index. If a migration tool is
 necessary, then someone can write it at the appropriate time.  Just as
 was said w/ the Solr merge, it's software.  If it doesn't work, we can
 change it.  Thank goodness we don't have a back compatibility policy
 for our policies!
 
 -Grant
 
 
 
 
 On Apr 15, 2010, at 3:35 PM, Michael McCandless wrote:
 
  Unfortunately, live searching against an old index can get very hairy. EG look at what I had to do for the flex API on a pre-flex index (the flex emulation layer).
 
  It's also not great because it gives the illusion that all is good, yet you've taken a silent hit (up to ~10% or so) in your search perf.
 
  Whereas building & maintaining a one-time index migration tool, in contrast, is much less work.
 
  I realize the migration tool has issues -- it fixes the hard changes but silently allows the soft changes to break (ie, your analyzers may not produce the same tokens, until we move all core analyzers outside of core, so they are separately versioned), but it seems like a good compromise here?
 
  Mike
 
  2010/4/15 Shai Erera ser...@gmail.com:
   The reason, Earwin, why online migration is faster is because when u finally need to *fully* migrate your index, most chances are that most of the segments are already on the newer format. Offline migration will just keep the application idle for some amount of time until ALL segments are migrated.
 
   During the lifecycle of the index, segments are merged anyway, so migrating them on the fly virtually costs nothing. At the end, when u upgrade to a Lucene version which doesn't support the previous index format, you'll in the worst case need to migrate a few large segments which were never merged. I don't know how many of those there will be, as it really depends on the application, but I'd bet this process will touch just a few segments. And hence, throughput wise it will be a lot faster.
 
   We should create a migrate() API on IW which will touch just those segments and not incur a full optimize. That API can also be used for an offline migration tool, if we decide that's what we want.
 
   Shai
 
   On Thursday, April 15, 2010, jm jmugur...@gmail.com wrote:
   Not sure if plain users are allowed/encouraged to post in this list, but wanted to mention (just an opinion from a happy user), as other users have, that not all of us can reindex just like that. It would not be 10 min for one of our installations for sure...
 
   First, I would need to implement some code to reindex, cause my source data is postprocessed/compressed/encrypted/moved after it arrives to the application, so I would need to retrieve it all etc. And then reindexing it would take days.
   javier
 
   On Thu, Apr 15, 2010 at 9:04 PM, Earwin Burrfoot ear...@gmail.com wrote:
   BTW Earwin, we can come up w/ a migrate() method on IW to accomplish manual migration on the segments that are still on old versions. That's not the point about whether optimize() is good or not. It is the difference between telling the customer to run a 5-day migration process, or a couple of hours. At the end of the day, the same migration code will need to be written, whether for the manual or the automatic case, and probably by the same developer who changed the index format. It's the difference of when it happens.
 
   Converting stuff is easier than emulating, that's exactly why I want a separate tool. There's no need to support cross-version merging, nor to emulate old APIs.
 
   I also don't understand why offline migration is going to take days instead of hours for online migration?? WTF, it's gonna be even faster, as it doesn't have to merge things.
 
  --
  Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
  Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
  ICQ: 104465785
 

[jira] Updated: (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters

2010-04-15 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2395:
--

Attachment: DistanceQuery.java

small updates to Chris' patches.

 Add a scoring DistanceQuery that does not need caches and separate filters
 --

 Key: LUCENE-2395
 URL: https://issues.apache.org/jira/browse/LUCENE-2395
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/spatial
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: DistanceQuery.java






[jira] Updated: (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters

2010-04-15 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2395:
--

Attachment: (was: DistanceQuery.java)

 Add a scoring DistanceQuery that does not need caches and separate filters
 --

 Key: LUCENE-2395
 URL: https://issues.apache.org/jira/browse/LUCENE-2395
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/spatial
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: DistanceQuery.java






[jira] Updated: (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters

2010-04-15 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2395:
--

Attachment: DistanceQuery.java

Added Weight.explain() and fixed a missing replacement.

 Add a scoring DistanceQuery that does not need caches and separate filters
 --

 Key: LUCENE-2395
 URL: https://issues.apache.org/jira/browse/LUCENE-2395
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/spatial
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: DistanceQuery.java, DistanceQuery.java






RE: issues.apache.org compromised: please update your passwords

2010-04-14 Thread Uwe Schindler
   Hi Grant,
  
   It is that user, who is assigned to the very early JIRA issues, e.g.:
   https://issues.apache.org/jira/browse/LUCENE-1
  
   I changed the password of this user in response to that email (for security), but I think we should simply let infra remove it. The problem is, almost anybody can instruct JIRA to reset the password and let JIRA send it again to the email, which is the public java-dev list. And then it is public again.
 
  If the user is still needed (for whatever reason) maybe the user can be disabled, or maybe they can be removed from the list of users who have update access to the JIRA.
 
  But so long as the user is not an administrator, then it's no different really from any other account that can be created by Joe Public.
 
 Yes, that account has no special access. If someone wants to unassign the 319 issues this user is the 'assignee' of, then the account can be deleted:
 
 https://issues.apache.org/jira/secure/IssueNavigator.jspa?sorter/order=ASC&sorter/field=priority&assignee=java-dev%40lucene.apache.org&reset=true&assigneeSelect=specificuser&mode=hide
 

I disabled the account by assigning a dummy e-mail address and gave it a random password.

I was not able to unassign the issues, as most issues were Closed, where no 
modifications can be done anymore. Reopening and changing assignment and 
reverting to closed is too risky, as after reopening you don’t know anymore 
which issues you need to revert to closed after unassignment...

Uwe





RE: Proposal about Version API relaxation

2010-04-14 Thread Uwe Schindler
+1, thanks for this detailed explanation! In my apps I have no problem defining a static default myself. And passing it to every ctor is easy, so where is the problem? Look at Solr: since we introduced the version param to solrconfig, you have exactly that behavior, but it is limited to the Solr installation using that solrconfig. And you can still override it.

Lucene is a library, not an application, so it is not Lucene's responsibility to handle such things. Passing configuration and configuration objects around is an application responsibility.
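
For example, an application can centralize the version itself (a sketch; the class name is made up):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

public final class MyApp {
  // one app-wide constant instead of a library-global default
  public static final Version MATCH_VERSION = Version.LUCENE_31;

  static Analyzer newAnalyzer() {
    return new StandardAnalyzer(MATCH_VERSION); // pass the constant to every ctor
  }
}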

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: Mark Miller [mailto:markrmil...@gmail.com]
 Sent: Wednesday, April 14, 2010 6:58 PM
 To: java-dev@lucene.apache.org
 Subject: Re: Proposal about Version API relaxation
 
 On 04/14/2010 12:29 PM, Marvin Humphrey wrote:
  On Wed, Apr 14, 2010 at 08:30:14AM -0400, Grant Ingersoll wrote:
 
  The thing I keep going back to is that somehow Lucene has managed
 for years
  (and I mean lots of years) w/o stuff like Version and all this
 massive back
  compatibility checking.
 
  Non-constant global variables are an anti-pattern.
 
 
 I think clinging to such rules in the face of all situations is an
 anti-pattern :) I take it as a rule of thumb.
 
 In regards to this discussion:
 
 I agree that the Version stuff is a bit of a mess. I also agree that
 many users will want to just use one version across their app that is
 easy to change.
 
 I disagree that we should allow that behavior by just using a
 constructor without the Version param - or that you would be forced to
 set the static Version setting by trying to run your app and seeing an
 exception happen. That is all a bit ugly.
 
 Too many users will not understand Version or care to if they see they
 can skip passing it. IMO, you should have to specify that you are
 looking for this behavior. In which case, why not just specify it using
 the version param itself :) E.g. if a user wants to get this kind of
 static behavior, they can just choose to do it on their own, and pass
 their *own* static Version constant to all the constructors.
 
 I don't think we need to go through this hassle and introduce a less
 than ideal solution just so that users can pass one less param -
 especially when I think you should explicitly choose this behavior
 rather than get it by ignoring the Version param.
 
 --
 - Mark
 
 http://www.lucidimagination.com
 
 
 
 






RE: Proposal about Version API relaxation

2010-04-14 Thread Uwe Schindler
 And 2.9's backwards compatibility layer in TokenStream was significantly slower.

I protest! No, it was not slower; it was only slower at the beginning, because of missing reflection caching, and that also affected the *new* API. With 2.9.x and old TokenStreams there is no speed difference, really.

Uwe





RE: Proposal about Version API relaxation

2010-04-13 Thread Uwe Schindler
Hi Shai,

 

One of the problems I have is: that is a static default! We want to get rid of those (and mostly did; only some relics remain), so there are no plans to reimplement such a thing. The worst one is BooleanQuery.maxClauseCount. The same applies to all types of sysprops. As Lucene and Solr mostly run in servlet containers, this kind of thing makes web applications no longer isolated. This is also a general contract for libraries: never ever rely on sysprops or statics.

 

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

 http://www.thetaphi.de/ http://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Shai Erera [mailto:ser...@gmail.com] 
Sent: Tuesday, April 13, 2010 5:27 PM
To: java-dev@lucene.apache.org
Subject: Proposal about Version API relaxation

 

Hi

I'd like to propose a relaxation on the Version API. Uwe, please read the 
entire email before you reply :).

I was thinking, following a question on the user list, that the Version-based API may not be very intuitive to users, especially those who don't care about versioning, and can also be very inconvenient. So there are two issues here:
1) How should one use Version smartly so that he keeps backwards compatibility. 
I think we all know the answer, but a Wiki page with some best practices tips 
would really help users use it.
2) How can one write sane code, which doesn't pass versions all over the place 
if: (1) he doesn't care about versions, or (2) he cares, and sets the Version 
to the same value in his app, in all places.

Also, I think that today we offer users the flexibility to set different Versions on different objects over the life span of their application - which is good flexibility, but can also lead people to shoot themselves in the foot if they're not careful -- e.g. upgrading Version across their app, but failing to do so in one or two places ...

So the change I'd like to propose is to mostly alleviate (2) and better protect 
users - I DO NOT PROPOSE TO GET RID OF Version :).

I was thinking that we can add on Version a DEFAULT version, which the caller 
can set. So Version.setDefault and Version.getDefault will be added, as static 
members (more on the static-ness of it later). We then change the API which 
requires Version to also expose an API which doesn't require it, and that API 
will call Version.getDefault(). People can use it if they want to ...

Few points:
1) As a default DEFAULT Version is controversial, I don't want to propose one, even though I think Lucene could define the DEFAULT to be the latest. Instead, I propose that Version.getDefault throw a DefaultVersionNotSetException if it wasn't set when an API which relies on the default Version is called (I don't want to return null; I'm not sure how safe that would be).
2) That DEFAULT Version is static, which means it will affect all indexing code 
running inside the JVM. Which is fine:
2.1) Perhaps all the indexing code should use the same Version
2.2) If you know that's not the case, then pass Version to the API which 
requires it - you cannot use the 'default Version' API -- nothing changes for 
you.
One case is missing -- you might not know if your code is the only indexing 
code which runs in the JVM ... I don't have a solution to that, but I think 
it'll be revealed pretty quickly, and you can change your code then ...

So to summarize - the current Version API will remain and people can still use 
it. The DEFAULT Version API is meant for convenience for those who don't want 
to pass Version everywhere, for the reasons I outlined above. This will also 
clean our test code significantly, as the tests will set the DEFAULT version to 
TEST_VERSION_CURRENT at start ...

The changes to the Version class will be very simple.
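
To illustrate, the additions could look roughly like this (a sketch following the names proposed above, not committed code; the existing enum constants are abbreviated):

public enum Version {
  LUCENE_29, LUCENE_30, LUCENE_31; // existing constants, abbreviated here

  private static volatile Version defaultVersion; // unset until the app decides

  public static void setDefault(Version version) {
    defaultVersion = version;
  }

  public static Version getDefault() {
    final Version v = defaultVersion;
    if (v == null) {
      throw new DefaultVersionNotSetException(
          "call Version.setDefault() before using the version-less APIs");
    }
    return v;
  }

  public static final class DefaultVersionNotSetException extends RuntimeException {
    DefaultVersionNotSetException(String message) { super(message); }
  }
}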

If people think that's acceptable, I can open an issue and work on it.

Shai



RE: [jira] Account password

2010-04-13 Thread Uwe Schindler
LOL!

This user is assigned to very old bugzilla issues :-)

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: j...@apache.org [mailto:j...@apache.org]
 Sent: Tuesday, April 13, 2010 10:54 PM
 To: java-dev@lucene.apache.org
 Subject: [jira] Account password
 
 
   You (or someone else) has reset your password.
 
 -
 
 Your password has been changed to: MCwqNr
 
 You can change your password here:
 
https://issues.apache.org/jira/secure/ViewProfile.jspa
 
 Here are the details of your account:
 -
 Username: java-dev@lucene.apache.org
Email: java-dev@lucene.apache.org
Full Name: Lucene Developers
 Password: MCwqNr
 (You can always retrieve these via the Forgot Password link on the
 signup page)






RE: [jira] Account password

2010-04-13 Thread Uwe Schindler
I changed the password, so it's no longer public.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Uwe Schindler [mailto:u...@thetaphi.de]
 Sent: Tuesday, April 13, 2010 11:59 PM
 To: java-dev@lucene.apache.org
 Subject: RE: [jira] Account password
 
 LOL!
 
 This user is assigned to very old bugzilla issues :-)
 
 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
 






RE: issues.apache.org compromised: please update your passwords

2010-04-13 Thread Uwe Schindler
Hi Grant,

It is that user, who is assigned to the very early JIRA issues, e.g.:
https://issues.apache.org/jira/browse/LUCENE-1

I changed the password of this user in response to that email (for security), 
but I think we should simply let infra remove it. The problem is, almost 
anybody can instruct JIRA to reset the password and let JIRA send it again to 
the email which is the public java-dev list. And then it is public again.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant
 Ingersoll
 Sent: Wednesday, April 14, 2010 1:50 AM
 To: java-dev@lucene.apache.org
 Subject: Re: issues.apache.org compromised: please update your
 passwords
 
  FYI, this is for real. Some have asked me if it is made up. I don't know who owns that user, so we should ask on infra, I suspect. Also, this applies to all user accounts on JIRA too.
 
 On Apr 13, 2010, at 12:25 PM, r...@apache.org wrote:
 
  Dear Lucene Developers,
 
   You are receiving this email because you have a login, 'java-d...@lucene.apache.org', on the Apache JIRA installation, https://issues.apache.org/jira/
  
   On April 6 the issues.apache.org server was hacked. The attackers were able to install a trojan JIRA login screen and later get full root access:
  
   https://blogs.apache.org/infra/entry/apache_org_04_09_2010
  
   We are assuming that the attackers have a copy of the JIRA database, which includes a hash (SHA-512 unsalted) of the password you set when signing up as 'java-dev@lucene.apache.org' to JIRA. If the password you set was not of great quality (eg. based on a dictionary word), it should be assumed that the attackers can guess your password from the password hash via brute force.
  
   The upshot is that someone malicious may know both your email address and a password of yours.
  
   This is a problem because many people reuse passwords across online services. If you reuse passwords across systems, we urge you to change your passwords on ALL SYSTEMS that might be using the compromised JIRA password. Prime examples might be gmail or hotmail accounts, online banking sites, or sites known to be related to your email's domain, lucene.apache.org.
  
   Naturally we would also like you to reset your JIRA password. That can be done at:
  
   https://issues.apache.org/jira/secure/ForgotPassword!default.jspa?username=java-...@lucene.apache.org
  
   We (the Apache JIRA administrators) sincerely apologize for this security breach. If you have any questions, please let us know by email. We are also available on the #asfinfra IRC channel on irc.freenode.net.
 
 
  Regards,
 
  The Apache Infrastructure Team
 



RE: svn commit: r932773 - /lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java

2010-04-11 Thread Uwe Schindler
Robert,

as the comment says, it's a hack. How about simply adding a public getter method for the matchVersion to the base class StopwordAwareAna?
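
Something like this would do (a sketch, assuming the base class keeps the field under the name matchVersion):

/** Returns the Lucene match version this analyzer was configured with. */
public Version getMatchVersion() {
  return matchVersion;
}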

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: rm...@apache.org [mailto:rm...@apache.org]
 Sent: Saturday, April 10, 2010 7:52 PM
 To: java-comm...@lucene.apache.org
 Subject: svn commit: r932773 - /lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java
 
 Author: rmuir
 Date: Sat Apr 10 17:51:30 2010
 New Revision: 932773
 
 URL: http://svn.apache.org/viewvc?rev=932773&view=rev
 Log:
 fix failing test, StdAnalyzer now stores this in its superclass
 
 Modified:
 lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java
 
 Modified: lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java
 URL: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java?rev=932773&r1=932772&r2=932773&view=diff
 ==============================================================================
 --- lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java (original)
 +++ lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java Sat Apr 10 17:51:30 2010
 @@ -68,8 +68,8 @@ public class TestLuceneMatchVersion exte
  tok = (StandardTokenizer) tsi.getTokenizer();
  assertFalse(tok.isReplaceInvalidAcronym());
 
 -// this is a hack to get the private matchVersion field in StandardAnalyzer, may break in later lucene versions - we have no getter :(
 -final Field matchVersionField = StandardAnalyzer.class.getDeclaredField("matchVersion");
 +// this is a hack to get the private matchVersion field in StandardAnalyzer's superclass, may break in later lucene versions - we have no getter :(
 +final Field matchVersionField = StandardAnalyzer.class.getSuperclass().getDeclaredField("matchVersion");
  matchVersionField.setAccessible(true);
 
  type = schema.getFieldType("textStandardAnalyzerDefault");
 






RE: svn commit: r932773 - /lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java

2010-04-11 Thread Uwe Schindler
This is why I added the comment. But I forgot about it when I committed the Lucene refactoring :-) So let's fix it with a simple getter!

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

 http://www.thetaphi.de/ http://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Robert Muir [mailto:rcm...@gmail.com] 
Sent: Sunday, April 11, 2010 11:47 AM
To: java-dev@lucene.apache.org
Subject: Re: svn commit: r932773 - /lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java

 

I agree we should do something better, I do not like the way the test looks now 
(no offense) as it is prone to break... 

On Sun, Apr 11, 2010 at 5:39 AM, Uwe Schindler u...@thetaphi.de wrote:

Robert,

as the comment says, it's a hack. How about simply adding a public getter method for the matchVersion to the base class StopwordAwareAna?

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de



 -Original Message-
 From: rm...@apache.org [mailto:rm...@apache.org]
 Sent: Saturday, April 10, 2010 7:52 PM
 To: java-comm...@lucene.apache.org
 Subject: svn commit: r932773 - /lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java

 Author: rmuir
 Date: Sat Apr 10 17:51:30 2010
 New Revision: 932773

 URL: http://svn.apache.org/viewvc?rev=932773&view=rev
 Log:
 fix failing test, StdAnalyzer now stores this in its superclass

 Modified:
 lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java

 Modified: lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java
 URL: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java?rev=932773&r1=932772&r2=932773&view=diff
 ==============================================================================
 --- lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java (original)
 +++ lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java Sat Apr 10 17:51:30 2010
 @@ -68,8 +68,8 @@ public class TestLuceneMatchVersion exte
  tok = (StandardTokenizer) tsi.getTokenizer();
  assertFalse(tok.isReplaceInvalidAcronym());

 -// this is a hack to get the private matchVersion field in StandardAnalyzer, may break in later lucene versions - we have no getter :(
 -final Field matchVersionField = StandardAnalyzer.class.getDeclaredField("matchVersion");
 +// this is a hack to get the private matchVersion field in StandardAnalyzer's superclass, may break in later lucene versions - we have no getter :(
 +final Field matchVersionField = StandardAnalyzer.class.getSuperclass().getDeclaredField("matchVersion");
  matchVersionField.setAccessible(true);

  type = schema.getFieldType("textStandardAnalyzerDefault");









-- 
Robert Muir
rcm...@gmail.com



[jira] Resolved: (LUCENE-2389) Enforce TokenStream impl / Analyzer finalness by an assertion

2010-04-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-2389.
---

Resolution: Fixed

Committed revision: 932864

 Enforce TokenStream impl / Analyzer finalness by an assertion
 -

 Key: LUCENE-2389
 URL: https://issues.apache.org/jira/browse/LUCENE-2389
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2389.patch, LUCENE-2389.patch


 As noted in LUCENE-1753 and other issues, TokenStream and Analyzers are based 
 on the decorator pattern. At least all TokenStream and Analyzer 
 implementations in Lucene and Solr should be final.
 The attached patch adds an assertion to the ctors of both classes that does 
 the corresponding checks:
 - Analyzers must be final or private classes or anonymous inner classes
 - TokenStreams must be final or private classes or anonymous inner classes or 
 have a final incrementToken()
 I will commit this after Robert has fixed the Solr streams.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2154) Need a clean way for Dir/MultiReader to merge the AttributeSources of the sub-readers

2010-04-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2154:
--

Attachment: LUCENE-2154-Jakarta-BCEL.patch

Slightly improved patch to work correctly with CharTermAttribute (it defines 
some methods, which ProxyAttributeImpl also defines, as final, so overriding 
them failed).

 Need a clean way for Dir/MultiReader to merge the AttributeSources of the 
 sub-readers
 ---

 Key: LUCENE-2154
 URL: https://issues.apache.org/jira/browse/LUCENE-2154
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: Flex Branch
Reporter: Michael McCandless
 Fix For: 3.1

 Attachments: LUCENE-2154-cglib.patch, LUCENE-2154-Jakarta-BCEL.patch, 
 LUCENE-2154-Jakarta-BCEL.patch, LUCENE-2154-javassist.patch, 
 LUCENE-2154-javassist.patch, LUCENE-2154.patch, LUCENE-2154.patch


 The flex API allows extensibility at the Fields/Terms/Docs/PositionsEnum 
 levels, for a codec to set custom attrs.
 But, it's currently broken for Dir/MultiReader, which must somehow share 
 attrs across all the sub-readers.  Somehow we must make a single attr source, 
 and tell each sub-reader's enum to use that instead of creating its own.  
 Hopefully Uwe can work some magic here :)

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855739#action_12855739
 ] 

Uwe Schindler commented on LUCENE-2386:
---

I don't understand the whole issue either.

For me it is perfectly fine that, if I open an IndexWriter with create=true, 
the index is created empty first. This has the big advantage that IndexReaders 
can open it and will not fail with "index not found". OK, this can be done by a 
commit directly after creating, but for code like "create an IndexWriter with 
create=true if the index does not exist, else append", this is more work to do.

The question is also: what happens if you call IndexWriter.getReader() without 
the initial commit? Does this work with your patch?

For me this patch is too heavy for the small improvement, and it's a behaviour 
change, not a real bug.

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, 
 LUCENE-2386.patch


 I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh 
 Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems 
 unnecessary, and kind of brings back an autoCommit mode, in a strange way 
 ... why do we need that commit? Do we really expect people to open an 
 IndexReader on an empty Directory which they just passed to an IW w/ 
 create=true? If they want, they can simply call commit() right away on the IW 
 they created.
 I ran into this when writing a test which committed N times, then compared 
 the number of commits (via IndexReader.listCommits) and was surprised to see 
 N+1 commits.
 Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter 
 jumping on me .. so the change might not be that simple. But I think it's 
 manageable, so I'll try to attack it (and IFD specifically!) back :).

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855792#action_12855792
 ] 

Uwe Schindler commented on LUCENE-2386:
---

Thanks Earwin, that's exactly my opinion, too. For me the whole behaviour is 
defined and correct. The create param in the ctor is just an initialization of 
the directory to be a defined index (empty at the beginning).

Maybe we should remove the create param from the IndexWriter ctor/config 
altogether, and just define a static utility method in IW that initializes an 
empty directory. The standard ctors in IW should then throw IndexNotFound if 
the directory is not yet initialized. This way, we don't need those strange 
create params.
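
Just to make the proposal concrete, a minimal sketch of such a utility, under 
the 3.x API as I understand it (IndexInitializer and createEmptyIndex are 
hypothetical names, not an actual Lucene API):

{code}
import java.io.IOException;

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;

public final class IndexInitializer {
  private IndexInitializer() {}

  /** Hypothetical helper: turn a fresh Directory into a defined (empty) index. */
  public static void createEmptyIndex(Directory dir) throws IOException {
    if (!IndexReader.indexExists(dir)) {
      // create=true writes the initial commit; close() makes it durable
      new IndexWriter(dir, new WhitespaceAnalyzer(), true,
          IndexWriter.MaxFieldLength.UNLIMITED).close();
    }
  }
}
{code}

The standard ctors could then drop the create param and throw when the 
directory was never initialized this way.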

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, 
 LUCENE-2386.patch, LUCENE-2386.patch


 I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh 
 Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems 
 unnecessary, and kind of brings back an autoCommit mode, in a strange way 
 ... why do we need that commit? Do we really expect people to open an 
 IndexReader on an empty Directory which they just passed to an IW w/ 
 create=true? If they want, they can simply call commit() right away on the IW 
 they created.
 I ran into this when writing a test which committed N times, then compared 
 the number of commits (via IndexReader.listCommits) and was surprised to see 
 N+1 commits.
 Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter 
 jumping on me .. so the change might not be that simple. But I think it's 
 manageable, so I'll try to attack it (and IFD specifically!) back :).

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute

2010-04-10 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2372:
--

Attachment: LUCENE-2372.patch

Updated patch: now KeywordAnalyzer and PerFieldAnalyzerWrapper are also made 
final and the backwards layer is removed.

I will commit this later today and proceed with contrib. Robert, we should 
talk about who does which one!

 Replace deprecated TermAttribute by new CharTermAttribute
 -

 Key: LUCENE-2372
 URL: https://issues.apache.org/jira/browse/LUCENE-2372
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2372.patch, LUCENE-2372.patch, LUCENE-2372.patch, 
 LUCENE-2372.patch


 After LUCENE-2302 is merged to trunk with flex, we need to carry over all 
 tokenizers and consumers of the TokenStreams to the new CharTermAttribute.
 We should also think about adding an AttributeFactory that creates a subclass 
 of CharTermAttributeImpl that returns collation keys in the toBytesRef() 
 accessor. CollationKeyFilter is then obsolete; instead, you can simply convert 
 every TokenStream to index only CollationKeys by changing the attribute 
 implementation.
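
As a small illustration of what consuming the new attribute looks like (a 
sketch against the 3.1-era interfaces, not part of any patch here; the naive 
per-char lowercasing is just for demonstration):

{code}
import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class LowerCaseSketchFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

  public LowerCaseSketchFilter(TokenStream in) {
    super(in);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    final char[] buffer = termAtt.buffer(); // direct access to the term chars
    final int length = termAtt.length();    // CharSequence-style length
    for (int i = 0; i < length; i++) {
      buffer[i] = Character.toLowerCase(buffer[i]); // naive, ignores surrogates
    }
    return true;
  }
}
{code}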

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute

2010-04-10 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2372:
--

Attachment: LUCENE-2372.patch

Updated patch after last commit.

 Replace deprecated TermAttribute by new CharTermAttribute
 -

 Key: LUCENE-2372
 URL: https://issues.apache.org/jira/browse/LUCENE-2372
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2372.patch, LUCENE-2372.patch, LUCENE-2372.patch, 
 LUCENE-2372.patch, LUCENE-2372.patch


 After LUCENE-2302 is merged to trunk with flex, we need to carry over all 
 tokenizers and consumers of the TokenStreams to the new CharTermAttribute.
 We should also think about adding an AttributeFactory that creates a subclass 
 of CharTermAttributeImpl that returns collation keys in the toBytesRef() 
 accessor. CollationKeyFilter is then obsolete; instead, you can simply convert 
 every TokenStream to index only CollationKeys by changing the attribute 
 implementation.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute

2010-04-10 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855590#action_12855590
 ] 

Uwe Schindler commented on LUCENE-2372:
---

Committed core part in revision: 932749

 Replace deprecated TermAttribute by new CharTermAttribute
 -

 Key: LUCENE-2372
 URL: https://issues.apache.org/jira/browse/LUCENE-2372
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2372.patch, LUCENE-2372.patch, LUCENE-2372.patch, 
 LUCENE-2372.patch, LUCENE-2372.patch


 After LUCENE-2302 is merged to trunk with flex, we need to carry over all 
 tokenizers and consumers of the TokenStreams to the new CharTermAttribute.
 We should also think about adding an AttributeFactory that creates a subclass 
 of CharTermAttributeImpl that returns collation keys in the toBytesRef() 
 accessor. CollationKeyFilter is then obsolete; instead, you can simply convert 
 every TokenStream to index only CollationKeys by changing the attribute 
 implementation.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2389) Enforce TokenStream impl / Analyzer finalness by an assertion

2010-04-10 Thread Uwe Schindler (JIRA)
Enforce TokenStream impl / Analyzer finalness by an assertion
-

 Key: LUCENE-2389
 URL: https://issues.apache.org/jira/browse/LUCENE-2389
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler


As noted in LUCENE-1753 and other issues, TokenStream and Analyzers are based 
on the decorator pattern. At least all TokenStream and Analyzer implementations 
in Lucene and Solr should be final.

The attached patch adds an assertion to the ctors of both classes that does the 
corresponding checks:
- Analyzers must be final or private classes or anonymous inner classes
- TokenStreams must be final or private classes or anonymous inner classes or 
have a final incrementToken()

I will commit this after Robert has fixed the Solr streams.
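
A minimal sketch of what such a ctor assertion can look like (illustrative 
only; class, method, and message here are my own, the actual patch may differ):

{code}
import java.io.IOException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public abstract class SketchTokenStream {

  protected SketchTokenStream() {
    assert assertFinal(); // only runs when assertions (-ea) are enabled
  }

  public abstract boolean incrementToken() throws IOException;

  private boolean assertFinal() {
    try {
      final Class<?> clazz = getClass();
      if (!clazz.isAnonymousClass()
          && (clazz.getModifiers() & (Modifier.FINAL | Modifier.PRIVATE)) == 0) {
        // not final/private/anonymous, so at least incrementToken() must be final
        final Method m = clazz.getMethod("incrementToken");
        assert Modifier.isFinal(m.getModifiers())
            : "TokenStream implementations must be final or have a final incrementToken()";
      }
      return true;
    } catch (NoSuchMethodException nsme) {
      return false;
    }
  }
}
{code}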

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2389) Enforce TokenStream impl / Analyzer finalness by an assertion

2010-04-10 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2389:
--

Fix Version/s: 3.1

 Enforce TokenStream impl / Analyzer finalness by an assertion
 -

 Key: LUCENE-2389
 URL: https://issues.apache.org/jira/browse/LUCENE-2389
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1


 As noted in LUCENE-1753 and other issues, TokenStream and Analyzers are based 
 on the decorator pattern. At least all TokenStream and Analyzer 
 implementations in Lucene and Solr should be final.
 The attached patch adds an assertion to the ctors of both classes that does 
 the corresponding checks:
 - Analyzers must be final or private classes or anonymous inner classes
 - TokenStreams must be final or private classes or anonymous inner classes or 
 have a final incrementToken()
 I will commit this after Robert has fixed the Solr streams.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2389) Enforce TokenStream impl / Analyzer finalness by an assertion

2010-04-10 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2389:
--

Attachment: LUCENE-2389.patch

Patch.

 Enforce TokenStream impl / Analyzer finalness by an assertion
 -

 Key: LUCENE-2389
 URL: https://issues.apache.org/jira/browse/LUCENE-2389
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2389.patch


 As noted in LUCENE-1753 and other issues, TokenStream and Analyzers are based 
 on the decorator pattern. At least all TokenStream and Analyzer 
 implementations in Lucene and Solr should be final.
 The attached patch adds an assertion to the ctors of both classes that does 
 the corresponding checks:
 - Analyzers must be final or private classes or anonymous inner classes
 - TokenStreams must be final or private classes or anonymous inner classes or 
 have a final incrementToken()
 I will commit this after Robert has fixed the Solr streams.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2389) Enforce TokenStream impl / Analyzer finalness by an assertion

2010-04-10 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2389:
--

Attachment: LUCENE-2389.patch

Improved patch that also allows Analyzers whose tokenStream()/reusableTokenStream() 
methods are final.

 Enforce TokenStream impl / Analyzer finalness by an assertion
 -

 Key: LUCENE-2389
 URL: https://issues.apache.org/jira/browse/LUCENE-2389
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2389.patch, LUCENE-2389.patch


 As noted in LUCENE-1753 and other issues, TokenStream and Analyzers are based 
 on the decorator pattern. At least all TokenStream and Analyzer 
 implementations in Lucene and Solr should be final.
 The attached patch adds an assertion to the ctors of both classes that does 
 the corresponding checks:
 - Analyzers must be final or private classes or anonymous inner classes
 - TokenStreams must be final or private classes or anonymous inner classes or 
 have a final incrementToken()
 I will commit this after Robert has fixed the Solr streams.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute

2010-04-09 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2372:
--

Attachment: LUCENE-2372.patch

Here is a first patch for the core tokenstreams. Tests are not yet changed.

The following things were additionally fixed:
- StandardAnalyzer was made final (backwards break; we forgot to make it final 
in the 3.0 TS finalization issue). This enabled me to subclass 
StopwordAnalyzerBase and remove heavy code duplication (see the sketch below). 
The original code also contained a bug in the tokenStream method (no 
setReplaceInvalidAcronym) which was correct in reusableTokenStream. Now it is 
correct.

I will post further patches for core.
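
For readers who have not seen the base class yet, a minimal sketch of the 
subclassing pattern mentioned above, assuming the 3.1-era StopwordAnalyzerBase 
API (the analyzer class and its contents are my own illustration, not the 
actual patch):

{code}
import java.io.Reader;
import java.util.Set;

import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.StopwordAnalyzerBase;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

// final, as the finalness assertion from LUCENE-2389 demands
public final class SimpleStopwordAnalyzer extends StopwordAnalyzerBase {

  public SimpleStopwordAnalyzer(Version matchVersion, Set<?> stopWords) {
    super(matchVersion, stopWords); // base class stores matchVersion + stopwords
  }

  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    final Tokenizer source = new WhitespaceTokenizer(matchVersion, reader);
    return new TokenStreamComponents(source,
        new StopFilter(matchVersion, source, stopwords));
  }
}
{code}

The duplication disappears because tokenStream/reusableTokenStream live in the 
base class and both go through the same createComponents, so bugs like the 
missing setReplaceInvalidAcronym cannot diverge between the two code paths.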

 Replace deprecated TermAttribute by new CharTermAttribute
 -

 Key: LUCENE-2372
 URL: https://issues.apache.org/jira/browse/LUCENE-2372
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2372.patch


 After LUCENE-2302 is merged to trunk with flex, we need to carry over all 
 tokenizers and consumers of the TokenStreams to the new CharTermAttribute.
 We should also think about adding an AttributeFactory that creates a subclass 
 of CharTermAttributeImpl that returns collation keys in the toBytesRef() 
 accessor. CollationKeyFilter is then obsolete; instead, you can simply convert 
 every TokenStream to index only CollationKeys by changing the attribute 
 implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2302) Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable)

2010-04-09 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2302:
--

Attachment: LUCENE-2302-toString.patch

Patch that fixes the toString() problems in Token, adds the missing CHANGES.txt 
entry, fixes the backwards tests, and updates the javadocs to document the 
backwards break.

Deprecating Token should be done in another issue.

I will commit this soon, to be able to go forward with tokenstream conversion!

 Replacement for TermAttribute+Impl with extended capabilities (byte[] 
 support, CharSequence, Appendable)
 

 Key: LUCENE-2302
 URL: https://issues.apache.org/jira/browse/LUCENE-2302
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2302-toString.patch, LUCENE-2302.patch, 
 LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch


 For flexible indexing, terms can be simple byte[] arrays, while the current 
 TermAttribute only supports char[]. This is fine for plain text, but e.g. 
 NumericTokenStream should work directly on the byte[] array.
 Also, TermAttribute lacks some interfaces that would make it simpler for 
 users to work with terms: Appendable and CharSequence.
 I propose to create a new interface CharTermAttribute with a clean new API 
 that concentrates on CharSequence and Appendable.
 The implementation class will simply support the old and new interface 
 working on the same term buffer. DEFAULT_ATTRIBUTE_FACTORY will take care of 
 this. So if somebody adds a TermAttribute, he will get an implementation 
 class that can also be used as CharTermAttribute. As both attributes create 
 the same impl instance, both calls to addAttribute are equal. So a TokenFilter 
 that adds CharTermAttribute to the source will work with the same instance as 
 the Tokenizer that requested the (deprecated) TermAttribute.
 To also support byte[]-only terms, like Collation or NumericField needs, a 
 separate getter-only interface will be added that returns a reusable 
 BytesRef, e.g. BytesRefGetterAttribute. The default implementation class will 
 also support this interface. For backwards compatibility with old 
 self-made TermAttribute implementations, the indexer will check with 
 hasAttribute() if the BytesRef getter interface is there, and if not, will 
 wrap an old-style TermAttribute (a deprecated wrapper class will be provided): 
 new BytesRefGetterAttributeWrapper(TermAttribute), which is then used by the 
 indexer.
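
To make the "same impl instance" point concrete, a tiny sketch (assumes the 
3.1-era classes; WhitespaceTokenizer stands in for any tokenizer):

{code}
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public class SharedImplDemo {
  public static void main(String[] args) {
    TokenStream ts = new WhitespaceTokenizer(new StringReader("hello world"));
    CharTermAttribute charTerm = ts.addAttribute(CharTermAttribute.class);
    TermAttribute term = ts.addAttribute(TermAttribute.class); // deprecated view
    // DEFAULT_ATTRIBUTE_FACTORY backs both interfaces with one impl instance:
    System.out.println(charTerm == (Object) term); // expected: true
  }
}
{code}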

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2302) Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable)

2010-04-09 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-2302.
---

   Resolution: Fixed
Lucene Fields: [New, Patch Available]  (was: [New])

Committed revision: 932369

 Replacement for TermAttribute+Impl with extended capabilities (byte[] 
 support, CharSequence, Appendable)
 

 Key: LUCENE-2302
 URL: https://issues.apache.org/jira/browse/LUCENE-2302
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2302-toString.patch, LUCENE-2302.patch, 
 LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch


 For flexible indexing, terms can be simple byte[] arrays, while the current 
 TermAttribute only supports char[]. This is fine for plain text, but e.g. 
 NumericTokenStream should work directly on the byte[] array.
 Also, TermAttribute lacks some interfaces that would make it simpler for 
 users to work with terms: Appendable and CharSequence.
 I propose to create a new interface CharTermAttribute with a clean new API 
 that concentrates on CharSequence and Appendable.
 The implementation class will simply support the old and new interface 
 working on the same term buffer. DEFAULT_ATTRIBUTE_FACTORY will take care of 
 this. So if somebody adds a TermAttribute, he will get an implementation 
 class that can also be used as CharTermAttribute. As both attributes create 
 the same impl instance, both calls to addAttribute are equal. So a TokenFilter 
 that adds CharTermAttribute to the source will work with the same instance as 
 the Tokenizer that requested the (deprecated) TermAttribute.
 To also support byte[]-only terms, like Collation or NumericField needs, a 
 separate getter-only interface will be added that returns a reusable 
 BytesRef, e.g. BytesRefGetterAttribute. The default implementation class will 
 also support this interface. For backwards compatibility with old 
 self-made TermAttribute implementations, the indexer will check with 
 hasAttribute() if the BytesRef getter interface is there, and if not, will 
 wrap an old-style TermAttribute (a deprecated wrapper class will be provided): 
 new BytesRefGetterAttributeWrapper(TermAttribute), which is then used by the 
 indexer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2364) Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery & Co.

2010-04-09 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855358#action_12855358
 ] 

Uwe Schindler commented on LUCENE-2364:
---

+1

Term is still used in a lot of places in internal code, but that can be changed 
easily. One of those places is MTQ :-)

 Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery 
 & Co.
 -

 Key: LUCENE-2364
 URL: https://issues.apache.org/jira/browse/LUCENE-2364
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: Flex Branch
Reporter: Uwe Schindler
 Fix For: 3.1


 It would be good to directly allow BytesRefs in TermQuery and TermRangeQuery 
 (as both queries convert the strings to BytesRef internally). For 
 NumericRange support in Solr it will be necessary to support numerics as 
 BytesRef in single-term queries.
 When this is added, don't forget to change TestNumericRangeQueryXX to 
 use the BytesRef ctor of TRQ.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute

2010-04-09 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2372:
--

Attachment: LUCENE-2372.patch

Patch that removes deprecated usage of TermAttribute from Lucene Core 
completely; all tests are also fixed.

 Replace deprecated TermAttribute by new CharTermAttribute
 -

 Key: LUCENE-2372
 URL: https://issues.apache.org/jira/browse/LUCENE-2372
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2372.patch, LUCENE-2372.patch


 After LUCENE-2302 is merged to trunk with flex, we need to carry over all 
 tokenizers and consumers of the TokenStreams to the new CharTermAttribute.
 We should also think about adding an AttributeFactory that creates a subclass 
 of CharTermAttributeImpl that returns collation keys in the toBytesRef() 
 accessor. CollationKeyFilter is then obsolete; instead, you can simply convert 
 every TokenStream to index only CollationKeys by changing the attribute 
 implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute

2010-04-09 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2372:
--

Attachment: LUCENE-2372.patch

Small updates.

Just one question: the only non-final Analyzer left is KeywordAnalyzer. If I 
make it final and also use ReusableTokenizerBase, can we remove the 
overridesTokenStream check completely? The question is who would ever want to 
override this class.

StandardAnalyzer was made final in this patch, so why not this one too?

 Replace deprecated TermAttribute by new CharTermAttribute
 -

 Key: LUCENE-2372
 URL: https://issues.apache.org/jira/browse/LUCENE-2372
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2372.patch, LUCENE-2372.patch, LUCENE-2372.patch


 After LUCENE-2302 is merged to trunk with flex, we need to carry over all 
 tokenizers and consumers of the TokenStreams to the new CharTermAttribute.
 We should also think about adding an AttributeFactory that creates a subclass 
 of CharTermAttributeImpl that returns collation keys in the toBytesRef() 
 accessor. CollationKeyFilter is then obsolete; instead, you can simply convert 
 every TokenStream to index only CollationKeys by changing the attribute 
 implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute

2010-04-09 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855493#action_12855493
 ] 

Uwe Schindler commented on LUCENE-2372:
---

Did it already for StandardAna (see patch).

 Replace deprecated TermAttribute by new CharTermAttribute
 -

 Key: LUCENE-2372
 URL: https://issues.apache.org/jira/browse/LUCENE-2372
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2372.patch, LUCENE-2372.patch, LUCENE-2372.patch


 After LUCENE-2302 is merged to trunk with flex, we need to carry over all 
 tokenizers and consumers of the TokenStreams to the new CharTermAttribute.
 We should also think about adding an AttributeFactory that creates a subclass 
 of CharTermAttributeImpl that returns collation keys in the toBytesRef() 
 accessor. CollationKeyFilter is then obsolete; instead, you can simply convert 
 every TokenStream to index only CollationKeys by changing the attribute 
 implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute

2010-04-09 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855498#action_12855498
 ] 

Uwe Schindler commented on LUCENE-2372:
---

One more: PerFieldAnalyzerWrapper :( - Sorry

 Replace deprecated TermAttribute by new CharTermAttribute
 -

 Key: LUCENE-2372
 URL: https://issues.apache.org/jira/browse/LUCENE-2372
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1
Reporter: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2372.patch, LUCENE-2372.patch, LUCENE-2372.patch


 After LUCENE-2302 is merged to trunk with flex, we need to carry over all 
 tokenizers and consumers of the TokenStreams to the new CharTermAttribute.
 We should also think about adding an AttributeFactory that creates a subclass 
 of CharTermAttributeImpl that returns collation keys in the toBytesRef() 
 accessor. CollationKeyFilter is then obsolete; instead, you can simply convert 
 every TokenStream to index only CollationKeys by changing the attribute 
 implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2010-04-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854882#action_12854882
 ] 

Uwe Schindler commented on LUCENE-2074:
---

As requested on the mailing list, I will look into resetting the zzBuffer on 
Tokenizer.reset(Reader).

 Use a separate JFlex generated Unicode 4 by Java 5 compatible 
 StandardTokenizer
 ---

 Key: LUCENE-2074
 URL: https://issues.apache.org/jira/browse/LUCENE-2074
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
 LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch


 The current trunk version of StandardTokenizerImpl was generated by Java 1.4 
 (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should 
 regenerate the file.
 After regeneration the Tokenizer behaves differently for some characters. 
 Because of that we should only use the new TokenizerImpl when 
 Version.LUCENE_30 or LUCENE_31 is used as matchVersion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2010-04-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854886#action_12854886
 ] 

Uwe Schindler commented on LUCENE-2074:
---

I plan to commit this soon! So any patch will get outdated; that's why I want 
to fix this here. And as this patch removes direct access from the Tokenizer to 
the lexer (it is only accessible through an interface now), we have to 
change the jflex file to do it correctly.

 Use a separate JFlex generated Unicode 4 by Java 5 compatible 
 StandardTokenizer
 ---

 Key: LUCENE-2074
 URL: https://issues.apache.org/jira/browse/LUCENE-2074
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
 LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch


 The current trunk version of StandardTokenizerImpl was generated by Java 1.4 
 (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should 
 regenerate the file.
 After regeneration the Tokenizer behaves differently for some characters. 
 Because of that we should only use the new TokenizerImpl when 
 Version.LUCENE_30 or LUCENE_31 is used as matchVersion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2010-04-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854890#action_12854890
 ] 

Uwe Schindler commented on LUCENE-2074:
---

You don't need the JFlex binaries in general, only if you regenerate the source 
files (using ant jflex). And it's easy to build: check out JFlex and run mvn 
install.

 Use a separate JFlex generated Unicode 4 by Java 5 compatible 
 StandardTokenizer
 ---

 Key: LUCENE-2074
 URL: https://issues.apache.org/jira/browse/LUCENE-2074
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
 LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch


 The current trunk version of StandardTokenizerImpl was generated by Java 1.4 
 (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should 
 regenerate the file.
 After regeneration the Tokenizer behaves differently for some characters. 
 Because of that we should only use the new TokenizerImpl when 
 Version.LUCENE_30 or LUCENE_31 is used as matchVersion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2010-04-08 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2074:
--

Attachment: LUCENE-2074.patch

Here is a new patch, with the zzBuffer reset to the default size implemented in 
a separate reset(Reader) method. As yyReset is generated as final, I had to 
choose a different name.

Before applying the patch, run:

{noformat}
svn copy StandardTokenizerImpl.* to StandardTokenizerImplOrig.* 
svn move StandardTokenizerImpl.* to StandardTokenizerImpl31.* 
{noformat}

I will commit this in a day or two!

 Use a separate JFlex generated Unicode 4 by Java 5 compatible 
 StandardTokenizer
 ---

 Key: LUCENE-2074
 URL: https://issues.apache.org/jira/browse/LUCENE-2074
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
 LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch


 The current trunk version of StandardTokenizerImpl was generated by Java 1.4 
 (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should 
 regenerate the file.
 After regeneration the Tokenizer behaves differently for some characters. 
 Because of that we should only use the new TokenizerImpl when 
 Version.LUCENE_30 or LUCENE_31 is used as matchVersion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2010-04-08 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2074:
--

Attachment: LUCENE-2074.patch

Also updated the error message about missing JFlex when calling ant jflex to 
regenerate the lexers. The message now contains instructions for downloading 
and building JFlex. Also added a CHANGES.txt entry.

 Use a separate JFlex generated Unicode 4 by Java 5 compatible 
 StandardTokenizer
 ---

 Key: LUCENE-2074
 URL: https://issues.apache.org/jira/browse/LUCENE-2074
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
 LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch


 The current trunk version of StandardTokenizerImpl was generated by Java 1.4 
 (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should 
 regenerate the file.
 After regeneration the Tokenizer behaves differently for some characters. 
 Because of that we should only use the new TokenizerImpl when 
 Version.LUCENE_30 or LUCENE_31 is used as matchVersion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2010-04-08 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2074:
--

Attachment: LUCENE-2074.patch

 Use a separate JFlex generated Unicode 4 by Java 5 compatible 
 StandardTokenizer
 ---

 Key: LUCENE-2074
 URL: https://issues.apache.org/jira/browse/LUCENE-2074
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
 LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch


 The current trunk version of StandardTokenizerImpl was generated by Java 1.4 
 (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should 
 regenerate the file.
 After regeneration the Tokenizer behaves differently for some characters. 
 Because of that we should only use the new TokenizerImpl when 
 Version.LUCENE_30 or LUCENE_31 is used as matchVersion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2010-04-08 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2074:
--

Attachment: (was: LUCENE-2074.patch)

 Use a separate JFlex generated Unicode 4 by Java 5 compatible 
 StandardTokenizer
 ---

 Key: LUCENE-2074
 URL: https://issues.apache.org/jira/browse/LUCENE-2074
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
 LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch


 The current trunk version of StandardTokenizerImpl was generated by Java 1.4 
 (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should 
 regenerate the file.
 After regeneration the Tokenizer behaves differently for some characters. 
 Because of that we should only use the new TokenizerImpl when 
 Version.LUCENE_30 or LUCENE_31 is used as matchVersion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.

2010-04-08 Thread Uwe Schindler (JIRA)
Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
-

 Key: LUCENE-2384
 URL: https://issues.apache.org/jira/browse/LUCENE-2384
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: Analysis
Affects Versions: 3.0.1
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1


When indexing large documents, the lexer buffer may stay large forever. This 
sub-issue resets the lexer buffer back to the default on reset(Reader).

This is done as part of the enclosing issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2010-04-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854900#action_12854900
 ] 

Uwe Schindler commented on LUCENE-2074:
---

Created sub-issue: LUCENE-2384

 Use a separate JFlex generated Unicode 4 by Java 5 compatible 
 StandardTokenizer
 ---

 Key: LUCENE-2074
 URL: https://issues.apache.org/jira/browse/LUCENE-2074
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
 LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch


 The current trunk version of StandardTokenizerImpl was generated by Java 1.4 
 (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should 
 regenerate the file.
 After regeneration the Tokenizer behaves differently for some characters. 
 Because of that we should only use the new TokenizerImpl when 
 Version.LUCENE_30 or LUCENE_31 is used as matchVersion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.

2010-04-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854903#action_12854903
 ] 

Uwe Schindler commented on LUCENE-2384:
---

For JFlex this does not help, as the JFlex-generated code always needs a Reader. 
This is special here: the lexer does not need to load the whole document into 
the buffer, it just sometimes needs a large look-forward/backwards buffer.

 Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
 -

 Key: LUCENE-2384
 URL: https://issues.apache.org/jira/browse/LUCENE-2384
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: Analysis
Affects Versions: 3.0.1
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1


 When indexing large documents, the lexer buffer may stay large forever. This 
 sub-issue resets the lexer buffer back to the default on reset(Reader).
 This is done as part of the enclosing issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.

2010-04-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854908#action_12854908
 ] 

Uwe Schindler commented on LUCENE-2384:
---

{quote}
patch to reset the zzBuffer when the input is reset. The code is really taken 
from 
https://sourceforge.net/mailarchive/message.php?msg_id=444070.38422...@web38901.mail.mud.yahoo.com
 so I can't really grant license to use it, but I think the guy released it as 
public domain by posting it to the mailing list. 
I tested it and it seems to work for me. Just including it here in case 
somebody wants to apply the patch directly to 3.0.1 (although it's better to 
wait for 3.1)
{quote}

Your fix adds additional complexity. Just reset the buffer back to the 
default ZZ_BUFFERSIZE on reset if it has grown. Your patch always reallocates a 
new buffer.

Use this:
{code}
public final void reset(Reader r) {
  // reset to default buffer size, if buffer has grown
  if (zzBuffer.length > ZZ_BUFFERSIZE) {
    zzBuffer = new char[ZZ_BUFFERSIZE];
  }
  yyreset(r);
}
{code}

 Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
 -

 Key: LUCENE-2384
 URL: https://issues.apache.org/jira/browse/LUCENE-2384
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: Analysis
Affects Versions: 3.0.1
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: reset.diff


 When indexing large documents, the lexer buffer may stay large forever. This 
 sub-issue resets the lexer buffer back to the default on reset(Reader).
 This is done as part of the enclosing issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core

2010-04-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855136#action_12855136
 ] 

Uwe Schindler commented on LUCENE-2385:
---

The patch does not look like you svn-moved the files. To preserve history, you 
should do an svn move of the files in your local checkout and then modify them 
to reflect the package changes (if any).

Did you do this?

 Move NoDeletionPolicy from benchmark to core
 

 Key: LUCENE-2385
 URL: https://issues.apache.org/jira/browse/LUCENE-2385
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark, Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-2385.patch


 As the subject says, but I'll also make it a singleton + add some unit tests, 
 as well as some documentation. I'll post a patch hopefully today.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core

2010-04-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855150#action_12855150
 ] 

Uwe Schindler commented on LUCENE-2385:
---

In general we place a list of all svn move/copy commands together with the 
patch, executable from the root dir. If you paste those commands into your 
terminal and then apply the patch, it works. One example is the jflex issue 
(ok, the commands there are shortened).

Another possibility is to have a second checkout where you arrange the files 
correctly (svn moved/copied), and one for creating the patches.

 Move NoDeletionPolicy from benchmark to core
 

 Key: LUCENE-2385
 URL: https://issues.apache.org/jira/browse/LUCENE-2385
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark, Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-2385.patch


 As the subject says, but I'll also make it a singleton + add some unit tests, 
 as well as some documentation. I'll post a patch hopefully today.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core

2010-04-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855164#action_12855164
 ] 

Uwe Schindler commented on LUCENE-2385:
---

Yeah, that's fine!

 Move NoDeletionPolicy from benchmark to core
 

 Key: LUCENE-2385
 URL: https://issues.apache.org/jira/browse/LUCENE-2385
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark, Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-2385.patch, LUCENE-2385.patch


 As the subject says, but I'll also make it a singleton + add some unit tests, 
 as well as some documentation. I'll post a patch hopefully today.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



RE: IndexWriter memory leak?

2010-04-08 Thread Uwe Schindler
There is one possibility that could be fixed:

As Tokenizers are reused, the analyzer holds a reference to the last used 
Reader. The easy fix would be to unset the Reader in Tokenizer.close(). If this 
is the case for you, that may be easy to do. Tokenizer.close() would then look 
like this:

  /** By default, closes the input Reader. */
  @Override
  public void close() throws IOException {
    input.close();
    input = null; // <-- new!
  }

Any comments from other committers?

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Ruben Laguna [mailto:ruben.lag...@gmail.com]
 Sent: Thursday, April 08, 2010 2:50 PM
 To: java-u...@lucene.apache.org
 Subject: Re: IndexWriter memory leak?
 
  I will double-check the heapdump.hprof in the afternoon. But I think that
  *some* readers are indeed held by
  docWriter.threadStates[0].consumer.fieldHash[1].fields[],
  as shown in [1] (this heapdump contains only live objects). The heapdump
  was taken after IndexWriter.commit()/IndexWriter.optimize() and all the
  Documents were already indexed and GCed (I will double check).
 
  So that would mean that the Reader is retained in memory by the following
  chain of references,
 
  DocumentsWriter -> DocumentsWriterThreadState -> DocFieldProcessorPerThread
  -> DocFieldProcessorPerField -> Fieldable -> Field (fieldsData)
 
  I'll double check with Eclipse MAT, as I said, that this chain is actually
  made of hard references only (no SoftReferences, WeakReferences, etc). I
  will also double check that there is no live Document that is referencing
  the Reader via the Field.
 
 [1] http://img.skitch.com/20100407-b86irkp7e4uif2wq1dd4t899qb.jpg
 
 On Thu, Apr 8, 2010 at 2:16 PM, Uwe Schindler u...@thetaphi.de wrote:
 
   Readers are not held. If you indexed the document and GCed the document
   instance, the readers are gone.
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
   -Original Message-
   From: Ruben Laguna [mailto:ruben.lag...@gmail.com]
   Sent: Thursday, April 08, 2010 1:28 PM
   To: java-u...@lucene.apache.org
   Subject: Re: IndexWriter memory leak?
  
    Now that the zzBuffer issue is solved...
   
    what about the references to the Readers held by docWriter? Tika's
    ParsingReaders are quite heavyweight, so retaining those in memory
    unnecessarily is also a hidden memory leak. Should I open a bug report
    on that one?
  
   /Rubén
  
   On Thu, Apr 8, 2010 at 12:11 PM, Shai Erera ser...@gmail.com
 wrote:
  
Guess we were replying at the same time :).
   
On Thu, Apr 8, 2010 at 1:04 PM, Uwe Schindler u...@thetaphi.de
   wrote:
   
 I already answered, that I will take care of this!

 Uwe

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de


  -Original Message-
  From: Shai Erera [mailto:ser...@gmail.com]
  Sent: Thursday, April 08, 2010 12:00 PM
  To: java-u...@lucene.apache.org
  Subject: Re: IndexWriter memory leak?
 
   Yes, that's the trimBuffer version I was thinking about, only this guy
   created a reset(Reader, int) and does both ops (resetting + trim) in one
   method call. More convenient. Can you please open an issue to track that?
   People will have a chance to comment on whether we (Lucene) should handle
   that, or whether it should be a JFlex fix. Based on the number of replies
   this guy received (0 !), I doubt JFlex would consider it a problem. But we
   can do some small service to our user base by protecting against such
   problems.
  
   And while you're opening the issue, if you want to take a stab at fixing
   it and post a patch, it'd be great :).
 
  Shai
 
  On Thu, Apr 8, 2010 at 12:51 PM, Ruben Laguna
  ruben.lag...@gmail.comwrote:
 
    I was investigating this a little further and in the JFlex mailing list
    I found [1]
   
    I don't know much about flex / JFlex but it seems that this guy resets
    the zzBuffer to 16384 or less when setting the input for the lexer
  
  
   Quoted from  shef she...@ya...
  
  
    I set
   
    %buffer 0
   
    in the options section, and then added this method to the lexer:
   
    /**
     * Set the input for the lexer. The size parameter really speeds things
     * up, because by default, the lexer allocates an internal buffer of 16k.
     * For most strings, this is unnecessarily large. If the size param is 0
     * or greater than 16k, then the buffer is set to 16k. If the size param
     * is smaller, then the buf will be set to the exact size
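
In rough Java, the trimming reset described there would look something like 
this (a sketch only: zzBuffer and yyreset(Reader) are the usual JFlex-generated 
members, the 16k cap and the method name are assumptions):

{code}
public final void reset(Reader reader, int size) {
  if (size <= 0 || size > 16384) {
    size = 16384;               // cap at the former default buffer size
  }
  if (zzBuffer.length != size) {
    zzBuffer = new char[size];  // trim (or grow) the internal buffer
  }
  yyreset(reader);              // standard generated reset, reuses zzBuffer
}
{code}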

[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2010-04-08 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2074:
--

Attachment: LUCENE-2074.patch

New patch that replaces the deprecated TermAttribute with CharTermAttribute. It 
also fixes the reset()/reset(Reader) methods to conform to all other Tokenizers 
and the documentation; the current one was resetting multiple times. This has 
no effect on backwards compatibility. Also improved the JFlex classpath 
detection to work with svn checkouts or future release zips.

I will commit this soon, once all tests have run.

 Use a separate JFlex generated Unicode 4 by Java 5 compatible 
 StandardTokenizer
 ---

 Key: LUCENE-2074
 URL: https://issues.apache.org/jira/browse/LUCENE-2074
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
 LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch


 The current trunk version of StandardTokenizerImpl was generated by Java 1.4 
 (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should 
 regenerate the file.
 After regeneration the Tokenizer behaves differently for some characters. 
 Because of that we should only use the new TokenizerImpl when 
 Version.LUCENE_30 or LUCENE_31 is used as matchVersion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2376) java.lang.OutOfMemoryError:Java heap space

2010-04-07 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854396#action_12854396
 ] 

Uwe Schindler commented on LUCENE-2376:
---

You mean an insane number of fields with norms...?

 java.lang.OutOfMemoryError:Java heap space
 --

 Key: LUCENE-2376
 URL: https://issues.apache.org/jira/browse/LUCENE-2376
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.9.1
 Environment: Windows
Reporter: Shivender Devarakonda
 Attachments: InfoStreamOutput.txt


 I see an OutOfMemory error in our product and it is happening when we have 
 some data objects on which we built the index. I see the following 
 OutOfmemory error, this is happening after we call Indexwriter.optimize():
 4/06/10 02:03:42.160 PM PDT [ERROR] [Lucene Merge Thread #12]  In thread 
 Lucene Merge Thread #12 and the message is 
 org.apache.lucene.index.MergePolicy$MergeException: 
 java.lang.OutOfMemoryError: Java heap space
 4/06/10 02:03:42.207 PM PDT [VERBOSE] [Lucene Merge Thread #12] [Manager] 
 Uncaught Exception in thread Lucene Merge Thread #12
 org.apache.lucene.index.MergePolicy$MergeException: 
 java.lang.OutOfMemoryError: Java heap space
   at 
 org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:351)
   at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:315)
 Caused by: java.lang.OutOfMemoryError: Java heap space
   at java.util.HashMap.resize(HashMap.java:462)
   at java.util.HashMap.addEntry(HashMap.java:755)
   at java.util.HashMap.put(HashMap.java:385)
   at org.apache.lucene.index.FieldInfos.addInternal(FieldInfos.java:256)
   at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:366)
   at org.apache.lucene.index.FieldInfos.init(FieldInfos.java:71)
   at 
 org.apache.lucene.index.SegmentReader$CoreReaders.init(SegmentReader.java:116)
   at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638)
   at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:608)
   at 
 org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:686)
   at 
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4979)
   at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4614)
   at 
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
   at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)
 4/06/10 02:03:42.895 PM PDT [ERROR]  this writer hit an OutOfMemoryError; 
 cannot complete optimize

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2380) Add FieldCache.getTermBytes, to load term data as byte[]

2010-04-07 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854594#action_12854594
 ] 

Uwe Schindler commented on LUCENE-2380:
---

The structure should look like String and StringIndex, but I am not sure if we 
need real BytesRefs. In my opinion, it should be an array of byte[], where each 
byte[] is allocated with the term size from the enum's BytesRef and copied 
over. This is no problem, as the terms need to be copied either way, because 
the BytesRef from the enum is reused. The only problem is that byte[] is 
missing the cool BytesRef methods like utf8ToString() that may be needed by 
consumers.

getStrings and getStringIndex should be deprecated. We cannot emulate them 
using BytesRef.utf8ToString, as the String[] arrays are raw and allow no 
wrapping. If FieldCache used accessor methods and not raw arrays, we would not 
have that problem...
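
A rough sketch of the copy-over idea (a sketch only, against the flex TermsEnum 
that returns a reused BytesRef; numTerms and termsEnum are assumed to come from 
the surrounding loading code):

{code}
byte[][] termBytes = new byte[numTerms][];
int upto = 0;
BytesRef term;
while ((term = termsEnum.next()) != null) {
  // the enum reuses its BytesRef, so each term must be copied into its own byte[]
  byte[] copy = new byte[term.length];
  System.arraycopy(term.bytes, term.offset, copy, 0, term.length);
  termBytes[upto++] = copy;
}
{code}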

 Add FieldCache.getTermBytes, to load term data as byte[]
 

 Key: LUCENE-2380
 URL: https://issues.apache.org/jira/browse/LUCENE-2380
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 3.1


 With flex, a term is now an opaque byte[] (typically, utf8 encoded unicode 
 string, but not necessarily), so we need to push this up the search stack.
 FieldCache now has getStrings and getStringIndex; we need corresponding 
 methods to load terms as native byte[], since in general they may not be 
 representable as String.  This should be quite a bit more RAM efficient too, 
 for US ascii content since each character would then use 1 byte not 2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2380) Add FieldCache.getTermBytes, to load term data as byte[]

2010-04-07 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854639#action_12854639
 ] 

Uwe Schindler commented on LUCENE-2380:
---

This goes again in the direction of not having arrays in FieldCache anymore, 
but instead having accessor methods that take a docid and give back the data 
(possibly as a reference). So getBytes(docid) returns a reused BytesRef with 
offset and length of the requested term. For native types we should also move 
away from arrays and only provide accessor methods. Java is fast and the JIT 
possibly inlines the method call. So for native types we could also use a 
FloatBuffer or ByteBuffer or whatever from java.nio.
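
A minimal sketch of what such an accessor-style API could look like (interface 
name and signatures are assumptions, not a committed design):

{code}
public interface DocTermsAccessor {
  /** Fills the reusable BytesRef with the term bytes of the given doc and returns it. */
  BytesRef getBytes(int docID, BytesRef reuse);

  /** Accessor-style replacement for the raw float[] of getFloats(). */
  float getFloat(int docID);
}
{code}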

 Add FieldCache.getTermBytes, to load term data as byte[]
 

 Key: LUCENE-2380
 URL: https://issues.apache.org/jira/browse/LUCENE-2380
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 3.1


 With flex, a term is now an opaque byte[] (typically, utf8 encoded unicode 
 string, but not necessarily), so we need to push this up the search stack.
 FieldCache now has getStrings and getStringIndex; we need corresponding 
 methods to load terms as native byte[], since in general they may not be 
 representable as String.  This should be quite a bit more RAM efficient too, 
 for US ascii content since each character would then use 1 byte not 2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2383) Some small fixes after the flex merge...

2010-04-07 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854681#action_12854681
 ] 

Uwe Schindler commented on LUCENE-2383:
---

FCRF looks ok, I would only change the nextDoc() loop in the deletions-aware 
iterator to:

{code}
do {
  doc++;
  if (doc >= maxDoc) return NO_MORE_DOCS;
} while (skipDocs.get(doc) || !matchDoc(doc));
return doc;
{code}

and the same in advance(), changed a little:

{code}
for (int doc = target; doc < maxDoc; doc++) {
  if (!skipDocs.get(doc) && matchDoc(doc))
    return doc;
}
return NO_MORE_DOCS;
{code}

The try/catch is then unneeded. This seems clearer to me. The non-skipDocs 
iterator performs better with the try...catch, as it saves one bounds check. 
But here we need the bounds check in all cases, so why not do it up-front?

 Some small fixes after the flex merge...
 

 Key: LUCENE-2383
 URL: https://issues.apache.org/jira/browse/LUCENE-2383
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2383.patch


 Changes:
   * Re-introduced specialization optimization to FieldCacheRangeQuery;
 also fixed bug (was failing to check deletions in advance)
   * Changes 2 checkIndex methods from protected -> public
   * Add some missing null checks when calling MultiFields.getFields or
 IndexReader.fields()
   * Tweak'd CHANGES a bit
   * Removed some small dead code

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-2383) Some small fixes after the flex merge...

2010-04-07 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854681#action_12854681
 ] 

Uwe Schindler edited comment on LUCENE-2383 at 4/7/10 8:23 PM:
---

FCRF looks ok, I would only change the nextDoc() loop in the deletions-aware 
iterator to:

{code}
do {
  doc++;
  if (doc >= maxDoc) return NO_MORE_DOCS;
} while (skipDocs.get(doc) || !matchDoc(doc));
return doc;
{code}

and the same in advance(), changed a little:

{code}
for (doc = target; doc < maxDoc; doc++) {
  if (!skipDocs.get(doc) && matchDoc(doc))
    return doc;
}
return NO_MORE_DOCS;
{code}

The try/catch is then unneeded. This seems clearer to me. The non-skipDocs 
iterator performs better with the try...catch, as it saves one bounds check. 
But here we need the bounds check in all cases, so why not do it up-front?

  was (Author: thetaphi):
FCRF looks ok, I would only change the nextDoc() loop in the 
deletions-aware iterator to:

{code}
do {
  doc++;
  if (doc >= maxDoc) return NO_MORE_DOCS;
} while (skipDocs.get(doc) || !matchDoc(doc));
return doc;
{code}

and the same in advance(), changed a little:

{code}
for (int doc = target; doc < maxDoc; doc++) {
  if (!skipDocs.get(doc) && matchDoc(doc))
    return doc;
}
return NO_MORE_DOCS;
{code}

The try/catch is then unneeded. This seems clearer to me. The non-skipDocs 
iterator performs better with the try...catch, as it saves one bounds check. 
But here we need the bounds check in all cases, so why not do it up-front?
  
 Some small fixes after the flex merge...
 

 Key: LUCENE-2383
 URL: https://issues.apache.org/jira/browse/LUCENE-2383
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2383.patch


 Changes:
   * Re-introduced specialization optimization to FieldCacheRangeQuery;
 also fixed bug (was failing to check deletions in advance)
   * Changes 2 checkIndex methods from protected -> public
   * Add some missing null checks when calling MultiFields.getFields or
 IndexReader.fields()
   * Tweak'd CHANGES a bit
   * Removed some small dead code

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-2383) Some small fixes after the flex merge...

2010-04-07 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854681#action_12854681
 ] 

Uwe Schindler edited comment on LUCENE-2383 at 4/7/10 8:24 PM:
---

FCRF looks ok, I would only change the nextDoc() loop in the deletions-aware 
iterator to:

{code}
do {
  doc++;
  if (doc >= maxDoc)
    return doc = NO_MORE_DOCS;
} while (skipDocs.get(doc) || !matchDoc(doc));
return doc;
{code}

and the same in advance(), changed a little:

{code}
for (doc = target; doc < maxDoc; doc++) {
  if (!skipDocs.get(doc) && matchDoc(doc))
    return doc;
}
return doc = NO_MORE_DOCS;
{code}

The try/catch is then unneeded. This seems clearer to me. The non-skipDocs 
iterator performs better with the try...catch, as it saves one bounds check. 
But here we need the bounds check in all cases, so why not do it up-front?

  was (Author: thetaphi):
FCRF looks ok, I would only change the nextDoc() loop in the 
deletions-aware iterator to:

{code}
do {
  doc++;
  if (doc >= maxDoc) return NO_MORE_DOCS;
} while (skipDocs.get(doc) || !matchDoc(doc));
return doc;
{code}

and the same in advance(), changed a little:

{code}
for (doc = target; doc < maxDoc; doc++) {
  if (!skipDocs.get(doc) && matchDoc(doc))
    return doc;
}
return NO_MORE_DOCS;
{code}

The try/catch is then unneeded. This seems clearer to me. The non-skipDocs 
iterator performs better with the try...catch, as it saves one bounds check. 
But here we need the bounds check in all cases, so why not do it up-front?
  
 Some small fixes after the flex merge...
 

 Key: LUCENE-2383
 URL: https://issues.apache.org/jira/browse/LUCENE-2383
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2383.patch


 Changes:
   * Re-introduced specialization optimization to FieldCacheRangeQuery;
 also fixed bug (was failing to check deletions in advance)
   * Changes 2 checkIndex methods from protected -> public
   * Add some missing null checks when calling MultiFields.getFields or
 IndexReader.fields()
   * Tweak'd CHANGES a bit
   * Removed some small dead code

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



RE: Commit freeze in flex branch

2010-04-07 Thread Uwe Schindler
Thanks for praise! And also thanks to Mike for scanning 20K patch lines :-)

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Wednesday, April 07, 2010 10:13 PM
 To: java-dev@lucene.apache.org
 Subject: Re: Commit freeze in flex branch
 
 Yes +1 to that -- thanks Uwe!!
 
 And thanks for the many other people who helped out on flex.  It's a
 big and exciting improvement :)
 
 Mike
 
 On Wed, Apr 7, 2010 at 4:11 PM, Michael Busch busch...@gmail.com
 wrote:
  Uwe, thanks for doing all the svn work!  Was a smooth transition!
 
   Michael
 
  On 4/6/10 12:27 PM, Uwe Schindler wrote:
 
  The freeze is over, we merged successfully.
 
  If you had a flex branch checked out:
   svn switch https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene
 
  Uwe
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
 
  -Original Message-
  From: Uwe Schindler [mailto:u...@thetaphi.de]
  Sent: Tuesday, April 06, 2010 12:51 PM
  To: java-dev@lucene.apache.org
  Subject: Commit freeze in flex branch
 
  I am trying to reintegrate the flex branch into current trunk.
 After
  this has done, no more commits to flex! (after a reintegrate, the
 svn
  book says, that you should not touch the branch anymore) - Flex
  development can then proceed in trunk. It may happen that solr
  compilation/tests fail (because of recent changes in flex branch),
 I
  will fix this separately, so please do not complain, just let solr
  broken for a short time!
 
  It would be good if nobody would commit anything to flex anymore!
 After
  the merge, you can switch your flex checkouts.
 
  Before committing the merge, I will post a mega patch for review,
 that
  we have not missed anything during trunk-flex merges.
 
  Commits to trunk are OK, but should be spare.
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
 
 
  -
  To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: java-dev-h...@lucene.apache.org
 
 
 
  
 -
  To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: java-dev-h...@lucene.apache.org
 
 
 
 
 
  -
  To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: java-dev-h...@lucene.apache.org
 
 
 
 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Commit freeze in flex branch

2010-04-06 Thread Uwe Schindler
I am trying to reintegrate the flex branch into current trunk. After this is 
done, no more commits to flex! (After a reintegrate, the svn book says you 
should not touch the branch anymore.) Flex development can then proceed in 
trunk. It may happen that Solr compilation/tests fail (because of recent 
changes in the flex branch); I will fix this separately, so please do not 
complain, just leave Solr broken for a short time!

It would be good if nobody committed anything to flex anymore! After the 
merge, you can switch your flex checkouts.

Before committing the merge, I will post a mega patch for review, so that we 
can verify we have not missed anything during the trunk-to-flex merges.

Commits to trunk are OK, but should be sparse.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de




-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2370) Reintegrate flex branch into trunk

2010-04-06 Thread Uwe Schindler (JIRA)
Reintegrate flex branch into trunk
--

 Key: LUCENE-2370
 URL: https://issues.apache.org/jira/browse/LUCENE-2370
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1


This issue is for reintegrating the flex branch into current trunk. I will post 
the patch here for review, and commit it when all contributors to flex have 
reviewed the patch.

Before committing, I will tag both trunk and flex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2370) Reintegrate flex branch into trunk

2010-04-06 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2370:
--

Attachment: LUCENE-2370.patch

Here is the patch, just for review!

You cannot really apply it, as it does not contain changes that are simply svn 
copied from flex (i.e., all new files added by flex). The idea behind this 
patch is only that everybody working on flex should scroll through it and 
verify that the actually changed files are fine; e.g. that we did not miss a 
trunk change in flex (such a missing merge would appear as a revert in the 
patch).

My working copy tests fine; only Solr is not compiling anymore, because of 
recent changes to the internal NumericUtils class that are not backwards 
compatible. I will commit this patch first and break Solr, but will fix it 
soon!

 Reintegrate flex branch into trunk
 --

 Key: LUCENE-2370
 URL: https://issues.apache.org/jira/browse/LUCENE-2370
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2370.patch


 This issue is for reintegrating the flex branch into current trunk. I will 
 post the patch here for review and commit, when all contributors to flex have 
 reviewed the patch.
 Before committing, I will tag both trunk and flex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2370) Reintegrate flex branch into trunk

2010-04-06 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2370:
--

Attachment: (was: LUCENE-2370.patch)

 Reintegrate flex branch into trunk
 --

 Key: LUCENE-2370
 URL: https://issues.apache.org/jira/browse/LUCENE-2370
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1


 This issue is for reintegrating the flex branch into current trunk. I will 
 post the patch here for review and commit, when all contributors to flex have 
 reviewed the patch.
 Before committing, I will tag both trunk and flex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2370) Reintegrate flex branch into trunk

2010-04-06 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2370:
--

Attachment: LUCENE-2370.patch

Sorry, new patch.

The flex branch still contains some whitespace problems in contrib, but this is 
OK for now. I will check them and fix what I find.

 Reintegrate flex branch into trunk
 --

 Key: LUCENE-2370
 URL: https://issues.apache.org/jira/browse/LUCENE-2370
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2370.patch


 This issue is for reintegrating the flex branch into current trunk. I will 
 post the patch here for review and commit, when all contributors to flex have 
 reviewed the patch.
 Before committing, I will tag both trunk and flex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2370) Reintegrate flex branch into trunk

2010-04-06 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2370:
--

Attachment: LUCENE-2370.patch

Here is a new patch with lots of cleanups, thanks rmuir. Also reverted files 
with whitespace-only changes.

 Reintegrate flex branch into trunk
 --

 Key: LUCENE-2370
 URL: https://issues.apache.org/jira/browse/LUCENE-2370
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2370.patch, LUCENE-2370.patch


 This issue is for reintegrating the flex branch into current trunk. I will 
 post the patch here for review and commit, when all contributors to flex have 
 reviewed the patch.
 Before committing, I will tag both trunk and flex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2370) Reintegrate flex branch into trunk

2010-04-06 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2370:
--

Attachment: LUCENE-2370-solrfixes.patch

Here are some fixes for Solr:
- makes it compile after the flex merge
- has some really dirty hacks: numeric field contents should no longer be seen 
as Strings, they are now BytesRefs. This affects AnalysisRequestHandler and 
also the converter methods in the TrieField type. They should use BytesRefs 
after flex has landed.

 Reintegrate flex branch into trunk
 --

 Key: LUCENE-2370
 URL: https://issues.apache.org/jira/browse/LUCENE-2370
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2370-solrfixes.patch, LUCENE-2370.patch, 
 LUCENE-2370.patch


 This issue is for reintegrating the flex branch into current trunk. I will 
 post the patch here for review and commit, when all contributors to flex have 
 reviewed the patch.
 Before committing, I will tag both trunk and flex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2370) Reintegrate flex branch into trunk

2010-04-06 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2370:
--

Attachment: LUCENE-2370.patch

New patch, reverted all files with whitespace-only changes.

 Reintegrate flex branch into trunk
 --

 Key: LUCENE-2370
 URL: https://issues.apache.org/jira/browse/LUCENE-2370
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2370-solrfixes.patch, LUCENE-2370.patch, 
 LUCENE-2370.patch, LUCENE-2370.patch


 This issue is for reintegrating the flex branch into current trunk. I will 
 post the patch here for review and commit, when all contributors to flex have 
 reviewed the patch.
 Before committing, I will tag both trunk and flex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2370) Reintegrate flex branch into trunk

2010-04-06 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2370:
--

Attachment: LUCENE-2370.patch

Here is the final patch after cooperative review on IRC. I will now commit the 
merge for Solr+Lucene.

The following points are still broken:
- DirectoryReader re-added a bug (Mike McCandless knows about it)
- TestIndexWriterReader in trunk and backwards has some tests commented out; 
they relate to the above problem

 Reintegrate flex branch into trunk
 --

 Key: LUCENE-2370
 URL: https://issues.apache.org/jira/browse/LUCENE-2370
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2370-solrfixes.patch, LUCENE-2370.patch, 
 LUCENE-2370.patch, LUCENE-2370.patch, LUCENE-2370.patch


 This issue is for reintegrating the flex branch into current trunk. I will 
 post the patch here for review and commit, when all contributors to flex have 
 reviewed the patch.
 Before committing, I will tag both trunk and flex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2370) Reintegrate flex branch into trunk

2010-04-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854134#action_12854134
 ] 

Uwe Schindler commented on LUCENE-2370:
---

Committed revision: 931278

I leave the issue open until the bugs are fixed.

 Reintegrate flex branch into trunk
 --

 Key: LUCENE-2370
 URL: https://issues.apache.org/jira/browse/LUCENE-2370
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2370-solrfixes.patch, LUCENE-2370.patch, 
 LUCENE-2370.patch, LUCENE-2370.patch, LUCENE-2370.patch


 This issue is for reintegrating the flex branch into current trunk. I will 
 post the patch here for review and commit, when all contributors to flex have 
 reviewed the patch.
 Before committing, I will tag both trunk and flex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



RE: Commit freeze in flex branch

2010-04-06 Thread Uwe Schindler
The freeze is over, we merged successfully.

If you had a flex branch checked out:
 svn switch https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: Uwe Schindler [mailto:u...@thetaphi.de]
 Sent: Tuesday, April 06, 2010 12:51 PM
 To: java-dev@lucene.apache.org
 Subject: Commit freeze in flex branch
 
 I am trying to reintegrate the flex branch into current trunk. After
 this has done, no more commits to flex! (after a reintegrate, the svn
 book says, that you should not touch the branch anymore) - Flex
 development can then proceed in trunk. It may happen that solr
 compilation/tests fail (because of recent changes in flex branch), I
 will fix this separately, so please do not complain, just let solr
 broken for a short time!
 
 It would be good if nobody would commit anything to flex anymore! After
 the merge, you can switch your flex checkouts.
 
 Before committing the merge, I will post a mega patch for review, that
 we have not missed anything during trunk-flex merges.
 
 Commits to trunk are OK, but should be spare.
 
 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
 
 
 
 
 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2370) Reintegrate flex branch into trunk

2010-04-06 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-2370.
---

Resolution: Fixed

Mike fixed the missing merges! Thanks.

 Reintegrate flex branch into trunk
 --

 Key: LUCENE-2370
 URL: https://issues.apache.org/jira/browse/LUCENE-2370
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2370-solrfixes.patch, LUCENE-2370.patch, 
 LUCENE-2370.patch, LUCENE-2370.patch, LUCENE-2370.patch


 This issue is for reintegrating the flex branch into current trunk. I will 
 post the patch here for review and commit, when all contributors to flex have 
 reviewed the patch.
 Before committing, I will tag both trunk and flex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-2332) Merge CharTermAttribute and deprecations to trunk

2010-04-06 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler closed LUCENE-2332.
-

Resolution: Invalid

Flex was merged, so this is no longer needed.

 Merge CharTermAttribute and deprecations to trunk
 

 Key: LUCENE-2332
 URL: https://issues.apache.org/jira/browse/LUCENE-2332
 Project: Lucene - Java
  Issue Type: New Feature
Affects Versions: 3.1
Reporter: Uwe Schindler
Assignee: Uwe Schindler

 This should be merged to trunk before flex lands, so the analyzers can be 
 ported to the new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute

2010-04-06 Thread Uwe Schindler (JIRA)
Replace deprecated TermAttribute by new CharTermAttribute
-

 Key: LUCENE-2372
 URL: https://issues.apache.org/jira/browse/LUCENE-2372
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1
Reporter: Uwe Schindler
 Fix For: 3.1


After LUCENE-2302 is merged to trunk with flex, we need to carry over all 
tokenizers and consumers of TokenStreams to the new CharTermAttribute.

We should also think about adding an AttributeFactory that creates a subclass 
of CharTermAttributeImpl that returns collation keys from the toBytesRef() 
accessor. CollationKeyFilter would then be obsolete; instead you could convert 
any TokenStream to index only CollationKeys by changing the attribute 
implementation.
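
A minimal sketch of that factory idea (the class name and the toBytesRef() 
override follow the proposal above; nothing here is a committed API):

{code}
import java.text.Collator;
import org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl;
import org.apache.lucene.util.Attribute;
import org.apache.lucene.util.AttributeImpl;
import org.apache.lucene.util.AttributeSource;
import org.apache.lucene.util.BytesRef;

public final class CollationAttributeFactory extends AttributeSource.AttributeFactory {
  private final AttributeSource.AttributeFactory delegate;
  private final Collator collator;

  public CollationAttributeFactory(AttributeSource.AttributeFactory delegate, Collator collator) {
    this.delegate = delegate;
    this.collator = collator;
  }

  @Override
  public AttributeImpl createAttributeInstance(Class<? extends Attribute> attClass) {
    // only the term attribute gets the collating subclass; everything else is delegated
    if (attClass.isAssignableFrom(CharTermAttributeImpl.class)) {
      return new CharTermAttributeImpl() {
        @Override
        public BytesRef toBytesRef() { // accessor name taken from the proposal above
          return new BytesRef(collator.getCollationKey(toString()).toByteArray());
        }
      };
    }
    return delegate.createAttributeInstance(attClass);
  }
}
{code}

A Tokenizer constructed with such a factory would then index collation keys 
directly, without an extra CollationKeyFilter in the chain.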

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2302) Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable)

2010-04-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854199#action_12854199
 ] 

Uwe Schindler commented on LUCENE-2302:
---

I will create a patch with option #2 and lots of documentation and changed 
backwards tests.

 Replacement for TermAttribute+Impl with extended capabilities (byte[] 
 support, CharSequence, Appendable)
 

 Key: LUCENE-2302
 URL: https://issues.apache.org/jira/browse/LUCENE-2302
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch, 
 LUCENE-2302.patch, LUCENE-2302.patch


 For flexible indexing terms can be simple byte[] arrays, while the current 
 TermAttribute only supports char[]. This is fine for plain text, but e.g 
 NumericTokenStream should directly work on the byte[] array.
 Also TermAttribute lacks of some interfaces that would make it simplier for 
 users to work with them: Appendable and CharSequence
 I propose to create a new interface CharTermAttribute with a clean new API 
 that concentrates on CharSequence and Appendable.
 The implementation class will simply support the old and new interface 
 working on the same term buffer. DEFAULT_ATTRIBUTE_FACTORY will take care of 
 this. So if somebody adds a TermAttribute, he will get an implementation 
 class that can be also used as CharTermAttribute. As both attributes create 
 the same impl instance both calls to addAttribute are equal. So a TokenFilter 
 that adds CharTermAttribute to the source will work with the same instance as 
 the Tokenizer that requested the (deprecated) TermAttribute.
 To also support byte[] only terms like Collation or NumericField needs, a 
 separate getter-only interface will be added, that returns a reusable 
 BytesRef, e.g. BytesRefGetterAttribute. The default implementation class will 
 also support this interface. For backwards compatibility with old 
 self-made-TermAttribute implementations, the indexer will check with 
 hasAttribute(), if the BytesRef getter interface is there and if not will 
 wrap a old-style TermAttribute (a deprecated wrapper class will be provided): 
 new BytesRefGetterAttributeWrapper(TermAttribute), that is used by the 
 indexer then.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2374) Add introspection API to AttributeSource/AttributeImpl

2010-04-06 Thread Uwe Schindler (JIRA)
Add introspection API to AttributeSource/AttributeImpl
--

 Key: LUCENE-2374
 URL: https://issues.apache.org/jira/browse/LUCENE-2374
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis, Other
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1


AttributeSource/TokenStream inspection in Solr needs to have some insight into 
the contents of AttributeImpls. As LUCENE-2302 has some problems with 
toString() [which is not structured and conflicts with CharSequence's 
definition for CharTermAttribute], I propose a simple API that gets a default 
implementation in AttributeImpl (just like toString() currently):

- Iterator<Map.Entry<String,?>> AttributeImpl.contentsIterator() returns an 
iterator (for most attributes it is a singleton) of key-value pairs, e.g. 
term -> "foobar", startOffset -> Integer.valueOf(0), ...
- AttributeSource gets the same method; it just concatenates the iterators of 
each AttributeImpl from getAttributeImplsIterator()

No backwards problems occur, as the default toString() method will work like 
before (it just gets the iterator and lists the entries), but we simply remove 
the documentation for the format. (Char)TermAttribute gets a special impl of 
toString() according to CharSequence and a corresponding iterator.

I also want to remove the abstract hashCode() and equals() methods from 
AttributeImpl, as they are not needed and just create work for the implementor.
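
For illustration, a concrete attribute could implement the proposed method 
roughly like this (a sketch under the proposed signature; the raw cast just 
keeps the example short):

{code}
// e.g. in OffsetAttributeImpl (sketch, not a patch; imports from java.util):
@SuppressWarnings({"unchecked", "rawtypes"})
public Iterator<Map.Entry<String, ?>> contentsIterator() {
  Map<String, Object> contents = new LinkedHashMap<String, Object>();
  contents.put("startOffset", Integer.valueOf(startOffset));
  contents.put("endOffset", Integer.valueOf(endOffset));
  return (Iterator) contents.entrySet().iterator();
}
{code}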

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2375) Add introspection API to AttributeSource/AttributeImpl

2010-04-06 Thread Uwe Schindler (JIRA)
Add introspection API to AttributeSource/AttributeImpl
--

 Key: LUCENE-2375
 URL: https://issues.apache.org/jira/browse/LUCENE-2375
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis, Other
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1


AttributeSource/TokenStream inspection in Solr needs to have some insight into 
the contents of AttributeImpls. As LUCENE-2302 has some problems with 
toString() [which is not structured and conflicts with CharSequence's 
definition for CharTermAttribute], I propose a simple API that gets a default 
implementation in AttributeImpl (just like toString() currently):

- Iterator<Map.Entry<String,?>> AttributeImpl.contentsIterator() returns an 
iterator (for most attributes it is a singleton) of key-value pairs, e.g. 
term -> "foobar", startOffset -> Integer.valueOf(0), ...
- AttributeSource gets the same method; it just concatenates the iterators of 
each AttributeImpl from getAttributeImplsIterator()

No backwards problems occur, as the default toString() method will work like 
before (it just gets the iterator and lists the entries), but we simply remove 
the documentation for the format. (Char)TermAttribute gets a special impl of 
toString() according to CharSequence and a corresponding iterator.

I also want to remove the abstract hashCode() and equals() methods from 
AttributeImpl, as they are not needed and just create work for the implementor.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Deleted: (LUCENE-2375) Add introspection API to AttributeSource/AttributeImpl

2010-04-06 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler deleted LUCENE-2375:
--


 Add introspection API to AttributeSource/AttributeImpl
 --

 Key: LUCENE-2375
 URL: https://issues.apache.org/jira/browse/LUCENE-2375
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler

 AttributeSource/TokenStream inspection in Solr needs to have some insight 
 into the contents of AttributeImpls. As LUCENE-2302 has some problems with 
 toString() [which is not structured and conflicts with CharSequence's 
 definition for CharTermAttribute], I propose a simple API that gets a default 
 implementation in AttributeImpl (just like toString() currently):
 - Iterator<Map.Entry<String,?>> AttributeImpl.contentsIterator() returns an 
 iterator (for most attributes it is a singleton) of key-value pairs, e.g. 
 term -> "foobar", startOffset -> Integer.valueOf(0), ...
 - AttributeSource gets the same method; it just concatenates the iterators of 
 each AttributeImpl from getAttributeImplsIterator()
 No backwards problems occur, as the default toString() method will work like 
 before (it just gets the iterator and lists the entries), but we simply 
 remove the documentation for the format. (Char)TermAttribute gets a special 
 impl of toString() according to CharSequence and a corresponding iterator.
 I also want to remove the abstract hashCode() and equals() methods from 
 AttributeImpl, as they are not needed and just create work for the 
 implementor.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]

2010-04-05 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-2354.
---

   Resolution: Fixed
Lucene Fields: [New, Patch Available]  (was: [New])

Committed revision: 930821

 Convert NumericUtils and NumericTokenStream to use BytesRef instead of 
 Strings/char[]
 -

 Key: LUCENE-2354
 URL: https://issues.apache.org/jira/browse/LUCENE-2354
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: Flex Branch

 Attachments: LUCENE-2354.patch, LUCENE-2354.patch, LUCENE-2354.patch


 After LUCENE-2302, we should use TermToBytesRefAttribute to index using 
 NumericTokenStream. This also should convert the whole NumericUtils to use 
 BytesRef when converting numerics.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2364) Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery Co.

2010-04-05 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12853336#action_12853336
 ] 

Uwe Schindler commented on LUCENE-2364:
---

This would also make MTQ's rewrite-mode internal collectors better, as they 
currently convert BytesRef terms from the enums to String Terms, pass them to 
TermQuery, and convert back inside TermScorer. With real binary terms (numerics 
are not yet truly binary, they are UTF-8-conformant ASCII bytes), this would 
break.

 Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery  
 Co.
 -

 Key: LUCENE-2364
 URL: https://issues.apache.org/jira/browse/LUCENE-2364
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: Flex Branch
Reporter: Uwe Schindler
 Fix For: Flex Branch


 It would be good to directly allow BytesRefs in TermQuery and TermRangeQuery 
 (as both queries convert the strings to BytesRef internally). For 
 NumericRange support in Solr it will be needed to support numerics as ByteRef 
 in single-term queries.
 When this will be added, don't forget to change TestNumericRangeQueryXX to 
 use the BytesRef ctor of TRQ.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]

2010-04-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2354:
--

Attachment: LUCENE-2354.patch

Here is an updated patch with a cleaned-up NumericUtils (no String methods 
anymore). For now I just commented them out, in case we want to reactivate 
parts of them. Before the release the methods should be removed.

I changed all tests (and deactivated tests in backwards) that used those String 
methods. Also rewrote the CartesianShapeFilter in contrib/spatial to use the 
flex API (optimized for the one-term case without OpenBitSet allocation). Also 
changed the spatial tests to use NumericField.
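
For reference, iterating one term's docs via the flex API looks roughly like 
this (a sketch; "field"/"42" and the bit set are placeholders, and exact 
signatures may differ):

{code}
Terms terms = MultiFields.getFields(reader).terms("field");
if (terms != null) {
  TermsEnum termsEnum = terms.iterator();
  if (termsEnum.seek(new BytesRef("42")) == TermsEnum.SeekStatus.FOUND) {
    DocsEnum docsEnum = termsEnum.docs(null, null); // null skipDocs: deletions ignored here
    int doc;
    while ((doc = docsEnum.nextDoc()) != DocsEnum.NO_MORE_DOCS) {
      bits.set(doc); // mark the match in the filter's bit set
    }
  }
}
{code}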

 Convert NumericUtils and NumericTokenStream to use BytesRef instead of 
 Strings/char[]
 -

 Key: LUCENE-2354
 URL: https://issues.apache.org/jira/browse/LUCENE-2354
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: Flex Branch

 Attachments: LUCENE-2354.patch, LUCENE-2354.patch


 After LUCENE-2302, we should use TermToBytesRefAttribute to index using 
 NumericTokenStream. This also should convert the whole NumericUtils to use 
 BytesRef when converting numerics.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2364) Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery Co.

2010-04-04 Thread Uwe Schindler (JIRA)
Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery  
Co.
-

 Key: LUCENE-2364
 URL: https://issues.apache.org/jira/browse/LUCENE-2364
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: Flex Branch
Reporter: Uwe Schindler
 Fix For: Flex Branch


It would be good to directly allow BytesRefs in TermQuery and TermRangeQuery 
(as both queries convert the strings to BytesRef internally). For NumericRange 
support in Solr we will need to support numerics as BytesRef in single-term 
queries.

When this is added, don't forget to change TestNumericRangeQueryXX to use the 
BytesRef ctor of TRQ.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]

2010-04-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2354:
--

Attachment: LUCENE-2354.patch

Updated patch with lots of javadoc cleanups and new getPrefixCodedXxxShift() 
methods. Also optimized some methods.

I will commit this tomorrow!

 Convert NumericUtils and NumericTokenStream to use BytesRef instead of 
 Strings/char[]
 -

 Key: LUCENE-2354
 URL: https://issues.apache.org/jira/browse/LUCENE-2354
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: Flex Branch

 Attachments: LUCENE-2354.patch, LUCENE-2354.patch, LUCENE-2354.patch


 After LUCENE-2302, we should use TermToBytesRefAttribute to index using 
 NumericTokenStream. This also should convert the whole NumericUtils to use 
 BytesRef when converting numerics.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2363) Classes BooleanFilter and FilterClause missing in 2.2

2010-04-03 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-2363.
---

Resolution: Invalid

These classes are in the queries contrib, not in lucene-core. So you have to 
add lucene-queries.jar to your classpath (it's in the contrib subfolder). Also, 
bugs in version 2.2 will no longer be fixed; current versions are 2.9.2 and 
3.0.1.

 Classes BooleanFilter and FilterClause missing in 2.2
 -

 Key: LUCENE-2363
 URL: https://issues.apache.org/jira/browse/LUCENE-2363
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.2
 Environment: Windows
Reporter: Amit Wamburkar

 I downloaded lucene-core-2.2.0.jar and started using it. But when i tried to 
 created objects of the classes: BooleanFilter and FilterClause , could not 
 find them in the jar. In fact i want to use them so that i can get rid of 
 BooleanQuery which is causing exception BooleanQuery$TooManyClauses. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-2363) Classes BooleanFilter and FilterClause missing in 2.2

2010-04-03 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler closed LUCENE-2363.
-





RE: Landing the flex branch

2010-04-01 Thread Uwe Schindler
Hi,

we should think about how to merge the changes back to trunk. I can try this 
out during the weekend, but merging back can be very hard. So we have the 
following options:

Try a merge back: This would let flex appear as a single commit on trunk, so 
the history of trunk would be preserved. If somebody wants to see the 
individual changes in the flex branch, he can ask for them (e.g. in 
TortoiseSVN there is a checkbox "Include merged revisions"). If this is not 
easy or fails, we can do one of the following:

- Create a big diff between current trunk and flex (after flex is merged up to 
trunk), attach this patch to an issue, and let everybody review it. After that 
we can apply the patch to trunk. This would result in the same behavior for 
trunk with no changes lost, but the individual changes in flex could not be 
reviewed.
- Delete the current trunk and svn move the branch to trunk (after flex is 
merged up to trunk): This would make the history of flex the current history. 
The drawback: you lose the latest trunk changes since the split of flex; 
instead you will only see the merge messages. Therefore we should see this 
only as a last resort.

Comments?

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Tuesday, March 30, 2010 5:35 PM
 To: java-dev@lucene.apache.org
 Subject: Landing the flex branch
 
 I think the time has finally come!  Pending one issue (LUCENE-2354 --
 Uwe), I think flex is ready to land. The other issues with Fix
 Version = Flex Branch can be moved to 3.1 after we land.
 
 We still use the pre-flex APIs in a number of places... I think this
 is actually good (so we continue to test the back-compat emulation
 layer).  With time we can cut them over.
 
 After flex, there are a number of fun things to explore.  EG, we need
 to make attributes work well with codecs & indexing/searching (with
 Multi/DirReader, serialize/deserialize, etc.); we need a BytesRef +
 packed ints FieldCache StringIndex variant which should use much less
 RAM in certain cases; we should build a fast core PForDelta codec;
 more queries can cut over to operating directly on byte[] terms, etc.
 But these can all come with time...
 
 Thoughts/issues/objections?
 
 Mike
 



[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2010-03-31 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851849#action_12851849
 ] 

Uwe Schindler commented on LUCENE-2310:
---

I am also +1 on the indexer interface.

I just repeat myself: We still need TokenStream; an AttributeSource alone is 
not enough. But that is beyond this issue: Indexable provides an iterator of 
fields that consist of a name, a TokenStream, and some options (possibly like 
omitNorms). If you just don't want close() in TokenStream, let's remove it. 
end() is needed for offsets; the indexer needs to take care of it. 
incrementToken() is the iterator approach. What else is there? Reset may be 
invisible to the indexer (I would refactor that and make a subclass of 
TokenStream that supports reset, ResetableTokenStream - just like Tokenizer, 
also a subclass, supports reset(Reader)). The abstract TokenStream then 
consists only of incrementToken() and end() plus the AttributeSource access 
methods. The attributes needed by the indexer are only 
TermToBytesRefAttribute, PositionIncrementAttribute, OffsetAttribute, and 
PayloadAttribute.
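
To make that shape concrete, here is a minimal sketch of a stream reduced to 
exactly those parts, written against the current attribute API (the 
single-token stream and its use of TermAttribute are made up for illustration, 
not taken from any patch):

{noformat}
import java.io.IOException;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

// The indexer's whole contract: call incrementToken() until it returns
// false, then call end() once to pick up the final offset state.
public final class SingleTokenStream extends TokenStream {
  private final TermAttribute termAtt = addAttribute(TermAttribute.class);
  private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
  private final PositionIncrementAttribute posIncrAtt =
      addAttribute(PositionIncrementAttribute.class);
  private final String value;
  private boolean exhausted = false;

  public SingleTokenStream(String value) {
    this.value = value;
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (exhausted) return false;
    exhausted = true;
    termAtt.setTermBuffer(value);
    offsetAtt.setOffset(0, value.length());
    posIncrAtt.setPositionIncrement(1);
    return true;
  }

  @Override
  public void end() throws IOException {
    // the final offset, e.g. for correct offsets on multi-valued fields
    offsetAtt.setOffset(value.length(), value.length());
  }
}
{noformat}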

 Reduce Fieldable, AbstractField and Field complexity
 

 Key: LUCENE-2310
 URL: https://issues.apache.org/jira/browse/LUCENE-2310
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: Index
Reporter: Chris Male
 Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields-core.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields.patch


 In order to move field-type-like functionality into its own class, we really 
 need to try to tackle the hierarchy of Fieldable, AbstractField and Field. 
 Currently AbstractField depends on Field, and does not provide much more 
 functionality than storing fields, most of which is being moved over to 
 FieldType. Therefore it seems ideal to try to deprecate AbstractField (and 
 possibly Fieldable), moving much of the functionality into Field and 
 FieldType.




[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2010-03-31 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851856#action_12851856
 ] 

Uwe Schindler commented on LUCENE-2310:
---

Yeah!




[jira] Commented: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]

2010-03-30 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851598#action_12851598
 ] 

Uwe Schindler commented on LUCENE-2354:
---

Will work on this in the next days and rewrite the tests.

One problem: Solr at the moment uses the deprecated String API for building a 
TermQuery. This should be replaced by an NRQ with upper == lower (both 
inclusive), as that disables scoring, which makes no sense for numeric fields.
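
A sketch of the suggested replacement, using the stock NumericRangeQuery 
factory (the field name and precision step are illustrative, not from Solr's 
code):

{noformat}
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;

public class NumericEqualsQueryExample {
  // An "equals" query on a numeric field: upper == lower, both inclusive.
  // NumericRangeQuery rewrites to a constant-score query, so no term
  // scoring is applied, which is the right behavior for numeric fields.
  public static Query intEquals(String field, int value) {
    return NumericRangeQuery.newIntRange(field, 4, value, value, true, true);
  }
}
{noformat}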




[jira] Commented: (LUCENE-2302) Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable)

2010-03-30 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851596#action_12851596
 ] 

Uwe Schindler commented on LUCENE-2302:
---

Will add the javadocs and think about the CharSequence problems again. It's 
tricky :(

I have little time at the moment; hopefully I will get to it by the weekend. 
The same goes for LUCENE-2354, which needs some test rewriting.

 Replacement for TermAttribute+Impl with extended capabilities (byte[] 
 support, CharSequence, Appendable)
 

 Key: LUCENE-2302
 URL: https://issues.apache.org/jira/browse/LUCENE-2302
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: Flex Branch
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: Flex Branch

 Attachments: LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch, 
 LUCENE-2302.patch, LUCENE-2302.patch


 For flexible indexing, terms can be simple byte[] arrays, while the current 
 TermAttribute only supports char[]. This is fine for plain text, but e.g. 
 NumericTokenStream should work directly on the byte[] array.
 Also, TermAttribute lacks some interfaces that would make it simpler for 
 users to work with: Appendable and CharSequence.
 I propose to create a new interface CharTermAttribute with a clean new API 
 that concentrates on CharSequence and Appendable.
 The implementation class will simply support the old and new interface, 
 working on the same term buffer. DEFAULT_ATTRIBUTE_FACTORY will take care of 
 this. So if somebody adds a TermAttribute, he will get an implementation 
 class that can also be used as a CharTermAttribute. As both attributes create 
 the same impl instance, both calls to addAttribute are equal. So a 
 TokenFilter that adds CharTermAttribute to the source will work with the same 
 instance as the Tokenizer that requested the (deprecated) TermAttribute.
 To also support byte[]-only terms, as Collation or NumericField need, a 
 separate getter-only interface will be added that returns a reusable 
 BytesRef, e.g. BytesRefGetterAttribute. The default implementation class will 
 also support this interface. For backwards compatibility with old 
 self-made TermAttribute implementations, the indexer will check with 
 hasAttribute() if the BytesRef getter interface is there, and if not, will 
 wrap an old-style TermAttribute (a deprecated wrapper class will be 
 provided): new BytesRefGetterAttributeWrapper(TermAttribute), which is then 
 used by the indexer.
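
A usage sketch of the proposed interface (hedged: method names like setEmpty() 
and append() assume the CharSequence/Appendable design lands as described 
above; they are not part of any released API yet):

{noformat}
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class CharTermAttributeUsage {
  // Writable like an Appendable, readable like a CharSequence; a request
  // for the deprecated TermAttribute returns the very same impl instance.
  public static void fillTerm(TokenStream stream) {
    CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
    termAtt.setEmpty().append("foo");  // Appendable-style write
    CharSequence view = termAtt;       // CharSequence-style read
    assert view.length() == 3 && view.charAt(0) == 'f';
  }
}
{noformat}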




[jira] Commented: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]

2010-03-29 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851010#action_12851010
 ] 

Uwe Schindler commented on LUCENE-2354:
---

bq. But the encoding is unchanged right? (Ie only using 7 bits per byte, same 
as trunk).

Yes. And I think we should keep it at 7 bits for now. Problems start when the 
sort order of terms is needed (which is the case for NRQ). As the default term 
comparator in flex is the UTF-8 one, it would not sort correctly for numeric 
fields using the full 8 bits.

bq. And you cutover to BytesRef TermsEnum API too - great. Presumably search 
perf would improve but only a tiny bit since NRQ visits so few terms?

I don't think you will notice a difference. A standard int range contains 
maybe 10 to 20 sub-ranges (at maximum), so converting between String and 
TermRef should not matter. But the new implementation is cleaner. In principle 
we could remove the whole char[]/String-based API in NumericUtils - I only 
have to rewrite the tests and remove the NumericUtils test in backwards (as it 
then no longer applies, either).
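
To illustrate why the 7-bit encoding sorts correctly under a plain 
lexicographic comparator, here is a small sketch against the String-based 
NumericUtils API on trunk (the values are arbitrary):

{noformat}
import org.apache.lucene.util.NumericUtils;

public class PrefixCodedOrderDemo {
  public static void main(String[] args) {
    // longToPrefixCoded flips the sign bit and emits only 7 bits per
    // char, so lexicographic order of the full-precision encoded terms
    // matches the numeric order of the values themselves.
    long[] values = { Long.MIN_VALUE, -42L, 0L, 42L, Long.MAX_VALUE };
    for (int i = 1; i < values.length; i++) {
      String a = NumericUtils.longToPrefixCoded(values[i - 1]);
      String b = NumericUtils.longToPrefixCoded(values[i]);
      assert a.compareTo(b) < 0 : "sort order must be preserved";
    }
  }
}
{noformat}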




[jira] Issue Comment Edited: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]

2010-03-29 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851010#action_12851010
 ] 

Uwe Schindler edited comment on LUCENE-2354 at 3/29/10 5:23 PM:


bq. But the encoding is unchanged right? (Ie only using 7 bits per byte, same 
as trunk).

Yes. And I think we should keep it at 7 bits for now. Problems start when the 
sort order of terms is needed (which is the case for NRQ). As the default term 
comparator in flex is the UTF-8 one, it would not sort correctly for numeric 
fields using the full 8 bits.

By the way, the recently added backwards test checks that an old index with 
NumericField behaves as before! This is why I added a new zip file to 
TestBackwardCompatibility.

bq. And you cutover to BytesRef TermsEnum API too - great. Presumably search 
perf would improve but only a tiny bit since NRQ visits so few terms?

I don't think you will notice a difference. A standard int range contains 
maybe 10 to 20 sub-ranges (at maximum), so converting between String and 
TermRef should not matter. But the new implementation is cleaner. In principle 
we could remove the whole char[]/String-based API in NumericUtils - I only 
have to rewrite the tests and remove the NumericUtils test in backwards (as it 
then no longer applies, either).




[jira] Assigned: (LUCENE-2315) AttributeSource's methods for accessing attributes should be final, else its easy to corrupt the internal states

2010-03-28 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-2315:
-

Assignee: Uwe Schindler

 AttributeSource's methods for accessing attributes should be final, else its 
 easy to corrupt the internal states
 

 Key: LUCENE-2315
 URL: https://issues.apache.org/jira/browse/LUCENE-2315
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 2.9, 2.9.1, 2.9.2, 3.0, 3.0.1
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.1


 The methods that operate on and modify the internal maps of AttributeSource 
 should be final, which is a backwards break. But anybody who overrides such 
 methods simply creates a buggy AS in either case.
 I want to make all impls final (in general the class should be final 
 altogether, but it is made for extension in TokenStream). So it's important 
 that the implementations are final!
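
 As an illustration of the risk (a hypothetical broken subclass, not code from 
 this issue): while the methods are still overridable, a subclass can bypass 
 the internal attribute map that every consumer relies on:

{noformat}
import org.apache.lucene.util.Attribute;
import org.apache.lucene.util.AttributeImpl;
import org.apache.lucene.util.AttributeSource;

// Hypothetical broken subclass: it returns a fresh impl on every call
// instead of registering one in the internal map, so two filters asking
// for the same attribute no longer share state - a corrupted AS.
public class BrokenAttributeSource extends AttributeSource {
  @Override
  @SuppressWarnings("unchecked")
  public <A extends Attribute> A addAttribute(Class<A> attClass) {
    try {
      AttributeImpl impl = (AttributeImpl)
          Class.forName(attClass.getName() + "Impl").newInstance();
      return (A) impl; // never stored, never shared
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}
{noformat}

 Making addAttribute() and friends final forbids exactly this kind of override 
 at compile time.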




[jira] Updated: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]

2010-03-28 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2354:
--

Attachment: LUCENE-2354.patch

Here is a first preview patch.

NumericUtils still contains lots of unused String-based methods; I think we 
should remove them, as the class is expert-only and also experimental. 
Backwards compatibility is broken even with those backwards layers (as the 
split functions were changed to use BytesRefs). Also, these backwards methods 
are simply slow now (as the byte[] is copied to char[] and vice versa).

The new NumericTokenStream now uses a special NumericTermAttribute, so filters 
coming later have access to the shift value and so on. This attribute also 
implements TermToBytesRefAttribute for the indexer. Please note: this 
attribute is a hack and does not support copyTo/clone/..., so you cannot put 
tokens aside (which is not needed), but it is still possible to add further 
attributes to numeric tokens (which is why the attribute is there).

The NumericTokenStream backwards test was removed, because the new stream no 
longer contains a TermAttribute, so the test always fails.

TODO: better inline hashCode generation for the numeric-to-BytesRef 
transformation




[jira] Reopened: (LUCENE-2306) contrib/xml-query-parser: NumericRangeFilter support

2010-03-27 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reopened LUCENE-2306:
---


I will commit my changes to the package names and a missing super.tearDown() 
soon.

But I found one other thing:
NRQ allows one or both of the bounds to be null (like TermRangeQuery), but the 
builder enforces both attributes to be present.

Also, I don't like the default type of int; I would instead enforce the type. 
Will post a patch soon.
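
For the builder fix, a sketch of what should be expressible, using the 
standard factory methods (field name, precision step, and bound are 
illustrative):

{noformat}
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;

public class OpenEndedNumericRangeExample {
  // A half-open numeric range: a null lower bound means "unbounded
  // below", which the XML builder currently refuses to accept.
  public static Query upTo(String field, long max) {
    return NumericRangeQuery.newLongRange(field, 4, null, max, false, true);
  }
}
{noformat}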

 contrib/xml-query-parser: NumericRangeFilter support
 

 Key: LUCENE-2306
 URL: https://issues.apache.org/jira/browse/LUCENE-2306
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Affects Versions: 3.0.1
Reporter: Jingkei Ly
Assignee: Mark Harwood
 Fix For: 3.1

 Attachments: LUCENE-2306.patch, LUCENE-2306.patch


 Create a FilterBuilder for NumericRangeFilter so that it may be used with the 
 XML query parser.




[jira] Commented: (LUCENE-2306) contrib/xml-query-parser: NumericRangeFilter support

2010-03-27 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850495#action_12850495
 ] 

Uwe Schindler commented on LUCENE-2306:
---

Committed package and test fixes in revision: 928177




[jira] Updated: (LUCENE-2306) contrib/xml-query-parser: NumericRangeQuery and -Filter support

2010-03-27 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2306:
--

Summary: contrib/xml-query-parser: NumericRangeQuery and -Filter support  
(was: contrib/xml-query-parser: NumericRangeFilter support)

 contrib/xml-query-parser: NumericRangeQuery and -Filter support
 ---

 Key: LUCENE-2306
 URL: https://issues.apache.org/jira/browse/LUCENE-2306
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Affects Versions: 3.0.1
Reporter: Jingkei Ly
Assignee: Mark Harwood
 Fix For: 3.1

 Attachments: LUCENE-2306.patch, LUCENE-2306.patch


 Create a FilterBuilder for NumericRangeFilter so that it may be used with the 
 XML query parser.



