[jira] Commented: (LUCENE-2399) Add support for ICU's Normalizer2
[ https://issues.apache.org/jira/browse/LUCENE-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12858104#action_12858104 ] Uwe Schindler commented on LUCENE-2399: --- Hurra! You used the StringBuilder as buffer to not create a new String instance each time and only need to copy the buffer. This could also be a good trick for the PatternReplaceFilter from Solr. bq. i made this filter final, to avoid a ticket from the policeman. How did you get the filter through the assert statement without final? Strange... Add support for ICU's Normalizer2 - Key: LUCENE-2399 URL: https://issues.apache.org/jira/browse/LUCENE-2399 Project: Lucene - Java Issue Type: New Feature Components: contrib/* Affects Versions: 3.1 Reporter: Robert Muir Assignee: Robert Muir Fix For: 3.1 Attachments: LUCENE-2399.patch, LUCENE-2399.patch While there are separate Case Folding, Normalization, and Ignorable-removal filters in LUCENE-1488, the new ICU Normalizer2 API does this all at once with nfkc_cf (based on the new NFKC_Casefold property in Unicode). This is great, because it provides a ton of unicode functionality that is really needed. And the new Normalizer2 API takes CharSequence and writes to Appendable... -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
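A minimal sketch of the StringBuilder-buffer trick praised above, assuming the ICU 4.4 Normalizer2 API and the Lucene 3.1 CharTermAttribute; the class and member names are illustrative and not taken from the attached patch.

{code:java}
import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

import com.ibm.icu.text.Normalizer2;

// Illustrative filter: one reused StringBuilder receives the normalized form,
// so no new String is allocated per token. Marked final, as the assertFinal
// check discussed in the following comments requires.
public final class Normalizer2FilterSketch extends TokenFilter {
  private final Normalizer2 normalizer;
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final StringBuilder buffer = new StringBuilder();

  public Normalizer2FilterSketch(TokenStream input, Normalizer2 normalizer) {
    super(input);
    // e.g. Normalizer2.getInstance(null, "nfkc_cf", Normalizer2.Mode.COMPOSE)
    this.normalizer = normalizer;
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    // CharTermAttribute is a CharSequence, so it can be checked and normalized directly.
    if (!normalizer.isNormalized(termAtt)) {
      buffer.setLength(0);
      normalizer.normalize(termAtt, buffer); // write into the reused buffer
      termAtt.setEmpty().append(buffer);     // copy the buffer back into the term
    }
    return true;
  }
}
{code}

The same buffer-reuse pattern could also apply to Solr's PatternReplaceFilter, as suggested above.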
[jira] Commented: (LUCENE-2399) Add support for ICU's Normalizer2
[ https://issues.apache.org/jira/browse/LUCENE-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12858108#action_12858108 ] Uwe Schindler commented on LUCENE-2399: --- I know, you were running the test without assertions from Eclipse! :-) {noformat} [junit] TokenStream implementation classes or at least their incrementToken() implementation must be final [junit] junit.framework.AssertionFailedError: TokenStream implementation classes or at least their incrementToken() implementation must be final [junit] at org.apache.lucene.analysis.TokenStream.assertFinal(TokenStream.java:117) {noformat} So for me the assertion worked. The *second* patch of course works with icu-4_4.jar! That's great, and I am happy about the cool interfaces on CharTermAttribute. I just wanted to check that my deputy sheriff did not miss something because of wrong instructions. Add support for ICU's Normalizer2 - Key: LUCENE-2399 URL: https://issues.apache.org/jira/browse/LUCENE-2399 Project: Lucene - Java Issue Type: New Feature Components: contrib/* Affects Versions: 3.1 Reporter: Robert Muir Assignee: Robert Muir Fix For: 3.1 Attachments: LUCENE-2399.patch, LUCENE-2399.patch While there are separate Case Folding, Normalization, and Ignorable-removal filters in LUCENE-1488, the new ICU Normalizer2 API does this all at once with nfkc_cf (based on the new NFKC_Casefold property in Unicode). This is great, because it provides a ton of unicode functionality that is really needed. And the new Normalizer2 API takes CharSequence and writes to Appendable... -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: official GIT repository / switch to GIT?
Hi, In my opinion: Definitely NOT! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Thomas Koch [mailto:tho...@koch.ro] Sent: Saturday, April 17, 2010 9:21 AM To: solr-dev; java-dev@lucene.apache.org Subject: official GIT repository / switch to GIT? Hi, at least since august 2009 nobody has dared to ask this question, so let's start a flamewar: Don't you think, it's time for lucene and solr to switch to GIT? And now seriously: I did the last packaging of SOLR 1.4 for Debian and I intend to continue doing so. Since I'm doing the packaging in GIT, I'm asking myself, whether I should base the packaging GIT repository on the SOLR repo found at git.apache.org? However if the one from git.a.o is not stable and may crash at any given time, this would not be a good idea. And the best thing for those packagers like me would be of course, if the GIT repo would be the official one. And I wonder, if there are really people using SVN and downloading douzens of patch files from jira? Isn't it, that everybody already uses git-svn? Best regards, Thomas Koch, http://www.koch.ro - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters
Add a scoring DistanceQuery that does not need caches and separate filters -- Key: LUCENE-2395 URL: https://issues.apache.org/jira/browse/LUCENE-2395 Project: Lucene - Java Issue Type: Improvement Components: contrib/spatial Reporter: Uwe Schindler Fix For: 3.1 In a chat with Chris Male and my own ideas when implementing for PANGAEA, I thought about the broken distance query in contrib. It lacks the following features: - It needs a query for the enclosing bbox (which is constant score) - It needs a separate filter for filtering out distances - It has no scoring, so if somebody wants to sort by distance, he needs to use the custom sort. For that to work, spatial caches distance calculation (which is broken for multi-segment search) The idea is now to combine all three things into one query, but customizable: We first thought about extending CustomScoreQuery, calculating the distance from FieldCache in the customScore method and returning a score of 1 for distance=0, score=0 at the max distance and score<0 for farther hits that are in the bounding box but not in the distance circle. To filter out such negative scores, we would need to override the scorer in CustomScoreQuery, which is private. My proposal is now to use a very stripped-down CustomScoreQuery (but not extend it) that calls a method getDistance(docId) in its scorer's advance and nextDoc that calculates the distance for the current doc. It stores this distance also in the scorer. If the distance > maxDistance it throws away the hit and calls nextDoc() again. The score() method will return per default weight.value*(maxDistance - distance)/maxDistance and uses the precalculated distance. So the distance is only calculated one time in nextDoc()/advance(). To be able to plug in custom scoring, the following methods in the query can be overridden: - float getDistanceScore(double distance) - returns per default: (maxDistance - distance)/maxDistance; allows score customization - DocIdSet getBoundingBoxDocIdSet(Reader, LatLng sw, LatLng ne) - returns a DocIdSet for the bounding box. Per default it returns e.g. the docIdSet of a NRF or a cartesian tier filter. You can even plug in any other DocIdSet, e.g. wrap a Query with QueryWrapperFilter - support a setter for the GeoDistanceCalculator that is used by the scorer to get the distance. This query is almost finished in my head, it just needs coding :-) -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
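For illustration, a self-contained sketch of the scorer contract described above; all names here are hypothetical and this is not code from the issue's attachments. The distance is computed once per candidate doc in nextDoc(), hits inside the bounding box but beyond maxDistance are skipped, and score() reuses the cached distance through the overridable linear falloff.

{code:java}
import java.io.IOException;

// Hypothetical, simplified scorer skeleton; not the attached DistanceQuery.java.
abstract class DistanceScorerSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  protected final double maxDistance;
  protected final float weightValue;
  private double distance; // cached by nextDoc(), reused by score()

  DistanceScorerSketch(double maxDistance, float weightValue) {
    this.maxDistance = maxDistance;
    this.weightValue = weightValue;
  }

  /** Next doc inside the bounding box, e.g. from an NRF or cartesian tier DocIdSet. */
  protected abstract int nextBoundingBoxDoc() throws IOException;

  /** Distance of the given doc from the query point, e.g. via FieldCache coordinates. */
  protected abstract double getDistance(int doc) throws IOException;

  /** Overridable score customization; the default is the linear falloff described above. */
  protected float getDistanceScore(double distance) {
    return (float) ((maxDistance - distance) / maxDistance);
  }

  public int nextDoc() throws IOException {
    int doc;
    while ((doc = nextBoundingBoxDoc()) != NO_MORE_DOCS) {
      distance = getDistance(doc); // calculated only once per doc
      if (distance <= maxDistance) {
        return doc; // inside the distance circle: keep the hit
      }
      // inside the bbox but outside the circle: throw away and advance again
    }
    return NO_MORE_DOCS;
  }

  public float score() {
    return weightValue * getDistanceScore(distance);
  }
}
{code}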
[jira] Updated: (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters
[ https://issues.apache.org/jira/browse/LUCENE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2395: -- Attachment: DistanceQuery.java A first idea of the Query, it does not even compile as some classes are missing (coming with Chris' later patches), but it shows how it should work and how its customizeable. Add a scoring DistanceQuery that does not need caches and separate filters -- Key: LUCENE-2395 URL: https://issues.apache.org/jira/browse/LUCENE-2395 Project: Lucene - Java Issue Type: Improvement Components: contrib/spatial Reporter: Uwe Schindler Fix For: 3.1 Attachments: DistanceQuery.java In a chat with Chris Male and my own ideas when implementing for PANGAEA, I thought about the broken distance query in contrib. It lacks the following features: - It needs a query/filter for the enclosing bbox (which is constant score) - It needs a separate filter for filtering out hits to far away (inside bbox but outside distance limit) - It has no scoring, so if somebody wants to sort by distance, he needs to use the custom sort. For that to work, spatial caches distance calculation (which is broken for multi-segment search) The idea is now to combine all three things into one query, but customizeable: We first thought about extending CustomScoreQuery and calculate the distance from FieldCache in the customScore method and return a score of 1 for distance=0, score=0 on the max distance and score0 for farer hits, that are in the bounding box but not in the distance circle. To filter out such negative scores, we would need to override the scorer in CustomScoreQuery which is priate. My proposal is now to use a very stripped down CustomScoreQuery (but not extend it) that does call a method getDistance(docId) in its scorer's advance and nextDoc that calculates the distance for the current doc. It stores this distance also in the scorer. If the distance maxDistance it throws away the hit and calls nextDoc() again. The score() method will reurn per default weight.value*(maxDistance - distance)/maxDistance and uses the precalculated distance. So the distance is only calculated one time in nextDoc()/advance(). To be able to plug in custom scoring, the following methods in the query can be overridden: - float getDistanceScore(double distance) - returns per default: (maxDistance - distance)/maxDistance; allows score customization - DocIdSet getBoundingBoxDocIdSet(Reader, LatLng sw, LatLng ne) - returns an DocIdSet for the bounding box. Per default it returns e.g. the docIdSet of a NRF or a cartesian tier filter. You can even plug in any other DocIdSet, e.g. wrap a Query with QueryWrapperFilter - support a setter for the GeoDistanceCalculator that is used by the scorer to get the distance. - a LatLng provider (similar to CustomScoreProvider/ValueSource) that returns for a given doc id the lat/lng. This method is called per IndexReader one time in scorer creation and will retrieve the coordinates. By that we support FieldCache or whatever. This query is almost finished in my head, it just needs coding :-) -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2396) remove version from contrib/analyzers.
[ https://issues.apache.org/jira/browse/LUCENE-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857384#action_12857384 ] Uwe Schindler commented on LUCENE-2396: --- Are you sure you want to use LUCENE_CURRENT in some ctors? remove version from contrib/analyzers. -- Key: LUCENE-2396 URL: https://issues.apache.org/jira/browse/LUCENE-2396 Project: Lucene - Java Issue Type: Task Components: contrib/analyzers Affects Versions: 3.1 Reporter: Robert Muir Assignee: Robert Muir Attachments: LUCENE-2396.patch Contrib/analyzers has no backwards-compatibility policy, so let's remove Version so the API is consumable. if you think we shouldn't do this, then instead explicitly state and vote on what the backwards compatibility policy for contrib/analyzers should be instead, or move it all to core. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2396) remove version from contrib/analyzers.
[ https://issues.apache.org/jira/browse/LUCENE-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857402#action_12857402 ] Uwe Schindler commented on LUCENE-2396: --- bq. Static? Weren't you against that!? He meant a static final! It is just to make the analyzers that depend on core stuff fixed to a specific version, until we have no more analyzers in core except Whitespace. remove version from contrib/analyzers. -- Key: LUCENE-2396 URL: https://issues.apache.org/jira/browse/LUCENE-2396 Project: Lucene - Java Issue Type: Task Components: contrib/analyzers Affects Versions: 3.1 Reporter: Robert Muir Assignee: Robert Muir Attachments: LUCENE-2396.patch Contrib/analyzers has no backwards-compatibility policy, so let's remove Version so the API is consumable. If you think we shouldn't do this, then instead explicitly state and vote on what the backwards compatibility policy for contrib/analyzers should be, or move it all to core. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
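As a small illustration of the "static final" idea (assuming Version.LUCENE_31 as the pinned value; not code from the patch), a contrib analyzer module could fix the Version it hands to core components in a single constant instead of taking it in every constructor:

{code:java}
import org.apache.lucene.util.Version;

// Illustrative constants holder: the Version passed to core components such as
// StandardTokenizer is pinned in one place and bumped deliberately on upgrade.
final class AnalyzerMatchVersion {
  static final Version MATCH_VERSION = Version.LUCENE_31;

  private AnalyzerMatchVersion() {} // not instantiable
}
{code}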
RE: Proposal about Version API relaxation
Hi Earwin, I am strongly +1 on this. I would also make the Release Manager for 3.1, if nobody else wants to do this. I would like to take the preflex tag or some revisions before (maybe without the IndexWriterConfig, which is a really new API) to be 3.1 branch. And after that port some of my post-flex-changes like the StandardTokenizer refactoring back (so we can produce the old analyzer still without Java 1.4). So +1 on branching pre-flex and release as 3.1 soon. The Unicode improvements rectify a new release. I think also s1monw wants to have this. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Earwin Burrfoot [mailto:ear...@gmail.com] Sent: Thursday, April 15, 2010 8:15 PM To: java-dev@lucene.apache.org Subject: Re: Proposal about Version API relaxation I'd like to remind that Mike's proposal has stable branches. We can branch off preflex trunk right now and wrap it up as 3.1. Current trunk is declared as future 4.0 and all backcompat cruft is removed from it. If some new features/bugfixes appear in trunk, and they don't break stuff - we backport them to 3.x branch, eventually releasing 3.2, 3.3, etc Thus, devs are free to work without back-compat burden, bleeding edge users get their blood, conservative users get their stability + a subset of new features from stable branches. On Thu, Apr 15, 2010 at 22:02, DM Smith dmsmith...@gmail.com wrote: On 04/15/2010 01:50 PM, Earwin Burrfoot wrote: First, the index format. IMHO, it is a good thing for a major release to be able to read the prior major release's index. And the ability to convert it to the current format via optimize is also good. Whatever is decided on this thread should take this seriously. Optimize is a bad way to convert to current. 1. conversion is not guaranteed, optimizing already optimized index is a noop 2. it merges all your segments. if you use BalancedSegmentMergePolicy, that destroys your segment size distribution Dedicated upgrade tool (available both from command-line and programmatically) is a good way to convert to current. 1. conversion happens exactly when you need it, conversion happens for sure, no additional checks needed 2. it should leave all your segments as is, only changing their format It is my observation, though possibly not correct, that core only has rudimentary analysis capabilities, handling English very well. To handle other languages well contrib/analyzers is required. Until recently it did not get much love. There have been many bw compat breaking changes (though w/ version one can probably get the prior behavior). IMHO, most of contrib/analyzers should be core. My guess is that most non-trivial applications will use contrib/analyzers. I counter - most non-trivial applications will use their own analyzers. The more modules - the merrier. You can choose precisely what you need. By and large an analyzer is a simple wrapper for a tokenizer and some filters. Are you suggesting that most non-trivial apps write their own tokenizers and filters? I'd find that hard to believe. For example, I don't know enough Chinese, Farsi, Arabic, Polish, ... to come up with anything better than what Lucene has to tokenize, stem or filter these. Our user base are those with ancient, underpowered laptops in 3-rd world countries. On those machines it might take 10 minutes to create an index and during that time the machine is fairly unresponsive. There is no opportunity to do it in the background. 
Major Lucene releases (feature-wise, not version-wise) happen like once in a year, or year-and-a-half. Is it that hard for your users to wait ten minutes once a year? I said that was for one index. Multiply that times the number of books available (300+) and yes, it is too much to ask. Even if a small subset is indexed, say 30, that's around 5 hours of waiting. Under consideration is the frequency of breakage. Some are suggesting a greater frequency than yearly. DM - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org -- Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423 ICQ: 104465785 - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: Proposal about Version API relaxation
I wish we could have a face to face talk like in the evenings at ApacheCon :( Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant Ingersoll Sent: Thursday, April 15, 2010 9:46 PM To: java-dev@lucene.apache.org Subject: Re: Proposal about Version API relaxation From IRC: why do I get the feeling that everyone is in heated agreement on the Version thread? there are some cases that mean people will have to reindex in those cases, we should tell people they will have to reindex then they can decide to upgrade or not all other cases, just do the sensible thing and test first I have yet to meet anyone who simply drops a new version into production and says go So, as I said earlier, why don't we just move forward with it, strive to support reading X-1 index format in X and let the user know the cases in which they will have to re-index. If a migration tool is necessary, then someone can write it at the appropriate time. Just as was said w/ the Solr merge, it's software. If it doesn't work, we can change it. Thank goodness we don't have a back compatibility policy for our policies! -Grant On Apr 15, 2010, at 3:35 PM, Michael McCandless wrote: Unfortunately, live searching against an old index can get very hairy. EG look at what I had to do for the flex API on pre-flex index flex emulation layer. It's also not great because it gives the illusion that all is good, yet, you've taken a silent hit (up to ~10% or so) in your search perf. Whereas building maintaining a one-time index migration tool, in contrast, is much less work. I realize the migration tool has issues -- it fixes the hard changes but silently allows the soft changes to break (ie, your analyzers my not produce the same tokens, until we move all core analyzers outside of core, so they are separately versioned), but it seems like a good compromise here? Mike 2010/4/15 Shai Erera ser...@gmail.com: The reason Earwin why online migration is faster is because when u finally need to *fully* migrate your index, most chances are that most of the segments are already on the newer format. Offline migration will just keep the application idle for some amount of time until ALL segments are migrated. During the lifecycle of the index, segments are merged anyway, so migrating them on the fly virtually costs nothing. At the end, when u upgrade to a Lucene version which doesn't support the previous index format, you'll on the worse case need to migrate few large segments which were never merged. I don't know how many of those there will be as it really depends on the application, but I'd bet this process will touch just a few segments. And hence, throughput wise it will be a lot faster. We should create a migrate() API on IW which will touch just those segments and not incur a full optimize. That API can also be used for an offline migration tool, if we decide that's what we want. Shai On Thursday, April 15, 2010, jm jmugur...@gmail.com wrote: Not sure if plain users are allowed/encouraged to post in this list, but wanted to mention (just an opinion from a happy user), as other users have, that not all of us can reindex just like that. It would not be 10 min for one of our installations for sure... First, i would need to implement some code to reindex, cause my source data is postprocessed/compressed/encrypted/moved after it arrives to the application, so I would need to retrieve all etc. 
And then reindexing it would take days. javier On Thu, Apr 15, 2010 at 9:04 PM, Earwin Burrfoot ear...@gmail.com wrote: BTW Earwin, we can come up w/ a migrate() method on IW to accomplish manual migration on the segments that are still on old versions. That's not the point about whether optimize() is good or not. It is the difference between telling the customer to run a 5-day migration process, or a couple of hours. At the end of the day, the same migration code will need to be written whether for the manual or automatic case. And probably by the same developer which changed the index format. It's the difference of when does it happen. Converting stuff is easier then emulating, that's exactly why I want a separate tool. There's no need to support cross-version merging, nor to emulate old APIs. I also don't understand why offline migration is going to take days instead of hours for online migration?? WTF, it's gonna be even faster, as it doesn't have to merge things. -- Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423 ICQ: 104465785 -- --- To unsubscribe, e-mail: java-dev-unsubscr
[jira] Updated: (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters
[ https://issues.apache.org/jira/browse/LUCENE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2395: -- Attachment: DistanceQuery.java small updates to Chris' patches. Add a scoring DistanceQuery that does not need caches and separate filters -- Key: LUCENE-2395 URL: https://issues.apache.org/jira/browse/LUCENE-2395 Project: Lucene - Java Issue Type: Improvement Components: contrib/spatial Reporter: Uwe Schindler Fix For: 3.1 Attachments: DistanceQuery.java In a chat with Chris Male and my own ideas when implementing for PANGAEA, I thought about the broken distance query in contrib. It lacks the following features: - It needs a query/filter for the enclosing bbox (which is constant score) - It needs a separate filter for filtering out hits to far away (inside bbox but outside distance limit) - It has no scoring, so if somebody wants to sort by distance, he needs to use the custom sort. For that to work, spatial caches distance calculation (which is broken for multi-segment search) The idea is now to combine all three things into one query, but customizeable: We first thought about extending CustomScoreQuery and calculate the distance from FieldCache in the customScore method and return a score of 1 for distance=0, score=0 on the max distance and score0 for farer hits, that are in the bounding box but not in the distance circle. To filter out such negative scores, we would need to override the scorer in CustomScoreQuery which is priate. My proposal is now to use a very stripped down CustomScoreQuery (but not extend it) that does call a method getDistance(docId) in its scorer's advance and nextDoc that calculates the distance for the current doc. It stores this distance also in the scorer. If the distance maxDistance it throws away the hit and calls nextDoc() again. The score() method will reurn per default weight.value*(maxDistance - distance)/maxDistance and uses the precalculated distance. So the distance is only calculated one time in nextDoc()/advance(). To be able to plug in custom scoring, the following methods in the query can be overridden: - float getDistanceScore(double distance) - returns per default: (maxDistance - distance)/maxDistance; allows score customization - DocIdSet getBoundingBoxDocIdSet(Reader, LatLng sw, LatLng ne) - returns an DocIdSet for the bounding box. Per default it returns e.g. the docIdSet of a NRF or a cartesian tier filter. You can even plug in any other DocIdSet, e.g. wrap a Query with QueryWrapperFilter - support a setter for the GeoDistanceCalculator that is used by the scorer to get the distance. - a LatLng provider (similar to CustomScoreProvider/ValueSource) that returns for a given doc id the lat/lng. This method is called per IndexReader one time in scorer creation and will retrieve the coordinates. By that we support FieldCache or whatever. This query is almost finished in my head, it just needs coding :-) -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters
[ https://issues.apache.org/jira/browse/LUCENE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2395: -- Attachment: (was: DistanceQuery.java) Add a scoring DistanceQuery that does not need caches and separate filters -- Key: LUCENE-2395 URL: https://issues.apache.org/jira/browse/LUCENE-2395 Project: Lucene - Java Issue Type: Improvement Components: contrib/spatial Reporter: Uwe Schindler Fix For: 3.1 Attachments: DistanceQuery.java In a chat with Chris Male and my own ideas when implementing for PANGAEA, I thought about the broken distance query in contrib. It lacks the following features: - It needs a query/filter for the enclosing bbox (which is constant score) - It needs a separate filter for filtering out hits to far away (inside bbox but outside distance limit) - It has no scoring, so if somebody wants to sort by distance, he needs to use the custom sort. For that to work, spatial caches distance calculation (which is broken for multi-segment search) The idea is now to combine all three things into one query, but customizeable: We first thought about extending CustomScoreQuery and calculate the distance from FieldCache in the customScore method and return a score of 1 for distance=0, score=0 on the max distance and score0 for farer hits, that are in the bounding box but not in the distance circle. To filter out such negative scores, we would need to override the scorer in CustomScoreQuery which is priate. My proposal is now to use a very stripped down CustomScoreQuery (but not extend it) that does call a method getDistance(docId) in its scorer's advance and nextDoc that calculates the distance for the current doc. It stores this distance also in the scorer. If the distance maxDistance it throws away the hit and calls nextDoc() again. The score() method will reurn per default weight.value*(maxDistance - distance)/maxDistance and uses the precalculated distance. So the distance is only calculated one time in nextDoc()/advance(). To be able to plug in custom scoring, the following methods in the query can be overridden: - float getDistanceScore(double distance) - returns per default: (maxDistance - distance)/maxDistance; allows score customization - DocIdSet getBoundingBoxDocIdSet(Reader, LatLng sw, LatLng ne) - returns an DocIdSet for the bounding box. Per default it returns e.g. the docIdSet of a NRF or a cartesian tier filter. You can even plug in any other DocIdSet, e.g. wrap a Query with QueryWrapperFilter - support a setter for the GeoDistanceCalculator that is used by the scorer to get the distance. - a LatLng provider (similar to CustomScoreProvider/ValueSource) that returns for a given doc id the lat/lng. This method is called per IndexReader one time in scorer creation and will retrieve the coordinates. By that we support FieldCache or whatever. This query is almost finished in my head, it just needs coding :-) -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters
[ https://issues.apache.org/jira/browse/LUCENE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2395: -- Attachment: DistanceQuery.java Added Weight.explain() and fixed a missing replacement. Add a scoring DistanceQuery that does not need caches and separate filters -- Key: LUCENE-2395 URL: https://issues.apache.org/jira/browse/LUCENE-2395 Project: Lucene - Java Issue Type: Improvement Components: contrib/spatial Reporter: Uwe Schindler Fix For: 3.1 Attachments: DistanceQuery.java, DistanceQuery.java In a chat with Chris Male and my own ideas when implementing for PANGAEA, I thought about the broken distance query in contrib. It lacks the following features: - It needs a query/filter for the enclosing bbox (which is constant score) - It needs a separate filter for filtering out hits to far away (inside bbox but outside distance limit) - It has no scoring, so if somebody wants to sort by distance, he needs to use the custom sort. For that to work, spatial caches distance calculation (which is broken for multi-segment search) The idea is now to combine all three things into one query, but customizeable: We first thought about extending CustomScoreQuery and calculate the distance from FieldCache in the customScore method and return a score of 1 for distance=0, score=0 on the max distance and score0 for farer hits, that are in the bounding box but not in the distance circle. To filter out such negative scores, we would need to override the scorer in CustomScoreQuery which is priate. My proposal is now to use a very stripped down CustomScoreQuery (but not extend it) that does call a method getDistance(docId) in its scorer's advance and nextDoc that calculates the distance for the current doc. It stores this distance also in the scorer. If the distance maxDistance it throws away the hit and calls nextDoc() again. The score() method will reurn per default weight.value*(maxDistance - distance)/maxDistance and uses the precalculated distance. So the distance is only calculated one time in nextDoc()/advance(). To be able to plug in custom scoring, the following methods in the query can be overridden: - float getDistanceScore(double distance) - returns per default: (maxDistance - distance)/maxDistance; allows score customization - DocIdSet getBoundingBoxDocIdSet(Reader, LatLng sw, LatLng ne) - returns an DocIdSet for the bounding box. Per default it returns e.g. the docIdSet of a NRF or a cartesian tier filter. You can even plug in any other DocIdSet, e.g. wrap a Query with QueryWrapperFilter - support a setter for the GeoDistanceCalculator that is used by the scorer to get the distance. - a LatLng provider (similar to CustomScoreProvider/ValueSource) that returns for a given doc id the lat/lng. This method is called per IndexReader one time in scorer creation and will retrieve the coordinates. By that we support FieldCache or whatever. This query is almost finished in my head, it just needs coding :-) -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: issues.apache.org compromised: please update your passwords
Hi Grant, It is that user, who is assigned to the very early JIRA issues, e.g.: https://issues.apache.org/jira/browse/LUCENE-1 I changed the password of this user in response to that email (for security), but I think we should simply let infra remove it. The problem is, almost anybody can instruct JIRA to reset the password and let JIRA send it again to the email which is the public java-dev list. And then it is public again. If the user is still needed (for whatever reason) maybe the user can be disabled, or maybe they can be removed from the list of users who have update access to the JIRA. But so long as the user is not an administrator, then it's no different really from any other account that can be created by Joe Public. Yes, that account has no special access. If someone wants to unassign the 319 issues this user is the 'assignee' of, then the account can be deleted: https://issues.apache.org/jira/secure/IssueNavigator.jspa?sorter/order= ASCsorter/field=priorityassignee=java- dev%40lucene.apache.orgreset=trueassigneeSelect=specificusermode=hid e I disabled the account by assigning a dummy eMail and gave it a random password. I was not able to unassign the issues, as most issues were Closed, where no modifications can be done anymore. Reopening and changing assignment and reverting to closed is too risky, as after reopening you don’t know anymore which issues you need to revert to closed after unassignment... Uwe - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: Proposal about Version API relaxation
+1, Thanks for this detailed explanation! In my apps I have no problem to define a static default myself. And passing this to every ctor is easy, so where is the problem? Look at solr, since we introduced the version param to solrconfig, you have exactly that behavior, but its limited to this solr installation using this solr config. And you can still override. Lucene is a library, no application, so it's not in lucene's responsibility to handle such things. Configuration and configuration objects passing around is an application responsibility. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Wednesday, April 14, 2010 6:58 PM To: java-dev@lucene.apache.org Subject: Re: Proposal about Version API relaxation On 04/14/2010 12:29 PM, Marvin Humphrey wrote: On Wed, Apr 14, 2010 at 08:30:14AM -0400, Grant Ingersoll wrote: The thing I keep going back to is that somehow Lucene has managed for years (and I mean lots of years) w/o stuff like Version and all this massive back compatibility checking. Non-constant global variables are an anti-pattern. I think clinging to such rules in the face of all situations is an anti-pattern :) I take it as a rule of thumb. In regards to this discussion: I agree that the Version stuff is a bit of a mess. I also agree that many users will want to just use one version across their app that is easy to change. I disagree that we should allow that behavior by just using a constructor without the Version param - or that you would be forced to set the static Version setting by trying to run your app and seeing an exception happen. That is all a bit ugly. Too many users will not understand Version or care to if they see they can skip passing it. IMO, you should have to specify that you are looking for this behavior. In which case, why not just specify it using the version param itself :) E.g. if a user wants to get this kind of static behavior, they can just choose to do it on their own, and pass their *own* static Version constant to all the constructors. I don't think we need to go through this hassle and introduce a less than ideal solution just so that users can pass one less param - especially when I think you should explicitly choose this behavior rather than get it by ignoring the Version param. -- - Mark http://www.lucidimagination.com - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: Proposal about Version API relaxation
And 2.9's backwards compatibility layer in TokenStream was significantly slower. I protest! No, it was not slower, only at the beginning because of missing reflection caching! But this also affected the *new* API. With 2.9.x and old TokenStreams there is no speed difference, really. Uwe - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: Proposal about Version API relaxation
Hi Shai, one of the problem I have is: That is a static default! We want to get rid of them (and did it mostly, only some relicts remain), so there are no plans to reimplement such a thing again. The badest one is BooleanQuery.maxClauseCount. The same applies to all types of sysprops. As Lucene and solr is mostly running in servlet containers, this type of thing makes web applications no longer isolated. This is also a general contract for libraries: never ever rely on sysprops or statics. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de/ http://www.thetaphi.de eMail: u...@thetaphi.de From: Shai Erera [mailto:ser...@gmail.com] Sent: Tuesday, April 13, 2010 5:27 PM To: java-dev@lucene.apache.org Subject: Proposal about Version API relaxation Hi I'd like to propose a relaxation on the Version API. Uwe, please read the entire email before you reply :). I was thinking, following a question on the user list, that the Version-based API may not be very intuitive to users, especially those who don't care about versioning, as well as very inconvenient. So there are two issues here: 1) How should one use Version smartly so that he keeps backwards compatibility. I think we all know the answer, but a Wiki page with some best practices tips would really help users use it. 2) How can one write sane code, which doesn't pass versions all over the place if: (1) he doesn't care about versions, or (2) he cares, and sets the Version to the same value in his app, in all places. Also, I think that today we offer a flexibility to users, to set different Versions on different objects in the life span of their application - which is a good flexibility but can also lead people to shoot themselves in the legs if they're not careful -- e.g. upgrading Version across their app, but failing to do so for one or two places ... So the change I'd like to propose is to mostly alleviate (2) and better protect users - I DO NOT PROPOSE TO GET RID OF Version :). I was thinking that we can add on Version a DEFAULT version, which the caller can set. So Version.setDefault and Version.getDefault will be added, as static members (more on the static-ness of it later). We then change the API which requires Version to also expose an API which doesn't require it, and that API will call Version.getDefault(). People can use it if they want to ... Few points: 1) As a default DEFAULT Version is controversial, I don't want to propose it, even though I think Lucene can define the DEFAULT to be the latest. Instead, I propose that Version.getDefault throw a DefaultVersionNotSetException if it wasn't set, while an API which relies on the default Version is called (I don't want to return null, not sure how safe it is). 2) That DEFAULT Version is static, which means it will affect all indexing code running inside the JVM. Which is fine: 2.1) Perhaps all the indexing code should use the same Version 2.2) If you know that's not the case, then pass Version to the API which requires it - you cannot use the 'default Version' API -- nothing changes for you. One case is missing -- you might not know if your code is the only indexing code which runs in the JVM ... I don't have a solution to that, but I think it'll be revealed pretty quickly, and you can change your code then ... So to summarize - the current Version API will remain and people can still use it. The DEFAULT Version API is meant for convenience for those who don't want to pass Version everywhere, for the reasons I outlined above. 
This will also clean our test code significantly, as the tests will set the DEFAULT version to TEST_VERSION_CURRENT at start ... The changes to the Version class will be very simple. If people think that's acceptable, I can open an issue and work on it. Shai
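A rough sketch of the proposed API, shown here as a separate holder class rather than a change to org.apache.lucene.util.Version itself (the method and exception names follow the proposal, everything else is an assumption): a settable process-wide default that throws when read before being set.

{code:java}
import org.apache.lucene.util.Version;

// Hypothetical holder for the proposed default-Version API.
final class DefaultVersionHolder {

  static final class DefaultVersionNotSetException extends IllegalStateException {
    DefaultVersionNotSetException() {
      super("No default Version was set before a Version-less API was used");
    }
  }

  private static volatile Version defaultVersion; // null until setDefault() is called

  static void setDefault(Version version) {
    defaultVersion = version;
  }

  static Version getDefault() {
    final Version v = defaultVersion;
    if (v == null) {
      throw new DefaultVersionNotSetException();
    }
    return v;
  }

  private DefaultVersionHolder() {}
}
{code}

Tests could then call setDefault(TEST_VERSION_CURRENT) once at start, as suggested above.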
RE: [jira] Account password
LOL! This user is assigned to very old bugzilla issues :-) - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: j...@apache.org [mailto:j...@apache.org] Sent: Tuesday, April 13, 2010 10:54 PM To: java-dev@lucene.apache.org Subject: [jira] Account password You (or someone else) has reset your password. - Your password has been changed to: MCwqNr You can change your password here: https://issues.apache.org/jira/secure/ViewProfile.jspa Here are the details of your account: - Username: java-dev@lucene.apache.org Email: java-dev@lucene.apache.org Full Name: Lucene Developers Password: MCwqNr (You can always retrieve these via the Forgot Password link on the signup page) -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: [jira] Account password
I changed the password, so its no longer public. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Tuesday, April 13, 2010 11:59 PM To: java-dev@lucene.apache.org Subject: RE: [jira] Account password LOL! This user is assigned to very old bugzilla issues :-) - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: j...@apache.org [mailto:j...@apache.org] Sent: Tuesday, April 13, 2010 10:54 PM To: java-dev@lucene.apache.org Subject: [jira] Account password You (or someone else) has reset your password. - Your password has been changed to: MCwqNr You can change your password here: https://issues.apache.org/jira/secure/ViewProfile.jspa Here are the details of your account: - Username: java-dev@lucene.apache.org Email: java-dev@lucene.apache.org Full Name: Lucene Developers Password: MCwqNr (You can always retrieve these via the Forgot Password link on the signup page) -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: issues.apache.org compromised: please update your passwords
Hi Grant, It is that user, who is assigned to the very early JIRA issues, e.g.: https://issues.apache.org/jira/browse/LUCENE-1 I changed the password of this user in response to that email (for security), but I think we should simply let infra remove it. The problem is, almost anybody can instruct JIRA to reset the password and let JIRA send it again to the email which is the public java-dev list. And then it is public again. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant Ingersoll Sent: Wednesday, April 14, 2010 1:50 AM To: java-dev@lucene.apache.org Subject: Re: issues.apache.org compromised: please update your passwords FYI, this is for real. Some have asked me if it is made up. I don't know who owns that user, so we should ask on infra, I suspect. Also, this applies to all user accounts too on JIRA. On Apr 13, 2010, at 12:25 PM, r...@apache.org wrote: Dear Lucene Developers, You are receiving this email because you have a login, 'java- d...@lucene.apache.org', on the Apache JIRA installation, https://issues.apache.org/jira/ On April 6 the issues.apache.org server was hacked. The attackers were able to install a trojan JIRA login screen and later get full root access: https://blogs.apache.org/infra/entry/apache_org_04_09_2010 We are assuming that the attackers have a copy of the JIRA database, which includes a hash (SHA-512 unsalted) of the password you set when signing up as 'java-dev@lucene.apache.org' to JIRA. If the password you set was not of great quality (eg. based on a dictionary word), it should be assumed that the attackers can guess your password from the password hash via brute force. The upshot is that someone malicious may know both your email address and a password of yours. This is a problem because many people reuse passwords across online services. If you reuse passwords across systems, we urge you to change your passwords on ALL SYSTEMS that might be using the compromised JIRA password. Prime examples might be gmail or hotmail accounts, online banking sites, or sites known to be related to your email's domain, lucene.apache.org. Naturally we would also like you to reset your JIRA password. That can be done at: https://issues.apache.org/jira/secure/ForgotPassword!default.jspa?usern ame=java-...@lucene.apache.org We (the Apache JIRA administrators) sincerely apologize for this security breach. If you have any questions, please let us know by email. We are also available on the #asfinfra IRC channel on irc.freenode.net. Regards, The Apache Infrastructure Team - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: svn commit: r932773 - /lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java
Robert, as the comment says, it’s a hack. How about simply adding a public getter method for the matchVersion to the base class StopwordAwareAna? Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: rm...@apache.org [mailto:rm...@apache.org] Sent: Saturday, April 10, 2010 7:52 PM To: java-comm...@lucene.apache.org Subject: svn commit: r932773 - /lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatc hVersion.java Author: rmuir Date: Sat Apr 10 17:51:30 2010 New Revision: 932773 URL: http://svn.apache.org/viewvc?rev=932773view=rev Log: fix failing test, StdAnalyzer now stores this in its superclass Modified: lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatch Version.java Modified: lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatch Version.java URL: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/src/test/org/apache/ solr/analysis/TestLuceneMatchVersion.java?rev=932773r1=932772r2=93277 3view=diff === === --- lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatch Version.java (original) +++ lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatch Version.java Sat Apr 10 17:51:30 2010 @@ -68,8 +68,8 @@ public class TestLuceneMatchVersion exte tok = (StandardTokenizer) tsi.getTokenizer(); assertFalse(tok.isReplaceInvalidAcronym()); -// this is a hack to get the private matchVersion field in StandardAnalyzer, may break in later lucene versions - we have no getter :( -final Field matchVersionField = StandardAnalyzer.class.getDeclaredField(matchVersion); +// this is a hack to get the private matchVersion field in StandardAnalyzer's superclass, may break in later lucene versions - we have no getter :( +final Field matchVersionField = StandardAnalyzer.class.getSuperclass().getDeclaredField(matchVersion) ; matchVersionField.setAccessible(true); type = schema.getFieldType(textStandardAnalyzerDefault); - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: svn commit: r932773 - /lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java
This is why i added the comment. But I forgot about it when I committed the lucene refactoring J So lets fix it with a simple getter! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de/ http://www.thetaphi.de eMail: u...@thetaphi.de From: Robert Muir [mailto:rcm...@gmail.com] Sent: Sunday, April 11, 2010 11:47 AM To: java-dev@lucene.apache.org Subject: Re: svn commit: r932773 - /lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatchVersion.java I agree we should do something better, I do not like the way the test looks now (no offense) as it is prone to break... On Sun, Apr 11, 2010 at 5:39 AM, Uwe Schindler u...@thetaphi.de wrote: Robert, as the comment says, it’s a hack. How about simply adding a public getter method for the matchVersion to the base class StopwordAwareAna? Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: rm...@apache.org [mailto:rm...@apache.org] Sent: Saturday, April 10, 2010 7:52 PM To: java-comm...@lucene.apache.org Subject: svn commit: r932773 - /lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatc hVersion.java Author: rmuir Date: Sat Apr 10 17:51:30 2010 New Revision: 932773 URL: http://svn.apache.org/viewvc?rev=932773 http://svn.apache.org/viewvc?rev=932773view=rev view=rev Log: fix failing test, StdAnalyzer now stores this in its superclass Modified: lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatch Version.java Modified: lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatch Version.java URL: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/src/test/org/apache/ solr/analysis/TestLuceneMatchVersion.java?rev=932773r1=932772r2=93277 3view=diff === === --- lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatch Version.java (original) +++ lucene/dev/trunk/solr/src/test/org/apache/solr/analysis/TestLuceneMatch Version.java Sat Apr 10 17:51:30 2010 @@ -68,8 +68,8 @@ public class TestLuceneMatchVersion exte tok = (StandardTokenizer) tsi.getTokenizer(); assertFalse(tok.isReplaceInvalidAcronym()); -// this is a hack to get the private matchVersion field in StandardAnalyzer, may break in later lucene versions - we have no getter :( -final Field matchVersionField = StandardAnalyzer.class.getDeclaredField(matchVersion); +// this is a hack to get the private matchVersion field in StandardAnalyzer's superclass, may break in later lucene versions - we have no getter :( +final Field matchVersionField = StandardAnalyzer.class.getSuperclass().getDeclaredField(matchVersion) ; matchVersionField.setAccessible(true); type = schema.getFieldType(textStandardAnalyzerDefault); - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org -- Robert Muir rcm...@gmail.com
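A minimal sketch of the "simple getter" being agreed on here, under the assumption that matchVersion lives in a shared analyzer base class; the class name below is a placeholder, not the actual Lucene class.

{code:java}
import org.apache.lucene.util.Version;

// Placeholder base class: the only point is that matchVersion gets a public
// getter, so TestLuceneMatchVersion can drop the setAccessible(true) hack.
public abstract class VersionAwareAnalyzerBaseSketch {
  protected final Version matchVersion;

  protected VersionAwareAnalyzerBaseSketch(Version matchVersion) {
    this.matchVersion = matchVersion;
  }

  /** Proposed getter replacing the reflection on the private field. */
  public final Version getMatchVersion() {
    return matchVersion;
  }
}
{code}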
[jira] Resolved: (LUCENE-2389) Enforce TokenStream impl / Analyzer finalness by an assertion
[ https://issues.apache.org/jira/browse/LUCENE-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-2389. --- Resolution: Fixed Committed revision: 932864 Enforce TokenStream impl / Analyzer finalness by an assertion - Key: LUCENE-2389 URL: https://issues.apache.org/jira/browse/LUCENE-2389 Project: Lucene - Java Issue Type: Improvement Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: LUCENE-2389.patch, LUCENE-2389.patch As noted in LUCENE-1753 and other issues, TokenStream and Analyzer are based on the decorator pattern. At least all TokenStream and Analyzer implementations in Lucene and Solr should be final. The attached patch adds an assertion to the ctors of both classes that does the corresponding checks: - Analyzers must be final or private classes or anonymous inner classes - TokenStreams must be final or private classes or anonymous inner classes or have a final incrementToken() I will commit this after Robert has fixed the Solr streams. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
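For illustration, a simplified standalone version of the kind of reflective check the assertion performs (the real TokenStream.assertFinal() differs in detail, and there is a corresponding check for Analyzer): a class passes if it is final, private, or anonymous, or if it declares a final incrementToken().

{code:java}
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Simplified stand-in for the check added in LUCENE-2389; not the actual code.
final class TokenStreamFinalnessCheck {

  static boolean isSafeTokenStreamClass(Class<?> clazz) {
    final int mods = clazz.getModifiers();
    if (Modifier.isFinal(mods) || Modifier.isPrivate(mods) || clazz.isAnonymousClass()) {
      return true;
    }
    try {
      // incrementToken() may be declared final even if the class itself is not.
      final Method m = clazz.getMethod("incrementToken");
      return Modifier.isFinal(m.getModifiers());
    } catch (NoSuchMethodException e) {
      return false; // no incrementToken() at all
    }
  }

  private TokenStreamFinalnessCheck() {}
}
{code}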
[jira] Updated: (LUCENE-2154) Need a clean way for Dir/MultiReader to merge the AttributeSources of the sub-readers
[ https://issues.apache.org/jira/browse/LUCENE-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2154: -- Attachment: LUCENE-2154-Jakarta-BCEL.patch Slightly improved patch to correctly work with CharTermAttribute (as it defines methods also defined by ProxyAttributeImpl as final, so override failure). Need a clean way for Dir/MultiReader to merge the AttributeSources of the sub-readers --- Key: LUCENE-2154 URL: https://issues.apache.org/jira/browse/LUCENE-2154 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: Flex Branch Reporter: Michael McCandless Fix For: 3.1 Attachments: LUCENE-2154-cglib.patch, LUCENE-2154-Jakarta-BCEL.patch, LUCENE-2154-Jakarta-BCEL.patch, LUCENE-2154-javassist.patch, LUCENE-2154-javassist.patch, LUCENE-2154.patch, LUCENE-2154.patch The flex API allows extensibility at the Fields/Terms/Docs/PositionsEnum levels, for a codec to set custom attrs. But, it's currently broken for Dir/MultiReader, which must somehow share attrs across all the sub-readers. Somehow we must make a single attr source, and tell each sub-reader's enum to use that instead of creating its own. Hopefully Uwe can work some magic here :) -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855739#action_12855739 ] Uwe Schindler commented on LUCENE-2386: --- I don't understand the whole issue either. For me it is perfectly fine that, if I open an IndexWriter with create=true, the index is created empty first. This has the big advantage that IndexReaders can open it and will not fail with "index not found". OK, this can be done by a commit directly after creating, but for code like "create an IndexWriter with create=true if the index does not exist, else append", this is more work to do. The question is also: what happens if you call IndexWriter.getReader() without the initial commit? Does this work with your patch? For me this patch is too heavy for the small improvement, and it's a behaviour change and no real bug. IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems unnecessary, and kind of brings back an autoCommit mode, in a strange way ... why do we need that commit? Do we really expect people to open an IndexReader on an empty Directory which they just passed to an IW w/ create=true? If they want, they can simply call commit() right away on the IW they created. I ran into this when writing a test which committed N times, then compared the number of commits (via IndexReader.listCommits) and was surprised to see N+1 commits. Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter jumping on me .. so the change might not be that simple. But I think it's manageable, so I'll try to attack it (and IFD specifically !) back :). -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855792#action_12855792 ] Uwe Schindler commented on LUCENE-2386: --- Thanks Earwin, that's exactly my opinion, too. For me the whole behaviour is defined and correct. The create param in the ctor is just an initialization of the directory to be a defined index (empty at the beginning). Maybe we should remove the create param from the IndexWriter ctor/config altogether, and just define a static utility method in IW that initializes an empty directory. The standard ctors in IW should then throw IndexNotFound if the directory is not yet initialized. This way, we don't need those strange create params. IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems unnecessary, and kind of brings back an autoCommit mode, in a strange way ... why do we need that commit? Do we really expect people to open an IndexReader on an empty Directory which they just passed to an IW w/ create=true? If they want, they can simply call commit() right away on the IW they created. I ran into this when writing a test which committed N times, then compared the number of commits (via IndexReader.listCommits) and was surprised to see N+1 commits. Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter jumping on me .. so the change might not be that simple. But I think it's manageable, so I'll try to attack it (and IFD specifically !) back :). -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
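To make the idea concrete, here is a rough sketch of such a static "initialize an empty index" utility, written against the 3.1-style IndexWriterConfig API; the class and method names are hypothetical, not a committed Lucene API.

{code}
import java.io.IOException;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

// Hypothetical helper: explicitly turn a Directory into a (still empty) index, so the
// normal IndexWriter/IndexReader code paths can assume the index already exists.
final class IndexDirectoryInit {
  private IndexDirectoryInit() {}

  static void createEmptyIndex(Directory dir) throws IOException {
    IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_31,
        new WhitespaceAnalyzer(Version.LUCENE_31)).setOpenMode(OpenMode.CREATE);
    IndexWriter writer = new IndexWriter(dir, conf);
    try {
      writer.commit(); // write the first, empty commit point
    } finally {
      writer.close();
    }
  }
}
{code}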
[jira] Updated: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute
[ https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2372: -- Attachment: LUCENE-2372.patch Updated patch, now also KeywordAnalyzer and PerFieldAnalyzerWrapper made final and the backwards layer removed. I will commit this later this day and proceed with contrib. Robert, we should talk who does which one! Replace deprecated TermAttribute by new CharTermAttribute - Key: LUCENE-2372 URL: https://issues.apache.org/jira/browse/LUCENE-2372 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.1 Reporter: Uwe Schindler Fix For: 3.1 Attachments: LUCENE-2372.patch, LUCENE-2372.patch, LUCENE-2372.patch, LUCENE-2372.patch After LUCENE-2302 is merged to trunk with flex, we need to carry over all tokenizers and consumers of the TokenStreams to the new CharTermAttribute. We should also think about adding a AttributeFactory that creates a subclass of CharTermAttributeImpl that returns collation keys in toBytesRef() accessor. CollationKeyFilter is then obsolete, instead you can simply convert every TokenStream to indexing only CollationKeys by changing the attribute implementation. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
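As an illustration of what the conversion in this issue looks like for a consumer, here is a small self-contained filter written against CharTermAttribute's buffer()/length() accessors instead of the deprecated TermAttribute ones; it is an example of the pattern, not code taken from the patch.

{code}
import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Example consumer of the new attribute: lower-cases the term in place by mutating
// the shared char buffer (final, to satisfy the LUCENE-2389 assertion).
public final class SimpleLowerCaseFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

  public SimpleLowerCaseFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    final char[] buffer = termAtt.buffer();
    final int length = termAtt.length();
    for (int i = 0; i < length; i++) {
      // Note: char-based lowercasing; supplementary characters need extra care.
      buffer[i] = Character.toLowerCase(buffer[i]);
    }
    return true;
  }
}
{code}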
[jira] Updated: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute
[ https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2372: -- Attachment: LUCENE-2372.patch Updated patch after last commit. Replace deprecated TermAttribute by new CharTermAttribute - Key: LUCENE-2372 URL: https://issues.apache.org/jira/browse/LUCENE-2372 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.1 Reporter: Uwe Schindler Fix For: 3.1 Attachments: LUCENE-2372.patch, LUCENE-2372.patch, LUCENE-2372.patch, LUCENE-2372.patch, LUCENE-2372.patch After LUCENE-2302 is merged to trunk with flex, we need to carry over all tokenizers and consumers of the TokenStreams to the new CharTermAttribute. We should also think about adding a AttributeFactory that creates a subclass of CharTermAttributeImpl that returns collation keys in toBytesRef() accessor. CollationKeyFilter is then obsolete, instead you can simply convert every TokenStream to indexing only CollationKeys by changing the attribute implementation. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute
[ https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855590#action_12855590 ] Uwe Schindler commented on LUCENE-2372: --- Committed core part in revision: 932749 Replace deprecated TermAttribute by new CharTermAttribute - Key: LUCENE-2372 URL: https://issues.apache.org/jira/browse/LUCENE-2372 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.1 Reporter: Uwe Schindler Fix For: 3.1 Attachments: LUCENE-2372.patch, LUCENE-2372.patch, LUCENE-2372.patch, LUCENE-2372.patch, LUCENE-2372.patch After LUCENE-2302 is merged to trunk with flex, we need to carry over all tokenizers and consumers of the TokenStreams to the new CharTermAttribute. We should also think about adding a AttributeFactory that creates a subclass of CharTermAttributeImpl that returns collation keys in toBytesRef() accessor. CollationKeyFilter is then obsolete, instead you can simply convert every TokenStream to indexing only CollationKeys by changing the attribute implementation. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2389) Enforce TokenStream impl / Analyzer finalness by an assertion
Enforce TokenStream impl / Analyzer finalness by an assertion - Key: LUCENE-2389 URL: https://issues.apache.org/jira/browse/LUCENE-2389 Project: Lucene - Java Issue Type: Improvement Reporter: Uwe Schindler Assignee: Uwe Schindler As noted in LUCENE-1753 and other issues, TokenStream and Analyzers are based on the decorator pattern. At least all TokenStream and Analyzer implementations in Lucene and Solr should be final. The attached patch adds an assertion to the ctors of both classes that does the corresponding checks: - Analyzers must be final or private classes or anonymous inner classes - TokenStreams must be final or private classes or anonymous inner classes or have a final incrementToken() I will commit this after robert have fixed solr streams. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2389) Enforce TokenStream impl / Analyzer finalness by an assertion
[ https://issues.apache.org/jira/browse/LUCENE-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2389: -- Fix Version/s: 3.1 Enforce TokenStream impl / Analyzer finalness by an assertion - Key: LUCENE-2389 URL: https://issues.apache.org/jira/browse/LUCENE-2389 Project: Lucene - Java Issue Type: Improvement Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 As noted in LUCENE-1753 and other issues, TokenStream and Analyzers are based on the decorator pattern. At least all TokenStream and Analyzer implementations in Lucene and Solr should be final. The attached patch adds an assertion to the ctors of both classes that does the corresponding checks: - Analyzers must be final or private classes or anonymous inner classes - TokenStreams must be final or private classes or anonymous inner classes or have a final incrementToken() I will commit this after robert have fixed solr streams. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2389) Enforce TokenStream impl / Analyzer finalness by an assertion
[ https://issues.apache.org/jira/browse/LUCENE-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2389: -- Attachment: LUCENE-2389.patch Patch. Enforce TokenStream impl / Analyzer finalness by an assertion - Key: LUCENE-2389 URL: https://issues.apache.org/jira/browse/LUCENE-2389 Project: Lucene - Java Issue Type: Improvement Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: LUCENE-2389.patch As noted in LUCENE-1753 and other issues, TokenStream and Analyzers are based on the decorator pattern. At least all TokenStream and Analyzer implementations in Lucene and Solr should be final. The attached patch adds an assertion to the ctors of both classes that does the corresponding checks: - Analyzers must be final or private classes or anonymous inner classes - TokenStreams must be final or private classes or anonymous inner classes or have a final incrementToken() I will commit this after robert have fixed solr streams. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2389) Enforce TokenStream impl / Analyzer finalness by an assertion
[ https://issues.apache.org/jira/browse/LUCENE-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2389: -- Attachment: LUCENE-2389.patch Improved patch that also makes Analyzers with final (reusable)TokenStream() possible. Enforce TokenStream impl / Analyzer finalness by an assertion - Key: LUCENE-2389 URL: https://issues.apache.org/jira/browse/LUCENE-2389 Project: Lucene - Java Issue Type: Improvement Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: LUCENE-2389.patch, LUCENE-2389.patch As noted in LUCENE-1753 and other issues, TokenStream and Analyzers are based on the decorator pattern. At least all TokenStream and Analyzer implementations in Lucene and Solr should be final. The attached patch adds an assertion to the ctors of both classes that does the corresponding checks: - Analyzers must be final or private classes or anonymous inner classes - TokenStreams must be final or private classes or anonymous inner classes or have a final incrementToken() I will commit this after robert have fixed solr streams. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute
[ https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2372: -- Attachment: LUCENE-2372.patch Here is a first patch for the core tokenstreams. Tests not yet changed. The following things were additionally fixed: - StandardAnalyzer was made final (backwards break, we forgot to make it final in the 3.0 TS finalization issue). This enabled me to subclass StopwordAnalyzerBase and remove heavy code duplication. The original code also contained a bug in the tokenStream method (no setReplaceInvalidAcronym), which was correct in reusableTokenStream. Now it is correct. I will post further patches for core. Replace deprecated TermAttribute by new CharTermAttribute - Key: LUCENE-2372 URL: https://issues.apache.org/jira/browse/LUCENE-2372 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.1 Reporter: Uwe Schindler Fix For: 3.1 Attachments: LUCENE-2372.patch After LUCENE-2302 is merged to trunk with flex, we need to carry over all tokenizers and consumers of the TokenStreams to the new CharTermAttribute. We should also think about adding a AttributeFactory that creates a subclass of CharTermAttributeImpl that returns collation keys in toBytesRef() accessor. CollationKeyFilter is then obsolete, instead you can simply convert every TokenStream to indexing only CollationKeys by changing the attribute implementation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
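To illustrate the StopwordAnalyzerBase refactoring mentioned above, here is a rough sketch of an analyzer built on the reusable base class, where the token stream chain is declared once so tokenStream() and reusableTokenStream() cannot diverge; names follow the 3.1 APIs as I understand them, and the committed StandardAnalyzer differs in its details.

{code}
import java.io.Reader;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.ReusableAnalyzerBase;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

// Sketch: the whole chain lives in createComponents(), so there is exactly one place
// that can contain bugs like the missing setReplaceInvalidAcronym call mentioned above.
public final class SimpleStopAnalyzer extends ReusableAnalyzerBase {
  private final Version matchVersion;

  public SimpleStopAnalyzer(Version matchVersion) {
    this.matchVersion = matchVersion;
  }

  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    final Tokenizer source = new WhitespaceTokenizer(matchVersion, reader);
    TokenStream result = new LowerCaseFilter(matchVersion, source);
    result = new StopFilter(matchVersion, result, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
    return new TokenStreamComponents(source, result);
  }
}
{code}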
[jira] Updated: (LUCENE-2302) Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable)
[ https://issues.apache.org/jira/browse/LUCENE-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2302: -- Attachment: LUCENE-2302-toString.patch Patch that fixes the toString() problems in Token and adds missing CHANGES.txt, fixes backwards tests and updates javadocs to document the backwards break. Deprecating Token should be done in another issue. I will commit this soon, to be able to go forward with tokenstream conversion! Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable) Key: LUCENE-2302 URL: https://issues.apache.org/jira/browse/LUCENE-2302 Project: Lucene - Java Issue Type: Improvement Components: Analysis Affects Versions: Flex Branch Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: LUCENE-2302-toString.patch, LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch For flexible indexing terms can be simple byte[] arrays, while the current TermAttribute only supports char[]. This is fine for plain text, but e.g NumericTokenStream should directly work on the byte[] array. Also TermAttribute lacks of some interfaces that would make it simplier for users to work with them: Appendable and CharSequence I propose to create a new interface CharTermAttribute with a clean new API that concentrates on CharSequence and Appendable. The implementation class will simply support the old and new interface working on the same term buffer. DEFAULT_ATTRIBUTE_FACTORY will take care of this. So if somebody adds a TermAttribute, he will get an implementation class that can be also used as CharTermAttribute. As both attributes create the same impl instance both calls to addAttribute are equal. So a TokenFilter that adds CharTermAttribute to the source will work with the same instance as the Tokenizer that requested the (deprecated) TermAttribute. To also support byte[] only terms like Collation or NumericField needs, a separate getter-only interface will be added, that returns a reusable BytesRef, e.g. BytesRefGetterAttribute. The default implementation class will also support this interface. For backwards compatibility with old self-made-TermAttribute implementations, the indexer will check with hasAttribute(), if the BytesRef getter interface is there and if not will wrap a old-style TermAttribute (a deprecated wrapper class will be provided): new BytesRefGetterAttributeWrapper(TermAttribute), that is used by the indexer then. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
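Because the CharSequence and Appendable aspects are the core of this issue, here is a tiny standalone demonstration of both; it is illustrative code, not taken from the patch.

{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl;

// As a CharSequence the attribute can feed a regex Matcher without creating a String;
// as an Appendable-style buffer the term can be rebuilt in place and reused.
public final class CharTermAttributeDemo {
  public static void main(String[] args) {
    final CharTermAttribute termAtt = new CharTermAttributeImpl();
    termAtt.append("Foo-Bar");

    final Matcher m = Pattern.compile("-").matcher(termAtt); // no toString() needed
    System.out.println("contains dash: " + m.find());

    termAtt.setEmpty().append("foo").append("bar"); // fluent, in-place rebuild
    System.out.println("term: " + termAtt.toString());
  }
}
{code}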
[jira] Resolved: (LUCENE-2302) Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable)
[ https://issues.apache.org/jira/browse/LUCENE-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-2302. --- Resolution: Fixed Lucene Fields: [New, Patch Available] (was: [New]) Committed revision: 932369 Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable) Key: LUCENE-2302 URL: https://issues.apache.org/jira/browse/LUCENE-2302 Project: Lucene - Java Issue Type: Improvement Components: Analysis Affects Versions: Flex Branch Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: LUCENE-2302-toString.patch, LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch For flexible indexing terms can be simple byte[] arrays, while the current TermAttribute only supports char[]. This is fine for plain text, but e.g NumericTokenStream should directly work on the byte[] array. Also TermAttribute lacks of some interfaces that would make it simplier for users to work with them: Appendable and CharSequence I propose to create a new interface CharTermAttribute with a clean new API that concentrates on CharSequence and Appendable. The implementation class will simply support the old and new interface working on the same term buffer. DEFAULT_ATTRIBUTE_FACTORY will take care of this. So if somebody adds a TermAttribute, he will get an implementation class that can be also used as CharTermAttribute. As both attributes create the same impl instance both calls to addAttribute are equal. So a TokenFilter that adds CharTermAttribute to the source will work with the same instance as the Tokenizer that requested the (deprecated) TermAttribute. To also support byte[] only terms like Collation or NumericField needs, a separate getter-only interface will be added, that returns a reusable BytesRef, e.g. BytesRefGetterAttribute. The default implementation class will also support this interface. For backwards compatibility with old self-made-TermAttribute implementations, the indexer will check with hasAttribute(), if the BytesRef getter interface is there and if not will wrap a old-style TermAttribute (a deprecated wrapper class will be provided): new BytesRefGetterAttributeWrapper(TermAttribute), that is used by the indexer then. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2364) Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery Co.
[ https://issues.apache.org/jira/browse/LUCENE-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855358#action_12855358 ] Uwe Schindler commented on LUCENE-2364: --- +1 Term is still used at a lot of places in internal code, but that can be changed easily. One of those places is MTQ :-) Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery Co. - Key: LUCENE-2364 URL: https://issues.apache.org/jira/browse/LUCENE-2364 Project: Lucene - Java Issue Type: Improvement Affects Versions: Flex Branch Reporter: Uwe Schindler Fix For: 3.1 It would be good to directly allow BytesRefs in TermQuery and TermRangeQuery (as both queries convert the strings to BytesRef internally). For NumericRange support in Solr it will be needed to support numerics as ByteRef in single-term queries. When this will be added, don't forget to change TestNumericRangeQueryXX to use the BytesRef ctor of TRQ. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute
[ https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2372: -- Attachment: LUCENE-2372.patch Patch that removes deprecated usage of TermAttribute from Lucene Core completely, all tests also fixed. Replace deprecated TermAttribute by new CharTermAttribute - Key: LUCENE-2372 URL: https://issues.apache.org/jira/browse/LUCENE-2372 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.1 Reporter: Uwe Schindler Fix For: 3.1 Attachments: LUCENE-2372.patch, LUCENE-2372.patch After LUCENE-2302 is merged to trunk with flex, we need to carry over all tokenizers and consumers of the TokenStreams to the new CharTermAttribute. We should also think about adding a AttributeFactory that creates a subclass of CharTermAttributeImpl that returns collation keys in toBytesRef() accessor. CollationKeyFilter is then obsolete, instead you can simply convert every TokenStream to indexing only CollationKeys by changing the attribute implementation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute
[ https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2372: -- Attachment: LUCENE-2372.patch Small updates. Just one question: The only non-final Analyzer left is KeywordAnalyzer. If I make it final and also use ReusableTokenizerBase, we can remove the overridesTokenStream check completely? The question is, whoever wants to override this class. StandardAnalyzer was made final in this patch, why not also this one? Replace deprecated TermAttribute by new CharTermAttribute - Key: LUCENE-2372 URL: https://issues.apache.org/jira/browse/LUCENE-2372 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.1 Reporter: Uwe Schindler Fix For: 3.1 Attachments: LUCENE-2372.patch, LUCENE-2372.patch, LUCENE-2372.patch After LUCENE-2302 is merged to trunk with flex, we need to carry over all tokenizers and consumers of the TokenStreams to the new CharTermAttribute. We should also think about adding a AttributeFactory that creates a subclass of CharTermAttributeImpl that returns collation keys in toBytesRef() accessor. CollationKeyFilter is then obsolete, instead you can simply convert every TokenStream to indexing only CollationKeys by changing the attribute implementation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute
[ https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855493#action_12855493 ] Uwe Schindler commented on LUCENE-2372: --- Did it already for StandardAna (see patch). Replace deprecated TermAttribute by new CharTermAttribute - Key: LUCENE-2372 URL: https://issues.apache.org/jira/browse/LUCENE-2372 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.1 Reporter: Uwe Schindler Fix For: 3.1 Attachments: LUCENE-2372.patch, LUCENE-2372.patch, LUCENE-2372.patch After LUCENE-2302 is merged to trunk with flex, we need to carry over all tokenizers and consumers of the TokenStreams to the new CharTermAttribute. We should also think about adding a AttributeFactory that creates a subclass of CharTermAttributeImpl that returns collation keys in toBytesRef() accessor. CollationKeyFilter is then obsolete, instead you can simply convert every TokenStream to indexing only CollationKeys by changing the attribute implementation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute
[ https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855498#action_12855498 ] Uwe Schindler commented on LUCENE-2372: --- One more: PerFieldAnalyzerWrapper :( - Sorry Replace deprecated TermAttribute by new CharTermAttribute - Key: LUCENE-2372 URL: https://issues.apache.org/jira/browse/LUCENE-2372 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.1 Reporter: Uwe Schindler Fix For: 3.1 Attachments: LUCENE-2372.patch, LUCENE-2372.patch, LUCENE-2372.patch After LUCENE-2302 is merged to trunk with flex, we need to carry over all tokenizers and consumers of the TokenStreams to the new CharTermAttribute. We should also think about adding a AttributeFactory that creates a subclass of CharTermAttributeImpl that returns collation keys in toBytesRef() accessor. CollationKeyFilter is then obsolete, instead you can simply convert every TokenStream to indexing only CollationKeys by changing the attribute implementation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854882#action_12854882 ] Uwe Schindler commented on LUCENE-2074: --- As requested on the mailing list, I will look into resetting the zzBuffer on Tokenizer.reset(Reader). Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer --- Key: LUCENE-2074 URL: https://issues.apache.org/jira/browse/LUCENE-2074 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch The current trunk version of StandardTokenizerImpl was generated by Java 1.4 (according to the warning). In Java 3.0 we switch to Java 1.5, so we should regenerate the file. After regeneration the Tokenizer behaves different for some characters. Because of that we should only use the new TokenizerImpl when Version.LUCENE_30 or LUCENE_31 is used as matchVersion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854886#action_12854886 ] Uwe Schindler commented on LUCENE-2074: --- I plan to commit this soon! So any patch will get outdated, thats why i want to fix this here. And as this patch removes direct access from the Tokenizer to the lexer (as it is only accessible through an interface now), we have to change the jflex file to do it correctly. Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer --- Key: LUCENE-2074 URL: https://issues.apache.org/jira/browse/LUCENE-2074 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch The current trunk version of StandardTokenizerImpl was generated by Java 1.4 (according to the warning). In Java 3.0 we switch to Java 1.5, so we should regenerate the file. After regeneration the Tokenizer behaves different for some characters. Because of that we should only use the new TokenizerImpl when Version.LUCENE_30 or LUCENE_31 is used as matchVersion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854890#action_12854890 ] Uwe Schindler commented on LUCENE-2074: --- You dont need the jflex binaries in general, only if you reconstruct the source files (using ant jflex). And its easy to generate, check out and start mvn install. Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer --- Key: LUCENE-2074 URL: https://issues.apache.org/jira/browse/LUCENE-2074 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch The current trunk version of StandardTokenizerImpl was generated by Java 1.4 (according to the warning). In Java 3.0 we switch to Java 1.5, so we should regenerate the file. After regeneration the Tokenizer behaves different for some characters. Because of that we should only use the new TokenizerImpl when Version.LUCENE_30 or LUCENE_31 is used as matchVersion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2074: -- Attachment: LUCENE-2074.patch Here a new patch, with the zzBuffer reset to default implemented in a separate reset(Reader) method. As yyReset is generated as final, I had to change the name. Before apply, run: {noformat} svn copy StandardTokenizerImpl.* to StandardTokenizerImplOrig.* svn move StandardTokenizerImpl.* to StandardTokenizerImpl31.* {noformat} I will commit this in a day or two! Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer --- Key: LUCENE-2074 URL: https://issues.apache.org/jira/browse/LUCENE-2074 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch The current trunk version of StandardTokenizerImpl was generated by Java 1.4 (according to the warning). In Java 3.0 we switch to Java 1.5, so we should regenerate the file. After regeneration the Tokenizer behaves different for some characters. Because of that we should only use the new TokenizerImpl when Version.LUCENE_30 or LUCENE_31 is used as matchVersion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2074: -- Attachment: LUCENE-2074.patch Updated also the error message about missing jflex when calling ant jflex to regenerate the lexers. The message now contains instructions for downloading and building JFlex. Also add CHANGES.txt. Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer --- Key: LUCENE-2074 URL: https://issues.apache.org/jira/browse/LUCENE-2074 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch The current trunk version of StandardTokenizerImpl was generated by Java 1.4 (according to the warning). In Java 3.0 we switch to Java 1.5, so we should regenerate the file. After regeneration the Tokenizer behaves different for some characters. Because of that we should only use the new TokenizerImpl when Version.LUCENE_30 or LUCENE_31 is used as matchVersion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2074: -- Attachment: LUCENE-2074.patch Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer --- Key: LUCENE-2074 URL: https://issues.apache.org/jira/browse/LUCENE-2074 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch The current trunk version of StandardTokenizerImpl was generated by Java 1.4 (according to the warning). In Java 3.0 we switch to Java 1.5, so we should regenerate the file. After regeneration the Tokenizer behaves different for some characters. Because of that we should only use the new TokenizerImpl when Version.LUCENE_30 or LUCENE_31 is used as matchVersion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2074: -- Attachment: (was: LUCENE-2074.patch) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer --- Key: LUCENE-2074 URL: https://issues.apache.org/jira/browse/LUCENE-2074 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch The current trunk version of StandardTokenizerImpl was generated by Java 1.4 (according to the warning). In Java 3.0 we switch to Java 1.5, so we should regenerate the file. After regeneration the Tokenizer behaves different for some characters. Because of that we should only use the new TokenizerImpl when Version.LUCENE_30 or LUCENE_31 is used as matchVersion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
Reset zzBuffer in StandardTokenizerImpl* when lexer is reset. - Key: LUCENE-2384 URL: https://issues.apache.org/jira/browse/LUCENE-2384 Project: Lucene - Java Issue Type: Sub-task Components: Analysis Affects Versions: 3.0.1 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 When indexing large documents, the lexer buffer may stay large forever. This sub-issue resets the lexer buffer back to the default on reset(Reader). This is done on the enclosing issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854900#action_12854900 ] Uwe Schindler commented on LUCENE-2074: --- Created sub-issue: LUCENE-2384 Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer --- Key: LUCENE-2074 URL: https://issues.apache.org/jira/browse/LUCENE-2074 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch The current trunk version of StandardTokenizerImpl was generated by Java 1.4 (according to the warning). In Java 3.0 we switch to Java 1.5, so we should regenerate the file. After regeneration the Tokenizer behaves different for some characters. Because of that we should only use the new TokenizerImpl when Version.LUCENE_30 or LUCENE_31 is used as matchVersion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
[ https://issues.apache.org/jira/browse/LUCENE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854903#action_12854903 ] Uwe Schindler commented on LUCENE-2384: --- For JFlex this does not help as the Jflex-generated code always needs a Reader. This is special here, the lexer will not need to load the whole document into the reader, it only needs sometimes a large look forward/backwards buffer. Reset zzBuffer in StandardTokenizerImpl* when lexer is reset. - Key: LUCENE-2384 URL: https://issues.apache.org/jira/browse/LUCENE-2384 Project: Lucene - Java Issue Type: Sub-task Components: Analysis Affects Versions: 3.0.1 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 When indexing large documents, the lexer buffer may stay large forever. This sub-issue resets the lexer buffer back to the default on reset(Reader). This is done on the enclosing issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
[ https://issues.apache.org/jira/browse/LUCENE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854908#action_12854908 ] Uwe Schindler commented on LUCENE-2384: --- {quote} patch to reset the zzBuffer when the input is reset. The code is really taken from https://sourceforge.net/mailarchive/message.php?msg_id=444070.38422...@web38901.mail.mud.yahoo.com so I can't really grant license to use it but I think the guy released it as public domain by posting it to the mailing list. I tested it and it seems to work for me. Just including it here in case somebody wants to apply the patch directly to 3.0.1 (although it's better to wait for 3.1) {quote} Your fix adds additional complexity. Just reset the buffer back to the default ZZ_BUFFERSIZE on reset if it has grown. Your patch always reallocates a new buffer. Use this: {code} public final void reset(Reader r) { // reset to default buffer size, if buffer has grown if (zzBuffer.length > ZZ_BUFFERSIZE) { zzBuffer = new char[ZZ_BUFFERSIZE]; } yyreset(r); } {code} Reset zzBuffer in StandardTokenizerImpl* when lexer is reset. - Key: LUCENE-2384 URL: https://issues.apache.org/jira/browse/LUCENE-2384 Project: Lucene - Java Issue Type: Sub-task Components: Analysis Affects Versions: 3.0.1 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: reset.diff When indexing large documents, the lexer buffer may stay large forever. This sub-issue resets the lexer buffer back to the default on reset(Reader). This is done on the enclosing issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core
[ https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855136#action_12855136 ] Uwe Schindler commented on LUCENE-2385: --- The patch does not look like you svn moved the files. To preserve history, you should do a svn move of the file in your local repository and then modify it to reflect the package changes (if any). Did you do this? Move NoDeletionPolicy from benchmark to core Key: LUCENE-2385 URL: https://issues.apache.org/jira/browse/LUCENE-2385 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark, Index Reporter: Shai Erera Assignee: Shai Erera Priority: Trivial Fix For: 3.1 Attachments: LUCENE-2385.patch As the subject says, but I'll also make it a singleton + add some unit tests, as well as some documentation. I'll post a patch hopefully today. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core
[ https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855150#action_12855150 ] Uwe Schindler commented on LUCENE-2385: --- In general we place a list of all svn move/copy commands together with the patch, executable from the root dir. If you paste those commands into your terminal and then apply the patch, it works. One example is the jflex issue (ok, the commands are shortened). Another possibility is to have a second checkout, where you arrange the files correctly (svn moved/copied), and one for creating the patches. Move NoDeletionPolicy from benchmark to core Key: LUCENE-2385 URL: https://issues.apache.org/jira/browse/LUCENE-2385 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark, Index Reporter: Shai Erera Assignee: Shai Erera Priority: Trivial Fix For: 3.1 Attachments: LUCENE-2385.patch As the subject says, but I'll also make it a singleton + add some unit tests, as well as some documentation. I'll post a patch hopefully today. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core
[ https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855164#action_12855164 ] Uwe Schindler commented on LUCENE-2385: --- Yeah thats fine! Move NoDeletionPolicy from benchmark to core Key: LUCENE-2385 URL: https://issues.apache.org/jira/browse/LUCENE-2385 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark, Index Reporter: Shai Erera Assignee: Shai Erera Priority: Trivial Fix For: 3.1 Attachments: LUCENE-2385.patch, LUCENE-2385.patch As the subject says, but I'll also make it a singleton + add some unit tests, as well as some documentation. I'll post a patch hopefully today. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: IndexWriter memory leak?
There is one possibility, that could be fixed: As Tokenizers are reused, the analyzer holds a reference to the last used Reader. The easy fix would be to unset the Reader in Tokenizer.close(). If this is the case for you, that may be easy to do. So Tokenizer.close() looks like this: /** By default, closes the input Reader. */ @Override public void close() throws IOException { input.close(); input = null; // -- new! } Any comments from other committers? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Ruben Laguna [mailto:ruben.lag...@gmail.com] Sent: Thursday, April 08, 2010 2:50 PM To: java-u...@lucene.apache.org Subject: Re: IndexWriter memory leak? I will double check in the afternoon the heapdump.hprof. But I think that *some* readers are indeed held by docWriter.threadStates[0].consumer.fieldHash[1].fields[], as shown in [1] (this heapdump contains only live objects). The heapdump was taken after IndexWriter.commit() /IndexWriter.optimize() and all the Documents were already indexed and GCed (I will double check). So that would mean that the Reader is retained in memory by the following chaing of references, DocumentsWriter - DocumentsWriterThreadState - DocFieldProcessorPerThread - DocFieldProcessorPerField - Fieldable - Field (fieldsData) I'll double check with Eclipse MAT as I said that this chain is actually made of hard references only (no SoftReferences,WeakReferences, etc). I will also double check also that there is no live Document that is referencing the Reader via the Field. [1] http://img.skitch.com/20100407-b86irkp7e4uif2wq1dd4t899qb.jpg On Thu, Apr 8, 2010 at 2:16 PM, Uwe Schindler u...@thetaphi.de wrote: Readers are not held. If you indexed the document and gced the document instance they readers are gone. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Ruben Laguna [mailto:ruben.lag...@gmail.com] Sent: Thursday, April 08, 2010 1:28 PM To: java-u...@lucene.apache.org Subject: Re: IndexWriter memory leak? Now that the zzBuffer issue is solved... what about the references to the Readers held by docWriter. Tika´s ParsingReaders are quite heavyweight so retaining those in memory unnecesarily is also a hidden memory leak. Should I open a bug report on that one? /Rubén On Thu, Apr 8, 2010 at 12:11 PM, Shai Erera ser...@gmail.com wrote: Guess we were replying at the same time :). On Thu, Apr 8, 2010 at 1:04 PM, Uwe Schindler u...@thetaphi.de wrote: I already answered, that I will take care of this! Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Shai Erera [mailto:ser...@gmail.com] Sent: Thursday, April 08, 2010 12:00 PM To: java-u...@lucene.apache.org Subject: Re: IndexWriter memory leak? Yes, that's the trimBuffer version I was thinking about, only this guy created a reset(Reader, int) and does both ops (resetting + trim) in one method call. More convenient. Can you please open an issue to track that? People will have a chance to comment on whether we (Lucene) should handle that, or it should be a JFlex fix. Based on the number of replies this guy received (0 !), I doubt JFlex would consider it a problem. But we can do some small service to our users base by protecting against such problems. And while you're opening the issue, if you want to take a stab at fixing it and post a patch, it'd be great :). 
Shai On Thu, Apr 8, 2010 at 12:51 PM, Ruben Laguna ruben.lag...@gmail.comwrote: I was investigating this a little further and in the JFlex mailing list I found [1] I don't know much about flex / JFlex but it seems that this guy resets the zzBuffer to 16384 or less when setting the input for the lexer Quoted from shef she...@ya... I set %buffer 0 in the options section, and then added this method to the lexer: /** * Set the input for the lexer. The size parameter really speeds things up, * because by default, the lexer allocates an internal buffer of 16k. For * most strings, this is unnecessarily large. If the size param is 0 or greater * than 16k, then the buffer is set to 16k. If the size param is smaller, then * the buf will be set to the exact size
[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2074: -- Attachment: LUCENE-2074.patch New patch with replacement of deprecated TermAttribute - CharTermAttribute. It also fixes the reset()/reset(Reader) methods to be conform to all other Tokenizers and the documentations. The current one was resetting multiple times. This has no effect on backwards. Also improve the JFlex classpath detection to work with svn checkouts or future release zips. I will commit this soon when all tests ran. Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer --- Key: LUCENE-2074 URL: https://issues.apache.org/jira/browse/LUCENE-2074 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch The current trunk version of StandardTokenizerImpl was generated by Java 1.4 (according to the warning). In Java 3.0 we switch to Java 1.5, so we should regenerate the file. After regeneration the Tokenizer behaves different for some characters. Because of that we should only use the new TokenizerImpl when Version.LUCENE_30 or LUCENE_31 is used as matchVersion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2376) java.lang.OutOfMemoryError:Java heap space
[ https://issues.apache.org/jira/browse/LUCENE-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854396#action_12854396 ] Uwe Schindler commented on LUCENE-2376: --- You mean insane amount of fields with norms...? java.lang.OutOfMemoryError:Java heap space -- Key: LUCENE-2376 URL: https://issues.apache.org/jira/browse/LUCENE-2376 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1 Environment: Windows Reporter: Shivender Devarakonda Attachments: InfoStreamOutput.txt I see an OutOfMemory error in our product and it is happening when we have some data objects on which we built the index. I see the following OutOfmemory error, this is happening after we call Indexwriter.optimize(): 4/06/10 02:03:42.160 PM PDT [ERROR] [Lucene Merge Thread #12] In thread Lucene Merge Thread #12 and the message is org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space 4/06/10 02:03:42.207 PM PDT [VERBOSE] [Lucene Merge Thread #12] [Manager] Uncaught Exception in thread Lucene Merge Thread #12 org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:351) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:315) Caused by: java.lang.OutOfMemoryError: Java heap space at java.util.HashMap.resize(HashMap.java:462) at java.util.HashMap.addEntry(HashMap.java:755) at java.util.HashMap.put(HashMap.java:385) at org.apache.lucene.index.FieldInfos.addInternal(FieldInfos.java:256) at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:366) at org.apache.lucene.index.FieldInfos.init(FieldInfos.java:71) at org.apache.lucene.index.SegmentReader$CoreReaders.init(SegmentReader.java:116) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:608) at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:686) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4979) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4614) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291) 4/06/10 02:03:42.895 PM PDT [ERROR] this writer hit an OutOfMemoryError; cannot complete optimize -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2380) Add FieldCache.getTermBytes, to load term data as byte[]
[ https://issues.apache.org/jira/browse/LUCENE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854594#action_12854594 ] Uwe Schindler commented on LUCENE-2380: --- The structure should look like String and StringIndex, but I am not sure if we need real BytesRefs. In my opinion, it should be an array of byte[], where each byte[] is allocated with the term size from the enum's BytesRef and copied over. This is no problem, as the terms need to be replicated either way, because the BytesRef from the enum is reused. The only problem is that byte[] is missing the cool BytesRef methods like utf8ToString() that may be needed by consumers. getStrings and getStringIndex should be deprecated. We cannot emulate them using BytesRef.utf8ToString, as the String[] arrays are raw and allow no wrapping. If FieldCache used accessor methods and not raw arrays, we would not have that problem... Add FieldCache.getTermBytes, to load term data as byte[] Key: LUCENE-2380 URL: https://issues.apache.org/jira/browse/LUCENE-2380 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Fix For: 3.1 With flex, a term is now an opaque byte[] (typically, utf8 encoded unicode string, but not necessarily), so we need to push this up the search stack. FieldCache now has getStrings and getStringIndex; we need corresponding methods to load terms as native byte[], since in general they may not be representable as String. This should be quite a bit more RAM efficient too, for US ascii content since each character would then use 1 byte not 2. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
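A sketch of the loading strategy described above, against the flex TermsEnum API (treat the exact enum plumbing as an assumption): because the enum reuses a single BytesRef instance, each term has to be copied into its own, exactly sized byte[]:
{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

public class TermBytesLoaderSketch {
  /** Copies every term of the enum into its own byte[], sized exactly to the term. */
  public static List<byte[]> loadTermBytes(TermsEnum termsEnum) throws IOException {
    List<byte[]> terms = new ArrayList<byte[]>();
    BytesRef term; // the enum reuses this instance, so we must copy
    while ((term = termsEnum.next()) != null) {
      byte[] copy = new byte[term.length];
      System.arraycopy(term.bytes, term.offset, copy, 0, term.length);
      terms.add(copy);
    }
    return terms;
  }
}
{code}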
[jira] Commented: (LUCENE-2380) Add FieldCache.getTermBytes, to load term data as byte[]
[ https://issues.apache.org/jira/browse/LUCENE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854639#action_12854639 ] Uwe Schindler commented on LUCENE-2380: --- This goes again in the direction of not having arrays in FieldCache anymore, but instead have accessor methods taking a docid and giving back the data (possibly as a reference). So getBytes(docid) returns a reused BytesRef with offset and length of the requested term. For native types we should also go away from arrays and only provide accessor methods. Java is so fast and possiby inlines the method call. So for native types we could also use a FloatBuffer or ByteBuffer or whatever from java.nio. Add FieldCache.getTermBytes, to load term data as byte[] Key: LUCENE-2380 URL: https://issues.apache.org/jira/browse/LUCENE-2380 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Fix For: 3.1 With flex, a term is now an opaque byte[] (typically, utf8 encoded unicode string, but not necessarily), so we need to push this up the search stack. FieldCache now has getStrings and getStringIndex; we need corresponding methods to load terms as native byte[], since in general they may not be representable as String. This should be quite a bit more RAM efficient too, for US ascii content since each character would then use 1 byte not 2. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
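A hypothetical sketch of the accessor idea from this comment; the interface and method names below are invented for illustration and are not an existing Lucene API:
{code}
import org.apache.lucene.util.BytesRef;

/** Hypothetical accessor-style replacement for FieldCache's raw arrays. */
public interface TermBytesAccessor {
  /**
   * Fills the given reusable BytesRef with the term of the requested document
   * and returns it; offset/length would point into a shared byte pool.
   */
  BytesRef getBytes(int docID, BytesRef reuse);
}
{code}
An implementation could back this with one large shared byte[] plus per-document offsets, which is where the RAM savings over raw String[] arrays would come from.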
[jira] Commented: (LUCENE-2383) Some small fixes after the flex merge...
[ https://issues.apache.org/jira/browse/LUCENE-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854681#action_12854681 ] Uwe Schindler commented on LUCENE-2383: --- FCRF looks ok, I would only change the nextDoc() loop in the deletions-aware iterator to:
{code}
do {
  doc++;
  if (doc >= maxDoc) return NO_MORE_DOCS;
} while (skipDocs.get(doc) || !matchDoc(doc));
return doc;
{code}
and the same in advance(), changed a little bit:
{code}
for (int doc = target; doc < maxDoc; doc++) {
  if (!skipDocs.get(doc) && matchDoc(doc)) return doc;
}
return NO_MORE_DOCS;
{code}
The try/catch is then unneeded. This seems clearer to me. The non-skipDocs iterator performs better with the try...catch, as it saves one bounds check. But here we need to do the bounds check in all cases anyway, so why not do it up-front? Some small fixes after the flex merge... Key: LUCENE-2383 URL: https://issues.apache.org/jira/browse/LUCENE-2383 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.1 Attachments: LUCENE-2383.patch Changes: * Re-introduced specialization optimization to FieldCacheRangeQuery; also fixed bug (was failing to check deletions in advance) * Changes 2 checkIndex methods from protected to public * Add some missing null checks when calling MultiFields.getFields or IndexReader.fields() * Tweak'd CHANGES a bit * Removed some small dead code -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Issue Comment Edited: (LUCENE-2383) Some small fixes after the flex merge...
[ https://issues.apache.org/jira/browse/LUCENE-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854681#action_12854681 ] Uwe Schindler edited comment on LUCENE-2383 at 4/7/10 8:23 PM: --- FCRF looks ok, I would only change the nextDoc() loop in the deletions-aware iterator to:
{code}
do {
  doc++;
  if (doc >= maxDoc) return NO_MORE_DOCS;
} while (skipDocs.get(doc) || !matchDoc(doc));
return doc;
{code}
and the same in advance(), changed a little bit:
{code}
for (doc = target; doc < maxDoc; doc++) {
  if (!skipDocs.get(doc) && matchDoc(doc)) return doc;
}
return NO_MORE_DOCS;
{code}
The try/catch is then unneeded. This seems clearer to me. The non-skipDocs iterator performs better with the try...catch, as it saves one bounds check. But here we need to do the bounds check in all cases anyway, so why not do it up-front? was (Author: thetaphi): FCRF looks ok, I would only change the nextDoc() loop in the deletions-aware iterator to:
{code}
do {
  doc++;
  if (doc >= maxDoc) return NO_MORE_DOCS;
} while (skipDocs.get(doc) || !matchDoc(doc));
return doc;
{code}
and the same in advance(), changed a little bit:
{code}
for (int doc = target; doc < maxDoc; doc++) {
  if (!skipDocs.get(doc) && matchDoc(doc)) return doc;
}
return NO_MORE_DOCS;
{code}
The try/catch is then unneeded. This seems clearer to me. The non-skipDocs iterator performs better with the try...catch, as it saves one bounds check. But here we need to do the bounds check in all cases anyway, so why not do it up-front? Some small fixes after the flex merge... Key: LUCENE-2383 URL: https://issues.apache.org/jira/browse/LUCENE-2383 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.1 Attachments: LUCENE-2383.patch Changes: * Re-introduced specialization optimization to FieldCacheRangeQuery; also fixed bug (was failing to check deletions in advance) * Changes 2 checkIndex methods from protected to public * Add some missing null checks when calling MultiFields.getFields or IndexReader.fields() * Tweak'd CHANGES a bit * Removed some small dead code -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Issue Comment Edited: (LUCENE-2383) Some small fixes after the flex merge...
[ https://issues.apache.org/jira/browse/LUCENE-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854681#action_12854681 ] Uwe Schindler edited comment on LUCENE-2383 at 4/7/10 8:24 PM: --- FCRF looks ok, I would only change the nextDoc() loop in the deletions-aware iterator to:
{code}
do {
  doc++;
  if (doc >= maxDoc) return doc = NO_MORE_DOCS;
} while (skipDocs.get(doc) || !matchDoc(doc));
return doc;
{code}
and the same in advance(), changed a little bit:
{code}
for (doc = target; doc < maxDoc; doc++) {
  if (!skipDocs.get(doc) && matchDoc(doc)) return doc;
}
return doc = NO_MORE_DOCS;
{code}
The try/catch is then unneeded. This seems clearer to me. The non-skipDocs iterator performs better with the try...catch, as it saves one bounds check. But here we need to do the bounds check in all cases anyway, so why not do it up-front? was (Author: thetaphi): FCRF looks ok, I would only change the nextDoc() loop in the deletions-aware iterator to:
{code}
do {
  doc++;
  if (doc >= maxDoc) return NO_MORE_DOCS;
} while (skipDocs.get(doc) || !matchDoc(doc));
return doc;
{code}
and the same in advance(), changed a little bit:
{code}
for (doc = target; doc < maxDoc; doc++) {
  if (!skipDocs.get(doc) && matchDoc(doc)) return doc;
}
return NO_MORE_DOCS;
{code}
The try/catch is then unneeded. This seems clearer to me. The non-skipDocs iterator performs better with the try...catch, as it saves one bounds check. But here we need to do the bounds check in all cases anyway, so why not do it up-front? Some small fixes after the flex merge... Key: LUCENE-2383 URL: https://issues.apache.org/jira/browse/LUCENE-2383 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.1 Attachments: LUCENE-2383.patch Changes: * Re-introduced specialization optimization to FieldCacheRangeQuery; also fixed bug (was failing to check deletions in advance) * Changes 2 checkIndex methods from protected to public * Add some missing null checks when calling MultiFields.getFields or IndexReader.fields() * Tweak'd CHANGES a bit * Removed some small dead code -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
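For readers outside the patch context, a self-contained sketch of the loop shape proposed above, expressed as a DocIdSetIterator; the field and method names (skipDocs, matchDoc) mirror the comment and are assumptions, not the actual FieldCacheRangeFilter code:
{code}
import java.io.IOException;

import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.Bits;

/** Sketch of a deletions-aware iterator using the loop shape discussed above. */
abstract class MatchingDocIdSetIterator extends DocIdSetIterator {
  private final int maxDoc;
  private final Bits skipDocs; // deleted docs; assumed non-null in this sketch
  private int doc = -1;

  MatchingDocIdSetIterator(int maxDoc, Bits skipDocs) {
    this.maxDoc = maxDoc;
    this.skipDocs = skipDocs;
  }

  /** True if the (live) document matches the filter condition. */
  protected abstract boolean matchDoc(int doc);

  @Override
  public int docID() { return doc; }

  @Override
  public int nextDoc() throws IOException {
    do {
      doc++;
      if (doc >= maxDoc) return doc = NO_MORE_DOCS;
    } while (skipDocs.get(doc) || !matchDoc(doc));
    return doc;
  }

  @Override
  public int advance(int target) throws IOException {
    for (doc = target; doc < maxDoc; doc++) {
      if (!skipDocs.get(doc) && matchDoc(doc)) return doc;
    }
    return doc = NO_MORE_DOCS;
  }
}
{code}
The up-front bounds check replaces the try/catch around the array access, at the cost of one explicit comparison per document.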
RE: Commit freeze in flex branch
Thanks for praise! And also thanks to Mike for scanning 20K patch lines :-) - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Wednesday, April 07, 2010 10:13 PM To: java-dev@lucene.apache.org Subject: Re: Commit freeze in flex branch Yes +1 to that -- thanks Uwe!! And thanks for the many other people who helped out on flex. It's a big and exciting improvement :) Mike On Wed, Apr 7, 2010 at 4:11 PM, Michael Busch busch...@gmail.com wrote: Uwe, thanks for doing all the svn work! Was a smooth transition! Michael On 4/6/10 12:27 PM, Uwe Schindler wrote: The freeze is over, we merged successfully. If you had a flex branch checked out: svn switch https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Tuesday, April 06, 2010 12:51 PM To: java-dev@lucene.apache.org Subject: Commit freeze in flex branch I am trying to reintegrate the flex branch into current trunk. After this has done, no more commits to flex! (after a reintegrate, the svn book says, that you should not touch the branch anymore) - Flex development can then proceed in trunk. It may happen that solr compilation/tests fail (because of recent changes in flex branch), I will fix this separately, so please do not complain, just let solr broken for a short time! It would be good if nobody would commit anything to flex anymore! After the merge, you can switch your flex checkouts. Before committing the merge, I will post a mega patch for review, that we have not missed anything during trunk-flex merges. Commits to trunk are OK, but should be spare. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de --- -- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Commit freeze in flex branch
I am trying to reintegrate the flex branch into current trunk. After this has done, no more commits to flex! (after a reintegrate, the svn book says, that you should not touch the branch anymore) - Flex development can then proceed in trunk. It may happen that solr compilation/tests fail (because of recent changes in flex branch), I will fix this separately, so please do not complain, just let solr broken for a short time! It would be good if nobody would commit anything to flex anymore! After the merge, you can switch your flex checkouts. Before committing the merge, I will post a mega patch for review, that we have not missed anything during trunk-flex merges. Commits to trunk are OK, but should be spare. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2370) Reintegrate flex branch into trunk
Reintegrate flex branch into trunk -- Key: LUCENE-2370 URL: https://issues.apache.org/jira/browse/LUCENE-2370 Project: Lucene - Java Issue Type: Task Affects Versions: Flex Branch Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 This issue is for reintegrating the flex branch into current trunk. I will post the patch here for review and commit, when all contributors to flex have reviewed the patch. Before committing, I will tag both trunk and flex. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2370) Reintegrate flex branch into trunk
[ https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2370: -- Attachment: LUCENE-2370.patch Here is the patch, just for review! You cannot really apply it, as it does not contain changes that were simply svn-copied from flex (these are all new files added in flex). The idea behind this patch is only that everybody working on flex should scroll through it and verify that the actually changed files are fine; e.g. that we did not miss a change to trunk in flex (such a missing merge would show up as a revert in the patch). My working copy tests fine; only Solr is not compiling anymore because of recent non-backwards-compatible changes in the internal NumericUtils class. I will commit this patch first and break Solr, but will fix it soon! Reintegrate flex branch into trunk -- Key: LUCENE-2370 URL: https://issues.apache.org/jira/browse/LUCENE-2370 Project: Lucene - Java Issue Type: Task Affects Versions: Flex Branch Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: LUCENE-2370.patch This issue is for reintegrating the flex branch into current trunk. I will post the patch here for review and commit, when all contributors to flex have reviewed the patch. Before committing, I will tag both trunk and flex. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2370) Reintegrate flex branch into trunk
[ https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2370: -- Attachment: (was: LUCENE-2370.patch) Reintegrate flex branch into trunk -- Key: LUCENE-2370 URL: https://issues.apache.org/jira/browse/LUCENE-2370 Project: Lucene - Java Issue Type: Task Affects Versions: Flex Branch Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 This issue is for reintegrating the flex branch into current trunk. I will post the patch here for review and commit, when all contributors to flex have reviewed the patch. Before committing, I will tag both trunk and flex. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2370) Reintegrate flex branch into trunk
[ https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2370: -- Attachment: LUCENE-2370.patch sorry, new patch. The flex branch still contains some whitespace problems in contrib, but this is ok for now. I will check them and fix as far as i see. Reintegrate flex branch into trunk -- Key: LUCENE-2370 URL: https://issues.apache.org/jira/browse/LUCENE-2370 Project: Lucene - Java Issue Type: Task Affects Versions: Flex Branch Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: LUCENE-2370.patch This issue is for reintegrating the flex branch into current trunk. I will post the patch here for review and commit, when all contributors to flex have reviewed the patch. Before committing, I will tag both trunk and flex. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2370) Reintegrate flex branch into trunk
[ https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2370: -- Attachment: LUCENE-2370.patch Here a new patch with lots of cleanups, thanks rmuir. Also reverted whitespace-only files Reintegrate flex branch into trunk -- Key: LUCENE-2370 URL: https://issues.apache.org/jira/browse/LUCENE-2370 Project: Lucene - Java Issue Type: Task Affects Versions: Flex Branch Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: LUCENE-2370.patch, LUCENE-2370.patch This issue is for reintegrating the flex branch into current trunk. I will post the patch here for review and commit, when all contributors to flex have reviewed the patch. Before committing, I will tag both trunk and flex. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2370) Reintegrate flex branch into trunk
[ https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2370: -- Attachment: LUCENE-2370-solrfixes.patch Here some fixes for Solr: - makes it compile after flex merge - has some really dirty hacks. Numeric field contents should no longer be seen as Strings, they are now BytesRefs. This affects AnalysisRequestHandler and also the converter methods in TrieField type. They should use BytesRefs after flex has landed. Reintegrate flex branch into trunk -- Key: LUCENE-2370 URL: https://issues.apache.org/jira/browse/LUCENE-2370 Project: Lucene - Java Issue Type: Task Affects Versions: Flex Branch Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: LUCENE-2370-solrfixes.patch, LUCENE-2370.patch, LUCENE-2370.patch This issue is for reintegrating the flex branch into current trunk. I will post the patch here for review and commit, when all contributors to flex have reviewed the patch. Before committing, I will tag both trunk and flex. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
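A sketch of the kind of conversion the Solr code needs once numeric terms arrive as BytesRef instead of String; the NumericUtils method names follow the BytesRef-based API from LUCENE-2354 and should be checked against the actual patch:
{code}
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.NumericUtils;

public class NumericTermDecoderSketch {
  /**
   * Turns an indexed numeric term back into its int value. Only full-precision
   * terms (shift == 0) represent an original document value; the others are
   * lower-precision helper terms used by NumericRangeQuery.
   */
  public static Integer decodeIntTerm(BytesRef term) {
    if (NumericUtils.getPrefixCodedIntShift(term) != 0) {
      return null; // helper term, skip it
    }
    return NumericUtils.prefixCodedToInt(term);
  }
}
{code}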
[jira] Updated: (LUCENE-2370) Reintegrate flex branch into trunk
[ https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2370: -- Attachment: LUCENE-2370.patch New patch, reverted all files with whitespace-only changes. Reintegrate flex branch into trunk -- Key: LUCENE-2370 URL: https://issues.apache.org/jira/browse/LUCENE-2370 Project: Lucene - Java Issue Type: Task Affects Versions: Flex Branch Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: LUCENE-2370-solrfixes.patch, LUCENE-2370.patch, LUCENE-2370.patch, LUCENE-2370.patch This issue is for reintegrating the flex branch into current trunk. I will post the patch here for review and commit, when all contributors to flex have reviewed the patch. Before committing, I will tag both trunk and flex. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2370) Reintegrate flex branch into trunk
[ https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2370: -- Attachment: LUCENE-2370.patch Here the final patch after cooperative reviewing in IRC. I will commit the merge now for Solr+Lucene. The following points are still broken: - DirectoryReader readded a bug (Mike McCandless knows) - TestIndexWriterReader in trunk and backwards has some test commented out, they have to do with above problem Reintegrate flex branch into trunk -- Key: LUCENE-2370 URL: https://issues.apache.org/jira/browse/LUCENE-2370 Project: Lucene - Java Issue Type: Task Affects Versions: Flex Branch Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: LUCENE-2370-solrfixes.patch, LUCENE-2370.patch, LUCENE-2370.patch, LUCENE-2370.patch, LUCENE-2370.patch This issue is for reintegrating the flex branch into current trunk. I will post the patch here for review and commit, when all contributors to flex have reviewed the patch. Before committing, I will tag both trunk and flex. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2370) Reintegrate flex branch into trunk
[ https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854134#action_12854134 ] Uwe Schindler commented on LUCENE-2370: --- Committed revision: 931278 I leave the issue open until the bugs are fixed. Reintegrate flex branch into trunk -- Key: LUCENE-2370 URL: https://issues.apache.org/jira/browse/LUCENE-2370 Project: Lucene - Java Issue Type: Task Affects Versions: Flex Branch Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: LUCENE-2370-solrfixes.patch, LUCENE-2370.patch, LUCENE-2370.patch, LUCENE-2370.patch, LUCENE-2370.patch This issue is for reintegrating the flex branch into current trunk. I will post the patch here for review and commit, when all contributors to flex have reviewed the patch. Before committing, I will tag both trunk and flex. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: Commit freeze in flex branch
The freeze is over, we merged successfully. If you had a flex branch checked out: svn switch https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Tuesday, April 06, 2010 12:51 PM To: java-dev@lucene.apache.org Subject: Commit freeze in flex branch I am trying to reintegrate the flex branch into current trunk. After this has done, no more commits to flex! (after a reintegrate, the svn book says, that you should not touch the branch anymore) - Flex development can then proceed in trunk. It may happen that solr compilation/tests fail (because of recent changes in flex branch), I will fix this separately, so please do not complain, just let solr broken for a short time! It would be good if nobody would commit anything to flex anymore! After the merge, you can switch your flex checkouts. Before committing the merge, I will post a mega patch for review, that we have not missed anything during trunk-flex merges. Commits to trunk are OK, but should be spare. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Resolved: (LUCENE-2370) Reintegrate flex branch into trunk
[ https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-2370. --- Resolution: Fixed Mike fixed the missing merges! Thanks. Reintegrate flex branch into trunk -- Key: LUCENE-2370 URL: https://issues.apache.org/jira/browse/LUCENE-2370 Project: Lucene - Java Issue Type: Task Affects Versions: Flex Branch Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: LUCENE-2370-solrfixes.patch, LUCENE-2370.patch, LUCENE-2370.patch, LUCENE-2370.patch, LUCENE-2370.patch This issue is for reintegrating the flex branch into current trunk. I will post the patch here for review and commit, when all contributors to flex have reviewed the patch. Before committing, I will tag both trunk and flex. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Closed: (LUCENE-2332) Merge CharTermAttribute and deprecations to trunk
[ https://issues.apache.org/jira/browse/LUCENE-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler closed LUCENE-2332. - Resolution: Invalid Flex was merged, so this is no longer needed. Merge CharTermAttribute and deprecations to trunk Key: LUCENE-2332 URL: https://issues.apache.org/jira/browse/LUCENE-2332 Project: Lucene - Java Issue Type: New Feature Affects Versions: 3.1 Reporter: Uwe Schindler Assignee: Uwe Schindler This should be merged to trunk before flex lands, so the analyzers can be ported to the new API. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute
Replace deprecated TermAttribute by new CharTermAttribute - Key: LUCENE-2372 URL: https://issues.apache.org/jira/browse/LUCENE-2372 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.1 Reporter: Uwe Schindler Fix For: 3.1 After LUCENE-2302 is merged to trunk with flex, we need to carry over all tokenizers and consumers of the TokenStreams to the new CharTermAttribute. We should also think about adding an AttributeFactory that creates a subclass of CharTermAttributeImpl that returns collation keys in the toBytesRef() accessor. CollationKeyFilter is then obsolete; instead you can simply convert every TokenStream to index only CollationKeys by changing the attribute implementation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
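A hedged sketch of the factory idea from the description, against the 3.1 attribute API; the collation-producing override is only indicated, and toBytesRef() is the name proposed here (the final API may differ):
{code}
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl;
import org.apache.lucene.util.Attribute;
import org.apache.lucene.util.AttributeImpl;
import org.apache.lucene.util.AttributeSource;

public class CollationAttributeFactorySketch extends AttributeSource.AttributeFactory {
  private static final AttributeSource.AttributeFactory DELEGATE =
      AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY;

  @Override
  public AttributeImpl createAttributeInstance(Class<? extends Attribute> attClass) {
    if (CharTermAttribute.class.isAssignableFrom(attClass)) {
      // Hypothetical subclass: it would convert the term characters into a
      // collation key in its byte-producing accessor (toBytesRef() as proposed).
      return new CharTermAttributeImpl() {
        // override the byte conversion here
      };
    }
    return DELEGATE.createAttributeInstance(attClass);
  }
}
{code}
Passing such a factory to a Tokenizer would then transparently turn every indexed term into a collation key, without a dedicated filter.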
[jira] Commented: (LUCENE-2302) Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable)
[ https://issues.apache.org/jira/browse/LUCENE-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854199#action_12854199 ] Uwe Schindler commented on LUCENE-2302: --- I will create a patch with option #2 and lots of documentation and changed backwards tests. Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable) Key: LUCENE-2302 URL: https://issues.apache.org/jira/browse/LUCENE-2302 Project: Lucene - Java Issue Type: Improvement Components: Analysis Affects Versions: Flex Branch Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch For flexible indexing, terms can be simple byte[] arrays, while the current TermAttribute only supports char[]. This is fine for plain text, but e.g. NumericTokenStream should directly work on the byte[] array. Also TermAttribute lacks some interfaces that would make it simpler for users to work with terms: Appendable and CharSequence. I propose to create a new interface CharTermAttribute with a clean new API that concentrates on CharSequence and Appendable. The implementation class will simply support the old and new interface working on the same term buffer. DEFAULT_ATTRIBUTE_FACTORY will take care of this. So if somebody adds a TermAttribute, he will get an implementation class that can also be used as a CharTermAttribute. As both attributes create the same impl instance, both calls to addAttribute are equal. So a TokenFilter that adds CharTermAttribute to the source will work with the same instance as the Tokenizer that requested the (deprecated) TermAttribute. To also support byte[]-only terms, as Collation or NumericField need, a separate getter-only interface will be added that returns a reusable BytesRef, e.g. BytesRefGetterAttribute. The default implementation class will also support this interface. For backwards compatibility with old self-made TermAttribute implementations, the indexer will check with hasAttribute() if the BytesRef getter interface is there and, if not, will wrap an old-style TermAttribute (a deprecated wrapper class will be provided): new BytesRefGetterAttributeWrapper(TermAttribute), which is then used by the indexer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
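A minimal sketch of what a TokenFilter looks like against the CharSequence/Appendable-style API proposed here (length(), charAt(), append() as in the attached patches):
{code}
import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/** Appends a '$' marker to each term, using only CharSequence/Appendable-style access. */
public final class DollarMarkerFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

  public DollarMarkerFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    // CharSequence-style read access:
    if (termAtt.length() > 0 && termAtt.charAt(termAtt.length() - 1) == '$') {
      return true; // already marked, leave unchanged
    }
    termAtt.append('$'); // Appendable-style write access
    return true;
  }
}
{code}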
[jira] Created: (LUCENE-2374) Add introspection API to AttributeSource/AttributeImpl
Add introspection API to AttributeSource/AttributeImpl -- Key: LUCENE-2374 URL: https://issues.apache.org/jira/browse/LUCENE-2374 Project: Lucene - Java Issue Type: Improvement Components: Analysis, Other Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 AttributeSource/TokenStream inspection in Solr needs to have some insight into the contents of AttributeImpls. As LUCENE-2302 has some problems with toString() [which is not structured and conflicts with CharSequence's definition for CharTermAttribute], I propose a simple API that gets a default implementation in AttributeImpl (just like toString() currently): - Iterator<Map.Entry<String,?>> AttributeImpl.contentsIterator() returns an iterator (for most attributes it's a singleton) of key-value pairs, e.g. term->foobar, startOffset->Integer.valueOf(0), ... - AttributeSource gets the same method; it just concatenates the iterators of each AttributeImpl from getAttributeImplsIterator(). No backwards problems occur, as the default toString() method will work like before (it just gets the iterator and lists the entries), but we simply remove the documentation for the format. (Char)TermAttribute gets a special impl of toString() according to CharSequence and a corresponding iterator. I also want to remove the abstract hashCode() and equals() methods from AttributeImpl, as they are not needed and just create work for the implementor. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
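Since this is only a proposal, the following is a hypothetical illustration of the key/value idea, not the final API; the generic signature is simplified to keep it compilable:
{code}
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical sketch of the proposed contentsIterator() idea. */
public class IntrospectionSketch {
  /** A single key/value pair describing an attribute's content, e.g. "term" -> "foobar". */
  public static Iterator<Map.Entry<String, Object>> contentsIterator(String key, Object value) {
    Map<String, Object> contents = Collections.singletonMap(key, value);
    return contents.entrySet().iterator();
  }

  public static void main(String[] args) {
    Iterator<Map.Entry<String, Object>> it = contentsIterator("term", "foobar");
    while (it.hasNext()) {
      Map.Entry<String, Object> entry = it.next();
      System.out.println(entry.getKey() + " -> " + entry.getValue());
    }
  }
}
{code}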
[jira] Created: (LUCENE-2375) Add introspection API to AttributeSource/AttributeImpl
Add introspection API to AttributeSource/AttributeImpl -- Key: LUCENE-2375 URL: https://issues.apache.org/jira/browse/LUCENE-2375 Project: Lucene - Java Issue Type: Improvement Components: Analysis, Other Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 AttributeSource/TokenStream inspection in Solr needs to have some insight into the contents of AttributeImpls. As LUCENE-2302 has some problems with toString() [which is not structured and conflicts with CharSequence's definition for CharTermAttribute], I propose an simple API that get a default implementation in AttributeImpl (just like toString() current): - IteratorMap.EntryString,? AttributeImpl.contentsIterator() returns an iterator (for most attributes its a singleton) of a key-value pair, e.g. term-foobar,startOffset-Integer.valueOf(0),... - AttributeSource gets the same method, it just concat the iterators of each getAttributeImplsIterator() AttributeImpl No backwards problems occur, as the default toString() method will work like before (it just gets iterator and lists), but we simply remove the documentation for the format. (Char)TermAttribute gets a special impl fo toString() according to CharSequence and a corresponding iterator. I also want to remove the abstract hashCode() and equals() methods from AttributeImpl, as they are not needed and just create work for the implementor. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Deleted: (LUCENE-2375) Add introspection API to AttributeSource/AttributeImpl
[ https://issues.apache.org/jira/browse/LUCENE-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler deleted LUCENE-2375: -- Add introspection API to AttributeSource/AttributeImpl -- Key: LUCENE-2375 URL: https://issues.apache.org/jira/browse/LUCENE-2375 Project: Lucene - Java Issue Type: Improvement Reporter: Uwe Schindler Assignee: Uwe Schindler AttributeSource/TokenStream inspection in Solr needs to have some insight into the contents of AttributeImpls. As LUCENE-2302 has some problems with toString() [which is not structured and conflicts with CharSequence's definition for CharTermAttribute], I propose an simple API that get a default implementation in AttributeImpl (just like toString() current): - IteratorMap.EntryString,? AttributeImpl.contentsIterator() returns an iterator (for most attributes its a singleton) of a key-value pair, e.g. term-foobar,startOffset-Integer.valueOf(0),... - AttributeSource gets the same method, it just concat the iterators of each getAttributeImplsIterator() AttributeImpl No backwards problems occur, as the default toString() method will work like before (it just gets iterator and lists), but we simply remove the documentation for the format. (Char)TermAttribute gets a special impl fo toString() according to CharSequence and a corresponding iterator. I also want to remove the abstract hashCode() and equals() methods from AttributeImpl, as they are not needed and just create work for the implementor. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Resolved: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]
[ https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-2354. --- Resolution: Fixed Lucene Fields: [New, Patch Available] (was: [New]) Committed revision: 930821 Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[] - Key: LUCENE-2354 URL: https://issues.apache.org/jira/browse/LUCENE-2354 Project: Lucene - Java Issue Type: Improvement Affects Versions: Flex Branch Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: Flex Branch Attachments: LUCENE-2354.patch, LUCENE-2354.patch, LUCENE-2354.patch After LUCENE-2302, we should use TermToBytesRefAttribute to index using NumericTokenStream. This also should convert the whole NumericUtils to use BytesRef when converting numerics. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2364) Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery & Co.
[ https://issues.apache.org/jira/browse/LUCENE-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12853336#action_12853336 ] Uwe Schindler commented on LUCENE-2364: --- This would also make MTQ's rewrite-mode internal collectors better: currently they convert BytesRef terms from the enums to String Terms, pass them to TermQuery, and inside TermScorer convert back. With real binary terms (numerics are not yet real binary, they are UTF-8-conforming ASCII bytes), this would break. Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery & Co. - Key: LUCENE-2364 URL: https://issues.apache.org/jira/browse/LUCENE-2364 Project: Lucene - Java Issue Type: Improvement Affects Versions: Flex Branch Reporter: Uwe Schindler Fix For: Flex Branch It would be good to directly allow BytesRefs in TermQuery and TermRangeQuery (as both queries convert the strings to BytesRef internally). For NumericRange support in Solr we will need to support numerics as BytesRef in single-term queries. When this is added, don't forget to change TestNumericRangeQueryXX to use the BytesRef ctor of TRQ. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
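To make the round trip concrete, a small sketch: today a BytesRef term has to be turned into a String-based Term first (lossy for truly binary terms); the BytesRef-carrying Term constructor hinted at in this issue is hypothetical at this point:
{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.BytesRef;

public class BytesRefTermQuerySketch {
  /** Today's workaround: the term bytes must be converted to a String first. */
  public static TermQuery newTermQuery(String field, BytesRef termBytes) {
    // Proposed (hypothetical here): new Term(field, termBytes) without the round trip.
    return new TermQuery(new Term(field, termBytes.utf8ToString()));
  }
}
{code}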
[jira] Updated: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]
[ https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2354: -- Attachment: LUCENE-2354.patch Here updated patch with cleaned up NumericUtils (no String methods anymore). For now I just commented them out, if we want to reactivate parts of them. Before release the methods should be removed. I changed all tests (and deactivated tests in backwards) using those String methods. Also rewrote the CartesianShapeFilter in contrib/spatial to use flex API (optimized for the one-term-case without OpenBitSet allocation). Also changed spatial tests to use NumericField. Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[] - Key: LUCENE-2354 URL: https://issues.apache.org/jira/browse/LUCENE-2354 Project: Lucene - Java Issue Type: Improvement Affects Versions: Flex Branch Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: Flex Branch Attachments: LUCENE-2354.patch, LUCENE-2354.patch After LUCENE-2302, we should use TermToBytesRefAttribute to index using NumericTokenStream. This also should convert the whole NumericUtils to use BytesRef when converting numerics. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2364) Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery & Co.
Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery Co. - Key: LUCENE-2364 URL: https://issues.apache.org/jira/browse/LUCENE-2364 Project: Lucene - Java Issue Type: Improvement Affects Versions: Flex Branch Reporter: Uwe Schindler Fix For: Flex Branch It would be good to directly allow BytesRefs in TermQuery and TermRangeQuery (as both queries convert the strings to BytesRef internally). For NumericRange support in Solr it will be needed to support numerics as ByteRef in single-term queries. When this will be added, don't forget to change TestNumericRangeQueryXX to use the BytesRef ctor of TRQ. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]
[ https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2354: -- Attachment: LUCENE-2354.patch Updated patch with lots of javadocs cleanups and new getPrefixCodedXxxShift() methods. Also optimized some methods. I will commit this tomorrow! Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[] - Key: LUCENE-2354 URL: https://issues.apache.org/jira/browse/LUCENE-2354 Project: Lucene - Java Issue Type: Improvement Affects Versions: Flex Branch Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: Flex Branch Attachments: LUCENE-2354.patch, LUCENE-2354.patch, LUCENE-2354.patch After LUCENE-2302, we should use TermToBytesRefAttribute to index using NumericTokenStream. This also should convert the whole NumericUtils to use BytesRef when converting numerics. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
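A sketch using the BytesRef-based NumericUtils described above; the constant and method names (BUF_SIZE_INT, intToPrefixCoded, getPrefixCodedIntShift, prefixCodedToInt) are taken from this patch series and should be treated as assumptions:
{code}
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.NumericUtils;

public class PrefixCodedSketch {
  public static void main(String[] args) {
    BytesRef bytes = new BytesRef(NumericUtils.BUF_SIZE_INT);
    // Encode the value at full precision (shift = 0) into the reusable BytesRef.
    NumericUtils.intToPrefixCoded(42, 0, bytes);
    // The shift is recoverable from the encoded term itself:
    int shift = NumericUtils.getPrefixCodedIntShift(bytes);
    int value = NumericUtils.prefixCodedToInt(bytes);
    System.out.println("shift=" + shift + ", value=" + value);
  }
}
{code}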
[jira] Resolved: (LUCENE-2363) Classes BooleanFilter and FilterClause missing in 2.2
[ https://issues.apache.org/jira/browse/LUCENE-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-2363. --- Resolution: Invalid These classes are in the queries contrib, not in lucene-core. So you have to add lucene-queries.jar to your classpath (it's in the contrib subfolder). Also, bugs in version 2.2 will no longer be fixed; the current versions are 2.9.2 and 3.0.1. Classes BooleanFilter and FilterClause missing in 2.2 - Key: LUCENE-2363 URL: https://issues.apache.org/jira/browse/LUCENE-2363 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.2 Environment: Windows Reporter: Amit Wamburkar I downloaded lucene-core-2.2.0.jar and started using it. But when I tried to create objects of the classes BooleanFilter and FilterClause, I could not find them in the jar. In fact I want to use them so that I can get rid of BooleanQuery, which is causing the exception BooleanQuery$TooManyClauses. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
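For reference, a sketch of the usage the reporter was after, with the contrib/queries classes on the classpath (lucene-queries.jar); filters have no clause limit, unlike BooleanQuery:
{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanFilter;
import org.apache.lucene.search.FilterClause;
import org.apache.lucene.search.TermsFilter;

public class BooleanFilterSketch {
  public static BooleanFilter buildFilter() {
    TermsFilter colors = new TermsFilter();
    colors.addTerm(new Term("color", "red"));
    colors.addTerm(new Term("color", "blue")); // no TooManyClauses limit here

    TermsFilter status = new TermsFilter();
    status.addTerm(new Term("status", "deleted"));

    BooleanFilter filter = new BooleanFilter();
    filter.add(new FilterClause(colors, Occur.MUST));
    filter.add(new FilterClause(status, Occur.MUST_NOT));
    return filter;
  }
}
{code}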
[jira] Closed: (LUCENE-2363) Classes BooleanFilter and FilterClause missing in 2.2
[ https://issues.apache.org/jira/browse/LUCENE-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler closed LUCENE-2363. - Classes BooleanFilter and FilterClause missing in 2.2 - Key: LUCENE-2363 URL: https://issues.apache.org/jira/browse/LUCENE-2363 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.2 Environment: Windows Reporter: Amit Wamburkar I downloaded lucene-core-2.2.0.jar and started using it. But when i tried to created objects of the classes: BooleanFilter and FilterClause , could not find them in the jar. In fact i want to use them so that i can get rid of BooleanQuery which is causing exception BooleanQuery$TooManyClauses. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: Landing the flex branch
Hi, we should think about how to merge the changes to trunk. I can try this out during the weekend, to merge back the changes to trunk, but this can be very hard. So we have the following options: Try a merge back: This would let flex appear as a single commit to trunk, so the history of trunk would be preserved. If somebody wants to see the changes in the flex branch, he could ask for them (e.g. in TortoiseSVN there is a checkbox Include merged revisions). If this is not easy or fails, we can do the following: - Create a big diff between current trunk and flex (after flex is merged up to trunk). Attach this patch to an issue and let everybody review. After that we can apply the patch to trunk. This would result in the same behavior for trunk, no changes lost, but all changes in flex cannot be reviewed. - Delete current trunk and svn move the branch to trunk (after flex is merged up to trunk): This would make the history of flex the current history. The drawback: You losse latest trunk changes since the split of flex. Instead you will only see the merge messages. Therefore we should see this only as a last chance. Comments? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Tuesday, March 30, 2010 5:35 PM To: java-dev@lucene.apache.org Subject: Landing the flex branch I think the time has finally come! Pending one issue (LUCENE-2354 -- Uwe), I think flex is ready to land I think the other issues with Fix Version = Flex Branch can be moved to 3.1 after we land. We still use the pre-flex APIs in a number of places... I think this is actually good (so we continue to test the back-compat emulation layer). With time we can cut them over. After flex, there are a number of fun things to explore. EG, we need to make attributes work well with codecs indexing/searching (with Multi/DirReader, serailize/unserialize, etc.); we need a BytesRef + packed ints FieldCache StringIndex variant which should use much less RAM in certain cases; we should build a fast core PForDelta codec; more queries can cutover to operating directly on byte[] terms, etc. But these can all come with time... Thoughts/issues/objections? Mike - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851849#action_12851849 ] Uwe Schindler commented on LUCENE-2310: --- I am also +1 on the indexer interface. I just repeat myself: We still need TokenStream, an AttributeSource alone is not enough. But that is separate from this issue: Indexable provides an iterator of fields that consist of a name, a TokenStream and some options (possibly like omitNorms). If you just don't want to have close() in TokenStream, let's remove it. end() is needed for offsets; the indexer needs to take care of it. incrementToken() is the iterator approach. What else is there? Reset may be invisible to the indexer (I would refactor that and make a subclass of TokenStream that supports reset, ResetableTokenStream - like Tokenizer supports reset(Reader), which is also a subclass). The abstract TokenStream then consists only of incrementToken() and end() + the AttributeSource access methods. Attributes needed by the indexer are only TermToBytesRefAttribute, PositionIncrementAtt, OffsetAttribute and PayloadAttribute. Reduce Fieldable, AbstractField and Field complexity Key: LUCENE-2310 URL: https://issues.apache.org/jira/browse/LUCENE-2310 Project: Lucene - Java Issue Type: Sub-task Components: Index Reporter: Chris Male Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, LUCENE-2310-Deprecate-AbstractField.patch, LUCENE-2310-Deprecate-AbstractField.patch, LUCENE-2310-Deprecate-AbstractField.patch, LUCENE-2310-Deprecate-DocumentGetFields-core.patch, LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310-Deprecate-DocumentGetFields.patch In order to move field type like functionality into its own class, we really need to try to tackle the hierarchy of Fieldable, AbstractField and Field. Currently AbstractField depends on Field, and does not provide much more functionality than storing fields, most of which is being moved over to FieldType. Therefore it seems ideal to try to deprecate AbstractField (and possibly Fieldable), moving much of the functionality into Field and FieldType. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
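A sketch of the consumer loop implied above, reading only the four attributes named in the comment; the exact TermToBytesRefAttribute accessor from LUCENE-2302 is left as a comment since its final shape was still under discussion:
{code}
import java.io.IOException;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute;

/** Sketch of an indexer-style consumer that needs only the four attributes named above. */
public class MinimalConsumerSketch {
  public static void consume(TokenStream stream) throws IOException {
    TermToBytesRefAttribute termAtt = stream.addAttribute(TermToBytesRefAttribute.class);
    PositionIncrementAttribute posAtt = stream.addAttribute(PositionIncrementAttribute.class);
    OffsetAttribute offsetAtt = stream.addAttribute(OffsetAttribute.class);
    PayloadAttribute payloadAtt = stream.addAttribute(PayloadAttribute.class);

    stream.reset();
    int position = -1;
    while (stream.incrementToken()) {
      position += posAtt.getPositionIncrement();
      // The term bytes would be fetched from termAtt here (accessor per LUCENE-2302).
      System.out.println("pos=" + position
          + " offsets=" + offsetAtt.startOffset() + "-" + offsetAtt.endOffset()
          + " hasPayload=" + (payloadAtt.getPayload() != null));
    }
    stream.end(); // makes the final offset available via offsetAtt
  }
}
{code}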
[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851856#action_12851856 ] Uwe Schindler commented on LUCENE-2310: --- Yeah! Reduce Fieldable, AbstractField and Field complexity Key: LUCENE-2310 URL: https://issues.apache.org/jira/browse/LUCENE-2310 Project: Lucene - Java Issue Type: Sub-task Components: Index Reporter: Chris Male Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, LUCENE-2310-Deprecate-AbstractField.patch, LUCENE-2310-Deprecate-AbstractField.patch, LUCENE-2310-Deprecate-AbstractField.patch, LUCENE-2310-Deprecate-DocumentGetFields-core.patch, LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310-Deprecate-DocumentGetFields.patch In order to move field type like functionality into its own class, we really need to try to tackle the hierarchy of Fieldable, AbstractField and Field. Currently AbstractField depends on Field, and does not provide much more functionality that storing fields, most of which are being moved over to FieldType. Therefore it seems ideal to try to deprecate AbstractField (and possible Fieldable), moving much of the functionality into Field and FieldType. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]
[ https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851598#action_12851598 ] Uwe Schindler commented on LUCENE-2354: --- Will work here the next days and rewrite the tests. One problem: Solr at the moment uses the deprecated string api for building a TermQuery. This should be replaced by a NRQ with upper==lower(inclusive), as this disables scoring, which is wrong for Numeric fields. Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[] - Key: LUCENE-2354 URL: https://issues.apache.org/jira/browse/LUCENE-2354 Project: Lucene - Java Issue Type: Improvement Affects Versions: Flex Branch Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: Flex Branch Attachments: LUCENE-2354.patch After LUCENE-2302, we should use TermToBytesRefAttribute to index using NumericTokenStream. This also should convert the whole NumericUtils to use BytesRef when converting numerics. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
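The Solr change described here boils down to something like the following (NumericRangeQuery API as in Lucene 2.9+): an exact numeric match becomes a range with identical, inclusive bounds, which rewrites to a constant-score query:
{code}
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;

public class ExactNumericMatch {
  /** Exact match on an int field, expressed as lower == upper, both inclusive. */
  public static Query newExactIntQuery(String field, int value, int precisionStep) {
    return NumericRangeQuery.newIntRange(field, precisionStep, value, value, true, true);
  }
}
{code}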
[jira] Commented: (LUCENE-2302) Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable)
[ https://issues.apache.org/jira/browse/LUCENE-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851596#action_12851596 ] Uwe Schindler commented on LUCENE-2302: --- Will add the javadocs and think about the CharSequence problems again. It's tricky :( I have less time at the moment, will do hopefully until the weekend. The same for LUCENE-2354, which needs some test rewriting. Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable) Key: LUCENE-2302 URL: https://issues.apache.org/jira/browse/LUCENE-2302 Project: Lucene - Java Issue Type: Improvement Components: Analysis Affects Versions: Flex Branch Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: Flex Branch Attachments: LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch For flexible indexing terms can be simple byte[] arrays, while the current TermAttribute only supports char[]. This is fine for plain text, but e.g NumericTokenStream should directly work on the byte[] array. Also TermAttribute lacks of some interfaces that would make it simplier for users to work with them: Appendable and CharSequence I propose to create a new interface CharTermAttribute with a clean new API that concentrates on CharSequence and Appendable. The implementation class will simply support the old and new interface working on the same term buffer. DEFAULT_ATTRIBUTE_FACTORY will take care of this. So if somebody adds a TermAttribute, he will get an implementation class that can be also used as CharTermAttribute. As both attributes create the same impl instance both calls to addAttribute are equal. So a TokenFilter that adds CharTermAttribute to the source will work with the same instance as the Tokenizer that requested the (deprecated) TermAttribute. To also support byte[] only terms like Collation or NumericField needs, a separate getter-only interface will be added, that returns a reusable BytesRef, e.g. BytesRefGetterAttribute. The default implementation class will also support this interface. For backwards compatibility with old self-made-TermAttribute implementations, the indexer will check with hasAttribute(), if the BytesRef getter interface is there and if not will wrap a old-style TermAttribute (a deprecated wrapper class will be provided): new BytesRefGetterAttributeWrapper(TermAttribute), that is used by the indexer then. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]
[ https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851010#action_12851010 ] Uwe Schindler commented on LUCENE-2354: --- bq. But the encoding is unchanged right? (Ie only using 7 bits per byte, same as trunk). Yes. And i think we should keep it for now using 7 bit. Problems start when the sort order of terms is needed (which is the case for NRQ). As default in flex is the UTF-8 term comparator, it would not sort correctly for numeric fields with full 8 bits? bq. And you cutover to BytesRef TermsEnum API too - great. Presumably search perf would improve but only a tiny bit since NRQ visits so few terms? I dont think you will notice a difference. A standard int range contains maybe 10 to 20 sub-ranges (at maximum), so converting between string and TermRef should not count. But the new implementation is more clean. In principle we could remove the whole char[]/String based API in NumericUtils - I only have to rewrite the tests and remove the NumericUtils test in backwards (as no longer applies then, too). Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[] - Key: LUCENE-2354 URL: https://issues.apache.org/jira/browse/LUCENE-2354 Project: Lucene - Java Issue Type: Improvement Affects Versions: Flex Branch Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: Flex Branch Attachments: LUCENE-2354.patch After LUCENE-2302, we should use TermToBytesRefAttribute to index using NumericTokenStream. This also should convert the whole NumericUtils to use BytesRef when converting numerics. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
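A toy illustration of the 7-bits-per-byte point (not the real NumericUtils code): keeping every encoded byte below 0x80 keeps the terms valid single-byte UTF-8, and for equal-length, non-negative values the byte-wise order agrees with the numeric order:
{code}
/**
 * Packs a non-negative int into 5 bytes carrying 7 payload bits each.
 * Every output byte is < 0x80, i.e. valid single-byte UTF-8.
 */
public class SevenBitPackingSketch {
  public static byte[] encode(int value) { // assumes value >= 0
    byte[] out = new byte[5];              // 5 * 7 bits >= 31 payload bits
    for (int i = 4; i >= 0; i--) {
      out[i] = (byte) (value & 0x7f);      // low 7 bits, high bit always clear
      value >>>= 7;
    }
    return out;
  }

  private static int compare(byte[] x, byte[] y) {
    for (int i = 0; i < x.length; i++) {
      int diff = (x[i] & 0xff) - (y[i] & 0xff);
      if (diff != 0) return diff;
    }
    return 0;
  }

  public static void main(String[] args) {
    // Fixed length + 7-bit bytes => unsigned byte comparison equals numeric comparison.
    System.out.println(compare(encode(1234), encode(99999)) < 0); // prints true
  }
}
{code}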
[jira] Issue Comment Edited: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]
[ https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851010#action_12851010 ] Uwe Schindler edited comment on LUCENE-2354 at 3/29/10 5:23 PM: bq. But the encoding is unchanged right? (Ie only using 7 bits per byte, same as trunk). Yes. And I think we should keep it at 7 bits for now. Problems start when the sort order of terms matters (which is the case for NRQ): since the default term comparator in flex is UTF-8, it would not sort correctly for numeric fields using the full 8 bits. By the way, the recently added backwards test checks that an old index with NumericField behaves as before! This is why I added a new zip file to TestBackwardCompatibility. bq. And you cutover to BytesRef TermsEnum API too - great. Presumably search perf would improve but only a tiny bit since NRQ visits so few terms? I don't think you will notice a difference. A standard int range contains maybe 10 to 20 sub-ranges (at maximum), so converting between String and TermRef should not matter. But the new implementation is cleaner. In principle we could remove the whole char[]/String-based API in NumericUtils - I only have to rewrite the tests and remove the NumericUtils test in backwards (as it no longer applies then). was (Author: thetaphi): bq. But the encoding is unchanged right? (Ie only using 7 bits per byte, same as trunk). Yes. And I think we should keep it at 7 bits for now. Problems start when the sort order of terms matters (which is the case for NRQ): since the default term comparator in flex is UTF-8, it would not sort correctly for numeric fields using the full 8 bits. bq. And you cutover to BytesRef TermsEnum API too - great. Presumably search perf would improve but only a tiny bit since NRQ visits so few terms? I don't think you will notice a difference. A standard int range contains maybe 10 to 20 sub-ranges (at maximum), so converting between String and TermRef should not matter. But the new implementation is cleaner. In principle we could remove the whole char[]/String-based API in NumericUtils - I only have to rewrite the tests and remove the NumericUtils test in backwards (as it no longer applies then). Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[] - Key: LUCENE-2354 URL: https://issues.apache.org/jira/browse/LUCENE-2354 Project: Lucene - Java Issue Type: Improvement Affects Versions: Flex Branch Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: Flex Branch Attachments: LUCENE-2354.patch After LUCENE-2302, we should use TermToBytesRefAttribute to index using NumericTokenStream. This should also convert the whole NumericUtils to use BytesRef when converting numerics. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
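The backwards check mentioned above boils down to opening an index written by an older release and verifying that numeric queries still match. A hypothetical standalone version of that check is sketched below; the field name, value range, and index path are placeholders, and the real test lives in TestBackwardCompatibility with its own zip handling.

{code:java}
import java.io.File;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

/** Opens an already-unpacked old-format index and runs a NumericRangeQuery against it. */
public class OldNumericIndexCheck {
  public static void main(String[] args) throws Exception {
    File oldIndexDir = new File(args[0]);  // path to the unpacked old index (placeholder)
    IndexReader reader = IndexReader.open(FSDirectory.open(oldIndexDir), true);
    try {
      IndexSearcher searcher = new IndexSearcher(reader);
      TopDocs hits = searcher.search(
          NumericRangeQuery.newIntRange("trieInt", Integer.valueOf(0), Integer.valueOf(100), true, true), 10);
      System.out.println("hits in old index: " + hits.totalHits);
      searcher.close();
    } finally {
      reader.close();
    }
  }
}
{code}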
[jira] Assigned: (LUCENE-2315) AttributeSource's methods for accessing attributes should be final, else it's easy to corrupt the internal states
[ https://issues.apache.org/jira/browse/LUCENE-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned LUCENE-2315: - Assignee: Uwe Schindler AttributeSource's methods for accessing attributes should be final, else it's easy to corrupt the internal states Key: LUCENE-2315 URL: https://issues.apache.org/jira/browse/LUCENE-2315 Project: Lucene - Java Issue Type: Bug Affects Versions: 2.9, 2.9.1, 2.9.2, 3.0, 3.0.1 Reporter: Uwe Schindler Assignee: Uwe Schindler Priority: Minor Fix For: 3.1 The methods that operate on and modify the internal maps of AttributeSource should be final, which is a backwards break. But anybody who overrides such methods simply creates a buggy AttributeSource in either case. I want to make all implementations final (in general the whole class should be final, but it is designed for extension by TokenStream). So it's important that the method implementations are final! -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
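To see why non-final accessors are dangerous: the attribute machinery relies on addAttribute() returning one shared instance per attribute interface. A hypothetical subclass like the one below (only possible while the method is still overridable) silently breaks that invariant, so a Tokenizer and its downstream filters would each write to different attribute objects.

{code:java}
import org.apache.lucene.util.Attribute;
import org.apache.lucene.util.AttributeSource;

// Hypothetical, broken subclass for illustration only: every call creates a
// fresh attribute instance instead of reusing the one stored in the internal
// maps, so consumers of the same attribute no longer see each other's values.
public class BrokenAttributeSource extends AttributeSource {
  @Override
  public <A extends Attribute> A addAttribute(Class<A> attClass) {
    try {
      return attClass.cast(Class.forName(attClass.getName() + "Impl").newInstance());
    } catch (Exception e) {
      throw new IllegalArgumentException(e);
    }
  }
}
{code}

Making the accessors final rules out this whole class of bugs at compile time.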
[jira] Updated: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]
[ https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2354: -- Attachment: LUCENE-2354.patch Here is a first preview patch. NumericUtils still contains lots of unused String-based methods; I think we should remove them, as the class is expert-only and also experimental. Backwards compatibility is broken even with those backwards layers (as the split functions were changed to use BytesRefs). Also, these backwards methods are simply slow now (the byte[] is copied to char[] and vice versa). The new NumericTokenStream now uses a special NumericTermAttribute, so that Filters added later in the chain can access the shift value and so on. This attribute also implements TermToBytesRefAttribute for the indexer. Please note: this attribute is a hack and does not support copyTo/clone, so you cannot capture/restore tokens (which is not needed), but it is still possible to add further attributes to numeric tokens (which is why the attribute is there). The NumericTokenStream backwards test was removed, because the new stream no longer contains a TermAttribute, so the test would always fail. TODO: a better inline hashCode generation for the numeric-to-BytesRef transformation. Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[] - Key: LUCENE-2354 URL: https://issues.apache.org/jira/browse/LUCENE-2354 Project: Lucene - Java Issue Type: Improvement Affects Versions: Flex Branch Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: Flex Branch Attachments: LUCENE-2354.patch After LUCENE-2302, we should use TermToBytesRefAttribute to index using NumericTokenStream. This should also convert the whole NumericUtils to use BytesRef when converting numerics. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
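For illustration, here is a rough shape of the NumericTermAttribute described above. The names and signatures are hypothetical (in particular the toBytesRef() hook stands in for the real TermToBytesRefAttribute method, whose exact signature is not shown here), but it demonstrates the two points from the comment: later filters can read the shift value, and copyTo/clone are deliberately unsupported.

{code:java}
import org.apache.lucene.util.AttributeImpl;
import org.apache.lucene.util.BytesRef;

/** Hypothetical sketch of a numeric term attribute; not the class from the patch. */
public final class SketchNumericTermAttributeImpl extends AttributeImpl {
  private long value;
  private int shift;

  public void setValue(long value, int shift) {
    this.value = value;
    this.shift = shift;
  }

  /** Later filters in the chain can ask for the shift of the current token. */
  public int getShift() {
    return shift;
  }

  /** Stand-in for the indexer-facing hook: fills the reusable target with the prefix-coded term. */
  public void toBytesRef(BytesRef target) {
    final byte[] bytes = new byte[11];               // enough for a long at any shift
    long sortableBits = (value ^ 0x8000000000000000L) >>> shift;
    int nBytes = (63 - shift) / 7 + 1;
    bytes[0] = (byte) shift;
    for (int i = nBytes; i >= 1; i--) {              // 7 bits per byte, as sketched earlier
      bytes[i] = (byte) (sortableBits & 0x7f);
      sortableBits >>>= 7;
    }
    target.bytes = bytes;
    target.offset = 0;
    target.length = nBytes + 1;
  }

  @Override
  public void clear() {
    value = 0L;
    shift = 0;
  }

  @Override
  public void copyTo(AttributeImpl target) {
    // Like the attribute in the patch: numeric tokens are never captured/restored.
    throw new UnsupportedOperationException("numeric tokens cannot be captured");
  }

  @Override
  public int hashCode() {
    return 31 * shift + (int) (value ^ (value >>> 32));
  }

  @Override
  public boolean equals(Object other) {
    if (other == this) return true;
    if (!(other instanceof SketchNumericTermAttributeImpl)) return false;
    SketchNumericTermAttributeImpl o = (SketchNumericTermAttributeImpl) other;
    return o.value == value && o.shift == shift;
  }
}
{code}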
[jira] Reopened: (LUCENE-2306) contrib/xml-query-parser: NumericRangeFilter support
[ https://issues.apache.org/jira/browse/LUCENE-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reopened LUCENE-2306: --- I will commit my changes to the package names and a missing super.tearDown() soon. But I found one other thing: NRQ allows one or both bounds to be null (like TermRangeQuery), but the builder requires both attributes to be present. Also, I don't like the default type of int; I would instead require the type to be given explicitly. Will post a patch soon. contrib/xml-query-parser: NumericRangeFilter support Key: LUCENE-2306 URL: https://issues.apache.org/jira/browse/LUCENE-2306 Project: Lucene - Java Issue Type: Improvement Components: contrib/* Affects Versions: 3.0.1 Reporter: Jingkei Ly Assignee: Mark Harwood Fix For: 3.1 Attachments: LUCENE-2306.patch, LUCENE-2306.patch Create a FilterBuilder for NumericRangeFilter so that it may be used with the XML query parser. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
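To make the proposed builder change concrete, here is a rough sketch with optional lowerTerm/upperTerm (a missing attribute maps to a null, i.e. open-ended, bound - like TermRangeQuery) and a mandatory type attribute. The class name, attribute names, and error handling are illustrative, not the committed builder.

{code:java}
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;
import org.w3c.dom.Element;

/** Illustrative sketch of an XML builder with optional bounds and a required type. */
public class NumericRangeSketchBuilder {
  public Query getQuery(Element e) {
    String field = e.getAttribute("fieldName");
    String type = e.getAttribute("type");
    if (type.length() == 0) {
      throw new IllegalArgumentException("type attribute must be given explicitly");
    }
    // DOM returns "" for a missing attribute; treat that as an open-ended bound
    String lower = e.getAttribute("lowerTerm");
    String upper = e.getAttribute("upperTerm");
    Integer min = lower.length() == 0 ? null : Integer.valueOf(lower);
    Integer max = upper.length() == 0 ? null : Integer.valueOf(upper);
    if ("int".equals(type)) {
      return NumericRangeQuery.newIntRange(field, min, max, true, true);
    }
    throw new IllegalArgumentException("unsupported type: " + type);
  }
}
{code}

A real builder would presumably handle long/float/double and a precisionStep attribute the same way as the int case.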
[jira] Commented: (LUCENE-2306) contrib/xml-query-parser: NumericRangeFilter support
[ https://issues.apache.org/jira/browse/LUCENE-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850495#action_12850495 ] Uwe Schindler commented on LUCENE-2306: --- Committed package and test fixes in revision: 928177 contrib/xml-query-parser: NumericRangeFilter support Key: LUCENE-2306 URL: https://issues.apache.org/jira/browse/LUCENE-2306 Project: Lucene - Java Issue Type: Improvement Components: contrib/* Affects Versions: 3.0.1 Reporter: Jingkei Ly Assignee: Mark Harwood Fix For: 3.1 Attachments: LUCENE-2306.patch, LUCENE-2306.patch Create a FilterBuilder for NumericRangeFilter so that it may be used with the XML query parser. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2306) contrib/xml-query-parser: NumericRangeQuery and -Filter support
[ https://issues.apache.org/jira/browse/LUCENE-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2306: -- Summary: contrib/xml-query-parser: NumericRangeQuery and -Filter support (was: contrib/xml-query-parser: NumericRangeFilter support) contrib/xml-query-parser: NumericRangeQuery and -Filter support --- Key: LUCENE-2306 URL: https://issues.apache.org/jira/browse/LUCENE-2306 Project: Lucene - Java Issue Type: Improvement Components: contrib/* Affects Versions: 3.0.1 Reporter: Jingkei Ly Assignee: Mark Harwood Fix For: 3.1 Attachments: LUCENE-2306.patch, LUCENE-2306.patch Create a FilterBuilder for NumericRangeFilter so that it may be used with the XML query parser. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org