[jira] Commented: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS

2009-10-08 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763389#action_12763389 ] Michael Busch commented on LUCENE-1960: --- Users can use CompressionTools#decompress()
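For readers following the suggestion above to decompress stored values in application code, here is a minimal sketch of a byte-array compress/decompress round trip. It uses only java.util.zip and is an assumption about what such a helper does in spirit; it is not Lucene's CompressionTools source, and the class name StoredFieldCompressionSketch is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, for illustration only; not part of Lucene.
final class StoredFieldCompressionSketch {

    // Compress a byte array with DEFLATE before storing it as a binary field value.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Reverse of compress(); malformed input is reported as an unchecked exception.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or dictionary-dependent input");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid DEFLATE data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] original = "stored field value, stored field value".getBytes(StandardCharsets.UTF_8);
        byte[] restored = decompress(compress(original));
        System.out.println(Arrays.equals(original, restored));
    }
}
```

The point of the sketch is that compression becomes an explicit step around a binary stored field rather than a flag on the field itself.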

[jira] Updated: (LUCENE-1961) Remove remaining deprecations in document package

2009-10-08 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch updated LUCENE-1961: -- Attachment: lucene-1961.patch All tests pass. > Remove remaining deprecations in document pac

Output from a small Snowball benchmark

2009-10-08 Thread Karl Wettin
There have been a few small comments in the Jira about the reflection in Snowball's Among class. There is very little to do about this unless one wants to redesign the stemmers so they include an inner class that handles the method callbacks. That's quite a bit of work and I don't even know h
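To make the trade-off in the message above concrete, here is a small sketch contrasting a reflective method lookup with a direct callback object. ToyStemmer, rCheck and both helper methods are hypothetical names chosen for illustration; this is not the Snowball code, only a guess at the shape of the two designs.

```java
import java.lang.reflect.Method;
import java.util.function.BooleanSupplier;

// Hypothetical illustration; not the Snowball Among implementation.
final class AmongCallbackSketch {

    // Stand-in for a generated stemmer with one condition routine.
    static final class ToyStemmer {
        private final String word;

        ToyStemmer(String word) {
            this.word = word;
        }

        // Hypothetical condition routine that a table entry would call back into.
        public boolean rCheck() {
            return word.length() > 3;
        }
    }

    // Reflection style: the routine is looked up by name and invoked indirectly.
    static boolean viaReflection(ToyStemmer stemmer, String methodName) {
        try {
            Method method = ToyStemmer.class.getDeclaredMethod(methodName);
            return (Boolean) method.invoke(stemmer);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot call " + methodName, e);
        }
    }

    // Callback style: the table entry holds an object that calls the routine directly.
    static boolean viaCallback(BooleanSupplier callback) {
        return callback.getAsBoolean();
    }

    public static void main(String[] args) {
        ToyStemmer stemmer = new ToyStemmer("running");
        System.out.println(viaReflection(stemmer, "rCheck"));
        System.out.println(viaCallback(stemmer::rCheck));
    }
}
```

Both paths give the same answer; the callback form avoids the by-name lookup but requires each stemmer to supply callback objects, which is the redesign cost the message mentions.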

[jira] Commented: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS

2009-10-08 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763399#action_12763399 ] Michael Busch commented on LUCENE-1960: --- {quote} Also the constant bitmask for compr

[jira] Reopened: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS

2009-10-08 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch reopened LUCENE-1960: --- Reopening so that I don't forget to add back the COMPRESS bit. > Remove deprecated Field.Store.

[jira] Commented: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS

2009-10-08 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763403#action_12763403 ] Uwe Schindler commented on LUCENE-1960: --- In the discussion with Mike, we said, that

[jira] Commented: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS

2009-10-08 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763408#action_12763408 ] Michael Busch commented on LUCENE-1960: --- {quote} The problem with your patch: If the

[jira] Commented: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS

2009-10-08 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763410#action_12763410 ] Uwe Schindler commented on LUCENE-1960: --- No problem. :-) I think it should not be a

[jira] Commented: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS

2009-10-08 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763417#action_12763417 ] Uwe Schindler commented on LUCENE-1960: --- bq. I don't think the SegmentMerger should

[jira] Assigned: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1951: -- Assignee: Michael McCandless > wildcardquery rewrite improvements > --

[jira] Commented: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763429#action_12763429 ] Michael McCandless commented on LUCENE-1951: Patch looks good, thanks Robert!

[jira] Assigned: (LUCENE-1959) Index Splitter

2009-10-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1959: -- Assignee: Michael McCandless > Index Splitter > -- > >

[jira] Updated: (LUCENE-1962) Persian Arabic Analyzer cleanup

2009-10-08 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-1962: Attachment: LUCENE_1962.patch > Persian Arabic Analyzer cleanup >

[jira] Created: (LUCENE-1962) Persian Arabic Analyzer cleanup

2009-10-08 Thread Simon Willnauer (JIRA)
Persian Arabic Analyzer cleanup --- Key: LUCENE-1962 URL: https://issues.apache.org/jira/browse/LUCENE-1962 Project: Lucene - Java Issue Type: Improvement Components: contrib/analyzers Affects Versio

[jira] Commented: (LUCENE-1959) Index Splitter

2009-10-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763433#action_12763433 ] Michael McCandless commented on LUCENE-1959: Looks great, thanks Jason! I jus

[jira] Commented: (LUCENE-1959) Index Splitter

2009-10-08 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763440#action_12763440 ] Andrzej Bialecki commented on LUCENE-1959: --- I'm of a split mind about this spli

[jira] Commented: (LUCENE-1959) Index Splitter

2009-10-08 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763443#action_12763443 ] Uwe Schindler commented on LUCENE-1959: --- I would put it into contrib, as it is a uti

[jira] Issue Comment Edited: (LUCENE-1959) Index Splitter

2009-10-08 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763443#action_12763443 ] Uwe Schindler edited comment on LUCENE-1959 at 10/8/09 3:52 AM:

Arabic Analyzer: possible bug

2009-10-08 Thread DM Smith
I'm wondering if there is a bug in ArabicAnalyzer in 2.9. (I don't know Arabic or Farsi, but have some texts to index in those languages.) The tokenizer/filter chain for ArabicAnalyzer is: TokenStream result = new ArabicLetterTokenizer( reader ); result = new StopFilter( result

[jira] Commented: (LUCENE-1959) Index Splitter

2009-10-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763457#action_12763457 ] Michael McCandless commented on LUCENE-1959: bq. I would put it into contrib

[jira] Commented: (LUCENE-1959) Index Splitter

2009-10-08 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763460#action_12763460 ] Mark Miller commented on LUCENE-1959: - bq. To copy the files it should use the directo

[jira] Commented: (LUCENE-1959) Index Splitter

2009-10-08 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763461#action_12763461 ] Mark Miller commented on LUCENE-1959: - bq. So I guess I'm -0 on this index splitting m

[jira] Commented: (LUCENE-1959) Index Splitter

2009-10-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763464#action_12763464 ] Michael McCandless commented on LUCENE-1959: bq. No reason not to start somewh

[jira] Updated: (LUCENE-1959) Index Splitter

2009-10-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1959: --- Attachment: LUCENE-1959.patch New patch attached: move to contrib/misc, renamed Test

[jira] Commented: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-08 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763468#action_12763468 ] Robert Muir commented on LUCENE-1951: - Michael, I thought about this problem too, but

Re: Arabic Analyzer: possible bug

2009-10-08 Thread Robert Muir
DM, this isn't a bug. The Arabic stopwords are not normalized, but for Persian, I normalized the stopwords, mostly because I did not want to have to create variations with Farsi yah versus Arabic yah for each one. On Thu, Oct 8, 2009 at 7:24 AM, DM Smith wrote: > I'm wondering if there is a b

[jira] Commented: (LUCENE-1959) Index Splitter

2009-10-08 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763479#action_12763479 ] Mark Miller commented on LUCENE-1959: - small opt - you might switch it to reuse the bu

[jira] Commented: (LUCENE-1959) Index Splitter

2009-10-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763481#action_12763481 ] Michael McCandless commented on LUCENE-1959: bq. small opt - you might switch

[jira] Resolved: (LUCENE-1959) Index Splitter

2009-10-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1959. Resolution: Fixed Fix Version/s: (was: 3.1) 3.0 Than

[jira] Commented: (LUCENE-1962) Persian Arabic Analyzer cleanup

2009-10-08 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763485#action_12763485 ] Robert Muir commented on LUCENE-1962: - Simon, thanks, please commit this :) > Persia

Re: Arabic Analyzer: possible bug

2009-10-08 Thread DM Smith
Robert, Thanks for the info. As I said, I am illiterate in Arabic. So I have another, perhaps nonsensical, question: Does the stop word list have every combination of upper/lower case for each Arabic word in the list? (i.e. is it fully de-normalized?) Or should it come after LowerCaseFilter?

Re: Arabic Analyzer: possible bug

2009-10-08 Thread Ahmed Al-Obaidy
There is no upper and lower case in Arabic. --- On Thu, 10/8/09, DM Smith wrote: From: DM Smith Subject: Re: Arabic Analyzer: possible bug To: java-dev@lucene.apache.org Date: Thursday, October 8, 2009, 3:14 PM Robert,Thanks for the info.As I said, I am illiterate in Arabic. So I have another

Re: Arabic Analyzer: possible bug

2009-10-08 Thread Basem Narmok
DM, there are no upper/lower cases in Arabic, so don't worry, but the stop word list needs some corrections and may miss some common/stop Arabic words. Best, On Thu, Oct 8, 2009 at 4:14 PM, DM Smith wrote: > Robert, > Thanks for the info. > As I said, I am illiterate in Arabic. So I have another,

RE: Arabic Analyzer: possible bug

2009-10-08 Thread Uwe Schindler
Just an addition: The lowercase filter is only for the case of embedded non-arabic words. And these will not appear in the stop words. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Basem Narmok [mailto

Re: Arabic Analyzer: possible bug

2009-10-08 Thread Robert Muir
the upper/lower case is there, in case you happen to have some english text mixed in :) but to answer your question, the stopword list contains some variant forms, and I added a couple more in LUCENE-1758. Maybe this will help: ArabicNormalizer is 'aggressive' for arabic language. ArabicNormalize

Re: Arabic Analyzer: possible bug

2009-10-08 Thread Robert Muir
Basem, by any chance would you be willing to help improve it for us? On Thu, Oct 8, 2009 at 9:20 AM, Basem Narmok wrote: > DM, there is no upper/lower cases in Arabic, so don't worry, but the > stop word list needs some corrections and may miss some common/stop > Arabic words. > > Best, > > On T

Re: Arabic Analyzer: possible bug

2009-10-08 Thread DM Smith
On 10/08/2009 09:23 AM, Uwe Schindler wrote: Just an addition: The lowercase filter is only for the case of embedded non-arabic words. And these will not appear in the stop words. I learned something new! Hmm. If one has a mixed Arabic / English text, shouldn't one be able to augment the s

Re: Arabic Analyzer: possible bug

2009-10-08 Thread Basem Narmok
Robert, I will be happy to do so. Currently, I am testing the new Arabic analyzer in 2.9, and also I will prepare a new stop word list. I will provide you with my findings/comments soon. Best, On Thu, Oct 8, 2009 at 4:28 PM, Robert Muir wrote: > Basem, by any chance would you be willing to help

Re: Arabic Analyzer: possible bug

2009-10-08 Thread Robert Muir
DM, i suppose. but this is a tricky subject, what if you have mixed Arabic / German or something like that? for some other languages written in the Latin script, English stopwords could be bad :) I think that Lowercasing non-Arabic (also cyrillic, etc), is pretty safe across the board though. On

[jira] Commented: (LUCENE-1953) FastVectorHighlighter: small fragCharSize can cause StringIndexOutOfBoundsException

2009-10-08 Thread Koji Sekiguchi (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763497#action_12763497 ] Koji Sekiguchi commented on LUCENE-1953: bq. Koji can't commit to the 2.9 branch c

[jira] Issue Comment Edited: (LUCENE-1953) FastVectorHighlighter: small fragCharSize can cause StringIndexOutOfBoundsException

2009-10-08 Thread Koji Sekiguchi (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763497#action_12763497 ] Koji Sekiguchi edited comment on LUCENE-1953 at 10/8/09 6:52 AM: ---

[jira] Closed: (LUCENE-1962) Persian Arabic Analyzer cleanup

2009-10-08 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer closed LUCENE-1962. --- Resolution: Fixed Committed in r823180, thx Robert > Persian Arabic Analyzer cleanup > --

RE: Arabic Analyzer: possible bug

2009-10-08 Thread Uwe Schindler
I think the idea of lowercase filter in the arabic analyzers is not to really index mixed language texts. It is more for the case, if you have some word between the Arabic content (like product names,.), which happens often. You see this often also in Japanese texts. And for these embedded English

Re: Arabic Analyzer: possible bug

2009-10-08 Thread Robert Muir
Uwe, I might add to what you say. I do disagree a bit and think mixed english/arabic text is pretty common (aside from the "product name" issue you discussed). this can get really complex for some informal text: you have maybe some english, arabic, and arabic written in informal romanization, some

[jira] Commented: (LUCENE-1953) FastVectorHighlighter: small fragCharSize can cause StringIndexOutOfBoundsException

2009-10-08 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763505#action_12763505 ] Mark Miller commented on LUCENE-1953: - just committed Koji. > FastVectorHighlighter:

Re: svn commit: r823189 - in /lucene/java/branches/lucene_2_9/contrib: ./ fast-vector-highlighter/src/java/org/apache/lucene/search/vectorhighlight/ fast-vector-highlighter/src/test/org/apache/lucene/

2009-10-08 Thread Koji Sekiguchi
Thanks, Mark! Can you change "Trunk" to "2.9 branch" in CHANGES.txt? :-) +=== Trunk (not yet released) === Koji markrmil...@apache.org wrote: Author: markrmiller Date: Thu Oct 8 14:32:09 2009 New Revision: 823189 URL: http://svn.apache.org/viewvc?rev=8

[jira] Commented: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-08 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763507#action_12763507 ] Robert Muir commented on LUCENE-1951: - think there would be objection to making this p

[jira] Resolved: (LUCENE-1953) FastVectorHighlighter: small fragCharSize can cause StringIndexOutOfBoundsException

2009-10-08 Thread Koji Sekiguchi (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved LUCENE-1953. Resolution: Fixed Thanks, Mark! BTW, I cannot assign myself because I cannot find "Assign

[jira] Commented: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763521#action_12763521 ] Michael McCandless commented on LUCENE-1951: bq. think there would be objectio

Re: Arabic Analyzer: possible bug

2009-10-08 Thread DM Smith
Robert, Yes it is tricky. I'm not suggesting that the ArabicAnalyzer have any stopwords other than Arabic. I'm suggesting that if I know my input document well and know that it has mixed text and that the text is Arabic and one other known language that I might want to augment the stop list

[jira] Updated: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-08 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1951: Attachment: LUCENE-1951.patch updated patch, using SingleTermEnum instead of TermQuery rewrite whe

[jira] Commented: (LUCENE-1953) FastVectorHighlighter: small fragCharSize can cause StringIndexOutOfBoundsException

2009-10-08 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763526#action_12763526 ] Mark Miller commented on LUCENE-1953: - I think that means someone has to give you JIRA

Re: Arabic Analyzer: possible bug

2009-10-08 Thread Robert Muir
> > I'm suggesting that if I know my input document well and know that it has > mixed text and that the text is Arabic and one other known language that I > might want to augment the stop list with stop words appropriate for that > known language. I think that in this case, stop filter should be af

Re: Arabic Analyzer: possible bug

2009-10-08 Thread Robert Muir
DM, by the way, if you want this lowercasing behavior with edge cases, check out LUCENE-1488. There is a case folding filter there, as well as a normalization filter, and they interact correctly for what you want :) It's my understanding that contrib/analyzers should not have any external dependenci

[jira] Created: (LUCENE-1963) ArabicAnalyzer: Lowercase before Stopfilter

2009-10-08 Thread Robert Muir (JIRA)
ArabicAnalyzer: Lowercase before Stopfilter --- Key: LUCENE-1963 URL: https://issues.apache.org/jira/browse/LUCENE-1963 Project: Lucene - Java Issue Type: Improvement Components: contrib/anal
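The ordering question behind this issue can be shown with a small model of the two chains. The sketch below uses plain Java collections instead of Lucene's TokenStream API, and the stop set and class name are hypothetical; it only demonstrates why a stop filter that runs before lowercasing misses capitalized embedded non-Arabic words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-collections model of filter ordering; not Lucene's analysis API.
final class FilterOrderSketch {

    // Hypothetical stop set, stored in lower case as stop lists usually are.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "of"));

    static List<String> lowerCase(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            result.add(token.toLowerCase(Locale.ROOT));
        }
        return result;
    }

    static List<String> removeStopWords(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            if (!STOP_WORDS.contains(token)) {
                result.add(token);
            }
        }
        return result;
    }

    // Stop filter first: "The" is compared against "the" and survives.
    static List<String> stopThenLowerCase(List<String> tokens) {
        return lowerCase(removeStopWords(tokens));
    }

    // Lowercase first: every stop word is matched regardless of its original case.
    static List<String> lowerCaseThenStop(List<String> tokens) {
        return removeStopWords(lowerCase(tokens));
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("The", "Index", "of", "Lucene");
        System.out.println(stopThenLowerCase(tokens));
        System.out.println(lowerCaseThenStop(tokens));
    }
}
```

Because the two orders produce different terms for the same text, an index built with one order would likely need rebuilding after switching to the other.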

[jira] Updated: (LUCENE-1963) ArabicAnalyzer: Lowercase before Stopfilter

2009-10-08 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1963: Attachment: LUCENE-1963.patch simple patch, but will need to warn in CHANGES.txt that folks should

[jira] Commented: (LUCENE-1121) Use nio.transferTo when copying large blocks of bytes

2009-10-08 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763544#action_12763544 ] Mark Miller commented on LUCENE-1121: - Isn't this still a nice little optimization for
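As background for the transferTo discussion, here is a minimal sketch of a chunked file copy built on java.nio's FileChannel.transferTo. The class name and the 1 MiB chunk size are assumptions made for illustration; this is not the patch attached to the issue, and whether chunking keeps any speed advantage is the very point under discussion.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not the LUCENE-1121 patch.
final class TransferToCopySketch {

    // Assumed chunk size; the right value is platform dependent.
    static final long CHUNK_SIZE = 1L << 20;

    // Copies source to target in chunks and returns the number of bytes copied.
    static long copy(Path source, Path target) {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                long count = Math.min(CHUNK_SIZE, size - position);
                // transferTo may move fewer bytes than requested, so advance by its result.
                long moved = in.transferTo(position, count, out);
                if (moved <= 0) {
                    break; // source shrank or nothing could be moved; avoid spinning
                }
                position += moved;
            }
            return position;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 2) {
            System.out.println(copy(Paths.get(args[0]), Paths.get(args[1])) + " bytes copied");
        }
    }
}
```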

[jira] Updated: (LUCENE-1963) ArabicAnalyzer: Lowercase before Stopfilter

2009-10-08 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1963: Fix Version/s: 3.0 if no one objects, I'd like to commit this for 3.0 at the end of the day. > Ar

[jira] Updated: (LUCENE-1963) ArabicAnalyzer: Lowercase before Stopfilter

2009-10-08 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1963: Attachment: LUCENE-1963.patch here also update the javadocs to reflect the new order of what is go

Re: Arabic Analyzer: possible bug

2009-10-08 Thread DM Smith
On 10/08/2009 11:46 AM, Robert Muir wrote: DM by the way, if you want this lowercasing behavior with edge cases, check out LUCENE-1488. There is a case folding filter there, as well as a normalization filter, and they interact correctly for what you want :) Robert, So cool. I've been followin

[jira] Commented: (LUCENE-1963) ArabicAnalyzer: Lowercase before Stopfilter

2009-10-08 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763554#action_12763554 ] DM Smith commented on LUCENE-1963: -- can you commit it to 2.9.1 too? (For those stuck on J

Re: Arabic Analyzer: possible bug

2009-10-08 Thread Robert Muir
DM, thanks. I will reply to your comments below. How ready is it? I'd like to use it if it is "good enough". > It is not committed yet, so I think it would be best to say it is not ready, but I think it works, give it a try if you have time :). Mainly it needs better doc and tests, but I am focus

[jira] Commented: (LUCENE-1963) ArabicAnalyzer: Lowercase before Stopfilter

2009-10-08 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763562#action_12763562 ] Robert Muir commented on LUCENE-1963: - bq. can you commit it to 2.9.1 too? (For those

[jira] Updated: (LUCENE-1950) Remove autoCommit from IndexWriter

2009-10-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1950: --- Attachment: LUCENE-1950.patch Attached patch. All tests pass. This is just the fir

[jira] Commented: (LUCENE-1121) Use nio.transferTo when copying large blocks of bytes

2009-10-08 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763573#action_12763573 ] Mark Miller commented on LUCENE-1121: - NM - it appears that when you chunk, you lose t

[jira] Commented: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763579#action_12763579 ] Michael McCandless commented on LUCENE-1951: Patch looks good Robert! Thanks.

[jira] Commented: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-08 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763580#action_12763580 ] Robert Muir commented on LUCENE-1951: - Michael, cool. The bw_compat patch is still val

Re: Arabic Analyzer: possible bug

2009-10-08 Thread Robert Muir
Basem, I really appreciate your time if you are able to do this. It's been my hope that introducing Arabic/Farsi support will create enough interest to encourage more qualified people to come and really make things nice. If you don't mind, you can look at http://wiki.apache.org/lucene-java/HowToCo

[jira] Commented: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763597#action_12763597 ] Michael McCandless commented on LUCENE-1951: That is a rather roundabout way t

[jira] Commented: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-08 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763599#action_12763599 ] Robert Muir commented on LUCENE-1951: - bq. That is a rather roundabout way to arrive a

[jira] Resolved: (LUCENE-1961) Remove remaining deprecations in document package

2009-10-08 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch resolved LUCENE-1961. --- Resolution: Fixed Committed revision 823252. > Remove remaining deprecations in document pa

[jira] Commented: (LUCENE-1961) Remove remaining deprecations in document package

2009-10-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763607#action_12763607 ] Michael McCandless commented on LUCENE-1961: I'm seeing this when I run "ant t

[jira] Created: (LUCENE-1964) InstantiatedIndex : TermFreqVector is missing

2009-10-08 Thread David Causse (JIRA)
InstantiatedIndex : TermFreqVector is missing - Key: LUCENE-1964 URL: https://issues.apache.org/jira/browse/LUCENE-1964 Project: Lucene - Java Issue Type: Bug Components: contrib/* Af

[jira] Updated: (LUCENE-1964) InstantiatedIndex : TermFreqVector is missing

2009-10-08 Thread David Causse (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Causse updated LUCENE-1964: - Attachment: term-vector-fix.patch Fix the TermVector storing problem. > InstantiatedIndex : Ter

[jira] Created: (LUCENE-1965) Lazy Atomic Loading Stopwords in SmartCN

2009-10-08 Thread Simon Willnauer (JIRA)
Lazy Atomic Loading Stopwords in SmartCN - Key: LUCENE-1965 URL: https://issues.apache.org/jira/browse/LUCENE-1965 Project: Lucene - Java Issue Type: Improvement Components: contrib/analyzer
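One common way to get lazy, thread-safe, one-time loading of a default stop set in Java is the initialization-on-demand holder idiom, sketched below. This is an assumption about the general technique the issue title suggests, not the attached patch; the class name, the load counter and the two placeholder stop words are hypothetical, and a real analyzer would read its list from a resource file.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of the holder idiom; not the SmartCN code.
final class LazyStopwordsSketch {

    // Counts how often the set is built, only to make the laziness observable.
    static final AtomicInteger LOAD_COUNT = new AtomicInteger();

    private LazyStopwordsSketch() {
    }

    // The JVM initializes Holder on first access, exactly once, with proper synchronization.
    private static final class Holder {
        static final Set<String> DEFAULT_STOP_WORDS = load();
    }

    // Placeholder loader; a real one would parse a stopword resource.
    private static Set<String> load() {
        LOAD_COUNT.incrementAndGet();
        return Collections.unmodifiableSet(new HashSet<>(Arrays.asList("the", "of")));
    }

    // Nothing is loaded until the first caller asks for the default set.
    static Set<String> getDefaultStopWords() {
        return Holder.DEFAULT_STOP_WORDS;
    }

    public static void main(String[] args) {
        System.out.println(LOAD_COUNT.get());
        System.out.println(getDefaultStopWords().size());
        System.out.println(LOAD_COUNT.get());
    }
}
```

With this shape, callers who always pass their own stop set never pay for loading the default one.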

[jira] Updated: (LUCENE-1965) Lazy Atomic Loading Stopwords in SmartCN

2009-10-08 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-1965: Attachment: LUCENE-1965.patch attached patch > Lazy Atomic Loading Stopwords in SmartCN

[jira] Updated: (LUCENE-1965) Lazy Atomic Loading Stopwords in SmartCN

2009-10-08 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-1965: Priority: Trivial (was: Major) > Lazy Atomic Loading Stopwords in SmartCN >

Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2009-10-08 Thread John Wang
Hi guys: What are your thoughts about contributing Kamikaze as a Lucene contrib package? We just finished porting Kamikaze to Lucene 2.9. The new 2.9 API allows us some more code tuning and optimization improvements. We will be releasing Kamikaze; it might be a good time to ad

Back-compat tags

2009-10-08 Thread Michael Busch
Hi, for the last patches I committed I created a new back-compat tag each time. Since this is happening so often right now, because we're removing APIs, I was wondering whether we should not create a separate tag for each patch, but instead gather the changes in the back-compat branch and cre

[jira] Commented: (LUCENE-1965) Lazy Atomic Loading Stopwords in SmartCN

2009-10-08 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763621#action_12763621 ] Robert Muir commented on LUCENE-1965: - Simon, everything is ok, but i have one comment

Re: Back-compat tags

2009-10-08 Thread Michael McCandless
How about we just use the tip of the back-compat branch? (Ie no tagging). Until we settle down. Mike On Thu, Oct 8, 2009 at 2:42 PM, Michael Busch wrote: > Hi, > > for the last patches I committed I created a new back-compat tag each time. > Since this is happening so often right now, because

Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2009-10-08 Thread Michael McCandless
+1! Mike On Thu, Oct 8, 2009 at 2:41 PM, John Wang wrote: > Hi guys: > > What are your thoughts about contributing Kamikaze as a lucene contrib > package? We just finished porting kamikaze to lucene 2.9. With the new 2.9 > api, it allows us for some more code tuning and optimization improve

[jira] Updated: (LUCENE-1964) InstantiatedIndex : TermFreqVector is missing

2009-10-08 Thread David Causse (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Causse updated LUCENE-1964: - Attachment: iiw-regression-fix.patch My previous patch has broken the Writer, sorry... I tried t

Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2009-10-08 Thread John Wang
Awesome! Mike, can you let us know what the process is and the time line? Thanks -John On Thu, Oct 8, 2009 at 11:48 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > +1! > > Mike > > On Thu, Oct 8, 2009 at 2:41 PM, John Wang wrote: > > Hi guys: > > > > What are your thoughts a

Re: Back-compat tags

2009-10-08 Thread Michael Busch
+1. I guess then we have to make some changes to the build script. Currently it's only possible to specify a tag. I'll open a JIRA issue. Michael On 10/8/09 11:45 AM, Michael McCandless wrote: How about we just use the tip of the back-compat branch? (Ie no tagging). Until we settle down.

[jira] Commented: (LUCENE-1958) ShingleFilter creates shingles across two consecutives documents : bug or normal behaviour ?

2009-10-08 Thread MRIT64 (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763638#action_12763638 ] MRIT64 commented on LUCENE-1958: It doesn't happen with Lucene 2.9 (just downloaded). > Sh

[jira] Commented: (LUCENE-1958) ShingleFilter creates shingles across two consecutives documents : bug or normal behaviour ?

2009-10-08 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763641#action_12763641 ] Robert Muir commented on LUCENE-1958: - bq. It doesnt happen with Lucene 2.9 (just down

[jira] Updated: (LUCENE-1965) Lazy Atomic Loading Stopwords in SmartCN

2009-10-08 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-1965: Attachment: LUCENE-1965.patch Thanks robert, good catch! I was adding one test with null i

[jira] Commented: (LUCENE-1965) Lazy Atomic Loading Stopwords in SmartCN

2009-10-08 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763646#action_12763646 ] Robert Muir commented on LUCENE-1965: - Simon, cool. I like it now, think its a good im

[jira] Closed: (LUCENE-1965) Lazy Atomic Loading Stopwords in SmartCN

2009-10-08 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer closed LUCENE-1965. --- Resolution: Fixed. Committed in r823285. Thanks, Robert, for reviewing. > Lazy Atomic Loading Stopw

Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2009-10-08 Thread Michael McCandless
Well, it's the usual process... pull together a big patch, open an issue, etc. Probably because it's a large amount of code (I think?) you'll need to submit a software grant (http://www.apache.org/licenses/software-grant.txt). Mike On Thu, Oct 8, 2009 at 2:58 PM, John Wang wrote: > Awesome! > >

Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2009-10-08 Thread Mark Miller
Yup - you need that for anything developed outside of Apache. Michael McCandless wrote: > Well, it's the usual process... pull together a big patch, open an issue, etc. > > Probably because it's a large amount of code (I think?) you'll need to > submit a software grant > (http://www.apache.org/licenses

Re: Arabic Analyzer: possible bug

2009-10-08 Thread Basem Narmok
Uwe, 100% correct On Thu, Oct 8, 2009 at 4:56 PM, Uwe Schindler wrote: > I think the idea of lowercase filter in the arabic analyzers is not to > really index mixed language texts. It is more for the case, if you have some > word between the Arabic content (like product names,.), which happens of

Re: Arabic Analyzer: possible bug

2009-10-08 Thread Basem Narmok
Ok, the list is ready (initial one, as I will continue enhancing it). I will create a JIRA issue and send the patch. Also, I have some small changes to the normalization (e.g. removing some diacritics, and other changes). Best, Basem On Thu, Oct 8, 2009 at 8:51 PM, Robert Muir wrote: > Basem, I re

[jira] Updated: (LUCENE-1959) Index Splitter

2009-10-08 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated LUCENE-1959: -- Attachment: mp-splitter.patch Here's my submission to the index splitting race ;) This

Re: Arabic Analyzer: possible bug

2009-10-08 Thread Basem Narmok
Robert, Yes, this issue will not work, as some numbers are used to represent (transliterate if I may say) some English letters (e.g. 3 for Arabic Aeen, and 7 for Arabic H'a). Some online services provide instant translation for such transliteration (e.g. http://www.yamli.com/ try this word "7elo"

[jira] Commented: (LUCENE-1959) Index Splitter

2009-10-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763684#action_12763684 ] Michael McCandless commented on LUCENE-1959: Excellent! > Index Splitter > --

[jira] Commented: (LUCENE-1961) Remove remaining deprecations in document package

2009-10-08 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763685#action_12763685 ] Michael Busch commented on LUCENE-1961: --- I committed a fix for this to the back-comp

[jira] Commented: (LUCENE-1959) Index Splitter

2009-10-08 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763682#action_12763682 ] Mark Miller commented on LUCENE-1959: - Nice! Let's add it to the mix - I'm guessing Jas

[jira] Commented: (LUCENE-1959) Index Splitter

2009-10-08 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763683#action_12763683 ] Uwe Schindler commented on LUCENE-1959: --- Really cool! > Index Splitter > --
