Hi:
I am using the SmartChineseAnalyzer class and it is great!
I was wondering whether we should have a set of Chinese stopwords. The
default set contains only punctuation.
Thanks
-John
I should mention - I tried it with:
config.setRAMBufferSizeMB(1.0);
and should have posted that version. It still comes up with one 5mb
CFS segment file.
On Fri, Apr 9, 2010 at 2:55 PM, Lance Norskog wrote:
> If the IndexWriterConfig.ram buffer size and the mergeMB size on the
> policy object a
If the IndexWriterConfig.ram buffer size and the mergeMB size on the
policy object are both 1 MB, then can there be a segment larger than
2 MB? Or 3 MB? Or 10 MB?
Is there any way to (totally utterly completely absolutely 100%) cap
the size of a segment merge? If so, it appears to be an algebraic
equa
I have found it useful to keep two lists of tests: the slow tests and
the fast tests. Maybe the TestSuite feature would work for this
purpose?
An @SlowTest annotation would be even better. JUnit might have a tool
to do this filtering.
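For what it's worth, here is a minimal sketch of how such an annotation could split tests into a fast list and a slow list using plain reflection. Everything in it is hypothetical: the @SlowTest annotation, the ExampleTests class and its method names are made up for illustration, and it does not use JUnit's own filtering at all.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SlowTestDemo {
    // Hypothetical marker annotation; not part of JUnit or Lucene.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface SlowTest {}

    // Stand-in test class; the method names are invented for this sketch.
    static class ExampleTests {
        public void testFastLookup() {}

        @SlowTest
        public void testHugeIndexMerge() {}
    }

    // Returns the sorted names of the "test*" methods declared on testClass,
    // keeping only those whose @SlowTest presence matches wantSlow.
    static List<String> select(Class<?> testClass, boolean wantSlow) {
        List<String> names = new ArrayList<>();
        for (Method m : testClass.getDeclaredMethods()) {
            boolean isTest = m.getName().startsWith("test");
            boolean isSlow = m.isAnnotationPresent(SlowTest.class);
            if (isTest && isSlow == wantSlow) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println("fast: " + select(ExampleTests.class, false));
        System.out.println("slow: " + select(ExampleTests.class, true));
    }
}
```

The annotation needs RUNTIME retention, otherwise isAnnotationPresent() cannot see it. A real setup would hand the selected names to whatever runner is in use rather than print them.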
On Fri, Apr 9, 2010 at 2:49 AM, Michael McCandless
wrote:
> I
[
https://issues.apache.org/jira/browse/LUCENE-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-2323:
Attachment: LUCENE-2323_wikipedia.patch
Now that flex is merged, it's a good time to continue doing
[
https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855498#action_12855498
]
Uwe Schindler commented on LUCENE-2372:
---
One more: PerFieldAnalyzerWrapper :( - Sorr
[
https://issues.apache.org/jira/browse/LUCENE-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855496#action_12855496
]
Shivender Devarakonda commented on LUCENE-2376:
---
I have a question on this,
[
https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855493#action_12855493
]
Uwe Schindler commented on LUCENE-2372:
---
Did it already for StandardAna (see patch).
[
https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855492#action_12855492
]
Michael McCandless commented on LUCENE-2372:
+1 to making KeywordAnalyzer fina
[
https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855489#action_12855489
]
Mark Miller commented on LUCENE-2372:
-
bq.If I make it final and
+1 - let's just remem
[
https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated LUCENE-2372:
--
Attachment: LUCENE-2372.patch
Small updates.
Just one question: The only non-final Analyzer l
[
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855470#action_12855470
]
Michael McCandless commented on LUCENE-2386:
I think oal.index is good.
> Ind
[
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855457#action_12855457
]
Shai Erera commented on LUCENE-2386:
Ok sounds good. Is there a preferred package for
[
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855421#action_12855421
]
Michael McCandless commented on LUCENE-2386:
Patch looks good!
Hmm... maybe w
[
https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated LUCENE-2372:
--
Attachment: LUCENE-2372.patch
Patch that removes deprecated usage of TermAttribute from Lucene
[
https://issues.apache.org/jira/browse/LUCENE-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir resolved LUCENE-2388.
-
Resolution: Fixed
Fix Version/s: 3.1
both patches are committed... if you find any outdat
[
https://issues.apache.org/jira/browse/LUCENE-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-2388:
Attachment: LUCENE-2388_solr.patch
attached is a patch to fix the references on the solr site.
>
[
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shai Erera updated LUCENE-2386:
---
Attachment: LUCENE-2386.patch
Patch fixes all tests as well as changes to IndexWriter, IndexFileDele
[
https://issues.apache.org/jira/browse/LUCENE-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless resolved LUCENE-2387.
Resolution: Fixed
Fix Version/s: 3.1
> IndexWriter retains references to Re
[
https://issues.apache.org/jira/browse/LUCENE-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855379#action_12855379
]
Shai Erera commented on LUCENE-1879:
I have found such version ... and it fails too :)
[
https://issues.apache.org/jira/browse/LUCENE-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855377#action_12855377
]
Michael Busch commented on LUCENE-1879:
---
{quote}
I'll start by describing the limita
Setting maxMergeMB does not limit the size of segments you will see - it
simply limits what segments will be merged - segments over maxMergeMB
will not be merged with other segments - you can still buffer up a ton
of docs in RAM and flush a segment larger than maxMergeMB, or merge n
segments sm
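To make the point about maxMergeMB concrete, here is a toy model in plain Java. It is only a sketch under stated assumptions: it does not call Lucene, the class and method names are invented, and the selection rule (take the first mergeFactor segments below the cap, merged size equals the sum of the inputs) is a simplification of what a real merge policy does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: no Lucene calls, and merge selection is heavily simplified.
class MaxMergeToyModel {
    // Merges the first mergeFactor segments smaller than maxMergeMB into one
    // segment whose size is their sum. Segments at or above the cap are never
    // selected. Returns a copy of the input if too few segments are eligible.
    static List<Double> mergeOnce(List<Double> segmentsMB, double maxMergeMB, int mergeFactor) {
        List<Double> eligible = new ArrayList<>();
        List<Double> rest = new ArrayList<>();
        for (double size : segmentsMB) {
            if (size < maxMergeMB && eligible.size() < mergeFactor) {
                eligible.add(size);
            } else {
                rest.add(size);
            }
        }
        if (eligible.size() < mergeFactor) {
            return new ArrayList<>(segmentsMB);
        }
        double merged = 0;
        for (double size : eligible) {
            merged += size;
        }
        rest.add(merged);
        return rest;
    }

    public static void main(String[] args) {
        // Three small flushed segments plus one 5 MB segment that was flushed
        // from a large RAM buffer; the cap is 1 MB and mergeFactor is 3.
        List<Double> before = Arrays.asList(0.75, 0.75, 0.75, 5.0);
        List<Double> after = mergeOnce(before, 1.0, 3);
        System.out.println(before + " -> " + after);
    }
}
```

In this model the 5 MB segment is left alone because it is over the cap, and the three 0.75 MB segments merge into one 2.25 MB segment, so both surviving segments are larger than the 1 MB setting. The cap governs which segments are picked for merging, not how large a flushed or merged segment can be.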
[
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855369#action_12855369
]
Shai Erera commented on LUCENE-2386:
I already did that ... just didn't post back. Cre
[
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855364#action_12855364
]
Michael McCandless commented on LUCENE-2386:
How about we subclass FNFE? Eg "
[
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855360#action_12855360
]
Earwin Burrfoot commented on LUCENE-2386:
-
I'm at a loss for words. No, seriously, b
[
https://issues.apache.org/jira/browse/LUCENE-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-2388:
Attachment: LUCENE-2388.patch
attached is a patch for lucene.
if no one objects, i'd like to comm
the unversioned site points to a dead trunk
---
Key: LUCENE-2388
URL: https://issues.apache.org/jira/browse/LUCENE-2388
Project: Lucene - Java
Issue Type: Bug
Components: Website
[
https://issues.apache.org/jira/browse/LUCENE-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855358#action_12855358
]
Uwe Schindler commented on LUCENE-2364:
---
+1
Term is still used at a lot of places i
[
https://issues.apache.org/jira/browse/LUCENE-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler resolved LUCENE-2302.
---
Resolution: Fixed
Lucene Fields: [New, Patch Available] (was: [New])
Committed revis
[
https://issues.apache.org/jira/browse/LUCENE-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated LUCENE-2302:
--
Attachment: LUCENE-2302-toString.patch
Patch that fixes the toString() problems in Token and a
[
https://issues.apache.org/jira/browse/LUCENE-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855347#action_12855347
]
Michael McCandless commented on LUCENE-2387:
I agree, Uwe -- I'll fold that in
[
https://issues.apache.org/jira/browse/LUCENE-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855345#action_12855345
]
Uwe Schindler commented on LUCENE-2387:
---
As Tokenizers are reused, the analyzer hold
[
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855344#action_12855344
]
Shai Erera commented on LUCENE-2386:
Ok I've added the following to DirReader:
{code}
[
https://issues.apache.org/jira/browse/LUCENE-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855343#action_12855343
]
Michael McCandless commented on LUCENE-2364:
Maybe we should simply deprecate
I agree IW should not hold refs to the Field instances from the last
doc indexed... I put a patch on LUCENE-2387 to null the reference as
we go. Can you confirm this lets GC reclaim?
Mike
On Fri, Apr 9, 2010 at 12:54 AM, Ruben Laguna wrote:
> But the Readers I'm talking about are not held by th
[
https://issues.apache.org/jira/browse/LUCENE-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-2387:
---
Attachment: LUCENE-2387.patch
Attached patch nulls out the Fieldable reference.
> I
[
https://issues.apache.org/jira/browse/LUCENE-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless reassigned LUCENE-2387:
--
Assignee: Michael McCandless
> IndexWriter retains references to Readers used
[
https://issues.apache.org/jira/browse/LUCENE-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855336#action_12855336
]
Michael McCandless commented on LUCENE-2376:
Hmm indeed you have a great many
[
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855333#action_12855333
]
Michael McCandless commented on LUCENE-2386:
bq. This is a behavioral bw break
It's also slow because it repeats all the tests for each of the core
codecs (standard, sep, pulsing, intblock).
I think it's fine to reduce the number of iterations -- just make sure
there's no seed to newRandom() so the distributed testing is
"effective".
Mike
On Fri, Apr 9, 2010 at 12:43 AM,
[
https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated LUCENE-2372:
--
Attachment: LUCENE-2372.patch
Here is a first patch for the core tokenstreams. Tests not yet chan
[
https://issues.apache.org/jira/browse/LUCENE-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855306#action_12855306
]
Shivender Devarakonda commented on LUCENE-2376:
---
Please find the attached Ch