Re: Build failed in Hudson: Lucene-trunk #814

2009-05-02 Thread Michael McCandless
This was the failure:
[junit] NOTE: random seed of testcase 'testRandomIWReader' was: -5001333286299627079
[junit] - ---
[junit] Testcase: testRandomIWReader(org.apache.lucene.index.TestStressIndexing2): Caused an ERROR
[junit] MockRA

[jira] Commented: (LUCENE-1593) Optimizations to TopScoreDocCollector and TopFieldCollector

2009-05-02 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705251#action_12705251 ] Michael McCandless commented on LUCENE-1593: bq. But I think it's strange that

Question related to improving search results

2009-05-02 Thread Aditya
Hi, New to this group. Question: Generally, sites like Wikipedia have a template that every page follows. These templates contain words that occur on every page. For example, the Wikipedia template has the list of languages in the left panel. Now these words get indexed every tim

[jira] Commented: (LUCENE-1313) Realtime Search

2009-05-02 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705255#action_12705255 ] Michael McCandless commented on LUCENE-1313: {quote} IndexFileDeleter takes i

[jira] Commented: (LUCENE-1536) if a filter can support random access API, we should use it

2009-05-02 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705257#action_12705257 ] Michael McCandless commented on LUCENE-1536: {quote} Early termination (prunin

[jira] Commented: (LUCENE-1609) Eliminate synchronization contention on initial index reading in TermInfosReader ensureIndexIsRead

2009-05-02 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705279#action_12705279 ] Yonik Seeley commented on LUCENE-1609: bq. If it's only for segment merging, couldn'

[jira] Reopened: (LUCENE-1425) Add ConstantScore highlighting support to SpanScorer

2009-05-02 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reopened LUCENE-1425: needs a little fix to work with null field option and possibly default field

[jira] Updated: (LUCENE-1425) Add ConstantScore highlighting support to SpanScorer

2009-05-02 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1425: Attachment: LUCENE-1425.patch

[jira] Resolved: (LUCENE-1425) Add ConstantScore highlighting support to SpanScorer

2009-05-02 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved LUCENE-1425. Resolution: Fixed

Re: ArabicAnalyzer

2009-05-02 Thread DM Smith
On May 2, 2009, at 6:43 PM, Ahmed Al-Obaidy wrote: I've written a simple (yet useful) ArabicAnalyzer, ArabicTokenizer and ArabicFilter. It can handle Arabic text very well. I've tested it with a large set of Arabic documents and it worked OK both in terms of accuracy and performance. The

Re: Question related to improving search results

2009-05-02 Thread Erik Hatcher
I suppose you're talking about content that is indexed from web crawling. It's a messy problem. Extraneous junk needs to be filtered out and not indexed, so some form of header/footer/sidebar detection and exclusion definitely makes searching crawled pages much better. When possible, inde
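One way to approximate the header/footer/sidebar exclusion described above is a line-frequency heuristic: treat any line that shows up on nearly every crawled page as template text and drop it before indexing. The sketch below is only an illustration under that assumption; the class `TemplateLineFilter`, its method names, and the `minFraction` threshold are hypothetical and not part of Lucene or of any approach proposed in this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not a Lucene class: finds lines shared by most pages.
class TemplateLineFilter {

    /** Returns the trimmed, non-empty lines that occur on at least minFraction of the pages. */
    static Set<String> findTemplateLines(List<List<String>> pages, double minFraction) {
        Map<String, Integer> pageCounts = new HashMap<>();
        for (List<String> page : pages) {
            // Normalize first, then de-duplicate, so each line counts once per page.
            Set<String> distinct = new HashSet<>();
            for (String line : page) {
                distinct.add(line.trim());
            }
            for (String line : distinct) {
                pageCounts.merge(line, 1, Integer::sum);
            }
        }
        Set<String> template = new HashSet<>();
        for (Map.Entry<String, Integer> entry : pageCounts.entrySet()) {
            boolean frequent = entry.getValue() >= minFraction * pages.size();
            if (!entry.getKey().isEmpty() && frequent) {
                template.add(entry.getKey());
            }
        }
        return template;
    }

    /** Returns the page's lines with template lines removed, in their original order. */
    static List<String> stripTemplate(List<String> page, Set<String> template) {
        List<String> kept = new ArrayList<>();
        for (String line : page) {
            if (!template.contains(line.trim())) {
                kept.add(line);
            }
        }
        return kept;
    }
}
```

The remaining lines would then be handed to whatever indexing code is already in place. A real crawl would likely need something sturdier (for example, working on the HTML structure rather than on plain lines), so the threshold and the line granularity here are starting points to tune.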

Hudson build is back to normal: Lucene-trunk #815

2009-05-02 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/815/changes

Re: ArabicAnalyzer

2009-05-02 Thread Ahmed Al-Obaidy
Well I don't know really... but it shouldn't be hard to support it.

Re: ArabicAnalyzer

2009-05-02 Thread Robert Muir
Have you looked at the existing ar analyzer in contrib? I like your analyzer, but glancing at your code I think you can get the same behavior with the existing one (it also has stopwords & stemming, but you can disable that). Let me know if I am missing something! wrt Farsi I wouldn't recommend using