[jira] Resolved: (LUCENE-1371) Add Searcher.search(Query, int)

2008-09-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1371. Resolution: Fixed > Add Searcher.search(Query, int) >

[jira] Commented: (LUCENE-1126) Simplify StandardTokenizer JFlex grammar

2008-09-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627943#action_12627943 ] Michael McCandless commented on LUCENE-1126: Hmm -- I'm now seeing a failure

[jira] Created: (LUCENE-1374) Merging of compressed string Fields may hit NPE

2008-09-03 Thread Michael McCandless (JIRA)
Merging of compressed string Fields may hit NPE --- Key: LUCENE-1374 URL: https://issues.apache.org/jira/browse/LUCENE-1374 Project: Lucene - Java Issue Type: Bug Components: Index Af

[jira] Updated: (LUCENE-1374) Merging of compressed string Fields may hit NPE

2008-09-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1374: --- Attachment: LUCENE-1374.patch Attached patch that fixes AbstractField's getBinaryVal

Re: Moving SweetSpotSimilarity out of contrib

2008-09-03 Thread Nadav Har'El
On Tue, Sep 02, 2008, Chris Hostetter wrote about "Re: Moving SweetSpotSimilarity out of contrib": > > : From a legal standpoint, whenever we need to use open-source code, somebody > : has to inspect the code and 'approve' it. This inspection makes sure there's > : no use of 3rd party libraries,

Re: Moving SweetSpotSimilarity out of contrib

2008-09-03 Thread Shai Erera
Thanks all for the "legal" comments. Can we consider moving the SweetSpotSimilarity to "core" because of the quality improvements it introduces to search? I tried to emphasize that that's the main reason, but perhaps I didn't do a good job at that, since the discussion has turned into a legal issu

Re: Moving SweetSpotSimilarity out of contrib

2008-09-03 Thread Mark Miller
I think it's a fair question that, regardless of the legal mumbo jumbo provoking it, can be considered on the merits that it should be - is it something important enough to bulk up the core with the trade off being more people will find it helpful and can use it with slightly less hassle? I hav

[jira] Commented: (LUCENE-1373) Most of the contributed Analyzers suffer from invalid recognition of acronyms.

2008-09-03 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627990#action_12627990 ] Grant Ingersoll commented on LUCENE-1373: - I think you should mirror what is done

Re: Moving SweetSpotSimilarity out of contrib

2008-09-03 Thread mark harwood
Not tried SweetSpot so can't comment on worthiness of moving to core but agree with the principle that we can't let the hassles of a company's "due diligence" testing dictate the shape of core vs contrib. For anyone concerned with the overhead of doing these checks a company/product of potentia

[jira] Resolved: (LUCENE-1374) Merging of compressed string Fields may hit NPE

2008-09-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1374. Resolution: Fixed Committed revision 691617. > Merging of compressed string Field

RE: Multi Phrase Search at the Beginning of a field

2008-09-03 Thread ext-vinay.thota
Excellent, it worked :) Thank you Tori!! Regards, Vinay >-----Original Message----- >From: ext Andraz Tori [mailto:[EMAIL PROTECTED] >Sent: 01 September, 2008 16:39 >To: java-dev@lucene.apache.org >Subject: Re: Multi Phrase Search at the Beginning of a field > >You can use standard trick. > >I

[jira] Commented: (LUCENE-532) [PATCH] Indexing on Hadoop distributed file system

2008-09-03 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628025#action_12628025 ] Ning Li commented on LUCENE-532: Is the use of seek and write in ChecksumIndexOutput making

Can I filter the results returned by IndexReader.terms(term)?

2008-09-03 Thread AdrianPillinger
I am using IndexReader.terms(term) to produce term suggestions to my users as they type. In many cases the user is searching lucene with a filter applied, for example a date range. Is there any way I can get a list of terms in the index that are contained within a subset of the documents by a gi

Re: Can I filter the results returned by IndexReader.terms(term)?

2008-09-03 Thread mark harwood
One way is to read TermDocs for each candidate term and see if they are in your filter - but that sounds like a lot of disk IO to me when responding to individual user keystrokes. You can use "skip" to avoid reading all term docs when you know what is in the filter but it all seems a bit costly.
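The approach described above (walk each candidate term's documents and use "skip" to jump to the next document the filter allows) can be sketched in plain Java. This is only an illustration under stated assumptions: the sorted `int[]` postings, the `BitSet` filter, and the names `FilteredTermSuggester`, `matchesFilter` and `suggest` are hypothetical stand-ins, not Lucene's TermDocs or Filter API.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch: postings are sorted doc-id arrays, the filter is a BitSet.
final class FilteredTermSuggester {

    // Index of the first element >= key in sorted[from..], or sorted.length if none.
    static int lowerBound(int[] sorted, int from, int key) {
        int lo = from, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // True if at least one doc id in the term's postings is accepted by the filter.
    // Both sides leapfrog: postings skip to the filter's next doc, and vice versa.
    static boolean matchesFilter(int[] postings, BitSet filter) {
        int i = 0;
        int candidate = filter.nextSetBit(0);
        while (candidate >= 0 && i < postings.length) {
            i = lowerBound(postings, i, candidate);      // "skip" in the postings
            if (i == postings.length) return false;
            if (postings[i] == candidate) return true;
            candidate = filter.nextSetBit(postings[i]);  // "skip" in the filter
        }
        return false;
    }

    // Terms starting with prefix that occur in at least one filtered document.
    static List<String> suggest(String prefix, SortedMap<String, int[]> index, BitSet filter) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, int[]> e : index.tailMap(prefix).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break;   // left the prefix range
            if (matchesFilter(e.getValue(), filter)) out.add(e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        SortedMap<String, int[]> index = new TreeMap<>();
        index.put("apple", new int[] {0, 4});
        index.put("apply", new int[] {2});
        BitSet filter = new BitSet();
        filter.set(3);
        filter.set(4);
        System.out.println(suggest("app", index, filter));
    }
}
```

Even with skipping, every candidate term still costs at least one postings lookup, which matches the concern about doing this on each keystroke.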

[jira] Commented: (LUCENE-1374) Merging of compressed string Fields may hit NPE

2008-09-03 Thread Chris Harris (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628055#action_12628055 ] Chris Harris commented on LUCENE-1374: -- "ant test" on 691617 for me fails on the foll

[jira] Issue Comment Edited: (LUCENE-1374) Merging of compressed string Fields may hit NPE

2008-09-03 Thread Chris Harris (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628055#action_12628055 ] ryguasu edited comment on LUCENE-1374 at 9/3/08 10:07 AM: --- "

Re: Can I filter the results returned by IndexReader.terms(term)?

2008-09-03 Thread Paul Elschot
Another way is to use the trunk, where Scorer is a subclass of DocIdSetIterator, which is returned by a Filter. This allows creating a TermFilter that returns a TermScorer (which is based on TermEnum internally.) Try wrapping it in a CachingWrapperFilter when it needs to be reused. Finally, have

[jira] Commented: (LUCENE-1374) Merging of compressed string Fields may hit NPE

2008-09-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628067#action_12628067 ] Michael McCandless commented on LUCENE-1374: Woops, you're right: I too see th

Re: Moving SweetSpotSimilarity out of contrib

2008-09-03 Thread Chris Hostetter
: saw, the distinction and rules are not quite clear. I would think though, if : the new Similarity is really that much better than the old, it might actually : benefit in core. There is no doubt core gets more attention on both the user : and developer side, and important pieces with general usag

Re: Moving SweetSpotSimilarity out of contrib

2008-09-03 Thread Mark Miller
I would agree with you if I was wrong about the contrib/core attention thing, but I don't think I am. It seems as if you have been arguing that contrib is really just an extension of core, on par with core, but just in different libs, and to keep core lean and mean, anything not needed in core

[jira] Commented: (LUCENE-1313) Ocean Realtime Search

2008-09-03 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628092#action_12628092 ] Jason Rutherglen commented on LUCENE-1313: -- Is there a good place to place the ja

RE: Moving SweetSpotSimilarity out of contrib

2008-09-03 Thread Steven A Rowe
On 09/03/2008 at 2:00 PM, Chris Hostetter wrote: > On 09/03/2008 at 8:40 AM, Mark Miller wrote: > > I haven't used it myself, so I won't guess (too much ), but the > > question to me seems to be, is SweetSpot important enough to move to > > core? Are there enough good reasons? And even if so, is it

Re: Moving SweetSpotSimilarity out of contrib

2008-09-03 Thread Michael McCandless
Another important driver is the "out-of-the-box experience". It's crucial that Lucene has good starting defaults for everything because many developers will stick with these defaults and won't discover the wiki page that says you need to do X, Y and Z to get better relevance, indexing speed, sear

solr2: Onward and Upward

2008-09-03 Thread Yonik Seeley
If you've considered Solr in the past, but for some reason it didn't meet your needs, we'd love to hear from you over on solr-dev. We're starting to do some forward looking architecture work on the next major version of Solr, so let us know what ideas you have and what you'd like to see! solr-dev

Realtime Search for Social Networks Collaboration

2008-09-03 Thread Jason Rutherglen
Hello all, I don't mean this to sound like a solicitation. I've been working on realtime search and created some Lucene patches etc. I am wondering if there are social networks (or anyone else) out there who would be interested in collaborating with Apache on realtime search to get it to the poi

[jira] Commented: (LUCENE-1126) Simplify StandardTokenizer JFlex grammar

2008-09-03 Thread Steven Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628106#action_12628106 ] Steven Rowe commented on LUCENE-1126: - Yeah, I see this too. The issue is that the en

[jira] Reopened: (LUCENE-1320) ShingleMatrixFilter, a three dimensional permutating shingle filter

2008-09-03 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reopened LUCENE-1320: - Lucene Fields: [Patch Available] (was: [Patch Available, New]) Despite the fact that we

[jira] Updated: (LUCENE-1320) ShingleMatrixFilter, a three dimensional permutating shingle filter

2008-09-03 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-1320: Priority: Blocker (was: Major) I'm marking this as a blocker for 2.4 based on the Java 1.

Re: Moving SweetSpotSimilarity out of contrib

2008-09-03 Thread markharw00d
>>Another important driver is the "out-of-the-box experience". >>we need a "standard distro" ...which would be the core plus cherry-pick certain important contrib modules (highlighter, >> SweetSpotSimilarity,snowball, spellchecker, etc.) and bundle them together. Is that not Solr, or at least

Re: Realtime Search for Social Networks Collaboration

2008-09-03 Thread Yonik Seeley
On Wed, Sep 3, 2008 at 3:20 PM, Jason Rutherglen <[EMAIL PROTECTED]> wrote: > I am wondering > if there are social networks (or anyone else) out there who would be > interested in collaborating with Apache on realtime search to get it > to the point it can be used in production. Good timing Jason,

Re: Moving SweetSpotSimilarity out of contrib

2008-09-03 Thread Michael McCandless
markharw00d wrote: >>Another important driver is the "out-of-the-box experience". >>we need a "standard distro" ...which would be the core plus cherry- pick certain important contrib modules (highlighter, >> SweetSpotSimilarity,snowball, spellchecker, etc.) and bundle them together. Is that

Re: Moving SweetSpotSimilarity out of contrib

2008-09-03 Thread Yonik Seeley
On Wed, Sep 3, 2008 at 4:55 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: >> I suspect any attempts at "bundling" Lucene code may snowball until you've >> rebuilt Solr. > > Yeah I guess it is... though Solr includes the whole webapp too, whereas I > think there's a natural bundle that wouldn't

Re: Moving SweetSpotSimilarity out of contrib

2008-09-03 Thread Grant Ingersoll
On Sep 3, 2008, at 3:00 PM, Michael McCandless wrote: Obviously we can't default everything perfectly since at some point there are hard tradeoffs to be made and every app is different, but if SweetSpotSimilarity really gives better relevance for many/most apps, and doesn't have any downsides (

[jira] Commented: (LUCENE-1320) ShingleMatrixFilter, a three dimensional permutating shingle filter

2008-09-03 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628132#action_12628132 ] Karl Wettin commented on LUCENE-1320: - OK. Either remove it or place it in some altern

[jira] Commented: (LUCENE-1131) Add numDeletedDocs to IndexReader

2008-09-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628154#action_12628154 ] Michael McCandless commented on LUCENE-1131: Otis is this one ready to go in?

[jira] Resolved: (LUCENE-1350) Filters which are "consumers" should not reset the payload or flags and should better reuse the token

2008-09-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1350. Resolution: Duplicate Fix Version/s: (was: 2.3.3) Lucene Fields: [Ne

[jira] Commented: (LUCENE-1350) Filters which are "consumers" should not reset the payload or flags and should better reuse the token

2008-09-03 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628158#action_12628158 ] Doron Cohen commented on LUCENE-1350: - Yes it is a dup, thanks Mike for taking care of

[jira] Commented: (LUCENE-1356) Allow easy extensions of TopDocCollector

2008-09-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628163#action_12628163 ] Michael McCandless commented on LUCENE-1356: Doron is this one ready to go in?

Re: Moving SweetSpotSimilarity out of contrib

2008-09-03 Thread Doron Cohen
My thought was to move SSS to core as a step towards making it the default, if and when there is more evidence it is better than current default - it just felt right as a cautious step - I mean first move it to core so that it is more exposed and used, and only after a while, maybe, if there are mos

Re: Realtime Search for Social Networks Collaboration

2008-09-03 Thread Jason Rutherglen
Hi Yonik, The SOLR 2 list looks good. The question is, who is going to do the work? I tried to simplify the scope of Ocean as much as possible to make it possible (and slowly at that over time) for me to eventually finish what is mentioned on the wiki. I think SOLR is very cool and was major

[jira] Commented: (LUCENE-1356) Allow easy extensions of TopDocCollector

2008-09-03 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628193#action_12628193 ] Doron Cohen commented on LUCENE-1356: - It is, applies cleanly and seems correct. Will

[jira] Resolved: (LUCENE-1356) Allow easy extensions of TopDocCollector

2008-09-03 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-1356. - Resolution: Fixed Lucene Fields: [New, Patch Available] (was: [Patch Available, New]) Th

[jira] Updated: (LUCENE-989) Statistics from ValueSourceQuery

2008-09-03 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-989: --- Fix Version/s: (was: 2.4) 3.0 Assignee: (was: Doron Cohen) This s

[jira] Updated: (LUCENE-1081) Remove the "Experimental" warnings from search.function package

2008-09-03 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-1081: Fix Version/s: (was: 2.4) 3.0 Assignee: (was: Doron Cohen) Wil

[jira] Updated: (LUCENE-1085) search.function should support all capabilities of Solr's search.function

2008-09-03 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-1085: Fix Version/s: (was: 2.4) 3.0 Assignee: (was: Doron Cohen) > s

[jira] Issue Comment Edited: (LUCENE-989) Statistics from ValueSourceQuery

2008-09-03 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628197#action_12628197 ] doronc edited comment on LUCENE-989 at 9/3/08 4:38 PM: This sho

Re: [jira] Commented: (LUCENE-1373) Most of the contributed Analyzers suffer from invalid recognition of acronyms.

2008-09-03 Thread Mark Lassau
Grant Ingersoll (JIRA) wrote: Of course, it's still a bit weird, b/c in your case the type value is going to be set to ACRONYM, when your example is clearly not one. This suggests to me that the grammar needs to be revisited, but that can wait until 3.0 I believe. Grant, not sure what you

Re: [jira] Commented: (LUCENE-1320) ShingleMatrixFilter, a three dimensional permutating shingle filter

2008-09-03 Thread Grant Ingersoll
Or just remove the generics, right? On Sep 3, 2008, at 5:09 PM, Karl Wettin (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628132 #action_12628132 ] Karl Wettin commented on LUCENE

Re: [jira] Commented: (LUCENE-1373) Most of the contributed Analyzers suffer from invalid recognition of acronyms.

2008-09-03 Thread Shai Erera
I think we should distinguish between what is a bug and what is an attempt of the tokenizer to produce a meaningful token. When the tokenizer outputs a HOST or ACRONYM token type, there's nothing that prevents you from putting a filter after the tokenizer that will use a UIMA Annotator (for example

Is the COMPANY rule in StandardTokenizer valid?

2008-09-03 Thread Shai Erera
Hi The COMPANY rule in StandardTokenizer is defined like this: // Company names like AT&T and [EMAIL PROTECTED] COMPANY= {ALPHA} ("&"|"@") {ALPHA} While this works perfectly for AT&T and [EMAIL PROTECTED], it doesn't work well for strings like widget&javascript&html. Now, the latter is obvio
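To make the COMPANY rule's behaviour on the two example strings concrete, here is a small sketch that approximates the rule with java.util.regex. It assumes {ALPHA} means "one or more letters", and it uses a regex scan rather than the JFlex-generated tokenizer, so it only shows what the pattern can match, not the exact tokens StandardTokenizer emits. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical approximation of: COMPANY = {ALPHA} ("&"|"@") {ALPHA}
final class CompanyRuleSketch {

    // Assumption: {ALPHA} is one or more letters.
    private static final Pattern COMPANY = Pattern.compile("\\p{L}+[&@]\\p{L}+");

    // All non-overlapping substrings of text that the approximated rule matches.
    static List<String> companyMatches(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = COMPANY.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(companyMatches("AT&T"));
        System.out.println(companyMatches("widget&javascript&html"));
    }
}
```

Under these assumptions the pattern matches all of `AT&T`, but in `widget&javascript&html` it matches only `widget&javascript` and leaves `&html` over. The rule allows exactly one separator, so a chain of `&`-joined words is split at an arbitrary point.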