Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Mattmann, Chris A (388J)
Hey Mike, > This looks great! Thanks! > > But, the goal is to make a standalone toolkit exposing GIS functions, > right? Yep you got it! > > My original question (integrating this into Lucene/Solr) remains. Sure, I think the goal would be to provide only the Spatial aspects required by Search

Boosting on *unique* term matches without using MUST

2010-03-01 Thread tavi.nathanson
Hey everyone, Let me start with an example query: [apple orange banana] I would like to heavily boost documents containing a greater number of unique query terms (apple, orange, banana), without MUST'ing the terms; in other words, a document containing just 2 unique terms (apple, banana) should

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Doug Cutting
Ted Dunning wrote: Hadoop is a strange beast. The Hadoop core itself has fractured into three projects that have independent mailing lists but which share release dates. But without any releases yet. Is that "shared nothing"? The rationale for the Hadoop split was that the single codebase wa

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Mark Miller
On 03/01/2010 01:43 PM, Chris Hostetter wrote: (Man, why is it you guys always decide to start the monolithic "let's redesign the world" threads while i'm offline for a few days ... I figured at worst I'd 'svn up' and discover that McCandless had reimplemented all of the indexing code in Scala, b

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Ted Dunning
Hadoop is a strange beast. The Hadoop core itself has fractured into three projects that have independent mailing lists but which share release dates. On Mon, Mar 1, 2010 at 11:00 AM, Chris Hostetter wrote: > Conversely: Hadoop has lots of subprojects with divergent user > communities, but (again

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Michael McCandless
The possibility of slowing down releases is the only real concern I also share. But, I think release frequency is largely a matter of discipline :) But, digging into it, I think as long as the project keeps a "stable trunk" (something Lucene has always tried to do -- does Solr?)... then releas

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Chris Hostetter
: I've started to think that a merge of Solr and Lucene would be in the : best interest of both projects. As I already mentioned in my previous reply: I think there are incremental steps that could be made before we spend too much effort worrying if/how Solr development could be more tightly int

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Mattmann, Chris A (388J)
Hey Hoss, I support Mike's original suggestion of having a shared, independently maintained/released analysis package for Nutch/Solr/Lucene. I emphatically do not support merging Solr and Lucene in the way proposed. Hope that clarifies things, at least from me. Cheers, Chris On 3/1/10 11:43

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Michael McCandless
This looks great! But, the goal is to make a standalone toolkit exposing GIS functions, right? My original question (integrating this into Lucene/Solr) remains. EG there's a lot of good work happening now in Solr to make spatial search available. How will that find its way back to Lucene? Lu

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Mattmann, Chris A (388J)
Hey Steve, Thanks! Yep we just started, and got our mailing lists set up after the positive Incubation vote. You can read the project proposal here: http://wiki.apache.org/incubator/SpatialProposal Cheers, Chris On 3/1/10 11:41 AM, "Steven A Rowe" wrote: Hi Chris, On 03/01/2010 at 1:28 PM

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Chris Hostetter
(Man, why is it you guys always decide to start the monolithic "let's redesign the world" threads while i'm offline for a few days ... I figured at worst I'd 'svn up' and discover that McCandless had reimplemented all of the indexing code in Scala, but i certainly wasn't expecting all of this.

RE: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Steven A Rowe
Hi Chris, On 03/01/2010 at 1:28 PM, Mattmann, Chris A (388J) wrote: > http://incubator.apache.org/projects/sis.html > > We're just starting to tackle that very issue right > now...patches/ideas/contributions welcome. Patches? SVN looks empty ATM

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Michael McCandless
Also, there still seems to be a misconception on what's being proposed here. The proposal is to synchronize the development of Solr and Lucene. Ie, a single dev list, single set of committers, synchronized releases. Everything else remains the same. EG the release artifacts, user's lists, web si

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Mattmann, Chris A (388J)
I'm glad that you brought that up! :) Check out: http://incubator.apache.org/projects/sis.html We're just starting to tackle that very issue right now...patches/ideas/contributions welcome. Cheers, Chris On 3/1/10 11:25 AM, "Michael McCandless" wrote: Because the code dup with analyzers i

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Michael McCandless
Because the code dup with analyzers is only one of the problems to solve. In fact, it's the easiest of the problems to solve (that's why I proposed it, only, first). A more differentiating example is a much less mature module EG take spatial -- if Solr were its own TLP, how could spatial be

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Michael Busch
It seems like most of the people agree with these good goals but are concerned about the release cycles (including me). How can we achieve these goals without making releases more difficult? Michael On 3/1/10 9:44 AM, Michael McCandless wrote: If we don't somehow first address the code dupli

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Mattmann, Chris A (388J)
Hi Mike, I'm not sure I follow this line of thinking: how would Solr being a TLP affect the creation of a separate project/module for Analyzers any more so than it not being a TLP? Both Lucene-java and Solr (as a TLP) could depend on the newly created refactored Analysis project. Chris On 3

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Michael McCandless
On Mon, Mar 1, 2010 at 12:58 PM, Marvin Humphrey wrote: > On Mon, Mar 01, 2010 at 12:44:02PM -0500, Michael McCandless wrote: > >> But it goes beyond analyzers: I'd like to see other modules, now in >> Solr, eventually moved to Lucene, because they really are "core" >> functionality (eg facets, fu

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Marvin Humphrey
On Mon, Mar 01, 2010 at 12:44:02PM -0500, Michael McCandless wrote: > But it goes beyond analyzers: I'd like to see other modules, now in > Solr, eventually moved to Lucene, because they really are "core" > functionality (eg facets, function (and other?) queries, spatial, > maybe improvements to s

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Michael McCandless
If we don't somehow first address the code duplication across the 2 projects, making Solr a TLP will make things worse. I started here with analysis because I think that's the biggest pain point: it seemed like an obvious first step to fixing the code duplication and thus the most likely to reach

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Simon Willnauer
IMO the only downside is that we risk a longer release cycle if we merge. It requires a certain level of discipline, but hasn't this always been the case?! Anything else seems to be a win to both communities and I personally would love to see the communities coming closer again. I was working on man

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Robert Muir
but Yonik's proposal (or at least some of the ideas from it?) is attractive as it seems to solve the real problem that created the duplication in the first place, which is not limited to analyzers. On Mon, Mar 1, 2010 at 12:01 PM, Mattmann, Chris A (388J) < chris.a.mattm...@jpl.nasa.gov> wrote: >

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Mattmann, Chris A (388J)
Hi Grant, > On Mar 1, 2010, at 8:20 AM, Mattmann, Chris A (388J) wrote: > >> Hi Robert, >> >> I think my proposal (Solr->TLP) is sort of orthogonal to the whole analyzers >> issue - I was in favor, at the very least, of having a separate >> module/project/whatever that both Solr/Lucene (and what

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Grant Ingersoll
On Mar 1, 2010, at 8:20 AM, Mattmann, Chris A (388J) wrote: > Hi Robert, > > I think my proposal (Solr->TLP) is sort of orthogonal to the whole analyzers > issue - I was in favor, at the very least, of having a separate > module/project/whatever that both Solr/Lucene (and whatever project) can

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Jason Rutherglen
Here's the main points that pop up: > * Solr is Lucene's biggest direct user -- most people who use Lucene >use it through Solr -- so having it more closely integrated means >we know sooner if we broke something. > * Right now I could test whether flex breaks anything in Solr. I >ca

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Mattmann, Chris A (388J)
Hi Robert, I think my proposal (Solr->TLP) is sort of orthogonal to the whole analyzers issue - I was in favor, at the very least, of having a separate module/project/whatever that both Solr/Lucene (and whatever project) can depend on for the shared analyzer code... Cheers, Chris On 3/1/10

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Robert Muir
this will make the analyzers duplication problem even worse On Mon, Mar 1, 2010 at 11:06 AM, Mattmann, Chris A (388J) < chris.a.mattm...@jpl.nasa.gov> wrote: > Hi Mark, > > Thanks for your message. I respect your viewpoint, but I respectfully > disagree. It just seems (to me at least based on the

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Mark Miller
That's fine with me ;) I can certainly see people thinking both ways. I'm sure neither approach is a clear win in every aspect. - Mark On 03/01/2010 11:06 AM, Mattmann, Chris A (388J) wrote: Hi Mark, Thanks for your message. I respect your viewpoint, but I respectfully disagree. It just se

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Mattmann, Chris A (388J)
Hi Mark, Thanks for your message. I respect your viewpoint, but I respectfully disagree. It just seems (to me at least based on the discussion) like a TLP for Solr is the way to go. Cheers, Chris On 3/1/10 8:54 AM, "Mark Miller" wrote: On 03/01/2010 10:40 AM, Mattmann, Chris A (388J) wrote

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Mark Miller
On 03/01/2010 10:40 AM, Mattmann, Chris A (388J) wrote: Hi Mark, That would really be no real world change from how things work today. The fact is, today, Solr already operates essentially as an independent project. Well if that's the case, then it would lead me to think that it's mo

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Mattmann, Chris A (388J)
Hi Grant, > All of this needs to be discussed and it's not even clear whether any of it is > required. Lucene runs pretty smoothly from a PMC level, so I don't feel a > huge need to break something up just for the sake of it. Well that's what we're doing, discussing it right? Also, you brought u

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Mattmann, Chris A (388J)
Hi Mark, > > That would really be no real world change from how things work today. The fact > is, today, Solr already operates essentially as an independent project. Well if that's the case, then it would lead me to think that it's more of a TLP more than anything else per best practices. > The

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Grant Ingersoll
All of this needs to be discussed and it's not even clear whether any of it is required. Lucene runs pretty smoothly from a PMC level, so I don't feel a huge need to break something up just for the sake of it. At any rate, I doubt it makes much sense for some subs to be split out, but Mahout

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Mark Miller
{quote}If so, that may be the best of all worlds, allowing project independence, but also not following the Apache "antipattern" as Doug put it...{quote} That would really be no real world change from how things work today. The fact is, today, Solr already operates essentially as an independent

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Mattmann, Chris A (388J)
Hey Grant, I'd like to explore this -- does this imply that the Lucene sub-projects will go away and Lucene will turn into Lucene-java and maintain its Apache TLP, and then you'd have say, solr.apache.org, tika.apache.org, mahout.apache.org (already started), etc. etc.? If so, that may be the best

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Grant Ingersoll
On Mar 1, 2010, at 6:28 AM, Grant Ingersoll wrote: > > On Feb 28, 2010, at 9:05 PM, Michael Busch wrote: > >> On 2/28/10 4:30 PM, Grant Ingersoll wrote: >>> >> >> I was really happy about the original idea of having a separate analyzer >> module (or subproject, library, whatever name it'd ha

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Grant Ingersoll
On Feb 28, 2010, at 9:05 PM, Michael Busch wrote: > On 2/28/10 4:30 PM, Grant Ingersoll wrote: >> > > I was really happy about the original idea of having a separate analyzer > module (or subproject, library, whatever name it'd have), because analysis > seems quite separate from indexing/sear

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Mark Miller
On 2/28/10 9:05 PM, Michael Busch wrote: Think about changes like per-segment search or the new TokenStream API and how difficult and time consuming they were for core and contrib already. 1. It's not just more work for the same Lucene devs - there would be more devs with a merge to work on t

Finding minimum and maximum value in a field

2010-03-01 Thread Ranga
Hi All, I want to find the minimum and maximum value in a field. How can I achieve this in Lucene.Net? I am currently using Lucene.Net version 2.4.0. Can anyone help me? Thanks in advance -Ranga