Re: [ANNOUNCE] Apache Solr 7.1.0 released

2017-10-17 Thread Yonik Seeley
It pointed to 7.1.0 for me; perhaps a browser cache issue?
Anyway, you can go directly as well:
http://www.apache.org/dyn/closer.lua/lucene/solr/7.1.0

-Yonik


On Tue, Oct 17, 2017 at 11:25 AM, Susheel Kumar  wrote:
> Thanks, Shalin.
>
> But the download mirror still has 7.0.1 not 7.1.0.
>
> http://www.apache.org/dyn/closer.lua/lucene/solr/7.0.1
>
>
>
>
> On Tue, Oct 17, 2017 at 5:28 AM, Shalin Shekhar Mangar
>  wrote:
>>
>> 17 October 2017, Apache Solr™ 7.1.0 available
>>
>> The Lucene PMC is pleased to announce the release of Apache Solr 7.1.0
>>
>> Solr is the popular, blazing fast, open source NoSQL search platform
>> from the Apache Lucene project. Its major features include powerful
>> full-text search, hit highlighting, faceted search, dynamic
>> clustering, database integration, rich document (e.g., Word, PDF)
>> handling, and geospatial search. Solr is highly scalable, providing
>> fault tolerant distributed search and indexing, and powers the search
>> and navigation features of many of the world's largest internet sites.
>>
>> Solr 7.1.0 is available for immediate download at:
>>
>> http://lucene.apache.org/solr/mirrors-solr-latest-redir.html
>>
>> See http://lucene.apache.org/solr/7_1_0/changes/Changes.html for a
>> full list of details.
>>
>> Solr 7.1.0 Release Highlights:
>>
>> * Critical Security Update: Fix for CVE-2017-12629 which is a working
>> 0-day exploit reported on the public mailing list. See
>> https://s.apache.org/FJDl
>>
>> * Auto-scaling: Solr can now move replicas automatically when a new
>> node is added or an existing node is removed using the auto scaling
>> policy framework introduced in 7.0
>>
>> * Auto-scaling: The 'autoAddReplicas' feature which was limited to
>> shared file systems is now available for all file systems. It has been
>> ported to use the new autoscaling framework internally.
>>
>> * Auto-scaling: New set-trigger, remove-trigger, set-listener,
>> remove-listener, suspend-trigger, resume-trigger APIs
>>
>> * Auto-scaling: New /autoscaling/history API to show past autoscaling
>> actions and cluster events
>>
>> * New JSON based Query DSL for Solr that extends JSON Request API to
>> also support all query parsers and their nested parameters
>>
>> * JSON Facet API: min/max aggregations are now supported on
>> single-valued date fields
>>
>> * Lucene's Geo3D (surface of sphere & ellipsoid) is now supported on
>> spatial RPT fields by setting spatialContextFactory="Geo3D".
>> Furthermore, this is the first time Solr has out of the box support
>> for polygons
>>
>> * Expanded support for statistical stream evaluators such as various
>> distributions, rank correlations, distances and more.
>>
>> * Multiple other optimizations and bug fixes
>>
>> You are encouraged to thoroughly read the "Upgrade Notes" at
>> http://lucene.apache.org/solr/7_1_0/changes/Changes.html or in the
>> CHANGES.txt file accompanying the release.
>>
>> Solr 7.1 also includes many other new features as well as numerous
>> optimizations and bugfixes of the corresponding Apache Lucene release.
>>
>> Please report any feedback to the mailing lists
>> (http://lucene.apache.org/solr/discussion.html)
>>
>> Note: The Apache Software Foundation uses an extensive mirroring
>> network for distributing releases. It is possible that the mirror you
>> are using may not have replicated the release yet. If that is the
>> case, please try another mirror. This also goes for Maven access.
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>>
>


Re: Welcome David Smiley to the PMC

2013-03-18 Thread Yonik Seeley
On Mon, Mar 18, 2013 at 10:28 AM, Smiley, David W. dsmi...@mitre.org wrote:
 Thanks Steve, and to the rest of the PMC members!  I hope to see many of
 you at Lucene/Solr Revolution in May.

+1

Welcome!

-Yonik
http://lucidworks.com


Re: Welcome Alan Woodward as Lucene/Solr committer

2012-10-17 Thread Yonik Seeley
Congrats and welcome, Alan!

-Yonik
http://lucidworks.com


On Wed, Oct 17, 2012 at 1:36 AM, Robert Muir rcm...@gmail.com wrote:
 I'm pleased to announce that the Lucene PMC has voted Alan as a
 Lucene/Solr committer.

 Alan has been contributing patches on various tricky stuff: positions
 iterators, span queries, highlighters, codecs, and so on.

 Alan: it's tradition that you introduce yourself with your background.

 I think your account is fully working and you should be able to add
 yourself to the who we are page on the website as well.

 Congratulations!


Re: SOLR Sorting algorithm

2011-09-06 Thread Yonik Seeley
When sorting, ties are broken by the internal document id.  This gives
us a stable (if somewhat arbitrary) sort ordering.
If you want score to be the tiebreaker, you can specify it as the
secondary sort.
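
To make the tiebreak concrete, here is a minimal Python sketch. It is a toy model, not Solr code: the docid values are hypothetical internal ids that I am assuming follow index order, and the documents are the ones from the example results quoted below.

```python
# Toy model of the tiebreak described above; this is not Solr code.
# "docid" is a hypothetical internal document id, assumed to follow index order.
docs = [
    {"docid": 0, "name": "Product 1", "sortSequence": 1, "score": 1.52221},
    {"docid": 1, "name": "Product 5", "sortSequence": 1, "score": 1.52221},
    {"docid": 2, "name": "Product 3", "sortSequence": 1, "score": 1.53112},
    {"docid": 3, "name": "Product 2", "sortSequence": 2, "score": 1.51112},
]


def sort_names(documents, key):
    """Return document names ordered by the given sort key."""
    return [d["name"] for d in sorted(documents, key=key)]


# sort=sortSequence asc
# Ties on sortSequence fall back to the internal doc id.
by_field = sort_names(docs, lambda d: (d["sortSequence"], d["docid"]))

# sort=sortSequence asc, score desc
# Score breaks ties first; the internal doc id is still the last resort.
by_field_then_score = sort_names(
    docs, lambda d: (d["sortSequence"], -d["score"], d["docid"])
)

print(by_field)
print(by_field_then_score)
```

With only the primary sort, the three documents with sortSequence 1 come back in doc id order regardless of score, which matches the ordering reported in the question. Adding score as the secondary sort moves Product 3 to the front of that group.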

-Yonik
http://www.lucene-eurocon.com - The Lucene/Solr User Conference


On Tue, Sep 6, 2011 at 1:49 PM, BrianK brian.krue...@bonton.com wrote:
 We are running a SOLR query and are specifying a custom sort field to sort
 our results based on our sort field rather than the default score.  For the
 most part, the results are sorting by our field, but SOLR appears to be
 sorting the results by some other field or algorithm and it's not the score
 field.  Our documents are populated from a database table and when running a
 similar query/sort against the database we don't get the same sort sequence
 as SOLR even though the sort is on the same field in both systems.
 IMPORTANT NOTE: the sort field/results field is not unique, the search
 results in question have the same value (1 in this case), but the results
 always come out in the same order.

 Can someone explain or point me in the right direction to determine how SOLR
 sorts results beyond the field specified in our query string.

 Example Query: q=Kitchen Products&sort=sortSequence asc

 Example Results:
 name: Product 1
 sortSequence: 1
 score: 1.52221

 name: Product 5
 sortSequence: 1
 score: 1.52221

 name: Product 3
 sortSequence: 1
 score: 1.53112

 name: Product 2
 sortSequence: 2
 score: 1.51112

 etc.

 Are there hidden fields like document date, creation date, or other field
 that is not visible that might be factored into a sort?

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SOLR-Sorting-algorithm-tp3314295p3314295.html
 Sent from the Lucene - General mailing list archive at Nabble.com.



Re: SOLR Sorting algorithm

2011-09-06 Thread Yonik Seeley
On Tue, Sep 6, 2011 at 4:48 PM, BrianK brian.krue...@bonton.com wrote:
 by internal document id you are referring to a field that is not visible to
 us.  We have an id field, I assume this is not the document id field you
 are talking about.  Assuming document id is not available to us, is it
 sorting this ascending/descending and is the document id simply a
 sequential number assigned as documents are loaded/indexed by solr?

Correct - it's not a field, but just the internal index or ord into
the internal data structures.
It's also transient, in that it can change across commit calls (either by
deleted documents being squeezed out, or by non-adjacent segments being merged).
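
A rough way to picture why the id is transient, as a Python toy model (an assumption-laden sketch, not how Lucene stores anything): treat the index as a list of segments, let the internal id be a document's position across them, and let a merge drop deleted entries. The keys A-D and the helper names are made up for illustration, and reordering from non-adjacent merges is not modeled.

```python
# Toy model only: an "index" is a list of segments, and each segment is a list
# of (unique_key, deleted) entries. The internal doc id is simply the position
# of an entry counted across all segments. None of this is Lucene's real layout.
def internal_ids(segments):
    """Map each live document's key to its current positional id."""
    ids = {}
    position = 0
    for segment in segments:
        for key, deleted in segment:
            if not deleted:
                ids[key] = position
            position += 1  # deleted docs still occupy a slot until merged away
    return ids


def merge(segments):
    """Rewrite all segments into one, squeezing out deleted documents."""
    merged = [
        (key, False)
        for segment in segments
        for key, deleted in segment
        if not deleted
    ]
    return [merged]


segments = [[("A", False), ("B", True), ("C", False)], [("D", False)]]
before = internal_ids(segments)
after = internal_ids(merge(segments))

print(before)
print(after)
```

Documents C and D keep their content but get new ids once the deleted document B is squeezed out, so the id is only safe to rely on as a tiebreaker within a single view of the index.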

-Yonik
http://www.lucene-eurocon.com - The Lucene/Solr User Conference


[VOTE] Create Solr TLP

2011-04-26 Thread Yonik Seeley
A single merged project works only when people are relatively on the same page,
and when people feel it's mutually beneficial.  Recent events make it
clear that that
is no longer the case.

Improvements to Solr have been recently blocked and reverted on the
grounds that the new functionality was not immediately available to
non-Solr users.
This was obviously never part of the original idea (well actually - it was
considered but rejected as too onerous).  But the past doesn't matter as
much as the present - about how people chose to act and interpret
things today.

https://issues.apache.org/jira/browse/SOLR-2272
http://markmail.org/message/unrvjfudcbgqatsy

Some people warned us against merging at the start, and I guess it
turns out they were right.

I no longer feel it's in Solr's best interests to remain under the same
PMC as Lucene-Java, and I know some other committers who have said
they feel like Lucene got the short end of the stick.  But rather than
arguing about who's right (maybe both?) since enough of us feel it's no longer
mutually beneficial, we should stop fighting and just go our separate
ways.

Please VOTE to create a new Apache Solr TLP.

Here's my +1

-Yonik


Re: [DISCUSS] Lucene Java - Lucene Core

2010-11-09 Thread Yonik Seeley
On Mon, Nov 8, 2010 at 1:02 PM, Uwe Schindler u...@thetaphi.de wrote:
 Die, Contrib, die! We will hopefully only have modules soon?

 +1 to Lucene Core, Lucene Modules and Solr. As qualifier we can use 
 for Java to differentiate from .NET. But in my opinion, all others should 
 be separate projects and the main project is called Lucene Family for Java 
 (without family but I like it).


Right - Lucene Core, Modules, Solr are all the same project.
We're only coming up with these different labels because some of the
parts may be downloaded and/or documented separately (and have a
pre-existing brand associated with that).

So distinct labels make sense for Lucene and Solr, but not for contrib
and not for Modules (at least not yet).

I understand Steven's concern too - the download for Lucene Core is
likely to have contrib stuff for some time to come, so there would
logically be core and contrib parts to Lucene Core.  Although in
practice, I don't think that little bit of ambiguity is likely to
cause problems.

-Yonik
http://www.lucidimagination.com


Re: New LuSolr trunk (was: RE: (LUCENE-2297) IndexWriter should let you optionally enable reader pooling)

2010-03-23 Thread Yonik Seeley
For Solr, we can just move the current trunk to a 15 branch.

-Yonik

On Tue, Mar 23, 2010 at 9:39 AM, Grant Ingersoll gsing...@apache.org wrote:

 On Mar 22, 2010, at 8:27 AM, Uwe Schindler wrote:

 Hi all,

 the discussion where to do the development after the merge, now gets actual:

 Currently a lusolr test-trunk is done as a branch inside solr 
 (https://svn.apache.org/repos/asf/lucene/solr/branches/newtrunk). The 
 question is, where to put the main development and how to switch, so 
 non-developers that have checkouts of solr and/or lucene will see the change 
 and do not send us outdated patches.

 I propose to do the following:

 - Start a new top-level project folder inside /lucene root svn folder: 
 https://svn.apache.org/repos/asf/lucene/lusolr (please see lusolr as a 
 placeholder name) and add branches, tags subfolders to it. Do not create 
 trunk and do this together with the next step.

 OK, I created https://svn.apache.org/repos/asf/lucene/dev/ and given 
 appropriate rights.  Uwe, you can now do the rest of the move.  Once you've 
 done it, let me know and I can make sure to add back the contrib rights.

 - Move the branch from 
 https://svn.apache.org/repos/asf/lucene/solr/branches/newtrunk to this new 
 directory as trunk
 - For lucene flexible indexing, create a corresponding flex branch there and 
 svn copy it from current new trunk. Merge the lucene flex changes into it. 
 Alternatively, land flex now. Or simply do svn copy of current flex branch 
 instead of merging (may be less work).
 - Do the same for possible solr branches in development
 - Create a tag in the lucene tags folder and in the solr tags folder with 
 the current state of each trunk. After that delete all contents from old 
 trunk in solr and lucene and place a readme file pointing developers to the 
 new merged trunk folder (for both old trunks). This last step is important, 
 else people who checkout the old trunk will soon see a very outdated view 
 and may send us outdated patches in JIRA. When the contents of old-trunk 
 disappear it's obvious to them what happened. If they had already some 
 changes in their checkout, the svn client will keep the changed files as 
 unversioned (after upgrade). The history keeps available, so it's also 
 possible to checkout an older version from trunk using @rev or -r rev. I did 
 a similar step with some backwards compatibility changes in lucene (add a 
 README).

 Uwe

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de


 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Monday, March 22, 2010 11:37 AM
 To: java-...@lucene.apache.org
 Subject: Re: (LUCENE-2297) IndexWriter should let you optionally enable
 reader pooling

 I think we should.

 It (newtrunk) was created to test Hoss's side-by-side proposal, and
 that approach looks to be working very well.

 Up until now we've been committing to the old trunk and then
 systematically merging over to newtrunk.  I think we should now flip
 that, ie, commit to newtrunk and only merge back to the old trunk if
 for some strange reason it's needed.

 Mike

 On Mon, Mar 22, 2010 at 6:32 AM, Uwe Schindler u...@thetaphi.de wrote:
 Are we now only working on newtrunk?

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de

 -Original Message-
 From: Michael McCandless (JIRA) [mailto:j...@apache.org]
 Sent: Monday, March 22, 2010 11:22 AM
 To: java-...@lucene.apache.org
 Subject: [jira] Resolved: (LUCENE-2297) IndexWriter should let you
 optionally enable reader pooling


     [ https://issues.apache.org/jira/browse/LUCENE-
 2297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-
 tabpanel
 ]

 Michael McCandless resolved LUCENE-2297.
 

    Resolution: Fixed

 Fixed on newtrunk.

 IndexWriter should let you optionally enable reader pooling
 ---

                Key: LUCENE-2297
                URL: https://issues.apache.org/jira/browse/LUCENE-
 2297
            Project: Lucene - Java
         Issue Type: Improvement
           Reporter: Michael McCandless
           Priority: Minor
            Fix For: 3.1

        Attachments: LUCENE-2297.patch


 For apps using a large index and frequently need to commit and
 resolve deletes, the cost of opening the SegmentReaders on demand
 for
 every commit can be prohibitive.
 We can already pool readers (NRT does so), but, we only turn it on
 if
 NRT readers are in use.
 We should allow separate control.
 We should do this after LUCENE-2294.


Re: Branding Solr+Lucene

2010-03-22 Thread Yonik Seeley
On Mon, Mar 22, 2010 at 2:20 PM, Ryan McKinley ryan...@gmail.com wrote:
 I'm confused... what is the need for a new name?  The only place where
 there is a conflict is in the top level svn tree...

Agree, no need to re-brand.

 What about something general like:
 https://svn.apache.org/repos/asf/lucene/dev
 or
 https://svn.apache.org/repos/asf/lucene/project

Hmmm, that one isn't bad.

-Yonik


Re: [VOTE] merge lucene/solr development (take 3)

2010-03-14 Thread Yonik Seeley
On Sun, Mar 14, 2010 at 11:02 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Would it be correct to say that a subset of Lucene/Solr committers discussed 
 the proposal internally/offline (i.e. not on MLs) before proposing it?

Nope. Where did this idea come from?

I'm quite sure my proposal (my original we-should-just-merge email)
was a surprise to everyone.  I discussed it with no one previously.
All of the related discussions in previous months had been about
pulling stuff out of Solr, why that was disadvantageous to Solr, etc,
etc.

-Yonik


Re: [VOTE] merge lucene/solr development (take 3)

2010-03-14 Thread Yonik Seeley
On Sun, Mar 14, 2010 at 2:36 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
  if I understand things correctly, poaching is only needed when the code is 
 not committed in the
 right project/location to begin with.

That is the problem though - Solr should be allowed to keep whatever
code was written under its control, w/o pressure to put it in Lucene
(and often out of reach).  And Lucene should be able to poach what it
wants from Solr.  But with the projects already half overlapping... it
was a recipe for conflict.

We've already had conflicts about this in the past.  The conflicts
were either going to get worse over time, esp with Solr not on
Lucene's trunk, or we were going to merge.  We've decided to tear down
the artificial wall and work together.

Some people suggest that this could have worked w/o merging.  I
disagreed, as I think the majority of those voting +1 disagreed.

Not sure who's following lucene-dev and solr-dev, but the committers
have already been merged. We're not standing still...

-Yonik


Re: [VOTE] merge lucene/solr development (take 3)

2010-03-14 Thread Yonik Seeley
On Sun, Mar 14, 2010 at 2:58 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Would it make sense to think of Solr as one such Lucene module?
 In other words, don't even bother with merging just the -dev lists, but 
 really just merge everything.  In that case Solr's relationship with Lucene 
 core becomes much like the relationship Lucene contribs have with Lucene core 
 today in terms of compatibility, builds, and committers' responsibilities?

 That kind of makes sense to me.  Of course, because of the sheer volume we 
 may want to keep -user lists separate and possibly even create new ones for 
 Lucene modules that attract enough interest on their own.

Yes, the general gist of that all makes sense.  merge-everything is
more along the lines of the original discussion (we just needed to
enumerate some specific action items in the vote).  The things we
probably don't merge are just for user convenience.  Separate
downloads, websites, user lists.  Might have made sense to merge
JIRA, but there are just so many open issues... it prob wouldn't be
practical.

And yes, more user lists in the future could even make sense - say a
separate one for DIH.

-Yonik


Re: [VOTE] merge lucene/solr development (take 3)

2010-03-11 Thread Yonik Seeley
Thanks everyone, this vote has passed.
A bit more contentious of a PMC vote than usual, but the committer
vote was clear.

-Yonik


On Mon, Mar 8, 2010 at 9:11 PM, Yonik Seeley ysee...@gmail.com wrote:
 Apologies in advance for calling yet another vote, but I just wanted
 to make sure this was official.
 Mike's second VOTE thread could probably technically stand on its own
 (since it included PMC votes), but given that I said in my previous
 VOTE thread that I was just polling Lucene/Solr committers and would
 call a second PMC vote, that may have acted to suppress PMC votes on
 Mike's thread also.

 Please vote for the proposal quoted below to merge lucene/solr development.
 Here's my +1

 -Yonik

 Mike's call for a VOTE (amongst lucene/solr committers +11 to -1):
 http://search.lucidimagination.com/search/document/a400ffe62ae21aca/vote_merge_the_development_of_solr_lucene_take_2#22d7cd086d9c5cf0
 Subject: Merge the development of Solr/Lucene (take 2)
 A new vote, that slightly changes proposal from last vote (adding only
 that Lucene can cut a release even if Solr doesn't):

  * Merging the dev lists into a single list.

  * Merging committers.

  * When any change is committed (to a module that belongs to Solr or
    to Lucene), all tests must pass.

  * Release details will be decided by dev community, but, Lucene may
    release without Solr.

  * Modularize the sources: pull things out of Lucene's core (break
    out query parser, move all core queries & analyzers under their
    contrib counterparts), pull things out of Solr's core (analyzers,
    queries).

 These things would not change:

  * Besides modularizing (above), the source code would remain factored
    into separate dirs/modules the way it is now.

  * Issue tracking remains separate (SOLR-XXX and LUCENE-XXX
    issues).

  * User's lists remain separate.

  * Web sites remain separate.

  * Release artifacts/jars remain separate.



Re: [VOTE] merge lucene/solr development (take 3)

2010-03-09 Thread Yonik Seeley
I think the problem is political - and that leads to both technical
and political problems.
We came up with a largely political solution that should solve both.

We can't have a one way street of pulling everything interesting out
of Solr for Lucene, or poaching, or expanding Lucene's domain while
shrinking Solr's (just limit to server stuff, etc).  Lucene and Solr
committers are headed down the road toward greater competition - but
with this proposal, we said we'd rather work together instead.

-Yonik


Re: [VOTE] merge lucene/solr development (take 3)

2010-03-09 Thread Yonik Seeley
On Tue, Mar 9, 2010 at 9:48 AM, Mattmann, Chris A (388J)
chris.a.mattm...@jpl.nasa.gov wrote:
 I have built 10s of projects that
 have simply used Lucene as an API and had no need for Solr, and I've built
 10s of projects where Solr made perfect sense. So, I appreciate their
 separation.

As does everyone - which is why there will always be separate
downloads.  As a user, the only side effect you should see is an
improved Lucene and Solr.

Saying that Solr should move some stuff to Lucene for Lucene's
benefit, without regard to if it's actually beneficial to Solr, is a
non-starter.  The lucene/solr committers have been down that road
before.  The solution that most committers agreed would improve the
development of both projects is to merge development.

-Yonik


Re: [VOTE] merge lucene/solr development (take 3)

2010-03-09 Thread Yonik Seeley
On Tue, Mar 9, 2010 at 11:00 AM, Mattmann, Chris A (388J)
chris.a.mattm...@jpl.nasa.gov wrote:
 However, like I said it seems to be like
 the discussion of the real issues is only happening recently over the past
 few days.

This certainly isn't new territory for lucene/solr devs though - the
issue of what belongs in Solr and what belongs in Lucene, and problems
around pulling out schema and faceting and putting it in Lucene have
come up before (also in lengthy threads).

-Yonik


Re: [VOTE] merge lucene/solr development (take 3)

2010-03-09 Thread Yonik Seeley
On Tue, Mar 9, 2010 at 11:35 AM, Michael Busch busch...@gmail.com wrote:
 No matter if this dev-merge vote passes or not, we still
 want a separate analysis module, right?

No.  That's the point of the dev merge - to allow free movement of
source code that benefits both projects.

-Yonik


[VOTE] merge lucene/solr development (take 3)

2010-03-08 Thread Yonik Seeley
Apologies in advance for calling yet another vote, but I just wanted
to make sure this was official.
Mike's second VOTE thread could probably technically stand on its own
(since it included PMC votes), but given that I said in my previous
VOTE thread that I was just polling Lucene/Solr committers and would
call a second PMC vote, that may have acted to suppress PMC votes on
Mike's thread also.

Please vote for the proposal quoted below to merge lucene/solr development.
Here's my +1

-Yonik

Mike's call for a VOTE (amongst lucene/solr committers +11 to -1):
http://search.lucidimagination.com/search/document/a400ffe62ae21aca/vote_merge_the_development_of_solr_lucene_take_2#22d7cd086d9c5cf0
 Subject: Merge the development of Solr/Lucene (take 2)
 A new vote, that slightly changes proposal from last vote (adding only
 that Lucene can cut a release even if Solr doesn't):

  * Merging the dev lists into a single list.

  * Merging committers.

  * When any change is committed (to a module that belongs to Solr or
to Lucene), all tests must pass.

  * Release details will be decided by dev community, but, Lucene may
release without Solr.

  * Modularize the sources: pull things out of Lucene's core (break
out query parser, move all core queries & analyzers under their
contrib counterparts), pull things out of Solr's core (analyzers,
queries).

 These things would not change:

  * Besides modularizing (above), the source code would remain factored
into separate dirs/modules the way it is now.

  * Issue tracking remains separate (SOLR-XXX and LUCENE-XXX
issues).

  * User's lists remain separate.

  * Web sites remain separate.

  * Release artifacts/jars remain separate.


Re: [VOTE] merge lucene/solr development (take 3)

2010-03-08 Thread Yonik Seeley
On Mon, Mar 8, 2010 at 9:22 PM, Mattmann, Chris A (388J)
chris.a.mattm...@jpl.nasa.gov wrote:
 For completeness from the VOTE on private@

It's called private for a reason.

-Yonik


Re: [VOTE] merge lucene/solr development (take 3)

2010-03-08 Thread Yonik Seeley
On Mon, Mar 8, 2010 at 9:49 PM, Michael Busch busch...@gmail.com wrote:
 Question: Is it sufficient to have more +1s than -1s for this vote to pass?
3 +1s and more +1s than -1s is sufficient.

 I thought for votes as significant as this one a -1 veto is a showstopper?
It's not really tied to significance - releases, acceptance to
incubate, etc, all require more +1s than -1s.
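
Spelled out as a small Python sketch (my reading of the rule as stated here, taking "3 +1s" to mean at least three; the function name is made up for illustration):

```python
def vote_passes(plus_ones, minus_ones, min_plus_ones=3):
    """Majority-style vote: enough +1s, and strictly more +1s than -1s."""
    return plus_ones >= min_plus_ones and plus_ones > minus_ones


# The committer tally quoted in this thread was +11 to -1.
print(vote_passes(11, 1))
# Too few +1s, even with no opposition.
print(vote_passes(2, 0))
# Enough +1s, but not more than the -1s.
print(vote_passes(3, 3))
```

Under this rule a single -1 does not block the vote; it only counts against the +1 total.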

-Yonik


Re: [VOTE] merge lucene/solr development

2010-03-04 Thread Yonik Seeley
+1
Great idea! :-)

-Yonik

On Wed, Mar 3, 2010 at 5:42 PM, Yonik Seeley yo...@apache.org wrote:
 Many Lucene/Solr committers think that merging development would be a
 benefit to both projects.
 Separate downloads would remain (among other things), so end users
 would not be impacted (except for higher quality products over time).
 Since this is a change to Lucene/Solr project development, I'd like to
 get a formal vote from the committers of both projects.
 If there are 3 +1s and more +1s than -1s, we can pass this to the
 Lucene PMC to ratify.

 -Yonik

 Discussion thread:
 http://search.lucidimagination.com/search/document/c7817932400808ad/factor_out_a_standalone_shared_analysis_package_for_nutch_solr_lucene



Re: [VOTE] Merge the development of Solr/Lucene (take 2)

2010-03-04 Thread Yonik Seeley
+1

-Yonik

On Thu, Mar 4, 2010 at 4:33 PM, Michael McCandless
luc...@mikemccandless.com wrote:
 A new vote, that slightly changes proposal from last vote (adding only
 that Lucene can cut a release even if Solr doesn't):

  * Merging the dev lists into a single list.

  * Merging committers.

  * When any change is committed (to a module that belongs to Solr or
   to Lucene), all tests must pass.

  * Release details will be decided by dev community, but, Lucene may
   release without Solr.

  * Modularize the sources: pull things out of Lucene's core (break
   out query parser, move all core queries & analyzers under their
   contrib counterparts), pull things out of Solr's core (analyzers,
   queries).

 These things would not change:

  * Besides modularizing (above), the source code would remain factored
   into separate dirs/modules the way it is now.

  * Issue tracking remains separate (SOLR-XXX and LUCENE-XXX
   issues).

  * User's lists remain separate.

  * Web sites remain separate.

  * Release artifacts/jars remain separate.

 Mike



[VOTE] merge lucene/solr development

2010-03-03 Thread Yonik Seeley
Many Lucene/Solr committers think that merging development would be a
benefit to both projects.
Separate downloads would remain (among other things), so end users
would not be impacted (except for higher quality products over time).
Since this is a change to Lucene/Solr project development, I'd like to
get a formal vote from the committers of both projects.
If there are 3 +1s and more +1s than -1s, we can pass this to the
Lucene PMC to ratify.

-Yonik

Discussion thread:
http://search.lucidimagination.com/search/document/c7817932400808ad/factor_out_a_standalone_shared_analysis_package_for_nutch_solr_lucene


Re: [VOTE] merge lucene/solr development

2010-03-03 Thread Yonik Seeley
On Wed, Mar 3, 2010 at 7:41 PM, Mark Miller markrmil...@gmail.com wrote:
 I'm only for the merge with aligned releases - it's the only way Solr can
 really stay on Lucene trunk happily. Aligned releases are also my biggest
 worry (and part of why I initially leaned against such a merge), but without
 it, there goes all of the larger reasons I'm into the merge now - Solr can
 be on trunk and we can have better sharing / less duplication between the
 projects - which I personally think requires Solr being on Lucene trunk - or
 it won't really work at all. And Solr being on trunk really needs aligned
 releases

Correct - I believe most who agreed with a merge essentially agreed on
all the points about what a merge meant.  Merged dev list, committers,
and releases.  Maintain separate downloads, user lists.

 I wanted to avoid the lawyers... if we need to hammer out all the
little things that aren't as important (sync release numbers, etc)
discussion will be endless and we'll never get anywhere.

-Yonik


Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-02-26 Thread Yonik Seeley
On Fri, Feb 26, 2010 at 5:15 PM, Steven A Rowe sar...@syr.edu wrote:
 On 02/24/2010 at 2:20 PM, Yonik Seeley wrote:
 I've started to think that a merge of Solr and Lucene would be in the
 best interest of both projects.

 The Sorlucene :) merger could be achieved virtually, i.e. via policy, rather 
 than physically merging:

Everything is virtual here anyway :-)
I agree with Mike that a single dev list is highly desirable.  There
would still be separate downloads.  What to do with some of the other
stuff is unspecified.

Committers would need to be merged though - that's the only way to
make a change across projects w/o breaking stuff.

-Yonik


Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-02-24 Thread Yonik Seeley
I've started to think that a merge of Solr and Lucene would be in the
best interest of both projects.

Recently, Solr has pulled back from using Lucene trunk (or even the
latest version), as the increased amount of change between releases
(and in-between releases) made it impractical to deal with. This is a
pretty big negative for Lucene, since Solr is the biggest Lucene user
(where people are directly exposed to lucene for the express purpose
of developing search features).  I know Solr development has always
benefited hugely from users using trunk, and Lucene trunk has now lost
all the solr users.

Some in Lucene development have expressed a desire to make Lucene more
of a complete solution, rather than just a core full-text search
library... things like a data schema, faceting, etc.  The Lucene
project already has an enterprise search platform with these
features... that's Solr.  Trying to pull popular pieces out of Solr
makes life harder for Solr developers, brings our projects into
conflict, and is often unsuccessful (witness the largely failed
migration of FunctionQueries from Solr to Lucene).  For Lucene to
achieve the ultimate in usability for users, it can't require Java
experience... it needs higher level abstractions provided by Solr.

The other benefit to Lucene would be to bring features to developers
much sooner... Solr has had features years before they were developed
in Lucene, and currently has more developers working with it.  Esp
with Solr not using Lucene trunk, if a Solr developer wants a feature
quickly, they cannot add it to Lucene (even if it might make sense
there) since that introduces a big unpredictable lag - when that
version of Lucene makes its way into Solr.

The current divide is a bit unnatural.  For maximum benefit of both
projects, it seems like Solr and Lucene should essentially merge.
Lucene core would essentially remain as it is, but:
1) Solr would go back to using Lucene's trunk
2) For new Solr features, there would be an effort to abstract it such
that non-Solr users could use the functionality (faceting, field
collapsing, etc)
3) For new Lucene features, there would be an effort to integrate it into Solr.
4) Releases would be synchronized... Lucene and Solr would release at
the same time.

-Yonik


Re: [spatial] Cartesian Tiers nomenclature

2009-12-29 Thread Yonik Seeley
On Tue, Dec 29, 2009 at 7:13 PM, Marvin Humphrey mar...@rectangular.com wrote:
 ... but for this algorithm, different rasterization resolutions need not
 proceed by powers-of-two.

Indeed - one way to further generalize would be to use something like
Lucene's trie-based Numeric field, but with a square instead of a
line.  That would allow tweaking the space/speed tradeoff.

-Yonik
http://www.lucidimagination.com


Re: [VOTE] Release Apache Lucene Java 2.9.1, take 3

2009-10-31 Thread Yonik Seeley
On Thu, Oct 29, 2009 at 7:27 PM, Michael McCandless
luc...@mikemccandless.com wrote:
 OK, let's try this again!

 I've built new release artifacts from svn rev 831145 (on the 2.9
 branch), here:

  http://people.apache.org/~mikemccand/staging-area/rc3_lucene2.9.1/

 Changes are here:

  http://people.apache.org/~mikemccand/staging-area/rc3_lucene2.9.1changes/

 Please vote to officially release these artifacts as Apache Lucene
 Java 2.9.1.

+1

-Yonik
http://www.lucidimagination.com


Re: [VOTE] Release Solr 1.4.0

2009-10-29 Thread Yonik Seeley
On Thu, Oct 29, 2009 at 8:49 AM, Uwe Schindler u...@thetaphi.de wrote:
 Yes, it's too bad!

 But you will replace the lucene jars in the artifacts before releasing?
 Because it would not be good to have jar files with version 2.9.1 in the
 package that are not the officially released 2.9.1 artifacts.

Darn... forgot about the version number in the jars.
Sigh.

-Yonik
http://www.lucidimagination.com


Re: [VOTE] Release Solr 1.4.0

2009-10-29 Thread Yonik Seeley
On Thu, Oct 29, 2009 at 9:07 AM, Bill Au bill.w...@gmail.com wrote:
 I think someone has already pointed this out before.  On numerous occasions
 I have had to dig into the Lucene code when writing code to extend Solr.  So
 it will be much easier to make sure that I am looking at the right code if
 Solr uses an official release of Lucene, as opposed to a particular SVN
 revision.

And it's much easier for people to use a Solr release if we could
actually *release* one!!!
But yes, it looks like we will spin a new Solr release.

-Yonik
http://www.lucidimagination.com


 Bill

 On Thu, Oct 29, 2009 at 8:59 AM, Grant Ingersoll gsing...@apache.org wrote:

 Yeah, unfortunately, I think we need to use the new Jars.


 On Oct 29, 2009, at 8:52 AM, Yonik Seeley wrote:

  On Thu, Oct 29, 2009 at 8:49 AM, Uwe Schindler u...@thetaphi.de wrote:

 Yes, it's too bad!

 But you will replace the lucene jars in the artifacts before releasing?
 Because it would not be good to have jar files with version 2.9.1 in the
 package that are not the officially released 2.9.1 artifacts.


 Darn... forgot about the version number in the jars.
 Sigh.

 -Yonik
 http://www.lucidimagination.com


Re: [VOTE] Release Apache Lucene Java 2.9.1

2009-10-26 Thread Yonik Seeley
On Mon, Oct 26, 2009 at 12:43 PM, Uwe Schindler u...@thetaphi.de wrote:
 Looks good. One thing:

 In Mark's artifacts, he changed the common-build.xml to not have -dev in the
 version before the release. You can see this in SVN. I am fine with having
 -dev in the source artefact, because if someone compiles his own bin from
 the artefact, it should have -dev in it, because it's not an official build.

Right, having the -dev when someone tries to build it themselves is
the way we should keep it.

-Yonik
http://www.lucidimagination.com


Re: [VOTE] Release Solr 1.4.0

2009-10-26 Thread Yonik Seeley
Hmmm, weren't you going to update the version numbers to 1.4.1-dev
like we just discussed in the other thread?
That way if someone changes some of the solr source from the download
and recompiles, they don't get a version number of 1.4.0

-Yonik
http://www.lucidimagination.com



On Mon, Oct 26, 2009 at 6:15 PM, Grant Ingersoll gsing...@apache.org wrote:
 Tis the season for releases...

 Please vote on releasing the Solr 1.4.0 artifacts located at
 http://people.apache.org/~gsingers/solr/1.4.0/  (note, solr.tar and
 solr-maven.tar are not artifacts to be released)

 CHANGES are spelled out at
 https://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4/CHANGES.txt

 Thanks,
 Grant



Re: [VOTE] Release Solr 1.4.0

2009-10-26 Thread Yonik Seeley
On Mon, Oct 26, 2009 at 9:58 PM, Grant Ingersoll gsing...@apache.org wrote:
 OK, take two is up in the same place.  Please vote.

I'm seeing emptiness at
http://people.apache.org/~gsingers/solr/1.4.0/

-Yonik
http://www.lucidimagination.com


 On Oct 26, 2009, at 6:15 PM, Grant Ingersoll wrote:

 Tis the season for releases...

 Please vote on releasing the Solr 1.4.0 artifacts located at
 http://people.apache.org/~gsingers/solr/1.4.0/  (note, solr.tar and
 solr-maven.tar are not artifacts to be released)

 CHANGES are spelled out at
 https://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4/CHANGES.txt

 Thanks,
 Grant





Re: [ACUS09] Proposed Schedule

2009-07-14 Thread Yonik Seeley
On Tue, Jul 14, 2009 at 4:53 PM, Uwe Schindler u...@thetaphi.de wrote:
 NumericRangeQuery is not only geographical search... So it would also cover
 other directions: Things I can do with Lucene additionally to full text
 search, that could be done before only with RDBMS and/or PostGIS...: Do full
 text search with scoring and so on in addition to filter my products by
 price and availability in shops at specific geographic regions; newspaper
 articles about Arnold and national bankruptcy *g* from a datetime range
 sorted by article size,... (we know all possibilities we have now by
 numeric/geo search).

Let's not over-state the case: people have been doing all that for
years with Lucene.  There were many methods to deal with numeric
encoding and slow range queries.  The Trie* stuff has made it both
easier and faster out-of-the-box - a good thing.
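
For illustration, one widely used pre-Trie approach to numeric encoding was to zero-pad numbers to a fixed width so that the index's string ordering matches numeric ordering. The sketch below is plain Python, not Lucene code; `WIDTH`, `encode`, and `in_range` are names made up for this example, and it only handles non-negative integers.

```python
WIDTH = 10  # assumed fixed width; must cover the largest value indexed

def encode(n):
    """Zero-pad a non-negative integer so string order matches numeric order."""
    if n < 0:
        raise ValueError("this simple scheme handles non-negative values only")
    text = str(n)
    if len(text) > WIDTH:
        raise ValueError("value does not fit in WIDTH digits")
    return text.zfill(WIDTH)

def in_range(term, low, high):
    """Inclusive range test done purely on the encoded terms."""
    return encode(low) <= term <= encode(high)

prices = [5, 40, 300, 7]
terms = sorted(encode(p) for p in prices)  # sorted as strings
```

A range query over terms encoded this way still has to visit every distinct term between the two bounds, which is the slowness the Trie* fields were designed to reduce.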

-Yonik
http://www.lucidimagination.com


Re: how to control the disk size of the indices

2008-03-24 Thread Yonik Seeley
On Mon, Mar 24, 2008 at 9:34 PM, Otis Gospodnetic
[EMAIL PROTECTED] wrote:
 Hi Yannis,

  I don't think there is anything of that sort in Lucene, but this shouldn't 
 be hard to do with a process outside Lucene.  Of course, optimizing an index 
 increases its size temporarily, so your external process would have to take 
 that into account and play it safe.  You could also set mergeFactor to 1, 
 which should keep your index in a fully optimized state

MergeFactor must be >= 2

You will always need to allow for double the index size due to
increased temporary disk usage during segment merges (including
optimize).   Peak use on a system being searched and indexed
concurrently will often be even higher since currently open readers
reference files that have been deleted.
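
As a back-of-the-envelope sketch of that rule of thumb (not a Lucene API): budget twice the index size, plus whatever deleted files open readers are still holding. The function names and the `held_by_open_readers_bytes` parameter are assumptions for illustration; the real figure depends on the merge policy and on how long readers stay open.

```python
GB = 1024 ** 3

def peak_disk_estimate(index_bytes, held_by_open_readers_bytes=0):
    """Rough peak: old segments and the merged copy coexist, plus files
    that were deleted but are still referenced by open readers."""
    return 2 * index_bytes + held_by_open_readers_bytes

def max_safe_index_size(disk_budget_bytes, held_by_open_readers_bytes=0):
    """Largest index size that still leaves room for a full merge."""
    usable = disk_budget_bytes - held_by_open_readers_bytes
    return max(usable, 0) // 2

# Example: a 40 GB index with 10 GB still pinned by open readers.
needed = peak_disk_estimate(40 * GB, 10 * GB)
```

An external process that caps index size could compare the current index size against `max_safe_index_size` for the volume rather than against the raw disk capacity.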

-Yonik


Solr graduates and joins Lucene as sub-project

2007-01-17 Thread Yonik Seeley

Solr has just graduated from the Incubator, and has been accepted as a
Lucene sub-project!
Thanks to all the Lucene and Solr users, contributors, and developers
who helped make this happen!

I have a feeling we're just getting started :-)
-Yonik


Re: Searching by bit masks

2006-11-09 Thread Yonik Seeley

On 11/9/06, ltaylor.employon [EMAIL PROTECTED] wrote:

I am currently evaluating Lucene to see if it would be appropriate to
replace my company's current search software.  So far everything has been
looking great, however there is one requirement that I am not too certain
about.

What we need to do is to be able to store a bit mask specifying various
filter flags for a document in the index and then search this field by
specifying another bit mask with desired filters, returning documents that
have any of the specified flags set.  In other words, we are doing a bitwise
OR on the stored filter bit mask and the specified filter bit mask and if it
is non-zero, we want to return the document.


Lucene maintains an inverted index, so you don't need a bit mask...
you can actually use symbolic values.

doc {
 id=1
 tags = tag1 tag3 tag7
}

doc {
 id = 2
 tags = tag1 tag2 tag5 tag9
}

Then you can search via a BooleanQuery:

tags:(tag1 OR tag2 OR tag7)
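
A minimal sketch of the same idea in plain Python (not Lucene's API): each flag bit becomes a symbolic tag, the inverted index maps a tag to the set of document ids carrying it, and the OR query is a union of those sets. The names `FLAG_NAMES`, `mask_to_tags`, `build_index`, and `search_any`, and the bit-to-tag numbering, are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical mapping: bit 0 -> "tag1", bit 1 -> "tag2", ...
FLAG_NAMES = {bit: f"tag{bit + 1}" for bit in range(16)}

def mask_to_tags(mask):
    """Turn an integer bit mask into the tag names of its set bits."""
    return [name for bit, name in FLAG_NAMES.items() if mask & (1 << bit)]

def build_index(docs):
    """Build the inverted index: tag -> set of doc ids, from {doc_id: mask}."""
    index = defaultdict(set)
    for doc_id, mask in docs.items():
        for tag in mask_to_tags(mask):
            index[tag].add(doc_id)
    return index

def search_any(index, tags):
    """Return ids of documents that carry at least one of the given tags."""
    hits = set()
    for tag in tags:
        hits |= index.get(tag, set())
    return hits

docs = {
    1: (1 << 0) | (1 << 2) | (1 << 6),             # tag1 tag3 tag7
    2: (1 << 0) | (1 << 1) | (1 << 4) | (1 << 8),  # tag1 tag2 tag5 tag9
}
index = build_index(docs)
result = search_any(index, ["tag1", "tag2", "tag7"])
```

The point of the design is that the query only touches the posting sets for the requested tags, instead of testing a mask against every document.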

If you are new to Lucene, you might check out Solr first.  If nothing
else, it would be a gentle introduction to Lucene, and you could build
a custom Lucene implementation later if it doesn't meet your needs.


-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server


Re: [PROPOSAL] index server project

2006-10-19 Thread Yonik Seeley

On 10/19/06, Steven Parkes [EMAIL PROTECTED] wrote:

You mention partitioning of indexes, though mostly around delete. What
about scalability of corpus size?


Definitely in scope.  Solr already has scalability of search volume
via searchers behind a load balancer all getting their index from a
master.  The problem comes when an index is too big to get decent
latency for a single query, and that's when you need to partition the
index into shards, to use Google terminology.


Would partitioning be effective for
that, too?


Yes, to a certain extent.  At some point you run into network
bandwidth issues if you go deep into rankings.


What about scalability of ingest rate?


As it relates to indexing, I think nutch already has that base covered.


What are you thinking, in terms of size? Is this a 10 node thing?


I'm personally interested in perhaps 10 to 20 index shards, with
multiple replicas of each shard for HA and query load scalability.


A 1000
node thing? More? Bigger is cool, but raises a lot of issues.


Should be possible, but I won't personally be looking for that.  I
think scaling effectively will be partially in the hands of the client
and how it chooses to merge results from shards.
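
One way a client might merge results, sketched in plain Python under stated assumptions: every shard returns its own top hits as (score, doc_id) pairs already sorted by descending score, and scores are comparable across shards. `merge_top_k` is a hypothetical helper, not part of Solr or Lucene.

```python
import heapq
import itertools

def merge_top_k(shard_results, k):
    """Merge per-shard hit lists (each sorted by descending score) and
    return the k best hits overall."""
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[0])
    return list(itertools.islice(merged, k))

shard_a = [(0.9, "a1"), (0.4, "a2")]
shard_b = [(0.8, "b1"), (0.1, "b2")]
top3 = merge_top_k([shard_a, shard_b], 3)
```

To return hits down to rank N, each shard has to send its own top N, which is where the network cost of going deep into the rankings comes from.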


How
dynamic?



Can nodes come and go?


Unplanned: yes.  HA is personally key for me.
Planned (adding capacity gracefully): it would be nice.  I actually
hadn't planned it for Solr.


Are you going to assume homogeneity of
nodes?


Hardware homogeneity?  That might be out of scope... I'd start off
without worrying about it in any case.


What about add/modify/delete to search visibility latency? Close to
batch/once-a-day or real-time?


Anywhere in between I'd think.  Realtime latencies of minutes or
longer are normally fine.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server


Re: [PROPOSAL] index server project

2006-10-18 Thread Yonik Seeley

On 10/18/06, Doug Cutting [EMAIL PROTECTED] wrote:

Does this make sense?  Does it sound like it would be useful to Solr?
To Nutch?  To others?  Who would be interested and able to work on it?


Rather than holding my tongue until I wrap my head around all the
issues, I'll say that I'm definitely interested!

-Yonik


Re: Infrastructure for large Lucene index

2006-10-06 Thread Yonik Seeley

On 10/6/06, James [EMAIL PROTECTED] wrote:

Our indexes are, in aggregate across our
various collections, even larger than you need.  We use Remote
ParallelMultiSearcher, with some custom modifications (and we are in the
process of making more)


I'm looking into adding some form of distributed search to Solr.
The main problem I see with directly using ParallelMultiSearcher is a
lack of high availability features.

If the index is broken into multiple shards then we need multiple
copies of each shard, and some way of loadbalancing and failing over
amongst copies of shards.
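
A rough sketch of what load balancing and failover across copies of one shard could look like, in plain Python. This is an illustration under assumptions, not how Solr does or will do it: `ShardReplicas` is a made-up class, `send` is a caller-supplied function that queries one replica, and a failed replica is assumed to raise `ConnectionError`.

```python
import itertools

class ShardReplicas:
    """Round-robin over the copies of one shard, skipping copies that fail."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self._starts = itertools.cycle(range(len(self.replicas)))

    def query(self, send, request):
        """Try each replica once, starting from the next one in rotation."""
        start = next(self._starts)
        last_error = None
        for offset in range(len(self.replicas)):
            replica = self.replicas[(start + offset) % len(self.replicas)]
            try:
                return send(replica, request)
            except ConnectionError as exc:
                last_error = exc  # fail over to the next copy
        raise RuntimeError("all replicas of this shard failed") from last_error
```

A distributed search would hold one such object per shard, query all shards, and merge the per-shard results.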

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server


Re: Infrastructure for large Lucene index

2006-10-06 Thread Yonik Seeley

On 10/6/06, Slava Imeshev [EMAIL PROTECTED] wrote:

-- James [EMAIL PROTECTED] wrote:
  If the index is broken into multiple shards then we need multiple copies
 of each shard, and some way of loadbalancing and failing over amongst copies
 of shards.

 Yep.  Unfortunately it's not simple, but those are all pieces of what we are
 currently in the process of implementing.

The problem is that over time indexes develop personality and the term
frequency can vary significantly from index to index


A global idf calculation is possible though... MultiSearcher already
does this when searching across multiple indices.  The downside of
doing it across remote indices is an increase in the number of RPC
calls.  In general, it's probably better to try and keep index shards
balanced.
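
A minimal sketch of a global idf, in plain Python rather than Lucene code: collect each shard's document count and per-term document frequency, sum them, and compute idf once from the totals. The formula used here, 1 + ln(numDocs / (docFreq + 1)), is the classic Lucene default as best recalled and should be treated as an assumption; `global_idf` and the shape of `shard_stats` are made up for this example.

```python
import math

def idf(doc_freq, num_docs):
    """Classic idf: 1 + ln(num_docs / (doc_freq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def global_idf(term, shard_stats):
    """shard_stats is a list of (num_docs, {term: doc_freq}), one per shard."""
    total_docs = sum(num_docs for num_docs, _ in shard_stats)
    total_df = sum(freqs.get(term, 0) for _, freqs in shard_stats)
    return idf(total_df, total_docs)

# Two equally sized shards where "solr" is common in one and rare in the other.
shards = [(1000, {"solr": 499}), (1000, {"solr": 9})]
local = [idf(freqs.get("solr", 0), n) for n, freqs in shards]
combined = global_idf("solr", shards)
```

Gathering the per-shard statistics is the extra round of RPC calls mentioned above; with balanced shards the local values are already close to the global one, so the extra round can be skipped.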


-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server


Re: Binary fields in index

2005-09-26 Thread Yonik Seeley
Binary fields can be stored, but not indexed.

-Yonik
Now hiring -- http://tinyurl.com/7m67g

On 9/26/05, Fredrik Andersson [EMAIL PROTECTED] wrote:

 I was hoping to avoid the overhead of encoding/decoding, but it looks like
 I'll have to do that :(

 While on the topic, I noticed in the Field class that we have an isBinary
 boolean flag, however this always gets set to false in the constructors as
 well as the default value, and I can't even see a usage of this flag at
 write-time. What's the point of this flag, a feature for binary fields that
 was never implemented? I'm talking about the latest sources now, by the way,
 1.9.something.

 Fredrik

 On 9/26/05, Koji Sekiguchi [EMAIL PROTECTED] wrote:
 
  You can encode (e.g. base64) the binary data to get a String
  and store the String.
 
  Koji
 
   -Original Message-
   From: Fredrik Andersson [mailto:[EMAIL PROTECTED]
   Sent: Monday, September 26, 2005 6:31 PM
   To: general@lucene.apache.org
   Subject: Binary fields in index
  
  
   Hello Gang!
  
   Is there any trick, or undocumented way, to store binary (unindexed,
   untokenized) data in a Lucene Field? All the Field constructors just deal
   with Strings. I'm currently using another database to store binary data, but
   it would be very neat, and more efficient, to store it directly in Lucene.
  
   Thanks in advance,
   Fredrik