Re: Solr Features 6.5.1 v/s 6.1

2017-05-12 Thread Erik Hatcher
Sweta -

There have been an enormous number of changes between 6.1 and 6.5.1.  See CHANGES:

https://github.com/apache/lucene-solr/blob/master/solr/CHANGES.txt#L439-L1796


wow, huh?

And yes, there have been dramatic improvements in multi-word synonym
handling as of Solr 6.5; see Steve’s blog post for details:

https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/
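For anyone comparing configs: the 6.5 query-time behavior comes from putting the new SynonymGraphFilterFactory in the query analyzer, and (with edismax) setting sow=false so the query isn’t pre-split on whitespace. A minimal field-type sketch, with assumed type and file names, not a drop-in config:

```xml
<!-- Sketch of a field type for query-time multi-word synonyms (Solr 6.5+).
     The "text_syn" name and synonyms.txt file are assumptions, not from the thread. -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Graph-aware synonym expansion belongs on the query side only -->
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With edismax, also pass sow=false on the request so multi-word synonyms are recognized before whitespace splitting; Steve’s post above covers the details.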
 


As for your other questions, I’m not quite sure exactly what you mean.
What features/improvements are you looking for specifically?

Erik


> On May 12, 2017, at 8:39 AM, Sweta Parekh  wrote:
> 
> Hi Team,
> Can you please help me with new features, enhancements and improvements on 
> Solr 6.5.1 v/s 6.1 as we are planning to upgrade the version.
> * Has there been major improvement in multi-term / phrase synonyms 
> and match mode
> 
> * Can we perform secondary search using different mm to find better 
> results like auto relax mm
> 
> * Any new update in results exclusion, elevation etc..
> 
> 
> Regards,
> Sweta Parekh
> Search / CRO - Associate Program Manager
> Digital Marketing Services
> sweta.par...@clerx.com
> Extn: 284887 | Mobile: +(91) 9004667625
> eClerx Services Limited [www.eClerx.com]
> 



Re: Solr Features 6.5.1 v/s 6.1

2017-05-12 Thread Steve Rowe
Hi,

See 6.5.1 CHANGES: 

--
Steve
www.lucidworks.com

> On May 12, 2017, at 8:39 AM, Sweta Parekh  wrote:
> 
> Hi Team,
> Can you please help me with new features, enhancements and improvements on 
> Solr 6.5.1 v/s 6.1 as we are planning to upgrade the version.
> * Has there been major improvement in multi-term / phrase synonyms 
> and match mode
> 
> * Can we perform secondary search using different mm to find better 
> results like auto relax mm
> 
> * Any new update in results exclusion, elevation etc..
> 
> 
> Regards,
> Sweta Parekh
> Search / CRO - Associate Program Manager
> Digital Marketing Services
> sweta.par...@clerx.com
> Extn: 284887 | Mobile: +(91) 9004667625
> eClerx Services Limited [www.eClerx.com]
> 



Solr Features 6.5.1 v/s 6.1

2017-05-12 Thread Sweta Parekh
Hi Team,
Can you please help me with the new features, enhancements, and improvements in
Solr 6.5.1 v/s 6.1, as we are planning to upgrade?
* Has there been a major improvement in multi-term / phrase synonyms and
  match mode?

* Can we perform a secondary search using a different mm to find better
  results, like an auto-relax mm?

* Any new updates in results exclusion, elevation, etc.?
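(The "secondary search with a different mm" idea in the second bullet can be done client-side today: issue a strict query first, and re-issue with a looser mm only when too few results come back. A minimal sketch, where `search_fn` is a hypothetical stand-in for your Solr client call, not a real Solr API:)

```python
def search_with_mm_fallback(search_fn, query,
                            mm_steps=("100%", "75%", "50%"), min_hits=5):
    """Retry a query with progressively looser edismax `mm` values.

    `search_fn` is a hypothetical stand-in for a Solr client call: it takes a
    dict of query parameters and returns a list of matching documents.
    Returns the first (mm, hits) pair yielding at least `min_hits` results,
    or the loosest attempt if none do.
    """
    hits = []
    for mm in mm_steps:
        params = {"q": query, "defType": "edismax", "mm": mm}
        hits = search_fn(params)
        if len(hits) >= min_hits:
            return mm, hits
    return mm_steps[-1], hits
```

Each retry is a full round trip, so in practice you would bound the step list to two or three values.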


Regards,
Sweta Parekh
Search / CRO - Associate Program Manager
Digital Marketing Services
sweta.par...@clerx.com
Extn: 284887 | Mobile: +(91) 9004667625
eClerx Services Limited [www.eClerx.com]



Re: Solr Features

2015-11-05 Thread Alexandre Rafalovitch
Well, I've started to answer, but it hit a nerve and turned into a
guide, which is now a blog post with 6 steps (not mentioning step 0:
admitting you have a problem).

I hope this is helpful:
http://blog.outerthoughts.com/2015/11/learning-solr-comprehensively/

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 5 November 2015 at 01:08, Salman Ansari <salman.rah...@gmail.com> wrote:
> Hi,
>
> I am in the process of looking for a comprehensive list of Solr features in
> order to assess how much have we implemented, what are some features that
> we were unaware of that we can utilize etc. I have looked at the following
> link for Solr features http://lucene.apache.org/solr/features.html but it
> looks like it highlights the main features. I also looked at this page
> http://www.typo3-solr.com/en/what-is-solr/features/ which gives some
> details and I am looking for more of such list and possibly a comprehensive
> list that combines them all.
>
> Regards,
> Salman


Re: Solr Features

2015-11-05 Thread Alexandre Rafalovitch
On 5 November 2015 at 11:22, Shawn Heisey  wrote:
> As far as I know, there are no currently available books covering
> version 5, but I believe there is at least one on the horizon.

Rafal's book is "compatible" with Solr 5:
http://solr.pl/solr-cookbook-third-edition/ . But the number of features
and changes introduced in 5.1, 5.2, AND 5.3 made writing any book on the
topic quite hard. Speaking from experience.

Regards,
   Alex.
P.S. My last book, of course, targeted the latest and greatest 4.3 :-) I
no longer recommend people buy it. The concepts might all still be
valid, but the step-by-step guides would be quite broken.


Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


Re: Solr Features

2015-11-05 Thread Shawn Heisey
On 11/5/2015 8:38 AM, Jack Krupansky wrote:
> It's unfortunate, but the official Solr reference guide does not have a
> table of contents:
> http://mirror.olnevhost.net/pub/apache/lucene/solr/ref-guide/apache-solr-ref-guide-5.3.pdf
> https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide

While it's true that there is no table of contents included in the
reference guide text, Acrobat Reader will automatically generate a table
of contents from markers within the guide for navigation purposes.

See the left side of this window:

https://www.dropbox.com/s/6foaz7xeq11vyuy/solr-ref-guide-toc.png?dl=0

> My Solr 4.4 Deep Dive is now a little outdated (since 4.4) and even then
> was not complete (no SolrCloud or DIH), but its table of contents would
> probably give you a fair view of the sheer magnitude of the number of Solr
> features:
> http://www.lulu.com/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html
>
> It probably still has the most in-depth coverage and examples for token
> analysis and update processors, even though more recent Solr changes are
> not covered.

I have not seen your book.  I bet it's awesome, and for $10 I should
just go ahead and buy it.

The recent title "Solr In Action" covers Solr pretty well, though it is
somewhat pricy.  I have not read all of it.

https://www.manning.com/books/solr-in-action?a_bid=39472865_aid=1

As far as I know, there are no currently available books covering
version 5, but I believe there is at least one on the horizon.

Thanks,
Shawn



Re: Solr Features

2015-11-05 Thread Jack Krupansky
It's unfortunate, but the official Solr reference guide does not have a
table of contents:
http://mirror.olnevhost.net/pub/apache/lucene/solr/ref-guide/apache-solr-ref-guide-5.3.pdf
https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide

My Solr 4.4 Deep Dive is now a little outdated (since 4.4) and even then
was not complete (no SolrCloud or DIH), but its table of contents would
probably give you a fair view of the sheer magnitude of the number of Solr
features:
http://www.lulu.com/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

It probably still has the most in-depth coverage and examples for token
analysis and update processors, even though more recent Solr changes are
not covered.



-- Jack Krupansky

On Thu, Nov 5, 2015 at 9:18 AM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> Glad you liked it.
>
> The problem with your request is that it is not clear what you already
> know and in which direction you are trying to go. Cloud is a big topic
> all on its own. Relevancy - another one. Crafting schema to best
> represent your data - a third. Loading data with DIH vs. SolrJ vs. 3rd
> party client - a fourth. Multilingual content - a fifth. And so on.
>
> But if you want high level guidelines, I would pick a couple of Solr
> books and look at their Tables of Contents. Then, do the same for the
> Reference Guide. This should be a good mid-level overview of issues.
>
> Regards,
> Alex.
>
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 5 November 2015 at 08:43, Salman Ansari <salman.rah...@gmail.com>
> wrote:
> > Thanks Alex for your response. Much appreciated effort! For sure, I will
> > need to look for all those details and information to fully understand
> Solr
> > but I don't have that much time in my hand. That's why I was thinking
> > instead of reading everything from the beginning is to start with a
> feature
> > list that briefly explains what each feature does and then dig deeper if
> I
> > need more information. I will appreciate any comments/feedback regarding
> > this.
> >
> > Regards,
> > Salman
> >
> > On Thu, Nov 5, 2015 at 2:56 PM, Alexandre Rafalovitch <
> arafa...@gmail.com>
> > wrote:
> >
> >> Well, I've started to answer, but it hit a nerve and turned into a
> >> guide. Which is now a blog post with 6 steps (not mentioning step 0 -
> >> Admitting you have a problem).
> >>
> >> I hope this is helpful:
> >> http://blog.outerthoughts.com/2015/11/learning-solr-comprehensively/
> >>
> >> Regards,
> >>Alex.
> >> 
> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> >> http://www.solr-start.com/
> >>
> >>
> >> On 5 November 2015 at 01:08, Salman Ansari <salman.rah...@gmail.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > I am in the process of looking for a comprehensive list of Solr
> features
> >> in
> >> > order to assess how much have we implemented, what are some features
> that
> >> > we were unaware of that we can utilize etc. I have looked at the
> >> following
> >> > link for Solr features http://lucene.apache.org/solr/features.html
> but
> >> it
> >> > looks like it highlights the main features. I also looked at this page
> >> > http://www.typo3-solr.com/en/what-is-solr/features/ which gives some
> >> > details and I am looking for more of such list and possibly a
> >> comprehensive
> >> > list that combines them all.
> >> >
> >> > Regards,
> >> > Salman
> >>
>


Re: Solr Features

2015-11-05 Thread Alexandre Rafalovitch
Glad you liked it.

The problem with your request is that it is not clear what you already
know and in which direction you are trying to go. Cloud is a big topic
all on its own. Relevancy - another one. Crafting schema to best
represent your data - a third. Loading data with DIH vs. SolrJ vs. 3rd
party client - a fourth. Multilingual content - a fifth. And so on.

But if you want high level guidelines, I would pick a couple of Solr
books and look at their Tables of Contents. Then, do the same for the
Reference Guide. This should be a good mid-level overview of issues.

Regards,
Alex.


Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 5 November 2015 at 08:43, Salman Ansari <salman.rah...@gmail.com> wrote:
> Thanks Alex for your response. Much appreciated effort! For sure, I will
> need to look for all those details and information to fully understand Solr
> but I don't have that much time in my hand. That's why I was thinking
> instead of reading everything from the beginning is to start with a feature
> list that briefly explains what each feature does and then dig deeper if I
> need more information. I will appreciate any comments/feedback regarding
> this.
>
> Regards,
> Salman
>
> On Thu, Nov 5, 2015 at 2:56 PM, Alexandre Rafalovitch <arafa...@gmail.com>
> wrote:
>
>> Well, I've started to answer, but it hit a nerve and turned into a
>> guide. Which is now a blog post with 6 steps (not mentioning step 0 -
>> Admitting you have a problem).
>>
>> I hope this is helpful:
>> http://blog.outerthoughts.com/2015/11/learning-solr-comprehensively/
>>
>> Regards,
>>Alex.
>> 
>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> http://www.solr-start.com/
>>
>>
>> On 5 November 2015 at 01:08, Salman Ansari <salman.rah...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > I am in the process of looking for a comprehensive list of Solr features
>> in
>> > order to assess how much have we implemented, what are some features that
>> > we were unaware of that we can utilize etc. I have looked at the
>> following
>> > link for Solr features http://lucene.apache.org/solr/features.html but
>> it
>> > looks like it highlights the main features. I also looked at this page
>> > http://www.typo3-solr.com/en/what-is-solr/features/ which gives some
>> > details and I am looking for more of such list and possibly a
>> comprehensive
>> > list that combines them all.
>> >
>> > Regards,
>> > Salman
>>


Re: Solr Features

2015-11-05 Thread Erick Erickson
I agree with Alexandre: the question is far too broad.

Better to pick something you want to _do_ and ask
how to accomplish that. Define a use case that's useful
for your user base (actual or future) and see if Solr
can do that.

See the "books" section here for a number of resources
that people have spent inordinate amounts of time
creating to allow you to use your time wisely:
http://lucene.apache.org/solr/resources.html#documentation

Best,
Erick

On Thu, Nov 5, 2015 at 6:18 AM, Alexandre Rafalovitch
<arafa...@gmail.com> wrote:
> Glad you liked it.
>
> The problem with your request is that it is not clear what you already
> know and in which direction you are trying to go. Cloud is a big topic
> all on its own. Relevancy - another one. Crafting schema to best
> represent your data - a third. Loading data with DIH vs. SolrJ vs. 3rd
> party client - a fourth. Multilingual content - a fifth. And so on.
>
> But if you want high level guidelines, I would pick a couple of Solr
> books and look at their Tables of Contents. Then, do the same for the
> Reference Guide. This should be a good mid-level overview of issues.
>
> Regards,
> Alex.
>
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 5 November 2015 at 08:43, Salman Ansari <salman.rah...@gmail.com> wrote:
>> Thanks Alex for your response. Much appreciated effort! For sure, I will
>> need to look for all those details and information to fully understand Solr
>> but I don't have that much time in my hand. That's why I was thinking
>> instead of reading everything from the beginning is to start with a feature
>> list that briefly explains what each feature does and then dig deeper if I
>> need more information. I will appreciate any comments/feedback regarding
>> this.
>>
>> Regards,
>> Salman
>>
>> On Thu, Nov 5, 2015 at 2:56 PM, Alexandre Rafalovitch <arafa...@gmail.com>
>> wrote:
>>
>>> Well, I've started to answer, but it hit a nerve and turned into a
>>> guide. Which is now a blog post with 6 steps (not mentioning step 0 -
>>> Admitting you have a problem).
>>>
>>> I hope this is helpful:
>>> http://blog.outerthoughts.com/2015/11/learning-solr-comprehensively/
>>>
>>> Regards,
>>>Alex.
>>> 
>>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>>> http://www.solr-start.com/
>>>
>>>
>>> On 5 November 2015 at 01:08, Salman Ansari <salman.rah...@gmail.com>
>>> wrote:
>>> > Hi,
>>> >
>>> > I am in the process of looking for a comprehensive list of Solr features
>>> in
>>> > order to assess how much have we implemented, what are some features that
>>> > we were unaware of that we can utilize etc. I have looked at the
>>> following
>>> > link for Solr features http://lucene.apache.org/solr/features.html but
>>> it
>>> > looks like it highlights the main features. I also looked at this page
>>> > http://www.typo3-solr.com/en/what-is-solr/features/ which gives some
>>> > details and I am looking for more of such list and possibly a
>>> comprehensive
>>> > list that combines them all.
>>> >
>>> > Regards,
>>> > Salman
>>>


Re: Solr Features

2015-11-05 Thread Salman Ansari
Thanks, Alex, for your response. Much appreciated effort! For sure, I will
need to look at all those details and information to fully understand Solr,
but I don't have that much time on my hands. That's why I was thinking that,
instead of reading everything from the beginning, I would start with a feature
list that briefly explains what each feature does and then dig deeper if I
need more information. I would appreciate any comments/feedback regarding
this.

Regards,
Salman

On Thu, Nov 5, 2015 at 2:56 PM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> Well, I've started to answer, but it hit a nerve and turned into a
> guide. Which is now a blog post with 6 steps (not mentioning step 0 -
> Admitting you have a problem).
>
> I hope this is helpful:
> http://blog.outerthoughts.com/2015/11/learning-solr-comprehensively/
>
> Regards,
>Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 5 November 2015 at 01:08, Salman Ansari <salman.rah...@gmail.com>
> wrote:
> > Hi,
> >
> > I am in the process of looking for a comprehensive list of Solr features
> in
> > order to assess how much have we implemented, what are some features that
> > we were unaware of that we can utilize etc. I have looked at the
> following
> > link for Solr features http://lucene.apache.org/solr/features.html but
> it
> > looks like it highlights the main features. I also looked at this page
> > http://www.typo3-solr.com/en/what-is-solr/features/ which gives some
> > details and I am looking for more of such list and possibly a
> comprehensive
> > list that combines them all.
> >
> > Regards,
> > Salman
>


Solr Features

2015-11-04 Thread Salman Ansari
Hi,

I am in the process of looking for a comprehensive list of Solr features in
order to assess how much we have implemented, what features we were unaware
of that we could utilize, etc. I have looked at the following link for Solr
features, http://lucene.apache.org/solr/features.html, but it looks like it
only highlights the main features. I also looked at this page,
http://www.typo3-solr.com/en/what-is-solr/features/, which gives some
details. I am looking for more such lists, and ideally a comprehensive
list that combines them all.

Regards,
Salman


Re: demo app explaining solr features

2014-09-28 Thread Jack Krupansky
And you can also check out the tutorials in any of the Solr books, including 
my Solr Deep Dive e-book:


http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

-- Jack Krupansky

-Original Message- 
From: Mikhail Khludnev

Sent: Sunday, September 28, 2014 1:35 AM
To: solr-user
Subject: Re: demo app explaining solr features

On Sat, Sep 27, 2014 at 12:26 PM, Anurag Sharma anura...@gmail.com wrote:


I am wondering if there is any demo app that can demonstrate all the
features/capabilities of solr. My intention is to understand, use and play
around all the features supported by solr.



https://lucene.apache.org/solr/4_10_0/tutorial.html


--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com 



Re: demo app explaining solr features

2014-09-28 Thread Alexandre Rafalovitch
There is NOTHING that will explain all features of Solr. Solr is too
deep. It starts from "Hello World" and gets into PhD-level
computational linguistics as well as specialist-level distributed
systems.

However, as mentioned in the other emails, there are resources, both
online and paid, that can get you from that "Hello World" to the point
where you can use the reference resources and the Solr source code to
chart your own path further.

That was certainly the goal with my book, and it would still do that,
though it does not cover more recently added basics such as dynamic
schemas. On the other hand, I just shadowed somebody who went through it
in an intensive 3 days and was then ready to troubleshoot some hairy
Perl client issues. :-)

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 28 September 2014 06:17, Jack Krupansky j...@basetechnology.com wrote:
 And you can also check out the tutorials in any of the Solr books, including
 my Solr Deep Dive e-book:

 http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

 -- Jack Krupansky

 -Original Message- From: Mikhail Khludnev
 Sent: Sunday, September 28, 2014 1:35 AM
 To: solr-user
 Subject: Re: demo app explaining solr features


 On Sat, Sep 27, 2014 at 12:26 PM, Anurag Sharma anura...@gmail.com wrote:

 I am wondering if there is any demo app that can demonstrate all the
 features/capabilities of solr. My intention is to understand, use and play
 around all the features supported by solr.


 https://lucene.apache.org/solr/4_10_0/tutorial.html


 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: demo app explaining solr features

2014-09-28 Thread Aman Tandon
Hi Anurag,

For a demo, you can post the XML files in the example-docs folder to Solr, I
guess, and then use the browse request handler:
http://localhost:8983/solr/browse

I am not too sure about the URL, but this can help give you an idea about
searching, faceting, geospatial search, etc.

I hope this helps you.
On Sep 28, 2014 4:39 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

 There is NOTHING that will explain all features of Solr. Solr is too
 deep. It starts from Hello World and gets into PhD level
 computational linguistics as well as specialist-level distributed
 systems.

 However, as mentioned in the other emails, there are resources, both
 online and paid that can get you from that Hello World to the point
 where you can use reference resources and Solr source code to chart
 your own path further.

 That was the goal with my book certainly and it would still do it,
 though it does not cover more recently added basics such as dynamic
 schemas. On the other hand, I just shadowed somebody going through it
 in intensive 3 days and then being ready to troubleshoot some hairy
 perl client issues. :-)

 Regards,
Alex.

 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


 On 28 September 2014 06:17, Jack Krupansky j...@basetechnology.com
 wrote:
  And you can also check out the tutorials in any of the Solr books,
 including
  my Solr Deep Dive e-book:
 
 
 http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html
 
  -- Jack Krupansky
 
  -Original Message- From: Mikhail Khludnev
  Sent: Sunday, September 28, 2014 1:35 AM
  To: solr-user
  Subject: Re: demo app explaining solr features
 
 
  On Sat, Sep 27, 2014 at 12:26 PM, Anurag Sharma anura...@gmail.com
 wrote:
 
  I am wondering if there is any demo app that can demonstrate all the
  features/capabilities of solr. My intention is to understand, use and
 play
  around all the features supported by solr.
 
 
  https://lucene.apache.org/solr/4_10_0/tutorial.html
 
 
  --
  Sincerely yours
  Mikhail Khludnev
  Principal Engineer,
  Grid Dynamics
 
  http://www.griddynamics.com
  mkhlud...@griddynamics.com



demo app explaining solr features

2014-09-27 Thread Anurag Sharma
I am wondering if there is any demo app that can demonstrate all the
features/capabilities of Solr. My intention is to understand, use, and play
around with all the features supported by Solr.

Also looking to explore how Solr fits with NLP (like OpenNLP), different
datastores (like Cassandra, MongoDB, ArangoDB, CouchDB), ML engines (Mahout,
PredictionIO, etc.), and caching servers (Redis, Memcached).

Thanks
Anurag


Re: demo app explaining solr features

2014-09-27 Thread Mikhail Khludnev
On Sat, Sep 27, 2014 at 12:26 PM, Anurag Sharma anura...@gmail.com wrote:

 I am wondering if there is any demo app that can demonstrate all the
 features/capabilities of solr. My intention is to understand, use and play
 around all the features supported by solr.


https://lucene.apache.org/solr/4_10_0/tutorial.html


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: Some new SOLR features

2008-09-19 Thread Noble Paul നോബിള്‍ नोब्ळ्
Why restart Solr? Reloading a core may be sufficient.
SOLR-561 already supports this.
-


On Thu, Sep 18, 2008 at 5:17 PM, Jason Rutherglen
[EMAIL PROTECTED] wrote:
 Servlets is one thing.  For SOLR the situation is different.  There
 are always small changes people want to make, a new stop word, a small
 tweak to an analyzer.  Rebooting the server for these should not be
 necessary.  Ideally this is handled via a centralized console and
 deployed over the network (using RMI or XML) so that files do not need
 to be deployed.

 On Thu, Sep 18, 2008 at 7:41 AM, Mark Miller [EMAIL PROTECTED] wrote:
 Isnt this done in servlet containers for debugging type work? Maybe an
 option, but I disagree that this should drive anything in solr. It should
 really be turned off in production in servelet containers imo as well.

 This can really be such a pain in the ass on a live site...someone touches
 web.xml and the app server reboots*shudder*. Seen it, don't dig it.

 Jason Rutherglen wrote:

 This should be done.  Great idea.

 On Wed, Sep 17, 2008 at 3:41 PM, Lance Norskog [EMAIL PROTECTED] wrote:


 My vote is for dynamically scanning a directory of configuration files.
 When
 a new one appears, or an existing file is touched, load it. When a
 configuration disappears, unload it.  This model works very well for
 servlet
 containers.

 Lance

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
 Seeley
 Sent: Wednesday, September 17, 2008 11:21 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Some new SOLR features

 On Wed, Sep 17, 2008 at 1:27 PM, Jason Rutherglen
 [EMAIL PROTECTED] wrote:


 If the configuration code is going to be rewritten then I would like
 to see the ability to dynamically update the configuration and schema
 without needing to reboot the server.


 Exactly.  Actually, multi-core allows you to instantiate a completely new
 core and swap it for the old one, but it's a bit of a heavyweight
 approach.

 The key is finding the right granularity of change.
 My current thought is that a schema object would not be mutable, but that
 one could easily swap in a new schema object for an index at any time.
  That
 would allow a single request to see a stable view of the schema, while
 preventing having to make every aspect of the schema thread-safe.



 Also I would like the
 configuration classes to just contain data and not have so many
 methods that operate on the filesystem.


 That's the plan... completely separate the serialized and in memory
 representations.



 This way the configuration
 object can be serialized, and loaded by the server dynamically.  It
 would be great for the schema to work the same way.


 Nothing will stop one from using java serialization for config
 persistence,
 however I am a fan of human readable for config files...
 so much easier to debug and support.  Right now, people can cut-n-paste
 relevant parts of their config in email for support, or to a wiki to
 explain
 things, etc.

 Of course, if you are talking about being able to have custom filters or
 analyzers (new classes that don't even exist on the server yet), then it
 does start to get interesting.  This intersects with deployment in
 general... and I'm not sure what the right answer is.
 What if Lucene or Solr needs an upgrade?  It would be nice if that could
 also automatically be handled in a a large cluster... what are the
 options
 for handling that?  Is there a role here for OSGi to play?
  It sounds like at least some of that is outside of the Solr domain.

 An alternative to serializing everything would be to ship a new schema
 along
 with a new jar file containing the custom components.

 -Yonik









-- 
--Noble Paul


Re: Some new SOLR features

2008-09-19 Thread Jason Rutherglen
Yes reloading a core can be used.  I guess the proposal is a way to
update the config and schema files over the network through SOLR
rather than by the filesystem.  This will make grid computing and
schema updates much faster.
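(Yonik's immutable-schema suggestion quoted below reduces to a simple pattern: each request reads whatever snapshot is current when it starts, and an update installs a whole new immutable object rather than mutating in place. A language-agnostic sketch in Python; the names are illustrative, not Solr's:)

```python
import threading

class SchemaHolder:
    """Immutable schema snapshots swapped atomically: a request pins one
    snapshot for its whole lifetime, so a concurrent swap never exposes a
    half-updated view. Illustrative only, not Solr code."""

    def __init__(self, schema):
        self._schema = schema        # treated as immutable, never mutated
        self._lock = threading.Lock()

    def current(self):
        # Readers need no lock: replacing the reference is atomic.
        return self._schema

    def swap(self, new_schema):
        # Serialize writers only; returns the previous snapshot.
        with self._lock:
            old, self._schema = self._schema, new_schema
        return old
```

A request that captured `current()` before a `swap` keeps seeing its original snapshot, which is exactly the "stable view per request" property described below.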

On Fri, Sep 19, 2008 at 2:11 AM, Noble Paul നോബിള്‍ नोब्ळ्
[EMAIL PROTECTED] wrote:
 why to restart solr ? reloading a core may be sufficient.
 SOLR-561 already supports this
 -


 On Thu, Sep 18, 2008 at 5:17 PM, Jason Rutherglen
 [EMAIL PROTECTED] wrote:
 Servlets is one thing.  For SOLR the situation is different.  There
 are always small changes people want to make, a new stop word, a small
 tweak to an analyzer.  Rebooting the server for these should not be
 necessary.  Ideally this is handled via a centralized console and
 deployed over the network (using RMI or XML) so that files do not need
 to be deployed.

 On Thu, Sep 18, 2008 at 7:41 AM, Mark Miller [EMAIL PROTECTED] wrote:
 Isnt this done in servlet containers for debugging type work? Maybe an
 option, but I disagree that this should drive anything in solr. It should
 really be turned off in production in servelet containers imo as well.

 This can really be such a pain in the ass on a live site...someone touches
 web.xml and the app server reboots*shudder*. Seen it, don't dig it.

 Jason Rutherglen wrote:

 This should be done.  Great idea.

 On Wed, Sep 17, 2008 at 3:41 PM, Lance Norskog [EMAIL PROTECTED] wrote:


 My vote is for dynamically scanning a directory of configuration files.
 When
 a new one appears, or an existing file is touched, load it. When a
 configuration disappears, unload it.  This model works very well for
 servlet
 containers.

 Lance

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
 Seeley
 Sent: Wednesday, September 17, 2008 11:21 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Some new SOLR features

 On Wed, Sep 17, 2008 at 1:27 PM, Jason Rutherglen
 [EMAIL PROTECTED] wrote:


 If the configuration code is going to be rewritten then I would like
 to see the ability to dynamically update the configuration and schema
 without needing to reboot the server.


 Exactly.  Actually, multi-core allows you to instantiate a completely new
 core and swap it for the old one, but it's a bit of a heavyweight
 approach.

 The key is finding the right granularity of change.
 My current thought is that a schema object would not be mutable, but that
 one could easily swap in a new schema object for an index at any time.
  That
 would allow a single request to see a stable view of the schema, while
 preventing having to make every aspect of the schema thread-safe.



 Also I would like the
 configuration classes to just contain data and not have so many
 methods that operate on the filesystem.


 That's the plan... completely separate the serialized and in memory
 representations.



 This way the configuration
 object can be serialized, and loaded by the server dynamically.  It
 would be great for the schema to work the same way.


 Nothing will stop one from using java serialization for config
 persistence,
 however I am a fan of human readable for config files...
 so much easier to debug and support.  Right now, people can cut-n-paste
 relevant parts of their config in email for support, or to a wiki to
 explain
 things, etc.

 Of course, if you are talking about being able to have custom filters or
 analyzers (new classes that don't even exist on the server yet), then it
 does start to get interesting.  This intersects with deployment in
 general... and I'm not sure what the right answer is.
 What if Lucene or Solr needs an upgrade?  It would be nice if that could
 also automatically be handled in a a large cluster... what are the
 options
 for handling that?  Is there a role here for OSGi to play?
  It sounds like at least some of that is outside of the Solr domain.

 An alternative to serializing everything would be to ship a new schema
 along
 with a new jar file containing the custom components.

 -Yonik









 --
 --Noble Paul



Re: Some new SOLR features

2008-09-18 Thread Jason Rutherglen
Hi Yonik,

One approach I have been working on that I will integrate into SOLR is
the ability to use serialized objects for the analyzers so that the
schema can be defined on the client side if need be.  The analyzer
classes will be dynamically loaded.  Or there is no need for a schema
and plain Java objects can be defined and used.

I'd like to see the synonyms serialized as well.  When I mentioned the
serialization it is in regards to setting the configuration over the
Hadoop RMI LUCENE-1336 protocol.  Instead of creating methods for each
new call one wants, the easiest approach in distributed computing is
to have a dynamic class loaded that operates directly on SolrCore and
so can do whatever is necessary to get the work completed.  Creating
new methods in distributed computing is always a bad idea IMO.

In realtime indexing one will not be able to simply reindex all the
time, so either a dynamic schema, or no schema at all, is best.
Otherwise the documents would need to have a schemaVersion field,
which gets messy; I have looked into this.

Jason
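As a rough illustration of the serialized-configuration idea above, the following sketch shows a client-defined field description shipped to the server as a serialized object. FieldSpec, toBytes, and fromBytes are invented names for illustration, not Solr or Lucene APIs; a real system would also need to ship (or already share) the class bytes for any custom analyzer implementations it names.

```java
import java.io.*;

// Hypothetical sketch: a client builds a plain value object describing a
// field and its analyzer class, serializes it for transport (RMI, HTTP, ...),
// and the server deserializes it and resolves the analyzer class dynamically.
public class SerializedConfigDemo {
    static final class FieldSpec implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String analyzerClass; // would be loaded dynamically on the server
        FieldSpec(String name, String analyzerClass) {
            this.name = name;
            this.analyzerClass = analyzerClass;
        }
    }

    // Client side: serialize the spec to bytes for transport.
    static byte[] toBytes(FieldSpec spec) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(spec);
        }
        return bos.toByteArray();
    }

    // Server side: deserialize; here we only recover the class name.
    static FieldSpec fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (FieldSpec) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        FieldSpec spec = new FieldSpec("body", "org.example.MyAnalyzer");
        FieldSpec roundTripped = fromBytes(toBytes(spec));
        // prints: body -> org.example.MyAnalyzer
        System.out.println(roundTripped.name + " -> " + roundTripped.analyzerClass);
    }
}
```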

On Wed, Sep 17, 2008 at 5:10 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
 On Wed, Sep 17, 2008 at 4:50 PM, Henrib [EMAIL PROTECTED] wrote:
 Yonik Seeley wrote:

 ...multi-core allows you to instantiate a completely
 new core and swap it for the old one, but it's a bit of a heavyweight
 approach
 ...a schema object would not be mutable, but
 that one could easily swap in a new schema object for an index at any
 time...


 Not sure I understand what we gain; if you change the schema, you'll most
 likely have to reindex as well.

 That's management at a higher level in a way.
 There are enough ways that one could change the schema in a compatible
 way (say like just adding query-time synonyms, etc) that it does seem
 like we should permit it.

 Or are you saying we should have a shortcut for the
 whole operation of
 creating a new core, reindex content, replacing an existing core ?

 Eventually, it seems like we should be able to handle re-indexing when
 necessary.
 And we should consider the ability to change some config without
 necessarily reloading *everything*.

 -Yonik



Re: Some new SOLR features

2008-09-18 Thread Jason Rutherglen
This should be done.  Great idea.

On Wed, Sep 17, 2008 at 3:41 PM, Lance Norskog [EMAIL PROTECTED] wrote:
 My vote is for dynamically scanning a directory of configuration files. When
 a new one appears, or an existing file is touched, load it. When a
 configuration disappears, unload it.  This model works very well for servlet
 containers.

 Lance

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
 Sent: Wednesday, September 17, 2008 11:21 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Some new SOLR features

 On Wed, Sep 17, 2008 at 1:27 PM, Jason Rutherglen
 [EMAIL PROTECTED] wrote:
 If the configuration code is going to be rewritten then I would like
 to see the ability to dynamically update the configuration and schema
 without needing to reboot the server.

 Exactly.  Actually, multi-core allows you to instantiate a completely new
 core and swap it for the old one, but it's a bit of a heavyweight approach.

 The key is finding the right granularity of change.
 My current thought is that a schema object would not be mutable, but that
 one could easily swap in a new schema object for an index at any time.  That
 would allow a single request to see a stable view of the schema, while
 preventing having to make every aspect of the schema thread-safe.

 Also I would like the
 configuration classes to just contain data and not have so many
 methods that operate on the filesystem.

 That's the plan... completely separate the serialized and in memory
 representations.

 This way the configuration
 object can be serialized, and loaded by the server dynamically.  It
 would be great for the schema to work the same way.

 Nothing will stop one from using java serialization for config persistence,
 however I am a fan of human-readable config files...
 so much easier to debug and support.  Right now, people can cut-n-paste
 relevant parts of their config in email for support, or to a wiki to explain
 things, etc.

 Of course, if you are talking about being able to have custom filters or
 analyzers (new classes that don't even exist on the server yet), then it
 does start to get interesting.  This intersects with deployment in
 general... and I'm not sure what the right answer is.
 What if Lucene or Solr needs an upgrade?  It would be nice if that could
 also automatically be handled in a large cluster... what are the options
 for handling that?  Is there a role here for OSGi to play?
  It sounds like at least some of that is outside of the Solr domain.

 An alternative to serializing everything would be to ship a new schema along
 with a new jar file containing the custom components.

 -Yonik




Re: Some new SOLR features

2008-09-18 Thread Jason Rutherglen
 That would allow a single request to see a stable view of the
 schema, while preventing having to make every aspect of the schema
 thread-safe.

Yes that is the best approach.

 Nothing will stop one from using java serialization for config
 persistence,

Config persistence should not use Java serialization.  Serialization is
for transport over the wire, for automated upgrades of the
configuration.  This could be done in XML as well, but it would be good
to support both models.

 Is there a role here for OSGi to play?

Yes.  Eclipse successfully uses OSGi, and for grid computing in Java,
and to take advantage of what Java can do with dynamic classloading,
OSGi is the way to go.  Every search project I have worked on needs
this stuff to be way easier than it is now.  The current distributed
computing model in SOLR may work, but it will not work reliably and
will break a lot.  When it does break there is no way to know what
happened.  This will create excessive downtime for users.  I have had
excessive downtime in production even in the current simple
master-slave architecture because there is no failover.  Failover
should be in the current system because it is easy to implement with
the rsync-based batch replication.

On Wed, Sep 17, 2008 at 2:21 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
 On Wed, Sep 17, 2008 at 1:27 PM, Jason Rutherglen
 [EMAIL PROTECTED] wrote:
 If the configuration code is going to be rewritten then I would like
 to see the ability to dynamically update the configuration and schema
 without needing to reboot the server.

 Exactly.  Actually, multi-core allows you to instantiate a completely
 new core and swap it for the old one, but it's a bit of a heavyweight
 approach.

 The key is finding the right granularity of change.
 My current thought is that a schema object would not be mutable, but
 that one could easily swap in a new schema object for an index at any
 time.  That would allow a single request to see a stable view of the
 schema, while preventing having to make every aspect of the schema
 thread-safe.

 Also I would like the
 configuration classes to just contain data and not have so many
 methods that operate on the filesystem.

 That's the plan... completely separate the serialized and in memory
 representations.

 This way the configuration
 object can be serialized, and loaded by the server dynamically.  It
 would be great for the schema to work the same way.

 Nothing will stop one from using java serialization for config
 persistence, however I am a fan of human-readable config files...
 so much easier to debug and support.  Right now, people can
 cut-n-paste relevant parts of their config in email for support, or to
 a wiki to explain things, etc.

 Of course, if you are talking about being able to have custom filters
 or analyzers (new classes that don't even exist on the server yet),
 then it does start to get interesting.  This intersects with
 deployment in general... and I'm not sure what the right answer is.
 What if Lucene or Solr needs an upgrade?  It would be nice if that
 could also automatically be handled in a large cluster... what are
 the options for handling that?  Is there a role here for OSGi to play?
  It sounds like at least some of that is outside of the Solr domain.

 An alternative to serializing everything would be to ship a new schema
 along with a new jar file containing the custom components.

 -Yonik



Re: Some new SOLR features

2008-09-18 Thread Jason Rutherglen
Servlets are one thing.  For SOLR the situation is different.  There
are always small changes people want to make, a new stop word, a small
tweak to an analyzer.  Rebooting the server for these should not be
necessary.  Ideally this is handled via a centralized console and
deployed over the network (using RMI or XML) so that files do not need
to be deployed.

On Thu, Sep 18, 2008 at 7:41 AM, Mark Miller [EMAIL PROTECTED] wrote:
 Isn't this done in servlet containers for debugging-type work? Maybe an
 option, but I disagree that this should drive anything in Solr. It should
 really be turned off in production in servlet containers, IMO, as well.

 This can really be such a pain in the ass on a live site... someone touches
 web.xml and the app server reboots... *shudder*. Seen it, don't dig it.

 Jason Rutherglen wrote:

 This should be done.  Great idea.

 On Wed, Sep 17, 2008 at 3:41 PM, Lance Norskog [EMAIL PROTECTED] wrote:


 My vote is for dynamically scanning a directory of configuration files.
 When
 a new one appears, or an existing file is touched, load it. When a
 configuration disappears, unload it.  This model works very well for
 servlet
 containers.

 Lance

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
 Seeley
 Sent: Wednesday, September 17, 2008 11:21 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Some new SOLR features

 On Wed, Sep 17, 2008 at 1:27 PM, Jason Rutherglen
 [EMAIL PROTECTED] wrote:


 If the configuration code is going to be rewritten then I would like
 to see the ability to dynamically update the configuration and schema
 without needing to reboot the server.


 Exactly.  Actually, multi-core allows you to instantiate a completely new
 core and swap it for the old one, but it's a bit of a heavyweight
 approach.

 The key is finding the right granularity of change.
 My current thought is that a schema object would not be mutable, but that
 one could easily swap in a new schema object for an index at any time.  That
 would allow a single request to see a stable view of the schema, while
 preventing having to make every aspect of the schema thread-safe.



 Also I would like the
 configuration classes to just contain data and not have so many
 methods that operate on the filesystem.


 That's the plan... completely separate the serialized and in memory
 representations.



 This way the configuration
 object can be serialized, and loaded by the server dynamically.  It
 would be great for the schema to work the same way.


 Nothing will stop one from using java serialization for config
 persistence,
 however I am a fan of human-readable config files...
 so much easier to debug and support.  Right now, people can cut-n-paste
 relevant parts of their config in email for support, or to a wiki to
 explain
 things, etc.

 Of course, if you are talking about being able to have custom filters or
 analyzers (new classes that don't even exist on the server yet), then it
 does start to get interesting.  This intersects with deployment in
 general... and I'm not sure what the right answer is.
 What if Lucene or Solr needs an upgrade?  It would be nice if that could
 also automatically be handled in a large cluster... what are the options
 for handling that?  Is there a role here for OSGi to play?
  It sounds like at least some of that is outside of the Solr domain.

 An alternative to serializing everything would be to ship a new schema
 along
 with a new jar file containing the custom components.

 -Yonik







Re: Some new SOLR features

2008-09-18 Thread Mark Miller
Dynamic changes are not what I'm against...I'm against dynamic changes 
that are triggered by the app noticing that the config has changed.


Jason Rutherglen wrote:

Servlets are one thing.  For SOLR the situation is different.  There
are always small changes people want to make, a new stop word, a small
tweak to an analyzer.  Rebooting the server for these should not be
necessary.  Ideally this is handled via a centralized console and
deployed over the network (using RMI or XML) so that files do not need
to be deployed.

On Thu, Sep 18, 2008 at 7:41 AM, Mark Miller [EMAIL PROTECTED] wrote:
  

Isn't this done in servlet containers for debugging-type work? Maybe an
option, but I disagree that this should drive anything in Solr. It should
really be turned off in production in servlet containers, IMO, as well.

This can really be such a pain in the ass on a live site... someone touches
web.xml and the app server reboots... *shudder*. Seen it, don't dig it.

Jason Rutherglen wrote:


This should be done.  Great idea.

On Wed, Sep 17, 2008 at 3:41 PM, Lance Norskog [EMAIL PROTECTED] wrote:

  

My vote is for dynamically scanning a directory of configuration files.
When
a new one appears, or an existing file is touched, load it. When a
configuration disappears, unload it.  This model works very well for
servlet
containers.

Lance

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Wednesday, September 17, 2008 11:21 AM
To: solr-user@lucene.apache.org
Subject: Re: Some new SOLR features

On Wed, Sep 17, 2008 at 1:27 PM, Jason Rutherglen
[EMAIL PROTECTED] wrote:



If the configuration code is going to be rewritten then I would like
to see the ability to dynamically update the configuration and schema
without needing to reboot the server.

  

Exactly.  Actually, multi-core allows you to instantiate a completely new
core and swap it for the old one, but it's a bit of a heavyweight
approach.

The key is finding the right granularity of change.
My current thought is that a schema object would not be mutable, but that
one could easily swap in a new schema object for an index at any time.  That
would allow a single request to see a stable view of the schema, while
preventing having to make every aspect of the schema thread-safe.




Also I would like the
configuration classes to just contain data and not have so many
methods that operate on the filesystem.

  

That's the plan... completely separate the serialized and in memory
representations.




This way the configuration
object can be serialized, and loaded by the server dynamically.  It
would be great for the schema to work the same way.

  

Nothing will stop one from using java serialization for config
persistence,
however I am a fan of human-readable config files...
so much easier to debug and support.  Right now, people can cut-n-paste
relevant parts of their config in email for support, or to a wiki to
explain
things, etc.

Of course, if you are talking about being able to have custom filters or
analyzers (new classes that don't even exist on the server yet), then it
does start to get interesting.  This intersects with deployment in
general... and I'm not sure what the right answer is.
What if Lucene or Solr needs an upgrade?  It would be nice if that could
also automatically be handled in a large cluster... what are the options
for handling that?  Is there a role here for OSGi to play?
 It sounds like at least some of that is outside of the Solr domain.

An alternative to serializing everything would be to ship a new schema
along
with a new jar file containing the custom components.

-Yonik









Re: Some new SOLR features

2008-09-18 Thread Jason Rutherglen
Yes, so it's probably best to make the changes through a remote
interface so that the app can make the appropriate internal changes.
File-based system changes are less than ideal, agreed; however, I
suppose with an open source project such as SOLR the kitchen-sink
effect happens and they will find their way in there anyway.  The hard
part is organizing the project so that it does not get too bloated
with everyone's features, and allows features to be pluggable outside
of the core releases.  There are many things that may be best as
contrib modules, OSGi-based add-ons rather than pieces of the standard
releases (though I don't have specific examples off hand).  The
standard for SOLR contribs could be OSGi.  This would greatly assist
SOLR in becoming grid-computing friendly.  Ideally SOLR 2.0 would be
cleaner, standardized, and most of the features pluggable.  That would
allow for consistent release cycles and make grid computing simpler to
implement.  SOLR seems like it could be going in the direction of
bloat, which could increasingly confuse new users.  Instead, users
could implement their own modules and upload them to the contrib
section, or implement their own proprietary ones.

I am curious: what is the recommended place to put query expansion
code (such as adding boosting, adding phrase queries, and the like)?
Is it now best to use a SearchComponent?  Is it possible in the
future to make SearchComponents OSGi-enabled?

On Thu, Sep 18, 2008 at 7:56 AM, Mark Miller [EMAIL PROTECTED] wrote:
 Dynamic changes are not what I'm against...I'm against dynamic changes that
 are triggered by the app noticing that the config has changed.

 Jason Rutherglen wrote:

 Servlets are one thing.  For SOLR the situation is different.  There
 are always small changes people want to make, a new stop word, a small
 tweak to an analyzer.  Rebooting the server for these should not be
 necessary.  Ideally this is handled via a centralized console and
 deployed over the network (using RMI or XML) so that files do not need
 to be deployed.

 On Thu, Sep 18, 2008 at 7:41 AM, Mark Miller [EMAIL PROTECTED]
 wrote:


 Isn't this done in servlet containers for debugging-type work? Maybe an
 option, but I disagree that this should drive anything in Solr. It should
 really be turned off in production in servlet containers, IMO, as well.

 This can really be such a pain in the ass on a live site... someone touches
 web.xml and the app server reboots... *shudder*. Seen it, don't dig it.

 Jason Rutherglen wrote:


 This should be done.  Great idea.

 On Wed, Sep 17, 2008 at 3:41 PM, Lance Norskog [EMAIL PROTECTED]
 wrote:



 My vote is for dynamically scanning a directory of configuration files.
 When
 a new one appears, or an existing file is touched, load it. When a
 configuration disappears, unload it.  This model works very well for
 servlet
 containers.

 Lance

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
 Seeley
 Sent: Wednesday, September 17, 2008 11:21 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Some new SOLR features

 On Wed, Sep 17, 2008 at 1:27 PM, Jason Rutherglen
 [EMAIL PROTECTED] wrote:



 If the configuration code is going to be rewritten then I would like
 to see the ability to dynamically update the configuration and schema
 without needing to reboot the server.



 Exactly.  Actually, multi-core allows you to instantiate a completely
 new
 core and swap it for the old one, but it's a bit of a heavyweight
 approach.

 The key is finding the right granularity of change.
 My current thought is that a schema object would not be mutable, but that
 one could easily swap in a new schema object for an index at any time.  That
 would allow a single request to see a stable view of the schema, while
 preventing having to make every aspect of the schema thread-safe.




 Also I would like the
 configuration classes to just contain data and not have so many
 methods that operate on the filesystem.



 That's the plan... completely separate the serialized and in memory
 representations.




 This way the configuration
 object can be serialized, and loaded by the server dynamically.  It
 would be great for the schema to work the same way.



 Nothing will stop one from using java serialization for config
 persistence,
 however I am a fan of human-readable config files...
 so much easier to debug and support.  Right now, people can cut-n-paste
 relevant parts of their config in email for support, or to a wiki to
 explain
 things, etc.

 Of course, if you are talking about being able to have custom filters
 or
 analyzers (new classes that don't even exist on the server yet), then
 it
 does start to get interesting.  This intersects with deployment in
 general... and I'm not sure what the right answer is.
 What if Lucene or Solr needs an upgrade?  It would be nice if that
 could
 also automatically be handled in a large cluster... what

Re: Some new SOLR features

2008-09-17 Thread Yonik Seeley
On Tue, Sep 16, 2008 at 10:12 AM, Jason Rutherglen
[EMAIL PROTECTED] wrote:
  SQL database such as H2
 Mainly to offer joins and be able to perform hierarchical queries.

Can you define or give an example of what you mean by hierarchical queries?
A downside of any type of cross-document queries (like joins) is that
it tends to limit scalability.  Of course, I think it's acceptable to
have some query types that only work on a single shard, since that may
continue to cover the majority of users.

Along the same lines, I think it would be useful to have a highly
integrated extension point for stored fields (so they could be
retrieved from external systems if needed).

-Yonik


Re: Some new SOLR features

2008-09-17 Thread Jason Rutherglen
If the configuration code is going to be rewritten then I would like
to see the ability to dynamically update the configuration and schema
without needing to reboot the server.  Also I would like the
configuration classes to just contain data and not have so many
methods that operate on the filesystem.  This way the configuration
object can be serialized, and loaded by the server dynamically.  It
would be great for the schema to work the same way.

Yonik, what is the best way to get this type of things going?  Where
in the code do you want to implement the distributed RMI Hadoop stuff?

On Tue, Sep 16, 2008 at 1:07 PM, Henrib [EMAIL PROTECTED] wrote:



 ryantxu wrote:


 Yes, include would get us some of the way there, but not far enough
 (IMHO).  The problem is that (as written) you still need to have all
 the configs scattered across various directories.



 It does not allow us to go *all* the way, but it does allow putting
 configuration files in one directory (plus schema & conf can have specific
 names set for each CoreDescriptor).
 There actually is a test where the config & schema are shared & can set the
 dataDir as a property.
 Still a step forward...

 --
 View this message in context: 
 http://www.nabble.com/Some-new-SOLR-features-tp19494251p19516242.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Some new SOLR features

2008-09-17 Thread Jason Rutherglen
  Can you define or give an example of what you mean by hierarchical queries?

Good question; I think Erik Hatcher had more ideas on that.  I was
imagining joins or subqueries like SQL does.  Clearly they won't be
efficient, but it's easier than implementing joins in SOLR (or is it?).

Joins limit scalability, that is true; I guess it's just the nature of
the thing.  Unless there is some other way to do it.  Doesn't Oracle
implement some sort of distributed join in their clustering solution?
Is it worth it?

On Wed, Sep 17, 2008 at 12:25 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
 On Tue, Sep 16, 2008 at 10:12 AM, Jason Rutherglen
 [EMAIL PROTECTED] wrote:
  SQL database such as H2
 Mainly to offer joins and be able to perform hierarchical queries.

 Can you define or give an example of what you mean by hierarchical queries?
 A downside of any type of cross-document queries (like joins) is that
 it tends to limit scalability.  Of course, I think it's acceptable to
 have some query types that only work on a single shard, since that may
 continue to cover the majority of users.

 Along the same lines, I think it would be useful to have a highly
 integrated extension point for stored fields (so they could be
 retrieved from external systems if needed).

 -Yonik



Re: Some new SOLR features

2008-09-17 Thread Yonik Seeley
On Wed, Sep 17, 2008 at 1:27 PM, Jason Rutherglen
[EMAIL PROTECTED] wrote:
 If the configuration code is going to be rewritten then I would like
 to see the ability to dynamically update the configuration and schema
 without needing to reboot the server.

Exactly.  Actually, multi-core allows you to instantiate a completely
new core and swap it for the old one, but it's a bit of a heavyweight
approach.

The key is finding the right granularity of change.
My current thought is that a schema object would not be mutable, but
that one could easily swap in a new schema object for an index at any
time.  That would allow a single request to see a stable view of the
schema, while preventing having to make every aspect of the schema
thread-safe.

 Also I would like the
 configuration classes to just contain data and not have so many
 methods that operate on the filesystem.

That's the plan... completely separate the serialized and in memory
representations.

 This way the configuration
 object can be serialized, and loaded by the server dynamically.  It
 would be great for the schema to work the same way.

Nothing will stop one from using java serialization for config
persistence, however I am a fan of human-readable config files...
so much easier to debug and support.  Right now, people can
cut-n-paste relevant parts of their config in email for support, or to
a wiki to explain things, etc.

Of course, if you are talking about being able to have custom filters
or analyzers (new classes that don't even exist on the server yet),
then it does start to get interesting.  This intersects with
deployment in general... and I'm not sure what the right answer is.
What if Lucene or Solr needs an upgrade?  It would be nice if that
could also automatically be handled in a large cluster... what are
the options for handling that?  Is there a role here for OSGi to play?
 It sounds like at least some of that is outside of the Solr domain.

An alternative to serializing everything would be to ship a new schema
along with a new jar file containing the custom components.

-Yonik
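The immutable-schema-with-atomic-swap idea described above can be sketched roughly as follows. Schema and Core here are illustrative stand-ins, not Solr's actual classes: the schema object is immutable, the core holds it behind an AtomicReference, each request snapshots the reference once and sees a stable view, and an admin thread can swap in a replacement without any per-field locking.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Immutable schema: nothing in it can change after construction.
final class Schema {
    private final Map<String, String> fieldTypes;
    Schema(Map<String, String> fieldTypes) {
        this.fieldTypes = Map.copyOf(fieldTypes); // defensive, immutable copy
    }
    String fieldType(String field) {
        return fieldTypes.get(field);
    }
}

final class Core {
    private final AtomicReference<Schema> schema = new AtomicReference<>();
    Core(Schema initial) { schema.set(initial); }

    // Each request snapshots the schema once and uses that snapshot throughout.
    Schema snapshot() { return schema.get(); }

    // Swapping is a single atomic pointer update: in-flight requests keep
    // their old snapshot, new requests see the new schema.
    void swapSchema(Schema next) { schema.set(next); }
}

public class SchemaSwapDemo {
    public static void main(String[] args) {
        Core core = new Core(new Schema(Map.of("title", "text")));
        Schema before = core.snapshot();
        core.swapSchema(new Schema(Map.of("title", "text", "tags", "string")));
        System.out.println(before.fieldType("tags"));           // prints: null
        System.out.println(core.snapshot().fieldType("tags"));  // prints: string
    }
}
```

Because nothing inside Schema ever mutates, no aspect of it needs to be thread-safe; only the single reference swap is a concurrency point.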


RE: Some new SOLR features

2008-09-17 Thread Lance Norskog
My vote is for dynamically scanning a directory of configuration files. When
a new one appears, or an existing file is touched, load it. When a
configuration disappears, unload it.  This model works very well for servlet
containers.

Lance
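The directory-scanning model described above could be sketched as a periodic poll: each pass compares last-modified times against the previous pass, (re)loads new or touched files, and unloads files that have disappeared. This is only a sketch; loadConfig/unloadConfig are hypothetical hooks (a real implementation would parse and apply the file), not Solr APIs.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ConfigScanner {
    private final Path dir;
    private final Map<Path, Long> lastSeen = new HashMap<>(); // path -> mtime at last scan
    private final Set<Path> loaded = new HashSet<>();

    public ConfigScanner(Path dir) { this.dir = dir; }

    public Set<Path> loaded() { return loaded; }

    private void loadConfig(Path p) {
        loaded.add(p);                       // a real hook would parse and apply the file
        System.out.println("load " + p.getFileName());
    }

    private void unloadConfig(Path p) {
        loaded.remove(p);
        System.out.println("unload " + p.getFileName());
    }

    // One scan pass; call periodically, e.g. from a ScheduledExecutorService.
    public void scan() throws IOException {
        Map<Path, Long> current = new HashMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path p : files) {
                current.put(p, Files.getLastModifiedTime(p).toMillis());
            }
        }
        for (Map.Entry<Path, Long> e : current.entrySet()) {
            Long previous = lastSeen.get(e.getKey());
            if (previous == null || !previous.equals(e.getValue())) {
                loadConfig(e.getKey());      // new file, or existing file touched
            }
        }
        for (Path p : lastSeen.keySet()) {
            if (!current.containsKey(p)) {
                unloadConfig(p);             // file disappeared
            }
        }
        lastSeen.clear();
        lastSeen.putAll(current);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("confscan");
        ConfigScanner scanner = new ConfigScanner(dir);
        Files.writeString(dir.resolve("schema.xml"), "<schema/>");
        scanner.scan();   // prints: load schema.xml
        Files.delete(dir.resolve("schema.xml"));
        scanner.scan();   // prints: unload schema.xml
    }
}
```

Java's java.nio.file.WatchService could replace the polling loop, but a plain scan is simpler to reason about and behaves the same on filesystems without native change notification.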

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Wednesday, September 17, 2008 11:21 AM
To: solr-user@lucene.apache.org
Subject: Re: Some new SOLR features

On Wed, Sep 17, 2008 at 1:27 PM, Jason Rutherglen
[EMAIL PROTECTED] wrote:
 If the configuration code is going to be rewritten then I would like 
 to see the ability to dynamically update the configuration and schema 
 without needing to reboot the server.

Exactly.  Actually, multi-core allows you to instantiate a completely new
core and swap it for the old one, but it's a bit of a heavyweight approach.

The key is finding the right granularity of change.
My current thought is that a schema object would not be mutable, but that
one could easily swap in a new schema object for an index at any time.  That
would allow a single request to see a stable view of the schema, while
preventing having to make every aspect of the schema thread-safe.

 Also I would like the
 configuration classes to just contain data and not have so many 
 methods that operate on the filesystem.

That's the plan... completely separate the serialized and in memory
representations.

 This way the configuration
 object can be serialized, and loaded by the server dynamically.  It 
 would be great for the schema to work the same way.

Nothing will stop one from using java serialization for config persistence,
however I am a fan of human-readable config files...
so much easier to debug and support.  Right now, people can cut-n-paste
relevant parts of their config in email for support, or to a wiki to explain
things, etc.

Of course, if you are talking about being able to have custom filters or
analyzers (new classes that don't even exist on the server yet), then it
does start to get interesting.  This intersects with deployment in
general... and I'm not sure what the right answer is.
What if Lucene or Solr needs an upgrade?  It would be nice if that could
also automatically be handled in a large cluster... what are the options
for handling that?  Is there a role here for OSGi to play?
 It sounds like at least some of that is outside of the Solr domain.

An alternative to serializing everything would be to ship a new schema along
with a new jar file containing the custom components.

-Yonik



Re: Some new SOLR features

2008-09-17 Thread Yonik Seeley
On Wed, Sep 17, 2008 at 4:50 PM, Henrib [EMAIL PROTECTED] wrote:
 Yonik Seeley wrote:

 ...multi-core allows you to instantiate a completely
 new core and swap it for the old one, but it's a bit of a heavyweight
 approach
 ...a schema object would not be mutable, but
 that one could easily swap in a new schema object for an index at any
 time...


 Not sure I understand what we gain; if you change the schema, you'll most
 likely have to reindex as well.

That's management at a higher level in a way.
There are enough ways that one could change the schema in a compatible
way (say like just adding query-time synonyms, etc) that it does seem
like we should permit it.

 Or are you saying we should have a shortcut for the
 whole operation of
 creating a new core, reindex content, replacing an existing core ?

Eventually, it seems like we should be able to handle re-indexing when
necessary.
And we should consider the ability to change some config without
necessarily reloading *everything*.

-Yonik


Re: Some new SOLR features

2008-09-16 Thread Jason Rutherglen
Hello Ryan,

  SQL database such as H2

Mainly to offer joins and be able to perform hierarchical queries.
Also any other types of queries a hybrid SQL search system would
offer.  This is something that is best built into SOLR rather than
Lucene.  It seems like a lot of the users of SOLR work with SQL
databases as well.  It would seem natural to integrate the two.  Also
the Summize realtime search system that Twitter purchased worked by
integrating with MySQL.  The way to do something similar in Lucene
would be to integrate with a Java SQL database.  Also hierarchical
queries could be performed faster using this method (though I could be
wrong, if there is a better way).

 to have multiple lucene indexes within a single SolrCore?

I don't like the whole multi-core thing from an administrative
perspective.  It means each index needs a separate schema,
configuration, etc.  That becomes hard to manage if 10+ indexes are
required, and it is definitely not as simple as an SQL database, which
does not require so many separate directories and manual
configuration.  It would be simple to add this into SOLR.  In general,
though, I have trouble figuring out many of the design decisions of
SOLR, and so hesitate to implement things that seem to go against the
SOLR design model (is there one?).

 9. Distributed search and updates using an object serialization which

Where would I start with integrating this into SOLR?  Need some help
on that part of it.  Tell me what's best and I'll integrate it; it
should be the easiest item on the list.

Jason

On Mon, Sep 15, 2008 at 11:44 AM, Ryan McKinley [EMAIL PROTECTED] wrote:


 Here are my gut reactions to this list... in general, most of this comes
 down to "sounds great; if someone did the work, I'm all for it!"

 Also, no need to post to solr-user AND solr-dev, probably better to think of
 solr-user as a superset of solr-dev.


 1. Machine learning based suggest feature
  https://issues.apache.org/jira/browse/LUCENE-626 which is similar to
  what Google does in their suggest implementation.  The fuzzy-based
  spellchecker is ok, but it would be better to incorporate user
  behavior.
 2. Realtime updates https://issues.apache.org/jira/browse/LUCENE-1313
 and work being planned for IndexWriter
 3. Realtime untokenized field updates
 https://issues.apache.org/jira/browse/LUCENE-1292

 Without knowing the details of these patches, everything sounds great.

 In my view, SOLR should offer a nice interface to anything in lucene
 core/contrib


 4. BM25 Scoring

  Again, no idea, but if implemented in Lucene, yes


 5. Integration with an open source SQL database such as H2.  This
 would mean under the hood, SOLR would enable storing data in a
 relational database to allow for joins and things.  It would need to
 be combined with realtime updates.  H2 has Lucene integration but it
 is the usual index everything at once, non-incrementally.  The new
 system would simply index as a new row in a table is added.  The SOLR
 schema could allow for certain fields being stored in an SQL database.

 Sounds interesting -- what is the basic problem you are addressing?

 (It seems you are pointing to something specific, and describing your
 solution)



 6. SOLR schema allowing for multiple indexes without using the
 multicore.  The indexes could be defined like SQL tables in the
 schema.xml file.

  Is this just a configuration issue?  I definitely hope we can make
  configuration easier in the future.

  As is, a custom handler can look at multiple indexes... why is there a need
  to have multiple lucene indexes within a single SolrCore?



 6. Crowd by feature ala GBase
 http://code.google.com/apis/base/attrs-queries.html#crowding which is
 similar to Field Collapsing.  I am thinking it is advantageous from a
 performance perspective to obtain an excessive amount of results, then
 filter down the result set, rather than first sort a result set.

 Again, sounds great!  I would love to see it.


 7. Improved relevance based on user clicks of individual query results
 for individual queries.  This can be thought of as similar to what
 Digg does.  I'm sure Google does something similar.  It is a feature
 that would be of value to almost any SOLR implementation.

 Agreed -- if there is a good way to quickly update a field used for
 sorting/scoring, this would happen


 8. Integration of LocalSolr into the standard SOLR distribution.
 Location is something many sites use these days and is standard in
 GBase and most likely other products like FAST.

  I'm working on it ... will be a lucene contrib package and cooked into the
  core solr distribution.



  9. Distributed search and updates using an object serialization which
 could use.  https://issues.apache.org/jira/browse/LUCENE-1336  This
 allows span queries, custom payload queries, custom similarities,
 custom analyzers, without compiling and deploying and a new SOLR war
 file to individual servers.


 sounds good (but I have no technical basis to say so)

Re: Some new SOLR features

2008-09-16 Thread Ryan McKinley


On Sep 16, 2008, at 10:12 AM, Jason Rutherglen wrote:


Hello Ryan,


SQL database such as H2


Mainly to offer joins and be able to perform hierarchical queries.
Also any other types of queries a hybrid SQL search system would
offer.  This is something that is best built into SOLR rather than
Lucene.  It seems like a lot of the users of SOLR work with SQL
databases as well.  It would seem natural to integrate the two.  Also
the Summize realtime search system that Twitter purchased worked by
integrating with MySQL.  The way to do something similar in Lucene
would be to integrate with a Java SQL database.  Also hierarchical
queries could be performed faster using this method (though I could be
wrong, if there is a better way).



Definitely sounds interesting -- not on my personal TODO list, but I  
can see the value and would support this direction (perhaps as a  
contrib?)
For starters, it seems like everything could happen in a custom  
RequestHandler  (perhaps QueryComponent?)




to have multiple lucene indexes within a single SolrCore?


I don't like the whole multi core thing from an administrative
perspective.  That means each index needs a separate schema and
configuration etc.  That becomes hard to manage if there are 10+
indexes required, and is definitely not as simple as an SQL database,
which does not require so many separate directories and manual
configuration.


I 100% agree that multicore configuration gets unwieldy quickly.   
That said what I'm hearing from you is the config is problematic, not  
that you really need multiple lucene indexes in the same SolrCore.


FYI -- the name SolrCore is perhaps legacy from when it was static  
and had access to the only index available.  With MultiCore we  
removed all the static access and each lucene index gets a SolrCore.   
Maybe better to think of SolrCore as SolrIndex -- everything you can  
do with one index.


Yes, I would like to see a way to specify all the fieldtypes /  
handlers in one location and then only specify what fields are  
available for each core.


So yes -- I agree.  In 2.0, I hope to flesh out configs so they are  
not monstrous.




 It would be simple to add this into SOLR.  In general
I have trouble figuring out many of the design decisions of
SOLR, though, and so hesitate to implement things that seem to go
against the SOLR design model (is there one?).



The 1.X line is organic growth from an internal CNET architecture.
I hope the 2.X line will have a more consistent design model...

As far as getting around the existing multicore configs ... I do this  
in my code by overriding:


  protected CoreContainer.Initializer createInitializer() {
      return new CoreContainer.Initializer();
  }

in SolrDispatchFilter.

I actually initialize the CoreContainer manually (pulling some info  
from a SQL database)




9. Distributed search and updates using an object serialization which


Where would I start with integrating this into SOLR?  Need some help
on that part of it.  Tell me what's best and I'll integrate it, it
should be the easiest on the list.



not sure ;)  Distributed search is one of the areas I have not looked  
at


ryan



Re: Some new SOLR features

2008-09-16 Thread Henrib



ryantxu wrote:
 
 ...
 Yes, I would like to see a way to specify all the fieldtypes /   
 handlers in one location and then only specify what fields are   
 available for each core. 
 
 So yes -- I agree.  In 2.0, I hope to flesh out configs so they are   
 not monstrous. 
 ...
 

What about using include so each core can have a minimal specific
configuration and schema & everything else shared between them?
Something akin to what's allowed by solr-646.
Just couldn't resist :-)
Henri

-- 
View this message in context: 
http://www.nabble.com/Some-new-SOLR-features-tp19494251p19515526.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Some new SOLR features

2008-09-16 Thread Ryan McKinley


ryantxu wrote:


...
Yes, I would like to see a way to specify all the fieldtypes /
handlers in one location and then only specify what fields are
available for each core.

So yes -- I agree.  In 2.0, I hope to flesh out configs so they are
not monstrous.
...



What about using include so each core can have a minimal specific
configuration and schema & everything else shared between them?
Something akin to what's allowed by solr-646.
Just couldn't resist :-)
Henri



somehow I knew that was coming :)

Yes, include would get us some of the way there, but not far enough  
(IMHO).  The problem is that (as written) you still need to have all  
the configs scattered about various directories.



ryan


Re: Some new SOLR features

2008-09-16 Thread Henrib



ryantxu wrote:
 
 
 Yes, include would get us some of the way there, but not far enough  
 (IMHO).  The problem is that (as written) you still need to have all  
 the configs scattered about various directories.
 
 

It does not allow us to go *all* the way, but it does allow putting
configuration files in one directory (plus schema & conf can have specific
names set for each CoreDescriptor).
There actually is a test where the config & schema are shared & one can set
the dataDir as a property.
Still a step forward...
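For illustration, that style of sharing boils down to XInclude: a hypothetical per-core solrconfig.xml (the file names and paths below are made up, not from SOLR-646 itself) could pull the common sections from one shared file and keep only its differences locally:

```xml
<!-- hypothetical per-core solrconfig.xml; shared file's path is invented -->
<config xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- everything common to all cores lives in one shared file -->
  <xi:include href="../shared/solrconfig-common.xml"/>
  <!-- only the per-core differences stay here, e.g. the dataDir property -->
  <dataDir>${dataDir:./data}</dataDir>
</config>
```

The per-core file then stays a handful of lines instead of a full config copy.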

-- 
View this message in context: 
http://www.nabble.com/Some-new-SOLR-features-tp19494251p19516242.html
Sent from the Solr - User mailing list archive at Nabble.com.



Some new SOLR features

2008-09-15 Thread Jason Rutherglen
Hello,

There are a few features I would like to see in SOLR going forward and
I am interested in finding out what other folks thought about them to
get a priority list.  I believe there are many features that Google
and FAST have that SOLR and Lucene will want to implement in future
releases.

1. Machine learning based suggest feature
https://issues.apache.org/jira/browse/LUCENE-626 which is similar to
what Google does in their suggest implementation.  The
fuzzy-based spellchecker is ok, but it would be better to incorporate
user behavior.
2. Realtime updates https://issues.apache.org/jira/browse/LUCENE-1313
and work being planned for IndexWriter
3. Realtime untokenized field updates
https://issues.apache.org/jira/browse/LUCENE-1292
4. BM25 Scoring
5. Integration with an open source SQL database such as H2.  This
would mean under the hood, SOLR would enable storing data in a
relational database to allow for joins and things.  It would need to
be combined with realtime updates.  H2 has Lucene integration but it
is the usual index everything at once, non-incrementally.  The new
system would simply index as a new row in a table is added.  The SOLR
schema could allow for certain fields being stored in an SQL database.
6. SOLR schema allowing for multiple indexes without using the
multicore.  The indexes could be defined like SQL tables in the
schema.xml file.
6. Crowd by feature ala GBase
http://code.google.com/apis/base/attrs-queries.html#crowding which is
similar to Field Collapsing.  I am thinking it is advantageous from a
performance perspective to obtain an excessive amount of results, then
filter down the result set, rather than first sort a result set.
7. Improved relevance based on user clicks of individual query results
for individual queries.  This can be thought of as similar to what
Digg does.  I'm sure Google does something similar.  It is a feature
that would be of value to almost any SOLR implementation.
8. Integration of LocalSolr into the standard SOLR distribution.
Location is something many sites use these days and is standard in
GBase and most likely other products like FAST.
9. Distributed search and updates using an object serialization which
could use.  https://issues.apache.org/jira/browse/LUCENE-1336  This
allows span queries, custom payload queries, custom similarities,
custom analyzers, without compiling and deploying and a new SOLR war
file to individual servers.

Cheers,
Jason


Re: Some new SOLR features

2008-09-15 Thread Ryan McKinley




Here are my gut reactions to this list... in general, most of this  
comes down to "sounds great, if someone did the work I'm all for it!"


Also, no need to post to solr-user AND solr-dev, probably better to  
think of solr-user as a superset of solr-dev.




1. Machine learning based suggest feature
https://issues.apache.org/jira/browse/LUCENE-626 which is similar to
what Google does in their suggest implementation.  The
fuzzy-based spellchecker is ok, but it would be better to incorporate
user behavior.
2. Realtime updates https://issues.apache.org/jira/browse/LUCENE-1313
and work being planned for IndexWriter
3. Realtime untokenized field updates
https://issues.apache.org/jira/browse/LUCENE-1292


Without knowing the details of these patches, everything sounds great.

In my view, SOLR should offer a nice interface to anything in lucene  
core/contrib




4. BM25 Scoring


Again, no idea, but if implemented in Lucene, yes
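For reference, the BM25 scoring mentioned here is a small closed formula. The sketch below is the textbook Okapi BM25 term weight with the usual defaults k1=1.2 and b=0.75; the class and method names are invented for illustration and are not Lucene APIs:

```java
/** Standalone sketch of the Okapi BM25 term weight (not Lucene code). */
public class Bm25Sketch {

    /** BM25's smoothed inverse document frequency. */
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** One term's contribution to a document's score. */
    static double termScore(double tf, double docLen, double avgDocLen,
                            long docCount, long docFreq, double k1, double b) {
        // length normalization: longer-than-average docs are penalized
        double norm = k1 * (1 - b + b * docLen / avgDocLen);
        // tf saturates toward (k1 + 1) * idf as term frequency grows
        return idf(docCount, docFreq) * tf * (k1 + 1) / (tf + norm);
    }

    public static void main(String[] args) {
        // A rare term (df=10 of 10,000 docs) outweighs a common one (df=5,000)
        System.out.println(termScore(3, 100, 120, 10_000, 10, 1.2, 0.75));
        System.out.println(termScore(3, 100, 120, 10_000, 5_000, 1.2, 0.75));
    }
}
```

The saturation in `termScore` is the main practical difference from Lucene's classic tf-idf: repeated occurrences of a term quickly stop adding score.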



5. Integration with an open source SQL database such as H2.  This
would mean under the hood, SOLR would enable storing data in a
relational database to allow for joins and things.  It would need to
be combined with realtime updates.  H2 has Lucene integration but it
is the usual index everything at once, non-incrementally.  The new
system would simply index as a new row in a table is added.  The SOLR
schema could allow for certain fields being stored in an SQL database.


Sounds interesting -- what is the basic problem you are addressing?

(It seems you are pointing to something specific, and describing your  
solution)





6. SOLR schema allowing for multiple indexes without using the
multicore.  The indexes could be defined like SQL tables in the
schema.xml file.


Is this just a configuration issue?  I definitely hope we can make  
configuration easier in the future.


As is, a custom handler can look at multiple indexes... why is there a  
need to have multiple lucene indexes within a single SolrCore?





6. Crowd by feature ala GBase
http://code.google.com/apis/base/attrs-queries.html#crowding which is
similar to Field Collapsing.  I am thinking it is advantageous from a
performance perspective to obtain an excessive amount of results, then
filter down the result set, rather than first sort a result set.


Again, sounds great!  I would love to see it.
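The over-fetch-then-limit idea behind crowding can be sketched outside any search engine: take hits already sorted by score and cap how many survive per group key. Everything below (class name, data shapes) is an invented standalone illustration, not GBase or Solr code:

```java
import java.util.*;

/** Sketch of "crowding": over-fetch ranked hits, then cap hits per group. */
public class CrowdingSketch {

    /** Keep at most maxPerGroup hits per group key, preserving rank order. */
    static List<String> crowd(List<Map.Entry<String, String>> rankedHits,
                              int maxPerGroup) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> hit : rankedHits) {
            // merge() increments this group's counter and returns the new count
            int count = seen.merge(hit.getKey(), 1, Integer::sum);
            if (count <= maxPerGroup) {
                out.add(hit.getValue());  // hit survives the crowd limit
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hits already sorted by score; key = site, value = doc id.
        List<Map.Entry<String, String>> hits = List.of(
            Map.entry("siteA", "a1"), Map.entry("siteA", "a2"),
            Map.entry("siteA", "a3"), Map.entry("siteB", "b1"));
        System.out.println(crowd(hits, 2));  // [a1, a2, b1]
    }
}
```

This is where the performance intuition from the original post lives: the filter is a single linear pass over an over-fetched result list, with no extra sort.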



7. Improved relevance based on user clicks of individual query results
for individual queries.  This can be thought of as similar to what
Digg does.  I'm sure Google does something similar.  It is a feature
that would be of value to almost any SOLR implementation.


Agreed -- if there is a good way to quickly update a field used for  
sorting/scoring, this would happen
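One way to picture the scoring side of click feedback: keep a per-document click count and fold it into the base relevance score with a damped boost. The formula, the weight, and the class name below are arbitrary choices for illustration only, not anything Solr or Digg actually uses:

```java
/** Sketch of click-feedback boosting: blend a base relevance score
 *  with a logged click count. Formula and weight are invented. */
public class ClickBoostSketch {

    static double boosted(double baseScore, long clicks, double weight) {
        // log1p damping so runaway click counts can't drown textual relevance
        return baseScore * (1.0 + weight * Math.log1p(clicks));
    }

    public static void main(String[] args) {
        System.out.println(boosted(1.0, 0, 0.2));    // no clicks -> no boost
        System.out.println(boosted(1.0, 100, 0.2));  // modest lift
    }
}
```

The hard part Ryan points at is not this arithmetic but updating the click-count field cheaply without reindexing the whole document, which is what LUCENE-1292 above is about.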




8. Integration of LocalSolr into the standard SOLR distribution.
Location is something many sites use these days and is standard in
GBase and most likely other products like FAST.


I'm working on it ... will be a lucene contrib package and cooked  
into the core solr distribution.





9. Distributed search and updates using an object serialization which
could use.  https://issues.apache.org/jira/browse/LUCENE-1336  This
allows span queries, custom payload queries, custom similarities,
custom analyzers, without compiling and deploying and a new SOLR war
file to individual servers.



sounds good (but I have no technical basis to say so)


ryan