Re: Real Time Search and External File Fields

2016-10-07 Thread Erick Erickson
bq: Most soft commit
documentation talks about setting up soft commits with <maxTime> of about a
second.

I think this is really a consequence of it being included in the example
configs for illustrative purposes; personally, I never liked this.

There is no one right answer. I've seen soft commit intervals from -1
(never soft commit)
to 1 second. The latter usually means almost all of your caches are
useless and might
as well be turned off.

What you haven't mentioned is how often you add new docs. Is it once a
day? Steadily
from 8:00 to 17:00? All in three hours in the morning?

Whatever the case, your soft commit interval really should be longer than
your autowarm time. Configure
autowarming with reference queries (firstSearcher or newSearcher events,
or autowarm
counts in queryResultCache and filterCache; say 16 in each of the
latter for a start) such
that they cause the external file to load. That _should_ prevent any
queries from being
blocked, since the autowarming happens in the background, and while
it's happening
incoming queries are served by the old searcher.
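A minimal solrconfig.xml sketch of this setup (the interval and cache sizes are illustrative assumptions, not recommendations):

```xml
<!-- Soft commit interval: keep it longer than typical autowarm time.
     60000 ms is an assumed value; tune to your indexing pattern. -->
<autoSoftCommit>
  <maxTime>60000</maxTime>
</autoSoftCommit>

<!-- Reload external file fields during warming, in the background,
     rather than on the first query against the new searcher. -->
<listener event="newSearcher" class="org.apache.solr.schema.ExternalFileFieldReloader"/>
<listener event="firstSearcher" class="org.apache.solr.schema.ExternalFileFieldReloader"/>

<!-- Small autowarm counts to start, per the advice above. -->
<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="16"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="16"/>
```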

Best,
Erick

On Fri, Oct 7, 2016 at 5:19 PM, Mike Lissner
 wrote:
> I have an index of about 4M documents with an external file field
> configured to do boosting based on pagerank scores of each document. The
> pagerank file is about 93MB as of today -- it's pretty big.
>
> Each day, I add about 1,000 new documents to the index, and I need them to
> be available as soon as possible so that I can send out alerts to our users
> about new content (this is Google Alerts, essentially).
>
> Soft commits seem to be exactly the thing for this, but whenever I open a
> new searcher (which soft commits seem to do), the external file is
> reloaded, and all queries are halted until it finishes loading. When I just
> measured, this took about 30 seconds to complete. Most soft commit
> documentation talks about setting up soft commits with <maxTime> of about a
> second.
>
> Is there anything I can do to make the external file field not get reloaded
> constantly? It only changes about once a month, and I want to use soft
> commits to power the alerts feature.
>
> Thanks,
>
> Mike


Re: solr 5 leaving tomcat, will I be the only one fearing about this?

2016-10-07 Thread Walter Underwood
Chegg uses a servlet filter to collect metrics on each request and forward them
to Graphite and New Relic. We can configure that because we have hooks
into the webapp config.

This works with 4.10.4. We haven’t tried it with a later version, but it could
be a blocker for upgrading. Our SLA commitments to internal clients are
based on 95th percentile response times. We need to track those.

We could (should) contribute this, but it would still be an “edit the code” 
integration for different metrics systems.
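The timing-and-percentile side of such a filter can be sketched without the servlet API (class and method names below are hypothetical, not Chegg's actual code); a real filter would call record() from doFilter(), measuring the elapsed time around chain.doFilter(...):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the measurement side of a metrics filter.
public class LatencyTracker {
    private final List<Long> samplesMs = new ArrayList<>();

    // In a real servlet filter this is called once per request with the
    // wall-clock time the request took.
    public synchronized void record(long millis) {
        samplesMs.add(millis);
    }

    // Nearest-rank percentile over all recorded samples, e.g. p = 95.0
    // for the 95th-percentile response times mentioned above.
    public synchronized long percentile(double p) {
        if (samplesMs.isEmpty()) {
            throw new IllegalStateException("no samples recorded");
        }
        List<Long> sorted = new ArrayList<>(samplesMs);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(0, rank - 1));
    }

    public static void main(String[] args) {
        LatencyTracker tracker = new LatencyTracker();
        for (long ms = 1; ms <= 100; ms++) {
            tracker.record(ms); // pretend request latencies of 1..100 ms
        }
        System.out.println(tracker.percentile(95)); // prints 95
    }
}
```

Nearest-rank is the simplest percentile definition; a production filter would more likely feed a histogram or reservoir (as the Graphite/New Relic clients do) to bound memory.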

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 7, 2016, at 4:13 PM, Renee Sun  wrote:
> 
> I just read through the following link Shawn shared in his reply:
> https://wiki.apache.org/solr/WhyNoWar
> 
> While the following statement is true:
> 
> "Supporting a single set of binary bits is FAR easier than worrying
> about what kind of customized environment the user has chosen for their
> deployment. "
> 
> But it also probably will reduce the flexibility... for example, we tune for
> Scalability at tomcat level, such as its thread pool etc.  I assume the
> standalone Solr (which is still using Jetty underlying) would expose
> sufficient configurable 'knobs' that allow me to tune 'Solr' to meet our
> data work load.
> 
> If we want to minimize the migration work, our existing business logic
> component will remain in tomcat, then the fact that we will have co-exist
> jetty and tomcat deployed in production system is a bit strange... or is it? 
> 
> Even if I could port our webapps to use Jetty, I assume the way solr is
> embedding Jetty I would not be able to integrate at that level, so I probably end
> up with 2 Jetty container instances running on same server, correct? It is
> still too early for me to be sure how this will impact our system but I am a
> little worried.
> 
> Renee 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/solr-5-leaving-tomcat-will-I-be-the-only-one-fearing-about-this-tp4300065p4300259.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Whether replicationFactor=2 makes sense?

2016-10-07 Thread Jeffery Yuan
Thanks Erick Erickson, that totally makes sense to me now :)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Whether-replicationFactor-2-makes-sense-tp4300204p4300271.html
Sent from the Solr - User mailing list archive at Nabble.com.


Real Time Search and External File Fields

2016-10-07 Thread Mike Lissner
I have an index of about 4M documents with an external file field
configured to do boosting based on pagerank scores of each document. The
pagerank file is about 93MB as of today -- it's pretty big.

Each day, I add about 1,000 new documents to the index, and I need them to
be available as soon as possible so that I can send out alerts to our users
about new content (this is Google Alerts, essentially).

Soft commits seem to be exactly the thing for this, but whenever I open a
new searcher (which soft commits seem to do), the external file is
reloaded, and all queries are halted until it finishes loading. When I just
measured, this took about 30 seconds to complete. Most soft commit
documentation talks about setting up soft commits with <maxTime> of about a
second.

Is there anything I can do to make the external file field not get reloaded
constantly? It only changes about once a month, and I want to use soft
commits to power the alerts feature.
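For reference, the kind of external file field setup described above is typically wired up like this (the field and type names are illustrative assumptions):

```xml
<!-- schema.xml: keyField ties each line of the external file to a document.
     "pagerank" and "id" are assumed names for this sketch. -->
<fieldType name="externalPagerank" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="float"/>
<field name="pagerank" type="externalPagerank" indexed="false" stored="false"/>
```

The file itself lives in the index data directory as external_pagerank (lines of the form id=score) and is re-read whenever a new searcher opens, which is the reload cost being described.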

Thanks,

Mike


Re: (ANNOUNCEMENT) Solr Examples reading group

2016-10-07 Thread Erick Erickson
Personally I'd like an XML formatter as part of the checkin process ;)...

But that's not my point. In any "replace configs" discussion, we need to be
aware of the many different environments that are out there "in the wild".
An indentation-sensitive format like YAML brings with it its own problems.
One of them is that if I don't have my editor set to use spaces, I've introduced
errors. At least with a format that has explicit structure, you can format it
ugly without changing its meaning.

Best,
Erick

On Fri, Oct 7, 2016 at 4:45 PM, Rick Leir  wrote:
>
>
> On 2016-10-07 01:04 PM, Erick Erickson wrote:
>>
>> Rick:
>> ..
>> I've seen a _lot_ of configs in the wild with weird indentation, they
>> tend to get that way because there are lots of situations I've seen
>> where people edit them through some kind of remote terminal and can
>> only edit in some vi-like editor. Which may be customized a zillion
>> different ways in terms of how indentation is handled. That's
>> something that we'd need to be sensitive to when considering an
>> indentation-sensitive format.
>
> I commonly do a diff of the xmls to see what has changed in a new release,
> or what differs in an example.  The indentation is often 'tidied up' in
> different ways, making the diff almost useless. Perhaps I need to run an xml
> formatter before doing any diff's on xml's. Also, perhaps the Solr
> committers could agree to standardize on an xml formatter.
>
> Sorry for warping this thread so far: When you comment out a swatch of XML,
> you need to allow for embedded comments. That is less troublesome with some
> other flavours of config file.
>>
>>
>> Erick
>>
>> On Fri, Oct 7, 2016 at 9:50 AM, Rick Leir  wrote:
>>>
>>> Thanks for using the word bewildering, I agree.
>>>
>>>
>>> While we are talking of simplifying solrconfig.xml, may I mention YAML? I
>>> find the YAML format so much more readable than XML.
>>>
>>>
>>> I have not looked at the code which reads the config, so I do not know
>>> how
>>> big a change it is to use cfg4j and read in YAML.
>>>
>>>
>


Re: (ANNOUNCEMENT) Solr Examples reading group

2016-10-07 Thread Rick Leir



On 2016-10-07 01:04 PM, Erick Erickson wrote:

Rick:
..
I've seen a _lot_ of configs in the wild with weird indentation, they
tend to get that way because there are lots of situations I've seen
where people edit them through some kind of remote terminal and can
only edit in some vi-like editor. Which may be customized a zillion
different ways in terms of how indentation is handled. That's
something that we'd need to be sensitive to when considering an
indentation-sensitive format.
I commonly do a diff of the XMLs to see what has changed in a new 
release, or what differs in an example.  The indentation is often 
'tidied up' in different ways, making the diff almost useless. Perhaps I 
need to run an XML formatter before doing any diffs on the XMLs. Also, 
perhaps the Solr committers could agree to standardize on an XML formatter.


Sorry for warping this thread so far: when you comment out a swath of 
XML, you need to allow for embedded comments. That is less troublesome 
with some other flavours of config file.


Erick

On Fri, Oct 7, 2016 at 9:50 AM, Rick Leir  wrote:

Thanks for using the word bewildering, I agree.


While we are talking of simplifying solrconfig.xml, may I mention YAML? I
find the YAML format so much more readable than XML.


I have not looked at the code which reads the config, so I do not know how
big a change it is to use cfg4j and read in YAML.






Re: solr 5 leaving tomcat, will I be the only one fearing about this?

2016-10-07 Thread Wei Zhang
I think it just means they won't officially support deploying the war to
Tomcat or another container. Makes sense to me; if I were in charge of Solr,
I would also support just Jetty: predictable, with a single configuration.
I wouldn't want to spend countless hours supporting various configurations;
instead, those hours can go toward further Solr development. I'm sure someone
who has enough familiarity with Tomcat, Java, and Solr shouldn't have any
issues; after all, Solr is free, but you need to pay for support.

On Fri, Oct 7, 2016, 7:13 PM Renee Sun  wrote:

> I just read through the following link Shawn shared in his reply:
> https://wiki.apache.org/solr/WhyNoWar
>
> While the following statement is true:
>
> "Supporting a single set of binary bits is FAR easier than worrying
> about what kind of customized environment the user has chosen for their
> deployment. "
>
> But it also probably will reduce the flexibility... for example, we tune
> for
> Scalability at tomcat level, such as its thread pool etc.  I assume the
> standalone Solr (which is still using Jetty underlying) would expose
> sufficient configurable 'knobs' that allow me to tune 'Solr' to meet our
> data work load.
>
> If we want to minimize the migration work, our existing business logic
> component will remain in tomcat, then the fact that we will have co-exist
> jetty and tomcat deployed in production system is a bit strange... or is
> it?
>
> Even if I could port our webapps to use Jetty, I assume the way solr is
> embedding Jetty I would not be able to integrate at that level, so I probably end
> up with 2 Jetty container instances running on same server, correct? It is
> still too early for me to be sure how this will impact our system but I am
> a
> little worried.
>
> Renee
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solr-5-leaving-tomcat-will-I-be-the-only-one-fearing-about-this-tp4300065p4300259.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Whether replicationFactor=2 makes sense?

2016-10-07 Thread Erick Erickson
You are correct; that's the whole point of SolrCloud.

The other thing replicas gain you is the ability to
serve more queries, since you only query a single
replica for each shard.

Best,
Erick

On Fri, Oct 7, 2016 at 4:02 PM, Jeffery Yuan  wrote:
> Thanks so much for your reply, Erick Erickson.
>
> We want to increase replicationFactor from 1 to 2 to, but I am wondering
> what's the advantage to do so.
> Whether this will make our system more robust and resilient to temporary
> network failure issue?
>
> Say if we have 3 machines, and split data into 3 shards, if we set
> replicationFactor to 2,
> machine A contains data from shard 1. shard2, machine B contains shard2,
> shard 3, machine c contains shard3, shard 1
>
> If machine A is down or has temporally network issue, whether the system can
> continue work?
> -- I would guess so, as you suggested, the zookeeper is used to maintain
> cluster info, then zookeeper will figure out and choose new leader if needed
> and the system will keep running.
>
> Thanks again
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Whether-replicationFactor-2-makes-sense-tp4300204p4300257.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr 5 leaving tomcat, will I be the only one fearing about this?

2016-10-07 Thread Renee Sun
I just read through the following link Shawn shared in his reply:
https://wiki.apache.org/solr/WhyNoWar

While the following statement is true:

"Supporting a single set of binary bits is FAR easier than worrying
about what kind of customized environment the user has chosen for their
deployment. "

But it also will probably reduce flexibility... for example, we tune for
scalability at the Tomcat level, such as its thread pool etc.  I assume the
standalone Solr (which still uses Jetty underneath) would expose
sufficient configurable 'knobs' that allow me to tune Solr to meet our
data workload.

If we want to minimize the migration work, our existing business logic
components will remain in Tomcat, and then the fact that we will have Jetty
and Tomcat co-existing in the production system is a bit strange... or is it?

Even if I could port our webapps to use Jetty, I assume the way Solr
embeds Jetty I would not be able to integrate at that level, so I would
probably end up with 2 Jetty container instances running on the same server,
correct? It is still too early for me to be sure how this will impact our
system, but I am a little worried.

Renee 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-5-leaving-tomcat-will-I-be-the-only-one-fearing-about-this-tp4300065p4300259.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Whether replicationFactor=2 makes sense?

2016-10-07 Thread Jeffery Yuan
Thanks so much for your reply, Erick Erickson.

We want to increase replicationFactor from 1 to 2, but I am wondering
what the advantage of doing so is.
Will this make our system more robust and resilient to temporary
network failures?

Say we have 3 machines and split the data into 3 shards. If we set
replicationFactor to 2,
machine A contains data from shards 1 and 2, machine B contains shards 2
and 3, and machine C contains shards 3 and 1.

If machine A is down or has a temporary network issue, can the system
continue to work?
-- I would guess so; as you suggested, ZooKeeper is used to maintain the
cluster info, so a new leader will be chosen if needed
and the system will keep running.

Thanks again




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Whether-replicationFactor-2-makes-sense-tp4300204p4300257.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr 5 leaving tomcat, will I be the only one fearing about this?

2016-10-07 Thread John Bickerstaff
Rajesh was right on, Renee -- the only big concern might be if that other
code is tightly coupled to Tomcat or to other things which *must* have
Tomcat.

But it sounds to me as if your multi-lingual processors - if they just work
with Solr/Tomcat - ought to "just work" with Solr/Jetty, or work with
minimal tweaking...

Good luck!  Interesting problem!

On Fri, Oct 7, 2016 at 3:47 PM, Renee Sun  wrote:

> Thanks everyone, I think this is very helpful... I will post more specific
> questions once we start to get more familiar with solr 6.
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/solr-5-leaving-tomcat-will-I-be-the-only-one-
> fearing-about-this-tp4300065p4300253.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: solr 5 leaving tomcat, will I be the only one fearing about this?

2016-10-07 Thread Renee Sun
Thanks everyone, I think this is very helpful... I will post more specific
questions once we start to get more familiar with solr 6.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-5-leaving-tomcat-will-I-be-the-only-one-fearing-about-this-tp4300065p4300253.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr 5 leaving tomcat, will I be the only one fearing about this?

2016-10-07 Thread Shawn Heisey
On 10/7/2016 10:33 AM, Renee Sun wrote:
> In our production, we have cloud based big data indexing using Solr for many
> years. We have developed lots business related logic/component deployed as
> webapps working seamlessly with solr.
>
> I will give you a simple example, we purchased multi-lingual processors (and
> many other 3rd parties) which we integrated with solr by carefully deploy
> the libraries (e.g.) in the tomcat container so they work together. This
> basically means we have to rewrite all those components to make it work with
> solr 5 or 6. 
>
> In my opinion, for those solr users like our company, it will really be
> beneficial if Solr could keep supporting deploying a war and maintain
> parallel support with its new standalone release, although this might be too
> much work? 

For right now, Solr is still a webapp.  It is not packaged as a .war
file, but the information that would have been extracted from a .war
file is still there, in exactly the same layout.  It is already
exploded, and the included Jetty accesses it as a webapp directly.  You
should be able to add the exploded webapp to another container like you
would a .war file ... although you are on your own to make it work if
you choose this path.
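A sketch of what "add the exploded webapp to another container" could look like for Tomcat (all paths are assumptions for illustration, and this is exactly the unsupported territory described above):

```xml
<!-- Hypothetical Tomcat context descriptor, e.g.
     $CATALINA_BASE/conf/Catalina/localhost/solr.xml.
     docBase points at the exploded webapp inside an assumed Solr install. -->
<Context docBase="/opt/solr/server/solr-webapp/webapp" crossContext="true">
  <!-- solr.solr.home would still need to be supplied to the JVM,
       e.g. via -Dsolr.solr.home=/var/solr/data in JAVA_OPTS. -->
</Context>
```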

https://wiki.apache.org/solr/WhyNoWar

Eventually, no idea when, Solr will become a true standalone
application, not a webapp.  It is likely that this will be initially
accomplished by embedding Jetty directly into the code, so internally
Solr will remain much the same ... but after we pass that point, Solr
will be free to evolve considerably.

Thanks,
Shawn



Re: Problem with spellchecker component

2016-10-07 Thread Rajesh Hazari
Which spellcheckers do you have in your collection configs? Do you have
any of these?

  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    ...
  </lst>

  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">textSpell</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    ...
  </lst>

We have come up with these spellcheckers, which work with our schema
definitions.

*Rajesh**.*

On Fri, Oct 7, 2016 at 2:36 PM, la...@2locos.com  wrote:

> I'm using Spellcheck component and it doesn't show me any error for
> combination of words with error, I want to know if it just work on one word
> or it also works on combination of words?and if so what should I do to
> makes it work?
>
> Ladan Nekuii
> Web Developer
> 2locos
> 300 Frank H. Ogawa Plaza, Suite 234
> Oakland, CA 94612
> Tel: 510-465-0101
> Fax: 510-465-0104
> www.2locos.com
>
>


Re: solr 5 leaving tomcat, will I be the only one fearing about this?

2016-10-07 Thread Rajesh Hazari
Renee,
I don't understand what the difficulties would be of putting your 3rd-party
distributions in a contrib folder and
importing them in your Solr configs; your processors should get loaded
by the Solr class loader.

We have used our custom synonym processor by putting the jar in a contrib
folder and importing it in our solrconfig.xml.

For example: we
placed our custom jar file in the
${solr_home}/contrib/analysis-extras/lucene-libs/ folder,
with Solr 4.9 using Jetty.
Before we upgraded from the Tomcat-based Solr deployment, we used to have our
custom jar file in solr.war/WEB-INF/lib.

I'm not sure if this answers your question; this is to give you some
comfort with Solr on Jetty, which is the preferred deployment.
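The import in solrconfig.xml is typically a <lib> directive; the dir below mirrors the contrib folder mentioned above, and the relative base path is an assumption to adjust for your layout:

```xml
<!-- solrconfig.xml: load every jar from the contrib folder into the
     core's class loader. -->
<lib dir="${solr.install.dir:../../..}/contrib/analysis-extras/lucene-libs"
     regex=".*\.jar" />
```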


*Rajesh**.*

On Fri, Oct 7, 2016 at 3:09 PM, John Bickerstaff 
wrote:

> I won't speak for the committers, but I'm guessing you won't find a lot of
> support for the idea of continuing to provide a WAR file with the standard
> SOLR releases...
>
> I feel for you and your situation however - I've had to wrestle with a
> number of situations where a somewhat monolithic architecture was disturbed
> by newer ways of doing things...
>
> That leaves 2 options...
>
> A self-maintained build of the latest SOLR into a WAR file (challenging in
> it's own way) or the very least amount of code necessary to allow your
> Tomcat-based tools to talk to a 6.x Solr server running in Jetty...
>
> I imagine something better than re-writing all your code can be done,
> although I also don't think you can get away with no new code either...  At
> a high level, some kind of API/message bus/interface comes to mind, but I
> don't know enough about your situation to be able to guess what might be a
> good approach.
>
> If you're interested in a discussion about how to approach the change, I'd
> be happy to offer ideas, but I'd need to know how your other tools
> currently talk to Solr...  Of course, you may not want to even have that
> discussion if the task is just to big...
>
> On Fri, Oct 7, 2016 at 9:33 AM, Renee Sun  wrote:
>
> > Thanks ... but that is an extremely simplified situation.
> >
> > We are not just looking for Solr as a new tool to start using it.
> >
> > In our production, we have cloud based big data indexing using Solr for
> > many
> > years. We have developed lots business related logic/component deployed
> as
> > webapps working seamlessly with solr.
> >
> > I will give you a simple example, we purchased multi-lingual processors
> > (and
> > many other 3rd parties) which we integrated with solr by carefully deploy
> > the libraries (e.g.) in the tomcat container so they work together. This
> > basically means we have to rewrite all those components to make it work
> > with
> > solr 5 or 6.
> >
> > In my opinion, for those solr users like our company, it will really be
> > beneficial if Solr could keep supporting deploying a war and maintain
> > parallel support with its new standalone release, although this might be
> > too
> > much work?
> >
> > Thanks
> > Renee
> >
> >
> >
> > --
> > View this message in context: http://lucene.472066.n3.
> > nabble.com/solr-5-leaving-tomcat-will-I-be-the-only-one-
> > fearing-about-this-tp4300065p4300202.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>


Re: solr 5 leaving tomcat, will I be the only one fearing about this?

2016-10-07 Thread John Bickerstaff
I won't speak for the committers, but I'm guessing you won't find a lot of
support for the idea of continuing to provide a WAR file with the standard
SOLR releases...

I feel for you and your situation however - I've had to wrestle with a
number of situations where a somewhat monolithic architecture was disturbed
by newer ways of doing things...

That leaves 2 options...

A self-maintained build of the latest SOLR into a WAR file (challenging in
its own way) or the very least amount of code necessary to allow your
Tomcat-based tools to talk to a 6.x Solr server running in Jetty...

I imagine something better than re-writing all your code can be done,
although I also don't think you can get away with no new code either...  At
a high level, some kind of API/message bus/interface comes to mind, but I
don't know enough about your situation to be able to guess what might be a
good approach.

If you're interested in a discussion about how to approach the change, I'd
be happy to offer ideas, but I'd need to know how your other tools
currently talk to Solr...  Of course, you may not want to even have that
discussion if the task is just too big...

On Fri, Oct 7, 2016 at 9:33 AM, Renee Sun  wrote:

> Thanks ... but that is an extremely simplified situation.
>
> We are not just looking for Solr as a new tool to start using it.
>
> In our production, we have cloud based big data indexing using Solr for
> many
> years. We have developed lots business related logic/component deployed as
> webapps working seamlessly with solr.
>
> I will give you a simple example, we purchased multi-lingual processors
> (and
> many other 3rd parties) which we integrated with solr by carefully deploy
> the libraries (e.g.) in the tomcat container so they work together. This
> basically means we have to rewrite all those components to make it work
> with
> solr 5 or 6.
>
> In my opinion, for those solr users like our company, it will really be
> beneficial if Solr could keep supporting deploying a war and maintain
> parallel support with its new standalone release, although this might be
> too
> much work?
>
> Thanks
> Renee
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/solr-5-leaving-tomcat-will-I-be-the-only-one-
> fearing-about-this-tp4300065p4300202.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Problem with spellchecker component

2016-10-07 Thread la...@2locos.com
I'm using the Spellcheck component and it doesn't show me any errors for 
combinations of words with errors. I want to know whether it only works on one 
word or also on combinations of words, and if so, what should I do to make it work?

Ladan Nekuii
Web Developer
2locos
300 Frank H. Ogawa Plaza, Suite 234
Oakland, CA 94612
Tel: 510-465-0101
Fax: 510-465-0104
www.2locos.com



Re: (ANNOUNCEMENT) Solr Examples reading group

2016-10-07 Thread Erick Erickson
Rick:

It'll be an uphill battle to change the configs, I think... That said,
there are no plans to go in that direction, but it's certainly a valid
discussion to have.

I've seen a _lot_ of configs in the wild with weird indentation, they
tend to get that way because there are lots of situations I've seen
where people edit them through some kind of remote terminal and can
only edit in some vi-like editor. Which may be customized a zillion
different ways in terms of how indentation is handled. That's
something that we'd need to be sensitive to when considering an
indentation-sensitive format.

Erick

On Fri, Oct 7, 2016 at 9:50 AM, Rick Leir  wrote:
> Thanks for using the word bewildering, I agree.
>
>
> While we are talking of simplifying solrconfig.xml, may I mention YAML? I
> find the YAML format so much more readable than XML.
>
>
> I have not looked at the code which reads the config, so I do not know how
> big a change it is to use cfg4j and read in YAML.
>
>
> Perhaps I should ask this on the dev list. And I hope I am not revisiting a
> discussion which has already been decided.
>
> cheers -- Rick
>
>
> On 2016-10-05 10:15 AM, Erick Erickson wrote:
>>
>> Charlie:
>>
>> I like that idea. It's bewildering how much stuff is in the config. I
>> do think there's value in having it all there though since people
>> won't even know to look in, say, "kitchen-sink-solrconfig.xml" for all
>> the _other_ things that can be done
>>
>> Perhaps comment out all the "extra stuff" and move it to the end of
>> the config files?
>>
>> And how many people have been tripped up by un-defining the "text"
>> field in the schema and still having things break by the "df" field
>> being set to text in solrconfig.xml?
>>
>> FWIW,
>> Erick
>>
>> On Wed, Oct 5, 2016 at 1:29 AM, Charlie Hull  wrote:
>>>
>>> On 05/10/2016 02:19, Walter Underwood wrote:

 I’m trying to bring up 6.2.1 after experience with 1.2 through 5.5, and
 it
 is a mess.
 I’ve done solid good practice (Zookeeper ensemble, chroot, data on EBS
 volume),
 but I’m stuck with a completely non-instructive stack trace.

 We run a separate data directory and SOLR_HOME, which is poorly
 supported
 by
 the startup script. I gave up and used a symlink to make it sorta happy.

 Still, I’m stuck with this.

 org.apache.solr.common.SolrException: Error processing the request.
 CoreContainer is either not initialized or shutting down.
  at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
  at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)

 I’ve tried loading configs with bin/solr and with zkcli.sh. I get
 useless
 error messages
 from each of them.

 I worked on a commercial search product for ten years, and this wouldn’t
 pass basic QA.
 We need to do some work, people.
>>>
>>>
>>> Last week I had to upgrade a proof-of-concept demo that ran on Solr 4 to
>>> 6.2.1, using the Sunburnt client. I thought I'd use the Files example to
>>> get
>>> started. This wasn't exactly painless: I ended up having to disable
>>> Managed
>>> Schemas (I couldn't get Solr to use the old schema.xml), mess around with
>>> Sunburnt (which assumed a certain URL structure to read schema.xml) and
>>> generally chop large chunks out solrconfig.xml until it began to work.
>>>
>>> When we see clients with Solr issues the problems often stem from there
>>> being *too much* in their configuration - they've started with one of the
>>> example configs and added often conflicting settings, there's all kinds
>>> of
>>> irrelevant stuff hanging around (you probably don't need a Hungarian
>>> stemmer
>>> unless you're Hungarian), a zillion schema types you'll never need...I'm
>>> beginning to wonder if an absolutely minimal Solr example configuration
>>> might be a good idea. It's going to have to make assumptions (e.g. it's
>>> probably going to assume English content) and it won't do anything
>>> particularly clever, but I feel it might be a better place to start than
>>> wondering what on earth all that commented-out XML does, especially the
>>> bits
>>> that say 'you probably shouldn't use this in production'. You can always
>>> copy those bits back in later...
>>>
>>> I'll be in Boston next week if anyone wants to chat about this. Maybe
>>> I'll
>>> have a go at our Lucene Hackday on Tuesday (still some places free!)
>>>
>>> http://www.meetup.com/New-England-Search-Technologies-NEST-Group/events/233492535/
>>>
>>> Cheers
>>>
>>> Charlie
>>>
>>>
 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)


> On Oct 4, 2016, at 5:59 PM, Alexandre Rafalovitch 
> wrote:
>
> Hello,
>
> Three weeks ago I asked this list whether there was an interest in
> 

Re: (ANNOUNCEMENT) Solr Examples reading group

2016-10-07 Thread Alexandre Rafalovitch
The current direction is JSON. So yeah, I don't think YAML discussion
will get very far.

The problem with examples is that they also serve as documentation and
as a master source for configuration snippets (e.g. language-specific
types). Changing that is happening in several different JIRAs, but it
is not a perfect process.

Regards,
   Alex.

Solr Example reading group is starting November 2016, join us at
http://j.mp/SolrERG
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 7 October 2016 at 23:50, Rick Leir  wrote:
> Thanks for using the word bewildering, I agree.
>
>
> While we are talking of simplifying solrconfig.xml, may I mention YAML? I
> find the YAML format so much more readable than XML.
>
>
> I have not looked at the code which reads the config, so I do not know how
> big a change it is to use cfg4j and read in YAML.
>
>
> Perhaps I should ask this on the dev list. And I hope I am not revisiting a
> discussion which has already been decided.
>
> cheers -- Rick
>
>
> On 2016-10-05 10:15 AM, Erick Erickson wrote:
>>
>> Charlie:
>>
>> I like that idea. It's bewildering how much stuff is in the config. I
>> do think there's value in having it all there though since people
>> won't even know to look in, say, "kitchen-sink-solrconfig.xml" for all
>> the _other_ things that can be done
>>
>> Perhaps comment out all the "extra stuff" and move it to the end of
>> the config files?
>>
>> And how many people have been tripped up by un-defining the "text"
>> field in the schema and still having things break by the "df" field
>> being set to text in solrconfig.xml?
>>
>> FWIW,
>> Erick
>>
>> On Wed, Oct 5, 2016 at 1:29 AM, Charlie Hull  wrote:
>>>
>>> On 05/10/2016 02:19, Walter Underwood wrote:

 I’m trying to bring up 6.2.1 after experience with 1.2 through 5.5, and
 it
 is a mess.
 I’ve done solid good practice (Zookeeper ensemble, chroot, data on EBS
 volume),
 but I’m stuck with a completely non-instructive stack trace.

 We run a separate data directory and SOLR_HOME, which is poorly
 supported
 by
 the startup script. I gave up and used a symlink to make it sorta happy.

 Still, I’m stuck with this.

 org.apache.solr.common.SolrException: Error processing the request.
 CoreContainer is either not initialized or shutting down.
  at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
  at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)

 I’ve tried loading configs with bin/solr and with zkcli.sh. I get
 useless
 error messages
 from each of them.

 I worked on a commercial search product for ten years, and this wouldn’t
 pass basic QA.
 We need to do some work, people.
>>>
>>>
>>> Last week I had to upgrade a proof-of-concept demo that ran on Solr 4 to
>>> 6.2.1, using the Sunburnt client. I thought I'd use the Files example to
>>> get
>>> started. This wasn't exactly painless: I ended up having to disable
>>> Managed
>>> Schemas (I couldn't get Solr to use the old schema.xml), mess around with
>>> Sunburnt (which assumed a certain URL structure to read schema.xml) and
>>> generally chop large chunks out solrconfig.xml until it began to work.
>>>
>>> When we see clients with Solr issues the problems often stem from there
>>> being *too much* in their configuration - they've started with one of the
>>> example configs and added often conflicting settings, there's all kinds
>>> of
>>> irrelevant stuff hanging around (you probably don't need a Hungarian
>>> stemmer
>>> unless you're Hungarian), a zillion schema types you'll never need...I'm
>>> beginning to wonder if an absolutely minimal Solr example configuration
>>> might be a good idea. It's going to have to make assumptions (e.g. it's
>>> probably going to assume English content) and it won't do anything
>>> particularly clever, but I feel it might be a better place to start than
>>> wondering what on earth all that commented-out XML does, especially the
>>> bits
>>> that say 'you probably shouldn't use this in production'. You can always
>>> copy those bits back in later...
>>>
>>> I'll be in Boston next week if anyone wants to chat about this. Maybe
>>> I'll
>>> have a go at our Lucene Hackday on Tuesday (still some places free!)
>>>
>>> http://www.meetup.com/New-England-Search-Technologies-NEST-Group/events/233492535/
>>>
>>> Cheers
>>>
>>> Charlie
>>>
>>>
 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)


> On Oct 4, 2016, at 5:59 PM, Alexandre Rafalovitch 
> wrote:
>
> Hello,
>
> Three weeks ago I asked this list whether there was an interest in
> running a virtual examples reading group, with "all questions
> welcome". The 

Re: Whether replicationFactor=2 makes sense?

2016-10-07 Thread Erick Erickson
Sure, replicationFactor=2 is fine. Solr goes to a lot of effort to
avoid split-brain issues
using Zookeeper.

You're confusing, I think, Solr replication and Zookeeper quorum. The Solr
replicationFactor has nothing to do with quorum; having 2 replicas is the same as 3
in that respect. Solr uses Zookeeper's quorum sensing to ensure that all Solr nodes
have a consistent picture of the cluster, and Solr will refuse to index data if
_Zookeeper_ loses quorum.

But whether Solr has 2 or 3 replicas is not relevant. Solr indexes data through
the leader of each shard, and that keeps all replicas consistent.

As for other impacts: adding a replica will have an impact on indexing
throughput; you'll have to see whether that makes any difference in your
situation. This is usually on the order of 10% or so, YMMV. And it applies only
to the first replica you add, i.e. going from leader-only to 2
replicas costs, say,
10% of throughput, but adding yet another replica does NOT cost another 10%,
since the leader->replica updates are done in parallel.
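To make the mechanics concrete, here is a hedged sketch of creating such a 2-replica collection through the Collections API. The collection name, shard count, and host are placeholders, not recommendations; the snippet only builds the request URL rather than calling a live cluster:

```python
from urllib.parse import urlencode

# Hypothetical values -- only replicationFactor=2 is the point of the sketch.
params = {
    "action": "CREATE",
    "name": "mycollection",   # placeholder collection name
    "numShards": 2,           # placeholder shard count
    "replicationFactor": 2,   # leader plus one extra replica per shard
}
url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
print(url)
```

A client would then issue an HTTP GET on that URL against a running SolrCloud node.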

Best,
Erick

On Fri, Oct 7, 2016 at 9:43 AM, Jeffery Yuan  wrote:
> We are trying to building our solr cloud servers, we want to increase
> replicationFactor, but don't want to set it as 3 as we have a lot of data.
>
> So I am wondering whether it makes sense to set replicationFactor as 2, and
> what's the impact, whether this will cause problem for replica leader
> election such as split brain etc?
>
> Thanks
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Whether-replicationFactor-2-makes-sense-tp4300204.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: (ANNOUNCEMENT) Solr Examples reading group

2016-10-07 Thread Rick Leir

Thanks for using the word bewildering, I agree.


While we are talking of simplifying solrconfig.xml, may I mention YAML? 
I find the YAML format so much more readable than XML.



I have not looked at the code which reads the config, so I do not know 
how big a change it is to use cfg4j and read in YAML.



Perhaps I should ask this on the dev list. And I hope I am not 
revisiting a discussion which has already been decided.


cheers -- Rick

On 2016-10-05 10:15 AM, Erick Erickson wrote:

Charlie:

I like that idea. It's bewildering how much stuff is in the config. I
do think there's value in having it all there though since people
won't even know to look in, say, "kitchen-sink-solrconfig.xml" for all
the _other_ things that can be done

Perhaps comment out all the "extra stuff" and move it to the end of
the config files?

And how many people have been tripped up by un-defining the "text"
field in the schema and still having things break by the "df" field
being set to text in solrconfig.xml?

FWIW,
Erick

On Wed, Oct 5, 2016 at 1:29 AM, Charlie Hull  wrote:

On 05/10/2016 02:19, Walter Underwood wrote:

I’m trying to bring up 6.2.1 after experience with 1.2 through 5.5, and it
is a mess.
I’ve done solid good practice (Zookeeper ensemble, chroot, data on EBS
volume),
but I’m stuck with a completely non-instructive stack trace.

We run a separate data directory and SOLR_HOME, which is poorly supported
by
the startup script. I gave up and used a symlink to make it sorta happy.

Still, I’m stuck with this.

org.apache.solr.common.SolrException: Error processing the request.
CoreContainer is either not initialized or shutting down.
 at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
 at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)

I’ve tried loading configs with bin/solr and with zkcli.sh. I get useless
error messages
from each of them.

I worked on a commercial search product for ten years, and this wouldn’t
pass basic QA.
We need to do some work, people.


Last week I had to upgrade a proof-of-concept demo that ran on Solr 4 to
6.2.1, using the Sunburnt client. I thought I'd use the Files example to get
started. This wasn't exactly painless: I ended up having to disable Managed
Schemas (I couldn't get Solr to use the old schema.xml), mess around with
Sunburnt (which assumed a certain URL structure to read schema.xml) and
generally chop large chunks out solrconfig.xml until it began to work.

When we see clients with Solr issues the problems often stem from there
being *too much* in their configuration - they've started with one of the
example configs and added often conflicting settings, there's all kinds of
irrelevant stuff hanging around (you probably don't need a Hungarian stemmer
unless you're Hungarian), a zillion schema types you'll never need...I'm
beginning to wonder if an absolutely minimal Solr example configuration
might be a good idea. It's going to have to make assumptions (e.g. it's
probably going to assume English content) and it won't do anything
particularly clever, but I feel it might be a better place to start than
wondering what on earth all that commented-out XML does, especially the bits
that say 'you probably shouldn't use this in production'. You can always
copy those bits back in later...

I'll be in Boston next week if anyone wants to chat about this. Maybe I'll
have a go at our Lucene Hackday on Tuesday (still some places free!)
http://www.meetup.com/New-England-Search-Technologies-NEST-Group/events/233492535/

Cheers

Charlie



wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



On Oct 4, 2016, at 5:59 PM, Alexandre Rafalovitch 
wrote:

Hello,

Three weeks ago I asked this list whether there was an interest in
running a virtual examples reading group, with "all questions
welcome". The response was sufficient to start planning the first
study group.

The current projected date is start of November. You can register for
it at: https://www.surveymonkey.com/r/YLNVC27 (it is the same survey
as the first time - no need to do that again if you responded before).
The first run is free.

Regards,
Alex.

P.s. If you have Solr-using customers (e.g. you are running a Solr
cloud business), feel free to announce this to them and/or run your
own outreach and contact me directly with bulk emails of those
interested.

P.p.s. I am also presenting on Solr examples at the Lucene/Solr
Revolution in about a week. If you have very strong opinions about
Solr examples, feel free to reach out directly and share them via
email or in person. The opinions do not have to be positive, though
having them constructive would be nice. :-)

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/





--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 

Whether replicationFactor=2 makes sense?

2016-10-07 Thread Jeffery Yuan
We are trying to build out our Solr Cloud servers and want to increase the
replicationFactor, but we don't want to set it to 3 since we have a lot of data.

So I am wondering whether it makes sense to set replicationFactor to 2, and
what the impact is: whether this will cause problems for replica leader
election, such as split brain, etc.?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Whether-replicationFactor-2-makes-sense-tp4300204.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: mm being ignored by edismax

2016-10-07 Thread Nick Hall
Thanks. I read through this discussion and got it working by setting
q.op=OR whenever mm is set; it then behaved as it did previously.
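A minimal sketch of that workaround as a parameter set, with placeholder field and query values, built the same way a client would encode the request:

```python
from urllib.parse import urlencode

# Hedged sketch: pass q.op=OR explicitly so mm takes effect again.
# Field name and query strings are placeholders.
params = {
    "defType": "edismax",
    "q": "string1 string2 string3 string4 string5",
    "qf": "vehicle_string_t",
    "mm": "3",     # require at least 3 of the 5 clauses to match
    "q.op": "OR",  # make the default operator explicit so mm is honoured
}
query_string = urlencode(params)
print(query_string)
```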

I have two suggestions that may clarify things a little going forward.
First, as I read the documentation, it is not clear that q.op is intended
to be used with the edismax (or dismax) query parsers. The "Common Query
Parameters" page
(https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters)
does not list q.op as a parameter. The parameter is listed on the "Standard
Query Parser" page
(https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser)
but not on the DisMax page
(https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser).
For clarity, it seems like q.op should be added to the dismax page with a
note about how its behavior relates to mm.

Also, I use the Solr web interface to run test queries while debugging. This
web interface has no field for q.op as far as I can see, so with (e)dismax
the mm parameter cannot be used effectively from the web interface.

Thank you for your help,
Nick


On Thu, Oct 6, 2016 at 10:53 PM, Alexandre Rafalovitch 
wrote:

> I think it is the change in the OR and AND treatment that had been
> confusing a number of people. There were discussions before on the
> mailing list about it, for example
> http://search-lucene.com/m/eHNlzBMAHdfxcv1
>
> Regards,
>Alex.
> 
> Solr Example reading group is starting November 2016, join us at
> http://j.mp/SolrERG
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 7 October 2016 at 10:24, Nick Hall  wrote:
> > Hello,
> >
> > I'm working on upgrading a Solr installation from 4.0 to 6.2.1 and have
> > everything mostly working but have hit a snag. I kept the schema
> basically
> > the same, just made some minor changes to allow it to work with the new
> > version, but one of my queries is working differently with the new
> version
> > and I'm not sure why.
> >
> > In version 4.0 when I do a query with edismax like:
> >
> > "params":{
> >   "mm":"3",
> >   "debugQuery":"on",
> >   "indent":"on",
> >   "q":"string1 string2 string3 string4 string5",
> >   "qf":"vehicle_string_t^1",
> >   "wt":"json",
> >   "defType":"edismax"}},
> >
> > I get the results I expect, and the debugQuery shows:
> >
> > "rawquerystring":"string1 string2 string3 string4 string5",
> > "querystring":"string1 string2 string3 string4 string5",
> > "parsedquery":"+((DisjunctionMaxQuery((vehicle_string_t:\"string
> 1\"))
> > DisjunctionMaxQuery((vehicle_string_t:\"string 2\"))
> > DisjunctionMaxQuery((vehicle_string_t:\"string 3\"))
> > DisjunctionMaxQuery((vehicle_string_t:\"string 4\"))
> > DisjunctionMaxQuery((vehicle_string_t:\"string 5\")))~3)",
> > "parsedquery_toString":"+(((vehicle_string_t:\"string 1\")
> > (vehicle_string_t:\"string 2\") (vehicle_string_t:\"string 3\")
> > (vehicle_string_t:\"string 4\") (vehicle_string_t:\"string 5\"))~3)",
> >
> >
> > But when I run the same query with version 6.2.1, debugQuery shows:
> >
> > "rawquerystring":"string1 string2 string3 string4 string5",
> > "querystring":"string1 string2 string3 string4 string5",
> > "parsedquery":"(+(+DisjunctionMaxQuery((vehicle_string_t:\"string
> 1\"))
> > +DisjunctionMaxQuery((vehicle_string_t:\"string 2\"))
> > +DisjunctionMaxQuery((vehicle_string_t:\"string 3\"))
> > +DisjunctionMaxQuery((vehicle_string_t:\"string 4\"))
> > +DisjunctionMaxQuery((vehicle_string_t:\"string 5\"/no_coord",
> > "parsedquery_toString":"+(+(vehicle_string_t:\"string 1\")
> > +(vehicle_string_t:\"string 2\") +(vehicle_string_t:\"string 3\")
> > +(vehicle_string_t:\"string 4\") +(vehicle_string_t:\"string 5\"))",
> >
> >
> > You can see that the key difference is that in version 4 it uses the "~3"
> > to indicate the mm, but in 6.2.1 it doesn't matter what I have mm set to,
> > it always ends with "/no_coord" and is trying to match all 5 strings even
> > if mm is set to 1, so mm is being completely ignored.
> >
> > I imagine there is some behavior that changed between 4 and 6.2.1 that I
> > need to adjust something in my configuration to account for, but I'm
> > scratching my head right now. Has anyone else seen this and can point me
> in
> > the right direction? Thanks,
> >
> > Nick
>


Re: solr 5 leaving tomcat, will I be the only one fearing about this?

2016-10-07 Thread Renee Sun
Thanks ... but that is an extremely simplified picture of the situation.

We are not just picking up Solr as a new tool and starting to use it.

In production, we have had cloud-based big-data indexing built on Solr for many
years. We have developed a lot of business-related logic/components, deployed as
webapps, that work seamlessly with Solr.

To give a simple example: we purchased multi-lingual processors (and many other
3rd-party components) which we integrated with Solr by carefully deploying the
libraries in the Tomcat container so they work together. This basically means we
would have to rework all those components to make them work with Solr 5 or 6.

In my opinion, for Solr users like our company, it would really be beneficial if
Solr could keep supporting WAR deployment in parallel with its new standalone
release, although this might be too much work?

Thanks 
Renee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-5-leaving-tomcat-will-I-be-the-only-one-fearing-about-this-tp4300065p4300202.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Upgrading from Solr cloud 4.1 to 6.2

2016-10-07 Thread Erick Erickson
As both Jan and I mentioned, reindexing is _always_ the least error-prone
approach, so please do that if at all possible. And while you're at it, it's a
fine time to make any tweaks like adding DocValues for fields you sort or
facet on, perhaps enabling the new return-stored-fields-from-docvalues
functionality, and the like.

However, if you absolutely _have_ to upgrade here's what I'd do.

1> create a leader-only, 5-node 6x cluster

2> upgrade one index from each shard 4x->6x and put it on the
correct shard of your new cluster. This is perhaps the most
error-prone step. Did you get the index with the same hash range on
the right node for each shard?

3> verify that it's consistent.

4> Use the collections API to ADDREPLICA on the new system
to build it out to 3 replicas. That'll automatically replicate the index
from the leader. You can choose exactly which node each replica
goes on, although Solr will do a pretty good job of spreading
them out, and you can even have those decisions made by a set
of rules you specify...
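As an illustrative sketch of step 4> (collection, shard, and node names are placeholders), the ADDREPLICA call is just another Collections API request; this snippet only builds the URL:

```python
from urllib.parse import urlencode

# Hypothetical names throughout -- adapt to your cluster layout.
params = {
    "action": "ADDREPLICA",
    "collection": "mycollection",  # placeholder collection
    "shard": "shard1",             # which shard gets the new replica
    "node": "host2:8983_solr",     # optionally pin the replica to a node
}
url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
print(url)
```

Repeat per shard until each one has the replica count you want.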

Best,
Erick

On Fri, Oct 7, 2016 at 4:08 AM, Neeraj Bhatt  wrote:
> Thanks Jan for clarifying, I think I will pull all documents from data
> source again as you and Eric suggested
>
> Thanks
> neeraj
>
> On Fri, Oct 7, 2016 at 2:38 PM, Jan Høydahl  wrote:
>
>> As Erick suggests, you should setup an empty 6.x environment,
>> create an empty collection with shards=5 replicationFactor=3
>> and then re-index all your content from your data source. Once that
>> is in, you can decommission your old cluster.
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>>
>> > 7. okt. 2016 kl. 09.55 skrev Neeraj Bhatt :
>> >
>> > Hi Eric
>> >
>> > Thanks for suggestion I was able to upgrade one shard one replica data
>> from
>> > 4.1 to 6.2 through index upgrader, but the new problem is since we were
>> > using solr cloud with multiple shards(5) with some replication (3) so do
>> I
>> > need to manually copy all index directory data for each shard upgrade and
>> > for each replica and place it in new index directory of solr 6.2 ?
>> >
>> > This seems to be error prone as there will be 15 index directories
>> >
>> > thanks
>> >
>> > On Tue, Oct 4, 2016 at 8:35 AM, Erick Erickson 
>> > wrote:
>> >
>> >> the very easiest way is to re-index. 10M documents shouldn't take
>> >> very long unless they're no longer available...
>> >>
>> >> When you say you tried to use the index upgrader, which one? You'd
>> >> have to use the one distributed with 5.x to upgrade from 4.x->5.x, then
>> >> use the one distributed with 6x to go from 5.x->6.x.
>> >>
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Mon, Oct 3, 2016 at 8:01 PM, Neeraj Bhatt > >
>> >> wrote:
>> >>> Hello All
>> >>>
>> >>> We are trying to upgrade our production solr with 10 million documents
>> >> from
>> >>> solr cloud (5 shards, 5 nodes, one collection, 3 replica) 4.1 to 6.2
>> >>>
>> >>> How to upgrade the lucene index created by solr. Should I go into
>> indexes
>> >>> created by each shard and upgrade and  replicate it manually ? Also I
>> >> tried
>> >>> using Index upgrader in one replica of one shard as a test but it gives
>> >>> error as it is looking for _4c.si file and it is not there
>> >>>
>> >>> Any idea what is the easy way to upgrade solr cloud with a 10m
>> repsoitory
>> >>>
>> >>> Thanks
>> >>> neeraj
>> >>
>>
>>


Re: London Lucene Hackday is now running

2016-10-07 Thread Charlie Hull
Yes, I'll blog about it, and we'll try to get as much as possible captured
in the GitHub folder. If you've got ideas for Tuesday, could you please add
them to that event's Meetup page?

Cheers

Charlie

On 7 October 2016 at 16:20, Alexandre Rafalovitch 
wrote:

> Awesome. Is there a postmortem and lessons learned that we could build upon
> the next week?
>
> I am especially interested in the minimal example outcome, since I
> experimented with that myself.
>
> Regards,
>Alex
> P.s. maybe for next week's meetup, we could look at generating reports out
> of Jira exports.
>
> On 7 Oct 2016 4:52 PM, "Charlie Hull"  wrote:
>
> > Hi all,
> >
> > We're running a Lucene hackday in London - you can follow along with
> > Twitter using hashtag #LuceneSolrLondon and see what we're doing on
> Github
> > at https://github.com/flaxsearch/london-hackday-2016 - as the README
> shows
> > we're currently looking at:
> >
> >1.
> >
> >A Browser-driven explorer for Lucene indexes: “Marple”
> >https://github.com/flaxsearch/marple *- sort of like Luke*
> >2.
> >
> >An absolutely minimal Solr example framework *- a team to try
> installing
> >Solr from scratch and note down problems & issues with the examples
> and
> >guidance*
> > > website-searches-and-more/>
> >3.
> >
> >Different replicas giving different result positions e.g.:
> >https://mail-archives.apache.org/mod_mbox/lucene-solr-user/
> > 201301.mbox/%3czarafa.51006f06.0ea4.1386468330e70...@mail.openindex.io
> %3E
> >4.
> >
> >Streaming with Solr 6
> >
> > Cheers
> >
> > Charlie
> >
>


Re: London Lucene Hackday is now running

2016-10-07 Thread Alexandre Rafalovitch
Awesome. Is there a postmortem and lessons learned that we could build upon
next week?

I am especially interested in the minimal example outcome, since I
experimented with that myself.

Regards,
   Alex
P.s. maybe for next week's meetup, we could look at generating reports out
of Jira exports.

On 7 Oct 2016 4:52 PM, "Charlie Hull"  wrote:

> Hi all,
>
> We're running a Lucene hackday in London - you can follow along with
> Twitter using hashtag #LuceneSolrLondon and see what we're doing on Github
> at https://github.com/flaxsearch/london-hackday-2016 - as the README shows
> we're currently looking at:
>
>1.
>
>A Browser-driven explorer for Lucene indexes: “Marple”
>https://github.com/flaxsearch/marple *- sort of like Luke*
>2.
>
>An absolutely minimal Solr example framework *- a team to try installing
>Solr from scratch and note down problems & issues with the examples and
>guidance*
> website-searches-and-more/>
>3.
>
>Different replicas giving different result positions e.g.:
>https://mail-archives.apache.org/mod_mbox/lucene-solr-user/
> 201301.mbox/%3czarafa.51006f06.0ea4.1386468330e70...@mail.openindex.io%3E
>4.
>
>Streaming with Solr 6
>
> Cheers
>
> Charlie
>


Re: Upgrading from Solr cloud 4.1 to 6.2

2016-10-07 Thread Neeraj Bhatt
Thanks Jan for clarifying, I think I will pull all documents from data
source again as you and Eric suggested

Thanks
neeraj

On Fri, Oct 7, 2016 at 2:38 PM, Jan Høydahl  wrote:

> As Erick suggests, you should setup an empty 6.x environment,
> create an empty collection with shards=5 replicationFactor=3
> and then re-index all your content from your data source. Once that
> is in, you can decommission your old cluster.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 7. okt. 2016 kl. 09.55 skrev Neeraj Bhatt :
> >
> > Hi Eric
> >
> > Thanks for suggestion I was able to upgrade one shard one replica data
> from
> > 4.1 to 6.2 through index upgrader, but the new problem is since we were
> > using solr cloud with multiple shards(5) with some replication (3) so do
> I
> > need to manually copy all index directory data for each shard upgrade and
> > for each replica and place it in new index directory of solr 6.2 ?
> >
> > This seems to be error prone as there will be 15 index directories
> >
> > thanks
> >
> > On Tue, Oct 4, 2016 at 8:35 AM, Erick Erickson 
> > wrote:
> >
> >> the very easiest way is to re-index. 10M documents shouldn't take
> >> very long unless they're no longer available...
> >>
> >> When you say you tried to use the index upgrader, which one? You'd
> >> have to use the one distributed with 5.x to upgrade from 4.x->5.x, then
> >> use the one distributed with 6x to go from 5.x->6.x.
> >>
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Oct 3, 2016 at 8:01 PM, Neeraj Bhatt  >
> >> wrote:
> >>> Hello All
> >>>
> >>> We are trying to upgrade our production solr with 10 million documents
> >> from
> >>> solr cloud (5 shards, 5 nodes, one collection, 3 replica) 4.1 to 6.2
> >>>
> >>> How to upgrade the lucene index created by solr. Should I go into
> indexes
> >>> created by each shard and upgrade and  replicate it manually ? Also I
> >> tried
> >>> using Index upgrader in one replica of one shard as a test but it gives
> >>> error as it is looking for _4c.si file and it is not there
> >>>
> >>> Any idea what is the easy way to upgrade solr cloud with a 10m
> repsoitory
> >>>
> >>> Thanks
> >>> neeraj
> >>
>
>


London Lucene Hackday is now running

2016-10-07 Thread Charlie Hull
Hi all,

We're running a Lucene hackday in London - you can follow along with
Twitter using hashtag #LuceneSolrLondon and see what we're doing on Github
at https://github.com/flaxsearch/london-hackday-2016 - as the README shows
we're currently looking at:

   1.

   A Browser-driven explorer for Lucene indexes: “Marple”
   https://github.com/flaxsearch/marple *- sort of like Luke*
   2.

   An absolutely minimal Solr example framework *- a team to try installing
   Solr from scratch and note down problems & issues with the examples and
   guidance*
   

   3.

   Different replicas giving different result positions e.g.:
   
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/%3czarafa.51006f06.0ea4.1386468330e70...@mail.openindex.io%3E
   4.

   Streaming with Solr 6

Cheers

Charlie


Re: Upgrading from Solr cloud 4.1 to 6.2

2016-10-07 Thread Jan Høydahl
As Erick suggests, you should setup an empty 6.x environment,
create an empty collection with shards=5 replicationFactor=3
and then re-index all your content from your data source. Once that
is in, you can decommission your old cluster.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 7. okt. 2016 kl. 09.55 skrev Neeraj Bhatt :
> 
> Hi Eric
> 
> Thanks for suggestion I was able to upgrade one shard one replica data from
> 4.1 to 6.2 through index upgrader, but the new problem is since we were
> using solr cloud with multiple shards(5) with some replication (3) so do I
> need to manually copy all index directory data for each shard upgrade and
> for each replica and place it in new index directory of solr 6.2 ?
> 
> This seems to be error prone as there will be 15 index directories
> 
> thanks
> 
> On Tue, Oct 4, 2016 at 8:35 AM, Erick Erickson 
> wrote:
> 
>> the very easiest way is to re-index. 10M documents shouldn't take
>> very long unless they're no longer available...
>> 
>> When you say you tried to use the index upgrader, which one? You'd
>> have to use the one distributed with 5.x to upgrade from 4.x->5.x, then
>> use the one distributed with 6x to go from 5.x->6.x.
>> 
>> 
>> Best,
>> Erick
>> 
>> On Mon, Oct 3, 2016 at 8:01 PM, Neeraj Bhatt 
>> wrote:
>>> Hello All
>>> 
>>> We are trying to upgrade our production solr with 10 million documents
>> from
>>> solr cloud (5 shards, 5 nodes, one collection, 3 replica) 4.1 to 6.2
>>> 
>>> How to upgrade the lucene index created by solr. Should I go into indexes
>>> created by each shard and upgrade and  replicate it manually ? Also I
>> tried
>>> using Index upgrader in one replica of one shard as a test but it gives
>>> error as it is looking for _4c.si file and it is not there
>>> 
>>> Any idea what is the easy way to upgrade solr cloud with a 10m repsoitory
>>> 
>>> Thanks
>>> neeraj
>> 



Re: solr 5 leaving tomcat, will I be the only one fearing about this?

2016-10-07 Thread Jan Høydahl
To give an example:

While you would earlier do something like:

Download and install Tomcat (or some other servlet container)
Download and unpack Solr
Create a SOLR_HOME folder with the correct content
Copy solr.war into tomcat/webapps
Set CATALINA_OPTS=“-Dsolr.solr.home=/path/to/home -Dsolr.x.y=z…. GC-flags etc”
Set up Tomcat as a service
service tomcat start

You would with 6.x do:

Download Solr and unpack the install-script
solr/bin/install_solr_service solr-6.2.0.tgz  # Install
Tune /etc/default/solr.in.sh to your liking (mem, port, solr-home, Zk etc)
service solr start (or bin/solr start [options])

Your client would then talk to Solr, typically at http://host.name:8983/solr/, as a 
standalone server, not as one of many webapps on port 8080.
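For illustration, the solr.in.sh tuning step might touch variables like these (every value below is a placeholder, not a recommendation):

```shell
# Illustrative /etc/default/solr.in.sh fragment -- all values are placeholders
SOLR_HEAP="4g"                              # JVM heap size
SOLR_PORT="8983"                            # port clients talk to
SOLR_HOME="/var/solr/data"                  # where cores/collections live
ZK_HOST="zk1:2181,zk2:2181,zk3:2181/solr"   # ZooKeeper ensemble with chroot
```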

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 7. okt. 2016 kl. 02.32 skrev Alexandre Rafalovitch :
> 
> Treat Solr as a blackbox standalone database. Your MySQL is running
> standalone, right?
> 
> And try to go to Solr 6, if you can. 5 is not latest anymore and there had
> been lots of scaling improvements in 6.
> 
> Regards,
>Alex
> 
> On 7 Oct 2016 5:02 AM, "Renee Sun"  wrote:
> 
>> need some general advises please...
>> 
>> our infra is built with multiple webapps with tomcat ... the scale layer is
>> archived on top of those webapps which work hand-in-hand with solr admin
>> APIs / shard queries / commit or optimize / core management etc etc.
>> 
>> While I have not get a chance to actually play with solr 5 yet, just by
>> imagination, we will be facing some huge changes in our infra to be able to
>> upgrade to solr 5, yes?
>> 
>> Thanks
>> Renee
>> 
>> 
>> 
>> --
>> View this message in context: http://lucene.472066.n3.
>> nabble.com/solr-5-leaving-tomcat-will-I-be-the-only-one-
>> fearing-about-this-tp4300065.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 



Re: [Solr-5-4-1] Why SolrCloud leader is putting all replicas in recovery at the same time ?

2016-10-07 Thread Petetin Ludovic

Hi Erick,

Thanks for taking the time to answer.

What we are looking for here is not the root cause, but a mechanism to prevent 
the leader from putting all the replicas into recovery at the same time, which 
inevitably leads to downtime.

Does Solr have a setting similar to the Elasticsearch setting below, which 
guarantees that there are always enough nodes up to serve the traffic?

gateway.recover_after_nodes
Recover as long as this many data or master nodes have joined the cluster.

To answer your questions as well, when the problem occurred, we had:
- no traffic peak
- no JVM memory issue
- no system memory issue
- no network issue
- no disk space issue
- no Zookeepers issue

The impacted cluster is composed of 6 Dell R420 servers with 96GB of RAM each, 
with an index of 10M documents serving our traffic (400 queries per second) in 
France.
We have several other Solr clusters in the same datacenter, sharing the same 
Zookeepers but serving other European and American countries; they were not 
impacted when the issue occurred.

I paste below the logs of the leader at the moment of the issue, and the logs of 
the replica that was put into recovery (solr-57).

1. Log of the leader
2016-10-04 18:18:46,326 [searcherExecutor-87-thread-1-processing-s:shard1 
x:fr_blue c:fr_blue 
n:dc1-s6-prod-solr-52.prod.dc1.kelkoo.net:8080_searchsolrnodefr 
r:dc1-s6-prod-solr-52.prod.dc1.kelkoo.net:8080_searchsolrnodefr_fr_blue] INFO  
org.apache.solr.core.QuerySenderListener:newSearcher:49  - QuerySenderListener 
sending requests to Searcher@43a597cb[fr_blue] 
main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_gxjd(5.4.1):C3804213/811353:delGen=2)
 Uninverting(_gwmj(5.4.1):C3449095/1848374:delGen=8) 
Uninverting(_gxkn(5.4.1):C2609089/459163:delGen=3) 
Uninverting(_gxs7(5.4.1):c669028/10913:delGen=1) 
Uninverting(_gxxf(5.4.1):c454860/104356:delGen=1) 
Uninverting(_gy55(5.4.1):c373416/59027) Uninverting(_gybq(5.4.1):c222631/1631) 
Uninverting(_gy6c(5.4.1):c29736/591:delGen=1) 
Uninverting(_gy76(5.4.1):C2020/45:delGen=2) 
Uninverting(_gy74(5.4.1):C1909/40:delGen=2) 
Uninverting(_gyhk(5.4.1):c192078/656) Uninverting(_gye4(5.4.1):c66133/977) 
Uninverting(_gyio(5.4.1):c43028/274) Uninverting(_gyjk(5.4.1):c48841/276) 
Uninverting(_gykl(5.4.1):c42989/101) Uninverting(_gylf(5.4.1):c31401/50) 
Uninverting(_gylt(5.4.1):c5802/6) Uninverting(_gyln(5.4.1):c4119/1) 
Uninverting(_gyn1(5.4.1):c36015/68) Uninverting(_gyo6(5.4.1):c27906/18) 
Uninverting(_gyox(5.4.1):c23494/4) Uninverting(_gyp3(5.4.1):c3686) 
Uninverting(_gyov(5.4.1):C1124) Uninverting(_gyot(5.4.1):C1224/1:delGen=1) 
Uninverting(_gyp0(5.4.1):C514/1:delGen=1) 
Uninverting(_gyoy(5.4.1):C573/1:delGen=1) 
Uninverting(_gyp1(5.4.1):C120/1:delGen=1) 
Uninverting(_gyou(5.4.1):C747/1:delGen=1) Uninverting(_gyor(5.4.1):C467) 
Uninverting(_gyp2(5.4.1):C101) Uninverting(_gyoz(5.4.1):C250)))}
2016-10-04 18:18:46,326 [searcherExecutor-87-thread-1-processing-s:shard1 
x:fr_blue c:fr_blue 
n:dc1-s6-prod-solr-52.prod.dc1.kelkoo.net:8080_searchsolrnodefr 
r:dc1-s6-prod-solr-52.prod.dc1.kelkoo.net:8080_searchsolrnodefr_fr_blue] INFO  
org.apache.solr.core.QuerySenderListener:newSearcher:96  - QuerySenderListener 
done.
2016-10-04 18:25:55,647 [commitScheduler-89-thread-1] INFO  
org.apache.solr.core.SolrDeletionPolicy:onCommit:99  - 
SolrDeletionPolicy.onCommit: commits: num=2
   
commit{dir=/opt/kookel/data/searchSolrNode/solrindex/fr1_blue/index.20160731234345634,segFN=segments_7ma,generation=9874}
   
commit{dir=/opt/kookel/data/searchSolrNode/solrindex/fr1_blue/index.20160731234345634,segFN=segments_7mb,generation=9875}
2016-10-04 18:25:55,650 [commitScheduler-89-thread-1] INFO  
org.apache.solr.core.SolrDeletionPolicy:updateCommits:166  - newest commit 
generation = 9875
2016-10-04 18:25:56,010 [commitScheduler-89-thread-1] INFO  
org.apache.solr.search.SolrIndexSearcher::237  - Opening 
Searcher@39da46db[fr_blue] realtime
2016-10-04 18:40:56,195 [commitScheduler-89-thread-1] INFO  
org.apache.solr.core.SolrDeletionPolicy:onCommit:99  - 
SolrDeletionPolicy.onCommit: commits: num=2
   
commit{dir=/opt/kookel/data/searchSolrNode/solrindex/fr1_blue/index.20160731234345634,segFN=segments_7mb,generation=9875}
   
commit{dir=/opt/kookel/data/searchSolrNode/solrindex/fr1_blue/index.20160731234345634,segFN=segments_7mc,generation=9876}
2016-10-04 18:40:56,196 [commitScheduler-89-thread-1] INFO  
org.apache.solr.core.SolrDeletionPolicy:updateCommits:166  - newest commit 
generation = 9876
2016-10-04 18:40:56,387 [commitScheduler-89-thread-1] INFO  
org.apache.solr.search.SolrIndexSearcher::237  - Opening 
Searcher@7ce3335c[fr_blue] realtime
2016-10-04 18:43:39,426 
[updateExecutor-2-thread-105261-processing-http:dc1-s6-prod-solr-57.prod.dc1.kelkoo.net:8080//searchsolrnodefr//fr_blue
 s:shard1 x:fr_blue c:fr_blue 
n:dc1-s6-prod-solr-52.prod.dc1.kelkoo.net:8080_searchsolrnodefr 

Re: Upgrading from Solr cloud 4.1 to 6.2

2016-10-07 Thread Neeraj Bhatt
Hi Erick

Thanks for the suggestion. I was able to upgrade the data of one replica of one
shard from 4.1 to 6.2 through IndexUpgrader. The new problem: since we were
using SolrCloud with multiple shards (5) and some replication (3), do I need to
manually upgrade the index directory data for each shard and each replica, and
place it in the new index directory of Solr 6.2?

This seems error-prone, as there will be 15 index directories.
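To avoid walking the 15 directories by hand, the per-replica IndexUpgrader run can be scripted. A minimal sketch, with the caveat that the directory layout, paths, and jar names below are illustrative assumptions, not taken from this thread:

```python
import subprocess
from pathlib import Path

# Hypothetical classpath: the Lucene jars shipped with the target release.
LUCENE_CP = "lucene-core-5.5.5.jar:lucene-backward-codecs-5.5.5.jar"

def upgrade_commands(solr_home: Path, dry_run: bool = True):
    """Build (and optionally run) one IndexUpgrader invocation per core
    index directory found under solr_home (layout: <core>/data/index)."""
    cmds = []
    for index_dir in sorted(solr_home.glob("*/data/index")):
        cmd = ["java", "-cp", LUCENE_CP,
               "org.apache.lucene.index.IndexUpgrader",
               "-delete-prior-commits", str(index_dir)]
        cmds.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return cmds
```

With `dry_run=True` this only collects the commands, which makes it easy to review the 15 invocations before running them; per Erick's advice below, the same pass would have to be repeated with the 6.x jars after the 5.x pass.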

thanks

On Tue, Oct 4, 2016 at 8:35 AM, Erick Erickson 
wrote:

> The very easiest way is to re-index. 10M documents shouldn't take
> very long, unless they're no longer available...
>
> When you say you tried to use the index upgrader, which one? You'd
> have to use the one distributed with 5.x to upgrade from 4.x->5.x, then
> use the one distributed with 6.x to go from 5.x->6.x.
>
>
> Best,
> Erick
>
> On Mon, Oct 3, 2016 at 8:01 PM, Neeraj Bhatt 
> wrote:
> > Hello All
> >
> > We are trying to upgrade our production Solr, with 10 million documents,
> > from SolrCloud (5 shards, 5 nodes, one collection, 3 replicas) 4.1 to 6.2.
> >
> > How do we upgrade the Lucene index created by Solr? Should I go into the
> > indexes created by each shard and upgrade and replicate them manually?
> > Also, I tried using IndexUpgrader on one replica of one shard as a test,
> > but it gives an error, as it is looking for a _4c.si file that is not
> > there.
> >
> > Any idea what is the easiest way to upgrade SolrCloud with a 10M-document
> > repository?
> >
> > Thanks
> > neeraj
>