Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.

2015-12-08 Thread philippa griggs
Hello Erick,

Thanks for your reply.  

We have one collection and are writing documents to it all the time; the rate 
peaks at around 2,500 per minute and dips to 250 per minute, and the size of 
the documents varies. On each node we have around 55,000,000 documents with a 
data size of 43G, located on a 200G drive.

Each node has 122G of memory; the heap size is currently set at 45G, although 
we have plans to increase this to 50G.

The GC settings we are using are:

 -XX:+UseG1GC,
-XX:+ParallelRefProcEnabled.
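
(For reference: on Solr 5.x these flags normally live in the GC_TUNE variable in 
bin/solr.in.sh. A minimal sketch, with a hypothetical log path, that also turns 
on GC logging so that long pauses can be correlated with the node outages:

GC_TUNE="-XX:+UseG1GC \
  -XX:+ParallelRefProcEnabled \
  -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps \
  -XX:+PrintGCApplicationStoppedTime \
  -Xloggc:/var/log/solr/solr_gc.log"

These are standard HotSpot flags, not a tuning recommendation.)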

Please let me know if you need any more information.

Philippa

From: Erick Erickson 
Sent: 07 December 2015 16:53
To: solr-user
Subject: Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.

Tell us a bit more.

Are you adding documents to your collections or adding more
collections? Solr is a balancing act between the number of docs you
have on each node and the memory you have allocated. If you're
continually adding docs to Solr, you'll eventually run out of memory
and/or hit big GC pauses.

How much memory are you allocating to Solr? How much physical memory
do you have? etc.

Best,
Erick


On Mon, Dec 7, 2015 at 8:37 AM, philippa griggs
 wrote:
> Hello,
>
>
> I'm using:
>
>
> Solr 5.2.1 10 shards each with a replica. (20 nodes in total)
>
>
> Zookeeper 3.4.6.
>
>
> About half a year ago we upgraded to Solr 5.2.1 and since then have been 
> experiencing a 'wipe out' effect where all of a sudden most if not all nodes 
> will go down. Sometimes they will recover by themselves but more often than 
> not we have to step in to restart nodes.
>
>
> Nothing in the logs jumps out as being the problem. With the latest wipe out 
> we noticed that 10 out of the 20 nodes had garbage collections over 1 min all 
> at the same time, with the heap usage spiking up in some cases to 80%. We 
> also noticed the number of selects run on the solr cluster increased just 
> before the wipe out.
>
>
> Increasing the heap size seems to help for a while but then it starts 
> happening again- so it's more like a delay than a fix. Our GC settings are set 
> to -XX:+UseG1GC, -XX:+ParallelRefProcEnabled.
>
>
> With our previous version of Solr (4.10.0) this didn't happen. We had 
> nodes/shards go down, but it was contained; with the new version they all seem 
> to go down at around the same time. We can't really continue just increasing 
> the heap size, and we would like to solve this issue rather than delay it.
>
>
> Has anyone experienced something similar?
>
> Is there a difference between the two versions around the recovery process?
>
> Does anyone have any suggestions for a fix?
>
>
> Many thanks
>
>
> Philippa
>

Re: Solr 5.2.1 deadlock on commit

2015-12-08 Thread Emir Arnautovic

Hi Ali,
This thread is blocked because it cannot obtain the update lock - in this 
particular case when doing a soft commit. I am guessing that the others 
are blocked for the same reason. Can you tell us a bit more about your 
setup and indexing load and procedure? Do you do explicit commits?


Regards,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On 08.12.2015 08:16, Ali Nazemian wrote:

Hi,
I have had a problem with Solr 5.2.1 for a while now and have not been able to
fix it yet. The only thing that is clear to me is that when I send a bulk
update to Solr, the commit thread gets blocked! Here is the thread dump output:

"qtp595445781-8207" prio=10 tid=0x7f0bf68f5800 nid=0x5785 waiting for
monitor entry [0x7f081cf04000]
java.lang.Thread.State: BLOCKED (on object monitor)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:608)
- waiting to lock <0x00067ba2e660> (a java.lang.Object)
at
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1635)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1612)
at
org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:161)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:270)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:177)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:497)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)

Locked ownable synchronizers:
- None

FYI, there are lots of blocked threads in the thread dump report, and Solr
becomes really slow in this state. The temporary solution is restarting Solr,
but I am really sick of restarting! I would really appreciate it if somebody
could help me solve this problem.

Best regards.



multiword synonym and ManagedSynonymFilterFactory

2015-12-08 Thread Suad Kozlic
Hello,
has anyone tested this combination?
I am not getting any results.
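
(For context, a minimal sketch of the managed-synonyms setup; the resource name 
"english" and collection name "collection1" are assumptions, not taken from 
this mail:

<filter class="solr.ManagedSynonymFilterFactory" managed="english"/>

curl -X PUT -H 'Content-type:application/json' --data-binary \
  '{"usa":["united states of america"]}' \
  "http://localhost:8983/solr/collection1/schema/analysis/synonyms/english"

A core reload is needed before new mappings take effect. Also note that 
multi-word synonyms on the query side are split by the query parser before the 
filter sees them, which is a classic reason for getting no results.)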

-- 
Suad Kozlić, mr.el.-dipl.ing.el.


Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.

2015-12-08 Thread philippa griggs
Hello Emir,

The query load is around 35 requests per min on each shard; we don't use 
document routing, so we query the entire index.

We do have some heavy queries like faceting, and it's possible that a heavy 
query is causing the nodes to go down- we are looking into this. I'm new to 
Solr, so this could be a slightly stupid question, but would a heavy query cause 
most of the nodes to go down? This didn't happen with the previous Solr version 
we were using (4.10.0); we did have nodes/shards which went down, but there 
wasn't a wipe-out effect where most of the nodes go.

Many thanks

Philippa 


From: Emir Arnautovic 
Sent: 08 December 2015 10:38
To: solr-user@lucene.apache.org
Subject: Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.

Hi Philippa,
My guess would be that you are running some heavy queries (faceting/deep
paging/large pages) or have a high query load (can you give a bit more detail
about the load) or have misconfigured caches. Do you query the entire index or
do you have query routing?

You have big machines and might consider running two Solr instances on each node
(with smaller heaps) and splitting shards so queries can be more
parallelized, resources better utilized, and the heaps smaller for GC.

Regards,
Emir

On 08.12.2015 10:49, philippa griggs wrote:
> Hello Erick,
>
> Thanks for your reply.
>
> We have one collection and are writing documents to it all the time; the rate 
> peaks at around 2,500 per minute and dips to 250 per minute, and the 
> size of the documents varies. On each node we have around 55,000,000 documents 
> with a data size of 43G, located on a 200G drive.
>
> Each node has 122G of memory; the heap size is currently set at 45G, although we 
> have plans to increase this to 50G.
>
> The GC settings we are using are:
>
>   -XX:+UseG1GC,
> -XX:+ParallelRefProcEnabled.
>
> Please let me know if you need any more information.
>
> Philippa
> 
> From: Erick Erickson 
> Sent: 07 December 2015 16:53
> To: solr-user
> Subject: Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.
>
> Tell us a bit more.
>
> Are you adding documents to your collections or adding more
> collections? Solr is a balancing act between the number of docs you
> have on each node and the memory you have allocated. If you're
> continually adding docs to Solr, you'll eventually run out of memory
> and/or hit big GC pauses.
>
> How much memory are you allocating to Solr? How much physical memory
> do you have? etc.
>
> Best,
> Erick
>
>
> On Mon, Dec 7, 2015 at 8:37 AM, philippa griggs
>  wrote:
>> Hello,
>>
>>
>> I'm using:
>>
>>
>> Solr 5.2.1 10 shards each with a replica. (20 nodes in total)
>>
>>
>> Zookeeper 3.4.6.
>>
>>
>> About half a year ago we upgraded to Solr 5.2.1 and since then have been 
>> experiencing a 'wipe out' effect where all of a sudden most if not all nodes 
>> will go down. Sometimes they will recover by themselves but more often than 
>> not we have to step in to restart nodes.
>>
>>
>> Nothing in the logs jumps out as being the problem. With the latest wipe out 
>> we noticed that 10 out of the 20 nodes had garbage collections over 1 min all 
>> at the same time, with the heap usage spiking up in some cases to 80%. We 
>> also noticed the number of selects run on the solr cluster increased just 
>> before the wipe out.
>>
>>
>> Increasing the heap size seems to help for a while but then it starts 
>> happening again- so it's more like a delay than a fix. Our GC settings are 
>> set to -XX:+UseG1GC, -XX:+ParallelRefProcEnabled.
>>
>>
>> With our previous version of Solr (4.10.0) this didn't happen. We had 
>> nodes/shards go down, but it was contained; with the new version they all 
>> seem to go down at around the same time. We can't really continue just 
>> increasing the heap size, and we would like to solve this issue rather than 
>> delay it.
>>
>>
>> Has anyone experienced something similar?
>>
>> Is there a difference between the two versions around the recovery process?
>>
>> Does anyone have any suggestions for a fix?
>>
>>
>> Many thanks
>>
>>
>> Philippa
> >

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: secure solr 5.3.1

2015-12-08 Thread kostali hassan
If I run Solr in SolrCloud mode, does my web hosting need to be cloud web
hosting, or will an ordinary server do?

2015-12-08 1:58 GMT+00:00 Don Bosco Durai :

> Have you considered running your Solr as SolrCloud with embedded zookeeper?
>
> If you do, you have multiple options: Basic Auth, Kerberos and
> authorization support.
>
>
> Bosco
>
>
>
>
>
> On 12/7/15, 7:03 AM, "kostali hassan"  wrote:
>
> >How should I secure my Solr 5.3.1 server in single-node mode? I am
> >searching for the best way to secure my Solr server, but I have only found
> >documentation for cloud mode.
>
>
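
(For reference, the Basic Auth that Bosco mentions is configured in Solr 5.3 
through a security.json uploaded to ZooKeeper, which is why the documentation 
only covers cloud mode. A minimal sketch using the example credentials from the 
Solr reference guide (user "solr", password "SolrRocks"); the role and 
permission are illustrative only:

{
  "authentication": {
    "class": "solr.BasicAuthPlugin",
    "credentials": {
      "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
    }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "user-role": { "solr": "admin" },
    "permissions": [ { "name": "security-edit", "role": "admin" } ]
  }
}

Change the password immediately after bootstrapping with these well-known 
example credentials.)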


Re: Use multiple instances simultaneously

2015-12-08 Thread Emir Arnautovic
Can you tolerate having indices in different states, or do you plan to keep 
them in sync with controlled commits? DIH-ing content from the source when a 
new machine is needed will probably be slow, and I am afraid that you 
will end up simulating the master-slave model (copying state from one of 
the healthy nodes and DIH-ing the diff). I would recommend using SolrCloud with 
a single shard and letting Solr do the hard work.


Regards,
Emir

On 04.12.2015 14:37, Gian Maria Ricci - aka Alkampfer wrote:

Many thanks for your response.

I worked with Solr until early version 4.0, then switched to Elasticsearch
for a variety of reasons. I've used replication in the past with Solr, but
with Elasticsearch I basically had no problems, because it works similarly to
SolrCloud by default and with almost zero configuration.

Now I have a customer that wants to use Solr, and he wants the simplest possible
setup to maintain in production. Since most of the work will be done by the Data
Import Handler, having multiple parallel and independent machines is easy to
maintain. If one machine fails, it is enough to configure another machine,
configure the core and restart the DIH.

I'd like to know if other people went through this path in the past.

--
Gian Maria Ricci
Cell: +39 320 0136949
 


-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: giovedì 3 dicembre 2015 10:15
To: solr-user@lucene.apache.org
Subject: Re: Use multiple instances simultaneously

On 12/3/2015 1:25 AM, Gian Maria Ricci - aka Alkampfer wrote:

In such a scenario, could it be feasible to simply configure 2 or 3
identical instances of Solr and configure the application that transfers
data to Solr to feed all the instances simultaneously (the approach would be
an incremental DIH for some cores and an external application that pushes
data continuously for other cores)? What could be the drawbacks of
using this approach?

When I first set up Solr, I used replication.  Then version 3.1.0 was
released, including a non-backward-compatible upgrade to javabin, and it was
not possible to replicate between 1.x and 3.x.

This incompatibility meant that it would not be possible to do a gradual
upgrade to 3.x, where the slaves are upgraded first and then the master.

To get around the problem, I basically did exactly what you've described.
I turned off replication and configured a second copy of my build program to
update what used to be slave servers.

Later, when I moved to a SolrJ program for index maintenance, I made one
copy of the maintenance program capable of updating multiple copies of the
index in parallel.
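
(A minimal sketch of that idea with SolrJ 5.x; the URLs and core name are 
hypothetical, and real code would batch documents and handle per-copy failures:

import java.util.Arrays;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ParallelIndexer {
  public static void main(String[] args) throws Exception {
    // Two completely independent Solr instances, no replication between them.
    List<HttpSolrClient> copies = Arrays.asList(
        new HttpSolrClient("http://solr1:8983/solr/mycore"),
        new HttpSolrClient("http://solr2:8983/solr/mycore"));

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");
    doc.addField("title", "example");

    // Send the same update to every copy; each index is maintained separately.
    for (HttpSolrClient client : copies) {
      client.add(doc);
      client.commit();
    }
    for (HttpSolrClient client : copies) {
      client.close();
    }
  }
}

The copies only stay identical if every update reaches all of them, which is 
the flexibility/consistency trade-off described above.)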

I have stuck with this architecture through 4.x and moving into 5.x, even
though I could go back to replication or switch to SolrCloud.
Having completely independent indexes allows a great deal of flexibility
with upgrades and testing new configurations, flexibility that isn't
available with SolrCloud or master-slave replication.

Thanks,
Shawn



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Solr Auto-Complete

2015-12-08 Thread Salman Ansari
Thanks Alexandre. I think it is clear.

On Sun, Dec 6, 2015 at 5:21 PM, Alexandre Rafalovitch 
wrote:

> For suffix matches, you copy the text field and, in the different field type,
> add string reversal for both the index and query portions. So you are doing a
> prefix-matching algorithm, but on reversed strings.
>
> I can dig up an example if it is not clear.
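>
> (Roughly, a sketch of such a field type; the names are made up and the exact
> filters depend on your analysis chain:
>
> <fieldType name="text_suggest_rev" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ReverseStringFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ReverseStringFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> With a copyField into this field, a query for "rra" is reversed to "arr" and
> prefix-matches the reversed form of "andorra".)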
> On 6 Dec 2015 8:06 am, "Salman Ansari"  wrote:
>
> > That is right. I am actually looking for phrase prefixes, not prefixes of
> > each term within the phrase. That satisfies my requirements. However, my
> > additional question was: how do I manipulate the fieldType to later allow
> > for suffix matches as well? Or will that be a completely different
> > fieldType definition?
> >
> > Regards,
> > Salman
> >
> >
> > On Sun, Dec 6, 2015 at 2:12 PM, Andrea Gazzarini 
> > wrote:
> >
> > > Sorry, my damned mobile: "Is that close to what you were looking for?"
> > >
> > > 2015-12-06 12:07 GMT+01:00 Andrea Gazzarini :
> > >
> > > > Do you mean "phrase" or "term" prefixes? If you try to put a field
> > value
> > > > (two or more terms) in the analysis page you will see what the index
> > > > analyzer chain (of my example field type) is doing. The whole value
> is
> > > > managed as a single-ngrammed token, so you will get only a phrase
> > prefix
> > > > search, as in your request.
> > > >
> > > > If you also want to manage term prefixes, I would also index another
> > > > field (similar to the example you posted); then the search handler
> > > > with e(dismax) would have something like this:
> > > >
> > > > <str name="qf">
> > > >   text_suggestion_phrase_prefix_search^b1
> > > >   text_suggestion_terms_prefix_search^b2
> > > > </str>
> > > >
> > > > b1 and b2 values strictly depend on your search logic.
> > > >
> > > > Is that close that what you were looking for?
> > > >
> > > > Best,
> > > > Andrea
> > > >
> > > >
> > > >
> > > > 2015-12-06 11:53 GMT+01:00 Salman Ansari :
> > > >
> > > >> Thanks a lot Andrea. It did work.
> > > >>
> > > >> However, just for my understanding, can you please explain a bit more
> > > >> how you made it work for prefixes. I know you mentioned using another
> > > >> Tokenizer, but, for example, if I want to tweak it later on to work on
> > > >> suffixes or within phrases, how should I go about that?
> > > >>
> > > >> Thanks again for your help.
> > > >>
> > > >> Regards,
> > > >> Salman
> > > >>
> > > >>
> > > >> On Sun, Dec 6, 2015 at 1:24 PM, Andrea Gazzarini <
> > a.gazzar...@gmail.com
> > > >
> > > >> wrote:
> > > >>
> > > >> > Hi Salman,
> > > >> > that's because you're using a StandardTokenizer. Try with
> something
> > > like
> > > >> > this (copied, pasted and changed using my phone so probably with a
> > lot
> > > >> of
> > > >> > mistakes ;) but you should be able to get what I mean). BTW I
> don't
> > > >> know if
> > > >> > that's the case but I would also put a MappingCharFilterFactory
> > > >> >
> > > >> > <fieldType name="..." class="solr.TextField"
> > > >> > positionIncrementGap="100">
> > > >> >   <analyzer type="index">
> > > >> >     *<charFilter class="solr.MappingCharFilterFactory"
> > > >> > mapping="mapping-FoldToASCII.txt"/>*
> > > >> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> > > >> >     <filter class="solr.LowerCaseFilterFactory"/>
> > > >> >     <filter class="solr.WordDelimiterFilterFactory"
> > > >> > generateWordParts="0" generateNumberParts="0" catenateAll="1"
> > > >> > splitOnCaseChange="0" />
> > > >> >     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
> > > >> > maxGramSize="20"/>
> > > >> >   </analyzer>
> > > >> >   <analyzer type="query">
> > > >> >     *<charFilter class="solr.MappingCharFilterFactory"
> > > >> > mapping="mapping-FoldToASCII.txt"/>*
> > > >> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> > > >> >     <filter class="solr.LowerCaseFilterFactory"/>
> > > >> >     <filter class="solr.WordDelimiterFilterFactory"
> > > >> > generateWordParts="0" generateNumberParts="0" catenateAll="1"
> > > >> > splitOnCaseChange="0" />
> > > >> >   </analyzer>
> > > >> > </fieldType>
> > > >> >
> > > >> >
> > > >> > 2015-12-06 9:36 GMT+01:00 Salman Ansari  >:
> > > >> >
> > > >> > > Hi,
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > I have updated my schema.xml as mentioned in the previous posts
> > > using
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > <fieldType name="..." class="solr.TextField"
> > > >> > > positionIncrementGap="100">
> > > >> > >   <analyzer type="index">
> > > >> > >     <tokenizer class="solr.StandardTokenizerFactory"/>
> > > >> > >     <filter class="solr.LowerCaseFilterFactory"/>
> > > >> > >     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
> > > >> > > maxGramSize="20"/>
> > > >> > >   </analyzer>
> > > >> > >   <analyzer type="query">
> > > >> > >     <tokenizer class="solr.StandardTokenizerFactory"/>
> > > >> > >     <filter class="solr.LowerCaseFilterFactory"/>
> > > >> > >   </analyzer>
> > > >> > > </fieldType>
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > This does the auto-complete, but it matches at every portion of
> > > >> > > the text (not just at the beginning, i.e. the prefix). So searching
> > > >> > > for "And" in my locations field returns both of the following
> > > >> > > documents.
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > 
> > > >> > >
> > > >> > > 1
> > > >> > >
> > > >> > > AD
> > > >> > >
> > > >> > > *And*orra
> > > >> > >
> > > >> > > أندورا
> > > >> > >
> > > 

question: partialResults true with pagination

2015-12-08 Thread Vibhor Goel
hey,

I am using a single standalone Solr instance. Some of my queries are taking a
long time due to the large number of result documents. I am using the timeout
(timeAllowed) option and it returns partial results.

I found out that partial results return a random subset of the matching
documents, in sorted order.

My question is: when I am using Solr with pagination, I am making calls with
the same query again and again. How do I ensure that I won't get repetition in
the documents, as different calls return different numbers of matching
documents?

And will the situation get worse with the filter cache and document cache?

How can I work around this problem?

-- 
Thanks and regards
Vibhor Goel


Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.

2015-12-08 Thread Emir Arnautovic

Hi Philippa,
My guess would be that you are running some heavy queries (faceting/deep 
paging/large pages) or have a high query load (can you give a bit more detail 
about the load) or have misconfigured caches. Do you query the entire index or 
do you have query routing?


You have big machines and might consider running two Solr instances on each node 
(with smaller heaps) and splitting shards so queries can be more 
parallelized, resources better utilized, and the heaps smaller for GC.


Regards,
Emir

On 08.12.2015 10:49, philippa griggs wrote:

Hello Erick,

Thanks for your reply.

We have one collection and are writing documents to it all the time; the rate 
peaks at around 2,500 per minute and dips to 250 per minute, and the size 
of the documents varies. On each node we have around 55,000,000 documents with a 
data size of 43G, located on a 200G drive.

Each node has 122G of memory; the heap size is currently set at 45G, although we 
have plans to increase this to 50G.

The GC settings we are using are:

  -XX:+UseG1GC,
-XX:+ParallelRefProcEnabled.

Please let me know if you need any more information.

Philippa

From: Erick Erickson 
Sent: 07 December 2015 16:53
To: solr-user
Subject: Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.

Tell us a bit more.

Are you adding documents to your collections or adding more
collections? Solr is a balancing act between the number of docs you
have on each node and the memory you have allocated. If you're
continually adding docs to Solr, you'll eventually run out of memory
and/or hit big GC pauses.

How much memory are you allocating to Solr? How much physical memory
do you have? etc.

Best,
Erick


On Mon, Dec 7, 2015 at 8:37 AM, philippa griggs
 wrote:

Hello,


I'm using:


Solr 5.2.1 10 shards each with a replica. (20 nodes in total)


Zookeeper 3.4.6.


About half a year ago we upgraded to Solr 5.2.1 and since then have been 
experiencing a 'wipe out' effect where all of a sudden most if not all nodes 
will go down. Sometimes they will recover by themselves but more often than not 
we have to step in to restart nodes.


Nothing in the logs jumps out as being the problem. With the latest wipe out we 
noticed that 10 out of the 20 nodes had garbage collections over 1 min all at 
the same time, with the heap usage spiking up in some cases to 80%. We also 
noticed the number of selects run on the solr cluster increased just before the 
wipe out.


Increasing the heap size seems to help for a while but then it starts happening 
again- so it's more like a delay than a fix. Our GC settings are set to 
-XX:+UseG1GC, -XX:+ParallelRefProcEnabled.


With our previous version of Solr (4.10.0) this didn't happen. We had 
nodes/shards go down, but it was contained; with the new version they all seem 
to go down at around the same time. We can't really continue just increasing the 
heap size, and we would like to solve this issue rather than delay it.


Has anyone experienced something similar?

Is there a difference between the two versions around the recovery process?

Does anyone have any suggestions for a fix?


Many thanks


Philippa

>


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



capacity of storage a single core

2015-12-08 Thread Mugeesh Husain

Two simple questions regarding capacity:

1.) How many documents can we store in a single core (capacity of core
storage)?
2.) How many cores can we create on a single server (single-node cluster)?


Thanks,
Mugeesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/capacity-of-storage-a-single-core-tp4244197.html
Sent from the Solr - User mailing list archive at Nabble.com.


Issue with Querying Solr

2015-12-08 Thread Salman Ansari
Hi,

I have created a cluster of Solr and Zookeepers on 3 machines connected
together. Currently, I am facing a weird problem. My collection has only
261 documents and when I try to query the documents using the browser such
as

http://
[ASolrServerInTheCluster]:8983/solr/sabrLocationsStore/select?q=(*:*)

it returns the documents properly. However, when I try to do the same using
Solr.NET, it throws java.lang.OutOfMemoryError: Java heap space exception
(although I have very few documents there). Any ideas why I am getting this
error?

Regards,
Salman


Re: Solr 5.2.1 deadlock on commit

2015-12-08 Thread Ali Nazemian
The indexing load is as follows:
- Around 1000 documents every 5 mins.
- The indexing speed is slow because of the complicated analyzer that is
applied to each document. It takes around 60 seconds to index 1000
documents when applying this analyzer (it is really slow; however, given the
analysis involved, I think it is acceptable).
- ConcurrentUpdateSolrClient is used in all the indexing/updating cases.

Regards.

On Tue, Dec 8, 2015 at 6:36 PM, Ali Nazemian  wrote:

> Dear Emir,
> Hi,
> There are some cases where I use soft commits in my application. However,
> the bulk update part uses only hard commits, for batches of 2500 documents.
> Here is some information about the whole indexing/updating scenario:
> - The indexing part uses soft commits.
> - Single-document updates use soft commits.
> - Bulk update batches use hard commits (on 2500 documents).
> - Auto hard commit: 120 sec
> - Auto soft commit: disabled
>
> Best regards.
>
>
> On Tue, Dec 8, 2015 at 12:35 PM, Emir Arnautovic <
> emir.arnauto...@sematext.com> wrote:
>
>> Hi Ali,
>> This thread is blocked because it cannot obtain the update lock - in this
>> particular case when doing a soft commit. I am guessing that the others are
>> blocked for the same reason. Can you tell us a bit more about your setup and
>> indexing load and procedure? Do you do explicit commits?
>>
>> Regards,
>> Emir
>>
>> --
>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>>
>>
>> On 08.12.2015 08:16, Ali Nazemian wrote:
>>
>>> Hi,
>>> I have had a problem with Solr 5.2.1 for a while now and have not been
>>> able to fix it yet. The only thing that is clear to me is that when I send
>>> a bulk update to Solr, the commit thread gets blocked! Here is the thread dump
>>> output:
>>>
>>> "qtp595445781-8207" prio=10 tid=0x7f0bf68f5800 nid=0x5785 waiting for
>>> monitor entry [0x7f081cf04000]
>>> java.lang.Thread.State: BLOCKED (on object monitor)
>>> at
>>>
>>> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:608)
>>> - waiting to lock <0x00067ba2e660> (a java.lang.Object)
>>> at
>>>
>>> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
>>> at
>>>
>>> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
>>> at
>>>
>>> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1635)
>>> at
>>>
>>> org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1612)
>>> at
>>>
>>> org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:161)
>>> at
>>>
>>> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
>>> at
>>>
>>> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
>>> at
>>>
>>> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:270)
>>> at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:177)
>>> at
>>>
>>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98)
>>> at
>>>
>>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>>> at
>>>
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
>>> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
>>> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
>>> at
>>>
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
>>> at
>>>
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
>>> at
>>>
>>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>>> at
>>>
>>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>>> at
>>>
>>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>>> at
>>>
>>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>>> at
>>>
>>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>>> at
>>>
>>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>>> at
>>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>>> at
>>>
>>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>>> at
>>>
>>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>>> at
>>>
>>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>>> at
>>>
>>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>>> at
>>>
>>> 

Re: capacity of storage a single core

2015-12-08 Thread Toke Eskildsen
On Tue, 2015-12-08 at 05:18 -0700, Mugeesh Husain wrote:
> Two simple questions regarding capacity:
> 
> 1.) How many documents can we store in a single core (capacity of core
> storage)?

There is a hard limit of 2 billion documents (Lucene document IDs are signed 32-bit integers).

> 2.) How many cores can we create on a single server (single-node cluster)?

There is no hard limit. Except for 2 billion cores, I guess. But at this
point in time that is a ridiculously high number of cores.

It is hard to give a suggestion for real-world limits as indexes vary a
lot and the rules of thumb tend to be quite poor when scaling up.
http://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

People generally seem to run into problems with more than 1000
not-too-large cores. If the cores are large, there will probably be
performance problems long before that.

You will have to build a prototype and test.

- Toke Eskildsen, State and University Library, Denmark




Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.

2015-12-08 Thread Emir Arnautovic

Hi Philippa,
It's more likely that this is related to index size/content plus queries 
than to the Solr version. Did you experience issues immediately after the upgrade?


Check the slow queries log and see if there are some extremely slow queries. 
Check cache sizes and calculate how much they take. Increasing the heap size 
is not likely to help; it might postpone the issue, but it will hit 
harder when it does.


Thanks,
Emir

On 08.12.2015 13:17, philippa griggs wrote:

Hello Emir,

The query load is around 35 requests per min on each shard; we don't use document 
routing, so we query the entire index.

We do have some heavy queries like faceting, and it's possible that a heavy 
query is causing the nodes to go down- we are looking into this. I'm new to 
Solr, so this could be a slightly stupid question, but would a heavy query cause 
most of the nodes to go down? This didn't happen with the previous Solr version 
we were using (4.10.0); we did have nodes/shards which went down, but there 
wasn't a wipe-out effect where most of the nodes go.

Many thanks

Philippa


From: Emir Arnautovic 
Sent: 08 December 2015 10:38
To: solr-user@lucene.apache.org
Subject: Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.

Hi Philippa,
My guess would be that you are running some heavy queries (faceting/deep
paging/large pages) or have a high query load (can you give a bit more detail
about the load) or have misconfigured caches. Do you query the entire index or
do you have query routing?

You have big machines and might consider running two Solr instances on each node
(with smaller heaps) and splitting shards so queries can be more
parallelized, resources better utilized, and the heaps smaller for GC.

Regards,
Emir

On 08.12.2015 10:49, philippa griggs wrote:

Hello Erick,

Thanks for your reply.

We have one collection and are writing documents to it all the time; the rate 
peaks at around 2,500 per minute and dips to 250 per minute, and the size 
of the documents varies. On each node we have around 55,000,000 documents with a 
data size of 43G, located on a 200G drive.

Each node has 122G of memory; the heap size is currently set at 45G, although we 
have plans to increase this to 50G.

The GC settings we are using are:

   -XX:+UseG1GC,
-XX:+ParallelRefProcEnabled.

Please let me know if you need any more information.

Philippa

From: Erick Erickson 
Sent: 07 December 2015 16:53
To: solr-user
Subject: Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.

Tell us a bit more.

Are you adding documents to your collections or adding more
collections? Solr is a balancing act between the number of docs you
have on each node and the memory you have allocated. If you're
continually adding docs to Solr, you'll eventually run out of memory
and/or hit big GC pauses.

How much memory are you allocating to Solr? How much physical memory
do you have? etc.

Best,
Erick


On Mon, Dec 7, 2015 at 8:37 AM, philippa griggs
 wrote:

Hello,


I'm using:


Solr 5.2.1 10 shards each with a replica. (20 nodes in total)


Zookeeper 3.4.6.


About half a year ago we upgraded to Solr 5.2.1 and since then have been 
experiencing a 'wipe out' effect where all of a sudden most if not all nodes 
will go down. Sometimes they will recover by themselves but more often than not 
we have to step in to restart nodes.


Nothing in the logs jumps out as being the problem. With the latest wipe out we 
noticed that 10 out of the 20 nodes had garbage collections over 1 min all at 
the same time, with the heap usage spiking up in some cases to 80%. We also 
noticed the number of selects run on the solr cluster increased just before the 
wipe out.


Increasing the heap size seems to help for a while but then it starts happening 
again- so it's more like a delay than a fix. Our GC settings are set to 
-XX:+UseG1GC, -XX:+ParallelRefProcEnabled.


With our previous version of Solr (4.10.0) this didn't happen. We had 
nodes/shards go down, but it was contained; with the new version they all seem 
to go down at around the same time. We can't really continue just increasing the 
heap size, and we would like to solve this issue rather than delay it.


Has anyone experienced something similar?

Is there a difference between the two versions around the recovery process?

Does anyone have any suggestions for a fix?


Many thanks


Philippa


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Solr 5.2.1 deadlock on commit

2015-12-08 Thread Ali Nazemian
Dear Emir,
Hi,
There are some cases where I use soft commits in my application. However,
the bulk update part uses only hard commits, for batches of 2500 documents.
Here is some information about the whole indexing/updating scenario:
- The indexing part uses soft commits.
- Single-document updates use soft commits.
- Bulk update batches use hard commits (on 2500 documents).
- Auto hard commit: 120 sec
- Auto soft commit: disabled

Best regards.


On Tue, Dec 8, 2015 at 12:35 PM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Hi Ali,
> This thread is blocked because it cannot obtain the update lock - in this
> particular case when doing a soft commit. I am guessing that the others are
> blocked for the same reason. Can you tell us a bit more about your setup and
> indexing load and procedure? Do you do explicit commits?
>
> Regards,
> Emir
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
> On 08.12.2015 08:16, Ali Nazemian wrote:
>
>> Hi,
>> I have had a problem with Solr 5.2.1 for a while now and have not been able
>> to fix it yet. The only thing that is clear to me is that when I send a bulk
>> update to Solr, the commit thread gets blocked! Here is the thread dump output:
>>
>> "qtp595445781-8207" prio=10 tid=0x7f0bf68f5800 nid=0x5785 waiting for
>> monitor entry [0x7f081cf04000]
>> java.lang.Thread.State: BLOCKED (on object monitor)
>> at
>>
>> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:608)
>> - waiting to lock <0x00067ba2e660> (a java.lang.Object)
>> at
>>
>> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
>> at
>>
>> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
>> at
>>
>> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1635)
>> at
>>
>> org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1612)
>> at
>>
>> org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:161)
>> at
>>
>> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
>> at
>>
>> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
>> at
>> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:270)
>> at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:177)
>> at
>>
>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98)
>> at
>>
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>> at
>>
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
>> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
>> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
>> at
>>
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
>> at
>>
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
>> at
>>
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>> at
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>> at
>>
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>> at
>>
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>> at
>>
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>> at
>>
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>> at
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>> at
>>
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>> at
>>
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>> at
>>
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>> at
>>
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>> at
>>
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>> at
>>
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>> at org.eclipse.jetty.server.Server.handle(Server.java:497)
>> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>> at
>>
>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>> at
>> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>> at
>>
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>> at
>>
>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>> at 

Re: question: partialResults true with pagination

2015-12-08 Thread Toke Eskildsen
On Tue, 2015-12-08 at 18:42 +0530, Vibhor Goel wrote:
> I am using a single standalone Solr instance. Some of my queries are taking
> a long time due to the large number of result documents. I am using the
> timeout (timeAllowed) option and it returns partial results.

Don't request a large number of documents at a time. The exact size
depends on your setup: If they are small documents you can probably get
away with tens of thousands. Asking for millions is a recipe for poor
performance.

> My query is when i am using solr with pagination, [...]

Are you using cursorMark? You should do so, to avoid progressively worse
performance as you page deeper.

https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
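
A minimal sketch of the cursorMark flow (field names hypothetical); the sort 
must include the uniqueKey field, and each response's nextCursorMark is fed 
into the next request:

/select?q=*:*&rows=100&sort=score desc,id asc&cursorMark=*
/select?q=*:*&rows=100&sort=score desc,id asc&cursorMark=<value of nextCursorMark>

Unlike start/rows paging, a document will not be returned twice as long as the 
index does not change between requests.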

- Toke Eskildsen, State and University Library, Denmark




Re: Issue with Querying Solr

2015-12-08 Thread Andrea Gazzarini
I would set up logging in the admin console (queries should be logged
by default), and then check the difference between the two invocations.
I don't believe the two clients are doing the *same* thing; there should be
some difference.

Another possibility is that the OOM was just around the corner and the
Solr.NET invocation, being merely the last caller, is not the true culprit.

Andrea


2015-12-08 13:52 GMT+01:00 Salman Ansari :

> Hi,
>
> I have created a cluster of Solr and Zookeepers on 3 machines connected
> together. Currently, I am facing a weird problem. My collection has only
> 261 documents and when I try to query the documents using the browser such
> as
>
> http://
> [ASolrServerInTheCluster]:8983/solr/sabrLocationsStore/select?q=(*:*)
>
> it returns the documents properly. However, when I try to do the same using
> Solr.NET, it throws java.lang.OutOfMemoryError: Java heap space exception
> (although I have very few documents there). Any ideas why I am getting this
> error?
>
> Regards,
> Salman
>


Re: Issue with Querying Solr

2015-12-08 Thread Alexandre Rafalovitch
Solr by default only returns 10 rows. SolrNet by default requests many
rows. I don't know why that would cause an OOM, but that's definitely
your difference, unless you dealt with it:
https://github.com/mausch/SolrNet/blob/master/Documentation/Querying.md#pagination
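
(In other words, the browser request was effectively asking for rows=10, while 
the client was asking for far more. Restating the URL from the original mail 
with an explicit rows parameter:

http://[ASolrServerInTheCluster]:8983/solr/sabrLocationsStore/select?q=(*:*)&rows=10

Capping rows on the SolrNet side should make the two behave the same.)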

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 8 December 2015 at 07:52, Salman Ansari  wrote:
> Hi,
>
> I have created a cluster of Solr and Zookeepers on 3 machines connected
> together. Currently, I am facing a weird problem. My collection has only
> 261 documents and when I try to query the documents using the browser such
> as
>
> http://
> [ASolrServerInTheCluster]:8983/solr/sabrLocationsStore/select?q=(*:*)
>
> it returns the documents properly. However, when I try to do the same using
> Solr.NET, it throws java.lang.OutOfMemoryError: Java heap space exception
> (although I have very few documents there). Any ideas why I am getting this
> error?
>
> Regards,
> Salman


Re: Issue with Querying Solr

2015-12-08 Thread Salman Ansari
Thanks Andrea and Alexandre for your responses. Indeed, the problem was
that Solr.NET was requesting many rows (I captured this with Fiddler).
Currently, my setup has only 500MB of JVM heap (which I will definitely
increase), but at least I found the culprit by reducing the number of rows
returned.

Regards,
Salman

On Tue, Dec 8, 2015 at 5:30 PM, Alexandre Rafalovitch 
wrote:

> Solr by default only returns 10 rows. SolrNet by default returns many
> rows. I don't know why that would cause OOM, but that's definitely
> your difference unless you dealt with it:
>
> https://github.com/mausch/SolrNet/blob/master/Documentation/Querying.md#pagination
>
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 8 December 2015 at 07:52, Salman Ansari 
> wrote:
> > Hi,
> >
> > I have created a cluster of Solr and Zookeepers on 3 machines connected
> > together. Currently, I am facing a weird problem. My collection has only
> > 261 documents and when I try to query the documents using the browser
> such
> > as
> >
> > http://
> > [ASolrServerInTheCluster]:8983/solr/sabrLocationsStore/select?q=(*:*)
> >
> > it returns the documents properly. However, when I try to do the same
> using
> > Solr.NET, it throws java.lang.OutOfMemoryError: Java heap space exception
> > (although I have very few documents there). Any ideas why I am getting
> this
> > error?
> >
> > Regards,
> > Salman
>


Re: Issue with Querying Solr

2015-12-08 Thread Don Bosco Durai
You only have 261 documents. That shouldn't be a problem unless your document 
size is huge.
I feel the problem still exists somewhere; you have just deferred it...
Bosco


On Tue, Dec 8, 2015 at 6:48 AM -0800, "Salman Ansari"  
wrote:

Thanks Andrea and Alexandre for your responses. Indeed, the problem was
that Solr.NET was requesting many rows (I captured this with Fiddler).
Currently, my setup has only 500MB of JVM heap (which I will definitely
increase), but at least I found the culprit by reducing the number of rows
returned.

Regards,
Salman

On Tue, Dec 8, 2015 at 5:30 PM, Alexandre Rafalovitch 
wrote:

> Solr by default only returns 10 rows. SolrNet by default returns many
> rows. I don't know why that would cause OOM, but that's definitely
> your difference unless you dealt with it:
>
> https://github.com/mausch/SolrNet/blob/master/Documentation/Querying.md#pagination
>
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 8 December 2015 at 07:52, Salman Ansari 
> wrote:
> > Hi,
> >
> > I have created a cluster of Solr and Zookeepers on 3 machines connected
> > together. Currently, I am facing a weird problem. My collection has only
> > 261 documents and when I try to query the documents using the browser
> such
> > as
> >
> > http://
> > [ASolrServerInTheCluster]:8983/solr/sabrLocationsStore/select?q=(*:*)
> >
> > it returns the documents properly. However, when I try to do the same
> using
> > Solr.NET, it throws java.lang.OutOfMemoryError: Java heap space exception
> > (although I have very few documents there). Any ideas why I am getting
> this
> > error?
> >
> > Regards,
> > Salman
>


Re: capacity of storage a single core

2015-12-08 Thread Mugeesh Husain
Thanks Toke Eskildsen,

Actually, I need to join on my core, which is why going to SolrCloud is a
problem (join is not supported in SolrCloud).

Is there any alternative way of doing it?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/capacity-of-storage-a-single-core-tp4244197p4244248.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Highlighting large documents

2015-12-08 Thread Scott Stults
There are two things going on that you should be aware of. The first is that
Solr highlighting is mainly concerned with putting a representative
snippet in a results listing. There are a couple of configuration changes
you need to make if you want to highlight a whole document, like setting the
fragListBuilder to SingleFragListBuilder and the maxAnalyzedChars setting
you've already mentioned:

https://wiki.apache.org/solr/HighlightingParameters#hl.fragsize

Because full document highlighting is so different from highlighting
snippets in a result list you'll want to configure two different
highlighters: One for snippets and one for the full document.
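
A rough sketch of what the full-document side could look like in 
solrconfig.xml; the handler name and field are made up, and this assumes the 
FastVectorHighlighter, which requires termVectors, termPositions and 
termOffsets on the highlighted field:

<requestHandler name="/fulldoc" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl">on</str>
    <str name="hl.fl">content</str>
    <str name="hl.useFastVectorHighlighter">true</str>
    <str name="hl.fragListBuilder">single</str>
    <int name="hl.maxAnalyzedChars">1000000</int>
  </lst>
</requestHandler>

Here "single" refers to the SingleFragListBuilder registered under that name in 
the stock highlighting component configuration.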

The other thing you need to know is that performance in highlighting is an
active area of development. Right now the top docs in the current result
list are calculated completely separately from the snippets (highlighting),
which can lead to problems when the most relevant snippets are later in the
document.

What most people do is compromise by making the result list fast but
inaccurate, and having the full-document highlight be accurate but slower.


Hope that helps,
-Scott


On Fri, Dec 4, 2015 at 11:12 AM, Andrea Gazzarini 
wrote:

> No no, sorry, the project is not yet started so I didn't experience your
> issue, but I'll be a careful listener of this thread
>
> Best,
> Andrea
>
> 2015-12-04 17:04 GMT+01:00 Zheng Lin Edwin Yeo :
>
> > Hi Andrea,
> >
> > I'm using the original highlighter.
> >
> > Below is my configuration for the highlighter in solrconfig.xml
> >
> >   
> >
> >explicit
> >10
> >json
> >true
> >   text
> >   id, title, content_type, last_modified, url, score
> 
> >
> >   on
> >id, title, content, author 
> >   true
> >true
> >html
> >   200
> >   100
> >
> > true
> > signature
> > true
> > 100
> >   
> >   
> >
> >
> > Have you managed to solve the problem?
> >
> > Regards,
> > Edwin
> >
> >
> > On 4 December 2015 at 23:54, Andrea Gazzarini 
> > wrote:
> >
> > > Hi Zheng,
> > > just curiousity, because shortly I will have to deal with a similar
> > > scenario (Solr 5.3.1 + large documents + highlighting).
> > > Which highlighter are you using?
> > >
> > > Andrea
> > >
> > > 2015-12-04 16:51 GMT+01:00 Zheng Lin Edwin Yeo :
> > >
> > > > Hi,
> > > >
> > > > I'm using Solr 5.3.0
> > > >
> > > > I found that with large documents, I sometimes face a situation where,
> > > > when I do a highlight query, the result set that is returned does not
> > > > contain the highlighted query. There are actually matches in the
> > > > documents; it's just that they are located further back in the
> > > > documents.
> > > >
> > > > I have tried increasing the value of hl.maxAnalyzedChars, as the
> > > > default value is 51200 and I have documents that are much larger than
> > > > 51200 characters. Although this method works, when I increase this
> > > > value the performance of search and highlighting drops. It can drop
> > > > from less than 0.5 seconds to more than 10 seconds.
> > > >
> > > > Would like to check, is this method of increasing the value of the
> > > > hl.maxAnalyzedChars the best method to use, or is there other ways
> > which
> > > > can solve the same purpose, but without affecting the performance
> much?
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > >
> >
>



-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com


Re: capacity of storage a single core

2015-12-08 Thread Jack Krupansky
Generally, you will be resource limited (memory, CPU) rather than by some
arbitrary numeric limit (like 2 billion).

My personal general recommendation is for a practical limit is 100 million
documents on a machine/node. Depending on your data model and actual data
that number could be higher or lower. A proof of concept test will allow
you to determine the actual number for your particular use case, but a
presumed limit of 100 million is not a bad start.

You should have enough memory to hold the entire index in system memory. If
not, your query latency will suffer due to I/O required to constantly
re-read portions of the index into memory.

The practical limit for documents is not per core or number of cores but
across all cores on the node since it is mostly a memory limit and the
available CPU resources for accessing that memory.

-- Jack Krupansky

On Tue, Dec 8, 2015 at 8:57 AM, Toke Eskildsen 
wrote:

> On Tue, 2015-12-08 at 05:18 -0700, Mugeesh Husain wrote:
> > Two simple questions regarding capacity:
> >
> > 1.) How many documents can we store in a single core (capacity of core
> > storage)?
>
> There is a hard limit of 2 billion documents (Lucene document IDs are
> signed 32-bit integers).
>
> > 2.) How many cores can we create on a single server (single-node cluster)?
>
> There is no hard limit. Except for 2 billion cores, I guess. But at this
> point in time that is a ridiculously high number of cores.
>
> It is hard to give a suggestion for real-world limits as indexes vary a
> lot and the rules of thumb tend to be quite poor when scaling up.
>
> http://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> People generally seem to run into problems with more than 1000
> not-too-large cores. If the cores are large, there will probably be
> performance problems long before that.
>
> You will have to build a prototype and test.
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>


Re: Solr 5.2.1 deadlock on commit

2015-12-08 Thread Emir Arnautovic

Hi Ali,
Can you try without explicit commits and see whether threads still get 
blocked?
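
For reference, a minimal solrconfig.xml sketch of relying on auto commits 
instead of explicit ones; the intervals are illustrative, not a recommendation:

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>120000</maxTime>            <!-- hard commit every 2 minutes -->
    <openSearcher>false</openSearcher>   <!-- flush to disk, no new searcher -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>30000</maxTime>             <!-- new searcher every 30 seconds -->
  </autoSoftCommit>
</updateHandler>

With openSearcher=false the hard commits stay cheap, and dropping per-batch 
explicit commits avoids queueing commit requests behind the update lock.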


Thanks,
Emir

On 08.12.2015 16:19, Ali Nazemian wrote:

The indexing load is as follows:
- Around 1000 documents every 5 mins.
- The indexing speed is slow because of the complicated analyzer that is
applied to each document. It takes around 60 seconds to index 1000
documents when applying this analyzer (it is really slow; however, given the
analysis involved, I think it is acceptable).
- ConcurrentUpdateSolrClient is used in all the indexing/updating cases.

Regards.

On Tue, Dec 8, 2015 at 6:36 PM, Ali Nazemian  wrote:


Dear Emir,
Hi,
There are some cases where I use soft commits in my application. However,
the bulk update part uses only hard commits, for batches of 2500 documents.
Here is some information about the whole indexing/updating scenario:
- The indexing part uses soft commits.
- Single-document updates use soft commits.
- Bulk update batches use hard commits (on 2500 documents).
- Auto hard commit: 120 sec
- Auto soft commit: disabled

Best regards.


On Tue, Dec 8, 2015 at 12:35 PM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:


Hi Ali,
This thread is blocked because it cannot obtain the update lock - in this
particular case when doing a soft commit. I am guessing that the others are
blocked for the same reason. Can you tell us a bit more about your setup and
indexing load and procedure? Do you do explicit commits?

Regards,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



On 08.12.2015 08:16, Ali Nazemian wrote:


Hi,
I have had a problem with Solr 5.2.1 for a while now and have not been able to
fix it yet. The only thing that is clear to me is that when I send a bulk
update to Solr, the commit thread gets blocked! Here is the thread dump
output:

"qtp595445781-8207" prio=10 tid=0x7f0bf68f5800 nid=0x5785 waiting for
monitor entry [0x7f081cf04000]
 java.lang.Thread.State: BLOCKED (on object monitor)
at

org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:608)
- waiting to lock <0x00067ba2e660> (a java.lang.Object)
at

org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
at

org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at

org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1635)
at

org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1612)
at

org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:161)
at

org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at

org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at

org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:270)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:177)
at

org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98)
at

org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at

org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
at

org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
at

org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
at

org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at

org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at

org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at

org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at

org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at

org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at

org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at

org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at

org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at

org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at

org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at

org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:497)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)

Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.

2015-12-08 Thread Erick Erickson
Philippa:
You simply cannot continue adding documents, increasing memory, adding
more documents, increasing memory forever, if for no other reason than
you'll eventually hit such large GC pauses that your query performance
will suffer greatly.

I'd _strongly_ advise you to pick a number of docs (let's say 50M, but
you could make it smaller or larger, up to you) as the maximum number
of docs you can put in a shard, then create enough shards to
accommodate your eventual total corpus. This may mean "oversharding",
where you host multiple shards in the same JVM and then move them to
new hardware as your doc load on any particular JVM exceeds 50M (i.e.
say 10M docs on each of 5 nodes).

IMO, though, the path you're on is untenable in the long run. You
either have to plan for total capacity or prune your corpus.

Best,
Erick
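
For illustration, pre-creating an oversharded collection along those lines
is done through the Collections API; a sketch, with the collection name and
counts as placeholders:

curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=20&replicationFactor=2&maxShardsPerNode=4"

Shards can later be moved to new hardware, e.g. with ADDREPLICA followed by
DELETEREPLICA, as the per-node document count grows.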

On Tue, Dec 8, 2015 at 6:06 AM, Emir Arnautovic
 wrote:
> Hi Philippa,
> It's more likely that this is related to index size/content + queries than
> to Solr version. Did you experience issues immediately after upgrade?
>
> Check slow queries log and see if there are some extremely slow queries.
> Check cache sizes and calculate how much they take. Increasing heap size is
> not likely to help - it might postpone issue but will be harder when it
> hits.
>
> Thanks,
> Emir
>
>
> On 08.12.2015 13:17, philippa griggs wrote:
>>
>> Hello Emir,
>>
>> The query load is around 35 requests per min on each shard, we don't
>> document route so we query the entire index.
>>
>> We do have some heavy queries like faceting and its possible that a heavy
>> queries is causing the nodes to go down- we are looking into this.  I'm new
>> to solr so this could be a slightly stupid question but would a heavy query
>> cause most of the nodes to go down? This didn't happen with the previous
>> solr version we were using Solr 4.10.0, we did have nodes/shards which went
>> down but there wasn't wipe out effect where most of the nodes go.
>>
>> Many thanks
>>
>> Philippa
>>
>> 
>> From: Emir Arnautovic 
>> Sent: 08 December 2015 10:38
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.
>>
>> Hi Phillippa,
>> My guess would be that you are running some heavy queries (faceting/deep
>> paging/large pages) or have high query load (can you give bit details
>> about load) or have misconfigured caches. Do you query entire index or
>> you have query routing?
>>
>> You have big machine and might consider running two Solr on each node
>> (with smaller heap) and split shards so queries can be more
>> parallelized, resources better utilized, and smaller heap to GC.
>>
>> Regards,
>> Emir
>>
>> On 08.12.2015 10:49, philippa griggs wrote:
>>>
>>> Hello Erick,
>>>
>>> Thanks for your reply.
>>>
>>> We have one collection and are writing documents to that collection all
>>> the time- it peaks at around 2,500 per minute and dips to 250 per minute,
>>> the size of the document varies. On each node we have around 55,000,000
>>> documents with a data size of 43G located on a drive of 200G.
>>>
>>> Each node has 122G memory, the heap size is currently set at 45G although
>>> we have plans to increase this to 50G.
>>>
>>> The heap settings we are using are:
>>>
>>>-XX: +UseG1GC,
>>> -XX:+ParallelRefProcEnabled.
>>>
>>> Please let me know if you need any more information.
>>>
>>> Philippa
>>> 
>>> From: Erick Erickson 
>>> Sent: 07 December 2015 16:53
>>> To: solr-user
>>> Subject: Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.
>>>
>>> Tell us a bit more.
>>>
>>> Are you adding documents to your collections or adding more
>>> collections? Solr is a balancing act between the number of docs you
>>> have on each node and the memory you have allocated. If you're
>>> continually adding docs to Solr, you'll eventually run out of memory
>>> and/or hit big GC pauses.
>>>
>>> How much memory are you allocating to Solr? How much physical memory
>>> to you have? etc.
>>>
>>> Best,
>>> Erick
>>>
>>>
>>> On Mon, Dec 7, 2015 at 8:37 AM, philippa griggs
>>>  wrote:

 Hello,


 I'm using:


 Solr 5.2.1 10 shards each with a replica. (20 nodes in total)


 Zookeeper 3.4.6.


 About half a year ago we upgraded to Solr 5.2.1 and since then have been
 experiencing a 'wipe out' effect where all of a sudden most if not all 
 nodes
 will go down. Sometimes they will recover by themselves but more often than
 not we have to step in to restart nodes.


 Nothing in the logs jumps out as being the problem. With the latest wipe
 out we noticed that 10 out of the 20 nodes had garbage collections over 
 1min
 all at the same time, with the heap usage spiking up in some cases to 80%.
 We also noticed 

Re: secure solr 5.3.1

2015-12-08 Thread Don Bosco Durai
Not sure exactly what you mean here. Even if you are running in SolrCloud, you 
can access it using URL. So there won't be any change on the client side.
Bosco

On Tue, Dec 8, 2015 at 2:03 AM -0800, "kostali hassan"  wrote:
If I run Solr in SolrCloud mode, should my web hosting be cloud web
hosting, or do I not need a cloud-capable web server?

2015-12-08 1:58 GMT+00:00 Don Bosco Durai :

> Have you considered running your Solr as SolrCloud with embedded zookeeper?
>
> If you do, you have multiple options. Basic Auth, Kerberos and
> authorization support.
>
>
> Bosco
>
>
>
>
>
> On 12/7/15, 7:03 AM, "kostali hassan"  wrote:
>
> >How I shoud secure my server of solr 5 .3.1 in  single-node Mode. I Am
> >searching for the best way to secure my server solr but I found only for
> >cloud mode.
>
>







Re: Increasing Solr5 time out from 30 seconds while starting solr

2015-12-08 Thread Debraj Manna
Can someone help me on this?
On Dec 7, 2015 7:55 PM, "D"  wrote:

> Hi,
>
> Many time while starting solr I see the below message and then the solr is
> not reachable.
>
> debraj@boutique3:~/solr5$ sudo bin/solr start -p 8789
> Waiting to see Solr listening on port 8789 [-]  Still not seeing Solr 
> listening on 8789 after 30 seconds!
>
> However when I try to start solr again by trying to execute the same
> command. It says that *"solr is already running on port 8789. Try using a
> different port with -p"*
>
> I am having two cores in my local set-up. I am guessing this is happening
> because one of the core is a little big. So solr is timing out while
> loading the core. If I take one of the core out of solr then everything
> works fine.
>
> Can some one let me know how can I increase this timeout value from
> default 30 seconds?
>
> I am using Solr 5.2.1 on Debian 7.
>
> Thanks,
>
>


Re: secure solr 5.3.1

2015-12-08 Thread kostali hassan
   - Kerberos authentication: does it work in SolrCloud or standalone
     mode? The documentation is not clear:
   - https://cwiki.apache.org/confluence/display/solr/Kerberos+Authentication+Plugin?focusedCommentId=61331746#comment-61331746


2015-12-08 17:14 GMT+00:00 Don Bosco Durai :

> Not sure exactly what you mean here. Even if you are running in SolrCloud,
> you can access it using URL. So there won't be any change on the client
> side.
> Bosco
>
>
>
>
>
>
> On Tue, Dec 8, 2015 at 2:03 AM -0800, "kostali hassan" <
> med.has.kost...@gmail.com> wrote:
>
>
>
>
>
>
>
>
>
>
> if I run solr in SolrCloud mode , my web hosting shoud be Cloud web
> hosting? or dont need a web server having cloud..?
>
> 2015-12-08 1:58 GMT+00:00 Don Bosco Durai :
>
> > Have you considered running your Solr as SolrCloud with embedded
> zookeeper?
> >
> > If you do, you have multiple options. Basic Auth, Kerberos and
> > authorization support.
> >
> >
> > Bosco
> >
> >
> >
> >
> >
> > On 12/7/15, 7:03 AM, "kostali hassan"  wrote:
> >
> > >How I shoud secure my server of solr 5 .3.1 in  single-node Mode. I Am
> > >searching for the best way to secure my server solr but I found only for
> > >cloud mode.
> >
> >
>
>
>
>
>
>


Re: secure solr 5.3.1

2015-12-08 Thread Don Bosco Durai
It was tested and meant to work only in SolrCloud mode.

On Tue, Dec 8, 2015 at 9:30 AM -0800, "kostali hassan"  wrote:
   - Kerberos authentication: does it work in SolrCloud or standalone
     mode? The documentation is not clear:
   - https://cwiki.apache.org/confluence/display/solr/Kerberos+Authentication+Plugin?focusedCommentId=61331746#comment-61331746


2015-12-08 17:14 GMT+00:00 Don Bosco Durai :

> Not sure exactly what you mean here. Even if you are running in SolrCloud,
> you can access it using URL. So there won't be any change on the client
> side.
> Bosco
>
>
>
>
>
>
> On Tue, Dec 8, 2015 at 2:03 AM -0800, "kostali hassan" <
> med.has.kost...@gmail.com> wrote:
>
>
>
>
>
>
>
>
>
>
> if I run solr in SolrCloud mode , my web hosting shoud be Cloud web
> hosting? or dont need a web server having cloud..?
>
> 2015-12-08 1:58 GMT+00:00 Don Bosco Durai :
>
> > Have you considered running your Solr as SolrCloud with embedded
> zookeeper?
> >
> > If you do, you have multiple options. Basic Auth, Kerberos and
> > authorization support.
> >
> >
> > Bosco
> >
> >
> >
> >
> >
> > On 12/7/15, 7:03 AM, "kostali hassan"  wrote:
> >
> > >How I shoud secure my server of solr 5 .3.1 in  single-node Mode. I Am
> > >searching for the best way to secure my server solr but I found only for
> > >cloud mode.
> >
> >
>
>
>
>
>
>







Re: capacity of storage a single core

2015-12-08 Thread Upayavira
I understood that on later Solrs, those join issues have been
(partially) resolved. So long as your joined-to collection is replicated
across every box, you should be good. 

Upayavira
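
For illustration, the cross-collection join this enables looks like the
following sketch (collection and field names are made up; the joined-to
collection must be single-shard and replicated to every node, as noted
above):

q={!join from=vendor_id to=vendor_id fromIndex=vendors}active:true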

On Tue, Dec 8, 2015, at 04:17 PM, Mugeesh Husain wrote:
> Thanks Toke Eskildsen,
> 
> Actually i need to join on my core, that why i am going to solrlcoud(join
> does not support in solrlcoud)
> 
> Is there any alternate way to doing it ?
> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/capacity-of-storage-a-single-core-tp4244197p4244248.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 5.2.1 deadlock on commit

2015-12-08 Thread Ali Nazemian
I did that already. The situation was worse: the autoCommit part makes Solr
unavailable.
On Dec 8, 2015 7:13 PM, "Emir Arnautovic" 
wrote:

> Hi Ali,
> Can you try without explicit commits and see if threads will still be
> blocked.
>
> Thanks,
> Emir
>
> On 08.12.2015 16:19, Ali Nazemian wrote:
>
>> The indexing load is as follows:
>> - Around 1000 documents every 5 mins.
>> - The indexing speed is slow because of the complicated analyzer which is
>> applied to each document. It takes around 60 seconds to index 1000
>> documents with applying this analyzer (It is really slow. However, based
>> on
>> the analyzing part I think it would be acceptable).
>> - The concurrentsolrclient is used in all the indexing/updating cases.
>>
>> Regards.
>>
>> On Tue, Dec 8, 2015 at 6:36 PM, Ali Nazemian 
>> wrote:
>>
>> Dear Emir,
>>> Hi,
>>> There are some cases that I have soft commit in my application. However,
>>> the bulk update part has only hard commit for a bulk of 2500 documents.
>>> Here are some information about the whole indexing/updating scenarios:
>>> - Indexing part uses soft commit.
>>> - In a single update cases soft commit is used.
>>> - For bulk update batch hard commit is used (on 2500 documents)
>>> - Auto hard commit :120 sec
>>> - Auto soft commit: disable
>>>
>>> Best regards.
>>>
>>>
>>> On Tue, Dec 8, 2015 at 12:35 PM, Emir Arnautovic <
>>> emir.arnauto...@sematext.com> wrote:
>>>
>>> Hi Ali,
 This thread is blocked because cannot obtain update lock - in this
 particular case when doing soft commit. I am guessing that there others
 are
 blocked for the same reason. Can you tell us bit more about your setup
 and
 indexing load and procedure? Do you do explicit commits?

 Regards,
 Emir

 --
 Monitoring * Alerting * Anomaly Detection * Centralized Log Management
 Solr & Elasticsearch Support * http://sematext.com/



 On 08.12.2015 08:16, Ali Nazemian wrote:

 Hi,
> There is a while since I have had problem with Solr 5.2.1 and I could
> not
> fix it yet. The only think that is clear to me is when I send bulk
> update
> to Solr the commit thread will be blocked! Here is the thread dump
> output:
>
> "qtp595445781-8207" prio=10 tid=0x7f0bf68f5800 nid=0x5785 waiting
> for
> monitor entry [0x7f081cf04000]
>  java.lang.Thread.State: BLOCKED (on object monitor)
> at
>
>
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:608)
> - waiting to lock <0x00067ba2e660> (a java.lang.Object)
> at
>
>
> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
> at
>
>
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
> at
>
>
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1635)
> at
>
>
> org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1612)
> at
>
>
> org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:161)
> at
>
>
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
> at
>
>
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
> at
>
>
> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:270)
> at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:177)
> at
>
>
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98)
> at
>
>
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> at
>
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
> at
>
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
> at
>
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
> at
>
>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
> at
>
>
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
> at
>
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at
>
>
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
> at
>
>

solrconfig.xml - configuration scope

2015-12-08 Thread Fitzpatrick, Adrian
Hi,

This is probably a very basic question that has been asked many times before - 
apologies in advance if so!

I'm looking to validate whether something I **think** I have observed when 
using Solr is a known behaviour:

>From my read of the docs etc. it was my understanding that solrconfig.xml was 
>the configuration for a core, and that if I had multiple cores in my Solr 
>server, each would have their own version of that file with their own 
>settings. However, in practice, when working with such a multiple core setup, 
>what I have observed suggests that some (perhaps many?) of the settings within 
>solrconfig.xml can have a system-wide impact. I.e. I change a setting in core 
>A and I see behaviour in other cores B,C which suggests they are obeying the 
>changed value from the core A rather than the setting value from their own 
>copy of solrconfig.xml

So, as I said, the main question is: is this known/expected behaviour, or am I 
imagining things? If the former, is there any documentation etc. that provides 
any clarification on how the configuration scope operates?

Thanks,

Adrian



fuzzy searches and EDISMAX

2015-12-08 Thread Felley, James
I am trying to build an edismax search handler that will allow a fuzzy search, 
using the "query fields" property (qf).

I have two instances of SOLR 4.8.1, one of which has edismax "qf" configured 
with no fuzzy search
...
ns_name^3.0  i_topic^3.0  i_object_type^3.0

...
And the other with a fuzzy search for ns_name (non-stemmed name)
ns_name~1^3.0  i_topic^3.0  i_object_type^3.0

...

The index of both includes a record with an ns_name of 'Johnson'

I get no return in either instance with the query
q=Johnso

I get the Johnson record returned in both instances with a query of
q=Johnso~1

The SOLR documentation seems silent on incorporating fuzzy searches in the 
query fields.  I have seen various posts on Google that suggest that 'qf' will 
accept fuzzy search declarations, other posts suggest only the query itself 
will allow fuzzy searches (as seems to be the case for me).

Any guidance will be much appreciated

Jim

Jim Felley
OCIO
Smithsonian Institution
fell...@si.edu
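
For illustration, the shape of request that did return the record, with the
fuzzy operator kept in q rather than in qf (host and core name are
placeholders):

curl 'http://localhost:8983/solr/mycore/select?defType=edismax&qf=ns_name^3.0+i_topic^3.0+i_object_type^3.0&q=Johnso~1'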






Re: fuzzy searches and EDISMAX

2015-12-08 Thread Walter Underwood
You probably want to apply the patch for SOLR-629. We have this in production 
at Chegg. I’ve been trying to get this feature added to Solr for seven years. 
Not sure why it never gets approved.

https://issues.apache.org/jira/browse/SOLR-629 


wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Dec 8, 2015, at 9:56 AM, Felley, James  wrote:
> 
> I am trying to build an edismax search handler that will allow a fuzzy 
> search, using the "query fields" property (qf).
> 
> I have two instances of SOLR 4.8.1, one of which has edismax "qf" configured 
> with no fuzzy search
> ...
> ns_name^3.0  i_topic^3.0  i_object_type^3.0
> 
> ...
> And the other with a fuzzy search for ns_name (non-stemmed name)
> ns_name~1^3.0  i_topic^3.0  i_object_type^3.0
> 
> ...
> 
> The index of both includes a record with an ns_name of 'Johnson'
> 
> I get no return in either instance with the query
> q=Johnso
> 
> I get the Johnson record returned in both instances with a query of
> q=Johnso~1
> 
> The SOLR documentation seems silent on incorporating fuzzy searches in the 
> query fields.  I have seen various posts on Google that suggest that 'qf' 
> will accept fuzzy search declarations, other posts suggest only the query 
> itself will allow fuzzy searches (as seems to be the case for me).
> 
> Any guidance will be much appreciated
> 
> Jim
> 
> Jim Felley
> OCIO
> Smithsonian Institution
> fell...@si.edu
> 
> 
> 
> 



Re: solrconfig.xml - configuration scope

2015-12-08 Thread Erick Erickson
What specifically are you seeing? Most things are per-core as you surmised.

There are a few things which, through interaction with Lucene global
variables affect multiple cores, the one that comes to mind is
maxBooleanClauses, where the value in the last core loaded "wins".

There might be some others though, that's the one that I remember.

What version of Solr are you running?

Best,
Erick
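
For reference, that setting lives per core in solrconfig.xml but feeds a
static Lucene value, so the last core loaded effectively sets it JVM-wide
(1024 is the stock default):

<maxBooleanClauses>1024</maxBooleanClauses>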

On Tue, Dec 8, 2015 at 9:52 AM, Fitzpatrick, Adrian  wrote:
> Hi,
>
> This is probably a very basic question that has been asked many times before 
> - apologies in advance if so!
>
> I'm looking to validate whether something I **think** I have observed when 
> using Solr is a known behaviour:
>
> From my read of the docs etc. it was my understanding that solrconfig.xml was 
> the configuration for a core, and that if I had multiple cores in my Solr 
> server, each would have their own version of that file with their own 
> settings. However, in practice, when working with such a multiple core setup, 
> what I have observed suggests that some (perhaps many?) of the settings 
> within solrconfig.xml can have a system-wide impact. I.e. I change a setting 
> in core A and I see behaviour in other cores B,C which suggests they are 
> obeying the changed value from the core A rather than the setting value from 
> their own copy of solrconfig.xml
>
> So, as I said, main question is this known/expected behaviour, or am I 
> imagining things! If the former, is there any documentation etc. that 
> provides any clarification around how the configuration scope operates?
>
> Thanks,
>
> Adrian
>
> Please note that Revenue cannot guarantee that any personal and sensitive 
> data, sent in plain text via standard email, is fully secure. Customers who 
> choose to use this channel are deemed to have accepted any risk involved. The 
> alternative communication methods offered by Revenue include standard post 
> and the option to use our (encrypted) MyEnquiries service which is available 
> within myAccount and ROS. You can register for either myAccount or ROS on the 
> Revenue website.
>
> Tabhair faoi deara nach fidir leis na Coimisinir 
> Ioncaim rthaocht a thabhairt go bhfuil aon sonra 
> pearsanta agus ogair a gcuirtear isteach i 
> ngnth-thacs tr r-phost caighdenach go huile 
> is go hiomln sln. Meastar go nglacann 
> custaimir a sideann an cainal seo le 
> haon riosca bainteach. I measc na modhanna cumarside eile at 
> ag na Coimisinir n post caighdenach agus an 
> rogha r seirbhs (criptithe) M'Fhiosruithe a 
> sid, t s ar fil laistigh de 
> MoChrsa agus ROS. Is fidir leat clr 
> le haghaidh ceachtar MoChrsa n ROS ar shuomh 
> grasin na gCoimisinir.


Solr memory usage

2015-12-08 Thread Steven White
Hi folks,

My index size on disk (optimized) is 20 GB (single core, single index).  I
have a system with 64 GB of RAM.  I start Solr with 24 GB of RAM.

I have run load tests (up to 100 concurrent users) for hours, where each
user issues unique searches (the same search is never executed again for
at least 30 minutes after it was last executed).  In all tests I run, Solr's
JVM memory never goes over 10 GB (monitoring http://localhost:8983/).

I read over and over that, for optimal performance, Solr should be given
enough RAM to hold the index in memory.  Well, I have done that and then
some, but I don't see Solr using up all that RAM.  What am I doing wrong?
Is my test at fault?  I doubled the test load (number of users) and didn't
see much of a difference in RAM usage, but my search performance went down
(it takes about 40% longer now).  I ran my tests again, this time with only
12 GB of RAM given to Solr.  The results didn't differ much from the 24 GB
run, and Solr never used more than 10 GB of RAM.

Can someone help me understand this?  I don't want to give Solr RAM that it
won't use.

PS: This is simply search tests, there is no update to the index at all.

Thanks in advance.

Steve


Re: Solr memory usage

2015-12-08 Thread Erick Erickson
You're doing nothing wrong, that particular bit of advice has
always needed a bit of explanation.

Solr (well, actually Lucene) uses MMapDirectory for much of
the index structure which uses the OS memory rather than
the JVM heap. See Uwe's excellent:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Plus, the size on disk includes the stored data, which is in the *.fdt
files in data/index. Very little of the stored data is kept in the JVM
so that's another reason your Java heap may be smaller than
your raw index size on disk.

The advice about fitting your entire index into memory really has
the following caveats (at least).
1> "memory" includes the OS memory available to the process
2> The size of the index on disk is misleading, the *.fdt files
 should be subtracted in order to get a truer picture.
3> Both Solr and Lucene create structures in the Java JVM
 that are _not_ reflected in the size on disk.

<1> and <2> mean the JVM memory necessary is smaller
than the size on disk.

<3> means the JVM memory will be larger than.

So you're doing the right thing, testing and seeing what you
_really_ need. I'd pretty much take your test, add some
padding and consider it good. You're _not_ doing the
really bad thing of using the same query over and over
again.

Best,
Erick
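
One way to see this split for yourself on Linux (the PID is a placeholder;
jstat ships with the JDK):

jstat -gc 12345            # JVM heap only: eden/survivor/old gen usage
pmap -x 12345 | tail -n 1  # total mapped memory, including mmapped index files
free -m                    # OS view: the "cached" column holds the index pages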


On Tue, Dec 8, 2015 at 11:54 AM, Steven White  wrote:
> Hi folks,
>
> My index size on disk (optimized) is 20 GB (single core, single index).  I
> have a system with 64 GB of RAM.  I start Solr with 24 GB of RAM.
>
> I have run load tests (up to 100 concurrent users) for hours where each
> user issuing unique searches (the same search is never executed again for
> at least 30 minute since it was last executed).  In all tests I run, Solr's
> JVM memory never goes over 10 GB (monitoring http://localhost:8983/).
>
> I read over and over, for optimal performance, Solr should be given enough
> RAM to hold the index in memory.  Well, I have done that and some but yet I
> don't see Solr using up that whole RAM.  What am I doing wrong?  Is my test
> at fault?  I doubled the test load (number of users) and didn't see much of
> a difference with RAM usage but yet my search performance went down (takes
> about 40% longer now).  I run my tests again but this time with only 12 GB
> of RAM given to Solr.  Test result didn't differ much from the 24 GB run
> and Solr never used more than 10 GB of RAM.
>
> Can someone help me understand this?  I don't want to give Solr RAM that it
> won't use.
>
> PS: This is simply search tests, there is no update to the index at all.
>
> Thanks in advanced.
>
> Steve


Long Running Data Import Handler - Notifications

2015-12-08 Thread Brian Narsi
Is there a way to receive notifications when a Data Import Handler finishes,
and whether it succeeded or failed? (It typically runs about an hour.)

Thanks


Re: Long Running Data Import Handler - Notifications

2015-12-08 Thread Walter Underwood
Not that I know of. I wrote a script to check the status and sleep until done. 
Like this:

SOLRURL='http://solr-master.prod2.cloud.cheggnet.com:6090/solr/textbooks/dataimport'

while : ; do
echo `date` checking whether Solr indexing is finished
curl -s "${SOLRURL}" | fgrep '"status":"idle"' > /dev/null
[ $? -ne 0 ] || break
sleep 300
done

echo Solr indexing is finished

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Dec 8, 2015, at 5:37 PM, Brian Narsi  wrote:
> 
> Is there a way to receive notifications when a Data Import Handler finishes
> up and whether it succeeded or failed. (typically runs about an hour)
> 
> Thanks



Re: capacity of storage a single core

2015-12-08 Thread Mugeesh Husain
@Upayavira,

Could you provide a link showing that the issue has been resolved?

>>So long as your joined-to collection is replicated across every box
Where can I find a related link or example for this?





Re: Increasing Solr5 time out from 30 seconds while starting solr

2015-12-08 Thread Rahul Ramesh
Hi Debraj,
I don't think increasing the timeout will help. Are you sure Solr or some
other program is not already running on 8789? Please check the output of
lsof -i :8789.

Regards,
Rahul
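
For example (a sketch using the port from the message above; bin/solr
status ships with Solr 5.x):

lsof -i :8789            # what, if anything, is holding the port
bin/solr status          # does Solr itself think an instance is running?
ps aux | grep start.jar  # look for a half-started Solr (Jetty) process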

On Tue, Dec 8, 2015 at 11:58 PM, Debraj Manna 
wrote:

> Can someone help me on this?
> On Dec 7, 2015 7:55 PM, "D"  wrote:
>
> > Hi,
> >
> > Many time while starting solr I see the below message and then the solr
> is
> > not reachable.
> >
> > debraj@boutique3:~/solr5$ sudo bin/solr start -p 8789
> > Waiting to see Solr listening on port 8789 [-]  Still not seeing Solr
> listening on 8789 after 30 seconds!
> >
> > However when I try to start solr again by trying to execute the same
> > command. It says that *"solr is already running on port 8789. Try using a
> > different port with -p"*
> >
> > I am having two cores in my local set-up. I am guessing this is happening
> > because one of the core is a little big. So solr is timing out while
> > loading the core. If I take one of the core out of solr then everything
> > works fine.
> >
> > Can some one let me know how can I increase this timeout value from
> > default 30 seconds?
> >
> > I am using Solr 5.2.1 on Debian 7.
> >
> > Thanks,
> >
> >
>


Re: Long Running Data Import Handler - Notifications

2015-12-08 Thread Stefan Matheis
https://wiki.apache.org/solr/DataImportHandler#EventListeners might be
worth a look

-Stefan
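
Concretely, the hook described there is declared on the <document> element
of the DIH data-config and points at a class implementing DIH's
EventListener interface. A minimal sketch (the class name is an assumption;
what you do in onEvent, e.g. sending mail, is up to you):

<!-- data-config.xml -->
<document onImportEnd="com.example.ImportEndListener">
  <!-- entities as before -->
</document>

// ImportEndListener.java
package com.example;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.EventListener;

public class ImportEndListener implements EventListener {
    @Override
    public void onEvent(Context ctx) {
        // Fired when the import ends; hook your notification in here.
        System.out.println("DIH import finished");
    }
}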

On Wed, Dec 9, 2015 at 2:51 AM, Walter Underwood  wrote:
> Not that I know of. I wrote a script to check the status and sleep until 
> done. Like this:
>
> SOLRURL='http://solr-master.prod2.cloud.cheggnet.com:6090/solr/textbooks/dataimport'
>
> while : ; do
> echo `date` checking whether Solr indexing is finished
> curl -s "${SOLRURL}" | fgrep '"status":"idle"' > /dev/null
> [ $? -ne 0 ] || break
> sleep 300
> done
>
> echo Solr indexing is finished
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Dec 8, 2015, at 5:37 PM, Brian Narsi  wrote:
>>
>> Is there a way to receive notifications when a Data Import Handler finishes
>> up and whether it succeeded or failed. (typically runs about an hour)
>>
>> Thanks
>


Re: Increasing Solr5 time out from 30 seconds while starting solr

2015-12-08 Thread Debraj Manna
After a failed attempt to start Solr, if I try to start Solr again on the
same port, it says Solr is already running and to try a different port.

Can you let me know if it is possible to increase the timeout, so that I
can observe how it behaves?
On Dec 9, 2015 10:10 AM, "Rahul Ramesh"  wrote:

> Hi Debraj,
> I dont think increasing the timeout will help. Are you sure solr/ any other
> program is not running on 8789? Please check the output of lsof -i :8789 .
>
> Regards,
> Rahul
>
> On Tue, Dec 8, 2015 at 11:58 PM, Debraj Manna 
> wrote:
>
> > Can someone help me on this?
> > On Dec 7, 2015 7:55 PM, "D"  wrote:
> >
> > > Hi,
> > >
> > > Many time while starting solr I see the below message and then the solr
> > is
> > > not reachable.
> > >
> > > debraj@boutique3:~/solr5$ sudo bin/solr start -p 8789
> > > Waiting to see Solr listening on port 8789 [-]  Still not seeing Solr
> > listening on 8789 after 30 seconds!
> > >
> > > However when I try to start solr again by trying to execute the same
> > > command. It says that *"solr is already running on port 8789. Try
> using a
> > > different port with -p"*
> > >
> > > I am having two cores in my local set-up. I am guessing this is
> happening
> > > because one of the core is a little big. So solr is timing out while
> > > loading the core. If I take one of the core out of solr then everything
> > > works fine.
> > >
> > > Can some one let me know how can I increase this timeout value from
> > > default 30 seconds?
> > >
> > > I am using Solr 5.2.1 on Debian 7.
> > >
> > > Thanks,
> > >
> > >
> >
>


Re: secure solr 5.3.1

2015-12-08 Thread Ishan Chattopadhyaya
Right, as Bosco said, this has been tested well and is supported on SolrCloud.
It should be possible to run it in standalone mode, but that is not something
that has been well tested yet.
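
For anyone landing here: on 5.3.x in SolrCloud mode, Basic Auth is enabled
by uploading a security.json to ZooKeeper. A minimal sketch (the
credentials value is the well-known reference-guide example hash for user
solr / password SolrRocks):

{
  "authentication": {
    "class": "solr.BasicAuthPlugin",
    "credentials": {
      "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
    }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "user-role": { "solr": "admin" },
    "permissions": [ { "name": "security-edit", "role": "admin" } ]
  }
}

Upload it with something like (the zkhost is a placeholder):
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd putfile /security.json security.json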

On Tue, Dec 8, 2015 at 11:02 PM, Don Bosco Durai  wrote:

> It was tested and meant to work only in SolrCloud mode.
>
>
>
>
>
>
> On Tue, Dec 8, 2015 at 9:30 AM -0800, "kostali hassan" <
> med.has.kost...@gmail.com> wrote:
>
>
>
>
>
>
>
>
>
>
>- Kerberos authentication
>:
>work in SolrCloud or standalone mode but the documentation is not clear
>-
>
> https://cwiki.apache.org/confluence/display/solr/Kerberos+Authentication+Plugin?focusedCommentId=61331746#comment-61331746
>
>
> 2015-12-08 17:14 GMT+00:00 Don Bosco Durai :
>
> > Not sure exactly what you mean here. Even if you are running in
> SolrCloud,
> > you can access it using URL. So there won't be any change on the client
> > side.
> > Bosco
> >
> >
> >
> >
> >
> >
> > On Tue, Dec 8, 2015 at 2:03 AM -0800, "kostali hassan" <
> > med.has.kost...@gmail.com> wrote:
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > if I run solr in SolrCloud mode , my web hosting shoud be Cloud web
> > hosting? or dont need a web server having cloud..?
> >
> > 2015-12-08 1:58 GMT+00:00 Don Bosco Durai :
> >
> > > Have you considered running your Solr as SolrCloud with embedded
> > zookeeper?
> > >
> > > If you do, you have multiple options. Basic Auth, Kerberos and
> > > authorization support.
> > >
> > >
> > > Bosco
> > >
> > >
> > >
> > >
> > >
> > > On 12/7/15, 7:03 AM, "kostali hassan"  wrote:
> > >
> > > >How I shoud secure my server of solr 5 .3.1 in  single-node Mode. I Am
> > > >searching for the best way to secure my server solr but I found only
> for
> > > >cloud mode.
> > >
> > >
> >
> >
> >
> >
> >
> >
>
>
>
>
>
>