solr server heap out

2016-07-12 Thread Midas A
Hi,
I am frequently getting a Solr heap out-of-memory error, once or twice a day. What could be the
possible reasons for this, and is there any way to log the memory used by
a query in solr.log?

Thanks ,
Abhishek Tiwari


Re: Searching Home's, Homes and Home

2016-07-12 Thread Vijaymhaskar
Hi Surender,

Please go through the stemmer documentation, which will give you an idea of how
stemmers work.

I see the following issues in the configured field types:
1. You have added the Porter stemmer as well as the English minimal stemmer. You can
remove one of them based on your requirement. The minimal stemmer is
conservative and mainly removes plural endings.

2. KeywordMarkerFilterFactory protects words from being modified by
stemmers: any word in the protected word list will not be modified by any
stemmer in Solr. So it should be added before the stemmer.


You can try,
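
A rough, untested sketch of such a chain (the field type name and protected-words
file are placeholders, and the possessive filter is an extra guess for the "Home's" case):

<fieldType name="text_en_min" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- strips trailing 's, so Home's reduces to Home -->
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <!-- must come before the stemmer so protected words are left untouched -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <!-- keep exactly one stemmer; the minimal stemmer mostly removes plural endings -->
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>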

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Searching-Home-s-Homes-and-Home-tp4286341p4286902.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Searching Home's, Homes and Home

2016-07-12 Thread Surender
Hi,

I do not want to use Synonyms.txt, as this would require building a big library and
that would be time consuming.

Thanks,
Surender Singh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Searching-Home-s-Homes-and-Home-tp4286341p4286897.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Searching Home's, Homes and Home

2016-07-12 Thread Surender
Hi,

The following is the analyzer information; please let me know what I am missing.

(field and fieldType XML not preserved in the archive)

Thanks,
Surender Singh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Searching-Home-s-Homes-and-Home-tp4286341p4286896.html
Sent from the Solr - User mailing list archive at Nabble.com.


Zookeeper overseer queue clogging

2016-07-12 Thread Manohar Sripada
There are 16 Solr nodes (Solr 5.2.1) & 5 ZooKeeper nodes (ZooKeeper 3.4.6)
in our production cluster. We had to restart the Solr nodes for some reason, for the
first time in 3 months. To our surprise, none of the Solr nodes
came up. We can see the Solr processes running on the machines, but the Solr
Admin console is not reachable. We even tried restarting the ZooKeeper cluster
and then the Solr cluster. Still, the issue remained.

On debugging I have found the following:
1. This exception in solr.log:


> ERROR - 2016-07-12 07:43:48.988; org.apache.solr.servlet.SolrDispatchFilter; Could not start Solr. Check solr/home property and the logs
> ERROR - 2016-07-12 07:43:49.012; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: Could not find collection : cont_coll_2_frat
> org.apache.solr.common.cloud.ClusterState.getCollection(ClusterState.java:164)


2. Connected to the ZooKeeper quorum using ZooKeeper's zkCli.sh and found that
a few collections (which were deleted using the Solr Collections
Delete API) still exist in ZooKeeper (ls /collections). The same
collections don't exist on the Solr nodes' disks.

3. There are entries related to these deleted collections in Zookeeper's
clusterstate.json file as well.

4. There are many entries in overseer queue (/overseer/queue) & queue-work
(/overseer/queue-work).

I have tried the following, based on some existing suggestions on the net (see the zkCli.sh sketch below):
1. Stopped all the Solr nodes and removed the unwanted collections (the ones deleted
via the Solr Collections Delete API) from ZooKeeper (/collections) using the
*rmr* command.

2. Removed all the entries from the overseer queue (/overseer/queue) &
queue-work (/overseer/queue-work) as well.

3. Restarted Zookeeper and then Solr.
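
For reference, the zkCli.sh session for steps 1 and 2 looked roughly like this (the
collection name is the one from the error above; there were a few more like it):

./zkCli.sh -server <zookeeper-host>:2181
ls /collections
rmr /collections/cont_coll_2_frat
rmr /overseer/queue
rmr /overseer/queue-work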

Even after doing this, the issue still remains. Can someone help me with how
to resolve this?

- Thanks


Re: solrcloud connection issue

2016-07-12 Thread Kent Mu
Dear Mr. Heisey.

It seems that we cannot send pictures or attachments to solr-user, so I have
sent the screenshot to your personal email. Sorry to disturb you!

Thanks!
Kent

2016-07-13 8:13 GMT+08:00 Shawn Heisey :

> On 7/12/2016 8:30 AM, Kent Mu wrote:
> > We have configed the maxThreads in JBOSS, and the good news is solrcloud
> > now running OK. but I another issue came across. We find the number of
> the
> > HTTP connections is very high, and the number can be around 3300. and
> > solrcloud does no release the connections.
> > I understand that, the solrcloud needs to connect to zookeeper and
> > communication between leader and replica need the connection. but I think
> > the number should not to be so huge.
> > besides, we use the singleton pattern to connect solrcloud in JAVA.
>
> Are you referring to the number of http connections in your SolrJ app,
> or the number of http connections in Solr itself?  Hopefully these are
> being run by completely separate JVMs.  Where exactly are you looking
> when you see 3300 connections?
>
> The connection to Zookeeper does not use HTTP.  It is a TCP connection
> but the protocol is custom.  Both Solr and SolrJ will maintain a
> connection to each of the zookeeper hosts that are in the zkHost string
> used when they start.
>
> Thanks,
> Shawn
>
>


Re: solrcloud connection issue

2016-07-12 Thread Kent Mu
we have 5 shards, and each shard has one leader and one replica.
The "3300" connections are for one JVM only; please see the analysis in Zabbix
(the screenshots did not come through on the list).

And our SolrJ code is as follows:

public synchronized static CloudSolrServer getSolrCloudReadServer() {
    if (reviewSolrCloudReadServer == null) {
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 1000);
        params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 100);
        HttpClient client = HttpClientUtil.createClient(params);

        LBHttpSolrServer lbServer = new LBHttpSolrServer(client);
        lbServer.setConnectionTimeout(ReviewProperties.getCloudConnectionTimeOut());
        lbServer.setSoTimeout(ReviewProperties.getCloudSoTimeOut());

        reviewSolrCloudReadServer = new CloudSolrServer(ReviewProperties.getZkHost(), lbServer);
        reviewSolrCloudReadServer.setDefaultCollection(ReviewProperties.getZkReviewConnection());
        reviewSolrCloudReadServer.setZkClientTimeout(ReviewProperties.getZkClientTimeout());
        reviewSolrCloudReadServer.setZkConnectTimeout(ReviewProperties.getZkConnectTimeout());
    }
    return reviewSolrCloudReadServer;
}

2016-07-13 8:13 GMT+08:00 Shawn Heisey :

> On 7/12/2016 8:30 AM, Kent Mu wrote:
> > We have configed the maxThreads in JBOSS, and the good news is solrcloud
> > now running OK. but I another issue came across. We find the number of
> the
> > HTTP connections is very high, and the number can be around 3300. and
> > solrcloud does no release the connections.
> > I understand that, the solrcloud needs to connect to zookeeper and
> > communication between leader and replica need the connection. but I think
> > the number should not to be so huge.
> > besides, we use the singleton pattern to connect solrcloud in JAVA.
>
> Are you referring to the number of http connections in your SolrJ app,
> or the number of http connections in Solr itself?  Hopefully these are
> being run by completely separate JVMs.  Where exactly are you looking
> when you see 3300 connections?
>
> The connection to Zookeeper does not use HTTP.  It is a TCP connection
> but the protocol is custom.  Both Solr and SolrJ will maintain a
> connection to each of the zookeeper hosts that are in the zkHost string
> used when they start.
>
> Thanks,
> Shawn
>
>


Re: solrcloud consumes more time than solr when write index

2016-07-12 Thread Kent Mu
Dear Mr. Wartes,
Thanks for your reply. Well, I see. For solr we do have replicas, and for
solrcloud we have 5 shards, each shard with one leader and one replica.
The number of documents is nearly 100 million. Do you mean we do not need
to optimize the index data?

Thanks!
Kent

2016-07-12 23:02 GMT+08:00 Jeff Wartes :

> Well, two thoughts:
>
>
> 1. If you’re not using solrcloud, presumably you don’t have any replicas.
> If you are, presumably you do. This makes for a biased comparison, because
> SolrCloud won’t acknowledge a write until it’s been safely written to all
> replicas. In short, solrcloud write time is max(per-replica write time).
> The more replicas you add, the bigger the chance some replica randomly
> takes longer (gc pause, perhaps?), and the longer your overall write time,
> assuming a fixed number of indexing threads.
> 2. The parallelism of the optimize operation across replicas has gone back
> and forth a bit, and I’m not sure what it was doing in 4.9. However, at one
> point the optimize happened per-replica, serially. So it’d do
> shard1_replica1, then when that was done, do shard1_replica2, then
> shard2_replica1, etc. Other versions of Solr would do those at the same
> time. Again, I don’t know if you’re comparing to a non-replicated solr
> index, but that could explain some of the difference.
>
> There’s a sort of an obligatory comment at this point that optimize
> doesn’t necessarily save you a lot. There are certainly cases where it
> does, but if you haven’t already, you’ll want to validate that you have one
> of them and that you’re not just doing unnecessary work.
>
>
> On 7/12/16, 7:41 AM, "Kent Mu"  wrote:
>
> >hello, does anybody also come across the issue? can anybody help me?
> >
> >2016-07-11 23:17 GMT+08:00 Kent Mu :
> >
> >> Hi friends!
> >>
> >> solr version: 4.9.0.
> >>
> >> we use solr and solrcloud in our project, that means we use sorl and
> >> solrcloud at the same time.
> >> but we find a phenomenon that sorlcoud consumes more time than solr when
> >> write index. it takes nearly 5 or more times longer. I wonder that is
> why?
> >>
> >> in our project, we have a scheduler job to add index, and then execute
> the
> >> the method of "optimize(false, true, 2)" to optimize the added index.
> >> I wonder if it is caused by solrcloud internal that when writing index,
> >> solrcloud needs to just which shard it should be stored? and when
> >> optimizing the replicate needs to take some time to synchronize the data
> >> from leader?
> >>
> >> and I wonder what about query?  will solrcloud also take more time than
> >> solr when query data?
> >>
>
>


Re: Upgrading solr 4.1.4 to solr 6.1.0

2016-07-12 Thread Rachid Bouacheria
Thank you very much for your prompt response.
I really appreciate it!
Rachid
On Jul 12, 2016 17:13, "Shawn Heisey"  wrote:

> On 7/12/2016 5:54 PM, Rachid Bouacheria wrote:
> > I am running solr 4.10.4 and I would like to upgrade to the latest
> version
> > 6.1.0
> >
> > The documentation I found provides steps to upgrade from 4.10.4 to 5.x
> > And it seems like going from 4.x to 5.x is pretty consequent.
> > Going from 5.x to 6.1.0 seems to be less effort but still non negligible.
> >
> > I am wondering if anyone had to do a similar upgrade? If so how did you
> do
> > it? Upgrade to 5.x and then to 6, or straight from 4.x to 6?
> > Any tips or advice are welcome.
>
> The 6.1.0 version cannot read your 4.x indexes.  It can read 5.x and
> later indexes.
>
> If you can "upgrade" by setting up a new Solr install and reindexing
> everything, that will always achieve the best results.  This is how I do
> upgrades.  There's no need to worry about the old index format at all.
>
> If that's not possible, then you will need to convert your index to 5.x
> format before upgrading to 6.x.  You can do this by upgrading to a 5.x
> version first and optimizing all your indexes, or you can use the
> IndexUpgrader tool from Lucene, first from 5.x, and then from 6.x, to
> upgrade your index in stages.
>
> https://cwiki.apache.org/confluence/display/solr/IndexUpgrader+Tool
>
> Thanks,
> Shawn
>
>


Re: High cpu and gc time when performing optimization.

2016-07-12 Thread Shawn Heisey
On 7/12/2016 9:45 AM, Jason wrote:
> I'm using optimize because it's a option for fast search. Our index
> updates one or more weekly. If I don't use optimize, many index files
> should be kept. Any performance issues in that case? And I'm wondering
> relation between index file size and heap size. In case of running as
> master server that only update index, is there any guide for heap size
> include Xmx, NewSize, MaxNewSize, etc.?

In older (2.x and 3.x) versions of Lucene, optimizing an index would
make a huge difference in performance.  In modern versions, the
performance increase from an optimize is much less dramatic.  Lucene
(and by extension, Solr) has gotten very good at dealing with an index
comprised of many segments.  The recommendation for the last few years
has been to AVOID doing an optimize unless it can be done during times
of very low query traffic, when the I/O load will not cause issues.

About the only good reason left for frequent optimizes is when the index
has many updates to existing documents, resulting in a very large
percentage of deleted documents in the index.  In that case, the
optimize will shrink the overall index size, which will make it faster
and make relevancy more accurate.
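
A lighter option that is sometimes used for exactly that deleted-documents case (an
aside, and the core name is a placeholder) is an expungeDeletes commit, which merges
away segments carrying many deletions without rewriting the whole index:

curl 'http://localhost:8983/solr/corename/update?commit=true&expungeDeletes=true'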

There is no general information available for setting the heap size. 
There is also no general information available on "acceptable" index
size.  The following wiki page touches a little bit on the heap size topic:

https://wiki.apache.org/solr/SolrPerformanceProblems

The reason that there is no generic information available is covered here:

https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Thanks,
Shawn



Re: solrcloud connection issue

2016-07-12 Thread Shawn Heisey
On 7/12/2016 8:30 AM, Kent Mu wrote:
> We have configed the maxThreads in JBOSS, and the good news is solrcloud
> now running OK. but I another issue came across. We find the number of the
> HTTP connections is very high, and the number can be around 3300. and
> solrcloud does no release the connections.
> I understand that, the solrcloud needs to connect to zookeeper and
> communication between leader and replica need the connection. but I think
> the number should not to be so huge.
> besides, we use the singleton pattern to connect solrcloud in JAVA.

Are you referring to the number of http connections in your SolrJ app,
or the number of http connections in Solr itself?  Hopefully these are
being run by completely separate JVMs.  Where exactly are you looking
when you see 3300 connections?

The connection to Zookeeper does not use HTTP.  It is a TCP connection
but the protocol is custom.  Both Solr and SolrJ will maintain a
connection to each of the zookeeper hosts that are in the zkHost string
used when they start.

Thanks,
Shawn



Re: Upgrading solr 4.1.4 to solr 6.1.0

2016-07-12 Thread Shawn Heisey
On 7/12/2016 5:54 PM, Rachid Bouacheria wrote:
> I am running solr 4.10.4 and I would like to upgrade to the latest version
> 6.1.0
>
> The documentation I found provides steps to upgrade from 4.10.4 to 5.x
> And it seems like going from 4.x to 5.x is pretty consequent.
> Going from 5.x to 6.1.0 seems to be less effort but still non negligible.
>
> I am wondering if anyone had to do a similar upgrade? If so how did you do
> it? Upgrade to 5.x and then to 6, or straight from 4.x to 6?
> Any tips or advice are welcome.

The 6.1.0 version cannot read your 4.x indexes.  It can read 5.x and
later indexes.

If you can "upgrade" by setting up a new Solr install and reindexing
everything, that will always achieve the best results.  This is how I do
upgrades.  There's no need to worry about the old index format at all.

If that's not possible, then you will need to convert your index to 5.x
format before upgrading to 6.x.  You can do this by upgrading to a 5.x
version first and optimizing all your indexes, or you can use the
IndexUpgrader tool from Lucene, first from 5.x, and then from 6.x, to
upgrade your index in stages.

https://cwiki.apache.org/confluence/display/solr/IndexUpgrader+Tool
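
For reference, a rough (untested) invocation of that tool against a stopped core's
index directory, with jar versions and paths as placeholders, looks like:

java -cp lucene-core-5.x.y.jar:lucene-backward-codecs-5.x.y.jar \
     org.apache.lucene.index.IndexUpgrader -delete-prior-commits /path/to/core/data/index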

Thanks,
Shawn



Upgrading solr 4.1.4 to solr 6.1.0

2016-07-12 Thread Rachid Bouacheria
Hi All,

I am running solr 4.10.4 and I would like to upgrade to the latest version
6.1.0

The documentation I found provides steps to upgrade from 4.10.4 to 5.x
And it seems like going from 4.x to 5.x is a pretty substantial change.
Going from 5.x to 6.1.0 seems to be less effort, but still non-negligible.

I am wondering if anyone had to do a similar upgrade? If so how did you do
it? Upgrade to 5.x and then to 6, or straight from 4.x to 6?
Any tips or advice are welcome.

Thank you all very much!


Update QParserPlugin containing FilteredQuery to Solr 5.5 - HowTo?

2016-07-12 Thread Oliver Obenland

Hi,

we developed a custom QParserPlugin for Solr 4.3. This QParser is for 
comparing the numeric values of the documents with numeric values of the 
search query. The first step was to reduce the number of documents by
pre-parsing the request and creating a Lucene query:


final String queryString = "myField:" + preParse(searchString);
final QParser parser = getParser(queryString, "lucene", getReq());

Then we used a FilteredQuery to select only matching records (which
will then get scored):


this.innerQuery = new MyQuery(new FilteredQuery(parser.parse(), new 
MyFilter(searchString)));


This worked really well. But now we want to update to Solr 5.5. There 
the FilteredQuery is marked as deprecated.

The documentation says:

FilteredQuery will be removed in Lucene 6.0. It should be replaced with 
a BooleanQuery with one BooleanClause.Occur.MUST clause for the query 
and one BooleanClause.Occur.FILTER clause for the filter.


But I don't know how to do this. Is there any tutorial for this?

I would start with:

BooleanQuery.Builder builder = new BooleanQuery.Builder();
builder.add(parser.parse(), BooleanClause.Occur.MUST);
builder.add(new MyFilterQuery(searchString), 
BooleanClause.Occur.FILTER);

this.innerQuery = builder.build();

But how do I get MyFilterQuery to filter my results?

Thank you for your time and your help!
-Oliver


Re: High cpu and gc time when performing optimization.

2016-07-12 Thread Otis Gospodnetic
Heap: start small and increase as necessary. Leave as much RAM as possible for the FS cache;
don't give it to the JVM until it starts crying. SPM for Solr will help you see
when Solr and the JVM are starting to hurt.
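
As a sketch only (the sizes are placeholders, to be grown only when monitoring shows
heap pressure), a conservative Tomcat starting point might be:

JAVA_OPTS="$JAVA_OPTS -Xms4g -Xmx4g -XX:+UseG1GC -verbose:gc -Xloggc:solr_gc.log"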

Otis

> On Jul 12, 2016, at 11:45, Jason  wrote:
> 
> I'm using optimize because it's a option for fast search.
> Our index updates one or more weekly.
> If I don't use optimize, many index files should be kept.
> Any performance issues in that case?
> 
> And I'm wondering relation between index file size and heap size.
> In case of running as master server that only update index,
> is there any guide for heap size include Xmx, NewSize, MaxNewSize, etc.?
> 
> 
> 
> Yonik Seeley wrote
>> Optimize is a very expensive operation.  It involves reading the
>> entire index and merging and rewriting at a single segment.
>> If you find it too expensive, do it less often, or don't do it at all.
>> It's an optional operation.
>> 
>> -Yonik
>> 
>> 
>> On Mon, Jul 11, 2016 at 10:19 PM, Jason 
> 
>> hialooha@
> 
>>  wrote:
>>> hi, all.
>>> 
>>> I'm running solr instance with two cores and JVM max heap is 32G.
>>> Each core index size is 68G, 61G repectively.
>>> I'm always keeping on optimization after update index.
>>> BTW, on last week, document update is completed but optimize phase cpu is
>>> very high.
>>> I think that is because long gc time.
>>> How should I solve this problem?
>>> welcome any idea.
>>> thanks,
>>> 
>>> 
>>> 
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704p4286796.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: High cpu and gc time when performing optimization.

2016-07-12 Thread Erick Erickson
It's more a matter of "is unoptimized fast enough"? If so, why bother?
The background merging will keep segment counts relatively
reasonable.
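
If you want to see or adjust what that background merging does, the knobs live in
solrconfig.xml; a sketch in the 4.x syntax (both values shown are the defaults):

<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicy>
</indexConfig>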

If you're updating your index only once a week, it's reasonable to
optimize. Anecdotal reports put the speedup on the order of 10%,
_at best_.

As Yonik  says, optimizing is expensive. You'll have to evaluate whether
that expense is worth it in your case, there's no universal answer.

Best,
Erick

On Tue, Jul 12, 2016 at 8:45 AM, Jason  wrote:
> I'm using optimize because it's a option for fast search.
> Our index updates one or more weekly.
> If I don't use optimize, many index files should be kept.
> Any performance issues in that case?
>
> And I'm wondering relation between index file size and heap size.
> In case of running as master server that only update index,
> is there any guide for heap size include Xmx, NewSize, MaxNewSize, etc.?
>
>
>
> Yonik Seeley wrote
>> Optimize is a very expensive operation.  It involves reading the
>> entire index and merging and rewriting at a single segment.
>> If you find it too expensive, do it less often, or don't do it at all.
>> It's an optional operation.
>>
>> -Yonik
>>
>>
>> On Mon, Jul 11, 2016 at 10:19 PM, Jason 
>
>> hialooha@
>
>>  wrote:
>>> hi, all.
>>>
>>> I'm running solr instance with two cores and JVM max heap is 32G.
>>> Each core index size is 68G, 61G repectively.
>>> I'm always keeping on optimization after update index.
>>> BTW, on last week, document update is completed but optimize phase cpu is
>>> very high.
>>> I think that is because long gc time.
>>> How should I solve this problem?
>>> welcome any idea.
>>> thanks,
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704p4286796.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Searching Home's, Homes and Home

2016-07-12 Thread John Blythe
copy in your analyzer from your schema.xml

-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Tue, Jul 12, 2016 at 8:10 AM, Surender 
wrote:

> Hi,
>
> I have checked the results and I am not getting desired results. Please
> suggest.
>
> Thanks,
> Surender Singh
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Searching-Home-s-Homes-and-Home-tp4286341p4286757.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Searching Home's, Homes and Home

2016-07-12 Thread Surender
Hi,

I have checked the results and I am not getting the desired results. Please
advise.

Thanks,
Surender Singh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Searching-Home-s-Homes-and-Home-tp4286341p4286757.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: High cpu and gc time when performing optimization.

2016-07-12 Thread Jason
I'm using optimize because it's an option for faster search.
Our index updates once or more weekly.
If I don't use optimize, many index files will be kept.
Are there any performance issues in that case?

And I'm wondering about the relation between index file size and heap size.
In the case of running as a master server that only updates the index,
is there any guide for heap size, including Xmx, NewSize, MaxNewSize, etc.?



Yonik Seeley wrote
> Optimize is a very expensive operation.  It involves reading the
> entire index and merging and rewriting at a single segment.
> If you find it too expensive, do it less often, or don't do it at all.
> It's an optional operation.
> 
> -Yonik
> 
> 
> On Mon, Jul 11, 2016 at 10:19 PM, Jason 

> hialooha@

>  wrote:
>> hi, all.
>>
>> I'm running solr instance with two cores and JVM max heap is 32G.
>> Each core index size is 68G, 61G repectively.
>> I'm always keeping on optimization after update index.
>> BTW, on last week, document update is completed but optimize phase cpu is
>> very high.
>> I think that is because long gc time.
>> How should I solve this problem?
>> welcome any idea.
>> thanks,
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704.html
>> Sent from the Solr - User mailing list archive at Nabble.com.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704p4286796.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: High cpu and gc time when performing optimization.

2016-07-12 Thread Jason
Please let me know the address of the guide/reference that mentions a reasonable index
size is around 15G.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704p4286790.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solrcloud consumes more time than solr when write index

2016-07-12 Thread Jeff Wartes
Well, two thoughts:


1. If you’re not using solrcloud, presumably you don’t have any replicas. If 
you are, presumably you do. This makes for a biased comparison, because 
SolrCloud won’t acknowledge a write until it’s been safely written to all 
replicas. In short, solrcloud write time is max(per-replica write time). The 
more replicas you add, the bigger the chance some replica randomly takes longer 
(gc pause, perhaps?), and the longer your overall write time, assuming a fixed 
number of indexing threads.
2. The parallelism of the optimize operation across replicas has gone back and 
forth a bit, and I’m not sure what it was doing in 4.9. However, at one point 
the optimize happened per-replica, serially. So it’d do shard1_replica1, then 
when that was done, do shard1_replica2, then shard2_replica1, etc. Other 
versions of Solr would do those at the same time. Again, I don’t know if you’re 
comparing to a non-replicated solr index, but that could explain some of the 
difference.

There’s a sort of an obligatory comment at this point that optimize doesn’t 
necessarily save you a lot. There are certainly cases where it does, but if you 
haven’t already, you’ll want to validate that you have one of them and that 
you’re not just doing unnecessary work.


On 7/12/16, 7:41 AM, "Kent Mu"  wrote:

>hello, does anybody also come across the issue? can anybody help me?
>
>2016-07-11 23:17 GMT+08:00 Kent Mu :
>
>> Hi friends!
>>
>> solr version: 4.9.0.
>>
>> we use solr and solrcloud in our project, that means we use sorl and
>> solrcloud at the same time.
>> but we find a phenomenon that sorlcoud consumes more time than solr when
>> write index. it takes nearly 5 or more times longer. I wonder that is why?
>>
>> in our project, we have a scheduler job to add index, and then execute the
>> the method of "optimize(false, true, 2)" to optimize the added index.
>> I wonder if it is caused by solrcloud internal that when writing index,
>> solrcloud needs to just which shard it should be stored? and when
>> optimizing the replicate needs to take some time to synchronize the data
>> from leader?
>>
>> and I wonder what about query?  will solrcloud also take more time than
>> solr when query data?
>>



Re: Return docs with only the matched fields for a query

2016-07-12 Thread Walter Underwood
I’m not sure you need a custom component. Try using the standard highlighter. 
Configure hl.simple.pre and hl.simple.post to be empty strings. Configure it to 
return one maximum length snippet. That should return the entire matching 
fields, though I haven’t tested it.
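
Roughly, and untested, that parameter set would look like:

hl=true&hl.fl=*&hl.snippets=1&hl.fragsize=0&hl.simple.pre=&hl.simple.post=

(hl.fragsize=0 tells the highlighter to use the whole field value rather than a fragment.)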

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jul 12, 2016, at 1:24 AM, Prasanna Josium  
> wrote:
> 
> Hi all,
> 
> My  requirement is in line with 
> https://issues.apache.org/jira/browse/SOLR-3955
> I'm working on a project that has very low network bandwidth for the clients.
> I'm using Solr 4.10 
> 
> The problem: 
> I have ~ 1M documents with multiple fields(~50),  many of them are indexed, 
> stored and some of them are multivalued.
> Queries are searched across all these fields and often, only a few of the 
> fields have matching terms in them.  
> 
> When I search for a term="Hobbit", I want to return documents only with the 
> matching fields where "Hobbit" is found. 
> All other un matched fields in the result doc shall be dropped from the 
> result set.
> 
> Naïve solution:
> The obvious solution I could think of was to implement a custom search 
> component based on the "Highlighter" component to filter out unwanted fields.
> But I'm not sure of the performance penalty for large number of fields  or 
> many multi valued fields / document.
> 
> Question:
> Is there a better way to solve this problem? Apparently I'm not the first 
> person facing such an issue.
> 
> Thanks
> Cheers
> Prasanna 
> 
> 
> 
> 



Re: solrcloud consumes more time than solr when write index

2016-07-12 Thread Kent Mu
hello, has anybody else come across this issue? can anybody help me?

2016-07-11 23:17 GMT+08:00 Kent Mu :

> Hi friends!
>
> solr version: 4.9.0.
>
> we use solr and solrcloud in our project, that means we use sorl and
> solrcloud at the same time.
> but we find a phenomenon that sorlcoud consumes more time than solr when
> write index. it takes nearly 5 or more times longer. I wonder that is why?
>
> in our project, we have a scheduler job to add index, and then execute the
> the method of "optimize(false, true, 2)" to optimize the added index.
> I wonder if it is caused by solrcloud internal that when writing index,
> solrcloud needs to just which shard it should be stored? and when
> optimizing the replicate needs to take some time to synchronize the data
> from leader?
>
> and I wonder what about query?  will solrcloud also take more time than
> solr when query data?
>


Re: solrcloud connection issue

2016-07-12 Thread Kent Mu
Dear Mr. Heisey.

We have configured maxThreads in JBoss, and the good news is that solrcloud
is now running OK. But I came across another issue: we find that the number of
HTTP connections is very high, around 3300, and solrcloud does not release the connections.
I understand that solrcloud needs to connect to zookeeper and that
communication between leader and replica needs connections, but I think
the number should not be so huge.
Besides, we use the singleton pattern to connect to solrcloud in Java.

I look forward to your reply. Thanks!

2016-07-07 22:24 GMT+08:00 Shawn Heisey :

> On 7/6/2016 5:26 AM, Kent Mu wrote:
> > Hi friends!
> > *solr version: 4.9.0*
> >
> > I came across a problem when use solrcloud, it becomes dead lock, we got
> > the java core log, it looks like the http connection pool is exhausted
> and
> > most threads are waiting to get a free connection..
> >
> > I posted the problem in JIRA, the link is
> > https://issues.apache.org/jira/browse/SOLR-9253
> > I have increased http connection defaults for the SolrJ client, and also
> > configed the connection defaults in solr.xml for all shard servers as
> below.
> >
> >  > class="HttpShardHandlerFactory">
> > 6
> > 3
> > 1
> > 500
> > 
>
> I can see JBoss classes in the thread dump that was added to SOLR-9253.
>
> That thread dump shows 213 threads in the RUNNABLE state, and 507 in the
> WAITING state.  I do not think you are running into the configured shard
> handler limits.  I think your container is not allowing enough Solr
> threads to run.
>
> Just like Tomcat and Jetty, JBoss has a "maxThreads" setting that
> defaults to 200.  Increasing this setting is critical for scalability
> when using a third-party container.  I recommend 10000 -- which is the
> setting you'll find in the Jetty that's included with Solr.
>
> Note that if you upgrade Solr to 5.x or 6.x, running in JBoss will no
> longer be a supported configuration.
>
> https://wiki.apache.org/solr/WhyNoWar
>
> Thanks,
> Shawn
>
>


Re: High cpu and gc time when performing optimization.

2016-07-12 Thread Yonik Seeley
Optimize is a very expensive operation.  It involves reading the
entire index and merging and rewriting at a single segment.
If you find it too expensive, do it less often, or don't do it at all.
It's an optional operation.

-Yonik


On Mon, Jul 11, 2016 at 10:19 PM, Jason  wrote:
> hi, all.
>
> I'm running solr instance with two cores and JVM max heap is 32G.
> Each core index size is 68G, 61G repectively.
> I'm always keeping on optimization after update index.
> BTW, on last week, document update is completed but optimize phase cpu is
> very high.
> I think that is because long gc time.
> How should I solve this problem?
> welcome any idea.
> thanks,
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Custom Post Filter length & performance

2016-07-12 Thread Joel Bernstein
You should be able to send a POST to Solr that would work with larger
requests.

Postfilter performance is driven by three things:

1) How much overhead is involved in handling the fq parameter and turning
it into data structures, etc.
2) How many documents the post filter needs to look at.
3) How fast the filter is for each document.

If you have large result sets, you'll need to optimize #3.
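
A bare-bones sketch of that shape (untested; class and field names are invented, and
the per-document check is left as a stub):

import java.io.IOException;
import java.util.Set;

import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

// All names here are invented for illustration; the permission lookup is a stub.
public class PermissionPostFilter extends ExtendedQueryBase implements PostFilter {

    private final Set<String> allowedValues;

    public PermissionPostFilter(Set<String> allowedValues) {
        this.allowedValues = allowedValues;
    }

    @Override
    public boolean getCache() {
        return false;                              // post filter results are not put in the filter cache
    }

    @Override
    public int getCost() {
        return Math.max(super.getCost(), 100);     // cost >= 100 marks this as a post filter
    }

    @Override
    public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
        return new DelegatingCollector() {
            @Override
            public void collect(int doc) throws IOException {
                // Point 3 above: this runs once per candidate document, so the
                // per-document check against the in-memory sets must stay cheap.
                if (isAllowed(doc)) {
                    super.collect(doc);
                }
            }
        };
    }

    private boolean isAllowed(int segmentLocalDocId) {
        // Stub: read the document's permission field(s), e.g. via doc values set up
        // per segment, and compare them against allowedValues.
        return true;
    }
}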


Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Jul 12, 2016 at 8:06 AM, Vasu Y  wrote:

> Hi,
>  I am implementing a custom post filter for permission checks along the
> lines described by Erik at
> https://lucidworks.com/blog/2012/02/22/custom-security-filtering-in-solr/
>
> Is there a limit to the length (number of characters) of the custom post
> filter? In our case, length of this "fq" could be up to a maximum of 15000
> characters.
> Also, if the post filter is not accessing any external system (no DB access
> and no REST/Web-service calls) and just only doing a look-up of about 4
> field values (for each document) against the passed "fq" values (stored in
> couple of HashSets), would the performance degrade significantly (I do
> understand there will be some cost) when compared to not applying the
> security filter.
>
> Thanks,
> Vasu
>


Re: High cpu and gc time when performing optimization.

2016-07-12 Thread Kent Mu
As I said before, we also came across this issue, and I can only guess at the possible
reason; let's wait for an expert to explain it for us.
On the other hand, I see that your index data is 68G, which is too large. I recommend
you use solrcloud; as per the guide/reference, a reasonable size is around 15G.
Our project now uses solr and solrcloud together, so that if either one goes down or
has another issue, we can switch to the well-running one.

2016-07-12 17:02 GMT+08:00 Jason :

> hi, Kent
> thanks your reply.
>
> I think that I need more explain to my server status.
> I'm using solr 4.2.1 and master-slave replication model.
> On master server many solr(tomcat) instances are running.
> (server has 64 cores, 128G ram.)
> Now 4 solr(tomcat) instances are running and are allocated 32, 16, 16, 8G
> max heap respectively.
> When cpu is high on optimize phase, load average is almost over 100.
> And high cpu time is continued very long(5 hours over).
> Besides, other process of solr(tomcat) instance use also high cpu.
> But I'd not operated in other instances.
> So, I tried stop the other instances and just run one instance.
> But still cpu is high.
> I don't know how should I do.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704p4286733.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


AW: group.facet=true and facet on field of type int -> org.apache.solr.common.SolrException: Exception during facet.field

2016-07-12 Thread Sebastian Riemer
Hi all,

Tested on Solr 6.1.0 (as well as 5.4.0 and 5.5.0) using the "techproducts"
example; the following query throws the same exception as in my original
question:

To reproduce:
1) set up the techproducts example: 
solr start -e techproducts -noprompt
2) go to Solr Admin: 
http://localhost:8983/solr/#/techproducts/query
3) in "Raw Query Parameters" enter: 

group=true&group.facet=true&group.ngroups=true&group.field=manu_id_s&facet=true&facet.field=popularity
4) Hit "Execute Query"

[..]
"error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","java.lang.IllegalStateException"],
"msg":"Exception during facet.field: popularity",
"trace":"org.apache.solr.common.SolrException: Exception during 
facet.field: popularity\r\n\tat 
org.apache.solr.request.SimpleFacets.lambda$getFacetFieldCounts$50(SimpleFacets.java:739)\r\n\tat
 org.apache.solr.request.SimpleFacets$$Lambda$37/2022187546.call(Unknown 
Source)\r\n\tat 
java.util.concurrent.FutureTask.run(FutureTask.java:266)\r\n\tat 
org.apache.solr.request.SimpleFacets$2.execute(SimpleFacets.java:672)\r\n\tat 
org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:748)\r\n\tat
 
org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetComponent.java:321)\r\n\tat
 
org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:265)\r\n\tat
 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:293)\r\n\tat
 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)\r\n\tat
 org.apache.solr.core.SolrCore.execute(SolrCore.java:2036)\r\n\tat 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:657)\r\n\tat 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)\r\n\tat 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)\r\n\tat
 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)\r\n\tat
 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)\r\n\tat
 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)\r\n\tat
 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\r\n\tat
 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\r\n\tat
 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\r\n\tat
 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)\r\n\tat
 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\r\n\tat
 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\r\n\tat
 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)\r\n\tat
 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\r\n\tat
 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\r\n\tat
 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\r\n\tat
 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\r\n\tat
 org.eclipse.jetty.server.Server.handle(Server.java:518)\r\n\tat 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)\r\n\tat 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)\r\n\tat
 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\r\n\tat
 org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\r\n\tat 
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\r\n\tat
 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)\r\n\tat
 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)\r\n\tat
 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)\r\n\tat
 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)\r\n\tat
 java.lang.Thread.run(Thread.java:745)\r\nCaused by: 
java.lang.IllegalStateException: unexpected docvalues type NUMERIC for field 
'popularity' (expected=SORTED). Use UninvertingReader or index with 
docvalues.\r\n\tat 
org.apache.lucene.index.DocValues.checkField(DocValues.java:212)\r\n\tat 
org.apache.lucene.index.DocValues.getSorted(DocValues.java:264)\r\n\tat 
org.apache.lucene.search.grouping.term.TermGroupFacetCollector$SV.doSetNextReader(TermGroupFacetCollector.java:129)\r\n\tat
 
org.apache.lucene.search.SimpleCollector.getLeafCollector(SimpleCollector.java:33)\r\n\tat
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:660)\r\n\tat 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:473)\r\n\tat 
org.apache.solr.request.SimpleFacets.getGroupedCounts(SimpleFacets.java:638)\r\n\tat
 
org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:443)\r\n\tat
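
Given the root error (the grouped facet expects SORTED doc values for the facet field),
a commonly suggested workaround, untested here, is to group-facet on a string copy of
the numeric field and reindex, e.g.:

<field name="popularity_str" type="string" indexed="true" stored="false" docValues="true"/>
<copyField source="popularity" dest="popularity_str"/>

and then use facet.field=popularity_str instead of popularity.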
 

Re: Searching Home's, Homes and Home

2016-07-12 Thread Vijaymhaskar
Hi Surender, 

Can you share your current field configuration so that we can debug it from
there?

Please share your field and fieldType definitions from schema.xml.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Searching-Home-s-Homes-and-Home-tp4286341p4286768.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Searching Home's, Homes and Home

2016-07-12 Thread kostali hassan
Or you can build a file called synonym.txt in the config directory of your
core.
On Jul 11, 2016 at 17:06, "Surender"  wrote:

> Thanks...
>
> I am applying these filters and will share update on this issue. It will
> take couple of days.
>
> Thanks,
> Surender Singh
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Searching-Home-s-Homes-and-Home-tp4286341p4286579.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Custom Post Filter length & performance

2016-07-12 Thread Vasu Y
Hi,
 I am implementing a custom post filter for permission checks along the
lines described by Erik at
https://lucidworks.com/blog/2012/02/22/custom-security-filtering-in-solr/

Is there a limit to the length (number of characters) of the custom post
filter? In our case, length of this "fq" could be up to a maximum of 15000
characters.
Also, if the post filter is not accessing any external system (no DB access
and no REST/Web-service calls) and is only doing a look-up of about 4
field values (for each document) against the passed "fq" values (stored in
a couple of HashSets), would the performance degrade significantly (I do
understand there will be some cost) compared to not applying the
security filter?

Thanks,
Vasu


Re: Multilevel grouping?

2016-07-12 Thread Yonik Seeley
I started this a while ago, but haven't found the time to finish:
https://issues.apache.org/jira/browse/SOLR-7830

-Yonik


On Tue, Jul 12, 2016 at 7:29 AM, Aditya Sundaram
 wrote:
> Does solr support multilevel grouping? I want to group upto 2/3 levels
> based on different fields i.e 1st group on field one, within which i group
> by field 2 etc.
> I am aware of facet.pivot which does the same but retrieves only the count.
> Is there anyway to get the documents as well along with the count in
> facet.pivot???
>
> --
> Aditya Sundaram


Multilevel grouping?

2016-07-12 Thread Aditya Sundaram
Does solr support multilevel grouping? I want to group up to 2-3 levels
based on different fields, i.e., first group on field one, and within that
group by field two, etc.
I am aware of facet.pivot, which does the same but retrieves only the counts.
Is there any way to get the documents as well, along with the count, in
facet.pivot?

-- 
Aditya Sundaram


Re: Return docs with only the matched fields for a query

2016-07-12 Thread cole worldforsolr
Hi Josium,

You could try something like this:
http://localhost:8983/solr/mycollection/select?fq=Hobbit:*&indent=on&q=*:*&wt=json

This will return only the documents that contain the field Hobbit.

Well, I'm not quite sure I understand what you are seeking; excuse me if my
answer is off topic.

Best regards,
Cole.
On Tue, Jul 12, 2016 at 9:24 AM, Prasanna Josium <
prasanna.jos...@clustr.co.in> wrote:

> Hi all,
>
> My  requirement is in line with
> https://issues.apache.org/jira/browse/SOLR-3955
> I'm working on a project that has very low network bandwidth for the
> clients.
> I'm using Solr 4.10
>
> The problem:
> I have ~ 1M documents with multiple fields(~50),  many of them are
> indexed, stored and some of them are multivalued.
> Queries are searched across all these fields and often, only a few of the
> fields have matching terms in them.
>
> When I search for a term="Hobbit", I want to return documents only with
> the matching fields where "Hobbit" is found.
> All other un matched fields in the result doc shall be dropped from the
> result set.
>
> Naïve solution:
> The obvious solution I could think of was to implement a custom search
> component based on the "Highlighter" component to filter out unwanted
> fields.
> But I'm not sure of the performance penalty for large number of fields  or
> many multi valued fields / document.
>
> Question:
> Is there a better way to solve this problem? Apparently I'm not the first
> person facing such an issue.
>
> Thanks
> Cheers
> Prasanna
>
>
>
>
>


Re: High cpu and gc time when performing optimization.

2016-07-12 Thread Jason
hi, Kent,
thanks for your reply.

I think I need to explain my server status in more detail.
I'm using solr 4.2.1 and the master-slave replication model.
On the master server many solr (tomcat) instances are running
(the server has 64 cores and 128G RAM).
Currently 4 solr (tomcat) instances are running, allocated 32, 16, 16 and 8G
max heap respectively.
When cpu is high during the optimize phase, the load average is almost always over 100,
and the high cpu usage lasts very long (over 5 hours).
Besides, the other solr (tomcat) instances also show high cpu,
even though I had not performed any operations on those instances.
So I tried stopping the other instances and running just one instance,
but cpu is still high.
I don't know what I should do.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704p4286733.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: High cpu and gc time when performing optimization.

2016-07-12 Thread Kent Mu
We also came across this issue. I think it is not caused by GC time, but by
the optimize action itself. Though I have not read the source code, I think that when
the master optimizes the index internally, it produces a replication log,
and the replicas synchronize from that log, much like the DB master/slave
model; this consumes a lot of CPU and the IO will be very high.
But it is OK, it will just take some time.

2016-07-12 10:19 GMT+08:00 Jason :

> hi, all.
>
> I'm running solr instance with two cores and JVM max heap is 32G.
> Each core index size is 68G, 61G repectively.
> I'm always keeping on optimization after update index.
> BTW, on last week, document update is completed but optimize phase cpu is
> very high.
> I think that is because long gc time.
> How should I solve this problem?
> welcome any idea.
> thanks,
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Return docs with only the matched fields for a query

2016-07-12 Thread Prasanna Josium
Hi all,

My  requirement is in line with https://issues.apache.org/jira/browse/SOLR-3955
I'm working on a project that has very low network bandwidth for the clients.
I'm using Solr 4.10 

The problem: 
I have ~ 1M documents with multiple fields(~50),  many of them are indexed, 
stored and some of them are multivalued.
Queries are searched across all these fields and often, only a few of the 
fields have matching terms in them.  

When I search for a term="Hobbit", I want to return documents only with the 
matching fields where "Hobbit" is found. 
All other unmatched fields in the result doc should be dropped from the result
set.

Naïve solution:
The obvious solution I could think of was to implement a custom search 
component based on the "Highlighter" component to filter out unwanted fields.
But I'm not sure of the performance penalty for a large number of fields or many
multi-valued fields per document.

Question:
Is there a better way to solve this problem? Apparently I'm not the first 
person facing such an issue.

Thanks
Cheers
Prasanna