Re: Boosting only top n results that match a criteria

2019-12-28 Thread Emir Arnautović
You could try field collapsing and see if it helps you. That would let you 
return the top 5 from each class, if that is acceptable. Otherwise, you'll have 
to go with two queries.
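
As a rough sketch (the field name and query below are assumptions on my side,
not taken from your setup), a collapse/expand request could look like:

  q=<your main query>&fq={!collapse field=class}&expand=true&expand.rows=5

The main result list then holds one representative document per class, and the
"expanded" section carries up to 5 more documents for each collapsed group.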

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 27 Dec 2019, at 19:08, Nitin Arora  wrote:
> 
> Simply boosting on class A1 won't work, since there may be many documents
> from that class, all getting an equal boost. I want only the top 5 docs of
> that class to get the boost.
> 
> On Fri, 27 Dec 2019 at 22:42, Erick Erickson 
> wrote:
> 
>> Yes. Rerank essentially takes the top N results of one query and re-scores
>> them with another query. So just apply your boost in the secondary query.
>> 
>> But you may not even have to do that. Just add a boost clause to a single
>> query and boost your class A1 quite high. See “boost” and/or “bq”.
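
To illustrate (only a sketch with an assumed field name, not taken from the
original thread), the rerank form could look like:

  q=<your main query>&rq={!rerank reRankQuery=$rrq reRankDocs=200 reRankWeight=5}&rrq=class:A1

and the single-query form with a boost clause could be as simple as:

  q=<your main query>&defType=edismax&bq=class:A1^10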
>> 
>> Best,
>> Erick
>> 
>>> On Dec 27, 2019, at 10:57 AM, Nitin Arora 
>> wrote:
>>> 
>>> Hi Erick, I was not able to figure out how exactly I would use
>>> RerankQParserPlugin to achieve the desired reranking. I see that I can
>>> rerank all the top RERANK_DOCS results - it is possible that they contain
>>> a hundred results of class A1, or none. But the desired behaviour is to
>>> pick (only) the top 5 results of class A1 from my potentially hundreds of
>>> results, and then boost them to the first page.
>>> Do you think this (or something near this) behaviour is possible
>>> using RerankQParserPlugin? Please shed more light on how.
>>> 
>>> On Fri, 27 Dec 2019 at 19:48, Erick Erickson 
>>> wrote:
>>> 
 Have you seen RerankQParserPlugin?
 
 Best,
 Erick
 
> On Dec 27, 2019, at 8:49 AM, Emir Arnautović <
 emir.arnauto...@sematext.com> wrote:
> 
> Hi Nitin,
> Can you simply filter and return top 5:
> 
> ….&fq=class:A1&rows=5
> 
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training -
>> http://sematext.com/
> 
> 
> 
>> On 27 Dec 2019, at 13:55, Nitin Arora  wrote:
>> 
>> Hello, I have a complex Solr query with various boosts applied that
>> returns, say, a few hundred results. Out of these hundreds of results I
>> want to further boost, say, the top 5 results that satisfy a particular
>> criterion - e.g. class=A1. So I want the top 5 results from class A1 in my
>> existing result set to move higher, so that I can show them on the first
>> page of my final results. How do I achieve this?
>> I am new to Solr and this community, so apologies if this is
>> trivial/a repeat.
>> 
>> Thanks,
>> Nitin
> 
 
 
>> 
>> 



Re: Solr 7.5 speed up, accuracy details

2019-12-28 Thread Dave
There is no increase in speed, just new features. Doc values add some speed, but 
it's hard to quantify, and some people think SolrCloud brings speed increases, 
but I don't think they exist when hardware cost is not an issue, and it adds too 
much complexity to something that should be simple.

> On Dec 28, 2019, at 12:52 PM, Rajdeep Sahoo  
> wrote:
> 
> Hi all,
>  How can I get the performance improvement features in indexing and search
> in solr 7.5...
> 
>> On Sat, 28 Dec, 2019, 9:18 PM Rajdeep Sahoo, 
>> wrote:
>> 
>> Thank you for the information
>>  Why you are recommending to use the schema api instead of schema xml?
>> 
>> 
>>> On Sat, 28 Dec, 2019, 8:01 PM Jörn Franke,  wrote:
>>> 
>>> This highly depends on how you designed your collections etc. - there is
>>> no general answer. You have to do a performance test based on your
>>> configuration and documents.
>>> 
>>> I also recommend to check the Solr documentation on how to design a
>>> collection for 7.x and maybe start even from scratch defining it with a new
>>> fresh schema (using the schema api instead of schema.xml and solrconfig.xml
>>> etc). You will have anyway to reindex everything so it is a also a good
>>> opportunity to look at your existing processes and optimize them.
>>> 
 Am 28.12.2019 um 15:19 schrieb Rajdeep Sahoo <
>>> rajdeepsahoo2...@gmail.com>:
 
 Hi all,
 Is there any way I can get the speed up,accuracy details i.e.
>>> performance
 improvements of solr 7.5 in comparison with solr 4.6
 Currently,we are using solr 4.6 and we are in a process to upgrade to
 solr 7.5. Need these details.
 
 Thanks in advance
>>> 
>> 


Re: Solr 7.5 speed up, accuracy details

2019-12-28 Thread Rajdeep Sahoo
Hi all,
How can I get the performance-improvement features for indexing and search
in Solr 7.5?

On Sat, 28 Dec, 2019, 9:18 PM Rajdeep Sahoo, 
wrote:

> Thank you for the information
>   Why you are recommending to use the schema api instead of schema xml?
>
>
> On Sat, 28 Dec, 2019, 8:01 PM Jörn Franke,  wrote:
>
>> This highly depends on how you designed your collections etc. - there is
>> no general answer. You have to do a performance test based on your
>> configuration and documents.
>>
>> I also recommend to check the Solr documentation on how to design a
>> collection for 7.x and maybe start even from scratch defining it with a new
>> fresh schema (using the schema api instead of schema.xml and solrconfig.xml
>> etc). You will have anyway to reindex everything so it is a also a good
>> opportunity to look at your existing processes and optimize them.
>>
>> > Am 28.12.2019 um 15:19 schrieb Rajdeep Sahoo <
>> rajdeepsahoo2...@gmail.com>:
>> >
>> > Hi all,
>> > Is there any way I can get the speed up,accuracy details i.e.
>> performance
>> > improvements of solr 7.5 in comparison with solr 4.6
>> >  Currently,we are using solr 4.6 and we are in a process to upgrade to
>> > solr 7.5. Need these details.
>> >
>> > Thanks in advance
>>
>


Re: Solr 7.3 cluster issue

2019-12-28 Thread David Barnett
Hi Jan et al.,

clusterstate shows all cores and replicas on nodes 1 and 2, but node 0 is empty. 
On all three nodes, live_nodes shows the correct 3 node addresses.

Thanks for the advice, we will use a 4th node.
On 28 Dec 2019, 14:10 +, Jan Høydahl , wrote:
> Wonder what clusterstate actually says. I can think of two things that could 
> possibly heal the cluster:
>
> A rolling restart of all nodes may make Solr heal itself, but the risk is 
> that some shards may not have a replica and if you get stuck in recovery 
> during restart you have downtime.
>
> Another way could be to use admin UI and remove all replicas from the defunct 
> node. Then reboot/reinstall that node and then add back missing replicas and 
> let solr replicate shards to the new node.
>
> A third more defensive way is to add a fourth node, add replicas to it to 
> make all collections redundant and then remove replicas from the defunct node 
> and finally decommission it.
>
> Jan Høydahl
>
> > 28. des. 2019 kl. 02:17 skrev David Barnett :
> >
> > Happy holidays folks, we have a production deployment using Solr 7.3 in a 
> > three-node cluster. We have a number of collections set up, each with three 
> > shards and a replication factor of 2. The system has been fine, but we 
> > experienced disk space issues on one of the nodes.
> >
> > Node 0 starts but does not show any cores / replicas, the solr.log is full 
> > of these "o.a.s.c.ZkController org.apache.solr.common.SolrException: 
> > Replica core_node7 is not present in cluster state: null”
> >
> > Node 1 and Node 2 are OK, all data from all collections is accessible.
> >
> > Can I recreate node 0 as though it had failed completely? Is it OK to 
> > remove the references to the (missing) replicas and recreate them? Would 
> > you be able to give me some guidance on the safest way to reintroduce 
> > node 0 given our situation?
> >
> > Many thanks
> >
> > Dave


Re: Solr 7.5 speed up, accuracy details

2019-12-28 Thread Rajdeep Sahoo
Thank you for the information.
Why are you recommending the Schema API instead of schema.xml?


On Sat, 28 Dec, 2019, 8:01 PM Jörn Franke,  wrote:

> This highly depends on how you designed your collections etc. - there is
> no general answer. You have to do a performance test based on your
> configuration and documents.
>
> I also recommend to check the Solr documentation on how to design a
> collection for 7.x and maybe start even from scratch defining it with a new
> fresh schema (using the schema api instead of schema.xml and solrconfig.xml
> etc). You will have anyway to reindex everything so it is a also a good
> opportunity to look at your existing processes and optimize them.
>
> > Am 28.12.2019 um 15:19 schrieb Rajdeep Sahoo  >:
> >
> > Hi all,
> > Is there any way I can get the speed up,accuracy details i.e. performance
> > improvements of solr 7.5 in comparison with solr 4.6
> >  Currently,we are using solr 4.6 and we are in a process to upgrade to
> > solr 7.5. Need these details.
> >
> > Thanks in advance
>


Re: Solr 7.3 cluster issue

2019-12-28 Thread Erick Erickson
+1 to Jan’s comments, especially the idea of adding a 4th node and doing your 
ADDREPLICAs to that before doing the DELETEREPLICAS for the replicas on the 
sick node. I’ve used this to bring clusters back to health. This assumes you 
have at least one active leader for all shards.
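
To make that concrete (collection, shard and node names below are invented for
illustration, not taken from this thread), the Collections API calls are along
the lines of:

  .../admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=newnode:8983_solr

and, once the new replica is active and caught up:

  .../admin/collections?action=DELETEREPLICA&collection=mycoll&shard=shard1&replica=core_node7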

That ZK error is weird, what’s the full stack trace?

Best,
Erick


> On Dec 28, 2019, at 9:10 AM, Jan Høydahl  wrote:
> 
> Wonder what clusterstate actually says. I can think of two things that could 
> possibly heal the cluster: 
> 
> A rolling restart of all nodes may make Solr heal itself, but the risk is 
> that some shards may not have a replica and if you get stuck in recovery 
> during restart you have downtime.
> 
> Another way could be to use admin UI and remove all replicas from the defunct 
> node. Then reboot/reinstall that node and then add back missing replicas and 
> let solr replicate shards to the new node.
> 
> A third more defensive way is to add a fourth node, add replicas to it to 
> make all collections redundant and then remove replicas from the defunct node 
> and finally decommission it.
> 
> Jan Høydahl
> 
>> 28. des. 2019 kl. 02:17 skrev David Barnett :
>> 
>> Happy holidays folks, we have a production deployment using Solr 7.3 in a 
>> three-node cluster. We have a number of collections set up, each with three 
>> shards and a replication factor of 2. The system has been fine, but we 
>> experienced disk space issues on one of the nodes.
>> 
>> Node 0 starts but does not show any cores / replicas, the solr.log is full 
>> of these "o.a.s.c.ZkController org.apache.solr.common.SolrException: Replica 
>> core_node7 is not present in cluster state: null”
>> 
>> Node 1 and Node 2 are OK, all data from all collections is accessible.
>> 
>> Can I recreate node 0 as though it had failed completely? Is it OK to 
>> remove the references to the (missing) replicas and recreate them? Would 
>> you be able to give me some guidance on the safest way to reintroduce 
>> node 0 given our situation?
>> 
>> Many thanks
>> 
>> Dave



Re: Solr 7.5 speed up, accuracy details

2019-12-28 Thread Jörn Franke
This depends highly on how you designed your collections etc. - there is no 
general answer. You have to do a performance test based on your configuration 
and documents.

I also recommend checking the Solr documentation on how to design a collection 
for 7.x, and maybe even starting from scratch with a fresh schema (using the 
Schema API instead of schema.xml and solrconfig.xml etc.). You will have to 
reindex everything anyway, so it is also a good opportunity to look at your 
existing processes and optimize them.
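
As a minimal sketch (collection and field names are just examples), adding a
field through the Schema API looks roughly like:

  curl -X POST -H 'Content-type:application/json' \
    --data-binary '{"add-field":{"name":"title","type":"text_general","indexed":true,"stored":true}}' \
    http://localhost:8983/solr/mycollection/schema

The practical advantage is that such changes go through the API against a
managed schema, so you don't have to hand-edit and redistribute schema.xml on
every node.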

> Am 28.12.2019 um 15:19 schrieb Rajdeep Sahoo :
> 
> Hi all,
> Is there any way I can get the speed up,accuracy details i.e. performance
> improvements of solr 7.5 in comparison with solr 4.6
>  Currently,we are using solr 4.6 and we are in a process to upgrade to
> solr 7.5. Need these details.
> 
> Thanks in advance


Solr 7.5 speed up, accuracy details

2019-12-28 Thread Rajdeep Sahoo
Hi all,
Is there any way I can get the speed-up/accuracy details, i.e. the performance
improvements of Solr 7.5 in comparison with Solr 4.6?
Currently we are using Solr 4.6 and we are in the process of upgrading to
Solr 7.5. We need these details.

Thanks in advance


Re: Solr 7.3 cluster issue

2019-12-28 Thread Jan Høydahl
Wonder what clusterstate actually says. I can think of two things that could 
possibly heal the cluster: 

A rolling restart of all nodes may make Solr heal itself, but the risk is that 
some shards may not have a replica and if you get stuck in recovery during 
restart you have downtime.

Another way could be to use admin UI and remove all replicas from the defunct 
node. Then reboot/reinstall that node and then add back missing replicas and 
let solr replicate shards to the new node.

A third more defensive way is to add a fourth node, add replicas to it to make 
all collections redundant and then remove replicas from the defunct node and 
finally decommission it.

Jan Høydahl

> 28. des. 2019 kl. 02:17 skrev David Barnett :
> 
> Happy holidays folks, we have a production deployment using Solr 7.3 in a 
> three-node cluster. We have a number of collections set up, each with three 
> shards and a replication factor of 2. The system has been fine, but we 
> experienced disk space issues on one of the nodes.
> 
> Node 0 starts but does not show any cores / replicas, the solr.log is full of 
> these "o.a.s.c.ZkController org.apache.solr.common.SolrException: Replica 
> core_node7 is not present in cluster state: null”
> 
> Node 1 and Node 2 are OK, all data from all collections is accessible.
> 
> Can I recreate node 0 as though it had failed completely? Is it OK to 
> remove the references to the (missing) replicas and recreate them? Would 
> you be able to give me some guidance on the safest way to reintroduce 
> node 0 given our situation?
> 
> Many thanks
> 
> Dave


RE: Exceptions in solr log

2019-12-28 Thread Vadim Ivanov
Hi,
I'm facing the same problem with SolrCloud 7.x - 8.x.
I have TLOG-type replicas, and when I delete the leader, the log is always
full of this:
2019-12-28 14:46:56.239 ERROR (indexFetcher-45942-thread-1) [   ]
o.a.s.h.IndexFetcher No files to download for index generation: 7166
2019-12-28 14:48:03.157 ERROR (indexFetcher-45881-thread-1) [   ]
o.a.s.h.IndexFetcher No files to download for index generation: 10588
Unfortunately, from this error alone it's hard to tell which replica, shard
and collection is in trouble.
Sometimes indexing helps - my guess is that after a commit the slave replicas
somehow work out which index generation should be retrieved from the new
leader.
Sometimes I have to restart the node.
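
For reference, an explicit commit of the kind I mean can be issued like this
(host and collection name are placeholders):

  curl 'http://localhost:8983/solr/mycollection/update?commit=true'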

-- 
Vadim

> -Original Message-
> From: Akreeti Agarwal [mailto:akree...@hcl.com]
> Sent: Friday, December 27, 2019 8:20 AM
> To: solr-user@lucene.apache.org
> Subject: Exceptions in solr log
> 
> Hi All,
> 
> Please help me with these exceptions and their workarounds:
> 
> 1. org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError:
>    Cannot parse
> 2. o.a.s.h.IndexFetcher No files to download for index generation: 1394327
> 3. o.a.s.h.a.LukeRequestHandler Error getting file length for [segments_b]
>    (this one is a warning, as discussed)
> 
> I am always getting these errors in my Solr logs. What can be the reason
> behind them, and how should I resolve them?
> 
> 
> Thanks & Regards,
> Akreeti Agarwal



Re: does copyField increase index size?

2019-12-28 Thread Nicolas Paris


> So what will be added is just another set of pointers to each relevant
> term. That's not going to be very large. Probably only a few bytes for
> each term.

Hi Shawn. This explains a lot! Thanks.
In the case of text fields, highlighting is done on the source fields and
the _text_ field is only used for lookup. This behavior is perfect for
my needs.
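
For context, a minimal sketch of the kind of setup discussed here (field and
type names are assumptions, not copied from my actual schema):

  <field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
  <copyField source="section_*" dest="_text_"/>

With stored="false" and no docValues on the destination, only the extra
postings for _text_ are added, which matches the small size impact described
above.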

On Fri, Dec 27, 2019 at 05:28:25PM -0700, Shawn Heisey wrote:
> On 12/26/2019 1:21 PM, Nicolas Paris wrote:
> > Below is a part of the managed-schema. There are 1k section* fields. In the
> > second experiment, I removed the copyField, dropped the collection and
> > re-indexed the whole thing. To measure the index size, I went to the cloud
> > view of the Solr admin UI: 40 GB per shard. I also looked at the folder
> > size. I made some tests and the _text_ field is indexed.
> 
> Your schema says that the destination field is not stored and doesn't have
> docValues.  So the only thing it has is indexed.
> 
> All of the terms generated by index analysis will already be in the index
> from the source fields.  So what will be added is just another set of
> pointers to each relevant term.  That's not going to be very large. Probably
> only a few bytes for each term.
> 
> So with this copyField, the index will get larger, but probably not
> significantly.
> 
> Thanks,
> Shawn
> 

-- 
nicolas