Re: Solr cloud production set up

2020-02-24 Thread Paras Lehana
Hi Rajdeep,


   1. I assume you had enabled docValues for the facet fields, right?
   2. What do your GC logs tell you? Do you see freezes and CPU spikes at
   intervals?
   3. Caching will help in querying. I'll need to see a sample query of
   yours to recommend what you can tweak.
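For reference, a faceted request of the kind being discussed could be sketched like this; the collection and field names here are invented for illustration, and debug=timing (suggested later in the thread) exposes per-component timings in the response:

```python
from urllib.parse import urlencode

# Hypothetical collection ("products") and facet fields -- substitute your own.
# Facet fields should have docValues="true" in the schema.
params = [
    ("q", "*:*"),
    ("rows", "10"),
    ("facet", "true"),
    ("facet.field", "brand"),
    ("facet.field", "category"),
    ("facet.limit", "20"),
    ("debug", "timing"),  # adds a per-component timing breakdown of QTime
]
url = "http://localhost:8983/solr/products/select?" + urlencode(params)
print(url)
```

The "timing" section of the debug output splits QTime across the query, facet, and highlight components, which is exactly what helps decide whether faceting is the bottleneck.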


On Tue, 28 Jan 2020 at 19:09, Jason Gerlowski  wrote:

> Hi Rajdeep,
>
> Unfortunately it's near impossible for anyone here to tell you what
> parameters to tweak.  People might take guesses based on their
> individual past experience, but ultimately those are just guesses.
>
> There are just too many variables affecting Solr performance for
> anyone to have a good guess without access to the cluster itself and
> the time and will to dig into it.
>
> Are there GC params that need tweaking?  Very possible, but you'll
> have to look into your gc logs to see how much time is being spent in
> gc.  Are there query params you could be changing?  Very possible, but
> you'll have to identify the types of queries you're submitting and see
> whether the ref-guide offers any information on how to tweak
> performance for those particular qparsers, facets, etc.  Is the number
> of facets the reason for slow queries?  Very possible, but you'll have
> to turn faceting off or run debug=timing and see what that tells you
> about the QTimes.
>
> Tuning Solr performance is a tough, time consuming process.  I wish
> there was an easier answer for you, but there's not.
>
> Best,
>
> Jason
>
> On Mon, Jan 20, 2020 at 12:06 PM Rajdeep Sahoo
>  wrote:
> >
> > Please suggest anyone
> >
> > On Sun, 19 Jan, 2020, 9:43 AM Rajdeep Sahoo,  >
> > wrote:
> >
> > > Apart from reducing the number of facets in the query, is there any other query
> > > params or gc params or heap space or anything else that we need to
> tweak
> > > for improving search response time.
> > >
> > > On Sun, 19 Jan, 2020, 3:15 AM Erick Erickson,  >
> > > wrote:
> > >
> > >> Add debug=timing to the query and it’ll show you the time each
> component
> > >> takes.
> > >>
> > >> > On Jan 18, 2020, at 1:50 PM, Rajdeep Sahoo <
> rajdeepsahoo2...@gmail.com>
> > >> wrote:
> > >> >
> > >> > Thanks for the suggestion,
> > >> >
> > >> > Is there any way to get the info which operation or which query
> params
> > >> are
> > >> > increasing the response time.
> > >> >
> > >> >
> > >> > On Sat, 18 Jan, 2020, 11:59 PM Dave, 
> > >> wrote:
> > >> >
> > >> >> If you’re not getting values, don’t ask for the facet. Facets are
> > >> >> expensive as hell, maybe you should think more about your query’s
> than
> > >> your
> > >> >> infrastructure, solr cloud won’t help you at all especially if your
> > >> asking
> > >> >> for things you don’t need
> > >> >>
> > >> >>> On Jan 18, 2020, at 1:25 PM, Rajdeep Sahoo <
> > >> rajdeepsahoo2...@gmail.com>
> > >> >> wrote:
> > >> >>>
> > >> >>> We have assigned 16 gb out of 24gb for heap .
> > >> >>> No other process is running on that node.
> > >> >>>
> > >> >>> 200 facet fields are there in the query but we will not be
> getting
> > >> the
> > >> >>> values for each facets for every search.
> > >> >>> There can be max of 50-60 facets for which we will be getting
> values.
> > >> >>>
> > >> >>> We are using caching, is it not going to help?
> > >> >>>
> > >> >>>
> > >> >>>
> > >>  On Sat, 18 Jan, 2020, 11:36 PM Shawn Heisey, <
> apa...@elyograg.org>
> > >> >> wrote:
> > >> 
> > >> > On 1/18/2020 10:09 AM, Rajdeep Sahoo wrote:
> > >> > We are having 2.3 million documents and size is 2.5 gb.
> > >> >  10 core cpu and 24 gb ram . 16 slave nodes.
> > >> >
> > >> >  Still some of the queries are taking 50 sec at solr end.
> > >> > As we are using solr 4.6 .
> > >> >  Other thing is we are having 200 (avg) facet fields  in a
> query.
> > >> > And 30 searchable fields.
> > >> > Is there any way to identify why it is taking 50 sec for a
> query.
> > >> >Multiple concurrent requests are there.
> > >> 
> > >>  Searching 30 fields and computing 200 facets is never going to be
> > >> super
> > >>  fast.  Switching to cloud will not help, and might make it
> slower.
> > >> 
> > >>  Your index is pretty small to a lot of us.  There are people
> running
> > >>  indexes with billions of documents that take terabytes of disk
> space.
> > >> 
> > >>  As Walter mentioned, computing 200 facets is going to require a
> fair
> > >>  amount of heap memory.  One *possible* problem here is that the
> Solr
> > >>  heap size is too small, so a lot of GC is required.  How much of
> the
> > >>  24GB have you assigned to the heap?  Is there any software other
> than
> > >>  Solr running on these nodes?
> > >> 
> > >>  Thanks,
> > >>  Shawn
> > >> 
> > >> >>
> > >>
> > >>
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 

Re: Solr 6.3 and OpenJDK 11

2020-02-24 Thread Paras Lehana
Hi Arnold,

Why not simply use the latest Solr 8 with Java 11? Upgrade is worth it. :)

Actually, no one would be able to answer your question without first-hand
experience of that exact combination. Or you can experiment and report the
results here. :P

On Wed, 29 Jan 2020 at 04:43, Arnold Bronley 
wrote:

> Hi,
>
> How much of a problem would it be if I use OpenJDK 11 with Solr 6.3. I am
> aware that the system requirements page for Solr mentions that 'You should
> avoid Java 9 or later for Lucene/Solr 6.x or earlier.' I am interested in
> knowing what sort of functionalities would break in Solr if I try to use
> OpenJDK 11 with Solr 6.3.
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*



Re: Spell check with data from database and not from english dictionary

2020-02-24 Thread Paras Lehana
Just asking here.

What are the community's experiences with spellcheck backed by an external
file, versus a Synonym filter, for exact matches?

On Wed, 29 Jan 2020 at 17:28, seeteshh  wrote:

> Hello Jan
>
> Let me work on your suggestions too.
>
> Also I had one query
>
> While working on the spell check component, I don't get any suggestion for
> the incorrect word typed.
>
> example : In spellcheck.q,   I type "Teh" instead of "The" or "saa" instead
> of "sea"
>
>   "responseHeader":{
> "status":0,
> "QTime":0,
> "params":{
>   "spellcheck.q":"Teh",
>   "spellcheck":"on",
>   "spellcheck.reload":"true",
>   "spellcheck.build":"true",
>   "_":"1580287370193",
>   "spellcheck.collate":"true"}},
>   "command":"build",
>   "response":{"numFound":0,"start":0,"docs":[]
>   },
>   "spellcheck":{
> "suggestions":[],
> "collations":[]}}
>
> I have to create an entry in the synonyms.txt file for teh => The to make
> up
> for this issue.
>
> Does Solr require at least a four-character word in spellcheck.q to provide
> the proper suggestion for the misspelt word? Is there any section in the
> Reference Guide where it is documented? These are my findings/observations but need
> to know the rationale behind this.
>
> Regards,
>
> Seetesh Hindlekar
>
>
>
>
>
> -
> Seetesh Hindlekar
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
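For experimentation, a spellcheck request carrying the parameters shown in the responseHeader above could be assembled like this; the core name ("mycore") and the /spell handler path are assumptions, so use whatever your solrconfig.xml defines:

```python
from urllib.parse import urlencode

# Mirrors the params from the responseHeader in the message above.
# This only constructs the URL; it does not contact a Solr server.
params = {
    "spellcheck.q": "Teh",
    "spellcheck": "on",
    "spellcheck.build": "true",
    "spellcheck.reload": "true",
    "spellcheck.collate": "true",
}
url = "http://localhost:8983/solr/mycore/spell?" + urlencode(params)
```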




Re: Query Autocomplete Evaluation

2020-02-24 Thread Walter Underwood
Here is a blog article with a worked example for MRR based on customer clicks.

https://observer.wunderwood.org/2016/09/12/measuring-search-relevance-with-mrr/

At my place of work, we compare the CTR and MRR of queries using suggestions to 
those that do not use suggestions. Solr autosuggest based on a lexicon of book
titles is highly effective for us.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
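The CTR comparison Walter describes can be sketched as follows; all the counts here are invented purely for illustration:

```python
# Compare click-through rate for searches that used a suggestion vs. those
# where the user typed the full query themselves (numbers are invented).
def ctr(clicks, impressions):
    return clicks / impressions if impressions else 0.0

suggested = {"clicks": 5200, "impressions": 8000}
typed     = {"clicks": 3100, "impressions": 8000}

ctr_suggested = ctr(suggested["clicks"], suggested["impressions"])
ctr_typed = ctr(typed["clicks"], typed["impressions"])
lift = ctr_suggested / ctr_typed - 1  # relative improvement from suggestions
```

The same comparison applies to MRR: compute it separately over the two query populations and look at the lift.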

> On Feb 24, 2020, at 9:52 PM, Paras Lehana  wrote:
> 
> Hey Audrey,
> 
> I assume MRR is about the ranking of the intended suggestion. For this, no
> human judgement is required. We track position selection - the position
> (1-10) of the selected suggestion. For example, these are our recent numbers:
> 
> Position 1 Selected (B3) 107,699
> Position 2 Selected (B4) 58,736
> Position 3 Selected (B5) 23,507
> Position 4 Selected (B6) 12,250
> Position 5 Selected (B7) 7,980
> Position 6 Selected (B8) 5,653
> Position 7 Selected (B9) 4,193
> Position 8 Selected (B10) 3,511
> Position 9 Selected (B11) 2,997
> Position 10 Selected (B12) 2,428
> *Total Selections (B13)* *228,954*
> MRR = (B3+B4/2+B5/3+B6/4+B7/5+B8/6+B9/7+B10/8+B11/9+B12/10)/B13 = 66.45%
> 
> Refer here for MRR calculation keeping Auto-Suggest in perspective:
> https://medium.com/@dtunkelang/evaluating-search-measuring-searcher-behavior-5f8347619eb0
> 
> "In practice, this is inverted to obtain the reciprocal rank, e.g., if the
> searcher clicks on the 4th result, the reciprocal rank is 0.25. The average
> of these reciprocal ranks is called the mean reciprocal rank (MRR)."
> 
> nDCG may require human intervention. Please let me know in case I have not
> understood your question properly. :)
> 
> 
> 
> On Mon, 24 Feb 2020 at 20:49, Audrey Lorberfeld - audrey.lorberf...@ibm.com
>  wrote:
> 
>> Hi Paras,
>> 
>> This is SO helpful, thank you. Quick question about your MRR metric -- do
>> you have binary human judgements for your suggestions? If no, how do you
>> label suggestions successful or not?
>> 
>> Best,
>> Audrey
>> 
>> On 2/24/20, 2:27 AM, "Paras Lehana"  wrote:
>> 
>>Hi Audrey,
>> 
>>I work for Auto-Suggest at IndiaMART. Although we don't use the
>> Suggester
>>component, I think you need evaluation metrics for Auto-Suggest as a
>>business product and not specifically for Solr Suggester which is the
>>backend. We use edismax parser with EdgeNGrams Tokenization.
>> 
>>Every week, as the property owner, I report around 500 metrics. I would
>>like to mention a few of those:
>> 
>>   1. MRR (Mean Reciprocal Rank): How high the user's selection ranked
>> among the
>>   returned results. Ranges from 0 to 1, the higher the better.
>>   2. APL (Average Prefix Length): Prefix is the query typed by the user.
>> The lower the
>>   better. This reports how little an average user has to type to reach the
>>   intended suggestion.
>>   3. Acceptance Rate or Selection: How many of the total searches are
>>   being served from Auto-Suggest. We are around 50%.
>>   4. Selection to Display Ratio: Did the user click any of the
>>   suggestions when they were displayed?
>>   5. Response Time: How fast are you serving your average query.
>> 
>> 
>>The Selection and Response Time are our main KPIs. We track a lot about
>>Auto-Suggest usage on our platform which becomes apparent if you
>> observe
>>the URL after clicking a suggestion on dir.indiamart.com. However, not
>>everything would benefit you. Do let me know for any related query or
>>explanation. Hope this helps. :)
>> 
>>On Fri, 14 Feb 2020 at 21:23, Audrey Lorberfeld -
>> audrey.lorberf...@ibm.com
>> wrote:
>> 
>>> Hi all,
>>> 
>>> How do you all evaluate the success of your query autocomplete (i.e.
>>> suggester) component if you use it?
>>> 
>>> We cannot use MRR for various reasons (I can go into them if you're
>>> interested), so we're thinking of using nDCG since we already use
>> that for
>>> relevance eval of our system as a whole. I am also interested in the
>> metric
>>> "success at top-k," but I can't find any research papers that
>> explicitly
>>> define "success" -- I am assuming it's a suggestion (or suggestions)
>>> labeled "relevant," but maybe it could also simply be the suggestion
>> that
>>> receives a click from the user?
>>> 
>>> Would love to hear from the hive mind!
>>> 
>>> Best,
>>> Audrey
>>> 
>>> --
>>> 
>>> 
>>> 
>> 

Re: Re: Query Autocomplete Evaluation

2020-02-24 Thread Paras Lehana
Hey Audrey,

I assume MRR is about the ranking of the intended suggestion. For this, no
human judgement is required. We track position selection - the position
(1-10) of the selected suggestion. For example, these are our recent numbers:

Position 1 Selected (B3) 107,699
Position 2 Selected (B4) 58,736
Position 3 Selected (B5) 23,507
Position 4 Selected (B6) 12,250
Position 5 Selected (B7) 7,980
Position 6 Selected (B8) 5,653
Position 7 Selected (B9) 4,193
Position 8 Selected (B10) 3,511
Position 9 Selected (B11) 2,997
Position 10 Selected (B12) 2,428
*Total Selections (B13)* *228,954*
MRR = (B3+B4/2+B5/3+B6/4+B7/5+B8/6+B9/7+B10/8+B11/9+B12/10)/B13 = 66.45%

Refer here for MRR calculation keeping Auto-Suggest in perspective:
https://medium.com/@dtunkelang/evaluating-search-measuring-searcher-behavior-5f8347619eb0

"In practice, this is inverted to obtain the reciprocal rank, e.g., if the
searcher clicks on the 4th result, the reciprocal rank is 0.25. The average
of these reciprocal ranks is called the mean reciprocal rank (MRR)."

nDCG may require human intervention. Please let me know in case I have not
understood your question properly. :)
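The position counts and spreadsheet formula above can be checked directly. This sketch reproduces the arithmetic: each selection at position k contributes a reciprocal rank of 1/k, and the small difference from the quoted 66.45% is rounding:

```python
# Position -> number of selections, copied from the table above.
selections = {1: 107699, 2: 58736, 3: 23507, 4: 12250, 5: 7980,
              6: 5653, 7: 4193, 8: 3511, 9: 2997, 10: 2428}

total = sum(selections.values())  # 228,954 total selections
# Mean reciprocal rank: average of 1/position over all selections.
mrr = sum(n / pos for pos, n in selections.items()) / total
print(f"MRR = {mrr:.2%}")  # ~66.44%
```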



On Mon, 24 Feb 2020 at 20:49, Audrey Lorberfeld - audrey.lorberf...@ibm.com
 wrote:

> Hi Paras,
>
> This is SO helpful, thank you. Quick question about your MRR metric -- do
> you have binary human judgements for your suggestions? If no, how do you
> label suggestions successful or not?
>
> Best,
> Audrey
>
> On 2/24/20, 2:27 AM, "Paras Lehana"  wrote:
>
> Hi Audrey,
>
> I work for Auto-Suggest at IndiaMART. Although we don't use the
> Suggester
> component, I think you need evaluation metrics for Auto-Suggest as a
> business product and not specifically for Solr Suggester which is the
> backend. We use edismax parser with EdgeNGrams Tokenization.
>
> Every week, as the property owner, I report around 500 metrics. I would
> like to mention a few of those:
>
>    1. MRR (Mean Reciprocal Rank): How high the user's selection ranked
> among the
>    returned results. Ranges from 0 to 1, the higher the better.
>    2. APL (Average Prefix Length): Prefix is the query typed by the user.
> The lower the
>    better. This reports how little an average user has to type to reach
> the
>    intended suggestion.
>3. Acceptance Rate or Selection: How many of the total searches are
>being served from Auto-Suggest. We are around 50%.
>    4. Selection to Display Ratio: Did the user click any
> of the
>    suggestions when they were displayed?
>5. Response Time: How fast are you serving your average query.
>
>
> The Selection and Response Time are our main KPIs. We track a lot about
> Auto-Suggest usage on our platform which becomes apparent if you
> observe
> the URL after clicking a suggestion on dir.indiamart.com. However, not
> everything would benefit you. Do let me know for any related query or
> explanation. Hope this helps. :)
>
> On Fri, 14 Feb 2020 at 21:23, Audrey Lorberfeld -
> audrey.lorberf...@ibm.com
>  wrote:
>
> > Hi all,
> >
> > How do you all evaluate the success of your query autocomplete (i.e.
> > suggester) component if you use it?
> >
> > We cannot use MRR for various reasons (I can go into them if you're
> > interested), so we're thinking of using nDCG since we already use
> that for
> > relevance eval of our system as a whole. I am also interested in the
> metric
> > "success at top-k," but I can't find any research papers that
> explicitly
> > define "success" -- I am assuming it's a suggestion (or suggestions)
> > labeled "relevant," but maybe it could also simply be the suggestion
> that
> > receives a click from the user?
> >
> > Would love to hear from the hive mind!
> >
> > Best,
> > Audrey
> >
> > --
> >
> >
> >
>



Re: Solr Cloud Question

2020-02-24 Thread Erick Erickson
Assuming that the Solr node you stop does not contain all of the replicas for 
any shard, there should really be no effect. I do strongly recommend that you 
stop the node gracefully if possible unless what you really want to test is 
when a node goes away mysteriously...

What’ll happen is that all of the replicas on the downed node will be marked as
“down”. When the node comes back, the replicas on it will re-sync with the 
current leader and start handling queries and updates again. There should be no 
loss of data or data inconsistencies.
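One way to verify that the replicas came back after the restart, as Erick describes, is to poll the Collections API CLUSTERSTATUS action and check each replica's state. The host and collection names below are placeholders, and the snippet only constructs the request URL:

```python
from urllib.parse import urlencode

# CLUSTERSTATUS reports the state ("active", "down", "recovering") of every
# replica in the collection -- useful for wiring up the alarms being tested.
params = {"action": "CLUSTERSTATUS", "collection": "mycollection", "wt": "json"}
url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
```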

Best,
Erick

> On Feb 24, 2020, at 4:40 PM, Kevin Sante  wrote:
> 
> Hello guys,
> 
> I need some help understanding the setup with solr cloud. I am a newbie to 
> solr and I have successfully set up solr cloud with some alarms on AWS.
> 
> I have two solr nodes and 3 zookeeper nodes for my set up. I already have 
> data indexed on the nodes and I am able to query the data from my website.
> 
> The question I have is what impact it will have for me to stop one of the 
> solr cloud nodes and then restart it. I want to test if my alarms are right 
> or not.
> 
> Thank you
> 


Re: Reindex Required for Merge Policy Changes?

2020-02-24 Thread Erick Erickson
Thomas:
Yes, upgrading to 7.5+ will automagically take advantage of the improvements, 
eventually... No, you don’t have to reindex.

The “eventually” part. As you add, and particularly replace, existing 
documents, TMP will make decisions based on the new policy. If you’ve optimized 
in the past and have a very large segment (I.e. > 5G), it’ll be rewritten when 
the number of deleted docs exceeds the threshold; I don’t remember what the 
exact number is. Point is it’ll recover from having an over-large segment over 
time and _eventually_ the largest segment will be < 5G.

Absent a previous optimize making a large segment, I’d just consider optimizing 
after you’ve upgraded. The TMP revisions respect the max segment size, so that 
should purge all deleted documents from your index without creating a too-large 
one. Thereafter the number of deleted docs should remain < about 33%. It only 
really approaches that percentage when you’re updating lots of existing docs.

Finally, expungeDeletes is less expensive than optimize because it doesn’t 
rewrite segments with less than 10% deleted docs, so that’s an alternative to optimizing 
after upgrading.
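As a sketch, the expungeDeletes commit Erick mentions is passed to the update handler as a request parameter. The core name below is a placeholder, and this only constructs the URL rather than contacting Solr:

```python
from urllib.parse import urlencode

# A commit with expungeDeletes=true merges away segments above the
# deleted-docs threshold without the full rewrite an optimize would do.
params = {"commit": "true", "expungeDeletes": "true"}
url = "http://localhost:8983/solr/mycore/update?" + urlencode(params)
```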


Best,
Erick

> On Feb 24, 2020, at 5:42 PM, Zimmermann, Thomas  
> wrote:
> 
> Hi Folks –
> 
> Few questions before I tackled an upgrade here. Looking to go from 7.4 to 
> 7.7.2 to take advantage of the improved Tiered Merge Policy and segment 
> cleanup – we are dealing with some high (45%) deleted doc counts in a few 
> cores. Would simply upgrading Solr and setting the cores to use Lucene 7.7.2 
> take advantage of these features? Would I need to reindex to get existing 
> segments merged more efficiently? Does it depend on the size of my current 
> segments vs the configuration of the merge policy or would upgrading simply 
> allow Solr to do its own thing and help mitigate this issue?
> 
> Also – I noticed the 7.5+ defaults to the Autoscaling for replication, and 
> 8.0 defaults to legacy. Would I potentially need to make changes to my 
> existing configs to ensure they stay on Legacy replication?
> 
> Thanks much!
> TZ
> 
> 
> 


Reindex Required for Merge Policy Changes?

2020-02-24 Thread Zimmermann, Thomas
Hi Folks –

Few questions before I tackled an upgrade here. Looking to go from 7.4 to 7.7.2 
to take advantage of the improved Tiered Merge Policy and segment cleanup – we 
are dealing with some high (45%) deleted doc counts in a few cores. Would 
simply upgrading Solr and setting the cores to use Lucene 7.7.2 take advantage 
of these features? Would I need to reindex to get existing segments merged more 
efficiently? Does it depend on the size of my current segments vs the 
configuration of the merge policy or would upgrading simply allow solr to do 
its own thing and help mitigate this issue?

Also – I noticed the 7.5+ defaults to the Autoscaling for replication, and 8.0 
defaults to legacy. Would I potentially need to make changes to my existing 
configs to ensure they stay on Legacy replication?

Thanks much!
TZ





Solr Cloud Question

2020-02-24 Thread Kevin Sante
Hello guys,

I need some help understanding the setup with solr cloud. I am a newbie to solr 
and I have successfully set up solr cloud with some alarms on AWS.

I have two solr nodes and 3 zookeeper nodes for my set up. I already have 
data indexed on the nodes and I am able to query the data from my website.

The question I have is what impact it will have for me to stop one of the solr 
cloud nodes and then restart it. I want to test if my alarms are right or not.

Thank you



Re: 5th solr node does not connect to zk host

2020-02-24 Thread Sotiris Fragkiskos
It was problem with /etc/resolv.conf in the end, unrelated to solr or zk.
Thanks Erick!


On Mon, Feb 24, 2020 at 3:20 PM Erick Erickson 
wrote:

> Your solr logs on that node will probably give you a better idea of what
> the root cause is.
>
> best,
> Erick
>
> > On Feb 24, 2020, at 8:18 AM, Sotiris Fragkiskos 
> wrote:
> >
> > Hi all!
> > I had a cluster with 2 solr nodes, and a single zk host and it all worked
> > fine.
> > Then we got 3 more VMs, so i configured them *in parallel* to the *exact*
> > same configuration, so that they can be added to the existing cluster.
> > 2 of them connected fine, and the third just refuses to.
> > I can see it in the solr cloud, but cpu and heap have empty "%" signs
> next
> > to them.
> > When doing service solr status, the error I get is
> >
> > ERROR: Failed to get system information from http://
> ***.***.**.**:8983/solr
> > due to: org.apache.solr.common.SolrException: Parse error : 
> > 
> > 
> > Error 404 Not Found
> > 
> > HTTP ERROR 404
> > Problem accessing /solr/admin/info/system. Reason:
> > Not Found
> > Caused by: javax.servlet.ServletException:
> > javax.servlet.UnavailableException: Error processing the request.
> > CoreContainer is either not initialized or shutting down.
> >
> > Any idea what this is?
> >
> > Thanks in advance!
> >
> > Sotiri
>


Re: Re: Query Autocomplete Evaluation

2020-02-24 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hi Paras,

This is SO helpful, thank you. Quick question about your MRR metric -- do you 
have binary human judgements for your suggestions? If no, how do you label 
suggestions successful or not?

Best,
Audrey

On 2/24/20, 2:27 AM, "Paras Lehana"  wrote:

Hi Audrey,

I work for Auto-Suggest at IndiaMART. Although we don't use the Suggester
component, I think you need evaluation metrics for Auto-Suggest as a
business product and not specifically for Solr Suggester which is the
backend. We use edismax parser with EdgeNGrams Tokenization.

Every week, as the property owner, I report around 500 metrics. I would
like to mention a few of those:

   1. MRR (Mean Reciprocal Rank): How high the user's selection ranked among the
   returned results. Ranges from 0 to 1, the higher the better.
   2. APL (Average Prefix Length): Prefix is the query typed by the user. The lower the
   better. This reports how little an average user has to type to reach the
   intended suggestion.
   3. Acceptance Rate or Selection: How many of the total searches are
   being served from Auto-Suggest. We are around 50%.
   4. Selection to Display Ratio: Did the user click any of the
   suggestions when they were displayed?
   5. Response Time: How fast are you serving your average query.


The Selection and Response Time are our main KPIs. We track a lot about
Auto-Suggest usage on our platform which becomes apparent if you observe
the URL after clicking a suggestion on dir.indiamart.com. However, not
everything would benefit you. Do let me know for any related query or
explanation. Hope this helps. :)

On Fri, 14 Feb 2020 at 21:23, Audrey Lorberfeld - audrey.lorberf...@ibm.com
 wrote:

> Hi all,
>
> How do you all evaluate the success of your query autocomplete (i.e.
> suggester) component if you use it?
>
> We cannot use MRR for various reasons (I can go into them if you're
> interested), so we're thinking of using nDCG since we already use that for
> relevance eval of our system as a whole. I am also interested in the 
metric
> "success at top-k," but I can't find any research papers that explicitly
> define "success" -- I am assuming it's a suggestion (or suggestions)
> labeled "relevant," but maybe it could also simply be the suggestion that
> receives a click from the user?
>
> Would love to hear from the hive mind!
>
> Best,
> Audrey
>
> --
>
>
>






Re: 5th solr node does not connect to zk host

2020-02-24 Thread Erick Erickson
Your solr logs on that node will probably give you a better idea of what the 
root cause is.

best, 
Erick 

> On Feb 24, 2020, at 8:18 AM, Sotiris Fragkiskos  wrote:
> 
> Hi all!
> I had a cluster with 2 solr nodes, and a single zk host and it all worked
> fine.
> Then we got 3 more VMs, so i configured them *in parallel* to the *exact*
> same configuration, so that they can be added to the existing cluster.
> 2 of them connected fine, and the third just refuses to.
> I can see it in the solr cloud, but cpu and heap have empty "%" signs next
> to them.
> When doing service solr status, the error I get is
> 
> ERROR: Failed to get system information from http://***.***.**.**:8983/solr
> due to: org.apache.solr.common.SolrException: Parse error : 
> 
> 
> Error 404 Not Found
> 
> HTTP ERROR 404
> Problem accessing /solr/admin/info/system. Reason:
> Not Found
> Caused by: javax.servlet.ServletException:
> javax.servlet.UnavailableException: Error processing the request.
> CoreContainer is either not initialized or shutting down.
> 
> Any idea what this is?
> 
> Thanks in advance!
> 
> Sotiri


Re: Missing methods in solr-common.jar

2020-02-24 Thread Sachin Divekar
Thanks, Jan. Yep, solr-common.jar was not required.

--
Sachin

On Mon, Feb 24, 2020 at 6:56 PM Jan Høydahl  wrote:

> solr-common is old. Try to include solr-core:8.4.1 in your project instead
> (from Maven)
>
> Check this example if you’re stuck:
> https://github.com/cominvent/solr-mapping-processor
>
> Jan
>
> > 24. feb. 2020 kl. 14:03 skrev Sachin Divekar :
> >
> > Hi,
> >
> > I am developing a custom update processor. I am using
> solr-common.jar:1.3.0
> > which I found on Maven.
> >
> > I am studying the code in Solr repo. I found there are many methods
> > available in src/java/org/apache/solr/common/SolrInputDocument.java which
> > are not available for me after importing solr-common.jar:1.3.0 in my
> > project. For example getValue(), setField(), etc.
> >
> > What am I missing? Can I download the latest solr-common.jar from any
> > official source? Or can I build it from Solr codebase? I checked
> build.xml but
> > did not find the task for building solr-common.jar.
> >
> > Thanks
> > Sachin
>
>


Re: Missing methods in solr-common.jar

2020-02-24 Thread Jan Høydahl
solr-common is old. Try to include solr-core:8.4.1 in your project instead 
(from Maven)

Check this example if you’re stuck: 
https://github.com/cominvent/solr-mapping-processor

Jan

> 24. feb. 2020 kl. 14:03 skrev Sachin Divekar :
> 
> Hi,
> 
> I am developing a custom update processor. I am using solr-common.jar:1.3.0
> which I found on Maven.
> 
> I am studying the code in Solr repo. I found there are many methods
> available in src/java/org/apache/solr/common/SolrInputDocument.java which
> are not available for me after importing solr-common.jar:1.3.0 in my
> project. For example getValue(), setField(), etc.
> 
> What am I missing? Can I download the latest solr-common.jar from any
> official source? Or can I build it from Solr codebase? I checked build.xml but
> did not find the task for building solr-common.jar.
> 
> Thanks
> Sachin



5th solr node does not connect to zk host

2020-02-24 Thread Sotiris Fragkiskos
Hi all!
I had a cluster with 2 solr nodes, and a single zk host and it all worked
fine.
Then we got 3 more VMs, so i configured them *in parallel* to the *exact*
same configuration, so that they can be added to the existing cluster.
2 of them connected fine, and the third just refuses to.
I can see it in the solr cloud, but cpu and heap have empty "%" signs next
to them.
When doing service solr status, the error I get is

ERROR: Failed to get system information from http://***.***.**.**:8983/solr
due to: org.apache.solr.common.SolrException: Parse error : 


Error 404 Not Found

HTTP ERROR 404
Problem accessing /solr/admin/info/system. Reason:
Not Found
Caused by: javax.servlet.ServletException:
javax.servlet.UnavailableException: Error processing the request.
CoreContainer is either not initialized or shutting down.

Any idea what this is?

Thanks in advance!

Sotiri


Missing methods in solr-common.jar

2020-02-24 Thread Sachin Divekar
Hi,

I am developing a custom update processor. I am using solr-common.jar:1.3.0
which I found on Maven.

I am studying the code in Solr repo. I found there are many methods
available in src/java/org/apache/solr/common/SolrInputDocument.java which
are not available for me after importing solr-common.jar:1.3.0 in my
project. For example getValue(), setField(), etc.

What am I missing? Can I download the latest solr-common.jar from any
official source? Or can I build it from Solr codebase? I checked build.xml but
did not find the task for building solr-common.jar.

Thanks
Sachin


Re: Ordering in Nested Document

2020-02-24 Thread Mikhail Khludnev
You may try. The content type should be absolutely the same across parents and
childfree documents. It may work now;
earlier, mixing blocks and childfree documents in one index wasn't supported.

On Mon, Feb 24, 2020 at 2:57 AM Gajendra Dadheech 
wrote:

> That extra s was intentional, should have added a better name.
>
> So ideally we shouldn't have childfree and blocks together while indexing?
> Or in the whole index they shouldn't be together, i.e. we should have
> at least one child doc for every doc if any doc has one?
>
> On Mon, Feb 24, 2020 at 4:24 PM Mikhail Khludnev  wrote:
>
> > Hello, Gajendra.
> > Pics doesn't come through mailing list.
> > May it be caused by the unnecessary *s* in
> > sparentDocument?
> > At least earlier mixing childfrees and blocks wasn't allowed, and caused
> > some troubles. Usually, a child stub was used to keep childfree docs in the index.
> >
> > On Mon, Feb 24, 2020 at 2:22 AM Gajendra Dadheech 
> > wrote:
> >
> > > Hi
> > >
> > > I want to ingest the documents below, where there is a mix of nested and
> > > un-nested documents:
> > > <add>
> > >   <doc>
> > >     <field name="id">5</field>
> > >     <field name="number">5</field>
> > >     <field name="title">5Solr adds block join support</field>
> > >     <field name="content_type">sparentDocument</field>
> > >   </doc>
> > >   <doc>
> > >     <field name="id">1</field>
> > >     <field name="number">1</field>
> > >     <field name="title">Solr adds block join support</field>
> > >     <field name="content_type">parentDocument</field>
> > >     <doc>
> > >       <field name="id">2</field>
> > >       <field name="number">1</field>
> > >       <field name="title">SolrCloud supports it too!</field>
> > >       <field name="content_type">childDocument</field>
> > >     </doc>
> > >   </doc>
> > >   <doc>
> > >     <field name="id">3</field>
> > >     <field name="number">3</field>
> > >     <field name="title">New Lucene and Solr release is out</field>
> > >     <field name="content_type">parentDocument</field>
> > >     <doc>
> > >       <field name="id">4</field>
> > >       <field name="number">4</field>
> > >       <field name="title">Lots of new features</field>
> > >       <field name="content_type">childDocument</field>
> > >     </doc>
> > >   </doc>
> > > </add>
> > >
> > >
> > > Output of block join query after ingesting above docs:
> > > [image: image.png]
> > >
> > > So doc id 5 is getting linked to doc id 1. Is this expected behavior? I
> > > believe id 5 should be a different document tree.
> > >
> > > Should I ingest them in some order?
> > >
> > >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Auto-Suggest within Tier Architecture

2020-02-24 Thread Paras Lehana
Hi Brett,

We, at IndiaMART, have Solr installed behind PHP servers which are
behind Varnish servers.

Yes, you are right: exposing the Solr URL is not a good idea. A single service
in between would do the trick.

You can try our service at dir.indiamart.com. We have a client-side JS that
handles AJAX requests per keystroke. Do let me know for any other queries.
:)
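
As a rough illustration of such an in-between service: the public endpoint accepts only the user's text and rebuilds the internal Solr URL itself, so no Solr hostnames or parameters ever reach the browser. The host, core, and suggest parameters below are hypothetical, not IndiaMART's actual setup:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch of a tiered auto-suggest proxy: the browser only ever sees the
// public endpoint; the internal Solr address and all query parameters
// are fixed server-side. All names and values here are hypothetical.
public class SuggestProxy {
    // Internal T3 address; never sent to the client.
    static final String INTERNAL_SOLR = "http://10.0.3.5:8983/solr/suggest_core";

    // Only the user's query text crosses the tier boundary; every other
    // parameter is appended server-side.
    public static String internalUrl(String userQuery) {
        String q = URLEncoder.encode(userQuery, StandardCharsets.UTF_8);
        return INTERNAL_SOLR + "/suggest?suggest=true&suggest.count=10&suggest.q=" + q;
    }

    public static void main(String[] args) {
        System.out.println(internalUrl("lapt"));
    }
}
```

Each tier then only needs to forward this one narrow call downward, which is essentially the "steps down each tier" approach described above.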

On Mon, 3 Feb 2020 at 22:10, Moyer, Brett  wrote:

> Hello,
>
> Looking to see how others accomplished this goal. We have a 3 Tier
> architecture, Solr is down deep in T3 far from the end user. How do you
> make Auto-Suggest calls from the Internet Browser through the Tiers down to
> Solr in T3? We essentially created steps down each tier, but I'm looking to
> know what other approaches people have created. Did you put your solr in
> T1, I assume not, that would put it at risk. Thanks!
>
> Brett Moyer
> *
> This e-mail may contain confidential or privileged information.
> If you are not the intended recipient, please notify the sender
> immediately and then delete it.
>
> TIAA
> *
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*



Re: Exact search in Solr

2020-02-24 Thread Paras Lehana
Use fieldType string.

If you're using a custom fieldType, "secret" would not match "secrets"
unless you use an appropriate analyzer (Stemmer, EdgeNGrams), but it may
still match "secret something" if you're using StandardTokenizer or
something similar (use KeywordTokenizer to avoid that).
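
A rough schema.xml sketch of the tokenization-only approach (solution 2), combined with the dual-field idea from later in the thread; type and field names are illustrative:

```xml
<!-- Tokenization only: no lowercasing, no stemming, so "secret" will not
     match "secrets". Type and field names are illustrative. -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

<field name="body" type="text_general" indexed="true" stored="true"/>
<field name="body_exact" type="text_exact" indexed="true" stored="false"/>
<copyField source="body" dest="body_exact"/>
```

Queries that need exact matching then target body_exact, while full-text queries keep using body; boosting one over the other is a matter of query-time field weights.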

On Tue, 4 Feb 2020 at 20:28, yeikel valdes  wrote:

> You can store a non-analyzed version and copy it to an analyzed field.
>
>
> If you need full text search, use the analyzed version. Otherwise use
> the non-analyzed version.
>
>
> If you want to search both, you could still do that and boost the
> non-analyzed version if needed
>
>
>
>
>  On Tue, 04 Feb 2020 04:50:22 -0500 m...@apache.org wrote 
>
>
> Hello, Łukasz
> The latter, for sure.
>
> On Tue, Feb 4, 2020 at 12:44 PM Antczak, Lukasz
>  wrote:
>
> > Hi, Solr experts!
> >
> > I would like to learn from you if there is a better solution for doing
> > 'exact search' in Solr.
> > Exact search means no analysis of the text other than tokenization. The
> > query "secret" gives back only documents containing exactly "secret", not
> > "secrets", "secretion", etc. The text that needs to be searched is the
> > content of some articles.
> >
> > Solution 1. - index whole text as string, use regex for searching.
> > Solution 2. - index text with just tokenization, no lowercase, stemming,
> > etc.
> >
> > Which solution will be faster? Any other clever ideas to be evaluated?
> >
> > Regards
> > Łukasz Antczak
> > --
> > *Łukasz Antczak*
> > Senior IT Professional
> > GS Data Frontiers Team 
> >
> > *Planned absences:*
> > *Roche Polska Sp. z o.o.*
> > ADMD Group Services - Business Intelligence Team
> > HQ: ul. Domaniewska 39B, 02-672 Warszawa
> > Office: ul. Abpa Baraniaka 88D, 61-131 Poznań
> >
> > Mobile: +48 519 515 010
> > mailto: lukasz.antc...@roche.com
> >
> > *Informacja o poufności: *Treść tej wiadomości zawiera informacje
> > przeznaczone tylko dla adresata. Jeżeli nie jesteście Państwo jej
> > adresatem, bądź otrzymaliście ją przez pomyłkę, prosimy o powiadomienie o
> > tym nadawcy oraz trwałe jej usunięcie. Wszelkie nieuprawnione
> > wykorzystanie informacji zawartych w tej wiadomości jest zabronione.
> >
> > *Confidentiality Note:* This message is intended only for the use of the
> > named recipient(s) and may contain confidential and/or proprietary
> > information. If you are not the intended recipient, please contact the
> > sender and delete this message. Any unauthorized use of the information
> > contained in this message is prohibited.
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*



Re: ID is a required field in SolrSchema . But not found in DataConfig

2020-02-24 Thread Paras Lehana
Your schema declares id as a required field. You need to tell Solr which
field from the source (most probably, the primary key) maps to the id field
in the Solr index.
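
Assuming the entity is a SolrEntityProcessor (which the batchSize/fl/wt/url attributes in the quoted config suggest), one way to make the id mapping explicit — and likely silence the warning — is a sketch like this; the entity name is illustrative:

```xml
<!-- Sketch: declare the uniqueKey mapping explicitly so DIH knows which
     source field feeds Solr's required id field. Entity name is illustrative. -->
<entity name="source" processor="SolrEntityProcessor"
        url="http://127.0.0.1/solr/at-uk"
        query="*:*" wt="javabin"
        fl="id,*,old_version:_version_">
  <field column="id" name="id"/>
</entity>
```

Listing id explicitly in fl also guards against it being dropped from the copied documents.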

On Tue, 4 Feb 2020 at 23:15, Karl Stoney
 wrote:

> Hey all,
> I'm trying to use the DIH to copy from one collection to another, it
> appears to work (data gets copied) however I've noticed this in the logs:
>
> 17:39:58.167 [qtp1472216456-87] INFO
> org.apache.solr.handler.dataimport.config.DIHConfiguration - ID is a
> required field in SolrSchema . But not found in DataConfig
>
> I can't find the appropriate configuration to get rid of it.  Do I need to
> care?
>
> My config looks like this:
>
> <dataConfig>
>   <document>
>     <entity processor="SolrEntityProcessor"
>             query="*:*"
>             batchSize="1000"
>             fl="*,old_version:_version_,old_lmake:L_MAKE,old_lmodel:L_MODEL"
>             wt="javabin"
>             url="http://127.0.0.1/solr/at-uk"/>
>   </document>
> </dataConfig>
>
> Cheers
> Karl
> This e-mail is sent on behalf of Auto Trader Group Plc, Registered Office:
> 1 Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England
> No. 9439967). This email and any files transmitted with it are confidential
> and may be legally privileged, and intended solely for the use of the
> individual or entity to whom they are addressed. If you have received this
> email in error please notify the sender. This email message has been swept
> for the presence of computer viruses.
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*



Re: Ordering in Nested Document

2020-02-24 Thread Gajendra Dadheech
That extra s was intentional, should have added a better name.

So ideally we shouldn't have childfree docs and blocks together while
indexing? Or should they not be together anywhere in the index, i.e. should
every doc have at least one child doc if any doc has one?

On Mon, Feb 24, 2020 at 4:24 PM Mikhail Khludnev  wrote:

> Hello, Gajendra.
> Pics don't come through the mailing list.
> Could it be caused by the unnecessary *s* in
> *s*parentDocument?
> At least, earlier, mixing childfree docs and blocks wasn't allowed and
> caused some troubles. Usually, a child stub is used to keep childfree
> docs in the index.
>
> On Mon, Feb 24, 2020 at 2:22 AM Gajendra Dadheech 
> wrote:
>
> > Hi
> >
> > I want to ingest the documents below, where there is a mix of nested and
> > un-nested documents:
> > <add>
> >   <doc>
> >     <field name="id">5</field>
> >     <field name="number">5</field>
> >     <field name="title">5Solr adds block join support</field>
> >     <field name="content_type">sparentDocument</field>
> >   </doc>
> >   <doc>
> >     <field name="id">1</field>
> >     <field name="number">1</field>
> >     <field name="title">Solr adds block join support</field>
> >     <field name="content_type">parentDocument</field>
> >     <doc>
> >       <field name="id">2</field>
> >       <field name="number">1</field>
> >       <field name="title">SolrCloud supports it too!</field>
> >       <field name="content_type">childDocument</field>
> >     </doc>
> >   </doc>
> >   <doc>
> >     <field name="id">3</field>
> >     <field name="number">3</field>
> >     <field name="title">New Lucene and Solr release is out</field>
> >     <field name="content_type">parentDocument</field>
> >     <doc>
> >       <field name="id">4</field>
> >       <field name="number">4</field>
> >       <field name="title">Lots of new features</field>
> >       <field name="content_type">childDocument</field>
> >     </doc>
> >   </doc>
> > </add>
> >
> >
> > Output of block join query after ingesting above docs:
> > [image: image.png]
> >
> > So doc id 5 is getting linked to doc id 1. Is this expected behavior? I
> > believe id 5 should be a different document tree.
> >
> > Should I ingest them in some order?
> >
> >
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Ordering in Nested Document

2020-02-24 Thread Mikhail Khludnev
Hello, Gajendra.
Pics don't come through the mailing list.
Could it be caused by the unnecessary *s* in *s*parentDocument?
At least, earlier, mixing childfree docs and blocks wasn't allowed and caused
some troubles. Usually, a child stub is used to keep childfree docs in the index.

On Mon, Feb 24, 2020 at 2:22 AM Gajendra Dadheech 
wrote:

> Hi
>
> I want to ingest the documents below, where there is a mix of nested and
> un-nested documents:
> <add>
>   <doc>
>     <field name="id">5</field>
>     <field name="number">5</field>
>     <field name="title">5Solr adds block join support</field>
>     <field name="content_type">sparentDocument</field>
>   </doc>
>   <doc>
>     <field name="id">1</field>
>     <field name="number">1</field>
>     <field name="title">Solr adds block join support</field>
>     <field name="content_type">parentDocument</field>
>     <doc>
>       <field name="id">2</field>
>       <field name="number">1</field>
>       <field name="title">SolrCloud supports it too!</field>
>       <field name="content_type">childDocument</field>
>     </doc>
>   </doc>
>   <doc>
>     <field name="id">3</field>
>     <field name="number">3</field>
>     <field name="title">New Lucene and Solr release is out</field>
>     <field name="content_type">parentDocument</field>
>     <doc>
>       <field name="id">4</field>
>       <field name="number">4</field>
>       <field name="title">Lots of new features</field>
>       <field name="content_type">childDocument</field>
>     </doc>
>   </doc>
> </add>
>
>
> Output of block join query after ingesting above docs:
> [image: image.png]
>
> So doc id 5 is getting linked to doc id 1. Is this expected behavior? I
> believe id 5 should be a different document tree.
>
> Should I ingest them in some order?
>
>

-- 
Sincerely yours
Mikhail Khludnev


Ordering in Nested Document

2020-02-24 Thread Gajendra Dadheech
Hi

I want to ingest the documents below, where there is a mix of nested and
un-nested documents:
<add>
  <doc>
    <field name="id">5</field>
    <field name="number">5</field>
    <field name="title">5Solr adds block join support</field>
    <field name="content_type">sparentDocument</field>
  </doc>
  <doc>
    <field name="id">1</field>
    <field name="number">1</field>
    <field name="title">Solr adds block join support</field>
    <field name="content_type">parentDocument</field>
    <doc>
      <field name="id">2</field>
      <field name="number">1</field>
      <field name="title">SolrCloud supports it too!</field>
      <field name="content_type">childDocument</field>
    </doc>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="number">3</field>
    <field name="title">New Lucene and Solr release is out</field>
    <field name="content_type">parentDocument</field>
    <doc>
      <field name="id">4</field>
      <field name="number">4</field>
      <field name="title">Lots of new features</field>
      <field name="content_type">childDocument</field>
    </doc>
  </doc>
</add>


Output of block join query after ingesting above docs:
[image: image.png]

So doc id 5 is getting linked to doc id 1. Is this expected behavior? I
believe id 5 should be a different document tree.

Should I ingest them in some order?


Re: How to monitor the performance of the SolrCloud cluster in real time

2020-02-24 Thread Adonis Ling
Hi Emir,

Thanks for your suggestion!

A cloud-based monitoring solution doesn't satisfy our security requirements,
so we can't use it.


Emir Arnautović  于2020年2月24日周一 下午3:48写道:

> Hi Adonis,
> If you are up to 3rd party, cloud based monitoring solution, you can try
> our integration for Solr/SolrCloud: https://sematext.com/cloud/ <
> https://sematext.com/cloud/>
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 22 Feb 2020, at 08:54, Adonis Ling  wrote:
> >
> > Hi team,
> >
> > Our team is using Solr as a complementary full text search service for
> our
> > NoSQL database and I'm building the monitor system for Solr.
> >
> > After I read the related section (Performance Statistics Reference) in
> > the reference guide, I realized the requestTimes metrics are collected since
> > the Solr core was first created. Is it possible to monitor the requests
> > (count or latency) of a collection in real time?
> >
> > I think it should reset the related metrics periodically. Are there some
> > configurations to do this?
> >
> > --
> > Adonis
>
>
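
One way to get real-time numbers without resetting anything server-side is to poll the cumulative counters (e.g. via the Metrics API under /solr/admin/metrics) and let the monitor compute per-interval deltas itself. A minimal sketch of the delta step; the HTTP polling itself is omitted and the sample numbers are illustrative:

```java
// Sketch: Solr's requestTimes counters are cumulative since core creation,
// so a monitor keeps the previous sample and derives the per-interval rate
// from the difference between two polls of /solr/admin/metrics.
public class RequestRate {

    // Requests per second between two consecutive samples of the
    // cumulative request count.
    public static double ratePerSec(long prevCount, long currCount, double intervalSec) {
        return (currCount - prevCount) / intervalSec;
    }

    public static void main(String[] args) {
        long prev = 1_000_000;  // cumulative count at the previous poll
        long curr = 1_000_600;  // cumulative count 30 seconds later
        System.out.println(ratePerSec(prev, curr, 30.0) + " req/s"); // 20.0 req/s
    }
}
```

Latency can be handled the same way: most monitoring agents difference the cumulative timing sums between polls rather than relying on the server to reset them.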

-- 
Best regards,
Adonis