Re: SolrCloud upgrade concern

2020-05-22 Thread gnandre
Thanks for this reply, Jason.

I am mostly worried about the CDCR feature. I am relying heavily on it.
I am planning to use Solr 8.3, though. It has been a long time since CDCR
was first introduced. I wonder what the state of CDCR in 8.3 is. Is it
stable now?

On Wed, Jan 22, 2020, 8:01 AM Jason Gerlowski  wrote:

> Hi Arnold,
>
> The stability and complexity issues Mark highlighted in his post
> aren't just imagined - there are real, sometimes serious, bugs in
> SolrCloud features.  But at the same time there are many many stable
> deployments out there where SolrCloud is a real success story for
> users.  Small example, I work at a company (Lucidworks) where our main
> product (Fusion) is built heavily on top of SolrCloud and we see it
> deployed successfully every day.
>
> In no way am I trying to minimize Mark's concerns (or David's).  There
> are stability bugs.  But the extent to which those need to affect you
> depends a lot on what your deployment looks like.  How many nodes?
> How many collections?  How tightly are you trying to squeeze your
> hardware?  Is your network flaky?  Are you looking to use any of
> SolrCloud's newer, less stable features like CDCR, etc.?
>
> Is SolrCloud better for you than Master/Slave?  It depends on what
> you're hoping to gain by a move to SolrCloud, and on your answers to
> some of the questions above.  I would be leery of following any
> recommendations that are made without regard for your reason for
> switching or your deployment details.  Those things are always the
> biggest driver in terms of success.
>
> Good luck making your decision!
>
> Best,
>
> Jason
>


Re: Alternate Fields for Unified Highlighter

2020-05-22 Thread Furkan KAMACI
Hi David,

Thanks for the response! I use Unified Highlighter combined with
maxAnalyzedChars to accomplish my needs.
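
For the archives, that combination looks roughly like this (untested SolrJ
sketch; the query string and field names are just examples):

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery query = new SolrQuery("some text");
query.setFields("id", "content");          // return stored content as a client-side fallback
query.addHighlightField("content_*");
query.set("hl.method", "unified");         // pick the unified highlighter
query.set("hl.maxAnalyzedChars", 51200);   // cap how much text is analyzed per field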

I'll file an issue and PR for it!

Kind Regards,
Furkan KAMACI

On Fri, May 22, 2020 at 11:25 PM David Smiley  wrote:

> Feel free to file an issue; I know it's not supported.  I also don't think
> it's a big deal because you can just ask Solr to return the
> "alternateField", thus letting the client side choose when to use that.  I
> suppose it might be large, so I can imagine a concern there.  It'd be nice
> if Solr had a DocTransformer to accomplish that.
>
> I know it's been awhile; I'm curious how the UH has been working for you,
> assuming you are using it.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Sun, Jun 2, 2019 at 6:47 AM Furkan KAMACI 
> wrote:
>
> > Hi All,
> >
> > I want to switch to the Unified Highlighter for performance reasons on my
> > Solr 7.6. I was using these fields:
> >
> > solrQuery.addHighlightField("content_*")
> >     .set("f.content_en.hl.alternateField", "content")
> >     .set("f.content_es.hl.alternateField", "content")
> >     .set("hl.useFastVectorHighlighter", "true")
> >     .set("hl.maxAlternateFieldLength", 300);
> >
> > As far as I see, there are no definitions for alternate fields for the unified
> > highlighter. How can I achieve such a configuration?
> >
> > Kind Regards,
> > Furkan KAMACI
> >
>


Trouble starting Solr on Windows/Ubuntu

2020-05-22 Thread Stavros Macrakis
I'm trying to follow the Solr Tutorial (
https://lucene.apache.org/solr/guide/8_5/solr-tutorial.html#solr-tutorial).

Yesterday, "bin/solr start" worked fine -- I could see the status page on
http://localhost:8983 . I even created a test config server/solr/test1
through the Web interface.

Today, I'm getting an error message when I try to start Solr. This is from
an Ubuntu top-level shell (I previously tried a shell buffer within Emacs
under Ubuntu, which failed). I've rebooted Windows, and it still fails. See
transcript and version info below.

What am I doing wrong? -- and is solr-user the right place to ask newbie
questions like this?
(None of the env variables mentioned in the error message are defined.)

transcript
xxx:/mnt/c/solr-8.5.1$ bin/solr status -help

No Solr nodes are running.

xxx:/mnt/c/solr-8.5.1$ bin/solr start
Waiting up to 180 seconds to see Solr running on port 8983 [|]  bin/solr:
line 669:  8456 Aborted (core dumped) nohup "$JAVA"
"${SOLR_START_OPTS[@]}" $SOLR_ADDL_ARGS -Dsolr.log.muteconsole
"-XX:OnOutOfMemoryError=$SOLR_TIP/bin/oom_solr.sh $SOLR_PORT
$SOLR_LOGS_DIR" -jar start.jar "${SOLR_JETTY_CONFIG[@]}"
$SOLR_JETTY_ADDL_CONFIG > "$SOLR_LOGS_DIR/solr-$SOLR_PORT-console.log" 2>&1
 [/]  Still not seeing Solr listening on 8983 after 180 seconds!
tail: cannot open '/mnt/c/solr-8.5.1/server/logs/solr.log' for reading: No
such file or directory

xxx:/mnt/c/solr-8.5.1$ echo foo > /mnt/c/solr-8.5.1/server/logs/solr.log
xxx:/mnt/c/solr-8.5.1$ cat /mnt/c/solr-8.5.1/server/logs/solr.log
foo   <<< log file is writeable

versions 

xxx:/mnt/c/solr-8.5.1$ uname -a
Linux DESKTOP-M6LDB7Q 4.4.0-18362-Microsoft #836-Microsoft Mon May 05
16:04:00 PST 2020 x86_64 x86_64 x86_64 GNU/Linux
xxx:/mnt/c/solr-8.5.1$ which java
/usr/bin/java
xxx:/mnt/c/solr-8.5.1$ java -version
openjdk version "11.0.7" 2020-04-14
OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-2ubuntu218.04)
OpenJDK 64-Bit Server VM (build 11.0.7+10-post-Ubuntu-2ubuntu218.04, mixed
mode, sharing)
xxx:/mnt/c/solr-8.5.1$ bin/solr -version
8.5.1


Re: Creating custom PassageFormatter

2020-05-22 Thread David Smiley
You've probably gotten your answer by now but "no".  Basically, you'd need to
specify your own subclass of UnifiedSolrHighlighter in solrconfig.xml like
this:

<searchComponent class="solr.HighlightComponent" name="highlight">
  <highlighting class="com.example.MyUnifiedSolrHighlighter"/> <!-- your subclass -->
</searchComponent>
> [...] "Error loading class 'solr.highlight.CustomPassageFormatter'".
>
> Example from solrconfig.xml:
> <formatter name="html" class="solr.highlight.CustomPassageFormatter">
> </formatter>
>
> I'm asking if this is still the right way? Is the "formatter" tag in XML a
> valid option for the Unified Highlighter?
>
> Thank you.
>
> Kind regards,
>   Damjan
>


Re: hl.preserveMulti in Unified highlighter?

2020-05-22 Thread David Smiley
Hi Walter,

No, the UnifiedHighlighter does not behave as if this setting were true.

The docs say:

`hl.preserveMulti`::
If `true`, multi-valued fields will return all values in the order they
were saved in the index. If `false`, the default, only values that match
the highlight request will be returned.


The first sentence there is the essence of it.  Notice it's not conditional
on whether there are highlights or not.  The UH won't return values lacking
a highlight. Even hl.defaultSummary isn't triggered, because *some* of the
values have a highlight.

As I look at the pertinent code right now, I imagine a solution would be to
provide a custom PassageFormatter.  If we can assume for this use-case that
you can use hl.bs.type=WHOLE as well, then a simpler PassageFormatter
could basically ignore the passage starts & ends and merely mark up the
original content in its entirety, which is a null-separated concatenation of
all the values for this field for a document.
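
A rough sketch of that idea (untested; assumes hl.bs.type=WHOLE and the
default <em> tags -- the class name and behavior are illustrative, not an
existing Solr formatter):

import org.apache.lucene.search.uhighlight.Passage;
import org.apache.lucene.search.uhighlight.PassageFormatter;

public class WholeContentFormatter extends PassageFormatter {
  @Override
  public Object format(Passage[] passages, String content) {
    // Ignore passage boundaries; wrap each match inside the full content,
    // which is the concatenation of all values of the field.
    StringBuilder sb = new StringBuilder();
    int pos = 0;
    for (Passage p : passages) {
      for (int i = 0; i < p.getNumMatches(); i++) {
        int start = p.getMatchStarts()[i];
        int end = p.getMatchEnds()[i];
        if (start < pos) continue; // skip overlapping matches
        sb.append(content, pos, start).append("<em>")
          .append(content, start, end).append("</em>");
        pos = end;
      }
    }
    sb.append(content, pos, content.length()); // emit the remaining tail
    return sb.toString();
  }
}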

~ David


On Fri, Mar 29, 2019 at 2:02 PM Walter Underwood 
wrote:

> We are testing 6.6.1.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Mar 29, 2019, at 11:02 AM, Walter Underwood 
> wrote:
> >
> > In testing, hl.preserveMulti=true works with the unified highlighter.
> But the documentation says that the parameter is only implemented in the
> original highlighter.
> >
> > Is the documentation wrong? Can we trust this to keep working with
> unified?
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >> On Mar 26, 2019, at 12:08 PM, Walter Underwood 
> wrote:
> >>
> >> It looks like hl.preserveMulti is only implemented in the Original
> highlighter. Has anyone looked at doing this for the Unified highlighter?
> >>
> >> We need to preserve order in the highlights for a multi-valued field.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org 
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >
>
>


Re: Alternate Fields for Unified Highlighter

2020-05-22 Thread David Smiley
Feel free to file an issue; I know it's not supported.  I also don't think
it's a big deal because you can just ask Solr to return the
"alternateField", thus letting the client side choose when to use that.  I
suppose it might be large, so I can imagine a concern there.  It'd be nice
if Solr had a DocTransformer to accomplish that.

I know it's been awhile; I'm curious how the UH has been working for you,
assuming you are using it.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, Jun 2, 2019 at 6:47 AM Furkan KAMACI  wrote:

> Hi All,
>
> I want to switch to the Unified Highlighter for performance reasons on my
> Solr 7.6. I was using these fields:
>
> solrQuery.addHighlightField("content_*")
>     .set("f.content_en.hl.alternateField", "content")
>     .set("f.content_es.hl.alternateField", "content")
>     .set("hl.useFastVectorHighlighter", "true")
>     .set("hl.maxAlternateFieldLength", 300);
>
> As far as I see, there are no definitions for alternate fields for the unified
> highlighter. How can I achieve such a configuration?
>
> Kind Regards,
> Furkan KAMACI
>


Re: unified highlighter methods works unexpected

2020-05-22 Thread David Smiley
Hi Roland,

I was not able to reproduce this.  I modified the techproducts sample config
to change the name field to use a new field type that had a trivial
edgengram config.  Then I composed this query, based a little on some of
your parameters, and it did find highlights:
http://localhost:8983/solr/techproducts/select?defType=edismax=id%2Cname=name=unified=on=3%3C74%25=%22hard%20dri%22=name%20text=true=0.1

If you could send me instructions to reproduce this with
techproducts, then I can help diagnose the underlying problem and possibly
fix it.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, Apr 2, 2020 at 9:02 AM Szűcs Roland 
wrote:

> Hi All,
>
> I use Solr 8.4.1 and am implementing suggester functionality. As part of the
> suggestions I would like to show product info, so I had to implement this
> functionality with normal query parsers instead of the suggester component. I
> applied an edge-ngram filter without stemming to speed up analysis of the
> query, which is crucial for suggester functionality.
> I could use the Highlight component with edismax query parser without any
> problem. This is a typical output if hl.method=original (this is the
> default):
> { "responseHeader":{ "status":0, "QTime":4, "params":{ "mm":"3<74%",
> "q":"Arany
> Já", "tie":"0.1", "defType":"edismax", "hl":"true", "echoParams":"all", "qf
> ":"author_ngram^5 title_ngram^10", "fl":"id,imageUrl,title,price",
> "pf":"author_ngram^15
> title_ngram^30", "hl.fl":"title", "hl.method":"original", "_":
> "1585830768672"}}, "response":{"numFound":2,"start":0,"docs":[ {
> "id":"369",
> "title":"Arany János összes költeményei", "price":185.0, "imageUrl":"
> https://cdn.bknw.net/prd/covers_big/369.jpg"}, { "id":"26321",
> "title":"Arany
> János összes költeményei", "price":1400.0, "imageUrl":"
> https://cdn.bknw.net/prd/covers_big/26321.jpg"}] }, "highlighting":{
> "369":{
> "title":["\n \n Arany\n \n János összes költeményei"]}, "
> 26321":{ "title":["\n \n Arany\n \n János összes
> költeményei"]}}}
>
> If I change the method to unified, I get an unexpected result:
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":5,
>     "params":{
>       "mm":"3<74%",
>       "q":"Arany Já",
>       "tie":"0.1",
>       "defType":"edismax",
>       "hl":"true",
>       "echoParams":"all",
>       "qf":"author_ngram^5 title_ngram^10",
>       "fl":"id,imageUrl,title,price",
>       "pf":"author_ngram^15 title_ngram^30",
>       "hl.fl":"title",
>       "hl.method":"unified",
>       "_":"1585830768672"}},
>   "response":{"numFound":2,"start":0,"docs":[
>     {
>       "id":"369",
>       "title":"Arany János összes költeményei",
>       "price":185.0,
>       "imageUrl":"https://cdn.bknw.net/prd/covers_big/369.jpg"},
>     {
>       "id":"26321",
>       "title":"Arany János összes költeményei",
>       "price":1400.0,
>       "imageUrl":"https://cdn.bknw.net/prd/covers_big/26321.jpg"}]},
>   "highlighting":{
>     "369":{"title":[]},
>     "26321":{"title":[]}}}
>
> Any idea why the newest method fails to deliver the same results?
>
> Thanks,
> Roland
>


Re: Unified highlighter with storeOffsetsWithPositions and termVectors giving an exception

2020-05-22 Thread David Smiley
FWIW I tried this on the techproducts schema with a modification to the
name field, but did not see the issue.

I suspect you did not re-index after making these schema changes.  If you
did, then also check that the collection (or core) truly started fresh
(never had any previous schema) because if you tried it one way then merely
deleted/replaced the documents after changing the schema, then some
internal metadata in the underlying index data tends to persist.  I suspect
some of the options flipped here might stay sticky.

If that really isn't it, then you might suggest to me exactly how to
reproduce this from what Solr ships with, like the techproducts example
schema and dataset.

~ David


On Sun, Jul 21, 2019 at 10:07 PM Richard Walker 
wrote:

> On 22 Jul 2019, at 11:32 am, Richard Walker 
> wrote:
> > I'm trying out the advice in the user guide
> > (
> https://lucene.apache.org/solr/guide/8_1/highlighting.html#schema-options-and-performance-considerations
> )
> > for using the unified highlighter.
> >
> > ...
> > * "set storeOffsetsWithPositions to true"
> > * "set termVectors to true but no other term vector
> >  related options on the field being highlighted"
> ...
>
> I completely forgot to mention that I also tried _just_:
>
> > * "set storeOffsetsWithPositions to true"
>
> i.e., without _also_ setting termVectors, and this _doesn't_
> give the exception.
>
> So it seems to be the _combination_ of:
> * unified highlighter
> * storeOffsetsWithPositions
> * termVectors
>
> that seems to be giving the exception.
>
>


[explain style=html] and group=true

2020-05-22 Thread Vincenzo D'Amore
Hi all,

I noticed that the field [explain style=html] has been removed from my
results when I started to group.

Do you know if there is a way to have back the explain even if I'm grouping?

Best regards,
Vincenzo

-- 
Vincenzo D'Amore


Re: Indexing huge data onto solr

2020-05-22 Thread matthew sporleder
I can index (without nested entities ofc ;) ) 100M records in about
6-8 hours on a pretty low-powered machine using vanilla DIH -> mysql,
so it is probably worth looking at why it is going slow before writing
your own indexer (which we are finally having to do).

On Fri, May 22, 2020 at 1:22 PM Erick Erickson  wrote:
>
> You have a lot more control over the speed and form of importing data if
> you just do the initial load in SolrJ. Here’s an example, taking the Tika
> parts out is easy:
>
> https://lucidworks.com/post/indexing-with-solrj/
>
> It’s especially instructive to comment out just the call to 
> CloudSolrClient.add(doclist…); If
> that _still_ takes a long time, then your DB query is the root of the 
> problem. Even with 100M
> records, I’d be really surprised if Solr is the bottleneck, but the above 
> test will tell you
> where to go to try to speed things up.
>
> Best,
> Erick
>
> > On May 22, 2020, at 12:39 PM, Srinivas Kashyap 
> >  wrote:
> >
> > Hi All,
> >
> > We are running Solr 8.4.1. We have a database table which has more than 100
> > million records. Till now we were using DIH to do a full-import on the
> > tables. But for this table, when we do a full-import via DIH it takes
> > more than 3-4 days to complete and also consumes a fair bit of JVM memory
> > while running.
> >
> > Are there any speedier/alternate ways to load data onto this Solr core?
> >
> > P.S: Only the initial data import is a problem; further updates/additions to
> > this core are being done through SolrJ.
> >
> > Thanks,
> > Srinivas
> > 
>


Re: Indexing huge data onto solr

2020-05-22 Thread Erick Erickson
You have a lot more control over the speed and form of importing data if
you just do the initial load in SolrJ. Here’s an example, taking the Tika
parts out is easy:

https://lucidworks.com/post/indexing-with-solrj/

It’s especially instructive to comment out just the call to 
CloudSolrClient.add(doclist…); If
that _still_ takes a long time, then your DB query is the root of the problem. 
Even with 100M
records, I’d be really surprised if Solr is the bottleneck, but the above test 
will tell you
where to go to try to speed things up.
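
A minimal skeleton of that experiment (untested sketch; the ZK host, JDBC
URL, table and field names are all placeholders):

import java.sql.*;
import java.util.*;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class DbLoader {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient solr = new CloudSolrClient.Builder(
             Collections.singletonList("zkhost:2181"), Optional.empty()).build();
         Connection db = DriverManager.getConnection("jdbc:mysql://dbhost/mydb");
         Statement st = db.createStatement();
         ResultSet rs = st.executeQuery("SELECT id, title FROM big_table")) {
      solr.setDefaultCollection("mycollection");
      List<SolrInputDocument> batch = new ArrayList<>(1000);
      while (rs.next()) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", rs.getString("id"));
        doc.addField("title", rs.getString("title"));
        batch.add(doc);
        if (batch.size() >= 1000) {
          solr.add(batch);   // comment out this add() to measure pure DB read time
          batch.clear();
        }
      }
      if (!batch.isEmpty()) solr.add(batch);
      solr.commit();
    }
  }
}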

Best,
Erick

> On May 22, 2020, at 12:39 PM, Srinivas Kashyap 
>  wrote:
> 
> Hi All,
> 
> We are running Solr 8.4.1. We have a database table which has more than 100
> million records. Till now we were using DIH to do a full-import on the
> tables. But for this table, when we do a full-import via DIH it takes more
> than 3-4 days to complete and also consumes a fair bit of JVM memory while
> running.
>
> Are there any speedier/alternate ways to load data onto this Solr core?
>
> P.S: Only the initial data import is a problem; further updates/additions to
> this core are being done through SolrJ.
> 
> Thanks,
> Srinivas
> 



Re: Highlighting Solr 8

2020-05-22 Thread David Smiley
What did you end up doing, Eric?  Did you migrate to the Unified
Highlighter?
~ David


On Wed, Oct 16, 2019 at 4:36 PM Eric Allen 
wrote:

> Thanks for the reply.
>
> Currently we are migrating from Solr 4 to Solr 8. Under Solr 4 we wrote our
> own highlighter because the provided one was too slow for our documents.
>
> We deal with many large documents, but we have full term vectors already.
> So as I understand it from my reading of the code the unified highlighter
> should be fast even on these large documents.
>
> The concern about alternate fields was that, if the highlighter was slow, we
> could just return highlights from one field if they existed, and if not, then
> highlight the other fields.
>
> From my research I'm leaning towards returning highlights from all the
> fields we are interested in because I feel it will be fast.
>
> Eric Allen - Software Developer, NetDocuments
> eric.al...@netdocuments.com | O: 801.989.9691 | C: 801.989.9691
>
> -Original Message-
> From: sasarun 
> Sent: Wednesday, October 16, 2019 2:45 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Highlighting Solr 8
>
> Hi Eric,
>
> The Unified highlighter does not have an option to provide an alternate field
> when highlighting. That option is available with the Original and FastVector
> highlighters. As indicated in the Solr documentation, Unified is the
> recommended method for highlighting to meet most of the use cases. Please
> do share more details in case you are facing any specific issue with
> highlighting.
>
> Thanks,
>
> Arun
>
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
>


Indexing huge data onto solr

2020-05-22 Thread Srinivas Kashyap
Hi All,

We are running Solr 8.4.1. We have a database table which has more than 100
million records. Till now we were using DIH to do a full-import on the tables.
But for this table, when we do a full-import via DIH it takes more than 3-4
days to complete and also consumes a fair bit of JVM memory while running.

Are there any speedier/alternate ways to load data onto this Solr core?

P.S: Only the initial data import is a problem; further updates/additions to
this core are being done through SolrJ.

Thanks,
Srinivas



Re: Unbalanced shard requests

2020-05-22 Thread Wei
Hi Michael,

I also verified the patch in SOLR-14471 with 8.4.1 and it fixed the issue
with shards.preference=replica.location:local,replica.type:TLOG in my
setting.  Thanks!
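
For anyone finding this later, the preference can also be set per request
from SolrJ -- untested sketch; the base URL and collection name are
placeholders:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr").build();
SolrQuery q = new SolrQuery("*:*");
// Prefer local TLOG replicas when routing the distributed sub-requests.
q.set("shards.preference", "replica.location:local,replica.type:TLOG");
QueryResponse rsp = solr.query("mycollection", q);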

Wei

On Thu, May 21, 2020 at 12:09 PM Phill Campbell
 wrote:

> Yes, JVM heap settings.
>
> > On May 19, 2020, at 10:59 AM, Wei  wrote:
> >
> > Hi Phill,
> >
> > What is the RAM config you are referring to, JVM size? How is that related
> > to the load balancing, if each node has the same configuration?
> >
> > Thanks,
> > Wei
> >
> > On Mon, May 18, 2020 at 3:07 PM Phill Campbell
> >  wrote:
> >
> >> In my previous report I was configured to use as much RAM as possible.
> >> With that configuration it seemed it was not load balancing.
> >> So, I reconfigured and redeployed to use 1/4 the RAM. What a difference
> >> for the better!
> >>
> >> 10.156.112.50   load average: 13.52, 10.56, 6.46
> >> 10.156.116.34   load average: 11.23, 12.35, 9.63
> >> 10.156.122.13   load average: 10.29, 12.40, 9.69
> >>
> >> Very nice.
> >> My load-testing tool records RPS. In the “bad” configuration it was less
> >> than 1 RPS.
> >> NOW it is showing 21 RPS.
> >>
> >>
> >> http://10.156.112.50:10002/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes
> >> {
> >>  "responseHeader":{
> >>"status":0,
> >>"QTime":161},
> >>  "metrics":{
> >>"solr.core.BTS.shard1.replica_n2":{
> >>  "QUERY./select.requestTimes":{
> >>"count":5723,
> >>"meanRate":6.8163888639859085,
> >>"1minRate":11.557013215119536,
> >>"5minRate":8.760356217628159,
> >>"15minRate":4.707624230995833,
> >>"min_ms":0.131545,
> >>"max_ms":388.710848,
> >>"mean_ms":30.300492048215947,
> >>"median_ms":6.336654,
> >>"stddev_ms":51.527164088667035,
> >>"p75_ms":35.427943,
> >>"p95_ms":140.025957,
> >>"p99_ms":230.533099,
> >>"p999_ms":388.710848
> >>
> >>
> >>
> >>
> >> http://10.156.122.13:10004/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes
> >> {
> >>  "responseHeader":{
> >>"status":0,
> >>"QTime":11},
> >>  "metrics":{
> >>"solr.core.BTS.shard2.replica_n8":{
> >>  "QUERY./select.requestTimes":{
> >>"count":6469,
> >>"meanRate":7.502581801189549,
> >>"1minRate":12.211423085368564,
> >>"5minRate":9.445681397767322,
> >>"15minRate":5.216209798637846,
> >>"min_ms":0.154691,
> >>"max_ms":701.657394,
> >>"mean_ms":34.2734699171445,
> >>"median_ms":5.640378,
> >>"stddev_ms":62.27649205954566,
> >>"p75_ms":39.016371,
> >>"p95_ms":156.997982,
> >>"p99_ms":288.883028,
> >>"p999_ms":538.368031
> >>
> >>
> >>
> >> http://10.156.116.34:10002/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes
> >> {
> >>  "responseHeader":{
> >>"status":0,
> >>"QTime":67},
> >>  "metrics":{
> >>"solr.core.BTS.shard3.replica_n16":{
> >>  "QUERY./select.requestTimes":{
> >>"count":7109,
> >>"meanRate":7.787524673806184,
> >>"1minRate":11.88519763582083,
> >>"5minRate":9.893315557386755,
> >>"15minRate":5.620178363676527,
> >>"min_ms":0.150887,
> >>"max_ms":472.826462,
> >>"mean_ms":32.184282366621204,
> >>"median_ms":6.977733,
> >>"stddev_ms":55.729908615189196,
> >>"p75_ms":36.655011,
> >>"p95_ms":151.12627,
> >>"p99_ms":251.440162,
> >>"p999_ms":472.826462
> >>
> >>
> >> Compare that to the previous report and you can see the improvement.
> >> So, note to myself: figure out the sweet spot for RAM usage. Use too much
> >> and strange behavior is noticed. While using too much, all the load focused
> >> on one box and query times slowed.
> >> I did not see any OOM errors during any of this.
> >>
> >> Regards
> >>
> >>
> >>
> >>> On May 18, 2020, at 3:23 PM, Phill Campbell
> >>  wrote:
> >>>
> >>> I have been testing 8.5.2 and it looks like the load has moved but is
> >> still on one machine.
> >>>
> >>> Setup:
> >>> 3 physical machines.
> >>> Each machine hosts 8 instances of Solr.
> >>> Each instance of Solr hosts one replica.
> >>>
> >>> Another way to say it:
> >>> Number of shards = 8. Replication factor = 3.
> >>>
> >>> Here is the cluster state. You can see that the leaders are well
> >> distributed.
> >>>
> >>> {"TEST_COLLECTION":{
> >>>   "pullReplicas":"0",
> >>>   "replicationFactor":"3",
> >>>   "shards":{
> >>> "shard1":{
> >>>   "range":"8000-9fff",
> >>>   "state":"active",
> >>>   "replicas":{
> >>> "core_node3":{
> >>>   

Re: Need help on handling large size of index.

2020-05-22 Thread Phill Campbell
Maybe your problems are in AWS land.


> On May 22, 2020, at 3:45 AM, Modassar Ather  wrote:
> 
> Thanks Erick and Phill.
> 
> We index data once weekly, and that is why we do the optimisation; it has
> helped produce faster query results. I will experiment with fewer segments on
> the current hardware.
> What I am not clear about is why, although there is no constant high usage
> of the extra IOPS other than a couple of spikes during optimisation, there is
> so much difference in optimisation time with extra IOPS vs. no
> extra IOPS.
> The optimisation on a different datacenter machine of the same configuration
> with SSD used to take 4-5 hours. That time is comparable to the
> r5a.16xlarge time with extra 3 IOPS.
> 
> Best,
> Modassar
> 
> On Fri, May 22, 2020 at 12:56 AM Phill Campbell
>  wrote:
> 
> The optimal size for a shard of the index is by definition what works best
>> on the hardware with the JVM heap that is in use.
>> More shards mean smaller sizes of the index for the shard as you already
>> know.
>> 
> I spent months changing the sharding, the JVM heap, the GC values before
>> taking the system live.
>> RAM is important, and I run with enough to allow Solr to load the entire
>> index into RAM. From my understanding Solr uses the system to memory map
>> the index files. I might be wrong.
>> I experimented with less RAM and SSD drives and found that was another way
>> to get the performance I needed. Since RAM is cheaper, I choose that
>> approach.
>> 
>> Again we never optimize. When we have to recover we rebuild the index by
>> spinning up new machines and use a massive EMR (Map reduce job) to force
>> the data into the system. Takes about 3 hours. Solr can ingest data at an
>> amazing rate. Then we do a blue/green switch over.
>> 
>> Query time, from my experience with my environment, is improved with more
>> sharding and additional hardware. Not just more sharding on the same
>> hardware.
>> 
>> My fields are not stored either, except ID. There are some fields that are
>> indexed and have DocValues and those are used for sorting and facets. My
>> queries can have any number of wildcards as well, but my field’s data
>> lengths are maybe a maximum of 100 characters so proximity searching is not
>> too bad. I tokenize and index everything. I do not expand terms at query
>> time to get broader results, I index the alternatives and let the indexer
>> do what it does best.
>> 
>> If you are running in SolrCloud mode and you are using the embedded
>> zookeeper I would change that. Solr and ZK are very chatty with each other,
>> run ZK on machines in proximity to Solr.
>> 
>> Regards
>> 
>>> On May 21, 2020, at 2:46 AM, Modassar Ather 
>> wrote:
>>> 
>>> Thanks Phill for your response.
>>> 
>>> Optimal Index size: Depends on what you are optimizing for. Query Speed?
>>> Hardware utilization?
>>> We are optimising it for query speed. As I understand it, even if we set the
>>> merge policy to any number, the disk space for the bigger segment merges will
>>> still be required. Please correct me if I am wrong.
>>> 
>>> Optimizing the index is something I never do. We live with about 28%
>>> deletes. You should check your configuration for your merge policy.
>>> There is a delete of about 10-20% in our updates. We have no merge policy
>>> set in configuration as we do a full optimisation after the indexing.
>>> 
>>> Increased sharding has helped reduce query response time, but surely there
>>> is a point where the collation of results starts to be the bottleneck.
>>> The query response time is my concern. I understand the aggregation of
>>> results may increase the search response time.
>>> 
>>> *What does your schema look like? I index around 120 fields per
>> document.*
>>> The schema has a combination of text and string fields. None of the fields
>>> except the Id field are stored. We also have around 120 fields. A few of them
>>> have docValues enabled.
>>> 
>>> *What does your queries look like? Mine are so varied that caching never
>>> helps, the same query rarely comes through.*
>>> Our search queries are a combination of proximity, nested proximity and
>>> wildcard terms most of the time. A query can be very complex, with 100s of
>>> wildcard and proximity terms in it. Different grouping options are also
>>> enabled on search results. And the search queries vary a lot.
>>> 
>>> Oh, another thing, are you concerned about  availability? Do you have a
>>> replication factor > 1? Do you run those replicas in a different region
>> for
>>> safety?
>>> How many zookeepers are you running and where are they?
>>> As of now we do not have any replication factor. We are not using a
>>> zookeeper ensemble but would like to move to one soon.
>>> 
>>> Best,
>>> Modassar
>>> 
>>> On Thu, May 21, 2020 at 9:19 AM Shawn Heisey 
>> wrote:
>>> 
 On 5/20/2020 11:43 AM, Modassar Ather wrote:
> Can you please help me with following few questions?
> 
>   - What is 

Re: What is the logical order of applying sorts in SOLR?

2020-05-22 Thread Stephen Lewis Bianamara
> If you use sort, you are basically ignoring relevancy
That's correct -- this is for querying with a stable sort, not for natural
language. One place this comes up, for example, is with cursors --
cursors require the unique key to be part of the sort.
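
A cursor loop in SolrJ looks roughly like this (untested sketch; the sort
field and the "solr" client are assumed):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.SortClause;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

SolrQuery q = new SolrQuery("*:*");
q.setRows(100);
q.addSort(SortClause.desc("timestamp"));   // assumed application sort field
q.addSort(SortClause.asc("id"));           // uniqueKey tie-breaker required by cursors
String cursorMark = CursorMarkParams.CURSOR_MARK_START;
while (true) {
  q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
  QueryResponse rsp = solr.query(q);       // "solr" is an existing SolrClient
  // ... process rsp.getResults() ...
  String next = rsp.getNextCursorMark();
  if (cursorMark.equals(next)) break;      // cursor unchanged => all docs seen
  cursorMark = next;
}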

> Do you see performance drop on non-clustered or clustered Solr? Because,
I would not be surprised if, for clustered node, all the results need to be
brought into one place to sort even if only 10 (of say 100) would be sent
back, where without sort, each node is asked for their "top X" matches and
others are never even sent
Good consideration here. The largest non clustered SOLR instance I have on
hand has on the order of 1M docs, but I'm also considering ones in the 1B
to 10B range. On this small one-shard instance I am not seeing the same loss of
performance, though it's so small in the first place that sorting 1M docs
should be snappy anyway.

An interesting case to consider, then, would be to generate a
1B-doc index on one node and on two nodes, and see if the difference is
isolated between these two cases. I would be happy to investigate that soon
and report back.

I think it would be good to return to the original question too -- is SOLR
applying the sort before applying filters? If I have an index with 1
billion docs, but a query which matches 100 docs, the sort should be
exceedingly cheap. Even if each shard is sorting up to 100 docs and sending
them back, if the filter is applied before the sort I would still expect
this to be extremely cheap.

On Wed, May 20, 2020 at 2:19 PM Alexandre Rafalovitch 
wrote:

> If you use sort, you are basically ignoring relevancy (unless you put
> that into sort). Which you seem to know as your example uses FQ.
>
> Do you see performance drop on non-clustered or clustered Solr?
> Because, I would not be surprised if, for clustered node, all the
> results need to be brought into one place to sort even if only 10 (of
> say 100) would be sent back, where without sort, each node is asked
> for their "top X" matches and others are never even sent. That would
> be my working theory anyway, I am not deep into milti-path mode the
> cluster code does.
>
> Regards,
>Alex.
>
> On Mon, 11 May 2020 at 15:16, Stephen Lewis Bianamara
>  wrote:
> >
> > Hi SOLR Community,
> >
> > What is the order of operations which SOLR applies to sorting? I've
> > observed many times and across SOLR versions that a restrictive filter
> with
> > a sort takes an extremely long time to return, suggesting to me that the
> > SORT is applied before the filter.
> >
> > An example situation is querying for fq:Foo=Bar vs querying for
> fq:Foo=Bar
> > sort by Id desc. I've observed over many SOLR versions and collections
> that
> > the former is orders of magnitude cheaper and quicker to respond, even
> when
> > the result set is tiny (10-100).
> >
> > Does anyone in this forum know whether this is the default behavior and
> > whether there is any way through the API or SOLR configuration to apply
> > sorts after filters?
> >
> > Thanks,
> > Stephen
>


Re: Unified highlighter- unable to get results - can get results with original and termvector highlighters

2020-05-22 Thread David Smiley
Hello,

Did you get it to work eventually?

Try setting hl.weightMatches=false and see if that helps.  Whether this
helps or not, I'd like to have a deeper understanding of the internal
structure of the Query (not the original query string).  What query parser
are you using?  If you pass debug=query to Solr then you'll get a parsed
version of the query that would be helpful to me.
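
(With the query quoted below, that would be e.g.
https://solr-server/index/c1/select?hl.fl=title_text&hl.method=unified&hl=true&q=title_text%3Azelda&debug=query .)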

~ David


On Mon, May 11, 2020 at 10:46 AM Warren, David [USA] 
wrote:

> I am running Solr 8.4 and am attempting to use its highlighting feature.
> It appears to work well when I use the original highlighter or the term
> vector highlighter, but when I try to use the unified highlighter, I get no
> results returned.  My Google searches so far have not revealed anybody
> having this same problem (perhaps user error on my part), hence why I’m
> asking a question to the Solr mailing list.
>
> I am running a query which searches the “title_text” field for a term and
> highlights it.
> The configuration for title_text is this:
> <field name="title_text" ... multiValued="true" termVectors="true"/>
>
> The query looks like this:
>
> https://solr-server/index/c1/select?hl.fl=title_text&hl.method=unified&hl=true&q=title_text%3Azelda
>
> If hl.method=original or hl.method=termvector, I get back results in the
> highlighting section with “Zelda” surrounded by <em> tags.
> If hl.method=unified, all results in the highlighting section are blank.
>
> I’ve attached a remote debugger to my Solr server and verified that the
> unified highlighter class
> (org/apache/solr/highlight/UnifiedSolrHighlighter.java) is being invoked
> when I set hl.method=unified.  And I do not see any errors in the Solr logs.
>
> Any idea what I’m doing wrong? In looking at the Solr highlighting
> documentation, I didn’t see any additional configuration which needs to be
> done to get the unified highlighter to work.
>
> I realize I have not provided a bunch of information here, but obviously
> can provide more if needed.
>
> Thank you,
> David Warren
> Booz | Allen | Hamilton
> 703-625-0311 mobile
>
>


Re: +(-...) vs +(*:* -...) vs -(+...)

2020-05-22 Thread Bram Van Dam
Additional reading: https://lucidworks.com/post/why-not-and-or-and-not/

Assuming implicit AND, we perform the following rewrite on strictly
negative queries:

-f:a -> -f:a *:*
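
In Lucene BooleanQuery terms, the same fix looks like this (illustrative
sketch):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;

// A purely negative BooleanQuery matches nothing; adding MatchAllDocsQuery
// gives the MUST_NOT clause something to subtract from.
Query fixed = new BooleanQuery.Builder()
    .add(new MatchAllDocsQuery(), BooleanClause.Occur.MUST)
    .add(new TermQuery(new Term("f", "a")), BooleanClause.Occur.MUST_NOT)
    .build();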

Isn't search fun? :-)

 - Bram


On 21/05/2020 20:51, Houston Putman wrote:
> Jochen,
> 
> For the standard query parser, pure negative queries (no positive query in
> front of it, such as "*:*") are only allowed as a top level clause, so not
> nested within parenthesis.
> 
> Check the second bullet point of this section of the Ref Guide page for
> the Standard Query Parser.
> 
> 
> For the edismax query parser, pure negative queries are allowed to be
> nested within parenthesis. Docs can be found in the Ref Guide page for the
> eDismax Query Parser.
> 
> 
> - Houston
> 
> On Thu, May 21, 2020 at 2:25 PM Jochen Barth 
> wrote:
> 
>> Dear reader,
>>
>> why does +(-x_ss:y) finds 0 docs,
>>
>> while -(+x_ss:y) finds many docs?
>>
>> Ok... +(*:* -x_ss:y) works, too, but I'm a bit surprised.
>>
>> Kind regards, J. Barth
>>
>>
> 



Terms faceting and EnumField

2020-05-22 Thread Poornima Ponnuswamy
Hello,

We have Solr version 6.6.
Below are the field and field type defined in the Solr schema:

<field name="ServiceRequestTypeCode" type="ServiceRequestTypeCode" indexed="true" stored="true"/>
<fieldType name="ServiceRequestTypeCode" class="solr.EnumField" enumsConfig="enumsConfig.xml" enumName="ServiceRequestTypeCode"/>

Below is the configuration for the enum:

<enumsConfig>
  <enum name="ServiceRequestTypeCode">
    <value>servicerequestcorrective</value>
    <value>servicerequestplanned</value>
    <value>servicerequestinstallationandupgrade</value>
    <value>servicerequestrecall</value>
    <value>servicerequestother</value>
    <value>servicerequestinquiry</value>
    <value>servicerequestproactive</value>
    <value>servicerequestsystemupdate</value>
    <value>servicerequesticenteradmin</value>
    <value>servicerequestonwatch</value>
    <value>servicerequestfmi</value>
    <value>servicerequestapplication</value>
  </enum>
</enumsConfig>

When I try to invoke using the below request,

http://localhost:8983/solr/activity01us/select?json.facet={ServiceRequestTypeCode:{type:terms,field:ServiceRequestTypeCode,limit:10}}&facet=on&indent=on&wt=json&q=*

I am getting the error:
"Expected numeric field type :ServiceRequestTypeCode{type=ServiceRequestTypeCode,properties=indexed,stored,omitNorms,omitTermFreqAndPositions}"

But when I try to do as below it works fine.


http://localhost:8983/solr/activity01us/select?facet.field=ServiceRequestTypeCode&facet=on&indent=on&q=*:*&wt=json

I would like to use json facet as it would help me in subfaceting.
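
For completeness, the json.facet request I am trying to issue looks like
this from SolrJ (untested sketch; note that JsonQueryRequest requires a
newer SolrJ than 6.6):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.json.JsonQueryRequest;
import org.apache.solr.client.solrj.request.json.TermsFacetMap;
import org.apache.solr.client.solrj.response.QueryResponse;

HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr").build();
JsonQueryRequest req = new JsonQueryRequest()
    .setQuery("*:*")
    .withFacet("ServiceRequestTypeCode",
        new TermsFacetMap("ServiceRequestTypeCode").setLimit(10));
QueryResponse rsp = req.process(solr, "activity01us");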

Any help would be appreciated




Re: Need help on handling large size of index.

2020-05-22 Thread Modassar Ather
Thanks Erick and Phill.

We index data once weekly, and that is why we do the optimisation; it has
helped produce faster query results. I will experiment with fewer segments on
the current hardware.
What I am not clear about is why, although there is no constant high usage
of the extra IOPS other than a couple of spikes during optimisation, there is
so much difference in optimisation time with extra IOPS vs. no
extra IOPS.
The optimisation on a different datacenter machine of the same configuration
with SSD used to take 4-5 hours. That time is comparable to the
r5a.16xlarge time with extra 3 IOPS.

Best,
Modassar

On Fri, May 22, 2020 at 12:56 AM Phill Campbell
 wrote:

> The optimal size for a shard of the index is by definition what works best
> on the hardware with the JVM heap that is in use.
> More shards mean smaller sizes of the index for the shard as you already
> know.
>
> I spent months changing the sharding, the JVM heap, the GC values before
> taking the system live.
> RAM is important, and I run with enough to allow Solr to load the entire
> index into RAM. From my understanding Solr uses the system to memory map
> the index files. I might be wrong.
> I experimented with less RAM and SSD drives and found that was another way
> to get the performance I needed. Since RAM is cheaper, I choose that
> approach.
>
> Again we never optimize. When we have to recover we rebuild the index by
> spinning up new machines and use a massive EMR (Map reduce job) to force
> the data into the system. Takes about 3 hours. Solr can ingest data at an
> amazing rate. Then we do a blue/green switch over.
>
> Query time, from my experience with my environment, is improved with more
> sharding and additional hardware. Not just more sharding on the same
> hardware.
>
> My fields are not stored either, except ID. There are some fields that are
> indexed and have DocValues and those are used for sorting and facets. My
> queries can have any number of wildcards as well, but my field’s data
> lengths are maybe a maximum of 100 characters so proximity searching is not
> too bad. I tokenize and index everything. I do not expand terms at query
> time to get broader results, I index the alternatives and let the indexer
> do what it does best.
>
> If you are running in SolrCloud mode and you are using the embedded
> zookeeper I would change that. Solr and ZK are very chatty with each other,
> run ZK on machines in proximity to Solr.
>
> Regards
>
> > On May 21, 2020, at 2:46 AM, Modassar Ather 
> wrote:
> >
> > Thanks Phill for your response.
> >
> > Optimal Index size: Depends on what you are optimizing for. Query Speed?
> > Hardware utilization?
> > We are optimising it for query speed. As I understand it, even if we set the
> > merge policy to any number, the disk space for the bigger segment merges
> > will still be required. Please correct me if I am wrong.
> >
> > Optimizing the index is something I never do. We live with about 28%
> > deletes. You should check your configuration for your merge policy.
> > There is a delete of about 10-20% in our updates. We have no merge policy
> > set in configuration as we do a full optimisation after the indexing.
> >
> > Increased sharding has helped reduce query response time, but surely there
> > is a point where the collation of results starts to be the bottleneck.
> > The query response time is my concern. I understand the aggregation of
> > results may increase the search response time.
> >
> > *What does your schema look like? I index around 120 fields per
> document.*
> > The schema has a combination of text and string fields. None of the fields
> > except the Id field are stored. We also have around 120 fields. A few of them
> > have docValues enabled.
> >
> > *What does your queries look like? Mine are so varied that caching never
> > helps, the same query rarely comes through.*
> > Our search queries are a combination of proximity, nested proximity and
> > wildcard terms most of the time. A query can be very complex, with 100s of
> > wildcard and proximity terms in it. Different grouping options are also
> > enabled on search results. And the search queries vary a lot.
> >
> > Oh, another thing, are you concerned about  availability? Do you have a
> > replication factor > 1? Do you run those replicas in a different region
> for
> > safety?
> > How many zookeepers are you running and where are they?
> > As of now we do not have any replication factor. We are not using a
> > zookeeper ensemble but would like to move to one soon.
> >
> > Best,
> > Modassar
> >
> > On Thu, May 21, 2020 at 9:19 AM Shawn Heisey 
> wrote:
> >
> >> On 5/20/2020 11:43 AM, Modassar Ather wrote:
> >>> Can you please help me with following few questions?
> >>>
> >>>- What is the ideal index size per shard?
> >>
> >> We have no way of knowing that.  A size that works well for one index
> >> use case may not work well for another, even if the index size in both
> >> 

Re: Solr Atomic update change value and field name

2020-05-22 Thread Hup Chen

> Try adding -format solr to your bin/post command. By default the post command 
> will treat input as arbitrary json, not solr-format json.
Yes, it works!  Thanks a lot!
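
For the record, the working command is:

# /usr/local/solr/bin/post -p 8983 -c books -format solr test.json

If bin/post ever becomes a bottleneck for the several million records,
batched atomic updates through SolrJ are another option (untested sketch;
the base URL is a placeholder):

import java.util.*;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr").build();
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "0371558727");
doc.addField("price", Collections.singletonMap("set", 19.95)); // atomic "set" op
solr.add("books", Collections.singletonList(doc));  // batch many docs per add() call
solr.commit("books");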

From: Jan Høydahl 
Sent: Friday, May 22, 2020 4:46 AM
To: solr-user@lucene.apache.org 
Subject: Re: Solr Atomic update change value and field name

Try adding -format solr to your bin/post command. By default the post command 
will treat input as arbitrary json, not solr-format json.

Jan Høydahl

> On 21 May 2020, at 02:50, Hup Chen wrote:
>
> I am new to Solr. I tried to do an atomic update using a .json file.
> $SOLR/bin/post not only changed field values; the field name also became
> "fieldname.set" -- for instance, "price" became "price.set".  Updating via the
> curl /update handler worked well, but since I have several million
> records, I can't update by calling curl several million times; that would be
> extremely slow.
>
> Any help will be appreciated.
>
>
># /usr/local/solr/bin/solr version
>8.5.1
>
># curl http://localhost:8983/solr/books/select?q=id%3A0371558727
>"response":{"numFound":1,"start":0,"docs":[
>  {
>"id":"0371558727",
>"price":19.0,
>"_version_":1667214802265571328}]
>}
>
># cat test.json
>[
>{"id":"0371558727",
> "price":{"set":19.95}
>}
>]
>
># /usr/local/solr/bin/post -p 8983 -c books test.json
>
># curl http://localhost:8983/solr/books/select?q=id%3A0371558727
>"response":{"numFound":1,"start":0,"docs":[
>  {
>"id":"0371558727",
>"price.set":[19.95],
>"_version_":1667214933776924672}]
>}
>
>


Re: Use Subquery Parameters to filter main query

2020-05-22 Thread Mikhail Khludnev
Hello, Rodrigo.
I don't fully understand your question, but the only thing you can do is
group.q=members:6; using something from the subquery in the main one
is not possible.
Please clarify your question.

On Fri, May 22, 2020 at 12:21 AM rantonana 
wrote:

> Hello, I need to do the following:
> I have a main query that defines a subquery called group with "fields":
> "*,group:[subquery]".
> The group document has a lot of fields, but I want to filter the main query
> based on one of them.
> ex:
> {
> PID:1,
> type:doc,
>  "group":{"numFound":1,"start":0,"docs":[
> {
> members:[1,2,3]
> }]
> },
> {
> PID:2,
> type:doc,
>  "group":{"numFound":1,"start":0,"docs":[
> {
> members:[4,5,6]
> }]
> }
>
> In the example, I want to filter the "doc" type documents whose members
> field contains the value 6.
>
> thanks
>
>
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
Sincerely yours
Mikhail Khludnev