Re: Programmatic Basic Auth on CloudSolrClient

2021-03-04 Thread Mark H. Wood
On Wed, Mar 03, 2021 at 10:34:50AM -0800, Tomás Fernández Löbbe wrote:
> As far as I know the current OOTB options are system properties or
> per-request (which would allow you to use different per collection, but
> probably not ideal if you do different types of requests from different
> parts of your code). A workaround (which I've used in the past) is to have
> a custom client that overrides and sets the credentials in the "request"
> method (you can put whatever logic there to identify which credentials to
> use). I recently created https://issues.apache.org/jira/browse/SOLR-15154
> and https://issues.apache.org/jira/browse/SOLR-15155 to try to address this
> issue in future releases.

I have not tried it, but could you not:

1. set up an HttpClient with an appropriate CredentialsProvider;
2. pass it to HttpSolrClient.Builder.withHttpClient();
3. pass that Builder to LBHttpSolrClient.Builder.withHttpSolrClientBuilder();
4. pass *that* Builder to CloudSolrClient.Builder.withLBHttpSolrClientBuilder();

Now you have control of the CredentialsProvider and can have it return
whatever credentials you wish, so long as you still have a reference
to it.
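
Untested, but in outline I imagine something like this (assuming SolrJ 8.x
with Apache HttpClient 4.x on the classpath; the ZooKeeper host and the
credentials are placeholders):

  import java.util.Collections;
  import java.util.Optional;
  import org.apache.http.auth.AuthScope;
  import org.apache.http.auth.UsernamePasswordCredentials;
  import org.apache.http.impl.client.BasicCredentialsProvider;
  import org.apache.http.impl.client.CloseableHttpClient;
  import org.apache.http.impl.client.HttpClientBuilder;
  import org.apache.solr.client.solrj.impl.CloudSolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.impl.LBHttpSolrClient;

  // 1. the CredentialsProvider stays under your control
  BasicCredentialsProvider credentials = new BasicCredentialsProvider();
  credentials.setCredentials(AuthScope.ANY,
      new UsernamePasswordCredentials("solr-user", "solr-password"));
  CloseableHttpClient httpClient = HttpClientBuilder.create()
      .setDefaultCredentialsProvider(credentials)
      .build();

  // 2-4. thread it through the three builders
  HttpSolrClient.Builder httpBuilder = new HttpSolrClient.Builder()
      .withHttpClient(httpClient);
  LBHttpSolrClient.Builder lbBuilder = new LBHttpSolrClient.Builder()
      .withHttpSolrClientBuilder(httpBuilder);
  CloudSolrClient client = new CloudSolrClient.Builder(
          Collections.singletonList("zkhost:2181"), Optional.empty())
      .withLBHttpSolrClientBuilder(lbBuilder)
      .build();

Swapping credentials later is then just a matter of calling setCredentials()
again on the BasicCredentialsProvider instance you kept.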

> On Wed, Mar 3, 2021 at 5:42 AM Subhajit Das  wrote:
> 
> >
> > Hi There,
> >
> > Is there any way to programmatically set basic authentication credential
> > on CloudSolrClient?
> >
> > The only documentation available is to use system properties. This is not
> > useful if two collections require two separate sets of credentials and
> > they are accessed in parallel.
> > Thanks in advance.
> >

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Overriding Sort and boosting some docs to the top

2021-02-24 Thread Mark Robinson
Thanks Marcus for your response.

Best,
Mark

On Wed, Feb 24, 2021 at 4:50 PM Markus Jelsma 
wrote:

> I would stick to the query elevation component: it is pretty fast, and it
> is easier to handle/configure elevation IDs with it than with function
> queries. We have customers that set a dozen documents for a given query
> and it works just fine.
>
> I also do not expect the function query variant to be more performant, but
> I am not sure. If it were, would it be measurable?
>
> Regards,
> Markus
>
> On Wed, Feb 24, 2021 at 12:15, Mark Robinson  wrote:
>
> > Thanks for the reply Markus!
> >
> > I did try it.
> > My question specifically was (repasting here):-
> >
> > Which is more recommended/ performant?
> >
> > Note:- Assume that I have hundreds of ids to boost like this.
> > Is there a difference to the answer if docs to be boosted after the sort
> is
> > less?
> >
> > Thanks!
> > Mark
> >
> > On Wed, Feb 24, 2021 at 4:41 PM Markus Jelsma <
> markus.jel...@openindex.io>
> > wrote:
> >
> > > Hello,
> > >
> > > You are probably looking for the elevator component, check it out:
> > >
> >
> https://lucene.apache.org/solr/guide/8_8/the-query-elevation-component.html
> > >
> > > Regards,
> > > Markus
> > >
> > > Op wo 24 feb. 2021 om 11:59 schreef Mark Robinson <
> > mark123lea...@gmail.com
> > > >:
> > >
> > > > Hi,
> > > >
> > > > I wanted to sort and then boost some docs to the top and these docs
> > > should
> > > > be my first set in the results and the following ones appearing
> > according
> > > > to my sort criteria.
> > > >
> > > > I understand that sort overrides bq hence bq may not be used in this
> > case
> > > >
> > > > - I brought my boost into sort using "query()" and achieved my goal.
> > > > - I tried sort and then elevate with forceElevation and that also
> > worked.
> > > >
> > > > My question is which is more recommended/ performant?
> > > >
> > > > Note:- Assume that I have hundreds of ids to boost like this.
> > > > Is there a difference to the answer if docs to be boosted after the
> > sort
> > > is
> > > > less?
> > > >
> > > > Could someone please share your thoughts/experience?
> > > >
> > > > Thanks!
> > > > Mark.
> > > >
> > >
> >
>


Re: Overriding Sort and boosting some docs to the top

2021-02-24 Thread Mark Robinson
Thanks for the reply Markus!

I did try it.
My question specifically was (repasting here):-

Which is more recommended/performant?

Note:- Assume that I have hundreds of ids to boost like this.
Does the answer change if there are fewer docs to be boosted after the sort?

Thanks!
Mark

On Wed, Feb 24, 2021 at 4:41 PM Markus Jelsma 
wrote:

> Hello,
>
> You are probably looking for the elevator component, check it out:
> https://lucene.apache.org/solr/guide/8_8/the-query-elevation-component.html
>
> Regards,
> Markus
>
> On Wed, Feb 24, 2021 at 11:59, Mark Robinson  wrote:
>
> > Hi,
> >
> > I wanted to sort and then boost some docs to the top and these docs
> should
> > be my first set in the results and the following ones appearing according
> > to my sort criteria.
> >
> > I understand that sort overrides bq hence bq may not be used in this case
> >
> > - I brought my boost into sort using "query()" and achieved my goal.
> > - I tried sort and then elevate with forceElevation and that also worked.
> >
> > My question is which is more recommended/ performant?
> >
> > Note:- Assume that I have hundreds of ids to boost like this.
> > Is there a difference to the answer if docs to be boosted after the sort
> is
> > less?
> >
> > Could someone please share your thoughts/experience?
> >
> > Thanks!
> > Mark.
> >
>


Overriding Sort and boosting some docs to the top

2021-02-24 Thread Mark Robinson
Hi,

I want to sort and then boost some docs to the top: these docs should be
the first set in the results, with the following ones appearing according
to my sort criteria.

I understand that sort overrides bq, hence bq may not be used in this case.

- I brought my boost into sort using "query()" and achieved my goal.
- I tried sort and then elevate with forceElevation and that also worked.

My question is: which is more recommended/performant?

Note:- Assume that I have hundreds of ids to boost like this.
Does the answer change if there are fewer docs to be boosted after the sort?

Could someone please share your thoughts/experience?

Thanks!
Mark.
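
For concreteness, the two variants look roughly like this in SolrJ (a
sketch only; the ids and the "price" sort field are made up):

  import org.apache.solr.client.solrj.SolrQuery;

  // Variant 1: fold the boost into the sort with query(). Docs matching
  // $boostq score above the default 0 and therefore sort first.
  SolrQuery bySort = new SolrQuery("*:*");
  bySort.set("boostq", "id:(doc1^2 doc2)");
  bySort.addSort("query($boostq,0)", SolrQuery.ORDER.desc);
  bySort.addSort("price", SolrQuery.ORDER.asc);

  // Variant 2: elevation forced on top of an explicit sort (requires the
  // QueryElevationComponent in the request handler).
  SolrQuery byElevation = new SolrQuery("*:*");
  byElevation.set("elevateIds", "doc1,doc2");
  byElevation.set("forceElevation", true);
  byElevation.addSort("price", SolrQuery.ORDER.asc);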


Proximity Searches with Phrases

2021-01-06 Thread Mark R
Use Case: Is it possible to perform a proximity search using phrases? For 
example: "phrase 1" within 10 words of "phrase 2".

SOLR Version: 8.4.1

Query using: "(\"word1 word2\"(\"word3 word4\")"~10

While this returns results, it seems to be evaluating the words against each
other (word1 with word2, word1 with word3, word2 with word3) rather than
phrase 1 with phrase 2.

Also, are stop words removed when querying? I assume yes?

Thanks in advance

Mark



Proximity Search with phrases

2020-11-27 Thread Mark R
Use Case: Is it possible to perform a proximity search using phrases? For 
example: "phrase 1" within 10 words of "phrase 2".

SOLR Version: 8.4.1

Query using: "(\"word1 word2\"(\"word3 word4\")"~10

While this returns results, it seems to be evaluating the words against each
other rather than the phrases.

Also, are stop words removed when querying? I assume yes?

Thanks in advance

Mark




Tangent: old Solr versions

2020-10-28 Thread Mark H. Wood
On Tue, Oct 27, 2020 at 04:25:54PM -0500, Mike Drob wrote:
> Based on the questions that we've seen over the past month on this list,
> there are still users with Solr on 6, 7, and 8. I suspect there are still
> Solr 5 users out there too, although they don't appear to be asking for
> help - likely they are in set it and forget it mode.

Oh, there are quite a few instances of Solr 4 out there as well.  Many
of them will be moving to v7 or v8, probably starting in the next 6-12
months.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Avoiding single digit and single character ONLY query by putting them in stopwords list

2020-10-27 Thread Mark Robinson
Thanks!

Mark

On Tue, Oct 27, 2020 at 11:56 AM Dave  wrote:

> Agreed. Just a JavaScript check on the input box would work fine for 99%
> of cases, unless something automatic is running them in which case just
> server side redirect back to the form.
>
> > On Oct 27, 2020, at 11:54 AM, Mark Robinson 
> wrote:
> >
> > Hi  Konstantinos ,
> >
> > Thanks for the reply.
> > I too feel the same. Wanted to find what others also in the Solr world
> > thought about it.
> >
> > Thanks!
> > Mark.
> >
> >> On Tue, Oct 27, 2020 at 11:45 AM Konstantinos Koukouvis <
> >> konstantinos.koukou...@mecenat.com> wrote:
> >>
> >> Oh hi Mark!
> >>
> >> Why would you wanna do such a thing in the solr end. Imho it would be
> much
> >> more clean and easy to do it on the client side
> >>
> >> Regards,
> >> Konstantinos
> >>
> >>
> >>>> On 27 Oct 2020, at 16:42, Mark Robinson 
> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I want to block queries having only a digit like "1" or "2" ,... or
> >>> just a letter like "a" or "b" ...
> >>>
> >>> Is it a good idea to block them ... ie just single digits 0 - 9 and  a
> -
> >> z
> >>> by putting them as a stop word? The problem with this I can anticipate
> >> is a
> >>> query like "1 inch screw" can have the important information "1"
> stripped
> >>> out if I tokenize it.
> >>>
> >>> So what would be a good way to avoid  single digit only and single
> letter
> >>> only queries, from the Solr end?
> >>> Or should I not do this at the Solr end at all?
> >>>
> >>> Could someone please share your thoughts?
> >>>
> >>> Thanks!
> >>> Mark
> >>
> >> ==
> >> Konstantinos Koukouvis
> >> konstantinos.koukou...@mecenat.com
> >>
> >> Using Golang and Solr? Try this: https://github.com/mecenat/solr
> >>
> >>
> >>
> >>
> >>
> >>
>
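
A minimal sketch of the pre-Solr guard being suggested (server side; names
are made up, and the same one-character test is easy to replicate in
JavaScript on the input box):

  // Reject queries that are exactly one letter or digit before they ever
  // reach Solr; longer queries like "1 inch screw" pass through untouched.
  static boolean isSingleCharQuery(String q) {
      String trimmed = q == null ? "" : q.trim();
      return trimmed.matches("(?i)[a-z0-9]");
  }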


Re: Avoiding single digit and single character ONLY query by putting them in stopwords list

2020-10-27 Thread Mark Robinson
Hi  Konstantinos ,

Thanks for the reply.
I too feel the same. Wanted to find what others also in the Solr world
thought about it.

Thanks!
Mark.

On Tue, Oct 27, 2020 at 11:45 AM Konstantinos Koukouvis <
konstantinos.koukou...@mecenat.com> wrote:

> Oh hi Mark!
>
> Why would you wanna do such a thing in the solr end. Imho it would be much
> more clean and easy to do it on the client side
>
> Regards,
> Konstantinos
>
>
> > On 27 Oct 2020, at 16:42, Mark Robinson  wrote:
> >
> > Hello,
> >
> > I want to block queries having only a digit like "1" or "2" ,... or
> > just a letter like "a" or "b" ...
> >
> > Is it a good idea to block them ... ie just single digits 0 - 9 and  a -
> z
> > by putting them as a stop word? The problem with this I can anticipate
> is a
> > query like "1 inch screw" can have the important information "1" stripped
> > out if I tokenize it.
> >
> > So what would be a good way to avoid  single digit only and single letter
> > only queries, from the Solr end?
> > Or should I not do this at the Solr end at all?
> >
> > Could someone please share your thoughts?
> >
> > Thanks!
> > Mark
>
> ==
> Konstantinos Koukouvis
> konstantinos.koukou...@mecenat.com
>
> Using Golang and Solr? Try this: https://github.com/mecenat/solr
>
>
>
>
>
>


Avoiding single digit and single character ONLY query by putting them in stopwords list

2020-10-27 Thread Mark Robinson
Hello,

I want to block queries that are only a digit like "1" or "2", or
just a letter like "a" or "b".

Is it a good idea to block them, i.e. just the single digits 0-9 and letters
a-z, by putting them in as stop words? The problem I can anticipate with this
is that a query like "1 inch screw" can have the important information "1"
stripped out if I tokenize it.

So what would be a good way to avoid single-digit-only and single-letter-only
queries, from the Solr end?
Or should I not do this at the Solr end at all?

Could someone please share your thoughts?

Thanks!
Mark


ElevateIds - should I remove those that might be filtered off in the underlying query

2020-10-19 Thread Mark Robinson
Hi,

Suppose I have, say, 50 elevateIds, and I have a way to identify those that
would get filtered out of the query by predefined fqs. So in reality they
would never even be in the results and hence never be elevated.

Is there any advantage to leaving them out of elevateIds when I build the
list (i.e. would I gain performance), or does keeping them in elevateIds
make no performance difference?

Thanks!
Mark


security.json help

2020-10-19 Thread Mark Dadisman
Hey, I'm new to configuring Solr. I'm trying to configure Solr with Rule Based 
Authorization. 
https://lucene.apache.org/solr/guide/8_6/rule-based-authorization-plugin.html

I have permissions working if I allow everything with "all", but I want to 
limit access so that a site can only access its own collection, in addition to 
a server ping path, so I'm trying to add the collection-specific permission at 
the top:

"permissions": [
  {
"name": "custom-example",
"collection": "example",
"path": "*",
"role": [
  "admin",
  "example"
]
  },
  {
"name": "custom-collection",
"collection": "*",
"path": [
  "/admin/luke",
  "/admin/mbeans",
  "/admin/system"
],
"role": "*"
  },
  {
"name": "custom-ping",
"collection": null,
"path": [
  "/admin/info/system"
],
"role": "*"
  },
  {
"name": "all",
"role": "admin"
  }
]

The rule "custom-ping" works, and "all" works. But when the above permissions 
are used, access is denied to the "example" user-role for collection "example" 
at the path "/solr/example/select". If I specify paths explicitly, the 
permissions work, but I can't get permissions to work with path wildcards for a 
specific collection.

I also had to declare "custom-collection" with the specific paths needed to get 
collection info in order for those paths to work. I would've expected that 
these paths would be included in the collection-specific paths and be covered 
by the first rule, but they aren't. For example, the call to 
"/solr/example/admin/luke" will fail if the path is removed from this rule.

I don't really want to specify every single path I might need to use. Am I 
using the path wildcard wrong somehow? Is there a better way to do 
collection-specific authorizations for a collection "example"?

Thanks.
- M



Re: Solr queries slow down over time

2020-09-25 Thread Mark H. Wood
On Fri, Sep 25, 2020 at 11:49:22AM +0530, Goutham Tholpadi wrote:
> I have around 30M documents in Solr, and I am doing repeated *:* queries
> with rows=1, and changing start to 0, 1, 2, and so on, in a
> loop in my script (using pysolr).
> 
> At the start of the iteration, the calls to Solr were taking less than 1
> sec each. After running for a few hours (with start at around 27M) I found
> that each call was taking around 30-60 secs.
> 
> Any pointers on why the same fetch of 1 records takes much longer now?
> Does Solr need to load all the 27M before getting the last 1 records?

I and many others have run into the same issue.  Yes, each windowed
query starts fresh, having to find at least enough records to satisfy
the query, walking the list to discard the first 'start' worth of
them, and then returning the next 'rows' worth.  So as 'start' increases,
the work required of Solr increases and the response time lengthens.

> Is there a better way to do this operation using Solr?

Another answer in this thread gives links to resources for addressing
the problem, and I can't improve on those.

I can say that when I switched from start= windowing to cursorMark, I
got a very nice improvement in overall speed and did not see the
progressive slowing anymore.  A query loop that ran for *days* now
completes in under five minutes.  In some way that I haven't quite
figured out, a cursorMark tells Solr where in the overall document
sequence to start working.

So yes, there *is* a better way.
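
For the archives, the loop looks roughly like this in SolrJ (an untested
sketch; it assumes the uniqueKey field is "id" and that "solr" is an
already-built SolrClient):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.params.CursorMarkParams;

  SolrQuery query = new SolrQuery("*:*");
  query.setRows(1000);
  // the sort must be deterministic and include the uniqueKey field
  query.setSort(SolrQuery.SortClause.asc("id"));

  String cursorMark = CursorMarkParams.CURSOR_MARK_START;  // "*"
  while (true) {
      query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
      QueryResponse rsp = solr.query(query);
      // ... process rsp.getResults() ...
      String next = rsp.getNextCursorMark();
      if (cursorMark.equals(next)) {
          break;  // the cursor did not advance: no more documents
      }
      cursorMark = next;
  }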

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Need to update SOLR_HOME in the solr service script and getting errors

2020-09-17 Thread Mark H. Wood
On Wed, Sep 16, 2020 at 02:59:32PM +, Victor Kretzer wrote:
> My setup is two solr nodes running on separate Azure Ubuntu 18.04 LTS vms 
> using an external zookeeper assembly.
> I installed Solr 6.6.6 using the install file and then followed the steps for 
> enabling ssl. I am able to start solr, add collections and the like using 
> bin/solr script.
> 
> Example:
> /opt/solr$ sudo bin/solr start -cloud -s cloud/test2 -force
> 
> However, if I restart the machine or attempt to start solr using the 
> installed service, it naturally goes back to the default SOLR_HOME in the 
> /etc/default/solr.in.sh script: "/var/solr/data"
> 
> I've tried updating SOLR_HOME to "/opt/solr/cloud/test2"

That is what I would do.

> but then when I start the service I see the following error on the Admin 
> Dashboard:
> SolrCore Initialization Failures
> mycollection_shard1_replica1: 
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
> /opt/solr-6.6.6/cloud/test2/mycollection_shard1_replica1/data/index/write.lock
> Please check your logs for more information
> 
> I'm including what I believe to be the pertinent information from the logs 
> below:

You did well.

> I suspect this is a permission issue because the solr user created by the 
> install script isn't allowed access to  /opt/solr but I'm new to Linux and 
> haven't completely wrapped my head around the way permissions work with it. 
> Am I correct in guessing the cause of the error and, if so, how do I correct 
> this so that the service can be used to run my instances?

Yes, the stack trace actually tells you explicitly that the problem is
permissions on that file.  Follow the chain of "Caused by:" and you'll see:

  Caused by: java.nio.file.AccessDeniedException: 
/opt/solr-6.6.6/cloud/test2/mycollection_shard1_replica1/data/index/write.lock

Since, in the past, you have started Solr using 'sudo', this probably
means that write.lock is owned by 'root'.  Solr creates this file with
permissions that allow only the owner to write it.  If the service
script runs Solr as any other user (and it should!) then Solr won't be
able to open this file for writing, and because of this it won't
complete the loading of that core.

You should find out what user account is used by the service script,
and 'chown' Solr's entire working directories tree to be owned by that
user.  Then, refrain from ever running Solr as 'root' or the problem
may recur.  Use the normal service start/stop mechanism for
controlling your Solr instances.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: "timeAllowed" param with "numFound" having a count value but doc list is empty

2020-09-16 Thread Mark Robinson
Thanks Colvin!
All the responses were helpful.

Best
Mark

On Wed, Sep 16, 2020 at 4:06 AM Colvin Cowie 
wrote:

> Hi Mark,
>
> If queries taking 10 (or however many) seconds isn't acceptable, then
> either you need to a) prevent or optimize those queries, b) improve the
> performance of your index, c) use timeAllowed and accept that queries
> taking that long may fail or provide incomplete results, or d) a
> combination of the above.
>
> If you use timeAllowed then you have to accept the possibility that a query
> won't complete within the time allowed. Therefore you need to be able to
> deal with the possibility of the query failing or of it returning
> incomplete results.
>
> In our use of Solr, if a query exceeds timeAllowed we always treat it as a
> failure, even if it might have returned partial results, and return a 5xx
> response from our own server since we don't want to serve incomplete
> results ever. But you could attempt to return whatever results you do
> receive, perhaps with a warning message for your client indicating what
> happened.
>
>
> On Wed, 16 Sep 2020 at 02:05, Mark Robinson 
> wrote:
>
> > Thanks Dominique!
> > So is this parameter generally recommended or not. I wanted to try with a
> > value of 10s. We are not using it now.
> > My goal is to prevent a query from running more than 10s on the solr
> server
> > and choking it.
> >
> > What is the general recommendation.
> >
> > Thanks!
> > Mark
> >
> > On Tue, Sep 15, 2020 at 5:38 PM Dominique Bejean <
> > dominique.bej...@eolya.fr>
> > wrote:
> >
> > > Hi,
> > >
> > > 1. Yes, your analysis is correct
> > >
> > > 2. Yes, it can occur too with a very slow query.
> > >
> > > Regards
> > >
> > > Dominique
> > >
> > > Le mar. 15 sept. 2020 à 15:14, Mark Robinson 
> a
> > > écrit :
> > >
> > > > Hi,
> > > >
> > > > When in a sample query I used "timeAllowed" as low as 10mS, I got
> value
> > > for
> > > >
> > > > "numFound" as say 2000, but no docs were returned. But when I
> increased
> > > the
> > > >
> > > > value for timeAllowed to be in seconds, never got this scenario.
> > > >
> > > >
> > > >
> > > > I have 2 qns:-
> > > >
> > > > 1. Why does numFound have a value like say 2000 or even 6000 but no
> > > >
> > > > documents actually returned. During document collection is
> calculation
> > of
> > > >
> > > > numFound done first and doc collection later?. Is doc list empty
> > > because,by
> > > >
> > > > the time doc collection started the timeAllowed cut off took effect?
> > > >
> > > >
> > > >
> > > > 2. If I give timeAllowed a value say, 10s or above do you think the
> > above
> > > >
> > > > scenario of valid count displayed in numFound, but doc list empty can
> > > ever
> > > >
> > > > occur still, as there is more time before cut-off to retrieve at
> least
> > > one
> > > >
> > > > doc ?
> > > >
> > > >
> > > >
> > > > Thanks!
> > > >
> > > > Mark
> > > >
> > > >
> > >
> >
>


Re: "timeAllowed" param with "numFound" having a count value but doc list is empty

2020-09-16 Thread Mark Robinson
Thanks much Bram!

Best,
Mark

On Wed, Sep 16, 2020 at 3:59 AM Bram Van Dam  wrote:

> There are a couple of open issues related to the timeAllowed parameter.
> For instance it currently doesn't work in conjunction with the
> cursorMark parameter [1]. And on Solr 7 it doesn't work at all [2].
>
> But other than that, when users have a lot of query flexibility, it's a
> pretty good idea to limit them somehow. You don't want your users to
> blow up your servers.
>
> [1] https://issues.apache.org/jira/browse/SOLR-14413
>
> [2] https://issues.apache.org/jira/browse/SOLR-9882
>
>  - Bram
>
> On 16/09/2020 03:04, Mark Robinson wrote:
> > Thanks Dominique!
> > So is this parameter generally recommended or not. I wanted to try with a
> > value of 10s. We are not using it now.
> > My goal is to prevent a query from running more than 10s on the solr
> server
> > and choking it.
> >
> > What is the general recommendation.
> >
> > Thanks!
> > Mark
> >
> > On Tue, Sep 15, 2020 at 5:38 PM Dominique Bejean <
> dominique.bej...@eolya.fr>
> > wrote:
> >
> >> Hi,
> >>
> >> 1. Yes, your analysis is correct
> >>
> >> 2. Yes, it can occur too with a very slow query.
> >>
> >> Regards
> >>
> >> Dominique
> >>
> >> Le mar. 15 sept. 2020 à 15:14, Mark Robinson 
> a
> >> écrit :
> >>
> >>> Hi,
> >>>
> >>> When in a sample query I used "timeAllowed" as low as 10mS, I got value
> >> for
> >>>
> >>> "numFound" as say 2000, but no docs were returned. But when I increased
> >> the
> >>>
> >>> value for timeAllowed to be in seconds, never got this scenario.
> >>>
> >>>
> >>>
> >>> I have 2 qns:-
> >>>
> >>> 1. Why does numFound have a value like say 2000 or even 6000 but no
> >>>
> >>> documents actually returned. During document collection is calculation
> of
> >>>
> >>> numFound done first and doc collection later?. Is doc list empty
> >> because,by
> >>>
> >>> the time doc collection started the timeAllowed cut off took effect?
> >>>
> >>>
> >>>
> >>> 2. If I give timeAllowed a value say, 10s or above do you think the
> above
> >>>
> >>> scenario of valid count displayed in numFound, but doc list empty can
> >> ever
> >>>
> >>> occur still, as there is more time before cut-off to retrieve at least
> >> one
> >>>
> >>> doc ?
> >>>
> >>>
> >>>
> >>> Thanks!
> >>>
> >>> Mark
> >>>
> >>>
> >>
> >
>
>


Re: "timeAllowed" param with "numFound" having a count value but doc list is empty

2020-09-15 Thread Mark Robinson
Thanks Dominique!
So is this parameter generally recommended or not? I wanted to try it with a
value of 10s. We are not using it now.
My goal is to prevent a query from running for more than 10s on the Solr
server and choking it.

What is the general recommendation?

Thanks!
Mark

On Tue, Sep 15, 2020 at 5:38 PM Dominique Bejean 
wrote:

> Hi,
>
> 1. Yes, your analysis is correct
>
> 2. Yes, it can occur too with a very slow query.
>
> Regards
>
> Dominique
>
> On Tue, Sep 15, 2020 at 15:14, Mark Robinson  wrote:
>
> > Hi,
> >
> > When in a sample query I used "timeAllowed" as low as 10mS, I got value
> for
> >
> > "numFound" as say 2000, but no docs were returned. But when I increased
> the
> >
> > value for timeAllowed to be in seconds, never got this scenario.
> >
> >
> >
> > I have 2 qns:-
> >
> > 1. Why does numFound have a value like say 2000 or even 6000 but no
> >
> > documents actually returned. During document collection is calculation of
> >
> > numFound done first and doc collection later?. Is doc list empty
> because,by
> >
> > the time doc collection started the timeAllowed cut off took effect?
> >
> >
> >
> > 2. If I give timeAllowed a value say, 10s or above do you think the above
> >
> > scenario of valid count displayed in numFound, but doc list empty can
> ever
> >
> > occur still, as there is more time before cut-off to retrieve at least
> one
> >
> > doc ?
> >
> >
> >
> > Thanks!
> >
> > Mark
> >
> >
>


"timeAllowed" param with "numFound" having a count value but doc list is empty

2020-09-15 Thread Mark Robinson
Hi,
When in a sample query I used "timeAllowed" as low as 10 ms, I got a value
for "numFound" of, say, 2000, but no docs were returned. When I increased
the value of timeAllowed to be in seconds, I never got this scenario.

I have two questions:
1. Why does numFound have a value like 2000 or even 6000 when no documents
are actually returned? During document collection, is the calculation of
numFound done first and doc collection later? Is the doc list empty because,
by the time doc collection started, the timeAllowed cutoff took effect?

2. If I give timeAllowed a value of, say, 10s or above, do you think the
above scenario (a valid count displayed in numFound, but an empty doc list)
can still occur, given that there is more time before the cutoff to retrieve
at least one doc?

Thanks!
Mark
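
For reference, the parameter and the partial-results check look roughly
like this in SolrJ (a sketch; "solr" is an assumed SolrClient, and, as the
replies above note, you must decide what to do when results are partial):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.response.QueryResponse;

  SolrQuery query = new SolrQuery("some user query");
  query.setTimeAllowed(10_000);  // time budget in milliseconds (10s)

  QueryResponse rsp = solr.query(query);
  // Solr flags partialResults=true in the response header when the budget
  // ran out -- the case where numFound can be set but the doc list is empty.
  if (Boolean.TRUE.equals(rsp.getHeader().get("partialResults"))) {
      // treat as a failure, or return the incomplete results with a warning
  }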


Re: What is the Best way to block certain types of queries/ query patterns in Solr?

2020-09-08 Thread Mark Robinson
Makes sense.
Thanks much David!

Mark

On Fri, Sep 4, 2020 at 12:13 AM David Smiley  wrote:

> The general assumption in deploying a search platform is that you are going
> to front it with a service you write that has the search features you care
> about, and only those.  Only this service or other administrative functions
> should reach Solr.  Be wary of making your service so flexible to support
> arbitrary parameters you pass to Solr as-is that you don't know about in
> advance (i.e. use an allow-list).
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Mon, Aug 31, 2020 at 10:57 AM Mark Robinson 
> wrote:
>
> > Hi,
> > I had come across a mail (Oct, 2019 one) which suggested the best way is
> to
> > handle it before it reaches Solr. I was curious whether:-
> >1. Jetty query filter can be used (came across something like
> > that; need to check)
> > 2. Any new features in Solr itself (like in a request handler...or
> > solrconfig, schema etc..)
> >
> > Thanks!
> > Mark
> >
>


What is the Best way to block certain types of queries/ query patterns in Solr?

2020-08-31 Thread Mark Robinson
Hi,
I had come across a mail (an Oct 2019 one) which suggested the best way is to
handle it before it reaches Solr. I was curious whether:
   1. a Jetty query filter can be used (I came across something like
that; need to check)
   2. there are any new features in Solr itself (like in a request handler,
or solrconfig, schema, etc.)

Thanks!
Mark


Incorrect Insecure Settings Check in CoreContainer

2020-08-05 Thread Mark Todd1


I've configured SolrCloud (8.5) with both SSL and Authentication, and it is 
working correctly. However, I get the following warning in the logs:
 
Solr authentication is enabled, but SSL is off. Consider enabling SSL to 
protect user credentials and data with encryption
 
Looking at the source code for SolrCloud, there appears to be a bug:

  if (authenticationPlugin != null &&
      StringUtils.isNotEmpty(System.getProperty("solr.jetty.https.port"))) {
    log.warn("Solr authentication is enabled, but SSL is off.  Consider "
        + "enabling SSL to protect user credentials and data with encryption.");
  }
 
Rather than checking for an empty system property (which would indicate SSL is 
off), it's checking for a populated one, which is what you get when SSL is on.
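
Presumably the intended check is the negated one, something like:

  // Warn only when no HTTPS port is configured, i.e. SSL really is off.
  if (authenticationPlugin != null &&
      StringUtils.isEmpty(System.getProperty("solr.jetty.https.port"))) {
    log.warn("Solr authentication is enabled, but SSL is off.  Consider "
        + "enabling SSL to protect user credentials and data with encryption.");
  }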
 
Should I raise this as a Jira bug?
 
Mark Todd

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU



Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-29 Thread Mark H. Wood
Wandering off topic, but still apropos Solr.

On Sun, Jun 28, 2020 at 12:14:56PM +0200, Ilan Ginzburg wrote:
> I disagree Ishan. We shouldn't get rid of standalone mode.
> I see three layers in Solr:
> 
>1. Lucene (the actual search libraries)
>2. The server infra ("standalone Solr" basically)
>3. Cluster management (SolrCloud)
> 
> There's value in using lower layers without higher ones.
> SolrCloud is a good solution for some use cases but there are others that
> need a search server and for which SolrCloud is not a good fit and will
> likely never be. If standalone mode is no longer available, such use cases
> will have to turn to something other than Solr (or fork and go their own
> way).

A data point:

While working to upgrade a dependent product from Solr 4 to Solr 7, I
came across a number of APIs which would have made things simpler,
neater and more reliable...except that they all are available *only*
in SolrCloud.  I eventually decided that asking thousands of sites to
run "degenerate" SolrCloud clusters (of a single instance, plus the ZK
stuff that most would find mysterious) was just not worth the gain.

So, my wish-list for Solr includes either (a) abolish standalone so
the decision is taken out of my hands, or (b) port some of the
cloud-only APIs back to the standalone layer.  I haven't spent a
moment's thought on how difficult either would be -- as I said, just a
wish.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-24 Thread Mark H. Wood
On Wed, Jun 24, 2020 at 12:45:25PM +0200, Jan Høydahl wrote:
> Master/slave and standalone are used interchangeably to mean zk-less Solr. I 
> have a feeling that master/slave is the more popular of the two, but 
> personally I have been using both.

I've been trying to stay quiet and let the new-terminology issue
settle, but I had a thought.  Someone has already pointed out that the
so-called master/slave cluster is misnamed:  the so-called "master"
node doesn't order the "slaves" about and indeed has no notion of
being a master in any sense.  It acts as a servant to the "slave"
nodes, which are in charge of keeping themselves updated.

So, it's kind of odd, but I could get used to calling this mode a
"client/server cluster".

That leaves the question of what to call Solr Cloud mode, in which no
node is permanently special.  I could see calling it a "herd" or
suchlike.

Now I'll try to shut up again. :-)

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-19 Thread Mark H. Wood
On Fri, Jun 19, 2020 at 09:22:49AM -0400, j.s. wrote:
> On 6/18/20 9:50 PM, Rahul Goswami wrote:
> > So +1 on "slave" being the problematic term IMO, not "master".
> 
> but you cannot have a master without a slave, n'est-ce pas?

Well, yes.  In education:  Master of Science, Arts, etc.  In law:
Special Master (basically a judge's delegate).  See also "magistrate."
None of these has any connotation of the ownership of one person by
another.

(It's a one-way relationship:  there is no slavery without mastery,
but there are other kinds of mastery.)

But this is an emotional issue, not a logical one.  If doing X makes
people angry, and we don't want to make those people angry, then
perhaps we should not do X.

> i think it is better to use the metaphor of copying rather than one of 
> hierarchy. language has so many (unintended) consequences ...

Sensible.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-18 Thread Mark H. Wood
Primary / satellite?

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Script to check if solr is running

2020-06-05 Thread Mark H. Wood
On Thu, Jun 04, 2020 at 12:36:30PM -0400, Ryan W wrote:
> Does anyone have a script that checks if solr is running and then starts it
> if it isn't running?  Occasionally my solr stops running even if there has
> been no Apache restart.  I haven't been able to determine the root cause,
> so the next best thing might be to check every 15 minutes or so if it's
> running and run it if it has stopped.

I've used Monit for things that must be kept running:

  https://mmonit.com/monit/
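
For example, a minimal Monit control-file entry (the pidfile path and the
service commands are guesses for a typical install):

  check process solr with pidfile /var/solr/solr-8983.pid
    start program = "/usr/sbin/service solr start"
    stop program  = "/usr/sbin/service solr stop"
    if failed port 8983 protocol http for 2 cycles then restart

Monit polls on a fixed cycle (30-120 seconds is typical), which covers the
"check every 15 minutes or so" requirement with room to spare.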

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: 404 response from Schema API

2020-05-15 Thread Mark H. Wood
On Thu, May 14, 2020 at 02:47:57PM -0600, Shawn Heisey wrote:
> On 5/14/2020 1:13 PM, Mark H. Wood wrote:
> > On Fri, Apr 17, 2020 at 10:11:40AM -0600, Shawn Heisey wrote:
> >> On 4/16/2020 10:07 AM, Mark H. Wood wrote:
> >>> I need to ask Solr 4.10 for the name of the unique key field of a
> >>> schema.  So far, no matter what I've done, Solr is returning a 404.
> 
> The Luke Request Handler, normally assigned to the /admin/luke path, 
> will give you the info you're after.  On a stock Solr install, the 
> following URL would work:
> 
> /solr/admin/luke?show=schema
> 
> I have tried this on solr 4.10.4 and can confirm that the response does 
> have the information.

Thank you, for the information and especially for taking the time to test.

> Since you are working with a different context path, you'll need to 
> adjust your URL to match.
> 
> Note that as of Solr 5.0, running with a different context path is not 
> supported.  The admin UI and the more advanced parts of the startup 
> scripts are hardcoded for the /solr context.

Yes.  5.0+ isn't packaged to be run in Tomcat, as we do now, so Big
Changes are coming when we upgrade.
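
For the record, adjusted for this context path and core, the URL would
presumably be:

  curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/admin/luke?show=schema&wt=json'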

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: 404 response from Schema API

2020-05-14 Thread Mark H. Wood
On Thu, May 14, 2020 at 03:13:07PM -0400, Mark H. Wood wrote:
> Anyway, I'll be reading up on how to upgrade to 5.  (Hopefully not
> farther, just yet -- changes between, I think, 5 and 6 mean I'd have
> to spend a week reloading 10 years worth of data.  For now I don't
> want to go any farther than I have to, to make this work.)

Nope, my memory was faulty:  those changes happened in 5.0.  (The
schemas I've been given, used since time immemorial, are chock full of
IntField and DateField.)  I'm stuck with reloading.  Might as well go
to 8.x.  Or give up on asking Solr for the schema's uniqueKey,
configure the client with the field name and cross fingers.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: 404 response from Schema API

2020-05-14 Thread Mark H. Wood
On Fri, Apr 17, 2020 at 10:11:40AM -0600, Shawn Heisey wrote:
> On 4/16/2020 10:07 AM, Mark H. Wood wrote:
> > I need to ask Solr 4.10 for the name of the unique key field of a
> > schema.  So far, no matter what I've done, Solr is returning a 404.
> > 
> > This works:
> > 
> >curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/select'
> > 
> > This gets a 404:
> > 
> >curl 
> > 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/schema/uniquekey'
> > 
> > So does this:
> > 
> >curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/schema'
> > 
> > We normally use the ClassicIndexSchemaFactory.  I tried switching to
> > ManagedIndexSchemaFactory but it made no difference.  Nothing is
> > logged for the failed requests.
> 
>  From what I can see, the schema API handler was introduced in version 
> 5.0.  The SchemaHandler class exists in the released javadoc for the 5.0 
> version, but not the 4.10 version.  You'll need a newer version of Solr.

*sigh*  That's what I see too, when I dig through the JARs.  For some
reason, many folks believe that the Schema API existed at least as
far back as 4.2:

  
https://stackoverflow.com/questions/7247221/does-solr-has-api-to-read-solr-schema-xml

Perhaps because the _Apache Solr Reference Guide 4.10_ says so, on
page 53.

This writer thinks it worked, read-only, on 4.10.3:

  
https://stackoverflow.com/questions/33784998/solr-rest-api-for-schema-updates-returns-method-not-allowed-405

But it doesn't work here, on 4.10.4:

  curl 'https://toolshed.wood.net:8443/isw6/solr/statistics/schema?wt=json'
  14-May-2020 15:07:03.805 INFO 
[https-jsse-nio-fec0:0:0:1:0:0:0:7-8443-exec-60] 
org.restlet.engine.log.LogFilter.afterHandle 2020-05-14  15:07:03
fec0:0:0:1:0:0:0:7  -   fec0:0:0:1:0:0:0:7  8443GET 
/isw6/solr/schema   wt=json 404 0   0   0   
https://toolshed.wood.net:8443  curl/7.69.1 -

Strangely, Solr dropped the core-name element of the path!

Any idea what happened?

Anyway, I'll be reading up on how to upgrade to 5.  (Hopefully not
farther, just yet -- changes between, I think, 5 and 6 mean I'd have
to spend a week reloading 10 years worth of data.  For now I don't
want to go any farther than I have to, to make this work.)

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Solr Ref Guide Redesign coming in 8.6

2020-04-29 Thread Mark H. Wood
At first glance, I have no big issues.  It looks clean and functional,
and I like that.  I think it will work well enough for me.

This design still has a minor annoyance that I have noted in the past:
in the table of contents pane it is easy to open a subtree, but the
only way to close it is to open another one.  Obviously not a big
deal.

I'll probably spend too much time researching how to widen the
razor-thin scrollbar in the TOC panel, since it seems to be
independent of the way I spent too much time fixing the browser's own
inadequate scrollbar width. :-) Also, the thumb's color is so close to
the surrounding color that it's really hard to see.  And for some
reason when I use the mouse wheel to scroll the TOC, when it gets to
the top or the bottom the content pane starts scrolling instead, which
is surprising and mildly inconvenient.  Final picky point:  the
scrolling is *very* insensitive -- takes a lot of wheel motion to move
the panel just a bit.

(I'm aware that a lot of the things I complain about in "modern" web
sites are the things that make them "modern".  So, I'm an old fossil. :-)

Firefox 68.7.0esr, Gentoo Linux.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: 404 response from Schema API

2020-04-16 Thread Mark H. Wood
On Thu, Apr 16, 2020 at 02:00:06PM -0400, Erick Erickson wrote:
> Assuming isw6_3 is your collection name, you have
> “solr” and “isw6_3” reversed in the URL.

No.  Solr's context is '/isw6_3/solr' and the core is 'statistics'.

> Should be something like:
> https://toolshed.wood.net:8443/solr/isw6_3/schema/uniquekey
> 
> If that’s not the case you need to mention your collection. But in
> either case your collection name comes after /solr/.

Thank you.  I think that's what I have now.

> > On Apr 16, 2020, at 12:07 PM, Mark H. Wood  wrote:
> > 
> > I need to ask Solr 4.10 for the name of the unique key field of a
> > schema.  So far, no matter what I've done, Solr is returning a 404.
> > 
> > This works:
> > 
> >  curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/select'
> > 
> > This gets a 404:
> > 
> >  curl 
> > 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/schema/uniquekey'
> > 
> > So does this:
> > 
> >  curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/schema'
> > 
> > We normally use the ClassicIndexSchemaFactory.  I tried switching to
> > ManagedIndexSchemaFactory but it made no difference.  Nothing is
> > logged for the failed requests.
> > 
> > Ideas?
> > 
> > -- 
> > Mark H. Wood
> > Lead Technology Analyst
> > 
> > University Library
> > Indiana University - Purdue University Indianapolis
> > 755 W. Michigan Street
> > Indianapolis, IN 46202
> > 317-274-0749
> > www.ulib.iupui.edu
> 
> 

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




404 response from Schema API

2020-04-16 Thread Mark H. Wood
I need to ask Solr 4.10 for the name of the unique key field of a
schema.  So far, no matter what I've done, Solr is returning a 404.

This works:

  curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/select'

This gets a 404:

  curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/schema/uniquekey'

So does this:

  curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/schema'

We normally use the ClassicIndexSchemaFactory.  I tried switching to
ManagedIndexSchemaFactory but it made no difference.  Nothing is
logged for the failed requests.

Ideas?

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Optimal size for queries?

2020-04-15 Thread Mark H. Wood
On Wed, Apr 15, 2020 at 10:09:59AM +0100, Colvin Cowie wrote:
> Hi, I can't answer the question as to what the optimal size of rows per
> request is. I would expect it to depend on the number of stored fields
> being marshaled, and their type, and your hardware.

It was a somewhat naive question, but I wasn't sure how to ask a
better one.  Having thought a bit more, I expect that the eventual
solution to my problem will include a number of different changes,
including larger pages, tuning several caches, providing a progress
indicator to the user, and (as you point out below) re-thinking how I
ask Solr for so many documents.

> But using start + rows is a *bad thing* for deep paging. You need to use
> cursorMark, which looks like it was added in 4.7 originally
> https://issues.apache.org/jira/browse/SOLR-5463
> There's a description on the newer reference guide
> https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors
> and in the 4.10 PDF on page 305
> https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.10.pdf
> 
> http://yonik.com/solr/paging-and-deep-paging/

Thank you for the links.  I think these will be very helpful.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu


signature.asc
Description: PGP signature


Optimal size for queries?

2020-04-10 Thread Mark H. Wood
I need to pull a *lot* of records out of a core, to be statistically
analyzed and the stat.s presented to the user, who is sitting at a
browser waiting.  So far I haven't seen a way to calculate the stat.s
I need in Solr itself.  It's difficult to know the size of the total
result, so I'm running the query repeatedly and windowing the results
with 'start' and 'rows'.  I just guessed that a window of 1000
documents would be reasonable.  We currently have about 48GB in the
core.

The product uses Solr 4.10.  Yes, I know that's very old.

What I got is that every three seconds or so I get another 1000
documents, totalling around 500KB per response.  For a user request
for a large range, this is taking way longer than the user's browser
is willing to wait.  The single CPU on my test box is at 99%
continuously, and Solr's memory use is around 90% of 8GB.  The test
hardware is a VMWare guest on an 'Intel(R) Xeon(R) Gold 6150 CPU @
2.70GHz'.

A sample query:

0:0:0:0:0:0:0:1 - - [10/Apr/2020:13:34:18 -0400] "GET 
/solr/statistics/select?q=*%3A*=1000=%2Btype%3A0+%2BbundleName%3AORIGINAL+%2Bstatistics_type%3Aview=%2BisBot%3Afalse=%2Btime%3A%5B2018-01-01T05%3A00%3A00Z+TO+2020-01-01T04%3A59%3A59Z%5D=time+asc=867000=javabin=2
 HTTP/1.1" 200 497475 "-" 
"Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0"

As you can see, my test was getting close to 1000 windows.  It's still
going.  I don't know how far along that is.

So I'm wondering:

o  how can I do better than guessing that 1000 is a good window size?
   How big a response is too big?

o  what else should I be thinking about?

o  given that my test on a full-sized copy of the live data has been
   running for an hour and is still going, is it totally impractical
   to expect that I can improve the process enough to give a response
   to an ad-hoc query while-you-wait?

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: How do *you* restrict access to Solr?

2020-03-19 Thread Mark H. Wood
On Mon, Mar 16, 2020 at 11:43:10AM -0400, Ryan W wrote:
> On Mon, Mar 16, 2020 at 11:40 AM Walter Underwood 
> wrote:
> 
> > Also, even if you prevent access to the admin UI, a request to /update can
> > delete
> > all the content. It is really easy. This Gist shows how.
> >
> > https://gist.github.com/nz/673027/313f70681daa985ea13ba33a385753aef951a0f3
> 
> 
> 
> This seems important.  In other words, my work isn't necessarily done if
> I've secured the graphical UI.  I can't just visit the admin UI page to see
> if my efforts are successful.

It is VERY IMPORTANT.  You are correct.  The Admin. GUI is just a
convenience layer over extensive REST APIs.  You need to secure access
to the APIs, not just the admin. application that runs on top of them.

If all use is from the local host, then running Solr only on the
loopback address will keep outsiders from connecting to any part of
it.

If other internal hosts need access, then I would run Solr only on an
RFC1918 (non-routed) address, and set up the Solr host's firewall to
grant access to Solr's port (8983 by default) only from permitted hosts.

  https://tools.ietf.org/html/rfc1918

Who/what needs access to Solr?  Do you need to grant different levels
of access to specific groups of users?  Then you need something like
Role-Based Access Control.  This is true even if access is only
internal or even just from the same host.  Address-based controls only
divide the universe between those who can do nothing to your Solr and
those who can do *everything* to your Solr.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Dependency log4j-slf4j-impl for solr-core:7.5.0 causing a number of build problems

2020-01-17 Thread Mark H. Wood
On Thu, Jan 16, 2020 at 03:13:17PM +, Wolf, Chris (ELS-CON) wrote:
> --- original message ---
> It looks to me as though solr-core is not the only artifact with that
> dependency.  The first thing I would do is examine the output of 'mvn
> dependency:tree' to see what has dragged log4j-slf4j-impl in even when
> it is excluded from solr-core. 
> --- end of original message ---
> 
> Hi, that's the first thing I did and *only* solr-core is pulling in 
> log4j-slf4j-impl, but there is more weirdness to this.  When I build as a WAR 
> project, then version 2.11.0 of log4j-slf4j-impl is pulled in, which 
> results in a "multiple implementations" warning and is non-fatal.  
> 
> However, when building as a spring-boot executable jar, for some reason, it 
> pulls in version 2.7 rather then 2.11.0 resulting in fatal 
> "ClassNotFoundException: org.apache.logging.log4j.util.ReflectionUtil"

For the version problem, I would try adding something like:

  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-slf4j-impl</artifactId>
        <version>2.11.0</version>
      </dependency>
    </dependencies>
  </dependencyManagement>

to pin down the version no matter what is pulling it in.  Not ideal,
since you want to be rid of this dependency altogether, but at least
it may allow the spring-boot artifact to run, until the other problem
is sorted.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Dependency log4j-slf4j-impl for solr-core:7.5.0 causing a number of build problems

2020-01-16 Thread Mark H. Wood
On Thu, Jan 16, 2020 at 02:03:06AM +, Wolf, Chris (ELS-CON) wrote:
[snip]
> There are several issues:
> 
>   1.  I don’t want log4j-slf4j-impl at all
>   2.  Somehow the version of “log4j-slf4j-impl” being used for the build is 
> 2.7 rather than the expected 2.11.0
>   3.  Due to the version issue, the app croaks with ClassNotFoundException: 
> org.apache.logging.log4j.util.ReflectionUtil
> 
> For issue #1, I tried:
>   <dependency>
>     <groupId>org.apache.solr</groupId>
>     <artifactId>solr-core</artifactId>
>     <version>7.5.0</version>
>     <exclusions>
>       <exclusion>
>         <groupId>org.apache.logging.log4j</groupId>
>         <artifactId>log4j-slf4j-impl</artifactId>
>       </exclusion>
>     </exclusions>
>   </dependency>
> 
> 
> All to no avail, as that dependency ends up in the packaged build - for WAR, 
> it’s version 2.11.0, so even though it’s a bad build, the app runs; but when 
> building a spring-boot executable JAR with embedded webserver, for some 
> reason, it switches log4j-slf4j-impl from version 2.11.0 to 2.7 (2.11.0 
> works, but should not even be there).
> 
> I also tried this:
> https://docs.spring.io/spring-boot/docs/current/maven-plugin/examples/exclude-dependency.html
> 
> …that didn’t work either.
> 
> I’m thinking that solr-core should have used a scope of “provided” for 
> “log4j-slf4j-impl”, but that’s conjecture about a possible solution going 
> forward. Does anyone know how I can exclude “log4j-slf4j-impl” from a 
> spring-boot build?

It looks to me as though solr-core is not the only artifact with that
dependency.  The first thing I would do is examine the output of 'mvn
dependency:tree' to see what has dragged log4j-slf4j-impl in even when
it is excluded from solr-core.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Get Solr to notice new core without restarting?

2019-12-13 Thread Mark H. Wood
I have a product which comes with several empty Solr cores already
configured and laid out, ready to be copied into place where Solr can
find them.  Is there a way to get Solr to notice new cores without
restarting it?  Is it likely there ever will be?  I'm one of the
people who test and maintain the product, so I'm always creating and
destroying instances.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: A Last Message to the Solr Users

2019-11-30 Thread Mark Miller
I’d also like to say the last 5 years of my life have been spent being paid
to upgrade Solr systems. I’ve made a lot of money doing this.

As I said from the start - take this for what it’s worth. For his guys it’s
not worth much. That’s cool.

And it’s a little inside joke that I’ll be back :) I joke a lot.

But seriously, you have a second chance here.

This mostly concerns SolrCloud. That’s why I recommend standalone mode. But
key people know what to do. I know it will happen - but their lives will be
easier if you help.

Lol.

- Mark

On Sat, Nov 30, 2019 at 9:25 PM Mark Miller  wrote:

> I said the key people understand :)
>
> I’ve worked in Lucene since 2006 and have an insane amount of the code
> foot print in Solr and SolrCloud :) Look up the stats. Do you have any
> contributions?
>
> I said the key people know.
>
> Solr stand-alone is and has been very capable. People are working around
> SolrCloud too. All fine and good. Millions are being made and saved.
> Everyone is comfortable. Some might think the sky looks clear and blue.
> I’ve spent a lot of capital to make sure the wrong people don’t think that
> anymore ;)
>
> Unless you are a Developer, you won’t understand the other issues. But you
> don’t need to.
>
> Mark
>
> On Sat, Nov 30, 2019 at 7:05 PM Dave  wrote:
>
>> I’m young here I think, not even 40 and only been using solr since like
>> 2008 or so, so like 1.4 give or take. But I know a really good therapist if
>> you want to talk about it.
>>
>> > On Nov 30, 2019, at 6:56 PM, Mark Miller  wrote:
>> >
>> > Now I have sacrificed to give you a new chance. A little for my
>> community.
>> > It was my community. But it was mostly for me. The developer I started
>> as
>> > would kick my ass today.  Circumstances and luck have brought money to
>> our
>> > project. And it has corrupted our process, our community, and our code.
>> >
>> > In college i would talk about past Mark screwing future Mark and too bad
>> > for him. What did he ever do for me? Well, he got me again ;)
>> >
>> > I’m out of steam, time and wife patience.
>> >
>> > Enough key people are aware of the scope of the problem now that you
>> won’t
>> > need me. I was never actually part of the package. To the many, many
>> people
>> > that offered me private notes of encouragement and future help - thank
>> you
>> > so much. Your help will be needed.
>> >
>> > You will reset. You will fix this. Or I will be back.
>> >
>> > Mark
>> >
>> >
>> > --
>> > - Mark
>> >
>> > http://about.me/markrmiller
>>
> --
> - Mark
>
> http://about.me/markrmiller
>
-- 
- Mark

http://about.me/markrmiller


Re: A Last Message to the Solr Users

2019-11-30 Thread Mark Miller
I said the key people understand :)

I’ve worked in Lucene since 2006 and have an insane amount of the code foot
print in Solr and SolrCloud :) Look up the stats. Do you have any
contributions?

I said the key people know.

Solr stand-alone is and has been very capable. People are working around
SolrCloud too.All fine and good. Millions are being made and saved.
Everyone is comfortable. Some might thinks the sky looks clear and blue.
I’ve spent a lot of capital to make sure the wrong people don’t think that
anymore ;)

Unless you are a Developer, you won’t understand the other issues. But you
don’t need to.

Mark

On Sat, Nov 30, 2019 at 7:05 PM Dave  wrote:

> I’m young here I think, not even 40 and only been using solr since like
> 2008 or so, so like 1.4 give or take. But I know a really good therapist if
> you want to talk about it.
>
> > On Nov 30, 2019, at 6:56 PM, Mark Miller  wrote:
> >
> > Now I have sacrificed to give you a new chance. A little for my
> community.
> > It was my community. But it was mostly for me. The developer I started as
> > would kick my ass today.  Circumstances and luck have brought money to our
> > project. And it has corrupted our process, our community, and our code.
> >
> > In college I would talk about past Mark screwing future Mark and too bad
> > for him. What did he ever do for me? Well, he got me again ;)
> >
> > I’m out of steam, time and my wife’s patience.
> >
> > Enough key people are aware of the scope of the problem now that you
> won’t
> > need me. I was never actually part of the package. To the many, many
> people
> > that offered me private notes of encouragement and future help - thank
> you
> > so much. Your help will be needed.
> >
> > You will reset. You will fix this. Or I will be back.
> >
> > Mark
> >
> >
> > --
> > - Mark
> >
> > http://about.me/markrmiller
>
-- 
- Mark

http://about.me/markrmiller


Re: A Last Message to the Solr Users

2019-11-30 Thread Mark Miller
Now I have sacrificed to give you a new chance. A little for my community.
It was my community. But it was mostly for me. The developer I started as
would kick my ass today.  Circumstances and luck have brought money to our
project. And it has corrupted our process, our community, and our code.

In college I would talk about past Mark screwing future Mark and too bad
for him. What did he ever do for me? Well, he got me again ;)

I’m out of steam, time and my wife’s patience.

Enough key people are aware of the scope of the problem now that you won’t
need me. I was never actually part of the package. To the many, many people
that offered me private notes of encouragement and future help - thank you
so much. Your help will be needed.

You will reset. You will fix this. Or I will be back.

Mark


-- 
- Mark

http://about.me/markrmiller


Re: A Last Message to the Solr Users

2019-11-28 Thread Mark Miller
It’s going to haunt me if I don’t bring up Hossman. I don’t feel I have to
because who doesn’t know him.

He is a treasure that doesn’t spend much time on SolrCloud and has checked
out of leadership for the large part for reasons I won’t argue with.

Why doesn’t he do much with SolrCloud in a real way? I can only guess. He
will tell you it’s above his pay grade or some dumb shit.

IMO, it’s probably more that super thorough people try to be thorough with
SolrCloud and when you do that, it will poke your eye out with a stick. And
then throw you over a cliff.

Make it something he can work on more than tangentially.

Mark
-- 
- Mark

http://about.me/markrmiller


Re: A Last Message to the Solr Users

2019-11-28 Thread Mark Miller
I’m including this response to a private email because it’s not something
I’ve brought up and I also think it’s a critical note:

“Yes. That is our biggest advantage. Being Apache. Almost no one seems to
be employed to help other contributors get their work in at the right
level, and all the money has ensured the end of the hobbyist. I hope that
changes too.”

-- 
- Mark

http://about.me/markrmiller


Re: A Last Message to the Solr Users

2019-11-28 Thread Mark Miller
Yes. That is our biggest advantage. Being Apache. Almost no one seems to be
employed to help other contributors get their work in at the right level,
and all the money has ensured the end of the hobbyist. I hope that changes
too.

Thanks for the note.

Mark

On Thu, Nov 28, 2019 at 1:55 PM Paras Lehana 
wrote:

> Hey Mark,
>
> I was actually expecting (and wanting) this after your LinkedIn post.
>
> At this point, the best way to use Solr is as it’s always been - avoid
>> SolrCloud and setup your own system in standalone mode.
>
>
> That's what I have been telling people who are just getting started with
> Solr and thinking that SolrCloud is actually something superior to the
> standalone mode. That may depend on the use case, but for me, I always
> prefer to achieve things from standalone perspective instead of investing
> my time over switching to Cloud.
>
> I handle Auto-Suggest at IndiaMART. We have over 60 million docs. Single
> server of *standalone* Solr is capable of handling 800 req/sec. In fact,
> on production, we get ~300 req/sec and the single Solr is still able to
> provide responses within 25 ms!
>
> Anyways, I don't think that the project was a failure. All these were the
> small drops of the big Solr Ocean. We, the community and you, tried, we
> tested and we are still here as the open community of one of the most
> powerful search platforms. SolrCloud was also needed to be introduced at
> some time. Notwithstanding, I do think that the project needs to be more
> open with community commits. The community and open-sourceness of Solr is
> what I used to love over those of ElasticSearch's.
>
> Anyways, keep rocking! You have already left your footprints into the
> history of this beast project.
>
> On Thu, 28 Nov 2019 at 09:10, Mark Miller  wrote:
>
>> Now one company thinks I’m after them because they were the main source of
>> the jokes.
>>
>> Companies is not a typo.
>>
>> If you are using Solr to make or save tons of money or run your business
>> and you employ developers, please include yourself in this list.
>>
>> You are taking, and in my opinion Solr is going down. It’s all against your
>> own interest even.
>>
>> I know of enough people that want to solve this now, that it’s likely only
>> a matter of time before they fix the situation - you never know though.
>> Things change, people get new jobs, jobs change. It will take at least 3-6
>> months to make things reasonable even with a good group banding together.
>>
>> But if you are extracting value from this project and have Solr developers
>> - id like to think you have enough of a stake in this to think about
>> changing the approach everyone has been taking. It’s not working, and the
>> longer it goes on, the harder it’s getting to fix things.
>>
>>
>> --
>> - Mark
>>
>> http://about.me/markrmiller
>>
>
>
> --
> --
> Regards,
>
> *Paras Lehana* [65871]
> Development Engineer, Auto-Suggest,
> IndiaMART Intermesh Ltd.
>
> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> Noida, UP, IN - 201303
>
> Mob.: +91-9560911996
> Work: 01203916600 | Extn:  *8173*
>
>
>
-- 
- Mark

http://about.me/markrmiller


Re: A Last Message to the Solr Users

2019-11-28 Thread Mark Miller
The people I have identified that I have the most faith in to lead the
fixing of Solr are Ishan, Noble and David. I encourage you all to look at
and follow and join in their leadership.

You can do this.


Mark
-- 
- Mark

http://about.me/markrmiller


Re: A Last Message to the Solr Users

2019-11-27 Thread Mark Miller
Now one company thinks I’m after them because they were the main source of
the jokes.

Companies is not a typo.

If you are using Solr to make or save tons of money or run your business
and you employ developers, please include yourself in this list.

You are taking, and in my opinion Solr is going down. It’s all against your
own interest even.

I know of enough people that want to solve this now, that it’s likely only
a matter of time before they fix the situation - you never know though.
Things change, people get new jobs, jobs change. It will take at least 3-6
months to make things reasonable even with a good group banding together.

But if you are extracting value from this project and have Solr developers
- id like to think you have enough of a stake in this to think about
changing the approach everyone has been taking. It’s not working, and the
longer it goes on, the harder it’s getting to fix things.


-- 
- Mark

http://about.me/markrmiller


Re: A Last Message to the Solr Users

2019-11-27 Thread Mark Miller
If SolrCloud worked well I’d still agree both options are very valid
depending on your use case. As it is, I’m embarrassed that people give me
any credit for this. I’m here to try and delight users and I have failed in
that. I tried to put a lot of my own time to address things outside of
working on my job of integrating Hadoop and upgrading Solr 4 instances for
years. But I couldn’t convince anyone of what was necessary to address what
has been happening, and my paid job has always been doing other things
since 2012.

On Wed, Nov 27, 2019 at 6:23 PM David Hastings 
wrote:

> Personally I found nothing in solr cloud worth changing from standalone
> for, and just added more complications, more servers, and required becoming
> an expert/knowledgeable in ZooKeeper, I’d rather spend my time developing
> than becoming a systems administrator
>
> On Wed, Nov 27, 2019 at 3:45 AM Mark Miller  wrote:
>
>> This is your cue to come and make your jokes with your name attached.
>> I’m
>> sure the Solr users will appreciate them more than I do. I can’t laugh at
>> this situation because I take production code seriously.
>>
>> --
>> - Mark
>>
>> http://about.me/markrmiller
>>
> --
- Mark

http://about.me/markrmiller


Re: A Last Message to the Solr Users

2019-11-27 Thread Mark Miller
This is your cue to come and make your jokes with your name attached. I’m
sure the Solr users will appreciate them more than I do. I can’t laugh at
this situation because I take production code seriously.

-- 
- Mark

http://about.me/markrmiller


Re: A Last Message to the Solr Users

2019-11-27 Thread Mark Miller
And if you are a developer, enjoy that Gradle build! It was the highlight
of my year.

On Wed, Nov 27, 2019 at 10:00 AM Mark Miller  wrote:

> If you have a SolrCloud installation that is somehow working for you,
> personally I would never upgrade. The software is getting progressively
> more unstable every release.
>
>
> I wrote most of the core of SolrCloud in a prototype fashion many, many
> years ago. Only Yonik’s isolated work is solid and most of my work still
> stands as it was. This situation has me abandoning that project so that
> people understand I won’t stand by garbage work.
>
> Given that no one seems to understand what is happening in SolrCloud under
> the covers or how it was intended to work, their best bet is to start
> rewriting. Until they do this, I recommend you do not upgrade from an
> install that is working for your needs. A new feature will not be worth the
> headaches.
>
>
> Some of the other committers, who certainly do not understand the scope of
> the problem or my code (they would have touched it a bit if they did) would
> prefer to laugh or form a defensive posture than fix the situation. Wait
> them out. The project will collapse or get better. If I ran a production
> instance of SolrCloud, I would wait to see which happens first before
> embracing any update.
>
>
> At this point, the best way to use Solr is as it’s always been - avoid
> SolrCloud and setup your own system in standalone mode. If I had to build a
> new Solr install today, this is what I would do.
>
>
> In my opinion, the companies that have been claiming to back Solr and
> SolrCloud have been negligent, and all of the users are paying the price.
> It hasn’t been my job to work on it in any real fashion since 2012. I’m
> sorry I couldn’t help improve the situation for you.
>
>
> Take it for what it’s worth. To some, not much I’m sure.
>
>
> Mark Miller
> --
> - Mark
>
> http://about.me/markrmiller
>
-- 
- Mark

http://about.me/markrmiller


A Last Message to the Solr Users

2019-11-27 Thread Mark Miller
If you have a SolrCloud installation that is somehow working for you,
personally I would never upgrade. The software is getting progressively
more unstable every release.


I wrote most of the core of SolrCloud in a prototype fashion many, many
years ago. Only Yonik’s isolated work is solid and most of my work still
stands as it was. This situation has me abandoning that project so that
people understand I won’t stand by garbage work.

Given that no one seems to understand what is happening in SolrCloud under
the covers or how it was intended to work, their best bet is to start
rewriting. Until they do this, I recommend you do not upgrade from an
install that is working for your needs. A new feature will not be worth the
headaches.


Some of the other committers, who certainly do not understand the scope of
the problem or my code (they would have touched it a bit if they did) would
prefer to laugh or form a defensive posture than fix the situation. Wait
them out. The project will collapse or get better. If I ran a production
instance of SolrCloud, I would wait to see which happens first before
embracing any update.


At this point, the best way to use Solr is as it’s always been - avoid
SolrCloud and setup your own system in standalone mode. If I had to build a
new Solr install today, this is what I would do.


In my opinion, the companies that have been claiming to back Solr and
SolrCloud have been negligent, and all of the users are paying the price.
It hasn’t been my job to work on it in any real fashion since 2012. I’m
sorry I couldn’t help improve the situation for you.


Take it for what it’s worth. To some, not much I’m sure.


Mark Miller
-- 
- Mark

http://about.me/markrmiller


Re: Active directory integration in Solr

2019-11-20 Thread Mark H. Wood
On Mon, Nov 18, 2019 at 03:08:51PM +, Kommu, Vinodh K. wrote:
> Does anyone know that Solr has any out of the box capability to integrate 
> Active directory (using LDAP) when security is enabled? Instead of creating 
> users in security.json file, planning to use users who already exists in 
> active directory so they can use their individual credentials rather than 
> defining in Solr. Did anyone came across similar requirement? If so was there 
> any working solution?

Searching for "solr authentication ldap" turned up this:

https://risdenk.github.io/2018/11/20/apache-solr-hadoop-authentication-plugin-ldap.html

ADS also uses Kerberos, and Solr has a Kerberos authN plugin.  Would
that help?
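
For the Kerberos route, the authentication section of security.json is
small; a minimal sketch (the plugin class is Solr's own, but keytab and
JAAS setup for the JVM still have to be done separately and are not shown):

  {
    "authentication": {
      "class": "org.apache.solr.security.KerberosPlugin"
    }
  }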

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu


signature.asc
Description: PGP signature


Re: Anyway to encrypt admin user plain text password in Solr

2019-11-14 Thread Mark H. Wood
On Thu, Nov 14, 2019 at 11:35:47AM +, Kommu, Vinodh K. wrote:
> We store the plain text password in basicAuth.conf file. This is a normal 
> file & we are securing it only with 600 file permissions so that others 
> cannot read it. We also run various solr APIs in our custom script for 
> various purposes using curl commands which needs admin user credentials to 
> perform operations. If admin credentials details from basicAuth.conf file or 
> from curl commands are exposed/compromised, eventually any person within the 
> organization who knows credentials can login to admin UI and perform any 
> read/write operations. This is a concern and auditing issue as well.

If the password is encrypted, then the decryption key must be supplied
before the password can be used.  This leads to one of two unfortunate
situations:

o  The user must enter the decryption key every time.  This defeats
   the purpose of storing credentials at the client.

   - or -

o  The decryption key is stored at the client, making it a new secret
   that must be protected (by encrypting it? you see where this is
   going)

There is no way around this.  If the client system stores a full set
of credentials, then anyone with sufficient access to the client
system can get everything he needs to authenticate an identity, no
matter what you do.  If the client system does not store a full set of
credentials, then the user must supply at least some of them whenever
they are needed.  The best one can usually do is to reduce the
frequency at which some credential must be entered manually.

Solr supplies several authentication mechanisms besides BasicAuth.
Would one of those serve?
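
As a small hardening step on the curl side at least, the credentials can be
kept out of scripts and process listings with a netrc file (the path here is
made up):

  curl --netrc-file /etc/solr/admin.netrc 'http://localhost:8983/solr/admin/info/system'

where admin.netrc holds one line of the form "machine <host> login <user>
password <pass>" and carries the same 600 permissions. That narrows the
exposure but, per the above, does not remove the stored secret.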

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu


signature.asc
Description: PGP signature


Re: Solr 7.7.2 Autoscaling policy - Poor performance

2019-09-03 Thread Mark Miller
Hook up a profiler to the overseer and see what it's doing, file a JIRA and
note the hotspots or what methods appear to be hanging out.
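
A low-tech stand-in for a full profiler, if that is easier when gathering
data for the JIRA (the PID is a placeholder):

  # take a few thread dumps ~10s apart; threads sitting in the same
  # overseer method across all dumps are the interesting ones
  for i in 1 2 3; do jstack <overseer-pid> > overseer-stack-$i.txt; sleep 10; done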

On Tue, Sep 3, 2019 at 1:15 PM Andrew Kettmann 
wrote:

>
> > You’re going to want to start by having more than 3gb for memory in my
> opinion but the rest of your set up is more complex than I’ve dealt with.
>
> right now the overseer is set to a max heap of 3GB, but is only using
> ~260MB of heap, so memory doesn't seem to be the issue unless there is a
> part of the picture I am missing there?
>
> Our overseers only jobs are being overseer and holding the .system
> collection. I would imagine if the overseer were hitting memory constraints
> it would have allocated more than 300MB of the total 3GB it is allowed,
> right?
>
> evolve24 Confidential & Proprietary Statement: This email and any
> attachments are confidential and may contain information that is
> privileged, confidential or exempt from disclosure under applicable law. It
> is intended for the use of the recipients. If you are not the intended
> recipient, or believe that you have received this communication in error,
> please do not read, print, copy, retransmit, disseminate, or otherwise use
> the information. Please delete this email and attachments, without reading,
> printing, copying, forwarding or saving them, and notify the Sender
> immediately by reply email. No confidentiality or privilege is waived or
> lost by any transmission in error.
>


-- 
- Mark

http://about.me/markrmiller


Re: HttpShardHandlerFactory

2019-08-20 Thread Mark Robinson
Hello Michael,

Thank you for pointing that out.
Today I am planning to try this out along with the insights Shawn had
shared.

Thanks!
Mark.

On Mon, Aug 19, 2019 at 9:21 AM Michael Gibney 
wrote:

> Mark,
>
> Another thing to check is that I believe the configuration you posted may
> not actually be taking effect. Unless I'm mistaken, I think the correct
> element name to configure the shardHandler is "shardHandler*Factory*", not
> "shardHandler" ... as in, '<shardHandlerFactory class="HttpShardHandlerFactory">...'
>
> The element name is documented correctly in the refGuide page for "Format
> of solr.xml":
>
> https://lucene.apache.org/solr/guide/8_1/format-of-solr-xml.html#the-shardhandlerfactory-element
>
> ... but the incorrect (?) element name is included in the refGuide page for
> "Distributed Requests":
>
> https://lucene.apache.org/solr/guide/8_1/distributed-requests.html#configuring-the-shardhandlerfactory
>
> Michael
>
> On Fri, Aug 16, 2019 at 9:40 AM Shawn Heisey  wrote:
>
> > On 8/16/2019 3:51 AM, Mark Robinson wrote:
> > > I am trying to understand the socket time out and connection time out
> in
> > > the HttpShardHandlerFactory:-
> > >
> > > <shardHandler class="HttpShardHandlerFactory">
> > >    <int name="socketTimeout">10</int>
> > >    <int name="connTimeout">20</int>
> > > </shardHandler>
> >
> > The shard handler is used when that Solr instance needs to make
> > connections to another Solr instance (which could be itself, as odd as
> > that might sound).  It does not apply to the requests that you make from
> > outside Solr.
> >
> > > 1. Could someone please help me understand the effect of using such low
> > > values of 10 ms
> > >  and 20ms as given above inside my /select handler?
> >
> > A connection timeout of 10 milliseconds *might* result in connections
> > not establishing at all.  This is translated down to the TCP socket as
> > the TCP connection timeout -- the time limit imposed on making the TCP
> > connection itself.  Which as I understand it, is the completion of the
> > "SYN", "SYN/ACK", and "ACK" sequence.  If the two endpoints of the
> > connection are on a LAN, you might never see a problem from this -- LAN
> > connections are very low latency.  But if they are across the Internet,
> > they might never work.
> >
> > The socket timeout of 20 milliseconds means that if the connection goes
> > idle for 20 milliseconds, it will be forcibly closed.  So if it took 25
> > milliseconds for the remote Solr instance to respond, this Solr instance
> > would have given up and closed the connection.  It is extremely common
> > for requests to take 100, 500, 2000, or more milliseconds to respond.
> >
> > > 2. What are the guidelines for setting these parameters? Should they be
> > low
> > > or high
> >
> > I would probably use a value of about 5000 (five seconds) for the
> > connection timeout if everything's on a local LAN.  I might go as high
> > as 15 seconds if there's a high latency network between them, but five
> > seconds is probably long enough too.
> >
> > For the socket timeout, you want a value that's considerably longer than
> > you expect requests to ever take.  Probably somewhere between two and
> > five minutes.
> >
> > > 3. How can I test the effect of this chunk of code after adding it to
> my
> > > /select handler ie I want to
> > >   make sure the above code snippet is working. That is why I gave
> > such
> > > low values and
> > >   thought when I fire a query I would get both time out errors in
> the
> > > logs. But did not!
> > >   Or is it that within the above time frame (10 ms, 20ms) if no
> > request
> > > comes the socket will
> > >   time out and the connection will be lost. So to test this should
> I
> > > give a say 100 TPS load with
> > >   these low values and then increase the values to maybe 1000 ms
> and
> > > 1500 ms respectively
> > >   and see lesser time out error messages?
> >
> > If you were running a multi-server SolrCloud setup (or a single-server
> > setup with multiple shards and/or replicas), you probably would see
> > problems from values that low.  But if Solr never has any need to make
> > connections to satisfy a request, then the values will never take effect.
> >
> > If you want to control these values for requests made from outside Solr,
> > you will need to do it in your client software that is making the
> request.
> >
> > Thanks,
> > Shawn
> >
>


Re: HttpShardHandlerFactory

2019-08-20 Thread Mark Robinson
Hello Shawn,

Thank you so much for the detailed response.
It was so helpful!

Thanks!
Mark.

On Fri, Aug 16, 2019 at 9:40 AM Shawn Heisey  wrote:

> On 8/16/2019 3:51 AM, Mark Robinson wrote:
> > I am trying to understand the socket time out and connection time out in
> > the HttpShardHandlerFactory:-
> >
> > <shardHandler class="HttpShardHandlerFactory">
> >    <int name="socketTimeout">10</int>
> >    <int name="connTimeout">20</int>
> > </shardHandler>
>
> The shard handler is used when that Solr instance needs to make
> connections to another Solr instance (which could be itself, as odd as
> that might sound).  It does not apply to the requests that you make from
> outside Solr.
>
> > 1. Could someone please help me understand the effect of using such low
> > values of 10 ms
> >  and 20ms as given above inside my /select handler?
>
> A connection timeout of 10 milliseconds *might* result in connections
> not establishing at all.  This is translated down to the TCP socket as
> the TCP connection timeout -- the time limit imposed on making the TCP
> connection itself.  Which as I understand it, is the completion of the
> "SYN", "SYN/ACK", and "ACK" sequence.  If the two endpoints of the
> connection are on a LAN, you might never see a problem from this -- LAN
> connections are very low latency.  But if they are across the Internet,
> they might never work.
>
> The socket timeout of 20 milliseconds means that if the connection goes
> idle for 20 milliseconds, it will be forcibly closed.  So if it took 25
> milliseconds for the remote Solr instance to respond, this Solr instance
> would have given up and closed the connection.  It is extremely common
> for requests to take 100, 500, 2000, or more milliseconds to respond.
>
> > 2. What are the guidelines for setting these parameters? Should they be
> low
> > or high
>
> I would probably use a value of about 5000 (five seconds) for the
> connection timeout if everything's on a local LAN.  I might go as high
> as 15 seconds if there's a high latency network between them, but five
> seconds is probably long enough too.
>
> For the socket timeout, you want a value that's considerably longer than
> you expect requests to ever take.  Probably somewhere between two and
> five minutes.
>
> > 3. How can I test the effect of this chunk of code after adding it to my
> > /select handler ie I want to
> >   make sure the above code snippet is working. That is why I gave
> such
> > low values and
> >   thought when I fire a query I would get both time out errors in the
> > logs. But did not!
> >   Or is it that within the above time frame (10 ms, 20ms) if no
> request
> > comes the socket will
> >   time out and the connection will be lost. So to test this should I
> > give a say 100 TPS load with
> >   these low values and then increase the values to maybe 1000 ms and
> > 1500 ms respectively
> >   and see lesser time out error messages?
>
> If you were running a multi-server SolrCloud setup (or a single-server
> setup with multiple shards and/or replicas), you probably would see
> problems from values that low.  But if Solr never has any need to make
> connections to satisfy a request, then the values will never take effect.
>
> If you want to control these values for requests made from outside Solr,
> you will need to do it in your client software that is making the request.
>
> Thanks,
> Shawn
>


HttpShardHandlerFactory

2019-08-16 Thread Mark Robinson
Hello,

I am trying to understand the socket time out and connection time out in
the HttpShardHandlerFactory:-

   <shardHandler class="HttpShardHandlerFactory">
      <int name="socketTimeout">10</int>
      <int name="connTimeout">20</int>
   </shardHandler>

1. Could someone please help me understand the effect of using such low
values of 10 ms
and 20ms as given above inside my /select handler?

2. What are the guidelines for setting these parameters? Should they be low
or high

3. How can I test the effect of this chunk of code after adding it to my
/select handler ie I want to
 make sure the above code snippet is working. That is why I gave such
low values and
 thought when I fire a query I would get both time out errors in the
logs. But did not!
 Or is it that within the above time frame (10 ms, 20ms) if no request
comes the socket will
 time out and the connection will be lost. So to test this should I
give a say 100 TPS load with
 these low values and then increase the values to maybe 1000 ms and
1500 ms respectively
 and see lesser time out error messages?

I am trying to understand how these parameters can be put to good use.

Thanks!
Mark
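
Folding in Michael's element-name correction and timeout values in the range
Shawn suggests elsewhere in this thread, a corrected sketch (values here are
illustrative, not tested) might look like:

   <requestHandler name="/select" class="solr.SearchHandler">
     <shardHandlerFactory class="HttpShardHandlerFactory">
       <int name="socketTimeout">120000</int>
       <int name="connTimeout">5000</int>
     </shardHandlerFactory>
   </requestHandler>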


Re: Enumerating cores via SolrJ

2019-08-13 Thread Mark H. Wood
On Fri, Aug 09, 2019 at 03:45:21PM -0600, Shawn Heisey wrote:
> On 8/9/2019 3:07 PM, Mark H. Wood wrote:
> > Did I miss something, or is there no way, using SolrJ, to enumerate
> > loaded cores, as:
> > 
> >curl 'http://solr.example.com:8983/solr/admin/cores?action=STATUS'
> > 
> > does?
> 
> This code will do so.  I tested it.
[snip]

Thank you.  That was just the example I needed.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu


signature.asc
Description: PGP signature


Re: Solr restricting time-consuming/heavy processing queries

2019-08-13 Thread Mark Robinson
Thank you Jan for the reply.
I will try it out.

Best,
Mark.

On Mon, Aug 12, 2019 at 6:29 PM Jan Høydahl  wrote:

> I have never used such settings, but you could check out
> https://lucene.apache.org/solr/guide/8_1/common-query-parameters.html#segmentterminateearly-parameter
> which will allow you to pre-sort the index so that any early termination
> will actually return the most relevant docs. This will probably be easier
> to setup once https://issues.apache.org/jira/browse/SOLR-13681 is done.
>
> According to that same page you will not be able to abort long-running
> faceting using timeAllowed, but there are other ways to optimize faceting,
> such as using jsonFacet, threaded execution etc.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> 12. aug. 2019 kl. 23:10 skrev Mark Robinson :
>
> Hi Jan,
>
> Thanks for the reply.
> Our normal search time is within 650 ms.
> We were analyzing some queries and found that a few of them were like 14675
> ms, 13767 ms etc...
> So was curious to see whether we have some way to restrict the query to
> not run beyond say 5s or some ideal timing  in SOLR even if it returns only
> partial results.
>
> That is how I came across the "timeAllowed" and wanted to check on it.
> Also was curious to know whether  "shardHandler"  could be used to work
> in those lines or it is meant for a totally different functionality.
>
> Thanks!
> Best,
> Mark
>
>
> On Sun, Aug 11, 2019 at 8:17 AM Jan Høydahl  wrote:
>
>> What is the root use case you are trying to solve? What kind of solr
>> install is this and do you not have control over the clients or what is the
>> reason that users overload your servers?
>>
>> Normally you would scale the cluster to handle normal expected load
>> instead of trying to give users timeout exceptions. What kind of query
>> times do you experience that are above 1s and are these not important
>> enough to invest extra HW? Trying to understand the real reason behind your
>> questions.
>>
>> Jan Høydahl
>>
>> > 11. aug. 2019 kl. 11:43 skrev Mark Robinson :
>> >
>> > Hello,
>> > Could someone share their thoughts please or point to some link that
>> helps
>> > understand my above queries?
>> > In the Solr documentation I came across a few lines on timeAllowed and
>> > shardHandler, but if there was an example scenario for both it would
>> help
>> > understand them more thoroughly.
>> > Also curious to know different ways if any in SOLR to restrict/limit a
>> time
>> > consuming query from processing for a long time.
>> >
>> > Thanks!
>> > Mark
>> >
>> > On Fri, Aug 9, 2019 at 2:15 PM Mark Robinson 
>> > wrote:
>> >
>> >>
>> >> Hello,
>> >> I have the following questions please:-
>> >>
>> >> In solrconfig.xml I created a new "/selecttimeout" handler copying
>> >> "/select" handler and added the following to my new "/selecttimeout":-
>> >>  <shardHandler class="HttpShardHandlerFactory">
>> >>    <int name="socketTimeout">10</int>
>> >>    <int name="connTimeout">20</int>
>> >>  </shardHandler>
>> >>
>> >> 1.
>> >> Does the above mean that if I don't get a request once in 10ms on the
>> >> socket handling the /selecttimeout handler, that socket will be closed?
>> >>
>> >> 2.
>> >> Same with connTimeOut? i.e. the connection object remains live only if
>> at
>> >> least a connection request comes once in every 20 ms; if not the object
>> >> gets closed?
>> >>
>> >> Suppose a time-consuming query (say with lots of facets etc...), is
>> fired
>> >> against SOLR. How can I prevent Solr from processing it for more than
>> 1s?
>> >>
>> >> 3.
>> >> Is this achieved by setting timeAllowed=1000?  Or are there any other
>> ways
>> >> to do this in Solr?
>> >>
>> >> 4
>> >> For the same purpose to prevent heavy queries overloading SOLR, does
>> the
>> >> <shardHandler> above help in any way or is it that shardHandler has
>> nothing
>> >> to restrict a query once fired against Solr?
>> >>
>> >>
>> >> Could someone pls share your views?
>> >>
>> >> Thanks!
>> >> Mark
>> >>
>>
>
>


Re: Solr restricting time-consuming/heavy processing queries

2019-08-12 Thread Mark Robinson
Hi Jan,

Thanks for the reply.
Our normal search time is within 650 ms.
We were analyzing some queries and found that a few of them were like 14675
ms, 13767 ms etc...
So was curious to see whether we have some way to restrict the query to not
run beyond say 5s or some ideal timing  in SOLR even if it returns only
partial results.

That is how I came across the "timeAllowed" and wanted to check on it.
Also was curious to know whether  "shardHandler"  could be used to work in
those lines or it is meant for a totally different functionality.

Thanks!
Best,
Mark


On Sun, Aug 11, 2019 at 8:17 AM Jan Høydahl  wrote:

> What is the root use case you are trying to solve? What kind of solr
> install is this and do you not have control over the clients or what is the
> reason that users overload your servers?
>
> Normally you would scale the cluster to handle normal expected load
> instead of trying to give users timeout exceptions. What kind of query
> times do you experience that are above 1s and are these not important
> enough to invest extra HW? Trying to understand the real reason behind your
> questions.
>
> Jan Høydahl
>
> > 11. aug. 2019 kl. 11:43 skrev Mark Robinson :
> >
> > Hello,
> > Could someone share their thoughts please or point to some link that
> helps
> > understand my above queries?
> > In the Solr documentation I came across a few lines on timeAllowed and
> > shardHandler, but if there was an example scenario for both it would help
> > understand them more thoroughly.
> > Also curious to know different ways if any in SOLR to restrict/limit a
> time
> > consuming query from processing for a long time.
> >
> > Thanks!
> > Mark
> >
> > On Fri, Aug 9, 2019 at 2:15 PM Mark Robinson 
> > wrote:
> >
> >>
> >> Hello,
> >> I have the following questions please:-
> >>
> >> In solrconfig.xml I created a new "/selecttimeout" handler copying
> >> "/select" handler and added the following to my new "/selecttimeout":-
> >>  <shardHandler class="HttpShardHandlerFactory">
> >>    <int name="socketTimeout">10</int>
> >>    <int name="connTimeout">20</int>
> >>  </shardHandler>
> >>
> >> 1.
> >> Does the above mean that if I don't get a request once in 10ms on the
> >> socket handling the /selecttimeout handler, that socket will be closed?
> >>
> >> 2.
> >> Same with connTimeOut? i.e. the connection object remains live only if
> at
> >> least a connection request comes once in every 20 ms; if not the object
> >> gets closed?
> >>
> >> Suppose a time-consuming query (say with lots of facets etc...), is
> fired
> >> against SOLR. How can I prevent Solr from processing it for more than 1s?
> >>
> >> 3.
> >> Is this achieved by setting timeAllowed=1000?  Or are there any other
> ways
> >> to do this in Solr?
> >>
> >> 4
> >> For the same purpose to prevent heavy queries overloading SOLR, does the
> >> <shardHandler> above help in any way or is it that shardHandler has
> nothing
> >> to restrict a query once fired against Solr?
> >>
> >>
> >> Could someone pls share your views?
> >>
> >> Thanks!
> >> Mark
> >>
>


Re: Solr restricting time-consuming/heavy processing queries

2019-08-11 Thread Mark Robinson
Hello,
Could someone share their thoughts please or point to some link that helps
understand my above queries?
In the Solr documentation I came across a few lines on timeAllowed and
shardHandler, but if there was an example scenario for both it would help
understand them more thoroughly.
Also curious to know different ways if any in SOLR to restrict/limit a
time-consuming query from processing for a long time.

Thanks!
Mark

On Fri, Aug 9, 2019 at 2:15 PM Mark Robinson 
wrote:

>
> Hello,
> I have the following questions please:-
>
> In solrconfig.xml I created a new "/selecttimeout" handler copying
> "/select" handler and added the following to my new "/selecttimeout":-
>   <shardHandler class="HttpShardHandlerFactory">
>     <int name="socketTimeout">10</int>
>     <int name="connTimeout">20</int>
>   </shardHandler>
>
> 1.
> Does the above mean that if I don't get a request once in 10ms on the
> socket handling the /selecttimeout handler, that socket will be closed?
>
> 2.
> Same with connTimeOut? i.e. the connection object remains live only if at
> least a connection request comes once in every 20 ms; if not the object
> gets closed?
>
> Suppose a time-consuming query (say with lots of facets etc...), is fired
> against SOLR. How can I prevent Solr from processing it for more than 1s?
>
> 3.
> Is this achieved by setting timeAllowed=1000?  Or are there any other ways
> to do this in Solr?
>
> 4
> For the same purpose to prevent heavy queries overloading SOLR, does the
> <shardHandler> above help in any way or is it that shardHandler has nothing
> to restrict a query once fired against Solr?
>
>
> Could someone pls share your views?
>
> Thanks!
> Mark
>


Enumerating cores via SolrJ

2019-08-09 Thread Mark H. Wood
Did I miss something, or is there no way, using SolrJ, to enumerate
loaded cores, as:

  curl 'http://solr.example.com:8983/solr/admin/cores?action=STATUS'

does?
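
A minimal SolrJ sketch of one way to do it, along the lines of the snipped
reply earlier in the thread (the base URL is the one from the question; the
rest is an untested assumption):

  import java.util.Map;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.request.CoreAdminRequest;
  import org.apache.solr.client.solrj.response.CoreAdminResponse;
  import org.apache.solr.common.params.CoreAdminParams;
  import org.apache.solr.common.util.NamedList;

  // equivalent of /solr/admin/cores?action=STATUS: print each loaded core name
  try (HttpSolrClient client =
          new HttpSolrClient.Builder("http://solr.example.com:8983/solr").build()) {
    CoreAdminRequest request = new CoreAdminRequest();
    request.setAction(CoreAdminParams.CoreAdminAction.STATUS);
    CoreAdminResponse response = request.process(client);
    for (Map.Entry<String, NamedList<Object>> core : response.getCoreStatus()) {
      System.out.println(core.getKey());
    }
  }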

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu


signature.asc
Description: PGP signature


Solr restricting time-consuming/heavy processing queries

2019-08-09 Thread Mark Robinson
Hello,
I have the following questions please:-

In solrconfig.xml I created a new "/selecttimeout" handler copying
"/select" handler and added the following to my new "/selecttimeout":-
  <shardHandler class="HttpShardHandlerFactory">
    <int name="socketTimeout">10</int>
    <int name="connTimeout">20</int>
  </shardHandler>

1.
Does the above mean that if I don't get a request once in 10ms on the socket
handling the /selecttimeout handler, that socket will be closed?

2.
Same with connTimeOut? i.e. the connection object remains live only if at
least a connection request comes once in every 20 ms; if not the object
gets closed?

Suppose a time-consuming query (say with lots of facets etc...), is fired
against SOLR. How can I prevent Solr from processing it for more than 1s?

3.
Is this achieved by setting timeAllowed=1000?  Or are there any other ways
to do this in Solr?

4
For the same purpose to prevent heavy queries overloading SOLR, does the
<shardHandler> above help in any way or is it that shardHandler has nothing
to restrict a query once fired against Solr?


Could someone pls share your views?

Thanks!
Mark
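
Since timeAllowed is just a request parameter, a quick way to experiment with
it is a query along these lines (the core and facet field names are made up):

  curl 'http://localhost:8983/solr/mycore/select?q=*:*&facet=true&facet.field=category&timeAllowed=1000'

When the limit is hit, the response header should carry partialResults=true
rather than an error, which is worth checking for in the client.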


Solr 7.7 restore issue

2019-07-12 Thread Mark Thill
I have a 4 node cluster.  My goal is to have 2 shards with two replicas
each, allowing only 1 core on each node.  I have a cluster policy set to:

[{"replica":"2", "shard": "#EACH", "collection":"test",
"port":"8983"},{"cores":"1", "node":"#ANY"}]

I then manually create a collection with:

name: test
config set: test
numShards: 2
replicationFactor: 2

This works and I get a collection that looks like what I expect.  I then
backup this collection.  But when I try to restore the collection it fails
and says

"Error getting replica locations : No node can satisfy the rules"
[{"replica":"2", "shard": "#EACH", "collection":"test",
"port":"8983"},{"cores":"1", "node":"#ANY"}]

If I set my cluster-policy rules back to [] and try to restore it then
successfully restores my collection exactly how I expect it to be.  It
appears that having any cluster-policy rules in place is affecting my
restore, but the "error getting replica locations" is strange.

Any suggestions?

mark 
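
For context, the restore described above goes through the Collections API,
along these lines (the backup name and location are made up):

  curl 'http://localhost:8983/solr/admin/collections?action=RESTORE&name=test_backup&collection=test&location=/backups'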


Solr 7.7 autoscaling trigger

2019-07-08 Thread Mark Thill
My scenario is:

   - 60 GB collection
   - 2 shards of ~30GB
   - Each shard having 2 replicas so I have a backup
   - So I have 4 nodes with each node holding a single core

My goal is to have autoscaling handle when I lose a node.  So upon loss of
a node the nodeLost event deletes the node.  Then when I add back in
another node I want it to replace the node I lost keeping each shard with 2
replicas.   The problem is that I can't find a policy that keeps 2 replicas
per shard because when the nodeAdded event fires it wants to add a 3rd
replica to the shard that already has 2 replicas.  I can't seem to get it
to add the replica to the shard that is left with the single replica.

Any suggestions on a policy to keep this balanced?

Mark


Re: Sort on PointFieldType

2019-07-03 Thread Mark Sholund
My thought is that “greater than” and “less than” are generally undefined for 
n-dimensional points where n>1.
Is (45,45) > (-45,-45)?  If you’re talking about distance from (0,0) they’re 
“equal”. If you’re talking about distance from some arbitrary point then they 
are not necessarily “equal”; what would make one sort higher/lower?
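
If the underlying goal is nearest-first ordering, the usual workaround is to
sort on a distance function rather than on the point field itself, along
these lines (this assumes a lat/lon style spatial field; the field name is
hypothetical):

  q=*:*&sfield=location_p&pt=45.0,-45.0&sort=geodist() asc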

On Wed, Jul 3, 2019 at 2:50 PM, Prince Manohar  
wrote:

> Hi,
> I have a *field* that is of *PointType *and I tried to sort on that field.
> But looks like sorting does not work on PointType.
> Or am I doing something wrong?
> Find my query below:-
> http://localhost:8983/solr/testcollection/select?indent=on&q=*:*&sort=abc.pqr_d
> DESC&wt=json
> 
>
> --
> Regards,
> Prince Manohar
> B.Tech (Information Technology)
> Bengaluru
> +91 7797045315

Re: qf in conjunction with boost

2019-06-29 Thread Mark Sholund
On further reading it seems that maybe

boost=map(popularity_d,0,0,1) might work
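
A sketch of combining both on edismax, where boost multiplies the qf-weighted
score (the query text is a placeholder; the field names are from this thread):

  q=foo&defType=edismax&qf=title^5 description^5 _text_&boost=map(popularity_d,0,0,1)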

On Sat, Jun 29, 2019 at 8:56 PM, Shawn Heisey  wrote:

> On 6/27/2019 8:54 PM, Mark Sholund wrote:
>> qf=title^5 description^5 _text_
>>
>> And now I want to include additional boosting based on a popularity
>> score include with some documents. I’ve done this as follows
>>
>> q={!boost b=map(popularity_d,0,0,1)}
>>
>> However now it seems that the score is the same regardless of whether qf
>> is included or not - this renders qf irrelevant to my query.
>
> qf is a parameter for the dismax and edismax query parsers, but your
> query has changed to the boost query parser. It is very unlikely that
> the boost parser uses the qf parameter.
>
> It looks like using edismax with its "boost" parameter MIGHT be what you
> are after:
>
> https://lucene.apache.org/solr/guide/8_1/the-extended-dismax-query-parser.html
>
> The edismax parser also supports the bq and bf parameters from the
> dismax parser:
>
> https://lucene.apache.org/solr/guide/8_1/the-dismax-query-parser.html#bq-boost-query-parameter
>
> Thanks,
> Shawn

Re: qf in conjunction with boost

2019-06-29 Thread Mark Sholund
Thanks for your reply.

The problem I ran into with using the boost parameter was that not all of my 
documents have the boosting field and those were coming back with a score of 
zero.  I tried setting a default for that field but it didn’t help. I found 
somewhere that using the boost parser with a map function would get me around 
that and it seemed to until I noticed that the qf parameter was being ignored. 
Maybe I can use bf instead.

Sent from ProtonMail Mobile

On Sat, Jun 29, 2019 at 8:56 PM, Shawn Heisey  wrote:

> On 6/27/2019 8:54 PM, Mark Sholund wrote:
>> qf=title^5 description^5 _text_
>>
>> And now I want to include additional boosting based on a popularity
>> score include with some documents. I’ve done this as follows
>>
>> q={!boost b=map(popularity_d,0,0,1)}
>>
>> However now it seems that the score is the same regardless of whether qf
>> is included or not - this renders qf irrelevant to my query.
>
> qf is a parameter for the dismax and edismax query parsers, but your
> query has changed to the boost query parser. It is very unlikely that
> the boost parser uses the qf parameter.
>
> It looks like using edismax with its "boost" parameter MIGHT be what you
> are after:
>
> https://lucene.apache.org/solr/guide/8_1/the-extended-dismax-query-parser.html
>
> The edismax parser also supports the bq and bf parameters from the
> dismax parser:
>
> https://lucene.apache.org/solr/guide/8_1/the-dismax-query-parser.html#bq-boost-query-parameter
>
> Thanks,
> Shawn

Re: qf in conjunction with boost

2019-06-29 Thread Mark Sholund
No responses yet, is my question unclear or is this not possible?

On Thu, Jun 27, 2019 at 10:54 PM, Mark Sholund  
wrote:

> Hello,
>
> I have been using the following to boost based on field content.
>
> qf=title^5 description^5 _text_
>
> And now I want to include additional boosting based on a popularity score 
> include with some documents. I’ve done this as follows
>
> q={!boost b=map(popularity_d,0,0,1)}
>
> However now it seems that the score is the same regardless of whether qf is 
> included or not - this renders qf irrelevant to my query.
>
> Can I do both of these boostings?  A multiplicative boost on the fields is 
> acceptable if that’s possible and simplifies an answer.
>
> - Mark

qf in conjunction with boost

2019-06-27 Thread Mark Sholund
Hello,

I have been using the following to boost based on field content.

qf=title^5 description^5 _text_

And now I want to include additional boosting based on a popularity score 
include with some documents. I’ve done this as follows

q={!boost b=map(popularity_d,0,0,1)}

However now it seems that the score is the same regardless of whether qf is 
included or not - this renders qf irrelevant to my query.

Can I do both of these boostings?  A multiplicative boost on the fields is 
acceptable if that’s possible and simplifies an answer.

- Mark

Re: SolrInputDocument setField method

2019-06-26 Thread Mark Sholund
I noticed this yesterday as well. The toString() and jsonStr() (in later 
versions) of SolrJ both include things like

toString(): {id=id=foo123, ...}
or
jsonStr(): {"id":"id=foo123",...}

However Solr does not reject the documents so this must just be an issue with 
the two methods.
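
For comparison, a hand-written JSON update body for that same atomic update
would look like this (per the Solr update JSON format):

  [{"id":"foo123","popularity_i":{"set":1}}]

which is consistent with Solr accepting the documents: the doubled names
appear to be an artifact of those two debug methods only.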

On Wed, Jun 26, 2019 at 12:31 PM, Samuel Kasimalla  wrote:

> Hi Vicenzo,
>
> Maybe looking at the overridden toString() would give you a clue.
>
> The second part, I don't think SolrJ holds it twice (if you are worried
> about redundant usage of memory), BUT if you haven't used SolrJ so far and
> wanted to know if this is the format in which it pushes to Solr, I'm pretty
> sure it doesn't push this format into Solr.
>
> Thanks,
> Sam
> https://www.linkedin.com/in/skasimalla
>
> On Wed, Jun 26, 2019 at 11:52 AM Vincenzo D'Amore 
> wrote:
>
>> Hi all,
>>
>> I have a very basic question related to the SolrInputDocument behaviour.
>>
>> Looking at SolrInputDocument source code I found how the method setField
>> works:
>>
>> public void setField(String name, Object value )
>> {
>> SolrInputField field = new SolrInputField( name );
>> _fields.put( name, field );
>> field.setValue( value );
>> }
>>
>> The field name is "duplicated" into the SolrInputField.
>>
>> For example, if I'm storing a field "color" with value "red" what we have
>> is a Map like this:
>>
>> { "key" : "color", "value" : { "name" : "color", "value" : "red" } }
>>
>> the name field "color" appears twice. Very likely there is a reason for
>> this, could you please point me in the right direction?
>>
>> For example, I'm worried about at what happens with SolrJ when I'm sending
>> a lot of documents, where for each field the fieldName is sent twice.
>>
>> Thanks,
>> Vincenzo
>>
>>
>> --
>> Vincenzo D'Amore

Re: Invoice 6873 from Sobek Digital Hosting and Consulting, LLC 26.06.19

2019-06-26 Thread Mark Sullivan
All,


THIS EMAIL IS PHISHING AND IMPERSONATED MY EMAIL ADDRESS.


PLEASE IGNORE!


Mark


From: Mark Sullivan
Sent: Wednesday, June 26, 2019 1:29:09 PM
Subject: Invoice 6873 from Sobek Digital Hosting and Consulting, LLC 26.06.19


Hi,



Mark used box to share INV-6873



Kindly press REVIEW DOCUMENT 
<https://app.box.com/s/st6swhttif7gyh6lzr522ddrsbwjxzj1> to access the secure 
document



Please let us know if there is any skipped invoices.



Thank you

Mark V. Sullivan

CIO & Application Architect

Sobek Digital Hosting and Consulting, LLC

mark.v.sulli...@sobekdigital.com

866-981-5016 (office)

352-682-9692 (mobile)




Invoice 6873 from Sobek Digital Hosting and Consulting, LLC 26.06.19

2019-06-26 Thread Mark Sullivan
Hi,



Mark used box to share INV-6873



Kindly press REVIEW DOCUMENT 
<https://app.box.com/s/st6swhttif7gyh6lzr522ddrsbwjxzj1> to access the secure 
document



Please let us know if there is any skipped invoices.



Thank you
Mark V. Sullivan
CIO & Application Architect
Sobek Digital Hosting and Consulting, LLC
mark.v.sulli...@sobekdigital.com
866-981-5016 (office)
352-682-9692 (mobile)



bug in SolrInputDocument.toString()?

2019-06-25 Thread Mark Sholund
Hello,
First time poster here so sorry for any formatting problems.  I am sorry if 
this has been asked before but I've tried several versions of SolrJ
(6.5.1-8.1.1) with the same result.

I am running the following example code and am seeing odd output.

String id = "foo123";
int popularity=1;
SolrInputDocument inputDocument = new SolrInputDocument();
System.out.println("creating document for " + id);
inputDocument.addField(ID_FIELD, id);
inputDocument.addField(POPULARITY_FIELD, Collections.singletonMap("set", 
popularity));
System.out.println("document: " + inputDocument);
System.out.println("json: " + inputDocument.jsonStr());

This produces the output

creating document for foo123
document: {id=id=foo123, popularity_i=popularity_i={set=1}}
json: {"id":"id=foo123","popularity_i":"popularity_i={set=1}"}

I cannot see anything that I am doing wrong and the update succeeds.  Is this 
just a bug in the toString() and jsonStr() methods?

publickey - mark.d.sholund@protonmail.com - 0x9EF69757.asc
Description: application/pgp-keys


signature.asc
Description: OpenPGP digital signature


Re: SOLR Suggester returns either the full field value or single terms only

2019-06-20 Thread Mark H. Wood
On Wed, Jun 19, 2019 at 12:20:43PM -0700, ppunet wrote:
> As the SuggeterComponent provides the 'entire content' of the field in the
> suggestions. How is it possible to have Suggester to return only part of the
> content of the field, instead of the entire content, which in my scenario
> quite long?

Possibly worthless newbie suggestion:  could you use highlighting to
locate the text that triggered the suggestion, and just chop off
leading and trailing context down to a reasonable length surrounding
the match?  Kind of like you'd see in a printed KWIC index:  give as
much context as will fit the available space, and don't worry about
the rest.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu


signature.asc
Description: PGP signature


Re: searching only within a date range

2019-06-07 Thread Mark Fenbers - NOAA Federal
Disregard my previous response.  When I reindexed, something went wrong and
so my Lucene database was empty, which explains the immediate return and 0
results.  I reindexed again (properly) and all is working fine now.  Thanks
for the help.
Mark
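
In SolrJ terms, the working combination from this thread (the field name and
range are the ones discussed below):

  SolrQuery query = new SolrQuery(searchWord);
  query.setParam("df", "logtext");
  // filter matches to the posttime window instead of faceting on it
  query.addFilterQuery("posttime:[2010-01-01T00:00:00Z TO 2015-01-01T00:00:00Z]");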

On Fri, Jun 7, 2019 at 10:40 AM Erick Erickson 
wrote:

> Yeah, it can be opaque…
>
> My first guess is that you may not have a field “posttime” defined in your
> schema and/or documents. For searching it needs “indexed=true” and for
> faceting/grouping/sorting it should have “docValues=true”. That’s what your
> original facet query was telling you, the field isn’t there. Switching to
> an “fq” clause is consistent with there being no “posttime” field since
> Solr is fine with  docs that don’t have a  particular field. So by
> specifying a date range, any doc without a “posttime” field will be omitted
> from the results.
>
> Or it  just is spelled differently ;)
>
> Some things that might help:
>
> 1> Go to the admin UI and select cores>>your_core, then look at the
> “schema” link. There’s a drop-down that lets you select fields that are
> actually in your index and see  some of the values. My bet: “posttime”
> isn’t in the list. If so, you need to add it and re-index the docs  with a
> posttime field. If there is a “posttime”, select it and look at the upper
> right to see how it’s defined. There are two rows, one for what the schema
> thinks the definition is and one for what is actually in the Lucene  index.
>
> 2> add debug=query to your queries, and run them from the admin UI.
> That’ll give you a _lot_ quicker turn-around as well as some good info
> about how  the query was actually executed.
>
> Best,
> Erick
>
> > On Jun 7, 2019, at 7:23 AM, Mark Fenbers - NOAA Federal
>  wrote:
> >
> > So, instead of addDateRangeFacet(), I used:
> > query.setParam("fq", "posttime:[2010-01-01T00:00:00Z TO
> > 2015-01-01T00:00:00Z]");
> >
> > I didn't get any errors, but the query returned immediately with 0
> > results.  Without this constraint, it searches 13,000 records and takes 1
> to
> > 2 minutes and returns 356 records.  So something is not quite right, and
> > I'm too new at this to understand where I went wrong.
> > Mark
> >
> > On Fri, Jun 7, 2019 at 9:52 AM Andrea Gazzarini 
> > wrote:
> >
> >> Hi Mark, you are using a "range facet" which is a "query-shape" feature,
> >> it doesn't have any constraint on the results (i.e. it doesn't filter at
> >> all).
> >> You need to add a filter query [1] with a date range clause (e.g.
> >> fq=field:[<start date> TO <end date or *>]).
> >>
> >> Best,
> >> Andrea
> >>
> >> [1]
> >>
> >>
> https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-Thefq_FilterQuery_Parameter
> >> [2] https://lucene.apache.org/solr/guide/6_6/working-with-dates.html
> >>
> >> On 07/06/2019 14:02, Mark Fenbers - NOAA Federal wrote:
> >>> Hello!
> >>>
> >>> I have a search setup and it works fine.  I search a text field called
> >>> "logtext" in a database table.  My Java code is like this:
> >>>
> >>> SolrQuery query = new SolrQuery();
> >>> query.setQuery(searchWord);
> >>> query.setParam("df", "logtext");
> >>>
> >>> Then I execute the search... and it works just great.  But now I want
> to
> >>> add a constraint to only search for the "searchWord" within a certain
> >> range
> >>> of time -- given timestamps in the column called "posttime".  So, I
> added
> >>> the code in bold below:
> >>>
> >>> SolrQuery query = new SolrQuery();
> >>> query.setQuery(searchWord);
> >>> *query.setFacet(true);*
> >>> *query.addDateRangeFacet("posttime", new
> Date(System.currentTimeMillis()
> >> -
> >>> 1000L * 86400L * 365L), new Date(System.currentTimeMillis()), "+1DAY");
> >> /*
> >>> from 1 year ago to present */*
> >>> query.setParam("df", "logtext");
> >>>
> >>> But this gives me a complaint: *undefined field: "posttime"* so I
> clearly
> >>> do not understand the arguments needed to addDateRangeFacet().  Can
> >> someone
> >>> help me determine the proper code for doing what I want?
> >>>
> >>> Further, I am puzzled about the "gap" argument [last one in
> >>> addDateRangeFacet()].  What does this do?  I used +1DAY, but I really
> >> have
> >>> no idea the purpose of this.  I haven't found any documentation that
> >>> explains this well.
> >>>
> >>> Mark
> >>>
> >>
> >>
>
>


Re: searching only within a date range

2019-06-07 Thread Mark Fenbers - NOAA Federal
I added "posttime" to the schema first thing this morning, but your message
reminded me that I needed to re-index the table, which I did.  My schema
entry:

  <field name="posttime" type="pdate" indexed="true" stored="true" ... />

But my SQL contains "SELECT posttime as id" and so I tried both "posttime"
and "id" in my setParam() function, namely,
query.setParam("fq", "id:[2007-01-01T00:00:00Z TO 2010-01-01T00:00:00Z]");

So, whether I use "id" (string) or "posttime" (date), my results are an
immediate return of zero results.

I did look in the admin interface and *did* see posttime listed as one of
the index items.  The two rows (Index Analyzer and Query Analyzer) show the
same thing: org.apache.solr.schema.FieldType$DefaultAnalyzer, though I'm
not certain of the implications of this.

I have not attempted your debug=query suggestion just yet...
Mark

On Fri, Jun 7, 2019 at 10:40 AM Erick Erickson 
wrote:

> Yeah, it can be opaque…
>
> My first guess is that you may not have a field “posttime” defined in your
> schema and/or documents. For searching it needs “indexed=true” and for
> faceting/grouping/sorting it should have “docValues=true”. That’s what your
> original facet query was telling you, the field isn’t there. Switching to
> an “fq” clause is consistent with there being no “posttime” field since
> Solr is fine with  docs that don’t have a  particular field. So by
> specifying a date range, any doc without a “posttime” field will be omitted
> from the results.
>
> Or it  just is spelled differently ;)
>
> Some things that might help:
>
> 1> Go to the admin UI and select cores>>your_core, then look at the
> “schema” link. There’s a drop-down that lets you select fields that are
> actually in your index and see  some of the values. My bet: “posttime”
> isn’t in the list. If so, you need to add it and re-index the docs  with a
> posttime field. If there is a “posttime”, select it and look at the upper
> right to see how it’s defined. There are two rows, one for what the schema
> thinks the definition is and one for what is actually in the Lucene  index.
>
> 2> add debug=query to your queries, and run them from the admin UI.
> That’ll give you a _lot_ quicker turn-around as well as some good info
> about how  the query was actually executed.
>
> Best,
> Erick
>
> > On Jun 7, 2019, at 7:23 AM, Mark Fenbers - NOAA Federal
>  wrote:
> >
> > So, instead of addDateRangeFacet(), I used:
> > query.setParam("fq", "posttime:[2010-01-01T00:00:00Z TO
> > 2015-01-01T00:00:00Z]");
> >
> > I didn't get any errors, but the query returned immediately with 0
> > results.  Without this constraint, it searches 13,000 records and takes 1
> to
> > 2 minutes and returns 356 records.  So something is not quite right, and
> > I'm too new at this to understand where I went wrong.
> > Mark
> >
> > On Fri, Jun 7, 2019 at 9:52 AM Andrea Gazzarini 
> > wrote:
> >
> >> Hi Mark, you are using a "range facet" which is a "query-shape" feature,
> >> it doesn't have any constraint on the results (i.e. it doesn't filter at
> >> all).
> >> You need to add a filter query [1] with a date range clause (e.g.
> >> fq=field:[<start_date> TO <end_date or *>]).
> >>
> >> Best,
> >> Andrea
> >>
> >> [1]
> >>
> >>
> https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-Thefq_FilterQuery_Parameter
> >> [2] https://lucene.apache.org/solr/guide/6_6/working-with-dates.html
> >>
> >> On 07/06/2019 14:02, Mark Fenbers - NOAA Federal wrote:
> >>> Hello!
> >>>
> >>> I have a search setup and it works fine.  I search a text field called
> >>> "logtext" in a database table.  My Java code is like this:
> >>>
> >>> SolrQuery query = new SolrQuery();
> >>> query.setQuery(searchWord);
> >>> query.setParam("df", "logtext");
> >>>
> >>> Then I execute the search... and it works just great.  But now I want
> to
> >>> add a constraint to only search for the "searchWord" within a certain
> >> range
> >>> of time -- given timestamps in the column called "posttime".  So, I
> added
> >>> the code in bold below:
> >>>
> >>> SolrQuery query = new SolrQuery();
> >>> query.setQuery(searchWord);
> >>> *query.setFacet(true);*
> >>> *query.addDateRangeFacet("posttime", new
> Date(System.currentTimeMillis()
> >> -
> >>> 1000L * 86400L * 365L), new Date(System.currentTimeMillis()), "+1DAY");
> >> /*
> >>> from 1 year ago to present */*
> >>> query.setParam("df", "logtext");
> >>>
> >>> But this gives me a complaint: *undefined field: "posttime"* so I
> clearly
> >>> do not understand the arguments needed to addDateRangeFacet().  Can
> >> someone
> >>> help me determine the proper code for doing what I want?
> >>>
> >>> Further, I am puzzled about the "gap" argument [last one in
> >>> addDateRangeFacet()].  What does this do?  I used +1DAY, but I really
> >> have
> >>> no idea the purpose of this.  I haven't found any documentation that
> >>> explains this well.
> >>>
> >>> Mark
> >>>
> >>
> >>
>
>


Re: searching only within a date range

2019-06-07 Thread Mark Fenbers - NOAA Federal
So, instead of addDateRangeFacet(), I used:
query.setParam("fq", "posttime:[2010-01-01T00:00:00Z TO
2015-01-01T00:00:00Z]");

I didn't get any errors, but the query returned immediately with 0
results.  Without this constraint, it searches 13,000 records and takes 1 to
2 minutes and returns 356 records.  So something is not quite right, and
I'm too new at this to understand where I went wrong.
Mark

On Fri, Jun 7, 2019 at 9:52 AM Andrea Gazzarini 
wrote:

> Hi Mark, you are using a "range facet" which is a "query-shape" feature,
> it doesn't have any constraint on the results (i.e. it doesn't filter at
> all).
> You need to add a filter query [1] with a date range clause (e.g.
> fq=field:[<start_date> TO <end_date or *>]).
>
> Best,
> Andrea
>
> [1]
>
> https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-Thefq_FilterQuery_Parameter
> [2] https://lucene.apache.org/solr/guide/6_6/working-with-dates.html
>
> On 07/06/2019 14:02, Mark Fenbers - NOAA Federal wrote:
> > Hello!
> >
> > I have a search setup and it works fine.  I search a text field called
> > "logtext" in a database table.  My Java code is like this:
> >
> > SolrQuery query = new SolrQuery();
> > query.setQuery(searchWord);
> > query.setParam("df", "logtext");
> >
> > Then I execute the search... and it works just great.  But now I want to
> > add a constraint to only search for the "searchWord" within a certain
> range
> > of time -- given timestamps in the column called "posttime".  So, I added
> > the code in bold below:
> >
> > SolrQuery query = new SolrQuery();
> > query.setQuery(searchWord);
> > *query.setFacet(true);*
> > *query.addDateRangeFacet("posttime", new Date(System.currentTimeMillis()
> -
> > 1000L * 86400L * 365L), new Date(System.currentTimeMillis()), "+1DAY");
> /*
> > from 1 year ago to present */*
> > query.setParam("df", "logtext");
> >
> > But this gives me a complaint: *undefined field: "posttime"* so I clearly
> > do not understand the arguments needed to addDateRangeFacet().  Can
> someone
> > help me determine the proper code for doing what I want?
> >
> > Further, I am puzzled about the "gap" argument [last one in
> > addDateRangeFacet()].  What does this do?  I used +1DAY, but I really
> have
> > no idea the purpose of this.  I haven't found any documentation that
> > explains this well.
> >
> > Mark
> >
>
>


searching only within a date range

2019-06-07 Thread Mark Fenbers - NOAA Federal
Hello!

I have a search setup and it works fine.  I search a text field called
"logtext" in a database table.  My Java code is like this:

SolrQuery query = new SolrQuery();
query.setQuery(searchWord);
query.setParam("df", "logtext");

Then I execute the search... and it works just great.  But now I want to
add a constraint to only search for the "searchWord" within a certain range
of time -- given timestamps in the column called "posttime".  So, I added
the code in bold below:

SolrQuery query = new SolrQuery();
query.setQuery(searchWord);
*query.setFacet(true);*
*query.addDateRangeFacet("posttime", new Date(System.currentTimeMillis() -
1000L * 86400L * 365L), new Date(System.currentTimeMillis()), "+1DAY"); /*
from 1 year ago to present */*
query.setParam("df", "logtext");

But this gives me a complaint: *undefined field: "posttime"* so I clearly
do not understand the arguments needed to addDateRangeFacet().  Can someone
help me determine the proper code for doing what I want?

Further, I am puzzled about the "gap" argument [last one in
addDateRangeFacet()].  What does this do?  I used +1DAY, but I really have
no idea the purpose of this.  I haven't found any documentation that
explains this well.

Mark
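
For the record, the fix that worked upthread was a filter query rather than
a facet; a minimal sketch, with the one-year window written in Solr date
math:

SolrQuery query = new SolrQuery(searchWord);
query.setParam("df", "logtext");
// fq restricts the result set; a facet only counts, it never filters
query.addFilterQuery("posttime:[NOW-1YEAR TO NOW]");

And the "gap" argument is just the bucket width used for facet counting
(+1DAY gives one count per day); it has no effect on which documents are
returned.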


Proper type(s) for adding a DatePointField value [was: problems with indexing documents]

2019-04-04 Thread Mark H. Wood
One difficulty is that the documentation of
SolrInputDocument.addField(String, Object) is not at all specific.
I'm aware of SOLR-2298 and I accept that the patch is an improvement,
but still...

  @param value Value of the field, should be of same class type as
  defined by "type" attribute of the corresponding field in
  schema.xml.

The corresponding <field>'s 'type' attribute is an arbitrary label
referencing the 'name' attribute of a <fieldType>.  It could be
"boysenberry" or "axolotl".  So we need to look at the 'class'
attribute of the fieldType?  So, if I have in my schema:

  <fieldType name="pdate" class="solr.DatePointField" ... />
  <field name="created" type="pdate" ... />

then I need to pass an instance of DatePointField?

  myDoc.addField("created", new DatePointField(bla bla));

That doesn't seem right, but go ahead and surprise me.

But I *know* that it accepts a properly formatted String value for a
field using DatePointField.  So, how can I determine the set of Java
types that is accepted as a new field value for a field whose field
type's class attribute is X?  And where should I have read that?

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu


signature.asc
Description: PGP signature


Concurrent Updates

2019-04-02 Thread Mark Johnson
We have a SolrCloud cluster (of 3 nodes) running solr 6.4.2. Every night,
we delete and recreate our whole catalog. In this process, we're
simultaneously running a query which recreates the product catalog (which
includes child documents of a different type) and a query that creates a
third document type that we use for joining. When we issue a search against
one shard, we see the response we expect. But when we issue the same search
against another shard, instead of the prescribed child documents, we'll
have children that are this third type of document.

This seems to only affect the occasional document. We're wondering if
anybody out there has experience with this, and might have some ideas as to
why it is happening. Thanks so much.



Re: problems with indexing documents

2019-04-02 Thread Mark H. Wood
I'm also working on this with Bill.

On Tue, Apr 02, 2019 at 09:44:16AM +0800, Zheng Lin Edwin Yeo wrote:
> Previously, did you index the date in the same format as you are using now,
> or in the Solr format of "-MM-DDTHH:MM:SSZ"?

As may be seen from the sample code:

> > doc.addField ( "date", new java.util.Date() );

we were not using a string format at all, but passing a java.util.Date
object.  In the past this was interpreted successfully and correctly.
After upgrading, we get an error:

> > Invalid Date String: 'Sun Jul 31 19:00:00 CDT 2016'

which suggests to me that something in or below
SolrInputDocument.addField(String, Object) is applying Date.toString()
to the Object, which yields a string format that Solr does not
understand.

I am dealing with this by trying to hunt down all the places where
Date was passed to addField, and explicitly convert it to a String in
Solr format.  But we would like to know if there is a better way, or
at least what I did wrong.
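
The conversion itself is a one-liner; a sketch of what that looks like at
each call site (java.time only, nothing Solr-specific):

import java.time.format.DateTimeFormatter;
import java.util.Date;

Date when = new Date();
// Solr wants ISO-8601 UTC, e.g. 2019-04-02T14:00:00Z
doc.addField("date", DateTimeFormatter.ISO_INSTANT.format(when.toInstant()));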

The SolrJ documentation says nothing about how the field value Object
is handled.  It does say that it should match the schema, but I can
find no table showing what Java object types "match" the stock schema
fieldtype classes such as DatePointField.  I would naively suppose that
j.u.Date is a particularly *good* match for DatePointField.  What have
I missed?

> On Tue, 2 Apr 2019 at 00:32, Bill Tantzen  wrote:
> 
> > In a legacy application using Solr 4.1 and solrj, I have always been
> > able to add documents with TrieDateField types using java.util.Date
> > objects, for instance,
> >
> > doc.addField ( "date", new java.util.Date() );
> >
> > having recently upgraded to Solr 7.7, and updating my schema to
> > leverage DatePointField as my type, that code no longer works,  it
> > throws an exception with an error like:
> >
> > Invalid Date String: 'Sun Jul 31 19:00:00 CDT 2016'
> >
> > I understand that this String is not what solr expects, but in lieu of
> > formatting the correct String, is there no longer a way to pass in a
> > simple Date object?  Was there some kind of implicit conversion taking
> > place earlier that is no longer happening?
> >
> > In fact, in the some of the example code that come with the solr
> > distribution, (SolrExampleTests.java), document timestamp fields are
> > added using the same AddField call I am attempting to use, so I am
> > very confused.
> >
> > Thanks for any advice!
> >
> > Regards,
> > Bill
> >

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu


signature.asc
Description: PGP signature


Re: Setting up MiniSolrCloudCluster to use pre-built index

2018-10-24 Thread Mark Miller
The merge can be really fast - it can just dump in the new segments and
rewrite the segments file basically.

I guess for what you want, that's perhaps not the ideal route though. You could
maybe try and use collection aliases.

I thought about adding shard aliases way back, but never got to it.
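
A sketch of the alias route in SolrJ (the alias and collection names here
are made up):

import org.apache.solr.client.solrj.request.CollectionAdminRequest;

// Build the new collection offline, then atomically repoint the alias:
CollectionAdminRequest.createAlias("products", "products_20181024")
    .process(cloudClient);
// Searches against "products" now hit the rebuilt collection.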

On Tue, Oct 23, 2018 at 7:10 PM Ken Krugler 
wrote:

> Hi Mark,
>
> I’ll have a completely new, rebuilt index that’s (a) large, and (b)
> already sharded appropriately.
>
> In that case, using the merge API isn’t great, in that it would take
> significant time and temporarily use double (or more) disk space.
>
> E.g. I’ve got an index with 250M+ records, and about 200GB. There are
> other indexes, still big but not quite as large as this one.
>
> So I’m still wondering if there’s any robust way to swap in a fresh set of
> shards, especially without relying on legacy cloud mode.
>
> I think I can figure out where the data is being stored for an existing
> (empty) collection, shut that down, swap in the new files, and reload.
>
> But I’m wondering if that’s really the best (or even sane) approach.
>
> Thanks,
>
> — Ken
>
> On May 19, 2018, at 6:24 PM, Mark Miller  wrote:
>
> You create MiniSolrCloudCluster with a base directory and then each Jetty
> instance created gets a SolrHome in a subfolder called node{i}. So if
> legacyCloud=true you can just preconfigure a core and index under the right
> node{i} subfolder. legacyCloud=true should not even exist anymore though,
> so the long term way to do this would be to create a collection and then
> use the merge API or something to merge your index into the empty
> collection.
>
> - Mark
>
> On Sat, May 19, 2018 at 5:25 PM Ken Krugler 
> wrote:
>
> Hi all,
>
> Wondering if anyone has experience (this is with Solr 6.6) in setting up
> MiniSolrCloudCluster for unit testing, where we want to use an existing
> index.
>
> Note that this index wasn’t built with SolrCloud, as it’s generated by a
> distributed (Hadoop) workflow.
>
> So there’s no “restore from backup” option, or swapping collection
> aliases, etc.
>
> We can push our configset to Zookeeper and create the collection as per
> other unit tests in Solr, but what’s the right way to set up data dirs for
> the cores such that Solr is running with this existing index (or indexes,
> for our sharded test case)?
>
> Thanks!
>
> — Ken
>
> PS - yes, we’re aware of the routing issue with generating our own shards….
>
> --
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> Custom big data solutions & training
> Flink, Solr, Hadoop, Cascading & Cassandra
>
> --
>
> - Mark
> about.me/markrmiller
>
>
> --
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> Custom big data solutions & training
> Flink, Solr, Hadoop, Cascading & Cassandra
>
>

-- 
- Mark

http://about.me/markrmiller


Re: Setting up MiniSolrCloudCluster to use pre-built index

2018-05-19 Thread Mark Miller
You create MiniSolrCloudCluster with a base directory and then each Jetty
instance created gets a SolrHome in a subfolder called node{i}. So if
legacyCloud=true you can just preconfigure a core and index under the right
node{i} subfolder. legacyCloud=true should not even exist anymore though,
so the long term way to do this would be to create a collection and then
use the merge API or something to merge your index into the empty
collection.

 - Mark

On Sat, May 19, 2018 at 5:25 PM Ken Krugler <kkrugler_li...@transpac.com>
wrote:

> Hi all,
>
> Wondering if anyone has experience (this is with Solr 6.6) in setting up
> MiniSolrCloudCluster for unit testing, where we want to use an existing
> index.
>
> Note that this index wasn’t built with SolrCloud, as it’s generated by a
> distributed (Hadoop) workflow.
>
> So there’s no “restore from backup” option, or swapping collection
> aliases, etc.
>
> We can push our configset to Zookeeper and create the collection as per
> other unit tests in Solr, but what’s the right way to set up data dirs for
> the cores such that Solr is running with this existing index (or indexes,
> for our sharded test case)?
>
> Thanks!
>
> — Ken
>
> PS - yes, we’re aware of the routing issue with generating our own shards….
>
> --
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> Custom big data solutions & training
> Flink, Solr, Hadoop, Cascading & Cassandra
>
> --
- Mark
about.me/markrmiller


Re: question about updates to shard leaders only

2018-05-15 Thread Mark Miller
Yeah, basically ConcurrentUpdateSolrClient is a shortcut to getting multi
threaded bulk API updates out of the single threaded, single update API.
The downsides to this are: it is not cloud aware, so you have to point it at
a server; you have to add special code to see if there are any errors; you
don't get any fine-grained error information back; and you still basically
have to break up updates into batches of success/fail units, but with fewer
guard rails.

If you want to bulk load it usually makes much more sense to use the bulk
api on CloudSolrClient and treat the whole group of updates as a single
success/fail unit.
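
A minimal sketch of that batched pattern (the batch size and cloudClient
construction are illustrative):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.common.SolrInputDocument;

List<SolrInputDocument> batch = new ArrayList<>();
for (SolrInputDocument doc : docs) {
  batch.add(doc);
  if (batch.size() >= 1000) {
    cloudClient.add(batch);   // throws on failure, so the unit is retryable
    batch.clear();
  }
}
if (!batch.isEmpty()) {
  cloudClient.add(batch);     // don't forget the final partial batch
}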

- Mark

On Tue, May 15, 2018 at 9:25 AM Erick Erickson <erickerick...@gmail.com>
wrote:

> bq. But don't forget a final client.add(list) after the while-loop ;-)
>
> Ha! But only "if (list.size() > 0)"
>
> And then there was the memorable time I forgot the "list.clear()" when
> I sent the batch and wondered why my indexing progress got slower and
> slower...
>
> Not to mention the time I re-used the same SolrInputDocument that got
> bigger and bigger and bigger.
>
> Not to mention the other zillion screw-ups I've managed to perpetrate
> in my career "Who wrote this stupid code? Oh, wait, it was me.
> DON'T LOOK!!!"...
>
> Astronomy anecdote
>
> Dale Vrabeck...was at a party with [Rudolph] Minkowski and Dale said
> he’d heard about the astronomer who had exposed a plate all night and
> then put it in the hypo first. Minkowski said, “It was three nights,
> and it was me.”
>
> On Tue, May 15, 2018 at 10:10 AM, Shawn Heisey <apa...@elyograg.org>
> wrote:
> > On 5/15/2018 12:12 AM, Bernd Fehling wrote:
> >>
> >> OK, I have the CloudSolrClient with SolrJ now running but it seems
> >> a bit slower compared to ConcurrentUpdateSolrClient.
> >> This was not expected.
> >> The logs show that CloudSolrClient send the docs only to the leaders.
> >>
> >> So the only advantage of CloudSolrClient is that it is "Cloud aware"?
> >>
> >> With ConcurrentUpdateSolrClient I get about 1600 docs/sec for loading.
> >> With CloudSolrClient I get only about 1200 docs/sec.
> >
> >
> > ConcurrentUpdateSolrClient internally puts all indexing requests on a
> queue
> > and then can use multiple threads to do parallel indexing in the
> background.
> > The design of the client has one big disadvantage -- it returns control
> to
> > your program immediately (before indexing actually begins) and always
> > indicates success.  All indexing errors are swallowed.  They are logged,
> but
> > the calling program is never informed that any errors have occurred.
> >
> > Like all other SolrClient implementations, CloudSolrClient is
> thread-safe,
> > but it is not multi-threaded unless YOU create multiple threads that all
> use
> > the same client object.  Full error handling is possible with this
> client.
> > It is also fully cloud aware, adding and removing Solr servers as the
> > SolrCloud changes, without needing to be reconfigured or recreated.
> >
> > Thanks,
> > Shawn
> >
>
-- 
- Mark
about.me/markrmiller


Re: Solr soft commits

2018-05-10 Thread Mark Miller
A soft commit does not control merging. The IndexWriter controls merging
and hard commits go through the IndexWriter. A soft commit tells Solr to
try and open a new SolrIndexSearcher with the latest view of the index. It
does this with a mix of using the on disk index and talking to the
IndexWriter to see updates that have not been committed.

Opening a new SolrIndexSearcher using the IndexWriter this way does have a
cost. You may flush segments, you may apply deletes, you may have to
rebuild partial or full in memory data structures. It's generally much
faster than a hard commit to get a refreshed view of the index though.

Given how SolrCloud was designed, it's usually best to set an auto hard
commit to something that works for you, given how large it will make tlogs
(affecting recovery times), and how much RAM is used. Then use soft commits
for visibility. It's best to use them as infrequently as your use case
allows.
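
As a concrete sketch, that combination usually looks something like this in
solrconfig.xml (the times are illustrative, not a recommendation):

<autoCommit>
  <maxTime>60000</maxTime>            <!-- hard commit: durability, tlog size -->
  <openSearcher>false</openSearcher>  <!-- no new searcher on hard commit -->
</autoCommit>
<autoSoftCommit>
  <maxTime>5000</maxTime>             <!-- visibility, as infrequent as you can -->
</autoSoftCommit>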

- Mark

On Thu, May 10, 2018 at 10:49 AM Shivam Omar <shivam.o...@jeevansathi.com>
wrote:

> Hi,
>
> I need some help in understanding solr soft commits.  As soft commits are
> about visibility and are fast in nature. They are advised for nrt use
> cases. I want to understand does soft commit also honor merge policies and
> do segment merging for docs in memory. For example, in case, I keep hard
> commit interval very high and allow few million documents to be in memory
> by using soft commit with no hard commit, can it affect solr query time
> performance.
>
>
> Shivam
>
> Get Outlook for Android<https://aka.ms/ghei36>
>
> DISCLAIMER
> This email and any files transmitted with it are intended solely for the
> person or the entity to whom they are addressed and may contain information
> which is Confidential and Privileged. Any misuse of the information
> contained in this email, including but not limited to retransmission or
> dissemination of the said information by person or entities other than the
> intended recipient is unauthorized and strictly prohibited. If you are not
> the intended recipient of this email, please delete this email and contact
> the sender immediately.
>
-- 
- Mark
about.me/markrmiller


Re: question about updates to shard leaders only

2018-05-09 Thread Mark Miller
It's been a while since I've been in this deeply, but it should be
something like:

sendUpdatesOnlyToShardLeaders will select the leaders for each shard as the
load balanced targets for update. The updates may not go to the *right*
leader, but only the leaders will be chosen, followers (non leader
replicas) will not be part of the load balanced server list.

sendDirectUpdatesToShardLeadersOnly is the same, followers are not part of
the mix, but also, updates are sent directly to the right leader as long as
the right hashing field is specified (id by default). We hash the id client
side and know where it should end up.

Optimally, you want sendDirectUpdatesToShardLeadersOnly set to true,
configured with the correct id field.
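
A sketch of the optimal setup with the SolrJ 7.x builder (the zkHost string
is made up):

import org.apache.solr.client.solrj.impl.CloudSolrClient;

CloudSolrClient client = new CloudSolrClient.Builder()
    .withZkHost("zk1:2181,zk2:2181,zk3:2181/solr")
    .sendDirectUpdatesToShardLeadersOnly()
    .build();
// client-side hashing uses the "id" field unless configured otherwise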

- Mark

On Wed, May 9, 2018 at 4:54 AM Bernd Fehling <bernd.fehl...@uni-bielefeld.de>
wrote:

> Hi list,
>
> while going from single core master/slave to cloud multi core/node
> with leader/replica I want to change my SolrJ loading, because
> ConcurrentUpdateSolrClient isn't cloud aware and has performance
> impacts.
> I want to use CloudSolrClient with LBHttpSolrClient and updates
> should only go to shard leaders.
>
> Question, what is the difference between sendUpdatesOnlyToShardLeaders
> and sendDirectUpdatesToShardLeadersOnly?
>
> Regards,
> Bernd
>
-- 
- Mark
about.me/markrmiller


Re: 7.3 pull replica with 7.2 tlog leader

2018-05-06 Thread Mark Miller
Yeah, the project should never use built in serialization. I'd file a JIRA
issue. We should remove this when we can.
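
In the meantime, the usual guard for anything Serializable that crosses
versions is to pin the UID explicitly; a sketch (the value is the 7.2 one
from the stack trace, and this is not the actual Solr fix):

import java.io.Serializable;

public abstract class SolrResponse implements Serializable {
  private static final long serialVersionUID = 3945300637328478755L;
}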

- Mark

On Sun, May 6, 2018 at 9:39 PM Will Currie <w...@currie.id.au> wrote:

> Premise: During an upgrade I should be able to run a 7.3 pull replica
> against a 7.2 tlog leader. Or vice versa.
>
> Maybe I'm totally wrong in assuming that!
>
> Assuming that's correct it looks like adding a new method[1] to
> SolrResponse has broken binary compatibility. When I try to register a new
> pull replica using the admin api[2] I get an HTTP 500 response. I see this
> error logged: java.io.InvalidClassException:
> org.apache.solr.client.solrj.SolrResponse; local class incompatible: stream
> classdesc serialVersionUID = 3945300637328478755, local class
> serialVersionUID = -793110010336024264
>
> The replica actually seems to register OK; it just can't read the response
> because the bytes from the 7.2 leader include a different serialVersionUID.
>
> Should SolrResponse include a serialVersionUID? All subclasses too.
>
> It looks like stock java serialization is only used for these admin
> responses. Query responses use JavaBinCodec instead..
>
> Full(ish) stack trace:
>
> ERROR HttpSolrCall null:org.apache.solr.common.SolrException:
> java.io.InvalidClassException: org.apache.solr.client.solrj.SolrResponse;
> local class incompatible: stream classdesc serialVersionUID = 3945300637328478755, local class
> serialVersionUID = -7931100103360242645
> at
> org.apache.solr.client.solrj.SolrResponse.deserialize(SolrResponse.java:73)
> at
>
> org.apache.solr.handler.admin.CollectionsHandler.sendToOCPQueue(CollectionsHandler.java:348)
> at
>
> org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:256)
> at
>
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:230)
> at
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)
> at
> org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:736)
> at
>
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:498)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)
>
> [1]
>
> https://github.com/apache/lucene-solr/commit/5ce83237e804ac1130eaf5cf793955667793fee0#diff-b809fa594f93aa6805381029a188e4e2R46
> [2]
>
> http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=blah&shard=shard1&node=blah&type=pull
>
> Thanks,
> Will
>
-- 
- Mark
about.me/markrmiller


Re: UIMA-SOLR integration

2018-03-29 Thread Mark Robinson
Thanks much Steve for the suggestions and pointers.

Best,
Mark

On Thu, Mar 29, 2018 at 3:17 PM, Steve Rowe <sar...@gmail.com> wrote:

> Hi Mark,
>
> Not sure about the advisability of pursuing UIMA - I’ve never used it with
> Lucene or Solr - but soon-to-be-released Solr 7.3, will include OpenNLP
> integration:
>
> * Language analysis, in the Solr reference guide: <
> https://builds.apache.org/view/L/view/Lucene/job/Solr-
> reference-guide-7.3/javadoc/language-analysis.html#opennlp-integration>
>
> * Language detection, in the Solr reference guide: <
> https://builds.apache.org/view/L/view/Lucene/job/Solr-
> reference-guide-7.3/javadoc/detecting-languages-during-indexing.html>
>
> * NER, in javadocs (sorry, couldn’t think of a place where a pre-release
> HTML view is available): <https://git-wip-us.apache.
> org/repos/asf?p=lucene-solr.git;a=blob;f=solr/contrib/
> analysis-extras/src/java/org/apache/solr/update/processor/
> OpenNLPExtractNamedEntitiesUpdateProcessorFactory.java;hb=
> refs/heads/branch_7_3#l60>
>
> --
> Steve
> www.lucidworks.com
>
> > On Mar 29, 2018, at 6:40 AM, Mark Robinson <mark123lea...@gmail.com>
> wrote:
> >
> > Hi All,
> >
> > Is it still advisable to pursue UIMA, or can someone please advise on
> > something else related to Solr and NLP?
> >
> > Thanks!
> > Mark
> >
> >
> > -- Forwarded message --
> > From: Mark Robinson <mark123lea...@gmail.com>
> > Date: Wed, Mar 28, 2018 at 2:21 PM
> > Subject: UIMA-SOLR integration
> > To: solr-user@lucene.apache.org
> >
> >
> > Hi,
> >
> > I was trying to integrate UIMA into SOLR following the solr docs and many
> > other hints on the net.
> > While trying to get a VALID_ALCHEMYAPI_KEY I contacted IBM support and
> got
> > the following advice:-
> >
> > "As announced a year a go the Alchemy Service was scheduled and shutdown
> on
> > March 7th, 2018, and is no longer supported.  The AlchemAPI services was
> > broken down into three other services where AlchemyLanguage has been
> > replaced by Natural Language Understanding, AlchemyVision by Visual
> > Recognition, and AlchemyDataNews by Discovery News.  The suggestion is to
> > migrated to the respective merged service in order to be able to take
> > advantage of the features."
> >
> > Could someone please share any other suggestions, instead of having to
> > use AlchemyAPI, so that I can continue with my work?
> >
> > Note:- I already commented out the OpenCalais references in
> > OverridingParamsExtServicesAE.xml, as I was getting errors with OpenCalais,
> > so I was relying on AlchemyAPI only.
> >
> > Any immediate help is greatly appreciated!
> >
> > Thanks!
> >
> > Mark
>
>
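
Following Steve's OpenNLP pointers, the 7.3 analysis chain looks roughly
like this (the model file names are the stock OpenNLP examples, not
verified here):

<fieldType name="text_nlp" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory"
               sentenceModel="en-sent.bin"
               tokenizerModel="en-token.bin"/>
    <filter class="solr.OpenNLPPOSFilterFactory"
            posTaggerModel="en-pos-maxent.bin"/>
  </analyzer>
</fieldType>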


Fwd: UIMA-SOLR integration

2018-03-29 Thread Mark Robinson
Hi All,

Is it still advisable to pursue UIMA, or can someone please advise on
something else related to Solr and NLP?

Thanks!
Mark


-- Forwarded message --
From: Mark Robinson <mark123lea...@gmail.com>
Date: Wed, Mar 28, 2018 at 2:21 PM
Subject: UIMA-SOLR integration
To: solr-user@lucene.apache.org


Hi,

I was trying to integrate UIMA into SOLR following the solr docs and many
other hints on the net.
While trying to get a VALID_ALCHEMYAPI_KEY I contacted IBM support and got
the following advice:-

"As announced a year a go the Alchemy Service was scheduled and shutdown on
March 7th, 2018, and is no longer supported.  The AlchemAPI services was
broken down into three other services where AlchemyLanguage has been
replaced by Natural Language Understanding, AlchemyVision by Visual
Recognition, and AlchemyDataNews by Discovery News.  The suggestion is to
migrated to the respective merged service in order to be able to take
advantage of the features."

Could someone please share any other suggestions, instead of having to
use AlchemyAPI, so that I can continue with my work?

Note:- I already commented out the OpenCalais references in
OverridingParamsExtServicesAE.xml, as I was getting errors with OpenCalais,
so I was relying on AlchemyAPI only.

Any immediate help is greatly appreciated!

Thanks!

Mark


UIMA-SOLR integration

2018-03-28 Thread Mark Robinson
Hi,

I was trying to integrate UIMA into SOLR following the solr docs and many
other hints on the net.
While trying to get a VALID_ALCHEMYAPI_KEY I contacted IBM support and got
the following advice:-

"As announced a year a go the Alchemy Service was scheduled and shutdown on
March 7th, 2018, and is no longer supported.  The AlchemAPI services was
broken down into three other services where AlchemyLanguage has been
replaced by Natural Language Understanding, AlchemyVision by Visual
Recognition, and AlchemyDataNews by Discovery News.  The suggestion is to
migrated to the respective merged service in order to be able to take
advantage of the features."

Could someone please share any other suggestions, instead of having to
use AlchemyAPI, so that I can continue with my work?

Note:- I already commented out the OpenCalais references in
OverridingParamsExtServicesAE.xml, as I was getting errors with OpenCalais,
so I was relying on AlchemyAPI only.

Any immediate help is greatly appreciated!

Thanks!

Mark


Multi-core logging - can't tell which core?

2018-01-17 Thread Mark Sullivan
I am migrating a good number of cores over to the latest instance of solr 
(well, 7.1.0) installed locally.  It is working well, but my code is 
occasionally sending requests to search or index an old field that was replaced 
in the schema.


I see this in the logging, but I can't determine which core the log comes from. 
  How can I tell which core is receiving the offending requests?


Many thanks in advance!


Mark
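
One thing worth checking: Solr's stock log4j configuration carries the core
name on every line via the MDC, so if a custom pattern dropped it, restoring
the %X fields brings it back. A sketch against the 7.x log4j.properties
layout (verify against your own file):

log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p (%t) [%X{collection} %X{shard} %X{replica} %X{core}] %c{1.} %m%n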



Re: Regex Phrases

2017-03-23 Thread Mark Johnson
So I managed to get the tokenizing to work with
both PatternTokenizerFactory and WordDelimiterFilterFactory (used in
combination with WhitespaceTokenizerFactory). For PT I used a regex that
matches the various permutations of the phrases, and for WDF/WT I used
protected words with every permutation (there are only 40 or 50).

In both cases, via the admin/analysis screen, the Index and Query values
were tokenized correctly (for example, "Super Vitamin C" was tokenized as
"Super" and "Vitamin C").

However, when I do a query like "DisplayName:(Super Vitamin C)" with
"debug=query", I see that the parsed query is "DisplayName:Super
DisplayName:Vitamin DisplayName:C" ("DisplayName" is the field I'm working
on here).

Shouldn't that instead be parsed as something like "DisplayName:Super
DisplayName:"Vitamin C"" or something similar? Or am I not understanding
how query parsing works?

In either case, I'm seeing results where DisplayName contains things like
"Vitamin B 90 Caps" or "Super Orange 30 pkts", neither of which contain the
phrase "Vitamin C", so I suspect something is wrong.

On Thu, Mar 23, 2017 at 8:08 AM, Joel Bernstein <joels...@gmail.com> wrote:

> You can also checkout
> https://cwiki.apache.org/confluence/display/solr/Tokenizers#Tokenizers-
> RegularExpressionPatternTokenizer
> .
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Mar 22, 2017 at 7:52 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > Susheel:
> >
> > That'll work, but the options you've specified for
> > WordDelimiterFilterFactory pretty much make it so it's doing nothing.
> > I realize it's commented out...
> >
> > That said, it's true that if you have a very specific pattern you want
> > to recognize a Regex can do the trick. WDFF is a bit more generic
> > though when you have less specific requirements.
> >
> > Best,
> > Erick
> >
> > On Wed, Mar 22, 2017 at 12:56 PM, Susheel Kumar <susheel2...@gmail.com>
> > wrote:
> > > I have used PatternReplaceFilterFactory in some of these situations.
> e.g.
> > > below
> > >
> > > <filter class="solr.PatternReplaceFilterFactory" pattern="(\d+)-(\d+)-?(\d+)$"
> > > replacement="$1$2$3"/>
> > >
> > > On Wed, Mar 22, 2017 at 2:54 PM, Mark Johnson <
> > mjohn...@emersonecologics.com
> > >> wrote:
> > >
> > >> Awesome, thank you much!
> > >>
> > >> On Wed, Mar 22, 2017 at 2:38 PM, Erick Erickson <
> > erickerick...@gmail.com>
> > >> wrote:
> > >>
> > >> > Take a close look at WordDelimiterFilterFactory, it's designed to
> deal
> > >> > with things like part numbers, phone numbers and the like, and the
> > >> > example you gave is in the same class of problem I think. It'll take
> > >> > a bit to get your head around what it does, but it'll perfom better
> > >> > than regexes, assuming you can get what you need out of it.
> > >> >
> > >> > And the admin/analysis page will help you _greatly_ in understanding
> > >> > what the effects of the various parameters are.
> > >> >
> > >> > Best,
> > >> > Erick
> > >> >
> > >> > On Wed, Mar 22, 2017 at 11:06 AM, Mark Johnson
> > >> > <mjohn...@emersonecologics.com> wrote:
> > >> > > Is it possible to configure Solr to treat text that matches a
> regex
> > as
> > >> a
> > >> > > phrase?
> > >> > >
> > >> > > I have a database full of products, and the Title and Description
> > >> fields
> > >> > > are text_en, tokenized via the StandardTokenizerFactory. This
> works
> > in
> > >> > most
> > >> > > cases, but a number of products have names like:
> > >> > >
> > >> > >  - Vitamin A
> > >> > >  - Vitamin-A
> > >> > >  - Vitamin B12
> > >> > >  - Vitamin B-12
> > >> > > ...and so on
> > >> > >
> > >> > > I have a regex that will match all of the permutations and would
> > like
> > >> to
> > >> > > configure the field type so that anything that matches the regex
> > >> pattern
> > >> > is
> > >> > > treated as a single token, instead of being broken up by spaces,
> 

Re: Regex Phrases

2017-03-22 Thread Mark Johnson
Awesome, thank you much!

On Wed, Mar 22, 2017 at 2:38 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Take a close look at WordDelimiterFilterFactory, it's designed to deal
> with things like part numbers, phone numbers and the like, and the
> example you gave is in the same class of problem I think. It'll take
> a bit to get your head around what it does, but it'll perfom better
> than regexes, assuming you can get what you need out of it.
>
> And the admin/analysis page will help you _greatly_ in understanding
> what the effects of the various parameters are.
>
> Best,
> Erick
>
> On Wed, Mar 22, 2017 at 11:06 AM, Mark Johnson
> <mjohn...@emersonecologics.com> wrote:
> > Is it possible to configure Solr to treat text that matches a regex as a
> > phrase?
> >
> > I have a database full of products, and the Title and Description fields
> > are text_en, tokenized via the StandardTokenizerFactory. This works in
> most
> > cases, but a number of products have names like:
> >
> >  - Vitamin A
> >  - Vitamin-A
> >  - Vitamin B12
> >  - Vitamin B-12
> > ...and so on
> >
> > I have a regex that will match all of the permutations and would like to
> > configure the field type so that anything that matches the regex pattern
> is
> > treated as a single token, instead of being broken up by spaces, etc. Is
> > that possible?
> >
>



-- 

Best Regards,

*Mark Johnson* | .NET Software Engineer

Office: 603-392-7017

Emerson Ecologics, LLC | 1230 Elm Street | Suite 301 | Manchester NH | 03101



Regex Phrases

2017-03-22 Thread Mark Johnson
Is it possible to configure Solr to treat text that matches a regex as a
phrase?

I have a database full of products, and the Title and Description fields
are text_en, tokenized via the StandardTokenizerFactory. This works in most
cases, but a number of products have names like:

 - Vitamin A
 - Vitamin-A
 - Vitamin B12
 - Vitamin B-12
...and so on

I have a regex that will match all of the permutations and would like to
configure the field type so that anything that matches the regex pattern is
treated as a single token, instead of being broken up by spaces, etc. Is
that possible?
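
For what it's worth, the combination that worked upthread was a whitespace
tokenizer plus WordDelimiterFilterFactory with the hyphenated permutations
in a protected-words file; a sketch (the type and file names are
illustrative):

<fieldType name="text_products" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- protwords.txt lists tokens WDF must not split, e.g. Vitamin-A, B-12 -->
    <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>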



Re: Partial Match with DF

2017-03-16 Thread Mark Johnson
Thank you for the heads up! I think in some cases we will want to strip out
punctuation but in others we might need it (for example, "liquid courage."
should tokenize to "liquid" and "courage", while "1.5 oz liquid courage"
should tokenize to "1.5", "oz", "liquid" and "courage").

I'll have to do some experimenting to see which one will work best for us.
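
For the two examples above, StandardTokenizerFactory already behaves that
way: it keeps a decimal number like "1.5" as one token but drops the
trailing period on "courage.". A quick way to confirm is the admin Analysis
screen against a type like this sketch:

<fieldType name="text_std" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>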

On Thu, Mar 16, 2017 at 11:09 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Yeah, they've saved me on numerous occasions, glad to see they helped.
>
> One caution BTW when you start changing fieldTypes is you have to
> watch punctuation. StandardTokenizerFactory won't pass through most
> punctuation.
>
> WordDelimiterFilterFactory breaks on non alpha-num, including
> punctuation effectively throwing it out.
>
> But WhitespaceTokenizer does just that and spits out punctuation as
> part of tokens, i.e.
> "my words." (note period) is broken up as "my" "words." and wouldn't
> match a search on "word".
>
> One other note, there's a tokenizer/filter for a zillion different
> cases, you can go wild. Here's a partial
> list:https://cwiki.apache.org/confluence/display/solr/
> Understanding+Analyzers%2C+Tokenizers%2C+and+Filters,
> see the "Tokenizer", "Filters" and CharFilters" links. There are 12
> tokenizers listed and 40 or so filters... and the list is not
> guaranteed to be complete.
>
> On Thu, Mar 16, 2017 at 7:39 AM, Mark Johnson
> <mjohn...@emersonecologics.com> wrote:
> > You're right! The fields I'm searching are all "string" type. I switched
> to
> > "text_en" and now it's working exactly as I need it to! I'll do some
> > research to see if "text_en" or another "text" type field is best for our
> > needs.
> >
> > Also, those debug options are amazing! They'll help tremendously in the
> > future.
> >
> > Thank you much!
> >
> > On Thu, Mar 16, 2017 at 10:02 AM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> >> My guess: Your analysis chain for the fields is different, i.e. they
> >> have a different fieldType. In particular, watch out for the "string"
> >> type, people are often confused about it. It does _not_ break input
> >> into tokens, you need a text-based field type, text_en is one example
> >> that is usually in the configs by default.
> >>
> >> Two tools that'll help you enormously:
> >>
> >> admin UI>>select core (or collection) from the drop-down>>analysis
> >> That shows you exactly how Solr/Lucene break up text at query and index
> >> time
> >>
> >> add debug=query to the URL. That'll show you how the query was parsed.
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, Mar 16, 2017 at 6:52 AM, Mark Johnson
> >> <mjohn...@emersonecologics.com> wrote:
> >> > Oh, great! Thank you!
> >> >
> >> > So if I switch over to eDisMax I'd specify the fields to query via the
> >> "qf"
> >> > parameter, right? That seems to have the same result (only matches
> when I
> >> > specify the exact phrase in the field, not just certain words from
> it).
> >> >
> >> > On Thu, Mar 16, 2017 at 9:33 AM, Alexandre Rafalovitch <
> >> arafa...@gmail.com>
> >> > wrote:
> >> >
> >> >> df is default field - you can only give one. To search over multiple
> >> fields, you switch to the eDisMax query parser and the qf parameter.
> >> >>
> >> >> Then, the question will be what type definition your fields have.
> When
> >> you
> >> >> search text field, you are using its definition because of copyField.
> >> Your
> >> >> original fields may be strings.
> >> >>
> >> Remember to reload the core and reindex when you change definitions.
> >> >>
> >> >> Regards,
> >> >>Alex
> >> >>
> >> >>
> >> >> On 16 Mar 2017 9:15 AM, "Mark Johnson" <
> mjohn...@emersonecologics.com>
> >> >> wrote:
> >> >>
> >> >> > Forgive me if I'm missing something obvious -- I'm new to Solr,
> but I
> >> >> can't
> >> >> > seem to find an explanation for the behavior I'm seeing.
> >> >> >
> >> >> > If I have a document that look

Re: Partial Match with DF

2017-03-16 Thread Mark Johnson
Wow, that's really powerful! Thank you!

On Thu, Mar 16, 2017 at 11:19 AM, Charlie Hull <char...@flax.co.uk> wrote:

> Hi Mark,
>
> OpenSource Connections' excellent www.splainer.io might also be useful to
> help you break down exactly what your query is doing.
>
> Cheers
>
> Charlie
>
> P.S. planning a blog soon listing 'useful Solr tools'
>
> On 16 March 2017 at 14:39, Mark Johnson <mjohn...@emersonecologics.com>
> wrote:
>
> > You're right! The fields I'm searching are all "string" type. I switched
> to
> > "text_en" and now it's working exactly as I need it to! I'll do some
> > research to see if "text_en" or another "text" type field is best for our
> > needs.
> >
> > Also, those debug options are amazing! They'll help tremendously in the
> > future.
> >
> > Thank you much!
> >
> > On Thu, Mar 16, 2017 at 10:02 AM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> > > My guess: Your analysis chain for the fields is different, i.e. they
> > > have a different fieldType. In particular, watch out for the "string"
> > > type, people are often confused about it. It does _not_ break input
> > > into tokens, you need a text-based field type, text_en is one example
> > > that is usually in the configs by default.
> > >
> > > Two tools that'll help you enormously:
> > >
> > > admin UI>>select core (or collection) from the drop-down>>analysis
> > > That shows you exactly how Solr/Lucene break up text at query and index
> > > time
> > >
> > > add debug=query to the URL. That'll show you how the query was parsed.
> > >
> > > Best,
> > > Erick
> > >
> > > On Thu, Mar 16, 2017 at 6:52 AM, Mark Johnson
> > > <mjohn...@emersonecologics.com> wrote:
> > > > Oh, great! Thank you!
> > > >
> > > > So if I switch over to eDisMax I'd specify the fields to query via
> the
> > > "qf"
> > > > parameter, right? That seems to have the same result (only matches
> > when I
> > > > specify the exact phrase in the field, not just certain words from
> it).
> > > >
> > > > On Thu, Mar 16, 2017 at 9:33 AM, Alexandre Rafalovitch <
> > > arafa...@gmail.com>
> > > > wrote:
> > > >
> > > >> df is default field - you can only give one. To search over multiple
> > > >> fields, you switch to the eDisMax query parser and the qf parameter.
> > > >>
> > > >> Then, the question will be what type definition your fields have.
> When
> > > you
> > > >> search text field, you are using its definition because of
> copyField.
> > > Your
> > > >> original fields may be strings.
> > > >>
> > > >> Remember to reload the core and reindex when you change definitions.
> > > >>
> > > >> Regards,
> > > >>Alex
> > > >>
> > > >>
> > > >> On 16 Mar 2017 9:15 AM, "Mark Johnson" <
> mjohn...@emersonecologics.com
> > >
> > > >> wrote:
> > > >>
> > > >> > Forgive me if I'm missing something obvious -- I'm new to Solr,
> but
> > I
> > > >> can't
> > > >> > seem to find an explanation for the behavior I'm seeing.
> > > >> >
> > > >> > If I have a document that looks like this:
> > > >> > {
> > > >> > field1: "aaa bbb",
> > > >> > field2: "ccc ddd",
> > > >> > field3: "eee fff"
> > > >> > }
> > > >> >
> > > >> > And I do a search where "q" is "aaa ccc", I get the document in
> the
> > > >> > results. This is because (please correct me if I'm wrong) the
> > default
> > > >> "df"
> > > >> > is set to the "_text_" field, which contains the text values from
> > all
> > > >> > fields.
> > > >> >
> > > >> > However, if I do a search where "df" is "field1" and "field2" and
> > "q"
> > > is
> > > >> > "aaa ccc" (words from field1 and field2) I get no results.
> > > >> >
> > > >> > In a simpler example, if I do a searc

Re: Partial Match with DF

2017-03-16 Thread Mark Johnson
You're right! The fields I'm searching are all "string" type. I switched to
"text_en" and now it's working exactly as I need it to! I'll do some
research to see if "text_en" or another "text" type field is best for our
needs.

Also, those debug options are amazing! They'll help tremendously in the
future.

Thank you much!

On Thu, Mar 16, 2017 at 10:02 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> My guess: Your analysis chain for the fields is different, i.e. they
> have a different fieldType. In particular, watch out for the "string"
> type, people are often confused about it. It does _not_ break input
> into tokens, you need a text-based field type, text_en is one example
> that is usually in the configs by default.
>
> Two tools that'll help you enormously:
>
> admin UI>>select core (or collection) from the drop-down>>analysis
> That shows you exactly how Solr/Lucene break up text at query and index
> time
>
> add debug=query to the URL. That'll show you how the query was parsed.
>
> Best,
> Erick
>
> On Thu, Mar 16, 2017 at 6:52 AM, Mark Johnson
> <mjohn...@emersonecologics.com> wrote:
> > Oh, great! Thank you!
> >
> > So if I switch over to eDisMax I'd specify the fields to query via the
> "qf"
> > parameter, right? That seems to have the same result (only matches when I
> > specify the exact phrase in the field, not just certain words from it).
> >
> > On Thu, Mar 16, 2017 at 9:33 AM, Alexandre Rafalovitch <
> arafa...@gmail.com>
> > wrote:
> >
> >> df is default field - you can only give one. To search over multiple
> >> fields, you switch to the eDisMax query parser and the qf parameter.
> >>
> >> Then, the question will be what type definition your fields have. When
> you
> >> search text field, you are using its definition because of copyField.
> Your
> >> original fields may be strings.
> >>
> >> Remember to reload the core and reindex when you change definitions.
> >>
> >> Regards,
> >>Alex
> >>
> >>
> >> On 16 Mar 2017 9:15 AM, "Mark Johnson" <mjohn...@emersonecologics.com>
> >> wrote:
> >>
> >> > Forgive me if I'm missing something obvious -- I'm new to Solr, but I
> >> can't
> >> > seem to find an explanation for the behavior I'm seeing.
> >> >
> >> > If I have a document that looks like this:
> >> > {
> >> > field1: "aaa bbb",
> >> > field2: "ccc ddd",
> >> > field3: "eee fff"
> >> > }
> >> >
> >> > And I do a search where "q" is "aaa ccc", I get the document in the
> >> > results. This is because (please correct me if I'm wrong) the default
> >> "df"
> >> > is set to the "_text_" field, which contains the text values from all
> >> > fields.
> >> >
> >> > However, if I do a search where "df" is "field1" and "field2" and "q"
> is
> >> > "aaa ccc" (words from field1 and field2) I get no results.
> >> >
> >> > In a simpler example, if I do a search where "df" is "field1" and "q"
> is
> >> > "aaa" (a word from field1) I still get no results.
> >> >
> >> > If I do a search where "df" is "field1" and "q" is "aaa bbb" (the full
> >> > value of field1) then I get the document in the results.
> >> >
> >> > So I'm concluding that when using "df" to specify which fields to
> search
> >> > then only an exact match on the full field value will return a
> document.
> >> >
> >> > Is that a correct conclusion? Is there another way to specify which
> >> fields
> >> > to search without requiring an exact match? The results I'd like to
> >> achieve
> >> > are:
> >> >
> >> > Would Match:
> >> > q=aaa
> >> > q=aaa bbb
> >> > q=aaa ccc
> >> > q=aaa fff
> >> >
> >> > Would Not Match:
> >> > q=eee
> >> > q=fff
> >> > q=eee fff
> >> >

Re: Partial Match with DF

2017-03-16 Thread Mark Johnson
Oh, great! Thank you!

So if I switch over to eDisMax I'd specify the fields to query via the "qf"
parameter, right? That seems to have the same result (only matches when I
specify the exact phrase in the field, not just certain words from it).
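
A full request with eDisMax then looks roughly like this (field names from
this thread):

q=aaa ccc&defType=edismax&qf=field1 field2

Note that qf still runs each field's own analyzer, which is why string-typed
fields only ever match on the exact full value.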

On Thu, Mar 16, 2017 at 9:33 AM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> df is default field - you can only give one. To search over multiple
> fields, you switch to the eDisMax query parser and the qf parameter.
>
> Then, the question will be what type definition your fields have. When you
> search text field, you are using its definition because of copyField. Your
> original fields may be strings.
>
> Remember to reload the core and reindex when you change definitions.
>
> Regards,
>Alex
>
>
> On 16 Mar 2017 9:15 AM, "Mark Johnson" <mjohn...@emersonecologics.com>
> wrote:
>
> > Forgive me if I'm missing something obvious -- I'm new to Solr, but I
> can't
> > seem to find an explanation for the behavior I'm seeing.
> >
> > If I have a document that looks like this:
> > {
> > field1: "aaa bbb",
> > field2: "ccc ddd",
> > field3: "eee fff"
> > }
> >
> > And I do a search where "q" is "aaa ccc", I get the document in the
> > results. This is because (please correct me if I'm wrong) the default
> "df"
> > is set to the "_text_" field, which contains the text values from all
> > fields.
> >
> > However, if I do a search where "df" is "field1" and "field2" and "q" is
> > "aaa ccc" (words from field1 and field2) I get no results.
> >
> > In a simpler example, if I do a search where "df" is "field1" and "q" is
> > "aaa" (a word from field1) I still get no results.
> >
> > If I do a search where "df" is "field1" and "q" is "aaa bbb" (the full
> > value of field1) then I get the document in the results.
> >
> > So I'm concluding that when using "df" to specify which fields to search
> > then only an exact match on the full field value will return a document.
> >
> > Is that a correct conclusion? Is there another way to specify which
> fields
> > to search without requiring an exact match? The results I'd like to
> achieve
> > are:
> >
> > Would Match:
> > q=aaa
> > q=aaa bbb
> > q=aaa ccc
> > q=aaa fff
> >
> > Would Not Match:
> > q=eee
> > q=fff
> > q=eee fff
> >
> >
>
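
(Following up on the field-definition point above: a hedged sketch of how the
actual type of field1 could be inspected with the SolrJ Schema API, assuming
that API is available in my version and reusing the "client" from the earlier
sketch:)

  import org.apache.solr.client.solrj.request.schema.SchemaRequest;
  import org.apache.solr.client.solrj.response.schema.SchemaResponse;

  // If "type" comes back as "string" (or another untokenized type), only
  // exact whole-value matches will hit; a tokenized type such as
  // text_general is needed for word-level matching.
  SchemaResponse.FieldResponse fieldResponse =
      new SchemaRequest.Field("field1").process(client);
  System.out.println(fieldResponse.getField());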



-- 

Best Regards,

*Mark Johnson* | .NET Software Engineer

Office: 603-392-7017

Emerson Ecologics, LLC | 1230 Elm Street | Suite 301 | Manchester NH | 03101


*Supporting The Practice Of Healthy Living*


-- 
*This message is intended only for the use of the individual or entity to 
which it is addressed and may contain information that is privileged, 
confidential and exempt from disclosure under applicable law. If you have 
received this message in error, you are hereby notified that any use, 
dissemination, distribution or copying of this message is prohibited. If 
you have received this communication in error, please notify the sender 
immediately and destroy the transmitted information.*


Partial Match with DF

2017-03-16 Thread Mark Johnson
Forgive me if I'm missing something obvious -- I'm new to Solr, but I can't
seem to find an explanation for the behavior I'm seeing.

If I have a document that looks like this:
{
field1: "aaa bbb",
field2: "ccc ddd",
field3: "eee fff"
}

And I do a search where "q" is "aaa ccc", I get the document in the
results. This is because (please correct me if I'm wrong) the default "df"
is set to the "_text_" field, which contains the text values from all
fields.

However, if I do a search where "df" is "field1" and "field2" and "q" is
"aaa ccc" (words from field1 and field2) I get no results.

In a simpler example, if I do a search where "df" is "field1" and "q" is
"aaa" (a word from field1) I still get no results.

If I do a search where "df" is "field1" and "q" is "aaa bbb" (the full
value of field1) then I get the document in the results.

So I'm concluding that when using "df" to specify which fields to search
then only an exact match on the full field value will return a document.

Is that a correct conclusion? Is there another way to specify which fields
to search without requiring an exact match? The results I'd like to achieve
are:

Would Match:
q=aaa
q=aaa bbb
q=aaa ccc
q=aaa fff

Would Not Match:
q=eee
q=fff
q=eee fff
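
For completeness, a minimal SolrJ sketch of how I am reproducing this (the
URL, collection name, and client setup are placeholders):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  HttpSolrClient client =
      new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();

  // Index the example document.
  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", "1");
  doc.addField("field1", "aaa bbb");
  doc.addField("field2", "ccc ddd");
  doc.addField("field3", "eee fff");
  client.add(doc);
  client.commit();

  // Search a single word from field1 against field1 only.
  SolrQuery query = new SolrQuery("aaa");
  query.set("df", "field1");
  System.out.println(client.query(query).getResults().getNumFound());  // prints 0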

-- 
*This message is intended only for the use of the individual or entity to 
which it is addressed and may contain information that is privileged, 
confidential and exempt from disclosure under applicable law. If you have 
received this message in error, you are hereby notified that any use, 
dissemination, distribution or copying of this message is prohibited. If 
you have received this communication in error, please notify the sender 
immediately and destroy the transmitted information.*


Re: Solr node not found in ZK live_nodes

2016-12-07 Thread Mark Miller
That already happens. The ZK client itself will reconnect when it can and
trigger everything to be set up as when the cluster first starts up,
including a live node and leader election, etc.

You may have hit a bug or something else missing from this conversation,
but reconnecting after losing the ZK connection is a basic feature from day
one.
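
The shape of that reconnect logic, highly simplified (this is an illustration
of the ephemeral-node-on-reconnect pattern, not Solr's actual code; host
names, paths, and timeouts here are made up):

  import org.apache.zookeeper.CreateMode;
  import org.apache.zookeeper.Watcher;
  import org.apache.zookeeper.ZooDefs;
  import org.apache.zookeeper.ZooKeeper;

  Watcher onSessionEvent = event -> {
      if (event.getState() == Watcher.Event.KeeperState.Expired) {
          try {
              // The expired session took its ephemeral nodes with it, so open
              // a new session (real code waits for SyncConnected first) ...
              ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181", 15000, we -> {});
              // ... then re-create the ephemeral live-node entry, much as at
              // cluster startup, and kick off leader election, recovery, etc.
              zk.create("/live_nodes/s16:8983_solr", new byte[0],
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
          } catch (Exception ex) {
              // Real code retries with backoff and logs the failure.
          }
      }
  };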

Mark
On Wed, Dec 7, 2016 at 12:34 AM Manohar Sripada <manohar...@gmail.com>
wrote:

> Thanks Erick! Should I create a JIRA issue for the same?
>
> Regarding the logs, I have changed the log level to WARN. That may be the
> reason I couldn't get anything from them.
>
> Thanks,
> Manohar
>
> On Tue, Dec 6, 2016 at 9:58 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > The most likely reason is that the Solr node in question
> > was not reachable, and thus it was removed from
> > live_nodes, perhaps due to a temporary network
> > glitch, a long GC pause, or the like. If you're rolling
> > your logs over, it's quite possible that any illuminating
> > messages were lost. The default 4M size for each
> > log is quite low at INFO level...
> >
> > It does seem possible for a Solr node to periodically
> > check its status and re-insert itself into live_nodes,
> > go through recovery and all that. So far most of that
> > registration logic is baked into startup code. What
> > do others think? Worth a JIRA?
> >
> > Erick
> >
> > On Tue, Dec 6, 2016 at 3:53 AM, Manohar Sripada <manohar...@gmail.com>
> > wrote:
> > > We have a 16-node Solr (5.2.1) cluster and a 5-node ZooKeeper (3.4.6) ensemble.
> > >
> > > All the Solr nodes were registered in ZooKeeper (ls /live_nodes) when
> > > setup was done 3 months back. Suddenly, a few days back, our search
> > > started failing because one of the Solr nodes (call it s16) was not
> > > seen in ZooKeeper, i.e., when we checked with *"ls /live_nodes"*, the
> > > *s16* node was not found. However, the corresponding Solr process was
> > > up and running.
> > >
> > > To my surprise, I couldn't find any errors or warnings in the Solr or
> > > ZooKeeper logs related to this. I have a few questions -
> > >
> > > 1. Is there any reason why this registration in ZK was lost? I know the
> > > logs should provide some information, but they didn't. Did anyone
> > > encounter a similar issue? If so, what could be the root cause?
> > > 2. Shouldn't Solr be clever enough to detect that its registration in ZK
> > > was lost (for some reason) and try to re-register?
> > >
> > > PS: The issue was resolved by restarting the Solr node. However, I am
> > > curious to know why it happened in the first place.
> > >
> > > Thanks
> >
>
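
(For anyone who wants to watch for this from the client side, a hedged sketch
that reads live_nodes through SolrJ; the ZooKeeper addresses are placeholders,
and this uses the 6.x-style builder, while 5.2 has a direct CloudSolrClient
constructor instead:)

  import org.apache.solr.client.solrj.impl.CloudSolrClient;

  // Reads the cluster state that Solr publishes in ZooKeeper and prints the
  // current live nodes; a node missing here while its process is up matches
  // the symptom described above.
  CloudSolrClient cloud = new CloudSolrClient.Builder()
      .withZkHost("zk1:2181,zk2:2181,zk3:2181")
      .build();
  cloud.connect();
  System.out.println(cloud.getZkStateReader().getClusterState().getLiveNodes());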
-- 
- Mark
about.me/markrmiller

