Re: adding documents to a secured solr server.

2017-11-01 Thread Shawn Heisey

On 11/1/2017 10:04 PM, Phil Scadden wrote:

For testing, I changed to HttpSolrClient, specifying the core on process and 
commit instead of opening it as server/core. This time it worked... sort of. 
Despite deleting the entire index with deleteByQuery and seeing that it was 
empty in the coreAdmin, I get:

possible analysis error: cannot change DocValues type from SORTED_SET to NUMERIC for 
field "access"

I tried deleting the field in the admin interface and then adding it back in 
again in that admin interface. But, no. Still comes up with that error. I know 
deleting the index files on disk works but I don’t have access to the server. 
This is a frustrating problem.


Variations of this error happen when settings on a field with 
docValues="true" are changed, and the index already has documents added 
with the previous settings.


Each Lucene segment stores information about what kind of docValues are 
present for each field that has docValues, and if you change an aspect 
of the field (multivalued, field class, etc) and try to add a new 
document with that different information, Lucene will complain.  The 
reason that deleting all documents didn't work is that when you delete 
documents, they are only MARKED as deleted, the segments (and deleted 
docs) remain on the disk.


The only SURE way to fix it is to completely delete the index directory 
(or directories), reload the core/collection (or restart Solr), and 
reindex from scratch.  One thing you *might* be able to do if you don't 
have access to the server is delete all documents and then optimize the 
index, which should delete all segments and effectively leave you with a 
brand new empty index.  I'm not 100% sure that this would take care of 
it, but I *think* it would.
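A SolrJ sketch of that delete-then-optimize sequence (the URL, core name, and credentials are placeholders, and as noted above it is not guaranteed that the optimize fully clears the stale docValues metadata):

```java
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.UpdateRequest;

public class WipeIndex {
    public static void main(String[] args) throws Exception {
        // URL, core name, and credentials are placeholders for this sketch.
        try (HttpSolrClient solr =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/corename").build()) {

            // Delete everything. Documents are only MARKED as deleted;
            // the old segments (with the old docValues metadata) stay on disk.
            UpdateRequest del = new UpdateRequest();
            del.setBasicAuthCredentials("solrAdmin", "password");
            del.deleteByQuery("*:*");
            del.setCommitWithin(1000);
            del.process(solr);

            // Optimize merges the index down to one segment, purging deleted
            // docs, which should leave an effectively brand-new empty index.
            UpdateRequest opt = new UpdateRequest();
            opt.setBasicAuthCredentials("solrAdmin", "password");
            opt.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, true, true, 1);
            opt.process(solr);
        }
    }
}
```

Using UpdateRequest for the optimize (rather than the client's optimize() sugar method) keeps the basic-auth credentials on the request, which matters on a secured server.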


Thanks,
Shawn


RE: adding documents to a secured solr server.

2017-11-01 Thread Phil Scadden
Requested a reload and now it indexes with the secure server using HttpSolrClient. 
Phew. I will now look to see if I can optimize and get ConcurrentUpdateSolrClient 
to work.
At least I can get the index back now.

-Original Message-
From: Phil Scadden [mailto:p.scad...@gns.cri.nz]
Sent: Thursday, 2 November 2017 5:04 p.m.
To: solr-user@lucene.apache.org
Subject: RE: adding documents to a secured solr server.

For testing, I changed to HttpSolrClient, specifying the core on process and 
commit instead of opening it as server/core. This time it worked... sort of. 
Despite deleting the entire index with deleteByQuery and seeing that it was 
empty in the coreAdmin, I get:

possible analysis error: cannot change DocValues type from SORTED_SET to 
NUMERIC for field "access"

I tried deleting the field in the admin interface and then adding it back in 
again in that admin interface. But, no. Still comes up with that error. I know 
deleting the index files on disk works but I don’t have access to the server. 
This is a frustrating problem.



-Original Message-
From: Shawn Heisey [mailto:elyog...@elyograg.org]
Sent: Thursday, 2 November 2017 3:55 p.m.
To: solr-user@lucene.apache.org
Subject: Re: adding documents to a secured solr server.

On 11/1/2017 8:13 PM, Phil Scadden wrote:
> 14:52:45,962 DEBUG ConcurrentUpdateSolrClient:177 - starting runner:
> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner@6e
> eba4a
> 14:52:46,224  WARN ConcurrentUpdateSolrClient:343 - Failed to parse
> error response from http://online-dev.gns.cri.nz:8983/solr/prindex due
> to: java.lang.RuntimeException: Invalid version (expected 2, but 60)
> or the data in not in 'javabin' format



> Even more puzzling. Authentication is set. What is the invalid version bit?? 
> I think my solrj is 6.4.1; the server is 6.6.2. Do these have  to match 
> exactly??

The only time I would be worried about different SolrJ and Solr versions is 
when using the CloudSolrClient object.  For the other client types, you can 
usually have a VERY wide version spread without problems.  For the cloud 
object, you *might* have problems with different versions, or it might work 
fine.  If the SolrJ version is higher than the Solr version, the cloud client 
tends to work.

I would always recommend that the client version be the same or higher than the 
server version... but with non-cloud clients, it won't matter very much.  I 
would not expect problems with the two versions you have, as long as you don't 
try to use the cloud client.

This error is different.  It's happening because SolrJ is expecting a Javabin 
response, but it is getting an HTML error response instead, with the "require 
authentication" error.  This logging message will happen anytime SolrJ gets an 
error response instead of a "real" response.  What this error says is 
technically correct, but very confusing to novice users.

The specific numbers in the message are a result of the first character of the 
response.  With javabin, the first character would be 0x02 to indicate the 
javabin version of 2, but with HTML, the first character is the opening angle 
bracket, or character number 60 (0x3C).  This is where those two numbers come 
from.

Thanks,
Shawn
Notice: This email and any attachments are confidential and may not be used, 
published or redistributed without the prior written consent of the Institute 
of Geological and Nuclear Sciences Limited (GNS Science). If received in error 
please destroy and immediately notify GNS Science. Do not copy or disclose the 
contents.


Re: Sum area polygon solr

2017-11-01 Thread David Smiley
Hi,

Ah, no -- sorry.  If you want to roll up your sleeves and write a Solr
plugin (a ValueSource in this case, perhaps) then you could lookup the
index polygon and then call out to JTS to compute the intersection and then
ask it for the area.  But that's going to be a very heavyweight computation
to score/sort on!  Instead, perhaps you can use BBoxField's overlapRatio to
compare bounding boxes which is relatively fast.
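A SolrJ sketch of the overlapRatio approach David mentions (the field name `bbox` and the envelope coordinates are illustrative; this assumes a `solr.BBoxField` is already defined in the schema):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class OverlapRatioQuery {
    public static void main(String[] args) throws Exception {
        // URL and core name are placeholders for this sketch.
        try (HttpSolrClient solr =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/corename").build()) {
            // score=overlapRatio ranks documents by how much their stored
            // bounding box overlaps the query box.
            // ENVELOPE takes (minX, maxX, maxY, minY).
            SolrQuery q = new SolrQuery(
                "{!field f=bbox score=overlapRatio}Intersects(ENVELOPE(-10, 20, 15, 10))");
            q.addSort("score", SolrQuery.ORDER.desc);
            solr.query(q).getResults()
                .forEach(doc -> System.out.println(doc.get("id")));
        }
    }
}
```

This only compares rectangles, so it is an approximation of true polygon overlap, but it can be computed at query time without a custom ValueSource.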

~ David

On Tue, Oct 31, 2017 at 8:45 AM Samur Araujo  wrote:

> Hi all, is it possible to sum the area of a polygon in solr?
>
> Suppose I do an polygon intersect and I want to retrieve the total area of
> the resulting polygon.
>
> Is it possible?
>
> Best,
>
> --
> Head of Data
> Geophy
> www.geophy.com
>
> Nieuwe Plantage 54
> -55
> 2611XK  Delft
> +31 (0)70 7640725 <+31%2070%20764%200725>
>
> 1 Fore Street
> EC2Y 9DT  London
> +44 (0)20 37690760 <+44%2020%203769%200760>
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


RE: adding documents to a secured solr server.

2017-11-01 Thread Phil Scadden
For testing, I changed to HttpSolrClient, specifying the core on process and 
commit instead of opening it as server/core. This time it worked... sort of. 
Despite deleting the entire index with deleteByQuery and seeing that it was 
empty in the coreAdmin, I get:

possible analysis error: cannot change DocValues type from SORTED_SET to 
NUMERIC for field "access"

I tried deleting the field in the admin interface and then adding it back in 
again in that admin interface. But, no. Still comes up with that error. I know 
deleting the index files on disk works but I don’t have access to the server. 
This is a frustrating problem.



-Original Message-
From: Shawn Heisey [mailto:elyog...@elyograg.org]
Sent: Thursday, 2 November 2017 3:55 p.m.
To: solr-user@lucene.apache.org
Subject: Re: adding documents to a secured solr server.

On 11/1/2017 8:13 PM, Phil Scadden wrote:
> 14:52:45,962 DEBUG ConcurrentUpdateSolrClient:177 - starting runner:
> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner@6e
> eba4a
> 14:52:46,224  WARN ConcurrentUpdateSolrClient:343 - Failed to parse
> error response from http://online-dev.gns.cri.nz:8983/solr/prindex due
> to: java.lang.RuntimeException: Invalid version (expected 2, but 60)
> or the data in not in 'javabin' format



> Even more puzzling. Authentication is set. What is the invalid version bit?? 
> I think my solrj is 6.4.1; the server is 6.6.2. Do these have  to match 
> exactly??

The only time I would be worried about different SolrJ and Solr versions is 
when using the CloudSolrClient object.  For the other client types, you can 
usually have a VERY wide version spread without problems.  For the cloud 
object, you *might* have problems with different versions, or it might work 
fine.  If the SolrJ version is higher than the Solr version, the cloud client 
tends to work.

I would always recommend that the client version be the same or higher than the 
server version... but with non-cloud clients, it won't matter very much.  I 
would not expect problems with the two versions you have, as long as you don't 
try to use the cloud client.

This error is different.  It's happening because SolrJ is expecting a Javabin 
response, but it is getting an HTML error response instead, with the "require 
authentication" error.  This logging message will happen anytime SolrJ gets an 
error response instead of a "real" response.  What this error says is 
technically correct, but very confusing to novice users.

The specific numbers in the message are a result of the first character of the 
response.  With javabin, the first character would be 0x02 to indicate the 
javabin version of 2, but with HTML, the first character is the opening angle 
bracket, or character number 60 (0x3C).  This is where those two numbers come 
from.

Thanks,
Shawn


Re: max docs, deleted docs optimization

2017-11-01 Thread kshitij tyagi
Thanks Erick for your prompt response, it was really helpful.

On Tue, Oct 31, 2017 at 8:30 PM, Erick Erickson 
wrote:

> 1> 2 lakh at most. If the standard background merging is going on it
> may be less than that.
>
> 2> Some, but whether you notice or not is an open question. In an
> index with only 10 lakh docs, it's unlikely even having 50% deleted
> documents is going to make much of a difference.
>
> 3> Yes, the deleted docs are in segment until it's merged away. Lucene
> is very efficient (according to Mike McCandless) at skipping deleted
> docs.
>
> 4> It rewrites all segments, purging deleted documents. However, it
> has some pitfalls, see:
> https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/.
> In general it's simply not recommended to optimize. There is a Solr
> JIRA discussing this in detail, but I can't get to the site to link it
> right now.
>
> In general, as an index is updated segments are merged together and
> during that process any deleted documents are purged.
>
> Two resources:
> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> See the third animation TieredMergePolicy which is the default here:
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>
> Best,
> Erick
>
> On Tue, Oct 31, 2017 at 4:40 AM, kshitij tyagi
>  wrote:
> > Hi,
> >
> > I am using atomic update to update one of the fields, and I want to know:
> >
> > 1. If the total docs in the core are 10 lakh and I partially update 2 lakh
> > docs, then what will be the number of deleted docs?
> >
> > 2. Does a higher number of deleted docs affect query time? That is, does
> > query time increase if there are more deleted docs?
> >
> > 3. Are deleted docs present in segments? Are deleted docs traversed
> > during query execution?
> >
> > 4. What does the optimize button on the Solr admin UI do, exactly?
> >
> > Help is much appreciated.
> >
> > Regards,
> > Kshitij
>


RE: adding documents to a secured solr server.

2017-11-01 Thread Phil Scadden
So the real error is authentication, (the version is spurious) but why that 
when authentication is being set on the updateRequest?

-Original Message-
From: Shawn Heisey [mailto:elyog...@elyograg.org]
Sent: Thursday, 2 November 2017 3:55 p.m.
To: solr-user@lucene.apache.org
Subject: Re: adding documents to a secured solr server.

On 11/1/2017 8:13 PM, Phil Scadden wrote:
> 14:52:45,962 DEBUG ConcurrentUpdateSolrClient:177 - starting runner:
> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner@6e
> eba4a
> 14:52:46,224  WARN ConcurrentUpdateSolrClient:343 - Failed to parse
> error response from http://online-dev.gns.cri.nz:8983/solr/prindex due
> to: java.lang.RuntimeException: Invalid version (expected 2, but 60)
> or the data in not in 'javabin' format



> Even more puzzling. Authentication is set. What is the invalid version bit?? 
> I think my solrj is 6.4.1; the server is 6.6.2. Do these have  to match 
> exactly??

The only time I would be worried about different SolrJ and Solr versions is 
when using the CloudSolrClient object.  For the other client types, you can 
usually have a VERY wide version spread without problems.  For the cloud 
object, you *might* have problems with different versions, or it might work 
fine.  If the SolrJ version is higher than the Solr version, the cloud client 
tends to work.

I would always recommend that the client version be the same or higher than the 
server version... but with non-cloud clients, it won't matter very much.  I 
would not expect problems with the two versions you have, as long as you don't 
try to use the cloud client.

This error is different.  It's happening because SolrJ is expecting a Javabin 
response, but it is getting an HTML error response instead, with the "require 
authentication" error.  This logging message will happen anytime SolrJ gets an 
error response instead of a "real" response.  What this error says is 
technically correct, but very confusing to novice users.

The specific numbers in the message are a result of the first character of the 
response.  With javabin, the first character would be 0x02 to indicate the 
javabin version of 2, but with HTML, the first character is the opening angle 
bracket, or character number 60 (0x3C).  This is where those two numbers come 
from.

Thanks,
Shawn


Re: adding documents to a secured solr server.

2017-11-01 Thread Shawn Heisey

On 11/1/2017 8:13 PM, Phil Scadden wrote:

14:52:45,962 DEBUG ConcurrentUpdateSolrClient:177 - starting runner: 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner@6eeba4a
14:52:46,224  WARN ConcurrentUpdateSolrClient:343 - Failed to parse error 
response from http://online-dev.gns.cri.nz:8983/solr/prindex due to: 
java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in 
not in 'javabin' format





Even more puzzling. Authentication is set. What is the invalid version bit?? I 
think my solrj is 6.4.1; the server is 6.6.2. Do these have  to match exactly??


The only time I would be worried about different SolrJ and Solr versions 
is when using the CloudSolrClient object.  For the other client types, 
you can usually have a VERY wide version spread without problems.  For 
the cloud object, you *might* have problems with different versions, or 
it might work fine.  If the SolrJ version is higher than the Solr 
version, the cloud client tends to work.


I would always recommend that the client version be the same or higher 
than the server version... but with non-cloud clients, it won't matter 
very much.  I would not expect problems with the two versions you have, 
as long as you don't try to use the cloud client.


This error is different.  It's happening because SolrJ is expecting a 
Javabin response, but it is getting an HTML error response instead, with 
the "require authentication" error.  This logging message will happen 
anytime SolrJ gets an error response instead of a "real" response.  What 
this error says is technically correct, but very confusing to novice users.


The specific numbers in the message are a result of the first character 
of the response.  With javabin, the first character would be 0x02 to 
indicate the javabin version of 2, but with HTML, the first character is 
the opening angle bracket, or character number 60 (0x3C).  This is where 
those two numbers come from.
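The arithmetic behind those two numbers can be checked directly; this tiny snippet just illustrates the byte values involved:

```java
public class JavabinFirstByte {
    // Reproduces the numbers in "Invalid version (expected 2, but 60)".
    static String firstByteMessage() {
        int javabinVersion = 0x02; // first byte of a javabin response
        int htmlFirstByte = '<';   // first byte of an HTML error page (0x3C)
        return "expected " + javabinVersion + ", but " + htmlFirstByte;
    }

    public static void main(String[] args) {
        System.out.println(firstByteMessage()); // prints: expected 2, but 60
    }
}
```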


Thanks,
Shawn


Re: adding documents to a secured solr server.

2017-11-01 Thread Shawn Heisey

On 11/1/2017 7:59 PM, Phil Scadden wrote:

After some digging, I tried this approach...
solr = new ConcurrentUpdateSolrClient.Builder(solrUrl)
.withQueueSize(20)
.build();
  SolrInputDocument up = new SolrInputDocument();
  up.addField("id",f.getCanonicalPath());
  up.addField("title",title);
  up.addField("author",author);
  String content = textHandler.toString();
  up.addField("_text_",content);
  UpdateRequest req = new UpdateRequest();
  req.setCommitWithin(1000);
  req.add(up);
  req.setBasicAuthCredentials("solrAdmin", password);
  UpdateResponse ur =  req.process(solr);


You need to create an UpdateRequest object and set the auth parameters 
on that object, rather than using the sugar methods on the client to 
have it add the docs directly.


See this:

https://stackoverflow.com/a/44637540

An alternate idea would be to create a custom HttpClient object (using 
their Builder methods) that has the authentication credentials baked 
into it, and build the solr client using that object.  If you do that, 
then you won't need to add authentication to any request object.
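A sketch of that alternate idea: build the Solr client around an Apache HttpClient with the credentials baked in (untested; the builder method names are as in SolrJ 6.x and worth verifying against your version):

```java
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;

public class AuthenticatedClient {
    public static ConcurrentUpdateSolrClient build(String solrUrl, String password) {
        // Bake basic-auth credentials into the HttpClient so every request
        // made through the Solr client carries them automatically.
        CredentialsProvider provider = new BasicCredentialsProvider();
        provider.setCredentials(AuthScope.ANY,
            new UsernamePasswordCredentials("solrAdmin", password));
        CloseableHttpClient httpClient = HttpClientBuilder.create()
            .setDefaultCredentialsProvider(provider)
            .build();
        return new ConcurrentUpdateSolrClient.Builder(solrUrl)
            .withHttpClient(httpClient)
            .withQueueSize(20)
            .withThreadCount(4)
            .build();
    }
}
```

Note that a standard CredentialsProvider answers the server's 401 challenge rather than sending credentials preemptively, so each connection incurs one extra round trip on first use.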


Side note about custom HttpClient objects:  If you intend to use your 
solr client object with multiple threads, you will need to create a 
custom HttpClient object anyway.  This is because the default thread 
limit on the HttpClient that is created in the background is two 
threads.  This limit is not in Solr code, it's in HttpClient.  To allow 
more, the HttpClient object must be custom-built.  I suspect that the 
reason you chose ConcurrentUpdateSolrClient was for automatic handling 
of several threads (you set the queue size to 20) ... but with a default 
object, that won't be what you actually get.  I have filed the following 
issue to try and improve the default situation:


https://issues.apache.org/jira/browse/SOLR-11596

Something else to add as a strong caution:  ConcurrentUpdateSolrClient 
swallows all indexing errors.  If the Solr server were completely down, 
you would not see any exceptions on "add" calls, even though the 
requests all would fail ... the program would only get an error on the 
"commit" call, and it is fairly common for developers to leave the 
commit out, letting Solr handle all commits.  If you want your program 
to be aware of all indexing errors, you will need to use HttpSolrClient 
or CloudSolrClient and handle multiple threads in your own code.
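A minimal sketch of the HttpSolrClient alternative, where indexing errors surface as exceptions on each add (URL, core name, and credentials are placeholders):

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class VisibleErrors {
    public static void main(String[] args) throws Exception {
        try (SolrClient solr =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/corename").build()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");

            UpdateRequest req = new UpdateRequest();
            req.add(doc);
            req.setCommitWithin(1000);
            req.setBasicAuthCredentials("solrAdmin", "password");
            try {
                // Unlike ConcurrentUpdateSolrClient, a failure here is
                // reported immediately, per request.
                req.process(solr);
            } catch (SolrServerException | java.io.IOException e) {
                System.err.println("indexing failed: " + e.getMessage());
            }
        }
    }
}
```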


Thanks,
Shawn


Re: Advice on Stemming in Solr

2017-11-01 Thread Zheng Lin Edwin Yeo
Hi Emir,

We do have quite a lot of words that should not be stemmed. Currently, the
KStemFilterFactory is stemming all the non-English words that end with
"ing" as well. There are quite a lot of places and names which end in
"ing", and all of these are being stemmed too, which leads to inaccurate
search results.

Regards,
Edwin


On 1 November 2017 at 18:20, Emir Arnautović 
wrote:

> Hi Edwin,
> If the number of words that should not be stemmed is not high you could
> use KeywordMarkerFilterFactory to flag those words as keywords and it
> should prevent stemmer from changing them.
> Depending on what you want to achieve, you might not be able to avoid
> using stemmer at indexing time. If you want to find documents that contain
> only “walking” with search term “walk”, then you have to stem at index
> time. Cases when you use stemming on query time only are rare and specific.
> If you want to prefer exact matches over stemmed matches, you have to
> index same content with and without stemming and boost matches on field
> without stemming.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 1 Nov 2017, at 10:11, Zheng Lin Edwin Yeo 
> wrote:
> >
> > Hi,
> >
> > We are currently using KStemFilterFactory in Solr, but we found that it
> > is actually stemming non-English words like "ximenting", which it
> > stems to "ximent". This is not what we wanted.
> >
> > Another option is to use the HunspellStemFilterFactory, but there are
> > some English words like "running" and "walking" that are not being stemmed.
> >
> > Would like to check: is it advisable to use stemming at index time? Or
> > should we not stem at index time, but at query time do a search for the
> > stemmed words as well; for example, if the user searches for "walking",
> > we search together with "walk", and the actual word "walking" gets a
> > higher weighting.
> >
> > I'm currently using Solr 6.5.1.
> >
> > Regards,
> > Edwin
>
>
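For reference, Emir's KeywordMarkerFilterFactory suggestion might look like this in the schema (the field type name and the protected-words file name are illustrative, and the filter must sit before the stemmer in the chain):

```xml
<fieldType name="text_en_protected" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Words listed in protwords.txt (e.g. place names ending in "ing")
         are flagged as keywords and skipped by the stemmer below. -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>
```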


RE: Stateless queries to secured SOLR server.

2017-11-01 Thread Phil Scadden
Thanks for that Shawn. What I am doing is working fine now. I need the middle 
proxy to audit and modify what the client sends to Solr (based on user rights), 
not to mention keeping Solr from direct exposure to the internet.

-Original Message-
From: Shawn Heisey [mailto:elyog...@elyograg.org]
Sent: Thursday, 2 November 2017 3:13 p.m.
To: solr-user@lucene.apache.org
Subject: Re: Stateless queries to secured SOLR server.

On 11/1/2017 4:22 PM, Phil Scadden wrote:
> Except that I am using solrj in an intermediary proxy and passing the
> response directly to a javascript client. It expects json or csv
> depending on what it passes in wt=

That's a different use case than I had imagined.  Thanks for the detail.

My statement about SolrJ is correct if the code that will handle the response 
is Java.  Sounds like it's not -- you've just said that the code that will 
actually decode and use the response is javascript.

When the code that will handle the response is Java, SolrJ is a perfect fit, 
because SolrJ will handle decoding the response and the programmer doesn't need 
to worry about the format, they are given an object that contains the full 
response, where information can easily be extracted by someone familiar with 
typical Java objects.

There probably is a way to access the full response "text" with SolrJ, rather 
than the decoded object, but I do not know enough about the low-level details 
to tell you how that might be accomplished.  If you can figure that part out, 
then you could use SolrJ and have access to its methods for constructing the 
query.
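One way to get the raw response text out of SolrJ, the part Shawn wasn't sure about, is NoOpResponseParser, which skips decoding and hands back the response body as a string (a sketch, worth verifying against your SolrJ version):

```java
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.impl.NoOpResponseParser;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

public class RawResponseProxy {
    // Runs a query and returns the raw, undecoded response body,
    // in whatever format wt names (json, csv, xml, ...).
    public static String rawQuery(HttpSolrClient solr, String q, String wt)
            throws Exception {
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("q", q);
        params.set("wt", wt);

        QueryRequest req = new QueryRequest(params);
        // NoOpResponseParser puts the raw response string under "response".
        NoOpResponseParser parser = new NoOpResponseParser();
        parser.setWriterType(wt);
        req.setResponseParser(parser);

        NamedList<Object> result = solr.request(req);
        return (String) result.get("response");
    }
}
```

With this, the proxy could still use SolrJ's query-construction methods while passing the body through to the javascript client unmodified.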

With your Java code simply acting as a proxy, the way you're going about it 
might be the best option -- build the http request with a particular wt 
parameter, get the response, and pass the response on unmodified.

Thanks,
Shawn


Re: Stateless queries to secured SOLR server.

2017-11-01 Thread Shawn Heisey

On 11/1/2017 4:22 PM, Phil Scadden wrote:

Except that I am using solrj in an intermediary proxy and passing the response 
directly to a javascript client. It expects json or csv depending on what it 
passes in wt=


That's a different use case than I had imagined.  Thanks for the detail.

My statement about SolrJ is correct if the code that will handle the 
response is Java.  Sounds like it's not -- you've just said that the 
code that will actually decode and use the response is javascript.


When the code that will handle the response is Java, SolrJ is a perfect 
fit, because SolrJ will handle decoding the response and the programmer 
doesn't need to worry about the format, they are given an object that 
contains the full response, where information can easily be extracted by 
someone familiar with typical Java objects.


There probably is a way to access the full response "text" with SolrJ, 
rather than the decoded object, but I do not know enough about the 
low-level details to tell you how that might be accomplished.  If you 
can figure that part out, then you could use SolrJ and have access to 
its methods for constructing the query.


With your Java code simply acting as a proxy, the way you're going about 
it might be the best option -- build the http request with a particular 
wt parameter, get the response, and pass the response on unmodified.


Thanks,
Shawn


RE: adding documents to a secured solr server.

2017-11-01 Thread Phil Scadden
And my security.json looks like:
{
  "authentication":{
"class":"solr.BasicAuthPlugin",
"blockUnknown":true,
"credentials":{
  "solrAdmin":" a hash ",
  "solrGuest":"another hash"},
"":{"v":0}},
  "authorization":{
"class":"solr.RuleBasedAuthorizationPlugin",
"permissions":[
  {
"name":"all",
"role":"admin"},
  {
"name":"read",
"role":"guest"}],
"user-role":{"solrAdmin":["admin","guest"],"solrGuest":"guest"}}}

It looks like I should be able to add.

this one worked to delete the entire index:
   UpdateRequest up = new UpdateRequest();
   up.setBasicAuthCredentials("solrAdmin",password);
   up.deleteByQuery("*:*");
   up.setCommitWithin(1000);
   up.process(solr);

-Original Message-
From: Phil Scadden [mailto:p.scad...@gns.cri.nz]
Sent: Thursday, 2 November 2017 2:59 p.m.
To: solr-user@lucene.apache.org
Subject: RE: adding documents to a secured solr server.

After some digging, I tried this approach...
   solr = new ConcurrentUpdateSolrClient.Builder(solrUrl)
   .withQueueSize(20)
   .build();
 SolrInputDocument up = new SolrInputDocument();
 up.addField("id",f.getCanonicalPath());
 up.addField("title",title);
 up.addField("author",author);
 String content = textHandler.toString();
 up.addField("_text_",content);
 UpdateRequest req = new UpdateRequest();
 req.setCommitWithin(1000);
 req.add(up);
 req.setBasicAuthCredentials("solrAdmin", password);
 UpdateResponse ur =  req.process(solr);

However, I get this error back:
14:52:45,962 DEBUG ConcurrentUpdateSolrClient:177 - starting runner: 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner@6eeba4a
14:52:46,224  WARN ConcurrentUpdateSolrClient:343 - Failed to parse error 
response from http://online-dev.gns.cri.nz:8983/solr/prindex due to: 
java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in 
not in 'javabin' format
14:52:46,224 ERROR ConcurrentUpdateSolrClient:540 - error
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://online-dev.gns.cri.nz:8983/solr/prindex: require 
authentication

request: 
http://online-dev.gns.cri.nz:8983/solr/prindex/update?wt=javabin&version=2
at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:345)
at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
14:52:46,224 DEBUG ConcurrentUpdateSolrClient:210 - finished: 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner@6eeba4a

Even more puzzling. Authentication is set. What is the invalid version bit?? I 
think my solrj is 6.4.1; the server is 6.6.2. Do these have  to match exactly??

-Original Message-
From: Phil Scadden [mailto:p.scad...@gns.cri.nz]
Sent: Thursday, 2 November 2017 11:28 a.m.
To: solr-user@lucene.apache.org
Subject: adding documents to a secured solr server.

Solrj QueryRequest object has a method to set basic authorization 
username/password but what is the equivalent way to pass authorization when you 
are adding new documents to an index?
   ConcurrentUpdateSolrClient solr = new 
ConcurrentUpdateSolrClient(solrProperties.getServer(),10,2);
...
 up.addField("id","myid");
 up.addField("title",title);
 up.addField("author",author);
 String content = textHandler.toString();
 up.addField("_text_",content);
 solr.add(up);
 solr.commit();

I can't see where authorization occurs?


Re: Making a String field case-insensitive

2017-11-01 Thread Zheng Lin Edwin Yeo
Hi Emir,

Thanks for your advice. This works.

Regards,
Edwin


On 1 November 2017 at 18:08, Emir Arnautović 
wrote:

> Hi,
> You can use KeywordTokenizer and LowerCaseFilterFactory.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 1 Nov 2017, at 09:50, Zheng Lin Edwin Yeo 
> wrote:
> >
> > Hi,
> >
> > Would like to find out, what is the best way to lower-case a String index
> > in Solr, to make it case-insensitive, while preserving the structure of
> > the string (i.e. it should not break into different tokens at spaces, and
> > should not remove any characters or symbols).
> >
> > I found that solr.StrField does not use a lower-case filter. But if I
> > change it to solr.TextField and use the Standard Tokenizer, the fields
> > get broken up.
> >
> > Eg:
> >
> > For this configuration,
> >
> > <fieldType class="solr.TextField" positionIncrementGap="100"
> > autoGeneratePhraseQueries="false">
> >   <analyzer>
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> > The string "SYStem 500 *" gets broken down into this:
> >
> > system | 500
> >
> > The "system" and "500" are separated into 2 tokens, which is not what we
> > want. Also, the * is being removed.
> >
> >
> > We will like to have something like this. This will preserve what it is
> as
> > a string but just lowercase it.
> >
> > *system 500 **
>
>
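The field type Emir describes might look like this in the schema (the type name is illustrative): KeywordTokenizer keeps the whole value as one token, and the lowercase filter makes matching case-insensitive.

```xml
<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <!-- KeywordTokenizer emits the entire input as a single token,
         so "SYStem 500 *" stays intact and is only lowercased. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```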


RE: adding documents to a secured solr server.

2017-11-01 Thread Phil Scadden
After some digging, I tried this approach...
   solr = new ConcurrentUpdateSolrClient.Builder(solrUrl)
           .withQueueSize(20)
           .build();
   SolrInputDocument up = new SolrInputDocument();
   up.addField("id", f.getCanonicalPath());
   up.addField("title", title);
   up.addField("author", author);
   String content = textHandler.toString();
   up.addField("_text_", content);
   UpdateRequest req = new UpdateRequest();
   req.setCommitWithin(1000);
   req.add(up);
   req.setBasicAuthCredentials("solrAdmin", password);
   UpdateResponse ur = req.process(solr);

However,  I get error back of:
14:52:45,962 DEBUG ConcurrentUpdateSolrClient:177 - starting runner: 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner@6eeba4a
14:52:46,224  WARN ConcurrentUpdateSolrClient:343 - Failed to parse error 
response from http://online-dev.gns.cri.nz:8983/solr/prindex due to: 
java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in 
not in 'javabin' format
14:52:46,224 ERROR ConcurrentUpdateSolrClient:540 - error
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://online-dev.gns.cri.nz:8983/solr/prindex: require 
authentication

request: 
http://online-dev.gns.cri.nz:8983/solr/prindex/update?wt=javabin&version=2
at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:345)
at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
14:52:46,224 DEBUG ConcurrentUpdateSolrClient:210 - finished: 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner@6eeba4a

Even more puzzling: authentication is set. What is the invalid version bit? I 
think my solrj is 6.4.1; the server is 6.6.2. Do these have to match exactly?
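In case it helps others searching the archive: ConcurrentUpdateSolrClient sends updates asynchronously and reports failures after the fact, which is why the error above is so confusing. A per-request credentials pattern with a plain HttpSolrClient reports errors synchronously. This is only a sketch, assuming SolrJ 6.x; the URL and credentials are placeholders, and it needs solr-solrj on the classpath plus a reachable secured Solr:

```java
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrInputDocument;

public class SecuredIndexingSketch {
    public static void main(String[] args) throws Exception {
        // Point the client at the core/collection URL; errors surface on process().
        try (HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/prindex").build()) { // placeholder URL
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "myid");
            doc.addField("title", "some title");

            UpdateRequest req = new UpdateRequest();
            req.add(doc);
            req.setCommitWithin(1000);
            // Credentials ride on the request itself, so every update is authenticated.
            req.setBasicAuthCredentials("solrAdmin", "password"); // placeholders
            UpdateResponse rsp = req.process(solr);
            System.out.println("status: " + rsp.getStatus());
        }
    }
}
```

Because the credentials are attached to each UpdateRequest rather than to the client, the same client can be reused for requests with different users.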

-Original Message-
From: Phil Scadden [mailto:p.scad...@gns.cri.nz]
Sent: Thursday, 2 November 2017 11:28 a.m.
To: solr-user@lucene.apache.org
Subject: adding documents to a secured solr server.

Solrj QueryRequest object has a method to set basic authorization 
username/password but what is the equivalent way to pass authorization when you 
are adding new documents to an index?
   ConcurrentUpdateSolrClient solr = new 
ConcurrentUpdateSolrClient(solrProperties.getServer(),10,2);
...
 up.addField("id","myid");
 up.addField("title",title);
 up.addField("author",author);
 String content = textHandler.toString();
 up.addField("_text_",content);
 solr.add(up);
 solr.commit();

I can't see where authorization occurs.

Notice: This email and any attachments are confidential and may not be used, 
published or redistributed without the prior written consent of the Institute 
of Geological and Nuclear Sciences Limited (GNS Science). If received in error 
please destroy and immediately notify GNS Science. Do not copy or disclose the 
contents.


Re: Upgrade path from 5.4.1

2017-11-01 Thread Yonik Seeley
On Wed, Nov 1, 2017 at 2:36 PM, Erick Erickson  wrote:
> I _always_ prefer to reindex if possible. Additionally, as of Solr 7
> all the numeric types are deprecated in favor of points-based types
> which are faster on all fronts and use less memory.

They are a good step forward in general, and faster for range queries
(and multiple dimensions), but looking at the design I'd guess that
they may be slower for exact-match queries?
Has anyone tested this?

-Yonik


Error executing a SQL query sorting on both a field and an expression

2017-11-01 Thread Fabio Corneti
Hello,
I'm doing some tests against the SQL handler on a single node
SolrCloud 7.1.0 installation.

When running this query:

  SELECT Org_Type, COUNT(*) FROM obesity GROUP BY Org_Type ORDER BY
Org_Type DESC, COUNT(*) ASC

I get the following error:

  Failed to execute sqlQuery : java.io.IOException: If multiple
sorts are specified there must be a sort for each bucket.

Is this a bug or the handler cannot support this kind of mixed sorting?


For reference, the following queries work as expected:

  SELECT Org_Type, COUNT(*) FROM obesity GROUP BY Org_Type ORDER BY COUNT(*) ASC

  SELECT Org_Type, COUNT(*) FROM obesity GROUP BY Org_Type ORDER BY
Org_Type DESC

  SELECT Org_Type, Org_Name, COUNT(*) FROM obesity GROUP BY Org_Type,
Org_Name ORDER BY Org_Type DESC, Org_Name DESC


Thanks,
Fabio


Re: App Studio

2017-11-01 Thread Vincenzo D'Amore
Hi, I'm interested too.

On Thu, Nov 2, 2017 at 12:46 AM, Kojo  wrote:

> I would like to try that!
>
>
> On 1 Nov 2017 at 18:04, "Will Hayes"  wrote:
>
> There is a community edition of App Studio for Solr and Elasticsearch being
> released by Lucidworks in November. Drop me a line if you would like to get
> a preview release.
> -wh
>
> --
> Will Hayes | CEO | Lucidworks
> direct. +1.415.997.9455 | email. w...@lucidworks.com
>
> On Wed, Nov 1, 2017 at 12:54 PM, David Hastings <
> hastings.recurs...@gmail.com> wrote:
>
> > Hey all, at the conference it was mentioned that lucidworks would release
> > app studio as its own and free project.  is that still the case?
> >
>



-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: App Studio

2017-11-01 Thread Kojo
I would like to try that!


On 1 Nov 2017 at 18:04, "Will Hayes"  wrote:

There is a community edition of App Studio for Solr and Elasticsearch being
released by Lucidworks in November. Drop me a line if you would like to get
a preview release.
-wh

--
Will Hayes | CEO | Lucidworks
direct. +1.415.997.9455 | email. w...@lucidworks.com

On Wed, Nov 1, 2017 at 12:54 PM, David Hastings <
hastings.recurs...@gmail.com> wrote:

> Hey all, at the conference it was mentioned that lucidworks would release
> app studio as its own and free project.  is that still the case?
>


Re: Streaming Expression - cartesianProduct

2017-11-01 Thread Kojo
Pratik's information answered the question.

Thanks!



On 1 Nov 2017 at 19:45, "Amrit Sarkar"  wrote:

Following Pratik's spot-on comment and not really related to your question,

The "partitionKeys" parameter also needs to specify the "over" field
when using "parallel" streaming.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Thu, Nov 2, 2017 at 2:38 AM, Pratik Patel  wrote:

> Roll up needs documents to be sorted by the "over" field.
> Check this for more details
> http://lucene.472066.n3.nabble.com/Streaming-Expressions-rollup-function-
> returning-results-with-duplicate-tuples-td4342398.html
>
> On Wed, Nov 1, 2017 at 3:41 PM, Kojo  wrote:
>
> > Wrapping the cartesianProduct function with the fetch function works as
> > expected.
> >
> > But the rollup function over cartesianProduct doesn't aggregate on a
> > returned field of the cartesianProduct.
> >
> >
> > The field "id_researcher" below is a multivalued field:
> >
> >
> >
> > This one works:
> >
> >
> > fetch(reasercher,
> >
> > cartesianProduct(
> > having(
> > cartesianProduct(
> > search(schoolarship,zkHost="localhost:9983",qt="/export",
> > q="*:*",
> > fl="process, area, id_reasercher",sort="process asc"),
> > area
> > ),
> > eq(area, val(Anything))),
> > id_reasercher),
> > fl="name, django_id",
> > on="id_reasercher=django_id"
> > )
> >
> >
> > This one doesn't work:
> >
> > rollup(
> >
> > cartesianProduct(
> > having(
> > cartesianProduct(
> > search(schoolarship,zkHost="localhost:9983",qt="/export",
> > q="*:*",
> > fl="process, area, id_researcher, status",sort="process asc"),
> > area
> > ),
> > eq(area, val(Anything))),
> > id_researcher),
> > over=id_researcher,count(*)
> > )
> >
> > If I aggregate over a non-multivalued field, it works.
> >
> >
> > Is it correct that rollup doesn't work on a cartesianProduct?
> >
>


adding documents to a secured solr server.

2017-11-01 Thread Phil Scadden
Solrj QueryRequest object has a method to set basic authorization 
username/password but what is the equivalent way to pass authorization when you 
are adding new documents to an index?
   ConcurrentUpdateSolrClient solr = new 
ConcurrentUpdateSolrClient(solrProperties.getServer(),10,2);
...
 up.addField("id","myid");
 up.addField("title",title);
 up.addField("author",author);
 String content = textHandler.toString();
 up.addField("_text_",content);
 solr.add(up);
 solr.commit();

I can't see where authorization occurs.



RE: Stateless queries to secured SOLR server.

2017-11-01 Thread Phil Scadden
Except that I am using solrj in an intermediary proxy and passing the response 
directly to a javascript client. It expects json or csv depending on what it 
passes in wt=.

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Thursday, 2 November 2017 2:48 a.m.
To: solr-user@lucene.apache.org
Subject: Re: Stateless queries to secured SOLR server.

On 10/31/2017 2:08 PM, Phil Scadden wrote:
> Thanks Shawn. I have done it with SolrJ. Apart from needing the 
> NoopResponseParser to handle the wt=, it was pretty painless.

This is confusing to me, because with SolrJ, you do not need to be concerned 
with the response format *AT ALL*.  You don't need to use the wt parameter, 
SolrJ will handle that for you.  In fact, you should NOT set the wt parameter.

Thanks,
Shawn


RE: App Studio

2017-11-01 Thread Kris Musshorn
Yes please

-Original Message-
From: Will Hayes [mailto:w...@lucidworks.com] 
Sent: Wednesday, November 1, 2017 4:04 PM
To: solr-user@lucene.apache.org
Subject: Re: App Studio

There is a community edition of App Studio for Solr and Elasticsearch being 
released by Lucidworks in November. Drop me a line if you would like to get a 
preview release.
-wh

--
Will Hayes | CEO | Lucidworks
direct. +1.415.997.9455 | email. w...@lucidworks.com

On Wed, Nov 1, 2017 at 12:54 PM, David Hastings < hastings.recurs...@gmail.com> 
wrote:

> Hey all, at the conference it was mentioned that lucidworks would 
> release app studio as its own and free project.  is that still the case?
>



Re: Streaming Expression - cartesianProduct

2017-11-01 Thread Amrit Sarkar
Following Pratik's spot-on comment and not really related to your question,

The "partitionKeys" parameter also needs to specify the "over" field
when using "parallel" streaming.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Thu, Nov 2, 2017 at 2:38 AM, Pratik Patel  wrote:

> Roll up needs documents to be sorted by the "over" field.
> Check this for more details
> http://lucene.472066.n3.nabble.com/Streaming-Expressions-rollup-function-
> returning-results-with-duplicate-tuples-td4342398.html
>
> On Wed, Nov 1, 2017 at 3:41 PM, Kojo  wrote:
>
> > Wrapping the cartesianProduct function with the fetch function works as
> > expected.
> >
> > But the rollup function over cartesianProduct doesn't aggregate on a
> > returned field of the cartesianProduct.
> >
> >
> > The field "id_researcher" below is a multivalued field:
> >
> >
> >
> > This one works:
> >
> >
> > fetch(reasercher,
> >
> > cartesianProduct(
> > having(
> > cartesianProduct(
> > search(schoolarship,zkHost="localhost:9983",qt="/export",
> > q="*:*",
> > fl="process, area, id_reasercher",sort="process asc"),
> > area
> > ),
> > eq(area, val(Anything))),
> > id_reasercher),
> > fl="name, django_id",
> > on="id_reasercher=django_id"
> > )
> >
> >
> > This one doesn't work:
> >
> > rollup(
> >
> > cartesianProduct(
> > having(
> > cartesianProduct(
> > search(schoolarship,zkHost="localhost:9983",qt="/export",
> > q="*:*",
> > fl="process, area, id_researcher, status",sort="process asc"),
> > area
> > ),
> > eq(area, val(Anything))),
> > id_researcher),
> > over=id_researcher,count(*)
> > )
> >
> > If I aggregate over a non-multivalued field, it works.
> >
> >
> > Is it correct that rollup doesn't work on a cartesianProduct?
> >
>


Re: Streaming Expression - cartesianProduct

2017-11-01 Thread Pratik Patel
Roll up needs documents to be sorted by the "over" field.
Check this for more details
http://lucene.472066.n3.nabble.com/Streaming-Expressions-rollup-function-returning-results-with-duplicate-tuples-td4342398.html

On Wed, Nov 1, 2017 at 3:41 PM, Kojo  wrote:

> Wrapping the cartesianProduct function with the fetch function works as
> expected.
>
> But the rollup function over cartesianProduct doesn't aggregate on a
> returned field of the cartesianProduct.
>
>
> The field "id_researcher" below is a multivalued field:
>
>
>
> This one works:
>
>
> fetch(reasercher,
>
> cartesianProduct(
> having(
> cartesianProduct(
> search(schoolarship,zkHost="localhost:9983",qt="/export",
> q="*:*",
> fl="process, area, id_reasercher",sort="process asc"),
> area
> ),
> eq(area, val(Anything))),
> id_reasercher),
> fl="name, django_id",
> on="id_reasercher=django_id"
> )
>
>
> This one doesn't work:
>
> rollup(
>
> cartesianProduct(
> having(
> cartesianProduct(
> search(schoolarship,zkHost="localhost:9983",qt="/export",
> q="*:*",
> fl="process, area, id_researcher, status",sort="process asc"),
> area
> ),
> eq(area, val(Anything))),
> id_researcher),
> over=id_researcher,count(*)
> )
>
> If I aggregate over a non-multivalued field, it works.
>
>
> Is it correct that rollup doesn't work on a cartesianProduct?
>


Re: App Studio

2017-11-01 Thread Will Hayes
There is a community edition of App Studio for Solr and Elasticsearch being
released by Lucidworks in November. Drop me a line if you would like to get
a preview release.
-wh

--
Will Hayes | CEO | Lucidworks
direct. +1.415.997.9455 | email. w...@lucidworks.com

On Wed, Nov 1, 2017 at 12:54 PM, David Hastings <
hastings.recurs...@gmail.com> wrote:

> Hey all, at the conference it was mentioned that lucidworks would release
> app studio as its own and free project.  is that still the case?
>


App Studio

2017-11-01 Thread David Hastings
Hey all, at the conference it was mentioned that lucidworks would release
app studio as its own and free project.  is that still the case?


Streaming Expression - cartesianProduct

2017-11-01 Thread Kojo
Wrapping the cartesianProduct function with the fetch function works as
expected.

But the rollup function over cartesianProduct doesn't aggregate on a returned
field of the cartesianProduct.


The field "id_researcher" below is a multivalued field:



This one works:


fetch(reasercher,

cartesianProduct(
having(
cartesianProduct(
search(schoolarship,zkHost="localhost:9983",qt="/export",q="*:*",
fl="process, area, id_reasercher",sort="process asc"),
area
),
eq(area, val(Anything))),
id_reasercher),
fl="name, django_id",
on="id_reasercher=django_id"
)


This one doesn't work:

rollup(

cartesianProduct(
having(
cartesianProduct(
search(schoolarship,zkHost="localhost:9983",qt="/export",q="*:*",
fl="process, area, id_researcher, status",sort="process asc"),
area
),
eq(area, val(Anything))),
id_researcher),
over=id_researcher,count(*)
)

If I aggregate over a non-multivalued field, it works.


Is it correct that rollup doesn't work on a cartesianProduct?
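Following the sort requirement Pratik points to later in the thread, a version that sorts the tuples on the rollup's "over" field before aggregating might look like the sketch below (field and collection names as in the original; note that the streaming sort() function buffers the whole stream in memory):

```
rollup(
    sort(
        cartesianProduct(
            having(
                cartesianProduct(
                    search(schoolarship,zkHost="localhost:9983",qt="/export",q="*:*",
                           fl="process, area, id_researcher, status",sort="process asc"),
                    area),
                eq(area, val(Anything))),
            id_researcher),
        by="id_researcher asc"),
    over=id_researcher,
    count(*)
)
```

The /export sort on "process" orders the stream before the multivalued field is exploded, so the emitted id_researcher tuples are not grouped; re-sorting on id_researcher restores the ordering rollup assumes.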


Re: Upgrade path from 5.4.1

2017-11-01 Thread Erick Erickson
I _always_ prefer to reindex if possible. Additionally, as of Solr 7
all the numeric types are deprecated in favor of points-based types
which are faster on all fronts and use less memory. However, to use
this functionality you'll need to re-index anyway.

Solr 7 will still support Trie types, but support for those is not
guaranteed after that, so it's a chance to get ahead of the curve.

I _strongly_ recommend that you start with the default configs in 7x
and apply any changes made (fields, fieldtypes, requesthandlers
whatever) to the 7x rather than copy the old configs. Of course that's
not really a problem if you can't find the configs in the first
place

You can download your configs from ZooKeeper so at least they aren't
permanently lost

New (6x+) Solr releases let you move things back and forth pretty easily; try
`bin/solr zk -help`

Best,
Erick
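The ZooKeeper round trip Erick mentions can be sketched as below. The zkhost, configset name, and local path are placeholders, and the exact flag syntax varies a bit across 6.x releases, so check `bin/solr zk -help` on your version:

```shell
# Pull a configset out of ZooKeeper onto local disk
bin/solr zk downconfig -z localhost:9983 -n myconfig -d /tmp/myconfig

# ...edit locally, commit to version control, then push it back
bin/solr zk upconfig -z localhost:9983 -n myconfig -d /tmp/myconfig
```

After an upconfig, the collections using that configset need a RELOAD before the changes take effect.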

On Wed, Nov 1, 2017 at 9:23 AM, Petersen, Robert (Contr)
 wrote:
> Hi Guys,
>
>
> I just took over the care and feeding of three poor neglected solr 5.4.1 
> cloud clusters at my new position. While spinning up new collections and 
> supporting other business initiatives I am pushing management to give me the 
> green light on migrating to a newer version of solr. The last solr I worked 
> with was 6.6.1 and I was thinking of doing an upgrade to that (er actually 
> 6.6.2) as I was reading an existing index only upgrades one major version 
> number at a time.
>
>
> Then I realized the existing 5.4.1 cloud clusters here were set up with 
> unmanaged configs, so now I'm starting to lean toward just spinning up clean 
> new 6.6.2 or 7.1 clouds on new machines leaving the existing 5.4.1 machines 
> in place then reindexing everything on to the new machines with the intention 
> of testing and then swapping in the new machines and finally destroying the 
> old ones when the dust settles (they're all virtuals so NP just destroying 
> the old instances and recovering their resources).
>
>
> Thoughts?
>
>
> Thanks
>
> Robi
>
> 
>
> This communication is confidential. Frontier only sends and receives email on 
> the basis of the terms set out at http://www.frontier.com/email_disclaimer.


Re: SOLR-11504: Provide a config to restrict number of indexing threads

2017-11-01 Thread Nawab Zada Asad Iqbal
Well, the reason I want to control the number of indexing threads is to
restrict the number of "segments" being created at one time in RAM. One
indexing thread in Lucene corresponds to one segment being written. I need
fine control over the number of segments. Fewer than that, and I will not be
fully utilizing my writing capacity. On the other hand, if I have more
threads, then I will end up with many more small segments, which I will
need to flush frequently and then merge, and that will cause a different
kind of problem.

Your suggestion would require me and other such Solr users to create tight
coupling between the clients and the Solr servers. My client is not SolrJ
based. In a scenario where I am connecting and indexing to Solr remotely, I
want more requests to be waiting on the Solr side so that they start
writing as soon as an indexing thread is available, versus waiting on my
client side, on the other side of the wire.

Thanks
Nawab

On Wed, Nov 1, 2017 at 7:11 AM, Shawn Heisey  wrote:

> On 10/31/2017 4:57 PM, Nawab Zada Asad Iqbal wrote:
>
>> I hit this issue https://issues.apache.org/jira/browse/SOLR-11504 while
>> migrating to solr6 and locally working around it in Lucene code. I am
>> thinking to fix it properly and hopefully patch back to Solr. Since,
>> Lucene
>> code does not want to keep any such config, I am thinking to use a
>> counting
>> semaphore in Solr code before calling IndexWriter.addDocument(s) or
>> IndexWriter.updateDocument(s).
>>
>
> There's a fairly simple way to control the number of indexing threads that
> doesn't require ANY changes to Solr:  Don't start as many threads/processes
> on your indexing client(s).  If you control the number of simultaneous
> requests sent to Solr, then Solr won't start as many indexing threads.
> That kind of control over your indexing system is something that's always
> preferable to have.
>
> Thanks,
> Shawn
>
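The counting-semaphore idea from the opening message can be sketched in plain Java. This is a toy throttle to illustrate the mechanism, not Solr code; the class and method names are invented, and in real code the guarded section would call IndexWriter.addDocument(s) or updateDocument(s):

```java
import java.util.concurrent.Semaphore;

// Toy illustration of SOLR-11504's idea: cap how many indexing calls run at once.
// Excess requests block here (server-side) instead of on the client side of the wire.
class IndexingThrottle {
    private final Semaphore slots;

    IndexingThrottle(int maxConcurrentIndexingThreads) {
        this.slots = new Semaphore(maxConcurrentIndexingThreads);
    }

    /** Blocks until an indexing slot frees up, then runs the work and releases the slot. */
    void addDocument(Runnable indexingWork) {
        slots.acquireUninterruptibly();
        try {
            indexingWork.run(); // stand-in for IndexWriter.addDocument(...)
        } finally {
            slots.release();
        }
    }

    int availableSlots() {
        return slots.availablePermits();
    }
}
```

Capping concurrency this way bounds the number of in-RAM segments without any coupling to the client's thread count.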


Upgrade path from 5.4.1

2017-11-01 Thread Petersen, Robert (Contr)
Hi Guys,


I just took over the care and feeding of three poor neglected Solr 5.4.1 cloud 
clusters at my new position. While spinning up new collections and supporting 
other business initiatives, I am pushing management to give me the green light 
on migrating to a newer version of Solr. The last Solr I worked with was 6.6.1, 
and I was thinking of upgrading to that (er, actually 6.6.2), as I was reading 
that an existing index only upgrades one major version number at a time.


Then I realized the existing 5.4.1 cloud clusters here were set up with 
unmanaged configs, so now I'm starting to lean toward just spinning up clean 
new 6.6.2 or 7.1 clouds on new machines leaving the existing 5.4.1 machines in 
place then reindexing everything on to the new machines with the intention of 
testing and then swapping in the new machines and finally destroying the old 
ones when the dust settles (they're all virtuals so NP just destroying the old 
instances and recovering their resources).


Thoughts?


Thanks

Robi





Re: Solr streaming questions

2017-11-01 Thread Erick Erickson
Perhaps if you bothered to explain your use-case we could suggest alternatives.

Streaming is built to handle very large result sets in a
divide-and-conquer manner,
thus the ability to specify worker nodes each of which handles a
sub-set of the results.

Partitioning the output streams requires a way to bucket the results from
multiple sources to workers, such that all the documents that fall into a
given bucket can be routed to the same worker. There may be many sources
(think shards) and many replicas.

Score is unsuitable for such bucketing. You're simply trying to use
streaming for
a use-case it was not designed for.

You have two choices here.
- use streaming as it was intended,
- use cursorMark for processing in batches.

Best,
Erick
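The cursorMark option Erick mentions keeps relevancy ordering while batching. The request shape is roughly as below, assuming the collection's uniqueKey field is "id" (the sort must end on the uniqueKey as a tiebreaker, and the row count is a placeholder):

```
/select?q=7732-18-5&defType=edismax&qf=searchmv_cas_number
        &fl=id_record_spec,id_s,score
        &sort=score desc,id asc      <- must end on the uniqueKey tiebreaker
        &rows=500
        &cursorMark=*                <- '*' on the first request only
```

Each response carries a nextCursorMark; resend the identical request with cursorMark set to that value, and stop when the returned nextCursorMark equals the cursorMark you sent.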

On Wed, Nov 1, 2017 at 8:33 AM, Webster Homer  wrote:
> I know that /select supports score. However, I don't want to have to page
> the results, I want to use stream to stream the results of a search, but I
> cannot sort by the relevancy of the result. This seems like a MAJOR deficit
> for the streaming API
>
> /select wants to do paging which in my case I don't want.
>
> This all seems fairly arbitrary to me and a questionable limitation for
> /export, especially since /export has a search facility
>
> On Tue, Oct 31, 2017 at 7:46 PM, Joel Bernstein  wrote:
>
>> It is not possible to use score with the /export handler. The /export
>> handler currently only supports sorting by fields.
>>
>> You can sort by score using the default /select handler.
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Tue, Oct 31, 2017 at 1:50 PM, Webster Homer 
>> wrote:
>>
>> > I have a potential use case for solr searching via streaming expressions.
>> > I am currently using solr 6.2.0, but we will soon be upgrading to the
>> 7.1.0
>> > version.
>> >
>> > I started testing out searching using streaming expressions.
>> > 1. If I use an alias instead of a collection name it fails. I see that
>> > there is a Jira, SOLR-7377. Is this fixed in 7.1.0?
>> >
>> > 2. If I try to sort the results by score, it gives me an undefined field
>> > error. So it seems that streaming searches must not return values ordered
>> > by relevancy?
>> > This is a stopper for us if it has not been addressed.
>> >
>> > This is my query:
>> > search(test-catalog-product-170724,defType="edismax",q="
>> > 7732-18-5",qf="searchmv_cas_number",mm="2<-12%",fl="id_record_spec,
>> > id_s, score",sort="score desc",qt="/export")
>> >
>> > This is the error:
>> > "EXCEPTION": "java.util.concurrent.ExecutionException:
>> > java.io.IOException:
>> > -->
>> > http://141.247.245.207:8983/solr/test-catalog-product-
>> > 170724_shard2_replica1/:org.apache.solr.common.SolrException:
>> > undefined field: \"score\"",
>> >
>> > I could not find a Jira for this issue. Is it not possible to retrieve
>> the
>> > results ordered relevancy (score desc)?
>> >
>> > Seems kind of limiting
>> >
>> > --
>> >
>> >
>> > This message and any attachment are confidential and may be privileged or
>> > otherwise protected from disclosure. If you are not the intended
>> recipient,
>> > you must not copy this message or attachment or disclose the contents to
>> > any other person. If you have received this transmission in error, please
>> > notify the sender immediately and delete the message and any attachment
>> > from your system. Merck KGaA, Darmstadt, Germany and any of its
>> > subsidiaries do not accept liability for any omissions or errors in this
>> > message which may arise as a result of E-Mail-transmission or for damages
>> > resulting from any unauthorized changes of the content of this message
>> and
>> > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
>> > subsidiaries do not guarantee that this message is free of viruses and
>> does
>> > not accept liability for any damages caused by any virus transmitted
>> > therewith.
>> >
>> > Click http://www.emdgroup.com/disclaimer to access the German, French,
>> > Spanish and Portuguese versions of this disclaimer.
>> >
>>
>

Re: Automatic creation of indexes

2017-11-01 Thread Emir Arnautović
>Emir, your message did not actually include anything related to the 
>presentation you mentioned.
Oops - seems I forgot to paste: https://www.youtube.com/watch?v=1gzwAgrk47c

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 1 Nov 2017, at 15:03, Shawn Heisey  wrote:
> 
> On 10/31/2017 5:32 AM, Jokin Cuadrado wrote:
>> Hi, I'm using solr to store time series data, log events etc. Right now I
>> use a SolrCloud collection and clean it by deleting documents via queries,
>> but I would like to know what approaches other people are using.
>> Is there a way to create a collection when receiving a post to a
>> nonexistent index? So I could use the date as part of the index name, and
>> the cleanup process would be just to delete the old collections.
> 
> Solr will not automatically create indexes/collections/shards.
> 
> Automatic handling of time-partitioned indexes is something that is being 
> worked on by at least one Solr developer.  There is no ETA available.
> 
> https://issues.apache.org/jira/browse/SOLR-11299
> 
> Emir, your message did not actually include anything related to the 
> presentation you mentioned.  There's no URL pointing anywhere.  If you 
> included it as an attachment, that's generally something that doesn't work on 
> this list -- most attachments are filtered by the list software.
> 
> 
> Thanks,
> Shawn



Re: Solr streaming questions

2017-11-01 Thread Webster Homer
I know that /select supports score. However, I don't want to have to page
the results; I want to use streaming to stream the results of a search, but I
cannot sort by the relevancy of the results. This seems like a MAJOR deficit
for the streaming API.

/select wants to do paging which in my case I don't want.

This all seems fairly arbitrary to me and a questionable limitation for
/export, especially since /export has a search facility

On Tue, Oct 31, 2017 at 7:46 PM, Joel Bernstein  wrote:

> It is not possible to use score with the /export handler. The /export
> handler currently only supports sorting by fields.
>
> You can sort by score using the default /select handler.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Oct 31, 2017 at 1:50 PM, Webster Homer 
> wrote:
>
> > I have a potential use case for solr searching via streaming expressions.
> > I am currently using solr 6.2.0, but we will soon be upgrading to the
> 7.1.0
> > version.
> >
> > I started testing out searching using streaming expressions.
> > 1. If I use an alias instead of a collection name it fails. I see that
> > there is a Jira, SOLR-7377. Is this fixed in 7.1.0?
> >
> > 2. If I try to sort the results by score, it gives me an undefined field
> > error. So it seems that streaming searches must not return values ordered
> > by relevancy?
> > This is a stopper for us if it has not been addressed.
> >
> > This is my query:
> > search(test-catalog-product-170724,defType="edismax",q="
> > 7732-18-5",qf="searchmv_cas_number",mm="2<-12%",fl="id_record_spec,
> > id_s, score",sort="score desc",qt="/export")
> >
> > This is the error:
> > "EXCEPTION": "java.util.concurrent.ExecutionException:
> > java.io.IOException:
> > -->
> > http://141.247.245.207:8983/solr/test-catalog-product-
> > 170724_shard2_replica1/:org.apache.solr.common.SolrException:
> > undefined field: \"score\"",
> >
> > I could not find a Jira for this issue. Is it not possible to retrieve
> the
> > results ordered relevancy (score desc)?
> >
> > Seems kind of limiting
> >
> >
>



Using bits in multi tenant document routing index in single shard

2017-11-01 Thread Ketan Thanki
Hi,

I have 4 shards and 4 replicas, and I use composite-ID document routing on my 
unique field 'Id' as described below.
e.g.: projectId:158380, modelId:3606, where the tenant bits are used as a 
projectId/numBits!modelId/numBits! prefix to the Id.

The number of bits spreads a tenant as follows:
3 bits would spread the tenant over 1/8th of the collection.
2 bits would spread the tenant over 1/4th of the collection.
1 bit would spread the tenant over 1/2 of the collection.
0 bits would spread the tenant across the entire collection.

Query: I use projectId and modelId as strings, and I have tried the bit 
combinations 1,1 & 2,2 & 3,2, but the documents all end up on a single shard 
instead of being distributed across multiple shards.

Can anyone please let me know why it indexes into a single shard only?
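One way to see why few tenant bits combined with few shards can pin a tenant to a single shard is to sketch the composite-ID-style hash composition. This is a rough illustration only: it uses a stand-in 32-bit hash (Solr actually uses MurmurHash3), so the concrete shard numbers are not Solr's, but the bit arithmetic is the point.

```python
import hashlib

def h32(s: str) -> int:
    # Stand-in 32-bit hash; Solr's CompositeIdRouter actually uses MurmurHash3.
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")

def route_hash(tenant: str, doc_id: str, bits: int) -> int:
    # The top `bits` bits come from the tenant prefix, the rest from the doc id.
    tenant_mask = (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF
    return (h32(tenant) & tenant_mask) | (h32(doc_id) & ~tenant_mask & 0xFFFFFFFF)

def shard_for(hash32: int, num_shards: int) -> int:
    # Shards own equal, contiguous slices of the 32-bit hash ring.
    return (hash32 * num_shards) >> 32

# With 4 shards, 2 tenant bits confine a tenant to 1/4 of the ring,
# which is exactly one shard: all of this tenant's docs land together.
shards = {shard_for(route_hash("158380", f"doc{i}", 2), 4) for i in range(100)}
```

Under this sketch, a quarter (2 bits) or an eighth (3 bits) of the ring fits inside one shard's range when there are only 4 shards, so landing on a single shard is expected; spreading a tenant over more than one of 4 shards would need at most 1 tenant bit.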





Re: SOLR-11504: Provide a config to restrict number of indexing threads

2017-11-01 Thread Shawn Heisey

On 10/31/2017 4:57 PM, Nawab Zada Asad Iqbal wrote:

I hit this issue https://issues.apache.org/jira/browse/SOLR-11504 while
migrating to Solr 6, and I am working around it locally in the Lucene code. I am
thinking of fixing it properly and hopefully contributing the patch back to Solr.
Since the Lucene code does not want to keep any such config, I am thinking of
using a counting semaphore in the Solr code before calling
IndexWriter.addDocument(s) or IndexWriter.updateDocument(s).


There's a fairly simple way to control the number of indexing threads 
that doesn't require ANY changes to Solr:  Don't start as many 
threads/processes on your indexing client(s).  If you control the number 
of simultaneous requests sent to Solr, then Solr won't start as many 
indexing threads.  That kind of control over your indexing system is 
something that's always preferable to have.


Thanks,
Shawn
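Shawn's suggestion reduces to a small client-side pattern: cap the worker pool on the indexing client, and Solr never runs more indexing threads on your behalf than that cap. A minimal sketch (the batch shape and the send function are placeholders, not a real Solr call):

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 4  # Solr sees at most this many simultaneous update requests

def send_batch(batch):
    # Placeholder for the real HTTP call, e.g. POSTing the batch to /update.
    return len(batch)

batches = [[{"id": str(i)}] for i in range(10)]
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    # The pool itself is the throttle: only MAX_CONCURRENT batches in flight.
    results = list(pool.map(send_batch, batches))
```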


Re: Automatic creation of indexes

2017-11-01 Thread Shawn Heisey

On 10/31/2017 5:32 AM, Jokin Cuadrado wrote:

Hi, I'm using Solr to store time series data, log events, etc. Right now I
use a SolrCloud collection and clean it by deleting documents via queries,
but I would like to know what approaches other people are using.
Is there a way to create a collection when receiving a post to a
nonexistent index? Then I could use the date as part of the index name, and
the cleanup process would just be deleting the old collections.


Solr will not automatically create indexes/collections/shards.

Automatic handling of time-partitioned indexes is something that is 
being worked on by at least one Solr developer.  There is no ETA available.


https://issues.apache.org/jira/browse/SOLR-11299

Emir, your message did not actually include anything related to the 
presentation you mentioned.  There's no URL pointing anywhere.  If you 
included it as an attachment, that's generally something that doesn't 
work on this list -- most attachments are filtered by the list software.



Thanks,
Shawn
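Until SOLR-11299 lands, the time partitioning has to live in the client. A small sketch of the idea under stated assumptions (the base URL, the "logs_" naming scheme, shard count, and config name are all made up for illustration): create a date-named collection ahead of time via the Collections API, and cleanup becomes deleting whole old collections instead of delete-by-query.

```python
from datetime import date

SOLR = "http://localhost:8983/solr"  # assumed base URL

def create_url(day: date) -> str:
    # Collections API CREATE call for a per-day collection.
    name = f"logs_{day:%Y%m%d}"
    return (f"{SOLR}/admin/collections?action=CREATE&name={name}"
            f"&numShards=4&collection.configName=logs")

def delete_url(day: date) -> str:
    # Cleanup is just dropping the whole old collection.
    return f"{SOLR}/admin/collections?action=DELETE&name=logs_{day:%Y%m%d}"
```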


Re: Stateless queries to secured SOLR server.

2017-11-01 Thread Shawn Heisey

On 10/31/2017 2:08 PM, Phil Scadden wrote:

Thanks Shawn. I have done it with SolrJ. Apart from needing the 
NoopResponseParser to handle the wt=, it was pretty painless.


This is confusing to me, because with SolrJ, you do not need to be 
concerned with the response format *AT ALL*.  You don't need to use the 
wt parameter, SolrJ will handle that for you.  In fact, you should NOT 
set the wt parameter.


Thanks,
Shawn


Re: Solr response with original value

2017-11-01 Thread Shawn Heisey

On 10/31/2017 1:38 PM, Venkateswarlu Bommineni wrote:

Thanks for the reply Shawn.

But I am little confused on faceting on one field and return the result of
another field.
could you please give sample query. Thanks a lot in advance!!!



I really don't know what you mean.  Facets do not have results with 
fields -- they return document counts.


Thanks,
Shawn
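To illustrate the point about counts: a field facet returns value/count pairs for the faceted field, not documents or other stored fields. The field name and values below are made up; the flat value/count list matches the shape of Solr's standard facet response.

```python
# A typical field-facet request: no documents, just counts per value.
params = {
    "q": "*:*",
    "rows": 0,             # we only want the counts, not result documents
    "facet": "true",
    "facet.field": "category",
}

# The facet_counts.facet_fields section of the response is a flat
# value/count list, which pairs up like this:
facet_fields = {"category": ["books", 12, "music", 7]}
values = facet_fields["category"]
counts = dict(zip(values[::2], values[1::2]))
```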


Re: Automatic creation of indexes

2017-11-01 Thread Emir Arnautović
Hi Jokin,
Here is a presentation by my colleagues about using Solr for logs.

Regards,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 31 Oct 2017, at 12:32, Jokin Cuadrado  wrote:
> 
> Hi, I'm using Solr to store time series data, log events, etc. Right now I
> use a SolrCloud collection and clean it by deleting documents via queries,
> but I would like to know what approaches other people are using.
> Is there a way to create a collection when receiving a post to a
> nonexistent index? Then I could use the date as part of the index name, and
> the cleanup process would just be deleting the old collections.



Re: Query regarding to multi tenant composite ID document routing

2017-11-01 Thread Emir Arnautović
Hi Ketan,
Here is a blog post explaining how to use routing: 
https://sematext.com/blog/solrcloud-large-tenants-and-routing/ 


HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 1 Nov 2017, at 07:51, Ketan Thanki  wrote:
> 
> Hi,
> 
> I need help with the query mentioned below.
> 
> I have 2 collections, each with 4 shards and 4 replicas, and I want to
> implement composite-ID document routing on my unique field 'Id' as
> described below.
> e.g.: projectId:158380, modelId:3606, where the tenant bits are used as a
> projectId/2!modelId/2 prefix to the Id, where Id is the unique Solr
> document ID, which is a combination of values.
> 
> Query: when using bits, how will it be indexed in Solr: does the id need to
> be indexed with the '/' or not, as below?
> Like "158380/2!3606/2!Id" or "id":"79190!1803!Id"
> 
> And which value do I need to pass in the query's _route_ parameter?
> 
> Please advise.
> Ketan.
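For the format question itself, here is a tiny sketch of how the two-level routing key is usually assembled under the "tenant/bits!subtenant/bits!docId" scheme described in the linked post. The values are taken from the example above; the helper function is illustrative, not a Solr API.

```python
def composite_id(project_id, project_bits, model_id, model_bits, doc_id):
    # The whole prefix, '/' and '!' included, is part of the stored id.
    return f"{project_id}/{project_bits}!{model_id}/{model_bits}!{doc_id}"

doc = composite_id(158380, 2, 3606, 2, "79190")
# For querying one tenant, the same prefix is passed as the _route_ value:
route = "158380/2!3606/2!"
```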
> 



Re: Advice on Stemming in Solr

2017-11-01 Thread Emir Arnautović
Hi Edwin,
If the number of words that should not be stemmed is not high, you could use 
KeywordMarkerFilterFactory to flag those words as keywords, which should 
prevent the stemmer from changing them.
Depending on what you want to achieve, you might not be able to avoid using a 
stemmer at indexing time. If you want to find documents that contain only 
“walking” with the search term “walk”, then you have to stem at index time. Cases 
where you use stemming at query time only are rare and specific.
If you want to prefer exact matches over stemmed matches, you have to index the 
same content both with and without stemming and boost matches on the field 
without stemming.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 1 Nov 2017, at 10:11, Zheng Lin Edwin Yeo  wrote:
> 
> Hi,
> 
> We are currently using KStemFilterFactory in Solr, but we found that it
> actually stems non-English words like "ximenting", which it
> stems to "ximent". This is not what we want.
> 
> Another option is to use the HunspellStemFilterFactory, but then some
> English words like "running" and "walking" are not stemmed.
> 
> Would like to check: is it advisable to use stemming at index time? Or
> should we not stem at index time, but at query time search for the
> stemmed words as well? For example, if the user searches for "walking",
> we would also search for "walk", and the actual word "walking"
> would get a higher weight.
> 
> I'm currently using Solr 6.5.1.
> 
> Regards,
> Edwin
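Emir's KeywordMarkerFilterFactory suggestion as a schema sketch (the fieldType name and the protwords.txt file name are assumptions): tokens listed in the protected-words file are flagged as keywords, and the stemmer that follows leaves them alone.

```xml
<fieldType name="text_en_protected" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- e.g. list "ximenting" in protwords.txt to keep KStem from touching it -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>
```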



Re: Making a String field case-insensitive

2017-11-01 Thread Emir Arnautović
Hi,
You can use KeywordTokenizerFactory together with LowerCaseFilterFactory.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 1 Nov 2017, at 09:50, Zheng Lin Edwin Yeo  wrote:
> 
> Hi,
> 
> Would like to find out: what is the best way to lower-case a string index
> in Solr, to make it case-insensitive, while preserving the structure of the
> string (i.e. it should not break into different tokens at spaces, and should
> not remove any characters or symbols)?
> 
> I found that solr.StrField does not apply a lower-case filter. But if I change
> it to solr.TextField and use the Standard Tokenizer, the fields get broken up.
> 
> Eg:
> 
> For this configuration,
> 
> <fieldType name="..." class="solr.TextField"
>  positionIncrementGap="100" autoGeneratePhraseQueries="false">
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     ...
>   </analyzer>
> </fieldType>
> 
> The string "*SYStem 500 **" gets broken down into this
> 
> *system | 500*
> 
> The system and 500 are separated into 2 tokens, which is not what we want.
> Also, the * is being removed.
> 
> 
> We will like to have something like this. This will preserve what it is as
> a string but just lowercase it.
> 
> *system 500 **
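Emir's suggestion as a concrete fieldType sketch (the name string_ci is an assumption): KeywordTokenizer emits the entire input as a single token, so a value like "SYStem 500 *" stays intact and only its case changes.

```xml
<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <!-- KeywordTokenizer keeps the whole value as one token; nothing is split or stripped -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```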



LatLonPointSpatialField, sorting : sort param could not be parsed as a query, and is not a field that exists in the index

2017-11-01 Thread Clemens Wyss DEV
Context: Solr 6.6.0

I'm switching my schemas from the deprecated solr.LatLonType to 
solr.LatLonPointSpatialField. Now my sort query (which used to work with 
solr.LatLonType):

sort=geodist(b4_location__geo_si,47.36667,8.55) asc

raises the error

"sort param could not be parsed as a query, and is not a field that exists in 
the index: geodist(b4_location__geo_si,47.36667,8.55)"

Invoking sort by 

sfield=b4_location__geo_si&pt=47.36667,8.55&sort=geodist() asc

works as expected though...

Why does "sort=geodist(fld,lat,lon)" no longer work?

Thanks for any hints/advice,
Clemens


Advice on Stemming in Solr

2017-11-01 Thread Zheng Lin Edwin Yeo
Hi,

We are currently using KStemFilterFactory in Solr, but we found that it
actually stems non-English words like "ximenting", which it
stems to "ximent". This is not what we want.

Another option is to use the HunspellStemFilterFactory, but then some
English words like "running" and "walking" are not stemmed.

Would like to check: is it advisable to use stemming at index time? Or
should we not stem at index time, but at query time search for the
stemmed words as well? For example, if the user searches for "walking",
we would also search for "walk", and the actual word "walking"
would get a higher weight.

I'm currently using Solr 6.5.1.

Regards,
Edwin


Making a String field case-insensitive

2017-11-01 Thread Zheng Lin Edwin Yeo
Hi,

Would like to find out: what is the best way to lower-case a string index
in Solr, to make it case-insensitive, while preserving the structure of the
string (i.e. it should not break into different tokens at spaces, and should
not remove any characters or symbols)?

I found that solr.StrField does not apply a lower-case filter. But if I change
it to solr.TextField and use the Standard Tokenizer, the fields get broken up.

Eg:

For this configuration,

[fieldType definition stripped by the mail archive: a solr.TextField using the Standard Tokenizer]

The string "*SYStem 500 **" gets broken down into this

*system | 500*

The system and 500 are separated into 2 tokens, which is not what we want.
Also, the * is being removed.


We will like to have something like this. This will preserve what it is as
a string but just lowercase it.

*system 500 **