Re: Accent insensitive search for greek characters

2017-10-16 Thread Chitra
Hi Shawn,
Thank you so much for the kind response.


> Those filters operate on single characters from the input, which means
> they cannot take character context into account like ICU does.  If I am
> reading what the ASCII filter does correctly, it may not work for Greek
> characters at all -- it says that it folds to the lower range of ASCII,
> and that character set doesn't have Greek letters.


Yes, as of now we are using ASCIIFoldingFilter and LowerCaseFilter to
remove diacritics and fold case, but in some cases it doesn't work for
Greek accented characters.

That is why I am looking for a better solution.


-- 
Regards,
Chitra


Re: [EXTERNAL] Re: OOM during indexing with 24G heap - Solr 6.5.1

2017-10-16 Thread David M Giannone




Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone


 Original message 
From: Randy Fradin 
Date: 10/16/17 7:38 PM (GMT-05:00)
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] Re: OOM during indexing with 24G heap - Solr 6.5.1


Re: OOM during indexing with 24G heap - Solr 6.5.1

2017-10-16 Thread Randy Fradin
Each shard has around 4.2 million documents which are around 40GB on disk.
Two nodes have 3 shard replicas each and the third has 2 shard replicas.

The text of the exception is: java.lang.OutOfMemoryError: Java heap space
And the heap dump is a full 24GB indicating the full heap space was being
used.

Here is the solrconfig as output by the config request handler:

{
  "responseHeader":{
"status":0,
"QTime":0},
  "config":{
"znodeVersion":0,
"luceneMatchVersion":"org.apache.lucene.util.Version:6.5.1",
"updateHandler":{
  "indexWriter":{"closeWaitsForMerges":true},
  "commitWithin":{"softCommit":true},
  "autoCommit":{
"maxDocs":5,
"maxTime":30,
"openSearcher":false},
  "autoSoftCommit":{
"maxDocs":-1,
"maxTime":3}},
"query":{
  "useFilterForSortedQuery":false,
  "queryResultWindowSize":1,
  "queryResultMaxDocsCached":2147483647,
  "enableLazyFieldLoading":false,
  "maxBooleanClauses":1024,
  "":{
"size":"1",
"showItems":"-1",
"initialSize":"10",
"name":"fieldValueCache"}},
"jmx":{
  "agentId":null,
  "serviceUrl":null,
  "rootName":null},
"requestHandler":{
  "/select":{
"name":"/select",
"defaults":{
  "rows":10,
  "echoParams":"explicit"},
"class":"solr.SearchHandler"},
  "/update":{
"useParams":"_UPDATE",
"class":"solr.UpdateRequestHandler",
"name":"/update"},
  "/update/json":{
"useParams":"_UPDATE_JSON",
"class":"solr.UpdateRequestHandler",
"invariants":{"update.contentType":"application/json"},
"name":"/update/json"},
  "/update/csv":{
"useParams":"_UPDATE_CSV",
"class":"solr.UpdateRequestHandler",
"invariants":{"update.contentType":"application/csv"},
"name":"/update/csv"},
  "/update/json/docs":{
"useParams":"_UPDATE_JSON_DOCS",
"class":"solr.UpdateRequestHandler",
"invariants":{
  "update.contentType":"application/json",
  "json.command":"false"},
"name":"/update/json/docs"},
  "update":{
"class":"solr.UpdateRequestHandlerApi",
"useParams":"_UPDATE_JSON_DOCS",
"name":"update"},
  "/config":{
"useParams":"_CONFIG",
"class":"solr.SolrConfigHandler",
"name":"/config"},
  "/schema":{
"class":"solr.SchemaHandler",
"useParams":"_SCHEMA",
"name":"/schema"},
  "/replication":{
"class":"solr.ReplicationHandler",
"useParams":"_REPLICATION",
"name":"/replication"},
  "/get":{
"class":"solr.RealTimeGetHandler",
"useParams":"_GET",
"defaults":{
  "omitHeader":true,
  "wt":"json",
  "indent":true},
"name":"/get"},
  "/admin/ping":{
"class":"solr.PingRequestHandler",
"useParams":"_ADMIN_PING",
"invariants":{
  "echoParams":"all",
  "q":"{!lucene}*:*"},
"name":"/admin/ping"},
  "/admin/segments":{
"class":"solr.SegmentsInfoRequestHandler",
"useParams":"_ADMIN_SEGMENTS",
"name":"/admin/segments"},
  "/admin/luke":{
"class":"solr.LukeRequestHandler",
"useParams":"_ADMIN_LUKE",
"name":"/admin/luke"},
  "/admin/system":{
"class":"solr.SystemInfoHandler",
"useParams":"_ADMIN_SYSTEM",
"name":"/admin/system"},
  "/admin/mbeans":{
"class":"solr.SolrInfoMBeanHandler",
"useParams":"_ADMIN_MBEANS",
"name":"/admin/mbeans"},
  "/admin/plugins":{
"class":"solr.PluginInfoHandler",
"name":"/admin/plugins"},
  "/admin/threads":{
"class":"solr.ThreadDumpHandler",
"useParams":"_ADMIN_THREADS",
"name":"/admin/threads"},
  "/admin/properties":{
"class":"solr.PropertiesRequestHandler",
"useParams":"_ADMIN_PROPERTIES",
"name":"/admin/properties"},
  "/admin/logging":{
"class":"solr.LoggingHandler",
"useParams":"_ADMIN_LOGGING",
"name":"/admin/logging"},
  "/admin/file":{
"class":"solr.ShowFileRequestHandler",
"useParams":"_ADMIN_FILE",
"name":"/admin/file"},
  "/export":{
"class":"solr.ExportHandler",
"useParams":"_EXPORT",
"components":["query"],
"defaults":{"wt":"json"},
"invariants":{
  "rq":"{!xport}",
  "distrib":false},
"name":"/export"},
  "/graph":{
"class":"solr.GraphHandler",
"useParams":"_ADMIN_GRAPH",
"invariants":{
  "wt":"graphml",
  "distrib":false},
"name":"/graph"},
  "/stream":{
"class":"solr.StreamHandler",
"useParams":"_STREAM",
"defaults":{"wt":"json"},
"invariants":{"distrib":false},
"name":"/stream"},
  "/sql":{

Re: OOM during indexing with 24G heap - Solr 6.5.1

2017-10-16 Thread Shawn Heisey
On 10/16/2017 3:19 PM, Randy Fradin wrote:
> We are seeing a lot of full GC events and eventual OOM errors in Solr
> during indexing. This is Solr 6.5.1 running in cloud mode with a 24G heap.
> At these times indexing is the only activity taking place. The collection
> has 4 shards and 2 replicas across 3 nodes. Each document is ~10KB (a few
> hundred fields each), and indexing is using the normal update handler, 1
> document per request, up to 240 requests at a time.
>
> The heap dump taken automatically on OOM shows 18.3GB of heap taken by 3
> instances of DocumentsWriter. Within those instances, all of the heap is
> retained by the blockedFlushes LinkedList inside the flushControl object.
> Each node in the LinkedList appears to be retaining around 55MB.
>
> Clearly something to do with flushing is at play here but I'm at a loss
> what tuning parameters I should be looking at. I would expect things to
> start blocking if I fall too far behind on flushing but apparently that's
> not happening. The ramBufferSizeMB is set to the default 100. My heap size
> is already absurdly more than I thought we would need for this volume.

One of the first things we need to find out is about your index size.

In each of your shards, how many documents are there?  How much disk
space does one shard replica take up?  How many shard replica cores does
each node have on it in total?

I would also like to get a look at your full solrconfig.xml file.  The
schema may be helpful at a later date, along with an example of a
document that you're indexing.  With ramBufferSizeMB at the default,
having a ton of memory used up by a class used for indexing seems very odd.

Do you have the text of the OOM exception? Is it saying out of heap
space, or some other problem?

Thanks,
Shawn



Re: JAR errors with Solr 6.6.1 and http client and core

2017-10-16 Thread Shawn Heisey
On 10/16/2017 1:45 PM, Johnson, Jaya wrote:
> Hi I have the following code.
> System.out.println("Initializing server");
> SystemDefaultHttpClient cl = new SystemDefaultHttpClient();
> client = new 
> HttpSolrClient("http://localhost:8983/solr/#/prosp_poc_collection",cl);
> System.out.println("Completed initializing the server");
> client.deleteByQuery( "*:*" );
>
> Version of solr 6.6.1 and http client is 4.5.3 and 4.4.8 for http core.
> I get the following exception please advise.
>
> Exception Details:
>   Location:
> 
> org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;)Lorg/apache/http/impl/client/CloseableHttpClient;
>  @57: areturn
>   Reason:
> Type 'org/apache/http/impl/client/SystemDefaultHttpClient' (current 
> frame, stack[0]) is not assignable to 
> 'org/apache/http/impl/client/CloseableHttpClient' (from method signature)

There are two problems here.  The first problem, which is causing the
exception, is that you used SystemDefaultHttpClient when you initialized
the SolrJ client.  SolrJ has used CloseableHttpClient for quite a while
now.  Since you aren't customizing the HttpClient when you create it,
you don't need to even worry about that part in your code.  Just remove
the parameter entirely.  SolrJ will create the HttpClient internally. 
If you *do* decide in the future that you want to customize the
HttpClient, which is a good idea if the code is multi-threaded, then you
should use the Builder paradigm provided by the HttpClient project to
handle that.  The SystemDefaultHttpClient class was deprecated in
HttpClient version 4.3 and is gone in the 5.0 alpha releases.
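
For reference, building a customized client with the Builder paradigm looks
roughly like this.  This is a minimal sketch assuming HttpClient 4.5.x; the
pool sizes are arbitrary illustrative values, and the classes come from
org.apache.http.impl.client and org.apache.http.impl.conn:

//
PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
cm.setMaxTotal(100);          // total connections across all routes
cm.setDefaultMaxPerRoute(20); // connections per host, relevant for multi-threaded use

CloseableHttpClient httpClient = HttpClients.custom()
    .setConnectionManager(cm)
    .build();

// The customized client is then handed to the SolrJ client:
client = new HttpSolrClient("http://localhost:8983/solr", httpClient);
//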

The second problem will manifest itself after you fix the problem with
the HttpClient object.  In the HttpSolrClient constructor, you are using
a URL that isn't going to work.  The URL you've provided is from your
browser in the admin UI -- it has the "#" character in it.  Solr URLs
with "#" in them *only* work in an actual full-blown browser with
Javascript.  In the code examples below, I will indicate the correct URL
that you should be using.

You have two choices about how to fix the URL problem.  One is to
provide the correct base URL for the collection, the other is to NOT
provide a collection in the URL, and to give it the collection on every
request.  I prefer the second option, but there is nothing wrong with
the first.

//
client = new
HttpSolrClient("http://localhost:8983/solr/prosp_poc_collection;);
client.deleteByQuery("*:*");
//

or

//
client = new HttpSolrClient("http://localhost:8983/solr");
client.deleteByQuery("prosp_poc_collection", "*:*");
//

Note that if your servers are running in cloud mode, you should use
CloudSolrClient, not HttpSolrClient.  The initialization of this class
uses the same ZooKeeper connection string used when starting Solr itself:

//
client = new
CloudSolrClient("zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/chroot");
client.deleteByQuery("prosp_poc_collection", "*:*");
//

Thanks,
Shawn



Re: Several critical vulnerabilities discovered in Apache Solr (XXE & RCE)

2017-10-16 Thread Shawn Heisey
On 10/13/2017 7:13 AM, Rick Leir wrote:
> What is the earliest version which was vulnerable?

The XML query parser was added to Solr in version 5.5.  Since that's a
critical part of the remote exploit, that's the minimum version to be
worried about in situations where end users cannot reach Solr directly. 
If end users have direct access to the Solr server, then that opens up a
whole different class of problems.

https://issues.apache.org/jira/browse/SOLR-839

Because the XML parser is enabled by default in Solr without any
configuration, currently the only way to "turn off" that parser is to
redefine the "xmlparser" name to another parser, with a config line like
this in solrconfig.xml:

  <queryParser name="xmlparser" class="solr.LuceneQParserPlugin"/>

This config doesn't actually unload the XML parser, but it does
effectively make it inaccessible, because the name is redirected to a
different parser.

I opened this issue:

https://issues.apache.org/jira/browse/SOLR-11495

Thanks,
Shawn



Re: Concern on solr commit

2017-10-16 Thread Shawn Heisey
I'm supplementing the other replies you've already gotten.  See inline:


On 10/13/2017 2:30 AM, Leo Prince wrote:
> I am getting the following errors/warnings from Solr
>
> 1, ERROR:
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> Error opening new searcher. exceeded limit of maxWarmingSearchers=2,
> try again later.
>
> 2, PERFORMANCE WARNING: Overlapping onDeckSearchers=2
>
> 3, WARN: DistributedUpdateProcessor error sending

See this FAQ entry:

https://wiki.apache.org/solr/FAQ?highlight=%28ondecksearchers%29#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F

> So my concern is, is there any chance of performance issues when
> number of commits are high at a particular point of time. In our
> application, we are approximating like 100-500 commits can happen
> simultaneously from application and autocommit too for those
> individual requests which are not committing individually after the
> write.
>
> Autocommit is configured as follows:
>
> <autoCommit>
>   <maxTime>15000</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>

The commits generated by this configuration are not opening new
searchers, so they are not connected in any way to the error messages
you're getting, which are about new searchers.  Note that this
particular kind of configuration is strongly recommended for ALL Solr
installs using Solr 4.0 and later, so that transaction logs do not grow
out of control.  I would personally use a value of at least 60000 for
autoCommit, but there is nothing wrong with a 15 second interval.

The initial reply you got on this thread mentioned that commits from the
application are discouraged.  I don't agree with this statement, but I
will say that the way that people *use* commits from the application is
frequently very wrong, and because of that, switching to fully automatic
soft commits is often the best solution, because they are somewhat
easier to control.

We have no way of knowing how long it will take to open a new searcher
on your index.  It depends on a lot of factors.  Whatever that time is,
commits should not be happening on a more frequent basis than that
interval.  They should happen *less* frequently than that interval if at
all possible.  Depending on exactly how Solr is configured, it might be
possible to reduce the amount of time that a commit with a new searcher
takes to complete.

Definitely avoid sending a commit after every document.  It is generally
also a bad idea to send a commit with every update request.  If you want
to do commits manually, then you should index a bunch of data and then
send one commit to make all those changes visible, and not do another
commit until you do another batch of indexing.
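
To make that concrete, here is a minimal SolrJ sketch of the
batch-then-commit pattern.  It assumes an already-initialized SolrClient
named "client" and a hypothetical List<SolrInputDocument> named "batch":

//
// Index the whole batch in one request; nothing is searchable yet.
client.add(batch);
// A single explicit commit makes all of the batched changes visible.
client.commit();
//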

Thanks,
Shawn



Re: Solr related questions

2017-10-16 Thread Shawn Heisey
On 10/13/2017 5:50 AM, startrekfan wrote:
> Thank you for your answer.
>
> To 3.)
> The file is on server A, my program is on server B and  solr is on server
> C. If I use a normal http(rest) post, my program has to fetch the file
> content from server A to Server B and then post it from server B to server
> C as there is no open connection between A and C. So the file has to be
> transmitted two times.
> Is there a way to tell solr to read the file _directly_ from Server A (e.g.
> via SMB)

What exactly is in a "file" in this situation, and what does your
service do with that file in order to decide what information gets sent
to Solr?  This information will be vital to figuring out whether you can
do what you're wanting to do.

If your service does not have business-specific logic, and the files on
your server are more generic, Solr does have the ability to "directly"
index rich text files like PDF, Word, etc.  Typically the file is still
sent to Solr even with that functionality.  I think there are ways to
have it fetch the file, but I have no idea what kind of fetching is
supported.

There is one major issue with using that ability, called the Extracting
Request Handler.  That functionality uses another piece of Apache
software called Tika.  Because the exact structure of the documents that
Tika supports can change subtly and not all of those formats are fully
documented, Tika has a habit of exploding when it encounters something
that its authors have never seen before.  If Tika is running inside Solr
when it explodes, that explosion can take down the entire Solr process. 
For that reason, we do not actually recommend running that functionality
inside Solr, but rather in an external program that extracts information
and sends it to Solr.

The Tika authors do take such explosions seriously, and they do try to
fix those problems when they are encountered.  It is impossible for the
Tika project to prevent such problems from occurring, because there will
always be documents produced that contain data formats that they've
never seen before.

Generally speaking, if you already have a well-tested way of extracting
information from files and sending it to Solr, the recommendation is
that you stick with that software, rather than try to get Solr to
directly index your files.

Thanks,
Shawn



Re: Accent insensitive search for greek characters

2017-10-16 Thread Shawn Heisey
On 10/13/2017 1:28 AM, Chitra wrote:
>I want to search greek characters(with accent insensitive) by removing
> or replacing accent marks with similar characters.
>
> Eg: when searching a greek accent word say *πῬοἲὅν*, we expect accent
> insensitive search ie need equivalent greek accent like *προιον*
>
> Moreover, I do not have much knowledge of Greek characters, so I am
> looking for standard rules to perform Greek accent insensitive search.
>
> Does *ICUFoldingFilter* solve my case? I have tried this already. It's
> working fine for Greek accent characters. But this is not language
> specific... it has internationalization support for all languages. Here, I am
> not sure whether it will break my existing language behavior in the index.
>
> Is there any way to make ICUFoldingFilter as language specific?

The entire point of the ICU filters is that they are functional across
all of Unicode -- all languages.  As far as I am aware, there is no way
to adjust what ICUFoldingFilter does.  According to the code, it
offloads all work to IBM's ICU library and does not offer any
configurability.

The following filters also exist, with less functionality than the ICU
filter:

https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-ASCIIFoldingFilter
https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-LowerCaseFilter

Those filters operate on single characters from the input, which means
they cannot take character context into account like ICU does.  If I am
reading what the ASCII filter does correctly, it may not work for Greek
characters at all -- it says that it folds to the lower range of ASCII,
and that character set doesn't have Greek letters.
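
If you want to see what the ICU folding will do to a given input before
touching your schema, you can test it with a small standalone program.  This
is a minimal sketch assuming Lucene 6.x with the lucene-analyzers-icu module
on the classpath (classes from org.apache.lucene.analysis,
org.apache.lucene.analysis.standard, org.apache.lucene.analysis.icu, and
org.apache.lucene.analysis.tokenattributes):

//
Analyzer analyzer = new Analyzer() {
  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer tokenizer = new StandardTokenizer();
    // ICUFoldingFilter does accent removal and case folding in one step.
    return new TokenStreamComponents(tokenizer, new ICUFoldingFilter(tokenizer));
  }
};

try (TokenStream ts = analyzer.tokenStream("f", "πῬοἲὅν")) {
  CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
  ts.reset();
  while (ts.incrementToken()) {
    System.out.println(term); // expected output: the folded form προιον
  }
  ts.end();
}
//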

Thanks,
Shawn



SolrCloud - scalability issues with many collections

2017-10-16 Thread Shawn Heisey
Some time ago, I did some testing with SolrCloud (mostly 5.0 and
branch_5x) handling thousands of collections.

How that testing went is documented in SOLR-7191.  That issue has been
marked as resolved in version 6.3, but no commits were made in the
issue, and I haven't seen any evidence to suggest that recent SolrCloud
versions can handle thousands of collections any better than older
versions did.  In fact, what evidence I *have* seen suggests that newer
versions are worse than older versions in this regard.

More recently, I tried to re-run the testing with version 6.4.0, but
never could get all the collections created, so I couldn't do the same
tests.  Between SOLR-10130 and a number of SolrCloud fixes included in
later 6.x releases, I had hoped that a new version would fare better.

I have begun some new testing with 7.0.1.  This has been plagued by a
bunch of problems.  I am still having trouble creating all of the
collections for the test, and I'm seeing evidence of very slow and
problematic restarts even though I haven't yet created all the
collections I'm after.

One persistent problem I've encountered is that every collection
creation (using the Collections API) is taking longer than expected to
complete.  Just creating the "gettingstarted" collection while setting
up the cloud example had a QTime for the Collections API of over seven
seconds, and subsequent collections each take at least four seconds. 
Creating those initial collections should take a LOT less time.

Because a number of scalability-related fixes have been made since
SOLR-7191, I will likely file a new issue to cover problems I've
encountered with 7.0.1.

I wanted to add some temporary logging to the collection creation code,
so I could see how long each part of the creation was taking.  I hope to
locate the bottlenecks that cause creation to take several seconds. 
Unfortunately, I was unable to figure out *where* that code is, so I've
got no idea where to add the logging.  Can anyone point me to the right
place in the code?

Thanks,
Shawn



OOM during indexing with 24G heap - Solr 6.5.1

2017-10-16 Thread Randy Fradin
We are seeing a lot of full GC events and eventual OOM errors in Solr
during indexing. This is Solr 6.5.1 running in cloud mode with a 24G heap.
At these times indexing is the only activity taking place. The collection
has 4 shards and 2 replicas across 3 nodes. Each document is ~10KB (a few
hundred fields each), and indexing is using the normal update handler, 1
document per request, up to 240 requests at a time.

The heap dump taken automatically on OOM shows 18.3GB of heap taken by 3
instances of DocumentsWriter. Within those instances, all of the heap is
retained by the blockedFlushes LinkedList inside the flushControl object.
Each node in the LinkedList appears to be retaining around 55MB.

Clearly something to do with flushing is at play here but I'm at a loss
what tuning parameters I should be looking at. I would expect things to
start blocking if I fall too far behind on flushing but apparently that's
not happening. The ramBufferSizeMB is set to the default 100. My heap size
is already absurdly more than I thought we would need for this volume.

Any idea what could be causing this?


JAR errors with Solr 6.6.1 and http client and core

2017-10-16 Thread Johnson, Jaya

Hi I have the following code.
System.out.println("Initializing server");
SystemDefaultHttpClient cl = new SystemDefaultHttpClient();
client = new 
HttpSolrClient("http://localhost:8983/solr/#/prosp_poc_collection",cl);
System.out.println("Completed initializing the server");
client.deleteByQuery( "*:*" );


Version of solr 6.6.1 and http client is 4.5.3 and 4.4.8 for http core.
I get the following exception please advise.

Exception Details:
  Location:

org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;)Lorg/apache/http/impl/client/CloseableHttpClient;
 @57: areturn
  Reason:
Type 'org/apache/http/impl/client/SystemDefaultHttpClient' (current frame, 
stack[0]) is not assignable to 
'org/apache/http/impl/client/CloseableHttpClient' (from method signature)
  Current Frame:
bci: @57
flags: { }
locals: { 'org/apache/solr/common/params/SolrParams', 
'org/apache/solr/common/params/ModifiableSolrParams', 
'org/apache/http/impl/client/SystemDefaultHttpClient' }
stack: { 'org/apache/http/impl/client/SystemDefaultHttpClient' }
  Bytecode:
0x000: bb00 0959 2ab7 000a 4cb2 000b b900 0c01
0x010: 0099 001e b200 0bbb 000d 59b7 000e 120f
0x020: b600 102b b600 11b6 0012 b900 1302 00b8
0x030: 0014 4d2c 2bb8 0015 2cb0
  Stackmap Table:
append_frame(@47,Object[#172])

   at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:514)
   at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
   at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
   at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:160)
   at 
org.apache.solr.client.solrj.SolrClient.deleteByQuery(SolrClient.java:895)
   at 
org.apache.solr.client.solrj.SolrClient.deleteByQuery(SolrClient.java:858)
   at 
org.apache.solr.client.solrj.SolrClient.deleteByQuery(SolrClient.java:873)
   at 
com.moodys.poc.tika.test.PocApacheTikaSecFilings.initSolrServer(PocApacheTikaSecFilings.java:35)
   at 
com.moodys.poc.tika.test.PocApacheTikaSecFilings.main(PocApacheTikaSecFilings.java:41)



Re: HOW DO I UNSUBSCRIBE FROM GROUP?

2017-10-16 Thread Gus Heck
Headers however do not display in many mail clients/webUIs...

On Mon, Oct 16, 2017 at 9:23 AM, Richard 
wrote:

> The list help/unsubscribe/post/etc. details are, as is not uncommon,
> in the message header:
>
>   List-Help: <mailto:solr-user-help@lucene.apache.org>
>   List-Unsubscribe: <mailto:solr-user-unsubscribe@lucene.apache.org>
>   List-Post: <mailto:solr-user@lucene.apache.org>
>
> of all messages posted to the list.
>
>
>  Original Message 
> > Date: Monday, October 16, 2017 09:16:08 -0400
> > From: Gus Heck 
> > To: solr-user@lucene.apache.org
> > Subject: Re: HOW DO I UNSUBSCRIBE FROM GROUP?
> >
> > While this has been the traditional response, and it's accurate and
> > helpful, the user that complained about no unsubscribe link has a
> > point. This is the normal expectation in this day and age. Maybe
> > Apache should consider appending a "You are receiving this because
> > you are subscribed to (list) click here to unsubscribe" line, but I
> > know that if I hadn't been dealing with various apache mailing
> > lists on and off for 15 years and I found that I was getting emails
> > with no unsubscribe links in undesired quantities, the spam bucket
> > would be my answer (probably never send the email asking for how to
> > unsubscribe). That's certainly the policy I use for any marketing
> > type mails (no unsubscribe == spam bucket)... A simple unsubscribe
> > tagline could help us not get tagged as spam, and avoid this type
> > of email (which has been a regular occurrence for 15 years)
> >
> > -Gus
> >
> > Hi,
> >>
> >> If you wish the emails to "stop", kindly "UNSUBSCRIBE"  by
> >> following the instructions on the
> >> http://lucene.apache.org/solr/community.html. Hope this
> >> helps.
> >>
> >>
>
>  End Original Message 
>
>
>


-- 
http://www.the111shift.com


Re: spell-check does not return collations when using search query with filter

2017-10-16 Thread Arnold Bronley
With q instead of spellcheck.q I get the following response:


{
  "responseHeader": {
"status": 0,
"QTime": 23,
"params": {
  "q": "tag:polt",
  "spellcheck.collateExtendedResults": "true",
  "indent": "true",
  "spellcheck": "true",
  "spellcheck.accuracy": "0.72",
  "spellcheck.maxCollations": "3",
  "spellcheck.onlyMorePopular": "true",
  "spellcheck.count": "7",
  "spellcheck.maxCollationTries": "3",
  "wt": "json",
  "spellcheck.extendedResults": "true",
  "spellcheck.collate": "true"
}
  },
  "response": {
"numFound": 0,
"start": 0,
"docs": [

]
  },
  "spellcheck": {
"suggestions": [
  "polt",
  {
"numFound": 7,
"startOffset": 3,
"endOffset": 8,
"origFreq": 0,
"suggestion": [
  {
"word": "plot",
"freq": 5934
  },
  {
"word": "port",
"freq": 495
  },
  {
"word": "post",
"freq": 233
  },
  {
"word": "poly",
"freq": 216
  },
  {
"word": "pole",
"freq": 175
  },
  {
"word": "poll",
"freq": 12
  },
  {
"word": "polm",
"freq": 9
  }
]
  }
],
"correctlySpelled": false,
"collations": [

]
  }
}


With q and using the workaround that I mentioned, I get a proper response,
as follows (note that I passed tag:\polt to q, but the responseHeader shows
the escaped version, i.e. tag:\\polt):

{
  "responseHeader": {
"status": 0,
"QTime": 20,
"params": {
  "q": "tag:\\polt",
  "spellcheck.collateExtendedResults": "true",
  "indent": "true",
  "spellcheck": "true",
  "spellcheck.accuracy": "0.72",
  "spellcheck.maxCollations": "3",
  "spellcheck.onlyMorePopular": "true",
  "spellcheck.count": "7",
  "spellcheck.maxCollationTries": "3",
  "wt": "json",
  "spellcheck.extendedResults": "true",
  "spellcheck.collate": "true"
}
  },
  "response": {
"numFound": 0,
"start": 0,
"docs": [

]
  },
  "spellcheck": {
"suggestions": [
  "polt",
  {
"numFound": 7,
"startOffset": 4,
"endOffset": 9,
"origFreq": 0,
"suggestion": [
  {
"word": "plot",
"freq": 5934
  },
  {
"word": "port",
"freq": 495
  },
  {
"word": "post",
"freq": 233
  },
  {
"word": "poly",
"freq": 216
  },
  {
"word": "pole",
"freq": 175
  },
  {
"word": "poll",
"freq": 12
  },
  {
"word": "polm",
"freq": 9
  }
]
  }
],
"correctlySpelled": false,
"collations": [
  "collation",
  {
"collationQuery": "tag:plot",
"hits": 703,
"misspellingsAndCorrections": [
  "polt",
  "plot"
]
  },
  "collation",
  {
"collationQuery": "tag:port",
"hits": 8,
"misspellingsAndCorrections": [
  "polt",
  "port"
]
  },
  "collation",
  {
"collationQuery": "tag:post",
"hits": 3,
"misspellingsAndCorrections": [
  "polt",
  "post"
]
  }
]
  }
}

On Mon, Oct 16, 2017 at 3:00 PM, Arnold Bronley 
wrote:

> with spellcheck.q I don't get anything back at all.
>
> {
>   "responseHeader": {
> "status": 0,
> "QTime": 10,
> "params": {
>   "spellcheck.collateExtendedResults": "true",
>   "spellcheck.q": "tag:polt",
>   "indent": "true",
>   "spellcheck": "true",
>   "spellcheck.accuracy": "0.72",
>   "spellcheck.maxCollations": "3",
>   "spellcheck.onlyMorePopular": "true",
>   "spellcheck.count": "7",
>   "spellcheck.maxCollationTries": "3",
>   "wt": "json",
>   "spellcheck.extendedResults": "true",
>   "spellcheck.collate": "true"
> }
>   },
>   "response": {
> "numFound": 0,
> "start": 0,
> "docs": [
>
> ]
>   },
>   "spellcheck": {
> "suggestions": [
>
> ],
> "correctlySpelled": false,
> "collations": [
>
> ]
>   }
> }
>
> On Mon, Oct 16, 2017 at 5:03 AM, alessandro.benedetti <
> a.benede...@sease.io> wrote:
>
>> Interesting, what happens when you pass it as spellcheck.q=polt ?
>> What is the behavior you get ?
>>
>>
>>
>>
>>
>> -
>> ---
>> Alessandro Benedetti
>> Search Consultant, R&D Software Engineer, Director
>> Sease Ltd. - www.sease.io
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>
>
>


Re: spell-check does not return collations when using search query with filter

2017-10-16 Thread Arnold Bronley
with spellcheck.q I don't get anything back at all.

{
  "responseHeader": {
"status": 0,
"QTime": 10,
"params": {
  "spellcheck.collateExtendedResults": "true",
  "spellcheck.q": "tag:polt",
  "indent": "true",
  "spellcheck": "true",
  "spellcheck.accuracy": "0.72",
  "spellcheck.maxCollations": "3",
  "spellcheck.onlyMorePopular": "true",
  "spellcheck.count": "7",
  "spellcheck.maxCollationTries": "3",
  "wt": "json",
  "spellcheck.extendedResults": "true",
  "spellcheck.collate": "true"
}
  },
  "response": {
"numFound": 0,
"start": 0,
"docs": [

]
  },
  "spellcheck": {
"suggestions": [

],
"correctlySpelled": false,
"collations": [

]
  }
}

On Mon, Oct 16, 2017 at 5:03 AM, alessandro.benedetti 
wrote:

> Interesting, what happens when you pass it as spellcheck.q=polt ?
> What is the behavior you get ?
>
>
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Solr JDBC with Core (vs Collection)

2017-10-16 Thread OTH
Hello,
Sorry for continuing this thread after such a long time.
I just wanted to check, whether streaming expressions / SQL are now working
in non-SolrCloud mode, in the latest Solr release?
Much thanks
Omer

On Thu, Mar 9, 2017 at 1:27 AM, Joel Bernstein  wrote:

> Getting streaming expression and SQL working in non-SolrCloud mode is my
> top priority right now.
>
> I'm testing the first parts of
> https://issues.apache.org/jira/browse/SOLR-10200 today and will be
> committing soon. The first functionality delivered will be the
> significantTerms Streaming Expression. Here is a sample query:
>
> expr=significantTerms(enron, q="from:tana.jo...@enron.com", field="to",
> limit="20")&enron.shards=http://localhost:8983/solr/enron
>
> Notice the enron.shards http param. This provides the shards for the
> "enron" collection.
>
> This will release as part of the first release of the significantTerms
> expression in Solr 6.5.
>
> Solr 6.6 will likely have support for all stream source and parallel
> SQL/JDBC.
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Mar 8, 2017 at 2:19 PM, OTH  wrote:
>
> > Hello,
> >
> > Yes, I was trying to use it with a non-cloud setup.
> >
> > Basically, our application probably won't be requiring cloud features;
> > however, it would be extremely helpful to use JDBC with Solr.
> >
> > Of course, we don't mind using SolrCloud if that's what is needed for
> JDBC.
> >
> > Are there any drawbacks to using SolrCloud, if a distributed setup
> probably
> > won't be required?
> >
> > Much thanks
> >
> > On Thu, Mar 9, 2017 at 12:13 AM, Alexandre Rafalovitch <
> arafa...@gmail.com
> > >
> > wrote:
> >
> > > I believe JDBC requires streams, which requires SolrCloud, which
> > > requires Collections (even if it is a single-core collection).
> > >
> > > Are you trying to use it with non-cloud setup?
> > >
> > > Regards,
> > >Alex.
> > > 
> > > http://www.solr-start.com/ - Resources for Solr users, new and
> > experienced
> > >
> > >
> > > On 8 March 2017 at 14:02, OTH  wrote:
> > > > Hello,
> > > >
> > > > From the examples I am seeing online and in the reference guide (
> > > > https://cwiki.apache.org/confluence/display/solr/Solr+
> > > JDBC+-+SQuirreL+SQL),
> > > > I can only see Solr JDBC being used against a collection.  Is it
> > possible
> > > > however to use it with a core?  What should the JDBC URL be like in
> > that
> > > > case?
> > > >
> > > > Thanks
> > >
> >
>


Re: Strange Behavior When Extracting Features

2017-10-16 Thread Michael Alcorn
If anyone else is following this thread, I replied on the Jira.

On Mon, Oct 16, 2017 at 4:07 AM, alessandro.benedetti 
wrote:

> This is interesting, the EFI parameter resolution should work using the
> quotes independently of the query parser.
> At that point, the query parsers (both) receive a multi term text.
> Both of them should work the same.
> At the time I saw the mail I tried to reproduce it through the LTR module
> tests and I didn't succeed.
> It would be quite useful if you can contribute a test that is failing with
> the field query parser.
> Have you tried just with the same query, but in a request handler ?
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: is there a way to remove deleted documents from index without optimize

2017-10-16 Thread Shawn Heisey
On 10/12/2017 10:01 PM, Erick Erickson wrote:
> You can use the IndexUpgradeTool that ships with each version of Solr
> (well, actually Lucene) to, well, upgrade your index. So you can use
> the IndexUpgradeTool that ships with 5x to upgrade from 4x. And the
> one that ships with 6x to upgrade from 5x. etc.
>
> That said, none of that is necessary _if_ you
>> have the Lucene version in solrconfig.xml be the one that corresponds to 
>> your current Solr. I.e. a solrconfig for 6x should have a luceneMatchVersion 
>> of 6something.
>> you update your index enough to rewrite all segments before moving to the 
>> _next_ version. When Lucene merges a segment, it writes the new segment
>> according to the luceneMatchVersion in solrconfig.xml. So as long as you are 
>> on a version long enough for all segments to be merged into new segments, 
>> you don't have to worry.

As far as I am aware, luceneMatchVersion in Solr will not change the
segment format, but only how some Lucene components (primarily analysis)
function.  Have I got incorrect information?

Something else that might be worth mentioning:  The IndexUpgrader is a
fairly simple piece of code.  It runs forceMerge (optimize) on the
index, creating a single new segment from the entire existing index. 
That ties into this thread's initial subject and LUCENE-7976.  I wonder
if perhaps the upgrade merge policy should be changed so that it just
rewrites all existing segments instead of fully merging them.
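
For anyone who wants to run the upgrader by hand, it is invoked directly
against a closed index directory, along these lines (the jar names depend on
your exact version, and the index path here is a made-up example; Solr must
not be running against that index, since IndexUpgrader takes the index write
lock):

  java -cp lucene-core-6.6.1.jar:lucene-backward-codecs-6.6.1.jar \
    org.apache.lucene.index.IndexUpgrader -delete-prior-commits \
    /var/solr/data/mycore/data/index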

Thanks,
Shawn



RE: solrcloud dead-lock

2017-10-16 Thread Younge, Kent A - Norman, OK - Contractor
Jack, 

No, I still have the issue on one box only.  I have re-requested certificates
several times and still come back with the same issue.  If I put a working
certificate on the box, everything works the way it should.  Also, if I browse
via https to the server name instead of the registered certificate name, Solr
comes up with an untrusted-certificate warning showing that the site is
registered to my certificate name.  So Solr is working, just not with my
certificates.  I have messed with the Java security settings; that did not
help.  The box works like it should, and for whatever reason it will not work
with that certificate.  I changed the name of the certificate (it had a hyphen
in it, and I thought that was causing an issue); taking the hyphen out made no
difference.  In IE I am told to turn on TLS even though it is already set.  In
Chrome I get ERR_SSL_VERSION_OR_CIPHER_MISMATCH.






-Original Message-
From: SOLR6931 [mailto:solrpubl...@gmail.com] 
Sent: Monday, October 16, 2017 9:13 AM
To: solr-user@lucene.apache.org
Subject: Re: solrcloud dead-lock

Hey Kent,
Have you managed to find a solution to your problem?
I'm currently encountering the exact same issue.

Jack



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: solrcloud dead-lock

2017-10-16 Thread SOLR6931
Hey Kent,
Have you managed to find a solution to your problem?
I'm currently encountering the exact same issue.

Jack



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


unsubscribe please

2017-10-16 Thread Horace


The Westfield Leader and The Times
www.goleader.com



Re: Unbalanced CPU on SolrCloud

2017-10-16 Thread Mahmoud Almokadem
The load lasts for some time after I stop the indexing.

The load was first on the first node; after I restarted the indexing
process, the load moved to the second node and the first node worked
properly.

Thanks,
Mahmoud


On Mon, Oct 16, 2017 at 5:29 PM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Does the load stop when you stop indexing, or does it last for some more time?
> Is it always one node that behaves like this, and does it start as soon as you
> start indexing? Is the load different between nodes when you are doing lighter
> indexing?
>
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 16 Oct 2017, at 13:35, Mahmoud Almokadem 
> wrote:
> >
> > The transition of the load happened after I restarted the bulk insert
> > process.
> >
> > The size of the index on each server about 500GB.
> >
> > There are about 8 warnings on each server for "Not found segment file"
> like
> > that
> >
> > Error getting file length for [segments_2s4]
> >
> > java.nio.file.NoSuchFileException:
> > /media/ssd_losedata/solr-home/data/documents_online_shard16_replica_n1/data/index/segments_2s4
> > ...

Re: Unbalanced CPU on SolrCloud

2017-10-16 Thread Emir Arnautović
Does the load stop when you stop indexing, or does it last for some more time?
Is it always one node that behaves like this, and does it start as soon as you
start indexing? Is the load different between nodes when you are doing lighter
indexing?

--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 16 Oct 2017, at 13:35, Mahmoud Almokadem  wrote:
> 
> The transition of the load happened after I restarted the bulk insert
> process.
> 
> The size of the index on each server about 500GB.
> 
> There are about 8 warnings on each server for "Not found segment file" like
> that
> 
> Error getting file length for [segments_2s4]
> 
> java.nio.file.NoSuchFileException:
> /media/ssd_losedata/solr-home/data/documents_online_shard16_replica_n1/data/index/segments_2s4
> at
> java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
> at
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
> at
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
> at
> java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
> at
> java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:145)
> at
> java.base/sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
> at java.base/java.nio.file.Files.readAttributes(Files.java:1755)
> at java.base/java.nio.file.Files.size(Files.java:2369)
> at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:243)
> at
> org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:128)
> at
> org.apache.solr.handler.admin.LukeRequestHandler.getFileLength(LukeRequestHandler.java:611)
> at
> org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:584)
> at
> org.apache.solr.handler.admin.LukeRequestHandler.handleRequestBody(LukeRequestHandler.java:136)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2474)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:720)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:526)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:378)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:322)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> at
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> at org.eclipse.jetty.server.Server.handle(Server.java:534)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
> at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
> at
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> at
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
> at java.base/java.lang.Thread.run(Thread.java:844)
> 
> On Mon, Oct 16, 2017 at 1:08 PM, Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
> 
>> I did not look at graph details - 

Re: Parallel SQL: GROUP BY throws exception

2017-10-16 Thread Joel Bernstein
Ok, I just read the query again.

Try the failing query like this:

SELECT people_person_id, sum(amount) as total FROM donation GROUP BY
people_person_id

That is the correct syntax for the SQL group by aggregation.

It looks like you found a null pointer though where a proper error message
is needed.


Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Oct 16, 2017 at 9:49 AM, Joel Bernstein  wrote:

> Also what version are you using?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, Oct 16, 2017 at 9:49 AM, Joel Bernstein 
> wrote:
>
>> Can you provide the stack trace?
>>
>> Are you in SolrCloud mode?
>>
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Mon, Oct 16, 2017 at 9:20 AM, Dmitry Gerasimov <
>> dgerasi...@kommunion.com> wrote:
>>
>>> Hi all!
>>>
>>> This query works as expected:
>>> SELECT sum(amount) as total FROM donation
>>>
>>> Adding GROUP BY:
>>> SELECT sum(amount) as total FROM donation GROUP BY people_person_id
>>>
>>> Now I get response:
>>> {
>>>   "result-set":{
>>> "docs":[{
>>> "EXCEPTION":"Failed to execute sqlQuery 'SELECT sum(amount) as
>>> total  FROM donation GROUP BY people_person_id' against JDBC connection
>>> 'jdbc:calcitesolr:'.\nError while executing SQL \"SELECT sum(amount) as
>>> total  FROM donation GROUP BY people_person_id\": null",
>>> "EOF":true,
>>> "RESPONSE_TIME":279}]}
>>> }
>>>
>>> Any ideas on what is causing this? Or how to debug?
>>>
>>>
>>> Here is the collection structure:
>>>
>>> >> required="true"
>>> multiValued="false"/>
>>> >> required="true" multiValued="false" docValues="true"/>
>>> >> required="true" multiValued="false"/>
>>> >> multiValued="false" docValues="true"/>
>>>
>>>
>>> Thanks!
>>>
>>
>>
>


Re: CVE-2017-12629 which versions are vulnerable?

2017-10-16 Thread Uwe Reh

Sorry,

I missed the post from Florian Gleixner:
>Re: Several critical vulnerabilities discovered in Apache Solr (XXE & RCE)


Am 16.10.2017 um 16:52 schrieb Uwe Reh:

Hi,

I'm still using V4.10. Is this version also vulnerable by 
http://openwall.com/lists/oss-security/2017/10/13/1 ?


Uwe


CVE-2017-12629 which versions are vulnerable?

2017-10-16 Thread Uwe Reh

Hi,

I'm still using V4.10. Is this version also vulnerable by 
http://openwall.com/lists/oss-security/2017/10/13/1 ?


Uwe


Re: zero-day exploit security issue

2017-10-16 Thread Keith L
Additionally, it looks like the commits are public on github. Is this
backported to 5.5.x too? Users that are still on 5x might want to backport
some of the issues themselves since it is not officially supported anymore.

On Mon, Oct 16, 2017 at 10:11 AM Mike Drob  wrote:

> Given that the already public nature of the disclosure, does it make sense
> to make the work being done public prior to release as well?
>
> Normally security fixes are kept private while the vulnerabilities are
> private, but that's not the case here...
>
> On Mon, Oct 16, 2017 at 1:20 AM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
> > Yes, there is but it is private i.e. only the Apache Lucene PMC
> > members can see it. This is standard for all security issues in Apache
> > land. The fixes for this issue has been applied to the release
> > branches and the Solr 7.1.0 release candidate is already up for vote.
> > Barring any unforeseen circumstances, a 7.1.0 release with the fixes
> > should be expected this week.
> >
> > On Fri, Oct 13, 2017 at 8:14 PM, Xie, Sean  wrote:
> > > Is there a tracking to address this issue for SOLR 6.6.x and 7.x?
> > >
> > > https://lucene.apache.org/solr/news.html#12-october-
> > 2017-please-secure-your-apache-solr-servers-since-a-
> > zero-day-exploit-has-been-reported-on-a-public-mailing-list
> > >
> > > Sean
> > >
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>


Re: zero-day exploit security issue

2017-10-16 Thread Mike Drob
Given that the already public nature of the disclosure, does it make sense
to make the work being done public prior to release as well?

Normally security fixes are kept private while the vulnerabilities are
private, but that's not the case here...

On Mon, Oct 16, 2017 at 1:20 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Yes, there is but it is private i.e. only the Apache Lucene PMC
> members can see it. This is standard for all security issues in Apache
> land. The fixes for this issue have been applied to the release
> branches and the Solr 7.1.0 release candidate is already up for vote.
> Barring any unforeseen circumstances, a 7.1.0 release with the fixes
> should be expected this week.
>
> On Fri, Oct 13, 2017 at 8:14 PM, Xie, Sean  wrote:
> > Is there a tracking to address this issue for SOLR 6.6.x and 7.x?
> >
> > https://lucene.apache.org/solr/news.html#12-october-2017-please-secure-your-apache-solr-servers-since-a-zero-day-exploit-has-been-reported-on-a-public-mailing-list
> >
> > Sean
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: SOLR cores are getting locked

2017-10-16 Thread Erick Erickson
bin/solr start -help

will give you a lot of info. But yes, the -s option is what you should
use. Here's one of my batch files I used to start various cloud
examples:

bin/solr start -c -z localhost:2181 -p 898 -s example/cloud/node1/solr

On Sun, Oct 15, 2017 at 11:48 PM, Gunalan V  wrote:
> Thanks Erick,
>
> I'm using one VM where all the SolrCloud and ZooKeeper nodes are running.
>
> I have two Solr nodes in SolrCloud. Just wanted to check: do I need to
> create a different Solr home directory using the -s param for each Solr node?
>
> If yes, kindly share some documentation on configuring separate node
> directories.
>
>
> GVK
>
>
> On Thu, Oct 12, 2017 at 10:17 AM, Erick Erickson 
> wrote:
>
>> You might be hitting SOLR-11297, which is fixed in Solr 7.0.1. The
>> patch should back-port cleanly to 6x versions though.
>>
>> Best,
>> Erick
>>
>> On Thu, Oct 12, 2017 at 12:14 AM, Gunalan V  wrote:
>> > Hello,
>> >
>> > I'm using Solr 6.5.1 and I have 2 Solr nodes in SolrCloud. I created a
>> > collection using the below [1] and it was created successfully. But the
>> > next day I tried to restart the nodes in SolrCloud: when I start the
>> > first node the collection health is active, but when I start the second
>> > node the collection goes down and I can see the lock errors in the
>> > logs [2].
>> >
>> > Also I have the set the solr home in zookeeper using the command [3].
>> >
>> > Did anyone came across this issue? If so please let me know how to fix
>> it.
>> >
>> >
>> > [1]
>> > http://localhost:8983/solr/admin/collections?action=CREATE&name=testcollection&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=testconfigs
>> >
>> >
>> > [2]  Caused by: org.apache.lucene.store.LockObtainFailedException: Index
>> > dir
>> > '/data01/solr/solr-6.5.1/server/solr/testcollection_
>> shard1_replica2/data/index/'
>> > of core 'testcollection_shard1_replica2' is already locked. The most
>> likely
>> > cause is another Solr server (or another solr core in this server) also
>> > configured to use this directory; other possible causes may be specific
>> to
>> > lockType: native
>> > at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:713)
>> >
>> >
>> > [3]  ./solr zk cp file:/data01/solr/solr-6.5.1/server/solr/solr.xml
>> > zk:/solr.xml -z 10.120.166.12:2181,10.120.166.12:2182,10.120.166.12:2183
>> >
>> >
>> >
>> > Thanks,
>> > GVK
>>


Re: Parallel SQL: GROUP BY throws exception

2017-10-16 Thread Joel Bernstein
Also what version are you using?

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Oct 16, 2017 at 9:49 AM, Joel Bernstein  wrote:

> Can you provide the stack trace?
>
> Are you in SolrCloud mode?
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, Oct 16, 2017 at 9:20 AM, Dmitry Gerasimov <
> dgerasi...@kommunion.com> wrote:
>
>> Hi all!
>>
>> This query works as expected:
>> SELECT sum(amount) as total FROM donation
>>
>> Adding GROUP BY:
>> SELECT sum(amount) as total FROM donation GROUP BY people_person_id
>>
>> Now I get response:
>> {
>>   "result-set":{
>> "docs":[{
>> "EXCEPTION":"Failed to execute sqlQuery 'SELECT sum(amount) as
>> total  FROM donation GROUP BY people_person_id' against JDBC connection
>> 'jdbc:calcitesolr:'.\nError while executing SQL \"SELECT sum(amount) as
>> total  FROM donation GROUP BY people_person_id\": null",
>> "EOF":true,
>> "RESPONSE_TIME":279}]}
>> }
>>
>> Any ideas on what is causing this? Or how to debug?
>>
>>
>> Here is the collection structure:
>>
>> > required="true"
>> multiValued="false"/>
>> > required="true" multiValued="false" docValues="true"/>
>> > required="true" multiValued="false"/>
>> > multiValued="false" docValues="true"/>
>>
>>
>> Thanks!
>>
>
>


Re: Parallel SQL: GROUP BY throws exception

2017-10-16 Thread Joel Bernstein
Can you provide the stack trace?

Are you in SolrCloud mode?



Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Oct 16, 2017 at 9:20 AM, Dmitry Gerasimov 
wrote:

> Hi all!
>
> This query works as expected:
> SELECT sum(amount) as total FROM donation
>
> Adding GROUP BY:
> SELECT sum(amount) as total FROM donation GROUP BY people_person_id
>
> Now I get response:
> {
>   "result-set":{
> "docs":[{
> "EXCEPTION":"Failed to execute sqlQuery 'SELECT sum(amount) as
> total  FROM donation GROUP BY people_person_id' against JDBC connection
> 'jdbc:calcitesolr:'.\nError while executing SQL \"SELECT sum(amount) as
> total  FROM donation GROUP BY people_person_id\": null",
> "EOF":true,
> "RESPONSE_TIME":279}]}
> }
>
> Any ideas on what is causing this? Or how to debug?
>
>
> Here is the collection structure:
>
>  multiValued="false"/>
>  required="true" multiValued="false" docValues="true"/>
>  required="true" multiValued="false"/>
>  multiValued="false" docValues="true"/>
>
>
> Thanks!
>


Re: HOW DO I UNSUBSCRIBE FROM GROUP?

2017-10-16 Thread Richard
The list help/unsubscribe/post/etc. details are, as is not uncommon,
in the message header:

  List-Help: <mailto:solr-user-help@lucene.apache.org>
  List-Unsubscribe: <mailto:solr-user-unsubscribe@lucene.apache.org>
  List-Post: <mailto:solr-user@lucene.apache.org>

of all messages posted to the list.


 Original Message 
> Date: Monday, October 16, 2017 09:16:08 -0400
> From: Gus Heck 
> To: solr-user@lucene.apache.org
> Subject: Re: HOW DO I UNSUBSCRIBE FROM GROUP?
>
> While this has been the traditional response, and it's accurate and
> helpful, the user that complained about no unsubscribe link has a
> point. This is the normal expectation in this day and age. Maybe
> Apache should consider appending a "You are receiving this because
> you are subscribed to (list) click here to unsubscribe" line, but I
> know that if I hadn't been dealing with various apache mailing
> lists on and off for 15 years and I found that I was getting emails
> with no unsubscribe links in undesired quantities, the spam bucket
> would be my answer (probably never send the email asking for how to
> unsubscribe). That's certainly the policy I use for any marketing
> type mails (no unsubscribe == spam bucket)... A simple unsubscribe
> tagline could help us not get tagged as spam, and avoid this type
> of email (which has been a regular occurrence for 15 years)
> 
> -Gus
> 
> Hi,
>> 
>> If you wish the emails to "stop", kindly "UNSUBSCRIBE"  by
>> following the instructions on the
>> http://lucene.apache.org/solr/community.html. Hope this
>> helps.
>> 
>> 

 End Original Message 




Parallel SQL: GROUP BY throws exception

2017-10-16 Thread Dmitry Gerasimov
Hi all!

This query works as expected:
SELECT sum(amount) as total FROM donation

Adding GROUP BY:
SELECT sum(amount) as total FROM donation GROUP BY people_person_id

Now I get response:
{
  "result-set":{
"docs":[{
"EXCEPTION":"Failed to execute sqlQuery 'SELECT sum(amount) as
total  FROM donation GROUP BY people_person_id' against JDBC connection
'jdbc:calcitesolr:'.\nError while executing SQL \"SELECT sum(amount) as
total  FROM donation GROUP BY people_person_id\": null",
"EOF":true,
"RESPONSE_TIME":279}]}
}

Any ideas on what is causing this? Or how to debug?


Here is the collection structure:







Thanks!
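
Since the response swallows the cause (the message is null), a minimal SolrJ sketch along these lines can be used to re-run the statement and print the EXCEPTION field from the terminating tuple; the full stack trace that Joel asks for elsewhere in this thread is written to the server-side solr.log. The host, port, and collection URL here are assumptions; the statement and collection name come from the post.

import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.SolrStream;
import org.apache.solr.common.params.ModifiableSolrParams;

public class SqlProbe {
  public static void main(String[] args) throws Exception {
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("qt", "/sql");   // route the request to the SQL handler
    params.set("stmt", "SELECT sum(amount) as total FROM donation GROUP BY people_person_id");
    SolrStream stream = new SolrStream("http://localhost:8983/solr/donation", params);
    try {
      stream.open();
      while (true) {
        Tuple tuple = stream.read();
        if (tuple.EOF) {
          // a failed statement reports its message on the terminating tuple
          Object exception = tuple.get("EXCEPTION");
          if (exception != null) {
            System.err.println(exception);
          }
          break;
        }
        System.out.println(tuple.getString("total"));
      }
    } finally {
      stream.close();
    }
  }
}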


Re: HOW DO I UNSUBSCRIBE FROM GROUP?

2017-10-16 Thread Gus Heck
While this has been the traditional response, and it's accurate and
helpful, the user who complained about the missing unsubscribe link has a
point. This is the normal expectation in this day and age. Maybe Apache
should consider appending a "You are receiving this because you are
subscribed to (list); click here to unsubscribe" line. I know that if I
hadn't been dealing with various Apache mailing lists on and off for 15
years and found that I was getting emails with no unsubscribe links in
undesired quantities, the spam bucket would be my answer (and I would
probably never have sent the email asking how to unsubscribe). That's
certainly the policy I use for any marketing-type mail (no unsubscribe ==
spam bucket)... A simple unsubscribe tagline could help us not get tagged
as spam, and avoid this type of email (which has been a regular occurrence
for 15 years).

-Gus

Hi,
>
> If you wish the emails to "stop", kindly "UNSUBSCRIBE"  by following the
> instructions on the http://lucene.apache.org/solr/community.html. Hope
> this
> helps.
>
>


Re: Unbalanced CPU on SolrCloud

2017-10-16 Thread Mahmoud Almokadem
The transition of the load happened after I restarted the bulk insert
process.

The size of the index on each server is about 500GB.

There are about 8 warnings on each server for "Not found segment file", like
the following:

Error getting file length for [segments_2s4]

java.nio.file.NoSuchFileException:
/media/ssd_losedata/solr-home/data/documents_online_shard16_replica_n1/data/index/segments_2s4
at
java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
at
java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at
java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
at
java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
at
java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:145)
at
java.base/sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
at java.base/java.nio.file.Files.readAttributes(Files.java:1755)
at java.base/java.nio.file.Files.size(Files.java:2369)
at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:243)
at
org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:128)
at
org.apache.solr.handler.admin.LukeRequestHandler.getFileLength(LukeRequestHandler.java:611)
at
org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:584)
at
org.apache.solr.handler.admin.LukeRequestHandler.handleRequestBody(LukeRequestHandler.java:136)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2474)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:720)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:526)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:378)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:322)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.base/java.lang.Thread.run(Thread.java:844)

On Mon, Oct 16, 2017 at 1:08 PM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> I did not look at graph details - now I see that it is over 3h time span.
> It seems that there was a load on the other server before this one and
> ended with 14GB read spike and 10GB write spike, just before load started
> on this server. Do you see any errors or suspicious logs lines?
> How big is your index?
>
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 16 Oct 2017, at 12:39, Mahmoud Almokadem 
> wrote:
> >
> > Yes, it's constantly since I started this bulk indexing process.
> > As you see the write operations on the loaded server 

Re: Unbalanced CPU on SolrCloud

2017-10-16 Thread Emir Arnautović
I did not look at the graph details - now I see that it is over a 3h time span.
It seems that there was load on the other server before this one, ending with a
14GB read spike and a 10GB write spike just before the load started on this server.
Do you see any errors or suspicious log lines?
How big is your index?

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 16 Oct 2017, at 12:39, Mahmoud Almokadem  wrote:
> 
> Yes, it's constantly since I started this bulk indexing process.
> As you see the write operations on the loaded server are 3x the normal
> server despite Disk writes not 3x times.
> 
> Mahmoud
> 
> 
> On Mon, Oct 16, 2017 at 12:32 PM, Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
> 
>> Hi Mahmoud,
>> Is this something that you see constantly? Network charts suggests that
>> your servers are loaded equally and as you said - you are not using routing
>> so expected. Disk read/write and CPU are not equal and it is expected to
>> not be equal during heavy indexing since it also triggers segment merges
>> which require those resources. Even if host same documents (e.g. leader and
>> replica) merges are not likely to happen at the same time and you can
>> expect to see such cases.
>> 
>> Thanks,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>>> On 16 Oct 2017, at 11:58, Mahmoud Almokadem 
>> wrote:
>>> 
>>> Here are the screen shots for the two server metrics on Amazon
>>> 
>>> https://ibb.co/kxBQam
>>> https://ibb.co/fn0Jvm
>>> https://ibb.co/kUpYT6
>>> 
>>> 
>>> 
>>> On Mon, Oct 16, 2017 at 11:37 AM, Mahmoud Almokadem <
>> prog.mahm...@gmail.com>
>>> wrote:
>>> 
 Hi Emir,
 
 We doesn't use routing.
 
 Servers is already balanced and the number of documents on each shard
>> are
 approximately the same.
 
 Nothing running on the servers except Solr and ZooKeeper.
 
 I initialized the client as
 
 String zkHost = "192.168.1.89:2181,192.168.1.99:2181";
 
 CloudSolrClient solrCloud = new CloudSolrClient.Builder()
   .withZkHost(zkHost)
   .build();
 
   solrCloud.setIdField("document_id");
   solrCloud.setDefaultCollection(collection);
   solrCloud.setRequestWriter(new BinaryRequestWriter());
 
 
 And the documents are approximately the same size.
 
 I Used 10 threads with 10 SolrClients to send data to solr and every
 thread send a batch of 1000 documents every time.
 
 Thanks,
 Mahmoud
 
 
 
 On Mon, Oct 16, 2017 at 11:01 AM, Emir Arnautović <
 emir.arnauto...@sematext.com> wrote:
 
> Hi Mahmoud,
> Do you use routing? Are your servers equally balanced - do you end up
> having approximately the same number of documents hosted on both
>> servers
> (counted all shards)?
> Do you have anything else running on those servers?
> How do you initialise your SolrJ client?
> Are documents of similar size?
> 
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training -
>> http://sematext.com/
> 
> 
> 
>> On 16 Oct 2017, at 10:46, Mahmoud Almokadem 
> wrote:
>> 
>> We've installed SolrCloud 7.0.1 with two nodes and 8 shards per node.
>> 
>> The configurations and the specs of the two servers are identical.
>> 
>> When running bulk indexing using SolrJ we see one of the servers is
> fully
>> loaded as you see on the images and the other is normal.
>> 
>> Images URLs:
>> 
>> https://ibb.co/jkE6gR
>> https://ibb.co/hyzvam
>> https://ibb.co/mUpvam
>> https://ibb.co/e4bxo6
>> 
>> How can I figure this issue?
>> 
>> Thanks,
>> Mahmoud
> 
> 
 
>> 
>> 



Re: Unbalanced CPU on SolrCloud

2017-10-16 Thread Mahmoud Almokadem
Yes, it has been constant since I started this bulk indexing process.
As you can see, the write operations on the loaded server are 3x those of the
normal server, even though disk writes are not 3x.

Mahmoud


On Mon, Oct 16, 2017 at 12:32 PM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Mahmoud,
> Is this something that you see constantly? Network charts suggests that
> your servers are loaded equally and as you said - you are not using routing
> so expected. Disk read/write and CPU are not equal and it is expected to
> not be equal during heavy indexing since it also triggers segment merges
> which require those resources. Even if host same documents (e.g. leader and
> replica) merges are not likely to happen at the same time and you can
> expect to see such cases.
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 16 Oct 2017, at 11:58, Mahmoud Almokadem 
> wrote:
> >
> > Here are the screen shots for the two server metrics on Amazon
> >
> > https://ibb.co/kxBQam
> > https://ibb.co/fn0Jvm
> > https://ibb.co/kUpYT6
> >
> >
> >
> > On Mon, Oct 16, 2017 at 11:37 AM, Mahmoud Almokadem <
> prog.mahm...@gmail.com>
> > wrote:
> >
> >> Hi Emir,
> >>
> >> We doesn't use routing.
> >>
> >> Servers is already balanced and the number of documents on each shard
> are
> >> approximately the same.
> >>
> >> Nothing running on the servers except Solr and ZooKeeper.
> >>
> >> I initialized the client as
> >>
> >> String zkHost = "192.168.1.89:2181,192.168.1.99:2181";
> >>
> >> CloudSolrClient solrCloud = new CloudSolrClient.Builder()
> >>.withZkHost(zkHost)
> >>.build();
> >>
> >>solrCloud.setIdField("document_id");
> >>solrCloud.setDefaultCollection(collection);
> >>solrCloud.setRequestWriter(new BinaryRequestWriter());
> >>
> >>
> >> And the documents are approximately the same size.
> >>
> >> I Used 10 threads with 10 SolrClients to send data to solr and every
> >> thread send a batch of 1000 documents every time.
> >>
> >> Thanks,
> >> Mahmoud
> >>
> >>
> >>
> >> On Mon, Oct 16, 2017 at 11:01 AM, Emir Arnautović <
> >> emir.arnauto...@sematext.com> wrote:
> >>
> >>> Hi Mahmoud,
> >>> Do you use routing? Are your servers equally balanced - do you end up
> >>> having approximately the same number of documents hosted on both
> servers
> >>> (counted all shards)?
> >>> Do you have anything else running on those servers?
> >>> How do you initialise your SolrJ client?
> >>> Are documents of similar size?
> >>>
> >>> Thanks,
> >>> Emir
> >>> --
> >>> Monitoring - Log Management - Alerting - Anomaly Detection
> >>> Solr & Elasticsearch Consulting Support Training -
> http://sematext.com/
> >>>
> >>>
> >>>
>  On 16 Oct 2017, at 10:46, Mahmoud Almokadem 
> >>> wrote:
> 
>  We've installed SolrCloud 7.0.1 with two nodes and 8 shards per node.
> 
>  The configurations and the specs of the two servers are identical.
> 
>  When running bulk indexing using SolrJ we see one of the servers is
> >>> fully
>  loaded as you see on the images and the other is normal.
> 
>  Images URLs:
> 
>  https://ibb.co/jkE6gR
>  https://ibb.co/hyzvam
>  https://ibb.co/mUpvam
>  https://ibb.co/e4bxo6
> 
>  How can I figure this issue?
> 
>  Thanks,
>  Mahmoud
> >>>
> >>>
> >>
>
>


Re: Unbalanced CPU on SolrCloud

2017-10-16 Thread Emir Arnautović
Hi Mahmoud,
Is this something that you see constantly? The network charts suggest that your
servers are loaded equally and, as you said, you are not using routing, so that
is expected. Disk read/write and CPU are not equal, and that is expected during
heavy indexing, since indexing also triggers segment merges, which require those
resources. Even shards hosting the same documents (e.g. leader and replica) are
not likely to run merges at the same time, so you can expect to see such cases.

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 16 Oct 2017, at 11:58, Mahmoud Almokadem  wrote:
> 
> Here are the screen shots for the two server metrics on Amazon
> 
> https://ibb.co/kxBQam
> https://ibb.co/fn0Jvm
> https://ibb.co/kUpYT6
> 
> 
> 
> On Mon, Oct 16, 2017 at 11:37 AM, Mahmoud Almokadem 
> wrote:
> 
>> Hi Emir,
>> 
>> We doesn't use routing.
>> 
>> Servers is already balanced and the number of documents on each shard are
>> approximately the same.
>> 
>> Nothing running on the servers except Solr and ZooKeeper.
>> 
>> I initialized the client as
>> 
>> String zkHost = "192.168.1.89:2181,192.168.1.99:2181";
>> 
>> CloudSolrClient solrCloud = new CloudSolrClient.Builder()
>>.withZkHost(zkHost)
>>.build();
>> 
>>solrCloud.setIdField("document_id");
>>solrCloud.setDefaultCollection(collection);
>>solrCloud.setRequestWriter(new BinaryRequestWriter());
>> 
>> 
>> And the documents are approximately the same size.
>> 
>> I Used 10 threads with 10 SolrClients to send data to solr and every
>> thread send a batch of 1000 documents every time.
>> 
>> Thanks,
>> Mahmoud
>> 
>> 
>> 
>> On Mon, Oct 16, 2017 at 11:01 AM, Emir Arnautović <
>> emir.arnauto...@sematext.com> wrote:
>> 
>>> Hi Mahmoud,
>>> Do you use routing? Are your servers equally balanced - do you end up
>>> having approximately the same number of documents hosted on both servers
>>> (counted all shards)?
>>> Do you have anything else running on those servers?
>>> How do you initialise your SolrJ client?
>>> Are documents of similar size?
>>> 
>>> Thanks,
>>> Emir
>>> --
>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>> 
>>> 
>>> 
 On 16 Oct 2017, at 10:46, Mahmoud Almokadem 
>>> wrote:
 
 We've installed SolrCloud 7.0.1 with two nodes and 8 shards per node.
 
 The configurations and the specs of the two servers are identical.
 
 When running bulk indexing using SolrJ we see one of the servers is
>>> fully
 loaded as you see on the images and the other is normal.
 
 Images URLs:
 
 https://ibb.co/jkE6gR
 https://ibb.co/hyzvam
 https://ibb.co/mUpvam
 https://ibb.co/e4bxo6
 
 How can I figure this issue?
 
 Thanks,
 Mahmoud
>>> 
>>> 
>> 



Re: E-Commerce Search: tf-idf, tie-break and boolean model

2017-10-16 Thread alessandro.benedetti
I was having a discussion with a colleague of mine recently about
e-commerce search.
Of course there are tons of things you can do to improve relevancy:
custom similarity, edismax tuning, basic user-event processing, machine
learning integrations, semantic search, etc.

The more you do, the better the results will potentially be; basically it is
an ocean to explore.
To avoid going off topic and to stay pertinent to your initial request, let's
take a look at the custom similarity problem.

In e-commerce, and generally in proper-noun searches, TF is not relevant.
IDF can help, but we need to focus on what IDF is used for in general in
Lucene search:
mostly, IDF is a measure of "how important this term is in the user
query".
Basically Lucene (and in general TF/IDF-based information retrieval systems)
assumes that the rarer a term is in the corpus, the more likely it is to be
important for the search query.
That is not always true in e-commerce:
"iphone cover" means the user is looking for a cover that fits
his/her phone.
"iphone" is rare; "cover" is not. IDF will recognize "iphone" as the most
pertinent term for the user intent.
There's a lot to talk about here; let's stop :)

Anyway, as a conclusion, go step by step: a custom similarity plus edismax
optimised with proper phrase and shingle boosts should be a good start.
The tie-break parameter for e-commerce is likely to be OK set to the default.
But to verify that, I would recommend setting up a relevancy measuring
framework with golden queries and user feedback.

cheers





-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Unbalanced CPU on SolrCloud

2017-10-16 Thread Mahmoud Almokadem
Here are the screen shots for the two server metrics on Amazon

https://ibb.co/kxBQam
https://ibb.co/fn0Jvm
https://ibb.co/kUpYT6



On Mon, Oct 16, 2017 at 11:37 AM, Mahmoud Almokadem 
wrote:

> Hi Emir,
>
> We doesn't use routing.
>
> Servers is already balanced and the number of documents on each shard are
> approximately the same.
>
> Nothing running on the servers except Solr and ZooKeeper.
>
> I initialized the client as
>
> String zkHost = "192.168.1.89:2181,192.168.1.99:2181";
>
> CloudSolrClient solrCloud = new CloudSolrClient.Builder()
> .withZkHost(zkHost)
> .build();
>
> solrCloud.setIdField("document_id");
> solrCloud.setDefaultCollection(collection);
> solrCloud.setRequestWriter(new BinaryRequestWriter());
>
>
> And the documents are approximately the same size.
>
> I Used 10 threads with 10 SolrClients to send data to solr and every
> thread send a batch of 1000 documents every time.
>
> Thanks,
> Mahmoud
>
>
>
> On Mon, Oct 16, 2017 at 11:01 AM, Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
>
>> Hi Mahmoud,
>> Do you use routing? Are your servers equally balanced - do you end up
>> having approximately the same number of documents hosted on both servers
>> (counted all shards)?
>> Do you have anything else running on those servers?
>> How do you initialise your SolrJ client?
>> Are documents of similar size?
>>
>> Thanks,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>>
>>
>> > On 16 Oct 2017, at 10:46, Mahmoud Almokadem 
>> wrote:
>> >
>> > We've installed SolrCloud 7.0.1 with two nodes and 8 shards per node.
>> >
>> > The configurations and the specs of the two servers are identical.
>> >
>> > When running bulk indexing using SolrJ we see one of the servers is
>> fully
>> > loaded as you see on the images and the other is normal.
>> >
>> > Images URLs:
>> >
>> > https://ibb.co/jkE6gR
>> > https://ibb.co/hyzvam
>> > https://ibb.co/mUpvam
>> > https://ibb.co/e4bxo6
>> >
>> > How can I figure this issue?
>> >
>> > Thanks,
>> > Mahmoud
>>
>>
>


Re: Unbalanced CPU on SolrCloud

2017-10-16 Thread Mahmoud Almokadem
Hi Emir,

We don't use routing.

The servers are already balanced, and the number of documents on each shard is
approximately the same.

Nothing is running on the servers except Solr and ZooKeeper.

I initialized the client as

String zkHost = "192.168.1.89:2181,192.168.1.99:2181";

CloudSolrClient solrCloud = new CloudSolrClient.Builder()
.withZkHost(zkHost)
.build();

solrCloud.setIdField("document_id");
solrCloud.setDefaultCollection(collection);
solrCloud.setRequestWriter(new BinaryRequestWriter());


And the documents are approximately the same size.

I used 10 threads with 10 SolrClients to send data to Solr, and every thread
sends a batch of 1000 documents at a time.

Thanks,
Mahmoud



On Mon, Oct 16, 2017 at 11:01 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Mahmoud,
> Do you use routing? Are your servers equally balanced - do you end up
> having approximately the same number of documents hosted on both servers
> (counted all shards)?
> Do you have anything else running on those servers?
> How do you initialise your SolrJ client?
> Are documents of similar size?
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 16 Oct 2017, at 10:46, Mahmoud Almokadem 
> wrote:
> >
> > We've installed SolrCloud 7.0.1 with two nodes and 8 shards per node.
> >
> > The configurations and the specs of the two servers are identical.
> >
> > When running bulk indexing using SolrJ we see one of the servers is fully
> > loaded as you see on the images and the other is normal.
> >
> > Images URLs:
> >
> > https://ibb.co/jkE6gR
> > https://ibb.co/hyzvam
> > https://ibb.co/mUpvam
> > https://ibb.co/e4bxo6
> >
> > How can I figure this issue?
> >
> > Thanks,
> > Mahmoud
>
>
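
For context, here is a minimal sketch of the indexing pattern described in this thread: N worker threads, each with its own CloudSolrClient, pushing 1000-document batches. The collection name is borrowed from the log path earlier in this thread, field names other than document_id are illustrative assumptions, and commits are assumed to be handled by the server-side autoCommit settings. This is not the poster's actual code.

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
  public static void main(String[] args) {
    final String zkHost = "192.168.1.89:2181,192.168.1.99:2181"; // from the post
    ExecutorService pool = Executors.newFixedThreadPool(10);     // 10 indexing threads
    for (int t = 0; t < 10; t++) {
      pool.submit(() -> {
        // one client per thread, as described in the thread
        try (CloudSolrClient client = new CloudSolrClient.Builder()
            .withZkHost(zkHost).build()) {
          client.setDefaultCollection("documents_online");       // name from the log above
          List<SolrInputDocument> batch = new ArrayList<>(1000);
          for (int i = 0; i < 1000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("document_id", UUID.randomUUID().toString());
            doc.addField("body", "sample text " + i);            // hypothetical field
            batch.add(doc);
          }
          client.add(batch); // one update request per 1000-doc batch
        } catch (Exception e) {
          e.printStackTrace();
        }
      });
    }
    pool.shutdown();
  }
}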


Re: HOW DO I UNSUBSCRIBE FROM GROUP?

2017-10-16 Thread alessandro.benedetti
The Terms component[1] should do the trick for you.
Just use the regular expression or prefix filtering and you should be able
to get the stats you want.

If you are interested in extracting the df when returning docs, you may be
interested in function queries, and specifically this one:

docfreq(field,val)
"Returns the number of documents that contain the term in the field. This is
a constant (the same value for all documents in the index).

You can quote the term if it’s more complex, or do parameter substitution
for the term value.
docfreq(text,'solr')"

…&defType=func&q=docfreq(text,$myterm)&myterm=solr



[1] https://lucene.apache.org/solr/guide/6_6/the-terms-component.html



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
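
As a rough illustration of the Terms component suggestion above, a SolrJ sketch like the following returns a flat term-to-document-frequency list (the frequency reported by the Terms component is the df). The collection URL and the field name are assumptions.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.TermsResponse;

public class TermsDf {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build()) {
      SolrQuery q = new SolrQuery();
      q.setRequestHandler("/terms"); // the Terms component's usual handler
      q.setTerms(true);
      q.addTermsField("text");       // assumed field
      q.setTermsLimit(100);
      q.setTermsPrefix("so");        // or setTermsRegex(...) for regex filtering
      QueryResponse rsp = client.query(q);
      TermsResponse terms = rsp.getTermsResponse();
      for (TermsResponse.Term t : terms.getTerms("text")) {
        System.out.println(t.getTerm() + " -> " + t.getFrequency()); // df per term
      }
    }
  }
}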


Re: Strange Behavior When Extracting Features

2017-10-16 Thread alessandro.benedetti
This is interesting: the EFI parameter resolution should work using the
quotes independently of the query parser.
At that point, the query parsers (both) receive a multi-term text.
Both of them should work the same.
When I saw the mail I tried to reproduce the issue through the LTR module
tests and I didn't succeed.
It would be quite useful if you could contribute a test that fails with
the field query parser.
Have you tried just the same query, but in a request handler?



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
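
For reference, a minimal sketch of passing a quoted multi-term efi value during feature extraction, following the transformer syntax in the Solr LTR documentation; the feature store name and collection URL are assumptions, not values from this thread.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class LtrEfiProbe {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build()) {
      SolrQuery q = new SolrQuery("test");
      // extract features only, passing a quoted multi-term efi value
      q.setFields("id", "score",
          "[features store=myFeatureStore efi.user_query='multi term text']");
      System.out.println(client.query(q).getResults());
    }
  }
}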


Re: spell-check does not return collations when using search query with filter

2017-10-16 Thread alessandro.benedetti
Interesting. What happens when you pass it as spellcheck.q=polt?
What behavior do you get?





-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
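
A minimal SolrJ sketch of the experiment suggested above: send the misspelled term through spellcheck.q and inspect the collations. It assumes a handler with the spellcheck component enabled and a hypothetical collection URL.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse;

public class SpellcheckProbe {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build()) {
      SolrQuery q = new SolrQuery("polt");
      q.set("spellcheck", true);
      q.set("spellcheck.q", "polt");        // the term under test
      q.set("spellcheck.collate", true);
      q.set("spellcheck.maxCollations", 5);
      QueryResponse rsp = client.query(q);
      SpellCheckResponse sc = rsp.getSpellCheckResponse();
      if (sc != null && sc.getCollatedResults() != null) {
        sc.getCollatedResults()
          .forEach(c -> System.out.println(c.getCollationQueryString()));
      }
    }
  }
}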


Re: Unbalanced CPU on SolrCloud

2017-10-16 Thread Emir Arnautović
Hi Mahmoud,
Do you use routing? Are your servers equally balanced - do you end up having
approximately the same number of documents hosted on both servers (counting
all shards)?
Do you have anything else running on those servers?
How do you initialise your SolrJ client?
Are documents of similar size?

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 16 Oct 2017, at 10:46, Mahmoud Almokadem  wrote:
> 
> We've installed SolrCloud 7.0.1 with two nodes and 8 shards per node.
> 
> The configurations and the specs of the two servers are identical.
> 
> When running bulk indexing using SolrJ we see one of the servers is fully
> loaded as you see on the images and the other is normal.
> 
> Images URLs:
> 
> https://ibb.co/jkE6gR
> https://ibb.co/hyzvam
> https://ibb.co/mUpvam
> https://ibb.co/e4bxo6
> 
> How can I figure this issue?
> 
> Thanks,
> Mahmoud



Re: E-Commerce Search: tf-idf, tie-break and boolean model

2017-10-16 Thread Emir Arnautović
Hi Vincenzo,
Unless you have really specific ranking requirements, I would not suggest
starting with your own proprietary similarity implementation. In most cases
edismax will be good enough to cover your requirements. It is not an easy task
to tune edismax, since it has a lot of knobs that you can use.
In general there are two approaches that you can use. The first is to create a
golden set of query-results pairs, use it with some metric (e.g. you can start
with a simple F-measure), and tune parameters to maximize the metric. The
alternative approach (which complements the first one) is to let users use your
search, track clicks, monitor search metrics like mean reciprocal rank,
zero-result queries, page depth, etc., and tune queries to get better results.
If you can do A/B testing, you can use that as well to see which changes are
better.
In most cases this is an iterative process; you should not expect to get it
right the first time, nor to tune it to cover all cases.

Good luck!

HTH,
Emir

--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 16 Oct 2017, at 10:30, Vincenzo D'Amore  wrote:
> 
> Hi all,
> 
> I'm trying to figure out how to tune Solr for an e-commerce search.
> 
> I want to share with you what I did in the hope to understand if I was
> right and, if there, I could also improve my configuration.
> 
> I also read that the boolean model has to be preferred in this case.
> 
> https://nlp.stanford.edu/IR-book/html/htmledition/the-extended-boolean-model-versus-ranked-retrieval-1.html
> 
> 
> So, I first wrote my own implementation of DefaultSimilarity returning
> constantly 1.0 for TF and IDF.
> 
> Now I'm struggling to understand how to configure tie-break parameter, my
> opinion was to configure it to 0.1 or 0.0, thats because, if I understood
> well, in this way the boolean model should be preferred, that's because
> only the maximum scoring subquery contributes to final score.
> 
> https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Thetie_TieBreaker_Parameter
> 
> 
> Not sure if this could be enough or if you need more information, thanks in
> advance for anyone would add a bit in this discussion.
> 
> Best regards,
> Vincenzo
> 
> -- 
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
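
To make the "golden set" idea above concrete, here is a minimal sketch of computing mean reciprocal rank over query/expected-top-result pairs; all names here are illustrative assumptions, not part of any Solr API.

import java.util.Collections;
import java.util.List;
import java.util.Map;

public class MrrEval {
  // results:  for each query, the ranked list of doc ids returned by Solr
  // expected: for each query, the doc id a perfect engine would rank first
  static double meanReciprocalRank(Map<String, List<String>> results,
                                   Map<String, String> expected) {
    double sum = 0;
    for (Map.Entry<String, String> e : expected.entrySet()) {
      List<String> ranked = results.getOrDefault(e.getKey(), Collections.emptyList());
      int rank = ranked.indexOf(e.getValue());     // 0-based position, -1 if absent
      sum += (rank >= 0) ? 1.0 / (rank + 1) : 0.0; // reciprocal-rank contribution
    }
    return sum / expected.size();
  }
}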



Unbalanced CPU on SolrCloud

2017-10-16 Thread Mahmoud Almokadem
We've installed SolrCloud 7.0.1 with two nodes and 8 shards per node.

The configurations and the specs of the two servers are identical.

When running bulk indexing using SolrJ, we see that one of the servers is fully
loaded, as you can see in the images, while the other is normal.

Image URLs:

https://ibb.co/jkE6gR
https://ibb.co/hyzvam
https://ibb.co/mUpvam
https://ibb.co/e4bxo6

How can I figure out this issue?

Thanks,
Mahmoud


E-Commerce Search: tf-idf, tie-break and boolean model

2017-10-16 Thread Vincenzo D'Amore
Hi all,

I'm trying to figure out how to tune Solr for an e-commerce search.

I want to share with you what I did, in the hope of understanding whether I
was right and whether I could further improve my configuration.

I also read that the boolean model has to be preferred in this case.

https://nlp.stanford.edu/IR-book/html/htmledition/the-extended-boolean-model-versus-ranked-retrieval-1.html


So, I first wrote my own implementation of DefaultSimilarity that constantly
returns 1.0 for TF and IDF.

Now I'm struggling to understand how to configure the tie-break parameter. My
inclination was to set it to 0.1 or 0.0 because, if I understood correctly,
this way the boolean model is preferred, since only the maximum-scoring
subquery contributes to the final score.

https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Thetie_TieBreaker_Parameter


Not sure if this is enough or if you need more information; thanks in
advance to anyone who adds a bit to this discussion.

Best regards,
Vincenzo

-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251
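
For illustration, a minimal sketch of the kind of constant-TF/IDF similarity described above, written against Lucene 6.x's ClassicSimilarity (the successor of DefaultSimilarity) and exposed to Solr through a SimilarityFactory. This is an assumption of how such a class could look, not the poster's actual code.

import org.apache.lucene.search.similarities.ClassicSimilarity;
import org.apache.lucene.search.similarities.Similarity;
import org.apache.solr.schema.SimilarityFactory;

public class ConstantTfIdfSimilarityFactory extends SimilarityFactory {
  @Override
  public Similarity getSimilarity() {
    return new ClassicSimilarity() {
      @Override
      public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f; // ignore how often the term occurs
      }
      @Override
      public float idf(long docFreq, long docCount) {
        return 1.0f;                   // ignore how rare the term is
      }
    };
  }
}

It would then be wired into the schema with something like <similarity class="com.example.ConstantTfIdfSimilarityFactory"/> on the relevant field type (the class name here is assumed).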


Re: HOW DO I UNSUBSCRIBE FROM GROUP?

2017-10-16 Thread Amrit Sarkar
Hi,

If you wish the emails to stop, kindly unsubscribe by following the
instructions at http://lucene.apache.org/solr/community.html. Hope this
helps.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Mon, Oct 16, 2017 at 9:56 AM,  wrote:

>
> Hi,
>
> Just wondering how do I 'unsubscribe' from the emails I'm receiving from
> the
> group?
>
> I'm getting way more emails than I need right now and would like them to
> 'stop'... But there is NO UNSUBSCRIBE link in any of the emails.
>
> Thanks,
> Rita
>
> -Original Message-
> From: Reth RM [mailto:reth.ik...@gmail.com]
> Sent: Sunday, October 15, 2017 10:57 PM
> To: solr-user@lucene.apache.org
> Subject: Efficient query to obtain DF
>
> Dear Solr-User Group,
>
> Can you please suggest an efficient query for retrieving the document
> frequency (df) of a term at the shard index level?
>
> I know we can get a term-to-df mapping by applying the termVectors component
> (https://lucene.apache.org/solr/guide/6_6/the-term-vector-component.html#TheTermVectorComponent-RequestParameters);
> however, the results returned by this component are per document, mapping each
> doc's terms to their df. I was looking for a straightforward flat list of
> term-df mappings, similar to how the terms component returns a term-tf (term
> frequency) map list.
>
> Thank you.
>
>


Re: SOLR cores are getting locked

2017-10-16 Thread Gunalan V
Thanks Erick,

I'm using one VM where all the SolrCloud and ZooKeeper nodes are running.

I have two Solr nodes in SolrCloud. Just wanted to check: do I need to
create a different Solr home directory using the -s param for each Solr node?

If yes, kindly share some documentation on configuring separate node
directories.


GVK


On Thu, Oct 12, 2017 at 10:17 AM, Erick Erickson 
wrote:

> You might be hitting SOLR-11297, which is fixed in Solr 7.0.1. The
> patch should back-port cleanly to 6x versions though.
>
> Best,
> Erick
>
> On Thu, Oct 12, 2017 at 12:14 AM, Gunalan V  wrote:
> > Hello,
> >
> > I'm using Solr 6.5.1 and I have 2 Solr nodes in SolrCloud. I created a
> > collection using the below [1] and it was created successfully. But the
> > next day I tried to restart the nodes in SolrCloud: when I start the
> > first node the collection health is active, but when I start the second
> > node the collection goes down and I can see the lock errors in the
> > logs [2].
> >
> > Also I have the set the solr home in zookeeper using the command [3].
> >
> > Did anyone came across this issue? If so please let me know how to fix
> it.
> >
> >
> > [1]
> > http://localhost:8983/solr/admin/collections?action=CREATE&name=testcollection&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=testconfigs
> >
> >
> > [2]  Caused by: org.apache.lucene.store.LockObtainFailedException: Index
> > dir
> > '/data01/solr/solr-6.5.1/server/solr/testcollection_
> shard1_replica2/data/index/'
> > of core 'testcollection_shard1_replica2' is already locked. The most
> likely
> > cause is another Solr server (or another solr core in this server) also
> > configured to use this directory; other possible causes may be specific
> to
> > lockType: native
> > at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:713)
> >
> >
> > [3]  ./solr zk cp file:/data01/solr/solr-6.5.1/server/solr/solr.xml
> > zk:/solr.xml -z 10.120.166.12:2181,10.120.166.12:2182,10.120.166.12:2183
> >
> >
> >
> > Thanks,
> > GVK
>


Re: zero-day exploit security issue

2017-10-16 Thread Shalin Shekhar Mangar
Yes, there is, but it is private, i.e. only the Apache Lucene PMC
members can see it. This is standard for all security issues in Apache
land. The fixes for this issue have been applied to the release
branches, and the Solr 7.1.0 release candidate is already up for vote.
Barring any unforeseen circumstances, a 7.1.0 release with the fixes
should be expected this week.

On Fri, Oct 13, 2017 at 8:14 PM, Xie, Sean  wrote:
> Is there a tracking to address this issue for SOLR 6.6.x and 7.x?
>
> https://lucene.apache.org/solr/news.html#12-october-2017-please-secure-your-apache-solr-servers-since-a-zero-day-exploit-has-been-reported-on-a-public-mailing-list
>
> Sean
>



-- 
Regards,
Shalin Shekhar Mangar.