Re: Creating single CloudSolrClient object which can be used throughout the application

2018-07-04 Thread Ritesh Kumar
Hello Shawn,

I wasn't explicitly closing the client object but I fetched the client
object inside the try block and this seems to automatically destroy the
client object.

Taking it out of the try block worked like magic.

Problem solved!
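
For anyone who hits the same thing, here is a minimal sketch of the two
patterns, assuming a CloudSolrClient built from a ZooKeeper host string (the
class and collection names are illustrative, not taken from Ritesh's code). A
client declared in a try-with-resources header is closed automatically when
the block exits, which shuts down its connection pool; a client held in a
static field survives across requests:

import java.io.IOException;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class ClientLifecycleSketch {

    // Pattern that triggers "Connection pool shut down" on the second use:
    // the try-with-resources header closes the client when the block exits.
    static void queryAndClose(String zkHost) throws SolrServerException, IOException {
        try (CloudSolrClient client = new CloudSolrClient.Builder()
                .withZkHost(zkHost).build()) {
            client.setDefaultCollection("mycollection");
            client.query(new SolrQuery("*:*"));
        } // client.close() is called here implicitly
    }

    // Pattern for a long-lived application: build once, reuse, and only close
    // at shutdown (if at all).
    private static CloudSolrClient shared;

    static synchronized CloudSolrClient getShared(String zkHost) {
        if (shared == null) {
            shared = new CloudSolrClient.Builder().withZkHost(zkHost).build();
            shared.setDefaultCollection("mycollection");
        }
        return shared;
    }
}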

Best

On Wed, Jul 4, 2018 at 10:40 PM Shawn Heisey  wrote:

> On 7/4/2018 2:41 AM, Ritesh Kumar wrote:
> > I did exactly as you told, created a public static synchronized method.
> The
> > problem still exists.
>
> I wasn't addressing the connection problem.  I was addressing the
> question in the subject -- one client object that you can use
> everywhere.  But I think the problem you're having now is different than
> the initial problem, based on the error message:
>
> > Maybe returning the client object if it is not null is causing
> > " java.lang.IllegalStateException: Connection pool shut down" error. It
> > does run fine for just one time.
>
> This means that you are still calling client.close() after you use the
> client.  That must be removed.  The entire point of the close() call is
> to release the internal objects to garbage and make the client unusable.
>
> Thanks,
> Shawn
>
>


Re: AddReplica to shard with lowest node count

2018-07-04 Thread Shalin Shekhar Mangar
The rule based replica placement was deprecated. The autoscaling APIs are
the way to go. Please see
http://lucene.apache.org/solr/guide/7_3/solrcloud-autoscaling.html

Your use-case is interesting. By default, the trigger for nodeAdded event
will move replicas from the most loaded nodes to the new node. That does
not take care of your use-case. Can you please open a Jira to add this
feature?
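
For readers who have not used the autoscaling APIs yet, here is a rough sketch
of registering a nodeAdded trigger by POSTing to /solr/admin/autoscaling. The
JSON follows the trigger example in the guide linked above as I remember it;
as noted, the default compute/execute plan actions move replicas around rather
than simply adding a replica to the emptiest shard, so treat it only as a
starting point:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class SetNodeAddedTrigger {
    public static void main(String[] args) throws Exception {
        // Single quotes are swapped for double quotes below to keep the Java readable.
        String payload = ("{"
            + " 'set-trigger': {"
            + "   'name': 'node_added_trigger',"
            + "   'event': 'nodeAdded',"
            + "   'waitFor': '5s',"
            + "   'enabled': true,"
            + "   'actions': ["
            + "     {'name': 'compute_plan', 'class': 'solr.ComputePlanAction'},"
            + "     {'name': 'execute_plan', 'class': 'solr.ExecutePlanAction'}"
            + "   ]"
            + " }"
            + "}").replace('\'', '"');
        HttpURLConnection conn = (HttpURLConnection)
            new URL("http://localhost:8983/solr/admin/autoscaling").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP " + conn.getResponseCode());
    }
}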

On Thu, Jul 5, 2018 at 6:45 AM Gus Heck  wrote:

> Perhaps the rule based replica placement stuff would do the trick?
>
> https://lucene.apache.org/solr/guide/7_3/rule-based-replica-placement.html
>
> I haven't used it myself but I've seen lots of work going into it lately...
>
> On Wed, Jul 4, 2018 at 12:35 PM, Duncan, Adam 
> wrote:
>
> > Hi all,
> >
> > Our team use Solrcloud for Solr 5.1 and are investigating an upgrade to
> 7.3
> > Currently we have a working scale-up approach for adding a new server to
> > the cluster beyond the initial collection creation.
> > We’ve automated the install of Solr on new servers and, following that,
> we
> > register the new instance with zookeeper so that the server will be
> > included in the list of live nodes.
> > Finally we use the CoreAdmin API ‘Create’ command to associate the new
> > node with our collection. Solr 5.1's CoreAdmin Create command would
> > conveniently auto-assign the new node to the shard with the least nodes.
> >
> > In Solr 7.3, the CoreAdmin API documentation warns us not to use the
> > Create command with SolrCloud.
> > We tried 7.3’s CoreAdmin API Create command regardless and,
> > unsurprisingly, it did not work.
> > The 7.3 documentation suggests we use the Collections API AddReplica
> > command. The problem with AddReplica is that it expects us to specify the
> > shard name.
> > This is unfortunate as it makes it hard for us to keep shards balanced.
> It
> > puts the onus on us to work out the least populated shard via a call to
> the
> > cluster status endpoint.
> > With that we now face the problem of managing this correctly when scaling up
> > multiple servers at once.
> >
> > Are we missing something here? Is there really no way for a node to be
> > auto-assigned to a shard in 7.3?
> > And if so, are there any recommendations for an approach to reliably
> doing
> > this ourselves?
> >
> > Thanks!
> > Adam
> >
>
>
>
> --
> http://www.the111shift.com
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: AddReplica to shard with lowest node count

2018-07-04 Thread Gus Heck
Perhaps the rule based replica placement stuff would do the trick?

https://lucene.apache.org/solr/guide/7_3/rule-based-replica-placement.html

I haven't used it myself but I've seen lots of work going into it lately...

On Wed, Jul 4, 2018 at 12:35 PM, Duncan, Adam 
wrote:

> Hi all,
>
> Our team use Solrcloud for Solr 5.1 and are investigating an upgrade to 7.3
> Currently we have a working scale-up approach for adding a new server to
> the cluster beyond the initial collection creation.
> We’ve automated the install of Solr on new servers and, following that, we
> register the new instance with zookeeper so that the server will be
> included in the list of live nodes.
> Finally we use the CoreAdmin API ‘Create’ command to associate the new
> node with our collection. Solr 5.1's CoreAdmin Create command would
> conveniently auto-assign the new node to the shard with the least nodes.
>
> In Solr 7.3, the CoreAdmin API documentation warns us not to use the
> Create command with SolrCloud.
> We tried 7.3’s CoreAdmin API Create command regardless and,
> unsurprisingly, it did not work.
> The 7.3 documentation suggests we use the Collections API AddReplica
> command. The problem with AddReplica is that it expects us to specify the
> shard name.
> This is unfortunate as it makes it hard for us to keep shards balanced. It
> puts the onus on us to work out the least populated shard via a call to the
> cluster status endpoint.
> With that we now face the problem of managing this correctly when scaling up
> multiple servers at once.
>
> Are we missing something here? Is there really no way for a node to be
> auto-assigned to a shard in 7.3?
> And if so, are there any recommendations for an approach to reliably doing
> this ourselves?
>
> Thanks!
> Adam
>



-- 
http://www.the111shift.com


Re: Solr - zoo with more than 1000 collections

2018-07-04 Thread Gus Heck
Hi Bertrand,

Are you by any chance using the new Time Routed Aliases feature? You didn't
mention it so I suspect not, but you might want to look...  It's still
pretty new, but it would be interesting to get your feedback on it if it
looks like it would help. I'm wondering how you get to that many
collections, and if any of those collections are old data that doesn't need
to be queried anymore? If so, TRA's can clean up collections with old data
automatically... (see router.autoDeleteAge) That would possibly put an
upper bound on the number of collections you have to handle, and allow the
performance to be stable indefinitely once things are sized correctly
(assuming a steady rate of new data, and a steady query rate/complexity).

https://lucene.apache.org/solr/guide/7_3/collections-api.html#createalias

Note that some improvements and a dedicated section in the docs were added in
7.4:

https://lucene.apache.org/solr/guide/7_4/collections-api.html#createalias
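
For reference, a hedged sketch of creating such a time routed alias through
the Collections API; the parameter names are taken from the CREATEALIAS
documentation linked above, while the alias name, config set, field name and
intervals are made up for illustration:

import java.net.HttpURLConnection;
import java.net.URL;

public class CreateTimeRoutedAlias {
    public static void main(String[] args) throws Exception {
        // V1 Collections API call; router.autoDeleteAge drops collections once
        // their data is older than roughly 90 days (date math, adjust to taste).
        String url = "http://localhost:8983/solr/admin/collections?action=CREATEALIAS"
            + "&name=timeseries"                 // the alias the application uses
            + "&router.name=time"
            + "&router.field=timestamp_dt"       // assumed date field in the schema
            + "&router.start=NOW/DAY"
            + "&router.interval=%2B1DAY"         // +1DAY, URL-encoded
            + "&router.autoDeleteAge=-90DAYS"
            + "&create-collection.collection.configName=timeseries_conf"
            + "&create-collection.numShards=2";
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        System.out.println("HTTP " + conn.getResponseCode());
    }
}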

-Gus

On Fri, Jun 29, 2018 at 12:49 PM, Yago Riveiro 
wrote:

> Solr doesn’t scale very well with ~2K collections, and yes, the bottleneck
> is Zookeeper itself.
>
> Zookeeper doesn’t perform operations as quickly as expected on nodes with a
> lot of children.
>
> In a scenario where you are in a recovery state (a node crash), this
> limitation will hurt a lot: the work queue stacks up recovery operations due
> to the low throughput available to consume the queue.
>
> Regards.
>
> --
>
> Yago Riveiro
>
> On 29 Jun 2018 17:38 +0100, Bertrand Mahé , wrote:
> > Hi,
> >
> >
> >
> > In order to store timeseries data and perform deletion easily, we create
> > several collections per day and then use aliases.
> >
> >
> >
> > We are using SOLR 7.3 and we have 2 questions:
> >
> >
> >
> > Q1: In order to access the latest data quickly, would it be possible to load
> > cores in descending chronological order rather than alphabetical order?
> >
> >
> >
> > Q2: When we exceed 1200-1300 collections, zookeeper suddenly jumps from
> > 600-700 KB of RAM to 3 GB of RAM, which makes ZooKeeper very slow or almost
> > unusable. Is this normal?
> >
> >
> >
> > Thanks in advance,
> >
> >
> >
> > Bertrand
> >
>



-- 
http://www.the111shift.com
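
A side note on Yago's point about znodes with many children: a quick way to
see how bad it has become is to count the children of /collections and of the
overseer work queue with a plain ZooKeeper client. The paths below are the
ones SolrCloud uses by default; adjust the connect string (and any chroot) to
your setup:

import org.apache.zookeeper.ZooKeeper;

public class ZkChildCounts {
    public static void main(String[] args) throws Exception {
        // No watcher needed for a one-off check.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> { });
        System.out.println("collections: "
            + zk.getChildren("/collections", false).size());
        // A long overseer queue is a sign that cluster-state work is piling up.
        System.out.println("overseer queue: "
            + zk.getChildren("/overseer/queue", false).size());
        zk.close();
    }
}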


Re: Indexing part of Binary Documents and not the entire contents

2018-07-04 Thread Gus Heck
You might consider using a free tool like JesterJ (www.jesterj.org), which
can possibly also automate the acquisition of the documents and their
transmission to Solr, as well as provide a framework for massaging the
contents of the document in between (including Tika processing).

(Disclaimer: I'm the primary author of JesterJ, so I'm slightly biased ;) )

-Gus

On Wed, Jun 27, 2018 at 5:08 AM, neotorand  wrote:

> Thanks Erick
> I have already gone through the Tika example link you shared.
> Please look at the code in bold.
> I believe the entire contents are still pushed into memory via the handler
> object.
> Sorry, I copied lengthy code from the Tika site.
>
> Regards
> Neo
>
> *Streaming the plain text in chunks*
> Sometimes, you want to chunk the resulting text up, perhaps to output as
> you
> go minimising memory use, perhaps to output to HDFS files, or any other
> reason! With a small custom content handler, you can do that.
>
> public List<String> parseToPlainTextChunks() throws IOException,
> SAXException, TikaException {
> final List<String> chunks = new ArrayList<>();
> chunks.add("");
> ContentHandlerDecorator handler = new ContentHandlerDecorator() {
> @Override
> public void characters(char[] ch, int start, int length) {
> String lastChunk = chunks.get(chunks.size() - 1);
> String thisStr = new String(ch, start, length);
>
> if (lastChunk.length() + length > MAXIMUM_TEXT_CHUNK_SIZE) {
> chunks.add(thisStr);
> } else {
> chunks.set(chunks.size() - 1, lastChunk + thisStr);
> }
> }
> };
>
> AutoDetectParser parser = new AutoDetectParser();
> Metadata metadata = new Metadata();
> try (InputStream stream =
> ContentHandlerExample.class.getResourceAsStream("test2.doc")) {
> *parser.parse(stream, handler, metadata);*
> return chunks;
> }
> }
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
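
If the goal is really "index only part of the document" rather than "stream
all of it in chunks", a simpler option is to give Tika a write limit so that
only the first N characters are kept in memory and sent to Solr. This is only
a sketch (file name, field names and URL are made up); WriteOutContentHandler
signals the limit through a SAXException that can be recognised and swallowed:

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.sax.WriteOutContentHandler;
import org.xml.sax.SAXException;

public class IndexFirstChunk {
    public static void main(String[] args) throws Exception {
        WriteOutContentHandler wout = new WriteOutContentHandler(100_000); // keep ~100k chars
        BodyContentHandler handler = new BodyContentHandler(wout);
        AutoDetectParser parser = new AutoDetectParser();
        Metadata metadata = new Metadata();

        try (InputStream stream = Files.newInputStream(Paths.get("test2.doc"))) {
            parser.parse(stream, handler, metadata);
        } catch (SAXException e) {
            // Thrown when the write limit is reached; anything else is a real error.
            if (!wout.isWriteLimitReached(e)) {
                throw e;
            }
        }

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "test2.doc");
        doc.addField("content", wout.toString()); // only the truncated text goes to Solr
        try (HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycollection").build()) {
            solr.add(doc);
            solr.commit();
        }
    }
}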



-- 
http://www.the111shift.com


Re: MergeException due to illegal state in PerFieldPostingsFormat in 7.3.1

2018-07-04 Thread Benoit Delbosc
On 04.07.2018 19:01, Shawn Heisey wrote:
> On 7/4/2018 1:36 AM, Benoit Delbosc wrote:
>> I have a complex integration test that is failing systematically since
>> we upgraded the Elasticsearch cluster to 6.3.0 (Lucene 7.3.1).
>
> This is a Solr mailing list.  Solr is a subproject of Lucene, but it
> is not Lucene.
such a mistake indeed,
sorry for the disturbance & thanks for your reply

ben



Re: [SECURITY] CVE-2018-8026: XXE vulnerability due to Apache Solr configset upload (exchange rate provider config / enum field config / TIKA parsecontext)

2018-07-04 Thread will martin
The CVE id was reserved in April, and the JIRA ticket was created a month ago.
Is this the first notice to this list?

Thx

On Wed, Jul 4, 2018, 12:56 PM Uwe Schindler  wrote:

> CVE-2018-8026: XXE vulnerability due to Apache Solr configset upload
> (exchange rate provider config / enum field config / TIKA parsecontext)
>
> Severity: High
>
> Vendor:
> The Apache Software Foundation
>
> Versions Affected:
> Solr 6.0.0 to 6.6.4
> Solr 7.0.0 to 7.3.1
>
> Description:
> The details of this vulnerability were reported by mail to the Apache
> security mailing list.
> This vulnerability relates to an XML external entity expansion (XXE) in
> Solr
> config files (currency.xml, enumsConfig.xml referred from schema.xml,
> TIKA parsecontext config file). In addition, Xinclude functionality
> provided
> in these config files is also affected in a similar way. The vulnerability
> can
> be used as XXE using file/ftp/http protocols in order to read arbitrary
> local files from the Solr server or the internal network. The manipulated
> files can be uploaded as configsets using Solr's API, allowing to exploit
> that vulnerability. See [1] for more details.
>
> Mitigation:
> Users are advised to upgrade to either Solr 6.6.5 or Solr 7.4.0 releases
> both
> of which address the vulnerability. Once upgrade is complete, no other
> steps
> are required. Those releases only allow external entities and Xincludes
> that
> refer to local files / zookeeper resources below the Solr instance
> directory
> (using Solr's ResourceLoader); usage of absolute URLs is denied. Keep in
> mind, that external entities and XInclude are explicitly supported to
> better
> structure config files in large installations. Before Solr 6 this was no
> problem, as config files were not accessible through the APIs.
>
> If users are unable to upgrade to Solr 6.6.5 or Solr 7.4.0 then they are
> advised to make sure that Solr instances are only used locally without
> access
> to public internet, so the vulnerability cannot be exploited. In addition,
> reverse proxies should be guarded to not allow end users to reach the
> configset APIs. Please refer to [2] on how to correctly secure Solr
> servers.
>
> Solr 5.x and earlier are not affected by this vulnerability; those versions
> do not allow to upload configsets via the API. Nevertheless, users should
> upgrade those versions as soon as possible, because there may be other ways
> to inject config files through file upload functionality of the old web
> interface. Those versions are no longer maintained, so no deep analysis was
> done.
>
> Credit:
> Yuyang Xiao, Ishan Chattopadhyaya
>
> References:
> [1] https://issues.apache.org/jira/browse/SOLR-12450
> [2] https://wiki.apache.org/solr/SolrSecurity
>
> -
> Uwe Schindler
> uschind...@apache.org
> ASF Member, Apache Lucene PMC / Committer
> Bremen, Germany
> http://lucene.apache.org/
>
>
>


Re: Parent-child query; subqueries on child docs of the same set of fields

2018-07-04 Thread Mikhail Khludnev
agh... It's my pet peeve.
what about
q= {!parent which="isParent:true" v='attrname:genre AND attrvalue:drama'} AND {!parent which="isParent:true" v='attrname:country AND attrvalue:USA'}
  ^ note the leading space after q=

q=_query_:{!parent which="isParent:true" v='attrname:genre AND attrvalue:drama'} AND _query_:{!parent which="isParent:true" v='attrname:country AND attrvalue:USA'}

q=+{!parent which="isParent:true" v='attrname:genre AND attrvalue:drama'} +{!parent which="isParent:true" v='attrname:country AND attrvalue:USA'}

Beware of escape encoding; it might require replacing + with %2B.
Post debug=query response here.
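
A hedged SolrJ variant of the same query, which side-steps the URL-encoding
problem because SolrJ encodes the parameters for you (field names taken from
the thread, collection URL made up):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TwoParentClauses {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycollection").build()) {
            SolrQuery q = new SolrQuery();
            // Both block-join clauses as required (+) clauses; SolrJ handles escaping.
            q.setQuery("+{!parent which=\"isParent:true\" v='attrname:genre AND attrvalue:drama'} "
                     + "+{!parent which=\"isParent:true\" v='attrname:country AND attrvalue:USA'}");
            q.set("debug", "query"); // so the parsed query can be posted back to the list
            QueryResponse rsp = solr.query(q);
            System.out.println(rsp.getResults().getNumFound());
            System.out.println(rsp.getDebugMap().get("parsedquery"));
        }
    }
}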

On Tue, Jul 3, 2018 at 9:25 PM TK Solr  wrote:

> Thank you, Mikhail. But this didn't work. The first {!parent which='...'
> v='...'} alone works. But the second {!parent ...} clause is completely
> ignored.
> In fact, if I turn on debugQuery, rawquerystring and querystring have the
> second
> query but parsedquery and parsedquery_toString only have the first query.
> BTW, does the v parameter work in place of the query following {!parsername}
> for any parser?
>
>
> On 7/3/18 12:42 PM, Mikhail Khludnev wrote:
> > q={!parent which="isParent:true" v='attrname:genre AND attrvalue:drama'}
> AND
> >
> > {!parent which="isParent:true" v='attrname:country AND attrvalue:USA'}
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Creating single CloudSolrClient object which can be used throughout the application

2018-07-04 Thread Shawn Heisey

On 7/4/2018 2:41 AM, Ritesh Kumar wrote:

I did exactly as you told, created a public static synchronized method. The
problem still exists.


I wasn't addressing the connection problem.  I was addressing the 
question in the subject -- one client object that you can use 
everywhere.  But I think the problem you're having now is different than 
the initial problem, based on the error message:



Maybe returning the client object if it is not null is causing
" java.lang.IllegalStateException: Connection pool shut down" error. It
does run fine for just one time.


This means that you are still calling client.close() after you use the 
client.  That must be removed.  The entire point of the close() call is 
to release the internal objects to garbage and make the client unusable.


Thanks,
Shawn



Re: MergeException due to illegal state in PerFieldPostingsFormat in 7.3.1

2018-07-04 Thread Shawn Heisey

On 7/4/2018 1:36 AM, Benoit Delbosc wrote:

I have a complex integration test that is failing systematically since
we upgraded the Elasticsearch cluster to 6.3.0 (Lucene 7.3.1).


This is a Solr mailing list.  Solr is a subproject of Lucene, but it is 
not Lucene.


Solr and elasticsearch are competing projects that both use Lucene for 
the majority of their functionality.


You're going to have to go to elastic's support forum for help with ES.

Thanks,
Shawn



[SECURITY] CVE-2018-8026: XXE vulnerability due to Apache Solr configset upload (exchange rate provider config / enum field config / TIKA parsecontext)

2018-07-04 Thread Uwe Schindler
CVE-2018-8026: XXE vulnerability due to Apache Solr configset upload
(exchange rate provider config / enum field config / TIKA parsecontext)

Severity: High

Vendor:
The Apache Software Foundation

Versions Affected:
Solr 6.0.0 to 6.6.4
Solr 7.0.0 to 7.3.1

Description:
The details of this vulnerability were reported by mail to the Apache
security mailing list.
This vulnerability relates to an XML external entity expansion (XXE) in Solr
config files (currency.xml, enumsConfig.xml referred from schema.xml,
TIKA parsecontext config file). In addition, Xinclude functionality provided
in these config files is also affected in a similar way. The vulnerability can
be used as XXE using file/ftp/http protocols in order to read arbitrary
local files from the Solr server or the internal network. The manipulated
files can be uploaded as configsets using Solr's API, allowing to exploit
that vulnerability. See [1] for more details.

Mitigation:
Users are advised to upgrade to either Solr 6.6.5 or Solr 7.4.0 releases both
of which address the vulnerability. Once upgrade is complete, no other steps
are required. Those releases only allow external entities and Xincludes that
refer to local files / zookeeper resources below the Solr instance directory
(using Solr's ResourceLoader); usage of absolute URLs is denied. Keep in
mind, that external entities and XInclude are explicitly supported to better
structure config files in large installations. Before Solr 6 this was no
problem, as config files were not accessible through the APIs.

If users are unable to upgrade to Solr 6.6.5 or Solr 7.4.0 then they are
advised to make sure that Solr instances are only used locally without access
to public internet, so the vulnerability cannot be exploited. In addition,
reverse proxies should be guarded to not allow end users to reach the
configset APIs. Please refer to [2] on how to correctly secure Solr servers.

Solr 5.x and earlier are not affected by this vulnerability; those versions
do not allow to upload configsets via the API. Nevertheless, users should
upgrade those versions as soon as possible, because there may be other ways
to inject config files through file upload functionality of the old web
interface. Those versions are no longer maintained, so no deep analysis was
done.

Credit:
Yuyang Xiao, Ishan Chattopadhyaya

References:
[1] https://issues.apache.org/jira/browse/SOLR-12450
[2] https://wiki.apache.org/solr/SolrSecurity

-
Uwe Schindler
uschind...@apache.org 
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/




AddReplica to shard with lowest node count

2018-07-04 Thread Duncan, Adam
Hi all,

Our team use Solrcloud for Solr 5.1 and are investigating an upgrade to 7.3
Currently we have a working scale-up approach for adding a new server to the 
cluster beyond the initial collection creation.
We’ve automated the install of Solr on new servers and, following that, we 
register the new instance with zookeeper so that the server will be included in 
the list of live nodes.
Finally we use the CoreAdmin API ‘Create’ command to associate the new node 
with our collection. Solr 5.1's CoreAdmin Create command would conveniently 
auto-assign the new node to the shard with the least nodes.

In Solr 7.3, the CoreAdmin API documentation warns us not to use the Create 
command with SolrCloud.
We tried 7.3’s CoreAdmin API Create command regardless and, unsurprisingly, it 
did not work.
The 7.3 documentation suggests we use the Collections API AddReplica 
command. The problem with AddReplica is that it expects us to specify the shard 
name.
This is unfortunate as it makes it hard for us to keep shards balanced. It puts 
the onus on us to work out the least populated shard via a call to the cluster 
status endpoint.
With that we now face the problem of managing this correctly when scaling up 
multiple servers at once.

Are we missing something here? Is there really no way for a node to be 
auto-assigned to a shard in 7.3?
And if so, are there any recommendations for an approach to reliably doing this 
ourselves?

Thanks!
Adam
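
Until the autoscaling work covers this case, a rough SolrJ sketch of the
manual approach described above: read the cluster state, pick the shard with
the fewest replicas, then call ADDREPLICA. It is only a sketch and does not
handle races when several servers are added at once:

import java.util.Comparator;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.common.cloud.DocCollection;
import org.apache.solr.common.cloud.Slice;

public class AddReplicaToSmallestShard {
    public static void main(String[] args) throws Exception {
        String collection = "mycollection";
        String newNode = args[0]; // e.g. "host123:8983_solr"
        try (CloudSolrClient client = new CloudSolrClient.Builder()
                .withZkHost("zk1:2181,zk2:2181,zk3:2181").build()) {
            client.connect();
            DocCollection coll = client.getZkStateReader()
                .getClusterState().getCollection(collection);
            // Shard with the fewest replicas right now.
            Slice smallest = coll.getSlices().stream()
                .min(Comparator.comparingInt(s -> s.getReplicas().size()))
                .orElseThrow(IllegalStateException::new);
            CollectionAdminRequest.addReplicaToShard(collection, smallest.getName())
                .setNode(newNode)
                .process(client);
        }
    }
}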


Re: Block Join Child Query returns incorrect result

2018-07-04 Thread kristaclaire14
Mikhail Khludnev-2 wrote
> Hello.
> 
> {!parent} always searching for parents, some improvement is in progress,
> but you need to use [child] or [subquery] to see children.
> If you don't have an idea about the search result, add a debug=true param to
> get through matching details.
> 
> On Mon, Jul 2, 2018 at 10:41 PM kristaclaire14 

> kcaromualdo514@

> 
> wrote:
> 
>> Hi,
>>
>> I'm having a problem in my solr when querying third level child
>> documents.
>> I
>> want to retrieve parent documents that have specific third level child
>> documents. The example data is:
>>
>> [{
>> "id":"1001"
>> "path":"1.Project",
>> "Project_Title":"Sample Project",
>> "_childDocuments_":[
>> {
>> "id":"2001",
>> "path":"2.Project.Submission",
>> "Submission_No":"1234-QWE",
>> "_childDocuments_":[
>> {
>> "id":"3001",
>> "path":"3.Project.Submission.Agency",
>> "Agency_Cd":"QWE"
>> }
>> ]
>> }]
>> }, {
>> "id":"1002"
>> "path":"1.Project",
>> "Project_Title":"Test Project QWE",
>> "_childDocuments_":[
>> {
>> "id":"2002",
>> "path":"2.Project.Submission",
>> "Submission_No":"4567-AGY",
>> "_childDocuments_":[
>> {
>> "id":"3002",
>> "path":"3.Project.Submission.Agency",
>> "Agency_Cd":"AGY"
>> }]
>> }]
>> }]
>>
>> I want to retrieve the parent with *Agency_Cd:ZXC* in third level child
>> document.
>> So far, this is my query:
>> q={!parent which="path:1.Project" v="path:3.Project.Submission.Agency AND
>> Agency_Cd:ZXC"}
>>
>> My expected result is 0, but Solr returns parents with no matching child
>> documents based on the query. Am I doing something wrong on the query?
>> Thanks in advance.
>>
>>
>>
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>
> 
> 
> -- 
> Sincerely yours
> Mikhail Khludnev


I see, thank you Mikhail.
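
For anyone following along, here is a sketch of what Mikhail's suggestion
looks like with the field names from the example above: keep {!parent} for
selecting the projects, and use the [child] doc transformer plus debug=true to
see which descendants actually matched. The syntax is as I recall it for 7.x,
so treat it as a starting point rather than gospel:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class ProjectsByAgency {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/projects").build()) {
            SolrQuery q = new SolrQuery();
            // Parents (projects) whose descendant block contains the agency document.
            q.setQuery("{!parent which=\"path:1.Project\" "
                + "v=\"path:3.Project.Submission.Agency AND Agency_Cd:ZXC\"}");
            // Return the matching descendants alongside each parent.
            q.setFields("id", "Project_Title",
                "[child parentFilter=\"path:1.Project\" childFilter=\"Agency_Cd:ZXC\" limit=10]");
            q.set("debug", "true"); // inspect the parsed query and matching details
            System.out.println(solr.query(q).getResults().getNumFound());
        }
    }
}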



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: push to the limit without going over

2018-07-04 Thread Erick Erickson
First, I usually prefer to construct your CloudSolrClient by
using the Zookeeper ensemble string rather than URLs,
although that's probably not a cure for your problem.

Here's what I _think_ is happening. If you're slamming Solr
with a lot of updates, you're doing a lot of merging. At some point
when there are a lot of merges going on, incoming
updates block until one or more merge threads are done.

At that point, I suspect your client is timing out. And (perhaps)
if you used the Zookeeper ensemble instead of HTTP, the
cluster state fetch would go away. I suspect that another
issue would come up, but

It's also possible this would all go away if you increase your
timeouts significantly. That's still a "set it and hope" approach
rather than a totally robust solution though.

Let's assume that the above works and you start getting timeouts.
You can back off the indexing rate at that point, or just go to
sleep for a while. This isn't what you'd like for a permanent solution,
but may let you get by.

There's work afoot to separate out update thread pools from query
thread pools so _querying_ doesn't suffer when indexing is heavy,
but that hasn't been implemented yet. This could also address
your cluster state fetch error.

You will get significantly better throughput if you batch your
docs and use the client.add(list_of_documents) BTW.
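
A minimal sketch of that batching plus a crude self-throttle, assuming the
CloudSolrClient already built once as in the original mail; the batch size and
sleep time are made-up numbers to tune:

import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BatchingIndexer {

    // One request for many documents; retry the same batch with a crude back-off
    // when the cluster pushes back (timeouts, cluster-state fetch failures, ...).
    static void flush(CloudSolrClient client, List<SolrInputDocument> batch)
            throws InterruptedException {
        while (!batch.isEmpty()) {
            try {
                client.add(batch);
                batch.clear();
            } catch (Exception e) {
                Thread.sleep(30_000); // made-up pause; tune to taste
            }
        }
    }

    static void indexAll(CloudSolrClient client, Iterable<Path> jobs) throws Exception {
        List<SolrInputDocument> batch = new ArrayList<>();
        for (Path p : jobs) {
            batch.add(buildDoc(p));      // buildDoc() stands in for the existing per-file code
            if (batch.size() >= 500) {   // made-up batch size
                flush(client, batch);
            }
        }
        flush(client, batch);
    }

    static SolrInputDocument buildDoc(Path p) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", p.toAbsolutePath().toString());
        return doc;
    }
}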

Another possibility is to use the new metrics (since Solr 6.4). They
provide over 200 metrics you can query, and it's quite
possible that they'd help your clients know when to self-throttle
but AFAIK, there's nothing built in to help you there.

Best,
Erick

On Wed, Jul 4, 2018 at 2:32 AM, Arturas Mazeika  wrote:
> Hi Solr Folk,
>
> I am trying to push solr to the limit and sometimes I succeed. The
> questions is how to not go over it, e.g., avoid:
>
> java.lang.RuntimeException: Tried fetching cluster state using the node
> names we knew of, i.e. [192.168.56.1:9998_solr, 192.168.56.1:9997_solr,
> 192.168.56.1:_solr, 192.168.56.1:9996_solr]. However, succeeded in
> obtaining the cluster state from none of them.If you think your Solr
> cluster is up and is accessible, you could try re-creating a new
> CloudSolrClient using working solrUrl(s) or zkHost(s).
> at org.apache.solr.client.solrj.impl.HttpClusterStateProvider.
> getState(HttpClusterStateProvider.java:109)
> at org.apache.solr.client.solrj.impl.CloudSolrClient.resolveAliases(
> CloudSolrClient.java:1113)
> at org.apache.solr.client.solrj.impl.CloudSolrClient.
> requestWithRetryOnStaleState(CloudSolrClient.java:845)
> at org.apache.solr.client.solrj.impl.CloudSolrClient.request(
> CloudSolrClient.java:818)
> at org.apache.solr.client.solrj.SolrRequest.process(
> SolrRequest.java:194)
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:173)
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:138)
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:152)
> at com.asc.InsertDEWikiSimple$SimpleThread.run(
> InsertDEWikiSimple.java:132)
>
>
> Details:
>
> I am benchmarking solrcloud setup on a single machine (Intel 7 with 8 "cpu
> cores", an SSD as well as a HDD) using the German Wikipedia collection. I
> created 4 nodes, 4 shards, rep factor: 2 cluster on the same machine (and
> managed to push the CPU or SSD to the hardware limits, i.e., ~200MB/s,
> ~100% CPU). Now I wanted to see what happens if I push HDD to the limits.
> Indexing the files from the SSD (I am able to scan the collection at the
> actual rate 400-500MB/s) with 16 threads, I tried to send those to the solr
> cluster with all indexes on the HDD.
>
> Clearly solr needs to deal with a very slow hard drive (10-20MB/s actual
> rate). If the cluster is not touched, solrj may start losing connections
> after a few hours. If one checks the status of the cluster, it may happen
> sooner. After the connection is lost, the cluster calms down with writing
> after half a dozen minutes.
>
> What would be a reasonable way to push to the limit without going over?
>
> The exact parameters are:
>
> - 4 cores running 2gb ram
> - Schema:
>
> (The schema XML was stripped by the list archive; what survives indicates a
> text field type with positionIncrementGap="100" and at least one field
> declared with docValues="false".)
>
> I SolrJ-connect once:
>
> ArrayList<String> urls = new ArrayList<>();
> urls.add("http://localhost:9999/solr");
> urls.add("http://localhost:9998/solr");
> urls.add("http://localhost:9997/solr");
> urls.add("http://localhost:9996/solr");
>
> solrClient = new CloudSolrClient.Builder(urls)
> .withConnectionTimeout(1)
> .withSocketTimeout(6)
> .build();
> solrClient.setDefaultCollection("de_wiki_man");
>
> and then execute in 16 threads till there's anything to execute:
>
> Path p = getJobPath();
> 

Re: Filtering solr suggest results

2018-07-04 Thread Arunan Sugunakumar
Hi Peter,

Thanks for the help. Didn't see it before.

Thanks,
Arunan

*Sugunakumar Arunan*
Undergraduate - CSE | UOM
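
For anyone else who lands on this thread, here is a sketch of the
context-filtering approach Peter describes below, as far as I understand the
6.6+ docs: the suggester needs a contextField in its configuration (for
example the publisher field), and the query then passes suggest.cfq. The
handler, dictionary and field names are illustrative:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class FilteredSuggest {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/books").build()) {
            SolrQuery q = new SolrQuery();
            q.setRequestHandler("/suggest");
            q.set("suggest", true);
            q.set("suggest.dictionary", "titleSuggester"); // configured with contextField=publisher
            q.set("suggest.q", "harry");
            q.set("suggest.cfq", "penguin");               // only suggestions from this publication
            System.out.println(solr.query(q).getResponse().get("suggest"));
        }
    }
}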


On 3 July 2018 at 18:50, Peter Lancaster 
wrote:

> Hi Arunan,
>
> You can use a context filter query as described https://lucene.apache.org/
> solr/guide/6_6/suggester.html
>
> Cheers,
> Peter.
>
> -Original Message-
> From: Arunan Sugunakumar [mailto:arunans...@cse.mrt.ac.lk]
> Sent: 03 July 2018 12:17
> To: solr-user@lucene.apache.org
> Subject: Filtering solr suggest results
>
> Hi,
>
> I would like to know whether it is possible to filter the suggestions
> returned by the suggest component according to a field. For example I have
> a list of books published by different publications. I want to show
> suggestions for a book title under a specific publication.
>
> Thanks in Advance,
>
> Arunan
>
> *Sugunakumar Arunan*
> Undergraduate - CSE | UOM
>
> Email : aruna ns...@cse.mrt.ac.lk
> 
>
>


RE: 7.3 appears to leak

2018-07-04 Thread Markus Jelsma
Hello Andrey,

I didn't think of that! I will try it when i have the courage again, probably 
next week or so.

Many thanks,
Markus
 
 
-Original message-
> From:Kydryavtsev Andrey 
> Sent: Wednesday 4th July 2018 14:48
> To: solr-user@lucene.apache.org
> Subject: Re: 7.3 appears to leak
> 
> If it is not possible to find a resource leak by code analysis and there is 
> no better ideas, I can suggest a brute force approach:
> - Clone Solr's sources from appropriate branch 
> https://github.com/apache/lucene-solr/tree/branch_7_3
> - Log every searcher's holder increment/decrement operation in a way to catch 
> every caller name (use Thread.currentThread().getStackTrace() or something) 
> https://github.com/apache/lucene-solr/blob/branch_7_3/solr/core/src/java/org/apache/solr/util/RefCounted.java
> - Build custom artefacts and upload them on prod
> - After memory leak happened - analyse logs to see what part of functionality 
> doesn't decrement searcher after counter was incremented. If searchers are 
> leaked - there should be such code I guess.
> 
> This is not something someone would like to do, but it is what it is.
> 
> 
> 
> Thank you,
> 
> Andrey Kudryavtsev
> 
> 
> 03.07.2018, 14:26, "Markus Jelsma" :
> > Hello Erick,
> >
> > Even the silliest ideas may help us, but unfortunately this is not the 
> > case. All our Solr nodes run binaries from the same source from our central 
> > build server, with the same libraries thanks to provisioning. Only schema 
> > and config are different, but the  directive is the same all over.
> >
> > Are there any other ideas, speculations, whatever, on why only our main 
> > text collection leaks a SolrIndexSearcher instance on commit since 7.3.0 
> > and every version up?
> >
> > Many thanks?
> > Markus
> >
> > -Original message-
> >>  From:Erick Erickson 
> >>  Sent: Friday 29th June 2018 19:34
> >>  To: solr-user 
> >>  Subject: Re: 7.3 appears to leak
> >>
> >>  This is truly puzzling then, I'm clueless. It's hard to imagine this
> >>  is lurking out there and nobody else notices, but you've eliminated
> >>  the custom code. And this is also very peculiar:
> >>
> >>  * it occurs only in our main text search collection, all other
> >>  collections are unaffected;
> >>  * despite what i said earlier, it is so far unreproducible outside
> >>  production, even when mimicking production as good as we can;
> >>
> >>  Here's a tedious idea. Restart Solr with the -v option, I _think_ that
> >>  shows you each and every jar file Solr loads. Is it "somehow" possible
> >>  that your main collection is loading some jar from somewhere that's
> >>  different than you expect? 'cause silly ideas like this are all I can
> >>  come up with.
> >>
> >>  Erick
> >>
> >>  On Fri, Jun 29, 2018 at 9:56 AM, Markus Jelsma
> >>   wrote:
> >>  > Hello Erick,
> >>  >
> >>  > The custom search handler doesn't interact with SolrIndexSearcher, this 
> >> is really all it does:
> >>  >
> >>  >   public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse 
> >> rsp) throws Exception {
> >>  > super.handleRequestBody(req, rsp);
> >>  >
> >>  > if (rsp.getToLog().get("hits") instanceof Integer) {
> >>  >   rsp.addHttpHeader("X-Solr-Hits", 
> >> String.valueOf((Integer)rsp.getToLog().get("hits")));
> >>  > }
> >>  > if (rsp.getToLog().get("hits") instanceof Long) {
> >>  >   rsp.addHttpHeader("X-Solr-Hits", 
> >> String.valueOf((Long)rsp.getToLog().get("hits")));
> >>  > }
> >>  >   }
> >>  >
> >>  > I am not sure this qualifies as one more to go.
> >>  >
> >>  > Re: compiler warnings on resources, yes! This and tests failing due to 
> >> resources leaks have always warned me when i forgot to release something 
> >> or decrement a reference. But except for the above method (and the token 
> >> filters which i really can't disable) are all that is left.
> >>  >
> >>  > I am quite desperate about this problem so although i am unwilling to 
> >> disable stuff, i can do it if i must. But i see no reason, yet, to remove the 
> >> search handler or the token filter stuff, i mean, how could those leak a 
> >> SolrIndexSearcher?
> >>  >
> >>  > Let me know :)
> >>  >
> >>  > Many thanks!
> >>  > Markus
> >>  >
> >>  > -Original message-
> >>  >> From:Erick Erickson 
> >>  >> Sent: Friday 29th June 2018 18:46
> >>  >> To: solr-user 
> >>  >> Subject: Re: 7.3 appears to leak
> >>  >>
> >>  >> bq. The only custom stuff left is an extension of SearchHandler that
> >>  >> only writes numFound to the response headers.
> >>  >>
> >>  >> Well, one more to go ;). It's incredibly easy to overlook
> >>  >> innocent-seeming calls that increment the underlying reference count
> >>  >> of some objects but don't decrement them, usually through a close
> >>  >> call. Which isn't necessarily a close if the underlying reference
> >>  >> count is still > 0.
> >>  >>
> >>  >> You may infer that I've been there and done that ;). Sometime the
> >>  >> compiler warnings 

Re: 7.3 appears to leak

2018-07-04 Thread Kydryavtsev Andrey
If it is not possible to find a resource leak by code analysis and there is no 
better ideas, I can suggest a brute force approach:
- Clone Solr's sources from appropriate branch 
https://github.com/apache/lucene-solr/tree/branch_7_3
- Log every searcher's holder increment/decrement operation in a way to catch 
every caller name (use Thread.currentThread().getStackTrace() or something) 
https://github.com/apache/lucene-solr/blob/branch_7_3/solr/core/src/java/org/apache/solr/util/RefCounted.java
- Build custom artefacts and upload them on prod
- After memory leak happened - analyse logs to see what part of functionality 
doesn't decrement searcher after counter was incremented. If searchers are 
leaked - there should be such code I guess.

This is not something someone would like to do, but it is what it is.
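
To make the idea concrete, the instrumentation could look roughly like the
sketch below. This is not the actual Solr source (the real RefCounted in
branch_7_3 differs in detail), and logging full stack traces on every
incref/decref is strictly a debugging build:

import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public abstract class RefCounted<Type> {
    protected final Type resource;
    protected final AtomicInteger refcount = new AtomicInteger();

    public RefCounted(Type resource) {
        this.resource = resource;
    }

    public RefCounted<Type> incref() {
        int count = refcount.incrementAndGet();
        System.err.println("INCREF -> " + count + " on " + resource + "\n"
            + Arrays.toString(Thread.currentThread().getStackTrace()));
        return this;
    }

    public void decref() {
        int count = refcount.decrementAndGet();
        System.err.println("DECREF -> " + count + " on " + resource + "\n"
            + Arrays.toString(Thread.currentThread().getStackTrace()));
        if (count == 0) {
            close();
        }
    }

    public Type get() {
        return resource;
    }

    protected abstract void close();
}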



Thank you,

Andrey Kudryavtsev


03.07.2018, 14:26, "Markus Jelsma" :
> Hello Erick,
>
> Even the silliest ideas may help us, but unfortunately this is not the case. 
> All our Solr nodes run binaries from the same source from our central build 
> server, with the same libraries thanks to provisioning. Only schema and 
> config are different, but the <lib> directive is the same all over.
>
> Are there any other ideas, speculations, whatever, on why only our main text 
> collection leaks a SolrIndexSearcher instance on commit since 7.3.0 and every 
> version up?
>
> Many thanks?
> Markus
>
> -Original message-
>>  From:Erick Erickson 
>>  Sent: Friday 29th June 2018 19:34
>>  To: solr-user 
>>  Subject: Re: 7.3 appears to leak
>>
>>  This is truly puzzling then, I'm clueless. It's hard to imagine this
>>  is lurking out there and nobody else notices, but you've eliminated
>>  the custom code. And this is also very peculiar:
>>
>>  * it occurs only in our main text search collection, all other
>>  collections are unaffected;
>>  * despite what i said earlier, it is so far unreproducible outside
>>  production, even when mimicking production as good as we can;
>>
>>  Here's a tedious idea. Restart Solr with the -v option, I _think_ that
>>  shows you each and every jar file Solr loads. Is it "somehow" possible
>>  that your main collection is loading some jar from somewhere that's
>>  different than you expect? 'cause silly ideas like this are all I can
>>  come up with.
>>
>>  Erick
>>
>>  On Fri, Jun 29, 2018 at 9:56 AM, Markus Jelsma
>>   wrote:
>>  > Hello Erick,
>>  >
>>  > The custom search handler doesn't interact with SolrIndexSearcher, this 
>> is really all it does:
>>  >
>>  >   public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse 
>> rsp) throws Exception {
>>  > super.handleRequestBody(req, rsp);
>>  >
>>  > if (rsp.getToLog().get("hits") instanceof Integer) {
>>  >   rsp.addHttpHeader("X-Solr-Hits", 
>> String.valueOf((Integer)rsp.getToLog().get("hits")));
>>  > }
>>  > if (rsp.getToLog().get("hits") instanceof Long) {
>>  >   rsp.addHttpHeader("X-Solr-Hits", 
>> String.valueOf((Long)rsp.getToLog().get("hits")));
>>  > }
>>  >   }
>>  >
>>  > I am not sure this qualifies as one more to go.
>>  >
>>  > Re: compiler warnings on resources, yes! This and tests failing due to 
>> resources leaks have always warned me when i forgot to release something or 
>> decrement a reference. But except for the above method (and the token 
>> filters which i really can't disable) are all that is left.
>>  >
>>  > I am quite desperate about this problem so although i am unwilling to 
>> disable stuff, i can do it if i must. But i see no reason, yet, to remove the 
>> search handler or the token filter stuff, i mean, how could those leak a 
>> SolrIndexSearcher?
>>  >
>>  > Let me know :)
>>  >
>>  > Many thanks!
>>  > Markus
>>  >
>>  > -Original message-
>>  >> From:Erick Erickson 
>>  >> Sent: Friday 29th June 2018 18:46
>>  >> To: solr-user 
>>  >> Subject: Re: 7.3 appears to leak
>>  >>
>>  >> bq. The only custom stuff left is an extension of SearchHandler that
>>  >> only writes numFound to the response headers.
>>  >>
>>  >> Well, one more to go ;). It's incredibly easy to overlook
>>  >> innocent-seeming calls that increment the underlying reference count
>>  >> of some objects but don't decrement them, usually through a close
>>  >> call. Which isn't necessarily a close if the underlying reference
>>  >> count is still > 0.
>>  >>
>>  >> You may infer that I've been there and done that ;). Sometime the
>>  >> compiler warnings about "resource leak" can help pinpoint those too.
>>  >>
>>  >> Best,
>>  >> Erick
>>  >>
>>  >> On Fri, Jun 29, 2018 at 9:16 AM, Markus Jelsma
>>  >>  wrote:
>>  >> > Hello Yonik,
>>  >> >
>>  >> > I took one node of the 7.2.1 cluster out of the load balancer so it 
>> would only receive shard queries, this way i could kind of 'safely' disable 
>> our custom components one by one, while keeping functionality in place by 
>> letting the other 7.2.1 nodes continue on with the full configuration.
>>  >> >
>>  >> > I am now at 

How to only highlight terms that caused the document to match

2018-07-04 Thread Bjarke Buur Mortensen
Hi list,

I'm having difficulties getting the solr highlighter to highlight only the
terms that actually caused the match. Let me explain:

Given a query "john OR (peter AND mary)"
and two documents:
"john is awesome and so is peter"
"peter is awesome and so is mary",

solr will highlight "peter" and "mary" in the second document, which is
expected.
However, it will also highlight both 'john' and 'peter' in the first document,
even though 'peter' only contributes to a match when 'mary' is also present.

Is there any way to improve this?

If I add debugQuery, the explain-block can easily tell me that the first
document matched because of john, giving it a score of 1, whereas the
second matched because of the presence of both peter and mary, giving it a
score of 2.

So somehow, the information is available, but not used by the highlighter.

Below, I have included a real world solr output to explain what I mean.

Thanks,
Bjarke


---

{
  "responseHeader":{
"status":0,
"QTime":12,
"params":{
  "hl.snippets":"2",
  "q":"plejehjem*  OR (plejecentre* AND boliger*)",
  "defType":"lucene",
  "hl":"on",
  "fl":"doc_id,score",
  "fq":"doc_id:(0273-000545 OR 259531-2018)",
  "hl.method":"unified",
  "debugQuery":"on"}},
  "response":{"numFound":2,"start":0,"maxScore":3.0,"docs":[
  {
"doc_id":"0273-000545",
"score":3.0},
  {
"doc_id":"259531-2018",
"score":1.0}]
  },
  "highlighting":{
"udbuddk-0273-000545":{
  
"content_and_cpv_descriptions_da":["Beskrivelse\n---\n\nKonkurrenceudsættelsen
omfatter drift af følgende 2 plejecentre: \n·
Sandgårdsparken, Kjellerup, 40 boliger  \n·
Solgården, Sjørslev, 22 boliger  \nBeslutningen om at udsætte
driften af plejecentre for konkurrence er aftalt i den
politiske budgetaftale for 2015, der blev indgået i august 2014 mellem
alle byrådets partier undtagen Dansk Folkeparti og Enhedslisten.
\n”Ældre- og Handicapudvalget igangsætter en proces for
konkurrenceudsættelse af drift af ca. 72 plejehjemspladser.
",
"85144100 Sygepleje på plejehjem"]},
"TED-259531-2018":{
  "content_and_cpv_descriptions_da":["Morsø Kommune 41333014
Jernbanevej 7 Nykøbing M 7900 Birgitte Lund +45 99707017
birgitte.l...@morsoe.dk https://permalink.mercell.com/8747.aspx
http://www.morsoe.dk/ https://permalink.mercell.com/8747.aspx
Mercell Danmark A/S Østre Stationsvej 33, Vestfløjen Odense C 5000
support...@mercell.com https://permalink.mercell.com/8747.aspx
https://permalink.mercell.com/8747.aspx Vikarydelser på
ældreområdet 773-2018-5278 Udbuddet omfatter hjemmeplejen og
plejecentre i Morsø Kommune. ",
"85144100 Sygepleje på plejehjem"]}},
  "debug":{
"rawquerystring":"plejehjem*  OR (plejecentre* AND boliger*)",
"querystring":"plejehjem*  OR (plejecentre* AND boliger*)",
"parsedquery":"content_and_cpv_descriptions_da:plejehjem*
(+content_and_cpv_descriptions_da:plejecentre*
+content_and_cpv_descriptions_da:boliger*)",
"parsedquery_toString":"content_and_cpv_descriptions_da:plejehjem*
(+content_and_cpv_descriptions_da:plejecentre*
+content_and_cpv_descriptions_da:boliger*)",
"explain":{
  "udbuddk-0273-000545":"\n3.0 = sum of:\n  1.0 =
content_and_cpv_descriptions_da:plejehjem*\n  2.0 = sum of:\n1.0 =
content_and_cpv_descriptions_da:plejecentre*\n1.0 =
content_and_cpv_descriptions_da:boliger*\n",
  "TED-259531-2018":"\n1.0 = sum of:\n  1.0 =
content_and_cpv_descriptions_da:plejehjem*\n"},
"QParser":"LuceneQParser",
"filter_queries":["doc_id:(0273-000545 OR 259531-2018)"],
"parsed_filter_queries":["doc_id:0273-000545 doc_id:259531-2018"],
"timing":{
  "time":12.0,
  "prepare":{
"time":0.0,
"query":{
  "time":0.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"debug":{
  "time":0.0}},
  "process":{
"time":11.0,
"query":{
  "time":1.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":9.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"debug":{
  "time":0.0}},
  "loadFieldValues":{
"time":0.0


Re: Scores with Solr Suggester

2018-07-04 Thread Alessandro Benedetti
Hi Christine,
it depends on the suggester implementation; the one that comes closest to
having a score implementation is the BlendedInfix[1], but it is still in the
TO DO phase.
Feel free to contribute it if you like !

[1]
https://sease.io/2018/06/apache-lucene-blendedinfixsuggester-how-it-works-bugs-and-improvements.html



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


push to the limit without going over

2018-07-04 Thread Arturas Mazeika
Hi Solr Folk,

I am trying to push solr to the limit and sometimes I succeed. The
questions is how to not go over it, e.g., avoid:

java.lang.RuntimeException: Tried fetching cluster state using the node
names we knew of, i.e. [192.168.56.1:9998_solr, 192.168.56.1:9997_solr,
192.168.56.1:_solr, 192.168.56.1:9996_solr]. However, succeeded in
obtaining the cluster state from none of them.If you think your Solr
cluster is up and is accessible, you could try re-creating a new
CloudSolrClient using working solrUrl(s) or zkHost(s).
at org.apache.solr.client.solrj.impl.HttpClusterStateProvider.
getState(HttpClusterStateProvider.java:109)
at org.apache.solr.client.solrj.impl.CloudSolrClient.resolveAliases(
CloudSolrClient.java:1113)
at org.apache.solr.client.solrj.impl.CloudSolrClient.
requestWithRetryOnStaleState(CloudSolrClient.java:845)
at org.apache.solr.client.solrj.impl.CloudSolrClient.request(
CloudSolrClient.java:818)
at org.apache.solr.client.solrj.SolrRequest.process(
SolrRequest.java:194)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:173)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:138)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:152)
at com.asc.InsertDEWikiSimple$SimpleThread.run(
InsertDEWikiSimple.java:132)


Details:

I am benchmarking solrcloud setup on a single machine (Intel 7 with 8 "cpu
cores", an SSD as well as a HDD) using the German Wikipedia collection. I
created 4 nodes, 4 shards, rep factor: 2 cluster on the same machine (and
managed to push the CPU or SSD to the hardware limits, i.e., ~200MB/s,
~100% CPU). Now I wanted to see what happens if I push HDD to the limits.
Indexing the files from the SSD (I am able to scan the collection at the
actual rate 400-500MB/s) with 16 threads, I tried to send those to the solr
cluster with all indexes on the HDD.

Clearly solr needs to deal with a very slow hard drive (10-20MB/s actual
rate). If the cluster is not touched, solrj may start losing connections
after a few hours. If one checks the status of the cluster, it may happen
sooner. After the connection is lost, the cluster calms down with writing
after half a dozen minutes.

What would be a reasonable way to push to the limit without going over?

The exact parameters are:

- 4 cores running 2gb ram
- Schema:

  (The schema XML was stripped by the list archive; it defined a text field
  type with positionIncrementGap="100", at least one field with
  docValues="false", and the fields used in the code below: id, content,
  time, size, and url.)

I SolrJ-connect once:

ArrayList<String> urls = new ArrayList<>();
urls.add("http://localhost:9999/solr");
urls.add("http://localhost:9998/solr");
urls.add("http://localhost:9997/solr");
urls.add("http://localhost:9996/solr");

solrClient = new CloudSolrClient.Builder(urls)
.withConnectionTimeout(1)
.withSocketTimeout(6)
.build();
solrClient.setDefaultCollection("de_wiki_man");

and then execute in 16 threads till there's anything to execute:

Path p = getJobPath();
String content = new String(Files.readAllBytes(p));
UUID id = UUID.randomUUID();
SolrInputDocument doc = new SolrInputDocument();

BasicFileAttributes attr = Files.readAttributes(p, BasicFileAttributes.class);

doc.addField("id",      id.toString());
doc.addField("content", content);
doc.addField("time",    attr.creationTime().toString());
doc.addField("size",    content.length());
doc.addField("url",     p.getFileName().toAbsolutePath().toString());
solrClient.add(doc);


to go through all the wiki html files.

Cheers,
Arturas


Re: Creating single CloudSolrClient object which can be used throughout the application

2018-07-04 Thread Ritesh Kumar
Hello Shawn,

I did exactly as you told, created a public static synchronized method. The
problem still exists.

Maybe returning the client object if it is not null is causing
" java.lang.IllegalStateException: Connection pool shut down" error. It
does run fine for just one time.

pseudo code:

private static SolrClient client = null;

public static synchronized SolrClient getSolrClient(String collectionName) {
    if (client != null) {
        return client;
    }
    if (Boolean.parseBoolean(isSolrCloudEnabled)) {
        client = new CloudSolrClient.Builder().withZkHost(ZKHOSTS).build();
        ((CloudSolrClient) client).setDefaultCollection(collectionName);
    } else {
        client = new HttpSolrClient.Builder(getSolrHost(delegator, collectionName)).build();
    }
    return client;
}

If I remove this code and simply create a new client object still keeping
the method synchronized, everything seems to be running fine.

Am I missing something?

On Tue, Jul 3, 2018 at 6:04 AM Shawn Heisey  wrote:

> On 7/2/2018 7:35 AM, Ritesh Kumar wrote:
> > I have got a static method which returns CloudSolrClient object if Solr
> is
> > running in Cloud mode and HttpSolrClient object otherwise.
>
> Declare that method as synchronized, so that multiple usages do not step
> on each other's toes.  This will also eliminate object visibility issues
> in multi-threaded code.  The modifiers for the method will probably end
> up being "public static synchronized".
>
> In the class where that method lives, create a "private static
> SolrClient" field and set it to null.  In the method, if the class-level
> field is not null, return it.  If it is null, create the HttpSolrClient
> or CloudSolrClient object just as you do now, set the default collection
> if that's required, then assign that client object to the class-level
> field and return it.
>
> Remove any client.close() calls that you have currently.  You can close
> the client at application shutdown, but this is not actually necessary
> if application shutdown also halts the JVM.
>
> You could also use the singleton paradigm that Erick mentioned, but
> since you already have code to obtain a client object, it's probably
> more straightforward to just modify that code as I have described, and
> don't close the client after you use it.
>
> Thanks,
> Shawn
>
>


MergeException due to illegal state in PerFieldPostingsFormat in 7.3.1

2018-07-04 Thread Benoit Delbosc
Greetings,

I have a complex integration test that is failing systematically since
we upgraded the Elasticsearch cluster to 6.3.0 (Lucene 7.3.1).

The exact same test using an Elasticsearch cluster in version 6.2.4
(Lucene 7.2.1) is successful.

Basically the test is submitting concurrent indexing and bulk indexing
requests; at some point a merge exception is raised,
which ends up as an Elasticsearch shard failure (cluster health status is
RED).

The Elasticsearch and OS logs look normal until the merge exception below.

So far I was not able to reproduce the problem outside of this
dockerized test environment which is running on a specific slave.

I would like some guidance to help categorizing this problem.

Regards

ben


[2018-06-26T10:12:59,907][WARN ][o.e.i.e.Engine   ] [dZQ-7Yb] 
[nuxeo][0] failed engine [merge failed]
org.apache.lucene.index.MergePolicy$MergeException: 
java.lang.IllegalStateException: found existing value for 
PerFieldPostingsFormat.format, field=note:note.fulltext, old=Lucene50, 
new=Lucene50
at 
org.elasticsearch.index.engine.InternalEngine$EngineMergeScheduler$2.doRun(InternalEngine.java:2113)
 [elasticsearch-6.3.0.jar:6.3.0]
at 
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724)
 [elasticsearch-6.3.0.jar:6.3.0]
at 
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
 [elasticsearch-6.3.0.jar:6.3.0]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) 
[?:?]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) 
[?:?]
at java.lang.Thread.run(Thread.java:844) [?:?]
Caused by: java.lang.IllegalStateException: found existing value for 
PerFieldPostingsFormat.format, field=note:note.fulltext, old=Lucene50, 
new=Lucene50
at 
org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.buildFieldsGroupMapping(PerFieldPostingsFormat.java:226)
 ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - 
caomanhdat - 2018-05-09 09:27:24]
at 
org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:152)
 ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - 
caomanhdat - 2018-05-09 09:27:24]
at 
org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:230) 
~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - 
caomanhdat - 2018-05-09 09:27:24]
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:115) 
~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - 
caomanhdat - 2018-05-09 09:27:24]
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4443) 
~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - 
caomanhdat - 2018-05-09 09:27:24]
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4083) 
~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - 
caomanhdat - 2018-05-09 09:27:24]
at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
 ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - 
caomanhdat - 2018-05-09 09:27:24]
at 
org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:99)
 ~[elasticsearch-6.3.0.jar:6.3.0]
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:661)
 ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - 
caomanhdat - 2018-05-09 09:27:24]
[2018-06-26T10:12:59,913][WARN ][o.e.i.c.IndicesClusterStateService] [dZQ-7Yb] 
[[nuxeo][0]] marking and sending shard failed due to [shard failure, reason 
[merge failed]]



Re: Errors when using Blob API

2018-07-04 Thread Zahra Aminolroaya
Thanks Shawn. I removed the space from the header because I got another error. I
finally used "Content-Type: application/octet-stream" (double quotes) instead of
'Content-Type: application/octet-stream' (single quotes), and all of the errors,
even the space limit error, were solved.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html