Configuration recommendation for SolrCloud

2019-06-25 Thread Rahul Goswami
Hello,
We are running Solr 7.2.1 and planning for a deployment that will grow to
4 billion documents over time. We have 16 nodes at our disposal, and I am
deciding between three configurations:

1 cluster - 16 nodes
vs
2 clusters - 8 nodes each
vs
4 clusters - 4 nodes each

Irrespective of the configuration, each node would host 8 shards (e.g. a
cluster with 16 nodes would have 16*8=128 shards; similarly, a 4-node
cluster would have 32 shards). These 16 nodes will be hosted across 4 beefy
servers, each with 128 GB RAM, so we can allocate 32 GB RAM (not heap
space) to each node. What configuration would be most efficient for our use
case, considering moderate-to-heavy indexing and search load? I would also
like to know the tradeoffs involved, if any. Thanks in advance!

Regards,
Rahul
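As a quick sanity check of the shard math in the question (a sketch that assumes documents are spread evenly across clusters and shards):

```python
TOTAL_DOCS = 4_000_000_000   # target corpus size from the question
TOTAL_NODES = 16
SHARDS_PER_NODE = 8

docs_per_shard = {}
for clusters in (1, 2, 4):
    nodes = TOTAL_NODES // clusters
    shards = nodes * SHARDS_PER_NODE            # shards in one cluster
    cluster_docs = TOTAL_DOCS // clusters       # assumes an even split of data
    docs_per_shard[clusters] = cluster_docs // shards
    print(f"{clusters} cluster(s): {nodes} nodes, {shards} shards each, "
          f"{docs_per_shard[clusters]:,} docs/shard")
```

Per-shard load works out identical (about 31M docs/shard) in all three layouts, so the real tradeoffs are operational: failure blast radius, rebalancing effort, and whether queries have to span clusters.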


Re: Solr filter query on text fields

2019-06-25 Thread Erick Erickson
That implies that you’re using two different query parsers, one for the “q” 
portion and one for the “fq” portion. My guess is that you have /select or 
/query in solrconfig.xml configured to use (e)dismax, but your fq clause is 
being parsed by the default Lucene query parser.

You can specify the parser via local params, i.e. fq={!edismax blah blah}query 
terms.

Add debug=query to your request and you’ll see exactly which parsers are used 
for which parts of the query.

Best,
Erick
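Putting both tips together, a request that forces edismax for the fq clause and shows which parsers ran might look like this (a sketch; the qf field name is an assumption):

```text
q=ice cream&fq={!edismax qf=description}ice cream&debug=query
```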

> On Jun 25, 2019, at 11:30 AM, Wei  wrote:
> 
> Thanks Erick for the clarification.  How does ps work for fq?  I
> configured ps=4 for q, but it doesn't apply to fq. For phrase queries in
> fq it seems ps=0 is used. Is there a way to configure it for fq as well?
> 
> Best,
> Wei
> 
> On Tue, Jun 25, 2019 at 9:51 AM Erick Erickson 
> wrote:
> 
>> q and fq do _exactly_ the same thing in terms of query parsing, subject to
>> all the same conditions.
>> 
>> There are two things that apply to fq clauses that have nothing to do with
>> the query _parsing_.
>> 1> there is no scoring, so it’s cheaper from that perspective
>> 2> the results are cached in a bitmap and can be re-used later
>> 
>> Best,
>> Erick
>> 
>>> On Jun 24, 2019, at 7:06 PM, Wei  wrote:
>>> 
>>> Thanks Shawn! I didn't notice the asterisks are created during
>> copy/paste,
>>> one lesson learned :)
>>> Does that mean when fq is applied to text fields,  it is doing text match
>>> in the field just like q in a query field?  While for string fields, it
>> is
>>> exact match.
>>> If it is a phrase query,  what are the values for related parameters such
>> as
>>> ps?
>>> 
>>> Thanks,
>>> Wei
>>> 
>>> On Mon, Jun 24, 2019 at 4:51 PM Shawn Heisey 
>> wrote:
>>> 
 On 6/24/2019 5:37 PM, Wei wrote:
> >>> stored="true"/>
 
 I'm assuming that the asterisks here are for emphasis, that they are not
 actually present.  This can be very confusing.  It is far better to
 relay the precise information and not try to emphasize anything.
 
> For query q=*:*&fq=description:”ice cream”, the filter query returns
> matches for “ice cream bar” and “vanilla ice cream”, but does not match
> for “ice cold cream”.
> 
> The results seem neither exact match nor phrase match. What's the
 expected
> behavior for fq on text fields?  I have tried to look into the solr
>> docs
> but there is no clear explanation.
 
 If the quotes are present in what you actually sent to Solr, then that
 IS a phrase query.  And that is why it did not match your third example.
 
 Try one of these instead:
 
 q=*:*&fq=description:(ice cream)
 
 q=*:*&fq=description:ice description:cream
 
 Thanks,
 Shawn
 
>> 
>> 



Re: Encrypting Solr Index

2019-06-25 Thread Jörn Franke
Maybe in this scenario a Secure Enclave could make sense (e.g. Intel SGX)?

The scenario that you describe looks like MIT CryptDB, e.g. 
https://css.csail.mit.edu/cryptdb/



> Am 25.06.2019 um 21:05 schrieb Tim Casey :
> 
> My two cents worth of comment,
> 
> For our local lucene indexes we use AES encryption.  We encrypt the blocks
> on the way out, decrypt on the way in.
> We are using a C version of lucene, not the java version.  But, I suspect
> the same methodology could be applied.  This assumes the data at rest is
> the attack vector for discovering what is in the invertible index.  But
> allows for the indexing/querying to be done in the clear.  This would allow
> for stemming and the like.
> 
> If you have an attack vector in which the indexing/querying are not
> trusted, then you have a whole different set of problems.
> 
> To do stemming, you need a homomorphic encryption scheme which would allow
> per character/byte queries.  This is different type of attack vector than
> the on-disk encryption.  To me, this implies the query system itself is
> untrusted and you are indexing/querying encrypted content.  The first
> "thing" people are going to try  is to hash a token into a 256bit value
> which becomes the indexable token value.  This leads to the lack of
> stemming from above comments.  Depending on how keys are handled and hashes
> are generated you can run out of token space in the various underlying
> lucene indexes because you have more than 2 million tokens.
> 
> 
> 
>> On Tue, Jun 25, 2019 at 10:21 AM Ahuja, Sakshi  wrote:
>> 
>> I am actually looking for the best option so currently doing research on
>> it.
>> For Window's FS encryption I didn't find a way to use different
>> Username/Password. It by default takes window's username/password to
>> encrypt and decrypt.
>> 
>> I tried bitlocker too for creating encrypted virtual directory (Which
>> allows me to use different credentials) and to keep Solr Index in that but
>> somehow Solr Admin was unable to access Index from that encrypted
>> directory. Not sure how that is working.
>> 
>> If you have any idea on that, it will work for me. Thanks!
>> 
>> -Original Message-
>> From: Jörn Franke 
>> Sent: Tuesday, June 25, 2019 12:47 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Encrypting Solr Index
>> 
>> Why does FS encryption not serve your use case?
>> 
>> Can’t you apply it also for backups etc?
>> 
>>> Am 25.06.2019 um 17:32 schrieb Ahuja, Sakshi :
>>> 
>>> Hi,
>>> 
>>> I am using solr 6.6 and want to encrypt index for security reasons. I
>> have tried Windows FS encryption option that works but want to know if solr
>> has some inbuilt feature to encrypt index or any good way to encrypt solr
>> index?
>>> 
>>> Thanks,
>>> Sakshi
>> 


Re: Replication issue with version 0 index in SOLR 7.5

2019-06-25 Thread Patrick Bordelon
I removed the replicateAfter startup setting from our solrconfig.xml file.
However, that didn't solve the issue. When I rebuilt the primary, the
associated replicas all went to 0 documents.





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


bug in SolrInputDocument.toString()?

2019-06-25 Thread Mark Sholund
Hello,
First-time poster here, so sorry for any formatting problems. I am sorry if 
this has been asked before, but I've tried several versions of SolrJ 
(6.5.1-8.1.1) with the same result.

I am running the following example code and am seeing odd output.

String id = "foo123";
int popularity=1;
SolrInputDocument inputDocument = new SolrInputDocument();
System.out.println("creating document for " + id);
inputDocument.addField(ID_FIELD, id);
inputDocument.addField(POPULARITY_FIELD, Collections.singletonMap("set", 
popularity));
// System.out.println("document: " + inputDocument);
System.out.println("json: " + inputDocument.jsonStr());

This produces the output

creating document for foo123
document: {id=id=foo123, popularity_i=popularity_i={set=1}}
json: {"id":"id=foo123","popularity_i":"popularity_i={set=1}"}

I cannot see anything that I am doing wrong and the update succeeds.  Is this 
just a bug in the toString() and jsonStr() methods?



Re: Encrypting Solr Index

2019-06-25 Thread Tim Casey
My two cents worth of comment,

For our local lucene indexes we use AES encryption.  We encrypt the blocks
on the way out, decrypt on the way in.
We are using a C version of lucene, not the java version.  But, I suspect
the same methodology could be applied.  This assumes the data at rest is
the attack vector for discovering what is in the invertible index.  But
allows for the indexing/querying to be done in the clear.  This would allow
for stemming and the like.

If you have an attack vector in which the indexing/querying are not
trusted, then you have a whole different set of problems.

To do stemming, you need a homomorphic encryption scheme which would allow
per character/byte queries.  This is different type of attack vector than
the on-disk encryption.  To me, this implies the query system itself is
untrusted and you are indexing/querying encrypted content.  The first
"thing" people are going to try  is to hash a token into a 256bit value
which becomes the indexable token value.  This leads to the lack of
stemming from above comments.  Depending on how keys are handled and hashes
are generated you can run out of token space in the various underlying
lucene indexes because you have more than 2 million tokens.
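The hashed-token scheme described above can be sketched as follows (a toy illustration; the HMAC key and token set are assumptions, not from the thread):

```python
import hashlib
import hmac

KEY = b"per-user-secret"  # hypothetical key; a real system needs key management

def token_hash(token: str) -> str:
    """Hash a token into a 256-bit value that becomes the indexable term."""
    return hmac.new(KEY, token.encode("utf-8"), hashlib.sha256).hexdigest()

# Index the hashed forms instead of the clear-text tokens.
index = {token_hash(t) for t in ("run", "running", "runner", "runs")}

# Exact-term lookup still works:
print(token_hash("running") in index)   # True

# ...but related words hash to unrelated values, so prefix/wildcard
# queries like run* cannot work over the hashed terms.
print(token_hash("run")[:12], token_hash("running")[:12])
```

This makes the loss of stemming concrete: nothing about the hash of "run" relates it to the hash of "running", which is exactly the point of the hash, and exactly why run* stops working.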



On Tue, Jun 25, 2019 at 10:21 AM Ahuja, Sakshi  wrote:

> I am actually looking for the best option so currently doing research on
> it.
> For Window's FS encryption I didn't find a way to use different
> Username/Password. It by default takes window's username/password to
> encrypt and decrypt.
>
> I tried bitlocker too for creating encrypted virtual directory (Which
> allows me to use different credentials) and to keep Solr Index in that but
> somehow Solr Admin was unable to access Index from that encrypted
> directory. Not sure how that is working.
>
> If you have any idea on that, it will work for me. Thanks!
>
> -Original Message-
> From: Jörn Franke 
> Sent: Tuesday, June 25, 2019 12:47 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Encrypting Solr Index
>
> Why does FS encryption not serve your use case?
>
> Can’t you apply it also for backups etc?
>
> > Am 25.06.2019 um 17:32 schrieb Ahuja, Sakshi :
> >
> > Hi,
> >
> > I am using solr 6.6 and want to encrypt index for security reasons. I
> have tried Windows FS encryption option that works but want to know if solr
> has some inbuilt feature to encrypt index or any good way to encrypt solr
> index?
> >
> > Thanks,
> > Sakshi
>


Re: Solr filter query on text fields

2019-06-25 Thread Wei
Thanks Erick for the clarification.  How does ps work for fq?  I
configured ps=4 for q, but it doesn't apply to fq. For phrase queries in
fq it seems ps=0 is used. Is there a way to configure it for fq as well?

Best,
Wei

On Tue, Jun 25, 2019 at 9:51 AM Erick Erickson 
wrote:

> q and fq do _exactly_ the same thing in terms of query parsing, subject to
> all the same conditions.
>
> There are two things that apply to fq clauses that have nothing to do with
> the query _parsing_.
> 1> there is no scoring, so it’s cheaper from that perspective
> 2> the results are cached in a bitmap and can be re-used later
>
> Best,
> Erick
>
> > On Jun 24, 2019, at 7:06 PM, Wei  wrote:
> >
> > Thanks Shawn! I didn't notice the asterisks are created during
> copy/paste,
> > one lesson learned :)
> > Does that mean when fq is applied to text fields,  it is doing text match
> > in the field just like q in a query field?  While for string fields, it
> is
> > exact match.
> > If it is a phrase query,  what are the values for related parameters such
> as
> > ps?
> >
> > Thanks,
> > Wei
> >
> > On Mon, Jun 24, 2019 at 4:51 PM Shawn Heisey 
> wrote:
> >
> >> On 6/24/2019 5:37 PM, Wei wrote:
> >>>  >> stored="true"/>
> >>
> >> I'm assuming that the asterisks here are for emphasis, that they are not
> >> actually present.  This can be very confusing.  It is far better to
> >> relay the precise information and not try to emphasize anything.
> >>
> >>> For query q=*:*&fq=description:”ice cream”,  the filter query returns
> >>> matches for “ice cream bar”  and “vanilla ice cream” , but does not
> match
> >>> for “ice cold cream”.
> >>>
> >>> The results seem neither exact match nor phrase match. What's the
> >> expected
> >>> behavior for fq on text fields?  I have tried to look into the solr
> docs
> >>> but there is no clear explanation.
> >>
> >> If the quotes are present in what you actually sent to Solr, then that
> >> IS a phrase query.  And that is why it did not match your third example.
> >>
> >> Try one of these instead:
> >>
> >> q=*:*&fq=description:(ice cream)
> >>
> >> q=*:*&fq=description:ice description:cream
> >>
> >> Thanks,
> >> Shawn
> >>
>
>


Solr 7.7.2 - Autoscaling in new cluster ignoring sysprop rules, possibly all rules

2019-06-25 Thread Andrew Kettmann
Using the Docker 7.7.2 image.


Solr 7.7.2 on a new znode in ZK. Created the chroot using solr zk mkroot.


Created a policy:

{'set-policy': {'banana': [{'replica': '#ALL',
'sysprop.HELM_CHART': 'notbanana'}]}}


No errors on creation of the policy.


I have no nodes with that value for the system property "HELM_CHART"; my 
nodes have only "banana" or "rulesos" for that property.


I create the collection with a call to the /admin/collections:

{'action': 'CREATE',
 'collection.configName': 'project-solr-7',
 'name': 'banana',
 'numShards': '2',
 'policy': 'banana',
 'replicationFactor': '2'}


and it creates the collection without an error, whereas I expected the 
collection creation to fail. That is the behavior I had seen in the past, but 
after tearing down and recreating the cluster in a higher environment, the 
policy no longer appears to be enforced.


Is there some prerequisite before policies will be respected? The .system 
collection is in place as expected, and I am not seeing anything in the logs on 
the overseer to suggest any problems.

Andrew Kettmann
DevOps Engineer




RE: Encrypting Solr Index

2019-06-25 Thread Ahuja, Sakshi
I am actually looking for the best option, so I am currently doing research 
on it. For Windows FS encryption I didn't find a way to use a different 
username/password; by default it uses the Windows username/password to 
encrypt and decrypt.

I tried BitLocker too, creating an encrypted virtual directory (which allows 
me to use different credentials) and keeping the Solr index in it, but 
somehow the Solr Admin was unable to access the index from that encrypted 
directory. Not sure why that is.

If you have any idea on that, it will work for me. Thanks!

-Original Message-
From: Jörn Franke  
Sent: Tuesday, June 25, 2019 12:47 PM
To: solr-user@lucene.apache.org
Subject: Re: Encrypting Solr Index

Why does FS encryption not serve your use case?

Can’t you apply it also for backups etc?

> Am 25.06.2019 um 17:32 schrieb Ahuja, Sakshi :
> 
> Hi,
> 
> I am using solr 6.6 and want to encrypt index for security reasons. I have 
> tried Windows FS encryption option that works but want to know if solr has 
> some inbuilt feature to encrypt index or any good way to encrypt solr index?
> 
> Thanks,
> Sakshi


Re: Solr filter query on text fields

2019-06-25 Thread Erick Erickson
q and fq do _exactly_ the same thing in terms of query parsing, subject to all 
the same conditions.

There are two things that apply to fq clauses that have nothing to do with the 
query _parsing_.
1> there is no scoring, so it’s cheaper from that perspective
2> the results are cached in a bitmap and can be re-used later
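Point 2> can be illustrated with a toy model (not Solr's actual implementation; the substring-matching function is a stand-in for real query parsing):

```python
# Toy model: the fq is parsed/matched exactly like q, but its result is an
# unscored doc-id set that is cached and intersected with the q results.
filter_cache: dict = {}

def matching_docs(query: str, docs: dict) -> set:
    # Stand-in for real parsing/matching: substring match on the stored text.
    return {doc_id for doc_id, text in docs.items() if query in text}

def search(q: str, fq: str, docs: dict) -> set:
    if fq not in filter_cache:                        # first request: compute it
        filter_cache[fq] = frozenset(matching_docs(fq, docs))
    return matching_docs(q, docs) & filter_cache[fq]  # reused on later requests

docs = {1: "ice cream bar", 2: "vanilla ice cream", 3: "ice cold cream"}
print(sorted(search("cream", "ice cream", docs)))   # [1, 2]
```

Repeating the same fq on later requests skips recomputation entirely, which is why moving stable restrictions from q into fq is a common performance win.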

Best,
Erick

> On Jun 24, 2019, at 7:06 PM, Wei  wrote:
> 
> Thanks Shawn! I didn't notice the asterisks are created during copy/paste,
> one lesson learned :)
> Does that mean when fq is applied to text fields,  it is doing text match
> in the field just like q in a query field?  While for string fields, it is
> exact match.
> If it is a phrase query,  what are the values for related parameters such as
> ps?
> 
> Thanks,
> Wei
> 
> On Mon, Jun 24, 2019 at 4:51 PM Shawn Heisey  wrote:
> 
>> On 6/24/2019 5:37 PM, Wei wrote:
>>> > stored="true"/>
>> 
>> I'm assuming that the asterisks here are for emphasis, that they are not
>> actually present.  This can be very confusing.  It is far better to
>> relay the precise information and not try to emphasize anything.
>> 
>>> For query q=*:*&fq=description:”ice cream”,  the filter query returns
>>> matches for “ice cream bar”  and “vanilla ice cream” , but does not match
>>> for “ice cold cream”.
>>> 
>>> The results seem neither exact match nor phrase match. What's the
>> expected
>>> behavior for fq on text fields?  I have tried to look into the solr docs
>>> but there is no clear explanation.
>> 
>> If the quotes are present in what you actually sent to Solr, then that
>> IS a phrase query.  And that is why it did not match your third example.
>> 
>> Try one of these instead:
>> 
>> q=*:*&fq=description:(ice cream)
>> 
>> q=*:*&fq=description:ice description:cream
>> 
>> Thanks,
>> Shawn
>> 



Re: Encrypting Solr Index

2019-06-25 Thread Jörn Franke
Why does FS encryption not serve your use case?

Can’t you apply it also for backups etc?

> Am 25.06.2019 um 17:32 schrieb Ahuja, Sakshi :
> 
> Hi,
> 
> I am using solr 6.6 and want to encrypt index for security reasons. I have 
> tried Windows FS encryption option that works but want to know if solr has 
> some inbuilt feature to encrypt index or any good way to encrypt solr index?
> 
> Thanks,
> Sakshi


Re: Encrypting Solr Index

2019-06-25 Thread Erick Erickson
This is a recurring issue. The Hitachi solution will encrypt individual 
_tokens_ in the index, even with different keys for different users. However, 
the price is functionality.

Take wildcards. The Hitachi solution doesn’t solve this; the problem is 
basically intractable. Consider the words run, running, runner, and runs. A 
search for run* has to match all those words, and an encryption algorithm that 
encodes the first three letters identically is trivially breakable.

People do as you are: put the index on an encrypting filesystem if 
encryption-at-rest is sufficient. My personal take is that if a hacker has 
unrestricted access to the memory on your Solr servers and could read the 
unencrypted index, Solr is only one of many problems you have.

Best,
Erick

> On Jun 25, 2019, at 8:40 AM, Alexandre Rafalovitch  wrote:
> 
> No index encryption in the box. I am aware of a commercial solution but no
> details on how good or what the price is:
> https://www.hitachi-solutions.com/securesearch/
> 
> Regards,
>Alex
> 
> On Tue, Jun 25, 2019, 11:32 AM Ahuja, Sakshi,  wrote:
> 
>> Hi,
>> 
>> I am using solr 6.6 and want to encrypt index for security reasons. I have
>> tried Windows FS encryption option that works but want to know if solr has
>> some inbuilt feature to encrypt index or any good way to encrypt solr index?
>> 
>> Thanks,
>> Sakshi
>> 



Re: Encrypting Solr Index

2019-06-25 Thread Alexandre Rafalovitch
No index encryption in the box. I am aware of a commercial solution but no
details on how good or what the price is:
https://www.hitachi-solutions.com/securesearch/

Regards,
Alex

On Tue, Jun 25, 2019, 11:32 AM Ahuja, Sakshi,  wrote:

> Hi,
>
> I am using solr 6.6 and want to encrypt index for security reasons. I have
> tried Windows FS encryption option that works but want to know if solr has
> some inbuilt feature to encrypt index or any good way to encrypt solr index?
>
> Thanks,
> Sakshi
>


Encrypting Solr Index

2019-06-25 Thread Ahuja, Sakshi
Hi,

I am using solr 6.6 and want to encrypt index for security reasons. I have 
tried Windows FS encryption option that works but want to know if solr has some 
inbuilt feature to encrypt index or any good way to encrypt solr index?

Thanks,
Sakshi


Re: Replication issue with version 0 index in SOLR 7.5

2019-06-25 Thread Mikhail Khludnev
OK, probably dropping the replicateAfter startup setting will help. Another
idea: set replication.enable.master=false and enable it once the master index
is built after restart.

On Tue, Jun 25, 2019 at 6:18 PM Patrick Bordelon <
patrick.borde...@coxautoinc.com> wrote:

> We are currently using the replicate after commit and startup
>
> <lst name="master">
>   <str name="enable">${replication.enable.master:false}</str>
>   <str name="replicateAfter">commit</str>
>   <str name="replicateAfter">startup</str>
>   <str name="confFiles">schema.xml,stopwords.txt</str>
> </lst>
>
>
>
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Replication issue with version 0 index in SOLR 7.5

2019-06-25 Thread Patrick Bordelon
We are currently using replicateAfter commit and startup:


<lst name="master">
  <str name="enable">${replication.enable.master:false}</str>
  <str name="replicateAfter">commit</str>
  <str name="replicateAfter">startup</str>
  <str name="confFiles">schema.xml,stopwords.txt</str>
</lst>






Re: Solr 8.0.0 Customized Indexing

2019-06-25 Thread Alexandre Rafalovitch
You have a couple of options to delete:
1) Explicit delete request
2) Expiration management:
https://lucene.apache.org/solr/8_1_0//solr-core/org/apache/solr/update/processor/DocExpirationUpdateProcessorFactory.html
3) If you are indexing in clear batches (e.g. monthly, keeping the last 3
months), you could look at
https://lucene.apache.org/solr/guide/8_0/time-routed-aliases.html . Or just
manual alias management to the same effect.
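For option 1), a delete-by-query could be sent to the collection's /update handler (a sketch; the newsdata collection and date_upload field come from this thread, the exact request is an assumption):

```xml
<!-- POST to http://localhost:8983/solr/newsdata/update?commit=true
     with Content-Type: text/xml -->
<delete>
  <query>date_upload:[* TO NOW-30DAYS]</query>
</delete>
```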

Regards,
   Alex.

On Tue, Jun 25, 2019, 7:04 AM Anuj Bhargava,  wrote:

> Then I'll need to delete records more than 30 days old. I was wondering if I
> could add something in the cron script itself.
>
> Regards,
>
> Anuj
>
> On Tue, 25 Jun 2019 at 16:11, Vadim Ivanov <
> vadim.iva...@spb.ntk-intourist.ru> wrote:
>
> >
> > ... and clean=false  if you want to index just new records and keep old
> > ones.
> > --
> > Vadim
> >
> >
> > > -Original Message-
> > > From: Jan Høydahl [mailto:jan@cominvent.com]
> > > Sent: Tuesday, June 25, 2019 10:48 AM
> > > To: solr-user
> > > Subject: Re: Solr 8.0.0 Customized Indexing
> > >
> > > Adjust your SQL (located in data-config.xml) to extract just what you
> > need
> > > (add a WHERE clause)
> > >
> > > --
> > > Jan Høydahl, search solution architect
> > > Cominvent AS - www.cominvent.com
> > >
> > > > 25. jun. 2019 kl. 07:23 skrev Anuj Bhargava :
> > > >
> > > > Customized Indexing date specific
> > > >
> > > > We have a huge database of more than 10 years. How can I index just
> > some
> > > of
> > > > the records - say for last 30 days. One of the fields in the database
> > is
> > > > *date_upload* which contains the date when the record was uploaded.
> > > >
> > > > Currently using Cron to index -
> > > curl -q "http://localhost:8983/solr/newsdata/dataimport?command=full-import&clean=true&commit=true" > /dev/null 2>&1
> >
> >
> >
>


Re: Replication issue with version 0 index in SOLR 7.5

2019-06-25 Thread Mikhail Khludnev
Note, it seems like the current Solr's logic relies on persistent master
disks.
https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/handler/TestReplicationHandler.java#L615


On Tue, Jun 25, 2019 at 3:16 PM Mikhail Khludnev  wrote:

> Hello, Patrick.
> Can commit help you?
>
> On Tue, Jun 25, 2019 at 12:55 AM Patrick Bordelon <
> patrick.borde...@coxautoinc.com> wrote:
>
>> Hi,
>>
>> We recently upgraded to SOLR 7.5 in AWS, we had previously been running
>> SOLR
>> 6.5. In our current configuration we have our applications broken into a
>> single instance primary environment and a multi-instance replica
>> environment
>> separated behind a load balancer for each environment.
>>
>> Until recently we've been able to reload the primary without the replicas
>> updating until there was a full index. However when we upgraded to 7.5 we
>> started noticing that after terminating and rebuilding a primary instance
>> that the associated replicas would all start showing 0 documents in all
>> indexes. After some research we believe we've tracked down the issue.
>> SOLR-11293.
>>
>> SOLR-11293 changes
>> <
>> https://issues.apache.org/jira/browse/SOLR-11293?focusedCommentId=16182379=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16182379>
>>
>>
>> This fix changed the way the replication handler checks before updating a
>> replica when the primary has an empty index. Whether it's from deleting
>> the
>> old index or from terminating the instance.
>>
>> This is the code as it was in 6.5 replication handler
>>
>>   if (latestVersion == 0L) {
>> if (forceReplication && commit.getGeneration() != 0) {
>>   // since we won't get the files for an empty index,
>>   // we just clear ours and commit
>>   RefCounted<IndexWriter> iw =
>> solrCore.getUpdateHandler().getSolrCoreState().getIndexWriter(solrCore);
>>   try {
>> iw.get().deleteAll();
>>   } finally {
>> iw.decref();
>>   }
>>   SolrQueryRequest req = new LocalSolrQueryRequest(solrCore, new
>> ModifiableSolrParams());
>>   solrCore.getUpdateHandler().commit(new CommitUpdateCommand(req,
>> false));
>> }
>>
>>
>> Without forced replication the index on the replica won't perform the
>> deleteAll operation and will keep the old index until a new index version
>> is
>> created.
>>
>> However in 7.5 the code was changed to this.
>>
>>   if (latestVersion == 0L) {
>> if (commit.getGeneration() != 0) {
>>   // since we won't get the files for an empty index,
>>   // we just clear ours and commit
>>   log.info("New index in Master. Deleting mine...");
>>   RefCounted<IndexWriter> iw =
>> solrCore.getUpdateHandler().getSolrCoreState().getIndexWriter(solrCore);
>>   try {
>> iw.get().deleteAll();
>>   } finally {
>> iw.decref();
>>   }
>>   assert TestInjection.injectDelayBeforeSlaveCommitRefresh();
>>   if (skipCommitOnMasterVersionZero) {
>> openNewSearcherAndUpdateCommitPoint();
>>   } else {
>> SolrQueryRequest req = new LocalSolrQueryRequest(solrCore, new
>> ModifiableSolrParams());
>> solrCore.getUpdateHandler().commit(new
>> CommitUpdateCommand(req,
>> false));
>>   }
>> }
>>
>> With the removal of the forceReplication check we believe the replica
>> always
>> deletes its index when it detects that a new version 0 index is created.
>>
>> This is a problem as we can't afford to have active replicas to have 0
>> documents on them in the event of a failure of the primary. Since we can't
>> control the termination on AWS instances this opens up a problem as any
>> primary outage has a chance of jeopardizing the replicas' viability.
>>
>> Is there a way to restore this functionality in the current or future
>> releases? We are willing to upgrade to a later version including the
>> latest
>> if it will help resolve this problem.
>>
>> If you suggest we use a load balancer health check to prevent this we
>> already are. However the load balancer type we are using (application)
>> has a
>> feature that allows access through it when all instances under it are
>> failing. This bypasses our health check and still allows the replicas to
>> poll from the primary even when it's not fully loaded. We can't change
>> load
>> balancer types as there are other features that we are taking advantage of
>> and can't change currently.
>>
>>
>>
>>
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Sincerely yours
Mikhail Khludnev
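The 6.5-vs-7.5 difference in the code quoted above can be modeled as a simplified decision function (a sketch, not the actual Solr code):

```python
def replica_wipes_index(latest_version: int, generation: int,
                        force_replication: bool, solr_6_5: bool) -> bool:
    """Simplified model of the version-0 branch in the replication handler."""
    if latest_version != 0:
        return False                      # master has a real index: normal fetch
    if solr_6_5:
        # 6.5: only wipe the replica when replication was explicitly forced
        return force_replication and generation != 0
    # 7.5 (after SOLR-11293): wipe whenever the master reports an empty index
    return generation != 0

# A rebuilt/terminated master reports version 0 on an ordinary (unforced) poll:
print(replica_wipes_index(0, 5, False, solr_6_5=True))    # False: 6.5 keeps docs
print(replica_wipes_index(0, 5, False, solr_6_5=False))   # True: 7.5 deletes all
```

This captures Patrick's complaint: the removed force_replication guard is exactly what used to keep replicas intact across an unforced poll of an empty master.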


Re: Replication issue with version 0 index in SOLR 7.5

2019-06-25 Thread Mikhail Khludnev
Hello, Patrick.
Can commit help you?

On Tue, Jun 25, 2019 at 12:55 AM Patrick Bordelon <
patrick.borde...@coxautoinc.com> wrote:

> Hi,
>
> We recently upgraded to SOLR 7.5 in AWS, we had previously been running
> SOLR
> 6.5. In our current configuration we have our applications broken into a
> single instance primary environment and a multi-instance replica
> environment
> separated behind a load balancer for each environment.
>
> Until recently we've been able to reload the primary without the replicas
> updating until there was a full index. However when we upgraded to 7.5 we
> started noticing that after terminating and rebuilding a primary instance
> that the associated replicas would all start showing 0 documents in all
> indexes. After some research we believe we've tracked down the issue.
> SOLR-11293.
>
> SOLR-11293 changes
> <
> https://issues.apache.org/jira/browse/SOLR-11293?focusedCommentId=16182379=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16182379>
>
>
> This fix changed the way the replication handler checks before updating a
> replica when the primary has an empty index. Whether it's from deleting the
> old index or from terminating the instance.
>
> This is the code as it was in 6.5 replication handler
>
>   if (latestVersion == 0L) {
> if (forceReplication && commit.getGeneration() != 0) {
>   // since we won't get the files for an empty index,
>   // we just clear ours and commit
>   RefCounted<IndexWriter> iw =
> solrCore.getUpdateHandler().getSolrCoreState().getIndexWriter(solrCore);
>   try {
> iw.get().deleteAll();
>   } finally {
> iw.decref();
>   }
>   SolrQueryRequest req = new LocalSolrQueryRequest(solrCore, new
> ModifiableSolrParams());
>   solrCore.getUpdateHandler().commit(new CommitUpdateCommand(req,
> false));
> }
>
>
> Without forced replication the index on the replica won't perform the
> deleteAll operation and will keep the old index until a new index version
> is
> created.
>
> However in 7.5 the code was changed to this.
>
>   if (latestVersion == 0L) {
> if (commit.getGeneration() != 0) {
>   // since we won't get the files for an empty index,
>   // we just clear ours and commit
>   log.info("New index in Master. Deleting mine...");
>   RefCounted<IndexWriter> iw =
> solrCore.getUpdateHandler().getSolrCoreState().getIndexWriter(solrCore);
>   try {
> iw.get().deleteAll();
>   } finally {
> iw.decref();
>   }
>   assert TestInjection.injectDelayBeforeSlaveCommitRefresh();
>   if (skipCommitOnMasterVersionZero) {
> openNewSearcherAndUpdateCommitPoint();
>   } else {
> SolrQueryRequest req = new LocalSolrQueryRequest(solrCore, new
> ModifiableSolrParams());
> solrCore.getUpdateHandler().commit(new CommitUpdateCommand(req,
> false));
>   }
> }
>
> With the removal of the forceReplication check we believe the replica
> always
> deletes its index when it detects that a new version 0 index is created.
>
> This is a problem as we can't afford to have active replicas to have 0
> documents on them in the event of a failure of the primary. Since we can't
> control the termination on AWS instances this opens up a problem as any
> primary outage has a chance of jeopardizing the replicas' viability.
>
> Is there a way to restore this functionality in the current or future
> releases? We are willing to upgrade to a later version including the latest
> if it will help resolve this problem.
>
> If you suggest we use a load balancer health check to prevent this we
> already are. However the load balancer type we are using (application) has
> a
> feature that allows access through it when all instances under it are
> failing. This bypasses our health check and still allows the replicas to
> poll from the primary even when it's not fully loaded. We can't change load
> balancer types as there are other features that we are taking advantage of
> and can't change currently.
>
>
>
>
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Solr 8.0.0 Customized Indexing

2019-06-25 Thread Anuj Bhargava
Then I'll need to delete records more than 30 days old. I was wondering if I
could add something in the cron script itself.
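If it helps, one way to do that cleanup from cron is a delete-by-query on the
date_upload field. A minimal sketch only (the host, port, and core name are
assumed to match the curl command quoted below; this is untested here):

```shell
# Hypothetical cron helper: remove documents uploaded more than 30 days ago.
# NOW-30DAYS is Solr date math, evaluated by the server at request time.
SOLR_UPDATE="http://localhost:8983/solr/newsdata/update"
DELETE_QUERY='<delete><query>date_upload:[* TO NOW-30DAYS]</query></delete>'
echo "$DELETE_QUERY"  # the payload that would be POSTed
# curl -s "$SOLR_UPDATE?commit=true" -H 'Content-Type: text/xml' \
#   --data-binary "$DELETE_QUERY"
```

The actual curl line is left commented out since it needs a running Solr to do
anything.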

Regards,

Anuj

On Tue, 25 Jun 2019 at 16:11, Vadim Ivanov <
vadim.iva...@spb.ntk-intourist.ru> wrote:

>
> ... and clean=false if you want to index just new records and keep old
> ones.
> --
> Vadim
>
>
> > -Original Message-
> > From: Jan Høydahl [mailto:jan@cominvent.com]
> > Sent: Tuesday, June 25, 2019 10:48 AM
> > To: solr-user
> > Subject: Re: Solr 8.0.0 Customized Indexing
> >
> > Adjust your SQL (located in data-config.xml) to extract just what you
> need
> > (add a WHERE clause)
> >
> > --
> > Jan Høydahl, search solution architect
> > Cominvent AS - www.cominvent.com
> >
> > > 25. jun. 2019 kl. 07:23 skrev Anuj Bhargava :
> > >
> > > Customized Indexing date specific
> > >
> > > We have a huge database of more than 10 years. How can I index just
> some
> > of
> > > the records - say for last 30 days. One of the fields in the database
> is
> > > *date_upload* which contains the date when the record was uploaded.
> > >
> > > Currently using Cron to index -
> > > curl -q
> > > http://localhost:8983/solr/newsdata/dataimport?command=full-import&clean=true&commit=true
> > >> /dev/null 2>&1
>
>
>


RE: Solr 8.0.0 Customized Indexing

2019-06-25 Thread Vadim Ivanov


... and clean=false if you want to index just new records and keep old ones.
-- 
Vadim


> -Original Message-
> From: Jan Høydahl [mailto:jan@cominvent.com]
> Sent: Tuesday, June 25, 2019 10:48 AM
> To: solr-user
> Subject: Re: Solr 8.0.0 Customized Indexing
> 
> Adjust your SQL (located in data-config.xml) to extract just what you need
> (add a WHERE clause)
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
> > 25. jun. 2019 kl. 07:23 skrev Anuj Bhargava :
> >
> > Customized Indexing date specific
> >
> > We have a huge database of more than 10 years. How can I index just some
> of
> > the records - say for last 30 days. One of the fields in the database is
> > *date_upload* which contains the date when the record was uploaded.
> >
> > Currently using Cron to index -
> > curl -q
> > http://localhost:8983/solr/newsdata/dataimport?command=full-import&clean=true&commit=true
> >> /dev/null 2>&1




Re: Solr 8.0.0 Customized Indexing

2019-06-25 Thread Jan Høydahl
Adjust your SQL (located in data-config.xml) to extract just what you need (add 
a WHERE clause)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 25. jun. 2019 kl. 07:23 skrev Anuj Bhargava :
> 
> Customized Indexing date specific
> 
> We have a huge database of more than 10 years. How can I index just some of
> the records - say for last 30 days. One of the fields in the database is
> *date_upload* which contains the date when the record was uploaded.
> 
> Currently using Cron to index -
> curl -q
> http://localhost:8983/solr/newsdata/dataimport?command=full-import&clean=true&commit=true
>> /dev/null 2>&1



Jdbc driver issue on cloud

2019-06-25 Thread Midas A
Hi,
I am using a streaming expression and getting the following error:

Failed to open JDBC connection

{
  "result-set":{
"docs":[{
"EXCEPTION":"Failed to open JDBC connection to
'jdbc:mysql://localhost/users?user=root&password=solr'",
"EOF":true,
"RESPONSE_TIME":99}]}}
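(For reference, this error often indicates that the JDBC driver jar is not on
Solr's classpath, or that the connection string is malformed. Below is a rough
sketch of building and submitting a jdbc() streaming expression from the
shell; the collection name and driver class are assumptions, not taken from
this thread.)

```shell
# Hypothetical: assemble a jdbc() streaming expression and POST it to /stream.
# The MySQL driver jar must already be on Solr's classpath for this to work.
EXPR='jdbc(connection="jdbc:mysql://localhost/users?user=root&password=solr",
  sql="SELECT id FROM users ORDER BY id",
  sort="id asc",
  driver="com.mysql.cj.jdbc.Driver")'
echo "$EXPR"  # inspect the expression before sending it
# curl --data-urlencode "expr=$EXPR" http://localhost:8983/solr/users/stream
```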


RE: highlighting not working as expected

2019-06-25 Thread Martin Frank Hansen (MHQ)
Hi again,

I have tested a bit, and I was wondering whether the highlighter requires a
field to be of type "text". Whenever I try highlighting on fields of type
"string", nothing gets returned.

Best regards

Martin


Internal - KMD A/S

-Original Message-
From: Jörn Franke 
Sent: 11. juni 2019 08:45
To: solr-user@lucene.apache.org
Subject: Re: highlighting not working as expected

Could it be a stop word ? What is the exact type definition of those fields? 
Could this word be omitted or with wrong encoding during loading of the 
documents?

> Am 03.06.2019 um 10:06 schrieb Martin Frank Hansen (MHQ) :
>
> Hi,
>
> I am having some difficulties making highlighting work. For some reason the 
> highlighting feature only works on some fields but not on other fields even 
> though these fields are stored.
>
> An example of a request looks like this: 
> http://localhost/solr/mytest/select?fl=id,doc.Type,Journalnummer,Sagstitel&hl.fl=Sagstitel&hl.simple.post=%3C/b%3E&hl.simple.pre=%3Cb%3E&hl=on&q=rotte
>
> It simply returns an empty set for all documents, even though I can see
> several documents whose "Sagstitel" contains the word "rotte"
> (rotte = rat). What am I missing here?
>
> I am using the standard highlighter as below.
>
>
> <searchComponent class="solr.HighlightComponent" name="highlight">
>   <highlighting>
>     <!-- default fragmenter -->
>     <fragmenter name="gap"
>                 default="true"
>                 class="solr.highlight.GapFragmenter">
>       <lst name="defaults">
>         <int name="hl.fragsize">100</int>
>       </lst>
>     </fragmenter>
>
>     <fragmenter name="regex"
>                 class="solr.highlight.RegexFragmenter">
>       <lst name="defaults">
>         <!-- slightly smaller fragsizes work better because of slop -->
>         <int name="hl.fragsize">70</int>
>         <!-- allow 50% slop on fragment sizes -->
>         <float name="hl.regex.slop">0.5</float>
>         <!-- a basic sentence pattern -->
>         <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
>       </lst>
>     </fragmenter>
>
>     <formatter name="html"
>                default="true"
>                class="solr.highlight.HtmlFormatter">
>       <lst name="defaults">
>         <str name="hl.simple.pre"><![CDATA[<b>]]></str>
>         <str name="hl.simple.post"><![CDATA[</b>]]></str>
>       </lst>
>     </formatter>
>
>     <encoder name="html"
>              class="solr.highlight.HtmlEncoder" />
>
>     <fragListBuilder name="simple"
>                      class="solr.highlight.SimpleFragListBuilder"/>
>
>     <fragListBuilder name="single"
>                      class="solr.highlight.SingleFragListBuilder"/>
>
>     <fragListBuilder name="weighted"
>                      default="true"
>                      class="solr.highlight.WeightedFragListBuilder"/>
>
>     <fragmentsBuilder name="default"
>                       default="true"
>                       class="solr.highlight.ScoreOrderFragmentsBuilder"/>
>
>     <fragmentsBuilder name="colored"
>                       class="solr.highlight.ScoreOrderFragmentsBuilder"/>
>
>     <boundaryScanner name="default"
>                      default="true"
>                      class="solr.highlight.SimpleBoundaryScanner">
>       <lst name="defaults">
>         <str name="hl.bs.maxScan">10</str>
>         <str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
>       </lst>
>     </boundaryScanner>
>
>     <boundaryScanner name="breakIterator"
>                      class="solr.highlight.BreakIteratorBoundaryScanner">
>       <lst name="defaults">
>         <str name="hl.bs.type">WORD</str>
>         <str name="hl.bs.language">da</str>
>       </lst>
>     </boundaryScanner>
>   </highlighting>
> </searchComponent>
>
> Hope that some one can help, thanks in advance.
>
> Best regards
> Martin
>
>
>
> Internal - KMD A/S
>
> Protection of your personal data is important to us. Here you can read KMD’s 
> Privacy Policy outlining how we process 
> your personal data.
>
> Please note that this message may contain confidential information. If you 
> have received this message by mistake, please inform the sender of the 
> mistake by sending a reply, then delete the message from your system without 
> making, distributing or retaining any copies of it. Although we believe that 
> the message and any attachments are free from viruses and other errors that 
> might affect the computer or it-system where it is received and read, the 
> recipient opens the message at his or her own risk. We assume no 
> responsibility for any loss or damage arising from the receipt or use of this 
> message.