Re: Query regarding Solr Cloud Setup

2019-09-06 Thread Erick Erickson
Ok, you can set it as a system property when starting Solr. Or you can change your
solrconfig.xml to either use the classic schema (schema.xml) or take the
add-unknown-fields update processor out of the update processor chain. You can also
set a cluster property, IIRC. Better to use one of the supported options...
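For reference, the first and last options Erick mentions look roughly like this (the host, port, and collection name are placeholders, and this is a sketch rather than something verified against your setup):

```shell
# Option 1: pass it as a system property when starting Solr
bin/solr start -c -Dupdate.autoCreateFields=false

# Option 2: set it per collection through the Config API
# (assumes Solr on localhost:8983 and a collection named "mycollection")
curl http://localhost:8983/solr/mycollection/config \
  -H 'Content-Type: application/json' \
  -d '{"set-user-property": {"update.autoCreateFields": "false"}}'
```

Both require a running Solr instance; the Config API route writes the property into the configoverlay stored in ZK, so it survives node restarts.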

On Fri, Sep 6, 2019, 05:22 Porritt, Ian  wrote:

> Hi Jörn/Erick/Shawn thanks for your responses.
>
> @Jörn - much appreciated for the heads up on Kerberos authentication; it's
> something we haven't really considered at the moment, though for production
> this may well be the case. With regards to the Solr nodes, 3 is something we
> are looking at as a minimum; when adding a new Solr node to the cluster, will
> settings/configuration be applied by Zookeeper on the new node, or is there
> manual intervention?
> @Erick - With regards to core.properties: on standard Solr,
> update.autoCreateFields=false is within the core.properties file, however
> for Cloud I have it added within solrconfig.xml, which gets uploaded to
> Zookeeper. I appreciate standard and cloud may work entirely differently; I
> just wanted to ensure it's the correct way of doing it.
> @Shawn - Will try the creation of the lib directory in Solr home to see if
> it gets picked up; having 5 Zookeepers would more than satisfy high
> availability.
>
>
> Regards
> Ian
>
> -Original Message-
> From: Jörn Franke 
>
> If you have a properly secured cluster, e.g. with Kerberos, then you should
> not update files in ZK directly. Use the corresponding Solr REST interfaces;
> then you are also less likely to mess something up.
>
> If you want HA you should have at least 3 Solr nodes and replicate
> the collection to all three of them (more is not needed from an HA point of
> view). This would also allow you to upgrade the cluster without downtime.
>
> -Original Message-
> From: Erick Erickson <erickerick...@gmail.com>
> Having custom core.properties files is “fraught”. First of all, that file
> can be re-written. Second, the collections ADDREPLICA command will create a
> new core.properties file. Third, any mistakes you make when hand-editing
> the file can have grave consequences.
>
> What change exactly do you want to make to core.properties and why?
>
> Trying to reproduce “what a colleague has done on standalone” is not
> something I’d recommend, SolrCloud is a different beast. Reproducing the
> _behavior_ is another thing, so what is the behavior you want in SolrCloud
> that causes you to want to customize core.properties?
>
> Best,
> Erick
>
> -Original Message-
> From: Shawn Heisey 
>
> I cannot tell what you are asking here.  The core.properties file lives
> on the disk, not in ZK.
>
> I was under the impression that .jar files could not be loaded into ZK
> and used in a core config.  Documentation saying otherwise was recently
> pointed out to me on the list, but I remain skeptical that this actually
> works, and I have not tried to implement it myself.
>
> The best way to handle custom jar loading is to create a "lib" directory
> under the solr home, and place all jars there.  Solr will automatically
> load them all before any cores are started, and no config commands of
> any kind will be needed to make it happen.
>
> > Also from a high availability aspect, if I effectively lost 2 of the Solr
> > Servers due to an outage will the system still work as expected? Would I
> > expect any data loss?
>
> If all three Solr servers have a complete copy of all your indexes, then
> you should remain fully operational if two of those Solr servers go down.
>
> Note that if you have three ZK servers and you lose two, that means that
> you have lost zookeeper quorum, and in that situation, SolrCloud will
> transition to read only -- you will not be able to change any index in
> the cloud.  This is how ZK is designed and it cannot be changed.  If you
> want a ZK deployment to survive the loss of two servers, you must have
> at least five total ZK servers, so more than 50 percent of the total
> survives.
>
> Thanks,
> Shawn
>


Re: Solr 7.7.2 Autoscaling policy - Poor performance

2019-09-06 Thread Noble Paul
It can't be considered a bug; it's just that there are too many
calculations involved, as there is a very large number of nodes. Any further
speed-up would require a change in the way it's calculated.

On Thu, Sep 5, 2019, 1:30 AM Andrew Kettmann 
wrote:

>
> > there are known perf issues in computing very large clusters
>
> Is there any documentation/open tickets on this that you have handy? If
> that is the case, then we might be back to looking at separate Znodes.
> Right now if we provide a nodeset on collection creation, it is creating
> them quickly. I don't want to make many changes as this is part of our
> production at this time.
>
>
>
>
>
> From: Noble Paul 
>
> Sent: Wednesday, September 4, 2019 12:14 AM
>
> To: solr-user@lucene.apache.org 
>
> Subject: Re: Solr 7.7.2 Autoscaling policy - Poor performance
>
>
>
>
> there are known perf issues in computing very large clusters
>
>
>
> give it a try with the following rules
>
>
>
> "FOO_CUSTOMER":[
>
>   {
>
> "replica":"0",
>
> "sysprop.HELM_CHART":"!FOO_CUSTOMER",
>
> "strict":"true"},
>
>   {
>
> "replica":"<2",
>
> "node":"#ANY",
>
> "strict":"false"}]
>
>
>
> On Wed, Sep 4, 2019 at 1:49 AM Andrew Kettmann
>
>  wrote:
>
> >
>
> > Currently our 7.7.2 cluster has ~600 hosts and each collection is using
> > an autoscaling policy based on a system property. Our goal is a single core
> > per host (container, running on K8S). However, as we have rolled more
> > containers/collections into the cluster, any creation/move actions are
> > taking a huge amount of time. In fact we generally hit the 180 second
> > timeout if we don't schedule it as async, though the action gets completed
> > anyway. Looking at the code, it looks like for each core it is considering
> > the entire cluster.
>
> >
>
> > Right now our autoscaling policies look like this, note we are feeding a
> sysprop on startup for each collection to map to specific containers:
>
> >
>
> > "FOO_CUSTOMER":[
>
> >   {
>
> > "replica":"#ALL",
>
> > "sysprop.HELM_CHART":"FOO_CUSTOMER",
>
> > "strict":"true"},
>
> >   {
>
> > "replica":"<2",
>
> > "node":"#ANY",
>
> > "strict":"false"}]
>
> >
>
> > Does name based filtering allow wildcards? Also, would that likely fix
> > the issue of the time it takes for Solr to decide where cores can go? Or
> > any other suggestions for making this more efficient on the Solr overseer?
> > We do have dedicated overseer nodes, but the leader maxes out CPU for a
> > while while it is thinking about this.
>
> >
>
> > We are considering putting each collection into its own zookeeper
> > znode/chroot if we can't support this many nodes per overseer. I would like
> > to avoid that if possible, but creating a collection in under 10 minutes
> > would be neat too.
>
> >
>
> > I appreciate any input/suggestions anyone has!
>
> >
>
> > Andrew Kettmann
> > DevOps Engineer
> > P: 1.314.596.2836
>
> >
>
>
>
>
>
>
>
>
> --
>
> -
>
> Noble Paul
>
>
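For completeness, a policy like the one Noble suggests would be applied through the autoscaling API along these lines (host and port are assumptions; a sketch only, for Solr 7.x):

```shell
curl -X POST http://localhost:8983/api/cluster/autoscaling \
  -H 'Content-Type: application/json' \
  -d '{"set-policy": {"FOO_CUSTOMER": [
        {"replica": "0", "sysprop.HELM_CHART": "!FOO_CUSTOMER", "strict": "true"},
        {"replica": "<2", "node": "#ANY", "strict": "false"}]}}'
```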


Re: SolrClient from inside processAdd function

2019-09-06 Thread Arnold Bronley
Hi Markus,

"Depending on cloudMode we create new SolrClient instances based on these
classes.   "

But I still do not see SolrClient creation anywhere in your code snippet.
Am I missing something? I tried the solution with system properties and it
works but I would like to avoid that.

On Thu, Sep 5, 2019 at 6:20 PM Markus Jelsma 
wrote:

> Hello Arnold,
>
> In the Factory's inform() method you receive a SolrCore reference. Using
> this you can get the CloudDescriptor and the ZkController references. These
> provide access to what you need to open a connection for SolrClient.
>
> Our plugins usually work in cloud and non-cloud environments, so we
> initialize different things for each situation. Like this abstracted in
> some CloudUtils thing:
>
> cloudDescriptor = core.getCoreDescriptor().getCloudDescriptor();
> zk = core.getCoreContainer().getZkController(); // this is the
> ZkController ref
> coreName = core.getCoreDescriptor().getName();
>
> // Are we in cloud mode?
> if (zk != null) {
>   collectionName = core.getCoreDescriptor().getCollectionName();
>   shardId = cloudDescriptor.getShardId();
> } else {
>   collectionName = null;
>   shardId = null;
> }
>
> Depending on cloudMode we create new SolrClient instances based on these
> classes.
>
> Check the apidocs and you'll quickly see what you need.
>
> We use these APIs to get what we need. But you can also find these things
> if you check the Java system properties, which is easier. We use the APIs
> to read the data because if the APIs change, we get a compile error; if the
> system properties change, we don't. So the system properties are easier, but
> the APIs are safer, though a unit test should guard against that as
> well.
>
> Regards,
> Markus
>
> ps, on this list there is normally no need to create a new thread for an
> existing one, even if you are eagerly waiting for a reply. It might take
> some patience though.
>
> -Original message-
> > From:Arnold Bronley 
> > Sent: Thursday 5th September 2019 18:44
> > To: solr-user@lucene.apache.org
> > Subject: Re: SolrClient from inside processAdd function
> >
> > Hi Markus,
> >
> > Is there any way to get the information about the current Solr endpoint
> > from within the custom URP?
> >
> > On Wed, Sep 4, 2019 at 3:10 PM Markus Jelsma  >
> > wrote:
> >
> > > Hello Arnold,
> > >
> > > Yes, we do this too for several cases.
> > >
> > > You can create the SolrClient in the Factory's inform() method, and
> pass
> > > it to the URP when it is created. You must implement SolrCoreAware and
> > > close the client when the core closes as well. Use a CloseHook for
> this.
> > >
> > > If you do not close the client, it will cause trouble if you run unit
> > > tests, and most certainly when you regularly reload cores.
> > >
> > > Regards,
> > > Markus
> > >
> > >
> > >
> > > -Original message-
> > > > From:Arnold Bronley 
> > > > Sent: Wednesday 4th September 2019 20:10
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: SolrClient from inside processAdd function
> > > >
> > > > I need to search some other collection inside processAdd function and
> > > > append that information to the indexing request.
> > > >
> > > > On Tue, Sep 3, 2019 at 7:55 PM Erick Erickson <
> erickerick...@gmail.com>
> > > > wrote:
> > > >
> > > > > This really sounds like an XY problem. What do you need the
> SolrClient
> > > > > _for_? I suspect there’s an easier way to do this…..
> > > > >
> > > > > Best,
> > > > > Erick
> > > > >
> > > > > > On Sep 3, 2019, at 6:17 PM, Arnold Bronley <
> arnoldbron...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Is there a way to create SolrClient from inside processAdd
> function
> > > for
> > > > > > custom update processor for the same Solr on which it is
> executing?
> > > > >
> > > > >
> > > >
> > >
> >
>
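To make Markus's answer concrete for Arnold's question, here is a minimal sketch of the client-creation step he alludes to, assuming SolrJ 7.x (the fallback base URL is an assumption you would supply yourself, e.g. from system properties):

```java
import java.util.Collections;
import java.util.Optional;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.cloud.ZkController;

public class ClientFactory {
    // zk is the ZkController obtained from the SolrCore in inform();
    // it is null when Solr runs in standalone mode.
    static SolrClient buildClient(ZkController zk, String fallbackBaseUrl) {
        if (zk != null) {
            // Cloud mode: connect through the same ZK ensemble this node uses
            return new CloudSolrClient.Builder(
                    Collections.singletonList(zk.getZkServerAddress()),
                    Optional.empty())
                .build();
        }
        // Standalone mode: talk to a node directly over HTTP
        return new HttpSolrClient.Builder(fallbackBaseUrl).build();
    }
}
```

As Markus notes earlier in the thread, remember to close the client in a CloseHook so core reloads and unit tests don't leak connections.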


Re: string field max size

2019-09-06 Thread Vincenzo D'Amore
Thanks Erick for this last confirmation. In the end I've used the
standard "text_ws":

<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

And the field



On Fri, Sep 6, 2019 at 2:54 AM Erick Erickson 
wrote:

> bq. What I do not understand is what happens to the Analyzers, Tokenizers,
> and
> Filters in the indexing chain
>
> They are irrelevant. The analysis chain is only executed when
> indexed=true.
>
> Best,
> Erick
>
> > On Sep 5, 2019, at 9:03 AM, Vincenzo D'Amore  wrote:
> >
> > What I do not understand is what happens to the Analyzers, Tokenizers,
> and
> > Filters in the indexing chain
>
>

-- 
Vincenzo D'Amore


RE: Query regarding Solr Cloud Setup

2019-09-06 Thread Porritt, Ian
Hi Jörn/Erick/Shawn thanks for your responses.

@Jörn - much appreciated for the heads up on Kerberos authentication; it's
something we haven't really considered at the moment, though for production this
may well be the case. With regards to the Solr nodes, 3 is something we are
looking at as a minimum; when adding a new Solr node to the cluster, will
settings/configuration be applied by Zookeeper on the new node, or is there
manual intervention?
@Erick - With regards to core.properties: on standard Solr,
update.autoCreateFields=false is within the core.properties file, however for
Cloud I have it added within solrconfig.xml, which gets uploaded to Zookeeper.
I appreciate standard and cloud may work entirely differently; I just wanted to
ensure it's the correct way of doing it.
@Shawn - Will try the creation of the lib directory in Solr home to see if it
gets picked up; having 5 Zookeepers would more than satisfy high availability.


Regards
Ian 

-Original Message-
From: Jörn Franke  

If you have a properly secured cluster, e.g. with Kerberos, then you should not
update files in ZK directly. Use the corresponding Solr REST interfaces; then
you are also less likely to mess something up.

If you want HA you should have at least 3 Solr nodes and replicate the
collection to all three of them (more is not needed from an HA point of view).
This would also allow you to upgrade the cluster without downtime.

-Original Message-
From: Erick Erickson <erickerick...@gmail.com>
Having custom core.properties files is “fraught”. First of all, that file can 
be re-written. Second, the collections ADDREPLICA command will create a new 
core.properties file. Third, any mistakes you make when hand-editing the file 
can have grave consequences.

What change exactly do you want to make to core.properties and why?

Trying to reproduce “what a colleague has done on standalone” is not something 
I’d recommend, SolrCloud is a different beast. Reproducing the _behavior_ is 
another thing, so what is the behavior you want in SolrCloud that causes you to 
want to customize core.properties?

Best,
Erick  

-Original Message-
From: Shawn Heisey 

I cannot tell what you are asking here.  The core.properties file lives 
on the disk, not in ZK.

I was under the impression that .jar files could not be loaded into ZK 
and used in a core config.  Documentation saying otherwise was recently 
pointed out to me on the list, but I remain skeptical that this actually 
works, and I have not tried to implement it myself.

The best way to handle custom jar loading is to create a "lib" directory 
under the solr home, and place all jars there.  Solr will automatically 
load them all before any cores are started, and no config commands of 
any kind will be needed to make it happen.
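As a sketch of Shawn's suggestion (the solr home path below is an assumption; use whatever your installation reports as solr.solr.home):

```shell
# Create the shared lib directory under the solr home and drop jars in it
mkdir -p /var/solr/data/lib
cp custom-plugin.jar /var/solr/data/lib/

# Restart so the jars are loaded before any core starts
bin/solr restart -c
```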

> Also from a high availability aspect, if I effectively lost 2 of the Solr
> Servers due to an outage will the system still work as expected? Would I 
> expect any data loss?

If all three Solr servers have a complete copy of all your indexes, then 
you should remain fully operational if two of those Solr servers go down.

Note that if you have three ZK servers and you lose two, that means that 
you have lost zookeeper quorum, and in that situation, SolrCloud will 
transition to read only -- you will not be able to change any index in 
the cloud.  This is how ZK is designed and it cannot be changed.  If you 
want a ZK deployment to survive the loss of two servers, you must have 
at least five total ZK servers, so more than 50 percent of the total 
survives.
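Shawn's sizing rule is just majority arithmetic; a tiny illustration (this is not Solr or ZK code, only the math behind "more than 50 percent must survive"):

```java
public class QuorumMath {
    // Smallest majority of an n-node ZooKeeper ensemble
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // How many nodes can fail while a majority still survives
    static int tolerableFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        System.out.println(tolerableFailures(3)); // 1 -> losing 2 of 3 breaks quorum
        System.out.println(tolerableFailures(5)); // 2 -> five servers survive two losses
    }
}
```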

Thanks,
Shawn




Re: Production Issue: SOLR node goes to non responsive , restart not helping at peak hours

2019-09-06 Thread Doss
Jörn, we have added additional zookeeper nodes; now it is a 3 node quorum.

Do all nodes in a quorum send heartbeat requests to all cores and shards?

If zookeeper node 1 is unable to communicate with a shard and declares that
shard as dead, can this state be changed by zookeeper node 2 if it got a
successful response from that particular shard?

On Thu, Sep 5, 2019 at 4:53 PM Jörn Franke  wrote:

> 1 Node zookeeper ensemble does not sound very healthy
>
> > On 05.09.2019 at 13:07, Doss wrote:
> >
> > Hi,
> >
> > We are using a 3 node SOLR (7.0.1) cloud setup with a 1 node zookeeper
> > ensemble. Each system has 16 CPUs, 90GB RAM (14GB heap), and 130 cores
> > (3 NRT replicas) with index sizes ranging from 700MB to 20GB.
> >
> > autoCommit - once every 10 minutes
> > softCommit - once every 30 seconds
> >
> > At peak time, if a shard goes into recovery mode, many other shards also
> > go into recovery mode within a few minutes, which creates huge load (200+
> > load average) and SOLR becomes non-responsive. To fix this we restart the
> > node; the leader then tries to correct the index by initiating replication,
> > which causes load again, and the node goes back into a non-responsive
> > state.
> >
> > As soon as a node starts, the replication process is initiated for all 130
> > cores; is there any way we can control it, like one after the other?
> >
> > Thanks,
> > Doss.
>


Re: Suggestion Needed: Exclude documents that are already served / viewed by a customer

2019-09-06 Thread Doss
Jörn, thanks for the input; I learned something new today!
https://cwiki.apache.org/confluence/display/solr/BloomIndexComponent this
works at the segment level, but our requirement is at the document level.

Thanks,
Mohandoss.

On Fri, Sep 6, 2019 at 11:41 AM Jörn Franke  wrote:

> I am not 100% sure if Solr has something out of the box, but you could
> implement a bloom filter https://en.wikipedia.org/wiki/Bloom_filter and
> store it in Solr. It is a probabilistic data structure, which is not
> growing, but can achieve your use case.
> However it has a caveat: it can, for example in your case, only say for
> sure if a person A has NOT visited person B. If you want to know if Person
> A has visited person B then there might be (with a known probability) false
> positives.
>
> Nevertheless, it still seems to address your use case as you want to show
> only not visited profiles.
>
> > On 06.09.2019 at 07:43, Doss wrote:
> >
> > Dear Experts,
> >
> > For a matchmaking portal, we have one requirement wherein, if a customer
> > viewed the complete details of a bride or groom, then we have to exclude
> > that profile id from further search results. Currently, along with other
> > details, we are storing the viewed profile ids in a field (multivalued
> > field) against that bride or groom's details.
> >
> > E.g., if A viewed B, then in B's document, under the field saw_me, we will
> > add A's id
> >
> > While searching, let's say the currently searching member's id is 123456;
> > then we will fire a query like
> >
> > fq=-saw_me:(123456)
> >
> > Problem #1: The saw_me field value is growing like anything.
> > Problem #2: Removal of ids which are deleted from the base. Right now we
> > are doing this job as follows:
> >   Query #1: fq=saw_me:(123456)&fl=DocId // get all document ids
> > which have the deleted id as part of the saw_me field.
> >   Query #2: {"DocId":"234567","saw_me":{"remove":"123456"}} // loop
> > through the results of the first query and fire the update queries one
> > by one
> >
> > We feel that this method of handling is not optimal, so we need expert
> > advice. Please guide.
>


solr-user-unsubscribe

2019-09-06 Thread Charton, Andre
 



Query regarding Public listing on Solr Homepage and Public Servers.

2019-09-06 Thread Paras Lehana
Hi Team, this is a general FAQ about website listing on Solr:

As seen on the *Apache Solr Homepage*, it mentions:

Want to see more? Want to see your app or website here? Visit Solr's Public
> Servers  listing page to
> learn more.


But there's as yet no form or listing procedure mentioned on the *Public
Servers page*. There are, however, the Solr Mailing Lists and the Contact
Options, which I have used now.

At IndiaMART Intermesh Ltd., we have been using the power of Solr to query
products and show autosuggestions from over *170 million* search terms and
product data! Always using industry practices and new features of
Solr/Lucene, at Auto-Suggest we have been committed to making our product
the fastest and most contextual one on the internet. We have harnessed
(some on a pilot basis) features like boosting, EdgeNGrams and LTR to
understand user intent far better than popular sites such as Amazon and
Flipkart.

Therefore, we would be glad to be listed (we are ready for quality assurance
too) on the Solr homepage, or at least on the Public Servers page, because we
believe we have done extensive work with Solr querying and would want to serve
as an example of how powerful Solr can be for new Solr users.

*Please guide us on the steps for the listing. *



-- 
-- 
Regards,

*Paras Lehana* [65871]
Software Programmer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Suggestion Needed: Exclude documents that are already served / viewed by a customer

2019-09-06 Thread Jörn Franke
I am not 100% sure if Solr has something out of the box, but you could 
implement a bloom filter https://en.wikipedia.org/wiki/Bloom_filter and store 
it in Solr. It is a probabilistic data structure, which is not growing, but can 
achieve your use case. 
However it has a caveat: it can, for example in your case, only say for sure if 
a person A has NOT visited person B. If you want to know if Person A has 
visited person B then there might be (with a known probability) false 
positives. 

Nevertheless, it still seems to address your use case as you want to show only 
not visited profiles.
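A minimal in-application sketch of the Bloom filter idea Jörn describes (the bit-array size, hash count, and double-hashing scheme below are illustrative assumptions, not tuned values, and this is not a Solr API):

```java
import java.util.BitSet;

// Minimal Bloom filter over "viewer -> viewed profile" pairs.
public class SeenFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    SeenFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    private int index(String key, int i) {
        // Derive k indexes from two hash values (double hashing)
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9e3779b9;
        return Math.floorMod(h1 + i * h2, size);
    }

    void markSeen(String viewerId, String profileId) {
        String key = viewerId + ":" + profileId;
        for (int i = 0; i < hashes; i++) bits.set(index(key, i));
    }

    // false => definitely not seen; true => probably seen (may be a false positive)
    boolean probablySeen(String viewerId, String profileId) {
        String key = viewerId + ":" + profileId;
        for (int i = 0; i < hashes; i++)
            if (!bits.get(index(key, i))) return false;
        return true;
    }

    public static void main(String[] args) {
        SeenFilter f = new SeenFilter(1 << 16, 4);
        f.markSeen("123456", "234567");
        System.out.println(f.probablySeen("123456", "234567")); // true: all bits set
        System.out.println(new SeenFilter(1 << 16, 4)
                .probablySeen("123456", "234567"));             // false: empty filter
    }
}
```

The filter never grows with the number of views, which addresses Problem #1, at the cost of occasional false positives (a profile wrongly treated as already seen) and no support for removals, which matters for Problem #2.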

> On 06.09.2019 at 07:43, Doss wrote:
> 
> Dear Experts,
> 
> For a matchmaking portal, we have one requirement wherein, if a customer
> viewed the complete details of a bride or groom, then we have to exclude that
> profile id from further search results. Currently, along with other details,
> we are storing the viewed profile ids in a field (multivalued field)
> against that bride or groom's details.
>
> E.g., if A viewed B, then in B's document, under the field saw_me, we will
> add A's id
> 
> While searching, let's say the currently searching member's id is 123456;
> then we will fire a query like
>
> fq=-saw_me:(123456)
>
> Problem #1: The saw_me field value is growing like anything.
> Problem #2: Removal of ids which are deleted from the base. Right now we
> are doing this job as follows:
>   Query #1: fq=saw_me:(123456)&fl=DocId // get all document ids
> which have the deleted id as part of the saw_me field.
>   Query #2: {"DocId":"234567","saw_me":{"remove":"123456"}} // loop
> through the results of the first query and fire the update queries one
> by one
>
> We feel that this method of handling is not optimal, so we need expert
> advice. Please guide.