<- general commentary not specific to the Solr case ->

Letting nodes share information about which 'partition' each one is responsible for is a generically useful and very powerful capability, and we need to support it. It isn't immediately obvious to me how best to do this in a generic way, but a controller service on the NCM could potentially assign 'partitions' to the nodes, and ZooKeeper could be an important part of that. I think we need to tackle the HA NCM construct we talked about months ago before we can do this one nicely.

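To make the 'assign partitions to nodes' idea a bit more concrete, here is a minimal sketch of what the assignment step might look like. Everything in it is an assumption for illustration only: `assign_partitions`, the shard names, and the node names are hypothetical, it is not an existing NiFi or SolrJ API, and it says nothing about how the list of live nodes would actually be obtained (from the NCM, from ZooKeeper, or otherwise). It simply splits partitions round-robin over whichever nodes are alive and recomputes when a node drops out.

```python
from typing import Dict, List


def assign_partitions(partitions: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    """Split partitions round-robin across the live nodes (hypothetical helper).

    Inputs are sorted first so that every caller computes the same answer
    from the same view of the cluster.
    """
    if not nodes:
        raise ValueError("no live nodes to assign partitions to")
    ordered_nodes = sorted(nodes)
    assignment: Dict[str, List[str]] = {node: [] for node in ordered_nodes}
    for index, partition in enumerate(sorted(partitions)):
        owner = ordered_nodes[index % len(ordered_nodes)]
        assignment[owner].append(partition)
    return assignment


# Hypothetical shard and node names, for illustration only.
shards = ["shard1", "shard2", "shard3", "shard4"]

# All three nodes are alive.
before = assign_partitions(shards, ["nifi-1", "nifi-2", "nifi-3"])

# nifi-2 has failed; its shard is picked up by the survivors.
after = assign_partitions(shards, ["nifi-1", "nifi-3"])
```

Note that this recomputes the whole assignment from scratch, so a failure can also move shards between the surviving nodes. Whether that is acceptable would depend on how much per-partition state a processor keeps.
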
On Wed, Sep 2, 2015 at 7:47 PM, Srikanth <srikanth...@gmail.com> wrote:
> Bryan,
>
> <Bryan> --> "I'm still a little bit unclear about the use case for querying
> the shards individually... is the reason to do this because of a
> performance/failover concern?"
> <Srikanth> --> The reason to do this is to achieve better performance with the
> convenience of automatic failover.
> In the current mode, we do get very good failover offered by Solr. Failover
> is seamless.
> At the same time, we are not getting the best performance. I guess it's clear to
> us why having each NiFi process query each shard with distrib=false will
> give better performance.
>
> Now, the question is how we achieve this. Making the user configure one NiFi
> processor for each Solr node is one way to go.
> I'm afraid this will make failover a tricky process. It may even need human
> intervention.
>
> Another approach is to have the cluster master in NiFi talk to ZK and decide
> which shards to query, then divide these shards among the slave nodes.
> My understanding is that the NiFi cluster master is not intended for such a purpose.
> I'm not sure if this is even possible.
>
> Hope I'm a bit more clear now.
>
> Srikanth
>
> On Wed, Sep 2, 2015 at 5:58 PM, Bryan Bende <bbe...@gmail.com> wrote:
>>
>> Srikanth,
>>
>> Sorry you hadn't seen the reply, but hopefully you are subscribed to both
>> the dev and users list now :)
>>
>> I'm still a little bit unclear about the use case for querying the shards
>> individually... is the reason to do this because of a performance/failover
>> concern? Or is it something specific about how the data is shared?
>>
>> Let's say you have your Solr cluster with 10 shards, each on its own node
>> for simplicity, and then your ZooKeeper cluster.
>> Then you also have a NiFi cluster with 3 nodes, each with its own NiFi
>> instance, the first node designated as the primary, and a fourth node as the
>> cluster manager.
>>
>> Now if you want to extract data from your Solr cluster, you would do the
>> following...
>> - Drag GetSolr on to the graph
>> - Set type to "cloud"
>> - Set the Solr Location to the ZK hosts string
>> - Set the scheduling to "Primary Node"
>>
>> When you start the processor it is now only running on the first NiFi
>> node, and it is extracting data from all your shards at the same time.
>> If a Solr shard/node fails, this would be handled for us by the SolrJ
>> CloudSolrClient, which uses ZooKeeper to know about the state of things
>> and would choose a healthy replica of the shard if one existed.
>> If the primary NiFi node failed, you would manually elect a new primary
>> node and the extraction would resume on that node (this will get better in
>> the future).
>>
>> I think if we expose distrib=false it would allow you to query shards
>> individually, either by having a NiFi instance with a GetSolr processor per
>> shard, or several mini-NiFis each with a single GetSolr, but
>> I'm not sure if we could achieve the dynamic assignment you are thinking
>> of.
>>
>> Let me know if I'm not making sense, happy to keep discussing and trying
>> to figure out what else can be done.
>>
>> -Bryan
>>
>> On Wed, Sep 2, 2015 at 4:38 PM, Srikanth <srikanth...@gmail.com> wrote:
>>>
>>>
>>> Bryan,
>>>
>>> That is correct, having the ability to query nodes with "distrib=false"
>>> is what I was talking about.
>>>
>>> Instead of the user having to configure each Solr node in a separate NiFi
>>> processor, can we provide a single configuration?
>>> It would be great if we could take just the ZooKeeper (ZK) host as input from
>>> the user and
>>> i) Determine all nodes for a container from ZK
>>> ii) Let each NiFi processor take ownership of querying a node with
>>> "distrib=false"
>>>
>>> From what I understand, NiFi slaves in a cluster can't talk to each other.
>>> Will it be possible to do the ZK query part in the cluster master and have
>>> individual Solr nodes propagated to each slave?
>>> I don't know how we can achieve this in NiFi, if at all.
>>>
>>> This will make the Solr interface to NiFi much simpler. The user needs to provide
>>> just ZK.
>>> We'll be able to take care of the rest, including failing over to an alternate
>>> Solr node when the current one fails.
>>>
>>> Let me know your thoughts.
>>>
>>> Rgds,
>>> Srikanth
>>>
>>> P.S.: I had subscribed only to the digest and didn't receive your original
>>> reply. Had to pull this up from the mail archive.
>>> Only the Dev list is in Nabble!!
>>>
>>>
>>> ***************************************************************************************************
>>>
>>> Hi Srikanth,
>>>
>>> You are correct that in a NiFi cluster the intent would be to schedule
>>> GetSolr on the primary node only (on the scheduling tab) so that only one
>>> node in your cluster was extracting data.
>>>
>>> GetSolr determines which SolrJ client to use based on the "Solr Type"
>>> property, so if you select "Cloud" it will use CloudSolrClient. It would
>>> send the query to one node based on the cluster state from ZooKeeper, and
>>> then that Solr node performs the distributed query.
>>>
>>> Did you have a specific use case where you wanted to query each shard
>>> individually?
>>>
>>> I think it would be straightforward to expose something on GetSolr that
>>> would set "distrib=false" on the query so that Solr would not execute a
>>> distributed query. You would then most likely create separate instances
>>> of GetSolr and configure them as Standard type pointing at the respective
>>> shards. Let us know if that is something you are interested in.
>>>
>>> Thanks,
>>>
>>> Bryan
>>>
>>>
>>> On Sun, Aug 30, 2015 at 7:32 PM, Srikanth <srikanth...@gmail.com> wrote:
>>>
>>> > Hello,
>>> >
>>> > I started to explore the NiFi project a few days back. I'm still trying it
>>> > out.
>>> >
>>> > I have a few basic questions on GetSolr.
>>> >
>>> > Should GetSolr be run as an Isolated Processor?
>>> >
>>> > If I have SolrCloud with 4 shards/nodes and a NiFi cluster with 4 nodes,
>>> > will GetSolr be able to query each shard from one specific NiFi node?
>>> > I'm guessing it doesn't work that way.
>>> >
>>> >
>>> > Thanks,
>>> > Srikanth
>>> >
>>> >
>>
>>
>