Re: Solr Cloud support

Bryan Bende Wed, 02 Sep 2015 15:00:03 -0700

Srikanth,

Sorry you hadn't seen the reply, but hopefully you are subscribed to both
the dev and users list now :)

I'm still a little unclear about the use case for querying the shards
individually... Is the reason a performance/failover concern, or is it
something specific about how the data is sharded?

Let's say you have your Solr cluster with 10 shards, each on its own node
for simplicity, and then your ZooKeeper cluster.
Then you also have a NiFi cluster with 3 nodes, each with its own NiFi
instance, the first node designated as the primary, and a fourth node as
the cluster manager.

Now if you want to extract data from your Solr cluster, you would do the
following...
- Drag GetSolr on to the graph
- Set type to "cloud"
- Set the Solr Location to the ZK hosts string
- Set the scheduling to "Primary Node"

When you start the processor it is now only running on the first NiFi node,
and it is extracting data from all your shards at the same time.
If a Solr shard/node fails, this would be handled for us by the SolrJ
CloudSolrClient, which uses ZooKeeper to know about the state of things
and would choose a healthy replica of the shard if one existed.
If the primary NiFi node failed, you would manually elect a new primary
node and the extraction would resume on that node (this will get better in
the future).
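
As a rough illustration of the replica failover described above, here is a
minimal Python sketch. It is not how SolrJ implements this. The cluster-state
layout, host names, and core names are all hypothetical and only loosely
modeled on what ZooKeeper tracks for SolrCloud.

```python
# Hypothetical, simplified snapshot of SolrCloud cluster state.
# The real state kept in ZooKeeper has more fields and a different layout.
CLUSTER_STATE = {
    "shard1": [
        {"url": "http://solr1:8983/solr/collection1_shard1_replica1", "state": "down"},
        {"url": "http://solr2:8983/solr/collection1_shard1_replica2", "state": "active"},
    ],
    "shard2": [
        {"url": "http://solr3:8983/solr/collection1_shard2_replica1", "state": "active"},
    ],
    "shard3": [
        {"url": "http://solr4:8983/solr/collection1_shard3_replica1", "state": "down"},
    ],
}


def pick_healthy_replicas(cluster_state):
    """Return one active replica URL per shard, or None if the shard has none."""
    chosen = {}
    for shard, replicas in cluster_state.items():
        active = [r["url"] for r in replicas if r["state"] == "active"]
        # Take the first active replica; a real client would also balance load.
        chosen[shard] = active[0] if active else None
    return chosen


if __name__ == "__main__":
    for shard, url in pick_healthy_replicas(CLUSTER_STATE).items():
        print(shard, "->", url or "no healthy replica")
```

In this sketch shard1 falls back to its second replica because the first is
marked down, and shard3 has no healthy replica at all.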

I think if we expose the distrib=false parameter it would allow you to
query shards individually, either by having a NiFi instance with a GetSolr
processor per shard, or several mini-NiFis each with a single GetSolr, but
I'm not sure if we could achieve the dynamic assignment you are thinking
of.
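
To make the per-shard idea more concrete, here is a small Python sketch,
offered only as an assumption-laden illustration. It builds a non-distributed
query URL for each shard (using Solr's distrib=false request parameter on the
/select handler) and spreads the shards round-robin across NiFi nodes. The
function names, host names, and core names are hypothetical, and nothing here
exists in GetSolr today.

```python
from urllib.parse import urlencode


def shard_query_url(core_url, query="*:*", rows=10):
    """Build a /select URL that asks one core not to fan the query out."""
    params = {"q": query, "rows": rows, "distrib": "false", "wt": "json"}
    return core_url + "/select?" + urlencode(params)


def assign_shards(shard_urls, nifi_nodes):
    """Statically assign shard URLs to NiFi nodes in round-robin order."""
    if not nifi_nodes:
        raise ValueError("at least one NiFi node is required")
    assignment = {node: [] for node in nifi_nodes}
    for i, url in enumerate(shard_urls):
        assignment[nifi_nodes[i % len(nifi_nodes)]].append(url)
    return assignment


if __name__ == "__main__":
    # Hypothetical core URLs for a 10-shard collection, one shard per node.
    shards = [
        "http://solr%d:8983/solr/collection1_shard%d_replica1" % (i, i)
        for i in range(1, 11)
    ]
    plan = assign_shards(shards, ["nifi-1", "nifi-2", "nifi-3"])
    for node, urls in plan.items():
        print(node)
        for url in urls:
            print("  ", shard_query_url(url))
```

This is a static assignment: if a NiFi node dropped out, something would have
to recompute the plan with the remaining nodes, which is the part I'm not sure
how to do within a NiFi cluster.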

Let me know if I'm not making sense, happy to keep discussing and trying to
figure out what else can be done.

-Bryan

On Wed, Sep 2, 2015 at 4:38 PM, Srikanth <[email protected]> wrote:

>
> Bryan,
>
> That is correct, having the ability to query nodes with "distrib=false" is
> what I was talking about.
>
> Instead of the user having to configure each Solr node in a separate NiFi
> processor, can we provide a single configuration??
> It would be great if we could take just the ZooKeeper (ZK) host as input
> from the user and
>   i) Determine all nodes for a container from ZK
>   ii) Let each NiFi processor take ownership of querying a node with
> "distrib=false"
>
> From what I understand, NiFi slaves in cluster can't talk to each other.
> Will it be possible to do the ZK query part in cluster master and have
> individual Solr nodes propagated to each slave?
> I don't know how we can achieve this in NiFi, if at all.
>
> This will make the Solr interface to NiFi much simpler. The user needs to
> provide just ZK.
> We'll be able to take care of the rest, including failing over to an
> alternate Solr node when the current one fails.
>
> Let me know your thoughts.
>
> Rgds,
> Srikanth
>
> P.S : I had subscribed only to digest and didn't receive your original
> reply. Had to pull this up from mail archive.
> Only Dev list is in Nabble!!
>
>
> ***************************************************************************************************
>
> Hi Srikanth,
>
> You are correct that in a NiFi cluster the intent would be to schedule
> GetSolr on the primary node only (on the scheduling tab) so that only one
> node in your cluster was extracting data.
>
> GetSolr determines which SolrJ client to use based on the "Solr Type"
> property, so if you select "Cloud" it will use CloudSolrClient. It would
> send the query to one node based on the cluster state from ZooKeeper, and
> then that Solr node performs the distributed query.
>
> Did you have a specific use case where you wanted to query each shard
> individually?
>
> I think it would be straightforward to expose something on GetSolr that
> would set "distrib=false" on the query so that Solr would not execute a
> distributed query. You would then most likely create separate instances of
> GetSolr and configure them as Standard type pointing at the respective
> shards. Let us know if that is something you are interested in.
>
> Thanks,
>
> Bryan
>
>
> On Sun, Aug 30, 2015 at 7:32 PM, Srikanth <[email protected]> wrote:
>
> > Hello,
> >
> > I started to explore the NiFi project a few days back. I'm still trying
> > it out.
> >
> > I have a few basic questions about GetSolr.
> >
> > Should GetSolr be run as an Isolated Processor?
> >
> > If I have SolrCloud with 4 shards/nodes and NiFi cluster with 4 nodes,
> > will GetSolr be able to query each shard from one specific NiFi node? I'm
> > guessing it doesn't work that way.
> >
> >
> > Thanks,
> > Srikanth
> >
> >
>
>
