Re: Segregating Foreman from leaf-worker fleet

2018-11-13 Thread Paul Rogers
Hi Lokendra,

Your use case is a typical old-school sharded DB app. The design itself is fine. 
However, as Tim noted, Drill is not designed for this case. Still, perhaps 
Drill could be extended.


As Tim suggested, Drill assumes any Drillbit can operate in any role. So, in 
your setup, you would run Drillbits on all your shard storage-nodes. Drill 
would schedule reads (more on this shortly) on those nodes. Then, Drill would 
do shuffles to other nodes to perform query operations.

In this model, one of your nodes would act as Foreman for a user. ZooKeeper 
(ZK) tracks all nodes, and each user randomly chooses a Drillbit to act as 
Foreman, which means Foreman load is shared across all your Drillbits.

Suppose you wanted to change this. You'd have to extend the way that Drillbits 
register themselves in ZK. A Drillbit, when it starts, would be assigned one or 
more roles which it would advertise in ZK. The distribution mechanisms in the 
Planner would have to be aware of scan-only nodes, compute-only nodes, and 
Foreman-only nodes.
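
As a rough illustration of the role idea, here is a minimal Python sketch. 
Every name in it (Role, DrillbitEndpoint, endpoints_for) is hypothetical and 
none of it is existing Drill code; it only shows the kind of filtering the 
Planner would need once each Drillbit advertises its roles.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Role(Enum):
    """Hypothetical roles a Drillbit could advertise when it registers."""
    SCAN = auto()
    COMPUTE = auto()
    FOREMAN = auto()


@dataclass(frozen=True)
class DrillbitEndpoint:
    """Assumed shape of a registration record; not Drill's real ZK payload."""
    host: str
    roles: frozenset


def endpoints_for(role, registry):
    """Return the registered endpoints that advertise the given role."""
    return [e for e in registry if role in e.roles]


# Storage nodes only scan; a separate node plans and computes.
registry = [
    DrillbitEndpoint("storage-1", frozenset({Role.SCAN})),
    DrillbitEndpoint("storage-2", frozenset({Role.SCAN})),
    DrillbitEndpoint("compute-1", frozenset({Role.COMPUTE, Role.FOREMAN})),
]

scan_hosts = [e.host for e in endpoints_for(Role.SCAN, registry)]
foreman_hosts = [e.host for e in endpoints_for(Role.FOREMAN, registry)]
```

The hard part is not this filter but teaching every place in the Planner 
that currently assumes "any Drillbit will do" to consult it.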

Unless you plan to put heavy load on your scan nodes, it is not clear what 
benefit you'd gain from forcing Drill into a particular distribution model.

Perhaps you can start by running Drill on just your storage nodes, then noting 
performance.

One final point. Drill today knows to use HDFS to work out data locality for 
scans. You'd need to modify this to plug in your own data distribution 
mechanism so that Drill knows which shards to scan on which nodes. I don't 
believe Drill has a plugin API for this, but I could be wrong. If not, this 
would be a great opportunity to define such an API.

Such an API might be helpful for other storage plugins such as Kafka so that 
scans are done on nodes with data.
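
To make the locality idea concrete, here is a small Python sketch of what 
such a hook could look like. It is purely an assumption on my part: 
LocalityProvider, StaticShardMap and assign_scans are invented names, not an 
existing Drill API. The point is only that a storage plugin answers "which 
hosts hold this shard?" and the scheduler prefers those hosts.

```python
from abc import ABC, abstractmethod


class LocalityProvider(ABC):
    """Hypothetical plugin hook: maps a unit of scan work to preferred hosts."""

    @abstractmethod
    def preferred_hosts(self, shard_id):
        """Return the hosts that hold the given shard (possibly empty)."""


class StaticShardMap(LocalityProvider):
    """Simplest possible provider: a fixed shard-to-host table."""

    def __init__(self, shard_to_host):
        self._shard_to_host = dict(shard_to_host)

    def preferred_hosts(self, shard_id):
        host = self._shard_to_host.get(shard_id)
        return [host] if host is not None else []


def assign_scans(shard_ids, provider, live_hosts):
    """Place each shard on a live host that holds it, else round-robin."""
    live = sorted(live_hosts)
    if not live:
        raise ValueError("no live hosts to schedule scans on")
    assignment = {}
    for i, shard in enumerate(shard_ids):
        local = [h for h in provider.preferred_hosts(shard) if h in live]
        assignment[shard] = local[0] if local else live[i % len(live)]
    return assignment


provider = StaticShardMap({"shard-a": "storage-1", "shard-b": "storage-2"})
plan = assign_scans(
    ["shard-a", "shard-b", "shard-c"], provider, {"storage-1", "storage-2"}
)
```

For a sharded relational store the round-robin fallback is of little use, 
since a shard can only be read where it lives, so a real design would 
probably treat "no local host" as an error rather than a hint.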

Thanks,
- Paul

 


Re: Segregating Foreman from leaf-worker fleet

2018-11-13 Thread Lokendra Singh Panwar
Hi Tim,

Thanks for the reply.

My use case is the following:

   -  My main DB table is huge, so it is sharded among multiple
   storage-nodes.
   -  Each storage-node stores its assigned shard in a local relational
   DB engine.

I was planning to use Drill as a distributed query engine that can
scatter-gather data from these storage-nodes.

So, my overall plan for such an architecture, as per my limited understanding
of Drill so far, is:

   - Have a Drillbit instance run on each storage-node; this fleet will
   act as a leaf-worker fleet.
   - (I will write a Storage Plugin to transform data from my local
   relational DB engine to Drill's record format.)
   - Maintain another fleet that will serve as Foreman and Intermediate
   query workers, still part of the same Drill cluster.
   -  The reason I intend to keep the leaf-query fleet (storage-nodes)
   segregated from the Foreman/Intermediate workers (working on major
   fragments) is:
      - Storage-nodes (acting as leaf-workers) are a premium commodity in
      my cluster, involved in data ingestion as well as serving query
      traffic as leaf-workers.
      - So, I do not intend to overload them further with the intermediate
      query fragment processing and aggregation that the Foreman and
      Intermediate pool of workers are involved in.

Does the above make sense?

Thanks,
Lokendra


Re: Segregating Foreman from leaf-worker fleet

2018-11-13 Thread Timothy Farkas
Hi Lokendra,

All Drillbits can function as a foreman if a query is sent to them, and all
Drillbits are considered worker nodes. This is ingrained deeply into the
design of Drill and it was done with the intention of making Drill
symmetric. Symmetric here means that each Drillbit is identical to all the
others. Making this change would be a significant design change.

Why are you interested in running Drill in this way? Do you have a specific
use case in mind?

Thanks,
Tim

On Tue, Nov 13, 2018 at 3:37 PM Lokendra Singh Panwar 
wrote:

> Hi,
>
> Is it possible to configure Drill such that the Foreman and leaf-worker
> fleets are separate fleets of nodes?
> Or if this needs changing the source of Drill, any pointers are appreciated
> too.
>
> Thanks,
> Lokendra
>