[jira] [Commented] (DRILL-7563) Docker & Kubernetes Drill server container

2020-03-24 Thread Abhishek Girish (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066097#comment-17066097
 ] 

Abhishek Girish commented on DRILL-7563:


[~cgivre], once we get my changes in to Drill GitHub repo, I think we can work 
on the docs / tutorials. I also have some material which can help for it. 

> Docker & Kubernetes Drill server container
> --
>
> Key: DRILL-7563
> URL: https://issues.apache.org/jira/browse/DRILL-7563
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Drill provides two Docker containers:
> * [Build Drill from 
> sources|https://github.com/apache/drill/blob/master/Dockerfile]
> * [Run Drill in interactive embedded 
> mode|https://github.com/apache/drill/blob/master/distribution/Dockerfile]
> User feedback suggests that these are not quite the right solutions to run 
> Drill in a K8s (or OpenShift) cluster. In addition, we need a container to 
> run a Drill server. This ticket summarizes the tasks involved.
> h4. Container Image
> The container image should:
> * Start with the OpenJDK base image with minimal extra packages.
> * Download and install an official Drill release.
> We may then want to provide two derived images:
> The Drillbit image which:
> * Configures Drill for production and as needed in the following steps.
> * Provides entry points for the Drillbit and for Sqlline
> * Exposes Drill's four ports
> * Accept as parameters things like the ZK host IP(s).
> The Sqlline image, meant to be run in interactive mode (like the current 
> embedded image) and which:
> * Accept as parameters the ZK host IP(s).
> Both should be published to the official Drill DockerHub account: 
> https://hub.docker.com/r/apache/drill
> h4. Runtime Environment
> Drill has very few dependencies, but it must have a running ZK.
> * Start a [ZK container|https://hub.docker.com/_/zookeeper/].
> * A place to store logs, which can be in the container by default, stored on 
> the host file system via a volume mount.
> * Access to a data source, which can be configured via a storage plugin 
> stored in ZK.
> * Ensure graceful shutdown integration with the Docker shutdown mechanism.
> h4. Running Drill in Docker
> Users must run at least one Drillbit, and may run more. Users may want to run 
> Sqlline.
> * The Drillbit container requires, at a minimum, the IP address of the ZK 
> instance(s).
> * The Sqlline container requires only the ZK instances, from which it can 
> find the Drillbit.
> Uses will want to customize some parts of Drill: at least memory, perhaps any 
> of the other options. Provide a way to pass this information into the 
> container to avoid the need to rebuild the container to change configuration.
> h4. Running Drill in K8s
> The containers should be usable in "plain" Docker. Today, however, many 
> people use K8s to orchestrate Docker. Thus, the Drillbit (but probably not 
> the Sqlline) container should be designed to work with K8s. An example set of 
> K8s YAML files should illustrate:
> * Create a host-mount file system to capture Drill logs and query profiles.
> * Optionally write Drill logs to stdout, to be captured by {{fluentd}} or 
> similar tools.
> * Pass Drill configuration (both HOCON and envrironment) as config maps.
> * Pass ZK as an environment variable (the value of which would, one presumes, 
> come from some kind of service discovery system.)
> The result is that the user should be able to manually tinker with the YAML 
> files, then use {{kubeadm}} to launch, monitor and stop Drill. The user sets 
> cluster size manually by launching the desired number of Drill pods.
> h4. Helm Chart for Drill
> The next step is to wrap the YAML files in a Helm chart, with parameters 
> exposed for the config options noted above.
> h4. Drill Operator for K8s
>  
> Full K8s integration will require an operator to manage the Drill cluster. 
> K8s operators are often written in Go, though doing so is not necessary. 
> Drill already includes Drill-on-YARN which is, essential a "YARN operator." 
> Repurpose this code to work with K8s as the target cluster manager rather 
> than YARN. Reuse the same operations from DoY: configure, start, resize and 
> stop a cluster.
> h4. Security
> The above steps provide an "MVP": minimum viable project - it will run Drill 
> with standard options in the various environments. Users who chose to run 
> Drill in production will likely require additional security settings. Enable 
> SSL? Control ingress? We need to understand what is needed, what Drill 
> offers, and how to enable Drill's security features in a containerized 
> environment.
> h4. Production Deployments
> With Docker and K8s the old maxim "the devil is in the details" is true 

[jira] [Commented] (DRILL-7563) Docker & Kubernetes Drill server container

2020-03-24 Thread Dobes Vandermeer (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066095#comment-17066095
 ] 

Dobes Vandermeer commented on DRILL-7563:
-

[~cgivre]

I don't have time to write a tutorial, sorry.

[~rjaimes]

Those yaml files were enough to get drill running, but I haven't tested them 
much. A few times I had problems with zookeeper not coming back online, I think 
it was some kind of DNS resolution issue, I tried to fix it but I can't be sure 
I was successful yet. I never reached the point that I was confident that I 
could run a zookeeper & drill setup on kubernetes without worrying it would 
fail on me at an inopportune moment.

 

 

> Docker & Kubernetes Drill server container
> --
>
> Key: DRILL-7563
> URL: https://issues.apache.org/jira/browse/DRILL-7563
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Drill provides two Docker containers:
> * [Build Drill from 
> sources|https://github.com/apache/drill/blob/master/Dockerfile]
> * [Run Drill in interactive embedded 
> mode|https://github.com/apache/drill/blob/master/distribution/Dockerfile]
> User feedback suggests that these are not quite the right solutions to run 
> Drill in a K8s (or OpenShift) cluster. In addition, we need a container to 
> run a Drill server. This ticket summarizes the tasks involved.
> h4. Container Image
> The container image should:
> * Start with the OpenJDK base image with minimal extra packages.
> * Download and install an official Drill release.
> We may then want to provide two derived images:
> The Drillbit image which:
> * Configures Drill for production and as needed in the following steps.
> * Provides entry points for the Drillbit and for Sqlline
> * Exposes Drill's four ports
> * Accept as parameters things like the ZK host IP(s).
> The Sqlline image, meant to be run in interactive mode (like the current 
> embedded image) and which:
> * Accept as parameters the ZK host IP(s).
> Both should be published to the official Drill DockerHub account: 
> https://hub.docker.com/r/apache/drill
> h4. Runtime Environment
> Drill has very few dependencies, but it must have a running ZK.
> * Start a [ZK container|https://hub.docker.com/_/zookeeper/].
> * A place to store logs, which can be in the container by default, stored on 
> the host file system via a volume mount.
> * Access to a data source, which can be configured via a storage plugin 
> stored in ZK.
> * Ensure graceful shutdown integration with the Docker shutdown mechanism.
> h4. Running Drill in Docker
> Users must run at least one Drillbit, and may run more. Users may want to run 
> Sqlline.
> * The Drillbit container requires, at a minimum, the IP address of the ZK 
> instance(s).
> * The Sqlline container requires only the ZK instances, from which it can 
> find the Drillbit.
> Uses will want to customize some parts of Drill: at least memory, perhaps any 
> of the other options. Provide a way to pass this information into the 
> container to avoid the need to rebuild the container to change configuration.
> h4. Running Drill in K8s
> The containers should be usable in "plain" Docker. Today, however, many 
> people use K8s to orchestrate Docker. Thus, the Drillbit (but probably not 
> the Sqlline) container should be designed to work with K8s. An example set of 
> K8s YAML files should illustrate:
> * Create a host-mount file system to capture Drill logs and query profiles.
> * Optionally write Drill logs to stdout, to be captured by {{fluentd}} or 
> similar tools.
> * Pass Drill configuration (both HOCON and envrironment) as config maps.
> * Pass ZK as an environment variable (the value of which would, one presumes, 
> come from some kind of service discovery system.)
> The result is that the user should be able to manually tinker with the YAML 
> files, then use {{kubeadm}} to launch, monitor and stop Drill. The user sets 
> cluster size manually by launching the desired number of Drill pods.
> h4. Helm Chart for Drill
> The next step is to wrap the YAML files in a Helm chart, with parameters 
> exposed for the config options noted above.
> h4. Drill Operator for K8s
>  
> Full K8s integration will require an operator to manage the Drill cluster. 
> K8s operators are often written in Go, though doing so is not necessary. 
> Drill already includes Drill-on-YARN which is, essential a "YARN operator." 
> Repurpose this code to work with K8s as the target cluster manager rather 
> than YARN. Reuse the same operations from DoY: configure, start, resize and 
> stop a cluster.
> h4. Security
> The above steps provide an "MVP": minimum viable project - it will run Drill 
> with standard options in the various environments. Users who chose 

[jira] [Commented] (DRILL-7563) Docker & Kubernetes Drill server container

2020-03-24 Thread Abhishek Girish (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066086#comment-17066086
 ] 

Abhishek Girish commented on DRILL-7563:


I've added support for  auto-scaling and I've tested that it works well. Please 
see: https://github.com/Agirish/drill-helm-charts#autoscaling-drill-clusters

I  have a script to test this: 
https://github.com/Agirish/drill-helm-charts/blob/master/scripts/runCPULoadTest.sh

> Docker & Kubernetes Drill server container
> --
>
> Key: DRILL-7563
> URL: https://issues.apache.org/jira/browse/DRILL-7563
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Drill provides two Docker containers:
> * [Build Drill from 
> sources|https://github.com/apache/drill/blob/master/Dockerfile]
> * [Run Drill in interactive embedded 
> mode|https://github.com/apache/drill/blob/master/distribution/Dockerfile]
> User feedback suggests that these are not quite the right solutions to run 
> Drill in a K8s (or OpenShift) cluster. In addition, we need a container to 
> run a Drill server. This ticket summarizes the tasks involved.
> h4. Container Image
> The container image should:
> * Start with the OpenJDK base image with minimal extra packages.
> * Download and install an official Drill release.
> We may then want to provide two derived images:
> The Drillbit image which:
> * Configures Drill for production and as needed in the following steps.
> * Provides entry points for the Drillbit and for Sqlline
> * Exposes Drill's four ports
> * Accept as parameters things like the ZK host IP(s).
> The Sqlline image, meant to be run in interactive mode (like the current 
> embedded image) and which:
> * Accept as parameters the ZK host IP(s).
> Both should be published to the official Drill DockerHub account: 
> https://hub.docker.com/r/apache/drill
> h4. Runtime Environment
> Drill has very few dependencies, but it must have a running ZK.
> * Start a [ZK container|https://hub.docker.com/_/zookeeper/].
> * A place to store logs, which can be in the container by default, stored on 
> the host file system via a volume mount.
> * Access to a data source, which can be configured via a storage plugin 
> stored in ZK.
> * Ensure graceful shutdown integration with the Docker shutdown mechanism.
> h4. Running Drill in Docker
> Users must run at least one Drillbit, and may run more. Users may want to run 
> Sqlline.
> * The Drillbit container requires, at a minimum, the IP address of the ZK 
> instance(s).
> * The Sqlline container requires only the ZK instances, from which it can 
> find the Drillbit.
> Uses will want to customize some parts of Drill: at least memory, perhaps any 
> of the other options. Provide a way to pass this information into the 
> container to avoid the need to rebuild the container to change configuration.
> h4. Running Drill in K8s
> The containers should be usable in "plain" Docker. Today, however, many 
> people use K8s to orchestrate Docker. Thus, the Drillbit (but probably not 
> the Sqlline) container should be designed to work with K8s. An example set of 
> K8s YAML files should illustrate:
> * Create a host-mount file system to capture Drill logs and query profiles.
> * Optionally write Drill logs to stdout, to be captured by {{fluentd}} or 
> similar tools.
> * Pass Drill configuration (both HOCON and envrironment) as config maps.
> * Pass ZK as an environment variable (the value of which would, one presumes, 
> come from some kind of service discovery system.)
> The result is that the user should be able to manually tinker with the YAML 
> files, then use {{kubeadm}} to launch, monitor and stop Drill. The user sets 
> cluster size manually by launching the desired number of Drill pods.
> h4. Helm Chart for Drill
> The next step is to wrap the YAML files in a Helm chart, with parameters 
> exposed for the config options noted above.
> h4. Drill Operator for K8s
>  
> Full K8s integration will require an operator to manage the Drill cluster. 
> K8s operators are often written in Go, though doing so is not necessary. 
> Drill already includes Drill-on-YARN which is, essential a "YARN operator." 
> Repurpose this code to work with K8s as the target cluster manager rather 
> than YARN. Reuse the same operations from DoY: configure, start, resize and 
> stop a cluster.
> h4. Security
> The above steps provide an "MVP": minimum viable project - it will run Drill 
> with standard options in the various environments. Users who chose to run 
> Drill in production will likely require additional security settings. Enable 
> SSL? Control ingress? We need to understand what is needed, what Drill 
> offers, and how to enable Drill's security features in a containerized 

[jira] [Commented] (DRILL-7563) Docker & Kubernetes Drill server container

2020-03-24 Thread Rafael Jaimes (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066022#comment-17066022
 ] 

Rafael Jaimes commented on DRILL-7563:
--

Dobes - thanks. I am looking through your work now. Have you tested this with 
scaling up drillbit pods?

> Docker & Kubernetes Drill server container
> --
>
> Key: DRILL-7563
> URL: https://issues.apache.org/jira/browse/DRILL-7563
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Drill provides two Docker containers:
> * [Build Drill from 
> sources|https://github.com/apache/drill/blob/master/Dockerfile]
> * [Run Drill in interactive embedded 
> mode|https://github.com/apache/drill/blob/master/distribution/Dockerfile]
> User feedback suggests that these are not quite the right solutions to run 
> Drill in a K8s (or OpenShift) cluster. In addition, we need a container to 
> run a Drill server. This ticket summarizes the tasks involved.
> h4. Container Image
> The container image should:
> * Start with the OpenJDK base image with minimal extra packages.
> * Download and install an official Drill release.
> We may then want to provide two derived images:
> The Drillbit image which:
> * Configures Drill for production and as needed in the following steps.
> * Provides entry points for the Drillbit and for Sqlline
> * Exposes Drill's four ports
> * Accept as parameters things like the ZK host IP(s).
> The Sqlline image, meant to be run in interactive mode (like the current 
> embedded image) and which:
> * Accept as parameters the ZK host IP(s).
> Both should be published to the official Drill DockerHub account: 
> https://hub.docker.com/r/apache/drill
> h4. Runtime Environment
> Drill has very few dependencies, but it must have a running ZK.
> * Start a [ZK container|https://hub.docker.com/_/zookeeper/].
> * A place to store logs, which can be in the container by default, stored on 
> the host file system via a volume mount.
> * Access to a data source, which can be configured via a storage plugin 
> stored in ZK.
> * Ensure graceful shutdown integration with the Docker shutdown mechanism.
> h4. Running Drill in Docker
> Users must run at least one Drillbit, and may run more. Users may want to run 
> Sqlline.
> * The Drillbit container requires, at a minimum, the IP address of the ZK 
> instance(s).
> * The Sqlline container requires only the ZK instances, from which it can 
> find the Drillbit.
> Uses will want to customize some parts of Drill: at least memory, perhaps any 
> of the other options. Provide a way to pass this information into the 
> container to avoid the need to rebuild the container to change configuration.
> h4. Running Drill in K8s
> The containers should be usable in "plain" Docker. Today, however, many 
> people use K8s to orchestrate Docker. Thus, the Drillbit (but probably not 
> the Sqlline) container should be designed to work with K8s. An example set of 
> K8s YAML files should illustrate:
> * Create a host-mount file system to capture Drill logs and query profiles.
> * Optionally write Drill logs to stdout, to be captured by {{fluentd}} or 
> similar tools.
> * Pass Drill configuration (both HOCON and envrironment) as config maps.
> * Pass ZK as an environment variable (the value of which would, one presumes, 
> come from some kind of service discovery system.)
> The result is that the user should be able to manually tinker with the YAML 
> files, then use {{kubeadm}} to launch, monitor and stop Drill. The user sets 
> cluster size manually by launching the desired number of Drill pods.
> h4. Helm Chart for Drill
> The next step is to wrap the YAML files in a Helm chart, with parameters 
> exposed for the config options noted above.
> h4. Drill Operator for K8s
>  
> Full K8s integration will require an operator to manage the Drill cluster. 
> K8s operators are often written in Go, though doing so is not necessary. 
> Drill already includes Drill-on-YARN which is, essential a "YARN operator." 
> Repurpose this code to work with K8s as the target cluster manager rather 
> than YARN. Reuse the same operations from DoY: configure, start, resize and 
> stop a cluster.
> h4. Security
> The above steps provide an "MVP": minimum viable project - it will run Drill 
> with standard options in the various environments. Users who chose to run 
> Drill in production will likely require additional security settings. Enable 
> SSL? Control ingress? We need to understand what is needed, what Drill 
> offers, and how to enable Drill's security features in a containerized 
> environment.
> h4. Production Deployments
> With Docker and K8s the old maxim "the devil is in the details" is true in 
> spades. There are dozens of ways that Drill can 

[jira] [Commented] (DRILL-7563) Docker & Kubernetes Drill server container

2020-03-24 Thread Charles Givre (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065990#comment-17065990
 ] 

Charles Givre commented on DRILL-7563:
--

@Dobes, 
Would you be willing to write up a tutorial or something that we could post on 
the documentation on the main Drill website?
Thanks,
-- C




> Docker & Kubernetes Drill server container
> --
>
> Key: DRILL-7563
> URL: https://issues.apache.org/jira/browse/DRILL-7563
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Drill provides two Docker containers:
> * [Build Drill from 
> sources|https://github.com/apache/drill/blob/master/Dockerfile]
> * [Run Drill in interactive embedded 
> mode|https://github.com/apache/drill/blob/master/distribution/Dockerfile]
> User feedback suggests that these are not quite the right solutions to run 
> Drill in a K8s (or OpenShift) cluster. In addition, we need a container to 
> run a Drill server. This ticket summarizes the tasks involved.
> h4. Container Image
> The container image should:
> * Start with the OpenJDK base image with minimal extra packages.
> * Download and install an official Drill release.
> We may then want to provide two derived images:
> The Drillbit image which:
> * Configures Drill for production and as needed in the following steps.
> * Provides entry points for the Drillbit and for Sqlline
> * Exposes Drill's four ports
> * Accept as parameters things like the ZK host IP(s).
> The Sqlline image, meant to be run in interactive mode (like the current 
> embedded image) and which:
> * Accept as parameters the ZK host IP(s).
> Both should be published to the official Drill DockerHub account: 
> https://hub.docker.com/r/apache/drill
> h4. Runtime Environment
> Drill has very few dependencies, but it must have a running ZK.
> * Start a [ZK container|https://hub.docker.com/_/zookeeper/].
> * A place to store logs, which can be in the container by default, stored on 
> the host file system via a volume mount.
> * Access to a data source, which can be configured via a storage plugin 
> stored in ZK.
> * Ensure graceful shutdown integration with the Docker shutdown mechanism.
> h4. Running Drill in Docker
> Users must run at least one Drillbit, and may run more. Users may want to run 
> Sqlline.
> * The Drillbit container requires, at a minimum, the IP address of the ZK 
> instance(s).
> * The Sqlline container requires only the ZK instances, from which it can 
> find the Drillbit.
> Uses will want to customize some parts of Drill: at least memory, perhaps any 
> of the other options. Provide a way to pass this information into the 
> container to avoid the need to rebuild the container to change configuration.
> h4. Running Drill in K8s
> The containers should be usable in "plain" Docker. Today, however, many 
> people use K8s to orchestrate Docker. Thus, the Drillbit (but probably not 
> the Sqlline) container should be designed to work with K8s. An example set of 
> K8s YAML files should illustrate:
> * Create a host-mount file system to capture Drill logs and query profiles.
> * Optionally write Drill logs to stdout, to be captured by {{fluentd}} or 
> similar tools.
> * Pass Drill configuration (both HOCON and envrironment) as config maps.
> * Pass ZK as an environment variable (the value of which would, one presumes, 
> come from some kind of service discovery system.)
> The result is that the user should be able to manually tinker with the YAML 
> files, then use {{kubeadm}} to launch, monitor and stop Drill. The user sets 
> cluster size manually by launching the desired number of Drill pods.
> h4. Helm Chart for Drill
> The next step is to wrap the YAML files in a Helm chart, with parameters 
> exposed for the config options noted above.
> h4. Drill Operator for K8s
>  
> Full K8s integration will require an operator to manage the Drill cluster. 
> K8s operators are often written in Go, though doing so is not necessary. 
> Drill already includes Drill-on-YARN which is, essential a "YARN operator." 
> Repurpose this code to work with K8s as the target cluster manager rather 
> than YARN. Reuse the same operations from DoY: configure, start, resize and 
> stop a cluster.
> h4. Security
> The above steps provide an "MVP": minimum viable project - it will run Drill 
> with standard options in the various environments. Users who chose to run 
> Drill in production will likely require additional security settings. Enable 
> SSL? Control ingress? We need to understand what is needed, what Drill 
> offers, and how to enable Drill's security features in a containerized 
> environment.
> h4. Production Deployments
> With Docker and K8s the old maxim "the devil is in the details" is true in 
> 

[jira] [Commented] (DRILL-7563) Docker & Kubernetes Drill server container

2020-03-24 Thread Dobes Vandermeer (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065973#comment-17065973
 ] 

Dobes Vandermeer commented on DRILL-7563:
-

I have been able to get drill to run in kubernetes using the existing drill 
docker images currently published.

 

See here for example configs:

[https://gist.github.com/dobesv/98d85b18ee8566891c5122e2b990f0c5]

[https://gist.github.com/dobesv/be5aa3e6e5830e54c0e77b73884333cc]

 

 

 

> Docker & Kubernetes Drill server container
> --
>
> Key: DRILL-7563
> URL: https://issues.apache.org/jira/browse/DRILL-7563
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Drill provides two Docker containers:
> * [Build Drill from 
> sources|https://github.com/apache/drill/blob/master/Dockerfile]
> * [Run Drill in interactive embedded 
> mode|https://github.com/apache/drill/blob/master/distribution/Dockerfile]
> User feedback suggests that these are not quite the right solutions to run 
> Drill in a K8s (or OpenShift) cluster. In addition, we need a container to 
> run a Drill server. This ticket summarizes the tasks involved.
> h4. Container Image
> The container image should:
> * Start with the OpenJDK base image with minimal extra packages.
> * Download and install an official Drill release.
> We may then want to provide two derived images:
> The Drillbit image which:
> * Configures Drill for production and as needed in the following steps.
> * Provides entry points for the Drillbit and for Sqlline
> * Exposes Drill's four ports
> * Accept as parameters things like the ZK host IP(s).
> The Sqlline image, meant to be run in interactive mode (like the current 
> embedded image) and which:
> * Accept as parameters the ZK host IP(s).
> Both should be published to the official Drill DockerHub account: 
> https://hub.docker.com/r/apache/drill
> h4. Runtime Environment
> Drill has very few dependencies, but it must have a running ZK.
> * Start a [ZK container|https://hub.docker.com/_/zookeeper/].
> * A place to store logs, which can be in the container by default, stored on 
> the host file system via a volume mount.
> * Access to a data source, which can be configured via a storage plugin 
> stored in ZK.
> * Ensure graceful shutdown integration with the Docker shutdown mechanism.
> h4. Running Drill in Docker
> Users must run at least one Drillbit, and may run more. Users may want to run 
> Sqlline.
> * The Drillbit container requires, at a minimum, the IP address of the ZK 
> instance(s).
> * The Sqlline container requires only the ZK instances, from which it can 
> find the Drillbit.
> Uses will want to customize some parts of Drill: at least memory, perhaps any 
> of the other options. Provide a way to pass this information into the 
> container to avoid the need to rebuild the container to change configuration.
> h4. Running Drill in K8s
> The containers should be usable in "plain" Docker. Today, however, many 
> people use K8s to orchestrate Docker. Thus, the Drillbit (but probably not 
> the Sqlline) container should be designed to work with K8s. An example set of 
> K8s YAML files should illustrate:
> * Create a host-mount file system to capture Drill logs and query profiles.
> * Optionally write Drill logs to stdout, to be captured by {{fluentd}} or 
> similar tools.
> * Pass Drill configuration (both HOCON and envrironment) as config maps.
> * Pass ZK as an environment variable (the value of which would, one presumes, 
> come from some kind of service discovery system.)
> The result is that the user should be able to manually tinker with the YAML 
> files, then use {{kubeadm}} to launch, monitor and stop Drill. The user sets 
> cluster size manually by launching the desired number of Drill pods.
> h4. Helm Chart for Drill
> The next step is to wrap the YAML files in a Helm chart, with parameters 
> exposed for the config options noted above.
> h4. Drill Operator for K8s
>  
> Full K8s integration will require an operator to manage the Drill cluster. 
> K8s operators are often written in Go, though doing so is not necessary. 
> Drill already includes Drill-on-YARN which is, essential a "YARN operator." 
> Repurpose this code to work with K8s as the target cluster manager rather 
> than YARN. Reuse the same operations from DoY: configure, start, resize and 
> stop a cluster.
> h4. Security
> The above steps provide an "MVP": minimum viable project - it will run Drill 
> with standard options in the various environments. Users who chose to run 
> Drill in production will likely require additional security settings. Enable 
> SSL? Control ingress? We need to understand what is needed, what Drill 
> offers, and how to enable Drill's security features in 

[jira] [Commented] (DRILL-7563) Docker & Kubernetes Drill server container

2020-03-20 Thread Abhishek Girish (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17063721#comment-17063721
 ] 

Abhishek Girish commented on DRILL-7563:


Update: I've added support for overriding Drill configurations (drill-env.sh 
and drill-override.conf) via conf files uploaded to configMaps. Details in the 
repo.

> Docker & Kubernetes Drill server container
> --
>
> Key: DRILL-7563
> URL: https://issues.apache.org/jira/browse/DRILL-7563
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Drill provides two Docker containers:
> * [Build Drill from 
> sources|https://github.com/apache/drill/blob/master/Dockerfile]
> * [Run Drill in interactive embedded 
> mode|https://github.com/apache/drill/blob/master/distribution/Dockerfile]
> User feedback suggests that these are not quite the right solutions to run 
> Drill in a K8s (or OpenShift) cluster. In addition, we need a container to 
> run a Drill server. This ticket summarizes the tasks involved.
> h4. Container Image
> The container image should:
> * Start with the OpenJDK base image with minimal extra packages.
> * Download and install an official Drill release.
> We may then want to provide two derived images:
> The Drillbit image which:
> * Configures Drill for production and as needed in the following steps.
> * Provides entry points for the Drillbit and for Sqlline
> * Exposes Drill's four ports
> * Accept as parameters things like the ZK host IP(s).
> The Sqlline image, meant to be run in interactive mode (like the current 
> embedded image) and which:
> * Accept as parameters the ZK host IP(s).
> Both should be published to the official Drill DockerHub account: 
> https://hub.docker.com/r/apache/drill
> h4. Runtime Environment
> Drill has very few dependencies, but it must have a running ZK.
> * Start a [ZK container|https://hub.docker.com/_/zookeeper/].
> * A place to store logs, which can be in the container by default, stored on 
> the host file system via a volume mount.
> * Access to a data source, which can be configured via a storage plugin 
> stored in ZK.
> * Ensure graceful shutdown integration with the Docker shutdown mechanism.
> h4. Running Drill in Docker
> Users must run at least one Drillbit, and may run more. Users may want to run 
> Sqlline.
> * The Drillbit container requires, at a minimum, the IP address of the ZK 
> instance(s).
> * The Sqlline container requires only the ZK instances, from which it can 
> find the Drillbit.
> Uses will want to customize some parts of Drill: at least memory, perhaps any 
> of the other options. Provide a way to pass this information into the 
> container to avoid the need to rebuild the container to change configuration.
> h4. Running Drill in K8s
> The containers should be usable in "plain" Docker. Today, however, many 
> people use K8s to orchestrate Docker. Thus, the Drillbit (but probably not 
> the Sqlline) container should be designed to work with K8s. An example set of 
> K8s YAML files should illustrate:
> * Create a host-mount file system to capture Drill logs and query profiles.
> * Optionally write Drill logs to stdout, to be captured by {{fluentd}} or 
> similar tools.
> * Pass Drill configuration (both HOCON and envrironment) as config maps.
> * Pass ZK as an environment variable (the value of which would, one presumes, 
> come from some kind of service discovery system.)
> The result is that the user should be able to manually tinker with the YAML 
> files, then use {{kubeadm}} to launch, monitor and stop Drill. The user sets 
> cluster size manually by launching the desired number of Drill pods.
> h4. Helm Chart for Drill
> The next step is to wrap the YAML files in a Helm chart, with parameters 
> exposed for the config options noted above.
> h4. Drill Operator for K8s
>  
> Full K8s integration will require an operator to manage the Drill cluster. 
> K8s operators are often written in Go, though doing so is not necessary. 
> Drill already includes Drill-on-YARN which is, essential a "YARN operator." 
> Repurpose this code to work with K8s as the target cluster manager rather 
> than YARN. Reuse the same operations from DoY: configure, start, resize and 
> stop a cluster.
> h4. Security
> The above steps provide an "MVP": minimum viable project - it will run Drill 
> with standard options in the various environments. Users who chose to run 
> Drill in production will likely require additional security settings. Enable 
> SSL? Control ingress? We need to understand what is needed, what Drill 
> offers, and how to enable Drill's security features in a containerized 
> environment.
> h4. Production Deployments
> With Docker and K8s the old maxim "the devil is in the details" 

[jira] [Commented] (DRILL-7563) Docker & Kubernetes Drill server container

2020-03-20 Thread Abhishek Girish (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17063608#comment-17063608
 ] 

Abhishek Girish commented on DRILL-7563:


Hey [~Paul.Rogers], [~arina], and all, sorry I missed this JIRA + comments in 
it.

I would like to share this GitHub repository of mine: 
https://github.com/Agirish/drill-helm-charts . Kindly take a look when you have 
some time.

It has support for deploying Apache Drill clusters on Kubernetes using the Helm 
Charts approach. It's functional out-of-box (I've deployed on a standard GKE 
Kubernetes Cluster) and I'm frequently adding fixes, improvements and new 
features. There are a few things missing - such as support for passing Drill 
configuration files as ConfigMaps, which I'm working on (but it has the most 
common configs available for anyone to change differently). 
The documentation has all the basic details on repo structure & usage, and I'm 
working on adding more information to it. This requires custom Dockerfiles and 
I have included those in the repo as well. 

I also have an alternate implementation by building a new native Kubernetes 
Operator for Apache Drill which can provide more flexibility and power. But for 
now, I'd like to focus on getting the above Helm Charts approach to be fully 
feature complete, reviewed and committed, so that we could ship them in an 
upcoming release.

Please feel free to submit corrections / enhancements as PRs (or file GitHub 
issues with comments) to the repo above until this is part of the official 
Drill repo: https://github.com/Agirish/drill-helm-charts/issues

> Docker & Kubernetes Drill server container
> --
>
> Key: DRILL-7563
> URL: https://issues.apache.org/jira/browse/DRILL-7563
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Drill provides two Docker containers:
> * [Build Drill from 
> sources|https://github.com/apache/drill/blob/master/Dockerfile]
> * [Run Drill in interactive embedded 
> mode|https://github.com/apache/drill/blob/master/distribution/Dockerfile]
> User feedback suggests that these are not quite the right solutions to run 
> Drill in a K8s (or OpenShift) cluster. In addition, we need a container to 
> run a Drill server. This ticket summarizes the tasks involved.
> h4. Container Image
> The container image should:
> * Start with the OpenJDK base image with minimal extra packages.
> * Download and install an official Drill release.
> We may then want to provide two derived images:
> The Drillbit image which:
> * Configures Drill for production and as needed in the following steps.
> * Provides entry points for the Drillbit and for Sqlline
> * Exposes Drill's four ports
> * Accept as parameters things like the ZK host IP(s).
> The Sqlline image, meant to be run in interactive mode (like the current 
> embedded image) and which:
> * Accept as parameters the ZK host IP(s).
> Both should be published to the official Drill DockerHub account: 
> https://hub.docker.com/r/apache/drill
> h4. Runtime Environment
> Drill has very few dependencies, but it must have a running ZK.
> * Start a [ZK container|https://hub.docker.com/_/zookeeper/].
> * A place to store logs, which can be in the container by default, stored on 
> the host file system via a volume mount.
> * Access to a data source, which can be configured via a storage plugin 
> stored in ZK.
> * Ensure graceful shutdown integration with the Docker shutdown mechanism.
> h4. Running Drill in Docker
> Users must run at least one Drillbit, and may run more. Users may want to run 
> Sqlline.
> * The Drillbit container requires, at a minimum, the IP address of the ZK 
> instance(s).
> * The Sqlline container requires only the ZK instances, from which it can 
> find the Drillbit.
> Uses will want to customize some parts of Drill: at least memory, perhaps any 
> of the other options. Provide a way to pass this information into the 
> container to avoid the need to rebuild the container to change configuration.
> h4. Running Drill in K8s
> The containers should be usable in "plain" Docker. Today, however, many 
> people use K8s to orchestrate Docker. Thus, the Drillbit (but probably not 
> the Sqlline) container should be designed to work with K8s. An example set of 
> K8s YAML files should illustrate:
> * Create a host-mount file system to capture Drill logs and query profiles.
> * Optionally write Drill logs to stdout, to be captured by {{fluentd}} or 
> similar tools.
> * Pass Drill configuration (both HOCON and envrironment) as config maps.
> * Pass ZK as an environment variable (the value of which would, one presumes, 
> come from some kind of service discovery system.)
> The result is that the user should be able to 

[jira] [Commented] (DRILL-7563) Docker & Kubernetes Drill server container

2020-02-04 Thread Paul Rogers (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030008#comment-17030008
 ] 

Paul Rogers commented on DRILL-7563:


There appears to be a separate mechanism to announce "official" images. See how 
[Zookeeper|https://hub.docker.com/_/zookeeper/] does it. Look at the URL: the 
"_" part of the path seems to lead to official images. The ZK page has the ZK 
(but not Apache) logo.

> Docker & Kubernetes Drill server container
> --
>
> Key: DRILL-7563
> URL: https://issues.apache.org/jira/browse/DRILL-7563
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Drill provides two Docker containers:
> * [Build Drill from 
> sources|https://github.com/apache/drill/blob/master/Dockerfile]
> * [Run Drill in interactive embedded 
> mode|https://github.com/apache/drill/blob/master/distribution/Dockerfile]
> User feedback suggests that these are not quite the right solutions to run 
> Drill in a K8s (or OpenShift) cluster. In addition, we need a container to 
> run a Drill server. This ticket summarizes the tasks involved.
> h4. Container Image
> The container image should:
> * Start with the OpenJDK base image with minimal extra packages.
> * Download and install an official Drill release.
> We may then want to provide two derived images:
> The Drillbit image which:
> * Configures Drill for production and as needed in the following steps.
> * Provides entry points for the Drillbit and for Sqlline
> * Exposes Drill's four ports
> * Accept as parameters things like the ZK host IP(s).
> The Sqlline image, meant to be run in interactive mode (like the current 
> embedded image) and which:
> * Accept as parameters the ZK host IP(s).
> Both should be published to the official Drill DockerHub account: 
> https://hub.docker.com/r/apache/drill
> h4. Runtime Environment
> Drill has very few dependencies, but it must have a running ZK.
> * Start a [ZK container|https://hub.docker.com/_/zookeeper/].
> * A place to store logs, which can be in the container by default, stored on 
> the host file system via a volume mount.
> * Access to a data source, which can be configured via a storage plugin 
> stored in ZK.
> * Ensure graceful shutdown integration with the Docker shutdown mechanism.
> h4. Running Drill in Docker
> Users must run at least one Drillbit, and may run more. Users may want to run 
> Sqlline.
> * The Drillbit container requires, at a minimum, the IP address of the ZK 
> instance(s).
> * The Sqlline container requires only the ZK instances, from which it can 
> find the Drillbit.
> Uses will want to customize some parts of Drill: at least memory, perhaps any 
> of the other options. Provide a way to pass this information into the 
> container to avoid the need to rebuild the container to change configuration.
> h4. Running Drill in K8s
> The containers should be usable in "plain" Docker. Today, however, many 
> people use K8s to orchestrate Docker. Thus, the Drillbit (but probably not 
> the Sqlline) container should be designed to work with K8s. An example set of 
> K8s YAML files should illustrate:
> * Create a host-mount file system to capture Drill logs and query profiles.
> * Optionally write Drill logs to stdout, to be captured by {{fluentd}} or 
> similar tools.
> * Pass Drill configuration (both HOCON and envrironment) as config maps.
> * Pass ZK as an environment variable (the value of which would, one presumes, 
> come from some kind of service discovery system.)
> The result is that the user should be able to manually tinker with the YAML 
> files, then use {{kubeadm}} to launch, monitor and stop Drill. The user sets 
> cluster size manually by launching the desired number of Drill pods.
> h4. Helm Chart for Drill
> The next step is to wrap the YAML files in a Helm chart, with parameters 
> exposed for the config options noted above.
> h4. Drill Operator for K8s
>  
> Full K8s integration will require an operator to manage the Drill cluster. 
> K8s operators are often written in Go, though doing so is not necessary. 
> Drill already includes Drill-on-YARN which is, essential a "YARN operator." 
> Repurpose this code to work with K8s as the target cluster manager rather 
> than YARN. Reuse the same operations from DoY: configure, start, resize and 
> stop a cluster.
> h4. Security
> The above steps provide an "MVP": minimum viable project - it will run Drill 
> with standard options in the various environments. Users who chose to run 
> Drill in production will likely require additional security settings. Enable 
> SSL? Control ingress? We need to understand what is needed, what Drill 
> offers, and how to enable Drill's security features in a containerized 
> environment.
> 

[jira] [Commented] (DRILL-7563) Docker & Kubernetes Drill server container

2020-02-04 Thread Arina Ielchiieva (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17029810#comment-17029810
 ] 

Arina Ielchiieva commented on DRILL-7563:
-

[~cgivre] as per reply in INFRA most likely Apache projects won't be able to 
have a logo :(

> Docker & Kubernetes Drill server container
> --
>
> Key: DRILL-7563
> URL: https://issues.apache.org/jira/browse/DRILL-7563
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Drill provides two Docker containers:
> * [Build Drill from 
> sources|https://github.com/apache/drill/blob/master/Dockerfile]
> * [Run Drill in interactive embedded 
> mode|https://github.com/apache/drill/blob/master/distribution/Dockerfile]
> User feedback suggests that these are not quite the right solutions to run 
> Drill in a K8s (or OpenShift) cluster. In addition, we need a container to 
> run a Drill server. This ticket summarizes the tasks involved.
> h4. Container Image
> The container image should:
> * Start with the OpenJDK base image with minimal extra packages.
> * Download and install an official Drill release.
> We may then want to provide two derived images:
> The Drillbit image which:
> * Configures Drill for production and as needed in the following steps.
> * Provides entry points for the Drillbit and for Sqlline
> * Exposes Drill's four ports
> * Accept as parameters things like the ZK host IP(s).
> The Sqlline image, meant to be run in interactive mode (like the current 
> embedded image) and which:
> * Accept as parameters the ZK host IP(s).
> Both should be published to the official Drill DockerHub account: 
> https://hub.docker.com/r/apache/drill
> h4. Runtime Environment
> Drill has very few dependencies, but it must have a running ZK.
> * Start a [ZK container|https://hub.docker.com/_/zookeeper/].
> * A place to store logs, which can be in the container by default, stored on 
> the host file system via a volume mount.
> * Access to a data source, which can be configured via a storage plugin 
> stored in ZK.
> * Ensure graceful shutdown integration with the Docker shutdown mechanism.
> h4. Running Drill in Docker
> Users must run at least one Drillbit, and may run more. Users may want to run 
> Sqlline.
> * The Drillbit container requires, at a minimum, the IP address of the ZK 
> instance(s).
> * The Sqlline container requires only the ZK instances, from which it can 
> find the Drillbit.
> Uses will want to customize some parts of Drill: at least memory, perhaps any 
> of the other options. Provide a way to pass this information into the 
> container to avoid the need to rebuild the container to change configuration.
> h4. Running Drill in K8s
> The containers should be usable in "plain" Docker. Today, however, many 
> people use K8s to orchestrate Docker. Thus, the Drillbit (but probably not 
> the Sqlline) container should be designed to work with K8s. An example set of 
> K8s YAML files should illustrate:
> * Create a host-mount file system to capture Drill logs and query profiles.
> * Optionally write Drill logs to stdout, to be captured by {{fluentd}} or 
> similar tools.
> * Pass Drill configuration (both HOCON and envrironment) as config maps.
> * Pass ZK as an environment variable (the value of which would, one presumes, 
> come from some kind of service discovery system.)
> The result is that the user should be able to manually tinker with the YAML 
> files, then use {{kubeadm}} to launch, monitor and stop Drill. The user sets 
> cluster size manually by launching the desired number of Drill pods.
> h4. Helm Chart for Drill
> The next step is to wrap the YAML files in a Helm chart, with parameters 
> exposed for the config options noted above.
> h4. Drill Operator for K8s
>  
> Full K8s integration will require an operator to manage the Drill cluster. 
> K8s operators are often written in Go, though doing so is not necessary. 
> Drill already includes Drill-on-YARN which is, essential a "YARN operator." 
> Repurpose this code to work with K8s as the target cluster manager rather 
> than YARN. Reuse the same operations from DoY: configure, start, resize and 
> stop a cluster.
> h4. Security
> The above steps provide an "MVP": minimum viable project - it will run Drill 
> with standard options in the various environments. Users who chose to run 
> Drill in production will likely require additional security settings. Enable 
> SSL? Control ingress? We need to understand what is needed, what Drill 
> offers, and how to enable Drill's security features in a containerized 
> environment.
> h4. Production Deployments
> With Docker and K8s the old maxim "the devil is in the details" is true in 
> spades. There are dozens of ways that Drill can be 

[jira] [Commented] (DRILL-7563) Docker & Kubernetes Drill server container

2020-02-04 Thread Arina Ielchiieva (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17029809#comment-17029809
 ] 

Arina Ielchiieva commented on DRILL-7563:
-

[~agirish] could you please review Paul's implementation plan? What parts are 
you planning to deliver?
Since this work is aimed for the next release, it would be good to know the ETA.

> Docker & Kubernetes Drill server container
> --
>
> Key: DRILL-7563
> URL: https://issues.apache.org/jira/browse/DRILL-7563
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Drill provides two Docker containers:
> * [Build Drill from 
> sources|https://github.com/apache/drill/blob/master/Dockerfile]
> * [Run Drill in interactive embedded 
> mode|https://github.com/apache/drill/blob/master/distribution/Dockerfile]
> User feedback suggests that these are not quite the right solutions to run 
> Drill in a K8s (or OpenShift) cluster. In addition, we need a container to 
> run a Drill server. This ticket summarizes the tasks involved.
> h4. Container Image
> The container image should:
> * Start with the OpenJDK base image with minimal extra packages.
> * Download and install an official Drill release.
> We may then want to provide two derived images:
> The Drillbit image which:
> * Configures Drill for production and as needed in the following steps.
> * Provides entry points for the Drillbit and for Sqlline
> * Exposes Drill's four ports
> * Accept as parameters things like the ZK host IP(s).
> The Sqlline image, meant to be run in interactive mode (like the current 
> embedded image) and which:
> * Accept as parameters the ZK host IP(s).
> Both should be published to the official Drill DockerHub account: 
> https://hub.docker.com/r/apache/drill
> h4. Runtime Environment
> Drill has very few dependencies, but it must have a running ZK.
> * Start a [ZK container|https://hub.docker.com/_/zookeeper/].
> * A place to store logs, which can be in the container by default, stored on 
> the host file system via a volume mount.
> * Access to a data source, which can be configured via a storage plugin 
> stored in ZK.
> * Ensure graceful shutdown integration with the Docker shutdown mechanism.
> h4. Running Drill in Docker
> Users must run at least one Drillbit, and may run more. Users may want to run 
> Sqlline.
> * The Drillbit container requires, at a minimum, the IP address of the ZK 
> instance(s).
> * The Sqlline container requires only the ZK instances, from which it can 
> find the Drillbit.
> Uses will want to customize some parts of Drill: at least memory, perhaps any 
> of the other options. Provide a way to pass this information into the 
> container to avoid the need to rebuild the container to change configuration.
> h4. Running Drill in K8s
> The containers should be usable in "plain" Docker. Today, however, many 
> people use K8s to orchestrate Docker. Thus, the Drillbit (but probably not 
> the Sqlline) container should be designed to work with K8s. An example set of 
> K8s YAML files should illustrate:
> * Create a host-mount file system to capture Drill logs and query profiles.
> * Optionally write Drill logs to stdout, to be captured by {{fluentd}} or 
> similar tools.
> * Pass Drill configuration (both HOCON and envrironment) as config maps.
> * Pass ZK as an environment variable (the value of which would, one presumes, 
> come from some kind of service discovery system.)
> The result is that the user should be able to manually tinker with the YAML 
> files, then use {{kubeadm}} to launch, monitor and stop Drill. The user sets 
> cluster size manually by launching the desired number of Drill pods.
> h4. Helm Chart for Drill
> The next step is to wrap the YAML files in a Helm chart, with parameters 
> exposed for the config options noted above.
> h4. Drill Operator for K8s
>  
> Full K8s integration will require an operator to manage the Drill cluster. 
> K8s operators are often written in Go, though doing so is not necessary. 
> Drill already includes Drill-on-YARN which is, essential a "YARN operator." 
> Repurpose this code to work with K8s as the target cluster manager rather 
> than YARN. Reuse the same operations from DoY: configure, start, resize and 
> stop a cluster.
> h4. Security
> The above steps provide an "MVP": minimum viable project - it will run Drill 
> with standard options in the various environments. Users who chose to run 
> Drill in production will likely require additional security settings. Enable 
> SSL? Control ingress? We need to understand what is needed, what Drill 
> offers, and how to enable Drill's security features in a containerized 
> environment.
> h4. Production Deployments
> With Docker and K8s the old maxim "the 

[jira] [Commented] (DRILL-7563) Docker & Kubernetes Drill server container

2020-02-03 Thread Paul Rogers (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17029403#comment-17029403
 ] 

Paul Rogers commented on DRILL-7563:


Here are the first attempts at a docker container for a Drillbit:

* DockerHub: [https://hub.docker.com/repository/docker/gaucho84/drill]
 * Source: [https://github.com/paul-rogers/drill-docker] See the docker folder 
for details.

As [~arina] noted, and Abhishek confirmed, MapR/HPE has a proprietary solution 
that may be open sourced at some point. To avoid duplicate efforts, let's put 
this on hold for the time being. We can revisit later if Abhishek's version 
does not end up being contributed.

> Docker & Kubernetes Drill server container
> --
>
> Key: DRILL-7563
> URL: https://issues.apache.org/jira/browse/DRILL-7563
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Drill provides two Docker containers:
> * [Build Drill from 
> sources|https://github.com/apache/drill/blob/master/Dockerfile]
> * [Run Drill in interactive embedded 
> mode|https://github.com/apache/drill/blob/master/distribution/Dockerfile]
> User feedback suggests that these are not quite the right solutions to run 
> Drill in a K8s (or OpenShift) cluster. In addition, we need a container to 
> run a Drill server. This ticket summarizes the tasks involved.
> h4. Container Image
> The container image should:
> * Start with the OpenJDK base image with minimal extra packages.
> * Download and install an official Drill release.
> We may then want to provide two derived images:
> The Drillbit image which:
> * Configures Drill for production and as needed in the following steps.
> * Provides entry points for the Drillbit and for Sqlline
> * Exposes Drill's four ports
> * Accept as parameters things like the ZK host IP(s).
> The Sqlline image, meant to be run in interactive mode (like the current 
> embedded image) and which:
> * Accept as parameters the ZK host IP(s).
> Both should be published to the official Drill DockerHub account: 
> https://hub.docker.com/r/apache/drill
> h4. Runtime Environment
> Drill has very few dependencies, but it must have a running ZK.
> * Start a [ZK container|https://hub.docker.com/_/zookeeper/].
> * A place to store logs, which can be in the container by default, stored on 
> the host file system via a volume mount.
> * Access to a data source, which can be configured via a storage plugin 
> stored in ZK.
> * Ensure graceful shutdown integration with the Docker shutdown mechanism.
> h4. Running Drill in Docker
> Users must run at least one Drillbit, and may run more. Users may want to run 
> Sqlline.
> * The Drillbit container requires, at a minimum, the IP address of the ZK 
> instance(s).
> * The Sqlline container requires only the ZK instances, from which it can 
> find the Drillbit.
> Uses will want to customize some parts of Drill: at least memory, perhaps any 
> of the other options. Provide a way to pass this information into the 
> container to avoid the need to rebuild the container to change configuration.
> h4. Running Drill in K8s
> The containers should be usable in "plain" Docker. Today, however, many 
> people use K8s to orchestrate Docker. Thus, the Drillbit (but probably not 
> the Sqlline) container should be designed to work with K8s. An example set of 
> K8s YAML files should illustrate:
> * Create a host-mount file system to capture Drill logs and query profiles.
> * Optionally write Drill logs to stdout, to be captured by {{fluentd}} or 
> similar tools.
> * Pass Drill configuration (both HOCON and envrironment) as config maps.
> * Pass ZK as an environment variable (the value of which would, one presumes, 
> come from some kind of service discovery system.)
> The result is that the user should be able to manually tinker with the YAML 
> files, then use {{kubeadm}} to launch, monitor and stop Drill. The user sets 
> cluster size manually by launching the desired number of Drill pods.
> h4. Helm Chart for Drill
> The next step is to wrap the YAML files in a Helm chart, with parameters 
> exposed for the config options noted above.
> h4. Drill Operator for K8s
>  
> Full K8s integration will require an operator to manage the Drill cluster. 
> K8s operators are often written in Go, though doing so is not necessary. 
> Drill already includes Drill-on-YARN which is, essential a "YARN operator." 
> Repurpose this code to work with K8s as the target cluster manager rather 
> than YARN. Reuse the same operations from DoY: configure, start, resize and 
> stop a cluster.
> h4. Security
> The above steps provide an "MVP": minimum viable project - it will run Drill 
> with standard options in the various environments. Users who chose to run 
> 

[jira] [Commented] (DRILL-7563) Docker & Kubernetes Drill server container

2020-02-03 Thread Arina Ielchiieva (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028897#comment-17028897
 ] 

Arina Ielchiieva commented on DRILL-7563:
-

Thanks!

> Docker & Kubernetes Drill server container
> --
>
> Key: DRILL-7563
> URL: https://issues.apache.org/jira/browse/DRILL-7563
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Drill provides two Docker containers:
> * [Build Drill from 
> sources|https://github.com/apache/drill/blob/master/Dockerfile]
> * [Run Drill in interactive embedded 
> mode|https://github.com/apache/drill/blob/master/distribution/Dockerfile]
> User feedback suggests that these are not quite the right solutions to run 
> Drill in a K8s (or OpenShift) cluster. In addition, we need a container to 
> run a Drill server. This ticket summarizes the tasks involved.
> h4. Container Image
> The container image should:
> * Start with the OpenJDK base image with minimal extra packages.
> * Download and install an official Drill release.
> We may then want to provide two derived images:
> The Drillbit image which:
> * Configures Drill for production and as needed in the following steps.
> * Provides entry points for the Drillbit and for Sqlline
> * Exposes Drill's four ports
> * Accept as parameters things like the ZK host IP(s).
> The Sqlline image, meant to be run in interactive mode (like the current 
> embedded image) and which:
> * Accept as parameters the ZK host IP(s).
> Both should be published to the official Drill DockerHub account: 
> https://hub.docker.com/r/apache/drill
> h4. Runtime Environment
> Drill has very few dependencies, but it must have a running ZK.
> * Start a [ZK container|https://hub.docker.com/_/zookeeper/].
> * A place to store logs, which can be in the container by default, stored on 
> the host file system via a volume mount.
> * Access to a data source, which can be configured via a storage plugin 
> stored in ZK.
> * Ensure graceful shutdown integration with the Docker shutdown mechanism.
> h4. Running Drill in Docker
> Users must run at least one Drillbit, and may run more. Users may want to run 
> Sqlline.
> * The Drillbit container requires, at a minimum, the IP address of the ZK 
> instance(s).
> * The Sqlline container requires only the ZK instances, from which it can 
> find the Drillbit.
> Uses will want to customize some parts of Drill: at least memory, perhaps any 
> of the other options. Provide a way to pass this information into the 
> container to avoid the need to rebuild the container to change configuration.
> h4. Running Drill in K8s
> The containers should be usable in "plain" Docker. Today, however, many 
> people use K8s to orchestrate Docker. Thus, the Drillbit (but probably not 
> the Sqlline) container should be designed to work with K8s. An example set of 
> K8s YAML files should illustrate:
> * Create a host-mount file system to capture Drill logs and query profiles.
> * Optionally write Drill logs to stdout, to be captured by {{fluentd}} or 
> similar tools.
> * Pass Drill configuration (both HOCON and envrironment) as config maps.
> * Pass ZK as an environment variable (the value of which would, one presumes, 
> come from some kind of service discovery system.)
> The result is that the user should be able to manually tinker with the YAML 
> files, then use {{kubeadm}} to launch, monitor and stop Drill. The user sets 
> cluster size manually by launching the desired number of Drill pods.
> h4. Helm Chart for Drill
> The next step is to wrap the YAML files in a Helm chart, with parameters 
> exposed for the config options noted above.
> h4. Drill Operator for K8s
>  
> Full K8s integration will require an operator to manage the Drill cluster. 
> K8s operators are often written in Go, though doing so is not necessary. 
> Drill already includes Drill-on-YARN which is, essential a "YARN operator." 
> Repurpose this code to work with K8s as the target cluster manager rather 
> than YARN. Reuse the same operations from DoY: configure, start, resize and 
> stop a cluster.
> h4. Security
> The above steps provide an "MVP": minimum viable project - it will run Drill 
> with standard options in the various environments. Users who chose to run 
> Drill in production will likely require additional security settings. Enable 
> SSL? Control ingress? We need to understand what is needed, what Drill 
> offers, and how to enable Drill's security features in a containerized 
> environment.
> h4. Production Deployments
> With Docker and K8s the old maxim "the devil is in the details" is true in 
> spades. There are dozens of ways that Drill can be configured and integrated 
> in K8s: "stock K8s", OpenShift, AWS EKS, Google GCP and so 

[jira] [Commented] (DRILL-7563) Docker & Kubernetes Drill server container

2020-02-03 Thread Vova Vysotskyi (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028896#comment-17028896
 ] 

Vova Vysotskyi commented on DRILL-7563:
---

Thanks, I have created https://issues.apache.org/jira/browse/INFRA-19810 for 
this.

> Docker & Kubernetes Drill server container
> --
>
> Key: DRILL-7563
> URL: https://issues.apache.org/jira/browse/DRILL-7563
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Drill provides two Docker containers:
> * [Build Drill from 
> sources|https://github.com/apache/drill/blob/master/Dockerfile]
> * [Run Drill in interactive embedded 
> mode|https://github.com/apache/drill/blob/master/distribution/Dockerfile]
> User feedback suggests that these are not quite the right solutions to run 
> Drill in a K8s (or OpenShift) cluster. In addition, we need a container to 
> run a Drill server. This ticket summarizes the tasks involved.
> h4. Container Image
> The container image should:
> * Start with the OpenJDK base image with minimal extra packages.
> * Download and install an official Drill release.
> We may then want to provide two derived images:
> The Drillbit image which:
> * Configures Drill for production and as needed in the following steps.
> * Provides entry points for the Drillbit and for Sqlline
> * Exposes Drill's four ports
> * Accept as parameters things like the ZK host IP(s).
> The Sqlline image, meant to be run in interactive mode (like the current 
> embedded image) and which:
> * Accept as parameters the ZK host IP(s).
> Both should be published to the official Drill DockerHub account: 
> https://hub.docker.com/r/apache/drill
> h4. Runtime Environment
> Drill has very few dependencies, but it must have a running ZK.
> * Start a [ZK container|https://hub.docker.com/_/zookeeper/].
> * A place to store logs, which can be in the container by default, stored on 
> the host file system via a volume mount.
> * Access to a data source, which can be configured via a storage plugin 
> stored in ZK.
> * Ensure graceful shutdown integration with the Docker shutdown mechanism.
> h4. Running Drill in Docker
> Users must run at least one Drillbit, and may run more. Users may want to run 
> Sqlline.
> * The Drillbit container requires, at a minimum, the IP address of the ZK 
> instance(s).
> * The Sqlline container requires only the ZK instances, from which it can 
> find the Drillbit.
> Uses will want to customize some parts of Drill: at least memory, perhaps any 
> of the other options. Provide a way to pass this information into the 
> container to avoid the need to rebuild the container to change configuration.
> h4. Running Drill in K8s
> The containers should be usable in "plain" Docker. Today, however, many 
> people use K8s to orchestrate Docker. Thus, the Drillbit (but probably not 
> the Sqlline) container should be designed to work with K8s. An example set of 
> K8s YAML files should illustrate:
> * Create a host-mount file system to capture Drill logs and query profiles.
> * Optionally write Drill logs to stdout, to be captured by {{fluentd}} or 
> similar tools.
> * Pass Drill configuration (both HOCON and envrironment) as config maps.
> * Pass ZK as an environment variable (the value of which would, one presumes, 
> come from some kind of service discovery system.)
> The result is that the user should be able to manually tinker with the YAML 
> files, then use {{kubeadm}} to launch, monitor and stop Drill. The user sets 
> cluster size manually by launching the desired number of Drill pods.
> h4. Helm Chart for Drill
> The next step is to wrap the YAML files in a Helm chart, with parameters 
> exposed for the config options noted above.
> h4. Drill Operator for K8s
>  
> Full K8s integration will require an operator to manage the Drill cluster. 
> K8s operators are often written in Go, though doing so is not necessary. 
> Drill already includes Drill-on-YARN which is, essential a "YARN operator." 
> Repurpose this code to work with K8s as the target cluster manager rather 
> than YARN. Reuse the same operations from DoY: configure, start, resize and 
> stop a cluster.
> h4. Security
> The above steps provide an "MVP": minimum viable project - it will run Drill 
> with standard options in the various environments. Users who chose to run 
> Drill in production will likely require additional security settings. Enable 
> SSL? Control ingress? We need to understand what is needed, what Drill 
> offers, and how to enable Drill's security features in a containerized 
> environment.
> h4. Production Deployments
> With Docker and K8s the old maxim "the devil is in the details" is true in 
> spades. There are dozens of ways that Drill can be configured and 

[jira] [Commented] (DRILL-7563) Docker & Kubernetes Drill server container

2020-02-03 Thread Arina Ielchiieva (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028887#comment-17028887
 ] 

Arina Ielchiieva commented on DRILL-7563:
-

Can you please raise a ticket on INFRA and ask how logo can be added?

> Docker & Kubernetes Drill server container
> --
>
> Key: DRILL-7563
> URL: https://issues.apache.org/jira/browse/DRILL-7563
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Drill provides two Docker containers:
> * [Build Drill from 
> sources|https://github.com/apache/drill/blob/master/Dockerfile]
> * [Run Drill in interactive embedded 
> mode|https://github.com/apache/drill/blob/master/distribution/Dockerfile]
> User feedback suggests that these are not quite the right solutions to run 
> Drill in a K8s (or OpenShift) cluster. In addition, we need a container to 
> run a Drill server. This ticket summarizes the tasks involved.
> h4. Container Image
> The container image should:
> * Start with the OpenJDK base image with minimal extra packages.
> * Download and install an official Drill release.
> We may then want to provide two derived images:
> The Drillbit image which:
> * Configures Drill for production and as needed in the following steps.
> * Provides entry points for the Drillbit and for Sqlline
> * Exposes Drill's four ports
> * Accept as parameters things like the ZK host IP(s).
> The Sqlline image, meant to be run in interactive mode (like the current 
> embedded image) and which:
> * Accept as parameters the ZK host IP(s).
> Both should be published to the official Drill DockerHub account: 
> https://hub.docker.com/r/apache/drill
> h4. Runtime Environment
> Drill has very few dependencies, but it must have a running ZK.
> * Start a [ZK container|https://hub.docker.com/_/zookeeper/].
> * A place to store logs, which can be in the container by default, stored on 
> the host file system via a volume mount.
> * Access to a data source, which can be configured via a storage plugin 
> stored in ZK.
> * Ensure graceful shutdown integration with the Docker shutdown mechanism.
> h4. Running Drill in Docker
> Users must run at least one Drillbit, and may run more. Users may want to run 
> Sqlline.
> * The Drillbit container requires, at a minimum, the IP address of the ZK 
> instance(s).
> * The Sqlline container requires only the ZK instances, from which it can 
> find the Drillbit.
> Uses will want to customize some parts of Drill: at least memory, perhaps any 
> of the other options. Provide a way to pass this information into the 
> container to avoid the need to rebuild the container to change configuration.
> h4. Running Drill in K8s
> The containers should be usable in "plain" Docker. Today, however, many 
> people use K8s to orchestrate Docker. Thus, the Drillbit (but probably not 
> the Sqlline) container should be designed to work with K8s. An example set of 
> K8s YAML files should illustrate:
> * Create a host-mount file system to capture Drill logs and query profiles.
> * Optionally write Drill logs to stdout, to be captured by {{fluentd}} or 
> similar tools.
> * Pass Drill configuration (both HOCON and envrironment) as config maps.
> * Pass ZK as an environment variable (the value of which would, one presumes, 
> come from some kind of service discovery system.)
> The result is that the user should be able to manually tinker with the YAML 
> files, then use {{kubeadm}} to launch, monitor and stop Drill. The user sets 
> cluster size manually by launching the desired number of Drill pods.
> h4. Helm Chart for Drill
> The next step is to wrap the YAML files in a Helm chart, with parameters 
> exposed for the config options noted above.
> h4. Drill Operator for K8s
>  
> Full K8s integration will require an operator to manage the Drill cluster. 
> K8s operators are often written in Go, though doing so is not necessary. 
> Drill already includes Drill-on-YARN which is, essential a "YARN operator." 
> Repurpose this code to work with K8s as the target cluster manager rather 
> than YARN. Reuse the same operations from DoY: configure, start, resize and 
> stop a cluster.
> h4. Security
> The above steps provide an "MVP": minimum viable project - it will run Drill 
> with standard options in the various environments. Users who chose to run 
> Drill in production will likely require additional security settings. Enable 
> SSL? Control ingress? We need to understand what is needed, what Drill 
> offers, and how to enable Drill's security features in a containerized 
> environment.
> h4. Production Deployments
> With Docker and K8s the old maxim "the devil is in the details" is true in 
> spades. There are dozens of ways that Drill can be configured and integrated 

[jira] [Commented] (DRILL-7563) Docker & Kubernetes Drill server container

2020-02-03 Thread Vova Vysotskyi (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028874#comment-17028874
 ] 

Vova Vysotskyi commented on DRILL-7563:
---

I don't think that we can add a logo to https://hub.docker.com/r/apache/drill. 
I didn't find an option for that in repository settings, and all other 
repositories under Apache are also without logo: 
https://hub.docker.com/u/apache.

> Docker & Kubernetes Drill server container
> --
>
> Key: DRILL-7563
> URL: https://issues.apache.org/jira/browse/DRILL-7563
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Drill provides two Docker containers:
> * [Build Drill from 
> sources|https://github.com/apache/drill/blob/master/Dockerfile]
> * [Run Drill in interactive embedded 
> mode|https://github.com/apache/drill/blob/master/distribution/Dockerfile]
> User feedback suggests that these are not quite the right solutions to run 
> Drill in a K8s (or OpenShift) cluster. In addition, we need a container to 
> run a Drill server. This ticket summarizes the tasks involved.
> h4. Container Image
> The container image should:
> * Start with the OpenJDK base image with minimal extra packages.
> * Download and install an official Drill release.
> We may then want to provide two derived images:
> The Drillbit image which:
> * Configures Drill for production and as needed in the following steps.
> * Provides entry points for the Drillbit and for Sqlline
> * Exposes Drill's four ports
> * Accept as parameters things like the ZK host IP(s).
> The Sqlline image, meant to be run in interactive mode (like the current 
> embedded image) and which:
> * Accept as parameters the ZK host IP(s).
> Both should be published to the official Drill DockerHub account: 
> https://hub.docker.com/r/apache/drill
> h4. Runtime Environment
> Drill has very few dependencies, but it must have a running ZK.
> * Start a [ZK container|https://hub.docker.com/_/zookeeper/].
> * A place to store logs, which can be in the container by default, stored on 
> the host file system via a volume mount.
> * Access to a data source, which can be configured via a storage plugin 
> stored in ZK.
> * Ensure graceful shutdown integration with the Docker shutdown mechanism.
> h4. Running Drill in Docker
> Users must run at least one Drillbit, and may run more. Users may want to run 
> Sqlline.
> * The Drillbit container requires, at a minimum, the IP address of the ZK 
> instance(s).
> * The Sqlline container requires only the ZK instances, from which it can 
> find the Drillbit.
> Uses will want to customize some parts of Drill: at least memory, perhaps any 
> of the other options. Provide a way to pass this information into the 
> container to avoid the need to rebuild the container to change configuration.
> h4. Running Drill in K8s
> The containers should be usable in "plain" Docker. Today, however, many 
> people use K8s to orchestrate Docker. Thus, the Drillbit (but probably not 
> the Sqlline) container should be designed to work with K8s. An example set of 
> K8s YAML files should illustrate:
> * Create a host-mount file system to capture Drill logs and query profiles.
> * Optionally write Drill logs to stdout, to be captured by {{fluentd}} or 
> similar tools.
> * Pass Drill configuration (both HOCON and envrironment) as config maps.
> * Pass ZK as an environment variable (the value of which would, one presumes, 
> come from some kind of service discovery system.)
> The result is that the user should be able to manually tinker with the YAML 
> files, then use {{kubeadm}} to launch, monitor and stop Drill. The user sets 
> cluster size manually by launching the desired number of Drill pods.
> h4. Helm Chart for Drill
> The next step is to wrap the YAML files in a Helm chart, with parameters 
> exposed for the config options noted above.
> h4. Drill Operator for K8s
>  
> Full K8s integration will require an operator to manage the Drill cluster. 
> K8s operators are often written in Go, though doing so is not necessary. 
> Drill already includes Drill-on-YARN which is, essential a "YARN operator." 
> Repurpose this code to work with K8s as the target cluster manager rather 
> than YARN. Reuse the same operations from DoY: configure, start, resize and 
> stop a cluster.
> h4. Security
> The above steps provide an "MVP": minimum viable project - it will run Drill 
> with standard options in the various environments. Users who chose to run 
> Drill in production will likely require additional security settings. Enable 
> SSL? Control ingress? We need to understand what is needed, what Drill 
> offers, and how to enable Drill's security features in a containerized 
> environment.
> h4. Production 

[jira] [Commented] (DRILL-7563) Docker & Kubernetes Drill server container

2020-02-03 Thread Arina Ielchiieva (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028864#comment-17028864
 ] 

Arina Ielchiieva commented on DRILL-7563:
-

[~volodymyr] can we add logo? (https://hub.docker.com/r/apache/drill)

> Docker & Kubernetes Drill server container
> --
>
> Key: DRILL-7563
> URL: https://issues.apache.org/jira/browse/DRILL-7563
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Drill provides two Docker containers:
> * [Build Drill from 
> sources|https://github.com/apache/drill/blob/master/Dockerfile]
> * [Run Drill in interactive embedded 
> mode|https://github.com/apache/drill/blob/master/distribution/Dockerfile]
> User feedback suggests that these are not quite the right solutions to run 
> Drill in a K8s (or OpenShift) cluster. In addition, we need a container to 
> run a Drill server. This ticket summarizes the tasks involved.
> h4. Container Image
> The container image should:
> * Start with the OpenJDK base image with minimal extra packages.
> * Download and install an official Drill release.
> We may then want to provide two derived images:
> The Drillbit image which:
> * Configures Drill for production and as needed in the following steps.
> * Provides entry points for the Drillbit and for Sqlline
> * Exposes Drill's four ports
> * Accept as parameters things like the ZK host IP(s).
> The Sqlline image, meant to be run in interactive mode (like the current 
> embedded image) and which:
> * Accept as parameters the ZK host IP(s).
> Both should be published to the official Drill DockerHub account: 
> https://hub.docker.com/r/apache/drill
> h4. Runtime Environment
> Drill has very few dependencies, but it must have a running ZK.
> * Start a [ZK container|https://hub.docker.com/_/zookeeper/].
> * A place to store logs, which can be in the container by default, stored on 
> the host file system via a volume mount.
> * Access to a data source, which can be configured via a storage plugin 
> stored in ZK.
> * Ensure graceful shutdown integration with the Docker shutdown mechanism.
> h4. Running Drill in Docker
> Users must run at least one Drillbit, and may run more. Users may want to run 
> Sqlline.
> * The Drillbit container requires, at a minimum, the IP address of the ZK 
> instance(s).
> * The Sqlline container requires only the ZK instances, from which it can 
> find the Drillbit.
> Uses will want to customize some parts of Drill: at least memory, perhaps any 
> of the other options. Provide a way to pass this information into the 
> container to avoid the need to rebuild the container to change configuration.
> h4. Running Drill in K8s
> The containers should be usable in "plain" Docker. Today, however, many 
> people use K8s to orchestrate Docker. Thus, the Drillbit (but probably not 
> the Sqlline) container should be designed to work with K8s. An example set of 
> K8s YAML files should illustrate:
> * Create a host-mount file system to capture Drill logs and query profiles.
> * Optionally write Drill logs to stdout, to be captured by {{fluentd}} or 
> similar tools.
> * Pass Drill configuration (both HOCON and envrironment) as config maps.
> * Pass ZK as an environment variable (the value of which would, one presumes, 
> come from some kind of service discovery system.)
> The result is that the user should be able to manually tinker with the YAML 
> files, then use {{kubeadm}} to launch, monitor and stop Drill. The user sets 
> cluster size manually by launching the desired number of Drill pods.
> h4. Helm Chart for Drill
> The next step is to wrap the YAML files in a Helm chart, with parameters 
> exposed for the config options noted above.
> h4. Drill Operator for K8s
>  
> Full K8s integration will require an operator to manage the Drill cluster. 
> K8s operators are often written in Go, though doing so is not necessary. 
> Drill already includes Drill-on-YARN which is, essential a "YARN operator." 
> Repurpose this code to work with K8s as the target cluster manager rather 
> than YARN. Reuse the same operations from DoY: configure, start, resize and 
> stop a cluster.
> h4. Security
> The above steps provide an "MVP": minimum viable project - it will run Drill 
> with standard options in the various environments. Users who chose to run 
> Drill in production will likely require additional security settings. Enable 
> SSL? Control ingress? We need to understand what is needed, what Drill 
> offers, and how to enable Drill's security features in a containerized 
> environment.
> h4. Production Deployments
> With Docker and K8s the old maxim "the devil is in the details" is true in 
> spades. There are dozens of ways that Drill can be configured and integrated 

[jira] [Commented] (DRILL-7563) Docker & Kubernetes Drill server container

2020-02-02 Thread Charles Givre (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028619#comment-17028619
 ] 

Charles Givre commented on DRILL-7563:
--

I have a minor comment.  When this work is complete, we should make sure that 
these images are listed as Docker "official" images with the Drill logo, 
similar to what Solr  and other Apache projects do. 

 

 

[1]: https://hub.docker.com/_/solr/

> Docker & Kubernetes Drill server container
> --
>
> Key: DRILL-7563
> URL: https://issues.apache.org/jira/browse/DRILL-7563
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Drill provides two Docker containers:
> * [Build Drill from 
> sources|https://github.com/apache/drill/blob/master/Dockerfile]
> * [Run Drill in interactive embedded 
> mode|https://github.com/apache/drill/blob/master/distribution/Dockerfile]
> User feedback suggests that these are not quite the right solutions to run 
> Drill in a K8s (or OpenShift) cluster. In addition, we need a container to 
> run a Drill server. This ticket summarizes the tasks involved.
> h4. Container Image
> The container image should:
> * Start with the OpenJDK base image with minimal extra packages.
> * Download and install an official Drill release.
> We may then want to provide two derived images:
> The Drillbit image which:
> * Configures Drill for production and as needed in the following steps.
> * Provides entry points for the Drillbit and for Sqlline
> * Exposes Drill's four ports
> * Accept as parameters things like the ZK host IP(s).
> The Sqlline image, meant to be run in interactive mode (like the current 
> embedded image) and which:
> * Accept as parameters the ZK host IP(s).
> Both should be published to the official Drill DockerHub account: 
> https://hub.docker.com/r/apache/drill
> h4. Runtime Environment
> Drill has very few dependencies, but it must have a running ZK.
> * Start a [ZK container|https://hub.docker.com/_/zookeeper/].
> * A place to store logs, which can be in the container by default, stored on 
> the host file system via a volume mount.
> * Access to a data source, which can be configured via a storage plugin 
> stored in ZK.
> * Ensure graceful shutdown integration with the Docker shutdown mechanism.
> h4. Running Drill in Docker
> Users must run at least one Drillbit, and may run more. Users may want to run 
> Sqlline.
> * The Drillbit container requires, at a minimum, the IP address of the ZK 
> instance(s).
> * The Sqlline container requires only the ZK instances, from which it can 
> find the Drillbit.
> Uses will want to customize some parts of Drill: at least memory, perhaps any 
> of the other options. Provide a way to pass this information into the 
> container to avoid the need to rebuild the container to change configuration.
> h4. Running Drill in K8s
> The containers should be usable in "plain" Docker. Today, however, many 
> people use K8s to orchestrate Docker. Thus, the Drillbit (but probably not 
> the Sqlline) container should be designed to work with K8s. An example set of 
> K8s YAML files should illustrate:
> * Create a host-mount file system to capture Drill logs and query profiles.
> * Optionally write Drill logs to stdout, to be captured by {{fluentd}} or 
> similar tools.
> * Pass Drill configuration (both HOCON and envrironment) as config maps.
> * Pass ZK as an environment variable (the value of which would, one presumes, 
> come from some kind of service discovery system.)
> The result is that the user should be able to manually tinker with the YAML 
> files, then use {{kubeadm}} to launch, monitor and stop Drill. The user sets 
> cluster size manually by launching the desired number of Drill pods.
> h4. Helm Chart for Drill
> The next step is to wrap the YAML files in a Helm chart, with parameters 
> exposed for the config options noted above.
> h4. Drill Operator for K8s
>  
> Full K8s integration will require an operator to manage the Drill cluster. 
> K8s operators are often written in Go, though doing so is not necessary. 
> Drill already includes Drill-on-YARN which is, essential a "YARN operator." 
> Repurpose this code to work with K8s as the target cluster manager rather 
> than YARN. Reuse the same operations from DoY: configure, start, resize and 
> stop a cluster.
> h4. Security
> The above steps provide an "MVP": minimum viable project - it will run Drill 
> with standard options in the various environments. Users who chose to run 
> Drill in production will likely require additional security settings. Enable 
> SSL? Control ingress? We need to understand what is needed, what Drill 
> offers, and how to enable Drill's security features in a containerized 
> environment.
> h4. 

[jira] [Commented] (DRILL-7563) Docker & Kubernetes Drill server container

2020-02-02 Thread Paul Rogers (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028499#comment-17028499
 ] 

Paul Rogers commented on DRILL-7563:


h4. Launch Script Extensions

When we created Drill-on-YARN, we found we had to adjust several of Drill's 
launch scripts to align them with the way YARN works. The same will likely be 
true for Docker and K8s. The goal for Drill 1.17 is to work around these 
limitations so we can use the official 1.17 release as the basis. For 1.18 and 
later, we have an opportunity to improve the scripts.

* It is generally a good idea to make Docker images as small as possible. Add a 
script to strip out unneeded files. For example, a Drillbit container does not 
need Sqlline or DoY. Alternatively, a Docker-specific build which excludes 
unneded files.
* Launch option, similar to {{run}}, which will run Drill as pid 0 (so it can 
receive shutdown signals), and which writes logs to stdout (typical of K8s 
pods). This means, at least, changing the "Starting drillbit, logging to 
/var/log/drill/drillbit.out" message.
* Environment variable for the ZK connect string, rather than burying it in 
{{drill-override.conf}}. This can be as simple as, if {{ZK_CONNECT}} is set, 
add the following to {{DRILL_JAVA_OPTS}}: 
{{-Ddrill.exec.zk.connect=$ZK_CONNECT}}.

> Docker & Kubernetes Drill server container
> --
>
> Key: DRILL-7563
> URL: https://issues.apache.org/jira/browse/DRILL-7563
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Drill provides two Docker containers:
> * [Build Drill from 
> sources|https://github.com/apache/drill/blob/master/Dockerfile]
> * [Run Drill in interactive embedded 
> mode|https://github.com/apache/drill/blob/master/distribution/Dockerfile]
> User feedback suggests that these are not quite the right solutions to run 
> Drill in a K8s (or OpenShift) cluster. In addition, we need a container to 
> run a Drill server. This ticket summarizes the tasks involved.
> h4. Container Image
> The container image should:
> * Start with the OpenJDK base image with minimal extra packages.
> * Download and install an official Drill release.
> We may then want to provide two derived images:
> The Drillbit image which:
> * Configures Drill for production and as needed in the following steps.
> * Provides entry points for the Drillbit and for Sqlline
> * Exposes Drill's four ports
> * Accept as parameters things like the ZK host IP(s).
> The Sqlline image, meant to be run in interactive mode (like the current 
> embedded image) and which:
> * Accept as parameters the ZK host IP(s).
> Both should be published to the official Drill DockerHub account: 
> https://hub.docker.com/r/apache/drill
> h4. Runtime Environment
> Drill has very few dependencies, but it must have a running ZK.
> * Start a [ZK container|https://hub.docker.com/_/zookeeper/].
> * A place to store logs, which can be in the container by default, stored on 
> the host file system via a volume mount.
> * Access to a data source, which can be configured via a storage plugin 
> stored in ZK.
> * Ensure graceful shutdown integration with the Docker shutdown mechanism.
> h4. Running Drill in Docker
> Users must run at least one Drillbit, and may run more. Users may want to run 
> Sqlline.
> * The Drillbit container requires, at a minimum, the IP address of the ZK 
> instance(s).
> * The Sqlline container requires only the ZK instances, from which it can 
> find the Drillbit.
> Uses will want to customize some parts of Drill: at least memory, perhaps any 
> of the other options. Provide a way to pass this information into the 
> container to avoid the need to rebuild the container to change configuration.
> h4. Running Drill in K8s
> The containers should be usable in "plain" Docker. Today, however, many 
> people use K8s to orchestrate Docker. Thus, the Drillbit (but probably not 
> the Sqlline) container should be designed to work with K8s. An example set of 
> K8s YAML files should illustrate:
> * Create a host-mount file system to capture Drill logs and query profiles.
> * Optionally write Drill logs to stdout, to be captured by {{fluentd}} or 
> similar tools.
> * Pass Drill configuration (both HOCON and envrironment) as config maps.
> * Pass ZK as an environment variable (the value of which would, one presumes, 
> come from some kind of service discovery system.)
> The result is that the user should be able to manually tinker with the YAML 
> files, then use {{kubeadm}} to launch, monitor and stop Drill. The user sets 
> cluster size manually by launching the desired number of Drill pods.
> h4. Helm Chart for Drill
> The next step is to wrap the YAML files in a Helm chart, with parameters 
> exposed for 

[jira] [Commented] (DRILL-7563) Docker & Kubernetes Drill server container

2020-02-02 Thread Paul Rogers (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028496#comment-17028496
 ] 

Paul Rogers commented on DRILL-7563:


[~arina], thank your for the link. Yes, from the docs, it appears that the MapR 
solution is similar to what is proposed; though it is, of course, part of a 
broader MapR ecosystem. Since this ticket describes an open source solution, 
the goal would be to provide a solution free of most dependencies and which 
leverages other open source solutions such as Helm. That said, would be great 
for the MapR folks to provide their learnings and non-proprietary Docker files 
to avoid redundant efforts.

Thanks much also for the link to the Drill DockerHub account. Yes, we should 
extend that account with the work here. I'll do preliminary work using my own 
DockerHub account. Once that work passes code reviews, we can shift to the 
official account. BTW: the doc links on the Apache DockerHub page lead to 404 
errors.

Of course, if [~volodymyr] or anyone else wants to contribute, all help is 
welcome. Let me know which part you want to tackle.

> Docker & Kubernetes Drill server container
> --
>
> Key: DRILL-7563
> URL: https://issues.apache.org/jira/browse/DRILL-7563
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Drill provides two Docker containers:
> * [Build Drill from 
> sources|https://github.com/apache/drill/blob/master/Dockerfile]
> * [Run Drill in interactive embedded 
> mode|https://github.com/apache/drill/blob/master/distribution/Dockerfile]
> User feedback suggests that these are not quite the right solutions to run 
> Drill in a K8s (or OpenShift) cluster. In addition, we need a container to 
> run a Drill server. This ticket summarizes the tasks involved.
> h4. Container Image
> The container image should:
> * Start with the OpenJDK base image with minimal extra packages.
> * Download and install an official Drill release.
> We may then want to provide two derived images:
> The Drillbit image which:
> * Configures Drill for production and as needed in the following steps.
> * Provides entry points for the Drillbit and for Sqlline
> * Exposes Drill's four ports
> * Accept as parameters things like the ZK host IP(s).
> The Sqlline image, meant to be run in interactive mode (like the current 
> embedded image) and which:
> * Accept as parameters the ZK host IP(s).
> h4. Runtime Environment
> Drill has very few dependencies, but it must have a running ZK.
> * Start a [ZK container|https://hub.docker.com/_/zookeeper/].
> * A place to store logs, which can be in the container by default, stored on 
> the host file system via a volume mount.
> * Access to a data source, which can be configured via a storage plugin 
> stored in ZK.
> * Ensure graceful shutdown integration with the Docker shutdown mechanism.
> h4. Running Drill in Docker
> Users must run at least one Drillbit, and may run more. Users may want to run 
> Sqlline.
> * The Drillbit container requires, at a minimum, the IP address of the ZK 
> instance(s).
> * The Sqlline container requires only the ZK instances, from which it can 
> find the Drillbit.
> Uses will want to customize some parts of Drill: at least memory, perhaps any 
> of the other options. Provide a way to pass this information into the 
> container to avoid the need to rebuild the container to change configuration.
> h4. Running Drill in K8s
> The containers should be usable in "plain" Docker. Today, however, many 
> people use K8s to orchestrate Docker. Thus, the Drillbit (but probably not 
> the Sqlline) container should be designed to work with K8s. An example set of 
> K8s YAML files should illustrate:
> * Create a host-mount file system to capture Drill logs and query profiles.
> * Optionally write Drill logs to stdout, to be captured by {{fluentd}} or 
> similar tools.
> * Pass Drill configuration (both HOCON and envrironment) as config maps.
> * Pass ZK as an environment variable (the value of which would, one presumes, 
> come from some kind of service discovery system.)
> The result is that the user should be able to manually tinker with the YAML 
> files, then use {{kubeadm}} to launch, monitor and stop Drill. The user sets 
> cluster size manually by launching the desired number of Drill pods.
> h4. Helm Chart for Drill
> The next step is to wrap the YAML files in a Helm chart, with parameters 
> exposed for the config options noted above.
> h4. Drill Operator for K8s
>  
> Full K8s integration will require an operator to manage the Drill cluster. 
> K8s operators are often written in Go, though doing so is not necessary. 
> Drill already includes Drill-on-YARN which is, essential a "YARN operator." 
> Repurpose 

[jira] [Commented] (DRILL-7563) Docker & Kubernetes Drill server container

2020-02-02 Thread Arina Ielchiieva (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028451#comment-17028451
 ] 

Arina Ielchiieva commented on DRILL-7563:
-

[~Paul.Rogers] this definitely good idea for Drill. One thing that it would be 
good to publish such work under official Apache account 
(https://hub.docker.com/r/apache/drill). We have automatic builds here plus 
[~volodymyr] has write access to the repo (you can also request write access 
through INFRA, ask Vova for details if needed). Also here is the link to the 
similar work from the Mapr side if needed: 
https://github.com/mapr/mapr-operators.



> Docker & Kubernetes Drill server container
> --
>
> Key: DRILL-7563
> URL: https://issues.apache.org/jira/browse/DRILL-7563
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Drill provides two Docker containers:
> * [Build Drill from 
> sources|https://github.com/apache/drill/blob/master/Dockerfile]
> * [Run Drill in interactive embedded 
> mode|https://github.com/apache/drill/blob/master/distribution/Dockerfile]
> User feedback suggests that these are not quite the right solutions to run 
> Drill in a K8s (or OpenShift) cluster. In addition, we need a container to 
> run a Drill server. This ticket summarizes the tasks involved.
> h4. Container Image
> The container image should:
> * Start with the OpenJDK base image with minimal extra packages.
> * Download and install an official Drill release.
> We may then want to provide two derived images:
> The Drillbit image which:
> * Configures Drill for production and as needed in the following steps.
> * Provides entry points for the Drillbit and for Sqlline
> * Exposes Drill's four ports
> * Accept as parameters things like the ZK host IP(s).
> The Sqlline image, meant to be run in interactive mode (like the current 
> embedded image) and which:
> * Accept as parameters the ZK host IP(s).
> h4. Runtime Environment
> Drill has very few dependencies, but it must have a running ZK.
> * Start a [ZK container|https://hub.docker.com/_/zookeeper/].
> * A place to store logs, which can be in the container by default, stored on 
> the host file system via a volume mount.
> * Access to a data source, which can be configured via a storage plugin 
> stored in ZK.
> * Ensure graceful shutdown integration with the Docker shutdown mechanism.
> h4. Running Drill in Docker
> Users must run at least one Drillbit, and may run more. Users may want to run 
> Sqlline.
> * The Drillbit container requires, at a minimum, the IP address of the ZK 
> instance(s).
> * The Sqlline container requires only the ZK instances, from which it can 
> find the Drillbit.
> Uses will want to customize some parts of Drill: at least memory, perhaps any 
> of the other options. Provide a way to pass this information into the 
> container to avoid the need to rebuild the container to change configuration.
> h4. Running Drill in K8s
> The containers should be usable in "plain" Docker. Today, however, many 
> people use K8s to orchestrate Docker. Thus, the Drillbit (but probably not 
> the Sqlline) container should be designed to work with K8s. An example set of 
> K8s YAML files should illustrate:
> * Create a host-mount file system to capture Drill logs and query profiles.
> * Optionally write Drill logs to stdout, to be captured by {{fluentd}} or 
> similar tools.
> * Pass Drill configuration (both HOCON and envrironment) as config maps.
> * Pass ZK as an environment variable (the value of which would, one presumes, 
> come from some kind of service discovery system.)
> The result is that the user should be able to manually tinker with the YAML 
> files, then use {{kubeadm}} to launch, monitor and stop Drill. The user sets 
> cluster size manually by launching the desired number of Drill pods.
> h4. Helm Chart for Drill
> The next step is to wrap the YAML files in a Helm chart, with parameters 
> exposed for the config options noted above.
> h4. Drill Operator for K8s
>  
> Full K8s integration will require an operator to manage the Drill cluster. 
> K8s operators are often written in Go, though doing so is not necessary. 
> Drill already includes Drill-on-YARN which is, essential a "YARN operator." 
> Repurpose this code to work with K8s as the target cluster manager rather 
> than YARN. Reuse the same operations from DoY: configure, start, resize and 
> stop a cluster.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)