Re: Packaging new apps

2015-05-11 Thread Jean-Baptiste Note
Hi Gour,

Thanks a lot for the detailed answer, and for the pointer to the tomcat packaging,
which does half the work for httpfs.
I'll try to properly wrap the unpacking of the RPM & extraction of the relevant
parts for slider packaging. That was my gripe; other than that, I can
launch httpfs services and flex them: slider is just awesome.

Kind regards,
JB


Re: Packaging new apps

2015-05-11 Thread Jean-Baptiste Note
Hi Thomas,

Thanks a lot for the updates you brought to the main Koya repository.

I see you're still declaring a resource for each broker. This
is painful as it means modifying your metainfo & possibly resource.json in
case you want to grow your cluster, say beyond 10 machines :)

Wouldn't it fit more logically into slider to declare one server.xml
configuration and one resource type, and actually flex the application / play
with the instance # to grow it?
I saw from Gour's comment that you were concerned about unique id
generation. Maybe using the app_container_tag would be a good starting
point?
For what it's worth, it seemed to work out properly for me.
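
Something along these lines in resources.json is roughly what I have in mind (a
sketch only -- the yarn.* keys are the ones the other app packages use, and the
instance/memory numbers here are made up):

{
  "components": {
    "slider-appmaster": {},
    "BROKER": {
      "yarn.role.priority": "1",
      "yarn.component.instances": "3",
      "yarn.memory": "1024"
    }
  }
}

Growing the cluster is then just a matter of flexing the BROKER component
(something like slider flex <cluster> --component BROKER <N>, if I read the
docs right).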

Kind regards,
JB


Re: Packaging new apps

2015-05-11 Thread Jean-Baptiste Note
Hi Steve,

Thanks a lot for taking the time to reply out of your very busy schedule.

Actually we'll get away with a python daemon watching zookeeper and doing
dynamic DNS updates.
This seems easy enough and probably more palatable than standing up a full
DNS server (I'm on the operations side ;)).
I'll keep you posted, as we'll probably share this work.
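
In case it's useful, a very rough sketch of what I have in mind is below (kazoo
on the zookeeper side, dnspython for the dynamic updates). The registry path,
zone, DNS server and the 'host' field are placeholders for whatever the actual
slider service records contain, and a real version would sign the updates with
TSIG:

from kazoo.client import KazooClient
import dns.update
import dns.query
import json
import time

ZK_HOSTS = 'zk1:2181,zk2:2181'                        # placeholder quorum
REGISTRY_PATH = '/registry/users/jb/services/httpfs'  # placeholder registry path
ZONE = 'slider.example.com'                           # placeholder dynamic zone
DNS_SERVER = '10.0.0.53'                              # placeholder DNS server

zk = KazooClient(hosts=ZK_HOSTS)
zk.start()

@zk.ChildrenWatch(REGISTRY_PATH)
def on_children_change(children):
    # re-publish one DNS record per registered instance whenever the znode list changes
    for child in children:
        data, _ = zk.get('%s/%s' % (REGISTRY_PATH, child))
        record = json.loads(data)
        host = record.get('host')           # adjust to the real service record layout
        if not host:
            continue
        update = dns.update.Update(ZONE)    # unsigned update; production would use TSIG
        update.replace(child, 60, 'CNAME', host + '.')
        dns.query.tcp(update, DNS_SERVER)

while True:                                 # keep the process (and the watch) alive
    time.sleep(60)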

Kind regards,
JB


Re: Packaging new apps

2015-05-11 Thread Jean-Baptiste Note
Hi Thomas,

This is because the app_container_tag is only unique within each resource.
Given that your two brokers are on separate resources BROKER0 and BROKER1, they
both get the same container_tag (1).

You should put them in the same resource (BROKER), and the numbering will
be sequential. No idea how it behaves on container restart, but this is
good enough to start and flex a kafka cluster here.
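
For reference, deriving broker.id in the package's params.py then boils down to
something like this (a sketch; it assumes app_container_tag shows up under the
global section as in your dump, and that the agent's resource_management library
exposes Script this way):

from resource_management import Script   # import path is an assumption on my side

config = Script.get_config()
container_tag = config['configurations']['global']['app_container_tag']
broker_id = int(container_tag) - 1       # tags are 1-based and sequential within one BROKER component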

I've sent you a pull request on github showing how I did it. There's no
pretension that it's ready for an actual merge, but if you want it, I can
amend it for inclusion at your leisure.

Kind regards,
JB


Re: Packaging new apps

2015-05-11 Thread Thomas Weise
Excellent, will look at the pull request shortly. Any thoughts on merging the
server properties defined in the slider config into the server.properties
that came with the Kafka archive?

Thomas

On Mon, May 11, 2015 at 8:10 AM, Jean-Baptiste Note jbn...@gmail.com
wrote:

 Hi Thomas,

 This is because the app_container_tag is only unique within each resource.
 Given that your two brokers are on separate resources BROKER0 and BROKER1, they
 both get the same container_tag (1).

 You should put them in the same resource (BROKER), and the numbering will
 be sequential. No idea how it behaves on container restart, but this is
 good enough to start and flex a kafka cluster here.

 I've sent you a pull request on github showing how I did it. There's no
 pretension that it's ready for an actual merge, but if you want it, I can
 amend it for inclusion at your leisure.

 Kind regards,
 JB



Re: Packaging new apps

2015-05-11 Thread Jean-Baptiste Note
There's a remark on the pull request about this, with more details than in
this mail, but basically:

* Other apps seem to regenerate the config files directly through a
template rather than trying to do a merge (you seem to be doing a sed on the
defined properties, but it does not work here, maybe a python version
issue?), so that's what I did for server.properties.

Where I come from we use Chef and redefine all configuration files
anyway, so I was thinking of duplicating a standard configuration file in
the appConfig-default.json (kind of duplicated from the tarball -- again,
all other packaged apps do it like this), and using Chef to regenerate
all the appConfig.json files in order to deploy infrastructure Kafka (and let
users do whatever they wish based on the defaults).
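
For what it's worth, the template approach really boils down to something like
this (a sketch; the function name is mine, and 'props' stands for whatever
dictionary of appConfig-defined values the script ends up with):

def write_server_properties(path, props):
    # dump only the properties set in the appConfig; kafka's built-in defaults cover the rest
    with open(path, 'w') as f:
        for key, value in sorted(props.items()):
            f.write('%s=%s\n' % (key, value))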
Kind regards,
JB


Re: Packaging new apps

2015-05-11 Thread Thomas Weise
Hi Jean,

Indeed we would like to use component instances as you outline. So far, I
have not found a way to derive the Kafka server id from the Slider
configuration. I checked on my cluster and found 2 containers using the
same app_container_tag in the logs:

u'componentName': u'BROKER1',
u'configurations': {u'BROKER-COMMON': {u'broker.id': u'1',
                                       u'zookeeper.connect': u'node26:2181,node27:2181,node28:2181'},
                    u'BROKER0': {u'broker.id': u'0'},
                    u'BROKER1': {u'broker.id': u'1'},
                    u'global': {u'app_container_id': u'container_1430350563654_0416_01_03',
                                u'app_container_tag': u'1',


u'componentName': u'BROKER0',
u'configurations': {u'BROKER-COMMON': {u'broker.id': u'0',
                                       u'zookeeper.connect': u'node26:2181,node27:2181,node28:2181'},
                    u'BROKER0': {u'broker.id': u'0'},
                    u'BROKER1': {u'broker.id': u'1'},
                    u'global': {u'app_container_id': u'container_1430350563654_0416_01_09',
                                u'app_container_tag': u'1',

Any other ideas on how to obtain a component instance index that works
across container failures?

Thanks,
Thomas


On Mon, May 11, 2015 at 1:44 AM, Jean-Baptiste Note jbn...@gmail.com
wrote:

 Hi Thomas,

 Thanks a lot for the updates you brought to the main Koya repository.

 I see you're still declaring a resource for each broker. This
 is painful as it means modifying your metainfo & possibly resource.json in
 case you want to grow your cluster, say beyond 10 machines :)

 Wouldn't it fit more logically into slider to declare one server.xml
 configuration and one resource type, and actually flex the application / play
 with the instance # to grow it?
 I saw from Gour's comment that you were concerned about unique id
 generation. Maybe using the app_container_tag would be a good starting
 point?
 For what it's worth, it seemed to work out properly for me.

 Kind regards,
 JB



Re: Packaging new apps

2015-05-11 Thread Thomas Weise
In order to work for different Kafka versions, it would be nice to pick
whatever server.properties the archive comes with and apply all the
properties that are defined in server.xml on top of it. Does that work for
you? We can look into making that merge work then.
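
Roughly, the merge could be as simple as the sketch below (names are
placeholders; it drops comments and ordering from the archive's file, and real
java .properties escaping is more involved than a plain split on '='):

def merge_server_properties(archive_file, overrides, out_file):
    # start from the properties shipped in the kafka archive
    props = {}
    with open(archive_file) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):
                continue
            key, _, value = line.partition('=')
            props[key.strip()] = value.strip()
    # slider/appConfig-defined values win over the archive defaults
    props.update(overrides)
    with open(out_file, 'w') as f:
        for key, value in sorted(props.items()):
            f.write('%s=%s\n' % (key, value))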

Everything else looks great, thanks for the pull request!

Thomas


On Mon, May 11, 2015 at 8:21 AM, Jean-Baptiste Note jbn...@gmail.com
wrote:

 There's a remark on the pull request about this, with more details than in
 this mail, but basically:

 * Other apps seem to regenerate the config files directly through a
 template rather than trying to do a merge (you seem to be doing a sed on the
 defined properties, but it does not work here, maybe a python version
 issue?), so that's what I did for server.properties.

 Where I come from we use Chef and redefine all configuration files
 anyway, so I was thinking of duplicating a standard configuration file in
 the appConfig-default.json (kind of duplicated from the tarball -- again,
 all other packaged apps do it like this), and using Chef to regenerate
 all the appConfig.json files in order to deploy infrastructure Kafka (and let
 users do whatever they wish based on the defaults).
 Kind regards,
 JB



Re: Packaging new apps

2015-05-11 Thread hsy...@gmail.com
Hi Jean,

Thanks for the change. Using the instance tag (is it a new feature in the latest
version? I didn't see it in older slider versions) is a really good
idea. It might be good for others to have a template, but not for kafka.
Kafka is evolving at quite a fast pace; I've seen many property keys/values
change in the last several releases. Our method is to keep most properties at
their defaults and only override the ones declared in appConfig.json, which is
actually supported in the current python script (it may need some changes for
the latest slider).

And a Kafka broker is tied to its local disk once it's launched, so in the
real world there would be at most one instance per NM.

Best,
Siyuan



On Mon, May 11, 2015 at 10:16 AM, Jean-Baptiste Note jbn...@gmail.com
wrote:

 Hi Thomas,

 According to kafka's documentation
 (http://kafka.apache.org/07/configuration.html) there should be a default
 value for any property; I would expect the provided server.properties
 file to actually reflect those default values.
 Therefore, I'd look twice before overconstraining the problem, and would
 just generate the file with those and only those dictionary values that have
 been set in the appConfig (which currently my code does not do -- it configures
 too many properties statically -- but that can be arranged), relying on the
 default properties for the rest.

 If there's really a case to have all properties at hand, I could:
 * parse the properties file provided in the tarball
 * re-generate the whole conf file with the parsed + overrides

 This, in order to allow for *added* properties (which neither of the current
 schemes, mine or yours, seems to allow) AND, ultimately, to allow the
 whole tarball installation to be switched to read-only (which could allow
 it to be shared among instances running on the same NM; I don't know if
 slider currently does this kind of optimization).

 Maybe guidance from people more familiar with slider than us would be
 needed here :)

 Kind regards,
 JB



Re: Packaging new apps

2015-05-11 Thread Thomas Weise
Jean,

We pulled in your changes and added modifications on top of them. It appears
we agree that we should not force the user to redefine the default values
that ship with server.properties. Please see whether the properties merge
as implemented works in your environment or not. If not, what is the Python
version?

We can find an alternative to the in-place edit of server.properties
if and when needed. The file is an argument to the start script, hence we
can do a copy before the merge if necessary.

Thomas


On Mon, May 11, 2015 at 3:26 PM, hsy...@gmail.com hsy...@gmail.com wrote:

 Hi Jean,

 Thanks for the change. Using the instance tag (is it a new feature in the latest
 version? I didn't see it in older slider versions) is a really good
 idea. It might be good for others to have a template, but not for kafka.
 Kafka is evolving at quite a fast pace; I've seen many property keys/values
 change in the last several releases. Our method is to keep most properties at
 their defaults and only override the ones declared in appConfig.json, which is
 actually supported in the current python script (it may need some changes for
 the latest slider).

 And a Kafka broker is tied to its local disk once it's launched, so in the
 real world there would be at most one instance per NM.

 Best,
 Siyuan



 On Mon, May 11, 2015 at 10:16 AM, Jean-Baptiste Note jbn...@gmail.com
 wrote:

  Hi Thomas,
 
  According to kafka's documentation
  (http://kafka.apache.org/07/configuration.html) there should be a default
  value for any property; I would expect the provided server.properties
  file to actually reflect those default values.
  Therefore, I'd look twice before overconstraining the problem, and would
  just generate the file with those and only those dictionary values that have
  been set in the appConfig (which currently my code does not do -- it configures
  too many properties statically -- but that can be arranged), relying on the
  default properties for the rest.

  If there's really a case to have all properties at hand, I could:
  * parse the properties file provided in the tarball
  * re-generate the whole conf file with the parsed + overrides

  This, in order to allow for *added* properties (which neither of the current
  schemes, mine or yours, seems to allow) AND, ultimately, to allow the
  whole tarball installation to be switched to read-only (which could allow
  it to be shared among instances running on the same NM; I don't know if
  slider currently does this kind of optimization).
 
  Maybe guidance from people more familiar with slider than us would be
  needed here :)
 
  Kind regards,
  JB
 



Re: Packaging new apps

2015-05-11 Thread Jean-Baptiste Note
Hi Thomas,

According to kafka's documentation
(http://kafka.apache.org/07/configuration.html) there should be a default
value for any property; I would expect the provided server.properties
file to actually reflect those default values.
Therefore, I'd look twice before overconstraining the problem, and would
just generate the file with those and only those dictionary values that have
been set in the appConfig (which currently my code does not do -- it configures
too many properties statically -- but that can be arranged), relying on the
default properties for the rest.

If there's really a case to have all properties at hand, I could:
* parse the properties file provided in the tarball
* re-generate the whole conf file with the parsed + overrides

This, in order to allow for *added* properties (which neither of the current
schemes, mine or yours, seems to allow) AND, ultimately, to allow the
whole tarball installation to be switched to read-only (which could allow
it to be shared among instances running on the same NM; I don't know if
slider currently does this kind of optimization).

Maybe guidance from people more familiar with slider than us would be
needed here :)

Kind regards,
JB


Re: Packaging new apps

2015-05-08 Thread Steve Loughran

 On 8 May 2015, at 01:52, Gour Saha gs...@hortonworks.com wrote:
 
 Last but not least, I'm wondering if there would already be a plan to
 expose somehow (through an internal or an external service) the registry
 through DNS (that's what we really use for service location for HTTPFS &
 OpenTSDB). A bash polling script would certainly be sufficient for our
 needs for now, but longer-term, we'd need to have a more robust solution.
 
 Registry and REST APIs on registry comes directly from YARN -
 https://issues.apache.org/jira/browse/YARN-913
 https://issues.apache.org/jira/browse/YARN-2948
 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/registry/yarn-registry.html

DNS support is something that's always been considered; it's why the paths in
the registry spec are required to be valid DNS names (though the check is
actually disabled, primarily because usernames aren't valid DNS names and
punycoding doesn't address things like spaces in names, just high-unicode
characters).

We held back on this originally due to (a) the need to scope things for hadoop 2.6
and (b) worries about how operations teams would like more DNS servers popping
up in the organisation. I think we can try to do the DNS; it just needs someone
to sit down and do it. I'm afraid my todo list is already full.

I'd like to wrap up the registry stuff with an HTTP service that can be
deployed at a fixed location; we have this in slider, but it's there to show
it's possible more than anything else (because it moves around).





Re: Packaging new apps

2015-05-07 Thread Gour Saha
Hi Jean,

Please see answers inline.

-Gour

On 5/6/15, 6:16 AM, Jean-Baptiste Note jbn...@gmail.com wrote:

Hi folks,

Currently we're using Chef in our organization to deploy a lot of
infrastructure services around Hadoop. Of course it makes a lot of sense to
offer these as self-services on YARN using slider, but I'm looking at a
number of challenges. So please forgive the broad range of questions :)

I'm specifically interested in deploying the following applications:
* HTTPFS service (see https://github.com/jbnote/httpfs-slider) & helpers
(nginx)
* Opentsdb & helpers (varnish)
* kafka (I had a look at koya)
* druid
* storm (fine, thanks!)
* hbase (fine, thanks!)

I'm facing a lot of issues with those services, which are not yet packaged
correctly:

* httpfs/opentsdb are not released as standalone tarballs, contrary to all
services currently packaged. So I've butchered a tarball from the Cloudera
RPMs, which is not satisfactory. How would you go about handling this?

Not sure exactly what you mean by handling this. If you are referring
to a way to create a Slider package of an app in rpm format, then there are
challenges, such as rpm install requiring root access, which YARN does not allow.
If you are referring to an issue you are facing with deploying the Slider
app (now that you have created a tarball), can you share what issues you are
facing?

You might also want to take a look at this tomcat Slider package. Caution: it
is not ready for prime time and has a few issues which need to be resolved. But
the scripts and metadata files might be a helpful reference.
https://issues.apache.org/jira/browse/SLIDER-809
https://github.com/apache/incubator-slider/tree/feature/SLIDER-809-tomcat-app-package/app-packages/tomcat



* KOYA has been talked about a lot; however, the source I'm looking at (
https://github.com/DataTorrent/koya) is kind of disappointing, and activity
is a bit low -- would anyone know if DataTorrent is still committed to the
project?

What issues are you facing with KOYA? DataTorrent gave a presentation of KOYA,
and Slider seems to have fit their needs so far. They wanted a few features around
data locality (strict placement), which will be there in the 0.80.0 release, AND
unique ids, which still need some work.


Last but not least, I'm wondering if there would already be a plan to
expose somehow (through an internal or an external service) the registry
through DNS (that's what we really use for service location for HTTPFS &
OpenTSDB). A bash polling script would certainly be sufficient for our
needs for now, but longer-term, we'd need to have a more robust solution.

Registry and REST APIs on registry comes directly from YARN -
https://issues.apache.org/jira/browse/YARN-913
https://issues.apache.org/jira/browse/YARN-2948
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/registry/yarn-registry.html



Thanks a lot, kind regards,
JB



Re: Packaging new apps

2015-05-07 Thread Thomas Weise
Jean,

You will see updates in the KOYA repository soon. As part of that we will
move up to the latest release of Slider and also document the configuration
process.

Thanks,
Thomas




On Thu, May 7, 2015 at 5:52 PM, Gour Saha gs...@hortonworks.com wrote:

 Hi Jean,

 Please see answers inline.

 -Gour

 On 5/6/15, 6:16 AM, Jean-Baptiste Note jbn...@gmail.com wrote:

 Hi folks,

 Currently we're using Chef in our organization to deploy a lot of
 infrastructure services around Hadoop. Of course it makes a lot of sense to
 offer these as self-services on YARN using slider, but I'm looking at a
 number of challenges. So please forgive the broad range of questions :)

 I'm specifically interested in deploying the following applications:
 * HTTPFS service (see https://github.com/jbnote/httpfs-slider) & helpers
 (nginx)
 * Opentsdb & helpers (varnish)
 * kafka (I had a look at koya)
 * druid
 * storm (fine, thanks!)
 * hbase (fine, thanks!)

 I'm facing a lot of issues with those services, which are not yet packaged
 correctly:

 * httpfs/opentsdb are not released as standalone tarballs, contrary to all
 services currently packaged. So I've butchered a tarball from the Cloudera
 RPMs, which is not satisfactory. How would you go about handling this?

 Not sure exactly what you mean by handling this. If you are referring
 to a way to create a Slider package of an app in rpm format, then there are
 challenges, such as rpm install requiring root access, which YARN does not allow.
 If you are referring to an issue you are facing with deploying the Slider
 app (now that you have created a tarball), can you share what issues you are
 facing?

 You might also want to take a look at this tomcat Slider package. Caution:
 it is not ready for prime time and has a few issues which need to be
 resolved. But the scripts and metadata files might be a helpful reference.
 https://issues.apache.org/jira/browse/SLIDER-809

 https://github.com/apache/incubator-slider/tree/feature/SLIDER-809-tomcat-app-package/app-packages/tomcat



 * KOYA has been talked about a lot; however, the source I'm looking at (
 https://github.com/DataTorrent/koya) is kind of disappointing, and activity
 is a bit low -- would anyone know if DataTorrent is still committed to the
 project?

 What issues are you facing with KOYA? DataTorrent gave a presentation of KOYA,
 and Slider seems to have fit their needs so far. They wanted a few features around
 data locality (strict placement), which will be there in the 0.80.0 release, AND
 unique ids, which still need some work.


 Last but not least, I'm wondering if there would already be a plan to
 expose somehow (through an internal or an external service) the registry
 through DNS (that's what we really use for service location for HTTPFS &
 OpenTSDB). A bash polling script would certainly be sufficient for our
 needs for now, but longer-term, we'd need to have a more robust solution.

 Registry and REST APIs on registry comes directly from YARN -
 https://issues.apache.org/jira/browse/YARN-913
 https://issues.apache.org/jira/browse/YARN-2948

 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/registry/yarn-registry.html



 Thanks a lot, kind regards,
 JB




Packaging new apps

2015-05-06 Thread Jean-Baptiste Note
Hi folks,

Currently we're using Chef in our organization to deploy a lot of
infrastructure services around Hadoop. Of course it makes a lot of sense to
offer these as self-services on YARN using slider, but I'm looking at a
number of challenges. So please forgive the broad range of questions :)

I'm specifically interested in deploying the following applications:
* HTTPFS service (see https://github.com/jbnote/httpfs-slider) & helpers
(nginx)
* Opentsdb & helpers (varnish)
* kafka (I had a look at koya)
* druid
* storm (fine, thanks!)
* hbase (fine, thanks!)

I'm facing a lot of issues with those services, which are not yet packaged
correctly:

* httpfs/opentsdb are not released as standalone tarballs, contrary to all
services currently packaged. So I've butchered a tarball from the Cloudera
RPMs, which is not satisfactory. How would you go about handling this?

* KOYA has been talked about a lot; however, the source I'm looking at (
https://github.com/DataTorrent/koya) is kind of disappointing, and activity
is a bit low -- would anyone know if DataTorrent is still committed to the
project?

Last but not least, I'm wondering if there would already be a plan to
expose somehow (through an internal or an external service) the registry
through DNS (that's what we really use for service location for HTTPFS &
OpenTSDB). A bash polling script would certainly be sufficient for our
needs for now, but longer-term, we'd need to have a more robust solution.

Thanks a lot, kind regards,
JB