[openstack-dev] [sahara][CDH] Is it possible to add CDH5.4 into Kilo release now?

2015-04-28 Thread Chen, Ken
Hi all,
Cloudera has already released CDH 5.4.0. I have registered a blueprint and 
submitted two patches for it 
(https://blueprints.launchpad.net/sahara/+spec/cdh-5-4-support). However, they 
target the master branch, and Cloudera hopes the support can also be added to 
the latest Sahara release (Kilo) so that they can better support their 
customers. Is it possible to do this at this stage?

-Ken
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] About Sahara EDP New Ideas for Liberty

2015-04-22 Thread Chen, Ken
Hi Trevor,
I saw the items below under Proposed Sprint Topics for Sahara Liberty: 
https://etherpad.openstack.org/p/sahara-liberty-proposed-sessions. I guess 
these are the EDP ideas we want to discuss at the Vancouver design summit. Our 
comments are inline below:

•   EDP Priorities in Liberty - At the last 2 summits, we've looked at 
possible work for EDP in the cycle and prioritized it. This is helpful, since 
there is probably more that could be done than can be done in a single cycle :)
o   job scheduler (proposed by weiting)
 We already have a spec on this; please help review it and share your 
comments and ideas. https://review.openstack.org/#/c/175719/ 

o   more complex workflows (job dependencies, DAGs, etc.). Do we rely on 
Oozie, or something else?
 Huichun is now looking into this. I am not sure whether you already have 
detailed ideas about it. If needed, we can contribute some effort. If no 
details are ready, we can help draft a first version.

o   job interface mapping 
https://blueprints.launchpad.net/sahara/+spec/unified-job-interface-map 
proposed in Kilo but moved to Liberty
   ++ high priority in my opinion.  Should be done early, awesome feature
 Seems interesting. We agree the EDP UI should be improved. In fact, our team 
does not yet have a settled view on EDP. Some people do not like the current 
EDP design and think it is more like a re-design of the Oozie or Spark UI 
than a universal interface for users. However, we do not have a clear 
strategy on this part yet.

o   early error detection to help transient clusters -- how many things can 
we detect early that can go wrong with an EDP job, so that we return an error 
before spinning up the cluster (rather than finding that the job fails once the 
cluster is launched)? E.g., bad swift paths
 Seems easier, but may include some trivial work.

•   Spark plugins -- we have an independent Spark plugin, but we also have 
Spark supported by mapr, and in the future it will be supported by Ambari.  
Should we continue to carry a simple Spark standalone plugin?  Or should we 
work toward shifting our Spark support to one or more vendor plugins?
Not sure what this will impact.
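
On the early error detection item above: a minimal sketch of the kind of 
pre-launch check that could catch a bad swift path, assuming data source URLs 
have the form swift://container/object. The names check_swift_url and 
validate_job_inputs are hypothetical and not part of Sahara; a real check would 
also need to confirm that the container and object actually exist.

```python
from urllib.parse import urlparse


def check_swift_url(url):
    """Return a list of problems found in a swift:// data source URL.

    Assumption: a valid URL looks like swift://<container>/<object path>.
    This only checks the shape of the URL, not whether the object exists.
    """
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "swift":
        problems.append("scheme must be 'swift'")
        return problems
    if not parsed.netloc:
        problems.append("missing container name")
    if not parsed.path.strip("/"):
        problems.append("missing object path")
    return problems


def validate_job_inputs(urls):
    """Map each bad URL to its problems; an empty dict means none were found."""
    errors = {}
    for url in urls:
        problems = check_swift_url(url)
        if problems:
            errors[url] = problems
    return errors


if __name__ == "__main__":
    # Run the check before any cluster is requested.
    print(validate_job_inputs(["swift://demo/input.txt", "swift:///input.txt"]))
```

Running such a check at job submission would let a transient-cluster job fail 
before any instances are started.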

-Ken


-Original Message-
From: Trevor McKay [mailto:tmc...@redhat.com] 
Sent: Tuesday, March 24, 2015 10:49 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] About Sahara EDP New Ideas for Liberty

Weiting, Andrew,

Agreed, great ideas!  As Andrew noted, we have discussed some of these things 
before and it would be great to discuss them in Vancouver.

I think that a Sahara-side workflow manager is the right approach. Oozie has a 
lot of capability for job coordination, but it won't work for all of our 
cluster and job types.

Notes on Spark in particular -- when we implemented Spark EDP, we looked at 
various implementations for a Spark job server.  One was to extend Oozie, one 
was to use the Ooyala Spark job server, and one was to use ssh around 
spark-submit.  We chose the last, notes are here:

https://etherpad.openstack.org/p/sahara_spark_edp

We could potentially revisit the Ooyala job server.  My impression at the time 
was that for the functions we wanted, it was pretty heavy. But if we are going 
to add job coordination as a general feature, it may be appropriate. I believe 
in the Spark community it is the dominant solution for job management, open 
source is here:

https://github.com/spark-jobserver/spark-jobserver

As part of the Spark investigation, I posted on this JIRA, too. This is a JIRA 
for developing a REST api to the spark job server, which may be enough for us 
to build our own coordination system:

https://issues.apache.org/jira/browse/SPARK-3644

Best,

Trevor

On Tue, 2015-03-24 at 01:55 +, Chen, Weiting wrote:
 Hi Andrew.
 
 Thanks for response. My reply in line.
 
 From: Andrew Lazarev [mailto:alaza...@mirantis.com]
 Sent: Saturday, March 21, 2015 12:10 AM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] About Sahara EDP New Ideas for Liberty
 
 Hi Weiting,
 
 1. Add a schedule feature to run the jobs on time:
 This request comes from the customer, they usually run the job in a
 specific time every day. So it should be great if there is a scheduler
 to help arrange the regular job to run.
 
 Looks like a great feature. And should be quite easy to implement.
 Feel free to create spec for that.
 
 [Weiting] We are working on the spec and the bp has already been 
 registered in 
 https://blueprints.launchpad.net/sahara/+spec/enable-scheduled-edp-jobs.
 
 2. A more complex workflow design in Sahara EDP:
 Current EDP only provide one job that is running on one cluster.
 
 Yes. And ability to run several jobs in one oozie workflow is 
 discussed on every summit (e.g. 'coordinated jobs' at 
 https://etherpad.openstack.org/p/kilo-summit-sahara-edp). But for now 
 it was not 

[openstack-dev] [Sahara] Question about Sahara db code

2015-03-31 Thread Chen, Ken
Hi all,
I have some questions about the Sahara conductor code. They may be silly, but 
please let me know if you have the answers. Thanks.

1.   In the Sahara conf we have an option db_driver, whose default value is 
sahara.db. Is it possible not to use sahara.db? I think it should be the 
only choice for Sahara, so why do we have this option? For different db engine 
backends we already have another option, db_backend, whose default value is 
sqlalchemy.

2.   In the sahara/db/ directory we have base.py, which defines a Base class 
that uses db_driver to initialize self.db. This gives the following call 
sequence (using the cluster_create method as an example):
sahara.conductor.manager.ConductorManager().cluster_create  ==> 
sahara.db.Base().db.cluster_create ==> sahara.db.cluster_create ==> 
sahara.db.api.cluster_create ==> IMPL.cluster_create
So why do we not just discard base.py, assign 
sahara.conductor.manager.ConductorManager().db_api = sahara.db.api, and let the 
flow be
sahara.conductor.manager.ConductorManager().cluster_create  ==> 
sahara.db_api.cluster_create ==> IMPL.cluster_create ?
This is how the Heat code works. The current Sahara implementation seems to be 
copied from nova, which also has a db_driver option whose default value is 
nova.db. In fact I have similar questions about nova (db_driver and base.py 
seem redundant).
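
To illustrate the two flows, here is a minimal sketch that uses stand-in 
objects instead of the real Sahara modules. FakeBackend, fake_db_api, DRIVERS 
and both manager classes are hypothetical names for illustration only; the 
lookup of the module named by db_driver is imitated with a dictionary, which is 
an assumption about how Base resolves it.

```python
import types


class FakeBackend:
    """Stand-in for the backend object called IMPL in the message above."""

    def cluster_create(self, context, values):
        return {"id": 1, **values}


IMPL = FakeBackend()

# Stand-in for a module like sahara.db.api: it only forwards to IMPL.
fake_db_api = types.SimpleNamespace(
    cluster_create=lambda context, values: IMPL.cluster_create(context, values)
)

# Imitates resolving the db_driver option to a module (assumption: a dict
# lookup replaces the real lookup-by-name step).
DRIVERS = {"fake.db": fake_db_api}


class Base:
    """Indirect style: self.db is resolved from a configurable driver name."""

    def __init__(self, db_driver="fake.db"):
        self.db = DRIVERS[db_driver]


class ManagerWithBase(Base):
    def cluster_create(self, context, values):
        return self.db.cluster_create(context, values)


class ManagerDirect:
    """Direct style: the manager refers to the db api module itself."""

    db_api = fake_db_api

    def cluster_create(self, context, values):
        return self.db_api.cluster_create(context, values)


if __name__ == "__main__":
    print(ManagerWithBase().cluster_create(None, {"name": "c1"}))
    print(ManagerDirect().cluster_create(None, {"name": "c1"}))
```

Both managers end in the same IMPL.cluster_create call; the only difference is 
whether the db module is chosen by name at run time.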

Thanks.
-Ken


Re: [openstack-dev] [Sahara] Question about Sahara API v2

2015-03-31 Thread Chen, Ken
Sergey and Michael, thanks for explaining these.
-Ken

From: Sergey Lukjanov [mailto:slukja...@mirantis.com]
Sent: Tuesday, March 31, 2015 12:00 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Sahara] Question about Sahara API v2

Agree with Mike, thx for the link.

On Mon, Mar 30, 2015 at 4:55 PM, michael mccune <m...@redhat.com> wrote:
On 03/30/2015 07:02 AM, Sergey Lukjanov wrote:
My personal opinion for API 2.0 - we should discuss the design of all objects
and endpoints, review how they are used from Horizon or
python-saharaclient and improve them as much as possible. For example,
it includes:

* get rid of tons of extra optional fields
* rename Job -> Job Template, Job Execution -> Job
* better support for Horizon needs
* hrefs

If you have any ideas about 2.0 - please write them up, there is a
99% chance that we'll discuss API 2.0 a lot at the Vancouver summit.

+1

i've started a pad that we can use to collect ideas for the discussion: 
https://etherpad.openstack.org/p/sahara-liberty-api-v2

things that i'd like to see from the v2 discussion

* a full endpoint review, some of the endpoints might need to be deprecated or 
adjusted slightly (for example, job-binary-internals)

* a technology review, should we consider Pecan or stay with Flask?

* proposals for more radical changes to the api; use of micro-versions akin to 
nova's plan, migrating the project id into the headers, possible use of swagger 
to aid in auto-generation of api definitions.

i think we will have a good amount to discuss and i will be migrating some of 
my local notes into the pad over this week and the next. i invite everyone to 
add their thoughts to the pad for ideas.

mike





--
Sincerely yours,
Sergey Lukjanov
Sahara Technical Lead
(OpenStack Data Processing)
Principal Software Engineer
Mirantis Inc.


[openstack-dev] [Sahara] Question about Sahara API v2

2015-03-29 Thread Chen, Ken
Hi all,
Recently I read some material about the Sahara API v2 proposal, but I am still 
a bit confused about why we are doing this at this stage. I read the bp 
https://blueprints.launchpad.net/sahara/+spec/v2-api-impl and the related 
gerrit reviews (although already abandoned). However, I did not find anything 
new compared with the current v1+v1.1 APIs. So why do we want a v2 API? Just to 
combine the v1 and v1.1 APIs? Is there a deeper requirement or background that 
requires this? If so, please let me know.
Btw, I also saw some comments that we may want to introduce Pecan to implement 
the Sahara APIs. Will that happen soon in Liberty, or is it not decided yet?

Thanks a lot.
-Ken