Re: Bluesky calls for a new mentor!
Hi, Ralph:

I am not avoiding the truth that we have done poorly over the last three years, even though we were once on the verge of a release. It's *just* that we, the *Bluesky Team @ XJTU, Xi'an, China*, failed to make it good; please keep that in mind. I believe projects from schools can rock in Apache as well. You can look down on us, but you can't deny others.

I propose that the community give us one last chance, for 1-2 months, with a community member becoming our mentor to lead us to finish releasing the newest version. During this period, what you would see includes:
1. Gradually increasing discussion on the bluesky-dev mailing list. Meaningless discussion would not count.
2. Commits of source code after it has been cleaned up. Inactive committers would be removed and new committers would apply to join.
3. Preparation of everything the release needs, leading to a successful release. In this way the new developers and committers would experience the release process completely and understand better how things are done in an Apache community.

If the community accepts my suggestion, I personally want the BlueSky project to be under strict scrutiny by community members. If we can't fulfill what we have promised, then just kick us out of here and I will have nothing to say. Well, suppose we live through that: besides working in the Apache way, we would continue working to evolve BlueSky, to make it much easier to use in the e-learning area and to widen its scope (BlueSky has been deployed in China and is about to be applied in India), so that students in less developed districts can share the same high-quality education as developed areas.

Sincerely, I would like to invite you, Ralph, to be our mentor for these 1-2 months, if you are not too busy and are willing to guide us. Please don't feel sorry if you decline. regards, Kevin

2011/6/30 Ralph Goers ralph.go...@dslextreme.com: Sorry, but the explanation below makes things sound even worse.
Apache projects are not here to give students a place to do school work. What you have described is not a community. If the project cannot build a community of people who are interested in the project for more than a school term, then it doesn't belong here. Ralph

On Jun 29, 2011, at 8:12 PM, SamuelKevin wrote:

Hi, Noel:

2011/6/30 Noel J. Bergman n...@devtech.com: Joe Schaefer wrote: Chen Liu wrote: We propose to move future development of BlueSky to the Apache Software Foundation in order to build a broader user and developer community. You are supposed to be doing your development work in the ASF subversion repository, using ASF mailing lists, as peers. Chen, as Joe points out, this is what BlueSky should have been doing for the past three (3) years, and yet we still hear a proposal for the future. Looking at the (limited) commit history, there is a total imbalance between the number of people associated with the development work (20+) and the number of people with Apache accounts here (2).

I guess I can explain that. Most of the developers of the BlueSky project are students. As you all know, students come when they enroll and go after they graduate. So the active developers number around 10. We used to have 5 committers, but now only 2 committers are active.

Again, as Joe points out, ALL of BlueSky development should have been done via the ASF infrastructure, not periodically synchronized. We are a development community, not a remote archive. What we really need you to discuss are *plans*: how you will implement them, who will implement them, and how you will collaborate in the codebase as peers. Joe, again, has this on the money. The BlueSky project must immediately make significant strides to rectify these issues. Now, not later. We should see:
1) All current code in the ASF repository.
2) All development via ASF accounts (get the rest of the people signed up).
3) Development discussion on the mailing list.
4) All licensing issues cleaned up.
According to what you've listed, I will forward your suggestions to the bluesky-dev list, and I hope we can make a quick response after discussion. I appreciate your help. regards, Kevin

--- Noel

- To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org

-- Bowen Ma a.k.a Samuel Kevin @ Bluesky Dev Team, XJTU Shaanxi Province Key Lab. of Satellite and Terrestrial Network Tech http://incubator.apache.org/bluesky/
Re: [VOTE] Oozie to join the Incubator
+1 (non-binding)

On Wed, Jun 29, 2011 at 12:10 PM, Mohammad Islam misla...@yahoo.com wrote:

Hi All, The discussion about the Oozie proposal is settling down. Therefore I would like to initiate a vote to accept Oozie as an Apache Incubator project. The latest proposal is pasted at the end and can be found in the wiki as well: http://wiki.apache.org/incubator/OozieProposal The related discussion thread is at: http://www.mail-archive.com/general@incubator.apache.org/msg29633.html

Please cast your votes:
[ ] +1 Accept Oozie for incubation
[ ] +0 Indifferent to Oozie incubation
[ ] -1 Reject Oozie for incubation

This vote will close 72 hours from now. Regards, Mohammad

Abstract

Oozie is a server-based workflow scheduling and coordination system to manage data processing jobs for Apache Hadoop™.

Proposal

Oozie is an extensible, scalable and reliable system to define, manage, schedule, and execute complex Hadoop workloads via web services. More specifically, this includes:
* An XML-based declarative framework to specify a job or a complex workflow of dependent jobs.
* Support for different job types such as Hadoop Map-Reduce, Pipes, Streaming, Pig, Hive and custom Java applications.
* Workflow scheduling based on frequency and/or data availability.
* Monitoring capability, automatic retry and failure handling of jobs.
* An extensible and pluggable architecture to allow arbitrary grid programming paradigms.
* Authentication, authorization, and capacity-aware load throttling to allow multi-tenant software as a service.

Background

Most data processing applications require multiple jobs to achieve their goals, with inherent dependencies among the jobs. A dependency could be sequential, where one job can only start after another job has finished, or conditional, where the execution of a job depends on the return value or status of another job. In other cases, parallel execution of multiple jobs may be permitted – or desired – to exploit the massive pool of compute nodes provided by Hadoop.

These job dependencies are often expressed as a Directed Acyclic Graph, also called a workflow. A node in the workflow is typically a job (a computation on the grid) or another type of action, such as an e-mail notification. Computations can be expressed in map/reduce, Pig, Hive or any other programming paradigm available on the grid. Edges of the graph represent transitions from one node to the next as the execution of the workflow proceeds.

Describing a workflow in a declarative way has the advantage of decoupling job dependencies and execution control from application logic. Furthermore, the workflow is modularized into jobs that can be reused within the same workflow or across different workflows. Execution of the workflow is then driven by a runtime system without understanding the application logic of the jobs. This runtime system specializes in reliable and predictable execution: it can retry actions that have failed or invoke a cleanup action after termination of the workflow; it can monitor progress, success, or failure of a workflow, and send appropriate alerts to an administrator. The application developer is relieved from implementing these generic procedures.

Furthermore, some applications or workflows need to run at periodic intervals or when dependent data is available. For example, a workflow could be executed every day as soon as output data from the previous 24 instances of another, hourly workflow is available. The workflow coordinator provides such scheduling features, along with prioritization, load balancing and throttling, to optimize utilization of resources in the cluster. This makes it easier to maintain, control, and coordinate complex data applications.

Nearly three years ago, a team of Yahoo! developers addressed these critical requirements for Hadoop-based data processing systems by developing a new workflow management and scheduling system called Oozie. While it was initially developed as a Yahoo!-internal project, it was designed and implemented with the intention of open-sourcing it. Oozie was released as a GitHub project in early 2010. Oozie is used in production within Yahoo!, and since it was open-sourced it has been gaining adoption among external developers.

Rationale

Commonly, applications that run on Hadoop require multiple Hadoop jobs in order to obtain the desired results. Furthermore, these Hadoop jobs are commonly a combination of Java map-reduce jobs, Streaming map-reduce jobs, Pipes map-reduce jobs, Pig jobs, Hive jobs, HDFS operations, Java programs and shell scripts. Because of this, developers find themselves writing ad-hoc glue programs to combine these Hadoop jobs. These ad-hoc programs are difficult to schedule, manage, monitor and recover.
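As a concrete illustration of the XML-based declarative framework described in the proposal, a minimal workflow definition might look like the following sketch. It follows the shape of Oozie's published workflow schema, but the application name, paths, and property values here are hypothetical:

```xml
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.2">
  <start to="mr-step"/>

  <!-- A single map-reduce action; on success go to "end", on failure to "fail". -->
  <action name="mr-step">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.input.dir</name>
          <value>/data/input</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>/data/output</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>

  <kill name="fail">
    <message>Map-reduce step failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

The `ok`/`error` transitions on each action are the edges of the Directed Acyclic Graph described under Background; adding more `action` elements with transitions between them builds up a larger workflow.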
Re: [PROPOSAL] Deft for incubation
On Wed, Jun 29, 2011 at 4:05 PM, Mohammad Nour El-Din nour.moham...@gmail.com wrote: You can sign me in. You've been added to the wiki page. One or two more mentors would be outstanding. /niklas
Re: KEYS and releases
On Tue, Jun 28, 2011 at 10:20 AM, Christian Grobmeier grobme...@gmail.com wrote:

> we copy a KEYS file into that directory upon successful VOTE of the release artifacts (which also include the KEYS file).

Perhaps, but the point we're getting at was explicitly stated by Benson: the goal here is to allow and encourage consumers to independently verify signatures. That calls for KEYS being somewhere other than inside the package.

> I am sorry to ask it again, but why can't the incubator have a policy to make people use https://id.apache.org/ to store their signing key? Then we have them listed for each project there: https://people.apache.org/keys/ Was it not meant that way?

AIUI this infrastructure is relatively new and intended to add defense-in-depth. IMHO the IPMC should only document (any volunteers?) a strong recommendation, but leave policy in this area to the experts over in infrastructure. Robert
Re: [VOTE] Oozie to join the Incubator
+1 (binding)

-C

On Wednesday, June 29, 2011, Mohammad Islam misla...@yahoo.com wrote: Hi All, The discussion about Oozie proposal is settling down. Therefore I would like to initiate a vote to accept Oozie as an Apache Incubator project. ...
Re: [PROPOSAL] Deft for incubation
Thanks Niklas

On Thu, Jun 30, 2011 at 9:24 AM, Niklas Gustavsson nik...@protocol7.com wrote: On Wed, Jun 29, 2011 at 4:05 PM, Mohammad Nour El-Din nour.moham...@gmail.com wrote: You can sign me in. You've been added to the wiki page. One or two more mentors would be outstanding. /niklas

-- Thanks - Mohammad Nour Author of (WebSphere Application Server Community Edition 2.0 User Guide) http://www.redbooks.ibm.com/abstracts/sg247585.html - LinkedIn: http://www.linkedin.com/in/mnour - Blog: http://tadabborat.blogspot.com
Life is like riding a bicycle. To keep your balance you must keep moving - Albert Einstein
Writing clean code is what you must do in order to call yourself a professional. There is no reasonable excuse for doing anything less than your best. - Clean Code: A Handbook of Agile Software Craftsmanship
Stay hungry, stay foolish. - Steve Jobs
Re: [VOTE] Oozie to join the Incubator
+1 (Binding)

On Thu, Jun 30, 2011 at 10:04 AM, Chris Douglas cdoug...@apache.org wrote: +1 (binding) -C On Wednesday, June 29, 2011, Mohammad Islam misla...@yahoo.com wrote: Hi All, The discussion about Oozie proposal is settling down. Therefore I would like to initiate a vote to accept Oozie as an Apache Incubator project. ...
Re: Bluesky calls for a new mentor!
> I believe projects from school could also rock in Apache as well. You can look down on us but you can't deny others.

This is not the point. The point is: if you have contributors who have no Apache id, they a) need to sign an ICLA, b) need to create a JIRA issue and attach an svn diff there, ticking the "allowed to use for the ASF" box, and c) need to ask development questions on the mailing list, not by ICQ, MSN or whatever. You can actually work with students, no problem. But it should happen visibly. Even when you are in the same room, it should be visible to all other parties around the world. Otherwise you will never get a development community.

> I had a propose that community give us the last chance for 1-2 month and certain member could become our mentor to lead us finish releasing the newest version. During this time slot. What you would see includes:

I am not sure the term "mentor" is used well here. A mentor is not here to help you with development questions. A mentor's role is to oversee how the project progresses and to guide people to work in the Apache way. After 3 years you should already know about the Apache way, and a mentor (in the teaching role only) should be obsolete. A mentor is for sure NO project lead. He can point you to the relevant docs on how to release code, for example.

> 1. gradually increasing discussion in bluesky-dev mailing list. Meaningless discussion would not count.

ALL discussion must happen on list, from now on. "If it didn't happen on list, it didn't happen," as a wise man once said.

> 2. committing of source code after they were cleaned up. Inactive committers would be revoked and new committers would apply to join in.

Now or never. It is commit-then-review. Potential new committers must show their interest on the mailing list; otherwise your mentors cannot decide whether they should support an invitation or not. As you know, new committers must be voted in. The discussion should also happen beforehand, on list.

> 3. preparing for what release needs and make the release successful. Thus the new developers and committers could completely experienced the release process and know about How things are done in Apache community better.

You should start working in the Apache way even before the release. If it didn't work well before, it will not work well while releasing.

> If community accept my suggestion, individually, i want the BlueSky project under strict surveillance by community members. If we can't fulfill what we just promised, then just kick us out of here and i would have noting to say.

I (personally) have no problem with waiting just another 2 or 3 months. I cannot imagine anyone would like to step up as a mentor at the moment. My suggestion: try to work out the Apache way now. Use JIRA and the mailing list. Students contribute patches through JIRA. Committers apply them. And so on. If that all happens, your JIRA will be full of contributions and your mailing list full of discussions. If that is the case, come back to this list and ask for a mentor again; probably somebody will be willing to step up then. If you have more questions on how Apache works, I am pretty sure you'll get an answer on this list. Cheers, Christian

> Well, suppose we live through that, besides working in Apache way, we would continually working on to evolve BlueSky ...
Re: Bluesky calls for a new mentor!
Personally, I see a *HEAP* of stuff Bluesky would need to handle before doing an ASF release. I would get that right out of your head from the start.

Firstly, you would have to demonstrate that all the code is covered by software grants or ICLAs held by the Apache Software Foundation. Secondly, you would have to go through the entire codebase and remove all code that cannot be included in a work covered by the Apache License. This would mean excluding any LGPL/GPL code, and possibly more. Thirdly, you should be committing code *before* you clean it up. The clean-up should happen in public, on ASF lists. Otherwise it smacks of 'over the wall' development, meaning other developers not in your immediate team would have no capacity to engage in the development, as all they can see at Apache is a sequence of drops of code that was actually developed elsewhere.

Here are the steps I would see the project needing to complete, and probably within a month, to survive:
(a) Get all code onto Apache SVN, immediately (it is okay to include LGPL code in SVN, it just can't be released)
(b) Every change to the code needs to be a real change, not a code drop
(c) Patches made by students who are not committers should be uploaded to JIRA, with correct provenance (ICLA signed), before they are committed
(d) All development happens on the ASF list
(e) Any idea of doing a release at Apache within six months must be dropped

Upayavira

On Thu, 30 Jun 2011 14:49 +0800, SamuelKevin lovesumm...@gmail.com wrote: Hi, Ralph: I am not avoiding the truth that we suck during the last three years, though we were once at the verge of release. ...
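The patch round trip behind steps (b) and (c) can be sketched with plain diff(1) and patch(1); against a real ASF working copy the contributor would run `svn diff` instead, and the file name and JIRA key below are purely illustrative:

```shell
set -e

# Contributor side: edit a file and capture the change as a unified
# diff (with an ASF checkout this would simply be `svn diff`).
printf 'Hello\n'        > greeting.txt.orig
printf 'Hello, world\n' > greeting.txt
diff -u greeting.txt.orig greeting.txt > BLUESKY-123.patch || true  # diff(1) exits 1 when files differ

# The patch file is then attached to the JIRA issue with the
# "grant license to ASF" box ticked.

# Committer side: review the patch, apply it to a pristine copy, and
# commit with a message crediting the contributor.
cp greeting.txt.orig pristine.txt
patch pristine.txt < BLUESKY-123.patch
cat pristine.txt
```

The point of the detour through JIRA is provenance: the attachment records who contributed the change and that they licensed it to the ASF, which a direct code drop does not.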
Re: Bluesky calls for a new mentor!
On Thu, Jun 30, 2011 at 10:37, Christian Grobmeier grobme...@gmail.com wrote: I believe projects from school could also rock in Apache as well. You can look down up on us but you can't deny others. This is not the point. The point is, if you have contributors who have no apache id, they a) need to sign an ICLA b) need to create an Jira issue and attach an svn diff there, ticking the allowed to use for the ASF box c) need to ask development questions on the mailinglist, not by ICQ, MSN or whatever You can actually work with students, no problem. But it should happen visible. Even when you are in the same room, it should be visible to all other parties around the world. Otherwise you will never get an development community. I had a propose that community give us the last chance for 1-2month and certain member could become our mentor to lead us finish releasing the newest version. During this time slot. What you would see includes: I am not sure if the term mentor is used well here. A mentor is not here to help you in development questions. A mentors role is to oversee how the project progresses, guide people to work after the apache way. After 3 years you should already know about the apache way and mentor should be obsolet (from a teaching role only). A mentor is for sure NO project lead. He can point you to the according docs of how to release code, for example. 1. gradually increasing discussion in bluesky-dev mailing list. Meaningless discussion would not count. ALL discussion must happen on list, from now on. If it didn't happen on list, it didn't happen, as a wise man once said. 2. committing of source code after they were cleaned up. Inactive committers would be revoked and new committers would apply to join in. Now or never. It is commit then review Potential new committers must show their interest on the mailing list - otherwise your mentors cannot decide if they should support a invitation or not. As you know, new committers must be voted in. 
The discussion should also happen beforehand, on list.

> 3. preparing for what the release needs and making the release successful. Thus the new developers and committers could fully experience the release process and better learn how things are done in the Apache community.

You should start working the Apache way even before the release. If it didn't work well before, it will not work well while releasing.

> If the community accepts my suggestion, I personally want the BlueSky project under strict surveillance by community members. If we can't fulfill what we just promised, then just kick us out of here and I would have nothing to say.

I (personally) have no problem with waiting just another 2 or 3 months. I cannot imagine anyone would like to step up as a mentor at the moment. My suggestion: try to work the Apache way now. Use Jira and the mailing list. Students contribute patches through Jira. Committers apply them. And so on. If that all happens, your Jira is full of contributions and your mailing list full of discussions. If that is the case, come back to this list and ask for a mentor again - probably somebody will be willing to step up then.

We had this discussion multiple times over the last year. I firmly think that without immediate new mentors this project should not continue.

Bernd

- To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org
Re: [VOTE] Oozie to join the Incubator
On Wed, Jun 29, 2011 at 9:10 PM, Mohammad Islam misla...@yahoo.com wrote: ... [X] +1 Accept Oozie for incubation ... -Bertrand
Re: KEYS and releases
Robert Burrell Donkin wrote on Thu, Jun 30, 2011 at 08:31:38 +0100:

On Tue, Jun 28, 2011 at 10:20 AM, Christian Grobmeier grobme...@gmail.com wrote: we copy a KEYS file into that directory upon successful VOTE of the release artifacts (which also include the KEYS file).

Perhaps, but the point we're getting at was explicitly stated by Benson: the goal here is to allow and encourage consumers to independently verify signatures. That calls for KEYS somewhere other than inside the package.

I am sorry to ask it again, but why can't the incubator have a policy to make people use https://id.apache.org/ to store their signing key? Then we have them listed for each project there: https://people.apache.org/keys/ Was it not meant that way?

AIUI this infrastructure is relatively new and intended to add defense-in-depth.

Yes, it's new, and yes, it isn't meant to replace PGP trust paths. What it does behind the scenes is 'gpg --recv-key keyid committer.asc' and publish the result over https, where the key id (or fingerprint) is provided by the committer (authenticating with their svn password).

IMHO the IPMC should only document (any volunteers?) a strong recommendation but leave policy in this area to the experts over in infrastructure.

Robert
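The flow under discussion (a consumer independently importing a project's KEYS file and verifying a detached signature) can be sketched end to end. This is a self-contained illustration, not incubator policy: it generates a throwaway key in a temporary GNUPGHOME rather than fetching a real KEYS file over https, and the identity rm@example.org and the artifact name are invented.

```shell
# Use a throwaway GnuPG home so the real keyring is untouched
export GNUPGHOME="$(mktemp -d)"
chmod 700 "$GNUPGHOME"

# Release-manager side: create a signing key (invented identity),
# export the public half into a KEYS file, and detach-sign the artifact
gpg --batch --pinentry-mode loopback --passphrase '' \
    --quick-generate-key "Release Manager <rm@example.org>" default default never
echo "release artifact" > project-1.0.tar.gz
gpg --armor --export rm@example.org > KEYS
gpg --batch --pinentry-mode loopback --passphrase '' \
    --armor --detach-sign project-1.0.tar.gz

# Consumer side: import KEYS (normally fetched from the project's
# distribution area, not from inside the release package) and verify
gpg --import KEYS
gpg --verify project-1.0.tar.gz.asc project-1.0.tar.gz && echo "signature OK"
```

In a real release check, the consumer would download KEYS and the .asc file separately and run only the last two commands; the point made above is exactly that KEYS must live somewhere the consumer can fetch independently of the package.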
RE: Bluesky calls for a new mentor!
Samuel Kevin wrote:

> Most of the developers of the BlueSky project are students. As you all know, students come when they join the school and go after they graduate. So the active developers number around 10. We used to have 5 committers, but now we only have 2 active committers.

As others have pointed out, and I believe you acknowledge (q.v., "I am not avoiding the truth that we suck during the last three years"), there are better and necessary ways to address this issue. And we've worked with Google every year during the Summer of Code, so we're not exactly inexperienced at working with students.

> According to what you've listed, I will forward your suggestions to the bluesky dev list, and I hope we can make a quick response after discussion.

Incorporate all of the feedback you're getting from folks. It is urgent that you take the advice, get all of the current code into source control ASAP, get students onto the mailing list now, start doing discussion and coding in public, and submit changes on a regular basis via SVN and/or JIRA. These are the same things you've also read from Christian and Upayavira. You don't need a new Mentor to do those things. Demonstrate change and we'll try to help you succeed.

--- Noel
Re: [VOTE] Oozie to join the Incubator
+1 binding

Regards, Alan

On Jun 29, 2011, at 12:10 PM, Mohammad Islam wrote:

Hi All, The discussion about the Oozie proposal is settling down. Therefore I would like to initiate a vote to accept Oozie as an Apache Incubator project. The latest proposal is pasted at the end and can be found in the wiki as well: http://wiki.apache.org/incubator/OozieProposal The related discussion thread is at: http://www.mail-archive.com/general@incubator.apache.org/msg29633.html

Please cast your votes:
[ ] +1 Accept Oozie for incubation
[ ] +0 Indifferent to Oozie incubation
[ ] -1 Reject Oozie for incubation

This vote will close 72 hours from now. Regards, Mohammad

Abstract

Oozie is a server-based workflow scheduling and coordination system to manage data processing jobs for Apache Hadoop™.

Proposal

Oozie is an extensible, scalable and reliable system to define, manage, schedule, and execute complex Hadoop workloads via web services. More specifically, this includes:
* An XML-based declarative framework to specify a job or a complex workflow of dependent jobs.
* Support for different types of jobs such as Hadoop Map-Reduce, Pipes, Streaming, Pig, Hive and custom Java applications.
* Workflow scheduling based on frequency and/or data availability.
* Monitoring capability, automatic retry and failure handling of jobs.
* An extensible and pluggable architecture to allow arbitrary grid programming paradigms.
* Authentication, authorization, and capacity-aware load throttling to allow multi-tenant software as a service.

Background

Most data processing applications require multiple jobs to achieve their goals, with inherent dependencies among the jobs. A dependency could be sequential, where one job can only start after another job has finished. Or it could be conditional, where the execution of a job depends on the return value or status of another job.
In other cases, parallel execution of multiple jobs may be permitted – or desired – to exploit the massive pool of compute nodes provided by Hadoop. These job dependencies are often expressed as a Directed Acyclic Graph, also called a workflow. A node in the workflow is typically a job (a computation on the grid) or another type of action such as an e-mail notification. Computations can be expressed in map/reduce, Pig, Hive or any other programming paradigm available on the grid. Edges of the graph represent transitions from one node to the next as the execution of a workflow proceeds.

Describing a workflow in a declarative way has the advantage of decoupling job dependencies and execution control from application logic. Furthermore, the workflow is modularized into jobs that can be reused within the same workflow or across different workflows. Execution of the workflow is then driven by a runtime system without understanding the application logic of the jobs. This runtime system specializes in reliable and predictable execution: it can retry actions that have failed or invoke a cleanup action after termination of the workflow; it can monitor the progress, success, or failure of a workflow, and send appropriate alerts to an administrator. The application developer is relieved from implementing these generic procedures.

Furthermore, some applications or workflows need to run at periodic intervals or when dependent data is available. For example, a workflow could be executed every day as soon as output data from the previous 24 instances of another, hourly workflow is available. The workflow coordinator provides such scheduling features, along with prioritization, load balancing and throttling, to optimize utilization of resources in the cluster. This makes it easier to maintain, control, and coordinate complex data applications.

Nearly three years ago, a team of Yahoo! developers addressed these critical requirements for Hadoop-based data processing systems by developing a new workflow management and scheduling system called Oozie. While it was initially developed as a Yahoo!-internal project, it was designed and implemented with the intention of open-sourcing it. Oozie was released as a GitHub project in early 2010. Oozie is used in production within Yahoo!, and since it was open-sourced it has been gaining adoption among external developers.

Rationale

Commonly, applications that run on Hadoop require multiple Hadoop jobs in order to obtain the desired results. Furthermore, these Hadoop jobs are commonly a combination of Java map-reduce jobs, Streaming map-reduce jobs, Pipes map-reduce jobs, Pig jobs, Hive jobs, HDFS operations, Java programs and shell scripts. Because of this, developers find themselves writing ad-hoc glue programs to combine these Hadoop jobs. These ad-hoc programs are difficult to schedule, manage, monitor and recover.
Re: [VOTE] Retire ALOIS podling
I would like to close this vote with +1 from: Bertrand Delacretaz, Alan Cabrera, Henri Yandell, Mohammad Nour El-Din, Noel Bergman, Christian Grobmeier.

I will try to do the necessary retirement steps as soon as I can. Thanks for your time! Christian

On Tue, Jun 21, 2011 at 5:52 PM, Christian Grobmeier grobme...@gmail.com wrote:

Hello, as already mentioned last week, the ALOIS project is dead and it seems there is no way to recover in the near future (or even later). The developers told me in a private message in March that they cannot continue due to personal reasons. It seems this has come true. I have set up a vote on the dev mailing list: * http://s.apache.org/eBx (Note: one of the voters responded on the private list - I counted the vote.) So far, no releases have been made. That vote passed a few hours ago, after being open for 5 days. Please vote for retirement of the ALOIS podling. If this vote passes, I will step into the discussions on retirement and finally retire it. Thanks, Christian

[] +1 - please retire
[] +/-0
[] -1 - please don't retire, because...

-- http://www.grobmeier.de
Re: [VOTE] Oozie to join the Incubator
+1 (non-binding) Good luck

On Wed, Jun 29, 2011 at 12:10 PM, Mohammad Islam misla...@yahoo.com wrote: Hi All, The discussion about Oozie proposal is settling down. Therefore I would like to initiate a vote to accept Oozie as an Apache Incubator project. [...]
Re: [VOTE] Retire ALOIS podling
Hi Christian, To confirm, will you be doing the following steps:
* Investigate whether the source was covered by CLAs.
* If so, then update the web page before moving to retired.
* Otherwise, delete the source from svn.
? Hen

On Thu, Jun 30, 2011 at 10:17 AM, Christian Grobmeier grobme...@gmail.com wrote: I would like to close this vote with +1 from: [...]
Re: [PROPOSAL] Oozie for the Apache Incubator
Strong +1 (non-binding). Thanks, Chao -- View this message in context: http://old.nabble.com/-PROPOSAL--Oozie-for-the-Apache-Incubator-tp31922563p31970721.html Sent from the Apache Incubator - General mailing list archive at Nabble.com.