Re: What is the legal basis for enforcing release policies at ASF?

2015-08-20 Thread William A Rowe Jr
On Thu, Aug 20, 2015 at 8:52 AM, Jim Jagielski j...@jagunet.com wrote:


 A snapshot is not a release. Licenses kick in at distribution/
 release.


Let's just imagine that Jim, VP Legal, is actually correct in his
interpretation, and that there are no AL 2.0 licenses applicable to our
source code repositories, svn or git.

Quoting http://apache.org/licenses/LICENSE-2.0 ...

2. Grant of Copyright License. Subject to the terms and conditions of this
License, each Contributor hereby grants to You a perpetual, worldwide,
non-exclusive, no-charge, royalty-free, irrevocable copyright license to
reproduce, prepare Derivative Works of, publicly display, publicly perform,
sublicense, and distribute the Work and such Derivative Works in Source or
Object form.

No, you may not modify the sources or derive those that reside within
version control of the ASF, until and upon the time when the project has
blessed that project as a release.  Patches to others' contributions to
source code control are not within the scope of this imaginary non-license
application.

3. Grant of Patent License. Subject to the terms and conditions of this
License, each Contributor hereby grants to You a perpetual, worldwide,
non-exclusive, no-charge, royalty-free, irrevocable (except as stated in
this section) patent license to make, have made, use, offer to sell, sell,
import, and otherwise transfer the Work, where such license applies only to
those patent claims licensable by such Contributor that are necessarily
infringed by their Contribution(s) alone or by combination of their
Contribution(s) with the Work to which such Contribution(s) was submitted.
If You institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work or a
Contribution incorporated within the Work constitutes direct or
contributory patent infringement, then any patent licenses granted to You
under this License for that Work shall terminate as of the date such
litigation is filed.

No, you may absolutely not test the code that has been committed to source
control without a patent license, which you do not have, until that time
when the ASF blesses the work and calls it a release.

4. Redistribution. You may reproduce and distribute copies of the Work or
Derivative Works thereof in any medium, with or without modifications, and
in Source or Object form

None of that, it's all straight out, none of it applies to your work at the
ASF until the release is blessed.  That includes passing off a patched fork
of a security fix to a reporter who claimed there was a defect in the
earlier release.

5. Submission of Contributions. Unless You explicitly state otherwise, any
Contribution intentionally submitted for inclusion in the Work by You to
the Licensor shall be under the terms and conditions of this License

Except when it isn't in Ross's and our VP Legal's own minds...

6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor, except
as required for reasonable and customary use in describing the origin of
the Work and reproducing the content of the NOTICE file.

Which wasn't a right in the first place, so no change here under any
interpretation...

7. Disclaimer of Warranty. Unless required by applicable law or agreed to
in writing, Licensor provides the Work (and each Contributor provides its
Contributions) on an AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied, including, without limitation, any
warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or
FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for
determining the appropriateness of using or redistributing the Work and
assume any risks associated with Your exercise of permissions under this
License.

Except that perhaps the ASF is liable, under our VP Legal's interpretation,
for works which do reside in source control and were not, in fact, released
to the general public?  [Ad nauseam 8. and 9.]

Let's just not go this direction, because it is plainly false. Jim, it
would truly be helpful if you spoke up for or in contradiction to your
earlier statements, here...

Cheers,

Bill


[DISCUSS] HAWQ Incubation Proposal

2015-08-20 Thread Roman Shaposhnik
Hi!

I would like to start a discussion on accepting HAWQ
into ASF Incubator. The proposal is available at:
https://wiki.apache.org/incubator/ApexProposal
and is also attached to the end of this email.

Please note, that this proposal is very complementary
to the desire of HAWQ's sister project (MADlib) to
join ASF Incubator:
http://madlib.net/pipermail/user/2015-August/
http://madlib.net/pipermail/devel/2015-August/
I've volunteered to help MADlib community and we're
currently working on a separate proposal to be submitted
later next week. If you're interested in monitoring progress
of that please see updates to:
 https://github.com/madlib/madlib/wiki/MADlib-ASF-Incubator-Proposal
and later:
 https://wiki.apache.org/incubator/MADlibProposal

Thanks in advance for your time and help.

Thanks,
Roman.

== Abstract ==

HAWQ is an advanced enterprise SQL on Hadoop analytic engine built
around a robust and high-performance massively-parallel processing
(MPP) SQL framework evolved from Pivotal Greenplum DatabaseⓇ.

HAWQ runs natively on Apache HadoopⓇ clusters by tightly integrating
with HDFS and YARN. HAWQ supports multiple Hadoop file formats such as
Apache Parquet, native HDFS, and Apache Avro. HAWQ is configured and
managed as a Hadoop service in Apache Ambari. HAWQ is 100% ANSI SQL
compliant (supporting ANSI SQL-92, SQL-99, and SQL-2003, plus OLAP
extensions) and supports open database connectivity (ODBC) and Java
database connectivity (JDBC), as well. Most business intelligence,
data analysis and data visualization tools work with HAWQ out of the
box without the need for specialized drivers.

A unique aspect of HAWQ is its integration of statistical and machine
learning capabilities that can be natively invoked from SQL or (in the
context of PL/Python, PL/Java or PL/R) in massively parallel modes and
applied to large data sets across a Hadoop cluster. These capabilities
are provided through MADlib – an existing open source, parallel
machine-learning library. Given the close ties between the two
development communities, the MADlib community has expressed interest
in joining HAWQ on its journey into the ASF Incubator and will be
submitting a separate, concurrent proposal.

HAWQ will provide more robust and higher performing options for Hadoop
environments that demand best-in-class data analytics for business
critical purposes. HAWQ is implemented in C and C++.

== Proposal ==
The goal of this proposal is to bring the core of Pivotal Software,
Inc.’s (Pivotal) Pivotal HAWQⓇ codebase into the Apache Software
Foundation (ASF) in order to build a vibrant, diverse and
self-governed open source community around the technology. Pivotal has
agreed to transfer the brand name HAWQ to Apache Software Foundation
and will stop using HAWQ to refer to this software if the project gets
accepted into the ASF Incubator under the name of Apache HAWQ
(incubating). Pivotal will continue to market and sell an analytic
engine product that includes Apache HAWQ (incubating). While HAWQ is
our primary choice for a name of the project, in anticipation of any
potential issues with PODLINGNAMESEARCH we have come up with two
alternative names: (1) Hornet; or (2) Grove.

Pivotal is submitting this proposal to donate the HAWQ source code and
associated artifacts (documentation, web site content, wiki, etc.) to
the Apache Software Foundation Incubator under the Apache License,
Version 2.0 and is asking Incubator PMC to establish an open source
community.

== Background ==
While the ecosystem of open source SQL-on-Hadoop solutions is fairly
developed by now, HAWQ has several unique features that will set it
apart from existing ASF and non-ASF projects. HAWQ made its debut in
2013 as a closed source product leveraging a decade's worth of product
development effort invested in Greenplum DatabaseⓇ. Since then HAWQ
has rapidly gained a solid customer base and become available on
non-Pivotal distributions of Hadoop.
In 2015 HAWQ still leverages the rock solid foundation of Greenplum
Database, while at the same time embracing elasticity and resource
management native to Hadoop applications. This allows HAWQ to provide
superior SQL on Hadoop performance, scalability and coverage while
also providing massively-parallel machine learning capabilities and
support for native Hadoop file formats. In addition, HAWQ's advanced
features include support for complex joins, rich and compliant SQL
dialect and industry-differentiating data federation capabilities.
Dynamic pipelining and pluggable query optimizer architecture enable
HAWQ to perform queries on Hadoop with the speed and scalability
required for enterprise data warehouse (EDW) workloads. HAWQ provides
strong support for low-latency analytic SQL queries, coupled with
massively parallel machine learning capabilities. This enables
discovery-based analysis of large data sets and rapid, iterative
development of data analytics applications that apply deep machine
learning – 

Re: [DISCUSS] HAWQ Incubation Proposal

2015-08-20 Thread Youngwoo Kim
Hi Roman,

Great news!

BTW, it might be an invalid URL for the proposal. Should be
https://wiki.apache.org/incubator/HAWQProposal ?

Thanks,
Youngwoo

On Fri, Aug 21, 2015 at 12:14 PM, Roman Shaposhnik r...@apache.org wrote:

 Hi!

 I would like to start a discussion on accepting HAWQ
 into ASF Incubator. The proposal is available at:
 https://wiki.apache.org/incubator/ApexProposal
 and is also attached to the end of this email.

 Please note, that this proposal is very complementary
 to the desire of HAWQ's sister project (MADlib) to
 join ASF Incubator:
 http://madlib.net/pipermail/user/2015-August/
 http://madlib.net/pipermail/devel/2015-August/
 I've volunteered to help MADlib community and we're
 currently working on a separate proposal to be submitted
 later next week. If you're interested in monitoring progress
 of that please see updates to:
  https://github.com/madlib/madlib/wiki/MADlib-ASF-Incubator-Proposal
 and later:
  https://wiki.apache.org/incubator/MADlibProposal

 Thanks in advance for your time and help.

 Thanks,
 Roman.

 == Abstract ==

 HAWQ is an advanced enterprise SQL on Hadoop analytic engine built
 around a robust and high-performance massively-parallel processing
 (MPP) SQL framework evolved from Pivotal Greenplum DatabaseⓇ.

 HAWQ runs natively on Apache HadoopⓇ clusters by tightly integrating
 with HDFS and YARN. HAWQ supports multiple Hadoop file formats such as
 Apache Parquet, native HDFS, and Apache Avro. HAWQ is configured and
 managed as a Hadoop service in Apache Ambari. HAWQ is 100% ANSI SQL
 compliant (supporting ANSI SQL-92, SQL-99, and SQL-2003, plus OLAP
 extensions) and supports open database connectivity (ODBC) and Java
 database connectivity (JDBC), as well. Most business intelligence,
 data analysis and data visualization tools work with HAWQ out of the
 box without the need for specialized drivers.

 A unique aspect of HAWQ is its integration of statistical and machine
 learning capabilities that can be natively invoked from SQL or (in the
 context of PL/Python, PL/Java or PL/R) in massively parallel modes and
 applied to large data sets across a Hadoop cluster. These capabilities
 are provided through MADlib – an existing open source, parallel
 machine-learning library. Given the close ties between the two
 development communities, the MADlib community has expressed interest
 in joining HAWQ on its journey into the ASF Incubator and will be
 submitting a separate, concurrent proposal.

 HAWQ will provide more robust and higher performing options for Hadoop
 environments that demand best-in-class data analytics for business
 critical purposes. HAWQ is implemented in C and C++.

 == Proposal ==
 The goal of this proposal is to bring the core of Pivotal Software,
 Inc.’s (Pivotal) Pivotal HAWQⓇ codebase into the Apache Software
 Foundation (ASF) in order to build a vibrant, diverse and
 self-governed open source community around the technology. Pivotal has
 agreed to transfer the brand name HAWQ to Apache Software Foundation
 and will stop using HAWQ to refer to this software if the project gets
 accepted into the ASF Incubator under the name of Apache HAWQ
 (incubating). Pivotal will continue to market and sell an analytic
 engine product that includes Apache HAWQ (incubating). While HAWQ is
 our primary choice for a name of the project, in anticipation of any
 potential issues with PODLINGNAMESEARCH we have come up with two
 alternative names: (1) Hornet; or (2) Grove.

 Pivotal is submitting this proposal to donate the HAWQ source code and
 associated artifacts (documentation, web site content, wiki, etc.) to
 the Apache Software Foundation Incubator under the Apache License,
 Version 2.0 and is asking Incubator PMC to establish an open source
 community.

 == Background ==
 While the ecosystem of open source SQL-on-Hadoop solutions is fairly
 developed by now, HAWQ has several unique features that will set it
 apart from existing ASF and non-ASF projects. HAWQ made its debut in
 2013 as a closed source product leveraging a decade's worth of product
 development effort invested in Greenplum DatabaseⓇ. Since then HAWQ
 has rapidly gained a solid customer base and became available on
 non-Pivotal distributions of Hadoop.
 In 2015 HAWQ still leverages the rock solid foundation of Greenplum
 Database, while at the same time embracing elasticity and resource
 management native to Hadoop applications. This allows HAWQ to provide
 superior SQL on Hadoop performance, scalability and coverage while
 also providing massively-parallel machine learning capabilities and
 support for native Hadoop file formats. In addition, HAWQ's advanced
 features include support for complex joins, rich and compliant SQL
 dialect and industry-differentiating data federation capabilities.
 Dynamic pipelining and pluggable query optimizer architecture enable
 HAWQ to perform queries on Hadoop with the speed and scalability
 required for enterprise data warehouse (EDW) 

Re: [DISCUSS] HAWQ Incubation Proposal

2015-08-20 Thread Roman Shaposhnik
On Thu, Aug 20, 2015 at 8:33 PM, 김영우 (Youngwoo Kim) warwit...@gmail.com wrote:
 Hi Roman,

 Great news!

 BTW, it might be an invalid URL for the proposal. Should be
 https://wiki.apache.org/incubator/HAWQProposal ?

Too many copy-paste buffers strike again :-( Thanks for spotting it
so quickly. Yes it is:
https://wiki.apache.org/incubator/HAWQProposal

Thanks,
Roman.




[DISCUSS] Horn Incubation Proposal

2015-08-20 Thread Edward J. Yoon
Hi all,

We'd like to propose Horn (혼), a fully distributed system for
large-scale deep learning as an Apache Incubator project and start the
discussion. The complete proposal can be found at:
https://wiki.apache.org/incubator/HornProposal

Any advice and help is welcome! Thanks, Edward.

= Horn Proposal =

== Abstract ==

Horn (tentative name, pronounced [hɔ:n]; the Korean meaning of Horn is
"spirit") is a neuron-centric programming API and execution framework
for large-scale deep learning, built on top of Apache Hama.

== Proposal ==

The goal of Horn is to provide a neuron-centric programming API
that allows users to easily define the characteristics of an artificial
neural network model and its structure, together with an execution
framework that leverages the heterogeneous resources of Hama and Hadoop
YARN clusters.

== Background ==

The initial ANN code was developed in the Apache Hama project by a
committer, Yexi Jiang (Facebook), in 2013. The motivation behind this
work is to build a framework that provides more intuitive programming
APIs, like Google's MapReduce or Pregel, and supports applications
that need large models with huge memory consumption in a distributed way.

== Rationale ==

While many open source deep learning packages such as Caffe,
DeepDist, and NeuralGiraph are still data-parallel or model-parallel
only, we aim to support both data and model parallelism as well as a
fault-tolerant system design. The basic idea of data and model
parallelism is to use a remote parameter server to parallelize model
creation and distribute training across machines, and to use the BSP
framework of Apache Hama to perform asynchronous mini-batches. Within a
single BSP job, each task group works asynchronously using region barrier
synchronization instead of global barrier synchronization, and trains a
large-scale neural network model on its assigned data sets in the BSP
paradigm. Thus, we achieve data and model parallelism. This
architecture is inspired by Google's DistBelief (Jeff Dean et al.,
2012).
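
To make the parameter-server idea above more concrete, here is a minimal
single-process sketch in Python. It is not Horn or Hama code: the
ParameterServer class, the worker_step function, the linear model, and all
numbers are hypothetical. It only illustrates workers computing mini-batch
gradients on their own data shards and pushing them to a shared server
(data parallelism). Region barriers, model partitioning, and fault
tolerance are not modeled.

```python
import random


class ParameterServer:
    """Holds the shared model weights and applies pushed gradients."""

    def __init__(self, n_weights, learning_rate):
        self.weights = [0.0] * n_weights
        self.learning_rate = learning_rate

    def pull(self):
        # Workers fetch a copy of the current weights.
        return list(self.weights)

    def push(self, gradient):
        # Apply one gradient step as soon as a worker reports it.
        for i, g in enumerate(gradient):
            self.weights[i] -= self.learning_rate * g


def worker_step(server, batch):
    """One mini-batch step of a worker for a linear model y = w . x."""
    weights = server.pull()
    gradient = [0.0] * len(weights)
    for x, y in batch:
        error = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            gradient[i] += 2.0 * error * xi / len(batch)
    server.push(gradient)


def mean_squared_error(weights, data):
    return sum((sum(w * xi for w, xi in zip(weights, x)) - y) ** 2
               for x, y in data) / len(data)


def train(n_workers=3, n_rounds=200, batch_size=8, seed=0):
    rng = random.Random(seed)
    true_weights = [2.0, -1.0]  # hypothetical target model
    data = []
    for _ in range(240):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = sum(w * xi for w, xi in zip(true_weights, x))
        data.append((x, y))
    # Data parallelism: each worker owns a disjoint shard of the data.
    shards = [data[i::n_workers] for i in range(n_workers)]
    server = ParameterServer(n_weights=2, learning_rate=0.1)
    for _ in range(n_rounds):
        # Workers take turns here; a real system would run them concurrently.
        for shard in shards:
            worker_step(server, rng.sample(shard, batch_size))
    return server.weights, mean_squared_error(server.weights, data)
```

Calling train() returns the learned weights and the final mean squared
error; with the noise-free toy data used here the weights should approach
the hypothetical target [2.0, -1.0].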

== Initial Goals ==

Some current goals include:
 * build a new community
 * provide more intuitive programming APIs
 * support both data and model parallelism
 * run natively on both Hama and Hadoop2
 * support GPUs and InfiniBand (FPGAs if possible)

== Current Status ==

=== Meritocracy ===

The core developers understand what it means to have a process based
on meritocracy. We will provide continuous efforts to build an
environment that supports this, encouraging community members to
contribute.

=== Community ===

A small community has formed within the Apache Hama project and at some
companies, such as an instant messenger service company and a mobile
manufacturing company. Many people are also interested in the
large-scale deep learning platform itself. By bringing Horn into
Apache, we believe that the community will grow even bigger.

=== Core Developers ===

Edward J. Yoon, Thomas Jungblut, and Dongjin Lee

== Known Risks ==

=== Orphaned Products ===

Apache Hama is already a core open source component at Samsung
Electronics, and Horn also will be used by Samsung Electronics, and so
there is no direct risk for this project to be orphaned.

=== Inexperience with Open Source ===

Some of the developers are very new to open source, while the others
have experience using and/or working on Apache open source projects.

=== Homogeneous Developers ===

The initial committers are from different organizations such as
Microsoft, Samsung Electronics, and Line Plus.

=== Reliance on Salaried Developers ===

A few will work as full-time open source developers. Other
developers will also start working on the project in their spare time.

=== Relationships with Other Apache Products ===

 * Horn is based on Apache Hama
 * Apache Zookeeper is used as a distributed locking service
 * Horn runs natively on Apache Hadoop and Mesos
 * Horn may overlap somewhat with the Singa podling (if possible,
we'd also like to use Singa or Caffe to do the heavy lifting).

=== An Excessive Fascination with the Apache Brand ===

Horn itself will hopefully benefit from Apache, in terms of
attracting a community and establishing a solid group of developers,
and also from its relationship with Apache Hama, a general-purpose BSP
computing engine. These are the main reasons for us to send this
proposal.

== Documentation ==

The initial plan for Horn can be found at
http://blog.udanax.org/2015/06/googles-distbelief-clone-project-on.html

== Initial Source ==

The initial source code has been released as part of the Apache Hama
project, developed under the Apache Software Foundation. The source code
is currently hosted at
https://svn.apache.org/repos/asf/hama/trunk/ml/src/main/java/org/apache/hama/ml/ann/

== Cryptography ==

Not applicable.

== Required Resources ==

=== Mailing Lists ===

 * horn-private
 * horn-dev

=== Subversion Directory ===

 * Git is the preferred source control system: git://git.apache.org/horn

=== Issue Tracking ===

 * a JIRA issue tracker, HORN

== Initial Committers and Affiliations 

Re: What is the legal basis for enforcing release policies at ASF?

2015-08-20 Thread William A Rowe Jr
On Aug 20, 2015 08:52, Jim Jagielski j...@jagunet.com wrote:

 Coming in late.

 A snapshot is not a release. Licenses kick in at distribution/
 release.

I want to fix FUD before it infests the rafters and subfloor.  I really
have never read something so stupid or ill phrased...

Every contributor committing code to any ASF project, or even contributing
it to us in public forums (including our mailing lists, our bug trackers,
etc) is committing that code under the AL or has designated explicitly what
licence it came in under (commit message: forked from BSD-licensed code
base at {URL}.)

It is generally AL code all the time.  I don't know where you invented a
'kick-in' concept, but unless the committers are violating their ICLA/CCLA,
nothing could be further from the truth.

 There is also a trademark issue as well... only the ASF
 can declare something as a release.

There we agree :)


Re: What is the legal basis for enforcing release policies at ASF?

2015-08-20 Thread Alex Harui


On 8/20/15, 5:27 PM, William A Rowe Jr wr...@rowe-clan.net wrote:

It is generally AL code all the time.  I don't know where you invented a
'kick-in' concept, but unless the committers are violating their
ICLA/CCLA,
nothing could be further from the truth.

Committers sometimes make mistakes.  IIRC, Justin recently caught a
mistake where some files accidentally got their non-AL headers replaced
with AL headers.

Large codebase contributions, especially initial podling code grants might
be messy as well until scrubbed and approved for an official ASF release.
I know from experience.

-Alex



Re: What is the legal basis for enforcing release policies at ASF?

2015-08-20 Thread William A Rowe Jr
On Aug 20, 2015 8:19 PM, William A Rowe Jr wr...@rowe-clan.net wrote:

 On Aug 20, 2015 7:39 PM, Alex Harui aha...@adobe.com wrote:
 
 
 
  On 8/20/15, 5:27 PM, William A Rowe Jr wr...@rowe-clan.net wrote:
 
  It is generally AL code all the time.  I don't know where you invented a
  'kick-in' concept, but unless the committers are violating their ICLA/CCLA,
  nothing could be further from the truth.
 
  Committers sometimes make mistakes.  IIRC, Justin recently caught a
  mistake where some files accidentally got their non-AL headers replaced
  with AL headers.
 
  Large codebase contributions, especially initial podling code grants might
  be messy as well until scrubbed and approved for an official ASF release.
  I know from experience.

 We don't disagree on this point.  Sometimes, they are caught through the
 release process, or by peer review.  Other times, we must retract the claim
 we offered.

 Nothing changes the fact that code is either offered under the AL 2.0 or
 another license, unless the author/licensor changes their license
 retroactively.

Your comment also homes in on the logical fallacy our VP fell into... While
it may be true that the ASF granted its own AL 2.0 license to the release
package, the ASF is unable to change component licenses in incompatible
ways.  And the warranty the ASF offers on inaccurate license claims is
nil; cf. AL 2.0.

However, if our repositories are under another license, that VP needs to
make public this information, because I never got the memo, and I must
notify friends and the many companies I advise and consult to that they all
need to cease looking at the ASF's repositories, and let their respective
legal departments each sort this all out, if those repositories are
licensed with terms and conditions differing from the AL 2.0.


Re: What is the legal basis for enforcing release policies at ASF?

2015-08-20 Thread Benson Margulies
This thread started as a discussion of Linux distros and trademarks.
Perhaps I could try to return it there?

If a distro takes a release of Apache X, compiles it with minimal changes
that adapt it to the environment, and distributes it, I believe that it's a
fine thing for them to call it simply Apache X, and acknowledge our marks.

If a distro takes a release of Apache X, and makes significant changes to
it, and then distributes it, I believe that it's not OK with us for them to
simply call it Apache X. I've seen some evidence that Gentoo Linux makes a
regular habit of this, because their policies drive them to make some
pretty scary changes in some cases. Others may not share my view.

Further, if someone takes a snapshot (small 's') from source control and
starts from that, with minimal changes, I think that this would also be
trademark-acceptable, so long as they accurately describe what they did.

The operative concept here, as Shane has taught it, is 'confusion in the
marketplace.' If some third party behaves so as to cause confusion as to
the identity of Apache X, there's a trademark issue. If not, not.


Re: What is the legal basis for enforcing release policies at ASF?

2015-08-20 Thread William A Rowe Jr
On Thu, Aug 20, 2015 at 9:37 PM, Ross Gardler ross.gard...@microsoft.com
wrote:

 I do not agree with this interpretation when viewed from a legal angle
 (though I do agree from a trademark angle). I have a feeling that the root
 of my disagreement is the same as the root of Jim's earlier statement
 (though I may be mistaken).


You've lost me already, but let's unwind this...


 There are two points of IP due diligence in an Apache project: At the
 point of contribution where the IP is validated by the committer and zero
 or more people who review the patch. The second phase of IP validation is
 at the point of release, where 3 or more PMC members validate that the
 foundation can legally release the code.


No, 3 or more PMC members make a best effort to confirm that the code meets
our qualifications for release.  Not being copyright and patent attorneys,
we presume they did not cast their votes based on a legal definition of due
diligence.


 This means that taking a snapshot and building a release is *not*
 trademark-acceptable since the foundation, through the project PMC has not
 approved the release, therefore it is not an Apache release.


That much we agree on...

 Only the ASF gets to say what is an ASF release and to do so requires a
 vote of the PMC. It has nothing to do with the number of changes made to
 what is in our repositories. It has everything to do with whether it's a
 release of the foundation.


Accurate...


 So, in the strictest sense, distributions that make minor changes for
 their distribution should call it Bar powered by Apache Foo in order to
 differentiate it from an official release of the foundation. In the real
 world the question is, from a legal point of view, do we care?


Here is where there is some room for interpretation: the httpd project can
probably be built more than 10^9 different ways (I extrapolate this from a
Chipotle drink cup that claimed the number of permutations of their
quick-service faux-tex-mex menu).


 (let's ignore the fact that some people vote on releases without doing
 proper validation, that's why we require 3 +1 votes, the assumption is that
 at least one of them did the job properly)


Define Proper, I haven't read that
http://www.apache.org/dev/release/proper.html page yet.

You still didn't comment on the license under which the repository is
licensed, so this wasn't a terribly helpful post.

 From: William A Rowe Jr wr...@rowe-clan.net
 Sent: 8/20/2015 7:17 PM
 To: general@incubator.apache.org
 Subject: Re: What is the legal basis for enforcing release policies at ASF?

 On Thu, Aug 20, 2015 at 9:03 PM, Benson Margulies bimargul...@gmail.com
 wrote:

  This thread started as a discussion of Linux distros and trademarks.
  Perhaps I could try to return it there?
 
  If a distro takes a release of Apache X, compiles it with minimal changes
  that adapt it to the environment, and distributes it, I believe that it's a
  fine thing for them to call it simply Apache X, and acknowledge our marks.
 
  If a distro takes a release of Apache X, and makes significant changes to
  it, and then distributes it, I believe that it's not OK with us for them to
  simply call it Apache X. I've seen some evidence that Gentoo Linux makes a
  regular habit of this, because their policies drive them to make some
  pretty scary changes in some cases. Others may not share my view.
 
  Further, if someone takes a snapshot (small 's') from source control and
  starts from that, with minimal changes, I think that this would also be
  trademark-acceptable, so long as they accurately describe what they did.
 
  The operative concept here, as Shane has taught it, is 'confusion in the
  marketplace.' If some third party behaves so as to cause confusion as to
  the identity of Apache X, there's a trademark issue. If not, not.
 

 You summed this up to the best of my understanding ... +1.  If our legal VP
 agrees (and retracts earlier FUD) it appears we are entirely in agreement.



Re: What is the legal basis for enforcing release policies at ASF?

2015-08-20 Thread Niclas Hedhman
I think it is somewhat amusing that this is actually being discussed ~20 years
after the Apache Group was formed. A newcomer must be flabbergasted that this
isn't clear cut by now... ;-)

// Niclas

On Fri, Aug 21, 2015 at 10:37 AM, Ross Gardler ross.gard...@microsoft.com
wrote:

 I do not agree with this interpretation when viewed from a legal angle
 (though I do agree from a trademark angle). I have a feeling that the root
 of my disagreement is the same as the root of Jim's earlier statement
 (though I may be mistaken).

 There are two points of IP due diligence in an Apache project: At the
 point of contribution where the IP is validated by the committer and zero
 or more people who review the patch. The second phase of IP validation is
 at the point of release, where 3 or more PMC members validate that the
 foundation can legally release the code.

 This means that taking a snapshot and building a release is *not*
 trademark-acceptable since the foundation, through the project PMC has not
 approved the release, therefore it is not an Apache release.

 Only the ASF gets to say what is an ASF release and to do so requires a
 vote of the PMC. It has nothing to do with the number of changes made to
 what is in our repositories. It has everything to do with whether it's a
 release of the foundation.

 So, in the strictest sense, distributions that make minor changes for
 their distribution should call it Bar powered by Apache Foo in order to
 differentiate it from an official release of the foundation. In the real
 world the question is, from a legal point of view, do we care?

 (let's ignore the fact that some people vote on releases without doing
 proper validation, that's why we require 3 +1 votes, the assumption is that
 at least one of them did the job properly)

 From: William A Rowe Jr wr...@rowe-clan.net
 Sent: 8/20/2015 7:17 PM
 To: general@incubator.apache.org
 Subject: Re: What is the legal basis for enforcing release policies at ASF?

 On Thu, Aug 20, 2015 at 9:03 PM, Benson Margulies bimargul...@gmail.com
 wrote:

  This thread started as a discussion of Linux distros and trademarks.
  Perhaps I could try to return it there?
 
  If a distro takes a release of Apache X, compiles it with minimal changes
  that adapt it to the environment, and distributes it, I believe that it's a
  fine thing for them to call it simply Apache X, and acknowledge our marks.
 
  If a distro takes a release of Apache X, and makes significant changes to
  it, and then distributes it, I believe that it's not OK with us for them to
  simply call it Apache X. I've seen some evidence that Gentoo Linux makes a
  regular habit of this, because their policies drive them to make some
  pretty scary changes in some cases. Others may not share my view.
 
  Further, if someone takes a snapshot (small 's') from source control and
  starts from that, with minimal changes, I think that this would also be
  trademark-acceptable, so long as they accurately describe what they did.
 
  The operative concept here, as Shane has taught it, is 'confusion in the
  marketplace.' If some third party behaves so as to cause confusion as to
  the identity of Apache X, there's a trademark issue. If not, not.
 

 You summed this up to the best of my understanding ... +1.  If our legal VP
 agrees (and retracts earlier FUD) it appears we are entirely in agreement.




-- 
Niclas Hedhman, Software Developer
http://zest.apache.org - New Energy for Java


Re: What is the legal basis for enforcing release policies at ASF?

2015-08-20 Thread William A Rowe Jr
On Thu, Aug 20, 2015 at 9:03 PM, Benson Margulies bimargul...@gmail.com
wrote:

 This thread started as a discussion of Linux distros and trademarks.
 Perhaps I could try to return it there?

 If a distro takes a release of Apache X, compiles it with minimal changes
 that adapt it to the environment, and distributes it, I believe that it's a
 fine thing for them to call it simply Apache X, and acknowledge our marks.

 If a distro takes a release of Apache X, and makes significant changes to
 it, and then distributes it, I believe that it's not OK with us for them to
 simply call it Apache X. I've seen some evidence that Gentoo Linux makes a
 regular habit of this, because their policies drive them to make some
 pretty scary changes in some cases. Others may not share my view.

 Further, if someone takes a snapshot (small 's') from source control and
 starts from that, with minimal changes, I think that this would also be
 trademark-acceptable, so long as they accurately describe what they did.

 The operative concept here, as Shane has taught it, is 'confusion in the
 marketplace.' If some third party behaves so as to cause confusion as to
 the identity of Apache X, there's a trademark issue. If not, not.


You summed this up to the best of my understanding ... +1.  If our legal VP
agrees (and retracts earlier FUD) it appears we are entirely in agreement.


Re: What is the legal basis for enforcing release policies at ASF?

2015-08-20 Thread William A Rowe Jr
On Thu, Aug 20, 2015 at 9:11 PM, Christopher ctubb...@apache.org wrote:

 It sounds to me like you're saying that the license under which code is
 offered (to anybody who encounters it) is independent of the license
 declaration attached to the project.


No, the license is that which was granted by the author, and I think you
missed my followup by a few minutes, so I will quote myself here...

Your comment also homes in on the logical fallacy our VP fell into...
While it may be true that the ASF granted its own AL 2.0 license to the
release package, the ASF is unable to change component licenses in
incompatible ways.  And the warranty the ASF offers on an inaccurate
license claim is nil; cf. AL 2.0.

However, if our repositories are under another license, that VP needs to
make public this information, because I never got the memo, and I must
notify friends and the many companies I advise and consult to that they all
need to cease looking at the ASF's repositories, and let their respective
legal departments each sort this all out, if those repositories are
licensed with terms and conditions differing from the AL 2.0.
Obviously, I think our VP Legal isn't taking his job seriously of advising
the community on the specific legal particularities of the software we
create, which is why I'm going to stand pat until someone offers up a
compelling argument over why anyone is not able to take any of the AL 2.0
code out of ASF repositories, released or not, and re-purpose it for
whatever they desire.

But don't name it by Apache {foo} unless {foo} PMC sanctioned the release
of the code.  It's entirely in trademark law, and our license and copyright
law gives them everything they need to utilize the code, released or not.


Re: What is the legal basis for enforcing release policies at ASF?

2015-08-20 Thread Christopher
It sounds to me like you're saying that the license under which code is
offered (to anybody who encounters it) is independent of the license
declaration attached to the project.

This makes sense to me, presuming that we still agree that the license
declaration (header or license file) is the best way to communicate the
license under which the code is offered.

It seems to follow, then, that we're saying that there are sometimes errors
in the declaration, where it doesn't reflect what license the code is
actually offered under (if any). Further, we're saying that this is
hopefully less likely in a release, which has been vetted with greater
scrutiny.

Is that right?

If so, then it seems to me that the question really becomes: is it
sufficiently communicated by the very fact of being a snapshot (any state
of the code other than in a release), that errors are possible in the
license? I would think the answer is yes, personally. However, I'm not sure
it really means much, because it's still reasonable for people to assume
the license declaration is correct, until shown otherwise.

It seems to me that the very fact that any license declaration is attached
to the code at all, regardless of its state as a release or snapshot,
shifts the burden of responsibility to actually demonstrate that the
license does not apply. This is the reverse of the case when no obvious
license declaration is made. The burden in that case is to show that the
license does apply. Isn't that why we explicitly put headers on each file,
in addition to the LICENSE file? To explicitly shift this burden to us in
order to encourage free use of our software by others?

On Thu, Aug 20, 2015, 21:19 William A Rowe Jr wr...@rowe-clan.net wrote:

 On Aug 20, 2015 7:39 PM, Alex Harui aha...@adobe.com wrote:
 
 
 
  On 8/20/15, 5:27 PM, William A Rowe Jr wr...@rowe-clan.net wrote:
 
  It is generally AL code all the time.  I don't know where you invented a
  'kick-in' concept, but unless the committers are violating their
  ICLA/CCLA, nothing could be further from the truth.
 
  Committers sometimes make mistakes.  IIRC, Justin recently caught a
  mistake where some files accidentally got their non-AL headers replaced
  with AL headers.
 
  Large codebase contributions, especially initial podling code grants
  might be messy as well until scrubbed and approved for an official ASF
  release.  I know from experience.

 We don't disagree on this point.  Sometimes, they are caught through the
 release process, or by peer review.  Other times, we must retract the claim
 we offered.

 Nothing changes the fact that code is either offered under the AL 2.0 or
 another license, unless the author/licensor changes their license
 retroactively.



Re: What is the legal basis for enforcing release policies at ASF?

2015-08-20 Thread sebb
AFAIK a SNAPSHOT has not been voted on and is therefore not a formal
ASF release.

So for example this would cover CI builds that deploy jars to the ASF
Maven SNAPSHOT repo.


On 20 August 2015 at 23:33, Mike Kienenberger mkien...@gmail.com wrote:
 On Thu, Aug 20, 2015 at 6:23 PM, Gavin McDonald ga...@16degrees.com.au 
 wrote:
 So what do we do about all the rc1|rc2|rcx, alphas, betas and Milestone
 ‘releases’ that are on our official mirrors right now?

 (Because they would have been voted on as a ‘release’ for the projects to
 put them there in the first place)

 Release means different things in different contexts.  An ASF
 release is a product that a PMC has vetted to meet ASF release
 standards (builds from source, APL2 licensed) and has made available
 to end-users in our download services.  This use of release deals
 with legal and can-be-modified promises made to the end-user.

 Various ASF projects also use release to mean something different --
 a community-approved product that has a certain API and typically has
 no known issues.  This use of release generally deals with technical
 aspects of the project, such as stability and reliability.

 -
 To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
 For additional commands, e-mail: general-h...@incubator.apache.org





Re: What is the legal basis for enforcing release policies at ASF?

2015-08-20 Thread William A Rowe Jr
On Aug 20, 2015 7:39 PM, Alex Harui aha...@adobe.com wrote:



 On 8/20/15, 5:27 PM, William A Rowe Jr wr...@rowe-clan.net wrote:

 It is generally AL code all the time.  I don't know where you invented a
 'kick-in' concept, but unless the committers are violating their
 ICLA/CCLA, nothing could be further from the truth.

 Committers sometimes make mistakes.  IIRC, Justin recently caught a
 mistake where some files accidentally got their non-AL headers replaced
 with AL headers.

 Large codebase contributions, especially initial podling code grants might
 be messy as well until scrubbed and approved for an official ASF release.
 I know from experience.

We don't disagree on this point.  Sometimes, they are caught through the
release process, or by peer review.  Other times, we must retract the claim
we offered.

Nothing changes the fact that code is either offered under the AL 2.0 or
another license, unless the author/licensor changes their license
retroactively.


RE: What is the legal basis for enforcing release policies at ASF?

2015-08-20 Thread Ross Gardler
I do not agree with this interpretation when viewed from a legal angle (though 
I do agree from a trademark angle). I have a feeling that the root of my 
disagreement is the same as the root of Jim's earlier statement (though I may 
be mistaken).

There are two points of IP due diligence in an Apache project: At the point of 
contribution where the IP is validated by the committer and zero or more people 
who review the patch. The second phase of IP validation is at the point of 
release, where 3 or more PMC members validate that the foundation can legally 
release the code.

This means that taking a snapshot and building a release is *not* 
trademark-acceptable since the foundation, through the project PMC has not 
approved the release, therefore it is not an Apache release.

Only the ASF gets to say what is an ASF release and to do so requires a vote of 
the PMC. It has nothing to do with the number of changes made to what is in our 
repositories. It has everything to do with whether it's a release of the 
foundation.

So, in the strictest sense, distributions that make minor changes for their 
distribution should call it Bar powered by Apache Foo in order to differentiate 
it from an official release of the foundation. In the real world the question 
is, from a legal point of view, do we care?

(Let's ignore the fact that some people vote on releases without doing proper 
validation; that's why we require 3 +1 votes, on the assumption that at least 
one of them did the job properly.)

Sent from my Windows Phone

From: William A Rowe Jr wr...@rowe-clan.net
Sent: 8/20/2015 7:17 PM
To: general@incubator.apache.org
Subject: Re: What is the legal basis for enforcing release policies at ASF?

On Thu, Aug 20, 2015 at 9:03 PM, Benson Margulies bimargul...@gmail.com
wrote:

 This thread started as a discussion of Linux distros and trademarks.
 Perhaps I could try to return it there?

 If a distro takes a release of Apache X, compiles it with minimal changes
 that adapt it to the environment, and distributes it, I believe that it's a
 fine thing for them to call it simply Apache X, and acknowledge our marks.

 If a distro takes a release of Apache X, and makes significant changes to
 it, and then distributes it, I believe that it's not OK with us for them to
 simply call it Apache X. I've seen some evidence that Gentoo Linux makes a
 regular habit of this, because their policies drive them to make some
 pretty scary changes in some cases. Others may not share my view.

 Further, if someone takes a snapshot (small 's') from source control and
 starts from that, with minimal changes, I think that this would also be
 trademark-acceptable, so long as they accurately describe what they did.

 The operative concept here, as Shane has taught it, is 'confusion in the
 marketplace.' If some third party behaves so as to cause confusion as to
 the identity of Apache X, there's a trademark issue. If not, not.


You summed this up to the best of my understanding ... +1.  If our legal VP
agrees (and retracts earlier FUD) it appears we are entirely in agreement.


Re: apache binary distributions

2015-08-20 Thread Niclas Hedhman
On Thu, Aug 20, 2015 at 1:06 AM, William A Rowe Jr wr...@rowe-clan.net
wrote:


 There are some special things here we do have absolute control over. If a
 project wants to provide the 'official' build, why not start signing the
 .jar?


Good idea, but to be practical for users, the signing certificate needs to
chain to a CA already trusted by the JVM (otherwise the certificates would
need to be installed on every host). I don't know how willing infra
would be to support PKI at ASF for this; otherwise many projects will be
limited by cost (I could be out of date, and there may now be totally
free CAs).

Cheers
-- 
Niclas Hedhman, Software Developer
http://zest.apache.org - New Energy for Java


Re: What is the legal basis for enforcing release policies at ASF?

2015-08-20 Thread Benson Margulies
On Thu, Aug 20, 2015 at 9:52 AM, Jim Jagielski j...@jagunet.com wrote:
 Coming in late.

 A snapshot is not a release. Licenses kick in at distribution/
 release.

Are you sure? When you have a public source control repo, with a
LICENSE file at the top, I would think that this counts as a legal
'publication' under the terms of the license.

If not, just what is the legal status of source code snipped from our
repositories?



 There is also a trademark issue as well... only the ASF
 can declare something as a release.

 On Aug 6, 2015, at 8:50 PM, Roman Shaposhnik ro...@shaposhnik.org wrote:

 Hi!

 while answering a question on release policies and ALv2
 I've suddenly realized that I really don't know what is the
 legal basis for enforcing release policies we've got
 documented over here:
   http://www.apache.org/dev/release.html

 For example, what would be the legal basis for stopping
 a 3rd party from releasing a snapshot of ASF's project
 source tree and claiming it to be a release X.Y.Z of said
 project?

 Thanks,
 Roman.







Re: What is the legal basis for enforcing release policies at ASF?

2015-08-20 Thread Jim Jagielski
Coming in late.

A snapshot is not a release. Licenses kick in at distribution/
release.

There is also a trademark issue as well... only the ASF
can declare something as a release.

 On Aug 6, 2015, at 8:50 PM, Roman Shaposhnik ro...@shaposhnik.org wrote:
 
 Hi!
 
 while answering a question on release policies and ALv2
 I've suddenly realized that I really don't know what is the
 legal basis for enforcing release policies we've got
 documented over here:
   http://www.apache.org/dev/release.html
 
 For example, what would be the legal basis for stopping
 a 3rd party from releasing a snapshot of ASF's project
 source tree and claiming it to be a release X.Y.Z of said
 project?
 
 Thanks,
 Roman.
 
 





Re: [DISCUSS] HAWQ Incubation Proposal

2015-08-20 Thread Ted Dunning
Drill is implemented entirely in Java.

This isn't core to the proposal, but it would be better corrected.



On Thu, Aug 20, 2015 at 8:33 PM, 김영우 (Youngwoo Kim) warwit...@gmail.com
wrote:

 Hi Roman,

 Great news!

 BTW, the URL given for the proposal might be invalid. Should it be
 https://wiki.apache.org/incubator/HAWQProposal ?

 Thanks,
 Youngwoo

 On Fri, Aug 21, 2015 at 12:14 PM, Roman Shaposhnik r...@apache.org wrote:

  Hi!
 
  I would like to start a discussion on accepting HAWQ
  into ASF Incubator. The proposal is available at:
  https://wiki.apache.org/incubator/ApexProposal
  and is also attached to the end of this email.
 
  Please note, that this proposal is very complementary
  to the desire of HAWQ's sister project (MADlib) to
  join ASF Incubator:
  http://madlib.net/pipermail/user/2015-August/
  http://madlib.net/pipermail/devel/2015-August/
  I've volunteered to help MADlib community and we're
  currently working on a separate proposal to be submitted
  later next week. If you're interested in monitoring progress
  of that please see updates to:
   https://github.com/madlib/madlib/wiki/MADlib-ASF-Incubator-Proposal
  and later:
   https://wiki.apache.org/incubator/MADlibProposal
 
  Thanks in advance for your time and help.
 
  Thanks,
  Roman.
 
  == Abstract ==
 
  HAWQ is an advanced enterprise SQL on Hadoop analytic engine built
  around a robust and high-performance massively-parallel processing
  (MPP) SQL framework evolved from Pivotal Greenplum DatabaseⓇ.
 
  HAWQ runs natively on Apache HadoopⓇ clusters by tightly integrating
  with HDFS and YARN. HAWQ supports multiple Hadoop file formats such as
  Apache Parquet, native HDFS, and Apache Avro. HAWQ is configured and
  managed as a Hadoop service in Apache Ambari. HAWQ is 100% ANSI SQL
  compliant (supporting ANSI SQL-92, SQL-99, and SQL-2003, plus OLAP
  extensions) and supports open database connectivity (ODBC) and Java
  database connectivity (JDBC), as well. Most business intelligence,
  data analysis and data visualization tools work with HAWQ out of the
  box without the need for specialized drivers.
 
  A unique aspect of HAWQ is its integration of statistical and machine
  learning capabilities that can be natively invoked from SQL or (in the
  context of PL/Python, PL/Java or PL/R) in massively parallel modes and
  applied to large data sets across a Hadoop cluster. These capabilities
  are provided through MADlib – an existing open source, parallel
  machine-learning library. Given the close ties between the two
  development communities, the MADlib community has expressed interest
  in joining HAWQ on its journey into the ASF Incubator and will be
  submitting a separate, concurrent proposal.
 
  HAWQ will provide more robust and higher performing options for Hadoop
  environments that demand best-in-class data analytics for business
  critical purposes. HAWQ is implemented in C and C++.
 
  == Proposal ==
  The goal of this proposal is to bring the core of Pivotal Software,
  Inc.’s (Pivotal) Pivotal HAWQⓇ codebase into the Apache Software
  Foundation (ASF) in order to build a vibrant, diverse and
  self-governed open source community around the technology. Pivotal has
  agreed to transfer the brand name HAWQ to Apache Software Foundation
  and will stop using HAWQ to refer to this software if the project gets
  accepted into the ASF Incubator under the name of Apache HAWQ
  (incubating). Pivotal will continue to market and sell an analytic
  engine product that includes Apache HAWQ (incubating). While HAWQ is
  our primary choice for a name of the project, in anticipation of any
  potential issues with PODLINGNAMESEARCH we have come up with two
  alternative names: (1) Hornet; or (2) Grove.
 
  Pivotal is submitting this proposal to donate the HAWQ source code and
  associated artifacts (documentation, web site content, wiki, etc.) to
  the Apache Software Foundation Incubator under the Apache License,
  Version 2.0 and is asking Incubator PMC to establish an open source
  community.
 
  == Background ==
  While the ecosystem of open source SQL-on-Hadoop solutions is fairly
  developed by now, HAWQ has several unique features that will set it
  apart from existing ASF and non-ASF projects. HAWQ made its debut in
  2013 as a closed source product leveraging a decade's worth of product
  development effort invested in Greenplum DatabaseⓇ. Since then HAWQ
  has rapidly gained a solid customer base and become available on
  non-Pivotal distributions of Hadoop.
  In 2015 HAWQ still leverages the rock solid foundation of Greenplum
  Database, while at the same time embracing elasticity and resource
  management native to Hadoop applications. This allows HAWQ to provide
  superior SQL on Hadoop performance, scalability and coverage while
  also providing massively-parallel machine learning capabilities and
  support for native Hadoop file formats. In addition, HAWQ's advanced
  features include support for 

Re: [DISCUSS] Horn Incubation Proposal

2015-08-20 Thread ooibc


Hi,

I am an initial committer of Apache SINGA (incubating)
(http://singa.incubator.apache.org/)


Both SINGA and the proposal follow the general parameter-server
architecture: workers for computing gradients; servers for parameter
updating.

SINGA has implemented the model and data parallelism discussed in the
Horn proposal:
multiple worker groups for asynchronous training---data parallelism; and
multiple workers in one group for synchronous training---model parallelism.
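
To make the worker/server split above concrete, here is a minimal single-process sketch. It is neither SINGA nor Horn code: the names `ParameterServer`, `worker_gradient` and `train`, and the toy one-weight model y = w*x, are assumptions made purely for illustration, and the data shards take turns here rather than running concurrently as real worker groups would.

```python
class ParameterServer:
    """Holds the shared weights and applies gradient updates (hypothetical name)."""

    def __init__(self, weights, learning_rate):
        self.weights = list(weights)
        self.learning_rate = learning_rate

    def pull(self):
        # Workers fetch a copy of the current weights.
        return list(self.weights)

    def push(self, gradients):
        # Servers are responsible for the parameter update.
        self.weights = [
            w - self.learning_rate * g
            for w, g in zip(self.weights, gradients)
        ]


def worker_gradient(weights, shard):
    """Worker side: gradient of mean squared error for y = w * x on one data shard."""
    w = weights[0]
    grad = sum(2 * (w * x - y) * x for x, y in shard) / len(shard)
    return [grad]


def train(shards, steps=200, learning_rate=0.05):
    """Data parallelism in miniature: each shard stands in for one worker group."""
    server = ParameterServer([0.0], learning_rate)
    for _ in range(steps):
        for shard in shards:
            gradients = worker_gradient(server.pull(), shard)
            server.push(gradients)
    return server.weights


if __name__ == "__main__":
    # Two shards sampled from y = 3x; the learned weight approaches 3.
    data_shards = [[(1, 3), (2, 6)], [(3, 9), (4, 12)]]
    print(train(data_shards))
```

In a real deployment the pull and push calls would be network messages, and several groups would push updates without waiting for one another.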


One feature of SINGA's architecture is that it can be extended to
organize the servers in a hierarchical topology, which may help to reduce
the communication bottleneck of servers organized in a flat topology.

For the programming model, Horn currently proposes to support
feed-forward models, e.g., MLP and auto-encoder, while SINGA supports all
three categories of known models: feed-forward models (e.g., MLP, CNN),
energy models (e.g., RBM, DBM), and recurrent models (e.g., RNN).
SINGA provides good support for users to code, e.g., to implement new
parameter-updating protocols or layers, and is being integrated with HDFS
as well.

We will submit the first release and full documentation to the mentors
this weekend, and if that is OK, we will announce the first full release
soon.  The GPU version is scheduled for release in October.

Technical papers:
  http://www.comp.nus.edu.sg/~ooibc/singa-mm15.pdf
  http://www.comp.nus.edu.sg/~ooibc/singaopen-mm15.pdf

and project website (which has more details than the Apache web site):
  http://www.comp.nus.edu.sg/~dbsystem/singa/


There is plenty of room for collaboration indeed...

regards
beng chin
www.comp.nus.edu.sg/~ooibc



On 2015-08-21 08:27, Edward J. Yoon wrote:

Hi all,

We'd like to propose Horn (혼), a fully distributed system for
large-scale deep learning as an Apache Incubator project and start the
discussion. The complete proposal can be found at:
https://wiki.apache.org/incubator/HornProposal

Any advice and help are welcome! Thanks, Edward.

= Horn Proposal =

== Abstract ==

Horn (tentative name; [hɔ:n], the Korean meaning of Horn is "spirit")
is a neuron-centric programming API and execution framework
for large-scale deep learning, built on top of Apache Hama.

== Proposal ==

The goal of Horn is to provide a neuron-centric programming API
that lets users easily define the characteristics of an artificial
neural network model and its structure, together with an execution
framework that leverages the heterogeneous resources of Hama and Hadoop
YARN clusters.

== Background ==

The initial ANN code was developed in the Apache Hama project by a
committer, Yexi Jiang (Facebook), in 2013. The motivation behind this
work is to build a framework that provides more intuitive programming
APIs, like Google's MapReduce or Pregel, and supports applications
that need large models with huge memory consumption in a distributed way.

== Rationale ==

While many open source deep learning packages such as Caffe,
DeepDist, and NeuralGiraph are still data or model parallel only, we
aim to support both data and model parallelism along with a fault-tolerant
system design. The basic idea of data and model parallelism is to use
a remote parameter server to parallelize model creation and
distribute training across machines, and the BSP framework of Apache
Hama to perform asynchronous mini-batches. Within a single BSP job,
each task group works asynchronously using region barrier
synchronization instead of global barrier synchronization, and trains
a large-scale neural network model on its assigned data sets in the BSP
paradigm. Thus, we achieve data and model parallelism. This
architecture is inspired by Google's !DistBelief (Jeff Dean et al,
2012).

== Initial Goals ==

Some current goals include:
 * build a new community
 * provide more intuitive programming APIs
 * support both data and model parallelism
 * run natively on both Hama and Hadoop2
 * support GPUs and InfiniBand (FPGAs if possible)

== Current Status ==

=== Meritocracy ===

The core developers understand what it means to have a process based
on meritocracy. We will provide continuous efforts to build an
environment that supports this, encouraging community members to
contribute.

=== Community ===

A small community has formed within the Apache Hama project and at some
companies, such as an instant messenger service company and a mobile
manufacturing company. Many people are also interested in the
large-scale deep learning platform itself. By bringing Horn into
Apache, we believe that the community will grow even bigger.

=== Core Developers ===

Edward J. Yoon, Thomas Jungblut, and Dongjin Lee

== Known Risks ==

=== Orphaned Products ===

Apache Hama is already a core open source component at Samsung
Electronics, and Horn also will be used by Samsung Electronics, and so
there is no direct risk for this project to be orphaned.

=== Inexperience with Open Source ===

Some are very new and the others have experience using and/or working
on 

Re: [DISCUSS] Horn Incubation Proposal

2015-08-20 Thread Edward J. Yoon
 multiple worker groups for asynchronous training---data parallelism; and
 multiple workers in one group for synchronous training---model parallelism.

So, it's basically the execution of multiple asynchronous BSP (Bulk
Synchronous Parallel) jobs. This can be handled simply within a
single BSP job using region barriers, as mentioned in the proposal.
Moreover, since Apache Hama is a general-purpose BSP framework on top
of HDFS, it inherently provides data partitioning, locality optimization,
job/task scheduling, messaging and fault tolerance in a scalable way.
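
As a rough illustration of region barriers versus a single global barrier, the sketch below gives each task group its own barrier, so workers synchronize only with peers in the same group while groups progress independently. It uses Python's standard `threading.Barrier`, not the Hama API; `run_groups` and its arguments are hypothetical names chosen for this example.

```python
import threading


def run_groups(groups, supersteps):
    """Run every group for `supersteps` rounds.

    groups: dict mapping a group name to its number of workers.
    Returns a dict mapping (group, worker_id) to supersteps completed.
    """
    completed = {}
    lock = threading.Lock()
    threads = []

    for name, size in groups.items():
        # Region barrier: scoped to one group, not to the whole job.
        barrier = threading.Barrier(size)
        for worker_id in range(size):
            def task(name=name, worker_id=worker_id, barrier=barrier):
                for step in range(supersteps):
                    # Local computation for this superstep would go here.
                    barrier.wait()  # wait only for peers in the same group
                    with lock:
                        completed[(name, worker_id)] = step + 1

            threads.append(threading.Thread(target=task))

    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return completed


if __name__ == "__main__":
    print(run_groups({"group-a": 2, "group-b": 3}, supersteps=4))
```

With one global barrier, the slowest worker anywhere would stall every group at each superstep; with per-group barriers, only its own group waits.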

 For the programming model, currently Horn proposes to support feed-forward
 There are plenty of rooms for collaborations indeed...

Yeah, but it can still be improved. Maybe in the future we can discuss
simplified programming APIs and other topics, e.g., GPU support,
together.

On Fri, Aug 21, 2015 at 1:13 PM, ooibc oo...@comp.nus.edu.sg wrote:

 Hi,

 I am an initial committer of Apache SINGA (incubating)
 (http://singa.incubator.apache.org/)

 Both SINGA and the proposal follow the general parameter-server
 architecture:
 workers for computing gradients; servers for parameter updating.

 SINGA has implemented the model and data parallelism discussed in the Horn
 proposal:
 multiple worker groups for asynchronous training---data parallelism; and
 multiple workers in one group for synchronous training---model parallelism.

 One feature of SINGA's architecture is that it can be extended to organize
 the
 servers in a hierarchical topology, which may help to reduce the
 communication bottleneck
 of servers organized in a flat topology.

 For the programming model, currently Horn proposes to support feed-forward
 models,
 e.g., MLP, auto-encoder, while SINGA supports all three categories of the
 known models,
 feed-forward models (eg MLP, CNN), energy models (eg RBM, DBM),
 and recurrent models (eg. RNN).
 SINGA provides good support for users to code, e.g., implement new parameter
 updating
 protocols or layers, and is being integrated with HDFS as well.

 We will submit the first release and full documentation to the mentors this
 weekend, and if
 ok, we will announce the first full release soon.  The GPU version is
 scheduled for
 October release.

 Technical papers:
   http://www.comp.nus.edu.sg/~ooibc/singa-mm15.pdf
   http://www.comp.nus.edu.sg/~ooibc/singaopen-mm15.pdf

 and project website (which has more details than the Apache web site):
   http://www.comp.nus.edu.sg/~dbsystem/singa/


 There is plenty of room for collaboration indeed...

 regards
 beng chin
 www.comp.nus.edu.sg/~ooibc




 On 2015-08-21 08:27, Edward J. Yoon wrote:

 Hi all,

 We'd like to propose Horn (혼), a fully distributed system for
 large-scale deep learning as an Apache Incubator project and start the
 discussion. The complete proposal can be found at:
 https://wiki.apache.org/incubator/HornProposal

 Any advice and help are welcome! Thanks, Edward.

 = Horn Proposal =

 == Abstract ==

 Horn (tentative name; [hɔ:n], the Korean meaning of Horn is "spirit")
 is a neuron-centric programming API and execution framework
 for large-scale deep learning, built on top of Apache Hama.

 == Proposal ==

 The goal of Horn is to provide a neuron-centric programming API
 that lets users easily define the characteristics of an artificial
 neural network model and its structure, together with an execution
 framework that leverages the heterogeneous resources of Hama and Hadoop
 YARN clusters.

 == Background ==

 The initial ANN code was developed in the Apache Hama project by a
 committer, Yexi Jiang (Facebook), in 2013. The motivation behind this
 work is to build a framework that provides more intuitive programming
 APIs, like Google's MapReduce or Pregel, and supports applications
 that need large models with huge memory consumption in a distributed way.

 == Rationale ==

 While many open source deep learning packages such as Caffe,
 DeepDist, and NeuralGiraph are still data or model parallel only, we
 aim to support both data and model parallelism along with a fault-tolerant
 system design. The basic idea of data and model parallelism is to use
 a remote parameter server to parallelize model creation and
 distribute training across machines, and the BSP framework of Apache
 Hama to perform asynchronous mini-batches. Within a single BSP job,
 each task group works asynchronously using region barrier
 synchronization instead of global barrier synchronization, and trains
 a large-scale neural network model on its assigned data sets in the BSP
 paradigm. Thus, we achieve data and model parallelism. This
 architecture is inspired by Google's !DistBelief (Jeff Dean et al,
 2012).

 == Initial Goals ==

 Some current goals include:
  * build a new community
  * provide more intuitive programming APIs
  * support both data and model parallelism
  * run natively on both Hama and Hadoop2
  * support GPUs and InfiniBand (FPGAs if possible)

 == Current Status ==

 === 

Re: apache binary distributions

2015-08-20 Thread William A Rowe Jr
On Thu, Aug 20, 2015 at 8:09 AM, Niclas Hedhman nic...@hedhman.org wrote:

 On Thu, Aug 20, 2015 at 1:06 AM, William A Rowe Jr wr...@rowe-clan.net
 wrote:

  There are some special things here we do have absolute control over. If a
  project wants to provide the 'official' build, why not start signing
 the .jar?

 Good idea, but to be practical to users, the certificate for the signing
 needs to be part of the certificate chain of the JVM (otherwise those would
 be needed to be installed on every host). I don't know how willing infra
 would be to support PKI at ASF for this, otherwise many projects will be
 limited due to cost (I could be wrong by now and that there are totally
 free CAs)


That infrastructure now exists through the code signing service by Symantec.
One PMC member (or more) gets their own unique login, pushes the artifact
(.jar, in this example) to the service and is returned a signed artifact
reflecting the ASF provenance.

The interesting thing is the actual cert is unique to the object, so if it
is discovered that it was compromised, the signature can be revoked (good
luck having sig revocations active at boot time, but otherwise this is
quite useful.) And because there is a history, we know precisely who
requested each object signing.


Re: What is the legal basis for enforcing release policies at ASF?

2015-08-20 Thread Marvin Humphrey
On Thu, Aug 20, 2015 at 7:23 AM, Benson Margulies bimargul...@gmail.com wrote:
 On Thu, Aug 20, 2015 at 9:52 AM, Jim Jagielski j...@jagunet.com wrote:
 Coming in late.

 A snapshot is not a release. Licenses kick in at distribution/
 release.

 Are you sure? When you have a public source control repo, with a
 LICENSE file at the top, I would think that this counts as a legal
 'publication' under the terms of the license.

 if not, just what is the legal status of source code snipped from our
 repositories?

I agree with Jim that a snapshot is not a release.  I also agree with him
that licenses kick in at distribution.  As to whether they kick in at
distribution/release, I think that's a weird bit of wording, and I would be
surprised if we are not all in agreement here.

There were long threads on this topic back in 2007-2009 on
legal-discuss@apache.

http://markmail.org/message/jangmpbssvvd73az
http://s.apache.org/6Wm

http://markmail.org/message/xietapwmthvvknex
http://s.apache.org/H6o

Here are a couple of germane points from Roy:

http://markmail.org/message/vbfjep4r2npkwufa
http://s.apache.org/aXK

Copyright law has no concept of software development. So, when a
lawyer looks at

http://svn.apache.org/repos/asf/httpd/httpd/trunk/

what the lawyer (or even layperson) sees is a website.

http://markmail.org/message/44ezdre3se3ov5nu
http://s.apache.org/MEC

 SVN is not a distribution point.

Of course it is a distribution point. Distribution == copy to someone
else. It isn't a release (an editorial decision by the ASF).

Marvin Humphrey

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: What is the legal basis for enforcing release policies at ASF?

2015-08-20 Thread Gavin McDonald

 On 20 Aug 2015, at 2:52 pm, Jim Jagielski j...@jagunet.com wrote:
 
 Coming in late.
 
 A snapshot is not a release. Licenses kick in at distribution/
 release.
 

Interesting.

So what do we do about all the rc1|rc2|rcx, alphas, betas and Milestone
‘releases’ that are on our official mirrors right now?

(Because they would have been voted on as a ‘release’ for the projects to
put them there in the first place)

(Or are all those different to Snapshots somehow?)

Gav…

 There is also a trademark issue as well... only the ASF
 can declare something as a release.
 
 On Aug 6, 2015, at 8:50 PM, Roman Shaposhnik ro...@shaposhnik.org wrote:
 
 Hi!
 
 while answering a question on release policies and ALv2
 I've suddenly realized that I really don't know what is the
 legal basis for enforcing release policies we've got
 documented over here:
  http://www.apache.org/dev/release.html
 
 For example, what would be the legal basis for stopping
 a 3d party from releasing a snapshot of ASF's project
 source tree and claim it to be a release X.Y.Z of said
 project?
 
 Thanks,
 Roman.
 
 

Gav...


Re: What is the legal basis for enforcing release policies at ASF?

2015-08-20 Thread Mike Kienenberger
On Thu, Aug 20, 2015 at 6:23 PM, Gavin McDonald ga...@16degrees.com.au wrote:
 So what do we do about all the rc1|rc2|rcx, alphas, betas and Milestone
 ‘releases’ that are on our official mirrors right now?

 (Because they would have been voted on as a ‘release’ for the projects to
 put them there in the first place)

"Release" means different things in different contexts.  An ASF
"release" is a product that a PMC has vetted to meet ASF release
standards (builds from source, ALv2-licensed) and has made available
to end-users in our download services.  This use of "release" deals
with the legal and can-be-modified promises made to the end-user.

Various ASF projects also use "release" to mean something different --
a community-approved product that has a certain API and typically has
no known issues.  This use of "release" generally deals with technical
aspects of the project, such as stability and reliability.
