Re: libhdfs3 development is still going on outside of ASF

2016-09-23 Thread Roman Shaposhnik
On Wed, Sep 21, 2016 at 11:09 AM, Kyle Dunn  wrote:
>> Personally, I suggest that whoever was managing the library in a separate
>> repo on GH (and it sounds like Zhanwei Wang is that person) pings the
>> submitters of all the outstanding PRs (via comment on the PR) telling them
>> to resubmit via HAWQ's JIRA.
>
> I 100% disagree with this. As a gesture of appreciation - we should be
> migrating things for the users, precisely for what you pointed out about
> things being poorly managed up to this point. We should be striving to
> avoid a fork + isolated development at all costs.

Hey! Don't read too much into it -- that was just my suggestion. I think what
you've done so far is super amazing and helpful to the original contributors.
So thank you!

Roman.


Re: libhdfs3 development is still going on outside of ASF

2016-09-21 Thread Roman Shaposhnik
On Wed, Sep 21, 2016 at 10:08 AM, Kyle Dunn  wrote:
> What have we decided here? Fold libhdfs3 back into HAWQ for the near term
> and revisit spinning it out in a dedicated submodule / repo down the road?

To me this sounds like a near term decision. IOW, the other repo is clearly
read/only at this point.

> Do we need to have a consensus vote for this action?

I don't think we have any contrarian opinion. I appreciated what Matthew Rocklin
had to say, but to me it sounds like his interest will be taken care
of long term.
Matt, please let us know if you disagree.

Now,

> As for the outstanding PRs and issues in the current repo, who will be
> moving those to HAWQ? Are we expecting users to do this?

Well, see -- that's *exactly* the problem with how this project was managed.
At this point we can't even reach the interested parties in a broadcast manner.

Personally, I suggest that whoever was managing the library in a separate
repo on GH (and it sounds like Zhanwei Wang is that person) pings the
submitters of all the outstanding PRs (via comment on the PR) telling them
to resubmit via HAWQ's JIRA.

> I'm taking the
> "liberty" of responding to each of these users explaining the situation but
> would like to know what to report on that front.

That's exactly the right thing to do. So I guess it is you and Zhanwei Wang
on the hook to do that.

Thanks,
Roman.


Re: libhdfs3 development is still going on outside of ASF

2016-09-21 Thread Kyle Dunn
What have we decided here? Fold libhdfs3 back into HAWQ for the near term
and revisit spinning it out in a dedicated submodule / repo down the road?
Do we need to have a consensus vote for this action?

As for the outstanding PRs and issues in the current repo, who will be
moving those to HAWQ? Are we expecting users to do this? I'm taking the
"liberty" of responding to each of these users explaining the situation but
would like to know what to report on that front.


-Kyle

On Tue, Sep 20, 2016 at 3:30 PM Matthew Rocklin 
wrote:

> Hi everyone,
>
> I plan to remove myself from the HAWQ mailing list (it's fairly high volume
> for non-developers).
>
> If further conversation on this issue occurs could I ask for someone to
> update one of the public issues?  Either something on Github or
> https://issues.apache.org/jira/browse/HAWQ-1046 ?
>
> -matt
>
> On Fri, Sep 16, 2016 at 7:14 AM, Matthew Rocklin 
> wrote:
>
> > > Positively ignore the frustration of libhdfs3 users and about to delete
> >> it’s repository.
> >>
> >> I don't think the frustration is related to whether we delete it or not,
> >> I think
> >> the frustration is related to the fact the current model of libhdfs3
> >> living in a
> >> random, separate GH repo:
> >>1. does NOT have a clear governance model: the bigger ASF community
> >> doesn't
> >>really monitor pull request, there's not clear way of filing issues
> >> against it, etc.
> >>
> >>2. does NOT have a clear release policy: last release appears to be
> >> Dec 17, 2015
> >>and even that doesn't clearly indicate what was the release criteria
> >> for it.
> >>
> >>3. does NOT have a clear path of integration with HAWQ.
> >>
> >
> > Speaking just for myself here actually I care much less about the points
> > above than you might think.  I don't need libhdfs3 to be governed by the
> > ASF to find it useful.  I don't mind packaging up and using unreleased
> > versions.  I don't care that it isn't integrated with HAWQ (in fact, I
> > somewhat prefer that it remains separate).  I *do *prefer that it lives
> > in a separate repository on GitHub.
> >
> > I don't mean to be contrarian here, just clarifying where I can clarify.
> > My situation/priorities may not be representative.
> >
>
-- 
*Kyle Dunn | Data Engineering | Pivotal*
Direct: 303.905.3171 | Email: kd...@pivotal.io


Re: libhdfs3 development is still going on outside of ASF

2016-09-20 Thread Matthew Rocklin
Hi everyone,

I plan to remove myself from the HAWQ mailing list (it's fairly high volume
for non-developers).

If further conversation on this issue occurs could I ask for someone to
update one of the public issues?  Either something on Github or
https://issues.apache.org/jira/browse/HAWQ-1046 ?

-matt

On Fri, Sep 16, 2016 at 7:14 AM, Matthew Rocklin 
wrote:

> > Positively ignore the frustration of libhdfs3 users and about to delete
>> it’s repository.
>>
>> I don't think the frustration is related to whether we delete it or not,
>> I think
>> the frustration is related to the fact the current model of libhdfs3
>> living in a
>> random, separate GH repo:
>>1. does NOT have a clear governance model: the bigger ASF community
>> doesn't
>>really monitor pull request, there's not clear way of filing issues
>> against it, etc.
>>
>>2. does NOT have a clear release policy: last release appears to be
>> Dec 17, 2015
>>and even that doesn't clearly indicate what was the release criteria
>> for it.
>>
>>3. does NOT have a clear path of integration with HAWQ.
>>
>
> Speaking just for myself here actually I care much less about the points
> above than you might think.  I don't need libhdfs3 to be governed by the
> ASF to find it useful.  I don't mind packaging up and using unreleased
> versions.  I don't care that it isn't integrated with HAWQ (in fact, I
> somewhat prefer that it remains separate).  I *do *prefer that it lives
> in a separate repository on GitHub.
>
> I don't mean to be contrarian here, just clarifying where I can clarify.
> My situation/priorities may not be representative.
>


Re: libhdfs3 development is still going on outside of ASF

2016-09-16 Thread Matthew Rocklin
>
> > Positively ignore the frustration of libhdfs3 users and about to delete
> it’s repository.
>
> I don't think the frustration is related to whether we delete it or not, I
> think
> the frustration is related to the fact the current model of libhdfs3
> living in a
> random, separate GH repo:
>1. does NOT have a clear governance model: the bigger ASF community
> doesn't
>really monitor pull request, there's not clear way of filing issues
> against it, etc.
>
>2. does NOT have a clear release policy: last release appears to be
> Dec 17, 2015
>and even that doesn't clearly indicate what was the release criteria
> for it.
>
>3. does NOT have a clear path of integration with HAWQ.
>

Speaking just for myself here actually I care much less about the points
above than you might think.  I don't need libhdfs3 to be governed by the
ASF to find it useful.  I don't mind packaging up and using unreleased
versions.  I don't care that it isn't integrated with HAWQ (in fact, I
somewhat prefer that it remains separate).  I *do *prefer that it lives in
a separate repository on GitHub.

I don't mean to be contrarian here, just clarifying where I can clarify.
My situation/priorities may not be representative.


Re: libhdfs3 development is still going on outside of ASF

2016-09-16 Thread Zhanwei Wang
Hi Roman

I have created JIRA HAWQ-1058 to track the release of the libhdfs3 tarball and
HAWQ-1059 to separate it out as a new ASF repository, and set the backlog tag.

I also updated the README of the current libhdfs3 GitHub repository to declare
that it is in read-only mode and to let its users know where it is going.
Hopefully they will not get lost.

https://github.com/Pivotal-Data-Attic/pivotalrd-libhdfs3/commit/59789c90f2f42726900eb2e417eea88a670f0b3c
 




Best Regards

Zhanwei Wang
wan...@apache.org



> On Sep 16, 2016, at 3:40 PM, Zhanwei Wang wrote:
> 
>> Now, that is NOT to say that when you DO release you shouldn't be producing
>> multiple source tarballs. That's more than appropriate and will give your 
>> users
>> maximum benefit on both HAWQ and libhdfs3 side of things.
> 
> 
> Release separated tarball for libhdfs3 would be much better for libhdfs3 
> users to access the code.  
> 
> 
>> you guys can't quite master the release process yet. Splitting the project
>> into multiple repos will only make it worse. Master the release mechanics
>> and then we can talk about multiple repos.
> 
> 
> HAWQ has got enough difficulties for its release. I agree with you that we 
> separate repository for libhdfs3 after HAWQ establish the release process. 
> And when it is done, the result is good to both ASF and libhdfs3’s users.
> 
> 
> Best Regards
> 
> Zhanwei Wang
> wan...@apache.org
> 
> 
> 
>> On Sep 16, 2016, at 1:20 PM, Roman Shaposhnik wrote:
>>
>> On Wed, Sep 14, 2016 at 11:38 PM, Zhanwei Wang wrote:
>>> Hi Roman
>>> 
>>> I think I have discussed enough about the benefit and drawback of merge two 
>>> independent project together.
>>> Let me propose a way to see if it can make both ASF and libhdfs3’s user 
>>> happy. And I need your advise.
>>> 
>>> 
>>> Is it possibile to have two git repository in ASF for HAWQ incubator 
>>> project. If it is possible, I propose to solve the libhdfs3 issue like this.
>>> 
>>> 1) create a new git repository in ASF and push all libhdfs3’s code and 
>>> branch from Github to ASF.
>>> 2) make libhdfs3’s Github repository as read only mirror of ASF repository. 
>>> Maybe need to transfer current owner of Github repository from Pivotal to 
>>> ASF on Github.
>>> 3) HAWQ keep the stable version code of libhdfs3 or just Git reference.
>>> 
>>> 
>>> In this way, we keep libhdfs3 independent and keep its all pull request, 
>>> wiki, issues and history. And most importantly libhdfs3 can follow ASF 
>>> rules and process. People can file pull request on Github and commit to ASF 
>>> repository and eventually mirror to Github.
>>> 
>>> 
>>> Any comments?
>> 
>> It is possible, but at this point I will strongly recommend against
>> it. As it is,
>> you guys can't quite master the release process yet. Splitting the project
>> into multiple repos will only make it worse. Master the release mechanics
>> and then we can talk about multiple repos.
>> 
>> Now, that is NOT to say that when you DO release you shouldn't be producing
>> multiple source tarballs. That's more than appropriate and will give your 
>> users
>> maximum benefit on both HAWQ and libhdfs3 side of things.
>> 
>> Thanks,
>> Roman.
> 
> 



Re: libhdfs3 development is still going on outside of ASF

2016-09-16 Thread Zhanwei Wang
> Now, that is NOT to say that when you DO release you shouldn't be producing
> multiple source tarballs. That's more than appropriate and will give your 
> users
> maximum benefit on both HAWQ and libhdfs3 side of things.


Releasing a separate tarball for libhdfs3 would make it much easier for
libhdfs3 users to access the code.


> you guys can't quite master the release process yet. Splitting the project
> into multiple repos will only make it worse. Master the release mechanics
> and then we can talk about multiple repos.


HAWQ already has enough difficulties with its release. I agree with you that we
should separate out a repository for libhdfs3 only after HAWQ establishes its
release process. Once that is done, the result will be good for both the ASF
and libhdfs3’s users.


Best Regards

Zhanwei Wang
wan...@apache.org



> On Sep 16, 2016, at 1:20 PM, Roman Shaposhnik wrote:
>
> On Wed, Sep 14, 2016 at 11:38 PM, Zhanwei Wang wrote:
>> Hi Roman
>> 
>> I think I have discussed enough about the benefit and drawback of merge two 
>> independent project together.
>> Let me propose a way to see if it can make both ASF and libhdfs3’s user 
>> happy. And I need your advise.
>> 
>> 
>> Is it possibile to have two git repository in ASF for HAWQ incubator 
>> project. If it is possible, I propose to solve the libhdfs3 issue like this.
>> 
>> 1) create a new git repository in ASF and push all libhdfs3’s code and 
>> branch from Github to ASF.
>> 2) make libhdfs3’s Github repository as read only mirror of ASF repository. 
>> Maybe need to transfer current owner of Github repository from Pivotal to 
>> ASF on Github.
>> 3) HAWQ keep the stable version code of libhdfs3 or just Git reference.
>> 
>> 
>> In this way, we keep libhdfs3 independent and keep its all pull request, 
>> wiki, issues and history. And most importantly libhdfs3 can follow ASF rules 
>> and process. People can file pull request on Github and commit to ASF 
>> repository and eventually mirror to Github.
>> 
>> 
>> Any comments?
> 
> It is possible, but at this point I will strongly recommend against
> it. As it is,
> you guys can't quite master the release process yet. Splitting the project
> into multiple repos will only make it worse. Master the release mechanics
> and then we can talk about multiple repos.
> 
> Now, that is NOT to say that when you DO release you shouldn't be producing
> multiple source tarballs. That's more than appropriate and will give your 
> users
> maximum benefit on both HAWQ and libhdfs3 side of things.
> 
> Thanks,
> Roman.



Re: libhdfs3 development is still going on outside of ASF

2016-09-15 Thread Roman Shaposhnik
On Wed, Sep 14, 2016 at 11:38 PM, Zhanwei Wang  wrote:
> Hi Roman
>
> I think I have discussed enough about the benefit and drawback of merge two 
> independent project together.
> Let me propose a way to see if it can make both ASF and libhdfs3’s user 
> happy. And I need your advise.
>
>
> Is it possibile to have two git repository in ASF for HAWQ incubator project. 
> If it is possible, I propose to solve the libhdfs3 issue like this.
>
> 1) create a new git repository in ASF and push all libhdfs3’s code and branch 
> from Github to ASF.
> 2) make libhdfs3’s Github repository as read only mirror of ASF repository. 
> Maybe need to transfer current owner of Github repository from Pivotal to ASF 
> on Github.
> 3) HAWQ keep the stable version code of libhdfs3 or just Git reference.
>
>
> In this way, we keep libhdfs3 independent and keep its all pull request, 
> wiki, issues and history. And most importantly libhdfs3 can follow ASF rules 
> and process. People can file pull request on Github and commit to ASF 
> repository and eventually mirror to Github.
>
>
> Any comments?

It is possible, but at this point I will strongly recommend against
it. As it is,
you guys can't quite master the release process yet. Splitting the project
into multiple repos will only make it worse. Master the release mechanics
and then we can talk about multiple repos.

Now, that is NOT to say that when you DO release you shouldn't be producing
multiple source tarballs. That's more than appropriate and will give your users
maximum benefit on both HAWQ and libhdfs3 side of things.

Thanks,
Roman.


Re: libhdfs3 development is still going on outside of ASF

2016-09-15 Thread Roman Shaposhnik
On Wed, Sep 14, 2016 at 11:19 PM, Zhanwei Wang  wrote:
>> Open source is about community first.
>
> Good point Kyle. I strongly agree with you!
>
> But unfortunately seems no one in this thread care about libhdfs3’s community 
> (users) except me.

Quite the contrary -- I think we all do. In fact, part of the reason for
making sure that it gets maintained as part of Apache HAWQ (incubating) is to
ensure the long-term viability of the project.

> Positively ignore the frustration of libhdfs3 users and about to delete it’s 
> repository.

I don't think the frustration is related to whether we delete it or not. I think
the frustration is related to the fact that the current model of libhdfs3 living
in a random, separate GH repo:
   1. does NOT have a clear governance model: the bigger ASF community doesn't
      really monitor pull requests, there's no clear way of filing issues
      against it, etc.

   2. does NOT have a clear release policy: the last release appears to be
      Dec 17, 2015, and even that doesn't clearly indicate what the release
      criteria for it were.

   3. does NOT have a clear path of integration with HAWQ.

> So let’s set the tone of this thread.
>
>  If we remove libhdfs3’s repository or make it read only:
>   a. What benefit we can get for BOTH HAWQ and libhdfs3’s users?
>   b. What drawback for BOTH HAWQ and libhdfs3’s users?
>
> The following is my answer.
>
> a. Benefit: For HAWQ, seems ASF govern its property with ASF rules.  For 
> libhdfs3’s users, none.

Once again -- I disagree. For libhdfs3 users the benefit is a predictable
process for contributing, the ability to consume releases without fearing
intellectual property problems (who's making sure that the code in that random
GH repo is clean?), and stability.

> b. Drawback: For HAWQ, not relevant commits will come into HAWQ’s commit log.

At this point HAWQ consists of many parts. Not all of them are of interest to
all people (e.g. PXF is quite separate) but all of them are needed to make HAWQ
awesome. IOW, HAWQ developers absolutely should care about libhdfs3 commits.
The whole purpose of libhdfs3 is to be the best darn HDFS interface library
*for* HAWQ, not a generic implementation.

> JIRA and pull request will be fired in HAWQ but not related to HAWQ.  
> Furthermore
> commit in libhdfs3 may break HAWQ and it’s hard to debug, I have experienced 
> it enough.
> It is important to use the stable version of libhdfs3, HAWQ code should only 
> keep the stable
> version of libhdfs3.

Well, suffice it to say that I pretty strongly disagree with all of the above
points.

> For libhdfs3’s user, they have to ask question in HAWQ’s community.

Yes, because that's the most appropriate community today for these questions
to be asked. Remember that the HDFS community is busy building an alternative
to libhdfs3 (or at least polishing the existing one). It's not like HAWQ’s
libhdfs3 is the only choice to interface with HDFS from C and C++.

Also, remember that HAWQ is the driving force behind libhdfs3. Suppose,
for example, that the HAWQ community makes a choice to stick with a particular
version of Hadoop as the default choice for where HAWQ runs the best -- well,
then, libhdfs3 will stick to that choice as well, regardless of whether the
non-HAWQ users feel, for example, that Hadoop 3 is a priority.

> They have to clone entire HAWQ to build libhdfs3 and contribute.

Well, if HAWQ makes a proper ASF release and releases the library separately
they won't have to. In fact, I'd say that downstream consumers should never
clone the repo -- they should *always* use a released version.

> Let’s think about more. How we schedule a release of libhdfs3 when HAWQ is 
> under developing?

Releases in ASF are pretty cheap once you get over the initial hurdles of
IP issues. Which, see my point above, is exactly what worries me about
what we are giving to our users in that random GH repo.

> Should we branch HAWQ for libhdfs3’s release?

No. You just do a release.

> Should we merge libhdfs3’s pull request when we
> are releasing HAWQ?

Yes, absolutely -- all releases are always done on a release branch
and meanwhile pull requests could land in trunk (and then an RM
can decide to pull certain commits into a release branch).

> Do we have to sync the release process of HAWQ and libhdfs3 and how?

Simple: you just do a release.

> Maybe we should better involve libhdfs3’s users into this thread. But 
> unfortunately they are
> not in HAWQ’s mail list.

So where are they? Are they given the tools to collaborate? Do they have a
mailing list? Do they have a website? Do they have an issue tracker?

> In general merge two independent project together introduce more trouble than 
> benefit.

They are NOT independent today.

> To be clear, I’m not against ASF rule. I’m deeply understand the importance 
> of it.
> Is there any way to make HAWQ and libhdfs3 separated and make both ASF and
> libhdfs3’s user happy?

There's no ASF users vs 

Re: libhdfs3 development is still going on outside of ASF

2016-09-15 Thread Gregory Chase
Hi Matthew,
Thank you for chiming in on behalf of other libhdfs3 users.

I wonder whether this component might do best being part of one of the
Apache Hadoop repos given its rather broad appeal.

This might also diversify the contributors, and also the number of users.

-Greg

On Thu, Sep 15, 2016 at 5:16 AM, Matthew Rocklin 
wrote:

> Hi All,
>
> I joined this e-mail list in order to chime in to this discussion.  I'm not
> part of Apache HAWQ but *do* use libhdfs3 and know a number of other people
> who do as well.
>
> I maintain a library for parallel programming, Dask, which is commonly used
> within the PyData software ecosystem.  We often interact with data on HDFS
> and found libhdfs3 to be an excellent solution, particularly because it
> doesn't require JVM interaction, which is rare among our users.  To assist
> Python users we made the wrapper library hdfs3, which has gotten some
> traction both within Dask and outside.
>
> We intentionally released and maintain hdfs3 separately from Dask because
> it's a more general and releasable component.  This turns out to have been
> a good move.  There are lots of people who use hdfs3 who have no interest
> in using Dask at all.  They appreciate this separation because they're not
> forced to grab all of Dask in order to just get the single component they
> want, hdfs3.  These are great users.  They come from a wide range of
> university to small and large businesses.  They contribute back to hdfs3
> readily and are also, today, trying to contribute back to libhdfs3.  By not
> tying hdfs3 into Dask we increased both community engagement and social
> impact.
>
> So my initial bias is "Please, keep libhdfs3 separate.  It will make my
> life (and the lives of many others) much more convenient."  However I also
> recognize the need for Apache's strict-for-a-reason policies.  No matter
> what you all decide the PyData community will find a way to make things
> work.  I just wanted to make it clear that there are several other
> stakeholders out there using this library so that this decision wasn't made
> in a vacuum.
>
> Best,
> -matthew rocklin
>
>
>
>
> On Thu, Sep 15, 2016 at 2:38 AM, Zhanwei Wang  wrote:
>
> > Hi Roman
> >
> > I think I have discussed enough about the benefit and drawback of merge
> > two independent project together.
> > Let me propose a way to see if it can make both ASF and libhdfs3’s user
> > happy. And I need your advise.
> >
> >
> > Is it possibile to have two git repository in ASF for HAWQ incubator
> > project. If it is possible, I propose to solve the libhdfs3 issue like
> this.
> >
> > 1) create a new git repository in ASF and push all libhdfs3’s code and
> > branch from Github to ASF.
> > 2) make libhdfs3’s Github repository as read only mirror of ASF
> > repository. Maybe need to transfer current owner of Github repository
> from
> > Pivotal to ASF on Github.
> > 3) HAWQ keep the stable version code of libhdfs3 or just Git reference.
> >
> >
> > In this way, we keep libhdfs3 independent and keep its all pull request,
> > wiki, issues and history. And most importantly libhdfs3 can follow ASF
> > rules and process. People can file pull request on Github and commit to
> ASF
> > repository and eventually mirror to Github.
> >
> >
> > Any comments?
> >
> >
> > Best Regards
> >
> > Zhanwei Wang
> > wan...@apache.org
> >
> >
> >
> > > On Sep 15, 2016, at 2:19 PM, Zhanwei Wang wrote:
> > >
> > >> Open source is about community first.
> > >
> > > Good point Kyle. I strongly agree with you!
> > >
> > > But unfortunately seems no one in this thread care about libhdfs3’s
> > community (users) except me. Positively ignore the frustration of
> libhdfs3
> > users and about to delete it’s repository.
> > >
> > >
> > > So let’s set the tone of this thread.
> > >
> > > If we remove libhdfs3’s repository or make it read only:
> > >  a. What benefit we can get for BOTH HAWQ and libhdfs3’s users?
> > >  b. What drawback for BOTH HAWQ and libhdfs3’s users?
> > >
> > >
> > >
> > > The following is my answer.
> > >
> > > a. Benefit: For HAWQ, seems ASF govern its property with ASF rules.
> For
> > libhdfs3’s users, none.
> > >
> > > b. Drawback: For HAWQ, not relevant commits will come into HAWQ’s
> commit
> > log. JIRA and pull request will be fired in HAWQ but not related to HAWQ.
> > Furthermore commit in libhdfs3 may break HAWQ and it’s hard to debug, I
> > have experienced it enough. It is important to use the stable version of
> > libhdfs3, HAWQ code should only keep the stable version of libhdfs3.
> > >
> > >For libhdfs3’s user, they have to ask question in HAWQ’s community.
> > They have to clone entire HAWQ to build libhdfs3 and contribute.
> > >
> > > Let’s think about more. How we schedule a release of libhdfs3 when HAWQ
> > is under developing? Should we branch HAWQ for libhdfs3’s release? Should
> > we 

Re: libhdfs3 development is still going on outside of ASF

2016-09-15 Thread Ruilong Huo
Hi All,

It is great to have comments and concerns from different stakeholders discussed
here.

@Zhanwei, from the maintenance perspective, is it possible to keep a single
version of libhdfs3 that is compatible for all of its users? This would help
reduce customization and make it more widely adopted by a broader audience in
the community.

Secondly, if the single compatible version is "always" stable (in most cases it
should be), then we don't need to spend maintenance effort picking a stable
version of it for HAWQ as both HAWQ and libhdfs3 grow.

Any comments from you would be appreciated :)

@Roman, regarding ASF governance, from your perspective, does that mean we need
to keep them in one ASF project and repo? Or is it OK to make it a separate ASF
repo, rather than keeping it under Pivotal Data Fabric? Your advice is also
appreciated. Thanks.

> On Sep 15, 2016, at 20:16, Matthew Rocklin wrote:
> 
> Hi All,
> 
> I joined this e-mail list in order to chime in to this discussion.  I'm not
> part of Apache HAWQ but *do* use libhdfs3 and know a number of other people
> who do as well.
> 
> I maintain a library for parallel programming, Dask, which is commonly used
> within the PyData software ecosystem.  We often interact with data on HDFS
> and found libhdfs3 to be an excellent solution, particularly because it
> doesn't require JVM interaction, which is rare among our users.  To assist
> Python users we made the wrapper library hdfs3, which has gotten some
> traction both within Dask and outside.
> 
> We intentionally released and maintain hdfs3 separately from Dask because
> it's a more general and releasable component.  This turns out to have been
> a good move.  There are lots of people who use hdfs3 who have no interest
> in using Dask at all.  They appreciate this separation because they're not
> forced to grab all of Dask in order to just get the single component they
> want, hdfs3.  These are great users.  They come from a wide range of
> university to small and large businesses.  They contribute back to hdfs3
> readily and are also, today, trying to contribute back to libhdfs3.  By not
> tying hdfs3 into Dask we increased both community engagement and social
> impact.
> 
> So my initial bias is "Please, keep libhdfs3 separate.  It will make my
> life (and the lives of many others) much more convenient."  However I also
> recognize the need for Apache's strict-for-a-reason policies.  No matter
> what you all decide the PyData community will find a way to make things
> work.  I just wanted to make it clear that there are several other
> stakeholders out there using this library so that this decision wasn't made
> in a vacuum.
> 
> Best,
> -matthew rocklin
> 
> 
> 
> 
>> On Thu, Sep 15, 2016 at 2:38 AM, Zhanwei Wang  wrote:
>> 
>> Hi Roman
>> 
>> I think I have discussed enough about the benefit and drawback of merge
>> two independent project together.
>> Let me propose a way to see if it can make both ASF and libhdfs3’s user
>> happy. And I need your advise.
>> 
>> 
>> Is it possibile to have two git repository in ASF for HAWQ incubator
>> project. If it is possible, I propose to solve the libhdfs3 issue like this.
>> 
>> 1) create a new git repository in ASF and push all libhdfs3’s code and
>> branch from Github to ASF.
>> 2) make libhdfs3’s Github repository as read only mirror of ASF
>> repository. Maybe need to transfer current owner of Github repository from
>> Pivotal to ASF on Github.
>> 3) HAWQ keep the stable version code of libhdfs3 or just Git reference.
>> 
>> 
>> In this way, we keep libhdfs3 independent and keep its all pull request,
>> wiki, issues and history. And most importantly libhdfs3 can follow ASF
>> rules and process. People can file pull request on Github and commit to ASF
>> repository and eventually mirror to Github.
>> 
>> 
>> Any comments?
>> 
>> 
>> Best Regards
>> 
>> Zhanwei Wang
>> wan...@apache.org
>> 
>> 
>> 
>>> On Sep 15, 2016, at 2:19 PM, Zhanwei Wang wrote:
>>>
>>>> Open source is about community first.
>>> 
>>> Good point Kyle. I strongly agree with you!
>>> 
>>> But unfortunately seems no one in this thread care about libhdfs3’s
>> community (users) except me. Positively ignore the frustration of libhdfs3
>> users and about to delete it’s repository.
>>> 
>>> 
>>> So let’s set the tone of this thread.
>>> 
>>> If we remove libhdfs3’s repository or make it read only:
>>> a. What benefit we can get for BOTH HAWQ and libhdfs3’s users?
>>> b. What drawback for BOTH HAWQ and libhdfs3’s users?
>>> 
>>> 
>>> 
>>> The following is my answer.
>>> 
>>> a. Benefit: For HAWQ, seems ASF govern its property with ASF rules.  For
>> libhdfs3’s users, none.
>>> 
>>> b. Drawback: For HAWQ, not relevant commits will come into HAWQ’s commit
>> log. JIRA and pull request will be fired in HAWQ but not related to HAWQ.
>> Furthermore commit in 

Re: libhdfs3 development is still going on outside of ASF

2016-09-15 Thread Zhanwei Wang
Hi Roman

I think I have said enough about the benefits and drawbacks of merging two
independent projects.
Let me propose an approach that might make both the ASF and libhdfs3’s users
happy. I need your advice on it.


Is it possible to have two git repositories at the ASF for the HAWQ incubator
project? If so, I propose solving the libhdfs3 issue like this:

1) Create a new git repository at the ASF and push all of libhdfs3’s code and
branches from GitHub to it.
2) Make libhdfs3’s GitHub repository a read-only mirror of the ASF repository.
We may need to transfer ownership of the GitHub repository from Pivotal to the
ASF on GitHub.
3) HAWQ keeps a stable version of the libhdfs3 code, or just a Git reference.


In this way, we keep libhdfs3 independent and keep all of its pull requests,
wiki, issues and history. Most importantly, libhdfs3 can follow ASF rules and
processes. People can file pull requests on GitHub; changes are committed to
the ASF repository and eventually mirrored to GitHub.

 
Any comments?


Best Regards

Zhanwei Wang
wan...@apache.org



> On Sep 15, 2016, at 2:19 PM, Zhanwei Wang wrote:
> 
>> Open source is about community first.
> 
> Good point Kyle. I strongly agree with you!
> 
> But unfortunately seems no one in this thread care about libhdfs3’s community 
> (users) except me. Positively ignore the frustration of libhdfs3 users and 
> about to delete it’s repository.
> 
> 
> So let’s set the tone of this thread.
> 
> If we remove libhdfs3’s repository or make it read only:
>  a. What benefit we can get for BOTH HAWQ and libhdfs3’s users?
>  b. What drawback for BOTH HAWQ and libhdfs3’s users?
> 
> 
> 
> The following is my answer.
> 
> a. Benefit: For HAWQ, it seems the ASF governs its property under ASF rules. 
> For libhdfs3’s users, none.
> 
> b. Drawback: For HAWQ, unrelated commits will come into HAWQ’s commit log. 
> JIRA issues and pull requests will be filed in HAWQ that are not related to 
> HAWQ. Furthermore, a commit in libhdfs3 may break HAWQ and that is hard to 
> debug; I have experienced it enough. It is important to use the stable 
> version of libhdfs3, and HAWQ code should only keep the stable version of 
> libhdfs3.
> 
> For libhdfs3’s users, they have to ask questions in HAWQ’s community. They 
> have to clone the entire HAWQ repository to build libhdfs3 and contribute.
> 
> Let’s think further. How do we schedule a release of libhdfs3 while HAWQ is 
> under development? Should we branch HAWQ for libhdfs3’s release? Should we 
> merge libhdfs3’s pull requests while we are releasing HAWQ? Do we have to 
> sync the release processes of HAWQ and libhdfs3, and how?
> 
> Maybe we should involve libhdfs3’s users in this thread. But unfortunately 
> they are not on HAWQ’s mailing list. See, this is another big issue. 
> Discussing dropping libhdfs3’s repository on HAWQ’s mailing list without 
> libhdfs3’s users involved seems odd. Imagine this: one day the repository 
> you are working with is gone and you did not even know about the discussion.
> 
> If anyone wants to discuss whether we should drop libhdfs3’s repository, the 
> better place is libhdfs3’s repository.
> 
> In general, merging two independent projects introduces more trouble than 
> benefit. 
> 
> To be clear, I’m not against ASF rules. I deeply understand their 
> importance. Is there any way to keep HAWQ and libhdfs3 separate and make 
> both the ASF and libhdfs3’s users happy? Just like Kyle said, “HOW” is more 
> important. 
> 
> @Roman, your mentoring is important.
> 
> 
> Any comments?
> 
> 
> Best Regards
> 
> Zhanwei Wang
> wan...@apache.org
> 
> 
> 
>> On Sep 15, 2016, at 12:54 PM, Kyle Dunn  wrote:
>> 
>> Chiming in here only as a casual but concerned observer.
>> 
>> Open source is about community first. If the logistics around "where"
>> libhdfs3 lives rather than the much more important issue of "how" it lives
>> are the focus here, I think we've missed the real issue.
>> 
>> For what it's worth, I concur with others, let's move it to HAWQ
>> exclusively and move on to addressing the community, starting with the
>> decision being made and how/where future contributions can be made.
>> 
>> My brief scan of libhdfs3 shows numerous open pull requests (with
>> apparently useful contributions) and several loose ends "issues". We need
>> to communicate effectively to these contributors whether those PRs and
>> issues are valuable and relevant. This type of engagement is what OSS
>> projects live and die by. We need to be better, starting with libhdfs3,
>> into HAWQ, and beyond.
>> 
>> "Open source isn't someone else's job" - it's everyone's job. I'm
>> challenging everyone with commit responsibly on repos to value community
>> input (both code and issues) as highly as your own backlog. Pay it forward
>> and maybe the community will start shrinking your backlog unexpectedly.
>> 
>> 
>> -Kyle
>> 
>> On Wed, Sep 14, 2016, 21:33 Lei Chang  wrote:
>> 
>>> 
>>> There was a short discussion before when we moved libhdfs3 to HAWQ repo.
>>> 
>>> 

Re: libhdfs3 development is still going on outside of ASF

2016-09-15 Thread Zhanwei Wang
> Open source is about community first.

Good point Kyle. I strongly agree with you!

But unfortunately it seems no one in this thread cares about libhdfs3’s 
community (users) except me. We are ignoring the frustration of libhdfs3 users 
and are about to delete its repository.


So let’s set the tone of this thread.

 If we remove libhdfs3’s repository or make it read only:
  a. What benefit do we get for BOTH HAWQ and libhdfs3’s users?
  b. What are the drawbacks for BOTH HAWQ and libhdfs3’s users?



The following is my answer.

a. Benefit: For HAWQ, it seems the ASF governs its property under ASF rules. 
For libhdfs3’s users, none.

b. Drawback: For HAWQ, unrelated commits will come into HAWQ’s commit log. 
JIRA issues and pull requests will be filed in HAWQ that are not related to 
HAWQ. Furthermore, a commit in libhdfs3 may break HAWQ and that is hard to 
debug; I have experienced it enough. It is important to use the stable version 
of libhdfs3, and HAWQ code should only keep the stable version of libhdfs3.

For libhdfs3’s users, they have to ask questions in HAWQ’s community. They 
have to clone the entire HAWQ repository to build libhdfs3 and contribute.

Let’s think further. How do we schedule a release of libhdfs3 while HAWQ is 
under development? Should we branch HAWQ for libhdfs3’s release? Should we 
merge libhdfs3’s pull requests while we are releasing HAWQ? Do we have to sync 
the release processes of HAWQ and libhdfs3, and how?

Maybe we should involve libhdfs3’s users in this thread. But unfortunately 
they are not on HAWQ’s mailing list. See, this is another big issue. 
Discussing dropping libhdfs3’s repository on HAWQ’s mailing list without 
libhdfs3’s users involved seems odd. Imagine this: one day the repository you 
are working with is gone and you did not even know about the discussion.

If anyone wants to discuss whether we should drop libhdfs3’s repository, the 
better place is libhdfs3’s repository.

In general, merging two independent projects introduces more trouble than 
benefit. 

To be clear, I’m not against ASF rules. I deeply understand their importance. 
Is there any way to keep HAWQ and libhdfs3 separate and make both the ASF and 
libhdfs3’s users happy? Just like Kyle said, “HOW” is more important. 

@Roman, your mentoring is important.


Any comments?


Best Regards

Zhanwei Wang
wan...@apache.org



> On Sep 15, 2016, at 12:54 PM, Kyle Dunn  wrote:
> 
> Chiming in here only as a casual but concerned observer.
> 
> Open source is about community first. If the logistics around "where"
> libhdfs3 lives rather than the much more important issue of "how" it lives
> are the focus here, I think we've missed the real issue.
> 
> For what it's worth, I concur with others, let's move it to HAWQ
> exclusively and move on to addressing the community, starting with the
> decision being made and how/where future contributions can be made.
> 
> My brief scan of libhdfs3 shows numerous open pull requests (with
> apparently useful contributions) and several loose ends "issues". We need
> to communicate effectively to these contributors whether those PRs and
> issues are valuable and relevant. This type of engagement is what OSS
> projects live and die by. We need to be better, starting with libhdfs3,
> into HAWQ, and beyond.
> 
> "Open source isn't someone else's job" - it's everyone's job. I'm
> challenging everyone with commit responsibly on repos to value community
> input (both code and issues) as highly as your own backlog. Pay it forward
> and maybe the community will start shrinking your backlog unexpectedly.
> 
> 
> -Kyle
> 
> On Wed, Sep 14, 2016, 21:33 Lei Chang  wrote:
> 
>> 
>> There was a short discussion before when we moved libhdfs3 to HAWQ repo.
>> 
>> http://mail-archives.apache.org/mod_mbox/incubator-hawq-dev/201602.mbox/%3cCAE44UQe1xgcVOC76T_mgVbgGbR=Lx=xubpvw18zk4iz3euc...@mail.gmail.com%3e
>> 
>> I think it makes sense to keep libhdfs3 only in HAWQ repo to simplify
>> Apache build and releases in current phase. This is what we have done in
>> the past. But looks not everyone is on the same page.
>> 
>> Cheers
>> Lei
>> 
>> On Thu, Sep 15, 2016 at 11:12 AM +0800, "Greg Chase"  wrote:
>> 
>> Its fine if libhdfs3 is a third party license, and is treated that way.
>> 
>> However, why does Apache HAWQ want to be dependent on some strange 3rd
>> party library with no transparency?
>> 
>> We are having enough difficulties just getting our first release out.
>> 
>> Is there a compelling reason why we need to keep up with the independently
>> developed libhdfs3 project?  Are they willing to make necessary changes so
>> that they are compatible with ASF's strict-for-a-good-reason policies?
>> 
>> Can we fork hdfs3 for Apache HAWQ's purposes in Apache?
>> 
>> If any libhdfs3 committers are also part of Apache HAWQ, perhaps you can
>> shed some light on the viability of this as an independent project since I
>> only see 4 contributors.
>> 
>> 

Re: libhdfs3 development is still going on outside of ASF

2016-09-14 Thread Kyle Dunn
Chiming in here only as a casual but concerned observer.

Open source is about community first. If the logistics around "where"
libhdfs3 lives rather than the much more important issue of "how" it lives
are the focus here, I think we've missed the real issue.

For what it's worth, I concur with others, let's move it to HAWQ
exclusively and move on to addressing the community, starting with the
decision being made and how/where future contributions can be made.

My brief scan of libhdfs3 shows numerous open pull requests (with
apparently useful contributions) and several loose ends "issues". We need
to communicate effectively to these contributors whether those PRs and
issues are valuable and relevant. This type of engagement is what OSS
projects live and die by. We need to be better, starting with libhdfs3,
into HAWQ, and beyond.

"Open source isn't someone else's job" - it's everyone's job. I'm
challenging everyone with commit responsibility on repos to value community
input (both code and issues) as highly as your own backlog. Pay it forward
and maybe the community will start shrinking your backlog unexpectedly.


-Kyle

On Wed, Sep 14, 2016, 21:33 Lei Chang  wrote:

>
> There was a short discussion before when we moved libhdfs3 to HAWQ repo.
>
> http://mail-archives.apache.org/mod_mbox/incubator-hawq-dev/201602.mbox/%3cCAE44UQe1xgcVOC76T_mgVbgGbR=Lx=xubpvw18zk4iz3euc...@mail.gmail.com%3e
>
> I think it makes sense to keep libhdfs3 only in HAWQ repo to simplify
> Apache build and releases in current phase. This is what we have done in
> the past. But looks not everyone is on the same page.
>
> Cheers
> Lei
>
> On Thu, Sep 15, 2016 at 11:12 AM +0800, "Greg Chase"  wrote:
>
> Its fine if libhdfs3 is a third party license, and is treated that way.
>
> However, why does Apache HAWQ want to be dependent on some strange 3rd
> party library with no transparency?
>
> We are having enough difficulties just getting our first release out.
>
> Is there a compelling reason why we need to keep up with the independently
> developed libhdfs3 project?  Are they willing to make necessary changes so
> that they are compatible with ASF's strict-for-a-good-reason policies?
>
> Can we fork hdfs3 for Apache HAWQ's purposes in Apache?
>
> If any libhdfs3 committers are also part of Apache HAWQ, perhaps you can
> shed some light on the viability of this as an independent project since I
> only see 4 contributors.
>
> -Greg
>
> On Wed, Sep 14, 2016 at 7:54 PM, Hong Wu  wrote:
>
> > In my opinion, I think it is reasonable to transfer the third-party repo
> of
> > libhdfs3 totally into HAWQ, not only for the convenience of HAWQ build,
> but
> > also for the consideration of ASF project. So for HAWQ project, I am with
> > Roman.
> >
> > But my concern is the current users of libhdfs3 and all the pull
> requests,
> > wiki docs and issues. Another uncertain aspect from my perspective is
> that
> > although HAWQ could not run without libhdfs3, libhdfs3 could be used in
> > other open source projects, that might be the true meaning of making
> > libhdfs3 open source at the beginning.
> >
> > In summary, if it is really against the spirit of a ASF project for
> HAWQ, a
> > suggested way might be marking original libhdfs3 repo as a legacy repo in
> > stead of remove it.
> >
> > Best
> > Hong
> >
> > 2016-09-15 10:04 GMT+08:00 Zhanwei Wang :
> >
> > > Currently libhdfs3’s official code is not the same as in HAWQ. Some new
> > > code does not copy into HAWQ.  I do not think code change of libhdfs3
> > > should follow HAWQ’s commit process because  many change are not
> related
> > to
> > > HAWQ.
> > >
> > > From HAWQ side, I suggest to keep the stable version of its third-party
> > > libraries and copy new libhdfs3’s code only when it is necessary.
> > >
> > > libhdfs3 was open source years before HAWQ incubating with a separated
> > > permission of its authority. So in my opinion it is a third party and
> it
> > > actually was a third party before HAWQ incubating. And HAWQ is not the
> > only
> > > user.
> > >
> > >
> > >
> > > Best Regards
> > >
> > > Zhanwei Wang
> > > wan...@apache.org
> > >
> > >
> > >
> > > > On Sep 15, 2016, at 9:35 AM, Roman Shaposhnik  wrote:
> > > >
> > > > On Wed, Sep 14, 2016 at 6:29 PM, Zhanwei Wang
> > wrote:
> > > >> Hi Roman
> > > >>
> > > >> libhdfs3 works as third-party library of HAWQ, Just for the
> > convenience
> > > of HAWQ release
> > > >> process we copy its code into HAWQ.  The reason is that HAWQ used to
> > > dependent on
> > > >> specific version of libhdfs3 and libhdfs3 only distribute as source
> > > code and the build process is complicated.
> > > >
> > > > I actually don't buy this argument. libhdfs3 is not an optional
> > > > dependency for HAWQ
> > > > like ORCA is (for example). Without libhdfs3 there's pretty tough to
> > > > imagine HAWQ.
> > > > As such the code base needs to be governed as part of the ASF
> project,
> > > > not a 

Re: libhdfs3 development is still going on outside of ASF

2016-09-14 Thread Zhanwei Wang
> But my concern is the current users of libhdfs3 and all the pull requests,
> wiki docs and issues. Another uncertain aspect from my perspective is that
> although HAWQ could not run without libhdfs3, libhdfs3 could be used in
> other open source projects, that might be the true meaning of making
> libhdfs3 open source at the beginning.


That’s what I am concerned about. We should think about others before we take 
action. Users have already shown their frustration in HAWQ-1046. 

libhdfs3 was open sourced as an independent project before Apache HAWQ was 
born. People contributed to it before Apache HAWQ was born. And I do not think 
they all signed a contribution license with the ASF. 

When Apache HAWQ started incubating, libhdfs3 was not part of it. HAWQ users 
had to build and install libhdfs3, like other third-party libraries, before 
building Apache HAWQ. See the commit 
https://github.com/apache/incubator-hawq/commit/8b26974cd8d6e1d824f274eb4a68f950fd94156c
 



I really do not mind who governs it or what kind of process it follows. I care 
about what trouble will be introduced for libhdfs3’s users if we drop 
libhdfs3’s repository.  Please note HAWQ is not the only user.

Importing libhdfs3 into HAWQ also introduces trouble for HAWQ.  A commit to 
libhdfs3 is probably not related to HAWQ, but it will interfere with HAWQ.  
HAWQ should not always keep the latest development version of libhdfs3. The 
stable version of libhdfs3 is best for HAWQ. If we import libhdfs3 into HAWQ, 
we have to schedule the release process of both HAWQ and libhdfs3. And a 
libhdfs3 commit may break HAWQ severely.

What is the benefit of deleting libhdfs3’s repository, except that the ASF 
declares its governance (I also doubt whether libhdfs3 is included in the HAWQ 
donation license, but it is OK with me that it is governed by the ASF)?  In my 
opinion it only introduces trouble for libhdfs3’s users and HAWQ. 

Keeping the current status (libhdfs3 as a third-party dependency, with its 
stable version copied into HAWQ) is best for HAWQ.




Best Regards

Zhanwei Wang
wan...@apache.org



> On Sep 15, 2016, at 10:54 AM, Hong Wu  wrote:
> 
> In my opinion, I think it is reasonable to transfer the third-party repo of
> libhdfs3 totally into HAWQ, not only for the convenience of HAWQ build, but
> also for the consideration of ASF project. So for HAWQ project, I am with
> Roman.
> 
> But my concern is the current users of libhdfs3 and all the pull requests,
> wiki docs and issues. Another uncertain aspect from my perspective is that
> although HAWQ could not run without libhdfs3, libhdfs3 could be used in
> other open source projects, that might be the true meaning of making
> libhdfs3 open source at the beginning.
> 
> In summary, if it is really against the spirit of a ASF project for HAWQ, a
> suggested way might be marking original libhdfs3 repo as a legacy repo in
> stead of remove it.
> 
> Best
> Hong
> 
> 2016-09-15 10:04 GMT+08:00 Zhanwei Wang :
> 
>> Currently libhdfs3’s official code is not the same as in HAWQ. Some new
>> code does not copy into HAWQ.  I do not think code change of libhdfs3
>> should follow HAWQ’s commit process because  many change are not related to
>> HAWQ.
>> 
>> From HAWQ side, I suggest to keep the stable version of its third-party
>> libraries and copy new libhdfs3’s code only when it is necessary.
>> 
>> libhdfs3 was open source years before HAWQ incubating with a separated
>> permission of its authority. So in my opinion it is a third party and it
>> actually was a third party before HAWQ incubating. And HAWQ is not the only
>> user.
>> 
>> 
>> 
>> Best Regards
>> 
>> Zhanwei Wang
>> wan...@apache.org
>> 
>> 
>> 
>>> On Sep 15, 2016, at 9:35 AM, Roman Shaposhnik  wrote:
>>> 
>>> On Wed, Sep 14, 2016 at 6:29 PM, Zhanwei Wang  wrote:
>>>> Hi Roman
>>>> 
>>>> libhdfs3 works as third-party library of HAWQ, Just for the convenience
>>>> of HAWQ release
>>>> process we copy its code into HAWQ.  The reason is that HAWQ used to
>>>> dependent on
>>>> specific version of libhdfs3 and libhdfs3 only distribute as source
>>>> code and the build process is complicated.
>>> 
>>> I actually don't buy this argument. libhdfs3 is not an optional
>>> dependency for HAWQ
>>> like ORCA is (for example). Without libhdfs3 there's pretty tough to
>>> imagine HAWQ.
>>> As such the code base needs to be governed as part of the ASF project,
>>> not a random
>>> GitHub dependency.
>>> 
>>> IOW, let me ask you this: were all the changes that went into libhdfs3
>>> that is part of
>>> HAWQ discussed and reviewed via the ASF development process or did you
>> just
>>> import them from time to time as this comment suggests:
>>>   https://issues.apache.org/jira/browse/HAWQ-1046?focusedCommentId=15489669=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15489669
>>> ?
>>> 
>>>> I do not think we have any reason to shutdown a third party’s official
>>>> repository.
>>> 

Re: libhdfs3 development is still going on outside of ASF

2016-09-14 Thread Lei Chang

There was a short discussion earlier, when we moved libhdfs3 into the HAWQ 
repo:

http://mail-archives.apache.org/mod_mbox/incubator-hawq-dev/201602.mbox/%3cCAE44UQe1xgcVOC76T_mgVbgGbR=Lx=xubpvw18zk4iz3euc...@mail.gmail.com%3e

I think it makes sense to keep libhdfs3 only in the HAWQ repo to simplify 
Apache builds and releases in the current phase. This is what we have done in 
the past. But it looks like not everyone is on the same page.

Cheers
Lei

On Thu, Sep 15, 2016 at 11:12 AM +0800, "Greg Chase"  wrote:

It's fine if libhdfs3 is a third party license, and is treated that way.

However, why does Apache HAWQ want to be dependent on some strange 3rd
party library with no transparency?

We are having enough difficulties just getting our first release out.

Is there a compelling reason why we need to keep up with the independently
developed libhdfs3 project?  Are they willing to make necessary changes so
that they are compatible with ASF's strict-for-a-good-reason policies?

Can we fork hdfs3 for Apache HAWQ's purposes in Apache?

If any libhdfs3 committers are also part of Apache HAWQ, perhaps you can
shed some light on the viability of this as an independent project since I
only see 4 contributors.

-Greg

On Wed, Sep 14, 2016 at 7:54 PM, Hong Wu  wrote:

> In my opinion, I think it is reasonable to transfer the third-party repo of
> libhdfs3 totally into HAWQ, not only for the convenience of HAWQ build, but
> also for the consideration of ASF project. So for HAWQ project, I am with
> Roman.
>
> But my concern is the current users of libhdfs3 and all the pull requests,
> wiki docs and issues. Another uncertain aspect from my perspective is that
> although HAWQ could not run without libhdfs3, libhdfs3 could be used in
> other open source projects, that might be the true meaning of making
> libhdfs3 open source at the beginning.
>
> In summary, if it is really against the spirit of a ASF project for HAWQ, a
> suggested way might be marking original libhdfs3 repo as a legacy repo in
> stead of remove it.
>
> Best
> Hong
>
> 2016-09-15 10:04 GMT+08:00 Zhanwei Wang :
>
> > Currently libhdfs3’s official code is not the same as in HAWQ. Some new
> > code does not copy into HAWQ.  I do not think code change of libhdfs3
> > should follow HAWQ’s commit process because  many change are not related
> to
> > HAWQ.
> >
> > From HAWQ side, I suggest to keep the stable version of its third-party
> > libraries and copy new libhdfs3’s code only when it is necessary.
> >
> > libhdfs3 was open source years before HAWQ incubating with a separated
> > permission of its authority. So in my opinion it is a third party and it
> > actually was a third party before HAWQ incubating. And HAWQ is not the
> only
> > user.
> >
> >
> >
> > Best Regards
> >
> > Zhanwei Wang
> > wan...@apache.org
> >
> >
> >
> > > On Sep 15, 2016, at 9:35 AM, Roman Shaposhnik  wrote:
> > >
> > > On Wed, Sep 14, 2016 at 6:29 PM, Zhanwei Wang 
> wrote:
> > >> Hi Roman
> > >>
> > >> libhdfs3 works as third-party library of HAWQ, Just for the
> convenience
> > of HAWQ release
> > >> process we copy its code into HAWQ.  The reason is that HAWQ used to
> > dependent on
> > >> specific version of libhdfs3 and libhdfs3 only distribute as source
> > code and the build process is complicated.
> > >
> > > I actually don't buy this argument. libhdfs3 is not an optional
> > > dependency for HAWQ
> > > like ORCA is (for example). Without libhdfs3 there's pretty tough to
> > > imagine HAWQ.
> > > As such the code base needs to be governed as part of the ASF project,
> > > not a random
> > > GitHub dependency.
> > >
> > > IOW, let me ask you this: were all the changes that went into libhdfs3
> > > that is part of
> > > HAWQ discussed and reviewed via the ASF development process or did you
> > just
> > > import them from time to time as this comment suggests:
> > >https://issues.apache.org/jira/browse/HAWQ-1046?
> > focusedCommentId=15489669=com.atlassian.jira.
> > plugin.system.issuetabpanels:comment-tabpanel#comment-15489669
> > > ?
> > >
> > >> I do not think we have any reason to shutdown a third party’s official
> > repository.
> > >
> > > You say 3d party as though its not just you guys maintaining it on the
> > side.
> > >
> > >> We also copy google test source code into HAWQ, just as what we did
> for
> > libhdfs3.
> > >
> > > But this is very different. You don't do any development (certainly
> > > you don't do any
> > > non-trivial development) of that code.
> > >
> > >> libhdfs3 open source under Apache license version 2 just the same as
> > HAWQ. So I believe there is no license issue.
> > >
> > > You're correct. There's no licensing issue but there's a pretty
> > significant
> > > governance issue.
> > >
> > > Thanks,
> > > Roman.
> > >
> >
> >
>







Re: libhdfs3 development is still going on outside of ASF

2016-09-14 Thread Greg Chase
It's fine if libhdfs3 is a third party license, and is treated that way.

However, why does Apache HAWQ want to be dependent on some strange 3rd
party library with no transparency?

We are having enough difficulties just getting our first release out.

Is there a compelling reason why we need to keep up with the independently
developed libhdfs3 project?  Are they willing to make necessary changes so
that they are compatible with ASF's strict-for-a-good-reason policies?

Can we fork hdfs3 for Apache HAWQ's purposes in Apache?

If any libhdfs3 committers are also part of Apache HAWQ, perhaps you can
shed some light on the viability of this as an independent project since I
only see 4 contributors.

-Greg

On Wed, Sep 14, 2016 at 7:54 PM, Hong Wu  wrote:

> In my opinion, I think it is reasonable to transfer the third-party repo of
> libhdfs3 totally into HAWQ, not only for the convenience of HAWQ build, but
> also for the consideration of ASF project. So for HAWQ project, I am with
> Roman.
>
> But my concern is the current users of libhdfs3 and all the pull requests,
> wiki docs and issues. Another uncertain aspect from my perspective is that
> although HAWQ could not run without libhdfs3, libhdfs3 could be used in
> other open source projects, that might be the true meaning of making
> libhdfs3 open source at the beginning.
>
> In summary, if it is really against the spirit of a ASF project for HAWQ, a
> suggested way might be marking original libhdfs3 repo as a legacy repo in
> stead of remove it.
>
> Best
> Hong
>
> 2016-09-15 10:04 GMT+08:00 Zhanwei Wang :
>
> > Currently libhdfs3’s official code is not the same as in HAWQ. Some new
> > code does not copy into HAWQ.  I do not think code change of libhdfs3
> > should follow HAWQ’s commit process because  many change are not related
> to
> > HAWQ.
> >
> > From HAWQ side, I suggest to keep the stable version of its third-party
> > libraries and copy new libhdfs3’s code only when it is necessary.
> >
> > libhdfs3 was open source years before HAWQ incubating with a separated
> > permission of its authority. So in my opinion it is a third party and it
> > actually was a third party before HAWQ incubating. And HAWQ is not the
> only
> > user.
> >
> >
> >
> > Best Regards
> >
> > Zhanwei Wang
> > wan...@apache.org
> >
> >
> >
> > > On Sep 15, 2016, at 9:35 AM, Roman Shaposhnik  wrote:
> > >
> > > On Wed, Sep 14, 2016 at 6:29 PM, Zhanwei Wang 
> wrote:
> > >> Hi Roman
> > >>
> > >> libhdfs3 works as third-party library of HAWQ, Just for the
> convenience
> > of HAWQ release
> > >> process we copy its code into HAWQ.  The reason is that HAWQ used to
> > dependent on
> > >> specific version of libhdfs3 and libhdfs3 only distribute as source
> > code and the build process is complicated.
> > >
> > > I actually don't buy this argument. libhdfs3 is not an optional
> > > dependency for HAWQ
> > > like ORCA is (for example). Without libhdfs3 there's pretty tough to
> > > imagine HAWQ.
> > > As such the code base needs to be governed as part of the ASF project,
> > > not a random
> > > GitHub dependency.
> > >
> > > IOW, let me ask you this: were all the changes that went into libhdfs3
> > > that is part of
> > > HAWQ discussed and reviewed via the ASF development process or did you
> > just
> > > import them from time to time as this comment suggests:
> > >https://issues.apache.org/jira/browse/HAWQ-1046?
> > focusedCommentId=15489669=com.atlassian.jira.
> > plugin.system.issuetabpanels:comment-tabpanel#comment-15489669
> > > ?
> > >
> > >> I do not think we have any reason to shutdown a third party’s official
> > repository.
> > >
> > > You say 3d party as though its not just you guys maintaining it on the
> > side.
> > >
> > >> We also copy google test source code into HAWQ, just as what we did
> for
> > libhdfs3.
> > >
> > > But this is very different. You don't do any development (certainly
> > > you don't do any
> > > non-trivial development) of that code.
> > >
> > >> libhdfs3 open source under Apache license version 2 just the same as
> > HAWQ. So I believe there is no license issue.
> > >
> > > You're correct. There's no licensing issue but there's a pretty
> > significant
> > > governance issue.
> > >
> > > Thanks,
> > > Roman.
> > >
> >
> >
>


Re: libhdfs3 development is still going on outside of ASF

2016-09-14 Thread Zhanwei Wang
Currently libhdfs3’s official code is not the same as the copy in HAWQ. Some 
new code has not been copied into HAWQ.  I do not think code changes to 
libhdfs3 should follow HAWQ’s commit process, because many changes are not 
related to HAWQ. 

From the HAWQ side, I suggest keeping the stable version of its third-party 
libraries and copying new libhdfs3 code only when it is necessary.

libhdfs3 was open sourced years before HAWQ started incubating, with separate 
permission from its authority. So in my opinion it is a third party, and it 
actually was a third party before HAWQ started incubating. And HAWQ is not the 
only user.



Best Regards

Zhanwei Wang
wan...@apache.org



> On Sep 15, 2016, at 9:35 AM, Roman Shaposhnik  wrote:
> 
> On Wed, Sep 14, 2016 at 6:29 PM, Zhanwei Wang  wrote:
>> Hi Roman
>> 
>> libhdfs3 works as third-party library of HAWQ, Just for the convenience of 
>> HAWQ release
>> process we copy its code into HAWQ.  The reason is that HAWQ used to 
>> dependent on
>> specific version of libhdfs3 and libhdfs3 only distribute as source code and 
>> the build process is complicated.
> 
> I actually don't buy this argument. libhdfs3 is not an optional
> dependency for HAWQ
> like ORCA is (for example). Without libhdfs3 there's pretty tough to
> imagine HAWQ.
> As such the code base needs to be governed as part of the ASF project,
> not a random
> GitHub dependency.
> 
> IOW, let me ask you this: were all the changes that went into libhdfs3
> that is part of
> HAWQ discussed and reviewed via the ASF development process or did you just
> import them from time to time as this comment suggests:
>
> https://issues.apache.org/jira/browse/HAWQ-1046?focusedCommentId=15489669=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15489669
> ?
> 
>> I do not think we have any reason to shutdown a third party’s official 
>> repository.
> 
> You say 3d party as though its not just you guys maintaining it on the side.
> 
>> We also copy google test source code into HAWQ, just as what we did for 
>> libhdfs3.
> 
> But this is very different. You don't do any development (certainly
> you don't do any
> non-trivial development) of that code.
> 
>> libhdfs3 open source under Apache license version 2 just the same as HAWQ. 
>> So I believe there is no license issue.
> 
> You're correct. There's no licensing issue but there's a pretty significant
> governance issue.
> 
> Thanks,
> Roman.
> 



Re: libhdfs3 development is still going on outside of ASF

2016-09-14 Thread Roman Shaposhnik
On Wed, Sep 14, 2016 at 6:29 PM, Zhanwei Wang  wrote:
> Hi Roman
>
> libhdfs3 works as third-party library of HAWQ, Just for the convenience of 
> HAWQ release
> process we copy its code into HAWQ.  The reason is that HAWQ used to 
> dependent on
> specific version of libhdfs3 and libhdfs3 only distribute as source code and 
> the build process is complicated.

I actually don't buy this argument. libhdfs3 is not an optional dependency
for HAWQ like ORCA is (for example). Without libhdfs3 it's pretty tough to
imagine HAWQ. As such the code base needs to be governed as part of the ASF
project, not a random GitHub dependency.

IOW, let me ask you this: were all the changes that went into the libhdfs3
that is part of HAWQ discussed and reviewed via the ASF development process
or did you just import them from time to time as this comment suggests:

https://issues.apache.org/jira/browse/HAWQ-1046?focusedCommentId=15489669=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15489669
?

> I do not think we have any reason to shutdown a third party’s official 
> repository.

You say 3rd party as though it's not just you guys maintaining it on the side.

> We also copy google test source code into HAWQ, just as what we did for 
> libhdfs3.

But this is very different. You don't do any development (certainly you
don't do any non-trivial development) of that code.

> libhdfs3 open source under Apache license version 2 just the same as HAWQ. So 
> I believe there is no license issue.

You're correct. There's no licensing issue but there's a pretty significant
governance issue.

Thanks,
Roman.


Re: libhdfs3 development is still going on outside of ASF

2016-09-14 Thread Zhanwei Wang
Hi Roman

libhdfs3 works as a third-party library of HAWQ. Just for the convenience of 
the HAWQ release process, we copy its code into HAWQ.  The reason is that HAWQ 
used to depend on a specific version of libhdfs3, and libhdfs3 is only 
distributed as source code and its build process is complicated.

I do not think we have any reason to shut down a third party’s official 
repository. We also copy the Google Test source code into HAWQ, just as we did 
for libhdfs3.

libhdfs3 is open source under the Apache License, Version 2.0, just the same 
as HAWQ. So I believe there is no license issue. 


Best Regards

Zhanwei Wang
wan...@apache.org



> On Sep 15, 2016, at 8:42 AM, Roman Shaposhnik  wrote:
> 
> Hi!
> 
> a good discussion over at:
>https://issues.apache.org/jira/browse/HAWQ-1046
> highlighted the fact that there still seems to be
> non-trivial HAWQ development going on outside of
> ASF repos. This is contrary to my understanding and
> we have to come up with a plan of how to shut down
> that repo (or make it a read-only mirror).
> 
> If you really need a separate repo we can create
> an extra one on the ASF side, but at this point it is
> likely going to complicate your release mechanics
> which I really don't recommend until you get a few
> releases under your belt.
> 
> If there are any other outstanding issues -- let's discuss
> those on this thread.
> 
> Thanks,
> Roman.
>