[jira] [Created] (ARROW-5009) [C++] Cleanup using to std::* in files

2019-03-25 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-5009:
--

 Summary: [C++] Cleanup using to std::* in files
 Key: ARROW-5009
 URL: https://issues.apache.org/jira/browse/ARROW-5009
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, C++ - Gandiva
Reporter: Micah Kornfield
Assignee: Micah Kornfield


This can also affect parquet.

We inconsistently refer to std::string, std::vector and std::shared_ptr; this is
an attempt to consistently use the fully qualified std::* names instead of
"using std::vector" declarations. This is more of a suggestion, so if people are
opposed to it (or to some of the changes), I'm OK not checking them in. For now I
plan on doing one pull request which will include parquet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Timeline for 0.13 Arrow release

2019-03-25 Thread Wes McKinney
I just pushed an update to https://github.com/apache/arrow/pull/3988
to do as much as we can to handle Dask compatibility with partitioned
Parquet files. As soon as the build passes I think that can be merged;
further fixes will probably have to happen in Dask. From my
perspective we are faithfully storing pandas DataFrame objects in
Parquet files, so some logic will have to be implemented to
interpret the RangeIndex metadata rather than deserializing a column
of sequential integers from the file.
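To make the RangeIndex point concrete: a reader can rebuild a range index from three numbers in the metadata instead of reading an integer column. A minimal sketch, assuming the pandas metadata JSON (stored by pyarrow in the Arrow schema metadata under the `pandas` key) describes a range index with an entry shaped like `{"kind": "range", "start": ..., "stop": ..., "step": ...}`; `index_from_pandas_metadata` is a hypothetical helper, not a Dask or PyArrow API.

```python
import json
from typing import List, Union

# One entry per index level: either a rebuilt range or the name of a stored column.
IndexSpec = Union[range, str]


def index_from_pandas_metadata(metadata_json: str) -> List[IndexSpec]:
    """Rebuild index levels from pandas metadata without reading index data.

    Assumption: range indexes are described by dicts of the form
    {"kind": "range", "name": ..., "start": ..., "stop": ..., "step": ...};
    any other entry is taken to name a physical column in the file.
    """
    meta = json.loads(metadata_json)
    levels: List[IndexSpec] = []
    for entry in meta.get("index_columns", []):
        if isinstance(entry, dict) and entry.get("kind") == "range":
            # Sequential integers are reconstructed from start/stop/step,
            # so nothing needs to be deserialized from the Parquet file.
            levels.append(range(entry["start"], entry["stop"], entry["step"]))
        else:
            levels.append(entry)
    return levels


meta = '{"index_columns": [{"kind": "range", "name": null, "start": 0, "stop": 5, "step": 1}]}'
print(index_from_pandas_metadata(meta))  # [range(0, 5)]
```

For partitioned datasets each file would carry its own start/stop, so a reader such as Dask would still need to decide how to combine them.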

I added ARROW-4995 to the 0.13 milestone per Javier's comments. Maybe
let's try to resolve that tomorrow if we can.

On Mon, Mar 25, 2019 at 7:50 PM Wes McKinney  wrote:
>
> I think it's fine. The ASF's relationship with CRAN is a bit fuzzy
> historically due to issues with GPL so if Romain wants to be listed as
> the maintainer of the package that seems OK to me
>
> On Mon, Mar 25, 2019 at 7:46 PM Javier Luraschi  wrote:
> >
> > For R, we should be able to make "Artifacts produced out of signed /
> > voted source releases" work.
> >
> > The way I understand this is that we will check in all the changes to
> > the arrow project and, once the source releases are signed/voted,
> > we would use those to rebuild and publish in CRAN.
> >
> > Is there any restriction as to who submits the binaries to a package
> > manager? I'm hoping Romain would be able to do this himself
> > when releasing to CRAN, would that be acceptable?
> >
> >
> > On Mon, Mar 25, 2019 at 3:44 PM Wes McKinney  wrote:
> >
> > > hi Javier,
> > >
> > > On Mon, Mar 25, 2019 at 4:54 PM Javier Luraschi wrote:
> > > >
> > > > From the [R] side, we ideally need this one merged:
> > > > https://github.com/apache/arrow/pull/4011; however
> > > > Romain is working on a follow up commit. If the
> > > > Apache Arrow project allows it, we can also release
> > > > in CRAN from this PR without merging, but I'm sure
> > > > we prefer to have everything checked-in when
> > > > pushing changes to package managers, like CRAN.
> > >
> > > From the perspective of the Arrow PMC we can't have anything to do
> > > with any binary or source artifacts published by third parties outside
> > > of our release process. Official ASF releases are the
> > > signed-and-verified artifacts that the PMC votes on. In external
> > > package managers our order of preference would be:
> > >
> > > * Binary artifacts that are signed and voted on by the community (we
> > > understand this is not possible in all cases -- e.g. we cannot publish
> > > our binary artifacts directly to conda-forge)
> > > * Artifacts produced out of signed/voted source releases (e.g. using
> > > the conda-forge example, we can use the trusted source artifact for
> > > producing the binaries)
> > > * Artifacts resulting from patched releases (YMMV)
> > >
> > > For example, we create NPM packages using the JavaScript release
> > > artifact such as
> > >
> > > https://www.npmjs.com/package/apache-arrow/v/0.4.1
> > >
> > > The issue is one of trust. Given the many examples of exploits caused
> > > by untrusted package artifacts (e.g. recently [1], [2]), it is better
> > > when users are installing software which has been verified to have
> > > been produced by a member of the Arrow PMC and signed by a GPG key in
> > > our web of trust.
> > >
> > > [1]: https://eslint.org/blog/2018/07/postmortem-for-malicious-package-publishes
> > > [2]: https://blog.npmjs.org/post/180565383195/details-about-the-event-stream-incident
> > >
> > > >
> > > > On Mon, Mar 25, 2019 at 2:11 PM Krisztián Szűcs <szucs.kriszt...@gmail.com> wrote:
> > > >
> > > > > Hey,
> > > > >
> > > > > I'll gladly help Kou. You've postponed the packaging issues,
> > > > > but at least the wheel builds must pass; I'm still working on it.
> > > > >
> > > > > - K
> > > > >
> > > > > On Mon, Mar 25, 2019 at 10:03 PM Kouhei Sutou wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > It seems that nobody can be a release manager.
> > > > > > I'll be the release manager for 0.13.0. But I don't have
> > > > > > enough time. Could someone help me?
> > > > > >
> > > > > > Thanks,
> > > > > > --
> > > > > > kou
> > > > > >
> > > > > > In <cahm19a61d-b1evtp1woajztbpwz5bvao_jtquqwe8hsatwc...@mail.gmail.com>
> > > > > >   "Re: Timeline for 0.13 Arrow release" on Mon, 25 Mar 2019 15:37:46 +0100,
> > > > > >   Krisztián Szűcs wrote:
> > > > > >
> > > > > > > On Mon, Mar 25, 2019 at 3:21 PM Wes McKinney wrote:
> > > > > > >
> > > > > > >> If Gandiva is causing packaging problems for 0.13 as a result of the
> > > > > > >> CMake refactor, I suggest that we drop it from packages and plan to
> > > > > > >> resolve for 0.14. So I think we should set a pretty strict time box for
> > > > > > >> resolution of these issues (eg end of day Tuesday)
> > > > > > >>
> > > > > > > Sounds good to me. Do we have a release manager?
> > > > > > >

Re: Timeline for 0.13 Arrow release

2019-03-25 Thread Wes McKinney
I think it's fine. The ASF's relationship with CRAN is a bit fuzzy
historically due to issues with GPL so if Romain wants to be listed as
the maintainer of the package that seems OK to me

> > > > > >> On Mon, Mar 25, 2019, 2:12 PM Krisztián Szűcs <szucs.kriszt...@gmail.com> wrote:
> > > > > >>
> > > > > >> > Hello all,
> > > > > >> >
> > > > > >> > I'm working on the conda and wheel builds:
> > > > > >> > https://github.com/apache/arrow/pull/3832
> > > > > >> > https://github.com/apache/arrow/pull/4024
> > > > > >> > These must pass before we can cut the release.

Re: Timeline for 0.13 Arrow release

2019-03-25 Thread Javier Luraschi
For R, we should be able to make "Artifacts produced out of signed /
voted source releases" work.

The way I understand this is that we will check in all the changes to
the arrow project and, once the source releases are signed/voted,
we would use those to rebuild and publish in CRAN.

Is there any restriction as to who submits the binaries to a package
manager? I'm hoping Romain would be able to do this himself
when releasing to CRAN, would that be acceptable?


> > > > >> > On Mon, Mar 25, 2019 at 2:08 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > > > >> >
> > > > >> > > Hi folks,
> > > > >> > >
> > > > >> > > I think we should close the 0.13 backlog and try to get an RC0 out ASAP.
> > > > >> > > What work must get done before that happens?
> > > > >> > >
> > > > >> > > I intend to sort out ARROW-4872 today.
> > > > >> > >
> > > > >> > > Thanks
> > > > >> > > Wes

Re: [Python] The next manylinux specification

2019-03-25 Thread Robert Nishihara
Thanks for posting the thread. This is great!

On Mon, Mar 25, 2019 at 9:04 AM Wes McKinney  wrote:

> Thanks Antoine for alerting us to this thread. It's important that our
> interests are represented in this discussion given the problems we've
> had with interactions with the TensorFlow and PyTorch wheels. Please
> let me know if I can help.
>
> Robert and Philipp, can you keep an eye out on this also?
>
> On Fri, Mar 22, 2019 at 4:12 PM Antoine Pitrou  wrote:
> >
> >
> > For those who are interested in discussing it:
> >
> > https://discuss.python.org/t/the-next-manylinux-specification/1043
> >
> > Regards
> >
> > Antoine.
>


[jira] [Created] (ARROW-5008) ORC Reader Core Dumps in PyArrow if `/etc/localtime` does not exist

2019-03-25 Thread Keith Kraus (JIRA)
Keith Kraus created ARROW-5008:
--

 Summary: ORC Reader Core Dumps in PyArrow if `/etc/localtime` does not exist
 Key: ARROW-5008
 URL: https://issues.apache.org/jira/browse/ARROW-5008
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 0.12.1, 0.12.0
Reporter: Keith Kraus


In Docker containers it's common for `/etc/localtime` not to exist. When it is
missing, the ORC reader hits a file-not-found error that PyArrow does not handle,
and the process core dumps. A workaround is to install `tzdata` in the container
(at least on Ubuntu), but I wanted to report it upstream.
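Until the reader handles this itself, a check on the Python side could at least turn the crash into an ordinary exception. A minimal sketch, assuming (as the report describes) that the reader looks for the local zone file at `/etc/localtime`; `localtime_is_configured` and `read_orc_or_raise` are hypothetical helpers, not PyArrow APIs, and only `pyarrow.orc.ORCFile(...).read()` comes from PyArrow.

```python
import os


def localtime_is_configured(path: str = "/etc/localtime") -> bool:
    """Return True if the local timezone file is present at `path`."""
    return os.path.exists(path)


def read_orc_or_raise(orc_path: str, localtime_path: str = "/etc/localtime"):
    """Hypothetical wrapper: check for the timezone file before calling the
    ORC reader, so a missing file becomes a Python exception, not a crash."""
    if not localtime_is_configured(localtime_path):
        raise FileNotFoundError(
            f"{localtime_path} does not exist; install tzdata in the container "
            "(e.g. on Ubuntu) before reading ORC files"
        )
    # Imported lazily so the check itself works without pyarrow installed.
    import pyarrow.orc as orc

    return orc.ORCFile(orc_path).read()
```

Installing `tzdata` remains the actual workaround; this only guards the call site.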





Re: Timeline for 0.13 Arrow release

2019-03-25 Thread Wes McKinney
hi Javier,

On Mon, Mar 25, 2019 at 4:54 PM Javier Luraschi  wrote:
>
> From the [R] side, we ideally need this one merged:
> https://github.com/apache/arrow/pull/4011; however
> Romain is working on a follow up commit. If the
> Apache Arrow project allows it, we can also release
> in CRAN from this PR without merging, but I'm sure
> we prefer to have everything checked-in when
> pushing changes to package managers, like CRAN.

From the perspective of the Arrow PMC we can't have anything to do
with any binary or source artifacts published by third parties outside
of our release process. Official ASF releases are the
signed-and-verified artifacts that the PMC votes on. In external
package managers our order of preference would be:

* Binary artifacts that are signed and voted on by the community (we
understand this is not possible in all cases -- e.g. we cannot publish
our binary artifacts directly to conda-forge)
* Artifacts produced out of signed/voted source releases (e.g. using
the conda-forge example, we can use the trusted source artifact for
producing the binaries)
* Artifacts resulting from patched releases (YMMV)

For example, we create NPM packages using the JavaScript release
artifact such as

https://www.npmjs.com/package/apache-arrow/v/0.4.1

The issue is one of trust. Given the many examples of exploits caused
by untrusted package artifacts (e.g. recently [1], [2]), it is better
when users are installing software which has been verified to have
been produced by a member of the Arrow PMC and signed by a GPG key in
our web of trust.

[1]: https://eslint.org/blog/2018/07/postmortem-for-malicious-package-publishes
[2]: https://blog.npmjs.org/post/180565383195/details-about-the-event-stream-incident

> > > >> > > On Wed, Mar 20, 2019, 5:00 PM Brian Hulette wrote:
> > > >> > >
> > > >> > > > I think that makes sense. I would really like to make JS part of the
> > > >> > > > mainstream releases, but we already have JS-0.4.1 ready to go [1] with
> > > >> > > > primarily bugfixes for JS-0.4.0. I think we should just cut that and
> > > >> > > > integrate JS in 0.14.
> > > >> > > >
> > > >> > > > [1] https://issues.apache.org/jira/projects/ARROW/versions/12344961
> > > >> > > >
> > > >> > > > On Wed, Mar 20, 2019 at 8:20 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > > >> > > >
> > > >> > > > > In light of the discussion on
> > > >> > > > > https://github.com/apache/arrow/pull/3630 I think we should wait until
> > > >> > > > > we have a "not broken" JavaScript-only release on NPM and have
> > > >> > > > > confidence that we can respond to the community's needs

Re: [R] Improving documentation and transparency for Arrow build and packaging work for R

2019-03-25 Thread Javier Luraschi
I signed up as "Javier Luraschi" with this email, if you could please
give me access that would be great. Thanks!

I'm assuming the CRAN documentation would go under:
https://cwiki.apache.org/confluence/display/ARROW/Distribution+Packages
I'll start adding it when I get access.

Yes, I mean https://github.com/apache/arrow/pull/3932.

Regarding "The challenge I see is that the development procedure is being
commingled with packaging issues.": yes, I agree! Let me send a PR to fix that
as well. If a developer properly sets up the RTools development environment,
they should not need to rely on rwinlibs.

Regarding "How would you suggest testing release", this would be
addressed by the previous comment. As in, there needs to be support
for building the RTools binaries locally. I'll work on this and follow up
with the PR/JIRA issue once it's ready.

Regarding "Seems like this should be turned into a Crossbow task", right;
however, I'm limited in time here. I'll open a JIRA issue to get some help
from the community. I see this as a nice-to-have rather than a must-have, but
I'll certainly add this to the Confluence docs.

Regarding "If there a way to simulate this environment locally?", yes, this
is called "R CMD check --as-cran". I'll add it to the Confluence docs as well.

Regarding "let's definitely copy this information into a page on the
wiki", for sure.

Regarding "Given how manual the process is right now it seems like
there's a solid chance that something will be broken after the 0.13": we
need more automation and maintainers who are used to building RTools
binaries, etc. So yeah, 0.13 will probably be rough, but we will have
to go through this and get better over time; I'm not sure we can automate
everything in a first release.

Yes, I'll reply to "can you reply on the 'Timeline for 0.13 release'".

I think pending docs and PR to decouple builds from release, this would
address most of these concerns, correct? Otherwise, let me know.

Regarding "can you reply on the Timeline for 0.13 release": replied, and
yes, I just marked the remaining JIRA issue as required for 0.13.

Best, Javier


On Mon, Mar 25, 2019 at 1:33 PM Wes McKinney  wrote:

> hi Javier,
>
> Thank you for writing back.
>
> On Mon, Mar 25, 2019 at 12:41 PM Javier Luraschi wrote:
> >
> > Hi Wes, sorry for the delay; I haven't been monitoring this DL proactively.
>
> Yes, I highly recommend setting up some e-mail filters so anything
> with "[R]" in the subject title lands in your inbox. You can also
> separate "[jira]" messages with a separate filter; there isn't very
> much list traffic if you split off the new issue notifications.
>
> >
> > Please note that I'm not the expert in this topic, so I'll share as much
> > information as I can, but others with more expertise should feel free to comment as well.
> > Please also note that some of the restrictions we have are common practices in
> > R packages that are out of our control, at least without significant
> > investment.
> >
> > I'll document what I know in this email, but please let me know if there is
> > a wiki or a better place to move this documentation into.
> >
>
> Yes, let's definitely stash all of the build and packaging information
> on our wiki at
>
> https://cwiki.apache.org/confluence/display/ARROW
>
> If you let me know your ASF Confluence username I will give you edit
> permissions
>
> > ## Background
> >
> > CRAN, the Comprehensive R Archive Network, is the most popular (primary)
> > package repository for the R community. You can think of CRAN as the R
> > equivalent of Homebrew or PyPI. CRAN encourages cross-platform packages to be
> > submitted and, to ease compilation and testing, provides support for
> > precompiled binaries for OS X and Windows. From now on we will focus on
> > Windows specifics.
> >
> > CRAN and R rely on a set of Mingw-based tools, known as RTools, to easily
> > compile packages on Windows. Originally, Prof. Brian Ripley and Duncan
> > Murdoch put this toolset together; however, Jeroen Ooms is its current
> > maintainer. RTools is based on Mingw but, from past experience, is not
> > completely interchangeable with the standard Mingw distribution. I'm afraid
> > I don't have the details, but this is mostly related to the specific
> > packages, versions and compilers included in RTools. It's possible to match
> > a Mingw environment with RTools, but this is, in general, not a
> > straightforward task.
>
> It would be good to have some links (on a wiki page) to any additional
> information about this.
>
> >
> > A few months ago, I naively tried to accomplish this work myself, as in, get
> > RTools to compile Apache Arrow; how hard can it be? It's hard to explain
> > all the caveats in a single mail, but if you are interested, you can read
> > my own exploration of possible solutions to this problem in this gist
> > writeup [1].
> >
> > The outcome of this investigation, at least for me and my limited knowledge,
> > was
> > 

Re: Timeline for 0.13 Arrow release

2019-03-25 Thread Javier Luraschi
From the [R] side, we ideally need this one merged:
https://github.com/apache/arrow/pull/4011; however
Romain is working on a follow up commit. If the
Apache Arrow project allows it, we can also release
in CRAN from this PR without merging, but I'm sure
we prefer to have everything checked-in when
pushing changes to package managers, like CRAN.

> > >> > > > > On Tue, Mar 19, 2019 at 11:24 PM Paul Taylor <ptay...@apache.org> wrote:
> > >> > > > > >
> > >> > > > > > I agree, the JS has matured a lot in the last few months. I think it's
> > >> > > > > > ready to join the regular Arrow releases. Let me know if I can help
> > >> > > > > > integrate the publish scripts :-)
> > >> > > > > >
> > >> > > > > > The two main things in progress are docs + Vector Builders, neither of
> > >> > > > > > which should block this release.
> > >> > > > > >
> > >> > > > > > We're going to try to get the docs/recipes ready for a PR this weekend.
> > >> > > > > > If that lands shortly after 0.13.0 goes out, would it be possible to
> > >> > > > > > update the website independently, or would that need to wait until 0.14?
> > >> > > > > >
> > >> > > > > > Paul
> > >> > > > > >
> > >> > > > > > On 3/19/19 10:08 AM, Wes McKinney wrote:
> > >> > > > > > > I'm in favor of including JS in the 0.13.0 release.
> > >> > > > > > >
> > >> > > > > > > I'm going to try to fix a couple of the Python Parquet bugs until the
> > >> > > > > > > RC is ready to be cut, but none of them need block the release.
> > >> > > > > > >
> > >> > > > > > > Seems like we need someone else to volunteer to be the RM for 0.13 if
> > >> > > > > > > Uwe is unavailable next week. Antoine -- are you possibly up for it
> > >> > > > > > > (the initial setup will be a bit painful)? I don't have access to a
> > >> > > > > > > machine with my code signing key on it until next week so I cannot do
> > >> > > > > > > it
> > >> > > > > > >
> > >> > > > > > > - Wes
> > >> > > > > > >
> > >> > > > > > > On Tue, Mar 19, 2019 at 9:46 AM Kouhei Sutou <k...@clear-code.com> wrote:
> > >> > > > > > >> Hi,

Re: tensorflow-io Arrow Datasets and thoughts on support for tensor columns

2019-03-25 Thread Wes McKinney
hi Bryan,

I agree this would be useful to work out.

There are a few options:

* Sending multiple tensors as a sequence of encapsulated IPC messages
(as described in
https://github.com/apache/arrow/blob/master/docs/source/format/IPC.rst).
Nothing in the columnar streaming protocol conflicts with this.
* Embedding tensors in BinaryArray columns in some way (e.g. as an
ExtensionType, which we have now in C++)
* Adding Tensor as a logical type (this is essentially ARROW-1614)
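A minimal, stdlib-only Python sketch of the first option, under loud assumptions: the framing below (an int32 header length, an int32 `ndim` plus int64 dimensions, an int64 body length, then row-major float64 values) is invented for illustration and is not Arrow's IPC message format, which uses Flatbuffers metadata and alignment padding. The helper names `encode_tensor` and `decode_tensors` are hypothetical. It only shows why a sequence of self-describing tensor messages can share one byte stream; in pyarrow, the real format is written by `pyarrow.ipc.write_tensor` and read back with `pyarrow.ipc.read_tensor`.

```python
import struct
from typing import Iterator, List, Sequence, Tuple

# Toy framing (hypothetical, NOT Arrow's IPC wire format), little-endian:
#   <int32 header length><header: int32 ndim, int64 dims...>
#   <int64 body length><body: float64 values, row-major>


def encode_tensor(shape: Sequence[int], values: Sequence[float]) -> bytes:
    """Frame one dense float64 tensor as a self-contained message."""
    count = 1
    for dim in shape:
        count *= dim
    if count != len(values):
        raise ValueError("shape does not match the number of values")
    header = struct.pack(f"<i{len(shape)}q", len(shape), *shape)
    body = struct.pack(f"<{count}d", *values)
    return (struct.pack("<i", len(header)) + header
            + struct.pack("<q", len(body)) + body)


def decode_tensors(buf: bytes) -> Iterator[Tuple[Tuple[int, ...], List[float]]]:
    """Yield (shape, values) for each framed tensor in buf, in order."""
    pos = 0
    while pos < len(buf):
        (header_len,) = struct.unpack_from("<i", buf, pos)
        pos += 4
        (ndim,) = struct.unpack_from("<i", buf, pos)
        shape = struct.unpack_from(f"<{ndim}q", buf, pos + 4)
        pos += header_len
        (body_len,) = struct.unpack_from("<q", buf, pos)
        pos += 8
        values = list(struct.unpack_from(f"<{body_len // 8}d", buf, pos))
        pos += body_len
        yield tuple(shape), values


# Two tensors of different shapes sent back to back on one byte stream.
stream = encode_tensor((2, 3), [0, 1, 2, 3, 4, 5]) + encode_tensor((2,), [7.5, 8.5])
for shape, values in decode_tensors(stream):
    print(shape, values)
```

Because each message carries its own shape and byte lengths, the reader needs no schema shared across messages; the tensors can differ in shape from one message to the next.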

I would like to understand the use cases more precisely. Perhaps you
can write a design document that describes the use cases in detail and
a proposed solution? This doesn't fall anywhere on my list of 2019
priorities, but I'm happy to give feedback on discussions and review
PRs where relevant.

In conjunction with embedding sequences of tensors in a BinaryArray,
we would probably need to first develop a LargeBinaryArray with 64-bit
offsets, so that buffers can be arbitrarily large (well, within the 64-bit
address space at least).
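The offset-width concern comes down to simple arithmetic. A sketch, assuming the standard Arrow variable-length binary layout (n + 1 signed offsets indexing one shared data buffer, with value i stored at `data[offsets[i]:offsets[i + 1]]`); the helper `compute_offsets` and the workload of 4000 float32 tensors of shape 224x224x3 are hypothetical, chosen only to show where 32-bit offsets run out.

```python
from itertools import accumulate
from typing import List, Sequence

INT32_MAX = 2**31 - 1  # Binary/String arrays use signed 32-bit offsets
INT64_MAX = 2**63 - 1  # a 64-bit-offset ("LargeBinary"-style) variant


def compute_offsets(value_sizes: Sequence[int], offset_bits: int = 32) -> List[int]:
    """Offsets for a variable-length binary column.

    Value i occupies data[offsets[i]:offsets[i + 1]] of one shared data
    buffer, so the final offset equals the buffer's total size and must be
    representable as a signed integer of the chosen width.
    """
    limit = {32: INT32_MAX, 64: INT64_MAX}[offset_bits]
    offsets = [0, *accumulate(value_sizes)]
    if offsets[-1] > limit:
        raise OverflowError(
            f"{offsets[-1]} data bytes overflow {offset_bits}-bit offsets")
    return offsets


# Hypothetical workload: 4000 float32 tensors of shape 224x224x3, each
# serialized into one binary cell of 224 * 224 * 3 * 4 = 602112 bytes.
cell_size = 224 * 224 * 3 * 4
sizes = [cell_size] * 4000

try:
    compute_offsets(sizes, offset_bits=32)
except OverflowError as exc:
    print("32-bit:", exc)
print("64-bit total:", compute_offsets(sizes, offset_bits=64)[-1])
```

With 32-bit offsets the whole column's data buffer is capped at 2^31 - 1 bytes (about 2 GiB), so at most 3,566 cells of this size fit in one array; 64-bit offsets remove that cap for any realistic buffer.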

- Wes

On Fri, Mar 22, 2019 at 1:24 PM Bryan Cutler  wrote:
>
> Hi All,
>
> Recently I have been working with the TensorFlow SIG-IO community to
> introduce Apache Arrow-based Datasets for bringing Arrow data into
> TensorFlow. SIG-IO is a community-maintained repository focused on
> input/output support for TF, see https://github.com/tensorflow/io (a lot of
> formats from contrib/ ended up here). Since it is community driven, if
> anyone is interested, participation is highly encouraged!
>
> I'm bringing this up for a couple of reasons. First, I want to make sure that
> this stays in line with any related efforts within the Arrow project and
> welcome any feedback. Secondly, the initial response has been great and
> people are excited about using Arrow and looking to use it in other areas of
> TF, but I've noticed there has been some confusion about how Arrow handles
> tensor data. Specifically, it is often assumed that tensors can be part of a
> RecordBatch and can be readily used in an Arrow stream.
>
> I know we have talked about making tensors a logical type for columnar data
> before in
> https://lists.apache.org/thread.html/6cc86d50d92dbd21d6fc34e34485afb3cab4956fbc0d61ff9b99ea27@%3Cdev.arrow.apache.org%3E
> and there is a JIRA ARROW-1614, but since there is work needed to fully
> support the current spec for 1.0, I don't think it has moved forward much.
> I'm wondering if maybe now is a better time to start working on this? I
> think having built-in support for tensor columns would really help to
> increase adoption of Arrow in frameworks that use tensor data. What are other
> people's thoughts?
>
> Best Regards,
> Bryan
>


Re: Timeline for 0.13 Arrow release

2019-03-25 Thread Krisztián Szűcs
Hey,

I'll gladly help, Kou. You've postponed the packaging issues,
but at least the wheel builds must pass; I'm still working on it.

- K

On Mon, Mar 25, 2019 at 10:03 PM Kouhei Sutou  wrote:

> Hi,
>
> It seems that nobody is available to be the release manager.
> I'll be the release manager for 0.13.0, but I don't have
> enough time. Could someone help me?
>
> Thanks,
> --
> kou
>
> In 
>   "Re: Timeline for 0.13 Arrow release" on Mon, 25 Mar 2019 15:37:46 +0100,
>   Krisztián Szűcs  wrote:
>
> > On Mon, Mar 25, 2019 at 3:21 PM Wes McKinney wrote:
> >
> >> If Gandiva is causing packaging problems for 0.13 as a result of the CMake
> >> refactor, I suggest that we drop it from packages and plan to resolve for
> >> 0.14. So I think we should set a pretty strict time box for resolution of
> >> these issues (e.g., end of day Tuesday).
> >>
> > Sounds good to me. Do we have a release manager?
> >
> >>
> >> On Mon, Mar 25, 2019, 2:12 PM Krisztián Szűcs <szucs.kriszt...@gmail.com> wrote:
> >>
> >> > Hello all,
> >> >
> >> > I'm working on the conda and wheel builds:
> >> > https://github.com/apache/arrow/pull/3832
> >> > https://github.com/apache/arrow/pull/4024
> >> > These must pass before we can cut the release.
> >> >
> >> >
> >> > On Mon, Mar 25, 2019 at 2:08 PM Wes McKinney wrote:
> >> >
> >> > > Hi folks,
> >> > >
> >> > > I think we should close the 0.13 backlog and try to get an RC0 out ASAP.
> >> > > What work must get done before that happens?
> >> > >
> >> > > I intend to sort out ARROW-4872 today.
> >> > >
> >> > > Thanks
> >> > > Wes
> >> > >
> >> > > On Wed, Mar 20, 2019, 5:00 PM Brian Hulette wrote:
> >> > >
> >> > > > I think that makes sense. I would really like to make JS part of the
> >> > > > mainstream releases, but we already have JS-0.4.1 ready to go [1] with
> >> > > > primarily bugfixes for JS-0.4.0. I think we should just cut that and
> >> > > > integrate JS in 0.14.
> >> > > >
> >> > > > [1] https://issues.apache.org/jira/projects/ARROW/versions/12344961
> >> > > >
> >> > > > On Wed, Mar 20, 2019 at 8:20 AM Wes McKinney wrote:
> >> > > >
> >> > > > > In light of the discussion on
> >> > > > > https://github.com/apache/arrow/pull/3630 I think we should wait until
> >> > > > > we have a "not broken" JavaScript-only release on NPM and have
> >> > > > > confidence that we can respond to the community's needs.
> >> > > > >
> >> > > > > On Tue, Mar 19, 2019 at 11:24 PM Paul Taylor <ptay...@apache.org> wrote:
> >> > > > > >
> >> > > > > > I agree, the JS has matured a lot in the last few months. I think it's
> >> > > > > > ready to join the regular Arrow releases. Let me know if I can help
> >> > > > > > integrate the publish scripts :-)
> >> > > > > >
> >> > > > > > The two main things in progress are docs + Vector Builders, neither of
> >> > > > > > which should block this release.
> >> > > > > >
> >> > > > > > We're going to try to get the docs/recipes ready for a PR this weekend.
> >> > > > > > If that lands shortly after 0.13.0 goes out, would it be possible to
> >> > > > > > update the website independently, or would that need to wait until 0.14?
> >> > > > > >
> >> > > > > > Paul
> >> > > > > >
> >> > > > > > On 3/19/19 10:08 AM, Wes McKinney wrote:
> >> > > > > > > I'm in favor of including JS in the 0.13.0 release.
> >> > > > > > >
> >> > > > > > > I'm going to try to fix a couple of the Python Parquet bugs until the
> >> > > > > > > RC is ready to be cut, but none of them need block the release.
> >> > > > > > >
> >> > > > > > > Seems like we need someone else to volunteer to be the RM for 0.13 if
> >> > > > > > > Uwe is unavailable next week. Antoine -- are you possibly up for it
> >> > > > > > > (the initial setup will be a bit painful)? I don't have access to a
> >> > > > > > > machine with my code signing key on it until next week so I cannot do
> >> > > > > > > it.
> >> > > > > > >
> >> > > > > > > - Wes
> >> > > > > > >
> >> > > > > > > On Tue, Mar 19, 2019 at 9:46 AM Kouhei Sutou <k...@clear-code.com> wrote:
> >> > > > > > >> Hi,
> >> > > > > > >>
> >> > > > > > >> There are no blockers on GLib, Ruby and Linux packages.
> >> > > > > > >>
> >> > > > > > >> Can we include JavaScript in 0.13.0?
> >> > > > > > >> If we include JavaScript in 0.13.0, we can remove the
> >> > > > > > >> code used to release JavaScript separately. For example, we can
> >> > > > > > >> remove dev/release/js-*. We can enable the version update code
> >> > > > > > >> in dev/release/00-prepare.sh:
> >> > > > > > >>
> >> > > > > > >> https://github.com/apache/arrow/blob/master/dev/release/00-prepare.sh#L67-L74
> >> > > > > > >>
> >> > > > > > >> We can merge the "JavaScript Releases" document into our release
> >> > > > > > >> document:
> >> > > > > > >>
> >> > > >

Re: Timeline for 0.13 Arrow release

2019-03-25 Thread Kouhei Sutou
Hi,

It seems that nobody is available to be the release manager.
I'll be the release manager for 0.13.0, but I don't have
enough time. Could someone help me?

Thanks,
--
kou

In 
  "Re: Timeline for 0.13 Arrow release" on Mon, 25 Mar 2019 15:37:46 +0100,
  Krisztián Szűcs  wrote:

> On Mon, Mar 25, 2019 at 3:21 PM Wes McKinney wrote:
>
>> If Gandiva is causing packaging problems for 0.13 as a result of the CMake
>> refactor, I suggest that we drop it from packages and plan to resolve for
>> 0.14. So I think we should set a pretty strict time box for resolution of
>> these issues (e.g., end of day Tuesday).
>>
> Sounds good to me. Do we have a release manager?
>
>>
>> On Mon, Mar 25, 2019, 2:12 PM Krisztián Szűcs wrote:
>>
>> > Hello all,
>> >
>> > I'm working on the conda and wheel builds:
>> > https://github.com/apache/arrow/pull/3832
>> > https://github.com/apache/arrow/pull/4024
>> > These must pass before we can cut the release.
>> >
>> >
>> > On Mon, Mar 25, 2019 at 2:08 PM Wes McKinney wrote:
>> >
>> > > Hi folks,
>> > >
>> > > I think we should close the 0.13 backlog and try to get an RC0 out ASAP.
>> > > What work must get done before that happens?
>> > >
>> > > I intend to sort out ARROW-4872 today.
>> > >
>> > > Thanks
>> > > Wes
>> > >
>> > > On Wed, Mar 20, 2019, 5:00 PM Brian Hulette wrote:
>> > >
>> > > > I think that makes sense. I would really like to make JS part of the
>> > > > mainstream releases, but we already have JS-0.4.1 ready to go [1] with
>> > > > primarily bugfixes for JS-0.4.0. I think we should just cut that and
>> > > > integrate JS in 0.14.
>> > > >
>> > > > [1] https://issues.apache.org/jira/projects/ARROW/versions/12344961
>> > > >
>> > > > On Wed, Mar 20, 2019 at 8:20 AM Wes McKinney wrote:
>> > > >
>> > > > > In light of the discussion on
>> > > > > https://github.com/apache/arrow/pull/3630 I think we should wait until
>> > > > > we have a "not broken" JavaScript-only release on NPM and have
>> > > > > confidence that we can respond to the community's needs.
>> > > > >
>> > > > > On Tue, Mar 19, 2019 at 11:24 PM Paul Taylor wrote:
>> > > > > >
>> > > > > > I agree, the JS has matured a lot in the last few months. I think it's
>> > > > > > ready to join the regular Arrow releases. Let me know if I can help
>> > > > > > integrate the publish scripts :-)
>> > > > > >
>> > > > > > The two main things in progress are docs + Vector Builders, neither of
>> > > > > > which should block this release.
>> > > > > >
>> > > > > > We're going to try to get the docs/recipes ready for a PR this weekend.
>> > > > > > If that lands shortly after 0.13.0 goes out, would it be possible to
>> > > > > > update the website independently, or would that need to wait until 0.14?
>> > > > > >
>> > > > > > Paul
>> > > > > >
>> > > > > > On 3/19/19 10:08 AM, Wes McKinney wrote:
>> > > > > > > I'm in favor of including JS in the 0.13.0 release.
>> > > > > > >
>> > > > > > > I'm going to try to fix a couple of the Python Parquet bugs until the
>> > > > > > > RC is ready to be cut, but none of them need block the release.
>> > > > > > >
>> > > > > > > Seems like we need someone else to volunteer to be the RM for 0.13 if
>> > > > > > > Uwe is unavailable next week. Antoine -- are you possibly up for it
>> > > > > > > (the initial setup will be a bit painful)? I don't have access to a
>> > > > > > > machine with my code signing key on it until next week so I cannot do
>> > > > > > > it.
>> > > > > > >
>> > > > > > > - Wes
>> > > > > > >
>> > > > > > > On Tue, Mar 19, 2019 at 9:46 AM Kouhei Sutou <k...@clear-code.com> wrote:
>> > > > > > >> Hi,
>> > > > > > >>
>> > > > > > >> There are no blockers on GLib, Ruby and Linux packages.
>> > > > > > >>
>> > > > > > >> Can we include JavaScript in 0.13.0?
>> > > > > > >> If we include JavaScript in 0.13.0, we can remove the
>> > > > > > >> code used to release JavaScript separately. For example, we can
>> > > > > > >> remove dev/release/js-*. We can enable the version update code
>> > > > > > >> in dev/release/00-prepare.sh:
>> > > > > > >>
>> > > > > > >> https://github.com/apache/arrow/blob/master/dev/release/00-prepare.sh#L67-L74
>> > > > > > >>
>> > > > > > >> We can merge the "JavaScript Releases" document into our release
>> > > > > > >> document:
>> > > > > > >>
>> > > > > > >> https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide#ReleaseManagementGuide-JavaScriptReleases
>> > > > > > >>
>> > > > > > >>
>> > > > > > >> Thanks,
>> > > > > > >> --
>> > > > > > >> kou
>> > > > > > >>
>> > > > > > >> In <cajpuwmbgjzbwrwybwse6bd9lnn_7xozn_aq2job9_mpvmhc...@mail.gmail.com>
>> > > > > > >>   "Re: Timeline for 0.13 Arrow release" on Mon, 18 Mar 2019 20:51:12 -0500,
>> > > > > > >>   Wes McKinney wrote:
>> > > > > > >>
>> > > > > > >>> hi folks,
>>

Re: [R] Improving documentation and transparency for Arrow build and packaging work for R

2019-03-25 Thread Wes McKinney
hi Javier,

Thank you for writing back.

On Mon, Mar 25, 2019 at 12:41 PM Javier Luraschi  wrote:
>
> Hi Wes, sorry for the delay; I haven't been monitoring this DL proactively.

Yes, I highly recommend setting up some e-mail filters so anything
with "[R]" in the subject title lands in your inbox. You can also
separate "[jira]" messages with a separate filter; there isn't very
much list traffic if you split off the new issue notifications.

>
> Please note that I'm not an expert on this topic, so I'll share as much
> information as I can, but others with more expertise should feel free to
> comment as well. Please also note that some of the restrictions we have are
> common practices in R packages that are out of our control, at least without
> significant investment.
>
> I'll document what I know in this email, but please let me know if there is
> a wiki or a better place to move this documentation into.
>

Yes, let's definitely stash all of the build and packaging information
on our wiki at

https://cwiki.apache.org/confluence/display/ARROW

If you let me know your ASF Confluence username, I will give you edit permissions.

> ## Background
>
> CRAN, the Comprehensive R Archive Network, is the most popular (primary)
> package repo for the R community. You can think of CRAN as Homebrew or
> PyPI. CRAN encourages the submission of cross-platform packages and, to
> ease compilation and testing, precompiles binaries for OS X and Windows.
> We will focus on Windows specifics from here on.
>
> CRAN and R rely on a set of tools based on MinGW to easily compile packages
> on Windows; this toolset is known as RTools. Originally, Prof. Brian Ripley
> and Duncan Murdoch put this toolset together; however, Jeroen Ooms is its
> current maintainer. RTools is based on MinGW but, from past experience, is
> not completely interchangeable with the standard MinGW distribution. I'm
> afraid I don't have the details, but this is mostly related to the specific
> packages, versions and compilers included in RTools. It's possible to match
> a MinGW environment with RTools, but this is, in general, not a
> straightforward task.

It would be good to have some links (on a wiki page) to any additional
information about this.

>
> A few months ago, I naively tried to accomplish this work myself: get
> RTools to compile Apache Arrow, how hard can it be? It's hard to explain
> all the caveats in a single mail, but if you are interested, you can read
> about my own exploration of possible solutions to this problem in this gist
> writeup [1].
>
> The outcome of this investigation, at least for me and my limited knowledge,
> was not to try to do this on my own by reinventing the wheel; otherwise, it
> would have taken months of my own time. The solution was then to find out
> how other R packages have solved this problem in the past.
>
> Given the specifics of the RTools toolchain, for complex projects with a
> significant number of components and dependencies, the best (and maybe
> only!) way to get R packages into CRAN on Windows is to precompile the
> binaries outside of the CRAN build process. The repo of precompiled packages
> is called rwinlibs [2] and has 75 packages and growing. When the package is
> compiled on CRAN, rather than building the library, it simply downloads it
> from the rwinlibs repo.
>
> How, then, are the rwinlibs libraries built? All the packages are built
> through an automated build system available in the rtools-packages [3]
> repo, where an AppVeyor script detects changes and builds the appropriate
> libraries. This repo runs with the latest RTools toolchain. To support
> previous versions of R/RTools, the rtools-backports [4] repo provides
> backward compatibility in an automated way.
>
> So now we can get back to discussing how we want to make this work in the
> Arrow project. One way, which this PR encourages, is to say "Let's not worry
> about what the R/CRAN publishing process is; they have their own processes
> and tools to build binaries for Windows. This is similar to brew formulae:
> the formula that builds Arrow for OS X using Homebrew is in a different
> repo [5]".

When you say "this PR" you mean

https://github.com/apache/arrow/pull/4011

or

https://github.com/apache/arrow/pull/3932

The challenge I see is that the development procedure is being
commingled with packaging issues. I would like to see a write-up to
provide instructions for an Arrow developer to create a build of Arrow
on the master branch using mingw/Rtools for the purposes of
development. If we don't have this written down, this is putting us in
a potentially very bad situation where developers cannot debug issues.
I think it's fine if all of the other C++ dependencies are snapshotted
in rwinlibs.

>
> While splitting the release processes into multiple repos has some
> advantages, it certainly has some caveats. For instance, when publishing a
> new release of Arrow in Homebrew, one needs to m

[jira] [Created] (ARROW-5007) [C++] Move DCHECK out of sse-utils

2019-03-25 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5007:
-

 Summary: [C++] Move DCHECK out of sse-utils 
 Key: ARROW-5007
 URL: https://issues.apache.org/jira/browse/ARROW-5007
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques


Some users tried to compile Arrow on ppc64 but hit the following errors:

{code:bash}
In file included from /root/repos/arrow/cpp/src/arrow/json/chunker.h:26:0,
 from /root/repos/arrow/cpp/src/arrow/json/chunker.cc:18:
/root/repos/arrow/cpp/src/arrow/util/sse-util.h: In function ‘__m128i arrow::SSE4_cmpestrm(__m128i, int, __m128i, int)’:
/root/repos/arrow/cpp/src/arrow/util/sse-util.h:125:3: error: there are no arguments to ‘DCHECK’ that depend on a template parameter, so a declaration of ‘DCHECK’ must be available [-fpermissive]
   DCHECK(false) << "CPU doesn't support SSE 4.2";
   ^~
/root/repos/arrow/cpp/src/arrow/util/sse-util.h:125:3: note: (if you use ‘-fpermissive’, G++ will accept your code, but allowing the use of an undeclared name is deprecated)
/root/repos/arrow/cpp/src/arrow/util/sse-util.h: In function ‘int arrow::SSE4_cmpestri(__m128i, int, __m128i, int)’:
/root/repos/arrow/cpp/src/arrow/util/sse-util.h:131:3: error: there are no arguments to ‘DCHECK’ that depend on a template parameter, so a declaration of ‘DCHECK’ must be available [-fpermissive]
   DCHECK(false) << "CPU doesn't support SSE 4.2";
   ^~
/root/repos/arrow/cpp/src/arrow/util/sse-util.h: In function ‘uint32_t arrow::SSE4_crc32_u8(uint32_t, uint8_t)’:
/root/repos/arrow/cpp/src/arrow/util/sse-util.h:136:3: error: ‘DCHECK’ was not declared in this scope
   DCHECK(false) << "SSE support is not enabled";
   ^~
/root/repos/arrow/cpp/src/arrow/util/sse-util.h: In function ‘uint32_t arrow::SSE4_crc32_u16(uint32_t, uint16_t)’:
/root/repos/arrow/cpp/src/arrow/util/sse-util.h:141:3: error: ‘DCHECK’ was not declared in this scope
   DCHECK(false) << "SSE support is not enabled";
   ^~
/root/repos/arrow/cpp/src/arrow/util/sse-util.h: In function ‘uint32_t arrow::SSE4_crc32_u32(uint32_t, uint32_t)’:
/root/repos/arrow/cpp/src/arrow/util/sse-util.h:146:3: error: ‘DCHECK’ was not declared in this scope
   DCHECK(false) << "SSE support is not enabled";
   ^~
/root/repos/arrow/cpp/src/arrow/util/sse-util.h: In function ‘uint32_t arrow::SSE4_crc32_u64(uint32_t, uint64_t)’:
/root/repos/arrow/cpp/src/arrow/util/sse-util.h:151:3: error: ‘DCHECK’ was not declared in this scope
   DCHECK(false) << "SSE support is not enabled";
{code}

By including `logging.h` or removing `DCHECK`, they can compile. The fix should
be to move the SSE detection macros out of this file into a separate header, so
that code which only needs the detection macros can include that header instead
of this file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [R] Improving documentation and transparency for Arrow build and packaging work for R

2019-03-25 Thread Javier Luraschi
Hi Wes, sorry for the delay; I haven't been monitoring this DL proactively.

Please note that I'm not an expert on this topic, so I'll share as much
information as I can, but others with more expertise should feel free to
comment as well. Please also note that some of the restrictions we have are
common practices in R packages that are out of our control, at least without
significant investment.

I'll document what I know in this email, but please let me know if there is
a wiki or a better place to move this documentation into.

## Background

CRAN, the Comprehensive R Archive Network, is the most popular (primary)
package repo for the R community. You can think of CRAN as Homebrew or
PyPI. CRAN encourages the submission of cross-platform packages and, to
ease compilation and testing, precompiles binaries for OS X and Windows.
We will focus on Windows specifics from here on.

CRAN and R rely on a set of tools based on MinGW to easily compile packages
on Windows; this toolset is known as RTools. Originally, Prof. Brian Ripley
and Duncan Murdoch put this toolset together; however, Jeroen Ooms is its
current maintainer. RTools is based on MinGW but, from past experience, is
not completely interchangeable with the standard MinGW distribution. I'm
afraid I don't have the details, but this is mostly related to the specific
packages, versions and compilers included in RTools. It's possible to match
a MinGW environment with RTools, but this is, in general, not a
straightforward task.

A few months ago, I naively tried to accomplish this work myself: get
RTools to compile Apache Arrow, how hard can it be? It's hard to explain
all the caveats in a single mail, but if you are interested, you can read
about my own exploration of possible solutions to this problem in this gist
writeup [1].

The outcome of this investigation, at least for me and my limited knowledge,
was not to try to do this on my own by reinventing the wheel; otherwise, it
would have taken months of my own time. The solution was then to find out
how other R packages have solved this problem in the past.

Given the specifics of the RTools toolchain, for complex projects with a
significant number of components and dependencies, the best (and maybe
only!) way to get R packages into CRAN on Windows is to precompile the
binaries outside of the CRAN build process. The repo of precompiled packages
is called rwinlibs [2] and has 75 packages and growing. When the package is
compiled on CRAN, rather than building the library, it simply downloads it
from the rwinlibs repo.

How, then, are the rwinlibs libraries built? All the packages are built
through an automated build system available in the rtools-packages [3]
repo, where an AppVeyor script detects changes and builds the appropriate
libraries. This repo runs with the latest RTools toolchain. To support
previous versions of R/RTools, the rtools-backports [4] repo provides
backward compatibility in an automated way.

So now we can get back to discussing how we want to make this work in the
Arrow project. One way, which this PR encourages, is to say "Let's not worry
about what the R/CRAN publishing process is; they have their own processes
and tools to build binaries for Windows. This is similar to brew formulae:
the formula that builds Arrow for OS X using Homebrew is in a different
repo [5]".

While splitting the release processes into multiple repos has some
advantages, it certainly has some caveats. For instance, when publishing a
new release of Arrow in Homebrew, one needs to manually go and update the
Homebrew formula.

That said, I would hope that the Homebrew release process is documented in
the Arrow project, in the same way that we should document the R release
process in the Arrow project. Hopefully this mail helps build a first
iteration of this.

## Releasing

These instructions are a bit more pragmatic as to what needs to be done to
release the R package on CRAN:

(1) Send a PR to rtools-packages [3] that increments the version; the repo
    already downloads the binaries from the Arrow GitHub project. Ensure
    that the AppVeyor build succeeds. If the build or tests fail, send the
    appropriate PR to the official Arrow repo.
(2) Send a PR to rtools-backports [4]; this is similar to (1) but in a
    different repo.
(3) Copy the output produced by (1) and (2) as a PR to the rwinlib/arrow
    [6] repo.
(4) Before merging (3), validate that CRAN can build and test using the new
    library with the winbuilder service [7]. This service is maintained by
    CRAN and allows you to pre-check that a package builds properly on a
    CRAN-like build machine for Windows.
(5) Submit the package to CRAN, making sure their practices and processes
    are followed [8].

While I did my best to document the steps, there are certainly more details
that can be added over time. Regardless, feel free to reach out to me with
questions, support requests and whatnot, and I'll try my best to address
them.

Best, Javier

[1]: https://gist.git

Arrow committers who wish to mentor GSoC 2019 projects?

2019-03-25 Thread Wes McKinney
hi folks,

The ASF has been accepted into GSoC 2019, so any of our 41 committers
can be a mentor for GSoC students. See below for instructions on how to
become a mentor and how to record project ideas for GSoC students.

Thanks
Wes

-- Forwarded message -
From: Ulrich Stärk 
Date: Fri, Mar 8, 2019 at 1:49 PM
Subject: Google Summer of Code 2019 Mentor Registration
To: 
Cc: d...@community.apache.org 


Dear PMCs,

I'm happy to announce that the ASF has made it onto the list of
accepted organizations for Google Summer of Code 2019! [1,2]

It is now time for mentors to sign up, so please pass this email on to
your community and podlings. If you aren’t already subscribed to
ment...@community.apache.org you should do so now, else you might miss
important information.

Mentor signup requires two steps: mentor signup in Google's system [3]
and PMC acknowledgement.

If you want to mentor a project in this year's SoC you will have to:

1. Be an Apache committer.
2. Request an acknowledgement from the PMC for which you want to
mentor projects. Use the template below and *do not forget to copy
ment...@community.apache.org*. We will use the email address you
indicate to send the invite to be a mentor for Apache.

PMCs, read carefully please.

We request that each mentor is acknowledged by a PMC member. This is
to ensure the mentor is in good standing with the community. When you
receive a request for acknowledgement, please ACK it and cc
ment...@community.apache.org.

Lastly, it is not yet too late to record your ideas in Jira (see
previous emails for details). Students will now begin to explore ideas,
so if you haven’t already done so, record your ideas immediately!

Cheers,

The Apache GSoC Team

mentor request email template:

to: private@.apache.org
cc: ment...@community.apache.org
subject: GSoC 2019 mentor request for 

 PMC,

please acknowledge my request to become a mentor for Google Summer of
Code 2019 projects for Apache
.

I would like to receive the mentor invite to 





[1] https://summerofcode.withgoogle.com/organizations/
[2] https://summerofcode.withgoogle.com/organizations/6614885824200704/
[3] https://summerofcode.withgoogle.com/


Re: [Python] The next manylinux specification

2019-03-25 Thread Wes McKinney
Thanks Antoine for alerting us to this thread. It's important that our
interests are represented in this discussion given the problems we've
had with interactions with the TensorFlow and PyTorch wheels. Please
let me know if I can help.

Robert and Philipp, can you keep an eye out on this also?

On Fri, Mar 22, 2019 at 4:12 PM Antoine Pitrou  wrote:
>
>
> For those who are interested in discussing it:
>
> https://discuss.python.org/t/the-next-manylinux-specification/1043
>
> Regards
>
> Antoine.


[jira] [Created] (ARROW-5006) [R] parquet.cpp does not include enough Rcpp

2019-03-25 Thread JIRA
Romain François created ARROW-5006:
--

 Summary: [R] parquet.cpp does not include enough Rcpp
 Key: ARROW-5006
 URL: https://issues.apache.org/jira/browse/ARROW-5006
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Romain François


Getting this warning when compiling parquet.cpp with `-Wall`:

 
{quote}/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include/Rcpp/protection/Armor.h:38:23: warning: inline function 'Rcpp::Armor::operator=' is not defined [-Wundefined-inline]
 inline Armor& operator=( const U& x ) ;
 ^
/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include/Rcpp/r_cast.h:34:21: note: used here
 res = Rcpp_fast_eval(Rf_lang2(funSym, x), R_GlobalEnv);
 ^
{quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Timeline for 0.13 Arrow release

2019-03-25 Thread Krisztián Szűcs
On Mon, Mar 25, 2019 at 3:21 PM Wes McKinney  wrote:

> If Gandiva is causing packaging problems for 0.13 as a result of the CMake
> refactor, I suggest that we drop it from packages and plan to resolve for
> 0.14. So I think we should set a pretty strict time box for resolution of
> these issues (e.g., end of day Tuesday).
>
Sounds good to me. Do we have a release manager?

>
> On Mon, Mar 25, 2019, 2:12 PM Krisztián Szűcs wrote:
>
> > Hello all,
> >
> > I'm working on the conda and wheel builds:
> > https://github.com/apache/arrow/pull/3832
> > https://github.com/apache/arrow/pull/4024
> > These must pass before we can cut the release.
> >
> >
> > On Mon, Mar 25, 2019 at 2:08 PM Wes McKinney wrote:
> >
> > > Hi folks,
> > >
> > > I think we should close the 0.13 backlog and try to get an RC0 out ASAP.
> > > What work must get done before that happens?
> > >
> > > I intend to sort out ARROW-4872 today.
> > >
> > > Thanks
> > > Wes
> > >
> > > On Wed, Mar 20, 2019, 5:00 PM Brian Hulette wrote:
> > >
> > > > I think that makes sense. I would really like to make JS part of the
> > > > mainstream releases, but we already have JS-0.4.1 ready to go [1] with
> > > > primarily bugfixes for JS-0.4.0. I think we should just cut that and
> > > > integrate JS in 0.14.
> > > >
> > > > [1] https://issues.apache.org/jira/projects/ARROW/versions/12344961
> > > >
> > > > On Wed, Mar 20, 2019 at 8:20 AM Wes McKinney wrote:
> > > >
> > > > > In light of the discussion on
> > > > > https://github.com/apache/arrow/pull/3630 I think we should wait until
> > > > > we have a "not broken" JavaScript-only release on NPM and have
> > > > > confidence that we can respond to the community's needs.
> > > > >
> > > > > On Tue, Mar 19, 2019 at 11:24 PM Paul Taylor wrote:
> > > > > >
> > > > > > I agree, the JS has matured a lot in the last few months. I think it's
> > > > > > ready to join the regular Arrow releases. Let me know if I can help
> > > > > > integrate the publish scripts :-)
> > > > > >
> > > > > > The two main things in progress are docs + Vector Builders, neither of
> > > > > > which should block this release.
> > > > > >
> > > > > > We're going to try to get the docs/recipes ready for a PR this weekend.
> > > > > > If that lands shortly after 0.13.0 goes out, would it be possible to
> > > > > > update the website independently, or would that need to wait until 0.14?
> > > > > >
> > > > > > Paul
> > > > > >
> > > > > > On 3/19/19 10:08 AM, Wes McKinney wrote:
> > > > > > > I'm in favor of including JS in the 0.13.0 release.
> > > > > > >
> > > > > > > I'm going to try to fix a couple of the Python Parquet bugs until the
> > > > > > > RC is ready to be cut, but none of them need block the release.
> > > > > > >
> > > > > > > Seems like we need someone else to volunteer to be the RM for 0.13 if
> > > > > > > Uwe is unavailable next week. Antoine -- are you possibly up for it
> > > > > > > (the initial setup will be a bit painful)? I don't have access to a
> > > > > > > machine with my code signing key on it until next week so I cannot do
> > > > > > > it.
> > > > > > >
> > > > > > > - Wes
> > > > > > >
> > > > > > > On Tue, Mar 19, 2019 at 9:46 AM Kouhei Sutou <k...@clear-code.com> wrote:
> > > > > > >> Hi,
> > > > > > >>
> > > > > > >> There are no blockers on GLib, Ruby and Linux packages.
> > > > > > >>
> > > > > > >> Can we include JavaScript into 0.13.0?
> > > > > > >> If we include JavaScript into 0.13.0, we can remove
> > > > > > >> codes to release JavaScript separately. For example, we can
> > > > > > >> remove dev/release/js-*. We can enable version update code
> > > > > > >> in dev/release/00-prepare.sh:
> > > > > > >>
> > > > >
> > > >
> > >
> >
> https://github.com/apache/arrow/blob/master/dev/release/00-prepare.sh#L67-L74
> > > > > > >>
> > > > > > >> We can merge "JavaScript Releases" document into our release
> > > > > > >> document:
> > > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide#ReleaseManagementGuide-JavaScriptReleases
> > > > > > >>
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >> --
> > > > > > >> kou
> > > > > > >>
> > > > > > >> In <
> > > > > cajpuwmbgjzbwrwybwse6bd9lnn_7xozn_aq2job9_mpvmhc...@mail.gmail.com
> >
> > > > > > >>"Re: Timeline for 0.13 Arrow release" on Mon, 18 Mar 2019
> > > > 20:51:12
> > > > > -0500,
> > > > > > >>Wes McKinney  wrote:
> > > > > > >>
> > > > > > >>> hi folks,
> > > > > > >>>
> > > > > > >>> I think we're basically at the 0.13 end game here. There's
> some
> > > > more
> > > > > > >>> patches can get in, but do we all think we can cut an RC by
> the
> > > end
> > > > > of
> > > > > > >>> the week? What are the blocking issues?
> > > > > > >>>
> > > > > > >>> Thanks
> > > > > > >>> Wes
> > > > > > >>>
> > > > > > >>> On Sat, Mar 16, 2019 at 9:57 PM Kouhei Sutou <
> > k...@c

Re: Timeline for 0.13 Arrow release

2019-03-25 Thread Wes McKinney
If Gandiva is causing packaging problems for 0.13 as a result of the CMake
refactor, I suggest that we drop it from packages and plan to resolve for
0.14. So I think we should set a pretty strict time box for resolution of
these issues (eg end of day Tuesday)

On Mon, Mar 25, 2019, 2:12 PM Krisztián Szűcs  wrote:

> Hello all,
>
> I'm working on the conda and wheel builds:
> https://github.com/apache/arrow/pull/3832
> https://github.com/apache/arrow/pull/4024
> These must pass before we can cut the release.

Re: Timeline for 0.13 Arrow release

2019-03-25 Thread Krisztián Szűcs
Hello all,

I'm working on the conda and wheel builds:
https://github.com/apache/arrow/pull/3832
https://github.com/apache/arrow/pull/4024
These must pass before we can cut the release.


On Mon, Mar 25, 2019 at 2:08 PM Wes McKinney  wrote:

> Hi folks,
>
> I think we should close the 0.13 backlog and try to get an RC0 out ASAP.
> What work must get done before that happens?
>
> I intend to sort out ARROW-4872 today.
>
> Thanks
> Wes

Re: Timeline for 0.13 Arrow release

2019-03-25 Thread Wes McKinney
Hi folks,

I think we should close the 0.13 backlog and try to get an RC0 out ASAP.
What work must get done before that happens?

I intend to sort out ARROW-4872 today.

Thanks
Wes

On Wed, Mar 20, 2019, 5:00 PM Brian Hulette  wrote:

> I think that makes sense. I would really like to make JS part of the
> mainstream releases, but we already have JS-0.4.1 ready to go [1] with
> primarily bugfixes for JS-0.4.0. I think we should just cut that and
> integrate JS in 0.14.
>
> [1] https://issues.apache.org/jira/projects/ARROW/versions/12344961
>
> On Wed, Mar 20, 2019 at 8:20 AM Wes McKinney  wrote:
>
> > In light of the discussion on
> > https://github.com/apache/arrow/pull/3630 I think we should wait until
> > we have a "not broken" JavaScript-only release on NPM and have
> > confidence that we can respond to the community's needs
> >
> > On Tue, Mar 19, 2019 at 11:24 PM Paul Taylor  wrote:
> > >
> > > I agree, the JS has matured a lot in the last few months. I think it's
> > > ready to join the regular Arrow releases. Let me know if I can help
> > > integrate the publish scripts :-)
> > >
> > > The two main things in progress are docs + Vector Builders, neither of
> > > which should block this release.
> > >
> > > We're going to try to get the docs/recipes ready for a PR this weekend.
> > > If that lands shortly after 0.13.0 goes out, would it be possible to
> > > update the website independently, or would that need to wait until
> > > 0.14?
> > >
> > > Paul
> > >
> > > On 3/19/19 10:08 AM, Wes McKinney wrote:
> > > > I'm in favor of including JS in the 0.13.0 release.
> > > >
> > > > I'm going to try to fix a couple of the Python Parquet bugs until the
> > > > RC is ready to be cut, but none of them need block the release.
> > > >
> > > > Seems like we need someone else to volunteer to be the RM for 0.13 if
> > > > Uwe is unavailable next week. Antoine -- are you possibly up for it
> > > > (the initial setup will be a bit painful)? I don't have access to a
> > > > machine with my code signing key on it until next week so I cannot do
> > > > it
> > > >
> > > > - Wes
> > > >
> > > > On Tue, Mar 19, 2019 at 9:46 AM Kouhei Sutou  wrote:
> > > >> Hi,
> > > >>
> > > >> There are no blockers on GLib, Ruby and Linux packages.
> > > >>
> > > >> Can we include JavaScript in 0.13.0?
> > > >> If we include JavaScript in 0.13.0, we can remove
> > > >> the code that releases JavaScript separately. For example, we can
> > > >> remove dev/release/js-*. We can enable the version update code
> > > >> in dev/release/00-prepare.sh:
> > > >>
> > > >> https://github.com/apache/arrow/blob/master/dev/release/00-prepare.sh#L67-L74
> > > >>
> > > >> We can merge the "JavaScript Releases" document into our release
> > > >> document:
> > > >>
> > > >> https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide#ReleaseManagementGuide-JavaScriptReleases
> > > >>
> > > >>
> > > >> Thanks,
> > > >> --
> > > >> kou
> > > >>
> > > >> In <cajpuwmbgjzbwrwybwse6bd9lnn_7xozn_aq2job9_mpvmhc...@mail.gmail.com>
> > > >>"Re: Timeline for 0.13 Arrow release" on Mon, 18 Mar 2019 20:51:12 -0500,
> > > >>Wes McKinney  wrote:
> > > >>
> > > >>> hi folks,
> > > >>>
> > > >>> I think we're basically at the 0.13 end game here. There are some more
> > > >>> patches that can get in, but do we all think we can cut an RC by the
> > > >>> end of the week? What are the blocking issues?
> > > >>>
> > > >>> Thanks
> > > >>> Wes
> > > >>>
> > > >>> On Sat, Mar 16, 2019 at 9:57 PM Kouhei Sutou  wrote:
> > >  Hi,
> > > 
> > > > Submitted the packaging builds:
> > > > https://github.com/kszucs/crossbow/branches/all?utf8=%E2%9C%93&query=build-452
> > >  I've fixed .deb/.rpm packages: https://github.com/apache/arrow/pull/3934
> > >  It has been merged.
> > >  So .deb/.rpm packages are ready for release.
> > > 
> > >  Thanks,
> > >  --
> > >  kou
> > > 
> > >  In <cahm19a5somzxgcphc6ee-mr2usvvhwb252udgjrvocq-cb2...@mail.gmail.com>
> > > "Re: Timeline for 0.13 Arrow release" on Thu, 14 Mar 2019 16:24:43 +0100,
> > > Krisztián Szűcs  wrote:
> > > 
> > > > Submitted the packaging builds:
> > > > https://github.com/kszucs/crossbow/branches/all?utf8=%E2%9C%93&query=build-452
> > > >
> > > > On Thu, Mar 14, 2019 at 4:19 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > > >
> > > >> The CMake refactor is merged! Kudos to Uwe for 3+ weeks of hard labor on
> > > >> this.
> > > >>
> > > >> We should run all the packaging tasks and get a full accounting of
> > > >> what is broken so we aren't surprised during the release process
> > > >>
> > > >> On Wed, Mar 13, 2019 at 9:39 AM Krisztián Szűcs  wrote:
> > > >>> The proof of the pudding is in the eating. You convinced me.
> > > >>>
> > > >>> On Wed, Mar 13, 2019 at 3:31 PM Wes McKinney <wesmck...@gmail.com> wrote:

[jira] [Created] (ARROW-5005) [C++] Add support for filter mask in AggregateFunction

2019-03-25 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5005:
-

 Summary: [C++] Add support for filter mask in AggregateFunction
 Key: ARROW-5005
 URL: https://issues.apache.org/jira/browse/ARROW-5005
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques
 Fix For: 0.14.0


The aggregate kernels don't support a mask (the result of a filter). Add the
following method to `AggregateFunction`:

{code:c++}
virtual Status ConsumeWithFilter(const Array& input, const Array& mask,
                                 void* state) const = 0;
{code}
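To make the proposed semantics concrete, here is a minimal sketch of a sum aggregate that honours a mask. It is a simplified model, not Arrow's kernel API: `std::vector` stands in for `Array`, exceptions stand in for `Status`, a typed state replaces the `void*`, and `SumState`/`SumAggregate` are hypothetical names. Nulls in the input and in the mask are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified model of an aggregate kernel's state.
struct SumState {
  std::int64_t sum = 0;
  std::int64_t count = 0;
};

class SumAggregate {
 public:
  // Unfiltered path: every input value contributes to the state.
  void Consume(const std::vector<std::int64_t>& input, SumState* state) const {
    for (std::int64_t v : input) {
      state->sum += v;
      ++state->count;
    }
  }

  // Filtered path in the spirit of ConsumeWithFilter: a value contributes
  // only where the boolean mask (e.g. the output of a filter) is true.
  void ConsumeWithFilter(const std::vector<std::int64_t>& input,
                         const std::vector<bool>& mask,
                         SumState* state) const {
    if (input.size() != mask.size()) {
      throw std::invalid_argument("mask length must match input length");
    }
    for (std::size_t i = 0; i < input.size(); ++i) {
      if (mask[i]) {
        state->sum += input[i];
        ++state->count;
      }
    }
  }
};
```

For example, consuming `{1, 2, 3, 4}` with the mask `{true, false, true, false}` leaves `sum == 4` and `count == 2`. Checking the mask inside the consume loop avoids materializing a filtered copy of the input before aggregating, which is one plausible reason for a dedicated method rather than filtering first.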






[jira] [Created] (ARROW-5004) Confusing behaviour with boolean partition keys

2019-03-25 Thread Scott Taylor (JIRA)
Scott Taylor created ARROW-5004:
---

 Summary: Confusing behaviour with boolean partition keys
 Key: ARROW-5004
 URL: https://issues.apache.org/jira/browse/ARROW-5004
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.12.1
Reporter: Scott Taylor


[https://github.com/apache/arrow/blob/3129732a18210d0c8921b45f79be4f34eadf0cc3/python/pyarrow/parquet.py#L686]

Here the type of a partition key is converted to match the type of a filter 
variable.

Using the *write_to_dataset* function allows *boolean* partition keys (*True*
or *False*), but these silently break at the linked line because *bool('False')*
evaluates to *True*.

I understand that a docstring
([https://github.com/apache/arrow/blob/3129732a18210d0c8921b45f79be4f34eadf0cc3/python/pyarrow/parquet.py#L653])
states that only string or int partition variables are supported, although this
is somewhat buried away from the user-facing API.

It may be beneficial to detect the boolean case and raise a warning, or to
ensure the function returns a more intuitive output when the partition key is
*'False'* and the filter variable is *False*.
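The pitfall reduces to string truthiness. To stay with the C++ used elsewhere in this archive, the sketch below models the naive conversion next to a strict parse that a fix could use to detect the boolean case. It assumes the partition directories hold Python's `str(True)`/`str(False)` spellings; both helper names are hypothetical and are not pyarrow functions.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers for illustration only; neither exists in pyarrow.

// Mimics bool(value) on the Python side: any non-empty string is truthy,
// so the directory value "False" converts to true.
bool NaiveCastToBool(const std::string& key) { return !key.empty(); }

// Strict alternative: accept only the spellings Python's str() produces for
// booleans ("True" / "False"). Returns false when the key is not a boolean
// spelling, so the caller can warn or raise instead of silently mis-filtering.
bool TryParseBoolPartitionKey(const std::string& key, bool* out) {
  if (key == "True") {
    *out = true;
    return true;
  }
  if (key == "False") {
    *out = false;
    return true;
  }
  return false;
}
```

With the strict parse, a key of `False` yields `false`, while an unrecognised spelling makes the parser report failure so the caller can emit the warning the issue suggests.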





[jira] [Created] (ARROW-5003) [R] remove dependency on withr

2019-03-25 Thread Romain François (JIRA)
Romain François created ARROW-5003:
--

 Summary: [R] remove dependency on withr
 Key: ARROW-5003
 URL: https://issues.apache.org/jira/browse/ARROW-5003
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Romain François


The package does not really need to depend on `withr`, so we can remove that dependency.


