[jira] [Created] (ARROW-2944) Arrow format documentation mentions VectorLayout that does not exist anymore

2018-07-30 Thread Pearu Peterson (JIRA)
Pearu Peterson created ARROW-2944: Summary: Arrow format documentation mentions VectorLayout that does not exist anymore Key: ARROW-2944 URL: https://issues.apache.org/jira/browse/ARROW-2944

[jira] [Created] (ARROW-2943) [C++] Implement BufferedOutputStream::Flush

2018-07-30 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2943: Summary: [C++] Implement BufferedOutputStream::Flush Key: ARROW-2943 URL: https://issues.apache.org/jira/browse/ARROW-2943 Project: Apache Arrow Issue Type:

Re: [DISCUSS] Solutions for improving the Arrow-Parquet C++ development morass

2018-07-30 Thread Wes McKinney
> I would like to point out that arrow's use of orc is a great example of how > it would be possible to manage parquet-cpp as a separate codebase. That gives > me hope that the projects could be managed separately some day. Well, I don't know that ORC is the best example. The ORC C++ codebase

Re: [DISCUSS] Solutions for improving the Arrow-Parquet C++ development morass

2018-07-30 Thread Joshua Storck
Your point about the constraints of the ASF release process is well taken, and as a developer who's trying to work in the current environment I would be much happier if the codebases were merged. The main issues I worry about when you put codebases like these together are: 1. The delineation of

Re: [DISCUSS] Solutions for improving the Arrow-Parquet C++ development morass

2018-07-30 Thread Wes McKinney
hi Josh, > I can imagine use cases for parquet that don't involve arrow and tying them > together seems like the wrong choice. Apache is "Community over Code"; right now it's the same people building these projects -- my argument (which I think you agree with?) is that we should work more

Re: [DISCUSS] Solutions for improving the Arrow-Parquet C++ development morass

2018-07-30 Thread Joshua Storck
I recently worked on an issue that had to be implemented in parquet-cpp (ARROW-1644, ARROW-1599) but required changes in arrow (ARROW-2585, ARROW-2586). I found the circular dependencies confusing and hard to work with. For example, I still have a PR open in parquet-cpp (created on May 10) because

Re: [DISCUSS] Solutions for improving the Arrow-Parquet C++ development morass

2018-07-30 Thread Wes McKinney
On Mon, Jul 30, 2018 at 8:50 PM, Ted Dunning wrote: > On Mon, Jul 30, 2018 at 5:39 PM Wes McKinney wrote: > >> >> > The community will be less willing to accept large >> > changes that require multiple rounds of patches for stability and API >> > convergence. Our contributions to Libhdfs++ in

Re: [DISCUSS] Solutions for improving the Arrow-Parquet C++ development morass

2018-07-30 Thread Ted Dunning
On Mon, Jul 30, 2018 at 5:39 PM Wes McKinney wrote: > > > The community will be less willing to accept large > > changes that require multiple rounds of patches for stability and API > > convergence. Our contributions to Libhdfs++ in the HDFS community took a > > significantly long time for the

Re: [DISCUSS] Solutions for improving the Arrow-Parquet C++ development morass

2018-07-30 Thread Wes McKinney
hi, On Mon, Jul 30, 2018 at 6:52 PM, Deepak Majeti wrote: > Wes, > > I definitely appreciate and do see the impact of contributions made by > everyone. I made this statement not to rate any contributions but solely to > support my concern. > The contribution barrier is higher simply because of

[jira] [Created] (ARROW-2942) [Packaging] Allow a user to inspect the status of another user's builds

2018-07-30 Thread Phillip Cloud (JIRA)
Phillip Cloud created ARROW-2942: Summary: [Packaging] Allow a user to inspect the status of another user's builds Key: ARROW-2942 URL: https://issues.apache.org/jira/browse/ARROW-2942 Project:

[jira] [Created] (ARROW-2941) [Packaging] Allow a user to kill existing builds

2018-07-30 Thread Phillip Cloud (JIRA)
Phillip Cloud created ARROW-2941: Summary: [Packaging] Allow a user to kill existing builds Key: ARROW-2941 URL: https://issues.apache.org/jira/browse/ARROW-2941 Project: Apache Arrow Issue

Re: [DISCUSS] Solutions for improving the Arrow-Parquet C++ development morass

2018-07-30 Thread Julian Hyde
I'm not going to comment on the design of the parquet-cpp module and whether it is “closer” to parquet or arrow. But I do think Wes’s proposal is consistent with Apache policy. PMCs make releases and govern communities; they don’t exist to manage code bases, except as a means to the end of

Re: [DISCUSS] Solutions for improving the Arrow-Parquet C++ development morass

2018-07-30 Thread Deepak Majeti
Wes, I definitely appreciate and do see the impact of contributions made by everyone. I made this statement not to rate any contributions but solely to support my concern. The contribution barrier is higher simply because of the increased code, build, and test dependencies. If the community has

[jira] [Created] (ARROW-2940) [Python] Import error with pytorch 0.3

2018-07-30 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-2940: Summary: [Python] Import error with pytorch 0.3 Key: ARROW-2940 URL: https://issues.apache.org/jira/browse/ARROW-2940 Project: Apache Arrow Issue Type:

Re: [DISCUSS] Solutions for improving the Arrow-Parquet C++ development morass

2018-07-30 Thread Wes McKinney
hi Deepak On Mon, Jul 30, 2018 at 5:18 PM, Deepak Majeti wrote: > @Wes > My observation is that most of the parquet-cpp contributors you listed that > overlap with the Arrow community mainly contribute to the Arrow > bindings (parquet::arrow layer)/platform API changes in the parquet-cpp > repo.

Re: Load Spark dataframes in Arrow buffer using Scala (to be used by Gandiva)

2018-07-30 Thread Bryan Cutler
Hi Richard, Take a look at this JIRA https://issues.apache.org/jira/browse/SPARK-24579, it is geared towards exporting Spark data to DL frameworks, but it's likely to add a general method to map Spark data partitions to a function using Arrow data. In that function you should be able to apply

Re: [DISCUSS] Solutions for improving the Arrow-Parquet C++ development morass

2018-07-30 Thread Deepak Majeti
@Wes My observation is that most of the parquet-cpp contributors you listed that overlap with the Arrow community mainly contribute to the Arrow bindings (parquet::arrow layer)/platform API changes in the parquet-cpp repo. Very few of them review/contribute patches to the parquet-cpp core. I

[jira] [Created] (ARROW-2939) [Python] API documentation version doesn't match latest on PyPI

2018-07-30 Thread Ian Robertson (JIRA)
Ian Robertson created ARROW-2939: Summary: [Python] API documentation version doesn't match latest on PyPI Key: ARROW-2939 URL: https://issues.apache.org/jira/browse/ARROW-2939 Project: Apache Arrow

[jira] [Created] (ARROW-2938) [Packaging] Make the source release via crossbow

2018-07-30 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-2938: Summary: [Packaging] Make the source release via crossbow Key: ARROW-2938 URL: https://issues.apache.org/jira/browse/ARROW-2938 Project: Apache Arrow

Reading PageHeader separately from reading entire page

2018-07-30 Thread Renato Marroquín Mogrovejo
Hi Arrow devs, I am trying to separate reading only pageHeaders from reading (reading+uncompressing+serializing) the page's entire content. The current SerializedPageReader::NextPage() does both things at the same time. I tried importing format::PageHeader into a separate project linking against a build

Re: Working towards 0.10.0 release candidate

2018-07-30 Thread Phillip Cloud
Sounds good. I will start cranking on this later today and provide an update tomorrow morning about any progress or issues that arise. On Mon, Jul 30, 2018 at 11:05 AM Wes McKinney wrote: > hey Phillip, > > I think this is getting too complicated and it's going to hold up the > release more

Re: Working towards 0.10.0 release candidate

2018-07-30 Thread Wes McKinney
hey Phillip, I think this is getting too complicated and it's going to hold up the release more than it already has. How about we cut 0.10.0 binaries based on the git tag and we try for using a signed tarball for 0.11? I'm concerned we're going to miss our window this week to get an RC cut and

[jira] [Created] (ARROW-2937) [Java] Follow-up changes to ARROW-2704

2018-07-30 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2937: Summary: [Java] Follow-up changes to ARROW-2704 Key: ARROW-2937 URL: https://issues.apache.org/jira/browse/ARROW-2937 Project: Apache Arrow Issue Type:

Re: Working towards 0.10.0 release candidate

2018-07-30 Thread Phillip Cloud
Wanted to update everyone here regarding the ability to cut a release candidate for 0.10.0. The last remaining set of tasks is to be able to use the new packaging tool (crossbow.py) to build binary artifacts from a source archive. What this means is that we'll have to move the release scripts

Re: [DISCUSS] Solutions for improving the Arrow-Parquet C++ development morass

2018-07-30 Thread Antoine Pitrou
Le 30/07/2018 à 10:50, Antoine Pitrou a écrit : > > Hi Wes, > > Le 29/07/2018 à 01:44, Wes McKinney a écrit : >> I believe the best way to remedy the situation is to adopt a >> "Community over Code" approach and find a way for the Parquet and >> Arrow C++ development communities to operate out

Re: [DISCUSS] Solutions for improving the Arrow-Parquet C++ development morass

2018-07-30 Thread Antoine Pitrou
Hi Wes, Le 29/07/2018 à 01:44, Wes McKinney a écrit : > I believe the best way to remedy the situation is to adopt a > "Community over Code" approach and find a way for the Parquet and > Arrow C++ development communities to operate out of the same code > repository, i.e. the apache/arrow git