Troubleshooting large number of nested items

2018-04-04 Thread Bryant Menn
I am attempting to troubleshoot ARROW-2367
(https://issues.apache.org/jira/browse/ARROW-2367) and, if I am capable,
to provide a patch for it. From what I can tell from gdb on a debug build
of master, I believe the issue is that lists in individual rows of a
pandas DataFrame/Series are stored as a single BinaryArray instead of a
ChunkedArray when the total size of the column data exceeds the maximum
int32 value.

How would I confirm this hunch? Apologies if this is something
straightforward; I'm new to the project and this is my first time
debugging a Python C/C++ extension.

Thanks,

Bryant


[jira] [Created] (ARROW-2397) Document changes in Tensor encoding in IPC.md.

2018-04-04 Thread Robert Nishihara (JIRA)
Robert Nishihara created ARROW-2397:
---

 Summary: Document changes in Tensor encoding in IPC.md.
 Key: ARROW-2397
 URL: https://issues.apache.org/jira/browse/ARROW-2397
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation
Reporter: Robert Nishihara


Update IPC.md to reflect the changes in 
https://github.com/apache/arrow/pull/1802.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Apache process questions

2018-04-04 Thread Wes McKinney
Yes, that is correct. We don't want to create confusion between releases
made by the Apache project and unofficial development releases.

On Wed, Apr 4, 2018, 10:17 PM Andy Grove  wrote:

> Wes,
>
> We talked about nightlies and downstream dependencies and I just want to be
> clear I'm understanding what you are suggesting. Apologies if I'm being a
> bit slow.
>
> I think we are in agreement that nightlies make sense for now because
> things are moving fast and this Rust library is very new.
>
> I can build nightly releases myself, but if I publish those to crates.io,
> they should be under a separate name and not using the "arrow" crate that I
> established for this project on crates.io
>
> Is that correct?
>
> Thanks,
>
> Andy.
>
> On Wed, Apr 4, 2018 at 6:29 PM, Wes McKinney  wrote:
>
> > Hi Andy,
> >
> > You're free to do whatever you like outside of Apache infrastructure -- I
> > don't think we can have the apache/arrow Travis publishing artifacts to
> > crates.io, though.
> >
> > If you want to have some scripts in the repo to help with creating
> > nightlies, that's totally fine, but this isn't something that the PMC is
> > able to do anything about formally. We documented our nightly build tool
> > for Python here
> >
> > https://github.com/apache/arrow/blob/master/python/doc/
> > source/development.rst#nightly-builds-of-arrow-cpp-
> > parquet-cpp-and-pyarrow-for-linux
> >
> > - Wes
> >
> > On Wed, Apr 4, 2018, 7:13 PM Andy Grove  wrote:
> >
> > > Wes,
> > >
> > > Thanks for the feedback.
> > >
> > > A nightly release sounds appealing in these early days. Technically, I
> > > assume this is just a case of configuring travis to update the minor
> > > version number and run a "cargo publish" on master once per day (if there
> > > are changes)?
> > >
> > > From a process point of view, what is involved in deciding whether to do
> > > this or not?
> > >
> > > Thanks,
> > >
> > > Andy.
> > >
> > >
> > >
> > >
> > > On Wed, Apr 4, 2018 at 8:18 AM, Wes McKinney  wrote:
> > >
> > > > hi Andy,
> > > >
> > > > > My argument for doing this is that although other Rust developers can
> > > > > set up a github dependency in their Cargo.toml, this just isn't a
> > > > > natural way to work in Rust so we are making it hard for people to
> > > > > experiment with Arrow.
> > > >
> > > > Being penalized by the language for not using a centralized package
> > > > repository seems odd to me, but if that's the way Rust is, then so be
> > > > it.
> > > >
> > > > > 1. Is it ok for me to push updated releases to crates.io?
> > > >
> > > > If crates.io definitely needs to get updated as frequently as you are
> > > > saying, we should make separate Rust releases and hold votes like we
> > > > have with JavaScript. It sounds like nightlies or downstream
> > > > "releases" would be a better solution while the project is new / in
> > > > flux, see below:
> > > >
> > > > > 2. Is releasing from a fork of the code (under a different name)
> > > > > acceptable?
> > > >
> > > > Sure, you can make your own "downstream" releases of the project, as
> > > > long as they aren't advertised as being releases made by the Apache
> > > > project. Some of us have been building nightly Python packages for
> > > > development purposes
> > > >
> > > > - Wes
> > > >
> > > > On Wed, Apr 4, 2018 at 10:01 AM, Andy Grove  wrote:
> > > > > Hi,
> > > > >
> > > > > I've been creating some frustrations for myself this week because I'm
> > > > > not sure how to work efficiently now that the Rust version of Apache
> > > > > Arrow is in the official Apache repo.
> > > > >
> > > > > It seems I have two conflicting requirements:
> > > > >
> > > > > 1. I want Apache Arrow [Rust] to be a community-driven high quality
> > > > > piece of software built by many contributors, with a thoughtful code
> > > > > review process.
> > > > >
> > > > > 2. I want to move fast and innovate on my other open source project
> > > > > which now depends on Arrow, and I want to help other projects (such as
> > > > > https://github.com/sunchao/parquet-rs) integrate with Arrow.
> > > > >
> > > > > I have two questions that I need guidance on:
> > > > >
> > > > > 1. Is it ok for me to push updated releases to crates.io?
> > > > >
> > > > > We currently have a 0.1.0 release of the Rust library in crates.io
> > > > > (https://crates.io/crates/arrow) which was made to reserve the Arrow
> > > > > name there (with approval from Wes). The github repo has changed quite
> > > > > a bit since this release but this is the only version that Rust users
> > > > > can easily pull into their projects as a versioned dependency.
> > > > > Understanding that this isn't an official Apache release since it
> > > > > hasn't been through a release process, is it OK to push new versions
> > > > > of this unofficial

Re: Apache process questions

2018-04-04 Thread Andy Grove
Wes,

We talked about nightlies and downstream dependencies and I just want to be
clear I'm understanding what you are suggesting. Apologies if I'm being a
bit slow.

I think we are in agreement that nightlies make sense for now because
things are moving fast and this Rust library is very new.

I can build nightly releases myself, but if I publish those to crates.io,
they should be under a separate name and not using the "arrow" crate that I
established for this project on crates.io

Is that correct?

Thanks,

Andy.

On Wed, Apr 4, 2018 at 6:29 PM, Wes McKinney  wrote:

> Hi Andy,
>
> You're free to do whatever you like outside of Apache infrastructure -- I
> don't think we can have the apache/arrow Travis publishing artifacts to
> crates.io, though.
>
> If you want to have some scripts in the repo to help with creating
> nightlies, that's totally fine, but this isn't something that the PMC is
> able to do anything about formally. We documented our nightly build tool
> for Python here
>
> https://github.com/apache/arrow/blob/master/python/doc/
> source/development.rst#nightly-builds-of-arrow-cpp-
> parquet-cpp-and-pyarrow-for-linux
>
> - Wes
>
> On Wed, Apr 4, 2018, 7:13 PM Andy Grove  wrote:
>
> > Wes,
> >
> > Thanks for the feedback.
> >
> > A nightly release sounds appealing in these early days. Technically, I
> > assume this is just a case of configuring travis to update the minor
> > version number and run a "cargo publish" on master once per day (if there
> > are changes)?
> >
> > From a process point of view, what is involved in deciding whether to do
> > this or not?
> >
> > Thanks,
> >
> > Andy.
> >
> >
> >
> >
> > On Wed, Apr 4, 2018 at 8:18 AM, Wes McKinney  wrote:
> >
> > > hi Andy,
> > >
> > > > My argument for doing this is that although other Rust developers can
> > > > set up a github dependency in their Cargo.toml, this just isn't a
> > > > natural way to work in Rust so we are making it hard for people to
> > > > experiment with Arrow.
> > >
> > > Being penalized by the language for not using a centralized package
> > > repository seems odd to me, but if that's the way Rust is, then so be
> > > it.
> > >
> > > > 1. Is it ok for me to push updated releases to crates.io?
> > >
> > > If crates.io definitely needs to get updated as frequently as you are
> > > saying, we should make separate Rust releases and hold votes like we
> > > have with JavaScript. It sounds like nightlies or downstream
> > > "releases" would be a better solution while the project is new / in
> > > flux, see below:
> > >
> > > > 2. Is releasing from a fork of the code (under a different name)
> > > > acceptable?
> > >
> > > Sure, you can make your own "downstream" releases of the project, as
> > > long as they aren't advertised as being releases made by the Apache
> > > project. Some of us have been building nightly Python packages for
> > > development purposes
> > >
> > > - Wes
> > >
> > > On Wed, Apr 4, 2018 at 10:01 AM, Andy Grove  wrote:
> > > > Hi,
> > > >
> > > > I've been creating some frustrations for myself this week because I'm
> > > > not sure how to work efficiently now that the Rust version of Apache
> > > > Arrow is in the official Apache repo.
> > > >
> > > > It seems I have two conflicting requirements:
> > > >
> > > > 1. I want Apache Arrow [Rust] to be a community-driven high quality
> > > > piece of software built by many contributors, with a thoughtful code
> > > > review process.
> > > >
> > > > 2. I want to move fast and innovate on my other open source project
> > > > which now depends on Arrow, and I want to help other projects (such as
> > > > https://github.com/sunchao/parquet-rs) integrate with Arrow.
> > > >
> > > > I have two questions that I need guidance on:
> > > >
> > > > 1. Is it ok for me to push updated releases to crates.io?
> > > >
> > > > We currently have a 0.1.0 release of the Rust library in crates.io
> > > > (https://crates.io/crates/arrow) which was made to reserve the Arrow
> > > > name there (with approval from Wes). The github repo has changed quite
> > > > a bit since this release but this is the only version that Rust users
> > > > can easily pull into their projects as a versioned dependency.
> > > > Understanding that this isn't an official Apache release since it
> > > > hasn't been through a release process, is it OK to push new versions
> > > > of this unofficial release as PRs are accepted into the repo? I have
> > > > the *ability* to do this but I don't know if I have *approval* to do
> > > > this.
> > > >
> > > > My argument for doing this is that although other Rust developers can
> > > > set up a github dependency in their Cargo.toml, this just isn't a
> > > > natural way to work in Rust so we are making it hard for people to
> > > > experiment with Arrow. The code is available in github and anyone could
> > > > fork the code

[jira] [Created] (ARROW-2396) Unify Rust Errors

2018-04-04 Thread Maximilian Roos (JIRA)
Maximilian Roos created ARROW-2396:
--

 Summary: Unify Rust Errors
 Key: ARROW-2396
 URL: https://issues.apache.org/jira/browse/ARROW-2396
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Maximilian Roos


Currently there are two Error items, one Enum and one Struct. These should be
unified under a single ArrowError Enum.





[jira] [Created] (ARROW-2394) [Python] Correct flake8 errors in benchmarks

2018-04-04 Thread Alex Hagerman (JIRA)
Alex Hagerman created ARROW-2394:


 Summary: [Python] Correct flake8 errors in benchmarks
 Key: ARROW-2394
 URL: https://issues.apache.org/jira/browse/ARROW-2394
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Alex Hagerman
Assignee: Alex Hagerman
 Fix For: 0.10.0


Fix linting issues so that flake8 can be run on all files in the Python
directory.

 

!https://user-images.githubusercontent.com/2118138/38217076-f08a67da-369a-11e8-8166-b3a9ed7d9a60.png!





Re: Apache process questions

2018-04-04 Thread Andy Grove
Wes,

Thanks for the feedback.

A nightly release sounds appealing in these early days. Technically, I
assume this is just a case of configuring travis to update the minor
version number and run a "cargo publish" on master once per day (if there
are changes)?

From a process point of view, what is involved in deciding whether to do
this or not?

Thanks,

Andy.




On Wed, Apr 4, 2018 at 8:18 AM, Wes McKinney  wrote:

> hi Andy,
>
> > My argument for doing this is that although other Rust developers can
> > set up a github dependency in their Cargo.toml, this just isn't a natural
> > way to work in Rust so we are making it hard for people to experiment with
> > Arrow.
>
> Being penalized by the language for not using a centralized package
> repository seems odd to me, but if that's the way Rust is, then so be
> it.
>
> > 1. Is it ok for me to push updated releases to crates.io?
>
> If crates.io definitely needs to get updated as frequently as you are
> saying, we should make separate Rust releases and hold votes like we
> have with JavaScript. It sounds like nightlies or downstream
> "releases" would be a better solution while the project is new / in
> flux, see below:
>
> > 2. Is releasing from a fork of the code (under a different name)
> > acceptable?
>
> Sure, you can make your own "downstream" releases of the project, as
> long as they aren't advertised as being releases made by the Apache
> project. Some of us have been building nightly Python packages for
> development purposes
>
> - Wes
>
> On Wed, Apr 4, 2018 at 10:01 AM, Andy Grove  wrote:
> > Hi,
> >
> > I've been creating some frustrations for myself this week because I'm not
> > sure how to work efficiently now that the Rust version of Apache Arrow is
> > in the official Apache repo.
> >
> > It seems I have two conflicting requirements:
> >
> > 1. I want Apache Arrow [Rust] to be a community-driven high quality piece
> > of software built by many contributors, with a thoughtful code review
> > process.
> >
> > 2. I want to move fast and innovate on my other open source project which
> > now depends on Arrow, and I want to help other projects (such as
> > https://github.com/sunchao/parquet-rs) integrate with Arrow.
> >
> > I have two questions that I need guidance on:
> >
> > 1. Is it ok for me to push updated releases to crates.io?
> >
> > We currently have a 0.1.0 release of the Rust library in crates.io (
> > https://crates.io/crates/arrow) which was made to reserve the Arrow name
> > there (with approval from Wes). The github repo has changed quite a bit
> > since this release but this is the only version that Rust users can
> > easily pull into their projects as a versioned dependency. Understanding
> > that this isn't an official Apache release since it hasn't been through a
> > release process, is it OK to push new versions of this unofficial release
> > as PRs are accepted into the repo? I have the *ability* to do this but I
> > don't know if I have *approval* to do this.
> >
> > My argument for doing this is that although other Rust developers can set
> > up a github dependency in their Cargo.toml, this just isn't a natural way
> > to work in Rust so we are making it hard for people to experiment with
> > Arrow. The code is available in github and anyone could fork the code and
> > make their own release to crates.io but I'd prefer them to use the one we
> > make.
> >
> > 2. Is releasing from a fork of the code (under a different name)
> > acceptable?
> >
> > We cannot stop people forking the Arrow repo and making their own
> > releases to crates.io under a different name.
> >
> > I'm wondering if I should just do that for now e.g. release a
> > "datafusion-arrow" project from the branch in my fork where I have merged
> > all of my PRs. This way I can keep moving fast and if other projects such
> > as parquet-rs want to depend on my releases as a temporary measure, they
> > can, until the official Arrow crate catches up. This would fix my short
> > term pains but I don't want to detract from the official project.
> >
> > I'd appreciate some guidance on the best way forward.
> >
> > Thanks,
> >
> > Andy.
>


[jira] [Created] (ARROW-2393) [C++] arrow/status.h does not define ARROW_CHECK needed for ARROW_CHECK_OK

2018-04-04 Thread dennis lucero (JIRA)
dennis lucero created ARROW-2393:


 Summary: [C++] arrow/status.h does not define ARROW_CHECK needed 
for ARROW_CHECK_OK
 Key: ARROW-2393
 URL: https://issues.apache.org/jira/browse/ARROW-2393
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.9.0
Reporter: dennis lucero


test.cpp
{code:c++}
#include 

int main(void) {
arrow::Int64Builder i64builder;
std::shared_ptr i64array;
ARROW_CHECK_OK(i64builder.Finish());
return EXIT_SUCCESS;
}
{code}
Attempt to build:
{code:bash}
$CXX test.cpp -std=c++11 -larrow
{code}
Error:
{code}
test.cpp:6:2: error: use of undeclared identifier 'ARROW_CHECK'
 ARROW_CHECK_OK(i64builder.Finish());
 ^
xxx/include/arrow/status.h:49:27: note: expanded from macro 'ARROW_CHECK_OK'
#define ARROW_CHECK_OK(s) ARROW_CHECK_OK_PREPEND(s, "Bad status")
                          ^
xxx/include/arrow/status.h:44:5: note: expanded from macro 'ARROW_CHECK_OK_PREPEND'
    ARROW_CHECK(_s.ok()) << (msg) << ": " << _s.ToString(); \
    ^
1 error generated.
{code}
I expect that the ARROW_* macros are part of the public API and should work out of the box.
A naive attempt to fix it:
{code}
diff --git a/cpp/src/arrow/status.h b/cpp/src/arrow/status.h
index 84f55e41..6da4a773 100644
--- a/cpp/src/arrow/status.h
+++ b/cpp/src/arrow/status.h
@@ -25,6 +25,7 @@

 #include "arrow/util/macros.h"
 #include "arrow/util/visibility.h"
+#include "arrow/util/logging.h"

 // Return the given status if it is not OK.
 #define ARROW_RETURN_NOT_OK(s)   \
{code}
fails with
{code}
public-api-test.cc:21:2: error: "DCHECK should not be visible from Arrow public 
headers."
{code}





Re: What do people think about a one day get together?

2018-04-04 Thread Wes McKinney
I'm +1 on the idea of an Arrow conference. June is a pretty busy month
for me, so I would prefer another time, but I can arrange to make it
if we do it then.

- Wes

On Wed, Apr 4, 2018 at 1:06 PM, Siddharth Teotia  wrote:
> +1. I would love to attend.
>
> On Tue, Apr 3, 2018 at 4:18 PM, Kevin Moore  wrote:
>
>> Sounds great. Quilt Data may be able to sponsor some of the refreshment
>> costs.
>>
>> 
>> Kevin Moore
>> CEO, Quilt Data, Inc.
>> ke...@quiltdata.io | LinkedIn 
>> (415) 497-7895
>>
>>
>> Manage Data like Code
>> quiltdata.com
>>
>> On Tue, Apr 3, 2018 at 1:41 PM, Li Jin  wrote:
>>
>> > I'd love to attend. I will be around for Spark Summit.
>> >
>> > Li
>> >
>> >
>> > On Tue, Apr 3, 2018 at 11:48 AM, Jacques Nadeau  wrote:
>> >
>> > > Hey All,
>> > >
>> > > In light of growing interest in Apache Arrow over the past year and the
>> > > great response to the meetup talk invitation I sent last week, I was
>> > > thinking it may be time to hold a single day conference focused on the
>> > > project. Wes and I have previously thrown this idea around and it seems
>> > > like it might be a good time to get something started. Some of my
>> > > colleagues did an investigation on how and when we could do this. I'm
>> > > raising this to you all now to get people's thoughts.
>> > >
>> > >
>> > > A rough sketch of what Wes and I have bounced around:
>> > >
>> > > * One day developer-focused event on Apache Arrow in San Francisco, June
>> > > 7, just after Spark Summit (open to other dates, but it would be nice for
>> > > folks attending the conference to stay one extra day for Arrow).
>> > >
>> > > * Focus on interesting use cases and applications of Arrow. We could also
>> > > use this event to discuss/plan/present about movement to Arrow 1.0 this
>> > > year and beyond.
>> > >
>> > > * Goal of 100-200 attendees.
>> > >
>> > > * Dremio can offer to organize the event (venue, logistics, registrations,
>> > > etc). The goal would be to keep ticket costs very modest to encourage
>> > > attendance (eg, $50). Opportunity for sponsorship by vendors to help drive
>> > > down costs (eg, refreshments).
>> > >
>> > > * Still need to determine a venue but probably something downtown SF
>> > > nearish Moscone.
>> > >
>> > > * PMC or appointed sub-committee could review talk submissions. We could
>> > > use something like EasyChair to make this as simple as possible.
>> > >
>> > > What do people think? I think this could be good to continue to drive and
>> > > grow the community in a positive way.
>> > >
>> > > thanks,
>> > > Jacques
>> > >
>> >
>>


Re: What do people think about a one day get together?

2018-04-04 Thread Siddharth Teotia
+1. I would love to attend.

On Tue, Apr 3, 2018 at 4:18 PM, Kevin Moore  wrote:

> Sounds great. Quilt Data may be able to sponsor some of the refreshment
> costs.
>
> 
> Kevin Moore
> CEO, Quilt Data, Inc.
> ke...@quiltdata.io | LinkedIn 
> (415) 497-7895
>
>
> Manage Data like Code
> quiltdata.com
>
> On Tue, Apr 3, 2018 at 1:41 PM, Li Jin  wrote:
>
> > I'd love to attend. I will be around for Spark Summit.
> >
> > Li
> >
> >
> > On Tue, Apr 3, 2018 at 11:48 AM, Jacques Nadeau  wrote:
> >
> > > Hey All,
> > >
> > > In light of growing interest in Apache Arrow over the past year and the
> > > great response to the meetup talk invitation I sent last week, I was
> > > thinking it may be time to hold a single day conference focused on the
> > > project. Wes and I have previously thrown this idea around and it seems
> > > like it might be a good time to get something started. Some of my
> > > colleagues did an investigation on how and when we could do this. I'm
> > > raising this to you all now to get people's thoughts.
> > >
> > >
> > > A rough sketch of what Wes and I have bounced around:
> > >
> > > * One day developer-focused event on Apache Arrow in San Francisco, June
> > > 7, just after Spark Summit (open to other dates, but it would be nice for
> > > folks attending the conference to stay one extra day for Arrow).
> > >
> > > * Focus on interesting use cases and applications of Arrow. We could also
> > > use this event to discuss/plan/present about movement to Arrow 1.0 this
> > > year and beyond.
> > >
> > > * Goal of 100-200 attendees.
> > >
> > > * Dremio can offer to organize the event (venue, logistics, registrations,
> > > etc). The goal would be to keep ticket costs very modest to encourage
> > > attendance (eg, $50). Opportunity for sponsorship by vendors to help drive
> > > down costs (eg, refreshments).
> > >
> > > * Still need to determine a venue but probably something downtown SF
> > > nearish Moscone.
> > >
> > > * PMC or appointed sub-committee could review talk submissions. We could
> > > use something like EasyChair to make this as simple as possible.
> > >
> > > What do people think? I think this could be good to continue to drive and
> > > grow the community in a positive way.
> > >
> > > thanks,
> > > Jacques
> > >
> >
>


[jira] [Created] (ARROW-2392) pyarrow RecordBatchStreamWriter allows writing batches with different schemas

2018-04-04 Thread Ernesto Ocampo (JIRA)
Ernesto Ocampo created ARROW-2392:
-

 Summary: pyarrow RecordBatchStreamWriter allows writing batches 
with different schemas
 Key: ARROW-2392
 URL: https://issues.apache.org/jira/browse/ARROW-2392
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Ernesto Ocampo


A RecordBatchStreamWriter initialised with a given schema will still allow 
writing RecordBatches that have different schemas. Example:

 
{code:python}
import pyarrow as pa

schema = pa.schema([pa.field('some_field', pa.int64())])
stream = pa.BufferOutputStream()
writer = pa.RecordBatchStreamWriter(stream, schema)

data = [pa.array([1.234])]  # inferred as double, not int64
batch = pa.RecordBatch.from_arrays(data, ['some_field'])
# batch does not conform to schema

assert batch.schema != schema

writer.write_batch(batch)  # no exception raised
writer.close()
{code}





Re: Arrow sync tomorrow: 12:00 US/Eastern, please review packaging thread

2018-04-04 Thread Phillip Cloud
I didn't realize one needs Chrome to use Google Meet, so it'll be a minute
or so before I'm there.

On Wed, Apr 4, 2018 at 11:54 AM Wes McKinney  wrote:

> hi Sidd -- your e-mail is on the Google calendar invite, not sure how
> to send a link to it
>
> Here is the link for the meeting: https://meet.google.com/vtm-teks-phx
>
> On Wed, Apr 4, 2018 at 11:48 AM, Siddharth Teotia  wrote:
> > Can someone please send me the link to gcal? For some reason it has
> > vanished from my calendar.
> >
> > On Wed, Apr 4, 2018 at 7:49 AM, Li Jin  wrote:
> >
> >> Sorry I have a conflict today so won't be able to join.
> >>
> >> Li
> >>
> >> On Wed, Apr 4, 2018 at 1:53 AM, Bhaskar Mookerji  wrote:
> >>
> >> > Can someone attending this send out notes afterwards? It would be very
> >> > much appreciated.
> >> >
> >> > Thanks,
> >> > Buro
> >> >
> >> > On Tue, Apr 3, 2018 at 2:44 PM, Wes McKinney  wrote:
> >> >
> >> > > hi folks,
> >> > >
> >> > > We have a sync call tomorrow. Could everyone please review the
> >> > > packaging mailing list thread and if possible review and comment on
> >> > > Phillip's document about this? We need to begin taking action to fix
> >> > > these problems:
> >> > >
> >> > > https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-
> >> > > g9EGPOtcFdtMBzEyDJv48BKc/edit?usp=sharing
> >> > >
> >> > > Thanks
> >> > > Wes
> >> > >
> >> >
> >>
>


Re: Arrow sync tomorrow: 12:00 US/Eastern, please review packaging thread

2018-04-04 Thread Wes McKinney
hi Sidd -- your e-mail is on the Google calendar invite, not sure how
to send a link to it

Here is the link for the meeting: https://meet.google.com/vtm-teks-phx

On Wed, Apr 4, 2018 at 11:48 AM, Siddharth Teotia  wrote:
> Can someone please send me the link to gcal? For some reason it has
> vanished from my calendar.
>
> On Wed, Apr 4, 2018 at 7:49 AM, Li Jin  wrote:
>
>> Sorry I have a conflict today so won't be able to join.
>>
>> Li
>>
>> On Wed, Apr 4, 2018 at 1:53 AM, Bhaskar Mookerji  wrote:
>>
>> > Can someone attending this send out notes afterwards? It would be very
>> > much appreciated.
>> >
>> > Thanks,
>> > Buro
>> >
>> > On Tue, Apr 3, 2018 at 2:44 PM, Wes McKinney  wrote:
>> >
>> > > hi folks,
>> > >
>> > > We have a sync call tomorrow. Could everyone please review the
>> > > packaging mailing list thread and if possible review and comment on
>> > > Phillip's document about this? We need to begin taking action to fix
>> > > these problems:
>> > >
>> > > https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-
>> > > g9EGPOtcFdtMBzEyDJv48BKc/edit?usp=sharing
>> > >
>> > > Thanks
>> > > Wes
>> > >
>> >
>>


Re: Arrow sync tomorrow: 12:00 US/Eastern, please review packaging thread

2018-04-04 Thread Siddharth Teotia
Got it: https://meet.google.com/vtm-teks-phx

On Wed, Apr 4, 2018 at 8:48 AM, Siddharth Teotia  wrote:

> Can someone please send me the link to gcal? For some reason it has
> vanished from my calendar.
>
> On Wed, Apr 4, 2018 at 7:49 AM, Li Jin  wrote:
>
>> Sorry I have a conflict today so won't be able to join.
>>
>> Li
>>
>> On Wed, Apr 4, 2018 at 1:53 AM, Bhaskar Mookerji  wrote:
>>
>> > Can someone attending this send out notes afterwards? It would be very
>> > much appreciated.
>> >
>> > Thanks,
>> > Buro
>> >
>> > On Tue, Apr 3, 2018 at 2:44 PM, Wes McKinney  wrote:
>> >
>> > > hi folks,
>> > >
>> > > We have a sync call tomorrow. Could everyone please review the
>> > > packaging mailing list thread and if possible review and comment on
>> > > Phillip's document about this? We need to begin taking action to fix
>> > > these problems:
>> > >
>> > > https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-
>> > > g9EGPOtcFdtMBzEyDJv48BKc/edit?usp=sharing
>> > >
>> > > Thanks
>> > > Wes
>> > >
>> >
>>
>
>


Re: Arrow sync tomorrow: 12:00 US/Eastern, please review packaging thread

2018-04-04 Thread Siddharth Teotia
Can someone please send me the link to gcal? For some reason it has
vanished from my calendar.

On Wed, Apr 4, 2018 at 7:49 AM, Li Jin  wrote:

> Sorry I have a conflict today so won't be able to join.
>
> Li
>
> On Wed, Apr 4, 2018 at 1:53 AM, Bhaskar Mookerji  wrote:
>
> > Can someone attending this send out notes afterwards? It would be very
> > much appreciated.
> >
> > Thanks,
> > Buro
> >
> > On Tue, Apr 3, 2018 at 2:44 PM, Wes McKinney  wrote:
> >
> > > hi folks,
> > >
> > > We have a sync call tomorrow. Could everyone please review the
> > > packaging mailing list thread and if possible review and comment on
> > > Phillip's document about this? We need to begin taking action to fix
> > > these problems:
> > >
> > > https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-
> > > g9EGPOtcFdtMBzEyDJv48BKc/edit?usp=sharing
> > >
> > > Thanks
> > > Wes
> > >
> >
>


Re: Arrow sync tomorrow: 12:00 US/Eastern, please review packaging thread

2018-04-04 Thread Li Jin
Sorry I have a conflict today so won't be able to join.

Li

On Wed, Apr 4, 2018 at 1:53 AM, Bhaskar Mookerji  wrote:

> Can someone attending this send out notes afterwards? It would be very much
> appreciated.
>
> Thanks,
> Buro
>
> On Tue, Apr 3, 2018 at 2:44 PM, Wes McKinney  wrote:
>
> > hi folks,
> >
> > We have a sync call tomorrow. Could everyone please review the
> > packaging mailing list thread and if possible review and comment on
> > Phillip's document about this? We need to begin taking action to fix
> > these problems:
> >
> > https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-
> > g9EGPOtcFdtMBzEyDJv48BKc/edit?usp=sharing
> >
> > Thanks
> > Wes
> >
>


Re: Apache process questions

2018-04-04 Thread Wes McKinney
hi Andy,

> My argument for doing this is that although other Rust developers can set up 
> a github dependency in their Cargo.toml, this just isn't a natural way to 
> work in Rust so we are making it hard for people to experiment with Arrow.

Being penalized by the language for not using a centralized package
repository seems odd to me, but if that's the way Rust is, then so be
it.

> 1. Is it ok for me to push updated releases to crates.io?

If crates.io definitely needs to get updated as frequently as you are
saying, we should make separate Rust releases and hold votes like we
have with JavaScript. It sounds like nightlies or downstream
"releases" would be a better solution while the project is new / in
flux, see below:

> 2. Is releasing from a fork of the code (under a different name) acceptable?

Sure, you can make your own "downstream" releases of the project, as
long as they aren't advertised as being releases made by the Apache
project. Some of us have been building nightly Python packages for
development purposes.

- Wes



[jira] [Created] (ARROW-2391) Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-04 Thread Dave Challis (JIRA)
Dave Challis created ARROW-2391:
---

 Summary: Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64
 Key: ARROW-2391
 URL: https://issues.apache.org/jira/browse/ARROW-2391
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
 Environment: Mac OS High Sierra
Python 3.6
Reporter: Dave Challis


When calling `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and an
explicit `pyarrow.Schema`, the call crashes with a segmentation fault if a
pandas `datetime64[ns]` column is to be converted to the `pyarrow.date64` type.

A minimal example that reproduces this:

    import pandas as pd
    import pyarrow as pa

    df = pd.DataFrame({'created': ['2018-05-10T10:24:01']})
    df['created'] = pd.to_datetime(df['created'])
    schema = pa.schema([pa.field('created', pa.date64())])
    pa.Table.from_pandas(df, schema=schema)

 

Executing the above causes the Python interpreter to exit with
"Segmentation fault: 11".

Attempting to convert into various other data types (by specifying different
schemas) either succeeds or raises an exception if the conversion is invalid.
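For reference, here is a small pure-Python sketch of what a non-crashing conversion could compute. It assumes the Arrow format's definition of date64 (milliseconds since the UNIX epoch, with values evenly divisible by 86400000) and further assumes that the time of day should simply be truncated; Arrow may instead choose to reject values with a non-zero time component. The helper names `ns_to_date64` and `date64_to_date` are hypothetical and are not part of pyarrow.

```python
from datetime import date, datetime, timedelta

NS_PER_DAY = 86_400 * 10**9  # nanoseconds in one day
MS_PER_DAY = 86_400 * 10**3  # milliseconds in one day


def ns_to_date64(ns: int) -> int:
    """Truncate a datetime64[ns] value (nanoseconds since the UNIX epoch)
    to a date64 value (milliseconds since the epoch, whole days only).

    Hypothetical helper: assumes truncation is the desired semantics.
    """
    # Floor division rounds toward negative infinity, so timestamps before
    # 1970-01-01 still truncate to the start of their own day.
    return (ns // NS_PER_DAY) * MS_PER_DAY


def date64_to_date(ms: int) -> date:
    """Interpret a date64 value as a calendar date (hypothetical helper)."""
    return date(1970, 1, 1) + timedelta(days=ms // MS_PER_DAY)


# The timestamp from the report, expressed as integer nanoseconds since the epoch.
delta = datetime(2018, 5, 10, 10, 24, 1) - datetime(1970, 1, 1)
ns = (delta.days * 86_400 + delta.seconds) * 10**9 + delta.microseconds * 1_000

value = ns_to_date64(ns)
print(value % MS_PER_DAY == 0)  # True: the time of day has been dropped
print(date64_to_date(value))    # 2018-05-10
```

Using floor division rather than `int(ns / NS_PER_DAY)` keeps pre-epoch timestamps on the correct day and avoids floating-point rounding on large nanosecond values.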





Apache process questions

2018-04-04 Thread Andy Grove
Hi,

I've been creating some frustrations for myself this week because I'm not
sure how to work efficiently now that the Rust version of Apache Arrow is
in the official Apache repo.

It seems I have two conflicting requirements:

1. I want Apache Arrow [Rust] to be a community-driven high quality piece
of software built by many contributors, with a thoughtful code review
process.

2. I want to move fast and innovate on my other open source project which
now depends on Arrow, and I want to help other projects (such as
https://github.com/sunchao/parquet-rs) integrate with Arrow.

I have two questions that I need guidance on:

1. Is it ok for me to push updated releases to crates.io?

We currently have a 0.1.0 release of the Rust library in crates.io (
https://crates.io/crates/arrow) which was made to reserve the Arrow name
there (with approval from Wes). The github repo has changed quite a bit
since this release but this is the only version that Rust users can easily
pull into their projects as a versioned dependency. Understanding that this
isn't an official Apache release since it hasn't been through a release
process, is it OK to push new versions of this unofficial release as PRs
are accepted into the repo? I have the *ability* to do this but I don't
know if I have *approval* to do this.

My argument for doing this is that although other Rust developers can set
up a github dependency in their Cargo.toml, this just isn't a natural way
to work in Rust so we are making it hard for people to experiment with
Arrow. The code is available in github and anyone could fork the code and
make their own release to crates.io but I'd prefer them to use the one we
make.

2. Is releasing from a fork of the code (under a different name) acceptable?

We cannot stop people forking the Arrow repo and making their own releases
to crates.io under a different name.

I'm wondering if I should just do that for now e.g. release a
"datafusion-arrow" project from the branch in my fork where I have merged
all of my PRs. This way I can keep moving fast and if other projects such
as parquet-rs want to depend on my releases as a temporary measure, they
can, until the official Arrow crate catches up. This would fix my short
term pains but I don't want to detract from the official project.

I'd appreciate some guidance on the best way forward.

Thanks,

Andy.


Re: Next Arrow sync call

2018-04-04 Thread Phillip Cloud
Please add me as well.

On Thu, Mar 29, 2018 at 3:36 PM Li Jin  wrote:

> Please add me in the gcal invite too. Thx
> On Thu, Mar 29, 2018 at 3:12 PM Paul Taylor  wrote:
>
> > I'd like to join the gcal invite as well. Thanks!
> >
> > > On Mar 29, 2018, at 11:10 AM, Wes McKinney wrote:
> > >
> > > Looks good.
> > >
> > > The next Arrow sync will be Wednesday April 4 at 12:00 US Eastern time
> > >
> > > On Thu, Mar 29, 2018 at 7:53 AM, Uwe L. Korn  wrote:
> > >> Hi,
> > >>
> > >> I've added all who have requested an invite. Hope this worked
> > >> even though I'm not the organiser.
> > >>
> > >> Uwe
> > >>
> > >> On Thu, Mar 29, 2018, at 1:11 PM, Deepak Majeti wrote:
> > >>> Wes,
> > >>>
> > >>> Can you add me too? Thanks!
> > >>>
> > >>> On Wed, Mar 28, 2018 at 9:52 PM, Alex Hagerman <a...@unexpectedeof.net> wrote:
> > >>>
> >  Hi,
> > 
> >  Can I get an invite as well?
> > 
> >  Thank you.
> > 
> >  Alex
> > 
> > 
> > 
> >  On 03/28/2018 09:28 PM, Aneesh Karve wrote:
> > 
> > > Hi Wes, please add me to the Gcal invite. Thank you.
> > >
> > >
> > 
> > >>>
> > >>>
> > >>> --
> > >>> regards,
> > >>> Deepak Majeti
> >
> >
>


[jira] [Created] (ARROW-2390) [C++/Python] CheckPyError() could inspect exception type

2018-04-04 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2390:
-

 Summary: [C++/Python] CheckPyError() could inspect exception type
 Key: ARROW-2390
 URL: https://issues.apache.org/jira/browse/ARROW-2390
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++, Python
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


Currently, {{CheckPyError}} always chooses an "unknown error" status. It could
instead inspect the Python exception and choose, e.g., a "type error" status for
a {{TypeError}} exception, and so on.

See also ARROW-2389
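A minimal Python sketch of the mapping this issue asks for, for illustration only: the actual change would live in the C++ `CheckPyError`. The status names below are hypothetical placeholders rather than Arrow's real `StatusCode` values, and `status_for_exception` is an invented helper.

```python
# Hypothetical exception-to-status table; the strings stand in for whatever
# StatusCode values the C++ side would actually use.
_STATUS_BY_EXCEPTION = {
    TypeError: "TypeError",
    KeyError: "KeyError",
    IndexError: "IndexError",
    MemoryError: "OutOfMemory",
    OverflowError: "OverflowError",  # would depend on ARROW-2389 landing
    ValueError: "Invalid",
    NotImplementedError: "NotImplemented",
    OSError: "IOError",
}


def status_for_exception(exc: BaseException) -> str:
    """Return the status for the most specific mapped class in exc's MRO,
    falling back to "UnknownError" (the current behaviour) otherwise."""
    for cls in type(exc).__mro__:
        if cls in _STATUS_BY_EXCEPTION:
            return _STATUS_BY_EXCEPTION[cls]
    return "UnknownError"


print(status_for_exception(TypeError("bad type")))  # TypeError
print(status_for_exception(FileNotFoundError()))    # IOError
print(status_for_exception(ZeroDivisionError()))    # UnknownError
```

Walking `type(exc).__mro__` lets subclasses inherit their parent's mapping, so `FileNotFoundError` maps to "IOError" through `OSError`, while an exception with no mapped ancestor still gets the existing "unknown error" status.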





[jira] [Created] (ARROW-2389) [C++] Add StatusCode::OverflowError

2018-04-04 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2389:
-

 Summary: [C++] Add StatusCode::OverflowError
 Key: ARROW-2389
 URL: https://issues.apache.org/jira/browse/ARROW-2389
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


It may be useful to have a {{StatusCode::OverflowError}} return code to signal
that something exceeded the allowed limits (e.g. the 2GB limit for string or
binary values).
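To illustrate where such a code would apply: Arrow's string and binary arrays use 32-bit signed offsets, so the data in a single array cannot exceed 2**31 - 1 bytes. A converter can often stay under that limit by splitting values across several chunks (as a ChunkedArray does); only a single value that is itself too large cannot be accommodated. The pure-Python sketch below, with a hypothetical `chunk_binary_values` helper, raises Python's built-in `OverflowError` in that case as a stand-in for the proposed status code.

```python
INT32_MAX = 2**31 - 1  # largest end offset a signed 32-bit offsets buffer can hold


def chunk_binary_values(values, limit=INT32_MAX):
    """Group byte strings into consecutive chunks whose total size is at most
    `limit`, so every chunk's offsets fit in a signed 32-bit integer.

    Raises OverflowError when a single value is larger than `limit`, because
    no choice of chunk boundaries can make it fit.
    """
    chunks = []
    current = []
    current_size = 0
    for value in values:
        size = len(value)
        if size > limit:
            raise OverflowError(f"value of {size} bytes exceeds the {limit}-byte limit")
        if current_size + size > limit:
            # Close the current chunk before its final offset would overflow.
            chunks.append(current)
            current = []
            current_size = 0
        current.append(value)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

For example, `chunk_binary_values([b"aaaa", b"bb", b"ccc"], limit=6)` returns `[[b"aaaa", b"bb"], [b"ccc"]]`, while `chunk_binary_values([b"x" * 7], limit=6)` raises `OverflowError`.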


