Re: [Java] PR Reviewers

2020-01-27 Thread Micah Kornfield
Thanks for the offers. I'll try to do a triage pass in the next few days and
tag some of the people who have volunteered.

Cheers,
Micah

On Mon, Jan 27, 2020 at 10:40 AM Ryan Murray  wrote:

> Hey all, I would love to help out. Are there any specific ones that are
> relatively easy for me to get started on?
>
> On Mon, 27 Jan 2020, 18:31 Bryan Cutler wrote:
>
> > Hi Micah, I don't have a ton of bandwidth at the moment, but I'll try to
> > review some more PRs. Anyone, please feel free to ping me too if you have a
> > stale PR that needs some help getting through. Outreach to other Java
> > communities sounds like a good idea - more Java users would definitely be a
> > good thing!
> >
> > Bryan
> >
> > On Mon, Jan 27, 2020 at 8:12 AM Andy Grove wrote:
> >
> > > I've now started working with the Java implementation of Arrow,
> > > specifically Flight, and would be happy to help although I do have limited
> > > time each week. I can at least review from a Java correctness point of
> > > view.
> > >
> > > Andy.
> > >
> > > On Thu, Jan 23, 2020 at 9:41 PM Micah Kornfield wrote:
> > >
> > > > I mentioned this elsewhere but my intent is to stop doing Java reviews for
> > > > the immediate future once I wrap up the few that I have requested changes
> > > > on.
> > > >
> > > > I'm happy to try to triage incoming Java PRs, but in order to do this, I
> > > > need to know which committers have some bandwidth to do reviews (on some of
> > > > the existing PRs I've tagged people who never responded).
> > > >
> > > > Thanks,
> > > > Micah
> > > >
> > >
> >
>


Re: [Java] PR Reviewers

2020-01-27 Thread Micah Kornfield
>
> Somewhat related, but are there any thoughts about growing the Java
> developer community generally? Perhaps we could do some outreach to
> other Java-focused Apache communities (Iceberg comes to mind, but
> there may be others)?

I'm all for this.  I think one of the things that we are lacking a little
bit on the Java side of things is a clear idea of what we want to build
into Apache Arrow proper.  For instance, in the past, I've been -0.5
on trying to replicate the work that is on-going on the C++ side of things,
but maybe we should reconsider that? Or at least more JNI bindings?
Getting more input on this would be useful especially from those outside
the community.  I still think a strong set of adapter libraries, especially
if we can make them "best of class" in performance would be beneficial for
adoption.

> Not directly related, but it would be nice if Java contributors could
> fill the holes in the 0.16.0 release blog post.  Currently the Java
> section is empty:
> https://github.com/apache/arrow-site/pull/41


I put a few bullet points in.

On Mon, Jan 27, 2020 at 11:08 AM Antoine Pitrou  wrote:

>
> Not directly related, but it would be nice if Java contributors could
> fill the holes in the 0.16.0 release blog post.  Currently the Java
> section is empty:
> https://github.com/apache/arrow-site/pull/41
>
> Regards
>
> Antoine.
>
>
> Le 27/01/2020 à 19:40, Ryan Murray a écrit :
> > Hey all, I would love to help out. Are there any specific ones that are
> > relatively easy for me to get started on?
> >
>


[jira] [Created] (ARROW-7698) [Format][C++] Add tensor and sparse tensor supports in File metadata

2020-01-27 Thread Kenta Murata (Jira)
Kenta Murata created ARROW-7698:
---

 Summary: [Format][C++] Add tensor and sparse tensor supports in 
File metadata
 Key: ARROW-7698
 URL: https://issues.apache.org/jira/browse/ARROW-7698
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Format
Reporter: Kenta Murata






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [Format] Array/RowBatch filters

2020-01-27 Thread Micah Kornfield
Thanks for all the input:

> I think having support for this in some way in the IPC
> protocol makes sense (it seems slightly less important for the C API
> but worth thinking about

The way I read Jacques's e-mail, it seems like the opposite might be true
(at least for Dremio).  For IPC I think there is probably a sweet spot
where it doesn't pay to compact the batches, but it would likely take some
tuning.


> The question is how mechanically, would it be some extra buffers at
> the start or end of the record batch body (probably have to be at the
> end of the body for forward compatibility reasons)?

I think for RecordBatch it would be an extra buffer either at the beginning
or the end.  It's possible that putting it at the end would allow better
forwards compatibility.  I haven't really given much thought to the design
here.  My main concern is to define appropriate metadata before 1.0.0 to
maintain forwards compatibility.  My thinking is the metadata would be an
enum or a nullable table, with the default (or null) indicating "no filters".
Implementations could then determine if they know how to understand the
corresponding buffers correctly based on the metadata.
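
To make the placeholder idea concrete, here is a minimal Rust sketch. Everything
in it is hypothetical: the `FilterKind` enum, its variants, the numeric codes and
the `check_reader_support` helper are illustrative assumptions, not anything
defined in Schema.fbs or agreed on in this thread. It only shows how a reader
could accept the default "no filter" value and reject encodings it does not
understand.

```rust
/// Hypothetical filter tag carried in record batch metadata.
/// Code 0 is the default and means the body is laid out exactly as today.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FilterKind {
    NoFilter,
    Bitmap,
    SelectionVector16,
}

/// Decode a wire code. `None` means this reader has never heard of the
/// encoding, for example because a newer writer produced it.
fn decode_filter_kind(code: u8) -> Option<FilterKind> {
    match code {
        0 => Some(FilterKind::NoFilter),
        1 => Some(FilterKind::Bitmap),
        2 => Some(FilterKind::SelectionVector16),
        _ => None,
    }
}

/// A reader that only understands unfiltered batches: it accepts the
/// default and reports a clear error for anything else.
fn check_reader_support(code: u8) -> Result<(), String> {
    match decode_filter_kind(code) {
        Some(FilterKind::NoFilter) => Ok(()),
        Some(kind) => Err(format!("filter {:?} is not supported by this reader", kind)),
        None => Err(format!("unknown filter code {}", code)),
    }
}
```

With this shape, `check_reader_support(0)` succeeds while any other code yields
an error the application can surface, which is the behaviour the placeholder is
meant to guarantee for pre-existing readers.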

I can try to put up a straw-man PR for metadata if we think this is worth
pursuing further.

Thanks,
Micah

P.S. This also raises a slightly related concern about letting applications
negotiate "capabilities" at a finer grained level (e.g. letting the
transmitter know that the receiver only supports unfiltered values).

On Mon, Jan 27, 2020 at 8:34 PM Wes McKinney  wrote:

> hi Micah -- I think having support for this in some way in the IPC
> protocol makes sense (it seems slightly less important for the C API
> but worth thinking about). It's helpful to know that Dremio (a big
> Arrow user) already employs various filters / selection vectors.
>
> The question is how mechanically, would it be some extra buffers at
> the start or end of the record batch body (probably have to be at the
> end of the body for forward compatibility reasons)?
>
> On Sun, Jan 26, 2020 at 1:16 PM Jacques Nadeau  wrote:
> >
> > At Dremio, we use four main types of selection vector/bitmaps:
> >
> > Dense Format (record valid or not, no ordering)
> > - single bit (bitmap)
> >
> > Sparse formats (identifies valid records as well as their order)
> > - 2 byte (for record batches up to 2^16 records).
> > - 4 byte (for 2^16 batches of 2^16 records);
> > - 6 byte (for 2^32 batches of 2^16 records);
> >
> > We've considered introducing a couple more. I imagine for other use cases,
> > where people use much larger batches of records, different requirements
> > would be necessary. My reason for sharing is it seems like this may be
> > use-case specific. I'd also note that at the IPC level, you'd generally
> > want to contract batches before dropping them on the wire (or at least that
> > is what we typically do).
> >
> > On Fri, Jan 24, 2020 at 11:23 PM Micah Kornfield 
> > wrote:
> >
> > > I was thinking selection vector/bitmap (possibly with different encodings),
> > > but really nothing for now.  Ordinarily, I'd lean towards YAGNI but there
> > > isn't a good way to add this in easily in a forward compatible way unless
> > > we add a placeholder enum/table for 1.0 (the default option would be no
> > > filter and wouldn't change the packaged data at all).
> > >
> > > On Fri, Jan 24, 2020 at 4:55 AM Francois Saint-Jacques <
> > > fsaintjacq...@gmail.com> wrote:
> > >
> > > > By filter, you mean a filter expression, or a selection vector/bitmap?
> > > >
> > > > On Thu, Jan 23, 2020 at 11:38 PM Micah Kornfield <emkornfi...@gmail.com>
> > > > wrote:
> > > > >
> > > > > One of the things that I think got overlooked in the conversation on having
> > > > > a slice offset in the C API was a suggestion from Jacques of perhaps
> > > > > generalizing the concept to an arbitrary "filter" for arrays/record batches.
> > > > >
> > > > > I believe this point was also discussed in the past as well.  I'm not
> > > > > advocating for adding it now but I'm curious if people feel we should add
> > > > > something to Schema.fbs for forward compatibility, in case we wish to
> > > > > support this use-case in the future.
> > > > >
> > > > > Thanks,
> > > > > Micah
> > > >
> > >
>


Re: [Format] Array/RowBatch filters

2020-01-27 Thread Wes McKinney
hi Micah -- I think having support for this in some way in the IPC
protocol makes sense (it seems slightly less important for the C API
but worth thinking about). It's helpful to know that Dremio (a big
Arrow user) already employs various filters / selection vectors.

The question is how mechanically, would it be some extra buffers at
the start or end of the record batch body (probably have to be at the
end of the body for forward compatibility reasons)?
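
For readers less familiar with the two families of filters described in the
quoted message from Jacques below, the following Rust sketch contrasts a dense
bitmap with a sparse 2-byte selection vector. It is an illustration under
stated assumptions, not Dremio's or Arrow's implementation: the function names
are hypothetical, the column is a plain `i32` slice, and the bitmap is assumed
to be least-significant-bit first (as Arrow validity bitmaps are).

```rust
/// Dense form: one bit per record, least-significant bit first within each
/// byte. Surviving records keep their original relative order.
/// Panics if `bitmap` has fewer than ceil(values.len() / 8) bytes.
fn apply_bitmap(values: &[i32], bitmap: &[u8]) -> Vec<i32> {
    values
        .iter()
        .enumerate()
        .filter(|&(i, _)| (bitmap[i / 8] >> (i % 8)) & 1 == 1)
        .map(|(_, v)| *v)
        .collect()
}

/// Sparse form: 2-byte record indices, so a batch can hold at most 2^16
/// records. The order of the indices is the order of the output, which is
/// why this form can also express a reordering.
/// Panics if an index is out of range for `values`.
fn apply_selection_u16(values: &[i32], selection: &[u16]) -> Vec<i32> {
    selection.iter().map(|&i| values[i as usize]).collect()
}
```

For a column `[10, 20, 30, 40, 50]`, the bitmap byte `0b0001_0101` keeps
records 0, 2 and 4 in place, whereas the selection vector `[4, 0, 2]` picks the
same records but emits them in a different order.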

On Sun, Jan 26, 2020 at 1:16 PM Jacques Nadeau  wrote:
>
> At Dremio, we use four main types of selection vector/bitmaps:
>
> Dense Format (record valid or not, no ordering)
> - single bit (bitmap)
>
> Sparse formats (identifies valid records as well as their order)
> - 2 byte (for record batches up to 2^16 records).
> - 4 byte (for 2^16 batches of 2^16 records);
> - 6 byte (for 2^32 batches of 2^16 records);
>
> We've considered introducing a couple more. I imagine for other use cases,
> where people use much larger batches of records, different requirements
> would be necessary. My reason for sharing is it seems like this may be
> use-case specific. I'd also note that at the IPC level, you'd generally
> want to contract batches before dropping them on the wire (or at least that
> is what we typically do).
>
> On Fri, Jan 24, 2020 at 11:23 PM Micah Kornfield 
> wrote:
>
> > I was thinking selection vector/bitmap (possibly with different encodings),
> > but really nothing for now.  Ordinarily, I'd lean towards YAGNI but there
> > isn't a good way to add this in easily in a forward compatible way unless
> > we add a placeholder enum/table for 1.0 (the default option would be no
> > filter and wouldn't change the packaged data at all).
> >
> > On Fri, Jan 24, 2020 at 4:55 AM Francois Saint-Jacques <
> > fsaintjacq...@gmail.com> wrote:
> >
> > > By filter, you mean a filter expression, or a selection vector/bitmap?
> > >
> > > On Thu, Jan 23, 2020 at 11:38 PM Micah Kornfield 
> > > wrote:
> > > >
> > > > One of the things that I think got overlooked in the conversation on having
> > > > a slice offset in the C API was a suggestion from Jacques of perhaps
> > > > generalizing the concept to an arbitrary "filter" for arrays/record batches.
> > > >
> > > > I believe this point was also discussed in the past as well.  I'm not
> > > > advocating for adding it now but I'm curious if people feel we should add
> > > > something to Schema.fbs for forward compatibility, in case we wish to
> > > > support this use-case in the future.
> > > >
> > > > Thanks,
> > > > Micah
> > >
> >


[jira] [Created] (ARROW-7697) [Release] Add a test for updating Linux packages by 00-prepare.sh

2020-01-27 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-7697:
---

 Summary: [Release] Add a test for updating Linux packages by 
00-prepare.sh
 Key: ARROW-7697
 URL: https://issues.apache.org/jira/browse/ARROW-7697
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-7696) [Release] Unit test on release branch is failed

2020-01-27 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-7696:
---

 Summary: [Release] Unit test on release branch is failed
 Key: ARROW-7696
 URL: https://issues.apache.org/jira/browse/ARROW-7696
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou


https://github.com/kszucs/arrow/runs/410980755

{noformat}
8 tests, 6 assertions, 1 failures, 2 errors, 0 pendings, 0 omissions, 0 notifications
{noformat}





Re: PR Dashboard for Java?

2020-01-27 Thread Wes McKinney
Bryan -- I just gave you (cutlerb) Confluence edit privileges. These
have to be explicitly managed on a per-user basis to avoid spam
problems

On Mon, Jan 27, 2020 at 4:12 PM Bryan Cutler  wrote:
>
> Thanks Neal, but it doesn't look like I have confluence privileges. That's
> fine though, the github interface is easy enough.
>
> On Mon, Jan 27, 2020 at 11:59 AM Neal Richardson <
> neal.p.richard...@gmail.com> wrote:
>
> > If you have confluence privileges, duplicate a page like
> > https://cwiki.apache.org/confluence/display/ARROW/Ruby+JIRA+Dashboard and
> > then edit the Jira query (something like status in open/in
> > progress/reopened, labels = pull-request-available, component = java,
> > project = ARROW) if you want to make it Java issues that have pull requests
> > open.
> >
> > Or you could bookmark
> >
> > https://github.com/apache/arrow/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+%22%5BJava%5D%22
> > or https://github.com/apache/arrow/labels/lang-java
> >
> > Neal
> >
> > On Mon, Jan 27, 2020 at 11:26 AM Bryan Cutler  wrote:
> >
> > > I saw on Confluence that other Arrow components have PR dashboards, but I
> > > don't see one for Java? I think it would be helpful, is it difficult to
> > add
> > > one for Java? I'm happy to do it if someone could point me in the right
> > > direction. Thanks!
> > >
> > > Bryan
> > >
> >


[jira] [Created] (ARROW-7695) [Release] Update java versions to 0.16-SNAPSHOT

2020-01-27 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-7695:
--

 Summary: [Release] Update java versions to 0.16-SNAPSHOT
 Key: ARROW-7695
 URL: https://issues.apache.org/jira/browse/ARROW-7695
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Krisztian Szucs
 Fix For: 0.16.0








Re: PR Dashboard for Java?

2020-01-27 Thread Bryan Cutler
Thanks Neal, but it doesn't look like I have confluence privileges. That's
fine though, the github interface is easy enough.

On Mon, Jan 27, 2020 at 11:59 AM Neal Richardson <
neal.p.richard...@gmail.com> wrote:

> If you have confluence privileges, duplicate a page like
> https://cwiki.apache.org/confluence/display/ARROW/Ruby+JIRA+Dashboard and
> then edit the Jira query (something like status in open/in
> progress/reopened, labels = pull-request-available, component = java,
> project = ARROW) if you want to make it Java issues that have pull requests
> open.
>
> Or you could bookmark
>
> https://github.com/apache/arrow/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+%22%5BJava%5D%22
> or https://github.com/apache/arrow/labels/lang-java
>
> Neal
>
> On Mon, Jan 27, 2020 at 11:26 AM Bryan Cutler  wrote:
>
> > I saw on Confluence that other Arrow components have PR dashboards, but I
> > don't see one for Java? I think it would be helpful, is it difficult to
> add
> > one for Java? I'm happy to do it if someone could point me in the right
> > direction. Thanks!
> >
> > Bryan
> >
>


[jira] [Created] (ARROW-7694) [Packaging][deb][RPM] Can't build repository packages for RC

2020-01-27 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-7694:
---

 Summary: [Packaging][deb][RPM] Can't build repository packages for 
RC
 Key: ARROW-7694
 URL: https://issues.apache.org/jira/browse/ARROW-7694
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou


apache-arrow-archive-keyring failure:

https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=5737&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=5b4cc83a-7bb0-5664-5bb1-588f7e4dc05b&l=13284

{noformat}
2020-01-27T16:02:31.2221451Z /host/build.sh: 27: cd: can't cd to apache-arrow-archive-keyring-0.16.0/
{noformat}

apache-arrow-release failure:

https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=5774&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=5b4cc83a-7bb0-5664-5bb1-588f7e4dc05b&l=10330

{noformat}
/var/tmp/rpm-tmp.IfEC8a: line 39: cd: apache-arrow-release-0.16.0: No such file or directory
{noformat}





Re: PR Dashboard for Java?

2020-01-27 Thread Neal Richardson
If you have confluence privileges, duplicate a page like
https://cwiki.apache.org/confluence/display/ARROW/Ruby+JIRA+Dashboard and
then edit the Jira query (something like status in open/in
progress/reopened, labels = pull-request-available, component = java,
project = ARROW) if you want to make it Java issues that have pull requests
open.

Or you could bookmark
https://github.com/apache/arrow/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+%22%5BJava%5D%22
or https://github.com/apache/arrow/labels/lang-java

Neal

On Mon, Jan 27, 2020 at 11:26 AM Bryan Cutler  wrote:

> I saw on Confluence that other Arrow components have PR dashboards, but I
> don't see one for Java? I think it would be helpful, is it difficult to add
> one for Java? I'm happy to do it if someone could point me in the right
> direction. Thanks!
>
> Bryan
>


Re: [DISCUSS][JAVA] Correct the behavior of ListVector isEmpty

2020-01-27 Thread Bryan Cutler
Returning null might be more correct, since `getObject(int index)` also
returns null if the value is not set, but I don't think it's worth making a more
complicated API for this. It should be fine to return `true` for a null
value.
+1 for treating nulls as empty.
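
As a language-neutral illustration of "nulls as empty", here is a small Rust
stand-in for a list column. It is a sketch only: `ListColumn` and its fields are
hypothetical and are not the Java `ListVector` API. It assumes the usual
offsets-plus-validity layout, where slot `i` covers the child range
`offsets[i]..offsets[i + 1]`.

```rust
/// Minimal stand-in for a list column (hypothetical, not the Java API).
/// `offsets[i]..offsets[i + 1]` is the child range of slot `i`, and
/// `validity[i] == false` marks slot `i` as null.
struct ListColumn {
    offsets: Vec<usize>,
    validity: Vec<bool>,
}

impl ListColumn {
    /// Returns true for a null slot or for a valid slot with no elements,
    /// so the result stays a plain boolean rather than a nullable one.
    /// Panics if `index` is out of range.
    fn is_empty(&self, index: usize) -> bool {
        !self.validity[index] || self.offsets[index] == self.offsets[index + 1]
    }
}
```

For the data `[1,2], null, [], [5,6]` from the original question (offsets
`[0, 2, 2, 2, 4]`, validity `[true, false, true, true]`), this yields
`[false, true, true, false]`, the second of the two options Ji Liu listed.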

On Fri, Jan 24, 2020 at 9:12 AM Brian Hulette  wrote:

> What about returning null for a null list? It looks like now the function
> returns a primitive boolean, so I guess that would be a substantial change,
> but null seems more correct to me.
>
> On Thu, Jan 23, 2020, 21:38 Micah Kornfield  wrote:
>
> >  I would vote for treating nulls as empty.
> >
> > On Fri, Jan 10, 2020 at 12:36 AM Ji Liu 
> > wrote:
> >
> > > Hi all,
> > >
> > > Currently the isEmpty API always returns false in BaseRepeatedValueVector,
> > > and its subclass ListVector does not override this method.
> > > This leads to incorrect results: for example, a ListVector with data
> > > [1,2], null, [], [5,6] would give [false, false, false, false], which is
> > > not right.
> > > I opened a PR to fix this [1] but am not sure what the right behavior is for
> > > a null value: should it return [false, false, true, false] or [false, true,
> > > true, false]?
> > >
> > >
> > > Thanks,
> > > Ji Liu
> > >
> > >
> > > [1] https://github.com/apache/arrow/pull/6044
> > >
> > >
> >
>


PR Dashboard for Java?

2020-01-27 Thread Bryan Cutler
I saw on Confluence that other Arrow components have PR dashboards, but I
don't see one for Java. I think it would be helpful; is it difficult to add
one for Java? I'm happy to do it if someone could point me in the right
direction. Thanks!

Bryan


Re: [Java] PR Reviewers

2020-01-27 Thread Antoine Pitrou


Not directly related, but it would be nice if Java contributors could
fill the holes in the 0.16.0 release blog post.  Currently the Java
section is empty:
https://github.com/apache/arrow-site/pull/41

Regards

Antoine.


Le 27/01/2020 à 19:40, Ryan Murray a écrit :
> Hey all, I would love to help out. Are there any specific ones that are
> relatively easy for me to get started on?
> 


Re: [Java] PR Reviewers

2020-01-27 Thread Ryan Murray
Hey all, I would love to help out. Are there any specific ones that are
relatively easy for me to get started on?

On Mon, 27 Jan 2020, 18:31 Bryan Cutler,  wrote:

> Hi Micah, I don't have a ton of bandwidth at the moment, but I'll try to
> review some more PRs. Anyone, please feel free to ping me too if you have a
> stale PR that needs some help getting through. Outreach to other Java
> communities sounds like a good idea - more Java users would definitely be a
> good thing!
>
> Bryan
>
> On Mon, Jan 27, 2020 at 8:12 AM Andy Grove  wrote:
>
> > I've now started working with the Java implementation of Arrow,
> > specifically Flight, and would be happy to help although I do have limited
> > time each week. I can at least review from a Java correctness point of
> > view.
> >
> > Andy.
> >
> > On Thu, Jan 23, 2020 at 9:41 PM Micah Kornfield
> > wrote:
> >
> > > I mentioned this elsewhere but my intent is to stop doing Java reviews for
> > > the immediate future once I wrap up the few that I have requested changes
> > > on.
> > >
> > > I'm happy to try to triage incoming Java PRs, but in order to do this, I
> > > need to know which committers have some bandwidth to do reviews (on some of
> > > the existing PRs I've tagged people who never responded).
> > >
> > > Thanks,
> > > Micah
> > >
> >
>


[jira] [Created] (ARROW-7693) [CI] Fix test-conda-python-3.7-spark-master nightly errors

2020-01-27 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-7693:
---

 Summary: [CI] Fix test-conda-python-3.7-spark-master nightly errors
 Key: ARROW-7693
 URL: https://issues.apache.org/jira/browse/ARROW-7693
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration
Reporter: Bryan Cutler
Assignee: Bryan Cutler


Spark master renamed some tests, need to update





Re: [Java] PR Reviewers

2020-01-27 Thread Bryan Cutler
Hi Micah, I don't have a ton of bandwidth at the moment, but I'll try to
review some more PRs. Anyone, please feel free to ping me too if you have a
stale PR that needs some help getting through. Outreach to other Java
communities sounds like a good idea - more Java users would definitely be a
good thing!

Bryan

On Mon, Jan 27, 2020 at 8:12 AM Andy Grove  wrote:

> I've now started working with the Java implementation of Arrow,
> specifically Flight, and would be happy to help although I do have limited
> time each week. I can at least review from a Java correctness point of
> view.
>
> Andy.
>
> On Thu, Jan 23, 2020 at 9:41 PM Micah Kornfield 
> wrote:
>
> > I mentioned this elsewhere but my intent is to stop doing Java reviews for
> > the immediate future once I wrap up the few that I have requested changes
> > on.
> >
> > I'm happy to try to triage incoming Java PRs, but in order to do this, I
> > need to know which committers have some bandwidth to do reviews (on some of
> > the existing PRs I've tagged people who never responded).
> >
> > Thanks,
> > Micah
> >
>


[jira] [Created] (ARROW-7692) [Rust] Several pattern matches are hard to read

2020-01-27 Thread Jira
François Garillot created ARROW-7692:


 Summary: [Rust] Several pattern matches are hard to read
 Key: ARROW-7692
 URL: https://issues.apache.org/jira/browse/ARROW-7692
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: François Garillot


Several pattern matches can be rewritten directly using a combinator, e.g.
array's `value_as_date`, more succinctly expressed as a `map`:

    match self.value_as_datetime(i) {
        Some(datetime) => Some(datetime.date()),
        None => None,
    }

More importantly, some of these matches obscure what the code is doing, e.g.
parquet column writer `read_fully`'s extraction of a mutable slice:

    let actual_def_levels = match &mut def_levels {
        Some(ref mut vec) => Some(&mut vec[..]),
        None => None,
    };

which can be written, using `as_mut` and `map`, as:

    let actual_def_levels = def_levels.as_mut().map(|vec| &mut vec[..]);

A large number of these are meant to be addressed in
[https://github.com/apache/arrow/pull/6292/files]
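
The equivalence claimed above can be checked in isolation. The following is a
self-contained sketch, not code from the Arrow repository: the two function
names and the `i16` element type are assumptions chosen for illustration. Both
functions borrow the vector inside an `Option` as a mutable slice, once with an
explicit match and once with `as_mut` and `map`.

```rust
/// Verbose form: an explicit match that rebuilds the Option by hand.
fn as_slice_with_match(levels: &mut Option<Vec<i16>>) -> Option<&mut [i16]> {
    match levels {
        Some(vec) => Some(&mut vec[..]),
        None => None,
    }
}

/// Combinator form: `as_mut` turns `&mut Option<Vec<_>>` into
/// `Option<&mut Vec<_>>`, and `map` converts the vector into a slice.
fn as_slice_with_map(levels: &mut Option<Vec<i16>>) -> Option<&mut [i16]> {
    levels.as_mut().map(|vec| &mut vec[..])
}
```

Both versions return `None` for `None` and a mutable view of the same elements
otherwise, so writes through the returned slice are visible in the original
vector either way.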





[jira] [Created] (ARROW-7691) [C++] Verify missing fields when walking Flatbuffers data

2020-01-27 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7691:
-

 Summary: [C++] Verify missing fields when walking Flatbuffers data
 Key: ARROW-7691
 URL: https://issues.apache.org/jira/browse/ARROW-7691
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Affects Versions: 0.15.1
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


This will fix some of the issues detected by OSS-Fuzz.





[jira] [Created] (ARROW-7690) Cannot write parquet to OutputStream

2020-01-27 Thread Bob (Jira)
Bob created ARROW-7690:
--

 Summary: Cannot write parquet to OutputStream
 Key: ARROW-7690
 URL: https://issues.apache.org/jira/browse/ARROW-7690
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Affects Versions: 0.15.1
Reporter: Bob


The R package does not allow for the ability to write to a FileOutputStream. 

Minimal testing code:
library(arrow)
tf1 <- arrow::FileOutputStream$create(path = "output.parquet")
arrow::write_parquet(data.frame(x = 1:5), tf1)

Throws error:

Error in inherits(sink, OutputStream) : 'what' must be a character vector

 

The issue appears to be in line 153 of parquet.R

if (is.character(sink)) {
  sink <- FileOutputStream$create(sink)
  on.exit(sink$close())
} else if (!inherits(sink, OutputStream)) {  # the problematic check
  abort("sink must be a file path or an OutputStream")
}

 

It should be !inherits(sink, 'OutputStream'), with the class name passed as a character string.





Re: Improve the ergonomics of new PyArrow FileSystem API in Python ARROW-7584

2020-01-27 Thread Wes McKinney
hi Fabian

I responded on the JIRA. I'm generally supportive of ergonomic
improvements to the FS API in Python. It might make sense to break the
work into multiple patches to ease review burden

Thanks for offering to work on this.

- Wes

On Fri, Jan 24, 2020 at 4:46 AM Fabian Höring  wrote:
>
> Hello,
>
> I created this ticket to discuss possible improvements of the new PyArrow 
> FileSystem API
> https://issues.apache.org/jira/browse/ARROW-7584
>
> As of today there seem to be only two popular projects that have an agnostic
> FileSystem API that can handle S3 & HDFS from Python:
> - PyArrow via https://arrow.apache.org/docs/python/filesystems.html
> - TensorFlow via https://www.tensorflow.org/api_docs/python/tf/io/gfile/GFile
>
> On my side I would like to reuse a clean FileSystem API in my project and
> turned to Arrow for this purpose (I think TensorFlow already handles too
> many use cases and should not provide yet another feature).
>
> "Clean FileSystem API" for me also means covering the interactive use case
> where one uses the API like the file system shell commands. We actually used
> https://github.com/dask/hdfs3 before and it worked really well.
>
> Currently there is the FileSystem API work in progress (see 
> https://github.com/apache/arrow/blob/master/python/pyarrow/_fs.pyx#L185) and 
> I would take the occasion to improve it and fix some issues with the existing 
> API.
>
> Can you have a look at the comments on 
> https://issues.apache.org/jira/browse/ARROW-7584 and give feedback ?
>
> I can do the implementations I suggest on my side but would like to make sure 
> they will be accepted.
>
> Best regards,
> Fabian Höring
>


Re: [Java] PR Reviewers

2020-01-27 Thread Andy Grove
I've now started working with the Java implementation of Arrow,
specifically Flight, and would be happy to help although I do have limited
time each week. I can at least review from a Java correctness point of view.

Andy.

On Thu, Jan 23, 2020 at 9:41 PM Micah Kornfield 
wrote:

> I mentioned this elsewhere but my intent is to stop doing Java reviews for
> the immediate future once I wrap up the few that I have requested changes
> on.
>
> I'm happy to try to triage incoming Java PRs, but in order to do this, I
> need to know which committers have some bandwidth to do reviews (on some of
> the existing PRs I've tagged people who never responded).
>
> Thanks,
> Micah
>


Re: [Java] PR Reviewers

2020-01-27 Thread Wes McKinney
Somewhat related, but are there any thoughts about growing the Java
developer community generally? Perhaps we could do some outreach to
other Java-focused Apache communities (Iceberg comes to mind, but
there may be others)?

On Sat, Jan 25, 2020 at 10:14 PM Brian Hulette  wrote:
>
> I'm still pretty new to the Java implementation, but I can probably help
> out with some reviews.
>
> On Thu, Jan 23, 2020 at 8:41 PM Micah Kornfield 
> wrote:
>
> > I mentioned this elsewhere but my intent is to stop doing Java reviews for
> > the immediate future once I wrap up the few that I have requested changes
> > on.
> >
> > I'm happy to try to triage incoming Java PRs, but in order to do this, I
> > need to know which committers have some bandwidth to do reviews (on some of
> > the existing PRs I've tagged people who never responded).
> >
> > Thanks,
> > Micah
> >


[NIGHTLY] Arrow Build Report for Job nightly-2020-01-27-0

2020-01-27 Thread Crossbow


Arrow Build Report for Job nightly-2020-01-27-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0

Failed Tasks:
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-travis-gandiva-jar-osx
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-circle-test-conda-python-3.7-spark-master
- test-ubuntu-fuzzit-fuzzing:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-circle-test-ubuntu-fuzzit-fuzzing
- test-ubuntu-fuzzit-regression:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-circle-test-ubuntu-fuzzit-regression
- wheel-manylinux2014-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-azure-wheel-manylinux2014-cp37m

Succeeded Tasks:
- centos-6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-azure-centos-6
- centos-7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-azure-centos-7
- centos-8:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-azure-centos-8
- conda-linux-gcc-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-azure-conda-linux-gcc-py27
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-azure-conda-osx-clang-py27
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-azure-conda-win-vs2015-py38
- debian-buster:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-azure-debian-buster
- debian-stretch:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-azure-debian-stretch
- gandiva-jar-trusty:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-travis-gandiva-jar-trusty
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-travis-homebrew-cpp
- macos-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-travis-macos-r-autobrew
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-circle-test-conda-cpp
- test-conda-python-2.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-circle-test-conda-python-2.7-pandas-latest
- test-conda-python-2.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-circle-test-conda-python-2.7
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-circle-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-circle-test-conda-python-3.7-pandas-master
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-circle-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-27-0-circle-test-conda-python-3.7-turbodbc-master
- 

[jira] [Created] (ARROW-7689) [C++] Sporadic Flight test crash on macOS

2020-01-27 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7689:
-

 Summary: [C++] Sporadic Flight test crash on macOS
 Key: ARROW-7689
 URL: https://issues.apache.org/jira/browse/ARROW-7689
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, FlightRPC
Reporter: Antoine Pitrou


See this build:
https://github.com/apache/arrow/pull/6288/checks?check_run_id=409993893

{code}
[--] 2 tests from TestTls
[ RUN  ] TestTls.DoAction
E0127 01:40:23.87112 123145508859904 tls_pthread.cc:26]
assertion failed: 0 == pthread_setspecific(tls->key, (void*)value)
/Users/runner/runners/2.164.0/work/arrow/arrow/cpp/build-support/run-test.sh: 
line 97: 32496 Abort trap: 6   $TEST_EXECUTABLE "$@" 2>&1
 32497 Done| $ROOT/build-support/asan_symbolize.py
 32498 Done| ${CXXFILT:-c++filt}
 32499 Done| 
$ROOT/build-support/stacktrace_addr2line.pl $TEST_EXECUTABLE
 32500 Done| $pipe_cmd 2>&1
 32501 Done| tee $LOGFILE
~/runners/2.164.0/work/arrow/arrow/build/cpp/src/arrow/flight
{code}

This is a gRPC issue, reported here:
https://github.com/grpc/grpc/issues/20311

We should try to bump the bundled gRPC version to see if that fixes the issue.

Side note: why aren't we using the homebrew-provided gRPC?





[jira] [Created] (ARROW-7688) Bump checkstyle from 8.18 to 8.29

2020-01-27 Thread Fokko Driesprong (Jira)
Fokko Driesprong created ARROW-7688:
---

 Summary: Bump checkstyle from 8.18 to 8.29
 Key: ARROW-7688
 URL: https://issues.apache.org/jira/browse/ARROW-7688
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Affects Versions: 0.15.1
Reporter: Fokko Driesprong
Assignee: Fokko Driesprong
 Fix For: 0.16.0





