Re: Arrow Flight connector for SQL Server

2020-05-19 Thread Jacques Nadeau
Hey Brendan,

Welcome to the community. At Dremio we've exposed Flight as an input and
output for SQL result datasets. I'll have one of our guys share some
details. I think a couple of questions we've been struggling with include how
to standardize additional metadata operations, what the prepare behavior
should be, and whether there is a way to standardize exposing a Flight path
as an extension of both JDBC and ODBC.

Can you share more about whether you're initially more focused on input or
output and parallel or single stream?
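
For reference, a rough sketch (purely illustrative; the location, the
SQL-in-a-command descriptor, and the table name are assumptions rather than an
agreed protocol) of what fetching a SQL result over Flight could look like from
Python, covering both the single-stream and parallel-stream cases:

import pyarrow as pa
import pyarrow.flight as flight

# Hypothetical location and descriptor convention.
client = flight.FlightClient("grpc://sqlserver-flight:8815")
descriptor = flight.FlightDescriptor.for_command(b"SELECT * FROM sales")

info = client.get_flight_info(descriptor)

# One endpoint -> a single stream; several endpoints -> streams a client
# could fetch in parallel.
tables = []
for endpoint in info.endpoints:
    reader = client.do_get(endpoint.ticket)
    tables.append(reader.read_all())

result = pa.concat_tables(tables)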

Thanks and welcome
Jacques

On Tue, May 19, 2020, 3:17 PM Brendan Niebruegge
 wrote:

> Hi everyone,
>
> I wanted to informally introduce myself. My name is Brendan Niebruegge,
> I'm a Software Engineer in our SQL Server extensibility team here at
> Microsoft. I am leading an effort to explore how we could integrate Arrow
> Flight with SQL Server. We think this could be a very interesting
> integration that would both benefit SQL Server and the Arrow community. We
> are very early in our thoughts so I thought it best to reach out here and
> see if you had any thoughts or suggestions for me. What would be the best
> way to socialize my thoughts to date? I am keen to learn and deepen my
> knowledge of Arrow as well so please let me know how I can be of help to
> the community.
>
> Please feel free to reach out anytime (email:brn...@microsoft.com)
>
> Thanks,
> Brendan Niebruegge
>
>


[jira] [Created] (ARROW-8869) [Rust] [DataFusion] Type Coercion optimizer rule does not support new scan nodes

2020-05-19 Thread Andy Grove (Jira)
Andy Grove created ARROW-8869:
-

 Summary: [Rust] [DataFusion] Type Coercion optimizer rule does not 
support new scan nodes
 Key: ARROW-8869
 URL: https://issues.apache.org/jira/browse/ARROW-8869
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust, Rust - DataFusion
Affects Versions: 1.0.0
Reporter: Andy Grove
 Fix For: 1.0.0


Type Coercion optimizer rule does not support new scan nodes



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[Rust] Vectorized traits for using arrays outside Arrow

2020-05-19 Thread Vertexclique
Hi;

I wanted to discuss with the Rust lib maintainers how we can improve the 
current status of Rust's DictionaryArray and reading its encoding array 
outside of Arrow. Today a simple predicate filter needs to collect indices 
over an iterator and flat-map over the optional values, or map over the None 
values and replace them with sentinel values. The iterator is written nearly 
frictionless and overhead-free by the implementor (congrats, it looks nice!), 
but there is still the overhead of the iterator itself and of yielding 
elements inside the iterator implementation.

So I propose a simple trait called "Vectorized" which would allow us to 
dispense arrays of a defined type with requested sentinel values. This 
approach would be zero-copy and would use the underlying Buffer type.
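
To illustrate the kind of per-element friction I mean (sketched in 
Python/pyarrow purely as an analogy; the sentinel -1 is an arbitrary choice):

import pyarrow as pa

# Dictionary-encode a column, then pull its index array out for use outside
# Arrow, replacing nulls with a sentinel.
arr = pa.array(["apple", None, "banana", "apple"]).dictionary_encode()
indices = arr.indices  # index array into the dictionary

sentinel = -1
flat = [sentinel if v is None else v for v in indices.to_pylist()]
print(flat)  # [0, -1, 1, 0]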

I am eagerly waiting for your input and I would like to clarify more if needed.

Best,
Mahmut

[jira] [Created] (ARROW-8868) [Python] Feather format cannot store/retrieve lists correctly?

2020-05-19 Thread Farzad Abdolhosseini (Jira)
Farzad Abdolhosseini created ARROW-8868:
---

 Summary: [Python] Feather format cannot store/retrieve lists 
correctly?
 Key: ARROW-8868
 URL: https://issues.apache.org/jira/browse/ARROW-8868
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.17.1
 Environment: Python 3.8.2
PyArrow 0.17.1
Pandas 1.0.3
Linux (Manjaro)
Reporter: Farzad Abdolhosseini


I'm seeing a very weird behavior when I try to store and retrieve a Pandas 
data-frame using the Feather format. Simplified example:
{code:python}
>>> import pandas as pd
>>> df = pd.DataFrame(data={"scalar": [1, 2], "array": [[1], [7]]})
>>> df
 scalar array
0 1   [1]
1 2   [7]
>>> df.to_feather("test.ft")
>>> pd.read_feather("test.ft")
  scalar  array
0  1   [16]
1  2  [1045468844972122628]
{code}
As you can see, the retrieved data is incorrect. I was originally trying to use 
the `feather-format` (not using Pandas directly) and that didn't work well 
either.

By playing around with the data-frame that is to be stored I can also get 
different but still incorrect behavior, e.g. a larger list, an error that says 
the file size is incorrect, or simply a segmentation fault.

 

This is my first time using Feather/Arrow BTW.
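
In case it helps narrow this down, a small sketch (file names are arbitrary, 
and the compression angle is only a guess on my part, not a confirmed cause) 
that writes the same data through pyarrow.feather directly instead of 
{{df.to_feather}}, to separate the pandas conversion from the Feather 
writer/reader:

{code:python}
import pandas as pd
import pyarrow as pa
from pyarrow import feather

df = pd.DataFrame(data={"scalar": [1, 2], "array": [[1], [7]]})

# Convert explicitly, then write/read with pyarrow.feather.
table = pa.Table.from_pandas(df)
feather.write_feather(table, "test_direct.ft")
print(feather.read_table("test_direct.ft").column("array"))

# Feather V2 compresses by default; an uncompressed write may behave
# differently (hypothesis only).
feather.write_feather(table, "test_uncompressed.ft", compression="uncompressed")
print(feather.read_table("test_uncompressed.ft").column("array"))
{code}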



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Release Apache Arrow 0.17.1 - RC1

2020-05-19 Thread Neal Richardson
R submission to CRAN is done and accepted. I'm waiting to do Homebrew until
after the website update, given their pushback last time.

Neal

On Tue, May 19, 2020 at 5:25 AM Uwe L. Korn  wrote:

> Current status:
>
> 1.  [done] rebase (not required for a patch release)
> 2.  [done] upload source
> 3.  [done] upload binaries
> 4.  [done|in-pr] update website
> 5.  [done] upload ruby gems
> 6.  [ ] upload js packages
> 8.  [done] upload C# packages
> 9.  [ ] upload rust crates
> 10. [done] update conda recipes (dropped ppc64le support though)
> 11. [done] upload wheels to pypi
> 12. [nealrichardson] update homebrew packages
> 13. [done] update maven artifacts
> 14. [done|in-pr] update msys2
> 15. [nealrichardson] update R packages
> 16. [done|in-pr] update docs
>
> On Tue, May 19, 2020, at 12:06 AM, Krisztián Szűcs wrote:
> > Current status:
> >
> > 1.  [done] rebase (not required for a patch release)
> > 2.  [done] upload source
> > 3.  [done] upload binaries
> > 4.  [done|in-pr] update website
> > 5.  [done] upload ruby gems
> > 6.  [ ] upload js packages
> > 8.  [done] upload C# packages
> > 9.  [ ] upload rust crates
> > 10. [in-progress|in-pr] update conda recipes
> > 11. [done] upload wheels to pypi
> > 12. [nealrichardson] update homebrew packages
> > 13. [done] update maven artifacts
> > 14. [done|in-pr] update msys2
> > 15. [nealrichardson] update R packages
> > 16. [done|in-pr] update docs
> >
> > On Mon, May 18, 2020 at 11:33 PM Sutou Kouhei 
> wrote:
> > >
> > > >> 14. [ ] update msys2
> > > >
> > > > I'll do this.
> > >
> > > Oh, sorry. Krisztián already did!
> > >
> > > In <20200519.062731.1037230979568376433@clear-code.com>
> > >   "Re: [VOTE] Release Apache Arrow 0.17.1 - RC1" on Tue, 19 May 2020
> 06:27:31 +0900 (JST),
> > >   Sutou Kouhei  wrote:
> > >
> > > >> 14. [ ] update msys2
> > > >
> > > > I'll do this.
> > > >
> > > > In <
> cahm19a4wsm3hksf0ubixonu4ru+951viuuavdnzky_tynx-...@mail.gmail.com>
> > > >   "Re: [VOTE] Release Apache Arrow 0.17.1 - RC1" on Mon, 18 May 2020
> 22:37:50 +0200,
> > > >   Krisztián Szűcs  wrote:
> > > >
> > > >> 1.  [done] rebase (not required for a patch release)
> > > >> 2.  [done] upload source
> > > >> 3.  [done] upload binaries
> > > >> 4.  [done] update website
> > > >> 5.  [done] upload ruby gems
> > > >> 6.  [ ] upload js packages
> > > >> No javascript changes were applied to the patch release, for
> > > >> consistency we might want to choose to upload a 0.17.1 release
> though.
> > > >> 8.  [done] upload C# packages
> > > >> 9.  [ ] upload rust crates
> > > >> @Andy Grove the patch release doesn't affect the rust
> implementation.
> > > >> We can update the crates despite that no changes were made, not sure
> > > >> what policy should we choose here (same as with JS)
> > > >> 10. [ ] update conda recipes
> > > >> @Uwe Korn seems like arrow-cpp-feedstock have not picked up the new
> > > >> release once again
> > > >> 11. [done] upload wheels to pypi
> > > >> 12. [nealrichardson] update homebrew packages
> > > >> 13. [done] update maven artifacts
> > > >> 14. [ ] update msys2
> > > >> 15. [nealrichardson] update R packages
> > > >> 16. [in-progress] update docs
> > > >>
> > > >> On Mon, May 18, 2020 at 10:29 PM Krisztián Szűcs
> > > >>  wrote:
> > > >>>
> > > >>> Current status:
> > > >>>
> > > >>> 1.  [done] rebase (not required for a patch release)
> > > >>> 2.  [done] upload source
> > > >>> 3.  [done] upload binaries
> > > >>> 4.  [done] update website
> > > >>> 5.  [ ] upload ruby gems
> > > >>> 6.  [ ] upload js packages
> > > >>> 8.  [ ] upload C# packages
> > > >>> 9.  [ ] upload rust crates
> > > >>> 10. [ ] update conda recipes
> > > >>> 11. [done] upload wheels to pypi
> > > >>> 12. [nealrichardson] update homebrew packages
> > > >>> 13. [done] update maven artifacts
> > > >>> 14. [ ] update msys2
> > > >>> 15. [nealrichardson] update R packages
> > > >>> 16. [in-progress] update docs
> > > >>>
> > > >>> On Mon, May 18, 2020 at 9:39 PM Neal Richardson
> > > >>>  wrote:
> > > >>> >
> > > >>> > I'm working on the R stuff and can do Homebrew again.
> > > >>> >
> > > >>> > Neal
> > > >>> >
> > > >>> > On Mon, May 18, 2020 at 12:30 PM Krisztián Szűcs <
> szucs.kriszt...@gmail.com>
> > > >>> > wrote:
> > > >>> >
> > > >>> > > Any help with the post release tasks is welcome!
> > > >>> > >
> > > >>> > > Checklist:
> > > >>> > > 1.  [done] rebase (not required for a patch release)
> > > >>> > > 2.  [done] upload source
> > > >>> > > 3.  [in-progress] upload binaries
> > > >>> > > 4.  [done] update website
> > > >>> > > 5.  [ ] upload ruby gems
> > > >>> > > 6.  [ ] upload js packages
> > > >>> > > 8.  [ ] upload C# packages
> > > >>> > > 9.  [ ] upload rust crates
> > > >>> > > 10. [ ] update conda recipes
> > > >>> > > 11. [kszucs] upload wheels to pypi
> > > >>> > > 12. [ ] update homebrew packages
> > > >>> > > 13. [kszucs] update maven artifacts
> > > >>> > > 14. [ ] update msys2
> > > >>> > > 15. [ ] update R packages
> > > >>> > > 16. 

Arrow Flight connector for SQL Server

2020-05-19 Thread Brendan Niebruegge
Hi everyone,

I wanted to informally introduce myself. My name is Brendan Niebruegge, I'm a 
Software Engineer in our SQL Server extensibility team here at Microsoft. I am 
leading an effort to explore how we could integrate Arrow Flight with SQL 
Server. We think this could be a very interesting integration that would both 
benefit SQL Server and the Arrow community. We are very early in our thoughts 
so I thought it best to reach out here and see if you had any thoughts or 
suggestions for me. What would be the best way to socialize my thoughts to 
date? I am keen to learn and deepen my knowledge of Arrow as well so please let 
me know how I can be of help to the community.

Please feel free to reach out anytime (email:brn...@microsoft.com)

Thanks,
Brendan Niebruegge



[jira] [Created] (ARROW-8866) [C++] Split Type::UNION into Type::SPARSE_UNION and Type::DENSE_UNION

2020-05-19 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8866:
---

 Summary: [C++] Split Type::UNION into Type::SPARSE_UNION and 
Type::DENSE_UNION
 Key: ARROW-8866
 URL: https://issues.apache.org/jira/browse/ARROW-8866
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


Similar to the recent {{Type::INTERVAL}} split, having these two array types, 
which have different memory layouts, under the same {{Type::type}} value makes 
function dispatch somewhat more complicated. This issue is less critical than 
the INTERVAL one, so it may not be urgent, but it seems like a good pre-1.0 change.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8865) windows distribution for 0.17.1 seems broken (conda only?

2020-05-19 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-8865:
---

 Summary: windows distribution for 0.17.1 seems broken (conda only?
 Key: ARROW-8865
 URL: https://issues.apache.org/jira/browse/ARROW-8865
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.17.1
Reporter: Maarten Breddels


We just started seeing issues with importing pyarrow on our CI:

[https://github.com/vaexio/vaex/pull/749/checks?check_run_id=689857401]

Long logs; the issue appears at line 
[2541|https://github.com/vaexio/vaex/pull/749/checks?check_run_id=689857401#step:15:2541]:
> import pyarrow._parquet as _parquet
E ImportError: DLL load failed: The specified procedure could not be found.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8864) [R] Add methods to Table/RecordBatch for consistency with data.frame

2020-05-19 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8864:
--

 Summary: [R] Add methods to Table/RecordBatch for consistency with 
data.frame
 Key: ARROW-8864
 URL: https://issues.apache.org/jira/browse/ARROW-8864
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


Some methods identified in the Feather package test suite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8862) NumericBuilder does not use MemoryPool passed to CTOR

2020-05-19 Thread Simon Watts (Jira)
Simon Watts created ARROW-8862:
--

 Summary: NumericBuilder does not use MemoryPool passed to CTOR
 Key: ARROW-8862
 URL: https://issues.apache.org/jira/browse/ARROW-8862
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.15.0
Reporter: Simon Watts


{{NumericBuilder}} uses the {{pool}} ({{MemoryPool*}}) parameter to initialise 
the {{ArrayBuilder}} base class, but does not use it to initialise its own 
internal builder, {{data_builder_}} ({{TypedBufferBuilder}}). For comparison 
{{ArrayBuilder}} uses the {{pool}} to initialise its own 
{{null_bitmap_builder_}} member (also a {{TypedBufferBuilder}}).

Found in version 0.15.0, present in current head.

This effect was observed when trying to switch to a custom {{MemoryPool}} for 
performance reasons. A hook was used to detect any use of the {{MemoryPool}} 
provided by {{default_memory_pool()}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Sparse Union format

2020-05-19 Thread Micah Kornfield
Hi Ryan,
In addition to the limitations mentioned above, another one is that only one
column of each type can participate in the union.

There are some old threads on these differences on the mailing list that
should be searchable.

Thanks,
Micah

On Tue, May 19, 2020 at 6:44 AM Antoine Pitrou  wrote:

>
> Also, you may want to run the integration tests and inspect the
> generated JSON file for union data, it will probably be informative
> (look for type ids).
>
> Regards
>
> Antoine.
>
>
> > On 19/05/2020 at 15:38, Ryan Murray wrote:
> > Thanks for the clarification! Next time I will read the whole document
> ;-)
> >
> > On Tue, May 19, 2020 at 2:38 PM Antoine Pitrou 
> wrote:
> >
> >>
> >> As explained in the comment below:
> >> https://github.com/apache/arrow/blob/master/format/Schema.fbs#L91
> >>
> >> Regards
> >>
> >> Antoine.
> >>
> >>
> >> On 19/05/2020 at 14:14, Ryan Murray wrote:
> >>> Thanks Antoine,
> >>>
> >>> Can you just clarify what you mean by 'type ids are logical'? In my
> mind
> >>> type ids are strongly coupled to the types and their order in
> Schema.fbs
> >>> [1]. Do you mean that the order there is only a convention and we can't
> >>> assume that 0 === Null?
> >>>
> >>> Best,
> >>> Ryan
> >>>
> >>> [1] https://github.com/apache/arrow/blob/master/format/Schema.fbs#L235
> >>>
> >>> On Tue, May 19, 2020 at 2:04 PM Antoine Pitrou 
> >> wrote:
> >>>
> 
>  On 19/05/2020 at 13:43, Ryan Murray wrote:
> > Hey All,
> >
> > While working on https://issues.apache.org/jira/browse/ARROW-1692 I
>  noticed
> > that there is a difference between C++ and Java on the way Sparse
> >> Unions
> > are handled. I haven't seen in the format spec which is correct, so I
> > wanted to check with the wider community.
> >
> > c++ (and the integration tests) see sparse unions as:
> > name
> > count
> > VALIDITY[]
> > TYPE_ID[]
> > children[]
> >
> > and java as:
> > name
> > count
> > TYPE[]
> > children[]
> >
> > The precise names may only be important for json reading/writing in
> the
> > integration tests so I will ignore TYPE/TYPE_ID for now. However, the
> >> big
> > difference is that Java doesn't have a validity buffer and c++ does.
> My
> > understanding is that technically the validity buffer is redundant (0
>  type
> > == NULL) so I can see why Java would omit it. My question is then:
> >> which
> > language is 'correct'?
> 
>  Union type ids are logical, so 0 could very well be a valid type id.
>  You can't assume that type 0 means a null entry.
> 
>  Regards
> 
>  Antoine.
> 
> >>>
> >>
> >
>


[jira] [Created] (ARROW-8861) Memory not released until Plasma process is killed

2020-05-19 Thread Chengxin Ma (Jira)
Chengxin Ma created ARROW-8861:
--

 Summary: Memory not released until Plasma process is killed
 Key: ARROW-8861
 URL: https://issues.apache.org/jira/browse/ARROW-8861
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++ - Plasma
Affects Versions: 0.16.0
 Environment: Singularity container (Ubuntu 18.04)
Reporter: Chengxin Ma


Invoking the {{Delete(const ObjectID& object_id)}} method of a plasma client 
does not seem to actually free up the memory used by the object.

To reproduce:
 1. use {{htop}} (or other similar tools) to monitor memory usage;
 2. start up the Plasma Object Store by {{plasma_store -m 10 -s 
/tmp/plasma}};
 3. use {{put.py}} to put an object into Plasma;
 4. compile and run {{delete.cc}} ({{g++ delete.cc `pkg-config --cflags --libs 
arrow plasma` --std=c++11 -o delete}});
 5. kill the {{plasma_store}} process.

Memory usage drops at Step 5, rather than Step 4.

How to free up the memory while keeping Plasma Object Store running?

{{put.py}}:
{code:python}
from pyarrow import plasma

if __name__ == "__main__":
    client = plasma.connect("/tmp/plasma")
    object_id = plasma.ObjectID(20 * b"a")
    object_size = 5
    buffer = memoryview(client.create(object_id, object_size))
    for i in range(5):
        buffer[i] = i % 128
    client.seal(object_id)
    client.disconnect()
{code}
{{delete.cc}}:
{code:cpp}
#include "arrow/util/logging.h"
#include <plasma/client.h>

using namespace plasma;

int main(int argc, char **argv)
{
    PlasmaClient client;
    ARROW_CHECK_OK(client.Connect("/tmp/plasma"));

    // Must match the 20-byte id written by put.py (20 * b"a").
    ObjectID object_id = ObjectID::from_binary("aaaaaaaaaaaaaaaaaaaa");

    ARROW_CHECK_OK(client.Delete(object_id));
    ARROW_CHECK_OK(client.Disconnect());
}
{code}
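
If it helps, a small Python-side sketch (this assumes the Python client's 
{{delete}}/{{contains}}/{{list}} methods behave as documented; the 
reference-counting angle is only an assumption to rule out, not a confirmed 
diagnosis) to check whether the store still tracks the object after the delete:

{code:python}
from pyarrow import plasma

client = plasma.connect("/tmp/plasma")

# Use a different id than put.py so create() does not collide with an
# existing sealed object.
object_id = plasma.ObjectID(20 * b"b")
buf = memoryview(client.create(object_id, 5))
for i in range(5):
    buf[i] = i
client.seal(object_id)

# Drop our own handle first; a live buffer reference could keep the object
# pinned in the store.
del buf

client.delete([object_id])
print("still in store:", client.contains(object_id))
print("objects tracked:", list(client.list().keys()))

client.disconnect()
{code}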
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8860) [C++] Compressed Feather file with struct array roundtrips incorrectly

2020-05-19 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8860:


 Summary: [C++] Compressed Feather file with struct array 
roundtrips incorrectly
 Key: ARROW-8860
 URL: https://issues.apache.org/jira/browse/ARROW-8860
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Joris Van den Bossche


When writing a table with a Struct typed column, this is read back with garbage 
values when using compression (which is the default):

{code:python}
>>> table = pa.table({'col': pa.StructArray.from_arrays([[0,1,2], [1,2,3]], names=["f1", "f2"])})
>>> table.column("col")

[
  -- is_valid: all not null
  -- child 0 type: int64
[
  0,
  1,
  2
]
  -- child 1 type: int64
[
  1,
  2,
  3
]
]

# roundtrip through feather
>>> feather.write_feather(table, "test_struct.feather")
>>> table2 = feather.read_table("test_struct.feather")

>>> table2.column("col")

[
  -- is_valid: all not null
  -- child 0 type: int64
[
  24,
  1261641627085906436,
  1369095386551025664
]
  -- child 1 type: int64
[
  24,
  1405756815161762308,
  281479842103296
]
]
{code}

When not using compression, it is read back correctly:

{code:python}
>>> feather.write_feather(table, "test_struct.feather", compression="uncompressed")
>>> table2 = feather.read_table("test_struct.feather")
>>> table2.column("col")
[
  -- is_valid: all not null
  -- child 0 type: int64
[
  0,
  1,
  2
]
  -- child 1 type: int64
[
  1,
  2,
  3
]
]
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Sparse Union format

2020-05-19 Thread Antoine Pitrou


Also, you may want to run the integration tests and inspect the
generated JSON file for union data, it will probably be informative
(look for type ids).

Regards

Antoine.


On 19/05/2020 at 15:38, Ryan Murray wrote:
> Thanks for the clarification! Next time I will read the whole document ;-)
> 
> On Tue, May 19, 2020 at 2:38 PM Antoine Pitrou  wrote:
> 
>>
>> As explained in the comment below:
>> https://github.com/apache/arrow/blob/master/format/Schema.fbs#L91
>>
>> Regards
>>
>> Antoine.
>>
>>
>> On 19/05/2020 at 14:14, Ryan Murray wrote:
>>> Thanks Antoine,
>>>
>>> Can you just clarify what you mean by 'type ids are logical'? In my mind
>>> type ids are strongly coupled to the types and their order in Schema.fbs
>>> [1]. Do you mean that the order there is only a convention and we can't
>>> assume that 0 === Null?
>>>
>>> Best,
>>> Ryan
>>>
>>> [1] https://github.com/apache/arrow/blob/master/format/Schema.fbs#L235
>>>
>>> On Tue, May 19, 2020 at 2:04 PM Antoine Pitrou 
>> wrote:
>>>

 On 19/05/2020 at 13:43, Ryan Murray wrote:
> Hey All,
>
> While working on https://issues.apache.org/jira/browse/ARROW-1692 I
 noticed
> that there is a difference between C++ and Java on the way Sparse
>> Unions
> are handled. I haven't seen in the format spec which is correct, so I
> wanted to check with the wider community.
>
> c++ (and the integration tests) see sparse unions as:
> name
> count
> VALIDITY[]
> TYPE_ID[]
> children[]
>
> and java as:
> name
> count
> TYPE[]
> children[]
>
> The precise names may only be important for json reading/writing in the
> integration tests so I will ignore TYPE/TYPE_ID for now. However, the
>> big
> difference is that Java doesn't have a validity buffer and c++ does. My
> understanding is that technically the validity buffer is redundant (0
 type
> == NULL) so I can see why Java would omit it. My question is then:
>> which
> language is 'correct'?

 Union type ids are logical, so 0 could very well be a valid type id.
 You can't assume that type 0 means a null entry.

 Regards

 Antoine.

>>>
>>
> 


Re: Sparse Union format

2020-05-19 Thread Ryan Murray
Thanks for the clarification! Next time I will read the whole document ;-)

On Tue, May 19, 2020 at 2:38 PM Antoine Pitrou  wrote:

>
> As explained in the comment below:
> https://github.com/apache/arrow/blob/master/format/Schema.fbs#L91
>
> Regards
>
> Antoine.
>
>
> On 19/05/2020 at 14:14, Ryan Murray wrote:
> > Thanks Antoine,
> >
> > Can you just clarify what you mean by 'type ids are logical'? In my mind
> > type ids are strongly coupled to the types and their order in Schema.fbs
> > [1]. Do you mean that the order there is only a convention and we can't
> > assume that 0 === Null?
> >
> > Best,
> > Ryan
> >
> > [1] https://github.com/apache/arrow/blob/master/format/Schema.fbs#L235
> >
> > On Tue, May 19, 2020 at 2:04 PM Antoine Pitrou 
> wrote:
> >
> >>
> >> On 19/05/2020 at 13:43, Ryan Murray wrote:
> >>> Hey All,
> >>>
> >>> While working on https://issues.apache.org/jira/browse/ARROW-1692 I
> >> noticed
> >>> that there is a difference between C++ and Java on the way Sparse
> Unions
> >>> are handled. I haven't seen in the format spec which is correct, so I
> >>> wanted to check with the wider community.
> >>>
> >>> c++ (and the integration tests) see sparse unions as:
> >>> name
> >>> count
> >>> VALIDITY[]
> >>> TYPE_ID[]
> >>> children[]
> >>>
> >>> and java as:
> >>> name
> >>> count
> >>> TYPE[]
> >>> children[]
> >>>
> >>> The precise names may only be important for json reading/writing in the
> >>> integration tests so I will ignore TYPE/TYPE_ID for now. However, the
> big
> >>> difference is that Java doesn't have a validity buffer and c++ does. My
> >>> understanding is that technically the validity buffer is redundant (0
> >> type
> >>> == NULL) so I can see why Java would omit it. My question is then:
> which
> >>> language is 'correct'?
> >>
> >> Union type ids are logical, so 0 could very well be a valid type id.
> >> You can't assume that type 0 means a null entry.
> >>
> >> Regards
> >>
> >> Antoine.
> >>
> >
>


[jira] [Created] (ARROW-8859) [Rust] [Integration Testing] Implement --quiet / verbose correctly

2020-05-19 Thread Andy Grove (Jira)
Andy Grove created ARROW-8859:
-

 Summary: [Rust] [Integration Testing] Implement --quiet / verbose 
correctly
 Key: ARROW-8859
 URL: https://issues.apache.org/jira/browse/ARROW-8859
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


The Rust tester has verbose=true hard-coded for now.

When running {{archery --quiet}}, RustTester should receive a {{quiet: Bool}} via 
[kwargs|https://github.com/apache/arrow/blob/master/dev/archery/archery/integration/runner.py#L335]
 somewhere.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Sparse Union format

2020-05-19 Thread Antoine Pitrou


As explained in the comment below:
https://github.com/apache/arrow/blob/master/format/Schema.fbs#L91

Regards

Antoine.


On 19/05/2020 at 14:14, Ryan Murray wrote:
> Thanks Antoine,
> 
> Can you just clarify what you mean by 'type ids are logical'? In my mind
> type ids are strongly coupled to the types and their order in Schema.fbs
> [1]. Do you mean that the order there is only a convention and we can't
> assume that 0 === Null?
> 
> Best,
> Ryan
> 
> [1] https://github.com/apache/arrow/blob/master/format/Schema.fbs#L235
> 
> On Tue, May 19, 2020 at 2:04 PM Antoine Pitrou  wrote:
> 
>>
>> On 19/05/2020 at 13:43, Ryan Murray wrote:
>>> Hey All,
>>>
>>> While working on https://issues.apache.org/jira/browse/ARROW-1692 I
>> noticed
>>> that there is a difference between C++ and Java on the way Sparse Unions
>>> are handled. I haven't seen in the format spec which is correct, so I
>>> wanted to check with the wider community.
>>>
>>> c++ (and the integration tests) see sparse unions as:
>>> name
>>> count
>>> VALIDITY[]
>>> TYPE_ID[]
>>> children[]
>>>
>>> and java as:
>>> name
>>> count
>>> TYPE[]
>>> children[]
>>>
>>> The precise names may only be important for json reading/writing in the
>>> integration tests so I will ignore TYPE/TYPE_ID for now. However, the big
>>> difference is that Java doesn't have a validity buffer and c++ does. My
>>> understanding is that technically the validity buffer is redundant (0
>> type
>>> == NULL) so I can see why Java would omit it. My question is then: which
>>> language is 'correct'?
>>
>> Union type ids are logical, so 0 could very well be a valid type id.
>> You can't assume that type 0 means a null entry.
>>
>> Regards
>>
>> Antoine.
>>
> 


Re: [VOTE] Release Apache Arrow 0.17.1 - RC1

2020-05-19 Thread Uwe L. Korn
Current status:

1.  [done] rebase (not required for a patch release)
2.  [done] upload source
3.  [done] upload binaries
4.  [done|in-pr] update website
5.  [done] upload ruby gems
6.  [ ] upload js packages
8.  [done] upload C# packages
9.  [ ] upload rust crates
10. [done] update conda recipes (dropped ppc64le support though)
11. [done] upload wheels to pypi
12. [nealrichardson] update homebrew packages
13. [done] update maven artifacts
14. [done|in-pr] update msys2
15. [nealrichardson] update R packages
16. [done|in-pr] update docs

On Tue, May 19, 2020, at 12:06 AM, Krisztián Szűcs wrote:
> Current status:
> 
> 1.  [done] rebase (not required for a patch release)
> 2.  [done] upload source
> 3.  [done] upload binaries
> 4.  [done|in-pr] update website
> 5.  [done] upload ruby gems
> 6.  [ ] upload js packages
> 8.  [done] upload C# packages
> 9.  [ ] upload rust crates
> 10. [in-progress|in-pr] update conda recipes
> 11. [done] upload wheels to pypi
> 12. [nealrichardson] update homebrew packages
> 13. [done] update maven artifacts
> 14. [done|in-pr] update msys2
> 15. [nealrichardson] update R packages
> 16. [done|in-pr] update docs
> 
> On Mon, May 18, 2020 at 11:33 PM Sutou Kouhei  wrote:
> >
> > >> 14. [ ] update msys2
> > >
> > > I'll do this.
> >
> > Oh, sorry. Krisztián already did!
> >
> > In <20200519.062731.1037230979568376433@clear-code.com>
> >   "Re: [VOTE] Release Apache Arrow 0.17.1 - RC1" on Tue, 19 May 2020 
> > 06:27:31 +0900 (JST),
> >   Sutou Kouhei  wrote:
> >
> > >> 14. [ ] update msys2
> > >
> > > I'll do this.
> > >
> > > In 
> > >   "Re: [VOTE] Release Apache Arrow 0.17.1 - RC1" on Mon, 18 May 2020 
> > > 22:37:50 +0200,
> > >   Krisztián Szűcs  wrote:
> > >
> > >> 1.  [done] rebase (not required for a patch release)
> > >> 2.  [done] upload source
> > >> 3.  [done] upload binaries
> > >> 4.  [done] update website
> > >> 5.  [done] upload ruby gems
> > >> 6.  [ ] upload js packages
> > >> No javascript changes were applied to the patch release, for
> > >> consistency we might want to choose to upload a 0.17.1 release though.
> > >> 8.  [done] upload C# packages
> > >> 9.  [ ] upload rust crates
> > >> @Andy Grove the patch release doesn't affect the rust implementation.
> > >> We can update the crates despite that no changes were made, not sure
> > >> what policy should we choose here (same as with JS)
> > >> 10. [ ] update conda recipes
> > >> @Uwe Korn seems like arrow-cpp-feedstock have not picked up the new
> > >> release once again
> > >> 11. [done] upload wheels to pypi
> > >> 12. [nealrichardson] update homebrew packages
> > >> 13. [done] update maven artifacts
> > >> 14. [ ] update msys2
> > >> 15. [nealrichardson] update R packages
> > >> 16. [in-progress] update docs
> > >>
> > >> On Mon, May 18, 2020 at 10:29 PM Krisztián Szűcs
> > >>  wrote:
> > >>>
> > >>> Current status:
> > >>>
> > >>> 1.  [done] rebase (not required for a patch release)
> > >>> 2.  [done] upload source
> > >>> 3.  [done] upload binaries
> > >>> 4.  [done] update website
> > >>> 5.  [ ] upload ruby gems
> > >>> 6.  [ ] upload js packages
> > >>> 8.  [ ] upload C# packages
> > >>> 9.  [ ] upload rust crates
> > >>> 10. [ ] update conda recipes
> > >>> 11. [done] upload wheels to pypi
> > >>> 12. [nealrichardson] update homebrew packages
> > >>> 13. [done] update maven artifacts
> > >>> 14. [ ] update msys2
> > >>> 15. [nealrichardson] update R packages
> > >>> 16. [in-progress] update docs
> > >>>
> > >>> On Mon, May 18, 2020 at 9:39 PM Neal Richardson
> > >>>  wrote:
> > >>> >
> > >>> > I'm working on the R stuff and can do Homebrew again.
> > >>> >
> > >>> > Neal
> > >>> >
> > >>> > On Mon, May 18, 2020 at 12:30 PM Krisztián Szűcs 
> > >>> > 
> > >>> > wrote:
> > >>> >
> > >>> > > Any help with the post release tasks is welcome!
> > >>> > >
> > >>> > > Checklist:
> > >>> > > 1.  [done] rebase (not required for a patch release)
> > >>> > > 2.  [done] upload source
> > >>> > > 3.  [in-progress] upload binaries
> > >>> > > 4.  [done] update website
> > >>> > > 5.  [ ] upload ruby gems
> > >>> > > 6.  [ ] upload js packages
> > >>> > > 8.  [ ] upload C# packages
> > >>> > > 9.  [ ] upload rust crates
> > >>> > > 10. [ ] update conda recipes
> > >>> > > 11. [kszucs] upload wheels to pypi
> > >>> > > 12. [ ] update homebrew packages
> > >>> > > 13. [kszucs] update maven artifacts
> > >>> > > 14. [ ] update msys2
> > >>> > > 15. [ ] update R packages
> > >>> > > 16. [in-progress] update docs
> > >>> > >
> > >>> > > @Neal Richardson I think you need to handle the R packages.
> > >>> > >
> > >>> > > On Mon, May 18, 2020 at 8:08 PM Krisztián Szűcs
> > >>> > >  wrote:
> > >>> > > >
> > >>> > > > The VOTE carries with 6 binding +1 votes and 1 non-binding +1 
> > >>> > > > vote.
> > >>> > > >
> > >>> > > > I'm starting the post release tasks and keep posted about the 
> > >>> > > > remaining
> > >>> > > tasks.
> > >>> > > >
> > >>> > > > Thanks everyone!
> > >>> > > >
> > >>> > > >
> > >>> > > > On 

Re: Sparse Union format

2020-05-19 Thread Ryan Murray
Thanks Antoine,

Can you just clarify what you mean by 'type ids are logical'? In my mind
type ids are strongly coupled to the types and their order in Schema.fbs
[1]. Do you mean that the order there is only a convention and we can't
assume that 0 === Null?

Best,
Ryan

[1] https://github.com/apache/arrow/blob/master/format/Schema.fbs#L235

On Tue, May 19, 2020 at 2:04 PM Antoine Pitrou  wrote:

>
> On 19/05/2020 at 13:43, Ryan Murray wrote:
> > Hey All,
> >
> > While working on https://issues.apache.org/jira/browse/ARROW-1692 I
> noticed
> > that there is a difference between C++ and Java on the way Sparse Unions
> > are handled. I haven't seen in the format spec which is correct, so I
> > wanted to check with the wider community.
> >
> > c++ (and the integration tests) see sparse unions as:
> > name
> > count
> > VALIDITY[]
> > TYPE_ID[]
> > children[]
> >
> > and java as:
> > name
> > count
> > TYPE[]
> > children[]
> >
> > The precise names may only be important for json reading/writing in the
> > integration tests so I will ignore TYPE/TYPE_ID for now. However, the big
> > difference is that Java doesn't have a validity buffer and c++ does. My
> > understanding is that technically the validity buffer is redundant (0
> type
> > == NULL) so I can see why Java would omit it. My question is then: which
> > language is 'correct'?
>
> Union type ids are logical, so 0 could very well be a valid type id.
> You can't assume that type 0 means a null entry.
>
> Regards
>
> Antoine.
>


Re: Sparse Union format

2020-05-19 Thread Antoine Pitrou


On 19/05/2020 at 13:43, Ryan Murray wrote:
> Hey All,
> 
> While working on https://issues.apache.org/jira/browse/ARROW-1692 I noticed
> that there is a difference between C++ and Java on the way Sparse Unions
> are handled. I haven't seen in the format spec which is correct, so I
> wanted to check with the wider community.
> 
> c++ (and the integration tests) see sparse unions as:
> name
> count
> VALIDITY[]
> TYPE_ID[]
> children[]
> 
> and java as:
> name
> count
> TYPE[]
> children[]
> 
> The precise names may only be important for json reading/writing in the
> integration tests so I will ignore TYPE/TYPE_ID for now. However, the big
> difference is that Java doesn't have a validity buffer and c++ does. My
> understanding is that technically the validity buffer is redundant (0 type
> == NULL) so I can see why Java would omit it. My question is then: which
> language is 'correct'?

Union type ids are logical, so 0 could very well be a valid type id.
You can't assume that type 0 means a null entry.

Regards

Antoine.
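
A quick way to see this concretely, as a Python/pyarrow sketch (it assumes
UnionArray.from_sparse accepts explicit type_codes; the values 5 and 9 are
arbitrary):

import pyarrow as pa

# Children of a sparse union all have the same length as the union itself.
ints = pa.array([1, 2, 3], type=pa.int64())
strs = pa.array(["a", "b", "c"], type=pa.string())

# Per-slot type ids: 5 and 9 rather than 0 and 1, because type ids are
# logical labels declared in the schema, so 0 need not mean "null".
types = pa.array([5, 9, 5], type=pa.int8())

union = pa.UnionArray.from_sparse(types, [ints, strs], type_codes=[5, 9])
print(union.type)
print(union)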


[jira] [Created] (ARROW-8858) [FlightRPC] Ensure headers are uniformly exposed

2020-05-19 Thread David Li (Jira)
David Li created ARROW-8858:
---

 Summary: [FlightRPC] Ensure headers are uniformly exposed
 Key: ARROW-8858
 URL: https://issues.apache.org/jira/browse/ARROW-8858
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC, Java, Python
Affects Versions: 0.17.0
Reporter: David Li
Assignee: David Li


* Java: MetadataAdapter should support iterating through binary headers
* Python: binary headers need to be present in the output
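
As a rough sketch of what "uniformly exposed" could mean from the Python client 
side (this assumes pyarrow.flight's client middleware hooks, 
{{ClientMiddlewareFactory.start_call}} and {{ClientMiddleware.received_headers}}, 
and that binary headers use the usual gRPC {{-bin}} name suffix):

{code:python}
import pyarrow.flight as flight


class HeaderRecorder(flight.ClientMiddleware):
    """Record every header the server sends back, text or binary."""

    def __init__(self, store):
        self._store = store

    def received_headers(self, headers):
        # headers maps header names to lists of values; binary headers
        # (names ending in "-bin") should show up here as well.
        for name, values in headers.items():
            self._store.setdefault(name, []).extend(values)


class HeaderRecorderFactory(flight.ClientMiddlewareFactory):
    def __init__(self):
        self.headers = {}

    def start_call(self, info):
        return HeaderRecorder(self.headers)


# Hypothetical endpoint; point this at a real Flight server.
factory = HeaderRecorderFactory()
client = flight.FlightClient("grpc://localhost:8815", middleware=[factory])
{code}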



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Sparse Union format

2020-05-19 Thread Ryan Murray
Hey All,

While working on https://issues.apache.org/jira/browse/ARROW-1692 I noticed
that there is a difference between C++ and Java on the way Sparse Unions
are handled. I haven't seen in the format spec which is correct, so I
wanted to check with the wider community.

c++ (and the integration tests) see sparse unions as:
name
count
VALIDITY[]
TYPE_ID[]
children[]

and java as:
name
count
TYPE[]
children[]

The precise names may only be important for json reading/writing in the
integration tests so I will ignore TYPE/TYPE_ID for now. However, the big
difference is that Java doesn't have a validity buffer and c++ does. My
understanding is that technically the validity buffer is redundant (0 type
== NULL) so I can see why Java would omit it. My question is then: which
language is 'correct'?

I suppose the actual language implementation is not entirely relevant here;
instead, 'correct' refers to what the canonical IPC schema for a sparse union
should be.

Best,
Ryan


[NIGHTLY] Arrow Build Report for Job nightly-2020-05-19-0

2020-05-19 Thread Crossbow


Arrow Build Report for Job nightly-2020-05-19-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0

Failed Tasks:
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-azure-conda-win-vs2015-py38
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-travis-homebrew-cpp
- homebrew-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-travis-homebrew-r-autobrew
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-test-conda-python-3.7-spark-master
- test-conda-python-3.8-dask-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-test-conda-python-3.8-dask-master
- wheel-manylinux1-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-azure-wheel-manylinux1-cp35m
- wheel-manylinux2010-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-azure-wheel-manylinux2010-cp35m

Succeeded Tasks:
- centos-6-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-centos-6-amd64
- centos-7-aarch64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-travis-centos-7-aarch64
- centos-7-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-centos-7-amd64
- centos-8-aarch64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-travis-centos-8-aarch64
- centos-8-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-centos-8-amd64
- debian-buster-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-debian-buster-amd64
- debian-buster-arm64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-travis-debian-buster-arm64
- debian-stretch-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-debian-stretch-amd64
- debian-stretch-arm64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-travis-debian-stretch-arm64
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-travis-gandiva-jar-osx
- gandiva-jar-xenial:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-travis-gandiva-jar-xenial
- nuget:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-nuget
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-test-conda-cpp-valgrind
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-test-conda-cpp
- test-conda-python-3.6-pandas-0.23:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-test-conda-python-3.6-pandas-0.23
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-kartothek-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-test-conda-python-3.7-kartothek-latest
- test-conda-python-3.7-kartothek-master:
  URL: