RE: ADBC -> Arrow references for C#

2023-05-23 Thread Eric Erhardt
How do the other languages in apache/arrow-adbc consume bits from apache/arrow 
before they are officially released in a major version? For example, I see Java 
uses the 11.0.0 version 
(https://github.com/apache/arrow-adbc/blob/main/java/pom.xml#L31). What would 
happen if it is dependent on a bug fix/feature in apache/arrow? Would the Java 
ADBC code need to wait for an official release before it can use it?

-Original Message-
From: David Coe 
Sent: Tuesday, May 23, 2023 10:02 AM
To: dev@arrow.apache.org
Subject: [EXTERNAL] ADBC -> Arrow references for C#

We recently put up "feat(csharp): adding C# functionality" by davidhcoe (Pull 
Request #697, apache/arrow-adbc on github.com). This PR introduces 
C# functionality for ADBC and is dependent on capabilities introduced in 
"GH-33856: [C#] Implement C Data Interface for C#" by CurtHagenlocher (Pull 
Request #35496, apache/arrow on github.com), which has now been 
merged to main.

We currently use the submodule approach for ADBC to reference Arrow. This 
submodule was pointed at

[submodule "arrow"]
  path = arrow
  url = https://github.com/CurtHagenlocher/arrow
  branch = CSharp_CAPI

until the PR was merged. Since this has now landed, we can clean up the 
references a bit, but wanted to get some thoughts on the best way to do so. 
Here are some proposed options, in order of preference and ease of use:

1. An Apache.Arrow 13.0.0-alpha release is published to nuget
2. Release an interim nuget package called something like
   Temp.Apache.Arrow.CData so it can be referenced from within the ADBC project
3. Update the submodule to point to the main repo. Add instructions for how to
   pull this submodule.

Any thoughts on the best way to do these and potential timeframes?


- David


Empty RecordBatch in Java Flight client

2020-12-16 Thread Eric Erhardt
An incompatibility between the .NET and Java flight implementations was raised 
with https://issues.apache.org/jira/browse/ARROW-10939. From the issue:

From investigation the java client requires the protobuf tags to be 
sent in the message even though it is empty. Java code can be seen here:

https://github.com/apache/arrow/blob/master/java/flight/flight-core/src/main/java/org/apache/arrow/flight/ArrowMessage.java
Lines 257-301 (the error is that it won't accept a null body for a record 
batch)

Normal functionality of gRPC is to exclude the entire tag if an object 
is empty, example code from generated csharp:

if (DataBody.Length != 0)
{
    output.WriteRawTag(194, 62);
    output.WriteBytes(DataBody);
}

The .NET code is generated by the gRPC generator 
(https://github.com/grpc/grpc/tree/master/src/csharp/Grpc.Tools), so it applies 
to all .NET gRPC code, not just Arrow Flight.
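To make the difference concrete, here is a minimal Python sketch of the two behaviours. It is not the generated .NET or Java code: the helper names are hypothetical, and the field number 1000 is an assumption inferred from the tag bytes 194, 62 in the snippet above (a length-delimited field, wire type 2).

```python
LENGTH_DELIMITED = 2        # protobuf wire type for bytes/string fields
DATA_BODY_FIELD = 1000      # assumption: inferred from the tag bytes 194, 62


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A field tag is the varint of (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


def write_bytes_field(field_number: int, payload: bytes,
                      always_emit: bool = False) -> bytes:
    """Serialize a bytes field, optionally emitting it even when empty."""
    if not payload and not always_emit:
        return b""  # default behaviour: an empty field is left off the wire
    return (encode_tag(field_number, LENGTH_DELIMITED)
            + encode_varint(len(payload))
            + payload)
```

With `always_emit=True` an empty body still produces the tag followed by a zero length, which is the shape the Java reader appears to expect according to the issue.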

Does anyone have any thoughts/opinions on whether this should be fixed in the 
Java code or the .NET code? Which way aligns with the spec?

Thanks
Eric Erhardt


RE: [EXTERNAL] Re: Value of Date64 type over Date32

2020-08-11 Thread Eric Erhardt
Thanks for the info, Wes.

Looking through the Java implementation, I don't see any validation that "where 
the values are evenly divisible by 86400000" is enforced in DateMilliVector. We 
are having a conversation on the C# implementation whether we should allow 
values that are not evenly divisible by 86400000. 

https://github.com/apache/arrow/pull/7654#discussion_r463886892

I'm wondering if C# should allow any values in Date64, or if it should 
force/coerce the values to be divisible by 86400000.

It doesn't look to me that C++ or Java have these enforcements. How do other 
languages handle this?
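For illustration only, the two options (reject or coerce) could look like the Python sketch below. It assumes the divisor is the number of milliseconds in a day (86,400,000); the function names are hypothetical and do not correspond to any Arrow API.

```python
MS_PER_DAY = 86_400_000  # 24 * 60 * 60 * 1000 milliseconds


def is_whole_day(millis: int) -> bool:
    """True when a Date64 value falls exactly on a day boundary."""
    return millis % MS_PER_DAY == 0


def validate_date64(millis: int) -> int:
    """Strict option: reject values that are not a whole number of days."""
    if not is_whole_day(millis):
        raise ValueError(f"{millis} is not evenly divisible by {MS_PER_DAY}")
    return millis


def coerce_date64(millis: int) -> int:
    """Lenient option: floor the value to the start of its day.

    Python's // floors toward negative infinity, so pre-epoch values
    are moved to the earlier day boundary rather than toward zero.
    """
    return (millis // MS_PER_DAY) * MS_PER_DAY
```

Coercion silently drops the time-of-day part, so a strict implementation may be preferable where surprising data loss matters more than convenience.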

Eric

-Original Message-
From: Wes McKinney  
Sent: Tuesday, August 11, 2020 12:18 PM
To: dev 
Subject: [EXTERNAL] Re: Value of Date64 type over Date32

On Mon, Aug 10, 2020 at 6:19 PM Eric Erhardt 
 wrote:
>
> I don't understand what the value of the Date64 type is over using Date32:
>
> From https://github.com/apache/arrow/blob/master/format/Schema.fbs#L193-L206
>
> enum DateUnit: short {
>   DAY,
>   MILLISECOND
> }
>
> /// Date is either a 32-bit or 64-bit type representing elapsed time since UNIX
> /// epoch (1970-01-01), stored in either of two units:
> ///
> /// * Milliseconds (64 bits) indicating UNIX time elapsed since the epoch (no
> ///   leap seconds), where the values are evenly divisible by 86400000
> /// * Days (32 bits) since the UNIX epoch
> table Date {
>   unit: DateUnit = MILLISECOND;
> }
>
> If the spec specifies that Date64 must be evenly divisible by 86400000, I 
> don't see the point in using millisecond units. I can't represent any 
> different information in my data. So why would I take up double the space to 
> represent the same information?
>
> Can someone explain when Date64 is useful?

As I recall the motivation of the date64 type is to allow for zero-copy of 
dates-as-milliseconds, which are used in some other libraries / platforms. For 
example, Joda uses a millisecond-based "instant". I'm not sure which others 
do offhand.

That said, it would be perfectly reasonable for a data processing system to use 
date32 throughout and convert any date64 data to date32 if desired.
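As a rough sketch of that conversion (not an Arrow API), the Python below repacks a buffer of little-endian 64-bit millisecond values into 32-bit day counts. The function name is hypothetical, and it assumes every input value is a whole number of days (a multiple of 86,400,000 ms).

```python
import struct

MS_PER_DAY = 86_400_000  # milliseconds in one day


def date64_to_date32(buffer: bytes) -> bytes:
    """Convert packed little-endian int64 milliseconds to int32 days."""
    count = len(buffer) // 8
    millis = struct.unpack(f"<{count}q", buffer)
    days = []
    for value in millis:
        if value % MS_PER_DAY:
            raise ValueError(f"{value} ms is not a whole number of days")
        days.append(value // MS_PER_DAY)
    return struct.pack(f"<{count}i", *days)
```

The output buffer is half the size of the input, which is the storage argument made in the original question; the price is a copy, whereas keeping date64 lets millisecond-based producers hand over their buffers unchanged.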

> Eric


Value of Date64 type over Date32

2020-08-10 Thread Eric Erhardt
I don't understand what the value of the Date64 type is over using Date32:

From https://github.com/apache/arrow/blob/master/format/Schema.fbs#L193-L206

enum DateUnit: short {
  DAY,
  MILLISECOND
}

/// Date is either a 32-bit or 64-bit type representing elapsed time since UNIX
/// epoch (1970-01-01), stored in either of two units:
///
/// * Milliseconds (64 bits) indicating UNIX time elapsed since the epoch (no
///   leap seconds), where the values are evenly divisible by 86400000
/// * Days (32 bits) since the UNIX epoch
table Date {
  unit: DateUnit = MILLISECOND;
}

If the spec specifies that Date64 must be evenly divisible by 86400000, I don't 
see the point in using millisecond units. I can't represent any different 
information in my data. So why would I take up double the space to represent 
the same information?

Can someone explain when Date64 is useful?

Eric


RE: [EXTERNAL] Re: .NET support for Arrow

2020-07-10 Thread Eric Erhardt
I agree with Adam, the more usage and feedback we can get the better on the 
.NET Library.

> However there is no library for C# listed anywhere else in the 
> documentation.

We have some XML style doc comments in the code. It would be great if we could 
generate a website/markdown from those XML files produced by the build. And 
then get it shown under the Documentation tab on https://arrow.apache.org/.  
I've opened https://issues.apache.org/jira/browse/ARROW-9406 for this.

Eric

-Original Message-
From: Adam Szmigin  
Sent: Friday, July 10, 2020 6:28 AM
To: dev@arrow.apache.org
Subject: [EXTERNAL] Re: .NET support for Arrow

Hi Yash,

My organisation is using the C# library for a product we are working on.  
However, we are using a fork which includes a number of bug-fixes for issues 
that would have otherwise blocked us. I've raised a few PRs to fix these 
upstream.

I think it's fair to say that the C# library is at an early stage of 
development at the moment.  The more people who are able to test and contribute 
back, the better.

Kind regards,


--
Adam Szmigin

On 10/07/2020 04:05, Yash Ganthe wrote:
> Hi,
>
> The first paragraph of docs at https://arrow.apache.org/ says it supports C#.
> However there is no library for C# listed anywhere else in the 
> documentation. Is .NET supported at all?
>
> Regards,
> Yash
>


[jira] [Created] (ARROW-9048) [C#] Support Float16

2020-06-06 Thread Eric Erhardt (Jira)
Eric Erhardt created ARROW-9048:
---

 Summary: [C#] Support Float16
 Key: ARROW-9048
 URL: https://issues.apache.org/jira/browse/ARROW-9048
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Reporter: Eric Erhardt


With [https://github.com/dotnet/runtime/issues/936], .NET is getting a 
`System.Half` type, which is a 16-bit floating point number. Once that type 
lands in .NET we can implement support for the Float16 type in Arrow.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8953) [C#] Update to .NET SDK 3.1

2020-05-26 Thread Eric Erhardt (Jira)
Eric Erhardt created ARROW-8953:
---

 Summary: [C#] Update to .NET SDK 3.1
 Key: ARROW-8953
 URL: https://issues.apache.org/jira/browse/ARROW-8953
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Reporter: Eric Erhardt


We should update our tools to the latest .NET SDK - 3.1. This will enable new 
tooling features, such as the code style rules package that will enforce coding 
style:

[https://github.com/apache/arrow/pull/7246#issuecomment-634206767]

 

There are 3 places that I know of that need updating:

 

[https://github.com/apache/arrow/blob/master/.github/workflows/csharp.yml]

[https://github.com/apache/arrow/blob/f16f76ab7693ae085e82f4269a0a0bc23770bef9/.github/workflows/dev.yml#L132]

[https://github.com/apache/arrow/blob/f16f76ab7693ae085e82f4269a0a0bc23770bef9/dev/release/verify-release-candidate.sh#L327]

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8882) [C#] Add .editorconfig to C# code

2020-05-21 Thread Eric Erhardt (Jira)
Eric Erhardt created ARROW-8882:
---

 Summary: [C#] Add .editorconfig to C# code
 Key: ARROW-8882
 URL: https://issues.apache.org/jira/browse/ARROW-8882
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Reporter: Eric Erhardt


This allows for a consistent code format throughout the C# code in the repo. 
That way when a new contributor submits a change, the editors will 
automatically format the code to be in the same format as the current code base.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


RE: C# - Appetite for breaking changes to public API?

2020-04-27 Thread Eric Erhardt
Eventually I think we should get to a place where we can consider the C# API 
"stable". At that time, I don't think breaking API changes would be acceptable. 
But I don't think we are there yet.

Of course new public APIs are never considered breaking changes, so any 
functionality that can be implemented with new APIs can be freely made.

Truly breaking changes (ex. removing APIs, renaming APIs, changing 
parameters/return types, etc) can still be done, but some caution should be 
used. Breaking changes are hard to consume in .NET. Here is my thought process 
around breaking changes:

1. Is it truly required that a break MUST be done? Or can the desired 
functionality be achieved with a new API?
2. Can the existing API still exist, but be marked Obsolete?
- This at least gives consumers a period of time where their code still 
works, but produces a warning. And they can choose to switch to the new API.
3. If it isn't feasible to make the change without breaking an API, we should 
look at the impact of it.
- For example, is it frequently used? Does the new change give enough value 
to justify the break?
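The "keep the old API but mark it obsolete" step in point 2 can be sketched in Python with a deprecation warning, which plays a role loosely similar to .NET's Obsolete attribute. The function names below are hypothetical and are not part of any Arrow library.

```python
import warnings


def append_value(items: list, value) -> list:
    """New API: the preferred entry point going forward."""
    items.append(value)
    return items


def add(items: list, value) -> list:
    """Old API: still works, but warns and forwards to the new API."""
    warnings.warn(
        "add() is deprecated; use append_value() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return append_value(items, value)
```

Existing callers keep working and see a warning pointing at their own call site, which gives them a release or two to migrate before the old entry point is removed.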

Note, these are just my opinions. I'd like to hear others' thoughts as well.

Eric Erhardt

-Original Message-
From: Adam Szmigin  
Sent: Monday, April 27, 2020 7:10 AM
To: dev@arrow.apache.org
Subject: [EXTERNAL] C# - Appetite for breaking changes to public API?

Dear team,

I am keen to work on a number of the tickets relating to the C# implementation 
for Apache Arrow.

Quite a few of the open tickets relate to making breaking changes to the public 
API (e.g. ARROW-7757, ARROW-8581, likely ARROW-6603 as well). What is the 
general appetite for making breaking changes to the C# code in its present 
state?

The README.md hints at the C# implementation being alpha-grade at present, so I 
assume all ok, but I would like to check opinions from the devs before I embark 
on any PRs.

Many thanks,

--
Adam Szmigin



[jira] [Created] (ARROW-7516) [C#] .NET Benchmarks are broken

2020-01-08 Thread Eric Erhardt (Jira)
Eric Erhardt created ARROW-7516:
---

 Summary: [C#] .NET Benchmarks are broken
 Key: ARROW-7516
 URL: https://issues.apache.org/jira/browse/ARROW-7516
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Reporter: Eric Erhardt


See [https://github.com/apache/arrow/pull/6030#issuecomment-571877721]

 

It looks like the issue is that in the Benchmarks, `Length` is specified as 
`1_000_000`, and there have only been ~730,000 days since `DateTime.MinValue`, so 
this line fails:

https://github.com/apache/arrow/blob/4634c89fc77f70fb5b5d035d6172263a4604da82/csharp/test/Apache.Arrow.Tests/TestData.cs#L130

A simple fix would be to cap what we pass into `AddDays` to some number like 
`100_000`, or so.
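The arithmetic can be checked with a short Python sketch. Python's `date.min` is also 0001-01-01, so the day count is comparable. The helper names are hypothetical, and the assumption that the test data offsets a base date by the row index is mine; the linked line is not reproduced here.

```python
from datetime import date, timedelta


def days_since_min(today: date) -> int:
    """Number of days between the minimum representable date and `today`."""
    return (today - date.min).days


def capped_offset(index: int, cap: int = 100_000) -> int:
    """Clamp a row index so it is always a safe day offset."""
    return min(index, cap)


def date_for_row(today: date, index: int) -> date:
    """Derive a date for a row by stepping back a capped number of days."""
    return today - timedelta(days=capped_offset(index))
```

Stepping back an uncapped 1,000,000 days from a 2020 date raises `OverflowError` in Python, which mirrors the out-of-range failure described in the issue.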



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


RE: [ANNOUNCE] New Arrow committer: Eric Erhardt

2019-10-21 Thread Eric Erhardt
Thanks, everyone! I'm looking forward to continuing to work with you all on this 
great project.

Eric

-Original Message-
From: Krisztián Szűcs  
Sent: Friday, October 18, 2019 11:55 AM
To: dev 
Subject: Re: [ANNOUNCE] New Arrow committer: Eric Erhardt

Congrats!

On Fri, Oct 18, 2019 at 12:35 PM Bryan Cutler  wrote:

> Congrats!
>
> On Thu, Oct 17, 2019, 6:26 PM Fan Liya  wrote:
>
> > Congrats Eric!
> >
> > Best,
> > Liya Fan
> >
> > On Fri, Oct 18, 2019 at 3:06 AM paddy horan 
> > wrote:
> >
> > > Congrats Eric!
> > >
> > > 
> > > From: Micah Kornfield 
> > > Sent: Thursday, October 17, 2019 12:45:15 PM
> > > To: dev 
> > > Subject: Re: [ANNOUNCE] New Arrow committer: Eric Erhardt
> > >
> > > Congrats Eric!
> > >
> > > On Thu, Oct 17, 2019 at 6:58 AM Wes McKinney 
> > wrote:
> > >
> > > > On behalf of the Arrow PMC, I'm happy to announce that Eric has 
> > > > accepted an invitation to become a committer on Apache Arrow.
> > > >
> > > > Welcome, and thank you for your contributions!
> > > >
> > >
> >
>


[jira] [Created] (ARROW-6795) [C#] Reading large Arrow files in C# results in an exception

2019-10-04 Thread Eric Erhardt (Jira)
Eric Erhardt created ARROW-6795:
---

 Summary: [C#] Reading large Arrow files in C# results in an 
exception
 Key: ARROW-6795
 URL: https://issues.apache.org/jira/browse/ARROW-6795
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Reporter: Eric Erhardt


If you try to read a large Arrow file (2GB+) using the C# reader, you get an 
exception because it is casting the file position (a 64-bit long) to a 32-bit 
integer, which overflows once the position exceeds the 32-bit range.

 

See [https://github.com/apache/arrow/pull/5412]
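A small Python sketch of why positions at or beyond 2 GiB cannot survive a 32-bit cast. The helper names are hypothetical; this models a checked cast (which raises) and an unchecked one (which wraps), and is not the C# reader code.

```python
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def checked_int32(position: int) -> int:
    """Model a checked narrowing cast: fail loudly when out of range."""
    if not INT32_MIN <= position <= INT32_MAX:
        raise OverflowError(f"{position} does not fit in a signed 32-bit int")
    return position


def wrapping_int32(position: int) -> int:
    """Model an unchecked narrowing cast: keep only the low 32 bits."""
    return (position - INT32_MIN) % 2 ** 32 + INT32_MIN
```

Either outcome is wrong for a file offset, so the position has to stay 64-bit all the way through the reader.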



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6728) [C#] Support reading and writing Date32 and Date64 arrays

2019-09-27 Thread Eric Erhardt (Jira)
Eric Erhardt created ARROW-6728:
---

 Summary: [C#] Support reading and writing Date32 and Date64 arrays
 Key: ARROW-6728
 URL: https://issues.apache.org/jira/browse/ARROW-6728
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Reporter: Eric Erhardt


The C# implementation doesn't support reading and writing Date32 and Date64 
arrays. We need to add support and some tests.

It looks like it is only a couple of lines to get this enabled. See 
[https://github.com/apache/arrow/pull/5413].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6643) [C#] Write no IPC buffer metadata for NullType

2019-09-20 Thread Eric Erhardt (Jira)
Eric Erhardt created ARROW-6643:
---

 Summary: [C#] Write no IPC buffer metadata for NullType
 Key: ARROW-6643
 URL: https://issues.apache.org/jira/browse/ARROW-6643
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt


We need to align the C# writer (and test the reader) for NullType. See 
[https://github.com/apache/arrow/pull/5287] and ARROW-6379.

 

>The C++ implementation has been writing 2 {{Buffer}} Flatbuffer struct values 
>with length 0 for NullType. Rather than having dummy/placeholder Buffer I 
>think it is more consistent to write no metadata for this type.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6603) [C#] ArrayBuilder API to support writing nulls

2019-09-18 Thread Eric Erhardt (Jira)
Eric Erhardt created ARROW-6603:
---

 Summary: [C#] ArrayBuilder API to support writing nulls
 Key: ARROW-6603
 URL: https://issues.apache.org/jira/browse/ARROW-6603
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt


There is currently no API in the PrimitiveArrayBuilder class to support writing 
nulls.  See this TODO - 
[https://github.com/apache/arrow/blob/1515fe10c039fb6685df2e282e2e888b773caa86/csharp/src/Apache.Arrow/Arrays/PrimitiveArrayBuilder.cs#L101].

 

Also see [https://github.com/apache/arrow/issues/5381].

 

We should add some APIs to support writing nulls.
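One possible shape for such an API, sketched in Python rather than C#. The class and method names are hypothetical and not the eventual Apache.Arrow design; the sketch follows the Arrow convention of a validity bitmap with least-significant-bit ordering, where a set bit means the slot holds a value.

```python
class Int32ArrayBuilder:
    """Toy builder that tracks values plus an Arrow-style validity bitmap."""

    def __init__(self):
        self._values = []            # one slot per element, nulls included
        self._validity = bytearray()  # 1 bit per element, LSB first
        self._length = 0
        self.null_count = 0

    def _push_validity(self, is_valid: bool) -> None:
        byte_index, bit_index = divmod(self._length, 8)
        if bit_index == 0:
            self._validity.append(0)  # start a new bitmap byte
        if is_valid:
            self._validity[byte_index] |= 1 << bit_index
        self._length += 1

    def append(self, value: int) -> "Int32ArrayBuilder":
        self._values.append(value)
        self._push_validity(True)
        return self

    def append_null(self) -> "Int32ArrayBuilder":
        self._values.append(0)        # placeholder keeps slots aligned
        self._push_validity(False)
        self.null_count += 1
        return self

    def is_valid(self, index: int) -> bool:
        return bool(self._validity[index // 8] & (1 << (index % 8)))
```

For example, `Int32ArrayBuilder().append(1).append_null().append(3)` yields three value slots, a null count of 1, and a single bitmap byte with bits 0 and 2 set.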



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6553) [C#] Decide how to read message lengths - little-endian or machine dependent

2019-09-12 Thread Eric Erhardt (Jira)
Eric Erhardt created ARROW-6553:
---

 Summary: [C#] Decide how to read message lengths - little-endian 
or machine dependent
 Key: ARROW-6553
 URL: https://issues.apache.org/jira/browse/ARROW-6553
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Reporter: Eric Erhardt


See the discussion at 
https://github.com/apache/arrow/pull/5280#discussion_r323896532. We 
are currently reading message lengths using machine dependent endianness. 
Should this be changed to little-endian all the time?

It appears the C++ implementation does this same thing - use machine dependent 
endianness.
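The difference between the two choices can be seen in a short Python sketch using the standard `struct` module. The function names are hypothetical, and the 4-byte signed length prefix is an assumption made for illustration.

```python
import struct


def read_length_little_endian(prefix: bytes) -> int:
    """Always interpret the 4-byte length prefix as little-endian."""
    (length,) = struct.unpack("<i", prefix[:4])
    return length


def read_length_native(prefix: bytes) -> int:
    """Interpret the prefix using the byte order of the current machine."""
    (length,) = struct.unpack("=i", prefix[:4])
    return length


def read_length_big_endian(prefix: bytes) -> int:
    """What a big-endian machine would see when reading natively."""
    (length,) = struct.unpack(">i", prefix[:4])
    return length
```

On little-endian hardware the first two functions agree, which is why the issue is easy to miss; a stream written on such a machine and read natively on a big-endian one would yield a different length unless the reader pins the byte order.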



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


RE: Timeline for 0.15.0 release

2019-09-11 Thread Eric Erhardt
I assume the plan is to merge the ARROW-6313-flatbuffer-alignment branch into 
master before the 0.15 release, correct?

BTW - I believe the C# alignment changes are ready to be merged into the 
alignment branch -  https://github.com/apache/arrow/pull/5280/ 

Eric

-Original Message-
From: Micah Kornfield  
Sent: Tuesday, September 10, 2019 10:24 PM
To: Wes McKinney 
Cc: dev ; niki.lj 
Subject: Re: Timeline for 0.15.0 release

I should have a little more bandwidth to help with some of the packaging 
starting tomorrow and going into the weekend.

On Tuesday, September 10, 2019, Wes McKinney  wrote:

> Hi folks,
>
> With the state of nightly packaging and integration builds things 
> aren't looking too good for being in release readiness by the end of 
> this week but maybe I'm wrong. I'm planning to be working to close as 
> many issues as I can and also to help with the ongoing alignment fixes.
>
> Wes
>
> On Thu, Sep 5, 2019, 11:07 PM Micah Kornfield 
> wrote:
>
>> Just for reference [1] has a dashboard of the current issues:
>>
>> https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.15.0+Release
>>
>> On Thu, Sep 5, 2019 at 3:43 PM Wes McKinney  wrote:
>>
>>> hi all,
>>>
>>> It doesn't seem like we're going to be in a position to release at 
>>> the beginning of next week. I hope that one more week of work (or 
>>> less) will be enough to get us there. Aside from merging the 
>>> alignment changes, we need to make sure that our packaging jobs 
>>> required for the release candidate are all working.
>>>
>>> If folks could remove issues from the 0.15.0 backlog that they don't 
>>> think they will finish by end of next week that would help focus 
>>> efforts (there are currently 78 issues in 0.15.0 still). I am 
>>> looking to tackle a few small features related to dictionaries while 
>>> the release window is still open.
>>>
>>> - Wes
>>>
>>> On Tue, Aug 27, 2019 at 3:48 PM Wes McKinney 
>>> wrote:
>>> >
>>> > hi,
>>> >
>>> > I think we should try to release the week of September 9, so 
>>> > development work should be completed by end of next week.
>>> >
>>> > Does that seem reasonable?
>>> >
>>> > I plan to get up a patch for the protocol alignment changes for 
>>> > C++ in the next couple of days -- I think that getting the 
>>> > alignment work done is the main barrier to releasing.
>>> >
>>> > Thanks
>>> > Wes
>>> >
>>> > On Mon, Aug 19, 2019 at 12:25 PM Ji Liu 
>>> > 
>>> wrote:
>>> > >
>>> > > Hi, Wes, on the java side, I can think of several bugs that need
>>> > > to be fixed or reminded.
>>> > >
>>> > > i. ARROW-6040: Dictionary entries are required in IPC streams
>>> > > even when empty[1]
>>> > > This one is under review now, however through this PR we find
>>> > > that there seems a bug in java reading and writing dictionaries
>>> > > in IPC which is Inconsistent with spec[2] since it assumes all
>>> > > dictionaries are at the start of stream (see details in PR
>>> > > comments, and this fix may not catch up with version 0.15).
>>> > > @Micah Kornfield
>>> > >
>>> > > ii. ARROW-1875: Write 64-bit ints as strings in integration test
>>> > > JSON files[3]
>>> > > Java side code already checked in, other implementations seems not.
>>> > >
>>> > > iii. ARROW-6202: OutOfMemory in JdbcAdapter[4] Caused by trying
>>> > > to load all records in one contiguous batch, fixed by providing
>>> > > iterator API for iteratively reading in ARROW-6219[5].
>>> > >
>>> > > Thanks,
>>> > > Ji Liu
>>> > >
>>> > > [1] https://github.com/apache/arrow/pull/4960
>>> > > [2] https://arrow.apache.org/docs/ipc.html
>>> > > [3] https://issues.apache.org/jira/browse/ARROW-1875
>>> > > [4] 
>>> > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%
>>> > > 

RE: Plasma scenarios

2019-09-09 Thread Eric Erhardt
I don't think the C# bindings would use the Glib-based libraries on Windows if 
it requires installing MSYS2 or Cygwin on the end-user's Windows machine. So 
don't go through the work building the Glib-based libraries with MSVC on 
account of the C# library.

-Original Message-
From: Sutou Kouhei  
Sent: Monday, September 9, 2019 4:43 PM
To: dev@arrow.apache.org
Subject: Re: Plasma scenarios

Hi,

> In theory you could use the GLib-based library with MSVC, the main 
> requirement is gobject-introspection
> 
> https://github.com/GNOME/gobject-introspection/blob/master/MSVC.README.rst

Generally, we can use the GLib-based library without GObject Introspection if 
we write bindings by hand. (We can generate bindings automatically with GObject 
Introspection.)

But we need to do some tasks to build the GLib-based library with MSVC. I'll work 
on it in a few months.


Thanks,
--
kou

In 
  "Re: Plasma scenarios" on Mon, 9 Sep 2019 12:00:00 -0500,
  Wes McKinney  wrote:

> hi Eric,
> 
> On Fri, Sep 6, 2019 at 5:09 PM Eric Erhardt 
>  wrote:
>>
>> I was looking for the high level scenarios for the Plasma In-Memory Object 
>> Store. A colleague of mine suggested we could use it to pass data between a 
>> C# process and a Python process.
>>
>> I've read the intro blog [0] on Plasma, which describes using the same data 
>> set from multiple processes - which sounds like the same scenario as above.
>>
>> I am trying to prioritize creating C# bindings for the Plasma client. So I'd 
>> like to know all the scenarios that could be enabled with Plasma.
>>
>> For example:
>> - could using Plasma speed up Pandas UDFs in PySpark? Because the data 
>> wouldn't have to go across the socket between Java and Python, but instead 
>> would be memory-mapped. We have similar functionality in .NET for Apache 
>> Spark.
> 
> Memory still would need to be copied into the memory-mappable file, so 
> it's unclear whether this would be faster than passing the data 
> through a socket as it's being done now.
> 
>> - Is Plasma being used by Nvidia RAPIDS?
> 
> AFAIK it is not. It doesn't seem out of the question, though, given 
> that we have some level of CUDA support in Plasma now.
> 
>>
>> I know Plasma today is not supported on Windows, but I think support could 
>> be added since Windows supports memory mapped files (through a different API 
>> than mmap) and it now supports Unix Domain Sockets [1].
>>
>> Also - side question about the c_glib bindings. I assume those will only 
>> ever work on Windows with something like Cygwin or MSYS2, right? Would 
>> people be opposed to adding pure "C" exports to the plasma library so the C# 
>> bindings could use it? (similar to the JNI support today).
>>
> 
> In theory you could use the GLib-based library with MSVC, the main 
> requirement is gobject-introspection
> 
> https://github.com/GNOME/gobject-introspection/blob/master/MSVC.README.rst
> 
> Note that GLib itself is LGPL-licensed -- since it is an optional 
> component in Apache Arrow, it is OK for optional components to have an 
> LGPL dependency (though ASF projects aren't allowed to have 
> mandatory/hard dependencies on LGPL). So if you do go that route just 
> beware the possible issues you might have down the road.
> 
> I have no objection to adding a "plasma/plasma-c.h" with C exports.
> 
>> Eric
>>
>> [0] https://ray-project.github.io/2017/08/08/plasma-in-memory-object-store.html
>> [1] https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/


Plasma scenarios

2019-09-06 Thread Eric Erhardt
I was looking for the high level scenarios for the Plasma In-Memory Object 
Store. A colleague of mine suggested we could use it to pass data between a C# 
process and a Python process.

I've read the intro blog [0] on Plasma, which describes using the same data set 
from multiple processes - which sounds like the same scenario as above.

I am trying to prioritize creating C# bindings for the Plasma client. So I'd 
like to know all the scenarios that could be enabled with Plasma. 

For example:
- could using Plasma speed up Pandas UDFs in PySpark? Because the data wouldn't 
have to go across the socket between Java and Python, but instead would be 
memory-mapped. We have similar functionality in .NET for Apache Spark.
- Is Plasma being used by Nvidia RAPIDS?

I know Plasma today is not supported on Windows, but I think support could be 
added since Windows supports memory mapped files (through a different API than 
mmap) and it now supports Unix Domain Sockets [1].
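As a loose illustration of the memory-mapped-file building block (not the Plasma client API), the Python sketch below writes a payload to a file and reads it back through a read-only mapping. Python's `mmap` module wraps the platform-specific calls on both Unix and Windows; the function name is hypothetical and the payload is assumed to be non-empty, since an empty file cannot be mapped.

```python
import mmap
import os
import tempfile


def round_trip_via_mmap(payload: bytes) -> bytes:
    """Write `payload` to a temp file and read it back via a memory map."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        # A second process could map the same file instead of receiving
        # the bytes over a socket; here the same process does both sides.
        with mmap.mmap(fd, len(payload), access=mmap.ACCESS_READ) as view:
            return bytes(view[:])
    finally:
        os.close(fd)
        os.remove(path)
```

A real object store additionally needs a channel for clients to discover and lock objects, which is where the Unix domain socket mentioned above comes in.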

Also - side question about the c_glib bindings. I assume those will only ever 
work on Windows with something like Cygwin or MSYS2, right? Would people be 
opposed to adding pure "C" exports to the plasma library so the C# bindings 
could use it? (similar to the JNI support today).

Eric

[0] https://ray-project.github.io/2017/08/08/plasma-in-memory-object-store.html
[1] https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/


RE: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

2019-09-04 Thread Eric Erhardt
The C# PR is up.

https://github.com/apache/arrow/pull/5280

Eric

-Original Message-
From: Eric Erhardt  
Sent: Wednesday, September 4, 2019 10:12 AM
To: dev@arrow.apache.org; Ji Liu 
Cc: emkornfield ; Paul Taylor 
Subject: RE: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte 
Flatbuffer alignment requirements (2nd vote)

I'm working on a PR for the C# bindings. I hope to have it up in the next day 
or two. Integration tests for C# would be a great addition at some point - it's 
been on my backlog. For now I plan on manually testing it.

-Original Message-
From: Wes McKinney 
Sent: Tuesday, September 3, 2019 10:17 PM
To: Ji Liu 
Cc: emkornfield ; dev ; Paul 
Taylor 
Subject: Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte 
Flatbuffer alignment requirements (2nd vote)

hi folks,

We now have patches up for Java, JS, and Go. How are we doing on the code 
reviews for getting these in?

Since C# implements the binary protocol, the C# developers might want to look 
at this before the 0.15.0 release also. Absent integration tests it's difficult 
to verify the C# library, though

Thanks

On Thu, Aug 29, 2019 at 8:13 AM Ji Liu  wrote:
>
> Here is the Java implementation
> https://github.com/apache/arrow/pull/5229
>
> cc @Wes McKinney @emkornfield
>
> Thanks,
> Ji Liu
>
> --
> From: Ji Liu
> Send Time: August 28, 2019 (Wednesday) 17:34
> To: emkornfield; dev
> Cc: Paul Taylor
> Subject: Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 
> 8-byte Flatbuffer alignment requirements (2nd vote)
>
> I could take the Java implementation and will take a close watch on this 
> issue in the next few days.
>
> Thanks,
> Ji Liu
>
>
> --
> From: Micah Kornfield
> Send Time: August 28, 2019 (Wednesday) 17:14
> To: dev
> Cc: Paul Taylor
> Subject: Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 
> 8-byte Flatbuffer alignment requirements (2nd vote)
>
> I should have integration tests with 0.14.1 generated binaries in the 
> next few days.  I think the one remaining unassigned piece of work is 
> the Java implementation; I can take that up next if no one else gets to it.
>
> On Tue, Aug 27, 2019 at 7:19 PM Wes McKinney  wrote:
>
> > Here's the C++ changes
> >
> > https://github.com/apache/arrow/pull/5211
> >
> > I'm going to create an integration branch where we can merge each 
> > patch before merging to master
> >
> > On Fri, Aug 23, 2019 at 9:03 AM Wes McKinney  wrote:
> > >
> > > It isn't implemented in C++ yet but I will try to get a patch up 
> > > for that soon (today maybe). I think we should create a branch 
> > > where we can stack the patches that implement this for each language.
> > >
> > > > On Fri, Aug 23, 2019 at 4:04 AM Paul Taylor wrote:
> > > >
> > > > I'll do the JS updates. Is it safe to validate against the Arrow
> > > > C++ integration tests?
> > > >
> > > >
> > > > On 8/22/19 7:28 PM, Micah Kornfield wrote:
> > > > > > I created https://issues.apache.org/jira/browse/ARROW-6313 as a
> > > > > > tracking issue with sub-issues on the development work.  So far
> > > > > > no-one has claimed Java and Javascript tasks.
> > > > >
> > > > > Would it make sense to have a separate dev branch for this work?
> > > > >
> > > > > Thanks,
> > > > > Micah
> > > > >
> > > > > On Thu, Aug 22, 2019 at 3:24 PM Wes McKinney 
> > > > > 

RE: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

2019-09-04 Thread Eric Erhardt
I'm working on a PR for the C# bindings. I hope to have it up in the next day 
or two. Integration tests for C# would be a great addition at some point - it's 
been on my backlog. For now I plan on manually testing it.
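
For readers wondering what a binding has to change, here is a minimal sketch of the message framing after this vote. It assumes the layout given in the Arrow format documentation for 0.15.0 and later: a 4-byte continuation marker (0xFFFFFFFF), a 4-byte little-endian metadata length, then the Flatbuffer metadata padded so the total is a multiple of 8 bytes. The `MessageFraming` class and its members are hypothetical names for illustration, not the Apache.Arrow C# API, and the message body is left out.

```csharp
using System;
using System.Buffers.Binary;

public static class MessageFraming
{
    // Continuation marker assumed from the Arrow format documentation (0.15.0+).
    public const uint ContinuationMarker = 0xFFFFFFFF;

    // Rounds n up to the next multiple of 8.
    public static int PadTo8(int n) => (n + 7) & ~7;

    // Builds the 8-byte prefix followed by the zero-padded metadata bytes.
    public static byte[] Frame(ReadOnlySpan<byte> metadata)
    {
        int paddedLength = PadTo8(metadata.Length);
        var framed = new byte[8 + paddedLength];
        BinaryPrimitives.WriteUInt32LittleEndian(framed.AsSpan(0, 4), ContinuationMarker);
        // The stored length covers the metadata plus its padding.
        BinaryPrimitives.WriteInt32LittleEndian(framed.AsSpan(4, 4), paddedLength);
        // Because the prefix is 8 bytes long, the metadata starts 8-byte aligned.
        metadata.CopyTo(framed.AsSpan(8));
        return framed;
    }

    public static void Main()
    {
        byte[] framed = Frame(new byte[13]);
        Console.WriteLine(framed.Length);                                          // 24
        Console.WriteLine(BinaryPrimitives.ReadInt32LittleEndian(framed.AsSpan(4, 4))); // 16
    }
}
```

Before the change the prefix was only the 4-byte length, so the Flatbuffer started at offset 4 and could not be 8-byte aligned.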

-Original Message-
From: Wes McKinney  
Sent: Tuesday, September 3, 2019 10:17 PM
To: Ji Liu 
Cc: emkornfield ; dev ; Paul 
Taylor 
Subject: Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte 
Flatbuffer alignment requirements (2nd vote)

hi folks,

We now have patches up for Java, JS, and Go. How are we doing on the code 
reviews for getting these in?

Since C# implements the binary protocol, the C# developers might want to look 
at this before the 0.15.0 release also. Absent integration tests it's difficult 
to verify the C# library, though

Thanks

On Thu, Aug 29, 2019 at 8:13 AM Ji Liu  wrote:
>
> Here is the Java implementation
> https://github.com/apache/arrow/pull/5229
>
> cc @Wes McKinney @emkornfield
>
> Thanks,
> Ji Liu
>
> --
> From:Ji Liu  Send Time:August 28, 2019 (Wednesday) 
> 17:34 To:emkornfield ; dev 
>  Cc:Paul Taylor 
> Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 
> 8-byte Flatbuffer alignment requirements (2nd vote)
>
> I could take the Java implementation and will take a close watch on this 
> issue in the next few days.
>
> Thanks,
> Ji Liu
>
>
> --
> From:Micah Kornfield  Send Time:August 28, 2019 (Wednesday) 
> 17:14 To:dev  Cc:Paul Taylor 
> 
> Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 
> 8-byte Flatbuffer alignment requirements (2nd vote)
>
> I should have integration tests with 0.14.1 generated binaries in the 
> next few days.  I think the one remaining unassigned piece of work is 
> the Java implementation; I can take that up next if no one else gets to it.
>
> On Tue, Aug 27, 2019 at 7:19 PM Wes McKinney  wrote:
>
> > Here's the C++ changes
> >
> > https://github.com/apache/arrow/pull/5211
> >
> > I'm going to create an integration branch where we can merge each 
> > patch before merging to master
> >
> > On Fri, Aug 23, 2019 at 9:03 AM Wes McKinney  wrote:
> > >
> > > It isn't implemented in C++ yet but I will try to get a patch up 
> > > for that soon (today maybe). I think we should create a branch 
> > > where we can stack the patches that implement this for each language.
> > >
> > > > On Fri, Aug 23, 2019 at 4:04 AM Paul Taylor wrote:
> > > >
> > > > I'll do the JS updates. Is it safe to validate against the Arrow 
> > > > C++ integration tests?
> > > >
> > > >
> > > > On 8/22/19 7:28 PM, Micah Kornfield wrote:
> > > > > > I created https://issues.apache.org/jira/browse/ARROW-6313 as a
> > > > > > tracking issue with sub-issues on the development work.  So far
> > > > > > no-one has claimed Java and Javascript tasks.
> > > > >
> > > > > Would it make sense to have a separate dev branch for this work?
> > > > >
> > > > > Thanks,
> > > > > Micah
> > > > >
> > > > > On Thu, Aug 22, 2019 at 3:24 PM Wes McKinney 
> > > > > 
> > wrote:
> > > > >
> > > > >> The vote carries with 4 binding +1 votes and 1 non-binding +1
> > > > >>
> > > > >> I'll merge the specification patch later today and we can 
> > > > >> begin working on implementations so we can get this done for 
> > > > >> 0.15.0
> > > > >>
> > > > >> On Tue, Aug 20, 2019 at 12:30 PM Bryan Cutler 
> > > > >> 
> > wrote:
> > > > >>> +1 (non-binding)
> > > > >>>
> > > > >>> On Tue, Aug 20, 2019, 7:43 AM Antoine Pitrou 
> > > > >>> 
> > > > >> wrote:
> > > >  Sorry, had forgotten to send my vote on this.
> > > > 
> > > >  +1 from me.
> > > > 
> > > >  Regards
> > > > 
> > > >  Antoine.
> > > > 
> > > > 
> > > >  On Wed, 14 Aug 2019 17:42:33 -0500 Wes McKinney 
> > > >   wrote:
> > > > > hi all,
> > > > >
> > > > > As we've been discussing [1], there is a need to introduce 
> > > > > 4
> > bytes of
> > > > > padding into the preamble of the "encapsulated 

[jira] [Created] (ARROW-6322) [C#] Implement a plasma client

2019-08-22 Thread Eric Erhardt (Jira)
Eric Erhardt created ARROW-6322:
---

 Summary: [C#] Implement a plasma client
 Key: ARROW-6322
 URL: https://issues.apache.org/jira/browse/ARROW-6322
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C#
Reporter: Eric Erhardt
Assignee: Eric Erhardt


We should create a C# plasma client, so .NET code can get and put objects into 
the plasma store.

An easy-ish way of implementing this would be to build on the c_glib C APIs 
already exposed for the plasma client. Unfortunately, I haven't found a decent 
C# GObject generator, so I think the C bindings will need to be written by 
hand, but there aren't too many of them.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


RE: [DISCUSS] Developing a "data frame" subproject in the Arrow C++ libraries

2019-08-12 Thread Eric Erhardt
Hey Wes,

I just wanted to check-in on this work. Have there been any updates to the 
Arrow "data frame" project worth sharing?

Thanks,
Eric

-Original Message-
From: Wes McKinney  
Sent: Tuesday, May 21, 2019 8:17 AM
To: dev@arrow.apache.org
Subject: Re: [DISCUSS] Developing a "data frame" subproject in the Arrow C++ 
libraries

On Tue, May 21, 2019, 8:43 AM Antoine Pitrou  wrote:

>
> On 21/05/2019 at 13:42, Wes McKinney wrote:
> > hi Antoine,
> >
> > On Tue, May 21, 2019 at 5:48 AM Antoine Pitrou 
> wrote:
> >>
> >>
> >> Hi Wes,
> >>
> >> How does copy-on-write play together with memory-mapped data?  It 
> >> seems that, depending on whether the memory map has several 
> >> concurrent users (a condition which may be timing-dependent), we 
> >> will either persist changes on disk or make them ephemeral in 
> >> memory.  That doesn't sound very user-friendly, IMHO.
> >
> > With memory-mapping, any Buffer is sliced from the parent MemoryMap 
> > [1] so mutating the data on disk using this interface wouldn't be 
> > possible with the way that I've framed it.
>
> Hmm... I always forget that SliceBuffer returns a read-only view.
>

The more important issue is that parent_ is non-null. The idea is that no 
mutation is allowed if we reason that another Buffer object has access to the 
address space of interest. I think this style of copy-on-write is a reasonable 
compromise that prevents most kinds of defensive copying.
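
A minimal sketch of that rule, written in C# rather than Arrow's C++ and not taken from the Arrow code base. `CowBuffer`, `Slice`, and `Set` are hypothetical names, and a slice here simply shares the whole array to keep the example short. The point is only the decision: mutate in place when no other buffer can see the memory, otherwise copy first.

```csharp
using System;

// Hypothetical buffer type illustrating the copy-on-write rule; not Arrow's Buffer.
public sealed class CowBuffer
{
    private readonly byte[] _data;
    private readonly CowBuffer _parent; // non-null when this buffer is a slice of another
    private int _slicesHandedOut;       // other buffers that can see this memory

    public CowBuffer(byte[] data) : this(data, null) { }

    private CowBuffer(byte[] data, CowBuffer parent)
    {
        _data = data;
        _parent = parent;
    }

    public byte this[int index] => _data[index];

    // A slice shares the parent's memory, so it records its parent.
    public CowBuffer Slice()
    {
        _slicesHandedOut++;
        return new CowBuffer(_data, this);
    }

    // In-place mutation is allowed only if no other buffer shares the address space.
    public bool CanMutateInPlace => _parent == null && _slicesHandedOut == 0;

    // Returns the buffer that holds the change: this one, or a private copy.
    public CowBuffer Set(int index, byte value)
    {
        CowBuffer target = CanMutateInPlace ? this : new CowBuffer((byte[])_data.Clone());
        target._data[index] = value;
        return target;
    }
}

public static class Program
{
    public static void Main()
    {
        var owner = new CowBuffer(new byte[] { 1, 2, 3 });
        Console.WriteLine(ReferenceEquals(owner.Set(0, 9), owner)); // True: sole owner, no copy

        CowBuffer slice = owner.Slice();
        CowBuffer changed = slice.Set(0, 7);
        Console.WriteLine(ReferenceEquals(changed, slice)); // False: parent is non-null, so copy
        Console.WriteLine(owner[0]);                        // 9: the shared memory is untouched
    }
}
```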


> Regards
>
> Antoine.
>


RE: [VOTE] Release Apache Arrow 0.14.1 - RC0

2019-07-18 Thread Eric Erhardt
+1

Tested:
- C# source verification on Ubuntu 18
- I verified the C# source contained the fixes for the two issues I needed 
fixed in this patch.

-Original Message-
From: Krisztián Szűcs  
Sent: Tuesday, July 16, 2019 9:55 PM
To: dev@arrow.apache.org
Subject: [VOTE] Release Apache Arrow 0.14.1 - RC0

Hi,

I would like to propose the following release candidate (RC0) of Apache Arrow 
version 0.14.1. This is a patch release consisting of 47 resolved JIRA issues[1].

This release candidate is based on commit:
5f564424c71cef12619522cdde59be5f69b31b68 [2]

The source release rc0 is hosted at [3].
The binary artifacts are hosted at [4][5][6][7].
The changelog is located at [8].

Please download, verify checksums and signatures, run the unit tests, and vote 
on the release. See [9] for how to validate a release candidate.
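
As an illustration of the checksum step only, here is a minimal sketch that hashes a downloaded artifact and compares it with a published digest. It assumes SHA-256 and a lowercase hex digest, and `ChecksumCheck` is a hypothetical helper name; the guide at [9] remains the authoritative procedure, and signature verification is not covered here.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ChecksumCheck
{
    // Returns the SHA-256 digest of the content as lowercase hex.
    public static string Sha256Hex(byte[] content)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(content);
            var hex = new StringBuilder(digest.Length * 2);
            foreach (byte b in digest)
            {
                hex.Append(b.ToString("x2"));
            }
            return hex.ToString();
        }
    }

    // Compares against a published digest, ignoring case and surrounding whitespace.
    public static bool Matches(byte[] content, string expectedHex) =>
        string.Equals(Sha256Hex(content), expectedHex.Trim(), StringComparison.OrdinalIgnoreCase);

    public static void Main()
    {
        // SHA-256 of empty input, used here as a well-known test vector.
        const string emptyDigest =
            "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855";
        Console.WriteLine(Matches(new byte[0], emptyDigest)); // True
    }
}
```

In practice the content would come from `File.ReadAllBytes` on the downloaded tarball and the expected value from the digest file published next to it.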

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow 0.14.1 [ ] +0 [ ] -1 Do not release this as 
Apache Arrow 0.14.1 because...

[1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%200.14.1
[2]: https://github.com/apache/arrow/tree/5f564424c71cef12619522cdde59be5f69b31b68
[3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.14.1-rc0
[4]: https://bintray.com/apache/arrow/centos-rc/0.14.1-rc0
[5]: https://bintray.com/apache/arrow/debian-rc/0.14.1-rc0
[6]: https://bintray.com/apache/arrow/python-rc/0.14.1-rc0
[7]: https://bintray.com/apache/arrow/ubuntu-rc/0.14.1-rc0
[8]: https://github.com/apache/arrow/blob/5f564424c71cef12619522cdde59be5f69b31b68/CHANGELOG.md
[9]: https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates


RE: [DISCUSS] Need for 0.14.1 release due to Python package problems, Parquet forward compatibility problems

2019-07-11 Thread Eric Erhardt
The two C# fixes I'd like in the 0.14.1 release are:

https://issues.apache.org/jira/browse/ARROW-5887 - already marked with 0.14.1 
fix version.
https://issues.apache.org/jira/browse/ARROW-5908 - hasn't been resolved yet. 
The PR https://github.com/apache/arrow/pull/4851 has one approver and the Rust 
failure doesn't appear to be caused by my change.

I assume I shouldn't mark ARROW-5908 with a 0.14.1 fix version until the PR has 
been merged.

-Original Message-
From: Neal Richardson  
Sent: Thursday, July 11, 2019 11:59 AM
To: dev@arrow.apache.org
Subject: Re: [DISCUSS] Need for 0.14.1 release due to Python package problems, 
Parquet forward compatibility problems

I just moved https://issues.apache.org/jira/browse/ARROW-5850 from 1.0.0 to
0.14.1.

On Thu, Jul 11, 2019 at 8:12 AM Wes McKinney  wrote:

> To limit uncertainty, I'm going to start preparing a 0.14.1 patch 
> release branch. I will update the list with the patches that are being 
> cherry-picked. If other folks could give me a list of other PRs that 
> need to be backported I will add them to the list. Any JIRA that needs 
> to be included should have the "0.14.1" fix version added so we can 
> keep track
>
> On Wed, Jul 10, 2019 at 9:48 PM Joris Van den Bossche 
>  wrote:
> >
> > I personally prefer 0.14.1 over 0.15.0. I think that is clearer in 
> > communication, as we are fixing regressions of the 0.14.0 release.
> >
> > (but I haven't been involved much in releases, so certainly no 
> > strong
> > opinion)
> >
> > Joris
> >
> >
> > On Wed, 10 Jul 2019 at 15:07, Wes McKinney wrote:
> >
> > > hi folks,
> > >
> > > Are there any opinions / strong feelings about the two options:
> > >
> > > * Prepare patch 0.14.1 release from a maintenance branch
> > > * Release 0.15.0 out of master
> > >
> > > Aside from the Parquet forward compatibility issues we're still 
> > > discussing, and Eric's C# patch PR 4836, are there any other 
> > > issues that need to be fixed before we go down one of these paths?
> > >
> > > Would anyone like to help with release management? I can do so if 
> > > necessary, but I've already done a lot of release management :)
> > >
> > > - Wes
> > >
> > > On Tue, Jul 9, 2019 at 4:13 PM Wes McKinney 
> wrote:
> > > >
> > > > Hi Eric -- of course!
> > > >
> > > > On Tue, Jul 9, 2019, 4:03 PM Eric Erhardt <
> eric.erha...@microsoft.com.invalid>
> > > wrote:
> > > >>
> > > >> Can we propose getting changes other than Python or Parquet 
> > > >> related
> > > into this release?
> > > >>
> > > >> For example, I found a critical issue in the C# implementation
> > > >> that, if possible, I'd like to get included in a patch release.
> > > >> https://github.com/apache/arrow/pull/4836
> > > >>
> > > >> Eric
> > > >>
> > > >> -Original Message-
> > > >> From: Wes McKinney 
> > > >> Sent: Tuesday, July 9, 2019 7:59 AM
> > > >> To: dev@arrow.apache.org
> > > >> Subject: Re: [DISCUSS] Need for 0.14.1 release due to Python 
> > > >> package
> > > problems, Parquet forward compatibility problems
> > > >>
> > > >> On Tue, Jul 9, 2019 at 12:02 AM Sutou Kouhei 
> > > >> 
> > > wrote:
> > > >> >
> > > >> > Hi,
> > > >> >
> > > >> > > If the problems can be resolved quickly, I should think we
> could cut
> > > >> > > an RC for 0.14.1 by the end of this week. The RC could 
> > > >> > > either
> be cut
> > > >> > > from a maintenance branch or out of master -- any thoughts 
> > > >> > > about this (cutting from master is definitely easier)?
> > > >> >
> > > >> > How about just releasing 0.15.

RE: New CI system: Ursabot

2019-07-11 Thread Eric Erhardt
My apologies if this is already covered in the docs, but I couldn't find it.

How do I re-run a single leg in the Ursabot tests? The 'AMD64 Debian 9 Rust 
1.35' failed on my PR, and I wanted to try re-running just that leg, but the 
only option I found was to re-run all Ursabot legs.

Eric

-Original Message-
From: Krisztián Szűcs  
Sent: Friday, June 14, 2019 9:48 AM
To: dev@arrow.apache.org
Subject: New CI system: Ursabot

Hello All,

We're developing a buildbot application to utilize Ursa Labs’
physical machines called Ursabot. Buildbot [1] is used by major open source 
projects, like CPython and WebKit [2].

The source code is hosted at [3], the web interface is accessible at [4]. The 
repository contains a short guide to the goals, the implementation, and the 
interfaces for driving ursabot. The most notable way to trigger ursabot builds 
is to post GitHub comments mentioning the @ursabot machine account; for more, 
see [5].

Currently we have builders for the C++ implementation and the Python bindings 
on AMD64 and ARM64 architectures.
It is quite easy to attach workers to the buildmaster [7], so we can scale our 
build cluster to test and run on-demand builds (like benchmarks, packaging 
tasks) on more platforms.

Yesterday we enabled the GitHub status push reporter to improve the 
visibility of ursabot, although we had been testing the builders for the last 
couple of weeks. I hope no one has a hard objection against this new CI. Arrow has 
already started to outgrow Travis-CI and Appveyor's capacity and we're trying 
to make the build system quicker and more robust.

Please don't hesitate to ask any questions!

Thanks, Krisztian

[1]: http://buildbot.net/
[2]: https://github.com/buildbot/buildbot/wiki/SuccessStories
[3]: https://github.com/ursa-labs/ursabot
[4]: https://ci.ursalabs.org
[5]: https://github.com/ursa-labs/ursabot#driving-ursabot
[7]: https://github.com/ursa-labs/ursabot/blob/master/default.yaml#L115


[jira] [Created] (ARROW-5896) [C#] Array Builders should take an initial capacity in their constructors

2019-07-09 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5896:
---

 Summary: [C#] Array Builders should take an initial capacity in 
their constructors
 Key: ARROW-5896
 URL: https://issues.apache.org/jira/browse/ARROW-5896
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt


When using the Fluent Array Builder API, we should take in an initial capacity 
in the constructor, so we can avoid allocating unnecessary memory.

Today, if you create a builder, and then .Reserve(length) on it, the initial 
byte[] that was created in the constructor is wasted.
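
A minimal sketch of the waste described above. `SketchBuilder` is a hypothetical stand-in, not the Apache.Arrow builder, and its default capacity of 8 is an arbitrary assumption; it only counts how many backing arrays get allocated in each usage pattern.

```csharp
using System;

// Hypothetical stand-in for an array builder; not the Apache.Arrow API.
public sealed class SketchBuilder
{
    private const int DefaultCapacity = 8; // arbitrary value for the sketch
    private byte[] _buffer;

    // Number of backing arrays allocated so far, to make the waste visible.
    public int Allocations { get; private set; }

    // Today's shape: the constructor always allocates a default-sized array.
    public SketchBuilder() : this(DefaultCapacity) { }

    // Proposed shape: the caller states the capacity up front.
    public SketchBuilder(int initialCapacity)
    {
        _buffer = Allocate(initialCapacity);
    }

    // Growing replaces the existing array, which then becomes garbage.
    public SketchBuilder Reserve(int capacity)
    {
        if (capacity > _buffer.Length)
        {
            byte[] bigger = Allocate(capacity);
            Array.Copy(_buffer, bigger, _buffer.Length);
            _buffer = bigger;
        }
        return this;
    }

    private byte[] Allocate(int size)
    {
        Allocations++;
        return new byte[size];
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(new SketchBuilder().Reserve(1024).Allocations); // 2
        Console.WriteLine(new SketchBuilder(1024).Allocations);           // 1
    }
}
```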



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


RE: [Discuss] Compatibility Guarantees and Versioning Post "1.0.0"

2019-07-09 Thread Eric Erhardt
Just to be sure I fully understand the proposal:

For the Library Version, we are going to increment the MAJOR version on every 
normal release, and increment the MINOR version if we need to release a 
patch/bug fix type of release.

Since SemVer allows for API breaking changes on MAJOR versions, this basically 
means, each library (C++, Python, C#, Java, etc) _can_ introduce API breaking 
changes on every normal release (like we have been with the 0.x.0 releases).

So, for example, we release library v1.0.0 in a few months and then library 
v2.0.0 a few months after that.  In v2.0.0, C++, Python, and Java didn't make 
any breaking API changes from 1.0.0. But C# made 3 API breaking changes. This 
would be acceptable?

If my understanding above is correct, then I think this is a good plan. 
Initially I was concerned that the C# library wouldn't be free to make API 
breaking changes once its version became `1.0.0`. The C# library is still 
pretty inadequate, and I have a feeling there are a few things that will need 
to change about it in the future. But with the above plan, this concern won't 
be a problem.
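
A minimal sketch of how the two version numbers would be read under this understanding of the proposal. The type and method names are hypothetical, and the rules encode only what the thread states: a Format MAJOR bump breaks data compatibility, a Format MINOR bump only adds features, and the Library version says nothing about whether two peers can exchange data.

```csharp
using System;

// Hypothetical value type for the Format version; PATCH is omitted for brevity.
public readonly struct FormatVersion
{
    public int Major { get; }
    public int Minor { get; }

    public FormatVersion(int major, int minor)
    {
        Major = major;
        Minor = minor;
    }
}

public static class Compatibility
{
    // Data can be exchanged as long as the Format MAJOR version matches.
    public static bool CanExchangeData(FormatVersion a, FormatVersion b) =>
        a.Major == b.Major;

    // Features added in a newer MINOR version are safe only if the reader knows them.
    public static bool ReaderKnowsAllFeatures(FormatVersion writer, FormatVersion reader) =>
        writer.Major == reader.Major && writer.Minor <= reader.Minor;

    public static void Main()
    {
        var oldPeer = new FormatVersion(1, 0); // e.g. shipped with library 1.0.0
        var newPeer = new FormatVersion(1, 5); // e.g. shipped with library 20.0.0
        var future = new FormatVersion(2, 0);  // e.g. shipped with library 21.0.0

        Console.WriteLine(Compatibility.CanExchangeData(oldPeer, newPeer));        // True
        Console.WriteLine(Compatibility.ReaderKnowsAllFeatures(newPeer, oldPeer)); // False
        Console.WriteLine(Compatibility.CanExchangeData(newPeer, future));         // False
    }
}
```

The library numbers in the comments are the ones used as examples later in the thread; API breaks between library majors are a separate matter that this check does not model.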

Eric

-Original Message-
From: Micah Kornfield  
Sent: Monday, July 1, 2019 10:02 PM
To: Wes McKinney 
Cc: dev@arrow.apache.org
Subject: Re: [Discuss] Compatibility Guarantees and Versioning Post "1.0.0"

Hi Wes,
Thanks for your response.  In regards to the protocol negotiation your 
description of feature reporting (snipped below) is along the lines of what I 
was thinking.  It might not be necessary for 1.0.0, but at some point might 
become useful.


>  Note that we don't really have a mechanism for clients and servers to 
> report to each other what features they support, so this could help 
> with that for applications where it might matter.


Thanks,
Micah


On Mon, Jul 1, 2019 at 12:54 PM Wes McKinney  wrote:

> hi Micah,
>
> Sorry for the delay in feedback. I looked at the document and it seems 
> like a reasonable perspective about forward- and 
> backward-compatibility.
>
> It seems like the main thing you are proposing is to apply Semantic 
> Versioning to Format and Library versions separately. That's an 
> interesting idea, my thought had been to have a version number that is 
> FORMAT_VERSION.LIBRARY_VERSION.PATCH_VERSION. But your proposal is 
> more flexible in some ways, so let me clarify for others reading
>
> In what you are proposing, the next release would be:
>
> Format version: 1.0.0
> Library version: 1.0.0
>
> Suppose that 20 major versions down the road we stand at
>
> Format version: 1.5.0
> Library version: 20.0.0
>
> The minor version of the Format would indicate that there are 
> additions, like new elements in the Type union, but otherwise backward 
> and forward compatible. So the Minor version means "new things, but 
> old clients will not be disrupted if those new things are not used".
> We've already been doing this since the V4 Format iteration but we 
> have not had a way to signal that there may be new features. As a 
> corollary to this, I wonder if we should create a dual version in the 
> metadata
>
> PROTOCOL VERSION: (what is currently MetadataVersion, V2) FEATURE 
> VERSION: not tracked at all
>
> So Minor version bumps in the format would trigger a bump in the 
> FeatureVersion. Note that we don't really have a mechanism for clients 
> and servers to report to each other what features they support, so 
> this could help with that for applications where it might matter.
>
> Should backward/forward compatibility be disrupted in the future, then 
> a change to the major version would be required. So in year 2025, say, 
> we might decide that we want to do:
>
> Format version: 2.0.0
> Library version: 21.0.0
>
> The Format version would live in the project's Documentation, so the 
> Apache releases are only the library version.
>
> Regarding your open questions:
>
> 1. Should we clean up "warts" on the specification, like redundant 
> information
>
> I don't think it's necessary. So if Metadata V5 is Format Version
> 1.0.0 (currently we are V4, but we're discussing some possible 
> non-forward compatible changes...) I think that's OK. None of these 
> things are "hurting" anything
>
> 2. Do we need additional mechanisms for marking some features as 
> experimental?
>
> Not sure, but I think this can be mostly addressed through 
> documentation. Flight will still be experimental in 1.0.0, for 
> example.
>
> 3. Do we need protocol negotiation mechanisms in Flight
>
> Could you explain what you mean? Are you thinking if there is some 
> major revamp of the protocol and you need to switch between a "V1 
> Flight Protocol" and a "V2 Flight Protocol"?
>
> - Wes
>
> On Thu, Jun 13, 2019 at 2:17 AM Micah Kornfield 
> 
> wrote:
> >
> > Hi Everyone,
> > I think there might be some ideas that we still need to reach 
> > consensus
> on
> > for how the format and libraries evolve in a post-1.0.0 release world.
> >  Specifically, I think we need to agree on 

RE: [DISCUSS] Need for 0.14.1 release due to Python package problems, Parquet forward compatibility problems

2019-07-09 Thread Eric Erhardt
Can we propose getting changes other than Python or Parquet related into this 
release?

For example, I found a critical issue in the C# implementation that, if 
possible, I'd like to get included in a patch release.  
https://github.com/apache/arrow/pull/4836

Eric

-Original Message-
From: Wes McKinney  
Sent: Tuesday, July 9, 2019 7:59 AM
To: dev@arrow.apache.org
Subject: Re: [DISCUSS] Need for 0.14.1 release due to Python package problems, 
Parquet forward compatibility problems

On Tue, Jul 9, 2019 at 12:02 AM Sutou Kouhei  wrote:
>
> Hi,
>
> > If the problems can be resolved quickly, I should think we could cut 
> > an RC for 0.14.1 by the end of this week. The RC could either be cut 
> > from a maintenance branch or out of master -- any thoughts about 
> > this (cutting from master is definitely easier)?
>
> How about just releasing 0.15.0 from master?
> It'll be simpler than creating a patch release.
>

I'd be fine with that, too.

>
> Thanks,
> --
> kou
>
> In 
>   "[DISCUSS] Need for 0.14.1 release due to Python package problems, Parquet 
> forward compatibility problems" on Mon, 8 Jul 2019 11:32:07 -0500,
>   Wes McKinney  wrote:
>
> > hi folks,
> >
> > Perhaps unsurprisingly due to the expansion of our Python packages, 
> > a number of things are broken in 0.14.0 that we should fix sooner 
> > than the next major release. I'll try to send a complete list to 
> > this thread to give a status within a day or two. Other problems may 
> > arise in the next 48 hours as more people install the package.
> >
> > If the problems can be resolved quickly, I should think we could cut 
> > an RC for 0.14.1 by the end of this week. The RC could either be cut 
> > from a maintenance branch or out of master -- any thoughts about 
> > this (cutting from master is definitely easier)?
> >
> > Would someone (who is not Kou) be able to assist with creating the RC?
> >
> > Thanks,
> > Wes


[jira] [Created] (ARROW-5887) [C#] ArrowStreamWriter writes FieldNodes in wrong order

2019-07-09 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5887:
---

 Summary: [C#] ArrowStreamWriter writes FieldNodes in wrong order
 Key: ARROW-5887
 URL: https://issues.apache.org/jira/browse/ARROW-5887
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Reporter: Eric Erhardt
Assignee: Eric Erhardt


When ArrowStreamWriter is writing a {{RecordBatch}} with {{null}}s in it, it is 
mixing up the column's {{NullCount}}.

You can see here:

[https://github.com/apache/arrow/blob/90affbd2c41e80aa8c3fac1e4dbff60aafb415d3/csharp/src/Apache.Arrow/Ipc/ArrowStreamWriter.cs#L195-L200]

It is writing the fields from {{0}} -> {{fieldCount}} order. But then 
[lower|https://github.com/apache/arrow/blob/90affbd2c41e80aa8c3fac1e4dbff60aafb415d3/csharp/src/Apache.Arrow/Ipc/ArrowStreamWriter.cs#L216-L220],
 it is writing the fields from {{fieldCount}} -> {{0}}.

Looking at the [Java 
implementation|https://github.com/apache/arrow/blob/7b2d68570b4336308c52081a0349675e488caf11/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/FBSerializables.java#L36-L44]
 it says
{quote}// struct vectors have to be created in reverse order
{quote}
 

A simple test of roundtripping the following RecordBatch shows the issue:

 
{code:java}
var result = new RecordBatch(
    new Schema.Builder()
        .Field(f => f.Name("age").DataType(Int32Type.Default))
        .Field(f => f.Name("CharCount").DataType(Int32Type.Default))
        .Build(),
    new IArrowArray[]
    {
        // "age": a single value that the validity bitmap marks as null
        new Int32Array(
            new ArrowBuffer.Builder<int>().Append(0).Build(),
            new ArrowBuffer.Builder<byte>().Append(0).Build(),
            length: 1,
            nullCount: 1,
            offset: 0),
        // "CharCount": a single non-null value, so no validity bitmap is needed
        new Int32Array(
            new ArrowBuffer.Builder<int>().Append(7).Build(),
            ArrowBuffer.Empty,
            length: 1,
            nullCount: 0,
            offset: 0)
    },
    length: 1);
{code}
Here, the "age" column should have a `null` in it. However, when you write and 
read this RecordBatch back, you see that the "CharCount" column has `NullCount` 
== 1 and "age" column has `NullCount` == 0.
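
The swap can be reproduced with a toy model. This sketch assumes, as the Java comment suggests, that a FlatBuffers-style builder fills its buffer back to front, so each newly written element lands in front of the ones written earlier. `WriteVector` is a hypothetical helper, not the actual ArrowStreamWriter or FlatBuffers code.

```csharp
using System;
using System.Collections.Generic;

public static class FieldNodeOrder
{
    // Mimics a back-to-front builder: every write is placed before earlier writes.
    // Returns the elements in the order a reader would see them.
    public static int[] WriteVector(IEnumerable<int> writeOrder)
    {
        var buffer = new LinkedList<int>();
        foreach (int value in writeOrder)
        {
            buffer.AddFirst(value);
        }
        var readOrder = new int[buffer.Count];
        buffer.CopyTo(readOrder, 0);
        return readOrder;
    }

    public static void Main()
    {
        int[] nullCounts = { 1, 0 }; // "age" has one null, "CharCount" has none

        // Bug: writing from 0 -> fieldCount makes the reader see the counts swapped.
        Console.WriteLine(string.Join(",", WriteVector(nullCounts))); // 0,1

        // Fix: write from fieldCount -> 0 so the reader sees the original order.
        int[] reversed = (int[])nullCounts.Clone();
        Array.Reverse(reversed);
        Console.WriteLine(string.Join(",", WriteVector(reversed))); // 1,0
    }
}
```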



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


RE: [DISCUSS] Ongoing Travis CI service degradation

2019-07-02 Thread Eric Erhardt
Has anyone considered using Azure DevOps for CI and patch validation?

https://azure.microsoft.com/en-us/services/devops/pipelines/

> Get cloud-hosted pipelines for Linux, macOS, and Windows with unlimited 
> minutes and 10 free parallel jobs for open source

I guess I am not familiar with ASF policies, but we've been using Azure DevOps 
on the .NET team for a while now (we've switched off of Jenkins) and there are 
some really great features. You can use cloud-hosted machines, or your own 
machines. It has Docker integration. And can scale up as large as necessary. It 
has great test failure reporting and analytics on which tests fail more often 
than others.

One scenario we have built on our team is an "Auto-merge" bot. This allows 
committers to mark a PR as "auto-mergeable", and when the validation pipeline 
is completed successfully, the PR is automatically merged. If new changes are 
pushed to the PR or the validation build fails, it shuts the auto-merge 
capability off. This has proven super useful on my team - no more monitoring 
builds to see when they can be merged. You can review the change, approve of 
it, and mark it as "auto-merge" and when the validation passes, it is merged by 
the bot.
This is just an example of the types of extensions you can build on Azure 
DevOps.

I thought I would throw this option out here, just to hear others' opinions 
(positive or negative) on using Azure DevOps.

Eric

-Original Message-
From: Wes McKinney  
Sent: Friday, June 28, 2019 12:06 PM
To: dev@arrow.apache.org
Subject: Re: [DISCUSS] Ongoing Travis CI service degradation

Based on the discussion in
https://issues.apache.org/jira/browse/INFRA-18533
 it does not appear to be ASF Infra's inclination to allow projects to donate 
money to the Foundation to get more build resources on Travis CI. Our likely 
only solution is going to be to reduce our dependence on Travis CI. In the 
short term, I would say that the sooner we can migrate all of our Linux builds 
to docker-compose form to aid in this transition, the better.

We are hiring in our organization (Ursa Labs) for a dedicated role to support 
CI and development lifecycle automation (packaging, benchmarking, releases, 
etc.) in the Apache Arrow project, so I hope that we can provide even more help 
to resolve these issues in the future than we already are.

On Wed, Jun 26, 2019 at 11:35 AM Antoine Pitrou  wrote:
>
>
> Also note that the situation with AppVeyor isn't much better.
>
> Any "free as in beer" CI service is probably too capacity-limited for 
> our needs now, unless it allows private workers (which apparently 
> Gitlab CI does).
>
> Regards
>
> Antoine.
>
>
> Le 26/06/2019 à 18:32, Wes McKinney a écrit :
> > It seems that there is intermittent Apache-wide degradation of 
> > Travis CI services -- I was looking at 
> > https://travis-ci.org/apache today and there appeared to 
> > be a stretch of 3-4 hours where no queued builds on github.com/apache were 
> > running at all. I initially thought that the issue was contention with 
> > other Apache projects but even with round-robin allocation and a 
> > concurrency limit (e.g. no Apache project having more than 5-6 concurrent 
> > builds) that wouldn't explain why NO builds are running.
> >
> > This is obviously disturbing given how reliant we are on Travis CI 
> > to validate patches to be merged.
> >
> > I've opened a support ticket with Travis CI to see if they can 
> > provide some insight into what's going on. There is also an INFRA 
> > ticket where other projects have reported some similar experiences
> >
> > https://issues.apache.org/jira/browse/INFRA-18533
> >
> > As a meta-comment, at some point Apache Arrow is going to need to 
> > move off of public CI services for patch validation so that we can 
> > have unilateral control over scaling our build / test resources as 
> > the community grows larger. As the most active merger of patches (I 
> > have merged over 50% of pull requests over the project's history) 
> > this affects me greatly as I am often monitoring builds on many open 
> > PRs so that I can merge them as soon as possible. We are often 

[jira] [Created] (ARROW-5708) [C#] Null support for BooleanArray

2019-06-24 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5708:
---

 Summary: [C#] Null support for BooleanArray
 Key: ARROW-5708
 URL: https://issues.apache.org/jira/browse/ARROW-5708
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt


See the conversation 
[here|https://github.com/apache/arrow/pull/4640#discussion_r296417726] and 
[here|https://github.com/apache/arrow/pull/3574#discussion_r262662083].

We should add null support for BooleanArray.





[jira] [Created] (ARROW-5546) [C#] Remove IArrowArray and use Array base class.

2019-06-10 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5546:
---

 Summary: [C#] Remove IArrowArray and use Array base class.
 Key: ARROW-5546
 URL: https://issues.apache.org/jira/browse/ARROW-5546
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Affects Versions: 0.13.0
Reporter: Eric Erhardt


In .NET libraries, we have historically favored classes (abstract or otherwise) 
over interfaces. See [Choosing Between Classes and 
Interfaces|https://docs.microsoft.com/en-us/previous-versions/dotnet/netframework-4.0/ms229013(v%3dvs.100)].
 The main reasoning is that you can add members to a class over time, but once 
you ship an interface, it can never be changed. You can only add new interfaces.

 In light of this, we should remove the IArrowArray interface, and instead just 
use the base `Array` class as the abstraction for all Arrow Arrays.

As part of this, we should also consider renaming `Array` because it conflicts 
with the System.Array type. Instead we should consider naming it `ArrowArray` 
to make it unique from the very common System.Array type in .NET.
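
The versioning argument can be shown with a rough Python analogy (Python has no interfaces, so abstract base classes stand in for both options). All names here (`ContractV1`, `ContractV2`, `BaseV2`, `implement`, `can_instantiate`) are hypothetical. The sketch assumes an implementation that was written before a `null_count` member existed.

```python
from abc import ABC, abstractmethod


class ContractV1(ABC):
    """Interface-like type as first shipped: only abstract members."""

    @abstractmethod
    def length(self):
        ...


class ContractV2(ContractV1):
    """The same contract after a new abstract member is added."""

    @abstractmethod
    def null_count(self):
        ...


class BaseV2(ABC):
    """Base-class alternative: the new member ships with a default body."""

    @abstractmethod
    def length(self):
        ...

    def null_count(self):
        return 0


def implement(parent):
    """Build an implementation written before `null_count` existed."""

    class Impl(parent):
        def length(self):
            return 3

    return Impl


def can_instantiate(cls):
    """Report whether the class can still be constructed."""
    try:
        cls()
        return True
    except TypeError:
        return False


print(can_instantiate(implement(ContractV1)))
print(can_instantiate(implement(ContractV2)))
print(can_instantiate(implement(BaseV2)))
```

The old implementation keeps working against the grown base class, but stops working against the grown contract, which is the same trade-off described above for .NET classes and interfaces.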





Timeline for 0.14 release?

2019-05-30 Thread Eric Erhardt
Do we have an idea on a 0.14 release timeframe?

Historically, it seems there has been a release every 2-3 months. Do we think 
the next release would be about that long after 0.13 was released?

Note: I'm not pushing for a release any time soon - I am just curious when it 
would roughly be. I wanted to set expectations with my team on when some of the 
perf improvements I made on the .NET bindings would be available.

Eric 


[jira] [Created] (ARROW-5278) [C#] ArrowBuffer should either implement IEquatable correctly or not at all

2019-05-07 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5278:
---

 Summary: [C#] ArrowBuffer should either implement IEquatable 
correctly or not at all
 Key: ARROW-5278
 URL: https://issues.apache.org/jira/browse/ARROW-5278
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Eric Erhardt


See the discussion 
[here|https://github.com/apache/arrow/pull/3925/#discussion_r281378027].

ArrowBuffer currently implements IEquatable, but doesn't override `GetHashCode`.

We should either implement IEquatable correctly by overriding Equals and 
GetHashCode, or remove IEquatable altogether.

Looking at ArrowBuffer's [Equals 
implementation|https://github.com/apache/arrow/blob/08829248fd540b7e3bd96b980e357f8a4db7970e/csharp/src/Apache.Arrow/ArrowBuffer.cs#L66-L69],
 it compares each value in the buffer, which is not very efficient. Also, this 
implementation is not consistent with how `Memory<T>` implements IEquatable - 
[https://source.dot.net/#System.Private.CoreLib/shared/System/Memory.cs,500].

If we continue implementing IEquatable on ArrowBuffer, we should consider 
implementing it in the same fashion as Memory does.
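
As a rough illustration of the second option, here is a Python sketch in which two views are equal when they refer to the same backing object, start, and length, and the hash is built from those same fields. It is an analogy only: `BufferView` is a hypothetical class, not the ArrowBuffer API, and the comparison rule is my understanding of how the .NET type behaves.

```python
class BufferView:
    """Toy view over a shared bytearray, identified by (owner, start, length)."""

    def __init__(self, owner, start, length):
        self._owner = owner
        self._start = start
        self._length = length

    def __eq__(self, other):
        if not isinstance(other, BufferView):
            return NotImplemented
        # Same backing object and same window; contents are never scanned.
        return (self._owner is other._owner
                and self._start == other._start
                and self._length == other._length)

    def __hash__(self):
        # Built from the same fields as __eq__, so equal views hash equally.
        return hash((id(self._owner), self._start, self._length))


data = bytearray(b"abcdabcd")
a = BufferView(data, 0, 4)
b = BufferView(data, 0, 4)
c = BufferView(data, 4, 4)   # same bytes as `a`, but a different window
print(a == b, a == c, len({a, b, c}))
```

Equality is constant time here, and the pair of methods stays consistent, so the views behave correctly as dictionary keys or set members.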





[jira] [Created] (ARROW-5277) [C#] MemoryAllocator.Allocate(length: 0) should not return null

2019-05-07 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5277:
---

 Summary: [C#] MemoryAllocator.Allocate(length: 0) should not 
return null
 Key: ARROW-5277
 URL: https://issues.apache.org/jira/browse/ARROW-5277
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt


See the conversation 
[here|https://github.com/apache/arrow/pull/3925#discussion_r281187184].

We should change MemoryAllocator to not return `null` when the requested memory 
length is `0`. Instead, we should create a cached "NullObject" `IMemoryOwner<byte>` 
that has a no-op `Dispose` method, and always returns `Memory<byte>.Empty`.

This way consuming code doesn't need to check for `null` being returned from 
MemoryAllocator.Allocate.
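
A minimal sketch of the null-object idea, written in Python rather than C#. `EmptyOwner`, `BytesOwner`, and `allocate` are hypothetical names, and `dispose` stands in for `Dispose`.

```python
class EmptyOwner:
    """Null-object owner: holds no memory, so releasing it does nothing."""

    memory = memoryview(b"")

    def dispose(self):
        pass


class BytesOwner:
    """Ordinary owner backed by a freshly allocated buffer."""

    def __init__(self, length):
        self.memory = memoryview(bytearray(length))

    def dispose(self):
        self.memory.release()


_EMPTY_OWNER = EmptyOwner()  # one shared instance for every empty request


def allocate(length):
    """Return an owner for `length` bytes; never returns None."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if length == 0:
        return _EMPTY_OWNER
    return BytesOwner(length)


owner = allocate(0)
print(len(owner.memory))
owner.dispose()  # safe, and safe to repeat
```

Callers can then treat the zero-length case like any other allocation, with no extra allocation per empty request.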





[jira] [Created] (ARROW-5276) [C#] NativeMemoryAllocator expose an option for clearing allocated memory

2019-05-07 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5276:
---

 Summary: [C#] NativeMemoryAllocator expose an option for clearing 
allocated memory
 Key: ARROW-5276
 URL: https://issues.apache.org/jira/browse/ARROW-5276
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt


See the discussion 
[here|https://github.com/apache/arrow/pull/3925#discussion_r281192698].

We should expose an option on NativeMemoryAllocator for controlling whether the 
allocated memory is cleared or not.

Maybe we should make the default not clear the memory, that way it is the best 
performing by default.





[jira] [Created] (ARROW-5092) [C#] Source Link doesn't work with the C# release script

2019-04-02 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5092:
---

 Summary: [C#] Source Link doesn't work with the C# release script
 Key: ARROW-5092
 URL: https://issues.apache.org/jira/browse/ARROW-5092
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Affects Versions: 0.13.0
Reporter: Eric Erhardt


With the 0.13.0 C# NuGet package, [Source 
Link|https://docs.microsoft.com/en-us/dotnet/standard/library-guidance/sourcelink]
 doesn't work. The symbols can be downloaded from nuget.org correctly, but when 
Visual Studio tries to download the code, it cannot find the correct files.

The following is why it doesn't work:

The .NET tooling expects the build of an official release to happen in the 
context of a {{git}} repository. This does 2 things to the produced assets:
 # In the {{.nupkg}} file that is generated, the .NET tooling will encode the 
current git commit's SHA hash into both the {{Apache.Arrow.nuspec}} file, and 
into the compiled {{Apache.Arrow.dll}} assembly. Looking at the released 
version that was published over the weekend: 
[https://www.nuget.org/packages/Apache.Arrow/0.13.0], this information made it 
into the {{.nuspec}} and the {{.dll}}:

{code}
[assembly: 
AssemblyInformationalVersion("0.13.0+57de5c3adffe526f37366bb15c3ff0d4a2e84655")]

<repository type="git" url="https://github.com/apache/arrow" 
commit="57de5c3adffe526f37366bb15c3ff0d4a2e84655" />
{code}

However, I don't see how the [C# release 
script|https://github.com/apache/arrow/blob/master/dev/release/post-06-csharp.sh]
 could have done that. 

 # Also, .NET has a feature called "Source Link", which allows for the source 
code to be automatically downloaded from GitHub when debugging into this 
library. The way the tooling works today, it requires that the git repository's 
{{origin}} remote is set to [https://github.com/apache/arrow.git]. The tooling 
uses the `origin` git remote to encode the GitHub URL into the symbols 
file in the {{.snupkg}} file.

This, however, doesn't work with the 0.13.0 release that occurred over the 
weekend. I tried using the Source Link feature, and it didn't automatically 
download the source files from GitHub.

Looking into the symbols file, I see the Source Link information that was 
embedded:


{code}
1: 
'/home/kou/work/cpp/arrow.kou/apache-arrow-0.13.0/csharp/src/Apache.Arrow/Flatbuf/FlatBuffers/ByteBuffer.cs'
 (#19c)C# (#3)   SHA-1 (#2) 
04-64-A0-48-82-EA-F5-B5-50-EC-CA-9F-85-75-E2-95-A4-EC-AB-B3 (#1b7)   
2: 
'/home/kou/work/cpp/arrow.kou/apache-arrow-0.13.0/csharp/src/Apache.Arrow/Flatbuf/FlatBuffers/ByteBufferUtil.cs'
 (#68f)C# (#3)   SHA-1 (#2) 
F0-4F-28-53-88-A4-E0-6E-F1-1F-17-F6-CD-FE-0E-64-AB-0B-C2-95 (#6aa)   
{code}

{code:json}
{
"documents": {
"/home/kou/work/cpp/arrow.kou/*": 
"https://raw.githubusercontent.com/kou/arrow/57de5c3adffe526f37366bb15c3ff0d4a2e84655/*",
"/home/kou/work/cpp/arrow.kou/cpp/submodules/parquet-testing/*": 
"https://raw.githubusercontent.com/apache/parquet-testing/bb7b6abbb3fbeff845646364a4286142127be04c/*"
}
}
{code}

Here it appears the {{origin}} remote was set to {{kou/arrow}}, and not 
{{apache/arrow}}. Also, it appears the {{apache-arrow-0.13.0}} folder was under 
a git repository, and so the sources aren't matched up with the git repository. 
(Basically that folder shouldn't have appeared in the Documents list that has 
the {{.cs}} file path.) I think this explains how (1) above happened - the 
build was under a git repository - but this script downloaded an extra copy of 
the sources into that git repository.

I'm wondering how we can fix either this script, or the .NET Tooling, or both, 
to make this experience better for the next release. I think we need to ensure 
two things:
 # The git commit information is set correctly in the {{.nuspec}} and the 
{{.dll}} when the release build is run. I think it just happened by pure luck 
this time. It just so happened that the script was executed in an already 
established repo, and it just so happened to be on the right commit (or maybe 
it wasn't the right commit?).
 # The source link information is set correctly in the symbols file.
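
For the second point, a release script could run a small sanity check on the embedded Source Link JSON before publishing. The following Python sketch is one possible shape for such a check, not an existing tool. `unexpected_mappings` and `EXPECTED_PREFIX` are hypothetical names, and the sample input is the mapping quoted above.

```python
import json

# Assumption: official builds should only map to repositories under apache/.
EXPECTED_PREFIX = "https://raw.githubusercontent.com/apache/"


def unexpected_mappings(source_link_json, expected_prefix=EXPECTED_PREFIX):
    """Return {local_path: url} for mappings that point outside the prefix."""
    documents = json.loads(source_link_json)["documents"]
    return {path: url for path, url in documents.items()
            if not url.startswith(expected_prefix)}


embedded = """
{
  "documents": {
    "/home/kou/work/cpp/arrow.kou/*":
      "https://raw.githubusercontent.com/kou/arrow/57de5c3adffe526f37366bb15c3ff0d4a2e84655/*",
    "/home/kou/work/cpp/arrow.kou/cpp/submodules/parquet-testing/*":
      "https://raw.githubusercontent.com/apache/parquet-testing/bb7b6abbb3fbeff845646364a4286142127be04c/*"
  }
}
"""

print(unexpected_mappings(embedded))
```

With the 0.13.0 data, the check flags the mapping that points at the `kou/arrow` fork and accepts the `apache/parquet-testing` one.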

[~wesmckinn] [~kou]





[jira] [Created] (ARROW-5035) [C#] ArrowBuffer.Builder is broken

2019-03-27 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5035:
---

 Summary: [C#] ArrowBuffer.Builder is broken
 Key: ARROW-5035
 URL: https://issues.apache.org/jira/browse/ARROW-5035
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt


If someone creates and uses `ArrowBuffer.Builder<bool>` in their code to create 
an ArrowBuffer filled with Boolean values, it is currently producing the wrong 
results.

The reason it is producing the wrong results is because it is taking the 
`sizeof(bool)` (which is 1) and using that for how many bytes to write into the 
backing buffer for each element being added to the builder. However, in Arrow, 
Boolean values are stored in a bit-wise fashion allowing for 8 Boolean values 
in a single byte. Thus, when I add 4 `true` values to the buffer, I expect to 
get a buffer with 1 byte in it with the value 0x0F. However, I am getting a 
buffer with 4 bytes in it, each with value 0x01.

One way to fix this would be to throw in `ArrowBuffer.Builder<T>`'s constructor 
if `T` == `bool` and instead create a new class `ArrowBuffer.BooleanBuilder`, 
which will create Boolean buffers correctly. Looking at the current 
implementation, I think it would be rather hard to special case `typeof(bool)` 
all over in the `Builder` class, but if someone wanted to take that approach 
and made it work, that would be great too.
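
For reference, the expected bit-wise layout can be sketched in a few lines of Python. This is only an illustration of the packing rule (least-significant bit first within each byte), not a proposal for the C# builder. `pack_bits` and `unpack_bits` are hypothetical helper names.

```python
def pack_bits(values):
    """Pack booleans into bytes, least-significant bit first."""
    packed = bytearray((len(values) + 7) // 8)
    for index, value in enumerate(values):
        if value:
            packed[index // 8] |= 1 << (index % 8)
    return bytes(packed)


def unpack_bits(packed, count):
    """Read `count` booleans back out of a packed buffer."""
    return [bool(packed[i // 8] & (1 << (i % 8))) for i in range(count)]


four_true = pack_bits([True, True, True, True])
print(four_true.hex())  # one byte, 0x0F, rather than four 0x01 bytes
```

A dedicated Boolean builder would need to keep a bit position rather than a byte position, which is why special-casing `bool` inside the generic builder looks awkward.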





[jira] [Created] (ARROW-5034) [C#] ArrowStreamWriter should expose synchronous Write methods

2019-03-27 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5034:
---

 Summary: [C#] ArrowStreamWriter should expose synchronous Write 
methods
 Key: ARROW-5034
 URL: https://issues.apache.org/jira/browse/ARROW-5034
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt


There are times when callers are in a synchronous method and need to write an 
Arrow stream. However, ArrowStreamWriter (and ArrowFileWriter) only expose 
WriteAsync methods, which means the caller needs to call the Async method, and 
then block on the resulting Task.

Instead, we should also expose Write methods that complete in a synchronous 
fashion, so the callers are free to choose the sync or async methods as they 
need.





[jira] [Created] (ARROW-5019) [C#] ArrowStreamWriter doesn't work on a non-seekable stream

2019-03-26 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5019:
---

 Summary: [C#] ArrowStreamWriter doesn't work on a non-seekable 
stream
 Key: ARROW-5019
 URL: https://issues.apache.org/jira/browse/ARROW-5019
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Reporter: Eric Erhardt
Assignee: Eric Erhardt


When writing to a non-seekable .NET Stream (like a network/socket stream), 
ArrowStreamWriter will throw an exception:

 
{code:java}
Exception thrown: 'System.NotSupportedException' in System.Net.Sockets.dll
This stream does not support seek operations.
{code}
The reason this throws is because we are using `BaseStream.Position` in the 
writer to calculate the length of bytes that we've written to the stream. We 
don't need to use the Position in order to calculate the lengths. We should be 
able to write an Arrow RecordBatch to a NetworkStream directly. Today, we need 
to write to a MemoryStream, and then copy the MemoryStream to the NetworkStream.
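
The general idea of tracking lengths without asking the stream for its position can be sketched as follows. This is a Python illustration under my own naming, not the actual fix: `NonSeekableSink` is a hypothetical stand-in for a network stream and `CountingWriter` is a hypothetical wrapper.

```python
import io


class NonSeekableSink(io.RawIOBase):
    """Stand-in for a socket stream: writable, but seeking is unsupported."""

    def __init__(self):
        super().__init__()
        self.received = bytearray()

    def writable(self):
        return True

    def seekable(self):
        return False

    def write(self, data):
        self.received += data
        return len(data)


class CountingWriter:
    """Tracks how many bytes were written instead of asking the stream."""

    def __init__(self, stream):
        self._stream = stream
        self.bytes_written = 0

    def write(self, data):
        self._stream.write(data)
        self.bytes_written += len(data)


sink = NonSeekableSink()
writer = CountingWriter(sink)
writer.write(b"abcd")
writer.write(b"ef")
print(writer.bytes_written)
```

Because the writer counts its own output, the length of each message is known without any call that requires a seekable stream.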

 





[jira] [Created] (ARROW-4997) [C#] ArrowStreamReader doesn't consume whole stream and doesn't implement sync read

2019-03-22 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-4997:
---

 Summary: [C#] ArrowStreamReader doesn't consume whole stream and 
doesn't implement sync read
 Key: ARROW-4997
 URL: https://issues.apache.org/jira/browse/ARROW-4997
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Reporter: Eric Erhardt
Assignee: Eric Erhardt


There are 2 major issues with the ArrowStreamReader that are blocking me from 
using it.
 # When it reads a batch from a .NET Stream that doesn't return the whole chunk 
of memory in one "Read" call (like a socket/network stream), it only calls Read 
once, and then continues on. This is an issue because it has "garbage" at the 
end of its buffer (which was never written to by the stream), and when 
attempting to read the next batch, it is in the middle of the previous batch 
from the .NET Stream. This causes all sorts of issues because it assumes the 
next 4 bytes are the message length, which it obviously isn't. See [the reading 
code|https://github.com/apache/arrow/blob/13fd813445b4738cbebbd137490fe3c02071c04b/csharp/src/Apache.Arrow/Ipc/ArrowStreamReaderImplementation.cs#L90-L97]
 for where it only calls Read once - it should be in a loop.
 # ArrowStreamReader has a synchronous ReadNextRecordBatch() method, but it 
throws NotImplementedException. A working synchronous method is necessary 
because a caller that isn't in an async method can't (and shouldn't) call the async API.
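
The read loop asked for in the first item can be sketched in Python as below. It is an illustration of the technique, not the C# change. `TrickleStream` is a hypothetical stream that returns at most a few bytes per read, the way a socket might, and `read_exactly` is a hypothetical helper.

```python
import io


class TrickleStream(io.RawIOBase):
    """Stand-in for a socket: each read returns at most `chunk` bytes."""

    def __init__(self, payload, chunk):
        super().__init__()
        self._payload = payload
        self._pos = 0
        self._chunk = chunk

    def readable(self):
        return True

    def readinto(self, buffer):
        count = min(len(buffer), self._chunk, len(self._payload) - self._pos)
        buffer[:count] = self._payload[self._pos:self._pos + count]
        self._pos += count
        return count


def read_exactly(stream, length):
    """Keep reading until `length` bytes arrive or the stream ends."""
    buffer = bytearray(length)
    view = memoryview(buffer)
    filled = 0
    while filled < length:
        count = stream.readinto(view[filled:])
        if not count:
            raise EOFError(f"expected {length} bytes, got {filled}")
        filled += count
    return bytes(buffer)


short = TrickleStream(b"0123456789", chunk=3).read(8)        # single read
full = read_exactly(TrickleStream(b"0123456789", chunk=3), 8)  # looped read
print(len(short), len(full))
```

A single read returns only part of the requested batch, while the loop keeps going until the whole batch is in the buffer, so the next message length is read from the right offset.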





MemoryPool in Arrow libraries

2019-03-18 Thread Eric Erhardt
We are having a discussion on 
https://github.com/apache/arrow/pull/3925#issuecomment-473605919 about the 
`MemoryPool` class in the C# library.

In reality, the way `MemoryPool` is designed in C#, it is more of a 
"MemoryAllocator" - it just allocates or reallocates memory. There is no API 
for "returning" the memory back into the pool. The memory gets deallocated 
by the finalizer, which is invoked by the garbage collector.

I was looking around a bit, and I see the Java library doesn't have a 
MemoryPool, but instead BufferManager and BufferAllocator types. The Java 
library also has `AutoCloseable` (which I assume is analogous to IDisposable in 
.NET) on all the types - ArrowRecordBatch, ArrowBuf, IntVector, etc.

Looking into the C++ implementation, I don't really see a "pooling" 
implementation, but instead just a "malloc" and "free" (or using jemalloc, if 
built with it enabled). I also see it is using "aligned" memory. I'm not sure 
how/what handles this on the Java side. I assume the alignment is useful for 
SIMD operations, but is it required?

Go also appears to be using the "Allocator" name instead of a "Pool".

So I'm wondering a few things:


  1.  Should we rename the C# "MemoryPool" class to "MemoryAllocator" instead? 
Or is there really an intention in Arrow to have "pooling" of memory?
  2.  Should there be a way to "close" (Dispose() in the .NET nomenclature) 
types that hold memory? Ex. RecordBatch, ArrowArray, etc.

I assume these mechanisms are super useful to the other implementations, so I'm 
trying to keep the C# library designed roughly the same. But I'd appreciate 
some advice.

Eric Erhardt


RE: Publishing C# NuGet package

2019-03-14 Thread Eric Erhardt
Thanks Wes. I have a PR up for this.  https://github.com/apache/arrow/pull/3891

How do I update the wiki page? Is this source controlled somewhere?  I assume 
we want to add a new section after 
https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide#ReleaseManagementGuide-UpdatingRubypackages
 for "Updating C# NuGet package".

I put the instructions for building and uploading the package in the 
csharp/README.md file in my PR. It should be as simple as:

1. Install the latest `.NET Core SDK` from 
https://dotnet.microsoft.com/download.
2. ~/git/arrow/csharp$ dotnet pack -c Release -p:VersionSuffix=''
3. upload the .nupkg and .snupkg files from ~/git/arrow/csharp/artifacts/ to 
https://www.nuget.org/packages/manage/upload

Eric

-Original Message-
From: Wes McKinney  
Sent: Tuesday, March 12, 2019 9:36 AM
To: dev@arrow.apache.org
Subject: Re: Publishing C# NuGet package

thanks Eric -- that sounds great. I think we're going to want to cut the 0.13 
release candidate around 2 weeks from now, so that gives some time to get the 
packaging things sorted out

- Wes

On Thu, Mar 7, 2019 at 4:46 PM Eric Erhardt 
 wrote:
>
> > Some changes may need to be made to the release scripts to update C# 
> > metadata files. The intent is to make it so that the code artifact can be 
> > pushed to a package manager using the official ASF release artifact. If we 
> > don't get it 100% right for 0.13 then at least we can get a preliminary 
> > package up there and do things 100% by the books in 0.14.
>
> The way you build a NuGet package is you call `dotnet pack` on the `.csproj` 
> file. That will build the .NET assembly (.dll) and package it into a NuGet 
> package (.nupkg, which is a glorified .zip file). That `.nupkg` file is then 
> published to the nuget.org website.
>
> In order to publish it to nuget.org, an account will need to be made to 
> publish it under. Is that something a PMC member can/will do? The intention 
> is for the published package to be the official "Apache Arrow" nuget package.
>
> The .nupkg file can optionally be signed. See 
> https://docs.microsoft.com/en-us/nuget/create-packages/sign-a-package.
>
> I can create a JIRA to add all the appropriate NuGet metadata to the .csproj 
> in the repo. That way no file committed into the repo will need to change in 
> order to create the NuGet package. I can also add the instructions to create 
> the NuGet into the csharp README file in that PR.


[jira] [Created] (ARROW-4839) [C#] Add NuGet support

2019-03-12 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-4839:
---

 Summary: [C#] Add NuGet support
 Key: ARROW-4839
 URL: https://issues.apache.org/jira/browse/ARROW-4839
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt
Assignee: Eric Erhardt


We should add the metadata to the .csproj so we can create a NuGet package 
without changing any source code.

Also, we should add any scripts and documentation on how to create the NuGet 
package to allow ease of creation at release time.





RE: Timeline for 0.13 Arrow release

2019-03-12 Thread Eric Erhardt
> Having C# NuGet package working would be nice

I've opened https://issues.apache.org/jira/browse/ARROW-4839 for this and I 
will be working on it this week. I'd like to see an official Arrow NuGet 
package on www.nuget.org soon, so getting it in this release would work out 
perfect from my side.


RE: Publishing C# NuGet package

2019-03-07 Thread Eric Erhardt
> Some changes may need to be made to the release scripts to update C# metadata 
> files. The intent is to make it so that the code artifact can be pushed to a 
> package manager using the official ASF release artifact. If we don't get it 
> 100% right for 0.13 then at least we can get a preliminary package up there 
> and do things 100% by the books in 0.14.

The way you build a NuGet package is you call `dotnet pack` on the `.csproj` 
file. That will build the .NET assembly (.dll) and package it into a NuGet 
package (.nupkg, which is a glorified .zip file). That `.nupkg` file is then 
published to the nuget.org website.

In order to publish it to nuget.org, an account will need to be made to publish 
it under. Is that something a PMC member can/will do? The intention is for the 
published package to be the official "Apache Arrow" nuget package.

The .nupkg file can optionally be signed. See 
https://docs.microsoft.com/en-us/nuget/create-packages/sign-a-package.

I can create a JIRA to add all the appropriate NuGet metadata to the .csproj in 
the repo. That way no file committed into the repo will need to change in order 
to create the NuGet package. I can also add the instructions to create the 
NuGet into the csharp README file in that PR.


Publishing C# NuGet package

2019-03-07 Thread Eric Erhardt
When would it be possible to publish the C# Arrow library to 
https://www.nuget.org/? Is this something we can do as part of the 0.13.0 
release?

For anyone who is unfamiliar with NuGet - it is the .NET package manager and is 
the typical way to "ship" a .NET library.

Eric


[jira] [Created] (ARROW-4737) [C#] tests are not running in CI

2019-03-01 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-4737:
---

 Summary: [C#] tests are not running in CI
 Key: ARROW-4737
 URL: https://issues.apache.org/jira/browse/ARROW-4737
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Reporter: Eric Erhardt
Assignee: Eric Erhardt


 The C# tests are not running in CI because the filtering logic needs to be 
updated.

For example see 
https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/22671460/job/nk1nn59k5njie720

{quote}Build started
git clone -q https://github.com/apache/arrow.git C:\projects\arrow
git fetch -q origin +refs/pull/3662/merge:
git checkout -qf FETCH_HEAD
Running Install scripts
python ci\detect-changes.py > generated_changes.bat
Affected files: [u'csharp/src/Apache.Arrow/Field.Builder.cs', 
u'csharp/src/Apache.Arrow/Schema.Builder.cs', 
u'csharp/test/Apache.Arrow.Tests/SchemaBuilderTests.cs', 
u'csharp/test/Apache.Arrow.Tests/TypeTests.cs']
Affected topics:
{'c_glib': False,
 'cpp': False,
 'dev': False,
 'docs': False,
 'go': False,
 'integration': False,
 'java': False,
 'js': False,
 'python': False,
 'r': False,
 'ruby': False,
 'rust': False,
 'site': False}
call generated_changes.bat
call ci\appveyor-filter-changes.bat
===
=== No C++ or Python changes, exiting job
===
Build was forcibly terminated
Build success{quote}
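
The log suggests the change detection has no topic for the `csharp/` folder. As a rough model of that behavior (not the actual `ci/detect-changes.py` logic, which I have not reproduced here), the following Python sketch marks a topic as affected when a changed path starts with its folder name. `affected_topics` is a hypothetical function; the file and topic lists are copied from the log above.

```python
def affected_topics(changed_files, topics):
    """Mark a topic as affected when any changed path starts with '<topic>/'."""
    return {topic: any(path.startswith(topic + "/") for path in changed_files)
            for topic in topics}


changed = [
    "csharp/src/Apache.Arrow/Field.Builder.cs",
    "csharp/src/Apache.Arrow/Schema.Builder.cs",
    "csharp/test/Apache.Arrow.Tests/SchemaBuilderTests.cs",
    "csharp/test/Apache.Arrow.Tests/TypeTests.cs",
]
known = ["c_glib", "cpp", "dev", "docs", "go", "integration", "java",
         "js", "python", "r", "ruby", "rust", "site"]

before = affected_topics(changed, known)
after = affected_topics(changed, known + ["csharp"])
print(any(before.values()), after["csharp"])
```

In this model a C#-only change matches nothing until a `csharp` topic exists, which is consistent with the job exiting early in the log.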






[jira] [Created] (ARROW-4717) [C#] Consider exposing ValueTask instead of Task

2019-02-28 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-4717:
---

 Summary: [C#] Consider exposing ValueTask instead of Task
 Key: ARROW-4717
 URL: https://issues.apache.org/jira/browse/ARROW-4717
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt


See [https://github.com/apache/arrow/pull/3736#pullrequestreview-207169204] for 
the discussion and 
[https://devblogs.microsoft.com/dotnet/understanding-the-whys-whats-and-whens-of-valuetask/]
 for the reasoning.

Using `Task` in public API requires that a new Task instance be allocated on 
every call. When returning synchronously, using ValueTask will allow the method 
to not allocate.

In order to do this, we will need to take a new dependency on  
{{System.Threading.Tasks.Extensions}} NuGet package.





[jira] [Created] (ARROW-4571) [Format] Tensor.fbs file has multiple root_type declarations

2019-02-14 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-4571:
---

 Summary: [Format] Tensor.fbs file has multiple root_type 
declarations
 Key: ARROW-4571
 URL: https://issues.apache.org/jira/browse/ARROW-4571
 Project: Apache Arrow
  Issue Type: Bug
  Components: Format
Reporter: Eric Erhardt


Looking at [the flatbuffers 
doc|https://google.github.io/flatbuffers/flatbuffers_guide_tutorial.html], it 
appears there should only be one `root_type` declaration in an fbs file:
{code:java}
The last part of the schema is the root_type. The root type declares what will 
be the root table for the serialized data. In our case, the root type is our 
Monster table.{code}
However, the Tensor.fbs file has multiple `root_type` declarations:

[https://github.com/apache/arrow/blob/69d595ae4c61902b3f2778e536fca6675350c88c/format/Tensor.fbs#L53]

[https://github.com/apache/arrow/blob/69d595ae4c61902b3f2778e536fca6675350c88c/format/Tensor.fbs#L146]

 

See the discussion here: 
https://github.com/apache/arrow/pull/2546#discussion_r256549256





[jira] [Created] (ARROW-4543) [C#] Update Flat Buffers code to latest version

2019-02-12 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-4543:
---

 Summary: [C#] Update Flat Buffers code to latest version
 Key: ARROW-4543
 URL: https://issues.apache.org/jira/browse/ARROW-4543
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt
Assignee: Eric Erhardt


In order to support zero-copy reads, we should update to the latest Google Flat 
Buffers code. A recent change now allows [C# support for directly reading and 
writing to memory other than byte arrays|https://github.com/google/flatbuffers/pull/4886], 
which will make reading native memory using `Memory<byte>` possible.

Along with this update, we should mark the flat buffers types as `internal`, 
since they are an implementation detail of the library. From an API 
perspective, it is confusing to see multiple public types named "Schema", 
"Field", "RecordBatch" etc.





[jira] [Created] (ARROW-4503) [C#] ArrowStreamReader allocates and copies data excessively

2019-02-07 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-4503:
---

 Summary: [C#] ArrowStreamReader allocates and copies data 
excessively
 Key: ARROW-4503
 URL: https://issues.apache.org/jira/browse/ARROW-4503
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt


When reading `RecordBatch` instances using the `ArrowStreamReader` class, it is 
currently allocating and copying memory 3 times for the data.
 # It is allocating memory in order to [read the data from the 
Stream|https://github.com/apache/arrow/blob/044b418fa108a57f0b4e2e887546cc3e68271397/csharp/src/Apache.Arrow/Ipc/ArrowStreamReader.cs#L72-L74],
 and then reading from the Stream.  (This should be the only allocation that is 
necessary.)
 # It then [creates a new 
`ArrowBuffer.Builder`|https://github.com/apache/arrow/blob/044b418fa108a57f0b4e2e887546cc3e68271397/csharp/src/Apache.Arrow/Ipc/ArrowStreamReader.cs#L227-L228],
 which allocates another `byte[]`, and calls `Append` on it, which copies the 
values to the new `byte[]`.
 # Finally, it then calls `.Build()` on the `ArrowBuffer.Builder`, which 
[allocates memory from the MemoryPool, and then copies the intermediate 
buffer|https://github.com/apache/arrow/blob/044b418fa108a57f0b4e2e887546cc3e68271397/csharp/src/Apache.Arrow/ArrowBuffer.Builder.cs#L112-L121]
 into it.

 

We should reduce this overhead to only allocating a single time (from the 
MemoryPool), and not copying the data more times than necessary.
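
The difference between the two approaches can be sketched in Python. This is an analogy for the allocation pattern only, not the C# reader. `read_with_copies` and `read_once` are hypothetical function names, and `io.BytesIO` stands in for the input stream.

```python
import io


def read_with_copies(stream, length):
    """Three buffers: the read result, an intermediate builder, the final bytes."""
    chunk = stream.read(length)          # allocation 1
    builder = bytearray()
    builder += chunk                     # allocation 2 plus a copy
    return bytes(builder)                # allocation 3 plus a copy


def read_once(stream, length):
    """One buffer: the stream fills the final storage directly."""
    target = bytearray(length)           # the only allocation
    filled = stream.readinto(target)     # may be short on a real socket
    return memoryview(target)[:filled]   # a view, not a copy


data = b"\x01\x02\x03\x04"
copied = read_with_copies(io.BytesIO(data), 4)
direct = read_once(io.BytesIO(data), 4)
print(copied == bytes(direct))
```

Both functions return the same bytes. The second one hands out a view over the buffer that the stream wrote into, which is the single-allocation behavior proposed above.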





[jira] [Created] (ARROW-4502) [C#] Add support for zero-copy reads

2019-02-07 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-4502:
---

 Summary: [C#] Add support for zero-copy reads
 Key: ARROW-4502
 URL: https://issues.apache.org/jira/browse/ARROW-4502
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt
Assignee: Eric Erhardt


In the Python (and C++) API, you can create a `RecordBatchStreamReader`, and if 
you give it an `InputStream` that supports zero-copy reads, you can get back 
`RecordBatch` objects without allocating new memory and copying all the data.

In C#, there is currently no way to read Arrow RecordBatch instances without 
allocating new memory and copying all the data. We should enable this scenario 
in the C# API.

 

My proposal is to create a new `class ArrowRecordBatchReader : IArrowReader`. 
Its constructor will take a `ReadOnlyMemory<byte> data` parameter, and it will 
be able to read `RecordBatch` instances just like the existing 
`ArrowStreamReader`. As part of this new class, we will refactor any common 
code out of `ArrowStreamReader` in order for the parsing logic to be shared, 
where necessary.
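
To show what reading from an in-memory block without copying looks like, here is a small Python sketch. It uses a simplified framing (a 4-byte little-endian length followed by the payload), which is an assumption for illustration and not the full Arrow IPC format. `iter_messages` is a hypothetical function.

```python
import struct


def iter_messages(data):
    """Yield each length-prefixed payload as a view into `data`."""
    view = memoryview(data)
    offset = 0
    while offset < len(view):
        (length,) = struct.unpack_from("<i", view, offset)
        offset += 4
        yield view[offset:offset + length]   # slicing a memoryview copies nothing
        offset += length


payload = struct.pack("<i", 3) + b"abc" + struct.pack("<i", 2) + b"de"
messages = list(iter_messages(payload))
print([bytes(m) for m in messages])
```

Every yielded message is a window over the caller's original block, so the reader allocates no new storage for the data itself.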



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)