[DISCUSSION] Automatically adding the URL of the corresponding JIRA ticket as a comment in a GitHub pull request

2019-08-23 Thread Kenta Murata
I frequently go through the following slightly bothersome steps to open a JIRA ticket when I am looking at a GitHub pull request:
1. Select the "ARROW-" text in the title and copy it
2. Open JIRA if I haven't opened it
3. Select a ticket to open it
4. Alter the URL by pasting the text that was copied at the
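The manual steps above amount to pulling the ticket key out of the pull-request title and appending it to the JIRA browse URL. A minimal sketch of that idea follows, assuming the key always has the form ARROW-<digits> and that the browse URL prefix is the one seen in the JIRA notifications on this list; the function name `jira_url_from_title` is hypothetical, and nothing here reflects how an actual bot for this proposal is implemented.

```python
import re

# URL prefix as it appears in the JIRA notification messages on this list.
JIRA_BROWSE_URL = "https://issues.apache.org/jira/browse/"

# Assumption: ticket keys look like "ARROW-" followed by digits.
TICKET_PATTERN = re.compile(r"\bARROW-\d+\b")


def jira_url_from_title(title):
    """Return the JIRA URL for the first ARROW-NNNN key in a PR title, or None."""
    match = TICKET_PATTERN.search(title)
    if match is None:
        return None
    return JIRA_BROWSE_URL + match.group(0)


print(jira_url_from_title("ARROW-6342: [Python] Add pyarrow.record_batch"))
print(jira_url_from_title("Fix a typo in the README"))
```

A title without a ticket key yields None, so a caller could simply skip posting a comment in that case.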

Re: Assigning Issues to New Users

2019-08-23 Thread Jacques Nadeau
Let's add committers as admins on JIRA. I don't see any downsides to that.
On Fri, Aug 23, 2019, 9:42 PM Wes McKinney wrote:
> hi Paddy,
>
> I just added andyscho to the "Contributor" role on JIRA so you can
> assign them the issue now.
>
> You need to be a JIRA administrator on the "Arrow"

[jira] [Created] (ARROW-6342) [Python] Add pyarrow.record_batch factory function with same basic API / semantics as pyarrow.table

2019-08-23 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6342: --- Summary: [Python] Add pyarrow.record_batch factory function with same basic API / semantics as pyarrow.table Key: ARROW-6342 URL: https://issues.apache.org/jira/browse/ARROW-6342

Re: [DISCUSS] ArrayBuilders with mutable type

2019-08-23 Thread Sutou Kouhei
Hi,
Sorry for my late response. The VisitBuilder/VisitBuilderInline API is enough for Arrow GLib. If the VisitBuilder/VisitBuilderInline API is provided, Arrow GLib doesn't use ArrayBuilder::type()/type_id() to convert a C++ object to a GLib object.
> Is there any application which uses the adaptive
>

[jira] [Created] (ARROW-6341) [Python] Implements low-level bindings to Dataset classes:

2019-08-23 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6341: - Summary: [Python] Implements low-level bindings to Dataset classes: Key: ARROW-6341 URL: https://issues.apache.org/jira/browse/ARROW-6341 Project:

[jira] [Created] (ARROW-6340) [R] Implements low-level bindings to Dataset classes

2019-08-23 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6340: - Summary: [R] Implements low-level bindings to Dataset classes Key: ARROW-6340 URL: https://issues.apache.org/jira/browse/ARROW-6340 Project: Apache

[jira] [Created] (ARROW-6339) [Python][C++] Rowgroup statistics for pd.NaT array ill defined

2019-08-23 Thread Florian Jetter (Jira)
Florian Jetter created ARROW-6339: - Summary: [Python][C++] Rowgroup statistics for pd.NaT array ill defined Key: ARROW-6339 URL: https://issues.apache.org/jira/browse/ARROW-6339 Project: Apache Arrow

Re: Assigning Issues to New Users

2019-08-23 Thread paddy horan
Thanks Wes, I think it’s fine the way it is; I just wasn’t sure. If it becomes a distraction to PMC members we can change the policy so that committers can help, but it’s fairly infrequent so far.
Paddy
From: Wes McKinney

[jira] [Created] (ARROW-6338) [R] Type function names don't match type names

2019-08-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6338: -- Summary: [R] Type function names don't match type names Key: ARROW-6338 URL: https://issues.apache.org/jira/browse/ARROW-6338 Project: Apache Arrow

[jira] [Created] (ARROW-6337) [R] as_tibble in R API is a misnomer

2019-08-23 Thread James Lamb (Jira)
James Lamb created ARROW-6337: - Summary: [R] as_tibble in R API is a misnomer Key: ARROW-6337 URL: https://issues.apache.org/jira/browse/ARROW-6337 Project: Apache Arrow Issue Type: Improvement

Re: Assigning Issues to New Users

2019-08-23 Thread Wes McKinney
hi Paddy, I just added andyscho to the "Contributor" role on JIRA so you can assign them the issue now. You need to be a JIRA administrator on the "Arrow" project to alter roles -- currently only PMC members are admins. I am not opposed to letting all committers be Admin on JIRA, but we have

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-23 Thread Jacques Nadeau
On Fri, Aug 23, 2019, 8:55 PM Micah Kornfield wrote:
>> The vector indexes being limited to 32 bits doesn't limit the addressing
>> to 32 bit chunks of memory. For example, your prime example before was
>> image data. Having 2 billion 1 MB images would still be supported
>> without

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-23 Thread Micah Kornfield
> The vector indexes being limited to 32 bits doesn't limit the addressing
> to 32 bit chunks of memory. For example, your prime example before was
> image data. Having 2 billion 1 MB images would still be supported
> without changing the index addressing.
This might be pre-coffee
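The quoted claim separates the element index from byte addressing. As rough arithmetic only, assuming a signed 32-bit index (as a Java int is) and reading "1mb" as 10**6 bytes, the sketch below shows how much data such an index could refer to. The names are hypothetical, and it says nothing about how any Arrow implementation actually lays out or addresses its buffers.

```python
# Assumption: vector indexes are signed 32-bit integers (as a Java int is).
MAX_ELEMENT_COUNT = 2**31 - 1

# Assumption: "1mb" means 10**6 bytes per image.
IMAGE_SIZE_BYTES = 10**6


def total_payload_bytes(element_count, bytes_per_element):
    """Total bytes referenced by element_count values of a fixed size."""
    return element_count * bytes_per_element


total = total_payload_bytes(MAX_ELEMENT_COUNT, IMAGE_SIZE_BYTES)
print(total)          # 2147483647000000 bytes, roughly 2.1 PB
print(total > 2**32)  # True: far more than a 32-bit byte address can span
```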

Assigning Issues to New Users

2019-08-23 Thread paddy horan
Hi All, I was going to merge a PR last night when I noticed that it was still unassigned. I believe it is best practice to make sure all issues are assigned on JIRA before merging the corresponding PR? However, I cannot assign the issue to the user; I believe that I need to change his

Re: Binary compatibility of pyarrow.serialize

2019-08-23 Thread Wes McKinney
That said, the protocol data produced now by `RecordBatchStreamWriter` should be readable in 1.0.0 and beyond. `pyarrow.serialize` is only intended for transient storage. We should add some language to the docstring for this function to explain that it is distinct from the Arrow IPC format (which

[jira] [Created] (ARROW-6336) [Python] Clarify pyarrow.serialize/deserialize docstrings vis-à-vis relationship with Arrow IPC protocol

2019-08-23 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6336: --- Summary: [Python] Clarify pyarrow.serialize/deserialize docstrings vis-à-vis relationship with Arrow IPC protocol Key: ARROW-6336 URL:

Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

2019-08-23 Thread Wes McKinney
It isn't implemented in C++ yet but I will try to get a patch up for that soon (today maybe). I think we should create a branch where we can stack the patches that implement this for each language.
On Fri, Aug 23, 2019 at 4:04 AM Paul Taylor wrote:
> I'll do the JS updates. Is it safe to

[jira] [Created] (ARROW-6335) [Java] Improve the performance of DictionaryHashTable

2019-08-23 Thread Liya Fan (Jira)
Liya Fan created ARROW-6335: --- Summary: [Java] Improve the performance of DictionaryHashTable Key: ARROW-6335 URL: https://issues.apache.org/jira/browse/ARROW-6335 Project: Apache Arrow Issue Type:

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-23 Thread Jacques Nadeau
On Fri, Aug 23, 2019, 11:49 AM Micah Kornfield wrote:
>> I don't think we should couple this discussion with the implementation of
>> large list, etc since I think those two concepts are independent.
>
> I'm still trying to balance in my mind which is a worse experience for
> consumers of the

Re: In-memory sorting of plasma objects

2019-08-23 Thread Tanveer Ahmad - EWI
Thank you Wes. I see.
Regards,
Tanveer Ahmad
From: Wes McKinney
Sent: Thursday, August 22, 2019 5:12:06 PM
To: dev@arrow.apache.org
Subject: Re: In-memory sorting of plasma objects
hi Tanveer,
IIUC there is logic for moving data that's managed by Plasma

[jira] [Created] (ARROW-6334) [Java] Improve the dictionary builder API to return the position of the value in the dictionary

2019-08-23 Thread Liya Fan (Jira)
Liya Fan created ARROW-6334: --- Summary: [Java] Improve the dictionary builder API to return the position of the value in the dictionary Key: ARROW-6334 URL: https://issues.apache.org/jira/browse/ARROW-6334

[jira] [Created] (ARROW-6333) [C++] Third party download URLs are duplicated

2019-08-23 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6333: - Summary: [C++] Third party download URLs are duplicated Key: ARROW-6333 URL: https://issues.apache.org/jira/browse/ARROW-6333 Project: Apache Arrow Issue

Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

2019-08-23 Thread Paul Taylor
I'll do the JS updates. Is it safe to validate against the Arrow C++ integration tests?
On 8/22/19 7:28 PM, Micah Kornfield wrote:
I created https://issues.apache.org/jira/browse/ARROW-6313 as a tracking issue with sub-issues on the development work. So far no-one has claimed Java and

Re: Binary compatibility of pyarrow.serialize

2019-08-23 Thread Antoine Pitrou
Hi Yevgeni, I don't think we have ever promised binary stability of the pyarrow.serialize() protocol. Binary compatibility starting from 1.0.0 is about the Arrow in-memory format and the Arrow IPC format (i.e. how Arrow arrays, tables... are laid out and how their metadata is encoded on the

[jira] [Created] (ARROW-6332) [Java] [CPP] Handle size of varchar vectors correctly

2019-08-23 Thread Praveen Kumar Desabandu (Jira)
Praveen Kumar Desabandu created ARROW-6332: -- Summary: [Java] [CPP] Handle size of varchar vectors correctly Key: ARROW-6332 URL: https://issues.apache.org/jira/browse/ARROW-6332 Project:

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-23 Thread Micah Kornfield
> I don't think we should couple this discussion with the implementation of
> large list, etc since I think those two concepts are independent.
I'm still trying to balance in my mind which is a worse experience for consumers of the libraries for these types. Claiming that Java supports these