[jira] [Updated] (ARROW-3195) [C++] NumPy initialization error check is missing in test

2018-09-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3195:
--
Labels: pull-request-available  (was: )

> [C++] NumPy initialization error check is missing in test
> -
>
> Key: ARROW-3195
> URL: https://issues.apache.org/jira/browse/ARROW-3195
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.10.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3195) [C++] NumPy initialization error check is missing in test

2018-09-07 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-3195:
---

 Summary: [C++] NumPy initialization error check is missing in test
 Key: ARROW-3195
 URL: https://issues.apache.org/jira/browse/ARROW-3195
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.10.0
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Updated] (ARROW-3194) Fix setValueCount in splitAndTransfer for variable width vectors

2018-09-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3194:
--
Labels: pull-request-available  (was: )

> Fix setValueCount in splitAndTransfer for variable width vectors
> ---
>
> Key: ARROW-3194
> URL: https://issues.apache.org/jira/browse/ARROW-3194
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Siddharth Teotia
>Assignee: Siddharth Teotia
>Priority: Major
>  Labels: pull-request-available
>
> We need to use the split length as the value count of the target vector. We 
> are incorrectly using the value count of the current vector for the target 
> vector.





[jira] [Created] (ARROW-3194) Fix setValueCount in splitAndTransfer for variable width vectors

2018-09-07 Thread Siddharth Teotia (JIRA)
Siddharth Teotia created ARROW-3194:
---

 Summary: Fix setValueCount in splitAndTransfer for variable width 
vectors
 Key: ARROW-3194
 URL: https://issues.apache.org/jira/browse/ARROW-3194
 Project: Apache Arrow
  Issue Type: Task
  Components: Java
Reporter: Siddharth Teotia
Assignee: Siddharth Teotia


We need to use the split length as the value count of the target vector. We are 
incorrectly using the value count of the current vector for the target vector.
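
The point above can be illustrated with a small, self-contained sketch. It does not use Arrow's vector classes: ToyVarWidthVector and its fields are hypothetical stand-ins, and only the last assignment in splitAndTransfer reflects what the issue asks for.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical toy model of a variable-width vector; not Arrow's actual classes.
final class ToyVarWidthVector {
    int[] offsets = {0};        // value i occupies data[offsets[i] .. offsets[i + 1])
    byte[] data = new byte[0];
    int valueCount = 0;

    // Builds a vector from strings, recording one end offset per value.
    static ToyVarWidthVector of(String... values) {
        ToyVarWidthVector v = new ToyVarWidthVector();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        v.offsets = new int[values.length + 1];
        for (int i = 0; i < values.length; i++) {
            byte[] bytes = values[i].getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
            v.offsets[i + 1] = out.size();
        }
        v.data = out.toByteArray();
        v.valueCount = values.length;
        return v;
    }

    // Copies `length` values starting at `startIndex` into `target`.
    void splitAndTransfer(int startIndex, int length, ToyVarWidthVector target) {
        int startByte = offsets[startIndex];
        int endByte = offsets[startIndex + length];
        target.data = Arrays.copyOfRange(data, startByte, endByte);
        target.offsets = new int[length + 1];
        for (int i = 0; i <= length; i++) {
            target.offsets[i] = offsets[startIndex + i] - startByte; // rebase to zero
        }
        // The target holds exactly `length` values; assigning this.valueCount
        // here would be the mistake described in the issue.
        target.valueCount = length;
    }

    String get(int index) {
        int start = offsets[index];
        return new String(data, start, offsets[index + 1] - start, StandardCharsets.UTF_8);
    }
}
```

Splitting two values out of a four-value source should leave the target with a value count of 2, while the source still reports 4.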





[jira] [Resolved] (ARROW-3171) [Java] checkstyle - fix line length and indentation

2018-09-07 Thread Bryan Cutler (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-3171.
-
   Resolution: Fixed
Fix Version/s: 0.11.0

Issue resolved by pull request 2512
[https://github.com/apache/arrow/pull/2512]

> [Java] checkstyle - fix line length and indentation
> ---
>
> Key: ARROW-3171
> URL: https://issues.apache.org/jira/browse/ARROW-3171
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-3127) [C++] Add Tutorial about Sending Tensor from C++ to Python

2018-09-07 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-3127.
---
   Resolution: Fixed
Fix Version/s: 0.11.0

Issue resolved by pull request 2481
[https://github.com/apache/arrow/pull/2481]

> [C++] Add Tutorial about Sending Tensor from C++ to Python
> --
>
> Key: ARROW-3127
> URL: https://issues.apache.org/jira/browse/ARROW-3127
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Website
>Reporter: Simon Mo
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> I can add a short tutorial showing how to
>  # Serialize a floating-point array in C++ into Tensor
>  # Save the Tensor to Plasma
>  # Access the Tensor in Python
> c.f. [https://github.com/apache/arrow/pull/2481]
> cc @[pcmoritz|https://github.com/pcmoritz]





[jira] [Resolved] (ARROW-1325) [R] Bootstrap R bindings subproject

2018-09-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1325.
-
   Resolution: Fixed
Fix Version/s: 0.11.0  (was: 0.12.0)

Issue resolved by pull request 2489
[https://github.com/apache/arrow/pull/2489]

> [R] Bootstrap R bindings subproject
> ---
>
> Key: ARROW-1325
> URL: https://issues.apache.org/jira/browse/ARROW-1325
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Clark Fitzgerald
>Assignee: Romain François
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> The R language was designed to perform "Columnar in memory analytics". R / 
> Arrow bindings would be useful for:
> * better compatibility between R and other languages / big data systems
> * chunk-based data parallelism
> * portable and efficient IO via Parquet
> R has a C++ interface so the natural way to write these bindings is to 
> leverage Arrow's C++ library as much as possible.
> Feather provides a starting point: 
> [https://github.com/wesm/feather/tree/master/R].
> This can serve as an umbrella JIRA for work on R related tasks.





[jira] [Created] (ARROW-3193) [C++] Native database client for MariaDB / MySQL client protocol

2018-09-07 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3193:
---

 Summary: [C++] Native database client for MariaDB / MySQL client 
protocol
 Key: ARROW-3193
 URL: https://issues.apache.org/jira/browse/ARROW-3193
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney


In keeping with the theme of building optimized database client interfaces for 
Arrow users, it would be valuable to have such an add-on library for 
MariaDB/MySQL.





[jira] [Resolved] (ARROW-3061) [Java] headroom does not take into account reservation

2018-09-07 Thread Siddharth Teotia (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Teotia resolved ARROW-3061.
-
   Resolution: Fixed
 Assignee: Laurent Goujon
Fix Version/s: 0.11.0

> [Java] headroom does not take into account reservation
> --
>
> Key: ARROW-3061
> URL: https://issues.apache.org/jira/browse/ARROW-3061
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It looks like {{BufferAllocator#getHeadroom()}} does not take into account 
> the current allocator's reservation. For example, if a parent allocator has a 
> limit of 10 and a child allocator is created with a reservation of 6, the 
> headroom of the parent is 4, and the child's headroom is currently reported 
> as 4. The child's actual headroom is 10, because it has no memory in use and 
> so can use the full extent of its reservation on top of what the parent 
> allocator can still allocate.
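
The arithmetic in this example can be checked with a toy model. The sketch below is not Arrow's allocator: ToyAccountant, its fields, and both headroom methods are hypothetical, and it assumes that a child's reservation is charged to the parent when the child is created.

```java
// Hypothetical toy accountant; names and fields are illustrative, not Arrow's API.
final class ToyAccountant {
    private final ToyAccountant parent;   // null for the root
    private final long reservation;       // bytes claimed from the parent up front
    private final long limit;             // maximum bytes this accountant may hold
    private long held;                    // bytes currently charged to this accountant

    ToyAccountant(ToyAccountant parent, long reservation, long limit) {
        this.parent = parent;
        this.reservation = reservation;
        this.limit = limit;
        if (parent != null) {
            parent.held += reservation;   // assumption: reservations count against the parent
        }
    }

    // Headroom that ignores the unused part of the reservation (the reported behaviour).
    long naiveHeadroom() {
        long local = limit - held;
        return parent == null ? local : Math.min(local, parent.naiveHeadroom());
    }

    // Headroom that adds back the part of the reservation not yet used.
    long headroom() {
        long local = limit - held;
        if (parent == null) {
            return local;
        }
        long unusedReservation = Math.max(0, reservation - held);
        return Math.min(local, parent.headroom() + unusedReservation);
    }
}
```

With a root limit of 10 and a child created with a reservation of 6 (and, as an assumption, a child limit of 10), the root's headroom is 4, the naive child headroom is 4, and the reservation-aware child headroom is 10.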





[jira] [Updated] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.

2018-09-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3191:

Component/s: Java

> [Java] Add support for ArrowBuf to point to arbitrary memory.
> -
>
> Key: ARROW-3191
> URL: https://issues.apache.org/jira/browse/ARROW-3191
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Jacques Nadeau
>Priority: Major
>
> Right now ArrowBuf can only point to memory managed by an Arrow Allocator. 
> This is because in many cases we want to be able to support hierarchical 
> accounting of memory and the ability to transfer memory ownership between 
> separate allocators within the same hierarchy.
> At the same time, there are definitely times where someone might want to map 
> some amount of arbitrary off-heap memory. In these situations they should 
> still be able to use ArrowBuf.
> I propose we have a new ArrowBuf constructor that takes an input that 
> subclasses an interface similar to:
> {code}
> public abstract class Memory  {
>   protected final int length;
>   protected final long address;
>   protected abstract void release();
> }
> {code}
> We then make it so all the memory transfer semantics and accounting behavior 
> are noops for this type of memory. The target of this work will be to make 
> sure that all the fast paths continue to be efficient but some of the other 
> paths like transfer can include a conditional (either directly or through 
> alternative implementations of things like ledger).
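
As a rough illustration of the proposed type, the sketch below completes the Memory class so that it compiles and adds one possible subclass. The constructor, the ForeignMemory class, and its free callback are assumptions made for this example and are not part of the proposal or of Arrow.

```java
// Based on the class in the proposal; the constructor is an addition,
// needed because Java requires final fields to be assigned.
abstract class Memory {
    protected final int length;
    protected final long address;

    protected Memory(long address, int length) {
        this.address = address;
        this.length = length;
    }

    protected abstract void release();
}

// Hypothetical subclass for memory obtained outside any Arrow allocator.
final class ForeignMemory extends Memory {
    private final Runnable freeCallback; // how the owner wants the region freed
    private boolean released;

    ForeignMemory(long address, int length, Runnable freeCallback) {
        super(address, length);
        this.freeCallback = freeCallback;
    }

    long address() { return address; }

    int length() { return length; }

    @Override
    protected void release() {
        if (!released) {          // guard so the region is freed at most once
            released = true;
            freeCallback.run();
        }
    }
}
```

Keeping release() as the only lifecycle hook matches the idea that transfer and accounting would be no-ops for this kind of memory: the buffer only needs to know how to give the region back.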





[jira] [Created] (ARROW-3192) [Java] Implement "ArrowBufReadChannel" abstraction and alternate MessageSerializer that uses this

2018-09-07 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3192:
---

 Summary: [Java] Implement "ArrowBufReadChannel" abstraction and 
alternate MessageSerializer that uses this
 Key: ARROW-3192
 URL: https://issues.apache.org/jira/browse/ARROW-3192
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Wes McKinney
 Fix For: 0.12.0


The current MessageSerializer implementation is wasteful when used to read an 
IPC payload that is already in-memory in an {{ArrowBuf}}. In particular, reads 
out of a {{ReadChannel}} require memory allocation

* 
https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L569

* 
https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L290

In C++, we have abstracted memory allocation out of the IPC read path so that 
zero-copy is possible. I suggest that a similar mechanism can be developed for 
Java to improve deserialization performance for in-memory messages. The new 
interface would return {{ArrowBuf}} when performing reads, which could be 
zero-copy when possible, but when not the current strategy of allocate-copy 
could be used
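
The difference between the two read strategies can be sketched with plain java.nio buffers standing in for ArrowBuf. The ToyBufReadChannel interface and both implementations below are hypothetical; they only illustrate returning a shared view when the payload is already in memory and falling back to allocate-and-copy otherwise.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical interface; ByteBuffer stands in for ArrowBuf in this sketch.
interface ToyBufReadChannel {
    ByteBuffer read(int length);
}

// Payload already in memory: hand back a view over the same bytes, no copy.
final class InMemoryReadChannel implements ToyBufReadChannel {
    private final ByteBuffer source;

    InMemoryReadChannel(ByteBuffer source) {
        this.source = source.duplicate(); // independent position, shared content
    }

    @Override
    public ByteBuffer read(int length) {
        ByteBuffer view = source.slice(); // starts at the current position
        view.limit(length);
        source.position(source.position() + length);
        return view;
    }
}

// Payload behind a stream: allocate a fresh buffer and copy into it.
final class CopyingReadChannel implements ToyBufReadChannel {
    private final InputStream in;

    CopyingReadChannel(InputStream in) {
        this.in = in;
    }

    @Override
    public ByteBuffer read(int length) {
        byte[] copy = new byte[length];
        try {
            int filled = 0;
            while (filled < length) {
                int n = in.read(copy, filled, length - filled);
                if (n < 0) {
                    throw new EOFException("stream ended after " + filled + " bytes");
                }
                filled += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ByteBuffer.wrap(copy);
    }
}
```

A buffer returned by InMemoryReadChannel observes later writes to the backing array, which is what makes it zero-copy; a buffer returned by CopyingReadChannel does not.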





[jira] [Updated] (ARROW-3192) [Java] Implement "ArrowBufReadChannel" abstraction and alternate MessageSerializer that uses this

2018-09-07 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3192:

Description: 
The current MessageSerializer implementation is wasteful when used to read an 
IPC payload that is already in-memory in an {{ArrowBuf}}. In particular, reads 
out of a {{ReadChannel}} require memory allocation

* 
https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L569

* 
https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L290

In C++, we have abstracted memory allocation out of the IPC read path so that 
zero-copy is possible. I suggest that a similar mechanism can be developed for 
Java to improve deserialization performance for in-memory messages. The new 
interface would return {{ArrowBuf}} when performing reads, which could be 
zero-copy when possible, but when not the current strategy of allocate-copy 
could be used

  was:
The current MessageSerializer imlementation is wasteful when used to read an 
IPC payload that is already in-memory in an {{ArrowBuf}}. In particular, reads 
out of a {{ReadChannel}} require memory allocation

* 
https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L569

* 
https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L290

In C++, we have abstracted memory allocation out of the IPC read path so that 
zero-copy is possible. I suggest that a similar mechanism can be developed for 
Java to improve deserialization performance for in-memory messages. The new 
interface would return {{ArrowBuf}} when performing reads, which could be 
zero-copy when possible, but when not the current strategy of allocate-copy 
could be used


> [Java] Implement "ArrowBufReadChannel" abstraction and alternate 
> MessageSerializer that uses this
> -
>
> Key: ARROW-3192
> URL: https://issues.apache.org/jira/browse/ARROW-3192
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> The current MessageSerializer implementation is wasteful when used to read an 
> IPC payload that is already in-memory in an {{ArrowBuf}}. In particular, 
> reads out of a {{ReadChannel}} require memory allocation
> * 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L569
> * 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L290
> In C++, we have abstracted memory allocation out of the IPC read path so that 
> zero-copy is possible. I suggest that a similar mechanism can be developed 
> for Java to improve deserialization performance for in-memory messages. The 
> new interface would return {{ArrowBuf}} when performing reads, which could be 
> zero-copy when possible, but when not the current strategy of allocate-copy 
> could be used





[jira] [Updated] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.

2018-09-07 Thread Jacques Nadeau (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-3191:
--
Description: 
Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This 
is because in many cases we want to be able to support hierarchical accounting 
of memory and the ability to transfer memory ownership between separate 
allocators within the same hierarchy.

At the same time, there are definitely times where someone might want to map 
some amount of arbitrary off-heap memory. In these situations they should still 
be able to use ArrowBuf.

I propose we have a new ArrowBuf constructor that takes an input that 
subclasses an interface similar to:

{code}
public abstract class Memory  {
  protected final int length;
  protected final long address;
  protected abstract void release();
}
{code}

We then make it so all the memory transfer semantics and accounting behavior 
are noops for this type of memory. The target of this work will be to make sure 
that all the fast paths continue to be efficient but some of the other paths 
like transfer can include a conditional (either directly or through alternative 
implementations of things like ledger).

  was:
Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This 
is because in many cases we want to be able to support hierarchical accounting 
of memory and the ability to transfer memory ownership between separate 
allocators within the same hierarchy.

At the same time, there are definitely times where someone might want to map 
some amount of arbitrary off-heap memory. In these situations they should still 
be able to use ArrowBuf.

I propose we have a new ArrowBuf constructor that takes an input that 
subclasses an interface similar to:

{{public abstract class Memory  {}}
 {{  protected final int length;}}
 {{  protected final long address;}}
 {{  protected abstract void release();}}
{{}}}

We then make it so all the memory transfer semantics and accounting behavior 
are noops for this type of memory. The target of this work will be to make sure 
that all the fast paths continue to be efficient but some of the other paths 
like transfer can include a conditional (either directly or through alternative 
implementations of things like ledger).


> [Java] Add support for ArrowBuf to point to arbitrary memory.
> -
>
> Key: ARROW-3191
> URL: https://issues.apache.org/jira/browse/ARROW-3191
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Jacques Nadeau
>Priority: Major
>
> Right now ArrowBuf can only point to memory managed by an Arrow Allocator. 
> This is because in many cases we want to be able to support hierarchical 
> accounting of memory and the ability to transfer memory ownership between 
> separate allocators within the same hierarchy.
> At the same time, there are definitely times where someone might want to map 
> some amount of arbitrary off-heap memory. In these situations they should 
> still be able to use ArrowBuf.
> I propose we have a new ArrowBuf constructor that takes an input that 
> subclasses an interface similar to:
> {code}
> public abstract class Memory  {
>   protected final int length;
>   protected final long address;
>   protected abstract void release();
> }
> {code}
> We then make it so all the memory transfer semantics and accounting behavior 
> are noops for this type of memory. The target of this work will be to make 
> sure that all the fast paths continue to be efficient but some of the other 
> paths like transfer can include a conditional (either directly or through 
> alternative implementations of things like ledger).





[jira] [Updated] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.

2018-09-07 Thread Jacques Nadeau (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-3191:
--
Description: 
Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This 
is because in many cases we want to be able to support hierarchical accounting 
of memory and the ability to transfer memory ownership between separate 
allocators within the same hierarchy.

At the same time, there are definitely times where someone might want to map 
some amount of arbitrary off-heap memory. In these situations they should still 
be able to use ArrowBuf.

I propose we have a new ArrowBuf constructor that takes an input that 
subclasses an interface similar to:

{{public abstract class Memory  {}}
 {{  protected final int length;}}
 {{  protected final long address;}}
 {{  protected abstract void release();}}
{{}}}

We then make it so all the memory transfer semantics and accounting behavior 
are noops for this type of memory. The target of this work will be to make sure 
that all the fast paths continue to be efficient but some of the other paths 
like transfer can include a conditional (either directly or through alternative 
implementations of things like ledger).

  was:
Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This 
is because in many cases we want to be able to support hierarchical accounting 
of memory and the ability to transfer memory ownership between separate 
allocators within the same hierarchy.

At the same time, there are definitely times where someone might want to map 
some amount of arbitrary off-heap memory. In these situations they should still 
be able to use ArrowBuf.

I propose we have a new ArrowBuf constructor that takes an input that 
subclasses an interface similar to:

{{public abstract class Memory  {}}
 {{  protected final int length;}}
 {{  protected final long address;}}
{{  protected abstract void release();}}
{{}}}

We then make it so all the memory transfer semantics and accounting behavior 
are noops for this type of memory. The target of this work will be to make sure 
that all the fast paths continue to be efficient but some of the other paths 
like transfer can include a conditional (either directly or through alternative 
implementations of things like ledger).


> [Java] Add support for ArrowBuf to point to arbitrary memory.
> -
>
> Key: ARROW-3191
> URL: https://issues.apache.org/jira/browse/ARROW-3191
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Jacques Nadeau
>Priority: Major
>
> Right now ArrowBuf can only point to memory managed by an Arrow Allocator. 
> This is because in many cases we want to be able to support hierarchical 
> accounting of memory and the ability to transfer memory ownership between 
> separate allocators within the same hierarchy.
> At the same time, there are definitely times where someone might want to map 
> some amount of arbitrary off-heap memory. In these situations they should 
> still be able to use ArrowBuf.
> I propose we have a new ArrowBuf constructor that takes an input that 
> subclasses an interface similar to:
> {{public abstract class Memory  {}}
>  {{  protected final int length;}}
>  {{  protected final long address;}}
>  {{  protected abstract void release();}}
> {{}}}
> We then make it so all the memory transfer semantics and accounting behavior 
> are noops for this type of memory. The target of this work will be to make 
> sure that all the fast paths continue to be efficient but some of the other 
> paths like transfer can include a conditional (either directly or through 
> alternative implementations of things like ledger).





[jira] [Updated] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.

2018-09-07 Thread Jacques Nadeau (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-3191:
--
Description: 
Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This 
is because in many cases we want to be able to support hierarchical accounting 
of memory and the ability to transfer memory ownership between separate 
allocators within the same hierarchy.

At the same time, there are definitely times where someone might want to map 
some amount of arbitrary off-heap memory. In these situations they should still 
be able to use ArrowBuf.

I propose we have a new ArrowBuf constructor that takes an input that 
subclasses an interface similar to:

{{public abstract class Memory  {}}
 {{  protected final int length;}}
 {{  protected final long address;}}
{{protected abstract void release();}}
{{}}}

We then make it so all the memory transfer semantics and accounting behavior 
are noops for this type of memory. The target of this work will be to make sure 
that all the fast paths continue to be efficient but some of the other paths 
like transfer can include a conditional (either directly or through alternative 
implementations of things like ledger).

  was:
Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This 
is because in many cases we want to be able to support hierarchical accounting 
of memory and the ability to transfer memory ownership between separate 
allocators within the same hierarchy.

At the same time, there are definitely times where someone might want to map 
some amount of arbitrary off-heap memory. In these situations they should still 
be able to use ArrowBuf.

I propose we have a new ArrowBuf constructor that takes an input that 
subclasses an interface similar to:

{{public abstract class Memory  {}}
{{  protected final int length;}}
{{  protected final long address;}}
{{   protected abstract void release(); }}
{{ }}}

We then make it so all the memory transfer semantics and accounting behavior 
are noops for this type of memory. The target of this work will be to make sure 
that all the fast paths continue to be efficient but some of the other paths 
like transfer can include a conditional (either directly or through alternative 
implementations of things like ledger).


> [Java] Add support for ArrowBuf to point to arbitrary memory.
> -
>
> Key: ARROW-3191
> URL: https://issues.apache.org/jira/browse/ARROW-3191
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Jacques Nadeau
>Priority: Major
>
> Right now ArrowBuf can only point to memory managed by an Arrow Allocator. 
> This is because in many cases we want to be able to support hierarchical 
> accounting of memory and the ability to transfer memory ownership between 
> separate allocators within the same hierarchy.
> At the same time, there are definitely times where someone might want to map 
> some amount of arbitrary off-heap memory. In these situations they should 
> still be able to use ArrowBuf.
> I propose we have a new ArrowBuf constructor that takes an input that 
> subclasses an interface similar to:
> {{public abstract class Memory  {}}
>  {{  protected final int length;}}
>  {{  protected final long address;}}
> {{protected abstract void release();}}
> {{}}}
> We then make it so all the memory transfer semantics and accounting behavior 
> are noops for this type of memory. The target of this work will be to make 
> sure that all the fast paths continue to be efficient but some of the other 
> paths like transfer can include a conditional (either directly or through 
> alternative implementations of things like ledger).





[jira] [Updated] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.

2018-09-07 Thread Jacques Nadeau (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-3191:
--
Description: 
Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This 
is because in many cases we want to be able to support hierarchical accounting 
of memory and the ability to transfer memory ownership between separate 
allocators within the same hierarchy.

At the same time, there are definitely times where someone might want to map 
some amount of arbitrary off-heap memory. In these situations they should still 
be able to use ArrowBuf.

I propose we have a new ArrowBuf constructor that takes an input that 
subclasses an interface similar to:

{{public abstract class Memory  {}}
 {{  protected final int length;}}
 {{  protected final long address;}}
{{  protected abstract void release();}}
{{}}}

We then make it so all the memory transfer semantics and accounting behavior 
are noops for this type of memory. The target of this work will be to make sure 
that all the fast paths continue to be efficient but some of the other paths 
like transfer can include a conditional (either directly or through alternative 
implementations of things like ledger).

  was:
Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This 
is because in many cases we want to be able to support hierarchical accounting 
of memory and the ability to transfer memory ownership between separate 
allocators within the same hierarchy.

At the same time, there are definitely times where someone might want to map 
some amount of arbitrary off-heap memory. In these situations they should still 
be able to use ArrowBuf.

I propose we have a new ArrowBuf constructor that takes an input that 
subclasses an interface similar to:

{{public abstract class Memory  {}}
 {{  protected final int length;}}
 {{  protected final long address;}}
{{protected abstract void release();}}
{{}}}

We then make it so all the memory transfer semantics and accounting behavior 
are noops for this type of memory. The target of this work will be to make sure 
that all the fast paths continue to be efficient but some of the other paths 
like transfer can include a conditional (either directly or through alternative 
implementations of things like ledger).


> [Java] Add support for ArrowBuf to point to arbitrary memory.
> -
>
> Key: ARROW-3191
> URL: https://issues.apache.org/jira/browse/ARROW-3191
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Jacques Nadeau
>Priority: Major
>
> Right now ArrowBuf can only point to memory managed by an Arrow Allocator. 
> This is because in many cases we want to be able to support hierarchical 
> accounting of memory and the ability to transfer memory ownership between 
> separate allocators within the same hierarchy.
> At the same time, there are definitely times where someone might want to map 
> some amount of arbitrary off-heap memory. In these situations they should 
> still be able to use ArrowBuf.
> I propose we have a new ArrowBuf constructor that takes an input that 
> subclasses an interface similar to:
> {{public abstract class Memory  {}}
>  {{  protected final int length;}}
>  {{  protected final long address;}}
> {{  protected abstract void release();}}
> {{}}}
> We then make it so all the memory transfer semantics and accounting behavior 
> are noops for this type of memory. The target of this work will be to make 
> sure that all the fast paths continue to be efficient but some of the other 
> paths like transfer can include a conditional (either directly or through 
> alternative implementations of things like ledger).





[jira] [Updated] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.

2018-09-07 Thread Jacques Nadeau (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-3191:
--
Description: 
Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This 
is because in many cases we want to be able to support hierarchical accounting 
of memory and the ability to transfer memory ownership between separate 
allocators within the same hierarchy.

At the same time, there are definitely times where someone might want to map 
some amount of arbitrary off-heap memory. In these situations they should still 
be able to use ArrowBuf.

I propose we have a new ArrowBuf constructor that takes an input that 
subclasses an interface similar to:

{{public abstract class Memory  {}}
{{  protected final int length;}}
{{  protected final long address;}}
{{   protected abstract void release(); }}
{{ }}}

We then make it so all the memory transfer semantics and accounting behavior 
are noops for this type of memory. The target of this work will be to make sure 
that all the fast paths continue to be efficient but some of the other paths 
like transfer can include a conditional (either directly or through alternative 
implementations of things like ledger).

  was:
Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This 
is because in many cases we want to be able to support hierarchical accounting 
of memory and the ability to transfer memory ownership between separate 
allocators within the same hierarchy.

At the same time, there are definitely times where someone might want to map 
some amount of arbitrary off-heap memory. In these situations they should still 
be able to use ArrowBuf.

I propose we have a new ArrowBuf constructor that takes an input that 
subclasses an interface similar to:

public abstract class Memory  {
  protected final int length;
  protected final long address;
  protected abstract void release(); 
}

We then make it so all the memory transfer semantics and accounting behavior 
are noops for this type of memory. The target of this work will be to make sure 
that all the fast paths continue to be efficient but some of the other paths 
like transfer can include a conditional (either directly or through alternative 
implementations of things like ledger).


> [Java] Add support for ArrowBuf to point to arbitrary memory.
> -
>
> Key: ARROW-3191
> URL: https://issues.apache.org/jira/browse/ARROW-3191
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Jacques Nadeau
>Priority: Major
>
> Right now ArrowBuf can only point to memory managed by an Arrow Allocator. 
> This is because in many cases we want to be able to support hierarchical 
> accounting of memory and the ability to transfer memory ownership between 
> separate allocators within the same hierarchy.
> At the same time, there are definitely times where someone might want to map 
> some amount of arbitrary off-heap memory. In these situations they should 
> still be able to use ArrowBuf.
> I propose we have a new ArrowBuf constructor that takes an input that 
> subclasses an interface similar to:
> {{public abstract class Memory  {}}
> {{  protected final int length;}}
> {{  protected final long address;}}
> {{   protected abstract void release(); }}
> {{ }}}
> We then make it so all the memory transfer semantics and accounting behavior 
> are noops for this type of memory. The target of this work will be to make 
> sure that all the fast paths continue to be efficient but some of the other 
> paths like transfer can include a conditional (either directly or through 
> alternative implementations of things like ledger).





[jira] [Created] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.

2018-09-07 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created ARROW-3191:
-

 Summary: [Java] Add support for ArrowBuf to point to arbitrary 
memory.
 Key: ARROW-3191
 URL: https://issues.apache.org/jira/browse/ARROW-3191
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Jacques Nadeau


Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This 
is because in many cases we want to be able to support hierarchical accounting 
of memory and the ability to transfer memory ownership between separate 
allocators within the same hierarchy.

At the same time, there are definitely times where someone might want to map 
some amount of arbitrary off-heap memory. In these situations they should still 
be able to use ArrowBuf.

I propose we have a new ArrowBuf constructor that takes an input that 
subclasses an interface similar to:

public abstract class Memory  {
  protected final int length;
  protected final long address;
  protected abstract void release(); 
}

We then make it so all the memory transfer semantics and accounting behavior 
are noops for this type of memory. The target of this work will be to make sure 
that all the fast paths continue to be efficient but some of the other paths 
like transfer can include a conditional (either directly or through alternative 
implementations of things like ledger).
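
The proposal above can be sketched roughly as follows. This is an illustrative Python model written for brevity, not the Java API: `Memory`, `ForeignMemory`, `ManagedLedger`, and `NoopLedger` are hypothetical names introduced here, and the per-allocator byte accounting is a deliberately simplified assumption.

```python
from abc import ABC, abstractmethod


class Memory(ABC):
    """Hypothetical stand-in for the proposed abstract Memory class."""

    def __init__(self, address: int, length: int) -> None:
        self.address = address
        self.length = length

    @abstractmethod
    def release(self) -> None:
        """Free or unmap the underlying memory."""


class ForeignMemory(Memory):
    """Arbitrary off-heap memory that no allocator accounts for."""

    def __init__(self, address: int, length: int) -> None:
        super().__init__(address, length)
        self.released = False

    def release(self) -> None:
        # A real implementation would unmap or free the region here.
        self.released = True


class ManagedLedger:
    """Simplified accounting: bytes charged to the owning allocator name."""

    def __init__(self, owner: str, length: int, usage: dict) -> None:
        self.owner = owner
        self.length = length
        self.usage = usage
        usage[owner] = usage.get(owner, 0) + length

    def transfer(self, target: str) -> None:
        # Move the charge from the current owner to the target allocator.
        self.usage[self.owner] -= self.length
        self.usage[target] = self.usage.get(target, 0) + self.length
        self.owner = target


class NoopLedger:
    """Ledger for foreign memory: ownership transfer changes nothing."""

    def transfer(self, target: str) -> None:
        pass


usage = {}
managed = ManagedLedger("root", 64, usage)
managed.transfer("child")        # accounting moves from root to child
NoopLedger().transfer("child")   # no accounting for foreign memory
print(usage)
```

Choosing the ledger implementation once, when the buffer is created, keeps the conditional out of the fast paths, which is one way to read the "alternative implementations of things like ledger" option.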





[jira] [Commented] (ARROW-1956) [Python] Support reading specific partitions from a partitioned parquet dataset

2018-09-07 Thread Ying Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607546#comment-16607546
 ] 

Ying Wang commented on ARROW-1956:
--

I don't know if this is helpful to people, but I found myself needing to ingest 
an entire Parquet dataset at once (database company) and I came up with this:

 

```python
import pyarrow.parquet as pq

dataset = pq.ParquetDataset('/path/to/dataset')

# A ParquetDataset is composed of a list of ParquetDatasetPiece objects.
dataset_pieces = dataset.pieces

for dataset_piece in dataset_pieces:
    # dataset.partitions is a ParquetPartitions object.
    df = dataset_piece.read(partitions=dataset.partitions).to_pandas()

    # do whatever with dataframe
```

It'll be slow but you can parallelize it as you want and each dataframe will 
contain the full dataset schema (as opposed to reading the individual 
ParquetFile which will not include partition keys as part of the schema).
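
For context on why an individual file lacks the partition keys: in a Hive-style partitioned dataset the key values are encoded in the directory names (`key=value`), not stored inside each Parquet file. Below is a minimal standard-library sketch of recovering them from a path. The helper `partition_keys` and the example path are hypothetical and not part of pyarrow.

```python
from pathlib import PurePosixPath


def partition_keys(path: str) -> dict:
    """Collect key=value directory components from a dataset file path."""
    keys = {}
    # Skip the final component (the file name); only directories carry keys.
    for part in PurePosixPath(path).parts[:-1]:
        key, sep, value = part.partition("=")
        if sep and key:
            keys[key] = value
    return keys


example = "/path/to/dataset/year=2018/month=09/part-0.parquet"
print(partition_keys(example))
```

Note that the values come back as strings; a reader that merges them into the schema still has to decide on their types.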

> [Python] Support reading specific partitions from a partitioned parquet 
> dataset
> ---
>
> Key: ARROW-1956
> URL: https://issues.apache.org/jira/browse/ARROW-1956
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format
>Affects Versions: 0.8.0
> Environment: Kernel: 4.14.8-300.fc27.x86_64
> Python: 3.6.3
>Reporter: Suvayu Ali
>Priority: Minor
>  Labels: parquet
> Fix For: 0.11.0
>
> Attachments: so-example.py
>
>
> I want to read specific partitions from a partitioned parquet dataset.  This 
> is very useful in case of large datasets.  I have attached a small script 
> that creates a dataset and shows what is expected when reading (quoting 
> salient points below).
> # There is no way to read specific partitions in Pandas
> # In pyarrow I tried to achieve the goal by providing a list of 
> files/directories to ParquetDataset, but it didn't work: 
> # In PySpark it works if I simply do:
> {code:none}
> spark.read.option('basePath', 'datadir').parquet(*list_of_partitions)
> {code}
> I also couldn't find a way to easily write partitioned parquet files.  In the 
> end I did it by hand by creating the directory hierarchies, and writing the 
> individual files myself (similar to the implementation in the attached 
> script).  Again, in PySpark I can do 
> {code:none}
> df.write.partitionBy(*list_of_partitions).parquet(output)
> {code}
> to achieve that.





[jira] [Created] (ARROW-3189) [Python] Support seek(...) on writable files that support it

2018-09-07 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3189:
---

 Summary: [Python] Support seek(...) on writable files that support 
it 
 Key: ARROW-3189
 URL: https://issues.apache.org/jira/browse/ARROW-3189
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney
 Fix For: 0.11.0


See relevant mailing list discussion

https://lists.apache.org/thread.html/67fc945fa01b7cf682a241f36de09fe495b84b119868dd7c9f8168ba@%3Cdev.arrow.apache.org%3E





[jira] [Updated] (ARROW-3188) [Python] Table.from_arrays segfaults if lists and schema are passed

2018-09-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3188:
--
Labels: pull-request-available  (was: )

> [Python] Table.from_arrays segfaults if lists and schema are passed
> ---
>
> Key: ARROW-3188
> URL: https://issues.apache.org/jira/browse/ARROW-3188
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> {code:python}
> import pyarrow as pa
>
> data = [
>     list(range(5)),
>     [-10, -5, 0, 5, 10]
> ]
> schema = pa.schema([
>     pa.field('a', pa.uint16()),
>     pa.field('b', pa.int64())
> ])
> # Passing plain Python lists together with a schema segfaults.
> pa.Table.from_arrays(data, schema=schema)
> {code}
> This segfaults, whereas it should raise a `TypeError`.





[jira] [Created] (ARROW-3188) [Python] Table.from_arrays segfaults if lists and schema are passed

2018-09-07 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-3188:
--

 Summary: [Python] Table.from_arrays segfaults if lists and schema 
are passed
 Key: ARROW-3188
 URL: https://issues.apache.org/jira/browse/ARROW-3188
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs
 Fix For: 0.11.0


{code:python}
import pyarrow as pa

data = [
    list(range(5)),
    [-10, -5, 0, 5, 10]
]

schema = pa.schema([
    pa.field('a', pa.uint16()),
    pa.field('b', pa.int64())
])

# Passing plain Python lists together with a schema segfaults.
pa.Table.from_arrays(data, schema=schema)
{code}

This segfaults, whereas it should raise a `TypeError`.
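
The expected behavior could be sketched as an up-front type check on each column. This is a pure-Python illustration, not pyarrow's implementation: `FakeArray` and `check_arrays` are hypothetical names standing in for an Arrow array type and for whatever validation `Table.from_arrays` would perform.

```python
class FakeArray:
    """Hypothetical stand-in for an Arrow array type."""

    def __init__(self, values):
        self.values = list(values)


def check_arrays(arrays):
    """Raise TypeError unless every column is a FakeArray."""
    for i, column in enumerate(arrays):
        if not isinstance(column, FakeArray):
            raise TypeError(
                f"column {i}: expected FakeArray, got {type(column).__name__}"
            )
    return arrays


data = [list(range(5)), [-10, -5, 0, 5, 10]]
try:
    check_arrays(data)
except TypeError as exc:
    # Plain lists are rejected with an exception instead of crashing.
    print(exc)
```

Failing early with an exception keeps invalid input from reaching lower-level code that assumes array objects.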





[jira] [Updated] (ARROW-3187) [Plasma] Make Plasma Log pluggable with glog

2018-09-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3187:
--
Labels: pull-request-available  (was: )

> [Plasma] Make Plasma Log pluggable with glog
> 
>
> Key: ARROW-3187
> URL: https://issues.apache.org/jira/browse/ARROW-3187
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Yuhong Guo
>Assignee: Yuhong Guo
>Priority: Major
>  Labels: pull-request-available
>
> Make Plasma pluggable with glog using Macro.





[jira] [Updated] (ARROW-3187) [Plasma] Make Plasma Log pluggable with glog

2018-09-07 Thread Yuhong Guo (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuhong Guo updated ARROW-3187:
--
Summary: [Plasma] Make Plasma Log pluggable with glog  (was: [Plasma] 
Change Logging to glog)

> [Plasma] Make Plasma Log pluggable with glog
> 
>
> Key: ARROW-3187
> URL: https://issues.apache.org/jira/browse/ARROW-3187
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Yuhong Guo
>Assignee: Yuhong Guo
>Priority: Major
>
> Make Plasma pluggable with glog using Macro.





[jira] [Created] (ARROW-3187) [Plasma] Change Logging to glog

2018-09-07 Thread Yuhong Guo (JIRA)
Yuhong Guo created ARROW-3187:
-

 Summary: [Plasma] Change Logging to glog
 Key: ARROW-3187
 URL: https://issues.apache.org/jira/browse/ARROW-3187
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Yuhong Guo
Assignee: Yuhong Guo


Make Plasma pluggable with glog using Macro.





[jira] [Resolved] (ARROW-2948) [Packaging] Generate changelog with crossbow

2018-09-07 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-2948.

   Resolution: Fixed
Fix Version/s: 0.11.0

Issue resolved by pull request 2348
[https://github.com/apache/arrow/pull/2348]

> [Packaging] Generate changelog with crossbow
> 
>
> Key: ARROW-2948
> URL: https://issues.apache.org/jira/browse/ARROW-2948
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Basically the port of 
> https://github.com/apache/arrow/blob/master/dev/release/changelog.py





[jira] [Updated] (ARROW-2709) [Python] write_to_dataset poor performance when splitting

2018-09-07 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-2709:
---
Description: 
Hello,

Posting this from github (master [~wesmckinn] asked for it :) )

[https://github.com/apache/arrow/issues/2138]

 
{code:python}
import pandas as pd
import numpy as np
import pyarrow.parquet as pq
import pyarrow as pa

idx = pd.date_range('2017-01-01 12:00:00.000', '2017-03-01 12:00:00.000',
                    freq='T')
df = pd.DataFrame({'numeric_col': np.random.rand(len(idx)),
                   'string_col': pd.util.testing.rands_array(8, len(idx))},
                  index=idx){code}
 
{code:python}
df["dt"] = df.index
df["dt"] = df["dt"].dt.date
table = pa.Table.from_pandas(df)
pq.write_to_dataset(table, root_path='dataset_name', partition_cols=['dt'],
                    flavor='spark'){code}
 

{{This works but is inefficient memory-wise. The Arrow table is a copy of the 
large pandas dataframe and quickly saturates the RAM.}}

 

{{Thanks!}}

  was:
Hello,

Posting this from github (master [~wesmckinn] asked for it :) )

https://github.com/apache/arrow/issues/2138

 
{code:java}
import pandas as pd import numpy as np import pyarrow.parquet as pq import 
pyarrow as pa idx = pd.date_range('2017-01-01 12:00:00.000', '2017-03-01 
12:00:00.000', freq = 'T') dataframe = pd.DataFrame({'numeric_col' : 
np.random.rand(len(idx)), 'string_col' : 
pd.util.testing.rands_array(8,len(idx))}, index = idx){code}
 
{code:java}
df["dt"] = df.index df["dt"] = df["dt"].dt.date table = 
pa.Table.from_pandas(df) pq.write_to_dataset(table, root_path='dataset_name', 
partition_cols=['dt'], flavor='spark'){code}
 

{{this works but is inefficient memory-wise. The arrow table is a copy of the 
large pandas daframe and quickly saturates the RAM.}}

 

{{Thanks!}}


> [Python] write_to_dataset poor performance when splitting
> -
>
> Key: ARROW-2709
> URL: https://issues.apache.org/jira/browse/ARROW-2709
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Olaf
>Priority: Critical
>  Labels: parquet
>
> Hello,
> Posting this from github (master [~wesmckinn] asked for it :) )
> [https://github.com/apache/arrow/issues/2138]
>  
> {code:python}
> import pandas as pd
> import numpy as np
> import pyarrow.parquet as pq
> import pyarrow as pa
> idx = pd.date_range('2017-01-01 12:00:00.000', '2017-03-01 12:00:00.000',
>                     freq='T')
> df = pd.DataFrame({'numeric_col': np.random.rand(len(idx)),
>                    'string_col': pd.util.testing.rands_array(8, len(idx))},
>                   index=idx){code}
>  
> {code:python}
> df["dt"] = df.index
> df["dt"] = df["dt"].dt.date
> table = pa.Table.from_pandas(df)
> pq.write_to_dataset(table, root_path='dataset_name', partition_cols=['dt'],
>                     flavor='spark'){code}
>  
> {{This works but is inefficient memory-wise. The Arrow table is a copy of the 
> large pandas dataframe and quickly saturates the RAM.}}
>  
> {{Thanks!}}





[jira] [Resolved] (ARROW-3177) [Rust] Update expected error messages for tests that 'should panic'

2018-09-07 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-3177.

   Resolution: Fixed
Fix Version/s: 0.11.0

Issue resolved by pull request 2519
[https://github.com/apache/arrow/pull/2519]

> [Rust] Update expected error messages for tests that 'should panic'
> ---
>
> Key: ARROW-3177
> URL: https://issues.apache.org/jira/browse/ARROW-3177
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>



