[jira] [Updated] (ARROW-3195) [C++] NumPy initialization error check is missing in test
[ https://issues.apache.org/jira/browse/ARROW-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3195: -- Labels: pull-request-available (was: ) > [C++] NumPy initialization error check is missing in test > - > > Key: ARROW-3195 > URL: https://issues.apache.org/jira/browse/ARROW-3195 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 0.10.0 >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3195) [C++] NumPy initialization error check is missing in test
Kouhei Sutou created ARROW-3195: --- Summary: [C++] NumPy initialization error check is missing in test Key: ARROW-3195 URL: https://issues.apache.org/jira/browse/ARROW-3195 Project: Apache Arrow Issue Type: Improvement Components: C++ Affects Versions: 0.10.0 Reporter: Kouhei Sutou Assignee: Kouhei Sutou -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3194) Fix setValueCount in splitAndTransfer for variable width vectors
[ https://issues.apache.org/jira/browse/ARROW-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3194: -- Labels: pull-request-available (was: ) > Fix setValueCount in splitAndTransfer for variable width vectors --- > > Key: ARROW-3194 > URL: https://issues.apache.org/jira/browse/ARROW-3194 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Siddharth Teotia >Assignee: Siddharth Teotia >Priority: Major > Labels: pull-request-available > > We need to use the split length as the value count of the target vector. We > are incorrectly using the value count of the current vector for the target > vector. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3194) Fix setValueCount in splitAndTransfer for variable width vectors
Siddharth Teotia created ARROW-3194: --- Summary: Fix setValueCount in splitAndTransfer for variable width vectors Key: ARROW-3194 URL: https://issues.apache.org/jira/browse/ARROW-3194 Project: Apache Arrow Issue Type: Task Components: Java Reporter: Siddharth Teotia Assignee: Siddharth Teotia We need to use the split length as the value count of the target vector. We are incorrectly using the value count of the current vector for the target vector. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
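To make the fix concrete, here is a minimal Java sketch. SimpleVarCharVector is a hypothetical, heavily simplified stand-in, not one of Arrow's vector classes: it stores values in a byte array delimited by an offsets array, and its splitAndTransfer copies the requested range instead of transferring buffer ownership. The line that matters is the last one, where the target's value count is set to the split length rather than to the source vector's value count.

```java
import java.util.Arrays;

// Hypothetical, heavily simplified variable-width vector; not Arrow's API.
class SimpleVarCharVector {
    private int[] offsets = new int[1]; // value i spans data[offsets[i], offsets[i + 1])
    private byte[] data = new byte[0];
    private int valueCount = 0;

    void append(byte[] value) {
        int start = offsets[valueCount];
        data = Arrays.copyOf(data, start + value.length);
        System.arraycopy(value, 0, data, start, value.length);
        offsets = Arrays.copyOf(offsets, valueCount + 2);
        offsets[valueCount + 1] = start + value.length;
        valueCount++;
    }

    byte[] get(int index) {
        return Arrays.copyOfRange(data, offsets[index], offsets[index + 1]);
    }

    int getValueCount() {
        return valueCount;
    }

    void setValueCount(int valueCount) {
        this.valueCount = valueCount;
    }

    // Moves values [startIndex, startIndex + length) into target. This sketch
    // copies the bytes instead of transferring buffer ownership.
    void splitAndTransfer(int startIndex, int length, SimpleVarCharVector target) {
        int byteStart = offsets[startIndex];
        int byteEnd = offsets[startIndex + length];
        target.data = Arrays.copyOfRange(data, byteStart, byteEnd);
        target.offsets = new int[length + 1];
        for (int i = 0; i <= length; i++) {
            // Rebase offsets so the target's first value starts at byte 0.
            target.offsets[i] = offsets[startIndex + i] - byteStart;
        }
        // The fix: the target holds `length` values. The bug described in the
        // issue corresponds to target.setValueCount(this.valueCount).
        target.setValueCount(length);
    }
}
```

With the buggy variant, splitting two values out of a four-value vector would leave the target claiming four values while its offsets only describe two.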
[jira] [Resolved] (ARROW-3171) [Java] checkstyle - fix line length and indentation
[ https://issues.apache.org/jira/browse/ARROW-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved ARROW-3171. - Resolution: Fixed Fix Version/s: 0.11.0 Issue resolved by pull request 2512 [https://github.com/apache/arrow/pull/2512] > [Java] checkstyle - fix line length and indentation > --- > > Key: ARROW-3171 > URL: https://issues.apache.org/jira/browse/ARROW-3171 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java >Reporter: Bryan Cutler >Assignee: Bryan Cutler >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3127) [C++] Add Tutorial about Sending Tensor from C++ to Python
[ https://issues.apache.org/jira/browse/ARROW-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-3127. --- Resolution: Fixed Fix Version/s: 0.11.0 Issue resolved by pull request 2481 [https://github.com/apache/arrow/pull/2481] > [C++] Add Tutorial about Sending Tensor from C++ to Python > -- > > Key: ARROW-3127 > URL: https://issues.apache.org/jira/browse/ARROW-3127 > Project: Apache Arrow > Issue Type: Improvement > Components: Website >Reporter: Simon Mo >Priority: Minor > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 40m > Remaining Estimate: 0h > > I can add a short tutorial showing how to > # Serialize a floating-point array in C++ into Tensor > # Save the Tensor to Plasma > # Access the Tensor in Python > c.f. [https://github.com/apache/arrow/pull/2481] > cc @[pcmoritz|https://github.com/pcmoritz] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-1325) [R] Bootstrap R bindings subproject
[ https://issues.apache.org/jira/browse/ARROW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-1325. - Resolution: Fixed Fix Version/s: (was: 0.12.0) 0.11.0 Issue resolved by pull request 2489 [https://github.com/apache/arrow/pull/2489] > [R] Bootstrap R bindings subproject > --- > > Key: ARROW-1325 > URL: https://issues.apache.org/jira/browse/ARROW-1325 > Project: Apache Arrow > Issue Type: New Feature > Components: R >Reporter: Clark Fitzgerald >Assignee: Romain François >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 8h 50m > Remaining Estimate: 0h > > The R language was designed to perform "Columnar in memory analytics". R / > Arrow bindings would be useful for: > * better compatibility between R and other languages / big data systems > * chunk-based data parallelism > * portable and efficient IO via Parquet > R has a C++ interface so the natural way to write these bindings is to > leverage Arrow's C++ library as much as possible. > Feather provides a starting point: > [https://github.com/wesm/feather/tree/master/R]. > This can serve as an umbrella JIRA for work on R related tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3193) [C++] Native database client for MariaDB / MySQL client protocol
Wes McKinney created ARROW-3193: --- Summary: [C++] Native database client for MariaDB / MySQL client protocol Key: ARROW-3193 URL: https://issues.apache.org/jira/browse/ARROW-3193 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney Along with the theme of building optimized database client interfaces for Arrow users, it would be valuable to have such an add-on library for MariaDB/MySQL -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3061) [Java] headroom does not take into account reservation
[ https://issues.apache.org/jira/browse/ARROW-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Teotia resolved ARROW-3061. - Resolution: Fixed Assignee: Laurent Goujon Fix Version/s: 0.11.0 > [Java] headroom does not take into account reservation > -- > > Key: ARROW-3061 > URL: https://issues.apache.org/jira/browse/ARROW-3061 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Laurent Goujon >Assignee: Laurent Goujon >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 20m > Remaining Estimate: 0h > > It looks like {{BufferAllocator#getHeadroom()}} does not take into account > the current allocator reservation. For example, if a parent allocator has a limit > of 10 and a child allocator is created with a reservation of 6, the headroom > of the parent is 4 and the child's headroom is currently reported as 4, but its actual > headroom is 10: the child has not used any memory yet, so it can use the full > extent of its reservation on top of what the parent allocator can still allocate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
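The headroom rule described above can be sketched with a small Java model. SketchAllocator is hypothetical and is not Arrow's BufferAllocator: it only tracks byte counts, charges a child's reservation to the parent up front, and charges bytes beyond a reservation to the parent recursively. headroom() follows the issue's argument (unused reservation plus what the parent can still give), and headroomIgnoringReservation() models the reported behavior.

```java
// Hypothetical, simplified allocator model; not Arrow's BufferAllocator API.
class SketchAllocator {
    private final SketchAllocator parent; // null for the root allocator
    private final long limit;             // most bytes this allocator may hold
    private final long reservation;       // bytes the parent set aside at creation
    private long charged;                 // bytes used here or handed to children

    private SketchAllocator(SketchAllocator parent, long limit, long reservation) {
        this.parent = parent;
        this.limit = limit;
        this.reservation = reservation;
    }

    static SketchAllocator root(long limit) {
        return new SketchAllocator(null, limit, 0);
    }

    SketchAllocator newChild(long reservation, long limit) {
        if (reservation > headroom()) {
            throw new IllegalStateException("reservation exceeds headroom");
        }
        charge(reservation); // the reservation is taken from this allocator up front
        return new SketchAllocator(this, limit, reservation);
    }

    void allocate(long bytes) {
        if (bytes > headroom()) {
            throw new IllegalStateException("allocation exceeds headroom");
        }
        charge(bytes);
    }

    // Bytes beyond this allocator's reservation are also charged to the parent.
    private void charge(long bytes) {
        long overBefore = Math.max(0, charged - reservation);
        charged += bytes;
        long overAfter = Math.max(0, charged - reservation);
        if (parent != null && overAfter > overBefore) {
            parent.charge(overAfter - overBefore);
        }
    }

    // Unused reservation plus whatever the parent can still hand out,
    // capped by this allocator's own limit.
    long headroom() {
        long localRoom = limit - charged;
        if (parent == null) {
            return localRoom;
        }
        long unusedReservation = Math.max(0, reservation - charged);
        return Math.min(localRoom, unusedReservation + parent.headroom());
    }

    // The behavior reported in the issue: the reservation is ignored.
    long headroomIgnoringReservation() {
        long localRoom = limit - charged;
        return parent == null ? localRoom : Math.min(localRoom, parent.headroom());
    }
}
```

With a root limit of 10 and a child reservation of 6, the parent reports 4, the reservation-ignoring child reports 4, and the corrected child reports 10, matching the example in the issue.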
[jira] [Updated] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.
[ https://issues.apache.org/jira/browse/ARROW-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3191: Component/s: Java > [Java] Add support for ArrowBuf to point to arbitrary memory. > - > > Key: ARROW-3191 > URL: https://issues.apache.org/jira/browse/ARROW-3191 > Project: Apache Arrow > Issue Type: New Feature > Components: Java >Reporter: Jacques Nadeau >Priority: Major > > Right now ArrowBuf can only point to memory managed by an Arrow Allocator. > This is because in many cases we want to be able to support hierarchical > accounting of memory and the ability to transfer memory ownership between > separate allocators within the same hierarchy. > At the same time, there are definitely times where someone might want to map > some amount of arbitrary off-heap memory. In these situations they should > still be able to use ArrowBuf. > I propose we have a new ArrowBuf constructor that takes an input that > subclasses an interface similar to: > {code} > public abstract class Memory { > protected final int length; > protected final long address; > protected abstract void release(); > } > {code} > We then make it so all the memory transfer semantics and accounting behavior > are noops for this type of memory. The target of this work will be to make > sure that all the fast paths continue to be efficient but some of the other > paths like transfer can include a conditional (either directly or through > alternative implementations of things like ledger). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
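The proposal could be sketched in Java as follows. One detail: as written in the issue, Memory declares final fields without initializing them, which does not compile, so this version adds a constructor. ForeignBuf is a hypothetical stand-in for an ArrowBuf built on such a Memory, not Arrow's API: reference counting still decides when release() runs, while allocator accounting is a no-op, as the issue proposes.

```java
// The proposed base class, plus a constructor: final fields must be
// initialized for the class to compile.
abstract class Memory {
    protected final int length;
    protected final long address;

    protected Memory(long address, int length) {
        this.address = address;
        this.length = length;
    }

    // Frees the underlying memory; called once, when the last reference is released.
    protected abstract void release();
}

// Hypothetical stand-in for an ArrowBuf over foreign memory; not Arrow's API.
class ForeignBuf {
    private final Memory memory;
    private int refCount = 1;

    ForeignBuf(Memory memory) {
        this.memory = memory;
    }

    long memoryAddress() {
        return memory.address;
    }

    int capacity() {
        return memory.length;
    }

    // Accounting is a no-op: no allocator is charged for foreign memory.
    long accountedSize() {
        return 0;
    }

    void retain() {
        if (refCount <= 0) {
            throw new IllegalStateException("buffer already released");
        }
        refCount++;
    }

    // Returns true if this call dropped the last reference and freed the memory.
    boolean release() {
        if (refCount <= 0) {
            throw new IllegalStateException("buffer already released");
        }
        refCount--;
        if (refCount == 0) {
            memory.release();
            return true;
        }
        return false;
    }
}
```

Keeping the no-op accounting in a separate path is what would let the allocator-managed fast paths stay unchanged, in line with the issue's stated goal.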
[jira] [Created] (ARROW-3192) [Java] Implement "ArrowBufReadChannel" abstraction and alternate MessageSerializer that uses this
Wes McKinney created ARROW-3192: --- Summary: [Java] Implement "ArrowBufReadChannel" abstraction and alternate MessageSerializer that uses this Key: ARROW-3192 URL: https://issues.apache.org/jira/browse/ARROW-3192 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Wes McKinney Fix For: 0.12.0 The current MessageSerializer implementation is wasteful when used to read an IPC payload that is already in memory in an {{ArrowBuf}}. In particular, reads out of a {{ReadChannel}} require memory allocation: * https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L569 * https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L290 In C++, we have abstracted memory allocation out of the IPC read path so that zero-copy is possible. I suggest that a similar mechanism can be developed for Java to improve deserialization performance for in-memory messages. The new interface would return {{ArrowBuf}} when performing reads, which could be zero-copy when possible; when it is not, the current allocate-and-copy strategy could be used. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
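A minimal sketch of the idea, using java.nio.ByteBuffer as a stand-in for ArrowBuf so it is self-contained. BufferReadChannel, InMemoryReadChannel and CopyingReadChannel are hypothetical names, not existing Arrow classes: the in-memory implementation returns slices that share the source's memory (zero-copy), and the generic one falls back to allocate-and-copy, mirroring what the current ReadChannel path does.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical read abstraction; ByteBuffer stands in for ArrowBuf.
interface BufferReadChannel {
    // Returns the next `length` bytes as a buffer.
    ByteBuffer readBuffer(int length) throws IOException;
}

// In-memory source: returns slices that share the source's memory (zero-copy).
class InMemoryReadChannel implements BufferReadChannel {
    private final ByteBuffer source;

    InMemoryReadChannel(ByteBuffer source) {
        this.source = source.duplicate(); // independent position, shared content
    }

    @Override
    public ByteBuffer readBuffer(int length) throws IOException {
        if (length > source.remaining()) {
            throw new IOException("unexpected end of buffer");
        }
        ByteBuffer view = source.slice(); // starts at the current position
        view.limit(length);
        source.position(source.position() + length);
        return view;
    }
}

// Generic source: allocate a new buffer and copy into it.
class CopyingReadChannel implements BufferReadChannel {
    private final ReadableByteChannel in;

    CopyingReadChannel(ReadableByteChannel in) {
        this.in = in;
    }

    @Override
    public ByteBuffer readBuffer(int length) throws IOException {
        ByteBuffer out = ByteBuffer.allocate(length);
        while (out.hasRemaining()) {
            if (in.read(out) < 0) {
                throw new IOException("unexpected end of stream");
            }
        }
        out.flip();
        return out;
    }
}
```

A MessageSerializer variant written against such an interface would not need to know which case it is in; the zero-copy or copying behavior would be decided by whoever constructs the channel.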
[jira] [Updated] (ARROW-3192) [Java] Implement "ArrowBufReadChannel" abstraction and alternate MessageSerializer that uses this
[ https://issues.apache.org/jira/browse/ARROW-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3192: Description: The current MessageSerializer implementation is wasteful when used to read an IPC payload that is already in-memory in an {{ArrowBuf}}. In particular, reads out of a {{ReadChannel}} require memory allocation * https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L569 * https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L290 In C++, we have abstracted memory allocation out of the IPC read path so that zero-copy is possible. I suggest that a similar mechanism can be developed for Java to improve deserialization performance for in-memory messages. The new interface would return {{ArrowBuf}} when performing reads, which could be zero-copy when possible, but when not the current strategy of allocate-copy could be used was: The current MessageSerializer imlementation is wasteful when used to read an IPC payload that is already in-memory in an {{ArrowBuf}}. In particular, reads out of a {{ReadChannel}} require memory allocation * https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L569 * https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L290 In C++, we have abstracted memory allocation out of the IPC read path so that zero-copy is possible. I suggest that a similar mechanism can be developed for Java to improve deserialization performance for in-memory messages. 
The new interface would return {{ArrowBuf}} when performing reads, which could be zero-copy when possible, but when not the current strategy of allocate-copy could be used > [Java] Implement "ArrowBufReadChannel" abstraction and alternate > MessageSerializer that uses this > - > > Key: ARROW-3192 > URL: https://issues.apache.org/jira/browse/ARROW-3192 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Wes McKinney >Priority: Major > Fix For: 0.12.0 > > > The current MessageSerializer implementation is wasteful when used to read an > IPC payload that is already in-memory in an {{ArrowBuf}}. In particular, > reads out of a {{ReadChannel}} require memory allocation > * > https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L569 > * > https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L290 > In C++, we have abstracted memory allocation out of the IPC read path so that > zero-copy is possible. I suggest that a similar mechanism can be developed > for Java to improve deserialization performance for in-memory messages. The > new interface would return {{ArrowBuf}} when performing reads, which could be > zero-copy when possible, but when not the current strategy of allocate-copy > could be used -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.
[ https://issues.apache.org/jira/browse/ARROW-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacques Nadeau updated ARROW-3191: -- Description: Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This is because in many cases we want to be able to support hierarchical accounting of memory and the ability to transfer memory ownership between separate allocators within the same hierarchy. At the same time, there are definitely times where someone might want to map some amount of arbitrary off-heap memory. In these situations they should still be able to use ArrowBuf. I propose we have a new ArrowBuf constructor that takes an input that subclasses an interface similar to: {code} public abstract class Memory { protected final int length; protected final long address; protected abstract void release(); } {code} We then make it so all the memory transfer semantics and accounting behavior are noops for this type of memory. The target of this work will be to make sure that all the fast paths continue to be efficient but some of the other paths like transfer can include a conditional (either directly or through alternative implementations of things like ledger). was: Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This is because in many cases we want to be able to support hierarchical accounting of memory and the ability to transfer memory ownership between separate allocators within the same hierarchy. At the same time, there are definitely times where someone might want to map some amount of arbitrary off-heap memory. In these situations they should still be able to use ArrowBuf. 
I propose we have a new ArrowBuf constructor that takes an input that subclasses an interface similar to: {{public abstract class Memory {}} {{ protected final int length;}} {{ protected final long address;}} {{ protected abstract void release();}} {{}}} We then make it so all the memory transfer semantics and accounting behavior are noops for this type of memory. The target of this work will be to make sure that all the fast paths continue to be efficient but some of the other paths like transfer can include a conditional (either directly or through alternative implementations of things like ledger). > [Java] Add support for ArrowBuf to point to arbitrary memory. > - > > Key: ARROW-3191 > URL: https://issues.apache.org/jira/browse/ARROW-3191 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Jacques Nadeau >Priority: Major > > Right now ArrowBuf can only point to memory managed by an Arrow Allocator. > This is because in many cases we want to be able to support hierarchical > accounting of memory and the ability to transfer memory ownership between > separate allocators within the same hierarchy. > At the same time, there are definitely times where someone might want to map > some amount of arbitrary off-heap memory. In these situations they should > still be able to use ArrowBuf. > I propose we have a new ArrowBuf constructor that takes an input that > subclasses an interface similar to: > {code} > public abstract class Memory { > protected final int length; > protected final long address; > protected abstract void release(); > } > {code} > We then make it so all the memory transfer semantics and accounting behavior > are noops for this type of memory. The target of this work will be to make > sure that all the fast paths continue to be efficient but some of the other > paths like transfer can include a conditional (either directly or through > alternative implementations of things like ledger). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.
[ https://issues.apache.org/jira/browse/ARROW-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacques Nadeau updated ARROW-3191: -- Description: Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This is because in many cases we want to be able to support hierarchical accounting of memory and the ability to transfer memory ownership between separate allocators within the same hierarchy. At the same time, there are definitely times where someone might want to map some amount of arbitrary off-heap memory. In these situations they should still be able to use ArrowBuf. I propose we have a new ArrowBuf constructor that takes an input that subclasses an interface similar to: {{public abstract class Memory {}} {{ protected final int length;}} {{ protected final long address;}} {{ protected abstract void release();}} {{}}} We then make it so all the memory transfer semantics and accounting behavior are noops for this type of memory. The target of this work will be to make sure that all the fast paths continue to be efficient but some of the other paths like transfer can include a conditional (either directly or through alternative implementations of things like ledger). was: Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This is because in many cases we want to be able to support hierarchical accounting of memory and the ability to transfer memory ownership between separate allocators within the same hierarchy. At the same time, there are definitely times where someone might want to map some amount of arbitrary off-heap memory. In these situations they should still be able to use ArrowBuf. 
I propose we have a new ArrowBuf constructor that takes an input that subclasses an interface similar to: {{public abstract class Memory {}} {{ protected final int length;}} {{ protected final long address;}} {{ protected abstract void release();}} {{}}} We then make it so all the memory transfer semantics and accounting behavior are noops for this type of memory. The target of this work will be to make sure that all the fast paths continue to be efficient but some of the other paths like transfer can include a conditional (either directly or through alternative implementations of things like ledger). > [Java] Add support for ArrowBuf to point to arbitrary memory. > - > > Key: ARROW-3191 > URL: https://issues.apache.org/jira/browse/ARROW-3191 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Jacques Nadeau >Priority: Major > > Right now ArrowBuf can only point to memory managed by an Arrow Allocator. > This is because in many cases we want to be able to support hierarchical > accounting of memory and the ability to transfer memory ownership between > separate allocators within the same hierarchy. > At the same time, there are definitely times where someone might want to map > some amount of arbitrary off-heap memory. In these situations they should > still be able to use ArrowBuf. > I propose we have a new ArrowBuf constructor that takes an input that > subclasses an interface similar to: > {{public abstract class Memory {}} > {{ protected final int length;}} > {{ protected final long address;}} > {{ protected abstract void release();}} > {{}}} > We then make it so all the memory transfer semantics and accounting behavior > are noops for this type of memory. The target of this work will be to make > sure that all the fast paths continue to be efficient but some of the other > paths like transfer can include a conditional (either directly or through > alternative implementations of things like ledger). 
[jira] [Updated] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.
[ https://issues.apache.org/jira/browse/ARROW-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacques Nadeau updated ARROW-3191: -- Description: Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This is because in many cases we want to be able to support hierarchical accounting of memory and the ability to transfer memory ownership between separate allocators within the same hierarchy. At the same time, there are definitely times where someone might want to map some amount of arbitrary off-heap memory. In these situations they should still be able to use ArrowBuf. I propose we have a new ArrowBuf constructor that takes an input that subclasses an interface similar to: {{public abstract class Memory {}} {{ protected final int length;}} {{ protected final long address;}} {{protected abstract void release();}} {{}}} We then make it so all the memory transfer semantics and accounting behavior are noops for this type of memory. The target of this work will be to make sure that all the fast paths continue to be efficient but some of the other paths like transfer can include a conditional (either directly or through alternative implementations of things like ledger). was: Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This is because in many cases we want to be able to support hierarchical accounting of memory and the ability to transfer memory ownership between separate allocators within the same hierarchy. At the same time, there are definitely times where someone might want to map some amount of arbitrary off-heap memory. In these situations they should still be able to use ArrowBuf. 
I propose we have a new ArrowBuf constructor that takes an input that subclasses an interface similar to: {{public abstract class Memory {}} {{ protected final int length;}} {{ protected final long address;}} {{ protected abstract void release(); }} {{ }}} We then make it so all the memory transfer semantics and accounting behavior are noops for this type of memory. The target of this work will be to make sure that all the fast paths continue to be efficient but some of the other paths like transfer can include a conditional (either directly or through alternative implementations of things like ledger). > [Java] Add support for ArrowBuf to point to arbitrary memory. > - > > Key: ARROW-3191 > URL: https://issues.apache.org/jira/browse/ARROW-3191 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Jacques Nadeau >Priority: Major > > Right now ArrowBuf can only point to memory managed by an Arrow Allocator. > This is because in many cases we want to be able to support hierarchical > accounting of memory and the ability to transfer memory ownership between > separate allocators within the same hierarchy. > At the same time, there are definitely times where someone might want to map > some amount of arbitrary off-heap memory. In these situations they should > still be able to use ArrowBuf. > I propose we have a new ArrowBuf constructor that takes an input that > subclasses an interface similar to: > {{public abstract class Memory {}} > {{ protected final int length;}} > {{ protected final long address;}} > {{protected abstract void release();}} > {{}}} > We then make it so all the memory transfer semantics and accounting behavior > are noops for this type of memory. The target of this work will be to make > sure that all the fast paths continue to be efficient but some of the other > paths like transfer can include a conditional (either directly or through > alternative implementations of things like ledger). 
[jira] [Updated] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.
[ https://issues.apache.org/jira/browse/ARROW-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacques Nadeau updated ARROW-3191: -- Description: Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This is because in many cases we want to be able to support hierarchical accounting of memory and the ability to transfer memory ownership between separate allocators within the same hierarchy. At the same time, there are definitely times where someone might want to map some amount of arbitrary off-heap memory. In these situations they should still be able to use ArrowBuf. I propose we have a new ArrowBuf constructor that takes an input that subclasses an interface similar to: {{public abstract class Memory {}} {{ protected final int length;}} {{ protected final long address;}} {{ protected abstract void release();}} {{}}} We then make it so all the memory transfer semantics and accounting behavior are noops for this type of memory. The target of this work will be to make sure that all the fast paths continue to be efficient but some of the other paths like transfer can include a conditional (either directly or through alternative implementations of things like ledger). was: Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This is because in many cases we want to be able to support hierarchical accounting of memory and the ability to transfer memory ownership between separate allocators within the same hierarchy. At the same time, there are definitely times where someone might want to map some amount of arbitrary off-heap memory. In these situations they should still be able to use ArrowBuf. 
I propose we have a new ArrowBuf constructor that takes an input that subclasses an interface similar to: {{public abstract class Memory {}} {{ protected final int length;}} {{ protected final long address;}} {{protected abstract void release();}} {{}}} We then make it so all the memory transfer semantics and accounting behavior are noops for this type of memory. The target of this work will be to make sure that all the fast paths continue to be efficient but some of the other paths like transfer can include a conditional (either directly or through alternative implementations of things like ledger). > [Java] Add support for ArrowBuf to point to arbitrary memory. > - > > Key: ARROW-3191 > URL: https://issues.apache.org/jira/browse/ARROW-3191 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Jacques Nadeau >Priority: Major > > Right now ArrowBuf can only point to memory managed by an Arrow Allocator. > This is because in many cases we want to be able to support hierarchical > accounting of memory and the ability to transfer memory ownership between > separate allocators within the same hierarchy. > At the same time, there are definitely times where someone might want to map > some amount of arbitrary off-heap memory. In these situations they should > still be able to use ArrowBuf. > I propose we have a new ArrowBuf constructor that takes an input that > subclasses an interface similar to: > {{public abstract class Memory {}} > {{ protected final int length;}} > {{ protected final long address;}} > {{ protected abstract void release();}} > {{}}} > We then make it so all the memory transfer semantics and accounting behavior > are noops for this type of memory. The target of this work will be to make > sure that all the fast paths continue to be efficient but some of the other > paths like transfer can include a conditional (either directly or through > alternative implementations of things like ledger). 
[jira] [Updated] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.
[ https://issues.apache.org/jira/browse/ARROW-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacques Nadeau updated ARROW-3191: -- Description: Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This is because in many cases we want to be able to support hierarchical accounting of memory and the ability to transfer memory ownership between separate allocators within the same hierarchy. At the same time, there are definitely times where someone might want to map some amount of arbitrary off-heap memory. In these situations they should still be able to use ArrowBuf. I propose we have a new ArrowBuf constructor that takes an input that subclasses an interface similar to: {{public abstract class Memory {}} {{ protected final int length;}} {{ protected final long address;}} {{ protected abstract void release(); }} {{ }}} We then make it so all the memory transfer semantics and accounting behavior are noops for this type of memory. The target of this work will be to make sure that all the fast paths continue to be efficient but some of the other paths like transfer can include a conditional (either directly or through alternative implementations of things like ledger). was: Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This is because in many cases we want to be able to support hierarchical accounting of memory and the ability to transfer memory ownership between separate allocators within the same hierarchy. At the same time, there are definitely times where someone might want to map some amount of arbitrary off-heap memory. In these situations they should still be able to use ArrowBuf. 
I propose we have a new ArrowBuf constructor that takes an input that subclasses an interface similar to: public abstract class Memory { protected final int length; protected final long address; protected abstract void release(); } We then make it so all the memory transfer semantics and accounting behavior are noops for this type of memory. The target of this work will be to make sure that all the fast paths continue to be efficient but some of the other paths like transfer can include a conditional (either directly or through alternative implementations of things like ledger). > [Java] Add support for ArrowBuf to point to arbitrary memory. > - > > Key: ARROW-3191 > URL: https://issues.apache.org/jira/browse/ARROW-3191 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Jacques Nadeau >Priority: Major > > Right now ArrowBuf can only point to memory managed by an Arrow Allocator. > This is because in many cases we want to be able to support hierarchical > accounting of memory and the ability to transfer memory ownership between > separate allocators within the same hierarchy. > At the same time, there are definitely times where someone might want to map > some amount of arbitrary off-heap memory. In these situations they should > still be able to use ArrowBuf. > I propose we have a new ArrowBuf constructor that takes an input that > subclasses an interface similar to: > {{public abstract class Memory {}} > {{ protected final int length;}} > {{ protected final long address;}} > {{ protected abstract void release(); }} > {{ }}} > We then make it so all the memory transfer semantics and accounting behavior > are noops for this type of memory. The target of this work will be to make > sure that all the fast paths continue to be efficient but some of the other > paths like transfer can include a conditional (either directly or through > alternative implementations of things like ledger). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.
Jacques Nadeau created ARROW-3191: - Summary: [Java] Add support for ArrowBuf to point to arbitrary memory. Key: ARROW-3191 URL: https://issues.apache.org/jira/browse/ARROW-3191 Project: Apache Arrow Issue Type: New Feature Reporter: Jacques Nadeau Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This is because in many cases we want to be able to support hierarchical accounting of memory and the ability to transfer memory ownership between separate allocators within the same hierarchy. At the same time, there are definitely times where someone might want to map some amount of arbitrary off-heap memory. In these situations they should still be able to use ArrowBuf. I propose we have a new ArrowBuf constructor that takes an input that subclasses an abstract class similar to:
{code:java}
public abstract class Memory {
  protected final int length;
  protected final long address;

  protected Memory(long address, int length) {
    this.address = address;
    this.length = length;
  }

  // Hook for freeing the underlying off-heap memory.
  protected abstract void release();
}
{code}
We then make it so all the memory transfer semantics and accounting behavior are no-ops for this type of memory. The target of this work will be to make sure that all the fast paths continue to be efficient but some of the other paths like transfer can include a conditional (either directly or through alternative implementations of things like ledger). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
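The proposed split between accounted and foreign memory can be modelled in a few lines. The sketch below is written in Python rather than Java and is purely illustrative: `Memory`, `Allocator`, `ArrowBufSketch`, `transfer_to` and `close` are hypothetical names, not Arrow APIs. Buffers backed by an allocator are accounted for and can move between allocators, while buffers wrapping foreign memory skip accounting entirely, so transfer becomes a no-op and closing only calls `release()`.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Memory(ABC):
    """Hypothetical analogue of the proposed Java `Memory` class."""

    def __init__(self, address: int, length: int) -> None:
        self.address = address
        self.length = length

    @abstractmethod
    def release(self) -> None:
        """Free the underlying off-heap memory."""


class Allocator:
    """Toy allocator (hypothetical) that only tracks accounted bytes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.allocated = 0


class ArrowBufSketch:
    """Hypothetical buffer backed either by an allocator or by foreign Memory."""

    def __init__(self, length: int, allocator: Optional[Allocator] = None,
                 foreign: Optional[Memory] = None) -> None:
        if (allocator is None) == (foreign is None):
            raise ValueError("pass exactly one of allocator or foreign")
        self.length = length
        self.allocator = allocator
        self.foreign = foreign
        if allocator is not None:
            allocator.allocated += length  # normal accounting path

    def transfer_to(self, target: Allocator) -> None:
        # The conditional from the proposal: foreign memory is never
        # accounted for, so ownership transfer does nothing.
        if self.foreign is not None:
            return
        self.allocator.allocated -= self.length
        target.allocated += self.length
        self.allocator = target

    def close(self) -> None:
        if self.foreign is not None:
            self.foreign.release()  # hand the memory back to its owner
        else:
            self.allocator.allocated -= self.length


class FakeForeignMemory(Memory):
    """Stand-in for memory mapped by some other library."""

    def __init__(self, address: int, length: int) -> None:
        super().__init__(address, length)
        self.released = False

    def release(self) -> None:
        self.released = True


root, child = Allocator("root"), Allocator("child")
owned = ArrowBufSketch(64, allocator=root)
owned.transfer_to(child)  # accounting moves from root to child
foreign_mem = FakeForeignMemory(address=0x1000, length=128)
mapped = ArrowBufSketch(128, foreign=foreign_mem)
mapped.transfer_to(child)  # no-op for foreign memory
mapped.close()             # only calls release()
```

Keeping the check inside the slow paths (`transfer_to`, `close`) leaves the accounted fast path untouched, which matches the goal stated in the issue.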
[jira] [Commented] (ARROW-1956) [Python] Support reading specific partitions from a partitioned parquet dataset
[ https://issues.apache.org/jira/browse/ARROW-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607546#comment-16607546 ] Ying Wang commented on ARROW-1956: -- I don't know if this is helpful to people, but I found myself needing to ingest an entire Parquet dataset at once (at a database company) and I came up with this:
```python
import pyarrow.parquet as pq

dataset = pq.ParquetDataset('/path/to/dataset')
# ParquetDataset is composed of a list of ParquetDatasetPieces
dataset_pieces = dataset.pieces
for dataset_piece in dataset_pieces:
    # dataset.partitions is a ParquetPartitions object
    df = dataset_piece.read(partitions=dataset.partitions).to_pandas()
    # do whatever with dataframe
```
It'll be slow, but you can parallelize it however you want, and each dataframe will contain the full dataset schema (as opposed to reading each file with ParquetFile, which does not include the partition keys in the schema). > [Python] Support reading specific partitions from a partitioned parquet > dataset > --- > > Key: ARROW-1956 > URL: https://issues.apache.org/jira/browse/ARROW-1956 > Project: Apache Arrow > Issue Type: Improvement > Components: Format >Affects Versions: 0.8.0 > Environment: Kernel: 4.14.8-300.fc27.x86_64 > Python: 3.6.3 >Reporter: Suvayu Ali >Priority: Minor > Labels: parquet > Fix For: 0.11.0 > > Attachments: so-example.py > > > I want to read specific partitions from a partitioned parquet dataset. This > is very useful in case of large datasets. I have attached a small script > that creates a dataset and shows what is expected when reading (quoting > salient points below).
> # There is no way to read specific partitions in Pandas > # In pyarrow I tried to achieve the goal by providing a list of > files/directories to ParquetDataset, but it didn't work: > # In PySpark it works if I simply do: > {code:none} > spark.read.option('basePath', 'datadir').parquet(*list_of_partitions) > {code} > I also couldn't find a way to easily write partitioned parquet files. In the > end I did it by hand by creating the directory hierarchies, and writing the > individual files myself (similar to the implementation in the attached > script). Again, in PySpark I can do > {code:none} > df.write.partitionBy(*list_of_partitions).parquet(output) > {code} > to achieve that. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
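Until selecting partitions is supported directly, one workaround is to choose the files yourself before reading. The sketch below assumes the Hive-style `key=value` directory layout that Spark writes; `parse_partition_keys` and `select_partitions` are hypothetical helpers and the file paths are made up. A file read on its own still lacks its partition keys as columns, as the comment above points out.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Sequence


def parse_partition_keys(path: str) -> Dict[str, str]:
    """Extract {key: value} from Hive-style 'key=value' directory names."""
    keys = {}
    for part in PurePosixPath(path).parent.parts:
        if "=" in part:
            key, _, value = part.partition("=")
            keys[key] = value
    return keys


def select_partitions(paths: Iterable[str],
                      wanted: Dict[str, Sequence[str]]) -> List[str]:
    """Keep only files whose partition values all fall in the wanted sets."""
    selected = []
    for path in paths:
        keys = parse_partition_keys(path)
        # A missing key yields None, which is never in an allowed set.
        if all(keys.get(k) in allowed for k, allowed in wanted.items()):
            selected.append(path)
    return selected


files = [
    "datadir/year=2017/month=1/part-0.parquet",
    "datadir/year=2017/month=2/part-0.parquet",
    "datadir/year=2018/month=1/part-0.parquet",
]
chosen = select_partitions(files, {"year": ["2017"], "month": ["1", "2"]})
```

Partition values are compared as strings because that is how they appear in directory names.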
[jira] [Created] (ARROW-3189) [Python] Support seek(...) on writable files that support it
Wes McKinney created ARROW-3189: --- Summary: [Python] Support seek(...) on writable files that support it Key: ARROW-3189 URL: https://issues.apache.org/jira/browse/ARROW-3189 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Wes McKinney Fix For: 0.11.0 See relevant mailing list discussion https://lists.apache.org/thread.html/67fc945fa01b7cf682a241f36de09fe495b84b119868dd7c9f8168ba@%3Cdev.arrow.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
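A common reason to seek on a writable file is to back-patch a header once the payload size is known. The sketch below shows that pattern with the standard library's seekable `io.BytesIO`; it illustrates the kind of usage at stake and is not pyarrow code. The helper name `write_with_length_header` and the 4-byte little-endian length header are arbitrary choices for the example.

```python
import io
import struct
from typing import BinaryIO, Iterable


def write_with_length_header(sink: BinaryIO, chunks: Iterable[bytes]) -> int:
    """Hypothetical helper: write a placeholder header, stream the chunks,
    then seek back and fill in the real payload length."""
    if not sink.seekable():
        raise io.UnsupportedOperation("sink must support seek()")
    header_pos = sink.tell()
    sink.write(struct.pack("<I", 0))      # placeholder length
    total = 0
    for chunk in chunks:                  # size unknown until the end
        total += sink.write(chunk)
    end_pos = sink.tell()
    sink.seek(header_pos)                 # jump back to the header...
    sink.write(struct.pack("<I", total))  # ...and patch in the real length
    sink.seek(end_pos)                    # resume at the end for more writes
    return total


buf = io.BytesIO()
n = write_with_length_header(buf, [b"abc", b"defg"])
data = buf.getvalue()
```

The `seekable()` check mirrors the issue's "files that support it": sinks such as sockets or pipes cannot be rewound, so the helper refuses them up front.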
[jira] [Updated] (ARROW-3188) [Python] Table.from_arrays segfaults if lists and schema are passed
[ https://issues.apache.org/jira/browse/ARROW-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3188: -- Labels: pull-request-available (was: ) > [Python] Table.from_arrays segfaults if lists and schema are passed > --- > > Key: ARROW-3188 > URL: https://issues.apache.org/jira/browse/ARROW-3188 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > > {code:python} > data = [ > list(range(5)), > [-10, -5, 0, 5, 10] > ] > schema = pa.schema([ > pa.field('a', pa.uint16()), > pa.field('b', pa.int64()) > ]) > pa.Table.from_arrays(data, schema=schema) > {code} > This segfaults, whereas it should raise a `TypeError`. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3188) [Python] Table.from_arrays segfaults if lists and schema are passed
Krisztian Szucs created ARROW-3188: -- Summary: [Python] Table.from_arrays segfaults if lists and schema are passed Key: ARROW-3188 URL: https://issues.apache.org/jira/browse/ARROW-3188 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Krisztian Szucs Assignee: Krisztian Szucs Fix For: 0.11.0
{code:python}
import pyarrow as pa

data = [
    list(range(5)),
    [-10, -5, 0, 5, 10]
]
schema = pa.schema([
    pa.field('a', pa.uint16()),
    pa.field('b', pa.int64())
])
pa.Table.from_arrays(data, schema=schema)
{code}
This segfaults, whereas it should raise a `TypeError`. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
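The expected behaviour amounts to validating the inputs before they reach native code. The sketch below is a hypothetical pure-Python guard (`check_from_arrays_inputs` is not a pyarrow function, and the real fix would live inside pyarrow itself). It rejects plain Python lists and tuples with a `TypeError` that says how to convert them.

```python
from typing import Any, Sequence


def check_from_arrays_inputs(arrays: Sequence[Any]) -> None:
    """Hypothetical guard: raise TypeError for plain Python sequences."""
    for i, column in enumerate(arrays):
        if isinstance(column, (list, tuple)):
            raise TypeError(
                f"column {i}: expected an Arrow array, got "
                f"{type(column).__name__}; convert it with pa.array(...) first"
            )


data = [list(range(5)), [-10, -5, 0, 5, 10]]
try:
    check_from_arrays_inputs(data)
except TypeError as exc:
    message = str(exc)
```

Failing fast with a descriptive exception is preferable to a segfault because Python callers can catch and handle it.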
[jira] [Updated] (ARROW-3187) [Plasma] Make Plasma Log pluggable with glog
[ https://issues.apache.org/jira/browse/ARROW-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3187: -- Labels: pull-request-available (was: ) > [Plasma] Make Plasma Log pluggable with glog > > > Key: ARROW-3187 > URL: https://issues.apache.org/jira/browse/ARROW-3187 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Yuhong Guo >Assignee: Yuhong Guo >Priority: Major > Labels: pull-request-available > > Make Plasma logging pluggable with glog using a macro. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3187) [Plasma] Make Plasma Log pluggable with glog
[ https://issues.apache.org/jira/browse/ARROW-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuhong Guo updated ARROW-3187: -- Summary: [Plasma] Make Plasma Log pluggable with glog (was: [Plasma] Change Logging to glog) > [Plasma] Make Plasma Log pluggable with glog > > > Key: ARROW-3187 > URL: https://issues.apache.org/jira/browse/ARROW-3187 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Yuhong Guo >Assignee: Yuhong Guo >Priority: Major > > Make Plasma logging pluggable with glog using a macro. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3187) [Plasma] Change Logging to glog
Yuhong Guo created ARROW-3187: - Summary: [Plasma] Change Logging to glog Key: ARROW-3187 URL: https://issues.apache.org/jira/browse/ARROW-3187 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Yuhong Guo Assignee: Yuhong Guo Make Plasma logging pluggable with glog using a macro. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-2948) [Packaging] Generate changelog with crossbow
[ https://issues.apache.org/jira/browse/ARROW-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-2948. Resolution: Fixed Fix Version/s: 0.11.0 Issue resolved by pull request 2348 [https://github.com/apache/arrow/pull/2348] > [Packaging] Generate changelog with crossbow > > > Key: ARROW-2948 > URL: https://issues.apache.org/jira/browse/ARROW-2948 > Project: Apache Arrow > Issue Type: Sub-task > Components: Packaging >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Basically a port of > https://github.com/apache/arrow/blob/master/dev/release/changelog.py -- This message was sent by Atlassian JIRA (v7.6.3#76005)
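At its core, a changelog generator groups issues by type and renders each group as a list. The sketch below is hypothetical: the `Issue` record, the section order and the Markdown layout are assumptions and do not reproduce the output of `dev/release/changelog.py` or crossbow. The sample issues are taken from this thread.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Issue:
    key: str         # e.g. "ARROW-3188"
    issue_type: str  # e.g. "Bug", "Improvement"
    summary: str


def issue_number(issue: Issue) -> int:
    # Sort numerically so that ARROW-999 comes before ARROW-1000.
    return int(issue.key.split("-")[1])


def render_changelog(version: str, issues: Iterable[Issue]) -> str:
    """Hypothetical renderer: one Markdown section per issue type."""
    by_type: Dict[str, List[Issue]] = defaultdict(list)
    for issue in issues:
        by_type[issue.issue_type].append(issue)
    lines = [f"# Apache Arrow {version}", ""]
    for issue_type in sorted(by_type):
        lines += [f"## {issue_type}", ""]
        for issue in sorted(by_type[issue_type], key=issue_number):
            lines.append(f"* {issue.key} - {issue.summary}")
        lines.append("")
    return "\n".join(lines)


issues = [
    Issue("ARROW-3188", "Bug",
          "[Python] Table.from_arrays segfaults if lists and schema are passed"),
    Issue("ARROW-3189", "Improvement",
          "[Python] Support seek(...) on writable files that support it"),
    Issue("ARROW-3177", "Improvement",
          "[Rust] Update expected error messages for tests that 'should panic'"),
]
text = render_changelog("0.11.0", issues)
```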
[jira] [Updated] (ARROW-2709) [Python] write_to_dataset poor performance when splitting
[ https://issues.apache.org/jira/browse/ARROW-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-2709: --- Description: Hello, Posting this from github (master [~wesmckinn] asked for it :) ) [https://github.com/apache/arrow/issues/2138]
{code:python}
import pandas as pd
import numpy as np
import pyarrow.parquet as pq
import pyarrow as pa

idx = pd.date_range('2017-01-01 12:00:00.000', '2017-03-01 12:00:00.000', freq='T')
df = pd.DataFrame({'numeric_col': np.random.rand(len(idx)),
                   'string_col': pd.util.testing.rands_array(8, len(idx))},
                  index=idx)
{code}
{code:python}
df["dt"] = df.index
df["dt"] = df["dt"].dt.date
table = pa.Table.from_pandas(df)
pq.write_to_dataset(table, root_path='dataset_name', partition_cols=['dt'], flavor='spark')
{code}
{{this works but is inefficient memory-wise. The arrow table is a copy of the large pandas dataframe and quickly saturates the RAM.}} {{Thanks!}} was: Hello, Posting this from github (master [~wesmckinn] asked for it :) ) https://github.com/apache/arrow/issues/2138 {code:python} import pandas as pd import numpy as np import pyarrow.parquet as pq import pyarrow as pa idx = pd.date_range('2017-01-01 12:00:00.000', '2017-03-01 12:00:00.000', freq = 'T') df = pd.DataFrame({'numeric_col' : np.random.rand(len(idx)), 'string_col' : pd.util.testing.rands_array(8,len(idx))}, index = idx){code} {code:python} df["dt"] = df.index df["dt"] = df["dt"].dt.date table = pa.Table.from_pandas(df) pq.write_to_dataset(table, root_path='dataset_name', partition_cols=['dt'], flavor='spark'){code} {{this works but is inefficient memory-wise. The arrow table is a copy of the large pandas dataframe and quickly saturates the RAM.}} {{Thanks!}} > [Python] write_to_dataset poor performance when splitting > - > > Key: ARROW-2709 > URL: https://issues.apache.org/jira/browse/ARROW-2709 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Olaf >Priority: Critical > Labels: parquet > > Hello, > Posting this from github (master [~wesmckinn] asked for it :) ) > [https://github.com/apache/arrow/issues/2138] > > {code:python} > import pandas as pd > import numpy as np > import pyarrow.parquet as pq > import pyarrow as pa > idx = pd.date_range('2017-01-01 12:00:00.000', '2017-03-01 12:00:00.000', > freq = 'T') > df = pd.DataFrame({'numeric_col' : np.random.rand(len(idx)), > 'string_col' : > pd.util.testing.rands_array(8,len(idx))}, > index = idx){code} > > {code:python} > df["dt"] = df.index > df["dt"] = df["dt"].dt.date > table = pa.Table.from_pandas(df) > pq.write_to_dataset(table, root_path='dataset_name', partition_cols=['dt'], > flavor='spark'){code} > > {{this works but is inefficient memory-wise. 
The arrow table is a copy of the > large pandas dataframe and quickly saturates the RAM.}} > > {{Thanks!}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
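One general way to bound peak memory when splitting is to record only the row indices for each partition key in a single pass, then materialize and write one partition at a time. The sketch below illustrates that technique on plain Python lists. `iter_partitions` is a hypothetical helper, not how `write_to_dataset` is implemented. With real tables, the index lists would be used to select rows from the Arrow table rather than in a list comprehension.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def iter_partitions(
    columns: Dict[str, List], partition_col: str
) -> Iterator[Tuple[object, Dict[str, List]]]:
    """Hypothetical helper: yield (key, columns) one partition at a time."""
    # One pass over the key column; only integer row indices are stored.
    row_ids: Dict[object, List[int]] = defaultdict(list)
    for i, key in enumerate(columns[partition_col]):
        row_ids[key].append(i)
    for key, ids in row_ids.items():
        # Build just this partition. The partition column is dropped because
        # Hive-style layouts encode its value in the directory name.
        yield key, {
            name: [values[i] for i in ids]
            for name, values in columns.items()
            if name != partition_col
        }


columns = {
    "dt": ["2017-01-01", "2017-01-02", "2017-01-01"],
    "numeric_col": [0.1, 0.2, 0.3],
}
for key, part in iter_partitions(columns, "dt"):
    # A real writer would write `part` under dataset_name/dt=<key>/ here,
    # before the next partition is built.
    print(key, part)
```

Because the function is a generator, each partition can be written and discarded before the next one is built. Peak memory is then roughly the input plus the largest single partition, instead of the input plus a full copy.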
[jira] [Resolved] (ARROW-3177) [Rust] Update expected error messages for tests that 'should panic'
[ https://issues.apache.org/jira/browse/ARROW-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-3177. Resolution: Fixed Fix Version/s: 0.11.0 Issue resolved by pull request 2519 [https://github.com/apache/arrow/pull/2519] > [Rust] Update expected error messages for tests that 'should panic' > --- > > Key: ARROW-3177 > URL: https://issues.apache.org/jira/browse/ARROW-3177 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Paddy Horan >Assignee: Paddy Horan >Priority: Trivial > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)