[jira] [Commented] (ARROW-2457) garrow_array_builder_append_values() won't work for large arrays

2018-04-14 Thread Kouhei Sutou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438555#comment-16438555
 ] 

Kouhei Sutou commented on ARROW-2457:
-

Do you have a program that reproduces this case?

> garrow_array_builder_append_values() won't work for large arrays
> 
>
> Key: ARROW-2457
> URL: https://issues.apache.org/jira/browse/ARROW-2457
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C, C++, GLib
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Haralampos Gavriilidis
>Assignee: Kouhei Sutou
>Priority: Major
>
> I am using garrow_array_builder_append_values() to transform a native C array 
> to an Arrow array, without calling arrow_array_builder_append multiple times. 
> When calling garrow_array_builder_append_values() in array-builder.cpp with 
> the following signature:
> {code:java}
> garrow_array_builder_append_values(GArrowArrayBuilder *builder,
> const VALUE *values,
> gint64 values_length,
> const gboolean *is_valids,
> gint64 is_valids_length,
> GError **error,
> const gchar *context)
> {code}
> it will fail for large arrays. This is probably happening because the 
> is_valids array is copied to the valid_bytes array (of a different type), for 
> which the memory is allocated on the stack rather than on the heap, as shown 
> in the snippet below:
> {code:java}
> uint8_t valid_bytes[is_valids_length];
> for (gint64 i = 0; i < is_valids_length; ++i){ 
>   valid_bytes[i] = is_valids[i]; 
> }
> {code}
> A way to avoid this problem would be to allocate memory for the valid_bytes 
> array using malloc() or something similar. Is this behavior intended, maybe 
> because no large arrays should be handed over to that function, or is it 
> rather a bug?
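
The heap allocation suggested in the report could look roughly like the sketch below. It is not the actual Arrow GLib code: `gboolean` and `gint64` are aliased locally so the snippet compiles without GLib headers, and `to_valid_bytes` is a hypothetical helper name chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: local stand-ins for the GLib typedefs (no <glib.h> needed here).
using gboolean = int;
using gint64 = std::int64_t;

// Hypothetical helper: copy the gboolean flags into a heap-backed byte buffer.
// std::vector keeps its elements on the heap, so a large is_valids_length does
// not consume stack space the way a variable-length array does.
std::vector<std::uint8_t> to_valid_bytes(const gboolean *is_valids,
                                         gint64 is_valids_length) {
  assert(is_valids_length >= 0);
  std::vector<std::uint8_t> valid_bytes(
      static_cast<std::size_t>(is_valids_length));
  for (gint64 i = 0; i < is_valids_length; ++i) {
    // Normalize any non-zero gboolean to 1.
    valid_bytes[static_cast<std::size_t>(i)] = is_valids[i] ? 1 : 0;
  }
  return valid_bytes;
}
```

The resulting `valid_bytes.data()` pointer could then be passed wherever the stack array was used, and the buffer is released automatically when the vector goes out of scope.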



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-2457) garrow_array_builder_append_values() won't work for large arrays

2018-04-14 Thread Kouhei Sutou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-2457:
---

Assignee: Kouhei Sutou



[jira] [Commented] (ARROW-2458) [Plasma] PlasmaClient uses global variable

2018-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438507#comment-16438507
 ] 

ASF GitHub Bot commented on ARROW-2458:
---

robertnishihara commented on issue #1893: ARROW-2458: [Plasma] Use one thread 
pool per PlasmaClient
URL: https://github.com/apache/arrow/pull/1893#issuecomment-381361348
 
 
   Ah, the issue is that the current usage of the global thread pool variable 
is not thread safe, so if two clients in the same process try to `Seal` an 
object at the same time, it can segfault.
   
   This fixes that issue.
   
   I agree that there's no reason to preserve a detrimental behavior (I brought 
that up because we may need to solve the problem in a different way down the 
road since it will still exist for multiple processes).
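
To illustrate the difference between a global pool and one pool per client, here is a minimal sketch. It assumes the pool is a plain vector of `std::thread` slots that a parallel copy refills on every call, which is my reading of the problem rather than something stated in this thread; `Client` and `ParallelCopy` are hypothetical names, not the real `PlasmaClient` API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical stand-in for a client; not the real plasma::PlasmaClient.
class Client {
 public:
  explicit Client(std::size_t num_threads) : threads_(num_threads) {
    assert(num_threads > 0);
  }

  // Copy nbytes from src to dst, splitting the range across this client's own
  // thread slots. Nothing here is shared with other Client instances, so two
  // different clients may call this at the same time. Calling it concurrently
  // on the same Client would still be a data race.
  void ParallelCopy(std::uint8_t *dst, const std::uint8_t *src,
                    std::size_t nbytes) {
    const std::size_t n = threads_.size();
    const std::size_t chunk = nbytes / n;
    for (std::size_t i = 0; i < n; ++i) {
      const std::size_t begin = i * chunk;
      // The last thread also copies the remainder.
      const std::size_t len = (i + 1 == n) ? nbytes - begin : chunk;
      threads_[i] = std::thread(
          [=] { std::memcpy(dst + begin, src + begin, len); });
    }
    for (auto &t : threads_) t.join();
  }

 private:
  std::vector<std::thread> threads_;  // owned per client instead of a global
};
```

If `threads_` were a single global vector instead, two clients copying at the same time would assign to and join the same `std::thread` objects, which is the kind of race that can crash the process.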


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Plasma] PlasmaClient uses global variable
> --
>
> Key: ARROW-2458
> URL: https://issues.apache.org/jira/browse/ARROW-2458
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Affects Versions: 0.9.0
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
>
> The threadpool threadpool_ that PlasmaClient is using is global at the 
> moment. This prevents us from using multiple PlasmaClients in the same 
> process (one per thread).





[jira] [Commented] (ARROW-2458) [Plasma] PlasmaClient uses global variable

2018-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438506#comment-16438506
 ] 

ASF GitHub Bot commented on ARROW-2458:
---

robertnishihara commented on issue #1893: ARROW-2458: [Plasma] Use one thread 
pool per PlasmaClient
URL: https://github.com/apache/arrow/pull/1893#issuecomment-381361348
 
 
   Ah, the issue is that the current usage of the global thread pool variable 
is not thread safe, so if two clients in the same process try to `Seal` an 
object at the same time, it can segfault.
   
   This fixes that issue.
   
   I agree that there's no reason to preserve a detrimental behavior (I 
brought that up because we may need to solve the problem in a different way 
down the road since it will still exist for multiple processes).




[jira] [Commented] (ARROW-2458) [Plasma] PlasmaClient uses global variable

2018-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438504#comment-16438504
 ] 

ASF GitHub Bot commented on ARROW-2458:
---

pitrou commented on issue #1893: ARROW-2458: [Plasma] Use one thread pool per 
PlasmaClient
URL: https://github.com/apache/arrow/pull/1893#issuecomment-381360963
 
 
   > @pitrou the problem that you point exists regardless because we already 
give each client its own thread pool when the clients are isolated in different 
processes, and this preserves the behavior when the clients are in the same 
process.
   
   I don't think preserving a behavior which is clearly detrimental is a good 
idea. It's true that when several processes have a thread pool each, there can 
be excessive resource consumption, but that's not a reason to reproduce the 
problem inside a single process as well.
   
   By the way, it's not obvious to me what the issue is we are trying to solve. 
The JIRA issue says "This prevents us from using multiple PlasmaClients in the 
same process (one per thread)", but I don't understand why the current thread 
pool policy prevents that. Your clients will simply be sharing a single thread 
pool.




[jira] [Commented] (ARROW-2458) [Plasma] PlasmaClient uses global variable

2018-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438503#comment-16438503
 ] 

ASF GitHub Bot commented on ARROW-2458:
---

robertnishihara commented on issue #1893: ARROW-2458: [Plasma] Use one thread 
pool per PlasmaClient
URL: https://github.com/apache/arrow/pull/1893#issuecomment-381360785
 
 
   I think the approach in this PR makes sense. @pitrou the problem that you 
point out exists regardless, because we already give each client its own thread 
pool when the clients are isolated in different processes, and this preserves 
that behavior when the clients are in the same process.
   
   One thing that could help (though I would prefer doing it in a separate PR) 
is to let the user pass in the thread pool size to the constructor.




[jira] [Resolved] (ARROW-2397) Document changes in Tensor encoding in IPC.md.

2018-04-14 Thread Robert Nishihara (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Nishihara resolved ARROW-2397.
-
   Resolution: Fixed
Fix Version/s: JS-0.4.0

Issue resolved by pull request 1837
[https://github.com/apache/arrow/pull/1837]

> Document changes in Tensor encoding in IPC.md.
> --
>
> Key: ARROW-2397
> URL: https://issues.apache.org/jira/browse/ARROW-2397
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Robert Nishihara
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.0
>
>
> Update IPC.md to reflect the changes in 
> https://github.com/apache/arrow/pull/1802.





[jira] [Commented] (ARROW-2397) Document changes in Tensor encoding in IPC.md.

2018-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438501#comment-16438501
 ] 

ASF GitHub Bot commented on ARROW-2397:
---

robertnishihara closed pull request #1837: ARROW-2397: [Documentation] Update 
format documentation to describe tensor alignment.
URL: https://github.com/apache/arrow/pull/1837
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/format/IPC.md b/format/IPC.md
index 5a5d3aef6..1c8532377 100644
--- a/format/IPC.md
+++ b/format/IPC.md
@@ -240,8 +240,9 @@ tools. Arrow implementations in general are not required to 
implement this data
 format, though we provide a reference implementation in C++.
 
 When writing a standalone encapsulated tensor message, we use the format as
-indicated above, but additionally align the starting offset (if writing to a
-shared memory region) to be a multiple of 8:
+indicated above, but additionally align the starting offset of the metadata as
+well as the starting offset of the tensor body (if writing to a shared memory
+region) to be multiples of 64 bytes:
 
 ```
 


 




> Document changes in Tensor encoding in IPC.md.
> --
>
> Key: ARROW-2397
> URL: https://issues.apache.org/jira/browse/ARROW-2397
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Robert Nishihara
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.0
>
>
> Update IPC.md to reflect the changes in 
> https://github.com/apache/arrow/pull/1802.
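
The documentation change in the diff above says that the metadata and the tensor body each start at offsets that are multiples of 64 bytes. A minimal sketch of that rounding follows; `AlignUp`, `PaddingFor` and `kTensorAlignment` are hypothetical names for illustration, not functions from the Arrow code base.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: alignment value taken from the updated IPC.md text.
constexpr std::int64_t kTensorAlignment = 64;

// Round offset up to the next multiple of alignment.
// alignment must be a positive power of two for the bit trick to be valid.
inline std::int64_t AlignUp(std::int64_t offset,
                            std::int64_t alignment = kTensorAlignment) {
  assert(offset >= 0);
  assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
  return (offset + alignment - 1) & ~(alignment - 1);
}

// Number of padding bytes a writer would emit before the next aligned offset.
inline std::int64_t PaddingFor(std::int64_t offset,
                               std::int64_t alignment = kTensorAlignment) {
  return AlignUp(offset, alignment) - offset;
}
```

For example, a writer positioned at offset 65 would pad up to 128 before writing, while a writer already at offset 64 would not pad at all.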





[jira] [Created] (ARROW-2460) [Rust] Schema and DataType::Struct should use Vec<Rc<Field>>

2018-04-14 Thread Andy Grove (JIRA)
Andy Grove created ARROW-2460:
-

 Summary: [Rust] Schema and DataType::Struct should use 
Vec<Rc<Field>>
 Key: ARROW-2460
 URL: https://issues.apache.org/jira/browse/ARROW-2460
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andy Grove
 Fix For: 0.10.0


Currently we use Vec<Field> instead of Vec<Rc<Field>>, which results in 
having to clone fields in some use cases, which could be expensive for structs.





[jira] [Commented] (ARROW-2435) [Rust] Add memory pool abstraction.

2018-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438394#comment-16438394
 ] 

ASF GitHub Bot commented on ARROW-2435:
---

pitrou commented on issue #1875: ARROW-2435: [Rust] Add memory pool abstraction.
URL: https://github.com/apache/arrow/pull/1875#issuecomment-381338090
 
 
   I'll let @xhochy handle it, since he commented on the PR.




> [Rust] Add memory pool abstraction.
> ---
>
> Key: ARROW-2435
> URL: https://issues.apache.org/jira/browse/ARROW-2435
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.9.0
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
>
> Add a memory pool abstraction like the one in the C++ API.





[jira] [Commented] (ARROW-2459) pyarrow: Segfault with pyarrow.deserialize_pandas

2018-04-14 Thread EmericP (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438389#comment-16438389
 ] 

EmericP commented on ARROW-2459:


I can reproduce the issue easily on both Linux and MacOS. The segfault happens 
in libarrow:
{noformat}
==20185== Process terminating with default action of signal 11 (SIGSEGV)
==20185==  Bad permissions for mapped region at address 0x536E696
==20185==    at 0xB7B36A6: 
arrow::ipc::Message::ReadFrom(std::shared_ptr const&, 
arrow::io::InputStream*, std::unique_ptr*) (in 
/usr/lib/python3.5/site-packages/pyarrow/libarrow.so.0)
==20185==    by 0xB7B4490: arrow::ipc::ReadMessage(arrow::io::InputStream*, 
std::unique_ptr*) (in /usr/lib/python3.5/site-packages/pyarrow/libarrow.so.0)
==20185==    by 0xB7B5A0C: 
arrow::ipc::InputStreamMessageReader::ReadNextMessage(std::unique_ptr*) (in 
/usr/lib/python3.5/site-packages/pyarrow/libarrow.so.0)
==20185==    by 0xB7BDF41: 
arrow::ipc::ReadMessageAndValidate(arrow::ipc::MessageReader*, 
arrow::ipc::Message::Type, bool, std::unique_ptr*) [clone .constprop.261] (in 
/usr/lib/python3.5/site-packages/pyarrow/libarrow.so.0)
==20185==    by 0xB7C69E0: 
arrow::ipc::RecordBatchStreamReader::RecordBatchStreamReaderImpl::ReadSchema() 
(in /usr/lib/python3.5/site-packages/pyarrow/libarrow.so.0)
==20185==    by 0xB7C0EB5: 
arrow::ipc::RecordBatchStreamReader::Open(std::unique_ptr, 
std::shared_ptr*) (in 
/usr/lib/python3.5/site-packages/pyarrow/libarrow.so.0)
==20185==    by 0xB7C0FB3: 
arrow::ipc::RecordBatchStreamReader::Open(arrow::io::InputStream*, 
std::shared_ptr*) (in 
/usr/lib/python3.5/site-packages/pyarrow/libarrow.so.0)
==20185==    by 0xB3770C7: 
__pyx_pw_7pyarrow_3lib_18_RecordBatchReader_3_open(_object*, _object*) (in 
/usr/lib/python3.5/site-packages/pyarrow/lib.cpython-35m-x86_64-linux-gnu.so)
==20185==    by 0x288CAB: PyEval_EvalFrameEx (in /usr/bin/python3)
==20185==    by 0x28E0DE: PyEval_EvalCodeEx (in /usr/bin/python3)
==20185==    by 0x2CA5D2: ??? (in /usr/bin/python3)
==20185==    by 0x311646: PyObject_Call (in /usr/bin/python3){noformat}

> pyarrow: Segfault with pyarrow.deserialize_pandas
> -
>
> Key: ARROW-2459
> URL: https://issues.apache.org/jira/browse/ARROW-2459
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
> Environment: OS X, Linux
>Reporter: Travis Brady
>Priority: Major
>
> Following up from [https://github.com/apache/arrow/issues/1884] wherein I 
> found that calling deserialize_pandas in the linked app.py script in the repo 
> linked below causes the app.py process to segfault.
> I initially observed this on OS X, but have since confirmed that the behavior 
> exists on Linux as well.
> Repo containing example: [https://github.com/travisbrady/sanic-arrow] 
> And more generally: what is the right way to get a Java-based HTTP 
> microservice to talk to a Python-based HTTP microservice using Arrow as the 
> serialization format? I'm exchanging DataFrame type objects (they are 
> pandas.DataFrame's on the Python side) between the two services for real-time 
> scoring in a few xgboost models implemented in Python.





[jira] [Commented] (ARROW-2435) [Rust] Add memory pool abstraction.

2018-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438386#comment-16438386
 ] 

ASF GitHub Bot commented on ARROW-2435:
---

andygrove commented on issue #1875: ARROW-2435: [Rust] Add memory pool 
abstraction.
URL: https://github.com/apache/arrow/pull/1875#issuecomment-381336647
 
 
   @xhochy @pitrou I think we can merge this one now




[jira] [Commented] (ARROW-2435) [Rust] Add memory pool abstraction.

2018-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438385#comment-16438385
 ] 

ASF GitHub Bot commented on ARROW-2435:
---

andygrove commented on issue #1875: ARROW-2435: [Rust] Add memory pool 
abstraction.
URL: https://github.com/apache/arrow/pull/1875#issuecomment-381336600
 
 
   @crepererum I'm all for switching to allocator API once stable. 




[jira] [Commented] (ARROW-2458) [Plasma] PlasmaClient uses global variable

2018-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438368#comment-16438368
 ] 

ASF GitHub Bot commented on ARROW-2458:
---

pitrou commented on issue #1893: ARROW-2458: [Plasma] Use one thread pool per 
PlasmaClient
URL: https://github.com/apache/arrow/pull/1893#issuecomment-381333227
 
 
   Note we could also take an existing header-only C++11 thread pool 
implementation (example: https://github.com/inkooboo/thread-pool-cpp), though 
I'm not sure what our policy is for vendoring code. @fjetter @xhochy 




[jira] [Commented] (ARROW-2458) [Plasma] PlasmaClient uses global variable

2018-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438366#comment-16438366
 ] 

ASF GitHub Bot commented on ARROW-2458:
---

pitrou commented on issue #1893: ARROW-2458: [Plasma] Use one thread pool per 
PlasmaClient
URL: https://github.com/apache/arrow/pull/1893#issuecomment-381332975
 
 
   In other words, I think the right way of improving the plasma client here 
would be to have some global thread pool policy for Arrow. That global policy 
can provide two separate thread pools, one for CPU-bound tasks (e.g. memcpy, 
hashing or compression) and one for IO-bound tasks (such as file or network 
tasks). The CPU-bound threadpool should have, by default, the machine's number 
of HW threads (*). The IO-bound threadpool can have a multiple of that number 
(e.g. 5x).
   
   (*) e.g. through 
http://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency
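
The sizing rule proposed above can be sketched as follows. `PoolSizes` and `DefaultPoolSizes` are hypothetical names and the 5x multiplier is just the example value from the comment; only `std::thread::hardware_concurrency()` is a real standard-library call, and it may return 0 when the value cannot be determined, so the sketch falls back to 1.

```cpp
#include <cassert>
#include <thread>

// Hypothetical result type: thread counts for the two proposed pools.
struct PoolSizes {
  unsigned cpu_threads;  // for CPU-bound tasks (memcpy, hashing, compression)
  unsigned io_threads;   // for IO-bound tasks (file or network)
};

// Hypothetical helper implementing the suggested defaults.
inline PoolSizes DefaultPoolSizes(unsigned io_multiplier = 5) {
  assert(io_multiplier > 0);
  unsigned hw = std::thread::hardware_concurrency();
  if (hw == 0) hw = 1;  // the standard allows 0 when the count is unknown
  return PoolSizes{hw, hw * io_multiplier};
}
```

Keeping the two counts separate reflects the idea that IO-bound tasks spend most of their time waiting, so oversubscribing them costs little, while CPU-bound tasks gain nothing from more threads than hardware threads.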




[jira] [Comment Edited] (ARROW-2430) MVP for branch based packaging automation

2018-04-14 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431124#comment-16431124
 ] 

Krisztian Szucs edited comment on ARROW-2430 at 4/14/18 1:44 PM:
-

Additional TODO notes:
- -write readme-
- note about turning off auto cancellation feature of CI servers
- setup deployments + conda deploy script
- consult about flattening the builds (remove build matrices)
- -format commit message-




was (Author: kszucs):
Additional TODO notes:
- -write readme-
- create a docker container with the dependencies pre-installed
- note about turning off auto cancellation feature of CI servers
- setup deployments + conda deploy script
- consult about flattening the builds (remove build matrices)
- -format commit message-



> MVP for branch based packaging automation
> -
>
> Key: ARROW-2430
> URL: https://issues.apache.org/jira/browse/ARROW-2430
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> Described in 
> https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-g9EGPOtcFdtMBzEyDJv48BKc/edit





[jira] [Commented] (ARROW-2430) MVP for branch based packaging automation

2018-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438341#comment-16438341
 ] 

ASF GitHub Bot commented on ARROW-2430:
---

kszucs commented on a change in pull request #1869: ARROW-2430: [Packaging] MVP 
for branch based packaging automation
URL: https://github.com/apache/arrow/pull/1869#discussion_r181552588
 
 

 ##
 File path: cd/crossbow.py
 ##
 @@ -0,0 +1,201 @@
+#!/usr/bin/env python
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import re
+import sys
+import click
+import pygit2
 
 Review comment:
   Until then, explicitly list the whitelist 
[here](https://github.com/kszucs/arrow/blob/d838c42ffb1a6f0dfa8100f0e5174c355a1d7203/dev/release/02-source.sh#L57)?




[jira] [Comment Edited] (ARROW-2430) MVP for branch based packaging automation

2018-04-14 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431124#comment-16431124
 ] 

Krisztian Szucs edited comment on ARROW-2430 at 4/14/18 1:23 PM:
-

Additional TODO notes:
- -write readme-
- create a docker container with the dependencies pre-installed
- note about turning off auto cancellation feature of CI servers
- setup deployments + conda deploy script
- consult about flattening the builds (remove build matrices)
- -format commit message-




was (Author: kszucs):
Additional TODO notes:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off auto cancellation feature of CI servers
- setup deployments + conda deploy script
- consult about flattening the builds (remove build matrices)
- format commit message





[jira] [Commented] (ARROW-2430) MVP for branch based packaging automation

2018-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438335#comment-16438335
 ] 

ASF GitHub Bot commented on ARROW-2430:
---

kszucs commented on issue #1869: ARROW-2430: [Packaging] MVP for branch based 
packaging automation
URL: https://github.com/apache/arrow/pull/1869#issuecomment-381328690
 
 
   @cpcloud I've added a readme, rendered 
[here](https://github.com/kszucs/arrow/blob/d838c42ffb1a6f0dfa8100f0e5174c355a1d7203/cd/README.md)




[jira] [Commented] (ARROW-2458) [Plasma] PlasmaClient uses global variable

2018-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438283#comment-16438283
 ] 

ASF GitHub Bot commented on ARROW-2458:
---

pitrou commented on issue #1893: ARROW-2458: [Plasma] Use one thread pool per 
PlasmaClient
URL: https://github.com/apache/arrow/pull/1893#issuecomment-381315217
 
 
   Is there a risk of excessive resource consumption if there are several 
plasma clients around? I see that the thread pool is used for memcpy and 
hashing. I think a single global thread pool with # of threads == # of CPU 
cores would make more sense.




[jira] [Comment Edited] (ARROW-2430) MVP for branch based packaging automation

2018-04-14 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431124#comment-16431124
 ] 

Krisztian Szucs edited comment on ARROW-2430 at 4/14/18 9:04 AM:
-

Additional TODO notes:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off auto cancellation feature of CI servers
- setup deployments + conda deploy script
- consult about flattening the builds (remove build matrices)
- format commit message




was (Author: kszucs):
Additional TODO notes:
- write readme
- create a docker container with the dependencies pre-installed
- not about turning off auto cancellation feature of CI servers
- setup deployments + conda deploy script
- consult about flattening the builds (remove build matrices)
- format commit message


