[jira] [Resolved] (ARROW-2403) [C++] arrow::CpuInfo::model_name_ destructed twice on exit

2018-04-06 Thread Leif Walsh (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leif Walsh resolved ARROW-2403.
---
Resolution: Fixed
  Assignee: Leif Walsh

> [C++] arrow::CpuInfo::model_name_ destructed twice on exit
> --
>
> Key: ARROW-2403
> URL: https://issues.apache.org/jira/browse/ARROW-2403
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: Leif Walsh
>Assignee: Leif Walsh
>Priority: Major
>
> {noformat}
> valgrind --trace-children=yes --track-origins=yes 
> --keep-stacktraces=alloc-and-free python -c 'import pyarrow'
> ...
> ==6132== Invalid free() / delete / delete[] / realloc() 
> ==6132== at 0x4C28040: operator delete(void*) (vg_replace_malloc.c:507)
> ==6132== by 0xBEF47FA: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (basic_string.h:2943)
> ==6132== by 0x5E24AA1: __run_exit_handlers (exit.c:78) 
> ==6132== by 0x5E24AF4: exit (exit.c:100) 
> ==6132== by 0x5E0CEB3: (below main) (libc-start.c:276)
> ==6132== Address 0x9f1f4b0 is 0 bytes inside a block of size 66 free'd
> ==6132== at 0x4C28040: operator delete(void*) (vg_replace_malloc.c:507)
> ==6132== by 0xBEF47FA: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (basic_string.h:2943)
> ==6132== by 0x5E24AA1: __run_exit_handlers (exit.c:78)
> ==6132== by 0x5E24AF4: exit (exit.c:100) 
> ==6132== by 0x5E0CEB3: (below main) (libc-start.c:276) 
> ==6132== Block was alloc'd at 
> ==6132== at 0x4C2901B: operator new(unsigned long) (vg_replace_malloc.c:324)
> ==6132== by 0xBEF46CC: allocate (new_allocator.h:104)
> ==6132== by 0xBEF46CC: std::string::_Rep::_S_create(unsigned long, unsigned 
> long, std::allocator<char> const&) (basic_string.tcc:1051)
> ==6132== by 0xBEF4F24: std::string::_Rep::_M_clone(std::allocator<char> 
> const&, unsigned long) (basic_string.tcc:1073)
> ==6132== by 0xBEF5359: std::string::assign(std::string const&) 
> (basic_string.tcc:693) 
> ==6132== by 0xB18856C: arrow::CpuInfo::Init() (in /path/to/lib/libarrow.so.0) 
> ==6132== by 0xB190F8D: 
> arrow::compute::FunctionContext::FunctionContext(arrow::MemoryPool*) (in 
> /path/to/lib/libarrow.so.0)
> ==6132== by 0xAD5EC25: 
> __pyx_tp_new_7pyarrow_3lib__FunctionContext(_typeobject*, _object*, _object*) 
> (in /path/to/lib/python3.6/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so)
> ==6132== by 0x4F0E122: type_call (typeobject.c:895) 
> ==6132== by 0xAD5AF0E: __Pyx_PyObject_Call(_object*, _object*, _object*) 
> [clone .constprop.861] (in 
> /path/to/lib/python3.6/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so)
> ==6132== by 0xADEC463: PyInit_lib (in /path/to/lib/pyth
> on3.6/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so)
> ==6132== by 0x4FA6F17: _PyImport_LoadDynamicModuleWithSpec (importdl.c:159) 
> ==6132== by 0x4FA4F2A: _imp_create_dynamic_impl (import.c:1982)
> ==6132== by 0x4FA4F2A: _imp_create_dynamic (import.c.h:289){noformat}
> It appears that the destructor for this static string is being called twice 
> by {{__run_exit_handlers}} and I don't know why.  Anyone else seen this?
> For programs which are otherwise normal, this causes (nondeterministic) 
> aborts on exit when glibc detects the double free.  It might be specific to 
> pyarrow, I haven't tried reproducing it with a C program that links with 
> libarrow.so yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2403) [C++] arrow::CpuInfo::model_name_ destructed twice on exit

2018-04-06 Thread Leif Walsh (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429152#comment-16429152
 ] 

Leif Walsh commented on ARROW-2403:
---

Ok, my problem was that parquet-cpp was built linking arrow-cpp statically, and 
then pyarrow was loading both libarrow.so and libparquet.so (which included 
libarrow.a).  Fixed by building with the right cmake option.  Thanks, [~xhochy]!

> [C++] arrow::CpuInfo::model_name_ destructed twice on exit
> --
>
> Key: ARROW-2403
> URL: https://issues.apache.org/jira/browse/ARROW-2403
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: Leif Walsh
>Priority: Major
>
> {noformat}
> valgrind --trace-children=yes --track-origins=yes 
> --keep-stacktraces=alloc-and-free python -c 'import pyarrow'
> ...
> ==6132== Invalid free() / delete / delete[] / realloc() 
> ==6132== at 0x4C28040: operator delete(void*) (vg_replace_malloc.c:507)
> ==6132== by 0xBEF47FA: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (basic_string.h:2943)
> ==6132== by 0x5E24AA1: __run_exit_handlers (exit.c:78) 
> ==6132== by 0x5E24AF4: exit (exit.c:100) 
> ==6132== by 0x5E0CEB3: (below main) (libc-start.c:276)
> ==6132== Address 0x9f1f4b0 is 0 bytes inside a block of size 66 free'd
> ==6132== at 0x4C28040: operator delete(void*) (vg_replace_malloc.c:507)
> ==6132== by 0xBEF47FA: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (basic_string.h:2943)
> ==6132== by 0x5E24AA1: __run_exit_handlers (exit.c:78)
> ==6132== by 0x5E24AF4: exit (exit.c:100) 
> ==6132== by 0x5E0CEB3: (below main) (libc-start.c:276) 
> ==6132== Block was alloc'd at 
> ==6132== at 0x4C2901B: operator new(unsigned long) (vg_replace_malloc.c:324)
> ==6132== by 0xBEF46CC: allocate (new_allocator.h:104)
> ==6132== by 0xBEF46CC: std::string::_Rep::_S_create(unsigned long, unsigned 
> long, std::allocator<char> const&) (basic_string.tcc:1051)
> ==6132== by 0xBEF4F24: std::string::_Rep::_M_clone(std::allocator<char> 
> const&, unsigned long) (basic_string.tcc:1073)
> ==6132== by 0xBEF5359: std::string::assign(std::string const&) 
> (basic_string.tcc:693) 
> ==6132== by 0xB18856C: arrow::CpuInfo::Init() (in /path/to/lib/libarrow.so.0) 
> ==6132== by 0xB190F8D: 
> arrow::compute::FunctionContext::FunctionContext(arrow::MemoryPool*) (in 
> /path/to/lib/libarrow.so.0)
> ==6132== by 0xAD5EC25: 
> __pyx_tp_new_7pyarrow_3lib__FunctionContext(_typeobject*, _object*, _object*) 
> (in /path/to/lib/python3.6/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so)
> ==6132== by 0x4F0E122: type_call (typeobject.c:895) 
> ==6132== by 0xAD5AF0E: __Pyx_PyObject_Call(_object*, _object*, _object*) 
> [clone .constprop.861] (in 
> /path/to/lib/python3.6/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so)
> ==6132== by 0xADEC463: PyInit_lib (in /path/to/lib/pyth
> on3.6/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so)
> ==6132== by 0x4FA6F17: _PyImport_LoadDynamicModuleWithSpec (importdl.c:159) 
> ==6132== by 0x4FA4F2A: _imp_create_dynamic_impl (import.c:1982)
> ==6132== by 0x4FA4F2A: _imp_create_dynamic (import.c.h:289){noformat}
> It appears that the destructor for this static string is being called twice 
> by {{__run_exit_handlers}} and I don't know why.  Anyone else seen this?
> For programs which are otherwise normal, this causes (nondeterministic) 
> aborts on exit when glibc detects the double free.  It might be specific to 
> pyarrow, I haven't tried reproducing it with a C program that links with 
> libarrow.so yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-2409) [Rust] Test for build warnings, remove current warnings

2018-04-06 Thread Maximilian Roos (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Roos reassigned ARROW-2409:
--

Assignee: Maximilian Roos

> [Rust] Test for build warnings, remove current warnings
> ---
>
> Key: ARROW-2409
> URL: https://issues.apache.org/jira/browse/ARROW-2409
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Rust
>Reporter: Maximilian Roos
>Assignee: Maximilian Roos
>Priority: Major
>
> Test for build warnings, remove current warnings



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428685#comment-16428685
 ] 

ASF GitHub Bot commented on ARROW-1780:
---

atuldambalkar commented on issue #1759: ARROW-1780 - [WIP] JDBC Adapter to 
convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#issuecomment-379330225
 
 
   @donderom I recently made that change based on some earlier comments from 
@laurentgo. I have added that as another interface, so we are good!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> JDBC Adapter for Apache Arrow
> -
>
> Key: ARROW-1780
> URL: https://issues.apache.org/jira/browse/ARROW-1780
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Atul Dambalkar
>Assignee: Atul Dambalkar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> At a high level the JDBC Adapter will allow upstream apps to query RDBMS data 
> over JDBC and get the JDBC objects converted to Arrow objects/structures. The 
> upstream utility can then work with Arrow objects/structures with usual 
> performance benefits. The utility will be very much similar to C++ 
> implementation of "Convert a vector of row-wise data into an Arrow table" as 
> described here - 
> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html
> The utility will read data from an RDBMS and convert the data into Arrow 
> objects/structures; so from that perspective this utility reads data from an RDBMS. 
> Whether the utility can also push Arrow objects to an RDBMS still needs to be 
> discussed and is out of scope for this utility for now. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428677#comment-16428677
 ] 

ASF GitHub Bot commented on ARROW-2328:
---

Adriandorr commented on issue #1784: ARROW-2328: [C++] Fixed and unit tested 
feather writing with slice
URL: https://github.com/apache/arrow/pull/1784#issuecomment-379329455
 
 
   Thanks for the review comments. I’ve rewritten the tests as parameterised unit 
tests and did most of the rest. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  
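To make the bitmap misalignment concrete, here is a small standalone sketch (an assumed illustration, not the Arrow implementation) of the bit-offset handling a correct fix needs: the validity bits of a sliced array start at an arbitrary bit offset, so they must be shifted into byte-aligned form before being written.

{code}
#include <cstdint>
#include <cstdio>

// Copy `length` validity bits starting at `bit_offset` within `data` into
// `out`, so that the result is byte-aligned (LSB-first bit order, as Arrow
// uses). Trailing bits in the last output byte are left unspecified.
void CopyBitmap(const uint8_t* data, int64_t bit_offset, int64_t length, uint8_t* out) {
  data += bit_offset / 8;                                      // skip whole bytes
  const uint8_t shift = static_cast<uint8_t>(bit_offset % 8);  // remaining bit shift
  const int64_t out_bytes = (length + 7) / 8;
  const int64_t in_bytes = (shift + length + 7) / 8;           // source bytes holding our bits
  for (int64_t i = 0; i < out_bytes; ++i) {
    uint8_t byte = static_cast<uint8_t>(data[i] >> shift);
    if (shift != 0 && i + 1 < in_bytes) {
      byte |= static_cast<uint8_t>(data[i + 1] << (8 - shift));
    }
    out[i] = byte;
  }
}

int main() {
  // Bits 0..15 with every even bit set; slicing 8 bits at offset 3 should
  // give the pattern 0,1,0,1,0,1,0,1 == 0xAA.
  const uint8_t bitmap[2] = {0x55, 0x55};
  uint8_t out = 0;
  CopyBitmap(bitmap, 3, 8, &out);
  std::printf("0x%02X\n", out);  // prints 0xAA
  return 0;
}
{code}

Writing the bitmap without applying `bit_offset` (or applying only the whole-byte part of it) is exactly what produces the misaligned nulls described above.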



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428613#comment-16428613
 ] 

ASF GitHub Bot commented on ARROW-2328:
---

pitrou commented on a change in pull request #1784: ARROW-2328: [C++] Fixed and 
unit tested feather writing with slice
URL: https://github.com/apache/arrow/pull/1784#discussion_r179819483
 
 

 ##
 File path: cpp/src/arrow/ipc/feather-test.cc
 ##
 @@ -406,6 +406,125 @@ TEST_F(TestTableWriter, PrimitiveNullRoundTrip) {
   }
 }
 
+TEST_F(TestTableWriter, SliceRoundTrip) {
+  std::shared_ptr<RecordBatch> batch;
+  ASSERT_OK(MakeIntBatchSized(300, &batch));
+  batch = batch->Slice(100, 100);
+
+  ASSERT_OK(writer_->Append("f0", *batch->column(0)));
+  ASSERT_OK(writer_->Append("f1", *batch->column(1)));
+  Finish();
+
+  std::shared_ptr<Column> col;
+  ASSERT_OK(reader_->GetColumn(0, &col));
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(0)));
+  ASSERT_EQ("f0", col->name());
+
+  ASSERT_OK(reader_->GetColumn(1, &col));
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(1)));
+  ASSERT_EQ("f1", col->name());
+}
+
+TEST_F(TestTableWriter, SliceStringsRoundTrip) {
+  std::shared_ptr<RecordBatch> batch;
+  ASSERT_OK(MakeStringTypesRecordBatchWithNulls(&batch, false));
+  batch = batch->Slice(320, 30);
+
+  ASSERT_OK(writer_->Append("f0", *batch->column(0)));
+  ASSERT_OK(writer_->Append("f1", *batch->column(1)));
+  Finish();
+
+  std::shared_ptr<Column> col;
+  ASSERT_OK(reader_->GetColumn(0, &col));
+  SCOPED_TRACE(col->data()->chunk(0)->ToString() + "\n" + batch->column(0)->ToString());
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(0)));
+  ASSERT_EQ("f0", col->name());
+
+  ASSERT_OK(reader_->GetColumn(1, &col));
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(1)));
+  ASSERT_EQ("f1", col->name());
+}
+
+TEST_F(TestTableWriter, SliceStringsWithNullsRoundTrip) {
+  std::shared_ptr<RecordBatch> batch;
+  ASSERT_OK(MakeStringTypesRecordBatchWithNulls(&batch, true));
+  batch = batch->Slice(320, 30);
+
+  ASSERT_OK(writer_->Append("f0", *batch->column(0)));
+  ASSERT_OK(writer_->Append("f1", *batch->column(1)));
+  Finish();
+
+  std::shared_ptr<Column> col;
+  ASSERT_OK(reader_->GetColumn(0, &col));
+  SCOPED_TRACE(col->data()->chunk(0)->ToString() + "\n" + batch->column(0)->ToString());
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(0)));
+  ASSERT_EQ("f0", col->name());
+
+  ASSERT_OK(reader_->GetColumn(1, &col));
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(1)));
+  ASSERT_EQ("f1", col->name());
+}
+
+TEST_F(TestTableWriter, SliceAtNonEightOffsetStringsWithNullsRoundTrip) {
 
 Review comment:
   Cool :-) Though I don't think using a loop is a big deal either...


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428592#comment-16428592
 ] 

ASF GitHub Bot commented on ARROW-2328:
---

Adriandorr commented on a change in pull request #1784: ARROW-2328: [C++] Fixed 
and unit tested feather writing with slice
URL: https://github.com/apache/arrow/pull/1784#discussion_r179816648
 
 

 ##
 File path: cpp/src/arrow/ipc/feather.cc
 ##
 @@ -75,6 +75,43 @@ static Status WritePadded(io::OutputStream* stream, const 
uint8_t* data, int64_t
   return Status::OK();
 }
 
+static Status WritePaddedWithOffset(io::OutputStream* stream, const uint8_t* 
data,
+int64_t bit_offset, const int64_t length,
+int64_t* bytes_written) {
+  data = data + bit_offset / 8;
+  uint8_t bit_shift = static_cast<uint8_t>(bit_offset % 8);
 
 Review comment:
   Not really, but I think I got some warning, maybe with clang.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428582#comment-16428582
 ] 

ASF GitHub Bot commented on ARROW-2328:
---

Adriandorr commented on a change in pull request #1784: ARROW-2328: [C++] Fixed 
and unit tested feather writing with slice
URL: https://github.com/apache/arrow/pull/1784#discussion_r179815275
 
 

 ##
 File path: cpp/src/arrow/ipc/feather-test.cc
 ##
 @@ -406,6 +406,125 @@ TEST_F(TestTableWriter, PrimitiveNullRoundTrip) {
   }
 }
 
+TEST_F(TestTableWriter, SliceRoundTrip) {
+  std::shared_ptr<RecordBatch> batch;
+  ASSERT_OK(MakeIntBatchSized(300, &batch));
+  batch = batch->Slice(100, 100);
+
+  ASSERT_OK(writer_->Append("f0", *batch->column(0)));
+  ASSERT_OK(writer_->Append("f1", *batch->column(1)));
+  Finish();
+
+  std::shared_ptr<Column> col;
+  ASSERT_OK(reader_->GetColumn(0, &col));
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(0)));
+  ASSERT_EQ("f0", col->name());
+
+  ASSERT_OK(reader_->GetColumn(1, &col));
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(1)));
+  ASSERT_EQ("f1", col->name());
+}
+
+TEST_F(TestTableWriter, SliceStringsRoundTrip) {
+  std::shared_ptr<RecordBatch> batch;
+  ASSERT_OK(MakeStringTypesRecordBatchWithNulls(&batch, false));
+  batch = batch->Slice(320, 30);
+
+  ASSERT_OK(writer_->Append("f0", *batch->column(0)));
+  ASSERT_OK(writer_->Append("f1", *batch->column(1)));
+  Finish();
+
+  std::shared_ptr<Column> col;
+  ASSERT_OK(reader_->GetColumn(0, &col));
+  SCOPED_TRACE(col->data()->chunk(0)->ToString() + "\n" + batch->column(0)->ToString());
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(0)));
+  ASSERT_EQ("f0", col->name());
+
+  ASSERT_OK(reader_->GetColumn(1, &col));
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(1)));
+  ASSERT_EQ("f1", col->name());
+}
+
+TEST_F(TestTableWriter, SliceStringsWithNullsRoundTrip) {
+  std::shared_ptr<RecordBatch> batch;
+  ASSERT_OK(MakeStringTypesRecordBatchWithNulls(&batch, true));
+  batch = batch->Slice(320, 30);
+
+  ASSERT_OK(writer_->Append("f0", *batch->column(0)));
+  ASSERT_OK(writer_->Append("f1", *batch->column(1)));
+  Finish();
+
+  std::shared_ptr<Column> col;
+  ASSERT_OK(reader_->GetColumn(0, &col));
+  SCOPED_TRACE(col->data()->chunk(0)->ToString() + "\n" + batch->column(0)->ToString());
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(0)));
+  ASSERT_EQ("f0", col->name());
+
+  ASSERT_OK(reader_->GetColumn(1, &col));
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(1)));
+  ASSERT_EQ("f1", col->name());
+}
+
+TEST_F(TestTableWriter, SliceAtNonEightOffsetStringsWithNullsRoundTrip) {
 
 Review comment:
   I think the problem with loops in unit tests is that you only see the first 
failure. That said, I should be able to convert these tests to one parameterised 
test, which hopefully also addresses the above concerns.
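For readers unfamiliar with the pattern being discussed, a generic GoogleTest sketch of a value-parameterised test (illustrative only; the fixture and parameter values are invented, this is not the actual feather test code). Each parameter becomes its own test instance, so one failing slice offset does not hide the others:

{code}
#include <gtest/gtest.h>

// Hypothetical fixture parameterised over a slice offset.
class SliceOffsetTest : public ::testing::TestWithParam<int> {};

TEST_P(SliceOffsetTest, RoundTrips) {
  const int offset = GetParam();
  // ... build a batch, Slice(offset, n), write it to feather, read it back ...
  EXPECT_GE(offset, 0);  // placeholder assertion for the sketch
}

// Instantiates RoundTrips once per offset; gtest reports each instance separately.
INSTANTIATE_TEST_CASE_P(VariousOffsets, SliceOffsetTest,
                        ::testing::Values(0, 1, 3, 8, 100, 320));
{code}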


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2408) [Rust] It should be possible to get a &mut [T] from Builder

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428578#comment-16428578
 ] 

ASF GitHub Bot commented on ARROW-2408:
---

crepererum commented on a change in pull request #1846: ARROW-2408: [Rust] 
Ability to get `&mut [T]` from `Buffer`
URL: https://github.com/apache/arrow/pull/1846#discussion_r179813579
 
 

 ##
 File path: rust/src/builder.rs
 ##
 @@ -48,6 +47,18 @@ impl Builder {
 }
 }
 
+/// Get the internal byte-aligned memory buffer as a mutable slice
+pub fn slice_mut(&mut self, start: usize, end: usize) -> &mut [T] {
+unsafe {
+slice::from_raw_parts_mut(self.data.offset(start as isize), (end - 
start) as usize)
+}
+}
+
+/// Override the length
+pub fn set_len(&mut self, len: usize) {
 
 Review comment:
   So this function should semantically be marked as `unsafe`, since the user 
could set this to whatever value (even higher than the actual capacity).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Rust] It should be possible to get a &mut [T] from Builder
> -
>
> Key: ARROW-2408
> URL: https://issues.apache.org/jira/browse/ARROW-2408
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> I am currently adding Arrow support to the parquet-rs crate and I found a 
> need to get a mutable slice from a Buffer to pass to the parquet column 
> reader methods.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428577#comment-16428577
 ] 

ASF GitHub Bot commented on ARROW-2328:
---

Adriandorr commented on a change in pull request #1784: ARROW-2328: [C++] Fixed 
and unit tested feather writing with slice
URL: https://github.com/apache/arrow/pull/1784#discussion_r179814535
 
 

 ##
 File path: cpp/src/arrow/ipc/feather-test.cc
 ##
 @@ -406,6 +406,125 @@ TEST_F(TestTableWriter, PrimitiveNullRoundTrip) {
   }
 }
 
+TEST_F(TestTableWriter, SliceRoundTrip) {
+  std::shared_ptr<RecordBatch> batch;
+  ASSERT_OK(MakeIntBatchSized(300, &batch));
+  batch = batch->Slice(100, 100);
+
+  ASSERT_OK(writer_->Append("f0", *batch->column(0)));
+  ASSERT_OK(writer_->Append("f1", *batch->column(1)));
+  Finish();
+
+  std::shared_ptr<Column> col;
+  ASSERT_OK(reader_->GetColumn(0, &col));
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(0)));
+  ASSERT_EQ("f0", col->name());
+
+  ASSERT_OK(reader_->GetColumn(1, &col));
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(1)));
+  ASSERT_EQ("f1", col->name());
+}
+
+TEST_F(TestTableWriter, SliceStringsRoundTrip) {
+  std::shared_ptr<RecordBatch> batch;
+  ASSERT_OK(MakeStringTypesRecordBatchWithNulls(&batch, false));
+  batch = batch->Slice(320, 30);
+
+  ASSERT_OK(writer_->Append("f0", *batch->column(0)));
+  ASSERT_OK(writer_->Append("f1", *batch->column(1)));
+  Finish();
+
+  std::shared_ptr<Column> col;
+  ASSERT_OK(reader_->GetColumn(0, &col));
+  SCOPED_TRACE(col->data()->chunk(0)->ToString() + "\n" + batch->column(0)->ToString());
 
 Review comment:
   Yes(ish). It helps give a helpful message when the test fails.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428576#comment-16428576
 ] 

ASF GitHub Bot commented on ARROW-2328:
---

Adriandorr commented on a change in pull request #1784: ARROW-2328: [C++] Fixed 
and unit tested feather writing with slice
URL: https://github.com/apache/arrow/pull/1784#discussion_r179814300
 
 

 ##
 File path: cpp/src/arrow/ipc/test-common.h
 ##
 @@ -223,15 +223,17 @@ Status MakeRandomBinaryArray(int64_t length, bool 
include_nulls, MemoryPool* poo
 if (include_nulls && values_index == 0) {
   RETURN_NOT_OK(builder.AppendNull());
 } else {
-  const std::string& value = values[values_index];
+  const std::string value =
+  i < int64_t(values.size()) ? values[values_index] : 
std::to_string(i);
   RETURN_NOT_OK(builder.Append(reinterpret_cast<const uint8_t*>(value.data()),
static_cast<int32_t>(value.size())));
 }
   }
   return builder.Finish(out);
 }
 
-Status MakeStringTypesRecordBatch(std::shared_ptr<RecordBatch>* out) {
+Status MakeStringTypesRecordBatchWithNulls(std::shared_ptr<RecordBatch>* out,
 
 Review comment:
   Adding the default argument didn't work, because MakeStringTypesRecordBatch 
is used as a function pointer in other tests. I'll rename it to 
MakeStringTypesRecordBatchWithoutNulls and this one to 
MakeStringTypesRecordBatch.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2247) [Python] Statically-linking boost_regex in both libarrow and libparquet results in segfault

2018-04-06 Thread Deepak Majeti (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428528#comment-16428528
 ] 

Deepak Majeti commented on ARROW-2247:
--

https://issues.apache.org/jira/browse/PARQUET-1265 probably is a fix for this 
issue.

> [Python] Statically-linking boost_regex in both libarrow and libparquet 
> results in segfault
> ---
>
> Key: ARROW-2247
> URL: https://issues.apache.org/jira/browse/ARROW-2247
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Wes McKinney
>Priority: Major
>
> This is a backtrace loading {{libparquet.so}} on Ubuntu 14.04 using boost 
> 1.66.1 from conda-forge. Both libarrow and libparquet contain {{boost_regex}} 
> statically linked. 
> {code}
> In [1]: import ctypes
> In [2]: ctypes.CDLL('libparquet.so')
> Program received signal SIGSEGV, Segmentation fault.
> 0x7fffed4ad3fb in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> (gdb) bt
> #0  0x7fffed4ad3fb in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #1  0x7fffed74c1fc in 
> boost::re_detail_106600::cpp_regex_traits_char_layer<char>::init() ()
>from /home/wesm/cpp-toolchain/lib/libboost_regex.so.1.66.0
> #2  0x7fffed794803 in 
> boost::object_cache boost::re_detail_106600::cpp_regex_traits_implementation 
> >::do_get(boost::re_detail_106600::cpp_regex_traits_base const&, 
> unsigned long) () from /home/wesm/cpp-toolchain/lib/libboost_regex.so.1.66.0
> #3  0x7fffed79e62b in boost::basic_regex<char, boost::cpp_regex_traits<char> >::do_assign(char const*, char const*, 
> unsigned int) () from /home/wesm/cpp-toolchain/lib/libboost_regex.so.1.66.0
> #4  0x7fffee58561b in boost::basic_regex<char, boost::cpp_regex_traits<char> >::assign (this=0x7fff3780, 
> p1=0x7fffee600602 
> "(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)",
>  
> p2=0x7fffee60064a "", f=0) at 
> /home/wesm/cpp-toolchain/include/boost/regex/v4/basic_regex.hpp:381
> #5  0x7fffee5855a7 in boost::basic_regex<char, boost::cpp_regex_traits<char> >::assign (this=0x7fff3780, 
> p=0x7fffee600602 
> "(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)",
>  f=0)
> at /home/wesm/cpp-toolchain/include/boost/regex/v4/basic_regex.hpp:366
> #6  0x7fffee5683f3 in boost::basic_regex<char, boost::cpp_regex_traits<char> >::basic_regex (this=0x7fff3780, 
> p=0x7fffee600602 
> "(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)",
>  f=0)
> at /home/wesm/cpp-toolchain/include/boost/regex/v4/basic_regex.hpp:335
> #7  0x7fffee5656d0 in parquet::ApplicationVersion::ApplicationVersion (
> Python Exception  There is no member named _M_dataplus.: 
> this=0x7fffee8f1fb8 
> , created_by=)
> at ../src/parquet/metadata.cc:452
> #8  0x7fffee41c271 in __cxx_global_var_init.1(void) () at 
> ../src/parquet/metadata.cc:35
> #9  0x7fffee41c44e in _GLOBAL__sub_I_metadata.tmp.wesm_desktop.4838.ii ()
>from /home/wesm/local/lib/libparquet.so
> #10 0x77dea1da in call_init (l=<optimized out>, argc=argc@entry=2, 
> argv=argv@entry=0x7fff5d88, 
> env=env@entry=0x7fff5da0) at dl-init.c:78
> #11 0x77dea2c3 in call_init (env=<optimized out>, argv=<optimized out>, argc=<optimized out>, 
> l=<optimized out>) at dl-init.c:36
> #12 _dl_init (main_map=main_map@entry=0x13fb220, argc=2, argv=0x7fff5d88, 
> env=0x7fff5da0)
> at dl-init.c:126
> {code}
> This seems to be caused by static initializations in libparquet:
> https://github.com/apache/parquet-cpp/blob/master/src/parquet/metadata.cc#L34
> We should see if removing these static initializations makes the problem go 
> away. If not, then statically-linking boost_regex in both libraries is not 
> advisable.
> For this reason and more, I really wish that Arrow and Parquet shared a 
> common build system and monorepo structure -- it would make handling these 
> toolchain and build-related issues much simpler. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-06 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2411:
---
Fix Version/s: 0.10.0

> [C++] Add method to append batches of null-terminated strings to StringBuilder
> --
>
> Key: ARROW-2411
> URL: https://issues.apache.org/jira/browse/ARROW-2411
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GLib
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.10.0
>
>
> We should add a method {{StringBuilder::AppendCStrings(const char** values, 
> const uint8_t* valid_bytes = NULLPTR)}} to the {{StringBuilder}} class to 
> have fast inserts for these strings. See 
> https://github.com/apache/arrow/pull/1845/files for a use case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX

2018-04-06 Thread Mitar (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428518#comment-16428518
 ] 

Mitar commented on ARROW-2355:
--

Thanks for the explanation. I understand the issues here. And thank you for all the 
work on resolving them.

> [Python] Unable to import pyarrow [0.9.0] OSX
> -
>
> Key: ARROW-2355
> URL: https://issues.apache.org/jira/browse/ARROW-2355
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Bradford W Littooy
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.9.1
>
>
> I have pip installed pyarrow to my mac os x (version 10.13.3). When I try to 
> import pyarrow into a python3.6 interpreter, I get the following import error:
>  
> >>> import pyarrow
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py",
>  line 47, in <module>
>     from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: 
> dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so,
>  2): Library not loaded: libarrow_boost_system.dylib
>   Referenced from: 
> /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib
>   Reason: image not found
> >>>
> I've installed pyarrow (0.9) on an EC2 instance with no issue. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-06 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2411:
--

 Summary: [C++] Add method to append batches of null-terminated 
strings to StringBuilder
 Key: ARROW-2411
 URL: https://issues.apache.org/jira/browse/ARROW-2411
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, GLib
Reporter: Uwe L. Korn


We should add a method {{StringBuilder::AppendCStrings(const char** values, 
const uint8_t* valid_bytes = NULLPTR)}} to the {{StringBuilder}} class to have 
fast inserts for these strings. See 
https://github.com/apache/arrow/pull/1845/files for a use case.
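As a rough illustration of the intent (a sketch, not the final API), the proposed batch method could at minimum be layered over the existing per-value calls; the point of this issue is that a native implementation inside StringBuilder can skip the per-string std::string conversion entirely. The sketch below assumes only Append(const std::string&) and AppendNull(), which the builder already exposes:

{code}
#include <arrow/builder.h>
#include <arrow/status.h>

#include <cstdint>
#include <string>

// Hypothetical free-function stand-in for the proposed
// StringBuilder::AppendCStrings(const char** values, const uint8_t* valid_bytes).
arrow::Status AppendCStrings(arrow::StringBuilder* builder, const char** values,
                             int64_t length, const uint8_t* valid_bytes = nullptr) {
  for (int64_t i = 0; i < length; ++i) {
    if (valid_bytes != nullptr && !valid_bytes[i]) {
      arrow::Status st = builder->AppendNull();
      if (!st.ok()) return st;
    } else {
      // Copies each value into a std::string; the real member function would
      // append the bytes directly into the builder's buffers instead.
      arrow::Status st = builder->Append(std::string(values[i]));
      if (!st.ok()) return st;
    }
  }
  return arrow::Status::OK();
}
{code}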



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2407) [GLib] Add garrow_string_array_builder_append_values()

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428513#comment-16428513
 ] 

ASF GitHub Bot commented on ARROW-2407:
---

xhochy commented on a change in pull request #1845: ARROW-2407: [GLib] Add 
garrow_string_array_builder_append_values()
URL: https://github.com/apache/arrow/pull/1845#discussion_r179805269
 
 

 ##
 File path: c_glib/arrow-glib/array-builder.cpp
 ##
 @@ -2184,6 +2184,72 @@ 
garrow_string_array_builder_append(GArrowStringArrayBuilder *builder,
   return garrow_error_check(error, status, "[string-array-builder][append]");
 }
 
+/**
+ * garrow_string_array_builder_append_values:
+ * @builder: A #GArrowStringArrayBuilder.
+ * @values: (array length=values_length): The array of
+ *   strings.
+ * @values_length: The length of `values`.
+ * @is_valids: (nullable) (array length=is_valids_length): The array of
+ *   boolean that shows whether the Nth value is valid or not. If the
+ *   Nth `is_valids` is %TRUE, the Nth `values` is valid value. Otherwise
+ *   the Nth value is null value.
+ * @is_valids_length: The length of `is_valids`.
+ * @error: (nullable): Return location for a #GError or %NULL.
+ *
+ * Append multiple values at once. It's efficient than multiple
+ * `append()` and `append_null()` calls.
+ *
+ * Returns: %TRUE on success, %FALSE if there was an error.
+ *
+ * Since: 0.10.0
+ */
+gboolean
+garrow_string_array_builder_append_values(GArrowStringArrayBuilder *builder,
+  const gchar **values,
+  gint64 values_length,
+  const gboolean *is_valids,
+  gint64 is_valids_length,
+  GError **error)
+{
+  const char *context = "[string-array-builder][append-values]";
+  auto arrow_builder =
+static_cast<arrow::StringBuilder *>(
+  garrow_array_builder_get_raw(GARROW_ARRAY_BUILDER(builder)));
+
+  if (is_valids_length > 0) {
+if (values_length != is_valids_length) {
+  g_set_error(error,
+  GARROW_ERROR,
+  GARROW_ERROR_INVALID,
+  "%s: values length and is_valids length must be equal: "
+  "<%" G_GINT64_FORMAT "> != "
+  "<%" G_GINT64_FORMAT ">",
+  context,
+  values_length,
+  is_valids_length);
+  return FALSE;
+}
+  }
+
+  std::vector<std::string> value_vector;
+  if (is_valids_length > 0) {
+uint8_t valid_bytes[is_valids_length];
+for (gint64 i = 0; i < values_length; ++i) {
+  value_vector.push_back(std::string(values[i]));
+  valid_bytes[i] = is_valids[i];
+}
+auto status = arrow_builder->Append(value_vector, valid_bytes);
 
 Review comment:
   Converting to `std::string` and then inserting into the Builder is still not 
the most efficient way; it would be better if we could directly pass the 
`values` to the StringBuilder instance. This does not have to be in this PR, but 
I made a JIRA about it: https://issues.apache.org/jira/browse/ARROW-2411


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [GLib] Add garrow_string_array_builder_append_values()
> --
>
> Key: ARROW-2407
> URL: https://issues.apache.org/jira/browse/ARROW-2407
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: GLib
>Affects Versions: 0.9.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2407) [GLib] Add garrow_string_array_builder_append_values()

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428514#comment-16428514
 ] 

ASF GitHub Bot commented on ARROW-2407:
---

xhochy commented on a change in pull request #1845: ARROW-2407: [GLib] Add 
garrow_string_array_builder_append_values()
URL: https://github.com/apache/arrow/pull/1845#discussion_r179803416
 
 

 ##
 File path: c_glib/arrow-glib/array-builder.cpp
 ##
 @@ -2184,6 +2184,72 @@ 
garrow_string_array_builder_append(GArrowStringArrayBuilder *builder,
   return garrow_error_check(error, status, "[string-array-builder][append]");
 }
 
+/**
+ * garrow_string_array_builder_append_values:
+ * @builder: A #GArrowStringArrayBuilder.
+ * @values: (array length=values_length): The array of
+ *   strings.
+ * @values_length: The length of `values`.
+ * @is_valids: (nullable) (array length=is_valids_length): The array of
+ *   boolean that shows whether the Nth value is valid or not. If the
+ *   Nth `is_valids` is %TRUE, the Nth `values` is valid value. Otherwise
+ *   the Nth value is null value.
+ * @is_valids_length: The length of `is_valids`.
+ * @error: (nullable): Return location for a #GError or %NULL.
+ *
+ * Append multiple values at once. It's efficient than multiple
 
 Review comment:
   Missing word: … It's *more* efficient …


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [GLib] Add garrow_string_array_builder_append_values()
> --
>
> Key: ARROW-2407
> URL: https://issues.apache.org/jira/browse/ARROW-2407
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: GLib
>Affects Versions: 0.9.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1975) [C++] Add abi-compliance-checker to build process

2018-04-06 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1975:
---
Fix Version/s: (was: 0.10.0)
   0.9.1

> [C++] Add abi-compliance-checker to build process
> -
>
> Key: ARROW-1975
> URL: https://issues.apache.org/jira/browse/ARROW-1975
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.9.1
>
>
> I would like to check our baseline modules with 
> https://lvc.github.io/abi-compliance-checker/ to ensure that version upgrades 
> are much smoother and that we don't break the ABI in patch releases. 
> As we're still pre-1.0, I accept that there will be breakage, but I would like 
> to keep it to a minimum. Currently the biggest pain with Arrow is that you 
> always need to pin it in Python with {{==0.x.y}}, otherwise segfaults are 
> inevitable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2409) [Rust] Test for build warnings, remove current warnings

2018-04-06 Thread Maximilian Roos (JIRA)
Maximilian Roos created ARROW-2409:
--

 Summary: [Rust] Test for build warnings, remove current warnings
 Key: ARROW-2409
 URL: https://issues.apache.org/jira/browse/ARROW-2409
 Project: Apache Arrow
  Issue Type: Task
  Components: Rust
Reporter: Maximilian Roos


Test for build warnings, remove current warnings



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX

2018-04-06 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428464#comment-16428464
 ] 

Uwe L. Korn commented on ARROW-2355:


[~mitar] you would have to pin `pyarrow>=0.9.0,<0.9.1`. I'm now starting work 
on preparing a 0.9.1 release so we get the situation resolved as soon as 
possible (as with every open-source project, I'm unable to give an ETA on the 
release, though).
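For completeness, a sketch of how such a pin can be expressed in a requirements.txt, assuming pip's standard environment markers (PEP 508); 0.9.0.post1 is the OSX-only upload mentioned elsewhere in this thread:

{code}
# requirements.txt sketch (illustrative): accept any 0.9.0.x build
pyarrow>=0.9.0,<0.9.1

# alternative, pinning per platform with environment markers:
# pyarrow==0.9.0 ; sys_platform != "darwin"
# pyarrow==0.9.0.post1 ; sys_platform == "darwin"
{code}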

> [Python] Unable to import pyarrow [0.9.0] OSX
> -
>
> Key: ARROW-2355
> URL: https://issues.apache.org/jira/browse/ARROW-2355
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Bradford W Littooy
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.9.1
>
>
> I have pip installed pyarrow to my mac os x (version 10.13.3). When I try to 
> import pyarrow into a python3.6 interpreter, I get the following import error:
>  
> >>> import pyarrow
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py",
>  line 47, in <module>
>     from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: 
> dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so,
>  2): Library not loaded: libarrow_boost_system.dylib
>   Referenced from: 
> /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib
>   Reason: image not found
> >>>
> I've installed pyarrow (0.9) on an EC2 instance with no issue. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2403) [C++] arrow::CpuInfo::model_name_ destructed twice on exit

2018-04-06 Thread Leif Walsh (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428436#comment-16428436
 ] 

Leif Walsh commented on ARROW-2403:
---

I'll take a look at that, thanks.  That's the form of problem I have been 
assuming it is, but what's weird to me is that it's nondeterministic and seems 
to happen sometimes to boost::regex symbols, sometimes to arrow symbols (like 
CpuInfo::model_name_), sometimes to other libraries.

> [C++] arrow::CpuInfo::model_name_ destructed twice on exit
> --
>
> Key: ARROW-2403
> URL: https://issues.apache.org/jira/browse/ARROW-2403
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: Leif Walsh
>Priority: Major
>
> {noformat}
> valgrind --trace-children=yes --track-origins=yes 
> --keep-stacktraces=alloc-and-free python -c 'import pyarrow'
> ...
> ==6132== Invalid free() / delete / delete[] / realloc() 
> ==6132== at 0x4C28040: operator delete(void*) (vg_replace_malloc.c:507)
> ==6132== by 0xBEF47FA: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (basic_string.h:2943)
> ==6132== by 0x5E24AA1: __run_exit_handlers (exit.c:78) 
> ==6132== by 0x5E24AF4: exit (exit.c:100) 
> ==6132== by 0x5E0CEB3: (below main) (libc-start.c:276)
> ==6132== Address 0x9f1f4b0 is 0 bytes inside a block of size 66 free'd
> ==6132== at 0x4C28040: operator delete(void*) (vg_replace_malloc.c:507)
> ==6132== by 0xBEF47FA: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (basic_string.h:2943)
> ==6132== by 0x5E24AA1: __run_exit_handlers (exit.c:78)
> ==6132== by 0x5E24AF4: exit (exit.c:100) 
> ==6132== by 0x5E0CEB3: (below main) (libc-start.c:276) 
> ==6132== Block was alloc'd at 
> ==6132== at 0x4C2901B: operator new(unsigned long) (vg_replace_malloc.c:324)
> ==6132== by 0xBEF46CC: allocate (new_allocator.h:104)
> ==6132== by 0xBEF46CC: std::string::_Rep::_S_create(unsigned long, unsigned 
> long, std::allocator<char> const&) (basic_string.tcc:1051)
> ==6132== by 0xBEF4F24: std::string::_Rep::_M_clone(std::allocator<char> 
> const&, unsigned long) (basic_string.tcc:1073)
> ==6132== by 0xBEF5359: std::string::assign(std::string const&) 
> (basic_string.tcc:693) 
> ==6132== by 0xB18856C: arrow::CpuInfo::Init() (in /path/to/lib/libarrow.so.0) 
> ==6132== by 0xB190F8D: 
> arrow::compute::FunctionContext::FunctionContext(arrow::MemoryPool*) (in 
> /path/to/lib/libarrow.so.0)
> ==6132== by 0xAD5EC25: 
> __pyx_tp_new_7pyarrow_3lib__FunctionContext(_typeobject*, _object*, _object*) 
> (in /path/to/lib/python3.6/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so)
> ==6132== by 0x4F0E122: type_call (typeobject.c:895) 
> ==6132== by 0xAD5AF0E: __Pyx_PyObject_Call(_object*, _object*, _object*) 
> [clone .constprop.861] (in 
> /path/to/lib/python3.6/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so)
> ==6132== by 0xADEC463: PyInit_lib (in /path/to/lib/pyth
> on3.6/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so)
> ==6132== by 0x4FA6F17: _PyImport_LoadDynamicModuleWithSpec (importdl.c:159) 
> ==6132== by 0x4FA4F2A: _imp_create_dynamic_impl (import.c:1982)
> ==6132== by 0x4FA4F2A: _imp_create_dynamic (import.c.h:289){noformat}
> It appears that the destructor for this static string is being called twice 
> by {{__run_exit_handlers}} and I don't know why.  Anyone else seen this?
> For programs which are otherwise normal, this causes (nondeterministic) 
> aborts on exit when glibc detects the double free.  It might be specific to 
> pyarrow, I haven't tried reproducing it with a C program that links with 
> libarrow.so yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX

2018-04-06 Thread Mitar (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428433#comment-16428433
 ] 

Mitar commented on ARROW-2355:
--

How do I then declare a dependency on PyArrow so that on Linux and Windows it 
would install 0.9.0, and on Mac OS X 0.9.0.post1? I currently use strict (==) 
versions in my requirements.txt.

> [Python] Unable to import pyarrow [0.9.0] OSX
> -
>
> Key: ARROW-2355
> URL: https://issues.apache.org/jira/browse/ARROW-2355
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Bradford W Littooy
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.9.1
>
>
> I have pip installed pyarrow to my mac os x (version 10.13.3). When I try to 
> import pyarrow into a python3.6 interpreter, I get the following import error:
>  
> >>> import pyarrow
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py",
>  line 47, in <module>
>     from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: 
> dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so,
>  2): Library not loaded: libarrow_boost_system.dylib
>   Referenced from: 
> /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib
>   Reason: image not found
> >>>
> I've installed pyarrow (0.9) on an EC2 instance with no issue. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2408) [Rust] It should be possible to get a &mut [T] from Builder

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428432#comment-16428432
 ] 

ASF GitHub Bot commented on ARROW-2408:
---

xhochy commented on issue #1846: ARROW-2408: [Rust] Ability to get `&mut [T]` 
from `Buffer`
URL: https://github.com/apache/arrow/pull/1846#issuecomment-379282023
 
 
   @andygrove I retriggered. But you can normally retrigger by using `git commit 
--amend --no-edit && git push --force` (this updates the hash of the last 
commit; be careful to understand what it really does)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Rust] It should be possible to get a &mut [T] from Builder
> -
>
> Key: ARROW-2408
> URL: https://issues.apache.org/jira/browse/ARROW-2408
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> I am currently adding Arrow support to the parquet-rs crate and I found a 
> need to get a mutable slice from a Buffer to pass to the parquet column 
> reader methods.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2408) [Rust] It should be possible to get a &mut [T] from Builder

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428430#comment-16428430
 ] 

ASF GitHub Bot commented on ARROW-2408:
---

xhochy commented on issue #1846: ARROW-2408: [Rust] Ability to get `&mut [T]` 
from `Buffer`
URL: https://github.com/apache/arrow/pull/1846#issuecomment-379282023
 
 
   @andygrove I retriggered. But you can normally retrigger by using `git commit 
--amend --no-edit && git push --force` (this updates the hash of the last 
commit)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Rust] It should be possible to get a &mut [T] from Builder
> -
>
> Key: ARROW-2408
> URL: https://issues.apache.org/jira/browse/ARROW-2408
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> I am currently adding Arrow support to the parquet-rs crate and I found a 
> need to get a mutable slice from a Buffer to pass to the parquet column 
> reader methods.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX

2018-04-06 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428428#comment-16428428
 ] 

Uwe L. Korn commented on ARROW-2355:


[~mitar] 0.9.0.post1 is an OSX-only version hidden under 
https://pypi.python.org/pypi/pyarrow/0.9.0.post1

> [Python] Unable to import pyarrow [0.9.0] OSX
> -
>
> Key: ARROW-2355
> URL: https://issues.apache.org/jira/browse/ARROW-2355
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Bradford W Littooy
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.9.1
>
>
> I have pip installed pyarrow to my mac os x (version 10.13.3). When I try to 
> import pyarrow into a python3.6 interpreter, I get the following import error:
>  
> >>> import pyarrow
> Traceback (most recent call last):
>   File "", line 1, in 
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py",
>  line 47, in 
>     from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: 
> dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so,
>  2): Library not loaded: libarrow_boost_system.dylib
>   Referenced from: 
> /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib
>   Reason: image not found
> >>>
> I've installed pyarrow (0.9) on an EC2 instance with no issue. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX

2018-04-06 Thread Mitar (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428427#comment-16428427
 ] 

Mitar commented on ARROW-2355:
--

pip install pyarrow==0.9.0.post1 
Collecting pyarrow==0.9.0.post1
  Could not find a version that satisfies the requirement pyarrow==0.9.0.post1 
(from versions: 0.2.0, 0.3.0, 0.4.0, 0.4.1, 0.5.0.post2, 0.6.0, 0.7.0, 0.7.1, 
0.8.0, 0.9.0)
No matching distribution found for pyarrow==0.9.0.post1

I do not see it here either: https://pypi.python.org/pypi/pyarrow

> [Python] Unable to import pyarrow [0.9.0] OSX
> -
>
> Key: ARROW-2355
> URL: https://issues.apache.org/jira/browse/ARROW-2355
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Bradford W Littooy
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.9.1
>
>
> I have pip installed pyarrow to my mac os x (version 10.13.3). When I try to 
> import pyarrow into a python3.6 interpreter, I get the following import error:
>  
> >>> import pyarrow
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py",
>  line 47, in <module>
>     from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: 
> dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so,
>  2): Library not loaded: libarrow_boost_system.dylib
>   Referenced from: 
> /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib
>   Reason: image not found
> >>>
> I've installed pyarrow (0.9) on an EC2 instance with no issue. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428410#comment-16428410
 ] 

ASF GitHub Bot commented on ARROW-1780:
---

donderom commented on issue #1759: ARROW-1780 - [WIP] JDBC Adapter to convert 
Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#issuecomment-379276243
 
 
   As I understand it, the idea is to convert a `java.sql.ResultSet` to Arrow. The 
result set can also be provided by a third-party library, which would make the 
`sqlToArrow(Connection connection, String query)` API unusable in that case. What 
about something like `sqlToArrow(ResultSet resultSet)`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> JDBC Adapter for Apache Arrow
> -
>
> Key: ARROW-1780
> URL: https://issues.apache.org/jira/browse/ARROW-1780
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Atul Dambalkar
>Assignee: Atul Dambalkar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> At a high level the JDBC Adapter will allow upstream apps to query RDBMS data 
> over JDBC and get the JDBC objects converted to Arrow objects/structures. The 
> upstream utility can then work with Arrow objects/structures with usual 
> performance benefits. The utility will be very much similar to C++ 
> implementation of "Convert a vector of row-wise data into an Arrow table" as 
> described here - 
> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html
> The utility will read data from an RDBMS and convert the data into Arrow 
> objects/structures. So from that perspective this will read data from an RDBMS. 
> Whether the utility can also push Arrow objects back to an RDBMS still needs to be 
> discussed and will be out of scope for this utility for now. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX

2018-04-06 Thread Nicholas Schrock (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428403#comment-16428403
 ] 

Nicholas Schrock commented on ARROW-2355:
-

I was able to confirm that this indeed fixed the issue. Thanks so much for 
looking into this!

> [Python] Unable to import pyarrow [0.9.0] OSX
> -
>
> Key: ARROW-2355
> URL: https://issues.apache.org/jira/browse/ARROW-2355
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Bradford W Littooy
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.9.1
>
>
> I have pip installed pyarrow to my mac os x (version 10.13.3). When I try to 
> import pyarrow into a python3.6 interpreter, I get the following import error:
>  
> >>> import pyarrow
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py",
>  line 47, in <module>
>     from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: 
> dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so,
>  2): Library not loaded: libarrow_boost_system.dylib
>   Referenced from: 
> /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib
>   Reason: image not found
> >>>
> I've installed pyarrow (0.9) on an EC2 instance with no issue. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2301) [Python] Add source distribution publishing instructions to package / release management documentation

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428396#comment-16428396
 ] 

ASF GitHub Bot commented on ARROW-2301:
---

xhochy commented on issue #1795: ARROW-2301: [Python] Build source distribution 
inside the manylinux1 docker
URL: https://github.com/apache/arrow/pull/1795#issuecomment-379272507
 
 
   @kou Sorry for the delay, I have uploaded them now.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Add source distribution publishing instructions to package / release 
> management documentation
> --
>
> Key: ARROW-2301
> URL: https://issues.apache.org/jira/browse/ARROW-2301
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We wish to start publishing source tarballs for Python on PyPI



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX

2018-04-06 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2355.

Resolution: Fixed

{{pyarrow==0.9.0.post1}} is now on PyPI

> [Python] Unable to import pyarrow [0.9.0] OSX
> -
>
> Key: ARROW-2355
> URL: https://issues.apache.org/jira/browse/ARROW-2355
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Bradford W Littooy
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.9.1
>
>
> I have pip installed pyarrow to my mac os x (version 10.13.3). When I try to 
> import pyarrow into a python3.6 interpreter, I get the following import error:
>  
> >>> import pyarrow
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py",
>  line 47, in <module>
>     from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: 
> dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so,
>  2): Library not loaded: libarrow_boost_system.dylib
>   Referenced from: 
> /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib
>   Reason: image not found
> >>>
> I've installed pyarrow (0.9) on an EC2 instance with no issue. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX

2018-04-06 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-2355:
--

Assignee: Uwe L. Korn

> [Python] Unable to import pyarrow [0.9.0] OSX
> -
>
> Key: ARROW-2355
> URL: https://issues.apache.org/jira/browse/ARROW-2355
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Bradford W Littooy
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.9.1
>
>
> I have pip installed pyarrow to my mac os x (version 10.13.3). When I try to 
> import pyarrow into a python3.6 interpreter, I get the following import error:
>  
> >>> import pyarrow
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py",
>  line 47, in <module>
>     from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: 
> dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so,
>  2): Library not loaded: libarrow_boost_system.dylib
>   Referenced from: 
> /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib
>   Reason: image not found
> >>>
> I've installed pyarrow (0.9) on an EC2 instance with no issue. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX

2018-04-06 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2355:
---
Fix Version/s: 0.9.1

> [Python] Unable to import pyarrow [0.9.0] OSX
> -
>
> Key: ARROW-2355
> URL: https://issues.apache.org/jira/browse/ARROW-2355
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Bradford W Littooy
>Priority: Major
> Fix For: 0.9.1
>
>
> I have pip installed pyarrow to my mac os x (version 10.13.3). When I try to 
> import pyarrow into a python3.6 interpreter, I get the following import error:
>  
> >>> import pyarrow
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py",
>  line 47, in <module>
>     from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: 
> dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so,
>  2): Library not loaded: libarrow_boost_system.dylib
>   Referenced from: 
> /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib
>   Reason: image not found
> >>>
> I've installed pyarrow (0.9) on an EC2 instance with no issue. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2406) [Python] Segfault when creating PyArrow table from Pandas for empty string column when schema provided

2018-04-06 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2406:
---
Fix Version/s: 0.10.0

> [Python] Segfault when creating PyArrow table from Pandas for empty string 
> column when schema provided
> --
>
> Key: ARROW-2406
> URL: https://issues.apache.org/jira/browse/ARROW-2406
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6.3
>Reporter: Dave Challis
>Priority: Major
> Fix For: 0.10.0
>
>
> Minimal example to recreate:
> {code}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'a': []})
> df['a'] = df['a'].astype(str)
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
>  
> This causes the python interpreter to exit with "Segmentation fault: 11".
> The following examples all work without any issue:
> {code}
> # column 'a' is no longer empty
> df = pd.DataFrame({'a': ['foo']})
> df['a'] = df['a'].astype(str)
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> {code}
> # column 'a' is empty, but no schema is specified
> df = pd.DataFrame({'a': []})
> df['a'] = df['a'].astype(str)
> pa.Table.from_pandas(df)
> {code}
> {code}
> # column 'a' is empty, but no type 'str' specified in Pandas
> df = pd.DataFrame({'a': []})
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2403) [C++] arrow::CpuInfo::model_name_ destructed twice on exit

2018-04-06 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428384#comment-16428384
 ] 

Uwe L. Korn commented on ARROW-2403:


I haven't seen this yet. I once had a problem where other global constants 
were deallocated twice, but that was because parquet-cpp was linked both 
dynamically and statically into the same process.

> [C++] arrow::CpuInfo::model_name_ destructed twice on exit
> --
>
> Key: ARROW-2403
> URL: https://issues.apache.org/jira/browse/ARROW-2403
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: Leif Walsh
>Priority: Major
>
> {noformat}
> valgrind --trace-children=yes --track-origins=yes 
> --keep-stacktraces=alloc-and-free python -c 'import pyarrow'
> ...
> ==6132== Invalid free() / delete / delete[] / realloc() 
> ==6132== at 0x4C28040: operator delete(void*) (vg_replace_malloc.c:507)
> ==6132== by 0xBEF47FA: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (basic_string.h:2943)
> ==6132== by 0x5E24AA1: __run_exit_handlers (exit.c:78) 
> ==6132== by 0x5E24AF4: exit (exit.c:100) 
> ==6132== by 0x5E0CEB3: (below main) (libc-start.c:276)
> ==6132== Address 0x9f1f4b0 is 0 bytes inside a block of size 66 free'd
> ==6132== at 0x4C28040: operator delete(void*) (vg_replace_malloc.c:507)
> ==6132== by 0xBEF47FA: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (basic_string.h:2943)
> ==6132== by 0x5E24AA1: __run_exit_handlers (exit.c:78)
> ==6132== by 0x5E24AF4: exit (exit.c:100) 
> ==6132== by 0x5E0CEB3: (below main) (libc-start.c:276) 
> ==6132== Block was alloc'd at 
> ==6132== at 0x4C2901B: operator new(unsigned long) (vg_replace_malloc.c:324)
> ==6132== by 0xBEF46CC: allocate (new_allocator.h:104)
> ==6132== by 0xBEF46CC: std::string::_Rep::_S_create(unsigned long, unsigned 
> long, std::allocator<char> const&) (basic_string.tcc:1051)
> ==6132== by 0xBEF4F24: std::string::_Rep::_M_clone(std::allocator<char> 
> const&, unsigned long) (basic_string.tcc:1073)
> ==6132== by 0xBEF5359: std::string::assign(std::string const&) 
> (basic_string.tcc:693) 
> ==6132== by 0xB18856C: arrow::CpuInfo::Init() (in /path/to/lib/libarrow.so.0) 
> ==6132== by 0xB190F8D: 
> arrow::compute::FunctionContext::FunctionContext(arrow::MemoryPool*) (in 
> /path/to/lib/libarrow.so.0)
> ==6132== by 0xAD5EC25: 
> __pyx_tp_new_7pyarrow_3lib__FunctionContext(_typeobject*, _object*, _object*) 
> (in /path/to/lib/python3.6/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so)
> ==6132== by 0x4F0E122: type_call (typeobject.c:895) 
> ==6132== by 0xAD5AF0E: __Pyx_PyObject_Call(_object*, _object*, _object*) 
> [clone .constprop.861] (in 
> /path/to/lib/python3.6/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so)
> ==6132== by 0xADEC463: PyInit_lib (in /path/to/lib/pyth
> on3.6/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so)
> ==6132== by 0x4FA6F17: _PyImport_LoadDynamicModuleWithSpec (importdl.c:159) 
> ==6132== by 0x4FA4F2A: _imp_create_dynamic_impl (import.c:1982)
> ==6132== by 0x4FA4F2A: _imp_create_dynamic (import.c.h:289){noformat}
> It appears that the destructor for this static string is being called twice 
> by {{__run_exit_handlers}} and I don't know why.  Anyone else seen this?
> For programs which are otherwise normal, this causes (nondeterministic) 
> aborts on exit when glibc detects the double free.  It might be specific to 
> pyarrow, I haven't tried reproducing it with a C program that links with 
> libarrow.so yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2402) [C++] FixedSizeBinaryBuilder::Append lacks "const char*" overload

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428363#comment-16428363
 ] 

ASF GitHub Bot commented on ARROW-2402:
---

pitrou commented on a change in pull request #1841: ARROW-2402: [C++] Avoid 
spurious copies with FixedSizeBinaryBuilder
URL: https://github.com/apache/arrow/pull/1841#discussion_r179767477
 
 

 ##
 File path: cpp/src/arrow/builder.h
 ##
 @@ -730,7 +730,14 @@ class ARROW_EXPORT FixedSizeBinaryBuilder : public 
ArrayBuilder {
   FixedSizeBinaryBuilder(const std::shared_ptr<DataType>& type,
  MemoryPool* pool ARROW_MEMORY_POOL_DEFAULT);
 
-  Status Append(const uint8_t* value);
+  Status Append(const uint8_t* value) {
 
 Review comment:
   The downside with LTO is inflated build times (the linking step becomes much 
longer, and it isn't parallelized). Also IIRC some compilers have various bugs 
or limitations with LTO.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] FixedSizeBinaryBuilder::Append lacks "const char*" overload
> -
>
> Key: ARROW-2402
> URL: https://issues.apache.org/jira/browse/ARROW-2402
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> This implies that calling {{FixedSizeBinaryBuilder::Append}} with a "const 
> char*" argument currently instantiates a temporary {{std::string}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-2402) [C++] FixedSizeBinaryBuilder::Append lacks "const char*" overload

2018-04-06 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-2402:
--

Assignee: Antoine Pitrou

> [C++] FixedSizeBinaryBuilder::Append lacks "const char*" overload
> -
>
> Key: ARROW-2402
> URL: https://issues.apache.org/jira/browse/ARROW-2402
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> This implies that calling {{FixedSizeBinaryBuilder::Append}} with a "const 
> char*" argument currently instantiates a temporary {{std::string}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2402) [C++] FixedSizeBinaryBuilder::Append lacks "const char*" overload

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428336#comment-16428336
 ] 

ASF GitHub Bot commented on ARROW-2402:
---

xhochy closed pull request #1841: ARROW-2402: [C++] Avoid spurious copies with 
FixedSizeBinaryBuilder
URL: https://github.com/apache/arrow/pull/1841
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/cpp/src/arrow/array-test.cc b/cpp/src/arrow/array-test.cc
index 0b4342a53..fb1bebfca 100644
--- a/cpp/src/arrow/array-test.cc
+++ b/cpp/src/arrow/array-test.cc
@@ -1412,9 +1412,9 @@ TEST_F(TestFWBinaryArray, ZeroSize) {
   auto type = fixed_size_binary(0);
   FixedSizeBinaryBuilder builder(type);
 
-  ASSERT_OK(builder.Append(nullptr));
-  ASSERT_OK(builder.Append(nullptr));
-  ASSERT_OK(builder.Append(nullptr));
+  ASSERT_OK(builder.Append(""));
+  ASSERT_OK(builder.Append(std::string()));
+  ASSERT_OK(builder.Append(static_cast<const uint8_t*>(nullptr)));
   ASSERT_OK(builder.AppendNull());
   ASSERT_OK(builder.AppendNull());
   ASSERT_OK(builder.AppendNull());
diff --git a/cpp/src/arrow/builder-benchmark.cc 
b/cpp/src/arrow/builder-benchmark.cc
index 12dfbe817..9ad129577 100644
--- a/cpp/src/arrow/builder-benchmark.cc
+++ b/cpp/src/arrow/builder-benchmark.cc
@@ -131,6 +131,25 @@ static void BM_BuildBinaryArray(benchmark::State& state) { 
 // NOLINT non-const
   state.SetBytesProcessed(state.iterations() * iterations * value.size());
 }
 
+static void BM_BuildFixedSizeBinaryArray(
+benchmark::State& state) {  // NOLINT non-const reference
+  const int64_t iterations = 1 << 20;
+  const int width = 10;
+
+  auto type = fixed_size_binary(width);
+  const char value[width + 1] = "1234567890";
+
+  while (state.KeepRunning()) {
+FixedSizeBinaryBuilder builder(type);
+for (int64_t i = 0; i < iterations; i++) {
+  ABORT_NOT_OK(builder.Append(value));
+}
+std::shared_ptr<Array> out;
+ABORT_NOT_OK(builder.Finish(&out));
+  }
+  state.SetBytesProcessed(state.iterations() * iterations * width);
+}
+
 
BENCHMARK(BM_BuildPrimitiveArrayNoNulls)->Repetitions(3)->Unit(benchmark::kMicrosecond);
 
BENCHMARK(BM_BuildVectorNoNulls)->Repetitions(3)->Unit(benchmark::kMicrosecond);
 
BENCHMARK(BM_BuildAdaptiveIntNoNulls)->Repetitions(3)->Unit(benchmark::kMicrosecond);
@@ -140,5 +159,6 @@ BENCHMARK(BM_BuildAdaptiveIntNoNullsScalarAppend)
 
BENCHMARK(BM_BuildAdaptiveUIntNoNulls)->Repetitions(3)->Unit(benchmark::kMicrosecond);
 
 BENCHMARK(BM_BuildBinaryArray)->Repetitions(3)->Unit(benchmark::kMicrosecond);
+BENCHMARK(BM_BuildFixedSizeBinaryArray)->Repetitions(3)->Unit(benchmark::kMicrosecond);
 
 }  // namespace arrow
diff --git a/cpp/src/arrow/builder.cc b/cpp/src/arrow/builder.cc
index a502e1fc2..c97253e64 100644
--- a/cpp/src/arrow/builder.cc
+++ b/cpp/src/arrow/builder.cc
@@ -1422,12 +1422,6 @@ FixedSizeBinaryBuilder::FixedSizeBinaryBuilder(const 
std::shared_ptr<DataType>&
   byte_width_(static_cast<const FixedSizeBinaryType&>(*type).byte_width()),
   byte_builder_(pool) {}
 
-Status FixedSizeBinaryBuilder::Append(const uint8_t* value) {
-  RETURN_NOT_OK(Reserve(1));
-  UnsafeAppendToBitmap(true);
-  return byte_builder_.Append(value, byte_width_);
-}
-
 Status FixedSizeBinaryBuilder::Append(const uint8_t* data, int64_t length,
   const uint8_t* valid_bytes) {
   RETURN_NOT_OK(Reserve(length));
diff --git a/cpp/src/arrow/builder.h b/cpp/src/arrow/builder.h
index 32cfdd408..b0f77bd98 100644
--- a/cpp/src/arrow/builder.h
+++ b/cpp/src/arrow/builder.h
@@ -730,7 +730,14 @@ class ARROW_EXPORT FixedSizeBinaryBuilder : public 
ArrayBuilder {
   FixedSizeBinaryBuilder(const std::shared_ptr<DataType>& type,
  MemoryPool* pool ARROW_MEMORY_POOL_DEFAULT);
 
-  Status Append(const uint8_t* value);
+  Status Append(const uint8_t* value) {
+RETURN_NOT_OK(Reserve(1));
+UnsafeAppendToBitmap(true);
+return byte_builder_.Append(value, byte_width_);
+  }
+  Status Append(const char* value) {
+return Append(reinterpret_cast(value));
+  }
 
   template 
   Status Append(const std::array& value) {


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] FixedSizeBinaryBuilder::Append lacks "const char*" overload
> -
>
> Key: ARROW-2402
> URL: https://issues.apache.org/jira/browse/ARROW-2402
> Project: Apache Arrow
>  Issue Type: Improvement
>  

[jira] [Commented] (ARROW-2408) [Rust] It should be possible to get a [T] from Builder

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428337#comment-16428337
 ] 

ASF GitHub Bot commented on ARROW-2408:
---

andygrove opened a new pull request #1846: ARROW-2408: [Rust] Ability to get 
` [T]` from `Buffer`
URL: https://github.com/apache/arrow/pull/1846
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Rust] It should be possible to get a [T] from Builder
> -
>
> Key: ARROW-2408
> URL: https://issues.apache.org/jira/browse/ARROW-2408
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> I am currently adding Arrow support to the parquet-rs crate and I found a 
> need to get a mutable slice from a Buffer to pass to the parquet column 
> reader methods.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-2402) [C++] FixedSizeBinaryBuilder::Append lacks "const char*" overload

2018-04-06 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2402.

   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1841
[https://github.com/apache/arrow/pull/1841]

> [C++] FixedSizeBinaryBuilder::Append lacks "const char*" overload
> -
>
> Key: ARROW-2402
> URL: https://issues.apache.org/jira/browse/ARROW-2402
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> This implies that calling {{FixedSizeBinaryBuilder::Append}} with a "const 
> char*" argument currently instantiates a temporary {{std::string}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2408) [Rust] It should be possible to get a [T] from Builder

2018-04-06 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2408:
--
Labels: pull-request-available  (was: )

> [Rust] It should be possible to get a [T] from Builder
> -
>
> Key: ARROW-2408
> URL: https://issues.apache.org/jira/browse/ARROW-2408
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> I am currently adding Arrow support to the parquet-rs crate and I found a 
> need to get a mutable slice from a Buffer to pass to the parquet column 
> reader methods.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2408) [Rust] It should be possible to get a [T] from Builder

2018-04-06 Thread Andy Grove (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428338#comment-16428338
 ] 

Andy Grove commented on ARROW-2408:
---

PR: https://github.com/apache/arrow/pull/1846

> [Rust] It should be possible to get a [T] from Builder
> -
>
> Key: ARROW-2408
> URL: https://issues.apache.org/jira/browse/ARROW-2408
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> I am currently adding Arrow support to the parquet-rs crate and I found a 
> need to get a mutable slice from a Buffer to pass to the parquet column 
> reader methods.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2408) [Rust] It should be possible to get a [T] from Builder

2018-04-06 Thread Andy Grove (JIRA)
Andy Grove created ARROW-2408:
-

 Summary: [Rust] It should be possible to get a [T] from 
Builder
 Key: ARROW-2408
 URL: https://issues.apache.org/jira/browse/ARROW-2408
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.10.0


I am currently adding Arrow support to the parquet-rs crate and I found a need 
to get a mutable slice from a Buffer to pass to the parquet column reader 
methods.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2402) [C++] FixedSizeBinaryBuilder::Append lacks "const char*" overload

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428332#comment-16428332
 ] 

ASF GitHub Bot commented on ARROW-2402:
---

xhochy commented on a change in pull request #1841: ARROW-2402: [C++] Avoid 
spurious copies with FixedSizeBinaryBuilder
URL: https://github.com/apache/arrow/pull/1841#discussion_r179759229
 
 

 ##
 File path: cpp/src/arrow/builder.h
 ##
 @@ -730,7 +730,14 @@ class ARROW_EXPORT FixedSizeBinaryBuilder : public 
ArrayBuilder {
   FixedSizeBinaryBuilder(const std::shared_ptr<DataType>& type,
  MemoryPool* pool ARROW_MEMORY_POOL_DEFAULT);
 
-  Status Append(const uint8_t* value);
+  Status Append(const uint8_t* value) {
 
 Review comment:
   I understand the performance benefits of moving this to the header, but this 
also reminds me that we should have a look at our build process. This will 
inline the code in more places but also makes binaries a bit larger; if we 
built using `-flto` we should be able to get rid of those duplications 
while getting even slightly better performance at the same time. 
   
   The change here is probably not significant on its own, but all the small ones 
add up over time.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] FixedSizeBinaryBuilder::Append lacks "const char*" overload
> -
>
> Key: ARROW-2402
> URL: https://issues.apache.org/jira/browse/ARROW-2402
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> This implies that calling {{FixedSizeBinaryBuilder::Append}} with a "const 
> char*" argument currently instantiates a temporary {{std::string}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2407) [GLib] Add garrow_string_array_builder_append_values()

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428327#comment-16428327
 ] 

ASF GitHub Bot commented on ARROW-2407:
---

kou opened a new pull request #1845: ARROW-2407: [GLib] Add 
garrow_string_array_builder_append_values()
URL: https://github.com/apache/arrow/pull/1845
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [GLib] Add garrow_string_array_builder_append_values()
> --
>
> Key: ARROW-2407
> URL: https://issues.apache.org/jira/browse/ARROW-2407
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: GLib
>Affects Versions: 0.9.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2407) [GLib] Add garrow_string_array_builder_append_values()

2018-04-06 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2407:
--
Labels: pull-request-available  (was: )

> [GLib] Add garrow_string_array_builder_append_values()
> --
>
> Key: ARROW-2407
> URL: https://issues.apache.org/jira/browse/ARROW-2407
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: GLib
>Affects Versions: 0.9.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2407) [GLib] Add garrow_string_array_builder_append_values()

2018-04-06 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-2407:
---

 Summary: [GLib] Add garrow_string_array_builder_append_values()
 Key: ARROW-2407
 URL: https://issues.apache.org/jira/browse/ARROW-2407
 Project: Apache Arrow
  Issue Type: New Feature
  Components: GLib
Affects Versions: 0.9.0
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
 Fix For: 0.10.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-2401) Support filters on Hive partitioned Parquet files

2018-04-06 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-2401:
--

Assignee: Julius Neuffer

> Support filters on Hive partitioned Parquet files
> -
>
> Key: ARROW-2401
> URL: https://issues.apache.org/jira/browse/ARROW-2401
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Julius Neuffer
>Assignee: Julius Neuffer
>Priority: Minor
>  Labels: features, pull-request-available
> Fix For: 0.10.0
>
>
> I'll open a PR on GitHub to support filtering of a `ParquetDataset` along a 
> Hive partitioned directory structure.
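
For illustration, a minimal usage sketch of the feature described above, based on
the test added in pull request 1840 (quoted later in this digest). The dataset
path, column names, and values are hypothetical, and the {{filters}} keyword only
exists in builds that include that patch; per the PR, only {{=}} and {{!=}}
comparisons on partition columns are applied, other operators are ignored.

{code:python}
import os

import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Hypothetical Hive-style layout: base_path/foo=<0|1>/bar=<a|b|c>/data.parquet
base_path = '/tmp/hive_partitioned_dataset'

df = pd.DataFrame({
    'index': np.arange(30),
    'foo': np.array([0, 1], dtype='i4').repeat(15),
    'bar': np.tile(np.array(['a', 'b', 'c'], dtype=object), 10),
    'values': np.random.randn(30),
})

# Write one Parquet file per (foo, bar) combination, keeping the partition
# columns in the directory names rather than in the files themselves.
for (foo, bar), part in df.groupby(['foo', 'bar']):
    part_dir = os.path.join(base_path, 'foo={}'.format(foo), 'bar={}'.format(bar))
    os.makedirs(part_dir)
    part_table = pa.Table.from_pandas(part.drop(['foo', 'bar'], axis=1))
    pq.write_table(part_table, os.path.join(part_dir, 'data.parquet'))

# Partition-level filtering: pieces with foo != 1 or bar == 'b' are skipped
# before any file is opened, so only the matching directories are read.
dataset = pq.ParquetDataset(base_path,
                            filters=[('foo', '=', 1), ('bar', '!=', 'b')])
result = dataset.read().to_pandas()
print(result.head())
{code}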



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-2401) Support filters on Hive partitioned Parquet files

2018-04-06 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2401.

   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1840
[https://github.com/apache/arrow/pull/1840]

> Support filters on Hive partitioned Parquet files
> -
>
> Key: ARROW-2401
> URL: https://issues.apache.org/jira/browse/ARROW-2401
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Julius Neuffer
>Priority: Minor
>  Labels: features, pull-request-available
> Fix For: 0.10.0
>
>
> I'll open a PR on GitHub to support filtering of a `ParquetDataset` along a 
> Hive partitioned directory structure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2401) Support filters on Hive partitioned Parquet files

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428318#comment-16428318
 ] 

ASF GitHub Bot commented on ARROW-2401:
---

xhochy closed pull request #1840: ARROW-2401 Support filters on Hive 
partitioned Parquet files
URL: https://github.com/apache/arrow/pull/1840
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/python/pyarrow/parquet.py b/python/pyarrow/parquet.py
index 0929a1549..beeedca03 100644
--- a/python/pyarrow/parquet.py
+++ b/python/pyarrow/parquet.py
@@ -711,9 +711,14 @@ class ParquetDataset(object):
 Divide files into pieces for each row group in the file
 validate_schema : boolean, default True
 Check that individual file schemas are all the same / compatible
+filters : List[Tuple] or None (default)
+List of filters to apply, like ``[('x', '=', 0), ...]``. This
+implements partition-level (hive) filtering only, i.e., to prevent the
+loading of some files of the dataset.
 """
 def __init__(self, path_or_paths, filesystem=None, schema=None,
- metadata=None, split_row_groups=False, validate_schema=True):
+ metadata=None, split_row_groups=False, validate_schema=True,
+ filters=None):
 if filesystem is None:
 a_path = path_or_paths
 if isinstance(a_path, list):
@@ -744,6 +749,9 @@ def __init__(self, path_or_paths, filesystem=None, 
schema=None,
 if validate_schema:
 self.validate_schemas()
 
+if filters:
+self._filter(filters)
+
 def validate_schemas(self):
 open_file = self._get_open_file_func()
 
@@ -849,6 +857,31 @@ def open_file(path, meta=None):
common_metadata=self.common_metadata)
 return open_file
 
+def _filter(self, filters):
+def filter_accepts_partition(part_key, filter, level):
+p_column, p_value_index = part_key
+f_column, op, f_value = filter
+if p_column != f_column:
+return True
+
+f_value_index = self.partitions.get_index(level, p_column,
+  str(f_value))
+if op == "=":
+return f_value_index == p_value_index
+elif op == "!=":
+return f_value_index != p_value_index
+else:
+return True
+
+def one_filter_accepts(piece, filter):
+return all(filter_accepts_partition(part_key, filter, level)
+   for level, part_key in enumerate(piece.partition_keys))
+
+def all_filters_accept(piece):
+return all(one_filter_accepts(piece, f) for f in filters)
+
+self.pieces = [p for p in self.pieces if all_filters_accept(p)]
+
 
 def _ensure_filesystem(fs):
 fs_type = type(fs)
diff --git a/python/pyarrow/tests/test_parquet.py 
b/python/pyarrow/tests/test_parquet.py
index b301de606..27d6bc781 100644
--- a/python/pyarrow/tests/test_parquet.py
+++ b/python/pyarrow/tests/test_parquet.py
@@ -996,6 +996,43 @@ def test_read_partitioned_directory(tmpdir):
 _partition_test_for_filesystem(fs, base_path)
 
 
+@parquet
+def test_read_partitioned_directory_filtered(tmpdir):
+fs = LocalFileSystem.get_instance()
+base_path = str(tmpdir)
+
+import pyarrow.parquet as pq
+
+foo_keys = [0, 1]
+bar_keys = ['a', 'b', 'c']
+partition_spec = [
+['foo', foo_keys],
+['bar', bar_keys]
+]
+N = 30
+
+df = pd.DataFrame({
+'index': np.arange(N),
+'foo': np.array(foo_keys, dtype='i4').repeat(15),
+'bar': np.tile(np.tile(np.array(bar_keys, dtype=object), 5), 2),
+'values': np.random.randn(N)
+}, columns=['index', 'foo', 'bar', 'values'])
+
+_generate_partition_directories(fs, base_path, partition_spec, df)
+
+dataset = pq.ParquetDataset(
+base_path, filesystem=fs,
+filters=[('foo', '=', 1), ('bar', '!=', 'b')]
+)
+table = dataset.read()
+result_df = (table.to_pandas()
+ .sort_values(by='index')
+ .reset_index(drop=True))
+
+assert 0 not in result_df['foo'].values
+assert 'b' not in result_df['bar'].values
+
+
 @pytest.yield_fixture
 def s3_example():
 access_key = os.environ['PYARROW_TEST_S3_ACCESS_KEY']


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:

[jira] [Resolved] (ARROW-2405) [C++] <functional> is missing in plasma/client.h

2018-04-06 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2405.

Resolution: Fixed

Issue resolved by pull request 1844
[https://github.com/apache/arrow/pull/1844]

> [C++] <functional> is missing in plasma/client.h
> 
>
> Key: ARROW-2405
> URL: https://issues.apache.org/jira/browse/ARROW-2405
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Affects Versions: 0.10.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> I got the following compile error:
> {noformat}
> In file included from 
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:20:0:
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:363:32: error: 
> ‘function’ in namespace ‘std’ does not name a template type
>  const std::function ^~~~
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:363:40: error: expected 
> ‘,’ or ‘...’ before ‘<’ token
>  const std::function ^
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:276:8: error: prototype 
> for ‘arrow::Status plasma::PlasmaClient::GetBuffers(const ObjectID*, int64_t, 
> int64_t, const std::function plasma::UniqueID&, const std::shared_ptr&)>&, 
> plasma::ObjectBuffer*)’ does not match any in class ‘plasma::PlasmaClient’
>  Status PlasmaClient::GetBuffers(
> ^~~~
> In file included from 
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:20:0:
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:362:10: error: candidate 
> is: arrow::Status plasma::PlasmaClient::GetBuffers(const ObjectID*, int64_t, 
> int64_t, int)
>Status GetBuffers(const ObjectID* object_ids, int64_t num_objects, int64_t 
> timeout_ms,
>   ^~
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc: In member function 
> ‘arrow::Status plasma::PlasmaClient::Get(const 
> std::vector&, int64_t, std::vector*)’:
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:410:85: error: no 
> matching function for call to ‘plasma::PlasmaClient::GetBuffers(const 
> value_type*, const size_t&, int64_t&, const plasma::PlasmaClient::Get(const 
> std::vector&, int64_t, 
> std::vector*):: std::shared_ptr&)>&, 
> __gnu_cxx::__alloc_traits >::value_type*)’
>return GetBuffers(_ids[0], num_objects, timeout_ms, wrap_buffer, 
> &(*out)[0]);
>   
>^
> In file included from 
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:20:0:
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:362:10: note: candidate: 
> arrow::Status plasma::PlasmaClient::GetBuffers(const ObjectID*, int64_t, 
> int64_t, int)
>Status GetBuffers(const ObjectID* object_ids, int64_t num_objects, int64_t 
> timeout_ms,
>   ^~
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:362:10: note:   
> candidate expects 4 arguments, 5 provided
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc: In member function 
> ‘arrow::Status plasma::PlasmaClient::Get(const ObjectID*, int64_t, int64_t, 
> plasma::ObjectBuffer*)’:
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:417:74: error: no 
> matching function for call to ‘plasma::PlasmaClient::GetBuffers(const 
> ObjectID*&, int64_t&, int64_t&, const plasma::PlasmaClient::Get(const 
> ObjectID*, int64_t, int64_t, plasma::ObjectBuffer*):: const std::shared_ptr&)>&, plasma::ObjectBuffer*&)’
>return GetBuffers(object_ids, num_objects, timeout_ms, wrap_buffer, out);
>   ^
> In file included from 
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:20:0:
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:362:10: note: candidate: 
> arrow::Status plasma::PlasmaClient::GetBuffers(const ObjectID*, int64_t, 
> int64_t, int)
>Status GetBuffers(const ObjectID* object_ids, int64_t num_objects, int64_t 
> timeout_ms,
>   ^~
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:362:10: note:   
> candidate expects 4 arguments, 5 provided
> {noformat}
> I don't know why it's not occurred on Travis CI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2405) [C++] <functional> is missing in plasma/client.h

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428308#comment-16428308
 ] 

ASF GitHub Bot commented on ARROW-2405:
---

xhochy closed pull request #1844: ARROW-2405: [C++] <functional> is required for 
std::function
URL: https://github.com/apache/arrow/pull/1844
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/cpp/src/plasma/client.h b/cpp/src/plasma/client.h
index dd8175d48..5787abc32 100644
--- a/cpp/src/plasma/client.h
+++ b/cpp/src/plasma/client.h
@@ -22,6 +22,7 @@
 #include 
 
 #include 
+#include <functional>
 #include 
 #include 
 #include 


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] <functional> is missing in plasma/client.h
> 
>
> Key: ARROW-2405
> URL: https://issues.apache.org/jira/browse/ARROW-2405
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Affects Versions: 0.10.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> I got the following compile error:
> {noformat}
> In file included from 
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:20:0:
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:363:32: error: 
> ‘function’ in namespace ‘std’ does not name a template type
>  const std::function ^~~~
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:363:40: error: expected 
> ‘,’ or ‘...’ before ‘<’ token
>  const std::function ^
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:276:8: error: prototype 
> for ‘arrow::Status plasma::PlasmaClient::GetBuffers(const ObjectID*, int64_t, 
> int64_t, const std::function plasma::UniqueID&, const std::shared_ptr&)>&, 
> plasma::ObjectBuffer*)’ does not match any in class ‘plasma::PlasmaClient’
>  Status PlasmaClient::GetBuffers(
> ^~~~
> In file included from 
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:20:0:
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:362:10: error: candidate 
> is: arrow::Status plasma::PlasmaClient::GetBuffers(const ObjectID*, int64_t, 
> int64_t, int)
>Status GetBuffers(const ObjectID* object_ids, int64_t num_objects, int64_t 
> timeout_ms,
>   ^~
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc: In member function 
> ‘arrow::Status plasma::PlasmaClient::Get(const 
> std::vector&, int64_t, std::vector*)’:
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:410:85: error: no 
> matching function for call to ‘plasma::PlasmaClient::GetBuffers(const 
> value_type*, const size_t&, int64_t&, const plasma::PlasmaClient::Get(const 
> std::vector&, int64_t, 
> std::vector*):: std::shared_ptr&)>&, 
> __gnu_cxx::__alloc_traits >::value_type*)’
>return GetBuffers(_ids[0], num_objects, timeout_ms, wrap_buffer, 
> &(*out)[0]);
>   
>^
> In file included from 
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:20:0:
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:362:10: note: candidate: 
> arrow::Status plasma::PlasmaClient::GetBuffers(const ObjectID*, int64_t, 
> int64_t, int)
>Status GetBuffers(const ObjectID* object_ids, int64_t num_objects, int64_t 
> timeout_ms,
>   ^~
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:362:10: note:   
> candidate expects 4 arguments, 5 provided
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc: In member function 
> ‘arrow::Status plasma::PlasmaClient::Get(const ObjectID*, int64_t, int64_t, 
> plasma::ObjectBuffer*)’:
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:417:74: error: no 
> matching function for call to ‘plasma::PlasmaClient::GetBuffers(const 
> ObjectID*&, int64_t&, int64_t&, const plasma::PlasmaClient::Get(const 
> ObjectID*, int64_t, int64_t, plasma::ObjectBuffer*):: const std::shared_ptr&)>&, plasma::ObjectBuffer*&)’
>return GetBuffers(object_ids, num_objects, timeout_ms, wrap_buffer, out);
>   

[jira] [Commented] (ARROW-2404) Fix declaration of 'type_id' hides class member warning in msvc build

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428295#comment-16428295
 ] 

ASF GitHub Bot commented on ARROW-2404:
---

xhochy closed pull request #1843: ARROW-2404: [C++] Fix "declaration of 
'type_id' hides class member" w…
URL: https://github.com/apache/arrow/pull/1843
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/cpp/src/arrow/type.h b/cpp/src/arrow/type.h
index 95f010a8f..ce213b999 100644
--- a/cpp/src/arrow/type.h
+++ b/cpp/src/arrow/type.h
@@ -437,8 +437,8 @@ class ARROW_EXPORT FixedSizeBinaryType : public 
FixedWidthType, public Parametri
 
   explicit FixedSizeBinaryType(int32_t byte_width)
   : FixedWidthType(Type::FIXED_SIZE_BINARY), byte_width_(byte_width) {}
-  explicit FixedSizeBinaryType(int32_t byte_width, Type::type type_id)
-  : FixedWidthType(type_id), byte_width_(byte_width) {}
+  explicit FixedSizeBinaryType(int32_t byte_width, Type::type override_type_id)
+  : FixedWidthType(override_type_id), byte_width_(byte_width) {}
 
   Status Accept(TypeVisitor* visitor) const override;
   std::string ToString() const override;


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Fix declaration of 'type_id' hides class member warning in msvc build
> -
>
> Key: ARROW-2404
> URL: https://issues.apache.org/jira/browse/ARROW-2404
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
> Environment: MSVC
>Reporter: rip.nsk
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> warning C4458: declaration of 'type_id' hides class member



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-2404) Fix declaration of 'type_id' hides class member warning in msvc build

2018-04-06 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-2404:
--

Assignee: rip.nsk

> Fix declaration of 'type_id' hides class member warning in msvc build
> -
>
> Key: ARROW-2404
> URL: https://issues.apache.org/jira/browse/ARROW-2404
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
> Environment: MSVC
>Reporter: rip.nsk
>Assignee: rip.nsk
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> warning C4458: declaration of 'type_id' hides class member



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-2404) Fix declaration of 'type_id' hides class member warning in msvc build

2018-04-06 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2404.

   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1843
[https://github.com/apache/arrow/pull/1843]

> Fix declaration of 'type_id' hides class member warning in msvc build
> -
>
> Key: ARROW-2404
> URL: https://issues.apache.org/jira/browse/ARROW-2404
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
> Environment: MSVC
>Reporter: rip.nsk
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> warning C4458: declaration of 'type_id' hides class member



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2398) [Rust] Provide a zero-copy builder for type-safe Buffer

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428292#comment-16428292
 ] 

ASF GitHub Bot commented on ARROW-2398:
---

xhochy closed pull request #1838: ARROW-2398: [Rust] Create Builder for 
building buffers directly in aligned memory
URL: https://github.com/apache/arrow/pull/1838
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/rust/README.md b/rust/README.md
index 20e4b73f1..2bbc7f104 100644
--- a/rust/README.md
+++ b/rust/README.md
@@ -25,17 +25,37 @@ This is a starting point for a native Rust implementation 
of Arrow.
 
 The current code demonstrates arrays of primitive types and structs.
 
-## Example
+## Creating an Array from a Vec
 
 ```rust
-let _schema = Schema::new(vec![
-Field::new("a", DataType::Int32, false),
-Field::new("b", DataType::Float32, false),
-]);
-
-let a = Rc::new(Array::from(vec![1,2,3,4,5]));
-let b = Rc::new(Array::from(vec![1.1, 2.2, 3.3, 4.4, 5.5]));
-let _ = Rc::new(Array::from(vec![a,b]));
+// create a memory-aligned Arrow array from an existing Vec
+let array = Array::from(vec![1,2,3,4,5]);
+
+match array.data() {
+::Int32(ref buffer) => {
+println!("array contents: {:?}", buffer.iter().collect::<Vec<i32>>());
+}
+_ => {}
+}
+```
+
+## Creating an Array from a Builder
+
+```rust
+let mut builder: Builder<i32> = Builder::new();
+for i in 0..10 {
+builder.push(i);
+}
+let buffer = builder.finish();
+let array = Array::from(buffer);
+```
+
+## Run Examples
+
+Examples can be run using the `cargo run --example` command. For example:
+
+```bash
+cargo run --example array_from_builder
 ```
 
 ## Run Tests
diff --git a/rust/examples/array_from_builder.rs 
b/rust/examples/array_from_builder.rs
new file mode 100644
index 0..3a273a64d
--- /dev/null
+++ b/rust/examples/array_from_builder.rs
@@ -0,0 +1,49 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+extern crate arrow;
+
+use arrow::array::*;
+use arrow::buffer::*;
+use arrow::builder::*;
+
+fn main() {
+let mut builder: Builder<i32> = Builder::new();
+for i in 0..10 {
+builder.push(i);
+}
+let buffer = builder.finish();
+
+println!("buffer length: {}", buffer.len());
+println!("buffer contents: {:?}", buffer.iter().collect::<Vec<i32>>());
+
+// note that the builder can no longer be used once it has built a buffer, 
so either
+// of the following calls will fail
+
+//builder.push(123);
+//builder.build();
+
+// create a memory-aligned Arrow from the builder (zero-copy)
+let array = Array::from(buffer);
+
+match array.data() {
+::Int32(ref buffer) => {
+println!("array contents: {:?}", 
buffer.iter().collect::<Vec<i32>>());
+}
+_ => {}
+}
+}
diff --git a/rust/examples/array_from_vec.rs b/rust/examples/array_from_vec.rs
new file mode 100644
index 0..8cb4b268f
--- /dev/null
+++ b/rust/examples/array_from_vec.rs
@@ -0,0 +1,32 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+extern crate arrow;
+
+use arrow::array::*;
+
+fn main() {
+// create a memory-aligned Arrow array from an existing Vec
+let array = Array::from(vec![1, 

[jira] [Resolved] (ARROW-2398) [Rust] Provide a zero-copy builder for type-safe Buffer

2018-04-06 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2398.

Resolution: Fixed

Issue resolved by pull request 1838
[https://github.com/apache/arrow/pull/1838]

> [Rust] Provide a zero-copy builder for type-safe Buffer
> --
>
> Key: ARROW-2398
> URL: https://issues.apache.org/jira/browse/ARROW-2398
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> This PR implements a builder so that buffers can be populated directly in 
> aligned memory (as opposed to being created from Vec).
>  
> https://github.com/apache/arrow/pull/1838



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2401) Support filters on Hive partitioned Parquet files

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428276#comment-16428276
 ] 

ASF GitHub Bot commented on ARROW-2401:
---

xhochy commented on a change in pull request #1840: ARROW-2401 Support filters 
on Hive partitioned Parquet files
URL: https://github.com/apache/arrow/pull/1840#discussion_r179750293
 
 

 ##
 File path: python/pyarrow/parquet.py
 ##
 @@ -849,6 +857,31 @@ def open_file(path, meta=None):
common_metadata=self.common_metadata)
 return open_file
 
+def _filter(self, filters):
+def filter_accepts_partition(part_key, filter, level):
+p_column, p_value_index = part_key
+f_column, op, f_value = filter
+if p_column != f_column:
+return True
+
+f_value_index = self.partitions.get_index(level, p_column,
+  str(f_value))
+if op == "=":
 
 Review comment:
   Clarified offline with @jneuff that we will refactor 
`self.partitions.get_index` a bit to support more operations than the two above.
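
For illustration, here is a minimal, self-contained sketch of how the partition 
keys of a dataset piece could be matched against (column, op, value) filter 
tuples. The helper names are hypothetical; this is not the actual pyarrow 
implementation being reviewed above.

{code:python}
# Hypothetical sketch: decide whether a dataset piece, described by its Hive
# partition keys, passes a list of (column, op, value) filters.
import operator

OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def piece_accepted(partition_keys, filters):
    # partition_keys: [(column, value), ...] for one piece, e.g. parsed from its path
    part = dict(partition_keys)
    for column, op, value in filters:
        # a filter on a column that is not a partition key never rejects a piece
        if column in part and not OPS[op](part[column], value):
            return False
    return True

pieces = [[("year", "2017"), ("month", "04")],
          [("year", "2018"), ("month", "04")]]
# keeps only the piece from the year=2018 partition
print([p for p in pieces if piece_accepted(p, [("year", "=", "2018")])])
{code}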


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support filters on Hive partitioned Parquet files
> -
>
> Key: ARROW-2401
> URL: https://issues.apache.org/jira/browse/ARROW-2401
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Julius Neuffer
>Priority: Minor
>  Labels: features, pull-request-available
>
> I'll open a PR on GitHub to support filtering of a `ParquetDataset` along a 
> Hive partitioned directory structure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-06 Thread Dave Challis (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Challis updated ARROW-2391:

Description: 
When trying to call `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and a 
`pyarrow.Schema` provided, the function call results in a segmentation fault if 
Pandas `datetime64[ns]` column tries to be converted to a `pyarrow.date64` type.

A minimal example which shows this is:
{code:python}
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({'created': ['2018-05-10T10:24:01']})
df['created'] = pd.to_datetime(df['created'])
schema = pa.schema([pa.field('created', pa.date64())])
pa.Table.from_pandas(df, schema=schema)
{code}

Executing the above causes the python interpreter to exit with "Segmentation 
fault: 11".

Attempting to convert into various other datatypes (by specifying different 
schemas) either succeeds, or raises an exception if the conversion is invalid.
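
For instance, mapping the same column to a timestamp type instead of date64 
appears to be one of the conversions that completes normally (an assumption on 
my part, not verified against 0.9.0):

{code:python}
# Assumed-working variant: map the datetime64[ns] column to pa.timestamp('ms')
# rather than pa.date64() (not verified against pyarrow 0.9.0).
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({'created': ['2018-05-10T10:24:01']})
df['created'] = pd.to_datetime(df['created'])
schema = pa.schema([pa.field('created', pa.timestamp('ms'))])
print(pa.Table.from_pandas(df, schema=schema))
{code}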

  was:
When trying to call `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and a 
`pyarrow.Schema` provided, the function call results in a segmentation fault if 
Pandas `datetime64[ns]` column tries to be converted to a `pyarrow.date64` type.

 

A minimal example which shows this is:

{{import pandas as pd}}
{{import pyarrow as pa}}

{{df = pd.DataFrame(\{'created': ['2018-05-10T10:24:01']})}}
{{df['created'] = pd.to_datetime(df['created'])}}
{{schema = pa.schema([pa.field('created', pa.date64())])}}
{{pa.Table.from_pandas(df, schema=schema)}}

 

Executing the above causes the python interpreter to exit with "Segmentation 
fault: 11".

 

Attempting to convert into various other datatypes (by specifying different 
schemas) either succeeds, or raises an exception if the conversion is invalid.

Summary: [Python] Segmentation fault from PyArrow when mapping Pandas 
datetime column to pyarrow.date64  (was: Segmentation fault from PyArrow when 
mapping Pandas datetime column to pyarrow.date64)

> [Python] Segmentation fault from PyArrow when mapping Pandas datetime column 
> to pyarrow.date64
> --
>
> Key: ARROW-2391
> URL: https://issues.apache.org/jira/browse/ARROW-2391
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6
>Reporter: Dave Challis
>Priority: Major
>
> When trying to call `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and 
> a `pyarrow.Schema` provided, the function call results in a segmentation 
> fault if Pandas `datetime64[ns]` column tries to be converted to a 
> `pyarrow.date64` type.
> A minimal example which shows this is:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'created': ['2018-05-10T10:24:01']})
> df['created'] = pd.to_datetime(df['created'])
> schema = pa.schema([pa.field('created', pa.date64())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> Executing the above causes the python interpreter to exit with "Segmentation 
> fault: 11".
> Attempting to convert into various other datatypes (by specifying different 
> schemas) either succeeds, or raises an exception if the conversion is invalid.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2406) [Python] Segfault when creating PyArrow table from Pandas for empty string column when schema provided

2018-04-06 Thread Dave Challis (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Challis updated ARROW-2406:

Description: 
Minimal example to recreate:
{code}
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema){code}
 
This causes the python interpreter to exit with "Segmentation fault: 11".

The following examples all work without any issue:
{code}
# column 'a' is no longer empty
df = pd.DataFrame({'a': ['foo']})
df['a'] = df['a'].astype(str)
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema)
{code}
{code}
# column 'a' is empty, but no schema is specified
df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)
pa.Table.from_pandas(df)
{code}
{code}
# column 'a' is empty, but no type 'str' specified in Pandas
df = pd.DataFrame({'a': []})
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema)
{code}
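
One possible way to sidestep the crash (an untested sketch, assuming the goal 
is an empty, explicitly string-typed column) is to build the table from a typed 
empty Arrow array instead of going through {{from_pandas}}:

{code}
# Untested workaround sketch: construct the empty string column directly as an
# Arrow array, bypassing the pandas conversion path that segfaults.
import pyarrow as pa

arr = pa.array([], type=pa.string())
table = pa.Table.from_arrays([arr], ['a'])
print(table.schema)
{code}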
 

  was:
Minimal example to recreate:
{code}
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema){code}
 

This causes the python interpreter to exit with "Segmentation fault: 11".

The following examples all work without any issue:
{code}
# column 'a' is no longer empty
df = pd.DataFrame({'a': ['foo']})
df['a'] = df['a'].astype(str)
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema)
{code}
{code}
# column 'a' is empty, but no schema is specified
df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)
pa.Table.from_pandas(df)
{code}
{code}
# column 'a' is empty, but no type 'str' specified in Pandas
df = pd.DataFrame({'a': []})
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema)
{code}
 


> [Python] Segfault when creating PyArrow table from Pandas for empty string 
> column when schema provided
> --
>
> Key: ARROW-2406
> URL: https://issues.apache.org/jira/browse/ARROW-2406
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6.3
>Reporter: Dave Challis
>Priority: Major
>
> Minimal example to recreate:
> {code}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'a': []})
> df['a'] = df['a'].astype(str)
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema){code}
>  
> This causes the python interpreter to exit with "Segmentation fault: 11".
> The following examples all work without any issue:
> {code}
> # column 'a' is no longer empty
> df = pd.DataFrame({'a': ['foo']})
> df['a'] = df['a'].astype(str)
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> {code}
> # column 'a' is empty, but no schema is specified
> df = pd.DataFrame({'a': []})
> df['a'] = df['a'].astype(str)
> pa.Table.from_pandas(df)
> {code}
> {code}
> # column 'a' is empty, but no type 'str' specified in Pandas
> df = pd.DataFrame({'a': []})
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2406) [Python] Segfault when creating PyArrow table from Pandas for empty string column when schema provided

2018-04-06 Thread Dave Challis (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Challis updated ARROW-2406:

Description: 
Minimal example to recreate:
{code}
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema){code}
 

This causes the python interpreter to exit with "Segmentation fault: 11".

The following examples all work without any issue:
{code}
# column 'a' is no longer empty
df = pd.DataFrame({'a': ['foo']})
df['a'] = df['a'].astype(str)
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema)
{code}
{code}
# column 'a' is empty, but no schema is specified
df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)
pa.Table.from_pandas(df)
{code}
 

  was:
Minimal example to recreate:

 

 
{code:python}
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema){code}
 

This causes the python interpreter to exit with "Segmentation fault: 11".

The following examples all work without any issue:
{code:python}
# column 'a' is no longer empty
df = pd.DataFrame({'a': ['foo']})
df['a'] = df['a'].astype(str)
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema)
{code}

{code:python}
# column 'a' is empty, but no schema is specified
df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)
pa.Table.from_pandas(df)
{code}


 


> [Python] Segfault when creating PyArrow table from Pandas for empty string 
> column when schema provided
> --
>
> Key: ARROW-2406
> URL: https://issues.apache.org/jira/browse/ARROW-2406
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6.3
>Reporter: Dave Challis
>Priority: Major
>
> Minimal example to recreate:
> {code}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'a': []})
> df['a'] = df['a'].astype(str)
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema){code}
>  
> This causes the python interpreter to exit with "Segmentation fault: 11".
> The following examples all work without any issue:
> {code}
> # column 'a' is no longer empty
> df = pd.DataFrame({'a': ['foo']})
> df['a'] = df['a'].astype(str)
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> {code}
> # column 'a' is empty, but no schema is specified
> df = pd.DataFrame({'a': []})
> df['a'] = df['a'].astype(str)
> pa.Table.from_pandas(df)
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2406) [Python] Segfault when creating PyArrow table from Pandas for empty string column when schema provided

2018-04-06 Thread Dave Challis (JIRA)
Dave Challis created ARROW-2406:
---

 Summary: [Python] Segfault when creating PyArrow table from Pandas 
for empty string column when schema provided
 Key: ARROW-2406
 URL: https://issues.apache.org/jira/browse/ARROW-2406
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
 Environment: Mac OS High Sierra
Python 3.6.3
Reporter: Dave Challis


Minimal example to recreate:

 

 
{code:python}
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema){code}
 

This causes the python interpreter to exit with "Segmentation fault: 11".

The following examples all work without any issue:
{code:python}
# column 'a' is no longer empty
df = pd.DataFrame({'a': ['foo']})
df['a'] = df['a'].astype(str)
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema)
{code}

{code:python}
# column 'a' is empty, but no schema is specified
df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)
pa.Table.from_pandas(df)
{code}


 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2405) [C++] <functional> is missing in plasma/client.h

2018-04-06 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2405:
--
Labels: pull-request-available  (was: )

> [C++] <functional> is missing in plasma/client.h
> 
>
> Key: ARROW-2405
> URL: https://issues.apache.org/jira/browse/ARROW-2405
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Affects Versions: 0.10.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> I got the following compile error:
> {noformat}
> In file included from 
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:20:0:
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:363:32: error: 
> ‘function’ in namespace ‘std’ does not name a template type
>  const std::function ^~~~
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:363:40: error: expected 
> ‘,’ or ‘...’ before ‘<’ token
>  const std::function ^
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:276:8: error: prototype 
> for ‘arrow::Status plasma::PlasmaClient::GetBuffers(const ObjectID*, int64_t, 
> int64_t, const std::function plasma::UniqueID&, const std::shared_ptr&)>&, 
> plasma::ObjectBuffer*)’ does not match any in class ‘plasma::PlasmaClient’
>  Status PlasmaClient::GetBuffers(
> ^~~~
> In file included from 
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:20:0:
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:362:10: error: candidate 
> is: arrow::Status plasma::PlasmaClient::GetBuffers(const ObjectID*, int64_t, 
> int64_t, int)
>Status GetBuffers(const ObjectID* object_ids, int64_t num_objects, int64_t 
> timeout_ms,
>   ^~
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc: In member function 
> ‘arrow::Status plasma::PlasmaClient::Get(const 
> std::vector&, int64_t, std::vector*)’:
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:410:85: error: no 
> matching function for call to ‘plasma::PlasmaClient::GetBuffers(const 
> value_type*, const size_t&, int64_t&, const plasma::PlasmaClient::Get(const 
> std::vector&, int64_t, 
> std::vector*):: std::shared_ptr&)>&, 
> __gnu_cxx::__alloc_traits >::value_type*)’
>return GetBuffers(_ids[0], num_objects, timeout_ms, wrap_buffer, 
> &(*out)[0]);
>   
>^
> In file included from 
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:20:0:
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:362:10: note: candidate: 
> arrow::Status plasma::PlasmaClient::GetBuffers(const ObjectID*, int64_t, 
> int64_t, int)
>Status GetBuffers(const ObjectID* object_ids, int64_t num_objects, int64_t 
> timeout_ms,
>   ^~
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:362:10: note:   
> candidate expects 4 arguments, 5 provided
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc: In member function 
> ‘arrow::Status plasma::PlasmaClient::Get(const ObjectID*, int64_t, int64_t, 
> plasma::ObjectBuffer*)’:
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:417:74: error: no 
> matching function for call to ‘plasma::PlasmaClient::GetBuffers(const 
> ObjectID*&, int64_t&, int64_t&, const plasma::PlasmaClient::Get(const 
> ObjectID*, int64_t, int64_t, plasma::ObjectBuffer*):: const std::shared_ptr&)>&, plasma::ObjectBuffer*&)’
>return GetBuffers(object_ids, num_objects, timeout_ms, wrap_buffer, out);
>   ^
> In file included from 
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:20:0:
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:362:10: note: candidate: 
> arrow::Status plasma::PlasmaClient::GetBuffers(const ObjectID*, int64_t, 
> int64_t, int)
>Status GetBuffers(const ObjectID* object_ids, int64_t num_objects, int64_t 
> timeout_ms,
>   ^~
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:362:10: note:   
> candidate expects 4 arguments, 5 provided
> {noformat}
> I don't know why it doesn't occur on Travis CI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2405) [C++] <functional> is missing in plasma/client.h

2018-04-06 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-2405:
---

 Summary: [C++] <functional> is missing in plasma/client.h
 Key: ARROW-2405
 URL: https://issues.apache.org/jira/browse/ARROW-2405
 Project: Apache Arrow
  Issue Type: Bug
  Components: Plasma (C++)
Affects Versions: 0.10.0
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
 Fix For: 0.10.0


I got the following compile error:

{noformat}
In file included from 
/home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:20:0:
/home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:363:32: error: ‘function’ 
in namespace ‘std’ does not name a template type
 const std::function&, 
plasma::ObjectBuffer*)’ does not match any in class ‘plasma::PlasmaClient’
 Status PlasmaClient::GetBuffers(
^~~~
In file included from 
/home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:20:0:
/home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:362:10: error: candidate 
is: arrow::Status plasma::PlasmaClient::GetBuffers(const ObjectID*, int64_t, 
int64_t, int)
   Status GetBuffers(const ObjectID* object_ids, int64_t num_objects, int64_t 
timeout_ms,
  ^~
/home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc: In member function 
‘arrow::Status plasma::PlasmaClient::Get(const std::vector&, 
int64_t, std::vector*)’:
/home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:410:85: error: no 
matching function for call to ‘plasma::PlasmaClient::GetBuffers(const 
value_type*, const size_t&, int64_t&, const plasma::PlasmaClient::Get(const 
std::vector&, int64_t, 
std::vector*)::&, 
__gnu_cxx::__alloc_traits::value_type*)’
   return GetBuffers(_ids[0], num_objects, timeout_ms, wrap_buffer, 
&(*out)[0]);

 ^
In file included from 
/home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:20:0:
/home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:362:10: note: candidate: 
arrow::Status plasma::PlasmaClient::GetBuffers(const ObjectID*, int64_t, 
int64_t, int)
   Status GetBuffers(const ObjectID* object_ids, int64_t num_objects, int64_t 
timeout_ms,
  ^~
/home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:362:10: note:   candidate 
expects 4 arguments, 5 provided
/home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc: In member function 
‘arrow::Status plasma::PlasmaClient::Get(const ObjectID*, int64_t, int64_t, 
plasma::ObjectBuffer*)’:
/home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:417:74: error: no 
matching function for call to ‘plasma::PlasmaClient::GetBuffers(const 
ObjectID*&, int64_t&, int64_t&, const plasma::PlasmaClient::Get(const 
ObjectID*, int64_t, int64_t, plasma::ObjectBuffer*)::&, plasma::ObjectBuffer*&)’
   return GetBuffers(object_ids, num_objects, timeout_ms, wrap_buffer, out);
  ^
In file included from 
/home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:20:0:
/home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:362:10: note: candidate: 
arrow::Status plasma::PlasmaClient::GetBuffers(const ObjectID*, int64_t, 
int64_t, int)
   Status GetBuffers(const ObjectID* object_ids, int64_t num_objects, int64_t 
timeout_ms,
  ^~
/home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:362:10: note:   candidate 
expects 4 arguments, 5 provided
{noformat}

I don't know why it doesn't occur on Travis CI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2405) [C++] <functional> is missing in plasma/client.h

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428076#comment-16428076
 ] 

ASF GitHub Bot commented on ARROW-2405:
---

kou opened a new pull request #1844: ARROW-2405: [C++] <functional> is required 
for std::function
URL: https://github.com/apache/arrow/pull/1844
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] <functional> is missing in plasma/client.h
> 
>
> Key: ARROW-2405
> URL: https://issues.apache.org/jira/browse/ARROW-2405
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Affects Versions: 0.10.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> I got the following compile error:
> {noformat}
> In file included from 
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:20:0:
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:363:32: error: 
> ‘function’ in namespace ‘std’ does not name a template type
>  const std::function ^~~~
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:363:40: error: expected 
> ‘,’ or ‘...’ before ‘<’ token
>  const std::function ^
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:276:8: error: prototype 
> for ‘arrow::Status plasma::PlasmaClient::GetBuffers(const ObjectID*, int64_t, 
> int64_t, const std::function plasma::UniqueID&, const std::shared_ptr&)>&, 
> plasma::ObjectBuffer*)’ does not match any in class ‘plasma::PlasmaClient’
>  Status PlasmaClient::GetBuffers(
> ^~~~
> In file included from 
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:20:0:
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:362:10: error: candidate 
> is: arrow::Status plasma::PlasmaClient::GetBuffers(const ObjectID*, int64_t, 
> int64_t, int)
>Status GetBuffers(const ObjectID* object_ids, int64_t num_objects, int64_t 
> timeout_ms,
>   ^~
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc: In member function 
> ‘arrow::Status plasma::PlasmaClient::Get(const 
> std::vector&, int64_t, std::vector*)’:
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:410:85: error: no 
> matching function for call to ‘plasma::PlasmaClient::GetBuffers(const 
> value_type*, const size_t&, int64_t&, const plasma::PlasmaClient::Get(const 
> std::vector&, int64_t, 
> std::vector*):: std::shared_ptr&)>&, 
> __gnu_cxx::__alloc_traits >::value_type*)’
>return GetBuffers(_ids[0], num_objects, timeout_ms, wrap_buffer, 
> &(*out)[0]);
>   
>^
> In file included from 
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:20:0:
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:362:10: note: candidate: 
> arrow::Status plasma::PlasmaClient::GetBuffers(const ObjectID*, int64_t, 
> int64_t, int)
>Status GetBuffers(const ObjectID* object_ids, int64_t num_objects, int64_t 
> timeout_ms,
>   ^~
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:362:10: note:   
> candidate expects 4 arguments, 5 provided
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc: In member function 
> ‘arrow::Status plasma::PlasmaClient::Get(const ObjectID*, int64_t, int64_t, 
> plasma::ObjectBuffer*)’:
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:417:74: error: no 
> matching function for call to ‘plasma::PlasmaClient::GetBuffers(const 
> ObjectID*&, int64_t&, int64_t&, const plasma::PlasmaClient::Get(const 
> ObjectID*, int64_t, int64_t, plasma::ObjectBuffer*):: const std::shared_ptr&)>&, plasma::ObjectBuffer*&)’
>return GetBuffers(object_ids, num_objects, timeout_ms, wrap_buffer, out);
>   ^
> In file included from 
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.cc:20:0:
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:362:10: note: candidate: 
> arrow::Status plasma::PlasmaClient::GetBuffers(const ObjectID*, int64_t, 
> int64_t, int)
>Status GetBuffers(const ObjectID* object_ids, int64_t num_objects, int64_t 
> timeout_ms,
>   ^~
> /home/kou/work/cpp/arrow.kou/cpp/src/plasma/client.h:362:10: note:   
> candidate expects 4 arguments, 5 provided

[jira] [Resolved] (ARROW-2267) Rust bindings

2018-04-06 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2267.

Resolution: Fixed

Resolved this as we have a basic set of bindings in Rust. It will take longer 
until we have all integration tests set up, but there is no benefit in keeping 
the issue open.

> Rust bindings
> -
>
> Key: ARROW-2267
> URL: https://issues.apache.org/jira/browse/ARROW-2267
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Joshua Howard
>Assignee: Andy Grove
>Priority: Major
> Fix For: 0.10.0
>
>
> Provide Rust bindings for Arrow. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2267) Rust bindings

2018-04-06 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2267:
---
Fix Version/s: 0.10.0

> Rust bindings
> -
>
> Key: ARROW-2267
> URL: https://issues.apache.org/jira/browse/ARROW-2267
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Joshua Howard
>Assignee: Andy Grove
>Priority: Major
> Fix For: 0.10.0
>
>
> Provide Rust bindings for Arrow. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-2267) Rust bindings

2018-04-06 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-2267:
--

Assignee: Andy Grove

> Rust bindings
> -
>
> Key: ARROW-2267
> URL: https://issues.apache.org/jira/browse/ARROW-2267
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Joshua Howard
>Assignee: Andy Grove
>Priority: Major
> Fix For: 0.10.0
>
>
> Provide Rust bindings for Arrow. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)