[jira] [Resolved] (ARROW-4411) [C++] Add missing parameter documentation to UnaryKernel to fix build

2019-01-30 Thread Micah Kornfield (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield resolved ARROW-4411.

Resolution: Duplicate

> [C++] Add missing parameter documentation to UnaryKernel to fix build
> -
>
> Key: ARROW-4411
> URL: https://issues.apache.org/jira/browse/ARROW-4411
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Also missing was documentation in util-internal.h



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-1425) [Python] Document semantic differences between Spark timestamps and Arrow timestamps

2019-01-30 Thread Micah Kornfield (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756928#comment-16756928
 ] 

Micah Kornfield commented on ARROW-1425:


[~icexelloss] I pushed a new PR for this so if you don't mind, I will try to 
finish it up.

> [Python] Document semantic differences between Spark timestamps and Arrow 
> timestamps
> 
>
> Key: ARROW-1425
> URL: https://issues.apache.org/jira/browse/ARROW-1425
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Micah Kornfield
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The way that Spark treats non-timezone-aware timestamps as session local can 
> be problematic when using pyarrow which may view the data coming from 
> toPandas() as time zone naive (but with fields as though it were UTC, not 
> session local). We should document carefully how to properly handle the data 
> coming from Spark to avoid problems.
> cc [~bryanc] [~holdenkarau]
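
The mismatch described above can be illustrated with a minimal sketch using only the standard library. It assumes a hypothetical session time zone with a fixed UTC-8 offset (`SESSION_TZ`), and the helper name `naive_session_local_to_utc` is made up for illustration; it is not a Spark or pyarrow API.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical session time zone (assumption): a fixed UTC-8 offset.
SESSION_TZ = timezone(timedelta(hours=-8))


def naive_session_local_to_utc(naive, session_tz=SESSION_TZ):
    """Interpret a naive timestamp as session-local wall time and return it in UTC."""
    if naive.tzinfo is not None:
        raise ValueError("expected a time-zone-naive timestamp")
    return naive.replace(tzinfo=session_tz).astimezone(timezone.utc)


naive = datetime(2019, 1, 30, 12, 0, 0)

# Reading the same fields as though they were already UTC.
read_as_utc = naive.replace(tzinfo=timezone.utc)

# Reading the fields as session-local wall time, then normalizing to UTC.
read_as_session_local = naive_session_local_to_utc(naive)

# The two readings disagree by the session's UTC offset.
print(read_as_session_local - read_as_utc)
```

The two interpretations of the same naive value differ by the session offset, which is why the documentation needs to say which interpretation applies.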





[jira] [Assigned] (ARROW-1425) [Python] Document semantic differences between Spark timestamps and Arrow timestamps

2019-01-30 Thread Micah Kornfield (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield reassigned ARROW-1425:
--

Assignee: Micah Kornfield  (was: Li Jin)

> [Python] Document semantic differences between Spark timestamps and Arrow 
> timestamps
> 
>
> Key: ARROW-1425
> URL: https://issues.apache.org/jira/browse/ARROW-1425
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Micah Kornfield
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The way that Spark treats non-timezone-aware timestamps as session local can 
> be problematic when using pyarrow which may view the data coming from 
> toPandas() as time zone naive (but with fields as though it were UTC, not 
> session local). We should document carefully how to properly handle the data 
> coming from Spark to avoid problems.
> cc [~bryanc] [~holdenkarau]





[jira] [Commented] (ARROW-4418) [Plasma] replace event loop with boost::asio for plasma store

2019-01-30 Thread Siyuan Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756914#comment-16756914
 ] 

Siyuan Zhuang commented on ARROW-4418:
--

[~zhijunfu] I wonder if we could just move "client_connection" from Ray to 
Arrow, so we can share some common functions.

> [Plasma] replace event loop with boost::asio for plasma store
> -
>
> Key: ARROW-4418
> URL: https://issues.apache.org/jira/browse/ARROW-4418
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Zhijun Fu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Original text:
> It would be nice to move plasma store from current event loop to boost::asio 
> to modernize the code, and more importantly to benefit from the 
> functionalities provided by asio, which I think also provides opportunities 
> for performance improvement.





[jira] [Created] (ARROW-4436) [Documentation] Clarify instructions for building documentation

2019-01-30 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-4436:
--

 Summary: [Documentation] Clarify instructions for building 
documentation
 Key: ARROW-4436
 URL: https://issues.apache.org/jira/browse/ARROW-4436
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation
Reporter: Micah Kornfield


[https://arrow.apache.org/docs/building.html#building-docs] seems to assume 
some prior setup.  It is not entirely clear what that setup is.  At the very 
least we should update it to bridge the gap from instructions for building from 
source 
https://arrow.apache.org/docs/python/development.html#building-the-documentation





[jira] [Updated] (ARROW-4430) [C++] add unit test for currently unused append method

2019-01-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4430:
--
Labels: pull-request-available  (was: )

> [C++] add unit test for currently unused append method
> --
>
> Key: ARROW-4430
> URL: https://issues.apache.org/jira/browse/ARROW-4430
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Benjamin Kietzman
>Assignee: Benjamin Kietzman
>Priority: Minor
>  Labels: pull-request-available
>
> TypedBufferBuilder<T>::Append(num_copies, value) is currently not 
> tested and is incorrect. Add a test and correct the method.





[jira] [Commented] (ARROW-4418) [Plasma] replace event loop with boost::asio for plasma store

2019-01-30 Thread Zhijun Fu (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756803#comment-16756803
 ] 

Zhijun Fu commented on ARROW-4418:
--

[~robertnishihara], yes, totally agreed that the move from the event loop to asio 
should be a straightforward change. The multithreaded optimization can be 
evaluated after that, to see whether it provides non-trivial performance 
improvements.

> [Plasma] replace event loop with boost::asio for plasma store
> -
>
> Key: ARROW-4418
> URL: https://issues.apache.org/jira/browse/ARROW-4418
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Zhijun Fu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Original text:
> It would be nice to move plasma store from current event loop to boost::asio 
> to modernize the code, and more importantly to benefit from the 
> functionalities provided by asio, which I think also provides opportunities 
> for performance improvement.





[jira] [Commented] (ARROW-4418) [Plasma] replace event loop with boost::asio for plasma store

2019-01-30 Thread Robert Nishihara (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756791#comment-16756791
 ] 

Robert Nishihara commented on ARROW-4418:
-

[~zhijunfu], I'd suggest doing a straightforward translation from the AE event 
loop to boost::asio. The multithreaded architecture sounds pretty complex, but 
I think it makes sense to explore if it provides substantial performance 
benefits (though it may not make sense for modest performance benefits).

Maybe we should use {{asio}} instead of {{boost::asio}} because {{asio}} looks 
like it's header only. http://think-async.com/Asio/AsioAndBoostAsio

[~pitrou], the non-boost {{asio}} looks like it's header only, so that may be 
preferable. What do you think about the overall tradeoff?

> [Plasma] replace event loop with boost::asio for plasma store
> -
>
> Key: ARROW-4418
> URL: https://issues.apache.org/jira/browse/ARROW-4418
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Zhijun Fu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Original text:
> It would be nice to move plasma store from current event loop to boost::asio 
> to modernize the code, and more importantly to benefit from the 
> functionalities provided by asio, which I think also provides opportunities 
> for performance improvement.





[jira] [Resolved] (ARROW-4422) [Plasma] Enforce memory limit in plasma, rather than relying on dlmalloc_set_footprint_limit

2019-01-30 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-4422.
---
Resolution: Fixed

Issue resolved by pull request 3526
[https://github.com/apache/arrow/pull/3526]

> [Plasma] Enforce memory limit in plasma, rather than relying on 
> dlmalloc_set_footprint_limit
> 
>
> Key: ARROW-4422
> URL: https://issues.apache.org/jira/browse/ARROW-4422
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Plasma (C++)
>Affects Versions: 0.12.0
>Reporter: Anurag Khandelwal
>Assignee: Anurag Khandelwal
>Priority: Minor
> Fix For: 0.13.0
>
>
> Currently, Plasma relies on dlmalloc_set_footprint_limit to limit the memory 
> utilization for Plasma Store. This is restrictive because:
>  * It restricts Plasma to dlmalloc, which supports limiting memory footprint, 
> as opposed to other, potentially more performant malloc implementations 
> (e.g., jemalloc)
>  * dlmalloc_set_footprint_limit does not guarantee that the limit set by it 
> is the amount of _usable_ memory. As such, we might trigger evictions much 
> earlier than hitting this limit, e.g., due to fragmentation or metadata 
> overheads.
> To overcome this, we can impose the memory limit at Plasma by tracking the 
> number of bytes allocated and freed using malloc and free calls. Whenever the 
> allocation reaches the set limit, we fail any subsequent allocations (i.e., 
> return NULL from malloc). This allows Plasma to not be tied to dlmalloc, and 
> also provides more accurate tracking of memory allocation/capacity. 
> Caveat: We will need to make sure that the mmaped files are living on a file 
> system that is a bit larger (depending on malloc implementation) than the 
> Plasma memory limit to account for the extra memory required due to 
> fragmentation/metadata overheads.
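
The byte-tracking approach described above can be sketched roughly as follows. This is a Python analog for illustration only, not Plasma's C++ code: the class name `LimitedAllocator` and its integer handles are hypothetical, and the sketch assumes that an allocation is rejected when it would push the tracked total past the limit.

```python
class LimitedAllocator:
    """Hypothetical wrapper that enforces a byte limit independently of the
    underlying malloc implementation (illustration only, not Plasma code)."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.allocated_bytes = 0
        self._sizes = {}       # handle -> size, so free() can subtract it
        self._next_handle = 1

    def malloc(self, size):
        if size < 0:
            raise ValueError("size must be non-negative")
        # Assumption: reject any request that would exceed the limit.
        if self.allocated_bytes + size > self.limit_bytes:
            return None  # stands in for returning NULL from malloc
        handle = self._next_handle
        self._next_handle += 1
        self._sizes[handle] = size
        self.allocated_bytes += size
        return handle

    def free(self, handle):
        # Subtract the recorded size so the capacity becomes available again.
        self.allocated_bytes -= self._sizes.pop(handle)


allocator = LimitedAllocator(limit_bytes=100)
first = allocator.malloc(60)
rejected = allocator.malloc(50)   # would exceed the limit, so it fails
allocator.free(first)
second = allocator.malloc(50)     # succeeds once the space is released
```

Because the counter lives in the wrapper, the same accounting works whichever malloc implementation sits underneath; the caveat about fragmentation and metadata overhead in the backing file system still applies.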





[jira] [Updated] (ARROW-4422) [Plasma] Enforce memory limit in plasma, rather than relying on dlmalloc_set_footprint_limit

2019-01-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4422:
--
Labels: pull-request-available  (was: )

> [Plasma] Enforce memory limit in plasma, rather than relying on 
> dlmalloc_set_footprint_limit
> 
>
> Key: ARROW-4422
> URL: https://issues.apache.org/jira/browse/ARROW-4422
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Plasma (C++)
>Affects Versions: 0.12.0
>Reporter: Anurag Khandelwal
>Assignee: Anurag Khandelwal
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> Currently, Plasma relies on dlmalloc_set_footprint_limit to limit the memory 
> utilization for Plasma Store. This is restrictive because:
>  * It restricts Plasma to dlmalloc, which supports limiting memory footprint, 
> as opposed to other, potentially more performant malloc implementations 
> (e.g., jemalloc)
>  * dlmalloc_set_footprint_limit does not guarantee that the limit set by it 
> is the amount of _usable_ memory. As such, we might trigger evictions much 
> earlier than hitting this limit, e.g., due to fragmentation or metadata 
> overheads.
> To overcome this, we can impose the memory limit at Plasma by tracking the 
> number of bytes allocated and freed using malloc and free calls. Whenever the 
> allocation reaches the set limit, we fail any subsequent allocations (i.e., 
> return NULL from malloc). This allows Plasma to not be tied to dlmalloc, and 
> also provides more accurate tracking of memory allocation/capacity. 
> Caveat: We will need to make sure that the mmaped files are living on a file 
> system that is a bit larger (depending on malloc implementation) than the 
> Plasma memory limit to account for the extra memory required due to 
> fragmentation/metadata overheads.





[jira] [Created] (ARROW-4435) [C#] Add .sln file and minor .csproj fix ups

2019-01-30 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-4435:
---

 Summary: [C#] Add .sln file and minor .csproj fix ups
 Key: ARROW-4435
 URL: https://issues.apache.org/jira/browse/ARROW-4435
 Project: Apache Arrow
  Issue Type: Task
  Components: C#
Reporter: Eric Erhardt


There is currently no .sln file in the repo, which makes it hard to use the src 
and test code at the same time.

 

Also, there are some settings in the .csproj that can be moved up to the outer 
PropertyGroup, and not under a "Configuration|Platform" conditional, like they 
were in the old .csproj format.





[jira] [Resolved] (ARROW-4268) [C++] Add C primitive to Arrow:Type compile time in TypeTraits

2019-01-30 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4268.
-
Resolution: Fixed

resolved by 
https://github.com/apache/arrow/commit/012f77a96880cff49f588ae1ff2f65d5105ee433

> [C++] Add C primitive to Arrow:Type compile time in TypeTraits
> --
>
> Key: ARROW-4268
> URL: https://issues.apache.org/jira/browse/ARROW-4268
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>   Original Estimate: 1h
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The user would use something like
> {code:c++}
> ...
> using ArrowType = CTypeTraits<T>::ArrowType;
> using ArrayType = CTypeTraits<T>::ArrayType;
> auto type = CTypeTraits<T>::type_singleton();
> {code}





[jira] [Resolved] (ARROW-4407) [C++] ExternalProject_Add does not capture CC/CXX correctly

2019-01-30 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4407.
-
Resolution: Fixed

Issue resolved by pull request 3515
[https://github.com/apache/arrow/pull/3515]

> [C++] ExternalProject_Add does not capture CC/CXX correctly
> ---
>
> Key: ARROW-4407
> URL: https://issues.apache.org/jira/browse/ARROW-4407
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The issue is that CC/CXX environment variables are captured on the first 
> invocation of the builder (e.g., make or ninja) instead of when CMake is 
> invoked in the build directory. This can lead to compilation errors (notably 
> when compiling with clang in the top directory due to the addition of the 
> `-Qunused-arguments` option).
> This leads to an issue where I have a script that prepares the build directory 
> and exports CXX within the script. When I jump into the build folder, there's a 
> mismatch between the compiler used for the external gbenchmark (and all deps 
> if conda is not used) and the one used for the build.
> To reproduce:
> # Create a new build directory with clang as compiler, don't build yet
> # In a new shell (without the compiler environment variable), go into the 
> build directory and invoke make/ninja
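
The difference between capturing the compiler at configure time and at build time can be modeled with a small sketch. It is plain Python rather than CMake, and every function name in it is hypothetical; it only shows why a value read lazily from the environment can diverge from the one recorded when the build directory was prepared.

```python
def configure(env):
    """Configure step: snapshot the compilers from the environment once."""
    return {"CC": env.get("CC", "cc"), "CXX": env.get("CXX", "c++")}


def external_compiler_lazy(build_env):
    """Problematic behaviour: the external project reads CXX at build time."""
    return build_env.get("CXX", "c++")


def external_compiler_pinned(cache):
    """Desired behaviour: the external project reuses the configure-time value."""
    return cache["CXX"]


# Step 1: the build directory is prepared in a shell that exports clang.
cache = configure({"CC": "clang", "CXX": "clang++"})

# Step 2: make/ninja runs later in a new shell without those variables.
build_env = {}

lazy = external_compiler_lazy(build_env)
pinned = external_compiler_pinned(cache)
print(lazy, pinned)
```

In this model the lazy lookup falls back to the default compiler while the main build keeps the configured one, which is the mismatch the reproduction steps trigger.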





[jira] [Created] (ARROW-4434) [Python] Cannot create empty StructArray via pa.StructArray.from_arrays

2019-01-30 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-4434:
--

 Summary: [Python] Cannot create empty StructArray via 
pa.StructArray.from_arrays
 Key: ARROW-4434
 URL: https://issues.apache.org/jira/browse/ARROW-4434
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Krisztian Szucs


{code:python}
In [5]: pa.StructArray.from_arrays([], names=[])
---
ValueError                                Traceback (most recent call last)
 in 
----> 1 pa.StructArray.from_arrays([], names=[])

~/Workspace/arrow/python/pyarrow/array.pxi in 
pyarrow.lib.StructArray.from_arrays()
   1326         num_arrays = len(arrays)
   1327         if num_arrays == 0:
-> 1328             raise ValueError("arrays list is empty")
   1329
   1330         length = len(arrays[0])

ValueError: arrays list is empty
{code}

however

{code:python}
pa.array([], type=pa.struct([]))
{code}

works





[jira] [Updated] (ARROW-4433) [R] Segmentation fault when instantiating arrow::table from data frame

2019-01-30 Thread Lutz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lutz updated ARROW-4433:

Summary: [R] Segmentation fault when instantiating arrow::table from data 
frame  (was: Segmentation fault when instantiating arrow::table from data frame)

> [R] Segmentation fault when instantiating arrow::table from data frame
> --
>
> Key: ARROW-4433
> URL: https://issues.apache.org/jira/browse/ARROW-4433
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
> Environment: R version 3.5.2 (2018-12-20)
> Platform: x86_64-suse-linux-gnu (64-bit)
>Reporter: Lutz
>Priority: Critical
>
> The sample code from [https://github.com/apache/arrow/tree/master/r] leads to 
> a segmentation fault
>  
> {quote}library(arrow, warn.conflicts = FALSE)
>  library(tibble)
>  library(reticulate)
>  tf <- tempfile() 
> (tib <- tibble(x = 1:10, y = rnorm(10)))
> arrow::write_arrow(tib, tf)
>  *** caught segfault *** 
> address (nil), cause 'memory not mapped' 
>  
> Traceback: 
>  1: Table__from_dataframe(.data) 
>  2: shared_ptr_is_null(xp) 
>  3: shared_ptr(`arrow::Table`, Table__from_dataframe(.data)) 
>  4: table(x) 
>  5: to_arrow.data.frame(x) 
>  6: to_arrow(x) 
>  7: write_arrow.fs_path(x, fs::path_abs(stream), ...) 
>  8: write_arrow(x, fs::path_abs(stream), ...) 
>  9: write_arrow.character(tib, tf) 
> 10: arrow::write_arrow(tib, tf)
>  {quote}
>  
> The same problem appears also when just calling arrow::table(tib):
> {quote}> arrow::table(tib) 
>  
>  *** caught segfault *** 
> address (nil), cause 'memory not mapped' 
>  
> Traceback: 
>  1: Table__from_dataframe(.data) 
>  2: shared_ptr_is_null(xp) 
>  3: shared_ptr(`arrow::Table`, Table__from_dataframe(.data)) 
>  4: arrow::table(tib)
>  
> {quote}





[jira] [Created] (ARROW-4433) Segmentation fault when instantiating arrow::table from data frame

2019-01-30 Thread Lutz (JIRA)
Lutz created ARROW-4433:
---

 Summary: Segmentation fault when instantiating arrow::table from 
data frame
 Key: ARROW-4433
 URL: https://issues.apache.org/jira/browse/ARROW-4433
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
 Environment: R version 3.5.2 (2018-12-20)
Platform: x86_64-suse-linux-gnu (64-bit)
Reporter: Lutz


The sample code from [https://github.com/apache/arrow/tree/master/r] leads to a 
segmentation fault

 
{quote}library(arrow, warn.conflicts = FALSE)
 library(tibble)
 library(reticulate)
 tf <- tempfile() 
(tib <- tibble(x = 1:10, y = rnorm(10)))
arrow::write_arrow(tib, tf)

 *** caught segfault *** 
address (nil), cause 'memory not mapped' 
 
Traceback: 
 1: Table__from_dataframe(.data) 
 2: shared_ptr_is_null(xp) 
 3: shared_ptr(`arrow::Table`, Table__from_dataframe(.data)) 
 4: table(x) 
 5: to_arrow.data.frame(x) 
 6: to_arrow(x) 
 7: write_arrow.fs_path(x, fs::path_abs(stream), ...) 
 8: write_arrow(x, fs::path_abs(stream), ...) 
 9: write_arrow.character(tib, tf) 
10: arrow::write_arrow(tib, tf)
 {quote}
 

The same problem appears also when just calling arrow::table(tib):
{quote}> arrow::table(tib) 
 
 *** caught segfault *** 
address (nil), cause 'memory not mapped' 
 
Traceback: 
 1: Table__from_dataframe(.data) 
 2: shared_ptr_is_null(xp) 
 3: shared_ptr(`arrow::Table`, Table__from_dataframe(.data)) 
 4: arrow::table(tib)
 
{quote}





[jira] [Updated] (ARROW-4432) [Python][Hypothesis] Empty table - pandas roundtrip produces inequal tables

2019-01-30 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-4432:
---
Description: 
The following test case fails for empty tables:

{code:python}
import hypothesis as h
import pyarrow.tests.strategies as past

@h.given(past.all_tables)
def test_pandas_roundtrip(table):
    df = table.to_pandas()
    table_ = pa.Table.from_pandas(df)
    assert table == table_
{code}

  was:
The following test case fails for empty tables:

{code:python}
@h.given(past.all_tables)
def test_pandas_roundtrip(table):
    df = table.to_pandas()
    table_ = pa.Table.from_pandas(df)
    assert table == table_
{code}


> [Python][Hypothesis] Empty table - pandas roundtrip produces inequal tables
> ---
>
> Key: ARROW-4432
> URL: https://issues.apache.org/jira/browse/ARROW-4432
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: hypothesis
>
> The following test case fails for empty tables:
> {code:python}
> import hypothesis as h
> import pyarrow.tests.strategies as past
> @h.given(past.all_tables)
> def test_pandas_roundtrip(table):
>     df = table.to_pandas()
>     table_ = pa.Table.from_pandas(df)
>     assert table == table_
> {code}





[jira] [Created] (ARROW-4432) [Python][Hypothesis] Empty table - pandas roundtrip produces inequal tables

2019-01-30 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-4432:
--

 Summary: [Python][Hypothesis] Empty table - pandas roundtrip 
produces inequal tables
 Key: ARROW-4432
 URL: https://issues.apache.org/jira/browse/ARROW-4432
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Krisztian Szucs


The following test case fails for empty tables:

{code:python}
@h.given(past.all_tables)
def test_pandas_roundtrip(table):
    df = table.to_pandas()
    table_ = pa.Table.from_pandas(df)
    assert table == table_
{code}





[jira] [Updated] (ARROW-4431) [C++] Build gRPC as ExternalProject without allowing it to build its vendored dependencies

2019-01-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4431:
--
Labels: pull-request-available  (was: )

> [C++] Build gRPC as ExternalProject without allowing it to build its vendored 
> dependencies
> --
>
> Key: ARROW-4431
> URL: https://issues.apache.org/jira/browse/ARROW-4431
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>






[jira] [Created] (ARROW-4431) [C++] Build gRPC as ExternalProject without allowing it to build its vendored dependencies

2019-01-30 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4431:
---

 Summary: [C++] Build gRPC as ExternalProject without allowing it 
to build its vendored dependencies
 Key: ARROW-4431
 URL: https://issues.apache.org/jira/browse/ARROW-4431
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
Assignee: Wes McKinney
 Fix For: 0.13.0








[jira] [Created] (ARROW-4430) [C++] add unit test for currently unused append method

2019-01-30 Thread Benjamin Kietzman (JIRA)
Benjamin Kietzman created ARROW-4430:


 Summary: [C++] add unit test for currently unused append method
 Key: ARROW-4430
 URL: https://issues.apache.org/jira/browse/ARROW-4430
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Benjamin Kietzman
Assignee: Benjamin Kietzman


TypedBufferBuilder<T>::Append(num_copies, value) is currently not 
tested and is incorrect. Add a test and correct the method.
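
As a rough idea of what such a test would check, here is a Python analog of an append-with-repeat method and the property it should satisfy. The class below is hypothetical and only mirrors the shape of the C++ method; it is not Arrow's implementation, and the `struct` format character stands in for the element type.

```python
import struct


class TypedBufferBuilder:
    """Hypothetical Python analog of a typed buffer builder (not Arrow's class)."""

    def __init__(self, fmt):
        self._fmt = "<" + fmt                    # e.g. "i" for a 32-bit integer
        self._item_size = struct.calcsize(self._fmt)
        self._data = bytearray()

    def append(self, num_copies, value):
        """Append `value` to the buffer `num_copies` times."""
        if num_copies < 0:
            raise ValueError("num_copies must be non-negative")
        self._data += struct.pack(self._fmt, value) * num_copies

    def __len__(self):
        return len(self._data) // self._item_size

    def to_list(self):
        return [item for (item,) in struct.iter_unpack(self._fmt, bytes(self._data))]


builder = TypedBufferBuilder("i")
builder.append(3, 7)
builder.append(2, -1)
print(len(builder), builder.to_list())
```

A unit test for the real method would presumably assert the same two things: the length grows by `num_copies`, and every appended slot holds `value`.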





[jira] [Updated] (ARROW-4425) Add link to 'Contributing' page in the top-level Arrow README

2019-01-30 Thread Tanya Schlusser (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanya Schlusser updated ARROW-4425:
---
Description: 
It would be nice to link to the ["Contributing to Apache 
Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
 -Confluence page-(*EDIT) in the Sphinx docs directly from the main project 
[README|https://github.com/apache/arrow/blob/master/README.md] (in the already 
existing "Getting involved" section) because it's a bit hard to find right now.

"contributing" page: 
[https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]

main project README: [https://github.com/apache/arrow/blob/master/README.md] 

 

EDIT: Moving the "Contributing" wiki to a page in the actual [Arrow Sphinx docs 
(location in repo)|https://github.com/apache/arrow/tree/master/docs] would also 
make it easier to find and modify. An additional task, ARROW-4427  was added to 
do this.

  was:
It would be nice to link to the ["Contributing to Apache 
Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
 -Confluence page-(*EDIT) in the static docs directly from the main project 
[README|https://github.com/apache/arrow/blob/master/README.md] (in the already 
existing "Getting involved" section) because it's a bit hard to find right now.

"contributing" page: 
[https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]

main project README: [https://github.com/apache/arrow/blob/master/README.md] 

 

EDIT: Moving the "Contributing" wiki to a static page in the actual [Arrow site 
(location in repo)|https://github.com/apache/arrow/tree/master/site] would also 
make it easier to find and modify. An additional task, ARROW-4427  was added to 
do this.


> Add link to 'Contributing' page in the top-level Arrow README
> -
>
> Key: ARROW-4425
> URL: https://issues.apache.org/jira/browse/ARROW-4425
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Tanya Schlusser
>Priority: Major
>
> It would be nice to link to the ["Contributing to Apache 
> Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
>  -Confluence page-(*EDIT) in the Sphinx docs directly from the main project 
> [README|https://github.com/apache/arrow/blob/master/README.md] (in the 
> already existing "Getting involved" section) because it's a bit hard to find 
> right now.
> "contributing" page: 
> [https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
> main project README: [https://github.com/apache/arrow/blob/master/README.md] 
>  
> EDIT: Moving the "Contributing" wiki to a page in the actual [Arrow Sphinx 
> docs (location in repo)|https://github.com/apache/arrow/tree/master/docs] 
> would also make it easier to find and modify. An additional task, ARROW-4427  
> was added to do this.





[jira] [Created] (ARROW-4429) Add git rebase tips to the 'Contributing' page in the developer docs

2019-01-30 Thread Tanya Schlusser (JIRA)
Tanya Schlusser created ARROW-4429:
--

 Summary: Add git rebase tips to the 'Contributing' page in the 
developer docs
 Key: ARROW-4429
 URL: https://issues.apache.org/jira/browse/ARROW-4429
 Project: Apache Arrow
  Issue Type: Task
  Components: Documentation
Reporter: Tanya Schlusser


A recent discussion on the listserv (link below) asked how contributors 
should handle rebasing. It would be helpful if the tips made it into the 
developer documentation somehow. I suggest adding them to the ["Contributing to Apache 
Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
 page, which is currently a wiki but will hopefully become part of the Sphinx docs 
(ARROW-4427).

Here is the relevant thread:

[https://lists.apache.org/thread.html/c74d8027184550b8d9041e3f2414b517ffb76ccbc1d5aa4563d364b6@%3Cdev.arrow.apache.org%3E]





[jira] [Created] (ARROW-4428) [R] Feature flags for R build

2019-01-30 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4428:
---

 Summary: [R] Feature flags for R build
 Key: ARROW-4428
 URL: https://issues.apache.org/jira/browse/ARROW-4428
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Wes McKinney
 Fix For: 0.13.0


There are a number of optional components in the Arrow C++ library. In Python 
we have feature flags to turn on and off parts of the bindings based on what 
C++ libraries have been built. There is also some logic to try to detect what 
has been built and enable those features.

We need to have the same thing in R. Some components, like Plasma, are not 
available for Windows and so necessarily these will have to be flagged off. 
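
A minimal sketch of the detect-and-flag idea, written in Python for illustration. The function name, the feature table, and the module names are all assumptions made up for this example; they are not the actual pyarrow or R build logic.

```python
import importlib.util
import sys


def detect_features(optional_modules, platform=sys.platform):
    """Return {feature: enabled} for a table of optional components.

    `optional_modules` maps a feature name to (module_name, windows_ok).
    A feature is enabled when its module can be found and the platform
    is allowed to use it.
    """
    flags = {}
    for feature, (module_name, windows_ok) in optional_modules.items():
        available = importlib.util.find_spec(module_name) is not None
        if platform.startswith("win") and not windows_ok:
            available = False  # e.g. a component that is not built on Windows
        flags[feature] = available
    return flags


# Hypothetical table: "json" stands in for a component that was built,
# "arrow_sketch_missing_module" for one that was not.
features = {
    "core": ("json", True),
    "plasma": ("json", False),
    "missing": ("arrow_sketch_missing_module", True),
}
print(detect_features(features, platform="win32"))
print(detect_features(features, platform="linux"))
```

The same two inputs, what was actually built and what the platform supports, would presumably drive the R flags as well.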





[jira] [Updated] (ARROW-4427) Move Confluence Wiki pages to the Sphinx docs

2019-01-30 Thread Tanya Schlusser (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanya Schlusser updated ARROW-4427:
---
Summary: Move Confluence Wiki pages to the Sphinx docs  (was: Move 
"Contributing to Apache Arrow" page to the static docs)

> Move Confluence Wiki pages to the Sphinx docs
> -
>
> Key: ARROW-4427
> URL: https://issues.apache.org/jira/browse/ARROW-4427
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Tanya Schlusser
>Priority: Major
>
> It's hard to find and modify the ["Contributing to Apache 
> Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
>  and other developers' wiki pages in Confluence. If these were moved to 
> inside the project web page, that would make it easier.
> There are 5 steps to this:
>  # Create a new directory inside of `arrow/docs/source` to house the wiki 
> pages. (It will look like the 
> [cpp|https://github.com/apache/arrow/tree/master/docs/source/cpp] or 
> [python|https://github.com/apache/arrow/tree/master/docs/source/python] 
> directories.)
>  # Copy the wiki page contents to new `*.rst` pages inside this new directory.
>  # Add an `index.rst` that links to them all with enough description to help 
> navigation.
>  # Modify the Sphinx index page 
> [`arrow/docs/source/index.rst`|https://github.com/apache/arrow/blob/master/docs/source/index.rst]
>  to have an entry that points to the new index page made in step 3
>  # Modify the static site page 
> [`arrow/site/_includes/header.html`|https://github.com/apache/arrow/blob/8e195327149b670de2cd7a8cfe75bbd6f71c6b49/site/_includes/header.html#L33]
>  to point to the newly created page instead of the wiki page.
>  





[jira] [Comment Edited] (ARROW-4427) Move Confluence Wiki pages to the Sphinx docs

2019-01-30 Thread Tanya Schlusser (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756190#comment-16756190
 ] 

Tanya Schlusser edited comment on ARROW-4427 at 1/30/19 3:06 PM:
-

Hoo boy. A big task! Modified the description + title per discussion above.


was (Author: tanya):
Hoo boy. And all of their child wiki pages.

> Move Confluence Wiki pages to the Sphinx docs
> -
>
> Key: ARROW-4427
> URL: https://issues.apache.org/jira/browse/ARROW-4427
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Tanya Schlusser
>Priority: Major
>
> It's hard to find and modify the ["Contributing to Apache 
> Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
>  and other developers' wiki pages in Confluence. If these were moved to 
> inside the project web page, that would make it easier.
> There are 5 steps to this:
>  # Create a new directory inside of `arrow/docs/source` to house the wiki 
> pages. (It will look like the 
> [cpp|https://github.com/apache/arrow/tree/master/docs/source/cpp] or 
> [python|https://github.com/apache/arrow/tree/master/docs/source/python] 
> directories.)
>  # Copy the wiki page contents to new `*.rst` pages inside this new directory.
>  # Add an `index.rst` that links to them all with enough description to help 
> navigation.
>  # Modify the Sphinx index page 
> [`arrow/docs/source/index.rst`|https://github.com/apache/arrow/blob/master/docs/source/index.rst]
>  to have an entry that points to the new index page made in step 3
>  # Modify the static site page 
> [`arrow/site/_includes/header.html`|https://github.com/apache/arrow/blob/8e195327149b670de2cd7a8cfe75bbd6f71c6b49/site/_includes/header.html#L33]
>  to point to the newly created page instead of the wiki page.
>  





[jira] [Updated] (ARROW-4427) Move "Contributing to Apache Arrow" page to the static docs

2019-01-30 Thread Tanya Schlusser (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanya Schlusser updated ARROW-4427:
---
Description: 
It's hard to find and modify the ["Contributing to Apache 
Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
 and other developers' wiki pages in Confluence. If these were moved to inside 
the project web page, that would make it easier.

There are 5 steps to this:
 # Create a new directory inside of `arrow/docs/source` to house the wiki 
pages. (It will look like the 
[cpp|https://github.com/apache/arrow/tree/master/docs/source/cpp] or 
[python|https://github.com/apache/arrow/tree/master/docs/source/python] 
directories.)
 # Copy the wiki page contents to new `*.rst` pages inside this new directory.
 # Add an `index.rst` that links to them all with enough description to help 
navigation.
 # Modify the Sphinx index page 
[`arrow/docs/source/index.rst`|https://github.com/apache/arrow/blob/master/docs/source/index.rst]
 to have an entry that points to the new index page made in step 3
 # Modify the static site page 
[`arrow/site/_includes/header.html`|https://github.com/apache/arrow/blob/8e195327149b670de2cd7a8cfe75bbd6f71c6b49/site/_includes/header.html#L33]
 to point to the newly created page instead of the wiki page.
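
Step 3 above could be sketched roughly as follows. This is only an illustration in Python: the directory name `developers`, the page names, and the helper `write_index` are hypothetical and not decided in this issue. It writes an `index.rst` whose `toctree` lists the copied pages.

```python
import tempfile
from pathlib import Path


def write_index(directory, title, pages):
    """Write an index.rst containing a toctree that lists the given pages."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    lines = [title, "=" * len(title), "", ".. toctree::", "   :maxdepth: 2", ""]
    # Each toctree entry is a page name without the .rst suffix.
    lines += [f"   {page}" for page in pages]
    index = directory / "index.rst"
    index.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return index


# Demo in a temporary directory; in the repo the target would live
# somewhere under arrow/docs/source (exact name still to be chosen).
with tempfile.TemporaryDirectory() as tmp:
    index_path = write_index(
        Path(tmp) / "developers",
        "Developer guides",
        ["contributing", "release_management"],
    )
    index_text = index_path.read_text(encoding="utf-8")
```

The top-level `arrow/docs/source/index.rst` would then need one extra toctree entry pointing at the new index, as described in step 4.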

 

  was:
It's hard to find and modify the ["Contributing to Apache 
Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
 wiki page in Confluence. If it were moved to inside the static web page, that 
would make it easier.

There are two steps to this:
 # Copy the wiki page contents to a new web page at the top "site" level (under 
arrow/site/ just like the [committers 
page|https://github.com/apache/arrow/blob/master/site/committers.html]) Maybe 
named "contributing.html" or something.
 # Modify the [navigation section in 
arrow/site/_includes/header.html|https://github.com/apache/arrow/blob/8e195327149b670de2cd7a8cfe75bbd6f71c6b49/site/_includes/header.html#L33]
 to point to the newly created page instead of the wiki page.

The affected pages are all part of the Jekyll components, so there isn't a need 
to build the Sphinx part of the docs to check your work.
  


> Move "Contributing to Apache Arrow" page to the static docs
> ---
>
> Key: ARROW-4427
> URL: https://issues.apache.org/jira/browse/ARROW-4427
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Tanya Schlusser
>Priority: Major
>
> It's hard to find and modify the ["Contributing to Apache 
> Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
>  and other developers' wiki pages in Confluence. If these were moved to 
> inside the project web page, that would make it easier.
> There are 5 steps to this:
>  # Create a new directory inside of `arrow/docs/source` to house the wiki 
> pages. (It will look like the 
> [cpp|https://github.com/apache/arrow/tree/master/docs/source/cpp] or 
> [python|https://github.com/apache/arrow/tree/master/docs/source/python] 
> directories.)
>  # Copy the wiki page contents to new `*.rst` pages inside this new directory.
>  # Add an `index.rst` that links to them all with enough description to help 
> navigation.
>  # Modify the Sphinx index page 
> [`arrow/docs/source/index.rst`|https://github.com/apache/arrow/blob/master/docs/source/index.rst]
>  to have an entry that points to the new index page made in step 3
>  # Modify the static site page 
> [`arrow/site/_includes/header.html`|https://github.com/apache/arrow/blob/8e195327149b670de2cd7a8cfe75bbd6f71c6b49/site/_includes/header.html#L33]
>  to point to the newly created page instead of the wiki page.
>  





[jira] [Commented] (ARROW-4427) Move "Contributing to Apache Arrow" page to the static docs

2019-01-30 Thread Tanya Schlusser (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756190#comment-16756190
 ] 

Tanya Schlusser commented on ARROW-4427:


Hoo boy. And all of their child wiki pages.

> Move "Contributing to Apache Arrow" page to the static docs
> ---
>
> Key: ARROW-4427
> URL: https://issues.apache.org/jira/browse/ARROW-4427
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Tanya Schlusser
>Priority: Major
>
> It's hard to find and modify the ["Contributing to Apache 
> Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
>  wiki page in Confluence. If it were moved to inside the static web page, 
> that would make it easier.
> There are two steps to this:
>  # Copy the wiki page contents to a new web page at the top "site" level 
> (under arrow/site/ just like the [committers 
> page|https://github.com/apache/arrow/blob/master/site/committers.html]) Maybe 
> named "contributing.html" or something.
>  # Modify the [navigation section in 
> arrow/site/_includes/header.html|https://github.com/apache/arrow/blob/8e195327149b670de2cd7a8cfe75bbd6f71c6b49/site/_includes/header.html#L33]
>  to point to the newly created page instead of the wiki page.
> The affected pages are all part of the Jekyll components, so there isn't a 
> need to build the Sphinx part of the docs to check your work.
>   





[jira] [Assigned] (ARROW-4328) Make R build compatible with DARROW_TENSORFLOW=ON

2019-01-30 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-4328:
--

Assignee: cHYzZQo 

> Make R build compatible with DARROW_TENSORFLOW=ON
> -
>
> Key: ARROW-4328
> URL: https://issues.apache.org/jira/browse/ARROW-4328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: cHYzZQo 
>Assignee: cHYzZQo 
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Add an option that sets GLIBCXX_USE_CXX11_ABI=0 so that the R lib will be 
> compatible with a shared library built with -DARROW_TENSORFLOW=ON





[jira] [Resolved] (ARROW-4328) Make R build compatible with DARROW_TENSORFLOW=ON

2019-01-30 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-4328.

   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3409
[https://github.com/apache/arrow/pull/3409]

> Make R build compatible with DARROW_TENSORFLOW=ON
> -
>
> Key: ARROW-4328
> URL: https://issues.apache.org/jira/browse/ARROW-4328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: cHYzZQo 
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Add an option that sets GLIBCXX_USE_CXX11_ABI=0 so that the R lib will be 
> compatible with a shared library built with -DARROW_TENSORFLOW=ON





[jira] [Commented] (ARROW-4427) Move "Contributing to Apache Arrow" page to the static docs

2019-01-30 Thread Tanya Schlusser (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756188#comment-16756188
 ] 

Tanya Schlusser commented on ARROW-4427:


Ok. Am I understanding that maybe a number of the wiki pages should be 
moved—anything not directly related to Jira? So:
 * [Contributing to Apache 
Arrow|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow?src=contextnavpagetreemode]
 * [Guide for Committers and Project 
Maintainers|https://cwiki.apache.org/confluence/display/ARROW/Guide+for+Committers+and+Project+Maintainers]
 * [HDFS Filesystem 
Support|https://cwiki.apache.org/confluence/display/ARROW/HDFS+Filesystem+Support]
 * [How to Verify Release 
Candidates|https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates]
 * [Product 
Requirements|https://cwiki.apache.org/confluence/display/ARROW/Product+requirements]
 (possibly not this one as it's empty)
 * [Release Management 
Guide|https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide]

What do you think of another directory, then, in `arrow/docs/source` where all 
of the listed pages reside...say `arrow/docs/source/dev` or something?

> Move "Contributing to Apache Arrow" page to the static docs
> ---
>
> Key: ARROW-4427
> URL: https://issues.apache.org/jira/browse/ARROW-4427
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Tanya Schlusser
>Priority: Major
>
> It's hard to find and modify the ["Contributing to Apache 
> Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
>  wiki page in Confluence. If it were moved to inside the static web page, 
> that would make it easier.
> There are two steps to this:
>  # Copy the wiki page contents to a new web page at the top "site" level 
> (under arrow/site/ just like the [committers 
> page|https://github.com/apache/arrow/blob/master/site/committers.html]) Maybe 
> named "contributing.html" or something.
>  # Modify the [navigation section in 
> arrow/site/_includes/header.html|https://github.com/apache/arrow/blob/8e195327149b670de2cd7a8cfe75bbd6f71c6b49/site/_includes/header.html#L33]
>  to point to the newly created page instead of the wiki page.
> The affected pages are all part of the Jekyll components, so there isn't a 
> need to build the Sphinx part of the docs to check your work.
>   





[jira] [Updated] (ARROW-4425) Add link to 'Contributing' page in the top-level Arrow README

2019-01-30 Thread Tanya Schlusser (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanya Schlusser updated ARROW-4425:
---
Description: 
It would be nice to link to the ["Contributing to Apache 
Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
 -Confluence- page in the Sphinx docs directly from the main project 
[README|https://github.com/apache/arrow/blob/master/README.md] (in the already 
existing "Getting involved" section) because it's a bit hard to find right now.

"contributing" page: 
[https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]

main project README: [https://github.com/apache/arrow/blob/master/README.md] 

 

EDIT: Moving the "Contributing" wiki to a static page in the actual [Arrow site 
(location in repo)|https://github.com/apache/arrow/tree/master/site] would also 
make it easier to find and modify. An additional task  was added to do this.

  was:
It would be nice to link to the ["Contributing to Apache 
Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
 -Confluence- page in the Sphinx docs directly from the main project 
[README|https://github.com/apache/arrow/blob/master/README.md] (in the already 
existing "Getting involved" section) because it's a bit hard to find right now.

"contributing" page: 
[https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]

main project README: [https://github.com/apache/arrow/blob/master/README.md] 

 

EDIT: Moving the "Contributing" wiki to a static page in the actual [Arrow site 
(location in repo)|https://github.com/apache/arrow/tree/master/site] would also 
make it easier to find and modify. A sub-task was added to do this.


> Add link to 'Contributing' page in the top-level Arrow README
> -
>
> Key: ARROW-4425
> URL: https://issues.apache.org/jira/browse/ARROW-4425
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Tanya Schlusser
>Priority: Major
>
> It would be nice to link to the ["Contributing to Apache 
> Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
>  -Confluence- page in the Sphinx docs directly from the main project 
> [README|https://github.com/apache/arrow/blob/master/README.md] (in the 
> already existing "Getting involved" section) because it's a bit hard to find 
> right now.
> "contributing" page: 
> [https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
> main project README: [https://github.com/apache/arrow/blob/master/README.md] 
>  
> EDIT: Moving the "Contributing" wiki to a static page in the actual [Arrow 
> site (location in repo)|https://github.com/apache/arrow/tree/master/site] 
> would also make it easier to find and modify. An additional task  was added 
> to do this.





[jira] [Commented] (ARROW-4427) Move "Contributing to Apache Arrow" page to the static docs

2019-01-30 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756176#comment-16756176
 ] 

Wes McKinney commented on ARROW-4427:
-

+1

> Move "Contributing to Apache Arrow" page to the static docs
> ---
>
> Key: ARROW-4427
> URL: https://issues.apache.org/jira/browse/ARROW-4427
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Tanya Schlusser
>Priority: Major
>
> It's hard to find and modify the ["Contributing to Apache 
> Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
>  wiki page in Confluence. If it were moved to inside the static web page, 
> that would make it easier.
> There are two steps to this:
>  # Copy the wiki page contents to a new web page at the top "site" level 
> (under arrow/site/ just like the [committers 
> page|https://github.com/apache/arrow/blob/master/site/committers.html]) Maybe 
> named "contributing.html" or something.
>  # Modify the [navigation section in 
> arrow/site/_includes/header.html|https://github.com/apache/arrow/blob/8e195327149b670de2cd7a8cfe75bbd6f71c6b49/site/_includes/header.html#L33]
>  to point to the newly created page instead of the wiki page.
> The affected pages are all part of the Jekyll components, so there isn't a 
> need to build the Sphinx part of the docs to check your work.
>   





[jira] [Updated] (ARROW-4425) Add link to 'Contributing' page in the top-level Arrow README

2019-01-30 Thread Tanya Schlusser (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanya Schlusser updated ARROW-4425:
---
Description: 
It would be nice to link to the ["Contributing to Apache 
Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
 -Confluence page-(*EDIT) in the static docs directly from the main project 
[README|https://github.com/apache/arrow/blob/master/README.md] (in the already 
existing "Getting involved" section) because it's a bit hard to find right now.

"contributing" page: 
[https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]

main project README: [https://github.com/apache/arrow/blob/master/README.md] 

 

EDIT: Moving the "Contributing" wiki to a static page in the actual [Arrow site 
(location in repo)|https://github.com/apache/arrow/tree/master/site] would also 
make it easier to find and modify. An additional task, ARROW-4427  was added to 
do this.

  was:
It would be nice to link to the ["Contributing to Apache 
Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
 -Confluence- page in the Sphinx docs directly from the main project 
[README|https://github.com/apache/arrow/blob/master/README.md] (in the already 
existing "Getting involved" section) because it's a bit hard to find right now.

"contributing" page: 
[https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]

main project README: [https://github.com/apache/arrow/blob/master/README.md] 

 

EDIT: Moving the "Contributing" wiki to a static page in the actual [Arrow site 
(location in repo)|https://github.com/apache/arrow/tree/master/site] would also 
make it easier to find and modify. An additional task  was added to do this.


> Add link to 'Contributing' page in the top-level Arrow README
> -
>
> Key: ARROW-4425
> URL: https://issues.apache.org/jira/browse/ARROW-4425
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Tanya Schlusser
>Priority: Major
>
> It would be nice to link to the ["Contributing to Apache 
> Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
>  -Confluence page-(*EDIT) in the static docs directly from the main project 
> [README|https://github.com/apache/arrow/blob/master/README.md] (in the 
> already existing "Getting involved" section) because it's a bit hard to find 
> right now.
> "contributing" page: 
> [https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
> main project README: [https://github.com/apache/arrow/blob/master/README.md] 
>  
> EDIT: Moving the "Contributing" wiki to a static page in the actual [Arrow 
> site (location in repo)|https://github.com/apache/arrow/tree/master/site] 
> would also make it easier to find and modify. An additional task, ARROW-4427  
> was added to do this.





[jira] [Commented] (ARROW-4427) Move "Contributing to Apache Arrow" page to the static docs

2019-01-30 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756166#comment-16756166
 ] 

Antoine Pitrou commented on ARROW-4427:
---

I think this would be better moved to the Sphinx docs, which allow extensive 
cross-referencing. IMHO the Jekyll site should only be used for top-level pages 
and the blog.

See https://github.com/apache/arrow/tree/master/docs

> Move "Contributing to Apache Arrow" page to the static docs
> ---
>
> Key: ARROW-4427
> URL: https://issues.apache.org/jira/browse/ARROW-4427
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Tanya Schlusser
>Priority: Major
>
> It's hard to find and modify the ["Contributing to Apache 
> Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
>  wiki page in Confluence. If it were moved to inside the static web page, 
> that would make it easier.
> There are two steps to this:
>  # Copy the wiki page contents to a new web page at the top "site" level 
> (under arrow/site/ just like the [committers 
> page|https://github.com/apache/arrow/blob/master/site/committers.html]) Maybe 
> named "contributing.html" or something.
>  # Modify the [navigation section in 
> arrow/site/_includes/header.html|https://github.com/apache/arrow/blob/8e195327149b670de2cd7a8cfe75bbd6f71c6b49/site/_includes/header.html#L33]
>  to point to the newly created page instead of the wiki page.
> The affected pages are all part of the Jekyll components, so there isn't a 
> need to build the Sphinx part of the docs to check your work.
>   





[jira] [Created] (ARROW-4427) Move "Contributing to Apache Arrow" page to the static docs

2019-01-30 Thread Tanya Schlusser (JIRA)
Tanya Schlusser created ARROW-4427:
--

 Summary: Move "Contributing to Apache Arrow" page to the static 
docs
 Key: ARROW-4427
 URL: https://issues.apache.org/jira/browse/ARROW-4427
 Project: Apache Arrow
  Issue Type: Task
  Components: Documentation
Reporter: Tanya Schlusser


It's hard to find and modify the ["Contributing to Apache 
Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
 wiki page in Confluence. If it were moved to inside the static web page, that 
would make it easier.

There are two steps to this:
 # Copy the wiki page contents to a new web page at the top "site" level (under 
arrow/site/ just like the [committers 
page|https://github.com/apache/arrow/blob/master/site/committers.html]) Maybe 
named "contributing.html" or something.
 # Modify the [navigation section in 
arrow/site/_includes/header.html|https://github.com/apache/arrow/blob/8e195327149b670de2cd7a8cfe75bbd6f71c6b49/site/_includes/header.html#L33]
 to point to the newly created page instead of the wiki page.

The affected pages are all part of the Jekyll components, so there isn't a need 
to build the Sphinx part of the docs to check your work.
  





[jira] [Updated] (ARROW-4425) Add link to 'Contributing' page in the top-level Arrow README

2019-01-30 Thread Tanya Schlusser (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanya Schlusser updated ARROW-4425:
---
Description: 
It would be nice to link to the ["Contributing to Apache 
Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
 -Confluence- page in the Sphinx docs directly from the main project 
[README|https://github.com/apache/arrow/blob/master/README.md] (in the already 
existing "Getting involved" section) because it's a bit hard to find right now.

"contributing" page: 
[https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]

main project README: [https://github.com/apache/arrow/blob/master/README.md] 

 

EDIT: Moving the "Contributing" wiki to a static page in the actual [Arrow site 
(location in repo)|https://github.com/apache/arrow/tree/master/site] would also 
make it easier to find and modify. A sub-task was added to do this.

  was:
It would be nice to add a link to the ["Contributing to Apache 
Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
 Confluence page directly from the main project 
[README|https://github.com/apache/arrow/blob/master/README.md] (in the already 
existing "Getting involved" section) because it's a bit hard to find right now.

"contributing" page: 
[https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]

main project README: [https://github.com/apache/arrow/blob/master/README.md] 


> Add link to 'Contributing' page in the top-level Arrow README
> -
>
> Key: ARROW-4425
> URL: https://issues.apache.org/jira/browse/ARROW-4425
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Tanya Schlusser
>Priority: Major
>
> It would be nice to link to the ["Contributing to Apache 
> Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
>  -Confluence- page in the Sphinx docs directly from the main project 
> [README|https://github.com/apache/arrow/blob/master/README.md] (in the 
> already existing "Getting involved" section) because it's a bit hard to find 
> right now.
> "contributing" page: 
> [https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
> main project README: [https://github.com/apache/arrow/blob/master/README.md] 
>  
> EDIT: Moving the "Contributing" wiki to a static page in the actual [Arrow 
> site (location in repo)|https://github.com/apache/arrow/tree/master/site] 
> would also make it easier to find and modify. A sub-task was added to do this.





[jira] [Commented] (ARROW-3596) [Packaging] Build gRPC in conda-forge

2019-01-30 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756160#comment-16756160
 ] 

Antoine Pitrou commented on ARROW-3596:
---

I'm currently investigating this.

> [Packaging] Build gRPC in conda-forge
> -
>
> Key: ARROW-3596
> URL: https://issues.apache.org/jira/browse/ARROW-3596
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> This may be a bit annoying as grpc's CMake files want to also install its 
> third party dependencies. [~msarahan] pointed me to a version for 
> AnacondaRecipes where there was an effort to "unvendor" these components
> grpc depends on at least
> * protobuf
> * zlib
> * either boringssl or openssl
> * cares
> * gflags
> It looks like the grpc developers want to take on Abseil as a dependency 
> eventually, but it is not required yet





[jira] [Commented] (ARROW-4425) Add link to 'Contributing' page in the top-level Arrow README

2019-01-30 Thread Tanya Schlusser (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756158#comment-16756158
 ] 

Tanya Schlusser commented on ARROW-4425:


Fair statement. Confluence is really hard for me to navigate. Updating and 
adding a sub-task.

> Add link to 'Contributing' page in the top-level Arrow README
> -
>
> Key: ARROW-4425
> URL: https://issues.apache.org/jira/browse/ARROW-4425
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Tanya Schlusser
>Priority: Major
>
> It would be nice to add a link to the ["Contributing to Apache 
> Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
>  Confluence page directly from the main project 
> [README|https://github.com/apache/arrow/blob/master/README.md] (in the 
> already existing "Getting involved" section) because it's a bit hard to find 
> right now.
> "contributing" page: 
> [https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
> main project README: [https://github.com/apache/arrow/blob/master/README.md] 





[jira] [Commented] (ARROW-3769) [C++] Support reading non-dictionary encoded binary Parquet columns directly as DictionaryArray

2019-01-30 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756155#comment-16756155
 ] 

Wes McKinney commented on ARROW-3769:
-

Cool. This is only implemented at the encoder level, so you should be able to 
use {{ArrayFromJSON}} to make writing the unit tests easier -- so for this JIRA 
I am expecting tests in {{parquet-encoding-test}}

> [C++] Support reading non-dictionary encoded binary Parquet columns directly 
> as DictionaryArray
> ---
>
> Key: ARROW-3769
> URL: https://issues.apache.org/jira/browse/ARROW-3769
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Hatem Helal
>Priority: Major
>  Labels: parquet
> Fix For: 0.13.0
>
>
> If the goal is to hash this data anyway into a categorical-type array, then 
> it would be better to offer the option to "push down" the hashing into the 
> Parquet read hot path rather than first fully materializing a dense vector of 
> {{ByteArray}} values, which could use a lot of memory after decompression
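
A rough Python sketch of the "push down the hashing" idea in the description above, for illustration only. It is not the Parquet C++ reader; `dictionary_encode` and `decoded_pages` are hypothetical names. Values are hashed into a dictionary as they stream out of the decoder, so only the unique values and the integer indices are kept rather than a dense vector of every decoded value.

```python
def dictionary_encode(values):
    """Incrementally dictionary-encode an iterable of byte strings."""
    dictionary = []   # unique values, in order of first appearance
    positions = {}    # value -> index into `dictionary`
    indices = []      # one small integer per input value
    for value in values:
        code = positions.get(value)
        if code is None:
            code = len(dictionary)
            positions[value] = code
            dictionary.append(value)
        indices.append(code)
    return dictionary, indices


def decoded_pages():
    """Stand-in for values streaming out of a column decoder, page by page."""
    yield from (b"apple", b"pear")
    yield from (b"apple", b"apple", b"fig")


dictionary, indices = dictionary_encode(decoded_pages())
```

Because the input is consumed lazily, repeated values cost one integer each instead of one materialized byte string each.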





[jira] [Assigned] (ARROW-3965) [Java] JDBC-to-Arrow Conversion: Configuration Object

2019-01-30 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-3965:
---

Assignee: Michael Pigott

> [Java] JDBC-to-Arrow Conversion: Configuration Object
> -
>
> Key: ARROW-3965
> URL: https://issues.apache.org/jira/browse/ARROW-3965
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Michael Pigott
>Assignee: Michael Pigott
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> There are various methods for constructing Arrow vectors, alternating on two 
> inputs:
>  * A calendar object
>  * A BaseAllocator
> This creates a configuration class (JdbcToArrowConfig) to simplify 
> configuring the adapter, and make it easier to add new functionality later.





[jira] [Resolved] (ARROW-3965) [Java] JDBC-to-Arrow Conversion: Configuration Object

2019-01-30 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3965.
-
   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3133
[https://github.com/apache/arrow/pull/3133]

> [Java] JDBC-to-Arrow Conversion: Configuration Object
> -
>
> Key: ARROW-3965
> URL: https://issues.apache.org/jira/browse/ARROW-3965
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Michael Pigott
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> There are various methods for constructing Arrow vectors, alternating on two 
> inputs:
>  * A calendar object
>  * A BaseAllocator
> This creates a configuration class (JdbcToArrowConfig) to simplify 
> configuring the adapter, and make it easier to add new functionality later.





[jira] [Created] (ARROW-4426) [C++] Vendor CMake's UseJava

2019-01-30 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-4426:
--

 Summary: [C++] Vendor CMake's UseJava
 Key: ARROW-4426
 URL: https://issues.apache.org/jira/browse/ARROW-4426
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Krisztian Szucs


In order to provide Gandiva JNI bindings for Ubuntu 16.04 and 18.04, we need 
to vendor UseJava, because GENERATE_NATIVE_HEADERS is only available since CMake 
3.11 and Ubuntu 18.04 ships CMake 3.10.

See https://github.com/apache/arrow/pull/3522





[jira] [Commented] (ARROW-4424) [Python] Manylinux CI builds failing

2019-01-30 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756052#comment-16756052
 ] 

Uwe L. Korn commented on ARROW-4424:


A new version of {{keras_preprocessing}} was released yesterday. This has an 
undeclared pandas dependency. I'll have a look.
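
As a rough sketch of how such an undeclared dependency could be caught before 
the import test runs, the snippet below checks whether a list of top-level 
modules can be found in the current environment. The helper name 
missing_modules and the module list are illustrative assumptions, not part of 
the Arrow CI scripts.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found.

    Only top-level names are supported here: find_spec() on a dotted name
    raises if the parent package is missing.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# "pandas" is the module named in the traceback; the list is an assumption.
missing = missing_modules(["pandas"])
if missing:
    print("missing modules:", ", ".join(missing))
else:
    print("all listed modules can be found")
```

Running a check like this in the test environment would report the missing 
module directly, instead of surfacing it through a long tensorflow import chain.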

> [Python] Manylinux CI builds failing
> 
>
> Key: ARROW-4424
> URL: https://issues.apache.org/jira/browse/ARROW-4424
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Python
>Reporter: Micah Kornfield
>Priority: Blocker
>
> Example error build: https://api.travis-ci.org/v3/job/486336662/log.txt
>  
> {{+python -c 'import pyarrow; import tensorflow'}}
> {{ Traceback (most recent call last):}}
> {{ File "<string>", line 1, in <module>}}
> {{ File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/tensorflow/__init__.py",
>  line 24, in <module>}}
> {{ from tensorflow.python import pywrap_tensorflow # pylint: 
> disable=unused-import}}
> {{ File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/tensorflow/python/__init__.py",
>  line 88, in <module>}}
> {{ from tensorflow.python import keras}}
> {{ File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/tensorflow/python/keras/__init__.py",
>  line 29, in <module>}}
> {{ from tensorflow.python.keras import datasets}}
> {{ File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/tensorflow/python/keras/datasets/__init__.py",
>  line 25, in <module>}}
> {{ from tensorflow.python.keras.datasets import imdb}}
> {{ File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/tensorflow/python/keras/datasets/imdb.py",
>  line 25, in <module>}}
> {{ from tensorflow.python.keras.preprocessing.sequence import 
> _remove_long_seq}}
> {{ File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/tensorflow/python/keras/preprocessing/__init__.py",
>  line 30, in <module>}}
> {{ from tensorflow.python.keras.preprocessing import image}}
> {{ File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/tensorflow/python/keras/preprocessing/image.py",
>  line 23, in <module>}}
> {{ from keras_preprocessing import image}}
> {{ File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/keras_preprocessing/image/__init__.py",
>  line 8, in <module>}}
> {{ from .dataframe_iterator import DataFrameIterator}}
> {{ File 
> "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/keras_preprocessing/image/dataframe_iterator.py",
>  line 11, in <module>}}
> {{ from pandas.api.types import is_numeric_dtype}}
> {{ ModuleNotFoundError: No module named 'pandas'}}





[jira] [Resolved] (ARROW-4414) [C++] Stop using cmake COMMAND_EXPAND_LISTS because it breaks package builds for older distros

2019-01-30 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-4414.

   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3522
[https://github.com/apache/arrow/pull/3522]

> [C++] Stop using cmake COMMAND_EXPAND_LISTS because it breaks package builds 
> for older distros
> --
>
> Key: ARROW-4414
> URL: https://issues.apache.org/jira/browse/ARROW-4414
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The COMMAND_EXPAND_LISTS option of add_custom_command is too new for Ubuntu 
> Xenial and Debian stretch. It has been available since CMake 3.8: 
> https://cmake.org/cmake/help/v3.8/command/add_custom_command.html
> We need to stop using it in cpp/src/gandiva/precompiled/CMakeLists.txt.
> Also, we should pin CMake to version 3.5 in the Travis builds (Xenial ships 
> CMake 3.5).
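
The version constraints can be sketched as a small lookup, using only the 
numbers quoted in this issue and in ARROW-4426. The table and function names 
are hypothetical; this is an illustration of the comparison, not a tool from 
the Arrow build.

```python
# Minimum CMake version per feature, as stated in ARROW-4414 and ARROW-4426.
FEATURE_MIN_CMAKE = {
    "COMMAND_EXPAND_LISTS": (3, 8),
    "GENERATE_NATIVE_HEADERS": (3, 11),
}


def parse_version(text):
    """Turn '3.10' into (3, 10) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def unsupported_features(cmake_version):
    """Return the features that need a newer CMake than `cmake_version`."""
    return sorted(
        name
        for name, minimum in FEATURE_MIN_CMAKE.items()
        if cmake_version < minimum
    )


# Xenial ships CMake 3.5 and Ubuntu 18.04 ships CMake 3.10 (per the issues).
print(unsupported_features(parse_version("3.5")))
print(unsupported_features(parse_version("3.10")))
```

Comparing tuples rather than strings matters here: as plain strings, "3.10" 
sorts before "3.8", which would give the wrong answer for Ubuntu 18.04.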





[jira] [Commented] (ARROW-3769) [C++] Support reading non-dictionary encoded binary Parquet columns directly as DictionaryArray

2019-01-30 Thread Hatem Helal (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756031#comment-16756031
 ] 

Hatem Helal commented on ARROW-3769:


I've started looking into this, beginning with some unit tests to make sure I 
understand the inner workings.

> [C++] Support reading non-dictionary encoded binary Parquet columns directly 
> as DictionaryArray
> ---
>
> Key: ARROW-3769
> URL: https://issues.apache.org/jira/browse/ARROW-3769
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Hatem Helal
>Priority: Major
>  Labels: parquet
> Fix For: 0.13.0
>
>
> If the goal is to hash this data anyway into a categorical-type array, then 
> it would be better to offer the option to "push down" the hashing into the 
> Parquet read hot path rather than first fully materializing a dense vector of 
> {{ByteArray}} values, which could use a lot of memory after decompression.
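
To illustrate the idea of pushing the hashing into the read path, here is a 
minimal pure-Python sketch that dictionary-encodes values as they are produced, 
so the dense list of values is never held in memory. The function 
dictionary_encode is hypothetical and is not the Arrow or Parquet C++ API.

```python
def dictionary_encode(values):
    """Encode an iterable of byte strings as (dictionary, indices).

    Values are consumed one at a time, so only the unique values and the
    integer indices are kept, never the full dense sequence.
    """
    dictionary = []   # unique values in first-seen order
    positions = {}    # value -> index into `dictionary`
    indices = []
    for value in values:
        index = positions.get(value)
        if index is None:
            index = len(dictionary)
            positions[value] = index
            dictionary.append(value)
        indices.append(index)
    return dictionary, indices


# A generator stands in for a column reader that yields decoded values.
column = (value for value in [b"red", b"green", b"red", b"red"])
dictionary, indices = dictionary_encode(column)
print(dictionary, indices)
```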





[jira] [Commented] (ARROW-4425) Add link to 'Contributing' page in the top-level Arrow README

2019-01-30 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756035#comment-16756035
 ] 

Antoine Pitrou commented on ARROW-4425:
---

I think we should actually put such documents in the Sphinx-generated docs. 
There's no reason to have separate wiki pages for that, IMHO.


> Add link to 'Contributing' page in the top-level Arrow README
> -
>
> Key: ARROW-4425
> URL: https://issues.apache.org/jira/browse/ARROW-4425
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Tanya Schlusser
>Priority: Major
>
> It would be nice to add a link to the ["Contributing to Apache 
> Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
>  Confluence page directly from the main project 
> [README|https://github.com/apache/arrow/blob/master/README.md] (in the 
> already existing "Getting involved" section) because it's a bit hard to find 
> right now.
> "contributing" page: 
> [https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
> main project README: [https://github.com/apache/arrow/blob/master/README.md] 





[jira] [Created] (ARROW-4423) [C++] Update version of vendored gtest to 1.8.1

2019-01-30 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-4423:
--

 Summary: [C++] Update version of vendored gtest to 1.8.1
 Key: ARROW-4423
 URL: https://issues.apache.org/jira/browse/ARROW-4423
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Micah Kornfield


conda-forge builds already use 1.8.1

 

This is a little tricky because library files get renamed on Windows with the 
incremental version bump (debug files become libgmockd.lib).





[jira] [Commented] (ARROW-4277) [C++] Add gmock to toolchain

2019-01-30 Thread Micah Kornfield (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755872#comment-16755872
 ] 

Micah Kornfield commented on ARROW-4277:


The current issue seems to be linking on Trusty:

https://travis-ci.org/apache/arrow/builds/486300854?utm_source=github_status&utm_medium=notification

> [C++] Add gmock to toolchain
> 
>
> Key: ARROW-4277
> URL: https://issues.apache.org/jira/browse/ARROW-4277
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add gmock to the toolchain.
>  
> It looks like before this can happen, a gmock feedstock on conda-forge has to 
> be set up so our CI setup can work.





[jira] [Comment Edited] (ARROW-4418) [Plasma] replace event loop with boost::asio for plasma store

2019-01-30 Thread Zhijun Fu (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755804#comment-16755804
 ] 

Zhijun Fu edited comment on ARROW-4418 at 1/30/19 8:12 AM:
---

In addition to the benefits that Robert mentioned above, asio also provides 
opportunities for performance improvements through its io service, thread 
pool, etc.

In our internal testing, which uses 10+ actors on a single machine, I found 
that 50% of the plasma store's CPU time is spent receiving messages from plasma 
clients over UNIX domain sockets.

I'm thinking that one way to improve performance is this: 
 * Use a pool of threads to receive messages from clients. To ensure correct 
behavior, we can bind a boost::strand to each client, so that all the 
messages from a given client arrive in order. As this part is CPU-intensive, 
using multiple threads is going to help.
 * After this, the messages are posted to the io service of the main thread, 
which calls ProcessMessages for each of them in order.
 * After this, post the replies to a pool of threads, again using a 
boost::strand for each plasma client to ensure correct ordering. 

I think this would probably help in cases where multiple workers use the 
plasma store on the same machine, which should be very common. And it seems 
that implementing this would be hard without asio's functionality.
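
The per-client ordering in the first step can be sketched as follows. This is a 
Python emulation of the strand concept on top of a thread pool, not boost::asio 
itself; the Strand class and receive_all function are hypothetical names, and 
only the receiving stage is modeled.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class Strand:
    """Runs posted callables one at a time, in posting order, on a shared pool."""

    def __init__(self, pool):
        self._pool = pool
        self._lock = threading.Lock()
        self._pending = deque()
        self._running = False

    def post(self, fn, *args):
        with self._lock:
            self._pending.append((fn, args))
            if self._running:
                return  # the active drain task will pick this item up
            self._running = True
        self._pool.submit(self._drain)

    def _drain(self):
        while True:
            with self._lock:
                if not self._pending:
                    self._running = False
                    return
                fn, args = self._pending.popleft()
            # Error handling is omitted: an exception here would stall the strand.
            fn(*args)


def receive_all(messages_by_client, workers=4):
    """Handle each client's messages in order while clients share the pool."""
    received = {client: [] for client in messages_by_client}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        strands = {client: Strand(pool) for client in messages_by_client}
        for client, messages in messages_by_client.items():
            for message in messages:
                strands[client].post(received[client].append, message)
    # Leaving the "with" block waits for all submitted drain tasks to finish.
    return received
```

Because each client has its own strand, work for different clients can run on 
different threads at the same time, while messages from any one client are 
never reordered or handled concurrently.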


was (Author: zhijunfu):
In addition to the benefits that Robert mentioned above, asio also provides 
opportunities for performance improvements through its io service, thread 
pool, etc.

In our internal testing, which uses 10+ actors on a single machine, I found 
that 50% of the plasma store's CPU time is spent receiving messages from plasma 
clients over UNIX domain sockets.

I'm thinking that one way to improve performance is this: 
 * Use a pool of threads to receive messages from clients. To ensure correct 
behavior, we can bind a boost::strand to each client, so that all the 
messages from a given client arrive in order. As this part is CPU-intensive, 
using multiple threads is going to help.
 * After this, the messages are posted to the io service of the main thread, 
which calls ProcessMessages for each of them in order.
 * After this, post the replies to a pool of threads, again using a 
boost::strand for each plasma client to ensure correct ordering. 

I think this would probably help in cases where multiple workers use the 
plasma store on the same machine, which should be very common. And it seems 
that implementing this would be hard without asio's functionality.

Thoughts?

> [Plasma] replace event loop with boost::asio for plasma store
> -
>
> Key: ARROW-4418
> URL: https://issues.apache.org/jira/browse/ARROW-4418
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Zhijun Fu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Original text:
> It would be nice to move plasma store from current event loop to boost::asio 
> to modernize the code, and more importantly to benefit from the 
> functionalities provided by asio, which I think also provides opportunities 
> for performance improvement.


