[jira] [Resolved] (ARROW-4411) [C++] Add missing parameter documentation to UnaryKernel to fix build
[ https://issues.apache.org/jira/browse/ARROW-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Micah Kornfield resolved ARROW-4411. Resolution: Duplicate > [C++] Add missing parameter documentation to UnaryKernel to fix build > - > > Key: ARROW-4411 > URL: https://issues.apache.org/jira/browse/ARROW-4411 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Micah Kornfield >Assignee: Micah Kornfield >Priority: Blocker > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Also missing was documentation in util-internal.h -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1425) [Python] Document semantic differences between Spark timestamps and Arrow timestamps
[ https://issues.apache.org/jira/browse/ARROW-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756928#comment-16756928 ] Micah Kornfield commented on ARROW-1425: [~icexelloss] I pushed a new PR for this so if you don't mind, I will try to finish it up. > [Python] Document semantic differences between Spark timestamps and Arrow > timestamps > > > Key: ARROW-1425 > URL: https://issues.apache.org/jira/browse/ARROW-1425 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Micah Kornfield >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 50m > Remaining Estimate: 0h > > The way that Spark treats non-timezone-aware timestamps as session local can > be problematic when using pyarrow which may view the data coming from > toPandas() as time zone naive (but with fields as though it were UTC, not > session local). We should document carefully how to properly handle the data > coming from Spark to avoid problems. > cc [~bryanc] [~holdenkarau] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
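The discrepancy can be illustrated with Python's standard datetime module. This is a simplified sketch, not Spark's or pyarrow's conversion code. The fixed UTC-8 offset is a hypothetical stand-in for the Spark session time zone, chosen so the example needs no time zone database. The sketch reads the same naive wall-clock value twice, once as session-local time and once as UTC.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical Spark session time zone (a fixed UTC-8 offset, for illustration).
SESSION_TZ = timezone(timedelta(hours=-8))

# A timezone-naive wall-clock value, as it might arrive from toPandas().
naive = datetime(2019, 1, 30, 12, 0, 0)

# Spark's reading: the fields are local time in the session time zone.
as_session_local = naive.replace(tzinfo=SESSION_TZ)

# A consumer that assumes naive fields are UTC reads a different instant.
as_utc = naive.replace(tzinfo=timezone.utc)

# The two readings differ by the session offset.
shift = as_session_local - as_utc
print(shift)  # 8:00:00

# Localize to the session zone explicitly, then normalize to UTC.
normalized = as_session_local.astimezone(timezone.utc)
print(normalized.isoformat())  # 2019-01-30T20:00:00+00:00
```

The last step localizes naive values to the session time zone before converting to UTC. That is one way to avoid the silent shift.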
[jira] [Assigned] (ARROW-1425) [Python] Document semantic differences between Spark timestamps and Arrow timestamps
[ https://issues.apache.org/jira/browse/ARROW-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Micah Kornfield reassigned ARROW-1425: -- Assignee: Micah Kornfield (was: Li Jin) > [Python] Document semantic differences between Spark timestamps and Arrow > timestamps > > > Key: ARROW-1425 > URL: https://issues.apache.org/jira/browse/ARROW-1425 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Micah Kornfield >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 50m > Remaining Estimate: 0h > > The way that Spark treats non-timezone-aware timestamps as session local can > be problematic when using pyarrow which may view the data coming from > toPandas() as time zone naive (but with fields as though it were UTC, not > session local). We should document carefully how to properly handle the data > coming from Spark to avoid problems. > cc [~bryanc] [~holdenkarau] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4418) [Plasma] replace event loop with boost::asio for plasma store
[ https://issues.apache.org/jira/browse/ARROW-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756914#comment-16756914 ] Siyuan Zhuang commented on ARROW-4418: -- [~zhijunfu] I wonder if we could just move "client_connection" from Ray to Arrow, so we can share some common functions. > [Plasma] replace event loop with boost::asio for plasma store > - > > Key: ARROW-4418 > URL: https://issues.apache.org/jira/browse/ARROW-4418 > Project: Apache Arrow > Issue Type: Improvement > Components: Plasma (C++) >Reporter: Zhijun Fu >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Original text: > It would be nice to move plasma store from current event loop to boost::asio > to modernize the code, and more importantly to benefit from the > functionalities provided by asio, which I think also provides opportunities > for performance improvement. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4436) [Documentation] Clarify instructions for building documentation
Micah Kornfield created ARROW-4436: -- Summary: [Documentation] Clarify instructions for building documentation Key: ARROW-4436 URL: https://issues.apache.org/jira/browse/ARROW-4436 Project: Apache Arrow Issue Type: Improvement Components: Documentation Reporter: Micah Kornfield [https://arrow.apache.org/docs/building.html#building-docs] seems to assume some prior setup. It is not entirely clear what that setup is. At the very least we should update it to bridge the gap from instructions for building from source https://arrow.apache.org/docs/python/development.html#building-the-documentation -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4430) [C++] add unit test for currently unused append method
[ https://issues.apache.org/jira/browse/ARROW-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-4430: -- Labels: pull-request-available (was: ) > [C++] add unit test for currently unused append method > -- > > Key: ARROW-4430 > URL: https://issues.apache.org/jira/browse/ARROW-4430 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Benjamin Kietzman >Assignee: Benjamin Kietzman >Priority: Minor > Labels: pull-request-available > > TypedBufferBuilder::Append(num_copies, value) is currently not > tested and is incorrect. Add a test and correct the method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4418) [Plasma] replace event loop with boost::asio for plasma store
[ https://issues.apache.org/jira/browse/ARROW-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756803#comment-16756803 ] Zhijun Fu commented on ARROW-4418: -- [~robertnishihara], yes totally agreed that the move from event loop to asio should be a straightforward change. The multithread optimization can be evaluated after that, and see if it can provide non-trivial performance improvements. > [Plasma] replace event loop with boost::asio for plasma store > - > > Key: ARROW-4418 > URL: https://issues.apache.org/jira/browse/ARROW-4418 > Project: Apache Arrow > Issue Type: Improvement > Components: Plasma (C++) >Reporter: Zhijun Fu >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Original text: > It would be nice to move plasma store from current event loop to boost::asio > to modernize the code, and more importantly to benefit from the > functionalities provided by asio, which I think also provides opportunities > for performance improvement. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4418) [Plasma] replace event loop with boost::asio for plasma store
[ https://issues.apache.org/jira/browse/ARROW-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756791#comment-16756791 ] Robert Nishihara commented on ARROW-4418: - [~zhijunfu], I'd suggest doing a straightforward translation from the AE event loop to boost::asio. The multithreaded architecture sounds pretty complex, but I think it makes sense to explore if it provides substantial performance benefits (though it may not make sense for modest performance benefits). Maybe we should use {{asio}} instead of {{boost::asio}} because {{asio}} looks like it's header only. http://think-async.com/Asio/AsioAndBoostAsio [~pitrou], the non-boost {{asio}} looks like it's header only, so that may be preferable. What do you think about the overall tradeoff? > [Plasma] replace event loop with boost::asio for plasma store > - > > Key: ARROW-4418 > URL: https://issues.apache.org/jira/browse/ARROW-4418 > Project: Apache Arrow > Issue Type: Improvement > Components: Plasma (C++) >Reporter: Zhijun Fu >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Original text: > It would be nice to move plasma store from current event loop to boost::asio > to modernize the code, and more importantly to benefit from the > functionalities provided by asio, which I think also provides opportunities > for performance improvement. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-4422) [Plasma] Enforce memory limit in plasma, rather than relying on dlmalloc_set_footprint_limit
[ https://issues.apache.org/jira/browse/ARROW-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-4422. --- Resolution: Fixed Issue resolved by pull request 3526 [https://github.com/apache/arrow/pull/3526] > [Plasma] Enforce memory limit in plasma, rather than relying on > dlmalloc_set_footprint_limit > > > Key: ARROW-4422 > URL: https://issues.apache.org/jira/browse/ARROW-4422 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Plasma (C++) >Affects Versions: 0.12.0 >Reporter: Anurag Khandelwal >Assignee: Anurag Khandelwal >Priority: Minor > Fix For: 0.13.0 > > > Currently, Plasma relies on dlmalloc_set_footprint_limit to limit the memory > utilization of the Plasma Store. This is restrictive because: > * It restricts Plasma to dlmalloc, which supports limiting memory footprint, > as opposed to other, potentially more performant malloc implementations > (e.g., jemalloc) > * dlmalloc_set_footprint_limit does not guarantee that the limit set by it > corresponds to the amount of _usable_ memory. As such, we might trigger evictions much > earlier than hitting this limit, e.g., due to fragmentation or metadata > overheads. > To overcome this, we can impose the memory limit at Plasma by tracking the > number of bytes allocated and freed using malloc and free calls. Whenever the > allocation reaches the set limit, we fail any subsequent allocations (i.e., > return NULL from malloc). This allows Plasma to not be tied to dlmalloc, and > also provides more accurate tracking of memory allocation/capacity. > Caveat: We will need to make sure that the mmapped files live on a file > system that is a bit larger (depending on malloc implementation) than the > Plasma memory limit to account for the extra memory required due to > fragmentation/metadata overheads. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
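The accounting scheme proposed above can be modeled in a few lines of Python. This is a hypothetical sketch, not the change merged in pull request 3526, and the class and method names are illustrative rather than Plasma's C++ API. It tracks bytes handed out and returned, and it refuses any request that would take the running total past the limit. It returns None where the C code would return NULL.

```python
class LimitedAllocator:
    """Hypothetical model of an allocator that enforces its own byte limit
    instead of relying on the underlying malloc to do so."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.allocated = 0      # bytes currently handed out
        self._sizes = {}        # handle -> size of that allocation
        self._next_handle = 1

    def malloc(self, size):
        # Refuse any request that would push the running total past the
        # limit; None plays the role of a NULL return from malloc.
        if self.allocated + size > self.limit_bytes:
            return None
        handle = self._next_handle
        self._next_handle += 1
        self._sizes[handle] = size
        self.allocated += size
        return handle

    def free(self, handle):
        # Give this allocation's bytes back to the budget.
        self.allocated -= self._sizes.pop(handle)


alloc = LimitedAllocator(limit_bytes=100)
a = alloc.malloc(60)   # succeeds
b = alloc.malloc(50)   # None: 60 + 50 would exceed the limit
alloc.free(a)
c = alloc.malloc(50)   # succeeds once the first block is freed
```

This accounting counts only requested sizes, so allocator overheads such as fragmentation and headers are invisible to it. That is why the caveat asks for a backing file system somewhat larger than the limit.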
[jira] [Updated] (ARROW-4422) [Plasma] Enforce memory limit in plasma, rather than relying on dlmalloc_set_footprint_limit
[ https://issues.apache.org/jira/browse/ARROW-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-4422: -- Labels: pull-request-available (was: ) > [Plasma] Enforce memory limit in plasma, rather than relying on > dlmalloc_set_footprint_limit > > > Key: ARROW-4422 > URL: https://issues.apache.org/jira/browse/ARROW-4422 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Plasma (C++) >Affects Versions: 0.12.0 >Reporter: Anurag Khandelwal >Assignee: Anurag Khandelwal >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > > Currently, Plasma relies on dlmalloc_set_footprint_limit to limit the memory > utilization of the Plasma Store. This is restrictive because: > * It restricts Plasma to dlmalloc, which supports limiting memory footprint, > as opposed to other, potentially more performant malloc implementations > (e.g., jemalloc) > * dlmalloc_set_footprint_limit does not guarantee that the limit set by it > corresponds to the amount of _usable_ memory. As such, we might trigger evictions much > earlier than hitting this limit, e.g., due to fragmentation or metadata > overheads. > To overcome this, we can impose the memory limit at Plasma by tracking the > number of bytes allocated and freed using malloc and free calls. Whenever the > allocation reaches the set limit, we fail any subsequent allocations (i.e., > return NULL from malloc). This allows Plasma to not be tied to dlmalloc, and > also provides more accurate tracking of memory allocation/capacity. > Caveat: We will need to make sure that the mmapped files live on a file > system that is a bit larger (depending on malloc implementation) than the > Plasma memory limit to account for the extra memory required due to > fragmentation/metadata overheads. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4435) [C#] Add .sln file and minor .csproj fix ups
Eric Erhardt created ARROW-4435: --- Summary: [C#] Add .sln file and minor .csproj fix ups Key: ARROW-4435 URL: https://issues.apache.org/jira/browse/ARROW-4435 Project: Apache Arrow Issue Type: Task Components: C# Reporter: Eric Erhardt There is currently no .sln file in the repo, which makes it hard to use the src and test code at the same time. Also, there are some settings in the .csproj that can be moved up to the outer PropertyGroup, and not under a "Configuration|Platform" conditional, like they were in the old .csproj format. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-4268) [C++] Add C primitive to Arrow:Type compile time in TypeTraits
[ https://issues.apache.org/jira/browse/ARROW-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-4268. - Resolution: Fixed resolved by https://github.com/apache/arrow/commit/012f77a96880cff49f588ae1ff2f65d5105ee433 > [C++] Add C primitive to Arrow:Type compile time in TypeTraits > -- > > Key: ARROW-4268 > URL: https://issues.apache.org/jira/browse/ARROW-4268 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Francois Saint-Jacques >Assignee: Francois Saint-Jacques >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > Original Estimate: 1h > Time Spent: 1h 40m > Remaining Estimate: 0h > > The user would use something like:
> {code:c++}
> ...
> using ArrowType = CTypeTraits::ArrowType;
> using ArrayType = CTypeTraits::ArrayType;
> auto type = CTypeTraits::type_singleton();
> {code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-4407) [C++] ExternalProject_Add does not capture CC/CXX correctly
[ https://issues.apache.org/jira/browse/ARROW-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-4407. - Resolution: Fixed Issue resolved by pull request 3515 [https://github.com/apache/arrow/pull/3515] > [C++] ExternalProject_Add does not capture CC/CXX correctly > --- > > Key: ARROW-4407 > URL: https://issues.apache.org/jira/browse/ARROW-4407 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.12.0 >Reporter: Francois Saint-Jacques >Assignee: Francois Saint-Jacques >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > The issue is that CC/CXX environment variables are captured on the first > invocation of the builder (e.g. make or ninja) instead of when CMake is > invoked in the build directory. This can lead to compilation errors (notably > when compiling with clang in the top directory due to the addition of the > `-Qunused-arguments` option). > This leads to an issue where I have a script that prepares the build directory > and exports CXX within the script. When I jump into the build folder, there's a > mismatch between the compiler used for the external gbenchmark (and all deps if conda is not used) > and the one used by the main build. > To reproduce: > # Create a new build directory with clang as compiler, don't build yet > # In a new shell (without the compiler environment variable), go into > the directory and invoke make/ninja -- This message was sent by Atlassian JIRA (v7.6.3#76005)
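The failure mode comes down to when the compiler is read from the environment. The Python sketch below is a hypothetical model, not CMake and not the fix merged in pull request 3515. One method reproduces the reported behavior by reading CC/CXX whenever the builder runs. The other reuses values snapshotted at configure time. The fallback names `cc` and `c++` are assumptions.

```python
import os


class BuildDirectory:
    """Hypothetical model of a build directory: configure() stands in for
    running CMake; the compilers_* methods stand in for running make/ninja."""

    def __init__(self):
        self.cache = {}

    def configure(self, env=None):
        env = os.environ if env is None else env
        # Snapshot the compiler choice once, at configure time, much like a
        # CMake cache entry.
        self.cache["CC"] = env.get("CC", "cc")
        self.cache["CXX"] = env.get("CXX", "c++")

    def compilers_read_at_build_time(self, env):
        # Reported behavior: whichever shell runs the builder decides.
        return env.get("CC", "cc"), env.get("CXX", "c++")

    def compilers_from_cache(self):
        # Intended behavior: reuse what was captured when CMake ran.
        return self.cache["CC"], self.cache["CXX"]


build = BuildDirectory()
build.configure({"CC": "clang", "CXX": "clang++"})  # setup script exports clang
new_shell_env = {}  # reproduce step 2: a new shell without CC/CXX set
print(build.compilers_read_at_build_time(new_shell_env))  # ('cc', 'c++')
print(build.compilers_from_cache())                        # ('clang', 'clang++')
```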
[jira] [Created] (ARROW-4434) [Python] Cannot create empty StructArray via pa.StructArray.from_arrays
Krisztian Szucs created ARROW-4434: -- Summary: [Python] Cannot create empty StructArray via pa.StructArray.from_arrays Key: ARROW-4434 URL: https://issues.apache.org/jira/browse/ARROW-4434 Project: Apache Arrow Issue Type: Bug Reporter: Krisztian Szucs
{code:python}
In [5]: pa.StructArray.from_arrays([], names=[])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
----> 1 pa.StructArray.from_arrays([], names=[])

~/Workspace/arrow/python/pyarrow/array.pxi in pyarrow.lib.StructArray.from_arrays()
   1326         num_arrays = len(arrays)
   1327         if num_arrays == 0:
-> 1328             raise ValueError("arrays list is empty")
   1329
   1330         length = len(arrays[0])

ValueError: arrays list is empty
{code}
However,
{code:python}
pa.array([], type=pa.struct([]))
{code}
works.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
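The traceback shows the cause. `from_arrays` takes the struct length from `arrays[0]`, so it rejects an empty list up front instead of reading a first child that does not exist. Below is a minimal sketch of length validation that copes with the zero-field case. It assumes a hypothetical explicit `length` argument, which is not pyarrow's signature. Plain Python lists stand in for Arrow arrays.

```python
def struct_array_length(arrays, length=None):
    """Hypothetical helper: determine the length of a struct array from its
    child arrays. With no children there is nothing to infer a length from,
    so an explicit length is accepted instead of rejecting the case."""
    if not arrays:
        if length is None:
            raise ValueError("no child arrays: pass an explicit length")
        return length
    inferred = len(arrays[0])
    # Every child of a struct array must have the same length.
    if any(len(child) != inferred for child in arrays[1:]):
        raise ValueError("child arrays have different lengths")
    if length is not None and length != inferred:
        raise ValueError("length does not match the child arrays")
    return inferred


print(struct_array_length([], length=0))                  # 0
print(struct_array_length([[1, 2, 3], ["a", "b", "c"]]))  # 3
```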
[jira] [Updated] (ARROW-4433) [R] Segmentation fault when instantiating arrow::table from data frame
[ https://issues.apache.org/jira/browse/ARROW-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lutz updated ARROW-4433: Summary: [R] Segmentation fault when instantiating arrow::table from data frame (was: Segmentation fault when instantiating arrow::table from data frame) > [R] Segmentation fault when instantiating arrow::table from data frame > -- > > Key: ARROW-4433 > URL: https://issues.apache.org/jira/browse/ARROW-4433 > Project: Apache Arrow > Issue Type: Bug > Components: R > Environment: R version 3.5.2 (2018-12-20) > Platform: x86_64-suse-linux-gnu (64-bit) >Reporter: Lutz >Priority: Critical > > The sample code from [https://github.com/apache/arrow/tree/master/r] leads to a segmentation fault:
> {quote}library(arrow, warn.conflicts = FALSE)
> library(tibble)
> library(reticulate)
> tf <- tempfile()
> (tib <- tibble(x = 1:10, y = rnorm(10)))
> arrow::write_arrow(tib, tf)
>
>  *** caught segfault ***
> address (nil), cause 'memory not mapped'
>
> Traceback:
>  1: Table__from_dataframe(.data)
>  2: shared_ptr_is_null(xp)
>  3: shared_ptr(`arrow::Table`, Table__from_dataframe(.data))
>  4: table(x)
>  5: to_arrow.data.frame(x)
>  6: to_arrow(x)
>  7: write_arrow.fs_path(x, fs::path_abs(stream), ...)
>  8: write_arrow(x, fs::path_abs(stream), ...)
>  9: write_arrow.character(tib, tf)
> 10: arrow::write_arrow(tib, tf)
> {quote}
>
> The same problem appears also when just calling arrow::table(tib):
> {quote}> arrow::table(tib)
>
>  *** caught segfault ***
> address (nil), cause 'memory not mapped'
>
> Traceback:
>  1: Table__from_dataframe(.data)
>  2: shared_ptr_is_null(xp)
>  3: shared_ptr(`arrow::Table`, Table__from_dataframe(.data))
>  4: arrow::table(tib)
> {quote}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4433) Segmentation fault when instantiating arrow::table from data frame
Lutz created ARROW-4433: --- Summary: Segmentation fault when instantiating arrow::table from data frame Key: ARROW-4433 URL: https://issues.apache.org/jira/browse/ARROW-4433 Project: Apache Arrow Issue Type: Bug Components: R Environment: R version 3.5.2 (2018-12-20) Platform: x86_64-suse-linux-gnu (64-bit) Reporter: Lutz The sample code from [https://github.com/apache/arrow/tree/master/r] leads to a segmentation fault:
{quote}library(arrow, warn.conflicts = FALSE)
library(tibble)
library(reticulate)
tf <- tempfile()
(tib <- tibble(x = 1:10, y = rnorm(10)))
arrow::write_arrow(tib, tf)

 *** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
 1: Table__from_dataframe(.data)
 2: shared_ptr_is_null(xp)
 3: shared_ptr(`arrow::Table`, Table__from_dataframe(.data))
 4: table(x)
 5: to_arrow.data.frame(x)
 6: to_arrow(x)
 7: write_arrow.fs_path(x, fs::path_abs(stream), ...)
 8: write_arrow(x, fs::path_abs(stream), ...)
 9: write_arrow.character(tib, tf)
10: arrow::write_arrow(tib, tf)
{quote}
The same problem appears also when just calling arrow::table(tib):
{quote}> arrow::table(tib)

 *** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
 1: Table__from_dataframe(.data)
 2: shared_ptr_is_null(xp)
 3: shared_ptr(`arrow::Table`, Table__from_dataframe(.data))
 4: arrow::table(tib)
{quote}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4432) [Python][Hypothesis] Empty table - pandas roundtrip produces inequal tables
[ https://issues.apache.org/jira/browse/ARROW-4432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-4432: --- Description: The following test case fails for empty tables:
{code:python}
import hypothesis as h
import pyarrow as pa
import pyarrow.tests.strategies as past

@h.given(past.all_tables)
def test_pandas_roundtrip(table):
    df = table.to_pandas()
    table_ = pa.Table.from_pandas(df)
    assert table == table_
{code}
was: The following test case fails for empty tables:
{code:python}
@h.given(past.all_tables)
def test_pandas_roundtrip(table):
    df = table.to_pandas()
    table_ = pa.Table.from_pandas(df)
    assert table == table_
{code}
> [Python][Hypothesis] Empty table - pandas roundtrip produces inequal tables > --- > > Key: ARROW-4432 > URL: https://issues.apache.org/jira/browse/ARROW-4432 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Krisztian Szucs >Priority: Major > Labels: hypothesis > > The following test case fails for empty tables:
> {code:python}
> import hypothesis as h
> import pyarrow as pa
> import pyarrow.tests.strategies as past
>
> @h.given(past.all_tables)
> def test_pandas_roundtrip(table):
>     df = table.to_pandas()
>     table_ = pa.Table.from_pandas(df)
>     assert table == table_
> {code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4432) [Python][Hypothesis] Empty table - pandas roundtrip produces inequal tables
Krisztian Szucs created ARROW-4432: -- Summary: [Python][Hypothesis] Empty table - pandas roundtrip produces inequal tables Key: ARROW-4432 URL: https://issues.apache.org/jira/browse/ARROW-4432 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Krisztian Szucs The following test case fails for empty tables:
{code:python}
@h.given(past.all_tables)
def test_pandas_roundtrip(table):
    df = table.to_pandas()
    table_ = pa.Table.from_pandas(df)
    assert table == table_
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4431) [C++] Build gRPC as ExternalProject without allowing it to build its vendored dependencies
[ https://issues.apache.org/jira/browse/ARROW-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-4431: -- Labels: pull-request-available (was: ) > [C++] Build gRPC as ExternalProject without allowing it to build its vendored > dependencies > -- > > Key: ARROW-4431 > URL: https://issues.apache.org/jira/browse/ARROW-4431 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4431) [C++] Build gRPC as ExternalProject without allowing it to build its vendored dependencies
Wes McKinney created ARROW-4431: --- Summary: [C++] Build gRPC as ExternalProject without allowing it to build its vendored dependencies Key: ARROW-4431 URL: https://issues.apache.org/jira/browse/ARROW-4431 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Assignee: Wes McKinney Fix For: 0.13.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4430) [C++] add unit test for currently unused append method
Benjamin Kietzman created ARROW-4430: Summary: [C++] add unit test for currently unused append method Key: ARROW-4430 URL: https://issues.apache.org/jira/browse/ARROW-4430 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Benjamin Kietzman Assignee: Benjamin Kietzman TypedBufferBuilder::Append(num_copies, value) is currently not tested and is incorrect. Add a test and correct the method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
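The issue does not say what is wrong with the current method. The behavior a unit test should pin down is simple, though. After Append(num_copies, value), the buffer holds num_copies more elements, each equal to value. Below is a hypothetical Python model of that contract. It is not Arrow's C++ TypedBufferBuilder, and both the class name and the use of `struct` format codes for the element type are assumptions.

```python
import struct


class TypedBufferBuilderModel:
    """Hypothetical model of a typed buffer builder: fixed-width values are
    packed into a growing byte buffer (element type given as a struct code)."""

    def __init__(self, fmt):
        self.fmt = "<" + fmt                 # little-endian, no padding
        self.itemsize = struct.calcsize(self.fmt)
        self.buffer = bytearray()

    def append(self, value):
        self.buffer += struct.pack(self.fmt, value)

    def append_copies(self, num_copies, value):
        # Contract under test: grow by exactly num_copies * itemsize bytes,
        # with every new slot holding `value`.
        self.buffer += struct.pack(self.fmt, value) * num_copies

    def length(self):
        return len(self.buffer) // self.itemsize

    def values(self):
        return [v for (v,) in struct.iter_unpack(self.fmt, self.buffer)]


builder = TypedBufferBuilderModel("i")  # 32-bit signed integers
builder.append(7)
builder.append_copies(3, 42)
builder.append_copies(0, 5)             # zero copies must be a no-op
print(builder.values())  # [7, 42, 42, 42]
```

A C++ test for the real method would make the same checks: the resulting length, the value in every appended slot, and the zero-copy edge case.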
[jira] [Updated] (ARROW-4425) Add link to 'Contributing' page in the top-level Arrow README
[ https://issues.apache.org/jira/browse/ARROW-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanya Schlusser updated ARROW-4425: --- Description: It would be nice to link to the ["Contributing to Apache Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] -Confluence page-(*EDIT) in the Sphinx docs directly from the main project [README|https://github.com/apache/arrow/blob/master/README.md] (in the already existing "Getting involved" section) because it's a bit hard to find right now. "contributing" page: [https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] main project README: [https://github.com/apache/arrow/blob/master/README.md] EDIT: Moving the "Contributing" wiki to a page in the actual [Arrow Sphinx docs (location in repo)|https://github.com/apache/arrow/tree/master/docs] would also make it easier to find and modify. An additional task, ARROW-4427 was added to do this. was: It would be nice to link to the ["Contributing to Apache Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] -Confluence page-(*EDIT) in the static docs directly from the main project [README|https://github.com/apache/arrow/blob/master/README.md] (in the already existing "Getting involved" section) because it's a bit hard to find right now. "contributing" page: [https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] main project README: [https://github.com/apache/arrow/blob/master/README.md] EDIT: Moving the "Contributing" wiki to a static page in the actual [Arrow site (location in repo)|https://github.com/apache/arrow/tree/master/site] would also make it easier to find and modify. An additional task, ARROW-4427 was added to do this. 
> Add link to 'Contributing' page in the top-level Arrow README > - > > Key: ARROW-4425 > URL: https://issues.apache.org/jira/browse/ARROW-4425 > Project: Apache Arrow > Issue Type: Task > Components: Documentation >Reporter: Tanya Schlusser >Priority: Major > > It would be nice to link to the ["Contributing to Apache > Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] > -Confluence page-(*EDIT) in the Sphinx docs directly from the main project > [README|https://github.com/apache/arrow/blob/master/README.md] (in the > already existing "Getting involved" section) because it's a bit hard to find > right now. > "contributing" page: > [https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] > main project README: [https://github.com/apache/arrow/blob/master/README.md] > > EDIT: Moving the "Contributing" wiki to a page in the actual [Arrow Sphinx > docs (location in repo)|https://github.com/apache/arrow/tree/master/docs] > would also make it easier to find and modify. An additional task, ARROW-4427 > was added to do this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4429) Add git rebase tips to the 'Contributing' page in the developer docs
Tanya Schlusser created ARROW-4429: -- Summary: Add git rebase tips to the 'Contributing' page in the developer docs Key: ARROW-4429 URL: https://issues.apache.org/jira/browse/ARROW-4429 Project: Apache Arrow Issue Type: Task Components: Documentation Reporter: Tanya Schlusser A recent discussion on the listserv (link below) asked how contributors should handle rebasing. It would be helpful if the tips made it into the developer documentation somehow. I suggest adding them to the ["Contributing to Apache Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] page, which is currently a wiki but will hopefully become part of the Sphinx docs (ARROW-4427). Here is the relevant thread: [https://lists.apache.org/thread.html/c74d8027184550b8d9041e3f2414b517ffb76ccbc1d5aa4563d364b6@%3Cdev.arrow.apache.org%3E] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4428) [R] Feature flags for R build
Wes McKinney created ARROW-4428: --- Summary: [R] Feature flags for R build Key: ARROW-4428 URL: https://issues.apache.org/jira/browse/ARROW-4428 Project: Apache Arrow Issue Type: New Feature Components: R Reporter: Wes McKinney Fix For: 0.13.0 There are a number of optional components in the Arrow C++ library. In Python we have feature flags to turn on and off parts of the bindings based on what C++ libraries have been built. There is also some logic to try to detect what has been built and enable those features. We need to have the same thing in R. Some components, like Plasma, are not available for Windows and so necessarily these will have to be flagged off. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
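The "detect what has been built" logic described above can be implemented in several ways. One is to check whether each optional component's module can be found and to record a boolean flag per component. Below is a minimal sketch of that pattern in Python. The function name and the candidate mapping are hypothetical, the sketch is not pyarrow's actual detection code, and the R bindings would need their own equivalent.

```python
import importlib.util


def detect_features(candidates):
    """Return {feature: bool} saying whether each optional module is present.

    `candidates` maps a feature-flag name to the module that provides it.
    find_spec() locates a module without importing it (for a dotted name the
    parent package is imported first); a missing parent package raises
    ModuleNotFoundError, which is treated as "feature not built"."""
    flags = {}
    for feature, module_name in candidates.items():
        try:
            flags[feature] = importlib.util.find_spec(module_name) is not None
        except ModuleNotFoundError:
            flags[feature] = False
    return flags


# Stdlib names stand in for optional components so the example runs anywhere.
print(detect_features({"json": "json", "missing": "no_such_module_xyz"}))
# {'json': True, 'missing': False}
```

Bindings code can then branch on these flags. For example, a component like Plasma that is never built on Windows would simply report False there.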
[jira] [Updated] (ARROW-4427) Move Confluence Wiki pages to the Sphinx docs
[ https://issues.apache.org/jira/browse/ARROW-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanya Schlusser updated ARROW-4427: --- Summary: Move Confluence Wiki pages to the Sphinx docs (was: Move "Contributing to Apache Arrow" page to the static docs) > Move Confluence Wiki pages to the Sphinx docs > - > > Key: ARROW-4427 > URL: https://issues.apache.org/jira/browse/ARROW-4427 > Project: Apache Arrow > Issue Type: Task > Components: Documentation >Reporter: Tanya Schlusser >Priority: Major > > It's hard to find and modify the ["Contributing to Apache > Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] > and other developers' wiki pages in Confluence. If these were moved to > inside the project web page, that would make it easier. > There are 5 steps to this: > # Create a new directory inside of `arrow/docs/source` to house the wiki > pages. (It will look like the > [cpp|https://github.com/apache/arrow/tree/master/docs/source/cpp] or > [python|https://github.com/apache/arrow/tree/master/docs/source/python] > directories.) > # Copy the wiki page contents to new `*.rst` pages inside this new directory. > # Add an `index.rst` that links to them all with enough description to help > navigation. > # Modify the Sphinx index page > [`arrow/docs/source/index.rst`|https://github.com/apache/arrow/blob/master/docs/source/index.rst] > to have an entry that points to the new index page made in step 3 > # Modify the static site page > [`arrow/site/_includes/header.html`|https://github.com/apache/arrow/blob/8e195327149b670de2cd7a8cfe75bbd6f71c6b49/site/_includes/header.html#L33] > to point to the newly created page instead of the wiki page. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (ARROW-4427) Move Confluence Wiki pages to the Sphinx docs
[ https://issues.apache.org/jira/browse/ARROW-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756190#comment-16756190 ] Tanya Schlusser edited comment on ARROW-4427 at 1/30/19 3:06 PM: - Hoo boy. A big task! Modified the description + title per discussion above. was (Author: tanya): Hoo boy. And all of their child wiki pages. > Move Confluence Wiki pages to the Sphinx docs > - > > Key: ARROW-4427 > URL: https://issues.apache.org/jira/browse/ARROW-4427 > Project: Apache Arrow > Issue Type: Task > Components: Documentation >Reporter: Tanya Schlusser >Priority: Major > > It's hard to find and modify the ["Contributing to Apache > Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] > and other developers' wiki pages in Confluence. If these were moved to > inside the project web page, that would make it easier. > There are 5 steps to this: > # Create a new directory inside of `arrow/docs/source` to house the wiki > pages. (It will look like the > [cpp|https://github.com/apache/arrow/tree/master/docs/source/cpp] or > [python|https://github.com/apache/arrow/tree/master/docs/source/python] > directories.) > # Copy the wiki page contents to new `*.rst` pages inside this new directory. > # Add an `index.rst` that links to them all with enough description to help > navigation. > # Modify the Sphinx index page > [`arrow/docs/source/index.rst`|https://github.com/apache/arrow/blob/master/docs/source/index.rst] > to have an entry that points to the new index page made in step 3 > # Modify the static site page > [`arrow/site/_includes/header.html`|https://github.com/apache/arrow/blob/8e195327149b670de2cd7a8cfe75bbd6f71c6b49/site/_includes/header.html#L33] > to point to the newly created page instead of the wiki page. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4427) Move "Contributing to Apache Arrow" page to the static docs
[ https://issues.apache.org/jira/browse/ARROW-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanya Schlusser updated ARROW-4427: --- Description: It's hard to find and modify the ["Contributing to Apache Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] and other developers' wiki pages in Confluence. If these were moved to inside the project web page, that would make it easier. There are 5 steps to this: # Create a new directory inside of `arrow/docs/source` to house the wiki pages. (It will look like the [cpp|https://github.com/apache/arrow/tree/master/docs/source/cpp] or [python|https://github.com/apache/arrow/tree/master/docs/source/python] directories.) # Copy the wiki page contents to new `*.rst` pages inside this new directory. # Add an `index.rst` that links to them all with enough description to help navigation. # Modify the Sphinx index page [`arrow/docs/source/index.rst`|https://github.com/apache/arrow/blob/master/docs/source/index.rst] to have an entry that points to the new index page made in step 3 # Modify the static site page [`arrow/site/_includes/header.html`|https://github.com/apache/arrow/blob/8e195327149b670de2cd7a8cfe75bbd6f71c6b49/site/_includes/header.html#L33] to point to the newly created page instead of the wiki page. was: It's hard to find and modify the ["Contributing to Apache Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] wiki page in Confluence. If it were moved to inside the static web page, that would make it easier. There are two steps to this: # Copy the wiki page contents to a new web page at the top "site" level (under arrow/site/ just like the [committers page|https://github.com/apache/arrow/blob/master/site/committers.html]) Maybe named "contributing.html" or something. 
# Modify the [navigation section in arrow/site/_includes/header.html|https://github.com/apache/arrow/blob/8e195327149b670de2cd7a8cfe75bbd6f71c6b49/site/_includes/header.html#L33] to point to the newly created page instead of the wiki page. The affected pages are all part of the Jekyll components, so there isn't a need to build the Sphinx part of the docs to check your work. > Move "Contributing to Apache Arrow" page to the static docs > --- > > Key: ARROW-4427 > URL: https://issues.apache.org/jira/browse/ARROW-4427 > Project: Apache Arrow > Issue Type: Task > Components: Documentation >Reporter: Tanya Schlusser >Priority: Major > > It's hard to find and modify the ["Contributing to Apache > Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] > and other developers' wiki pages in Confluence. If these were moved to > inside the project web page, that would make it easier. > There are 5 steps to this: > # Create a new directory inside of `arrow/docs/source` to house the wiki > pages. (It will look like the > [cpp|https://github.com/apache/arrow/tree/master/docs/source/cpp] or > [python|https://github.com/apache/arrow/tree/master/docs/source/python] > directories.) > # Copy the wiki page contents to new `*.rst` pages inside this new directory. > # Add an `index.rst` that links to them all with enough description to help > navigation. > # Modify the Sphinx index page > [`arrow/docs/source/index.rst`|https://github.com/apache/arrow/blob/master/docs/source/index.rst] > to have an entry that points to the new index page made in step 3 > # Modify the static site page > [`arrow/site/_includes/header.html`|https://github.com/apache/arrow/blob/8e195327149b670de2cd7a8cfe75bbd6f71c6b49/site/_includes/header.html#L33] > to point to the newly created page instead of the wiki page. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4427) Move "Contributing to Apache Arrow" page to the static docs
[ https://issues.apache.org/jira/browse/ARROW-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756190#comment-16756190 ] Tanya Schlusser commented on ARROW-4427: Hoo boy. And all of their child wiki pages. > Move "Contributing to Apache Arrow" page to the static docs > --- > > Key: ARROW-4427 > URL: https://issues.apache.org/jira/browse/ARROW-4427 > Project: Apache Arrow > Issue Type: Task > Components: Documentation >Reporter: Tanya Schlusser >Priority: Major > > It's hard to find and modify the ["Contributing to Apache > Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] > wiki page in Confluence. If it were moved to inside the static web page, > that would make it easier. > There are two steps to this: > # Copy the wiki page contents to a new web page at the top "site" level > (under arrow/site/ just like the [committers > page|https://github.com/apache/arrow/blob/master/site/committers.html]) Maybe > named "contributing.html" or something. > # Modify the [navigation section in > arrow/site/_includes/header.html|https://github.com/apache/arrow/blob/8e195327149b670de2cd7a8cfe75bbd6f71c6b49/site/_includes/header.html#L33] > to point to the newly created page instead of the wiki page. > The affected pages are all part of the Jekyll components, so there isn't a > need to build the Sphinx part of the docs to check your work. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-4328) Make R build compatible with DARROW_TENSORFLOW=ON
[ https://issues.apache.org/jira/browse/ARROW-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn reassigned ARROW-4328: -- Assignee: cHYzZQo > Make R build compatible with DARROW_TENSORFLOW=ON > - > > Key: ARROW-4328 > URL: https://issues.apache.org/jira/browse/ARROW-4328 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: cHYzZQo >Assignee: cHYzZQo >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Add an option that sets GLIBCXX_USE_CXX11_ABI=0 so that the R lib will be > compatible with a shared library built with -DARROW_TENSORFLOW=ON -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-4328) Make R build compatible with DARROW_TENSORFLOW=ON
[ https://issues.apache.org/jira/browse/ARROW-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved ARROW-4328. Resolution: Fixed Fix Version/s: 0.13.0 Issue resolved by pull request 3409 [https://github.com/apache/arrow/pull/3409] > Make R build compatible with DARROW_TENSORFLOW=ON > - > > Key: ARROW-4328 > URL: https://issues.apache.org/jira/browse/ARROW-4328 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: cHYzZQo >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Add an option that sets GLIBCXX_USE_CXX11_ABI=0 so that the R lib will be > compatible with a shared library built with -DARROW_TENSORFLOW=ON -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4427) Move "Contributing to Apache Arrow" page to the static docs
[ https://issues.apache.org/jira/browse/ARROW-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756188#comment-16756188 ] Tanya Schlusser commented on ARROW-4427: Ok. Am I understanding that maybe a number of the wiki pages should be moved—anything not directly related to Jira? So: * [Contributing to Apache Arrow|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow?src=contextnavpagetreemode] * [Guide for Committers and Project Maintainers|https://cwiki.apache.org/confluence/display/ARROW/Guide+for+Committers+and+Project+Maintainers] * [HDFS Filesystem Support|https://cwiki.apache.org/confluence/display/ARROW/HDFS+Filesystem+Support] * [How to Verify Release Candidates|https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates] * [Product Requirements|https://cwiki.apache.org/confluence/display/ARROW/Product+requirements] (possibly not this one as it's empty) * [Release Management Guide|https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide] What do you think of another directory, then, in `arrow/docs/source` where all of the listed pages reside...say `arrow/docs/source/dev` or something? > Move "Contributing to Apache Arrow" page to the static docs > --- > > Key: ARROW-4427 > URL: https://issues.apache.org/jira/browse/ARROW-4427 > Project: Apache Arrow > Issue Type: Task > Components: Documentation >Reporter: Tanya Schlusser >Priority: Major > > It's hard to find and modify the ["Contributing to Apache > Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] > wiki page in Confluence. If it were moved to inside the static web page, > that would make it easier. > There are two steps to this: > # Copy the wiki page contents to a new web page at the top "site" level > (under arrow/site/ just like the [committers > page|https://github.com/apache/arrow/blob/master/site/committers.html]) Maybe > named "contributing.html" or something. 
> # Modify the [navigation section in > arrow/site/_includes/header.html|https://github.com/apache/arrow/blob/8e195327149b670de2cd7a8cfe75bbd6f71c6b49/site/_includes/header.html#L33] > to point to the newly created page instead of the wiki page. > The affected pages are all part of the Jekyll components, so there isn't a > need to build the Sphinx part of the docs to check your work. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4425) Add link to 'Contributing' page in the top-level Arrow README
[ https://issues.apache.org/jira/browse/ARROW-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanya Schlusser updated ARROW-4425: --- Description: It would be nice to link to the ["Contributing to Apache Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] -Confluence- page in the Sphinx docs directly from the main project [README|https://github.com/apache/arrow/blob/master/README.md] (in the already existing "Getting involved" section) because it's a bit hard to find right now. "contributing" page: [https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] main project README: [https://github.com/apache/arrow/blob/master/README.md] EDIT: Moving the "Contributing" wiki to a static page in the actual [Arrow site (location in repo)|https://github.com/apache/arrow/tree/master/site] would also make it easier to find and modify. An additional task was added to do this. was: It would be nice to link to the ["Contributing to Apache Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] -Confluence- page in the Sphinx docs directly from the main project [README|https://github.com/apache/arrow/blob/master/README.md] (in the already existing "Getting involved" section) because it's a bit hard to find right now. "contributing" page: [https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] main project README: [https://github.com/apache/arrow/blob/master/README.md] EDIT: Moving the "Contributing" wiki to a static page in the actual [Arrow site (location in repo)|https://github.com/apache/arrow/tree/master/site] would also make it easier to find and modify. A sub-task was added to do this. 
> Add link to 'Contributing' page in the top-level Arrow README > - > > Key: ARROW-4425 > URL: https://issues.apache.org/jira/browse/ARROW-4425 > Project: Apache Arrow > Issue Type: Task > Components: Documentation >Reporter: Tanya Schlusser >Priority: Major > > It would be nice to link to the ["Contributing to Apache > Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] > -Confluence- page in the Sphinx docs directly from the main project > [README|https://github.com/apache/arrow/blob/master/README.md] (in the > already existing "Getting involved" section) because it's a bit hard to find > right now. > "contributing" page: > [https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] > main project README: [https://github.com/apache/arrow/blob/master/README.md] > > EDIT: Moving the "Contributing" wiki to a static page in the actual [Arrow > site (location in repo)|https://github.com/apache/arrow/tree/master/site] > would also make it easier to find and modify. An additional task was added > to do this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4427) Move "Contributing to Apache Arrow" page to the static docs
[ https://issues.apache.org/jira/browse/ARROW-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756176#comment-16756176 ] Wes McKinney commented on ARROW-4427: - +1 > Move "Contributing to Apache Arrow" page to the static docs > --- > > Key: ARROW-4427 > URL: https://issues.apache.org/jira/browse/ARROW-4427 > Project: Apache Arrow > Issue Type: Task > Components: Documentation >Reporter: Tanya Schlusser >Priority: Major > > It's hard to find and modify the ["Contributing to Apache > Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] > wiki page in Confluence. If it were moved to inside the static web page, > that would make it easier. > There are two steps to this: > # Copy the wiki page contents to a new web page at the top "site" level > (under arrow/site/ just like the [committers > page|https://github.com/apache/arrow/blob/master/site/committers.html]) Maybe > named "contributing.html" or something. > # Modify the [navigation section in > arrow/site/_includes/header.html|https://github.com/apache/arrow/blob/8e195327149b670de2cd7a8cfe75bbd6f71c6b49/site/_includes/header.html#L33] > to point to the newly created page instead of the wiki page. > The affected pages are all part of the Jekyll components, so there isn't a > need to build the Sphinx part of the docs to check your work. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4425) Add link to 'Contributing' page in the top-level Arrow README
[ https://issues.apache.org/jira/browse/ARROW-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanya Schlusser updated ARROW-4425: --- Description: It would be nice to link to the ["Contributing to Apache Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] -Confluence page-(*EDIT) in the static docs directly from the main project [README|https://github.com/apache/arrow/blob/master/README.md] (in the already existing "Getting involved" section) because it's a bit hard to find right now. "contributing" page: [https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] main project README: [https://github.com/apache/arrow/blob/master/README.md] EDIT: Moving the "Contributing" wiki to a static page in the actual [Arrow site (location in repo)|https://github.com/apache/arrow/tree/master/site] would also make it easier to find and modify. An additional task, ARROW-4427 was added to do this. was: It would be nice to link to the ["Contributing to Apache Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] -Confluence- page in the Sphinx docs directly from the main project [README|https://github.com/apache/arrow/blob/master/README.md] (in the already existing "Getting involved" section) because it's a bit hard to find right now. "contributing" page: [https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] main project README: [https://github.com/apache/arrow/blob/master/README.md] EDIT: Moving the "Contributing" wiki to a static page in the actual [Arrow site (location in repo)|https://github.com/apache/arrow/tree/master/site] would also make it easier to find and modify. An additional task was added to do this. 
> Add link to 'Contributing' page in the top-level Arrow README > - > > Key: ARROW-4425 > URL: https://issues.apache.org/jira/browse/ARROW-4425 > Project: Apache Arrow > Issue Type: Task > Components: Documentation >Reporter: Tanya Schlusser >Priority: Major > > It would be nice to link to the ["Contributing to Apache > Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] > -Confluence page-(*EDIT) in the static docs directly from the main project > [README|https://github.com/apache/arrow/blob/master/README.md] (in the > already existing "Getting involved" section) because it's a bit hard to find > right now. > "contributing" page: > [https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] > main project README: [https://github.com/apache/arrow/blob/master/README.md] > > EDIT: Moving the "Contributing" wiki to a static page in the actual [Arrow > site (location in repo)|https://github.com/apache/arrow/tree/master/site] > would also make it easier to find and modify. An additional task, ARROW-4427 > was added to do this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4427) Move "Contributing to Apache Arrow" page to the static docs
[ https://issues.apache.org/jira/browse/ARROW-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756166#comment-16756166 ] Antoine Pitrou commented on ARROW-4427: --- I think this would be better moved to the Sphinx docs, which allow extensive cross-referencing. IMHO the Jekyll site should only be used for top-level pages and the blog. See https://github.com/apache/arrow/tree/master/docs > Move "Contributing to Apache Arrow" page to the static docs > --- > > Key: ARROW-4427 > URL: https://issues.apache.org/jira/browse/ARROW-4427 > Project: Apache Arrow > Issue Type: Task > Components: Documentation >Reporter: Tanya Schlusser >Priority: Major > > It's hard to find and modify the ["Contributing to Apache > Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] > wiki page in Confluence. If it were moved to inside the static web page, > that would make it easier. > There are two steps to this: > # Copy the wiki page contents to a new web page at the top "site" level > (under arrow/site/ just like the [committers > page|https://github.com/apache/arrow/blob/master/site/committers.html]) Maybe > named "contributing.html" or something. > # Modify the [navigation section in > arrow/site/_includes/header.html|https://github.com/apache/arrow/blob/8e195327149b670de2cd7a8cfe75bbd6f71c6b49/site/_includes/header.html#L33] > to point to the newly created page instead of the wiki page. > The affected pages are all part of the Jekyll components, so there isn't a > need to build the Sphinx part of the docs to check your work. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4427) Move "Contributing to Apache Arrow" page to the static docs
Tanya Schlusser created ARROW-4427: -- Summary: Move "Contributing to Apache Arrow" page to the static docs Key: ARROW-4427 URL: https://issues.apache.org/jira/browse/ARROW-4427 Project: Apache Arrow Issue Type: Task Components: Documentation Reporter: Tanya Schlusser It's hard to find and modify the ["Contributing to Apache Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] wiki page in Confluence. If it were moved to inside the static web page, that would make it easier. There are two steps to this: # Copy the wiki page contents to a new web page at the top "site" level (under arrow/site/ just like the [committers page|https://github.com/apache/arrow/blob/master/site/committers.html]) Maybe named "contributing.html" or something. # Modify the [navigation section in arrow/site/_includes/header.html|https://github.com/apache/arrow/blob/8e195327149b670de2cd7a8cfe75bbd6f71c6b49/site/_includes/header.html#L33] to point to the newly created page instead of the wiki page. The affected pages are all part of the Jekyll components, so there isn't a need to build the Sphinx part of the docs to check your work. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4425) Add link to 'Contributing' page in the top-level Arrow README
[ https://issues.apache.org/jira/browse/ARROW-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanya Schlusser updated ARROW-4425: --- Description: It would be nice to link to the ["Contributing to Apache Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] -Confluence- page in the Sphinx docs directly from the main project [README|https://github.com/apache/arrow/blob/master/README.md] (in the already existing "Getting involved" section) because it's a bit hard to find right now. "contributing" page: [https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] main project README: [https://github.com/apache/arrow/blob/master/README.md] EDIT: Moving the "Contributing" wiki to a static page in the actual [Arrow site (location in repo)|https://github.com/apache/arrow/tree/master/site] would also make it easier to find and modify. A sub-task was added to do this. was: It would be nice to add a link to the ["Contributing to Apache Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] Confluence page directly from the main project [README|https://github.com/apache/arrow/blob/master/README.md] (in the already existing "Getting involved" section) because it's a bit hard to find right now. 
"contributing" page: [https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] main project README: [https://github.com/apache/arrow/blob/master/README.md] > Add link to 'Contributing' page in the top-level Arrow README > - > > Key: ARROW-4425 > URL: https://issues.apache.org/jira/browse/ARROW-4425 > Project: Apache Arrow > Issue Type: Task > Components: Documentation >Reporter: Tanya Schlusser >Priority: Major > > It would be nice to link to the ["Contributing to Apache > Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] > -Confluence- page in the Sphinx docs directly from the main project > [README|https://github.com/apache/arrow/blob/master/README.md] (in the > already existing "Getting involved" section) because it's a bit hard to find > right now. > "contributing" page: > [https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] > main project README: [https://github.com/apache/arrow/blob/master/README.md] > > EDIT: Moving the "Contributing" wiki to a static page in the actual [Arrow > site (location in repo)|https://github.com/apache/arrow/tree/master/site] > would also make it easier to find and modify. A sub-task was added to do this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3596) [Packaging] Build gRPC in conda-forge
[ https://issues.apache.org/jira/browse/ARROW-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756160#comment-16756160 ] Antoine Pitrou commented on ARROW-3596: --- I'm currently investigating this. > [Packaging] Build gRPC in conda-forge > - > > Key: ARROW-3596 > URL: https://issues.apache.org/jira/browse/ARROW-3596 > Project: Apache Arrow > Issue Type: Sub-task > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.13.0 > > > This may be a bit annoying as grpc's CMake files also want to install its > third-party dependencies. [~msarahan] pointed me to a version for > AnacondaRecipes where there was an effort to "unvendor" these components. > grpc depends on at least: > * protobuf > * zlib > * either boringssl or openssl > * cares > * gflags > It looks like the grpc developers want to take on Abseil as a dependency > eventually, but it is not required yet. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4425) Add link to 'Contributing' page in the top-level Arrow README
[ https://issues.apache.org/jira/browse/ARROW-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756158#comment-16756158 ] Tanya Schlusser commented on ARROW-4425: Fair statement. Confluence is really hard for me to navigate. Updating and adding a sub-task. > Add link to 'Contributing' page in the top-level Arrow README > - > > Key: ARROW-4425 > URL: https://issues.apache.org/jira/browse/ARROW-4425 > Project: Apache Arrow > Issue Type: Task > Components: Documentation >Reporter: Tanya Schlusser >Priority: Major > > It would be nice to add a link to the ["Contributing to Apache > Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] > Confluence page directly from the main project > [README|https://github.com/apache/arrow/blob/master/README.md] (in the > already existing "Getting involved" section) because it's a bit hard to find > right now. > "contributing" page: > [https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] > main project README: [https://github.com/apache/arrow/blob/master/README.md] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3769) [C++] Support reading non-dictionary encoded binary Parquet columns directly as DictionaryArray
[ https://issues.apache.org/jira/browse/ARROW-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756155#comment-16756155 ] Wes McKinney commented on ARROW-3769: - Cool. This is only implemented at the encoder level, so you should be able to use {{ArrayFromJSON}} to make writing the unit tests easier -- so for this JIRA I am expecting tests in {{parquet-encoding-test}} > [C++] Support reading non-dictionary encoded binary Parquet columns directly > as DictionaryArray > --- > > Key: ARROW-3769 > URL: https://issues.apache.org/jira/browse/ARROW-3769 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Hatem Helal >Priority: Major > Labels: parquet > Fix For: 0.13.0 > > > If the goal is to hash this data anyway into a categorical-type array, then > it would be better to offer the option to "push down" the hashing into the > Parquet read hot path rather than first fully materializing a dense vector of > {{ByteArray}} values, which could use a lot of memory after decompression -- This message was sent by Atlassian JIRA (v7.6.3#76005)
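To illustrate the "push down" idea in ARROW-3769, here is a sketch in Python. It is not the Parquet C++ decoder. Each binary value is hashed into a dictionary as soon as it is decoded, so only integer indices and the unique values are kept, instead of a dense vector holding every `ByteArray`. The function name `decode_to_dictionary` is hypothetical, and so is the iterable that stands in for the output of a PLAIN (non-dictionary) decoder.

```python
from typing import Dict, Iterable, List, Tuple


def decode_to_dictionary(values: Iterable[bytes]) -> Tuple[List[int], List[bytes]]:
    """Hash each decoded binary value straight into a dictionary.

    `values` stands in for the stream of ByteArray values produced by a
    non-dictionary Parquet decoder (hypothetical input). Returns
    (indices, dictionary), the two halves of a DictionaryArray.
    """
    memo: Dict[bytes, int] = {}   # value -> position in `dictionary`
    dictionary: List[bytes] = []  # unique values in first-seen order
    indices: List[int] = []       # one index per input value
    for value in values:
        index = memo.get(value)
        if index is None:
            index = len(dictionary)
            memo[value] = index
            dictionary.append(value)
        indices.append(index)
    return indices, dictionary


indices, dictionary = decode_to_dictionary([b"foo", b"bar", b"foo", b"baz", b"bar"])
print(indices)     # [0, 1, 0, 2, 1]
print(dictionary)  # [b'foo', b'bar', b'baz']
```

For comparison, pyarrow's `Array.dictionary_encode()` produces the same kind of indices/dictionary pair. It only runs after the dense array has been fully materialized, though, and that materialization is the memory cost this issue wants to avoid.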
[jira] [Assigned] (ARROW-3965) [Java] JDBC-to-Arrow Conversion: Configuration Object
[ https://issues.apache.org/jira/browse/ARROW-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-3965: --- Assignee: Michael Pigott > [Java] JDBC-to-Arrow Conversion: Configuration Object > - > > Key: ARROW-3965 > URL: https://issues.apache.org/jira/browse/ARROW-3965 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Michael Pigott >Assignee: Michael Pigott >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > There are various methods for constructing Arrow vectors, which vary in two > inputs: > * A calendar object > * A BaseAllocator > This creates a configuration class (JdbcToArrowConfig) to simplify > configuring the adapter and to make it easier to add new functionality later. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3965) [Java] JDBC-to-Arrow Conversion: Configuration Object
[ https://issues.apache.org/jira/browse/ARROW-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-3965. - Resolution: Fixed Fix Version/s: 0.13.0 Issue resolved by pull request 3133 [https://github.com/apache/arrow/pull/3133] > [Java] JDBC-to-Arrow Conversion: Configuration Object > - > > Key: ARROW-3965 > URL: https://issues.apache.org/jira/browse/ARROW-3965 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Michael Pigott >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > There are various methods for constructing Arrow vectors, which vary in two > inputs: > * A calendar object > * A BaseAllocator > This creates a configuration class (JdbcToArrowConfig) to simplify > configuring the adapter and to make it easier to add new functionality later. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4426) [C++] Vendor CMake's UseJava
Krisztian Szucs created ARROW-4426: -- Summary: [C++] Vendor CMake's UseJava Key: ARROW-4426 URL: https://issues.apache.org/jira/browse/ARROW-4426 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Krisztian Szucs In order to provide Gandiva JNI bindings for Ubuntu 16.04 and 18.04, we need to vendor UseJava, because GENERATE_NATIVE_HEADERS is only available since CMake 3.11 and Ubuntu 18.04 ships CMake 3.10. See https://github.com/apache/arrow/pull/3522 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4424) [Python] Manylinux CI builds failing
[ https://issues.apache.org/jira/browse/ARROW-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756052#comment-16756052 ] Uwe L. Korn commented on ARROW-4424: A new version of {{keras_preprocessing}} was released yesterday. This has an undeclared pandas dependency. I'll have a look. > [Python] Manylinux CI builds failing > > > Key: ARROW-4424 > URL: https://issues.apache.org/jira/browse/ARROW-4424 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, Python >Reporter: Micah Kornfield >Priority: Blocker > > Example error build: https://api.travis-ci.org/v3/job/486336662/log.txt > > {{+python -c 'import pyarrow; import tensorflow'}} > {{ Traceback (most recent call last):}} > {{ File "<string>", line 1, in <module>}} > {{ File > "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/tensorflow/__init__.py", > line 24, in <module>}} > {{ from tensorflow.python import pywrap_tensorflow # pylint: > disable=unused-import}} > {{ File > "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/tensorflow/python/__init__.py", > line 88, in <module>}} > {{ from tensorflow.python import keras}} > {{ File > "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/tensorflow/python/keras/__init__.py", > line 29, in <module>}} > {{ from tensorflow.python.keras import datasets}} > {{ File > "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/tensorflow/python/keras/datasets/__init__.py", > line 25, in <module>}} > {{ from tensorflow.python.keras.datasets import imdb}} > {{ File > "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/tensorflow/python/keras/datasets/imdb.py", > line 25, in <module>}} > {{ from tensorflow.python.keras.preprocessing.sequence import > _remove_long_seq}} > {{ File > "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/tensorflow/python/keras/preprocessing/__init__.py", > line 30, in <module>}} > {{ from tensorflow.python.keras.preprocessing import image}} > {{ File > "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/tensorflow/python/keras/preprocessing/image.py", > line 23, in <module>}} > {{ from keras_preprocessing import image}} > {{ File > "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/keras_preprocessing/image/__init__.py", > line 8, in <module>}} > {{ from .dataframe_iterator import DataFrameIterator}} > {{ File > "/home/travis/build/apache/arrow/pyarrow-test-3.6/lib/python3.6/site-packages/keras_preprocessing/image/dataframe_iterator.py", > line 11, in <module>}} > {{ from pandas.api.types import is_numeric_dtype}} > {{ ModuleNotFoundError: No module named 'pandas'}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
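A CI script could check up front that the modules a smoke test depends on are installed. The failure would then name the missing package directly, instead of surfacing deep inside a TensorFlow import traceback like the one above. This is a minimal sketch that assumes the check runs in the same Python environment as the `python -c 'import pyarrow; import tensorflow'` step. `missing_modules` is a hypothetical helper built on the standard library's `importlib.util.find_spec`.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment.

    find_spec() locates a module without importing it and returns None
    when no such top-level module is installed.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# pandas is listed explicitly because keras_preprocessing imports it without declaring it.
print(missing_modules(["pyarrow", "tensorflow", "pandas"]))
```

The check only covers the names it is given. It cannot discover an undeclared transitive dependency on its own, so pandas has to be added to the list by hand until the upstream package declares it.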
[jira] [Resolved] (ARROW-4414) [C++] Stop using cmake COMMAND_EXPAND_LISTS because it breaks package builds for older distros
[ https://issues.apache.org/jira/browse/ARROW-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved ARROW-4414. Resolution: Fixed Fix Version/s: 0.13.0 Issue resolved by pull request 3522 [https://github.com/apache/arrow/pull/3522] > [C++] Stop using cmake COMMAND_EXPAND_LISTS because it breaks package builds > for older distros > -- > > Key: ARROW-4414 > URL: https://issues.apache.org/jira/browse/ARROW-4414 > Project: Apache Arrow > Issue Type: Bug >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > The COMMAND_EXPAND_LISTS option of add_custom_command is too new for the CMake shipped with Ubuntu Xenial > and Debian stretch; it is only available since CMake 3.8: > https://cmake.org/cmake/help/v3.8/command/add_custom_command.html > We need to stop using it in cpp/src/gandiva/precompiled/CMakeLists.txt. > We should also pin cmake to version 3.5 in Travis builds (Xenial ships cmake > 3.5). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
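The version constraints in ARROW-4414 and ARROW-4426 can be checked mechanically. The sketch below encodes only the numbers stated in those two issues: COMMAND_EXPAND_LISTS needs CMake 3.8 and GENERATE_NATIVE_HEADERS needs 3.11, while Ubuntu 16.04 (Xenial) ships 3.5 and Ubuntu 18.04 ships 3.10. It reports which features each distro's CMake lacks. The table and function names are hypothetical and are not part of Arrow's build system.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int]

# Minimum CMake version per feature, as stated in ARROW-4414 / ARROW-4426.
FEATURE_MIN_CMAKE: Dict[str, Version] = {
    "add_custom_command(COMMAND_EXPAND_LISTS)": (3, 8),
    "add_jar(GENERATE_NATIVE_HEADERS)": (3, 11),
}

# Distro CMake versions mentioned in those issues.
DISTRO_CMAKE: Dict[str, str] = {
    "Ubuntu 16.04 (Xenial)": "3.5",
    "Ubuntu 18.04": "3.10",
}


def parse_version(text: str) -> Version:
    """Turn "3.10" or "3.10.2" into the comparable tuple (3, 10)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def missing_features(cmake: Version) -> List[str]:
    """Return the features that the given CMake version does not provide."""
    return [name for name, minimum in FEATURE_MIN_CMAKE.items() if cmake < minimum]


for distro, version in DISTRO_CMAKE.items():
    print(distro, missing_features(parse_version(version)))
```

This reports both features as missing on Xenial and only GENERATE_NATIVE_HEADERS on 18.04, which matches the two issues. The block compares `(major, minor)` tuples rather than raw strings on purpose: as strings, "3.10" sorts before "3.8".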
[jira] [Commented] (ARROW-3769) [C++] Support reading non-dictionary encoded binary Parquet columns directly as DictionaryArray
[ https://issues.apache.org/jira/browse/ARROW-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756031#comment-16756031 ] Hatem Helal commented on ARROW-3769: I've started looking into this, beginning with some unit tests to make sure I understand the inner workings. > [C++] Support reading non-dictionary encoded binary Parquet columns directly as DictionaryArray > --- > > Key: ARROW-3769 > URL: https://issues.apache.org/jira/browse/ARROW-3769 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Hatem Helal >Priority: Major > Labels: parquet > Fix For: 0.13.0 > > > If the goal is to hash this data into a categorical-type array anyway, then it would be better to offer the option to "push down" the hashing into the Parquet read hot path rather than first fully materializing a dense vector of {{ByteArray}} values, which could use a lot of memory after decompression.
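The "push down" idea is that each value is hashed into the dictionary as soon as it is decoded, so only the unique values plus one small integer index per row are retained, instead of a dense vector of every decoded value. A pure-Python sketch of that incremental hashing, assuming a hypothetical decoder that yields values one page at a time as {{bytes}} (nulls are not handled); it illustrates the technique, not the Parquet reader's actual code path.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def decoded_values(pages: Iterable[List[bytes]]) -> Iterator[bytes]:
    """Hypothetical stand-in for a decoder that yields one value at a time."""
    for page in pages:
        yield from page


def dictionary_encode(values: Iterable[bytes]) -> Tuple[List[int], List[bytes]]:
    """Hash values as they arrive; return (indices, dictionary of unique values)."""
    index_of: Dict[bytes, int] = {}
    dictionary: List[bytes] = []
    indices: List[int] = []
    for value in values:
        idx = index_of.get(value)
        if idx is None:
            # First occurrence: assign the next dictionary slot.
            idx = len(dictionary)
            index_of[value] = idx
            dictionary.append(value)
        indices.append(idx)
    return indices, dictionary
```

Because {{dictionary_encode}} consumes an iterator, the full column of decoded values never has to exist in memory at once; the (indices, dictionary) pair is the shape a DictionaryArray is built from.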
[jira] [Commented] (ARROW-4425) Add link to 'Contributing' page in the top-level Arrow README
[ https://issues.apache.org/jira/browse/ARROW-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756035#comment-16756035 ] Antoine Pitrou commented on ARROW-4425: --- I think we should actually put such documents in the Sphinx-generated docs. There's no reason to have separate wiki pages for that, IMHO. > Add link to 'Contributing' page in the top-level Arrow README > - > > Key: ARROW-4425 > URL: https://issues.apache.org/jira/browse/ARROW-4425 > Project: Apache Arrow > Issue Type: Task > Components: Documentation >Reporter: Tanya Schlusser >Priority: Major > > It would be nice to add a link to the ["Contributing to Apache Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] Confluence page directly from the main project [README|https://github.com/apache/arrow/blob/master/README.md] (in the already existing "Getting involved" section) because it's a bit hard to find right now. > "contributing" page: > [https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] > main project README: [https://github.com/apache/arrow/blob/master/README.md]
[jira] [Created] (ARROW-4423) [C++] Update version of vendored gtest to 1.8.1
Micah Kornfield created ARROW-4423: -- Summary: [C++] Update version of vendored gtest to 1.8.1 Key: ARROW-4423 URL: https://issues.apache.org/jira/browse/ARROW-4423 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Micah Kornfield conda-forge builds already use 1.8.1. This is a little tricky because library files get renamed on Windows with the incremental version bump (debug files become libgmockd.lib).
[jira] [Commented] (ARROW-4277) [C++] Add gmock to toolchain
[ https://issues.apache.org/jira/browse/ARROW-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755872#comment-16755872 ] Micah Kornfield commented on ARROW-4277: The current issue seems to be with linking on Trusty: https://travis-ci.org/apache/arrow/builds/486300854?utm_source=github_status_medium=notification > [C++] Add gmock to toolchain > > > Key: ARROW-4277 > URL: https://issues.apache.org/jira/browse/ARROW-4277 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Micah Kornfield >Assignee: Micah Kornfield >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Add gmock to the toolchain. > > It looks like before this can happen, a gmock feedstock has to be set up on conda-forge so that our CI setup can work.
[jira] [Comment Edited] (ARROW-4418) [Plasma] replace event loop with boost::asio for plasma store
[ https://issues.apache.org/jira/browse/ARROW-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755804#comment-16755804 ] Zhijun Fu edited comment on ARROW-4418 at 1/30/19 8:12 AM: --- In addition to the benefits Robert mentioned above, asio also provides opportunities for performance improvements by providing an io service, thread pools, etc. In our internal testing, which uses 10+ actors on a single machine, I found that 50% of the plasma store's CPU time is spent receiving messages from plasma clients over a UNIX domain socket. I'm thinking that one way to improve performance is as follows: * Use a pool of threads to receive messages from clients. To ensure correct behavior, we can bind a boost::strand to each client, so that all the messages from a given client arrive in order. As this part is CPU-consuming, using multiple threads should help. * Then post the messages to the main thread's io service, which calls ProcessMessages for each of them in order. * Finally, post the replies to a pool of threads, again using a boost::strand for each plasma client to ensure correct order. I think this would probably help in cases where multiple workers use the plasma store on the same machine, which should be very common. And it seems that implementing this would be hard without asio's functionality. > [Plasma] replace event loop with boost::asio for plasma store > - > > Key: ARROW-4418 > URL: https://issues.apache.org/jira/browse/ARROW-4418 > Project: Apache Arrow > Issue Type: Improvement > Components: Plasma (C++) >Reporter: Zhijun Fu >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Original text: > It would be nice to move the plasma store from the current event loop to boost::asio to modernize the code, and more importantly to benefit from the functionality provided by asio, which I think also provides opportunities for performance improvement.
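The proposed threading model hinges on one property: messages from a given client must stay in order even though several threads receive them. A minimal Python sketch of that ordering argument, assuming a hypothetical {{Strand}} class that loosely mimics asio strand semantics on a shared {{ThreadPoolExecutor}}, and a plain FIFO queue standing in for the main thread's io service. It is an illustration, not Plasma or Boost.Asio code.

```python
import queue
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class Strand:
    """Serializes callables on a shared pool: handlers posted to one Strand never
    overlap and run in posting order, while different Strands may run concurrently.
    Handlers are assumed not to raise (an exception would leave the strand stuck)."""

    def __init__(self, pool):
        self._pool = pool
        self._lock = threading.Lock()
        self._pending = deque()
        self._running = False

    def post(self, fn, *args):
        with self._lock:
            self._pending.append((fn, args))
            if self._running:
                return  # the active drain loop will pick this handler up
            self._running = True
        self._pool.submit(self._drain)

    def _drain(self):
        while True:
            with self._lock:
                if not self._pending:
                    self._running = False
                    return
                fn, args = self._pending.popleft()
            fn(*args)  # run outside the lock so post() is never blocked by a handler


def run_pipeline(messages_by_client, num_threads=4):
    """Stage 1: 'receive' on the pool, one strand per client.
    Stage 2: a single main thread processes results in FIFO order."""
    main_queue = queue.Queue()
    processed = []
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        strands = {client: Strand(pool) for client in messages_by_client}
        for client, messages in messages_by_client.items():
            for msg in messages:
                # Stand-in for reading and decoding one message from the client's socket.
                strands[client].post(lambda c, m: main_queue.put((c, m)), client, msg)
        total = sum(len(msgs) for msgs in messages_by_client.values())
        for _ in range(total):
            # Stand-in for ProcessMessages running on the main thread.
            processed.append(main_queue.get())
    return processed
```

Submitting the handlers directly to the pool without a strand would allow two of one client's messages to be enqueued out of order; the per-client strand rules that out while still letting different clients proceed in parallel. In CPython the GIL means this shows the ordering guarantee rather than the CPU speedup the comment is after, and the reply stage would reuse the same per-client strands.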