[jira] [Assigned] (ARROW-8310) [C++] Minio's exceptions not recognized by IsConnectError()

2020-04-02 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-8310:
-

Assignee: Antoine Pitrou

> [C++] Minio's exceptions not recognized by IsConnectError()
> ---
>
> Key: ARROW-8310
> URL: https://issues.apache.org/jira/browse/ARROW-8310
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.16.0
>Reporter: Ben Kietzman
>Assignee: Antoine Pitrou
>Priority: Minor
> Fix For: 1.0.0
>
>
> Minio emits an {{XMinioServerNotInitialized}} exception on failure to 
> connect, which is recognized by {{ConnectRetryStrategy}} and used to trigger 
> a retry instead of an error. This exception carries HTTP error code 503.
> However, this code does not round-trip through the AWS SDK, which maintains an 
> explicit [mapping from known exception names to error 
> codes|https://github.com/aws/aws-sdk-cpp/blob/d36c2b16c9c3caf81524ebfff1e70782b8e1a006/aws-cpp-sdk-core/source/client/CoreErrors.cpp#L37]
>  and will demote an unrecognized exception name [to 
> {{CoreErrors::UNKNOWN}}|https://github.com/aws/aws-sdk-cpp/blob/master/aws-cpp-sdk-core/source/client/AWSErrorMarshaller.cpp#L150].
> The end result is flakiness in the tests (and therefore CI), since 
> {{ConnectRetryStrategy}} never gets a chance to operate; see for example 
> https://github.com/apache/arrow/pull/6789/checks?check_run_id=552871444#step:6:1778
> Probably {{IsConnectError}} will need to examine the error string in the 
> event of {{CoreErrors::UNKNOWN}}.
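The suggested fallback can be sketched in Python (illustrative only; Arrow's actual {{IsConnectError}} is C++, and the names and constants below are hypothetical):

```python
# Hypothetical sketch of the proposed fix: when the SDK demotes an
# exception to UNKNOWN, fall back to matching the raw exception name.
UNKNOWN = "UNKNOWN"
NETWORK_CONNECTION = "NETWORK_CONNECTION"

# Exception names that should be treated as "not connected yet".
CONNECT_ERROR_NAMES = {"XMinioServerNotInitialized"}

def is_connect_error(error_type, exception_name):
    if error_type == NETWORK_CONNECTION:
        return True
    # The SDK maps unrecognized exception names to UNKNOWN, so inspect
    # the exception name string in that case.
    return error_type == UNKNOWN and exception_name in CONNECT_ERROR_NAMES
```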



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8314) [Python] Provide a method to select a subset of columns of a Table

2020-04-02 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8314:


 Summary: [Python] Provide a method to select a subset of columns 
of a Table
 Key: ARROW-8314
 URL: https://issues.apache.org/jira/browse/ARROW-8314
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Python
Reporter: Joris Van den Bossche


I looked through the open issues and in our API, but didn't directly find 
something about selecting a subset of columns of a table.

Assume you have a table like:

{code}
table = pa.table({'a': [1, 2], 'b': [.1, .2], 'c': ['a', 'b']})
{code}

You can select a single column with {{table.column('a')}} or {{table['a']}} to 
get a chunked array. You can add, append, remove and replace columns (with 
{{add_column}}, {{append_column}}, {{remove_column}}, {{set_column}}). 
But an easy way to get a subset of the columns (without manually removing 
the ones you don't want one by one) doesn't seem possible. 

I would propose something like:

{code}
table.select(['a', 'c'])
{code}





[jira] [Updated] (ARROW-8310) [C++] Minio's exceptions not recognized by IsConnectError()

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8310:
--
Labels: pull-request-available  (was: )

> [C++] Minio's exceptions not recognized by IsConnectError()
> ---
>
> Key: ARROW-8310
> URL: https://issues.apache.org/jira/browse/ARROW-8310
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.16.0
>Reporter: Ben Kietzman
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Minio emits an {{XMinioServerNotInitialized}} exception on failure to 
> connect, which is recognized by {{ConnectRetryStrategy}} and used to trigger 
> a retry instead of an error. This exception carries HTTP error code 503.
> However, this code does not round-trip through the AWS SDK, which maintains an 
> explicit [mapping from known exception names to error 
> codes|https://github.com/aws/aws-sdk-cpp/blob/d36c2b16c9c3caf81524ebfff1e70782b8e1a006/aws-cpp-sdk-core/source/client/CoreErrors.cpp#L37]
>  and will demote an unrecognized exception name [to 
> {{CoreErrors::UNKNOWN}}|https://github.com/aws/aws-sdk-cpp/blob/master/aws-cpp-sdk-core/source/client/AWSErrorMarshaller.cpp#L150].
> The end result is flakiness in the tests (and therefore CI), since 
> {{ConnectRetryStrategy}} never gets a chance to operate; see for example 
> https://github.com/apache/arrow/pull/6789/checks?check_run_id=552871444#step:6:1778
> Probably {{IsConnectError}} will need to examine the error string in the 
> event of {{CoreErrors::UNKNOWN}}.





[jira] [Updated] (ARROW-8315) [Python][Dataset] Don't rely on ordered dict keys in test_dataset.py

2020-04-02 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman updated ARROW-8315:

Fix Version/s: 0.17.0

> [Python][Dataset] Don't rely on ordered dict keys in test_dataset.py
> 
>
> Key: ARROW-8315
> URL: https://issues.apache.org/jira/browse/ARROW-8315
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.16.0
>Reporter: Ben Kietzman
>Priority: Minor
> Fix For: 0.17.0
>
>






[jira] [Updated] (ARROW-8315) [Python][Dataset] Don't rely on ordered dict keys in test_dataset.py

2020-04-02 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman updated ARROW-8315:

Affects Version/s: 0.16.0

> [Python][Dataset] Don't rely on ordered dict keys in test_dataset.py
> 
>
> Key: ARROW-8315
> URL: https://issues.apache.org/jira/browse/ARROW-8315
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.16.0
>Reporter: Ben Kietzman
>Priority: Minor
>






[jira] [Created] (ARROW-8315) [Python]

2020-04-02 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-8315:
---

 Summary: [Python]
 Key: ARROW-8315
 URL: https://issues.apache.org/jira/browse/ARROW-8315
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Ben Kietzman








[jira] [Updated] (ARROW-8315) [Python][Dataset] Don't rely on ordered dict keys in test_dataset.py

2020-04-02 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman updated ARROW-8315:

Summary: [Python][Dataset] Don't rely on ordered dict keys in 
test_dataset.py  (was: [Python])

> [Python][Dataset] Don't rely on ordered dict keys in test_dataset.py
> 
>
> Key: ARROW-8315
> URL: https://issues.apache.org/jira/browse/ARROW-8315
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Ben Kietzman
>Priority: Minor
>






[jira] [Updated] (ARROW-8315) [Python][Dataset] Don't rely on ordered dict keys in test_dataset.py

2020-04-02 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman updated ARROW-8315:

Description: 
Python 3.5 does not guarantee insertion order of dict keys, so we can't rely on 
it when constructing tables in test_dataset.py

https://github.com/apache/arrow/pull/6809/checks?check_run_id=554945477#step:6:2166



> [Python][Dataset] Don't rely on ordered dict keys in test_dataset.py
> 
>
> Key: ARROW-8315
> URL: https://issues.apache.org/jira/browse/ARROW-8315
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.16.0
>Reporter: Ben Kietzman
>Priority: Minor
>  Labels: dataset
> Fix For: 0.17.0
>
>
> Python 3.5 does not guarantee insertion order of dict keys, so we can't rely 
> on it when constructing tables in test_dataset.py
> https://github.com/apache/arrow/pull/6809/checks?check_run_id=554945477#step:6:2166





[jira] [Updated] (ARROW-8315) [Python][Dataset] Don't rely on ordered dict keys in test_dataset.py

2020-04-02 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman updated ARROW-8315:

Component/s: Python

> [Python][Dataset] Don't rely on ordered dict keys in test_dataset.py
> 
>
> Key: ARROW-8315
> URL: https://issues.apache.org/jira/browse/ARROW-8315
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.16.0
>Reporter: Ben Kietzman
>Priority: Minor
> Fix For: 0.17.0
>
>






[jira] [Updated] (ARROW-8315) [Python][Dataset] Don't rely on ordered dict keys in test_dataset.py

2020-04-02 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman updated ARROW-8315:

Labels: dataset  (was: )

> [Python][Dataset] Don't rely on ordered dict keys in test_dataset.py
> 
>
> Key: ARROW-8315
> URL: https://issues.apache.org/jira/browse/ARROW-8315
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.16.0
>Reporter: Ben Kietzman
>Priority: Minor
>  Labels: dataset
> Fix For: 0.17.0
>
>






[jira] [Resolved] (ARROW-8310) [C++] Minio's exceptions not recognized by IsConnectError()

2020-04-02 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman resolved ARROW-8310.
-
Fix Version/s: (was: 1.0.0)
   0.17.0
   Resolution: Fixed

Issue resolved by pull request 6809
[https://github.com/apache/arrow/pull/6809]

> [C++] Minio's exceptions not recognized by IsConnectError()
> ---
>
> Key: ARROW-8310
> URL: https://issues.apache.org/jira/browse/ARROW-8310
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.16.0
>Reporter: Ben Kietzman
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Minio emits an {{XMinioServerNotInitialized}} exception on failure to 
> connect, which is recognized by {{ConnectRetryStrategy}} and used to trigger 
> a retry instead of an error. This exception carries HTTP error code 503.
> However, this code does not round-trip through the AWS SDK, which maintains an 
> explicit [mapping from known exception names to error 
> codes|https://github.com/aws/aws-sdk-cpp/blob/d36c2b16c9c3caf81524ebfff1e70782b8e1a006/aws-cpp-sdk-core/source/client/CoreErrors.cpp#L37]
>  and will demote an unrecognized exception name [to 
> {{CoreErrors::UNKNOWN}}|https://github.com/aws/aws-sdk-cpp/blob/master/aws-cpp-sdk-core/source/client/AWSErrorMarshaller.cpp#L150].
> The end result is flakiness in the tests (and therefore CI), since 
> {{ConnectRetryStrategy}} never gets a chance to operate; see for example 
> https://github.com/apache/arrow/pull/6789/checks?check_run_id=552871444#step:6:1778
> Probably {{IsConnectError}} will need to examine the error string in the 
> event of {{CoreErrors::UNKNOWN}}.





[jira] [Updated] (ARROW-7534) [Java] Create a new java/contrib module

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-7534:
--
Labels: pull-request-available  (was: )

> [Java] Create a new java/contrib module
> ---
>
> Key: ARROW-7534
> URL: https://issues.apache.org/jira/browse/ARROW-7534
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Jacques Nadeau
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
>
> To better clarify the status of java sub-modules, create a contrib module and 
> move the following modules underneath it.
> * algorithm
> * adapter
> * plasma





[jira] [Commented] (ARROW-8323) [C++] Pin gRPC at v1.27 to avoid compilation error in its headers

2020-04-02 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074084#comment-17074084
 ] 

Wes McKinney commented on ARROW-8323:
-

I'm surprised that errors caused by third-party libraries can fail the build. Is 
there an equivalent of {{-isystem}} for MSVC?

> [C++] Pin gRPC at v1.27 to avoid compilation error in its headers
> -
>
> Key: ARROW-8323
> URL: https://issues.apache.org/jira/browse/ARROW-8323
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.16.0
>Reporter: Ben Kietzman
>Assignee: Ben Kietzman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> [gRPC 1.28|https://github.com/grpc/grpc/releases/tag/v1.28.0] includes a 
> change which introduces an implicit size_t->int conversion in proto_utils.h: 
> https://github.com/grpc/grpc/commit/2748755a4ff9ed940356e78c105f55f839fdf38b
> Conversion warnings are treated as errors for example here: 
> https://ci.appveyor.com/project/BenjaminKietzman/arrow/build/job/9cl0vqa8e495knn3#L1126
> So IIUC we need to pin gRPC to 1.27 for now.
> Upstream PR: https://github.com/grpc/grpc/pull/22557





[jira] [Updated] (ARROW-8245) [Python][Parquet] Skip hidden directories when reading partitioned parquet files

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8245:
--
Labels: parquet pull-request-available  (was: parquet)

> [Python][Parquet] Skip hidden directories when reading partitioned parquet 
> files
> 
>
> Key: ARROW-8245
> URL: https://issues.apache.org/jira/browse/ARROW-8245
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Caleb Overman
>Priority: Minor
>  Labels: parquet, pull-request-available
> Fix For: 0.17.0
>
>
> When writing a partitioned parquet dataset, Spark can create a temporary 
> hidden {{.spark-staging}} directory inside it. Because it is a directory and 
> not a file, it is not skipped when trying to read the dataset back. Pyarrow 
> currently only skips directories prefixed with {{_}}.
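The proposed behavior amounts to extending the skip rule from the {{_}} prefix to dot-prefixed names as well; a standalone sketch (this is not pyarrow's actual traversal code, and the helper name is hypothetical):

```python
def is_hidden(name):
    # Skip both conventions for private entries: "_" (already skipped
    # by pyarrow) and "." (e.g. Spark's ".spark-staging" directory).
    return name.startswith('_') or name.startswith('.')

entries = ['year=2020', '.spark-staging', '_common_metadata_dir', 'year=2021']
visible = [e for e in entries if not is_hidden(e)]
```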





[jira] [Commented] (ARROW-8244) [Python][Parquet] Add `write_to_dataset` option to populate the "file_path" metadata fields

2020-04-02 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074114#comment-17074114
 ] 

Wes McKinney commented on ARROW-8244:
-

As long as there's a well-documented way to generate the _metadata file 
containing all the row group metadata and file paths in a single structure, and 
then construct a dataset from the _metadata file (avoiding having to parse the 
metadata from all the constituent files -- which is time-consuming), that 
sounds good to me.

> [Python][Parquet] Add `write_to_dataset` option to populate the "file_path" 
> metadata fields
> ---
>
> Key: ARROW-8244
> URL: https://issues.apache.org/jira/browse/ARROW-8244
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Python
>Reporter: Rick Zamora
>Assignee: Joris Van den Bossche
>Priority: Minor
>  Labels: parquet, pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Prior to [dask#6023|https://github.com/dask/dask/pull/6023], Dask has been 
> using the `write_to_dataset` API to write partitioned parquet datasets.  This 
> PR is switching to a (hopefully temporary) custom solution, because that API 
> makes it difficult to populate the "file_path"  column-chunk metadata 
> fields that are returned within the optional `metadata_collector` kwarg.  
> Dask needs to set these fields correctly in order to generate a proper global 
> `"_metadata"` file.
> Possible solutions to this problem:
>  # Optionally populate the file-path fields within `write_to_dataset`
>  # Always populate the file-path fields within `write_to_dataset`
>  # Return the file paths for the data written within `write_to_dataset` (up 
> to the user to manually populate the file-path fields)





[jira] [Resolved] (ARROW-5473) [C++] Build failure on googletest_ep on Windows when using Ninja

2020-04-02 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5473.
-
Resolution: Fixed

Issue resolved by pull request 6816
[https://github.com/apache/arrow/pull/6816]

> [C++] Build failure on googletest_ep on Windows when using Ninja
> 
>
> Key: ARROW-5473
> URL: https://issues.apache.org/jira/browse/ARROW-5473
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Ben Kietzman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I consistently get this error when trying to use Ninja locally:
> {code}
> -- extracting...
>  
> src='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/release-1.8.1.tar.gz'
>  
> dst='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep'
> -- extracting... [tar xfz]
> -- extracting... [analysis]
> -- extracting... [rename]
> CMake Error at googletest_ep-stamp/extract-googletest_ep.cmake:51 (file):
>   file RENAME failed to rename
> 
> C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/ex-googletest_ep1234/googletest-release-1.8.1
>   to
> C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep
>   because: Directory not empty
> [179/623] Building CXX object 
> src\arrow\CMakeFiles\arrow_static.dir\array\builder_dict.cc.obj
> ninja: build stopped: subcommand failed.
> {code}
> I'm running within the cmder terminal emulator so it's conceivable there are 
> some path modifications that are causing issues.
> The CMake invocation is
> {code}
> cmake -G "Ninja" ^  -DCMAKE_BUILD_TYPE=Release ^  
> -DARROW_BUILD_TESTS=on ^  -DARROW_CXXFLAGS="/WX /MP" ^
>  -DARROW_FLIGHT=off -DARROW_PARQUET=on -DARROW_GANDIVA=ON 
> -DARROW_VERBOSE_THIRDPARTY_BUILD=on ..
> {code}





[jira] [Assigned] (ARROW-8266) [C++] Add backup mirrors for external project source downloads

2020-04-02 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman reassigned ARROW-8266:
---

Assignee: Ben Kietzman  (was: Neal Richardson)

> [C++] Add backup mirrors for external project source downloads
> --
>
> Key: ARROW-8266
> URL: https://issues.apache.org/jira/browse/ARROW-8266
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Ben Kietzman
>Priority: Minor
> Fix For: 0.17.0
>
>
> As we've seen a number of times, most recently with boost, our builds 
> sometimes fail because of a failure to download bundled dependencies. To 
> reduce this risk, we can add alternate URLs to the cmake externalprojects, so 
> that it will attempt to download from the second location if the first fails 
> (https://cmake.org/cmake/help/latest/module/ExternalProject.html). This 
> feature is available in cmake >=3.7. 





[jira] [Created] (ARROW-8324) [R] Add read/write_ipc_file separate from _feather

2020-04-02 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8324:
--

 Summary: [R] Add read/write_ipc_file separate from _feather
 Key: ARROW-8324
 URL: https://issues.apache.org/jira/browse/ARROW-8324
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson


See [https://github.com/apache/arrow/pull/6771#issuecomment-608133760]
{quote}Let's add read/write_ipc_file also? I'm wary of the "version" option in 
"write_feather" and the Feather version inference capability in "read_feather". 
It's potentially confusing and we may choose to add options to 
write_ipc_file/read_ipc_file that are more developer centric, having to do with 
particulars in the IPC format, that are not relevant or appropriate for the 
Feather APIs.

IMHO it's best for "Feather format" to remain an abstracted higher-level 
concept with its use of the "IPC file format" as an implementation detail, and 
segregated from the other things.
{quote}





[jira] [Assigned] (ARROW-8316) [CI] Set docker-compose to use docker-cli instead of docker-py for building images

2020-04-02 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-8316:
---

Assignee: Krisztian Szucs

> [CI] Set docker-compose to use docker-cli instead of docker-py for building 
> images
> --
>
> Key: ARROW-8316
> URL: https://issues.apache.org/jira/browse/ARROW-8316
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The images pushed from the master branch sometimes produced reusable layers 
> and sometimes did not, so the caching was working non-deterministically. 
> The underlying issue is https://github.com/docker/compose/issues/883





[jira] [Resolved] (ARROW-5501) [R] Reorganize read/write file/stream functions

2020-04-02 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-5501.

Resolution: Fixed

Issue resolved by pull request 6771
[https://github.com/apache/arrow/pull/6771]

> [R] Reorganize read/write file/stream functions
> ---
>
> Key: ARROW-5501
> URL: https://issues.apache.org/jira/browse/ARROW-5501
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> read_feather and write_feather exist, and there is also write_arrow. But no 
> read_arrow.
> Some questions (which go beyond just R): There's talk of a "feather 2.0", 
> i.e. "just" serializing the IPC format (which IIUC is what write_arrow does). 
> Are we going to continue to call the file format "Feather", and possibly 
> continue supporting the "feather 1.0" format as a subset/special case? Or 
> will "feather" mean this limited format and "arrow" be the name of the 
> full-featured file?
> In terms of this issue, should write_arrow be folded into write_feather and 
> there be an argument for indicating which version to write? Or should the 
> distinction be maintained, and we need to add a read_arrow() function?





[jira] [Updated] (ARROW-8325) [R][CI] Stop including boost in R windows bundle

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8325:
--
Labels: pull-request-available  (was: )

> [R][CI] Stop including boost in R windows bundle
> 
>
> Key: ARROW-8325
> URL: https://issues.apache.org/jira/browse/ARROW-8325
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Minor
>  Labels: pull-request-available
>






[jira] [Created] (ARROW-8325) [R][CI] Stop including boost in R windows bundle

2020-04-02 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8325:
--

 Summary: [R][CI] Stop including boost in R windows bundle
 Key: ARROW-8325
 URL: https://issues.apache.org/jira/browse/ARROW-8325
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson








[jira] [Resolved] (ARROW-8325) [R][CI] Stop including boost in R windows bundle

2020-04-02 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-8325.
-
Fix Version/s: 0.17.0
   Resolution: Fixed

Issue resolved by pull request 6822
[https://github.com/apache/arrow/pull/6822]

> [R][CI] Stop including boost in R windows bundle
> 
>
> Key: ARROW-8325
> URL: https://issues.apache.org/jira/browse/ARROW-8325
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-7939) [Python] crashes when reading parquet file compressed with snappy

2020-04-02 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074188#comment-17074188
 ] 

Wes McKinney commented on ARROW-7939:
-

I can't reproduce on Windows 10 64-bit with the conda-forge nightly, not with 
the 0.16.0 wheel. I recommend we close as a cannot reproduce

> [Python] crashes when reading parquet file compressed with snappy
> -
>
> Key: ARROW-7939
> URL: https://issues.apache.org/jira/browse/ARROW-7939
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.16.0
> Environment: Windows 7
> python 3.6.9
> pyarrow 0.16 from conda-forge
>Reporter: Marc Bernot
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.17.0
>
>
> When I installed pyarrow 0.16, some parquet files created with pyarrow 0.15.1 
> would make python crash. I drilled down to the simplest example I could find.
> It happens that some parquet files created with pyarrow 0.16 cannot be read 
> back either. The example below works fine with arrays_ok but python crashes 
> with arrays_nok (apparently as soon as there are at least three different 
> values).
> Besides, it works fine with 'none', 'gzip' and 'brotli' compression. The 
> problem seems to happen only with snappy.
> {code:python}
> import pyarrow.parquet as pq
> import pyarrow as pa
> arrays_ok = [[0,1]]
> arrays_ok = [[0,1,1]]
> arrays_nok = [[0,1,2]]
> table = pa.Table.from_arrays(arrays_nok,names=['a'])
> pq.write_table(table,'foo.parquet',compression='snappy')
> pq.read_table('foo.parquet')
> {code}





[jira] [Comment Edited] (ARROW-7939) [Python] crashes when reading parquet file compressed with snappy

2020-04-02 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074188#comment-17074188
 ] 

Wes McKinney edited comment on ARROW-7939 at 4/3/20, 1:31 AM:
--

I can't reproduce on Windows 10 64-bit with the conda-forge nightly, nor with 
the 0.16.0 wheel. I recommend we close as a cannot reproduce


was (Author: wesmckinn):
I can't reproduce on Windows 10 64-bit with the conda-forge nightly, not with 
the 0.16.0 wheel. I recommend we close as a cannot reproduce

> [Python] crashes when reading parquet file compressed with snappy
> -
>
> Key: ARROW-7939
> URL: https://issues.apache.org/jira/browse/ARROW-7939
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.16.0
> Environment: Windows 7
> python 3.6.9
> pyarrow 0.16 from conda-forge
>Reporter: Marc Bernot
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.17.0
>
>
> When I installed pyarrow 0.16, some parquet files created with pyarrow 0.15.1 
> would make python crash. I drilled down to the simplest example I could find.
> It happens that some parquet files created with pyarrow 0.16 cannot be read 
> back either. The example below works fine with arrays_ok but python crashes 
> with arrays_nok (apparently as soon as there are at least three different 
> values).
> Besides, it works fine with 'none', 'gzip' and 'brotli' compression. The 
> problem seems to happen only with snappy.
> {code:python}
> import pyarrow.parquet as pq
> import pyarrow as pa
> arrays_ok = [[0,1]]
> arrays_ok = [[0,1,1]]
> arrays_nok = [[0,1,2]]
> table = pa.Table.from_arrays(arrays_nok,names=['a'])
> pq.write_table(table,'foo.parquet',compression='snappy')
> pq.read_table('foo.parquet')
> {code}





[jira] [Closed] (ARROW-8317) [C++] grpc-cpp 1.28.0 from conda-forge causing Appveyor build to fail

2020-04-02 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-8317.
---
Fix Version/s: (was: 0.17.0)
   Resolution: Duplicate

dup of ARROW-8323

> [C++] grpc-cpp 1.28.0 from conda-forge causing Appveyor build to fail
> -
>
> Key: ARROW-8317
> URL: https://issues.apache.org/jira/browse/ARROW-8317
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> This started occurring in the last few hours, since the grpc-cpp 1.28.0 update 
> was just merged on conda-forge.
> https://ci.appveyor.com/project/wesm/arrow/build/job/8oe0n4epkxegr21x





[jira] [Commented] (ARROW-8323) [C++] Pin gRPC at v1.27 to avoid compilation error in its headers

2020-04-02 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074097#comment-17074097
 ] 

Wes McKinney commented on ARROW-8323:
-

Ah yes that's right, I remember having to do a bunch of those pragmas around 
Protobuf-related warnings when working on Flight

> [C++] Pin gRPC at v1.27 to avoid compilation error in its headers
> -
>
> Key: ARROW-8323
> URL: https://issues.apache.org/jira/browse/ARROW-8323
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.16.0
>Reporter: Ben Kietzman
>Assignee: Ben Kietzman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [gRPC 1.28|https://github.com/grpc/grpc/releases/tag/v1.28.0] includes a 
> change which introduces an implicit size_t->int conversion in proto_utils.h: 
> https://github.com/grpc/grpc/commit/2748755a4ff9ed940356e78c105f55f839fdf38b
> Conversion warnings are treated as errors for example here: 
> https://ci.appveyor.com/project/BenjaminKietzman/arrow/build/job/9cl0vqa8e495knn3#L1126
> So IIUC we need to pin gRPC to 1.27 for now.
> Upstream PR: https://github.com/grpc/grpc/pull/22557





[jira] [Resolved] (ARROW-8319) [CI] Install thrift compiler in the debian build

2020-04-02 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman resolved ARROW-8319.
-
Resolution: Fixed

Issue resolved by pull request 6818
[https://github.com/apache/arrow/pull/6818]

> [CI] Install thrift compiler in the debian build
> 
>
> Key: ARROW-8319
> URL: https://issues.apache.org/jira/browse/ARROW-8319
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> CMake is missing the thrift compiler after setting Thrift_SOURCE from AUTO to 
> empty; see build: 
> https://github.com/apache/arrow/runs/555631125?check_suite_focus=true#step:6:143





[jira] [Resolved] (ARROW-7809) [R] vignette does not run on Win 10 nor ubuntu

2020-04-02 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-7809.

Fix Version/s: 0.17.0
 Assignee: Neal Richardson
   Resolution: Fixed

I believe this has been resolved, either by ARROW-7641 or by various other 
dataset patches that will be released in 0.17. Please open a new issue if you 
have further trouble.

> [R] vignette does not run on Win 10 nor ubuntu
> --
>
> Key: ARROW-7809
> URL: https://issues.apache.org/jira/browse/ARROW-7809
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Zhuo Jia Dai
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 0.17.0
>
>
> On Win10
> {code:java}
> bucket <- "https://ursa-labs-taxi-data.s3.us-east-2.amazonaws.com"
>  dir.create("nyc-taxi")
>  for (year in 2018:2018) {
>  if (!dir.exists(glue::glue("nyc-taxi/{year}/"))) {
>  dir.create(glue::glue("nyc-taxi/{year}/"))
>  }
>  for (month in 1:12) {
>  if (month < 10) {
>  month <- paste0("0", month)
>  }
>  if (!dir.exists(glue::glue("nyc-taxi/{year}/{month}"))) {
>  dir.create(glue::glue("nyc-taxi/{year}/{month}"))
>  }
>  try(download.file(
>  paste(bucket, year, month, "data.parquet", sep = "/"),
>  file.path("nyc-taxi", year, month, "data.parquet")
>  ))
>  }
>  }
> aa = arrow::open_dataset("nyc-taxi", partitioning = c("year", "month"))
> {code}
> gives error
>  
> {code:java}
> Error in dataset___FSSFactory__Make3(filesystem, selector, format, 
> partitioning) : 
>   IOError: Could not open parquet input source 
> 'nyc-taxi/2018/01/data.parquet': Couldn't deserialize thrift: 
> TProtocolException: Invalid data
> In addition: Warning message:
> {code}
> On Ubuntu, running
> {code:java}
> library(dplyr)
> ds = arrow::open_dataset("nyc-taxi", partitioning = c("year", "month"))
> system.time(ds %>%
>   filter(total_amount > 100, year == 2015) %>%
>   select(tip_amount, total_amount, passenger_count) %>%
>   group_by(passenger_count) %>%
>   collect() %>%
>   summarize(
> tip_pct = median(100 * tip_amount / total_amount),
> n = n()
>   ) %>%
>   print())
> {code}
> gives the following segfault
> {code:java}
> *** caught segfault ***
> address (nil), cause 'memory not mapped'Traceback:
>  1: Table__to_dataframe(x, use_threads = option_use_threads())
>  2: as.data.frame.Table(scanner_builder$Finish()$ToTable())
>  3: as.data.frame(scanner_builder$Finish()$ToTable())
>  4: collect.arrow_dplyr_query(.)
>  5: collect(.)
>  6: function_list[[i]](value)
>  7: freduce(value, `_function_list`)
>  8: `_fseq`(`_lhs`)
>  9: eval(quote(`_fseq`(`_lhs`)), env, env)
> 10: eval(quote(`_fseq`(`_lhs`)), env, env)
> 11: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
> 12: ds %>% filter(total_amount > 100, year == 2015) %>% select(tip_amount,
>  total_amount, passenger_count) %>% group_by(passenger_count) %>% 
> collect() %>% summarize(tip_pct = median(100 * tip_amount/total_amount), 
> n = n()) %>% print()
> 13: system.time(ds %>% filter(total_amount > 100, year == 2015) %>% 
> select(tip_amount, total_amount, passenger_count) %>% 
> group_by(passenger_count) %>% collect() %>% summarize(tip_pct = 
> median(100 * tip_amount/total_amount), n = n()) %>% print())
> {code}
>  





[jira] [Resolved] (ARROW-8216) [R][C++][Dataset] Filtering returns all-missing rows where the filtering column is missing

2020-04-02 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-8216.

Resolution: Fixed

Issue resolved by pull request 6732
[https://github.com/apache/arrow/pull/6732]

> [R][C++][Dataset] Filtering returns all-missing rows where the filtering 
> column is missing
> --
>
> Key: ARROW-8216
> URL: https://issues.apache.org/jira/browse/ARROW-8216
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.16.0
> Environment: R 3.6.3, Windows 10
>Reporter: Sam Albers
>Assignee: Ben Kietzman
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
>  
> I have just noticed some slightly odd behaviour with the filter method for 
> Dataset. 
>  
> {code:java}
> library(arrow)
> library(dplyr)
> packageVersion("arrow")
> #> [1] '0.16.0.20200323'
> ## Make sample parquet
> starwars$hair_color[starwars$hair_color == "brown"] <- ""
> dir <- tempdir()
> fpath <- file.path(dir, "data.parquet")
> write_parquet(starwars, fpath)
> ## df in memory
> df_mem <- starwars %>%
>  filter(hair_color == "")
> ## reading from the parquet
> df_parquet <- read_parquet(fpath) %>%
>  filter(hair_color == "")
> ## using open_dataset
> df_dataset <- open_dataset(dir) %>%
>  filter(hair_color == "") %>%
>  collect()
> identical(df_mem, df_parquet)
> #> [1] TRUE
> identical(df_mem, df_dataset)
> #> [1] FALSE
> {code}
>  
>  
> I'm pretty sure all these should return the same data.frame. Am I missing 
> something?
>  





[jira] [Assigned] (ARROW-5473) [C++] Build failure on googletest_ep on Windows when using Ninja

2020-04-02 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman reassigned ARROW-5473:
---

Assignee: Ben Kietzman

> [C++] Build failure on googletest_ep on Windows when using Ninja
> 
>
> Key: ARROW-5473
> URL: https://issues.apache.org/jira/browse/ARROW-5473
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Ben Kietzman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I consistently get this error when trying to use Ninja locally:
> {code}
> -- extracting...
>  
> src='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/release-1.8.1.tar.gz'
>  
> dst='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep'
> -- extracting... [tar xfz]
> -- extracting... [analysis]
> -- extracting... [rename]
> CMake Error at googletest_ep-stamp/extract-googletest_ep.cmake:51 (file):
>   file RENAME failed to rename
> 
> C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/ex-googletest_ep1234/googletest-release-1.8.1
>   to
> C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep
>   because: Directory not empty
> [179/623] Building CXX object 
> src\arrow\CMakeFiles\arrow_static.dir\array\builder_dict.cc.obj
> ninja: build stopped: subcommand failed.
> {code}
> I'm running within the cmder terminal emulator, so it's conceivable there are 
> some path modifications that are causing issues.
> The CMake invocation is
> {code}
> cmake -G "Ninja" ^  -DCMAKE_BUILD_TYPE=Release ^  
> -DARROW_BUILD_TESTS=on ^  -DARROW_CXXFLAGS="/WX /MP" ^
>  -DARROW_FLIGHT=off -DARROW_PARQUET=on -DARROW_GANDIVA=ON 
> -DARROW_VERBOSE_THIRDPARTY_BUILD=on ..
> {code}





[jira] [Closed] (ARROW-4286) [C++/R] Namespace vendored Boost

2020-04-02 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson closed ARROW-4286.
--
Fix Version/s: 0.17.0
 Assignee: Neal Richardson
   Resolution: Won't Fix

The R package no longer depends on boost

> [C++/R] Namespace vendored Boost
> 
>
> Key: ARROW-4286
> URL: https://issues.apache.org/jira/browse/ARROW-4286
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Packaging, R
>Reporter: Uwe Korn
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 0.17.0
>
>
> For R, we vendor Boost and thus also include the symbols privately in our 
> modules. While they are private, some things like virtual destructors can 
> still interfere with other packages that vendor Boost. We should also 
> namespace the vendored Boost as we do in the manylinux1 packaging: 
> https://github.com/apache/arrow/blob/0f8bd747468dd28c909ef823bed77d8082a5b373/python/manylinux1/scripts/build_boost.sh#L28





[jira] [Commented] (ARROW-8323) [C++] Pin gRPC at v1.27 to avoid compilation error in its headers

2020-04-02 Thread Renat Valiullin (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074092#comment-17074092
 ] 

Renat Valiullin commented on ARROW-8323:


#pragma warning(disable:4267)

?

> [C++] Pin gRPC at v1.27 to avoid compilation error in its headers
> -
>
> Key: ARROW-8323
> URL: https://issues.apache.org/jira/browse/ARROW-8323
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.16.0
>Reporter: Ben Kietzman
>Assignee: Ben Kietzman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> [gRPC 1.28|https://github.com/grpc/grpc/releases/tag/v1.28.0] includes a 
> change which introduces an implicit size_t->int conversion in proto_utils.h: 
> https://github.com/grpc/grpc/commit/2748755a4ff9ed940356e78c105f55f839fdf38b
> Conversion warnings are treated as errors for example here: 
> https://ci.appveyor.com/project/BenjaminKietzman/arrow/build/job/9cl0vqa8e495knn3#L1126
> So IIUC we need to pin gRPC to 1.27 for now.
> Upstream PR: https://github.com/grpc/grpc/pull/22557





[jira] [Created] (ARROW-8312) improve IN expression support

2020-04-02 Thread Yuan Zhou (Jira)
Yuan Zhou created ARROW-8312:


 Summary: improve IN expression support
 Key: ARROW-8312
 URL: https://issues.apache.org/jira/browse/ARROW-8312
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Gandiva, Java
Reporter: Yuan Zhou
Assignee: Yuan Zhou


Gandiva's C++ IN API[1] accepts a TreeNode as its parameter, which allows an IN 
expression to operate on the output of another function. However, the Java 
API[2] only accepts a Field as its parameter, which limits the API's usage. 

[1] 
https://github.com/apache/arrow/blob/master/cpp/src/gandiva/tree_expr_builder.h#L94-L125
[2] 
https://github.com/apache/arrow/blob/master/java/gandiva/src/main/java/org/apache/arrow/gandiva/expression/InNode.java#L50-L63





[jira] [Resolved] (ARROW-8239) [Java] fix param checks in splitAndTransfer method

2020-04-02 Thread Prudhvi Porandla (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prudhvi Porandla resolved ARROW-8239.
-
Resolution: Fixed

> [Java] fix param checks in splitAndTransfer method
> --
>
> Key: ARROW-8239
> URL: https://issues.apache.org/jira/browse/ARROW-8239
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Prudhvi Porandla
>Assignee: Prudhvi Porandla
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-6479) [C++] inline errors from external projects' build logs

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6479:
--
Labels: pull-request-available  (was: )

> [C++] inline errors from external projects' build logs
> --
>
> Key: ARROW-6479
> URL: https://issues.apache.org/jira/browse/ARROW-6479
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Ben Kietzman
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Currently when an external project build fails, we get a very uninformative 
> message:
> {code}
> [88/543] Performing build step for 'flatbuffers_ep'
> FAILED: flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build 
> flatbuffers_ep-prefix/src/flatbuffers_ep-install/bin/flatc 
> flatbuffers_ep-prefix/src/flatbuffers_ep-install/lib/libflatbuffers.a 
> cd /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-build && 
> /usr/bin/cmake -P 
> /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-DEBUG.cmake
>  && /usr/bin/cmake -E touch 
> /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build
> CMake Error at 
> /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-DEBUG.cmake:16
>  (message):
>   Command failed: 1
>'/usr/bin/cmake' '--build' '.'
>   See also
> 
> /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-*.log
> {code}
> It would be far more useful if the error were caught and the relevant section 
> (or even the entirety) of 
> {{/build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-*.log}}
>  were output instead. This is doubly the case on CI, where accessing those 
> logs is non-trivial
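The requested behavior can be sketched as a small helper (a sketch only, not the eventual CMake-side implementation; the function name, the `*-build-*.log` glob pattern, and the line count are illustrative, based on the stamp-directory layout shown in the message above):

```python
import glob
import os

def tail_external_build_log(stamp_dir, pattern="*-build-*.log", n=20):
    """Return the last *n* lines of the newest external-project build log
    under *stamp_dir*, or None if no log exists, so a build failure can
    inline the relevant output instead of only pointing at the file."""
    logs = glob.glob(os.path.join(stamp_dir, pattern))
    if not logs:
        return None
    # Prefer the most recently written log (e.g. the -err log of the
    # step that just failed).
    newest = max(logs, key=os.path.getmtime)
    with open(newest) as f:
        lines = f.read().splitlines()
    return "\n".join(lines[-n:])
```

On CI, the returned tail could simply be appended to the fatal-error message, which would make logs like the flatbuffers_ep failure above self-explanatory.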





[jira] [Assigned] (ARROW-8315) [Python][Dataset] Don't rely on ordered dict keys in test_dataset.py

2020-04-02 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-8315:
--

Assignee: Krisztian Szucs

> [Python][Dataset] Don't rely on ordered dict keys in test_dataset.py
> 
>
> Key: ARROW-8315
> URL: https://issues.apache.org/jira/browse/ARROW-8315
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.16.0
>Reporter: Ben Kietzman
>Assignee: Krisztian Szucs
>Priority: Minor
>  Labels: dataset
> Fix For: 0.17.0
>
>
> Python 3.5 does not guarantee insertion order of dict keys, so we can't rely 
> on it when constructing tables in test_dataset.py
> https://github.com/apache/arrow/pull/6809/checks?check_run_id=554945477#step:6:2166
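The shape of the fix can be sketched as follows (illustrative only, not the actual test_dataset.py code; the column names here are made up): build the table's columns from an explicitly ordered container rather than a plain dict literal, so the column order is deterministic on Python 3.5 too.

```python
from collections import OrderedDict

# On Python 3.5 a plain dict literal does not guarantee key order, so any
# table construction that iterates dict.keys() may see columns in an
# arbitrary order. An explicitly ordered container avoids that.
columns = OrderedDict([
    ("i64", [0, 1, 2]),
    ("f64", [0.0, 0.5, 1.0]),
    ("str", ["a", "b", "c"]),
])

# Key order is now guaranteed to match insertion order on every
# supported Python version, so schema comparisons stay stable.
names = list(columns.keys())
```

Alternatively, passing a list of arrays together with an explicit list of names (rather than a mapping) sidesteps dict ordering entirely.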





[jira] [Resolved] (ARROW-7904) [C++] Decide about Field/Schema metadata printing parameters and how much to show by default

2020-04-02 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-7904.
-
Resolution: Fixed

Issue resolved by pull request 6577
[https://github.com/apache/arrow/pull/6577]

> [C++] Decide about Field/Schema metadata printing parameters and how much to 
> show by default
> 
>
> Key: ARROW-7904
> URL: https://issues.apache.org/jira/browse/ARROW-7904
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> See discussion in https://github.com/apache/arrow/pull/6472 for follow up 
> discussions to ARROW-7063





[jira] [Updated] (ARROW-8023) [Website] Write a blog post about the C data interface

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8023:
--
Labels: pull-request-available  (was: )

> [Website] Write a blog post about the C data interface
> --
>
> Key: ARROW-8023
> URL: https://issues.apache.org/jira/browse/ARROW-8023
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Website
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Resolved] (ARROW-8276) [C++][Dataset] Scanning a Fragment does not take into account the partition columns

2020-04-02 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman resolved ARROW-8276.
-
Resolution: Fixed

Issue resolved by pull request 6765
[https://github.com/apache/arrow/pull/6765]

> [C++][Dataset] Scanning a Fragment does not take into account the partition 
> columns
> ---
>
> Key: ARROW-8276
> URL: https://issues.apache.org/jira/browse/ARROW-8276
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, C++ - Dataset
>Reporter: Joris Van den Bossche
>Assignee: Ben Kietzman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Follow-up on ARROW-8061, the {{to_table}} method doesn't work for fragments 
> created from a partitioned dataset.
> (will add a reproducer later)
> cc [~bkietz]





[jira] [Assigned] (ARROW-6479) [C++] inline errors from external projects' build logs

2020-04-02 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman reassigned ARROW-6479:
---

Assignee: Ben Kietzman

> [C++] inline errors from external projects' build logs
> --
>
> Key: ARROW-6479
> URL: https://issues.apache.org/jira/browse/ARROW-6479
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Ben Kietzman
>Assignee: Ben Kietzman
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Currently when an external project build fails, we get a very uninformative 
> message:
> {code}
> [88/543] Performing build step for 'flatbuffers_ep'
> FAILED: flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build 
> flatbuffers_ep-prefix/src/flatbuffers_ep-install/bin/flatc 
> flatbuffers_ep-prefix/src/flatbuffers_ep-install/lib/libflatbuffers.a 
> cd /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-build && 
> /usr/bin/cmake -P 
> /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-DEBUG.cmake
>  && /usr/bin/cmake -E touch 
> /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build
> CMake Error at 
> /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-DEBUG.cmake:16
>  (message):
>   Command failed: 1
>'/usr/bin/cmake' '--build' '.'
>   See also
> 
> /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-*.log
> {code}
> It would be far more useful if the error were caught and the relevant section 
> (or even the entirety) of 
> {{/build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-*.log}}
>  were output instead. This is doubly the case on CI, where accessing those 
> logs is non-trivial





[jira] [Updated] (ARROW-8312) improve IN expression support

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8312:
--
Labels: pull-request-available  (was: )

> improve IN expression support
> -
>
> Key: ARROW-8312
> URL: https://issues.apache.org/jira/browse/ARROW-8312
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Gandiva, Java
>Reporter: Yuan Zhou
>Assignee: Yuan Zhou
>Priority: Major
>  Labels: pull-request-available
>
> Gandiva's C++ IN API[1] accepts a TreeNode as its parameter, which allows an 
> IN expression to operate on the output of another function. However, the Java 
> API[2] only accepts a Field as its parameter, which limits the API's usage. 
> [1] 
> https://github.com/apache/arrow/blob/master/cpp/src/gandiva/tree_expr_builder.h#L94-L125
> [2] 
> https://github.com/apache/arrow/blob/master/java/gandiva/src/main/java/org/apache/arrow/gandiva/expression/InNode.java#L50-L63





[jira] [Created] (ARROW-8313) [Gandiva][UDF] Solutions to register new UDFs dynamically without checking it into arrow repo.

2020-04-02 Thread ZMZ91 (Jira)
ZMZ91 created ARROW-8313:


 Summary: [Gandiva][UDF] Solutions to register new UDFs dynamically 
without checking it into arrow repo.
 Key: ARROW-8313
 URL: https://issues.apache.org/jira/browse/ARROW-8313
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: ZMZ91


Hi there,

Recently I have been studying Gandiva and trying to add some UDFs. I noted that 
it's necessary to check the UDF implementation into the arrow repo, register the 
UDF, and then build it into the precompiled_bitcode lib, right? I'm just 
wondering whether it is possible to register new UDFs dynamically. Say I have 
the UDF implementation code locally, not yet built into the gandiva lib; am I 
able to call some function, or use another solution officially provided by 
gandiva, to register and use it? Thanks in advance.





[jira] [Resolved] (ARROW-5585) [Go] rename arrow.TypeEquals into arrow.TypeEqual

2020-04-02 Thread Sebastien Binet (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastien Binet resolved ARROW-5585.

Fix Version/s: 0.17.0
   Resolution: Fixed

Issue resolved by pull request 6746
[https://github.com/apache/arrow/pull/6746]

> [Go] rename arrow.TypeEquals into arrow.TypeEqual
> -
>
> Key: ARROW-5585
> URL: https://issues.apache.org/jira/browse/ARROW-5585
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Reporter: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This is to follow Go's stdlib conventions.





[jira] [Commented] (ARROW-8304) [Flight][Python] Flight client with TLS root certificate is reporting error on do_get()

2020-04-02 Thread Ravindra Wagh (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073497#comment-17073497
 ] 

Ravindra Wagh commented on ARROW-8304:
--

Thanks [~lidavidm], I have fixed this issue by assigning the root certificate to 
the newly created client, and created PR: 
[https://github.com/apache/arrow/pull/6805].
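The shape of the fix can be sketched as follows (a simplification, not the actual patch; here *connect* stands in for a Flight connect function such as pyarrow.flight.connect and is injected so the pattern can be shown without a live server, and all names are illustrative): when get_flight contacts the endpoint returned in the FlightInfo, it must re-pass the same TLS root certificate it used for the original connection, otherwise the handshake fails as in the quoted report.

```python
def get_flight_reader(connect, endpoint, tls_root_certs):
    # The key point of the fix: create the per-endpoint client with the
    # same TLS root certificate as the original client, instead of
    # connecting with default (system) roots, which cannot verify the
    # test CA and triggers "certificate verify failed".
    client = connect(endpoint.locations[0], tls_root_certs=tls_root_certs)
    return client.do_get(endpoint.ticket)
```
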

> [Flight][Python] Flight client with TLS root certificate is reporting error 
> on do_get()
> ---
>
> Key: ARROW-8304
> URL: https://issues.apache.org/jira/browse/ARROW-8304
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Python
>Affects Versions: 0.16.0
>Reporter: Ravindra Wagh
>Assignee: Ravindra Wagh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I have started a flight's local python server with TLS support using the 
> testing certificates present in repo:
> {code:java}
> python3 /python/examples/flight/server.py  --host localhost --tls 
> /testing/data/flight/cert0.pem 
> /testing/data/flight/cert0.key{code}
> This server is started successfully.
> Now I started testing the python client with TLS support.
> 1. Client pushing a csv file to the flightendpoint server: 
> {code:java}
> python3 /python/examples/flight/client.py put --tls --tls-roots 
> /testing/data/flight/root-ca.pem localhost:5005 
> /sharedFolder/dataset/iris.csv{code}
> File iris.csv is pushed successfully.
> 2. List the flights available on the server
> {code:java}
> python3 /python/examples/flight/client.py list --tls --tls-roots 
> /testing/data/flight/root-ca.pem localhost:5005{code}
>  It is listing the flight which is pushed in above step 1.
> 3. Get/Retrieve the specific flight(eg. /sharedFolder/dataset/iris.csv) from 
> the server
> {code:java}
> python3 /python/examples/flight/client.py get --tls --tls-roots 
> /testing/data/flight/root-ca.pem -p /sharedFolder/dataset/iris.csv 
> localhost:5005{code}
>  It is failing with following errors:
> {quote}Ticket: 
> 
> {color:#ff}E0401 06:43:30.164324553    1055 
> ssl_transport_security.cc:1238] Handshake failed with fatal error 
> SSL_ERROR_SSL: error:14090086:SSL 
> routines:ssl3_get_server_certificate:certificate verify failed.{color}
> Traceback (most recent call last):
>   File "/python/examples/flight/client.py", line 178, in <module>
>     main()
>   File "/python/examples/flight/client.py", line 174, in main
>     commands[args.action](args, client)
>   File "/python/examples/flight/client.py", line 98, in get_flight
>     reader = get_client.do_get(endpoint.ticket)
>   File "pyarrow/_flight.pyx", line 1144, in 
> pyarrow._flight.FlightClient.do_get
>   File "pyarrow/_flight.pyx", line 73, in pyarrow._flight.check_flight_status
> pyarrow._flight.FlightUnavailableError: gRPC returned unavailable error, with 
> message: Connect Failed
> {quote}
> Python client.py is working for functions like _list_flights(),  do_action(), 
> push_data()_ but failing on 
> _get_flight()_ function for code line.
> {code:java}
> reader = get_client.do_get(endpoint.ticket) {code}





[jira] [Assigned] (ARROW-8082) [Java][Plasma] Add JNI list() interface

2020-04-02 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-8082:
---

Assignee: KunshangJi

> [Java][Plasma] Add JNI list() interface
> ---
>
> Key: ARROW-8082
> URL: https://issues.apache.org/jira/browse/ARROW-8082
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma, Java
>Reporter: KunshangJi
>Assignee: KunshangJi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Currently the Plasma list() method is not implemented in the Java code. 
> Implement the PlasmaClientJNI.list() interface and add some unit tests.





[jira] [Resolved] (ARROW-8082) [Java][Plasma] Add JNI list() interface

2020-04-02 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-8082.
-
Fix Version/s: (was: 1.0.0)
   0.17.0
   Resolution: Fixed

Issue resolved by pull request 6588
[https://github.com/apache/arrow/pull/6588]

> [Java][Plasma] Add JNI list() interface
> ---
>
> Key: ARROW-8082
> URL: https://issues.apache.org/jira/browse/ARROW-8082
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma, Java
>Reporter: KunshangJi
>Assignee: KunshangJi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Currently the Plasma list() method is not implemented in the Java code. 
> Implement the PlasmaClientJNI.list() interface and add some unit tests.





[jira] [Commented] (ARROW-7939) [Python] crashes when reading parquet file compressed with snappy

2020-04-02 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073860#comment-17073860
 ] 

Antoine Pitrou commented on ARROW-7939:
---

I've also checked that the nightly builds work fine.

[~marcbernot] Can you try to install a nightly build?
{code:java}
conda update -c arrow-nightlies pyarrow

{code}

> [Python] crashes when reading parquet file compressed with snappy
> -
>
> Key: ARROW-7939
> URL: https://issues.apache.org/jira/browse/ARROW-7939
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.16.0
> Environment: Windows 7
> python 3.6.9
> pyarrow 0.16 from conda-forge
>Reporter: Marc Bernot
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.17.0
>
>
> When I installed pyarrow 0.16, some parquet files created with pyarrow 0.15.1 
> would make python crash. I drilled down to the simplest example I could find.
> It happens that some parquet files created with pyarrow 0.16 cannot be read 
> back either. The example below works fine with arrays_ok, but python crashes 
> with arrays_nok (apparently as soon as there are at least three different 
> values).
> Besides, it works fine with 'none', 'gzip' and 'brotli' compression. The 
> problem seems to happen only with snappy.
> {code:python}
> import pyarrow.parquet as pq
> import pyarrow as pa
> arrays_ok = [[0,1]]
> arrays_ok = [[0,1,1]]
> arrays_nok = [[0,1,2]]
> table = pa.Table.from_arrays(arrays_nok,names=['a'])
> pq.write_table(table,'foo.parquet',compression='snappy')
> pq.read_table('foo.parquet')
> {code}





[jira] [Updated] (ARROW-8315) [Python][Dataset] Don't rely on ordered dict keys in test_dataset.py

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8315:
--
Labels: dataset pull-request-available  (was: dataset)

> [Python][Dataset] Don't rely on ordered dict keys in test_dataset.py
> 
>
> Key: ARROW-8315
> URL: https://issues.apache.org/jira/browse/ARROW-8315
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.16.0
>Reporter: Ben Kietzman
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: dataset, pull-request-available
> Fix For: 0.17.0
>
>
> Python 3.5 does not guarantee insertion order of dict keys, so we can't rely 
> on it when constructing tables in test_dataset.py
> https://github.com/apache/arrow/pull/6809/checks?check_run_id=554945477#step:6:2166





[jira] [Resolved] (ARROW-8079) [Python] Implement a wrapper for KeyValueMetadata, duck-typing dict where relevant

2020-04-02 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-8079.
---
Resolution: Fixed

Issue resolved by pull request 6793
[https://github.com/apache/arrow/pull/6793]

> [Python] Implement a wrapper for KeyValueMetadata, duck-typing dict where 
> relevant
> --
>
> Key: ARROW-8079
> URL: https://issues.apache.org/jira/browse/ARROW-8079
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Per mailing list discussion, it may be better not to always return the 
> metadata as a dict and instead wrap the KeyValueMetadata methods. We can, of 
> course, make {{__getitem__}} look up a key in it
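A rough sketch of what such a duck-typed wrapper could look like (illustrative only, not the actual pyarrow implementation; the class and method names are made up): dict-style lookup and membership, while preserving the underlying ordered, possibly duplicated key/value pairs instead of collapsing them eagerly into a dict.

```python
class KeyValueMetadataView:
    """Sketch of a dict-duck-typed view over ordered key/value pairs."""

    def __init__(self, pairs):
        # Keep the raw pairs; duplicate keys are legal in Arrow
        # key/value metadata and must not be silently dropped.
        self._pairs = list(pairs)

    def __getitem__(self, key):
        # Dict-style lookup: return the first value stored under *key*.
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)

    def __contains__(self, key):
        return any(k == key for k, _ in self._pairs)

    def keys(self):
        # Unlike a dict, duplicates are preserved in order.
        return [k for k, _ in self._pairs]

    def to_dict(self):
        # Explicit conversion for callers that really want a dict;
        # later duplicates win, mirroring dict() semantics.
        return dict(self._pairs)
```
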





[jira] [Updated] (ARROW-8162) [Format][Python] Add serialization for CSF sparse tensors

2020-04-02 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-8162:
--
Fix Version/s: (was: 1.0.0)
   0.17.0

> [Format][Python] Add serialization for CSF sparse tensors
> -
>
> Key: ARROW-8162
> URL: https://issues.apache.org/jira/browse/ARROW-8162
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format, Python
>Reporter: Rok Mihevc
>Assignee: Rok Mihevc
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Once [ARROW-7428|https://issues.apache.org/jira/browse/ARROW-7428] is 
> complete, serialization for CSF sparse tensors should be enabled in Python too.





[jira] [Commented] (ARROW-8307) [Python] Expose use_memory_map option in pyarrow.feather APIs

2020-04-02 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073862#comment-17073862
 ] 

Antoine Pitrou commented on ARROW-8307:
---

Why not make it on by default for Feather? Are we worried that people 
may overwrite the file while it is in use?

> [Python] Expose use_memory_map option in pyarrow.feather APIs
> -
>
> Key: ARROW-8307
> URL: https://issues.apache.org/jira/browse/ARROW-8307
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.17.0
>
>






[jira] [Resolved] (ARROW-8167) [CI] Add support for skipping builds with skip pattern in pull request title

2020-04-02 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-8167.
---
Fix Version/s: 0.17.0
   Resolution: Fixed

Issue resolved by pull request 6671
[https://github.com/apache/arrow/pull/6671]

> [CI] Add support for skipping builds with skip pattern in pull request title
> 
>
> Key: ARROW-8167
> URL: https://issues.apache.org/jira/browse/ARROW-8167
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> GitHub Actions doesn't support skipping builds marked as [skip ci] by default.
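A minimal sketch of the kind of title matching such a workflow could perform (the pattern set here is an assumption for illustration, not Arrow's actual configuration):

```python
import re

# Hypothetical skip patterns; real CI configs often accept several spellings.
SKIP_PATTERN = re.compile(r"\[(skip ci|ci skip)\]", re.IGNORECASE)

def should_skip(pr_title: str) -> bool:
    """Return True if the pull request title asks for CI to be skipped."""
    return SKIP_PATTERN.search(pr_title) is not None

print(should_skip("ARROW-8167: [CI] Add skip support [skip ci]"))  # True
print(should_skip("ARROW-8167: [CI] Add skip support"))            # False
```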





[jira] [Updated] (ARROW-8316) [CI] Set docker-compose to use docker-cli instead of docker-py for building images

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8316:
--
Labels: pull-request-available  (was: )

> [CI] Set docker-compose to use docker-cli instead of docker-py for building 
> images
> --
>
> Key: ARROW-8316
> URL: https://issues.apache.org/jira/browse/ARROW-8316
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> The images pushed from the master branch sometimes produced reusable 
> layers and sometimes did not, so the caching worked non-deterministically. 
> The underlying issue is https://github.com/docker/compose/issues/883





[jira] [Commented] (ARROW-8315) [Python][Dataset] Don't rely on ordered dict keys in test_dataset.py

2020-04-02 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073905#comment-17073905
 ] 

Wes McKinney commented on ARROW-8315:
-

I recommend not using dict-based table construction at all in unit tests. Use 
the {{table($LIST, names=$LIST)}} method

> [Python][Dataset] Don't rely on ordered dict keys in test_dataset.py
> 
>
> Key: ARROW-8315
> URL: https://issues.apache.org/jira/browse/ARROW-8315
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.16.0
>Reporter: Ben Kietzman
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: dataset, pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Python 3.5 does not guarantee insertion order of dict keys, so we can't rely 
> on it when constructing tables in test_dataset.py
> https://github.com/apache/arrow/pull/6809/checks?check_run_id=554945477#step:6:2166





[jira] [Commented] (ARROW-7939) [Python] crashes when reading parquet file compressed with snappy

2020-04-02 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073859#comment-17073859
 ] 

Antoine Pitrou commented on ARROW-7939:
---

I could not reproduce under Windows 7 64-bit, with pyarrow 0.16.0 from 
conda-forge.

> [Python] crashes when reading parquet file compressed with snappy
> -
>
> Key: ARROW-7939
> URL: https://issues.apache.org/jira/browse/ARROW-7939
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.16.0
> Environment: Windows 7
> python 3.6.9
> pyarrow 0.16 from conda-forge
>Reporter: Marc Bernot
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.17.0
>
>
> When I installed pyarrow 0.16, some parquet files created with pyarrow 0.15.1 
> would make python crash. I drilled down to the simplest example I could find.
> It turns out that some parquet files created with pyarrow 0.16 cannot be 
> read back either. The example below works fine with arrays_ok, but python 
> crashes with arrays_nok (apparently as soon as there are at least three 
> different values).
> It also works fine with 'none', 'gzip' and 'brotli' compression; the 
> problem seems to happen only with snappy.
> {code:python}
> import pyarrow.parquet as pq
> import pyarrow as pa
> arrays_ok = [[0, 1]]    # works
> arrays_ok = [[0, 1, 1]]  # also works
> arrays_nok = [[0, 1, 2]]  # crashes on read-back
> table = pa.Table.from_arrays(arrays_nok, names=['a'])
> pq.write_table(table, 'foo.parquet', compression='snappy')
> pq.read_table('foo.parquet')
> {code}





[jira] [Resolved] (ARROW-7852) [Python] 0.16.0 wheels not compatible with older numpy

2020-04-02 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-7852.
---
Resolution: Fixed

Issue resolved by pull request 6767
[https://github.com/apache/arrow/pull/6767]

> [Python] 0.16.0 wheels not compatible with older numpy
> --
>
> Key: ARROW-7852
> URL: https://issues.apache.org/jira/browse/ARROW-7852
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging, Python
>Affects Versions: 0.16.0
>Reporter: Stephanie Gott
>Assignee: Krisztian Szucs
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Using python 3.7.5 and numpy 1.14.6, I am unable to import pyarrow 0.16.0 
> (see below for error). Updating numpy to the most recent version fixes this, 
> and I'm wondering if pyarrow needs to update its requirements.txt.
>  
> {code:java}
> ➜  ~ ipython
> Python 3.7.5 (default, Nov  7 2019, 10:50:52)
> Type 'copyright', 'credits' or 'license' for more information
> IPython 7.9.0 -- An enhanced Interactive Python. Type '?' for help.
>
> In [1]: import numpy as np
>
> In [2]: np.__version__
> Out[2]: '1.14.6'
>
> In [3]: import pyarrow
> ---------------------------------------------------------------------------
> ModuleNotFoundError                       Traceback (most recent call last)
> ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
> ---------------------------------------------------------------------------
> ImportError                               Traceback (most recent call last)
> <ipython-input-3-...> in <module>
> ----> 1 import pyarrow
>
> ~/.local/lib/python3.7/site-packages/pyarrow/__init__.py in <module>
>      47 import pyarrow.compat as compat
>      48
> ---> 49 from pyarrow.lib import cpu_count, set_cpu_count
>      50 from pyarrow.lib import (null, bool_,
>      51                          int8, int16, int32, int64,
>
> ~/.local/lib/python3.7/site-packages/pyarrow/lib.pyx in init pyarrow.lib()
>
> ImportError: numpy.core.multiarray failed to import
>
> In [4]: import pyarrow
> ---------------------------------------------------------------------------
> AttributeError                            Traceback (most recent call last)
> <ipython-input-4-...> in <module>
> ----> 1 import pyarrow
>
> ~/.local/lib/python3.7/site-packages/pyarrow/__init__.py in <module>
>      47 import pyarrow.compat as compat
>      48
> ---> 49 from pyarrow.lib import cpu_count, set_cpu_count
>      50 from pyarrow.lib import (null, bool_,
>      51                          int8, int16, int32, int64,
>
> ~/.local/lib/python3.7/site-packages/pyarrow/ipc.pxi in init pyarrow.lib()
>
> AttributeError: type object 'pyarrow.lib.Message' has no attribute '__reduce_cython__'
> {code}





[jira] [Assigned] (ARROW-8315) [Python][Dataset] Don't rely on ordered dict keys in test_dataset.py

2020-04-02 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-8315:
-

Assignee: Antoine Pitrou  (was: Krisztian Szucs)

> [Python][Dataset] Don't rely on ordered dict keys in test_dataset.py
> 
>
> Key: ARROW-8315
> URL: https://issues.apache.org/jira/browse/ARROW-8315
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.16.0
>Reporter: Ben Kietzman
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: dataset
> Fix For: 0.17.0
>
>
> Python 3.5 does not guarantee insertion order of dict keys, so we can't rely 
> on it when constructing tables in test_dataset.py
> https://github.com/apache/arrow/pull/6809/checks?check_run_id=554945477#step:6:2166





[jira] [Created] (ARROW-8316) [CI] Set docker-compose to use docker-cli instead of docker-py for building images

2020-04-02 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8316:
--

 Summary: [CI] Set docker-compose to use docker-cli instead of 
docker-py for building images
 Key: ARROW-8316
 URL: https://issues.apache.org/jira/browse/ARROW-8316
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Krisztian Szucs


The images pushed from the master branch sometimes produced reusable 
layers and sometimes did not, so the caching worked non-deterministically. 
The underlying issue is https://github.com/docker/compose/issues/883









[jira] [Resolved] (ARROW-8272) [CI][Python] Test failure on Ubuntu 16.04

2020-04-02 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-8272.
---
Fix Version/s: 0.17.0
   Resolution: Fixed

Issue resolved by pull request 6762
[https://github.com/apache/arrow/pull/6762]

> [CI][Python] Test failure on Ubuntu 16.04
> -
>
> Key: ARROW-8272
> URL: https://issues.apache.org/jira/browse/ARROW-8272
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Python
>Reporter: Antoine Pitrou
>Assignee: Krisztian Szucs
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> See https://github.com/pitrou/arrow/runs/545291564





[jira] [Assigned] (ARROW-8304) [Flight][Python] Flight client with TLS root certificate is reporting error on do_get()

2020-04-02 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-8304:
-

Assignee: Antoine Pitrou  (was: Ravindra Wagh)

> [Flight][Python] Flight client with TLS root certificate is reporting error 
> on do_get()
> ---
>
> Key: ARROW-8304
> URL: https://issues.apache.org/jira/browse/ARROW-8304
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Python
>Affects Versions: 0.16.0
>Reporter: Ravindra Wagh
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I have started Flight's local python server with TLS support, using the 
> testing certificates present in the repo:
> {code:java}
> python3 /python/examples/flight/server.py  --host localhost --tls 
> /testing/data/flight/cert0.pem 
> /testing/data/flight/cert0.key{code}
> This server is started successfully.
> Now I started testing the python client with TLS support.
> 1. The client pushes a csv file to the flight endpoint server: 
> {code:java}
> python3 /python/examples/flight/client.py put --tls --tls-roots 
> /testing/data/flight/root-ca.pem localhost:5005 
> /sharedFolder/dataset/iris.csv{code}
> File iris.csv is pushed successfully.
> 2. List the flights available on the server
> {code:java}
> python3 /python/examples/flight/client.py list --tls --tls-roots 
> /testing/data/flight/root-ca.pem localhost:5005{code}
>  It lists the flight pushed in step 1 above.
> 3. Get/retrieve the specific flight (e.g. /sharedFolder/dataset/iris.csv) from 
> the server
> {code:java}
> python3 /python/examples/flight/client.py get --tls --tls-roots 
> /testing/data/flight/root-ca.pem -p /sharedFolder/dataset/iris.csv 
> localhost:5005{code}
>  It fails with the following errors:
> {quote}Ticket: 
> 
> {color:#ff}E0401 06:43:30.164324553    1055 
> ssl_transport_security.cc:1238] Handshake failed with fatal error 
> SSL_ERROR_SSL: error:14090086:SSL 
> routines:ssl3_get_server_certificate:certificate verify failed.{color}
> Traceback (most recent call last):
>   File "/python/examples/flight/client.py", line 178, in 
>     main()
>   File "/python/examples/flight/client.py", line 174, in main
>     commands[args.action](args, client)
>   File "/python/examples/flight/client.py", line 98, in get_flight
>     reader = get_client.do_get(endpoint.ticket)
>   File "pyarrow/_flight.pyx", line 1144, in 
> pyarrow._flight.FlightClient.do_get
>   File "pyarrow/_flight.pyx", line 73, in pyarrow._flight.check_flight_status
> pyarrow._flight.FlightUnavailableError: gRPC returned unavailable error, with 
> message: Connect Failed
> {quote}
> Python client.py works for functions like _list_flights(), do_action(), 
> push_data()_ but fails in the 
> _get_flight()_ function at this line:
> {code:java}
> reader = get_client.do_get(endpoint.ticket) {code}





[jira] [Resolved] (ARROW-8304) [Flight][Python] Flight client with TLS root certificate is reporting error on do_get()

2020-04-02 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-8304.
---
Fix Version/s: 0.17.0
   Resolution: Fixed

Issue resolved by pull request 6808
[https://github.com/apache/arrow/pull/6808]

> [Flight][Python] Flight client with TLS root certificate is reporting error 
> on do_get()
> ---
>
> Key: ARROW-8304
> URL: https://issues.apache.org/jira/browse/ARROW-8304
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Python
>Affects Versions: 0.16.0
>Reporter: Ravindra Wagh
>Assignee: Ravindra Wagh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> I have started Flight's local python server with TLS support, using the 
> testing certificates present in the repo:
> {code:java}
> python3 /python/examples/flight/server.py  --host localhost --tls 
> /testing/data/flight/cert0.pem 
> /testing/data/flight/cert0.key{code}
> This server is started successfully.
> Now I started testing the python client with TLS support.
> 1. The client pushes a csv file to the flight endpoint server: 
> {code:java}
> python3 /python/examples/flight/client.py put --tls --tls-roots 
> /testing/data/flight/root-ca.pem localhost:5005 
> /sharedFolder/dataset/iris.csv{code}
> File iris.csv is pushed successfully.
> 2. List the flights available on the server
> {code:java}
> python3 /python/examples/flight/client.py list --tls --tls-roots 
> /testing/data/flight/root-ca.pem localhost:5005{code}
>  It lists the flight pushed in step 1 above.
> 3. Get/retrieve the specific flight (e.g. /sharedFolder/dataset/iris.csv) from 
> the server
> {code:java}
> python3 /python/examples/flight/client.py get --tls --tls-roots 
> /testing/data/flight/root-ca.pem -p /sharedFolder/dataset/iris.csv 
> localhost:5005{code}
>  It fails with the following errors:
> {quote}Ticket: 
> 
> {color:#ff}E0401 06:43:30.164324553    1055 
> ssl_transport_security.cc:1238] Handshake failed with fatal error 
> SSL_ERROR_SSL: error:14090086:SSL 
> routines:ssl3_get_server_certificate:certificate verify failed.{color}
> Traceback (most recent call last):
>   File "/python/examples/flight/client.py", line 178, in 
>     main()
>   File "/python/examples/flight/client.py", line 174, in main
>     commands[args.action](args, client)
>   File "/python/examples/flight/client.py", line 98, in get_flight
>     reader = get_client.do_get(endpoint.ticket)
>   File "pyarrow/_flight.pyx", line 1144, in 
> pyarrow._flight.FlightClient.do_get
>   File "pyarrow/_flight.pyx", line 73, in pyarrow._flight.check_flight_status
> pyarrow._flight.FlightUnavailableError: gRPC returned unavailable error, with 
> message: Connect Failed
> {quote}
> Python client.py works for functions like _list_flights(), do_action(), 
> push_data()_ but fails in the 
> _get_flight()_ function at this line:
> {code:java}
> reader = get_client.do_get(endpoint.ticket) {code}





[jira] [Resolved] (ARROW-8185) [Packaging] Document the available nightly wheels and conda packages

2020-04-02 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-8185.
---
Resolution: Fixed

Issue resolved by pull request 6780
[https://github.com/apache/arrow/pull/6780]

> [Packaging] Document the available nightly wheels and conda packages
> 
>
> Key: ARROW-8185
> URL: https://issues.apache.org/jira/browse/ARROW-8185
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The packaging scripts are uploading the artifacts to package manager specific 
> hosting services like Anaconda and Gemfury. We should document this in a form 
> which conforms the [ASF 
> Policy|https://www.apache.org/dev/release-distribution.html#unreleased].
> For more information see the conversation at 
> https://github.com/apache/arrow/pull/6669#issuecomment-601947006





[jira] [Updated] (ARROW-5473) [C++] Build failure on googletest_ep on Windows when using Ninja

2020-04-02 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-5473:

Fix Version/s: 0.17.0

> [C++] Build failure on googletest_ep on Windows when using Ninja
> 
>
> Key: ARROW-5473
> URL: https://issues.apache.org/jira/browse/ARROW-5473
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.17.0
>
>
> I consistently get this error when trying to use Ninja locally:
> {code}
> -- extracting...
>  
> src='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/release-1.8.1.tar.gz'
>  
> dst='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep'
> -- extracting... [tar xfz]
> -- extracting... [analysis]
> -- extracting... [rename]
> CMake Error at googletest_ep-stamp/extract-googletest_ep.cmake:51 (file):
>   file RENAME failed to rename
> 
> C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/ex-googletest_ep1234/googletest-release-1.8.1
>   to
> C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep
>   because: Directory not empty
> [179/623] Building CXX object 
> src\arrow\CMakeFiles\arrow_static.dir\array\builder_dict.cc.obj
> ninja: build stopped: subcommand failed.
> {code}
> I'm running within the cmdr terminal emulator, so it's conceivable that some 
> path modifications are causing issues.
> The CMake invocation is
> {code}
> cmake -G "Ninja" ^  -DCMAKE_BUILD_TYPE=Release ^  
> -DARROW_BUILD_TESTS=on ^  -DARROW_CXXFLAGS="/WX /MP" ^
>  -DARROW_FLIGHT=off -DARROW_PARQUET=on -DARROW_GANDIVA=ON 
> -DARROW_VERBOSE_THIRDPARTY_BUILD=on ..
> {code}





[jira] [Updated] (ARROW-8005) [Website] Review and adjust any usages of Apache dist system from website / tools

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8005:
--
Labels: pull-request-available  (was: )

> [Website] Review and adjust any usages of Apache dist system from website / 
> tools
> -
>
> Key: ARROW-8005
> URL: https://issues.apache.org/jira/browse/ARROW-8005
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Website
>Reporter: Wes McKinney
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>
> ASF Infra has communicated
> "As of March 2020, we are deprecating www.apache.org/dist/ in favor of
> https://downloads.apache.org/ for backup downloads as well as signature
> and checksum verification. The primary driver has been splitting up web
> site visits and downloads to gain better control and offer a better
> service for both downloads and web site visits.
> As stated, this does not impact end-users, and should have a minimal
> impact on projects, as our download selectors as well as visits to
> www.apache.org/dist/ have been adjusted to make use of
> downloads.apache.org instead. We do however ask that projects, in their
> own time-frame, change references on their own web sites from
> www.apache.org/dist/ to downloads.apache.org wherever such references
> may exist, to complete the switch in full. We will NOT be turning off
> www.apache.org/dist/ in the near future, but would greatly appreciate if
> projects could help us transition away from the old URLs in their
> documentation and on their download pages.
> The standard way of uploading releases[1] will STILL apply, however
> there may be a short delay (<= 15 minutes) between releasing and
> releases showing up on downloads.apache.org for technical reasons.
> "
> We should adjust our website and if necessary our release scripts based on 
> this





[jira] [Resolved] (ARROW-7008) [Python] pyarrow.chunked_array([array]) fails on array with all-None buffers

2020-04-02 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-7008.
-
Resolution: Fixed

Issue resolved by pull request 6803
[https://github.com/apache/arrow/pull/6803]

> [Python] pyarrow.chunked_array([array]) fails on array with all-None buffers
> 
>
> Key: ARROW-7008
> URL: https://issues.apache.org/jira/browse/ARROW-7008
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.15.0
>Reporter: Uwe Korn
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Minimal reproducer:
> {code}
> import pyarrow as pa
> pa.chunked_array([pa.array([], 
> type=pa.string()).dictionary_encode().dictionary])
> {code}
> Traceback
> {code}
> (lldb) bt
> * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS 
> (code=1, address=0x20)
>   * frame #0: 0x000112cd5d0e libarrow.15.dylib`arrow::Status 
> arrow::internal::ValidateVisitor::ValidateOffsets const>(arrow::BinaryArray const&) + 94
> frame #1: 0x000112cc79a3 libarrow.15.dylib`arrow::Status 
> arrow::VisitArrayInline(arrow::Array 
> const&, arrow::internal::ValidateVisitor*) + 915
> frame #2: 0x000112cc747d libarrow.15.dylib`arrow::Array::Validate() 
> const + 829
> frame #3: 0x000112e3ea19 
> libarrow.15.dylib`arrow::ChunkedArray::Validate() const + 89
> frame #4: 0x000112b8eb7d 
> lib.cpython-37m-darwin.so`__pyx_pw_7pyarrow_3lib_135chunked_array(_object*, 
> _object*, _object*) + 3661
> {code}





[jira] [Resolved] (ARROW-8315) [Python][Dataset] Don't rely on ordered dict keys in test_dataset.py

2020-04-02 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman resolved ARROW-8315.
-
Resolution: Fixed

Issue resolved by pull request 6814
[https://github.com/apache/arrow/pull/6814]

> [Python][Dataset] Don't rely on ordered dict keys in test_dataset.py
> 
>
> Key: ARROW-8315
> URL: https://issues.apache.org/jira/browse/ARROW-8315
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.16.0
>Reporter: Ben Kietzman
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: dataset, pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Python 3.5 does not guarantee insertion order of dict keys, so we can't rely 
> on it when constructing tables in test_dataset.py
> https://github.com/apache/arrow/pull/6809/checks?check_run_id=554945477#step:6:2166





[jira] [Updated] (ARROW-8319) [CI] Install thrift compiler in the debian build

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8319:
--
Labels: pull-request-available  (was: )

> [CI] Install thrift compiler in the debian build
> 
>
> Key: ARROW-8319
> URL: https://issues.apache.org/jira/browse/ARROW-8319
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>
> CMake is missing the thrift compiler after setting Thrift_SOURCE to empty from 
> AUTO; 
> see build: 
> https://github.com/apache/arrow/runs/555631125?check_suite_focus=true#step:6:143





[jira] [Created] (ARROW-8321) [CI] Use bundled thrift in Fedora 30 build

2020-04-02 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8321:
--

 Summary: [CI] Use bundled thrift in Fedora 30 build
 Key: ARROW-8321
 URL: https://issues.apache.org/jira/browse/ARROW-8321
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Affects Versions: 0.17.0
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs


After unsetting Thrift_SOURCE from AUTO, it surfaced that the thrift available 
on Fedora 30 (0.10) is older than the minimum required version (0.11).

Build thrift_ep instead.





[jira] [Created] (ARROW-8320) [Documentation][Format] Clarify (lack of) alignment requirements in C data interface

2020-04-02 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8320:
---

 Summary: [Documentation][Format] Clarify (lack of) alignment 
requirements in C data interface
 Key: ARROW-8320
 URL: https://issues.apache.org/jira/browse/ARROW-8320
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation, Format
Reporter: Wes McKinney
 Fix For: 0.17.0


This document should clarify that memory buffers need not start on aligned 
pointer offsets. 





[jira] [Updated] (ARROW-8322) [CI] Fix C# workflow file syntax

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8322:
--
Labels: pull-request-available  (was: )

> [CI] Fix C# workflow file syntax
> 
>
> Key: ARROW-8322
> URL: https://issues.apache.org/jira/browse/ARROW-8322
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> The GitHub Actions expression requires the enclosing "${{ }}".





[jira] [Resolved] (ARROW-6479) [C++] inline errors from external projects' build logs

2020-04-02 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-6479.
-
Fix Version/s: (was: 1.0.0)
   0.17.0
   Resolution: Fixed

Issue resolved by pull request 6813
[https://github.com/apache/arrow/pull/6813]

> [C++] inline errors from external projects' build logs
> --
>
> Key: ARROW-6479
> URL: https://issues.apache.org/jira/browse/ARROW-6479
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Ben Kietzman
>Assignee: Ben Kietzman
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently when an external project build fails, we get a very uninformative 
> message:
> {code}
> [88/543] Performing build step for 'flatbuffers_ep'
> FAILED: flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build 
> flatbuffers_ep-prefix/src/flatbuffers_ep-install/bin/flatc 
> flatbuffers_ep-prefix/src/flatbuffers_ep-install/lib/libflatbuffers.a 
> cd /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-build && 
> /usr/bin/cmake -P 
> /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-DEBUG.cmake
>  && /usr/bin/cmake -E touch 
> /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build
> CMake Error at 
> /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-DEBUG.cmake:16
>  (message):
>   Command failed: 1
>'/usr/bin/cmake' '--build' '.'
>   See also
> 
> /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-*.log
> {code}
> It would be far more useful if the error were caught and the relevant section 
> (or even the entirety) of 
> {{/build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-*.log}}
> were output instead. This is doubly the case on CI, where accessing those 
> logs is non-trivial.





[jira] [Commented] (ARROW-5473) [C++] Build failure on googletest_ep on Windows when using Ninja

2020-04-02 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073925#comment-17073925
 ] 

Wes McKinney commented on ARROW-5473:
-

This has been occurring in the last week, seen in build failures here

https://github.com/apache/arrow/pull/6813

We should try [~rip@gmail.com]'s proposed fix

> [C++] Build failure on googletest_ep on Windows when using Ninja
> 
>
> Key: ARROW-5473
> URL: https://issues.apache.org/jira/browse/ARROW-5473
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.17.0
>
>
> I consistently get this error when trying to use Ninja locally:
> {code}
> -- extracting...
>  
> src='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/release-1.8.1.tar.gz'
>  
> dst='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep'
> -- extracting... [tar xfz]
> -- extracting... [analysis]
> -- extracting... [rename]
> CMake Error at googletest_ep-stamp/extract-googletest_ep.cmake:51 (file):
>   file RENAME failed to rename
> 
> C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/ex-googletest_ep1234/googletest-release-1.8.1
>   to
> C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep
>   because: Directory not empty
> [179/623] Building CXX object 
> src\arrow\CMakeFiles\arrow_static.dir\array\builder_dict.cc.obj
> ninja: build stopped: subcommand failed.
> {code}
> I'm running within the cmder terminal emulator, so it's conceivable that some 
> path modifications are causing issues.
> The CMake invocation is
> {code}
> cmake -G "Ninja" ^  -DCMAKE_BUILD_TYPE=Release ^  
> -DARROW_BUILD_TESTS=on ^  -DARROW_CXXFLAGS="/WX /MP" ^
>  -DARROW_FLIGHT=off -DARROW_PARQUET=on -DARROW_GANDIVA=ON 
> -DARROW_VERBOSE_THIRDPARTY_BUILD=on ..
> {code}





[jira] [Updated] (ARROW-5473) [C++] Build failure on googletest_ep on Windows when using Ninja

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5473:
--
Labels: pull-request-available  (was: )

> [C++] Build failure on googletest_ep on Windows when using Ninja
> 
>
> Key: ARROW-5473
> URL: https://issues.apache.org/jira/browse/ARROW-5473
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>
> I consistently get this error when trying to use Ninja locally:
> {code}
> -- extracting...
>  
> src='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/release-1.8.1.tar.gz'
>  
> dst='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep'
> -- extracting... [tar xfz]
> -- extracting... [analysis]
> -- extracting... [rename]
> CMake Error at googletest_ep-stamp/extract-googletest_ep.cmake:51 (file):
>   file RENAME failed to rename
> 
> C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/ex-googletest_ep1234/googletest-release-1.8.1
>   to
> C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep
>   because: Directory not empty
> [179/623] Building CXX object 
> src\arrow\CMakeFiles\arrow_static.dir\array\builder_dict.cc.obj
> ninja: build stopped: subcommand failed.
> {code}
> I'm running within the cmder terminal emulator, so it's conceivable that some 
> path modifications are causing issues.
> The CMake invocation is
> {code}
> cmake -G "Ninja" ^  -DCMAKE_BUILD_TYPE=Release ^  
> -DARROW_BUILD_TESTS=on ^  -DARROW_CXXFLAGS="/WX /MP" ^
>  -DARROW_FLIGHT=off -DARROW_PARQUET=on -DARROW_GANDIVA=ON 
> -DARROW_VERBOSE_THIRDPARTY_BUILD=on ..
> {code}





[jira] [Created] (ARROW-8317) [C++] grpc-cpp 1.28.0 from conda-forge causing Appveyor build to fail

2020-04-02 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8317:
---

 Summary: [C++] grpc-cpp 1.28.0 from conda-forge causing Appveyor 
build to fail
 Key: ARROW-8317
 URL: https://issues.apache.org/jira/browse/ARROW-8317
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.17.0


This started occurring in the last few hours, since the grpc-cpp 1.28.0 update 
was just merged on conda-forge

https://ci.appveyor.com/project/wesm/arrow/build/job/8oe0n4epkxegr21x





[jira] [Created] (ARROW-8318) [C++][Dataset] Dataset should instantiate Fragment

2020-04-02 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8318:
-

 Summary: [C++][Dataset] Dataset should instantiate Fragment
 Key: ARROW-8318
 URL: https://issues.apache.org/jira/browse/ARROW-8318
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Dataset
Reporter: Francois Saint-Jacques


Fragments are created on the fly when invoking a Scan. This means that a lot of 
auxiliary/ancillary data must be stored by the specialised Dataset, e.g. 
FileSystemDataset must hold the path and partition expression. With the 
advent of more complex Fragments, e.g. ParquetFileFragment, more data must be 
stored.
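For intuition, a pure-Python sketch of the proposed direction (Arrow's real API is C++; the class and field names here are simplified stand-ins): the Dataset instantiates its Fragments once, so per-fragment state such as path and partition expression lives on the Fragment instead of being re-derived on every Scan.

```python
from dataclasses import dataclass

@dataclass
class FileFragment:
    # Per-fragment state that would otherwise live on the Dataset.
    path: str
    partition_expression: str

class FileSystemDataset:
    def __init__(self, paths_and_exprs):
        # Fragments are built at construction time, not during Scan.
        self.fragments = [FileFragment(p, e) for p, e in paths_and_exprs]

    def scan(self):
        # Scan simply iterates the pre-built fragments.
        yield from self.fragments

ds = FileSystemDataset([("part-0.parquet", "a == 1"),
                        ("part-1.parquet", "a == 2")])
paths = [f.path for f in ds.scan()]
```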





[jira] [Created] (ARROW-8319) [CI] Install thrift compiler in the debian build

2020-04-02 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8319:
--

 Summary: [CI] Install thrift compiler in the debian build
 Key: ARROW-8319
 URL: https://issues.apache.org/jira/browse/ARROW-8319
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs
 Fix For: 0.17.0


CMake is missing the thrift compiler after Thrift_SOURCE was changed from AUTO 
to empty; see this build: 
https://github.com/apache/arrow/runs/555631125?check_suite_focus=true#step:6:143





[jira] [Updated] (ARROW-8320) [Documentation][Format] Clarify (lack of) alignment requirements in C data interface

2020-04-02 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-8320:

Description: This document should clarify that memory buffers need not 
start on aligned (8-byte or otherwise) pointer offsets.   (was: This document 
should clarify that memory buffers need not start on aligned pointer offsets. )

> [Documentation][Format] Clarify (lack of) alignment requirements in C data 
> interface
> 
>
> Key: ARROW-8320
> URL: https://issues.apache.org/jira/browse/ARROW-8320
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Format
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.17.0
>
>
> This document should clarify that memory buffers need not start on aligned 
> (8-byte or otherwise) pointer offsets. 





[jira] [Updated] (ARROW-8321) [CI] Use bundled thrift in Fedora 30 build

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8321:
--
Labels: pull-request-available  (was: )

> [CI] Use bundled thrift in Fedora 30 build
> --
>
> Key: ARROW-8321
> URL: https://issues.apache.org/jira/browse/ARROW-8321
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Affects Versions: 0.17.0
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> After unsetting Thrift_SOURCE from AUTO, it surfaced that the thrift available 
> on Fedora 30 (0.10) is older than the minimum required version (0.11).
> Build thrift_ep instead.





[jira] [Commented] (ARROW-8320) [Documentation][Format] Clarify (lack of) alignment requirements in C data interface

2020-04-02 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073958#comment-17073958
 ] 

Wes McKinney commented on ARROW-8320:
-

An embedded question here is what kinds of issues might occur in our C++ 
library if fed memory that is not word-aligned. Probably the documentation 
should _recommend_ using aligned memory but not _require_ it.
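A small pure-Python illustration of the distinction (the addresses are hypothetical): a consumer that prefers aligned memory can check the buffer address itself, but the C data interface treats both cases as valid.

```python
def is_aligned(address: int, alignment: int = 8) -> bool:
    # True when the buffer starts on an `alignment`-byte boundary.
    return address % alignment == 0

# Hypothetical buffer start addresses: the interface permits both; only the
# documentation's *recommendation* favours the first.
aligned_start = is_aligned(0x7F0000001000)      # 8-byte aligned
unaligned_start = is_aligned(0x7F0000001003)    # not aligned, still valid
```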

> [Documentation][Format] Clarify (lack of) alignment requirements in C data 
> interface
> 
>
> Key: ARROW-8320
> URL: https://issues.apache.org/jira/browse/ARROW-8320
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Format
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.17.0
>
>
> This document should clarify that memory buffers need not start on aligned 
> (8-byte or otherwise) pointer offsets. 





[jira] [Created] (ARROW-8322) [CI] Fix C# workflow file syntax

2020-04-02 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8322:
--

 Summary: [CI] Fix C# workflow file syntax
 Key: ARROW-8322
 URL: https://issues.apache.org/jira/browse/ARROW-8322
 Project: Apache Arrow
  Issue Type: Task
  Components: Continuous Integration
Reporter: Krisztian Szucs


The GitHub Actions expression requires the enclosing "${{ }}"
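For illustration, a minimal workflow fragment (the job, matrix, and variable names are hypothetical): outside of `if:` conditions, an expression is only evaluated when enclosed in `${{ }}`; without it, the value is taken as a literal string.

```yaml
jobs:
  csharp:
    runs-on: ubuntu-latest
    env:
      # Wrong: DOTNET_VERSION: matrix.dotnet   (stays the literal string)
      # Right: the expression must be enclosed in ${{ }}
      DOTNET_VERSION: ${{ matrix.dotnet }}
```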





[jira] [Assigned] (ARROW-8322) [CI] Fix C# workflow file syntax

2020-04-02 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-8322:
--

Assignee: Krisztian Szucs

> [CI] Fix C# workflow file syntax
> 
>
> Key: ARROW-8322
> URL: https://issues.apache.org/jira/browse/ARROW-8322
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The github actions expression requires the enclosing "${{ }}"





[jira] [Resolved] (ARROW-8322) [CI] Fix C# workflow file syntax

2020-04-02 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-8322.

Fix Version/s: 0.17.0
   Resolution: Fixed

Issue resolved by pull request 6815
[https://github.com/apache/arrow/pull/6815]

> [CI] Fix C# workflow file syntax
> 
>
> Key: ARROW-8322
> URL: https://issues.apache.org/jira/browse/ARROW-8322
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The github actions expression requires the enclosing "${{ }}"





[jira] [Comment Edited] (ARROW-8307) [Python] Expose use_memory_map option in pyarrow.feather APIs

2020-04-02 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073964#comment-17073964
 ] 

Wes McKinney edited comment on ARROW-8307 at 4/2/20, 6:07 PM:
--

It is on by default. I just wanted to expose it as an option to toggle it off 
(I needed this feature for my benchmarks, for example)


was (Author: wesmckinn):
It is on by default. I just wanted to expose it as an option to toggle it off

> [Python] Expose use_memory_map option in pyarrow.feather APIs
> -
>
> Key: ARROW-8307
> URL: https://issues.apache.org/jira/browse/ARROW-8307
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.17.0
>
>






[jira] [Commented] (ARROW-8307) [Python] Expose use_memory_map option in pyarrow.feather APIs

2020-04-02 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073964#comment-17073964
 ] 

Wes McKinney commented on ARROW-8307:
-

It is on by default. I just wanted to expose it as an option to toggle it off

> [Python] Expose use_memory_map option in pyarrow.feather APIs
> -
>
> Key: ARROW-8307
> URL: https://issues.apache.org/jira/browse/ARROW-8307
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.17.0
>
>






[jira] [Updated] (ARROW-6830) [R] Add col_select argument to read_ipc_stream

2020-04-02 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-6830:
---
Summary: [R] Add col_select argument to read_ipc_stream  (was: [R] Select 
Subset of Columns in read_arrow)

> [R] Add col_select argument to read_ipc_stream
> --
>
> Key: ARROW-6830
> URL: https://issues.apache.org/jira/browse/ARROW-6830
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Anthony Abate
>Priority: Minor
>
> *Note:* Not sure if this is a limitation of the R library or the underlying 
> C++ code.
> I have a ~30 GB arrow file with almost 1000 columns; it has 12,000 record 
> batches of varying row sizes.
> 1. Is it possible to use *read_arrow* to filter out columns? (similar to 
> how *read_feather* has a col_select = ... argument)
> 2. Or is it possible using *RecordBatchFileReader* to filter columns?
>  
> The only thing I seem to be able to do (please confirm if this is my only 
> option) is loop over all record batches, select a single column at a time, 
> and construct the data I need to pull out manually, i.e. like the following:
> {code:r}
> for (i in 0:(data_rbfr$num_record_batches - 1)) {
>   rbn <- data_rbfr$get_batch(i)
>   dfn <- as.data.frame(rbn$column(5)$as_vector())
>   if (i == 0) {
>     merged <- dfn
>   } else {
>     merged <- rbind(merged, dfn)
>   }
>   print(paste(i, nrow(merged)))
> } {code}
>  
>  





[jira] [Resolved] (ARROW-7520) [R] Writing many batches causes a crash

2020-04-02 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-7520.

Fix Version/s: 0.17.0
 Assignee: Neal Richardson
   Resolution: Fixed

This has been addressed in ARROW-5501; RecordBatch*Writer now requires that you 
pass an {{OutputStream}} so that you can manage the file connection. The 
previously supported behavior would let you open connections you couldn't close.

> [R] Writing many batches causes a crash
> ---
>
> Key: ARROW-7520
> URL: https://issues.apache.org/jira/browse/ARROW-7520
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.15.1
> Environment: - Session info ---
> setting  value
>  version  R version 3.6.1 (2019-07-05)
>  os       Windows 10 x64
>  system   x86_64, mingw32
>  ui       RStudio
>  language (EN)
>  collate  English_United States.1252
>  ctype    English_United States.1252
>  tz       America/New_York
>  date     2020-01-08
> - Packages ---
> ! package    * version     date       lib source
>   acepack      1.4.1       2016-10-29 [1] CRAN (R 3.6.1)
>   arrow      * 0.15.1.1    2019-11-05 [1] CRAN (R 3.6.2)
>   askpass      1.1         2019-01-13 [1] CRAN (R 3.6.1)
>   assertthat   0.2.1       2019-03-21 [1] CRAN (R 3.6.1)
>   backports    1.1.5       2019-10-02 [1] CRAN (R 3.6.1)
>   base64enc    0.1-3       2015-07-28 [1] CRAN (R 3.6.0)
>   bit          1.1-14      2018-05-29 [1] CRAN (R 3.6.0)
>   bit64        0.9-7       2017-05-08 [1] CRAN (R 3.6.0)
>   blob         1.2.0       2019-07-09 [1] CRAN (R 3.6.1)
>   callr        3.3.1       2019-07-18 [1] CRAN (R 3.6.1)
>   cellranger   1.1.0       2016-07-27 [1] CRAN (R 3.6.1)
>   checkmate    1.9.4       2019-07-04 [1] CRAN (R 3.6.1)
>   cli          1.1.0       2019-03-19 [1] CRAN (R 3.6.1)
>   cluster      2.1.0       2019-06-19 [2] CRAN (R 3.6.1)
>   codetools    0.2-16      2018-12-24 [2] CRAN (R 3.6.1)
>   colorspace   1.4-1       2019-03-18 [1] CRAN (R 3.6.1)
>   commonmark   1.7         2018-12-01 [1] CRAN (R 3.6.1)
>   crayon       1.3.4       2017-09-16 [1] CRAN (R 3.6.1)
>   credentials  1.1         2019-03-12 [1] CRAN (R 3.6.2)
>   curl       * 4.2         2019-09-24 [1] CRAN (R 3.6.1)
>   data.table   1.12.2      2019-04-07 [1] CRAN (R 3.6.1)
>   DBI        * 1.0.0       2018-05-02 [1] CRAN (R 3.6.1)
>   desc         1.2.0       2018-05-01 [1] CRAN (R 3.6.1)
>   devtools   * 2.2.0       2019-09-07 [1] CRAN (R 3.6.1)
>   digest       0.6.23      2019-11-23 [1] CRAN (R 3.6.1)
>   dplyr      * 0.8.3       2019-07-04 [1] CRAN (R 3.6.1)
>   DT           0.9         2019-09-17 [1] CRAN (R 3.6.1)
>   ellipsis     0.3.0       2019-09-20 [1] CRAN (R 3.6.1)
>   evaluate     0.14        2019-05-28 [1] CRAN (R 3.6.1)
>   foreign      0.8-71      2018-07-20 [2] CRAN (R 3.6.1)
>   Formula    * 1.2-3       2018-05-03 [1] CRAN (R 3.6.0)
>   fs           1.3.1       2019-05-06 [1] CRAN (R 3.6.1)
>   fst        * 0.9.0       2019-04-09 [1] CRAN (R 3.6.1)
>   future     * 1.15.0-9000 2019-11-19 [1] Github (HenrikBengtsson/future@bc241c7)
>   ggplot2    * 3.2.1       2019-08-10 [1] CRAN (R 3.6.1)
>   globals      0.12.4      2018-10-11 [1] CRAN (R 3.6.0)
>   glue       * 1.3.1       2019-03-12 [1] CRAN (R 3.6.1)
[jira] [Created] (ARROW-8323) [C++] Pin gRPC at v1.27 to avoid compilation error in its headers

2020-04-02 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-8323:
---

 Summary: [C++] Pin gRPC at v1.27 to avoid compilation error in its 
headers
 Key: ARROW-8323
 URL: https://issues.apache.org/jira/browse/ARROW-8323
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.16.0
Reporter: Ben Kietzman
Assignee: Ben Kietzman
 Fix For: 0.17.0


[gRPC 1.28|https://github.com/grpc/grpc/releases/tag/v1.28.0] includes a change 
which introduces an implicit size_t->int conversion in proto_utils.h: 
https://github.com/grpc/grpc/commit/2748755a4ff9ed940356e78c105f55f839fdf38b

Conversion warnings are treated as errors for example here: 
https://ci.appveyor.com/project/BenjaminKietzman/arrow/build/job/9cl0vqa8e495knn3#L1126
So IIUC we need to pin gRPC to 1.27 for now.

Upstream PR: https://github.com/grpc/grpc/pull/22557





[jira] [Updated] (ARROW-8323) [C++] Pin gRPC at v1.27 to avoid compilation error in its headers

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8323:
--
Labels: pull-request-available  (was: )

> [C++] Pin gRPC at v1.27 to avoid compilation error in its headers
> -
>
> Key: ARROW-8323
> URL: https://issues.apache.org/jira/browse/ARROW-8323
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.16.0
>Reporter: Ben Kietzman
>Assignee: Ben Kietzman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>
> [gRPC 1.28|https://github.com/grpc/grpc/releases/tag/v1.28.0] includes a 
> change which introduces an implicit size_t->int conversion in proto_utils.h: 
> https://github.com/grpc/grpc/commit/2748755a4ff9ed940356e78c105f55f839fdf38b
> Conversion warnings are treated as errors for example here: 
> https://ci.appveyor.com/project/BenjaminKietzman/arrow/build/job/9cl0vqa8e495knn3#L1126
> So IIUC we need to pin gRPC to 1.27 for now.
> Upstream PR: https://github.com/grpc/grpc/pull/22557





[jira] [Assigned] (ARROW-8244) [Python][Parquet] Add `write_to_dataset` option to populate the "file_path" metadata fields

2020-04-02 Thread Francois Saint-Jacques (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques reassigned ARROW-8244:
-

Assignee: Joris Van den Bossche

> [Python][Parquet] Add `write_to_dataset` option to populate the "file_path" 
> metadata fields
> ---
>
> Key: ARROW-8244
> URL: https://issues.apache.org/jira/browse/ARROW-8244
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Python
>Reporter: Rick Zamora
>Assignee: Joris Van den Bossche
>Priority: Minor
>  Labels: parquet, pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Prior to [dask#6023|https://github.com/dask/dask/pull/6023], Dask had been 
> using the `write_to_dataset` API to write partitioned parquet datasets. That 
> PR is switching to a (hopefully temporary) custom solution, because the API 
> makes it difficult to populate the "file_path" column-chunk metadata 
> fields that are returned within the optional `metadata_collector` kwarg. 
> Dask needs to set these fields correctly in order to generate a proper global 
> `"_metadata"` file.
> Possible solutions to this problem:
>  # Optionally populate the file-path fields within `write_to_dataset`
>  # Always populate the file-path fields within `write_to_dataset`
>  # Return the file paths for the data written within `write_to_dataset` (up 
> to the user to manually populate the file-path fields)
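Option 3 above can be sketched in pure Python. The `set_file_path` name mirrors pyarrow's `FileMetaData` method; everything else here (the stand-in class, the fixed file name, the simplified `write_to_dataset`) is illustrative, not the real API.

```python
class FileMetaData:
    # Minimal stand-in for pyarrow.parquet.FileMetaData.
    def __init__(self):
        self.file_path = None

    def set_file_path(self, path):
        self.file_path = path

def write_to_dataset(table, root_path, metadata_collector):
    # Pretend we wrote one partitioned file and collected its metadata.
    path = "part-0.parquet"
    metadata_collector.append(FileMetaData())
    return [path]                 # option 3: hand the written paths back

collected = []
paths = write_to_dataset(None, "/data", metadata_collector=collected)
for md, p in zip(collected, paths):
    md.set_file_path(p)           # caller populates file_path for _metadata
```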


