[jira] [Commented] (ARROW-7954) [Packaging][Crossbow] travis-gandiva-jar-osx job fails to upload jar to github releases page

2020-02-26 Thread Prudhvi Porandla (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046243#comment-17046243
 ] 

Prudhvi Porandla commented on ARROW-7954:
-

[~kszucs], I need your help with this. I couldn't figure out the cause from the logs.

Thanks.

> [Packaging][Crossbow] travis-gandiva-jar-osx job fails to upload jar to 
> github releases page
> 
>
> Key: ARROW-7954
> URL: https://issues.apache.org/jira/browse/ARROW-7954
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Prudhvi Porandla
>Assignee: Krisztian Szucs
>Priority: Major
>
> Lately the jar is present only in the Linux release and not in the OSX one.
> The nightly-2020-02-26-0-travis-gandiva-jar-* jobs succeeded;
> https://github.com/ursa-labs/crossbow/releases/tag/nightly-2020-02-26-0-travis-gandiva-jar-trusty
> has the jar file, but
> https://github.com/ursa-labs/crossbow/releases/tag/nightly-2020-02-26-0-travis-gandiva-jar-osx
> doesn't.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7954) [Packaging][Crossbow] travis-gandiva-jar-osx job fails to upload jar to github releases page

2020-02-26 Thread Prudhvi Porandla (Jira)
Prudhvi Porandla created ARROW-7954:
---

 Summary: [Packaging][Crossbow] travis-gandiva-jar-osx job fails to 
upload jar to github releases page
 Key: ARROW-7954
 URL: https://issues.apache.org/jira/browse/ARROW-7954
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Prudhvi Porandla
Assignee: Krisztian Szucs


Lately the jar is present only in the Linux release and not in the OSX one.

The nightly-2020-02-26-0-travis-gandiva-jar-* jobs succeeded;
https://github.com/ursa-labs/crossbow/releases/tag/nightly-2020-02-26-0-travis-gandiva-jar-trusty
has the jar file, but
https://github.com/ursa-labs/crossbow/releases/tag/nightly-2020-02-26-0-travis-gandiva-jar-osx
doesn't.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7953) pyarrow.LocalFileSystem did not implement the method delete()

2020-02-26 Thread Liang Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Zhang updated ARROW-7953:
---
Description: 
[https://github.com/apache/arrow/blob/master/python/pyarrow/filesystem.py#L51]

The abstract class FileSystem has a delete() method, but the LocalFileSystem 
doesn't implement it.
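
A minimal sketch of what a caller hits as a result (this uses the legacy
pyarrow.filesystem API; the path is illustrative):

{code:python}
import pyarrow.filesystem as pafs

fs = pafs.LocalFileSystem()
# delete() is declared on the abstract FileSystem but never overridden here,
# so the base-class stub is reached:
fs.delete("/tmp/some_file")  # raises NotImplementedError
{code}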

> pyarrow.LocalFileSystem did not implement the method delete()
> -
>
> Key: ARROW-7953
> URL: https://issues.apache.org/jira/browse/ARROW-7953
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Liang Zhang
>Priority: Major
>
> [https://github.com/apache/arrow/blob/master/python/pyarrow/filesystem.py#L51]
> The abstract class FileSystem has a delete() method, but the LocalFileSystem 
> doesn't implement it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7953) pyarrow.LocalFileSystem did not implement the method delete()

2020-02-26 Thread Liang Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Zhang updated ARROW-7953:
---
Summary: pyarrow.LocalFileSystem did not implement the method delete()  
(was: LocalFileSystem did not implement the method delete())

> pyarrow.LocalFileSystem did not implement the method delete()
> -
>
> Key: ARROW-7953
> URL: https://issues.apache.org/jira/browse/ARROW-7953
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Liang Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7953) LocalFileSystem did not implement the method delete()

2020-02-26 Thread Liang Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Zhang updated ARROW-7953:
---
Component/s: Python

> LocalFileSystem did not implement the method delete()
> -
>
> Key: ARROW-7953
> URL: https://issues.apache.org/jira/browse/ARROW-7953
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Liang Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7953) LocalFileSystem did not implement the method delete()

2020-02-26 Thread Liang Zhang (Jira)
Liang Zhang created ARROW-7953:
--

 Summary: LocalFileSystem did not implement the method delete()
 Key: ARROW-7953
 URL: https://issues.apache.org/jira/browse/ARROW-7953
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Liang Zhang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7746) [Java] Support large buffer for IPC

2020-02-26 Thread Micah Kornfield (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046221#comment-17046221
 ] 

Micah Kornfield commented on ARROW-7746:


Why is an org.apache.arrow.flight.ArrowMessage necessary for IPC? I imagine it
would be required for Flight, but I don't think it should be required for IPC.

> [Java] Support large buffer for IPC
> ---
>
> Key: ARROW-7746
> URL: https://issues.apache.org/jira/browse/ARROW-7746
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Liya Fan
>Priority: Major
>
> The motivation is described in 
> https://github.com/apache/arrow/pull/6323#issuecomment-580137629.
> When the size of the ArrowBuf exceeds 2GB, our Flight library does not work 
> due to integer overflow. 
> This is because internally, we have used some data structures which are based 
> on 32-bit integers. To resolve the problem, we must revise/replace the data 
> structures to make them support 64-bit integers. 
> As a concrete example, we can see that when the server sends data through 
> IPC, an org.apache.arrow.flight.ArrowMessage object is created, and is 
> wrapped as an InputStream through the `asInputStream` method. In this method, 
> we use data structures like java.io.ByteArrayOutputStream and 
> io.netty.buffer.ByteBuf, which are based on 32-bit integers (we can observe 
> that NettyArrowBuf#length and ByteArrayOutputStream#count are both 32-bit 
> integers). 
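
To make the overflow concrete, here is a small Python sketch of what happens
when a >2GB length is forced through a 32-bit signed field (illustrative only;
the real code paths are the Java classes named above):

{code:python}
import struct

size = 3 * 1024**3  # a 3 GB ArrowBuf length

# A 64-bit length representation is fine:
struct.pack("<q", size)

# But a 32-bit signed field (like NettyArrowBuf#length or
# ByteArrayOutputStream#count) cannot hold it:
try:
    struct.pack("<i", size)
except struct.error as exc:
    print(exc)  # the value does not fit in a 32-bit signed int
{code}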



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-3247) [Python] Support spark parquet array and map types

2020-02-26 Thread Micah Kornfield (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046191#comment-17046191
 ] 

Micah Kornfield commented on ARROW-3247:


That is correct. There are other open JIRAs on this and a discussion on the 
mailing list.

> [Python] Support spark parquet array and map types
> --
>
> Key: ARROW-3247
> URL: https://issues.apache.org/jira/browse/ARROW-3247
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Martin Durant
>Priority: Minor
>  Labels: parquet
>
> As far as I understand, there is already some support for nested 
> array/dict/structs in Arrow. However, Spark Map and List types are structured 
> one level deeper (I believe to allow for both NULL and empty entries). 
> Surprisingly, fastparquet can load these. I do not know the plan for 
> arbitrary nested object support, but it should be made clear.
> Schema of spark-generated file from the fastparquet test suite:
> {code:java}
>  - spark_schema:
> | - map_op_op: MAP, OPTIONAL
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, OPTIONAL
> | - map_op_req: MAP, OPTIONAL
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, REQUIRED
> | - map_req_op: MAP, REQUIRED
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, OPTIONAL
> | - map_req_req: MAP, REQUIRED
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, REQUIRED
> | - arr_op_op: LIST, OPTIONAL
> |   - list: REPEATED
> | - element: BYTE_ARRAY, UTF8, OPTIONAL
> | - arr_op_req: LIST, OPTIONAL
> |   - list: REPEATED
> | - element: BYTE_ARRAY, UTF8, REQUIRED
> | - arr_req_op: LIST, REQUIRED
> |   - list: REPEATED
> | - element: BYTE_ARRAY, UTF8, OPTIONAL
>   - arr_req_req: LIST, REQUIRED
> - list: REPEATED
>   - element: BYTE_ARRAY, UTF8, REQUIRED
> {code}
> (please forgive that some of this has already been mentioned elsewhere; this 
> is one of the entries in the list at 
> [https://github.com/dask/fastparquet/issues/374] as a feature that is useful 
> in fastparquet)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7940) [C++] Unable to generate cmake build with settings other than default

2020-02-26 Thread Valery Vybornov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046164#comment-17046164
 ] 

Valery Vybornov commented on ARROW-7940:


Yes, I'll submit a PR.

> [C++] Unable to generate cmake build with settings other than default
> -
>
> Key: ARROW-7940
> URL: https://issues.apache.org/jira/browse/ARROW-7940
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.16.0
> Environment: Windows 10
> Visual Studio 2019 Build Tools 16.4.5
>Reporter: Valery Vybornov
>Priority: Major
> Attachments: log.txt
>
>
> Steps to reproduce:
>  # Install conda-forge as described here: 
> [https://arrow.apache.org/docs/developers/cpp/windows.html#using-conda-forge-for-build-dependencies]
>  # Install ninja+clcache 
> [https://arrow.apache.org/docs/developers/cpp/windows.html#building-with-ninja-and-clcache]
>  # (git bash) git clone [https://github.com/apache/arrow.git]
> cd arrow/
> git checkout apache-arrow-0.16.0
>  # (cmd)
> call C:\Users\vvv\Miniconda3\Scripts\activate.bat C:\Users\vvv\Miniconda3
> call "C:\Program Files (x86)\Microsoft Visual 
> Studio\2019\BuildTools\VC\Auxiliary\Build\vcvarsall.bat" x64
> call conda activate arrow-dev
>  # cd arrow\cpp
> mkdir build
> cd build
>  # cmake -G "Ninja" -DARROW_BUILD_EXAMPLES=ON -DARROW_BUILD_UTILITIES=ON 
> -DARROW_FLIGHT=ON -DARROW_GANDIVA=ON -DARROW_PARQUET=ON ..
>  # cmake --build . --config Release
> Expected results: Examples, utilities, flight, gandiva, parquet built.
> Actual results: Default configuration and none of the above built. 
> cmake_summary.json indicates all these features OFF. Following lines in the 
> output of cmake:
> {code:java}
> -- Configuring done
> You have changed variables that require your cache to be deleted.
> Configure will be re-run and you may have to reset some variables.
> The following variables have changed:
> CMAKE_CXX_COMPILER= C:/Program Files (x86)/Microsoft Visual 
> Studio/2019/BuildTools/VC/Tools/MSVC/14.24.28314/bin/Hostx64/x64/cl.exe
> -- Building using CMake version: 3.16.4 {code}
> Full cmake output attached



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-7625) [GLib] Parquet GLib and Red Parquet (Ruby) do not allow specifying compression type

2020-02-26 Thread Yosuke Shiro (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yosuke Shiro resolved ARROW-7625.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 6336
[https://github.com/apache/arrow/pull/6336]

> [GLib] Parquet GLib and Red Parquet (Ruby) do not allow specifying 
> compression type
> ---
>
> Key: ARROW-7625
> URL: https://issues.apache.org/jira/browse/ARROW-7625
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
> Environment: red-arrow 0.15.1
> red-parquet 0.15.1
> libarrow 0.15.1
> libparquet 0.15.1
>Reporter: Keith Gable
>Assignee: Yosuke Shiro
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> It seems that the ArrowFileWriter being used by parquet-glib just uses the 
> default writer properties 
> ([https://github.com/apache/arrow/blob/master/c_glib/parquet-glib/arrow-file-writer.cpp#L184),]
>  and does not offer the user the ability to override this. As a consumer of 
> the GLib API in Ruby (red-parquet), I therefore have no way of compressing 
> Parquet columns. Of course, I can compress the entire file by doing something 
> like {{t.save('...', format: 'parquet', compression: 'GZIP')}}, but this is 
> not compatible with most tools and isn't the correct way of compressing a 
> Parquet file.
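
For comparison, pyarrow's Parquet writer already exposes this knob; a sketch of
per-column compression from Python (table and column names are illustrative):

{code:python}
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"a": [1, 2, 3], "b": ["x", "y", "z"]})
# Per-column codecs -- the option the GLib writer did not expose:
pq.write_table(table, "compressed.parquet",
               compression={"a": "gzip", "b": "snappy"})
{code}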



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-1571) [C++] Implement argsort kernels (sort indices) for integers using O(n) counting sort

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1571:
--
Labels: Analytics pull-request-available  (was: Analytics)

> [C++] Implement argsort kernels (sort indices) for integers using O(n) 
> counting sort
> 
>
> Key: ARROW-1571
> URL: https://issues.apache.org/jira/browse/ARROW-1571
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Yibo Cai
>Priority: Major
>  Labels: Analytics, pull-request-available
> Fix For: 2.0.0
>
> Attachments: e5-2650.png
>
>
> This function requires knowledge of the minimum and maximum of an array. If 
> it is small enough, then an array of size {{maximum - minimum}} can be 
> constructed and used to tabulate value frequencies and then compute the sort 
> indices (this is called "grade up" or "grade down" in APL languages). There 
> is generally a cross-over point where this function performs worse than 
> mergesort or quicksort due to data locality issues
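
A minimal Python sketch of the counting-sort argsort described above
(illustrative only, not the C++ kernel; it assumes the value range
{{maximum - minimum}} fits comfortably in memory):

{code:python}
import numpy as np

def counting_argsort(values):
    # O(n + k) stable argsort for integers, k = max - min + 1
    lo, hi = values.min(), values.max()
    counts = np.zeros(hi - lo + 2, dtype=np.int64)
    for v in values:                      # tabulate value frequencies
        counts[v - lo + 1] += 1
    offsets = np.cumsum(counts)           # start offset of each value bucket
    indices = np.empty(len(values), dtype=np.int64)
    for i, v in enumerate(values):
        indices[offsets[v - lo]] = i      # stable placement into buckets
        offsets[v - lo] += 1
    return indices

a = np.array([3, 1, 2, 1, 3])
print(counting_argsort(a))  # [1 3 2 0 4]
{code}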



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (ARROW-7952) [C++][Parquet] Error when failing to read original Arrow schema from Parquet metadata

2020-02-26 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-7952.
---
Resolution: Not A Problem

Started investigating and answered my own question

https://github.com/apache/arrow/commit/4fe330aa4ed4564c9502733e25fc2b762e1002bf

The base64-encoding of the metadata was implemented after this file was 
generated -- the non-base64-encoded version was never released, so this old 
file should simply be overwritten
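
For anyone checking a file, a sketch of how to inspect the stored Arrow schema
entry (the "ARROW:schema" key is where the base64-encoded schema lives; the
file name is from this issue):

{code:python}
import pyarrow.parquet as pq

md = pq.read_metadata("fec-2012.parquet")
kv = md.metadata  # Parquet key/value metadata, bytes -> bytes
# After the base64 change, this value is a base64-encoded IPC schema message:
print(kv.get(b"ARROW:schema"))
{code}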

> [C++][Parquet] Error when failing to read original Arrow schema from Parquet 
> metadata
> -
>
> Key: ARROW-7952
> URL: https://issues.apache.org/jira/browse/ARROW-7952
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: Wes McKinney
>Priority: Major
>
> I experienced the following failure
> {code}
> ~/code/arrow/python/pyarrow/_parquet.pyx in 
> pyarrow._parquet.ParquetReader.open()
> ~/code/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Tried reading schema message, was null or length 0
> In ../src/parquet/arrow/reader_internal.cc, line 596, code: 
> ::arrow::ipc::ReadSchema(, _memo, out)
> In ../src/parquet/arrow/reader_internal.cc, line 672, code: 
> GetOriginSchema(metadata, >schema_metadata, 
> >origin_schema)
> {code}
> when reading the following file
> https://github.com/wesm/vldb-2019-apache-arrow-workshop/raw/1e9cf24bd6b8ae03e419e15ebc78b2e8135b8e7a/fec-2012.parquet
> I don't know whether this file is malformed (it was generated from a 
> development version of Arrow), so this may not actually be a problem, but 
> this mode of failure was unexpected and so I would like to understand why it 
> happened



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7952) [C++][Parquet] Error when failing to read original Arrow schema from Parquet metadata

2020-02-26 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-7952:
---

 Summary: [C++][Parquet] Error when failing to read original Arrow 
schema from Parquet metadata
 Key: ARROW-7952
 URL: https://issues.apache.org/jira/browse/ARROW-7952
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Reporter: Wes McKinney


I experienced the following failure

{code}
~/code/arrow/python/pyarrow/_parquet.pyx in 
pyarrow._parquet.ParquetReader.open()
~/code/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Tried reading schema message, was null or length 0
In ../src/parquet/arrow/reader_internal.cc, line 596, code: 
::arrow::ipc::ReadSchema(, _memo, out)
In ../src/parquet/arrow/reader_internal.cc, line 672, code: 
GetOriginSchema(metadata, >schema_metadata, >origin_schema)
{code}

when reading the following file

https://github.com/wesm/vldb-2019-apache-arrow-workshop/raw/1e9cf24bd6b8ae03e419e15ebc78b2e8135b8e7a/fec-2012.parquet

I don't know whether this file is malformed (it was generated from a 
development version of Arrow), so this may not actually be a problem, but this 
mode of failure was unexpected and so I would like to understand why it happened



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7951) [Python][Parquet] Expose BYTE_STREAM_SPLIT to pyarrow

2020-02-26 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045996#comment-17045996
 ] 

Antoine Pitrou commented on ARROW-7951:
---

+1. At a minimum, pyarrow should be able to read BYTE_STREAM_SPLIT encoded 
columns and display their encoding properly (add the required enum field 
conversion).

Also cc [~jorisvandenbossche]

> [Python][Parquet] Expose BYTE_STREAM_SPLIT to pyarrow
> -
>
> Key: ARROW-7951
> URL: https://issues.apache.org/jira/browse/ARROW-7951
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Martin Radev
>Assignee: Martin Radev
>Priority: Minor
>  Labels: parquet
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The Parquet writer now supports the option of selecting the 
> BYTE_STREAM_SPLIT encoding. It would be nice to have it exposed in pyarrow.
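
A sketch of what the exposed option might look like from Python (the
use_byte_stream_split keyword is an assumption here, not part of pyarrow at the
time this was filed):

{code:python}
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"x": pa.array([1.0, 2.0, 3.0], type=pa.float32())})
# BYTE_STREAM_SPLIT applies to FLOAT/DOUBLE columns and requires
# dictionary encoding to be off for those columns:
pq.write_table(table, "data.parquet",
               use_dictionary=False,
               use_byte_stream_split=True)  # hypothetical flag
{code}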



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7951) [Python][Parquet] Expose BYTE_STREAM_SPLIT to pyarrow

2020-02-26 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-7951:

Summary: [Python][Parquet] Expose BYTE_STREAM_SPLIT to pyarrow  (was: 
Expose BYTE_STREAM_SPLIT to pyarrow)

> [Python][Parquet] Expose BYTE_STREAM_SPLIT to pyarrow
> -
>
> Key: ARROW-7951
> URL: https://issues.apache.org/jira/browse/ARROW-7951
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Martin Radev
>Assignee: Martin Radev
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The Parquet writer now supports the option of selecting the 
> BYTE_STREAM_SPLIT encoding. It would be nice to have it exposed in pyarrow.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7951) [Python][Parquet] Expose BYTE_STREAM_SPLIT to pyarrow

2020-02-26 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-7951:

Component/s: Python

> [Python][Parquet] Expose BYTE_STREAM_SPLIT to pyarrow
> -
>
> Key: ARROW-7951
> URL: https://issues.apache.org/jira/browse/ARROW-7951
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Martin Radev
>Assignee: Martin Radev
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The Parquet writer now supports the option of selecting the 
> BYTE_STREAM_SPLIT encoding. It would be nice to have it exposed in pyarrow.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7951) [Python][Parquet] Expose BYTE_STREAM_SPLIT to pyarrow

2020-02-26 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-7951:

Labels: parquet  (was: )

> [Python][Parquet] Expose BYTE_STREAM_SPLIT to pyarrow
> -
>
> Key: ARROW-7951
> URL: https://issues.apache.org/jira/browse/ARROW-7951
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Martin Radev
>Assignee: Martin Radev
>Priority: Minor
>  Labels: parquet
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The Parquet writer now supports the option of selecting the 
> BYTE_STREAM_SPLIT encoding. It would be nice to have it exposed in pyarrow.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7951) Expose BYTE_STREAM_SPLIT to pyarrow

2020-02-26 Thread Martin Radev (Jira)
Martin Radev created ARROW-7951:
---

 Summary: Expose BYTE_STREAM_SPLIT to pyarrow
 Key: ARROW-7951
 URL: https://issues.apache.org/jira/browse/ARROW-7951
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Martin Radev
Assignee: Martin Radev


The Parquet writer now supports the option of selecting the BYTE_STREAM_SPLIT 
encoding. It would be nice to have it exposed in pyarrow.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7926) [Developer] "archery lint" target is not ergonomic for running a single check like IWYU

2020-02-26 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045988#comment-17045988
 ] 

Antoine Pitrou commented on ARROW-7926:
---

Can't numpydoc use a config file?

> [Developer] "archery lint" target is not ergonomic for running a single check 
> like IWYU
> ---
>
> Key: ARROW-7926
> URL: https://issues.apache.org/jira/browse/ARROW-7926
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It might be useful to have a second lint CLI target with everything disabled 
> by default so that a single lint target can be toggled on. How should this be 
> used via docker-compose? See ARROW-7925



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7926) [Developer] "archery lint" target is not ergonomic for running a single check like IWYU

2020-02-26 Thread Krisztian Szucs (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045985#comment-17045985
 ] 

Krisztian Szucs commented on ARROW-7926:


That was one of the reasons, but it is often useful to pass additional options 
and arguments to the numpydoc validation, at least until we have no violations.

> [Developer] "archery lint" target is not ergonomic for running a single check 
> like IWYU
> ---
>
> Key: ARROW-7926
> URL: https://issues.apache.org/jira/browse/ARROW-7926
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It might be useful to have a second lint CLI target with everything disabled 
> by default so that a single lint target can be toggled on. How should this be 
> used via docker-compose? See ARROW-7925



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7940) [C++] Unable to generate cmake build with settings other than default

2020-02-26 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045983#comment-17045983
 ] 

Antoine Pitrou commented on ARROW-7940:
---

[~vvybornov] I agree with your suggestions. Do you want to submit a PR?

> [C++] Unable to generate cmake build with settings other than default
> -
>
> Key: ARROW-7940
> URL: https://issues.apache.org/jira/browse/ARROW-7940
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.16.0
> Environment: Windows 10
> Visual Studio 2019 Build Tools 16.4.5
>Reporter: Valery Vybornov
>Priority: Major
> Attachments: log.txt
>
>
> Steps to reproduce:
>  # Install conda-forge as described here: 
> [https://arrow.apache.org/docs/developers/cpp/windows.html#using-conda-forge-for-build-dependencies]
>  # Install ninja+clcache 
> [https://arrow.apache.org/docs/developers/cpp/windows.html#building-with-ninja-and-clcache]
>  # (git bash) git clone [https://github.com/apache/arrow.git]
> cd arrow/
> git checkout apache-arrow-0.16.0
>  # (cmd)
> call C:\Users\vvv\Miniconda3\Scripts\activate.bat C:\Users\vvv\Miniconda3
> call "C:\Program Files (x86)\Microsoft Visual 
> Studio\2019\BuildTools\VC\Auxiliary\Build\vcvarsall.bat" x64
> call conda activate arrow-dev
>  # cd arrow\cpp
> mkdir build
> cd build
>  # cmake -G "Ninja" -DARROW_BUILD_EXAMPLES=ON -DARROW_BUILD_UTILITIES=ON 
> -DARROW_FLIGHT=ON -DARROW_GANDIVA=ON -DARROW_PARQUET=ON ..
>  # cmake --build . --config Release
> Expected results: Examples, utilities, flight, gandiva, parquet built.
> Actual results: Default configuration and none of the above built. 
> cmake_summary.json indicates all these features OFF. Following lines in the 
> output of cmake:
> {code:java}
> -- Configuring done
> You have changed variables that require your cache to be deleted.
> Configure will be re-run and you may have to reset some variables.
> The following variables have changed:
> CMAKE_CXX_COMPILER= C:/Program Files (x86)/Microsoft Visual 
> Studio/2019/BuildTools/VC/Tools/MSVC/14.24.28314/bin/Hostx64/x64/cl.exe
> -- Building using CMake version: 3.16.4 {code}
> Full cmake output attached



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7949) [Developer] Update to '.gitignore' to not track user specific 'cpp/Brewfile.lock.json' file

2020-02-26 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-7949:

Summary: [Developer] Update to '.gitignore' to not track user specific  
'cpp/Brewfile.lock.json' file  (was: Update to '.gitignore' to not track user 
specific  'cpp/Brewfile.lock.json' file)

> [Developer] Update to '.gitignore' to not track user specific  
> 'cpp/Brewfile.lock.json' file
> 
>
> Key: ARROW-7949
> URL: https://issues.apache.org/jira/browse/ARROW-7949
> Project: Apache Arrow
>  Issue Type: Improvement
> Environment: macOS-10.15.3
>Reporter: Tarek Allam
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In the developer guides for Python, there is a suggestion for users on macOS 
> to use Homebrew to install all dependencies required for building Arrow C++. 
> This creates a 'cpp/Brewfile.lock.json' file that is specific to the system it 
> sits on.
> It would be desirable for this not to be tracked by version control. To 
> prevent this accidental addition, perhaps it should be ignored in the 
> .gitignore file for the repository.
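
The proposed change amounts to a one-line ignore rule (sketch):

{code}
# .gitignore
cpp/Brewfile.lock.json
{code}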



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7940) [C++] Unable to generate cmake build with settings other than default

2020-02-26 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045974#comment-17045974
 ] 

Wes McKinney commented on ARROW-7940:
-

I see. I think [~apitrou] set up the clcache stuff originally, so I will defer 
to his judgment

> [C++] Unable to generate cmake build with settings other than default
> -
>
> Key: ARROW-7940
> URL: https://issues.apache.org/jira/browse/ARROW-7940
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.16.0
> Environment: Windows 10
> Visual Studio 2019 Build Tools 16.4.5
>Reporter: Valery Vybornov
>Priority: Major
> Attachments: log.txt
>
>
> Steps to reproduce:
>  # Install conda-forge as described here: 
> [https://arrow.apache.org/docs/developers/cpp/windows.html#using-conda-forge-for-build-dependencies]
>  # Install ninja+clcache 
> [https://arrow.apache.org/docs/developers/cpp/windows.html#building-with-ninja-and-clcache]
>  # (git bash) git clone [https://github.com/apache/arrow.git]
> cd arrow/
> git checkout apache-arrow-0.16.0
>  # (cmd)
> call C:\Users\vvv\Miniconda3\Scripts\activate.bat C:\Users\vvv\Miniconda3
> call "C:\Program Files (x86)\Microsoft Visual 
> Studio\2019\BuildTools\VC\Auxiliary\Build\vcvarsall.bat" x64
> call conda activate arrow-dev
>  # cd arrow\cpp
> mkdir build
> cd build
>  # cmake -G "Ninja" -DARROW_BUILD_EXAMPLES=ON -DARROW_BUILD_UTILITIES=ON 
> -DARROW_FLIGHT=ON -DARROW_GANDIVA=ON -DARROW_PARQUET=ON ..
>  # cmake --build . --config Release
> Expected results: Examples, utilities, flight, gandiva, parquet built.
> Actual results: Default configuration and none of the above built. 
> cmake_summary.json indicates all these features OFF. Following lines in the 
> output of cmake:
> {code:java}
> -- Configuring done
> You have changed variables that require your cache to be deleted.
> Configure will be re-run and you may have to reset some variables.
> The following variables have changed:
> CMAKE_CXX_COMPILER= C:/Program Files (x86)/Microsoft Visual 
> Studio/2019/BuildTools/VC/Tools/MSVC/14.24.28314/bin/Hostx64/x64/cl.exe
> -- Building using CMake version: 3.16.4 {code}
> Full cmake output attached



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7950) [Python] When initializing pandas API shim, inform user if their installed pandas version is too old

2020-02-26 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-7950:

Summary: [Python] When initializing pandas API shim, inform user if their 
installed pandas version is too old  (was: [Python)

> [Python] When initializing pandas API shim, inform user if their installed 
> pandas version is too old
> 
>
> Key: ARROW-7950
> URL: https://issues.apache.org/jira/browse/ARROW-7950
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Wes McKinney
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7950) [Python

2020-02-26 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-7950:
---

 Summary: [Python
 Key: ARROW-7950
 URL: https://issues.apache.org/jira/browse/ARROW-7950
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Wes McKinney






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-3247) [Python] Support spark parquet array and map types

2020-02-26 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045913#comment-17045913
 ] 

Brian Hulette commented on ARROW-3247:
--

I think the Arrow parquet reader just doesn't handle maps of any sort. There 
don't seem to be any tests for it. It looks like we will run into a 
list-of-structs and crash here: 
[https://github.com/apache/arrow/blob/b557587f4f7c8b547fea45dc98b9182f3f5e9bf7/cpp/src/parquet/arrow/reader.cc#L717-L722]

> [Python] Support spark parquet array and map types
> --
>
> Key: ARROW-3247
> URL: https://issues.apache.org/jira/browse/ARROW-3247
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Martin Durant
>Priority: Minor
>  Labels: parquet
>
> As far as I understand, there is already some support for nested 
> array/dict/structs in Arrow. However, Spark Map and List types are structured 
> one level deeper (I believe to allow for both NULL and empty entries). 
> Surprisingly, fastparquet can load these. I do not know the plan for 
> arbitrary nested object support, but it should be made clear.
> Schema of spark-generated file from the fastparquet test suite:
> {code:java}
>  - spark_schema:
> | - map_op_op: MAP, OPTIONAL
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, OPTIONAL
> | - map_op_req: MAP, OPTIONAL
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, REQUIRED
> | - map_req_op: MAP, REQUIRED
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, OPTIONAL
> | - map_req_req: MAP, REQUIRED
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, REQUIRED
> | - arr_op_op: LIST, OPTIONAL
> |   - list: REPEATED
> | - element: BYTE_ARRAY, UTF8, OPTIONAL
> | - arr_op_req: LIST, OPTIONAL
> |   - list: REPEATED
> | - element: BYTE_ARRAY, UTF8, REQUIRED
> | - arr_req_op: LIST, REQUIRED
> |   - list: REPEATED
> | - element: BYTE_ARRAY, UTF8, OPTIONAL
>   - arr_req_req: LIST, REQUIRED
> - list: REPEATED
>   - element: BYTE_ARRAY, UTF8, REQUIRED
> {code}
> (please forgive that some of this has already been mentioned elsewhere; this 
> is one of the entries in the list at 
> [https://github.com/dask/fastparquet/issues/374] as a feature that is useful 
> in fastparquet)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-3247) [Python] Support spark parquet array and map types

2020-02-26 Thread Martin Durant (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045910#comment-17045910
 ] 

Martin Durant commented on ARROW-3247:
--

They are allowed by the Parquet spec, but they are not the simplest way to 
express arrays/maps. This is the structure that Spark decided to go with, 
perhaps because the extra level of nesting allows both a NULL and an empty map 
for a given entry.
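
The distinction that the extra level buys can be seen directly in Arrow (a
small sketch; the three-level Parquet encoding mirrors this null-vs-empty
difference):

{code:python}
import pyarrow as pa

# A NULL list and an empty list are distinct values:
arr = pa.array([None, [], ["a"]], type=pa.list_(pa.string()))
print(arr.is_valid())  # entry 0 is NULL; entry 1 is present but empty
{code}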

> [Python] Support spark parquet array and map types
> --
>
> Key: ARROW-3247
> URL: https://issues.apache.org/jira/browse/ARROW-3247
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Martin Durant
>Priority: Minor
>  Labels: parquet
>
> As far as I understand, there is already some support for nested 
> array/dict/structs in Arrow. However, Spark Map and List types are structured 
> one level deeper (I believe to allow for both NULL and empty entries). 
> Surprisingly, fastparquet can load these. I do not know the plan for 
> arbitrary nested object support, but it should be made clear.
> Schema of spark-generated file from the fastparquet test suite:
> {code:java}
>  - spark_schema:
> | - map_op_op: MAP, OPTIONAL
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, OPTIONAL
> | - map_op_req: MAP, OPTIONAL
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, REQUIRED
> | - map_req_op: MAP, REQUIRED
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, OPTIONAL
> | - map_req_req: MAP, REQUIRED
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, REQUIRED
> | - arr_op_op: LIST, OPTIONAL
> |   - list: REPEATED
> | - element: BYTE_ARRAY, UTF8, OPTIONAL
> | - arr_op_req: LIST, OPTIONAL
> |   - list: REPEATED
> | - element: BYTE_ARRAY, UTF8, REQUIRED
> | - arr_req_op: LIST, REQUIRED
> |   - list: REPEATED
> | - element: BYTE_ARRAY, UTF8, OPTIONAL
>   - arr_req_req: LIST, REQUIRED
> - list: REPEATED
>   - element: BYTE_ARRAY, UTF8, REQUIRED
> {code}
> (please forgive that some of this has already been mentioned elsewhere; this 
> is one of the entries in the list at 
> [https://github.com/dask/fastparquet/issues/374] as a feature that is useful 
> in fastparquet)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-3247) [Python] Support spark parquet array and map types

2020-02-26 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045908#comment-17045908
 ] 

Brian Hulette commented on ARROW-3247:
--

[~mdurant] are these types different from the spec defined at 
[https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps]?

> [Python] Support spark parquet array and map types
> --
>
> Key: ARROW-3247
> URL: https://issues.apache.org/jira/browse/ARROW-3247
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Martin Durant
>Priority: Minor
>  Labels: parquet
>
> As far as I understand, there is already some support for nested 
> array/dict/structs in Arrow. However, Spark Map and List types are structured 
> one level deeper (I believe to allow for both NULL and empty entries). 
> Surprisingly, fastparquet can load these. I do not know the plan for 
> arbitrary nested object support, but it should be made clear.
> Schema of spark-generated file from the fastparquet test suite:
> {code:java}
>  - spark_schema:
> | - map_op_op: MAP, OPTIONAL
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, OPTIONAL
> | - map_op_req: MAP, OPTIONAL
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, REQUIRED
> | - map_req_op: MAP, REQUIRED
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, OPTIONAL
> | - map_req_req: MAP, REQUIRED
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, REQUIRED
> | - arr_op_op: LIST, OPTIONAL
> |   - list: REPEATED
> | - element: BYTE_ARRAY, UTF8, OPTIONAL
> | - arr_op_req: LIST, OPTIONAL
> |   - list: REPEATED
> | - element: BYTE_ARRAY, UTF8, REQUIRED
> | - arr_req_op: LIST, REQUIRED
> |   - list: REPEATED
> | - element: BYTE_ARRAY, UTF8, OPTIONAL
>   - arr_req_req: LIST, REQUIRED
> - list: REPEATED
>   - element: BYTE_ARRAY, UTF8, REQUIRED
> {code}
> (please forgive that some of this has already been mentioned elsewhere; this 
> is one of the entries in the list at 
> [https://github.com/dask/fastparquet/issues/374] as a feature that is useful 
> in fastparquet)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-3247) [Python] Support spark parquet array and map types

2020-02-26 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045896#comment-17045896
 ] 

Brian Hulette commented on ARROW-3247:
--

I added a code block around the spec so we don't need to switch to text mode.

> [Python] Support spark parquet array and map types
> --
>
> Key: ARROW-3247
> URL: https://issues.apache.org/jira/browse/ARROW-3247
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Martin Durant
>Priority: Minor
>  Labels: parquet
>
> As far as I understand, there is already some support for nested 
> array/dict/structs in Arrow. However, Spark Map and List types are structured 
> one level deeper (I believe to allow for both NULL and empty entries). 
> Surprisingly, fastparquet can load these. I do not know the plan for 
> arbitrary nested object support, but it should be made clear.
> Schema of spark-generated file from the fastparquet test suite:
> {code:java}
>  - spark_schema:
> | - map_op_op: MAP, OPTIONAL
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, OPTIONAL
> | - map_op_req: MAP, OPTIONAL
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, REQUIRED
> | - map_req_op: MAP, REQUIRED
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, OPTIONAL
> | - map_req_req: MAP, REQUIRED
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, REQUIRED
> | - arr_op_op: LIST, OPTIONAL
> |   - list: REPEATED
> | - element: BYTE_ARRAY, UTF8, OPTIONAL
> | - arr_op_req: LIST, OPTIONAL
> |   - list: REPEATED
> | - element: BYTE_ARRAY, UTF8, REQUIRED
> | - arr_req_op: LIST, REQUIRED
> |   - list: REPEATED
> | - element: BYTE_ARRAY, UTF8, OPTIONAL
>   - arr_req_req: LIST, REQUIRED
> - list: REPEATED
>   - element: BYTE_ARRAY, UTF8, REQUIRED
> {code}
> (please forgive that some of this has already been mentioned elsewhere; this 
> is one of the entries in the list at 
> [https://github.com/dask/fastparquet/issues/374] as a feature that is useful 
> in fastparquet)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-3247) [Python] Support spark parquet array and map types

2020-02-26 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-3247:
-
Description: 
As far as I understand, there is already some support for nested 
array/dict/structs in Arrow. However, Spark Map and List types are structured 
one level deeper (I believe to allow for both NULL and empty entries). 
Surprisingly, fastparquet can load these. I do not know the plan for arbitrary 
nested object support, but it should be made clear.

Schema of spark-generated file from the fastparquet test suite:
{code:java}
 - spark_schema:
| - map_op_op: MAP, OPTIONAL
|   - key_value: REPEATED
|   | - key: BYTE_ARRAY, UTF8, REQUIRED
| - value: BYTE_ARRAY, UTF8, OPTIONAL
| - map_op_req: MAP, OPTIONAL
|   - key_value: REPEATED
|   | - key: BYTE_ARRAY, UTF8, REQUIRED
| - value: BYTE_ARRAY, UTF8, REQUIRED
| - map_req_op: MAP, REQUIRED
|   - key_value: REPEATED
|   | - key: BYTE_ARRAY, UTF8, REQUIRED
| - value: BYTE_ARRAY, UTF8, OPTIONAL
| - map_req_req: MAP, REQUIRED
|   - key_value: REPEATED
|   | - key: BYTE_ARRAY, UTF8, REQUIRED
| - value: BYTE_ARRAY, UTF8, REQUIRED
| - arr_op_op: LIST, OPTIONAL
|   - list: REPEATED
| - element: BYTE_ARRAY, UTF8, OPTIONAL
| - arr_op_req: LIST, OPTIONAL
|   - list: REPEATED
| - element: BYTE_ARRAY, UTF8, REQUIRED
| - arr_req_op: LIST, REQUIRED
|   - list: REPEATED
| - element: BYTE_ARRAY, UTF8, OPTIONAL
  - arr_req_req: LIST, REQUIRED
- list: REPEATED
  - element: BYTE_ARRAY, UTF8, REQUIRED
{code}
(please forgive that some of this has already been mentioned elsewhere; this is 
one of the entries in the list at 
[https://github.com/dask/fastparquet/issues/374] as a feature that is useful in 
fastparquet)

  was:
As far as I understand, there is already some support for nested 
array/dict/structs in Arrow. However, Spark Map and List types are structured 
one level deeper (I believe to allow for both NULL and empty entries). 
Surprisingly, fastparquet can load these. I do not know the plan for arbitrary 
nested object support, but it should be made clear.

Schema of spark-generated file from the fastparquet test suite (please see in 
text mode):

 - spark_schema:
| - map_op_op: MAP, OPTIONAL
|   - key_value: REPEATED
|   | - key: BYTE_ARRAY, UTF8, REQUIRED
| - value: BYTE_ARRAY, UTF8, OPTIONAL
| - map_op_req: MAP, OPTIONAL
|   - key_value: REPEATED
|   | - key: BYTE_ARRAY, UTF8, REQUIRED
| - value: BYTE_ARRAY, UTF8, REQUIRED
| - map_req_op: MAP, REQUIRED
|   - key_value: REPEATED
|   | - key: BYTE_ARRAY, UTF8, REQUIRED
| - value: BYTE_ARRAY, UTF8, OPTIONAL
| - map_req_req: MAP, REQUIRED
|   - key_value: REPEATED
|   | - key: BYTE_ARRAY, UTF8, REQUIRED
| - value: BYTE_ARRAY, UTF8, REQUIRED
| - arr_op_op: LIST, OPTIONAL
|   - list: REPEATED
| - element: BYTE_ARRAY, UTF8, OPTIONAL
| - arr_op_req: LIST, OPTIONAL
|   - list: REPEATED
| - element: BYTE_ARRAY, UTF8, REQUIRED
| - arr_req_op: LIST, REQUIRED
|   - list: REPEATED
| - element: BYTE_ARRAY, UTF8, OPTIONAL
  - arr_req_req: LIST, REQUIRED
- list: REPEATED
  - element: BYTE_ARRAY, UTF8, REQUIRED

(please forgive that some of this has already been mentioned elsewhere; this is 
one of the entries in the list at 
https://github.com/dask/fastparquet/issues/374 as a feature that is useful in 
fastparquet)


> [Python] Support spark parquet array and map types
> --
>
> Key: ARROW-3247
> URL: https://issues.apache.org/jira/browse/ARROW-3247
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Martin Durant
>Priority: Minor
>  Labels: parquet
>
> As far as I understand, there is already some support for nested 
> array/dict/structs in Arrow. However, Spark Map and List types are structured 
> one level deeper (I believe to allow for both NULL and empty entries). 
> Surprisingly, fastparquet can load these. I do not know the plan for 
> arbitrary nested object support, but it should be made clear.
> Schema of spark-generated file from the fastparquet test suite:
> {code:java}
>  - spark_schema:
> | - map_op_op: MAP, OPTIONAL
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, OPTIONAL
> | - map_op_req: MAP, OPTIONAL
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, REQUIRED
> | - map_req_op: MAP, REQUIRED
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, OPTIONAL
> | - map_req_req: MAP, REQUIRED
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, REQUIRED
> | - arr_op_op: LIST, OPTIONAL
> |   - list: REPEATED
> | - element: BYTE_ARRAY, UTF8, OPTIONAL
> | - arr_op_req: LIST, 

[jira] [Commented] (ARROW-1957) [Python] Write nanosecond timestamps using new NANO LogicalType Parquet unit

2020-02-26 Thread Daniel Nugent (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045894#comment-17045894
 ] 

Daniel Nugent commented on ARROW-1957:
--

Ah, my apologies. I see where my error was. I failed to specify the 2.0 parquet 
file version.
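
For reference, a sketch of the fix he describes (the index construction is
modernized slightly relative to the snippet in the issue; version='2.0' opts
into the NANO logical type):

{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

n = 3
df = pd.DataFrame({'x': range(n)},
                  index=pd.date_range('2017-01-01', freq='1N', periods=n))
# Writing with the 2.0 format keeps nanosecond resolution instead of
# raising on the lossy cast to microseconds:
pq.write_table(pa.Table.from_pandas(df), '/tmp/t.parquet', version='2.0')
{code}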

> [Python] Write nanosecond timestamps using new NANO LogicalType Parquet unit
> 
>
> Key: ARROW-1957
> URL: https://issues.apache.org/jira/browse/ARROW-1957
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.8.0
> Environment: Python 3.6.4.  Mac OSX and CentOS Linux release 
> 7.3.1611.  Pandas 0.21.1 .
>Reporter: Jordan Samuels
>Assignee: TP Boudreau
>Priority: Minor
>  Labels: parquet
> Fix For: 0.14.0
>
>
> The following code
> {code}
> import pyarrow as pa
> import pyarrow.parquet as pq
> import pandas as pd
> n=3
> df = pd.DataFrame({'x': range(n)}, index=pd.DatetimeIndex(start='2017-01-01', 
> freq='1n', periods=n))
> pq.write_table(pa.Table.from_pandas(df), '/tmp/t.parquet'){code}
> results in:
> {{ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: 
> 14832288001}}
> The desired effect is that we can save nanosecond resolution without losing 
> precision (e.g. conversion to ms).  Note that if {{freq='1u'}} is used, the 
> code runs properly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7789) [R] Unknown error when using arrow::write_feather()  in R 3.5.3

2020-02-26 Thread Karl Dunkle Werner (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045843#comment-17045843
 ] 

Karl Dunkle Werner commented on ARROW-7789:
---

Reading hits the same issues.

> [R] Unknown error when using arrow::write_feather()  in R 3.5.3
> ---
>
> Key: ARROW-7789
> URL: https://issues.apache.org/jira/browse/ARROW-7789
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Martin
>Priority: Minor
>
> Unknown error when using arrow::write_feather()  in R 3.5.3
> pb = as.data.frame(seq(1:100))
> pbFilename <- file.path(getwd(), "reproduceBug.feather")
>  arrow::write_feather(x = pb, sink = pbFilename)
> >Error in exists(name, envir = envir, inherits = FALSE) : 
>  > use of NULL environment is defunct
>  
> packageVersion('arrow')
> [1] ‘0.15.1.1’



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7940) [C++] Unable to generate cmake build with settings other than default

2020-02-26 Thread Valery Vybornov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045842#comment-17045842
 ] 

Valery Vybornov commented on ARROW-7940:


The issue is caused by the following clcache configuration code in 
cpp/CMakeLists.txt:
{code:java}
if(MSVC
   AND ARROW_USE_CLCACHE
   AND (("${CMAKE_GENERATOR}" STREQUAL "NMake Makefiles")
OR ("${CMAKE_GENERATOR}" STREQUAL "Ninja")))
  find_program(CLCACHE_FOUND clcache)
  if(CLCACHE_FOUND)
set(CMAKE_CXX_COMPILER ${CLCACHE_FOUND})
  endif(CLCACHE_FOUND)
endif(){code}
Note that setting CMAKE_CXX_COMPILER at this point violates usage limitations 
for the setting, as stated here [https://cmake.org/Bug/view.php?id=14841]

I'd propose the following:
 1. Remove this piece of code entirely.
 2. Document that clcache should be explicitly specified via the command line 
or environment: 

{code:java}
cmake -DCMAKE_C_COMPILER=clcache -DCMAKE_CXX_COMPILER=clcache ... {code}
or, alternatively

{code:java}
set CC=clcache
set CXX=clcache
cmake ... {code}

 

> [C++] Unable to generate cmake build with settings other than default
> -
>
> Key: ARROW-7940
> URL: https://issues.apache.org/jira/browse/ARROW-7940
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.16.0
> Environment: Windows 10
> Visual Studio 2019 Build Tools 16.4.5
>Reporter: Valery Vybornov
>Priority: Major
> Attachments: log.txt
>
>
> Steps to reproduce:
>  # Install conda-forge as described here: 
> [https://arrow.apache.org/docs/developers/cpp/windows.html#using-conda-forge-for-build-dependencies]
>  # Install ninja+clcache 
> [https://arrow.apache.org/docs/developers/cpp/windows.html#building-with-ninja-and-clcache]
>  # (git bash) git clone [https://github.com/apache/arrow.git]
> cd arrow/
> git checkout apache-arrow-0.16.0
>  # (cmd)
> call C:\Users\vvv\Miniconda3\Scripts\activate.bat C:\Users\vvv\Miniconda3
> call "C:\Program Files (x86)\Microsoft Visual 
> Studio\2019\BuildTools\VC\Auxiliary\Build\vcvarsall.bat" x64
> call conda activate arrow-dev
>  # cd arrow\cpp
> mkdir build
> cd build
>  # cmake -G "Ninja" -DARROW_BUILD_EXAMPLES=ON -DARROW_BUILD_UTILITIES=ON 
> -DARROW_FLIGHT=ON -DARROW_GANDIVA=ON -DARROW_PARQUET=ON ..
>  # cmake --build . --config Release
> Expected results: Examples, utilities, flight, gandiva, parquet built.
> Actual results: Default configuration and none of the above built. 
> cmake_summary.json indicates all these features OFF. Following lines in the 
> output of cmake:
> {code:java}
> -- Configuring done
> You have changed variables that require your cache to be deleted.
> Configure will be re-run and you may have to reset some variables.
> The following variables have changed:
> CMAKE_CXX_COMPILER= C:/Program Files (x86)/Microsoft Visual 
> Studio/2019/BuildTools/VC/Tools/MSVC/14.24.28314/bin/Hostx64/x64/cl.exe
> -- Building using CMake version: 3.16.4 {code}
> Full cmake output attached



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7789) [R] Unknown error when using arrow::write_feather()  in R 3.5.3

2020-02-26 Thread Karl Dunkle Werner (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045840#comment-17045840
 ] 

Karl Dunkle Werner commented on ARROW-7789:
---

I'm getting the same error when the R.oo package is loaded (not even attached). 
Here's a reprex:

 
{code:r}
loadNamespace("R.oo")
#> 
arrow::write_parquet(mtcars, tempfile())
#> Error in exists(name, envir = envir, inherits = FALSE): use of NULL 
environment is defunct


sessionInfo()
#> R version 3.6.1 (2019-07-05)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 19.10

#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.3.7.so

#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C  
#>  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
#>  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C 
#>  [9] LC_ADDRESS=C   LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C   
#> 
#> attached base packages:
#> [1] stats graphics  grDevices utils datasets  methods   base 

#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.0.0  bit_1.1-15.2  compiler_3.6.1magrittr_1.5 
#>  [5] assertthat_0.2.1  R6_2.4.1  glue_1.3.1Rcpp_1.0.3   
#>  [9] bit64_0.9-7   vctrs_0.2.3   R.methodsS3_1.8.0 arrow_0.16.0.2   
#> [13] rlang_0.4.4   R.oo_1.23.0   purrr_0.3.3  

{code}

> [R] Unknown error when using arrow::write_feather()  in R 3.5.3
> ---
>
> Key: ARROW-7789
> URL: https://issues.apache.org/jira/browse/ARROW-7789
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Martin
>Priority: Minor
>
> Unknown error when using arrow::write_feather()  in R 3.5.3
> pb = as.data.frame(seq(1:100))
> pbFilename <- file.path(getwd(), "reproduceBug.feather")
>  arrow::write_feather(x = pb, sink = pbFilename)
> >Error in exists(name, envir = envir, inherits = FALSE) : 
>  > use of NULL environment is defunct
>  
> packageVersion('arrow')
> [1] ‘0.15.1.1’



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6766) [Python] libarrow_python..dylib does not exist

2020-02-26 Thread Uwe Korn (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Korn resolved ARROW-6766.
-
Resolution: Cannot Reproduce

> [Python] libarrow_python..dylib does not exist
> --
>
> Key: ARROW-6766
> URL: https://issues.apache.org/jira/browse/ARROW-6766
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.14.0, 0.15.0
>Reporter: Tarek Allam
>Priority: Major
>
> After following the instructions found on the developer guides for Python, I 
> was able to build fine by using:
> # Assuming immediately prior one has run:
> # $ git clone git@github.com:apache/arrow.git
> # $ conda create -y -n pyarrow-dev -c conda-forge \
> #     --file arrow/ci/conda_env_unix.yml \
> #     --file arrow/ci/conda_env_cpp.yml \
> #     --file arrow/ci/conda_env_python.yml \
> #     compilers \
> #     python=3.7
> # $ conda activate pyarrow-dev
> # $ brew update && brew bundle --file=arrow/cpp/Brewfile
> export ARROW_HOME=$(pwd)/arrow/dist
> export LD_LIBRARY_PATH=$(pwd)/arrow/dist/lib:$LD_LIBRARY_PATH
> export CC=`which clang`
> export CXX=`which clang++`
> mkdir arrow/cpp/build
> pushd arrow/cpp/build
> cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
>       -DCMAKE_INSTALL_LIBDIR=lib \
>       -DARROW_FLIGHT=OFF \
>       -DARROW_GANDIVA=OFF \
>       -DARROW_ORC=ON \
>       -DARROW_PARQUET=ON \
>       -DARROW_PYTHON=ON \
>       -DARROW_PLASMA=ON \
>       -DARROW_BUILD_TESTS=ON \
>       ..
> make -j4
> make install
> popd
> But when I run:
> pushd arrow/python
> export PYARROW_WITH_FLIGHT=0
> export PYARROW_WITH_GANDIVA=0
> export PYARROW_WITH_ORC=1
> export PYARROW_WITH_PARQUET=1
> python setup.py build_ext --inplace
> popd
> I get the following errors:
> -- Build output directory: 
> /Users/tallamjr/Github/arrow/python/build/temp.macosx-10.9-x86_64-3.7/release
> -- Found the Arrow core library: 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow.dylib
> -- Found the Arrow Python library: 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python.dylib
> CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib 
> does not exist.
> ...
> CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib 
> does not exist.
> CMake Error at CMakeLists.txt:230 (configure_file):
>   configure_file Problem configuring file
> Call Stack (most recent call first):
>   CMakeLists.txt:315 (bundle_arrow_lib)
> CMake Error: File 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not 
> exist.
> CMake Error at CMakeLists.txt:226 (configure_file):
>   configure_file Problem configuring file
> Call Stack (most recent call first):
>   CMakeLists.txt:320 (bundle_arrow_lib)
> CMake Error: File 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not 
> exist.
> CMake Error at CMakeLists.txt:230 (configure_file):
>   configure_file Problem configuring file
> Call Stack (most recent call first):
>   CMakeLists.txt:320 (bundle_arrow_lib)
> What is quite strange is that the libraries seem to indeed be there, but they 
> have an additional version component, e.g. `libarrow.15.dylib`:
> $ ls -l libarrow_python.15.dylib && echo $PWD
> lrwxr-xr-x 1 tallamjr staff 28 Oct 2 14:02 libarrow_python.15.dylib ->
> libarrow_python.15.0.0.dylib
> /Users/tallamjr/github/arrow/dist/lib
> I am not exactly sure what the issue here is, but it appears to be that the 
> version is not captured as a variable used by CMake? I have run the same 
> setup on `master` (`7d18c1c`) and on `apache-arrow-0.14.0` (`a591d76`), which 
> both seem to produce the same errors.
> Apologies if this is not quite the format for JIRA issues here, or perhaps if 
> it's not the correct platform for this; I'm very new to the project and to 
> contributing to Apache in general. Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6766) [Python] libarrow_python..dylib does not exist

2020-02-26 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045812#comment-17045812
 ] 

Uwe Korn commented on ARROW-6766:
-

Thanks for revisiting this!

> [Python] libarrow_python..dylib does not exist
> --
>
> Key: ARROW-6766
> URL: https://issues.apache.org/jira/browse/ARROW-6766
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.14.0, 0.15.0
>Reporter: Tarek Allam
>Priority: Major
>
> After following the instructions found on the developer guides for Python, I 
> was able to build fine by using:
>
> # Assuming immediately prior one has run:
> # $ git clone g...@github.com:apache/arrow.git
> # $ conda create -y -n pyarrow-dev -c conda-forge \
> #     --file arrow/ci/conda_env_unix.yml \
> #     --file arrow/ci/conda_env_cpp.yml \
> #     --file arrow/ci/conda_env_python.yml \
> #     compilers \
> #     python=3.7
> # $ conda activate pyarrow-dev
> # $ brew update && brew bundle --file=arrow/cpp/Brewfile
>
> export ARROW_HOME=$(pwd)/arrow/dist
> export LD_LIBRARY_PATH=$(pwd)/arrow/dist/lib:$LD_LIBRARY_PATH
> export CC=`which clang`
> export CXX=`which clang++`
>
> mkdir arrow/cpp/build
> pushd arrow/cpp/build
> cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
>       -DCMAKE_INSTALL_LIBDIR=lib \
>       -DARROW_FLIGHT=OFF \
>       -DARROW_GANDIVA=OFF \
>       -DARROW_ORC=ON \
>       -DARROW_PARQUET=ON \
>       -DARROW_PYTHON=ON \
>       -DARROW_PLASMA=ON \
>       -DARROW_BUILD_TESTS=ON \
>       ..
> make -j4
> make install
> popd
>
> But when I run:
>
> pushd arrow/python
> export PYARROW_WITH_FLIGHT=0
> export PYARROW_WITH_GANDIVA=0
> export PYARROW_WITH_ORC=1
> export PYARROW_WITH_PARQUET=1
> python setup.py build_ext --inplace
> popd
>
> I get the following errors:
>
> -- Build output directory: 
> /Users/tallamjr/Github/arrow/python/build/temp.macosx-10.9-x86_64-3.7/release
> -- Found the Arrow core library: 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow.dylib
> -- Found the Arrow Python library: 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python.dylib
> CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib 
> does not exist.
> ...
> CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib 
> does not exist.
> CMake Error at CMakeLists.txt:230 (configure_file):
>   configure_file Problem configuring file
> Call Stack (most recent call first):
>   CMakeLists.txt:315 (bundle_arrow_lib)
> CMake Error: File 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not 
> exist.
> CMake Error at CMakeLists.txt:226 (configure_file):
>   configure_file Problem configuring file
> Call Stack (most recent call first):
>   CMakeLists.txt:320 (bundle_arrow_lib)
> CMake Error: File 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not 
> exist.
> CMake Error at CMakeLists.txt:230 (configure_file):
>   configure_file Problem configuring file
> Call Stack (most recent call first):
>   CMakeLists.txt:320 (bundle_arrow_lib)
>
> What is quite strange is that the libraries seem to indeed be there, but they 
> have an additional version component, e.g. `libarrow.15.dylib`:
>
> $ ls -l libarrow_python.15.dylib && echo $PWD
> lrwxr-xr-x 1 tallamjr staff 28 Oct 2 14:02 libarrow_python.15.dylib -> 
> libarrow_python.15.0.0.dylib
> /Users/tallamjr/github/arrow/dist/lib
>
> I guess I am not exactly sure what the issue here is, but it appears that the 
> library version is not captured in a variable used by CMake. I have run the 
> same setup on `master` (`7d18c1c`) and on `apache-arrow-0.14.0` (`a591d76`), 
> which both produce the same errors.
>
> Apologies if this is not quite the format for JIRA issues here, or perhaps 
> it's not the correct platform for this; I'm very new to the project and to 
> contributing to Apache in general. Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7949) Update to '.gitignore' to not track user specific 'cpp/Brewfile.lock.json' file

2020-02-26 Thread Tarek Allam (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tarek Allam updated ARROW-7949:
---
Labels: pull-request-available  (was: )

> Update to '.gitignore' to not track user specific  'cpp/Brewfile.lock.json' 
> file
> 
>
> Key: ARROW-7949
> URL: https://issues.apache.org/jira/browse/ARROW-7949
> Project: Apache Arrow
>  Issue Type: Improvement
> Environment: macOS-10.15.3
>Reporter: Tarek Allam
>Priority: Trivial
>  Labels: pull-request-available
>
> In the developer guides for Python, there is a suggestion for users on macOS 
> to use Homebrew to install all dependencies required for building Arrow C++. 
> This creates a 'cpp/Brewfile.lock.json' file that is specific to the system 
> it sits on.
> It would be desirable for this not to be tracked by version control. To 
> prevent this accidental addition, perhaps it should be ignored in the 
> repository's gitignore file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7949) Update to '.gitignore' to not track user specific 'cpp/Brewfile.lock.json' file

2020-02-26 Thread Tarek Allam (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tarek Allam updated ARROW-7949:
---
Summary: Update to '.gitignore' to not track user specific  
'cpp/Brewfile.lock.json' file  (was: [Python] Update to '.gitignore' to not 
track user specific  'cpp/Brewfile.lock.json' file)

> Update to '.gitignore' to not track user specific  'cpp/Brewfile.lock.json' 
> file
> 
>
> Key: ARROW-7949
> URL: https://issues.apache.org/jira/browse/ARROW-7949
> Project: Apache Arrow
>  Issue Type: Improvement
> Environment: macOS-10.15.3
>Reporter: Tarek Allam
>Priority: Trivial
>
> In the developer guides for Python, there is a suggestion for users on macOS 
> to use Homebrew to install all dependencies required for building Arrow C++. 
> This creates a 'cpp/Brewfile.lock.json' file that is specific to the system 
> it sits on.
> It would be desirable for this not to be tracked by version control. To 
> prevent this accidental addition, perhaps it should be ignored in the 
> repository's gitignore file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7949) [Python] Update to '.gitignore' to not track user specific 'cpp/Brewfile.lock.json' file

2020-02-26 Thread Tarek Allam (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tarek Allam updated ARROW-7949:
---
Summary: [Python] Update to '.gitignore' to not track user specific  
'cpp/Brewfile.lock.json' file  (was: Update to '.gitignore' to not track user 
specific  'cpp/Brewfile.lock.json' file)

> [Python] Update to '.gitignore' to not track user specific  
> 'cpp/Brewfile.lock.json' file
> -
>
> Key: ARROW-7949
> URL: https://issues.apache.org/jira/browse/ARROW-7949
> Project: Apache Arrow
>  Issue Type: Improvement
> Environment: macOS-10.15.3
>Reporter: Tarek Allam
>Priority: Trivial
>
> In the developer guides for Python, there is a suggestion for users on macOS 
> to use Homebrew to install all dependencies required for building Arrow C++. 
> This creates a 'cpp/Brewfile.lock.json' file that is specific to the system 
> it sits on.
> It would be desirable for this not to be tracked by version control. To 
> prevent this accidental addition, perhaps it should be ignored in the 
> repository's gitignore file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-6766) [Python] libarrow_python..dylib does not exist

2020-02-26 Thread Tarek Allam (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045733#comment-17045733
 ] 

Tarek Allam edited comment on ARROW-6766 at 2/26/20 5:56 PM:
-

After being away ill for some time, I came back to revisit this, and now on 
`apache-arrow-0.16.0-134-g76db492c3` this seems to build fine.

I double-checked with the version stated previously, i.e. `apache-arrow-0.14.0`, 
and that now mysteriously works fine. I am thinking to close this issue, as it 
seems to have been fixed in the time I was away.


was (Author: tallamjr):
After being away ill for some time, I came back to revisit this, and now on 
`apache-arrow-0.16.0-134-g76db492c3` this seems to build fine.

It still fails with the above errors for the version stated previously, i.e. 
`apache-arrow-0.14.0`, but I am thinking to close this issue, as in the time I 
was away it seems to have been fixed.

> [Python] libarrow_python..dylib does not exist
> --
>
> Key: ARROW-6766
> URL: https://issues.apache.org/jira/browse/ARROW-6766
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.14.0, 0.15.0
>Reporter: Tarek Allam
>Priority: Major
>
> After following the instructions found on the developer guides for Python, I 
> was able to build fine by using:
>
> # Assuming immediately prior one has run:
> # $ git clone g...@github.com:apache/arrow.git
> # $ conda create -y -n pyarrow-dev -c conda-forge \
> #     --file arrow/ci/conda_env_unix.yml \
> #     --file arrow/ci/conda_env_cpp.yml \
> #     --file arrow/ci/conda_env_python.yml \
> #     compilers \
> #     python=3.7
> # $ conda activate pyarrow-dev
> # $ brew update && brew bundle --file=arrow/cpp/Brewfile
>
> export ARROW_HOME=$(pwd)/arrow/dist
> export LD_LIBRARY_PATH=$(pwd)/arrow/dist/lib:$LD_LIBRARY_PATH
> export CC=`which clang`
> export CXX=`which clang++`
>
> mkdir arrow/cpp/build
> pushd arrow/cpp/build
> cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
>       -DCMAKE_INSTALL_LIBDIR=lib \
>       -DARROW_FLIGHT=OFF \
>       -DARROW_GANDIVA=OFF \
>       -DARROW_ORC=ON \
>       -DARROW_PARQUET=ON \
>       -DARROW_PYTHON=ON \
>       -DARROW_PLASMA=ON \
>       -DARROW_BUILD_TESTS=ON \
>       ..
> make -j4
> make install
> popd
>
> But when I run:
>
> pushd arrow/python
> export PYARROW_WITH_FLIGHT=0
> export PYARROW_WITH_GANDIVA=0
> export PYARROW_WITH_ORC=1
> export PYARROW_WITH_PARQUET=1
> python setup.py build_ext --inplace
> popd
>
> I get the following errors:
>
> -- Build output directory: 
> /Users/tallamjr/Github/arrow/python/build/temp.macosx-10.9-x86_64-3.7/release
> -- Found the Arrow core library: 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow.dylib
> -- Found the Arrow Python library: 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python.dylib
> CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib 
> does not exist.
> ...
> CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib 
> does not exist.
> CMake Error at CMakeLists.txt:230 (configure_file):
>   configure_file Problem configuring file
> Call Stack (most recent call first):
>   CMakeLists.txt:315 (bundle_arrow_lib)
> CMake Error: File 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not 
> exist.
> CMake Error at CMakeLists.txt:226 (configure_file):
>   configure_file Problem configuring file
> Call Stack (most recent call first):
>   CMakeLists.txt:320 (bundle_arrow_lib)
> CMake Error: File 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not 
> exist.
> CMake Error at CMakeLists.txt:230 (configure_file):
>   configure_file Problem configuring file
> Call Stack (most recent call first):
>   CMakeLists.txt:320 (bundle_arrow_lib)
>
> What is quite strange is that the libraries seem to indeed be there, but they 
> have an additional version component, e.g. `libarrow.15.dylib`:
>
> $ ls -l libarrow_python.15.dylib && echo $PWD
> lrwxr-xr-x 1 tallamjr staff 28 Oct 2 14:02 libarrow_python.15.dylib -> 
> libarrow_python.15.0.0.dylib
> /Users/tallamjr/github/arrow/dist/lib
>
> I guess I am not exactly sure what the issue here is, but it appears that the 
> library version is not captured in a variable used by CMake. I have run the 
> same setup on `master` (`7d18c1c`) and on `apache-arrow-0.14.0` (`a591d76`), 
> which both produce the same errors.
>
> Apologies if this is not quite the format for JIRA issues here, or perhaps 
> it's not the correct platform for this; I'm very new to the project and to 
> contributing to Apache in general. Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (ARROW-1957) [Python] Write nanosecond timestamps using new NANO LogicalType Parquet unit

2020-02-26 Thread TP Boudreau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045759#comment-17045759
 ] 

TP Boudreau commented on ARROW-1957:


I'm fairly confident this issue was resolved in the 0.14.x release.  I won't 
have a chance to look closer or attempt to reproduce this for a few days.  If a 
new issue is opened on this, please tag me.

> [Python] Write nanosecond timestamps using new NANO LogicalType Parquet unit
> 
>
> Key: ARROW-1957
> URL: https://issues.apache.org/jira/browse/ARROW-1957
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.8.0
> Environment: Python 3.6.4.  Mac OSX and CentOS Linux release 
> 7.3.1611.  Pandas 0.21.1 .
>Reporter: Jordan Samuels
>Assignee: TP Boudreau
>Priority: Minor
>  Labels: parquet
> Fix For: 0.14.0
>
>
> The following code
> {code}
> import pyarrow as pa
> import pyarrow.parquet as pq
> import pandas as pd
> n=3
> df = pd.DataFrame({'x': range(n)}, index=pd.DatetimeIndex(start='2017-01-01', 
> freq='1n', periods=n))
> pq.write_table(pa.Table.from_pandas(df), '/tmp/t.parquet')
> {code}
> results in:
> {{ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: 
> 14832288001}}
> The desired effect is that we can save nanosecond resolution without losing 
> precision (e.g. conversion to ms).  Note that if {{freq='1u'}} is used, the 
> code runs properly.
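On releases from that era, the cast can at least be made explicit instead of erroring out; a minimal sketch using documented {{pyarrow.parquet.write_table}} options (path illustrative):

{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({'x': range(3)},
                  index=pd.date_range('2017-01-01', periods=3, freq='ns'))
table = pa.Table.from_pandas(df)

# Coerce to microseconds and allow truncation deliberately; this still
# drops sub-microsecond precision, but makes the loss an explicit choice.
pq.write_table(table, '/tmp/t_us.parquet',
               coerce_timestamps='us', allow_truncated_timestamps=True)
{code}

Once the NANO logical type is written (the subject of this issue), passing a newer Parquet format version to {{write_table}} preserves nanoseconds outright.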



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-7877) [Packaging] Fix crossbow deployment to github artifacts

2020-02-26 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-7877.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 6458
[https://github.com/apache/arrow/pull/6458]

> [Packaging] Fix crossbow deployment to github artifacts
> ---
>
> Key: ARROW-7877
> URL: https://issues.apache.org/jira/browse/ARROW-7877
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Developer Tools, Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Our artifact deployment scripts have started to fail with network errors.
> We need to overcome this issue before cutting the next release, so marking as 
> blocker.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7877) [Packaging] Fix crossbow deployment to github artifacts

2020-02-26 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-7877:

Component/s: Developer Tools

> [Packaging] Fix crossbow deployment to github artifacts
> ---
>
> Key: ARROW-7877
> URL: https://issues.apache.org/jira/browse/ARROW-7877
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Developer Tools
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Our artifact deployment scripts have started to fail with network errors.
> We need to overcome this issue before cutting the next release, so marking as 
> blocker.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7877) [Packaging] Fix crossbow deployment to github artifacts

2020-02-26 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-7877:

Component/s: Packaging

> [Packaging] Fix crossbow deployment to github artifacts
> ---
>
> Key: ARROW-7877
> URL: https://issues.apache.org/jira/browse/ARROW-7877
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Developer Tools, Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Our artifact deployment scripts have started to fail with network errors.
> We need to overcome this issue before cutting the next release, so marking as 
> blocker.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7949) Update to '.gitignore' to not track user specific 'cpp/Brewfile.lock.json' file

2020-02-26 Thread Tarek Allam (Jira)
Tarek Allam created ARROW-7949:
--

 Summary: Update to '.gitignore' to not track user specific  
'cpp/Brewfile.lock.json' file
 Key: ARROW-7949
 URL: https://issues.apache.org/jira/browse/ARROW-7949
 Project: Apache Arrow
  Issue Type: Improvement
 Environment: macOS-10.15.3
Reporter: Tarek Allam


In the developer guides for Python, there is a suggestion for users on macOS to 
use Homebrew to install all dependencies required for building Arrow C++. This 
creates a 'cpp/Brewfile.lock.json' file that is specific to the system it sits on.

It would be desirable for this not to be tracked by version control. To prevent 
this accidental addition, perhaps it should be ignored in the repository's 
gitignore file.
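The change itself is a single ignore rule; a plausible sketch of the entry (the exact patch may differ):

{code}
# macOS: `brew bundle` writes a machine-specific lockfile next to the Brewfile
cpp/Brewfile.lock.json
{code}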



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6766) [Python] libarrow_python..dylib does not exist

2020-02-26 Thread Tarek Allam (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045733#comment-17045733
 ] 

Tarek Allam commented on ARROW-6766:


After being away ill for some time, I came back to revisit this, and now on 
`apache-arrow-0.16.0-134-g76db492c3` this seems to build fine.

It still fails with the above errors for the version stated previously, i.e. 
`apache-arrow-0.14.0`, but I am thinking to close this issue, as in the time I 
was away it seems to have been fixed.

> [Python] libarrow_python..dylib does not exist
> --
>
> Key: ARROW-6766
> URL: https://issues.apache.org/jira/browse/ARROW-6766
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.14.0, 0.15.0
>Reporter: Tarek Allam
>Priority: Major
>
> After following the instructions found on the developer guides for Python, I 
> was able to build fine by using:
>
> # Assuming immediately prior one has run:
> # $ git clone g...@github.com:apache/arrow.git
> # $ conda create -y -n pyarrow-dev -c conda-forge \
> #     --file arrow/ci/conda_env_unix.yml \
> #     --file arrow/ci/conda_env_cpp.yml \
> #     --file arrow/ci/conda_env_python.yml \
> #     compilers \
> #     python=3.7
> # $ conda activate pyarrow-dev
> # $ brew update && brew bundle --file=arrow/cpp/Brewfile
>
> export ARROW_HOME=$(pwd)/arrow/dist
> export LD_LIBRARY_PATH=$(pwd)/arrow/dist/lib:$LD_LIBRARY_PATH
> export CC=`which clang`
> export CXX=`which clang++`
>
> mkdir arrow/cpp/build
> pushd arrow/cpp/build
> cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
>       -DCMAKE_INSTALL_LIBDIR=lib \
>       -DARROW_FLIGHT=OFF \
>       -DARROW_GANDIVA=OFF \
>       -DARROW_ORC=ON \
>       -DARROW_PARQUET=ON \
>       -DARROW_PYTHON=ON \
>       -DARROW_PLASMA=ON \
>       -DARROW_BUILD_TESTS=ON \
>       ..
> make -j4
> make install
> popd
>
> But when I run:
>
> pushd arrow/python
> export PYARROW_WITH_FLIGHT=0
> export PYARROW_WITH_GANDIVA=0
> export PYARROW_WITH_ORC=1
> export PYARROW_WITH_PARQUET=1
> python setup.py build_ext --inplace
> popd
>
> I get the following errors:
>
> -- Build output directory: 
> /Users/tallamjr/Github/arrow/python/build/temp.macosx-10.9-x86_64-3.7/release
> -- Found the Arrow core library: 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow.dylib
> -- Found the Arrow Python library: 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python.dylib
> CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib 
> does not exist.
> ...
> CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib 
> does not exist.
> CMake Error at CMakeLists.txt:230 (configure_file):
>   configure_file Problem configuring file
> Call Stack (most recent call first):
>   CMakeLists.txt:315 (bundle_arrow_lib)
> CMake Error: File 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not 
> exist.
> CMake Error at CMakeLists.txt:226 (configure_file):
>   configure_file Problem configuring file
> Call Stack (most recent call first):
>   CMakeLists.txt:320 (bundle_arrow_lib)
> CMake Error: File 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not 
> exist.
> CMake Error at CMakeLists.txt:230 (configure_file):
>   configure_file Problem configuring file
> Call Stack (most recent call first):
>   CMakeLists.txt:320 (bundle_arrow_lib)
>
> What is quite strange is that the libraries seem to indeed be there, but they 
> have an additional version component, e.g. `libarrow.15.dylib`:
>
> $ ls -l libarrow_python.15.dylib && echo $PWD
> lrwxr-xr-x 1 tallamjr staff 28 Oct 2 14:02 libarrow_python.15.dylib -> 
> libarrow_python.15.0.0.dylib
> /Users/tallamjr/github/arrow/dist/lib
>
> I guess I am not exactly sure what the issue here is, but it appears that the 
> library version is not captured in a variable used by CMake. I have run the 
> same setup on `master` (`7d18c1c`) and on `apache-arrow-0.14.0` (`a591d76`), 
> which both produce the same errors.
>
> Apologies if this is not quite the format for JIRA issues here, or perhaps 
> it's not the correct platform for this; I'm very new to the project and to 
> contributing to Apache in general. Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7749) [C++] Link some tests together

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-7749:
--
Labels: pull-request-available  (was: )

> [C++] Link some tests together
> --
>
> Key: ARROW-7749
> URL: https://issues.apache.org/jira/browse/ARROW-7749
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
>
> With unity builds (ARROW-7725) it may become more beneficial to reduce the 
> number of test executables, as several C++ files could be compiled together 
> so as to reduce build times.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-1636) [Format] Integration tests for null type

2020-02-26 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-1636.
---
Resolution: Fixed

Issue resolved by pull request 6368
[https://github.com/apache/arrow/pull/6368]

> [Format] Integration tests for null type
> 
>
> Key: ARROW-1636
> URL: https://issues.apache.org/jira/browse/ARROW-1636
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Integration, Java
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Blocker
>  Labels: columnar-format-1.0, pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> This was not implemented on the C++ side, and came up in ARROW-1584. 
> Realistically arrays may be of null type, and we should be able to message 
> these correctly
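For context, a null-type array is straightforward to produce from Python, so integration coverage matters in practice; a minimal sketch assuming pyarrow:

{code:python}
import pyarrow as pa

# An all-null array with the dedicated null type (it carries no value buffers).
arr = pa.array([None, None, None], type=pa.null())
batch = pa.record_batch([arr], names=['n'])  # candidate for IPC round-tripping
print(batch.schema)  # n: null
{code}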



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-7749) [C++] Link some tests together

2020-02-26 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-7749:
-

Assignee: Antoine Pitrou

> [C++] Link some tests together
> --
>
> Key: ARROW-7749
> URL: https://issues.apache.org/jira/browse/ARROW-7749
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Minor
>
> With unity builds (ARROW-7725) it may become more beneficial to reduce the 
> number of test executables, as several C++ files could be compiled together 
> so as to reduce build times.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7948) [Go][Integration] Decimal integration failures

2020-02-26 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045694#comment-17045694
 ] 

Antoine Pitrou commented on ARROW-7948:
---

cc [~sbinet]

> [Go][Integration] Decimal integration failures
> --
>
> Key: ARROW-7948
> URL: https://issues.apache.org/jira/browse/ARROW-7948
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Go, Integration
>Reporter: Antoine Pitrou
>Priority: Major
>
> If I un-skip decimal data for integration tests with Go, I get some errors 
> such as:
> {code}
> ==
> Testing file /tmp/tmpkz1_ydgp/generated_decimal.json
> ==
> -- Creating binary inputs
> -- Validating file
> Traceback (most recent call last):
>   File "/arrow/dev/archery/archery/integration/util.py", line 130, in run_cmd
> output = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
>   File "/opt/conda/envs/arrow/lib/python3.8/subprocess.py", line 411, in 
> check_output
> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>   File "/opt/conda/envs/arrow/lib/python3.8/subprocess.py", line 512, in run
> raise CalledProcessError(retcode, process.args,
> subprocess.CalledProcessError: Command 
> '['/opt/go/bin/arrow-json-integration-test', '-arrow', 
> '/tmp/tmpj9b4jgvi/10cae236_generated_decimal.json_as_file', '-json', 
> '/tmp/tmpkz1_ydgp/generated_decimal.json', '-mode', 'VALIDATE']' returned 
> non-zero exit status 1.
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/arrow/dev/archery/archery/integration/runner.py", line 193, in 
> _run_ipc_test_case
> run_binaries(producer, consumer, outcome, test_case)
>   File "/arrow/dev/archery/archery/integration/runner.py", line 219, in 
> _produce_consume
> consumer.validate(json_path, producer_file_path)
>   File "/arrow/dev/archery/archery/integration/tester_go.py", line 55, in 
> validate
> return self._run(arrow_path, json_path, 'VALIDATE')
>   File "/arrow/dev/archery/archery/integration/tester_go.py", line 52, in _run
> run_cmd(cmd)
>   File "/arrow/dev/archery/archery/integration/util.py", line 139, in run_cmd
> raise RuntimeError(sio.getvalue())
> RuntimeError: Command failed: ['/opt/go/bin/arrow-json-integration-test', 
> '-arrow', '/tmp/tmpj9b4jgvi/10cae236_generated_decimal.json_as_file', 
> '-json', '/tmp/tmpkz1_ydgp/generated_decimal.json', '-mode', 'VALIDATE']
> With output:
> --
> arrow-json: could not open JSON file reader from file 
> "/tmp/tmpkz1_ydgp/generated_decimal.json": json: cannot unmarshal number into 
> Go struct field dataType.precision of type string
> --
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7948) [Go][Integration] Decimal integration failures

2020-02-26 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045688#comment-17045688
 ] 

Antoine Pitrou commented on ARROW-7948:
---

Even Go-with-Go doesn't work:
{code}
FAILED TEST: decimal Go producing,  Go consuming
Traceback (most recent call last):
  File "/arrow/dev/archery/archery/integration/util.py", line 130, in run_cmd
output = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
  File "/opt/conda/envs/arrow/lib/python3.8/subprocess.py", line 411, in 
check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/opt/conda/envs/arrow/lib/python3.8/subprocess.py", line 512, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 
'['/opt/go/bin/arrow-json-integration-test', '-arrow', 
'/tmp/tmpj9b4jgvi/fd6cfe16_generated_decimal.json_as_file', '-json', 
'/tmp/tmpkz1_ydgp/generated_decimal.json', '-mode', 'JSON_TO_ARROW']' returned 
non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/arrow/dev/archery/archery/integration/runner.py", line 193, in 
_run_ipc_test_case
run_binaries(producer, consumer, outcome, test_case)
  File "/arrow/dev/archery/archery/integration/runner.py", line 215, in 
_produce_consume
producer.json_to_file(json_path, producer_file_path)
  File "/arrow/dev/archery/archery/integration/tester_go.py", line 58, in 
json_to_file
return self._run(arrow_path, json_path, 'JSON_TO_ARROW')
  File "/arrow/dev/archery/archery/integration/tester_go.py", line 52, in _run
run_cmd(cmd)
  File "/arrow/dev/archery/archery/integration/util.py", line 139, in run_cmd
raise RuntimeError(sio.getvalue())
RuntimeError: Command failed: ['/opt/go/bin/arrow-json-integration-test', 
'-arrow', '/tmp/tmpj9b4jgvi/fd6cfe16_generated_decimal.json_as_file', '-json', 
'/tmp/tmpkz1_ydgp/generated_decimal.json', '-mode', 'JSON_TO_ARROW']
With output:
--
arrow-json: could not open JSON file reader from file 
"/tmp/tmpkz1_ydgp/generated_decimal.json": json: cannot unmarshal number into 
Go struct field dataType.precision of type string

--
{code}


> [Go][Integration] Decimal integration failures
> --
>
> Key: ARROW-7948
> URL: https://issues.apache.org/jira/browse/ARROW-7948
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Go, Integration
>Reporter: Antoine Pitrou
>Priority: Major
>
> If I un-skip decimal data for integration tests with Go, I get some errors 
> such as:
> {code}
> ==
> Testing file /tmp/tmpkz1_ydgp/generated_decimal.json
> ==
> -- Creating binary inputs
> -- Validating file
> Traceback (most recent call last):
>   File "/arrow/dev/archery/archery/integration/util.py", line 130, in run_cmd
> output = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
>   File "/opt/conda/envs/arrow/lib/python3.8/subprocess.py", line 411, in 
> check_output
> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>   File "/opt/conda/envs/arrow/lib/python3.8/subprocess.py", line 512, in run
> raise CalledProcessError(retcode, process.args,
> subprocess.CalledProcessError: Command 
> '['/opt/go/bin/arrow-json-integration-test', '-arrow', 
> '/tmp/tmpj9b4jgvi/10cae236_generated_decimal.json_as_file', '-json', 
> '/tmp/tmpkz1_ydgp/generated_decimal.json', '-mode', 'VALIDATE']' returned 
> non-zero exit status 1.
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/arrow/dev/archery/archery/integration/runner.py", line 193, in 
> _run_ipc_test_case
> run_binaries(producer, consumer, outcome, test_case)
>   File "/arrow/dev/archery/archery/integration/runner.py", line 219, in 
> _produce_consume
> consumer.validate(json_path, producer_file_path)
>   File "/arrow/dev/archery/archery/integration/tester_go.py", line 55, in 
> validate
> return self._run(arrow_path, json_path, 'VALIDATE')
>   File "/arrow/dev/archery/archery/integration/tester_go.py", line 52, in _run
> run_cmd(cmd)
>   File "/arrow/dev/archery/archery/integration/util.py", line 139, in run_cmd
> raise RuntimeError(sio.getvalue())
> RuntimeError: Command failed: ['/opt/go/bin/arrow-json-integration-test', 
> '-arrow', '/tmp/tmpj9b4jgvi/10cae236_generated_decimal.json_as_file', 
> '-json', '/tmp/tmpkz1_ydgp/generated_decimal.json', '-mode', 'VALIDATE']
> With output:
> --
> arrow-json: could not open JSON file reader from file 
> "/tmp/tmpkz1_ydgp/generated_decimal.json": json: cannot unmarshal number into 
> Go struct field dataType.precision of type string
> --
> {code}




[jira] [Created] (ARROW-7948) [Go][Integration] Decimal integration failures

2020-02-26 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7948:
-

 Summary: [Go][Integration] Decimal integration failures
 Key: ARROW-7948
 URL: https://issues.apache.org/jira/browse/ARROW-7948
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go, Integration
Reporter: Antoine Pitrou


If I un-skip decimal data for integration tests with Go, I get some errors 
such as:
{code}
==
Testing file /tmp/tmpkz1_ydgp/generated_decimal.json
==
-- Creating binary inputs
-- Validating file
Traceback (most recent call last):
  File "/arrow/dev/archery/archery/integration/util.py", line 130, in run_cmd
output = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
  File "/opt/conda/envs/arrow/lib/python3.8/subprocess.py", line 411, in 
check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/opt/conda/envs/arrow/lib/python3.8/subprocess.py", line 512, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 
'['/opt/go/bin/arrow-json-integration-test', '-arrow', 
'/tmp/tmpj9b4jgvi/10cae236_generated_decimal.json_as_file', '-json', 
'/tmp/tmpkz1_ydgp/generated_decimal.json', '-mode', 'VALIDATE']' returned 
non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/arrow/dev/archery/archery/integration/runner.py", line 193, in 
_run_ipc_test_case
run_binaries(producer, consumer, outcome, test_case)
  File "/arrow/dev/archery/archery/integration/runner.py", line 219, in 
_produce_consume
consumer.validate(json_path, producer_file_path)
  File "/arrow/dev/archery/archery/integration/tester_go.py", line 55, in 
validate
return self._run(arrow_path, json_path, 'VALIDATE')
  File "/arrow/dev/archery/archery/integration/tester_go.py", line 52, in _run
run_cmd(cmd)
  File "/arrow/dev/archery/archery/integration/util.py", line 139, in run_cmd
raise RuntimeError(sio.getvalue())
RuntimeError: Command failed: ['/opt/go/bin/arrow-json-integration-test', 
'-arrow', '/tmp/tmpj9b4jgvi/10cae236_generated_decimal.json_as_file', '-json', 
'/tmp/tmpkz1_ydgp/generated_decimal.json', '-mode', 'VALIDATE']
With output:
--
arrow-json: could not open JSON file reader from file 
"/tmp/tmpkz1_ydgp/generated_decimal.json": json: cannot unmarshal number into 
Go struct field dataType.precision of type string

--
{code}
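The error text points at a type mismatch in the Go JSON reader rather than bad data: the generator emits {{precision}} as a JSON number, while the Go struct apparently declares it as a string. A minimal sketch of the shape involved (field contents illustrative):

{code:python}
import json

# The integration JSON for a decimal field looks roughly like this,
# with precision and scale serialized as numbers:
payload = '{"name": "decimal", "precision": 19, "scale": 10}'
field = json.loads(payload)
assert isinstance(field["precision"], int)

# A reader that insists on a string rejects exactly this value; coercing
# on read is one possible fix (an assumption, not the merged solution):
precision = str(field["precision"])
{code}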



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7940) [C++] Unable to generate cmake build with settings other than default

2020-02-26 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-7940:

Summary: [C++] Unable to generate cmake build with settings other than 
default  (was: Unable to generate cmake build with settings other than default)

> [C++] Unable to generate cmake build with settings other than default
> -
>
> Key: ARROW-7940
> URL: https://issues.apache.org/jira/browse/ARROW-7940
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.16.0
> Environment: Windows 10
> Visual Studio 2019 Build Tools 16.4.5
>Reporter: Valery Vybornov
>Priority: Major
> Attachments: log.txt
>
>
> Steps to reproduce:
>  # Install conda-forge as described here: 
> [https://arrow.apache.org/docs/developers/cpp/windows.html#using-conda-forge-for-build-dependencies]
>  # Install ninja+clcache 
> [https://arrow.apache.org/docs/developers/cpp/windows.html#building-with-ninja-and-clcache]
>  # (git bash) git clone [https://github.com/apache/arrow.git]
> cd arrow/
> git checkout apache-arrow-0.16.0
>  # (cmd)
> call C:\Users\vvv\Miniconda3\Scripts\activate.bat C:\Users\vvv\Miniconda3
> call "C:\Program Files (x86)\Microsoft Visual 
> Studio\2019\BuildTools\VC\Auxiliary\Build\vcvarsall.bat" x64
> call conda activate arrow-dev
>  # cd arrow\cpp
> mkdir build
> cd build
>  # cmake -G "Ninja" -DARROW_BUILD_EXAMPLES=ON -DARROW_BUILD_UTILITIES=ON 
> -DARROW_FLIGHT=ON -DARROW_GANDIVA=ON -DARROW_PARQUET=ON ..
>  # cmake --build . --config Release
> Expected results: Examples, utilities, flight, gandiva, parquet built.
> Actual results: Default configuration and none of the above built. 
> cmake_summary.json indicates all these features OFF. Following lines in the 
> output of cmake:
> {code:java}
> -- Configuring done
> You have changed variables that require your cache to be deleted.
> Configure will be re-run and you may have to reset some variables.
> The following variables have changed:
> CMAKE_CXX_COMPILER= C:/Program Files (x86)/Microsoft Visual 
> Studio/2019/BuildTools/VC/Tools/MSVC/14.24.28314/bin/Hostx64/x64/cl.exe
> -- Building using CMake version: 3.16.4 {code}
> Full cmake output attached
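A possible workaround sketch, assuming the re-run triggered by the cache invalidation is what drops the {{-D}} options (untested): clear the stale cache and configure again in one pass with the compiler pinned explicitly.

{code}
del CMakeCache.txt
cmake -G "Ninja" ^
      -DCMAKE_C_COMPILER=cl -DCMAKE_CXX_COMPILER=cl ^
      -DARROW_BUILD_EXAMPLES=ON -DARROW_BUILD_UTILITIES=ON ^
      -DARROW_FLIGHT=ON -DARROW_GANDIVA=ON -DARROW_PARQUET=ON ..
{code}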



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7947) [Rust] [Flight] [DataFusion] Implement example for get_schema

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-7947:
--
Labels: pull-request-available  (was: )

> [Rust] [Flight] [DataFusion] Implement example for get_schema
> -
>
> Key: ARROW-7947
> URL: https://issues.apache.org/jira/browse/ARROW-7947
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Implement example for get_schema and implement the required helper methods.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7947) [Rust] [Flight] [DataFusion] Implement example for get_schema

2020-02-26 Thread Andy Grove (Jira)
Andy Grove created ARROW-7947:
-

 Summary: [Rust] [Flight] [DataFusion] Implement example for 
get_schema
 Key: ARROW-7947
 URL: https://issues.apache.org/jira/browse/ARROW-7947
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


Implement example for get_schema and implement the required helper methods.
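For comparison, the Python Flight client already exposes this call; a hedged sketch of the equivalent request (server location and path are illustrative):

{code:python}
import pyarrow.flight as flight

# Ask a running Flight server for a dataset's schema without fetching data.
client = flight.FlightClient("grpc://localhost:50051")
descriptor = flight.FlightDescriptor.for_path("example.parquet")
schema_result = client.get_schema(descriptor)
print(schema_result.schema)
{code}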



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6110) [Java] Support LargeList Type and add integration test with C++

2020-02-26 Thread Micah Kornfield (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045633#comment-17045633
 ] 

Micah Kornfield commented on ARROW-6110:


[~projjal] please do :).  There isn't a direct blocker, but until Vectors 
support int64 indexing, the value of this will likely be limited. 

> [Java] Support LargeList Type and add integration test with C++
> ---
>
> Key: ARROW-6110
> URL: https://issues.apache.org/jira/browse/ARROW-6110
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Integration, Java
>Reporter: Micah Kornfield
>Priority: Blocker
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (ARROW-7944) [Python] Test failures without Pandas

2020-02-26 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou closed ARROW-7944.
-
  Assignee: Antoine Pitrou
Resolution: Fixed

> [Python] Test failures without Pandas
> -
>
> Key: ARROW-7944
> URL: https://issues.apache.org/jira/browse/ARROW-7944
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 1.0.0
>
>
> I recently saw this:
> https://ci.appveyor.com/project/pitrou/arrow/builds/31065781/job/p08i1nrstf9wl2kr#L1964
> {code}
> == FAILURES 
> ===
> _ test_builtin_pickle_dataset 
> _
> tempdir = 
> WindowsPath('C:/Users/appveyor/AppData/Local/Temp/1/pytest-of-appveyor/pytest-0/test_builtin_pickle_dataset0')
> datadir = WindowsPath('c:/projects/arrow/python/pyarrow/tests/data/parquet')
> def test_builtin_pickle_dataset(tempdir, datadir):
> import pickle
> >   dataset = _make_dataset_for_pickling(tempdir)
> pyarrow\tests\test_parquet.py:2821: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _
> tempdir = 
> WindowsPath('C:/Users/appveyor/AppData/Local/Temp/1/pytest-of-appveyor/pytest-0/test_builtin_pickle_dataset0')
> N = 100
> def _make_dataset_for_pickling(tempdir, N=100):
> path = tempdir / 'data.parquet'
> fs = LocalFileSystem.get_instance()
> 
> >   df = pd.DataFrame({
> 'index': np.arange(N),
> 'values': np.random.randn(N)
> }, columns=['index', 'values'])
> E   AttributeError: 'NoneType' object has no attribute 'DataFrame'
> pyarrow\tests\test_parquet.py:2776: AttributeError
> __ test_cloudpickle_dataset 
> ___
> tempdir = 
> WindowsPath('C:/Users/appveyor/AppData/Local/Temp/1/pytest-of-appveyor/pytest-0/test_cloudpickle_dataset0')
> datadir = WindowsPath('c:/projects/arrow/python/pyarrow/tests/data/parquet')
> def test_cloudpickle_dataset(tempdir, datadir):
> cp = pytest.importorskip('cloudpickle')
> >   dataset = _make_dataset_for_pickling(tempdir)
> pyarrow\tests\test_parquet.py:2827: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _
> tempdir = 
> WindowsPath('C:/Users/appveyor/AppData/Local/Temp/1/pytest-of-appveyor/pytest-0/test_cloudpickle_dataset0')
> N = 100
> def _make_dataset_for_pickling(tempdir, N=100):
> path = tempdir / 'data.parquet'
> fs = LocalFileSystem.get_instance()
> 
> >   df = pd.DataFrame({
> 'index': np.arange(N),
> 'values': np.random.randn(N)
> }, columns=['index', 'values'])
> E   AttributeError: 'NoneType' object has no attribute 'DataFrame'
> pyarrow\tests\test_parquet.py:2776: AttributeError
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7926) [Developer] "archery lint" target is not ergonomic for running a single check like IWYU

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-7926:
--
Labels: pull-request-available  (was: )

> [Developer] "archery lint" target is not ergonomic for running a single check 
> like IWYU
> ---
>
> Key: ARROW-7926
> URL: https://issues.apache.org/jira/browse/ARROW-7926
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> It might be useful to have a second lint CLI target with everything disabled 
> by default so that a single lint target can be toggled on. How should this be 
> used via docker-compose? See ARROW-7925



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7946) [C++] Deduplicate schema equivalence checks

2020-02-26 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-7946:
---

 Summary: [C++] Deduplicate schema equivalence checks
 Key: ARROW-7946
 URL: https://issues.apache.org/jira/browse/ARROW-7946
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.16.0
Reporter: Ben Kietzman
 Fix For: 1.0.0


There are several locations where a group of schemas is checked for equivalence, 
including {{UnionDataset::Make}}, {{Table::FromRecordBatches}}, 
{{ConcatenateTables}}, and {{WriteRecordBatchStream}}. These checks should be 
extracted into a helper function.
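A sketch of the intended helper's contract, written in Python for brevity (the real helper would live in C++):

{code:python}
import pyarrow as pa

def ensure_equal_schemas(schemas):
    """Return the shared schema, raising if any member diverges."""
    first = schemas[0]
    for schema in schemas[1:]:
        if not schema.equals(first):
            raise ValueError(
                "all schemas must be equal, got %s vs %s" % (first, schema))
    return first

# Usage sketch: identical schemas pass, a differing one raises.
s = pa.schema([('x', pa.int64())])
ensure_equal_schemas([s, s])
{code}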



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7945) [C++][Dataset] Implement InMemoryDatasetFactory

2020-02-26 Thread Krisztian Szucs (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045585#comment-17045585
 ] 

Krisztian Szucs commented on ARROW-7945:


We should also have convenience factories for InMemoryDataset; see the discussion at 
https://github.com/apache/arrow/pull/6470/files/881ee8538a2cd0c3173feb9c370b5da69388b5f4#r384503399

{code:cpp}
InMemoryDataset::Make(Table);
InMemoryDataset::Make(std::vector<std::shared_ptr<RecordBatch>>);  // all record batches should have the same schema
{code}

> [C++][Dataset] Implement InMemoryDatasetFactory
> ---
>
> Key: ARROW-7945
> URL: https://issues.apache.org/jira/browse/ARROW-7945
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, C++ - Dataset
>Affects Versions: 0.16.0
>Reporter: Ben Kietzman
>Assignee: Ben Kietzman
>Priority: Major
> Fix For: 1.0.0
>
>
> This will allow in-memory datasets (such as tables) to participate in 
> discovery through {{UnionDatasetFactory}}. This class will be trivial, since 
> {{Inspect}} will do nothing but return the table's schema, but it is 
> necessary to ensure that the resulting {{UnionDataset}}'s unified schema 
> accommodates the table's schema (for example, including fields present only 
> in the table's schema, or emitting an error when unification is not possible).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-7926) [Developer] "archery lint" target is not ergonomic for running a single check like IWYU

2020-02-26 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-7926:
-

Assignee: Antoine Pitrou

> [Developer] "archery lint" target is not ergonomic for running a single check 
> like IWYU
> ---
>
> Key: ARROW-7926
> URL: https://issues.apache.org/jira/browse/ARROW-7926
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 1.0.0
>
>
> It might be useful to have a second lint CLI target with everything disabled 
> by default so that a single lint target can be toggled on. How should this be 
> used via docker-compose? See ARROW-7925



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7945) [C++][Dataset] Implement InMemoryDatasetFactory

2020-02-26 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-7945:
---

 Summary: [C++][Dataset] Implement InMemoryDatasetFactory
 Key: ARROW-7945
 URL: https://issues.apache.org/jira/browse/ARROW-7945
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, C++ - Dataset
Affects Versions: 0.16.0
Reporter: Ben Kietzman
Assignee: Ben Kietzman
 Fix For: 1.0.0


This will allow in-memory datasets (such as tables) to participate in discovery 
through {{UnionDatasetFactory}}. This class will be trivial, since {{Inspect}} 
will do nothing but return the table's schema, but it is necessary to ensure 
that the resulting {{UnionDataset}}'s unified schema accommodates the table's 
schema (for example, including fields present only in the table's schema, or 
emitting an error when unification is not possible).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6110) [Java] Support LargeList Type and add integration test with C++

2020-02-26 Thread Projjal Chanda (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045492#comment-17045492
 ] 

Projjal Chanda commented on ARROW-6110:
---

[~emkornfi...@gmail.com] [~wesm] Can I help with this? Or are there any other 
blocking issues that might need help?

> [Java] Support LargeList Type and add integration test with C++
> ---
>
> Key: ARROW-6110
> URL: https://issues.apache.org/jira/browse/ARROW-6110
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Integration, Java
>Reporter: Micah Kornfield
>Priority: Blocker
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7944) [Python] Test failures without Pandas

2020-02-26 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045376#comment-17045376
 ] 

Antoine Pitrou commented on ARROW-7944:
---

cc [~jorisvandenbossche]

> [Python] Test failures without Pandas
> -
>
> Key: ARROW-7944
> URL: https://issues.apache.org/jira/browse/ARROW-7944
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Antoine Pitrou
>Priority: Major
> Fix For: 1.0.0
>
>
> I recently saw this:
> https://ci.appveyor.com/project/pitrou/arrow/builds/31065781/job/p08i1nrstf9wl2kr#L1964
> {code}
> == FAILURES 
> ===
> _ test_builtin_pickle_dataset 
> _
> tempdir = 
> WindowsPath('C:/Users/appveyor/AppData/Local/Temp/1/pytest-of-appveyor/pytest-0/test_builtin_pickle_dataset0')
> datadir = WindowsPath('c:/projects/arrow/python/pyarrow/tests/data/parquet')
> def test_builtin_pickle_dataset(tempdir, datadir):
> import pickle
> >   dataset = _make_dataset_for_pickling(tempdir)
> pyarrow\tests\test_parquet.py:2821: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _
> tempdir = 
> WindowsPath('C:/Users/appveyor/AppData/Local/Temp/1/pytest-of-appveyor/pytest-0/test_builtin_pickle_dataset0')
> N = 100
> def _make_dataset_for_pickling(tempdir, N=100):
> path = tempdir / 'data.parquet'
> fs = LocalFileSystem.get_instance()
> 
> >   df = pd.DataFrame({
> 'index': np.arange(N),
> 'values': np.random.randn(N)
> }, columns=['index', 'values'])
> E   AttributeError: 'NoneType' object has no attribute 'DataFrame'
> pyarrow\tests\test_parquet.py:2776: AttributeError
> __ test_cloudpickle_dataset 
> ___
> tempdir = 
> WindowsPath('C:/Users/appveyor/AppData/Local/Temp/1/pytest-of-appveyor/pytest-0/test_cloudpickle_dataset0')
> datadir = WindowsPath('c:/projects/arrow/python/pyarrow/tests/data/parquet')
> def test_cloudpickle_dataset(tempdir, datadir):
> cp = pytest.importorskip('cloudpickle')
> >   dataset = _make_dataset_for_pickling(tempdir)
> pyarrow\tests\test_parquet.py:2827: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _
> tempdir = 
> WindowsPath('C:/Users/appveyor/AppData/Local/Temp/1/pytest-of-appveyor/pytest-0/test_cloudpickle_dataset0')
> N = 100
> def _make_dataset_for_pickling(tempdir, N=100):
> path = tempdir / 'data.parquet'
> fs = LocalFileSystem.get_instance()
> 
> >   df = pd.DataFrame({
> 'index': np.arange(N),
> 'values': np.random.randn(N)
> }, columns=['index', 'values'])
> E   AttributeError: 'NoneType' object has no attribute 'DataFrame'
> pyarrow\tests\test_parquet.py:2776: AttributeError
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7944) [Python] Test failures without Pandas

2020-02-26 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7944:
-

 Summary: [Python] Test failures without Pandas
 Key: ARROW-7944
 URL: https://issues.apache.org/jira/browse/ARROW-7944
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Antoine Pitrou
 Fix For: 1.0.0


I recently saw this:
https://ci.appveyor.com/project/pitrou/arrow/builds/31065781/job/p08i1nrstf9wl2kr#L1964

{code}
== FAILURES ===
_ test_builtin_pickle_dataset _
tempdir = 
WindowsPath('C:/Users/appveyor/AppData/Local/Temp/1/pytest-of-appveyor/pytest-0/test_builtin_pickle_dataset0')
datadir = WindowsPath('c:/projects/arrow/python/pyarrow/tests/data/parquet')
def test_builtin_pickle_dataset(tempdir, datadir):
import pickle
>   dataset = _make_dataset_for_pickling(tempdir)
pyarrow\tests\test_parquet.py:2821: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tempdir = 
WindowsPath('C:/Users/appveyor/AppData/Local/Temp/1/pytest-of-appveyor/pytest-0/test_builtin_pickle_dataset0')
N = 100
def _make_dataset_for_pickling(tempdir, N=100):
path = tempdir / 'data.parquet'
fs = LocalFileSystem.get_instance()

>   df = pd.DataFrame({
'index': np.arange(N),
'values': np.random.randn(N)
}, columns=['index', 'values'])
E   AttributeError: 'NoneType' object has no attribute 'DataFrame'
pyarrow\tests\test_parquet.py:2776: AttributeError
__ test_cloudpickle_dataset ___
tempdir = 
WindowsPath('C:/Users/appveyor/AppData/Local/Temp/1/pytest-of-appveyor/pytest-0/test_cloudpickle_dataset0')
datadir = WindowsPath('c:/projects/arrow/python/pyarrow/tests/data/parquet')
def test_cloudpickle_dataset(tempdir, datadir):
cp = pytest.importorskip('cloudpickle')
>   dataset = _make_dataset_for_pickling(tempdir)
pyarrow\tests\test_parquet.py:2827: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tempdir = 
WindowsPath('C:/Users/appveyor/AppData/Local/Temp/1/pytest-of-appveyor/pytest-0/test_cloudpickle_dataset0')
N = 100
def _make_dataset_for_pickling(tempdir, N=100):
path = tempdir / 'data.parquet'
fs = LocalFileSystem.get_instance()

>   df = pd.DataFrame({
'index': np.arange(N),
'values': np.random.randn(N)
}, columns=['index', 'values'])
E   AttributeError: 'NoneType' object has no attribute 'DataFrame'
pyarrow\tests\test_parquet.py:2776: AttributeError
{code}
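The traceback shows {{pd}} bound to {{None}} when pandas is absent, so the helper needs a guard; a minimal sketch of one way to skip cleanly (not necessarily the fix that landed):

{code:python}
import pytest

# Bind pandas via importorskip so dependent tests are skipped when pandas
# is unavailable, instead of crashing later on pd=None.
pd = pytest.importorskip("pandas")
{code}

pyarrow's suite also carries a dedicated {{@pytest.mark.pandas}} marker for exactly this class of test, which these two cases presumably needed applied.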



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7943) [C++] Add a new level builder capable of handling nested data

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-7943:
--
Labels: pull-request-available  (was: )

> [C++] Add a new level builder capable of handling nested data
> -
>
> Key: ARROW-7943
> URL: https://issues.apache.org/jira/browse/ARROW-7943
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
>  Labels: pull-request-available
>
> There will be one or two more steps to integrate this with the existing 
> higher-level APIs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7936) [Python] FileSystem.from_uri test fails on python 3.5

2020-02-26 Thread Krisztian Szucs (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045271#comment-17045271
 ] 

Krisztian Szucs commented on ARROW-7936:


We should discuss it on the ML

> [Python] FileSystem.from_uri test fails on python 3.5
> -
>
> Key: ARROW-7936
> URL: https://issues.apache.org/jira/browse/ARROW-7936
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 1.0.0
>
>
> See build failure at 
> https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=7535=logs=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb=6c939d89-0d1a-51f2-8b30-091a7a82e98c=288
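
For reference, a minimal sketch of the kind of call the failing test exercises. The method name comes from the issue title; the (filesystem, path) return shape follows later pyarrow documentation and is an assumption for this build:

{code}
from pyarrow import fs

# from_uri() picks the filesystem implementation from the URI scheme;
# a file:// URI should yield a LocalFileSystem plus the bare path.
filesystem, path = fs.FileSystem.from_uri('file:///tmp/data.parquet')
print(type(filesystem).__name__, path)
{code}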



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-7936) [Python] FileSystem.from_uri test fails on python 3.5

2020-02-26 Thread Krisztian Szucs (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045271#comment-17045271
 ] 

Krisztian Szucs edited comment on ARROW-7936 at 2/26/20 9:01 AM:
-

[~apitrou] We should discuss it on the ML


was (Author: kszucs):
We should discuss it on the ML

> [Python] FileSystem.from_uri test fails on python 3.5
> -
>
> Key: ARROW-7936
> URL: https://issues.apache.org/jira/browse/ARROW-7936
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 1.0.0
>
>
> See build failure at 
> https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=7535=logs=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb=6c939d89-0d1a-51f2-8b30-091a7a82e98c=288



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6393) [C++] Add EqualOptions support in SparseTensor::Equals

2020-02-26 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-6393.
---
Resolution: Fixed

Issue resolved by pull request 6443
[https://github.com/apache/arrow/pull/6443]

> [C++] Add EqualOptions support in SparseTensor::Equals
> -
>
> Key: ARROW-6393
> URL: https://issues.apache.org/jira/browse/ARROW-6393
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> SparseTensor::Equals should take an EqualOptions argument, as Tensor::Equals does.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7943) [C++] Add a new level builder capable of handling nested data

2020-02-26 Thread Micah Kornfield (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield updated ARROW-7943:
---
Component/s: C++

> [C++] Add a new level builder capable of handling nested data
> -
>
> Key: ARROW-7943
> URL: https://issues.apache.org/jira/browse/ARROW-7943
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
>
> There will be one or two more steps to integrate this with the existing 
> higher level APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7943) [C++] Add a new level builder capable of handling nested data

2020-02-26 Thread Micah Kornfield (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield updated ARROW-7943:
---
Summary: [C++] Add a new level builder capable of handling nested data  
(was: Add a new level builder capable of handling nested data)

> [C++] Add a new level builder capable of handling nested data
> -
>
> Key: ARROW-7943
> URL: https://issues.apache.org/jira/browse/ARROW-7943
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
>
> There will be one or two more steps to integrate this with the existing 
> higher level APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7943) Add a new level builder capable of handling nested data

2020-02-26 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-7943:
--

 Summary: Add a new level builder capable of handling nested data
 Key: ARROW-7943
 URL: https://issues.apache.org/jira/browse/ARROW-7943
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Micah Kornfield
Assignee: Micah Kornfield


There will be one or two more steps to integrate this with the existing higher 
level APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (ARROW-6213) [C++] tests fail for AVX512

2020-02-26 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou closed ARROW-6213.
-
Fix Version/s: (was: 2.0.0)
   Resolution: Not A Bug

> [C++] tests fail for AVX512
> ---
>
> Key: ARROW-6213
> URL: https://issues.apache.org/jira/browse/ARROW-6213
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.14.1, 0.15.1
> Environment: CentOS 7.6.1810, Intel Xeon Processor (Skylake, IBRS) 
> avx512
>Reporter: Charles Coulombe
>Priority: Minor
> Attachments: arrow-0.14.1-c++-failed-tests-cmake-conf.txt, 
> arrow-0.14.1-c++-failed-tests.txt, arrow-0.15.1-GCC-7.3.0.env.avx512, 
> arrow-compute-compare-test.res, 
> easybuild-arrow-0.14.1-20190809.34.MgMEK.log
>
>
> When building libraries for avx512 with GCC 7.3.0, two C++ tests fail.
> {noformat}
> The following tests FAILED: 
>   28 - arrow-compute-compare-test (Failed) 
>   30 - arrow-compute-filter-test (Failed) 
> Errors while running CTest{noformat}
> while for avx2 they pass.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-6213) [C++] tests fail for AVX512

2020-02-26 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045260#comment-17045260
 ] 

Antoine Pitrou edited comment on ARROW-6213 at 2/26/20 8:51 AM:


Ok, I think we can close it as a non-Arrow bug then. Thanks for investigating.


was (Author: pitrou):
Ok, I think we close it as a non-Arrow bug then. Thanks for investigating.

> [C++] tests fail for AVX512
> ---
>
> Key: ARROW-6213
> URL: https://issues.apache.org/jira/browse/ARROW-6213
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.14.1, 0.15.1
> Environment: CentOS 7.6.1810, Intel Xeon Processor (Skylake, IBRS) 
> avx512
>Reporter: Charles Coulombe
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: arrow-0.14.1-c++-failed-tests-cmake-conf.txt, 
> arrow-0.14.1-c++-failed-tests.txt, arrow-0.15.1-GCC-7.3.0.env.avx512, 
> arrow-compute-compare-test.res, 
> easybuild-arrow-0.14.1-20190809.34.MgMEK.log
>
>
> When building libraries for avx512 with GCC 7.3.0, two C++ tests fail.
> {noformat}
> The following tests FAILED: 
>   28 - arrow-compute-compare-test (Failed) 
>   30 - arrow-compute-filter-test (Failed) 
> Errors while running CTest{noformat}
> while for avx2 they pass.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6213) [C++] tests fail for AVX512

2020-02-26 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045260#comment-17045260
 ] 

Antoine Pitrou commented on ARROW-6213:
---

Ok, I think we close it as a non-Arrow bug then. Thanks for investigating.

> [C++] tests fail for AVX512
> ---
>
> Key: ARROW-6213
> URL: https://issues.apache.org/jira/browse/ARROW-6213
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.14.1, 0.15.1
> Environment: CentOS 7.6.1810, Intel Xeon Processor (Skylake, IBRS) 
> avx512
>Reporter: Charles Coulombe
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: arrow-0.14.1-c++-failed-tests-cmake-conf.txt, 
> arrow-0.14.1-c++-failed-tests.txt, arrow-0.15.1-GCC-7.3.0.env.avx512, 
> arrow-compute-compare-test.res, 
> easybuild-arrow-0.14.1-20190809.34.MgMEK.log
>
>
> When building libraries for avx512 with GCC 7.3.0, two C++ tests fail.
> {noformat}
> The following tests FAILED: 
>   28 - arrow-compute-compare-test (Failed) 
>   30 - arrow-compute-filter-test (Failed) 
> Errors while running CTest{noformat}
> while for avx2 they pass.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-7937) [Python][Packaging] Remove boost from the macos wheels

2020-02-26 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-7937:
--

Assignee: Krisztian Szucs

> [Python][Packaging] Remove boost from the macos wheels
> --
>
> Key: ARROW-7937
> URL: https://issues.apache.org/jira/browse/ARROW-7937
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging, Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> boost_regex is required for libarrow only on gcc < 4.9, see
> https://github.com/apache/arrow/blob/f609298f8f00783a6704608ca8493227a552abab/cpp/src/parquet/metadata.cc#L38
> so we can remove the bundled boost libraries from the macos wheels as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-7937) [Python][Packaging] Remove boost from the macos wheels

2020-02-26 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-7937.

Resolution: Fixed

Issue resolved by pull request 6485
[https://github.com/apache/arrow/pull/6485]

> [Python][Packaging] Remove boost from the macos wheels
> --
>
> Key: ARROW-7937
> URL: https://issues.apache.org/jira/browse/ARROW-7937
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging, Python
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> boost_regex is required for libarrow only on gcc < 4.9, see
> https://github.com/apache/arrow/blob/f609298f8f00783a6704608ca8493227a552abab/cpp/src/parquet/metadata.cc#L38
> so we can remove the bundled boost libraries from the macos wheels as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)