[jira] [Commented] (ARROW-2535) [C++/Python] Provide pre-commit hooks that check flake8 et al.

2018-05-01 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16460585#comment-16460585
 ] 

Uwe L. Korn commented on ARROW-2535:


[~wesmckinn] I remember that you have some code in this direction. Can you post 
this again? I would then turn this into a pre-commit hook and add some 
documentation.

> [C++/Python] Provide pre-commit hooks that check flake8 et al.
> --
>
> Key: ARROW-2535
> URL: https://issues.apache.org/jira/browse/ARROW-2535
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.10.0
>
>
> We should provide optional pre-commit hooks that users can install to check, 
> e.g., flake8 and clang-format compliance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2535) [C++/Python] Provide pre-commit hooks that check flake8 et al.

2018-05-01 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2535:
--

 Summary: [C++/Python] Provide pre-commit hooks that check flake8 
et al.
 Key: ARROW-2535
 URL: https://issues.apache.org/jira/browse/ARROW-2535
 Project: Apache Arrow
  Issue Type: Task
  Components: C++, Python
Reporter: Uwe L. Korn
 Fix For: 0.10.0


We should provide optional pre-commit hooks that users can install to check, 
e.g., flake8 and clang-format compliance.
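One possible shape for such an opt-in hook is sketched below in Python. It is a minimal sketch under stated assumptions, not an existing Arrow script: the helper names are hypothetical, it lints only the files staged for commit, and the `clang-format --dry-run --Werror` invocation requires clang-format 10 or newer.

```python
#!/usr/bin/env python
# Hypothetical pre-commit hook sketch: lint only the files staged for commit.
import subprocess

CPP_SUFFIXES = (".h", ".hpp", ".cc", ".cpp")


def staged_files():
    """Paths added, copied or modified in the index (requires a Git checkout)."""
    out = subprocess.check_output(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        universal_newlines=True)
    return [line for line in out.splitlines() if line]


def build_commands(paths):
    """Map staged paths to the lint commands that should run on them."""
    commands = []
    py_files = [p for p in paths if p.endswith(".py")]
    cpp_files = [p for p in paths if p.endswith(CPP_SUFFIXES)]
    if py_files:
        commands.append(["flake8"] + py_files)
    if cpp_files:
        # --dry-run/--Werror exist only in clang-format 10 and later (assumption
        # about the locally installed version).
        commands.append(["clang-format", "--dry-run", "--Werror"] + cpp_files)
    return commands


def main():
    """Run every lint command; a non-zero return value aborts the commit."""
    status = 0
    for cmd in build_commands(staged_files()):
        if subprocess.call(cmd) != 0:
            status = 1
    return status
```

To use it, the file would be saved as `.git/hooks/pre-commit`, made executable, and end with `raise SystemExit(main())`; `git commit --no-verify` bypasses the hook when needed.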





[jira] [Updated] (ARROW-2365) [Plasma] Return status codes instead of crashing

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2365:
---
Fix Version/s: (was: 0.9.0)
   0.10.0

> [Plasma] Return status codes instead of crashing
> 
>
> Key: ARROW-2365
> URL: https://issues.apache.org/jira/browse/ARROW-2365
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: Antoine Pitrou
>Priority: Major
> Fix For: 0.10.0
>
>
> When certain {{PlasmaClient}} methods are called with bad arguments, 
> PlasmaClient crashes instead of returning an error Status. For example, try 
> calling {{Seal()}} with a non-existent object id.
> This is hostile towards users of high-level languages such as Python.
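The requested behaviour can be sketched in Python with a toy stand-in for `PlasmaClient`. `Status` and `ToyObjectStore` below are hypothetical and are not the Plasma API; they only illustrate returning an error status for a non-existent object id instead of aborting the whole process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Status:
    """Toy stand-in for an Arrow-style Status: OK, or an error code plus message."""
    code: str = "OK"
    message: str = ""

    def ok(self):
        return self.code == "OK"


class ToyObjectStore:
    """Hypothetical object store whose methods report bad input instead of crashing."""

    def __init__(self):
        self._unsealed = set()
        self._sealed = set()

    def create(self, object_id):
        if object_id in self._unsealed or object_id in self._sealed:
            return Status("KeyError", "object %r already exists" % (object_id,))
        self._unsealed.add(object_id)
        return Status()

    def seal(self, object_id):
        # An assertion here would kill the interpreter; report the error instead.
        if object_id not in self._unsealed:
            return Status("KeyError",
                          "object %r is unknown or already sealed" % (object_id,))
        self._unsealed.remove(object_id)
        self._sealed.add(object_id)
        return Status()
```

A Python binding could then turn a non-OK status into an exception that the caller is able to catch.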





[jira] [Resolved] (ARROW-2531) [C++] Update clang bits to 6.0

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2531.

   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1977
[https://github.com/apache/arrow/pull/1977]

> [C++] Update clang bits to 6.0
> --
>
> Key: ARROW-2531
> URL: https://issues.apache.org/jira/browse/ARROW-2531
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-2466) [C++] misleading "append" flag to FileOutputStream

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2466.

   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1978
[https://github.com/apache/arrow/pull/1978]

> [C++] misleading "append" flag to FileOutputStream
> --
>
> Key: ARROW-2466
> URL: https://issues.apache.org/jira/browse/ARROW-2466
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{FileOutputStream}} has a constructor option named {{append}}, but all it 
> does is prevent truncation of the file, i.e. it doesn't move the file pointer 
> to the end. And given {{FileOutputStream}} doesn't have a seek method, this 
> option is useless unless you manually seek using the file descriptor.
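The distinction can be reproduced with the file flags exposed by Python's `os` module. This illustrates the underlying file semantics only, not Arrow's `FileOutputStream` code, and the helper names are hypothetical: opening without `O_TRUNC` keeps the old bytes but starts writing at offset 0, while either seeking to the end first or opening with `O_APPEND` actually appends.

```python
import os
import tempfile


def write_with_flags(path, data, extra_flags=0, seek_to_end=False):
    """Open without O_TRUNC, optionally seek to EOF, then write data."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | extra_flags, 0o644)
    try:
        if seek_to_end:
            # The manual workaround: move the file pointer to the end ourselves.
            os.lseek(fd, 0, os.SEEK_END)
        os.write(fd, data)
    finally:
        os.close(fd)


def read_bytes(path):
    with open(path, "rb") as f:
        return f.read()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as f:
            f.write(b"hello world")
        write_with_flags(path, b"HELLO")  # no truncation, but writes at offset 0
        no_truncate = read_bytes(path)
        write_with_flags(path, b"!", seek_to_end=True)  # explicit seek to EOF
        after_seek = read_bytes(path)
        write_with_flags(path, b"?", extra_flags=os.O_APPEND)  # OS appends for us
        after_append = read_bytes(path)
    return no_truncate, after_seek, after_append
```

Here `demo()` returns `(b"HELLO world", b"HELLO world!", b"HELLO world!?")`: only the last two writes land at the end of the file.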





[jira] [Assigned] (ARROW-2332) [Python] Provide API for reading multiple Feather files

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-2332:
--

Assignee: Dhruv Madeka

> [Python] Provide API for reading multiple Feather files
> ---
>
> Key: ARROW-2332
> URL: https://issues.apache.org/jira/browse/ARROW-2332
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Dhruv Madeka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> See discussion in 
> https://github.com/wesm/feather/issues/273#issuecomment-374093374





[jira] [Resolved] (ARROW-2332) [Python] Provide API for reading multiple Feather files

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2332.

Resolution: Fixed

Issue resolved by pull request 1960
[https://github.com/apache/arrow/pull/1960]

> [Python] Provide API for reading multiple Feather files
> ---
>
> Key: ARROW-2332
> URL: https://issues.apache.org/jira/browse/ARROW-2332
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> See discussion in 
> https://github.com/wesm/feather/issues/273#issuecomment-374093374





[jira] [Commented] (ARROW-2522) [C++] Version shared library files

2018-05-01 Thread Kouhei Sutou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16460325#comment-16460325
 ] 

Kouhei Sutou commented on ARROW-2522:
-

If we use the libarrow.so.10.0.0 naming, we should also use libarrow.so.10 
instead of libarrow.so.0.
libVLC does the same: for example, VLC 3.0.2 provides libvlc.so.5 and 
libvlc.so.5.6.0.

> [C++] Version shared library files
> --
>
> Key: ARROW-2522
> URL: https://issues.apache.org/jira/browse/ARROW-2522
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We should version installed shared library files (SO under Unix, DLL under 
> Windows) to disambiguate incompatible ABI versions.
> CMake provides support for that:
> http://pusling.com/blog/?p=352
> https://cmake.org/cmake/help/v3.11/prop_tgt/SOVERSION.html





[jira] [Resolved] (ARROW-2533) [CI] Fast finish failing AppVeyor builds

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2533.

   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1982
[https://github.com/apache/arrow/pull/1982]

> [CI] Fast finish failing AppVeyor builds
> 
>
> Key: ARROW-2533
> URL: https://issues.apache.org/jira/browse/ARROW-2533
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The main AppVeyor queue takes a very long time to schedule jobs. One 
> measure to improve this would be to immediately fail a job once a build is 
> broken.





[jira] [Commented] (ARROW-2510) [Python] Segmentation fault when converting empty column as categorical

2018-05-01 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16460079#comment-16460079
 ] 

Uwe L. Korn commented on ARROW-2510:


[~fjetter] I cannot reproduce the behaviour that I believe you are describing. 
The following code succeeds on master:

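{code}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# One-row frame with a categorical column
df = pd.DataFrame({'a': ['str']})
df['a'] = df['a'].astype('category')
# _common_metadata stores only the schema, so reading it back gives zero rows
pq.write_metadata(pa.Table.from_pandas(df).schema, '_common_metadata')
t = pq.read_table('_common_metadata')
# Convert the empty column back to a pandas categorical
t.to_pandas(categories=['a'])
{code}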

> [Python] Segmentation fault when converting empty column as categorical
> ---
>
> Key: ARROW-2510
> URL: https://issues.apache.org/jira/browse/ARROW-2510
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: Florian Jetter
>Assignee: Uwe L. Korn
>Priority: Minor
> Fix For: 0.10.0
>
>
> When converting an empty column to categorical in pandas, I get a segmentation 
> fault.





[jira] [Updated] (ARROW-2533) [CI] Fast finish failing AppVeyor builds

2018-05-01 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2533:
--
Labels: pull-request-available  (was: )

> [CI] Fast finish failing AppVeyor builds
> 
>
> Key: ARROW-2533
> URL: https://issues.apache.org/jira/browse/ARROW-2533
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
>
> The main AppVeyor queue takes a very long time to schedule jobs. One 
> measure to improve this would be to immediately fail a job once a build is 
> broken.





[jira] [Resolved] (ARROW-2534) [C++] libarrow.so leaks zlib symbols

2018-05-01 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2534.
---
   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1981
[https://github.com/apache/arrow/pull/1981]

> [C++] libarrow.so leaks zlib symbols
> 
>
> Key: ARROW-2534
> URL: https://issues.apache.org/jira/browse/ARROW-2534
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I get the following here:
> {code:bash}
> $ nm -D -C /home/antoine/miniconda3/envs/pyarrow/lib/libarrow.so.0.0.0 | \grep ' T ' | \grep -v arrow
> 0025bc8c T adler32_z
> 0025c4c9 T crc32_z
> 002ad638 T _fini
> 00078ab8 T _init
> {code}





[jira] [Updated] (ARROW-2534) [C++] libarrow.so leaks zlib symbols

2018-05-01 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2534:
--
Labels: pull-request-available  (was: )

> [C++] libarrow.so leaks zlib symbols
> 
>
> Key: ARROW-2534
> URL: https://issues.apache.org/jira/browse/ARROW-2534
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> I get the following here:
> {code:bash}
> $ nm -D -C /home/antoine/miniconda3/envs/pyarrow/lib/libarrow.so.0.0.0 | \grep ' T ' | \grep -v arrow
> 0025bc8c T adler32_z
> 0025c4c9 T crc32_z
> 002ad638 T _fini
> 00078ab8 T _init
> {code}





[jira] [Assigned] (ARROW-2534) [C++] libarrow.so leaks zlib symbols

2018-05-01 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-2534:
-

Assignee: Antoine Pitrou

> [C++] libarrow.so leaks zlib symbols
> 
>
> Key: ARROW-2534
> URL: https://issues.apache.org/jira/browse/ARROW-2534
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>
> I get the following here:
> {code:bash}
> $ nm -D -C /home/antoine/miniconda3/envs/pyarrow/lib/libarrow.so.0.0.0 | \grep ' T ' | \grep -v arrow
> 0025bc8c T adler32_z
> 0025c4c9 T crc32_z
> 002ad638 T _fini
> 00078ab8 T _init
> {code}





[jira] [Created] (ARROW-2534) [C++] libarrow.so leaks zlib symbols

2018-05-01 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2534:
-

 Summary: [C++] libarrow.so leaks zlib symbols
 Key: ARROW-2534
 URL: https://issues.apache.org/jira/browse/ARROW-2534
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


I get the following here:

{code:bash}
$ nm -D -C /home/antoine/miniconda3/envs/pyarrow/lib/libarrow.so.0.0.0 | \grep ' T ' | \grep -v arrow
0025bc8c T adler32_z
0025c4c9 T crc32_z
002ad638 T _fini
00078ab8 T _init
{code}
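For a check that could run in CI, the `grep` pipeline above can be expressed in Python. This is a sketch: `leaked_symbols` is a hypothetical helper that takes the text output of `nm -D -C` and returns the exported text (`T`) symbols that do not mention `arrow`, ignoring the toolchain-provided `_init` and `_fini`.

```python
def leaked_symbols(nm_output, allowed=("_init", "_fini")):
    """Like `grep ' T ' | grep -v arrow` on `nm -D -C` output, minus allowed names."""
    leaked = []
    for line in nm_output.splitlines():
        if " T " not in line or "arrow" in line:
            continue
        # Everything after the " T " type column is the (demangled) symbol name.
        name = line.split(" T ", 1)[1].strip()
        if name not in allowed:
            leaked.append(name)
    return leaked


# The output quoted in this issue.
NM_OUTPUT = """\
0025bc8c T adler32_z
0025c4c9 T crc32_z
002ad638 T _fini
00078ab8 T _init
"""
```

On the quoted output, `leaked_symbols(NM_OUTPUT)` returns `["adler32_z", "crc32_z"]`. In a real check the text could come from `subprocess.check_output(["nm", "-D", "-C", path], universal_newlines=True)`.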





[jira] [Resolved] (ARROW-2499) [C++] Add iterator facility for Python sequences

2018-05-01 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2499.
---
   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1940
[https://github.com/apache/arrow/pull/1940]

> [C++] Add iterator facility for Python sequences
> 
>
> Key: ARROW-2499
> URL: https://issues.apache.org/jira/browse/ARROW-2499
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The idea is to factor out something like the following:
> https://github.com/apache/arrow/pull/1935/files#diff-6ea0fcd65b95b76eab9ddfbd7a173725R78
> However, I'm not sure which idiom or pattern we should choose. [~cpcloud], any 
> idea?





[jira] [Updated] (ARROW-2505) [C++] Disable MSVC warning C4800

2018-05-01 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2505:
--
Labels: build pull-request-available windows  (was: build windows)

> [C++] Disable MSVC warning C4800
> 
>
> Key: ARROW-2505
> URL: https://issues.apache.org/jira/browse/ARROW-2505
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: build, pull-request-available, windows
>
> This warning is practically pointless, and since we treat warnings as errors 
> on Appveyor, it imposes spurious back-and-forths to fix it when it occurs.
> https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-level-3-c4800
> {quote}This warning is generated when a value that is not bool is assigned or 
> coerced into type bool. Typically, this message is caused by assigning int 
> variables to bool variables where the int variable contains only values true 
> and false, and could be redeclared as type bool. If you cannot rewrite the 
> expression to use type bool, then you can add "!=0" to the expression, which 
> gives the expression type bool. Casting the expression to type bool does not 
> disable the warning, which is by design.
> This warning is no longer generated in Visual Studio 2017.
> {quote}





[jira] [Commented] (ARROW-2516) AppVeyor Build Matrix should be specific to the changes made in a PR

2018-05-01 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459998#comment-16459998
 ] 

Antoine Pitrou commented on ARROW-2516:
---

We can reuse the following logic from CPython:
https://github.com/python/cpython/blob/master/.github/appveyor.yml#L11-L26
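The idea can be sketched in Python independently of how the CPython configuration implements it: given the paths a PR changes (for instance from `git diff --name-only` against the target branch), decide which build jobs are needed. The directory-to-job mapping and the function name below are hypothetical.

```python
# Hypothetical mapping from a top-level path to the AppVeyor jobs it affects.
JOBS_BY_TOP_LEVEL = {
    "cpp": {"cpp", "python"},  # the Python bindings build on the C++ library
    "python": {"python"},
    "rust": {"rust"},
    "ci": {"cpp", "python", "rust"},
    "appveyor.yml": {"cpp", "python", "rust"},
}


def jobs_to_run(changed_paths):
    """Return the sorted job names triggered by the changed paths."""
    jobs = set()
    for path in changed_paths:
        top_level = path.split("/", 1)[0]
        jobs |= JOBS_BY_TOP_LEVEL.get(top_level, set())
    return sorted(jobs)
```

An empty result would let the build stop early, similar in spirit to the linked CPython logic.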

> AppVeyor Build Matrix should be specific to the changes made in a PR
> 
>
> Key: ARROW-2516
> URL: https://issues.apache.org/jira/browse/ARROW-2516
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Paddy Horan
>Priority: Major
>






[jira] [Created] (ARROW-2533) [CI] Fast finish failing AppVeyor builds

2018-05-01 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2533:
--

 Summary: [CI] Fast finish failing AppVeyor builds
 Key: ARROW-2533
 URL: https://issues.apache.org/jira/browse/ARROW-2533
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Continuous Integration
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn


The main AppVeyor queue takes a very long time to schedule jobs. One 
measure to improve this would be to immediately fail a job once a build is 
broken.





[jira] [Assigned] (ARROW-2466) [C++] misleading "append" flag to FileOutputStream

2018-05-01 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-2466:
-

Assignee: Antoine Pitrou

> [C++] misleading "append" flag to FileOutputStream
> --
>
> Key: ARROW-2466
> URL: https://issues.apache.org/jira/browse/ARROW-2466
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{FileOutputStream}} has a constructor option named {{append}}, but all it 
> does is prevent truncation of the file, i.e. it doesn't move the file pointer 
> to the end. And given {{FileOutputStream}} doesn't have a seek method, this 
> option is useless unless you manually seek using the file descriptor.





[jira] [Updated] (ARROW-2466) [C++] misleading "append" flag to FileOutputStream

2018-05-01 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2466:
--
Labels: pull-request-available  (was: )

> [C++] misleading "append" flag to FileOutputStream
> --
>
> Key: ARROW-2466
> URL: https://issues.apache.org/jira/browse/ARROW-2466
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> {{FileOutputStream}} has a constructor option named {{append}}, but all it 
> does is prevent truncation of the file, i.e. it doesn't move the file pointer 
> to the end. And given {{FileOutputStream}} doesn't have a seek method, this 
> option is useless unless you manually seek using the file descriptor.





[jira] [Resolved] (ARROW-2417) [Rust] Review APIs for safety

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2417.

Resolution: Fixed

Issue resolved by pull request 1957
[https://github.com/apache/arrow/pull/1957]

> [Rust] Review APIs for safety
> -
>
> Key: ARROW-2417
> URL: https://issues.apache.org/jira/browse/ARROW-2417
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The Rust library makes a lot of use of unsafe calls. We should review the API 
> to see if any methods we expose should be marked unsafe or whether we need to 
> add assertions to make APIs safe.
> We should also add more unit tests around this.





[jira] [Assigned] (ARROW-2509) [CI] Intermittent npm failures

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-2509:
--

Assignee: Uwe L. Korn

> [CI] Intermittent npm failures
> --
>
> Key: ARROW-2509
> URL: https://issues.apache.org/jira/browse/ARROW-2509
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, JavaScript
>Reporter: Antoine Pitrou
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> We occasionally get the following kind of errors on Travis-CI:
> https://travis-ci.org/apache/arrow/jobs/371023429#L1925
> {code}
> gulp[17318]: ../src/node_file.cc:829:void node::fs::Stat(const v8::FunctionCallbackInfo&): Assertion `(argc) == (3)' failed.
>  1: node::Abort() [gulp]
>  2: 0x87b6c5 [gulp]
>  3: 0x8b2de2 [gulp]
>  4: v8::internal::FunctionCallbackArguments::Call(v8::internal::CallHandlerInfo*) [gulp]
>  5: 0xad62fa [gulp]
>  6: v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) [gulp]
>  7: 0xad165d0427d
> Aborted (core dumped)
> npm ERR! code ELIFECYCLE
> npm ERR! errno 134
> npm ERR! apache-arrow@0.3.0 build: `gulp build`
> npm ERR! Exit status 134
> npm ERR! 
> npm ERR! Failed at the apache-arrow@0.3.0 build script.
> npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
> {code}





[jira] [Resolved] (ARROW-2509) [CI] Intermittent npm failures

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2509.

   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1976
[https://github.com/apache/arrow/pull/1976]

> [CI] Intermittent npm failures
> --
>
> Key: ARROW-2509
> URL: https://issues.apache.org/jira/browse/ARROW-2509
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, JavaScript
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> We occasionally get the following kind of errors on Travis-CI:
> https://travis-ci.org/apache/arrow/jobs/371023429#L1925
> {code}
> gulp[17318]: ../src/node_file.cc:829:void node::fs::Stat(const v8::FunctionCallbackInfo&): Assertion `(argc) == (3)' failed.
>  1: node::Abort() [gulp]
>  2: 0x87b6c5 [gulp]
>  3: 0x8b2de2 [gulp]
>  4: v8::internal::FunctionCallbackArguments::Call(v8::internal::CallHandlerInfo*) [gulp]
>  5: 0xad62fa [gulp]
>  6: v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) [gulp]
>  7: 0xad165d0427d
> Aborted (core dumped)
> npm ERR! code ELIFECYCLE
> npm ERR! errno 134
> npm ERR! apache-arrow@0.3.0 build: `gulp build`
> npm ERR! Exit status 134
> npm ERR! 
> npm ERR! Failed at the apache-arrow@0.3.0 build script.
> npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
> {code}





[jira] [Created] (ARROW-2532) [C++] Add chunked builder classes

2018-05-01 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2532:
-

 Summary: [C++] Add chunked builder classes
 Key: ARROW-2532
 URL: https://issues.apache.org/jira/browse/ARROW-2532
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


I think it would be useful to have chunked builders for list, string and binary 
types. A chunked builder would produce a chunked array as output, circumventing 
the 32-bit offset limit of those types. Right now there's some special-casing 
for this scattered around our NumPy conversion routines.
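The chunking rule itself is easy to sketch. For string and binary arrays, the final 32-bit offset equals the number of data bytes, so each chunk's data must stay within 2**31 - 1 bytes. The pure-Python helper below is hypothetical and not an Arrow API; it only groups byte strings by that rule, whereas a real builder would presumably do this in C++ while appending.

```python
INT32_MAX = 2**31 - 1


def chunk_by_offset_limit(values, limit=INT32_MAX):
    """Split byte strings into chunks whose total size fits a signed 32-bit offset."""
    chunks, current, current_size = [], [], 0
    for value in values:
        size = len(value)
        if size > limit:
            raise ValueError("a single value of %d bytes cannot fit in any chunk" % size)
        if current and current_size + size > limit:
            # Starting a new chunk lets its offsets restart from zero.
            chunks.append(current)
            current, current_size = [], 0
        current.append(value)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

With a small limit for illustration, `chunk_by_offset_limit([b"ab", b"cd", b"e"], limit=4)` gives `[[b"ab", b"cd"], [b"e"]]`; each inner list would become one chunk of the resulting chunked array.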





[jira] [Updated] (ARROW-2531) [C++] Update clang bits to 6.0

2018-05-01 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2531:
--
Labels: pull-request-available  (was: )

> [C++] Update clang bits to 6.0
> --
>
> Key: ARROW-2531
> URL: https://issues.apache.org/jira/browse/ARROW-2531
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (ARROW-2531) [C++] Update clang bit to 6.0

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2531:
---
Summary: [C++] Update clang bit to 6.0  (was: [C++] Update clang-format to 
6.0)

> [C++] Update clang bit to 6.0
> -
>
> Key: ARROW-2531
> URL: https://issues.apache.org/jira/browse/ARROW-2531
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>






[jira] [Updated] (ARROW-2531) [C++] Update clang bits to 6.0

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2531:
---
Summary: [C++] Update clang bits to 6.0  (was: [C++] Update clang bit to 
6.0)

> [C++] Update clang bits to 6.0
> --
>
> Key: ARROW-2531
> URL: https://issues.apache.org/jira/browse/ARROW-2531
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>






[jira] [Created] (ARROW-2531) [C++] Update clang-format to 6.0

2018-05-01 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2531:
--

 Summary: [C++] Update clang-format to 6.0
 Key: ARROW-2531
 URL: https://issues.apache.org/jira/browse/ARROW-2531
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn








[jira] [Updated] (ARROW-2509) [CI] Intermittent npm failures

2018-05-01 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2509:
--
Labels: pull-request-available  (was: )

> [CI] Intermittent npm failures
> --
>
> Key: ARROW-2509
> URL: https://issues.apache.org/jira/browse/ARROW-2509
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, JavaScript
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> We occasionally get the following kind of errors on Travis-CI:
> https://travis-ci.org/apache/arrow/jobs/371023429#L1925
> {code}
> gulp[17318]: ../src/node_file.cc:829:void node::fs::Stat(const v8::FunctionCallbackInfo&): Assertion `(argc) == (3)' failed.
>  1: node::Abort() [gulp]
>  2: 0x87b6c5 [gulp]
>  3: 0x8b2de2 [gulp]
>  4: v8::internal::FunctionCallbackArguments::Call(v8::internal::CallHandlerInfo*) [gulp]
>  5: 0xad62fa [gulp]
>  6: v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) [gulp]
>  7: 0xad165d0427d
> Aborted (core dumped)
> npm ERR! code ELIFECYCLE
> npm ERR! errno 134
> npm ERR! apache-arrow@0.3.0 build: `gulp build`
> npm ERR! Exit status 134
> npm ERR! 
> npm ERR! Failed at the apache-arrow@0.3.0 build script.
> npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
> {code}





[jira] [Commented] (ARROW-2522) [C++] Version shared library files

2018-05-01 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459731#comment-16459731
 ] 

Uwe L. Korn commented on ARROW-2522:


I would prefer the latter. Once we bump the major version to 1, we would need 
to add an offset to the ABI version. Other libraries such as libVLC do 
similar things.

> [C++] Version shared library files
> --
>
> Key: ARROW-2522
> URL: https://issues.apache.org/jira/browse/ARROW-2522
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We should version installed shared library files (SO under Unix, DLL under 
> Windows) to disambiguate incompatible ABI versions.
> CMake provides support for that:
> http://pusling.com/blog/?p=352
> https://cmake.org/cmake/help/v3.11/prop_tgt/SOVERSION.html





[jira] [Updated] (ARROW-2522) [C++] Version shared library files

2018-05-01 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2522:
--
Labels: pull-request-available  (was: )

> [C++] Version shared library files
> --
>
> Key: ARROW-2522
> URL: https://issues.apache.org/jira/browse/ARROW-2522
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We should version installed shared library files (SO under Unix, DLL under 
> Windows) to disambiguate incompatible ABI versions.
> CMake provides support for that:
> http://pusling.com/blog/?p=352
> https://cmake.org/cmake/help/v3.11/prop_tgt/SOVERSION.html





[jira] [Commented] (ARROW-2522) [C++] Version shared library files

2018-05-01 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459727#comment-16459727
 ] 

Antoine Pitrou commented on ARROW-2522:
---

What should the naming scheme be? `libarrow.so.0.10.0` reflects the official 
Arrow version, but not the ABI version (because the user will link with 
`libarrow.so.0`). `libarrow.so.10.0.0` reflects the ABI version, but not the 
official Arrow version (and therefore might be more cryptic).
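The two candidate schemes can be made concrete with a small helper. This is a sketch: `library_names` is hypothetical, and the `major * 100 + minor` offset used for the ABI scheme is only one possible way to keep 1.x ABI numbers from reusing 0.x ones. In CMake, the two returned strings would correspond to a target's `VERSION` and `SOVERSION` properties.

```python
def library_names(version, scheme):
    """Return (real file name, SONAME) for libarrow under one of two naming schemes."""
    major, minor, patch = (int(part) for part in version.split("."))
    if scheme == "release":
        # File name mirrors the release; the SONAME keeps only the major version.
        real_name = "libarrow.so.%d.%d.%d" % (major, minor, patch)
        soname = "libarrow.so.%d" % major
    elif scheme == "abi":
        # Hypothetical offset (assumes minor < 100): 0.10 -> 10, 1.0 -> 100.
        abi = major * 100 + minor
        real_name = "libarrow.so.%d.%d.0" % (abi, patch)
        soname = "libarrow.so.%d" % abi
    else:
        raise ValueError("unknown scheme: %r" % (scheme,))
    return real_name, soname
```

Under the "release" scheme every 0.x release shares the SONAME `libarrow.so.0` even though their ABIs differ, while the "abi" scheme gives 0.10.0 the SONAME `libarrow.so.10`.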

> [C++] Version shared library files
> --
>
> Key: ARROW-2522
> URL: https://issues.apache.org/jira/browse/ARROW-2522
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Priority: Major
> Fix For: 0.10.0
>
>
> We should version installed shared library files (SO under Unix, DLL under 
> Windows) to disambiguate incompatible ABI versions.
> CMake provides support for that:
> http://pusling.com/blog/?p=352
> https://cmake.org/cmake/help/v3.11/prop_tgt/SOVERSION.html





[jira] [Resolved] (ARROW-2503) Trailing space character in RowGroup statistics of pyarrow.parquet.ParquetFile

2018-05-01 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2503.
---
   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1945
[https://github.com/apache/arrow/pull/1945]

> Trailing space character in RowGroup statistics of pyarrow.parquet.ParquetFile
> --
>
> Key: ARROW-2503
> URL: https://issues.apache.org/jira/browse/ARROW-2503
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Julius Neuffer
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When reading a parquet file containing a string column, the _RowGroup_ 
> statistics contain a trailing space character for the string column. The 
> example below shows the behavior.
> {code}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> # create and write arrow table as parquet
> df = pd.DataFrame({'string_column': ['some', 'string', 'values', 'here']})
> table = pa.Table.from_pandas(df)
> pq.write_table(table, 'example.parquet')
> # read parquet file metadata and print string column statistics
> pq_file = pq.ParquetFile(open('example.parquet', 'rb'))
> print(pq_file.metadata.row_group(0).column(0).statistics.max) # yields b'values '
> print(pq_file.metadata.row_group(0).column(0).statistics.min) # yields b'here '
> {code}
> For other data types I did not observe this problem, even though the 
> statistics are always strings.
> When reading the same file with _fastparquet_, there is no trailing space 
> character, which implies that this problem occurs in the reading path of 
> _pyarrow.parquet_. I am aware that this might well be an issue with 
> _parquet-cpp_, but as I face this bug as a _pyarrow_ user, I report it here.
> I'll try to investigate this further and report back here.
>  
> *Update:*
> The trailing space is added in _parquet-cpp_. _pyarrow_ calls the function 
> _FormatStatValue_ which adds the trailing space 
> (https://github.com/apache/parquet-cpp/blob/master/src/parquet/types.cc#L52). 
> There is no comment there to explain it. Does anyone here know what the 
> reason is?
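The fix is in 0.10.0. Until then, a pyarrow 0.9.0 user could strip the padding on the Python side. This is a minimal sketch that assumes the affected version always appends exactly one space to string statistics. `strip_stat_padding` is a hypothetical helper. On a fixed version it would wrongly truncate a value whose real last character is a space.

```python
def strip_stat_padding(value: bytes) -> bytes:
    """Drop the single trailing space that affected versions append.

    Assumes the value came from a build that always pads formatted string
    statistics with exactly one space (e.g. b'values ' for b'values').
    """
    if value.endswith(b" "):
        return value[:-1]
    return value
```

Usage: `strip_stat_padding(pq_file.metadata.row_group(0).column(0).statistics.max)`.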





[jira] [Assigned] (ARROW-2503) Trailing space character in RowGroup statistics of pyarrow.parquet.ParquetFile

2018-05-01 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-2503:
-

Assignee: Julius Neuffer

> Trailing space character in RowGroup statistics of pyarrow.parquet.ParquetFile
> --
>
> Key: ARROW-2503
> URL: https://issues.apache.org/jira/browse/ARROW-2503
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Julius Neuffer
>Assignee: Julius Neuffer
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When reading a parquet file containing a string column, the _RowGroup_ 
> statistics contain a trailing space character for the string column. The 
> example below shows the behavior.
> {code}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> # create and write arrow table as parquet
> df = pd.DataFrame({'string_column': ['some', 'string', 'values', 'here']})
> table = pa.Table.from_pandas(df)
> pq.write_table(table, 'example.parquet')
> # read parquet file metadata and print string column statistics
> pq_file = pq.ParquetFile(open('example.parquet', 'rb'))
> print(pq_file.metadata.row_group(0).column(0).statistics.max) # yields b'values '
> print(pq_file.metadata.row_group(0).column(0).statistics.min) # yields b'here '
> {code}
> For other data types I did not observe this problem, even though the 
> statistics are always strings.
> When reading the same file with _fastparquet_, there is no trailing space 
> character, which implies that this problem occurs in the reading path of 
> _pyarrow.parquet_. I am aware that this might well be an issue with 
> _parquet-cpp_, but as I face this bug as a _pyarrow_ user, I report it here.
> I'll try to investigate this further and report back here.
>  
> *Update:*
> The trailing space is added in _parquet-cpp_. _pyarrow_ calls the function 
> _FormatStatValue_ which adds the trailing space 
> (https://github.com/apache/parquet-cpp/blob/master/src/parquet/types.cc#L52). 
> There is no comment there to explain it. Does anyone here know what the 
> reason is?





[jira] [Resolved] (ARROW-2484) [C++] Document ABI compliance checking

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2484.

Resolution: Fixed

Issue resolved by pull request 1922
[https://github.com/apache/arrow/pull/1922]

> [C++] Document ABI compliance checking
> --
>
> Key: ARROW-2484
> URL: https://issues.apache.org/jira/browse/ARROW-2484
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[jira] [Assigned] (ARROW-2484) [C++] Document ABI compliance checking

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-2484:
--

Assignee: Uwe L. Korn

> [C++] Document ABI compliance checking
> --
>
> Key: ARROW-2484
> URL: https://issues.apache.org/jira/browse/ARROW-2484
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-2485) [C++] Output diff when run_clang_format.py reports a change

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2485.

   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1918
[https://github.com/apache/arrow/pull/1918]

> [C++] Output diff when run_clang_format.py reports a change
> ---
>
> Key: ARROW-2485
> URL: https://issues.apache.org/jira/browse/ARROW-2485
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Uwe L. Korn
>Assignee: Joshua Storck
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-2530) [GLib] Out-of-source build is failed

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2530.

Resolution: Fixed

Issue resolved by pull request 1974
[https://github.com/apache/arrow/pull/1974]

> [GLib] Out-of-source build is failed 
> -
>
> Key: ARROW-2530
> URL: https://issues.apache.org/jira/browse/ARROW-2530
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GLib
>Affects Versions: 0.9.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-2530) [GLib] Out-of-source build is failed

2018-05-01 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-2530:
---

 Summary: [GLib] Out-of-source build is failed 
 Key: ARROW-2530
 URL: https://issues.apache.org/jira/browse/ARROW-2530
 Project: Apache Arrow
  Issue Type: Bug
  Components: GLib
Affects Versions: 0.9.0
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
 Fix For: 0.10.0








[jira] [Updated] (ARROW-2530) [GLib] Out-of-source build is failed

2018-05-01 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2530:
--
Labels: pull-request-available  (was: )

> [GLib] Out-of-source build is failed 
> -
>
> Key: ARROW-2530
> URL: https://issues.apache.org/jira/browse/ARROW-2530
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GLib
>Affects Versions: 0.9.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[jira] [Commented] (ARROW-2486) [C++/Python] Provide a Docker image that contains all dependencies for development

2018-05-01 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459622#comment-16459622
 ] 

Uwe L. Korn commented on ARROW-2486:


I think we should use the latest Ubuntu version as the base image, as this is an 
environment many people are familiar with if they need to execute something in 
the image. Otherwise we're quite open on how this would work. The difficulty will 
be to have an image that can run incremental builds in a persisted volume or 
bind mount. Non-incremental builds would make it unusable for development.

> [C++/Python] Provide a Docker image that contains all dependencies for 
> development
> --
>
> Key: ARROW-2486
> URL: https://issues.apache.org/jira/browse/ARROW-2486
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: hackathon
> Fix For: 0.11.0
>
>
> We should provide a Docker image and a Dockerfile that contain all the 
> dependencies needed for development. In addition, there should be a 
> Dockerfile that can be used for development, where the sources are 
> (bind-)mounted into the container. A typical workflow should consist of a 
> wrapper script that starts the container, takes care of the bind mounts and 
> runs cmake if necessary.
> People who want to get started with Arrow development on e.g. OS X currently 
> spend a long time setting up the environment. I hope this lowers the barrier 
> for a first contribution a bit.
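The wrapper script described above could start out as something like the following Python sketch. It only builds the `docker run` argument list. The image name `arrow-dev`, the volume name `arrow-build` and the in-container paths are hypothetical. It assumes the image provides bash, cmake, make and nproc. The build directory is a named volume, so CMake's cache and object files persist between runs and builds stay incremental.

```python
from typing import List


def docker_dev_command(source_dir: str, image: str = "arrow-dev",
                       build_volume: str = "arrow-build") -> List[str]:
    """Build a `docker run` argument list for an incremental dev build.

    The Arrow checkout is bind-mounted at /arrow; the build directory lives
    in a named volume mounted at /build so it survives container restarts.
    CMake is only invoked when no CMakeCache.txt exists yet.
    """
    inner = '(test -f CMakeCache.txt || cmake /arrow/cpp) && make -j"$(nproc)"'
    return [
        "docker", "run", "--rm", "-it",
        "-v", f"{source_dir}:/arrow",      # bind mount of the sources
        "-v", f"{build_volume}:/build",    # persisted build directory
        "-w", "/build",
        image,
        "bash", "-c", inner,
    ]
```

Usage: `subprocess.run(docker_dev_command(os.getcwd()), check=True)` from the root of an Arrow checkout.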





[jira] [Updated] (ARROW-2522) [C++] Version shared library files

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2522:
---
Fix Version/s: 0.10.0

> [C++] Version shared library files
> --
>
> Key: ARROW-2522
> URL: https://issues.apache.org/jira/browse/ARROW-2522
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Priority: Major
> Fix For: 0.10.0
>
>
> We should version installed shared library files (SO under Unix, DLL under 
> Windows) to disambiguate incompatible ABI versions.
> CMake provides support for that:
> http://pusling.com/blog/?p=352
> https://cmake.org/cmake/help/v3.11/prop_tgt/SOVERSION.html





[jira] [Commented] (ARROW-2522) [C++] Version shared library files

2018-05-01 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459606#comment-16459606
 ] 

Uwe L. Korn commented on ARROW-2522:


{{parquet-cpp}} is already doing this, so we might have a look at whether we can 
reuse its approach or whether it also needs to be adjusted.

> [C++] Version shared library files
> --
>
> Key: ARROW-2522
> URL: https://issues.apache.org/jira/browse/ARROW-2522
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Priority: Major
> Fix For: 0.10.0
>
>
> We should version installed shared library files (SO under Unix, DLL under 
> Windows) to disambiguate incompatible ABI versions.
> CMake provides support for that:
> http://pusling.com/blog/?p=352
> https://cmake.org/cmake/help/v3.11/prop_tgt/SOVERSION.html





[jira] [Assigned] (ARROW-2422) Support more filter operators on Hive partitioned Parquet files

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-2422:
--

Assignee: Julius Neuffer

> Support more filter operators on Hive partitioned Parquet files
> ---
>
> Key: ARROW-2422
> URL: https://issues.apache.org/jira/browse/ARROW-2422
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Julius Neuffer
>Assignee: Julius Neuffer
>Priority: Minor
>  Labels: features, pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> After implementing basic filters ('=', '!=') on Hive partitioned Parquet 
> files (ARROW-2401), I'll extend them with '>', '<', '<=' and '>=' in a new PR 
> on GitHub.
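This is not the pyarrow implementation. It is a standalone Python sketch of how the extended operators could be evaluated against one partition's key/value pairs. It assumes filters are (column, operator, value) tuples combined with AND. `partition_matches` is a hypothetical helper.

```python
import operator
from typing import Any, Dict, List, Tuple

# The basic and extended comparison operators named in this issue.
_OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def partition_matches(keys: Dict[str, Any],
                      filters: List[Tuple[str, str, Any]]) -> bool:
    """Return True if a partition's keys satisfy every (column, op, value) filter.

    `keys` is e.g. {"year": 2018, "month": 4}, parsed from a directory path
    such as "year=2018/month=4/" and already converted to the column types.
    """
    for column, op, value in filters:
        if op not in _OPS:
            raise ValueError("unsupported operator: " + op)
        if not _OPS[op](keys[column], value):
            return False
    return True
```

Partition values come from directory names as strings. In Python, `"10" < "9"` is True, so a real implementation must convert the values to the column type before applying the ordering operators.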





[jira] [Resolved] (ARROW-2422) Support more filter operators on Hive partitioned Parquet files

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2422.

   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1861
[https://github.com/apache/arrow/pull/1861]

> Support more filter operators on Hive partitioned Parquet files
> ---
>
> Key: ARROW-2422
> URL: https://issues.apache.org/jira/browse/ARROW-2422
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Julius Neuffer
>Priority: Minor
>  Labels: features, pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> After implementing basic filters ('=', '!=') on Hive partitioned Parquet 
> files (ARROW-2401), I'll extend them with '>', '<', '<=' and '>=' in a new PR 
> on GitHub.





[jira] [Resolved] (ARROW-2507) [Rust] Don't take a reference when not needed

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2507.

   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1941
[https://github.com/apache/arrow/pull/1941]

> [Rust] Don't take a reference when not needed
> -
>
> Key: ARROW-2507
> URL: https://issues.apache.org/jira/browse/ARROW-2507
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Bruce Mitchener
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (ARROW-2507) [Rust] Don't take a reference when not needed

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-2507:
--

Assignee: Bruce Mitchener

> [Rust] Don't take a reference when not needed
> -
>
> Key: ARROW-2507
> URL: https://issues.apache.org/jira/browse/ARROW-2507
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Bruce Mitchener
>Assignee: Bruce Mitchener
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-2482) [Rust] support nested types

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2482.

   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1959
[https://github.com/apache/arrow/pull/1959]

> [Rust] support nested types
> ---
>
> Key: ARROW-2482
> URL: https://issues.apache.org/jira/browse/ARROW-2482
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Chao Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The Rust Array type doesn't seem to support nested types yet. We should 
> implement that support.





[jira] [Resolved] (ARROW-2525) [GLib] Add garrow_struct_array_flatten()

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2525.

Resolution: Fixed

Issue resolved by pull request 1962
[https://github.com/apache/arrow/pull/1962]

> [GLib] Add garrow_struct_array_flatten()
> 
>
> Key: ARROW-2525
> URL: https://issues.apache.org/jira/browse/ARROW-2525
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Affects Versions: 0.9.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-2527) [GLib] Enable GPU document

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2527.

Resolution: Fixed

Issue resolved by pull request 1964
[https://github.com/apache/arrow/pull/1964]

> [GLib] Enable GPU document
> --
>
> Key: ARROW-2527
> URL: https://issues.apache.org/jira/browse/ARROW-2527
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Affects Versions: 0.9.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-2474) [Rust] Add windows support for memory pool abstraction

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2474.

   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1955
[https://github.com/apache/arrow/pull/1955]

> [Rust] Add windows support for memory pool abstraction
> --
>
> Key: ARROW-2474
> URL: https://issues.apache.org/jira/browse/ARROW-2474
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-2526) [GLib] Update .gitignore

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2526.

Resolution: Fixed

Issue resolved by pull request 1963
[https://github.com/apache/arrow/pull/1963]

> [GLib] Update .gitignore
> 
>
> Key: ARROW-2526
> URL: https://issues.apache.org/jira/browse/ARROW-2526
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Affects Versions: 0.9.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-2302) [GLib] Run autotools and meson Linux builds in same Travis CI build entry

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2302.

Resolution: Fixed

Issue resolved by pull request 1967
[https://github.com/apache/arrow/pull/1967]

> [GLib] Run autotools and meson Linux builds in same Travis CI build entry
> -
>
> Key: ARROW-2302
> URL: https://issues.apache.org/jira/browse/ARROW-2302
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Since our CI matrix is going to expand, and these builds are fast (< 5 
> minutes), I suggest we run these builds in the same job:
> https://travis-ci.org/apache/arrow/builds/352848066





[jira] [Assigned] (ARROW-2302) [GLib] Run autotools and meson Linux builds in same Travis CI build entry

2018-05-01 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-2302:
--

Assignee: Kouhei Sutou

> [GLib] Run autotools and meson Linux builds in same Travis CI build entry
> -
>
> Key: ARROW-2302
> URL: https://issues.apache.org/jira/browse/ARROW-2302
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Wes McKinney
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Since our CI matrix is going to expand, and these builds are fast (< 5 
> minutes), I suggest we run these builds in the same job:
> https://travis-ci.org/apache/arrow/builds/352848066





[jira] [Resolved] (ARROW-2462) [C++] Segfault when writing a parquet table containing a dictionary column from Record Batch Stream

2018-05-01 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2462.
---
   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1896
[https://github.com/apache/arrow/pull/1896]

> [C++] Segfault when writing a parquet table containing a dictionary column 
> from Record Batch Stream
> ---
>
> Key: ARROW-2462
> URL: https://issues.apache.org/jira/browse/ARROW-2462
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.9.1
>Reporter: Matt Topol
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Discovered this through using pyarrow and dealing with RecordBatch Streams 
> and parquet. The issue can be replicated as follows:
> {code:python}
> import pyarrow as pa
> import pyarrow.parquet as pq
> # create record batch with 1 dictionary column
> indices = pa.array([1,0,1,1,0])
> dictionary = pa.array(['Foo', 'Bar'])
> dict_array = pa.DictionaryArray.from_arrays(indices, dictionary)
> rb = pa.RecordBatch.from_arrays( [ dict_array ], [ 'd0' ] )
> # write out using RecordBatchStreamWriter
> sink = pa.BufferOutputStream()
> writer = pa.RecordBatchStreamWriter(sink, rb.schema)
> writer.write_batch(rb)
> writer.close()
> buf = sink.get_result()
> # read in and try to write parquet table
> reader = pa.open_stream(buf)
> tbl = reader.read_all()
> pq.write_table(tbl, 'dict_table.parquet') # SEGFAULTS
> {code}
> When writing record batch streams, if an array has no nulls, Arrow writes a 
> placeholder nullptr instead of a full bitmap of 1s. When that stream is 
> deserialized, the null bitmap isn't populated and is left as a nullptr. When 
> attempting to write this table via pyarrow.parquet, you end up 
> [here|https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/writer.cc#L963]
> in the parquet writer code, which attempts to Cast the dictionary to a 
> non-dictionary representation. Since the null count isn't checked before 
> creating a BitmapReader, the BitmapReader is constructed with a nullptr for 
> the bitmap_data but a non-zero length, and it then segfaults in the 
> constructor 
> [here|https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/bit-util.h#L415]
> because the bitmap is null.
> A simple check of the null count before constructing the BitmapReader 
> avoids the segfault.
> Already filed [PR 1896|https://github.com/apache/arrow/pull/1896]
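The fix boils down to not reading the bitmap when there are no nulls. Here is a small Python model of that guard. It assumes Arrow's least-significant-bit-first validity bitmap layout and uses None to stand in for the C++ nullptr. `is_valid` is a hypothetical helper, not the C++ BitmapReader API.

```python
from typing import Optional


def is_valid(validity: Optional[bytes], null_count: int, i: int) -> bool:
    """Return whether slot i of an array is non-null.

    When an array has no nulls the validity bitmap may be absent (None here,
    nullptr in C++), so the guard returns before touching it.
    """
    if null_count == 0 or validity is None:
        return True
    # Validity bitmaps use least-significant-bit numbering within each byte.
    return bool((validity[i // 8] >> (i % 8)) & 1)
```

For example, with the bitmap `bytes([0b101])`, slots 0 and 2 are valid and slot 1 is null.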





[jira] [Resolved] (ARROW-2436) [Rust] Add windows CI

2018-05-01 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2436.
---
   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1949
[https://github.com/apache/arrow/pull/1949]

> [Rust] Add windows CI
> -
>
> Key: ARROW-2436
> URL: https://issues.apache.org/jira/browse/ARROW-2436
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-2529) [C++] Update mention of clang-format to 5.0 in the docs

2018-05-01 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2529.
---
   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1972
[https://github.com/apache/arrow/pull/1972]

> [C++] Update mention of clang-format to 5.0 in the docs
> ---
>
> Key: ARROW-2529
> URL: https://issues.apache.org/jira/browse/ARROW-2529
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation
>Reporter: Alessandro Andrioni
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The C++ README.md talks about requiring clang-format 4.0, while the current 
> required version is 5.0.
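A check like the following could catch the mismatch locally, for example in a pre-commit hook. This minimal Python sketch assumes `clang-format --version` prints a line containing "clang-format version X.Y.Z", possibly with a distribution prefix. The helper names are hypothetical. It requires an exact major-version match because output formatting differs between clang-format releases.

```python
import re
from typing import Optional

# Major version the C++ docs should require, per this issue.
REQUIRED_MAJOR = 5


def clang_format_major(version_output: str) -> Optional[int]:
    """Extract the major version from `clang-format --version` output.

    Searches anywhere in the string because distribution builds may add a
    prefix such as "Ubuntu " before "clang-format version".
    """
    match = re.search(r"clang-format version (\d+)\.", version_output)
    return int(match.group(1)) if match else None


def has_required_clang_format(version_output: str) -> bool:
    """True when the reported major version equals REQUIRED_MAJOR."""
    return clang_format_major(version_output) == REQUIRED_MAJOR
```

The output can be obtained with `subprocess.run(["clang-format", "--version"], stdout=subprocess.PIPE, universal_newlines=True).stdout`.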





[jira] [Updated] (ARROW-2529) [C++] Update mention of clang-format to 5.0 in the docs

2018-05-01 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2529:
--
Labels: pull-request-available  (was: )

> [C++] Update mention of clang-format to 5.0 in the docs
> ---
>
> Key: ARROW-2529
> URL: https://issues.apache.org/jira/browse/ARROW-2529
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation
>Reporter: Alessandro Andrioni
>Priority: Minor
>  Labels: pull-request-available
>
> The C++ README.md talks about requiring clang-format 4.0, while the current 
> required version is 5.0.





[jira] [Created] (ARROW-2529) [C++] Update mention of clang-format to 5.0 in the docs

2018-05-01 Thread Alessandro Andrioni (JIRA)
Alessandro Andrioni created ARROW-2529:
--

 Summary: [C++] Update mention of clang-format to 5.0 in the docs
 Key: ARROW-2529
 URL: https://issues.apache.org/jira/browse/ARROW-2529
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Documentation
Reporter: Alessandro Andrioni


The C++ README.md talks about requiring clang-format 4.0, while the current 
required version is 5.0.


