[jira] [Closed] (ARROW-4755) [Java] Flight tests should use randomized server ports

2019-03-03 Thread Micah Kornfield (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield closed ARROW-4755.
--
Resolution: Duplicate

 Fixed as part of ARROW-4754

> [Java] Flight tests should use randomized server ports
> --
>
> Key: ARROW-4755
> URL: https://issues.apache.org/jira/browse/ARROW-4755
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: FlightRPC, Java
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
>
> Follow-up from ARROW-4754, which places a fix in only one test. We should 
> make a library that creates a server on a randomized port, and apply it to 
> the other Flight tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4756) [CI] document the procedure to update docker image for manylinux1 builds

2019-03-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4756:
--
Labels: pull-request-available  (was: )

> [CI] document the procedure to update docker image for manylinux1 builds
> 
>
> Key: ARROW-4756
> URL: https://issues.apache.org/jira/browse/ARROW-4756
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Continuous Integration
>Reporter: Pindikura Ravindra
>Assignee: Pindikura Ravindra
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Created] (ARROW-4756) [CI] document the procedure to update docker image for manylinux1 builds

2019-03-03 Thread Pindikura Ravindra (JIRA)
Pindikura Ravindra created ARROW-4756:
-

 Summary: [CI] document the procedure to update docker image for 
manylinux1 builds
 Key: ARROW-4756
 URL: https://issues.apache.org/jira/browse/ARROW-4756
 Project: Apache Arrow
  Issue Type: Task
  Components: Continuous Integration
Reporter: Pindikura Ravindra
Assignee: Pindikura Ravindra








[jira] [Assigned] (ARROW-4755) [Java] Flight tests should use randomized server ports

2019-03-03 Thread Micah Kornfield (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield reassigned ARROW-4755:
--

Assignee: Micah Kornfield

> [Java] Flight tests should use randomized server ports
> --
>
> Key: ARROW-4755
> URL: https://issues.apache.org/jira/browse/ARROW-4755
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: FlightRPC, Java
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
>
> Follow-up from ARROW-4754, which places a fix in only one test. We should 
> make a library that creates a server on a randomized port, and apply it to 
> the other Flight tests.





[jira] [Created] (ARROW-4755) [Java] Flight tests should use randomized server ports

2019-03-03 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-4755:
--

 Summary: [Java] Flight tests should use randomized server ports
 Key: ARROW-4755
 URL: https://issues.apache.org/jira/browse/ARROW-4755
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC, Java
Reporter: Micah Kornfield


Follow-up from ARROW-4754, which places a fix in only one test. We should 
make a library that creates a server on a randomized port, and apply it to 
the other Flight tests.
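The usual way to get a collision-free "randomized" port is to let the operating system choose it: bind to port 0, then read back the port that was actually assigned. The Flight tests themselves are Java. The sketch below shows the same idea in Rust using only the standard library. `bind_ephemeral` is a hypothetical helper name, not an Arrow or Flight API.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener};

/// Hypothetical test helper (not part of Arrow): bind a listener to a free,
/// OS-assigned port instead of a hard-coded one.
///
/// Binding to port 0 asks the operating system to choose an unused
/// ephemeral port. Concurrently running tests therefore do not collide
/// and fail with "Address already in use".
fn bind_ephemeral() -> io::Result<(TcpListener, SocketAddr)> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    // local_addr() reports the port the OS actually assigned.
    let addr = listener.local_addr()?;
    Ok((listener, addr))
}
```

Keep the bound listener and pass it, or its address, to the server under test. If you instead probe for a free port, release it, and bind again later, another process can take the port in between.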





[jira] [Updated] (ARROW-4754) [CI][Java] Flaky TestAuth Flight test

2019-03-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4754:
--
Labels: pull-request-available  (was: )

> [CI][Java] Flaky TestAuth Flight test
> -
>
> Key: ARROW-4754
> URL: https://issues.apache.org/jira/browse/ARROW-4754
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, FlightRPC, Java
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Blocker
>  Labels: pull-request-available
>
> org.apache.arrow.flight.auth.TestAuth
> [ERROR] invalidAuth(org.apache.arrow.flight.auth.TestAuth) Time elapsed: 
> 0.013 s <<< ERROR!
> java.io.IOException: Failed to bind
>  at org.apache.arrow.flight.auth.TestAuth.setup(TestAuth.java:108)
> Caused by: java.net.BindException: Address already in use





[jira] [Created] (ARROW-4754) [CI][Java] Flaky TestAuth Flight test

2019-03-03 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-4754:
--

 Summary: [CI][Java] Flaky TestAuth Flight test
 Key: ARROW-4754
 URL: https://issues.apache.org/jira/browse/ARROW-4754
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, FlightRPC, Java
Reporter: Micah Kornfield
Assignee: Micah Kornfield


org.apache.arrow.flight.auth.TestAuth
[ERROR] invalidAuth(org.apache.arrow.flight.auth.TestAuth) Time elapsed: 0.013 
s <<< ERROR!
java.io.IOException: Failed to bind
 at org.apache.arrow.flight.auth.TestAuth.setup(TestAuth.java:108)
Caused by: java.net.BindException: Address already in use





[jira] [Updated] (ARROW-4749) [Rust] RecordBatch::new() should return result instead of panicking

2019-03-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4749:
--
Labels: pull-request-available  (was: )

> [Rust] RecordBatch::new() should return result instead of panicking
> ---
>
> Key: ARROW-4749
> URL: https://issues.apache.org/jira/browse/ARROW-4749
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.12.0
>Reporter: Andy Grove
>Assignee: Neville Dipale
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> RecordBatch::new() has some good validation checks, but calls assert_eq 
> instead of returning a Result
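Here is a minimal sketch of the requested change. `Schema`, `RecordBatch` and `ArrowError` below are simplified, hypothetical stand-ins, not the arrow crate's actual definitions. The checks an `assert_eq!` would perform now report their failures as an `Err` value instead of panicking.

```rust
use std::fmt;

/// Hypothetical, simplified error type; the real crate defines its own.
#[derive(Debug, PartialEq)]
enum ArrowError {
    InvalidArgumentError(String),
}

impl fmt::Display for ArrowError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ArrowError::InvalidArgumentError(msg) => write!(f, "invalid argument: {}", msg),
        }
    }
}

/// Stand-in schema: just the field names.
struct Schema {
    fields: Vec<String>,
}

/// Stand-in record batch whose columns are plain Vec<i32>.
struct RecordBatch {
    schema: Schema,
    columns: Vec<Vec<i32>>,
}

impl RecordBatch {
    /// Validates the inputs and returns Err on failure instead of panicking.
    fn new(schema: Schema, columns: Vec<Vec<i32>>) -> Result<Self, ArrowError> {
        if schema.fields.len() != columns.len() {
            return Err(ArrowError::InvalidArgumentError(format!(
                "schema has {} fields but {} columns were supplied",
                schema.fields.len(),
                columns.len()
            )));
        }
        // Every column must have the same number of rows as the first one.
        if let Some(first) = columns.first() {
            let expected = first.len();
            if let Some(i) = columns.iter().position(|c| c.len() != expected) {
                return Err(ArrowError::InvalidArgumentError(format!(
                    "column {} has {} rows, expected {}",
                    i,
                    columns[i].len(),
                    expected
                )));
            }
        }
        Ok(RecordBatch { schema, columns })
    }

    fn num_columns(&self) -> usize {
        self.schema.fields.len()
    }

    fn num_rows(&self) -> usize {
        self.columns.first().map_or(0, |c| c.len())
    }
}
```

Callers can then propagate an invalid-input error with `?` instead of aborting the whole process.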





[jira] [Assigned] (ARROW-4301) [Java][Gandiva] Maven snapshot version update does not seem to update Gandiva submodule

2019-03-03 Thread Pindikura Ravindra (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pindikura Ravindra reassigned ARROW-4301:
-

Assignee: Praveen Kumar Desabandu

> [Java][Gandiva] Maven snapshot version update does not seem to update Gandiva 
> submodule
> ---
>
> Key: ARROW-4301
> URL: https://issues.apache.org/jira/browse/ARROW-4301
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Gandiva, Java
>Reporter: Wes McKinney
>Assignee: Praveen Kumar Desabandu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> See 
> https://github.com/apache/arrow/commit/a486db8c1476be1165981c4fe22996639da8e550.
>  This is breaking the build, so I'm going to patch it manually.





[jira] [Closed] (ARROW-2460) [Rust] Schema and DataType::Struct should use Vec>

2019-03-03 Thread Andy Grove (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove closed ARROW-2460.
-
Resolution: Invalid
  Assignee: Andy Grove

This is no longer valid

> [Rust] Schema and DataType::Struct should use Vec>
> 
>
> Key: ARROW-2460
> URL: https://issues.apache.org/jira/browse/ARROW-2460
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
> Fix For: 0.13.0
>
>
> Currently we use Vec instead of Vec> which is resulting in 
> having to clone fields in some use cases, which could be expensive for 
> structs.
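The following sketch shows the cost the description refers to. Cloning a `Vec` of owned structs deep-copies every struct. Cloning a `Vec` of reference-counted pointers only copies the pointers and increments their reference counts. The pointer type in this archived title was lost, so the sketch assumes `Arc`. `Field` is a simplified, hypothetical stand-in.

```rust
use std::sync::Arc;

/// Simplified, hypothetical stand-in for a schema field.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}

/// Copying owned fields deep-copies every Field, including its String.
fn copy_owned(fields: &[Field]) -> Vec<Field> {
    fields.to_vec()
}

/// Copying shared fields copies only the pointers and increments each
/// reference count; the Field values themselves are not duplicated.
fn copy_shared(fields: &[Arc<Field>]) -> Vec<Arc<Field>> {
    fields.to_vec()
}
```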





[jira] [Commented] (ARROW-3086) [Glib] GISCAN fails due to conda-shipped openblas

2019-03-03 Thread Kouhei Sutou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782968#comment-16782968
 ] 

Kouhei Sutou commented on ARROW-3086:
-

I think that we can close this.

> [Glib] GISCAN fails due to conda-shipped openblas
> -
>
> Key: ARROW-3086
> URL: https://issues.apache.org/jira/browse/ARROW-3086
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GLib
>Affects Versions: 0.10.0
>Reporter: Uwe L. Korn
>Assignee: Kouhei Sutou
>Priority: Major
> Fix For: 0.14.0
>
>
> With the changes in [https://github.com/apache/arrow/pull/2374], the 
> libraries provided by conda are now in the library path when running the 
> GISCAN step. This sadly leads to the poisoning of the search path with the 
> conda-provided openblas, which is incompatible with the system-provided 
> libLAPACK.dylib:
> {code:java}
> dyld: Library not loaded: 
> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib
> Referenced from: 
> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/vecLib
> Reason: Incompatible library version: vecLib requires version 1.0.0 or later, 
> but libLAPACK.dylib provides version 0.0.0{code}
> Although the error message states that it explicitly loads 
> {{/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib}},
> it seems that {{liblapack.so}} from the conda installation gets picked up 
> first. That library only provides symbols with version 0.0.0 and is thus 
> incompatible.





[jira] [Commented] (ARROW-4541) [Gandiva] Enable timestamp tests on windows platform

2019-03-03 Thread Pindikura Ravindra (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782969#comment-16782969
 ] 

Pindikura Ravindra commented on ARROW-4541:
---

@wesm, can you please move this out of 0.13?

> [Gandiva] Enable timestamp tests on windows platform
> 
>
> Key: ARROW-4541
> URL: https://issues.apache.org/jira/browse/ARROW-4541
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: shyam narayan singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> As the timezone database is not available on the Windows operating system, 
> the cast timestamp test cases that use timezone APIs are failing.
> These tests are currently disabled on the Windows platform. We need to find a 
> way to test the timezone APIs on Windows.





[jira] [Assigned] (ARROW-4541) [Gandiva] Enable timestamp tests on windows platform

2019-03-03 Thread Pindikura Ravindra (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pindikura Ravindra reassigned ARROW-4541:
-

Assignee: shyam narayan singh

> [Gandiva] Enable timestamp tests on windows platform
> 
>
> Key: ARROW-4541
> URL: https://issues.apache.org/jira/browse/ARROW-4541
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: shyam narayan singh
>Assignee: shyam narayan singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> As the timezone database is not available on the Windows operating system, 
> the cast timestamp test cases that use timezone APIs are failing.
> These tests are currently disabled on the Windows platform. We need to find a 
> way to test the timezone APIs on Windows.





[jira] [Updated] (ARROW-4745) [C++][Documentation] Document process for replicating static_crt builds on windows

2019-03-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4745:
--
Labels: pull-request-available  (was: )

> [C++][Documentation] Document process for replicating static_crt builds on 
> windows
> --
>
> Key: ARROW-4745
> URL: https://issues.apache.org/jira/browse/ARROW-4745
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Minor
>  Labels: pull-request-available
>
> Based on the collective wisdom of the mailing list, give some step-by-step 
> instructions for getting things to build.





[jira] [Updated] (ARROW-4753) Support optionally, and as an extension, an encoding layout for text-optimized data structures

2019-03-03 Thread Edmon Begoli (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edmon Begoli updated ARROW-4753:

Description: 
Narrative (text), by default, is notoriously inefficient to store on disk or in 
memory. In its most basic form, it is a long sequence of bytes with no indexing 
or other optimized layout structure.

There are data structures such as [tries|https://en.wikipedia.org/wiki/Trie], 
[DAFSAs|https://en.wikipedia.org/wiki/Deterministic_acyclic_finite_state_automaton], 
or [b-tries|https://dl.acm.org/citation.cfm?id=1541552] that support more 
efficient storage and lookup of phrases.

We would like to enable Arrow to serialize from/to these efficient structures 
as the format/carrier between high-performance text processing steps that 
prefer to operate on binary data structures (lookups, spellers, or more 
advanced NLP routines).

So, it could be something like:

*{color:#707070}_text.to_arrow(infer=true|dafsa|trie|b-trie) : arrow_{color}* 
{color:#14892c}// writes Arrow as the format for the specified encoding. This 
could be implicit if we could store the encoding in some kind of manifest{color}

*{color:#707070}_arrow.to_text(infer=true|dafsa|trie|b-trie) : string_{color}* 
{color:#14892c}// restores text from the Arrow format, given a specified 
encoding, same as above.{color}

{color:#33}On the dev mailing list we are discussing the creation of a contrib 
folder where such features could be optionally included in Arrow.{color}

  was:
Narrative (text), by default, is notoriously inefficient to store on the disk 
or in memory. It is, in the most basic form, a long sequence of bytes with no 
indexing or other optimized layout structure. 
 
There are data structures such as [tries|https://en.wikipedia.org/wiki/Trie], 
[DAFSAs|]https://en.wikipedia.org/wiki/Deterministic_acyclic_finite_state_automaton
 or [b-tries|https://dl.acm.org/citation.cfm?id=1541552] that support more 
efficient storage and lookup of phrases. 
 
We would like to enable arrow to serialize from/to these efficient structures 
as the format/carrier between high performance text processing steps which like 
to operate on binary data structures (lookups, spellers, or more advance NLP 
routines).
 
 
so, it could be something like:
 
*{color:#707070}_text.to_arrow(infer=true|dafsa|trie|b-trie) : arrow_{color}* 
{color:#14892c}// writes arrow as format for the specified encoding. This could 
be implicit if we could store encoding in some kind of manifest{color}
 
*{color:#707070}_arrow.to_text(infer=true|dafsa|trie|b-trie) : string_{color}* 
{color:#14892c}// restores text from the arrow format, and from a specified 
encoding, same as above. {color}
 
{color:#33}On the dev mailing list we are discussion creation of the 
contrib folder where such features could be optionally included for 
Arrow.{color}


> Support optionally, and as an extension, an encoding layout for 
> text-optimized data structures
> --
>
> Key: ARROW-4753
> URL: https://issues.apache.org/jira/browse/ARROW-4753
> Project: Apache Arrow
>  Issue Type: Wish
> Environment: C/C++
>Reporter: Edmon Begoli
>Priority: Minor
>  Labels: features
>
> Narrative (text), by default, is notoriously inefficient to store on disk or 
> in memory. In its most basic form, it is a long sequence of bytes with no 
> indexing or other optimized layout structure.
>
> There are data structures such as 
> [tries|https://en.wikipedia.org/wiki/Trie], 
> [DAFSAs|https://en.wikipedia.org/wiki/Deterministic_acyclic_finite_state_automaton],
> or [b-tries|https://dl.acm.org/citation.cfm?id=1541552] that support more 
> efficient storage and lookup of phrases.
>
> We would like to enable Arrow to serialize from/to these efficient 
> structures as the format/carrier between high-performance text processing 
> steps that prefer to operate on binary data structures (lookups, spellers, 
> or more advanced NLP routines).
>
> So, it could be something like:
>
> *{color:#707070}_text.to_arrow(infer=true|dafsa|trie|b-trie) : 
> arrow_{color}* {color:#14892c}// writes Arrow as the format for the specified 
> encoding. This could be implicit if we could store the encoding in some kind 
> of manifest{color}
>
> *{color:#707070}_arrow.to_text(infer=true|dafsa|trie|b-trie) : 
> string_{color}* {color:#14892c}// restores text from the Arrow format, given 
> a specified encoding, same as above.{color}
>
> {color:#33}On the dev mailing list we are discussing the creation of a 
> contrib folder where such features could be optionally included in 
> Arrow.{color}





[jira] [Assigned] (ARROW-4749) [Rust] RecordBatch::new() should return result instead of panicking

2019-03-03 Thread Neville Dipale (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale reassigned ARROW-4749:
-

Assignee: Neville Dipale

> [Rust] RecordBatch::new() should return result instead of panicking
> ---
>
> Key: ARROW-4749
> URL: https://issues.apache.org/jira/browse/ARROW-4749
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.12.0
>Reporter: Andy Grove
>Assignee: Neville Dipale
>Priority: Major
> Fix For: 0.13.0
>
>
> RecordBatch::new() has some good validation checks, but calls assert_eq 
> instead of returning a Result





[jira] [Updated] (ARROW-4739) [Rust] [DataFusion] It should be possible to share a logical plan between threads

2019-03-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4739:
--
Labels: pull-request-available  (was: )

> [Rust] [DataFusion] It should be possible to share a logical plan between 
> threads
> -
>
> Key: ARROW-4739
> URL: https://issues.apache.org/jira/browse/ARROW-4739
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Affects Versions: 0.12.0
>Reporter: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> I want to be able to compile SQL to a logical plan and then share that plan 
> with other threads (so I can run the same query in parallel on partitions of 
> my input relation).
>  
> A/C
>  * LogicalPlan uses Arc instead of Rc
>  * ExecutionContext has a create_logical_plan method
>  * ExecutionContext.sql() is refactored to call create_logical_plan
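The first acceptance criterion matters because `Rc` is neither `Send` nor `Sync`. A plan tree built from `Rc` nodes therefore cannot be handed to another thread, whereas an `Arc`-based tree can be shared read-only by many threads. In this minimal sketch, `LogicalPlan` is a hypothetical two-variant enum, not DataFusion's actual type, and `run_partition` only describes the work rather than executing a query.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical, heavily simplified logical plan. Child plans are held in
/// Arc (not Rc), so the whole tree is Send + Sync and can cross threads.
#[derive(Debug)]
enum LogicalPlan {
    Scan { table: String },
    Filter { predicate: String, input: Arc<LogicalPlan> },
}

/// Stand-in for executing the plan on one partition: it just describes
/// what would run.
fn run_partition(plan: &LogicalPlan, partition: usize) -> String {
    match plan {
        LogicalPlan::Scan { table } => format!("scan {} [partition {}]", table, partition),
        LogicalPlan::Filter { predicate, input } => {
            format!("filter {} <- {}", predicate, run_partition(input, partition))
        }
    }
}

/// Runs the same shared plan on several partitions in parallel and
/// returns the per-partition results in partition order.
fn run_parallel(plan: Arc<LogicalPlan>, partitions: usize) -> Vec<String> {
    let handles: Vec<_> = (0..partitions)
        .map(|p| {
            let plan = Arc::clone(&plan); // cheap pointer copy per thread
            thread::spawn(move || run_partition(&plan, p))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

With `Rc` in place of `Arc`, the `thread::spawn` call would be rejected at compile time because the closure would capture a non-`Send` value.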





[jira] [Commented] (ARROW-4541) [Gandiva] Enable timestamp tests on windows platform

2019-03-03 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782955#comment-16782955
 ] 

Wes McKinney commented on ARROW-4541:
-

[~pravindra] [~praveenbingo] any thoughts about this for 0.13? It is not 
essential if you do not have time

> [Gandiva] Enable timestamp tests on windows platform
> 
>
> Key: ARROW-4541
> URL: https://issues.apache.org/jira/browse/ARROW-4541
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: shyam narayan singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> As the timezone database is not available on the Windows operating system, 
> the cast timestamp test cases that use timezone APIs are failing.
> These tests are currently disabled on the Windows platform. We need to find a 
> way to test the timezone APIs on Windows.





[jira] [Updated] (ARROW-4510) [Format] copy content from IPC.rst to new document.

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4510:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Format] copy content from IPC.rst to new document.
> ---
>
> Key: ARROW-4510
> URL: https://issues.apache.org/jira/browse/ARROW-4510
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Format
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
> Fix For: 0.14.0
>
>






[jira] [Created] (ARROW-4753) Support optionally, and as an extension, an encoding layout for text-optimized data structures

2019-03-03 Thread Edmon Begoli (JIRA)
Edmon Begoli created ARROW-4753:
---

 Summary: Support optionally, and as an extension, an encoding 
layout for text-optimized data structures
 Key: ARROW-4753
 URL: https://issues.apache.org/jira/browse/ARROW-4753
 Project: Apache Arrow
  Issue Type: Wish
 Environment: C/C++
Reporter: Edmon Begoli


Narrative (text), by default, is notoriously inefficient to store on disk or in 
memory. In its most basic form, it is a long sequence of bytes with no indexing 
or other optimized layout structure.

There are data structures such as [tries|https://en.wikipedia.org/wiki/Trie], 
[DAFSAs|https://en.wikipedia.org/wiki/Deterministic_acyclic_finite_state_automaton], 
or [b-tries|https://dl.acm.org/citation.cfm?id=1541552] that support more 
efficient storage and lookup of phrases.

We would like to enable Arrow to serialize from/to these efficient structures 
as the format/carrier between high-performance text processing steps that 
prefer to operate on binary data structures (lookups, spellers, or more 
advanced NLP routines).

So, it could be something like:

*{color:#707070}_text.to_arrow(infer=true|dafsa|trie|b-trie) : arrow_{color}* 
{color:#14892c}// writes Arrow as the format for the specified encoding. This 
could be implicit if we could store the encoding in some kind of manifest{color}

*{color:#707070}_arrow.to_text(infer=true|dafsa|trie|b-trie) : string_{color}* 
{color:#14892c}// restores text from the Arrow format, given a specified 
encoding, same as above.{color}

{color:#33}On the dev mailing list we are discussing the creation of a contrib 
folder where such features could be optionally included in Arrow.{color}
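As background for the structures named in this proposal, here is a minimal trie sketch. Phrases that share a prefix share nodes, so the prefix is stored only once. A lookup walks one node per character, regardless of how many phrases are stored. This is an in-memory illustration only. It does not implement the proposed `to_arrow`/`to_text` conversion or any Arrow layout. A DAFSA would additionally merge shared suffixes.

```rust
use std::collections::BTreeMap;

/// A minimal character trie. Each node maps the next char to a child node
/// and records whether a stored phrase ends at this node.
#[derive(Default, Debug)]
struct Trie {
    children: BTreeMap<char, Trie>,
    terminal: bool,
}

impl Trie {
    /// Inserts a phrase, reusing any existing nodes for its prefix.
    fn insert(&mut self, phrase: &str) {
        let mut node = self;
        for ch in phrase.chars() {
            node = node.children.entry(ch).or_default();
        }
        node.terminal = true;
    }

    /// Returns true only if this exact phrase was inserted.
    fn contains(&self, phrase: &str) -> bool {
        let mut node = self;
        for ch in phrase.chars() {
            match node.children.get(&ch) {
                Some(next) => node = next,
                None => return false,
            }
        }
        node.terminal
    }

    /// Counts nodes below the root; shared prefixes are counted once.
    fn node_count(&self) -> usize {
        self.children.values().map(|c| 1 + c.node_count()).sum()
    }
}
```

For example, storing "new york" and "new jersey" needs 14 nodes rather than 18, because the 4-character prefix "new " is shared.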





[jira] [Updated] (ARROW-4609) [C++] Use google benchmark from toolchain

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4609:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Use google benchmark from toolchain
> -
>
> Key: ARROW-4609
> URL: https://issues.apache.org/jira/browse/ARROW-4609
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Packaging
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-4283) [Python] Should RecordBatchStreamReader/Writer be AsyncIterable?

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4283:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Python] Should RecordBatchStreamReader/Writer be AsyncIterable?
> 
>
> Key: ARROW-4283
> URL: https://issues.apache.org/jira/browse/ARROW-4283
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Paul Taylor
>Priority: Minor
> Fix For: 0.14.0
>
>
> Filing this issue after a discussion today with [~xhochy] about how to 
> implement streaming pyarrow http services. I had attempted to use both Flask 
> and [aiohttp|https://aiohttp.readthedocs.io/en/stable/streams.html]'s 
> streaming interfaces because they seemed familiar, but no dice. I have no 
> idea how hard this would be to add -- supporting all the asynciterable 
> primitives in JS was non-trivial.





[jira] [Updated] (ARROW-4201) [C++][Gandiva] integrate test utils with arrow

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4201:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++][Gandiva] integrate test utils with arrow
> --
>
> Key: ARROW-4201
> URL: https://issues.apache.org/jira/browse/ARROW-4201
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++ - Gandiva
>Reporter: Pindikura Ravindra
>Priority: Major
> Fix For: 0.14.0
>
>
> The following tasks are to be addressed as part of this JIRA:
>  # move (or consolidate) data generators in generate_data.h to arrow
>  # move convenience fns in gandiva/tests/test_util.h to arrow
>  # move (or consolidate) EXPECT_ARROW_* fns to arrow





[jira] [Updated] (ARROW-4514) [C++] Merge parquet/util/test-common.h into common arrow/testing directory

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4514:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Merge parquet/util/test-common.h into common arrow/testing directory
> --
>
> Key: ARROW-4514
> URL: https://issues.apache.org/jira/browse/ARROW-4514
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> Many of these functions already exist elsewhere, either as-is or in slightly modified form.





[jira] [Updated] (ARROW-3896) [MATLAB] Decouple MATLAB-Arrow conversion logic from Feather file specific logic

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3896:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [MATLAB] Decouple MATLAB-Arrow conversion logic from Feather file specific 
> logic
> 
>
> Key: ARROW-3896
> URL: https://issues.apache.org/jira/browse/ARROW-3896
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: MATLAB
>Reporter: Kevin Gurney
>Assignee: Kevin Gurney
>Priority: Major
> Fix For: 0.14.0
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Currently, the logic for converting between a MATLAB mxArray and various 
> Arrow data structures (arrow::Table, arrow::Array, etc.) is tightly coupled 
> and fairly tangled up with the logic specific to handling Feather files. It 
> would be helpful to factor out these conversions into a more generic 
> "mlarrow" conversion layer component so that it can be reused in the future 
> for use cases other than Feather support. Furthermore, this would help 
> enforce a cleaner separation of concerns.
> It would be nice to start off with this refactoring work up front before 
> adding support for more datatypes to the MATLAB featherread/featherwrite 
> functions, so that we can start off with a clean base upon which to expand 
> moving forward.





[jira] [Updated] (ARROW-3332) [Gandiva] Remove usages of mutable reference out arguments

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3332:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Gandiva] Remove usages of mutable reference out arguments
> --
>
> Key: ARROW-3332
> URL: https://issues.apache.org/jira/browse/ARROW-3332
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, C++ - Gandiva
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> I have noticed several usages of mutable reference out arguments, e.g. 
> gandiva/regex_util.h. We should change these to conform to the style guide 
> (out arguments as pointers)





[jira] [Updated] (ARROW-4507) [Format] Create outline and introduction for new document.

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4507:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Format] Create outline and introduction for new document.
> --
>
> Key: ARROW-4507
> URL: https://issues.apache.org/jira/browse/ARROW-4507
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Format
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
> Fix For: 0.14.0
>
>
> This will ensure the document has a good flow; other subtasks on the parent 
> will handle moving content from each of the documents.





[jira] [Updated] (ARROW-4508) [Format] Copy content from Layout.rst to new document.

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4508:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Format] Copy content from Layout.rst to new document.
> --
>
> Key: ARROW-4508
> URL: https://issues.apache.org/jira/browse/ARROW-4508
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Format
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
> Fix For: 0.14.0
>
>






[jira] [Updated] (ARROW-4509) [Format] Copy content from Metadata.rst to new document.

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4509:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Format] Copy content from Metadata.rst to new document.
> 
>
> Key: ARROW-4509
> URL: https://issues.apache.org/jira/browse/ARROW-4509
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Format
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
> Fix For: 0.14.0
>
>






[jira] [Updated] (ARROW-4511) [Format] remove individual documents in favor of new document once all content is moved

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4511:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Format] remove individual documents in favor of new document once all 
> content is moved
> ---
>
> Key: ARROW-4511
> URL: https://issues.apache.org/jira/browse/ARROW-4511
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Format
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
> Fix For: 0.14.0
>
>
> We might want to leave the documents in place and provide links to the new 
> consolidated document in case others are linking to published content.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4398) [Python] Add benchmarks for Arrow<>Parquet BYTE_ARRAY serialization (read and write)

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4398:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Python] Add benchmarks for Arrow<>Parquet BYTE_ARRAY serialization (read and 
> write)
> 
>
> Key: ARROW-4398
> URL: https://issues.apache.org/jira/browse/ARROW-4398
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> This is follow-on work to PARQUET-1508, so we can monitor the performance of 
> this operation over time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4356) [CI] Add integration (docker) test for turbodbc

2019-03-03 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782951#comment-16782951
 ] 

Wes McKinney commented on ARROW-4356:
-

Do you think you can do this for 0.13?

> [CI] Add integration (docker) test for turbodbc
> ---
>
> Key: ARROW-4356
> URL: https://issues.apache.org/jira/browse/ARROW-4356
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.13.0
>
>
> We regularly break our API, so {{turbodbc}} needs to make minor changes 
> to support each new Arrow version. We should set up a small integration test to 
> check before a release that {{turbodbc}} can easily upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4409) [C++] Enable arrow::ipc internal JSON reader to read from a file path

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4409:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Enable arrow::ipc internal JSON reader to read from a file path
> -
>
> Key: ARROW-4409
> URL: https://issues.apache.org/jira/browse/ARROW-4409
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Minor
> Fix For: 0.14.0
>
>
> This may make tests easier to write. Currently an input buffer is required, 
> so reading from a file requires some boilerplate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4301) [Java][Gandiva] Maven snapshot version update does not seem to update Gandiva submodule

2019-03-03 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782950#comment-16782950
 ] 

Wes McKinney commented on ARROW-4301:
-

[~pravindra] can you look into this? This was an issue with the 0.12 release, 
see the linked PR

> [Java][Gandiva] Maven snapshot version update does not seem to update Gandiva 
> submodule
> ---
>
> Key: ARROW-4301
> URL: https://issues.apache.org/jira/browse/ARROW-4301
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Gandiva, Java
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> See 
> https://github.com/apache/arrow/commit/a486db8c1476be1165981c4fe22996639da8e550.
>  This is breaking the build, so I'm going to patch it manually.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4259) [Plasma] CI failure in test_plasma_tf_op

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4259:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Plasma] CI failure in test_plasma_tf_op
> 
>
> Key: ARROW-4259
> URL: https://issues.apache.org/jira/browse/ARROW-4259
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Plasma, Continuous Integration, Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: ci-failure
> Fix For: 0.14.0
>
>
> Recently appeared failure on master:
> https://travis-ci.org/apache/arrow/jobs/479378188#L7108



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4292) [Release] Add script to test release verification script against master branch

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4292:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Release] Add script to test release verification script against master branch
> --
>
> Key: ARROW-4292
> URL: https://issues.apache.org/jira/browse/ARROW-4292
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Developer Tools
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> This should enable us to find problems with the verification script well 
> before releases happen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2237) [Python] [Plasma] Huge pages test failure

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2237:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Python] [Plasma] Huge pages test failure
> -
>
> Key: ARROW-2237
> URL: https://issues.apache.org/jira/browse/ARROW-2237
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Antoine Pitrou
>Priority: Major
> Fix For: 0.14.0
>
>
> This is a new failure here (Ubuntu 16.04, x86-64):
> {code}
> _ test_use_huge_pages _
> Traceback (most recent call last):
>   File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 779, in test_use_huge_pages
>     create_object(plasma_client, 1)
>   File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 80, in create_object
>     seal=seal)
>   File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 69, in create_object_with_id
>     memory_buffer = client.create(object_id, data_size, metadata)
>   File "plasma.pyx", line 302, in pyarrow.plasma.PlasmaClient.create
>   File "error.pxi", line 79, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: /home/antoine/arrow/cpp/src/plasma/client.cc:192 code: PlasmaReceive(store_conn_, MessageType_PlasmaCreateReply, )
> /home/antoine/arrow/cpp/src/plasma/protocol.cc:46 code: ReadMessage(sock, , buffer)
> Encountered unexpected EOF
>  Captured stderr call -
> Allowing the Plasma store to use up to 0.1GB of memory.
> Starting object store with directory /mnt/hugepages and huge page support enabled
> create_buffer failed to open file /mnt/hugepages/plasmapSNc0X
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4242) [C++] Seg fault when running unit tests on fresh Arch Linux install

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4242:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Seg fault when running unit tests on fresh Arch Linux install
> ---
>
> Key: ARROW-4242
> URL: https://issues.apache.org/jira/browse/ARROW-4242
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: Arch Linux x86-64
>Reporter: Michael Vilim
>Assignee: Uwe L. Korn
>Priority: Minor
> Fix For: 0.14.0
>
> Attachments: Dockerfile
>
>
> First, let me say I appreciate all the work that has been put into this 
> project. I have been following it with great interest and recently decided to 
> include it in one of my projects.
> However, I have run into an issue with a segmentation fault when trying to 
> run the C++ unit tests (which previously worked for me). This issue appears 
> to be something specific to my system (I am running Arch Linux). I can 
> reproduce the issue with the minimal Docker install of Arch:
> {noformat}
> FROM archimg/base
> RUN \
>  pacman -Sy --noconfirm git cmake gcc make boost autoconf python; \
>  git clone --recursive https://github.com/apache/arrow.git; \
>  cd arrow/cpp; \
>  mkdir build; \
>  cd build; \
>  cmake -DARROW_BUILD_TESTS=ON ..; \
>  make
> {noformat}
> If you create a Dockerfile with those contents and then run
> {noformat}
> docker build -t mvilim/arch-arrow-test-segfault .
> docker run mvilim/arch-arrow-test-segfault /bin/bash -c "cd /arrow/cpp/build; make unittest; gcc --version; cmake --version"{noformat}
> you should be able to reproduce the issue:
> {noformat}
> The following tests FAILED:
>  2 - arrow-array-test (Failed)
>  3 - arrow-buffer-test (Failed)
>  8 - arrow-stl-test (Failed)
>  9 - arrow-type-test (Failed)
>  10 - arrow-table-test (Failed)
>  15 - arrow-compute-boolean-test (Failed)
>  16 - arrow-compute-cast-test (Failed)
>  17 - arrow-compute-hash-test (Failed)
>  18 - arrow-feather-test (Failed)
>  19 - arrow-ipc-read-write-test (Failed)
>  20 - arrow-ipc-json-simple-test (Failed)
>  21 - arrow-ipc-json-test (Failed)
>  24 - arrow-csv-column-builder-test (Failed)
>  28 - arrow-io-compressed-test (Failed)
>  31 - arrow-io-memory-test (Failed){noformat}
> If you run the container interactively and inspect the logs, you will see 
> that all the failures are caused by seg faults.
> I used git bisect to narrow the problem down to commit 7cdab9b06, in which the 
> tests were switched to use shared linking by default. Static linking works 
> fine for me (using -DARROW_TEST_LINKAGE=static).
> I also compiled with Clang and -DARROW_USE_ASAN=ON and inspected several of 
> the stack traces. It looks like all the seg faults happen during creation of 
> a shared pointer, but in varied places.
> On creation of Int32Type:
> {noformat}
> ./debug/arrow-array-test
>  AddressSanitizer:DEADLYSIGNAL
>  =
>  ==29563==ERROR: AddressSanitizer: SEGV on unknown address 0x (pc 
> 0x558afa951b70 bp 0x7ffcf1c34880 sp 0x7ffcf1c34810 T0)
>  ==29563==The signal is caused by a READ memory access.
>  ==29563==Hint: address points to the zero page.
>  #0 0x558afa951b6f in std::type_info::operator==(std::type_info const&) const 
> /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/8.2.1/../../../../include/c++/8.2.1/typeinfo:123:12
>  #1 0x558afa9b2c16 in std::Sp_counted_ptr_inplace std::allocator, 
> (_gnu_cxx::_Lock_policy)2>::_M_get_deleter(std::type_info const&) 
> /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/8.2.1/../../../../include/c++/8.2.1/bits/shared_ptr_base.h:573:16
>  #2 0x7fe7b711e6f1 in 
> std::_shared_count<(_gnu_cxx::_Lock_policy)2>::_M_get_deleter(std::type_info 
> const&) const 
> /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/8.2.1/../../../../include/c++/8.2.1/bits/shared_ptr_base.h:751:31
>  #3 0x7fe7b73ec240 in std::_shared_ptr (gnu_cxx::_Lock_policy)2>::_shared_ptr 
> >(std::_Sp_make_shared_tag, std::allocator const&) 
> /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/8.2.1/../../../../include/c++/8.2.1/bits/shared_ptr_base.h:1328:28
>  #4 0x7fe7b73ec1c7 in 
> std::shared_ptr::shared_ptr
>  >(std::_Sp_make_shared_tag, std::allocator const&) 
> /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/8.2.1/../../../../include/c++/8.2.1/bits/shared_ptr.h:360:4
>  #5 0x7fe7b73ec14b in std::shared_ptr 
> std::allocate_shared 
> >(std::allocator const&) 
> /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/8.2.1/../../../../include/c++/8.2.1/bits/shared_ptr.h:706:14
>  #6 0x7fe7b73cd323 in std::shared_ptr 
> std::make_shared() 
> /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/8.2.1/../../../../include/c++/8.2.1/bits/shared_ptr.h:722:14
>  #7 0x7fe7b73c3f2f in arrow::int32() 
> 

[jira] [Closed] (ARROW-4108) [Python/Java] Spark integration tests do not work

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-4108.
---
Resolution: Duplicate

> [Python/Java] Spark integration tests do not work
> -
>
> Key: ARROW-4108
> URL: https://issues.apache.org/jira/browse/ARROW-4108
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Integration
>Affects Versions: 0.12.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Because some commands in spark_integration.sh fail, the Spark integration test 
> in the Docker container does not work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3166) [C++] Consolidate IO interfaces used in arrow/io and parquet-cpp

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3166:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Consolidate IO interfaces used in arrow/io and parquet-cpp
> 
>
> Key: ARROW-3166
> URL: https://issues.apache.org/jira/browse/ARROW-3166
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: parquet
> Fix For: 0.14.0
>
>
> With the codebase consolidation, we have the opportunity to remove cruft from 
> the Parquet codebase. I believe it would be simpler and better for the 
> ecosystem to use the Arrow IO interface classes rather than maintaining 
> separate virtual IO interfaces exported from the {{parquet::}} namespace.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3086) [Glib] GISCAN fails due to conda-shipped openblas

2019-03-03 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782947#comment-16782947
 ] 

Wes McKinney commented on ARROW-3086:
-

Is this still an issue?

> [Glib] GISCAN fails due to conda-shipped openblas
> -
>
> Key: ARROW-3086
> URL: https://issues.apache.org/jira/browse/ARROW-3086
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GLib
>Affects Versions: 0.10.0
>Reporter: Uwe L. Korn
>Assignee: Kouhei Sutou
>Priority: Major
> Fix For: 0.14.0
>
>
> With the changes in [https://github.com/apache/arrow/pull/2374], the 
> libraries provided by conda are now in the library path when running the 
> GISCAN step. This unfortunately poisons the search path with the 
> conda-provided openblas, which is incompatible with the system-provided 
> libLAPACK.dylib:
> {code:java}
> dyld: Library not loaded: 
> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib
> Referenced from: 
> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/vecLib
> Reason: Incompatible library version: vecLib requires version 1.0.0 or later, 
> but libLAPACK.dylib provides version 0.0.0{code}
> Although the output mentions that it explicitly loads 
> {{/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib}},
>  it seems that {{liblapack.so}} from the conda installation gets picked up 
> first. That library only provides symbols with version 0.0.0 and is thus 
> incompatible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3897) [MATLAB] Add MATLAB support for writing numeric datatypes to a Feather file

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3897:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [MATLAB] Add MATLAB support for writing numeric datatypes to a Feather file
> ---
>
> Key: ARROW-3897
> URL: https://issues.apache.org/jira/browse/ARROW-3897
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: MATLAB
>Reporter: Rylan Dmello
>Assignee: Kevin Gurney
>Priority: Major
> Fix For: 0.14.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Currently the MATLAB - Feather interface supports reading numeric datatypes 
> (double, single, uint* and int*) from a Feather file. We should also add 
> support for writing these numeric datatypes to a Feather file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3571) [Wiki] Release management guide does not explain how to set up Crossbow or where to find instructions

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3571:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Wiki] Release management guide does not explain how to set up Crossbow or 
> where to find instructions
> -
>
> Key: ARROW-3571
> URL: https://issues.apache.org/jira/browse/ARROW-3571
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Wiki
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> If you follow the guide, at one point it says "Launch a Crossbow build" but 
> provides no link to the setup instructions for this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3579) [Crossbow] Unintuitive error message when remote branch has not been pushed

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3579:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Crossbow] Unintuitive error message when remote branch has not been pushed
> ---
>
> Key: ARROW-3579
> URL: https://issues.apache.org/jira/browse/ARROW-3579
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
> Fix For: 0.14.0
>
>
> {code}
> $ python dev/tasks/crossbow.py submit -g linux --arrow-version 0.11.1-rc0
> Traceback (most recent call last):
>   File "dev/tasks/crossbow.py", line 796, in 
>     crossbow(obj={}, auto_envvar_prefix='CROSSBOW')
>   File "/home/wesm/miniconda/envs/arrow-release/lib/python3.6/site-packages/click/core.py", line 764, in __call__
>     return self.main(*args, **kwargs)
>   File "/home/wesm/miniconda/envs/arrow-release/lib/python3.6/site-packages/click/core.py", line 717, in main
>     rv = self.invoke(ctx)
>   File "/home/wesm/miniconda/envs/arrow-release/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
>     return _process_result(sub_ctx.command.invoke(sub_ctx))
>   File "/home/wesm/miniconda/envs/arrow-release/lib/python3.6/site-packages/click/core.py", line 956, in invoke
>     return ctx.invoke(self.callback, **ctx.params)
>   File "/home/wesm/miniconda/envs/arrow-release/lib/python3.6/site-packages/click/core.py", line 555, in invoke
>     return callback(*args, **kwargs)
>   File "/home/wesm/miniconda/envs/arrow-release/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
>     return f(get_current_context(), *args, **kwargs)
>   File "dev/tasks/crossbow.py", line 596, in submit
>     target = Target.from_repo(arrow)
>   File "dev/tasks/crossbow.py", line 407, in from_repo
>     remote=repo.remote_url,
>   File "dev/tasks/crossbow.py", line 235, in remote_url
>     return self.remote.url.replace(
>   File "dev/tasks/crossbow.py", line 225, in remote
>     return self.repo.remotes[self.branch.upstream.remote_name]
> AttributeError: 'NoneType' object has no attribute 'remote_name'
> {code}
> The fix was to make sure the local branch and the reference branch for the 
> build in my fork wesm/arrow were the same.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3344) [Python] test_plasma.py fails (in test_plasma_list)

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3344:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Python] test_plasma.py fails (in test_plasma_list)
> ---
>
> Key: ARROW-3344
> URL: https://issues.apache.org/jira/browse/ARROW-3344
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Plasma, Python
>Reporter: Antoine Pitrou
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> I routinely get the following failure in {{test_plasma.py}}:
> {code}
> Traceback (most recent call last):
>   File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 825, in test_plasma_list
>     assert l3[v]["ref_count"] == 1
> AssertionError: assert 0 == 1
>  Captured stderr call -
> ../src/plasma/store.cc:926: Allowing the Plasma store to use up to 0.1GB of memory.
> ../src/plasma/store.cc:956: Starting object store with directory /dev/shm and huge page support disabled
> {code}
> I'm not sure whether there's something wrong in my setup (on Ubuntu 18.04, 
> x86-64), or it's a genuine bug.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3333) [Gandiva] Use non-platform specific integer types for lengths, indexes

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3333:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Gandiva] Use non-platform specific integer types for lengths, indexes
> --
>
> Key: ARROW-3333
> URL: https://issues.apache.org/jira/browse/ARROW-3333
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, C++ - Gandiva
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> There are many instances of {{unsigned int}} and {{int}} being used for array 
> indexes. This may cause issues on Windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3277) [Python] Validate manylinux1 builds with crossbow instead of each Travis CI build

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3277:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Python] Validate manylinux1 builds with crossbow instead of each Travis CI 
> build
> -
>
> Key: ARROW-3277
> URL: https://issues.apache.org/jira/browse/ARROW-3277
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> The recent manylinux1 timeouts raise a bigger question, which is 
> centralizing the validation of packaging builds. We definitely want the 
> project to be notified in a timely way when there is some problem with a 
> packaging build; since manylinux1 can be run locally in Docker, it is 
> easier to debug and need not necessarily be run on every commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3200) [C++] Add support for reading Flight streams with dictionaries

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3200:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Add support for reading Flight streams with dictionaries
> --
>
> Key: ARROW-3200
> URL: https://issues.apache.org/jira/browse/ARROW-3200
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Priority: Major
>  Labels: flight
> Fix For: 0.14.0
>
>
> Some work is needed to handle schemas sent separately from their 
> dictionaries, i.e. ARROW-3144. I'm going to punt on implementing support for 
> this in the initial C++ Flight client.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3221) [C++][Python] Add a virtual Slice method to buffers

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3221:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++][Python] Add a virtual Slice method to buffers
> ---
>
> Key: ARROW-3221
> URL: https://issues.apache.org/jira/browse/ARROW-3221
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Pearu Peterson
>Priority: Major
> Fix For: 0.14.0
>
>
> See
> https://github.com/apache/arrow/pull/2536#discussion_r216383211



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3086) [Glib] GISCAN fails due to conda-shipped openblas

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3086:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Glib] GISCAN fails due to conda-shipped openblas
> -
>
> Key: ARROW-3086
> URL: https://issues.apache.org/jira/browse/ARROW-3086
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GLib
>Affects Versions: 0.10.0
>Reporter: Uwe L. Korn
>Assignee: Kouhei Sutou
>Priority: Major
> Fix For: 0.14.0
>
>
> With the changes in [https://github.com/apache/arrow/pull/2374], the 
> libraries provided by conda are now in the library path when running the 
> GISCAN step. This unfortunately poisons the search path with the 
> conda-provided openblas, which is incompatible with the system-provided 
> libLAPACK.dylib:
> {code:java}
> dyld: Library not loaded: 
> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib
> Referenced from: 
> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/vecLib
> Reason: Incompatible library version: vecLib requires version 1.0.0 or later, 
> but libLAPACK.dylib provides version 0.0.0{code}
> Although the output mentions that it explicitly loads 
> {{/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib}},
>  it seems that {{liblapack.so}} from the conda installation gets picked up 
> first. That library only provides symbols with version 0.0.0 and is thus 
> incompatible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2910) [Packaging] Build from official apache archive

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2910:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Packaging] Build from official apache archive
> --
>
> Key: ARROW-2910
> URL: https://issues.apache.org/jira/browse/ARROW-2910
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3144) [C++] Better solution for cases where dictionaries are unknown at schema reconstruction time, or for delta dictionaries

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3144:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Better solution for cases where dictionaries are unknown at schema 
> reconstruction time, or for delta dictionaries
> ---
>
> Key: ARROW-3144
> URL: https://issues.apache.org/jira/browse/ARROW-3144
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> There are a couple of inter-related issues:
> * Cases where a system might send the schema without the dictionaries, and 
> the user wishes to reason about the schema and its types without knowing the 
> dictionary values
> * Dictionaries that are changing, e.g. using delta dictionary messages
> {{arrow::DictionaryType}} has no "linkage" to any external object. I propose 
> adding a "LinkedDictionaryType" or something similar (purely a C++ 
> construct), which functionally would be a subclass of {{DictionaryType}}, 
> which would allow a type to be created which will obtain its dictionary later 
> through some kind of "Dictionary provider" interface. There is something 
> similar in Java already. This would allow a dictionary to evolve via delta 
> dictionaries, or for a dictionary to be retrieved later e.g. through an RPC 
> or IPC layer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3144) [C++] Better solution for cases where dictionaries are unknown at schema reconstruction time, or for delta dictionaries

2019-03-03 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782948#comment-16782948
 ] 

Wes McKinney commented on ARROW-3144:
-

I suggest we handle dictionaries in Flight in the 0.14 release cycle

> [C++] Better solution for cases where dictionaries are unknown at schema 
> reconstruction time, or for delta dictionaries
> ---
>
> Key: ARROW-3144
> URL: https://issues.apache.org/jira/browse/ARROW-3144
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> There are a couple of inter-related issues:
> * Cases where a system might send the schema without the dictionaries, and 
> the user wishes to reason about the schema and its types without knowing the 
> dictionary values
> * Dictionaries that are changing, e.g. using delta dictionary messages
> {{arrow::DictionaryType}} has no "linkage" to any external object. I propose 
> adding a "LinkedDictionaryType" or something similar (purely a C++ 
> construct), which functionally would be a subclass of {{DictionaryType}}, 
> which would allow a type to be created which will obtain its dictionary later 
> through some kind of "Dictionary provider" interface. There is something 
> similar in Java already. This would allow a dictionary to evolve via delta 
> dictionaries, or for a dictionary to be retrieved later e.g. through an RPC 
> or IPC layer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2858) [Packaging] Add unit tests for crossbow

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2858:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Packaging] Add unit tests for crossbow
> ---
>
> Key: ARROW-2858
> URL: https://issues.apache.org/jira/browse/ARROW-2858
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Packaging
>Reporter: Phillip Cloud
>Priority: Major
> Fix For: 0.14.0
>
>
> As this code grows we should start adding unit tests to make sure we can make 
> changes safely.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2905) [C++] Investigate if the *_data_ pointers used in Builder classes improve performance on hot paths

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2905:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Investigate if the *_data_ pointers used in Builder classes improve 
> performance on hot paths
> --
>
> Key: ARROW-2905
> URL: https://issues.apache.org/jira/browse/ARROW-2905
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> See [~alendit]'s comment in 
> https://github.com/apache/arrow/pull/2315#discussion_r204668176



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2652) [C++/Python] Document how to provide information on segfaults

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2652:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++/Python] Document how to provide information on segfaults
> -
>
> Key: ARROW-2652
> URL: https://issues.apache.org/jira/browse/ARROW-2652
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation, Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.14.0
>
>
> We often have users who report segmentation faults in {{pyarrow}}. Sadly, 
> these reports will keep appearing, as we don't have the magical ability to 
> write 100% bug-free code. We should therefore have a small section in our 
> documentation on how people can give us the relevant information in the case 
> of a segmentation fault. Preferably the documentation covers both {{gdb}} and 
> {{lldb}}; they have similar commands but differ in some minor flags.
> For an example of the instructions I have given a user in a ticket, see 
> https://github.com/apache/arrow/issues/2089#issuecomment-393477116



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2671) [Python] Run ASV suite in nightly build, only run in Travis CI on demand

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2671:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Python] Run ASV suite in nightly build, only run in Travis CI on demand
> 
>
> Key: ARROW-2671
> URL: https://issues.apache.org/jira/browse/ARROW-2671
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: nightly
> Fix For: 0.14.0
>
>
> Lately the main Travis CI build has been taking nearly 40 minutes, e.g. here 
> is the latest commit on master:
> https://travis-ci.org/apache/arrow/builds/387326546
> A fair chunk of that runtime is spent running the Python benchmarks at 
> the end of the test suite. We should absolutely keep these running smoothly. 
> However:
> * It may be just as valuable to run them nightly on master, and report if 
> they are broken
> * We could add a check that looks at the commit message and runs them in Travis 
> CI if requested
> If others agree, I suggest that as soon as the packaging bot / nightly build 
> tool is working properly, we make these changes in the interest of 
> improving CI build times.
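A minimal sketch of the on-demand check, assuming an opt-in marker such as {{[run-benchmarks]}} in the commit message; the marker syntax and the {{should_run_benchmarks}} helper are hypothetical, since the ticket does not specify either. Nightly builds always run the suite, while regular CI builds run it only when asked.

```python
import re

# Hypothetical opt-in marker; the ticket does not define any syntax for it.
BENCHMARK_MARKER = re.compile(r"\[\s*run[- ]benchmarks\s*\]", re.IGNORECASE)


def should_run_benchmarks(commit_message: str, is_nightly: bool) -> bool:
    # Nightly builds on master always run the ASV suite.
    if is_nightly:
        return True
    # Regular CI builds run it only when the commit message asks for it.
    return BENCHMARK_MARKER.search(commit_message) is not None


print(should_run_benchmarks("ARROW-2671: Speed up CI", is_nightly=False))                    # False
print(should_run_benchmarks("ARROW-2671: Tweak kernels [run-benchmarks]", is_nightly=False))  # True
print(should_run_benchmarks("ARROW-2671: Speed up CI", is_nightly=True))                     # True
```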



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1558) [C++] Implement boolean selection kernels

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1558:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Implement boolean selection kernels
> -
>
> Key: ARROW-1558
> URL: https://issues.apache.org/jira/browse/ARROW-1558
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: Analytics
> Fix For: 0.14.0
>
>
> Select values where a boolean selection array is true. By default, if any 
> values in the selection are null, then the corresponding values in the output 
> array should be null. 
> The null behaviour does not need to be toggleable: if the user wants to select 
> nothing in the case of null, then it is necessary to call 
> selection_array.fillna(false) first.
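The intended null semantics can be modelled with plain Python lists. This is a sketch only; {{filter_with_nulls}} and {{fillna}} are hypothetical helpers, not the C++ kernel API. A null in the selection yields a null in the output, and filling nulls with false beforehand drops those slots instead.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def filter_with_nulls(values: List[T], selection: List[Optional[bool]]) -> List[Optional[T]]:
    """Keep values where selection is True; a null selection slot emits a null."""
    if len(values) != len(selection):
        raise ValueError("values and selection must have the same length")
    out: List[Optional[T]] = []
    for value, selected in zip(values, selection):
        if selected is None:
            out.append(None)   # default: null in the selection -> null in the output
        elif selected:
            out.append(value)
        # selected is False: the value is dropped
    return out


def fillna(selection: List[Optional[bool]], fill: bool) -> List[bool]:
    return [fill if s is None else s for s in selection]


values = [10, 20, 30, 40]
selection = [True, None, False, True]
print(filter_with_nulls(values, selection))                 # [10, None, 40]
print(filter_with_nulls(values, fillna(selection, False)))  # [10, 40]
```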



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1382) [Python] Deduplicate non-scalar Python objects when using pyarrow.serialize

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1382:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Python] Deduplicate non-scalar Python objects when using pyarrow.serialize
> ---
>
> Key: ARROW-1382
> URL: https://issues.apache.org/jira/browse/ARROW-1382
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Robert Nishihara
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> If a Python object appears multiple times within a list/tuple/dictionary, 
> then when pyarrow serializes the object, it will duplicate the object many 
> times. This leads to a potentially huge expansion in the size of the object 
> (e.g., the serialized version of {{100 * [np.zeros(10 ** 6)]}} will be 100 
> times bigger than it needs to be).
> {code}
> import pyarrow as pa
> l = [0]
> original_object = [l, l]
> # Serialize and deserialize the object.
> buf = pa.serialize(original_object).to_buffer()
> new_object = pa.deserialize(buf)
> # This works.
> assert original_object[0] is original_object[1]
> # This fails.
> assert new_object[0] is new_object[1]
> {code}
> One potential way to address this is to use the Arrow dictionary encoding.
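A minimal sketch of deduplication by object identity, in plain Python rather than pyarrow: each distinct list is stored once in a table and later occurrences are written as references to its index. This is a memo-table technique in the same spirit as dictionary encoding, not Arrow's dictionary encoding itself; {{encode}} and {{decode}} are hypothetical, and only lists are treated as non-scalar here to keep it short.

```python
from typing import Any, Dict, List, Tuple


def encode(obj: Any) -> Tuple[Any, List[Any]]:
    """Return (root_ref, table); each distinct list object is stored once."""
    table: List[Any] = []
    seen: Dict[int, int] = {}  # id(obj) -> index into table

    def visit(o: Any) -> Any:
        if not isinstance(o, list):
            return ("scalar", o)
        if id(o) in seen:               # already encoded: emit a reference only
            return ("ref", seen[id(o)])
        index = len(table)
        seen[id(o)] = index
        table.append(None)              # reserve the slot before visiting children
        table[index] = ("list", [visit(child) for child in o])
        return ("ref", index)

    return visit(obj), table


def decode(root: Any, table: List[Any]) -> Any:
    built: Dict[int, list] = {}

    def visit(ref: Any) -> Any:
        kind, payload = ref
        if kind == "scalar":
            return payload
        if payload not in built:
            result: list = []
            built[payload] = result     # register first so shared refs resolve to it
            result.extend(visit(child) for child in table[payload][1])
        return built[payload]

    return visit(root)


shared = [0]
original = [shared, shared]
root, table = encode(original)
restored = decode(root, table)
print(len(table))                       # 2: the outer list and one copy of shared
print(restored[0] is restored[1])       # True
```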



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2041) [Python] pyarrow.serialize has high overhead for list of NumPy arrays

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2041:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Python] pyarrow.serialize has high overhead for list of NumPy arrays
> -
>
> Key: ARROW-2041
> URL: https://issues.apache.org/jira/browse/ARROW-2041
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Richard Shin
>Priority: Major
> Fix For: 0.14.0
>
>
> {{Python 2.7.12 (default, Nov 20 2017, 18:23:56)}}
> {{[GCC 5.4.0 20160609] on linux2}}
> {{Type "help", "copyright", "credits" or "license" for more information.}}
> {{>>> import pyarrow as pa, numpy as np}}
> {{>>> arrays = [np.arange(100, dtype=np.int32) for _ in range(1)]}}
> {{>>> with open('test.pyarrow', 'w') as f:}}
> {{... f.write(pa.serialize(arrays).to_buffer().to_pybytes())}}
> {{...}}
> {{>>> import cPickle as pickle}}
> {{>>> pickle.dump(arrays, open('test.pkl', 'w'), pickle.HIGHEST_PROTOCOL)}}
> test.pyarrow is 6.2 MB, while test.pkl is only 4.2 MB.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1789) [Format] Consolidate specification documents and improve clarity for new implementation authors

2019-03-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1789:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Format] Consolidate specification documents and improve clarity for new 
> implementation authors
> ---
>
> Key: ARROW-1789
> URL: https://issues.apache.org/jira/browse/ARROW-1789
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format
>Reporter: Wes McKinney
>Assignee: Micah Kornfield
>Priority: Major
> Fix For: 0.14.0
>
>
> See discussion in https://github.com/apache/arrow/issues/1296
> I believe the specification documents Layout.md, Metadata.md, and IPC.md 
> would benefit from being consolidated into a single Markdown document that 
> would be sufficient (along with the Flatbuffers schemas) to create a complete 
> Arrow implementation capable of reading and writing the binary format.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-4679) [Rust] [DataFusion] Implement in-memory DataSource

2019-03-03 Thread Andy Grove (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-4679.
---
Resolution: Fixed

Issue resolved by pull request 3754
[https://github.com/apache/arrow/pull/3754]

> [Rust] [DataFusion] Implement in-memory DataSource
> --
>
> Key: ARROW-4679
> URL: https://issues.apache.org/jira/browse/ARROW-4679
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Affects Versions: 0.12.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Implement a new in-memory data source so that DataFusion can execute queries 
> against data that is already loaded into memory.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4590) [Rust] Add explicit SIMD vectorization for comparison ops in "array_ops"

2019-03-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4590:
--
Labels: pull-request-available  (was: )

> [Rust] Add explicit SIMD vectorization for comparison ops in "array_ops"
> 
>
> Key: ARROW-4590
> URL: https://issues.apache.org/jira/browse/ARROW-4590
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3882) [Rust] PrimitiveArray should support cast operations

2019-03-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3882:
--
Labels: pull-request-available  (was: )

> [Rust] PrimitiveArray should support cast operations
> ---
>
> Key: ARROW-3882
> URL: https://issues.apache.org/jira/browse/ARROW-3882
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.11.1
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> It should be possible to cast PrimitiveArray to PrimitiveArray as 
> one example.
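As a rough illustration of what such a cast involves, here is a null-preserving, element-wise cast over Python lists. {{cast_array}} is hypothetical; the ticket does not specify the Rust API or how overflow and truncation should be handled.

```python
from typing import Callable, List, Optional, TypeVar

S = TypeVar("S")
T = TypeVar("T")


def cast_array(values: List[Optional[S]], convert: Callable[[S], T]) -> List[Optional[T]]:
    """Apply `convert` to each valid slot; null slots (None) stay null."""
    return [None if v is None else convert(v) for v in values]


print(cast_array([1, None, 3], float))     # [1.0, None, 3.0]
print(cast_array([1.9, None, -1.9], int))  # [1, None, -1]  (Python's int() truncates toward zero)
```

The second call shows why the conversion rule matters: a float-to-integer cast has to pick a rounding behaviour, which a real kernel would need to document.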



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4752) [Rust] Add explicit SIMD vectorization for the divide kernel

2019-03-03 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-4752:
--

 Summary: [Rust] Add explicit SIMD vectorization for the divide 
kernel
 Key: ARROW-4752
 URL: https://issues.apache.org/jira/browse/ARROW-4752
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan
 Fix For: 0.13.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3476) [Java] mvn test in memory fails on a big-endian platform

2019-03-03 Thread Kazuaki Ishizaki (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782925#comment-16782925
 ] 

Kazuaki Ishizaki commented on ARROW-3476:
-

This is not done yet, since I have been busy recently.
First, I will enable Jenkins on the big-endian environment.

> [Java] mvn test in memory fails on a big-endian platform
> 
>
> Key: ARROW-3476
> URL: https://issues.apache.org/jira/browse/ARROW-3476
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Kazuaki Ishizaki
>Priority: Major
>
> Apache Arrow is becoming a common way to exchange data among important 
> emerging analytics frameworks such as Pandas, NumPy, and Spark.
> [IBM Z|https://en.wikipedia.org/wiki/IBM_Z] is one of the platforms used to 
> process critical transactions such as banking and credit card transactions. 
> Users of IBM Z want to extract insights from these transactions using the 
> emerging analytics systems on IBM Z Linux. These analytics pipelines could 
> also be fast and effective on IBM Z Linux by using Apache Arrow in memory.
> From a technical perspective, since IBM Z Linux uses a big-endian data 
> format, it is not possible to use Apache Arrow in this pipeline. If Apache 
> Arrow supported big-endian, its use cases would expand.
> When I ran the Apache Arrow test cases on a big-endian platform (ppc64be), 
> {{mvn test}} in the memory module failed due to an assertion.
> In the {{TestEndianess.testLittleEndian}} test suite, the assertion occurs 
> during an allocation of a {{RootAllocator}} class.
> {code}
> $ uname -a
> Linux ppc64be.novalocal 4.5.7-300.fc24.ppc64 #1 SMP Fri Jun 10 20:29:32 UTC 
> 2016 ppc64 ppc64 ppc64 GNU/Linux
> $ arch  
> ppc64
> $ cd java/memory
> $ mvn test
> [INFO] Scanning for projects...
> [INFO]
>  
> [INFO] 
> 
> [INFO] Building Arrow Memory 0.12.0-SNAPSHOT
> [INFO] 
> 
> [INFO] 
> ...
> [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.082 
> s - in org.apache.arrow.memory.TestAccountant
> [INFO] Running org.apache.arrow.memory.TestLowCostIdentityHashMap
> [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.012 
> s - in org.apache.arrow.memory.TestLowCostIdentityHashMap
> [INFO] Running org.apache.arrow.memory.TestBaseAllocator
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.746 
> s <<< FAILURE! - in org.apache.arrow.memory.TestEndianess
> [ERROR] testLittleEndian(org.apache.arrow.memory.TestEndianess)  Time 
> elapsed: 0.313 s  <<< ERROR!
> java.lang.ExceptionInInitializerError
>   at 
> org.apache.arrow.memory.TestEndianess.testLittleEndian(TestEndianess.java:31)
> Caused by: java.lang.IllegalStateException: Arrow only runs on LittleEndian 
> systems.
>   at 
> org.apache.arrow.memory.TestEndianess.testLittleEndian(TestEndianess.java:31)
> [ERROR] Tests run: 22, Failures: 0, Errors: 21, Skipped: 1, Time elapsed: 
> 0.055 s <<< FAILURE! - in org.apache.arrow.memory.TestBaseAllocator
> ...
> {code}
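For context, the failing check guards against byte-order mismatches: the Java implementation assumes little-endian buffers, and reading little-endian bytes with a big-endian interpretation yields a different integer. The Python sketch below only illustrates that effect (the Java check itself lives in the memory module); {{require_little_endian}} is a hypothetical stand-in for the guard that raised {{IllegalStateException}} in the log.

```python
import struct
import sys


def require_little_endian(byteorder: str = sys.byteorder) -> None:
    # Hypothetical guard mirroring the message seen in the log above.
    if byteorder != "little":
        raise RuntimeError("Arrow only runs on LittleEndian systems.")


little = struct.pack("<i", 1)  # b'\x01\x00\x00\x00'
big = struct.pack(">i", 1)     # b'\x00\x00\x00\x01'

# Interpreting little-endian bytes as big-endian silently produces a different number.
print(struct.unpack(">i", little)[0])  # 16777216
print(sys.byteorder)  # 'little' on x86-64; 'big' on big-endian ppc64 or s390x
```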



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4751) [C++] Add pkg-config to conda_env_cpp.yml

2019-03-03 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4751:
--

 Summary: [C++] Add pkg-config to conda_env_cpp.yml
 Key: ARROW-4751
 URL: https://issues.apache.org/jira/browse/ARROW-4751
 Project: Apache Arrow
  Issue Type: Task
  Components: C++, Continuous Integration
Reporter: Uwe L. Korn
 Fix For: 0.13.0


Once the CMake refactor has been merged, we should add {{pkg-config}} to the 
dependencies, as it should now also be available on Windows: 
https://github.com/conda-forge/pkg-config-feedstock/pull/27 . This will 
simplify some packaging.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4750) [C++] RapidJSON triggers Wclass-memaccess on GCC 8+

2019-03-03 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4750:
--

 Summary: [C++] RapidJSON triggers Wclass-memaccess on GCC 8+
 Key: ARROW-4750
 URL: https://issues.apache.org/jira/browse/ARROW-4750
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
 Fix For: 0.13.0


With GCC 8, we get build errors with RapidJSON due to {{-Wclass-memaccess}}. 
This is fixed in https://github.com/Tencent/rapidjson/pull/1323 . We should 
update our external project dependency to include this fix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-3107) [C++] arrow::PrettyPrint for Column instances

2019-03-03 Thread Benjamin Kietzman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Kietzman resolved ARROW-3107.
--
Resolution: Fixed

> [C++] arrow::PrettyPrint for Column instances
> -
>
> Key: ARROW-3107
> URL: https://issues.apache.org/jira/browse/ARROW-3107
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Affects Versions: 0.10.0
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: beginner
> Fix For: 0.13.0
>
>
> Currently, we support {{arrow::ChunkedArray}} instances in {{PrettyPrint}}. 
> We should also support columns. The main addition here is that it will also 
> print the column's field.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3107) [C++] arrow::PrettyPrint for Column instances

2019-03-03 Thread Benjamin Kietzman (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782811#comment-16782811
 ] 

Benjamin Kietzman commented on ARROW-3107:
--

This was resolved in https://github.com/apache/arrow/pull/2857

> [C++] arrow::PrettyPrint for Column instances
> -
>
> Key: ARROW-3107
> URL: https://issues.apache.org/jira/browse/ARROW-3107
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Affects Versions: 0.10.0
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: beginner
> Fix For: 0.13.0
>
>
> Currently, we support {{arrow::ChunkedArray}} instances in {{PrettyPrint}}. 
> We should also support columns. The main addition here is that it will also 
> print the column's field.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-4737) [C#] tests are not running in CI

2019-03-03 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-4737.

   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3788
[https://github.com/apache/arrow/pull/3788]

> [C#] tests are not running in CI
> 
>
> Key: ARROW-4737
> URL: https://issues.apache.org/jira/browse/ARROW-4737
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C#, Continuous Integration
>Reporter: Eric Erhardt
>Assignee: Eric Erhardt
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>   Original Estimate: 4h
>  Time Spent: 40m
>  Remaining Estimate: 3h 20m
>
>  The C# tests are not running in CI because the filtering logic needs to be 
> updated.
> For example see 
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/22671460/job/nk1nn59k5njie720
> {quote}Build started
> git clone -q https://github.com/apache/arrow.git C:\projects\arrow
> git fetch -q origin +refs/pull/3662/merge:
> git checkout -qf FETCH_HEAD
> Running Install scripts
> python ci\detect-changes.py > generated_changes.bat
> Affected files: [u'csharp/src/Apache.Arrow/Field.Builder.cs', 
> u'csharp/src/Apache.Arrow/Schema.Builder.cs', 
> u'csharp/test/Apache.Arrow.Tests/SchemaBuilderTests.cs', 
> u'csharp/test/Apache.Arrow.Tests/TypeTests.cs']
> Affected topics:
> {'c_glib': False,
>  'cpp': False,
>  'dev': False,
>  'docs': False,
>  'go': False,
>  'integration': False,
>  'java': False,
>  'js': False,
>  'python': False,
>  'r': False,
>  'ruby': False,
>  'rust': False,
>  'site': False}
> call generated_changes.bat
> call ci\appveyor-filter-changes.bat
> ===
> === No C++ or Python changes, exiting job
> ===
> Build was forcibly terminated
> Build success{quote}
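The log suggests that the change detection maps changed files to topics and that C#-only changes match no topic (there is no {{csharp}} entry in the printed dictionary), so the job exits early. Below is a hypothetical Python sketch of that kind of mapping, keyed by top-level directory; it is not the actual {{ci\detect-changes.py}} logic.

```python
from typing import Dict, Iterable, Sequence

# Topic names copied from the "Affected topics" output above, plus a hypothetical
# "csharp" entry; the real script's mapping may differ.
TOPICS = ["c_glib", "cpp", "csharp", "dev", "docs", "go", "integration",
          "java", "js", "python", "r", "ruby", "rust", "site"]


def affected_topics(files: Iterable[str], topics: Sequence[str] = TOPICS) -> Dict[str, bool]:
    # A topic is affected when a changed file lives under the directory of that name.
    changed_dirs = {path.replace("\\", "/").split("/", 1)[0] for path in files}
    return {topic: topic in changed_dirs for topic in topics}


files = ["csharp/src/Apache.Arrow/Field.Builder.cs",
         "csharp/test/Apache.Arrow.Tests/TypeTests.cs"]
print(affected_topics(files)["csharp"])  # True

# Without a "csharp" topic nothing matches, so a filter like the one in the log skips the job.
without_csharp = [t for t in TOPICS if t != "csharp"]
print(any(affected_topics(files, without_csharp).values()))  # False
```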



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4749) [Rust] RecordBatch::new() should return result instead of panicking

2019-03-03 Thread Andy Grove (JIRA)
Andy Grove created ARROW-4749:
-

 Summary: [Rust] RecordBatch::new() should return result instead of 
panicking
 Key: ARROW-4749
 URL: https://issues.apache.org/jira/browse/ARROW-4749
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Affects Versions: 0.12.0
Reporter: Andy Grove
 Fix For: 0.13.0


RecordBatch::new() has some good validation checks, but calls assert_eq! instead 
of returning a Result.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4748) [Rust] [DataFusion] GROUP BY performance could be optimized

2019-03-03 Thread Andy Grove (JIRA)
Andy Grove created ARROW-4748:
-

 Summary: [Rust] [DataFusion] GROUP BY performance could be 
optimized
 Key: ARROW-4748
 URL: https://issues.apache.org/jira/browse/ARROW-4748
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Affects Versions: 0.12.0
Reporter: Andy Grove
 Fix For: 0.13.0


The logic to build the group by keys is row-based, performing an array downcast 
on every single group by value. This could be done in a columnar way instead.

 

I also wonder if it is possible to avoid converting the result map to an array 
of map entries.
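To illustrate the difference between row-based and columnar key building, here is a Python model in which {{Column.downcast}} stands in for the per-call array downcast (hypothetical names; DataFusion's actual Rust code is not shown). The row-based version downcasts every column once per row; the columnar version downcasts each column once and then walks the typed values together.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


class Column:
    """Stand-in for a type-erased array; downcast() models the runtime type check."""

    def __init__(self, dtype: type, values: List[Any]):
        self.dtype = dtype
        self.values = values
        self.downcasts = 0  # how many times downcast() has been called

    def downcast(self, expected: type) -> List[Any]:
        self.downcasts += 1
        if self.dtype is not expected:
            raise TypeError(f"expected {expected.__name__}, got {self.dtype.__name__}")
        return self.values


def group_sum_row_based(keys: List[Column], agg: Column) -> Dict[Tuple, int]:
    out: Dict[Tuple, int] = defaultdict(int)
    for row in range(len(agg.values)):
        # Every key column (and the aggregate column) is downcast again for every row.
        key = tuple(col.downcast(col.dtype)[row] for col in keys)
        out[key] += agg.downcast(int)[row]
    return dict(out)


def group_sum_columnar(keys: List[Column], agg: Column) -> Dict[Tuple, int]:
    out: Dict[Tuple, int] = defaultdict(int)
    # Downcast each column once, then iterate the typed values together.
    key_values = [col.downcast(col.dtype) for col in keys]
    for key, value in zip(zip(*key_values), agg.downcast(int)):
        out[key] += value
    return dict(out)


city, sales = Column(str, ["a", "b", "a"]), Column(int, [1, 2, 3])
print(group_sum_row_based([city], sales), city.downcasts)  # {('a',): 4, ('b',): 2} 3

city, sales = Column(str, ["a", "b", "a"]), Column(int, [1, 2, 3])
print(group_sum_columnar([city], sales), city.downcasts)   # {('a',): 4, ('b',): 2} 1
```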



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-4485) [CI] Determine maintenance approach to pinned conda-forge binutils package

2019-03-03 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-4485:
--

Assignee: Uwe L. Korn

> [CI] Determine maintenance approach to pinned conda-forge binutils package
> --
>
> Key: ARROW-4485
> URL: https://issues.apache.org/jira/browse/ARROW-4485
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>Priority: Major
>
> In ARROW-4469 https://github.com/apache/arrow/pull/3554 we pinned binutils 
> 2.31 because the 2.32 release broke builds on Ubuntu Xenial. Because of this, 
> we aren't sure what our path forward will be for relying on the conda-forge 
> toolchain.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4746) [C++/Python] PyDataTime_Date wrongly casted to PyDataTime_DateTime

2019-03-03 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782713#comment-16782713
 ] 

Antoine Pitrou commented on ARROW-4746:
---

I suppose it is because we are accessing memory that is actually allocated 
(and probably zero-initialized somewhere).

> [C++/Python] PyDataTime_Date wrongly casted to PyDataTime_DateTime
> --
>
> Key: ARROW-4746
> URL: https://issues.apache.org/jira/browse/ARROW-4746
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pypy
>
> As mentioned in 
> https://bitbucket.org/pypy/pypy/issues/2842/running-pyarrow-on-pypy-segfaults#comment-50670536,
>  we currently access a {{PyDateTime_Date}} object with a 
> {{PyDateTime_DateTime}} cast in {{PyDateTime_DATE_GET_SECOND}} in two places 
> in our code. While CPython is able to deal with this incorrect usage, PyPy is 
> not. We should split this code path into one that deals with 
> dates and another that deals with datetimes.
> Reproducible code:
> {code:java}
> pa.array([datetime.date(2018, 5, 10)], type=pa.date64()){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4746) [C++/Python] PyDataTime_Date wrongly casted to PyDataTime_DateTime

2019-03-03 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4746:
--

 Summary: [C++/Python] PyDataTime_Date wrongly casted to 
PyDataTime_DateTime
 Key: ARROW-4746
 URL: https://issues.apache.org/jira/browse/ARROW-4746
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Reporter: Uwe L. Korn


As mentioned in 
https://bitbucket.org/pypy/pypy/issues/2842/running-pyarrow-on-pypy-segfaults#comment-50670536,
 we currently access a {{PyDateTime_Date}} object with a 
{{PyDateTime_DateTime}} cast in {{PyDateTime_DATE_GET_SECOND}} in two places in 
our code. While CPython is able to deal with this incorrect usage, PyPy is not. 
We should split this code path into one that deals with dates 
and another that deals with datetimes.

Reproducible code:
{code:java}
pa.array([datetime.date(2018, 5, 10)], type=pa.date64()){code}
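A sketch of the "separate paths" idea at the Python level: {{datetime.datetime}} is a subclass of {{datetime.date}}, so the datetime check has to come first, and the date-only path must never read time-of-day fields. The helper names are hypothetical, the conversion assumes naive values (time zones are ignored), and the target unit is milliseconds since the UNIX epoch, which is what date64 stores.

```python
import datetime

EPOCH = datetime.date(1970, 1, 1)
MS_PER_DAY = 86_400_000


def date_to_millis(value: datetime.date) -> int:
    # Date-only path: uses year/month/day only, never time-of-day fields.
    return (value - EPOCH).days * MS_PER_DAY


def datetime_to_millis(value: datetime.datetime) -> int:
    # Datetime path: time-of-day fields exist here, so reading them is safe.
    # Naive values are assumed; tzinfo is ignored in this sketch.
    millis_of_day = ((value.hour * 60 + value.minute) * 60 + value.second) * 1000
    return date_to_millis(value.date()) + millis_of_day + value.microsecond // 1000


def to_millis(value: datetime.date) -> int:
    # datetime.datetime subclasses datetime.date, so it must be checked first.
    if isinstance(value, datetime.datetime):
        return datetime_to_millis(value)
    if isinstance(value, datetime.date):
        return date_to_millis(value)
    raise TypeError(f"expected date or datetime, got {type(value).__name__}")


print(to_millis(datetime.date(2018, 5, 10)))               # 1525910400000
print(to_millis(datetime.datetime(2018, 5, 10, 0, 0, 1)))  # 1525910401000
```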



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4747) [C++/PyPy] Add docker image to test against PyPy nightlies

2019-03-03 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4747:
--

 Summary: [C++/PyPy] Add docker image to test against PyPy nightlies
 Key: ARROW-4747
 URL: https://issues.apache.org/jira/browse/ARROW-4747
 Project: Apache Arrow
  Issue Type: Task
  Components: C++, Python
Reporter: Uwe L. Korn


It seems 
(https://bitbucket.org/pypy/pypy/issues/2842/running-pyarrow-on-pypy-segfaults#comment-50670536)
 that we are close to being able to run with PyPy. At the moment, we don't 
actively work on supporting PyPy, but this would be a good start for getting 
feedback on what is still missing.

To have such an image, one would need to fork one of the current test setups in 
{{docker-compose.yml}} that build with system libraries (e.g. ubuntu-xenial or 
debian-testing) and replace the system Python with the PyPy nightly builds
from http://buildbot.pypy.org/nightly/trunk/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4746) [C++/Python] PyDataTime_Date wrongly casted to PyDataTime_DateTime

2019-03-03 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782682#comment-16782682
 ] 

Uwe L. Korn commented on ARROW-4746:


[~pitrou] When you have a chance to look at this, it would be nice to know why 
CPython can deal with this.

> [C++/Python] PyDataTime_Date wrongly casted to PyDataTime_DateTime
> --
>
> Key: ARROW-4746
> URL: https://issues.apache.org/jira/browse/ARROW-4746
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pypy
>
> As mentioned in 
> https://bitbucket.org/pypy/pypy/issues/2842/running-pyarrow-on-pypy-segfaults#comment-50670536,
>  we currently access a {{PyDateTime_Date}} object with a 
> {{PyDateTime_DateTime}} cast in {{PyDateTime_DATE_GET_SECOND}} in two places 
> in our code. While CPython is able to deal with this incorrect usage, PyPy is 
> not. We should split this code path into one that deals with 
> dates and another that deals with datetimes.
> Reproducible code:
> {code:java}
> pa.array([datetime.date(2018, 5, 10)], type=pa.date64()){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4737) [C#] tests are not running in CI

2019-03-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4737:
--
Labels: pull-request-available  (was: )

> [C#] tests are not running in CI
> 
>
> Key: ARROW-4737
> URL: https://issues.apache.org/jira/browse/ARROW-4737
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C#, Continuous Integration
>Reporter: Eric Erhardt
>Assignee: Eric Erhardt
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
>  The C# tests are not running in CI because the filtering logic needs to be 
> updated.
> For example see 
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/22671460/job/nk1nn59k5njie720
> {quote}Build started
> git clone -q https://github.com/apache/arrow.git C:\projects\arrow
> git fetch -q origin +refs/pull/3662/merge:
> git checkout -qf FETCH_HEAD
> Running Install scripts
> python ci\detect-changes.py > generated_changes.bat
> Affected files: [u'csharp/src/Apache.Arrow/Field.Builder.cs', 
> u'csharp/src/Apache.Arrow/Schema.Builder.cs', 
> u'csharp/test/Apache.Arrow.Tests/SchemaBuilderTests.cs', 
> u'csharp/test/Apache.Arrow.Tests/TypeTests.cs']
> Affected topics:
> {'c_glib': False,
>  'cpp': False,
>  'dev': False,
>  'docs': False,
>  'go': False,
>  'integration': False,
>  'java': False,
>  'js': False,
>  'python': False,
>  'r': False,
>  'ruby': False,
>  'rust': False,
>  'site': False}
> call generated_changes.bat
> call ci\appveyor-filter-changes.bat
> ===
> === No C++ or Python changes, exiting job
> ===
> Build was forcibly terminated
> Build success{quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (ARROW-4606) [Rust] [DataFusion] FilterRelation created RecordBatch with empty schema

2019-03-03 Thread Neville Dipale (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale closed ARROW-4606.
-
Resolution: Duplicate

> [Rust] [DataFusion] FilterRelation created RecordBatch with empty schema
> 
>
> Key: ARROW-4606
> URL: https://issues.apache.org/jira/browse/ARROW-4606
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Affects Versions: 0.12.0
>Reporter: Andy Grove
>Priority: Major
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)