[jira] [Updated] (ARROW-9904) [C++] Unroll the loop manually for CountSetBits

2020-09-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9904:
--
Labels: pull-request-available  (was: )

> [C++] Unroll the loop manually for CountSetBits
> ---
>
> Key: ARROW-9904
> URL: https://issues.apache.org/jira/browse/ARROW-9904
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Frank Du
>Assignee: Frank Du
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The tight loop below can perform better if it is unrolled manually, which
> helps the compiler generate better parallel instructions.
> for (auto iter = u64_data; iter < end; ++iter) {
>  count += BitUtil::PopCount(*iter);
>  }



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-9904) [C++] Unroll the loop manually for CountSetBits

2020-09-02 Thread Apache Arrow JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Arrow JIRA Bot reassigned ARROW-9904:


Assignee: Apache Arrow JIRA Bot  (was: Frank Du)

> [C++] Unroll the loop manually for CountSetBits
> ---
>
> Key: ARROW-9904
> URL: https://issues.apache.org/jira/browse/ARROW-9904
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Frank Du
>Assignee: Apache Arrow JIRA Bot
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The tight loop below can perform better if it is unrolled manually, which
> helps the compiler generate better parallel instructions.
> for (auto iter = u64_data; iter < end; ++iter) {
>  count += BitUtil::PopCount(*iter);
>  }





[jira] [Assigned] (ARROW-9904) [C++] Unroll the loop manually for CountSetBits

2020-09-02 Thread Apache Arrow JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Arrow JIRA Bot reassigned ARROW-9904:


Assignee: Frank Du  (was: Apache Arrow JIRA Bot)

> [C++] Unroll the loop manually for CountSetBits
> ---
>
> Key: ARROW-9904
> URL: https://issues.apache.org/jira/browse/ARROW-9904
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Frank Du
>Assignee: Frank Du
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The tight loop below can perform better if it is unrolled manually, which
> helps the compiler generate better parallel instructions.
> for (auto iter = u64_data; iter < end; ++iter) {
>  count += BitUtil::PopCount(*iter);
>  }





[jira] [Created] (ARROW-9904) [C++] Unroll the loop manually for CountSetBits

2020-09-02 Thread Frank Du (Jira)
Frank Du created ARROW-9904:
---

 Summary: [C++] Unroll the loop manually for CountSetBits
 Key: ARROW-9904
 URL: https://issues.apache.org/jira/browse/ARROW-9904
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Frank Du
Assignee: Frank Du


The tight loop below can perform better if it is unrolled manually, which helps
the compiler generate better parallel instructions.

for (auto iter = u64_data; iter < end; ++iter) {
 count += BitUtil::PopCount(*iter);
 }
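
For illustration, the unrolling idea can be sketched as below. This is a Rust analogue of the C++ loop, not the Arrow implementation: the function names are hypothetical, `u64::count_ones` stands in for `BitUtil::PopCount`, and the unroll factor of four is an assumption. Whether it is actually faster depends on the compiler and target CPU, so it should be benchmarked.

```rust
/// Straightforward loop: one accumulator, one word per iteration.
pub fn count_set_bits_simple(words: &[u64]) -> u64 {
    words.iter().map(|w| u64::from(w.count_ones())).sum()
}

/// Manually unrolled variant: four independent accumulators per iteration,
/// which removes the serial dependency on a single `count` variable.
pub fn count_set_bits_unrolled(words: &[u64]) -> u64 {
    let mut counts = [0u64; 4];
    let mut chunks = words.chunks_exact(4);
    for chunk in &mut chunks {
        counts[0] += u64::from(chunk[0].count_ones());
        counts[1] += u64::from(chunk[1].count_ones());
        counts[2] += u64::from(chunk[2].count_ones());
        counts[3] += u64::from(chunk[3].count_ones());
    }
    // Words left over when the length is not a multiple of four.
    let tail: u64 = chunks
        .remainder()
        .iter()
        .map(|w| u64::from(w.count_ones()))
        .sum();
    counts.iter().sum::<u64>() + tail
}
```

Both functions return the same count for any input; only the shape of the loop differs.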





[jira] [Assigned] (ARROW-7871) [Python] Expose more compute kernels

2020-09-02 Thread Andrew Wieteska (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wieteska reassigned ARROW-7871:
--

Assignee: (was: Andrew Wieteska)

> [Python] Expose more compute kernels
> 
>
> Key: ARROW-7871
> URL: https://issues.apache.org/jira/browse/ARROW-7871
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Krisztian Szucs
>Priority: Major
>
> Currently only the sum kernel is exposed.
> Or consider to deprecate/remove the pyarrow.compute module, and bind the 
> compute kernels as methods instead.





[jira] [Created] (ARROW-9903) [R] open_dataset freezes opening feather files

2020-09-02 Thread Sean Clement (Jira)
Sean Clement created ARROW-9903:
---

 Summary: [R] open_dataset freezes opening feather files
 Key: ARROW-9903
 URL: https://issues.apache.org/jira/browse/ARROW-9903
 Project: Apache Arrow
  Issue Type: Bug
 Environment: Rstudio
Reporter: Sean Clement


Session info:
{code:java}
// R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] forcats_0.5.0   stringr_1.4.0   dplyr_1.0.1     purrr_0.3.4     readr_1.3.1     tidyr_1.1.1
 [7] tibble_3.0.3    ggplot2_3.3.2   tidyverse_1.3.0 arrow_1.0.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5       cellranger_1.1.0 pillar_1.4.6     compiler_4.0.2   dbplyr_1.4.4     tools_4.0.2
 [7] bit_1.1-15.2     lubridate_1.7.9  jsonlite_1.7.0   lifecycle_0.2.0  gtable_0.3.0     pkgconfig_2.0.3
[13] rlang_0.4.7      reprex_0.3.0     cli_2.0.2        DBI_1.1.0        rstudioapi_0.11  haven_2.3.1
[19] withr_2.2.0      xml2_1.3.2       httr_1.4.2       fs_1.4.1         generics_0.0.2   vctrs_0.3.2
[25] hms_0.5.3        bit64_0.9-7      grid_4.0.2       tidyselect_1.1.0 glue_1.4.1       R6_2.4.1
[31] fansi_0.4.1      readxl_1.3.1     modelr_0.1.8     blob_1.2.1       magrittr_1.5     backports_1.1.7
[37] scales_1.1.1     ellipsis_0.3.1   rvest_0.3.5      assertthat_0.2.1 colorspace_1.4-1 stringi_1.4.6
[43] munsell_0.5.0    broom_0.7.0      crayon_1.3.4
{code}
While cycling through and processing files using open_dataset(..., format =
"feather") in R, the function hangs randomly and will not proceed to the next
file. The freeze does not occur at the same file each time. Additionally, the
same function occasionally freezes even when it is called only once.

When open_dataset hangs, the only way to stop R is through Task Manager,
because RStudio becomes totally unresponsive.





[jira] [Resolved] (ARROW-9885) [Rust] [DataFusion] Simplify code of type coercion for binary types

2020-09-02 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-9885.
---
Fix Version/s: 2.0.0
   Resolution: Fixed

Issue resolved by pull request 8076
[https://github.com/apache/arrow/pull/8076]

> [Rust] [DataFusion] Simplify code of type coercion for binary types
> ---
>
> Key: ARROW-9885
> URL: https://issues.apache.org/jira/browse/ARROW-9885
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Rust, Rust - DataFusion
>Reporter: Jorge
>Assignee: Jorge
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The function `numerical_coercion` only uses the operator `op` for its error 
> formatting. But the function's intent can be simply generalized to "coerce 
> two types to numerically equivalent types".





[jira] [Resolved] (ARROW-9888) [Rust] [DataFusion] ExecutionContext can not be shared between threads

2020-09-02 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-9888.
---
Fix Version/s: 2.0.0
   Resolution: Fixed

Issue resolved by pull request 8082
[https://github.com/apache/arrow/pull/8082]

> [Rust] [DataFusion] ExecutionContext can not be shared between threads
> --
>
> Key: ARROW-9888
> URL: https://issues.apache.org/jira/browse/ARROW-9888
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> As suggested by Jorge on  https://github.com/apache/arrow/pull/8079
> The high level idea is to allow ExecutionContext on multi-threaded 
> environments such as Python.
> The two use-cases:
> 1. when a project is planning a number of complex plans that depend on a
> common set of sources and UDFs, it would be nice to be able to multi-thread
> the planning. This is particularly important when planning requires reading
> remote metadata (e.g. when the source is in s3 with many partitions).
> Metadata reading is often slow and network-bound, which
> makes threads suitable for these workloads. If multi-threading is not
> possible, either each plan needs to read the metadata independently (one
> context per plan) or planning must be sequential (with lots of network
> waiting).
> 2. when creating bindings to programming languages that support 
> multi-threading, it would be nice for the ExecutionContext to be thread safe, 
> so that we can more easily integrate with those languages.
> The code might look like:
> {code}
> alamb@MacBook-Pro rust % git diff
> diff --git a/rust/datafusion/src/execution/context.rs 
> b/rust/datafusion/src/execution/context.rs
> index 5f8aa342e..7374b0a78 100644
> --- a/rust/datafusion/src/execution/context.rs
> +++ b/rust/datafusion/src/execution/context.rs
> @@ -460,7 +460,7 @@ mod tests {
>  use arrow::array::{ArrayRef, Int32Array};
>  use arrow::compute::add;
>  use std::fs::File;
> -use std::io::prelude::*;
> +use std::{sync::Mutex, io::prelude::*};
>  use tempdir::TempDir;
>  use test::*;
>  
> @@ -928,6 +928,28 @@ mod tests {
>  Ok(())
>  }
>  
> +#[test]
> +fn send_context_to_threads() -> Result<()> {
> +// ensure that ExecutionContext's can be read by multiple threads 
> concurrently
> +let tmp_dir = TempDir::new("send_context_to_threads")?;
> +let partition_count = 4;
> +let mut ctx = Arc::new(Mutex::new(create_ctx(&tmp_dir, 
> partition_count)?));
> +
> +let threads: Vec<JoinHandle<Result<LogicalPlan>>> = (0..2)
> +.map(|_| { ctx.clone() })
> +.map(|ctx_clone| thread::spawn(move || {
> +let ctx = ctx_clone.lock().expect("Locked context");
> +// Ensure we can create logical plan code on a separate 
> thread.
> +ctx.create_logical_plan("SELECT c1, c2 FROM test WHERE c1 > 
> 0 AND c1 < 3")
> +}))
> +.collect();
> +
> +for thread in threads {
> +thread.join().expect("Failed to join thread")?;
> +}
> +Ok(())
> +}
> +
>  #[test]
>  fn scalar_udf() -> Result<()> {
>  let schema = Schema::new(vec![
> {code}
> At the moment, Rust refuses to compile this example (and also refuses to
> share ExecutionContexts between threads) due to the following error (namely,
> that there are several `dyn` objects that are not marked as Send + Sync):
> {code}
>Compiling datafusion v2.0.0-SNAPSHOT 
> (/Users/alamb/Software/arrow/rust/datafusion)
> error[E0277]: `(dyn execution::physical_plan::PhysicalPlanner + 'static)` 
> cannot be sent between threads safely
>--> datafusion/src/execution/context.rs:940:30
> |
> 940 | .map(|ctx_clone| thread::spawn(move || {
> |  ^ `(dyn 
> execution::physical_plan::PhysicalPlanner + 'static)` cannot be sent between 
> threads safely
> | 
>::: 
> /Users/alamb/.rustup/toolchains/nightly-2020-04-22-x86_64-apple-darwin/lib/rustlib/src/rust/src/libstd/thread/mod.rs:616:8
> |
> 616 | F: Send + 'static,
> | required by this bound in `std::thread::spawn`
> |
> = help: the trait `std::marker::Send` is not implemented for `(dyn 
> execution::physical_plan::PhysicalPlanner + 'static)`
> = note: required because of the requirements on the impl of 
> `std::marker::Send` for `std::sync::Arc<(dyn 
> execution::physical_plan::PhysicalPlanner + 'static)>`
> = note: required because it appears within the type 
> `std::option::Option execution::physical_plan::PhysicalPlanner + 
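
The Send + Sync requirement behind the error above can be illustrated with a small standalone sketch. Everything in it is hypothetical (`Planner`, `EchoPlanner`, `Context` and `plan_on_threads` are stand-ins, not DataFusion types): the point is only that a trait object becomes shareable across threads once the trait itself requires `Send + Sync`.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical planner trait. The `Send + Sync` supertraits are what allow
/// `Arc<dyn Planner>` to cross thread boundaries; without them,
/// `thread::spawn` rejects the closure with error E0277.
pub trait Planner: Send + Sync {
    fn plan(&self, sql: &str) -> String;
}

/// Hypothetical planner that just echoes the query text.
struct EchoPlanner;

impl Planner for EchoPlanner {
    fn plan(&self, sql: &str) -> String {
        format!("plan({})", sql)
    }
}

/// Hypothetical stand-in for an execution context holding a trait object.
pub struct Context {
    planner: Arc<dyn Planner>,
}

impl Context {
    pub fn new() -> Self {
        Context { planner: Arc::new(EchoPlanner) }
    }

    pub fn create_logical_plan(&self, sql: &str) -> String {
        self.planner.plan(sql)
    }
}

/// Plans each query on its own thread, sharing one context behind a mutex.
pub fn plan_on_threads(queries: Vec<String>) -> Vec<String> {
    let ctx = Arc::new(Mutex::new(Context::new()));
    let handles: Vec<thread::JoinHandle<String>> = queries
        .into_iter()
        .map(|sql| {
            let ctx = Arc::clone(&ctx);
            thread::spawn(move || {
                let guard = ctx.lock().expect("context lock poisoned");
                guard.create_logical_plan(&sql)
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("planning thread panicked"))
        .collect()
}
```

Removing `Send + Sync` from the trait definition should reproduce the same class of compiler error as the one quoted in the issue.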

[jira] [Resolved] (ARROW-9583) [Rust] Offset is mishandled in arithmetic and boolean compute kernels

2020-09-02 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-9583.
---
Fix Version/s: 2.0.0
   Resolution: Fixed

Issue resolved by pull request 7854
[https://github.com/apache/arrow/pull/7854]

> [Rust] Offset is mishandled in arithmetic and boolean compute kernels
> -
>
> Key: ARROW-9583
> URL: https://issues.apache.org/jira/browse/ARROW-9583
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 1.0.0
>Reporter: Jörn Horstmann
>Assignee: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Several compute kernels create the resulting ArrayData with the same offset 
> of one of the operands. Instead this offset should be 0 since the buffer is 
> freshly constructed with the correct len.
> Example of one failing test:
>  
> {code:java}
> #[test]
> fn test_primitive_array_add_sliced() {
> let a = Int32Array::from(vec![0, 0, 0, 5, 6, 7, 8, 9, 0]);
> let b = Int32Array::from(vec![0, 0, 0, 6, 7, 8, 9, 8, 0]);
> let a = a.slice(3, 5);
> let b = b.slice(3, 5);
> let a = a.as_any().downcast_ref::<Int32Array>().unwrap();
> let b = b.as_any().downcast_ref::<Int32Array>().unwrap();
> assert_eq!(5, a.value(0));
> assert_eq!(6, b.value(0));
> let c = add(&a, &b).unwrap();
> assert_eq!(5, c.len());
> assert_eq!(11, c.value(0));
> assert_eq!(13, c.value(1));
> assert_eq!(15, c.value(2));
> assert_eq!(17, c.value(3));
> assert_eq!(17, c.value(4));
> }
>  {code}
> Additionally, the boolean kernels seem to require that both operands have the 
> same offset. This shouldn't be needed, but it seems that the simd 
> implementation requires that the offset is a multiple of 8 (bits) so that the 
> operation works correctly on whole bytes. The scalar implementation should be 
> fine with any offset.
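
As a rough illustration of the intended behaviour, the sketch below models a sliced array as shared values plus an offset and a length. `Int32View` and this `add` are hypothetical stand-ins, not the arrow crate's types; the only point is that a kernel which writes into a fresh buffer should report offset 0 on its result, whatever the offsets of its operands.

```rust
use std::sync::Arc;

/// Hypothetical minimal array view: shared values plus an offset and length.
#[derive(Debug, Clone)]
pub struct Int32View {
    data: Arc<Vec<i32>>,
    offset: usize,
    len: usize,
}

impl Int32View {
    pub fn from_vec(values: Vec<i32>) -> Self {
        let len = values.len();
        Int32View { data: Arc::new(values), offset: 0, len }
    }

    /// Zero-copy slice: same buffer, shifted offset.
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Int32View { data: Arc::clone(&self.data), offset: self.offset + offset, len }
    }

    pub fn value(&self, i: usize) -> i32 {
        assert!(i < self.len, "index out of bounds");
        self.data[self.offset + i]
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn offset(&self) -> usize {
        self.offset
    }
}

/// Element-wise addition. The result owns a freshly built buffer, so its
/// offset is 0 regardless of the operands' offsets.
pub fn add(a: &Int32View, b: &Int32View) -> Option<Int32View> {
    if a.len() != b.len() {
        return None;
    }
    let out: Vec<i32> = (0..a.len()).map(|i| a.value(i) + b.value(i)).collect();
    Some(Int32View::from_vec(out))
}
```

With the inputs from the failing test above, this returns five values starting at 11, and the result's offset is 0.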





[jira] [Updated] (ARROW-9900) [Rust][DataFusion] Use Arc<> instead of Box<> in LogicalPlan

2020-09-02 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-9900:
--
Component/s: Rust - DataFusion
 Rust

> [Rust][DataFusion] Use Arc<> instead of Box<> in LogicalPlan
> 
>
> Key: ARROW-9900
> URL: https://issues.apache.org/jira/browse/ARROW-9900
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust, Rust - DataFusion
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The idea is to continue to simplify the code and improve performance: the 
> inputs to nodes are often copied and using Box requires unnecessary deep 
> copies





[jira] [Resolved] (ARROW-9900) [Rust][DataFusion] Use Arc<> instead of Box<> in LogicalPlan

2020-09-02 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-9900.
---
Fix Version/s: 2.0.0
   Resolution: Fixed

Issue resolved by pull request 8098
[https://github.com/apache/arrow/pull/8098]

> [Rust][DataFusion] Use Arc<> instead of Box<> in LogicalPlan
> 
>
> Key: ARROW-9900
> URL: https://issues.apache.org/jira/browse/ARROW-9900
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The idea is to continue to simplify the code and improve performance: the 
> inputs to nodes are often copied and using Box requires unnecessary deep 
> copies





[jira] [Resolved] (ARROW-9892) [Rust] [DataFusion] Add support for concat

2020-09-02 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-9892.
---
Fix Version/s: 2.0.0
   Resolution: Fixed

Issue resolved by pull request 8090
[https://github.com/apache/arrow/pull/8090]

> [Rust] [DataFusion] Add support for concat
> --
>
> Key: ARROW-9892
> URL: https://issues.apache.org/jira/browse/ARROW-9892
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Jorge
>Assignee: Jorge
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> So that we can concatenate strings together.
> {{pub fn concat(args: Vec<Expr>) -> Expr}}
>  





[jira] [Updated] (ARROW-9902) [Rust] [DataFusion] Add support for array()

2020-09-02 Thread Jorge (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge updated ARROW-9902:
-
Description: 
`array` is a function that takes an arbitrary number of columns and returns a 
fixed-size array with their values.

This function is notoriously difficult to implement because it receives an 
arbitrary number of arguments of arbitrary but common types, but it is also 
useful for e.g. time-series data.

  was:
`array` is a function that takes an arbitrary number of columns and returns a 
fixed-size list with their values.

This function is notoriously difficult to implement because it receives an 
arbitrary number of arguments of arbitrary but common types, but it is also 
useful for e.g. time-series data.


> [Rust] [DataFusion] Add support for array()
> ---
>
> Key: ARROW-9902
> URL: https://issues.apache.org/jira/browse/ARROW-9902
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Jorge
>Assignee: Jorge
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> `array` is a function that takes an arbitrary number of columns and returns a 
> fixed-size array with their values.
> This function is notoriously difficult to implement because it receives an 
> arbitrary number of arguments of arbitrary but common types, but it is also 
> useful for e.g. time-series data.





[jira] [Assigned] (ARROW-9902) [Rust] [DataFusion] Add support for array()

2020-09-02 Thread Apache Arrow JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Arrow JIRA Bot reassigned ARROW-9902:


Assignee: Jorge  (was: Apache Arrow JIRA Bot)

> [Rust] [DataFusion] Add support for array()
> ---
>
> Key: ARROW-9902
> URL: https://issues.apache.org/jira/browse/ARROW-9902
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Jorge
>Assignee: Jorge
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> `array` is a function that takes an arbitrary number of columns and returns a 
> fixed-size list with their values.
> This function is notoriously difficult to implement because it receives an 
> arbitrary number of arguments of arbitrary but common types, but it is also 
> useful for e.g. time-series data.





[jira] [Updated] (ARROW-9902) [Rust] [DataFusion] Add support for array()

2020-09-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9902:
--
Labels: pull-request-available  (was: )

> [Rust] [DataFusion] Add support for array()
> ---
>
> Key: ARROW-9902
> URL: https://issues.apache.org/jira/browse/ARROW-9902
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Jorge
>Assignee: Jorge
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> `array` is a function that takes an arbitrary number of columns and returns a 
> fixed-size list with their values.
> This function is notoriously difficult to implement because it receives an 
> arbitrary number of arguments of arbitrary but common types, but it is also 
> useful for e.g. time-series data.





[jira] [Assigned] (ARROW-9902) [Rust] [DataFusion] Add support for array()

2020-09-02 Thread Apache Arrow JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Arrow JIRA Bot reassigned ARROW-9902:


Assignee: Apache Arrow JIRA Bot  (was: Jorge)

> [Rust] [DataFusion] Add support for array()
> ---
>
> Key: ARROW-9902
> URL: https://issues.apache.org/jira/browse/ARROW-9902
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Jorge
>Assignee: Apache Arrow JIRA Bot
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> `array` is a function that takes an arbitrary number of columns and returns a 
> fixed-size list with their values.
> This function is notoriously difficult to implement because it receives an 
> arbitrary number of arguments of arbitrary but common types, but it is also 
> useful for e.g. time-series data.





[jira] [Created] (ARROW-9902) [Rust] [DataFusion] Add support for array()

2020-09-02 Thread Jorge (Jira)
Jorge created ARROW-9902:


 Summary: [Rust] [DataFusion] Add support for array()
 Key: ARROW-9902
 URL: https://issues.apache.org/jira/browse/ARROW-9902
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Jorge
Assignee: Jorge


`array` is a function that takes an arbitrary number of columns and returns a 
fixed-size list with their values.

This function is notoriously difficult to implement because it receives an 
arbitrary number of arguments of arbitrary but common types, but it is also 
useful for e.g. time-series data.
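
A minimal sketch of the row-wise behaviour described above, assuming plain `Vec<i64>` columns rather than Arrow arrays. The signature is hypothetical and says nothing about how DataFusion implements or types the function.

```rust
/// Row-wise "array" over equally long columns: row `r` of the result holds
/// the `r`-th value of every input column, so each row has `columns.len()`
/// entries. Returns `None` for no columns or mismatched column lengths.
pub fn array(columns: &[Vec<i64>]) -> Option<Vec<Vec<i64>>> {
    let rows = columns.first()?.len();
    if columns.iter().any(|c| c.len() != rows) {
        return None;
    }
    Some(
        (0..rows)
            .map(|r| columns.iter().map(|c| c[r]).collect())
            .collect(),
    )
}
```

Every output row has the same length (the number of input columns), which is what makes the result a fixed-size list per row.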





[jira] [Updated] (ARROW-9868) [C++] Provide utility for copying files between filesystems

2020-09-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9868:
--
Labels: filesystem pull-request-available s3  (was: filesystem s3)

> [C++] Provide utility for copying files between filesystems
> ---
>
> Key: ARROW-9868
> URL: https://issues.apache.org/jira/browse/ARROW-9868
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Ben Kietzman
>Priority: Major
>  Labels: filesystem, pull-request-available, s3
> Fix For: 2.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{CopyStream}} in arrow/filesystem/util_internal.h does this, but we should 
> expose it, multithread it (can read in one thread while the other thread 
> writes), and further see if there are filesystem-specific optimizations (e.g. 
> S3 multipart uploading/downloading). We may also want a version that takes a 
> FileSelector or vector of paths and parallelizes the operations on them.
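
The "read in one thread while the other thread writes" idea can be sketched with a bounded channel between a reader thread and the writing caller. This is a Rust illustration only, not Arrow's C++ `CopyStream`: the `copy_stream` signature, the queue depth of four chunks and the error handling are all assumptions.

```rust
use std::io::{self, Read, Write};
use std::sync::mpsc;
use std::thread;

/// Copies `src` to `dst`, reading on a background thread while the calling
/// thread writes. Returns the number of bytes copied.
pub fn copy_stream<R, W>(mut src: R, mut dst: W, chunk_size: usize) -> io::Result<u64>
where
    R: Read + Send + 'static,
    W: Write,
{
    assert!(chunk_size > 0, "chunk_size must be positive");
    // Bounded queue: the reader can run at most four chunks ahead.
    let (tx, rx) = mpsc::sync_channel::<io::Result<Vec<u8>>>(4);

    let reader = thread::spawn(move || loop {
        let mut buf = vec![0u8; chunk_size];
        match src.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => {
                buf.truncate(n);
                if tx.send(Ok(buf)).is_err() {
                    break; // the writer gave up
                }
            }
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => {
                let _ = tx.send(Err(e));
                break;
            }
        }
    });

    let mut total = 0u64;
    for chunk in rx {
        let chunk = chunk?;
        dst.write_all(&chunk)?;
        total += chunk.len() as u64;
    }
    reader.join().expect("reader thread panicked");
    dst.flush()?;
    Ok(total)
}
```

The bounded channel provides back-pressure, so a fast source cannot buffer unbounded data ahead of a slow destination.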





[jira] [Resolved] (ARROW-9891) [Rust] [DataFusion] Make math functions support f32

2020-09-02 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-9891.
---
Fix Version/s: 2.0.0
   Resolution: Fixed

Issue resolved by pull request 8089
[https://github.com/apache/arrow/pull/8089]

> [Rust] [DataFusion] Make math functions support f32
> ---
>
> Key: ARROW-9891
> URL: https://issues.apache.org/jira/browse/ARROW-9891
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Jorge
>Assignee: Jorge
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Given a math function `g`, we compute g(f32) using g(cast(f32 AS f64)).
> The goal of this issue is to make the operation be cast(g(f32) AS f64) 
> instead.
> Since computations on f32 are faster than on f64, this is a simple 
> optimization.
>  
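
The two evaluation orders described above can be compared directly. The sketch uses `sqrt` as an example of `g`, and the function names are hypothetical. Note that the two orders need not agree bit-for-bit, since the f32 computation rounds to single precision before widening.

```rust
/// Current behaviour as described in the issue: widen first, compute in f64.
pub fn sqrt_via_f64(values: &[f32]) -> Vec<f64> {
    values.iter().map(|&v| f64::from(v).sqrt()).collect()
}

/// Proposed behaviour: compute in f32, then widen the result.
pub fn sqrt_native_f32(values: &[f32]) -> Vec<f64> {
    values.iter().map(|&v| f64::from(v.sqrt())).collect()
}
```

For inputs whose square roots are exactly representable in f32 both functions agree; for other inputs they differ by at most the single-precision rounding error.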





[jira] [Resolved] (ARROW-9845) [Rust] [Parquet] serde_json is only used in tests but isn't in dev-dependencies

2020-09-02 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-9845.
---
Fix Version/s: 2.0.0
   Resolution: Fixed

Issue resolved by pull request 8087
[https://github.com/apache/arrow/pull/8087]

> [Rust] [Parquet] serde_json is only used in tests but isn't in 
> dev-dependencies
> ---
>
> Key: ARROW-9845
> URL: https://issues.apache.org/jira/browse/ARROW-9845
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 1.0.0
>Reporter: Benjamin Kimock
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.0.0, 1.0.1
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This is resolved by moving the dependency out of dependencies and into 
> dev-dependencies.





[jira] [Resolved] (ARROW-9873) [C++][Compute] Improve mode kernel for integers within limited value range

2020-09-02 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-9873.
---
Fix Version/s: 2.0.0
   Resolution: Fixed

Issue resolved by pull request 8091
[https://github.com/apache/arrow/pull/8091]

> [C++][Compute] Improve mode kernel for integers within limited value range
> ---
>
> Key: ARROW-9873
> URL: https://issues.apache.org/jira/browse/ARROW-9873
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Yibo Cai
>Assignee: Yibo Cai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
> Attachments: mode-range-skylake.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> It's possible to improve mode kernel performance for integers within a limited
> value range by using a value-indexed array instead of a general hash table.
> A similar trick is used in the sorting kernel (ARROW-1571).
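
A small sketch of the value-indexed counting idea, written in Rust for illustration rather than taken from the C++ kernel. The function name, the explicit `min`/`max` arguments and the tie-breaking rule (smallest value wins) are assumptions.

```rust
/// Finds the most frequent value among integers known to lie in
/// `min..=max`, using a counts array indexed by `value - min` instead of a
/// hash table. Returns `(mode, count)`, or `None` for empty input or a value
/// outside the declared range. On ties the smallest value wins.
pub fn mode_in_range(values: &[i32], min: i32, max: i32) -> Option<(i32, usize)> {
    if values.is_empty() || min > max {
        return None;
    }
    let span = (i64::from(max) - i64::from(min) + 1) as usize;
    let mut counts = vec![0usize; span];
    for &v in values {
        if v < min || v > max {
            return None;
        }
        counts[(i64::from(v) - i64::from(min)) as usize] += 1;
    }
    let mut best = 0;
    for (i, &c) in counts.iter().enumerate() {
        if c > counts[best] {
            best = i;
        }
    }
    Some(((i64::from(min) + best as i64) as i32, counts[best]))
}
```

The counts array costs memory proportional to the value range, so the approach only pays off when that range is small relative to the input.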





[jira] [Updated] (ARROW-9901) [C++] Add hand-crafted Parquet to Arrow reconstruction test for nested reading

2020-09-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9901:
--
Labels: pull-request-available  (was: )

> [C++] Add hand-crafted Parquet to Arrow reconstruction test for nested reading
> --
>
> Key: ARROW-9901
> URL: https://issues.apache.org/jira/browse/ARROW-9901
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should write tests where definition and repetition levels are explicitly 
> written out for a particular Parquet schema, then read as an Arrow column.
> Sketch here:
> https://gist.github.com/pitrou/282dd790cac0eb2c1b59e8c9ab1941d8





[jira] [Assigned] (ARROW-9901) [C++] Add hand-crafted Parquet to Arrow reconstruction test for nested reading

2020-09-02 Thread Apache Arrow JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Arrow JIRA Bot reassigned ARROW-9901:


Assignee: Antoine Pitrou  (was: Apache Arrow JIRA Bot)

> [C++] Add hand-crafted Parquet to Arrow reconstruction test for nested reading
> --
>
> Key: ARROW-9901
> URL: https://issues.apache.org/jira/browse/ARROW-9901
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should write tests where definition and repetition levels are explicitly 
> written out for a particular Parquet schema, then read as an Arrow column.
> Sketch here:
> https://gist.github.com/pitrou/282dd790cac0eb2c1b59e8c9ab1941d8





[jira] [Assigned] (ARROW-9901) [C++] Add hand-crafted Parquet to Arrow reconstruction test for nested reading

2020-09-02 Thread Apache Arrow JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Arrow JIRA Bot reassigned ARROW-9901:


Assignee: Apache Arrow JIRA Bot  (was: Antoine Pitrou)

> [C++] Add hand-crafted Parquet to Arrow reconstruction test for nested reading
> --
>
> Key: ARROW-9901
> URL: https://issues.apache.org/jira/browse/ARROW-9901
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Apache Arrow JIRA Bot
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should write tests where definition and repetition levels are explicitly 
> written out for a particular Parquet schema, then read as an Arrow column.
> Sketch here:
> https://gist.github.com/pitrou/282dd790cac0eb2c1b59e8c9ab1941d8





[jira] [Created] (ARROW-9901) [C++] Add hand-crafted Parquet to Arrow reconstruction test for nested reading

2020-09-02 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-9901:
-

 Summary: [C++] Add hand-crafted Parquet to Arrow reconstruction 
test for nested reading
 Key: ARROW-9901
 URL: https://issues.apache.org/jira/browse/ARROW-9901
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


We should write tests where definition and repetition levels are explicitly 
written out for a particular Parquet schema, then read as an Arrow column.

Sketch here:
https://gist.github.com/pitrou/282dd790cac0eb2c1b59e8c9ab1941d8
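
To make the idea of hand-written levels concrete, here is a small Rust sketch that is independent of the gist above and of the Arrow C++ reader. It assumes one specific schema, an optional list of required int32 elements (maximum definition level 2, maximum repetition level 1). The function name is hypothetical.

```rust
/// Rebuilds an optional list-of-int32 column from explicit levels.
///   def 0 = null list, def 1 = empty list, def 2 = an element is present
///   rep 0 = start of a new row, rep 1 = continues the current list
/// Returns `None` if the levels and values are inconsistent.
pub fn reconstruct_lists(def: &[u8], rep: &[u8], values: &[i32]) -> Option<Vec<Option<Vec<i32>>>> {
    if def.len() != rep.len() {
        return None;
    }
    let mut rows: Vec<Option<Vec<i32>>> = Vec::new();
    let mut vals = values.iter();
    for (&d, &r) in def.iter().zip(rep) {
        match (r, d) {
            (0, 0) => rows.push(None),
            (0, 1) => rows.push(Some(Vec::new())),
            (0, 2) => rows.push(Some(vec![*vals.next()?])),
            // A continuation is only valid inside a non-empty list.
            (1, 2) => match rows.last_mut()? {
                Some(list) if !list.is_empty() => list.push(*vals.next()?),
                _ => return None,
            },
            _ => return None,
        }
    }
    // Every value must have been consumed by some level pair.
    if vals.next().is_some() {
        return None;
    }
    Some(rows)
}
```

For example, the rows `[[1, 2], null, [], [3]]` correspond to definition levels `[2, 2, 0, 1, 2]`, repetition levels `[0, 1, 0, 0, 0]` and values `[1, 2, 3]`.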





[jira] [Updated] (ARROW-9900) [Rust][DataFusion] Use Arc<> instead of Box<> in LogicalPlan

2020-09-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9900:
--
Labels: pull-request-available  (was: )

> [Rust][DataFusion] Use Arc<> instead of Box<> in LogicalPlan
> 
>
> Key: ARROW-9900
> URL: https://issues.apache.org/jira/browse/ARROW-9900
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The idea is to continue to simplify the code and improve performance: the 
> inputs to nodes are often copied and using Box requires unnecessary deep 
> copies





[jira] [Created] (ARROW-9900) [Rust][DataFusion] Use Arc<> instead of Box<> in LogicalPlan

2020-09-02 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-9900:
--

 Summary: [Rust][DataFusion] Use Arc<> instead of Box<> in 
LogicalPlan
 Key: ARROW-9900
 URL: https://issues.apache.org/jira/browse/ARROW-9900
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb
Assignee: Andrew Lamb


The idea is to continue to simplify the code and improve performance: the 
inputs to nodes are often copied, and cloning a Box requires an unnecessary 
deep copy.





[jira] [Updated] (ARROW-9899) [Rust] [DataFusion] Switch from Box<Schema> --> SchemaRef (Arc<Schema>) to be consistent with the rest of Arrow

2020-09-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9899:
--
Labels: pull-request-available  (was: )

> [Rust] [DataFusion] Switch from Box<Schema> --> SchemaRef (Arc<Schema>) to be 
> consistent with the rest of Arrow
> ---
>
> Key: ARROW-9899
> URL: https://issues.apache.org/jira/browse/ARROW-9899
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Andrew Lamb
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The idea is to use SchemaRef (which is an Arc<Schema>) instead of 
> Box<Schema> inside DataFusion to be consistent with the rest of the Arrow 
> implementation, avoid so many copies, and make the code simpler.





[jira] [Created] (ARROW-9899) [Rust] [DataFusion] Switch from Box<Schema> --> SchemaRef (Arc<Schema>) to be consistent with the rest of Arrow

2020-09-02 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-9899:
--

 Summary: [Rust] [DataFusion] Switch from Box<Schema> --> SchemaRef 
(Arc<Schema>) to be consistent with the rest of Arrow
 Key: ARROW-9899
 URL: https://issues.apache.org/jira/browse/ARROW-9899
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb


The idea is to use SchemaRef (which is an Arc<Schema>) instead of Box<Schema> 
inside DataFusion to be consistent with the rest of the Arrow implementation, 
avoid so many copies, and make the code simpler.







[jira] [Resolved] (ARROW-9605) [C++] Optimize performance for aggregate min/max compute kernels

2020-09-02 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-9605.
---
Fix Version/s: 2.0.0
   Resolution: Fixed

Issue resolved by pull request 7871
[https://github.com/apache/arrow/pull/7871]

> [C++] Optimize performance for aggregate min/max compute kernels
> 
>
> Key: ARROW-9605
> URL: https://issues.apache.org/jira/browse/ARROW-9605
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Frank Du
>Assignee: Frank Du
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> # Use BitBlockCounter to speed up the typical case of data with about 0.01% 
> nulls.
> # Enable compiler AVX auto-vectorization for the no-nulls case on integer 
> types. Float/Double use fmin/fmax to handle NaN, which the compiler can't 
> vectorize.
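
The two points above can be sketched in plain Python. This is only an illustration of the idea, not Arrow's C++ kernel: the block size, the function names, and the list-of-booleans validity mask are all assumptions made for the example. Counting the valid entries per block lets a fully valid block take a loop with no per-element null check, and an fmin-style helper ignores a NaN operand instead of propagating it.

```python
import math

BLOCK = 64  # values examined per block; an arbitrary choice for this sketch


def fmin(a, b):
    """Like C's fmin: if exactly one operand is NaN, return the other one."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a < b else b


def block_min(values, valid):
    """Minimum of the valid, non-NaN entries; NaN if there are none."""
    result = math.nan
    for start in range(0, len(values), BLOCK):
        block_values = values[start:start + BLOCK]
        block_valid = valid[start:start + BLOCK]
        set_count = sum(block_valid)
        if set_count == 0:
            continue  # every entry in the block is null: skip it
        if set_count == len(block_valid):
            # Fast path: no nulls in this block, so no per-element check.
            for v in block_values:
                result = fmin(result, v)
        else:
            # Slow path: consult the validity mask for each entry.
            for v, ok in zip(block_values, block_valid):
                if ok:
                    result = fmin(result, v)
    return result


# The last entry is null, so its value (1.0) must not win.
print(block_min([5.0, math.nan, 2.0, 1.0], [True, True, True, False]))  # 2.0
```

When nulls are as rare as described, almost every block takes the fast path, which is where the speedup would come from.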





[jira] [Resolved] (ARROW-9794) [C++] Add functionality to cpu_info to discriminate between Intel vs AMD x86

2020-09-02 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-9794.
---
Fix Version/s: 2.0.0
   Resolution: Fixed

Issue resolved by pull request 8093
[https://github.com/apache/arrow/pull/8093]

> [C++] Add functionality to cpu_info to discriminate between Intel vs AMD x86
> 
>
> Key: ARROW-9794
> URL: https://issues.apache.org/jira/browse/ARROW-9794
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Micah Kornfield
>Assignee: Frank Du
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This is needed to do runtime dispatches for places where pext/pdep can be 
> used.  These perform poorly on AMD.





[jira] [Assigned] (ARROW-9898) [C++][Gandiva] Error handling in castINT fails in some environments

2020-09-02 Thread Apache Arrow JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Arrow JIRA Bot reassigned ARROW-9898:


Assignee: Projjal Chanda  (was: Apache Arrow JIRA Bot)

> [C++][Gandiva] Error handling in castINT fails in some environments
> --
>
> Key: ARROW-9898
> URL: https://issues.apache.org/jira/browse/ARROW-9898
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Projjal Chanda
>Assignee: Projjal Chanda
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In some environments, the error path in castINT leads to a segfault.





[jira] [Assigned] (ARROW-9898) [C++][Gandiva] Error handling in castINT fails in some environments

2020-09-02 Thread Apache Arrow JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Arrow JIRA Bot reassigned ARROW-9898:


Assignee: Apache Arrow JIRA Bot  (was: Projjal Chanda)

> [C++][Gandiva] Error handling in castINT fails in some environments
> --
>
> Key: ARROW-9898
> URL: https://issues.apache.org/jira/browse/ARROW-9898
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Projjal Chanda
>Assignee: Apache Arrow JIRA Bot
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In some environments, the error path in castINT leads to a segfault.





[jira] [Updated] (ARROW-9898) [C++][Gandiva] Error handling in castINT fails in some environments

2020-09-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9898:
--
Labels: pull-request-available  (was: )

> [C++][Gandiva] Error handling in castINT fails in some environments
> --
>
> Key: ARROW-9898
> URL: https://issues.apache.org/jira/browse/ARROW-9898
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Projjal Chanda
>Assignee: Projjal Chanda
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In some environments, the error path in castINT leads to a segfault.





[jira] [Created] (ARROW-9898) [C++][Gandiva] Error handling in castINT fails in some environments

2020-09-02 Thread Projjal Chanda (Jira)
Projjal Chanda created ARROW-9898:
-

 Summary: [C++][Gandiva] Error handling in castINT fails in some 
environments
 Key: ARROW-9898
 URL: https://issues.apache.org/jira/browse/ARROW-9898
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Projjal Chanda
Assignee: Projjal Chanda


In some environments, the error path in castINT leads to a segfault.





[jira] [Updated] (ARROW-9897) [C++][Gandiva] Add to_date() function from pattern

2020-09-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9897:
--
Labels: pull-request-available  (was: )

> [C++][Gandiva] Add to_date() function from pattern
> --
>
> Key: ARROW-9897
> URL: https://issues.apache.org/jira/browse/ARROW-9897
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Projjal Chanda
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Signature: date64 to_date(utf8, utf8)





[jira] [Resolved] (ARROW-7226) [JSON][Python] Json loader fails on example in documentation.

2020-09-02 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche resolved ARROW-7226.
--
Fix Version/s: 2.0.0
   Resolution: Fixed

Issue resolved by pull request 8055
[https://github.com/apache/arrow/pull/8055]

> [JSON][Python] Json loader fails on example in documentation.
> -
>
> Key: ARROW-7226
> URL: https://issues.apache.org/jira/browse/ARROW-7226
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Rinke Hoekstra
>Assignee: Andrew Wieteska
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I was just trying this with the example found in the pyarrow docs at 
> [http://arrow.apache.org/docs/python/json.html]
> The documented example does not work. Is this related to this issue, or is it 
> another matter?
> It says to load the following JSON file:
> {"a": [1, 2], "b": {"c": true, "d": "1991-02-03"}}
> {"a": [3, 4, 5], "b": {"c": false, "d": "2019-04-01"}}
> I fixed this to make it valid JSON (it is valid JSON Lines, 
> http://jsonlines.org/, but that's another issue):
> [{"a": [1, 2], "b": {"c": true, "d": "1991-02-03"}},
> {"a": [3, 4, 5], "b": {"c": false, "d": "2019-04-01"}}]
> Then reading the JSON from a file called `my_data.json`:
> from pyarrow import json
> table = json.read_json("my_data.json")
> Gives the following error:
> {code:java}
> ---------------------------------------------------------------------------
> ArrowInvalid                              Traceback (most recent call last)
>  in ()
>       1 from pyarrow import json
> ----> 2 table = json.read_json('test.json')
> ~/.local/share/virtualenvs/parquet-ifRxINoC/lib/python3.7/site-packages/pyarrow/_json.pyx
>  in pyarrow._json.read_json()
> ~/.local/share/virtualenvs/parquet-ifRxINoC/lib/python3.7/site-packages/pyarrow/error.pxi
>  in pyarrow.lib.check_status()
> ArrowInvalid: JSON parse error: A column changed from object to array
> {code}
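
The distinction the reporter draws between JSON and JSON Lines can be shown with the standard library alone. This sketch does not use pyarrow and does not try to reproduce the ArrowInvalid error; the helper names are made up for the example. It only shows that the documented file is a sequence of independent JSON objects, one per line, rather than a single JSON document.

```python
import json

# The two-line example from the pyarrow docs, as quoted in the report.
JSON_LINES = (
    '{"a": [1, 2], "b": {"c": true, "d": "1991-02-03"}}\n'
    '{"a": [3, 4, 5], "b": {"c": false, "d": "2019-04-01"}}\n'
)


def parse_json_lines(text):
    """Parse line-delimited JSON: each non-empty line is its own document."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def is_single_json_document(text):
    """True if the whole text parses as exactly one JSON value."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


rows = parse_json_lines(JSON_LINES)
as_array = json.dumps(rows)  # the "fixed" form: one top-level JSON array

print(len(rows))                            # 2
print(is_single_json_document(JSON_LINES))  # False
print(is_single_json_document(as_array))    # True
```

A reader that expects one object per line treats these two inputs very differently, so wrapping the rows in an array changes what a line-delimited parser sees.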





[jira] [Created] (ARROW-9897) [C++][Gandiva] Add to_date() function from pattern

2020-09-02 Thread Projjal Chanda (Jira)
Projjal Chanda created ARROW-9897:
-

 Summary: [C++][Gandiva] Add to_date() function from pattern
 Key: ARROW-9897
 URL: https://issues.apache.org/jira/browse/ARROW-9897
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Projjal Chanda


Signature: date64 to_date(utf8, utf8)
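
As a rough sketch of what this signature implies: an Arrow date64 value is a count of milliseconds since the UNIX epoch, so the function takes a string and a pattern and returns an integer. The pattern syntax Gandiva accepts is not stated here, so the example below uses Python strptime directives as a stand-in; that choice, and the truncation to midnight UTC, are assumptions made for the illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_date(text, pattern):
    """Parse `text` with `pattern` and return milliseconds since the epoch.

    `pattern` uses strptime directives here (an assumption, not Gandiva's
    pattern language). Any time-of-day component is dropped.
    """
    parsed = datetime.strptime(text, pattern).replace(tzinfo=timezone.utc)
    midnight = parsed.replace(hour=0, minute=0, second=0, microsecond=0)
    # Integer division by a timedelta avoids float rounding.
    return (midnight - EPOCH) // timedelta(milliseconds=1)


print(to_date("1970-01-02", "%Y-%m-%d"))  # 86400000
```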





[jira] [Updated] (ARROW-9858) [C++][Python][Docs] Expand user guide for FileSystem

2020-09-02 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche updated ARROW-9858:
-
Summary: [C++][Python][Docs] Expand user guide for FileSystem  (was: 
[C++][Python][Docs] User guide for S3FileSystem)

> [C++][Python][Docs] Expand user guide for FileSystem
> 
>
> Key: ARROW-9858
> URL: https://issues.apache.org/jira/browse/ARROW-9858
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Documentation, Python
>Reporter: Neal Richardson
>Assignee: Joris Van den Bossche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> https://arrow.apache.org/docs/python/filesystems.html is pretty thin
> https://arrow.apache.org/docs/python/api/filesystems.html doesn't mention S3
> and in general there are some tricks to getting FileSystemFromUri to work





[jira] [Resolved] (ARROW-9858) [C++][Python][Docs] User guide for S3FileSystem

2020-09-02 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-9858.
---
Resolution: Fixed

Issue resolved by pull request 8065
[https://github.com/apache/arrow/pull/8065]

> [C++][Python][Docs] User guide for S3FileSystem
> ---
>
> Key: ARROW-9858
> URL: https://issues.apache.org/jira/browse/ARROW-9858
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Documentation, Python
>Reporter: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> https://arrow.apache.org/docs/python/filesystems.html is pretty thin
> https://arrow.apache.org/docs/python/api/filesystems.html doesn't mention S3
> and in general there are some tricks to getting FileSystemFromUri to work





[jira] [Assigned] (ARROW-9858) [C++][Python][Docs] User guide for S3FileSystem

2020-09-02 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-9858:
-

Assignee: Joris Van den Bossche

> [C++][Python][Docs] User guide for S3FileSystem
> ---
>
> Key: ARROW-9858
> URL: https://issues.apache.org/jira/browse/ARROW-9858
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Documentation, Python
>Reporter: Neal Richardson
>Assignee: Joris Van den Bossche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> https://arrow.apache.org/docs/python/filesystems.html is pretty thin
> https://arrow.apache.org/docs/python/api/filesystems.html doesn't mention S3
> and in general there are some tricks to getting FileSystemFromUri to work





[jira] [Commented] (ARROW-9821) [Rust][DataFusion] User Defined PlanNode / Operator API

2020-09-02 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189105#comment-17189105
 ] 

Andrew Lamb commented on ARROW-9821:


[~emkornfi...@gmail.com] no problem -- I will do so

> [Rust][DataFusion] User Defined PlanNode / Operator API
> ---
>
> Key: ARROW-9821
> URL: https://issues.apache.org/jira/browse/ARROW-9821
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust, Rust - DataFusion
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> The basic goal is to allow users to implement their own PlanNodes. I will 
> provide a Google doc open for comments shortly.
> Proposal: 
> https://docs.google.com/document/d/1IHCGkCuUvnE9BavkykPULn6Ugxgqc1JShT4nz1vMi7g/edit#
> See also mailing list discussion here: 
> https://lists.apache.org/thread.html/rf8ae7d1147e93e3f6172bc2e4fa50a38abcb35f046cc5830e09da6cc%40%3Cdev.arrow.apache.org%3E





[jira] [Commented] (ARROW-9896) [Go] Rename ipc tool from 'arrow-json-intergration-test' to 'arrow-json'

2020-09-02 Thread FredGan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189099#comment-17189099
 ] 

FredGan commented on ARROW-9896:


[~sbinet]

Hi, I tried to solve it but failed. It seems the name 
"arrow-json-intergration-test" is used in many languages, so I gave up.

> [Go] Rename ipc tool from 'arrow-json-intergration-test' to 'arrow-json'
> 
>
> Key: ARROW-9896
> URL: https://issues.apache.org/jira/browse/ARROW-9896
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Affects Versions: 1.0.0
>Reporter: FredGan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> There are five tools in the go/arrow/ipc/cmd directory. Four of them have 
> directory names that match the tool name used in the code, but 
> 'arrow-json-intergration-test' does not: the code names it "arrow-json". 
> It might be better to rename this directory to 'arrow-json'.





[jira] [Updated] (ARROW-9896) [Go] Rename ipc tool from 'arrow-json-intergration-test' to 'arrow-json'

2020-09-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9896:
--
Labels: pull-request-available  (was: )

> [Go] Rename ipc tool from 'arrow-json-intergration-test' to 'arrow-json'
> 
>
> Key: ARROW-9896
> URL: https://issues.apache.org/jira/browse/ARROW-9896
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Affects Versions: 1.0.0
>Reporter: FredGan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are five tools in the go/arrow/ipc/cmd directory. Four of them have 
> directory names that match the tool name used in the code, but 
> 'arrow-json-intergration-test' does not: the code names it "arrow-json". 
> It might be better to rename this directory to 'arrow-json'.





[jira] [Created] (ARROW-9896) [Go] Rename ipc tool from 'arrow-json-intergration-test' to 'arrow-json'

2020-09-02 Thread FredGan (Jira)
FredGan created ARROW-9896:
--

 Summary: [Go] Rename ipc tool from 'arrow-json-intergration-test' 
to 'arrow-json'
 Key: ARROW-9896
 URL: https://issues.apache.org/jira/browse/ARROW-9896
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Affects Versions: 1.0.0
Reporter: FredGan


There are five tools in the go/arrow/ipc/cmd directory. Four of them have 
directory names that match the tool name used in the code, but 
'arrow-json-intergration-test' does not: the code names it "arrow-json". 
It might be better to rename this directory to 'arrow-json'.


