[jira] [Updated] (ARROW-4748) [Rust] [DataFusion] GROUP BY performance could be optimized

2019-10-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4748:
--
Labels: pull-request-available  (was: )

> [Rust] [DataFusion] GROUP BY performance could be optimized
> ---
>
> Key: ARROW-4748
> URL: https://issues.apache.org/jira/browse/ARROW-4748
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Affects Versions: 0.12.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
>
> The logic to build the group by keys is row-based, performing an array 
> downcast on every single group by value. This could be done in a columnar way 
> instead.
>  
> I also wonder if it is possible to avoid converting the result map to an 
> array of map entries.
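A minimal, language-neutral sketch of the difference between the two approaches, written in Python with only the standard library. The Rust/Arrow types are not modeled: `Column`, `_downcast`, and the `DOWNCASTS` counter are hypothetical stand-ins for an Arrow array and its per-call type downcast. The row-based version resolves the column type once per value, while the columnar version resolves it once per column and then zips the typed columns into keys:

```python
from typing import List, Tuple

# Hypothetical stand-in for an Arrow array: a type tag plus its values.
Column = Tuple[str, list]

# Counts simulated downcasts so the two strategies can be compared.
DOWNCASTS = {"count": 0}


def _downcast(column: Column) -> list:
    """Simulate resolving a column to its concrete type (the per-call cost)."""
    DOWNCASTS["count"] += 1
    type_tag, values = column
    if type_tag not in ("int64", "utf8"):
        raise TypeError(f"unsupported group-by type: {type_tag}")
    return values


def keys_row_based(columns: List[Column], num_rows: int) -> List[tuple]:
    """Row-based: one downcast per (row, column) pair."""
    return [tuple(_downcast(col)[row] for col in columns) for row in range(num_rows)]


def keys_columnar(columns: List[Column]) -> List[tuple]:
    """Columnar: one downcast per column, then zip the typed columns into keys."""
    typed = [_downcast(col) for col in columns]
    return list(zip(*typed))
```

For `[("int64", [1, 2, 1]), ("utf8", ["a", "b", "a"])]` both functions return `[(1, 'a'), (2, 'b'), (1, 'a')]`, but the row-based one performs six downcasts and the columnar one only two.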



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6864) [C++] bz2 / zstd tests not enabled

2019-10-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6864:
--
Labels: pull-request-available  (was: )

> [C++] bz2 / zstd tests not enabled
> --
>
> Key: ARROW-6864
> URL: https://issues.apache.org/jira/browse/ARROW-6864
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.15.0
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> When passing {{-DARROW_WITH_ZSTD=on}} and {{-DARROW_WITH_BZ2=on}}, the 
> relevant tests in {{arrow-compression-test}} and {{arrow-io-compressed-test}} 
> are still not enabled.





[jira] [Assigned] (ARROW-4748) [Rust] [DataFusion] GROUP BY performance could be optimized

2019-10-12 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove reassigned ARROW-4748:
-

Assignee: Andy Grove

> [Rust] [DataFusion] GROUP BY performance could be optimized
> ---
>
> Key: ARROW-4748
> URL: https://issues.apache.org/jira/browse/ARROW-4748
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Affects Versions: 0.12.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>
> The logic to build the group by keys is row-based, performing an array 
> downcast on every single group by value. This could be done in a columnar way 
> instead.
>  
> I also wonder if it is possible to avoid converting the result map to an 
> array of map entries.





[jira] [Updated] (ARROW-6864) [C++] bz2 / zstd tests not enabled

2019-10-12 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-6864:

Fix Version/s: 1.0.0

> [C++] bz2 / zstd tests not enabled
> --
>
> Key: ARROW-6864
> URL: https://issues.apache.org/jira/browse/ARROW-6864
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.15.0
>Reporter: Antoine Pitrou
>Priority: Major
> Fix For: 1.0.0
>
>
> When passing {{-DARROW_WITH_ZSTD=on}} and {{-DARROW_WITH_BZ2=on}}, the 
> relevant tests in {{arrow-compression-test}} and {{arrow-io-compressed-test}} 
> are still not enabled.





[jira] [Commented] (ARROW-6864) [C++] bz2 / zstd tests not enabled

2019-10-12 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950121#comment-16950121
 ] 

Wes McKinney commented on ARROW-6864:
-

Probably caused by my change to the flags. I'll take a look.

> [C++] bz2 / zstd tests not enabled
> --
>
> Key: ARROW-6864
> URL: https://issues.apache.org/jira/browse/ARROW-6864
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.15.0
>Reporter: Antoine Pitrou
>Priority: Major
> Fix For: 1.0.0
>
>
> When passing {{-DARROW_WITH_ZSTD=on}} and {{-DARROW_WITH_BZ2=on}}, the 
> relevant tests in {{arrow-compression-test}} and {{arrow-io-compressed-test}} 
> are still not enabled.





[jira] [Updated] (ARROW-6806) [C++] Segfault deserializing ListArray containing null/empty list

2019-10-12 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-6806:

Summary: [C++] Segfault deserializing ListArray containing null/empty list  
(was: Segfault deserializing ListArray containing null/empty list)

> [C++] Segfault deserializing ListArray containing null/empty list
> -
>
> Key: ARROW-6806
> URL: https://issues.apache.org/jira/browse/ARROW-6806
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.15.0
>Reporter: Max Bolingbroke
>Assignee: Antoine Pitrou
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.0, 0.15.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The following code segfaults for me (Windows and Linux, pyarrow 0.15):
>  
> {code:python}
> import pyarrow as pa
> from io import BytesIO
> x = 
> b'\xdc\x00\x00\x00\x10\x00\x00\x00\x0c\x00\x0e\x00\x06\x00\r\x00\x08\x00\x00\x00\x0c\x00\x00\x00\x00\x00\x03\x00\x10\x00\x00\x00\x00\x01\n\x00\x0c\x00\x00\x00\x08\x00\x04\x00\n\x00\x00\x00\x08\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x18\x00\x00\x00\x00\x00\x12\x00\x18\x00\x14\x00\x13\x00\x12\x00\x0c\x00\x00\x00\x08\x00\x04\x00\x12\x00\x00\x00\x14\x00\x00\x00\x14\x00\x00\x00`\x00\x00\x00\x00\x00\x0c\x01\\\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x18\x00\x00\x00\x00\x00\x12\x00\x18\x00\x14\x00\x00\x00\x13\x00\x0c\x00\x00\x00\x08\x00\x04\x00\x12\x00\x00\x00\x14\x00\x00\x00\x14\x00\x00\x00\x14\x00\x00\x00\x00\x00\x00\x05\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf0\xff\xff\xff\x06\x00\x00\x00$data$\x00\x00\x04\x00\x04\x00\x04\x00\x00\x00\x10\x00\x00\x00exchangeCodeList\x00\x00\x00\x00\xcc\x00\x00\x00\x14\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x16\x00\x0e\x00\x15\x00\x10\x00\x04\x00\x0c\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00\x10\x00\x00\x00\x00\x03\n\x00\x18\x00\x0c\x00\x08\x00\x04\x00\n\x00\x00\x00\x14\x00\x00\x00h\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
> r = pa.RecordBatchStreamReader(BytesIO(x))
> r.read_all()
> {code}
> I *think* what should happen instead is that I should get a Table with a 
> single column named "exchangeCodeList", where the column is a ChunkedArray 
> with a single chunk, where that chunk is a ListArray containing just a single 
> element (a null). Failing that (i.e. if the bytestring is actually 
> malformed), pyarrow should maybe throw an error instead of segfaulting?
> I'm not 100% sure how the bytestring was generated: I think it comes from a 
> Java-based server. I can deserialize the server response fine if all the 
> records have at least one element in the "exchangeCodeList" column, but not 
> if at least one of them is null. I've tried to reproduce the failure by 
> generating the bytestring with pyarrow but can't trigger the segfault.
>  
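For context on the expected result, a Python sketch (standard library only; not pyarrow's implementation) of how an Arrow list column is laid out: `length + 1` offsets, a validity bitmap, and a child values array. An empty list has equal consecutive offsets, and a null list is marked by its validity bit. The hypothetical `validate_list_offsets` helper shows the kind of bounds checks that turn a malformed buffer into an error instead of an out-of-bounds read; it is simplified (for example, it always requires exactly `length + 1` offsets):

```python
from typing import List, Optional, Sequence


def validate_list_offsets(offsets: Sequence[int], length: int, values_length: int) -> None:
    """Raise ValueError if the offsets could index outside the child values."""
    if len(offsets) != length + 1:
        raise ValueError(f"expected {length + 1} offsets, got {len(offsets)}")
    if offsets[0] < 0:
        raise ValueError("first offset is negative")
    for i in range(length):
        if offsets[i + 1] < offsets[i]:
            raise ValueError(f"offsets decrease at position {i}")
    if offsets[-1] > values_length:
        raise ValueError("last offset exceeds the child values length")


def decode_list_array(
    offsets: Sequence[int], validity: Sequence[bool], values: list
) -> List[Optional[list]]:
    """Turn list buffers into Python lists, using None for null entries."""
    validate_list_offsets(offsets, len(validity), len(values))
    return [
        values[offsets[i]:offsets[i + 1]] if validity[i] else None
        for i in range(len(validity))
    ]
```

A single null element, as the reporter expects, corresponds to `decode_list_array([0, 0], [False], [])`, which returns `[None]`; offsets `[0, 5]` over an empty child array raise `ValueError` rather than reading past the buffer.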





[jira] [Resolved] (ARROW-6860) [Python] Only link libarrow_flight.so to pyarrow._flight

2019-10-12 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-6860.
-
Resolution: Fixed

Issue resolved by pull request 5627
[https://github.com/apache/arrow/pull/5627]

> [Python] Only link libarrow_flight.so to pyarrow._flight
> 
>
> Key: ARROW-6860
> URL: https://issues.apache.org/jira/browse/ARROW-6860
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0, 0.15.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> See BEAM-8368. We need to find a strategy to mitigate protobuf static linking
> issues with the Beam community.





[jira] [Updated] (ARROW-6867) [FlightRPC][Java] Flight server can hang JVM on shutdown

2019-10-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6867:
--
Labels: pull-request-available  (was: )

> [FlightRPC][Java] Flight server can hang JVM on shutdown
> 
>
> Key: ARROW-6867
> URL: https://issues.apache.org/jira/browse/ARROW-6867
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Java
>Affects Versions: 0.15.0
>Reporter: David Li
>Assignee: David Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> I noticed this while working on Flight integration tests. FlightService keeps 
> an executor, which can hang the JVM on shutdown if the executor itself is not 
> shut down.
> It's used by Handshake and DoPut.
> I think this surfaced because I wrote an AuthHandler that threw an exception.





[jira] [Resolved] (ARROW-5680) [Rust] datafusion group-by tests depends on result set order

2019-10-12 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-5680.
---
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5622
[https://github.com/apache/arrow/pull/5622]

> [Rust] datafusion group-by tests depends on result set order
> 
>
> Key: ARROW-5680
> URL: https://issues.apache.org/jira/browse/ARROW-5680
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Francois Saint-Jacques
>Assignee: Andy Grove
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> See 
> https://circleci.com/gh/ursa-labs/crossbow/223?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link
> Once I properly export ARROW_TEST_DATA and PARQUET_TEST_DATA, I get further
> failures, e.g.:
> {code:bash}
> running 18 tests
> test csv_query_group_by_int_min_max ... FAILED
> test csv_query_external_table_count ... ok
> test csv_query_count ... ok
> test csv_count_star ... ok
> test csv_query_avg ... ok
> test csv_query_avg_multi_batch ... ok
> test csv_query_cast ... ok
> test csv_query_group_by_avg ... FAILED
> test csv_query_group_by_string_min_max ... FAILED
> test csv_query_group_by_int_count ... FAILED
> test csv_query_limit ... ok
> test csv_query_limit_bigger_than_nbr_of_rows ... ok
> test csv_query_limit_with_same_nbr_of_rows ... ok
> test csv_query_cast_literal ... ok
> test csv_query_limit_zero ... ok
> test csv_query_create_external_table ... ok
> test csv_query_with_predicate ... ok
> test parquet_query ... ok
> failures:
>  csv_query_group_by_int_min_max stdout 
> thread 'csv_query_group_by_int_min_max' panicked at 'assertion failed: `(left 
> == right)`
>   left: 
> `"4\t0.02182578039211991\t0.9237877978193884\n5\t0.0147930530301\t0.9723580396501548\n2\t0.16301110515739792\t0.991517828651004\n3\t0.047343434291126085\t0.9293883502480845\n1\t0.05636955101974106\t0.9965400387585364\n"`,
>  right: 
> `"4\t0.02182578039211991\t0.9237877978193884\n2\t0.16301110515739792\t0.991517828651004\n5\t0.0147930530301\t0.9723580396501548\n3\t0.047343434291126085\t0.9293883502480845\n1\t0.05636955101974106\t0.9965400387585364\n"`',
>  datafusion/tests/sql.rs:77:5
> note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
>  csv_query_group_by_avg stdout 
> thread 'csv_query_group_by_avg' panicked at 'assertion failed: `(left == 
> right)`
>   left: 
> `"\"a\"\t0.48754517466109415\n\"e\"\t0.48600669271341534\n\"d\"\t0.48855379387549824\n\"c\"\t0.6600456536439784\n\"b\"\t0.41040709263815384\n"`,
>  right: 
> `"\"d\"\t0.48855379387549824\n\"c\"\t0.6600456536439784\n\"b\"\t0.41040709263815384\n\"a\"\t0.48754517466109415\n\"e\"\t0.48600669271341534\n"`',
>  datafusion/tests/sql.rs:99:5
>  csv_query_group_by_string_min_max stdout 
> thread 'csv_query_group_by_string_min_max' panicked at 'assertion failed: 
> `(left == right)`
>   left: 
> `"\"a\"\t0.02182578039211991\t0.9800193410444061\n\"e\"\t0.0147930530301\t0.9965400387585364\n\"d\"\t0.061029375346466685\t0.9748360509016578\n\"c\"\t0.0494924465469434\t0.991517828651004\n\"b\"\t0.04893135681998029\t0.9185813970744787\n"`,
>  right: 
> `"\"d\"\t0.061029375346466685\t0.9748360509016578\n\"c\"\t0.0494924465469434\t0.991517828651004\n\"b\"\t0.04893135681998029\t0.9185813970744787\n\"a\"\t0.02182578039211991\t0.9800193410444061\n\"e\"\t0.0147930530301\t0.9965400387585364\n"`',
>  datafusion/tests/sql.rs:187:5
>  csv_query_group_by_int_count stdout 
> thread 'csv_query_group_by_int_count' panicked at 'assertion failed: `(left 
> == right)`
>   left: `"\"a\"\t21\n\"e\"\t21\n\"d\"\t18\n\"c\"\t21\n\"b\"\t19\n"`,
>  right: `"\"d\"\t18\n\"c\"\t21\n\"b\"\t19\n\"a\"\t21\n\"e\"\t21\n"`', 
> datafusion/tests/sql.rs:175:5
> {code}
> I suspect that the tests expect the group-by results in a fixed order,
> which is highly dependent on the iteration order of the hash table. Note that
> once I did a rustup update (and docker rmi rustlangrust/nightly), the
> failures went away.
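One way to make such assertions independent of hash-table iteration order is to compare the result rows as sorted lists rather than as one concatenated string. A minimal Python sketch (the actual tests are Rust; `assert_same_rows` is a hypothetical helper), using the `csv_query_group_by_int_count` output from the log above:

```python
def assert_same_rows(actual: str, expected: str) -> None:
    """Compare newline-separated result rows while ignoring their order.

    Group-by output produced by iterating a hash table has no guaranteed
    row order, so the rows are sorted before comparing.
    """
    actual_rows = sorted(actual.splitlines())
    expected_rows = sorted(expected.splitlines())
    assert actual_rows == expected_rows, f"{actual_rows} != {expected_rows}"


# Rows from the csv_query_group_by_int_count failure above.
left = '"a"\t21\n"e"\t21\n"d"\t18\n"c"\t21\n"b"\t19\n'
right = '"d"\t18\n"c"\t21\n"b"\t19\n"a"\t21\n"e"\t21\n'
assert_same_rows(left, right)  # passes: same rows, different order
```

Sorting only hides ordering differences in the test; an alternative is to add an ORDER BY clause to the test queries so the engine itself produces a deterministic order.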





[jira] [Resolved] (ARROW-6690) [Rust] [DataFusion] HashAggregate without GROUP BY should use SIMD

2019-10-12 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-6690.
---
Resolution: Fixed

Issue resolved by pull request 5606
[https://github.com/apache/arrow/pull/5606]

> [Rust] [DataFusion] HashAggregate without GROUP BY should use SIMD
> --
>
> Key: ARROW-6690
> URL: https://issues.apache.org/jira/browse/ARROW-6690
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently the implementation of HashAggregate in the new physical plan uses 
> the same logic regardless of whether a grouping expression is used.
> For the case where there is no grouping expression, such as "SELECT SUM(a) 
> FROM b" we can use the compute kernels to perform an aggregate operation on 
> each batch rather than iterating over each row and accumulating individual 
> values.
> This optimization already exists in the original implementation of aggregate 
> queries direct from the logical plan.
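The contrast can be sketched in Python (standard library only; `sum_kernel` is a hypothetical stand-in for a vectorized, possibly SIMD, compute kernel, and batches are plain lists). The per-row version accumulates value by value, while the per-batch version calls the kernel once per batch and only combines the partial results. Both skip nulls and return None when there is no non-null input, mirroring SQL `SUM`:

```python
from typing import Iterable, List, Optional

Batch = List[Optional[int]]


def sum_row_by_row(batches: Iterable[Batch]) -> Optional[int]:
    """Accumulate one value at a time across all rows."""
    total, seen = 0, False
    for batch in batches:
        for value in batch:
            if value is not None:
                total += value
                seen = True
    return total if seen else None


def sum_kernel(batch: Batch) -> Optional[int]:
    """Stand-in for a vectorized kernel that sums one whole batch."""
    non_null = [v for v in batch if v is not None]
    return sum(non_null) if non_null else None


def sum_per_batch(batches: Iterable[Batch]) -> Optional[int]:
    """Call the kernel once per batch, then combine the partial sums."""
    partials = [p for p in map(sum_kernel, batches) if p is not None]
    return sum(partials) if partials else None
```

The per-batch shape is what allows a real kernel to process a whole column buffer at once instead of paying per-row overhead.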





[jira] [Assigned] (ARROW-5680) [Rust] datafusion group-by tests depends on result set order

2019-10-12 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove reassigned ARROW-5680:
-

Assignee: Andy Grove

> [Rust] datafusion group-by tests depends on result set order
> 
>
> Key: ARROW-5680
> URL: https://issues.apache.org/jira/browse/ARROW-5680
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Francois Saint-Jacques
>Assignee: Andy Grove
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> See 
> https://circleci.com/gh/ursa-labs/crossbow/223?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link
> Once I properly export ARROW_TEST_DATA and PARQUET_TEST_DATA, I get further
> failures, e.g.:
> {code:bash}
> running 18 tests
> test csv_query_group_by_int_min_max ... FAILED
> test csv_query_external_table_count ... ok
> test csv_query_count ... ok
> test csv_count_star ... ok
> test csv_query_avg ... ok
> test csv_query_avg_multi_batch ... ok
> test csv_query_cast ... ok
> test csv_query_group_by_avg ... FAILED
> test csv_query_group_by_string_min_max ... FAILED
> test csv_query_group_by_int_count ... FAILED
> test csv_query_limit ... ok
> test csv_query_limit_bigger_than_nbr_of_rows ... ok
> test csv_query_limit_with_same_nbr_of_rows ... ok
> test csv_query_cast_literal ... ok
> test csv_query_limit_zero ... ok
> test csv_query_create_external_table ... ok
> test csv_query_with_predicate ... ok
> test parquet_query ... ok
> failures:
>  csv_query_group_by_int_min_max stdout 
> thread 'csv_query_group_by_int_min_max' panicked at 'assertion failed: `(left 
> == right)`
>   left: 
> `"4\t0.02182578039211991\t0.9237877978193884\n5\t0.0147930530301\t0.9723580396501548\n2\t0.16301110515739792\t0.991517828651004\n3\t0.047343434291126085\t0.9293883502480845\n1\t0.05636955101974106\t0.9965400387585364\n"`,
>  right: 
> `"4\t0.02182578039211991\t0.9237877978193884\n2\t0.16301110515739792\t0.991517828651004\n5\t0.0147930530301\t0.9723580396501548\n3\t0.047343434291126085\t0.9293883502480845\n1\t0.05636955101974106\t0.9965400387585364\n"`',
>  datafusion/tests/sql.rs:77:5
> note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
>  csv_query_group_by_avg stdout 
> thread 'csv_query_group_by_avg' panicked at 'assertion failed: `(left == 
> right)`
>   left: 
> `"\"a\"\t0.48754517466109415\n\"e\"\t0.48600669271341534\n\"d\"\t0.48855379387549824\n\"c\"\t0.6600456536439784\n\"b\"\t0.41040709263815384\n"`,
>  right: 
> `"\"d\"\t0.48855379387549824\n\"c\"\t0.6600456536439784\n\"b\"\t0.41040709263815384\n\"a\"\t0.48754517466109415\n\"e\"\t0.48600669271341534\n"`',
>  datafusion/tests/sql.rs:99:5
>  csv_query_group_by_string_min_max stdout 
> thread 'csv_query_group_by_string_min_max' panicked at 'assertion failed: 
> `(left == right)`
>   left: 
> `"\"a\"\t0.02182578039211991\t0.9800193410444061\n\"e\"\t0.0147930530301\t0.9965400387585364\n\"d\"\t0.061029375346466685\t0.9748360509016578\n\"c\"\t0.0494924465469434\t0.991517828651004\n\"b\"\t0.04893135681998029\t0.9185813970744787\n"`,
>  right: 
> `"\"d\"\t0.061029375346466685\t0.9748360509016578\n\"c\"\t0.0494924465469434\t0.991517828651004\n\"b\"\t0.04893135681998029\t0.9185813970744787\n\"a\"\t0.02182578039211991\t0.9800193410444061\n\"e\"\t0.0147930530301\t0.9965400387585364\n"`',
>  datafusion/tests/sql.rs:187:5
>  csv_query_group_by_int_count stdout 
> thread 'csv_query_group_by_int_count' panicked at 'assertion failed: `(left 
> == right)`
>   left: `"\"a\"\t21\n\"e\"\t21\n\"d\"\t18\n\"c\"\t21\n\"b\"\t19\n"`,
>  right: `"\"d\"\t18\n\"c\"\t21\n\"b\"\t19\n\"a\"\t21\n\"e\"\t21\n"`', 
> datafusion/tests/sql.rs:175:5
> {code}
> I suspect that the tests expect the group-by results in a fixed order,
> which is highly dependent on the iteration order of the hash table. Note that
> once I did a rustup update (and docker rmi rustlangrust/nightly), the
> failures went away.





[jira] [Resolved] (ARROW-6859) [CI][Nightly] Disable docker layer caching for CircleCI tasks

2019-10-12 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-6859.

Resolution: Fixed

Issue resolved by pull request 5617
[https://github.com/apache/arrow/pull/5617]

> [CI][Nightly] Disable docker layer caching for CircleCI tasks
> -
>
> Key: ARROW-6859
> URL: https://issues.apache.org/jira/browse/ARROW-6859
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> CircleCI builds are failing because Docker layer caching is not available on
> free plans.





[jira] [Comment Edited] (ARROW-6867) [FlightRPC][Java] Flight server can hang JVM on shutdown

2019-10-12 Thread David Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950062#comment-16950062
 ] 

David Li edited comment on ARROW-6867 at 10/12/19 3:00 PM:
---

Aha, the real reason is

1) By default, we share an executor between gRPC and Flight.
2) gRPC doesn't take ownership of the executor, so we need to manually shut it
down on exit.

The safest thing would be to clean up the executor, and document Flight as 
owning it.
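The ownership rule proposed here can be sketched with Python's `concurrent.futures` (a language-neutral analogy, not the Java Flight API; `FlightLikeServer` is hypothetical). The server creates the executor it hands to the RPC layer and is therefore responsible for shutting it down in `close()`. In the JVM the stakes are higher: threads created by the default thread factory of `java.util.concurrent.Executors` are non-daemon, so an executor that is never shut down can keep the process alive, which matches the hang described in this issue.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class FlightLikeServer:
    """Hypothetical server that owns the executor it shares with its RPC layer."""

    def __init__(self, max_workers: int = 4) -> None:
        # The server creates the executor, so it is the one that must shut it down.
        self.executor = ThreadPoolExecutor(max_workers=max_workers)

    def handle(self, fn: Callable[..., Any], *args: Any) -> Future:
        """Run a request handler (e.g. Handshake or DoPut) on the shared executor."""
        return self.executor.submit(fn, *args)

    def close(self) -> None:
        # Without this, idle worker threads outlive the server object.
        self.executor.shutdown(wait=True)

    def __enter__(self) -> "FlightLikeServer":
        return self

    def __exit__(self, *exc_info: Any) -> None:
        self.close()
```

Using it as `with FlightLikeServer() as server: ...` guarantees the shutdown even if code inside the `with` block raises; after `close()`, further `handle()` calls raise `RuntimeError`.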


was (Author: lidavidm):
Aha, the real reason is

1) By default, we share an executor between gRPC and Flight.
2) gRPC doesn't take ownership of the executor, so we need to manually shut it 
down on exit.

The safest thing would be to use separate executors, and make sure to clean up 
both executors. (This would also avoid potential deadlocks; gRPC can't process 
client cancellations if the executor is full.)

> [FlightRPC][Java] Flight server can hang JVM on shutdown
> 
>
> Key: ARROW-6867
> URL: https://issues.apache.org/jira/browse/ARROW-6867
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Java
>Affects Versions: 0.15.0
>Reporter: David Li
>Assignee: David Li
>Priority: Major
> Fix For: 1.0.0
>
>
> I noticed this while working on Flight integration tests. FlightService keeps 
> an executor, which can hang the JVM on shutdown if the executor itself is not 
> shut down.
> It's used by Handshake and DoPut.
> I think this surfaced because I wrote an AuthHandler that threw an exception.





[jira] [Commented] (ARROW-6867) [FlightRPC][Java] Flight server can hang JVM on shutdown

2019-10-12 Thread David Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950062#comment-16950062
 ] 

David Li commented on ARROW-6867:
-

Aha, the real reason is

1) By default, we share an executor between gRPC and Flight.
2) gRPC doesn't take ownership of the executor, so we need to manually shut it 
down on exit.

The safest thing would be to use separate executors, and make sure to clean up 
both executors. (This would also avoid potential deadlocks; gRPC can't process 
client cancellations if the executor is full.)
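The deadlock risk in the parenthetical can be reproduced in miniature with Python thread pools (an analogy only; `handler` and `run` are hypothetical, and the follow-up task stands in for gRPC's cancellation processing). A handler running on a one-thread pool waits for a follow-up task; if the follow-up is queued on the same, already-busy pool it never starts, whereas a separate pool runs it immediately:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def handler(followup_pool: ThreadPoolExecutor, timeout: float = 0.5) -> str:
    """A request handler that needs a follow-up task to finish before returning."""
    followup = followup_pool.submit(lambda: "cancel processed")
    try:
        return followup.result(timeout=timeout)
    except FutureTimeout:
        # The follow-up is stuck in the queue behind this very handler.
        return "starved"


def run(shared: bool) -> str:
    """Run one handler with either a shared or a separate follow-up executor."""
    app_pool = ThreadPoolExecutor(max_workers=1)
    rpc_pool = app_pool if shared else ThreadPoolExecutor(max_workers=1)
    try:
        return app_pool.submit(handler, rpc_pool).result()
    finally:
        # Clean up every executor that was created, as the comment suggests.
        app_pool.shutdown(wait=True)
        if rpc_pool is not app_pool:
            rpc_pool.shutdown(wait=True)
```

`run(shared=True)` returns `"starved"` after the timeout and `run(shared=False)` returns `"cancel processed"`; without the timeout, the shared case would block indefinitely.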

> [FlightRPC][Java] Flight server can hang JVM on shutdown
> 
>
> Key: ARROW-6867
> URL: https://issues.apache.org/jira/browse/ARROW-6867
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Java
>Affects Versions: 0.15.0
>Reporter: David Li
>Assignee: David Li
>Priority: Major
> Fix For: 1.0.0
>
>
> I noticed this while working on Flight integration tests. FlightService keeps 
> an executor, which can hang the JVM on shutdown if the executor itself is not 
> shut down.
> It's used by Handshake and DoPut.
> I think this surfaced because I wrote an AuthHandler that threw an exception.





[jira] [Created] (ARROW-6867) [FlightRPC][Java] Flight server can hang JVM on shutdown

2019-10-12 Thread David Li (Jira)
David Li created ARROW-6867:
---

 Summary: [FlightRPC][Java] Flight server can hang JVM on shutdown
 Key: ARROW-6867
 URL: https://issues.apache.org/jira/browse/ARROW-6867
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC, Java
Affects Versions: 0.15.0
Reporter: David Li
Assignee: David Li
 Fix For: 1.0.0


I noticed this while working on Flight integration tests. FlightService keeps 
an executor, which can hang the JVM on shutdown if the executor itself is not 
shut down.

It's used by Handshake and DoPut.

I think this surfaced because I wrote an AuthHandler that threw an exception.





[jira] [Updated] (ARROW-6866) [Java] Improve the performance of calculating hash code for struct vector

2019-10-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6866:
--
Labels: pull-request-available  (was: )

> [Java] Improve the performance of calculating hash code for struct vector
> -
>
> Key: ARROW-6866
> URL: https://issues.apache.org/jira/browse/ARROW-6866
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Minor
>  Labels: pull-request-available
>
> Improve the performance of hashCode(int) method for StructVector:
> 1. We can get the child vectors directly, so there is no need to get the name
> from the child vector and then use the name to look up the vector.
> 2. The child vectors cannot be null, so there is no need to check for null.
> The performance improvement depends on the complexity of the hash algorithm.
> For computationally intensive hash algorithms the improvement may be small,
> while for simple hash algorithms it can be notable.
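The two points can be illustrated with a Python sketch (hypothetical classes; the `31 * h + child_hash` combination is a common Java idiom used here for illustration, not necessarily the hash Arrow uses). `hash_code_by_name` performs a name lookup and a null check per child, while `hash_code_direct` iterates the children directly; both produce the same value, which is what makes the shortcut safe:

```python
from typing import List, Optional


class ChildVector:
    """Hypothetical child vector: a name plus nullable integer values."""

    def __init__(self, name: str, values: List[Optional[int]]) -> None:
        self.name = name
        self.values = values

    def hash_code(self, index: int) -> int:
        value = self.values[index]
        return 0 if value is None else hash(value)


class StructVectorSketch:
    """Hypothetical struct vector that keeps its children in field order."""

    def __init__(self, children: List[ChildVector]) -> None:
        self.children = children
        self._by_name = {child.name: child for child in children}

    def hash_code_by_name(self, index: int) -> int:
        # Old pattern: name -> lookup -> null check, once per child.
        h = 0
        for child in self.children:
            vector = self._by_name.get(child.name)
            if vector is not None:
                h = 31 * h + vector.hash_code(index)
        return h

    def hash_code_direct(self, index: int) -> int:
        # Children are available directly and are never null.
        h = 0
        for child in self.children:
            h = 31 * h + child.hash_code(index)
        return h
```

For children `a = [1, 2]` and `b = [3, None]`, both methods return 34 for index 0 and 62 for index 1.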





[jira] [Created] (ARROW-6866) [Java] Improve the performance of calculating hash code for struct vector

2019-10-12 Thread Liya Fan (Jira)
Liya Fan created ARROW-6866:
---

 Summary: [Java] Improve the performance of calculating hash code 
for struct vector
 Key: ARROW-6866
 URL: https://issues.apache.org/jira/browse/ARROW-6866
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Liya Fan
Assignee: Liya Fan


Improve the performance of hashCode(int) method for StructVector:
1. We can get the child vectors directly, so there is no need to get the name
from the child vector and then use the name to look up the vector.
2. The child vectors cannot be null, so there is no need to check for null.

The performance improvement depends on the complexity of the hash algorithm.
For computationally intensive hash algorithms the improvement may be small,
while for simple hash algorithms it can be notable.





[jira] [Updated] (ARROW-6865) [Java] Improve the performance of comparing an ArrowBuf against a byte array

2019-10-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6865:
--
Labels: pull-request-available  (was: )

> [Java] Improve the performance of comparing an ArrowBuf against a byte array
> 
>
> Key: ARROW-6865
> URL: https://issues.apache.org/jira/browse/ARROW-6865
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
>
> We change the comparison of an ArrowBuf against a byte array from byte-wise
> comparison to comparison by long/int/byte.
> A benchmark shows a 6.7x performance improvement.
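The idea can be sketched in Python (for illustration only: in pure Python this is not faster, and `compare_wordwise` is a hypothetical helper, not the ArrowBuf API). Eight bytes are compared at a time, then four, then single bytes. Reading each chunk as a big-endian unsigned integer keeps the result identical to a byte-by-byte lexicographic comparison:

```python
import struct


def _cmp(a: int, b: int) -> int:
    return (a > b) - (a < b)


def compare_wordwise(left: bytes, right: bytes) -> int:
    """Return -1, 0 or 1, like an unsigned lexicographic byte comparison."""
    n = min(len(left), len(right))
    i = 0
    # 8 bytes ("long") at a time.
    while i + 8 <= n:
        (a,) = struct.unpack_from(">Q", left, i)
        (b,) = struct.unpack_from(">Q", right, i)
        if a != b:
            return _cmp(a, b)
        i += 8
    # At most one 4-byte ("int") step remains.
    if i + 4 <= n:
        (a,) = struct.unpack_from(">I", left, i)
        (b,) = struct.unpack_from(">I", right, i)
        if a != b:
            return _cmp(a, b)
        i += 4
    # Remaining tail, one byte at a time.
    while i < n:
        if left[i] != right[i]:
            return _cmp(left[i], right[i])
        i += 1
    # Equal common prefix: the shorter input sorts first.
    return _cmp(len(left), len(right))
```

The big-endian choice matters only for ordering; a pure equality check could read the words in either byte order.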





[jira] [Created] (ARROW-6865) [Java] Improve the performance of comparing an ArrowBuf against a byte array

2019-10-12 Thread Liya Fan (Jira)
Liya Fan created ARROW-6865:
---

 Summary: [Java] Improve the performance of comparing an ArrowBuf 
against a byte array
 Key: ARROW-6865
 URL: https://issues.apache.org/jira/browse/ARROW-6865
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Liya Fan
Assignee: Liya Fan


We change the comparison of an ArrowBuf against a byte array from byte-wise
comparison to comparison by long/int/byte.

A benchmark shows a 6.7x performance improvement.





[jira] [Commented] (ARROW-6864) [C++] bz2 / zstd tests not enabled

2019-10-12 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950011#comment-16950011
 ] 

Antoine Pitrou commented on ARROW-6864:
---

cc [~wesm]

> [C++] bz2 / zstd tests not enabled
> --
>
> Key: ARROW-6864
> URL: https://issues.apache.org/jira/browse/ARROW-6864
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.15.0
>Reporter: Antoine Pitrou
>Priority: Major
>
> When passing {{-DARROW_WITH_ZSTD=on}} and {{-DARROW_WITH_BZ2=on}}, the 
> relevant tests in {{arrow-compression-test}} and {{arrow-io-compressed-test}} 
> are still not enabled.





[jira] [Created] (ARROW-6864) [C++] bz2 / zstd tests not enabled

2019-10-12 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6864:
-

 Summary: [C++] bz2 / zstd tests not enabled
 Key: ARROW-6864
 URL: https://issues.apache.org/jira/browse/ARROW-6864
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.15.0
Reporter: Antoine Pitrou


When passing {{-DARROW_WITH_ZSTD=on}} and {{-DARROW_WITH_BZ2=on}}, the relevant 
tests in {{arrow-compression-test}} and {{arrow-io-compressed-test}} are still 
not enabled.





[jira] [Updated] (ARROW-6863) [Java] Provide parallel searcher

2019-10-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6863:
--
Labels: pull-request-available  (was: )

> [Java] Provide parallel searcher
> 
>
> Key: ARROW-6863
> URL: https://issues.apache.org/jira/browse/ARROW-6863
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
>
> For scenarios where the vector is large and a low response time is
> required, we need to search the vector in parallel to improve
> responsiveness.
> This issue tries to provide a parallel searcher for the equality semantics 
> (the support for ordering semantics is not ready yet, as we need a way to 
> distribute the comparator).
> The implementation is based on multi-threading.
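A minimal sketch of the equality case in Python (the function name and chunking scheme are assumptions, not the API that ARROW-6863 adds; because of the GIL, Python threads illustrate the structure rather than the speedup). The vector is split into contiguous ranges, each range is scanned by a worker thread, and the smallest hit wins, so the result matches a sequential search:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Sequence


def parallel_search(vector: Sequence[Any], key: Any, num_threads: int = 4) -> int:
    """Return the index of the first element equal to key, or -1 if absent."""
    n = len(vector)
    if n == 0:
        return -1
    chunk = (n + num_threads - 1) // num_threads  # ceiling division

    def search_range(start: int) -> int:
        # Sequential equality scan over one contiguous range.
        for i in range(start, min(start + chunk, n)):
            if vector[i] == key:
                return i
        return -1

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        hits = [i for i in pool.map(search_range, range(0, n, chunk)) if i != -1]
    return min(hits) if hits else -1
```

Ordering semantics (e.g. a parallel binary search with a comparator) are left out, matching the scope described in the issue.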





[jira] [Created] (ARROW-6863) [Java] Provide parallel searcher

2019-10-12 Thread Liya Fan (Jira)
Liya Fan created ARROW-6863:
---

 Summary: [Java] Provide parallel searcher
 Key: ARROW-6863
 URL: https://issues.apache.org/jira/browse/ARROW-6863
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java
Reporter: Liya Fan
Assignee: Liya Fan


For scenarios where the vector is large and a low response time is
required, we need to search the vector in parallel to improve
responsiveness.

This issue tries to provide a parallel searcher for the equality semantics (the 
support for ordering semantics is not ready yet, as we need a way to distribute 
the comparator).

The implementation is based on multi-threading.





[jira] [Updated] (ARROW-6662) [Java] Implement equals/approxEquals API for VectorSchemaRoot

2019-10-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6662:
--
Labels: pull-request-available  (was: )

> [Java] Implement equals/approxEquals API for VectorSchemaRoot
> -
>
> Key: ARROW-6662
> URL: https://issues.apache.org/jira/browse/ARROW-6662
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
>
> With the newly added visitor APIs (ARROW-6211), we can now implement 
> equals/approxEquals for VectorSchemaRoot.





[jira] [Updated] (ARROW-6850) [Java] Jdbc converter support Null type

2019-10-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6850:
--
Labels: pull-request-available  (was: )

> [Java] Jdbc converter support Null type
> ---
>
> Key: ARROW-6850
> URL: https://issues.apache.org/jira/browse/ARROW-6850
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
>
> java.sql.Types.NULL is not supported yet because the Java code previously 
> had no NullVector.
> This can be implemented after ARROW-1638 (IPC round trip for the null 
> type) is merged.





[jira] [Comment Edited] (ARROW-6464) [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API

2019-10-12 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949948#comment-16949948
 ] 

Ji Liu edited comment on ARROW-6464 at 10/12/19 7:40 AM:
-

Issue resolved by pull request 5293

[https://github.com/apache/arrow/pull/5293]


was (Author: tianchen92):
Issue resolved in

[https://github.com/apache/arrow/pull/5293]

> [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API
> ---
>
> Key: ARROW-6464
> URL: https://issues.apache.org/jira/browse/ARROW-6464
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.15.1
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Currently {{FixedSizeListVector#splitAndTransfer}} uses {{copyValueSafe}}, 
> which performs a memory copy; we should use the slice API instead.
> In addition, {{splitAndTransfer}} in all classes should perform the index 
> check at the beginning.
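To make the difference between copying and slicing concrete, here is a minimal plain-Java sketch. It is not Arrow code: {{FixedSizeIntLists}}, {{copyRange}} and {{sliceRange}} are hypothetical names, the child data is a single {{java.nio.IntBuffer}} holding {{listSize}} ints per list, and buffer ownership transfer and reference counting are ignored. Both methods validate the requested range before doing any work, as the description suggests; the slice version then returns a view over the same memory instead of copying element by element.

```java
import java.nio.IntBuffer;

// Hypothetical sketch, not Arrow's FixedSizeListVector: each "list" holds
// listSize ints, stored contiguously in one IntBuffer of child values.
class FixedSizeIntLists {
    final IntBuffer values;
    final int listSize;
    final int valueCount;

    FixedSizeIntLists(IntBuffer values, int listSize, int valueCount) {
        this.values = values;
        this.listSize = listSize;
        this.valueCount = valueCount;
    }

    int get(int list, int element) {
        return values.get(list * listSize + element);
    }

    // The index check runs first, before any allocation or copying.
    private void checkRange(int startIndex, int length) {
        if (startIndex < 0 || length < 0 || startIndex > valueCount - length) {
            throw new IndexOutOfBoundsException("startIndex=" + startIndex
                + ", length=" + length + ", valueCount=" + valueCount);
        }
    }

    // Copying approach: allocates new storage and copies every child value.
    FixedSizeIntLists copyRange(int startIndex, int length) {
        checkRange(startIndex, length);
        IntBuffer copy = IntBuffer.allocate(length * listSize);
        for (int i = 0; i < length * listSize; i++) {
            copy.put(i, values.get(startIndex * listSize + i));
        }
        return new FixedSizeIntLists(copy, listSize, length);
    }

    // Slice approach: returns a view that shares the same memory (no copy).
    FixedSizeIntLists sliceRange(int startIndex, int length) {
        checkRange(startIndex, length);
        IntBuffer view = values.duplicate();
        view.position(startIndex * listSize);
        view.limit((startIndex + length) * listSize);
        return new FixedSizeIntLists(view.slice(), listSize, length);
    }
}
```

With the slice version, later writes to the original buffer are visible through the slice, since no data was copied; the copied result keeps its own storage.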





[jira] [Comment Edited] (ARROW-6661) [Java] Implement APIs like slice to enhance VectorSchemaRoot

2019-10-12 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949939#comment-16949939
 ] 

Ji Liu edited comment on ARROW-6661 at 10/12/19 7:40 AM:
-

Issue resolved by pull request 5470

[https://github.com/apache/arrow/pull/5470]


was (Author: tianchen92):
Issue resolved in [https://github.com/apache/arrow/pull/5470]

> [Java] Implement APIs like slice to enhance VectorSchemaRoot
> 
>
> Key: ARROW-6661
> URL: https://issues.apache.org/jira/browse/ARROW-6661
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.1
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently the Java implementation has no slice-like APIs for record 
> batches, unlike C++/Python.
> This issue implements slice/getVector/addVector/removeVector.





[jira] [Comment Edited] (ARROW-6464) [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API

2019-10-12 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949948#comment-16949948
 ] 

Ji Liu edited comment on ARROW-6464 at 10/12/19 7:39 AM:
-

Issue resolved in

[https://github.com/apache/arrow/pull/5293]


was (Author: tianchen92):
Issue resolve in

[https://github.com/apache/arrow/pull/5293]

> [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API
> ---
>
> Key: ARROW-6464
> URL: https://issues.apache.org/jira/browse/ARROW-6464
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.15.1
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Currently {{FixedSizeListVector#splitAndTransfer}} uses {{copyValueSafe}}, 
> which performs a memory copy; we should use the slice API instead.
> In addition, {{splitAndTransfer}} in all classes should perform the index 
> check at the beginning.





[jira] [Resolved] (ARROW-6464) [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API

2019-10-12 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu resolved ARROW-6464.
---
Fix Version/s: 0.15.1
   Resolution: Fixed

Issue resolved in

[https://github.com/apache/arrow/pull/5293]

> [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API
> ---
>
> Key: ARROW-6464
> URL: https://issues.apache.org/jira/browse/ARROW-6464
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.15.1
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Currently {{FixedSizeListVector#splitAndTransfer}} uses {{copyValueSafe}}, 
> which performs a memory copy; we should use the slice API instead.
> In addition, {{splitAndTransfer}} in all classes should perform the index 
> check at the beginning.





[jira] [Resolved] (ARROW-6661) [Java] Implement APIs like slice to enhance VectorSchemaRoot

2019-10-12 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu resolved ARROW-6661.
---
Fix Version/s: 0.15.1
   Resolution: Fixed

Issue resolved in [https://github.com/apache/arrow/pull/5470]

> [Java] Implement APIs like slice to enhance VectorSchemaRoot
> 
>
> Key: ARROW-6661
> URL: https://issues.apache.org/jira/browse/ARROW-6661
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.1
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently the Java implementation has no slice-like APIs for record 
> batches, unlike C++/Python.
> This issue implements slice/getVector/addVector/removeVector.


