[GitHub] [arrow-datafusion] andygrove opened a new issue, #4985: Changelog generator not working for patch releases

2023-01-19 Thread GitBox
andygrove opened a new issue, #4985: URL: https://github.com/apache/arrow-datafusion/issues/4985 **Describe the bug** I could not get the changelog script to generate an accurate changelog for the maint-16.x branch. It showed changes from the master branch that are not in maint-16.x

[GitHub] [arrow] AlenkaF commented on pull request #33761: GH-14932: Add python bindings for JSON streaming reader

2023-01-19 Thread GitBox
AlenkaF commented on PR #33761: URL: https://github.com/apache/arrow/pull/33761#issuecomment-1397108286 Great work so far @akshaysu12 ! Looking at the code from the binding for `CSVStreamingReader` and the work done on the C++ side for the JSON stream reader I would say this PR is

[GitHub] [arrow] alamb commented on pull request #33716: WIP: DO NOT MERGE: Apache Arrow Flight SQL adapter for PostgreSQL plan

2023-01-19 Thread GitBox
alamb commented on PR #33716: URL: https://github.com/apache/arrow/pull/33716#issuecomment-1397104567 > Ok. And those clients use the wire protocol directly(ish), so they can't take advantage of the JDBC or ODBC Flight SQL drivers, presumably? That is correct. In the ideal future

[GitHub] [arrow-datafusion-python] jdye64 opened a new issue, #140: Bump pyo3 to 0.18.0

2023-01-19 Thread GitBox
jdye64 opened a new issue, #140: URL: https://github.com/apache/arrow-datafusion-python/issues/140 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** PyO3 recently had a release and there seem to be several features we could

[GitHub] [arrow-datafusion] andygrove commented on pull request #4975: [maint-16.x] Prep for release

2023-01-19 Thread GitBox
andygrove commented on PR #4975: URL: https://github.com/apache/arrow-datafusion/pull/4975#issuecomment-1397101686 @jonmmease @alamb PTAL. I could not get the changelog script to generate an accurate changelog for this patch release and do not have time to debug that right now.

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4972: Simplify GroupByHash implementation (to prepare for more work)

2023-01-19 Thread GitBox
alamb commented on code in PR #4972: URL: https://github.com/apache/arrow-datafusion/pull/4972#discussion_r1081382689 ## datafusion/core/src/physical_plan/aggregates/row_hash.rs: ## @@ -219,91 +221,76 @@ impl GroupedHashAggregateStream { batch_size,

[GitHub] [arrow-datafusion-python] andygrove opened a new issue, #139: Make it easier to create a Pandas dataframe from DataFusion query results

2023-01-19 Thread GitBox
andygrove opened a new issue, #139: URL: https://github.com/apache/arrow-datafusion-python/issues/139 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** DataFrame.collect returns a list of PyArrow record batches. Each batch can be

[GitHub] [arrow-datafusion] ursabot commented on pull request #4923: Support join-filter pushdown for semi/anti join

2023-01-19 Thread GitBox
ursabot commented on PR #4923: URL: https://github.com/apache/arrow-datafusion/pull/4923#issuecomment-1397098405 Benchmark runs are scheduled for baseline = 19f6f19c1d4783f9bcfde83744ee436f8e984154 and contender = dde23efed94704044822bcefe49c0af7f9260088.

[GitHub] [arrow] jorisvandenbossche commented on issue #33769: [Python] support quantile for temporal dtypes

2023-01-19 Thread GitBox
jorisvandenbossche commented on issue #33769: URL: https://github.com/apache/arrow/issues/33769#issuecomment-1397094509 I think also for that reason (quantile by default always returns floats), we might not want to add support for quantile to temporal types (not sure if we actually ever
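The remark that quantile returns floats can be illustrated with a minimal pure-Python sketch of linear-interpolation quantiles (a common default method, not the Arrow kernel itself) applied to timestamps stored as integer epoch seconds. The helper `linear_quantile` is a hypothetical name used only for illustration. The interpolated median lands between two whole seconds, so storing it back in an integer-based temporal type would need rounding or truncation.

```python
from datetime import datetime, timezone


def linear_quantile(values, q):
    """Return the q-th quantile of `values` using linear interpolation.

    The rank is q * (n - 1); the result interpolates between the two
    neighbouring sorted values, so it is a float even for integer input.
    """
    if not values:
        raise ValueError("quantile of empty sequence")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    data = sorted(values)
    rank = q * (len(data) - 1)
    lower = int(rank)  # floor, since rank >= 0
    upper = min(lower + 1, len(data) - 1)
    fraction = rank - lower
    # int + (int * float) always yields a float here.
    return data[lower] + (data[upper] - data[lower]) * fraction


# Two timestamps one second apart, stored as integer epoch seconds.
ts = [
    int(datetime(2023, 1, 19, 0, 0, 0, tzinfo=timezone.utc).timestamp()),
    int(datetime(2023, 1, 19, 0, 0, 1, tzinfo=timezone.utc).timestamp()),
]
median = linear_quantile(ts, 0.5)
# The median falls halfway between the two seconds, i.e. not a whole second.
print(median, median.is_integer())
```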

[GitHub] [arrow-datafusion-python] andygrove merged pull request #137: Improve README and add more examples

2023-01-19 Thread GitBox
andygrove merged PR #137: URL: https://github.com/apache/arrow-datafusion-python/pull/137 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [arrow] mbrobbel opened a new pull request, #33788: MINOR: [Documentation] Add link to IPC section on the Columnar page

2023-01-19 Thread GitBox
mbrobbel opened a new pull request, #33788: URL: https://github.com/apache/arrow/pull/33788 ### Rationale for this change Linking directly to the IPC section makes it easier to find what people are looking for. ### What changes are included in this PR? Adds a link to

[GitHub] [arrow-datafusion-python] jdye64 commented on pull request #137: Improve README and add more examples

2023-01-19 Thread GitBox
jdye64 commented on PR #137: URL: https://github.com/apache/arrow-datafusion-python/pull/137#issuecomment-1397088728 FYI - I'm working on adding support for compressed file reading and will update the examples to show that as well once completed.

[GitHub] [arrow] thisisnic commented on a diff in pull request #33748: GH-33746: [R] Update NEWS.md for 11.0.0

2023-01-19 Thread GitBox
thisisnic commented on code in PR #33748: URL: https://github.com/apache/arrow/pull/33748#discussion_r1081369004 ## r/NEWS.md: ## @@ -19,6 +19,92 @@ # arrow 10.0.1.9000 +## New features + +### Docs + +* A substantial reorganisation, rewrite of and addition to, many of the

[GitHub] [arrow] thisisnic commented on a diff in pull request #33748: GH-33746: [R] Update NEWS.md for 11.0.0

2023-01-19 Thread GitBox
thisisnic commented on code in PR #33748: URL: https://github.com/apache/arrow/pull/33748#discussion_r1081366249 ## r/NEWS.md: ## @@ -19,6 +19,92 @@ # arrow 10.0.1.9000 +## New features + +### Docs + +* A substantial reorganisation, rewrite of and addition to, many of the

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4923: Support join-filter pushdown for semi/anti join

2023-01-19 Thread GitBox
alamb commented on code in PR #4923: URL: https://github.com/apache/arrow-datafusion/pull/4923#discussion_r1081365754 ## datafusion/core/tests/sqllogictests/test_files/join.slt: ## @@ -42,3 +42,42 @@ SELECT s.*, g.grade FROM students s join grades g on s.mark between g.min and

[GitHub] [arrow-datafusion] alamb merged pull request #4923: Support join-filter pushdown for semi/anti join

2023-01-19 Thread GitBox
alamb merged PR #4923: URL: https://github.com/apache/arrow-datafusion/pull/4923

[GitHub] [arrow-datafusion] ursabot commented on pull request #4969: refactor: display input partitions for `RepartitionExec`

2023-01-19 Thread GitBox
ursabot commented on PR #4969: URL: https://github.com/apache/arrow-datafusion/pull/4969#issuecomment-1397085822 Benchmark runs are scheduled for baseline = 96cf046be57bf09548d51f50d0bc964904bcec7d and contender = 19f6f19c1d4783f9bcfde83744ee436f8e984154.

[GitHub] [arrow-datafusion] alamb commented on pull request #4908: added a method to read multiple locations at the same time.

2023-01-19 Thread GitBox
alamb commented on PR #4908: URL: https://github.com/apache/arrow-datafusion/pull/4908#issuecomment-1397083616 > What do you think about having a single method which only takes a list of paths? For a single path, the callee can create a slice/Vec. This would be a lot simpler to do.
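The API shape suggested in the quoted comment (one method that accepts only a list of paths, with callers wrapping a single path themselves) can be sketched in Python. `read_locations` is a hypothetical name, not a DataFusion API; it only validates and returns the paths it would read.

```python
from typing import List, Sequence


def read_locations(paths: Sequence[str]) -> List[str]:
    """Hypothetical reader that accepts one or more locations.

    A single entry point keeps the API small: there is no separate
    single-path variant to maintain.
    """
    if isinstance(paths, str):
        # Guard against a bare string, which would otherwise be
        # iterated character by character.
        raise TypeError("pass a list of paths, e.g. [path]")
    if not paths:
        raise ValueError("at least one path is required")
    # A real implementation would open and scan each location here.
    return list(paths)


# Multiple locations.
read_locations(["data/part-0.parquet", "data/part-1.parquet"])
# A single location: the caller wraps it in a list.
read_locations(["data/part-0.parquet"])
```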

[GitHub] [arrow] eitsupi commented on issue #18487: [R] Read CSV from character vector

2023-01-19 Thread GitBox
eitsupi commented on issue #18487: URL: https://github.com/apache/arrow/issues/18487#issuecomment-1397083374 Note that this has not worked since `readr` 2.0.0. ``` r readr::read_csv(c("a,b", "1,2", "3,4")) #> Error: 'a,b' does not exist in current working directory

[GitHub] [arrow] jorisvandenbossche merged pull request #33764: GH-15109: [Python] Allow creation of non empty struct array with zero field

2023-01-19 Thread GitBox
jorisvandenbossche merged PR #33764: URL: https://github.com/apache/arrow/pull/33764

[GitHub] [arrow] cyborne100 commented on issue #33758: [R] SparkR Arrow "Hello World" Error: 'write_arrow' is not an exported object from 'namespace:arrow'

2023-01-19 Thread GitBox
cyborne100 commented on issue #33758: URL: https://github.com/apache/arrow/issues/33758#issuecomment-1397075519 Ahh, I understand now. Thanks for helping a NOOB. :)

[GitHub] [arrow-datafusion] alamb merged pull request #4969: refactor: display input partitions for `RepartitionExec`

2023-01-19 Thread GitBox
alamb merged PR #4969: URL: https://github.com/apache/arrow-datafusion/pull/4969

[GitHub] [arrow-datafusion] alamb closed issue #4964: `RepartitionExec`: print number of input partitions in text representation

2023-01-19 Thread GitBox
alamb closed issue #4964: `RepartitionExec`: print number of input partitions in text representation URL: https://github.com/apache/arrow-datafusion/issues/4964

[GitHub] [arrow] eitsupi commented on a diff in pull request #33748: GH-33746: [R] Update NEWS.md for 11.0.0

2023-01-19 Thread GitBox
eitsupi commented on code in PR #33748: URL: https://github.com/apache/arrow/pull/33748#discussion_r1081343959 ## r/NEWS.md: ## @@ -19,6 +19,92 @@ # arrow 10.0.1.9000 +## New features + +### Docs + +* A substantial reorganisation, rewrite of and addition to, many of the +

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4903: Update main DataFusion README

2023-01-19 Thread GitBox
alamb commented on code in PR #4903: URL: https://github.com/apache/arrow-datafusion/pull/4903#discussion_r1081347231 ## README.md: ## @@ -57,9 +75,31 @@ a foundation for building new systems. Here are some example use cases: - _Easy to Embed_: Allowing extension at almost

[GitHub] [arrow] lidavidm commented on pull request #33716: WIP: DO NOT MERGE: Apache Arrow Flight SQL adapter for PostgreSQL plan

2023-01-19 Thread GitBox
lidavidm commented on PR #33716: URL: https://github.com/apache/arrow/pull/33716#issuecomment-1397066936 Ok. And those clients use the wire protocol directly(ish), so they can't take advantage of the JDBC or ODBC Flight SQL drivers, presumably? ADBC may not help there, unless they're

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4984: minor: Update data type support documentation

2023-01-19 Thread GitBox
alamb commented on code in PR #4984: URL: https://github.com/apache/arrow-datafusion/pull/4984#discussion_r1081344325 ## .github/workflows/dev.yml: ## @@ -42,8 +42,7 @@ jobs: node-version: "14" - name: Prettier check run: | - # if you

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4984: minor: Update data type support documentation

2023-01-19 Thread GitBox
alamb commented on code in PR #4984: URL: https://github.com/apache/arrow-datafusion/pull/4984#discussion_r1081343683 ## docs/source/user-guide/sql/data_types.md: ## @@ -25,13 +25,26 @@ execution. The SQL types from are mapped to [Arrow data

[GitHub] [arrow] thisisnic merged pull request #33780: GH-33779: [R] Nightly builds (R 3.5 and 3.6) failing due to field refs test

2023-01-19 Thread GitBox
thisisnic merged PR #33780: URL: https://github.com/apache/arrow/pull/33780

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4984: minor: Update data type support documentation

2023-01-19 Thread GitBox
alamb commented on code in PR #4984: URL: https://github.com/apache/arrow-datafusion/pull/4984#discussion_r1081343250 ## docs/source/user-guide/sql/data_types.md: ## @@ -52,11 +65,12 @@ This mapping occurs when defining the schema in a `CREATE EXTERNAL TABLE` comman ##

[GitHub] [arrow-datafusion] alamb opened a new pull request, #4984: minor: Update data type support documentation

2023-01-19 Thread GitBox
alamb opened a new pull request, #4984: URL: https://github.com/apache/arrow-datafusion/pull/4984 # Which issue does this PR close? N/A # Rationale for this change https://arrow.apache.org/datafusion/user-guide/sql/data_types.html is out of date, as we discovered while

[GitHub] [arrow] paleolimbot commented on a diff in pull request #33748: GH-33746: [R] Update NEWS.md for 11.0.0

2023-01-19 Thread GitBox
paleolimbot commented on code in PR #33748: URL: https://github.com/apache/arrow/pull/33748#discussion_r1081335626 ## r/NEWS.md: ## @@ -33,62 +33,57 @@ functionality, but allow for readr-style options to be supplied, making it simpler to switch between individual

[GitHub] [arrow] paleolimbot commented on issue #33701: [R] Link-time optimization reports violations of one-definition rule in the R package

2023-01-19 Thread GitBox
paleolimbot commented on issue #33701: URL: https://github.com/apache/arrow/issues/33701#issuecomment-1397052533 CRAN runs an LTO build as part of its check suite...if there are warnings during the linking, we get an email saying that we need to fix them in two weeks to "safely retain our

[GitHub] [arrow] github-actions[bot] commented on pull request #33778: GH-33777: [R] Nightly builds failing due to dataset test not being skipped on builds without datasets module

2023-01-19 Thread GitBox
github-actions[bot] commented on PR #33778: URL: https://github.com/apache/arrow/pull/33778#issuecomment-1397052029 Revision: de1d764d08c6d051e9577ac4144ac4476f79a88c Submitted crossbow builds: [ursacomputing/crossbow @

[GitHub] [arrow] paleolimbot commented on pull request #33778: GH-33777: [R] Nightly builds failing due to dataset test not being skipped on builds without datasets module

2023-01-19 Thread GitBox
paleolimbot commented on PR #33778: URL: https://github.com/apache/arrow/pull/33778#issuecomment-1397048335 @github-actions crossbow submit test-r-offline-minimal

[GitHub] [arrow-rs] alamb opened a new issue, #3565: Should we write a "arrow-rs" update blog post?

2023-01-19 Thread GitBox
alamb opened a new issue, #3565: URL: https://github.com/apache/arrow-rs/issues/3565 **Which part is this question about** We often write DataFusion https://arrow.apache.org/blog/2023/01/19/datafusion-16.0.0/ updates as a way to help grow the community and awareness about arrow.

[GitHub] [arrow-rs] alamb commented on issue #53: [Parquet] Reading parquet file into an ndarray

2023-01-19 Thread GitBox
alamb commented on issue #53: URL: https://github.com/apache/arrow-rs/issues/53#issuecomment-1397036453 > Instead of creating multiple readers can it be done by single reader? That is an interesting question @Sach1nAgarwal -- I think @tustvold has some ideas on parallelized decode

[GitHub] [arrow] alamb commented on pull request #33716: WIP: DO NOT MERGE: Apache Arrow Flight SQL adapter for PostgreSQL plan

2023-01-19 Thread GitBox
alamb commented on PR #33716: URL: https://github.com/apache/arrow/pull/33716#issuecomment-1397033386 > Yes, the ADBC driver wraps libpq and should let you work with other databases that use the PostgreSQL wire protocol, with the caveat that it has to convert the data (of course). I

[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #33660: GH-33659: [Developer Tools] Add definition of Breaking Change and Critical Fix

2023-01-19 Thread GitBox
jorisvandenbossche commented on code in PR #33660: URL: https://github.com/apache/arrow/pull/33660#discussion_r1081303563 ## docs/source/developers/reviewing.rst: ## @@ -255,3 +255,43 @@ Social aspects * Like any communication, code reviews are governed by the Apache `Code

[GitHub] [arrow] lidavidm merged pull request #15223: GH-15203: [Java] Implement writing compressed files

2023-01-19 Thread GitBox
lidavidm merged PR #15223: URL: https://github.com/apache/arrow/pull/15223

[GitHub] [arrow] ursabot commented on pull request #33614: GH-33526: [R] Implement new function open_dataset_csv with signature more closely matching read_csv_arrow

2023-01-19 Thread GitBox
ursabot commented on PR #33614: URL: https://github.com/apache/arrow/pull/33614#issuecomment-1396971858 Benchmark runs are scheduled for baseline = 04ffb1f740f1e868c08313fa9043070345d9b6f0 and contender = 444dcb6779755fc33f3f81d647c188cf31abd23c. 444dcb6779755fc33f3f81d647c188cf31abd23c

[GitHub] [arrow] rok commented on issue #33782: [Release] Vote email number of issues is querying JIRA and producing a wrong number

2023-01-19 Thread GitBox
rok commented on issue #33782: URL: https://github.com/apache/arrow/issues/33782#issuecomment-1396971891 GraphQL API is another option: ``` curl -H 'Content-Type: application/json' -H "Authorization: bearer GITHUB_TOKEN" -X POST -d '{"query": "query {search(query: \"repo:apache/arrow
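As a sketch of the GraphQL approach, the Python function below only builds the JSON request body, assuming GitHub's GraphQL `search` field with `type: ISSUE` and its `issueCount` total; it does not send the request, and token handling is left out. The search string in the example is a hypothetical placeholder, since the original query is truncated here.

```python
import json


def build_issue_count_payload(search_query: str) -> str:
    """Build a GitHub GraphQL request body that counts matching issues.

    `search(query: ..., type: ISSUE)` returns a connection whose
    `issueCount` field is the total number of matches.
    """
    graphql = (
        "query($q: String!) {"
        " search(query: $q, type: ISSUE) { issueCount }"
        " }"
    )
    # Passing the search string as a variable avoids hand-escaping quotes.
    return json.dumps({"query": graphql, "variables": {"q": search_query}})


# Hypothetical search string; adjust the qualifiers to the release in question.
payload = build_issue_count_payload("repo:apache/arrow is:issue is:closed")
print(payload)
```

Using a GraphQL variable keeps the quotes used in GitHub's search syntax from needing manual escaping, which is one of the awkward parts of inlining the query in a shell command.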

[GitHub] [arrow] ursabot commented on pull request #15213: GH-15212: [C++] fix sliced list array writing in ORC

2023-01-19 Thread GitBox
ursabot commented on PR #15213: URL: https://github.com/apache/arrow/pull/15213#issuecomment-1396971385 Benchmark runs are scheduled for baseline = 2b50694c10e09e4a1343b62c6b5f44ad4403d0e1 and contender = 04ffb1f740f1e868c08313fa9043070345d9b6f0. 04ffb1f740f1e868c08313fa9043070345d9b6f0

[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #4972: Simplify GroupByHash implementation (to prepare for more work)

2023-01-19 Thread GitBox
ozankabak commented on code in PR #4972: URL: https://github.com/apache/arrow-datafusion/pull/4972#discussion_r1081247158 ## datafusion/core/src/physical_plan/aggregates/row_hash.rs: ## @@ -219,91 +221,76 @@ impl GroupedHashAggregateStream { batch_size,

[GitHub] [arrow] github-actions[bot] commented on pull request #33781: GH-33723: [C++] re2::RE2::RE2() result must be checked

2023-01-19 Thread GitBox
github-actions[bot] commented on PR #33781: URL: https://github.com/apache/arrow/pull/33781#issuecomment-1396950828 * Closes: #33723

[GitHub] [arrow-datafusion] mustafasrepo closed pull request #4982: Use arrow concat_batches instead of custom non-owning merge_batches

2023-01-19 Thread GitBox
mustafasrepo closed pull request #4982: Use arrow concat_batches instead of custom non-owning merge_batches URL: https://github.com/apache/arrow-datafusion/pull/4982

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #355: feat(python): add Flight SQL driver using Go library

2023-01-19 Thread GitBox
lidavidm commented on code in PR #355: URL: https://github.com/apache/arrow-adbc/pull/355#discussion_r1081224490 ## docs/source/driver/go/flight_sql.rst: ## @@ -25,10 +25,19 @@ The Flight SQL Driver provides access to any database implementing a Installation

[GitHub] [arrow-datafusion] alamb commented on issue #4804: Blog post about datafusion 16 release

2023-01-19 Thread GitBox
alamb commented on issue #4804: URL: https://github.com/apache/arrow-datafusion/issues/4804#issuecomment-1396922128 Rendered site: Rendered: https://arrow.apache.org/blog/2023/01/19/datafusion-16.0.0/

[GitHub] [arrow-datafusion] alamb closed issue #4804: Blog post about datafusion 16 release

2023-01-19 Thread GitBox
alamb closed issue #4804: Blog post about datafusion 16 release URL: https://github.com/apache/arrow-datafusion/issues/4804

[GitHub] [arrow-datafusion] DDtKey opened a new pull request, #4983: fix(4981): incorrect error wrapping in `OnceFut`

2023-01-19 Thread GitBox
DDtKey opened a new pull request, #4983: URL: https://github.com/apache/arrow-datafusion/pull/4983 # Which issue does this PR close? Closes #4981 # Rationale for this change # What changes are included in this PR? # Are these changes tested?

[GitHub] [arrow-datafusion] ygf11 commented on pull request #4944: Only add outer filter once when transforming exists/in subquery to join

2023-01-19 Thread GitBox
ygf11 commented on PR #4944: URL: https://github.com/apache/arrow-datafusion/pull/4944#issuecomment-1396904416 > Looks like the method's comment is out of date. Could you please also fix them in this PR ? Fixed.

[GitHub] [arrow] OfekShilon commented on issue #33784: [R] writing/reading a data.frame with column class 'list' changes column class

2023-01-19 Thread GitBox
OfekShilon commented on issue #33784: URL: https://github.com/apache/arrow/issues/33784#issuecomment-1396903597 A repro with no use of `tibble`: ```r df <- data.frame(a=I(list(list(1), list(8 class(df$a[[1]]) # [1] "list" tmpf <- tempfile() arrow::write_feather(df,

[GitHub] [arrow] github-actions[bot] commented on pull request #33785: GH-33741: [Python] Address docstrings in Data Types Factory Functions

2023-01-19 Thread GitBox
github-actions[bot] commented on PR #33785: URL: https://github.com/apache/arrow/pull/33785#issuecomment-1396903229 * Closes: #33741

[GitHub] [arrow] AlenkaF opened a new pull request, #33785: GH-33741: [Python] Address docstrings in Data Types Factory Functions

2023-01-19 Thread GitBox
AlenkaF opened a new pull request, #33785: URL: https://github.com/apache/arrow/pull/33785 ### Rationale for this change Ensure docstrings for [Data Types Factory Functions](https://arrow.apache.org/docs/python/api/datatypes.html#factory-functions) have an Examples section. ###

[GitHub] [arrow-datafusion] mustafasrepo opened a new pull request, #4982: Use arrow concat_batches instead of custom non-owning merge_batches

2023-01-19 Thread GitBox
mustafasrepo opened a new pull request, #4982: URL: https://github.com/apache/arrow-datafusion/pull/4982 # Which issue does this PR close? Closes #. # Rationale for this change With recent change in arrow `concat_batches`, `merge_batches` and

[GitHub] [arrow-datafusion] ursabot commented on pull request #4924: Unify Row hash and hash implementation

2023-01-19 Thread GitBox
ursabot commented on PR #4924: URL: https://github.com/apache/arrow-datafusion/pull/4924#issuecomment-1396885272 Benchmark runs are scheduled for baseline = e6a050058bd704f73b38106b7abf21dc4539eebc and contender = 96cf046be57bf09548d51f50d0bc964904bcec7d.

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4972: Simplify GroupByHash implementation (to prepare for more work)

2023-01-19 Thread GitBox
alamb commented on code in PR #4972: URL: https://github.com/apache/arrow-datafusion/pull/4972#discussion_r1081176906 ## datafusion/core/src/physical_plan/aggregates/row_hash.rs: ## @@ -115,6 +106,14 @@ struct GroupedHashAggregateStreamInner { indices: [Vec>; 2], }

[GitHub] [arrow-datafusion] alamb commented on pull request #4924: Unify Row hash and hash implementation

2023-01-19 Thread GitBox
alamb commented on PR #4924: URL: https://github.com/apache/arrow-datafusion/pull/4924#issuecomment-1396866343 Thanks again -- this is going to be great!

[GitHub] [arrow-datafusion] alamb merged pull request #4924: Unify Row hash and hash implementation

2023-01-19 Thread GitBox
alamb merged PR #4924: URL: https://github.com/apache/arrow-datafusion/pull/4924

[GitHub] [arrow-datafusion] alamb closed issue #2723: Consolidate GroupByHash implementations `row_hash.rs` and `hash.rs` (remove duplication)

2023-01-19 Thread GitBox
alamb closed issue #2723: Consolidate GroupByHash implementations `row_hash.rs` and `hash.rs` (remove duplication) URL: https://github.com/apache/arrow-datafusion/issues/2723

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3563: Implement Extend for ArrayBuilder (#1841)

2023-01-19 Thread GitBox
tustvold commented on code in PR #3563: URL: https://github.com/apache/arrow-rs/pull/3563#discussion_r1081165909 ## arrow-array/src/builder/generic_bytes_dictionary_builder.rs: ## @@ -255,12 +255,34 @@ where Ok(key) } +/// Infallibly append a value to this

[GitHub] [arrow-rs] alamb commented on a diff in pull request #3514: Use array_value_to_string in arrow-csv

2023-01-19 Thread GitBox
alamb commented on code in PR #3514: URL: https://github.com/apache/arrow-rs/pull/3514#discussion_r1081145188 ## arrow-cast/src/display.rs: ## @@ -309,9 +379,10 @@ fn append_map_field_string( /// /// Note this function is quite inefficient and is unlikely to be /// suitable

[GitHub] [arrow] MarcoGorelli commented on pull request #14662: ARROW-16544: [DevTools] Add linting to Cython files

2023-01-19 Thread GitBox
MarcoGorelli commented on PR #14662: URL: https://github.com/apache/arrow/pull/14662#issuecomment-1396842946 Hi - is there still interest in addressing https://issues.apache.org/jira/browse/ARROW-16544 ? If not, I'll close, no worries

[GitHub] [arrow] thisisnic commented on pull request #33748: GH-33746: [R] Update NEWS.md for 11.0.0

2023-01-19 Thread GitBox
thisisnic commented on PR #33748: URL: https://github.com/apache/arrow/pull/33748#issuecomment-1396828704 I've added in the URLs (this won't work automatically yet as things are configured to work with JIRA), ditched the "other" subsection and moved some items up in the hierarchy to "new

[GitHub] [arrow-datafusion] ygf11 commented on pull request #4923: Support join-filter pushdown for semi/anti join

2023-01-19 Thread GitBox
ygf11 commented on PR #4923: URL: https://github.com/apache/arrow-datafusion/pull/4923#issuecomment-1396823473 > Maybe we can add some error cases showing that SELECT t1.id FROM t1 SEMI JOIN t2 ON (t1.id = t2.id) WHERE t2.value = 5 generates an error > It would also be good to cover

[GitHub] [arrow] joosthooz commented on pull request #33738: GH-33737: [C++] simplify exec plan tracing

2023-01-19 Thread GitBox
joosthooz commented on PR #33738: URL: https://github.com/apache/arrow/pull/33738#issuecomment-1396819003 Nice, I think the WriteAndCheckBackpressure span is important because that's where the backpressure is checked and also it performs some work combining staged batches (in

[GitHub] [arrow-julia] ericphanson commented on pull request #381: Tag new version dev/release/release.sh

2023-01-19 Thread GitBox
ericphanson commented on PR #381: URL: https://github.com/apache/arrow-julia/pull/381#issuecomment-1396815064 TagBot is enabled/running here, but is having issues across many repos these days (lots of 502 errors from GitHub), including this one, so it is not very reliable at present

[GitHub] [arrow-rs] tustvold opened a new pull request, #3564: Improve GenericBytesBuilder offset overflow panic message (#139)

2023-01-19 Thread GitBox
tustvold opened a new pull request, #3564: URL: https://github.com/apache/arrow-rs/pull/3564 # Which issue does this PR close? Closes #139 # Rationale for this change # What changes are included in this PR? Adds a panic message instead of
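For context, Arrow's variable-size binary and string layouts store an end offset per value, and with 32-bit signed offsets the values buffer can hold at most 2**31 - 1 bytes. The Python sketch below is hypothetical (it is not the arrow-rs `GenericBytesBuilder` code) and shows the kind of check that turns an offset overflow into an explicit error message.

```python
class BytesOffsetsBuilder:
    """Hypothetical builder for a variable-length bytes column.

    Values are stored contiguously and an end offset is recorded per
    value, as in Arrow's Binary/Utf8 layout. `max_offset` is the largest
    offset the offset type can hold (2**31 - 1 for 32-bit signed offsets).
    """

    def __init__(self, max_offset: int = 2**31 - 1):
        self.max_offset = max_offset
        self.values = bytearray()
        self.offsets = [0]

    def append(self, value: bytes) -> None:
        next_offset = self.offsets[-1] + len(value)
        if next_offset > self.max_offset:
            # Fail with a message that names the actual problem,
            # leaving the builder unchanged.
            raise OverflowError(
                f"byte array offset overflow: {next_offset} exceeds "
                f"maximum offset {self.max_offset}; consider 64-bit offsets"
            )
        self.values += value
        self.offsets.append(next_offset)


# A tiny limit makes the overflow easy to trigger.
builder = BytesOffsetsBuilder(max_offset=8)
builder.append(b"hello")
builder.append(b"abc")
print(builder.offsets)  # [0, 5, 8]
```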

[GitHub] [arrow-rs] alamb commented on issue #3562: Panic on Key Overflow in Dictionary Builders

2023-01-19 Thread GitBox
alamb commented on issue #3562: URL: https://github.com/apache/arrow-rs/issues/3562#issuecomment-1396798368 My personal opinion is that the client code should be able to avoid panics if the user desires, but that doesn't mean the API needs to be fallible. For example, as long as it
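One possible reading of that position, sketched in Python under hypothetical names (this is not the arrow-rs API): `append` has no error return in its normal contract and raises (the analogue of a panic) only on key overflow, while `can_append` exposes enough state for callers who want to avoid that failure to check first. With signed 8-bit keys the usable keys are 0 to 127, so at most 128 distinct values fit.

```python
class DictionaryBuilder:
    """Hypothetical dictionary-encoding builder with a bounded key type."""

    def __init__(self, key_bits: int = 8):
        # Signed keys: the largest usable key is 2**(bits - 1) - 1.
        self.max_key = 2 ** (key_bits - 1) - 1
        self.dictionary = {}  # value -> key
        self.keys = []

    def can_append(self, value) -> bool:
        """True if appending `value` would not overflow the key type."""
        return value in self.dictionary or len(self.dictionary) <= self.max_key

    def append(self, value) -> None:
        """Append a value; raises (the analogue of a panic) on key overflow."""
        if not self.can_append(value):
            raise OverflowError(
                f"dictionary key overflow: more than {self.max_key + 1} distinct values"
            )
        key = self.dictionary.setdefault(value, len(self.dictionary))
        self.keys.append(key)


# Tiny 2-bit signed keys: only keys 0 and 1 are usable.
b = DictionaryBuilder(key_bits=2)
for v in ["a", "b", "a", "c"]:
    if b.can_append(v):  # the caller opts in to checking, so no exception
        b.append(v)
print(b.keys)  # [0, 1, 0]
```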

[GitHub] [arrow-datafusion] DDtKey opened a new issue, #4981: Incorrect nested error wrapped to `ArrowError:External` variant

2023-01-19 Thread GitBox
DDtKey opened a new issue, #4981: URL: https://github.com/apache/arrow-datafusion/issues/4981 **Describe the bug** [This line of code](https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/physical_plan/joins/utils.rs#L695) makes it impossible to get the real reason of

[GitHub] [arrow] joosthooz commented on a diff in pull request #33738: GH-33737: [C++] simplify exec plan tracing

2023-01-19 Thread GitBox
joosthooz commented on code in PR #33738: URL: https://github.com/apache/arrow/pull/33738#discussion_r1081097788 ## cpp/src/arrow/compute/exec/sink_node.cc: ## @@ -335,19 +327,13 @@ class ConsumingSinkNode : public ExecNode, public BackpressureControl { void Resume()

[GitHub] [arrow-rs] tustvold opened a new pull request, #3563: Implement Extend for ArrayBuilder (#1841)

2023-01-19 Thread GitBox
tustvold opened a new pull request, #3563: URL: https://github.com/apache/arrow-rs/pull/3563 _Draft pending #3562_ # Which issue does this PR close? Closes #1841 # Rationale for this change # What changes are included in this PR? #

[GitHub] [arrow-rs] tustvold opened a new issue, #3562: Panic on Key Overflow in Dictionary Builders

2023-01-19 Thread GitBox
tustvold opened a new issue, #3562: URL: https://github.com/apache/arrow-rs/issues/3562 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** I suspect this will be a controversial issue, but there are a couple of reasons for

[GitHub] [arrow] pitrou commented on issue #15054: [CI][Python] wheel-manylinux2014-* sometimes crashed on pytest exit

2023-01-19 Thread GitBox
pitrou commented on issue #15054: URL: https://github.com/apache/arrow/issues/15054#issuecomment-1396772668 > …maybe we could just leak the S3 client as a workaround? We can probably do that indeed. We just have to make sure to disable S3 on ASAN and Valgrind CI jobs.

[GitHub] [arrow-rs] tustvold commented on issue #1858: Huge amount of llvm code generated by comparison kernels, potentially slowing compile times

2023-01-19 Thread GitBox
tustvold commented on issue #1858: URL: https://github.com/apache/arrow-rs/issues/1858#issuecomment-1396761193 I'm closing as these comparison kernels are now gated with feature flags, and crates can also choose not to depend on arrow-ord at all. Feel free to re-open if there is additional

[GitHub] [arrow-rs] tustvold closed issue #1858: Huge amount of llvm code generated by comparison kernels, potentially slowing compile times

2023-01-19 Thread GitBox
tustvold closed issue #1858: Huge amount of llvm code generated by comparison kernels, potentially slowing compile times URL: https://github.com/apache/arrow-rs/issues/1858

[GitHub] [arrow] thisisnic commented on a diff in pull request #33748: GH-33746: [R] Update NEWS.md for 11.0.0

2023-01-19 Thread GitBox
thisisnic commented on code in PR #33748: URL: https://github.com/apache/arrow/pull/33748#discussion_r1081073855 ## r/NEWS.md: ## @@ -19,6 +19,77 @@ # arrow 10.0.1.9000 +## New features + +### Docs + +* A substantial reorganisation, rewrite of and addition to, many of the

[GitHub] [arrow] ursabot commented on pull request #15210: GH-20512: [Python] Numpy conversion doesn't account for ListArray offset

2023-01-19 Thread GitBox
ursabot commented on PR #15210: URL: https://github.com/apache/arrow/pull/15210#issuecomment-1396745891 Benchmark runs are scheduled for baseline = 705e04bb15f481e476c9e7a8e2ac92460890ad0c and contender = 2b50694c10e09e4a1343b62c6b5f44ad4403d0e1. 2b50694c10e09e4a1343b62c6b5f44ad4403d0e1

[GitHub] [arrow] MMCMA commented on issue #15153: [Python] OSError: Couldn't deserialize thrift: TProtocolException

2023-01-19 Thread GitBox
MMCMA commented on issue #15153: URL: https://github.com/apache/arrow/issues/15153#issuecomment-1396725954 I can close the issue - I just discovered by chance that in very rare circumstances two processes were writing to the same file at the same time. Sorry about this.

[GitHub] [arrow] raulcd merged pull request #33751: WIP: [Release] Verify release-11.0.0-rc0

2023-01-19 Thread GitBox
raulcd merged PR #33751: URL: https://github.com/apache/arrow/pull/33751

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3365: Add csv-core based reader (#3338)

2023-01-19 Thread GitBox
tustvold commented on code in PR #3365: URL: https://github.com/apache/arrow-rs/pull/3365#discussion_r1081032053 ## arrow-csv/src/reader/records.rs: ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

[GitHub] [arrow-rs] DDtKey commented on a diff in pull request #3365: Add csv-core based reader (#3338)

2023-01-19 Thread GitBox
DDtKey commented on code in PR #3365: URL: https://github.com/apache/arrow-rs/pull/3365#discussion_r1081030906 ## arrow-csv/src/reader/records.rs: ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

[GitHub] [arrow] raulcd commented on issue #15054: [CI][Python] wheel-manylinux2014-* sometimes crashed on pytest exit

2023-01-19 Thread GitBox
raulcd commented on issue #15054: URL: https://github.com/apache/arrow/issues/15054#issuecomment-1396706119 As suggested by @lidavidm this is causing release verification (https://github.com/apache/arrow/pull/33751) to be a bit painful. I had to retry the `wheel-manylinux2014-cp39-amd64`

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3365: Add csv-core based reader (#3338)

2023-01-19 Thread GitBox
tustvold commented on code in PR #3365: URL: https://github.com/apache/arrow-rs/pull/3365#discussion_r1081025726 ## arrow-csv/src/reader/records.rs: ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

[GitHub] [arrow-rs] DDtKey commented on a diff in pull request #3365: Add csv-core based reader (#3338)

2023-01-19 Thread GitBox
DDtKey commented on code in PR #3365: URL: https://github.com/apache/arrow-rs/pull/3365#discussion_r1081013622 ## arrow-csv/src/reader/records.rs: ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

[GitHub] [arrow-rs] DDtKey commented on a diff in pull request #3365: Add csv-core based reader (#3338)

2023-01-19 Thread GitBox
DDtKey commented on code in PR #3365: URL: https://github.com/apache/arrow-rs/pull/3365#discussion_r1081004665 ## arrow-csv/src/reader/records.rs: ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

[GitHub] [arrow-rs] ursabot commented on pull request #3560: Update pyarrow method call with kwargs

2023-01-19 Thread GitBox
ursabot commented on PR #3560: URL: https://github.com/apache/arrow-rs/pull/3560#issuecomment-1396668572 Benchmark runs are scheduled for baseline = de62808a9d65e052ff3e89550bf780d952c8ceae and contender = d9802353f195979f7c6541143c7e849f5ac2d661. d9802353f195979f7c6541143c7e849f5ac2d661

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3561: Return reference from ListArray::values

2023-01-19 Thread GitBox
tustvold commented on code in PR #3561: URL: https://github.com/apache/arrow-rs/pull/3561#discussion_r1080992940 ## arrow-cast/src/cast.rs: ## @@ -4710,8 +4709,8 @@ mod tests { assert_eq!(1, arr.value_length(2)); assert_eq!(1, arr.value_length(3));

[GitHub] [arrow-rs] tustvold opened a new pull request, #3561: Return reference from ListArray::values

2023-01-19 Thread GitBox
tustvold opened a new pull request, #3561: URL: https://github.com/apache/arrow-rs/pull/3561 # Which issue does this PR close? Closes #. # Rationale for this change Returning an owned value not only results in unnecessary clones, but results in poor

[GitHub] [arrow] Ziy1-Tan commented on pull request #33781: re2::RE2::RE2() result must be checked

2023-01-19 Thread GitBox
Ziy1-Tan commented on PR #33781: URL: https://github.com/apache/arrow/pull/33781#issuecomment-1396660590 cc @kou

[GitHub] [arrow-rs] tustvold merged pull request #3560: Update pyarrow method call with kwargs

2023-01-19 Thread GitBox
tustvold merged PR #3560: URL: https://github.com/apache/arrow-rs/pull/3560

[GitHub] [arrow-rs] DDtKey commented on a diff in pull request #3365: Add csv-core based reader (#3338)

2023-01-19 Thread GitBox
DDtKey commented on code in PR #3365: URL: https://github.com/apache/arrow-rs/pull/3365#discussion_r1080981598 ## arrow-csv/src/reader/records.rs: ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3365: Add csv-core based reader (#3338)

2023-01-19 Thread GitBox
tustvold commented on code in PR #3365: URL: https://github.com/apache/arrow-rs/pull/3365#discussion_r1080984521 ## arrow-csv/src/reader/records.rs: ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.
