[jira] [Created] (ARROW-18189) [Python] Table.drop should support passing a single column
Alessandro Molina created ARROW-18189:
Summary: [Python] Table.drop should support passing a single column
Key: ARROW-18189
URL: https://issues.apache.org/jira/browse/ARROW-18189
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Alessandro Molina

{code}
----> 2 data = dataset.drop("Churn")

/usr/local/lib/python3.7/dist-packages/pyarrow/table.pxi in pyarrow.lib.Table.drop()

KeyError: "Column 'C' not found"
{code}

Also, for consistency, it would probably be good to have a {{Table.drop_column}} alias, as the other methods are named {{Table.add_column}} and {{Table.set_column}}.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-18005) [C++] Bind the JSON RecordBatchReader to Dataset
Alessandro Molina created ARROW-18005:
Summary: [C++] Bind the JSON RecordBatchReader to Dataset
Key: ARROW-18005
URL: https://issues.apache.org/jira/browse/ARROW-18005
Project: Apache Arrow
Issue Type: Sub-task
Components: C++
Reporter: Alessandro Molina
[jira] [Created] (ARROW-18003) [Python] Add sort_by to Table and RecordBatch
Alessandro Molina created ARROW-18003:
Summary: [Python] Add sort_by to Table and RecordBatch
Key: ARROW-18003
URL: https://issues.apache.org/jira/browse/ARROW-18003
Project: Apache Arrow
Issue Type: Sub-task
Components: Python
Reporter: Alessandro Molina
Fix For: 11.0.0
[jira] [Created] (ARROW-18002) [Python] Improve Sorting Capabilities in PyArrow
Alessandro Molina created ARROW-18002:
Summary: [Python] Improve Sorting Capabilities in PyArrow
Key: ARROW-18002
URL: https://issues.apache.org/jira/browse/ARROW-18002
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Alessandro Molina
[jira] [Created] (ARROW-17660) [Doc] pyarrow.Array.diff Examples is wrongly rendered
Alessandro Molina created ARROW-17660:
Summary: [Doc] pyarrow.Array.diff Examples is wrongly rendered
Key: ARROW-17660
URL: https://issues.apache.org/jira/browse/ARROW-17660
Project: Apache Arrow
Issue Type: Bug
Reporter: Alessandro Molina
Assignee: Alessandro Molina

See [https://arrow.apache.org/docs/python/generated/pyarrow.Array.html#pyarrow.Array.diff]

The output of the diff function is rendered outside of the code block.
[jira] [Created] (ARROW-17212) [Python] Support lazy Dataset.filter
Alessandro Molina created ARROW-17212:
Summary: [Python] Support lazy Dataset.filter
Key: ARROW-17212
URL: https://issues.apache.org/jira/browse/ARROW-17212
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Alessandro Molina
Assignee: Alessandro Molina
Fix For: 10.0.0

Where possible, we would like Dataset and Table to share a similar enough API that the most convenient operations can be performed on a Dataset without having to materialise it into a Table. It would therefore be good to add proper support for a {{Dataset.filter}} method, like the one we have in {{Table.filter}}.
[jira] [Created] (ARROW-16979) [Java] Further Consolidate JNI compilation
Alessandro Molina created ARROW-16979:
Summary: [Java] Further Consolidate JNI compilation
Key: ARROW-16979
URL: https://issues.apache.org/jira/browse/ARROW-16979
Project: Apache Arrow
Issue Type: Improvement
Reporter: Alessandro Molina
Fix For: 10.0.0
[jira] [Created] (ARROW-16882) Allow to pick comparison function for pyarrow joins
Alessandro Molina created ARROW-16882:
Summary: Allow to pick comparison function for pyarrow joins
Key: ARROW-16882
URL: https://issues.apache.org/jira/browse/ARROW-16882
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Affects Versions: 8.0.0
Reporter: Alessandro Molina
Assignee: Alessandro Molina
Fix For: 9.0.0

See https://github.com/apache/arrow/issues/13408

We probably want to allow end users to pick {{JoinKeyCmp::EQ}} or {{JoinKeyCmp::IS}} to perform the comparison.
[jira] [Created] (ARROW-16842) Lubridate features for version 10.0.0
Alessandro Molina created ARROW-16842:
Summary: Lubridate features for version 10.0.0
Key: ARROW-16842
URL: https://issues.apache.org/jira/browse/ARROW-16842
Project: Apache Arrow
Issue Type: Improvement
Reporter: Alessandro Molina
[jira] [Created] (ARROW-16841) [R] Additional Lubridate Capabilities
Alessandro Molina created ARROW-16841:
Summary: [R] Additional Lubridate Capabilities
Key: ARROW-16841
URL: https://issues.apache.org/jira/browse/ARROW-16841
Project: Apache Arrow
Issue Type: Bug
Components: C++, R
Affects Versions: 9.0.0
Reporter: Alessandro Molina

Umbrella Ticket for the remaining lubridate work
[jira] [Created] (ARROW-16616) [Python] Allow lazy evaluation of filters in Dataset and add Dataset.filter method
Alessandro Molina created ARROW-16616:
Summary: [Python] Allow lazy evaluation of filters in Dataset and add Dataset.filter method
Key: ARROW-16616
URL: https://issues.apache.org/jira/browse/ARROW-16616
Project: Apache Arrow
Issue Type: Sub-task
Components: Python
Reporter: Alessandro Molina
Fix For: 9.0.0

To keep the {{Dataset}} API compatible with the {{Table}} one in terms of analytics capabilities, we should add a {{Dataset.filter}} method.

The initial POC was based on {{_table_filter}}, but that required materialising all the {{Dataset}} content after filtering, as it returned an {{InMemoryDataset}}.

Given that {{Scanner}} can filter a dataset without actually materialising the data until a final step happens, it would be good to have {{Dataset.filter}} return some form of lazy dataset, where the filter is only stored aside and the {{Scanner}} is created when data is actually retrieved.
[jira] [Created] (ARROW-16518) [Python] Ensure _exec_plan.execplan preserves order of inputs
Alessandro Molina created ARROW-16518:
Summary: [Python] Ensure _exec_plan.execplan preserves order of inputs
Key: ARROW-16518
URL: https://issues.apache.org/jira/browse/ARROW-16518
Project: Apache Arrow
Issue Type: Sub-task
Components: Python
Reporter: Alessandro Molina
Fix For: 9.0.0

At the moment execplan doesn't guarantee any ordered output: the batches are consumed in a random order. This can lead to unordered outputs when `use_threads=True`.
[jira] [Created] (ARROW-16470) [Python][Doc] Document Table.filter capability in compute documentation
Alessandro Molina created ARROW-16470:
Summary: [Python][Doc] Document Table.filter capability in compute documentation
Key: ARROW-16470
URL: https://issues.apache.org/jira/browse/ARROW-16470
Project: Apache Arrow
Issue Type: Sub-task
Components: Documentation
Reporter: Alessandro Molina
Fix For: 9.0.0
[jira] [Created] (ARROW-16469) [Python] Extend Table.filter to accept Expressions
Alessandro Molina created ARROW-16469:
Summary: [Python] Extend Table.filter to accept Expressions
Key: ARROW-16469
URL: https://issues.apache.org/jira/browse/ARROW-16469
Project: Apache Arrow
Issue Type: Sub-task
Components: Python
Reporter: Alessandro Molina
Fix For: 9.0.0

If {{Table.filter}} receives an expression, it should invoke {{_exec_plan.filter_table}}. Also extend the docstring to reflect this change.
[jira] [Created] (ARROW-16468) [Python] Test the _exec_plan.filter_table helper with complex expressions
Alessandro Molina created ARROW-16468:
Summary: [Python] Test the _exec_plan.filter_table helper with complex expressions
Key: ARROW-16468
URL: https://issues.apache.org/jira/browse/ARROW-16468
Project: Apache Arrow
Issue Type: Sub-task
Reporter: Alessandro Molina

Create a comprehensive test suite for {{_exec_plan.filter_table}} with the primary purpose of testing its convenience and ease of use.

(PS: {{pc.field}} and {{pc.scalar}} should be used when building expressions, not {{Expression._field}} etc.)
[jira] [Created] (ARROW-16467) [Python] Allow execplan to handle Filter nodes
Alessandro Molina created ARROW-16467:
Summary: [Python] Allow execplan to handle Filter nodes
Key: ARROW-16467
URL: https://issues.apache.org/jira/browse/ARROW-16467
Project: Apache Arrow
Issue Type: Sub-task
Components: Python
Reporter: Alessandro Molina
Fix For: 9.0.0

Create a {{filter_table}} helper function in {{_exec_plan}} that allows passing a {{Table}} and an {{Expression}} to filter the table with the provided expression.
[jira] [Created] (ARROW-16383) [C++] Disable memory mapping by default in Arrow-C++
Alessandro Molina created ARROW-16383:
Summary: [C++] Disable memory mapping by default in Arrow-C++
Key: ARROW-16383
URL: https://issues.apache.org/jira/browse/ARROW-16383
Project: Apache Arrow
Issue Type: Sub-task
Reporter: Alessandro Molina

If there are places where Arrow-C++ performs reads or writes using memory mapping by default, we should make memory mapping optional there.
[jira] [Created] (ARROW-16382) [Python] Disable memory mapping by default in pyarrow
Alessandro Molina created ARROW-16382:
Summary: [Python] Disable memory mapping by default in pyarrow
Key: ARROW-16382
URL: https://issues.apache.org/jira/browse/ARROW-16382
Project: Apache Arrow
Issue Type: Sub-task
Reporter: Alessandro Molina

All reads and writes in PyArrow should be done without memory mapping unless explicitly enabled.
[jira] [Created] (ARROW-16381) [Python] Research if memory mapping is on by default in PyArrow
Alessandro Molina created ARROW-16381:
Summary: [Python] Research if memory mapping is on by default in PyArrow
Key: ARROW-16381
URL: https://issues.apache.org/jira/browse/ARROW-16381
Project: Apache Arrow
Issue Type: Sub-task
Reporter: Alessandro Molina
Assignee: Alvin Chunga Mamani
[jira] [Created] (ARROW-16380) Research where Memory Mapping is ON by default in Arrow-C++
Alessandro Molina created ARROW-16380:
Summary: Research where Memory Mapping is ON by default in Arrow-C++
Key: ARROW-16380
URL: https://issues.apache.org/jira/browse/ARROW-16380
Project: Apache Arrow
Issue Type: Sub-task
Reporter: Alessandro Molina
Assignee: Alvin Chunga Mamani

Check if there are places where memory mapping is actually ON by default in C++.
[jira] [Created] (ARROW-16379) [C++][Python] Change Memory Mapping to be off by default
Alessandro Molina created ARROW-16379:
Summary: [C++][Python] Change Memory Mapping to be off by default
Key: ARROW-16379
URL: https://issues.apache.org/jira/browse/ARROW-16379
Project: Apache Arrow
Issue Type: Improvement
Components: C++, Python
Reporter: Alessandro Molina
Assignee: Alvin Chunga Mamani
Fix For: 9.0.0

Having memory mapping on can cause problems on some kinds of file systems, such as NFS. We should make sure that by default memory mapping is off on every read and write.
[jira] [Created] (ARROW-16378) [Archery][CI] Add possibility to archery crossbow reports to send a Zulip notification report via a webhook
Alessandro Molina created ARROW-16378:
Summary: [Archery][CI] Add possibility to archery crossbow reports to send a Zulip notification report via a webhook
Key: ARROW-16378
URL: https://issues.apache.org/jira/browse/ARROW-16378
Project: Apache Arrow
Issue Type: Sub-task
Components: Archery, Continuous Integration, Developer Tools
Reporter: Raúl Cumplido
Assignee: Raúl Cumplido
Fix For: 9.0.0

As part of improving our nightly reports we can add to crossbow the ability to send a slack message with the status of the nightly builds using a slack webhook for easy integration with any slack platform.
[jira] [Created] (ARROW-16279) [Python] Support Expressions in `Table.filter`
Alessandro Molina created ARROW-16279:
Summary: [Python] Support Expressions in `Table.filter`
Key: ARROW-16279
URL: https://issues.apache.org/jira/browse/ARROW-16279
Project: Apache Arrow
Issue Type: New Feature
Components: Python
Reporter: Alessandro Molina
Assignee: Alessandro Molina
Fix For: 9.0.0

*Umbrella ticket*

At the moment {{Table.filter}} only accepts a mask, and building a mask that actually leads to the rows we care about can be complex and slow in cases where more than one compute function is used to generate the mask.

It would be helpful to be able to pass an {{Expression}} as the argument and get the table filtered by that expression, as expressions are easier to understand and reason about than masks.
[jira] [Created] (ARROW-16267) [Java] Support Java 18
Alessandro Molina created ARROW-16267:
Summary: [Java] Support Java 18
Key: ARROW-16267
URL: https://issues.apache.org/jira/browse/ARROW-16267
Project: Apache Arrow
Issue Type: Sub-task
Reporter: Alessandro Molina
Assignee: David Dali Susanibar Arce
[jira] [Created] (ARROW-16155) [R] lubridate functions for 9.0.0
Alessandro Molina created ARROW-16155:
Summary: [R] lubridate functions for 9.0.0
Key: ARROW-16155
URL: https://issues.apache.org/jira/browse/ARROW-16155
Project: Apache Arrow
Issue Type: Improvement
Components: R
Affects Versions: 8.0.0
Reporter: Alessandro Molina
Assignee: Dragoș Moldovan-Grünfeld
Fix For: 9.0.0

Umbrella ticket for lubridate functions in 9.0.0
[jira] [Created] (ARROW-16139) [Python] Crash in tests/test_dataset.py::test_write_dataset_s3
Alessandro Molina created ARROW-16139:
Summary: [Python] Crash in tests/test_dataset.py::test_write_dataset_s3
Key: ARROW-16139
URL: https://issues.apache.org/jira/browse/ARROW-16139
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 7.0.0
Reporter: Alessandro Molina
Fix For: 8.0.0

{code:java}
Fatal Python error: Segmentation fault

Thread 0x000117170e00 (most recent call first):
  File "/usr/local/lib/python3.9/site-packages/pyarrow/dataset.py", line 927 in write_dataset
  File "/usr/local/lib/python3.9/site-packages/pyarrow/tests/test_dataset.py", line 4265 in test_write_dataset_s3
  File "/usr/local/lib/python3.9/site-packages/_pytest/python.py", line 192 in pytest_pyfunc_call
  File "/usr/local/lib/python3.9/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/usr/local/lib/python3.9/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/usr/local/lib/python3.9/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/usr/local/lib/python3.9/site-packages/_pytest/python.py", line 1761 in runtest
  File "/usr/local/lib/python3.9/site-packages/_pytest/runner.py", line 166 in pytest_runtest_call
  File "/usr/local/lib/python3.9/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/usr/local/lib/python3.9/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/usr/local/lib/python3.9/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/usr/local/lib/python3.9/site-packages/_pytest/runner.py", line 259 in
  File "/usr/local/lib/python3.9/site-packages/_pytest/runner.py", line 338 in from_call
  File "/usr/local/lib/python3.9/site-packages/_pytest/runner.py", line 258 in call_runtest_hook
  File "/usr/local/lib/python3.9/site-packages/_pytest/runner.py", line 219 in call_and_report
  File "/usr/local/lib/python3.9/site-packages/_pytest/runner.py", line 130 in runtestprotocol
  File "/usr/local/lib/python3.9/site-packages/_pytest/runner.py", line 111 in pytest_runtest_protocol
  File "/usr/local/lib/python3.9/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/usr/local/lib/python3.9/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/usr/local/lib/python3.9/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/usr/local/lib/python3.9/site-packages/_pytest/main.py", line 347 in pytest_runtestloop
  File "/usr/local/lib/python3.9/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/usr/local/lib/python3.9/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/usr/local/lib/python3.9/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/usr/local/lib/python3.9/site-packages/_pytest/main.py", line 322 in _main
  File "/usr/local/lib/python3.9/site-packages/_pytest/main.py", line 268 in wrap_session
  File "/usr/local/lib/python3.9/site-packages/_pytest/main.py", line 315 in pytest_cmdline_main
  File "/usr/local/lib/python3.9/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/usr/local/lib/python3.9/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/usr/local/lib/python3.9/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/usr/local/lib/python3.9/site-packages/_pytest/config/__init__.py", line 164 in main
  File "/usr/local/lib/python3.9/site-packages/_pytest/config/__init__.py", line 187 in console_main
  File "/usr/local/bin/pytest", line 8 in
ci/scripts/python_test.sh: line 55: 20279 Segmentation fault: 11 pytest -r s -v ${PYTEST_ARGS} --pyargs pyarrow
tests/test_dataset.py::test_write_dataset_s3
{code}
[jira] [Created] (ARROW-16074) [Docs] Document joins in PyArrow
Alessandro Molina created ARROW-16074:
Summary: [Docs] Document joins in PyArrow
Key: ARROW-16074
URL: https://issues.apache.org/jira/browse/ARROW-16074
Project: Apache Arrow
Issue Type: Sub-task
Reporter: Alessandro Molina
[jira] [Created] (ARROW-15578) [Java][Doc] Document C Data and how to interface with other languages.
Alessandro Molina created ARROW-15578:
Summary: [Java][Doc] Document C Data and how to interface with other languages.
Key: ARROW-15578
URL: https://issues.apache.org/jira/browse/ARROW-15578
Project: Apache Arrow
Issue Type: Sub-task
Components: Documentation
Reporter: Alessandro Molina
Assignee: David Dali Susanibar Arce
Fix For: 8.0.0

Document how to use the C-Data interface to pass data to Python
[jira] [Created] (ARROW-15577) [Java][Doc] Apache Arrow Flight
Alessandro Molina created ARROW-15577:
Summary: [Java][Doc] Apache Arrow Flight
Key: ARROW-15577
URL: https://issues.apache.org/jira/browse/ARROW-15577
Project: Apache Arrow
Issue Type: Sub-task
Components: Documentation
Reporter: Alessandro Molina
Assignee: David Dali Susanibar Arce
Fix For: 8.0.0

Tutorial on how to use Arrow Flight in Java
[jira] [Created] (ARROW-15576) [Java][Doc] VectorSchemaRoots for 2D data
Alessandro Molina created ARROW-15576:
Summary: [Java][Doc] VectorSchemaRoots for 2D data
Key: ARROW-15576
URL: https://issues.apache.org/jira/browse/ARROW-15576
Project: Apache Arrow
Issue Type: Sub-task
Reporter: Alessandro Molina
Assignee: David Dali Susanibar Arce
Fix For: 8.0.0

Document VectorSchemaRoot and how to use it
[jira] [Created] (ARROW-15575) [Java][Doc] Datasets Tutorial
Alessandro Molina created ARROW-15575:
Summary: [Java][Doc] Datasets Tutorial
Key: ARROW-15575
URL: https://issues.apache.org/jira/browse/ARROW-15575
Project: Apache Arrow
Issue Type: Sub-task
Components: Documentation
Reporter: Alessandro Molina
Assignee: David Dali Susanibar Arce
Fix For: 8.0.0

Document how to use Datasets in Java
[jira] [Created] (ARROW-15574) [Java][Doc] Review existing documentation
Alessandro Molina created ARROW-15574:
Summary: [Java][Doc] Review existing documentation
Key: ARROW-15574
URL: https://issues.apache.org/jira/browse/ARROW-15574
Project: Apache Arrow
Issue Type: Sub-task
Components: Documentation
Reporter: Alessandro Molina
Assignee: David Dali Susanibar Arce
Fix For: 8.0.0

Review existing documentation ( https://arrow.apache.org/docs/java/ ) to make sure it’s easy to follow and complete (does it cover how to work with all possible types?)
[jira] [Created] (ARROW-15573) [Java][Doc] Apache Arrow memory management
Alessandro Molina created ARROW-15573:
Summary: [Java][Doc] Apache Arrow memory management
Key: ARROW-15573
URL: https://issues.apache.org/jira/browse/ARROW-15573
Project: Apache Arrow
Issue Type: Sub-task
Components: Documentation
Reporter: Alessandro Molina
Assignee: David Dali Susanibar Arce
Fix For: 8.0.0

- Apache Arrow memory management:
  - Apache Arrow Java [off-heap](https://github.com/apache/arrow/blob/ccffcea3fd383c448aa9da292baf2d0805ecab4d/java/memory/memory-core/pom.xml#L23) reference implementation with examples:
    - Figure out a use case where we could show the power of off-heap versus common Java heap implementations (for example: off-heap performs well for faster start-up)
  - Apache Arrow best practices for working without problems with the off-heap implementation
[jira] [Created] (ARROW-15572) [Java][Doc]
Alessandro Molina created ARROW-15572:
Summary: [Java][Doc]
Key: ARROW-15572
URL: https://issues.apache.org/jira/browse/ARROW-15572
Project: Apache Arrow
Issue Type: Sub-task
Components: Documentation
Reporter: Alessandro Molina
Assignee: David Dali Susanibar Arce
Fix For: 8.0.0

Getting Started section (see the Python one at https://arrow.apache.org/docs/python/getstarted.html for inspiration)
[jira] [Created] (ARROW-15527) [Python] Make Joins able to execute the join operation
Alessandro Molina created ARROW-15527:
Summary: [Python] Make Joins able to execute the join operation
Key: ARROW-15527
URL: https://issues.apache.org/jira/browse/ARROW-15527
Project: Apache Arrow
Issue Type: Sub-task
Components: Python
Reporter: Alessandro Molina
Assignee: Alessandro Molina
Fix For: 8.0.0
[jira] [Created] (ARROW-15526) [Python] Make joins able to output a Dataset as a result
Alessandro Molina created ARROW-15526:
Summary: [Python] Make joins able to output a Dataset as a result
Key: ARROW-15526
URL: https://issues.apache.org/jira/browse/ARROW-15526
Project: Apache Arrow
Issue Type: Sub-task
Components: Python
Reporter: Alessandro Molina
Assignee: Alessandro Molina
Fix For: 8.0.0
[jira] [Created] (ARROW-15525) [Python] Make joins able to output a Table as result.
Alessandro Molina created ARROW-15525:
Summary: [Python] Make joins able to output a Table as result.
Key: ARROW-15525
URL: https://issues.apache.org/jira/browse/ARROW-15525
Project: Apache Arrow
Issue Type: Sub-task
Components: Python
Reporter: Alessandro Molina
Assignee: Alessandro Molina
Fix For: 8.0.0
[jira] [Created] (ARROW-15524) [Python] Make joins able to receive Tables as inputs
Alessandro Molina created ARROW-15524:
Summary: [Python] Make joins able to receive Tables as inputs
Key: ARROW-15524
URL: https://issues.apache.org/jira/browse/ARROW-15524
Project: Apache Arrow
Issue Type: Sub-task
Components: Python
Reporter: Alessandro Molina
Assignee: Alessandro Molina
Fix For: 8.0.0
[jira] [Created] (ARROW-15432) [Python] Address CSV docstrings
Alessandro Molina created ARROW-15432:
Summary: [Python] Address CSV docstrings
Key: ARROW-15432
URL: https://issues.apache.org/jira/browse/ARROW-15432
Project: Apache Arrow
Issue Type: Sub-task
Components: Python
Reporter: Alessandro Molina
Assignee: Alenka Frim
Fix For: 8.0.0

Ensure /docs/python/generated/pyarrow.csv.read_csv.html has an {{Examples}} section
[jira] [Created] (ARROW-15431) [Python] Address docstrings in Schema
Alessandro Molina created ARROW-15431:
Summary: [Python] Address docstrings in Schema
Key: ARROW-15431
URL: https://issues.apache.org/jira/browse/ARROW-15431
Project: Apache Arrow
Issue Type: Sub-task
Components: Python
Reporter: Alessandro Molina
Assignee: Alenka Frim
Fix For: 8.0.0

Ensure all docstrings of classes and methods in /docs/python/generated/pyarrow.Schema.html have an {{Examples}} section.
[jira] [Created] (ARROW-15430) [Python] Address docstrings in Filesystem classes and functions
Alessandro Molina created ARROW-15430:
Summary: [Python] Address docstrings in Filesystem classes and functions
Key: ARROW-15430
URL: https://issues.apache.org/jira/browse/ARROW-15430
Project: Apache Arrow
Issue Type: Sub-task
Components: Python
Reporter: Alessandro Molina
Assignee: Alenka Frim
Fix For: 8.0.0

Ensure all docstrings in https://arrow.apache.org/docs/python/api/files.html and https://arrow.apache.org/docs/python/api/filesystems.html have an {{Examples}} section
[jira] [Created] (ARROW-15429) [Python] Address docstrings in Table and related classes and functions
Alessandro Molina created ARROW-15429:
Summary: [Python] Address docstrings in Table and related classes and functions
Key: ARROW-15429
URL: https://issues.apache.org/jira/browse/ARROW-15429
Project: Apache Arrow
Issue Type: Sub-task
Components: Python
Reporter: Alessandro Molina
Assignee: Alenka Frim
Fix For: 8.0.0

Address docstrings of all classes and functions mentioned in https://arrow.apache.org/docs/python/api/tables.html
[jira] [Created] (ARROW-15428) [Python] Address docstrings in Parquet classes and functions
Alessandro Molina created ARROW-15428:
Summary: [Python] Address docstrings in Parquet classes and functions
Key: ARROW-15428
URL: https://issues.apache.org/jira/browse/ARROW-15428
Project: Apache Arrow
Issue Type: Sub-task
Components: Python
Reporter: Alessandro Molina
Assignee: Alenka Frim
Fix For: 8.0.0

Address docstrings of all classes and functions referenced in https://arrow.apache.org/docs/python/api/formats.html#parquet-files
[jira] [Created] (ARROW-15369) [Doc] Follow-up of ARROW-14671
Alessandro Molina created ARROW-15369:
Summary: [Doc] Follow-up of ARROW-14671
Key: ARROW-15369
URL: https://issues.apache.org/jira/browse/ARROW-15369
Project: Apache Arrow
Issue Type: Improvement
Components: Documentation
Affects Versions: 7.0.0
Reporter: Alessandro Molina
Assignee: Alessandro Molina

Follow up with fixes for ARROW-14671. The original ticket was merged when the snippets couldn't be verified, due to changes in rpy2 and in the pointer import/export feature in Arrow. The last time they were checked they were wrong and could even trigger segfaults, so we need to recheck them and eventually tweak what's now invalid.
[jira] [Created] (ARROW-15367) [Python] Improve Classes and Methods Docstrings
Alessandro Molina created ARROW-15367:
Summary: [Python] Improve Classes and Methods Docstrings
Key: ARROW-15367
URL: https://issues.apache.org/jira/browse/ARROW-15367
Project: Apache Arrow
Issue Type: Improvement
Reporter: Alessandro Molina
Assignee: Alenka Frim

Initiative aimed at improving the docstrings of methods and classes, especially from the point of view of ensuring they have an {{Examples}} section
[jira] [Created] (ARROW-15185) Arrow R doesn't build correctly if Arrow C++ was configured with lowercase flags
Alessandro Molina created ARROW-15185:
Summary: Arrow R doesn't build correctly if Arrow C++ was configured with lowercase flags
Key: ARROW-15185
URL: https://issues.apache.org/jira/browse/ARROW-15185
Project: Apache Arrow
Issue Type: Bug
Components: R
Reporter: Alessandro Molina
Assignee: Alessandro Molina
Fix For: 7.0.0

I usually configure Arrow C++ using lowercase flag values ({{"on"}} instead of {{"ON"}}). That works fine with the C++ library and the Python library, but leads to build errors in R.

That's because the R build process looks for compile flags in {{ArrowOptions.cmake}} and those compile flags are compared with the uppercase ones.

So in my {{ArrowOptions.cmake}} I had
{code:java}
### Build the Parquet libraries
set(ARROW_PARQUET "on")
{code}
while the R build script was looking for
{code:java}
grep 'set(ARROW_PARQUET "ON")' $ARROW_OPTS_CMAKE
{code}
This can be addressed by making those greps case-insensitive.
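In the shell script itself the change would amount to adding {{-i}} to the {{grep}} calls. The same idea is sketched below in Python, using only the standard library; the {{option_enabled}} helper and the sample file contents are hypothetical and only illustrate the case-insensitive comparison:

```python
import re

# Hypothetical contents of ArrowOptions.cmake, configured with lowercase "on".
arrow_opts_cmake = '''### Build the Parquet libraries
set(ARROW_PARQUET "on")
'''


def option_enabled(cmake_text: str, name: str) -> bool:
    """Return True if `name` is set to ON in the text, ignoring case."""
    pattern = rf'set\({re.escape(name)} "ON"\)'
    return re.search(pattern, cmake_text, flags=re.IGNORECASE) is not None


print(option_enabled(arrow_opts_cmake, "ARROW_PARQUET"))
```

As with {{grep -i}}, the whole pattern is matched case-insensitively, so the option name is matched regardless of case as well.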
[jira] [Created] (ARROW-15180) Document how to add JNI bindings for C++ features
Alessandro Molina created ARROW-15180:
Summary: Document how to add JNI bindings for C++ features
Key: ARROW-15180
URL: https://issues.apache.org/jira/browse/ARROW-15180
Project: Apache Arrow
Issue Type: Sub-task
Components: Documentation, Java
Reporter: Alessandro Molina
Assignee: David Dali Susanibar Arce
Fix For: 8.0.0
[jira] [Created] (ARROW-15179) Ensure Support for modern Java versions
Alessandro Molina created ARROW-15179:
Summary: Ensure Support for modern Java versions
Key: ARROW-15179
URL: https://issues.apache.org/jira/browse/ARROW-15179
Project: Apache Arrow
Issue Type: Improvement
Components: Java
Reporter: Alessandro Molina
Assignee: David Dali Susanibar Arce
Fix For: 8.0.0

*Umbrella ticket for supporting recent java versions*
[jira] [Created] (ARROW-15178) Developer Docs for Java
Alessandro Molina created ARROW-15178:
Summary: Developer Docs for Java
Key: ARROW-15178
URL: https://issues.apache.org/jira/browse/ARROW-15178
Project: Apache Arrow
Issue Type: Sub-task
Components: Documentation
Reporter: Alessandro Molina
Assignee: David Dali Susanibar Arce
Fix For: 8.0.0

Inspired by [https://arrow.apache.org/docs/developers/python.html] and [https://arrow.apache.org/docs/developers/cpp/index.html], we should have a similar doc for Java.
[jira] [Created] (ARROW-15177) Check which Java versions we are packaging for
Alessandro Molina created ARROW-15177:
Summary: Check which Java versions we are packaging for
Key: ARROW-15177
URL: https://issues.apache.org/jira/browse/ARROW-15177
Project: Apache Arrow
Issue Type: Sub-task
Components: Java
Reporter: Alessandro Molina
Assignee: David Dali Susanibar Arce
Fix For: 8.0.0

Check if we are building multiple artifacts for Java and eventually which Java versions we are packaging for.
[jira] [Created] (ARROW-15176) Check which versions of Java Arrow currently support
Alessandro Molina created ARROW-15176:
Summary: Check which versions of Java Arrow currently support
Key: ARROW-15176
URL: https://issues.apache.org/jira/browse/ARROW-15176
Project: Apache Arrow
Issue Type: Sub-task
Components: Java
Reporter: Alessandro Molina
Assignee: David Dali Susanibar Arce
Fix For: 8.0.0
[jira] [Created] (ARROW-15174) Consolidate Java JNI compilation
Alessandro Molina created ARROW-15174: - Summary: Consolidate Java JNI compilation Key: ARROW-15174 URL: https://issues.apache.org/jira/browse/ARROW-15174 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Alessandro Molina Assignee: David Dali Susanibar Arce Fix For: 8.0.0 *Umbrella ticket for consolidating Java JNI compilation initiative* It seems we have spread the JNI code across the {{cpp}} and {{java}} directories. For other bindings (Python) we already discussed that it would be great to consolidate and move all C++ code related to Python into PyArrow. We should do something equivalent for Java and move all C++ code specific to Java into the Java project. At the moment there are two JNI related directories: * [https://github.com/apache/arrow/tree/master/java/c] * [https://github.com/apache/arrow/tree/master/cpp/src/jni] Let's also research the best method to build those. The {{java/c}} directory seems to be already integrated with the Java build process; let's check whether that approach can be reused for the {{dataset}} directory too -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15164) [R
Alessandro Molina created ARROW-15164: - Summary: [R Key: ARROW-15164 URL: https://issues.apache.org/jira/browse/ARROW-15164 Project: Apache Arrow Issue Type: Improvement Reporter: Alessandro Molina -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15163) [R] lubridate functions for 8.0.0
Alessandro Molina created ARROW-15163: - Summary: [R] lubridate functions for 8.0.0 Key: ARROW-15163 URL: https://issues.apache.org/jira/browse/ARROW-15163 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Alessandro Molina Assignee: Dragoș Moldovan-Grünfeld Fix For: 8.0.0 *Umbrella ticket for the Initiative aimed at reaching support for the most important lubridate functions in the R bindings* -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15158) [R] stringr functions for 8.0.0
Alessandro Molina created ARROW-15158: - Summary: [R] stringr functions for 8.0.0 Key: ARROW-15158 URL: https://issues.apache.org/jira/browse/ARROW-15158 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Alessandro Molina Assignee: Dragoș Moldovan-Grünfeld Fix For: 8.0.0 *Umbrella ticket for the Initiative aimed at reaching support for the most important stringr functions in the R bindings* -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15157) [Doc] New Contributors Guide v2
Alessandro Molina created ARROW-15157: - Summary: [Doc] New Contributors Guide v2 Key: ARROW-15157 URL: https://issues.apache.org/jira/browse/ARROW-15157 Project: Apache Arrow Issue Type: Improvement Components: Documentation Reporter: Alessandro Molina Assignee: Alenka Frim Fix For: 8.0.0 Based on the feedback gathered through ARROW-14278 and the survey conducted among users, the New Contributors Guide can be further improved and extended. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15156) [Doc] Implement Tutorials for the Java Documentation
Alessandro Molina created ARROW-15156: - Summary: [Doc] Implement Tutorials for the Java Documentation Key: ARROW-15156 URL: https://issues.apache.org/jira/browse/ARROW-15156 Project: Apache Arrow Issue Type: Improvement Components: Documentation Reporter: Alessandro Molina Assignee: David Dali Susanibar Arce Fix For: 8.0.0 The Java Documentation has a reference and it's getting a "how to" part thanks to the cookbook, but it does little to explain how things work and what's available in the library. We should extend the documentation with tutorials for the most important capabilities of the library, showing what's available and how it works. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15155) [Doc] Add support for serving multiple versions of the R documentation
Alessandro Molina created ARROW-15155: - Summary: [Doc] Add support for serving multiple versions of the R documentation Key: ARROW-15155 URL: https://issues.apache.org/jira/browse/ARROW-15155 Project: Apache Arrow Issue Type: Improvement Components: Documentation Affects Versions: 7.0.0 Reporter: Alessandro Molina Assignee: Nicola Crane Fix For: 8.0.0 The Python, Java and C++ documentation have support for serving multiple versions of the docs and switching between them thanks to the newly added version switcher. The same doesn't apply to the R docs as they are not implemented in Sphinx and thus can't benefit from the version switcher implemented for Sphinx. We should provide an equivalent version switcher for the R docs. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-14937) [Doc] The docs about building docs doesn't mention mounting the target directory
Alessandro Molina created ARROW-14937: - Summary: [Doc] The docs about building docs doesn't mention mounting the target directory Key: ARROW-14937 URL: https://issues.apache.org/jira/browse/ARROW-14937 Project: Apache Arrow Issue Type: Bug Components: Documentation Affects Versions: 6.0.1 Reporter: Alessandro Molina When building with docker ( [https://arrow.apache.org/docs/developers/documentation.html#building-with-docker] ) the output goes into the {{/build}} directory of the container. The documentation states that the output will be available in {{docs/_build/html}} but for that to be true you need to mount that local directory as {{/build}} to get the output available to your own system, or it will remain in the container and thus won't be accessible. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-14881) [C++][Doc] Warnings in Doxygen
Alessandro Molina created ARROW-14881: - Summary: [C++][Doc] Warnings in Doxygen Key: ARROW-14881 URL: https://issues.apache.org/jira/browse/ARROW-14881 Project: Apache Arrow Issue Type: Bug Components: Documentation Reporter: Alessandro Molina When building the doxygen apidoc for C++ I get a few warnings that forced me to disable {{WARN_AS_ERROR}} option to be able to proceed with building the docs. Is it an actual issue or something stray in my local environment? Here is a preview of the warnings {code:java} $ doxygen warning: Tag 'COLS_IN_ALPHA_INDEX' at line 1118 of file 'Doxyfile' has become obsolete. To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u" /Users/amol/ARROW/arrow/cpp/src/arrow/flight/Flight.pb.h:2741: warning: no uniquely matching class member found for void arrow::flight::protocol::HandshakeRequest::clear_protocol_version() /Users/amol/ARROW/arrow/cpp/src/arrow/flight/Flight.pb.h:2744: warning: no uniquely matching class member found for PROTOBUF_NAMESPACE_ID::uint64 arrow::flight::protocol::HandshakeRequest::_internal_protocol_version() const /Users/amol/ARROW/arrow/cpp/src/arrow/flight/Flight.pb.h:2747: warning: no uniquely matching class member found for PROTOBUF_NAMESPACE_ID::uint64 arrow::flight::protocol::HandshakeRequest::protocol_version() const /Users/amol/ARROW/arrow/cpp/src/arrow/flight/Flight.pb.h:2751: warning: no uniquely matching class member found for void arrow::flight::protocol::HandshakeRequest::_internal_set_protocol_version(::PROTOBUF_NAMESPACE_ID::uint64 value) /Users/amol/ARROW/arrow/cpp/src/arrow/flight/Flight.pb.h:2755: warning: no uniquely matching class member found for void arrow::flight::protocol::HandshakeRequest::set_protocol_version(::PROTOBUF_NAMESPACE_ID::uint64 value) /Users/amol/ARROW/arrow/cpp/src/arrow/flight/Flight.pb.h:2761: warning: no uniquely matching class member found for void arrow::flight::protocol::HandshakeRequest::clear_payload() 
/Users/amol/ARROW/arrow/cpp/src/arrow/flight/Flight.pb.h:2764: warning: no uniquely matching class member found for const std::string & arrow::flight::protocol::HandshakeRequest::payload() const /Users/amol/ARROW/arrow/cpp/src/arrow/flight/Flight.pb.h:2768: warning: no uniquely matching class member found for void arrow::flight::protocol::HandshakeRequest::set_payload(const std::string ) /Users/amol/ARROW/arrow/cpp/src/arrow/flight/Flight.pb.h:2772: warning: no uniquely matching class member found for std::string * arrow::flight::protocol::HandshakeRequest::mutable_payload() {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-14738) [Doc][Python] Make return types clickable
Alessandro Molina created ARROW-14738: - Summary: [Doc][Python] Make return types clickable Key: ARROW-14738 URL: https://issues.apache.org/jira/browse/ARROW-14738 Project: Apache Arrow Issue Type: Sub-task Components: Documentation Reporter: Alessandro Molina Assignee: Alessandro Molina Fix For: 7.0.0 At the moment return types in the Python functions documentation are not clickable and thus it's hard to get details about something returned by a function. We should make them clickable. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-14674) [Python][Doc] Connecting Python to Rust
Alessandro Molina created ARROW-14674: - Summary: [Python][Doc] Connecting Python to Rust Key: ARROW-14674 URL: https://issues.apache.org/jira/browse/ARROW-14674 Project: Apache Arrow Issue Type: Sub-task Components: Documentation Reporter: Alessandro Molina Assignee: Alessandro Molina Fix For: 7.0.0 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-14673) [Python][Doc] Connecting Python to C++
Alessandro Molina created ARROW-14673: - Summary: [Python][Doc] Connecting Python to C++ Key: ARROW-14673 URL: https://issues.apache.org/jira/browse/ARROW-14673 Project: Apache Arrow Issue Type: Sub-task Components: Documentation Reporter: Alessandro Molina Assignee: Alessandro Molina Fix For: 7.0.0 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-14672) [Python][Doc] Connecting Python to Java through CData
Alessandro Molina created ARROW-14672: - Summary: [Python][Doc] Connecting Python to Java through CData Key: ARROW-14672 URL: https://issues.apache.org/jira/browse/ARROW-14672 Project: Apache Arrow Issue Type: Sub-task Components: Documentation Reporter: Alessandro Molina Assignee: Alessandro Molina Fix For: 7.0.0 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-14671) [Python][Doc] Connecting Python to R through CData interface
Alessandro Molina created ARROW-14671: - Summary: [Python][Doc] Connecting Python to R through CData interface Key: ARROW-14671 URL: https://issues.apache.org/jira/browse/ARROW-14671 Project: Apache Arrow Issue Type: Sub-task Components: Documentation Reporter: Alessandro Molina Assignee: Alessandro Molina Fix For: 7.0.0 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-14656) [Python] Add sort helper method to StructArray
Alessandro Molina created ARROW-14656: - Summary: [Python] Add sort helper method to StructArray Key: ARROW-14656 URL: https://issues.apache.org/jira/browse/ARROW-14656 Project: Apache Arrow Issue Type: Sub-task Reporter: Alessandro Molina Sorting the results of an aggregation is a common need. The group_by method of tables returns a {{StructArray}} to provide multiple aggregations at once. Given that we don't support directly sorting a {{StructArray}}, a few lines of code are necessary for a simple sort operation. It would be helpful to provide a helper method in {{StructArray}} that offers a one-line solution for the simplest cases. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-14655) Add Cookbook recipe on group_by + sort
Alessandro Molina created ARROW-14655: - Summary: Add Cookbook recipe on group_by + sort Key: ARROW-14655 URL: https://issues.apache.org/jira/browse/ARROW-14655 Project: Apache Arrow Issue Type: Sub-task Reporter: Alessandro Molina -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-14608) [Python][C++] Provide access to hash_aggregate functions through a group_by function
Alessandro Molina created ARROW-14608: - Summary: [Python][C++] Provide access to hash_aggregate functions through a group_by function Key: ARROW-14608 URL: https://issues.apache.org/jira/browse/ARROW-14608 Project: Apache Arrow Issue Type: Sub-task Components: Python Affects Versions: 6.0.0 Reporter: Alessandro Molina Assignee: Alessandro Molina Fix For: 7.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14607) Basic Group By functionality in pyarrow
Alessandro Molina created ARROW-14607: - Summary: Basic Group By functionality in pyarrow Key: ARROW-14607 URL: https://issues.apache.org/jira/browse/ARROW-14607 Project: Apache Arrow Issue Type: Improvement Reporter: Alessandro Molina Pyarrow currently lacks a group by functionality even though all the building blocks are available. It would be good to expose a very simple to use group_by function leaving more complex use cases to the execution engine once IR execution is available. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14553) [Doc] Java Cookbook Release 1
Alessandro Molina created ARROW-14553: - Summary: [Doc] Java Cookbook Release 1 Key: ARROW-14553 URL: https://issues.apache.org/jira/browse/ARROW-14553 Project: Apache Arrow Issue Type: Improvement Components: Documentation Reporter: Alessandro Molina Assignee: David Dali Susanibar Arce Fix For: 7.0.0 JAVA Recipes regarding the core sections of the Cookbook document ( [https://docs.google.com/document/d/1v-jK_9osnLvAnAjLOM_frgzakjFhLpUi8OC0MlKpxzw/edit#] ) * Reading and Writing Data * Creating Arrow Objects -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14337) Arrow doesn't build on M1 when SIMD acceleration is enabled
Alessandro Molina created ARROW-14337: - Summary: Arrow doesn't build on M1 when SIMD acceleration is enabled Key: ARROW-14337 URL: https://issues.apache.org/jira/browse/ARROW-14337 Project: Apache Arrow Issue Type: Improvement Affects Versions: 6.0.0 Reporter: Alessandro Molina Assignee: Krisztian Szucs Fix For: 7.0.0 There is a build error in C++ that seems related to XSIMD. An issue was opened on XSIMD ( [https://github.com/xtensor-stack/xsimd/issues/597] ) which now looks resolved. It's necessary to test if Arrow now builds with the new XSIMD release. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14322) [Doc] Add Python doc on how to connect Python to other languages
Alessandro Molina created ARROW-14322: - Summary: [Doc] Add Python doc on how to connect Python to other languages Key: ARROW-14322 URL: https://issues.apache.org/jira/browse/ARROW-14322 Project: Apache Arrow Issue Type: Improvement Components: Documentation Reporter: Alessandro Molina Assignee: Alessandro Molina Fix For: 7.0.0 As one of the core values of Arrow is to make data portable, it would be great to have a documentation section on how to connect across languages. Connecting through pipes, files or networking is fairly straightforward, but we don't document how to connect using shared memory or even how to connect using cross language communication through the C API. It would be good to have at least docs for how to do that across Python->Java and Python->R -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14309) [Python] CompressedInputStream doesn't support str or file objects
Alessandro Molina created ARROW-14309: - Summary: [Python] CompressedInputStream doesn't support str or file objects Key: ARROW-14309 URL: https://issues.apache.org/jira/browse/ARROW-14309 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Alessandro Molina Assignee: Alessandro Molina Fix For: 7.0.0 While in {{CompressedOutputStream}} we support providing {{str}} or {{file}} objects, for {{CompressedInputStream}} we currently don't, which makes the class harder to use. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14294) [Doc][Python] Add tutorial on Flight to pyarrow documentation
Alessandro Molina created ARROW-14294: - Summary: [Doc][Python] Add tutorial on Flight to pyarrow documentation Key: ARROW-14294 URL: https://issues.apache.org/jira/browse/ARROW-14294 Project: Apache Arrow Issue Type: Improvement Components: Documentation Reporter: Alessandro Molina Assignee: Alessandro Molina Fix For: 7.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14293) Basic Join functionality in PyArrow
Alessandro Molina created ARROW-14293: - Summary: Basic Join functionality in PyArrow Key: ARROW-14293 URL: https://issues.apache.org/jira/browse/ARROW-14293 Project: Apache Arrow Issue Type: Improvement Reporter: Alessandro Molina Fix For: 7.0.0 We want to expose {{Table.join}} and {{Dataset.join}} in PyArrow, leveraging the join feature of the ExecPlan. {{Table.join}} can easily return a new {{Table}}; what {{Dataset.join}} should return is a more complex question, as it probably doesn't make much sense to return a new {{Dataset}} given that the result won't map to any files on disk -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14292) Expose ExecPlan to pyarrow
Alessandro Molina created ARROW-14292: - Summary: Expose ExecPlan to pyarrow Key: ARROW-14292 URL: https://issues.apache.org/jira/browse/ARROW-14292 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Alessandro Molina Assignee: Krisztian Szucs Fix For: 7.0.0 At the moment pyarrow doesn't provide any way to leverage the query execution engine that the C++ layer provides. We want to expose the capability to execute plans that have been built by IBIS through a flatbuffers dump. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14281) How to Review PRs Guidelines
Alessandro Molina created ARROW-14281: - Summary: How to Review PRs Guidelines Key: ARROW-14281 URL: https://issues.apache.org/jira/browse/ARROW-14281 Project: Apache Arrow Issue Type: Improvement Reporter: Alessandro Molina Assignee: Antoine Pitrou Fix For: 7.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14279) PyArrow Architectural Overview
Alessandro Molina created ARROW-14279: - Summary: PyArrow Architectural Overview Key: ARROW-14279 URL: https://issues.apache.org/jira/browse/ARROW-14279 Project: Apache Arrow Issue Type: Sub-task Reporter: Alessandro Molina Assignee: Alessandro Molina Fix For: 7.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14280) R-Arrow Architectural Overview
Alessandro Molina created ARROW-14280: - Summary: R-Arrow Architectural Overview Key: ARROW-14280 URL: https://issues.apache.org/jira/browse/ARROW-14280 Project: Apache Arrow Issue Type: Improvement Reporter: Alessandro Molina -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14278) New Contributors Guide
Alessandro Molina created ARROW-14278: - Summary: New Contributors Guide Key: ARROW-14278 URL: https://issues.apache.org/jira/browse/ARROW-14278 Project: Apache Arrow Issue Type: Improvement Reporter: Alessandro Molina Fix For: 7.0.0 Umbrella Issue for the Guide for new contributors for Python and R -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14277) R Tutorials 2021-Q4 Initiative
Alessandro Molina created ARROW-14277: - Summary: R Tutorials 2021-Q4 Initiative Key: ARROW-14277 URL: https://issues.apache.org/jira/browse/ARROW-14277 Project: Apache Arrow Issue Type: Improvement Reporter: Alessandro Molina Fix For: 7.0.0 An umbrella ticket for the initiative of writing up a set of Tutorials for R users -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14255) [Python] FlightClient.do_action is a generator instead of returning one.
Alessandro Molina created ARROW-14255: - Summary: [Python] FlightClient.do_action is a generator instead of returning one. Key: ARROW-14255 URL: https://issues.apache.org/jira/browse/ARROW-14255 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 5.0.0 Reporter: Alessandro Molina Fix For: 6.0.0 According to the documentation ( https://arrow.apache.org/docs/python/generated/pyarrow.flight.FlightClient.html#pyarrow.flight.FlightClient.do_action ) I would expect to be able to invoke {{client.do_action}} and get back an iterator, while it seems that the function is in fact a generator itself. This makes a big difference: if it returns an iterator, I expect to be able to call {{client.do_action(...)}} and just ignore the result if I don't care about it. But if it's a generator, ignoring the result means the body is never executed and thus the server won't do anything. It would make sense to make {{do_action}} return a generator instead of being a generator itself; that would keep backward compatibility and would avoid puzzling users who don't see their code being invoked when they call {{client.do_action(...)}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
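The difference described in this ticket can be reproduced in plain Python, without Flight. The two functions below are hypothetical stand-ins, not the pyarrow implementation: one is a generator function, the other performs its side effect immediately and then returns a generator.

```python
calls = []  # records which "actions" actually ran


def do_action_as_generator(action):
    # Generator function: nothing in this body runs until iteration starts.
    calls.append(action)
    yield f"result of {action}"


def do_action_returning_generator(action):
    # Regular function: the side effect happens at call time.
    calls.append(action)

    def results():
        yield f"result of {action}"

    return results()


do_action_as_generator("first")          # result ignored, body never runs
do_action_returning_generator("second")  # result ignored, body still ran
print(calls)  # ['second']
```

Callers that do iterate see the same results from both variants, which is why the second form stays backward compatible.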
[jira] [Created] (ARROW-14002) [Python] unify_schema should accept tuples too
Alessandro Molina created ARROW-14002: - Summary: [Python] unify_schema should accept tuples too Key: ARROW-14002 URL: https://issues.apache.org/jira/browse/ARROW-14002 Project: Apache Arrow Issue Type: Improvement Components: Python Affects Versions: 5.0.0 Reporter: Alessandro Molina Calling {{unify_schemas}} with a tuple fails, but there is no good reason to restrict the argument to lists only. {code} TypeError: Argument 'schemas' has incorrect type (expected list, got tuple) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13883) [Python] Allow usage of Arrow arrays as masks of other arrays
Alessandro Molina created ARROW-13883: - Summary: [Python] Allow usage of Arrow arrays as masks of other arrays Key: ARROW-13883 URL: https://issues.apache.org/jira/browse/ARROW-13883 Project: Apache Arrow Issue Type: Improvement Components: Python Affects Versions: 5.0.0 Reporter: Alessandro Molina At the moment we only allow using {{numpy}} arrays for masks when building new arrays. It would be helpful to allow using a {{pyarrow.Array}} too so that the user has more flexibility and we can avoid involving {{numpy}} in documentation examples for the sole purpose of creating masks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13835) [Python] Add utility to merge schemas
Alessandro Molina created ARROW-13835: - Summary: [Python] Add utility to merge schemas Key: ARROW-13835 URL: https://issues.apache.org/jira/browse/ARROW-13835 Project: Apache Arrow Issue Type: Improvement Components: Python Affects Versions: 5.0.0 Reporter: Alessandro Molina In R we have the `unify_schemas` utility, which allows merging two or more schemas. Such a feature would be helpful in Python too, but we don't seem to provide it. It already exists in C++ ( https://github.com/apache/arrow/blob/27affa3181708c4f800f9d0a70603fb3390d6462/cpp/src/arrow/type.cc#L1722-L1744 ) and should be exposed in PyArrow and documented under https://arrow.apache.org/docs/python/api/datatypes.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13832) [Doc][Python] Improve compute documentation
Alessandro Molina created ARROW-13832: - Summary: [Doc][Python] Improve compute documentation Key: ARROW-13832 URL: https://issues.apache.org/jira/browse/ARROW-13832 Project: Apache Arrow Issue Type: Improvement Components: Documentation Affects Versions: 5.0.0 Reporter: Alessandro Molina Assignee: Alessandro Molina The documentation page is mostly empty ( https://arrow.apache.org/docs/python/compute.html ) while the API reference is fairly complete ( https://arrow.apache.org/docs/python/api/compute.html ). It would be a good idea to add a bit more context and maybe directly recap the compute functions in the doc page -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13783) Improve Table.to_string (and maybe __repr__) to also preview data of the table
Alessandro Molina created ARROW-13783: - Summary: Improve Table.to_string (and maybe __repr__) to also preview data of the table Key: ARROW-13783 URL: https://issues.apache.org/jira/browse/ARROW-13783 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 5.0.0 Reporter: Alessandro Molina Fix For: 6.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13755) [Python] Allow usage of field_names in partitioning when saving datasets
Alessandro Molina created ARROW-13755: - Summary: [Python] Allow usage of field_names in partitioning when saving datasets Key: ARROW-13755 URL: https://issues.apache.org/jira/browse/ARROW-13755 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Alessandro Molina When loading back datasets, it's possible to quickly provide the name of the columns for which data was partitioned using {code} partitioning=pyarrow.dataset.partitioning(field_names=["year"]) {code} this is convenient because it's easier and quicker than providing the whole schema, which can still be autodetected from the loaded data. On the other hand, we don't support this when _saving_ data. If you provide {{field_names}} instead of the {{schema}} you will get an error {code} pyarrow/dataset.py in _ensure_write_partitioning(scheme) 684 if not isinstance(scheme, Partitioning): 685 # TODO support passing field names, and get types from schema --> 686 raise ValueError("partitioning needs to be actual Partitioning object") 687 return scheme 688 {code} It would be convenient to allow using only {{field_names}} when saving too, as we can automatically detect the schema from the table that we are saving. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13753) [Doc][Cookbook] Filtering Arrays for values matching a mask filter - Python
Alessandro Molina created ARROW-13753: - Summary: [Doc][Cookbook] Filtering Arrays for values matching a mask filter - Python Key: ARROW-13753 URL: https://issues.apache.org/jira/browse/ARROW-13753 Project: Apache Arrow Issue Type: Sub-task Reporter: Alessandro Molina Assignee: Alessandro Molina -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13754) [Doc][Cookbook] Filtering Arrays for values matching a mask filter - R
Alessandro Molina created ARROW-13754: - Summary: [Doc][Cookbook] Filtering Arrays for values matching a mask filter - R Key: ARROW-13754 URL: https://issues.apache.org/jira/browse/ARROW-13754 Project: Apache Arrow Issue Type: Sub-task Reporter: Alessandro Molina Assignee: Nic Crane -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13752) [Doc][Cookbook] Searching for values matching a predicate in Arrays - R
Alessandro Molina created ARROW-13752: - Summary: [Doc][Cookbook] Searching for values matching a predicate in Arrays - R Key: ARROW-13752 URL: https://issues.apache.org/jira/browse/ARROW-13752 Project: Apache Arrow Issue Type: Sub-task Reporter: Alessandro Molina Assignee: Nic Crane -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13751) [Doc][Cookbook] Searching for values matching a predicate in Arrays - Python
Alessandro Molina created ARROW-13751: - Summary: [Doc][Cookbook] Searching for values matching a predicate in Arrays - Python Key: ARROW-13751 URL: https://issues.apache.org/jira/browse/ARROW-13751 Project: Apache Arrow Issue Type: Sub-task Reporter: Alessandro Molina Assignee: Alessandro Molina -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13749) [Doc][Cookbook] Work with temporal data (lubridate functions) - R
Alessandro Molina created ARROW-13749: - Summary: [Doc][Cookbook] Work with temporal data (lubridate functions) - R Key: ARROW-13749 URL: https://issues.apache.org/jira/browse/ARROW-13749 Project: Apache Arrow Issue Type: Sub-task Reporter: Alessandro Molina Assignee: Nic Crane -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13750) [Doc][Cookbook] Call an Arrow compute function which doesn't yet have an R binding - R
Alessandro Molina created ARROW-13750: - Summary: [Doc][Cookbook] Call an Arrow compute function which doesn't yet have an R binding - R Key: ARROW-13750 URL: https://issues.apache.org/jira/browse/ARROW-13750 Project: Apache Arrow Issue Type: Sub-task Reporter: Alessandro Molina Assignee: Nic Crane -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13748) [Doc][Cookbook] Work with character data (stringr functions and Arrow functions) - R
Alessandro Molina created ARROW-13748: - Summary: [Doc][Cookbook] Work with character data (stringr functions and Arrow functions) - R Key: ARROW-13748 URL: https://issues.apache.org/jira/browse/ARROW-13748 Project: Apache Arrow Issue Type: Sub-task Reporter: Alessandro Molina Assignee: Nic Crane -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13732) [Doc][Cookbook] Manipulating and analyze Arrow data with dplyr verbs - R
Alessandro Molina created ARROW-13732: - Summary: [Doc][Cookbook] Manipulating and analyze Arrow data with dplyr verbs - R Key: ARROW-13732 URL: https://issues.apache.org/jira/browse/ARROW-13732 Project: Apache Arrow Issue Type: Sub-task Reporter: Alessandro Molina Assignee: Nic Crane -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13731) [Doc][Cookbook] Adding a column to an existing Table - R
Alessandro Molina created ARROW-13731: - Summary: [Doc][Cookbook] Adding a column to an existing Table - R Key: ARROW-13731 URL: https://issues.apache.org/jira/browse/ARROW-13731 Project: Apache Arrow Issue Type: Sub-task Reporter: Alessandro Molina Assignee: Nic Crane -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13730) [Doc][Cookbook] Adding a column to an existing Table - Python
Alessandro Molina created ARROW-13730: - Summary: [Doc][Cookbook] Adding a column to an existing Table - Python Key: ARROW-13730 URL: https://issues.apache.org/jira/browse/ARROW-13730 Project: Apache Arrow Issue Type: Sub-task Reporter: Alessandro Molina Assignee: Alessandro Molina -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13727) [Doc][Cookbook] Appending Tables to an existing Table - Python
Alessandro Molina created ARROW-13727: - Summary: [Doc][Cookbook] Appending Tables to an existing Table - Python Key: ARROW-13727 URL: https://issues.apache.org/jira/browse/ARROW-13727 Project: Apache Arrow Issue Type: Sub-task Reporter: Alessandro Molina Assignee: Alessandro Molina -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13728) [Doc][Cookbook] Appending Tables to an existing Table - R
Alessandro Molina created ARROW-13728: - Summary: [Doc][Cookbook] Appending Tables to an existing Table - R Key: ARROW-13728 URL: https://issues.apache.org/jira/browse/ARROW-13728 Project: Apache Arrow Issue Type: Sub-task Reporter: Alessandro Molina Assignee: Nic Crane -- This message was sent by Atlassian Jira (v8.3.4#803005)