[jira] [Created] (ARROW-13067) [C++][Compute] Implement integer to decimal casting
Yibo Cai created ARROW-13067:

Summary: [C++][Compute] Implement integer to decimal casting
Key: ARROW-13067
URL: https://issues.apache.org/jira/browse/ARROW-13067
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Affects Versions: 4.0.1
Reporter: Yibo Cai
Fix For: 5.0.0

The current cast kernel supports decimal-to-integer casting, but not integer-to-decimal casting.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
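A minimal Python sketch of the semantics such a cast would need, assuming Arrow's model of a decimal as an unscaled integer plus a (precision, scale) pair: the integer is multiplied by 10^scale and the result must then fit in `precision` digits. The function name and the raise-on-overflow behavior are hypothetical; this is not Arrow's kernel or API.

```python
from decimal import Decimal
from typing import List, Optional


def cast_int_to_decimal(values: List[Optional[int]], precision: int, scale: int) -> List[Optional[Decimal]]:
    """Hypothetical sketch: cast integers (None = null) to decimal(precision, scale).

    A decimal is modeled as an unscaled integer plus (precision, scale), as in
    Arrow's decimal types. Assumes 0 <= scale <= precision; raises on overflow.
    """
    if not 0 <= scale <= precision:
        raise ValueError("this sketch assumes 0 <= scale <= precision")
    limit = 10 ** precision  # unscaled values must have at most `precision` digits
    out: List[Optional[Decimal]] = []
    for v in values:
        if v is None:
            out.append(None)  # nulls pass through unchanged
            continue
        unscaled = v * 10 ** scale  # make room for `scale` fractional digits
        if abs(unscaled) >= limit:
            raise OverflowError(f"{v} does not fit in decimal({precision}, {scale})")
        # building from a string is exact (no context rounding is applied)
        out.append(Decimal(f"{unscaled}E-{scale}"))
    return out
```

Each value is built from a string because arithmetic on `Decimal` is rounded to the default context precision of 28 digits, which is below decimal128's maximum precision of 38.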
[jira] [Created] (ARROW-13066) [Docs] Describe supported OS + client languages
Jonathan Keane created ARROW-13066:

Summary: [Docs] Describe supported OS + client languages
Key: ARROW-13066
URL: https://issues.apache.org/jira/browse/ARROW-13066
Project: Apache Arrow
Issue Type: Improvement
Components: Documentation
Reporter: Jonathan Keane

We document which operating systems we support (https://arrow.apache.org/install/#c-and-glib-c-packages-for-debian-gnulinux-ubuntu-and-centos) and which Python versions we support (https://arrow.apache.org/docs/developers/python.html). We should also cover other languages (e.g. R), along with some philosophy or criteria for how long we support each version (Python versions until they are EOLed, R versions going back 5 years, ...).

Additionally: "It would be good to make a note somewhere of (1) platforms we test on, (2) platforms we have worked to support in the past but don't have CI for (e.g. solaris, ibm something that came up recently, raspbian per Nic's issue, etc.), and (3) what one can do if they want to add support for some other platform not listed" -Neal
[jira] [Created] (ARROW-13065) [Packaging][RPM] Add missing required LZ4 version information
Kouhei Sutou created ARROW-13065:

Summary: [Packaging][RPM] Add missing required LZ4 version information
Key: ARROW-13065
URL: https://issues.apache.org/jira/browse/ARROW-13065
Project: Apache Arrow
Issue Type: Improvement
Components: Packaging
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
[jira] [Created] (ARROW-13064) [C++] Add a general "if, ifelse, ..., else" kernel
Ian Cook created ARROW-13064:

Summary: [C++] Add a general "if, ifelse, ..., else" kernel
Key: ARROW-13064
URL: https://issues.apache.org/jira/browse/ARROW-13064
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Ian Cook

ARROW-10640 added a ternary {{if_else}} kernel. Add another kernel that extends this concept to an arbitrary number of conditions and associated results: a vectorized {{if-ifelse-...-else}} with any number of {{ifelse}} clauses and an optional {{else}}. This is like a SQL {{CASE}} statement.

How best to achieve this is not obvious. To enable SQL-style uses, it would be most efficient to implement this as a variadic kernel where the even-numbered arguments (0, 2, ...) are the arrays of boolean conditions, the odd-numbered arguments (1, 3, ...) are the corresponding arrays of results, and the final argument is the {{else}} result. But I'm not sure whether this is practical. Alternatively, we could implement this to operate on list arrays, like NumPy's {{[np.where|https://numpy.org/doc/stable/reference/generated/numpy.where.html]}} or {{[np.select|https://numpy.org/doc/stable/reference/generated/numpy.select.html]}}.
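A row-wise Python sketch of the variadic argument layout described above, using plain lists in place of Arrow arrays and None for null. The name `case_when` is hypothetical, and treating a null condition as false (as SQL CASE does) is an assumption rather than something the issue specifies.

```python
from typing import List, Sequence


def case_when(*args: Sequence) -> List:
    """Hypothetical sketch of an "if, ifelse, ..., else" kernel over equal-length lists.

    Even positions are boolean condition arrays, odd positions are the matching
    result arrays, and an optional trailing argument is the "else" array.
    """
    has_else = len(args) % 2 == 1
    pairs = args[:-1] if has_else else args
    if not pairs:
        raise ValueError("need at least one condition/result pair")
    conds, results = pairs[0::2], pairs[1::2]
    out = []
    for i in range(len(pairs[0])):
        for cond, res in zip(conds, results):
            if cond[i]:  # first true condition wins; None (null) is treated as false
                out.append(res[i])
                break
        else:
            # no condition matched: take the else value, or null if there is none
            out.append(args[-1][i] if has_else else None)
    return out
```

The list-array alternative corresponds more closely to NumPy's `np.select(condlist, choicelist, default)`, which takes the conditions and the choices as two separate lists instead of interleaving them.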
[jira] [Created] (ARROW-13063) [Dev] Link our nightly reports info to Jira
Jonathan Keane created ARROW-13063:

Summary: [Dev] Link our nightly reports info to Jira
Key: ARROW-13063
URL: https://issues.apache.org/jira/browse/ARROW-13063
Project: Apache Arrow
Issue Type: Sub-task
Components: Developer Tools
Reporter: Jonathan Keane

Jira has an API; we should use it to match Jira tickets with builds (e.g. "this build will be fixed when issue Y is solved, so don't alert unless Y is marked as resolved and the build is still failing"). This is similar to ARROW-8043, but is specific to this information repository.
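A sketch of the alerting rule quoted above, assuming each build carries at most one linked Jira key and that resolution status has already been fetched (for example through Jira's REST API, which is not called here). `BuildStatus`, `should_alert`, and treating an unknown key as unresolved are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BuildStatus:
    name: str
    failing: bool
    linked_issue: Optional[str] = None  # hypothetical: Jira key explaining the failure


def should_alert(build: BuildStatus, issue_resolved: Dict[str, bool]) -> bool:
    """Alert only for failing builds with no linked issue, or whose linked issue is resolved."""
    if not build.failing:
        return False
    if build.linked_issue is None:
        return True  # failing with no known cause: alert as usual
    # a known, still-open issue explains the failure: stay quiet until it is resolved
    return issue_resolved.get(build.linked_issue, False)
```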
[jira] [Created] (ARROW-13062) [Dev] Add a way for people to add information to our saved crossbow data
Jonathan Keane created ARROW-13062:

Summary: [Dev] Add a way for people to add information to our saved crossbow data
Key: ARROW-13062
URL: https://issues.apache.org/jira/browse/ARROW-13062
Project: Apache Arrow
Issue Type: Sub-task
Components: Developer Tools
Reporter: Jonathan Keane

We should have a simple, lightweight way to annotate specific builds with information like "won't be fixed until dask has a new release" or "this is supposed to be fixed in ARROW-XXX".

We *should not* require, ask, or allow people to add this information to the JSON that is saved as part of ARROW-13059. That JSON should be kept pristine and free of manual edits. Instead, we should have a plain-text lookup file that matches notes to specific builds (maybe to specific dates?).
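One possible shape for the plain-text lookup file, sketched in Python: a tab-separated file with a build name, a date (or `*` for every date), and a free-text note per line, with `#` lines as comments. The file format and the function names are hypothetical; the issue leaves them open.

```python
import csv
import io
from typing import Dict, List, Tuple

Notes = Dict[Tuple[str, str], List[str]]


def load_build_notes(text: str) -> Notes:
    """Parse lines of the form: build name <TAB> date (YYYY-MM-DD or *) <TAB> note."""
    notes: Notes = {}
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        build, day, note = row  # a malformed line raises ValueError
        notes.setdefault((build, day), []).append(note)
    return notes


def notes_for(notes: Notes, build: str, day: str) -> List[str]:
    """Return date-specific notes first, then notes that apply on every date."""
    return notes.get((build, day), []) + notes.get((build, "*"), [])
```

Keeping the notes in a separate file means the saved JSON is never edited by hand; the two are joined only when a report is generated.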
[jira] [Created] (ARROW-13061) [Dev] Display our nightly analysis somewhere so that people can see it
Jonathan Keane created ARROW-13061:

Summary: [Dev] Display our nightly analysis somewhere so that people can see it
Key: ARROW-13061
URL: https://issues.apache.org/jira/browse/ARROW-13061
Project: Apache Arrow
Issue Type: Sub-task
Components: Developer Tools
Reporter: Jonathan Keane

Create a Markdown report that uses the saved nightly data to build a static site that we can publish on gh-pages (of ursacomputing/crossbow, or of whichever other ursacomputing repository stores the nightly build data).
[jira] [Created] (ARROW-13060) [Dev] Process the saved nightly build data
Jonathan Keane created ARROW-13060:

Summary: [Dev] Process the saved nightly build data
Key: ARROW-13060
URL: https://issues.apache.org/jira/browse/ARROW-13060
Project: Apache Arrow
Issue Type: Sub-task
Components: Developer Tools
Reporter: Jonathan Keane

Once ARROW-13059 is done and the historical data for each build is saved somewhere reliable and accessible, we can analyze it. (We can even use Arrow to do that!) At a minimum, we should compute:
* days since the last passing build
* the last commit that passed
* % of failures in the past week, month, and year
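A sketch of the three metrics, assuming each saved record has a date, a commit hash, and a pass/fail flag (the actual fields will depend on what ARROW-13059 saves). Windows are counted in days ending today; treating a month as 30 days and a year as 365 days is an approximation. All names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Run:
    day: date
    commit: str
    passed: bool


def days_since_last_pass(runs: List[Run], today: date) -> Optional[int]:
    passes = [r.day for r in runs if r.passed]
    return (today - max(passes)).days if passes else None  # None: never passed


def last_passing_commit(runs: List[Run]) -> Optional[str]:
    passes = [r for r in runs if r.passed]
    return max(passes, key=lambda r: r.day).commit if passes else None


def failure_rate(runs: List[Run], today: date, window_days: int) -> Optional[float]:
    """Percentage of failed runs in the `window_days` days ending today."""
    start = today - timedelta(days=window_days)
    recent = [r for r in runs if start < r.day <= today]
    if not recent:
        return None
    return 100.0 * sum(not r.passed for r in recent) / len(recent)
```

The week, month, and year figures would then be `failure_rate(runs, today, 7)`, `failure_rate(runs, today, 30)`, and `failure_rate(runs, today, 365)`.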
[jira] [Created] (ARROW-13059) Adapt the crossbow code to save build status to json
Jonathan Keane created ARROW-13059:

Summary: Adapt the crossbow code to save build status to json
Key: ARROW-13059
URL: https://issues.apache.org/jira/browse/ARROW-13059
Project: Apache Arrow
Issue Type: Sub-task
Reporter: Jonathan Keane

Add to or adapt the code that {{archery crossbow}} already uses to send the email report so that it also saves the status of the builds to a JSON file and commits that file to a new branch in the crossbow repository (or some other ursacomputing repository).

Crossbow code (hint: this is the code you will want to copy and adapt for this new task): https://github.com/apache/arrow/blob/master/dev/archery/archery/crossbow/reports.py

This is how the nightly jobs are triggered: https://github.com/ursacomputing/crossbow/blob/master/.github/workflows/nightly_report.yml (note that it [figures out what the job id is|https://github.com/ursacomputing/crossbow/blob/master/.github/workflows/nightly_report.yml#L33-L34] and then runs the command {{archery crossbow report ...}}).

The archery CLI interface is specified in https://github.com/apache/arrow/blob/master/dev/archery/archery/crossbow/cli.py

Ultimately, what we want is something like a new command, e.g. {{archery crossbow save-report-data}}, that uses code and approaches similar to those of the report, but saves the data as JSON (or line-delimited JSON) somewhere reliable (i.e. the ursacomputing/crossbow repository or a new repository under ursacomputing).
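A minimal sketch of the line-delimited JSON option, assuming the report code can yield one dict per task. `append_report_data` and the record fields are hypothetical, not archery's API, and committing the file to a branch is left out. Appending one object per line lets each nightly run add records without rewriting earlier ones.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Iterable


def append_report_data(path: Path, job_id: str, statuses: Iterable[Dict[str, str]]) -> int:
    """Append one JSON object per build to a line-delimited JSON file.

    `statuses` items are hypothetical dicts such as {"task": ..., "status": ...}.
    Returns the number of lines written.
    """
    saved_at = datetime.now(timezone.utc).isoformat()
    count = 0
    with path.open("a", encoding="utf-8") as f:
        for status in statuses:
            record = {"job_id": job_id, "saved_at": saved_at, **status}
            f.write(json.dumps(record, sort_keys=True) + "\n")  # one object per line
            count += 1
    return count
```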
[jira] [Created] (ARROW-13058) [Dev] Improve nightly build visibility
Jonathan Keane created ARROW-13058:

Summary: [Dev] Improve nightly build visibility
Key: ARROW-13058
URL: https://issues.apache.org/jira/browse/ARROW-13058
Project: Apache Arrow
Issue Type: Improvement
Components: Developer Tools
Reporter: Jonathan Keane
[jira] [Created] (ARROW-13057) [MATLAB] Update feather reading/writing functionality to use the latest MATLAB Data Array (MDA) APIs instead of the old-style C matrix API
tahsin hassan created ARROW-13057:

Summary: [MATLAB] Update feather reading/writing functionality to use the latest MATLAB Data Array (MDA) APIs instead of the old-style C matrix API
Key: ARROW-13057
URL: https://issues.apache.org/jira/browse/ARROW-13057
Project: Apache Arrow
Issue Type: Task
Components: MATLAB
Reporter: tahsin hassan
Assignee: Kevin Gurney

The current featherreadmex and featherwritemex functions use MATLAB's old C-style matrix API [https://www.mathworks.com/help/matlab/cc-mx-matrix-library.html?s_tid=CRUX_lftnav]. We should update them to build against the latest MATLAB Data Array (MDA) C++ API, since it uses modern C++ semantics and design patterns and avoids data copies whenever possible.
[jira] [Created] (ARROW-13056) Expand PR labeler for supported language - MATLAB
tahsin hassan created ARROW-13056:

Summary: Expand PR labeler for supported language - MATLAB
Key: ARROW-13056
URL: https://issues.apache.org/jira/browse/ARROW-13056
Project: Apache Arrow
Issue Type: Improvement
Components: Developer Tools
Reporter: tahsin hassan

The PR labeler YAML file [https://github.com/apache/arrow/blob/master/.github/workflows/dev_pr/labeler.yml], introduced in https://issues.apache.org/jira/browse/ARROW-10616, needs to be updated to include MATLAB as a supported language.
[jira] [Created] (ARROW-13055) [Format] Document "canonical extension type" and criteria
Neal Richardson created ARROW-13055:

Summary: [Format] Document "canonical extension type" and criteria
Key: ARROW-13055
URL: https://issues.apache.org/jira/browse/ARROW-13055
Project: Apache Arrow
Issue Type: New Feature
Components: Documentation, Format
Reporter: Neal Richardson
Fix For: 5.0.0

See discussion at [https://lists.apache.org/thread.html/r7ba08aed2809fa64537e6f44bce38b2cf740acbef0e91cfaa7c19767%40%3Cdev.arrow.apache.org%3E] and then again at [https://lists.apache.org/thread.html/r108ac130406b3e63ca23a60b8e79285857355f8342232ad226a6571a%40%3Cdev.arrow.apache.org%3E]
[jira] [Created] (ARROW-13054) [C++] Add TemporalOptions
Nic Crane created ARROW-13054:

Summary: [C++] Add TemporalOptions
Key: ARROW-13054
URL: https://issues.apache.org/jira/browse/ARROW-13054
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Nic Crane

Could we implement TemporalOptions for the day_of_week kernel, so that we can specify the first day of the week? This would determine which day of the week each of the integers 0-6 represents when calling day_of_week on a date.
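A sketch of how a configurable week start changes the numbering, using Python's `date.weekday()` convention (0 = Monday ... 6 = Sunday). The `week_start` parameter is hypothetical, and whether Arrow's `day_of_week` uses the same default numbering is not stated in the issue.

```python
from datetime import date


def day_of_week(d: date, week_start: int = 0) -> int:
    """Number days 0-6 starting from `week_start` (0 = Monday ... 6 = Sunday)."""
    if not 0 <= week_start <= 6:
        raise ValueError("week_start must be in 0..6")
    # shift Python's Monday-based numbering so that `week_start` maps to 0
    return (d.weekday() - week_start) % 7
```

With `week_start=6`, Sunday becomes 0 and Monday becomes 1.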
[jira] [Created] (ARROW-13053) Build fails on MacOS Big Sur using homebrewed Arrow libraries
Dorian Kind created ARROW-13053:

Summary: Build fails on MacOS Big Sur using homebrewed Arrow libraries
Key: ARROW-13053
URL: https://issues.apache.org/jira/browse/ARROW-13053
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 4.0.1
Environment: MacOS BigSur 11.4 (Apple Silicon), Python 3.9.5, apache-arrow 4.0.1 (via Homebrew)
Reporter: Dorian Kind

When installing pyarrow 4.0.1 from source, the install step fails with

{{error: can't copy 'build/lib.macosx-11.3-arm64-3.9/pyarrow/include/arrow': doesn't exist or not a regular file}}

because the headers directory {{build/lib.macosx-11.3-arm64-3.9/pyarrow/include/arrow}} is a relative symlink to {{../Cellar/apache-arrow/4.0.1/include/arrow}}, which does not resolve from inside the build directory.

I believe this is caused by the build system bundling the header files from {{/opt/homebrew/include/arrow}}, which is itself that relative symlink:

{code}
ls -hl /opt/homebrew/include/arrow
lrwxr-xr-x 1 dki admin 42B Jun 8 15:35 /opt/homebrew/include/arrow -> ../Cellar/apache-arrow/4.0.1/include/arrow
{code}

I was able to work around this issue by changing line 334 of {{CMakeLists.txt}} from

{code}
# Always bundle includes
file(COPY ${ARROW_INCLUDE_DIR}/arrow DESTINATION ${BUILD_OUTPUT_ROOT_DIRECTORY}/include)
{code}

to

{code}
# Always bundle includes
get_filename_component(REAL_ARROW_INCLUDE_DIR "${ARROW_INCLUDE_DIR}/arrow" REALPATH)
file(COPY ${REAL_ARROW_INCLUDE_DIR} DESTINATION ${BUILD_OUTPUT_ROOT_DIRECTORY}/include)
{code}

But I'm not familiar with CMake, so there may be a more appropriate way to fix this.
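A Python sketch of why resolving the real path helps: a relative symlink such as `../Cellar/...` only resolves from the directory that contains it, so copying the link itself into the build tree leaves it dangling, while resolving it first (the role `get_filename_component(... REALPATH)` plays in the workaround) copies the actual headers. `copy_resolved` is hypothetical and mirrors only the idea; it does not reproduce CMake's `file(COPY)`. It assumes a POSIX system where symlinks can be created.

```python
import os
import shutil
from pathlib import Path


def copy_resolved(src: Path, dest_dir: Path) -> Path:
    """Copy the directory that `src` points to (following symlinks) into `dest_dir`."""
    real = Path(os.path.realpath(src))  # e.g. .../Cellar/apache-arrow/4.0.1/include/arrow
    target = dest_dir / src.name  # keep the original name ("arrow")
    shutil.copytree(real, target)  # copies real files, not the relative link
    return target
```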
[jira] [Created] (ARROW-13052) [C++][Gandiva] Implements REGEXP_EXTRACT function
Anthony Louis Gotlib Ferreira created ARROW-13052:

Summary: [C++][Gandiva] Implements REGEXP_EXTRACT function
Key: ARROW-13052
URL: https://issues.apache.org/jira/browse/ARROW-13052
Project: Apache Arrow
Issue Type: New Feature
Components: C++ - Gandiva
Reporter: Anthony Louis Gotlib Ferreira
Assignee: Anthony Louis Gotlib Ferreira

Implement the REGEXP_EXTRACT function based on [the Hive implementation|https://www.revisitclass.com/hadoop/regexp_extract-function-in-hive-with-examples/].
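A sketch of the expected semantics using Python's `re` module, assuming Hive's behavior of returning the requested capture group of the first match (0 = whole match) and an empty string when nothing matches. Python's regex dialect is not identical to Java's, which Hive uses, so this only approximates the function; the name and the null handling are assumptions.

```python
import re
from typing import Optional


def regexp_extract(subject: Optional[str], pattern: str, index: int) -> Optional[str]:
    """Return group `index` of the first match of `pattern` in `subject`."""
    if subject is None:
        return None  # null in, null out
    m = re.search(pattern, subject)
    if m is None:
        return ""  # no match: empty string
    # an out-of-range index raises IndexError; a group that did not participate gives None
    return m.group(index)
```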
[jira] [Created] (ARROW-13051) [Release][Packaging] Update the java post release task to use the crossbow artifacts
Krisztian Szucs created ARROW-13051:

Summary: [Release][Packaging] Update the java post release task to use the crossbow artifacts
Key: ARROW-13051
URL: https://issues.apache.org/jira/browse/ARROW-13051
Project: Apache Arrow
Issue Type: Improvement
Components: Developer Tools, Packaging
Reporter: Krisztian Szucs

We produce Java jars using crossbow tasks. Ideally, we should download and deploy these packages instead of compiling them locally during the Java post-release task.

See the produced jars at: https://github.com/ursacomputing/crossbow/releases/tag/actions-496-github-java-jars

See more context at: https://github.com/apache/arrow/pull/10411
[jira] [Created] (ARROW-13050) [C++][Gandiva] Implement SPACE Hive function on Gandiva
João Pedro Antunes Ferreira created ARROW-13050:

Summary: [C++][Gandiva] Implement SPACE Hive function on Gandiva
Key: ARROW-13050
URL: https://issues.apache.org/jira/browse/ARROW-13050
Project: Apache Arrow
Issue Type: Task
Components: C++ - Gandiva
Reporter: João Pedro Antunes Ferreira
Assignee: João Pedro Antunes Ferreira

Implement SPACE Hive function on Gandiva
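A sketch of the expected behavior in Python: Hive's SPACE(n) returns a string of n spaces. Returning null for a null input and an empty string for a non-positive n are assumptions worth checking against Hive before implementing; the function name is hypothetical.

```python
from typing import Optional


def space(n: Optional[int]) -> Optional[str]:
    """Return a string of `n` spaces; null stays null, n <= 0 gives ""."""
    if n is None:
        return None
    return " " * max(n, 0)  # clamp negatives to an empty string
```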
[jira] [Created] (ARROW-13049) [C++][Gandiva] Implement BIN Hive function on Gandiva
João Pedro Antunes Ferreira created ARROW-13049:

Summary: [C++][Gandiva] Implement BIN Hive function on Gandiva
Key: ARROW-13049
URL: https://issues.apache.org/jira/browse/ARROW-13049
Project: Apache Arrow
Issue Type: Task
Components: C++ - Gandiva
Reporter: João Pedro Antunes Ferreira
Assignee: João Pedro Antunes Ferreira
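A sketch of Hive's BIN(BIGINT), which returns the binary representation of a number as a string (e.g. 13 gives "1101"). Rendering negative values as their 64-bit two's-complement bit pattern, as Java's `Long.toBinaryString` does, is an assumption about Hive's behavior here; `bin_hive` is a hypothetical name chosen to avoid shadowing Python's built-in `bin`.

```python
from typing import Optional


def bin_hive(n: Optional[int]) -> Optional[str]:
    """Binary string of a 64-bit signed integer; negatives use two's complement."""
    if n is None:
        return None
    if not -(2 ** 63) <= n < 2 ** 63:
        raise OverflowError("value does not fit in a BIGINT")
    # masking to 64 bits turns a negative value into its two's-complement pattern
    return format(n & 0xFFFFFFFFFFFFFFFF, "b")
```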
[jira] [Created] (ARROW-13048) [Python] S3FileSystem fails moving filepaths containing = or +
Joerg Schneider created ARROW-13048:

Summary: [Python] S3FileSystem fails moving filepaths containing = or +
Key: ARROW-13048
URL: https://issues.apache.org/jira/browse/ARROW-13048
Project: Apache Arrow
Issue Type: Bug
Affects Versions: 4.0.1
Reporter: Joerg Schneider

Hi Arrow team,

We have the very common use case of partitioned Parquet tables on S3, written by Spark. Their paths include equals signs (=) to denote the partition value of each folder. When using PyArrow's S3FileSystem `move` function, it's not possible to move these objects within the bucket if their path contains `=` anywhere:

{code:java}
OSError: When copying key 'table/date=202007/part-0-e39069c2-0ea6-4a62-85ea-8011047cd4f4.c000.snappy.parquet' in bucket 'bucket' to key 'table2/date=202007/part-0-e39069c2-0ea6-4a62-85ea-8011047cd4f4.c000.snappy.parquet' in bucket 'bucket': AWS Error [code 133]: The specified key does not exist.
{code}

Moving with pre-emptively URL-quoted paths like these also fails:

{code:java}
OSError: When copying key 'table/date%3D202007/part-0-e39069c2-0ea6-4a62-85ea-8011047cd4f4.c000.snappy.parquet' in bucket 'bucket' to key 'table2/date%3D202007/part-0-e39069c2-0ea6-4a62-85ea-8011047cd4f4.c000.snappy.parquet' in bucket 'bucket': AWS Error [code 133]: The specified key does not exist.
{code}

The source object definitely exists: it was in fact returned by a PyArrow FileSelector and is passed to move unchanged.

Is there any configuration option to set, or special quoting to use?

Thanks in advance.
Joerg
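The report does not identify the cause. As background only: AWS documents that the copy source of an S3 CopyObject request must be URL-encoded, and `=` and `+` are characters that change under percent-encoding (`+` is also read as a space by form-style decoders). The helper below is hypothetical and only demonstrates the encoding with Python's `urllib.parse.quote`; it is not how Arrow's S3FileSystem builds its requests.

```python
from urllib.parse import quote


def encode_copy_source(bucket: str, key: str) -> str:
    """Percent-encode "bucket/key", keeping '/' and unreserved characters as-is."""
    return quote(f"{bucket}/{key}", safe="/")
```

Encoding an already-quoted key turns `%3D` into `%253D`, so a pre-quoted path that is encoded again names a different key than intended.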
[jira] [Created] (ARROW-13047) [Website] Add kiszk to committer list
Kazuaki Ishizaki created ARROW-13047:

Summary: [Website] Add kiszk to committer list
Key: ARROW-13047
URL: https://issues.apache.org/jira/browse/ARROW-13047
Project: Apache Arrow
Issue Type: Improvement
Reporter: Kazuaki Ishizaki
Assignee: Kazuaki Ishizaki
Fix For: 5.0.0