[jira] [Created] (ARROW-13067) [C++][Compute] Implement integer to decimal casting

2021-06-11 Thread Yibo Cai (Jira)
Yibo Cai created ARROW-13067:


 Summary: [C++][Compute] Implement integer to decimal casting
 Key: ARROW-13067
 URL: https://issues.apache.org/jira/browse/ARROW-13067
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 4.0.1
Reporter: Yibo Cai
 Fix For: 5.0.0


The current cast kernel supports decimal-to-integer casting, but not 
integer-to-decimal casting.
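
A minimal sketch of what an integer-to-decimal cast has to check, written in plain Python rather than against the Arrow C++ kernels. The name `int_to_decimal` and the overflow rule (at most `precision - scale` integer digits) are assumptions for illustration, not the behaviour of any existing Arrow function.

```python
from decimal import Decimal


def int_to_decimal(value: int, precision: int, scale: int) -> Decimal:
    """Cast an integer to decimal(precision, scale), raising on overflow."""
    if not 0 <= scale <= precision:
        raise ValueError("this sketch only handles 0 <= scale <= precision")
    # Digits left for the integer part once `scale` digits hold the fraction.
    max_int_digits = precision - scale
    if abs(value) >= 10 ** max_int_digits:
        raise OverflowError(
            f"{value} does not fit in decimal({precision}, {scale})"
        )
    # Build the decimal from its unscaled digits so no context rounding occurs.
    unscaled = abs(value) * 10 ** scale
    digits = tuple(int(d) for d in str(unscaled))
    sign = 1 if value < 0 else 0
    return Decimal((sign, digits, -scale))
```

For example, `int_to_decimal(123, 5, 2)` yields `Decimal('123.00')`, while `int_to_decimal(1234, 5, 2)` raises `OverflowError` because only three integer digits are available.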



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-13066) [Docs] Describe supported OS + client languages

2021-06-11 Thread Jonathan Keane (Jira)
Jonathan Keane created ARROW-13066:

 Summary: [Docs] Describe supported OS + client languages
 Key: ARROW-13066
 URL: https://issues.apache.org/jira/browse/ARROW-13066
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation
Reporter: Jonathan Keane


We mention which operating systems we support: 
https://arrow.apache.org/install/#c-and-glib-c-packages-for-debian-gnulinux-ubuntu-and-centos

And which Python versions: https://arrow.apache.org/docs/developers/python.html

But we should also include other languages (e.g. R), as well as some philosophy 
/ criteria for how long we support them (Python versions until they are EOLed, 
R versions 5 years back, ...)

Additionally: "It would be good to make a note somewhere of (1) platforms we 
test on, (2) platforms we have worked to support in the past but don't have CI 
for (e.g. solaris, ibm something that came up recently, raspbian per Nic's 
issue, etc.), and (3) what one can do if they want to add support for some 
other platform not listed" -Neal





[jira] [Created] (ARROW-13065) [Packaging][RPM] Add missing required LZ4 version information

2021-06-11 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-13065:


 Summary: [Packaging][RPM] Add missing required LZ4 version 
information
 Key: ARROW-13065
 URL: https://issues.apache.org/jira/browse/ARROW-13065
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-13064) [C++] Add a general "if, ifelse, ..., else" kernel

2021-06-11 Thread Ian Cook (Jira)
Ian Cook created ARROW-13064:


 Summary: [C++] Add a general "if, ifelse, ..., else" kernel
 Key: ARROW-13064
 URL: https://issues.apache.org/jira/browse/ARROW-13064
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Ian Cook


ARROW-10640 added a ternary {{if_else}} kernel. Add another kernel that extends 
this concept to an arbitrary number of conditions and associated results, like 
a vectorized {{if-ifelse-...-else}} with an arbitrary number of {{ifelse}} and 
with the {{else}} optional. This is like a SQL {{CASE}} statement.

How best to achieve this is not obvious. To enable SQL-style uses, it would be 
most efficient to implement this as a variadic kernel where the even-numbered 
arguments (0, 2, ...) are the arrays of boolean conditions, the odd-numbered 
arguments (1, 3, ...) are the corresponding arrays of results, and the final 
argument is the {{else}} result. But I'm not sure if this is practical. Maybe 
instead we should implement this to operate on list arrays, like NumPy's 
{{[np.where|https://numpy.org/doc/stable/reference/generated/numpy.where.html]}} 
or {{[np.select|https://numpy.org/doc/stable/reference/generated/numpy.select.html]}}.
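
To make the variadic layout concrete, here is a rough pure-Python sketch of the semantics (not an Arrow kernel). The name `case_when`, the use of plain lists in place of arrays, and the treatment of a null condition as false are all assumptions for illustration.

```python
from typing import List, Optional, Sequence


def case_when(*args: Sequence) -> List[Optional[object]]:
    """Evaluate cond0, result0, cond1, result1, ..., [else_result] row by row.

    Arguments at even positions are condition columns, arguments at odd
    positions are the matching result columns, and a trailing unpaired
    argument is the optional else column.
    """
    if len(args) < 2:
        raise ValueError("need at least one (condition, result) pair")
    pair_count = len(args) // 2
    has_else = len(args) % 2 == 1
    length = len(args[0])
    if any(len(column) != length for column in args):
        raise ValueError("all columns must have the same length")

    out: List[Optional[object]] = []
    for row in range(length):
        # Default: the else value if supplied, otherwise null (None).
        value = args[-1][row] if has_else else None
        for k in range(pair_count):
            # A null (None) condition is treated as false in this sketch.
            if args[2 * k][row]:
                value = args[2 * k + 1][row]
                break  # the first matching condition wins, like SQL CASE
        out.append(value)
    return out
```

Calling `case_when([True, False, False], ["a"] * 3, [False, True, False], ["b"] * 3, ["z"] * 3)` gives `["a", "b", "z"]`; dropping the last argument gives `["a", "b", None]`.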





[jira] [Created] (ARROW-13063) [Dev] Link our nightly reports info to Jira

2021-06-11 Thread Jonathan Keane (Jira)
Jonathan Keane created ARROW-13063:

 Summary: [Dev] Link our nightly reports info to Jira
 Key: ARROW-13063
 URL: https://issues.apache.org/jira/browse/ARROW-13063
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Developer Tools
Reporter: Jonathan Keane


Jira has an API; we should try to use it to match Jira tickets with builds 
(e.g. "this build will be fixed when issue Y is solved, so don't alert unless Y 
is marked as resolved and the build is still failing").

This is similar to ARROW-8043, but is specific to this information repository.
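
The alerting rule quoted above could be sketched roughly as follows. This deliberately leaves out the Jira API call itself; the function name `should_alert` and the status names in `DONE_STATUSES` are assumptions for illustration.

```python
from typing import Optional

# Assumption: stand-ins for whatever statuses Jira reports for a finished issue.
DONE_STATUSES = {"resolved", "closed"}


def should_alert(build_failing: bool, linked_issue_status: Optional[str]) -> bool:
    """Decide whether a nightly build failure deserves an alert."""
    if not build_failing:
        return False
    if linked_issue_status is None:
        # No ticket explains this failure, so it is news.
        return True
    # A linked ticket exists: only alert once it claims to be fixed.
    return linked_issue_status.strip().lower() in DONE_STATUSES
```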





[jira] [Created] (ARROW-13062) [Dev] Add a way for people to add information to our saved crossbow data

2021-06-11 Thread Jonathan Keane (Jira)
Jonathan Keane created ARROW-13062:

 Summary: [Dev] Add a way for people to add information to our 
saved crossbow data
 Key: ARROW-13062
 URL: https://issues.apache.org/jira/browse/ARROW-13062
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Developer Tools
Reporter: Jonathan Keane


We should have a simple + lightweight way to annotate specific builds with 
information like "won't be fixed until dask has a new release" or "this is 
supposed to be fixed in ARROW-XXX".

We *should not* require, ask, or allow people to add this information to the 
JSON that is saved as part of ARROW-13059. That JSON should be kept pristine 
and not have manual edits. Instead, we should have a plain-text lookup file 
that matches notes to specific builds (maybe to specific dates?).
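
One possible shape for such a lookup file, sketched in Python. The `build | date | note` line format, the `*` wildcard for "any date", and the function names are all hypothetical; nothing here is an agreed format.

```python
from typing import Dict, List, Optional, Tuple

NoteKey = Tuple[str, Optional[str]]  # (build name, ISO date or None for any date)


def parse_notes(text: str) -> Dict[NoteKey, List[str]]:
    """Parse lines of the form `build | date-or-* | note` into a lookup table."""
    notes: Dict[NoteKey, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        build, day, note = (part.strip() for part in line.split("|", 2))
        key = (build, None if day == "*" else day)
        notes.setdefault(key, []).append(note)
    return notes


def notes_for(notes: Dict[NoteKey, List[str]], build: str, day: str) -> List[str]:
    """Return the notes for one build on one date, plus its undated notes."""
    return notes.get((build, day), []) + notes.get((build, None), [])
```

Because the notes live in their own file, the saved JSON stays untouched and the two are only joined at report time.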





[jira] [Created] (ARROW-13061) [Dev] Display our nightly analysis somewhere so that people can see it

2021-06-11 Thread Jonathan Keane (Jira)
Jonathan Keane created ARROW-13061:

 Summary: [Dev] Display our nightly analysis somewhere so that 
people can see it
 Key: ARROW-13061
 URL: https://issues.apache.org/jira/browse/ARROW-13061
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Developer Tools
Reporter: Jonathan Keane


Create a markdown report from the saved nightly data and build it into a static 
site that we can publish on gh-pages (of ursacomputing/crossbow, or of the 
other ursacomputing repository in which we store the data of the nightly 
builds).





[jira] [Created] (ARROW-13060) [Dev] Process the saved nightly build data

2021-06-11 Thread Jonathan Keane (Jira)
Jonathan Keane created ARROW-13060:

 Summary: [Dev] Process the saved nightly build data 
 Key: ARROW-13060
 URL: https://issues.apache.org/jira/browse/ARROW-13060
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Developer Tools
Reporter: Jonathan Keane


Once ARROW-13059 is done and we have historical data for each build saved 
somewhere reliable + accessible, we can analyze it. (We can even use arrow to 
do that!)

At a minimum we should report the following for each build:
* days since last passing
* last commit that passed
* % of failures in the past week, month, year
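
The three metrics above could be computed along these lines. This is a sketch in plain Python; the `BuildRun` record and its field names are assumptions, since the saved data format is not defined yet.

```python
from datetime import date, timedelta
from typing import List, NamedTuple, Optional


class BuildRun(NamedTuple):
    day: date      # date of the nightly run
    passed: bool   # whether the build succeeded
    commit: str    # commit the build ran against


def last_passing(runs: List[BuildRun]) -> Optional[BuildRun]:
    """Most recent passing run, or None if the build has never passed."""
    passing = [run for run in runs if run.passed]
    return max(passing, key=lambda run: run.day) if passing else None


def days_since_last_passing(runs: List[BuildRun], today: date) -> Optional[int]:
    run = last_passing(runs)
    return (today - run.day).days if run else None


def failure_percent(runs: List[BuildRun], today: date, window_days: int) -> Optional[float]:
    """Share of failed runs in the last `window_days` days, today included."""
    start = today - timedelta(days=window_days)
    window = [run for run in runs if start < run.day <= today]
    if not window:
        return None
    return 100.0 * sum(not run.passed for run in window) / len(window)
```

Calling `failure_percent` with `window_days` of 7, 30 and 365 covers the week, month and year figures.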





[jira] [Created] (ARROW-13059) Adapt the crossbow code to save build status to json

2021-06-11 Thread Jonathan Keane (Jira)
Jonathan Keane created ARROW-13059:

 Summary: Adapt the crossbow code to save build status to json
 Key: ARROW-13059
 URL: https://issues.apache.org/jira/browse/ARROW-13059
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Jonathan Keane


Add to / adapt the code that {{archery crossbow}} already uses to send the 
email report so that it also saves the status of the builds to a JSON file and 
commits that to a new branch in the crossbow (or some other ursacomputing) 
repository.

Crossbow code (hint: this is the code that you will want to copy + adapt to do 
this new task): 
https://github.com/apache/arrow/blob/master/dev/archery/archery/crossbow/reports.py

This is how the nightly jobs are triggered: 
https://github.com/ursacomputing/crossbow/blob/master/.github/workflows/nightly_report.yml
(note that it [figures out what the job 
id|https://github.com/ursacomputing/crossbow/blob/master/.github/workflows/nightly_report.yml#L33-L34]
 is and then it runs a command {{archery crossbow report ...}})

The archery CLI is specified in 
https://github.com/apache/arrow/blob/master/dev/archery/archery/crossbow/cli.py

Ultimately what we want is something like a new command, e.g. {{archery 
crossbow save-report-data}}, that uses similar code/approaches to how the report 
is designed but saves the data to JSON (or line-delimited JSON) and saves that 
somewhere reliable (i.e. the ursacomputing/crossbow repository or a new 
repository under ursacomputing).
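
A rough sketch of the line-delimited JSON idea, independent of the real archery code. The function names and the `date` / `build` / `status` fields are hypothetical and do not reflect how {{reports.py}} is structured.

```python
import json
from typing import Dict


def to_json_lines(report_date: str, statuses: Dict[str, str]) -> str:
    """Serialise one nightly report as one JSON object per build, per line."""
    records = (
        {"date": report_date, "build": build, "status": status}
        for build, status in sorted(statuses.items())
    )
    return "".join(json.dumps(record, sort_keys=True) + "\n" for record in records)


def append_report(path: str, report_date: str, statuses: Dict[str, str]) -> None:
    """Append the report to a history file that can then be committed."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(to_json_lines(report_date, statuses))
```

Appending one line per build keeps each night's commit a pure addition, which makes the history easy to diff and to load later for analysis.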





[jira] [Created] (ARROW-13058) [Dev] Improve nightly build visibility

2021-06-11 Thread Jonathan Keane (Jira)
Jonathan Keane created ARROW-13058:

 Summary: [Dev] Improve nightly build visibility
 Key: ARROW-13058
 URL: https://issues.apache.org/jira/browse/ARROW-13058
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Jonathan Keane








[jira] [Created] (ARROW-13057) [MATLAB] update feather reading/writing functionality to use latest MATLAB data array (MDA) APIs, instead of old style C-matrix API

2021-06-11 Thread tahsin hassan (Jira)
tahsin hassan created ARROW-13057:

 Summary: [MATLAB] update feather reading/writing functionality to 
use latest MATLAB data array (MDA) APIs, instead of old style C-matrix API
 Key: ARROW-13057
 URL: https://issues.apache.org/jira/browse/ARROW-13057
 Project: Apache Arrow
  Issue Type: Task
  Components: MATLAB
Reporter: tahsin hassan
Assignee: Kevin Gurney


The current featherreadmex and featherwritemex functionality uses the old 
C-style matrix API from MATLAB:

[https://www.mathworks.com/help/matlab/cc-mx-matrix-library.html?s_tid=CRUX_lftnav]

We should update this functionality to build against the latest MATLAB Data 
Array (MDA) C++-style API, since the new API uses modern C++ semantics and 
design patterns and avoids data copies whenever possible.





[jira] [Created] (ARROW-13056) Expand PR labeler for supported language - MATLAB

2021-06-11 Thread tahsin hassan (Jira)
tahsin hassan created ARROW-13056:

 Summary: Expand PR labeler for supported language - MATLAB 
 Key: ARROW-13056
 URL: https://issues.apache.org/jira/browse/ARROW-13056
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: tahsin hassan


The PR labeler yaml file

[https://github.com/apache/arrow/blob/master/.github/workflows/dev_pr/labeler.yml]

introduced in https://issues.apache.org/jira/browse/ARROW-10616 needs to be 
updated to include MATLAB as a supported language.





[jira] [Created] (ARROW-13055) [Format] Document "canonical extension type" and criteria

2021-06-11 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-13055:

 Summary: [Format] Document "canonical extension type" and criteria
 Key: ARROW-13055
 URL: https://issues.apache.org/jira/browse/ARROW-13055
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Documentation, Format
Reporter: Neal Richardson
 Fix For: 5.0.0


See discussion at 
[https://lists.apache.org/thread.html/r7ba08aed2809fa64537e6f44bce38b2cf740acbef0e91cfaa7c19767%40%3Cdev.arrow.apache.org%3E]
 and then again at 
[https://lists.apache.org/thread.html/r108ac130406b3e63ca23a60b8e79285857355f8342232ad226a6571a%40%3Cdev.arrow.apache.org%3E]
 





[jira] [Created] (ARROW-13054) [C++] Add TemporalOptions

2021-06-11 Thread Nic Crane (Jira)
Nic Crane created ARROW-13054:

 Summary: [C++] Add TemporalOptions
 Key: ARROW-13054
 URL: https://issues.apache.org/jira/browse/ARROW-13054
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Nic Crane


Please could we implement TemporalOptions for the day_of_week kernel, so that we 
can specify the first day of the week (and therefore which day of the week each 
of the integers 0 - 6 represents when calling day_of_week on a date)?
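
For illustration, the requested behaviour amounts to a rotation of the day numbering. The sketch below uses Python's standard library rather than the Arrow kernel; the parameter name `week_start` and its encoding (0 = Monday ... 6 = Sunday, following `date.weekday()`) are assumptions, not a proposed TemporalOptions layout.

```python
from datetime import date


def day_of_week(day: date, week_start: int = 0) -> int:
    """Return 0..6, where 0 is the configured first day of the week.

    week_start follows date.weekday(): 0 = Monday, ..., 6 = Sunday.
    """
    if not 0 <= week_start <= 6:
        raise ValueError("week_start must be in the range 0..6")
    # Shift the Monday-based index so that `week_start` maps to 0.
    return (day.weekday() - week_start) % 7
```

For Friday 2021-06-11 this returns 4 with a Monday start and 5 with a Sunday start (`week_start=6`).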





[jira] [Created] (ARROW-13053) Build fails on MacOS Big Sur using homebrewed Arrow libraries

2021-06-11 Thread Dorian Kind (Jira)
Dorian Kind created ARROW-13053:

 Summary: Build fails on MacOS Big Sur using homebrewed Arrow 
libraries
 Key: ARROW-13053
 URL: https://issues.apache.org/jira/browse/ARROW-13053
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 4.0.1
 Environment: MacOS BigSur 11.4 (Apple Silicon)
Python 3.9.5
apache-arrow 4.0.1 (via Homebrew)
Reporter: Dorian Kind


When installing pyarrow 4.0.1 from source, the install step fails with 

{{error: can't copy 'build/lib.macosx-11.3-arm64-3.9/pyarrow/include/arrow': 
doesn't exist or not a regular file}}

because the headers directory

{{build/lib.macosx-11.3-arm64-3.9/pyarrow/include/arrow}}

is a relative symlink to {{../Cellar/apache-arrow/4.0.1/include/arrow}}

I believe this is caused by the build system including the header files from 
{{/opt/homebrew/include/arrow}}, which is the above symlink:

{{ls -hl /opt/homebrew/include/arrow}}
{{lrwxr-xr-x  1 dki  admin    42B Jun  8 15:35 /opt/homebrew/include/arrow -> ../Cellar/apache-arrow/4.0.1/include/arrow}}

I was able to work around this issue by modifying line 334 in {{CMakeLists.txt}} 
from

{{# Always bundle includes}}
{{file(COPY ${ARROW_INCLUDE_DIR}/arrow DESTINATION ${BUILD_OUTPUT_ROOT_DIRECTORY}/include)}}

to

{{# Always bundle includes}}
{{get_filename_component(REAL_ARROW_INCLUDE_DIR "${ARROW_INCLUDE_DIR}/arrow" REALPATH)}}
{{file(COPY ${REAL_ARROW_INCLUDE_DIR} DESTINATION ${BUILD_OUTPUT_ROOT_DIRECTORY}/include)}}

But I'm not familiar with CMake, so maybe there is a more appropriate way to 
fix this.





[jira] [Created] (ARROW-13052) [C++][Gandiva] Implements REGEXP_EXTRACT function

2021-06-11 Thread Anthony Louis Gotlib Ferreira (Jira)
Anthony Louis Gotlib Ferreira created ARROW-13052:

 Summary: [C++][Gandiva] Implements REGEXP_EXTRACT function
 Key: ARROW-13052
 URL: https://issues.apache.org/jira/browse/ARROW-13052
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++ - Gandiva
Reporter: Anthony Louis Gotlib Ferreira
Assignee: Anthony Louis Gotlib Ferreira


Implement the REGEXP_EXTRACT function based on [the Hive 
implementation|https://www.revisitclass.com/hadoop/regexp_extract-function-in-hive-with-examples/].
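
As a reference for the intended semantics, here is a small Python sketch of a `regexp_extract(subject, pattern, index)` function. It is not Gandiva code; returning an empty string when nothing matches is an assumption of this sketch, and Python's `re` dialect differs in places from the regex engines used by Hive or Gandiva.

```python
import re


def regexp_extract(subject: str, pattern: str, index: int) -> str:
    """Return capture group `index` of the first match (0 = the whole match)."""
    match = re.search(pattern, subject)
    if match is None:
        # Assumption: no match yields an empty string rather than null.
        return ""
    # A group that exists but did not participate in the match yields None.
    return match.group(index) or ""
```

For example, with the pattern `foo(.*?)(bar)` and the subject `foothebar`, index 1 yields `the` and index 2 yields `bar`.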





[jira] [Created] (ARROW-13051) [Release][Packaging] Update the java post release task to use the crossbow artifacts

2021-06-11 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-13051:

 Summary: [Release][Packaging] Update the java post release task to 
use the crossbow artifacts
 Key: ARROW-13051
 URL: https://issues.apache.org/jira/browse/ARROW-13051
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools, Packaging
Reporter: Krisztian Szucs


We produce Java jars using a crossbow task. Ideally we should download and 
deploy these packages instead of compiling them locally during the Java post 
release task.

See the produced jars at: 
https://github.com/ursacomputing/crossbow/releases/tag/actions-496-github-java-jars
See more context at: https://github.com/apache/arrow/pull/10411






[jira] [Created] (ARROW-13050) [C++][Gandiva] Implement SPACE Hive function on Gandiva

2021-06-11 Thread Jira
João Pedro Antunes Ferreira created ARROW-13050:

 Summary: [C++][Gandiva] Implement SPACE Hive function on Gandiva
 Key: ARROW-13050
 URL: https://issues.apache.org/jira/browse/ARROW-13050
 Project: Apache Arrow
  Issue Type: Task
  Components: C++ - Gandiva
Reporter: João Pedro Antunes Ferreira
Assignee: João Pedro Antunes Ferreira


Implement SPACE Hive function on Gandiva





[jira] [Created] (ARROW-13049) [C++][Gandiva] Implement BIN Hive function on Gandiva

2021-06-11 Thread Jira
João Pedro Antunes Ferreira created ARROW-13049:

 Summary: [C++][Gandiva] Implement BIN Hive function on Gandiva
 Key: ARROW-13049
 URL: https://issues.apache.org/jira/browse/ARROW-13049
 Project: Apache Arrow
  Issue Type: Task
  Components: C++ - Gandiva
Reporter: João Pedro Antunes Ferreira
Assignee: João Pedro Antunes Ferreira








[jira] [Created] (ARROW-13048) [Python] S3FileSystem fails moving filepaths containing = or +

2021-06-11 Thread Joerg Schneider (Jira)
Joerg Schneider created ARROW-13048:

 Summary: [Python] S3FileSystem fails moving filepaths containing = 
or +
 Key: ARROW-13048
 URL: https://issues.apache.org/jira/browse/ARROW-13048
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 4.0.1
Reporter: Joerg Schneider


Hi Arrow team,

We have the very common use case of partitioned Parquet tables on S3, written 
by Spark. Their paths include an equals sign (=) to denote the partition value 
per folder.

When trying to use PyArrow's S3FileSystem `move` function, it's not possible to 
move objects in the bucket whose path contains `=` somewhere: 
{code:java}
OSError: When copying key 'table/date=202007/part-0-e39069c2-0ea6-4a62-85ea-8011047cd4f4.c000.snappy.parquet' in bucket 'bucket' to key 'table2/date=202007/part-0-e39069c2-0ea6-4a62-85ea-8011047cd4f4.c000.snappy.parquet' in bucket 'bucket': AWS Error [code 133]: The specified key does not exist.
{code}

It is also not possible to move them using preemptively URL-quoted paths, like 
these:

{code:java}
OSError: When copying key 'table/date%3D202007/part-0-e39069c2-0ea6-4a62-85ea-8011047cd4f4.c000.snappy.parquet' in bucket 'bucket' to key 'table2/date%3D202007/part-0-e39069c2-0ea6-4a62-85ea-8011047cd4f4.c000.snappy.parquet' in bucket 'bucket': AWS Error [code 133]: The specified key does not exist.
{code}
 

The source object definitely exists: it was in fact returned by a 
FileSelector from PyArrow itself and is just passed to move.
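
For reference, the pre-quoted form shown in the second error is what percent-encoding the key produces. A minimal sketch using only the standard library (not PyArrow); the helper name `quote_key` and the shortened file names in the usage note are made up for illustration.

```python
from urllib.parse import quote, unquote


def quote_key(key: str) -> str:
    """Percent-encode an S3 key, leaving the `/` separators untouched."""
    return quote(key, safe="/")


def unquote_key(key: str) -> str:
    """Reverse quote_key, turning %3D back into `=` and %2B back into `+`."""
    return unquote(key)
```

`quote_key("table/date=202007/part-0.parquet")` gives `table/date%3D202007/part-0.parquet`, and a `+` in a key becomes `%2B`.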


Is there any configuration option to be set, or special quoting to be used?

Thanks in advance.
Joerg






[jira] [Created] (ARROW-13047) [Website] Add kiszk to committer list

2021-06-11 Thread Kazuaki Ishizaki (Jira)
Kazuaki Ishizaki created ARROW-13047:


 Summary: [Website] Add kiszk to committer list
 Key: ARROW-13047
 URL: https://issues.apache.org/jira/browse/ARROW-13047
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Kazuaki Ishizaki
Assignee: Kazuaki Ishizaki
 Fix For: 5.0.0





