[jira] [Created] (ARROW-15772) [Go][Flight] Server Basic Auth Middleware/Interceptor wrongly base64 decode

2022-02-23 Thread Risselin Corentin (Jira)
Risselin Corentin created ARROW-15772:
-

 Summary: [Go][Flight] Server Basic Auth Middleware/Interceptor 
wrongly base64 decode
 Key: ARROW-15772
 URL: https://issues.apache.org/jira/browse/ARROW-15772
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go
Affects Versions: 7.0.0, 6.0.1
Reporter: Risselin Corentin


Currently the implementation of the Auth interceptors uses 
`base64.RawStdEncoding.DecodeString` to decode the content of the handshake.

In Go, RawStdEncoding does not use padding (with '='), so trying to authenticate 
from pyarrow (with `client.authenticate_basic_token(user, password)`) 
results in an error like:
{quote}{{pyarrow._flight.FlightUnauthenticatedError: gRPC returned 
unauthenticated error, with message: invalid basic auth encoding: illegal 
base64 data at input byte XX}}
{quote}
StdEncoding would successfully read the content where RawStdEncoding fails.
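
The padding difference can be reproduced outside Go. The following is a minimal Python sketch, not the Arrow Go code: `decode_basic_token` is a hypothetical helper, introduced here only for illustration, that accepts a token with or without '=' padding. It shows the kind of tolerant decoding that would accept both what pyarrow sends and what an unpadded encoder would send.

```python
import base64


def decode_basic_token(token: str) -> bytes:
    """Decode a base64 token whether or not it carries '=' padding.

    Hypothetical helper for illustration; not part of Arrow or pyarrow.
    """
    stripped = token.rstrip("=")
    # Restore exactly the padding that the standard (padded) encoding requires.
    padded = stripped + "=" * (-len(stripped) % 4)
    return base64.b64decode(padded, validate=True)


# "user:password" is 13 bytes, so its standard encoding ends in "==".
padded_token = base64.b64encode(b"user:password").decode("ascii")
# The same token as an unpadded ("raw") encoder would produce it.
unpadded_token = padded_token.rstrip("=")

print(decode_basic_token(padded_token))
print(decode_basic_token(unpadded_token))
```

Both calls decode to the same bytes. A decoder that only understands the unpadded form, as RawStdEncoding does, rejects the first token at the position of the first '='.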



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ARROW-15771) [C++][Compute] Add window join to execution engine

2022-02-23 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-15771:
--

 Summary: [C++][Compute] Add window join to execution engine
 Key: ARROW-15771
 URL: https://issues.apache.org/jira/browse/ARROW-15771
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Rok Mihevc


We would like to support window joins with as-of support.
See https://github.com/substrait-io/substrait/issues/3 for more.





[jira] [Created] (ARROW-15770) [CI] Not all python tests are running on CI jobs

2022-02-23 Thread Weston Pace (Jira)
Weston Pace created ARROW-15770:
---

 Summary: [CI] Not all python tests are running on CI jobs
 Key: ARROW-15770
 URL: https://issues.apache.org/jira/browse/ARROW-15770
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration
Reporter: Weston Pace


It appears that only the Orc tests are running.  See for example: 
https://github.com/apache/arrow/runs/5307134146?check_suite_focus=true





[jira] [Created] (ARROW-15769) [C++] Generate less arithmetic kernels

2022-02-23 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-15769:
--

 Summary: [C++] Generate less arithmetic kernels
 Key: ARROW-15769
 URL: https://issues.apache.org/jira/browse/ARROW-15769
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


Some of our arithmetic kernel executors are templated on different logical 
types (e.g. duration or timestamp) even though they could use the underlying 
physical type.





[jira] [Created] (ARROW-15768) [CI][C++] Reinstate a code coverage job

2022-02-23 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-15768:
--

 Summary: [CI][C++] Reinstate a code coverage job
 Key: ARROW-15768
 URL: https://issues.apache.org/jira/browse/ARROW-15768
 Project: Apache Arrow
  Issue Type: Task
  Components: C++, Continuous Integration
Reporter: Antoine Pitrou


Long ago we used to have code coverage measured on our Travis-CI setup. We 
dropped it because it was difficult to maintain and also a bit costly in 
additional execution time AFAIR.

Now that we have nightly builds using Crossbow and the flexibility of GitHub 
Actions workflows (where we could even perhaps run coverage on e.g. Linux and 
Windows, then combine the files as a final merge job), we should be able to 
create a new code coverage job.





[jira] [Created] (ARROW-15767) Arrow Table with Nullable DenseUnion fails to convert to Python Pandas DataFrame

2022-02-23 Thread Ben Baumgold (Jira)
Ben Baumgold created ARROW-15767:


 Summary: Arrow Table with Nullable DenseUnion fails to convert to 
Python Pandas DataFrame
 Key: ARROW-15767
 URL: https://issues.apache.org/jira/browse/ARROW-15767
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 6.0.1
Reporter: Ben Baumgold
 Attachments: nothing.arrow

A Feather file containing a column of nullable values raises an error when 
converted to a pandas DataFrame. It can be read into a pyarrow.Table as follows:
{code:python}
In [1]: import pyarrow.feather as feather

In [2]: t = feather.read_table("nothing.arrow")

In [3]: t
Out[3]:
pyarrow.Table
col: dense_union<: null=0, : int32 not null=1>
  child 0, : null
  child 1, : int32 not null

col: [  -- is_valid: all not null  -- type_ids: [
  1,
  1,
  1,
  0
]  -- value_offsets: [
  0,
  1,
  2,
  0
]  -- child 0 type: null
1 nulls  -- child 1 type: int32
[
  1,
  2,
  3
]]
{code}
But when trying to convert the pyarrow.Table into a Pandas DataFrame, I get the 
following error:
{code:python}
In [4]: t.to_pandas()
---
ArrowNotImplementedError  Traceback (most recent call last)
 in 
> 1 t.to_pandas()

~/miniconda3/lib/python3.9/site-packages/pyarrow/array.pxi in 
pyarrow.lib._PandasConvertible.to_pandas()

~/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi in 
pyarrow.lib.Table._to_pandas()

~/miniconda3/lib/python3.9/site-packages/pyarrow/pandas_compat.py in 
table_to_blockmanager(options, table, categories, ignore_metadata, types_mapper)
787 _check_data_column_metadata_consistency(all_columns)
788 columns = _deserialize_column_index(table, all_columns, 
column_indexes)
--> 789 blocks = _table_to_blocks(options, table, categories, 
ext_columns_dtypes)
790
791 axes = [columns, index]

~/miniconda3/lib/python3.9/site-packages/pyarrow/pandas_compat.py in 
_table_to_blocks(options, block_table, categories, extension_columns)
   1126 # Convert an arrow table to Block from the internal pandas API
   1127 columns = block_table.column_names
-> 1128 result = pa.lib.table_to_blocks(options, block_table, categories,
   1129 list(extension_columns.keys()))
   1130 return [_reconstruct_block(item, columns, extension_columns)

~/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi in 
pyarrow.lib.table_to_blocks()

~/miniconda3/lib/python3.9/site-packages/pyarrow/error.pxi in 
pyarrow.lib.check_status()

ArrowNotImplementedError: No known equivalent Pandas block for Arrow data of 
type dense_union<: null=0, : int32 not null=1> is known.
{code}
Note that the Arrow file is valid and can be read successfully by 
[Arrow.jl|https://github.com/apache/arrow-julia]. A related issue is 
[arrow-julia#285|https://github.com/apache/arrow-julia/issues/285]. The 
[^nothing.arrow] file used in this example is attached for convenience.





[jira] [Created] (ARROW-15766) [R] Implement bindings for lubridate::duration()

2022-02-23 Thread Dragoș Moldovan-Grünfeld (Jira)
Dragoș Moldovan-Grünfeld created ARROW-15766:


 Summary: [R] Implement bindings for lubridate::duration()
 Key: ARROW-15766
 URL: https://issues.apache.org/jira/browse/ARROW-15766
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Dragoș Moldovan-Grünfeld
 Fix For: 8.0.0








[jira] [Created] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-23 Thread Vibhatha Lakmal Abeykoon (Jira)
Vibhatha Lakmal Abeykoon created ARROW-15765:


 Summary: [Python] Extracting Type information from Python Objects
 Key: ARROW-15765
 URL: https://issues.apache.org/jira/browse/ARROW-15765
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Python
Reporter: Vibhatha Lakmal Abeykoon
Assignee: Vibhatha Lakmal Abeykoon


When creating user-defined functions, or in similar cases where we want to 
extract the Arrow data types from type hints, the existing Python API has 
some limitations. 

An example case is as follows:

```python

import pyarrow as pa
import pyarrow.compute as pc

def function(array1: pa.Int64Array, array2: pa.Int64Array) -> pa.Int64Array:
    return pc.call_function("add", [array1, array2])

```

We want to extract the fact that array1 is a `pa.Array` of `pa.Int64Type`. 

At the moment there is no straightforward way to do this. 

So the idea is to expose this feature to Python. 
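
As a rough illustration of what such a feature could look like from the user's side, the sketch below reads the annotations with the standard library's `typing.get_type_hints` and maps each annotated class to a data type. Everything named here is an assumption made for the example: `extract_arrow_types` and `ARRAY_CLASS_TO_TYPE` are hypothetical, and the two array classes are stand-ins so the code runs without pyarrow. With pyarrow installed, the table could instead map `pa.Int64Array` to `pa.int64()`.

```python
import typing


class Int64Array:
    """Stand-in for pa.Int64Array so the sketch runs without pyarrow."""


class StringArray:
    """Stand-in for pa.StringArray."""


# Hypothetical lookup table from array class to a data type name.
ARRAY_CLASS_TO_TYPE = {Int64Array: "int64", StringArray: "string"}


def extract_arrow_types(func):
    """Return ({parameter name: type name}, return type name) for func."""
    hints = typing.get_type_hints(func)
    # The return annotation is stored under the key "return"; it may be absent.
    return_type = ARRAY_CLASS_TO_TYPE.get(hints.pop("return", None))
    arg_types = {name: ARRAY_CLASS_TO_TYPE[cls] for name, cls in hints.items()}
    return arg_types, return_type


def add(array1: Int64Array, array2: Int64Array) -> Int64Array:
    ...


arg_types, return_type = extract_arrow_types(add)
print(arg_types, return_type)
```

The hand-maintained lookup table is the part this issue would make unnecessary: it exists only because the array classes themselves do not expose their data type.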





[jira] [Created] (ARROW-15764) [C++][FlightRPC] Optionally cache serialized ListFlights serverside

2022-02-23 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-15764:
--

 Summary: [C++][FlightRPC] Optionally cache serialized ListFlights 
serverside
 Key: ARROW-15764
 URL: https://issues.apache.org/jira/browse/ARROW-15764
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Rok Mihevc


ListFlights serializes flights each time it is called. If we have many flights 
and ListFlights is called often, this can put a significant load on the 
server. We could add an optional server-side cache to avoid this.
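
The caching idea can be sketched independently of the Flight C++ API. In the Python sketch below, `ListFlightsCache` is a hypothetical class, `pickle` merely stands in for the real wire serialization, and the `serialize_calls` counter exists only to make the effect visible. The cache is rebuilt lazily and invalidated whenever the set of flights changes.

```python
import pickle


class ListFlightsCache:
    """Cache the serialized flight listing until the set of flights changes."""

    def __init__(self):
        self._flights = {}
        self._serialized = None  # None means "needs to be rebuilt"
        self.serialize_calls = 0  # counter kept only for demonstration

    def add_flight(self, name, info):
        self._flights[name] = info
        self._serialized = None  # invalidate on every change

    def list_flights(self):
        if self._serialized is None:
            self.serialize_calls += 1
            # pickle stands in for the real serialization of each flight.
            self._serialized = [
                pickle.dumps(info) for info in self._flights.values()
            ]
        return self._serialized


cache = ListFlightsCache()
cache.add_flight("a", {"rows": 10})
cache.list_flights()
cache.list_flights()  # served from the cache, no re-serialization
print(cache.serialize_calls)
```

Invalidating on every write keeps the sketch simple. A server with frequent updates might prefer to update cached entries individually, which this example does not attempt.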





[jira] [Created] (ARROW-15763) [C++] Improve csv writer

2022-02-23 Thread Yibo Cai (Jira)
Yibo Cai created ARROW-15763:


 Summary: [C++] Improve csv writer
 Key: ARROW-15763
 URL: https://issues.apache.org/jira/browse/ARROW-15763
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Yibo Cai
Assignee: Yibo Cai


Profiling shows that for string inputs, the CSV writer spends most of its time 
counting quotes in every string. This can probably be improved by checking for 
quotes ahead of time, in batch.
For numeric inputs, the casting (number -> string) dominates the performance.
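
One possible reading of "checking in batch" is sketched below in Python. This is not how the Arrow C++ writer is implemented. `quote_batch` is a hypothetical function that scans the whole batch once and takes a cheap path when no value contains a double quote, instead of counting quotes per string.

```python
def quote_batch(values):
    """Quote a batch of strings for CSV, escaping embedded double quotes."""
    # One pass over the batch decides whether any escaping is needed at all.
    if not any('"' in value for value in values):
        # Fast path: no embedded quotes, so just wrap each value.
        return ['"' + value + '"' for value in values]
    # Slow path: double every embedded quote, as RFC 4180 describes.
    return ['"' + value.replace('"', '""') + '"' for value in values]


print(quote_batch(["a", "b"]))
print(quote_batch(['say "hi"', "b"]))
```

The benefit depends on the data: if most batches contain no quotes, the per-string work collapses to a single scan plus a copy.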





[jira] [Created] (ARROW-15762) [R] Revisit binding_format_datetime and remove manual casting

2022-02-23 Thread Dragoș Moldovan-Grünfeld (Jira)
Dragoș Moldovan-Grünfeld created ARROW-15762:


 Summary: [R] Revisit binding_format_datetime and remove manual 
casting 
 Key: ARROW-15762
 URL: https://issues.apache.org/jira/browse/ARROW-15762
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Dragoș Moldovan-Grünfeld
Assignee: Dragoș Moldovan-Grünfeld


This is a follow-up issue to revisit the casting step in format once 
[https://github.com/apache/arrow/pull/12240] gets merged.





[jira] [Created] (ARROW-15761) [Python] Remove the deprecated pyarrow.filesystem legacy implementations

2022-02-23 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-15761:
-

 Summary: [Python] Remove the deprecated pyarrow.filesystem legacy 
implementations
 Key: ARROW-15761
 URL: https://issues.apache.org/jira/browse/ARROW-15761
 Project: Apache Arrow
  Issue Type: Task
  Components: Python
Reporter: Joris Van den Bossche
 Fix For: 8.0.0


The {{pyarrow.filesystem}} and {{pyarrow.hdfs}} filesystems were 
deprecated in 2.0.0, and the warning changed from DeprecationWarning to 
FutureWarning in 4.0.0. I think it is time to actually remove them, and I 
propose doing so in 8.0.0.


