[jira] [Created] (ARROW-13920) how to disable plasma print connection info to python

2021-09-06 Thread auderson (Jira)
auderson created ARROW-13920:


 Summary: how to disable plasma print connection info to python
 Key: ARROW-13920
 URL: https://issues.apache.org/jira/browse/ARROW-13920
 Project: Apache Arrow
  Issue Type: Bug
Reporter: auderson


Hi, I'm using Plasma with joblib to transfer DataFrames between processes. 
However, the Plasma client prints "disconnect on fd *" whenever a worker 
process finishes. 

Is there a way to suppress this kind of log output, for example by setting 
an environment variable?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-13919) [GLib] Add GArrowFunctionDoc

2021-09-06 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-13919:


 Summary: [GLib] Add GArrowFunctionDoc
 Key: ARROW-13919
 URL: https://issues.apache.org/jira/browse/ARROW-13919
 Project: Apache Arrow
  Issue Type: Improvement
  Components: GLib
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-13918) [Gandiva][Python] Add decimal support for make_literal and make_in_expression

2021-09-06 Thread Will Jones (Jira)
Will Jones created ARROW-13918:

 Summary: [Gandiva][Python] Add decimal support for make_literal 
and make_in_expression
 Key: ARROW-13918
 URL: https://issues.apache.org/jira/browse/ARROW-13918
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Gandiva, Python
Reporter: Will Jones


These are already implemented in C++, they just need to be exposed in Cython.





[jira] [Created] (ARROW-13917) [Gandiva] Add helper to determine valid decimal function return type

2021-09-06 Thread Will Jones (Jira)
Will Jones created ARROW-13917:

 Summary: [Gandiva] Add helper to determine valid decimal function 
return type
 Key: ARROW-13917
 URL: https://issues.apache.org/jira/browse/ARROW-13917
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Gandiva
Reporter: Will Jones


To evaluate a Gandiva function, you need to pass its return type. For most 
types, we can look up the possible return types by using the 
`GetRegisteredFunctionSignatures` method, but those signatures don't include 
the precision and scale parameters of the decimal type.

Specifying the precision and scale parameters of the decimal type is left up to 
the user, but if the user gets them wrong, they can get invalid answers. See the 
reproducible example at the bottom.

The precision and scale of the return type depend on the input types and the 
implementation of the decimal operations. Given the variation of logic across 
different functions (add, divide, trunc, round), it would be best if we were 
able to provide some utility to help the user determine the precise return type.

Note that return types aren't unique for a given function name and parameter 
types. For example, `add(date64[ms], int64)` can return either `date64[ms]` or 
`timestamp[ms]`. So a generic utility has to return multiple possible return 
types.
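
As a rough illustration of why such a utility must return a list rather than a single type, here is a minimal pure-Python sketch. The registry, the `register` and `possible_return_types` helpers, and the use of plain strings for type names are all hypothetical and are not part of the Gandiva API; only the `add(date64[ms], int64)` example comes from the description above.

```python
# Hypothetical registry (not Gandiva's): maps a function name plus its
# parameter types to every return type registered for that combination.
_SIGNATURES = {}


def register(name, param_types, return_type):
    """Record one more valid return type for (name, param_types)."""
    key = (name, tuple(param_types))
    _SIGNATURES.setdefault(key, []).append(return_type)


def possible_return_types(name, param_types):
    """Return all registered return types, or an empty list if none match."""
    return list(_SIGNATURES.get((name, tuple(param_types)), []))


# The ambiguous case mentioned in the issue description.
register("add", ["date64[ms]", "int64"], "date64[ms]")
register("add", ["date64[ms]", "int64"], "timestamp[ms]")

candidates = possible_return_types("add", ["date64[ms]", "int64"])
print(candidates)
```

A caller would then have to pick one of the candidates explicitly. For decimal types, a real helper would additionally need to compute precision and scale from the inputs, which this sketch does not attempt.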


Example of invalid decimal results from bad return type:

{code:python}
from decimal import Decimal
import pyarrow as pa
from pyarrow.gandiva import TreeExprBuilder, make_projector

def call_on_value(func, values, params, out_type):
    """Evaluate `func` on a single row.

    values: list of (value, arrow_type) pairs passed as field inputs.
    params: list of (literal, arrow_type) pairs passed as literal arguments.
    """
    builder = TreeExprBuilder()

    param_literals = []
    for param, param_type in params:
        param_literals.append(builder.make_literal(param, param_type))

    inputs = []
    arrays = []
    for i, value in enumerate(values):
        inputs.append(builder.make_field(pa.field(str(i), value[1])))
        arrays.append(pa.array([value[0]], value[1]))

    record_batch = pa.record_batch(arrays, [str(i) for i in range(len(values))])

    func_x = builder.make_function(func, inputs + param_literals, out_type)

    expressions = [builder.make_expression(func_x, pa.field('result', out_type))]

    projector = make_projector(record_batch.schema, expressions,
                               pa.default_memory_pool())

    return projector.evaluate(record_batch)

# `values` must be a list of (value, type) pairs, so the single input is wrapped in a list.
call_on_value(
    'round',
    [(Decimal("123.459"), pa.decimal128(28, 3))],
    [(2, pa.int32())],
    pa.decimal128(28, 3)
)
# Returns: 123.459 (not rounded!)

call_on_value(
    'round',
    [(Decimal("123.459"), pa.decimal128(28, 3))],
    [(-2, pa.int32())],
    pa.decimal128(28, 3)
)
# Returns: 0.100
{code}






[jira] [Created] (ARROW-13916) [C++] Implement strftime on date32/64 types

2021-09-06 Thread Jira
Percy Camilo Triveño Aucahuasi created ARROW-13916:

 Summary: [C++] Implement strftime on date32/64 types
 Key: ARROW-13916
 URL: https://issues.apache.org/jira/browse/ARROW-13916
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Percy Camilo Triveño Aucahuasi


Python actually supports this too: 
[https://docs.python.org/3/library/datetime.html#datetime.date.strftime]
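
For reference, a minimal sketch of the Python behavior linked above, using only the standard library. The specific date and format strings are arbitrary illustrations, not taken from the issue.

```python
from datetime import date

# A plain date (no time-of-day component), analogous to a date32/date64 value.
d = date(2021, 9, 6)

# strftime works directly on date objects with date-related format codes.
iso_style = d.strftime("%Y-%m-%d")
day_first = d.strftime("%d/%m/%Y")

print(iso_style)
print(day_first)
```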

Related:

- [https://github.com/apache/arrow/pull/10998]

- [https://github.com/apache/arrow/pull/11075]

- https://issues.apache.org/jira/browse/ARROW-13138





[jira] [Created] (ARROW-13915) [R][CI] R UCRT C++ bundles are incomplete

2021-09-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-13915:

 Summary: [R][CI] R UCRT C++ bundles are incomplete
 Key: ARROW-13915
 URL: https://issues.apache.org/jira/browse/ARROW-13915
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 6.0.0


[~jeroenooms] noticed this when some checks were triggered at CRAN. To prevent 
things like this, we need ARROW-13683, but we can still fix this now.





[jira] [Created] (ARROW-13914) [C++][Python] Optimize type inference when converting from python values

2021-09-06 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-13914:

 Summary: [C++][Python] Optimize type inference when converting 
from python values
 Key: ARROW-13914
 URL: https://issues.apache.org/jira/browse/ARROW-13914
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Krisztian Szucs


Currently we use an extensive set of checks to infer the Arrow type from Python 
sequences. 

Last time I checked using asv, the inference step had significant overhead. 

We could try other approaches to speed up the type inference; see the comments: 
https://github.com/apache/arrow/pull/11076#discussion_r702808196





[jira] [Created] (ARROW-13913) [C++] segfault if compute function index called with no options supplied

2021-09-06 Thread Nic Crane (Jira)
Nic Crane created ARROW-13913:

 Summary: [C++] segfault if compute function index called with no 
options supplied
 Key: ARROW-13913
 URL: https://issues.apache.org/jira/browse/ARROW-13913
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Nic Crane


 

If I try to use the {{index}} compute function from R without {{IndexOptions}}, 
it results in a segfault.
{code:java}
> call_function("index", Array$create(1:10))
Thread 1 "R" received signal SIGSEGV, Segmentation fault.
0x72291384 in arrow::compute::FunctionOptions::FunctionOptions (
 this=0x7fff5970)
 at /home/nic2/arrow_installed_version/include/arrow/compute/function.h:60
60 class ARROW_EXPORT FunctionOptions : public util::EqualityComparable<FunctionOptions> {
{code}
I ran a fresh container to check it wasn't just my machine, and got a similar 
output with the additional line:
{code:java}
*** caught segfault ***
address 0x8, cause 'memory not mapped'{code}
 





[jira] [Created] (ARROW-13912) [R] TrimOptions implementation breaks test-r-minimal-build due to dependencies

2021-09-06 Thread Nic Crane (Jira)
Nic Crane created ARROW-13912:

 Summary: [R] TrimOptions implementation breaks 
test-r-minimal-build due to dependencies
 Key: ARROW-13912
 URL: https://issues.apache.org/jira/browse/ARROW-13912
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Nic Crane





