[jira] [Created] (ARROW-13920) how to disable plasma print connection info to python
auderson created ARROW-13920:

Summary: how to disable plasma print connection info to python
Key: ARROW-13920
URL: https://issues.apache.org/jira/browse/ARROW-13920
Project: Apache Arrow
Issue Type: Bug
Reporter: auderson

Hi, I'm using plasma in joblib to help transfer dataframes, but the plasma client prints "disconnect on fd *" when a worker process finishes. Is there a way to suppress this kind of logging output, for example by setting an environment variable?

--
This message was sent by Atlassian Jira (v8.3.4#803005)
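One generic workaround, independent of any plasma setting, is to redirect the file descriptor the message is written to. The sketch below is a minimal illustration under stated assumptions: the report does not say which process emits the message or on which stream, so this assumes it goes to standard error (fd 2) of a process you control. The helper name `silence_fd` is hypothetical and is not part of pyarrow or joblib.

```python
import os
from contextlib import contextmanager


@contextmanager
def silence_fd(fd=2):
    """Temporarily point file descriptor `fd` (2 = stderr) at the null device.

    Works at the OS level, so it also hides output written by native code.
    """
    saved = os.dup(fd)                          # keep a copy of the original target
    devnull = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull, fd)                    # fd now writes to the null device
        yield
    finally:
        os.dup2(saved, fd)                      # restore the original target
        os.close(devnull)
        os.close(saved)
```

Wrapping the code that triggers the message in `with silence_fd():` hides everything written to stderr inside the block, including genuine errors, so the block should be kept as small as possible. If the message is actually written by a separately launched store process, the redirect would have to be applied where that process is started instead.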
[jira] [Created] (ARROW-13919) [GLib] Add GArrowFunctionDoc
Kouhei Sutou created ARROW-13919:

Summary: [GLib] Add GArrowFunctionDoc
Key: ARROW-13919
URL: https://issues.apache.org/jira/browse/ARROW-13919
Project: Apache Arrow
Issue Type: Improvement
Components: GLib
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
[jira] [Created] (ARROW-13918) [Gandiva][Python] Add decimal support for make_literal and make_in_expression
Will Jones created ARROW-13918:

Summary: [Gandiva][Python] Add decimal support for make_literal and make_in_expression
Key: ARROW-13918
URL: https://issues.apache.org/jira/browse/ARROW-13918
Project: Apache Arrow
Issue Type: Improvement
Components: C++ - Gandiva, Python
Reporter: Will Jones

These are already implemented in C++; they just need to be exposed in Cython.
[jira] [Created] (ARROW-13917) [Gandiva] Add helper to determine valid decimal function return type
Will Jones created ARROW-13917:

Summary: [Gandiva] Add helper to determine valid decimal function return type
Key: ARROW-13917
URL: https://issues.apache.org/jira/browse/ARROW-13917
Project: Apache Arrow
Issue Type: Improvement
Components: C++ - Gandiva
Reporter: Will Jones

To evaluate a Gandiva function, you need to pass its return type. For most types, we can look up the possible return types by using the `GetRegisteredFunctionSignatures` method, but those signatures don't include the precision and scale parameters of the decimal type. Specifying them is left up to the user, and if the user gets them wrong, they can get invalid answers. See the reproducible example at the bottom.

The precision and scale of the return type depend on the input types and on the implementation of the decimal operations. Given the variation in logic across different functions (add, divide, trunc, round), it would be best if we were able to provide some utility to help the user determine the precise return type.

Note that return types aren't unique for a given function name and parameter types. For example, `add(date64[ms], int64)` can return either `date64[ms]` or `timestamp[ms]`. So a generic utility has to return multiple possible return types.
Example of invalid decimal results from bad return type:

{code:python}
from decimal import Decimal

import pyarrow as pa
from pyarrow.gandiva import TreeExprBuilder, make_projector


def call_on_value(func, values, params, out_type):
    # values: list of (value, arrow_type) pairs, one per input column
    # params: list of (literal, arrow_type) pairs passed as literal arguments
    builder = TreeExprBuilder()

    param_literals = []
    for param, param_type in params:
        param_literals.append(builder.make_literal(param, param_type))

    inputs = []
    arrays = []
    for i, value in enumerate(values):
        inputs.append(builder.make_field(pa.field(str(i), value[1])))
        arrays.append(pa.array([value[0]], value[1]))

    record_batch = pa.record_batch(arrays, [str(i) for i in range(len(values))])

    func_x = builder.make_function(func, inputs + param_literals, out_type)
    expressions = [builder.make_expression(func_x, pa.field('result', out_type))]

    projector = make_projector(record_batch.schema, expressions, pa.default_memory_pool())
    return projector.evaluate(record_batch)


call_on_value(
    'round',
    [(Decimal("123.459"), pa.decimal128(28, 3))],
    [(2, pa.int32())],
    pa.decimal128(28, 3)
)
# Returns: 123.459 (not rounded!)

call_on_value(
    'round',
    [(Decimal("123.459"), pa.decimal128(28, 3))],
    [(-2, pa.int32())],
    pa.decimal128(28, 3)
)
# Returns: 0.100
{code}
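To make the proposed utility more concrete, here is a rough sketch of what a return-type helper for decimal `add` could look like. It is an illustration only: the names `DecimalType` and `add_result_types` are hypothetical, and the rule used (result scale is the larger input scale; result precision is the larger count of integer digits plus that scale plus one carry digit, capped at 38) is the common SQL-style convention, assumed here and not confirmed to be what Gandiva implements. The helper returns a list because, as noted in the description, a generic utility may have to report several candidate types.

```python
from typing import List, NamedTuple

MAX_PRECISION = 38  # largest precision a 128-bit decimal can hold


class DecimalType(NamedTuple):
    """Hypothetical stand-in for a decimal128(precision, scale) type."""
    precision: int
    scale: int


def add_result_types(left: DecimalType, right: DecimalType) -> List[DecimalType]:
    """Candidate return types for left + right under the ASSUMED SQL-style rule."""
    scale = max(left.scale, right.scale)
    integer_digits = max(left.precision - left.scale,
                         right.precision - right.scale)
    # One extra digit for a possible carry, capped at the decimal128 limit.
    precision = min(integer_digits + scale + 1, MAX_PRECISION)
    return [DecimalType(precision, scale)]
```

For instance, adding a `decimal128(28, 3)` column to a `decimal128(10, 5)` column would yield a single candidate with precision 31 and scale 5 under this rule. Functions such as `round`, `trunc`, and `divide` would each need their own rule, which is the variation the issue describes.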
[jira] [Created] (ARROW-13916) [C++] Implement strftime on date32/64 types
Percy Camilo Triveño Aucahuasi created ARROW-13916:

Summary: [C++] Implement strftime on date32/64 types
Key: ARROW-13916
URL: https://issues.apache.org/jira/browse/ARROW-13916
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Percy Camilo Triveño Aucahuasi

Python actually supports this too: [https://docs.python.org/3/library/datetime.html#datetime.date.strftime]

Related:
- [https://github.com/apache/arrow/pull/10998]
- [https://github.com/apache/arrow/pull/11075]
- https://issues.apache.org/jira/browse/ARROW-13138
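As a point of comparison, the behaviour being requested can be sketched in plain Python with `datetime.date.strftime`. The sketch relies on the storage conventions of the two Arrow types (date32 counts days since the UNIX epoch, date64 counts milliseconds since the UNIX epoch); the helper names `format_date32` and `format_date64` are hypothetical and only illustrate the idea, they are not Arrow functions.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def format_date32(days, fmt="%Y-%m-%d"):
    """Format a date32 value (days since the UNIX epoch) with strftime."""
    return (EPOCH + timedelta(days=days)).strftime(fmt)


def format_date64(millis, fmt="%Y-%m-%d"):
    """Format a date64 value (milliseconds since the UNIX epoch) with strftime.

    Any sub-day remainder is ignored, because a date carries no time of day.
    """
    return (EPOCH + timedelta(milliseconds=millis)).strftime(fmt)
```

Both helpers accept any format string that `date.strftime` understands, so `format_date32(0, "%d/%m/%Y")` formats the epoch itself as day, month, and year.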
[jira] [Created] (ARROW-13915) [R][CI] R UCRT C++ bundles are incomplete
Neal Richardson created ARROW-13915:

Summary: [R][CI] R UCRT C++ bundles are incomplete
Key: ARROW-13915
URL: https://issues.apache.org/jira/browse/ARROW-13915
Project: Apache Arrow
Issue Type: Bug
Components: Continuous Integration, R
Reporter: Neal Richardson
Assignee: Neal Richardson
Fix For: 6.0.0

[~jeroenooms] noticed this when some checks were triggered at CRAN. To prevent things like this, we need ARROW-13683, but we can still fix this now.
[jira] [Created] (ARROW-13914) [C++][Python] Optimize type inference when converting from python values
Krisztian Szucs created ARROW-13914:

Summary: [C++][Python] Optimize type inference when converting from python values
Key: ARROW-13914
URL: https://issues.apache.org/jira/browse/ARROW-13914
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Krisztian Szucs

Currently we use an extensive set of checks to infer the Arrow type from Python sequences. Last time I checked using asv, the inference part had a significant overhead. We could try other approaches to speed up the type inference; see the comments: https://github.com/apache/arrow/pull/11076#discussion_r702808196
[jira] [Created] (ARROW-13913) [C++] segfault if compute function index called with no options supplied
Nic Crane created ARROW-13913:

Summary: [C++] segfault if compute function index called with no options supplied
Key: ARROW-13913
URL: https://issues.apache.org/jira/browse/ARROW-13913
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Nic Crane

If I try to use the {{index}} compute function from R without {{IndexOptions}}, it results in a segfault.

{code:java}
> call_function("index", Array$create(1:10))

Thread 1 "R" received signal SIGSEGV, Segmentation fault.
0x72291384 in arrow::compute::FunctionOptions::FunctionOptions (
    this=0x7fff5970)
    at /home/nic2/arrow_installed_version/include/arrow/compute/function.h:60
60 class ARROW_EXPORT FunctionOptions : public util::EqualityComparable {
{code}

I ran a fresh container to check it wasn't just my machine, and got a similar output with the additional line:

{code:java}
*** caught segfault ***
address 0x8, cause 'memory not mapped'
{code}
[jira] [Created] (ARROW-13912) [R] TrimOptions implementation breaks test-r-minimal-build due to dependencies
Nic Crane created ARROW-13912:

Summary: [R] TrimOptions implementation breaks test-r-minimal-build due to dependencies
Key: ARROW-13912
URL: https://issues.apache.org/jira/browse/ARROW-13912
Project: Apache Arrow
Issue Type: Bug
Components: R
Reporter: Nic Crane