[jira] [Created] (ARROW-17135) [C++] Reduce code size in arrow/compute/kernels/scalar_compare.cc
Wes McKinney created ARROW-17135: Summary: [C++] Reduce code size in arrow/compute/kernels/scalar_compare.cc Key: ARROW-17135 URL: https://issues.apache.org/jira/browse/ARROW-17135 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Assignee: Wes McKinney I noticed the large symbol sizes in scalar_compare.cc when looking at the shared library, and made a quick attempt on the plane to reduce the code size. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17134) [C++(?)/Python] pyarrow.compute.replace_with_mask does not replace null when providing an array mask
Matthew Roeschke created ARROW-17134: Summary: [C++(?)/Python] pyarrow.compute.replace_with_mask does not replace null when providing an array mask Key: ARROW-17134 URL: https://issues.apache.org/jira/browse/ARROW-17134 Project: Apache Arrow Issue Type: Improvement Components: C++, Python Affects Versions: 8.0.0 Reporter: Matthew Roeschke

{code:java}
In [1]: import pyarrow as pa

In [2]: arr1 = pa.array([1, 0, 1, None, None])

In [3]: arr2 = pa.array([None, None, 1, 0, 1])

In [4]: pa.compute.replace_with_mask(arr1, [False, False, False, True, True], arr2)
Out[4]:
[
  1,
  0,
  1,
  null,  # I would expect 0
  null   # I would expect 1
]

In [5]: pa.__version__
Out[5]: '8.0.0'
{code}

I have noticed this behavior with integer, floating-point, boolean, and temporal types.
[jira] [Created] (ARROW-17133) pqarrow: PlainFixedLenByteArrayEncoder behaves differently from DictFixedLenByteArrayEncoder with null values where schema has Nullable: false
Phillip LeBlanc created ARROW-17133: --- Summary: pqarrow: PlainFixedLenByteArrayEncoder behaves differently from DictFixedLenByteArrayEncoder with null values where schema has Nullable: false Key: ARROW-17133 URL: https://issues.apache.org/jira/browse/ARROW-17133 Project: Apache Arrow Issue Type: Bug Components: Go, Parquet Affects Versions: 8.0.0 Reporter: Phillip LeBlanc I have created a small repro to illustrate this bug: https://gist.github.com/phillipleblanc/5e3e2d0e6914d276cf9fd79e019581de When writing a Decimal128 array to a Parquet file, the pqarrow package prefers to use DictFixedLenByteArrayEncoder. If the size of the array goes over some threshold, it switches to PlainFixedLenByteArrayEncoder. The DictFixedLenByteArrayEncoder tolerates null values in a Decimal128 array with the arrow schema set to Nullable: false; however, the PlainFixedLenByteArrayEncoder will not tolerate null values and will panic. Having null values in an array marked as non-nullable is an issue in the user code; still, it was surprising that my buggy code worked sometimes and failed other times. I would expect the PlainFixedLen encoder to handle nulls the same way as the DictFixedLen encoder, or for the DictFixedLen encoder to also panic. An observation: most other array types handle nulls when writing to Parquet even with the schema marked as non-nullable; this was the first instance I found in the pqarrow package where marking the Arrow schema as nullable was necessary to write arrays containing null values. Again, it is debatable whether this is desirable or not.
[jira] [Created] (ARROW-17132) [R] Mutate in compare_dplyr_binding returns wrong type
Rok Mihevc created ARROW-17132: -- Summary: [R] Mutate in compare_dplyr_binding returns wrong type Key: ARROW-17132 URL: https://issues.apache.org/jira/browse/ARROW-17132 Project: Apache Arrow Issue Type: Bug Components: R Reporter: Rok Mihevc The following:

{code:r}
df <- tibble::tibble(
  time = as.POSIXct(seq(as.Date("1999-12-31", tz = "UTC"), as.Date("2001-01-01", tz = "UTC"), by = "day"))
)

compare_dplyr_binding(
  .input %>%
    mutate(x = yday(time)) %>%
    collect(),
  df
)
{code}

Fails with:

{code:bash}
Failure (test-dplyr-funcs-datetime.R:574:3): extract wday from timestamp
`object` (`actual`) not equal to `expected` (`expected`).
`attr(actual$time, 'tzone')` is a character vector ('UTC')
`attr(expected$time, 'tzone')` is absent
Backtrace:
 1. arrow:::compare_dplyr_binding(...) at test-dplyr-funcs-datetime.R:574:2
 2. arrow:::expect_equal(via_batch, expected, ...) at tests/testthat/helper-expectation.R:115:4
 3. testthat::expect_equal(...) at tests/testthat/helper-expectation.R:42:4

Failure (test-dplyr-funcs-datetime.R:574:3): extract wday from timestamp
`object` (`actual`) not equal to `expected` (`expected`).
`attr(actual$time, 'tzone')` is a character vector ('UTC')
`attr(expected$time, 'tzone')` is absent
Backtrace:
 1. arrow:::compare_dplyr_binding(...) at test-dplyr-funcs-datetime.R:574:2
 2. arrow:::expect_equal(via_table, expected, ...) at tests/testthat/helper-expectation.R:129:4
 3. testthat::expect_equal(...) at tests/testthat/helper-expectation.R:42:4
{code}

This also happens for qday and probably other functions where the input is temporal and the output is numeric.
[jira] [Created] (ARROW-17131) [Python] add a field() method to StructType that returns the user a field
Anja Boskovic created ARROW-17131: - Summary: [Python] add a field() method to StructType that returns the user a field Key: ARROW-17131 URL: https://issues.apache.org/jira/browse/ARROW-17131 Project: Apache Arrow Issue Type: Improvement Reporter: Anja Boskovic Assignee: Anja Boskovic Joris suggested here: "we could also add a {{field()}} method that returns you a field? (that is more discoverable than {{[]}}, and would be consistent with a Schema and with StructArray (to get the child array for that field))". Completing this issue would also mean updating the example in the API docs for StructType to mention {{field()}}.
[GitHub] [arrow-julia] palday opened a new issue, #328: `fromarrow` dispatch for `::Type{Union{Missing, T}}` is broken when `T` is parametric
palday opened a new issue, #328: URL: https://github.com/apache/arrow-julia/issues/328 xref: https://github.com/JuliaCloud/AWSS3.jl/issues/263 The fix seems to be to change `::Type{Union{Missing, T}}` to `::Type{<:Union{Missing, T}}`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (ARROW-17130) Enable multiple character delimiters in read_csv
Jack Howard created ARROW-17130: --- Summary: Enable multiple character delimiters in read_csv Key: ARROW-17130 URL: https://issues.apache.org/jira/browse/ARROW-17130 Project: Apache Arrow Issue Type: Improvement Components: Format Affects Versions: 8.0.1 Reporter: Jack Howard read_csv's ParseOptions allows only a single-character delimiter. Single-character delimiters are highly susceptible to the candidate value occurring within the data to be loaded, negating their ability to serve as a delimiter. If a two-character delimiter is supplied, the single-character limit produces the error "only single character unicode strings can be converted to Py_UCS4, got length 2".
[jira] [Created] (ARROW-17129) [C++][Compute] Improve memory efficiency in Grouper
Wes McKinney created ARROW-17129: Summary: [C++][Compute] Improve memory efficiency in Grouper Key: ARROW-17129 URL: https://issues.apache.org/jira/browse/ARROW-17129 Project: Apache Arrow Issue Type: Improvement Reporter: Wes McKinney There are APIs in arrow::compute::Grouper (GetUniques, Consume) which could be refactored to write into preallocated memory, or otherwise gain a mode that performs less mandatory allocation. We can investigate this at some point.
[jira] [Created] (ARROW-17128) [C++] Sporadic DCHECK failure in arrow-dataset-scanner-test (2)
Antoine Pitrou created ARROW-17128: -- Summary: [C++] Sporadic DCHECK failure in arrow-dataset-scanner-test (2) Key: ARROW-17128 URL: https://issues.apache.org/jira/browse/ARROW-17128 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Antoine Pitrou Just got this sporadic assertion error: {code} [ RUN ] TestScannerThreading/TestScanner.CountRowsWithMetadata/3Threaded2d16b1024r /home/antoine/arrow/dev/cpp/src/arrow/util/future.cc:331: Check failed: !IsFutureFinished(state_) Future already marked finished {code} Stack trace: {code} #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x74a24859 in __GI_abort () at abort.c:79 #2 0x756f635c in arrow::util::CerrLog::~CerrLog (this=0x5586b330, __in_chrg=) at /home/antoine/arrow/dev/cpp/src/arrow/util/logging.cc:72 #3 0x756f6378 in arrow::util::CerrLog::~CerrLog (this=0x5586b330, __in_chrg=) at /home/antoine/arrow/dev/cpp/src/arrow/util/logging.cc:74 #4 0x756f66dd in arrow::util::ArrowLog::~ArrowLog (this=0x7fffebffd970, __in_chrg=) at /home/antoine/arrow/dev/cpp/src/arrow/util/logging.cc:250 #5 0x756c7af1 in arrow::ConcreteFutureImpl::DoMarkFinishedOrFailed (this=0x5585e910, state=arrow::FutureState::SUCCESS) at /home/antoine/arrow/dev/cpp/src/arrow/util/future.cc:331 #6 0x756c70e7 in arrow::ConcreteFutureImpl::DoMarkFinished (this=0x5585e910) at /home/antoine/arrow/dev/cpp/src/arrow/util/future.cc:232 #7 0x756c8288 in arrow::FutureImpl::MarkFinished (this=0x5585e910) at /home/antoine/arrow/dev/cpp/src/arrow/util/future.cc:409 #8 0x7564e4f7 in arrow::Future::DoMarkFinished (this=0x55896bf0, res=...) at /home/antoine/arrow/dev/cpp/src/arrow/util/future.h:725 #9 0x7564c198 in arrow::Future::MarkFinished (this=0x55896bf0, s=...) 
at /home/antoine/arrow/dev/cpp/src/arrow/util/future.h:476 #10 0x7599d045 in arrow::compute::(anonymous namespace)::ScalarAggregateNode::Finish (this=0x55896b60) at /home/antoine/arrow/dev/cpp/src/arrow/compute/exec/aggregate_node.cc:255 #11 0x7599c422 in arrow::compute::(anonymous namespace)::ScalarAggregateNode::InputReceived (this=0x55896b60, input=0x559077c0, batch=...) at /home/antoine/arrow/dev/cpp/src/arrow/compute/exec/aggregate_node.cc:176 #12 0x759c8567 in operator() (__closure=0x7fffebffdd40) at /home/antoine/arrow/dev/cpp/src/arrow/compute/exec/exec_plan.cc:531 #13 0x759c873a in arrow::compute::MapNode::SubmitTask(std::function (arrow::compute::ExecBatch)>, arrow::compute::ExecBatch) (this=0x559077c0, map_fn=..., batch=...) at /home/antoine/arrow/dev/cpp/src/arrow/compute/exec/exec_plan.cc:535 #14 0x75a97524 in arrow::compute::(anonymous namespace)::ProjectNode::InputReceived (this=0x559077c0, input=0x55913150, batch=...) at /home/antoine/arrow/dev/cpp/src/arrow/compute/exec/project_node.cc:111 #15 0x75aa3da2 in operator() (__closure=0x7fffa000aba0) at /home/antoine/arrow/dev/cpp/src/arrow/compute/exec/source_node.cc:119 #16 0x75aaa56f in std::__invoke_impl::&)>::&>(std::__invoke_other, struct {...} &) (__f=...) at /home/antoine/miniconda3/envs/pyarrow/x86_64-conda-linux-gnu/include/c++/10.3.0/bits/invoke.h:60 #17 0x75aa943c in std::__invoke_r::&)>::&>(struct {...} &) (__fn=...) at /home/antoine/miniconda3/envs/pyarrow/x86_64-conda-linux-gnu/include/c++/10.3.0/bits/invoke.h:115 #18 0x75aa79ca in std::_Function_handler::&)>:: >::_M_invoke(const std::_Any_data &) (__functor=...) 
at /home/antoine/miniconda3/envs/pyarrow/x86_64-conda-linux-gnu/include/c++/10.3.0/bits/std_function.h:292 #19 0x759ca6b5 in std::function::operator()() const (this=0x5593e700) at /home/antoine/miniconda3/envs/pyarrow/x86_64-conda-linux-gnu/include/c++/10.3.0/bits/std_function.h:622 #20 0x759df33b in arrow::detail::ContinueFuture::operator()&, , arrow::Status, arrow::Future >(arrow::Future, std::function&) const (this=0x5593e6f8, next=..., f=...) at /home/antoine/arrow/dev/cpp/src/arrow/util/future.h:150 #21 0x759df19f in std::__invoke_impl&, std::function&>(std::__invoke_other, arrow::detail::ContinueFuture&, arrow::Future&, std::function&) (__f=...) at /home/antoine/miniconda3/envs/pyarrow/x86_64-conda-linux-gnu/include/c++/10.3.0/bits/invoke.h:60 #22 0x759def33 in std::__invoke&, std::function&>(arrow::detail::ContinueFuture&, arrow::Future&, std::function&) (__fn=...) at /home/antoine/miniconda3/envs/pyarrow/x86_64-conda-linux-gnu/include/c++/10.3.0/bits/invoke.h:95 #23 0x759deb86 in std::_Bind, std::function)>::__call(std::tuple<>&&, std::_Index_tuple<0ul, 1ul>) (this=0x5593e6f8, __args=...) at
[jira] [Created] (ARROW-17127) [C++] Sporadic crash in arrow-dataset-scanner-test (1)
Antoine Pitrou created ARROW-17127: -- Summary: [C++] Sporadic crash in arrow-dataset-scanner-test (1) Key: ARROW-17127 URL: https://issues.apache.org/jira/browse/ARROW-17127 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Antoine Pitrou Fix For: 9.0.0 See GDB backtrace at https://gist.github.com/pitrou/ef47ab902cbbba80440ee0375a1d7ed3
[jira] [Created] (ARROW-17126) [C++] Remove FutureWaiter
Antoine Pitrou created ARROW-17126: -- Summary: [C++] Remove FutureWaiter Key: ARROW-17126 URL: https://issues.apache.org/jira/browse/ARROW-17126 Project: Apache Arrow Issue Type: Task Components: C++ Reporter: Antoine Pitrou We should remove {{FutureWaiter}} and its dependent APIs ({{FutureIterator}}, {{WaitForAll}}, {{WaitForAny}}). Removing {{FutureWaiter}} would significantly simplify the {{Future}} implementation, making it more maintainable and potentially faster.
[jira] [Created] (ARROW-17125) Unable to install pyarrow on Debian 10 (i686)
Rustam Guliev created ARROW-17125: - Summary: Unable to install pyarrow on Debian 10 (i686) Key: ARROW-17125 URL: https://issues.apache.org/jira/browse/ARROW-17125 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 8.0.1, 7.0.1 Environment: Debian GNU/Linux 10 (buster) Python 3.9.7 pip 22.1.2 cmake 3.22.5 $ lscpu Architecture: i686 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian Address sizes: 45 bits physical, 48 bits virtual CPU(s): 4 On-line CPU(s) list: 0-3 Thread(s) per core: 1 Core(s) per socket: 1 Socket(s): 4 Vendor ID: GenuineIntel CPU family: 6 Model: 45 Model name: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz Stepping: 7 CPU MHz: 1995.000 BogoMIPS: 3990.00 Hypervisor vendor: VMware Virtualization type: full L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 20480K Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss nx rdtscp lm constant_tsc arch_perfmon xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx hypervisor lahf_lm pti ssbd ibrs ibpb stibp tsc_adjust arat md_clear flush_l1d arch_capabilities Reporter: Rustam Guliev Hi, I am not able to install pyarrow on Debian 10. 
First, the installation (via `pip` or `poetry install`) fails with the following: {code:java} EnvCommandError Command ['/home/rustam/.cache/pypoetry/virtualenvs/spectra-annotator-Vr_f9e53-py3.9/bin/pip', 'install', '--no-deps', 'file:///home/rustam/.cache/pypoetry/artifacts/b2/96/6a/2a784854a355f986090eafd225285e4a1c6167b5a6adc6c859d785a095/pyarrow-7.0.0.tar.gz'] errored with the following return code 1, and output: Processing /home/rustam/.cache/pypoetry/artifacts/b2/96/6a/2a784854a355f986090eafd225285e4a1c6167b5a6adc6c859d785a095/pyarrow-7.0.0.tar.gz Installing build dependencies: started Installing build dependencies: finished with status 'done' Getting requirements to build wheel: started Getting requirements to build wheel: finished with status 'done' Preparing metadata (pyproject.toml): started Preparing metadata (pyproject.toml): finished with status 'done' Building wheels for collected packages: pyarrow Building wheel for pyarrow (pyproject.toml): started Building wheel for pyarrow (pyproject.toml): finished with status 'error' error: subprocess-exited-with-error × Building wheel for pyarrow (pyproject.toml) did not run successfully. 
│ exit code: 1 ╰─> [261 lines of output] running bdist_wheel running build running build_py running egg_info writing pyarrow.egg-info/PKG-INFO writing dependency_links to pyarrow.egg-info/dependency_links.txt writing entry points to pyarrow.egg-info/entry_points.txt writing requirements to pyarrow.egg-info/requires.txt writing top-level names to pyarrow.egg-info/top_level.txt listing git files failed - pretending there aren't any reading manifest file 'pyarrow.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no files found matching '../LICENSE.txt' warning: no files found matching '../NOTICE.txt' warning: no previously-included files matching '*.so' found anywhere in distribution warning: no previously-included files matching '*.pyc' found anywhere in distribution warning: no previously-included files matching '*~' found anywhere in distribution warning: no previously-included files matching '#*' found anywhere in distribution warning: no previously-included files matching '.git*' found anywhere in distribution warning: no previously-included files matching '.DS_Store' found anywhere in distribution no previously-included directories found matching '.asv' /tmp/pip-build-env-umvxn44o/overlay/lib/python3.9/site-packages/setuptools/command/build_py.py:153: SetuptoolsDeprecationWarning: Installing 'pyarrow.includes' as data is deprecated, please list it in `packages`. !! # Package would be ignored # Python recognizes 'pyarrow.includes' as an importable package, but it is not listed in the `packages` configuration of setuptools. 'pyarrow.includes' has been automatically added to the distribution only because it may contain data files, but this behavior is likely to change in future versions of setuptools (and therefore is considered deprecated). Please
[jira] [Created] (ARROW-17124) [C++] Data race between future signalling and destruction
Antoine Pitrou created ARROW-17124: -- Summary: [C++] Data race between future signalling and destruction Key: ARROW-17124 URL: https://issues.apache.org/jira/browse/ARROW-17124 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Antoine Pitrou This sporadic Thread Sanitizer error just occurred to me: {code} WARNING: ThreadSanitizer: data race (pid=636020) Write of size 8 at 0x7b2c17d0 by main thread: #0 pthread_cond_destroy ../../../../libsanitizer/tsan/tsan_interceptors_posix.cpp:1208 (libtsan.so.0+0x31c14) #1 arrow::ConcreteFutureImpl::~ConcreteFutureImpl() /home/antoine/arrow/dev/cpp/src/arrow/util/future.cc:211 (libarrow.so.900+0xa70b62) #2 arrow::ConcreteFutureImpl::~ConcreteFutureImpl() /home/antoine/arrow/dev/cpp/src/arrow/util/future.cc:211 (libarrow.so.900+0xa70ba0) #3 std::default_delete::operator()(arrow::FutureImpl*) const /home/antoine/miniconda3/envs/pyarrow/x86_64-conda-linux-gnu/include/c++/10.3.0/bits/unique_ptr.h:85 (arrow-dataset-file-test+0x584a1) #4 std::_Sp_counted_deleter, std::allocator, (__gnu_cxx::_Lock_policy)2>::_M_dispose() /home/antoine/miniconda3/envs/pyarrow/x86_64-conda-linux-gnu/include/c++/10.3.0/bits/shared_ptr_base.h:474 (arrow-dataset-file-test+0xa9638) #5 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (libarrow.so.900+0x2e1158) #6 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (libarrow.so.900+0x2dc6ed) #7 std::__shared_ptr::~__shared_ptr() (libarrow.so.900+0x978fee) #8 std::shared_ptr::~shared_ptr() (libarrow.so.900+0x97901c) #9 arrow::Future::~Future() (libarrow.so.900+0x97904a) #10 ~ExecPlanImpl /home/antoine/arrow/dev/cpp/src/arrow/compute/exec/exec_plan.cc:52 (libarrow.so.900+0xe8160b) #11 ~ExecPlanImpl /home/antoine/arrow/dev/cpp/src/arrow/compute/exec/exec_plan.cc:58 (libarrow.so.900+0xe8166e) #12 _M_dispose /home/antoine/miniconda3/envs/pyarrow/x86_64-conda-linux-gnu/include/c++/10.3.0/bits/shared_ptr_base.h:380 (libarrow.so.900+0xea6c2a) #13 
std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (libarrow_dataset.so.900+0x7bd10) #14 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() /home/antoine/miniconda3/envs/pyarrow/x86_64-conda-linux-gnu/include/c++/10.3.0/bits/shared_ptr_base.h:733 (libarrow_dataset.so.900+0x77ad9) #15 std::__shared_ptr::~__shared_ptr() /home/antoine/miniconda3/envs/pyarrow/x86_64-conda-linux-gnu/include/c++/10.3.0/bits/shared_ptr_base.h:1183 (libarrow_dataset.so.900+0xd3dfc) #16 std::shared_ptr::~shared_ptr() /home/antoine/miniconda3/envs/pyarrow/x86_64-conda-linux-gnu/include/c++/10.3.0/bits/shared_ptr.h:121 (libarrow_dataset.so.900+0xd3e2a) #17 arrow::dataset::FileSystemDataset::Write(arrow::dataset::FileSystemDatasetWriteOptions const&, std::shared_ptr) /home/antoine/arrow/dev/cpp/src/arrow/dataset/file_base.cc:398 (libarrow_dataset.so.900+0xd49ca) #18 arrow::dataset::TestFileSystemDataset_WriteProjected_Test::TestBody() /home/antoine/arrow/dev/cpp/src/arrow/dataset/file_test.cc:330 (arrow-dataset-file-test+0x2e382) #19 void testing::internal::HandleExceptionsInMethodIfSupported(testing::Test*, void (testing::Test::*)(), char const*) (libgtest.so.1.11.0+0x5bd3d) Previous read of size 8 at 0x7b2c17d0 by thread T3: #0 pthread_cond_broadcast ../../../../libsanitizer/tsan/tsan_interceptors_posix.cpp:1201 (libtsan.so.0+0x31b51) #1 arrow::ConcreteFutureImpl::DoMarkFinishedOrFailed(arrow::FutureState) /home/antoine/arrow/dev/cpp/src/arrow/util/future.cc:343 (libarrow.so.900+0xa6bee0) #2 arrow::ConcreteFutureImpl::DoMarkFinished() /home/antoine/arrow/dev/cpp/src/arrow/util/future.cc:232 (libarrow.so.900+0xa6b0f4) #3 arrow::FutureImpl::MarkFinished() /home/antoine/arrow/dev/cpp/src/arrow/util/future.cc:409 (libarrow.so.900+0xa6c83f) #4 arrow::Future::DoMarkFinished(arrow::Result) /home/antoine/arrow/dev/cpp/src/arrow/util/future.h:725 (libarrow.so.900+0x9cbf81) #5 void arrow::Future::MarkFinished(arrow::Status) 
/home/antoine/arrow/dev/cpp/src/arrow/util/future.h:476 (libarrow.so.900+0x9c921c) #6 operator() /home/antoine/arrow/dev/cpp/src/arrow/compute/exec/exec_plan.cc:192 (libarrow.so.900+0xe82ee6) #7 operator() /home/antoine/arrow/dev/cpp/src/arrow/util/future.h:522 (libarrow.so.900+0xea70a3) {code} I think the fix is simply to signal the condition variable with the mutex locked (which might be a bit worse performance-wise).
[jira] [Created] (ARROW-17123) [JS] Unable to open reader on .arrow file after fetch: Uncaught (in promise) Error: Expected to read 1329865020 metadata bytes, but only read 1123.
Benoit Cantin created ARROW-17123: - Summary: [JS] Unable to open reader on .arrow file after fetch: Uncaught (in promise) Error: Expected to read 1329865020 metadata bytes, but only read 1123. Key: ARROW-17123 URL: https://issues.apache.org/jira/browse/ARROW-17123 Project: Apache Arrow Issue Type: Bug Components: JavaScript Affects Versions: 8.0.1 Reporter: Benoit Cantin I created a file in raw Arrow format with the script given in the PyArrow cookbook here: [https://arrow.apache.org/cookbook/py/io.html#saving-arrow-arrays-to-disk] In a Node.js application, this file can be read with:

{code:java}
const r = await RecordBatchReader.from(fs.createReadStream(filePath));
await r.open();
for (let i = 0; i < r.numRecordBatches; i++) {
  const rb = await r.readRecordBatch(i);
  if (rb !== null) {
    console.log(rb.numRows);
  }
}
{code}

However this method loads the whole file into memory (is that a bug?), which is not scalable. To solve this scalability issue, I tried to load the data with fetch as described in the [README.md|https://github.com/apache/arrow/tree/master/js#load-data-with-fetch]. Both:

{code:java}
import { tableFromIPC } from "apache-arrow";

const table = await tableFromIPC(fetch(filePath));
console.table([...table]);
{code}

and

{code:java}
const r = await RecordBatchReader.from(await fetch(filePath));
await r.open();
{code}

fail with the error: Uncaught (in promise) Error: Expected to read 1329865020 metadata bytes, but only read 1123.
[jira] [Created] (ARROW-17122) [Python] Cleanup after moving Python related code into pyarrow
Alenka Frim created ARROW-17122: --- Summary: [Python] Cleanup after moving Python related code into pyarrow Key: ARROW-17122 URL: https://issues.apache.org/jira/browse/ARROW-17122 Project: Apache Arrow Issue Type: Improvement Components: C++, Python Reporter: Alenka Frim Assignee: Alenka Frim Fix For: 10.0.0 This is an umbrella issue for follow-up work that needs to be done after https://issues.apache.org/jira/browse/ARROW-16340 is resolved.
[jira] [Created] (ARROW-17121) [Gandiva][C++] Adding mask function
Palak Pariawala created ARROW-17121: --- Summary: [Gandiva][C++] Adding mask function Key: ARROW-17121 URL: https://issues.apache.org/jira/browse/ARROW-17121 Project: Apache Arrow Issue Type: New Feature Components: C++ - Gandiva Reporter: Palak Pariawala Assignee: Palak Pariawala Add mask(str inp) / mask(str inp, str uc-mask, str lc-mask, str num-mask) functions to Gandiva. By default, upper-case letters are masked as 'X', lower-case letters as 'x', and numbers as 'n'; custom masking characters can be specified via the parameters.