[jira] [Created] (ARROW-18371) [C++] Expose *FromJSON helpers
Rok Mihevc created ARROW-18371: -- Summary: [C++] Expose *FromJSON helpers Key: ARROW-18371 URL: https://issues.apache.org/jira/browse/ARROW-18371 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Rok Mihevc The {Array,{Exec,Record}Batch}FromJSON helper functions would be useful when testing in projects that use Arrow. BatchesWithSchema and MakeBasicBatches could be considered as well. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-18365) [C++][Parquet] Optimize DELTA_BINARY_PACKED encoding and decoding
Rok Mihevc created ARROW-18365: -- Summary: [C++][Parquet] Optimize DELTA_BINARY_PACKED encoding and decoding Key: ARROW-18365 URL: https://issues.apache.org/jira/browse/ARROW-18365 Project: Apache Arrow Issue Type: New Feature Components: C++, Parquet Reporter: Rok Mihevc [As suggested here|https://github.com/apache/arrow/pull/14191#discussion_r1019762308] simd approach such as [FastDifferentialCoding|https://github.com/lemire/FastDifferentialCoding] could be used to speed up encoding and decoding. -- This message was sent by Atlassian Jira (v8.20.10#820010)
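To make the idea behind differential coding concrete, here is a minimal scalar sketch in Python. It is only an illustration of the delta step: it is not the Parquet DELTA_BINARY_PACKED format (which additionally bit-packs the deltas in blocks), it does not use SIMD, and the function names `delta_encode` and `delta_decode` are hypothetical.

```python
from itertools import accumulate


def delta_encode(values):
    """Return (first_value, deltas) for a list of integers."""
    if not values:
        return None, []
    # Each delta is the difference between consecutive values.
    deltas = [b - a for a, b in zip(values, values[1:])]
    return values[0], deltas


def delta_decode(first, deltas):
    """Rebuild the original list with a running (prefix) sum."""
    if first is None:
        return []
    return list(accumulate(deltas, initial=first))


values = [100, 101, 103, 106, 110]
first, deltas = delta_encode(values)
restored = delta_decode(first, deltas)
```

Decoding is a prefix sum, which is the part a SIMD approach like the one linked in the issue would aim to accelerate.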
[jira] [Created] (ARROW-18344) [C++] Use input pre-sortedness to create sorted table with ConcatenateTables
Rok Mihevc created ARROW-18344: -- Summary: [C++] Use input pre-sortedness to create sorted table with ConcatenateTables Key: ARROW-18344 URL: https://issues.apache.org/jira/browse/ARROW-18344 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Rok Mihevc When concatenating large sorted tables (e.g. sorted time series data), the resulting table is in general no longer sorted. However, the sortedness of the inputs can be used to significantly speed up sorting after concatenation. A potential API would be to add ConcatenateTablesOptions.inputs_sorted and implement the logic in ConcatenateTables. -- This message was sent by Atlassian Jira (v8.20.10#820010)
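The speed-up comes from merging inputs that are already sorted instead of sorting the concatenated result from scratch. A minimal Python sketch of that idea, using plain lists in place of tables (the helper name `concatenate_sorted` is hypothetical and is not part of Arrow):

```python
import heapq


def concatenate_sorted(chunks, key=None):
    """Merge already-sorted chunks into one sorted list.

    Assumes every chunk is individually sorted by `key`; a k-way merge
    then costs O(n log k) instead of the O(n log n) of a full sort.
    """
    return list(heapq.merge(*chunks, key=key))


a = [1, 4, 9]
b = [2, 3, 10]
c = []
merged = concatenate_sorted([a, b, c])
```

If the inputs were not actually sorted, the merged output would not be sorted either, so an option such as the proposed `inputs_sorted` would have to be a promise made by the caller.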
[jira] [Created] (ARROW-18342) [C++] AsofJoinNode support for Boolean data field
Rok Mihevc created ARROW-18342: -- Summary: [C++] AsofJoinNode support for Boolean data field Key: ARROW-18342 URL: https://issues.apache.org/jira/browse/ARROW-18342 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Rok Mihevc This is to add boolean data field support to asof join as proposed here: https://github.com/apache/arrow/pull/14485 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-18278) [Java] Maven generate-libs-jni-macos-linux on M1 fails due to cmake error
Rok Mihevc created ARROW-18278: -- Summary: [Java] Maven generate-libs-jni-macos-linux on M1 fails due to cmake error Key: ARROW-18278 URL: https://issues.apache.org/jira/browse/ARROW-18278 Project: Apache Arrow Issue Type: New Feature Components: Java Reporter: Rok Mihevc When building with maven on M1 [as per docs|https://arrow.apache.org/docs/dev/developers/java/building.html#id3]: {code:bash} mvn clean install mvn generate-resources -Pgenerate-libs-jni-macos-linux -N mvn -Darrow.cpp.build.dir=/arrow/java-dist/lib/ -Parrow-jni clean install {code} I get the following error: {code:bash} [INFO] --- exec-maven-plugin:3.1.0:exec (jni-cmake) @ arrow-java-root --- -- Building using CMake version: 3.24.2 -- The C compiler identification is AppleClang 14.0.0.1429 -- The CXX compiler identification is AppleClang 14.0.0.1429 -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc - skipped -- Detecting C compile features -- Detecting C compile features - done -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped -- Detecting CXX compile features -- Detecting CXX compile features - done -- Found Java: /Library/Java/JavaVirtualMachines/zulu-11.jdk/Contents/Home/bin/java (found version "11.0.16") -- Found JNI: /Library/Java/JavaVirtualMachines/zulu-11.jdk/Contents/Home/include found components: AWT JVM CMake Error at dataset/CMakeLists.txt:18 (find_package): By not providing "FindArrowDataset.cmake" in CMAKE_MODULE_PATH this project has asked CMake to find a package configuration file provided by "ArrowDataset", but CMake did not find one. 
Could not find a package configuration file provided by "ArrowDataset" with any of the following names: ArrowDatasetConfig.cmake arrowdataset-config.cmake Add the installation prefix of "ArrowDataset" to CMAKE_PREFIX_PATH or set "ArrowDataset_DIR" to a directory containing one of the above files. If "ArrowDataset" provides a separate development package or SDK, be sure it has been installed. -- Configuring incomplete, errors occurred! See also "/Users/rok/Documents/repos/arrow/java-jni/CMakeFiles/CMakeOutput.log". See also "/Users/rok/Documents/repos/arrow/java-jni/CMakeFiles/CMakeError.log". [ERROR] Command execution failed. org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1) at org.apache.commons.exec.DefaultExecutor.executeInternal (DefaultExecutor.java:404) at org.apache.commons.exec.DefaultExecutor.execute (DefaultExecutor.java:166) at org.codehaus.mojo.exec.ExecMojo.executeCommandLine (ExecMojo.java:1000) at org.codehaus.mojo.exec.ExecMojo.executeCommandLine (ExecMojo.java:947) at org.codehaus.mojo.exec.ExecMojo.execute (ExecMojo.java:471) at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137) at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:370) at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:351) at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215) at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:171) at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:163) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81) at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56) at 
org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128) at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:294) at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192) at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105) at org.apache.maven.cli.MavenCli.execute (MavenCli.java:960) at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:293) at org.apache.maven.cli.MavenCli.main (MavenCli.java:196) at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method) at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62) at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke (Method.java:566) at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282) at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
[jira] [Created] (ARROW-18042) [Java] Distribute Apple M1 compatible JNI libraries via mavencentral
Rok Mihevc created ARROW-18042: -- Summary: [Java] Distribute Apple M1 compatible JNI libraries via mavencentral Key: ARROW-18042 URL: https://issues.apache.org/jira/browse/ARROW-18042 Project: Apache Arrow Issue Type: New Feature Components: Java Affects Versions: 9.0.0 Reporter: Rok Mihevc Currently JNI libraries need to be built locally to be usable on Apple silicon. We should build and distribute compatible libraries via mavencentral. @dsusanibara @lidavidm Also see ARROW-17267 and ARROW-16608 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17935) [C++] Kernel to convert timestamp with timezone to wall time
Rok Mihevc created ARROW-17935: -- Summary: [C++] Kernel to convert timestamp with timezone to wall time Key: ARROW-17935 URL: https://issues.apache.org/jira/browse/ARROW-17935 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Rok Mihevc We have `assume_timezone` to go from a wall time timestamp to a timestamp with a timezone. We might want a reverse operation to go from a timestamp with a timezone to wall time. This is not needed for computation within Arrow, but would be needed if an application outside of Arrow consumes wall time. E.g.: https://stackoverflow.com/questions/73275465/how-to-keep-original-datatime-in-pyarrow-table/73276431 -- This message was sent by Atlassian Jira (v8.20.10#820010)
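The requested reverse operation can be illustrated with the Python standard library. This is only a sketch of the semantics, not an Arrow kernel; the function name `to_wall_time` is hypothetical, and a fixed UTC+2 offset is assumed so the example does not depend on a timezone database.

```python
from datetime import datetime, timedelta, timezone


def to_wall_time(ts, tz):
    """Convert a timezone-aware timestamp to naive wall time in `tz`."""
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware timestamp")
    # Shift to the target zone, then drop the zone information.
    return ts.astimezone(tz).replace(tzinfo=None)


utc_ts = datetime(2022, 8, 8, 12, 0, tzinfo=timezone.utc)
plus_two = timezone(timedelta(hours=2))  # assumed fixed offset, no DST rules
wall = to_wall_time(utc_ts, plus_two)
```

The result is a naive timestamp that reads the same as the local clock would, which is what a consumer outside of Arrow typically expects.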
[jira] [Created] (ARROW-17918) [Python] ExtensionArray.__getitem__ is not called if called from StructArray
Rok Mihevc created ARROW-17918: -- Summary: [Python] ExtensionArray.__getitem__ is not called if called from StructArray Key: ARROW-17918 URL: https://issues.apache.org/jira/browse/ARROW-17918 Project: Apache Arrow Issue Type: New Feature Components: Python Reporter: Rok Mihevc

It seems that extension information is lost when getting a value from a StructScalar. See:

{code:python}
import pyarrow as pa


class ExampleScalar(pa.ExtensionScalar):
    def as_py(self):
        print(f"ExampleScalar.as_py -> {self.value.as_py()}")
        return self.value.as_py()


class ExampleArray(pa.ExtensionArray):
    def __getitem__(self, item):
        return f"ExampleArray.__getitem__[{item}] -> {self.storage[item]}"

    def __arrow_ext_scalar_class__(self):
        return ExampleScalar


class ExampleType(pa.ExtensionType):
    def __init__(self):
        pa.ExtensionType.__init__(self, pa.int64(), "ExampleExtensionType")

    def __arrow_ext_serialize__(self):
        return b""

    def __arrow_ext_class__(self):
        return ExampleArray


example_type = ExampleType()
arr = pa.array([1, 2, 3])
example_array = pa.ExtensionArray.from_storage(example_type, arr)
example_array2 = pa.StructArray.from_arrays([example_array, arr], ["a", "b"])

print("\nExample 1\n=")
print(example_array[0])
print(example_array.type)
print(type(example_array[0]))

print("\nExample 2\n=")
print(example_array2[0])
print(example_array2[0].type)
print(example_array2[0]["a"])
print(example_array2[0]["a"].type)
{code}

Returns:

{code:python}
Example 1
=
ExampleArray.__getitem__[0] -> 1
extension>

Example 2
=
[('a', 1), ('b', 1)]
struct>, b: int64>
1
extension>
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17799) [C++][Parquet] Add DELTA_LENGTH_BYTE_ARRAY encoder to Parquet writer
Rok Mihevc created ARROW-17799: -- Summary: [C++][Parquet] Add DELTA_LENGTH_BYTE_ARRAY encoder to Parquet writer Key: ARROW-17799 URL: https://issues.apache.org/jira/browse/ARROW-17799 Project: Apache Arrow Issue Type: New Feature Components: C++, Parquet Reporter: Rok Mihevc We need to add DELTA_LENGTH_BYTE_ARRAY encoder to implement DELTA_BYTE_ARRAY encoder (ARROW-17619). ARROW-13388 already implemented DELTA_LENGTH_BYTE_ARRAY decoder. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17798) [C++][Parquet] Add DELTA_BINARY_PACKED encoder to Parquet writer
Rok Mihevc created ARROW-17798: -- Summary: [C++][Parquet] Add DELTA_BINARY_PACKED encoder to Parquet writer Key: ARROW-17798 URL: https://issues.apache.org/jira/browse/ARROW-17798 Project: Apache Arrow Issue Type: New Feature Components: C++, Parquet Reporter: Rok Mihevc We need to add DELTA_BINARY_PACKED encoder to implement DELTA_BYTE_ARRAY encoder (ARROW-17619). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17685) [Python] Expose jemalloc statistics for logging
Rok Mihevc created ARROW-17685: -- Summary: [Python] Expose jemalloc statistics for logging Key: ARROW-17685 URL: https://issues.apache.org/jira/browse/ARROW-17685 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Rok Mihevc Expose C++ jemalloc stats added by ARROW-16981 in Python. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17619) [C++][Parquet] Add DELTA_BYTE_ARRAY encoder to Parquet writer
Rok Mihevc created ARROW-17619: -- Summary: [C++][Parquet] Add DELTA_BYTE_ARRAY encoder to Parquet writer Key: ARROW-17619 URL: https://issues.apache.org/jira/browse/ARROW-17619 Project: Apache Arrow Issue Type: New Feature Components: C++, Parquet Reporter: Rok Mihevc PARQUET-492 added DELTA_BYTE_ARRAY decoder, but we don't have an encoder. -- This message was sent by Atlassian Jira (v8.20.10#820010)
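For orientation, the core idea of DELTA_BYTE_ARRAY is incremental (prefix) encoding: each value is stored as the length of the prefix it shares with the previous value plus the remaining suffix. In the Parquet format the prefix lengths are then written with DELTA_BINARY_PACKED and the suffixes with DELTA_LENGTH_BYTE_ARRAY, which is why those two encoders are prerequisites. The Python sketch below shows only the prefix/suffix split and its inverse; the function names are hypothetical and no bit-packing or page layout is attempted.

```python
def common_prefix_len(a, b):
    """Number of leading bytes shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_encode(values):
    """Split each value into (shared prefix length, suffix)."""
    prefix_lengths, suffixes = [], []
    previous = b""
    for value in values:
        n = common_prefix_len(previous, value)
        prefix_lengths.append(n)
        suffixes.append(value[n:])
        previous = value
    return prefix_lengths, suffixes


def prefix_decode(prefix_lengths, suffixes):
    """Rebuild the values from prefix lengths and suffixes."""
    out, previous = [], b""
    for n, suffix in zip(prefix_lengths, suffixes):
        previous = previous[:n] + suffix
        out.append(previous)
    return out


values = [b"apple", b"applet", b"apply", b"banana"]
lengths, suffixes = prefix_encode(values)
restored = prefix_decode(lengths, suffixes)
```

The encoding pays off when neighbouring values share long prefixes, for example sorted strings or URLs.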
[jira] [Created] (ARROW-17411) [C++][Python][Doc] Document that order is not preserved when writing dataset with use_threads=True
Rok Mihevc created ARROW-17411: -- Summary: [C++][Python][Doc] Document that order is not preserved when writing dataset with use_threads=True Key: ARROW-17411 URL: https://issues.apache.org/jira/browse/ARROW-17411 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Rok Mihevc Current behaviour is surprising and not documented. See: ARROW-16506 and ARROW-10883 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17407) [Doc][C++][FlightRPC] Flight/gRPC best practices
Rok Mihevc created ARROW-17407: -- Summary: [Doc][C++][FlightRPC] Flight/gRPC best practices Key: ARROW-17407 URL: https://issues.apache.org/jira/browse/ARROW-17407 Project: Apache Arrow Issue Type: Improvement Components: C++, Documentation, FlightRPC Reporter: Rok Mihevc We want to provide best practices and debugging section for [flight docs|https://arrow.apache.org/docs/cpp/flight.html]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17398) [R] Add support for %Z to strptime
Rok Mihevc created ARROW-17398: -- Summary: [R] Add support for %Z to strptime Key: ARROW-17398 URL: https://issues.apache.org/jira/browse/ARROW-17398 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Rok Mihevc While lubridate does not support the %Z flag for strptime, Arrow could. Changes to C++ kernels might be required for support on all platforms, but that shouldn't block implementation, as the kStrptimeSupportsZone flag can be used, [see proposal|https://github.com/apache/arrow/pull/13854#issuecomment-1212694663]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17266) [Doc] Java nightlies file prefix changed
Rok Mihevc created ARROW-17266: -- Summary: [Doc] Java nightlies file prefix changed Key: ARROW-17266 URL: https://issues.apache.org/jira/browse/ARROW-17266 Project: Apache Arrow Issue Type: Improvement Components: Documentation, Java Reporter: Rok Mihevc As per [Arrow docs|https://arrow.apache.org/docs/dev/java/install.html#installing-manually] Java nightlies are at: [https://github.com/ursacomputing/crossbow/releases/tag/nightly-2022-03-19-0-github-java-jars] However file prefix changed and new url format is: [https://github.com/ursacomputing/crossbow/releases/tag/nightly-packaging-2022-07-30-0-github-java-jars] Since it's hard to search github for old releases it would be good to change the url in the docs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17218) [C++][R] Strptime should detect invalid formats
Rok Mihevc created ARROW-17218: -- Summary: [C++][R] Strptime should detect invalid formats Key: ARROW-17218 URL: https://issues.apache.org/jira/browse/ARROW-17218 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Rok Mihevc As [discussed here|https://github.com/apache/arrow/pull/13506#pullrequestreview-1048946393] we want C++ to report invalid strptime formats to avoid having to implement this logic in multiple places. Once that is in place (if it's not yet) we should add tests for this error in R. Currently an invalid format used in R will simply return an array of NAs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17197) [R] floor_date/ceiling_date lubridate comparison tests failing on macOS
Rok Mihevc created ARROW-17197: -- Summary: [R] floor_date/ceiling_date lubridate comparison tests failing on macOS Key: ARROW-17197 URL: https://issues.apache.org/jira/browse/ARROW-17197 Project: Apache Arrow Issue Type: Bug Components: R Reporter: Rok Mihevc Fix For: 9.0.0 We observed failing tests on local machines and [in CI|https://github.com/ursacomputing/crossbow/runs/7460282895?check_suite_focus=true#step:10:228] where timezoned timestamps are rounded to subsecond, second and minute units. Tests fail when comparing our result to lubridate's, [however it seems the issue is on lubridate's side|https://github.com/apache/arrow/pull/12154/files#diff-d405691ec7dd30bdf039b63136e5aac3c34cea96d8ff532485d1faea7f2caaacR2815-R2823]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17193) [C++] Building GCS and tests on M1 MacOS 12.05 is failing.
Rok Mihevc created ARROW-17193: -- Summary: [C++] Building GCS and tests on M1 MacOS 12.05 is failing. Key: ARROW-17193 URL: https://issues.apache.org/jira/browse/ARROW-17193 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 8.0.0 Reporter: Rok Mihevc Building GCS and tests on M1 MacOS 12.05 with dependencies installed with homebrew is failing. {code:bash} cmake \ -GNinja \ -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \ -DCMAKE_INSTALL_LIBDIR=lib \ -DARROW_PYTHON=ON \ -DARROW_COMPUTE=ON \ -DARROW_FILESYSTEM=ON \ -DARROW_CSV=ON \ -DARROW_GCS=ON \ -DARROW_INSTALL_NAME_RPATH=OFF \ -DARROW_BUILD_TESTS=ON \ -DCMAKE_CXX_STANDARD=17 \ .. {code} Building errors with: {noformat} Undefined symbols for architecture arm64: "absl::lts_20220623::FormatTime(std::__1::basic_string_view >, absl::lts_20220623::Time, absl::lts_20220623::TimeZone)", referenced from: arrow::fs::(anonymous namespace)::GcsIntegrationTest_OpenInputStreamReadMetadata_Test::TestBody() in gcsfs_test.cc.o "absl::lts_20220623::FromChrono(std::__1::chrono::time_point > > const&)", referenced from: arrow::fs::(anonymous namespace)::GcsIntegrationTest_OpenInputStreamReadMetadata_Test::TestBody() in gcsfs_test.cc.o "absl::lts_20220623::RFC3339_full", referenced from: arrow::fs::(anonymous namespace)::GcsFileSystem_ObjectMetadataRoundtrip_Test::TestBody() in gcsfs_test.cc.o arrow::fs::(anonymous namespace)::GcsIntegrationTest_OpenInputStreamReadMetadata_Test::TestBody() in gcsfs_test.cc.o "absl::lts_20220623::time_internal::cctz::utc_time_zone()", referenced from: arrow::fs::(anonymous namespace)::GcsIntegrationTest_OpenInputStreamReadMetadata_Test::TestBody() in gcsfs_test.cc.o "absl::lts_20220623::ToDoubleSeconds(absl::lts_20220623::Duration)", referenced from: arrow::fs::(anonymous namespace)::GcsFileSystem_ObjectMetadataRoundtrip_Test::TestBody() in gcsfs_test.cc.o "absl::lts_20220623::Duration::operator-=(absl::lts_20220623::Duration)", referenced from: arrow::fs::(anonymous 
namespace)::GcsFileSystem_ObjectMetadataRoundtrip_Test::TestBody() in gcsfs_test.cc.o "absl::lts_20220623::ParseTime(std::__1::basic_string_view >, std::__1::basic_string_view >, absl::lts_20220623::Time*, std::__1::basic_string, std::__1::allocator >*)", referenced from: arrow::fs::(anonymous namespace)::GcsFileSystem_ObjectMetadataRoundtrip_Test::TestBody() in gcsfs_test.cc.o {noformat} Dependencies installed with: {noformat} brew update && brew bundle --file=cpp/Brewfile {noformat} See https://github.com/apache/arrow/pull/13681#issuecomment-1193241547 and https://github.com/apache/arrow/pull/13407 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17166) [R] [CI] Remove ENV TZ from docker files
Rok Mihevc created ARROW-17166: -- Summary: [R] [CI] Remove ENV TZ from docker files Key: ARROW-17166 URL: https://issues.apache.org/jira/browse/ARROW-17166 Project: Apache Arrow Issue Type: Bug Reporter: Rok Mihevc Fix For: 9.0.0 We have noticed R CI job (AMD64 Ubuntu 20.04 R 4.2 Force-Tests true) failing on master: [1|https://github.com/apache/arrow/runs/7424773120?check_suite_focus=true#step:7:5547], [2|https://github.com/apache/arrow/runs/7431821192?check_suite_focus=true#step:7:5804], [3|https://github.com/apache/arrow/runs/7445803518?check_suite_focus=true#step:7:16305] with: {code:java} Start test: array uses local timezone for POSIXct without timezone test-Array.R:269:3 [success] System has not been booted with systemd as init system (PID 1). Can't operate. Failed to create bus connection: Host is down {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17147) [R] parse_date_time should support locale parameter
Rok Mihevc created ARROW-17147: -- Summary: [R] parse_date_time should support locale parameter Key: ARROW-17147 URL: https://issues.apache.org/jira/browse/ARROW-17147 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Rok Mihevc See [discussion here|https://github.com/apache/arrow/pull/13627#discussion_r924875872]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17146) [R] parse_date_time should support quiet = FALSE
Rok Mihevc created ARROW-17146: -- Summary: [R] parse_date_time should support quiet = FALSE Key: ARROW-17146 URL: https://issues.apache.org/jira/browse/ARROW-17146 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Rok Mihevc See [discussion here|https://github.com/apache/arrow/pull/13627#discussion_r924875872]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17141) [C++] Enable selecting nested fields in StructArray with field path
Rok Mihevc created ARROW-17141: -- Summary: [C++] Enable selecting nested fields in StructArray with field path Key: ARROW-17141 URL: https://issues.apache.org/jira/browse/ARROW-17141 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Rok Mihevc Currently selecting a nested field in a StructArray requires multiple selects or flattening of schema. It would be more user friendly to provide a field path e.g.: field_in_top_struct.field_in_substruct. -- This message was sent by Atlassian Jira (v8.20.10#820010)
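The proposed dotted field path can be illustrated with nested Python dictionaries standing in for struct values. This is a sketch of the lookup semantics only; `select_by_path` is a hypothetical helper, not an Arrow API, and it assumes field names do not themselves contain the separator.

```python
def select_by_path(record, path, sep="."):
    """Follow a dotted path such as 'a.b' through nested dicts."""
    current = record
    for name in path.split(sep):
        if not isinstance(current, dict) or name not in current:
            raise KeyError(f"no field {name!r} in path {path!r}")
        current = current[name]
    return current


rows = [
    {"field_in_top_struct": {"field_in_substruct": 1}},
    {"field_in_top_struct": {"field_in_substruct": 2}},
]
column = [
    select_by_path(row, "field_in_top_struct.field_in_substruct") for row in rows
]
```

A single path replaces the chain of per-level selections that the issue describes as the current workaround.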
[jira] [Created] (ARROW-17132) [R] Mutate in compare_dplyr_binding returns wrong type
Rok Mihevc created ARROW-17132: -- Summary: [R] Mutate in compare_dplyr_binding returns wrong type Key: ARROW-17132 URL: https://issues.apache.org/jira/browse/ARROW-17132 Project: Apache Arrow Issue Type: Bug Components: R Reporter: Rok Mihevc

The following:

{code:r}
df <- tibble::tibble(
  time = as.POSIXct(seq(as.Date("1999-12-31", tz = "UTC"), as.Date("2001-01-01", tz = "UTC"), by = "day"))
)

compare_dplyr_binding(
  .input %>%
    mutate(x = yday(time)) %>%
    collect(),
  df
)
{code}

Fails with:

{code:bash}
Failure (test-dplyr-funcs-datetime.R:574:3): extract wday from timestamp
`object` (`actual`) not equal to `expected` (`expected`).
`attr(actual$time, 'tzone')` is a character vector ('UTC')
`attr(expected$time, 'tzone')` is absent
Backtrace:
 1. arrow:::compare_dplyr_binding(...) at test-dplyr-funcs-datetime.R:574:2
 2. arrow:::expect_equal(via_batch, expected, ...) at tests/testthat/helper-expectation.R:115:4
 3. testthat::expect_equal(...) at tests/testthat/helper-expectation.R:42:4

Failure (test-dplyr-funcs-datetime.R:574:3): extract wday from timestamp
`object` (`actual`) not equal to `expected` (`expected`).
`attr(actual$time, 'tzone')` is a character vector ('UTC')
`attr(expected$time, 'tzone')` is absent
Backtrace:
 1. arrow:::compare_dplyr_binding(...) at test-dplyr-funcs-datetime.R:574:2
 2. arrow:::expect_equal(via_table, expected, ...) at tests/testthat/helper-expectation.R:129:4
 3. testthat::expect_equal(...) at tests/testthat/helper-expectation.R:42:4
{code}

This also happens for qday and probably other functions where input is temporal and output is numeric.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17065) [Python] Allow using subclassed ExtensionScalar in ExtensionType
Rok Mihevc created ARROW-17065: -- Summary: [Python] Allow using subclassed ExtensionScalar in ExtensionType Key: ARROW-17065 URL: https://issues.apache.org/jira/browse/ARROW-17065 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Rok Mihevc Fix For: 9.0.0 This is a follow-up to ARROW-13612. [See discussion.|https://github.com/apache/arrow/pull/13454#issuecomment-1177140141] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-16981) [C++] Expose jemalloc statistics for logging
Rok Mihevc created ARROW-16981: -- Summary: [C++] Expose jemalloc statistics for logging Key: ARROW-16981 URL: https://issues.apache.org/jira/browse/ARROW-16981 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Rok Mihevc Assignee: Rok Mihevc This would enable us to log memory usage and diagnose out of memory issues. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-16932) [C++] Rounding RoundTemporalOptions.calendar_based_origin doesn't correctly offset non-UTC results
Rok Mihevc created ARROW-16932: -- Summary: [C++] Rounding RoundTemporalOptions.calendar_based_origin doesn't correctly offset non-UTC results Key: ARROW-16932 Project: Apache Arrow Issue Type: Bug Assignee: Rok Mihevc Components: C++ Created: 29/Jun/22 11:49 Labels: kernel c++ timestamp Priority: Major Reporter: Rok Mihevc -- This message was sent by Atlassian Jira (v8.20.10#820010-sha1:ace47f9)
[jira] [Created] (ARROW-16618) [C++][Python] strptime fails to parse with %p on Windows
Rok Mihevc created ARROW-16618: -- Summary: [C++][Python] strptime fails to parse with %p on Windows Key: ARROW-16618 URL: https://issues.apache.org/jira/browse/ARROW-16618 Project: Apache Arrow Issue Type: Bug Components: C++, Python Reporter: Rok Mihevc As reported in https://github.com/apache/arrow/issues/13111, parsing a timestamp with %p fails on Windows. This is probably due to issues with the vendored strptime and Windows locales. We should explore which flags can be enabled and how. The strptime test suite should also be expanded: https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string_test.cc#L1842-L1890. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16536) [Doc][Cookbook][Flight] Find client address from ArrowFlightServer
Rok Mihevc created ARROW-16536: -- Summary: [Doc][Cookbook][Flight] Find client address from ArrowFlightServer Key: ARROW-16536 URL: https://issues.apache.org/jira/browse/ARROW-16536 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Rok Mihevc We want a cookbook entry for Python/C++/Java describing how to get Arrow Flight Server client's address. See: [Java|https://stackoverflow.com/a/36140002/262727] [Python |https://arrow.apache.org/docs/python/generated/pyarrow.flight.ServerCallContext.html#pyarrow.flight.ServerCallContext.peer] [C++|https://arrow.apache.org/docs/cpp/api/flight.html#_CPPv4NK5arrow6flight17ServerCallContext4peerEv] -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16535) [C++] Temporal floor/ceil/round should have settable origin unit
Rok Mihevc created ARROW-16535: -- Summary: [C++] Temporal floor/ceil/round should have settable origin unit Key: ARROW-16535 URL: https://issues.apache.org/jira/browse/ARROW-16535 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Rok Mihevc Temporal rounding kernels (will) allow setting of rounding origin to a greater unit. This could be made more flexible by introducing a `greater_unit` parameter which would let user select the unit serving as origin. See [this discussion|https://github.com/apache/arrow/pull/12657#issuecomment-1119580484] for more context. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16147) [C++] ParquetFileWriter doesn't call sink_.Close when using GcsRandomAccessFile
Rok Mihevc created ARROW-16147: -- Summary: [C++] ParquetFileWriter doesn't call sink_.Close when using GcsRandomAccessFile Key: ARROW-16147 URL: https://issues.apache.org/jira/browse/ARROW-16147 Project: Apache Arrow Issue Type: Bug Reporter: Rok Mihevc -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-16142) [C++] Temporal floor/ceil/round returns incorrect results for date32 and time32 inputs
Rok Mihevc created ARROW-16142: -- Summary: [C++] Temporal floor/ceil/round returns incorrect results for date32 and time32 inputs Key: ARROW-16142 URL: https://issues.apache.org/jira/browse/ARROW-16142 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Rok Mihevc

Temporal rounding and flooring seem to interpret 32-bit input arrays as 64-bit arrays. The following test:

{code:c++}
TEST_F(ScalarTemporalTest, TestCeilFloorRoundTemporalDate) {
  RoundTemporalOptions round_to_2_hours = RoundTemporalOptions(2, CalendarUnit::HOUR);
  const char* date32s = R"([0, 11016, -25932, null])";
  const char* date64s = R"([0, 95178240, -224052480, null])";
  auto dates32 = ArrayFromJSON(date32(), date32s);
  auto dates64 = ArrayFromJSON(date64(), date64s);
  CheckScalarUnary("ceil_temporal", dates64, dates64, &round_to_2_hours);
  CheckScalarUnary("floor_temporal", dates64, dates64, &round_to_2_hours);
  CheckScalarUnary("round_temporal", dates64, dates64, &round_to_2_hours);
  CheckScalarUnary("ceil_temporal", dates32, dates32, &round_to_2_hours);
  CheckScalarUnary("floor_temporal", dates32, dates32, &round_to_2_hours);
  CheckScalarUnary("round_temporal", dates32, dates32, &round_to_2_hours);

  const char* times_s = R"([0, 7200, null])";
  const char* times_ms = R"([0, 720, null])";
  const char* times_us = R"([0, 72, null])";
  const char* times_ns = R"([0, 72000, null])";
  auto arr_s = ArrayFromJSON(time32(TimeUnit::SECOND), times_s);
  auto arr_ms = ArrayFromJSON(time32(TimeUnit::MILLI), times_ms);
  auto arr_us = ArrayFromJSON(time64(TimeUnit::MICRO), times_us);
  auto arr_ns = ArrayFromJSON(time64(TimeUnit::NANO), times_ns);
  CheckScalarUnary("ceil_temporal", arr_s, arr_s, &round_to_2_hours);
  CheckScalarUnary("ceil_temporal", arr_ms, arr_ms, &round_to_2_hours);
  CheckScalarUnary("ceil_temporal", arr_us, arr_us, &round_to_2_hours);
  CheckScalarUnary("ceil_temporal", arr_ns, arr_ns, &round_to_2_hours);
}
{code}

Returns:

{code:bash}
Got: [ [ 1970-01-01, 1970-01-01, 2000-02-29, null ] ]
Expected: [ [ 1970-01-01 ], [ 2000-02-29, 1899-01-01, null ] ]
{code}

-- This message was sent by Atlassian Jira (v8.20.1#820001)
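As a reference for what the test above expects, the following Python sketch shows floor and ceil to a multiple of a unit on plain integers, and why a date32 value (days since the epoch) should come back unchanged when rounded to 2 hours. The helper names are hypothetical and the arithmetic is a simplification that ignores time zones and calendar-based units; it is not Arrow's implementation.

```python
SECONDS_PER_DAY = 86400


def floor_to_multiple(value, multiple):
    """Largest multiple of `multiple` that is <= value."""
    return (value // multiple) * multiple


def ceil_to_multiple(value, multiple):
    """Smallest multiple of `multiple` that is >= value."""
    return -((-value) // multiple) * multiple


def floor_date32(days, unit_seconds):
    """Floor a date32 value (days since epoch) to `unit_seconds`.

    The value is widened to seconds, floored, and narrowed back to days,
    so the 32-bit day count is never read as a 64-bit quantity.
    """
    seconds = days * SECONDS_PER_DAY
    return floor_to_multiple(seconds, unit_seconds) // SECONDS_PER_DAY


two_hours = 2 * 3600
dates32 = [0, 11016, -25932]
floored = [floor_date32(d, two_hours) for d in dates32]
```

Because a day is a whole number of 2-hour intervals, every date is already on a rounding boundary, so the output equals the input.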
[jira] [Created] (ARROW-16110) [C++] GcsFileSystem::Make ignores IOContext
Rok Mihevc created ARROW-16110: -- Summary: [C++] GcsFileSystem::Make ignores IOContext Key: ARROW-16110 URL: https://issues.apache.org/jira/browse/ARROW-16110 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Rok Mihevc

The passed IO context is ignored and the default context is used instead. See the current function:

{code:cpp}
std::shared_ptr<GcsFileSystem> GcsFileSystem::Make(const GcsOptions& options,
                                                   const io::IOContext& context) {
  // Cannot use `std::make_shared<>` as the constructor is private.
  return std::shared_ptr<GcsFileSystem>(
      new GcsFileSystem(options, io::default_io_context()));
}
{code}

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15942) [C++] RecordBatch::ValidateFull fails on nested StructArray
Rok Mihevc created ARROW-15942: -- Summary: [C++] RecordBatch::ValidateFull fails on nested StructArray Key: ARROW-15942 URL: https://issues.apache.org/jira/browse/ARROW-15942 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Rok Mihevc

ValidateFull appears to discard the outermost field of nested schema. The following example passes:

{code:bash}
diff --git a/cpp/src/arrow/array/array_struct_test.cc b/cpp/src/arrow/array/array_struct_test.cc
index 318c83860..6a8896ca9 100644
--- a/cpp/src/arrow/array/array_struct_test.cc
+++ b/cpp/src/arrow/array/array_struct_test.cc
@@ -15,6 +15,8 @@
 // specific language governing permissions and limitations
 // under the License.
+#include
+
 #include
 #include
@@ -696,4 +698,20 @@ TEST(TestFieldRef, GetChildren) {
   AssertArraysEqual(*a, *expected_a);
 }
+TEST(TestFieldRef, TestValidateFullRecordBatch) {
+  auto struct_array =
+      ArrayFromJSON(struct_({field("a", struct_({field("b", float64())}))}), R"([
+{"a": {"b": 6.125}},
+{"a": {"b": 0.0}},
+{"a": {"b": -1}}
+  ])");
+
+  auto schema1 = arrow::schema({field("x", struct_({field("a", struct_({field("b", float64())}))}))});
+  auto schema2 = arrow::schema({field("a", struct_({field("b", float64())}))});
+  auto record_batch1 = arrow::RecordBatch::Make(schema1, 3, {struct_array});
+  auto record_batch2 = arrow::RecordBatch::Make(schema2, 3, {struct_array});
+  ASSERT_OK(record_batch1->ValidateFull());
+  ASSERT_NOT_OK(record_batch2->ValidateFull());
+}
+
{code}

Is this expected behaviour?

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15894) [C++] Strptime issues umbrella
Rok Mihevc created ARROW-15894: -- Summary: [C++] Strptime issues umbrella Key: ARROW-15894 URL: https://issues.apache.org/jira/browse/ARROW-15894 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Rok Mihevc This is to make strptime efforts more visible -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15859) [C++] Add nightly test for static build with arrow_flight_static and arrow_bundled_dependencies
Rok Mihevc created ARROW-15859: -- Summary: [C++] Add nightly test for static build with arrow_flight_static and arrow_bundled_dependencies Key: ARROW-15859 URL: https://issues.apache.org/jira/browse/ARROW-15859 Project: Apache Arrow Issue Type: Improvement Components: C++, Continuous Integration Reporter: Rok Mihevc Due to abseil dependencies static builds with arrow_bundled_dependencies are brittle. We could test them nightly with the example proposed in ARROW-14708. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15787) [C++] Temporal floor/ceil/round kernels could be optimised with templating
Rok Mihevc created ARROW-15787: -- Summary: [C++] Temporal floor/ceil/round kernels could be optimised with templating Key: ARROW-15787 URL: https://issues.apache.org/jira/browse/ARROW-15787 Project: Apache Arrow Issue Type: Improvement Reporter: Rok Mihevc [CeilTemporal, FloorTemporal, RoundTemporal|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_temporal_unary.cc#L728-L980] kernels could probably be templated in a clean way. They also execute a switch statement for every call instead of creating an operator at kernel call time and only running that. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15771) [C++][Compute] Add window join to execution engine
Rok Mihevc created ARROW-15771: -- Summary: [C++][Compute] Add window join to execution engine Key: ARROW-15771 URL: https://issues.apache.org/jira/browse/ARROW-15771 Project: Apache Arrow Issue Type: Improvement Reporter: Rok Mihevc We would want to support window joins with as-of support. See https://github.com/substrait-io/substrait/issues/3 for more. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15764) [C++][FlightRPC] Optionally cache serialized ListFlights serverside
Rok Mihevc created ARROW-15764: -- Summary: [C++][FlightRPC] Optionally cache serialized ListFlights serverside Key: ARROW-15764 URL: https://issues.apache.org/jira/browse/ARROW-15764 Project: Apache Arrow Issue Type: Improvement Reporter: Rok Mihevc ListFlights serializes flights each time it is called. If we have many flights and ListFlights is called often this can produce a significant load on the server. We could have an optional cache server side to avoid this. -- This message was sent by Atlassian Jira (v8.20.1#820001)
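The caching idea in ARROW-15764 can be sketched without any Flight code. The following is a minimal illustration, not Arrow's API: `FlightCatalog`, `AddFlight`, and `SerializedListing` are hypothetical names, and a newline-joined string stands in for the real serialized ListFlights payload. The listing is serialized lazily and reused until the set of flights changes.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a server-side flight catalogue; not Arrow's API.
class FlightCatalog {
 public:
  void AddFlight(std::string descriptor) {
    flights_.push_back(std::move(descriptor));
    cached_listing_.reset();  // any change invalidates the cached bytes
  }

  // Returns the serialized listing, re-serializing only after a change.
  const std::string& SerializedListing() {
    if (!cached_listing_) {
      cached_listing_ = Serialize();
    }
    return *cached_listing_;
  }

  // Number of times the listing was actually serialized.
  int serialize_calls() const { return serialize_calls_; }

 private:
  // Placeholder serialization: one descriptor per line.
  std::string Serialize() {
    ++serialize_calls_;
    std::string out;
    for (const auto& flight : flights_) {
      out += flight;
      out += '\n';
    }
    return out;
  }

  std::vector<std::string> flights_;
  std::optional<std::string> cached_listing_;
  int serialize_calls_ = 0;
};
```

The sketch is not thread-safe; a server handling concurrent ListFlights calls would presumably need to guard the cache with a mutex or similar.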
[jira] [Created] (ARROW-15680) [C++] Temporal floor/ceil/round should accept week_start when rounding to multiple of week
Rok Mihevc created ARROW-15680: -- Summary: [C++] Temporal floor/ceil/round should accept week_start when rounding to multiple of week Key: ARROW-15680 URL: https://issues.apache.org/jira/browse/ARROW-15680 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Rok Mihevc See ARROW-14821 and the [related PR|https://github.com/apache/arrow/pull/12154]. -- This message was sent by Atlassian Jira (v8.20.1#820001)
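To make the `week_start` request in ARROW-15680 concrete, here is a small standalone sketch of flooring a date to the start of its week for a configurable first weekday. It covers a single week only, not arbitrary multiples of a week, and it is not the Arrow kernel. The function names and the convention that `week_start` runs from 1 (Monday) to 7 (Sunday) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// ISO weekday (1 = Monday ... 7 = Sunday) of a day counted from 1970-01-01,
// which was a Thursday.
int IsoWeekday(int64_t days_since_epoch) {
  int64_t r = ((days_since_epoch % 7) + 7) % 7;  // non-negative remainder
  return static_cast<int>((r + 3) % 7) + 1;
}

// Floors a day count to the most recent day whose ISO weekday is week_start.
// week_start is a hypothetical option: 1 = Monday ... 7 = Sunday.
int64_t FloorToWeek(int64_t days_since_epoch, int week_start) {
  assert(week_start >= 1 && week_start <= 7);
  int offset = (IsoWeekday(days_since_epoch) - week_start + 7) % 7;
  return days_since_epoch - offset;
}
```

For example, day 0 (Thursday 1970-01-01) floors to day -3 (Monday 1969-12-29) with `week_start = 1` and to day -4 (Sunday 1969-12-28) with `week_start = 7`.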
[jira] [Created] (ARROW-15666) [C++] Add format inference option to StrptimeOptions
Rok Mihevc created ARROW-15666: -- Summary: [C++] Add format inference option to StrptimeOptions Key: ARROW-15666 URL: https://issues.apache.org/jira/browse/ARROW-15666 Project: Apache Arrow Issue Type: Improvement Reporter: Rok Mihevc We want to have an option to infer timestamp format. See [pandas.to_datetime|https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html] and lubridate [parse_date_time|https://lubridate.tidyverse.org/reference/parse_date_time.html] for examples. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15665) [C++] Add error handling option to StrptimeOptions
Rok Mihevc created ARROW-15665: -- Summary: [C++] Add error handling option to StrptimeOptions Key: ARROW-15665 URL: https://issues.apache.org/jira/browse/ARROW-15665 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Rok Mihevc We want to have an option to either raise, ignore or return NA in case of format mismatch. See [pandas.to_datetime|https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html] and lubridate [parse_date_time|https://lubridate.tidyverse.org/reference/parse_date_time.html] for examples. -- This message was sent by Atlassian Jira (v8.20.1#820001)
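A rough sketch of what an error handling option for ARROW-15665 could look like, using only the C++ standard library (`std::get_time`) rather than Arrow's parser. `OnParseError` and `ParseTimestamp` are hypothetical names. Only the "raise" and "return NA" behaviours are shown; "ignore" is left out because the issue does not say what it should return.

```cpp
#include <cassert>
#include <ctime>
#include <iomanip>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical option values; the issue does not fix any names.
enum class OnParseError { kRaise, kNull };

// Parses `value` with a strptime-style `format`. On mismatch it either throws
// or returns an empty optional, depending on `on_error`.
std::optional<std::tm> ParseTimestamp(const std::string& value,
                                      const std::string& format,
                                      OnParseError on_error) {
  std::tm parsed = {};
  std::istringstream in(value);
  in >> std::get_time(&parsed, format.c_str());
  if (!in.fail()) {
    return parsed;
  }
  if (on_error == OnParseError::kRaise) {
    throw std::invalid_argument("'" + value + "' does not match format '" +
                                format + "'");
  }
  return std::nullopt;  // kNull: report a missing value instead of failing
}
```

With `OnParseError::kNull`, a caller converting a whole column could map an empty optional to a null slot and continue, whereas `kRaise` stops at the first mismatch.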
[jira] [Created] (ARROW-15619) [C++] Temporal component extraction function for extracting is_leap_year indicator
Rok Mihevc created ARROW-15619: -- Summary: [C++] Temporal component extraction function for extracting is_leap_year indicator Key: ARROW-15619 URL: https://issues.apache.org/jira/browse/ARROW-15619 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Rok Mihevc This should behave like [is_leap_year|https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.is_leap_year.html] from Pandas and [leap_year|https://lubridate.tidyverse.org/reference/leap_year.html] from lubridate. -- This message was sent by Atlassian Jira (v8.20.1#820001)
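For reference, the leap-year indicator requested in ARROW-15619 reduces to the Gregorian rule applied to the year of each value. The sketch below is standalone C++ and not the Arrow kernel. `YearFromDays` and `IsLeapYear` are illustrative names, and the days-to-year step uses the well-known civil-from-days arithmetic on a day count since 1970-01-01.

```cpp
#include <cassert>
#include <cstdint>

// Gregorian year of a day counted from 1970-01-01 (proleptic calendar).
int64_t YearFromDays(int64_t days_since_epoch) {
  int64_t z = days_since_epoch + 719468;  // shift epoch to 0000-03-01
  int64_t era = (z >= 0 ? z : z - 146096) / 146097;
  int64_t doe = z - era * 146097;                                  // [0, 146096]
  int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;  // [0, 399]
  int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);           // [0, 365]
  int64_t mp = (5 * doy + 2) / 153;                                // March-based month
  int64_t month = mp < 10 ? mp + 3 : mp - 9;
  return yoe + era * 400 + (month <= 2 ? 1 : 0);
}

// Divisible by 4, except centuries, unless divisible by 400.
bool IsLeapYear(int64_t year) {
  return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}
```

For instance, day 11016 is 2000-02-29, so `IsLeapYear(YearFromDays(11016))` is true, while day 0 falls in 1970 and yields false.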
[jira] [Created] (ARROW-15549) [Java] gRPC not available on M1
Rok Mihevc created ARROW-15549: -- Summary: [Java] gRPC not available on M1 Key: ARROW-15549 URL: https://issues.apache.org/jira/browse/ARROW-15549 Project: Apache Arrow Issue Type: New Feature Reporter: Rok Mihevc
When building on M1, gRPC is not found. It can be [manually downloaded|https://repo1.maven.org/maven2/io/grpc/protoc-gen-grpc-java/1.41.0/protoc-gen-grpc-java-1.41.0-osx-x86_64.exe] and installed:
{code:bash}
mvn install:install-file \
  -DgroupId=io.grpc \
  -DartifactId=protoc-gen-grpc-java \
  -Dversion=1.41.0 \
  -Dclassifier=osx-aarch_64 \
  -Dpackaging=exe \
  -Dfile=/Users/rok/Downloads/protoc-gen-grpc-java-1.41.0-osx-x86_64.exe
{code}
But perhaps that could be fixed as suggested here: https://github.com/grpc/grpc-java/issues/7690
-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15473) [C++][FlightRPC] Expose a way to terminate DoExchange stream client side
Rok Mihevc created ARROW-15473: -- Summary: [C++][FlightRPC] Expose a way to terminate DoExchange stream client side Key: ARROW-15473 URL: https://issues.apache.org/jira/browse/ARROW-15473 Project: Apache Arrow Issue Type: New Feature Components: C++, FlightRPC Reporter: Rok Mihevc We want a mechanism to close DoExchange streams from client side in case of long running connections. This would be handy for testing and in case e.g. user wants to disconnect. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15251) [C++] Temporal floor/ceil/round handle ambiguous/nonexistent local time
Rok Mihevc created ARROW-15251: -- Summary: [C++] Temporal floor/ceil/round handle ambiguous/nonexistent local time Key: ARROW-15251 URL: https://issues.apache.org/jira/browse/ARROW-15251 Project: Apache Arrow Issue Type: Improvement Reporter: Rok Mihevc ARROW-14822 enables temporal round/floor/ceil and raises on ambiguous/nonexistent local time. We should allow users to choose different behaviours e.g.: raise, earliest, latest, etc. See AssumeTimezoneOptions for example. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15250) [Python][R] Temporal floor/ceil/round should accept frequency string
Rok Mihevc created ARROW-15250: -- Summary: [Python][R] Temporal floor/ceil/round should accept frequency string Key: ARROW-15250 URL: https://issues.apache.org/jira/browse/ARROW-15250 Project: Apache Arrow Issue Type: Improvement Components: Python, R Reporter: Rok Mihevc More user-friendly rounding period input can be supported. See [Pandas to_offset|https://github.com/pandas-dev/pandas/blob/a6c1f6cccee6bbccfb29488a94664ed07db024d9/pandas/_libs/tslibs/offsets.pyx#L3575-L3679] and [lubridate's period|https://lubridate.tidyverse.org/reference/period.html]. -- This message was sent by Atlassian Jira (v8.20.1#820001)
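As an illustration of the frequency string input proposed in ARROW-15250, the sketch below splits a string such as "15min" into a multiple and a unit. It is written in C++ for consistency with the other sketches, although the issue targets the Python and R bindings. The unit aliases ("s", "min", "h", "D", "W") and all names are assumptions, and the sketch covers neither Pandas' nor lubridate's full grammar.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Hypothetical unit set; a real implementation would mirror the rounding options.
enum class Unit { kSecond, kMinute, kHour, kDay, kWeek };

struct Period {
  int multiple;
  Unit unit;
};

// Parses strings such as "15min" or "W" into a (multiple, unit) pair.
// Returns an empty optional for unknown units or a zero multiple.
std::optional<Period> ParseFrequency(const std::string& text) {
  static const std::map<std::string, Unit> kAliases = {
      {"s", Unit::kSecond}, {"min", Unit::kMinute}, {"h", Unit::kHour},
      {"D", Unit::kDay},    {"W", Unit::kWeek}};
  std::size_t pos = 0;
  int multiple = 0;
  while (pos < text.size() &&
         std::isdigit(static_cast<unsigned char>(text[pos]))) {
    if (pos >= 9) return std::nullopt;  // keep the multiple within int range
    multiple = multiple * 10 + (text[pos] - '0');
    ++pos;
  }
  if (pos == 0) multiple = 1;  // a bare unit means one of that unit
  if (multiple <= 0) return std::nullopt;
  auto it = kAliases.find(text.substr(pos));
  if (it == kAliases.end()) return std::nullopt;
  return Period{multiple, it->second};
}
```

Returning an empty optional keeps the decision about raising an error with the caller, which may be convenient for bindings that report errors in their own way.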
[jira] [Created] (ARROW-13684) [C++][Compute] Strftime kernel follow-up
Rok Mihevc created ARROW-13684: -- Summary: [C++][Compute] Strftime kernel follow-up Key: ARROW-13684 URL: https://issues.apache.org/jira/browse/ARROW-13684 Project: Apache Arrow Issue Type: Improvement Reporter: Rok Mihevc Assignee: Rok Mihevc Fix For: 6.0.0
As per ARROW-13174 [comments|https://github.com/apache/arrow/pull/10647#issuecomment-901783928] we should:
* Correct default format string for non-UTC timestamps
* Allow non-zoned timestamps to be printed
* Better document %S flag behavior
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13347) [C++][Compute] Add option to return null values for nonexistent and ambiguous times
Rok Mihevc created ARROW-13347: -- Summary: [C++][Compute] Add option to return null values for nonexistent and ambiguous times Key: ARROW-13347 URL: https://issues.apache.org/jira/browse/ARROW-13347 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Rok Mihevc ARROW-13033 implements the TzLocalize kernel, which handles ambiguous and nonexistent times by raising or by shifting backwards or forwards to the first valid local time. However, we might also want to be able to return null in these cases. We could implement new flags to do so: {{compute::TemporalLocalizationOptions::Nonexistent::NONEXISTENT_IGNORE}} and {{compute::TemporalLocalizationOptions::Ambiguous::AMBIGUOUS_IGNORE}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13174) [C++][Compute] Add strftime kernel
Rok Mihevc created ARROW-13174: -- Summary: [C++][Compute] Add strftime kernel Key: ARROW-13174 URL: https://issues.apache.org/jira/browse/ARROW-13174 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Rok Mihevc To express timestamps with arbitrary format we require a strftime kernel. See [comments here|https://github.com/apache/arrow/pull/10598#issuecomment-868358466]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13168) [C++] Timezone database configuration and access
Rok Mihevc created ARROW-13168: -- Summary: [C++] Timezone database configuration and access Key: ARROW-13168 URL: https://issues.apache.org/jira/browse/ARROW-13168 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Rok Mihevc
Note: the timezone database is currently not available on Windows, so timezone-aware operations will fail there.
We're using the tz.h library, which needs an up-to-date timezone database to correctly handle timezoned timestamps. See [installation instructions|https://howardhinnant.github.io/date/tz.html#Installation]. We have the following options for getting a timezone database:
# local (non-Windows) OS timezone database - no work required.
# arrow bundled folder - we could bundle the database at build time for Windows. The database would slowly go stale.
# download it from the IANA Time Zone Database at runtime - tz.h gets the database at runtime, but curl (and 7-zip on Windows) are required.
# local user-provided folder - the user could provide a location at build time. Nice to have.
# allow runtime configuration - at runtime say: "the tzdata can be found at this location"
For more context see: [ARROW-12980|https://github.com/apache/arrow/pull/10457]
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12980) [C++] Kernels to extract datetime components should be timezone aware
Rok Mihevc created ARROW-12980: -- Summary: [C++] Kernels to extract datetime components should be timezone aware Key: ARROW-12980 URL: https://issues.apache.org/jira/browse/ARROW-12980 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Rok Mihevc Assignee: Rok Mihevc As followup to ARROW-11759 datetime component extraction kernels should return localized time components if timezone property is present. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12820) [C++] Strptime ignores timezone information
Rok Mihevc created ARROW-12820: -- Summary: [C++] Strptime ignores timezone information Key: ARROW-12820 URL: https://issues.apache.org/jira/browse/ARROW-12820 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Rok Mihevc ParseTimestampStrptime currently ignores the timezone information. So timestamps are read as if they were all in UTC. This can be unexpected. See [discussion|https://github.com/apache/arrow/pull/10334#discussion_r634269138] for details. It would be useful to either capture timezone information or convert timestamp to UTC when parsing it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
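The "convert timestamp to UTC when parsing" alternative from ARROW-12820 amounts to subtracting the parsed offset from the wall-clock value. The sketch below assumes the offset arrives in the "+HHMM" / "-HHMM" shape that a `%z` directive conventionally matches. Both function names are hypothetical, and this is not how ParseTimestampStrptime is implemented.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <optional>
#include <string>

// Parses a "+HHMM" / "-HHMM" offset into seconds east of UTC.
// Returns an empty optional for any other shape.
std::optional<int64_t> ParseUtcOffsetSeconds(const std::string& offset) {
  if (offset.size() != 5 || (offset[0] != '+' && offset[0] != '-')) {
    return std::nullopt;
  }
  for (int i = 1; i < 5; ++i) {
    if (!std::isdigit(static_cast<unsigned char>(offset[i]))) {
      return std::nullopt;
    }
  }
  int hours = (offset[1] - '0') * 10 + (offset[2] - '0');
  int minutes = (offset[3] - '0') * 10 + (offset[4] - '0');
  if (hours > 23 || minutes > 59) return std::nullopt;
  int64_t seconds = hours * 3600 + minutes * 60;
  return offset[0] == '-' ? -seconds : seconds;
}

// Converts wall-clock seconds (parsed as if they were UTC) to true UTC seconds.
int64_t LocalToUtcSeconds(int64_t local_seconds, int64_t offset_seconds) {
  return local_seconds - offset_seconds;
}
```

For example, a value parsed as 02:00 with offset "+0200" corresponds to 00:00 UTC, so 7200 wall-clock seconds map to 0.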
[jira] [Created] (ARROW-12499) [C++][Compute] Add ScalarAggregateOptions to Any and All kernels
Rok Mihevc created ARROW-12499: -- Summary: [C++][Compute] Add ScalarAggregateOptions to Any and All kernels Key: ARROW-12499 URL: https://issues.apache.org/jira/browse/ARROW-12499 Project: Apache Arrow Issue Type: Improvement Components: C++, Python, R Reporter: Rok Mihevc Assignee: Rok Mihevc Follow up to ARROW-9054 and ARROW-12185 - see [comment|https://github.com/apache/arrow/pull/10032#pullrequestreview-641468079]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12301) [C++][Compute] Use generic hash-aggregate for DictionaryArrays
Rok Mihevc created ARROW-12301: -- Summary: [C++][Compute] Use generic hash-aggregate for DictionaryArrays Key: ARROW-12301 URL: https://issues.apache.org/jira/browse/ARROW-12301 Project: Apache Arrow Issue Type: Improvement Reporter: Rok Mihevc When calculating unique for chunked DictionaryArrays we currently run through all chunks and unify their dictionaries and then collect chunk indices. We could avoid the dictionary unification by using a generic hash. [See discussion here.|https://github.com/apache/arrow/pull/9683] -- This message was sent by Atlassian Jira (v8.3.4#803005)
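The difference ARROW-12301 describes can be illustrated with plain standard containers. In this sketch, `DictChunk` is a hypothetical, simplified stand-in for one dictionary-encoded chunk, and a `std::unordered_set` plays the role of the generic hash. Distinct values are collected by hashing the decoded values directly, so the per-chunk dictionaries are never unified.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified stand-in for one dictionary-encoded chunk.
struct DictChunk {
  std::vector<std::string> dictionary;
  std::vector<int> indices;  // positions into `dictionary`
};

// Collects distinct values in first-seen order by hashing decoded values,
// without building a unified dictionary across chunks.
std::vector<std::string> UniqueValues(const std::vector<DictChunk>& chunks) {
  std::unordered_set<std::string> seen;
  std::vector<std::string> unique;
  for (const auto& chunk : chunks) {
    for (int index : chunk.indices) {
      const std::string& value = chunk.dictionary.at(index);
      if (seen.insert(value).second) {
        unique.push_back(value);
      }
    }
  }
  return unique;
}
```

Whether this is faster than unifying dictionaries would depend on dictionary sizes and value types; the sketch only shows the shape of the approach.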
[jira] [Assigned] (ARROW-7297) [C++] Add value accessor in sparse tensor class
[ https://issues.apache.org/jira/browse/ARROW-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc reassigned ARROW-7297: - Assignee: Rok Mihevc > [C++] Add value accessor in sparse tensor class > --- > > Key: ARROW-7297 > URL: https://issues.apache.org/jira/browse/ARROW-7297 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Kenta Murata >Assignee: Rok Mihevc >Priority: Major > > {{SparseTensor}} can have value accessor like {{Tensor::Value}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8162) [Format][Python] Add serialization for CSF sparse tensors
[ https://issues.apache.org/jira/browse/ARROW-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-8162: -- Fix Version/s: (was: 1.0.0) 0.17.0 > [Format][Python] Add serialization for CSF sparse tensors > - > > Key: ARROW-8162 > URL: https://issues.apache.org/jira/browse/ARROW-8162 > Project: Apache Arrow > Issue Type: Improvement > Components: Format, Python >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Minor > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Once [ARROW-7428|https://issues.apache.org/jira/browse/ARROW-7428] is > complete serialization for CSF sparse tensors should be enabled in Python too. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8162) [Format][Python] Add serialization for CSF sparse tensors
[ https://issues.apache.org/jira/browse/ARROW-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-8162: -- Description: Once [ARROW-7428|https://issues.apache.org/jira/browse/ARROW-7428] is complete serialization for CSF sparse tensors should be enabled in Python too. (was: Once [#ARROW-7428] is complete serialization for CSF sparse tensors should be enabled in Python too.) > [Format][Python] Add serialization for CSF sparse tensors > - > > Key: ARROW-8162 > URL: https://issues.apache.org/jira/browse/ARROW-8162 > Project: Apache Arrow > Issue Type: Improvement > Components: Format, Python >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Minor > Fix For: 1.0.0 > > > Once [ARROW-7428|https://issues.apache.org/jira/browse/ARROW-7428] is > complete serialization for CSF sparse tensors should be enabled in Python too. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8162) [Format][Python] Add serialization for CSF sparse tensors
Rok Mihevc created ARROW-8162: - Summary: [Format][Python] Add serialization for CSF sparse tensors Key: ARROW-8162 URL: https://issues.apache.org/jira/browse/ARROW-8162 Project: Apache Arrow Issue Type: Improvement Components: Format, Python Reporter: Rok Mihevc Assignee: Rok Mihevc Fix For: 1.0.0 Once [#ARROW-7428] is complete serialization for CSF sparse tensors should be enabled in Python too. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7428) [Format][C++] Add serialization for CSF sparse tensors
[ https://issues.apache.org/jira/browse/ARROW-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7428: -- Fix Version/s: 1.0.0 > [Format][C++] Add serialization for CSF sparse tensors > -- > > Key: ARROW-7428 > URL: https://issues.apache.org/jira/browse/ARROW-7428 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, Format >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Once ARROW-4225 and ARROW-4226 are completed we should add serialization > support. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7427) [Python] Support SparseCSFTensor
[ https://issues.apache.org/jira/browse/ARROW-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7427: -- Component/s: (was: C++) > [Python] Support SparseCSFTensor > > > Key: ARROW-7427 > URL: https://issues.apache.org/jira/browse/ARROW-7427 > Project: Apache Arrow > Issue Type: New Feature > Components: Format, Python >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Once ARROW-4226 is complete we want to be able to use sparse CSF tensors from > Python. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7419) [Python] Support SparseCSCMatrix
[ https://issues.apache.org/jira/browse/ARROW-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7419: -- Component/s: (was: C++) > [Python] Support SparseCSCMatrix > > > Key: ARROW-7419 > URL: https://issues.apache.org/jira/browse/ARROW-7419 > Project: Apache Arrow > Issue Type: New Feature > Components: Format, Python >Reporter: Kenta Murata >Assignee: Rok Mihevc >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7419) [Python] Support SparseCSCMatrix
[ https://issues.apache.org/jira/browse/ARROW-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7419: -- Fix Version/s: 1.0.0 > [Python] Support SparseCSCMatrix > > > Key: ARROW-7419 > URL: https://issues.apache.org/jira/browse/ARROW-7419 > Project: Apache Arrow > Issue Type: New Feature > Components: Format, Python >Reporter: Kenta Murata >Assignee: Rok Mihevc >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7428) [Format][C++] Add serialization for CSF sparse tensors
[ https://issues.apache.org/jira/browse/ARROW-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7428: -- Component/s: Format C++ > [Format][C++] Add serialization for CSF sparse tensors > -- > > Key: ARROW-7428 > URL: https://issues.apache.org/jira/browse/ARROW-7428 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, Format >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Minor > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Once ARROW-4225 and ARROW-4226 are completed we should add serialization > support. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7419) [Python] Support SparseCSCMatrix
[ https://issues.apache.org/jira/browse/ARROW-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7419: -- Component/s: Format C++ > [Python] Support SparseCSCMatrix > > > Key: ARROW-7419 > URL: https://issues.apache.org/jira/browse/ARROW-7419 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, Format, Python >Reporter: Kenta Murata >Assignee: Rok Mihevc >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7427) [Python] Support SparseCSFTensor
[ https://issues.apache.org/jira/browse/ARROW-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7427: -- Fix Version/s: 1.0.0 > [Python] Support SparseCSFTensor > > > Key: ARROW-7427 > URL: https://issues.apache.org/jira/browse/ARROW-7427 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, Format, Python >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Once ARROW-4226 is complete we want to be able to use sparse CSF tensors from > Python. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7427) [Python] Support SparseCSFTensor
[ https://issues.apache.org/jira/browse/ARROW-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7427: -- Component/s: Python Format C++ > [Python] Support SparseCSFTensor > > > Key: ARROW-7427 > URL: https://issues.apache.org/jira/browse/ARROW-7427 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, Format, Python >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Once ARROW-4226 is complete we want to be able to use sparse CSF tensors from > Python. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7427) [Python] Support SparseCSFTensor
[ https://issues.apache.org/jira/browse/ARROW-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7427: -- Description: Once ARROW-4226 is complete we want to be able to use sparse CSF tensors from Python. (was: Once ARROW-4225 is complete we want to be able to use sparse CSF tensors from Python.) > [Python] Support SparseCSFTensor > > > Key: ARROW-7427 > URL: https://issues.apache.org/jira/browse/ARROW-7427 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Minor > > Once ARROW-4226 is complete we want to be able to use sparse CSF tensors from > Python. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7428) [Format][C++] Add serialization for CSF sparse tensors
[ https://issues.apache.org/jira/browse/ARROW-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7428: -- Priority: Minor (was: Major) > [Format][C++] Add serialization for CSF sparse tensors > -- > > Key: ARROW-7428 > URL: https://issues.apache.org/jira/browse/ARROW-7428 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Minor > > Once ARROW-4225 and ARROW-4226 are completed we should add serialization > support. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7427) [Python] Support SparseCSFTensor
[ https://issues.apache.org/jira/browse/ARROW-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7427: -- Priority: Minor (was: Major) > [Python] Support SparseCSFTensor > > > Key: ARROW-7427 > URL: https://issues.apache.org/jira/browse/ARROW-7427 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Minor > > Once ARROW-4225 is complete we want to be able to use sparse CSF tensors from > Python. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7419) [Python] Support SparseCSCMatrix
[ https://issues.apache.org/jira/browse/ARROW-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7419: -- Priority: Minor (was: Major) > [Python] Support SparseCSCMatrix > > > Key: ARROW-7419 > URL: https://issues.apache.org/jira/browse/ARROW-7419 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Kenta Murata >Assignee: Rok Mihevc >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7428) [Format][C++] Add serialization for CSF sparse tensors
[ https://issues.apache.org/jira/browse/ARROW-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7428: -- Summary: [Format][C++] Add serialization for CSF sparse tensors (was: [Format][C++] Add serialization for CSF and CSC sparse tensors) > [Format][C++] Add serialization for CSF sparse tensors > -- > > Key: ARROW-7428 > URL: https://issues.apache.org/jira/browse/ARROW-7428 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Major > > Once ARROW-4225 and ARROW-4226 are completed we should add serialization > support. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7427) [Python] Support SparseCSFTensor
[ https://issues.apache.org/jira/browse/ARROW-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7427: -- Description: Once ARROW-4225 is complete we want to be able to use sparse CSF tensors from Python. (was: Once ARROW-4225 is complete we want to be able to use it from Python.) > [Python] Support SparseCSFTensor > > > Key: ARROW-7427 > URL: https://issues.apache.org/jira/browse/ARROW-7427 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Major > > Once ARROW-4225 is complete we want to be able to use sparse CSF tensors from > Python. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7427) [Python] Support SparseCSFTensor
[ https://issues.apache.org/jira/browse/ARROW-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7427: -- Description: Once ARROW-4225 is complete we want to be able to use it from Python. > [Python] Support SparseCSFTensor > > > Key: ARROW-7427 > URL: https://issues.apache.org/jira/browse/ARROW-7427 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Major > > Once ARROW-4225 is complete we want to be able to use it from Python. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7427) [Python] Support SparseCSFTensor
[ https://issues.apache.org/jira/browse/ARROW-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7427: -- Summary: [Python] Support SparseCSFTensor (was: [Python] Support SparseCSFMatrix) > [Python] Support SparseCSFTensor > > > Key: ARROW-7427 > URL: https://issues.apache.org/jira/browse/ARROW-7427 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7428) [Format][C++] Add serialization for CSF and CSC sparse tensors
[ https://issues.apache.org/jira/browse/ARROW-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7428: -- Summary: [Format][C++] Add serialization for CSF and CSC sparse tensors (was: [C++] Add serialization for CSF and CSC sparse tensors) > [Format][C++] Add serialization for CSF and CSC sparse tensors > -- > > Key: ARROW-7428 > URL: https://issues.apache.org/jira/browse/ARROW-7428 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Major > > Once ARROW-4225 and ARROW-4226 are completed we should add serialization > support. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7428) [C++] Add serialization for CSF and CSC sparse tensors
Rok Mihevc created ARROW-7428: - Summary: [C++] Add serialization for CSF and CSC sparse tensors Key: ARROW-7428 URL: https://issues.apache.org/jira/browse/ARROW-7428 Project: Apache Arrow Issue Type: New Feature Reporter: Rok Mihevc Once ARROW-4225 and ARROW-4226 are completed we should add serialization support. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-7428) [C++] Add serialization for CSF and CSC sparse tensors
[ https://issues.apache.org/jira/browse/ARROW-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc reassigned ARROW-7428: - Assignee: Rok Mihevc > [C++] Add serialization for CSF and CSC sparse tensors > -- > > Key: ARROW-7428 > URL: https://issues.apache.org/jira/browse/ARROW-7428 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Major > > Once ARROW-4225 and ARROW-4226 are completed we should add serialization > support. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7427) [Python] Support SparseCSFMatrix
Rok Mihevc created ARROW-7427: - Summary: [Python] Support SparseCSFMatrix Key: ARROW-7427 URL: https://issues.apache.org/jira/browse/ARROW-7427 Project: Apache Arrow Issue Type: New Feature Reporter: Rok Mihevc Assignee: Rok Mihevc -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-7419) [Python] Support SparseCSCMatrix
[ https://issues.apache.org/jira/browse/ARROW-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc reassigned ARROW-7419: - Assignee: Rok Mihevc > [Python] Support SparseCSCMatrix > > > Key: ARROW-7419 > URL: https://issues.apache.org/jira/browse/ARROW-7419 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Kenta Murata >Assignee: Rok Mihevc >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (ARROW-5805) [Python] Dockerize (add to docker-compose) Python Travis CI job
[ https://issues.apache.org/jira/browse/ARROW-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc closed ARROW-5805. - Resolution: Won't Do > [Python] Dockerize (add to docker-compose) Python Travis CI job > --- > > Key: ARROW-5805 > URL: https://issues.apache.org/jira/browse/ARROW-5805 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Rok Mihevc >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > https://github.com/apache/arrow/blob/master/.travis.yml#L118 > The existing Python Dockerfiles should be expanded to test all of the things > that are being tested currently in Travis CI. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-5805) [Python] Dockerize (add to docker-compose) Python Travis CI job
[ https://issues.apache.org/jira/browse/ARROW-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973129#comment-16973129 ] Rok Mihevc commented on ARROW-5805: --- This is now solved with Github Actions. > [Python] Dockerize (add to docker-compose) Python Travis CI job > --- > > Key: ARROW-5805 > URL: https://issues.apache.org/jira/browse/ARROW-5805 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Rok Mihevc >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > https://github.com/apache/arrow/blob/master/.travis.yml#L118 > The existing Python Dockerfiles should be expanded to test all of the things > that are being tested currently in Travis CI. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-4224) [Python] Support integration with pydata/sparse library
[ https://issues.apache.org/jira/browse/ARROW-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4224: -- Fix Version/s: 1.0.0 > [Python] Support integration with pydata/sparse library > --- > > Key: ARROW-4224 > URL: https://issues.apache.org/jira/browse/ARROW-4224 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Kenta Murata >Assignee: Rok Mihevc >Priority: Minor > Labels: pull-request-available, sparse > Fix For: 1.0.0 > > Time Spent: 4.5h > Remaining Estimate: 0h > > It would be great to support integration with pydata/sparse library. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-4223) [Python] Support scipy.sparse integration
[ https://issues.apache.org/jira/browse/ARROW-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4223: -- Fix Version/s: 1.0.0 > [Python] Support scipy.sparse integration > - > > Key: ARROW-4223 > URL: https://issues.apache.org/jira/browse/ARROW-4223 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Kenta Murata >Assignee: Rok Mihevc >Priority: Minor > Labels: pull-request-available, sparse > Fix For: 1.0.0 > > Time Spent: 9h 10m > Remaining Estimate: 0h > > It would be great to support integration with scipy.sparse. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-4226) [C++] Add CSF sparse tensor support
[ https://issues.apache.org/jira/browse/ARROW-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942607#comment-16942607 ] Rok Mihevc commented on ARROW-4226: --- I was just reading it :) I'll start working on this sometime this week. > [C++] Add CSF sparse tensor support > --- > > Key: ARROW-4226 > URL: https://issues.apache.org/jira/browse/ARROW-4226 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Kenta Murata >Assignee: Rok Mihevc >Priority: Minor > Labels: sparse > Fix For: 1.0.0 > > > [https://github.com/apache/arrow/pull/2546#pullrequestreview-156064172] > {quote}Perhaps in the future, if zero-copy and future-proof-ness is really > what we want, we might want to add the CSF (compressed sparse fiber) format, > a generalisation of CSR/CSC. I'm currently working on adding it to > PyData/Sparse, and I plan to make it the preferred format (COO will still be > around though). > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-4226) [C++] Add CSF sparse tensor support
[ https://issues.apache.org/jira/browse/ARROW-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc reassigned ARROW-4226: - Assignee: Rok Mihevc (was: Kenta Murata) > [C++] Add CSF sparse tensor support > --- > > Key: ARROW-4226 > URL: https://issues.apache.org/jira/browse/ARROW-4226 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Kenta Murata >Assignee: Rok Mihevc >Priority: Minor > Labels: sparse > Fix For: 1.0.0 > > > [https://github.com/apache/arrow/pull/2546#pullrequestreview-156064172] > {quote}Perhaps in the future, if zero-copy and future-proof-ness is really > what we want, we might want to add the CSF (compressed sparse fiber) format, > a generalisation of CSR/CSC. I'm currently working on adding it to > PyData/Sparse, and I plan to make it the preferred format (COO will still be > around though). > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-4225) [C++] Add CSC sparse matrix support
[ https://issues.apache.org/jira/browse/ARROW-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942598#comment-16942598 ] Rok Mihevc commented on ARROW-4225: --- Ok, I'll check the paper and see if I can get somewhere. :) > [C++] Add CSC sparse matrix support > --- > > Key: ARROW-4225 > URL: https://issues.apache.org/jira/browse/ARROW-4225 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Kenta Murata >Assignee: Kenta Murata >Priority: Minor > Labels: sparse > Fix For: 1.0.0 > > > CSC sparse matrix is necessary for integration with existing sparse matrix > libraries (umfpack, superlu). > https://github.com/apache/arrow/pull/2546#issuecomment-422135645 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-4225) [C++] Add CSC sparse matrix support
[ https://issues.apache.org/jira/browse/ARROW-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942568#comment-16942568 ] Rok Mihevc commented on ARROW-4225: --- That's great @mrkn! Did you also start on [CSF|https://issues.apache.org/jira/browse/ARROW-4226]? If not I'll pick it up. > [C++] Add CSC sparse matrix support > --- > > Key: ARROW-4225 > URL: https://issues.apache.org/jira/browse/ARROW-4225 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Kenta Murata >Assignee: Kenta Murata >Priority: Minor > Labels: sparse > Fix For: 1.0.0 > > > CSC sparse matrix is necessary for integration with existing sparse matrix > libraries (umfpack, superlu). > https://github.com/apache/arrow/pull/2546#issuecomment-422135645 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-4225) [C++] Add CSC sparse matrix support
[ https://issues.apache.org/jira/browse/ARROW-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc reassigned ARROW-4225: - Assignee: Rok Mihevc > [C++] Add CSC sparse matrix support > --- > > Key: ARROW-4225 > URL: https://issues.apache.org/jira/browse/ARROW-4225 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Kenta Murata >Assignee: Rok Mihevc >Priority: Minor > Labels: sparse > Fix For: 1.0.0 > > > CSC sparse matrix is necessary for integration with existing sparse matrix > libraries (umfpack, superlu). > https://github.com/apache/arrow/pull/2546#issuecomment-422135645 -- This message was sent by Atlassian Jira (v8.3.4#803005)
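For readers unfamiliar with the layout discussed in ARROW-4225, here is a minimal sketch of what a CSC (compressed sparse column) matrix stores and how it expands to a dense matrix. It is pure Python for illustration only and is not Arrow's implementation; the function name `csc_to_dense` and the argument names `indptr`, `indices` and `data` are assumptions borrowed from common sparse-library terminology.

```python
def csc_to_dense(shape, indptr, indices, data, fill=0):
    """Expand a CSC-encoded matrix into a dense list of rows.

    indptr[j]:indptr[j + 1] delimits the slice of `indices` (row numbers)
    and `data` (values) that belongs to column j.
    """
    n_rows, n_cols = shape
    dense = [[fill] * n_cols for _ in range(n_rows)]
    for col in range(n_cols):
        for pos in range(indptr[col], indptr[col + 1]):
            dense[indices[pos]][col] = data[pos]
    return dense


# Hypothetical 3x3 example, stored column by column.
matrix = csc_to_dense(
    shape=(3, 3),
    indptr=[0, 2, 3, 6],
    indices=[0, 2, 2, 0, 1, 2],
    data=[1, 4, 5, 2, 3, 6],
)
```

CSR uses the same three arrays with the roles of rows and columns swapped, which is why the two formats are usually discussed together.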
[jira] [Commented] (ARROW-6671) [C++] Sparse tensor naming
[ https://issues.apache.org/jira/browse/ARROW-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936779#comment-16936779 ] Rok Mihevc commented on ARROW-6671: --- SparseCSRMatrix might be more misleading as it 'doesn't look like' a Tensor type. I think that is potentially more confusing than it being limited to 2D. +1 for the consistent naming. > [C++] Sparse tensor naming > -- > > Key: ARROW-6671 > URL: https://issues.apache.org/jira/browse/ARROW-6671 > Project: Apache Arrow > Issue Type: Wish > Components: C++ >Reporter: Antoine Pitrou >Assignee: Kenta Murata >Priority: Minor > > Currently there's {{SparseCOOIndex}} and {{SparseCSRIndex}}, but also > {{SparseTensorCOO}} and {{SparseTensorCSR}}. > For consistency, it would be nice to rename the latter {{SparseCOOTensor}} > and {{SparseCSRTensor}}. > Also, it's not obvious the {{SparseMatrixCSR}} alias is useful. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6624) [C++] Add SparseTensor.ToTensor() method
Rok Mihevc created ARROW-6624: - Summary: [C++] Add SparseTensor.ToTensor() method Key: ARROW-6624 URL: https://issues.apache.org/jira/browse/ARROW-6624 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Rok Mihevc Assignee: Rok Mihevc We have functionality to convert (dense) tensors to sparse tensors, but not the other way around. Also [see discussion|https://github.com/apache/arrow/pull/4446#issuecomment-503792308]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
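The sparse-to-dense direction requested in ARROW-6624 can be sketched for the COO case as follows. This is a pure-Python illustration under stated assumptions, not the proposed C++ method: `coo_to_dense` is a hypothetical helper name, and the dense result is a nested list rather than an Arrow Tensor.

```python
def coo_to_dense(shape, coords, values, fill=0):
    """Build a dense nested list from COO coordinates and values."""

    def build(dims):
        # Recursively allocate a nested list filled with the fill value.
        if len(dims) == 1:
            return [fill] * dims[0]
        return [build(dims[1:]) for _ in range(dims[0])]

    dense = build(list(shape))
    for index, value in zip(coords, values):
        cell = dense
        for axis in index[:-1]:
            cell = cell[axis]  # walk down to the innermost list
        cell[index[-1]] = value
    return dense


dense_2d = coo_to_dense((2, 3), [(0, 1), (1, 2)], [5, 7])
dense_3d = coo_to_dense((2, 2, 2), [(1, 0, 1)], [9])
```

Every position not named in `coords` keeps the fill value, which is the implicit zero of the sparse representation.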
[jira] [Commented] (ARROW-4208) [CI/Python] Have automatized tests for S3
[ https://issues.apache.org/jira/browse/ARROW-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929826#comment-16929826 ] Rok Mihevc commented on ARROW-4208: --- With #[5200|https://github.com/apache/arrow/pull/5200] we get a minio server in pytest via a fixture. It will run in Docker images, Travis and AppVeyor. So far we have one [test|https://github.com/apache/arrow/blob/59f1e148d5c0fa13b7964f85f13011532ff515ed/python/pyarrow/tests/test_parquet.py#L1797] in Python. Do we want to add other tests now? Do we have regression examples? > [CI/Python] Have automatized tests for S3 > - > > Key: ARROW-4208 > URL: https://issues.apache.org/jira/browse/ARROW-4208 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, Python >Reporter: Krisztian Szucs >Assignee: Rok Mihevc >Priority: Major > Labels: filesystem, pull-request-available, s3 > Fix For: 1.0.0 > > > Currently We don't run S3 integration tests regularly. > Possible solutions: > - mock it within python/pytest > - simply run the s3 tests with an S3 credential provided > - create a hdfs-integration like docker-compose setup and run an S3 mock > server (e.g.: https://github.com/adobe/S3Mock, > https://github.com/jubos/fake-s3, https://github.com/gaul/s3proxy, > https://github.com/jserver/mock-s3) > For more see discussion https://github.com/apache/arrow/pull/3286 -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (ARROW-4208) [CI/Python] Have automatized tests for S3
[ https://issues.apache.org/jira/browse/ARROW-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4208: -- Labels: filesystem pull-request-available s3 (was: filesystem s3) > [CI/Python] Have automatized tests for S3 > - > > Key: ARROW-4208 > URL: https://issues.apache.org/jira/browse/ARROW-4208 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, Python >Reporter: Krisztian Szucs >Assignee: Rok Mihevc >Priority: Major > Labels: filesystem, pull-request-available, s3 > Fix For: 1.0.0 > > > Currently We don't run S3 integration tests regularly. > Possible solutions: > - mock it within python/pytest > - simply run the s3 tests with an S3 credential provided > - create a hdfs-integration like docker-compose setup and run an S3 mock > server (e.g.: https://github.com/adobe/S3Mock, > https://github.com/jubos/fake-s3, https://github.com/gaul/s3proxy, > https://github.com/jserver/mock-s3) > For more see discussion https://github.com/apache/arrow/pull/3286 -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (ARROW-6358) [C++] FileSystem::DeleteDir should make it optional to delete the directory itself
[ https://issues.apache.org/jira/browse/ARROW-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917191#comment-16917191 ] Rok Mihevc commented on ARROW-6358: --- Got it. Thanks! > [C++] FileSystem::DeleteDir should make it optional to delete the directory > itself > -- > > Key: ARROW-6358 > URL: https://issues.apache.org/jira/browse/ARROW-6358 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 0.14.1 >Reporter: Antoine Pitrou >Priority: Major > > In some situations, it can be desirable to delete the entirety of a > directory's contents, but not the directory itself (e.g. when it's a S3 > bucket). Perhaps we should add an option for that. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (ARROW-6358) [C++] FileSystem::DeleteDir should make it optional to delete the directory itself
[ https://issues.apache.org/jira/browse/ARROW-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917142#comment-16917142 ] Rok Mihevc commented on ARROW-6358: --- Can you treat the bucket as the root of the filesystem? > [C++] FileSystem::DeleteDir should make it optional to delete the directory > itself > -- > > Key: ARROW-6358 > URL: https://issues.apache.org/jira/browse/ARROW-6358 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 0.14.1 >Reporter: Antoine Pitrou >Priority: Major > > In some situations, it can be desirable to delete the entirety of a > directory's contents, but not the directory itself (e.g. when it's a S3 > bucket). Perhaps we should add an option for that. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (ARROW-6358) [C++] FileSystem::DeleteDir should make it optional to delete the directory itself
[ https://issues.apache.org/jira/browse/ARROW-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916616#comment-16916616 ] Rok Mihevc commented on ARROW-6358: --- Ah yes, sorry, missed the bucket case. As an occasional S3 user I would be surprised if arrow deleted a bucket and not only its contents. But I can imagine it would be useful to have that option sometimes. > [C++] FileSystem::DeleteDir should make it optional to delete the directory > itself > -- > > Key: ARROW-6358 > URL: https://issues.apache.org/jira/browse/ARROW-6358 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 0.14.1 >Reporter: Antoine Pitrou >Priority: Major > > In some situations, it can be desirable to delete the entirety of a > directory's contents, but not the directory itself (e.g. when it's a S3 > bucket). Perhaps we should add an option for that. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (ARROW-6358) [C++] FileSystem::DeleteDir should make it optional to delete the directory itself
[ https://issues.apache.org/jira/browse/ARROW-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916134#comment-16916134 ] Rok Mihevc commented on ARROW-6358: --- AFAIK folders don't really exist in S3. Once objects inside a folder are deleted the folder no longer exists. So deleting the contents and deleting the folder are the same thing. See: https://stackoverflow.com/questions/42442259/delete-a-folder-and-its-content-aws-s3-java > [C++] FileSystem::DeleteDir should make it optional to delete the directory > itself > -- > > Key: ARROW-6358 > URL: https://issues.apache.org/jira/browse/ARROW-6358 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 0.14.1 >Reporter: Antoine Pitrou >Priority: Major > > In some situations, it can be desirable to delete the entirety of a > directory's contents, but not the directory itself (e.g. when it's a S3 > bucket). Perhaps we should add an option for that. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Comment Edited] (ARROW-6358) [C++] FileSystem::DeleteDir should make it optional to delete the directory itself
[ https://issues.apache.org/jira/browse/ARROW-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916134#comment-16916134 ] Rok Mihevc edited comment on ARROW-6358 at 8/26/19 8:57 PM: AFAIK folders don't really exist in S3. Once objects inside a folder are deleted the folder no longer exists. So deleting the contents and deleting the folder are the same thing. See: https://stackoverflow.com/questions/42442259/delete-a-folder-and-its-content-aws-s3-java was (Author: rokm): AFAIK folders don't really exist in S3. Once objects inside a folder are delete the folder no longer exists. So deleting the contents and folder are the same thing See: https://stackoverflow.com/questions/42442259/delete-a-folder-and-its-content-aws-s3-java > [C++] FileSystem::DeleteDir should make it optional to delete the directory > itself > -- > > Key: ARROW-6358 > URL: https://issues.apache.org/jira/browse/ARROW-6358 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 0.14.1 >Reporter: Antoine Pitrou >Priority: Major > > In some situations, it can be desirable to delete the entirety of a > directory's contents, but not the directory itself (e.g. when it's a S3 > bucket). Perhaps we should add an option for that. -- This message was sent by Atlassian Jira (v8.3.2#803003)
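The point made in this thread, that "folders" in a flat object store are only key prefixes while buckets are real containers, can be illustrated with a toy in-memory model. This is a sketch under stated assumptions: `ToyObjectStore` and its methods are hypothetical names invented here, and it models neither the AWS API nor Arrow's FileSystem interface.

```python
class ToyObjectStore:
    """Toy flat keyspace: buckets are explicit, folders are only prefixes."""

    def __init__(self):
        self.buckets = {}  # bucket name -> set of object keys

    def create_bucket(self, bucket):
        self.buckets.setdefault(bucket, set())

    def put(self, bucket, key):
        self.buckets[bucket].add(key)

    def dir_exists(self, bucket, prefix):
        # A "directory" exists only while some key lives under its prefix.
        return any(k.startswith(prefix + "/") for k in self.buckets[bucket])

    def delete_dir_contents(self, bucket, prefix=""):
        # An empty prefix means "everything in the bucket".
        lead = prefix + "/" if prefix else ""
        self.buckets[bucket] = {
            k for k in self.buckets[bucket] if not k.startswith(lead)
        }


store = ToyObjectStore()
store.create_bucket("data")
store.put("data", "logs/2019/a.txt")
store.put("data", "logs/2019/b.txt")

store.delete_dir_contents("data", "logs")
folder_still_exists = store.dir_exists("data", "logs")
bucket_still_exists = "data" in store.buckets
```

In this model, deleting a folder's contents removes the folder as a side effect, whereas the bucket survives being emptied, which is the case where a separate "delete contents only" option matters.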
[jira] [Assigned] (ARROW-1456) [Python] Run s3fs unit tests in Travis CI
[ https://issues.apache.org/jira/browse/ARROW-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc reassigned ARROW-1456: - Assignee: Rok Mihevc > [Python] Run s3fs unit tests in Travis CI > - > > Key: ARROW-1456 > URL: https://issues.apache.org/jira/browse/ARROW-1456 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Rok Mihevc >Priority: Major > Labels: filesystem > Fix For: 1.0.0 > > > We'll need to set up an S3 bucket to write to with credentials that cannot > compromise anyone's AWS account. I've been testing locally with a user that I > set up but I wouldn't be comfortable checking in these credentials, even in > encrypted form, without more scrutiny -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Assigned] (ARROW-4208) [CI/Python] Have automatized tests for S3
[ https://issues.apache.org/jira/browse/ARROW-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc reassigned ARROW-4208: - Assignee: Rok Mihevc > [CI/Python] Have automatized tests for S3 > - > > Key: ARROW-4208 > URL: https://issues.apache.org/jira/browse/ARROW-4208 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, Python >Reporter: Krisztian Szucs >Assignee: Rok Mihevc >Priority: Major > Labels: filesystem, s3 > Fix For: 1.0.0 > > > Currently We don't run S3 integration tests regularly. > Possible solutions: > - mock it within python/pytest > - simply run the s3 tests with an S3 credential provided > - create a hdfs-integration like docker-compose setup and run an S3 mock > server (e.g.: https://github.com/adobe/S3Mock, > https://github.com/jubos/fake-s3, https://github.com/gaul/s3proxy, > https://github.com/jserver/mock-s3) > For more see discussion https://github.com/apache/arrow/pull/3286 -- This message was sent by Atlassian Jira (v8.3.2#803003)