[jira] [Created] (ARROW-18371) [C++] Expose *FromJSON helpers

2022-11-21 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-18371:
--

 Summary: [C++] Expose *FromJSON helpers
 Key: ARROW-18371
 URL: https://issues.apache.org/jira/browse/ARROW-18371
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Rok Mihevc


The ArrayFromJSON, ExecBatchFromJSON and RecordBatchFromJSON helper functions 
would be useful when testing in projects that use Arrow. BatchesWithSchema and 
MakeBasicBatches could be considered as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-18365) [C++][Parquet] Optimize DELTA_BINARY_PACKED encoding and decoding

2022-11-18 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-18365:
--

 Summary: [C++][Parquet] Optimize DELTA_BINARY_PACKED encoding and 
decoding
 Key: ARROW-18365
 URL: https://issues.apache.org/jira/browse/ARROW-18365
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Parquet
Reporter: Rok Mihevc


[As suggested 
here|https://github.com/apache/arrow/pull/14191#discussion_r1019762308], a SIMD 
approach such as 
[FastDifferentialCoding|https://github.com/lemire/FastDifferentialCoding] could 
be used to speed up encoding and decoding.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-18344) [C++] Use input pre-sortedness to create sorted table with ConcatenateTables

2022-11-16 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-18344:
--

 Summary: [C++] Use input pre-sortedness to create sorted table 
with ConcatenateTables
 Key: ARROW-18344
 URL: https://issues.apache.org/jira/browse/ARROW-18344
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Rok Mihevc


When concatenating large sorted tables (e.g. sorted timeseries data), the 
resulting table is no longer sorted. However, the input sortedness can be used 
to significantly speed up post-concatenation sorting. A potential API could be 
to add ConcatenateTablesOptions.inputs_sorted and implement the logic in 
ConcatenateTables.
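
The idea can be illustrated outside of Arrow with a k-way merge: when every 
input is already sorted, the combined output can be produced in one pass 
instead of a full re-sort. A minimal Python sketch, assuming plain lists of 
(timestamp, value) rows stand in for tables; concatenate_sorted is a 
hypothetical name and not an Arrow API:

```python
import heapq
from operator import itemgetter


def concatenate_sorted(tables, key_index=0):
    """Merge already-sorted row lists into one sorted list.

    Each input must be sorted on the key column. heapq.merge performs a
    k-way merge in O(n log k) instead of re-sorting in O(n log n).
    """
    return list(heapq.merge(*tables, key=itemgetter(key_index)))


# Two inputs, each sorted on the first column (hypothetical sample data).
a = [(1, "a1"), (4, "a2"), (9, "a3")]
b = [(2, "b1"), (3, "b2"), (10, "b3")]
merged = concatenate_sorted([a, b])
print(merged)
```

The merge only works if the caller's claim of sortedness is true, which is 
why an explicit option such as inputs_sorted seems preferable to guessing.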



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-18342) [C++] AsofJoinNode support for Boolean data field

2022-11-16 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-18342:
--

 Summary: [C++] AsofJoinNode support for Boolean data field
 Key: ARROW-18342
 URL: https://issues.apache.org/jira/browse/ARROW-18342
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Rok Mihevc


This is to add boolean data field support to asof join as proposed here: 
https://github.com/apache/arrow/pull/14485



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-18278) [Java] Maven generate-libs-jni-macos-linux on M1 fails due to cmake error

2022-11-07 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-18278:
--

 Summary: [Java] Maven generate-libs-jni-macos-linux on M1 fails 
due to cmake error
 Key: ARROW-18278
 URL: https://issues.apache.org/jira/browse/ARROW-18278
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java
Reporter: Rok Mihevc


When building with maven on M1 [as per 
docs|https://arrow.apache.org/docs/dev/developers/java/building.html#id3]:
{code:bash}
mvn clean install
mvn generate-resources -Pgenerate-libs-jni-macos-linux -N
mvn -Darrow.cpp.build.dir=/arrow/java-dist/lib/ -Parrow-jni clean install
{code}
I get the following error:
{code:bash}
[INFO] --- exec-maven-plugin:3.1.0:exec (jni-cmake) @ arrow-java-root ---
-- Building using CMake version: 3.24.2
-- The C compiler identification is AppleClang 14.0.0.1429
-- The CXX compiler identification is AppleClang 14.0.0.1429
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Java: /Library/Java/JavaVirtualMachines/zulu-11.jdk/Contents/Home/bin/java (found version "11.0.16")
-- Found JNI: /Library/Java/JavaVirtualMachines/zulu-11.jdk/Contents/Home/include  found components: AWT JVM
CMake Error at dataset/CMakeLists.txt:18 (find_package):
  By not providing "FindArrowDataset.cmake" in CMAKE_MODULE_PATH this project
  has asked CMake to find a package configuration file provided by
  "ArrowDataset", but CMake did not find one.

  Could not find a package configuration file provided by "ArrowDataset" with
  any of the following names:

ArrowDatasetConfig.cmake
arrowdataset-config.cmake

  Add the installation prefix of "ArrowDataset" to CMAKE_PREFIX_PATH or set
  "ArrowDataset_DIR" to a directory containing one of the above files.  If
  "ArrowDataset" provides a separate development package or SDK, be sure it
  has been installed.


-- Configuring incomplete, errors occurred!
See also "/Users/rok/Documents/repos/arrow/java-jni/CMakeFiles/CMakeOutput.log".
See also "/Users/rok/Documents/repos/arrow/java-jni/CMakeFiles/CMakeError.log".
[ERROR] Command execution failed.
org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit 
value: 1)
at org.apache.commons.exec.DefaultExecutor.executeInternal 
(DefaultExecutor.java:404)
at org.apache.commons.exec.DefaultExecutor.execute 
(DefaultExecutor.java:166)
at org.codehaus.mojo.exec.ExecMojo.executeCommandLine (ExecMojo.java:1000)
at org.codehaus.mojo.exec.ExecMojo.executeCommandLine (ExecMojo.java:947)
at org.codehaus.mojo.exec.ExecMojo.execute (ExecMojo.java:471)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo 
(DefaultBuildPluginManager.java:137)
at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 
(MojoExecutor.java:370)
at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute 
(MojoExecutor.java:351)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
(MojoExecutor.java:215)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
(MojoExecutor.java:171)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
(MojoExecutor.java:163)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
(LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
(LifecycleModuleBuilder.java:81)
at 
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build
 (SingleThreadedBuilder.java:56)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute 
(LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:294)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
at org.apache.maven.cli.MavenCli.execute (MavenCli.java:960)
at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:293)
at org.apache.maven.cli.MavenCli.main (MavenCli.java:196)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke 
(NativeMethodAccessorImpl.java:62)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke 
(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:566)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced 
(Launcher.java:282)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch 
(Launcher.java:225)
{code}

[jira] [Created] (ARROW-18042) [Java] Distribute Apple M1 compatible JNI libraries via mavencentral

2022-10-13 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-18042:
--

 Summary: [Java] Distribute Apple M1 compatible JNI libraries via 
mavencentral
 Key: ARROW-18042
 URL: https://issues.apache.org/jira/browse/ARROW-18042
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java
Affects Versions: 9.0.0
Reporter: Rok Mihevc


Currently JNI libraries need to be built locally to be usable on Apple silicon. 
We should build and distribute compatible libraries via mavencentral.

@dsusanibara @lidavidm

Also see ARROW-17267 and ARROW-16608



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17935) [C++] Kernel to convert timestamp with timezone to wall time

2022-10-04 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17935:
--

 Summary: [C++] Kernel to convert timestamp with timezone to wall 
time
 Key: ARROW-17935
 URL: https://issues.apache.org/jira/browse/ARROW-17935
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Rok Mihevc


We have `assume_timezone` to go from wall time timestamp to timestamp with a 
timezone. We might want a reverse operation to go from timestamp with a 
timezone to wall time.

This is not needed for computation within Arrow, but would be needed if an 
application outside of Arrow consumes wall time. E.g.: 
https://stackoverflow.com/questions/73275465/how-to-keep-original-datatime-in-pyarrow-table/73276431
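
To make the two directions concrete, here is a small sketch using only the 
Python standard library rather than Arrow kernels. The function names are 
illustrative (to_wall_time is hypothetical), and a fixed-offset zone is 
assumed so the example does not depend on a timezone database:

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset zone used for illustration; real zones need a tz database.
CET = timezone(timedelta(hours=1), "CET")


def assume_timezone(wall, tz):
    """Attach a zone to a naive wall-clock time (wall time -> instant)."""
    return wall.replace(tzinfo=tz)


def to_wall_time(instant, tz):
    """Reverse direction: render the instant in tz, then drop the zone."""
    return instant.astimezone(tz).replace(tzinfo=None)


instant = datetime(2022, 10, 4, 12, 0, tzinfo=timezone.utc)
wall = to_wall_time(instant, CET)
print(wall)  # naive datetime: 2022-10-04 13:00:00
```

Applying assume_timezone to the result gives back the original instant, which 
is the round trip the proposed kernel would make possible.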



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17918) [Python] ExtensionArray.__getitem__ is not called if called from StructArray

2022-10-03 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17918:
--

 Summary: [Python] ExtensionArray.__getitem__ is not called if 
called from StructArray
 Key: ARROW-17918
 URL: https://issues.apache.org/jira/browse/ARROW-17918
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Python
Reporter: Rok Mihevc


It seems that when getting a value from a StructScalar, extension information 
is lost. See:


{code:python}
import pyarrow as pa

class ExampleScalar(pa.ExtensionScalar):
    def as_py(self):
        print(f"ExampleScalar.as_py -> {self.value.as_py()}")
        return self.value.as_py()

class ExampleArray(pa.ExtensionArray):
    def __getitem__(self, item):
        return f"ExampleArray.__getitem__[{item}] -> {self.storage[item]}"
    def __arrow_ext_scalar_class__(self):
        return ExampleScalar

class ExampleType(pa.ExtensionType):
    def __init__(self):
        pa.ExtensionType.__init__(self, pa.int64(), "ExampleExtensionType")
    def __arrow_ext_serialize__(self):
        return b""
    def __arrow_ext_class__(self):
        return ExampleArray

example_type = ExampleType()
arr = pa.array([1, 2, 3])
example_array = pa.ExtensionArray.from_storage(example_type, arr)
example_array2 = pa.StructArray.from_arrays([example_array, arr], ["a", "b"])

print("\nExample 1\n=")
print(example_array[0])
print(example_array.type)
print(type(example_array[0]))

print("\nExample 2\n=")
print(example_array2[0])
print(example_array2[0].type)
print(example_array2[0]["a"])
print(example_array2[0]["a"].type)
{code}

Returns:

{code:python}
Example 1
=
ExampleArray.__getitem__[0] -> 1
extension>


Example 2
=
[('a', 1), ('b', 1)]
struct>, b: int64>
1
extension>
{code}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17799) [C++][Parquet] Add DELTA_LENGTH_BYTE_ARRAY encoder to Parquet writer

2022-09-21 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17799:
--

 Summary: [C++][Parquet] Add DELTA_LENGTH_BYTE_ARRAY encoder to 
Parquet writer
 Key: ARROW-17799
 URL: https://issues.apache.org/jira/browse/ARROW-17799
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Parquet
Reporter: Rok Mihevc


We need to add a DELTA_LENGTH_BYTE_ARRAY encoder in order to implement the 
DELTA_BYTE_ARRAY encoder (ARROW-17619).
ARROW-13388 already implemented the DELTA_LENGTH_BYTE_ARRAY decoder.
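
For readers unfamiliar with the encoding, the core idea is to store all 
lengths up front followed by the concatenated bytes. A simplified Python 
sketch, assuming the lengths are kept as a plain list (the real format stores 
them with DELTA_BINARY_PACKED); the function names are illustrative and not 
the Parquet C++ API:

```python
def encode_delta_length(values):
    """Split byte strings into a list of lengths plus one concatenated buffer."""
    lengths = [len(v) for v in values]
    return lengths, b"".join(values)


def decode_delta_length(lengths, data):
    """Slice the concatenated buffer back into the original byte strings."""
    out, pos = [], 0
    for n in lengths:
        out.append(data[pos:pos + n])
        pos += n
    return out


values = [b"Hello", b"World", b"Foobar", b"ABCDEF"]
lengths, data = encode_delta_length(values)
print(lengths, data)
```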



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17798) [C++][Parquet] Add DELTA_BINARY_PACKED encoder to Parquet writer

2022-09-21 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17798:
--

 Summary: [C++][Parquet] Add DELTA_BINARY_PACKED encoder to Parquet 
writer
 Key: ARROW-17798
 URL: https://issues.apache.org/jira/browse/ARROW-17798
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Parquet
Reporter: Rok Mihevc


We need to add a DELTA_BINARY_PACKED encoder in order to implement the 
DELTA_BYTE_ARRAY encoder (ARROW-17619).
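
As a rough illustration of what the encoder has to compute: the values are 
turned into deltas, and the deltas are stored relative to the smallest delta 
so that they become small non-negative integers. The Python sketch below 
assumes a single block and skips the header, miniblocks and bit packing of 
the real format; the function names are illustrative only:

```python
def delta_encode(values):
    """Return (first_value, min_delta, adjusted_deltas) for one block."""
    if not values:
        return None, 0, []
    deltas = [b - a for a, b in zip(values, values[1:])]
    min_delta = min(deltas, default=0)
    # Subtracting min_delta makes every stored delta non-negative.
    return values[0], min_delta, [d - min_delta for d in deltas]


def delta_decode(first, min_delta, adjusted):
    """Rebuild the original values from the encoded block."""
    if first is None:
        return []
    out = [first]
    for d in adjusted:
        out.append(out[-1] + d + min_delta)
    return out


values = [7, 5, 3, 1, 2, 3, 4, 5]
first, min_delta, adjusted = delta_encode(values)
print(first, min_delta, adjusted)  # 7 -2 [0, 0, 0, 3, 3, 3, 3]
```

In the actual encoding the adjusted deltas would then be bit-packed using the 
smallest width that fits the largest of them.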



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17685) [Python] Expose jemalloc statistics for logging

2022-09-12 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17685:
--

 Summary: [Python] Expose jemalloc statistics for logging
 Key: ARROW-17685
 URL: https://issues.apache.org/jira/browse/ARROW-17685
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Rok Mihevc


Expose C++ jemalloc stats added by ARROW-16981 in Python.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17619) [C++][Parquet] Add DELTA_BYTE_ARRAY encoder to Parquet writer

2022-09-05 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17619:
--

 Summary: [C++][Parquet] Add DELTA_BYTE_ARRAY encoder to Parquet 
writer
 Key: ARROW-17619
 URL: https://issues.apache.org/jira/browse/ARROW-17619
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Parquet
Reporter: Rok Mihevc


PARQUET-492 added a DELTA_BYTE_ARRAY decoder, but we don't have an encoder.
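
DELTA_BYTE_ARRAY is an incremental (front) coding: for each value it stores 
the length of the prefix shared with the previous value plus the remaining 
suffix. A simplified Python sketch, assuming plain lists for both outputs (in 
the real format the prefix lengths are DELTA_BINARY_PACKED and the suffixes 
are DELTA_LENGTH_BYTE_ARRAY encoded); the function names are illustrative:

```python
def common_prefix_len(a, b):
    """Number of leading bytes shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode_delta_byte_array(values):
    """Return (prefix_lengths, suffixes) relative to the previous value."""
    prefix_lengths, suffixes, prev = [], [], b""
    for v in values:
        n = common_prefix_len(prev, v)
        prefix_lengths.append(n)
        suffixes.append(v[n:])
        prev = v
    return prefix_lengths, suffixes


def decode_delta_byte_array(prefix_lengths, suffixes):
    """Rebuild each value from the previous value's prefix plus its suffix."""
    out, prev = [], b""
    for n, s in zip(prefix_lengths, suffixes):
        prev = prev[:n] + s
        out.append(prev)
    return out


values = [b"axis", b"axle", b"babble", b"babyhood"]
prefix_lengths, suffixes = encode_delta_byte_array(values)
print(prefix_lengths, suffixes)
```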



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17411) [C++][Python][Doc] Document that order is not preserved when writing dataset with use_threads=True

2022-08-15 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17411:
--

 Summary: [C++][Python][Doc] Document that order is not preserved 
when writing dataset with use_threads=True
 Key: ARROW-17411
 URL: https://issues.apache.org/jira/browse/ARROW-17411
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Rok Mihevc


Current behaviour is surprising and not documented.
See: ARROW-16506 and ARROW-10883
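
A generic illustration of why parallel work can reorder output, and one way a 
caller could restore order afterwards. This is plain Python with a thread 
pool, not Arrow's dataset writer; write_batch is a hypothetical stand-in:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def write_batch(index, batch):
    """Stand-in for writing one batch; returns the index it was given."""
    return index, sum(batch)


batches = [[1, 2], [3, 4], [5, 6], [7, 8]]

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(write_batch, i, b) for i, b in enumerate(batches)]
    # Completion order depends on scheduling and need not match input order.
    completed = [f.result() for f in as_completed(futures)]

# Sorting on the carried index restores the original order.
restored = [total for _, total in sorted(completed)]
print(restored)
```

Carrying an explicit index (or sort key) with the data is the usual 
workaround when downstream consumers rely on row order.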



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17407) [Doc][C++][FlightRPC] Flight/gRPC best practices

2022-08-13 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17407:
--

 Summary: [Doc][C++][FlightRPC] Flight/gRPC best practices
 Key: ARROW-17407
 URL: https://issues.apache.org/jira/browse/ARROW-17407
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Documentation, FlightRPC
Reporter: Rok Mihevc


We want to provide a best practices and debugging section for the [flight 
docs|https://arrow.apache.org/docs/cpp/flight.html].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17398) [R] Add support for %Z to strptime

2022-08-12 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17398:
--

 Summary: [R] Add support for %Z to strptime 
 Key: ARROW-17398
 URL: https://issues.apache.org/jira/browse/ARROW-17398
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Rok Mihevc


While lubridate does not support the %Z flag for strptime, Arrow could.

Changes to C++ kernels might be required for support on all platforms, but that 
shouldn't block implementation as kStrptimeSupportsZone flag can be used, [see 
proposal|https://github.com/apache/arrow/pull/13854#issuecomment-1212694663].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17266) [Doc] Java nightlies file prefix changed

2022-07-30 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17266:
--

 Summary: [Doc] Java nightlies file prefix changed
 Key: ARROW-17266
 URL: https://issues.apache.org/jira/browse/ARROW-17266
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation, Java
Reporter: Rok Mihevc


As per [Arrow 
docs|https://arrow.apache.org/docs/dev/java/install.html#installing-manually] 
Java nightlies are at: 
[https://github.com/ursacomputing/crossbow/releases/tag/nightly-2022-03-19-0-github-java-jars]

However, the file prefix changed and the new URL format is:
[https://github.com/ursacomputing/crossbow/releases/tag/nightly-packaging-2022-07-30-0-github-java-jars]

Since it's hard to search GitHub for old releases, it would be good to change 
the URL in the docs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17218) [C++][R] Strptime should detect invalid formats

2022-07-26 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17218:
--

 Summary: [C++][R] Strptime should detect invalid formats
 Key: ARROW-17218
 URL: https://issues.apache.org/jira/browse/ARROW-17218
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Rok Mihevc


As [discussed 
here|https://github.com/apache/arrow/pull/13506#pullrequestreview-1048946393] 
we want C++ to report invalid strptime formats to avoid having to implement 
this logic in multiple places. Once that is in place (if it's not yet) we 
should add tests for this error in R. Currently an invalid format used in R 
will simply return an array of NAs.
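
To sketch what "detecting an invalid format" could look like, the format 
string can be scanned once for directives outside a supported set before any 
parsing happens. The Python below is only an illustration: the SUPPORTED set 
is an assumption made up for the example and is not the list of flags Arrow 
accepts, and invalid_directives is a hypothetical helper:

```python
# Assumed set of supported directives, chosen only for illustration.
SUPPORTED = set("YmdHMSyjbBaAzZp%")


def invalid_directives(fmt):
    """Return the directives in fmt that are not in SUPPORTED."""
    bad, i = [], 0
    while i < len(fmt):
        if fmt[i] == "%":
            if i + 1 >= len(fmt):
                bad.append("%")  # dangling '%' at the end of the format
                break
            if fmt[i + 1] not in SUPPORTED:
                bad.append("%" + fmt[i + 1])
            i += 2  # skip the directive character as well
        else:
            i += 1
    return bad


print(invalid_directives("%Y-%m-%d %H:%M:%S"))  # []
print(invalid_directives("%Y-%q-%d %"))         # ['%q', '%']
```

A non-empty result would be reported as an error up front, instead of every 
value silently failing to parse.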



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17197) [R] floor_date/ceiling_date lubridate comparison tests failing on macOS

2022-07-25 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17197:
--

 Summary: [R] floor_date/ceiling_date lubridate comparison tests 
failing on macOS
 Key: ARROW-17197
 URL: https://issues.apache.org/jira/browse/ARROW-17197
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Rok Mihevc
 Fix For: 9.0.0


We observed failing tests on local machines and [in 
CI|https://github.com/ursacomputing/crossbow/runs/7460282895?check_suite_focus=true#step:10:228]
 where timezoned timestamps are rounded to subsecond, second and minute units.
Tests fail when comparing our result to lubridate's, [however it seems the 
issue is on lubridate's 
side|https://github.com/apache/arrow/pull/12154/files#diff-d405691ec7dd30bdf039b63136e5aac3c34cea96d8ff532485d1faea7f2caaacR2815-R2823].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17193) [C++] Building GCS and tests on M1 MacOS 12.05 is failing.

2022-07-23 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17193:
--

 Summary: [C++] Building GCS and tests on M1 MacOS 12.05 is failing.
 Key: ARROW-17193
 URL: https://issues.apache.org/jira/browse/ARROW-17193
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 8.0.0
Reporter: Rok Mihevc


Building GCS and tests on M1 MacOS 12.05 with dependencies installed with 
homebrew is failing.

{code:bash}
cmake \
-GNinja \
-DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
-DCMAKE_INSTALL_LIBDIR=lib \
-DARROW_PYTHON=ON \
-DARROW_COMPUTE=ON \
-DARROW_FILESYSTEM=ON \
-DARROW_CSV=ON \
-DARROW_GCS=ON \
-DARROW_INSTALL_NAME_RPATH=OFF \
-DARROW_BUILD_TESTS=ON \
-DCMAKE_CXX_STANDARD=17 \
..
{code}

Building errors with:

{noformat}
Undefined symbols for architecture arm64:
  "absl::lts_20220623::FormatTime(std::__1::basic_string_view >, absl::lts_20220623::Time, 
absl::lts_20220623::TimeZone)", referenced from:
  arrow::fs::(anonymous 
namespace)::GcsIntegrationTest_OpenInputStreamReadMetadata_Test::TestBody() in 
gcsfs_test.cc.o
  
"absl::lts_20220623::FromChrono(std::__1::chrono::time_point > > 
const&)", referenced from:
  arrow::fs::(anonymous 
namespace)::GcsIntegrationTest_OpenInputStreamReadMetadata_Test::TestBody() in 
gcsfs_test.cc.o
  "absl::lts_20220623::RFC3339_full", referenced from:
  arrow::fs::(anonymous 
namespace)::GcsFileSystem_ObjectMetadataRoundtrip_Test::TestBody() in 
gcsfs_test.cc.o
  arrow::fs::(anonymous 
namespace)::GcsIntegrationTest_OpenInputStreamReadMetadata_Test::TestBody() in 
gcsfs_test.cc.o
  "absl::lts_20220623::time_internal::cctz::utc_time_zone()", referenced from:
  arrow::fs::(anonymous 
namespace)::GcsIntegrationTest_OpenInputStreamReadMetadata_Test::TestBody() in 
gcsfs_test.cc.o
  "absl::lts_20220623::ToDoubleSeconds(absl::lts_20220623::Duration)", 
referenced from:
  arrow::fs::(anonymous 
namespace)::GcsFileSystem_ObjectMetadataRoundtrip_Test::TestBody() in 
gcsfs_test.cc.o
  "absl::lts_20220623::Duration::operator-=(absl::lts_20220623::Duration)", 
referenced from:
  arrow::fs::(anonymous 
namespace)::GcsFileSystem_ObjectMetadataRoundtrip_Test::TestBody() in 
gcsfs_test.cc.o
  "absl::lts_20220623::ParseTime(std::__1::basic_string_view >, std::__1::basic_string_view >, absl::lts_20220623::Time*, 
std::__1::basic_string, 
std::__1::allocator >*)", referenced from:
  arrow::fs::(anonymous 
namespace)::GcsFileSystem_ObjectMetadataRoundtrip_Test::TestBody() in 
gcsfs_test.cc.o
{noformat}


Dependencies were installed with:
{noformat}
brew update && brew bundle --file=cpp/Brewfile
{noformat}

See https://github.com/apache/arrow/pull/13681#issuecomment-1193241547 and 
https://github.com/apache/arrow/pull/13407



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17166) [R] [CI] Remove ENV TZ from docker files

2022-07-21 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17166:
--

 Summary: [R] [CI] Remove ENV TZ from docker files
 Key: ARROW-17166
 URL: https://issues.apache.org/jira/browse/ARROW-17166
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Rok Mihevc
 Fix For: 9.0.0


We have noticed R CI job (AMD64 Ubuntu 20.04 R 4.2 Force-Tests true) failing on 
master: 
[1|https://github.com/apache/arrow/runs/7424773120?check_suite_focus=true#step:7:5547],
 
[2|https://github.com/apache/arrow/runs/7431821192?check_suite_focus=true#step:7:5804],
 
[3|https://github.com/apache/arrow/runs/7445803518?check_suite_focus=true#step:7:16305]
with:
{code:java}
Start test: array uses local timezone for POSIXct without timezone
  test-Array.R:269:3 [success]
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17147) [R] parse_date_time should support locale parameter

2022-07-20 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17147:
--

 Summary: [R] parse_date_time should support locale parameter
 Key: ARROW-17147
 URL: https://issues.apache.org/jira/browse/ARROW-17147
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Rok Mihevc


See [discussion 
here|https://github.com/apache/arrow/pull/13627#discussion_r924875872].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17146) [R] parse_date_time should support quiet = FALSE

2022-07-20 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17146:
--

 Summary: [R] parse_date_time should support quiet = FALSE
 Key: ARROW-17146
 URL: https://issues.apache.org/jira/browse/ARROW-17146
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Rok Mihevc


See [discussion 
here|https://github.com/apache/arrow/pull/13627#discussion_r924875872].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17141) [C++] Enable selecting nested fields in StructArray with field path

2022-07-20 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17141:
--

 Summary: [C++] Enable selecting nested fields in StructArray with 
field path
 Key: ARROW-17141
 URL: https://issues.apache.org/jira/browse/ARROW-17141
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Rok Mihevc


Currently selecting a nested field in a StructArray requires multiple selects 
or flattening of schema. It would be more user friendly to provide a field path 
e.g.: field_in_top_struct.field_in_substruct.
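
A minimal sketch of the intended behaviour, assuming nested Python dicts stand 
in for struct values; select_field is a hypothetical helper and not an Arrow 
API:

```python
def select_field(struct, path, sep="."):
    """Resolve a dotted field path against nested dicts standing in for structs."""
    value = struct
    for name in path.split(sep):
        if not isinstance(value, dict) or name not in value:
            raise KeyError(f"no field {name!r} while resolving {path!r}")
        value = value[name]
    return value


row = {"field_in_top_struct": {"field_in_substruct": 42, "other": "x"}}
print(select_field(row, "field_in_top_struct.field_in_substruct"))  # 42
```

One design question a real implementation would have to settle is how to 
handle field names that themselves contain the separator.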



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17132) [R] Mutate in compare_dplyr_binding returns wrong type

2022-07-19 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17132:
--

 Summary: [R] Mutate in compare_dplyr_binding returns wrong type
 Key: ARROW-17132
 URL: https://issues.apache.org/jira/browse/ARROW-17132
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Rok Mihevc


The following:
{code:r}
df <- tibble::tibble(
  time = as.POSIXct(seq(
    as.Date("1999-12-31", tz = "UTC"),
    as.Date("2001-01-01", tz = "UTC"),
    by = "day"
  ))
)

compare_dplyr_binding(
  .input %>%
    mutate(x = yday(time)) %>%
    collect(),
  df
)
{code}

Fails with:

{code:bash}
Failure (test-dplyr-funcs-datetime.R:574:3): extract wday from timestamp
`object` (`actual`) not equal to `expected` (`expected`).

`attr(actual$time, 'tzone')` is a character vector ('UTC')
`attr(expected$time, 'tzone')` is absent
Backtrace:
 1. arrow:::compare_dplyr_binding(...)
  at test-dplyr-funcs-datetime.R:574:2
 2. arrow:::expect_equal(via_batch, expected, ...)
  at tests/testthat/helper-expectation.R:115:4
 3. testthat::expect_equal(...)
  at tests/testthat/helper-expectation.R:42:4

Failure (test-dplyr-funcs-datetime.R:574:3): extract wday from timestamp
`object` (`actual`) not equal to `expected` (`expected`).

`attr(actual$time, 'tzone')` is a character vector ('UTC')
`attr(expected$time, 'tzone')` is absent
Backtrace:
 1. arrow:::compare_dplyr_binding(...)
  at test-dplyr-funcs-datetime.R:574:2
 2. arrow:::expect_equal(via_table, expected, ...)
  at tests/testthat/helper-expectation.R:129:4
 3. testthat::expect_equal(...)
  at tests/testthat/helper-expectation.R:42:4
{code}

This also happens for qday and probably other functions where input is temporal 
and output is numeric.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17065) [Python] Allow using subclassed ExtensionScalar in ExtensionType

2022-07-13 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17065:
--

 Summary: [Python] Allow using subclassed ExtensionScalar in 
ExtensionType
 Key: ARROW-17065
 URL: https://issues.apache.org/jira/browse/ARROW-17065
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Rok Mihevc
 Fix For: 9.0.0


This is a follow-up to ARROW-13612.

[See 
discussion.|https://github.com/apache/arrow/pull/13454#issuecomment-1177140141]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-16981) [C++] Expose jemalloc statistics for logging

2022-07-05 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-16981:
--

 Summary: [C++] Expose jemalloc statistics for logging
 Key: ARROW-16981
 URL: https://issues.apache.org/jira/browse/ARROW-16981
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Rok Mihevc
Assignee: Rok Mihevc


This would enable us to log memory usage and diagnose out of memory issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-16932) [C++] Rounding RoundTemporalOptions.calendar_based_origin doesn't correctly offset non-UTC results

2022-06-29 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-16932:
--

 Summary: [C++] Rounding RoundTemporalOptions.calendar_based_origin 
doesn't correctly offset non-UTC results
 Key: ARROW-16932
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Assignee: Rok Mihevc
 Created: 29/Jun/22 11:49
  Labels: kernel c++ timestamp
Priority: Major
Reporter: Rok Mihevc



--
This message was sent by Atlassian Jira
(v8.20.10#820010-sha1:ace47f9)


[jira] [Created] (ARROW-16618) [C++][Python] strptime fails to parse with %p on Windows

2022-05-19 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-16618:
--

 Summary: [C++][Python] strptime fails to parse with %p on Windows
 Key: ARROW-16618
 URL: https://issues.apache.org/jira/browse/ARROW-16618
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Reporter: Rok Mihevc


As reported in https://github.com/apache/arrow/issues/13111, parsing a 
timestamp with %p will fail on Windows. This is probably due to issues with the 
vendored strptime on Windows locales.
We should explore which flags can be enabled and how. The strptime test suite 
should be expanded: 
https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string_test.cc#L1842-L1890.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (ARROW-16536) [Doc][Cookbook][Flight] Find client address from ArrowFlightServer

2022-05-11 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-16536:
--

 Summary: [Doc][Cookbook][Flight] Find client address from 
ArrowFlightServer
 Key: ARROW-16536
 URL: https://issues.apache.org/jira/browse/ARROW-16536
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Rok Mihevc


We want a cookbook entry for Python/C++/Java describing how to get the client's 
address in an Arrow Flight server.
See: [Java|https://stackoverflow.com/a/36140002/262727]
[Python 
|https://arrow.apache.org/docs/python/generated/pyarrow.flight.ServerCallContext.html#pyarrow.flight.ServerCallContext.peer]
[C++|https://arrow.apache.org/docs/cpp/api/flight.html#_CPPv4NK5arrow6flight17ServerCallContext4peerEv]



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (ARROW-16535) [C++] Temporal floor/ceil/round should have settable origin unit

2022-05-11 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-16535:
--

 Summary: [C++] Temporal floor/ceil/round should have settable 
origin unit
 Key: ARROW-16535
 URL: https://issues.apache.org/jira/browse/ARROW-16535
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Rok Mihevc


Temporal rounding kernels (will) allow setting of rounding origin to a greater 
unit. This could be made more flexible by introducing a `greater_unit` 
parameter which would let user select the unit serving as origin. See [this 
discussion|https://github.com/apache/arrow/pull/12657#issuecomment-1119580484] 
for more context.
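
A small sketch of the semantics being discussed, written with the Python 
standard library rather than the Arrow kernels. The origin_unit argument 
plays the role of the proposed `greater_unit` parameter; the function name 
and the two supported origin units are assumptions made for the example:

```python
from datetime import datetime, timedelta


def floor_from_origin(ts, multiple, unit, origin_unit):
    """Floor ts to a multiple of `unit`, counting from the start of `origin_unit`."""
    if origin_unit == "day":
        origin = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elif origin_unit == "hour":
        origin = ts.replace(minute=0, second=0, microsecond=0)
    else:
        raise ValueError(f"unsupported origin unit: {origin_unit}")
    step = timedelta(**{unit + "s": multiple})  # e.g. timedelta(hours=5)
    # Whole number of steps since the origin, then back to a timestamp.
    return origin + ((ts - origin) // step) * step


ts = datetime(2022, 5, 11, 13, 47, 30)
print(floor_from_origin(ts, 5, "hour", "day"))     # 2022-05-11 10:00:00
print(floor_from_origin(ts, 7, "minute", "hour"))  # 2022-05-11 13:42:00
```

With a fixed origin the 5-hour buckets would drift across days; restarting 
the count at each day boundary keeps them aligned to midnight.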



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (ARROW-16147) [C++] ParquetFileWriter doesn't call sink_.Close when using GcsRandomAccessFile

2022-04-07 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-16147:
--

 Summary: [C++] ParquetFileWriter doesn't call sink_.Close when 
using GcsRandomAccessFile
 Key: ARROW-16147
 URL: https://issues.apache.org/jira/browse/ARROW-16147
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Rok Mihevc






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ARROW-16142) [C++] Temporal floor/ceil/round returns incorrect results for date32 and time32 inputs

2022-04-07 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-16142:
--

 Summary: [C++] Temporal floor/ceil/round returns incorrect results 
for date32 and time32 inputs
 Key: ARROW-16142
 URL: https://issues.apache.org/jira/browse/ARROW-16142
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Rok Mihevc


Temporal rounding and flooring seem to interpret 32-bit input arrays as 64-bit 
arrays. The following test:
{code:c++}
TEST_F(ScalarTemporalTest, TestCeilFloorRoundTemporalDate) {
  RoundTemporalOptions round_to_2_hours = RoundTemporalOptions(2, CalendarUnit::HOUR);
  const char* date32s = R"([0, 11016, -25932, null])";
  const char* date64s = R"([0, 951782400000, -2240524800000, null])";
  auto dates32 = ArrayFromJSON(date32(), date32s);
  auto dates64 = ArrayFromJSON(date64(), date64s);
  CheckScalarUnary("ceil_temporal", dates64, dates64, &round_to_2_hours);
  CheckScalarUnary("floor_temporal", dates64, dates64, &round_to_2_hours);
  CheckScalarUnary("round_temporal", dates64, dates64, &round_to_2_hours);

  CheckScalarUnary("ceil_temporal", dates32, dates32, &round_to_2_hours);
  CheckScalarUnary("floor_temporal", dates32, dates32, &round_to_2_hours);
  CheckScalarUnary("round_temporal", dates32, dates32, &round_to_2_hours);

  const char* times_s = R"([0, 7200, null])";
  const char* times_ms = R"([0, 7200000, null])";
  const char* times_us = R"([0, 7200000000, null])";
  const char* times_ns = R"([0, 7200000000000, null])";

  auto arr_s = ArrayFromJSON(time32(TimeUnit::SECOND), times_s);
  auto arr_ms = ArrayFromJSON(time32(TimeUnit::MILLI), times_ms);
  auto arr_us = ArrayFromJSON(time64(TimeUnit::MICRO), times_us);
  auto arr_ns = ArrayFromJSON(time64(TimeUnit::NANO), times_ns);

  CheckScalarUnary("ceil_temporal", arr_s, arr_s, _to_2_hours);
  CheckScalarUnary("ceil_temporal", arr_ms, arr_ms, _to_2_hours);
  CheckScalarUnary("ceil_temporal", arr_us, arr_us, _to_2_hours);
  CheckScalarUnary("ceil_temporal", arr_ns, arr_ns, _to_2_hours);
}
{code}

Returns:
{code:bash}
Got:
  [
[
  1970-01-01,
  1970-01-01,
  2000-02-29,
  null
]
  ]
Expected:
  [
[
  1970-01-01
],
[
  2000-02-29,
  1899-01-01,
  null
]
  ]
{code}
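For context, date32 stores days since the Unix epoch while date64 stores milliseconds since the epoch, so reading one as the other changes values by a factor of 86400000. The sketch below is a standalone illustration of that unit relationship, not Arrow's kernel code; the helper names `Date32ToDate64` and `Date64ToDate32` are made up for this example.

```cpp
#include <cstdint>

// Milliseconds in one day.
constexpr std::int64_t kMillisPerDay = 86400000;

// Convert a date32 value (days since 1970-01-01) to the date64 value
// (milliseconds since 1970-01-01) for the same calendar date.
inline std::int64_t Date32ToDate64(std::int32_t days) {
  return static_cast<std::int64_t>(days) * kMillisPerDay;
}

// Convert a date64 value back to whole days. Floor division is used so
// that negative (pre-1970) values round towards the earlier day.
inline std::int32_t Date64ToDate32(std::int64_t millis) {
  std::int64_t days = millis / kMillisPerDay;
  if (millis % kMillisPerDay < 0) --days;
  return static_cast<std::int32_t>(days);
}
```

With these helpers, the day count 11016 used in the test corresponds to 951782400000 milliseconds.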





[jira] [Created] (ARROW-16110) [C++] GcsFileSystem::Make ignores IOContext

2022-04-04 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-16110:
--

 Summary: [C++] GcsFileSystem::Make ignores IOContext
 Key: ARROW-16110
 URL: https://issues.apache.org/jira/browse/ARROW-16110
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Rok Mihevc


The passed IO context is ignored and the default context is used instead. See the current function:

{code:cpp}
std::shared_ptr<GcsFileSystem> GcsFileSystem::Make(const GcsOptions& options,
                                                   const io::IOContext& context) {
  // Cannot use `std::make_shared<>` as the constructor is private.
  return std::shared_ptr<GcsFileSystem>(
      new GcsFileSystem(options, io::default_io_context()));
}
{code}
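A minimal standalone sketch of the expected behaviour, where the factory forwards the caller's context to the private constructor. `Context`, `Options`, `FileSystem` and `DefaultContext` are hypothetical stand-ins for this illustration and are not Arrow's classes.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-ins for io::IOContext and GcsOptions.
struct Context {
  std::string name;
};
struct Options {};

// Stand-in for a process-wide default context.
const Context& DefaultContext() {
  static const Context ctx{"default"};
  return ctx;
}

class FileSystem {
 public:
  // Forward the caller's context instead of silently using DefaultContext().
  static std::shared_ptr<FileSystem> Make(const Options& options,
                                          const Context& context) {
    // std::make_shared cannot reach the private constructor, hence `new`.
    return std::shared_ptr<FileSystem>(new FileSystem(options, context));
  }

  const Context& context() const { return context_; }

 private:
  FileSystem(const Options& options, const Context& context)
      : options_(options), context_(context) {}

  Options options_;
  Context context_;
};
```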





[jira] [Created] (ARROW-15942) [C++] RecordBatch::ValidateFull fails on nested StructArray

2022-03-15 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-15942:
--

 Summary: [C++] RecordBatch::ValidateFull fails on nested 
StructArray
 Key: ARROW-15942
 URL: https://issues.apache.org/jira/browse/ARROW-15942
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Rok Mihevc


ValidateFull appears to discard the outermost field of a nested schema. The 
following example passes:

{code:bash}
diff --git a/cpp/src/arrow/array/array_struct_test.cc b/cpp/src/arrow/array/array_struct_test.cc
index 318c83860..6a8896ca9 100644
--- a/cpp/src/arrow/array/array_struct_test.cc
+++ b/cpp/src/arrow/array/array_struct_test.cc
@@ -15,6 +15,8 @@
 // specific language governing permissions and limitations
 // under the License.
 
+#include 
+
 #include 
 
 #include 
@@ -696,4 +698,20 @@ TEST(TestFieldRef, GetChildren) {
   AssertArraysEqual(*a, *expected_a);
 }
 
+TEST(TestFieldRef, TestValidateFullRecordBatch) {
+  auto struct_array =
+  ArrayFromJSON(struct_({field("a", struct_({field("b", float64())}))}), R"([
+{"a": {"b": 6.125}},
+{"a": {"b": 0.0}},
+{"a": {"b": -1}}
+  ])");
+
+  auto schema1 = arrow::schema({field("x", struct_({field("a", struct_({field("b", float64())}))}))});
+  auto schema2 = arrow::schema({field("a", struct_({field("b", float64())}))});
+  auto record_batch1 = arrow::RecordBatch::Make(schema1, 3, {struct_array});
+  auto record_batch2 = arrow::RecordBatch::Make(schema2, 3, {struct_array});
+  ASSERT_OK(record_batch1->ValidateFull());
+  ASSERT_NOT_OK(record_batch2->ValidateFull());
+}
+
{code}

Is this expected behaviour?





[jira] [Created] (ARROW-15894) [C++] Strptime issues umbrella

2022-03-09 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-15894:
--

 Summary: [C++] Strptime issues umbrella
 Key: ARROW-15894
 URL: https://issues.apache.org/jira/browse/ARROW-15894
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Rok Mihevc


This umbrella issue is meant to make strptime efforts more visible.





[jira] [Created] (ARROW-15859) [C++] Add nightly test for static build with arrow_flight_static and arrow_bundled_dependencies

2022-03-07 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-15859:
--

 Summary: [C++] Add nightly test for static build with 
arrow_flight_static and arrow_bundled_dependencies
 Key: ARROW-15859
 URL: https://issues.apache.org/jira/browse/ARROW-15859
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Continuous Integration
Reporter: Rok Mihevc


Due to abseil dependencies, static builds with arrow_bundled_dependencies are 
brittle. We could test them nightly with the example proposed in ARROW-14708.





[jira] [Created] (ARROW-15787) [C++] Temporal floor/ceil/round kernels could be optimised with templating

2022-02-25 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-15787:
--

 Summary: [C++] Temporal floor/ceil/round kernels could be 
optimised with templating
 Key: ARROW-15787
 URL: https://issues.apache.org/jira/browse/ARROW-15787
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Rok Mihevc


[CeilTemporal, FloorTemporal, 
RoundTemporal|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_temporal_unary.cc#L728-L980]
 kernels could probably be templated in a clean way. They also execute a switch 
statement for every call instead of creating an operator at kernel call time 
and only running that.





[jira] [Created] (ARROW-15771) [C++][Compute] Add window join to execution engine

2022-02-23 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-15771:
--

 Summary: [C++][Compute] Add window join to execution engine
 Key: ARROW-15771
 URL: https://issues.apache.org/jira/browse/ARROW-15771
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Rok Mihevc


We would want to support window joins with as-of support.
See https://github.com/substrait-io/substrait/issues/3 for more.





[jira] [Created] (ARROW-15764) [C++][FlightRPC] Optionally cache serialized ListFlights serverside

2022-02-23 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-15764:
--

 Summary: [C++][FlightRPC] Optionally cache serialized ListFlights 
serverside
 Key: ARROW-15764
 URL: https://issues.apache.org/jira/browse/ARROW-15764
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Rok Mihevc


ListFlights serializes flights each time it is called. If we have many flights 
and ListFlights is called often this can produce a significant load on the 
server. We could have an optional cache server side to avoid this.
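As a rough sketch of the idea, a cache could hold the serialized listing and re-run serialization only after the set of flights changes. The class below is hypothetical and standalone; it is not part of the Flight API, models the payload as a `std::string`, and is not thread-safe.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical cache for a serialized listing. `serialize` stands in for
// whatever produces the serialized ListFlights payload.
class SerializedListingCache {
 public:
  explicit SerializedListingCache(std::function<std::string()> serialize)
      : serialize_(std::move(serialize)) {}

  // Return the cached payload, serializing only when the cache is stale.
  const std::string& Get() {
    if (!valid_) {
      cached_ = serialize_();
      valid_ = true;
    }
    return cached_;
  }

  // Call whenever the set of flights changes.
  void Invalidate() { valid_ = false; }

 private:
  std::function<std::string()> serialize_;
  std::string cached_;
  bool valid_ = false;
};
```

A real server would also need locking around `Get` and `Invalidate`, since ListFlights calls can arrive concurrently.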





[jira] [Created] (ARROW-15680) [C++] Temporal floor/ceil/round should accept week_start when rounding to multiple of week

2022-02-14 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-15680:
--

 Summary: [C++] Temporal floor/ceil/round should accept week_start 
when rounding to multiple of week
 Key: ARROW-15680
 URL: https://issues.apache.org/jira/browse/ARROW-15680
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Rok Mihevc


See ARROW-14821 and the [related PR|https://github.com/apache/arrow/pull/12154].
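To illustrate what a week_start setting changes, here is a standalone sketch that floors a day count to the start of its week. It assumes ISO weekday numbering (1 = Monday, 7 = Sunday) and a single-week multiple; the function names are hypothetical and this is not the kernel implementation.

```cpp
#include <cstdint>

// ISO weekday (1 = Monday ... 7 = Sunday) for a count of days since
// 1970-01-01, which was a Thursday.
inline int IsoWeekday(std::int64_t days) {
  std::int64_t r = (days + 3) % 7;
  if (r < 0) r += 7;
  return static_cast<int>(r) + 1;
}

// Floor `days` to the most recent day whose ISO weekday is `week_start`.
inline std::int64_t FloorToWeek(std::int64_t days, int week_start) {
  int offset = (IsoWeekday(days) - week_start + 7) % 7;
  return days - offset;
}
```

With a Monday start, 1970-01-01 floors to three days earlier; with a Sunday start, it floors to four days earlier.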





[jira] [Created] (ARROW-15666) [C++] Add format inference option to StrptimeOptions

2022-02-11 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-15666:
--

 Summary: [C++] Add format inference option to StrptimeOptions
 Key: ARROW-15666
 URL: https://issues.apache.org/jira/browse/ARROW-15666
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Rok Mihevc


We want to have an option to infer timestamp format.

See 
[pandas.to_datetime|https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html]
 and lubridate 
[parse_date_time|https://lubridate.tidyverse.org/reference/parse_date_time.html]
 for examples.





[jira] [Created] (ARROW-15665) [C++] Add error handling option to StrptimeOptions

2022-02-11 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-15665:
--

 Summary: [C++] Add error handling option to StrptimeOptions
 Key: ARROW-15665
 URL: https://issues.apache.org/jira/browse/ARROW-15665
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Rok Mihevc


We want to have an option to either raise, ignore or return NA in case of 
format mismatch.

See 
[pandas.to_datetime|https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html]
 and lubridate 
[parse_date_time|https://lubridate.tidyverse.org/reference/parse_date_time.html]
 for examples.
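A small standalone sketch of how such an option might look, covering only the "raise" and "return NA" modes. The semantics of "ignore" are not specified here, so it is left out. `OnError`, `ParsedDate` and `ParseIsoDate` are hypothetical names, and the parser only handles a fixed `YYYY-MM-DD` format rather than arbitrary strptime formats.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical error-handling modes for a format mismatch.
enum class OnError { kRaise, kNull };

// Result of parsing; `is_valid == false` plays the role of NA/null.
struct ParsedDate {
  bool is_valid;
  int year;
  int month;
  int day;
};

// Parse a date in YYYY-MM-DD form, handling mismatches as requested.
ParsedDate ParseIsoDate(const std::string& text, OnError on_error) {
  int year = 0, month = 0, day = 0;
  char trailing = '\0';
  // Exactly three fields must match; a fourth match means trailing junk.
  int matched = std::sscanf(text.c_str(), "%4d-%2d-%2d%c", &year, &month,
                            &day, &trailing);
  if (matched == 3 && month >= 1 && month <= 12 && day >= 1 && day <= 31) {
    return ParsedDate{true, year, month, day};
  }
  if (on_error == OnError::kRaise) {
    throw std::invalid_argument("cannot parse date: " + text);
  }
  return ParsedDate{false, 0, 0, 0};
}
```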





[jira] [Created] (ARROW-15619) [C++] Temporal component extraction function for extracting is_leap_year indicator

2022-02-08 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-15619:
--

 Summary: [C++] Temporal component extraction function for 
extracting is_leap_year indicator
 Key: ARROW-15619
 URL: https://issues.apache.org/jira/browse/ARROW-15619
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Rok Mihevc


This should return the same results as 
[is_leap_year|https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.is_leap_year.html]
 from pandas and 
[leap_year|https://lubridate.tidyverse.org/reference/leap_year.html] from 
lubridate.
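For reference, the Gregorian leap-year rule such a kernel would apply per value can be sketched as a standalone function. The name `IsLeapYear` is illustrative, and extracting the year from a timestamp is not shown.

```cpp
// Gregorian rule: divisible by 4, except century years not divisible by 400.
constexpr bool IsLeapYear(int year) {
  return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}
```

For example, 2000 and 2020 are leap years while 1900 and 2021 are not.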





[jira] [Created] (ARROW-15549) [Java] gRPC not available on M1

2022-02-03 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-15549:
--

 Summary: [Java] gRPC not available on M1
 Key: ARROW-15549
 URL: https://issues.apache.org/jira/browse/ARROW-15549
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Rok Mihevc


When building on M1, gRPC is not found. It can be [manually 
downloaded|https://repo1.maven.org/maven2/io/grpc/protoc-gen-grpc-java/1.41.0/protoc-gen-grpc-java-1.41.0-osx-x86_64.exe]
 and installed:

{code:bash}
mvn install:install-file -DgroupId=io.grpc -DartifactId=protoc-gen-grpc-java \
  -Dversion=1.41.0 -Dclassifier=osx-aarch_64 -Dpackaging=exe \
  -Dfile=/Users/rok/Downloads/protoc-gen-grpc-java-1.41.0-osx-x86_64.exe
{code}

But perhaps that could be fixed as suggested here: 
https://github.com/grpc/grpc-java/issues/7690





[jira] [Created] (ARROW-15473) [C++][FlightRPC] Expose a way to terminate DoExchange stream client side

2022-01-26 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-15473:
--

 Summary: [C++][FlightRPC] Expose a way to terminate DoExchange 
stream client side
 Key: ARROW-15473
 URL: https://issues.apache.org/jira/browse/ARROW-15473
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, FlightRPC
Reporter: Rok Mihevc


We want a mechanism to close DoExchange streams from the client side for 
long-running connections. This would be handy for testing and, for example, 
when a user wants to disconnect.





[jira] [Created] (ARROW-15251) [C++] Temporal floor/ceil/round handle ambiguous/nonexistent local time

2022-01-04 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-15251:
--

 Summary: [C++] Temporal floor/ceil/round handle 
ambiguous/nonexistent local time
 Key: ARROW-15251
 URL: https://issues.apache.org/jira/browse/ARROW-15251
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Rok Mihevc


ARROW-14822 enables temporal round/floor/ceil and raises on 
ambiguous/nonexistent local time. We should allow users to choose different 
behaviours e.g.: raise, earliest, latest, etc.
See AssumeTimezoneOptions for example.





[jira] [Created] (ARROW-15250) [Python][R] Temporal floor/ceil/round should accept frequency string

2022-01-04 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-15250:
--

 Summary: [Python][R] Temporal floor/ceil/round should accept 
frequency string
 Key: ARROW-15250
 URL: https://issues.apache.org/jira/browse/ARROW-15250
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python, R
Reporter: Rok Mihevc


More user-friendly rounding period input can be supported. See [Pandas 
to_offset|https://github.com/pandas-dev/pandas/blob/a6c1f6cccee6bbccfb29488a94664ed07db024d9/pandas/_libs/tslibs/offsets.pyx#L3575-L3679]
 and [lubridate's period|https://lubridate.tidyverse.org/reference/period.html].





[jira] [Created] (ARROW-13684) [C++][Compute] Strftime kernel follow-up

2021-08-20 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-13684:
--

 Summary: [C++][Compute] Strftime kernel follow-up
 Key: ARROW-13684
 URL: https://issues.apache.org/jira/browse/ARROW-13684
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Rok Mihevc
Assignee: Rok Mihevc
 Fix For: 6.0.0


As per ARROW-13174 
[comments|https://github.com/apache/arrow/pull/10647#issuecomment-901783928] we 
should:
 * Correct default format string for non-UTC timestamps
 * Allow non-zoned timestamps to be printed
 * Better document %S flag behavior



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-13347) [C++][Compute] Add option to return null values for nonexistent and ambiguous times

2021-07-15 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-13347:
--

 Summary: [C++][Compute] Add option to return null values for 
nonexistent and ambiguous times
 Key: ARROW-13347
 URL: https://issues.apache.org/jira/browse/ARROW-13347
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Rok Mihevc


ARROW-13033 implements the TzLocalize kernel, which handles ambiguous and 
nonexistent times by raising or by shifting backwards or forwards to the first 
valid local time.
However, we might want to be able to return null in these cases.
We could implement new flags to do so:
{{compute::TemporalLocalizationOptions::Nonexistent::NONEXISTENT_IGNORE}} and 
{{compute::TemporalLocalizationOptions::Ambiguous::AMBIGUOUS_IGNORE}}





[jira] [Created] (ARROW-13174) [C++][Compute] Add strftime kernel

2021-06-25 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-13174:
--

 Summary: [C++][Compute] Add strftime kernel
 Key: ARROW-13174
 URL: https://issues.apache.org/jira/browse/ARROW-13174
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Rok Mihevc


To express timestamps with arbitrary format we require a strftime kernel.
See [comments 
here|https://github.com/apache/arrow/pull/10598#issuecomment-868358466].





[jira] [Created] (ARROW-13168) [C++] Timezone database configuration and access

2021-06-24 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-13168:
--

 Summary: [C++] Timezone database configuration and access
 Key: ARROW-13168
 URL: https://issues.apache.org/jira/browse/ARROW-13168
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Rok Mihevc


Note: currently the timezone database is not available on Windows, so 
timezone-aware operations will fail.

We're using the tz.h library, which needs an up-to-date timezone database to 
correctly handle timezoned timestamps. See [installation 
instructions|https://howardhinnant.github.io/date/tz.html#Installation].

We have the following options for getting a timezone database:
 # local (non-Windows) OS timezone database - no work required.
 # Arrow bundled folder - we could bundle the database at build time for 
Windows. The database would slowly go stale.
 # download it from the IANA Time Zone Database at runtime - tz.h gets the database 
at runtime, but curl (and 7-zip on Windows) are required.
 # local user-provided folder - the user could provide a location at build time. 
Nice to have.
 # allow runtime configuration - at runtime say: "the tzdata can be found at 
this location"

For more context see: [ARROW-12980|https://github.com/apache/arrow/pull/10457]





[jira] [Created] (ARROW-12980) [C++] Kernels to extract datetime components should be timezone aware

2021-06-04 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-12980:
--

 Summary: [C++] Kernels to extract datetime components should be 
timezone aware
 Key: ARROW-12980
 URL: https://issues.apache.org/jira/browse/ARROW-12980
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Rok Mihevc
Assignee: Rok Mihevc


As a follow-up to ARROW-11759, datetime component extraction kernels should 
return localized time components if the timezone property is present.





[jira] [Created] (ARROW-12820) [C++] Strptime ignores timezone information

2021-05-18 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-12820:
--

 Summary: [C++] Strptime ignores timezone information
 Key: ARROW-12820
 URL: https://issues.apache.org/jira/browse/ARROW-12820
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Rok Mihevc


ParseTimestampStrptime currently ignores the timezone information. So 
timestamps are read as if they were all in UTC. This can be unexpected. See 
[discussion|https://github.com/apache/arrow/pull/10334#discussion_r634269138] 
for details.
It would be useful to either capture timezone information or convert timestamp 
to UTC when parsing it.





[jira] [Created] (ARROW-12499) [C++][Compute] Add ScalarAggregateOptions to Any and All kernels

2021-04-21 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-12499:
--

 Summary: [C++][Compute] Add ScalarAggregateOptions to Any and All 
kernels
 Key: ARROW-12499
 URL: https://issues.apache.org/jira/browse/ARROW-12499
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Python, R
Reporter: Rok Mihevc
Assignee: Rok Mihevc


Follow up to ARROW-9054 and ARROW-12185 - see 
[comment|https://github.com/apache/arrow/pull/10032#pullrequestreview-641468079].





[jira] [Created] (ARROW-12301) [C++][Compute] Use generic hash-aggregate for DictionaryArrays

2021-04-08 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-12301:
--

 Summary: [C++][Compute] Use generic hash-aggregate for 
DictionaryArrays
 Key: ARROW-12301
 URL: https://issues.apache.org/jira/browse/ARROW-12301
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Rok Mihevc


When calculating unique for chunked DictionaryArrays we currently run through 
all chunks and unify their dictionaries and then collect chunk indices. We 
could avoid the dictionary unification by using a generic hash.

[See discussion here.|https://github.com/apache/arrow/pull/9683]
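One reading of the proposal, sketched with hypothetical stand-in types rather than Arrow's arrays: hash the decoded values directly while walking each chunk, so no unified dictionary is ever built. `DictChunk` and `UniqueValues` are illustrative names, and nulls are not modelled.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for one dictionary-encoded chunk: each index
// refers to an entry in that chunk's own dictionary.
struct DictChunk {
  std::vector<std::string> dictionary;
  std::vector<int> indices;
};

// Collect unique values in first-seen order by hashing the decoded values,
// without unifying the per-chunk dictionaries first.
std::vector<std::string> UniqueValues(const std::vector<DictChunk>& chunks) {
  std::unordered_set<std::string> seen;
  std::vector<std::string> result;
  for (const DictChunk& chunk : chunks) {
    for (int index : chunk.indices) {
      const std::string& value = chunk.dictionary[index];
      // insert().second is true only the first time a value is seen.
      if (seen.insert(value).second) result.push_back(value);
    }
  }
  return result;
}
```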





[jira] [Assigned] (ARROW-7297) [C++] Add value accessor in sparse tensor class

2020-04-24 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc reassigned ARROW-7297:
-

Assignee: Rok Mihevc

> [C++] Add value accessor in sparse tensor class
> ---
>
> Key: ARROW-7297
> URL: https://issues.apache.org/jira/browse/ARROW-7297
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Rok Mihevc
>Priority: Major
>
> {{SparseTensor}} can have value accessor like {{Tensor::Value}}.





[jira] [Updated] (ARROW-8162) [Format][Python] Add serialization for CSF sparse tensors

2020-04-02 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-8162:
--
Fix Version/s: (was: 1.0.0)
   0.17.0

> [Format][Python] Add serialization for CSF sparse tensors
> -
>
> Key: ARROW-8162
> URL: https://issues.apache.org/jira/browse/ARROW-8162
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format, Python
>Reporter: Rok Mihevc
>Assignee: Rok Mihevc
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Once [ARROW-7428|https://issues.apache.org/jira/browse/ARROW-7428] is 
> complete serialization for CSF sparse tensors should be enabled in Python too.





[jira] [Updated] (ARROW-8162) [Format][Python] Add serialization for CSF sparse tensors

2020-03-19 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-8162:
--
Description: Once 
[ARROW-7428|https://issues.apache.org/jira/browse/ARROW-7428] is complete 
serialization for CSF sparse tensors should be enabled in Python too.  (was: 
Once [#ARROW-7428] is complete serialization for CSF sparse tensors should be 
enabled in Python too.)

> [Format][Python] Add serialization for CSF sparse tensors
> -
>
> Key: ARROW-8162
> URL: https://issues.apache.org/jira/browse/ARROW-8162
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format, Python
>Reporter: Rok Mihevc
>Assignee: Rok Mihevc
>Priority: Minor
> Fix For: 1.0.0
>
>
> Once [ARROW-7428|https://issues.apache.org/jira/browse/ARROW-7428] is 
> complete serialization for CSF sparse tensors should be enabled in Python too.





[jira] [Created] (ARROW-8162) [Format][Python] Add serialization for CSF sparse tensors

2020-03-19 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-8162:
-

 Summary: [Format][Python] Add serialization for CSF sparse tensors
 Key: ARROW-8162
 URL: https://issues.apache.org/jira/browse/ARROW-8162
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Format, Python
Reporter: Rok Mihevc
Assignee: Rok Mihevc
 Fix For: 1.0.0


Once [#ARROW-7428] is complete serialization for CSF sparse tensors should be 
enabled in Python too.





[jira] [Updated] (ARROW-7428) [Format][C++] Add serialization for CSF sparse tensors

2020-03-07 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-7428:
--
Fix Version/s: 1.0.0

> [Format][C++] Add serialization for CSF sparse tensors
> --
>
> Key: ARROW-7428
> URL: https://issues.apache.org/jira/browse/ARROW-7428
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Format
>Reporter: Rok Mihevc
>Assignee: Rok Mihevc
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Once ARROW-4225 and ARROW-4226 are completed we should add serialization 
> support.





[jira] [Updated] (ARROW-7427) [Python] Support SparseCSFTensor

2020-03-07 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-7427:
--
Component/s: (was: C++)

> [Python] Support SparseCSFTensor
> 
>
> Key: ARROW-7427
> URL: https://issues.apache.org/jira/browse/ARROW-7427
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Format, Python
>Reporter: Rok Mihevc
>Assignee: Rok Mihevc
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Once ARROW-4226 is complete we want to be able to use sparse CSF tensors from 
> Python.





[jira] [Updated] (ARROW-7419) [Python] Support SparseCSCMatrix

2020-03-07 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-7419:
--
Component/s: (was: C++)

> [Python] Support SparseCSCMatrix
> 
>
> Key: ARROW-7419
> URL: https://issues.apache.org/jira/browse/ARROW-7419
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Format, Python
>Reporter: Kenta Murata
>Assignee: Rok Mihevc
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-7419) [Python] Support SparseCSCMatrix

2020-03-07 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-7419:
--
Fix Version/s: 1.0.0

> [Python] Support SparseCSCMatrix
> 
>
> Key: ARROW-7419
> URL: https://issues.apache.org/jira/browse/ARROW-7419
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Format, Python
>Reporter: Kenta Murata
>Assignee: Rok Mihevc
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-7428) [Format][C++] Add serialization for CSF sparse tensors

2020-03-07 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-7428:
--
Component/s: Format
 C++

> [Format][C++] Add serialization for CSF sparse tensors
> --
>
> Key: ARROW-7428
> URL: https://issues.apache.org/jira/browse/ARROW-7428
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Format
>Reporter: Rok Mihevc
>Assignee: Rok Mihevc
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Once ARROW-4225 and ARROW-4226 are completed we should add serialization 
> support.





[jira] [Updated] (ARROW-7419) [Python] Support SparseCSCMatrix

2020-03-07 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-7419:
--
Component/s: Format
 C++

> [Python] Support SparseCSCMatrix
> 
>
> Key: ARROW-7419
> URL: https://issues.apache.org/jira/browse/ARROW-7419
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Format, Python
>Reporter: Kenta Murata
>Assignee: Rok Mihevc
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7427) [Python] Support SparseCSFTensor

2020-03-07 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-7427:
--
Fix Version/s: 1.0.0

> [Python] Support SparseCSFTensor
> 
>
> Key: ARROW-7427
> URL: https://issues.apache.org/jira/browse/ARROW-7427
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Format, Python
>Reporter: Rok Mihevc
>Assignee: Rok Mihevc
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Once ARROW-4226 is complete we want to be able to use sparse CSF tensors from 
> Python.





[jira] [Updated] (ARROW-7427) [Python] Support SparseCSFTensor

2020-03-07 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-7427:
--
Component/s: Python
 Format
 C++

> [Python] Support SparseCSFTensor
> 
>
> Key: ARROW-7427
> URL: https://issues.apache.org/jira/browse/ARROW-7427
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Format, Python
>Reporter: Rok Mihevc
>Assignee: Rok Mihevc
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Once ARROW-4226 is complete we want to be able to use sparse CSF tensors from 
> Python.





[jira] [Updated] (ARROW-7427) [Python] Support SparseCSFTensor

2020-02-12 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-7427:
--
Description: Once ARROW-4226 is complete we want to be able to use sparse 
CSF tensors from Python.  (was: Once ARROW-4225 is complete we want to be able 
to use sparse CSF tensors from Python.)

> [Python] Support SparseCSFTensor
> 
>
> Key: ARROW-7427
> URL: https://issues.apache.org/jira/browse/ARROW-7427
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Rok Mihevc
>Assignee: Rok Mihevc
>Priority: Minor
>
> Once ARROW-4226 is complete we want to be able to use sparse CSF tensors from 
> Python.





[jira] [Updated] (ARROW-7428) [Format][C++] Add serialization for CSF sparse tensors

2020-01-25 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-7428:
--
Priority: Minor  (was: Major)

> [Format][C++] Add serialization for CSF sparse tensors
> --
>
> Key: ARROW-7428
> URL: https://issues.apache.org/jira/browse/ARROW-7428
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Rok Mihevc
>Assignee: Rok Mihevc
>Priority: Minor
>
> Once ARROW-4225 and ARROW-4226 are completed we should add serialization 
> support.





[jira] [Updated] (ARROW-7427) [Python] Support SparseCSFTensor

2020-01-25 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-7427:
--
Priority: Minor  (was: Major)

> [Python] Support SparseCSFTensor
> 
>
> Key: ARROW-7427
> URL: https://issues.apache.org/jira/browse/ARROW-7427
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Rok Mihevc
>Assignee: Rok Mihevc
>Priority: Minor
>
> Once ARROW-4225 is complete we want to be able to use sparse CSF tensors from 
> Python.





[jira] [Updated] (ARROW-7419) [Python] Support SparseCSCMatrix

2020-01-25 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-7419:
--
Priority: Minor  (was: Major)

> [Python] Support SparseCSCMatrix
> 
>
> Key: ARROW-7419
> URL: https://issues.apache.org/jira/browse/ARROW-7419
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Kenta Murata
>Assignee: Rok Mihevc
>Priority: Minor
>






[jira] [Updated] (ARROW-7428) [Format][C++] Add serialization for CSF sparse tensors

2020-01-25 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-7428:
--
Summary: [Format][C++] Add serialization for CSF sparse tensors  (was: 
[Format][C++] Add serialization for CSF and CSC sparse tensors)

> [Format][C++] Add serialization for CSF sparse tensors
> --
>
> Key: ARROW-7428
> URL: https://issues.apache.org/jira/browse/ARROW-7428
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Rok Mihevc
>Assignee: Rok Mihevc
>Priority: Major
>
> Once ARROW-4225 and ARROW-4226 are completed we should add serialization 
> support.





[jira] [Updated] (ARROW-7427) [Python] Support SparseCSFTensor

2019-12-18 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-7427:
--
Description: Once ARROW-4225 is complete we want to be able to use sparse 
CSF tensors from Python.  (was: Once ARROW-4225 is complete we want to be able 
to use it from Python.)

> [Python] Support SparseCSFTensor
> 
>
> Key: ARROW-7427
> URL: https://issues.apache.org/jira/browse/ARROW-7427
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Rok Mihevc
>Assignee: Rok Mihevc
>Priority: Major
>
> Once ARROW-4225 is complete we want to be able to use sparse CSF tensors from 
> Python.





[jira] [Updated] (ARROW-7427) [Python] Support SparseCSFTensor

2019-12-18 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-7427:
--
Description: Once ARROW-4225 is complete we want to be able to use it from 
Python.

> [Python] Support SparseCSFTensor
> 
>
> Key: ARROW-7427
> URL: https://issues.apache.org/jira/browse/ARROW-7427
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Rok Mihevc
>Assignee: Rok Mihevc
>Priority: Major
>
> Once ARROW-4225 is complete we want to be able to use it from Python.





[jira] [Updated] (ARROW-7427) [Python] Support SparseCSFTensor

2019-12-18 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-7427:
--
Summary: [Python] Support SparseCSFTensor  (was: [Python] Support 
SparseCSFMatrix)

> [Python] Support SparseCSFTensor
> 
>
> Key: ARROW-7427
> URL: https://issues.apache.org/jira/browse/ARROW-7427
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Rok Mihevc
>Assignee: Rok Mihevc
>Priority: Major
>






[jira] [Updated] (ARROW-7428) [Format][C++] Add serialization for CSF and CSC sparse tensors

2019-12-18 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-7428:
--
Summary: [Format][C++] Add serialization for CSF and CSC sparse tensors  
(was: [C++] Add serialization for CSF and CSC sparse tensors)

> [Format][C++] Add serialization for CSF and CSC sparse tensors
> --
>
> Key: ARROW-7428
> URL: https://issues.apache.org/jira/browse/ARROW-7428
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Rok Mihevc
>Assignee: Rok Mihevc
>Priority: Major
>
> Once ARROW-4225 and ARROW-4226  are completed we should add serialization 
> support.





[jira] [Created] (ARROW-7428) [C++] Add serialization for CSF and CSC sparse tensors

2019-12-18 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-7428:
-

 Summary: [C++] Add serialization for CSF and CSC sparse tensors
 Key: ARROW-7428
 URL: https://issues.apache.org/jira/browse/ARROW-7428
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Rok Mihevc


Once ARROW-4225 and ARROW-4226  are completed we should add serialization 
support.





[jira] [Assigned] (ARROW-7428) [C++] Add serialization for CSF and CSC sparse tensors

2019-12-18 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc reassigned ARROW-7428:
-

Assignee: Rok Mihevc

> [C++] Add serialization for CSF and CSC sparse tensors
> --
>
> Key: ARROW-7428
> URL: https://issues.apache.org/jira/browse/ARROW-7428
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Rok Mihevc
>Assignee: Rok Mihevc
>Priority: Major
>
> Once ARROW-4225 and ARROW-4226  are completed we should add serialization 
> support.





[jira] [Created] (ARROW-7427) [Python] Support SparseCSFMatrix

2019-12-18 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-7427:
-

 Summary: [Python] Support SparseCSFMatrix
 Key: ARROW-7427
 URL: https://issues.apache.org/jira/browse/ARROW-7427
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Rok Mihevc
Assignee: Rok Mihevc








[jira] [Assigned] (ARROW-7419) [Python] Support SparseCSCMatrix

2019-12-18 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc reassigned ARROW-7419:
-

Assignee: Rok Mihevc

> [Python] Support SparseCSCMatrix
> 
>
> Key: ARROW-7419
> URL: https://issues.apache.org/jira/browse/ARROW-7419
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Kenta Murata
>Assignee: Rok Mihevc
>Priority: Major
>






[jira] [Closed] (ARROW-5805) [Python] Dockerize (add to docker-compose) Python Travis CI job

2019-11-13 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc closed ARROW-5805.
-
Resolution: Won't Do

> [Python] Dockerize (add to docker-compose) Python Travis CI job
> ---
>
> Key: ARROW-5805
> URL: https://issues.apache.org/jira/browse/ARROW-5805
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Rok Mihevc
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> https://github.com/apache/arrow/blob/master/.travis.yml#L118
> The existing Python Dockerfiles should be expanded to test all of the things 
> that are being tested currently in Travis CI.





[jira] [Commented] (ARROW-5805) [Python] Dockerize (add to docker-compose) Python Travis CI job

2019-11-13 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973129#comment-16973129
 ] 

Rok Mihevc commented on ARROW-5805:
---

This is now solved with Github Actions.

> [Python] Dockerize (add to docker-compose) Python Travis CI job
> ---
>
> Key: ARROW-5805
> URL: https://issues.apache.org/jira/browse/ARROW-5805
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Rok Mihevc
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> https://github.com/apache/arrow/blob/master/.travis.yml#L118
> The existing Python Dockerfiles should be expanded to test all of the things 
> that are being tested currently in Travis CI.





[jira] [Updated] (ARROW-4224) [Python] Support integration with pydata/sparse library

2019-11-13 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4224:
--
Fix Version/s: 1.0.0

> [Python] Support integration with pydata/sparse library
> ---
>
> Key: ARROW-4224
> URL: https://issues.apache.org/jira/browse/ARROW-4224
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Kenta Murata
>Assignee: Rok Mihevc
>Priority: Minor
>  Labels: pull-request-available, sparse
> Fix For: 1.0.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> It would be great to support integration with pydata/sparse library.





[jira] [Updated] (ARROW-4223) [Python] Support scipy.sparse integration

2019-10-28 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4223:
--
Fix Version/s: 1.0.0

> [Python] Support scipy.sparse integration
> -
>
> Key: ARROW-4223
> URL: https://issues.apache.org/jira/browse/ARROW-4223
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Kenta Murata
>Assignee: Rok Mihevc
>Priority: Minor
>  Labels: pull-request-available, sparse
> Fix For: 1.0.0
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> It would be great to support integration with scipy.sparse.





[jira] [Commented] (ARROW-4226) [C++] Add CSF sparse tensor support

2019-10-02 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942607#comment-16942607
 ] 

Rok Mihevc commented on ARROW-4226:
---

I was just reading it :)
I'll start working on this sometime this week.

> [C++] Add CSF sparse tensor support
> ---
>
> Key: ARROW-4226
> URL: https://issues.apache.org/jira/browse/ARROW-4226
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Rok Mihevc
>Priority: Minor
>  Labels: sparse
> Fix For: 1.0.0
>
>
> [https://github.com/apache/arrow/pull/2546#pullrequestreview-156064172]
> {quote}Perhaps in the future, if zero-copy and future-proof-ness is really 
> what we want, we might want to add the CSF (compressed sparse fiber) format, 
> a generalisation of CSR/CSC. I'm currently working on adding it to 
> PyData/Sparse, and I plan to make it the preferred format (COO will still be 
> around though).
> {quote}





[jira] [Assigned] (ARROW-4226) [C++] Add CSF sparse tensor support

2019-10-02 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc reassigned ARROW-4226:
-

Assignee: Rok Mihevc  (was: Kenta Murata)

> [C++] Add CSF sparse tensor support
> ---
>
> Key: ARROW-4226
> URL: https://issues.apache.org/jira/browse/ARROW-4226
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Rok Mihevc
>Priority: Minor
>  Labels: sparse
> Fix For: 1.0.0
>
>
> [https://github.com/apache/arrow/pull/2546#pullrequestreview-156064172]
> {quote}Perhaps in the future, if zero-copy and future-proof-ness is really 
> what we want, we might want to add the CSF (compressed sparse fiber) format, 
> a generalisation of CSR/CSC. I'm currently working on adding it to 
> PyData/Sparse, and I plan to make it the preferred format (COO will still be 
> around though).
> {quote}





[jira] [Commented] (ARROW-4225) [C++] Add CSC sparse matrix support

2019-10-02 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942598#comment-16942598
 ] 

Rok Mihevc commented on ARROW-4225:
---

Ok, I'll check the paper and see if I can get somewhere. :)

> [C++] Add CSC sparse matrix support
> ---
>
> Key: ARROW-4225
> URL: https://issues.apache.org/jira/browse/ARROW-4225
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Minor
>  Labels: sparse
> Fix For: 1.0.0
>
>
> CSC sparse matrix is necessary for integration with existing sparse matrix 
> libraries (umfpack, superlu). 
> https://github.com/apache/arrow/pull/2546#issuecomment-422135645





[jira] [Commented] (ARROW-4225) [C++] Add CSC sparse matrix support

2019-10-02 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942568#comment-16942568
 ] 

Rok Mihevc commented on ARROW-4225:
---

That's great @mrkn! Did you also start on 
[CSF|https://issues.apache.org/jira/browse/ARROW-4226]? If not I'll pick it up.

> [C++] Add CSC sparse matrix support
> ---
>
> Key: ARROW-4225
> URL: https://issues.apache.org/jira/browse/ARROW-4225
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Minor
>  Labels: sparse
> Fix For: 1.0.0
>
>
> CSC sparse matrix is necessary for integration with existing sparse matrix 
> libraries (umfpack, superlu). 
> https://github.com/apache/arrow/pull/2546#issuecomment-422135645





[jira] [Assigned] (ARROW-4225) [C++] Add CSC sparse matrix support

2019-10-02 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc reassigned ARROW-4225:
-

Assignee: Rok Mihevc

> [C++] Add CSC sparse matrix support
> ---
>
> Key: ARROW-4225
> URL: https://issues.apache.org/jira/browse/ARROW-4225
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Rok Mihevc
>Priority: Minor
>  Labels: sparse
> Fix For: 1.0.0
>
>
> CSC sparse matrix is necessary for integration with existing sparse matrix 
> libraries (umfpack, superlu). 
> https://github.com/apache/arrow/pull/2546#issuecomment-422135645





[jira] [Commented] (ARROW-6671) [C++] Sparse tensor naming

2019-09-24 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936779#comment-16936779
 ] 

Rok Mihevc commented on ARROW-6671:
---

SparseCSRMatrix might be more misleading as it 'doesn't look like' a 
Tensor type. I think that is potentially more confusing than it being limited 
to 2D.

+1 for the consistent naming

> [C++] Sparse tensor naming
> --
>
> Key: ARROW-6671
> URL: https://issues.apache.org/jira/browse/ARROW-6671
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Kenta Murata
>Priority: Minor
>
> Currently there's {{SparseCOOIndex}} and {{SparseCSRIndex}}, but also 
> {{SparseTensorCOO}} and {{SparseTensorCSR}}.
> For consistency, it would be nice to rename the latter {{SparseCOOTensor}} 
> and {{SparseCSRTensor}}.
> Also, it's not obvious the {{SparseMatrixCSR}} alias is useful.





[jira] [Created] (ARROW-6624) [C++] Add SparseTensor.ToTensor() method

2019-09-19 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-6624:
-

 Summary: [C++] Add SparseTensor.ToTensor() method
 Key: ARROW-6624
 URL: https://issues.apache.org/jira/browse/ARROW-6624
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Rok Mihevc
Assignee: Rok Mihevc


We have functionality to convert (dense) tensors to sparse tensors, but not the 
other way around. Also [see 
discussion|https://github.com/apache/arrow/pull/4446#issuecomment-503792308].
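
As a rough illustration of the round trip described above, here is a minimal 
pure-Python sketch. It is not Arrow's implementation: the helper names 
dense_to_coo and coo_to_dense are hypothetical, COO is assumed as the sparse 
layout, and nested lists stand in for a dense tensor.

```python
from itertools import product


def dense_to_coo(dense, shape, fill=0):
    """Collect coordinates and values of all entries that differ from `fill`."""
    coords, values = [], []
    for idx in product(*(range(n) for n in shape)):
        item = dense
        for i in idx:  # walk the nested lists down to one element
            item = item[i]
        if item != fill:
            coords.append(idx)
            values.append(item)
    return coords, values


def coo_to_dense(shape, coords, values, fill=0):
    """Expand COO coordinates and values back into nested lists (row-major)."""
    lookup = {tuple(c): v for c, v in zip(coords, values)}

    def build(prefix, dims):
        if not dims:  # full index reached: stored value or the fill value
            return lookup.get(prefix, fill)
        return [build(prefix + (i,), dims[1:]) for i in range(dims[0])]

    return build((), tuple(shape))


dense = [[0, 5, 0], [7, 0, 0]]
coords, values = dense_to_coo(dense, (2, 3))
restored = coo_to_dense((2, 3), coords, values)
```

The second helper is the direction this issue asks for: every position not 
listed in the index is filled with zero.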





[jira] [Commented] (ARROW-4208) [CI/Python] Have automatized tests for S3

2019-09-14 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929826#comment-16929826
 ] 

Rok Mihevc commented on ARROW-4208:
---

With #[5200|https://github.com/apache/arrow/pull/5200] we get a MinIO server in 
pytest via a fixture. It will run in Docker images, Travis and AppVeyor.
So far we have one 
[test|https://github.com/apache/arrow/blob/59f1e148d5c0fa13b7964f85f13011532ff515ed/python/pyarrow/tests/test_parquet.py#L1797]
 in Python.

Do we want to add other tests now? Do we have regression examples?

> [CI/Python] Have automatized tests for S3
> -
>
> Key: ARROW-4208
> URL: https://issues.apache.org/jira/browse/ARROW-4208
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Python
>Reporter: Krisztian Szucs
>Assignee: Rok Mihevc
>Priority: Major
>  Labels: filesystem, pull-request-available, s3
> Fix For: 1.0.0
>
>
> Currently we don't run S3 integration tests regularly. 
> Possible solutions:
> - mock it within python/pytest
> - simply run the s3 tests with an S3 credential provided
> - create a hdfs-integration like docker-compose setup and run an S3 mock 
> server (e.g.: https://github.com/adobe/S3Mock, 
> https://github.com/jubos/fake-s3, https://github.com/gaul/s3proxy, 
> https://github.com/jserver/mock-s3)
> For more see discussion https://github.com/apache/arrow/pull/3286



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (ARROW-4208) [CI/Python] Have automatized tests for S3

2019-09-14 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4208:
--
Labels: filesystem pull-request-available s3  (was: filesystem s3)

> [CI/Python] Have automatized tests for S3
> -
>
> Key: ARROW-4208
> URL: https://issues.apache.org/jira/browse/ARROW-4208
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Python
>Reporter: Krisztian Szucs
>Assignee: Rok Mihevc
>Priority: Major
>  Labels: filesystem, pull-request-available, s3
> Fix For: 1.0.0
>
>
> Currently we don't run S3 integration tests regularly. 
> Possible solutions:
> - mock it within python/pytest
> - simply run the s3 tests with an S3 credential provided
> - create a hdfs-integration like docker-compose setup and run an S3 mock 
> server (e.g.: https://github.com/adobe/S3Mock, 
> https://github.com/jubos/fake-s3, https://github.com/gaul/s3proxy, 
> https://github.com/jserver/mock-s3)
> For more see discussion https://github.com/apache/arrow/pull/3286





[jira] [Commented] (ARROW-6358) [C++] FileSystem::DeleteDir should make it optional to delete the directory itself

2019-08-27 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917191#comment-16917191
 ] 

Rok Mihevc commented on ARROW-6358:
---

Got it. Thanks!

> [C++] FileSystem::DeleteDir should make it optional to delete the directory 
> itself
> --
>
> Key: ARROW-6358
> URL: https://issues.apache.org/jira/browse/ARROW-6358
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.14.1
>Reporter: Antoine Pitrou
>Priority: Major
>
> In some situations, it can be desirable to delete the entirety of a 
> directory's contents, but not the directory itself (e.g. when it's a S3 
> bucket). Perhaps we should add an option for that.





[jira] [Commented] (ARROW-6358) [C++] FileSystem::DeleteDir should make it optional to delete the directory itself

2019-08-27 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917142#comment-16917142
 ] 

Rok Mihevc commented on ARROW-6358:
---

Can you treat the bucket as the root of the filesystem?

> [C++] FileSystem::DeleteDir should make it optional to delete the directory 
> itself
> --
>
> Key: ARROW-6358
> URL: https://issues.apache.org/jira/browse/ARROW-6358
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.14.1
>Reporter: Antoine Pitrou
>Priority: Major
>
> In some situations, it can be desirable to delete the entirety of a 
> directory's contents, but not the directory itself (e.g. when it's a S3 
> bucket). Perhaps we should add an option for that.





[jira] [Commented] (ARROW-6358) [C++] FileSystem::DeleteDir should make it optional to delete the directory itself

2019-08-27 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916616#comment-16916616
 ] 

Rok Mihevc commented on ARROW-6358:
---

Ah yes, sorry, missed the bucket case.
As an occasional S3 user I would be surprised if Arrow deleted a bucket and not 
only its contents. But I can imagine it would be useful to have that option 
sometimes.

> [C++] FileSystem::DeleteDir should make it optional to delete the directory 
> itself
> --
>
> Key: ARROW-6358
> URL: https://issues.apache.org/jira/browse/ARROW-6358
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.14.1
>Reporter: Antoine Pitrou
>Priority: Major
>
> In some situations, it can be desirable to delete the entirety of a 
> directory's contents, but not the directory itself (e.g. when it's a S3 
> bucket). Perhaps we should add an option for that.





[jira] [Commented] (ARROW-6358) [C++] FileSystem::DeleteDir should make it optional to delete the directory itself

2019-08-26 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916134#comment-16916134
 ] 

Rok Mihevc commented on ARROW-6358:
---

AFAIK folders don't really exist in S3. Once objects inside a folder are delete 
the folder no longer exists. So deleting the contents and folder are the same 
thing
See: 
https://stackoverflow.com/questions/42442259/delete-a-folder-and-its-content-aws-s3-java
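
The point above can be sketched with a toy model in which a bucket is just a 
set of object keys and a "folder" is nothing more than a shared key prefix. 
This is a simplification under stated assumptions, not the S3 API: the helper 
names are hypothetical, and real buckets may also hold zero-byte marker 
objects created by some tools.

```python
def folder_exists(keys, prefix):
    """A 'folder' exists only while at least one key starts with its prefix."""
    return any(k.startswith(prefix) for k in keys)


def delete_folder_contents(keys, prefix):
    """Return the keys that remain after deleting every object under `prefix`."""
    return {k for k in keys if not k.startswith(prefix)}


bucket = {"data/a.parquet", "data/b.parquet", "logs/x.txt"}
remaining = delete_folder_contents(bucket, "data/")
```

After the call no key carries the data/ prefix any more, so in this model the 
folder is gone without a separate delete step. A bucket is different: it is a 
real resource that stays in place when it is emptied.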

> [C++] FileSystem::DeleteDir should make it optional to delete the directory 
> itself
> --
>
> Key: ARROW-6358
> URL: https://issues.apache.org/jira/browse/ARROW-6358
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.14.1
>Reporter: Antoine Pitrou
>Priority: Major
>
> In some situations, it can be desirable to delete the entirety of a 
> directory's contents, but not the directory itself (e.g. when it's a S3 
> bucket). Perhaps we should add an option for that.





[jira] [Comment Edited] (ARROW-6358) [C++] FileSystem::DeleteDir should make it optional to delete the directory itself

2019-08-26 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916134#comment-16916134
 ] 

Rok Mihevc edited comment on ARROW-6358 at 8/26/19 8:57 PM:


AFAIK folders don't really exist in S3. Once objects inside a folder are 
deleted the folder no longer exists. So deleting the contents and deleting the 
folder are the same thing.
See: 
https://stackoverflow.com/questions/42442259/delete-a-folder-and-its-content-aws-s3-java


was (Author: rokm):
AFAIK folders don't really exist in S3. Once objects inside a folder are delete 
the folder no longer exists. So deleting the contents and folder are the same 
thing
See: 
https://stackoverflow.com/questions/42442259/delete-a-folder-and-its-content-aws-s3-java

> [C++] FileSystem::DeleteDir should make it optional to delete the directory 
> itself
> --
>
> Key: ARROW-6358
> URL: https://issues.apache.org/jira/browse/ARROW-6358
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.14.1
>Reporter: Antoine Pitrou
>Priority: Major
>
> In some situations, it can be desirable to delete the entirety of a 
> directory's contents, but not the directory itself (e.g. when it's a S3 
> bucket). Perhaps we should add an option for that.





[jira] [Assigned] (ARROW-1456) [Python] Run s3fs unit tests in Travis CI

2019-08-23 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc reassigned ARROW-1456:
-

Assignee: Rok Mihevc

> [Python] Run s3fs unit tests in Travis CI
> -
>
> Key: ARROW-1456
> URL: https://issues.apache.org/jira/browse/ARROW-1456
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Rok Mihevc
>Priority: Major
>  Labels: filesystem
> Fix For: 1.0.0
>
>
> We'll need to set up an S3 bucket to write to with credentials that cannot 
> compromise anyone's AWS account. I've been testing locally with a user that I 
> set up but I wouldn't be comfortable checking in these credentials, even in 
> encrypted form, without more scrutiny





[jira] [Assigned] (ARROW-4208) [CI/Python] Have automatized tests for S3

2019-08-23 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc reassigned ARROW-4208:
-

Assignee: Rok Mihevc

> [CI/Python] Have automatized tests for S3
> -
>
> Key: ARROW-4208
> URL: https://issues.apache.org/jira/browse/ARROW-4208
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Python
>Reporter: Krisztian Szucs
>Assignee: Rok Mihevc
>Priority: Major
>  Labels: filesystem, s3
> Fix For: 1.0.0
>
>
> Currently we don't run S3 integration tests regularly. 
> Possible solutions:
> - mock it within python/pytest
> - simply run the s3 tests with an S3 credential provided
> - create a hdfs-integration like docker-compose setup and run an S3 mock 
> server (e.g.: https://github.com/adobe/S3Mock, 
> https://github.com/jubos/fake-s3, https://github.com/gaul/s3proxy, 
> https://github.com/jserver/mock-s3)
> For more see discussion https://github.com/apache/arrow/pull/3286




