[jira] [Created] (ARROW-15670) [C++/Python/Packaging] Update conda pinnings and enable GCS on Windows

2022-02-13 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-15670:


 Summary: [C++/Python/Packaging] Update conda pinnings and enable 
GCS on Windows
 Key: ARROW-15670
 URL: https://issues.apache.org/jira/browse/ARROW-15670
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Packaging, Python
Reporter: Uwe Korn
Assignee: Uwe Korn






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ARROW-15445) [C++/Python] pyarrow build incorrectly detects x86 as system processor during cross-compile

2022-01-25 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-15445:


 Summary: [C++/Python] pyarrow build incorrectly detects x86 as 
system processor during cross-compile
 Key: ARROW-15445
 URL: https://issues.apache.org/jira/browse/ARROW-15445
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Reporter: Uwe Korn


When cross-compiling {{pyarrow}} for aarch64 or ppc64le we run into the 
following issue:
{code:java}
-- System processor: x86_64
-- Performing Test CXX_SUPPORTS_SSE4_2
-- Performing Test CXX_SUPPORTS_SSE4_2 - Failed
-- Performing Test CXX_SUPPORTS_AVX2
-- Performing Test CXX_SUPPORTS_AVX2 - Failed
-- Performing Test CXX_SUPPORTS_AVX512
-- Performing Test CXX_SUPPORTS_AVX512 - Failed
-- Arrow build warning level: PRODUCTION
CMake Error at cmake_modules/SetupCxxFlags.cmake:456 (message):
  SSE4.2 required but compiler doesn't support it.
Call Stack (most recent call first):
  CMakeLists.txt:121 (include)


-- Configuring incomplete, errors occurred!
 {code}
The error is valid insofar as we are building for a target system that doesn't 
support SSE at all; the actual bug is that CMake detects {{x86_64}} as the 
system processor instead of the cross-compilation target.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ARROW-15444) [C++] Compilation with GCC 7.5 fails in aggregate_basic.cc

2022-01-25 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-15444:


 Summary: [C++] Compilation with GCC 7.5 fails in aggregate_basic.cc
 Key: ARROW-15444
 URL: https://issues.apache.org/jira/browse/ARROW-15444
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Uwe Korn


Building with GCC 7.5 currently fails with the following internal error. We 
need to support this GCC version for CUDA-enabled and PPC64LE builds on 
conda-forge. See also the updated conda recipe in 
https://github.com/apache/arrow/pull/11916
{code:java}
2022-01-24T14:18:48.2261185Z [182/405] Building CXX object 
src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/aggregate_basic.cc.o
2022-01-24T14:18:48.2261792Z FAILED: 
src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/aggregate_basic.cc.o 
2022-01-24T14:18:48.2268608Z 
/build/arrow-cpp-ext_1643033227908/_build_env/bin/powerpc64le-conda-linux-gnu-c++
 -DARROW_EXPORTING -DARROW_HDFS -DARROW_JEMALLOC 
-DARROW_JEMALLOC_INCLUDE_DIR="" -DARROW_MIMALLOC -DARROW_WITH_BACKTRACE 
-DARROW_WITH_BROTLI -DARROW_WITH_BZ2 -DARROW_WITH_LZ4 -DARROW_WITH_RE2 
-DARROW_WITH_SNAPPY -DARROW_WITH_TIMING_TESTS -DARROW_WITH_UTF8PROC 
-DARROW_WITH_ZLIB -DARROW_WITH_ZSTD -DURI_STATIC_BUILD 
-I/build/arrow-cpp-ext_1643033227908/work/cpp/build/src 
-I/build/arrow-cpp-ext_1643033227908/work/cpp/src 
-I/build/arrow-cpp-ext_1643033227908/work/cpp/src/generated -isystem 
/build/arrow-cpp-ext_1643033227908/work/cpp/thirdparty/flatbuffers/include 
-isystem 
/build/arrow-cpp-ext_1643033227908/work/cpp/build/jemalloc_ep-prefix/src 
-isystem 
/build/arrow-cpp-ext_1643033227908/work/cpp/build/mimalloc_ep/src/mimalloc_ep/include/mimalloc-1.7
 -isystem 
/build/arrow-cpp-ext_1643033227908/work/cpp/build/xsimd_ep/src/xsimd_ep-install/include
 -isystem /build/arrow-cpp-ext_1643033227908/work/cpp/thirdparty/hadoop/include 
-Wno-noexcept-type -fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 
-mcpu=power8 -mtune=power8 -ftree-vectorize -fPIC -fstack-protector-strong 
-fno-plt -O3 -pipe -isystem 
/build/arrow-cpp-ext_1643033227908/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/include
 
-fdebug-prefix-map=/build/arrow-cpp-ext_1643033227908/work=/usr/local/src/conda/arrow-cpp-7.0.0.dev553
 
-fdebug-prefix-map=/build/arrow-cpp-ext_1643033227908/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla=/usr/local/src/conda-prefix
 -fdiagnostics-color=always -fuse-ld=gold -O3 -DNDEBUG  -Wall 
-fno-semantic-interposition  -O3 -DNDEBUG -fPIC -std=c++1z -MD -MT 
src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/aggregate_basic.cc.o -MF 
src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/aggregate_basic.cc.o.d -o 
src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/aggregate_basic.cc.o -c 
/build/arrow-cpp-ext_1643033227908/work/cpp/src/arrow/compute/kernels/aggregate_basic.cc
2022-01-24T14:18:48.2273037Z In file included from 
/build/arrow-cpp-ext_1643033227908/work/cpp/src/arrow/compute/kernels/codegen_internal.h:46:0,
2022-01-24T14:18:48.2273811Z  from 
/build/arrow-cpp-ext_1643033227908/work/cpp/src/arrow/compute/kernels/util_internal.h:26,
2022-01-24T14:18:48.2274563Z  from 
/build/arrow-cpp-ext_1643033227908/work/cpp/src/arrow/compute/kernels/aggregate_internal.h:20,
2022-01-24T14:18:48.2275318Z  from 
/build/arrow-cpp-ext_1643033227908/work/cpp/src/arrow/compute/kernels/aggregate_basic_internal.h:24,
2022-01-24T14:18:48.2276088Z  from 
/build/arrow-cpp-ext_1643033227908/work/cpp/src/arrow/compute/kernels/aggregate_basic.cc:19:
2022-01-24T14:18:48.2277993Z 
/build/arrow-cpp-ext_1643033227908/work/cpp/src/arrow/compute/kernels/aggregate_internal.h:
 In instantiation of 'arrow::compute::internal::SumArray(const 
arrow::ArrayData&, ValueFunc&&):: [with ValueType = double; 
SumType = double; arrow::compute::SimdLevel::type SimdLevel = 
(arrow::compute::SimdLevel::type)0; ValueFunc = 
arrow::compute::internal::SumArray(const arrow::ArrayData&) [with ValueType = 
double; SumType = double; arrow::compute::SimdLevel::type SimdLevel = 
(arrow::compute::SimdLevel::type)0]::]':
2022-01-24T14:18:48.2281061Z 
/build/arrow-cpp-ext_1643033227908/work/cpp/src/arrow/compute/kernels/aggregate_internal.h:181:5:
   required from 'struct arrow::compute::internal::SumArray(const 
arrow::ArrayData&, ValueFunc&&) [with ValueType = double; SumType = double; 
arrow::compute::SimdLevel::type SimdLevel = (arrow::compute::SimdLevel::type)0; 
ValueFunc = arrow::compute::internal::SumArray(const arrow::ArrayData&) [with 
ValueType = double; SumType = 
{code}

[jira] [Created] (ARROW-13140) [C++/Python] Upgrade libthrift pin in the nightlies

2021-06-22 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-13140:


 Summary: [C++/Python] Upgrade libthrift pin in the nightlies
 Key: ARROW-13140
 URL: https://issues.apache.org/jira/browse/ARROW-13140
 Project: Apache Arrow
  Issue Type: Task
  Components: C++, Packaging, Python
Reporter: Uwe Korn
Assignee: Uwe Korn






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-12649) [Python/Packaging] Move conda-aarch64 to Azure with cross-compilation

2021-05-04 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-12649:


 Summary: [Python/Packaging] Move conda-aarch64 to Azure with 
cross-compilation
 Key: ARROW-12649
 URL: https://issues.apache.org/jira/browse/ARROW-12649
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging, Python
Reporter: Uwe Korn
Assignee: Uwe Korn
 Fix For: 5.0.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-12420) [C++/Dataset] Reading null columns as dictionary no longer possible

2021-04-16 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-12420:


 Summary: [C++/Dataset] Reading null columns as dictionary no 
longer possible
 Key: ARROW-12420
 URL: https://issues.apache.org/jira/browse/ARROW-12420
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 4.0.0
Reporter: Uwe Korn
 Fix For: 4.0.0


Reading a dataset with a dictionary column where some of the files don't 
contain any data for that column (and thus are typed as null) broke with 
https://github.com/apache/arrow/pull/9532. It worked with the 3.0 release 
though and thus I would consider this a regression.

This can be reproduced using the following Python snippet:

{code}
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds
import pyarrow.fs

table = pa.table({"a": [None, None]})
pq.write_table(table, "test.parquet")
schema = pa.schema([pa.field("a", pa.dictionary(pa.int32(), pa.string()))])
fsds = ds.FileSystemDataset.from_paths(
    paths=["test.parquet"],
    schema=schema,
    format=pa.dataset.ParquetFileFormat(),
    filesystem=pa.fs.LocalFileSystem(),
)
fsds.to_table()
{code}

The exception on master is currently:

{code}
---
ArrowNotImplementedError  Traceback (most recent call last)
 in 
  6 filesystem=pa.fs.LocalFileSystem(),
  7 )
> 8 fsds.to_table()

~/Development/arrow/python/pyarrow/_dataset.pyx in 
pyarrow._dataset.Dataset.to_table()
456 table : Table instance
457 """
--> 458 return self._scanner(**kwargs).to_table()
459 
460 def head(self, int num_rows, **kwargs):

~/Development/arrow/python/pyarrow/_dataset.pyx in 
pyarrow._dataset.Scanner.to_table()
   2887 result = self.scanner.ToTable()
   2888 
-> 2889 return pyarrow_wrap_table(GetResultValue(result))
   2890 
   2891 def take(self, object indices):

~/Development/arrow/python/pyarrow/error.pxi in 
pyarrow.lib.pyarrow_internal_check_status()
139 cdef api int pyarrow_internal_check_status(const CStatus& status) \
140 nogil except -1:
--> 141 return check_status(status)
142 
143 

~/Development/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
116 raise ArrowKeyError(message)
117 elif status.IsNotImplemented():
--> 118 raise ArrowNotImplementedError(message)
119 elif status.IsTypeError():
120 raise ArrowTypeError(message)

ArrowNotImplementedError: Unsupported cast from null to 
dictionary (no available cast function 
for target type)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-12230) [C++/Python/Packaging] Move conda aarch64 builds to Azure Pipelines

2021-04-06 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-12230:


 Summary: [C++/Python/Packaging] Move conda aarch64 builds to Azure 
Pipelines
 Key: ARROW-12230
 URL: https://issues.apache.org/jira/browse/ARROW-12230
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Packaging, Python
Reporter: Uwe Korn


We should move the nightly conda builds for aarch64 to Azure Pipelines as they 
currently fail on drone due to the hard 1h timeout. On Azure Pipelines, they 
should work automatically thanks to conda-forge's cross-compilation setup. The 
necessary trick here is that the {{.ci_support}} files contain a 
{{target_platform}} line.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-11724) [C++] Namespace collisions with protobuf 3.15

2021-02-21 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-11724:


 Summary: [C++] Namespace collisions with protobuf 3.15
 Key: ARROW-11724
 URL: https://issues.apache.org/jira/browse/ARROW-11724
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, FlightRPC
Affects Versions: 3.0.0
Reporter: Uwe Korn
Assignee: Uwe Korn
 Fix For: 4.0.0


We define {{pb}} as a namespace alias in the Flight sources. This conflicts 
with {{protobuf}} 3.15, which starts to introduce {{pb}} as its own global 
namespace alias.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-11372) Support RC verification on macOS-ARM64

2021-01-25 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-11372:


 Summary: Support RC verification on macOS-ARM64
 Key: ARROW-11372
 URL: https://issues.apache.org/jira/browse/ARROW-11372
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Uwe Korn
Assignee: Uwe Korn
 Fix For: 3.0.0


The verification scripts make some assumptions that only hold on an x86 
system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-11198) [Packaging][Python] Ensure setuptools version during build supports markdown

2021-01-10 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-11198:


 Summary: [Packaging][Python] Ensure setuptools version during 
build supports markdown
 Key: ARROW-11198
 URL: https://issues.apache.org/jira/browse/ARROW-11198
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging, Python
Reporter: Uwe Korn
Assignee: Uwe Korn
 Fix For: 3.0.0


We use a {{text/markdown}} long description and thus should always build/upload 
with at least setuptools 38.6.
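
For reference, a minimal sketch of the relevant {{setup()}} call (the package 
name and version are placeholders); {{long_description_content_type}} is only 
honored from setuptools 38.6.0 onwards:

{code}
# Minimal sketch; name and version are placeholders.
from setuptools import setup

setup(
    name="example-package",
    version="0.1.0",
    long_description=open("README.md").read(),
    # Requires setuptools >= 38.6.0; older versions ignore this
    # keyword and PyPI then falls back to plain-text rendering.
    long_description_content_type="text/markdown",
)
{code}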



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-11127) [C++] Unused cpu_info on non-x86 architecture

2021-01-04 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-11127:


 Summary: [C++] Unused cpu_info on non-x86 architecture
 Key: ARROW-11127
 URL: https://issues.apache.org/jira/browse/ARROW-11127
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 2.0.0
Reporter: Uwe Korn
Assignee: Uwe Korn






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10881) [C++] EXC_BAD_ACCESS in BaseSetBitRunReader::NextRun

2020-12-11 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-10881:


 Summary: [C++] EXC_BAD_ACCESS in 
BaseSetBitRunReader::NextRun
 Key: ARROW-10881
 URL: https://issues.apache.org/jira/browse/ARROW-10881
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Affects Versions: 2.0.0
Reporter: Uwe Korn


{{./release/parquet-encoding-benchmark}} fails with

{code}
BM_PlainDecodingFloat/65536  4206 ns 4206 
ns   167354 bytes_per_second=58.0474G/s
error: libparquet.300.dylib debug map object file 
'/Users/uwe/Development/arrow/cpp/build/src/parquet/CMakeFiles/parquet_objlib.dir/encoding.cc.o'
 has changed (actual time is 2020-12-10 22:57:29.0, debug map time is 
2020-12-10 21:02:52.0) since this executable was linked, file will be 
ignored
Process 11120 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS 
(code=1, address=0x0)
frame #0: 0x00010047fe04 
libparquet.300.dylib`arrow::internal::BaseSetBitRunReader::NextRun() + 
192
libparquet.300.dylib`arrow::internal::BaseSetBitRunReader::NextRun:
->  0x10047fe04 <+192>: ldur   x11, [x9, #-0x8]
0x10047fe08 <+196>: str    x9, [x19]
0x10047fe0c <+200>: str    x11, [x19, #0x18]
0x10047fe10 <+204>: rbit   x10, x11
Target 0: (parquet-encoding-benchmark) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS 
(code=1, address=0x0)
  * frame #0: 0x00010047fe04 
libparquet.300.dylib`arrow::internal::BaseSetBitRunReader::NextRun() + 
192
frame #1: 0x00010047f808 libparquet.300.dylib`parquet::(anonymous 
namespace)::PlainEncoder 
>::PutSpaced(bool const*, int, unsigned char const*, long long) + 336
frame #2: 0x00018970 
parquet-encoding-benchmark`parquet::BM_PlainEncodingSpacedBoolean(benchmark::State&)
 at encoding_benchmark.cc:249:14 [opt]
frame #3: 0x0001881c 
parquet-encoding-benchmark`parquet::BM_PlainEncodingSpacedBoolean(state=0x00016fdfd4b8)
 at encoding_benchmark.cc:257 [opt]
frame #4: 0x0001001614f4 
libbenchmark.0.dylib`benchmark::internal::BenchmarkInstance::Run(unsigned long 
long, int, benchmark::internal::ThreadTimer*, 
benchmark::internal::ThreadManager*) const + 68
frame #5: 0x000100173ae8 
libbenchmark.0.dylib`benchmark::internal::(anonymous 
namespace)::RunInThread(benchmark::internal::BenchmarkInstance const*, unsigned 
long long, int, benchmark::internal::ThreadManager*) + 80
frame #6: 0x0001001723c8 
libbenchmark.0.dylib`benchmark::internal::RunBenchmark(benchmark::internal::BenchmarkInstance
 const&, std::__1::vector >*) + 1284
frame #7: 0x00010015ee7c 
libbenchmark.0.dylib`benchmark::RunSpecifiedBenchmarks(benchmark::BenchmarkReporter*,
 benchmark::BenchmarkReporter*) + 1824
frame #8: 0x00010014beec libbenchmark_main.0.dylib`main + 76
frame #9: 0x00019e270f54 libdyld.dylib`start + 4
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10873) [C++] Apple Silicon is reported as arm64 in CMake

2020-12-10 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-10873:


 Summary: [C++] Apple Silicon is reported as arm64 in CMake
 Key: ARROW-10873
 URL: https://issues.apache.org/jira/browse/ARROW-10873
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Affects Versions: 2.0.0
Reporter: Uwe Korn
Assignee: Uwe Korn
 Fix For: 2.0.1, 3.0.0


Currently we try to build with AVX2 on this platform, which raises a lot of 
errors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10861) [Python] Update minimal NumPy version to 1.16.6

2020-12-09 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-10861:


 Summary: [Python] Update minimal NumPy version to 1.16.6
 Key: ARROW-10861
 URL: https://issues.apache.org/jira/browse/ARROW-10861
 Project: Apache Arrow
  Issue Type: Task
  Components: Python
Affects Versions: 2.0.0
Reporter: Uwe Korn
Assignee: Uwe Korn
 Fix For: 2.0.1, 3.0.0


As part of the mitigation of https://github.com/numpy/numpy/issues/17913



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10853) [Java] Undeprecate sqlToArrow helpers

2020-12-08 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-10853:


 Summary: [Java] Undeprecate sqlToArrow helpers
 Key: ARROW-10853
 URL: https://issues.apache.org/jira/browse/ARROW-10853
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Affects Versions: 2.0.0
Reporter: Uwe Korn
Assignee: Uwe Korn
 Fix For: 3.0.0


These helper functions are really useful when called from Python as they deal 
with a lot of "internals" of Java that we don't want to handle from the Python 
side. We would rather keep using these functions.

Note that some of them are broken due to recent refactoring and only return 
1024 rows (the default iterator size) without the ability to change that.
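
For illustration, a rough sketch of how this path is typically driven from 
Python via {{jpype}} and {{pyarrow.jvm}} (the jar names, JDBC URL, and query 
below are hypothetical):

{code}
import jpype
import jpype.imports

# Hypothetical jar names; the Arrow Java JDBC adapter and a JDBC
# driver need to be on the class path.
jpype.startJVM(classpath=["arrow-jdbc.jar", "driver.jar"])

from java.sql import DriverManager
from org.apache.arrow.adapter.jdbc import JdbcToArrow
from org.apache.arrow.memory import RootAllocator

import pyarrow.jvm

allocator = RootAllocator(2 ** 30)
connection = DriverManager.getConnection("jdbc:drill:drillbit=localhost")
root = JdbcToArrow.sqlToArrow(connection, "SELECT * FROM some_table",
                              allocator)
# Zero-copy view of the Java vectors as a pyarrow.RecordBatch.
batch = pyarrow.jvm.record_batch(root)
{code}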



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10833) [Python] Avoid usage of NumPy's PyArray_DescrCheck macro

2020-12-07 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-10833:


 Summary: [Python] Avoid usage of NumPy's PyArray_DescrCheck macro
 Key: ARROW-10833
 URL: https://issues.apache.org/jira/browse/ARROW-10833
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 2.0.0
Reporter: Uwe Korn
 Fix For: 3.0.0, 2.0.1


The macro is faulty in old NumPy versions, and this will lead to a lot of 
issues with the upcoming NumPy 1.20 release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10711) [CI] Remove set-env from auto-tune to work with new GHA settings

2020-11-24 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-10711:


 Summary: [CI] Remove set-env from auto-tune to work with new GHA 
settings
 Key: ARROW-10711
 URL: https://issues.apache.org/jira/browse/ARROW-10711
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, Developer Tools
Reporter: Uwe Korn
Assignee: Uwe Korn


See 
https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10616) [Developer] Expand PR labeler to R and Python

2020-11-16 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-10616:


 Summary: [Developer] Expand PR labeler to R and Python
 Key: ARROW-10616
 URL: https://issues.apache.org/jira/browse/ARROW-10616
 Project: Apache Arrow
  Issue Type: Bug
  Components: Developer Tools
Reporter: Uwe Korn
Assignee: Uwe Korn


This would help me to browse through past PRs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10509) [C++] Define operator<<(ostream, ParquetException) for clang+Windows

2020-11-06 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-10509:


 Summary: [C++] Define operator<<(ostream, ParquetException) for 
clang+Windows
 Key: ARROW-10509
 URL: https://issues.apache.org/jira/browse/ARROW-10509
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Uwe Korn
Assignee: Uwe Korn
 Fix For: 3.0.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10502) [C++/Python] CUDA detection messes up nightly conda-win builds

2020-11-05 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-10502:


 Summary: [C++/Python] CUDA detection messes up nightly conda-win 
builds
 Key: ARROW-10502
 URL: https://issues.apache.org/jira/browse/ARROW-10502
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Packaging, Python
Reporter: Uwe Korn
Assignee: Uwe Korn






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10346) [Python] Default S3 region is eu-central-1 even with LANG=C

2020-10-19 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-10346:


 Summary: [Python] Default S3 region is eu-central-1 even with 
LANG=C
 Key: ARROW-10346
 URL: https://issues.apache.org/jira/browse/ARROW-10346
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Uwe Korn


Verifying the macOS wheels using {{LANG=C 
dev/release/verify-release-candidate.sh wheels 2.0.0 2}} fails for me with

{code}
@pytest.mark.s3
def test_s3_real_aws():
# Exercise connection code with an AWS-backed S3 bucket.
# This is a minimal integration check for ARROW-9261 and similar issues.
from pyarrow.fs import S3FileSystem
fs = S3FileSystem(anonymous=True)
>   assert fs.region == 'us-east-1'  # default region
E   AssertionError: assert 'eu-central-1' == 'us-east-1'
E - us-east-1
E + eu-central-1
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10302) [Python] Don't double-package plasma-store-server

2020-10-13 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-10302:


 Summary: [Python] Don't double-package plasma-store-server
 Key: ARROW-10302
 URL: https://issues.apache.org/jira/browse/ARROW-10302
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging, Python
Reporter: Uwe Korn
Assignee: Uwe Korn
 Fix For: 3.0.0


The binary is part of both the {{arrow-cpp}} and {{pyarrow}} conda packages. 
We shouldn't ship the copy in {{pyarrow}}, as it is just the same binary in a 
different location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10253) [Python] Don't bundle plasma-store-server in pyarrow conda package

2020-10-09 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-10253:


 Summary: [Python] Don't bundle plasma-store-server in pyarrow 
conda package
 Key: ARROW-10253
 URL: https://issues.apache.org/jira/browse/ARROW-10253
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging, Python
Reporter: Uwe Korn
Assignee: Uwe Korn


We currently have it in both the {{arrow-cpp}} and the {{pyarrow}} conda 
packages. We should only have it in {{arrow-cpp}}, as that package is always 
present and is also the source of the binary.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10252) [Python] Add option to skip inclusion of Arrow headers in Python installation

2020-10-09 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-10252:


 Summary: [Python] Add option to skip inclusion of Arrow headers in 
Python installation
 Key: ARROW-10252
 URL: https://issues.apache.org/jira/browse/ARROW-10252
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging, Python
Reporter: Uwe Korn
Assignee: Uwe Korn


We don't want to have the headers as part of the conda package, as the single 
source should be {{arrow-cpp}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10104) [Python] Separate tests into their own conda package

2020-09-26 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-10104:


 Summary: [Python] Separate tests into their own conda package
 Key: ARROW-10104
 URL: https://issues.apache.org/jira/browse/ARROW-10104
 Project: Apache Arrow
  Issue Type: Bug
  Components: Packaging, Python
Reporter: Uwe Korn
Assignee: Uwe Korn


We currently ship the tests with the source code. While this is nice for 
testing the integrity of an installation, they are not needed at runtime. In 
the case of conda, the overhead of turning them into a separate package is 
small.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10081) [C++/Python] Fix bash syntax in drone.io conda builds

2020-09-24 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-10081:


 Summary: [C++/Python] Fix bash syntax in drone.io conda builds
 Key: ARROW-10081
 URL: https://issues.apache.org/jira/browse/ARROW-10081
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Packaging, Python
Reporter: Uwe Korn
Assignee: Uwe Korn






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10049) [C++/Python] Sync conda recipe with conda-forge

2020-09-20 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-10049:


 Summary: [C++/Python] Sync conda recipe with conda-forge
 Key: ARROW-10049
 URL: https://issues.apache.org/jira/browse/ARROW-10049
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Packaging, Python
Reporter: Uwe Korn
Assignee: Uwe Korn






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10011) [C++] Make FindRE2.cmake re-entrant

2020-09-15 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-10011:


 Summary: [C++] Make FindRE2.cmake re-entrant
 Key: ARROW-10011
 URL: https://issues.apache.org/jira/browse/ARROW-10011
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, FlightRPC
Affects Versions: 1.0.1, 1.0.0
Reporter: Uwe Korn
Assignee: Uwe Korn


Repeated calls to FindRE2.cmake try to recreate the existing target 
{{RE2::re2}}, which is prohibited by CMake and fails with the following error:

{code}
CMake Warning (dev) at 
C:/Miniconda37-x64/envs/arrow/Library/share/cmake-3.17/Modules/FindPackageHandleStandardArgs.cmake:272
 (message):
  The package name passed to `find_package_handle_standard_args` (RE2) does
  not match the name of the calling package (re2).  This can lead to problems
  in calling code that expects `find_package` result variables (e.g.,
  `_FOUND`) to follow a certain pattern.
Call Stack (most recent call first):
  cmake_modules/FindRE2.cmake:63 (find_package_handle_standard_args)
  C:/Miniconda37-x64/envs/arrow/Library/lib/cmake/grpc/gRPCConfig.cmake:21 
(find_package)
  cmake_modules/ThirdpartyToolchain.cmake:2472 (find_package)
  CMakeLists.txt:495 (include)
This warning is for project developers.  Use -Wno-dev to suppress it.
CMake Error at cmake_modules/FindRE2.cmake:66 (add_library):
  add_library cannot create imported target "RE2::re2" because another target
  with the same name already exists.
Call Stack (most recent call first):
  C:/Miniconda37-x64/envs/arrow/Library/lib/cmake/grpc/gRPCConfig.cmake:21 
(find_package)
  cmake_modules/ThirdpartyToolchain.cmake:2472 (find_package)
  CMakeLists.txt:495 (include)
{code}

Note that this issue currently only occurs on case-insensitive file systems 
when ARROW_FLIGHT=ON is set.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9933) [Developer] Add drone as a CI provider for crossbow

2020-09-07 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9933:
---

 Summary: [Developer] Add drone as a CI provider for crossbow
 Key: ARROW-9933
 URL: https://issues.apache.org/jira/browse/ARROW-9933
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Uwe Korn
Assignee: Uwe Korn






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9929) [Developer] Autotune cmake-format

2020-09-07 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9929:
---

 Summary: [Developer] Autotune cmake-format
 Key: ARROW-9929
 URL: https://issues.apache.org/jira/browse/ARROW-9929
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Uwe Korn
Assignee: Uwe Korn






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9882) [C++/Python] Update conda-forge-pinning to 3 for OSX conda packages

2020-08-28 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9882:
---

 Summary: [C++/Python] Update conda-forge-pinning to 3 for OSX 
conda packages
 Key: ARROW-9882
 URL: https://issues.apache.org/jira/browse/ARROW-9882
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Packaging, Python
Reporter: Uwe Korn
Assignee: Uwe Korn






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9879) [Python] ChunkedArray.__getitem__ doesn't work with numpy scalars

2020-08-28 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9879:
---

 Summary: [Python] ChunkedArray.__getitem__ doesn't work with numpy 
scalars
 Key: ARROW-9879
 URL: https://issues.apache.org/jira/browse/ARROW-9879
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 1.0.0, 1.0.1
Reporter: Uwe Korn
Assignee: Uwe Korn
 Fix For: 2.0.0


 

{code}
import pyarrow as pa
import numpy as np

pa.chunked_array([pa.array([1, 2])])[np.int32(0)]
{code}

fails with the error {{TypeError: key must either be a slice or integer}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9589) [C++/R] arrow_exports.h contains structs declared as class

2020-07-28 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9589:
---

 Summary: [C++/R] arrow_exports.h contains structs declared as class
 Key: ARROW-9589
 URL: https://issues.apache.org/jira/browse/ARROW-9589
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Affects Versions: 1.0.0
Reporter: Uwe Korn
Assignee: Uwe Korn
 Fix For: 2.0.0


This is an issue in an MSVC-based toolchain.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9588) [C++] clang/win: Copy constructor of ParquetInvalidOrCorruptedFileException not correctly triggered

2020-07-28 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9588:
---

 Summary: [C++] clang/win: Copy constructor of 
ParquetInvalidOrCorruptedFileException not correctly triggered
 Key: ARROW-9588
 URL: https://issues.apache.org/jira/browse/ARROW-9588
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Uwe Korn


The copy constructor of ParquetInvalidOrCorruptedFileException doesn't seem to 
be picked up correctly when building with clang 9.0.1 on Windows in an MSVC 
toolchain.

Adding {{ParquetInvalidOrCorruptedFileException(const 
ParquetInvalidOrCorruptedFileException&) = default;}} as an explicit copy 
constructor didn't help.

Happy to hear any ideas here; probably a long shot, as there are other 
clang-msvc problems.

{code}
[49/62] Building CXX object 
src/parquet/CMakeFiles/parquet_shared.dir/Unity/unity_1_cxx.cxx.obj
FAILED: src/parquet/CMakeFiles/parquet_shared.dir/Unity/unity_1_cxx.cxx.obj
C:\Users\Administrator\miniconda3\conda-bld\arrow-cpp-ext_1595962790058\_build_env\Library\bin\clang++.exe
  -DARROW_HAVE_RUNTIME_AVX2 -DARROW_HAVE_RUNTIME_AVX512 
-DARROW_HAVE_RUNTIME_SSE4_2 -DARROW_HAVE_S
SE4_2 -DARROW_WITH_BROTLI -DARROW_WITH_BZ2 -DARROW_WITH_LZ4 -DARROW_WITH_SNAPPY 
-DARROW_WITH_TIMING_TESTS -DARROW_WITH_UTF8PROC -DARROW_WITH_ZLIB 
-DARROW_WITH_ZSTD -DAWS_COMMON_USE_IMPORT_EXPORT -DAWS_EVE
NT_STREAM_USE_IMPORT_EXPORT -DAWS_SDK_VERSION_MAJOR=1 -DAWS_SDK_VERSION_MINOR=7 
-DAWS_SDK_VERSION_PATCH=164 -DHAVE_INTTYPES_H -DHAVE_NETDB_H -DNOMINMAX 
-DPARQUET_EXPORTING -DUSE_IMPORT_EXPORT -DUSE_IMPORT
_EXPORT=1 -DUSE_WINDOWS_DLL_SEMANTICS -D_CRT_SECURE_NO_WARNINGS 
-Dparquet_shared_EXPORTS -Isrc -I../src -I../src/generated -isystem 
../thirdparty/flatbuffers/include -isystem C:/Users/Administrator/minico
nda3/conda-bld/arrow-cpp-ext_1595962790058/_h_env/Library/include -isystem 
../thirdparty/hadoop/include -fvisibility-inlines-hidden -std=c++14 
-fmessage-length=0 -march=k8 -mtune=haswell -ftree-vectorize
-fstack-protector-strong -O2 -ffunction-sections -pipe 
-D_CRT_SECURE_NO_WARNINGS -D_MT -D_DLL -nostdlib -Xclang --dependent-lib=msvcrt 
-fuse-ld=lld -fno-aligned-allocation -Qunused-arguments -fcolor-diagn
ostics -O3 -DNDEBUG  -Wa,-mbig-obj -Wall -Wno-unknown-warning-option 
-Wno-pass-failed -msse4.2  -O3 -DNDEBUG -D_DLL -D_MT -Xclang 
--dependent-lib=msvcrt   -std=c++14 -MD -MT src/parquet/CMakeFiles/parquet
_shared.dir/Unity/unity_1_cxx.cxx.obj -MF 
src\parquet\CMakeFiles\parquet_shared.dir\Unity\unity_1_cxx.cxx.obj.d -o 
src/parquet/CMakeFiles/parquet_shared.dir/Unity/unity_1_cxx.cxx.obj -c 
src/parquet/CMakeF
iles/parquet_shared.dir/Unity/unity_1_cxx.cxx
In file included from 
src/parquet/CMakeFiles/parquet_shared.dir/Unity/unity_1_cxx.cxx:3:
In file included from 
C:/Users/Administrator/miniconda3/conda-bld/arrow-cpp-ext_1595962790058/work/cpp/src/parquet/column_scanner.cc:18:
In file included from ../src\parquet/column_scanner.h:29:
In file included from ../src\parquet/column_reader.h:25:
In file included from ../src\parquet/exception.h:26:
In file included from ../src\parquet/platform.h:23:
In file included from ../src\arrow/buffer.h:28:
In file included from ../src\arrow/status.h:25:
../src\arrow/util/string_builder.h:49:10: error: invalid operands to binary 
expression ('std::ostream' (aka 'basic_ostream >') and 
'parquet::ParquetInvalidOrCorruptedFileException'
)
  stream << head;
  ~~ ^  
../src\arrow/util/string_builder.h:61:3: note: in instantiation of function 
template specialization 
'arrow::util::StringBuilderRecursive' requested here
  StringBuilderRecursive(ss.stream(), std::forward(args)...);
  ^
../src\arrow/status.h:160:31: note: in instantiation of function template 
specialization 
'arrow::util::StringBuilder' 
requested here
return Status(code, util::StringBuilder(std::forward(args)...));
  ^
../src\arrow/status.h:204:20: note: in instantiation of function template 
specialization 
'arrow::Status::FromArgs' 
requested here
return Status::FromArgs(StatusCode::Invalid, std::forward(args)...);
   ^
../src\parquet/exception.h:129:49: note: in instantiation of function template 
specialization 
'arrow::Status::Invalid' 
requested here
  : 
ParquetStatusException(::arrow::Status::Invalid(std::forward(args)...)) {}
^
C:/Users/Administrator/miniconda3/conda-bld/arrow-cpp-ext_1595962790058/work/cpp/src/parquet/file_reader.cc:270:13:
 note: in instantiation of function template specialization 
'parquet::ParquetInvalidOrCor
ruptedFileException::ParquetInvalidOrCorruptedFileException' requested here
  throw ParquetInvalidOrCorruptedFileException("Parquet file size is 0 
bytes");
^
C:\BuildTools\VC\Tools\MSVC\14.16.27023\include\ostream:480:36: note: candidate 
function not viable: no known conversion from 
'parquet::ParquetInvalidOrCorruptedFileException' to 'const void *' for 1st ar
gument; take the 
{code}

[jira] [Created] (ARROW-9560) [Packaging] conda recipes failing due to missing conda-forge.yml

2020-07-26 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9560:
---

 Summary: [Packaging] conda recipes failing due to missing 
conda-forge.yml
 Key: ARROW-9560
 URL: https://issues.apache.org/jira/browse/ARROW-9560
 Project: Apache Arrow
  Issue Type: Bug
  Components: Packaging
Reporter: Uwe Korn
Assignee: Uwe Korn
 Fix For: 2.0.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9535) [Python] Remove symlink fixes from conda recipe

2020-07-21 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9535:
---

 Summary: [Python] Remove symlink fixes from conda recipe
 Key: ARROW-9535
 URL: https://issues.apache.org/jira/browse/ARROW-9535
 Project: Apache Arrow
  Issue Type: Bug
  Components: Packaging, Python
Reporter: Uwe Korn
Assignee: Uwe Korn
 Fix For: 1.0.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9504) [C++/Python] Segmentation fault on ChunkedArray.take

2020-07-16 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9504:
---

 Summary: [C++/Python] Segmentation fault on ChunkedArray.take
 Key: ARROW-9504
 URL: https://issues.apache.org/jira/browse/ARROW-9504
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Uwe Korn
 Fix For: 1.0.0


This leads to a segmentation fault with the latest conda nightlies on Python 
3.8 / macOS:

{code}
import pyarrow as pa
import numpy as np

arr = pa.chunked_array([
  [
"m",
"J",
"q",
"k",
"t"
  ],
  [
"m",
"J",
"q",
"k",
"t"
  ]
])

indices = np.array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9])
arr.take(indices)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9489) [C++] Add fill_null kernel implementation for (array[string], scalar[string])

2020-07-15 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9489:
---

 Summary: [C++] Add fill_null kernel implementation for 
(array[string], scalar[string])
 Key: ARROW-9489
 URL: https://issues.apache.org/jira/browse/ARROW-9489
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Uwe Korn
 Fix For: 2.0.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9460) [C++] BinaryContainsExact doesn't cope with double characters in the pattern

2020-07-14 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9460:
---

 Summary: [C++] BinaryContainsExact doesn't cope with double 
characters in the pattern
 Key: ARROW-9460
 URL: https://issues.apache.org/jira/browse/ARROW-9460
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Uwe Korn
Assignee: Uwe Korn
 Fix For: 1.0.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9433) [C++/Python] Add option to Take kernel to interpret negative indices as NULL

2020-07-13 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9433:
---

 Summary: [C++/Python] Add option to Take kernel to interpret 
negative indices as NULL
 Key: ARROW-9433
 URL: https://issues.apache.org/jira/browse/ARROW-9433
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Python
Reporter: Uwe Korn
 Fix For: 2.0.0


Currently negative integers are explicitly forbidden in the {{Take}} kernel. 
It would be nice to have the option to treat negative integers as NULL instead.
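
A sketch of the desired behavior; {{negative_is_null}} is a hypothetical 
option name, not an existing API:

{code}
import pyarrow as pa
import pyarrow.compute as pc

arr = pa.array(["a", "b", "c"])
indices = pa.array([0, -1, 2])

# Today, taking with a negative index raises an error.
# With the proposed behavior (hypothetical option name):
# pc.take(arr, indices, negative_is_null=True)  ->  ["a", null, "c"]
{code}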



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9432) [C++/Python] Add option to Take kernel to interpret negative indices as indexing from the right

2020-07-13 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9432:
---

 Summary: [C++/Python] Add option to Take kernel to interpret 
negative indices as indexing from the right
 Key: ARROW-9432
 URL: https://issues.apache.org/jira/browse/ARROW-9432
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Python
Reporter: Uwe Korn
 Fix For: 2.0.0


Currently negative integers are explicitly forbidden in the {{Take}} kernel. 
It would be nice to have the option to treat negative integers as "indices 
from the right" instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9431) [C++/Python] Kernel for SetItem(IntegerArray, values)

2020-07-13 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9431:
---

 Summary: [C++/Python] Kernel for SetItem(IntegerArray, values)
 Key: ARROW-9431
 URL: https://issues.apache.org/jira/browse/ARROW-9431
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Python
Affects Versions: 2.0.0
Reporter: Uwe Korn


We should have a kernel that allows overriding the values of an array using an 
integer array as the indexer and a scalar or array of equal length as the 
values.
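
The proposed semantics mirror NumPy's integer fancy-index assignment; a small 
NumPy illustration of the intended behavior:

{code}
import numpy as np

# NumPy equivalent of the proposed kernel: override the values at the
# positions given by an integer indexer.
values = np.array([10, 20, 30, 40])
indexer = np.array([1, 3])
values[indexer] = [21, 41]
# values is now [10, 21, 30, 41]; a scalar on the right-hand side
# would broadcast to all indexed positions.
{code}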



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9430) [C++/Python] Kernel for SetItem(BooleanArray, values)

2020-07-13 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9430:
---

 Summary: [C++/Python] Kernel for SetItem(BooleanArray, values)
 Key: ARROW-9430
 URL: https://issues.apache.org/jira/browse/ARROW-9430
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Python
Reporter: Uwe Korn
 Fix For: 2.0.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9429) [Python] ChunkedArray.to_numpy

2020-07-13 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9429:
---

 Summary: [Python] ChunkedArray.to_numpy
 Key: ARROW-9429
 URL: https://issues.apache.org/jira/browse/ARROW-9429
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Uwe Korn
 Fix For: 2.0.0


Currently one needs to construct a {{pandas.Series}} and call {{values}} to 
get a NumPy array out of a {{ChunkedArray}}. We should provide a simpler 
{{to_numpy}} function that avoids the overhead of constructing the 
{{pandas.Series}}.
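
A sketch of the current workaround versus the proposed call ({{to_numpy}} on 
{{ChunkedArray}} is the proposal here, not yet an existing method):

{code}
import pyarrow as pa

chunked = pa.chunked_array([[1, 2], [3, 4]])

# Current workaround: build a pandas.Series only to access .values.
arr = chunked.to_pandas().values

# Proposed: a direct conversion without the pandas detour.
# arr = chunked.to_numpy()
{code}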



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9407) [Python] Accept pd.NA as missing value in array constructor

2020-07-10 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9407:
---

 Summary: [Python] Accept pd.NA as missing value in array 
constructor
 Key: ARROW-9407
 URL: https://issues.apache.org/jira/browse/ARROW-9407
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Python
Reporter: Uwe Korn
 Fix For: 2.0.0


Currently we don't support using {{pandas.NA}} at all:

{code}
In [1]: import pyarrow as pa

In [2]: import pandas as pd

In [3]: pa.array([pd.NA, "A"])
---
ArrowInvalid  Traceback (most recent call last)
 in 
> 1 pa.array([pd.NA, "A"])

~/miniconda3/envs/fletcher/lib/python3.8/site-packages/pyarrow/array.pxi in 
pyarrow.lib.array()

~/miniconda3/envs/fletcher/lib/python3.8/site-packages/pyarrow/array.pxi in 
pyarrow.lib._sequence_to_array()

~/miniconda3/envs/fletcher/lib/python3.8/site-packages/pyarrow/error.pxi in 
pyarrow.lib.check_status()

ArrowInvalid: Could not convert  with type NAType: did not recognize Python 
value type when inferring an Arrow data type
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9401) [C++/Python] Support necessary functionality to have an Arrow-string type in pandas

2020-07-10 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9401:
---

 Summary: [C++/Python] Support necessary functionality to have an 
Arrow-string type in pandas
 Key: ARROW-9401
 URL: https://issues.apache.org/jira/browse/ARROW-9401
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++, Python
Reporter: Uwe Korn
Assignee: Uwe Korn
 Fix For: 2.0.0


This should serve as an umbrella issue for the functionality needed to have an 
Apache Arrow-backed string type in {{pandas}}. In addition to the string 
kernels, we probably need to implement some more support functionality to 
efficiently support the {{pandas}} interfaces.

Some of these functions are already present in {{fletcher}} but a native string 
type in {{pandas}} should not have a hard dependency on {{numba}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9160) [C++] Implement string/binary contains for exact matches

2020-06-17 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9160:
---

 Summary: [C++] Implement string/binary contains for exact matches
 Key: ARROW-9160
 URL: https://issues.apache.org/jira/browse/ARROW-9160
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Uwe Korn
Assignee: Uwe Korn
 Fix For: 1.0.0


Implement {{contains}} for exact matches of substrings of a string. Using the 
Knuth–Morris–Pratt algorithm, we should be able to do this in linear runtime 
with a tiny bit of preprocessing at invocation time.
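
A minimal Python sketch of the Knuth–Morris–Pratt idea (the actual kernel 
would of course be implemented in C++; this just illustrates the prefix table 
and the single linear scan):

{code}
def kmp_contains(haystack: bytes, needle: bytes) -> bool:
    if not needle:
        return True
    # Preprocessing: for each prefix of the needle, the length of the
    # longest proper prefix that is also a suffix.
    table = [0] * len(needle)
    k = 0
    for i in range(1, len(needle)):
        while k > 0 and needle[i] != needle[k]:
            k = table[k - 1]
        if needle[i] == needle[k]:
            k += 1
        table[i] = k
    # Single linear scan over the haystack, never moving backwards.
    k = 0
    for c in haystack:
        while k > 0 and c != needle[k]:
            k = table[k - 1]
        if c == needle[k]:
            k += 1
        if k == len(needle):
            return True
    return False

assert kmp_contains(b"hello world", b"lo wo")
assert not kmp_contains(b"aaab", b"aaaa")
{code}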



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-9074) [GLib] Add missing arrow-json check

2020-06-09 Thread Uwe Korn (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Korn resolved ARROW-9074.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7381
[https://github.com/apache/arrow/pull/7381]

> [GLib] Add missing arrow-json check
> ---
>
> Key: ARROW-9074
> URL: https://issues.apache.org/jira/browse/ARROW-9074
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-9073) [C++] RapidJSON include directory detection doesn't work with RapidJSONConfig.cmake

2020-06-09 Thread Uwe Korn (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Korn resolved ARROW-9073.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7380
[https://github.com/apache/arrow/pull/7380]

> [C++] RapidJSON include directory detection doesn't work with 
> RapidJSONConfig.cmake
> ---
>
> Key: ARROW-9073
> URL: https://issues.apache.org/jira/browse/ARROW-9073
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7893) [Developer][GLib] Document GLib development workflow when using conda environment on GTK-based Linux systems

2020-06-09 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128898#comment-17128898
 ] 

Uwe Korn commented on ARROW-7893:
-

[~kou] Can you give me a pointer to the stage at which the library is loaded, 
i.e. where {{LD_LIBRARY_PATH}} comes into action? Then I can have a look at 
the conda packaging.

> [Developer][GLib] Document GLib development workflow when using conda 
> environment on GTK-based Linux systems
> 
>
> Key: ARROW-7893
> URL: https://issues.apache.org/jira/browse/ARROW-7893
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, GLib
>Reporter: Wes McKinney
>Assignee: Kouhei Sutou
>Priority: Major
>
> I periodically deal with annoying errors like:
> {code}
> checking for GLIB - version >= 2.32.4... 
> *** 'pkg-config --modversion glib-2.0' returned 2.58.3, but GLIB (2.56.4)
> *** was found! If pkg-config was correct, then it is best
> *** to remove the old version of GLib. You may also be able to fix the error
> *** by modifying your LD_LIBRARY_PATH enviroment variable, or by editing
> *** /etc/ld.so.conf. Make sure you have run ldconfig if that is
> *** required on your system.
> *** If pkg-config was wrong, set the environment variable PKG_CONFIG_PATH
> *** to point to the correct configuration files
> no
> configure: error: GLib isn't available
> make: *** No targets specified and no makefile found.  Stop.
> make: *** No rule to make target 'install'.  Stop.
> Traceback (most recent call last):
>   2: from /home/wesm/code/arrow/c_glib/test/run-test.rb:30:in `'
>   1: from /usr/lib/ruby/2.5.0/rubygems/core_ext/kernel_require.rb:59:in 
> `require'
> /usr/lib/ruby/2.5.0/rubygems/core_ext/kernel_require.rb:59:in `require': 
> cannot load such file -- gi (LoadError)
> {code}
> The problem is that I have one version of glib on my Linux system while 
> another in the activated conda environment, it seems that there is a conflict 
> even though {{$PKG_CONFIG_PATH}} is set to ignore system directories
> https://gist.github.com/wesm/e62bf4517468be78200e8dd6db0fc544



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9066) [Python] Raise correct error in isnull()

2020-06-08 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9066:
---

 Summary: [Python] Raise correct error in isnull()
 Key: ARROW-9066
 URL: https://issues.apache.org/jira/browse/ARROW-9066
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.17.1
Reporter: Uwe Korn
Assignee: Uwe Korn






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8961) [C++] Vendor utf8proc library

2020-06-07 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127531#comment-17127531
 ] 

Uwe Korn commented on ARROW-8961:
-

We should definitely run benchmarks, as the utf8proc issue tracker mentions 
that {{icu}} seems to be significantly faster than {{utf8proc}}. Still, 
{{icu}} is much fatter than {{utf8proc}}, and we probably need exactly the 
functionality that is part of {{utf8proc}}, not more.

> [C++] Vendor utf8proc library
> -
>
> Key: ARROW-8961
> URL: https://issues.apache.org/jira/browse/ARROW-8961
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> This is a minimal MIT-licensed library for UTF-8 data processing originally 
> developed for use in Julia
> https://github.com/JuliaStrings/utf8proc



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-2079) [Python][C++] Possibly use `_common_metadata` for schema if `_metadata` isn't available

2020-06-04 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125743#comment-17125743
 ] 

Uwe Korn commented on ARROW-2079:
-

For the datasets we write in {{kartothek}}, we only write 
{{_common_metadata}} (I think Apache Drill does the same). This is useful for 
having the schema of the whole dataset available, but writing the 
{{_metadata}} file with all information would be too expensive and, in the 
{{kartothek}} case, even useless.

> [Python][C++] Possibly use `_common_metadata` for schema if `_metadata` isn't 
> available
> ---
>
> Key: ARROW-2079
> URL: https://issues.apache.org/jira/browse/ARROW-2079
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Jim Crist
>Priority: Minor
>  Labels: dataset, dataset-parquet-read, parquet
>
> Currently pyarrow's parquet writer only writes `_common_metadata` and not 
> `_metadata`. From what I understand these are intended to contain the dataset 
> schema but not any row group information.
>  
> A few (possibly naive) questions:
>  
> 1. In the `__init__` for `ParquetDataset`, the following lines exist:
> {code:java}
> if self.metadata_path is not None:
> with self.fs.open(self.metadata_path) as f:
> self.common_metadata = ParquetFile(f).metadata
> else:
> self.common_metadata = None
> {code}
> I believe this should use `common_metadata_path` instead of `metadata_path`, 
> as the latter is never written by `pyarrow`, and is given by the `_metadata` 
> file instead of `_common_metadata` (as seemingly intended?).
>  
> 2. In `validate_schemas` I believe an option should exist for using the 
> schema from `_common_metadata` instead of `_metadata`, as pyarrow currently 
> only writes the former, and as far as I can tell `_common_metadata` does 
> include all the schema information needed.
>  
> Perhaps the logic in `validate_schemas` could be ported over to:
>  
> {code:java}
> if self.schema is not None:
> pass  # schema explicitly provided
> elif self.metadata is not None:
> self.schema = self.metadata.schema
> elif self.common_metadata is not None:
> self.schema = self.common_metadata.schema
> else:
> self.schema = self.pieces[0].get_metadata(open_file).schema{code}
> If these changes are valid, I'd be happy to submit a PR. It's not 100% clear 
> to me the difference between `_common_metadata` and `_metadata`, but I 
> believe the schema in both should be the same. Figured I'd open this for 
> discussion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9026) [C++/Python] Force package removal from arrow-nightlies conda repository

2020-06-03 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9026:
---

 Summary: [C++/Python] Force package removal from arrow-nightlies 
conda repository
 Key: ARROW-9026
 URL: https://issues.apache.org/jira/browse/ARROW-9026
 Project: Apache Arrow
  Issue Type: Bug
  Components: Packaging
Reporter: Uwe Korn
Assignee: Uwe Korn






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9024) [C++/Python] Install anaconda-client in conda-clean job

2020-06-03 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9024:
---

 Summary: [C++/Python] Install anaconda-client in conda-clean job
 Key: ARROW-9024
 URL: https://issues.apache.org/jira/browse/ARROW-9024
 Project: Apache Arrow
  Issue Type: Bug
  Components: Packaging
Reporter: Uwe Korn
Assignee: Uwe Korn






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9023) [C++] Use mimalloc conda package

2020-06-03 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9023:
---

 Summary: [C++] Use mimalloc conda package
 Key: ARROW-9023
 URL: https://issues.apache.org/jira/browse/ARROW-9023
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Packaging
Reporter: Uwe Korn
Assignee: Uwe Korn






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-4144) [Java] Arrow-to-JDBC

2020-06-02 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17123484#comment-17123484
 ] 

Uwe Korn commented on ARROW-4144:
-

Yes, the use case would be to write large {{pandas.DataFrames}} to a database 
layer that only has performant JDBC drivers. Personally, all my JDBC sources 
are read-only and thus I didn't write a WriteToJDBC function, but other people 
will also use these technologies with more access rights. I have used the 
"pyarrow -> Arrow Java -> JDBC" path successfully with Apache Drill and 
Denodo. I also heard that some people use it together with Amazon Athena, and 
here a performant INSERT might be interesting 
[https://docs.aws.amazon.com/athena/latest/ug/insert-into.html], as the JDBC 
driver currently seems to be the most performant option.

> [Java] Arrow-to-JDBC
> 
>
> Key: ARROW-4144
> URL: https://issues.apache.org/jira/browse/ARROW-4144
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Michael Pigott
>Assignee: Chen
>Priority: Major
>
> ARROW-1780 reads a query from a JDBC data source and converts the ResultSet 
> to an Arrow VectorSchemaRoot.  However, there is no built-in adapter for 
> writing an Arrow VectorSchemaRoot back to the database.
> ARROW-3966 adds JDBC field metadata:
>  * The Catalog Name
>  * The Table Name
>  * The Field Name
>  * The Field Type
> We can use this information to ask for the field information from the 
> database via the 
> [DatabaseMetaData|https://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html]
>  object.  We can then create INSERT or UPDATE statements based on the [list 
> of primary 
> keys|https://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getPrimaryKeys(java.lang.String,%20java.lang.String,%20java.lang.String)]
>  in the table:
>  * If the value in the VectorSchemaRoot corresponding to the primary key is 
> NULL, insert that record into the database.
>  * If the value in the VectorSchemaRoot corresponding to the primary key is 
> not NULL, update the existing record in the database.
> We can also perform the same data conversion in reverse based on the field 
> types queried from the database.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-8941) [C++/Python] arrow-nightlies conda repository is full

2020-05-30 Thread Uwe Korn (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Korn reassigned ARROW-8941:
---

Assignee: Uwe Korn

> [C++/Python] arrow-nightlies conda repository is full
> -
>
> Key: ARROW-8941
> URL: https://issues.apache.org/jira/browse/ARROW-8941
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Packaging, Python
>Reporter: Uwe Korn
>Assignee: Uwe Korn
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> You currently have 3 public packages and 0 packages that require to be 
> authenticated.
> Using 10.0 GB of 3.0 GB storage
>  
> We need a script to delete old packages, e.g. once a week?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-8984) [R] Revise install guides now that Windows conda package exists

2020-05-29 Thread Uwe Korn (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Korn resolved ARROW-8984.
-
Resolution: Fixed

Issue resolved by pull request 7303
[https://github.com/apache/arrow/pull/7303]

> [R] Revise install guides now that Windows conda package exists
> ---
>
> Key: ARROW-8984
> URL: https://issues.apache.org/jira/browse/ARROW-8984
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8961) [C++] Vendor utf8proc library

2020-05-27 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117942#comment-17117942
 ] 

Uwe Korn commented on ARROW-8961:
-

It's already there, named {{libutf8proc}}.

> [C++] Vendor utf8proc library
> -
>
> Key: ARROW-8961
> URL: https://issues.apache.org/jira/browse/ARROW-8961
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> This is a minimal MIT-licensed library for UTF-8 data processing originally 
> developed for use in Julia
> https://github.com/JuliaStrings/utf8proc



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8961) [C++] Vendor utf8proc library

2020-05-27 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117446#comment-17117446
 ] 

Uwe Korn commented on ARROW-8961:
-

For conda-forge and other distributions that can handle binary dependencies,
we want to use the system one. So we definitely need an
ARROW_USE_SYSTEM_UTF8PROC option if we vendor.

> [C++] Vendor utf8proc library
> -
>
> Key: ARROW-8961
> URL: https://issues.apache.org/jira/browse/ARROW-8961
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> This is a minimal MIT-licensed library for UTF-8 data processing originally 
> developed for use in Julia
> https://github.com/JuliaStrings/utf8proc



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8962) [C++] Linking failure with clang-4.0

2020-05-27 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-8962:
---

 Summary: [C++] Linking failure with clang-4.0
 Key: ARROW-8962
 URL: https://issues.apache.org/jira/browse/ARROW-8962
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Uwe Korn
Assignee: Uwe Korn


{code:java}
FAILED: release/arrow-file-to-stream
: && /Users/uwe/miniconda3/envs/pyarrow-dev/bin/ccache 
/Users/uwe/miniconda3/envs/pyarrow-dev/bin/x86_64-apple-darwin13.4.0-clang++  
-march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE 
-fstack-protector-strong -O2 -pipe -stdlib=libc++ -fvisibility-inlines-hidden 
-std=c++14 -fmessage-length=0 -Qunused-arguments -fcolor-diagnostics -O3 
-DNDEBUG  -Wall -Wno-unknown-warning-option -Wno-pass-failed -msse4.2  -O3 
-DNDEBUG -isysroot 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk
 -Wl,-search_paths_first -Wl,-headerpad_max_install_names -Wl,-pie 
-Wl,-headerpad_max_install_names -Wl,-dead_strip_dylibs 
src/arrow/ipc/CMakeFiles/arrow-file-to-stream.dir/file_to_stream.cc.o  -o 
release/arrow-file-to-stream  release/libarrow.a 
/usr/local/opt/openssl@1.1/lib/libssl.dylib 
/usr/local/opt/openssl@1.1/lib/libcrypto.dylib 
/Users/uwe/miniconda3/envs/pyarrow-dev/lib/libbrotlienc-static.a 
/Users/uwe/miniconda3/envs/pyarrow-dev/lib/libbrotlidec-static.a 
/Users/uwe/miniconda3/envs/pyarrow-dev/lib/libbrotlicommon-static.a 
/Users/uwe/miniconda3/envs/pyarrow-dev/lib/liblz4.dylib 
/Users/uwe/miniconda3/envs/pyarrow-dev/lib/libsnappy.1.1.7.dylib 
/Users/uwe/miniconda3/envs/pyarrow-dev/lib/libz.dylib 
/Users/uwe/miniconda3/envs/pyarrow-dev/lib/libzstd.dylib 
/Users/uwe/miniconda3/envs/pyarrow-dev/lib/liborc.a 
/Users/uwe/miniconda3/envs/pyarrow-dev/lib/libprotobuf.dylib 
jemalloc_ep-prefix/src/jemalloc_ep/dist//lib/libjemalloc_pic.a && :
Undefined symbols for architecture x86_64:
  "arrow::internal::(anonymous 
namespace)::StringToFloatConverterImpl::main_junk_value_", referenced from:
  arrow::internal::StringToFloat(char const*, unsigned long, float*) in 
libarrow.a(value_parsing.cc.o)
  arrow::internal::StringToFloat(char const*, unsigned long, double*) in 
libarrow.a(value_parsing.cc.o)
  "arrow::internal::(anonymous 
namespace)::StringToFloatConverterImpl::fallback_junk_value_", referenced from:
  arrow::internal::StringToFloat(char const*, unsigned long, float*) in 
libarrow.a(value_parsing.cc.o)
  arrow::internal::StringToFloat(char const*, unsigned long, double*) in 
libarrow.a(value_parsing.cc.o)
ld: symbol(s) not found for architecture x86_64
clang-4.0: error: linker command failed with exit code 1 (use -v to see 
invocation) {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8941) [C++/Python] arrow-nightlies conda repository is full

2020-05-26 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-8941:
---

 Summary: [C++/Python] arrow-nightlies conda repository is full
 Key: ARROW-8941
 URL: https://issues.apache.org/jira/browse/ARROW-8941
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Packaging, Python
Reporter: Uwe Korn


You currently have 3 public packages and 0 packages that require to be 
authenticated.
Using 10.0 GB of 3.0 GB storage

 

We need a script to delete old packages, e.g. once a week?
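
A minimal sketch of such a cleanup, assuming the {{anaconda-client}} CLI; the
channel name, version list, and the exact flags are assumptions, not a tested
recipe:
{code:java}
import subprocess

# Assumed list of outdated nightly versions; a real script would query
# anaconda.org for these instead of hard-coding them.
OLD_VERSIONS = ["4.0.0.dev100", "4.0.0.dev101"]

for version in OLD_VERSIONS:
    # 'anaconda remove' deletes a release from the channel; the --force
    # flag (assumed) skips the interactive confirmation.
    subprocess.run(
        ["anaconda", "remove", "--force",
         "arrow-nightlies/pyarrow/{}".format(version)],
        check=True,
    )
{code}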



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-8810) Append to parquet file?

2020-05-15 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108087#comment-17108087
 ] 

Uwe Korn edited comment on ARROW-8810 at 5/15/20, 8:47 AM:
---

Generally, you should treat Parquet files as immutable. If you want to change
their contents, it is almost always simpler and faster to rewrite them
completely or (much better) to just write a second file and treat a directory
of Parquet files as a single dataset (see the sketch below). This comes down
to two major properties:
 * Values in a Parquet file are encoded and compressed. Thus they don't adhere
to a fixed size per row/value; in some cases a column chunk of a million
values may be stored in just 64 bytes.
 * The metadata that contains all essential information, e.g. where row groups
start and what schema the data has, is stored at the end of the file (i.e. the
footer). Especially the last four bytes are needed as they indicate the start
position of the footer.

Technically, you could still write code that appends to an existing Parquet
file, but this has the following drawbacks:
 * Writing wouldn't be faster than writing to a second, separate file. It
would probably be even slower, as we need to deserialize the existing metadata
and serialize it again with only slight modifications.
 * Reading wouldn't be faster than reading from a second file, even when doing
it sequentially.
 * While appending to a Parquet file, the file would be unreadable.
 * If your process crashes during the write, all existing data in the Parquet
file would be lost.
 * It would give users the impression that you could efficiently insert rows
one by one into a file. With a columnar data format that can only leverage its
techniques on large chunks of rows, this would generate a massive overhead.

Still, if one were to implement this, it would work as follows:
 # Read in the footer/metadata of the existing file.
 # Seek to the start position of the existing footer and overwrite it with the
new data.
 # Merge (or rather concatenate) the existing metadata with the newly computed
metadata and write it at the end of the file.

If you look at how a completely fresh Parquet file is written, this is
identical except that we wouldn't need to read in and overwrite any existing
metadata.

With newer Arrow releases, there will be better support for Parquet datasets
in R; I'll leave it to [~npr] or [~jorisvandenbossche] to link to the right
docs.


was (Author: xhochy):
Generally, you should treat Parquet files as immutable. If you want to change
their contents, it is almost always simpler and faster to rewrite them
completely or (much better) to just write a second file and treat a directory
of Parquet files as a single dataset. This comes down to two major properties:
 * Values in a Parquet file are encoded and compressed. Thus they don't adhere
to a fixed size per row/value; in some cases a column chunk of a million
values may be stored in just 64 bytes.
 * The metadata that contains all essential information, e.g. where row groups
start and what schema the data has, is stored at the end of the file (i.e. the
footer). Especially the last four bytes are needed as they indicate the start
position of the footer.

Technically, you could still write code that appends to an existing Parquet
file, but this has the following drawbacks:
 * Writing wouldn't be faster than writing to a second, separate file. It
would probably be even slower, as we need to deserialize the existing metadata
and serialize it again with only slight modifications.
 * Reading wouldn't be faster than reading from a second file, even when doing
it sequentially.
 * While appending to a Parquet file, the file would be unreadable.
 * If your process crashes during the write, all existing data in the Parquet
file would be lost.
 * It would give users the impression that you could efficiently insert rows
one by one into a file. With a columnar data format that can only leverage its
techniques on large chunks of rows, this would generate a massive overhead.

Still, if one were to implement this, it would work as follows:
 # Read in the footer/metadata of the existing file.
 # Seek to the start position of the existing footer and overwrite it with the
new data.
 # Merge (or rather concatenate) the existing metadata with the newly computed
metadata and write it at the end of the file.

If you look at how a completely fresh Parquet file is written, this is
identical except that we wouldn't need to read in and overwrite any existing
metadata.
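
A minimal sketch of the recommended write-a-second-file pattern in pyarrow;
the directory and file names are made up, and it assumes a release that ships
the {{pyarrow.dataset}} module:
{code:java}
import os

import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq

os.makedirs("data", exist_ok=True)

# Instead of appending, write each batch of new rows as its own file.
pq.write_table(pa.table({"x": [1, 2]}), "data/part-0.parquet")
pq.write_table(pa.table({"x": [3, 4]}), "data/part-1.parquet")

# Treat the whole directory as a single logical dataset when reading.
table = ds.dataset("data", format="parquet").to_table()
print(table.num_rows)  # 4
{code}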

> Append to parquet file?
> ---
>
> Key: ARROW-8810
> URL: https://issues.apache.org/jira/browse/ARROW-8810
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Carl Boettiger
>Priority: Major
>
> Is it possible to append new 

[jira] [Commented] (ARROW-8810) Append to parquet file?

2020-05-15 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108087#comment-17108087
 ] 

Uwe Korn commented on ARROW-8810:
-

Generally, you should treat Parquet files as immutable. If you want to change
their contents, it is almost always simpler and faster to rewrite them
completely or (much better) to just write a second file and treat a directory
of Parquet files as a single dataset. This comes down to two major properties:
 * Values in a Parquet file are encoded and compressed. Thus they don't adhere
to a fixed size per row/value; in some cases a column chunk of a million
values may be stored in just 64 bytes.
 * The metadata that contains all essential information, e.g. where row groups
start and what schema the data has, is stored at the end of the file (i.e. the
footer). Especially the last four bytes are needed as they indicate the start
position of the footer.

Technically, you could still write code that appends to an existing Parquet
file, but this has the following drawbacks:
 * Writing wouldn't be faster than writing to a second, separate file. It
would probably be even slower, as we need to deserialize the existing metadata
and serialize it again with only slight modifications.
 * Reading wouldn't be faster than reading from a second file, even when doing
it sequentially.
 * While appending to a Parquet file, the file would be unreadable.
 * If your process crashes during the write, all existing data in the Parquet
file would be lost.
 * It would give users the impression that you could efficiently insert rows
one by one into a file. With a columnar data format that can only leverage its
techniques on large chunks of rows, this would generate a massive overhead.

Still, if one were to implement this, it would work as follows:
 # Read in the footer/metadata of the existing file.
 # Seek to the start position of the existing footer and overwrite it with the
new data.
 # Merge (or rather concatenate) the existing metadata with the newly computed
metadata and write it at the end of the file.

If you look at how a completely fresh Parquet file is written, this is
identical except that we wouldn't need to read in and overwrite any existing
metadata.

> Append to parquet file?
> ---
>
> Key: ARROW-8810
> URL: https://issues.apache.org/jira/browse/ARROW-8810
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Carl Boettiger
>Priority: Major
>
> Is it possible to append new rows to an existing .parquet file using the R 
> client's arrow::write_parquet(), in a manner similar to the `append=TRUE` 
> argument in text-based output formats like write.table()? 
>  
> Apologies as this is perhaps more a question of documentation or user 
> interface, or maybe just my ignorance. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8638) Arrow Cython API Usage Gives an error when calling CTable API Endpoints

2020-04-30 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096493#comment-17096493
 ] 

Uwe Korn commented on ARROW-8638:
-

You either need to extend the environment variable `LD_LIBRARY_PATH` to point
to the directory where `libarrow.so.16` resides, or (a bit more complicated in
setup.py but the preferred approach) set the RPATH on the generated
`example.so` Python module so that it also includes the directory where
`libarrow.so.16` resides. See turbodbc for an example:
https://github.com/blue-yonder/turbodbc/blob/8e2db0d0a26b620ad3e687e56a88fdab3117e09c/setup.py#L186-L189
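
A minimal sketch of the RPATH approach inside setup.py, assuming a Linux
linker that understands {{-Wl,-rpath}}; this mirrors the idea behind the
turbodbc lines linked above rather than copying them:
{code:java}
import pyarrow as pa
from Cython.Build import cythonize
from distutils.core import setup

ext_modules = cythonize("example.pyx")
for ext in ext_modules:
    ext.include_dirs.append(pa.get_include())
    ext.libraries.extend(pa.get_libraries())
    ext.library_dirs.extend(pa.get_library_dirs())
    for lib_dir in pa.get_library_dirs():
        # Embed the pyarrow library directory as an RPATH so the dynamic
        # loader can find libarrow.so.16 without LD_LIBRARY_PATH.
        ext.extra_link_args.append("-Wl,-rpath," + lib_dir)

setup(ext_modules=ext_modules)
{code}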

> Arrow Cython API Usage Gives an error when calling CTable API Endpoints
> ---
>
> Key: ARROW-8638
> URL: https://issues.apache.org/jira/browse/ARROW-8638
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.16.0
> Environment: Ubuntu 20.04 with Python 3.8.2
> RHEL7 with Python 3.6.8
>Reporter: Vibhatha Lakmal Abeykoon
>Priority: Blocker
> Fix For: 0.16.0
>
>
> I am working on using both the Arrow C++ API and the Cython API to support
> an application that I am developing. Here I will describe the issue I
> experienced when trying to follow the example at
> [https://arrow.apache.org/docs/python/extending.html]
> I am testing on Ubuntu 20.04 LTS
> Python version 3.8.2
> These are the steps I followed.
>  1. Create Virtualenv
>     python3 -m venv ENVARROW
>  
>  2. Activate ENV
>     source ENVARROW/bin/activate
>  
>  3. pip3 install pyarrow==0.16.0 cython numpy
>  
>  4. Code block and Tools,
>  
> +*example.pyx*+
>  
>  
> {code:java}
> from pyarrow.lib cimport *
>
> def get_array_length(obj):
>     # Just an example function accessing both the pyarrow Cython API
>     # and the Arrow C++ API
>     cdef shared_ptr[CArray] arr = pyarrow_unwrap_array(obj)
>     if arr.get() == NULL:
>         raise TypeError("not an array")
>     return arr.get().length()
>
> def get_table_info(obj):
>     cdef shared_ptr[CTable] table = pyarrow_unwrap_table(obj)
>     if table.get() == NULL:
>         raise TypeError("not a table")
>     return table.get().num_columns()
> {code}
>  
>  
> +*setup.py*+
>  
>  
> {code:java}
> from distutils.core import setup
> from Cython.Build import cythonize
> import os
> import numpy as np
> import pyarrow as pa
> ext_modules = cythonize("example.pyx")
> for ext in ext_modules:
>     # The Numpy C headers are currently required
>     ext.include_dirs.append(np.get_include())
>     ext.include_dirs.append(pa.get_include())
>     ext.libraries.extend(pa.get_libraries())
>     ext.library_dirs.extend(pa.get_library_dirs())
>
>     if os.name == 'posix':
>         ext.extra_compile_args.append('-std=c++11')
>
>     # Try uncommenting the following line on Linux
>     # if you get weird linker errors or runtime crashes
>     #ext.define_macros.append(("_GLIBCXX_USE_CXX11_ABI", "0"))
>
> setup(ext_modules=ext_modules)
> {code}
>  
>  
> +*arrow_array.py*+
>  
> {code:java}
> import example
> import pyarrow as pa
> import numpy as np
> arr = pa.array([1, 2, 3, 4, 5])
> length = example.get_array_length(arr)
> print("Array length {}".format(length))
> {code}
>  
> +*arrow_table.py*+
>  
> {code:java}
> import example
> import pyarrow as pa
> import numpy as np
> from pyarrow import csv
> fn = 'data.csv'
> table = csv.read_csv(fn)
> print(table)
> cols = example.get_table_info(table)
> print(cols)
>  
> {code}
> +*data.csv*+
> {code:java}
> 1,2,3,4,5
> 6,7,8,9,10
> 11,12,13,14,15
> {code}
>  
> +*Makefile*+
>  
> {code:java}
> install:
> 	python3 setup.py build_ext --inplace
>
> clean:
> 	rm -R *.so build *.cpp
> {code}
>  
> When I try to run either of the python example scripts arrow_table.py or
> arrow_array.py, I get the following error:
>  
> {code:java}
> File "arrow_array.py", line 1, in <module>
>     import example
> ImportError: libarrow.so.16: cannot open shared object file: No such file or
> directory
> {code}
>  
>  
> *Note: I also checked this on RHEL7 with Python 3.6.8 and got a similar
> response.*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8571) [C++] Switch AppVeyor image to VS 2017

2020-04-23 Thread Uwe Korn (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Korn updated ARROW-8571:

Description: conda-forge did the switch, so we should follow this.

> [C++] Switch AppVeyor image to VS 2017
> --
>
> Key: ARROW-8571
> URL: https://issues.apache.org/jira/browse/ARROW-8571
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Uwe Korn
>Assignee: Uwe Korn
>Priority: Major
>
> conda-forge did the switch, so we should follow this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8571) [C++] Switch AppVeyor image to VS 2017

2020-04-23 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-8571:
---

 Summary: [C++] Switch AppVeyor image to VS 2017
 Key: ARROW-8571
 URL: https://issues.apache.org/jira/browse/ARROW-8571
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Uwe Korn
Assignee: Uwe Korn






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8395) [Python] conda install pyarrow defaults to 0.11.1 not 0.16.0

2020-04-12 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081654#comment-17081654
 ] 

Uwe Korn commented on ARROW-8395:
-

What does a clean conda environment mean? A truly clean conda environment
would have no packages in it, so I expect you mean that you have a full
Anaconda environment here. In that case, this won't work, as you cannot mix
packages between anaconda/defaults and conda-forge.

 

Can you use {{conda create -n test pyarrow}} instead? 

> [Python] conda install pyarrow defaults to 0.11.1 not 0.16.0
> 
>
> Key: ARROW-8395
> URL: https://issues.apache.org/jira/browse/ARROW-8395
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
> Environment: ubuntu 16, ubuntu 18, anaconda 2020.02 x64
>Reporter: dwang
>Priority: Major
>  Labels: conda, conda-forge, install, pyarrow, python,, version
>
> When install pyarrow in clean linux conda environment (2020.02):
> {code:java}
> conda install -c conda-forge pyarrow
> The following packages will be downloaded:
>
> package                    |            build
> ---------------------------|-----------------
> arrow-cpp-0.11.1   |py37h0e61e49_1004 6.3 MB  conda-forge
> boost-cpp-1.68.0   |h11c811c_100020.5 MB  conda-forge
> conda-4.8.3|   py37hc8dfbb8_1 3.0 MB  conda-forge
> libprotobuf-3.6.1  |hdbcaa40_1001 4.0 MB  conda-forge
> parquet-cpp-1.5.1  |3   3 KB  conda-forge
> pyarrow-0.11.1 |py37hbbcf98d_1002 2.0 MB  conda-forge
> python_abi-3.7 |  1_cp37m   4 KB  conda-forge
> thrift-cpp-0.12.0  |h0a07b25_1002 2.4 MB  conda-forge
> 
>Total:38.2 MB
> {code}
> The default version is pyarrow-0.11.1, while conda repo actually has the 
> latest version 0.16.0 ( [https://anaconda.org/conda-forge/pyarrow] ).
>  
> Specifying the version does not help:
> conda install -c conda-forge pyarrow=0.16.0
>  
>  
> Workaround:
> I had to manually download the packages below from conda and then install
> them locally:
> arrow-cpp-0.16.0-py37hb0edad2_0.tar.bz2
> aws-sdk-cpp-1.7.164-h1f8afcc_0.tar.bz2
> boost-cpp-1.70.0-h8e57a91_2.tar.bz2
> brotli-1.0.7-he1b5a44_1000.tar.bz2
> c-ares-1.15.0-h516909a_1001.tar.bz2
> gflags-2.2.2-he1b5a44_1002.tar.bz2
> glog-0.4.0-he1b5a44_1.tar.bz2
> grpc-cpp-1.25.0-h213be95_2.tar.bz2
> libprotobuf-3.11.3-h8b12597_0.tar.bz2
> lz4-c-1.8.3-he1b5a44_1001.tar.bz2
> parquet-cpp-1.5.1-1.tar.bz2
> pyarrow-0.16.0-py37h8b68381_1.tar.bz2
> re2-2020.01.01-he1b5a44_0.tar.bz2
> snappy-1.1.8-he1b5a44_1.tar.bz2
> thrift-cpp-0.12.0-hf3afdfd_1004.tar.bz2
> zstd-1.4.4-h3b9ef0a_1.tar.bz2
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8359) [C++/Python] Enable aarch64/ppc64le build in conda recipes

2020-04-07 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077154#comment-17077154
 ] 

Uwe Korn commented on ARROW-8359:
-

[~kszucs] These builds are running on travis.*com* and drone.io, do we already 
have support for them in crossbow?

> [C++/Python] Enable aarch64/ppc64le build in conda recipes
> --
>
> Key: ARROW-8359
> URL: https://issues.apache.org/jira/browse/ARROW-8359
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Packaging, Python
>Reporter: Uwe Korn
>Priority: Major
> Fix For: 0.17.0
>
>
> These two new arches were added in the conda recipes; we should also build
> them as nightlies.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-8350) [Python] Implement to_numpy on ChunkedArray

2020-04-06 Thread Uwe Korn (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Korn resolved ARROW-8350.
-
Resolution: Invalid

We already support the {{__array__}} protocol and get the right output there, 
so this is not needed.
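
For illustration, the protocol in action (a small sketch; the values are made
up):
{code:java}
import numpy as np
import pyarrow as pa

chunked = pa.chunked_array([[1, 2], [3, 4]])
# numpy picks up the __array__ protocol and concatenates the chunks.
print(np.asarray(chunked))  # [1 2 3 4]
{code}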

> [Python] Implement to_numpy on ChunkedArray
> ---
>
> Key: ARROW-8350
> URL: https://issues.apache.org/jira/browse/ARROW-8350
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Uwe Korn
>Priority: Major
>
> We support {{to_numpy}} on Array instances but not on {{ChunkedArray}} 
> instances. It would be quite useful to have it also there to support 
> returning e.g. non-nanosecond datetime instances.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8359) [C++/Python] Enable aarch64/ppc64le build in conda recipes

2020-04-06 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-8359:
---

 Summary: [C++/Python] Enable aarch64/ppc64le build in conda recipes
 Key: ARROW-8359
 URL: https://issues.apache.org/jira/browse/ARROW-8359
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Packaging, Python
Reporter: Uwe Korn
 Fix For: 0.17.0


These two new arches were added in the conda recipes; we should also build
them as nightlies.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8149) [C++/Python] Enable CUDA Support in conda recipes

2020-04-06 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076921#comment-17076921
 ] 

Uwe Korn commented on ARROW-8149:
-

Yes, PRs are open but there is still discussion.

> [C++/Python] Enable CUDA Support in conda recipes
> -
>
> Key: ARROW-8149
> URL: https://issues.apache.org/jira/browse/ARROW-8149
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Packaging
>Reporter: Uwe Korn
>Priority: Major
> Fix For: 0.17.0
>
>
> See the changes in 
> [https://github.com/conda-forge/arrow-cpp-feedstock/pull/123], we need to 
> copy this into the Arrow repository and also test CUDA in these recipes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8350) [Python] Implement to_numpy on ChunkedArray

2020-04-06 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-8350:
---

 Summary: [Python] Implement to_numpy on ChunkedArray
 Key: ARROW-8350
 URL: https://issues.apache.org/jira/browse/ARROW-8350
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Uwe Korn


We support {{to_numpy}} on Array instances but not on {{ChunkedArray}} 
instances. It would be quite useful to have it also there to support returning 
e.g. non-nanosecond datetime instances.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8288) [Python] Expose with_ modifiers on DataType

2020-03-31 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-8288:
---

 Summary: [Python] Expose with_ modifiers on DataType
 Key: ARROW-8288
 URL: https://issues.apache.org/jira/browse/ARROW-8288
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Uwe Korn
Assignee: Uwe Korn
 Fix For: 0.17.0


We have several {{WithX}} functions defined on {{DataType}} in C++, but only
{{WithMetadata}} is exposed in Python so far. We should expose the rest of
them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8285) [Python][Dataset] ScalarExpression doesn't accept numpy scalars

2020-03-31 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-8285:
---

 Summary: [Python][Dataset] ScalarExpression doesn't accept numpy 
scalars
 Key: ARROW-8285
 URL: https://issues.apache.org/jira/browse/ARROW-8285
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Uwe Korn


{{pyarrow.dataset.ScalarExpression}} doesn't accept numpy scalars. Accepting
them would be useful, as values coming out of {{pandas}} or {{numpy}} are of
exactly these types.

Example:
{code:java}
import pyarrow.dataset as ds
import numpy as np

ds.ScalarExpression(np.int64(2)){code}
{code:java}
---
TypeError Traceback (most recent call last)
<ipython-input-1> in <module>
----> 1 ds.ScalarExpression(np.int64(2))

~/miniconda3/envs/kartothek/lib/python3.7/site-packages/pyarrow/_dataset.pyx in 
pyarrow._dataset.ScalarExpression.__init__()

TypeError: Not yet supported scalar value: 2 {code}
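
A possible workaround until numpy scalars are supported, assuming plain
Python scalars are accepted: unwrap the numpy scalar with {{item()}} before
constructing the expression:
{code:java}
import numpy as np
import pyarrow.dataset as ds

# np.int64.item() returns a plain Python int, which ScalarExpression accepts.
expr = ds.ScalarExpression(np.int64(2).item())
{code}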



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8284) [C++][Dataset] Schema evolution for timestamp columns

2020-03-31 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-8284:
---

 Summary: [C++][Dataset] Schema evolution for timestamp columns
 Key: ARROW-8284
 URL: https://issues.apache.org/jira/browse/ARROW-8284
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Dataset
Reporter: Uwe Korn






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8283) [C++/Python][Dataset] Non-existent files are silently dropped in pa.dataset.FileSystemDataset

2020-03-31 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-8283:
---

 Summary: [C++/Python][Dataset] Non-existent files are silently 
dropped in pa.dataset.FileSystemDataset
 Key: ARROW-8283
 URL: https://issues.apache.org/jira/browse/ARROW-8283
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Dataset, Python
Reporter: Uwe Korn


When passing a list of files to the constructor of
{{pyarrow.dataset.FileSystemDataset}}, all files that don't exist are silently
dropped immediately (i.e. no fragments are created for them).

Instead, I would expect that fragments will be created for them but an error is 
thrown when one tries to read the fragment with the non-existent file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8282) [C++/Python][Dataset] Support schema evolution for integer columns

2020-03-31 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-8282:
---

 Summary: [C++/Python][Dataset] Support schema evolution for 
integer columns
 Key: ARROW-8282
 URL: https://issues.apache.org/jira/browse/ARROW-8282
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Dataset
Reporter: Uwe Korn


When reading in a dataset where the schema specifies that column X is of type
{{int64}} but a partition actually stores that column as {{int32}}, an upcast
should be done.
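
Until the dataset layer does this automatically, the upcast can be expressed
manually; a minimal sketch with made-up column names:
{code:java}
import pyarrow as pa

# A partition that stored column "x" as int32.
table = pa.table({"x": pa.array([1, 2, 3], type=pa.int32())})

# The dataset schema declares "x" as int64; cast performs the upcast.
target = pa.schema([pa.field("x", pa.int64())])
upcast = table.cast(target)
print(upcast.schema)
{code}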



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8281) [R] Name collision of arrow.dll on Windows

2020-03-31 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-8281:
---

 Summary: [R] Name collision of arrow.dll on Windows
 Key: ARROW-8281
 URL: https://issues.apache.org/jira/browse/ARROW-8281
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging, R
Affects Versions: 0.16.0
Reporter: Uwe Korn


Currently we build the R extension for Windows only for CRAN, with static
linkage. For conda-forge, though, we want to build it with dynamic linkage to
{{arrow-cpp}}. Here we run into the issue that the R package as well as the
C++ package produces an {{arrow.dll}}. As there is no RPATH equivalent on
Windows, the dynamic loader cannot resolve the relationship between the two
and fails to load the library.

From my point of view, the simplest approach here would be to name the R
{{arrow.dll}} differently, e.g. {{rarrow.dll}}. Would this be possible?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-8148) [Packaging][C++] Add google-cloud-cpp to conda-forge

2020-03-28 Thread Uwe Korn (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Korn resolved ARROW-8148.
-
Resolution: Fixed

> [Packaging][C++] Add google-cloud-cpp to conda-forge
> 
>
> Key: ARROW-8148
> URL: https://issues.apache.org/jira/browse/ARROW-8148
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Packaging
>Reporter: Wes McKinney
>Assignee: Uwe Korn
>Priority: Major
>
> This is a requirement for ARROW-1231 to be able to move forward



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-5176) [Python] Automate formatting of python files

2020-03-26 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067856#comment-17067856
 ] 

Uwe Korn commented on ARROW-5176:
-

Would be very happy with that!

> [Python] Automate formatting of python files
> 
>
> Key: ARROW-5176
> URL: https://issues.apache.org/jira/browse/ARROW-5176
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Ben Kietzman
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> [Black](https://github.com/ambv/black) is a tool for automatically formatting 
> python code in ways which flake8 and our other linters approve of. Adding it 
> to the project will allow more reliably formatted python code and fill a 
> similar role to {{clang-format}} for c++ and {{cmake-format}} for cmake



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-8223) [Python] Schema.from_pandas breaks with pandas nullable integer dtype

2020-03-26 Thread Uwe Korn (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Korn resolved ARROW-8223.
-
Fix Version/s: 0.17.0
 Assignee: Uwe Korn
   Resolution: Duplicate

I fixed this recently in master.

 

[~wesm] I maintain it; it simply works and thus doesn't need that much love,
except for the recent {{ExtensionArray}} fix.

> [Python] Schema.from_pandas breaks with pandas nullable integer dtype
> -
>
> Key: ARROW-8223
> URL: https://issues.apache.org/jira/browse/ARROW-8223
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.15.0, 0.16.0, 0.15.1
> Environment: pyarrow 0.16
>Reporter: Ged Steponavicius
>Assignee: Uwe Korn
>Priority: Minor
>  Labels: easyfix
> Fix For: 0.17.0
>
>
>  
> {code:java}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame([{'int_col':1},
>  {'int_col':2}])
> df['int_col'] = df['int_col'].astype(pd.Int64Dtype())
> schema = pa.Schema.from_pandas(df)
> {code}
> produces ArrowTypeError: Did not pass numpy.dtype object
>  
> However, this works fine 
> {code:java}
> schema = pa.Table.from_pandas(df).schema{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8148) [Packaging][C++] Add google-cloud-cpp to conda-forge

2020-03-26 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067503#comment-17067503
 ] 

Uwe Korn commented on ARROW-8148:
-

PR: [https://github.com/conda-forge/staged-recipes/pull/11134]

> [Packaging][C++] Add google-cloud-cpp to conda-forge
> 
>
> Key: ARROW-8148
> URL: https://issues.apache.org/jira/browse/ARROW-8148
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Packaging
>Reporter: Wes McKinney
>Assignee: Uwe Korn
>Priority: Major
>
> This is a requirement for ARROW-1231 to be able to move forward



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8148) [Packaging][C++] Add google-cloud-cpp to conda-forge

2020-03-25 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066707#comment-17066707
 ] 

Uwe Korn commented on ARROW-8148:
-

This is more than a single package, we need at least 
[https://github.com/google/crc32c], 
[https://github.com/googleapis/cpp-cmakefiles], 
[https://github.com/googleapis/google-cloud-cpp-common] and 
[https://github.com/googleapis/google-cloud-cpp].

Along the way I discovered that we were only building static gRPC libs in
conda-forge whereas we only want shared libraries there:
[https://github.com/conda-forge/grpc-cpp-feedstock/pull/53]

> [Packaging][C++] Add google-cloud-cpp to conda-forge
> 
>
> Key: ARROW-8148
> URL: https://issues.apache.org/jira/browse/ARROW-8148
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Packaging
>Reporter: Wes McKinney
>Assignee: Uwe Korn
>Priority: Major
>
> This is a requirement for ARROW-1231 to be able to move forward



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-8148) [Packaging][C++] Add google-cloud-cpp to conda-forge

2020-03-25 Thread Uwe Korn (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Korn reassigned ARROW-8148:
---

Assignee: Uwe Korn

> [Packaging][C++] Add google-cloud-cpp to conda-forge
> 
>
> Key: ARROW-8148
> URL: https://issues.apache.org/jira/browse/ARROW-8148
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Packaging
>Reporter: Wes McKinney
>Assignee: Uwe Korn
>Priority: Major
>
> This is a requirement for ARROW-1231 to be able to move forward



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-7816) [Integration] Turbodbc fails to compile in the nightly tests

2020-03-24 Thread Uwe Korn (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Korn resolved ARROW-7816.
-
  Assignee: Kouhei Sutou
Resolution: Fixed

This has been resolved in the meantime.

> [Integration] Turbodbc fails to compile in the nightly tests
> 
>
> Key: ARROW-7816
> URL: https://issues.apache.org/jira/browse/ARROW-7816
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Kouhei Sutou
>Priority: Major
>
> Failing builds:
> - 
> https://circleci.com/gh/ursa-labs/crossbow/8035?utm_campaign=vcs-integration-link_medium=referral_source=github-build-link
> - 
> https://circleci.com/gh/ursa-labs/crossbow/8035?utm_campaign=vcs-integration-link_medium=referral_source=github-build-link



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7871) [Python] Expose more compute kernels

2020-03-24 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065811#comment-17065811
 ] 

Uwe Korn commented on ARROW-7871:
-

I would vouch for exposing more kernels there instead of removing them. The
main intent of having this module is to have all kernels in a clearly defined
namespace which is not the top-level {{pyarrow}} one.

You cannot use the {{pyarrow.compute}} module in the Array methods as this
would introduce a cyclic dependency, but you can directly call the C++
methods there.
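
For illustration, calling a kernel through the dedicated namespace looks like
this (assuming a build where the {{sum}} kernel is exposed):
{code:java}
import pyarrow as pa
import pyarrow.compute as pc

# Kernels live in pyarrow.compute, keeping the top-level namespace clean.
total = pc.sum(pa.array([1, 2, 3]))
print(total)  # 6
{code}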

> [Python] Expose more compute kernels
> 
>
> Key: ARROW-7871
> URL: https://issues.apache.org/jira/browse/ARROW-7871
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Krisztian Szucs
>Priority: Major
>
> Currently only the sum kernel is exposed.
> Or consider to deprecate/remove the pyarrow.compute module, and bind the 
> compute kernels as methods instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-5074) [C++/Python] When installing into a SYSTEM prefix, RPATHs are not correctly set

2020-03-24 Thread Uwe Korn (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Korn resolved ARROW-5074.
-
Resolution: Cannot Reproduce

In my local build this seems to work fine.

> [C++/Python] When installing into a SYSTEM prefix, RPATHs are not correctly 
> set
> ---
>
> Key: ARROW-5074
> URL: https://issues.apache.org/jira/browse/ARROW-5074
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Packaging, Python
>Reporter: Uwe Korn
>Priority: Major
>
> When installing the Arrow libraries into a system with a prefix (mostly a 
> conda env), the RPATHs are not correctly set by CMake (there is no RPATH). 
> Thus we need to use {{LD_LIBRARY_PATH}} in consumers. When packages are built 
> using {{conda-build}}, it takes care of that in its post-processing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-3391) [Python] Support \0 characters in binary Parquet predicate values

2020-03-24 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065798#comment-17065798
 ] 

Uwe Korn commented on ARROW-3391:
-

Have a look at the failing tests in 
[https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_parquet.py#L1646-L1654]

My problem is that I have a binary column with UUIDs (low entropy); there can
be a zero byte at any position inside the ID. When I filtered on such an ID,
e.g. "a\0dfsgjzdsaf", some intermediate steps converted the value to a C-style
string and thus in turn to a simple "a" instead of the whole identifier.
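
For illustration, the kind of filter that used to be truncated at the first
zero byte; the path and column name are made up:
{code:java}
import pyarrow.parquet as pq

# The binary predicate value contains an embedded \0 byte and must survive
# the trip through the filter machinery unchanged.
dataset = pq.ParquetDataset(
    "data_dir", filters=[("id", "=", b"a\x00dfsgjzdsaf")]
)
table = dataset.read()
{code}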

> [Python] Support \0 characters in binary Parquet predicate values
> -
>
> Key: ARROW-3391
> URL: https://issues.apache.org/jira/browse/ARROW-3391
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Uwe Korn
>Priority: Major
>  Labels: dataset, dataset-parquet-read, parquet
>
> As we convert the predicate values of a Parquet filter in some intermediate 
> steps to C-style strings, we currently disallow the use of binary and string 
> predicate values that contain {{\0}} bytes as they would otherwise result in 
> wrong results.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-3054) [Packaging] Tooling to enable nightly conda packages to be updated to some anaconda.org channel

2020-03-24 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065781#comment-17065781
 ] 

Uwe Korn commented on ARROW-3054:
-

[~kszucs] Can you link the correct ticket here and close this?

> [Packaging] Tooling to enable nightly conda packages to be updated to some 
> anaconda.org channel
> ---
>
> Key: ARROW-3054
> URL: https://issues.apache.org/jira/browse/ARROW-3054
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Packaging
>Affects Versions: 0.10.0
>Reporter: Phillip Cloud
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: conda
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8175) [Python] Setup type checking with mypy

2020-03-20 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-8175:
---

 Summary: [Python] Setup type checking with mypy
 Key: ARROW-8175
 URL: https://issues.apache.org/jira/browse/ARROW-8175
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, Python
Reporter: Uwe Korn
Assignee: Uwe Korn


Get mypy checks running first; activate stricter options like
{{check_untyped_defs}} later.
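
A minimal sketch of what the initial configuration could look like; the file
location and module pattern are assumptions:
{code:java}
# setup.cfg (assumed location)
[mypy]
# Start permissive so the check passes at all; third-party stubs are often
# missing.
ignore_missing_imports = True

[mypy-pyarrow.*]
# To be enabled in a follow-up once the initial run is green:
# check_untyped_defs = True
{code}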



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8174) [Python] Refactor context_choices in test_cuda_numba_interop to be a module level fixture

2020-03-20 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-8174:
---

 Summary: [Python] Refactor context_choices in 
test_cuda_numba_interop to be a module level fixture
 Key: ARROW-8174
 URL: https://issues.apache.org/jira/browse/ARROW-8174
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Uwe Korn


It should become a module-level fixture instead of a global variable that is
set/unset in setup_module/teardown_module.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8159) [Python] pyarrow.Schema.from_pandas doesn't support ExtensionDtype

2020-03-19 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-8159:
---

 Summary: [Python] pyarrow.Schema.from_pandas doesn't support 
ExtensionDtype
 Key: ARROW-8159
 URL: https://issues.apache.org/jira/browse/ARROW-8159
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.16.0
Reporter: Uwe Korn
Assignee: Uwe Korn
 Fix For: 0.17.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8149) [C++/Python] Enable CUDA Support in conda recipes

2020-03-18 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-8149:
---

 Summary: [C++/Python] Enable CUDA Support in conda recipes
 Key: ARROW-8149
 URL: https://issues.apache.org/jira/browse/ARROW-8149
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Packaging
Reporter: Uwe Korn
 Fix For: 0.17.0


See the changes in 
[https://github.com/conda-forge/arrow-cpp-feedstock/pull/123], we need to copy 
this into the Arrow repository and also test CUDA in these recipes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-5265) [Python/CI] Add integration test with kartothek

2020-03-10 Thread Uwe Korn (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Korn reassigned ARROW-5265:
---

Assignee: Uwe Korn

> [Python/CI] Add integration test with kartothek
> ---
>
> Key: ARROW-5265
> URL: https://issues.apache.org/jira/browse/ARROW-5265
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Python
>Reporter: Uwe Korn
>Assignee: Uwe Korn
>Priority: Major
>  Labels: parquet
>
> https://github.com/JDASoftwareGroup/kartothek is a heavy user of Apache Arrow
> and thus a good indicator of whether we have introduced breakages in
> {{pyarrow}}. We should therefore run regular integration tests against it,
> as we do with other libraries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8008) [C++/Python] Framework Python is preferred even though not the activated one

2020-03-05 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-8008:
---

 Summary: [C++/Python] Framework Python is preferred even though 
not the activated one
 Key: ARROW-8008
 URL: https://issues.apache.org/jira/browse/ARROW-8008
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Reporter: Uwe Korn
Assignee: Uwe Korn


Currently the framework Python is preferred on macOS even though development
happens in a completely different Python runtime.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8007) [Python] Remove unused and defunct assert_get_object_equal in plasma tests

2020-03-05 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-8007:
---

 Summary: [Python] Remove unused and defunct 
assert_get_object_equal in plasma tests
 Key: ARROW-8007
 URL: https://issues.apache.org/jira/browse/ARROW-8007
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.16.0
Reporter: Uwe Korn
Assignee: Uwe Korn






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6766) [Python] libarrow_python..dylib does not exist

2020-02-26 Thread Uwe Korn (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Korn resolved ARROW-6766.
-
Resolution: Cannot Reproduce

> [Python] libarrow_python..dylib does not exist
> --
>
> Key: ARROW-6766
> URL: https://issues.apache.org/jira/browse/ARROW-6766
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.14.0, 0.15.0
>Reporter: Tarek Allam
>Priority: Major
>
> After following the instructions found on the developer guides for Python,
> I was able to build fine by using:
>
> # Assuming immediately prior one has run:
> # $ git clone g...@github.com:apache/arrow.git
> # $ conda create -y -n pyarrow-dev -c conda-forge \
> #     --file arrow/ci/conda_env_unix.yml \
> #     --file arrow/ci/conda_env_cpp.yml \
> #     --file arrow/ci/conda_env_python.yml \
> #     compilers \
> #     python=3.7
> # $ conda activate pyarrow-dev
> # $ brew update && brew bundle --file=arrow/cpp/Brewfile
>
> export ARROW_HOME=$(pwd)/arrow/dist
> export LD_LIBRARY_PATH=$(pwd)/arrow/dist/lib:$LD_LIBRARY_PATH
> export CC=`which clang`
> export CXX=`which clang++`
>
> mkdir arrow/cpp/build
> pushd arrow/cpp/build
> cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
>       -DCMAKE_INSTALL_LIBDIR=lib \
>       -DARROW_FLIGHT=OFF \
>       -DARROW_GANDIVA=OFF \
>       -DARROW_ORC=ON \
>       -DARROW_PARQUET=ON \
>       -DARROW_PYTHON=ON \
>       -DARROW_PLASMA=ON \
>       -DARROW_BUILD_TESTS=ON \
>       ..
> make -j4
> make install
> popd
>
> But when I run:
>
> pushd arrow/python
> export PYARROW_WITH_FLIGHT=0
> export PYARROW_WITH_GANDIVA=0
> export PYARROW_WITH_ORC=1
> export PYARROW_WITH_PARQUET=1
> python setup.py build_ext --inplace
> popd
>
> I get the following errors:
>
> -- Build output directory:
> /Users/tallamjr/Github/arrow/python/build/temp.macosx-10.9-x86_64-3.7/release
> -- Found the Arrow core library:
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow.dylib
> -- Found the Arrow Python library:
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python.dylib
> CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib
> does not exist.
> ...
> CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib
> does not exist.
> CMake Error at CMakeLists.txt:230 (configure_file):
>   configure_file Problem configuring file
> Call Stack (most recent call first):
>   CMakeLists.txt:315 (bundle_arrow_lib)
> CMake Error: File
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not
> exist.
> CMake Error at CMakeLists.txt:226 (configure_file):
>   configure_file Problem configuring file
> Call Stack (most recent call first):
>   CMakeLists.txt:320 (bundle_arrow_lib)
> CMake Error: File
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not
> exist.
> CMake Error at CMakeLists.txt:230 (configure_file):
>   configure_file Problem configuring file
> Call Stack (most recent call first):
>   CMakeLists.txt:320 (bundle_arrow_lib)
>
> What is quite strange is that the libraries seem to indeed be there, but
> they have an additional component such as `libarrow.15.dylib`, e.g.:
>
> $ ls -l libarrow_python.15.dylib && echo $PWD
> lrwxr-xr-x 1 tallamjr staff 28 Oct 2 14:02 libarrow_python.15.dylib ->
> libarrow_python.15.0.0.dylib
> /Users/tallamjr/github/arrow/dist/lib
>
> I guess I am not exactly sure what the issue here is, but it appears that
> the version is not captured as a variable that is used by CMake? I have run
> the same setup on `master` (`7d18c1c`) and on `apache-arrow-0.14.0`
> (`a591d76`), which both seem to produce the same errors.
> Apologies if this is not quite the format for JIRA issues here, or perhaps
> it's not the correct platform for this; I'm very new to the project and
> contributing to Apache in general. Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6766) [Python] libarrow_python..dylib does not exist

2020-02-26 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045812#comment-17045812
 ] 

Uwe Korn commented on ARROW-6766:
-

Thanks for revisiting this!

> [Python] libarrow_python..dylib does not exist
> --
>
> Key: ARROW-6766
> URL: https://issues.apache.org/jira/browse/ARROW-6766
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.14.0, 0.15.0
>Reporter: Tarek Allam
>Priority: Major
>
> After following the instructions found on the developer guides for Python,
> I was able to build fine by using:
>
> # Assuming immediately prior one has run:
> # $ git clone g...@github.com:apache/arrow.git
> # $ conda create -y -n pyarrow-dev -c conda-forge \
> #     --file arrow/ci/conda_env_unix.yml \
> #     --file arrow/ci/conda_env_cpp.yml \
> #     --file arrow/ci/conda_env_python.yml \
> #     compilers \
> #     python=3.7
> # $ conda activate pyarrow-dev
> # $ brew update && brew bundle --file=arrow/cpp/Brewfile
>
> export ARROW_HOME=$(pwd)/arrow/dist
> export LD_LIBRARY_PATH=$(pwd)/arrow/dist/lib:$LD_LIBRARY_PATH
> export CC=`which clang`
> export CXX=`which clang++`
>
> mkdir arrow/cpp/build
> pushd arrow/cpp/build
> cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
>       -DCMAKE_INSTALL_LIBDIR=lib \
>       -DARROW_FLIGHT=OFF \
>       -DARROW_GANDIVA=OFF \
>       -DARROW_ORC=ON \
>       -DARROW_PARQUET=ON \
>       -DARROW_PYTHON=ON \
>       -DARROW_PLASMA=ON \
>       -DARROW_BUILD_TESTS=ON \
>       ..
> make -j4
> make install
> popd
>
> But when I run:
>
> pushd arrow/python
> export PYARROW_WITH_FLIGHT=0
> export PYARROW_WITH_GANDIVA=0
> export PYARROW_WITH_ORC=1
> export PYARROW_WITH_PARQUET=1
> python setup.py build_ext --inplace
> popd
>
> I get the following errors:
>
> -- Build output directory:
> /Users/tallamjr/Github/arrow/python/build/temp.macosx-10.9-x86_64-3.7/release
> -- Found the Arrow core library:
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow.dylib
> -- Found the Arrow Python library:
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python.dylib
> CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib
> does not exist.
> ...
> CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib
> does not exist.
> CMake Error at CMakeLists.txt:230 (configure_file):
>   configure_file Problem configuring file
> Call Stack (most recent call first):
>   CMakeLists.txt:315 (bundle_arrow_lib)
> CMake Error: File
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not
> exist.
> CMake Error at CMakeLists.txt:226 (configure_file):
>   configure_file Problem configuring file
> Call Stack (most recent call first):
>   CMakeLists.txt:320 (bundle_arrow_lib)
> CMake Error: File
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not
> exist.
> CMake Error at CMakeLists.txt:230 (configure_file):
>   configure_file Problem configuring file
> Call Stack (most recent call first):
>   CMakeLists.txt:320 (bundle_arrow_lib)
>
> What is quite strange is that the libraries seem to indeed be there, but
> they have an additional component such as `libarrow.15.dylib`, e.g.:
>
> $ ls -l libarrow_python.15.dylib && echo $PWD
> lrwxr-xr-x 1 tallamjr staff 28 Oct 2 14:02 libarrow_python.15.dylib ->
> libarrow_python.15.0.0.dylib
> /Users/tallamjr/github/arrow/dist/lib
>
> I guess I am not exactly sure what the issue here is, but it appears that
> the version is not captured as a variable that is used by CMake? I have run
> the same setup on `master` (`7d18c1c`) and on `apache-arrow-0.14.0`
> (`a591d76`), which both seem to produce the same errors.
> Apologies if this is not quite the format for JIRA issues here, or perhaps
> it's not the correct platform for this; I'm very new to the project and
> contributing to Apache in general. Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

