[jira] [Created] (ARROW-8592) [C++] Docs still list LLVM 7 as compiler used

2020-04-24 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-8592:
--

 Summary: [C++] Docs still list LLVM 7 as compiler used
 Key: ARROW-8592
 URL: https://issues.apache.org/jira/browse/ARROW-8592
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Micah Kornfield
Assignee: Micah Kornfield


The docs should list LLVM 8.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7706) [Python] saving a dataframe to the same partitioned location silently doubles the data

2020-04-24 Thread Will Jones (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17092028#comment-17092028
 ] 

Will Jones commented on ARROW-7706:
---

To add to the idea of write modes, Spark's DataFrame.saveAsTable() method has a 
mode parameter similar to what you're discussing here. It might be a good part 
of their API to imitate.

It includes the modes:
{quote}
* ??append??: Append contents of this [{{DataFrame}}|https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame] to existing data.
* ??overwrite??: Overwrite existing data.
* ??error?? or ??errorifexists??: Throw an exception if data already exists.
* ??ignore??: Silently ignore this operation if data already exists.
{quote}
The default is "error": error if destination is not empty.

Reference: 
[https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameWriter.saveAsTable]
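
To make the semantics concrete, here is a minimal sketch of what such a mode 
parameter could look like as a wrapper around pandas' to_parquet; the 
{{to_parquet_safe}} name and the wrapper itself are hypothetical, not an 
existing pandas or pyarrow API:
{code:python}
import os
import shutil

def to_parquet_safe(df, path, mode="error", **kwargs):
    """Hypothetical Spark-style write modes for a partitioned parquet write."""
    not_empty = os.path.isdir(path) and len(os.listdir(path)) > 0
    if not_empty:
        if mode in ("error", "errorifexists"):
            raise FileExistsError("destination %r already contains data" % path)
        if mode == "ignore":
            return
        if mode == "overwrite":
            shutil.rmtree(path)
        # mode == "append" falls through and writes new files alongside old ones
    df.to_parquet(path, engine="pyarrow", **kwargs)
{code}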

> [Python] saving a dataframe to the same partitioned location silently doubles 
> the data
> --
>
> Key: ARROW-7706
> URL: https://issues.apache.org/jira/browse/ARROW-7706
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.15.1
>Reporter: Tsvika Shapira
>Priority: Major
>  Labels: dataset, parquet
>
> When a user saves a dataframe:
> {code:python}
> df1.to_parquet('/tmp/table', partition_cols=['col_a'], engine='pyarrow')
> {code}
> it will create sub-directories named "{{a=val1}}", "{{a=val2}}" in 
> {{/tmp/table}}. Each of them will contain one (or more?) parquet files with 
> random filenames.
> If a user runs the same command again, the code will use the existing 
> sub-directories, but with different (random) filenames. As a result, any data 
> loaded from this folder will be wrong - each row will be present twice.
> For example, when using
> {code:python}
> df1.to_parquet('/tmp/table', partition_cols=['col_a'], engine='pyarrow')  # second time
> df2 = pd.read_parquet('/tmp/table', engine='pyarrow')
> assert len(df1) == len(df2)  # raises an error
> {code}
> This is a subtle change in the data that can pass unnoticed.
>  
> I would expect the code to prevent the user from using a non-empty 
> destination as a partitioned target. An overwrite flag could also be useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8435) [Python] A TypeError is raised while token expires during writing to S3

2020-04-24 Thread Shawn Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17092026#comment-17092026
 ] 

Shawn Li commented on ARROW-8435:
-

Hi Will,

I posted the issue there as well because I'm not sure what the root cause is 
or where it belongs, as the issue occurred while using pyarrow's 
`write_to_dataset` method. Thanks for linking them together. By the way, what 
a small world! I hope you're doing well!
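
For what it's worth, the failure has the shape of a second error raised inside 
an exception handler. Here is a minimal sketch of that masking pattern (this is 
not the actual s3fs code; the class below is invented for illustration):
{code:python}
class FakeUpload:
    """Shows how a TypeError can replace the informative PermissionError."""
    def __init__(self):
        self.mpu = None  # multipart upload state never initialized

    def flush(self):
        try:
            raise PermissionError("The provided token has expired.")
        except PermissionError:
            # The cleanup path assumes self.mpu is a dict; when it is None,
            # this TypeError is what the caller ultimately sees.
            self.mpu["UploadId"]

FakeUpload().flush()
# TypeError: 'NoneType' object is not subscriptable
{code}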

> [Python] A TypeError is raised while token expires during writing to S3
> ---
>
> Key: ARROW-8435
> URL: https://issues.apache.org/jira/browse/ARROW-8435
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.15.1
>Reporter: Shawn Li
>Priority: Critical
>
> This issue occurs when an STS token expires *in the middle of* writing to S3. 
> An OSError: Write failed: TypeError("'NoneType' object is not 
> subscriptable",) is raised instead of a PermissionError.
>  
> {code}
> OSError: Write failed: TypeError("'NoneType' object is not subscriptable",)
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/pyarrow/parquet.py", line 1450, in write_to_dataset
>     write_table(subtable, f, **kwargs)
>   File "/usr/local/lib/python3.6/site-packages/pyarrow/parquet.py", line 1344, in write_table
>     writer.write_table(table, row_group_size=row_group_size)
>   File "/usr/local/lib/python3.6/site-packages/pyarrow/parquet.py", line 474, in write_table
>     self.writer.write_table(table, row_group_size=row_group_size)
>   File "pyarrow/_parquet.pyx", line 1375, in pyarrow._parquet.ParquetWriter.write_table
>   File "pyarrow/error.pxi", line 80, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: Arrow error: IOError: The provided token has expired.. Detail: Python exception: PermissionError
> 
> During handling of the above exception, another exception occurred:
> 
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/s3fs/core.py", line 1096, in _upload_chunk
>     PartNumber=part, UploadId=self.mpu['UploadId'],
> TypeError: 'NoneType' object is not subscriptable
> {code}
> The environment:
>  s3fs==0.4.0
>  boto3==1.10.27
>  botocore==1.13.27
>  pyarrow==0.15.1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8435) [Python] A TypeError is raised while token expires during writing to S3

2020-04-24 Thread Will Jones (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17092024#comment-17092024
 ] 

Will Jones commented on ARROW-8435:
---

This looks to be a bug in s3fs, and the issue is being tracked here: 
[https://github.com/dask/s3fs/issues/314]

> [Python] A TypeError is raised while token expires during writing to S3
> ---
>
> Key: ARROW-8435
> URL: https://issues.apache.org/jira/browse/ARROW-8435
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.15.1
>Reporter: Shawn Li
>Priority: Critical
>
> This issue occurs when an STS token expires *in the middle of* writing to S3. 
> An OSError: Write failed: TypeError("'NoneType' object is not 
> subscriptable",) is raised instead of a PermissionError.
>  
> {code}
> OSError: Write failed: TypeError("'NoneType' object is not subscriptable",)
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/pyarrow/parquet.py", line 1450, in write_to_dataset
>     write_table(subtable, f, **kwargs)
>   File "/usr/local/lib/python3.6/site-packages/pyarrow/parquet.py", line 1344, in write_table
>     writer.write_table(table, row_group_size=row_group_size)
>   File "/usr/local/lib/python3.6/site-packages/pyarrow/parquet.py", line 474, in write_table
>     self.writer.write_table(table, row_group_size=row_group_size)
>   File "pyarrow/_parquet.pyx", line 1375, in pyarrow._parquet.ParquetWriter.write_table
>   File "pyarrow/error.pxi", line 80, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: Arrow error: IOError: The provided token has expired.. Detail: Python exception: PermissionError
> 
> During handling of the above exception, another exception occurred:
> 
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/s3fs/core.py", line 1096, in _upload_chunk
>     PartNumber=part, UploadId=self.mpu['UploadId'],
> TypeError: 'NoneType' object is not subscriptable
> {code}
> The environment:
>  s3fs==0.4.0
>  boto3==1.10.27
>  botocore==1.13.27
>  pyarrow==0.15.1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8586) [R] installation failure on CentOS 7

2020-04-24 Thread Hei (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091996#comment-17091996
 ] 

Hei commented on ARROW-8586:


Thanks for looking into it.

Here is the output:
{code}
> install.packages("arrow")
Installing package into ‘/home/hc/R/x86_64-redhat-linux-gnu-library/3.6’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/src/contrib/arrow_0.17.0.tar.gz'
Content type 'application/x-gzip' length 242534 bytes (236 KB)
==
downloaded 236 KB

* installing *source* package ‘arrow’ ...
** package ‘arrow’ successfully unpacked and MD5 sums checked
** using staged installation
*** Generating code with data-raw/codegen.R
Fatal error: cannot open file 'data-raw/codegen.R': No such file or directory
trying URL 
'https://dl.bintray.com/ursalabs/arrow-r/libarrow/src/arrow-0.17.0.zip'
Error in download.file(from_url, to_file, quiet = quietly) : 
  cannot open URL 
'https://dl.bintray.com/ursalabs/arrow-r/libarrow/src/arrow-0.17.0.zip'
trying URL 
'https://www.apache.org/dyn/closer.lua?action=download&filename=arrow/arrow-0.17.0/apache-arrow-0.17.0.tar.gz'
Content type 'application/x-gzip' length 6460548 bytes (6.2 MB)
==
downloaded 6.2 MB

*** Successfully retrieved C++ source
*** Building C++ libraries
rm: cannot remove ‘src/*.o’: No such file or directory
*** Building with MAKEFLAGS= -j2 
 cmake
trying URL 
'https://github.com/Kitware/CMake/releases/download/v3.16.2/cmake-3.16.2-Linux-x86_64.tar.gz'
Content type 'application/octet-stream' length 39508533 bytes (37.7 MB)
==
downloaded 37.7 MB

 arrow with 
SOURCE_DIR=/tmp/RtmpFm22he/file197f1ee33abb/apache-arrow-0.17.0/cpp 
BUILD_DIR=/tmp/RtmpFm22he/file197f76cef765 DEST_DIR=libarrow/arrow-0.17.0 
CMAKE=/tmp/RtmpFm22he/file197f10953f3a/cmake-3.16.2-Linux-x86_64/bin/cmake 
++ pwd
+ : /tmp/RtmpgF1pnV/R.INSTALL195f36ddad48/arrow
+ : /tmp/RtmpFm22he/file197f1ee33abb/apache-arrow-0.17.0/cpp
+ : /tmp/RtmpFm22he/file197f76cef765
+ : libarrow/arrow-0.17.0
+ : /tmp/RtmpFm22he/file197f10953f3a/cmake-3.16.2-Linux-x86_64/bin/cmake
++ cd /tmp/RtmpFm22he/file197f1ee33abb/apache-arrow-0.17.0/cpp
++ pwd
+ SOURCE_DIR=/tmp/RtmpFm22he/file197f1ee33abb/apache-arrow-0.17.0/cpp
++ mkdir -p libarrow/arrow-0.17.0
++ cd libarrow/arrow-0.17.0
++ pwd
+ DEST_DIR=/tmp/RtmpgF1pnV/R.INSTALL195f36ddad48/arrow/libarrow/arrow-0.17.0
+ '[' '' = '' ']'
+ which ninja
+ '[' '' = false ']'
+ mkdir -p /tmp/RtmpFm22he/file197f76cef765
+ pushd /tmp/RtmpFm22he/file197f76cef765
/tmp/RtmpFm22he/file197f76cef765 /tmp/RtmpgF1pnV/R.INSTALL195f36ddad48/arrow
+ /tmp/RtmpFm22he/file197f10953f3a/cmake-3.16.2-Linux-x86_64/bin/cmake 
-DARROW_BOOST_USE_SHARED=OFF -DARROW_BUILD_TESTS=OFF -DARROW_BUILD_SHARED=OFF 
-DARROW_BUILD_STATIC=ON -DARROW_COMPUTE=ON -DARROW_CSV=ON -DARROW_DATASET=ON 
-DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FILESYSTEM=ON -DARROW_JEMALLOC=ON 
-DARROW_JSON=ON -DARROW_PARQUET=ON -DARROW_WITH_BROTLI=OFF -DARROW_WITH_BZ2=OFF 
-DARROW_WITH_LZ4=OFF -DARROW_WITH_SNAPPY=OFF -DARROW_WITH_ZLIB=OFF 
-DARROW_WITH_ZSTD=OFF -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_LIBDIR=lib 
-DCMAKE_INSTALL_PREFIX=/tmp/RtmpgF1pnV/R.INSTALL195f36ddad48/arrow/libarrow/arrow-0.17.0
 -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON 
-DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON -DCMAKE_UNITY_BUILD=ON 
-DOPENSSL_USE_STATIC_LIBS=ON -G 'Unix Makefiles' 
/tmp/RtmpFm22he/file197f1ee33abb/apache-arrow-0.17.0/cpp
-- Building using CMake version: 3.16.2
-- The C compiler identification is GNU 8.3.1
-- The CXX compiler identification is GNU 8.3.1
-- Check for working C compiler: /opt/rh/devtoolset-8/root/usr/bin/cc
-- Check for working C compiler: /opt/rh/devtoolset-8/root/usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /opt/rh/devtoolset-8/root/usr/bin/c++
-- Check for working CXX compiler: /opt/rh/devtoolset-8/root/usr/bin/c++ -- 
works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Arrow version: 0.17.0 (full: '0.17.0')
-- Arrow SO version: 17 (full: 17.0.0)
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.27.1") 
-- clang-tidy not found
-- clang-format not found
-- Could NOT find ClangTools (missing: CLANG_FORMAT_BIN CLANG_TIDY_BIN) 
-- infer not found
-- Found Python3: /usr/bin/python3.6 (found version "3.6.8") found components: 
Interpreter 
-- Found cpplint executable at 
/tmp/RtmpFm22he/file197f1ee33abb/apache-arrow-0.17.0/cpp/build-support/cpplint.py
-- System processor: x86_64
-- Performing Test CXX_SUPPORTS_SSE4_2
-- Performing Test CXX_SUPPORTS_SSE4_2 - Success
-- Performing Test CXX_SUPPORTS_AVX2
-- Performing Test 

[jira] [Commented] (ARROW-5634) [C#] ArrayData.NullCount should be a property

2020-04-24 Thread Zachary Gramana (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091994#comment-17091994
 ] 

Zachary Gramana commented on ARROW-5634:


[GitHub Pull Request #7032|https://github.com/apache/arrow/pull/7032] now 
properly computes the `NullCount` value and passes it to the `ArrayData` ctor 
in the `Slice` method.

`NullCount` should remain a readonly field, however, in order to preserve 
immutability.
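
For context, the computation in question counts the unset bits of the validity 
bitmap within the slice's offset and length. A Python sketch of the concept 
(not the C# code from the PR; Arrow bitmaps are LSB-first):
{code:python}
def null_count(validity: bytes, offset: int, length: int) -> int:
    """Count nulls (0 bits) in a validity bitmap over [offset, offset + length)."""
    nulls = 0
    for i in range(offset, offset + length):
        byte, bit = divmod(i, 8)
        if not (validity[byte] >> bit) & 1:
            nulls += 1
    return nulls

# Slicing 4 values starting at bit 2 of the bitmap 0b10110101: only bit 3 is unset.
assert null_count(bytes([0b10110101]), 2, 4) == 1
{code}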

> [C#] ArrayData.NullCount should be a property 
> --
>
> Key: ARROW-5634
> URL: https://issues.apache.org/jira/browse/ARROW-5634
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C#
>Reporter: Prashanth Govindarajan
>Priority: Major
>
> ArrayData.NullCount should be a property so that it can be computed when 
> necessary: for example, after Slice(), NullCount is -1 and needs to be computed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-5708) [C#] Null support for BooleanArray

2020-04-24 Thread Zachary Gramana (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091987#comment-17091987
 ] 

Zachary Gramana edited comment on ARROW-5708 at 4/25/20, 12:37 AM:
---

In implementing ARROW-6603, I discovered that I hadn't added an `AppendNull` 
to BooleanArray.Builder yet, because there weren't any BooleanArray.Builder 
tests in "ArrayBuilderTests.cs" to begin with, nor were there any tests for 
`BooleanArray.Slice` there or in "BooleanArrayTests.cs".

As a result of adding those tests, and getting them to pass, [GitHub PR 
7032|https://github.com/apache/arrow/pull/7032] also now resolves this issue.


was (Author: gramana):
In implementing ARROW-6603, I discovered that I hadn't added an `AppendNull` 
to BooleanArray.Builder yet, because there weren't any BooleanArray.Builder 
tests in "ArrayBuilderTests.cs" to begin with, nor were there any tests for 
`BooleanArray.Slice` there or in "BooleanArrayTests.cs".

As a result of adding those tests, and getting them to pass, [GitHub PR 
6161|https://github.com/apache/arrow/pull/6121] also now resolves this issue.

> [C#] Null support for BooleanArray
> --
>
> Key: ARROW-5708
> URL: https://issues.apache.org/jira/browse/ARROW-5708
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C#
>Reporter: Eric Erhardt
>Priority: Major
>
> See the conversation 
> [here|https://github.com/apache/arrow/pull/4640#discussion_r296417726] and 
> [here|https://github.com/apache/arrow/pull/3574#discussion_r262662083].
> We should add null support for BooleanArray.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-5708) [C#] Null support for BooleanArray

2020-04-24 Thread Zachary Gramana (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091987#comment-17091987
 ] 

Zachary Gramana commented on ARROW-5708:


In implementing ARROW-6603, I discovered that I hadn't added an `AppendNull` 
to BooleanArray.Builder yet, because there weren't any BooleanArray.Builder 
tests in "ArrayBuilderTests.cs" to begin with, nor were there any tests for 
`BooleanArray.Slice` there or in "BooleanArrayTests.cs".

As a result of adding those tests, and getting them to pass, [GitHub PR 
6161|https://github.com/apache/arrow/pull/6121] also now resolves this issue.

> [C#] Null support for BooleanArray
> --
>
> Key: ARROW-5708
> URL: https://issues.apache.org/jira/browse/ARROW-5708
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C#
>Reporter: Eric Erhardt
>Priority: Major
>
> See the conversation 
> [here|https://github.com/apache/arrow/pull/4640#discussion_r296417726] and 
> [here|https://github.com/apache/arrow/pull/3574#discussion_r262662083].
> We should add null support for BooleanArray.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-4544) [Rust] Read nested JSON structs into StructArrays

2020-04-24 Thread Jonathan Kelley (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091969#comment-17091969
 ] 

Jonathan Kelley commented on ARROW-4544:


Is there a particular direction this would need to take that doesn't follow 
recursion?

I'd like to contribute this feature, but if recursion is not the recommended 
way, it would be nice to know up front.
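
For reference, the C++-backed pyarrow reader already handles nested structs, 
which illustrates the target behavior (the sample data below is made up):
{code:python}
import io
import pyarrow.json as pj

data = b'{"a": 1, "b": {"c": "x"}}\n{"a": 2, "b": {"c": "y"}}\n'
table = pj.read_json(io.BytesIO(data))
print(table.schema)  # "b" comes back as a struct column: b: struct<c: string>
{code}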

> [Rust] Read nested JSON structs into StructArrays
> -
>
> Key: ARROW-4544
> URL: https://issues.apache.org/jira/browse/ARROW-4544
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Neville Dipale
>Priority: Minor
>
> _Adding this as a separate task as it's a bit involved._
> Add the ability to read in JSON structs that are children of the JSON record 
> being read.
> The main concern here is deeply nested structures, which will require a 
> performant and reusable basic JSON reader before dealing with recursion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8556) [R] zstd symbol not found on Ubuntu 19.10

2020-04-24 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091937#comment-17091937
 ] 

Neal Richardson commented on ARROW-8556:


Thanks, that's helpful. So what I see is that when the C++ library builds, 
`cmake` finds the system `zstd`, so it opts to use that instead of building it 
from source too. But then, when the R package shared library tries to load, it 
can't find it.

This is beyond my level of C++ competence to debug further, so I'll solicit 
help from someone else.

> [R] zstd symbol not found on Ubuntu 19.10
> -
>
> Key: ARROW-8556
> URL: https://issues.apache.org/jira/browse/ARROW-8556
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: Ubuntu 19.10
> R 3.6.1
>Reporter: Karl Dunkle Werner
>Priority: Major
>
> I would like to install the `arrow` R package on my Ubuntu 19.10 system. 
> Prebuilt binaries are unavailable, and I want to enable compression, so I set 
> the {{LIBARROW_MINIMAL=false}} environment variable. When I do so, it looks 
> like the package is able to compile, but can't be loaded. I'm able to install 
> correctly if I don't set the {{LIBARROW_MINIMAL}} variable.
> Here's the error I get:
> {code:java}
> ** testing if installed package can be loaded from temporary location
> Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath 
> = DLLpath, ...):
>  unable to load shared object 
> '~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so':
>   ~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so: undefined symbol: 
> ZSTD_initCStream
> Error: loading failed
> Execution halted
> ERROR: loading failed
> * removing ‘~/.R/3.6/arrow’
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8556) [R] zstd symbol not found on Ubuntu 19.10

2020-04-24 Thread Karl Dunkle Werner (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091925#comment-17091925
 ] 

Karl Dunkle Werner commented on ARROW-8556:
---

{noformat}
Installing package into ‘/home/karl/test_arrow’
(as ‘lib’ is unspecified)
--- Please select a CRAN mirror for use in this session ---
trying URL 'https://cloud.r-project.org/src/contrib/arrow_0.17.0.tar.gz'
Content type 'application/x-gzip' length 242534 bytes (236 KB)
==
downloaded 236 KB

* installing *source* package ‘arrow’ ...
** package ‘arrow’ successfully unpacked and MD5 sums checked
** using staged installation
*** Generating code with data-raw/codegen.R
Fatal error: cannot open file 'data-raw/codegen.R': No such file or directory
trying URL 
'https://dl.bintray.com/ursalabs/arrow-r/libarrow/src/arrow-0.17.0.zip'
Error in download.file(from_url, to_file, quiet = quietly) : 
  cannot open URL 
'https://dl.bintray.com/ursalabs/arrow-r/libarrow/src/arrow-0.17.0.zip'
trying URL 
'https://www.apache.org/dyn/closer.lua?action=download&filename=arrow/arrow-0.17.0/apache-arrow-0.17.0.tar.gz'
Content type 'application/x-gzip' length 6460548 bytes (6.2 MB)
==
downloaded 6.2 MB

*** Successfully retrieved C++ source
*** Building C++ libraries
rm: cannot remove 'src/*.o': No such file or directory
*** Building with MAKEFLAGS=  -j4 
 arrow with 
SOURCE_DIR=/tmp/RtmptP2CaW/file476e274f73a4/apache-arrow-0.17.0/cpp 
BUILD_DIR=/tmp/RtmptP2CaW/file476e6fba345b DEST_DIR=libarrow/arrow-0.17.0 
CMAKE=/usr/bin/cmake 
++ pwd
+ : /tmp/RtmpynJFHV/R.INSTALL474739c260b7/arrow
+ : /tmp/RtmptP2CaW/file476e274f73a4/apache-arrow-0.17.0/cpp
+ : /tmp/RtmptP2CaW/file476e6fba345b
+ : libarrow/arrow-0.17.0
+ : /usr/bin/cmake
++ cd /tmp/RtmptP2CaW/file476e274f73a4/apache-arrow-0.17.0/cpp
++ pwd
+ SOURCE_DIR=/tmp/RtmptP2CaW/file476e274f73a4/apache-arrow-0.17.0/cpp
++ mkdir -p libarrow/arrow-0.17.0
++ cd libarrow/arrow-0.17.0
++ pwd
+ DEST_DIR=/tmp/RtmpynJFHV/R.INSTALL474739c260b7/arrow/libarrow/arrow-0.17.0
+ '[' '' = '' ']'
+ which ninja
+ CMAKE_GENERATOR=Ninja
+ '[' false = false ']'
+ ARROW_JEMALLOC=ON
+ ARROW_WITH_BROTLI=ON
+ ARROW_WITH_BZ2=ON
+ ARROW_WITH_LZ4=ON
+ ARROW_WITH_SNAPPY=ON
+ ARROW_WITH_ZLIB=ON
+ ARROW_WITH_ZSTD=ON
+ mkdir -p /tmp/RtmptP2CaW/file476e6fba345b
+ pushd /tmp/RtmptP2CaW/file476e6fba345b
/tmp/RtmptP2CaW/file476e6fba345b /tmp/RtmpynJFHV/R.INSTALL474739c260b7/arrow
+ /usr/bin/cmake -DARROW_BOOST_USE_SHARED=OFF -DARROW_BUILD_TESTS=OFF 
-DARROW_BUILD_SHARED=OFF -DARROW_BUILD_STATIC=ON -DARROW_COMPUTE=ON 
-DARROW_CSV=ON -DARROW_DATASET=ON -DARROW_DEPENDENCY_SOURCE=AUTO 
-DARROW_FILESYSTEM=ON -DARROW_JEMALLOC=ON -DARROW_JSON=ON -DARROW_PARQUET=ON 
-DARROW_WITH_BROTLI=ON -DARROW_WITH_BZ2=ON -DARROW_WITH_LZ4=ON 
-DARROW_WITH_SNAPPY=ON -DARROW_WITH_ZLIB=ON -DARROW_WITH_ZSTD=ON 
-DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_LIBDIR=lib 
-DCMAKE_INSTALL_PREFIX=/tmp/RtmpynJFHV/R.INSTALL474739c260b7/arrow/libarrow/arrow-0.17.0
 -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON 
-DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON -DCMAKE_UNITY_BUILD=ON 
-DOPENSSL_USE_STATIC_LIBS=ON -G Ninja 
/tmp/RtmptP2CaW/file476e274f73a4/apache-arrow-0.17.0/cpp
-- Building using CMake version: 3.13.4
-- The C compiler identification is GNU 9.2.1
-- The CXX compiler identification is GNU 9.2.1
-- Check for working C compiler: /usr/lib/ccache/cc
-- Check for working C compiler: /usr/lib/ccache/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/lib/ccache/c++
-- Check for working CXX compiler: /usr/lib/ccache/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Arrow version: 0.17.0 (full: '0.17.0')
-- Arrow SO version: 17 (full: 17.0.0)
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.1") 
-- clang-tidy not found
-- clang-format not found
-- Could NOT find ClangTools (missing: CLANG_FORMAT_BIN CLANG_TIDY_BIN) 
-- infer not found
-- Found Python3: /usr/bin/python3.7 (found version "3.7.5") found components:  
Interpreter 
-- Using ccache: /usr/bin/ccache
-- Found cpplint executable at 
/tmp/RtmptP2CaW/file476e274f73a4/apache-arrow-0.17.0/cpp/build-support/cpplint.py
-- System processor: x86_64
-- Performing Test CXX_SUPPORTS_SSE4_2
-- Performing Test CXX_SUPPORTS_SSE4_2 - Success
-- Performing Test CXX_SUPPORTS_AVX2
-- Performing Test CXX_SUPPORTS_AVX2 - Success
-- Performing Test CXX_SUPPORTS_AVX512
-- Performing Test CXX_SUPPORTS_AVX512 - Success
-- Arrow build warning level: PRODUCTION
Using ld linker
Configured for RELEASE build (set with cmake 
-DCMAKE_BUILD_TYPE={release,debug,...})
-- Build Type: RELEASE
-- Using AUTO approach to find dependencies

[jira] [Commented] (ARROW-6603) [C#] ArrayBuilder API to support writing nulls

2020-04-24 Thread Zachary Gramana (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091913#comment-17091913
 ] 

Zachary Gramana commented on ARROW-6603:


I came across this conversation late, and _after_ implementing an alternative 
approach which is much more in line with other Arrow implementations.

I have submitted [GitHub Pull Request 
#7032|https://github.com/apache/arrow/pull/7032] which includes:
 * A newly added interface member, `AppendNull`, along with implementations for 
`PrimitiveArrayBuilder` and `Binary.BuilderBase`.
 * Additional work to finish the previously stubbed support for 
`NullBitmapBuffer` in a few of the specialized `Array` classes.
 * Several new and expanded tests.
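
The semantics of `AppendNull` on a primitive builder amount to reserving a 
value slot and clearing the corresponding validity bit. A minimal Python sketch 
of that idea (names invented; this is not the C# API from the PR):
{code:python}
class Int32Builder:
    def __init__(self):
        self.values = []
        self.validity = []  # 1 = valid, 0 = null

    def append(self, value):
        self.values.append(value)
        self.validity.append(1)
        return self

    def append_null(self):
        self.values.append(0)  # placeholder slot; ignored since its bit is 0
        self.validity.append(0)
        return self

b = Int32Builder()
b.append(1).append_null().append(3)
assert b.validity == [1, 0, 1] and b.values == [1, 0, 3]
{code}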

> [C#] ArrayBuilder API to support writing nulls
> --
>
> Key: ARROW-6603
> URL: https://issues.apache.org/jira/browse/ARROW-6603
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C#
>Reporter: Eric Erhardt
>Assignee: Anthony Abate
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 72h
>  Time Spent: 3h 10m
>  Remaining Estimate: 68h 50m
>
> There is currently no API in the PrimitiveArrayBuilder class to support 
> writing nulls.  See this TODO - 
> [https://github.com/apache/arrow/blob/1515fe10c039fb6685df2e282e2e888b773caa86/csharp/src/Apache.Arrow/Arrays/PrimitiveArrayBuilder.cs#L101.]
>  
> Also see [https://github.com/apache/arrow/issues/5381].
>  
> We should add some APIs to support writing nulls.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8587) Compilation error when linking arrow-flight-perf-server

2020-04-24 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091909#comment-17091909
 ] 

Antoine Pitrou commented on ARROW-8587:
---

I suppose so. I had no idea that gRPC required zlib (probably for optional 
compression, though we don't use it).

> Compilation error when linking arrow-flight-perf-server
> ---
>
> Key: ARROW-8587
> URL: https://issues.apache.org/jira/browse/ARROW-8587
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Benchmarking, C++, FlightRPC
>Affects Versions: 0.17.0
> Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar 
> 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Chengxin Ma
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I wanted to play around with the Flight benchmark after seeing the discussion 
> regarding Flight's throughput on the Arrow dev mailing list today.
> I hit the following error when trying to build the benchmark from the latest 
> source code:
> {code:java}
> [ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::canonical(boost::filesystem::path const&, 
> boost::filesystem::path const&, boost::system::error_code*)'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::system_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::parent_path() const'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::generic_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::current_path(boost::system::error_code*)'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `inflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `deflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::operator/=(boost::filesystem::path const&)'
> collect2: error: ld returned 1 exit status
> src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: 
> recipe for target 'debug/arrow-flight-perf-server' failed
> make[2]: *** [debug/arrow-flight-perf-server] Error 1
> CMakeFiles/Makefile2:2609: recipe for target 
> 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed
> make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] 
> Error 2
> Makefile:140: recipe for target 'all' failed
> make: *** [all] Error 2
> {code}
> I was using {{cmake .. -DCMAKE_BUILD_TYPE=Debug 
> -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON 
> -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}} to configure the build.
>  I noticed that there was a {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the 
> output, but the Boost library that I installed from the package manager was 
> version {{1.65.1.0ubuntu1}}. Could this be the cause of the problem?
> PS:
> I was able to build the benchmark 
> [before|https://issues.apache.org/jira/browse/ARROW-7200]. It was on AWS with 
> the OS being ubuntu-bionic-18.04-amd64-server-20191002, which should be very 
> similar to the one I'm using on my laptop.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6718) [Rust] packed_simd requires nightly

2020-04-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6718:
--
Labels: pull-request-available  (was: )

> [Rust] packed_simd requires nightly 
> 
>
> Key: ARROW-6718
> URL: https://issues.apache.org/jira/browse/ARROW-6718
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Andy Grove
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See [https://github.com/rust-lang/rfcs/pull/2366] for more info on 
> stabilization of this crate.
>  
> {code:java}
> error[E0554]: `#![feature]` may not be used on the stable release channel
>--> 
> /home/andy/.cargo/registry/src/github.com-1ecc6299db9ec823/packed_simd-0.3.3/src/lib.rs:202:1
> |
> 202 | / #![feature(
> 203 | | repr_simd,
> 204 | | const_fn,
> 205 | | platform_intrinsics,
> ...   |
> 215 | | custom_inner_attributes
> 216 | | )]
> | |__^
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8556) [R] zstd symbol not found on Ubuntu 19.10

2020-04-24 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091890#comment-17091890
 ] 

Neal Richardson commented on ARROW-8556:


Maybe it's something about 19.10, maybe it's something about your particular 
setup, or maybe it's a more general issue. To debug, I'd recommend setting 
`ARROW_R_DEV=true` (for verbosity), `LIBARROW_BINARY=false` (to ensure that we 
build from source), and `LIBARROW_MINIMAL=false` (so that zstd is turned on), 
then reinstalling. Attach the full installation logs here, and I can try to 
sift through them; after that I may have other ideas of things to try. Thanks 
for your help!

> [R] zstd symbol not found on Ubuntu 19.10
> -
>
> Key: ARROW-8556
> URL: https://issues.apache.org/jira/browse/ARROW-8556
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: Ubuntu 19.10
> R 3.6.1
>Reporter: Karl Dunkle Werner
>Priority: Major
>
> I would like to install the `arrow` R package on my Ubuntu 19.10 system. 
> Prebuilt binaries are unavailable, and I want to enable compression, so I set 
> the {{LIBARROW_MINIMAL=false}} environment variable. When I do so, it looks 
> like the package is able to compile, but can't be loaded. I'm able to install 
> correctly if I don't set the {{LIBARROW_MINIMAL}} variable.
> Here's the error I get:
> {code:java}
> ** testing if installed package can be loaded from temporary location
> Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath 
> = DLLpath, ...):
>  unable to load shared object 
> '~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so':
>   ~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so: undefined symbol: 
> ZSTD_initCStream
> Error: loading failed
> Execution halted
> ERROR: loading failed
> * removing ‘~/.R/3.6/arrow’
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8587) Compilation error when linking arrow-flight-perf-server

2020-04-24 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091888#comment-17091888
 ] 

Wes McKinney commented on ARROW-8587:
-

Is the zlib dependency coming from gRPC? It shouldn't be necessary to add 
{{ARROW_WITH_ZLIB=ON}} here; if zlib is needed, it should be enabled 
automatically.

> Compilation error when linking arrow-flight-perf-server
> ---
>
> Key: ARROW-8587
> URL: https://issues.apache.org/jira/browse/ARROW-8587
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Benchmarking, C++, FlightRPC
>Affects Versions: 0.17.0
> Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar 
> 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Chengxin Ma
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I wanted to play around with the Flight benchmark after seeing the discussion 
> regarding Flight's throughput on the Arrow dev mailing list today.
> I hit the following error when trying to build the benchmark from the latest 
> source code:
> {code:java}
> [ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::canonical(boost::filesystem::path const&, 
> boost::filesystem::path const&, boost::system::error_code*)'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::system_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::parent_path() const'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::generic_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::current_path(boost::system::error_code*)'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `inflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `deflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::operator/=(boost::filesystem::path const&)'
> collect2: error: ld returned 1 exit status
> src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: 
> recipe for target 'debug/arrow-flight-perf-server' failed
> make[2]: *** [debug/arrow-flight-perf-server] Error 1
> CMakeFiles/Makefile2:2609: recipe for target 
> 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed
> make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] 
> Error 2
> Makefile:140: recipe for target 'all' failed
> make: *** [all] Error 2
> {code}
> I was using {{cmake .. -DCMAKE_BUILD_TYPE=Debug 
> -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON 
> -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}} to configure the build.
>  I noticed that there was a {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the 
> output, but the Boost library that I installed from the package manager was 
> version {{1.65.1.0ubuntu1}}. Could this be the cause of the problem?
> PS:
> I was able to build the benchmark 
> [before|https://issues.apache.org/jira/browse/ARROW-7200]. It was on AWS with 
> the OS being ubuntu-bionic-18.04-amd64-server-20191002, which should be very 
> similar to the one I'm using on my laptop.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8587) Compilation error when linking arrow-flight-perf-server

2020-04-24 Thread Chengxin Ma (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091885#comment-17091885
 ] 

Chengxin Ma commented on ARROW-8587:


Adding {{-DARROW_WITH_ZLIB=ON}} solved this problem.

(I was expecting that the build system would find zlib on my system 
automatically, so I didn't set this flag.)

> Compilation error when linking arrow-flight-perf-server
> ---
>
> Key: ARROW-8587
> URL: https://issues.apache.org/jira/browse/ARROW-8587
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Benchmarking, C++, FlightRPC
>Affects Versions: 0.17.0
> Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar 
> 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Chengxin Ma
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I wanted to play around with the Flight benchmark after seeing the discussion 
> regarding Flight's throughput on the Arrow dev mailing list today.
> I hit the following error when trying to build the benchmark from the latest 
> source code:
> {code:java}
> [ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::canonical(boost::filesystem::path const&, 
> boost::filesystem::path const&, boost::system::error_code*)'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::system_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::parent_path() const'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::generic_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::current_path(boost::system::error_code*)'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `inflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `deflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::operator/=(boost::filesystem::path const&)'
> collect2: error: ld returned 1 exit status
> src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: 
> recipe for target 'debug/arrow-flight-perf-server' failed
> make[2]: *** [debug/arrow-flight-perf-server] Error 1
> CMakeFiles/Makefile2:2609: recipe for target 
> 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed
> make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] 
> Error 2
> Makefile:140: recipe for target 'all' failed
> make: *** [all] Error 2
> {code}
> I was using {{cmake .. -DCMAKE_BUILD_TYPE=Debug 
> -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON 
> -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}} to configure the build.
>  I noticed that there was a {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the 
> output, but the Boost library that I installed from the package manager was 
> version {{1.65.1.0ubuntu1}}. Could this be the cause of the problem?
> PS:
> I was able to build the benchmark 
> [before|https://issues.apache.org/jira/browse/ARROW-7200]. It was on AWS with 
> the OS being ubuntu-bionic-18.04-amd64-server-20191002, which should be very 
> similar to the one I'm using on my laptop.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8591) [Rust] Reverse lookup for a key in DictionaryArray

2020-04-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8591:
--
Labels: pull-request-available  (was: )

> [Rust] Reverse lookup for a key in DictionaryArray
> --
>
> Key: ARROW-8591
> URL: https://issues.apache.org/jira/browse/ARROW-8591
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Mahmut Bulut
>Assignee: Mahmut Bulut
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, there is no way to do a reverse lookup for DictionaryArray. A 
> reverse lookup would be beneficial; it would enable creation of combiner masks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8590) [Rust] Use Arrow pretty print utility in DataFusion

2020-04-24 Thread Mark Hildreth (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Hildreth updated ARROW-8590:
-
Description: ARROW-8287 added some new utility methods for pretty printing 
into the rust arrow crate (see [PR 
6972|https://github.com/apache/arrow/pull/6972]). These were basically copied 
from DataFusion. Modify DataFusion to use the utility methods in the arrow 
crate, removing the duplicate code.  (was: ARROW-8287 added some new utility 
methods for pretty printing into the rust arrow crate (see [PR 
6972|https://github.com/apache/arrow/pull/6972]). These were basically pulled 
from DataFusion. Modify DataFusion to use the utility methods in the arrow 
crate, removing the duplicate code.)

> [Rust] Use Arrow pretty print utility in DataFusion
> ---
>
> Key: ARROW-8590
> URL: https://issues.apache.org/jira/browse/ARROW-8590
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Mark Hildreth
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ARROW-8287 added some new utility methods for pretty printing into the rust 
> arrow crate (see [PR 6972|https://github.com/apache/arrow/pull/6972]). These 
> were basically copied from DataFusion. Modify DataFusion to use the utility 
> methods in the arrow crate, removing the duplicate code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8591) [Rust] Reverse lookup for a key in DictionaryArray

2020-04-24 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-8591:
---

 Summary: [Rust] Reverse lookup for a key in DictionaryArray
 Key: ARROW-8591
 URL: https://issues.apache.org/jira/browse/ARROW-8591
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut


Currently, there is no way to do a reverse lookup for DictionaryArray. A 
reverse lookup would be beneficial; it would enable creation of combiner masks.
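
For illustration, here is what a reverse lookup amounts to, sketched with 
pyarrow in Python (the request is for the Rust API; the dictionary scan below 
is my own illustration, not an existing method):
{code:python}
import pyarrow as pa

darr = pa.array(["a", "b", "a", "c"]).dictionary_encode()
# Forward lookup is direct: darr.dictionary[key] yields the value.
# Reverse lookup (value -> key) requires scanning the dictionary once:
value_to_key = {v: k for k, v in enumerate(darr.dictionary.to_pylist())}
key = value_to_key["a"]  # 0
# With the key, a boolean "combiner mask" over the rows is just a comparison:
mask = [i == key for i in darr.indices.to_pylist()]  # [True, False, True, False]
{code}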



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8590) [Rust] Use Arrow pretty print utility in DataFusion

2020-04-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8590:
--
Labels: pull-request-available  (was: )

> [Rust] Use Arrow pretty print utility in DataFusion
> ---
>
> Key: ARROW-8590
> URL: https://issues.apache.org/jira/browse/ARROW-8590
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Mark Hildreth
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ARROW-8287 added some new utility methods for pretty printing into the rust 
> arrow crate (see [PR 6972|https://github.com/apache/arrow/pull/6972]). These 
> were basically pulled from DataFusion. Modify DataFusion to use the utility 
> methods in the arrow crate, removing the duplicate code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8587) Compilation error when linking arrow-flight-perf-server

2020-04-24 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091870#comment-17091870
 ] 

Antoine Pitrou commented on ARROW-8587:
---

I don't see that error myself. Can you try passing {{-DARROW_WITH_ZSTD=on}}?

> Compilation error when linking arrow-flight-perf-server
> ---
>
> Key: ARROW-8587
> URL: https://issues.apache.org/jira/browse/ARROW-8587
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Benchmarking, C++, FlightRPC
>Affects Versions: 0.17.0
> Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar 
> 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Chengxin Ma
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I wanted to play around with the Flight benchmark after seeing the discussion 
> regarding Flight's throughput on the Arrow dev mailing list today.
> I hit the following error when trying to build the benchmark from the latest 
> source code:
> {code:java}
> [ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::canonical(boost::filesystem::path const&, 
> boost::filesystem::path const&, boost::system::error_code*)'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::system_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::parent_path() const'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::generic_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::current_path(boost::system::error_code*)'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `inflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `deflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::operator/=(boost::filesystem::path const&)'
> collect2: error: ld returned 1 exit status
> src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: 
> recipe for target 'debug/arrow-flight-perf-server' failed
> make[2]: *** [debug/arrow-flight-perf-server] Error 1
> CMakeFiles/Makefile2:2609: recipe for target 
> 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed
> make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] 
> Error 2
> Makefile:140: recipe for target 'all' failed
> make: *** [all] Error 2
> {code}
> I was using {{cmake .. -DCMAKE_BUILD_TYPE=Debug 
> -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON 
> -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}} to configure the build.
>  I noticed that there was a {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the 
> output, but the Boost library that I installed from the package manager was 
> version {{1.65.1.0ubuntu1}}. Could this be the cause of the problem?
> PS:
> I was able to build the benchmark 
> [before|https://issues.apache.org/jira/browse/ARROW-7200]. It was on AWS with 
> the OS being ubuntu-bionic-18.04-amd64-server-20191002, which should be very 
> similar to the one I'm using on my laptop.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8590) [Rust] Use Arrow pretty print utility in DataFusion

2020-04-24 Thread Mark Hildreth (Jira)
Mark Hildreth created ARROW-8590:


 Summary: [Rust] Use Arrow pretty print utility in DataFusion
 Key: ARROW-8590
 URL: https://issues.apache.org/jira/browse/ARROW-8590
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Mark Hildreth


ARROW-8287 added some new utility methods for pretty printing into the rust 
arrow crate (see [PR 6972|https://github.com/apache/arrow/pull/6972]). These 
were basically pulled from DataFusion. Modify DataFusion to use the utility 
methods in the arrow crate, removing the duplicate code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-8575) [Developer] Add issue_comment workflow to rebase a PR

2020-04-24 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-8575.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7028
[https://github.com/apache/arrow/pull/7028]

> [Developer] Add issue_comment workflow to rebase a PR
> -
>
> Key: ARROW-8575
> URL: https://issues.apache.org/jira/browse/ARROW-8575
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (ARROW-8587) Compilation error when linking arrow-flight-perf-server

2020-04-24 Thread Chengxin Ma (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxin Ma reopened ARROW-8587:


> Compilation error when linking arrow-flight-perf-server
> ---
>
> Key: ARROW-8587
> URL: https://issues.apache.org/jira/browse/ARROW-8587
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Benchmarking, C++, FlightRPC
>Affects Versions: 0.17.0
> Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar 
> 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Chengxin Ma
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I wanted to play around with the Flight benchmark after seeing the discussion 
> regarding Flight's throughput on the Arrow dev mailing list today.
> I hit the following error when trying to build the benchmark from the latest 
> source code:
> {code:java}
> [ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::canonical(boost::filesystem::path const&, 
> boost::filesystem::path const&, boost::system::error_code*)'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::system_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::parent_path() const'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::generic_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::current_path(boost::system::error_code*)'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `inflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `deflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::operator/=(boost::filesystem::path const&)'
> collect2: error: ld returned 1 exit status
> src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: 
> recipe for target 'debug/arrow-flight-perf-server' failed
> make[2]: *** [debug/arrow-flight-perf-server] Error 1
> CMakeFiles/Makefile2:2609: recipe for target 
> 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed
> make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] 
> Error 2
> Makefile:140: recipe for target 'all' failed
> make: *** [all] Error 2
> {code}
> I was using {{cmake .. -DCMAKE_BUILD_TYPE=Debug 
> -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON 
> -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}} to configure the build.
>  I noticed that there was a {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the 
> output, but the Boost library that I installed from the package manager was 
> version {{1.65.1.0ubuntu1}}. Could this be the cause of the problem?
> PS:
> I was able to build the benchmark 
> [before|https://issues.apache.org/jira/browse/ARROW-7200]. It was on AWS with 
> the OS being ubuntu-bionic-18.04-amd64-server-20191002, which should be very 
> similar to the one I'm using on my laptop.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8587) Compilation error when linking arrow-flight-perf-server

2020-04-24 Thread Chengxin Ma (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091863#comment-17091863
 ] 

Chengxin Ma commented on ARROW-8587:


Thanks for the quick fix.

Unfortunately I still see the following error messages:
{code}
[ 96%] Linking CXX executable ../../../release/arrow-flight-perf-server
../../../release/libarrow_flight.so.18.0.0: undefined reference to 
`inflateInit2_'
../../../release/libarrow_flight.so.18.0.0: undefined reference to `inflate'
../../../release/libarrow_flight.so.18.0.0: undefined reference to 
`deflateInit2_'
../../../release/libarrow_flight.so.18.0.0: undefined reference to `deflate'
../../../release/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd'
../../../release/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd'
collect2: error: ld returned 1 exit status
src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:156: recipe 
for target 'release/arrow-flight-perf-server' failed
make[2]: *** [release/arrow-flight-perf-server] Error 1
CMakeFiles/Makefile2:2648: recipe for target 
'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed
make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] 
Error 2
Makefile:140: recipe for target 'all' failed
make: *** [all] Error 2
{code}

This seems to be a problem related to {{zlib}}. On my computer it is the latest 
version: {{zlib1g-dev is already the newest version (1:1.2.11.dfsg-0ubuntu2).}}

I guess this issue is still related to {{ThirdpartyToolchain.cmake}}?

> Compilation error when linking arrow-flight-perf-server
> ---
>
> Key: ARROW-8587
> URL: https://issues.apache.org/jira/browse/ARROW-8587
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Benchmarking, C++, FlightRPC
>Affects Versions: 0.17.0
> Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar 
> 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Chengxin Ma
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I wanted to play around with the Flight benchmark after seeing today's 
> discussion regarding Flight's throughput on the Arrow dev mailing list.
> I encountered the following error when trying to build the benchmark from the 
> latest source code:
> {code:java}
> [ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::canonical(boost::filesystem::path const&, 
> boost::filesystem::path const&, boost::system::error_code*)'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::system_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::parent_path() const'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::generic_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::current_path(boost::system::error_code*)'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `inflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `deflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::operator/=(boost::filesystem::path const&)'
> collect2: error: ld returned 1 exit status
> src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: 
> recipe for target 'debug/arrow-flight-perf-server' failed
> make[2]: *** [debug/arrow-flight-perf-server] Error 1
> CMakeFiles/Makefile2:2609: recipe for target 
> 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed
> make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] 
> Error 2
> Makefile:140: recipe for target 'all' failed
> make: *** [all] Error 2
> {code}
> I was using {{cmake .. -DCMAKE_BUILD_TYPE=Debug 
> -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON 
> -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}} to configure the build.
>  I noticed that there was a {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the 
> output, but the Boost library that I installed from the package manager was of 
> this version: {{1.65.1.0ubuntu1}}. Could this be the cause of the problem?

[jira] [Commented] (ARROW-7244) [Python] Inconsistent behavior with reading in S3 parquet objects

2020-04-24 Thread Harini Kannan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091841#comment-17091841
 ] 

Harini Kannan commented on ARROW-7244:
--

Any update on this? I'm seeing the same error pop up randomly from a Lambda 
function that triggers on new Parquet files in an S3 bucket and reads them with 
ParquetDataset(). Is there any workaround?
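
In the meantime, a retry-with-cache-invalidation sketch that may work around it 
(assuming s3fs is the filesystem layer; s3fs caches directory listings, which is 
one plausible reason a freshly written key can intermittently look like a 
non-file path, so this helper is illustrative rather than a confirmed fix):

{code:python}
import time

import s3fs
from pyarrow.parquet import ParquetDataset


def open_dataset_with_retry(url, attempts=3, delay=1.0):
    """Retry ParquetDataset, clearing s3fs's listing cache between tries."""
    fs = s3fs.S3FileSystem()
    for attempt in range(attempts):
        try:
            return ParquetDataset(url, filesystem=fs)
        except OSError:
            if attempt == attempts - 1:
                raise
            fs.invalidate_cache()  # drop cached listings before retrying
            time.sleep(delay)
{code}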

> [Python] Inconsistent behavior with reading in S3 parquet objects
> -
>
> Key: ARROW-7244
> URL: https://issues.apache.org/jira/browse/ARROW-7244
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.15.1
> Environment: running in a lambda, compiled on an EC2 using linux
>Reporter: William Tardio
>Priority: Major
>
> We are piloting pyarrow for reading Parquet files from AWS S3.
>  
> We got it working in combination with s3fs as the filesystem. However, we are 
> seeing very inconsistent results when reading in parquet objects with
> s3=s3fs.S3FileSystem()
> ParquetDataset(url, filesystem=s3)
>  
> The read inconsistently throws this error:
>  
> [ERROR] OSError: Passed non-file path: 
> s3://bucket/schedule/sxaup/fms_db_aub/adn_master/trunc/20191122024436.parquet
> Traceback (most recent call last):
>   File "/var/task/file_check.py", line 35, in lambda_handler
> main(event, context)
>   File "/var/task/file_check.py", line 260, in main
> validate_resp['object_type'])
>   File "/opt/python/utils.py", line 80, in schema_check
> stage_pya_dataset = ParquetDataset(full_URL_stage, filesystem=s3)
>   File "/opt/python/lib/python3.7/site-packages/pyarrow/parquet.py", line 
> 1030, in __init__
> open_file_func=partial(_open_dataset_file, self._metadata)
>   File "/opt/python/lib/python3.7/site-packages/pyarrow/parquet.py", line 
> 1229, in _make_manifest
> .format(path))
>  
> As you can see, the path is valid and sometimes works, other times it does not 
> (no modification of the file between the successful and failing runs). Does 
> ParquetDataset actually open and validate the file, so that the error is 
> actually about the data?
>  
> Willing to do any troubleshooting to get this solved.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7759) [C++][Dataset] Add CsvFileFormat for CSV support

2020-04-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-7759:
--
Labels: dataset pull-request-available  (was: dataset)

> [C++][Dataset] Add CsvFileFormat for CSV support
> 
>
> Key: ARROW-7759
> URL: https://issues.apache.org/jira/browse/ARROW-7759
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Francois Saint-Jacques
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: dataset, pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This should be a minimal implementation that binds file and ScanTask 1-1 for 
> now. Streaming optimizations can be done in ARROW-3410.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-8587) Compilation error when linking arrow-flight-perf-server

2020-04-24 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-8587.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7031
[https://github.com/apache/arrow/pull/7031]

> Compilation error when linking arrow-flight-perf-server
> ---
>
> Key: ARROW-8587
> URL: https://issues.apache.org/jira/browse/ARROW-8587
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Benchmarking, C++, FlightRPC
>Affects Versions: 0.17.0
> Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar 
> 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Chengxin Ma
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I wanted to play around with the Flight benchmark after seeing today's 
> discussion regarding Flight's throughput on the Arrow dev mailing list.
> I encountered the following error when trying to build the benchmark from the 
> latest source code:
> {code:java}
> [ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::canonical(boost::filesystem::path const&, 
> boost::filesystem::path const&, boost::system::error_code*)'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::system_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::parent_path() const'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::generic_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::current_path(boost::system::error_code*)'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `inflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `deflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::operator/=(boost::filesystem::path const&)'
> collect2: error: ld returned 1 exit status
> src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: 
> recipe for target 'debug/arrow-flight-perf-server' failed
> make[2]: *** [debug/arrow-flight-perf-server] Error 1
> CMakeFiles/Makefile2:2609: recipe for target 
> 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed
> make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] 
> Error 2
> Makefile:140: recipe for target 'all' failed
> make: *** [all] Error 2
> {code}
> I was using {{cmake .. -DCMAKE_BUILD_TYPE=Debug 
> -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON 
> -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}} to configure the build.
>  I noticed that there was a {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the 
> output, but the Boost library that I installed from the package manager was of 
> this version: {{1.65.1.0ubuntu1}}. Could this be the cause of the problem?
> PS:
> I was able to build the benchmark 
> [before|https://issues.apache.org/jira/browse/ARROW-7200]. It was on AWS with 
> the OS being ubuntu-bionic-18.04-amd64-server-20191002, which should be very 
> similar to the one I'm using on my laptop.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8589) ModuleNotFoundError: No module named 'pyarrow._orc'

2020-04-24 Thread ryan (Jira)
ryan created ARROW-8589:
---

 Summary: ModuleNotFoundError: No module named 'pyarrow._orc'
 Key: ARROW-8589
 URL: https://issues.apache.org/jira/browse/ARROW-8589
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.17.0, 0.16.0, 0.15.1, 0.15.0, 0.14.1, 0.14.0
 Environment: I am on a Mac; the OS version is Mojave 10.14.6

python 3.6.10

I am using a conda env, but I actually needed to use pip to install all the 
packages including pyarrow.
Reporter: ryan


When using version 0.17.0, this error happens when I try to `import pyarrow.orc 
as orc`:
{code:java}
Traceback (most recent call last):
  File "", line 971, in _find_and_load
  File "", line 955, in _find_and_load_unlocked
  File "", line 665, in _load_unlocked
  File "", line 678, in exec_module
  File "", line 219, in _call_with_frames_removed
  File "/Users/ryconnolly/code/source-syncer/sourcesyncer/s3_source_syncer.py", 
line 9, in 
import pyarrow.orc as orc
  File 
"/Users/ryconnolly/anaconda3/envs/source-syncer/lib/python3.6/site-packages/pyarrow/orc.py",
 line 24, in 
import pyarrow._orc as _orc
ModuleNotFoundError: No module named 'pyarrow._orc'{code}

The current workaround is to pin the version to 0.13.0.
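
A quick diagnostic sketch to confirm whether the installed wheel actually ships 
the ORC extension (illustrative only):

{code:python}
import importlib.util

import pyarrow

print(pyarrow.__version__, pyarrow.__file__)
# None here means this build of pyarrow contains no _orc extension module,
# which would explain the ModuleNotFoundError above.
print(importlib.util.find_spec("pyarrow._orc"))
{code}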



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8588) `driver` param removed from `hdfs.connect()`

2020-04-24 Thread Jack Fan (Jira)
Jack Fan created ARROW-8588:
---

 Summary: `driver` param removed from `hdfs.connect()`
 Key: ARROW-8588
 URL: https://issues.apache.org/jira/browse/ARROW-8588
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.17.0
Reporter: Jack Fan


Hi,

It appears that in ARROW-7863 the `driver` param was removed from the 
`hdfs.connect()` function. However, if I understand it correctly, ARROW-7863 
should only remove the `libhdfs3`-related tests, not disable the driver entirely.

If I instantiate the `pyarrow.HadoopFileSystem` class directly, it still 
accepts the `driver` param.

Can the Arrow project check whether this API change is intended?

Also, even if it is intended, this is a breaking change and deserves some 
documentation around it.
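
For reference, a minimal probe of the behavior described above (a sketch: the 
host and port are placeholders, and the class-level behavior is as reported 
here, not independently verified):

{code:python}
import pyarrow
import pyarrow.hdfs

try:
    # On 0.17.0 this raises TypeError: connect() no longer accepts `driver`.
    fs = pyarrow.hdfs.connect("namenode", 8020, driver="libhdfs")
except TypeError as exc:
    print("hdfs.connect rejected driver:", exc)

# Instantiating the class directly still takes the param, per this report.
fs = pyarrow.HadoopFileSystem("namenode", 8020, driver="libhdfs")
{code}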



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8557) [Python] from pyarrow import parquet fails with AttributeError: type object 'pyarrow._parquet.Statistics' has no attribute '__reduce_cython__'

2020-04-24 Thread Hal T (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091791#comment-17091791
 ] 

Hal T commented on ARROW-8557:
--

No, this is in a Jupyter notebook on a Debian 8 environment, using Python 3.6.4.
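
One thing worth checking in that notebook (a diagnostic sketch; stale or 
duplicate compiled extensions are a common cause of {{__reduce_cython__}} 
errors, though not confirmed as the cause here):

{code:python}
import glob
import os

import pyarrow

# If the import path points somewhere unexpected (e.g. an old pip install
# shadowing the active environment), or several _parquet binaries coexist,
# that mismatch can produce the AttributeError above.
print(pyarrow.__version__, pyarrow.__file__)
print(glob.glob(os.path.join(os.path.dirname(pyarrow.__file__), "_parquet*")))
{code}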

> [Python] from pyarrow import parquet fails with AttributeError: type object 
> 'pyarrow._parquet.Statistics' has no attribute '__reduce_cython__'
> --
>
> Key: ARROW-8557
> URL: https://issues.apache.org/jira/browse/ARROW-8557
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.15.1, 0.16.0, 0.17.0
> Environment: Python 3.8.4, GCC 4.8.4, Debian 8
>Reporter: Hal T
>Priority: Major
>
> I have tried versions 0.15.1, 0.16.0, and 0.17.0; the same error occurs on all. 
> I've seen in other issues that co-installations of tensorflow and numpy might 
> be causing problems. I have tensorflow==1.14.0 and numpy==1.16.4 (and many 
> other libraries, but I've read that those two tend to cause issues).
>  
>  
> {code:java}
> from pyarrow import parquet
>  
> ~/python/lib/python3.6/site-packages/pyarrow/parquet.py in <module>
>  32 import pyarrow as pa
>  33 import pyarrow.lib as lib
> ---> 34 import pyarrow._parquet as _parquet
>  35 
>  36 from pyarrow._parquet import (ParquetReader, Statistics, # noqa
> ~/python/lib/python3.6/site-packages/pyarrow/_parquet.pyx in init 
> pyarrow._parquet()
>  
> AttributeError: type object 'pyarrow._parquet.Statistics' has no attribute 
> '__reduce_cython__'
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8587) Compilation error when linking arrow-flight-perf-server

2020-04-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8587:
--
Labels: pull-request-available  (was: )

> Compilation error when linking arrow-flight-perf-server
> ---
>
> Key: ARROW-8587
> URL: https://issues.apache.org/jira/browse/ARROW-8587
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Benchmarking, C++, FlightRPC
>Affects Versions: 0.17.0
> Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar 
> 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Chengxin Ma
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I wanted to play around with the Flight benchmark after seeing today's 
> discussion regarding Flight's throughput on the Arrow dev mailing list.
> I encountered the following error when trying to build the benchmark from the 
> latest source code:
> {code:java}
> [ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::canonical(boost::filesystem::path const&, 
> boost::filesystem::path const&, boost::system::error_code*)'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::system_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::parent_path() const'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::generic_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::current_path(boost::system::error_code*)'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `inflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `deflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::operator/=(boost::filesystem::path const&)'
> collect2: error: ld returned 1 exit status
> src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: 
> recipe for target 'debug/arrow-flight-perf-server' failed
> make[2]: *** [debug/arrow-flight-perf-server] Error 1
> CMakeFiles/Makefile2:2609: recipe for target 
> 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed
> make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] 
> Error 2
> Makefile:140: recipe for target 'all' failed
> make: *** [all] Error 2
> {code}
> I was using {{cmake .. -DCMAKE_BUILD_TYPE=Debug 
> -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON 
> -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}} to configure the build.
>  I noticed that there was a {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the 
> output, but the Boost library that I installed from the package manager was of 
> this version: {{1.65.1.0ubuntu1}}. Could this be the cause of the problem?
> PS:
> I was able to build the benchmark 
> [before|https://issues.apache.org/jira/browse/ARROW-7200]. It was on AWS with 
> the OS being ubuntu-bionic-18.04-amd64-server-20191002, which should be very 
> similar to the one I'm using on my laptop.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8587) Compilation error when linking arrow-flight-perf-server

2020-04-24 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091776#comment-17091776
 ] 

Antoine Pitrou commented on ARROW-8587:
---

By the way, you should build the benchmarks in release mode, not debug.

> Compilation error when linking arrow-flight-perf-server
> ---
>
> Key: ARROW-8587
> URL: https://issues.apache.org/jira/browse/ARROW-8587
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Benchmarking, C++, FlightRPC
>Affects Versions: 0.17.0
> Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar 
> 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Chengxin Ma
>Assignee: Antoine Pitrou
>Priority: Minor
>
> I wanted to play around with the Flight benchmark after seeing today's 
> discussion regarding Flight's throughput on the Arrow dev mailing list.
> I encountered the following error when trying to build the benchmark from the 
> latest source code:
> {code:java}
> [ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::canonical(boost::filesystem::path const&, 
> boost::filesystem::path const&, boost::system::error_code*)'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::system_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::parent_path() const'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::generic_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::current_path(boost::system::error_code*)'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `inflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `deflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::operator/=(boost::filesystem::path const&)'
> collect2: error: ld returned 1 exit status
> src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: 
> recipe for target 'debug/arrow-flight-perf-server' failed
> make[2]: *** [debug/arrow-flight-perf-server] Error 1
> CMakeFiles/Makefile2:2609: recipe for target 
> 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed
> make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] 
> Error 2
> Makefile:140: recipe for target 'all' failed
> make: *** [all] Error 2
> {code}
> I was using {{cmake .. -DCMAKE_BUILD_TYPE=Debug 
> -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON 
> -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}} to configure the build.
>  I noticed that there was a {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the 
> output, but the Boost library that I installed from the package manager was of 
> this version: {{1.65.1.0ubuntu1}}. Could this be the cause of the problem?
> PS:
> I was able to build the benchmark 
> [before|https://issues.apache.org/jira/browse/ARROW-7200]. It was on AWS with 
> the OS being ubuntu-bionic-18.04-amd64-server-20191002, which should be very 
> similar to the one I'm using on my laptop.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-8587) Compilation error when linking arrow-flight-perf-server

2020-04-24 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-8587:
-

Assignee: Antoine Pitrou

> Compilation error when linking arrow-flight-perf-server
> ---
>
> Key: ARROW-8587
> URL: https://issues.apache.org/jira/browse/ARROW-8587
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Benchmarking, C++, FlightRPC
>Affects Versions: 0.17.0
> Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar 
> 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Chengxin Ma
>Assignee: Antoine Pitrou
>Priority: Minor
>
> I wanted to play around with the Flight benchmark after seeing today's 
> discussion regarding Flight's throughput on the Arrow dev mailing list.
> I encountered the following error when trying to build the benchmark from the 
> latest source code:
> {code:java}
> [ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::canonical(boost::filesystem::path const&, 
> boost::filesystem::path const&, boost::system::error_code*)'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::system_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::parent_path() const'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::generic_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::current_path(boost::system::error_code*)'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `inflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `deflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::operator/=(boost::filesystem::path const&)'
> collect2: error: ld returned 1 exit status
> src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: 
> recipe for target 'debug/arrow-flight-perf-server' failed
> make[2]: *** [debug/arrow-flight-perf-server] Error 1
> CMakeFiles/Makefile2:2609: recipe for target 
> 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed
> make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] 
> Error 2
> Makefile:140: recipe for target 'all' failed
> make: *** [all] Error 2
> {code}
> I was using {{cmake .. -DCMAKE_BUILD_TYPE=Debug 
> -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON 
> -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}} to configure the build.
>  I noticed that there was a {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the 
> output, but the Boost library that I installed from the package manager was of 
> this version: {{1.65.1.0ubuntu1}}. Could this be the cause of the problem?
> PS:
> I was able to build the benchmark 
> [before|https://issues.apache.org/jira/browse/ARROW-7200]. It was on AWS with 
> the OS being ubuntu-bionic-18.04-amd64-server-20191002, which should be very 
> similar to the one I'm using on my laptop.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8587) Compilation error when linking arrow-flight-perf-server

2020-04-24 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091770#comment-17091770
 ] 

Antoine Pitrou commented on ARROW-8587:
---

I've bisected and the culprit is ARROW-7869.

> Compilation error when linking arrow-flight-perf-server
> ---
>
> Key: ARROW-8587
> URL: https://issues.apache.org/jira/browse/ARROW-8587
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Benchmarking, C++, FlightRPC
>Affects Versions: 0.17.0
> Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar 
> 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Chengxin Ma
>Priority: Minor
>
> I wanted to play around with the Flight benchmark after seeing today's 
> discussion regarding Flight's throughput on the Arrow dev mailing list.
> I encountered the following error when trying to build the benchmark from the 
> latest source code:
> {code:java}
> [ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::canonical(boost::filesystem::path const&, 
> boost::filesystem::path const&, boost::system::error_code*)'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::system_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::parent_path() const'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::generic_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::current_path(boost::system::error_code*)'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `inflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `deflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::operator/=(boost::filesystem::path const&)'
> collect2: error: ld returned 1 exit status
> src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: 
> recipe for target 'debug/arrow-flight-perf-server' failed
> make[2]: *** [debug/arrow-flight-perf-server] Error 1
> CMakeFiles/Makefile2:2609: recipe for target 
> 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed
> make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] 
> Error 2
> Makefile:140: recipe for target 'all' failed
> make: *** [all] Error 2
> {code}
> I was using {{cmake .. -DCMAKE_BUILD_TYPE=Debug 
> -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON 
> -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}} to configure the build.
>  I noticed that there was a {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the 
> output, but the Boost library that I installed from the package manager was of 
> this version: {{1.65.1.0ubuntu1}}. Could this be the cause of the problem?
> PS:
> I was able to build the benchmark 
> [before|https://issues.apache.org/jira/browse/ARROW-7200]. It was on AWS with 
> the OS being ubuntu-bionic-18.04-amd64-server-20191002, which should be very 
> similar to the one I'm using on my laptop.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7869) [Python] Boost::system and boost::filesystem not necessary anymore in Python wheels

2020-04-24 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091772#comment-17091772
 ] 

Antoine Pitrou commented on ARROW-7869:
---

This seems to have caused ARROW-8587.

> [Python] Boost::system and boost::filesystem not necessary anymore in Python 
> wheels
> ---
>
> Key: ARROW-7869
> URL: https://issues.apache.org/jira/browse/ARROW-7869
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Packaging, Python
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Unfortunately it seems we still need boost::regex due to Parquet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8587) Compilation error when linking arrow-flight-perf-server

2020-04-24 Thread Chengxin Ma (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091761#comment-17091761
 ] 

Chengxin Ma commented on ARROW-8587:


Additional information: I still saw this error after rolling back the code base 
with {{git checkout apache-arrow-0.17.0}}.

> Compilation error when linking arrow-flight-perf-server
> ---
>
> Key: ARROW-8587
> URL: https://issues.apache.org/jira/browse/ARROW-8587
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Benchmarking, C++, FlightRPC
>Affects Versions: 0.17.0
> Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar 
> 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Chengxin Ma
>Priority: Minor
>
> I wanted to play around with the Flight benchmark after seeing today's 
> discussion regarding Flight's throughput on the Arrow dev mailing list.
> I encountered the following error when trying to build the benchmark from the 
> latest source code:
> {code:java}
> [ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::canonical(boost::filesystem::path const&, 
> boost::filesystem::path const&, boost::system::error_code*)'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::system_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::parent_path() const'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::generic_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::current_path(boost::system::error_code*)'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `inflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `deflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::operator/=(boost::filesystem::path const&)'
> collect2: error: ld returned 1 exit status
> src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: 
> recipe for target 'debug/arrow-flight-perf-server' failed
> make[2]: *** [debug/arrow-flight-perf-server] Error 1
> CMakeFiles/Makefile2:2609: recipe for target 
> 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed
> make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] 
> Error 2
> Makefile:140: recipe for target 'all' failed
> make: *** [all] Error 2
> {code}
> I was using {{cmake .. -DCMAKE_BUILD_TYPE=Debug 
> -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON 
> -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}} to configure the build.
>  I noticed that there was a {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the 
> output, but the Boost library that I installed from the package manager was of 
> this version: {{1.65.1.0ubuntu1}}. Could this be the cause of the problem?
> PS:
> I was able to build the benchmark 
> [before|https://issues.apache.org/jira/browse/ARROW-7200]. It was on AWS with 
> the OS being ubuntu-bionic-18.04-amd64-server-20191002, which should be very 
> similar to the one I'm using on my laptop.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8587) Compilation error when linking arrow-flight-perf-server

2020-04-24 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-8587:
--
Affects Version/s: (was: 1.0.0)
   0.17.0

> Compilation error when linking arrow-flight-perf-server
> ---
>
> Key: ARROW-8587
> URL: https://issues.apache.org/jira/browse/ARROW-8587
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Benchmarking, C++, FlightRPC
>Affects Versions: 0.17.0
> Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar 
> 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Chengxin Ma
>Priority: Minor
>
> I wanted to play around with the Flight benchmark after seeing today's 
> discussion regarding Flight's throughput on the Arrow dev mailing list.
> I encountered the following error when trying to build the benchmark from the 
> latest source code:
> {code:java}
> [ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::canonical(boost::filesystem::path const&, 
> boost::filesystem::path const&, boost::system::error_code*)'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::system_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::parent_path() const'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::generic_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::current_path(boost::system::error_code*)'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `inflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `deflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::operator/=(boost::filesystem::path const&)'
> collect2: error: ld returned 1 exit status
> src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: 
> recipe for target 'debug/arrow-flight-perf-server' failed
> make[2]: *** [debug/arrow-flight-perf-server] Error 1
> CMakeFiles/Makefile2:2609: recipe for target 
> 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed
> make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] 
> Error 2
> Makefile:140: recipe for target 'all' failed
> make: *** [all] Error 2
> {code}
> I was using {{cmake .. -DCMAKE_BUILD_TYPE=Debug 
> -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON 
> -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}} to configure the build.
>  I noticed that there was a {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the 
> output, but the Boost library that I installed from the package manager was of 
> this version: {{1.65.1.0ubuntu1}}. Could this be the cause of the problem?
> PS:
> I was able to build the benchmark 
> [before|https://issues.apache.org/jira/browse/ARROW-7200]. It was on AWS with 
> the OS being ubuntu-bionic-18.04-amd64-server-20191002, which should be very 
> similar to the one I'm using on my laptop.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8587) Compilation error when linking arrow-flight-perf-server

2020-04-24 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091743#comment-17091743
 ] 

Antoine Pitrou commented on ARROW-8587:
---

Weirdly, I get the same error now on Ubuntu 18.04. I used to be able to build 
it, so something broke along the way.

> Compilation error when linking arrow-flight-perf-server
> ---
>
> Key: ARROW-8587
> URL: https://issues.apache.org/jira/browse/ARROW-8587
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Benchmarking, C++, FlightRPC
>Affects Versions: 1.0.0
> Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar 
> 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Chengxin Ma
>Priority: Minor
>
> I wanted to play around with the Flight benchmark after seeing today's 
> discussion regarding Flight's throughput on the Arrow dev mailing list.
> I encountered the following error when trying to build the benchmark from the 
> latest source code:
> {code:java}
> [ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::canonical(boost::filesystem::path const&, 
> boost::filesystem::path const&, boost::system::error_code*)'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::system_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::parent_path() const'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::system::generic_category()'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::detail::current_path(boost::system::error_code*)'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `inflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to 
> `deflateInit2_'
> ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd'
> ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
> `boost::filesystem::path::operator/=(boost::filesystem::path const&)'
> collect2: error: ld returned 1 exit status
> src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: 
> recipe for target 'debug/arrow-flight-perf-server' failed
> make[2]: *** [debug/arrow-flight-perf-server] Error 1
> CMakeFiles/Makefile2:2609: recipe for target 
> 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed
> make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] 
> Error 2
> Makefile:140: recipe for target 'all' failed
> make: *** [all] Error 2
> {code}
> I was using {{cmake .. -DCMAKE_BUILD_TYPE=Debug 
> -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON 
> -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}} to configure the build.
>  I noticed that there was a {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the 
> output, but the Boost library that I installed from the package manager was of 
> this version: {{1.65.1.0ubuntu1}}. Could this be the cause of the problem?
> PS:
> I was able to build the benchmark 
> [before|https://issues.apache.org/jira/browse/ARROW-7200]. It was on AWS with 
> the OS being ubuntu-bionic-18.04-amd64-server-20191002, which should be very 
> similar to the one I'm using on my laptop.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-8559) [Rust] Consolidate Record Batch iterator traits in main arrow crate

2020-04-24 Thread Mark Hildreth (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091739#comment-17091739
 ] 

Mark Hildreth edited comment on ARROW-8559 at 4/24/20, 5:00 PM:


Generally in favor, but one question and one bikeshed:

Question: perhaps my Rust-fu is lacking, but why would we also need a 
{{SendableBatchIterator}}? If we want to make sure that a type marks itself 
{{Send}} and/or {{Sync}}, it can do that. If an interface wants to accept only 
{{Send}} and/or {{Sync}} iterators, it could do {{BatchIterator + Send + Sync}}.

Bikeshed: There is no {{std::iter::Iterator}} trait implementation for either 
{{BatchIterator}} or {{RecordBatchReader}}. Thus, using the name {{Iterator}} 
seems a bit misleading.


was (Author: markhildreth):
Generally in favor, but one question and one bikeshed:

Question: perhaps my Rust-fu is lacking, but why would we need a 
{{SendableBatchIterator}}? If we want to make sure that a type marks itself 
{{Send}} and/or {{Sync}}, it can do that. If an interface wants to accept only 
{{Send}} and/or {{Sync}} iterators, it could do {{BatchIterator + Send + Sync}}.

Bikeshed: There is no {{std::iter::Iterator}} trait implementation for either 
{{BatchIterator}} or {{RecordBatchReader}}. Thus, using the name {{Iterator}} 
seems a bit misleading.

> [Rust] Consolidate Record Batch iterator traits in main arrow crate
> ---
>
> Key: ARROW-8559
> URL: https://issues.apache.org/jira/browse/ARROW-8559
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Major
>
> We have the `BatchIterator` trait in DataFusion and the `RecordBatchReader` 
> trait in the main arrow crate.
> They differ in that `BatchIterator` is Send + Sync. They should both be in 
> the Arrow crate and be named `BatchIterator` and `SendableBatchIterator`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8559) [Rust] Consolidate Record Batch iterator traits in main arrow crate

2020-04-24 Thread Mark Hildreth (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091739#comment-17091739
 ] 

Mark Hildreth commented on ARROW-8559:
--

Generally in favor, but one question and one bikeshed:

Question: perhaps my Rust-fu is lacking, but why would we need a 
{{SendableBatchIterator}}? If we want to make sure that a type marks itself 
{{Send}} and/or {{Sync}}, it can do that. If an interface wants to accept only 
{{Send}} and/or {{Sync}} iterators, it could do {{BatchIterator + Send + Sync}}.

Bikeshed: There is no {{std::iter::Iterator}} trait implementation for either 
{{BatchIterator}} or {{RecordBatchReader}}. Thus, using the name {{Iterator}} 
seems a bit misleading.

> [Rust] Consolidate Record Batch iterator traits in main arrow crate
> ---
>
> Key: ARROW-8559
> URL: https://issues.apache.org/jira/browse/ARROW-8559
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Major
>
> We have the `BatchIterator` trait in DataFusion and the `RecordBatchReader` 
> trait in the main arrow crate.
> They differ in that `BatchIterator` is Send + Sync. They should both be in 
> the Arrow crate and be named `BatchIterator` and `SendableBatchIterator`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8580) Pyarrow exceptions are not helpful

2020-04-24 Thread Soroush Radpour (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soroush Radpour updated ARROW-8580:
---
Description: 
I'm trying to understand an exception in the code using pyarrow, and it is not 
very helpful.

File "pyarrow/_parquet.pyx", line 1036, in pyarrow._parquet.ParquetReader.open
 File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
 OSError: IOError: b'Service Unavailable'. Detail: Python exception: 
RuntimeError
  
  It would be great if each of the three exceptions was unwrapped with full 
stack trace and error messages that came with it.

  was:
I'm trying to understand an exception in the code using pyarrow, and it is not 
very helpful.
 {{  File "pyarrow/_parquet.pyx", line 1036, in 
pyarrow._parquet.ParquetReader.open
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
OSError: IOError: b'Service Unavailable'. Detail: Python exception: 
RuntimeError}}
 
 It would be great if each of the three exceptions was unwrapped with full 
stack trace and error messages that came with it. 


> Pyarrow exceptions are not helpful
> --
>
> Key: ARROW-8580
> URL: https://issues.apache.org/jira/browse/ARROW-8580
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Soroush Radpour
>Priority: Major
>
> I'm trying to understand an exception in the code using pyarrow, and it is 
> not very helpful.
> File "pyarrow/_parquet.pyx", line 1036, in pyarrow._parquet.ParquetReader.open
>  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
>  OSError: IOError: b'Service Unavailable'. Detail: Python exception: 
> RuntimeError
>   
>   It would be great if each of the three exceptions was unwrapped with full 
> stack trace and error messages that came with it.
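
As an illustration of the problem: a caller can inspect the exception chain, but 
here the inner RuntimeError has been flattened into the message text, so there 
is nothing structured left to unwrap (a sketch; the failing call is 
hypothetical):

{code:python}
import traceback

import pyarrow.parquet as pq

try:
    pq.ParquetDataset("s3://bucket/key.parquet")  # hypothetical failing call
except OSError as exc:
    traceback.print_exception(type(exc), exc, exc.__traceback__)
    # Both of these tend to be None for errors translated from C++, which is
    # why only the flattened detail string survives.
    print("cause:", repr(exc.__cause__), "context:", repr(exc.__context__))
{code}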



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8587) Compilation error when linking arrow-flight-perf-server

2020-04-24 Thread Chengxin Ma (Jira)
Chengxin Ma created ARROW-8587:
--

 Summary: Compilation error when linking arrow-flight-perf-server
 Key: ARROW-8587
 URL: https://issues.apache.org/jira/browse/ARROW-8587
 Project: Apache Arrow
  Issue Type: Bug
  Components: Benchmarking, C++, FlightRPC
Affects Versions: 1.0.0
 Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar 
31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Reporter: Chengxin Ma


I wanted to play around with the Flight benchmark after seeing today's 
discussion regarding Flight's throughput on the Arrow dev mailing list.

I encountered the following error when trying to build the benchmark from the 
latest source code:
{code:java}
[ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server
../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
`boost::filesystem::detail::canonical(boost::filesystem::path const&, 
boost::filesystem::path const&, boost::system::error_code*)'
../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
`boost::system::system_category()'
../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
`boost::filesystem::path::parent_path() const'
../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate'
../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd'
../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
`boost::system::generic_category()'
../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
`boost::filesystem::detail::current_path(boost::system::error_code*)'
../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateInit2_'
../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate'
../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateInit2_'
../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd'
../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
`boost::filesystem::path::operator/=(boost::filesystem::path const&)'
collect2: error: ld returned 1 exit status
src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: recipe 
for target 'debug/arrow-flight-perf-server' failed
make[2]: *** [debug/arrow-flight-perf-server] Error 1
CMakeFiles/Makefile2:2609: recipe for target 
'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed
make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] 
Error 2
Makefile:140: recipe for target 'all' failed
make: *** [all] Error 2

{code}
I was using {{cmake .. -DCMAKE_BUILD_TYPE=Debug -DARROW_DEPENDENCY_SOURCE=AUTO 
-DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON 
-DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}} to configure the build.
 I noticed that there was a {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the 
output, but the Boost library that I installed from the package manager was of 
this version: {{1.65.1.0ubuntu1}}. Could this be the cause of the problem?

PS:
I was able to build the benchmark 
[before|https://issues.apache.org/jira/browse/ARROW-7200]. It was on AWS with 
the OS being ubuntu-bionic-18.04-amd64-server-20191002, which should be very 
similar to the one I'm using on my laptop.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8556) [R] zstd symbol not found on Ubuntu 19.10

2020-04-24 Thread Karl Dunkle Werner (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091700#comment-17091700
 ] 

Karl Dunkle Werner commented on ARROW-8556:
---

Great!

If you want to get to the bottom of it, I would be happy to run commands you 
send me. I think most 19.10 users will be moving to 20.04 soon, so this might 
only be worth it if 20.04 experiences the same issue.

> [R] zstd symbol not found on Ubuntu 19.10
> -
>
> Key: ARROW-8556
> URL: https://issues.apache.org/jira/browse/ARROW-8556
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: Ubuntu 19.10
> R 3.6.1
>Reporter: Karl Dunkle Werner
>Priority: Major
>
> I would like to install the `arrow` R package on my Ubuntu 19.10 system. 
> Prebuilt binaries are unavailable, and I want to enable compression, so I set 
> the {{LIBARROW_MINIMAL=false}} environment variable. When I do so, it looks 
> like the package is able to compile, but can't be loaded. I'm able to install 
> correctly if I don't set the {{LIBARROW_MINIMAL}} variable.
> Here's the error I get:
> {code:java}
> ** testing if installed package can be loaded from temporary location
> Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath 
> = DLLpath, ...):
>  unable to load shared object 
> '~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so':
>   ~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so: undefined symbol: 
> ZSTD_initCStream
> Error: loading failed
> Execution halted
> ERROR: loading failed
> * removing ‘~/.R/3.6/arrow’
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-8577) [GLib][Plasma] gplasma_client_options_new() default settings are enabling a check for CUDA device

2020-04-24 Thread Tanveer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091523#comment-17091523
 ] 

Tanveer edited comment on ARROW-8577 at 4/24/20, 4:10 PM:
--

Hi Kouhei,

This is the program. I am taking a RecordBatch (batch_genomics) as input in 
this function. The error arises at:

gPlasmaClient = gplasma_client_new("/tmp/store0", gplasmaClient_options, &error);

 
{code:java}
guint8 id_arr[20]; 
genRandom(id_arr,20);
char objID_file[] = "/home/tahmad/lib/core/objID.txt";

g_print("obj_id: %s\n", id_arr);

 gboolean success = TRUE;
 GError *error = NULL;

 GPlasmaClient *gPlasmaClient;
 GPlasmaObjectID *object_id;
 GPlasmaClientCreateOptions *create_options;
 GPlasmaClientOptions *gplasmaClient_options;
 GPlasmaCreatedObject *Object;
 GPlasmaReferredObject *refObject;
 GArrowBuffer *arrowBuffer;

 arrowBuffer = GSerializeRecordBatch(batch_genomics);
 gint32 size = garrow_buffer_get_size(arrowBuffer);

 gplasmaClient_options = gplasma_client_options_new();
 gPlasmaClient = gplasma_client_new("/tmp/store0", gplasmaClient_options, &error);
 object_id = gplasma_object_id_new(id_arr, 20, &error);
 create_options = gplasma_client_create_options_new();
 {
 guint8 metadata[] = "metadata";
 gplasma_client_create_options_set_metadata(create_options, (const guint8 
*)metadata, sizeof(metadata));
 }
 Object = gplasma_client_create(gPlasmaClient, object_id, size, create_options, &error);

 g_object_unref(create_options);
 {
 GArrowBuffer *data;
 guint8 dataW[] = "data";
 g_object_get(Object, "data", &data, NULL);
 garrow_mutable_buffer_set_data(GARROW_MUTABLE_BUFFER(data), 0, garrow_buffer_get_databytes(arrowBuffer), size, &error);
 g_object_unref(data);
 }

 gplasma_created_object_seal(Object, &error);
 g_object_unref(Object);
 gplasma_client_disconnect(gPlasmaClient, &error);
 g_object_unref(gPlasmaClient);{code}
 

I am using this function to convert Arrow RecordBatch to ArrowBuffer:

 
{code:java}
extern "C" GArrowBuffer * 
GSerializeRecordBatchToBuffer(GArrowRecordBatch *record_batch)
{
 
const auto arrow_record_batch = garrow_record_batch_get_raw(record_batch);
std::shared_ptr<arrow::ResizableBuffer> resizable_buffer;

arrow::AllocateResizableBuffer(arrow::default_memory_pool(), 0, &resizable_buffer);
std::shared_ptr<arrow::Buffer> buffer = std::dynamic_pointer_cast<arrow::Buffer>(resizable_buffer);

arrow::ipc::SerializeRecordBatch(*arrow_record_batch, arrow::default_memory_pool(), &buffer);
return garrow_buffer_new_raw(&buffer);
}
{code}
 


was (Author: tahmad):
Hi Kouhei,

This is the program. I am taking a RecordBatch (batch_genomics) as input in 
this function. The error arises at:

gPlasmaClient = gplasma_client_new("/tmp/store0", gplasmaClient_options, &error);

 
{code:java}
guint8 id_arr[20]; 
genRandom(id_arr,20);
char objID_file[] = "/home/tahmad/lib/core/objID.txt";

g_print("obj_id: %s\n", id_arr);

 gboolean success = TRUE;
 GError *error = NULL;

 GPlasmaClient *gPlasmaClient;
 GPlasmaObjectID *object_id;
 GPlasmaClientCreateOptions *create_options;
 GPlasmaClientOptions *gplasmaClient_options;
 GPlasmaCreatedObject *Object;
 GPlasmaReferredObject *refObject;
 GArrowBuffer *arrowBuffer;

 arrowBuffer = GSerializeRecordBatch(batch_genomics);
 gint32 size = garrow_buffer_get_size(arrowBuffer);

 gplasmaClient_options = gplasma_client_options_new();
 gPlasmaClient = gplasma_client_new("/tmp/store0", gplasmaClient_options, &error);
 object_id = gplasma_object_id_new(id_arr, 20, &error);
 create_options = gplasma_client_create_options_new();
 {
 guint8 metadata[] = "metadata";
 gplasma_client_create_options_set_metadata(create_options, (const guint8 
*)metadata, sizeof(metadata));
 }
 Object = gplasma_client_create(gPlasmaClient, object_id, size, create_options, &error);

 g_object_unref(create_options);
 {
 GArrowBuffer *data;
 guint8 dataW[] = "data";
 g_object_get(Object, "data", &data, NULL);
 garrow_mutable_buffer_set_data(GARROW_MUTABLE_BUFFER(data), 0, garrow_buffer_get_databytes(arrowBuffer), size, &error);
 g_object_unref(data);
 }

 gplasma_created_object_seal(Object, &error);
 g_object_unref(Object);
 gplasma_client_disconnect(gPlasmaClient, &error);
 g_object_unref(gPlasmaClient);{code}

> [GLib][Plasma] gplasma_client_options_new() default settings are enabling a 
> check for CUDA device
> -
>
> Key: ARROW-8577
> URL: https://issues.apache.org/jira/browse/ARROW-8577
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GLib
>Reporter: Tanveer
>Assignee: Kouhei Sutou
>Priority: Major
>
> Hi all,
>  Previously, I was using c_glib Plasma library (build 0.12) for creating 
> plasma objects. It was working as expected. But now I want to use Arrow's 
> newest build.  I incurred the following error:
>  
> /build/apache-arrow-0.17.0/cpp/src/arrow/result.cc:28: ValueOrDie called on 
> an error: IOError: Cuda error 100 in function 'cuInit': 
> 

[jira] [Comment Edited] (ARROW-8577) [GLib][Plasma] gplasma_client_options_new() default settings are enabling a check for CUDA device

2020-04-24 Thread Tanveer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091523#comment-17091523
 ] 

Tanveer edited comment on ARROW-8577 at 4/24/20, 4:07 PM:
--

Hi Kouhei,

This is the program. I am taking a RecordBatch (batch_genomics) as input in 
this function. The error arises at:

gPlasmaClient = gplasma_client_new("/tmp/store0", gplasmaClient_options, &error);

 
{code:java}
guint8 id_arr[20]; 
genRandom(id_arr,20);
char objID_file[] = "/home/tahmad/lib/core/objID.txt";

g_print("obj_id: %s\n", id_arr);

 gboolean success = TRUE;
 GError *error = NULL;

 GPlasmaClient *gPlasmaClient;
 GPlasmaObjectID *object_id;
 GPlasmaClientCreateOptions *create_options;
 GPlasmaClientOptions *gplasmaClient_options;
 GPlasmaCreatedObject *Object;
 GPlasmaReferredObject *refObject;
 GArrowBuffer *arrowBuffer;

 arrowBuffer = GSerializeRecordBatch(batch_genomics);
 gint32 size = garrow_buffer_get_size(arrowBuffer);

 gplasmaClient_options = gplasma_client_options_new();
 gPlasmaClient = gplasma_client_new("/tmp/store0", gplasmaClient_options, &error);
 object_id = gplasma_object_id_new(id_arr, 20, &error);
 create_options = gplasma_client_create_options_new();
 {
 guint8 metadata[] = "metadata";
 gplasma_client_create_options_set_metadata(create_options, (const guint8 *)metadata, sizeof(metadata));
 }
 Object = gplasma_client_create(gPlasmaClient, object_id, size, create_options, &error);

 g_object_unref(create_options);
 {
 GArrowBuffer *data;
 guint8 dataW[] = "data";
 g_object_get(Object, "data", &data, NULL);
 garrow_mutable_buffer_set_data(GARROW_MUTABLE_BUFFER(data), 0,
 garrow_buffer_get_databytes(arrowBuffer), size, &error);
 g_object_unref(data);
 }

 gplasma_created_object_seal(Object, &error);
 g_object_unref(Object);
 gplasma_client_disconnect(gPlasmaClient, &error);
 g_object_unref(gPlasmaClient);{code}


was (Author: tahmad):
Hi Kouhei,

This the program. I am taking a RecordBatch (batch_genomics) as input in this 
function. The error arises at:

gPlasmaClient = gplasma_client_new("/tmp/store0", gplasmaClient_options, &error);

 
{code:java}
guint8 id_arr[20]; 
genRandom(id_arr,20);
char objID_file[] = "/home/tahmad/lib/core/objID.txt";

g_print("obj_id: %s\n", id_arr);

 gboolean success = TRUE;
 GError *error = NULL;

 GPlasmaClient *gPlasmaClient;
 GPlasmaObjectID *object_id;
 GPlasmaClientCreateOptions *create_options;
 GPlasmaClientOptions *gplasmaClient_options;
 GPlasmaCreatedObject *Object;
 GPlasmaReferredObject *refObject;
 GArrowBuffer *arrowBuffer;

 arrowBuffer = GSerializeRecordBatch(batch_genomics);
 gint32 size = garrow_buffer_get_size(arrowBuffer);

 gplasmaClient_options = gplasma_client_options_new();
 gPlasmaClient = gplasma_client_new("/tmp/store0", gplasmaClient_options, &error);
 object_id = gplasma_object_id_new(id_arr, 20, &error);
 create_options = gplasma_client_create_options_new();
 {
 guint8 metadata[] = "metadata";
 gplasma_client_create_options_set_metadata(create_options, (const guint8 *)metadata, sizeof(metadata));
 }
 Object = gplasma_client_create(gPlasmaClient, object_id, size, create_options, &error);

 g_object_unref(create_options);
 {
 GArrowBuffer *data;
 guint8 dataW[] = "data";
 g_object_get(Object, "data", &data, NULL);
 garrow_mutable_buffer_set_data(GARROW_MUTABLE_BUFFER(data), 0,
 garrow_buffer_get_databytes(arrowBuffer), size, &error);
 g_object_unref(data);
 }

 gplasma_created_object_seal(Object, &error);
 g_object_unref(Object);
 gplasma_client_disconnect(gPlasmaClient, &error);
 g_object_unref(gPlasmaClient);{code}

> [GLib][Plasma] gplasma_client_options_new() default settings are enabling a 
> check for CUDA device
> -
>
> Key: ARROW-8577
> URL: https://issues.apache.org/jira/browse/ARROW-8577
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GLib
>Reporter: Tanveer
>Assignee: Kouhei Sutou
>Priority: Major
>
> Hi all,
>  Previously, I was using c_glib Plasma library (build 0.12) for creating 
> plasma objects. It was working as expected. But now I want to use Arrow's 
> newest build.  I incurred the following error:
>  
> /build/apache-arrow-0.17.0/cpp/src/arrow/result.cc:28: ValueOrDie called on 
> an error: IOError: Cuda error 100 in function 'cuInit': 
> [CUDA_ERROR_NO_DEVICE] no CUDA-capable device is detected
> I think plasma client options (gplasma_client_options_new()) which I am using 
> with default settings are enabling a check for my CUDA device and I have no 
> CUDA device attached to my system. How I can disable this check? Any help 
> will be highly appreciated. Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7706) [Python] saving a dataframe to the same partitioned location silently doubles the data

2020-04-24 Thread Gregory Hayes (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091686#comment-17091686
 ] 

Gregory Hayes commented on ARROW-7706:
--

I've encountered this as well, using pyarrow v0.17. In my instance, I attempted both to write and to append to a partitioned dataset; both operations silently double the data.

> [Python] saving a dataframe to the same partitioned location silently doubles 
> the data
> --
>
> Key: ARROW-7706
> URL: https://issues.apache.org/jira/browse/ARROW-7706
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.15.1
>Reporter: Tsvika Shapira
>Priority: Major
>  Labels: dataset, parquet
>
> When a user saves a dataframe:
> {code:python}
> df1.to_parquet('/tmp/table', partition_cols=['col_a'], engine='pyarrow')
> {code}
> it will create sub-directories named "{{a=val1}}", "{{a=val2}}" in 
> {{/tmp/table}}. Each of them will contain one (or more?) parquet files with 
> random filenames.
> If a user runs the same command again, the code will use the existing 
> sub-directories, but with different (random) filenames. As a result, any data 
> loaded from this folder will be wrong - each row will be present twice.
> For example, when using
> {code:python}
> df1.to_parquet('/tmp/table', partition_cols=['col_a'], engine='pyarrow')  # 
> second time
> df2 = pd.read_parquet('/tmp/table', engine='pyarrow')
> assert len(df1) == len(df2)  # raise an error{code}
> This is a subtle change in the data that can pass unnoticed.
>  
> I would expect that the code will prevent the user from using an non-empty 
> destination as partitioned target. an overwrite flag can also be useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8556) [R] zstd symbol not found on Ubuntu 19.10

2020-04-24 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091678#comment-17091678
 ] 

Neal Richardson commented on ARROW-8556:


Thanks. I've mapped ubuntu 19.10 to ubuntu-18.04 
[here|https://github.com/ursa-labs/arrow-r-nightly/blob/master/linux/distro-map.csv#L13]
 so installation with a binary should Just Work now. I'm curious why zstd 
wasn't included correctly before (see that there is no {{-lzstd}} in the 
{{PKG_LIBS}} line), but if you want to let it lie and move on, that's fine with 
me, we can wait and see if anyone else experiences that.

> [R] zstd symbol not found on Ubuntu 19.10
> -
>
> Key: ARROW-8556
> URL: https://issues.apache.org/jira/browse/ARROW-8556
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: Ubuntu 19.10
> R 3.6.1
>Reporter: Karl Dunkle Werner
>Priority: Major
>
> I would like to install the `arrow` R package on my Ubuntu 19.10 system. 
> Prebuilt binaries are unavailable, and I want to enable compression, so I set 
> the {{LIBARROW_MINIMAL=false}} environment variable. When I do so, it looks 
> like the package is able to compile, but can't be loaded. I'm able to install 
> correctly if I don't set the {{LIBARROW_MINIMAL}} variable.
> Here's the error I get:
> {code:java}
> ** testing if installed package can be loaded from temporary location
> Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath 
> = DLLpath, ...):
>  unable to load shared object 
> '~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so':
>   ~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so: undefined symbol: 
> ZSTD_initCStream
> Error: loading failed
> Execution halted
> ERROR: loading failed
> * removing ‘~/.R/3.6/arrow’
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8586) [R] installation failure on CentOS 7

2020-04-24 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8586:
---
Summary: [R] installation failure on CentOS 7  (was: Failed to Install 
arrow From CRAN)

> [R] installation failure on CentOS 7
> 
>
> Key: ARROW-8586
> URL: https://issues.apache.org/jira/browse/ARROW-8586
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: CentOS 7
>Reporter: Hei
>Priority: Major
>
> Hi,
> I am trying to install arrow via RStudio, but it seems like it is not working 
> that after I installed the package, it kept asking me to run 
> arrow::install_arrow() even after I did:
> {code}
> > install.packages("arrow")
> Installing package into ‘/home/hc/R/x86_64-redhat-linux-gnu-library/3.6’
> (as ‘lib’ is unspecified)
> trying URL 'https://cran.rstudio.com/src/contrib/arrow_0.17.0.tar.gz'
> Content type 'application/x-gzip' length 242534 bytes (236 KB)
> ==
> downloaded 236 KB
> * installing *source* package ‘arrow’ ...
> ** package ‘arrow’ successfully unpacked and MD5 sums checked
> ** using staged installation
> *** Successfully retrieved C++ source
> *** Building C++ libraries
>  cmake
>  arrow  
> ./configure: line 132: cd: libarrow/arrow-0.17.0/lib: Not a directory
> - NOTE ---
> After installation, please run arrow::install_arrow()
> for help installing required runtime libraries
> -
> ** libs
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c array.cpp -o array.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c array_from_vector.cpp -o 
> array_from_vector.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c array_to_vector.cpp -o 
> array_to_vector.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c arraydata.cpp -o arraydata.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c arrowExports.cpp -o 
> arrowExports.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c buffer.cpp -o buffer.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c chunkedarray.cpp -o 
> chunkedarray.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c compression.cpp -o 
> compression.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c compute.cpp -o compute.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> 

[jira] [Commented] (ARROW-8586) Failed to Install arrow From CRAN

2020-04-24 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091670#comment-17091670
 ] 

Neal Richardson commented on ARROW-8586:


Thanks for the report. There seem to be two issues: (1) C++ build from source 
is failing, and (2) when {{install_arrow}} tries to download a prebuilt binary, 
it's not correctly identifying your OS version. 

To debug the first issue, could you please set the environment variable 
{{ARROW_R_DEV=true}} and retry, and share with me the (much more verbose) 
installation logs?

To debug the second, could you please tell me what {{lsb_release -rs}} says at 
the command line?

A workaround will be to set {{LIBARROW_BINARY=centos-7}} and reinstall (or, 
equivalently, call {{arrow::install_arrow(binary="centos-7")}} from R, since 
you have that installed). But I'd appreciate your help in debugging the issue 
so that we can make it work correctly going forward.

> Failed to Install arrow From CRAN
> -
>
> Key: ARROW-8586
> URL: https://issues.apache.org/jira/browse/ARROW-8586
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: CentOS 7
>Reporter: Hei
>Priority: Major
>
> Hi,
> I am trying to install arrow via RStudio, but it seems like it is not working 
> that after I installed the package, it kept asking me to run 
> arrow::install_arrow() even after I did:
> {code}
> > install.packages("arrow")
> Installing package into ‘/home/hc/R/x86_64-redhat-linux-gnu-library/3.6’
> (as ‘lib’ is unspecified)
> trying URL 'https://cran.rstudio.com/src/contrib/arrow_0.17.0.tar.gz'
> Content type 'application/x-gzip' length 242534 bytes (236 KB)
> ==
> downloaded 236 KB
> * installing *source* package ‘arrow’ ...
> ** package ‘arrow’ successfully unpacked and MD5 sums checked
> ** using staged installation
> *** Successfully retrieved C++ source
> *** Building C++ libraries
>  cmake
>  arrow  
> ./configure: line 132: cd: libarrow/arrow-0.17.0/lib: Not a directory
> - NOTE ---
> After installation, please run arrow::install_arrow()
> for help installing required runtime libraries
> -
> ** libs
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c array.cpp -o array.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c array_from_vector.cpp -o 
> array_from_vector.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c array_to_vector.cpp -o 
> array_to_vector.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c arraydata.cpp -o arraydata.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c arrowExports.cpp -o 
> arrowExports.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c buffer.cpp -o buffer.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c chunkedarray.cpp -o 
> chunkedarray.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> 

[jira] [Commented] (ARROW-5949) [Rust] Implement DictionaryArray

2020-04-24 Thread Neville Dipale (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091640#comment-17091640
 ] 

Neville Dipale commented on ARROW-5949:
---

I think not providing more convenient ways of using DictionaryArray potentially 
defeats the purpose of having it. I've already mentioned the need for compute 
kernel support on dictionaries, some of which would require access to the 
array's keys as a primitive array (e.g. sort, take), and others which would 
need both keys and values (filter).

I would rather have DictionaryArray::keys() return an ArrayRef instead of a NullableIter, and then support iterating on arrays in general.

Yes, building the primitive array is a bit expensive, and more importantly, 
it's opaque to a casual Arrow user; so I'd support providing that option.

Look at the below, for example:
{code:java}
impl<'a, K: ArrowPrimitiveType> DictionaryArray<K> {
    pub fn decode_dictionary(&self) -> Result<ArrayRef> {
        // convert the keys into an array
        let keys = Arc::new(PrimitiveArray::<K>::from(self.data.clone())) as ArrayRef;
        // cast keys to an uint32 array
        let keys = crate::compute::cast(&keys, &DataType::UInt32)?;
        let keys = UInt32Array::from(keys.data());
        // index into the values of the dictionary, with keys
        crate::compute::take(&self.values(), &keys, None)
    }
}{code}
This is how I'd convert a dictionary to a 'normal' array of an unknown type.

Perhaps this could be a discussion for the mailing list? I'm interested in 
simplifying the dictionary API, and widening dictionary support; this could be 
a good starting point to do this. CC [~paddyhoran] [~andygrove]
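
[Editor's note: as an illustration of how the decode_dictionary() sketch above could be exercised, here is a minimal usage sketch. The dict_array parameter, the use_decoded helper, and the Int32Type key type are assumptions for illustration, not an agreed API.]
{code:java}
// Hypothetical usage of the decode_dictionary() sketch above;
// `use_decoded` and `dict_array` are illustrative names only.
fn use_decoded(dict_array: &DictionaryArray<Int32Type>) -> Result<()> {
    // Decode into a plain array of the dictionary's value type...
    let plain: ArrayRef = dict_array.decode_dictionary()?;
    // ...after which existing compute kernels (sort, take, filter)
    // apply directly to `plain`.
    println!("decoded {} rows", plain.len());
    Ok(())
}
{code}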

> [Rust] Implement DictionaryArray
> 
>
> Key: ARROW-5949
> URL: https://issues.apache.org/jira/browse/ARROW-5949
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: David Atienza
>Assignee: David Atienza
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 18h
>  Remaining Estimate: 0h
>
> I am pretty new to the codebase, but I have seen that DictionaryArray is not 
> implemented in the Rust implementation.
> I went through the list of issues and I could not see any work on this. Is 
> there any blocker?
>  
> The specification is a bit 
> [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] 
> or even 
> [non-existant|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding],
>  so I am not sure how to implement it myself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-5949) [Rust] Implement DictionaryArray

2020-04-24 Thread Mahmut Bulut (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091600#comment-17091600
 ] 

Mahmut Bulut edited comment on ARROW-5949 at 4/24/20, 2:10 PM:
---

Sorry, yes, that's exactly like that; it is ok and valid. I gave that example to show that we can leave the indices as they are when -1 is masked out (unfortunately it won't work with unsigned values, and I think that's why the bit-masking approach is better). Thanks for the links, they were fruitful.

I think I am more inclined not to build the primitive array: the user should neither collect the result from the iterator nor check for Some(_) one by one. That said, I tend to want a slice given back from the array, which would most probably enable users who want SIMD later. Though it is also nice to have a PrimitiveArray API given to users. Current stable SIMD instructions (including packed_simd, which the Rust impl uses) are "fill free", so I need to use contiguous scalars for dict-encoded operations, which are crucial for my use case (repacking the arrow array is an overhead for me). So I have started to make a vectorized slice implementation over the current dictionary array; is it ok to include a slice kind of approach in Arrow? With chunked offsets, we can even use Rust arrays too. Wdyt?

edit: contiguous not continuous


was (Author: vertexclique):
Sorry, yes, that's exactly like that, it is ok and valid. Gave that example to 
show that we can leave the indices as how -1 is masked on (unfortunately it 
won't work with unsigned values, I think that's why the bit masking approach is 
better). Thanks for the links they were fruitful.

 

I think I am more inclined to not build the primitive array, neither user 
should collect the result from the iterator nor one by one look for the 
Some(_), that said I tend to have slice given back from the array, which is 
most probably enable users who are using SIMD later. Thou, it is also nice to 
have a PrimitiveArray API given to users. Current stable SIMD instructions 
(also packed_simd that rust impl uses) fill free so I need to use continuous 
scalars for dict encoded operations, which are crucial for my use case 
(repacking the arrow array is an overhead for me). So I have started to make a 
vectorized slice implementation over current dictionary array, is it ok to 
include slice kind of approach to Arrow? with chunked offsets, we can even use 
Rust arrays too. Wdyt?

> [Rust] Implement DictionaryArray
> 
>
> Key: ARROW-5949
> URL: https://issues.apache.org/jira/browse/ARROW-5949
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: David Atienza
>Assignee: David Atienza
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 18h
>  Remaining Estimate: 0h
>
> I am pretty new to the codebase, but I have seen that DictionaryArray is not 
> implemented in the Rust implementation.
> I went through the list of issues and I could not see any work on this. Is 
> there any blocker?
>  
> The specification is a bit 
> [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] 
> or even 
> [non-existant|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding],
>  so I am not sure how to implement it myself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-5949) [Rust] Implement DictionaryArray

2020-04-24 Thread Mahmut Bulut (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091600#comment-17091600
 ] 

Mahmut Bulut edited comment on ARROW-5949 at 4/24/20, 2:06 PM:
---

Sorry, yes, that's exactly like that, it is ok and valid. Gave that example to 
show that we can leave the indices as how -1 is masked on (unfortunately it 
won't work with unsigned values, I think that's why the bit masking approach is 
better). Thanks for the links they were fruitful.

 

I think I am more inclined to not build the primitive array, neither user 
should collect the result from the iterator nor one by one look for the 
Some(_), that said I tend to have slice given back from the array, which is 
most probably enable users who are using SIMD later. Thou, it is also nice to 
have a PrimitiveArray API given to users. Current stable SIMD instructions 
(also packed_simd that rust impl uses) fill free so I need to use continuous 
scalars for dict encoded operations, which are crucial for my use case 
(repacking the arrow array is an overhead for me). So I have started to make a 
vectorized slice implementation over current dictionary array, is it ok to 
include slice kind of approach to Arrow? with chunked offsets, we can even use 
Rust arrays too. Wdyt?


was (Author: vertexclique):
Sorry, yes, that's exactly like that, it is ok and valid. Gave that example to 
show that we can leave the indices as how -1 is masked on (unfortunately it 
won't work with unsigned values, I think that's why the bit masking approach is 
better). Thanks for the links they were fruitful.

 

I think I am more inclined to not build the primitive array, neither user 
should collect the result from the iterator nor one by one look for the 
Some(_), that said I tend to have slice given back from the array, which is 
most probably enable users who are using SIMD later. Thou, it is also nice to 
have a PrimitiveArray API given to users. Current stable SIMD instructions also 
packed_simd are fill free so I need to use continuous scalars for dict encoded 
operations, which are crucial for my use case (repacking the arrow array is an 
overhead for me). So I have started to make a vectorized slice implementation 
over current dictionary array, is it ok to include slice kind of approach to 
Arrow? with chunked offsets, we can even use Rust arrays too. Wdyt?

> [Rust] Implement DictionaryArray
> 
>
> Key: ARROW-5949
> URL: https://issues.apache.org/jira/browse/ARROW-5949
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: David Atienza
>Assignee: David Atienza
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 18h
>  Remaining Estimate: 0h
>
> I am pretty new to the codebase, but I have seen that DictionaryArray is not 
> implemented in the Rust implementation.
> I went through the list of issues and I could not see any work on this. Is 
> there any blocker?
>  
> The specification is a bit 
> [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] 
> or even 
> [non-existant|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding],
>  so I am not sure how to implement it myself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-5949) [Rust] Implement DictionaryArray

2020-04-24 Thread Mahmut Bulut (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091600#comment-17091600
 ] 

Mahmut Bulut commented on ARROW-5949:
-

Sorry, yes, that's exactly like that, it is ok and valid. Gave that example to 
show that we can leave the indices as how -1 is masked on (unfortunately it 
won't work with unsigned values, I think that's why the bit masking approach is 
better). Thanks for the links they were fruitful.

 

I think I am more inclined to not build the primitive array, neither user 
should collect the result from the iterator nor one by one look for the 
Some(_), that said I tend to have slice given back from the array, which is 
most probably enable users who are using SIMD later. Thou, it is also nice to 
have a PrimitiveArray API given to users. Current stable SIMD instructions also 
packed_simd are fill free so I need to use continuous scalars for dict encoded 
operations, which are crucial for my use case (repacking the arrow array is an 
overhead for me). So I have started to make a vectorized slice implementation 
over current dictionary array, is it ok to include slice kind of approach to 
Arrow? with chunked offsets, we can even use Rust arrays too. Wdyt?
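
[Editor's note: a minimal sketch of the contiguous-slice argument above, as an illustration rather than the proposed API: a tight loop over a plain &[i32] key slice is branch-free and auto-vectorizable, unlike per-element Option<_> iteration. The keys_equal_to helper is a hypothetical name.]
{code:java}
// Compare every dictionary key in a contiguous slice against a target key.
// No Option unwrapping and no branches in the loop body, so the compiler
// is free to auto-vectorize it.
fn keys_equal_to(keys: &[i32], target: i32, out: &mut [bool]) {
    for (o, &k) in out.iter_mut().zip(keys.iter()) {
        *o = k == target;
    }
}

fn main() {
    let keys = [0_i32, 0, 2, 1, 2];
    let mut out = [false; 5];
    keys_equal_to(&keys, 2, &mut out);
    assert_eq!(out, [false, false, true, false, true]);
}
{code}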

> [Rust] Implement DictionaryArray
> 
>
> Key: ARROW-5949
> URL: https://issues.apache.org/jira/browse/ARROW-5949
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: David Atienza
>Assignee: David Atienza
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 18h
>  Remaining Estimate: 0h
>
> I am pretty new to the codebase, but I have seen that DictionaryArray is not 
> implemented in the Rust implementation.
> I went through the list of issues and I could not see any work on this. Is 
> there any blocker?
>  
> The specification is a bit 
> [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] 
> or even 
> [non-existant|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding],
>  so I am not sure how to implement it myself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-5949) [Rust] Implement DictionaryArray

2020-04-24 Thread Neville Dipale (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091569#comment-17091569
 ] 

Neville Dipale edited comment on ARROW-5949 at 4/24/20, 1:21 PM:
-

Thanks; having looked at the implementation, I think they're handled the same way in Rust (if we exclude the iterator interface).
  
{code:java}
  std::vector<int64_t> raw_indices = {0, 1, 2, -1, 3};
  std::vector<uint8_t> is_valid = {1, 1, 1, 0, 1};{code}
 Are you referring to the -1 in the indices? It gets masked by the is_valid mask, so I think even if any other value were used, the result would still be the same. Perhaps I'm not understanding.


was (Author: nevi_me):
Thanks, having looked at the implementation; I think they're handled the same 
way in Rust (if we exclude the iterator interface).
 
{code:java}
  std::vector<int64_t> raw_indices = {0, 1, 2, -1, 3};
  std::vector<uint8_t> is_valid = {1, 1, 1, 0, 1};{code}
 

> [Rust] Implement DictionaryArray
> 
>
> Key: ARROW-5949
> URL: https://issues.apache.org/jira/browse/ARROW-5949
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: David Atienza
>Assignee: David Atienza
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 18h
>  Remaining Estimate: 0h
>
> I am pretty new to the codebase, but I have seen that DictionaryArray is not 
> implemented in the Rust implementation.
> I went through the list of issues and I could not see any work on this. Is 
> there any blocker?
>  
> The specification is a bit 
> [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] 
> or even 
> [non-existant|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding],
>  so I am not sure how to implement it myself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-5949) [Rust] Implement DictionaryArray

2020-04-24 Thread Neville Dipale (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091569#comment-17091569
 ] 

Neville Dipale commented on ARROW-5949:
---

Thanks, having looked at the implementation; I think they're handled the same 
way in Rust (if we exclude the iterator interface).
 
{code:java}
  std::vector<int64_t> raw_indices = {0, 1, 2, -1, 3};
  std::vector<uint8_t> is_valid = {1, 1, 1, 0, 1};{code}
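
[Editor's note: a sketch of the masking semantics in the snippet above, in plain Rust rather than Arrow's API: when is_valid[i] is 0, the raw index in that slot is never consulted, so -1 or any other sentinel there cannot affect the decoded values. The decode helper is hypothetical.]
{code:java}
// Decode (raw_indices, is_valid) against a dictionary of values.
// A slot whose validity byte is 0 yields None without reading its index.
fn decode<'a>(raw_indices: &[i32], is_valid: &[u8], values: &'a [&'a str]) -> Vec<Option<&'a str>> {
    raw_indices
        .iter()
        .zip(is_valid.iter())
        .map(|(&i, &v)| if v == 1 { Some(values[i as usize]) } else { None })
        .collect()
}

fn main() {
    let decoded = decode(&[0, 1, 2, -1, 3], &[1, 1, 1, 0, 1], &["a", "b", "c", "d"]);
    // The -1 slot is masked out and never dereferenced.
    assert_eq!(decoded, vec![Some("a"), Some("b"), Some("c"), None, Some("d")]);
}
{code}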
 

> [Rust] Implement DictionaryArray
> 
>
> Key: ARROW-5949
> URL: https://issues.apache.org/jira/browse/ARROW-5949
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: David Atienza
>Assignee: David Atienza
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 18h
>  Remaining Estimate: 0h
>
> I am pretty new to the codebase, but I have seen that DictionaryArray is not 
> implemented in the Rust implementation.
> I went through the list of issues and I could not see any work on this. Is 
> there any blocker?
>  
> The specification is a bit 
> [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] 
> or even 
> [non-existant|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding],
>  so I am not sure how to implement it myself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-8318) [C++][Dataset] Dataset should instantiate Fragment

2020-04-24 Thread Francois Saint-Jacques (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques reassigned ARROW-8318:
-

Assignee: Francois Saint-Jacques

> [C++][Dataset] Dataset should instantiate Fragment
> --
>
> Key: ARROW-8318
> URL: https://issues.apache.org/jira/browse/ARROW-8318
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: dataset
>
> Fragments are created on the fly when invoking a Scan. This means that a lot 
> of the auxilliary/ancilliary data must be stored by the specialised Dataset, 
> e.g. the FileSystemDataset must hold the path and partition expression. With 
> the venue of more complex Fragment, e.g. ParquetFileFragment, more data must 
> be stored. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-5949) [Rust] Implement DictionaryArray

2020-04-24 Thread Neville Dipale (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091547#comment-17091547
 ] 

Neville Dipale edited comment on ARROW-5949 at 4/24/20, 1:10 PM:
-

Hi [~vertexclique], there was some discussion around using sentinel values over 
bitmask ([https://github.com/apache/arrow/pull/6095#discussion_r367760573]), 
and I believe it was a matter of sentinel values not being spec-compliant.

We never resolved the following point, but I was of the opinion that it'd be 
better to provide methods/functions that allow converting a dictionary array 
into a primitive array. 
 My opinion was mainly informed by my concern that we don't have a way of using 
dictionary arrays in compute kernels, so at the time I preferred something to 
convert 
{code:java}
dict(i32)[
to 
i32<1, 1, null, 2, null>{code}
The contributor of the PR provided a valid use-case, which led them in the 
route of providing iterator access, so we eventually merged the PR under the 
premise that more work could be done in future to provide other access methods.

Regarding the 2 reasons:

R1: what do you mean by "rebuilding from that lookup"? Do you mean rebuilding a 
primitive array from the dictionary's iterator? If so, would a method that 
converts a dict(i32) into a primitive(i32) suffice for your needs?

R2: may you please provide an example of what you mean by parallel comparison? 
My knowledge of SIMD and auto-vec is a bit limited, but what we noticed in the 
Rust implementation is that we can often forgo explicit SIMD on some 
computation kernels if we relegate null handling to bitmask manipulation, and 
operate on arrays without branching to check nulls 
([https://github.com/apache/arrow/pull/6086]).


was (Author: nevi_me):
Hi [~vertexclique], there was some discussion around using sentinel values over 
bitmask ([https://github.com/apache/arrow/pull/6095#discussion_r367760573),] 
and I believe it was a matter of sentinel values not being spec-compliant.

We never resolved the following point, but I was of the opinion that it'd be 
better to provide methods/functions that allow converting a dictionary array 
into a primitive array. 
 My opinion was mainly informed by my concern that we don't have a way of using 
dictionary arrays in compute kernels, so at the time I preferred something to 
convert `
{code:java}
dict(i32)[` 
to 


i32<1, 1, null, 2, null>{code}
The contributor of the PR provided a valid use-case, which led them in the 
route of providing iterator access, so we eventually merged the PR under the 
premise that more work could be done in future to provide other access methods.

Regarding the 2 reasons:

R1: what do you mean by "rebuilding from that lookup"? Do you mean rebuilding a 
primitive array from the dictionary's iterator? If so, would a method that 
converts a dict(i32) into a primitive(i32) suffice for your needs?

R2: may you please provide an example of what you mean by parallel comparison? 
My knowledge of SIMD and auto-vec is a bit limited, but what we noticed in the 
Rust implementation is that we can often forgo explicit SIMD on some 
computation kernels if we relegate null handling to bitmask manipulation, and 
operate on arrays without branching to check nulls 
([https://github.com/apache/arrow/pull/6086]).

> [Rust] Implement DictionaryArray
> 
>
> Key: ARROW-5949
> URL: https://issues.apache.org/jira/browse/ARROW-5949
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: David Atienza
>Assignee: David Atienza
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 18h
>  Remaining Estimate: 0h
>
> I am pretty new to the codebase, but I have seen that DictionaryArray is not 
> implemented in the Rust implementation.
> I went through the list of issues and I could not see any work on this. Is 
> there any blocker?
>  
> The specification is a bit 
> [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] 
> or even 
> [non-existant|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding],
>  so I am not sure how to implement it myself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-5949) [Rust] Implement DictionaryArray

2020-04-24 Thread Neville Dipale (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091547#comment-17091547
 ] 

Neville Dipale edited comment on ARROW-5949 at 4/24/20, 1:09 PM:
-

Hi [~vertexclique], there was some discussion around using sentinel values over 
bitmask ([https://github.com/apache/arrow/pull/6095#discussion_r367760573),] 
and I believe it was a matter of sentinel values not being spec-compliant.

We never resolved the following point, but I was of the opinion that it'd be 
better to provide methods/functions that allow converting a dictionary array 
into a primitive array. 
 My opinion was mainly informed by my concern that we don't have a way of using 
dictionary arrays in compute kernels, so at the time I preferred something to 
convert `
{code:java}
dict(i32)[` 
to 


i32<1, 1, null, 2, null>{code}
The contributor of the PR provided a valid use-case, which led them in the 
route of providing iterator access, so we eventually merged the PR under the 
premise that more work could be done in future to provide other access methods.

Regarding the 2 reasons:

R1: what do you mean by "rebuilding from that lookup"? Do you mean rebuilding a 
primitive array from the dictionary's iterator? If so, would a method that 
converts a dict(i32) into a primitive(i32) suffice for your needs?

R2: may you please provide an example of what you mean by parallel comparison? 
My knowledge of SIMD and auto-vec is a bit limited, but what we noticed in the 
Rust implementation is that we can often forgo explicit SIMD on some 
computation kernels if we relegate null handling to bitmask manipulation, and 
operate on arrays without branching to check nulls 
([https://github.com/apache/arrow/pull/6086]).


was (Author: nevi_me):
Hi [~vertexclique], there was some discussion around using sentinel values over 
bitmask ([https://github.com/apache/arrow/pull/6095#discussion_r367760573),] 
and I believe it was a matter of sentinel values not being spec-compliant.

We never resolved the following point, but I was of the opinion that it'd be 
better to provide methods/functions that allow converting a dictionary array 
into a primitive array. 
My opinion was mainly informed by my concern that we don't have a way of using 
dictionary arrays in compute kernels, so at the time I preferred something to 
convert `dict(i32)[` to `i32<1, 1, null, 
2, null>`.

The contributor of the PR provided a valid use-case, which led them in the 
route of providing iterator access, so we eventually merged the PR under the 
premise that more work could be done in future to provide other access methods.

Regarding the 2 reasons:

R1: what do you mean by "rebuilding from that lookup"? Do you mean rebuilding a 
primitive array from the dictionary's iterator? If so, would a method that 
converts a dict(i32) into a primitive(i32) suffice for your needs?

R2: may you please provide an example of what you mean by parallel comparison? 
My knowledge of SIMD and auto-vec is a bit limited, but what we noticed in the 
Rust implementation is that we can often forgo explicit SIMD on some 
computation kernels if we relegate null handling to bitmask manipulation, and 
operate on arrays without branching to check nulls 
([https://github.com/apache/arrow/pull/6086]).

> [Rust] Implement DictionaryArray
> 
>
> Key: ARROW-5949
> URL: https://issues.apache.org/jira/browse/ARROW-5949
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: David Atienza
>Assignee: David Atienza
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 18h
>  Remaining Estimate: 0h
>
> I am pretty new to the codebase, but I have seen that DictionaryArray is not 
> implemented in the Rust implementation.
> I went through the list of issues and I could not see any work on this. Is 
> there any blocker?
>  
> The specification is a bit 
> [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] 
> or even 
> [non-existant|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding],
>  so I am not sure how to implement it myself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-5949) [Rust] Implement DictionaryArray

2020-04-24 Thread Mahmut Bulut (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091555#comment-17091555
 ] 

Mahmut Bulut commented on ARROW-5949:
-

For the reference implementation that I am talking about, please take a look at 
the `TestStringDictionaryAppendIndices` in cxx implementation for how nulls are 
handled in arrow cxx implementation.

> [Rust] Implement DictionaryArray
> 
>
> Key: ARROW-5949
> URL: https://issues.apache.org/jira/browse/ARROW-5949
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: David Atienza
>Assignee: David Atienza
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 18h
>  Remaining Estimate: 0h
>
> I am pretty new to the codebase, but I have seen that DictionaryArray is not 
> implemented in the Rust implementation.
> I went through the list of issues and I could not see any work on this. Is 
> there any blocker?
>  
> The specification is a bit 
> [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] 
> or even 
> [non-existant|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding],
>  so I am not sure how to implement it myself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-7297) [C++] Add value accessor in sparse tensor class

2020-04-24 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc reassigned ARROW-7297:
-

Assignee: Rok Mihevc

> [C++] Add value accessor in sparse tensor class
> ---
>
> Key: ARROW-7297
> URL: https://issues.apache.org/jira/browse/ARROW-7297
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Rok Mihevc
>Priority: Major
>
> {{SparseTensor}} can have value accessor like {{Tensor::Value}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-5949) [Rust] Implement DictionaryArray

2020-04-24 Thread Neville Dipale (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091547#comment-17091547
 ] 

Neville Dipale commented on ARROW-5949:
---

Hi [~vertexclique], there was some discussion around using sentinel values over 
bitmask ([https://github.com/apache/arrow/pull/6095#discussion_r367760573]), 
and I believe it was a matter of sentinel values not being spec-compliant.

We never resolved the following point, but I was of the opinion that it'd be 
better to provide methods/functions that allow converting a dictionary array 
into a primitive array. 
My opinion was mainly informed by my concern that we don't have a way of using 
dictionary arrays in compute kernels, so at the time I preferred something to 
convert `dict(i32)[` to `i32<1, 1, null, 
2, null>`.

The contributor of the PR provided a valid use-case, which led them in the 
route of providing iterator access, so we eventually merged the PR under the 
premise that more work could be done in future to provide other access methods.

Regarding the 2 reasons:

R1: what do you mean by "rebuilding from that lookup"? Do you mean rebuilding a 
primitive array from the dictionary's iterator? If so, would a method that 
converts a dict(i32) into a primitive(i32) suffice for your needs?

R2: may you please provide an example of what you mean by parallel comparison? 
My knowledge of SIMD and auto-vec is a bit limited, but what we noticed in the 
Rust implementation is that we can often forgo explicit SIMD on some 
computation kernels if we relegate null handling to bitmask manipulation, and 
operate on arrays without branching to check nulls 
([https://github.com/apache/arrow/pull/6086]).
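
[Editor's note: an illustration of the bitmask point in R2, under the assumption of a toy u64 validity mask rather than Arrow's real bitmap buffers: values are computed unconditionally and nulls are resolved by a single bitwise AND, so the hot loop never branches on validity. The add_no_branch helper is hypothetical.]
{code:java}
// Add two i32 slices without branching on nulls; slot i is valid when bit i
// of the mask is set (toy representation, at most 64 slots).
fn add_no_branch(a: &[i32], b: &[i32], a_valid: u64, b_valid: u64) -> (Vec<i32>, u64) {
    // Null slots may hold garbage sums; the combined mask marks them invalid.
    let sums = a.iter().zip(b.iter()).map(|(&x, &y)| x.wrapping_add(y)).collect();
    (sums, a_valid & b_valid)
}

fn main() {
    let (sums, valid) = add_no_branch(&[1, 2, 3], &[10, 20, 30], 0b101, 0b111);
    assert_eq!(sums, vec![11, 22, 33]);
    assert_eq!(valid, 0b101); // slot 1 stays null
}
{code}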

> [Rust] Implement DictionaryArray
> 
>
> Key: ARROW-5949
> URL: https://issues.apache.org/jira/browse/ARROW-5949
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: David Atienza
>Assignee: David Atienza
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 18h
>  Remaining Estimate: 0h
>
> I am pretty new to the codebase, but I have seen that DictionaryArray is not 
> implemented in the Rust implementation.
> I went through the list of issues and I could not see any work on this. Is 
> there any blocker?
>  
> The specification is a bit 
> [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] 
> or even 
> [non-existant|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding],
>  so I am not sure how to implement it myself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8577) [GLib][Plasma] gplasma_client_options_new() default settings are enabling a check for CUDA device

2020-04-24 Thread Tanveer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091523#comment-17091523
 ] 

Tanveer commented on ARROW-8577:


Hi Kouhei,

This the program. I am taking a RecordBatch (batch_genomics) as input in this 
function. The error arises at:

gPlasmaClient = gplasma_client_new("/tmp/store0", gplasmaClient_options, &error);

 
{code:java}
guint8 id_arr[20]; 
genRandom(id_arr,20);
char objID_file[] = "/home/tahmad/lib/core/objID.txt";

g_print("obj_id: %s\n", id_arr);

 gboolean success = TRUE;
 GError *error = NULL;

 GPlasmaClient *gPlasmaClient;
 GPlasmaObjectID *object_id;
 GPlasmaClientCreateOptions *create_options;
 GPlasmaClientOptions *gplasmaClient_options;
 GPlasmaCreatedObject *Object;
 GPlasmaReferredObject *refObject;
 GArrowBuffer *arrowBuffer;

 arrowBuffer = GSerializeRecordBatch(batch_genomics);
 gint32 size = garrow_buffer_get_size(arrowBuffer);

 gplasmaClient_options = gplasma_client_options_new();
 gPlasmaClient = gplasma_client_new("/tmp/store0", gplasmaClient_options, &error);
 object_id = gplasma_object_id_new(id_arr, 20, &error);
 create_options = gplasma_client_create_options_new();
 {
 guint8 metadata[] = "metadata";
 gplasma_client_create_options_set_metadata(create_options, (const guint8 *)metadata, sizeof(metadata));
 }
 Object = gplasma_client_create(gPlasmaClient, object_id, size, create_options, &error);

 g_object_unref(create_options);
 {
 GArrowBuffer *data;
 guint8 dataW[] = "data";
 g_object_get(Object, "data", &data, NULL);
 garrow_mutable_buffer_set_data(GARROW_MUTABLE_BUFFER(data), 0,
 garrow_buffer_get_databytes(arrowBuffer), size, &error);
 g_object_unref(data);
 }

 gplasma_created_object_seal(Object, &error);
 g_object_unref(Object);
 gplasma_client_disconnect(gPlasmaClient, &error);
 g_object_unref(gPlasmaClient);{code}

> [GLib][Plasma] gplasma_client_options_new() default settings are enabling a 
> check for CUDA device
> -
>
> Key: ARROW-8577
> URL: https://issues.apache.org/jira/browse/ARROW-8577
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GLib
>Reporter: Tanveer
>Assignee: Kouhei Sutou
>Priority: Major
>
> Hi all,
>  Previously, I was using c_glib Plasma library (build 0.12) for creating 
> plasma objects. It was working as expected. But now I want to use Arrow's 
> newest build.  I incurred the following error:
>  
> /build/apache-arrow-0.17.0/cpp/src/arrow/result.cc:28: ValueOrDie called on 
> an error: IOError: Cuda error 100 in function 'cuInit': 
> [CUDA_ERROR_NO_DEVICE] no CUDA-capable device is detected
> I think plasma client options (gplasma_client_options_new()) which I am using 
> with default settings are enabling a check for my CUDA device and I have no 
> CUDA device attached to my system. How I can disable this check? Any help 
> will be highly appreciated. Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8586) Failed to Install arrow From CRAN

2020-04-24 Thread Hei (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hei updated ARROW-8586:
---
Description: 
Hi,

I am trying to install arrow via RStudio, but it seems like it is not working: 
after I installed the package, it kept asking me to run 
arrow::install_arrow() even after I did:

{code}
> install.packages("arrow")
Installing package into ‘/home/hc/R/x86_64-redhat-linux-gnu-library/3.6’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/src/contrib/arrow_0.17.0.tar.gz'
Content type 'application/x-gzip' length 242534 bytes (236 KB)
==
downloaded 236 KB

* installing *source* package ‘arrow’ ...
** package ‘arrow’ successfully unpacked and MD5 sums checked
** using staged installation
*** Successfully retrieved C++ source
*** Building C++ libraries
 cmake
 arrow  
./configure: line 132: cd: libarrow/arrow-0.17.0/lib: Not a directory
- NOTE ---
After installation, please run arrow::install_arrow()
for help installing required runtime libraries
-
** libs
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
-I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
-I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic  -c array.cpp -o array.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
-I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
-I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic  -c array_from_vector.cpp -o 
array_from_vector.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
-I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
-I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic  -c array_to_vector.cpp -o 
array_to_vector.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
-I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
-I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic  -c arraydata.cpp -o arraydata.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
-I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
-I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic  -c arrowExports.cpp -o 
arrowExports.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
-I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
-I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic  -c buffer.cpp -o buffer.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
-I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
-I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic  -c chunkedarray.cpp -o 
chunkedarray.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
-I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
-I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic  -c compression.cpp -o compression.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
-I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
-I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic  -c compute.cpp -o compute.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
-I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
-I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic  -c csv.cpp -o csv.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
-I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
-I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic  -c dataset.cpp -o dataset.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  

[jira] [Comment Edited] (ARROW-5949) [Rust] Implement DictionaryArray

2020-04-24 Thread Mahmut Bulut (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091512#comment-17091512
 ] 

Mahmut Bulut edited comment on ARROW-5949 at 4/24/20, 12:22 PM:


Hi, I've just seen this. Is there any reason why we provide custom iterator 
over keys? (Which is basically resolving into Option) Can we use 0 as a null 
identifier?

 

Reason 1: Iteration over an Iterator<Item = Option<K>> will take time, and rebuilding from that for lookup takes double the time.

Reason 2: We can't use SIMD for parallel comparison.


was (Author: vertexclique):
Hi, I've just seen this. Is there any reason why we provide custom iterator 
over keys? (Which is basically resolving into Option) Can we use 0 as a null 
identifier?

> [Rust] Implement DictionaryArray
> 
>
> Key: ARROW-5949
> URL: https://issues.apache.org/jira/browse/ARROW-5949
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: David Atienza
>Assignee: David Atienza
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 18h
>  Remaining Estimate: 0h
>
> I am pretty new to the codebase, but I have seen that DictionaryArray is not 
> implemented in the Rust implementation.
> I went through the list of issues and I could not see any work on this. Is 
> there any blocker?
>  
> The specification is a bit 
> [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] 
> or even 
> [non-existent|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding],
>  so I am not sure how to implement it myself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8586) Failed to Install arrow From CRAN

2020-04-24 Thread Hei (Jira)
Hei created ARROW-8586:
--

 Summary: Failed to Install arrow From CRAN
 Key: ARROW-8586
 URL: https://issues.apache.org/jira/browse/ARROW-8586
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Affects Versions: 0.17.0
 Environment: CentOS 7
Reporter: Hei


Hi,

I am trying to install arrow via RStudio, but it does not seem to work: after 
I installed the package, it kept asking me to run arrow::install_arrow(), even 
after I did:

{code}
> install.packages("arrow")Installing package into 
> ‘/home/hc/R/x86_64-redhat-linux-gnu-library/3.6’(as ‘lib’ is 
> unspecified)trying URL 
> 'https://cran.rstudio.com/src/contrib/arrow_0.17.0.tar.gz'Content type 
> 'application/x-gzip' length 242534 bytes (236 
> KB)==downloaded 236 KB
* installing *source* package ‘arrow’ ...** package ‘arrow’ successfully 
unpacked and MD5 sums checked** using staged installation*** Successfully 
retrieved C++ source*** Building C++ libraries cmake arrow  
./configure: line 132: cd: libarrow/arrow-0.17.0/lib: Not a 
directory- NOTE ---After 
installation, please run arrow::install_arrow()for help installing required 
runtime libraries-** 
libsg++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
-I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
-I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic  -c array.cpp -o array.og++ -m64 
-std=gnu++11 -I"/usr/include/R" -DNDEBUG  
-I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
-I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic  -c array_from_vector.cpp -o 
array_from_vector.og++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
-I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
-I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic  -c array_to_vector.cpp -o 
array_to_vector.og++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
-I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
-I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic  -c arraydata.cpp -o arraydata.og++ 
-m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
-I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
-I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic  -c arrowExports.cpp -o 
arrowExports.og++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
-I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
-I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic  -c buffer.cpp -o buffer.og++ -m64 
-std=gnu++11 -I"/usr/include/R" -DNDEBUG  
-I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
-I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic  -c chunkedarray.cpp -o 
chunkedarray.og++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
-I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
-I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic  -c compression.cpp -o 
compression.og++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
-I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
-I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic  -c compute.cpp -o compute.og++ 
-m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
-I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
-I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic  -c csv.cpp -o csv.og++ -m64 
-std=gnu++11 -I"/usr/include/R" -DNDEBUG  
-I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
-I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong 

[jira] [Commented] (ARROW-5949) [Rust] Implement DictionaryArray

2020-04-24 Thread Mahmut Bulut (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091512#comment-17091512
 ] 

Mahmut Bulut commented on ARROW-5949:
-

Hi, I've just seen this. Is there any reason why we provide a custom iterator 
over keys, which basically resolves into Option or None? Can we use 0 as a 
null identifier?

> [Rust] Implement DictionaryArray
> 
>
> Key: ARROW-5949
> URL: https://issues.apache.org/jira/browse/ARROW-5949
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: David Atienza
>Assignee: David Atienza
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 18h
>  Remaining Estimate: 0h
>
> I am pretty new to the codebase, but I have seen that DictionaryArray is not 
> implemented in the Rust implementation.
> I went through the list of issues and I could not see any work on this. Is 
> there any blocker?
>  
> The specification is a bit 
> [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] 
> or even 
> [non-existent|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding],
>  so I am not sure how to implement it myself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-5949) [Rust] Implement DictionaryArray

2020-04-24 Thread Mahmut Bulut (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091512#comment-17091512
 ] 

Mahmut Bulut edited comment on ARROW-5949 at 4/24/20, 12:20 PM:


Hi, I've just seen this. Is there any reason why we provide a custom iterator 
over keys (which basically resolves into Option)? Can we use 0 as a null 
identifier?


was (Author: vertexclique):
Hi, I've just seen this. Is there any reason why we provide a custom iterator 
over keys, which basically resolves into Option or None? Can we use 0 as a 
null identifier?

> [Rust] Implement DictionaryArray
> 
>
> Key: ARROW-5949
> URL: https://issues.apache.org/jira/browse/ARROW-5949
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: David Atienza
>Assignee: David Atienza
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 18h
>  Remaining Estimate: 0h
>
> I am pretty new to the codebase, but I have seen that DictionaryArray is not 
> implemented in the Rust implementation.
> I went through the list of issues and I could not see any work on this. Is 
> there any blocker?
>  
> The specification is a bit 
> [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] 
> or even 
> [non-existent|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding],
>  so I am not sure how to implement it myself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8578) [C++][Flight] Test executable failures due to "SO_REUSEPORT unavailable on compiling system"

2020-04-24 Thread David Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091502#comment-17091502
 ] 

David Li commented on ARROW-8578:
-

As Antoine mentioned, it's just a red herring (has to do with where gRPC was 
built). Unfortunately gRPC isn't so good about surfacing issues to the 
application; running with {{env GRPC_VERBOSITY=DEBUG}} and if needed 
{{GRPC_TRACE=all}} will give more information.
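
For example, a hypothetical invocation against the benchmark binary from the 
report below:
{code}
$ env GRPC_VERBOSITY=DEBUG GRPC_TRACE=all release/arrow-flight-benchmark
{code}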

> [C++][Flight] Test executable failures due to "SO_REUSEPORT unavailable on 
> compiling system"
> 
>
> Key: ARROW-8578
> URL: https://issues.apache.org/jira/browse/ARROW-8578
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> Tried compiling and running this today  (with grpc 1.28.1)
> {code}
> $ release/arrow-flight-benchmark 
> Using standalone server: false
> Server running with pid 22385
> Testing method: DoGet
> Server host: localhost
> Server port: 31337
> E0423 21:54:15.174285695   22385 socket_utils_common_posix.cc:222] check for 
> SO_REUSEPORT: {"created":"@1587696855.174280083","description":"SO_REUSEPORT 
> unavailable on compiling 
> system","file":"../src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":190}
> Server host: localhost
> {code}
> my Linux kernel
> {code}
> $ uname -a
> Linux 4.15.0-1079-oem #89-Ubuntu SMP Fri Mar 27 05:22:11 UTC 2020 x86_64 
> x86_64 x86_64 GNU/Linux
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8585) [Packaging][Python] Windows wheels fail to build because of link error

2020-04-24 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8585:
--

 Summary: [Packaging][Python] Windows wheels fail to build because 
of link error
 Key: ARROW-8585
 URL: https://issues.apache.org/jira/browse/ARROW-8585
 Project: Apache Arrow
  Issue Type: Bug
  Components: Packaging, Python
Reporter: Krisztian Szucs
 Fix For: 1.0.0


See build log 
https://ci.appveyor.com/project/Ursa-Labs/crossbow/builds/32406283#L1088



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8584) [Packaging][C++] Protobuf link error in deb builds

2020-04-24 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-8584:
---
Description: 
See build log 
Stretch: https://github.com/ursa-labs/crossbow/runs/614358553
Focal: https://github.com/ursa-labs/crossbow/runs/614358637

cc @kou

  was:
See build log https://github.com/ursa-labs/crossbow/runs/614358553

cc @kou


> [Packaging][C++] Protobuf link error in deb builds
> --
>
> Key: ARROW-8584
> URL: https://issues.apache.org/jira/browse/ARROW-8584
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Packaging
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 1.0.0
>
>
> See build log 
> Stretch: https://github.com/ursa-labs/crossbow/runs/614358553
> Focal: https://github.com/ursa-labs/crossbow/runs/614358637
> cc @kou



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8584) [Packaging][C++] Protobuf link error in deb builds

2020-04-24 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-8584:
---
Summary: [Packaging][C++] Protobuf link error in deb builds  (was: 
[Packaging][C++] Protobuf link error in debian-stretch build)

> [Packaging][C++] Protobuf link error in deb builds
> --
>
> Key: ARROW-8584
> URL: https://issues.apache.org/jira/browse/ARROW-8584
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Packaging
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 1.0.0
>
>
> See build log https://github.com/ursa-labs/crossbow/runs/614358553
> cc @kou



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8584) [Packaging][C++] Protobuf link error in debian-stretch build

2020-04-24 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8584:
--

 Summary: [Packaging][C++] Protobuf link error in debian-stretch 
build
 Key: ARROW-8584
 URL: https://issues.apache.org/jira/browse/ARROW-8584
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Packaging
Reporter: Krisztian Szucs
 Fix For: 1.0.0


See build log https://github.com/ursa-labs/crossbow/runs/614358553

cc @kou



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8583) [C++][Doc] Undocumented parameter in Dataset namespace

2020-04-24 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8583:
--

 Summary: [C++][Doc] Undocumented parameter in Dataset namespace
 Key: ARROW-8583
 URL: https://issues.apache.org/jira/browse/ARROW-8583
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Documentation
Reporter: Krisztian Szucs
 Fix For: 1.0.0


See build log: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-24-0-circle-test-ubuntu-18.04-docs

We should build the doxygen docs on each commit, preferably in the conda-cpp 
build.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8582) [Packaging][Python] macOS wheels occasionally exceed travis build time limit

2020-04-24 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8582:
--

 Summary: [Packaging][Python] macOS wheels occasionally exceed 
travis build time limit
 Key: ARROW-8582
 URL: https://issues.apache.org/jira/browse/ARROW-8582
 Project: Apache Arrow
  Issue Type: Bug
  Components: Packaging, Python
Reporter: Krisztian Szucs


Either reduce the build time or port to another CI provider.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8578) [C++][Flight] Test executable failures due to "SO_REUSEPORT unavailable on compiling system"

2020-04-24 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091401#comment-17091401
 ] 

Antoine Pitrou commented on ARROW-8578:
---

The SO_REUSEPORT message is just a notice from gRPC, not an actual error.

> [C++][Flight] Test executable failures due to "SO_REUSEPORT unavailable on 
> compiling system"
> 
>
> Key: ARROW-8578
> URL: https://issues.apache.org/jira/browse/ARROW-8578
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> Tried compiling and running this today  (with grpc 1.28.1)
> {code}
> $ release/arrow-flight-benchmark 
> Using standalone server: false
> Server running with pid 22385
> Testing method: DoGet
> Server host: localhost
> Server port: 31337
> E0423 21:54:15.174285695   22385 socket_utils_common_posix.cc:222] check for 
> SO_REUSEPORT: {"created":"@1587696855.174280083","description":"SO_REUSEPORT 
> unavailable on compiling 
> system","file":"../src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":190}
> Server host: localhost
> {code}
> my Linux kernel
> {code}
> $ uname -a
> Linux 4.15.0-1079-oem #89-Ubuntu SMP Fri Mar 27 05:22:11 UTC 2020 x86_64 
> x86_64 x86_64 GNU/Linux
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7808) [Java][Dataset] Implement Datasets Java API

2020-04-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-7808:
--
Labels: dataset pull-request-available  (was: dataset)

> [Java][Dataset] Implement Datasets Java API 
> 
>
> Key: ARROW-7808
> URL: https://issues.apache.org/jira/browse/ARROW-7808
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Java
>Reporter: Hongze Zhang
>Priority: Major
>  Labels: dataset, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Porting following C++ Datasets APIs to Java: 
> * DataSource 
> * DataSourceDiscovery 
> * DataFragment 
> * Dataset
> * Scanner 
> * ScanTask 
> * ScanOptions 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8581) [C#] Date32/64Array.Builder should accept DateTime, not DateTimeOffset

2020-04-24 Thread Adam Szmigin (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szmigin updated ARROW-8581:

Description: 
h1. Summary Proposal

The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values 
of type {{DateTimeOffset}}, but this makes it very easy for the user to 
introduce subtle bugs when they work with the {{DateTime}} type in their own 
code.  This class of bugs could be avoided if these builders were instead typed 
on {{DateTime}} rather than {{DateTimeOffset}}.
h1. Details

The danger is introduced by the implicit widening conversion provided by the 
_DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator:
 
[https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1]

The important part is this text:
{quote}The offset of the resulting DateTimeOffset object depends on the value 
of the DateTime.Kind property of the dateTime parameter:
 * If the value of the DateTime.Kind property is DateTimeKind.Local or 
DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is set 
equal to dateTime, and its Offset property *is set equal to the offset of the 
local system's current time zone*.{quote}
 (Emphasis mine)

 If the user is operating in an environment with a positive GMT offset, it is 
very easy to write the wrong date to the builder:
{code:c#}
Console.WriteLine(TimeZoneInfo.Local.GetUtcOffset(DateTime.UtcNow)); // Bug triggers if > 00:00:00
var builder = new Date32Array.Builder();
builder.Append(new DateTime(2020, 4, 24)); // Kind == DateTimeKind.Unspecified
var allocator = new NativeMemoryAllocator();
Console.WriteLine(builder.Build(allocator).GetDate(0)); // Prints 2020-04-23!
{code}
Assume that the user is in the UK (as I am), where the GMT offset on the above 
date is 1 hour ahead.  This means that the conversion to {{DateTimeOffset}} 
will actually result in a value of {{2020-04-23T23:00:00+01:00}} being passed 
to the {{Append()}} method.  Arrow then calls {{ToUnixTimeMilliseconds()}}, 
which [only considers the date 
portion|https://referencesource.microsoft.com/#mscorlib/system/datetimeoffset.cs,8f33340c07c4787e]
 of its object, not the time portion or offset.  This means that the number of 
days gets calculated based on 2020-04-23, not 2020-04-24 as the user thought 
they were specifying.

If the user chooses to use NodaTime as a "better" date and time-handling 
library, they will still likely run into the bug if they do the obvious thing:
{code:c#}
Console.WriteLine(TimeZoneInfo.Local.GetUtcOffset(DateTime.UtcNow)); // Bug triggers if > 00:00:00
var builder = new Date32Array.Builder();
var ld = new NodaTime.LocalDate(2020, 4, 24);
builder.Append(ld.ToDateTimeUnspecified()); // Kind == DateTimeKind.Unspecified
var allocator = new NativeMemoryAllocator();
Console.WriteLine(builder.Build(allocator).GetDate(0)); // Prints 2020-04-23!
{code}
h1. Suggested Improvement
 * Change {{Date32Array.Builder}} and {{Date64Array.Builder}} to specify a 
{{TFrom}} parameter of {{DateTime}}, not {{DateTimeOffset}} (breaking change).
 * Change {{Date32Array.GetDate()}} and {{Date64Array.GetDate()}} to return a 
{{DateTime}}, not {{DateTimeOffset}} (also a breaking change).

 

The conversion method for a {{Date32Array}} would then look a bit like this:
{code:c#}
private static readonly DateTime Epoch = new DateTime(1970, 1, 1);

protected override int ConvertTo(DateTime value)
{
return (int)(value - Epoch).TotalDays;
} {code}
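
In the meantime, a minimal caller-side sketch under the current 0.17.0 builder 
API (hypothetical workaround, reusing the Apache.Arrow types from the examples 
above): constructing the {{DateTimeOffset}} explicitly with a zero offset keeps 
the implicit local-zone conversion from ever running.
{code:c#}
var builder = new Date32Array.Builder();
var date = new DateTime(2020, 4, 24); // Kind == DateTimeKind.Unspecified
// Supply the offset ourselves instead of letting the implicit
// DateTime -> DateTimeOffset conversion apply the local zone's offset.
builder.Append(new DateTimeOffset(date, TimeSpan.Zero));
var allocator = new NativeMemoryAllocator();
Console.WriteLine(builder.Build(allocator).GetDate(0)); // Now prints 2020-04-24
{code}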

  was:
h1. Summary Proposal

The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values 
of type {{DateTimeOffset}}, but this makes it very easy for the user to 
introduce subtle bugs when they work with the {{DateTime}} type in their own 
code.  This class of bugs could be avoided if these builders were instead typed 
on {{DateTime}} rather than {{DateTimeOffset}}.
h1. Details

The danger is introduced by the implicit widening conversion provided by the 
_DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator:
 
[https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1]

The important part is this text:
{quote}The offset of the resulting DateTimeOffset object depends on the value 
of the DateTime.Kind property of the dateTime parameter:
 * If the value of the DateTime.Kind property is DateTimeKind.Local or 
DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is set 
equal to dateTime, and its Offset property *is set equal to the offset of the 
local system's current time zone*.{quote}
 (Emphasis mine)

 If the user is operating in an environment with a positive GMT offset, it is 
very easy to write the wrong date to the builder:
{code:c#}
Console.WriteLine(TimeZoneInfo.Local.GetUtcOffset(DateTime.UtcNow)); // Bug triggers if > 00:00:00
var builder = new Date32Array.Builder();
builder.Append(new DateTime(2020, 4, 24)); 

[jira] [Updated] (ARROW-8581) [C#] Date32/64Array.Builder should accept DateTime, not DateTimeOffset

2020-04-24 Thread Adam Szmigin (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szmigin updated ARROW-8581:

Description: 
h1. Summary Proposal

The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values 
of type {{DateTimeOffset}}, but this makes it very easy for the user to 
introduce subtle bugs when they work with the {{DateTime}} type in their own 
code.  This class of bugs could be avoided if these builders were instead typed 
on {{DateTime}} rather than {{DateTimeOffset}}.
h1. Details

The danger is introduced by the implicit widening conversion provided by the 
_DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator:
 
[https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1]

The important part is this text:
{quote}The offset of the resulting DateTimeOffset object depends on the value 
of the DateTime.Kind property of the dateTime parameter:
 * If the value of the DateTime.Kind property is DateTimeKind.Local or 
DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is set 
equal to dateTime, and its Offset property *is set equal to the offset of the 
local system's current time zone*.{quote}
 (Emphasis mine)

 If the user is operating in an environment with a positive GMT offset, it is 
very easy to write the wrong date to the builder:
{code:c#}
Console.WriteLine(TimeZoneInfo.Local.GetUtcOffset(DateTime.UtcNow)); // Bug triggers if > 00:00:00
var builder = new Date32Array.Builder();
builder.Append(new DateTime(2020, 4, 24)); // Kind == DateTimeKind.Unspecified
var allocator = new NativeMemoryAllocator();
Console.WriteLine(builder.Build(allocator).GetDate(0)); // Prints 2020-04-23!
{code}
Assume that the user is in the UK (as I am), where the GMT offset on the above 
date is 1 hour ahead.  This means that the conversion to {{DateTimeOffset}} 
will actually result in a value of {{2020-04-23T23:00:00+01:00}} being passed 
to the {{Append()}} method.  Arrow then calls {{ToUnixTimeMilliseconds()}}, 
which [only considers the date 
portion|https://referencesource.microsoft.com/#mscorlib/system/datetimeoffset.cs,8f33340c07c4787e]
 of its object, not the time portion or offset.  This means that the number of 
days gets calculated based on 2020-04-23, not 2020-04-24 as the user thought 
they were specifying.

If the user chooses to use NodaTime as a "better" date and time-handling 
library, they will still likely run into the bug if they do the obvious thing:
{code:c#}
Console.WriteLine(TimeZoneInfo.Local.GetUtcOffset(DateTime.UtcNow)); // Bug triggers if > 00:00:00
var builder = new Date32Array.Builder();
var ld = new NodaTime.LocalDate(2020, 4, 24);
builder.Append(ld.ToDateTimeUnspecified()); // Kind == DateTimeKind.Unspecified
var allocator = new NativeMemoryAllocator();
Console.WriteLine(builder.Build(allocator).GetDate(0)); // Prints 2020-04-23!
{code}
h1. Suggested Improvement
 * Change {{Date32Array.Builder}} and {{Date64Array.Builder}} to specify a 
{{TFrom}} parameter of {{DateTime}}, not {{DateTimeOffset}} (breaking change).
 * Change {{Date32Array.GetDate()}} and {{Date64Array.GetDate()}} to return a 
{{DateTime}}, not {{DateTimeOffset}} (also a breaking change).

 

The conversion method for a {{Date32Array}} would then look a bit like this:
{code:c#}
private static readonly DateTime Epoch = new DateTime(1970, 1, 1);

protected override int ConvertTo(DateTime value)
{
return (int)(value - Epoch).TotalDays;
} {code}

  was:
h1. Summary Proposal

The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values 
of type {{DateTimeOffset}}, but this makes it very easy for the user to 
introduce subtle bugs when they work with the {{DateTime}} type in their own 
code.  This class of bugs could be avoided if these builders were instead typed 
on {{DateTime}} rather than {{DateTimeOffset}}.
h1. Details

The danger is introduced by the implicit widening conversion provided by the 
_DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator:
 
[https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1]

The important part is this text:
{quote}The offset of the resulting DateTimeOffset object depends on the value 
of the DateTime.Kind property of the dateTime parameter:
 * If the value of the DateTime.Kind property is DateTimeKind.Local or 
DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is set 
equal to dateTime, and its Offset property *is set equal to the offset of the 
local system's current time zone*.{quote}
 (Emphasis mine)

 If the user is operating in an environment with a positive GMT offset, it is 
very easy to write the wrong date to the builder:
{code:c#}
Console.WriteLine(TimeZoneInfo.Local.GetUtcOffset(DateTime.UtcNow)); // Bug triggers if > 00:00:00
var builder = new Date32Array.Builder();
builder.Append(new DateTime(2020, 4, 24)); 

[jira] [Updated] (ARROW-8581) [C#] Date32/64Array.Builder should accept DateTime, not DateTimeOffset

2020-04-24 Thread Adam Szmigin (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szmigin updated ARROW-8581:

Description: 
h1. Summary Proposal

The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values 
of type {{DateTimeOffset}}, but this makes it very easy for the user to 
introduce subtle bugs when they work with the {{DateTime}} type in their own 
code.  This class of bugs could be avoided if these builders were instead typed 
on {{DateTime}} rather than {{DateTimeOffset}}.
h1. Details

The danger is introduced by the implicit widening conversion provided by the 
_DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator:
 
[https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1]

The important part is this text:
{quote}The offset of the resulting DateTimeOffset object depends on the value 
of the DateTime.Kind property of the dateTime parameter:
 * If the value of the DateTime.Kind property is DateTimeKind.Local or 
DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is set 
equal to dateTime, and its Offset property *is set equal to the offset of the 
local system's current time zone*.{quote}
 (Emphasis mine)

 If the user is operating in an environment with a positive GMT offset, it is 
very easy to write the wrong date to the builder:
{code:c#}
Console.WriteLine(TimeZoneInfo.Local.GetUtcOffset(DateTime.UtcNow)); // Bug triggers if > 00:00:00
var builder = new Date32Array.Builder();
builder.Append(new DateTime(2020, 4, 24)); // Kind == DateTimeKind.Unspecified
var allocator = new NativeMemoryAllocator();
Console.WriteLine(builder.Build(allocator).GetDate(0)); // Prints 2020-04-23!
{code}
Assume that the user is in the UK (as I am), where the GMT offset on the above 
date is 1 hour ahead.  This means that the conversion to {{DateTimeOffset}} 
will actually result in a value of {{2020-04-23T23:00:00+01:00}} being passed 
to the {{Append()}} method.  Arrow then calls {{ToUnixTimeMilliseconds()}}, 
which [only considers the date 
portion|https://referencesource.microsoft.com/#mscorlib/system/datetimeoffset.cs,8f33340c07c4787e]
 of its object, not the time portion or offset.  This means that the number of 
days gets calculated based on 2020-04-23, not 2020-04-24 as the user thought 
they were specifying.

If the user chooses to use NodaTime as a "better" date and time-handling 
library, they will still likely run into the bug if they do the obvious thing:
{code:c#}
Console.WriteLine(TimeZoneInfo.Local.GetUtcOffset(DateTime.UtcNow)); // Bug triggers if > 00:00:00
var builder = new Date32Array.Builder();
var ld = new NodaTime.LocalDate(2020, 4, 24);
builder.Append(ld.ToDateTimeUnspecified()); // Kind == DateTimeKind.Unspecified
var allocator = new NativeMemoryAllocator();
Console.WriteLine(builder.Build(allocator).GetDate(0)); // Prints 2020-04-23!
{code}
h1. Suggested Improvement
 * Change {{Date32Array.Builder}} and {{Date64Array.Builder}} to specify a 
{{TFrom}} parameter of {{DateTime}}, not {{DateTimeOffset}} (breaking change).
 * Change {{Date32Array.GetDate()}} and {{Date64Array.GetDate()}} to return a 
{{DateTime}}, not {{DateTimeOffset}} (also a breaking change).

  was:
h1. Summary Proposal

The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values 
of type {{DateTimeOffset}}, but this makes it very easy for the user to 
introduce subtle bugs when they work with the {{DateTime}} type in their own 
code.  This class of bugs could be avoided if these builders were instead typed 
on {{DateTime}} rather than {{DateTimeOffset}}.
h1. Details

The danger is introduced by the implicit widening conversion provided by the 
_DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator:
 
[https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1]

The important part is this text:
{quote}The offset of the resulting DateTimeOffset object depends on the value 
of the DateTime.Kind property of the dateTime parameter:
 * If the value of the DateTime.Kind property is DateTimeKind.Local or 
DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is set 
equal to dateTime, and its Offset property *is set equal to the offset of the 
local system's current time zone*.{quote}
 (Emphasis mine)

 If the user is operating in an environment with a positive GMT offset, it is 
very easy to write the wrong date to the builder:
{code:c#}
var builder = new Date32Array.Builder();
builder.Append(new DateTime(2020, 4, 24)); // Kind == DateTimeKind.Unspecified: triggers the bug
{code}
Assume that the user is in the UK (as I am), where the GMT offset on the above 
date is 1 hour ahead.  This means that the conversion to {{DateTimeOffset}} 
will actually result in a value of {{2020-04-23T23:00:00+01:00}} being passed 
to the {{Append()}} method.  Arrow then calls 

[jira] [Updated] (ARROW-8581) [C#] Date32/64Array.Builder should accept DateTime, not DateTimeOffset

2020-04-24 Thread Adam Szmigin (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szmigin updated ARROW-8581:

Environment: (was: Windows 10 x64)

> [C#] Date32/64Array.Builder should accept DateTime, not DateTimeOffset
> --
>
> Key: ARROW-8581
> URL: https://issues.apache.org/jira/browse/ARROW-8581
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C#
>Affects Versions: 0.17.0
>Reporter: Adam Szmigin
>Priority: Major
>
> h1. Summary Proposal
> The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values 
> of type {{DateTimeOffset}}, but this makes it very easy for the user to 
> introduce subtle bugs when they work with the {{DateTime}} type in their own 
> code.  This class of bugs could be avoided if these builders were instead 
> typed on {{DateTime}} rather than {{DateTimeOffset}}.
> h1. Details
> The danger is introduced by the implicit widening conversion provided by the 
> _DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator:
>  
> [https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1]
> The important part is this text:
> {quote}The offset of the resulting DateTimeOffset object depends on the value 
> of the DateTime.Kind property of the dateTime parameter:
>  * If the value of the DateTime.Kind property is DateTimeKind.Local or 
> DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is 
> set equal to dateTime, and its Offset property *is set equal to the offset of 
> the local system's current time zone*.{quote}
>  (Emphasis mine)
>  If the user is operating in an environment with a positive GMT offset, it is 
> very easy to write the wrong date to the builder:
> {code:c#}
> var builder = new Date32Array.Builder();
> builder.Append(new DateTime(2020, 4, 24)); // Kind == DateTimeKind.Unspecified: triggers the bug
> {code}
> Assume that the user is in the UK (as I am), where the GMT offset on the 
> above date is 1 hour ahead.  This means that the conversion to 
> {{DateTimeOffset}} will actually result in a value of 
> {{2020-04-23T23:00:00+01:00}} being passed to the {{Append()}} method.  Arrow 
> then calls {{ToUnixTimeMilliseconds()}}, which [only considers the date 
> portion|https://referencesource.microsoft.com/#mscorlib/system/datetimeoffset.cs,8f33340c07c4787e]
>  of its object, not the time portion or offset.  This means that the number 
> of days gets calculated based on 2020-04-23, not 2020-04-24 as the user 
> thought they were specifying.
> If the user chooses to use NodaTime as a "better" date and time-handling 
> library, they will still likely run into the bug if they do the obvious thing:
> {code:c#}
> var builder = new Date32Array.Builder();
> var ld = new NodaTime.LocalDate(2020, 4, 24);
> builder.Append(ld.ToDateTimeUnspecified()); // Kind == DateTimeKind.Unspecified: also triggers the bug
> {code}
> h1. Suggested Improvement
>  * Change {{Date32Array.Builder}} and {{Date64Array.Builder}} to specify a 
> {{TFrom}} parameter of {{DateTime}}, not {{DateTimeOffset}} (breaking change).
>  * Change {{Date32Array.GetDate()}} and {{Date64Array.GetDate()}} to return a 
> {{DateTime}}, not {{DateTimeOffset}} (also a breaking change).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8581) [C#] Date32/64Array.Builder should accept DateTime, not DateTimeOffset

2020-04-24 Thread Adam Szmigin (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szmigin updated ARROW-8581:

Issue Type: Improvement  (was: Bug)

> [C#] Date32/64Array.Builder should accept DateTime, not DateTimeOffset
> --
>
> Key: ARROW-8581
> URL: https://issues.apache.org/jira/browse/ARROW-8581
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C#
>Affects Versions: 0.17.0
> Environment: Windows 10 x64
>Reporter: Adam Szmigin
>Priority: Major
>
> h1. Summary Proposal
> The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values 
> of type {{DateTimeOffset}}, but this makes it very easy for the user to 
> introduce subtle bugs when they work with the {{DateTime}} type in their own 
> code.  This class of bugs could be avoided if these builders were instead 
> typed on {{DateTime}} rather than {{DateTimeOffset}}.
> h1. Details
> The danger is introduced by the implicit widening conversion provided by the 
> _DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator:
>  
> [https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1]
> The important part is this text:
> {quote}The offset of the resulting DateTimeOffset object depends on the value 
> of the DateTime.Kind property of the dateTime parameter:
>  * If the value of the DateTime.Kind property is DateTimeKind.Local or 
> DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is 
> set equal to dateTime, and its Offset property *is set equal to the offset of 
> the local system's current time zone*.{quote}
>  (Emphasis mine)
>  If the user is operating in an environment with a positive GMT offset, it is 
> very easy to write the wrong date to the builder:
> {code:c#}
> var builder = new Date32Array.Builder();
> builder.Append(new DateTime(2020, 4, 24)); // Kind == DateTimeKind.Unspecified: triggers the bug
> {code}
> Assume that the user is in the UK (as I am), where the GMT offset on the 
> above date is 1 hour ahead.  This means that the conversion to 
> {{DateTimeOffset}} will actually result in a value of 
> {{2020-04-23T23:00:00+01:00}} being passed to the {{Append()}} method.  Arrow 
> then calls {{ToUnixTimeMilliseconds()}}, which [only considers the date 
> portion|https://referencesource.microsoft.com/#mscorlib/system/datetimeoffset.cs,8f33340c07c4787e]
>  of its object, not the time portion or offset.  This means that the number 
> of days gets calculated based on 2020-04-23, not 2020-04-24 as the user 
> thought they were specifying.
> If the user chooses to use NodaTime as a "better" date and time-handling 
> library, they will still likely run into the bug if they do the obvious thing:
> {code:c#}
> var builder = new Date32Array.Builder();
> var ld = new NodaTime.LocalDate(2020, 4, 24);
> builder.Append(ld.ToDateTimeUnspecified()); // Kind == DateTimeKind.Unspecified: also triggers the bug
> {code}
> h1. Suggested Improvement
>  * Change {{Date32Array.Builder}} and {{Date64Array.Builder}} to specify a 
> {{TFrom}} parameter of {{DateTime}}, not {{DateTimeOffset}} (breaking change).
>  * Change {{Date32Array.GetDate()}} and {{Date64Array.GetDate()}} to return a 
> {{DateTime}}, not {{DateTimeOffset}} (also a breaking change).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8581) [C#] Date32/64Array.Builder should accept DateTime, not DateTimeOffset

2020-04-24 Thread Adam Szmigin (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szmigin updated ARROW-8581:

Summary: [C#] Date32/64Array.Builder should accept DateTime, not 
DateTimeOffset  (was: [C#] Date32/64Array write & read back introduces 
off-by-one error)

> [C#] Date32/64Array.Builder should accept DateTime, not DateTimeOffset
> --
>
> Key: ARROW-8581
> URL: https://issues.apache.org/jira/browse/ARROW-8581
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C#
>Affects Versions: 0.17.0
> Environment: Windows 10 x64
>Reporter: Adam Szmigin
>Priority: Major
>
> h1. Summary Proposal
> The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values 
> of type {{DateTimeOffset}}, but this makes it very easy for the user to 
> introduce subtle bugs when they work with the {{DateTime}} type in their own 
> code.  This class of bugs could be avoided if these builders were instead 
> typed on {{DateTime}} rather than {{DateTimeOffset}}.
> h1. Details
> The danger is introduced by the implicit widening conversion provided by the 
> _DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator:
>  
> [https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1]
> The important part is this text:
> {quote}The offset of the resulting DateTimeOffset object depends on the value 
> of the DateTime.Kind property of the dateTime parameter:
>  * If the value of the DateTime.Kind property is DateTimeKind.Local or 
> DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is 
> set equal to dateTime, and its Offset property *is set equal to the offset of 
> the local system's current time zone*.{quote}
>  (Emphasis mine)
>  If the user is operating in an environment with a positive GMT offset, it is 
> very easy to write the wrong date to the builder:
> {code:c#}
> var builder = new Date32Array.Builder();
> builder.Append(new DateTime(2020, 4, 24)); // Kind == DateTimeKind.Unspecified: triggers the bug
> {code}
> Assume that the user is in the UK (as I am), where the GMT offset on the 
> above date is 1 hour ahead.  This means that the conversion to 
> {{DateTimeOffset}} will actually result in a value of 
> {{2020-04-23T23:00:00+01:00}} being passed to the {{Append()}} method.  Arrow 
> then calls {{ToUnixTimeMilliseconds()}}, which [only considers the date 
> portion|https://referencesource.microsoft.com/#mscorlib/system/datetimeoffset.cs,8f33340c07c4787e]
>  of its object, not the time portion or offset.  This means that the number 
> of days gets calculated based on 2020-04-23, not 2020-04-24 as the user 
> thought they were specifying.
> If the user chooses to use NodaTime as a "better" date and time-handling 
> library, they will still likely run into the bug if they do the obvious thing:
> {code:c#}
> var builder = new Date32Array.Builder();
> var ld = new NodaTime.LocalDate(2020, 4, 24);
> builder.Append(ld.ToDateTimeUnspecified()); // Kind == DateTimeKind.Unspecified: also triggers the bug
> {code}
> h1. Suggested Improvement
>  * Change {{Date32Array.Builder}} and {{Date64Array.Builder}} to specify a 
> {{TFrom}} parameter of {{DateTime}}, not {{DateTimeOffset}} (breaking change).
>  * Change {{Date32Array.GetDate()}} and {{Date64Array.GetDate()}} to return a 
> {{DateTime}}, not {{DateTimeOffset}} (also a breaking change).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8581) [C#] Date32/64Array write & read back introduces off-by-one error

2020-04-24 Thread Adam Szmigin (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szmigin updated ARROW-8581:

Description: 
h1. Summary Proposal

The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values 
of type {{DateTimeOffset}}, but this makes it very easy for the user to 
introduce subtle bugs when they work with the {{DateTime}} type in their own 
code.  This class of bugs could be avoided if these builders were instead typed 
on {{DateTime}} rather than {{DateTimeOffset}}.
h1. Details

The danger is introduced by the implicit widening conversion provided by the 
_DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator:
 
[https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1]

The important part is this text:
{quote}The offset of the resulting DateTimeOffset object depends on the value 
of the DateTime.Kind property of the dateTime parameter:
 * If the value of the DateTime.Kind property is DateTimeKind.Local or 
DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is set 
equal to dateTime, and its Offset property *is set equal to the offset of the 
local system's current time zone*.{quote}
 (Emphasis mine)

 If the user is operating in an environment with a positive GMT offset, it is 
very easy to write the wrong date to the builder:
{code:c#}
var builder = new Date32Array.Builder();
builder.Append(new DateTime(2020, 4, 24)); // Kind == DateTimeKind.Unspecified: triggers the bug
{code}
Assume that the user is in the UK (as I am), where the GMT offset on the above 
date is 1 hour ahead.  This means that the conversion to {{DateTimeOffset}} 
will actually result in a value of {{2020-04-23T23:00:00+01:00}} being passed 
to the {{Append()}} method.  Arrow then calls {{ToUnixTimeMilliseconds()}}, 
which [only considers the date 
portion|https://referencesource.microsoft.com/#mscorlib/system/datetimeoffset.cs,8f33340c07c4787e]
 of its object, not the time portion or offset.  This means that the number of 
days gets calculated based on 2020-04-23, not 2020-04-24 as the user thought 
they were specifying.

If the user chooses to use NodaTime as a "better" date and time-handling 
library, they will still likely run into the bug if they do the obvious thing:
{code:c#}
var builder = new Date32Array.Builder();
var ld = new NodaTime.LocalDate(2020, 4, 24);
builder.Append(ld.ToDateTimeUnspecified()); // Kind == DateTimeKind.Unspecified: also triggers the bug
{code}
h1. Suggested Improvement
 * Change {{Date32Array.Builder}} and {{Date64Array.Builder}} to specify a 
{{TFrom}} parameter of {{DateTime}}, not {{DateTimeOffset}} (breaking change).
 * Change {{Date32Array.GetDate()}} and {{Date64Array.GetDate()}} to return a 
{{DateTime}}, not {{DateTimeOffset}} (also a breaking change).

  was:
h1. Summary

Writing a Date value using either a {{Date32Array.Builder}} or 
{{Date64.Builder}} and then reading back the result from the built array 
introduces an off-by-one error in the value.  The following minimal code 
illustrates:
{code:c#}
namespace Date32ArrayReadWriteBug
{
using Apache.Arrow;
using Apache.Arrow.Memory;
using System;

internal static class Program
{
public static void Main(string[] args)
{
var allocator = new NativeMemoryAllocator();
var builder = new Date32Array.Builder();
var date = new DateTime(2020, 4, 24);
Console.WriteLine($"Appending date {date:-MM-dd}");
builder.Append(date);
var array = builder.Build(allocator);
var dateAgain = array.GetDate(0);
Console.WriteLine($"Read date {dateAgain:-MM-dd}");
}
}
}{code}
Change {{new Date32Array.Builder()}} to {{new Date64Array.Builder()}} in the 
above code as appropriate to demonstrate for the other type.
h2. Expected Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-24 {noformat}
h2. Actual Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-23 {noformat}
 


> [C#] Date32/64Array write & read back introduces off-by-one error
> -
>
> Key: ARROW-8581
> URL: https://issues.apache.org/jira/browse/ARROW-8581
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C#
>Affects Versions: 0.17.0
> Environment: Windows 10 x64
>Reporter: Adam Szmigin
>Priority: Major
>
> h1. Summary Proposal
> The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values 
> of type {{DateTimeOffset}}, but this makes it very easy for the user to 
> introduce subtle bugs when they work with the {{DateTime}} type in their own 
> code.  This class of bugs could be avoided if these builders were instead 
> typed on {{DateTime}} rather than {{DateTimeOffset}}.
> 

[jira] [Updated] (ARROW-8581) [C#] Date32/64Array write & read back introduces off-by-one error

2020-04-24 Thread Adam Szmigin (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szmigin updated ARROW-8581:

Description: 
h1. Summary

Writing a Date value using either a {{Date32Array.Builder}} or 
{{Date64.Builder}} and then reading back the result from the built array 
introduces an off-by-one error in the value.  The following minimal code 
illustrates:
{code:c#}
namespace Date32ArrayReadWriteBug
{
using Apache.Arrow;
using Apache.Arrow.Memory;
using System;

internal static class Program
{
public static void Main(string[] args)
{
var allocator = new NativeMemoryAllocator();
var builder = new Date32Array.Builder();
var date = new DateTime(2020, 4, 24);
Console.WriteLine($"Appending date {date:-MM-dd}");
builder.Append(date);
var array = builder.Build(allocator);
var dateAgain = array.GetDate(0);
Console.WriteLine($"Read date {dateAgain:-MM-dd}");
}
}
}{code}
Change {{new Date32Array.Builder()}} to {{new Date64Array.Builder()}} in the 
above code as appropriate to demonstrate for the other type.
h2. Expected Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-24 {noformat}
h2. Actual Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-23 {noformat}
 

  was:
h1. Summary

Writing a Date value using either a {{Date32Array.Builder}} or 
{{Date64.Builder}} and then reading back the result from the built array 
introduces an off-by-one error in the value.  The following minimal code 
illustrates:
{code:c#}
namespace Date32ArrayReadWriteBug
{
using Apache.Arrow;
using Apache.Arrow.Memory;
using System;

internal static class Program
{
public static void Main(string[] args)
{
var allocator = new NativeMemoryAllocator();
var builder = new Date32Array.Builder();
var date = new DateTime(2020, 4, 24);
Console.WriteLine($"Appending date {date:-MM-dd}");
builder.Append(date);
var array = builder.Build(allocator);
var dateAgain = array.GetDate(0);
Console.WriteLine($"Read date {dateAgain:-MM-dd}");
}
}
}{code}
h2. Expected Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-24 {noformat}
h2. Actual Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-23 {noformat}
 


> [C#] Date32/64Array write & read back introduces off-by-one error
> -
>
> Key: ARROW-8581
> URL: https://issues.apache.org/jira/browse/ARROW-8581
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C#
>Affects Versions: 0.17.0
> Environment: Windows 10 x64
>Reporter: Adam Szmigin
>Priority: Major
>
> h1. Summary
> Writing a Date value using either a {{Date32Array.Builder}} or 
> {{Date64.Builder}} and then reading back the result from the built array 
> introduces an off-by-one error in the value.  The following minimal code 
> illustrates:
> {code:c#}
> namespace Date32ArrayReadWriteBug
> {
> using Apache.Arrow;
> using Apache.Arrow.Memory;
> using System;
> internal static class Program
> {
> public static void Main(string[] args)
> {
> var allocator = new NativeMemoryAllocator();
> var builder = new Date32Array.Builder();
> var date = new DateTime(2020, 4, 24);
> Console.WriteLine($"Appending date {date:-MM-dd}");
> builder.Append(date);
> var array = builder.Build(allocator);
> var dateAgain = array.GetDate(0);
> Console.WriteLine($"Read date {dateAgain:-MM-dd}");
> }
> }
> }{code}
> Change {{new Date32Array.Builder()}} to {{new Date64Array.Builder()}} in the 
> above code as appropriate to demonstrate for the other type.
> h2. Expected Output
> {noformat}
> Appending date 2020-04-24
> Read date 2020-04-24 {noformat}
> h2. Actual Output
> {noformat}
> Appending date 2020-04-24
> Read date 2020-04-23 {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8581) [C#] Date32/64Array write & read back introduces off-by-one error

2020-04-24 Thread Adam Szmigin (Jira)
Adam Szmigin created ARROW-8581:
---

 Summary: [C#] Date32/64Array write & read back introduces 
off-by-one error
 Key: ARROW-8581
 URL: https://issues.apache.org/jira/browse/ARROW-8581
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Affects Versions: 0.17.0
 Environment: Windows 10 x64
Reporter: Adam Szmigin


h1. Summary

Writing a Date value using either a {{Date32Array.Builder}} or 
{{Date64.Builder}} and then reading back the result from the built array 
introduces an off-by-one error in the value.  The following minimal code 
illustrates:
{code:c#}
namespace Date32ArrayReadWriteBug
{
using Apache.Arrow;
using Apache.Arrow.Memory;
using System;internal static class Program
{
public static void Main(string[] args)
{
var allocator = new NativeMemoryAllocator();
var builder = new Date32Array.Builder();
var date = new DateTime(2020, 4, 24);
Console.WriteLine($"Appending date {date:-MM-dd}");
builder.Append(date);
var array = builder.Build(allocator);
var dateAgain = array.GetDate(0);
Console.WriteLine($"Read date {dateAgain:-MM-dd}");
}
}
}{code}
h2. Expected Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-24 {noformat}
h2. Actual Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-23 {noformat}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8581) [C#] Date32/64Array write & read back introduces off-by-one error

2020-04-24 Thread Adam Szmigin (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szmigin updated ARROW-8581:

Description: 
h1. Summary

Writing a Date value using either a {{Date32Array.Builder}} or 
{{Date64.Builder}} and then reading back the result from the built array 
introduces an off-by-one error in the value.  The following minimal code 
illustrates:
{code:c#}
namespace Date32ArrayReadWriteBug
{
using Apache.Arrow;
using Apache.Arrow.Memory;
using System;

internal static class Program
{
public static void Main(string[] args)
{
var allocator = new NativeMemoryAllocator();
var builder = new Date32Array.Builder();
var date = new DateTime(2020, 4, 24);
Console.WriteLine($"Appending date {date:-MM-dd}");
builder.Append(date);
var array = builder.Build(allocator);
var dateAgain = array.GetDate(0);
Console.WriteLine($"Read date {dateAgain:-MM-dd}");
}
}
}{code}
h2. Expected Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-24 {noformat}
h2. Actual Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-23 {noformat}
 

  was:
h1. Summary

Writing a Date value using either a {{Date32Array.Builder}} or 
{{Date64.Builder}} and then reading back the result from the built array 
introduces an off-by-one error in the value.  The following minimal code 
illustrates:
{code:c#}
namespace Date32ArrayReadWriteBug
{
using Apache.Arrow;
using Apache.Arrow.Memory;
using System;internal static class Program
{
public static void Main(string[] args)
{
var allocator = new NativeMemoryAllocator();
var builder = new Date32Array.Builder();
var date = new DateTime(2020, 4, 24);
Console.WriteLine($"Appending date {date:-MM-dd}");
builder.Append(date);
var array = builder.Build(allocator);
var dateAgain = array.GetDate(0);
Console.WriteLine($"Read date {dateAgain:-MM-dd}");
}
}
}{code}
h2. Expected Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-24 {noformat}
h2. Actual Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-23 {noformat}
 


> [C#] Date32/64Array write & read back introduces off-by-one error
> -
>
> Key: ARROW-8581
> URL: https://issues.apache.org/jira/browse/ARROW-8581
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C#
>Affects Versions: 0.17.0
> Environment: Windows 10 x64
>Reporter: Adam Szmigin
>Priority: Major
>
> h1. Summary
> Writing a Date value using either a {{Date32Array.Builder}} or 
> {{Date64.Builder}} and then reading back the result from the built array 
> introduces an off-by-one error in the value.  The following minimal code 
> illustrates:
> {code:c#}
> namespace Date32ArrayReadWriteBug
> {
> using Apache.Arrow;
> using Apache.Arrow.Memory;
> using System;
> internal static class Program
> {
> public static void Main(string[] args)
> {
> var allocator = new NativeMemoryAllocator();
> var builder = new Date32Array.Builder();
> var date = new DateTime(2020, 4, 24);
> Console.WriteLine($"Appending date {date:-MM-dd}");
> builder.Append(date);
> var array = builder.Build(allocator);
> var dateAgain = array.GetDate(0);
> Console.WriteLine($"Read date {dateAgain:-MM-dd}");
> }
> }
> }{code}
> h2. Expected Output
> {noformat}
> Appending date 2020-04-24
> Read date 2020-04-24 {noformat}
> h2. Actual Output
> {noformat}
> Appending date 2020-04-24
> Read date 2020-04-23 {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8568) [C++][Python] Crash on decimal cast in debug mode

2020-04-24 Thread Jacek Pliszka (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091300#comment-17091300
 ] 

Jacek Pliszka commented on ARROW-8568:
--

The problem is here:

{code}
DecimalStatus BasicDecimal128::Rescale(int32_t original_scale, int32_t new_scale,
                                       BasicDecimal128* out) const {
  DCHECK_NE(out, nullptr);
  DCHECK_NE(original_scale, new_scale);
{code}

First, there is a design question: should calling Rescale with original_scale 
== new_scale be allowed?

 

If not, I can fix it in my code somewhere. But IMHO Rescale should allow it, 
and should then handle any data overflow.
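
Should the check stay, a minimal caller-side sketch (hypothetical helper; only 
BasicDecimal128::Rescale and the DecimalStatus enum are taken from Arrow's 
headers) would skip the call when no rescale is needed:
{code}
#include <cstdint>

#include "arrow/util/basic_decimal.h"

using arrow::BasicDecimal128;
using arrow::DecimalStatus;

// Hypothetical guard: only invoke Rescale when the scale actually changes,
// so DCHECK_NE(original_scale, new_scale) can never fire.
DecimalStatus RescaleIfNeeded(const BasicDecimal128& value,
                              int32_t original_scale, int32_t new_scale,
                              BasicDecimal128* out) {
  if (original_scale == new_scale) {
    *out = value;  // same scale: the digits are already correct
    return DecimalStatus::kSuccess;
  }
  return value.Rescale(original_scale, new_scale, out);
}
{code}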

> [C++][Python] Crash on decimal cast in debug mode
> -
>
> Key: ARROW-8568
> URL: https://issues.apache.org/jira/browse/ARROW-8568
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.17.0
>Reporter: Antoine Pitrou
>Priority: Major
>
> {code:python}
> >>> arr = pa.array([Decimal('123.45')])
> >>> arr
> 
> [
>   123.45
> ]
> >>> arr.type
> Decimal128Type(decimal(5, 2))
> >>> arr.cast(pa.decimal128(4, 2))
> ../src/arrow/util/basic_decimal.cc:626:  Check failed: (original_scale) != (new_scale)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8579) [C++] AVX512 part for SIMD operations of DecodeSpaced/EncodeSpaced

2020-04-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8579:
--
Labels: pull-request-available  (was: )

> [C++] AVX512 part for SIMD operations of DecodeSpaced/EncodeSpaced
> --
>
> Key: ARROW-8579
> URL: https://issues.apache.org/jira/browse/ARROW-8579
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Frank Du
>Assignee: Frank Du
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As part of https://issues.apache.org/jira/browse/PARQUET-1841, an AVX512 
> path was identified with the help of the mask_compress_/mask_expand_ APIs.
> This Jira was created for the spaced benchmark, unit tests, the AVX512 path, 
> and other groundwork for further potential SIMD work on SSE/AVX2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)