[jira] [Created] (ARROW-8592) [C++] Docs still list LLVM 7 as compiler used
Micah Kornfield created ARROW-8592: -- Summary: [C++] Docs still list LLVM 7 as compiler used Key: ARROW-8592 URL: https://issues.apache.org/jira/browse/ARROW-8592 Project: Apache Arrow Issue Type: Bug Reporter: Micah Kornfield Assignee: Micah Kornfield should be LLVM 8 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-7706) [Python] saving a dataframe to the same partitioned location silently doubles the data
[ https://issues.apache.org/jira/browse/ARROW-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092028#comment-17092028 ] Will Jones commented on ARROW-7706: --- To add to the idea of write modes, Spark's Dataframe.saveAsTable() method has a mode attribute similar to what you're discussing here. Might be a good part of their API to imitate. It includes the modes: {quote} * ??append??: Append contents of this [{{DataFrame}}|https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame] to existing data. * ??overwrite??: Overwrite existing data. * ??error?? or ??errorifexists??: Throw an exception if data already exists. * ??ignore??: Silently ignore this operation if data already exists. {quote} The default is "error": error if destination is not empty. Reference: [https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameWriter.saveAsTable] > [Python] saving a dataframe to the same partitioned location silently doubles > the data > -- > > Key: ARROW-7706 > URL: https://issues.apache.org/jira/browse/ARROW-7706 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.15.1 >Reporter: Tsvika Shapira >Priority: Major > Labels: dataset, parquet > > When a user saves a dataframe: > {code:python} > df1.to_parquet('/tmp/table', partition_cols=['col_a'], engine='pyarrow') > {code} > it will create sub-directories named "{{a=val1}}", "{{a=val2}}" in > {{/tmp/table}}. Each of them will contain one (or more?) parquet files with > random filenames. > If a user runs the same command again, the code will use the existing > sub-directories, but with different (random) filenames. As a result, any data > loaded from this folder will be wrong - each row will be present twice. 
> For example, when using > {code:python} > df1.to_parquet('/tmp/table', partition_cols=['col_a'], engine='pyarrow') # > second time > df2 = pd.read_parquet('/tmp/table', engine='pyarrow') > assert len(df1) == len(df2) # raises an error{code} > This is a subtle change in the data that can pass unnoticed. > > I would expect that the code will prevent the user from using a non-empty > destination as a partitioned target. An overwrite flag can also be useful. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8435) [Python] A TypeError is raised while token expires during writing to S3
[ https://issues.apache.org/jira/browse/ARROW-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092026#comment-17092026 ] Shawn Li commented on ARROW-8435: - Hi Will, I posted the issue there as well because I'm not sure what the root cause is or where it belongs, as the issue occurred while using the `write_to_dataset` method of pyarrow. Thanks for linking them together. By the way, what a small world, I hope you're doing well! > [Python] A TypeError is raised while token expires during writing to S3 > --- > > Key: ARROW-8435 > URL: https://issues.apache.org/jira/browse/ARROW-8435 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.15.1 >Reporter: Shawn Li >Priority: Critical > > This issue occurs when a STS token expires *in the middle of* writing to S3. > An OSError: Write failed: TypeError("'NoneType' object is not > subscriptable",) is raised instead of a PermissionError. > > OSError: Write failed: TypeError("'NoneType' object is not subscriptable",) > Traceback (most recent call last): > File "/usr/local/lib/python3.6/site-packages/pyarrow/parquet.py", line 1450, > in > write_to_dataset write_table(subtable, f, **kwargs) > File "/usr/local/lib/python3.6/site-packages/pyarrow/parquet.py", line 1344, > in > write_table writer.write_table(table, row_group_size=row_group_size) > File "/usr/local/lib/python3.6/site-packages/pyarrow/parquet.py", line 474, > in > write_table self.writer.write_table(table, row_group_size=row_group_size) > File "pyarrow/_parquet.pyx", line 1375, in > pyarrow._parquet.ParquetWriter.write_table File "pyarrow/error.pxi", line 80, > in > pyarrow.lib.check_statuspyarrow.lib.ArrowIOError: Arrow error: IOError: The > provided token has expired.. 
Detail: Python exception: PermissionError > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "/usr/local/lib/python3.6/site-packages/s3fs/core.py", line 1096, in > _upload_chunk PartNumber=part, UploadId=self.mpu['UploadId'],TypeError: > 'NoneType' object is not subscriptable > environment is: > s3fs==0.4.0 > boto3==1.10.27 > botocore==1.13.27 > pyarrow==0.15.1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
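The traceback shape reported above (a TypeError raised *while handling* the real PermissionError) is a classic masking pattern: cleanup code that assumes state which was never initialized. A minimal stand-in, with hypothetical names rather than the real s3fs internals:

```python
class FakeMultipartUpload:
    """Hypothetical stand-in for the failing code path: the
    multipart-upload handle (`mpu`) is still None when the credential
    error fires, so the error-handling path subscripts None and raises
    TypeError, hiding the underlying PermissionError from the caller."""

    def __init__(self):
        self.mpu = None  # normally populated once the upload is created

    def _upload_chunk(self):
        try:
            # Stand-in for the S3 call rejected by the expired token:
            raise PermissionError("The provided token has expired.")
        except PermissionError:
            # The error path assumes the upload was initialized:
            return self.mpu["UploadId"]  # TypeError masks the real error
```

Python keeps the original error reachable via `__context__`, which is why the traceback shows "During handling of the above exception, another exception occurred".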
[jira] [Commented] (ARROW-8435) [Python] A TypeError is raised while token expires during writing to S3
[ https://issues.apache.org/jira/browse/ARROW-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092024#comment-17092024 ] Will Jones commented on ARROW-8435: --- This looks to be a bug in s3fs, and the issue is being tracked here: [https://github.com/dask/s3fs/issues/314] > [Python] A TypeError is raised while token expires during writing to S3 > --- > > Key: ARROW-8435 > URL: https://issues.apache.org/jira/browse/ARROW-8435 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.15.1 >Reporter: Shawn Li >Priority: Critical > > This issue occurs when a STS token expires *in the middle of* writing to S3. > An OSError: Write failed: TypeError("'NoneType' object is not > subscriptable",) is raised instead of a PermissionError. > > OSError: Write failed: TypeError("'NoneType' object is not subscriptable",) > Traceback (most recent call last): > File "/usr/local/lib/python3.6/site-packages/pyarrow/parquet.py", line 1450, > in > write_to_dataset write_table(subtable, f, **kwargs) > File "/usr/local/lib/python3.6/site-packages/pyarrow/parquet.py", line 1344, > in > write_table writer.write_table(table, row_group_size=row_group_size) > File "/usr/local/lib/python3.6/site-packages/pyarrow/parquet.py", line 474, > in > write_table self.writer.write_table(table, row_group_size=row_group_size) > File "pyarrow/_parquet.pyx", line 1375, in > pyarrow._parquet.ParquetWriter.write_table File "pyarrow/error.pxi", line 80, > in > pyarrow.lib.check_statuspyarrow.lib.ArrowIOError: Arrow error: IOError: The > provided token has expired.. 
Detail: Python exception: PermissionError > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "/usr/local/lib/python3.6/site-packages/s3fs/core.py", line 1096, in > _upload_chunk PartNumber=part, UploadId=self.mpu['UploadId'],TypeError: > 'NoneType' object is not subscriptable > environment is: > s3fs==0.4.0 > boto3==1.10.27 > botocore==1.13.27 > pyarrow==0.15.1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8586) [R] installation failure on CentOS 7
[ https://issues.apache.org/jira/browse/ARROW-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091996#comment-17091996 ] Hei commented on ARROW-8586: Thanks for looking into it. Here is the output: {code} > install.packages("arrow") Installing package into ‘/home/hc/R/x86_64-redhat-linux-gnu-library/3.6’ (as ‘lib’ is unspecified) trying URL 'https://cran.rstudio.com/src/contrib/arrow_0.17.0.tar.gz' Content type 'application/x-gzip' length 242534 bytes (236 KB) == downloaded 236 KB * installing *source* package ‘arrow’ ... ** package ‘arrow’ successfully unpacked and MD5 sums checked ** using staged installation *** Generating code with data-raw/codegen.R Fatal error: cannot open file 'data-raw/codegen.R': No such file or directory trying URL 'https://dl.bintray.com/ursalabs/arrow-r/libarrow/src/arrow-0.17.0.zip' Error in download.file(from_url, to_file, quiet = quietly) : cannot open URL 'https://dl.bintray.com/ursalabs/arrow-r/libarrow/src/arrow-0.17.0.zip' trying URL 'https://www.apache.org/dyn/closer.lua?action=download=arrow/arrow-0.17.0/apache-arrow-0.17.0.tar.gz' Content type 'application/x-gzip' length 6460548 bytes (6.2 MB) == downloaded 6.2 MB *** Successfully retrieved C++ source *** Building C++ libraries rm: cannot remove ‘src/*.o’: No such file or directory *** Building with MAKEFLAGS= -j2 cmake trying URL 'https://github.com/Kitware/CMake/releases/download/v3.16.2/cmake-3.16.2-Linux-x86_64.tar.gz' Content type 'application/octet-stream' length 39508533 bytes (37.7 MB) == downloaded 37.7 MB arrow with SOURCE_DIR=/tmp/RtmpFm22he/file197f1ee33abb/apache-arrow-0.17.0/cpp BUILD_DIR=/tmp/RtmpFm22he/file197f76cef765 DEST_DIR=libarrow/arrow-0.17.0 CMAKE=/tmp/RtmpFm22he/file197f10953f3a/cmake-3.16.2-Linux-x86_64/bin/cmake ++ pwd + : /tmp/RtmpgF1pnV/R.INSTALL195f36ddad48/arrow + : /tmp/RtmpFm22he/file197f1ee33abb/apache-arrow-0.17.0/cpp + : /tmp/RtmpFm22he/file197f76cef765 + : libarrow/arrow-0.17.0 + : 
/tmp/RtmpFm22he/file197f10953f3a/cmake-3.16.2-Linux-x86_64/bin/cmake ++ cd /tmp/RtmpFm22he/file197f1ee33abb/apache-arrow-0.17.0/cpp ++ pwd + SOURCE_DIR=/tmp/RtmpFm22he/file197f1ee33abb/apache-arrow-0.17.0/cpp ++ mkdir -p libarrow/arrow-0.17.0 ++ cd libarrow/arrow-0.17.0 ++ pwd + DEST_DIR=/tmp/RtmpgF1pnV/R.INSTALL195f36ddad48/arrow/libarrow/arrow-0.17.0 + '[' '' = '' ']' + which ninja + '[' '' = false ']' + mkdir -p /tmp/RtmpFm22he/file197f76cef765 + pushd /tmp/RtmpFm22he/file197f76cef765 /tmp/RtmpFm22he/file197f76cef765 /tmp/RtmpgF1pnV/R.INSTALL195f36ddad48/arrow + /tmp/RtmpFm22he/file197f10953f3a/cmake-3.16.2-Linux-x86_64/bin/cmake -DARROW_BOOST_USE_SHARED=OFF -DARROW_BUILD_TESTS=OFF -DARROW_BUILD_SHARED=OFF -DARROW_BUILD_STATIC=ON -DARROW_COMPUTE=ON -DARROW_CSV=ON -DARROW_DATASET=ON -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FILESYSTEM=ON -DARROW_JEMALLOC=ON -DARROW_JSON=ON -DARROW_PARQUET=ON -DARROW_WITH_BROTLI=OFF -DARROW_WITH_BZ2=OFF -DARROW_WITH_LZ4=OFF -DARROW_WITH_SNAPPY=OFF -DARROW_WITH_ZLIB=OFF -DARROW_WITH_ZSTD=OFF -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_LIBDIR=lib -DCMAKE_INSTALL_PREFIX=/tmp/RtmpgF1pnV/R.INSTALL195f36ddad48/arrow/libarrow/arrow-0.17.0 -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON -DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON -DCMAKE_UNITY_BUILD=ON -DOPENSSL_USE_STATIC_LIBS=ON -G 'Unix Makefiles' /tmp/RtmpFm22he/file197f1ee33abb/apache-arrow-0.17.0/cpp -- Building using CMake version: 3.16.2 -- The C compiler identification is GNU 8.3.1 -- The CXX compiler identification is GNU 8.3.1 -- Check for working C compiler: /opt/rh/devtoolset-8/root/usr/bin/cc -- Check for working C compiler: /opt/rh/devtoolset-8/root/usr/bin/cc -- works -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Detecting C compile features -- Detecting C compile features - done -- Check for working CXX compiler: /opt/rh/devtoolset-8/root/usr/bin/c++ -- Check for working CXX compiler: /opt/rh/devtoolset-8/root/usr/bin/c++ -- works -- Detecting CXX compiler 
ABI info -- Detecting CXX compiler ABI info - done -- Detecting CXX compile features -- Detecting CXX compile features - done -- Arrow version: 0.17.0 (full: '0.17.0') -- Arrow SO version: 17 (full: 17.0.0) -- Found PkgConfig: /usr/bin/pkg-config (found version "0.27.1") -- clang-tidy not found -- clang-format not found -- Could NOT find ClangTools (missing: CLANG_FORMAT_BIN CLANG_TIDY_BIN) -- infer not found -- Found Python3: /usr/bin/python3.6 (found version "3.6.8") found components: Interpreter -- Found cpplint executable at /tmp/RtmpFm22he/file197f1ee33abb/apache-arrow-0.17.0/cpp/build-support/cpplint.py -- System processor: x86_64 -- Performing Test CXX_SUPPORTS_SSE4_2 -- Performing Test CXX_SUPPORTS_SSE4_2 - Success -- Performing Test CXX_SUPPORTS_AVX2 -- Performing Test
[jira] [Commented] (ARROW-5634) [C#] ArrayData.NullCount should be a property
[ https://issues.apache.org/jira/browse/ARROW-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091994#comment-17091994 ] Zachary Gramana commented on ARROW-5634: [GitHub Pull Request #7032|https://github.com/apache/arrow/pull/7032] now properly computes the `NullCount` value and passes it to the `ArrayData` ctor in the `Slice` method. `NullCount` should remain a readonly field, however, in order to preserve immutability. > [C#] ArrayData.NullCount should be a property > -- > > Key: ARROW-5634 > URL: https://issues.apache.org/jira/browse/ARROW-5634 > Project: Apache Arrow > Issue Type: Task > Components: C# >Reporter: Prashanth Govindarajan >Priority: Major > > ArrayData.NullCount should be a property so that it can be computed when > necessary: for ex: after Slice(), NullCount is -1 and needs to be computed -- This message was sent by Atlassian Jira (v8.3.4#803005)
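For readers following along, the lazy computation being proposed (a NullCount of -1 after Slice meaning "compute on demand") reduces to counting cleared bits in the validity bitmap over the slice's range. A sketch in Python rather than C#, assuming Arrow's LSB-first bitmap layout:

```python
def null_count(validity: bytes, offset: int, length: int) -> int:
    """Count nulls (zero bits) in an Arrow-style validity bitmap,
    restricted to the logical range [offset, offset + length).

    Bits are numbered least-significant-first within each byte, per the
    Arrow columnar format. This is an illustrative sketch, not the C#
    implementation in the PR.
    """
    nulls = 0
    for i in range(offset, offset + length):
        bit = (validity[i // 8] >> (i % 8)) & 1
        if bit == 0:  # cleared bit means the slot is null
            nulls += 1
    return nulls
```

A sliced array would call this with its own offset and length, which is why the count cannot simply be copied from the parent array.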
[jira] [Comment Edited] (ARROW-5708) [C#] Null support for BooleanArray
[ https://issues.apache.org/jira/browse/ARROW-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091987#comment-17091987 ] Zachary Gramana edited comment on ARROW-5708 at 4/25/20, 12:37 AM: --- In implementing ARROW-6603, I discovered that I hadn't added an `AppendNull` to BooleanArray.Builder yet because there were not any BooleanArray.Builder tests in "ArrayBuilderTests.cs" to begin with, nor were there any tests for `BooleanArray.Slice` there or in "BooleanArrayTests.cs". As a result of adding those tests, and getting them to pass, [GitHub PR 7032|https://github.com/apache/arrow/pull/7032] also now resolves this issue. was (Author: gramana): In implementing ARROW-6603, I discovered that I hadn't added an `AppendNull` to BooleanArray.Builder yet because there were not any BooleanArray.Builder tests in "ArrayBuilderTests.cs" to begin with, nor were there any tests for `BooleanArray.Slice` there or in "BooleanArrayTests.cs". As a result of adding those tests, and getting them to pass, [GitHub PR 6161|https://github.com/apache/arrow/pull/6121] also now resolves this issue. > [C#] Null support for BooleanArray > -- > > Key: ARROW-5708 > URL: https://issues.apache.org/jira/browse/ARROW-5708 > Project: Apache Arrow > Issue Type: Improvement > Components: C# >Reporter: Eric Erhardt >Priority: Major > > See the conversation > [here|https://github.com/apache/arrow/pull/4640#discussion_r296417726] and > [here|https://github.com/apache/arrow/pull/3574#discussion_r262662083]. > We should add null support for BooleanArray. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-5708) [C#] Null support for BooleanArray
[ https://issues.apache.org/jira/browse/ARROW-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091987#comment-17091987 ] Zachary Gramana commented on ARROW-5708: In implementing ARROW-6603, I discovered that I hadn't added an `AppendNull` to BooleanArray.Builder yet because there were not any BooleanArray.Builder tests in "ArrayBuilderTests.cs" to begin with, nor were there any tests for `BooleanArray.Slice` there or in "BooleanArrayTests.cs". As a result of adding those tests, and getting them to pass, [GitHub PR 6161|https://github.com/apache/arrow/pull/6121] also now resolves this issue. > [C#] Null support for BooleanArray > -- > > Key: ARROW-5708 > URL: https://issues.apache.org/jira/browse/ARROW-5708 > Project: Apache Arrow > Issue Type: Improvement > Components: C# >Reporter: Eric Erhardt >Priority: Major > > See the conversation > [here|https://github.com/apache/arrow/pull/4640#discussion_r296417726] and > [here|https://github.com/apache/arrow/pull/3574#discussion_r262662083]. > We should add null support for BooleanArray. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-4544) [Rust] Read nested JSON structs into StructArrays
[ https://issues.apache.org/jira/browse/ARROW-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091969#comment-17091969 ] Jonathan Kelley commented on ARROW-4544: Is there a particular direction this would need to take that doesn't rely on recursion? I'd like to contribute this feature, but if recursion is not the recommended way, it would be nice to know up front. > [Rust] Read nested JSON structs into StructArrays > - > > Key: ARROW-4544 > URL: https://issues.apache.org/jira/browse/ARROW-4544 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust >Reporter: Neville Dipale >Priority: Minor > > _Adding this as a separate task as it's a bit involved._ > Add the ability to read in JSON structs that are children of the JSON record > being read. > The main concern here is deeply nested structures, which will require a > performant and reusable basic JSON reader before dealing with recursion. -- This message was sent by Atlassian Jira (v8.3.4#803005)
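Whatever approach the Rust reader ultimately takes, the shape of the problem is: each nested JSON object becomes a child StructArray, built by running the same routine one level down. A language-agnostic sketch (Python here for brevity; the function name and struct-of-arrays dict layout are hypothetical stand-ins, not Arrow APIs):

```python
def to_struct_arrays(records):
    """Recursively convert a list of JSON-like records into a
    struct-of-arrays layout: each key maps either to a flat list of
    values or, for nested objects, to another struct built the same way.

    Simplification for the sketch: a field is treated as a struct if any
    record has a dict there; non-dict values at that field become empty
    child structs (real readers would need proper null/struct handling).
    """
    keys = set()
    for rec in records:
        keys.update(rec)
    out = {}
    for key in keys:
        values = [rec.get(key) for rec in records]
        if any(isinstance(v, dict) for v in values):
            # Nested JSON object -> child struct, built by recursing.
            out[key] = to_struct_arrays(
                [v if isinstance(v, dict) else {} for v in values]
            )
        else:
            out[key] = values  # leaf column; missing keys become None
    return out
```

The recursion depth tracks the nesting depth of the input, which is why the issue description flags deeply nested structures as the main concern.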
[jira] [Commented] (ARROW-8556) [R] zstd symbol not found on Ubuntu 19.10
[ https://issues.apache.org/jira/browse/ARROW-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091937#comment-17091937 ] Neal Richardson commented on ARROW-8556: Thanks, that's helpful. So what I see is that when the C++ library builds, `cmake` finds the system `zstd`, so it opts to use that instead of building it from source too. But then when the R package's shared library tries to load, it can't find it. This is beyond my level of C++ competence to debug further, so I'll solicit help from someone else. > [R] zstd symbol not found on Ubuntu 19.10 > - > > Key: ARROW-8556 > URL: https://issues.apache.org/jira/browse/ARROW-8556 > Project: Apache Arrow > Issue Type: Bug > Components: R >Affects Versions: 0.17.0 > Environment: Ubuntu 19.10 > R 3.6.1 >Reporter: Karl Dunkle Werner >Priority: Major > > I would like to install the `arrow` R package on my Ubuntu 19.10 system. > Prebuilt binaries are unavailable, and I want to enable compression, so I set > the {{LIBARROW_MINIMAL=false}} environment variable. When I do so, it looks > like the package is able to compile, but can't be loaded. I'm able to install > correctly if I don't set the {{LIBARROW_MINIMAL}} variable. > Here's the error I get: > {code:java} > ** testing if installed package can be loaded from temporary location > Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath > = DLLpath, ...): > unable to load shared object > '~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so': > ~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so: undefined symbol: > ZSTD_initCStream > Error: loading failed > Execution halted > ERROR: loading failed > * removing ‘~/.R/3.6/arrow’ > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8556) [R] zstd symbol not found on Ubuntu 19.10
[ https://issues.apache.org/jira/browse/ARROW-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091925#comment-17091925 ] Karl Dunkle Werner commented on ARROW-8556: --- {noformat} Installing package into ‘/home/karl/test_arrow’ (as ‘lib’ is unspecified) --- Please select a CRAN mirror for use in this session --- trying URL 'https://cloud.r-project.org/src/contrib/arrow_0.17.0.tar.gz' Content type 'application/x-gzip' length 242534 bytes (236 KB) == downloaded 236 KB* installing *source* package ‘arrow’ ... ** package ‘arrow’ successfully unpacked and MD5 sums checked ** using staged installation *** Generating code with data-raw/codegen.R Fatal error: cannot open file 'data-raw/codegen.R': No such file or directory trying URL 'https://dl.bintray.com/ursalabs/arrow-r/libarrow/src/arrow-0.17.0.zip' Error in download.file(from_url, to_file, quiet = quietly) : cannot open URL 'https://dl.bintray.com/ursalabs/arrow-r/libarrow/src/arrow-0.17.0.zip' trying URL 'https://www.apache.org/dyn/closer.lua?action=download=arrow/arrow-0.17.0/apache-arrow-0.17.0.tar.gz' Content type 'application/x-gzip' length 6460548 bytes (6.2 MB) == downloaded 6.2 MB*** Successfully retrieved C++ source *** Building C++ libraries rm: cannot remove 'src/*.o': No such file or directory *** Building with MAKEFLAGS= -j4 arrow with SOURCE_DIR=/tmp/RtmptP2CaW/file476e274f73a4/apache-arrow-0.17.0/cpp BUILD_DIR=/tmp/RtmptP2CaW/file476e6fba345b DEST_DIR=libarrow/arrow-0.17.0 CMAKE=/usr/bin/cmake ++ pwd + : /tmp/RtmpynJFHV/R.INSTALL474739c260b7/arrow + : /tmp/RtmptP2CaW/file476e274f73a4/apache-arrow-0.17.0/cpp + : /tmp/RtmptP2CaW/file476e6fba345b + : libarrow/arrow-0.17.0 + : /usr/bin/cmake ++ cd /tmp/RtmptP2CaW/file476e274f73a4/apache-arrow-0.17.0/cpp ++ pwd + SOURCE_DIR=/tmp/RtmptP2CaW/file476e274f73a4/apache-arrow-0.17.0/cpp ++ mkdir -p libarrow/arrow-0.17.0 ++ cd libarrow/arrow-0.17.0 ++ pwd + DEST_DIR=/tmp/RtmpynJFHV/R.INSTALL474739c260b7/arrow/libarrow/arrow-0.17.0 + '[' 
'' = '' ']' + which ninja + CMAKE_GENERATOR=Ninja + '[' false = false ']' + ARROW_JEMALLOC=ON + ARROW_WITH_BROTLI=ON + ARROW_WITH_BZ2=ON + ARROW_WITH_LZ4=ON + ARROW_WITH_SNAPPY=ON + ARROW_WITH_ZLIB=ON + ARROW_WITH_ZSTD=ON + mkdir -p /tmp/RtmptP2CaW/file476e6fba345b + pushd /tmp/RtmptP2CaW/file476e6fba345b /tmp/RtmptP2CaW/file476e6fba345b /tmp/RtmpynJFHV/R.INSTALL474739c260b7/arrow + /usr/bin/cmake -DARROW_BOOST_USE_SHARED=OFF -DARROW_BUILD_TESTS=OFF -DARROW_BUILD_SHARED=OFF -DARROW_BUILD_STATIC=ON -DARROW_COMPUTE=ON -DARROW_CSV=ON -DARROW_DATASET=ON -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FILESYSTEM=ON -DARROW_JEMALLOC=ON -DARROW_JSON=ON -DARROW_PARQUET=ON -DARROW_WITH_BROTLI=ON -DARROW_WITH_BZ2=ON -DARROW_WITH_LZ4=ON -DARROW_WITH_SNAPPY=ON -DARROW_WITH_ZLIB=ON -DARROW_WITH_ZSTD=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_LIBDIR=lib -DCMAKE_INSTALL_PREFIX=/tmp/RtmpynJFHV/R.INSTALL474739c260b7/arrow/libarrow/arrow-0.17.0 -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON -DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON -DCMAKE_UNITY_BUILD=ON -DOPENSSL_USE_STATIC_LIBS=ON -G Ninja /tmp/RtmptP2CaW/file476e274f73a4/apache-arrow-0.17.0/cpp -- Building using CMake version: 3.13.4 -- The C compiler identification is GNU 9.2.1 -- The CXX compiler identification is GNU 9.2.1 -- Check for working C compiler: /usr/lib/ccache/cc -- Check for working C compiler: /usr/lib/ccache/cc -- works -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Detecting C compile features -- Detecting C compile features - done -- Check for working CXX compiler: /usr/lib/ccache/c++ -- Check for working CXX compiler: /usr/lib/ccache/c++ -- works -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Detecting CXX compile features -- Detecting CXX compile features - done -- Arrow version: 0.17.0 (full: '0.17.0') -- Arrow SO version: 17 (full: 17.0.0) -- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.1") -- clang-tidy not found -- clang-format not found -- 
Could NOT find ClangTools (missing: CLANG_FORMAT_BIN CLANG_TIDY_BIN) -- infer not found -- Found Python3: /usr/bin/python3.7 (found version "3.7.5") found components: Interpreter -- Using ccache: /usr/bin/ccache -- Found cpplint executable at /tmp/RtmptP2CaW/file476e274f73a4/apache-arrow-0.17.0/cpp/build-support/cpplint.py -- System processor: x86_64 -- Performing Test CXX_SUPPORTS_SSE4_2 -- Performing Test CXX_SUPPORTS_SSE4_2 - Success -- Performing Test CXX_SUPPORTS_AVX2 -- Performing Test CXX_SUPPORTS_AVX2 - Success -- Performing Test CXX_SUPPORTS_AVX512 -- Performing Test CXX_SUPPORTS_AVX512 - Success -- Arrow build warning level: PRODUCTION Using ld linker Configured for RELEASE build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...}) -- Build Type: RELEASE -- Using AUTO approach to find dependencies
[jira] [Commented] (ARROW-6603) [C#] ArrayBuilder API to support writing nulls
[ https://issues.apache.org/jira/browse/ARROW-6603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091913#comment-17091913 ] Zachary Gramana commented on ARROW-6603: I came across this conversation late, and _after_ implementing an alternative approach which is much more in line with other Arrow implementations. I have submitted [GitHub Pull Request #7032|https://github.com/apache/arrow/pull/7032] which includes: * A newly added interface member, `AppendNull`, along with implementations for `PrimitiveArrayBuilder` and `Binary.BuilderBase`. * Additional work to finish the previously stubbed support for `NullBitmapBuffer` in a few of the specialized `Array` classes. * Several new and expanded tests. > [C#] ArrayBuilder API to support writing nulls > -- > > Key: ARROW-6603 > URL: https://issues.apache.org/jira/browse/ARROW-6603 > Project: Apache Arrow > Issue Type: Improvement > Components: C# >Reporter: Eric Erhardt >Assignee: Anthony Abate >Priority: Major > Labels: pull-request-available > Original Estimate: 72h > Time Spent: 3h 10m > Remaining Estimate: 68h 50m > > There is currently no API in the PrimitiveArrayBuilder class to support > writing nulls. See this TODO - > [https://github.com/apache/arrow/blob/1515fe10c039fb6685df2e282e2e888b773caa86/csharp/src/Apache.Arrow/Arrays/PrimitiveArrayBuilder.cs#L101.] > > Also see [https://github.com/apache/arrow/issues/5381]. > > We should add some APIs to support writing nulls. -- This message was sent by Atlassian Jira (v8.3.4#803005)
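The mechanism behind `AppendNull`, independent of the C# specifics in the PR, is that appending a null writes a placeholder slot in the value buffer plus a cleared bit in the null bitmap, so both buffers keep the same logical length. An illustrative Python stand-in (class and method names are hypothetical, not the Apache.Arrow API):

```python
class PrimitiveBuilder:
    """Sketch of an array builder with null support: values and a
    parallel validity list grow in lockstep, and appending a null
    records a placeholder value plus a cleared validity flag."""

    def __init__(self):
        self.values = []
        self.validity = []  # 1 = valid, 0 = null
        self.null_count = 0

    def append(self, value):
        self.values.append(value)
        self.validity.append(1)
        return self  # fluent style, mirroring builder APIs

    def append_null(self):
        self.values.append(0)  # placeholder; readers must check validity
        self.validity.append(0)
        self.null_count += 1
        return self
```

Keeping the two buffers the same length is what lets downstream code index any slot and then consult the validity bitmap, rather than special-casing missing entries.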
[jira] [Commented] (ARROW-8587) Compilation error when linking arrow-flight-perf-server
[ https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091909#comment-17091909 ] Antoine Pitrou commented on ARROW-8587: --- I suppose so. I had no idea that gRPC required zlib (probably for optional compression, though we don't use it). > Compilation error when linking arrow-flight-perf-server > --- > > Key: ARROW-8587 > URL: https://issues.apache.org/jira/browse/ARROW-8587 > Project: Apache Arrow > Issue Type: Bug > Components: Benchmarking, C++, FlightRPC >Affects Versions: 0.17.0 > Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar > 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Chengxin Ma >Assignee: Antoine Pitrou >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > I wanted to play around with Flight benchmark after seeing the discussion > regarding Flight's throughput in arrow dev mailing list today. > I met the following error when trying to build the benchmark from latest > source code: > {code:java} > [ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::detail::canonical(boost::filesystem::path const&, > boost::filesystem::path const&, boost::system::error_code*)' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::system::system_category()' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::path::parent_path() const' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::system::generic_category()' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > 
`boost::filesystem::detail::current_path(boost::system::error_code*)' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to > `inflateInit2_' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to > `deflateInit2_' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::path::operator/=(boost::filesystem::path const&)' > collect2: error: ld returned 1 exit status > src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: > recipe for target 'debug/arrow-flight-perf-server' failed > make[2]: *** [debug/arrow-flight-perf-server] Error 1 > CMakeFiles/Makefile2:2609: recipe for target > 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed > make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] > Error 2 > Makefile:140: recipe for target 'all' failed > make: *** [all] Error 2 > {code} > I was using {{cmake .. -DCMAKE_BUILD_TYPE=Debug > -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON > -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}} to configure the build. > I noticed that there was a {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the > output, but the Boost library that I installed from the package manger was of > this version: {{1.65.1.0ubuntu1}}. Could this be the cause of the problem? > PS: > I was able to build the benchmark > [before|https://issues.apache.org/jira/browse/ARROW-7200]. It was on AWS with > the OS being ubuntu-bionic-18.04-amd64-server-20191002, which should be very > similar to the one I'm using on my laptop. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6718) [Rust] packed_simd requires nightly
[ https://issues.apache.org/jira/browse/ARROW-6718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-6718: -- Labels: pull-request-available (was: ) > [Rust] packed_simd requires nightly > > > Key: ARROW-6718 > URL: https://issues.apache.org/jira/browse/ARROW-6718 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust >Reporter: Andy Grove >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > See [https://github.com/rust-lang/rfcs/pull/2366] for more info on > stabilization of this crate. > > {code:java} > error[E0554]: `#![feature]` may not be used on the stable release channel >--> > /home/andy/.cargo/registry/src/github.com-1ecc6299db9ec823/packed_simd-0.3.3/src/lib.rs:202:1 > | > 202 | / #![feature( > 203 | | repr_simd, > 204 | | const_fn, > 205 | | platform_intrinsics, > ... | > 215 | | custom_inner_attributes > 216 | | )] > | |__^ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8556) [R] zstd symbol not found on Ubuntu 19.10
[ https://issues.apache.org/jira/browse/ARROW-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091890#comment-17091890 ] Neal Richardson commented on ARROW-8556: Maybe it's something about 19.10, maybe it's something about your particular setup, or maybe it's a more general issue. To debug, I'd recommend setting `ARROW_R_DEV=true` (for verbosity), `LIBARROW_BINARY=false` (to ensure that we build from source), and `LIBARROW_MINIMAL=false` (so that it turns on zstd) and reinstalling. Then attach here the full installation logs, and I can try to sift through them. Then I may have some other ideas of things to try. Thanks for your help! > [R] zstd symbol not found on Ubuntu 19.10 > - > > Key: ARROW-8556 > URL: https://issues.apache.org/jira/browse/ARROW-8556 > Project: Apache Arrow > Issue Type: Bug > Components: R >Affects Versions: 0.17.0 > Environment: Ubuntu 19.10 > R 3.6.1 >Reporter: Karl Dunkle Werner >Priority: Major > > I would like to install the `arrow` R package on my Ubuntu 19.10 system. > Prebuilt binaries are unavailable, and I want to enable compression, so I set > the {{LIBARROW_MINIMAL=false}} environment variable. When I do so, it looks > like the package is able to compile, but can't be loaded. I'm able to install > correctly if I don't set the {{LIBARROW_MINIMAL}} variable. > Here's the error I get: > {code:java} > ** testing if installed package can be loaded from temporary location > Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath > = DLLpath, ...): > unable to load shared object > '~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so': > ~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so: undefined symbol: > ZSTD_initCStream > Error: loading failed > Execution halted > ERROR: loading failed > * removing ‘~/.R/3.6/arrow’ > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8587) Compilation error when linking arrow-flight-perf-server
[ https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091888#comment-17091888 ] Wes McKinney commented on ARROW-8587: - Is the zlib dependency coming from gRPC? It shouldn't be necessary to add {{ARROW_WITH_ZLIB=ON}} here (so this should be enabled automatically if it's needed) > Compilation error when linking arrow-flight-perf-server > --- > > Key: ARROW-8587 > URL: https://issues.apache.org/jira/browse/ARROW-8587 > Project: Apache Arrow > Issue Type: Bug > Components: Benchmarking, C++, FlightRPC >Affects Versions: 0.17.0 > Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar > 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Chengxin Ma >Assignee: Antoine Pitrou >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > I wanted to play around with Flight benchmark after seeing the discussion > regarding Flight's throughput in arrow dev mailing list today. 
> I met the following error when trying to build the benchmark from latest > source code: > {code:java} > [ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::detail::canonical(boost::filesystem::path const&, > boost::filesystem::path const&, boost::system::error_code*)' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::system::system_category()' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::path::parent_path() const' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::system::generic_category()' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::detail::current_path(boost::system::error_code*)' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to > `inflateInit2_' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to > `deflateInit2_' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::path::operator/=(boost::filesystem::path const&)' > collect2: error: ld returned 1 exit status > src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: > recipe for target 'debug/arrow-flight-perf-server' failed > make[2]: *** [debug/arrow-flight-perf-server] Error 1 > CMakeFiles/Makefile2:2609: recipe for target > 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed > make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] > Error 2 > Makefile:140: recipe for target 'all' 
failed > make: *** [all] Error 2 > {code} > I was using {{cmake .. -DCMAKE_BUILD_TYPE=Debug > -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON > -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}} to configure the build. > I noticed that there was a {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the > output, but the Boost library that I installed from the package manager was of > this version: {{1.65.1.0ubuntu1}}. Could this be the cause of the problem? > PS: > I was able to build the benchmark > [before|https://issues.apache.org/jira/browse/ARROW-7200]. It was on AWS with > the OS being ubuntu-bionic-18.04-amd64-server-20191002, which should be very > similar to the one I'm using on my laptop. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8587) Compilation error when linking arrow-flight-perf-server
[ https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091885#comment-17091885 ] Chengxin Ma commented on ARROW-8587: Adding {{-DARROW_WITH_ZLIB=ON}} solved this problem. (I was expecting that the build system could find zlib on my system automatically so I didn't set this flag.) > Compilation error when linking arrow-flight-perf-server > --- > > Key: ARROW-8587 > URL: https://issues.apache.org/jira/browse/ARROW-8587 > Project: Apache Arrow > Issue Type: Bug > Components: Benchmarking, C++, FlightRPC >Affects Versions: 0.17.0 > Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar > 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Chengxin Ma >Assignee: Antoine Pitrou >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > I wanted to play around with Flight benchmark after seeing the discussion > regarding Flight's throughput in arrow dev mailing list today. 
> I met the following error when trying to build the benchmark from latest > source code: > {code:java} > [ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::detail::canonical(boost::filesystem::path const&, > boost::filesystem::path const&, boost::system::error_code*)' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::system::system_category()' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::path::parent_path() const' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::system::generic_category()' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::detail::current_path(boost::system::error_code*)' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to > `inflateInit2_' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to > `deflateInit2_' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::path::operator/=(boost::filesystem::path const&)' > collect2: error: ld returned 1 exit status > src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: > recipe for target 'debug/arrow-flight-perf-server' failed > make[2]: *** [debug/arrow-flight-perf-server] Error 1 > CMakeFiles/Makefile2:2609: recipe for target > 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed > make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] > Error 2 > Makefile:140: recipe for target 'all' 
failed > make: *** [all] Error 2 > {code} > I was using {{cmake .. -DCMAKE_BUILD_TYPE=Debug > -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON > -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}} to configure the build. > I noticed that there was a {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the > output, but the Boost library that I installed from the package manger was of > this version: {{1.65.1.0ubuntu1}}. Could this be the cause of the problem? > PS: > I was able to build the benchmark > [before|https://issues.apache.org/jira/browse/ARROW-7200]. It was on AWS with > the OS being ubuntu-bionic-18.04-amd64-server-20191002, which should be very > similar to the one I'm using on my laptop. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8591) [Rust] Reverse lookup for a key in DictionaryArray
[ https://issues.apache.org/jira/browse/ARROW-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8591: -- Labels: pull-request-available (was: ) > [Rust] Reverse lookup for a key in DictionaryArray > -- > > Key: ARROW-8591 > URL: https://issues.apache.org/jira/browse/ARROW-8591 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Mahmut Bulut >Assignee: Mahmut Bulut >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently, there is no way to do a reverse lookup for DictionaryArray. A > reverse lookup would be beneficial. (Enables creation of combiner masks) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8590) [Rust] Use Arrow pretty print utility in DataFusion
[ https://issues.apache.org/jira/browse/ARROW-8590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Hildreth updated ARROW-8590: - Description: ARROW-8287 added some new utility methods for pretty printing into the rust arrow crate (see [PR 6972|https://github.com/apache/arrow/pull/6972]). These were basically copied from DataFusion. Modify DataFusion to use the utility methods in the arrow crate, removing the duplicate code. (was: ARROW-8287 added some new utility methods for pretty printing into the rust arrow crate (see [PR 6972|https://github.com/apache/arrow/pull/6972]). These were basically pulled from DataFusion. Modify DataFusion to use the utility methods in the arrow crate, removing the duplicate code.) > [Rust] Use Arrow pretty print utility in DataFusion > --- > > Key: ARROW-8590 > URL: https://issues.apache.org/jira/browse/ARROW-8590 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Mark Hildreth >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > ARROW-8287 added some new utility methods for pretty printing into the rust > arrow crate (see [PR 6972|https://github.com/apache/arrow/pull/6972]). These > were basically copied from DataFusion. Modify DataFusion to use the utility > methods in the arrow crate, removing the duplicate code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8591) [Rust] Reverse lookup for a key in DictionaryArray
Mahmut Bulut created ARROW-8591: --- Summary: [Rust] Reverse lookup for a key in DictionaryArray Key: ARROW-8591 URL: https://issues.apache.org/jira/browse/ARROW-8591 Project: Apache Arrow Issue Type: New Feature Components: Rust Reporter: Mahmut Bulut Assignee: Mahmut Bulut Currently, there is no way to do a reverse lookup for DictionaryArray. A reverse lookup would be beneficial. (Enables creation of combiner masks) -- This message was sent by Atlassian Jira (v8.3.4#803005)
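The concept is easy to model outside Rust. The sketch below is plain Python illustrating the idea, not the Rust arrow crate's API: a dictionary-encoded array is a list of unique values plus integer keys, and a reverse lookup maps a value back to its key, which also yields a boolean mask over the encoded positions — the kind of combiner mask mentioned above.

```python
def reverse_lookup(dictionary, indices, value):
    """Return the key for `value` and a boolean mask of the positions in
    `indices` that reference it, or (None, all-False) if the value is
    absent from the dictionary. Conceptual sketch only."""
    try:
        key = dictionary.index(value)
    except ValueError:
        return None, [False] * len(indices)
    return key, [i == key for i in indices]

dictionary = ["a", "b", "c"]   # unique values
indices = [0, 2, 1, 0, 2]      # keys into the dictionary
key, mask = reverse_lookup(dictionary, indices, "c")
print(key, mask)  # 2 [False, True, False, False, True]
```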
[jira] [Updated] (ARROW-8590) [Rust] Use Arrow pretty print utility in DataFusion
[ https://issues.apache.org/jira/browse/ARROW-8590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8590: -- Labels: pull-request-available (was: ) > [Rust] Use Arrow pretty print utility in DataFusion > --- > > Key: ARROW-8590 > URL: https://issues.apache.org/jira/browse/ARROW-8590 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Mark Hildreth >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > ARROW-8287 added some new utility methods for pretty printing into the rust > arrow crate (see [PR 6972|https://github.com/apache/arrow/pull/6972]). These > were basically pulled from DataFusion. Modify DataFusion to use the utility > methods in the arrow crate, removing the duplicate code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8587) Compilation error when linking arrow-flight-perf-server
[ https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091870#comment-17091870 ] Antoine Pitrou commented on ARROW-8587: --- I don't see that error myself. Can you try to pass {{-DARROW_WITH_ZSTD=on}} ? > Compilation error when linking arrow-flight-perf-server > --- > > Key: ARROW-8587 > URL: https://issues.apache.org/jira/browse/ARROW-8587 > Project: Apache Arrow > Issue Type: Bug > Components: Benchmarking, C++, FlightRPC >Affects Versions: 0.17.0 > Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar > 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Chengxin Ma >Assignee: Antoine Pitrou >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > I wanted to play around with Flight benchmark after seeing the discussion > regarding Flight's throughput in arrow dev mailing list today. > I met the following error when trying to build the benchmark from latest > source code: > {code:java} > [ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::detail::canonical(boost::filesystem::path const&, > boost::filesystem::path const&, boost::system::error_code*)' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::system::system_category()' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::path::parent_path() const' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::system::generic_category()' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::detail::current_path(boost::system::error_code*)' > 
../../../debug/libarrow_flight.so.18.0.0: undefined reference to > `inflateInit2_' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to > `deflateInit2_' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::path::operator/=(boost::filesystem::path const&)' > collect2: error: ld returned 1 exit status > src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: > recipe for target 'debug/arrow-flight-perf-server' failed > make[2]: *** [debug/arrow-flight-perf-server] Error 1 > CMakeFiles/Makefile2:2609: recipe for target > 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed > make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] > Error 2 > Makefile:140: recipe for target 'all' failed > make: *** [all] Error 2 > {code} > I was using {{cmake .. -DCMAKE_BUILD_TYPE=Debug > -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON > -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}} to configure the build. > I noticed that there was a {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the > output, but the Boost library that I installed from the package manger was of > this version: {{1.65.1.0ubuntu1}}. Could this be the cause of the problem? > PS: > I was able to build the benchmark > [before|https://issues.apache.org/jira/browse/ARROW-7200]. It was on AWS with > the OS being ubuntu-bionic-18.04-amd64-server-20191002, which should be very > similar to the one I'm using on my laptop. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8590) [Rust] Use Arrow pretty print utility in DataFusion
Mark Hildreth created ARROW-8590: Summary: [Rust] Use Arrow pretty print utility in DataFusion Key: ARROW-8590 URL: https://issues.apache.org/jira/browse/ARROW-8590 Project: Apache Arrow Issue Type: Improvement Components: Rust, Rust - DataFusion Reporter: Mark Hildreth ARROW-8287 added some new utility methods for pretty printing into the rust arrow crate (see [PR 6972|https://github.com/apache/arrow/pull/6972]). These were basically pulled from DataFusion. Modify DataFusion to use the utility methods in the arrow crate, removing the duplicate code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
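As a rough illustration of what such a pretty-print utility does — this is a minimal Python sketch of the idea, not the Rust code being deduplicated — rendering columnar data as an aligned ASCII table comes down to computing per-column widths:

```python
def pretty_format(columns):
    """Render a dict of column-name -> list-of-values as an aligned
    ASCII table. Conceptual analogue of a columnar pretty printer."""
    names = list(columns)
    rows = list(zip(*(columns[n] for n in names)))
    cells = [names] + [[str(v) for v in row] for row in rows]
    # width of each column is the widest cell in it, header included
    widths = [max(len(row[i]) for row in cells) for i in range(len(names))]
    def line(row):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(row, widths)) + " |"
    sep = "+-" + "-+-".join("-" * w for w in widths) + "-+"
    return "\n".join([sep, line(cells[0]), sep] + [line(r) for r in cells[1:]] + [sep])

print(pretty_format({"id": [1, 2], "name": ["a", "b"]}))
```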
[jira] [Resolved] (ARROW-8575) [Developer] Add issue_comment workflow to rebase a PR
[ https://issues.apache.org/jira/browse/ARROW-8575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson resolved ARROW-8575. Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 7028 [https://github.com/apache/arrow/pull/7028] > [Developer] Add issue_comment workflow to rebase a PR > - > > Key: ARROW-8575 > URL: https://issues.apache.org/jira/browse/ARROW-8575 > Project: Apache Arrow > Issue Type: Improvement > Components: Developer Tools >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (ARROW-8587) Compilation error when linking arrow-flight-perf-server
[ https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxin Ma reopened ARROW-8587: > Compilation error when linking arrow-flight-perf-server > --- > > Key: ARROW-8587 > URL: https://issues.apache.org/jira/browse/ARROW-8587 > Project: Apache Arrow > Issue Type: Bug > Components: Benchmarking, C++, FlightRPC >Affects Versions: 0.17.0 > Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar > 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Chengxin Ma >Assignee: Antoine Pitrou >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > I wanted to play around with Flight benchmark after seeing the discussion > regarding Flight's throughput in arrow dev mailing list today. > I met the following error when trying to build the benchmark from latest > source code: > {code:java} > [ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::detail::canonical(boost::filesystem::path const&, > boost::filesystem::path const&, boost::system::error_code*)' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::system::system_category()' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::path::parent_path() const' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::system::generic_category()' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::detail::current_path(boost::system::error_code*)' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to > `inflateInit2_' > 
../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to > `deflateInit2_' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::path::operator/=(boost::filesystem::path const&)' > collect2: error: ld returned 1 exit status > src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: > recipe for target 'debug/arrow-flight-perf-server' failed > make[2]: *** [debug/arrow-flight-perf-server] Error 1 > CMakeFiles/Makefile2:2609: recipe for target > 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed > make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] > Error 2 > Makefile:140: recipe for target 'all' failed > make: *** [all] Error 2 > {code} > I was using {{cmake .. -DCMAKE_BUILD_TYPE=Debug > -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON > -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}} to configure the build. > I noticed that there was a {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the > output, but the Boost library that I installed from the package manger was of > this version: {{1.65.1.0ubuntu1}}. Could this be the cause of the problem? > PS: > I was able to build the benchmark > [before|https://issues.apache.org/jira/browse/ARROW-7200]. It was on AWS with > the OS being ubuntu-bionic-18.04-amd64-server-20191002, which should be very > similar to the one I'm using on my laptop. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8587) Compilation error when linking arrow-flight-perf-server
[ https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091863#comment-17091863 ] Chengxin Ma commented on ARROW-8587: Thanks for the quick fix. Unfortunately I still see the following error messages: {code} [ 96%] Linking CXX executable ../../../release/arrow-flight-perf-server ../../../release/libarrow_flight.so.18.0.0: undefined reference to `inflateInit2_' ../../../release/libarrow_flight.so.18.0.0: undefined reference to `inflate' ../../../release/libarrow_flight.so.18.0.0: undefined reference to `deflateInit2_' ../../../release/libarrow_flight.so.18.0.0: undefined reference to `deflate' ../../../release/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd' ../../../release/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd' collect2: error: ld returned 1 exit status src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:156: recipe for target 'release/arrow-flight-perf-server' failed make[2]: *** [release/arrow-flight-perf-server] Error 1 CMakeFiles/Makefile2:2648: recipe for target 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] Error 2 Makefile:140: recipe for target 'all' failed make: *** [all] Error 2 {code} This seems to be a problem related to {{zlib}}. On my computer it is the latest version: {{zlib1g-dev is already the newest version (1:1.2.11.dfsg-0ubuntu2).}} I guess this issue is still related to {{ThirdpartyToolchain.cmake}}? 
> Compilation error when linking arrow-flight-perf-server > --- > > Key: ARROW-8587 > URL: https://issues.apache.org/jira/browse/ARROW-8587 > Project: Apache Arrow > Issue Type: Bug > Components: Benchmarking, C++, FlightRPC >Affects Versions: 0.17.0 > Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar > 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Chengxin Ma >Assignee: Antoine Pitrou >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > I wanted to play around with Flight benchmark after seeing the discussion > regarding Flight's throughput in arrow dev mailing list today. > I met the following error when trying to build the benchmark from latest > source code: > {code:java} > [ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::detail::canonical(boost::filesystem::path const&, > boost::filesystem::path const&, boost::system::error_code*)' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::system::system_category()' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::path::parent_path() const' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::system::generic_category()' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::detail::current_path(boost::system::error_code*)' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to > `inflateInit2_' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to > `deflateInit2_' > 
../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::path::operator/=(boost::filesystem::path const&)' > collect2: error: ld returned 1 exit status > src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: > recipe for target 'debug/arrow-flight-perf-server' failed > make[2]: *** [debug/arrow-flight-perf-server] Error 1 > CMakeFiles/Makefile2:2609: recipe for target > 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed > make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] > Error 2 > Makefile:140: recipe for target 'all' failed > make: *** [all] Error 2 > {code} > I was using {{cmake .. -DCMAKE_BUILD_TYPE=Debug > -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON > -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}} to configure the build. > I noticed that there was a {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the > output, but the Boost library that I installed from the package manger was of > this version: {{1.65.1.0ubuntu1}}. Could this be the cause of the problem?
[jira] [Commented] (ARROW-7244) [Python] Inconsistent behavior with reading in S3 parquet objects
[ https://issues.apache.org/jira/browse/ARROW-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091841#comment-17091841 ] Harini Kannan commented on ARROW-7244: -- Any update on this? I'm seeing the same error pop up randomly when I have a lambda function triggering on new parquet files in an S3 bucket which reads the parquet files using ParquetDataset(). Or is there any workaround for this? > [Python] Inconsistent behavior with reading in S3 parquet objects > - > > Key: ARROW-7244 > URL: https://issues.apache.org/jira/browse/ARROW-7244 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.15.1 > Environment: running in a lambda, compiled on an EC2 using linux >Reporter: William Tardio >Priority: Major > > We are piloting using pyarrow to read parquet files from AWS S3. > > We got it working in combination with s3fs as the filesystem. However, we are > seeing very inconsistent results when reading in parquet objects with > s3=s3fs.S3FileSystem() > ParquetDataset(url, filesystem=s3) > > The read inconsistently throws this error: > > [ERROR] OSError: Passed non-file path: > s3://bucket/schedule/sxaup/fms_db_aub/adn_master/trunc/20191122024436.parquet > Traceback (most recent call last): > File "/var/task/file_check.py", line 35, in lambda_handler > main(event, context) > File "/var/task/file_check.py", line 260, in main > validate_resp['object_type']) > File "/opt/python/utils.py", line 80, in schema_check > stage_pya_dataset = ParquetDataset(full_URL_stage, filesystem=s3) > File "/opt/python/lib/python3.7/site-packages/pyarrow/parquet.py", line > 1030, in __init__ > open_file_func=partial(_open_dataset_file, self._metadata) > File "/opt/python/lib/python3.7/site-packages/pyarrow/parquet.py", line > 1229, in _make_manifest > .format(path)) > > As you can see, the path is valid and sometimes works, other times does not > (no modification of the file between those successful and error runs). 
Does > ParquetDataset actually open the file and validate it and so the error is in > regards to the data? > > Willing to do any troubleshooting for get this solved. -- This message was sent by Atlassian Jira (v8.3.4#803005)
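Until the root cause is found, a common workaround for intermittent failures like this is to retry the dataset open a few times. A minimal sketch follows — plain Python, where `ParquetDataset`, `url`, and `s3` in the usage comment refer to the names in the report above and are not re-validated here:

```python
import time

def with_retries(fn, attempts=3, delay=1.0, retry_on=(OSError,)):
    """Call fn(), retrying up to `attempts` times on the given exception
    types, sleeping `delay` seconds between tries. A workaround sketch
    for transient failures, not a fix for the underlying bug."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay)

# usage sketch, names taken from the report above:
# dataset = with_retries(lambda: ParquetDataset(url, filesystem=s3))
```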
[jira] [Updated] (ARROW-7759) [C++][Dataset] Add CsvFileFormat for CSV support
[ https://issues.apache.org/jira/browse/ARROW-7759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-7759: -- Labels: dataset pull-request-available (was: dataset) > [C++][Dataset] Add CsvFileFormat for CSV support > > > Key: ARROW-7759 > URL: https://issues.apache.org/jira/browse/ARROW-7759 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Francois Saint-Jacques >Assignee: Antoine Pitrou >Priority: Major > Labels: dataset, pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > This should be a minimal implementation that binds 1-1 file and ScanTask for > now. Streaming optimizations can be done in ARROW-3410. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8587) Compilation error when linking arrow-flight-perf-server
[ https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-8587. - Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 7031 [https://github.com/apache/arrow/pull/7031] > Compilation error when linking arrow-flight-perf-server > --- > > Key: ARROW-8587 > URL: https://issues.apache.org/jira/browse/ARROW-8587 > Project: Apache Arrow > Issue Type: Bug > Components: Benchmarking, C++, FlightRPC >Affects Versions: 0.17.0 > Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar > 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Chengxin Ma >Assignee: Antoine Pitrou >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > I wanted to play around with Flight benchmark after seeing the discussion > regarding Flight's throughput in arrow dev mailing list today. > I met the following error when trying to build the benchmark from latest > source code: > {code:java} > [ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::detail::canonical(boost::filesystem::path const&, > boost::filesystem::path const&, boost::system::error_code*)' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::system::system_category()' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::path::parent_path() const' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::system::generic_category()' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::detail::current_path(boost::system::error_code*)' > 
../../../debug/libarrow_flight.so.18.0.0: undefined reference to > `inflateInit2_' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to > `deflateInit2_' > ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd' > ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to > `boost::filesystem::path::operator/=(boost::filesystem::path const&)' > collect2: error: ld returned 1 exit status > src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: > recipe for target 'debug/arrow-flight-perf-server' failed > make[2]: *** [debug/arrow-flight-perf-server] Error 1 > CMakeFiles/Makefile2:2609: recipe for target > 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed > make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] > Error 2 > Makefile:140: recipe for target 'all' failed > make: *** [all] Error 2 > {code} > I was using {{cmake .. -DCMAKE_BUILD_TYPE=Debug > -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON > -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}} to configure the build. > I noticed that there was a {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the > output, but the Boost library that I installed from the package manger was of > this version: {{1.65.1.0ubuntu1}}. Could this be the cause of the problem? > PS: > I was able to build the benchmark > [before|https://issues.apache.org/jira/browse/ARROW-7200]. It was on AWS with > the OS being ubuntu-bionic-18.04-amd64-server-20191002, which should be very > similar to the one I'm using on my laptop. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8589) ModuleNotFoundError: No module named 'pyarrow._orc'
ryan created ARROW-8589: --- Summary: ModuleNotFoundError: No module named 'pyarrow._orc' Key: ARROW-8589 URL: https://issues.apache.org/jira/browse/ARROW-8589 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 0.17.0, 0.16.0, 0.15.1, 0.15.0, 0.14.1, 0.14.0 Environment: I am on a Mac; Mojave 10.14.6 is the OS version, with Python 3.6.10. I am using a conda env, but I actually needed to use pip to install all the packages, including pyarrow. Reporter: ryan When using version 0.17.0 this error happens when I try to `import pyarrow.orc as orc` {code:java} Traceback (most recent call last): File "", line 971, in _find_and_load File "", line 955, in _find_and_load_unlocked File "", line 665, in _load_unlocked File "", line 678, in exec_module File "", line 219, in _call_with_frames_removed File "/Users/ryconnolly/code/source-syncer/sourcesyncer/s3_source_syncer.py", line 9, in import pyarrow.orc as orc File "/Users/ryconnolly/anaconda3/envs/source-syncer/lib/python3.6/site-packages/pyarrow/orc.py", line 24, in import pyarrow._orc as _orc ModuleNotFoundError: No module named 'pyarrow._orc'{code} The current workaround is to pin the version to 0.13.0.
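If the {{pyarrow._orc}} extension is simply absent from a given wheel, one defensive option on the caller's side is to probe for the optional compiled submodule before importing it. This is a hedged sketch, not part of pyarrow's API; the helper name is hypothetical:

```python
import importlib.util


def extension_available(name):
    """Return True if an optional submodule (e.g. 'pyarrow._orc') can be
    located by the import machinery, without actually importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when even the parent package is not installed.
        return False
```

A caller could then guard `import pyarrow.orc` behind `extension_available("pyarrow._orc")` and raise a clearer error (or fall back to another reader) instead of failing deep inside the import.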
[jira] [Created] (ARROW-8588) `driver` param removed from `hdfs.connect()`
Jack Fan created ARROW-8588: --- Summary: `driver` param removed from `hdfs.connect()` Key: ARROW-8588 URL: https://issues.apache.org/jira/browse/ARROW-8588 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 0.17.0 Reporter: Jack Fan Hi, It appears that in ARROW-7863 the `driver` param was removed from the `hdfs.connect()` function. However, if I understand it correctly, ARROW-7863 should only remove the `libhdfs3`-related tests, not disable the driver entirely. If I instantiate the `pyarrow.HadoopFileSystem` class directly, it is still able to take the `driver` param. Can the Arrow project check whether this API change is intended? Also, even if it is intended, this is a breaking change and deserves some documentation.
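Until the API question is settled, a caller that must run against both old and new pyarrow releases can retry without the removed keyword when it is rejected. This generic helper is a sketch under that assumption; the function name and retry policy are my own, not anything pyarrow ships:

```python
def call_dropping_driver(func, *args, **kwargs):
    """Call func; if it rejects the removed `driver` keyword with a
    TypeError, retry the call without it."""
    try:
        return func(*args, **kwargs)
    except TypeError:
        if "driver" not in kwargs:
            raise  # the TypeError was about something else entirely
        kwargs.pop("driver")
        return func(*args, **kwargs)
```

For example, `call_dropping_driver(hdfs.connect, host, port, driver="libhdfs")` would pass the keyword through on releases that still accept it and silently omit it on releases that removed it.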
[jira] [Commented] (ARROW-8557) [Python] from pyarrow import parquet fails with AttributeError: type object 'pyarrow._parquet.Statistics' has no attribute '__reduce_cython__'
[ https://issues.apache.org/jira/browse/ARROW-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091791#comment-17091791 ] Hal T commented on ARROW-8557: -- No, this is in a Jupyter notebook on a Debian 8 environment, using 3.6.4 > [Python] from pyarrow import parquet fails with AttributeError: type object > 'pyarrow._parquet.Statistics' has no attribute '__reduce_cython__' > -- > > Key: ARROW-8557 > URL: https://issues.apache.org/jira/browse/ARROW-8557 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.15.1, 0.16.0, 0.17.0 > Environment: Python 3.8.4, GCC 4.8.4, Debian 8 >Reporter: Hal T >Priority: Major > > I have tried versions 0.15.1, 0.16.0, 0.17.0. Same error on all. I've seen in > other issues that co-installations of tensorflow and numpy might be causing > issues. I have tensorflow==1.14.0 and numpy==1.16.4 (among many other > libraries, but I've read that those two tend to cause issues). > > {code:java} > from pyarrow import parquet > > ~/python/lib/python3.6/site-packages/pyarrow/parquet.py in > 32 import pyarrow as pa > 33 import pyarrow.lib as lib > ---> 34 import pyarrow._parquet as _parquet > 35 > 36 from pyarrow._parquet import (ParquetReader, Statistics, # noqa > ~/python/lib/python3.6/site-packages/pyarrow/_parquet.pyx in init > pyarrow._parquet() > > AttributeError: type object 'pyarrow._parquet.Statistics' has no attribute > '__reduce_cython__' > {code}
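Since `__reduce_cython__` failures like this are often blamed on binary-incompatible co-installed packages, a useful first triage step is to record the exact versions present in the failing environment. A minimal sketch, assuming the usual suspects named in this thread (the helper itself is hypothetical, not a pyarrow utility):

```python
import importlib


def report_versions(names=("numpy", "tensorflow", "pyarrow")):
    """Map each package name to its __version__ string, 'unknown' if the
    module has no version attribute, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = None
    return versions
```

Attaching this output to a bug report lets maintainers reproduce the exact numpy/tensorflow/pyarrow combination instead of guessing.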
[jira] [Updated] (ARROW-8587) Compilation error when linking arrow-flight-perf-server
[ https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8587: -- Labels: pull-request-available (was: )
[jira] [Commented] (ARROW-8587) Compilation error when linking arrow-flight-perf-server
[ https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091776#comment-17091776 ] Antoine Pitrou commented on ARROW-8587: --- By the way, you should build the benchmarks in release mode, not debug.
[jira] [Assigned] (ARROW-8587) Compilation error when linking arrow-flight-perf-server
[ https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reassigned ARROW-8587: - Assignee: Antoine Pitrou
[jira] [Commented] (ARROW-8587) Compilation error when linking arrow-flight-perf-server
[ https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091770#comment-17091770 ] Antoine Pitrou commented on ARROW-8587: --- I've bisected and the culprit is ARROW-7869.
[jira] [Commented] (ARROW-7869) [Python] Boost::system and boost::filesystem not necessary anymore in Python wheels
[ https://issues.apache.org/jira/browse/ARROW-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091772#comment-17091772 ] Antoine Pitrou commented on ARROW-7869: --- This seems to have caused ARROW-8587. > [Python] Boost::system and boost::filesystem not necessary anymore in Python > wheels > --- > > Key: ARROW-7869 > URL: https://issues.apache.org/jira/browse/ARROW-7869 > Project: Apache Arrow > Issue Type: Task > Components: Packaging, Python >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Unfortunately it seems we still need boost::regex due to Parquet. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8587) Compilation error when linking arrow-flight-perf-server
[ https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091761#comment-17091761 ] Chengxin Ma commented on ARROW-8587: Additional information: I still saw this error after rolling back the code base by: {{git checkout apache-arrow-0.17.0}}
[jira] [Updated] (ARROW-8587) Compilation error when linking arrow-flight-perf-server
[ https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-8587: -- Affects Version/s: (was: 1.0.0) 0.17.0
[jira] [Commented] (ARROW-8587) Compilation error when linking arrow-flight-perf-server
[ https://issues.apache.org/jira/browse/ARROW-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091743#comment-17091743 ] Antoine Pitrou commented on ARROW-8587: --- Weirdly, I get the same error now on Ubuntu 18.04. I used to be able to build it, so something broke along the way.
[jira] [Comment Edited] (ARROW-8559) [Rust] Consolidate Record Batch iterator traits in main arrow crate
[ https://issues.apache.org/jira/browse/ARROW-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091739#comment-17091739 ] Mark Hildreth edited comment on ARROW-8559 at 4/24/20, 5:00 PM: Generally in favor, but one question and one bikeshed: Question: perhaps my Rust-fu is lacking, but why would we also need a {{SendableBatchIterator}}? If we want to make sure that a type marks itself {{Send}} and/or {{Sync}}, it can do that. If an interface wants to accept only {{Send}} and/or {{Sync}} iterators, it could do {{BatchIterator + Send + Sync}}. Bikeshed: There are no {{std::iter::Iterator}} trait implementation for either {{BatchIterator}} or {{RecordBatchReader}}. Thus, using the name {{Iterator}} seems a bit misleading. was (Author: markhildreth): Generally in favor, but one question and one bikeshed: Question: perhaps my Rust-fu is lacking, but why would we need a {{SendableBatchIterator}}? If we want to make sure that a type marks itself {{Send}} and/or {{Sync}}, it can do that. If an interface wants to accept only {{Send}} and/or {{Sync}} iterators, it could do {{BatchIterator + Send + Sync}}. Bikeshed: There are no {{std::iter::Iterator}} trait implementation for either {{BatchIterator}} or {{RecordBatchReader}}. Thus, using the name {{Iterator}} seems a bit misleading. > [Rust] Consolidate Record Batch iterator traits in main arrow crate > --- > > Key: ARROW-8559 > URL: https://issues.apache.org/jira/browse/ARROW-8559 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Paddy Horan >Assignee: Paddy Horan >Priority: Major > > We have the `BatchIterator` trait in DataFusion and the `RecordBatchReader` > trait in the main arrow crate. > They differ in that `BatchIterator` is Send + Sync. They should both be in > the Arrow crate and be named `BatchIterator` and `SendableBatchIterator` -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8559) [Rust] Consolidate Record Batch iterator traits in main arrow crate
[ https://issues.apache.org/jira/browse/ARROW-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091739#comment-17091739 ] Mark Hildreth commented on ARROW-8559: -- Generally in favor, but one question and one bikeshed: Question: perhaps my Rust-fu is lacking, but why would we need a {{SendableBatchIterator}}? If we want to make sure that a type marks itself {{Send}} and/or {{Sync}}, it can do that. If an interface wants to accept only {{Send}} and/or {{Sync}} iterators, it could do {{BatchIterator + Send + Sync}}. Bikeshed: There is no {{std::iter::Iterator}} trait implementation for either {{BatchIterator}} or {{RecordBatchReader}}. Thus, using the name {{Iterator}} seems a bit misleading. > [Rust] Consolidate Record Batch iterator traits in main arrow crate > --- > > Key: ARROW-8559 > URL: https://issues.apache.org/jira/browse/ARROW-8559 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Paddy Horan >Assignee: Paddy Horan >Priority: Major > > We have the `BatchIterator` trait in DataFusion and the `RecordBatchReader` > trait in the main arrow crate. > They differ in that `BatchIterator` is Send + Sync. They should both be in > the Arrow crate and be named `BatchIterator` and `SendableBatchIterator` -- This message was sent by Atlassian Jira (v8.3.4#803005)
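The bound composition Mark describes can be sketched with a toy trait. Note that `RecordBatch`, `BatchIterator`, and `VecBatches` below are hypothetical stand-ins for illustration, not the actual arrow/DataFusion definitions:

```rust
// Hypothetical minimal stand-in for arrow's RecordBatch.
struct RecordBatch(Vec<i64>);

// A single BatchIterator trait; it does NOT require Send or Sync itself.
trait BatchIterator {
    fn next_batch(&mut self) -> Option<RecordBatch>;
}

// A trivial implementation backed by a vector of batches.
struct VecBatches(std::vec::IntoIter<RecordBatch>);

impl BatchIterator for VecBatches {
    fn next_batch(&mut self) -> Option<RecordBatch> {
        self.0.next()
    }
}

// An API that must move the iterator to another thread simply demands the
// extra marker bounds, rather than requiring a separate SendableBatchIterator.
fn consume_on_thread<I: BatchIterator + Send + 'static>(mut it: I) -> usize {
    std::thread::spawn(move || {
        let mut n = 0;
        while it.next_batch().is_some() {
            n += 1;
        }
        n // number of batches consumed
    })
    .join()
    .unwrap()
}
```

Callers that do not need threading can accept a plain `impl BatchIterator`, so the `Send`/`Sync` requirement is paid only where it is used.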
[jira] [Updated] (ARROW-8580) Pyarrow exceptions are not helpful
[ https://issues.apache.org/jira/browse/ARROW-8580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Soroush Radpour updated ARROW-8580: --- Description: I'm trying to understand an exception in the code using pyarrow, and it is not very helpful. File "pyarrow/_parquet.pyx", line 1036, in pyarrow._parquet.ParquetReader.open File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status OSError: IOError: b'Service Unavailable'. Detail: Python exception: RuntimeError It would be great if each of the three exceptions was unwrapped with full stack trace and error messages that came with it. was: I'm trying to understand an exception in the code using pyarrow, and it is not very helpful. {{ File "pyarrow/_parquet.pyx", line 1036, in pyarrow._parquet.ParquetReader.open File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status OSError: IOError: b'Service Unavailable'. Detail: Python exception: RuntimeError}} It would be great if each of the three exceptions was unwrapped with full stack trace and error messages that came with it. > Pyarrow exceptions are not helpful > -- > > Key: ARROW-8580 > URL: https://issues.apache.org/jira/browse/ARROW-8580 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Soroush Radpour >Priority: Major > > I'm trying to understand an exception in the code using pyarrow, and it is > not very helpful. > File "pyarrow/_parquet.pyx", line 1036, in pyarrow._parquet.ParquetReader.open > File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status > OSError: IOError: b'Service Unavailable'. Detail: Python exception: > RuntimeError > > It would be great if each of the three exceptions was unwrapped with full > stack trace and error messages that came with it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8587) Compilation error when linking arrow-flight-perf-server
Chengxin Ma created ARROW-8587: -- Summary: Compilation error when linking arrow-flight-perf-server Key: ARROW-8587 URL: https://issues.apache.org/jira/browse/ARROW-8587 Project: Apache Arrow Issue Type: Bug Components: Benchmarking, C++, FlightRPC Affects Versions: 1.0.0 Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux Reporter: Chengxin Ma I wanted to play around with Flight benchmark after seeing the discussion regarding Flight's throughput in arrow dev mailing list today. I met the following error when trying to build the benchmark from latest source code: {code:java} [ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to `boost::filesystem::detail::canonical(boost::filesystem::path const&, boost::filesystem::path const&, boost::system::error_code*)' ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to `boost::system::system_category()' ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to `boost::filesystem::path::parent_path() const' ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate' ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd' ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to `boost::system::generic_category()' ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to `boost::filesystem::detail::current_path(boost::system::error_code*)' ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateInit2_' ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate' ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateInit2_' ../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd' ../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
`boost::filesystem::path::operator/=(boost::filesystem::path const&)' collect2: error: ld returned 1 exit status src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: recipe for target 'debug/arrow-flight-perf-server' failed make[2]: *** [debug/arrow-flight-perf-server] Error 1 CMakeFiles/Makefile2:2609: recipe for target 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] Error 2 Makefile:140: recipe for target 'all' failed make: *** [all] Error 2 {code} I was using {{cmake .. -DCMAKE_BUILD_TYPE=Debug -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}} to configure the build. I noticed that there was an {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the output, but the Boost library that I installed from the package manager was of this version: {{1.65.1.0ubuntu1}}. Could this be the cause of the problem? PS: I was able to build the benchmark [before|https://issues.apache.org/jira/browse/ARROW-7200]. It was on AWS with the OS being ubuntu-bionic-18.04-amd64-server-20191002, which should be very similar to the one I'm using on my laptop. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8556) [R] zstd symbol not found on Ubuntu 19.10
[ https://issues.apache.org/jira/browse/ARROW-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091700#comment-17091700 ] Karl Dunkle Werner commented on ARROW-8556: --- Great! If you want to get to the bottom of it, I would be happy to run commands you send me. I think most 19.10 users will be moving to 20.04 soon, so this might only be worth it if 20.04 experiences the same issue. > [R] zstd symbol not found on Ubuntu 19.10 > - > > Key: ARROW-8556 > URL: https://issues.apache.org/jira/browse/ARROW-8556 > Project: Apache Arrow > Issue Type: Bug > Components: R >Affects Versions: 0.17.0 > Environment: Ubuntu 19.10 > R 3.6.1 >Reporter: Karl Dunkle Werner >Priority: Major > > I would like to install the `arrow` R package on my Ubuntu 19.10 system. > Prebuilt binaries are unavailable, and I want to enable compression, so I set > the {{LIBARROW_MINIMAL=false}} environment variable. When I do so, it looks > like the package is able to compile, but can't be loaded. I'm able to install > correctly if I don't set the {{LIBARROW_MINIMAL}} variable. > Here's the error I get: > {code:java} > ** testing if installed package can be loaded from temporary location > Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath > = DLLpath, ...): > unable to load shared object > '~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so': > ~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so: undefined symbol: > ZSTD_initCStream > Error: loading failed > Execution halted > ERROR: loading failed > * removing ‘~/.R/3.6/arrow’ > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-8577) [GLib][Plasma] gplasma_client_options_new() default settings are enabling a check for CUDA device
[ https://issues.apache.org/jira/browse/ARROW-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091523#comment-17091523 ] Tanveer edited comment on ARROW-8577 at 4/24/20, 4:10 PM: -- Hi Kouhei, This is the program. I am taking a RecordBatch (batch_genomics) as input in this function. The error arises at: gPlasmaClient = gplasma_client_new("/tmp/store0", gplasmaClient_options, &error); {code:java} guint8 id_arr[20]; genRandom(id_arr, 20); char objID_file[] = "/home/tahmad/lib/core/objID.txt"; g_print("obj_id: %s\n", id_arr); gboolean success = TRUE; GError *error = NULL; GPlasmaClient *gPlasmaClient; GPlasmaObjectID *object_id; GPlasmaClientCreateOptions *create_options; GPlasmaClientOptions *gplasmaClient_options; GPlasmaCreatedObject *Object; GPlasmaReferredObject *refObject; GArrowBuffer *arrowBuffer; arrowBuffer = GSerializeRecordBatch(batch_genomics); gint32 size = garrow_buffer_get_size(arrowBuffer); gplasmaClient_options = gplasma_client_options_new(); gPlasmaClient = gplasma_client_new("/tmp/store0", gplasmaClient_options, &error); object_id = gplasma_object_id_new(id_arr, 20, &error); create_options = gplasma_client_create_options_new(); { guint8 metadata[] = "metadata"; gplasma_client_create_options_set_metadata(create_options, (const guint8 *)metadata, sizeof(metadata)); } Object = gplasma_client_create(gPlasmaClient, object_id, size, create_options, &error); g_object_unref(create_options); { GArrowBuffer *data; guint8 dataW[] = "data"; g_object_get(Object, "data", &data, NULL); garrow_mutable_buffer_set_data(GARROW_MUTABLE_BUFFER(data), 0, garrow_buffer_get_databytes(arrowBuffer), size, &error); g_object_unref(data); } gplasma_created_object_seal(Object, &error); g_object_unref(Object); gplasma_client_disconnect(gPlasmaClient, &error); g_object_unref(gPlasmaClient);{code} I am using this function to convert an Arrow RecordBatch to an ArrowBuffer: {code:java} extern "C" GArrowBuffer * GSerializeRecordBatchToBuffer(GArrowRecordBatch *record_batch) { const auto arrow_record_batch =
garrow_record_batch_get_raw(record_batch); std::shared_ptr<arrow::ResizableBuffer> resizable_buffer; arrow::AllocateResizableBuffer(arrow::default_memory_pool(), 0, &resizable_buffer); std::shared_ptr<arrow::Buffer> buffer = std::dynamic_pointer_cast<arrow::Buffer>(resizable_buffer); arrow::ipc::SerializeRecordBatch(*arrow_record_batch, arrow::default_memory_pool(), &buffer); return garrow_buffer_new_raw(&buffer); } {code} was (Author: tahmad): Hi Kouhei, This is the program. I am taking a RecordBatch (batch_genomics) as input in this function. The error arises at: gPlasmaClient = gplasma_client_new("/tmp/store0", gplasmaClient_options, &error); {code:java} guint8 id_arr[20]; genRandom(id_arr, 20); char objID_file[] = "/home/tahmad/lib/core/objID.txt"; g_print("obj_id: %s\n", id_arr); gboolean success = TRUE; GError *error = NULL; GPlasmaClient *gPlasmaClient; GPlasmaObjectID *object_id; GPlasmaClientCreateOptions *create_options; GPlasmaClientOptions *gplasmaClient_options; GPlasmaCreatedObject *Object; GPlasmaReferredObject *refObject; GArrowBuffer *arrowBuffer; arrowBuffer = GSerializeRecordBatch(batch_genomics); gint32 size = garrow_buffer_get_size(arrowBuffer); gplasmaClient_options = gplasma_client_options_new(); gPlasmaClient = gplasma_client_new("/tmp/store0", gplasmaClient_options, &error); object_id = gplasma_object_id_new(id_arr, 20, &error); create_options = gplasma_client_create_options_new(); { guint8 metadata[] = "metadata"; gplasma_client_create_options_set_metadata(create_options, (const guint8 *)metadata, sizeof(metadata)); } Object = gplasma_client_create(gPlasmaClient, object_id, size, create_options, &error); g_object_unref(create_options); { GArrowBuffer *data; guint8 dataW[] = "data"; g_object_get(Object, "data", &data, NULL); garrow_mutable_buffer_set_data(GARROW_MUTABLE_BUFFER(data), 0, garrow_buffer_get_databytes(arrowBuffer), size, &error); g_object_unref(data); } gplasma_created_object_seal(Object, &error); g_object_unref(Object); gplasma_client_disconnect(gPlasmaClient, &error); g_object_unref(gPlasmaClient);{code} > [GLib][Plasma] gplasma_client_options_new()
default settings are enabling a > check for CUDA device > - > > Key: ARROW-8577 > URL: https://issues.apache.org/jira/browse/ARROW-8577 > Project: Apache Arrow > Issue Type: Bug > Components: GLib >Reporter: Tanveer >Assignee: Kouhei Sutou >Priority: Major > > Hi all, > Previously, I was using c_glib Plasma library (build 0.12) for creating > plasma objects. It was working as expected. But now I want to use Arrow's > newest build. I incurred the following error: > > /build/apache-arrow-0.17.0/cpp/src/arrow/result.cc:28: ValueOrDie called on > an error: IOError: Cuda error 100 in function 'cuInit': >
[jira] [Comment Edited] (ARROW-8577) [GLib][Plasma] gplasma_client_options_new() default settings are enabling a check for CUDA device
[ https://issues.apache.org/jira/browse/ARROW-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091523#comment-17091523 ] Tanveer edited comment on ARROW-8577 at 4/24/20, 4:07 PM: -- Hi Kouhei, This is the program. I am taking a RecordBatch (batch_genomics) as input in this function. The error arises at: gPlasmaClient = gplasma_client_new("/tmp/store0",gplasmaClient_options, ); {code:java} guint8 id_arr[20]; genRandom(id_arr,20); char objID_file[] = "/home/tahmad/lib/core/objID.txt"; g_print("obj_id: %s\n", id_arr); gboolean success = TRUE; GError *error = NULL; GPlasmaClient *gPlasmaClient; GPlasmaObjectID *object_id; GPlasmaClientCreateOptions *create_options; GPlasmaClientOptions *gplasmaClient_options; GPlasmaCreatedObject *Object; GPlasmaReferredObject *refObject; GArrowBuffer *arrowBuffer; arrowBuffer = GSerializeRecordBatch(batch_genomics); gint32 size = garrow_buffer_get_size(arrowBuffer); gplasmaClient_options = gplasma_client_options_new(); gPlasmaClient = gplasma_client_new("/tmp/store0",gplasmaClient_options, ); object_id = gplasma_object_id_new(id_arr, 20, ); create_options = gplasma_client_create_options_new(); { guint8 metadata[] = "metadata"; gplasma_client_create_options_set_metadata(create_options, (const guint8 *)metadata, sizeof(metadata)); } Object = gplasma_client_create(gPlasmaClient, object_id, size, create_options, ); g_object_unref(create_options); { GArrowBuffer *data; guint8 dataW[] = "data"; g_object_get(Object, "data", , NULL); garrow_mutable_buffer_set_data(GARROW_MUTABLE_BUFFER(data),0, garrow_buffer_get_databytes(arrowBuffer),size,); g_object_unref(data); } gplasma_created_object_seal(Object, ); g_object_unref(Object); gplasma_client_disconnect(gPlasmaClient, ); g_object_unref(gPlasmaClient);{code} was (Author: tahmad): Hi Kouhei, This the program. I am taking a RecordBatch (batch_genomics) as input in this function. 
The error arises at: gPlasmaClient = gplasma_client_new("/tmp/store0",gplasmaClient_options, ); {code:java} guint8 id_arr[20]; genRandom(id_arr,20); char objID_file[] = "/home/tahmad/lib/core/objID.txt"; g_print("obj_id: %s\n", id_arr); gboolean success = TRUE; GError *error = NULL; GPlasmaClient *gPlasmaClient; GPlasmaObjectID *object_id; GPlasmaClientCreateOptions *create_options; GPlasmaClientOptions *gplasmaClient_options; GPlasmaCreatedObject *Object; GPlasmaReferredObject *refObject; GArrowBuffer *arrowBuffer; arrowBuffer = GSerializeRecordBatch(batch_genomics); gint32 size = garrow_buffer_get_size(arrowBuffer); gplasmaClient_options = gplasma_client_options_new(); gPlasmaClient = gplasma_client_new("/tmp/store0",gplasmaClient_options, ); object_id = gplasma_object_id_new(id_arr, 20, ); create_options = gplasma_client_create_options_new(); { guint8 metadata[] = "metadata"; gplasma_client_create_options_set_metadata(create_options, (const guint8 *)metadata, sizeof(metadata)); } Object = gplasma_client_create(gPlasmaClient, object_id, size, create_options, ); g_object_unref(create_options); { GArrowBuffer *data; guint8 dataW[] = "data"; g_object_get(Object, "data", , NULL); garrow_mutable_buffer_set_data(GARROW_MUTABLE_BUFFER(data),0, garrow_buffer_get_databytes(arrowBuffer),size,); g_object_unref(data); } gplasma_created_object_seal(Object, ); g_object_unref(Object); gplasma_client_disconnect(gPlasmaClient, ); g_object_unref(gPlasmaClient);{code} > [GLib][Plasma] gplasma_client_options_new() default settings are enabling a > check for CUDA device > - > > Key: ARROW-8577 > URL: https://issues.apache.org/jira/browse/ARROW-8577 > Project: Apache Arrow > Issue Type: Bug > Components: GLib >Reporter: Tanveer >Assignee: Kouhei Sutou >Priority: Major > > Hi all, > Previously, I was using c_glib Plasma library (build 0.12) for creating > plasma objects. It was working as expected. But now I want to use Arrow's > newest build. 
I incurred the following error: > > /build/apache-arrow-0.17.0/cpp/src/arrow/result.cc:28: ValueOrDie called on > an error: IOError: Cuda error 100 in function 'cuInit': > [CUDA_ERROR_NO_DEVICE] no CUDA-capable device is detected > I think plasma client options (gplasma_client_options_new()) which I am using > with default settings are enabling a check for my CUDA device and I have no > CUDA device attached to my system. How I can disable this check? Any help > will be highly appreciated. Thanks -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-7706) [Python] saving a dataframe to the same partitioned location silently doubles the data
[ https://issues.apache.org/jira/browse/ARROW-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091686#comment-17091686 ] Gregory Hayes commented on ARROW-7706: -- I've encountered this as well when using pyarrow v0.17. In my instance, I attempted both writing to and appending to a partitioned dataset. Both the write and the append operation silently double the data. > [Python] saving a dataframe to the same partitioned location silently doubles > the data > -- > > Key: ARROW-7706 > URL: https://issues.apache.org/jira/browse/ARROW-7706 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.15.1 >Reporter: Tsvika Shapira >Priority: Major > Labels: dataset, parquet > > When a user saves a dataframe: > {code:python} > df1.to_parquet('/tmp/table', partition_cols=['col_a'], engine='pyarrow') > {code} > it will create sub-directories named "{{a=val1}}", "{{a=val2}}" in > {{/tmp/table}}. Each of them will contain one (or more?) parquet files with > random filenames. > If a user runs the same command again, the code will use the existing > sub-directories, but with different (random) filenames. As a result, any data > loaded from this folder will be wrong - each row will be present twice. > For example, when using > {code:python} > df1.to_parquet('/tmp/table', partition_cols=['col_a'], engine='pyarrow') # > second time > df2 = pd.read_parquet('/tmp/table', engine='pyarrow') > assert len(df1) == len(df2) # raise an error{code} > This is a subtle change in the data that can pass unnoticed. > > I would expect that the code will prevent the user from using a non-empty > destination as a partitioned target. An overwrite flag can also be useful. -- This message was sent by Atlassian Jira (v8.3.4#803005)
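The guard proposed in this thread — refuse a non-empty destination unless the caller opts into an explicit mode, along the lines of Spark's append/overwrite/error/ignore — can be sketched as follows. This is an illustration only, in Rust with hypothetical names (`WriteMode`, `prepare_destination`), not pyarrow's actual API:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Write modes mirroring Spark's saveAsTable modes.
#[derive(Debug, Clone, Copy, PartialEq)]
enum WriteMode {
    Append,
    Overwrite,
    ErrorIfExists,
    Ignore,
}

// Decide what to do before writing a partitioned dataset to `dest`.
// Returns Ok(true) if the caller should proceed with the write,
// Ok(false) if the write should be silently skipped (Ignore mode),
// and Err(..) if the destination is non-empty under ErrorIfExists.
fn prepare_destination(dest: &Path, mode: WriteMode) -> io::Result<bool> {
    let non_empty = dest.is_dir() && fs::read_dir(dest)?.next().is_some();
    match (mode, non_empty) {
        (WriteMode::ErrorIfExists, true) => Err(io::Error::new(
            io::ErrorKind::AlreadyExists,
            format!("destination {:?} is not empty; pass an explicit mode", dest),
        )),
        (WriteMode::Ignore, true) => Ok(false),
        (WriteMode::Overwrite, true) => {
            // Drop stale partition files so re-running cannot double the data.
            fs::remove_dir_all(dest)?;
            fs::create_dir_all(dest)?;
            Ok(true)
        }
        // Append, or any mode on an empty/absent destination: just write.
        _ => Ok(true),
    }
}
```

With `ErrorIfExists` as the default, re-running the same save would fail loudly instead of silently duplicating every row.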
[jira] [Commented] (ARROW-8556) [R] zstd symbol not found on Ubuntu 19.10
[ https://issues.apache.org/jira/browse/ARROW-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091678#comment-17091678 ] Neal Richardson commented on ARROW-8556: Thanks. I've mapped ubuntu 19.10 to ubuntu-18.04 [here|https://github.com/ursa-labs/arrow-r-nightly/blob/master/linux/distro-map.csv#L13] so installation with a binary should Just Work now. I'm curious why zstd wasn't included correctly before (see that there is no {{-lzstd}} in the {{PKG_LIBS}} line), but if you want to let it lie and move on, that's fine with me, we can wait and see if anyone else experiences that. > [R] zstd symbol not found on Ubuntu 19.10 > - > > Key: ARROW-8556 > URL: https://issues.apache.org/jira/browse/ARROW-8556 > Project: Apache Arrow > Issue Type: Bug > Components: R >Affects Versions: 0.17.0 > Environment: Ubuntu 19.10 > R 3.6.1 >Reporter: Karl Dunkle Werner >Priority: Major > > I would like to install the `arrow` R package on my Ubuntu 19.10 system. > Prebuilt binaries are unavailable, and I want to enable compression, so I set > the {{LIBARROW_MINIMAL=false}} environment variable. When I do so, it looks > like the package is able to compile, but can't be loaded. I'm able to install > correctly if I don't set the {{LIBARROW_MINIMAL}} variable. > Here's the error I get: > {code:java} > ** testing if installed package can be loaded from temporary location > Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath > = DLLpath, ...): > unable to load shared object > '~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so': > ~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so: undefined symbol: > ZSTD_initCStream > Error: loading failed > Execution halted > ERROR: loading failed > * removing ‘~/.R/3.6/arrow’ > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8586) [R] installation failure on CentOS 7
[ https://issues.apache.org/jira/browse/ARROW-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson updated ARROW-8586: --- Summary: [R] installation failure on CentOS 7 (was: Failed to Install arrow From CRAN) > [R] installation failure on CentOS 7 > > > Key: ARROW-8586 > URL: https://issues.apache.org/jira/browse/ARROW-8586 > Project: Apache Arrow > Issue Type: Bug > Components: R >Affects Versions: 0.17.0 > Environment: CentOS 7 >Reporter: Hei >Priority: Major > > Hi, > I am trying to install arrow via RStudio, but it seems like it is not working > that after I installed the package, it kept asking me to run > arrow::install_arrow() even after I did: > {code} > > install.packages("arrow") > Installing package into ‘/home/hc/R/x86_64-redhat-linux-gnu-library/3.6’ > (as ‘lib’ is unspecified) > trying URL 'https://cran.rstudio.com/src/contrib/arrow_0.17.0.tar.gz' > Content type 'application/x-gzip' length 242534 bytes (236 KB) > == > downloaded 236 KB > * installing *source* package ‘arrow’ ... 
> ** package ‘arrow’ successfully unpacked and MD5 sums checked > ** using staged installation > *** Successfully retrieved C++ source > *** Building C++ libraries > cmake > arrow > ./configure: line 132: cd: libarrow/arrow-0.17.0/lib: Not a directory > - NOTE --- > After installation, please run arrow::install_arrow() > for help installing required runtime libraries > - > ** libs > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 > -grecord-gcc-switches -m64 -mtune=generic -c array.cpp -o array.o > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 > -grecord-gcc-switches -m64 -mtune=generic -c array_from_vector.cpp -o > array_from_vector.o > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 > -grecord-gcc-switches -m64 -mtune=generic -c array_to_vector.cpp -o > array_to_vector.o > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 > -grecord-gcc-switches -m64 -mtune=generic -c arraydata.cpp -o arraydata.o > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 > 
-grecord-gcc-switches -m64 -mtune=generic -c arrowExports.cpp -o > arrowExports.o > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 > -grecord-gcc-switches -m64 -mtune=generic -c buffer.cpp -o buffer.o > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 > -grecord-gcc-switches -m64 -mtune=generic -c chunkedarray.cpp -o > chunkedarray.o > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 > -grecord-gcc-switches -m64 -mtune=generic -c compression.cpp -o > compression.o > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 > -grecord-gcc-switches -m64 -mtune=generic -c compute.cpp -o compute.o > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG >
[jira] [Commented] (ARROW-8586) Failed to Install arrow From CRAN
[ https://issues.apache.org/jira/browse/ARROW-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091670#comment-17091670 ] Neal Richardson commented on ARROW-8586: Thanks for the report. There seem to be two issues: (1) C++ build from source is failing, and (2) when {{install_arrow}} tries to download a prebuilt binary, it's not correctly identifying your OS version. To debug the first issue, could you please set the environment variable {{ARROW_R_DEV=true}} and retry, and share with me the (much more verbose) installation logs? To debug the second, could you please tell me what {{lsb_release -rs}} says at the command line? A workaround will be to set {{LIBARROW_BINARY=centos-7}} and reinstall (or, equivalently, call {{arrow::install_arrow(binary="centos-7")}} from R, since you have that installed). But I'd appreciate your help in debugging the issue so that we can make it work correctly going forward. > Failed to Install arrow From CRAN > - > > Key: ARROW-8586 > URL: https://issues.apache.org/jira/browse/ARROW-8586 > Project: Apache Arrow > Issue Type: Bug > Components: R >Affects Versions: 0.17.0 > Environment: CentOS 7 >Reporter: Hei >Priority: Major > > Hi, > I am trying to install arrow via RStudio, but it seems like it is not working > that after I installed the package, it kept asking me to run > arrow::install_arrow() even after I did: > {code} > > install.packages("arrow") > Installing package into ‘/home/hc/R/x86_64-redhat-linux-gnu-library/3.6’ > (as ‘lib’ is unspecified) > trying URL 'https://cran.rstudio.com/src/contrib/arrow_0.17.0.tar.gz' > Content type 'application/x-gzip' length 242534 bytes (236 KB) > == > downloaded 236 KB > * installing *source* package ‘arrow’ ... 
> ** package ‘arrow’ successfully unpacked and MD5 sums checked > ** using staged installation > *** Successfully retrieved C++ source > *** Building C++ libraries > cmake > arrow > ./configure: line 132: cd: libarrow/arrow-0.17.0/lib: Not a directory > - NOTE --- > After installation, please run arrow::install_arrow() > for help installing required runtime libraries > - > ** libs > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 > -grecord-gcc-switches -m64 -mtune=generic -c array.cpp -o array.o > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 > -grecord-gcc-switches -m64 -mtune=generic -c array_from_vector.cpp -o > array_from_vector.o > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 > -grecord-gcc-switches -m64 -mtune=generic -c array_to_vector.cpp -o > array_to_vector.o > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 > -grecord-gcc-switches -m64 -mtune=generic -c arraydata.cpp -o arraydata.o > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 > 
-grecord-gcc-switches -m64 -mtune=generic -c arrowExports.cpp -o > arrowExports.o > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 > -grecord-gcc-switches -m64 -mtune=generic -c buffer.cpp -o buffer.o > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 > -grecord-gcc-switches -m64 -mtune=generic -c chunkedarray.cpp -o > chunkedarray.o > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG >
[jira] [Commented] (ARROW-5949) [Rust] Implement DictionaryArray
[ https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091640#comment-17091640 ] Neville Dipale commented on ARROW-5949: --- I think not providing more convenient ways of using DictionaryArray potentially defeats the purpose of having it. I've already mentioned the need for compute kernel support on dictionaries, some of which would require access to the array's keys as a primitive array (e.g. sort, take), and others which would need both keys and values (filter). I would rather have the DictionaryArray::keys() return ArrayRef instead of NullableIter, then support iterating on arrays in general. Yes, building the primitive array is a bit expensive, and more importantly, it's opaque to a casual Arrow user; so I'd support providing that option. Look at the below, for example: {code:java} impl<'a, K: ArrowPrimitiveType> DictionaryArray<K> { pub fn decode_dictionary(&self) -> Result<ArrayRef> { // convert the keys into an array let keys = Arc::new(PrimitiveArray::<K>::from(self.data.clone())) as ArrayRef; // cast keys to an uint32 array let keys = crate::compute::cast(&keys, &DataType::UInt32)?; let keys = UInt32Array::from(keys.data()); // index into the values of the dictionary, with keys crate::compute::take(&self.values(), &keys, None) } }{code} This is how I'd convert a dictionary to a 'normal' array of an unknown type. Perhaps this could be a discussion for the mailing list? I'm interested in simplifying the dictionary API, and widening dictionary support; this could be a good starting point to do this. 
CC [~paddyhoran] [~andygrove] > [Rust] Implement DictionaryArray > > > Key: ARROW-5949 > URL: https://issues.apache.org/jira/browse/ARROW-5949 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: David Atienza >Assignee: David Atienza >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 18h > Remaining Estimate: 0h > > I am pretty new to the codebase, but I have seen that DictionaryArray is not > implemented in the Rust implementation. > I went through the list of issues and I could not see any work on this. Is > there any blocker? > > The specification is a bit > [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] > or even > [non-existant|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding], > so I am not sure how to implement it myself. -- This message was sent by Atlassian Jira (v8.3.4#803005)
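The decode-by-take idea in the comment above can be sketched without the arrow crate. This is a minimal illustration only: plain slices stand in for a DictionaryArray's keys and values buffers, and `decode_dictionary` here is a hypothetical stand-in, not the arrow-rs API.

```rust
// Model: a dictionary array is a slice of optional key indices plus a values
// lookup table. Decoding is a `take`-style gather: each non-null key indexes
// into the dictionary values; null keys stay null.
fn decode_dictionary<'a>(keys: &[Option<usize>], values: &'a [&'a str]) -> Vec<Option<&'a str>> {
    keys.iter().map(|&k| k.map(|i| values[i])).collect()
}

fn main() {
    let values = ["foo", "bar"];
    let keys = [Some(0), Some(0), None, Some(1)];
    let decoded = decode_dictionary(&keys, &values);
    // the dictionary expands back into a "normal" array with nulls preserved
    assert_eq!(decoded, vec![Some("foo"), Some("foo"), None, Some("bar")]);
    println!("{:?}", decoded);
}
```

The real implementation would return an `ArrayRef` and reuse the existing `take` kernel, as the comment proposes, rather than materializing a `Vec`.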
[jira] [Comment Edited] (ARROW-5949) [Rust] Implement DictionaryArray
[ https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091600#comment-17091600 ] Mahmut Bulut edited comment on ARROW-5949 at 4/24/20, 2:10 PM: --- Sorry, yes, that's exactly like that; it is ok and valid. I gave that example to show that we can leave the indices as they are, since -1 is masked out (unfortunately it won't work with unsigned values; I think that's why the bit-masking approach is better). Thanks for the links, they were fruitful. I am more inclined not to build the primitive array: the user should neither collect the result from the iterator nor look for the Some(_) one by one. Instead, I tend to want a slice given back from the array, which will most probably enable users who are using SIMD later. Though, it is also nice to have a PrimitiveArray API given to users. Current stable SIMD instructions (also packed_simd, which the Rust impl uses) are fill-free, so I need to use contiguous scalars for dict-encoded operations, which are crucial for my use case (repacking the arrow array is an overhead for me). So I have started to make a vectorized slice implementation over the current dictionary array; is it ok to include a slice kind of approach in Arrow? With chunked offsets, we can even use Rust arrays too. Wdyt? edit: contiguous not continuous was (Author: vertexclique): Sorry, yes, that's exactly like that, it is ok and valid. Gave that example to show that we can leave the indices as how -1 is masked on (unfortunately it won't work with unsigned values, I think that's why the bit masking approach is better). Thanks for the links they were fruitful. I think I am more inclined to not build the primitive array, neither user should collect the result from the iterator nor one by one look for the Some(_), that said I tend to have slice given back from the array, which is most probably enable users who are using SIMD later. Thou, it is also nice to have a PrimitiveArray API given to users. 
Current stable SIMD instructions (also packed_simd that rust impl uses) fill free so I need to use continuous scalars for dict encoded operations, which are crucial for my use case (repacking the arrow array is an overhead for me). So I have started to make a vectorized slice implementation over current dictionary array, is it ok to include slice kind of approach to Arrow? with chunked offsets, we can even use Rust arrays too. Wdyt? > [Rust] Implement DictionaryArray > > > Key: ARROW-5949 > URL: https://issues.apache.org/jira/browse/ARROW-5949 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: David Atienza >Assignee: David Atienza >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 18h > Remaining Estimate: 0h > > I am pretty new to the codebase, but I have seen that DictionaryArray is not > implemented in the Rust implementation. > I went through the list of issues and I could not see any work on this. Is > there any blocker? > > The specification is a bit > [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] > or even > [non-existant|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding], > so I am not sure how to implement it myself. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-5949) [Rust] Implement DictionaryArray
[ https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091569#comment-17091569 ] Neville Dipale edited comment on ARROW-5949 at 4/24/20, 1:21 PM: - Thanks, having looked at the implementation; I think they're handled the same way in Rust (if we exclude the iterator interface). {code:java} std::vector raw_indices = {0, 1, 2, -1, 3}; std::vector is_valid = {1, 1, 1, 0, 1};{code} Are you referring to the -1 on the indices? It gets masked by the is_valid mask, so I think even if any other value was used, the result would still be the same. Perhaps I'm not understanding. was (Author: nevi_me): Thanks, having looked at the implementation; I think they're handled the same way in Rust (if we exclude the iterator interface). {code:java} std::vector raw_indices = {0, 1, 2, -1, 3}; std::vector is_valid = {1, 1, 1, 0, 1};{code} > [Rust] Implement DictionaryArray > > > Key: ARROW-5949 > URL: https://issues.apache.org/jira/browse/ARROW-5949 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: David Atienza >Assignee: David Atienza >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 18h > Remaining Estimate: 0h > > I am pretty new to the codebase, but I have seen that DictionaryArray is not > implemented in the Rust implementation. > I went through the list of issues and I could not see any work on this. Is > there any blocker? > > The specification is a bit > [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] > or even > [non-existant|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding], > so I am not sure how to implement it myself. -- This message was sent by Atlassian Jira (v8.3.4#803005)
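The point in the comment above — that the sentinel index in a null slot is irrelevant once the validity mask is applied — can be demonstrated with a small sketch. Plain Rust types stand in for the Arrow builders; `lookup` is illustrative, not an Arrow API.

```rust
// A null slot's stored index (e.g. -1) never reaches the values lookup,
// because is_valid masks it first. Any placeholder index yields the same
// logical array.
fn lookup(raw_indices: &[i64], is_valid: &[bool], values: &[&'static str]) -> Vec<Option<&'static str>> {
    raw_indices
        .iter()
        .zip(is_valid)
        // the validity check short-circuits, so -1 is never used as an index
        .map(|(&i, &v)| if v { Some(values[i as usize]) } else { None })
        .collect()
}

fn main() {
    let values = ["a", "b", "c", "d"];
    let valid = [true, true, true, false, true];
    // same validity mask, different index stored in the null slot
    let with_sentinel = lookup(&[0, 1, 2, -1, 3], &valid, &values);
    let with_zero = lookup(&[0, 1, 2, 0, 3], &valid, &values);
    assert_eq!(with_sentinel, with_zero); // the masked slot's index does not matter
    println!("{:?}", with_sentinel);
}
```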
[jira] [Assigned] (ARROW-8318) [C++][Dataset] Dataset should instantiate Fragment
[ https://issues.apache.org/jira/browse/ARROW-8318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Saint-Jacques reassigned ARROW-8318: - Assignee: Francois Saint-Jacques > [C++][Dataset] Dataset should instantiate Fragment > -- > > Key: ARROW-8318 > URL: https://issues.apache.org/jira/browse/ARROW-8318 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Francois Saint-Jacques >Assignee: Francois Saint-Jacques >Priority: Major > Labels: dataset > > Fragments are created on the fly when invoking a Scan. This means that a lot > of the auxiliary/ancillary data must be stored by the specialised Dataset, > e.g. the FileSystemDataset must hold the path and partition expression. With > the advent of more complex Fragments, e.g. ParquetFileFragment, more data must > be stored. -- This message was sent by Atlassian Jira (v8.3.4#803005)
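The refactor the ticket describes — fragments instantiated up front, owning their path and partition expression, instead of being recreated at scan time — can be sketched roughly as follows. All names and types here are illustrative, not the Arrow C++ Dataset API.

```rust
// Each Fragment owns its auxiliary data; the Dataset no longer has to
// re-derive it on every Scan.
struct Fragment {
    path: String,
    partition_expression: String, // e.g. "col_a=val1"
}

struct FileSystemDataset {
    fragments: Vec<Fragment>,
}

impl FileSystemDataset {
    // Scanning just hands out the pre-instantiated fragments
    fn scan(&self) -> impl Iterator<Item = &Fragment> {
        self.fragments.iter()
    }
}

fn main() {
    let ds = FileSystemDataset {
        fragments: vec![Fragment {
            path: "/tmp/table/col_a=val1/part0.parquet".into(),
            partition_expression: "col_a=val1".into(),
        }],
    };
    let first = ds.scan().next().unwrap();
    assert_eq!(first.partition_expression, "col_a=val1");
    println!("{}", first.path);
}
```

More specialized fragments (the ticket's ParquetFileFragment example) would simply carry more such per-fragment state.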
[jira] [Comment Edited] (ARROW-5949) [Rust] Implement DictionaryArray
[ https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091547#comment-17091547 ] Neville Dipale edited comment on ARROW-5949 at 4/24/20, 1:10 PM: - Hi [~vertexclique], there was some discussion around using sentinel values over bitmask ([https://github.com/apache/arrow/pull/6095#discussion_r367760573),] and I believe it was a matter of sentinel values not being spec-compliant. We never resolved the following point, but I was of the opinion that it'd be better to provide methods/functions that allow converting a dictionary array into a primitive array. My opinion was mainly informed by my concern that we don't have a way of using dictionary arrays in compute kernels, so at the time I preferred something to convert {code:java} dict(i32)[ to i32<1, 1, null, 2, null>{code} The contributor of the PR provided a valid use-case, which led them in the route of providing iterator access, so we eventually merged the PR under the premise that more work could be done in future to provide other access methods. Regarding the 2 reasons: R1: what do you mean by "rebuilding from that lookup"? Do you mean rebuilding a primitive array from the dictionary's iterator? If so, would a method that converts a dict(i32) into a primitive(i32) suffice for your needs? R2: may you please provide an example of what you mean by parallel comparison? My knowledge of SIMD and auto-vec is a bit limited, but what we noticed in the Rust implementation is that we can often forgo explicit SIMD on some computation kernels if we relegate null handling to bitmask manipulation, and operate on arrays without branching to check nulls ([https://github.com/apache/arrow/pull/6086]). was (Author: nevi_me): Hi [~vertexclique], there was some discussion around using sentinel values over bitmask ([https://github.com/apache/arrow/pull/6095#discussion_r367760573),] and I believe it was a matter of sentinel values not being spec-compliant. 
We never resolved the following point, but I was of the opinion that it'd be better to provide methods/functions that allow converting a dictionary array into a primitive array. My opinion was mainly informed by my concern that we don't have a way of using dictionary arrays in compute kernels, so at the time I preferred something to convert ` {code:java} dict(i32)[` to i32<1, 1, null, 2, null>{code} The contributor of the PR provided a valid use-case, which led them in the route of providing iterator access, so we eventually merged the PR under the premise that more work could be done in future to provide other access methods. Regarding the 2 reasons: R1: what do you mean by "rebuilding from that lookup"? Do you mean rebuilding a primitive array from the dictionary's iterator? If so, would a method that converts a dict(i32) into a primitive(i32) suffice for your needs? R2: may you please provide an example of what you mean by parallel comparison? My knowledge of SIMD and auto-vec is a bit limited, but what we noticed in the Rust implementation is that we can often forgo explicit SIMD on some computation kernels if we relegate null handling to bitmask manipulation, and operate on arrays without branching to check nulls ([https://github.com/apache/arrow/pull/6086]). > [Rust] Implement DictionaryArray > > > Key: ARROW-5949 > URL: https://issues.apache.org/jira/browse/ARROW-5949 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: David Atienza >Assignee: David Atienza >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 18h > Remaining Estimate: 0h > > I am pretty new to the codebase, but I have seen that DictionaryArray is not > implemented in the Rust implementation. > I went through the list of issues and I could not see any work on this. Is > there any blocker? 
> > The specification is a bit > [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] > or even > [non-existant|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding], > so I am not sure how to implement it myself. -- This message was sent by Atlassian Jira (v8.3.4#803005)
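The answer to R2 above — forgoing explicit SIMD by relegating null handling to bitmask manipulation, so the value loop has no per-element null branch — can be sketched in plain Rust. `Vec<u8>` bitmaps stand in for Arrow validity buffers; `add_kernel` is an illustrative sketch, not the arrow-rs kernel.

```rust
// Values are computed for every lane, including null ones (their results are
// simply ignored), so the loop is branch-free and auto-vectorizable. Output
// validity is pure bitmap work: out_valid = a_valid AND b_valid.
fn add_kernel(a: &[i32], b: &[i32], a_valid: &[u8], b_valid: &[u8]) -> (Vec<i32>, Vec<u8>) {
    let sums: Vec<i32> = a.iter().zip(b).map(|(x, y)| x + y).collect();
    let valid: Vec<u8> = a_valid.iter().zip(b_valid).map(|(x, y)| x & y).collect();
    (sums, valid)
}

fn main() {
    let a = [1, 2, 3, 4];
    let b = [10, 20, 30, 40];
    // bit i of the bitmap byte marks lane i as valid
    let (sums, valid) = add_kernel(&a, &b, &[0b1011], &[0b1110]);
    assert_eq!(sums, vec![11, 22, 33, 44]); // all lanes computed unconditionally
    assert_eq!(valid, vec![0b1010]);        // null lanes come from the AND alone
    println!("{:?} {:#06b}", sums, valid[0]);
}
```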
[jira] [Commented] (ARROW-5949) [Rust] Implement DictionaryArray
[ https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091555#comment-17091555 ] Mahmut Bulut commented on ARROW-5949: - For the reference implementation that I am talking about, please take a look at the `TestStringDictionaryAppendIndices` in cxx implementation for how nulls are handled in arrow cxx implementation. > [Rust] Implement DictionaryArray > > > Key: ARROW-5949 > URL: https://issues.apache.org/jira/browse/ARROW-5949 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: David Atienza >Assignee: David Atienza >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 18h > Remaining Estimate: 0h > > I am pretty new to the codebase, but I have seen that DictionaryArray is not > implemented in the Rust implementation. > I went through the list of issues and I could not see any work on this. Is > there any blocker? > > The specification is a bit > [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] > or even > [non-existant|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding], > so I am not sure how to implement it myself. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-7297) [C++] Add value accessor in sparse tensor class
[ https://issues.apache.org/jira/browse/ARROW-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc reassigned ARROW-7297: - Assignee: Rok Mihevc > [C++] Add value accessor in sparse tensor class > --- > > Key: ARROW-7297 > URL: https://issues.apache.org/jira/browse/ARROW-7297 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Kenta Murata >Assignee: Rok Mihevc >Priority: Major > > {{SparseTensor}} can have value accessor like {{Tensor::Value}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
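A value accessor for a sparse tensor, analogous to the `Tensor::Value` the ticket mentions, could look roughly like the COO sketch below. The struct and method are hypothetical stand-ins in plain Rust, not the Arrow C++ `SparseTensor` API.

```rust
// COO layout: each non-zero element is a (row, col) coordinate plus a value.
// A value accessor searches the coordinates and returns 0 for absent entries.
struct SparseCooTensor {
    indices: Vec<[usize; 2]>, // (row, col) of each non-zero element
    data: Vec<f64>,           // value of each non-zero element
    shape: [usize; 2],
}

impl SparseCooTensor {
    fn value(&self, row: usize, col: usize) -> f64 {
        assert!(row < self.shape[0] && col < self.shape[1], "index out of bounds");
        self.indices
            .iter()
            .position(|&[r, c]| r == row && c == col)
            .map_or(0.0, |i| self.data[i])
    }
}

fn main() {
    let t = SparseCooTensor {
        indices: vec![[0, 1], [2, 0]],
        data: vec![5.0, 7.0],
        shape: [3, 2],
    };
    assert_eq!(t.value(0, 1), 5.0); // stored element
    assert_eq!(t.value(1, 1), 0.0); // implicit zero
    println!("{} {}", t.value(0, 1), t.value(1, 1));
}
```

A CSR/CSC variant would replace the linear coordinate search with an index-pointer lookup per row or column.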
[jira] [Commented] (ARROW-8577) [GLib][Plasma] gplasma_client_options_new() default settings are enabling a check for CUDA device
[ https://issues.apache.org/jira/browse/ARROW-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091523#comment-17091523 ] Tanveer commented on ARROW-8577: Hi Kouhei, This is the program. I am taking a RecordBatch (batch_genomics) as input in this function. The error arises at: gPlasmaClient = gplasma_client_new("/tmp/store0", gplasmaClient_options, &error); {code:java}
guint8 id_arr[20];
genRandom(id_arr, 20);
char objID_file[] = "/home/tahmad/lib/core/objID.txt";
g_print("obj_id: %s\n", id_arr);

gboolean success = TRUE;
GError *error = NULL;
GPlasmaClient *gPlasmaClient;
GPlasmaObjectID *object_id;
GPlasmaClientCreateOptions *create_options;
GPlasmaClientOptions *gplasmaClient_options;
GPlasmaCreatedObject *Object;
GPlasmaReferredObject *refObject;
GArrowBuffer *arrowBuffer;

arrowBuffer = GSerializeRecordBatch(batch_genomics);
gint32 size = garrow_buffer_get_size(arrowBuffer);

gplasmaClient_options = gplasma_client_options_new();
gPlasmaClient = gplasma_client_new("/tmp/store0", gplasmaClient_options, &error);
object_id = gplasma_object_id_new(id_arr, 20, &error);
create_options = gplasma_client_create_options_new();
{
  guint8 metadata[] = "metadata";
  gplasma_client_create_options_set_metadata(create_options, (const guint8 *)metadata, sizeof(metadata));
}
Object = gplasma_client_create(gPlasmaClient, object_id, size, create_options, &error);
g_object_unref(create_options);
{
  GArrowBuffer *data;
  guint8 dataW[] = "data";
  g_object_get(Object, "data", &data, NULL);
  garrow_mutable_buffer_set_data(GARROW_MUTABLE_BUFFER(data), 0, garrow_buffer_get_databytes(arrowBuffer), size, &error);
  g_object_unref(data);
}
gplasma_created_object_seal(Object, &error);
g_object_unref(Object);
gplasma_client_disconnect(gPlasmaClient, &error);
g_object_unref(gPlasmaClient);{code} > [GLib][Plasma] gplasma_client_options_new() default settings are enabling a > check for CUDA device > - > > Key: ARROW-8577 > URL: https://issues.apache.org/jira/browse/ARROW-8577 > Project: Apache Arrow > Issue Type: Bug > Components: GLib 
>Reporter: Tanveer >Assignee: Kouhei Sutou >Priority: Major > > Hi all, > Previously, I was using c_glib Plasma library (build 0.12) for creating > plasma objects. It was working as expected. But now I want to use Arrow's > newest build. I incurred the following error: > > /build/apache-arrow-0.17.0/cpp/src/arrow/result.cc:28: ValueOrDie called on > an error: IOError: Cuda error 100 in function 'cuInit': > [CUDA_ERROR_NO_DEVICE] no CUDA-capable device is detected > I think plasma client options (gplasma_client_options_new()) which I am using > with default settings are enabling a check for my CUDA device and I have no > CUDA device attached to my system. How I can disable this check? Any help > will be highly appreciated. Thanks -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8586) Failed to Install arrow From CRAN
[ https://issues.apache.org/jira/browse/ARROW-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hei updated ARROW-8586: --- Description: Hi, I am trying to install arrow via RStudio, but it seems like it is not working that after I installed the package, it kept asking me to run arrow::install_arrow() even after I did: {code} > install.packages("arrow") Installing package into ‘/home/hc/R/x86_64-redhat-linux-gnu-library/3.6’ (as ‘lib’ is unspecified) trying URL 'https://cran.rstudio.com/src/contrib/arrow_0.17.0.tar.gz' Content type 'application/x-gzip' length 242534 bytes (236 KB) == downloaded 236 KB * installing *source* package ‘arrow’ ... ** package ‘arrow’ successfully unpacked and MD5 sums checked ** using staged installation *** Successfully retrieved C++ source *** Building C++ libraries cmake arrow ./configure: line 132: cd: libarrow/arrow-0.17.0/lib: Not a directory - NOTE --- After installation, please run arrow::install_arrow() for help installing required runtime libraries - ** libs g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -c array.cpp -o array.o g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -c array_from_vector.cpp -o array_from_vector.o g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -c array_to_vector.cpp 
[jira] [Comment Edited] (ARROW-5949) [Rust] Implement DictionaryArray
[ https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091512#comment-17091512 ] Mahmut Bulut edited comment on ARROW-5949 at 4/24/20, 12:22 PM: Hi, I've just seen this. Is there any reason why we provide a custom iterator over keys? (Which is basically resolving into Option) Can we use 0 as a null identifier? Reason 1: Iterating over an Iterator of Option values will take time, and rebuilding a lookup structure from it takes double the time. Reason 2: We can't use SIMD for parallel comparison. was (Author: vertexclique): Hi, I've just seen this. Is there any reason why we provide custom iterator over keys? (Which is basically resolving into Option) Can we use 0 as a null identifier? > [Rust] Implement DictionaryArray > > > Key: ARROW-5949 > URL: https://issues.apache.org/jira/browse/ARROW-5949 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: David Atienza >Assignee: David Atienza >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 18h > Remaining Estimate: 0h > > I am pretty new to the codebase, but I have seen that DictionaryArray is not > implemented in the Rust implementation. > I went through the list of issues and I could not see any work on this. Is > there any blocker? > > The specification is a bit > [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] > or even > [non-existant|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding], > so I am not sure how to implement it myself. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8586) Failed to Install arrow From CRAN
Hei created ARROW-8586: -- Summary: Failed to Install arrow From CRAN Key: ARROW-8586 URL: https://issues.apache.org/jira/browse/ARROW-8586 Project: Apache Arrow Issue Type: Bug Components: R Affects Versions: 0.17.0 Environment: CentOS 7 Reporter: Hei Hi, I am trying to install arrow via RStudio, but it does not seem to be working: after I installed the package, it kept asking me to run arrow::install_arrow(), even after I did:
{code}
> install.packages("arrow")
Installing package into ‘/home/hc/R/x86_64-redhat-linux-gnu-library/3.6’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/src/contrib/arrow_0.17.0.tar.gz'
Content type 'application/x-gzip' length 242534 bytes (236 KB)
==
downloaded 236 KB

* installing *source* package ‘arrow’ ...
** package ‘arrow’ successfully unpacked and MD5 sums checked
** using staged installation
*** Successfully retrieved C++ source
*** Building C++ libraries
 cmake arrow
./configure: line 132: cd: libarrow/arrow-0.17.0/lib: Not a directory
- NOTE ---
After installation, please run arrow::install_arrow()
for help installing required runtime libraries
-
** libs
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -c array.cpp -o array.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -c array_from_vector.cpp -o array_from_vector.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -c array_to_vector.cpp -o array_to_vector.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -c arraydata.cpp -o arraydata.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -c arrowExports.cpp -o arrowExports.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -c buffer.cpp -o buffer.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -c chunkedarray.cpp -o chunkedarray.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -c compression.cpp -o compression.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -c compute.cpp -o compute.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -c csv.cpp -o csv.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
[jira] [Commented] (ARROW-5949) [Rust] Implement DictionaryArray
[ https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091512#comment-17091512 ] Mahmut Bulut commented on ARROW-5949: - Hi, I've just seen this. Is there any reason why we provide custom iterator over keys? Which is basically resolving into Option or None? Can we use 0 as a null identifier? > [Rust] Implement DictionaryArray > > > Key: ARROW-5949 > URL: https://issues.apache.org/jira/browse/ARROW-5949 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: David Atienza >Assignee: David Atienza >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 18h > Remaining Estimate: 0h > > I am pretty new to the codebase, but I have seen that DictionaryArray is not > implemented in the Rust implementation. > I went through the list of issues and I could not see any work on this. Is > there any blocker? > > The specification is a bit > [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] > or even > [non-existant|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding], > so I am not sure how to implement it myself. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-5949) [Rust] Implement DictionaryArray
[ https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091512#comment-17091512 ] Mahmut Bulut edited comment on ARROW-5949 at 4/24/20, 12:20 PM: Hi, I've just seen this. Is there any reason why we provide custom iterator over keys? (Which is basically resolving into Option) Can we use 0 as a null identifier? was (Author: vertexclique): Hi, I've just seen this. Is there any reason why we provide custom iterator over keys? Which is basically resolving into Option or None? Can we use 0 as a null identifier? > [Rust] Implement DictionaryArray > > > Key: ARROW-5949 > URL: https://issues.apache.org/jira/browse/ARROW-5949 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: David Atienza >Assignee: David Atienza >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 18h > Remaining Estimate: 0h > > I am pretty new to the codebase, but I have seen that DictionaryArray is not > implemented in the Rust implementation. > I went through the list of issues and I could not see any work on this. Is > there any blocker? > > The specification is a bit > [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] > or even > [non-existant|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding], > so I am not sure how to implement it myself. -- This message was sent by Atlassian Jira (v8.3.4#803005)
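The trade-off raised in the ARROW-5949 comments above (a key iterator that yields Option values versus reserving key 0 as a null sentinel) can be sketched with plain lists. This is an illustrative model of dictionary encoding, not the Rust crate's actual API; all names below are invented:

```python
# A dictionary-encoded column stores a small dictionary of distinct values
# plus one key per row.
dictionary = ["red", "green", "blue"]

# Design 1: keys are Option-like; None marks a null row.
keys_option = [0, None, 2, 1]

# Design 2: reserve key 0 as the null sentinel; real values start at key 1.
dictionary_sentinel = [None, "red", "green", "blue"]
keys_sentinel = [1, 0, 3, 2]

def decode_option(dictionary, keys):
    # Each key must be unwrapped before the dictionary lookup.
    return [None if k is None else dictionary[k] for k in keys]

def decode_sentinel(dictionary, keys):
    # Key 0 always maps to the reserved null slot, so no unwrapping is needed.
    return [dictionary[k] for k in keys]

# Both designs decode to the same logical column.
assert decode_option(dictionary, keys_option) == \
       decode_sentinel(dictionary_sentinel, keys_sentinel)
```

With the sentinel design the key buffer stays a dense sequence of plain integers, which is what makes the bulk (SIMD-style) comparisons mentioned in the comment feasible; the cost is sacrificing one dictionary slot.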
[jira] [Commented] (ARROW-8578) [C++][Flight] Test executable failures due to "SO_REUSEPORT unavailable on compiling system"
[ https://issues.apache.org/jira/browse/ARROW-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091502#comment-17091502 ] David Li commented on ARROW-8578: - As Antoine mentioned, it's just a red herring (has to do with where gRPC was built). Unfortunately gRPC isn't so good about surfacing issues to the application; running with {{env GRPC_VERBOSITY=DEBUG}} and if needed {{GRPC_TRACE=all}} will give more information. > [C++][Flight] Test executable failures due to "SO_REUSEPORT unavailable on > compiling system" > > > Key: ARROW-8578 > URL: https://issues.apache.org/jira/browse/ARROW-8578 > Project: Apache Arrow > Issue Type: Bug > Components: C++, FlightRPC >Reporter: Wes McKinney >Priority: Major > Fix For: 1.0.0 > > > Tried compiling and running this today (with grpc 1.28.1) > {code} > $ release/arrow-flight-benchmark > Using standalone server: false > Server running with pid 22385 > Testing method: DoGet > Server host: localhost > Server port: 31337 > E0423 21:54:15.174285695 22385 socket_utils_common_posix.cc:222] check for > SO_REUSEPORT: {"created":"@1587696855.174280083","description":"SO_REUSEPORT > unavailable on compiling > system","file":"../src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":190} > Server host: localhost > {code} > my Linux kernel > {code} > $ uname -a > Linux 4.15.0-1079-oem #89-Ubuntu SMP Fri Mar 27 05:22:11 UTC 2020 x86_64 > x86_64 x86_64 GNU/Linux > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8585) [Packaging][Python] Windows wheels fail to build because of link error
Krisztian Szucs created ARROW-8585: -- Summary: [Packaging][Python] Windows wheels fail to build because of link error Key: ARROW-8585 URL: https://issues.apache.org/jira/browse/ARROW-8585 Project: Apache Arrow Issue Type: Bug Components: Packaging, Python Reporter: Krisztian Szucs Fix For: 1.0.0 See build log https://ci.appveyor.com/project/Ursa-Labs/crossbow/builds/32406283#L1088 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8584) [Packaging][C++] Protobuf link error in deb builds
[ https://issues.apache.org/jira/browse/ARROW-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-8584: --- Description: See build log Stretch: https://github.com/ursa-labs/crossbow/runs/614358553 Focal: https://github.com/ursa-labs/crossbow/runs/614358637 cc @kou was: See build log https://github.com/ursa-labs/crossbow/runs/614358553 cc @kou > [Packaging][C++] Protobuf link error in deb builds > -- > > Key: ARROW-8584 > URL: https://issues.apache.org/jira/browse/ARROW-8584 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Packaging >Reporter: Krisztian Szucs >Priority: Major > Fix For: 1.0.0 > > > See build log > Stretch: https://github.com/ursa-labs/crossbow/runs/614358553 > Focal: https://github.com/ursa-labs/crossbow/runs/614358637 > cc @kou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8584) [Packaging][C++] Protobuf link error in deb builds
[ https://issues.apache.org/jira/browse/ARROW-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-8584: --- Summary: [Packaging][C++] Protobuf link error in deb builds (was: [Packaging][C++] Protobuf link error in debian-stretch build) > [Packaging][C++] Protobuf link error in deb builds > -- > > Key: ARROW-8584 > URL: https://issues.apache.org/jira/browse/ARROW-8584 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Packaging >Reporter: Krisztian Szucs >Priority: Major > Fix For: 1.0.0 > > > See build log https://github.com/ursa-labs/crossbow/runs/614358553 > cc @kou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8584) [Packaging][C++] Protobuf link error in debian-stretch build
Krisztian Szucs created ARROW-8584: -- Summary: [Packaging][C++] Protobuf link error in debian-stretch build Key: ARROW-8584 URL: https://issues.apache.org/jira/browse/ARROW-8584 Project: Apache Arrow Issue Type: Bug Components: C++, Packaging Reporter: Krisztian Szucs Fix For: 1.0.0 See build log https://github.com/ursa-labs/crossbow/runs/614358553 cc @kou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8583) [C++][Doc] Undocumented parameter in Dataset namespace
Krisztian Szucs created ARROW-8583: -- Summary: [C++][Doc] Undocumented parameter in Dataset namespace Key: ARROW-8583 URL: https://issues.apache.org/jira/browse/ARROW-8583 Project: Apache Arrow Issue Type: Bug Components: C++, Documentation Reporter: Krisztian Szucs Fix For: 1.0.0 See build log: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-24-0-circle-test-ubuntu-18.04-docs We should build the doxygen docs on each commit, preferably in the conda-cpp build. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8582) [Packaging][Python] macOS wheels occasionally exceed travis build time limit
Krisztian Szucs created ARROW-8582: -- Summary: [Packaging][Python] macOS wheels occasionally exceed travis build time limit Key: ARROW-8582 URL: https://issues.apache.org/jira/browse/ARROW-8582 Project: Apache Arrow Issue Type: Bug Components: Packaging, Python Reporter: Krisztian Szucs Either reduce the build time or port to another CI provider. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8578) [C++][Flight] Test executable failures due to "SO_REUSEPORT unavailable on compiling system"
[ https://issues.apache.org/jira/browse/ARROW-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091401#comment-17091401 ] Antoine Pitrou commented on ARROW-8578: --- The SO_REUSEPORT message is just a notice from gRPC, not an actual error. > [C++][Flight] Test executable failures due to "SO_REUSEPORT unavailable on > compiling system" > > > Key: ARROW-8578 > URL: https://issues.apache.org/jira/browse/ARROW-8578 > Project: Apache Arrow > Issue Type: Bug > Components: C++, FlightRPC >Reporter: Wes McKinney >Priority: Major > Fix For: 1.0.0 > > > Tried compiling and running this today (with grpc 1.28.1) > {code} > $ release/arrow-flight-benchmark > Using standalone server: false > Server running with pid 22385 > Testing method: DoGet > Server host: localhost > Server port: 31337 > E0423 21:54:15.174285695 22385 socket_utils_common_posix.cc:222] check for > SO_REUSEPORT: {"created":"@1587696855.174280083","description":"SO_REUSEPORT > unavailable on compiling > system","file":"../src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":190} > Server host: localhost > {code} > my Linux kernel > {code} > $ uname -a > Linux 4.15.0-1079-oem #89-Ubuntu SMP Fri Mar 27 05:22:11 UTC 2020 x86_64 > x86_64 x86_64 GNU/Linux > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7808) [Java][Dataset] Implement Datasets Java API
[ https://issues.apache.org/jira/browse/ARROW-7808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-7808: -- Labels: dataset pull-request-available (was: dataset) > [Java][Dataset] Implement Datasets Java API > > > Key: ARROW-7808 > URL: https://issues.apache.org/jira/browse/ARROW-7808 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Java >Reporter: Hongze Zhang >Priority: Major > Labels: dataset, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Porting following C++ Datasets APIs to Java: > * DataSource > * DataSourceDiscovery > * DataFragment > * Dataset > * Scanner > * ScanTask > * ScanOptions -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8581) [C#] Date32/64Array.Builder should accept DateTime, not DateTimeOffset
[ https://issues.apache.org/jira/browse/ARROW-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Szmigin updated ARROW-8581: Description: h1. Summary Proposal The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values of type {{DateTimeOffset}}, but this makes it very easy for the user to introduce subtle bugs when they work with the {{DateTime}} type in their own code. This class of bugs could be avoided if these builders were instead typed on {{DateTime}} rather than {{DateTimeOffset}}. h1. Details The danger is introduced by the implicit widening conversion provided by the _DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator: [https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1] The important part is this text: {quote}The offset of the resulting DateTimeOffset object depends on the value of the DateTime.Kind property of the dateTime parameter: * If the value of the DateTime.Kind property is DateTimeKind.Local or DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is set equal to dateTime, and its Offset property *is set equal to the offset of the local system's current time zone*.{quote} (Emphasis mine) If the user is operating in an environment with a positive GMT offset, it is very easy to write the wrong date to the builder: {code:c#} Console.WriteLine(TimeZoneInfo.Local.GetUtcOffset(DateTime.UtcNow)); // Bug triggers if > 00:00:00 var builder = new Date32Array.Builder(); builder.Append(new DateTime(2020, 4, 24)); // Kind == DateTimeKind.Unspecified var allocator = new NativeMemoryAllocator(); Console.WriteLine(builder.Build(allocator).GetDate(0)); // Prints 2020-04-23! {code} Assume that the user is in the UK (as I am), where the GMT offset on the above date is 1 hour ahead. This means that the conversion to {{DateTimeOffset}} will actually result in a value of {{2020-04-23T23:00:00+01:00}} being passed to the {{Append()}} method. 
Arrow then calls {{ToUnixTimeMilliseconds()}}, which [only considers the date portion|https://referencesource.microsoft.com/#mscorlib/system/datetimeoffset.cs,8f33340c07c4787e] of its object, not the time portion or offset. This means that the number of days gets calculated based on 2020-04-23, not 2020-04-24 as the user thought they were specifying. If the user chooses to use NodaTime as a "better" date and time-handling library, they will still likely run into the bug if they do the obvious thing: {code:c#} Console.WriteLine(TimeZoneInfo.Local.GetUtcOffset(DateTime.UtcNow)); // Bug triggers if > 00:00:00 var builder = new Date32Array.Builder(); var ld = new NodaTime.LocalDate(2020, 4, 24); builder.Append(ld.ToDateTimeUnspecified()); // Kind == DateTimeKind.Unspecified var allocator = new NativeMemoryAllocator(); Console.WriteLine(builder.Build(allocator).GetDate(0)); // Prints 2020-04-23! {code} h1. Suggested Improvement * Change {{Date32Array.Builder}} and {{Date64Array.Builder}} to specify a {{TFrom}} parameter of {{DateTime}}, not {{DateTimeOffset}} (breaking change). * Change {{Date32Array.GetDate()}} and {{Date64Array.GetDate()}} to return a {{DateTime}}, not {{DateTimeOffset}} (also a breaking change). The conversion method for a {{Date32Array}} would then look a bit like this: {code:java} private static readonly DateTime Epoch = new DateTime(1970, 1, 1); protected override int ConvertTo(DateTime value) { return (int)(value - Epoch).TotalDays; } {code} was: h1. Summary Proposal The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values of type {{DateTimeOffset}}, but this makes it very easy for the user to introduce subtle bugs when they work with the {{DateTime}} type in their own code. This class of bugs could be avoided if these builders were instead typed on {{DateTime}} rather than {{DateTimeOffset}}. h1. 
Details The danger is introduced by the implicit widening conversion provided by the _DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator: [https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1] The important part is this text: {quote}The offset of the resulting DateTimeOffset object depends on the value of the DateTime.Kind property of the dateTime parameter: * If the value of the DateTime.Kind property is DateTimeKind.Local or DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is set equal to dateTime, and its Offset property *is set equal to the offset of the local system's current time zone*.{quote} (Emphasis mine) If the user is operating in an environment with a positive GMT offset, it is very easy to write the wrong date to the builder: {code:c#} Console.WriteLine(TimeZoneInfo.Local.GetUtcOffset(DateTime.UtcNow)); // Bug triggers if > 00:00:00 var builder = new Date32Array.Builder(); builder.Append(new DateTime(2020, 4, 24));
[jira] [Updated] (ARROW-8581) [C#] Date32/64Array.Builder should accept DateTime, not DateTimeOffset
[ https://issues.apache.org/jira/browse/ARROW-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Szmigin updated ARROW-8581: Description: h1. Summary Proposal The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values of type {{DateTimeOffset}}, but this makes it very easy for the user to introduce subtle bugs when they work with the {{DateTime}} type in their own code. This class of bugs could be avoided if these builders were instead typed on {{DateTime}} rather than {{DateTimeOffset}}. h1. Details The danger is introduced by the implicit widening conversion provided by the _DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator: [https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1] The important part is this text: {quote}The offset of the resulting DateTimeOffset object depends on the value of the DateTime.Kind property of the dateTime parameter: * If the value of the DateTime.Kind property is DateTimeKind.Local or DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is set equal to dateTime, and its Offset property *is set equal to the offset of the local system's current time zone*.{quote} (Emphasis mine) If the user is operating in an environment with a positive GMT offset, it is very easy to write the wrong date to the builder: {code:c#} Console.WriteLine(TimeZoneInfo.Local.GetUtcOffset(DateTime.UtcNow)); // Bug triggers if > 00:00:00 var builder = new Date32Array.Builder(); builder.Append(new DateTime(2020, 4, 24)); // Kind == DateTimeKind.Unspecified var allocator = new NativeMemoryAllocator(); Console.WriteLine(builder.Build(allocator).GetDate(0)); // Prints 2020-04-23! {code} Assume that the user is in the UK (as I am), where the GMT offset on the above date is 1 hour ahead. This means that the conversion to {{DateTimeOffset}} will actually result in a value of {{2020-04-23T23:00:00+01:00}} being passed to the {{Append()}} method. 
Arrow then calls {{ToUnixTimeMilliseconds()}}, which [only considers the date portion|https://referencesource.microsoft.com/#mscorlib/system/datetimeoffset.cs,8f33340c07c4787e] of its object, not the time portion or offset. This means that the number of days gets calculated based on 2020-04-23, not 2020-04-24 as the user thought they were specifying. If the user chooses to use NodaTime as a "better" date and time-handling library, they will still likely run into the bug if they do the obvious thing: {code:c#} Console.WriteLine(TimeZoneInfo.Local.GetUtcOffset(DateTime.UtcNow)); // Bug triggers if > 00:00:00 var builder = new Date32Array.Builder(); var ld = new NodaTime.LocalDate(2020, 4, 24); builder.Append(ld.ToDateTimeUnspecified()); // Kind == DateTimeKind.Unspecified var allocator = new NativeMemoryAllocator(); Console.WriteLine(builder.Build(allocator).GetDate(0)); // Prints 2020-04-23! {code} h1. Suggested Improvement * Change {{Date32Array.Builder}} and {{Date64Array.Builder}} to specify a {{TFrom}} parameter of {{DateTime}}, not {{DateTimeOffset}} (breaking change). * Change {{Date32Array.GetDate()}} and {{Date64Array.GetDate()}} to return a {{DateTime}}, not {{DateTimeOffset}} (also a breaking change). The conversion method for a {{Date32Array}} would then look a bit like this: {code:c#} private static readonly DateTime Epoch = new DateTime(1970, 1, 1); protected override int ConvertTo(DateTime value) { return (int)(value - Epoch).TotalDays; } {code} was: h1. Summary Proposal The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values of type {{DateTimeOffset}}, but this makes it very easy for the user to introduce subtle bugs when they work with the {{DateTime}} type in their own code. This class of bugs could be avoided if these builders were instead typed on {{DateTime}} rather than {{DateTimeOffset}}. h1. 
Details The danger is introduced by the implicit widening conversion provided by the _DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator: [https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1] The important part is this text: {quote}The offset of the resulting DateTimeOffset object depends on the value of the DateTime.Kind property of the dateTime parameter: * If the value of the DateTime.Kind property is DateTimeKind.Local or DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is set equal to dateTime, and its Offset property *is set equal to the offset of the local system's current time zone*.{quote} (Emphasis mine) If the user is operating in an environment with a positive GMT offset, it is very easy to write the wrong date to the builder: {code:c#} Console.WriteLine(TimeZoneInfo.Local.GetUtcOffset(DateTime.UtcNow)); // Bug triggers if > 00:00:00 var builder = new Date32Array.Builder(); builder.Append(new DateTime(2020, 4, 24));
[jira] [Updated] (ARROW-8581) [C#] Date32/64Array.Builder should accept DateTime, not DateTimeOffset
[ https://issues.apache.org/jira/browse/ARROW-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Szmigin updated ARROW-8581: Description: h1. Summary Proposal The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values of type {{DateTimeOffset}}, but this makes it very easy for the user to introduce subtle bugs when they work with the {{DateTime}} type in their own code. This class of bugs could be avoided if these builders were instead typed on {{DateTime}} rather than {{DateTimeOffset}}. h1. Details The danger is introduced by the implicit widening conversion provided by the _DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator: [https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1] The important part is this text: {quote}The offset of the resulting DateTimeOffset object depends on the value of the DateTime.Kind property of the dateTime parameter: * If the value of the DateTime.Kind property is DateTimeKind.Local or DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is set equal to dateTime, and its Offset property *is set equal to the offset of the local system's current time zone*.{quote} (Emphasis mine) If the user is operating in an environment with a positive GMT offset, it is very easy to write the wrong date to the builder: {code:c#} Console.WriteLine(TimeZoneInfo.Local.GetUtcOffset(DateTime.UtcNow)); // Bug triggers if > 00:00:00 var builder = new Date32Array.Builder(); builder.Append(new DateTime(2020, 4, 24)); // Kind == DateTimeKind.Unspecified var allocator = new NativeMemoryAllocator(); Console.WriteLine(builder.Build(allocator).GetDate(0)); // Prints 2020-04-23! {code} Assume that the user is in the UK (as I am), where the GMT offset on the above date is 1 hour ahead. This means that the conversion to {{DateTimeOffset}} will actually result in a value of {{2020-04-23T23:00:00+01:00}} being passed to the {{Append()}} method. 
Arrow then calls {{ToUnixTimeMilliseconds()}}, which [only considers the date portion|https://referencesource.microsoft.com/#mscorlib/system/datetimeoffset.cs,8f33340c07c4787e] of its object, not the time portion or offset. This means that the number of days gets calculated based on 2020-04-23, not 2020-04-24 as the user thought they were specifying. If the user chooses to use NodaTime as a "better" date and time-handling library, they will still likely run into the bug if they do the obvious thing: {code:c#} Console.WriteLine(TimeZoneInfo.Local.GetUtcOffset(DateTime.UtcNow)); // Bug triggers if > 00:00:00 var builder = new Date32Array.Builder(); var ld = new NodaTime.LocalDate(2020, 4, 24); builder.Append(ld.ToDateTimeUnspecified()); // Kind == DateTimeKind.Unspecified var allocator = new NativeMemoryAllocator(); Console.WriteLine(builder.Build(allocator).GetDate(0)); // Prints 2020-04-23! {code} h1. Suggested Improvement * Change {{Date32Array.Builder}} and {{Date64Array.Builder}} to specify a {{TFrom}} parameter of {{DateTime}}, not {{DateTimeOffset}} (breaking change). * Change {{Date32Array.GetDate()}} and {{Date64Array.GetDate()}} to return a {{DateTime}}, not {{DateTimeOffset}} (also a breaking change). was: h1. Summary Proposal The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values of type {{DateTimeOffset}}, but this makes it very easy for the user to introduce subtle bugs when they work with the {{DateTime}} type in their own code. This class of bugs could be avoided if these builders were instead typed on {{DateTime}} rather than {{DateTimeOffset}}. h1. 
Details The danger is introduced by the implicit widening conversion provided by the _DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator: [https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1] The important part is this text: {quote}The offset of the resulting DateTimeOffset object depends on the value of the DateTime.Kind property of the dateTime parameter: * If the value of the DateTime.Kind property is DateTimeKind.Local or DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is set equal to dateTime, and its Offset property *is set equal to the offset of the local system's current time zone*.{quote} (Emphasis mine) If the user is operating in an environment with a positive GMT offset, it is very easy to write the wrong date to the builder: {code:c#} var builder = new Date32Array.Builder(); builder.Append(new DateTime(2020, 4, 24)); // Kind == DateTimeKind.Unspecified: triggers the bug {code} Assume that the user is in the UK (as I am), where the GMT offset on the above date is 1 hour ahead. This means that the conversion to {{DateTimeOffset}} will actually result in a value of {{2020-04-23T23:00:00+01:00}} being passed to the {{Append()}} method. Arrow then calls
[jira] [Updated] (ARROW-8581) [C#] Date32/64Array.Builder should accept DateTime, not DateTimeOffset
[ https://issues.apache.org/jira/browse/ARROW-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Szmigin updated ARROW-8581: Environment: (was: Windows 10 x64) > [C#] Date32/64Array.Builder should accept DateTime, not DateTimeOffset > -- > > Key: ARROW-8581 > URL: https://issues.apache.org/jira/browse/ARROW-8581 > Project: Apache Arrow > Issue Type: Improvement > Components: C# >Affects Versions: 0.17.0 >Reporter: Adam Szmigin >Priority: Major > > h1. Summary Proposal > The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values > of type {{DateTimeOffset}}, but this makes it very easy for the user to > introduce subtle bugs when they work with the {{DateTime}} type in their own > code. This class of bugs could be avoided if these builders were instead > typed on {{DateTime}} rather than {{DateTimeOffset}}. > h1. Details > The danger is introduced by the implicit widening conversion provided by the > _DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator: > > [https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1] > The important part is this text: > {quote}The offset of the resulting DateTimeOffset object depends on the value > of the DateTime.Kind property of the dateTime parameter: > * If the value of the DateTime.Kind property is DateTimeKind.Local or > DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is > set equal to dateTime, and its Offset property *is set equal to the offset of > the local system's current time zone*.{quote} > (Emphasis mine) > If the user is operating in an environment with a positive GMT offset, it is > very easy to write the wrong date to the builder: > {code:c#} > var builder = new Date32Array.Builder(); > builder.Append(new DateTime(2020, 4, 24)); // Kind == > DateTimeKind.Unspecified: triggers the bug > {code} > Assume that the user is in the UK (as I am), where the GMT offset on the > above date is 1 hour ahead. 
This means that the conversion to > {{DateTimeOffset}} will actually result in a value of > {{2020-04-23T23:00:00+01:00}} being passed to the {{Append()}} method. Arrow > then calls {{ToUnixTimeMilliseconds()}}, which [only considers the date > portion|https://referencesource.microsoft.com/#mscorlib/system/datetimeoffset.cs,8f33340c07c4787e] > of its object, not the time portion or offset. This means that the number > of days gets calculated based on 2020-04-23, not 2020-04-24 as the user > thought they were specifying. > If the user chooses to use NodaTime as a "better" date and time-handling > library, they will still likely run into the bug if they do the obvious thing: > {code:c#} > var builder = new Date32Array.Builder(); > var ld = new NodaTime.LocalDate(2020, 4, 24); > builder.Append(ld.ToDateTimeUnspecified()); // Kind == > DateTimeKind.Unspecified: also triggers the bug > {code} > h1. Suggested Improvement > * Change {{Date32Array.Builder}} and {{Date64Array.Builder}} to specify a > {{TFrom}} parameter of {{DateTime}}, not {{DateTimeOffset}} (breaking change). > * Change {{Date32Array.GetDate()}} and {{Date64Array.GetDate()}} to return a > {{DateTime}}, not {{DateTimeOffset}} (also a breaking change). -- This message was sent by Atlassian Jira (v8.3.4#803005)
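The arithmetic behind the off-by-one is easy to reproduce outside .NET. A minimal Python sketch (assuming a fixed +01:00 offset as a stand-in for the UK zone in the report) shows how attaching a local offset to midnight and then truncating the resulting instant to whole days, as a Date32 value does, lands on the previous calendar date:

```python
from datetime import datetime, timedelta, timezone

# Midnight on the date the user intended to store, with the local +01:00
# offset that the implicit DateTime -> DateTimeOffset conversion attaches.
local_midnight = datetime(2020, 4, 24, tzinfo=timezone(timedelta(hours=1)))

# Equivalent of DateTimeOffset.ToUnixTimeMilliseconds(): the UTC instant,
# which is 2020-04-23T23:00:00Z here.
unix_ms = int(local_midnight.timestamp() * 1000)

# Date32 stores whole days since the Unix epoch, so the instant truncates
# to the UTC calendar date.
epoch_days = unix_ms // 86_400_000
stored_date = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(days=epoch_days)

print(stored_date.date())  # 2020-04-23, one day earlier than intended
```

The same truncation is harmless for inputs already at UTC midnight, which is why the bug only appears for non-zero (positive) local offsets.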
[jira] [Updated] (ARROW-8581) [C#] Date32/64Array.Builder should accept DateTime, not DateTimeOffset
[ https://issues.apache.org/jira/browse/ARROW-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Szmigin updated ARROW-8581: Issue Type: Improvement (was: Bug) > [C#] Date32/64Array.Builder should accept DateTime, not DateTimeOffset > -- > > Key: ARROW-8581 > URL: https://issues.apache.org/jira/browse/ARROW-8581 > Project: Apache Arrow > Issue Type: Improvement > Components: C# >Affects Versions: 0.17.0 > Environment: Windows 10 x64 >Reporter: Adam Szmigin >Priority: Major > > h1. Summary Proposal > The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values > of type {{DateTimeOffset}}, but this makes it very easy for the user to > introduce subtle bugs when they work with the {{DateTime}} type in their own > code. This class of bugs could be avoided if these builders were instead > typed on {{DateTime}} rather than {{DateTimeOffset}}. > h1. Details > The danger is introduced by the implicit widening conversion provided by the > _DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator: > > [https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1] > The important part is this text: > {quote}The offset of the resulting DateTimeOffset object depends on the value > of the DateTime.Kind property of the dateTime parameter: > * If the value of the DateTime.Kind property is DateTimeKind.Local or > DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is > set equal to dateTime, and its Offset property *is set equal to the offset of > the local system's current time zone*.{quote} > (Emphasis mine) > If the user is operating in an environment with a positive GMT offset, it is > very easy to write the wrong date to the builder: > {code:c#} > var builder = new Date32Array.Builder(); > builder.Append(new DateTime(2020, 4, 24)); // Kind == > DateTimeKind.Unspecified: triggers the bug > {code} > Assume that the user is in the UK (as I am), where the GMT offset on the > 
above date is 1 hour ahead. This means that the conversion to > {{DateTimeOffset}} will actually result in a value of > {{2020-04-23T23:00:00+01:00}} being passed to the {{Append()}} method. Arrow > then calls {{ToUnixTimeMilliseconds()}}, which [only considers the date > portion|https://referencesource.microsoft.com/#mscorlib/system/datetimeoffset.cs,8f33340c07c4787e] > of its object, not the time portion or offset. This means that the number > of days gets calculated based on 2020-04-23, not 2020-04-24 as the user > thought they were specifying. > If the user chooses to use NodaTime as a "better" date and time-handling > library, they will still likely run into the bug if they do the obvious thing: > {code:c#} > var builder = new Date32Array.Builder(); > var ld = new NodaTime.LocalDate(2020, 4, 24); > builder.Append(ld.ToDateTimeUnspecified()); // Kind == > DateTimeKind.Unspecified: also triggers the bug > {code} > h1. Suggested Improvement > * Change {{Date32Array.Builder}} and {{Date64Array.Builder}} to specify a > {{TFrom}} parameter of {{DateTime}}, not {{DateTimeOffset}} (breaking change). > * Change {{Date32Array.GetDate()}} and {{Date64Array.GetDate()}} to return a > {{DateTime}}, not {{DateTimeOffset}} (also a breaking change). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8581) [C#] Date32/64Array.Builder should accept DateTime, not DateTimeOffset
[ https://issues.apache.org/jira/browse/ARROW-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Szmigin updated ARROW-8581: Summary: [C#] Date32/64Array.Builder should accept DateTime, not DateTimeOffset (was: [C#] Date32/64Array write & read back introduces off-by-one error) > [C#] Date32/64Array.Builder should accept DateTime, not DateTimeOffset > -- > > Key: ARROW-8581 > URL: https://issues.apache.org/jira/browse/ARROW-8581 > Project: Apache Arrow > Issue Type: Bug > Components: C# >Affects Versions: 0.17.0 > Environment: Windows 10 x64 >Reporter: Adam Szmigin >Priority: Major > > h1. Summary Proposal > The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values > of type {{DateTimeOffset}}, but this makes it very easy for the user to > introduce subtle bugs when they work with the {{DateTime}} type in their own > code. This class of bugs could be avoided if these builders were instead > typed on {{DateTime}} rather than {{DateTimeOffset}}. > h1. 
Details > The danger is introduced by the implicit widening conversion provided by the > _DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator: > > [https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1] > The important part is this text: > {quote}The offset of the resulting DateTimeOffset object depends on the value > of the DateTime.Kind property of the dateTime parameter: > * If the value of the DateTime.Kind property is DateTimeKind.Local or > DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is > set equal to dateTime, and its Offset property *is set equal to the offset of > the local system's current time zone*.{quote} > (Emphasis mine) > If the user is operating in an environment with a positive GMT offset, it is > very easy to write the wrong date to the builder: > {code:c#} > var builder = new Date32Array.Builder(); > builder.Append(new DateTime(2020, 4, 24)); // Kind == > DateTimeKind.Unspecified: triggers the bug > {code} > Assume that the user is in the UK (as I am), where the GMT offset on the > above date is 1 hour ahead. This means that the conversion to > {{DateTimeOffset}} will actually result in a value of > {{2020-04-23T23:00:00+01:00}} being passed to the {{Append()}} method. Arrow > then calls {{ToUnixTimeMilliseconds()}}, which [only considers the date > portion|https://referencesource.microsoft.com/#mscorlib/system/datetimeoffset.cs,8f33340c07c4787e] > of its object, not the time portion or offset. This means that the number > of days gets calculated based on 2020-04-23, not 2020-04-24 as the user > thought they were specifying. 
> If the user chooses to use NodaTime as a "better" date and time-handling > library, they will still likely run into the bug if they do the obvious thing: > {code:c#} > var builder = new Date32Array.Builder(); > var ld = new NodaTime.LocalDate(2020, 4, 24); > builder.Append(ld.ToDateTimeUnspecified()); // Kind == > DateTimeKind.Unspecified: also triggers the bug > {code} > h1. Suggested Improvement > * Change {{Date32Array.Builder}} and {{Date64Array.Builder}} to specify a > {{TFrom}} parameter of {{DateTime}}, not {{DateTimeOffset}} (breaking change). > * Change {{Date32Array.GetDate()}} and {{Date64Array.GetDate()}} to return a > {{DateTime}}, not {{DateTimeOffset}} (also a breaking change). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8581) [C#] Date32/64Array write & read back introduces off-by-one error
[ https://issues.apache.org/jira/browse/ARROW-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Szmigin updated ARROW-8581: Description: h1. Summary Proposal The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values of type {{DateTimeOffset}}, but this makes it very easy for the user to introduce subtle bugs when they work with the {{DateTime}} type in their own code. This class of bugs could be avoided if these builders were instead typed on {{DateTime}} rather than {{DateTimeOffset}}. h1. Details The danger is introduced by the implicit widening conversion provided by the _DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator: [https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1] The important part is this text: {quote}The offset of the resulting DateTimeOffset object depends on the value of the DateTime.Kind property of the dateTime parameter: * If the value of the DateTime.Kind property is DateTimeKind.Local or DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is set equal to dateTime, and its Offset property *is set equal to the offset of the local system's current time zone*.{quote} (Emphasis mine) If the user is operating in an environment with a positive GMT offset, it is very easy to write the wrong date to the builder: {code:c#} var builder = new Date32Array.Builder(); builder.Append(new DateTime(2020, 4, 24)); // Kind == DateTimeKind.Unspecified: triggers the bug {code} Assume that the user is in the UK (as I am), where the GMT offset on the above date is 1 hour ahead. This means that the conversion to {{DateTimeOffset}} will actually result in a value of {{2020-04-23T23:00:00+01:00}} being passed to the {{Append()}} method. 
Arrow then calls {{ToUnixTimeMilliseconds()}}, which [only considers the date portion|https://referencesource.microsoft.com/#mscorlib/system/datetimeoffset.cs,8f33340c07c4787e] of its object, not the time portion or offset. This means that the number of days gets calculated based on 2020-04-23, not 2020-04-24 as the user thought they were specifying. If the user chooses to use NodaTime as a "better" date and time-handling library, they will still likely run into the bug if they do the obvious thing:
{code:c#}
var builder = new Date32Array.Builder();
var ld = new NodaTime.LocalDate(2020, 4, 24);
builder.Append(ld.ToDateTimeUnspecified()); // Kind == DateTimeKind.Unspecified: also triggers the bug
{code}
h1. Suggested Improvement
* Change {{Date32Array.Builder}} and {{Date64Array.Builder}} to specify a {{TFrom}} parameter of {{DateTime}}, not {{DateTimeOffset}} (breaking change).
* Change {{Date32Array.GetDate()}} and {{Date64Array.GetDate()}} to return a {{DateTime}}, not {{DateTimeOffset}} (also a breaking change).

was:
h1. Summary
Writing a Date value using either a {{Date32Array.Builder}} or a {{Date64Array.Builder}} and then reading back the result from the built array introduces an off-by-one error in the value. The following minimal code illustrates:
{code:c#}
namespace Date32ArrayReadWriteBug
{
    using Apache.Arrow;
    using Apache.Arrow.Memory;
    using System;

    internal static class Program
    {
        public static void Main(string[] args)
        {
            var allocator = new NativeMemoryAllocator();
            var builder = new Date32Array.Builder();
            var date = new DateTime(2020, 4, 24);
            Console.WriteLine($"Appending date {date:yyyy-MM-dd}");
            builder.Append(date);
            var array = builder.Build(allocator);
            var dateAgain = array.GetDate(0);
            Console.WriteLine($"Read date {dateAgain:yyyy-MM-dd}");
        }
    }
}
{code}
Change {{new Date32Array.Builder()}} to {{new Date64Array.Builder()}} in the above code as appropriate to demonstrate for the other type.
h2. Expected Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-24
{noformat}
h2. Actual Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-23
{noformat}

> [C#] Date32/64Array write & read back introduces off-by-one error
> -
>
> Key: ARROW-8581
> URL: https://issues.apache.org/jira/browse/ARROW-8581
> Project: Apache Arrow
> Issue Type: Bug
> Components: C#
> Affects Versions: 0.17.0
> Environment: Windows 10 x64
> Reporter: Adam Szmigin
> Priority: Major
>
> h1. Summary Proposal
> The {{Date32Array.Builder}} and {{Date64Array.Builder}} classes both accept values of type {{DateTimeOffset}}, but this makes it very easy for the user to introduce subtle bugs when they work with the {{DateTime}} type in their own code. This class of bugs could be avoided if these builders were instead typed on {{DateTime}} rather than {{DateTimeOffset}}.
>
[jira] [Updated] (ARROW-8581) [C#] Date32/64Array write & read back introduces off-by-one error
[ https://issues.apache.org/jira/browse/ARROW-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Szmigin updated ARROW-8581:
Description:
h1. Summary
Writing a Date value using either a {{Date32Array.Builder}} or a {{Date64Array.Builder}} and then reading back the result from the built array introduces an off-by-one error in the value. The following minimal code illustrates:
{code:c#}
namespace Date32ArrayReadWriteBug
{
    using Apache.Arrow;
    using Apache.Arrow.Memory;
    using System;

    internal static class Program
    {
        public static void Main(string[] args)
        {
            var allocator = new NativeMemoryAllocator();
            var builder = new Date32Array.Builder();
            var date = new DateTime(2020, 4, 24);
            Console.WriteLine($"Appending date {date:yyyy-MM-dd}");
            builder.Append(date);
            var array = builder.Build(allocator);
            var dateAgain = array.GetDate(0);
            Console.WriteLine($"Read date {dateAgain:yyyy-MM-dd}");
        }
    }
}
{code}
h2. Expected Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-24
{noformat}
h2. Actual Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-23
{noformat}

was: the same description, with the {{using System;}} line and {{internal static class Program}} run together on one line.

> [C#] Date32/64Array write & read back introduces off-by-one error
> -
>
> Key: ARROW-8581
> URL: https://issues.apache.org/jira/browse/ARROW-8581
> Project: Apache Arrow
> Issue Type: Bug
> Components: C#
> Affects Versions: 0.17.0
> Environment: Windows 10 x64
> Reporter: Adam Szmigin
> Priority: Major
>
> h1. Summary
> Writing a Date value using either a {{Date32Array.Builder}} or a {{Date64Array.Builder}} and then reading back the result from the built array introduces an off-by-one error in the value. The following minimal code illustrates:
> {code:c#}
> namespace Date32ArrayReadWriteBug
> {
>     using Apache.Arrow;
>     using Apache.Arrow.Memory;
>     using System;
>
>     internal static class Program
>     {
>         public static void Main(string[] args)
>         {
>             var allocator = new NativeMemoryAllocator();
>             var builder = new Date32Array.Builder();
>             var date = new DateTime(2020, 4, 24);
>             Console.WriteLine($"Appending date {date:yyyy-MM-dd}");
>             builder.Append(date);
>             var array = builder.Build(allocator);
>             var dateAgain = array.GetDate(0);
>             Console.WriteLine($"Read date {dateAgain:yyyy-MM-dd}");
>         }
>     }
> }
> {code}
> Change {{new Date32Array.Builder()}} to {{new Date64Array.Builder()}} in the above code as appropriate to demonstrate for the other type.
> h2. Expected Output
> {noformat}
> Appending date 2020-04-24
> Read date 2020-04-24
> {noformat}
> h2. Actual Output
> {noformat}
> Appending date 2020-04-24
> Read date 2020-04-23
> {noformat}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8581) [C#] Date32/64Array write & read back introduces off-by-one error
Adam Szmigin created ARROW-8581: --- Summary: [C#] Date32/64Array write & read back introduces off-by-one error Key: ARROW-8581 URL: https://issues.apache.org/jira/browse/ARROW-8581 Project: Apache Arrow Issue Type: Bug Components: C# Affects Versions: 0.17.0 Environment: Windows 10 x64 Reporter: Adam Szmigin

h1. Summary
Writing a Date value using either a {{Date32Array.Builder}} or a {{Date64Array.Builder}} and then reading back the result from the built array introduces an off-by-one error in the value. The following minimal code illustrates:
{code:c#}
namespace Date32ArrayReadWriteBug
{
    using Apache.Arrow;
    using Apache.Arrow.Memory;
    using System;

    internal static class Program
    {
        public static void Main(string[] args)
        {
            var allocator = new NativeMemoryAllocator();
            var builder = new Date32Array.Builder();
            var date = new DateTime(2020, 4, 24);
            Console.WriteLine($"Appending date {date:yyyy-MM-dd}");
            builder.Append(date);
            var array = builder.Build(allocator);
            var dateAgain = array.GetDate(0);
            Console.WriteLine($"Read date {dateAgain:yyyy-MM-dd}");
        }
    }
}
{code}
h2. Expected Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-24
{noformat}
h2. Actual Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-23
{noformat}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
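The fix proposed for this issue amounts to treating an appended value as a plain calendar date and ignoring time-of-day and offsets entirely. A toy Python model of a builder with those semantics (hypothetical names; this is a sketch of the proposed behaviour, not Arrow's actual API) shows the round trip becoming exact:

```python
from datetime import date, datetime, timedelta

EPOCH = date(1970, 1, 1)

class Date32Builder:
    """Toy model of the proposed builder: values are plain calendar dates
    (the analogue of DateTime), so no implicit offset conversion can
    shift the stored day."""

    def __init__(self):
        self._days = []  # Date32 representation: days since the Unix epoch

    def append(self, value):
        # Keep only the calendar date; time-of-day and zone are irrelevant.
        d = value.date() if isinstance(value, datetime) else value
        self._days.append((d - EPOCH).days)
        return self

    def get_date(self, i):
        return EPOCH + timedelta(days=self._days[i])

builder = Date32Builder()
builder.append(datetime(2020, 4, 24))  # "local midnight" on any machine
print(builder.get_date(0))  # 2020-04-24: the round trip is exact
```

Because the day count is derived from the calendar date alone, the result no longer depends on the machine's time zone, which is the property the reporter's repro demonstrates is missing.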
[jira] [Commented] (ARROW-8568) [C++][Python] Crash on decimal cast in debug mode
[ https://issues.apache.org/jira/browse/ARROW-8568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091300#comment-17091300 ] Jacek Pliszka commented on ARROW-8568: -- The problem is here:
{code:c++}
DecimalStatus BasicDecimal128::Rescale(int32_t original_scale, int32_t new_scale,
                                       BasicDecimal128* out) const {
  DCHECK_NE(out, nullptr);
  DCHECK_NE(original_scale, new_scale);
{code}
Firstly, there is a design question: should calling Rescale with original_scale == new_scale be allowed? If not, I can fix it in my code somewhere. But IMHO Rescale should allow for that, and should handle data overflow then.

> [C++][Python] Crash on decimal cast in debug mode
> -
>
> Key: ARROW-8568
> URL: https://issues.apache.org/jira/browse/ARROW-8568
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 0.17.0
> Reporter: Antoine Pitrou
> Priority: Major
>
> {code:python}
> >>> arr = pa.array([Decimal('123.45')])
> >>> arr
> [
>   123.45
> ]
> >>> arr.type
> Decimal128Type(decimal(5, 2))
> >>> arr.cast(pa.decimal128(4, 2))
> ../src/arrow/util/basic_decimal.cc:626: Check failed: (original_scale) != (new_scale)
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
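The behaviour the comment argues for can be sketched in Python (a hypothetical `rescale`, not Arrow's implementation): tolerate the equal-scale case and report data loss or overflow through a returned status instead of a debug-only assertion. The unscaled-integer representation below mirrors how decimal(precision, scale) values are stored:

```python
def rescale(value, original_scale, new_scale, precision):
    """Hypothetical rescale of an unscaled decimal integer.

    Unlike the DCHECK above, it allows original_scale == new_scale and
    returns (result, error) in the style of a DecimalStatus.
    """
    if new_scale >= original_scale:
        # No-op when the scales are equal; otherwise shift left.
        scaled = value * 10 ** (new_scale - original_scale)
    else:
        divisor = 10 ** (original_scale - new_scale)
        if value % divisor != 0:
            return None, "rescale would lose fractional digits"
        scaled = value // divisor
    # Even an equal-scale "rescale" must respect the target precision,
    # which is exactly the decimal(5, 2) -> decimal(4, 2) cast in the repro.
    if abs(scaled) >= 10 ** precision:
        return None, "rescale data loss: value does not fit precision"
    return scaled, None

# 123.45 stored as decimal(5, 2) has unscaled value 12345.
print(rescale(12345, 2, 2, 4))  # equal scales: overflow reported, no crash
print(rescale(12345, 2, 3, 6))  # widening to decimal(6, 3) succeeds
```

This matches the commenter's position: the equal-scale call is well defined, and the interesting failure mode is the precision overflow, which a release-mode status can report cleanly.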
[jira] [Updated] (ARROW-8579) [C++] AVX512 part for SIMD operations of DecodeSpaced/EncodeSpaced
[ https://issues.apache.org/jira/browse/ARROW-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8579: -- Labels: pull-request-available (was: ) > [C++] AVX512 part for SIMD operations of DecodeSpaced/EncodeSpaced > -- > > Key: ARROW-8579 > URL: https://issues.apache.org/jira/browse/ARROW-8579 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Frank Du >Assignee: Frank Du >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > As part of https://issues.apache.org/jira/browse/PARQUET-1841, AVX512 path > identified with the helper of mask_compress_/mask_expand_ API. > This Jira created for spaced benchmark, unittest and AVX512 path and other > basic support of further potential SIMD chance of SSE/AVX2. -- This message was sent by Atlassian Jira (v8.3.4#803005)