[jira] [Updated] (ARROW-7741) [C++][Parquet] Incorporate new level generation logic in parquet write path with a flag to revert back to old logic
[ https://issues.apache.org/jira/browse/ARROW-7741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Micah Kornfield updated ARROW-7741: --- Affects Version/s: 1.0.0 > [C++][Parquet] Incorporate new level generation logic in parquet write path > with a flag to revert back to old logic > --- > > Key: ARROW-7741 > URL: https://issues.apache.org/jira/browse/ARROW-7741 > Project: Apache Arrow > Issue Type: Sub-task > Components: C++ >Affects Versions: 1.0.0 >Reporter: Micah Kornfield >Assignee: Micah Kornfield >Priority: Minor > Time Spent: 1h 20m > Remaining Estimate: 0h > > This is likely going to be a decent amount of changes, so we should isolate them > behind a feature flag. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7741) [C++][Parquet] Incorporate new level generation logic in parquet write path with a flag to revert back to old logic
[ https://issues.apache.org/jira/browse/ARROW-7741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Micah Kornfield updated ARROW-7741: --- Labels: (was: pull-request-available) > [C++][Parquet] Incorporate new level generation logic in parquet write path > with a flag to revert back to old logic > --- > > Key: ARROW-7741 > URL: https://issues.apache.org/jira/browse/ARROW-7741 > Project: Apache Arrow > Issue Type: Sub-task > Components: C++ >Reporter: Micah Kornfield >Assignee: Micah Kornfield >Priority: Minor > Time Spent: 1h 20m > Remaining Estimate: 0h > > This is likely going to be a decent amount of changes, so we should isolate them > behind a feature flag.
[jira] [Updated] (ARROW-7741) [C++][Parquet] Incorporate new level generation logic in parquet write path with a flag to revert back to old logic
[ https://issues.apache.org/jira/browse/ARROW-7741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Micah Kornfield updated ARROW-7741: --- Component/s: C++ > [C++][Parquet] Incorporate new level generation logic in parquet write path > with a flag to revert back to old logic > --- > > Key: ARROW-7741 > URL: https://issues.apache.org/jira/browse/ARROW-7741 > Project: Apache Arrow > Issue Type: Sub-task > Components: C++ >Reporter: Micah Kornfield >Assignee: Micah Kornfield >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > This is likely going to be a decent amount of changes, so we should isolate them > behind a feature flag.
[jira] [Updated] (ARROW-8030) Fix inconsistent comment style in plasma
[ https://issues.apache.org/jira/browse/ARROW-8030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8030: -- Labels: pull-request-available (was: ) > Fix inconsistent comment style in plasma > > > Key: ARROW-8030 > URL: https://issues.apache.org/jira/browse/ARROW-8030 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Reporter: Siyuan Zhuang >Assignee: Siyuan Zhuang >Priority: Major > Labels: pull-request-available > > The comments in plasma are a mixture of '@params' and '\params'. The > reviewers required me to unify the style when I was trying to add Windows > support. I think it would be better to address it in a separate PR.
[jira] [Created] (ARROW-8030) Fix inconsistent comment style in plasma
Siyuan Zhuang created ARROW-8030: Summary: Fix inconsistent comment style in plasma Key: ARROW-8030 URL: https://issues.apache.org/jira/browse/ARROW-8030 Project: Apache Arrow Issue Type: Improvement Components: C++ - Plasma Reporter: Siyuan Zhuang Assignee: Siyuan Zhuang The comments in plasma are a mixture of '@params' and '\params'. The reviewers required me to unify the style when I was trying to add Windows support. I think it would be better to address it in a separate PR.
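A one-off cleanup like this can be scripted. The sketch below assumes the backslash form (`\param`, `\return`) is the style being standardized on; the issue does not say which form was chosen, and the command list is an illustrative subset. Note that Doxygen's command is spelled `param`, not `params`. The helper names are hypothetical.

```python
import re

# Assumption: the target style is the backslash form. The command list is an
# illustrative subset of Doxygen commands, not an exhaustive one.
_AT_COMMAND = re.compile(
    r"^(?P<lead>\s*(?://[/!]?|/\*\*|\*)\s*)@(?P<cmd>param|tparam|return|brief|note)\b"
)


def normalize_doxygen(line: str) -> str:
    """Rewrite a leading '@cmd' in a comment line to '\\cmd'; leave other lines alone."""
    # A function replacement avoids escaping the backslash in a template string.
    return _AT_COMMAND.sub(lambda m: m.group("lead") + "\\" + m.group("cmd"), line)


def normalize_source(text: str) -> str:
    """Apply normalize_doxygen to every line of a source file's contents."""
    return "\n".join(normalize_doxygen(line) for line in text.split("\n"))
```

Only commands at the start of a comment line are touched, so an `@` appearing mid-sentence in a comment or in code is left as-is.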
[jira] [Updated] (ARROW-7428) [Format][C++] Add serialization for CSF sparse tensors
[ https://issues.apache.org/jira/browse/ARROW-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7428: -- Fix Version/s: 1.0.0 > [Format][C++] Add serialization for CSF sparse tensors > -- > > Key: ARROW-7428 > URL: https://issues.apache.org/jira/browse/ARROW-7428 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, Format >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Once ARROW-4225 and ARROW-4226 are completed, we should add serialization > support.
[jira] [Updated] (ARROW-7427) [Python] Support SparseCSFTensor
[ https://issues.apache.org/jira/browse/ARROW-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7427: -- Component/s: (was: C++) > [Python] Support SparseCSFTensor > > > Key: ARROW-7427 > URL: https://issues.apache.org/jira/browse/ARROW-7427 > Project: Apache Arrow > Issue Type: New Feature > Components: Format, Python >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Once ARROW-4226 is complete, we want to be able to use sparse CSF tensors from > Python.
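For readers unfamiliar with CSF (compressed sparse fiber), the sketch below shows how per-dimension `indptr`/`indices` arrays of a CSF index can be derived from lexicographically sorted COO coordinates. It is a plain-Python illustration of the layout, assuming axes are stored in their natural order; it is not the Arrow C++ or pyarrow implementation, and `coo_to_csf` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[int, ...]


def coo_to_csf(entries: Sequence[Tuple[Coord, float]]):
    """Build CSF arrays from (coordinate, value) pairs of an N-d sparse tensor.

    Returns (indptr, indices, data): indices[k] holds the coordinate of every
    node at tree level k, and indptr[k][j]:indptr[k][j + 1] is the slice of
    level k + 1 holding the children of node j at level k.
    """
    if not entries:
        raise ValueError("need at least one non-zero entry")
    # Sort by coordinate only, so the values never take part in comparisons.
    entries = sorted(entries, key=lambda e: e[0])
    ndim = len(entries[0][0])
    indices: List[List[int]] = [[] for _ in range(ndim)]
    indptr: List[List[int]] = [[] for _ in range(ndim - 1)]
    data = []
    prev = None
    for coord, value in entries:
        # First level at which this coordinate diverges from the previous one.
        depth = 0
        if prev is not None:
            while depth < ndim and coord[depth] == prev[depth]:
                depth += 1
            if depth == ndim:
                raise ValueError(f"duplicate coordinate {coord}")
        # Open a new node at every level from the divergence point down.
        for level in range(depth, ndim):
            indices[level].append(coord[level])
            if level < ndim - 1:
                # Children of this node start at the current end of level + 1.
                indptr[level].append(len(indices[level + 1]))
        data.append(value)
        prev = coord
    # Close the final node at every non-leaf level.
    for level in range(ndim - 1):
        indptr[level].append(len(indices[level + 1]))
    return indptr, indices, data
```

For non-zeros at (0,0,0), (0,0,1), (0,1,0), (1,0,2) and (1,2,0), this yields indices [[0, 1], [0, 1, 0, 2], [0, 1, 0, 2, 0]] and indptr [[0, 2, 4], [0, 2, 3, 4, 5]]: shared coordinate prefixes are stored once, which is what distinguishes CSF from COO.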
[jira] [Updated] (ARROW-7419) [Python] Support SparseCSCMatrix
[ https://issues.apache.org/jira/browse/ARROW-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7419: -- Component/s: (was: C++) > [Python] Support SparseCSCMatrix > > > Key: ARROW-7419 > URL: https://issues.apache.org/jira/browse/ARROW-7419 > Project: Apache Arrow > Issue Type: New Feature > Components: Format, Python >Reporter: Kenta Murata >Assignee: Rok Mihevc >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h >
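This issue carries no description. For reference, a compressed sparse column (CSC) matrix stores, for each column j, the row indices and values of its non-zeros in `indices[indptr[j]:indptr[j + 1]]` and the matching slice of `data`, the same convention SciPy's `csc_matrix` uses. A minimal plain-Python conversion from COO triplets, using the hypothetical name `coo_to_csc` (not a pyarrow function):

```python
from typing import List, Sequence, Tuple


def coo_to_csc(rows: Sequence[int], cols: Sequence[int], vals: Sequence[float],
               ncols: int) -> Tuple[List[int], List[int], List[float]]:
    """Convert COO triplets to CSC (indptr, indices, data), rows sorted per column."""
    if not (len(rows) == len(cols) == len(vals)):
        raise ValueError("rows, cols and vals must have the same length")
    # Count non-zeros per column, then prefix-sum the counts into offsets.
    indptr = [0] * (ncols + 1)
    for c in cols:
        indptr[c + 1] += 1
    for j in range(ncols):
        indptr[j + 1] += indptr[j]
    # Order entries column-major (by column, then row).
    order = sorted(range(len(vals)), key=lambda k: (cols[k], rows[k]))
    indices = [rows[k] for k in order]
    data = [vals[k] for k in order]
    return indptr, indices, data
```

For the matrix [[1, 0, 2], [0, 0, 3], [4, 5, 0]] this gives indptr [0, 2, 3, 5], indices [0, 2, 2, 0, 1] and data [1, 4, 5, 2, 3].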
[jira] [Updated] (ARROW-7419) [Python] Support SparseCSCMatrix
[ https://issues.apache.org/jira/browse/ARROW-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7419: -- Fix Version/s: 1.0.0 > [Python] Support SparseCSCMatrix > > > Key: ARROW-7419 > URL: https://issues.apache.org/jira/browse/ARROW-7419 > Project: Apache Arrow > Issue Type: New Feature > Components: Format, Python >Reporter: Kenta Murata >Assignee: Rok Mihevc >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h >
[jira] [Updated] (ARROW-7428) [Format][C++] Add serialization for CSF sparse tensors
[ https://issues.apache.org/jira/browse/ARROW-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7428: -- Component/s: Format C++ > [Format][C++] Add serialization for CSF sparse tensors > -- > > Key: ARROW-7428 > URL: https://issues.apache.org/jira/browse/ARROW-7428 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, Format >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Minor > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Once ARROW-4225 and ARROW-4226 are completed, we should add serialization > support.
[jira] [Updated] (ARROW-7419) [Python] Support SparseCSCMatrix
[ https://issues.apache.org/jira/browse/ARROW-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7419: -- Component/s: Format C++ > [Python] Support SparseCSCMatrix > > > Key: ARROW-7419 > URL: https://issues.apache.org/jira/browse/ARROW-7419 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, Format, Python >Reporter: Kenta Murata >Assignee: Rok Mihevc >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h >
[jira] [Updated] (ARROW-7427) [Python] Support SparseCSFTensor
[ https://issues.apache.org/jira/browse/ARROW-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7427: -- Fix Version/s: 1.0.0 > [Python] Support SparseCSFTensor > > > Key: ARROW-7427 > URL: https://issues.apache.org/jira/browse/ARROW-7427 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, Format, Python >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Once ARROW-4226 is complete, we want to be able to use sparse CSF tensors from > Python.
[jira] [Updated] (ARROW-7427) [Python] Support SparseCSFTensor
[ https://issues.apache.org/jira/browse/ARROW-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-7427: -- Component/s: Python Format C++ > [Python] Support SparseCSFTensor > > > Key: ARROW-7427 > URL: https://issues.apache.org/jira/browse/ARROW-7427 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, Format, Python >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Once ARROW-4226 is complete, we want to be able to use sparse CSF tensors from > Python.
[jira] [Updated] (ARROW-7427) [Python] Support SparseCSFTensor
[ https://issues.apache.org/jira/browse/ARROW-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-7427: -- Labels: pull-request-available (was: ) > [Python] Support SparseCSFTensor > > > Key: ARROW-7427 > URL: https://issues.apache.org/jira/browse/ARROW-7427 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Minor > Labels: pull-request-available > > Once ARROW-4226 is complete, we want to be able to use sparse CSF tensors from > Python.
[jira] [Updated] (ARROW-7741) [C++][Parquet] Incorporate new level generation logic in parquet write path with a flag to revert back to old logic
[ https://issues.apache.org/jira/browse/ARROW-7741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Micah Kornfield updated ARROW-7741: --- Summary: [C++][Parquet] Incorporate new level generation logic in parquet write path with a flag to revert back to old logic (was: [C++][Parquet] Add feature flag for nested-capable reader/writer) > [C++][Parquet] Incorporate new level generation logic in parquet write path > with a flag to revert back to old logic > --- > > Key: ARROW-7741 > URL: https://issues.apache.org/jira/browse/ARROW-7741 > Project: Apache Arrow > Issue Type: Sub-task >Reporter: Micah Kornfield >Assignee: Micah Kornfield >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > This is likely going to be a decent amount of changes, so we should isolate them > behind a feature flag.
[jira] [Commented] (ARROW-7865) [R] Test builds on latest Linux versions
[ https://issues.apache.org/jira/browse/ARROW-7865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054227#comment-17054227 ] Neal Richardson commented on ARROW-7865: Confirmed that the Debian unstable/testing build issue is the same known m4 issue as on some CentOS versions. This will be resolved by ARROW-6821. > [R] Test builds on latest Linux versions > > > Key: ARROW-7865 > URL: https://issues.apache.org/jira/browse/ARROW-7865 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > See https://github.com/apache/arrow/issues/6435. CRAN might use old/stable > versions but not everyone is so nostalgic.
[jira] [Closed] (ARROW-7501) [C++] CMake build_thrift should build flex and bison if necessary
[ https://issues.apache.org/jira/browse/ARROW-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson closed ARROW-7501. -- Resolution: Won't Fix I think ARROW-6821 is a better solution. > [C++] CMake build_thrift should build flex and bison if necessary > - > > Key: ARROW-7501 > URL: https://issues.apache.org/jira/browse/ARROW-7501 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > On MSVC and APPLE, {{build_thrift}} will handle thrift's flex and bison > dependencies: > [https://github.com/apache/arrow/blob/f578521/cpp/cmake_modules/ThirdpartyToolchain.cmake#L1052-L1097] > But you're on your own on Linux. In ARROW-6793, I wrote 100 lines of R code > to do this for my needs: > [https://github.com/apache/arrow/pull/6068/files#diff-3875fa5e75833c426b36487b25892bd8R204-R309] > We should translate this to CMake so it's generally available.
[jira] [Commented] (ARROW-8029) [R] rstudio/r-base:3.6-centos7 GHA build failing on master
[ https://issues.apache.org/jira/browse/ARROW-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054219#comment-17054219 ] Neal Richardson commented on ARROW-8029: The expanded install logs show that the failure is the bundled bzip2 failing to download: https://github.com/apache/arrow/runs/492280141#step:6:2338 More verbose output is here: https://github.com/apache/arrow/pull/6557/checks?check_run_id=492502389#step:6:2210 I expect that this is temporary and will resolve itself. > [R] rstudio/r-base:3.6-centos7 GHA build failing on master > -- > > Key: ARROW-8029 > URL: https://issues.apache.org/jira/browse/ARROW-8029 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: Wes McKinney >Priority: Major > Fix For: 1.0.0 > > > https://github.com/apache/arrow/runs/492280141
[jira] [Resolved] (ARROW-7943) [C++][Parquet] Add a new level builder capable of handling nested data
[ https://issues.apache.org/jira/browse/ARROW-7943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-7943. - Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 6490 [https://github.com/apache/arrow/pull/6490] > [C++][Parquet] Add a new level builder capable of handling nested data > -- > > Key: ARROW-7943 > URL: https://issues.apache.org/jira/browse/ARROW-7943 > Project: Apache Arrow > Issue Type: Sub-task > Components: C++ >Reporter: Micah Kornfield >Assignee: Micah Kornfield >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10h 50m > Remaining Estimate: 0h > > There will be one or two more steps to integrate this with the existing > higher-level APIs.
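As background on what a level builder produces: for a nullable `list<int32>` column with nullable elements, written with Parquet's standard three-level LIST structure, the maximum definition level is 3 and the maximum repetition level is 1. The hand-rolled sketch below (hypothetical `list_levels`, not the code from pull request 6490) shows the levels emitted for a null list, an empty list, a null element, and a present element.

```python
from typing import List, Optional, Sequence, Tuple

# Levels for: optional group col (LIST) { repeated group list { optional int32 element; } }
NULL_LIST, EMPTY_LIST, NULL_ELEMENT, PRESENT = 0, 1, 2, 3  # definition levels


def list_levels(rows: Sequence[Optional[Sequence[Optional[int]]]]
                ) -> Tuple[List[int], List[int], List[int]]:
    """Return (def_levels, rep_levels, non_null_values) for a nullable list<int32> column."""
    def_levels: List[int] = []
    rep_levels: List[int] = []
    values: List[int] = []
    for row in rows:
        if row is None or len(row) == 0:
            # A null or empty list still emits exactly one level pair.
            def_levels.append(NULL_LIST if row is None else EMPTY_LIST)
            rep_levels.append(0)
            continue
        for i, v in enumerate(row):
            # Repetition level 0 starts a new row; 1 continues the current list.
            rep_levels.append(0 if i == 0 else 1)
            if v is None:
                def_levels.append(NULL_ELEMENT)
            else:
                def_levels.append(PRESENT)
                values.append(v)
    return def_levels, rep_levels, values
```

For the rows `[[1, None, 2], None, [], [3]]` this produces definition levels [3, 2, 3, 0, 1, 3], repetition levels [0, 1, 1, 0, 0, 0] and values [1, 2, 3]. Handling arbitrary nesting generalizes the same idea recursively, which is what makes a dedicated builder worthwhile.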
[jira] [Created] (ARROW-8029) [R] rstudio/r-base:3.6-centos7 GHA build failing on master
Wes McKinney created ARROW-8029: --- Summary: [R] rstudio/r-base:3.6-centos7 GHA build failing on master Key: ARROW-8029 URL: https://issues.apache.org/jira/browse/ARROW-8029 Project: Apache Arrow Issue Type: Bug Components: R Reporter: Wes McKinney Fix For: 1.0.0 https://github.com/apache/arrow/runs/492280141
[jira] [Created] (ARROW-8028) [Go] Allow duplicate field names in schemas and nested types
Ben Kietzman created ARROW-8028: --- Summary: [Go] Allow duplicate field names in schemas and nested types Key: ARROW-8028 URL: https://issues.apache.org/jira/browse/ARROW-8028 Project: Apache Arrow Issue Type: Improvement Components: Go Affects Versions: 0.16.0 Reporter: Ben Kietzman Fix For: 1.0.0 Go's implementation of Schema panics if field names are duplicated within a schema. Uniqueness of field names is not guaranteed by the standard, so Go will not be able to handle valid record batches produced by other implementations that contain duplicate field names. https://github.com/apache/arrow/blob/084549a/go/arrow/schema.go#L117
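One way to tolerate duplicates is to index a schema by name into a list of positions instead of a single position, and to let by-name lookup return every match rather than fail. The sketch below illustrates that design in Python; `Schema`, `field_indices` and `fields_by_name` are hypothetical names and do not describe the Go package's API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Field:
    name: str
    type: str  # simplified: a type name such as "int32" or "utf8"


class Schema:
    """Schema that permits duplicate field names instead of rejecting them."""

    def __init__(self, fields: List[Field]) -> None:
        self.fields = list(fields)
        # Map each name to every position where it occurs, in schema order.
        self._positions: Dict[str, List[int]] = {}
        for i, f in enumerate(self.fields):
            self._positions.setdefault(f.name, []).append(i)

    def field_indices(self, name: str) -> List[int]:
        """All positions of fields called `name` (empty if absent)."""
        return list(self._positions.get(name, []))

    def fields_by_name(self, name: str) -> List[Field]:
        """All fields called `name`; callers decide how to handle ambiguity."""
        return [self.fields[i] for i in self._positions.get(name, [])]
```

Positional access stays unambiguous, so readers that walk fields by index are unaffected; only name-based lookup has to decide what to do with more than one match.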
[jira] [Created] (ARROW-8027) [Developer][Integration] Add integration tests for duplicate field names
Ben Kietzman created ARROW-8027: --- Summary: [Developer][Integration] Add integration tests for duplicate field names Key: ARROW-8027 URL: https://issues.apache.org/jira/browse/ARROW-8027 Project: Apache Arrow Issue Type: Improvement Components: Integration Affects Versions: 0.16.0 Reporter: Ben Kietzman Fix For: 1.0.0 Schemas and nested types whose fields' names are not unique are permitted, so the integration tests should include a case which exercises these.
[jira] [Commented] (ARROW-1231) [C++] Add filesystem / IO implementation for Google Cloud Storage
[ https://issues.apache.org/jira/browse/ARROW-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054094#comment-17054094 ] Wes McKinney commented on ARROW-1231: - Even if the performance of uploads and downloads is equivalent, I would guess that the SDK provided by the GCP development team will provide the most comprehensive access to GCS's features. There are also utilities (such as parallel uploading [1]) developed with GCS's particular characteristics in mind. [1]: https://github.com/googleapis/google-cloud-cpp/blob/master/google/cloud/storage/parallel_upload.h#L1131 > [C++] Add filesystem / IO implementation for Google Cloud Storage > - > > Key: ARROW-1231 > URL: https://issues.apache.org/jira/browse/ARROW-1231 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: filesystem > > See example jumping off point > https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/platform/cloud
[jira] [Resolved] (ARROW-7785) [C++] sparse_tensor.cc is extremely slow to compile
[ https://issues.apache.org/jira/browse/ARROW-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-7785. - Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 6533 [https://github.com/apache/arrow/pull/6533] > [C++] sparse_tensor.cc is extremely slow to compile > --- > > Key: ARROW-7785 > URL: https://issues.apache.org/jira/browse/ARROW-7785 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Assignee: Kenta Murata >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > This comes up especially when doing an optimized build. {{sparse_tensor.cc}} > is always enabled even if all components are disabled, and it takes multiple > seconds to compile. > Using [ClangBuildAnalyzer|https://github.com/aras-p/ClangBuildAnalyzer], I get > the following results: > {code} > Files that took longest to codegen (compiler backend): > 66372 ms: > build-clang-profile/src/arrow/CMakeFiles/arrow_objlib.dir/sparse_tensor.cc.o > 16457 ms: > build-clang-profile/src/arrow/CMakeFiles/arrow_objlib.dir/array/diff.cc.o > 6283 ms: > build-clang-profile/src/arrow/CMakeFiles/arrow_objlib.dir/scalar.cc.o > 5284 ms: > build-clang-profile/src/arrow/CMakeFiles/arrow_objlib.dir/builder.cc.o > 5090 ms: > build-clang-profile/src/arrow/CMakeFiles/arrow_objlib.dir/array/dict_internal.cc.o > {code}
[jira] [Created] (ARROW-8026) [Python] Support memoryview in addition to string value types for constructing string and binary type arrays
Wes McKinney created ARROW-8026: --- Summary: [Python] Support memoryview in addition to string value types for constructing string and binary type arrays Key: ARROW-8026 URL: https://issues.apache.org/jira/browse/ARROW-8026 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Wes McKinney Fix For: 1.0.0
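The request amounts to accepting any bytes-like value when filling an array's buffers. The sketch below shows how str, bytes, bytearray and memoryview values could all be normalized into the Arrow variable-size binary layout (validity, an offsets buffer with one more entry than there are values, and a contiguous data buffer). It is a plain-Python illustration with a hypothetical `build_binary_buffers` helper; it returns validity as a list of booleans rather than a packed bitmap, and it is not pyarrow's conversion code.

```python
from typing import List, Optional, Sequence, Tuple, Union

BinaryLike = Union[str, bytes, bytearray, memoryview]


def build_binary_buffers(values: Sequence[Optional[BinaryLike]]
                         ) -> Tuple[List[int], List[bool], bytes]:
    """Return (offsets, validity, data) for a binary/string array."""
    offsets = [0]
    validity: List[bool] = []
    data = bytearray()
    for v in values:
        if v is None:
            validity.append(False)  # null: zero-length slot
        elif isinstance(v, str):
            data += v.encode("utf-8")
            validity.append(True)
        elif isinstance(v, memoryview):
            data += v.tobytes()  # copies the viewed bytes in C order
            validity.append(True)
        elif isinstance(v, (bytes, bytearray)):
            data += v
            validity.append(True)
        else:
            raise TypeError(f"unsupported value type: {type(v).__name__}")
        # Each value's bytes occupy data[offsets[i]:offsets[i + 1]].
        offsets.append(len(data))
    return offsets, validity, bytes(data)
```

For `["ab", b"c", None, memoryview(b"xyz"), bytearray()]` this returns offsets [0, 2, 3, 3, 6, 6], validity [True, True, False, True, True] and data `b"abcxyz"`. A memoryview needs no special layout handling; it only has to be copied into the shared data buffer like any other bytes-like value.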
[jira] [Resolved] (ARROW-7444) [GLib] Add LocalFileSystem support
[ https://issues.apache.org/jira/browse/ARROW-7444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-7444. - Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 6105 [https://github.com/apache/arrow/pull/6105] > [GLib] Add LocalFileSystem support > -- > > Key: ARROW-7444 > URL: https://issues.apache.org/jira/browse/ARROW-7444 > Project: Apache Arrow > Issue Type: Sub-task > Components: GLib >Reporter: Kenta Murata >Assignee: Kenta Murata >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 9h 10m > Remaining Estimate: 0h >
[jira] [Commented] (ARROW-1231) [C++] Add filesystem / IO implementation for Google Cloud Storage
[ https://issues.apache.org/jira/browse/ARROW-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053954#comment-17053954 ] Antoine Pitrou commented on ARROW-1231: --- Doesn't GCS provide an S3-compatible endpoint? Is it detrimental to use the AWS SDK as opposed to the native GCS APIs? > [C++] Add filesystem / IO implementation for Google Cloud Storage > - > > Key: ARROW-1231 > URL: https://issues.apache.org/jira/browse/ARROW-1231 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: filesystem > > See example jumping off point > https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/platform/cloud