[jira] [Commented] (ARROW-7226) [JSON][Python] Json loader fails on example in documentation.
[ https://issues.apache.org/jira/browse/ARROW-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184939#comment-17184939 ]

Andrew Wieteska commented on ARROW-7226:

The line-delimited variant can be written to Parquet on 2.0 master, so we only need to comment on the supported JSON format.

> [JSON][Python] Json loader fails on example in documentation.
> -------------------------------------------------------------
>
> Key: ARROW-7226
> URL: https://issues.apache.org/jira/browse/ARROW-7226
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Rinke Hoekstra
> Assignee: Andrew Wieteska
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> I was just trying this with the example found in the pyarrow docs at
> [http://arrow.apache.org/docs/python/json.html]
> The documented example does not work. Is this related to this issue, or is it
> another matter?
> It says to load the following JSON file:
> {{{"a": [1, 2], "b": {"c": true, "d": "1991-02-03"}}}}
> {{{"a": [3, 4, 5], "b": {"c": false, "d": "2019-04-01"}}}}
> I fixed this to make it valid JSON (It is valid [JSON
> Lines|[http://jsonlines.org/]], but that's another issue):
> {{[{"a": [1, 2], "b": {"c": true, "d": "1991-02-03"}},}}
> {{{"a": [3, 4, 5], "b": {"c": false, "d": "2019-04-01"}}]}}
> Then reading the JSON from a file called `my_data.json`:
> {{from pyarrow import json}}
> {{table = json.read_json("my_data.json")}}
> Gives the following error:
> {code:java}
> ---------------------------------------------------------------------------
> ArrowInvalid Traceback (most recent call last)
> in ()
> 1 from pyarrow import json
> ----> 2 table = json.read_json('test.json')
> ~/.local/share/virtualenvs/parquet-ifRxINoC/lib/python3.7/site-packages/pyarrow/_json.pyx
> in pyarrow._json.read_json()
> ~/.local/share/virtualenvs/parquet-ifRxINoC/lib/python3.7/site-packages/pyarrow/error.pxi
> in pyarrow.lib.check_status()
> ArrowInvalid: JSON parse error: A column changed from object to array
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
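The error in this report comes from feeding a single top-level JSON array to a reader that expects one object per line. As a minimal sketch, using only the Python standard library, a JSON array document can be converted back to line-delimited JSON before loading. The helper name `json_array_to_lines` is hypothetical and not part of pyarrow.

```python
import json


def json_array_to_lines(text: str) -> str:
    """Convert a JSON document holding an array of objects to JSON Lines."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Emit one compact JSON object per line, each terminated by a newline.
    return "".join(json.dumps(record) + "\n" for record in records)


# The "fixed" document from the report: valid JSON, but a single array.
array_doc = (
    '[{"a": [1, 2], "b": {"c": true, "d": "1991-02-03"}},'
    ' {"a": [3, 4, 5], "b": {"c": false, "d": "2019-04-01"}}]'
)

lines_doc = json_array_to_lines(array_doc)
print(lines_doc, end="")
```

The resulting text has one object per line, which is the layout the documentation example was meant to show; written to `my_data.json`, it is the form that `pyarrow.json.read_json` is documented to accept.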
[jira] [Assigned] (ARROW-7226) [JSON][Python] Json loader fails on example in documentation.
[ https://issues.apache.org/jira/browse/ARROW-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Arrow JIRA Bot reassigned ARROW-7226: Assignee: Apache Arrow JIRA Bot (was: Andrew Wieteska)
[jira] [Assigned] (ARROW-7226) [JSON][Python] Json loader fails on example in documentation.
[ https://issues.apache.org/jira/browse/ARROW-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Arrow JIRA Bot reassigned ARROW-7226: Assignee: Andrew Wieteska (was: Apache Arrow JIRA Bot)
[jira] [Updated] (ARROW-7226) [JSON][Python] Json loader fails on example in documentation.
[ https://issues.apache.org/jira/browse/ARROW-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-7226: Labels: pull-request-available (was: )
[jira] [Assigned] (ARROW-7226) [JSON][Python] Json loader fails on example in documentation.
[ https://issues.apache.org/jira/browse/ARROW-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wieteska reassigned ARROW-7226: Assignee: Andrew Wieteska
[jira] [Resolved] (ARROW-9849) [Rust] [DataFusion] Make UDFs not need a Field
[ https://issues.apache.org/jira/browse/ARROW-9849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-9849. --- Fix Version/s: 2.0.0 Resolution: Fixed Issue resolved by pull request 8045 [https://github.com/apache/arrow/pull/8045] > [Rust] [DataFusion] Make UDFs not need a Field > -- > > Key: ARROW-9849 > URL: https://issues.apache.org/jira/browse/ARROW-9849 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Jorge >Assignee: Jorge >Priority: Minor > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > [https://github.com/apache/arrow/pull/7967,] shows that it is possible to not > require users to pass a `Field` to UDFs declarations and instead just pass a > `DataType`. > Let's deprecate Field from them, and instead just use `DataType`. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-9464) [Rust] [DataFusion] Physical plan refactor to support optimization rules and more efficient use of threads
[ https://issues.apache.org/jira/browse/ARROW-9464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-9464. --- Resolution: Fixed Issue resolved by pull request 8034 [https://github.com/apache/arrow/pull/8034] > [Rust] [DataFusion] Physical plan refactor to support optimization rules and > more efficient use of threads > -- > > Key: ARROW-9464 > URL: https://issues.apache.org/jira/browse/ARROW-9464 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > I would like to propose a refactor of the physical/execution planning based > on the experience I have had in implementing distributed execution in > Ballista. > This will likely need subtasks but here is an overview of the changes I am > proposing. > h3. *Introduce physical plan optimization rule to insert "shuffle" operators* > We should extend the ExecutionPlan trait so that each operator can specify > its input and output partitioning needs, and then have an optimization rule > that can insert any repartitioning or reordering steps required. > For example, these are the methods to be added to ExecutionPlan. This design > is based on Apache Spark. 
>
> {code:java}
> /// Specifies how data is partitioned across different nodes in the cluster
> fn output_partitioning() -> Partitioning {
>     Partitioning::UnknownPartitioning(0)
> }
>
> /// Specifies the data distribution requirements of all the children for this operator
> fn required_child_distribution() -> Distribution {
>     Distribution::UnspecifiedDistribution
> }
>
> /// Specifies how data is ordered in each partition
> fn output_ordering() -> Option> {
>     None
> }
>
> /// Specifies the data ordering requirements of all the children for this operator
> fn required_child_ordering() -> Option>> {
>     None
> }
> {code}
> A good example of applying this rule would be in the case of hash aggregates
> where we perform a partial aggregate in parallel across partitions and then
> coalesce the results and apply a final hash aggregate.
> Another example would be a SortMergeExec specifying the sort order required
> for its children.
>
-- This message was sent by Atlassian Jira (v8.3.4#803005)
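To illustrate the proposed rule, here is a small Python sketch of an optimizer pass that walks a plan tree and inserts a repartitioning operator wherever a child's output partitioning does not satisfy its parent's required distribution. It is not DataFusion code: the `Plan` class, the string-valued partitioning labels, and the `Repartition` operator name are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Plan:
    """Hypothetical stand-in for an ExecutionPlan node."""
    name: str
    output_partitioning: str = "unknown"              # e.g. "unknown", "single"
    required_child_distribution: str = "unspecified"  # e.g. "unspecified", "single"
    children: List["Plan"] = field(default_factory=list)


def satisfies(partitioning: str, distribution: str) -> bool:
    """A child is acceptable if nothing is required or the labels match."""
    return distribution == "unspecified" or partitioning == distribution


def insert_shuffles(plan: Plan) -> Plan:
    """Return a copy of the plan with repartition steps added where needed."""
    new_children = []
    for child in plan.children:
        child = insert_shuffles(child)  # fix the subtree first
        if not satisfies(child.output_partitioning, plan.required_child_distribution):
            # Wrap the child in an operator that produces the required layout.
            child = Plan(
                name="Repartition",
                output_partitioning=plan.required_child_distribution,
                children=[child],
            )
        new_children.append(child)
    return Plan(plan.name, plan.output_partitioning,
                plan.required_child_distribution, new_children)


def render(plan: Plan, depth: int = 0) -> List[str]:
    """Indented one-line-per-operator view of the plan tree."""
    lines = ["  " * depth + plan.name]
    for child in plan.children:
        lines.extend(render(child, depth + 1))
    return lines


# Hash aggregate case: partial aggregates run in parallel, and the final
# aggregate requires all of their results in a single partition.
partial = Plan("PartialHashAggregate", output_partitioning="unknown")
final = Plan("FinalHashAggregate", output_partitioning="single",
             required_child_distribution="single", children=[partial])

optimized = insert_shuffles(final)
print("\n".join(render(optimized)))
```

The same shape would extend to ordering: a second pass could compare each child's output ordering with the parent's required ordering and insert a sort step, which is the SortMergeExec case mentioned in the issue.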
[jira] [Resolved] (ARROW-9816) [C++] Escape quotes in config.h
[ https://issues.apache.org/jira/browse/ARROW-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-9816. - Fix Version/s: 2.0.0 Resolution: Fixed Issue resolved by pull request 8016 [https://github.com/apache/arrow/pull/8016] > [C++] Escape quotes in config.h > --- > > Key: ARROW-9816 > URL: https://issues.apache.org/jira/browse/ARROW-9816 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 1.0.0, 1.0.1 >Reporter: Lawrence Chan >Assignee: Lawrence Chan >Priority: Minor > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Currently the config.h file is generated without the `ESCAPE_QUOTES` option, > which causes quotes in e.g. CXXFLAGS to break config.h parsing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-9816) [C++] Escape quotes in config.h
[ https://issues.apache.org/jira/browse/ARROW-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou reassigned ARROW-9816: Assignee: Lawrence Chan
[jira] [Resolved] (ARROW-9813) [C++] Disable semantic interposition
[ https://issues.apache.org/jira/browse/ARROW-9813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-9813. - Resolution: Fixed Issue resolved by pull request 8048 [https://github.com/apache/arrow/pull/8048] > [C++] Disable semantic interposition > > > Key: ARROW-9813 > URL: https://issues.apache.org/jira/browse/ARROW-9813 > Project: Apache Arrow > Issue Type: Wish > Components: C++ >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Trivial > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > On gcc, semantic interposition is enabled by default. It can be beneficial to > disable it when building Arrow libraries (and it's most certainly harmless > anyway). > See > https://stackoverflow.com/questions/35745543/new-option-in-gcc-5-3-fno-semantic-interposition > for more background on this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-9820) [C++] Plugin Architecture for Filesystem and File IO
[ https://issues.apache.org/jira/browse/ARROW-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184756#comment-17184756 ]

Lawrence Chan edited comment on ARROW-9820 at 8/25/20, 10:17 PM:

- Language-agnostic - once a storage driver is written/built, _any_ arrow library can load it (assuming we've finished implementing the plugin API). So rather than needing to add support to each language, I just need to write the wrapper once, and then users can use that filesystem in C++, python, go, rust, whatever.
- Application-agnostic - if users want to use my storage driver in a downstream application, I can distribute a plugin and arrow can load the plugin at runtime without needing to do a special build of that application with my filesystem code. This greatly simplifies the ability for users to add storage functionality without recompiling the entire world that uses arrow. You might argue that this could be achieved by linking arrow as a shared library, but there are use cases where static linking is desirable, or use cases where I don't control the arrow shared library but the users can obtain my plugin.
- Maintainer-friendly and Sysadmin-friendly - if I maintain a storage driver plugin, I can version control it entirely independently, distribute it separately from the arrow library, and have a simpler build system that doesn't necessarily need to integrate with the arrow cmake machinery. Otherwise somehow cmake needs to know about the extra filesystem implementation and needs to do something to embed it at compile-time.
- There are also some functions in the C++ library that have hardcoded string comparisons to e.g. "hdfs". These are not the hardest ones to solve, because we could switch it to a lookup from a global mapping that the user can register factory functions with, but I figured I would mention them anyways.

If you are wondering about the concrete hurdle that prompted this, it's that the pyarrow bits are seemingly half wrappers to the C++ lib and half implemented in python, with what I _think_ are manually-written Cython wrappers around the pieces that need to be visible in python. For my storage library, I don't really want to mess with forking pyarrow and writing Cython wrappers and rebuilding pyarrow, and I'd like to just do it once in C/C++ and have it work in pyarrow automatically.

I understand the hesitation here, but I think the scary bits can be done safely, and I think this will open the doors to a more organized and community-driven collection of storage drivers without cluttering the arrow codebase. For some related prior art, this feels to me like a tiny lower-level version of CSI plugins. If we wanted to support the whole universe of drivers from within the arrow codebase, it would get pretty bloated.
[jira] [Commented] (ARROW-9820) [C++] Plugin Architecture for Filesystem and File IO
[ https://issues.apache.org/jira/browse/ARROW-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184756#comment-17184756 ]

Lawrence Chan commented on ARROW-9820:

- Language-agnostic - once a storage driver is written/built, _any_ arrow library can load it (assuming we've finished implementing the plugin API). So rather than needing to add support to each language, I just need to write the wrapper once, and then users can use that filesystem in C++, python, go, rust, whatever.
- Application-agnostic - if users want to use my storage driver in a downstream application, I can distribute a plugin and arrow can load the plugin at runtime without needing to do a special build of that application with my filesystem code. This greatly simplifies the ability for users to add storage functionality without recompiling the entire world that uses arrow. You might argue that this could be achieved by linking arrow as a shared library, but there are use cases where static linking is desirable, or use cases where I don't control the arrow shared library but the users can obtain my plugin.
- Maintainer-friendly - if I maintain a storage driver plugin, I can version control it entirely independently, distribute it separately from the arrow library, and have a simpler build system that doesn't necessarily need to integrate with the arrow cmake machinery. Otherwise somehow cmake needs to know about the extra filesystem implementation and needs to do something to embed it at compile-time.
- There are also some functions in the C++ library that have hardcoded string comparisons to e.g. "hdfs". These are not the hardest ones to solve, because we could switch it to a lookup from a global mapping that the user can register factory functions with, but I figured I would mention them anyways.

If you are wondering about the concrete hurdle that prompted this, it's that the pyarrow bits are seemingly half wrappers to the C++ lib and half implemented in python, with what I _think_ are manually-written Cython wrappers around the pieces that need to be visible in python. For my storage library, I don't really want to mess with forking pyarrow and writing Cython wrappers and rebuilding pyarrow, and I'd like to just do it once in C/C++ and have it work in pyarrow automatically.

I understand the hesitation here, but I think the scary bits can be done safely, and I think this will open the doors to a more organized and community-driven collection of storage drivers without cluttering the arrow codebase. For some related prior art, this feels to me like a tiny lower-level version of CSI plugins. If we wanted to support the whole universe of drivers from within the arrow codebase, it would get pretty bloated.

> [C++] Plugin Architecture for Filesystem and File IO
> ----------------------------------------------------
>
> Key: ARROW-9820
> URL: https://issues.apache.org/jira/browse/ARROW-9820
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Lawrence Chan
> Priority: Minor
>
> Adding a new custom filesystem with corresponding file i/o streams is quite a
> process at the moment. Looks like HDFS and S3FS are basically hardcoded in
> many places. It would be useful to develop a plugin system to allow users to
> interface with other data stores without maintaining a permanent fork with
> hardcoded changes.
> We can either do runtime plugins or compile-time plugins. Runtime is more
> user-friendly, but with C++, ABI compatibility is fairly delicate. So we
> would either want to use a C ABI or accept a you're-on-your-own situation
> where the user is expected to be very careful with versioning and compiler
> flags.
> With compile-time plugins, maybe there's a way to have the cmake machinery
> build third party code and also register those new URI schemes automatically.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
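The "lookup from a global mapping that the user can register factory functions with" idea from the comment can be sketched in a few lines of Python. This is only an illustration of the registry pattern and not Arrow's API: `register_filesystem`, `filesystem_from_uri`, `MyStoreFileSystem`, and the `mystore` scheme are hypothetical names.

```python
from typing import Callable, Dict
from urllib.parse import urlsplit

# Global mapping from URI scheme to a factory that builds a filesystem object.
_FACTORIES: Dict[str, Callable[[str], object]] = {}


def register_filesystem(scheme: str, factory: Callable[[str], object]) -> None:
    """Register a factory for a URI scheme; refuse silent overrides."""
    if scheme in _FACTORIES:
        raise ValueError(f"scheme {scheme!r} is already registered")
    _FACTORIES[scheme] = factory


def filesystem_from_uri(uri: str) -> object:
    """Dispatch on the URI scheme instead of hardcoded string comparisons."""
    scheme = urlsplit(uri).scheme
    try:
        factory = _FACTORIES[scheme]
    except KeyError:
        raise ValueError(f"no filesystem registered for scheme {scheme!r}") from None
    return factory(uri)


class MyStoreFileSystem:
    """Hypothetical third-party storage driver."""

    def __init__(self, uri: str) -> None:
        self.uri = uri


# A plugin would run this registration when it is loaded.
register_filesystem("mystore", MyStoreFileSystem)

fs = filesystem_from_uri("mystore://bucket/path/to/data")
print(type(fs).__name__, fs.uri)
```

With this shape, built-in schemes and third-party schemes go through the same lookup, so adding a driver does not require touching the dispatch code.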
[jira] [Updated] (ARROW-9860) Arrow Flight JavaScript Client or Example
[ https://issues.apache.org/jira/browse/ARROW-9860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Monahan updated ARROW-9860: Component/s: Python > Arrow Flight JavaScript Client or Example > - > > Key: ARROW-9860 > URL: https://issues.apache.org/jira/browse/ARROW-9860 > Project: Apache Arrow > Issue Type: Wish > Components: JavaScript, Python >Reporter: Alex Monahan >Priority: Major > > Is it possible to use Apache Arrow Flight to send data from a Python Web > Server to a JavaScript browser client? If it is possible, is there a code > example to use to get started? > > If this is not possible, what is the fastest way to send data from a Python > Web Server to Apache Arrow in the browser today? Would it be faster to send a > Parquet file and unpack it client-side, or send Arrow directly/with gzip/ > etc.? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-9820) [C++] Plugin Architecture for Filesystem and File IO
[ https://issues.apache.org/jira/browse/ARROW-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184736#comment-17184736 ] Wes McKinney commented on ARROW-9820: What would not be solved by creating an implementation of {{arrow::fs::FileSystem}}?
[jira] [Created] (ARROW-9860) Arrow Flight JavaScript Client or Example
Alex Monahan created ARROW-9860: --- Summary: Arrow Flight JavaScript Client or Example Key: ARROW-9860 URL: https://issues.apache.org/jira/browse/ARROW-9860 Project: Apache Arrow Issue Type: Wish Components: JavaScript Reporter: Alex Monahan Is it possible to use Apache Arrow Flight to send data from a Python Web Server to a JavaScript browser client? If it is possible, is there a code example to use to get started? If this is not possible, what is the fastest way to send data from a Python Web Server to Apache Arrow in the browser today? Would it be faster to send a Parquet file and unpack it client-side, or send Arrow directly/with gzip/ etc.? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9859) [C++] S3 FileSystemFromUri with special char in secret key fails
Neal Richardson created ARROW-9859: -- Summary: [C++] S3 FileSystemFromUri with special char in secret key fails Key: ARROW-9859 URL: https://issues.apache.org/jira/browse/ARROW-9859 Project: Apache Arrow Issue Type: Bug Components: C++, Documentation, Python Reporter: Neal Richardson Fix For: 2.0.0 S3 Secret access keys can contain special characters like {{/}}. When they do 1) FileSystemFromUri will fail to parse the URI unless you URL-encode them (e.g. replace / with %2F) 2) When you do escape the special characters, requests that require authorization fail with the message "The request signature we calculated does not match the signature you provided. Check your key and signing method." This may suggest that there's some extra URL encoding/decoding that needs to happen inside. I was only able to work around this by generating a new access key that happened not to have special characters. -- This message was sent by Atlassian Jira (v8.3.4#803005)
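The URL-encoding step from point 1 can be sketched with the Python standard library. The `s3://access_key:secret_key@bucket/path` layout and the key values below are assumptions for illustration, and `s3_uri` is a hypothetical helper. Note that, per point 2 of the report, an encoded key parsed correctly but authorized requests still failed at the time, so this shows only the encoding, not a confirmed workaround.

```python
from urllib.parse import quote, unquote


def s3_uri(access_key: str, secret_key: str, bucket_path: str) -> str:
    """Build an S3 URI with percent-encoded credentials (hypothetical helper)."""
    # safe="" makes quote() encode "/" as well, which it would otherwise keep.
    return "s3://{}:{}@{}".format(
        quote(access_key, safe=""),
        quote(secret_key, safe=""),
        bucket_path,
    )


# Made-up credentials containing the special characters "/" and "+".
uri = s3_uri("AKIAEXAMPLE", "abc/def+ghi", "my-bucket/data.parquet")
print(uri)

# Decoding the credential part recovers the original secret key.
encoded_secret = uri.split("@")[0].split(":")[2]
print(unquote(encoded_secret))
```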
[jira] [Created] (ARROW-9858) [C++][Python][Docs] User guide for S3FileSystem
Neal Richardson created ARROW-9858: -- Summary: [C++][Python][Docs] User guide for S3FileSystem Key: ARROW-9858 URL: https://issues.apache.org/jira/browse/ARROW-9858 Project: Apache Arrow Issue Type: New Feature Components: C++, Documentation, Python Reporter: Neal Richardson Fix For: 2.0.0 https://arrow.apache.org/docs/python/filesystems.html is pretty thin https://arrow.apache.org/docs/python/api/filesystems.html doesn't mention S3 and in general there are some tricks to getting FileSystemFromUri to work -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9857) Failed to install Arrow 0.14.1
[ https://issues.apache.org/jira/browse/ARROW-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

SHOBHIT SHUKLA updated ARROW-9857:

Description:
We are seeing an issue installing the arrow R package on RHEL machines. We use the command below to install arrow.

*R -e "install.packages(\"remotes\",repos = \"http://cran.r-project.org\");remotes::install_github(\"apache/arrow\", subdir = \"r\", ref = \"apache-arrow-0.14.1\")"*

*Error logs :*

The downloaded source packages are in '/tmp/Rtmpycmy4e/downloaded_packages'
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Running `R CMD build`...
* checking for file '/tmp/Rtmpycmy4e/remotesa23015b2/apache-arrow-8d09de2/r/DESCRIPTION' ... OK
* preparing 'arrow':
* checking DESCRIPTION meta-information ... OK
* cleaning src
* running 'cleanup'
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building 'arrow_0.14.1.tar.gz'
* installing *source* package 'arrow' ...
** using staged installation
$ pkg-config --cflags --silence-errors arrow parquet
PKGCONFIG_CFLAGS = "-DNDEBUG "
$ pkg-config --libs arrow parquet
PKGCONFIG_LIBS = "-lparquet -larrow "
Found pkg-config cflags and libs!
PKG_CFLAGS=-DNDEBUG -DARROW_R_WITH_ARROW
PKG_LIBS=-lparquet -larrow
** libs
g++ -std=gnu++11 -I"/opt/ibm/conda/R/lib64/R/include" -DNDEBUG -DNDEBUG -DARROW_R_WITH_ARROW -I"/opt/ibm/conda/R/lib64/R/library/Rcpp/include" -I/usr/local/include -fvisibility=hidden -fpic -g -O2 -c array.cpp -o array.o
In file included from /opt/ibm/conda/R/lib64/R/library/Rcpp/include/Rcpp/macros/macros.h:134:0,
 from /opt/ibm/conda/R/lib64/R/library/Rcpp/include/Rcpp/r/headers.h:69,
 from /opt/ibm/conda/R/lib64/R/library/Rcpp/include/RcppCommon.h:29,
 from ./arrow_types.h:24,
 from array.cpp:18:
./arrow_types.h:188:26: error: 'Type' is not a member of 'arrow::ipc::Message'
 RCPP_EXPOSED_ENUM_NODECL(arrow::ipc::Message::Type)
 ^
/opt/ibm/conda/R/lib64/R/library/Rcpp/include/Rcpp/macros/module.h:76:106: note: in definition of macro 'RCPP_EXPOSED_ENUM_AS'
 #define RCPP_EXPOSED_ENUM_AS(CLASS) namespace Rcpp{ namespace traits{ template<> struct r_type_traits< CLASS >{ typedef r_type_enum_tag r_category ; } ; }}
 ^
./arrow_types.h:188:1: note: in expansion of macro 'RCPP_EXPOSED_ENUM_NODECL'
 RCPP_EXPOSED_ENUM_NODECL(arrow::ipc::Message::Type)
 ^
./arrow_types.h:188:26: error: 'Type' is not a member of 'arrow::ipc::Message'
 RCPP_EXPOSED_ENUM_NODECL(arrow::ipc::Message::Type)
 ^
/opt/ibm/conda/R/lib64/R/library/Rcpp/include/Rcpp/macros/module.h:76:106: note: in definition of macro 'RCPP_EXPOSED_ENUM_AS'
 #define RCPP_EXPOSED_ENUM_AS(CLASS) namespace Rcpp{ namespace traits{ template<> struct r_type_traits< CLASS >{ typedef r_type_enum_tag r_category ; } ; }}
 ^
./arrow_types.h:188:1: note: in expansion of macro 'RCPP_EXPOSED_ENUM_NODECL'
 RCPP_EXPOSED_ENUM_NODECL(arrow::ipc::Message::Type)
 ^
/opt/ibm/conda/R/lib64/R/library/Rcpp/include/Rcpp/macros/module.h:76:112: error: template argument 1 is invalid
 #define RCPP_EXPOSED_ENUM_AS(CLASS) namespace Rcpp{ namespace traits{ template<> struct r_type_traits< CLASS >{ typedef r_type_enum_tag r_category ; } ; }}
 ^
/opt/ibm/conda/R/lib64/R/library/Rcpp/include/Rcpp/macros/module.h:80:3: note: in expansion of macro 'RCPP_EXPOSED_ENUM_AS'
 RCPP_EXPOSED_ENUM_AS(CLASS) \
 ^
./arrow_types.h:188:1: note: in expansion of macro 'RCPP_EXPOSED_ENUM_NODECL'
 RCPP_EXPOSED_ENUM_NODECL(arrow::ipc::Message::Type)
 ^
./arrow_types.h:188:26: error: 'Type' is not a member of 'arrow::ipc::Message'
 RCPP_EXPOSED_ENUM_NODECL(arrow::ipc::Message::Type)
 ^
/opt/ibm/conda/R/lib64/R/library/Rcpp/include/Rcpp/macros/module.h:77:109: note: in definition of macro 'RCPP_EXPOSED_ENUM_WRAP'
 #define RCPP_EXPOSED_ENUM_WRAP(CLASS) namespace Rcpp{ namespace traits{ template<> struct wrap_type_traits< CLASS >{typedef wrap_type_enum_tag wrap_category ; } ; }}
 ^
./arrow_types.h:188:1: note: in expansion of macro 'RCPP_EXPOSED_ENUM_NODECL'
 RCPP_EXPOSED_ENUM_NODECL(arrow::ipc::Message::Type)
 ^
./arrow_types.h:188:26: error: 'Type' is not a member of 'arrow::ipc::Message'
 RCPP_EXPOSED_ENUM_NODECL(arrow::ipc::Message::Type)
[jira] [Created] (ARROW-9857) Failed to install Arrow 0.14.1
SHOBHIT SHUKLA created ARROW-9857: - Summary: Failed to install Arrow 0.14.1 Key: ARROW-9857 URL: https://issues.apache.org/jira/browse/ARROW-9857 Project: Apache Arrow Issue Type: Bug Affects Versions: 0.14.1 Reporter: SHOBHIT SHUKLA We are seeing an issue installing the arrow R package on RHEL machines, using the command below. R -e "install.packages(\"remotes\",repos = \"http://cran.r-project.org\");remotes::install_github(\"apache/arrow\", subdir = \"r\", ref = \"apache-arrow-0.14.1\")" *Error logs:* The downloaded source packages are in '/tmp/Rtmpycmy4e/downloaded_packages' Updating HTML index of packages in '.Library' Making 'packages.html' ... done Running `R CMD build`... * checking for file '/tmp/Rtmpycmy4e/remotesa23015b2/apache-arrow-8d09de2/r/DESCRIPTION' ... OK * preparing 'arrow': * checking DESCRIPTION meta-information ... OK * cleaning src * running 'cleanup' * checking for LF line-endings in source and make files and shell scripts * checking for empty or unneeded directories * building 'arrow_0.14.1.tar.gz' * installing *source* package 'arrow' ... ** using staged installation $ pkg-config --cflags --silence-errors arrow parquet PKGCONFIG_CFLAGS = "-DNDEBUG " $ pkg-config --libs arrow parquet PKGCONFIG_LIBS = "-lparquet -larrow " Found pkg-config cflags and libs!
PKG_CFLAGS=-DNDEBUG -DARROW_R_WITH_ARROW PKG_LIBS=-lparquet -larrow ** libs g++ -std=gnu++11 -I"/opt/ibm/conda/R/lib64/R/include" -DNDEBUG -DNDEBUG -DARROW_R_WITH_ARROW -I"/opt/ibm/conda/R/lib64/R/library/Rcpp/include" -I/usr/local/include -fvisibility=hidden -fpic -g -O2 -c array.cpp -o array.o In file included from /opt/ibm/conda/R/lib64/R/library/Rcpp/include/Rcpp/macros/macros.h:134:0, from /opt/ibm/conda/R/lib64/R/library/Rcpp/include/Rcpp/r/headers.h:69, from /opt/ibm/conda/R/lib64/R/library/Rcpp/include/RcppCommon.h:29, from ./arrow_types.h:24, from array.cpp:18: ./arrow_types.h:188:26: error: 'Type' is not a member of 'arrow::ipc::Message' RCPP_EXPOSED_ENUM_NODECL(arrow::ipc::Message::Type) ^ /opt/ibm/conda/R/lib64/R/library/Rcpp/include/Rcpp/macros/module.h:76:106: note: in definition of macro 'RCPP_EXPOSED_ENUM_AS' #define RCPP_EXPOSED_ENUM_AS(CLASS) namespace Rcpp{ namespace traits{ template<> struct r_type_traits< CLASS >{ typedef r_type_enum_tag r_category ; } ; }} ^ ./arrow_types.h:188:1: note: in expansion of macro 'RCPP_EXPOSED_ENUM_NODECL' RCPP_EXPOSED_ENUM_NODECL(arrow::ipc::Message::Type) ^ ./arrow_types.h:188:26: error: 'Type' is not a member of 'arrow::ipc::Message' RCPP_EXPOSED_ENUM_NODECL(arrow::ipc::Message::Type) ^ /opt/ibm/conda/R/lib64/R/library/Rcpp/include/Rcpp/macros/module.h:76:106: note: in definition of macro 'RCPP_EXPOSED_ENUM_AS' #define RCPP_EXPOSED_ENUM_AS(CLASS) namespace Rcpp{ namespace traits{ template<> struct r_type_traits< CLASS >{ typedef r_type_enum_tag r_category ; } ; }} ^ ./arrow_types.h:188:1: note: in expansion of macro 'RCPP_EXPOSED_ENUM_NODECL' RCPP_EXPOSED_ENUM_NODECL(arrow::ipc::Message::Type) ^ /opt/ibm/conda/R/lib64/R/library/Rcpp/include/Rcpp/macros/module.h:76:112: error: template argument 1 is invalid #define RCPP_EXPOSED_ENUM_AS(CLASS) namespace Rcpp{ namespace traits{ template<> struct r_type_traits< CLASS >{ typedef r_type_enum_tag r_category ; } ; }} ^
/opt/ibm/conda/R/lib64/R/library/Rcpp/include/Rcpp/macros/module.h:80:3: note: in expansion of macro 'RCPP_EXPOSED_ENUM_AS' RCPP_EXPOSED_ENUM_AS(CLASS) \ ^ ./arrow_types.h:188:1: note: in expansion of macro 'RCPP_EXPOSED_ENUM_NODECL' RCPP_EXPOSED_ENUM_NODECL(arrow::ipc::Message::Type) ^ ./arrow_types.h:188:26: error: 'Type' is not a member of 'arrow::ipc::Message' RCPP_EXPOSED_ENUM_NODECL(arrow::ipc::Message::Type) ^ /opt/ibm/conda/R/lib64/R/library/Rcpp/include/Rcpp/macros/module.h:77:109: note: in definition of macro 'RCPP_EXPOSED_ENUM_WRAP' #define RCPP_EXPOSED_ENUM_WRAP(CLASS) namespace Rcpp{ namespace traits{ template<> struct wrap_type_traits< CLASS >{typedef wrap_type_enum_tag wrap_category ; } ; }} ^ ./arrow_types.h:188:1: note: in expansion of macro 'RCPP_EXPOSED_ENUM_NODECL' RCPP_EXPOSED_ENUM_NODECL(arrow::ipc::Message::Type) ^
[jira] [Commented] (ARROW-9820) [C++] Plugin Architecture for Filesystem and File IO
[ https://issues.apache.org/jira/browse/ARROW-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184682#comment-17184682 ] Lawrence Chan commented on ARROW-9820: -- I agree lifetimes with C-based plugins require some care to get correct, but I think it is something we can design to be relatively safe for the end user. I have some work in progress that I can push up to a PR draft and it may be easier to discuss with some code in hand. The general gist of it is that anything allocated by the plugin will be immediately wrapped in safer C++ owning objects that will handle destruction. There will also be ABI versioning so that we have an upgrade path for future backwards-incompatible changes that are safe from dangerous ABI mismatches. I think some of this will be more clear once I get that PR pushed up. For context about our use case: we have an in-house data storage system that can read/write files via a userspace library, and it has a fair amount of overlap with arrow::fs stuff in spirit. I wrote OutputStream + RandomAccessFile subclasses and got the I/O working fine, but once I started looking at the pyarrow bindings and the dataset stuff I realized the other required changes would need to be hardcoded in a way that will be very difficult for me to maintain down the road, so I started thinking about pluggable storage drivers. > [C++] Plugin Architecture for Filesystem and File IO > > > Key: ARROW-9820 > URL: https://issues.apache.org/jira/browse/ARROW-9820 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Lawrence Chan >Priority: Minor > > Adding a new custom filesystem with corresponding file i/o streams is quite a > process at the moment. Looks like HDFS and S3FS are basically hardcoded in > many places. It would be useful to develop a plugin system to allow users to > interface with other data stores without maintaining a permanent fork with > hardcoded changes. 
> We can either do runtime plugins or compile-time plugins. Runtime is more > user-friendly, but with C++, ABI compatibility is fairly delicate. So we > would either want to use a C ABI or accept a you're-on-your-own situation > where the user is expected to be very careful with versioning and compiler > flags. > With compile-time plugins, maybe there's a way to have the CMake machinery > build third-party code and also register those new URI schemes automatically. -- This message was sent by Atlassian Jira (v8.3.4#803005)
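To make the ARROW-9820 discussion above a little more concrete, here is a minimal sketch of a registry that maps URI schemes to filesystem factories and refuses plugins built against a different ABI version. It is written in Python purely for illustration; `FileSystemPlugin`, `register_plugin`, `filesystem_from_uri`, `HOST_ABI_VERSION` and `DummyStore` are all hypothetical names and do not correspond to any existing Arrow API or to the work-in-progress PR mentioned in the comment.

```python
from dataclasses import dataclass
from typing import Callable, Dict
from urllib.parse import urlparse

# Hypothetical ABI version understood by the host library.
HOST_ABI_VERSION = 1


@dataclass(frozen=True)
class FileSystemPlugin:
    """Hypothetical plugin descriptor: a URI scheme plus a factory."""
    scheme: str
    abi_version: int
    factory: Callable[[str], object]


# Registry of plugins, keyed by URI scheme.
_REGISTRY: Dict[str, FileSystemPlugin] = {}


def register_plugin(plugin: FileSystemPlugin) -> None:
    """Register a plugin, rejecting ABI mismatches and duplicate schemes."""
    if plugin.abi_version != HOST_ABI_VERSION:
        raise RuntimeError(
            f"plugin ABI {plugin.abi_version} != host ABI {HOST_ABI_VERSION}"
        )
    if plugin.scheme in _REGISTRY:
        raise ValueError(f"scheme {plugin.scheme!r} is already registered")
    _REGISTRY[plugin.scheme] = plugin


def filesystem_from_uri(uri: str) -> object:
    """Look up the plugin for the URI's scheme and build a filesystem."""
    scheme = urlparse(uri).scheme
    plugin = _REGISTRY.get(scheme)
    if plugin is None:
        raise LookupError(f"no filesystem registered for scheme {scheme!r}")
    return plugin.factory(uri)


class DummyStore:
    """Stand-in for an in-house storage driver (illustration only)."""

    def __init__(self, uri: str) -> None:
        self.uri = uri


register_plugin(FileSystemPlugin("mystore", HOST_ABI_VERSION, DummyStore))
fs = filesystem_from_uri("mystore://bucket/data.parquet")
```

Checking the version at registration time means a mismatched plugin fails loudly when it is loaded rather than at first use, which is one way to read the "upgrade path" goal in the comment.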
[jira] [Resolved] (ARROW-9855) [R] Fix bad merge/Rcpp conflict
[ https://issues.apache.org/jira/browse/ARROW-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson resolved ARROW-9855. Resolution: Fixed Issue resolved by pull request 8053 [https://github.com/apache/arrow/pull/8053] > [R] Fix bad merge/Rcpp conflict > --- > > Key: ARROW-9855 > URL: https://issues.apache.org/jira/browse/ARROW-9855 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > ARROW-8001 merged after the switch to cpp11 but was based on master before > it, so that brought some generated code that still referenced Rcpp -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9856) [R] Add bindings for string compute functions
Neal Richardson created ARROW-9856: -- Summary: [R] Add bindings for string compute functions Key: ARROW-9856 URL: https://issues.apache.org/jira/browse/ARROW-9856 Project: Apache Arrow Issue Type: New Feature Components: R Reporter: Neal Richardson See https://arrow.apache.org/docs/cpp/compute.html#string-predicates and below. Since R's base string functions, as well as stringr/stringi, aren't generics that we can define methods for, this will probably make most sense within the context of a dplyr expression where we have more control over the evaluation. This will require enabling utf8proc in the builds; there's already an rtools-package for it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-9855) [R] Fix bad merge/Rcpp conflict
[ https://issues.apache.org/jira/browse/ARROW-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Arrow JIRA Bot reassigned ARROW-9855: Assignee: Apache Arrow JIRA Bot (was: Neal Richardson) > [R] Fix bad merge/Rcpp conflict > --- > > Key: ARROW-9855 > URL: https://issues.apache.org/jira/browse/ARROW-9855 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: Neal Richardson >Assignee: Apache Arrow JIRA Bot >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > ARROW-8001 merged after the switch to cpp11 but was based on master before > it, so that brought some generated code that still referenced Rcpp -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-9855) [R] Fix bad merge/Rcpp conflict
[ https://issues.apache.org/jira/browse/ARROW-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Arrow JIRA Bot reassigned ARROW-9855: Assignee: Neal Richardson (was: Apache Arrow JIRA Bot) > [R] Fix bad merge/Rcpp conflict > --- > > Key: ARROW-9855 > URL: https://issues.apache.org/jira/browse/ARROW-9855 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > ARROW-8001 merged after the switch to cpp11 but was based on master before > it, so that brought some generated code that still referenced Rcpp -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9855) [R] Fix bad merge/Rcpp conflict
[ https://issues.apache.org/jira/browse/ARROW-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9855: -- Labels: pull-request-available (was: ) > [R] Fix bad merge/Rcpp conflict > --- > > Key: ARROW-9855 > URL: https://issues.apache.org/jira/browse/ARROW-9855 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > ARROW-8001 merged after the switch to cpp11 but was based on master before > it, so that brought some generated code that still referenced Rcpp -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9855) [R] Fix bad merge/Rcpp conflict
Neal Richardson created ARROW-9855: -- Summary: [R] Fix bad merge/Rcpp conflict Key: ARROW-9855 URL: https://issues.apache.org/jira/browse/ARROW-9855 Project: Apache Arrow Issue Type: Bug Components: R Reporter: Neal Richardson Assignee: Neal Richardson Fix For: 2.0.0 ARROW-8001 merged after the switch to cpp11 but was based on master before it, so that brought some generated code that still referenced Rcpp -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9854) [R] Support reading/writing data to/from S3
Neal Richardson created ARROW-9854: -- Summary: [R] Support reading/writing data to/from S3 Key: ARROW-9854 URL: https://issues.apache.org/jira/browse/ARROW-9854 Project: Apache Arrow Issue Type: New Feature Components: R Reporter: Neal Richardson Assignee: Neal Richardson Fix For: 2.0.0 Current S3 support is limited to (1) being able to instantiate an S3FileSystem object, primarily from a URI, and (2) ability to open_dataset from an S3 URI. Before widely declaring that we support S3 in R, we should be able to: * download dataset (i.e. copy files/directory recursively) * read_parquet/feather/etc. from S3 (use FileSystem->OpenInputFile(path)) * write_$FORMAT via FileSystem->OpenOutputStream(path) * write_dataset * for linux, an argument to install_arrow to help, assuming you've installed aws-sdk-cpp already (turn on ARROW_S3, AWSSDK_SOURCE=SYSTEM) * testing with minio on CI * set up a real test bucket and user for e2e testing * update docs and vignettes -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-3757) [R] R bindings for Flight RPC client
[ https://issues.apache.org/jira/browse/ARROW-3757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson updated ARROW-3757: --- Fix Version/s: 2.0.0 > [R] R bindings for Flight RPC client > > > Key: ARROW-3757 > URL: https://issues.apache.org/jira/browse/ARROW-3757 > Project: Apache Arrow > Issue Type: New Feature > Components: FlightRPC, R >Reporter: Wes McKinney >Assignee: Neal Richardson >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-9761) [C++] Add experimental pull-based iterator structures to C interface implementation
[ https://issues.apache.org/jira/browse/ARROW-9761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Arrow JIRA Bot reassigned ARROW-9761: Assignee: Antoine Pitrou (was: Apache Arrow JIRA Bot) > [C++] Add experimental pull-based iterator structures to C interface > implementation > --- > > Key: ARROW-9761 > URL: https://issues.apache.org/jira/browse/ARROW-9761 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The purpose of this would be to validate some initial use cases / workflows > prior to potentially formalizing the interface in the C ABI -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-9761) [C++] Add experimental pull-based iterator structures to C interface implementation
[ https://issues.apache.org/jira/browse/ARROW-9761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Arrow JIRA Bot reassigned ARROW-9761: Assignee: Apache Arrow JIRA Bot (was: Antoine Pitrou) > [C++] Add experimental pull-based iterator structures to C interface > implementation > --- > > Key: ARROW-9761 > URL: https://issues.apache.org/jira/browse/ARROW-9761 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Apache Arrow JIRA Bot >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The purpose of this would be to validate some initial use cases / workflows > prior to potentially formalizing the interface in the C ABI -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9761) [C++] Add experimental pull-based iterator structures to C interface implementation
[ https://issues.apache.org/jira/browse/ARROW-9761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9761: -- Labels: pull-request-available (was: ) > [C++] Add experimental pull-based iterator structures to C interface > implementation > --- > > Key: ARROW-9761 > URL: https://issues.apache.org/jira/browse/ARROW-9761 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > The purpose of this would be to validate some initial use cases / workflows > prior to potentially formalizing the interface in the C ABI -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9853) [RUST] Implement "take" kernel for dictionary arrays
[ https://issues.apache.org/jira/browse/ARROW-9853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9853: -- Labels: pull-request-available (was: ) > [RUST] Implement "take" kernel for dictionary arrays > > > Key: ARROW-9853 > URL: https://issues.apache.org/jira/browse/ARROW-9853 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Affects Versions: 1.0.0 >Reporter: Jörn Horstmann >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8001) [R][Dataset] Bindings for dataset writing
[ https://issues.apache.org/jira/browse/ARROW-8001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson resolved ARROW-8001. Resolution: Fixed Issue resolved by pull request 8041 [https://github.com/apache/arrow/pull/8041] > [R][Dataset] Bindings for dataset writing > - > > Key: ARROW-8001 > URL: https://issues.apache.org/jira/browse/ARROW-8001 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Labels: dataset, pull-request-available > Fix For: 2.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > This was started in ARROW-8002 but there's more to implement and test -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9853) [RUST] Implement "take" kernel for dictionary arrays
Jörn Horstmann created ARROW-9853: - Summary: [RUST] Implement "take" kernel for dictionary arrays Key: ARROW-9853 URL: https://issues.apache.org/jira/browse/ARROW-9853 Project: Apache Arrow Issue Type: Improvement Components: Rust Affects Versions: 1.0.0 Reporter: Jörn Horstmann -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-9761) [C++] Add experimental pull-based iterator structures to C interface implementation
[ https://issues.apache.org/jira/browse/ARROW-9761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reassigned ARROW-9761: - Assignee: Antoine Pitrou > [C++] Add experimental pull-based iterator structures to C interface > implementation > --- > > Key: ARROW-9761 > URL: https://issues.apache.org/jira/browse/ARROW-9761 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Major > Fix For: 2.0.0 > > > The purpose of this would be to validate some initial use cases / workflows > prior to potentially formalizing the interface in the C ABI -- This message was sent by Atlassian Jira (v8.3.4#803005)
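As a rough illustration of what a "pull-based iterator structure" in ARROW-9761 could look like, the sketch below models a consumer-driven stream in Python: the consumer asks for the schema, repeatedly pulls the next batch until the end is signalled, and then releases the stream. The class `PullStream`, its method names and the helper `drain` are assumptions made for this example only; the issue text does not specify the actual C structures.

```python
from typing import Iterable, Iterator, List, Optional


class PullStream:
    """Hypothetical pull-based stream: the consumer drives iteration."""

    def __init__(self, schema: List[str], batches: Iterable[list]) -> None:
        self._schema = list(schema)
        self._it: Optional[Iterator[list]] = iter(batches)

    def _check_open(self) -> None:
        if self._it is None:
            raise RuntimeError("stream already released")

    def get_schema(self) -> List[str]:
        """Return the column names shared by every batch."""
        self._check_open()
        return list(self._schema)

    def get_next(self) -> Optional[list]:
        """Return the next batch, or None once the stream is exhausted."""
        self._check_open()
        return next(self._it, None)

    def release(self) -> None:
        """Drop the underlying producer; safe to call more than once."""
        self._it = None


def drain(stream: PullStream) -> List[list]:
    """Pull every batch, always releasing the stream afterwards."""
    out = []
    try:
        while True:
            batch = stream.get_next()
            if batch is None:
                break
            out.append(batch)
    finally:
        stream.release()
    return out


stream = PullStream(["a", "b"], [[(1, "x")], [(2, "y"), (3, "z")]])
schema = stream.get_schema()
batches = drain(stream)
```

The explicit `release` step mirrors the ownership style of a C interface, where the consumer, not a garbage collector, decides when the producer's resources are freed.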
[jira] [Commented] (ARROW-8040) [Python][Packaging] Add Parquet encryption / OpenSSL to Python wheels
[ https://issues.apache.org/jira/browse/ARROW-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184135#comment-17184135 ] Itamar Turner-Trauring commented on ARROW-8040: --- I would like to get this working (I am doing this on behalf of a client). The packaging side seems relatively simple: just adding the right flags to the build scripts, and maybe making sure OpenSSL is compiled in statically. However, there don't seem to be Python bindings for the encryption, or at least it's not clear to me how to use Parquet encryption from Python. Does that need to be done separately, or is there an example I can look at? Thanks! > [Python][Packaging] Add Parquet encryption / OpenSSL to Python wheels > - > > Key: ARROW-8040 > URL: https://issues.apache.org/jira/browse/ARROW-8040 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-9812) [Python] Map data types doesn't work from Arrow to Pandas and Parquet
[ https://issues.apache.org/jira/browse/ARROW-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184108#comment-17184108 ] Mayur Srivastava commented on ARROW-9812: - Thanks [~jorisvandenbossche] When ARROW-1644 is done, we can start using it for non-pandas use cases. > [Python] Map data types doesn't work from Arrow to Pandas and Parquet > - > > Key: ARROW-9812 > URL: https://issues.apache.org/jira/browse/ARROW-9812 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Mayur Srivastava >Priority: Major > > Hi, > I'm having problems using the 'map' data type in Arrow/parquet/pandas. > I'm able to convert a pandas data frame to Arrow with a map data type. > But, Arrow to Pandas doesn't work. > When I write Arrow to Parquet, it seems to work, but I'm not sure if the data > type is written correctly. > When I read back Parquet to Arrow, it fails saying "reading list of structs" > is not supported. It seems that map is stored as a list of structs. > There are two problems here: > # Map data type doesn't work from Arrow -> Pandas. > # Map data type doesn't get written to or read from Arrow -> Parquet. > Questions: > 1. Am I doing something wrong? Is there a way to get these to work? > 2. If these are unsupported features, will this be fixed in a future version? > Do you have plans or an ETA? > The following code example (followed by output) should demonstrate the issues: > I'm using Arrow 1.0.0 and Pandas 1.0.5. > Thanks! 
> Mayur > {code:java} > $ cat arrowtest.py > import pyarrow as pa > import pandas as pd > import pyarrow.parquet as pq > import traceback as tb > import io > print(f'PyArrow Version = {pa.__version__}') > print(f'Pandas Version = {pd.__version__}') > df1 = pd.DataFrame({'a': [[('b', '2')]]}) > print(f'df1') > print(f'{df1}') > print(f'Pandas -> Arrow') > try: > t1 = pa.Table.from_pandas(df1, schema=pa.schema([pa.field('a', > pa.map_(pa.string(), pa.string()))])) > print('PASSED') > print(t1) > except: > print(f'FAILED') > tb.print_exc() > print(f'Arrow -> Pandas') > try: > t1.to_pandas() > print('PASSED') > except: > print(f'FAILED') > tb.print_exc() > print(f'Arrow -> Parquet') > fh = io.BytesIO() > try: > pq.write_table(t1, fh) > print('PASSED') > except: > print('FAILED') > tb.print_exc() > > print(f'Parquet -> Arrow') > try: > t2 = pq.read_table(source=fh) > print('PASSED') > print(t2) > except: > print('FAILED') > tb.print_exc() > {code} > {code:java} > $ python3.6 arrowtest.py > PyArrow Version = 1.0.0 > Pandas Version = 1.0.5 > df1 > a 0 [(b, 2)] > > Pandas -> Arrow > PASSED > pyarrow.Table > a: map > child 0, entries: struct not null > child 0, key: string not null > child 1, value: string > > Arrow -> Pandas > FAILED > Traceback (most recent call last): > File "arrowtest.py", line 26, in t1.to_pandas() > File "pyarrow/array.pxi", line 715, in > pyarrow.lib._PandasConvertible.to_pandas > File "pyarrow/table.pxi", line 1565, in pyarrow.lib.Table._to_pandas File > "XXX/pyarrow/1/0/x/dist/lib/python3.6/pyarrow/pandas_compat.py", line 779, in > table_to_blockmanager blocks = _table_to_blocks(options, table, categories, > ext_columns_dtypes) > File "XXX/pyarrow/1/0/x/dist/lib/python3.6/pyarrow/pandas_compat.py", line > 1115, in _table_to_blocks list(extension_columns.keys())) > File "pyarrow/table.pxi", line 1028, in pyarrow.lib.table_to_blocks File > "pyarrow/error.pxi", line 105, in pyarrow.lib.check_status > pyarrow.lib.ArrowNotImplementedError: No known 
equivalent Pandas block for > Arrow data of type map is known. > > Arrow -> Parquet > PASSED > > Parquet -> Arrow > FAILED > Traceback (most recent call last): File "arrowtest.py", line 43, in > t2 = pq.read_table(source=fh) > File "XXX/pyarrow/1/0/x/dist/lib/python3.6/pyarrow/parquet.py", line 1586, in > read_table use_pandas_metadata=use_pandas_metadata) > File "XXX/pyarrow/1/0/x/dist/lib/python3.6/pyarrow/parquet.py", line 1474, in > read use_threads=use_threads > File "pyarrow/_dataset.pyx", line 399, in pyarrow._dataset.Dataset.to_table > File "pyarrow/_dataset.pyx", line 1994, in pyarrow._dataset.Scanner.to_table > File "pyarrow/error.pxi", line 122, in > pyarrow.lib.pyarrow_internal_check_status > File "pyarrow/error.pxi", line 105, in pyarrow.lib.check_status > pyarrow.lib.ArrowNotImplementedError: Reading lists of structs from Parquet > files not yet supported: key_value: list null, value: string> not null> not null > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
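The observation in ARROW-9812 that a map "is stored as list of structs" can be illustrated without pyarrow. The following is only a plain-Python sketch of that layout, assuming (as the printed schema in the report suggests) that each entry is a struct with a non-null `key` and a nullable `value`; the helper names `map_to_entries` and `entries_to_map` are hypothetical and are not pyarrow functions.

```python
from typing import Dict, List, Optional


def map_to_entries(m: Dict[str, Optional[str]]) -> List[dict]:
    """Flatten one map value into a list of {key, value} structs."""
    return [{"key": k, "value": v} for k, v in m.items()]


def entries_to_map(entries: List[dict]) -> Dict[str, Optional[str]]:
    """Rebuild a map from its list-of-structs form; keys must be non-null."""
    out: Dict[str, Optional[str]] = {}
    for entry in entries:
        if entry["key"] is None:
            raise ValueError("map keys must not be null")
        out[entry["key"]] = entry["value"]
    return out


row = {"b": "2"}                      # the single map value from the report
entries = map_to_entries(row)         # list-of-structs representation
roundtrip = entries_to_map(entries)   # back to a mapping
```

Seen this way, a reader that cannot yet handle lists of structs cannot handle maps either, which is consistent with the error message quoted in the report.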
[jira] [Assigned] (ARROW-9852) [C++] Fix crash on invalid IPC input (OSS-Fuzz)
[ https://issues.apache.org/jira/browse/ARROW-9852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Arrow JIRA Bot reassigned ARROW-9852: Assignee: Apache Arrow JIRA Bot (was: Antoine Pitrou) > [C++] Fix crash on invalid IPC input (OSS-Fuzz) > --- > > Key: ARROW-9852 > URL: https://issues.apache.org/jira/browse/ARROW-9852 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Assignee: Apache Arrow JIRA Bot >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-9852) [C++] Fix crash on invalid IPC input (OSS-Fuzz)
[ https://issues.apache.org/jira/browse/ARROW-9852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Arrow JIRA Bot reassigned ARROW-9852: Assignee: Antoine Pitrou (was: Apache Arrow JIRA Bot) > [C++] Fix crash on invalid IPC input (OSS-Fuzz) > --- > > Key: ARROW-9852 > URL: https://issues.apache.org/jira/browse/ARROW-9852 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9852) [C++] Fix crash on invalid IPC input (OSS-Fuzz)
[ https://issues.apache.org/jira/browse/ARROW-9852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9852: -- Labels: pull-request-available (was: ) > [C++] Fix crash on invalid IPC input (OSS-Fuzz) > --- > > Key: ARROW-9852 > URL: https://issues.apache.org/jira/browse/ARROW-9852 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9852) [C++] Fix crash on invalid IPC input (OSS-Fuzz)
Antoine Pitrou created ARROW-9852: - Summary: [C++] Fix crash on invalid IPC input (OSS-Fuzz) Key: ARROW-9852 URL: https://issues.apache.org/jira/browse/ARROW-9852 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Antoine Pitrou Assignee: Antoine Pitrou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-9275) [Rust] – Async Sans IO: R/W into/to Arrow Arrays
[ https://issues.apache.org/jira/browse/ARROW-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184086#comment-17184086 ] Andrew Lamb commented on ARROW-9275: In general, I think the notion of implementing async Parquet and Arrow APIs that don't rely on tokio or other executors is a good idea. I think in order to make the crate as widely useful as possible, it should also retain a synchronous API for use with the Rust standard library. One pattern I have seen is using an `async` crate option that adds the appropriate async options (and possibly additional dependencies). For example, https://docs.rs/bzip2/0.4.1/bzip2/#async-io > [Rust] – Async Sans IO: R/W into/to Arrow Arrays > > > Key: ARROW-9275 > URL: https://issues.apache.org/jira/browse/ARROW-9275 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Mahmut Bulut >Assignee: Mahmut Bulut >Priority: Major > > This issue can be considered an epic that spans other Arrow > projects. > *Drill down* > Currently, traits like `ParquetReader` only allow a synchronous interface, which > uses a BufReader with a constant 8KB buffer. Over the network, this becomes a > problem. This is easily solvable with differential buffers. In addition > to this shortcoming, an executor engine is needed to schedule > from async trait methods to sync trait methods; it should sit somewhere in > between to make requests to external IO asynchronous. On-disk IO is > acceptable with the approach we currently have, since no reliable evented IO > exists for on-disk IO on major platforms. > All things considered, abstractions that expose asynchronous IO without > any involvement from executors need to be exposed. > > *Design Suggestions & Considerations* > The design should apply and consider: > * Sans IO (for more information about the Sans approach, please see > [https://sans-io.readthedocs.io/] ) > * Not including any executor-specific data, at all. 
> * Tests should work with any executor with little to no modification. > * Buffers are adjusted accordingly and use differential buffers to optimize > network trips. > * Sync IO shouldn't be touched. At all costs. If we try to unify Sync IO > traits or we do overlapping implementation, that will make our life harder in > the future. Sans IO should be compartmentalized. > > *Notes* > If the Sans approach is not taken: > * the project will use an extreme amount of dependencies. > * it will not be compatible with other Rust code at all. > * it will break currently working code that uses array ingestion. > * integration tests are going to be harder. > * it will be really hard to adapt when completion-based APIs stabilize in the > future. (in the user projects) > * this suggestion is not about the flight format or any flight-related > information atm. This is purely about making on-disk and remote IO (provider backends > like AWS etc.) async. > > *Open points* > A couple of open points: > * Identifying traits that are going to be asyncized. > * Designing internal routines. > * package name to expose. > * Gather traits into the designated packages in all file formats. -- This message was sent by Atlassian Jira (v8.3.4#803005)
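For readers unfamiliar with the Sans IO idea referenced in ARROW-9275, the sketch below shows the general shape in Python (not Rust, and not the proposed Arrow design): the decoder never reads from a socket or file itself, it is only fed bytes and hands back whatever complete frames it can extract, so the same code works under synchronous or asynchronous callers. The framing (a 4-byte little-endian length prefix) and the names `FrameDecoder` and `encode_frame` are assumptions chosen for the example.

```python
import struct
from typing import List


class FrameDecoder:
    """Sans-IO decoder: bytes in, complete frames out, no I/O performed."""

    HEADER_SIZE = 4  # assumed framing: 4-byte little-endian payload length

    def __init__(self) -> None:
        self._buf = bytearray()

    def receive_data(self, data: bytes) -> List[bytes]:
        """Buffer incoming bytes and return every frame completed so far."""
        self._buf += data
        frames: List[bytes] = []
        while len(self._buf) >= self.HEADER_SIZE:
            (length,) = struct.unpack_from("<I", self._buf, 0)
            end = self.HEADER_SIZE + length
            if len(self._buf) < end:
                break  # payload not fully received yet
            frames.append(bytes(self._buf[self.HEADER_SIZE:end]))
            del self._buf[:end]
        return frames


def encode_frame(payload: bytes) -> bytes:
    """Prefix a payload with its length, matching the decoder's framing."""
    return struct.pack("<I", len(payload)) + payload


# The caller owns the I/O; here the "network" is just arbitrary byte chunks.
wire = encode_frame(b"hello") + encode_frame(b"arrow")
decoder = FrameDecoder()
frames = (
    decoder.receive_data(wire[:3])
    + decoder.receive_data(wire[3:11])
    + decoder.receive_data(wire[11:])
)
```

Because the decoder holds no reference to an executor or transport, tests can drive it with plain byte strings, which matches the "tests should work with any executor" goal listed above.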
[jira] [Updated] (ARROW-9851) [C++] Valgrind errors due to unrecognized instructions
[ https://issues.apache.org/jira/browse/ARROW-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9851: -- Labels: pull-request-available (was: ) > [C++] Valgrind errors due to unrecognized instructions > -- > > Key: ARROW-9851 > URL: https://issues.apache.org/jira/browse/ARROW-9851 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Valgrind seems to barf on AVX512 instructions: > https://github.com/ursa-labs/crossbow/runs/1025065792 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-9851) [C++] Valgrind errors due to unrecognized instructions
[ https://issues.apache.org/jira/browse/ARROW-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184010#comment-17184010 ] Antoine Pitrou commented on ARROW-9851: --- AVX512 support is still not merged in Valgrind mainline: https://bugs.kde.org/show_bug.cgi?id=383010 > [C++] Valgrind errors due to unrecognized instructions > -- > > Key: ARROW-9851 > URL: https://issues.apache.org/jira/browse/ARROW-9851 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Priority: Major > > Valgrind seems to barf on AVX512 instructions: > https://github.com/ursa-labs/crossbow/runs/1025065792 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9851) [C++] Valgrind errors due to unrecognized instructions
Antoine Pitrou created ARROW-9851: - Summary: [C++] Valgrind errors due to unrecognized instructions Key: ARROW-9851 URL: https://issues.apache.org/jira/browse/ARROW-9851 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Antoine Pitrou Valgrind seems to barf on AVX512 instructions: https://github.com/ursa-labs/crossbow/runs/1025065792 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9781) [C++] Fix uninitialized value warnings
[ https://issues.apache.org/jira/browse/ARROW-9781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-9781: -- Description: The nightly valgrind build shows warnings due to uninitialized values: [https://github.com/ursa-labs/crossbow/runs/996955686] (was: The nightly valgrind build has failures due to uninitialized values: https://github.com/ursa-labs/crossbow/runs/996955686) > [C++] Fix uninitialized value warnings > -- > > Key: ARROW-9781 > URL: https://issues.apache.org/jira/browse/ARROW-9781 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > The nightly valgrind build shows warnings due to uninitialized values: > [https://github.com/ursa-labs/crossbow/runs/996955686] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9781) [C++] Fix uninitialized value warnings
[ https://issues.apache.org/jira/browse/ARROW-9781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-9781: -- Summary: [C++] Fix uninitialized value warnings (was: [C++] Fix valgrind uninitialized value warnings) > [C++] Fix uninitialized value warnings > -- > > Key: ARROW-9781 > URL: https://issues.apache.org/jira/browse/ARROW-9781 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > The nightly valgrind build has failures due to uninitialized values: > https://github.com/ursa-labs/crossbow/runs/996955686 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-9813) [C++] Disable semantic interposition
[ https://issues.apache.org/jira/browse/ARROW-9813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Arrow JIRA Bot reassigned ARROW-9813: Assignee: Antoine Pitrou (was: Apache Arrow JIRA Bot) > [C++] Disable semantic interposition > > > Key: ARROW-9813 > URL: https://issues.apache.org/jira/browse/ARROW-9813 > Project: Apache Arrow > Issue Type: Wish > Components: C++ >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Trivial > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > On gcc, semantic interposition is enabled by default. It can be beneficial to > disable it when building Arrow libraries (and it's most certainly harmless > anyway). > See > https://stackoverflow.com/questions/35745543/new-option-in-gcc-5-3-fno-semantic-interposition > for more background on this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-9813) [C++] Disable semantic interposition
[ https://issues.apache.org/jira/browse/ARROW-9813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Arrow JIRA Bot reassigned ARROW-9813: Assignee: Apache Arrow JIRA Bot (was: Antoine Pitrou) > [C++] Disable semantic interposition > > > Key: ARROW-9813 > URL: https://issues.apache.org/jira/browse/ARROW-9813 > Project: Apache Arrow > Issue Type: Wish > Components: C++ >Reporter: Antoine Pitrou >Assignee: Apache Arrow JIRA Bot >Priority: Trivial > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > On gcc, semantic interposition is enabled by default. It can be beneficial to > disable it when building Arrow libraries (and it's most certainly harmless > anyway). > See > https://stackoverflow.com/questions/35745543/new-option-in-gcc-5-3-fno-semantic-interposition > for more background on this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9813) [C++] Disable semantic interposition
[ https://issues.apache.org/jira/browse/ARROW-9813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9813: -- Labels: pull-request-available (was: ) > [C++] Disable semantic interposition > > > Key: ARROW-9813 > URL: https://issues.apache.org/jira/browse/ARROW-9813 > Project: Apache Arrow > Issue Type: Wish > Components: C++ >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Trivial > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > On gcc, semantic interposition is enabled by default. It can be beneficial to > disable it when building Arrow libraries (and it's most certainly harmless > anyway). > See > https://stackoverflow.com/questions/35745543/new-option-in-gcc-5-3-fno-semantic-interposition > for more background on this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-9813) [C++] Disable semantic interposition
[ https://issues.apache.org/jira/browse/ARROW-9813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reassigned ARROW-9813: - Assignee: Antoine Pitrou > [C++] Disable semantic interposition > > > Key: ARROW-9813 > URL: https://issues.apache.org/jira/browse/ARROW-9813 > Project: Apache Arrow > Issue Type: Wish > Components: C++ >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Trivial > Fix For: 2.0.0 > > > On gcc, semantic interposition is enabled by default. It can be beneficial to > disable it when building Arrow libraries (and it's most certainly harmless > anyway). > See > https://stackoverflow.com/questions/35745543/new-option-in-gcc-5-3-fno-semantic-interposition > for more background on this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-9702) [C++] Move bpacking simd to runtime path
[ https://issues.apache.org/jira/browse/ARROW-9702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-9702. --- Fix Version/s: 2.0.0 Resolution: Fixed Issue resolved by pull request 7940 [https://github.com/apache/arrow/pull/7940] > [C++] Move bpacking simd to runtime path > > > Key: ARROW-9702 > URL: https://issues.apache.org/jira/browse/ARROW-9702 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Frank Du >Assignee: Frank Du >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Currently there is some static AVX512 SIMD code for the unpack32 function; it > should be reworked to use the runtime path. It can also be implemented with AVX2. > > The unpack32 API is used by PlainDecodingBoolean. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9844) [Go][CI] Add Travis CI job for Go on s390x
[ https://issues.apache.org/jira/browse/ARROW-9844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9844: -- Labels: pull-request-available (was: ) > [Go][CI] Add Travis CI job for Go on s390x > -- > > Key: ARROW-9844 > URL: https://issues.apache.org/jira/browse/ARROW-9844 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, Go >Reporter: Vivian Kong >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > As suggested in [https://github.com/apache/arrow/pull/8011], add a Travis CI > job for Go on s390x. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-9820) [C++] Plugin Architecture for Filesystem and File IO
[ https://issues.apache.org/jira/browse/ARROW-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183975#comment-17183975 ] Antoine Pitrou edited comment on ARROW-9820 at 8/25/20, 12:21 PM: -- Thanks for posting this. I agree it would be a good idea to allow adding custom filesystem implementations. Some more comments: 1) Arrow C++ is one specific library implementing the Arrow format. Other Arrow implementations don't necessarily provide the same facilities. That said, the ones that bind around Arrow C++ (e.g. PyArrow) generally expose the facilities that are in Arrow C++. 2) If using C rather than C++, how would we handle lifetime and ownership issues? That sounds like a can of worms. Arrow C++ is using C++ for a reason... (if someone OTOH wants to write a C Arrow implementation, nobody will object :)) 3) runtime vs. compile-time: people shouldn't have to recompile Arrow C++ to add a new filesystem type. If that's what you mean by "runtime", then let's do that. OTOH, it doesn't have to be a "zero configuration" thing (i.e. it's ok to have to call a registration function). 4) filesystem API stability: we can change the API assuming there are *good* reasons to change it. But that's orthogonal to this issue, and you should open separate JIRAs for that. Given all this, perhaps you could tell us a bit more about what kind of plugin API you're expecting or able to work with. was (Author: pitrou): Thanks for posting this. I agree it would be a good idea to allow adding custom filesystem implementations. Some more comments: 1) Arrow C++ is one specific library implementing the Arrow format. Other Arrow implementations don't necessarily provide. That said, the ones that bind around Arrow C++ (e.g. PyArrow) generally expose the facilities that are in Arrow C++. 2) If using C rather than C++, how would we handle lifetime and ownership issues? That sounds like a can of worms. Arrow C++ is using C++ for a reason...
(if someone OTOH wants to write a C Arrow implementation, nobody will object :-)) 3) runtime vs. compile-time: people shouldn't have to recompile Arrow C++ to add a new filesystem type. If that's what you mean by "runtime", then let's do that. OTOH, it doesn't have to be a "zero configuration" thing (i.e. it's ok to have to call a registration function). 4) filesystem API stability: we can change the API assuming there are *good* reasons to change it. But that's orthogonal to this issue, and you should open separate JIRAs for that. Given all this, perhaps you could tell us a bit more about what kind of plugin API you're expecting or able to work with. > [C++] Plugin Architecture for Filesystem and File IO > > > Key: ARROW-9820 > URL: https://issues.apache.org/jira/browse/ARROW-9820 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Lawrence Chan >Priority: Minor > > Adding a new custom filesystem with corresponding file i/o streams is quite a > process at the moment. Looks like HDFS and S3FS are basically hardcoded in > many places. It would be useful to develop a plugin system to allow users to > interface with other data stores without maintaining a permanent fork with > hardcoded changes. > We can either do runtime plugins or compile-time plugins. Runtime is more > user-friendly, but with C++, ABI compatibility is fairly delicate. So we > would either want to use a C ABI or accept a you're-on-your-own situation > where the user is expected to be very careful with versioning and compiler > flags. > With compile-time plugins, maybe there's a way to have the cmake machinery > build third party code and also register those new URI schemes automatically. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-9820) [C++] Plugin Architecture for Filesystem and File IO
[ https://issues.apache.org/jira/browse/ARROW-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183975#comment-17183975 ] Antoine Pitrou commented on ARROW-9820: --- Thanks for posting this. I agree it would be a good idea to allow adding custom filesystem implementations. Some more comments: 1) Arrow C++ is one specific library implementing the Arrow format. Other Arrow implementations don't necessarily provide. That said, the ones that bind around Arrow C++ (e.g. PyArrow) generally expose the facilities that are in Arrow C++. 2) If using C rather than C++, how would we handle lifetime and ownership issues? That sounds like a can of worms. Arrow C++ is using C++ for a reason... (if someone OTOH wants to write a C Arrow implementation, nobody will object :-)) 3) runtime vs. compile-time: people shouldn't have to recompile Arrow C++ to add a new filesystem type. If that's what you mean by "runtime", then let's do that. OTOH, it doesn't have to be a "zero configuration" thing (i.e. it's ok to have to call a registration function). 4) filesystem API stability: we can change the API assuming there are *good* reasons to change it. But that's orthogonal to this issue, and you should open separate JIRAs for that. Given all this, perhaps you could tell us a bit more about what kind of plugin API you're expecting or able to work with. > [C++] Plugin Architecture for Filesystem and File IO > > > Key: ARROW-9820 > URL: https://issues.apache.org/jira/browse/ARROW-9820 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Lawrence Chan >Priority: Minor > > Adding a new custom filesystem with corresponding file i/o streams is quite a > process at the moment. Looks like HDFS and S3FS are basically hardcoded in > many places. It would be useful to develop a plugin system to allow users to > interface with other data stores without maintaining a permanent fork with > hardcoded changes.
> We can either do runtime plugins or compile-time plugins. Runtime is more > user-friendly, but with C++, ABI compatibility is fairly delicate. So we > would either want to use a C ABI or accept a you're-on-your-own situation > where the user is expected to be very careful with versioning and compiler > flags. > With compile-time plugins, maybe there's a way to have the cmake machinery > build third party code and also register those new URI schemes automatically. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-9820) [C++] Plugin Architecture for Filesystem and File IO
[ https://issues.apache.org/jira/browse/ARROW-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183975#comment-17183975 ] Antoine Pitrou edited comment on ARROW-9820 at 8/25/20, 12:20 PM: -- Thanks for posting this. I agree it would be a good idea to allow adding custom filesystem implementations. Some more comments: 1) Arrow C++ is one specific library implementing the Arrow format. Other Arrow implementations don't necessarily provide. That said, the ones that bind around Arrow C++ (e.g. PyArrow) generally expose the facilities that are in Arrow C++. 2) If using C rather than C++, how would we handle lifetime and ownership issues? That sounds like a can of worms. Arrow C++ is using C++ for a reason... (if someone OTOH wants to write a C Arrow implementation, nobody will object :-)) 3) runtime vs. compile-time: people shouldn't have to recompile Arrow C++ to add a new filesystem type. If that's what you mean by "runtime", then let's do that. OTOH, it doesn't have to be a "zero configuration" thing (i.e. it's ok to have to call a registration function). 4) filesystem API stability: we can change the API assuming there are *good* reasons to change it. But that's orthogonal to this issue, and you should open separate JIRAs for that. Given all this, perhaps you could tell us a bit more about what kind of plugin API you're expecting or able to work with. was (Author: pitrou): Thanks for posting this. I agree it would be a good idea to allow adding custom filesystem implementations. Some more comments: 1) Arrow C++ is one specific library implementing the Arrow format. Other Arrow implementations don't necessarily provide. That said, the ones that bind around Arrow C++ (e.g. PyArrow) generally expose the facilities that are in Arrow C++. 2) If using C rather than C++, how would we handle lifetime and ownership issues? That sounds like a can of worms. Arrow C++ is using C++ for a reason...
(if someone OTOH wants to write a C Arrow implementation, nobody will object :-)) 3) runtime vs. compile-time: people shouldn't have to recompile Arrow C++ to add a new filesystem type. If that's what you mean by "runtime", then let's do that. OTOH, it doesn't have to be a "zero configuration" thing (i.e. it's ok to have to call a registration function). 4) filesystem API stability: we can change the API assuming there are *good* reasons to change it. But that's orthogonal to this issue, and you should open separate JIRAs for that. Given all this, perhaps you could tell us a bit more about what kind of plugin API you're expecting or able to work with. > [C++] Plugin Architecture for Filesystem and File IO > > > Key: ARROW-9820 > URL: https://issues.apache.org/jira/browse/ARROW-9820 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Lawrence Chan >Priority: Minor > > Adding a new custom filesystem with corresponding file i/o streams is quite a > process at the moment. Looks like HDFS and S3FS are basically hardcoded in > many places. It would be useful to develop a plugin system to allow users to > interface with other data stores without maintaining a permanent fork with > hardcoded changes. > We can either do runtime plugins or compile-time plugins. Runtime is more > user-friendly, but with C++, ABI compatibility is fairly delicate. So we > would either want to use a C ABI or accept a you're-on-your-own situation > where the user is expected to be very careful with versioning and compiler > flags. > With compile-time plugins, maybe there's a way to have the cmake machinery > build third party code and also register those new URI schemes automatically. -- This message was sent by Atlassian Jira (v8.3.4#803005)
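Point 3 of the comment above (no recompilation, but an explicit registration call is acceptable) can be sketched roughly as follows. This is a language-neutral illustration written in Python, not Arrow's API: `register_filesystem`, `filesystem_from_uri`, `InMemoryFS` and the `mem` scheme are all hypothetical names invented for the example.

```python
from urllib.parse import urlsplit

# Hypothetical registry: URI scheme -> factory callable.
_FACTORIES = {}

def register_filesystem(scheme, factory):
    """Explicit registration call, made once by the plugin when it is loaded."""
    if scheme in _FACTORIES:
        raise ValueError(f"scheme already registered: {scheme}")
    _FACTORIES[scheme] = factory

def filesystem_from_uri(uri):
    """Pick the factory by URI scheme and return (filesystem, path)."""
    parts = urlsplit(uri)
    try:
        factory = _FACTORIES[parts.scheme]
    except KeyError:
        raise ValueError(f"no filesystem registered for {parts.scheme!r}") from None
    return factory(parts), parts.path

class InMemoryFS:
    """Stand-in for a third-party data store (hypothetical)."""
    def __init__(self, parts):
        self.host = parts.netloc

# The plugin registers itself; the core library is not rebuilt.
register_filesystem("mem", InMemoryFS)
fs, path = filesystem_from_uri("mem://bucket/data/file.parquet")
```

The core only knows the registry, so a new data store needs one call at load time rather than a fork. The ABI question raised in the issue description is not addressed by this sketch.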
[jira] [Assigned] (ARROW-9849) [Rust] [DataFusion] Make UDFs not need a Field
[ https://issues.apache.org/jira/browse/ARROW-9849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Arrow JIRA Bot reassigned ARROW-9849: Assignee: Jorge (was: Apache Arrow JIRA Bot) > [Rust] [DataFusion] Make UDFs not need a Field > -- > > Key: ARROW-9849 > URL: https://issues.apache.org/jira/browse/ARROW-9849 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Jorge >Assignee: Jorge >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > [https://github.com/apache/arrow/pull/7967] shows that it is possible to not > require users to pass a `Field` to UDF declarations and instead just pass a > `DataType`. > Let's deprecate `Field` there and just use `DataType` instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-9849) [Rust] [DataFusion] Make UDFs not need a Field
[ https://issues.apache.org/jira/browse/ARROW-9849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Arrow JIRA Bot reassigned ARROW-9849: Assignee: Apache Arrow JIRA Bot (was: Jorge) > [Rust] [DataFusion] Make UDFs not need a Field > -- > > Key: ARROW-9849 > URL: https://issues.apache.org/jira/browse/ARROW-9849 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Jorge >Assignee: Apache Arrow JIRA Bot >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > [https://github.com/apache/arrow/pull/7967] shows that it is possible to not > require users to pass a `Field` to UDF declarations and instead just pass a > `DataType`. > Let's deprecate `Field` there and just use `DataType` instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9849) [Rust] [DataFusion] Make UDFs not need a Field
[ https://issues.apache.org/jira/browse/ARROW-9849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9849: -- Labels: pull-request-available (was: ) > [Rust] [DataFusion] Make UDFs not need a Field > -- > > Key: ARROW-9849 > URL: https://issues.apache.org/jira/browse/ARROW-9849 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Jorge >Assignee: Jorge >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > [https://github.com/apache/arrow/pull/7967] shows that it is possible to not > require users to pass a `Field` to UDF declarations and instead just pass a > `DataType`. > Let's deprecate `Field` there and just use `DataType` instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-9699) [C++][Compute] Improve mode kernel performance for small integer types
[ https://issues.apache.org/jira/browse/ARROW-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-9699. --- Fix Version/s: 2.0.0 Resolution: Fixed Issue resolved by pull request 7963 [https://github.com/apache/arrow/pull/7963] > [C++][Compute] Improve mode kernel performance for small integer types > -- > > Key: ARROW-9699 > URL: https://issues.apache.org/jira/browse/ARROW-9699 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Yibo Cai >Assignee: Yibo Cai >Priority: Minor > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > The mode kernel uses a hash table to count distinct values. For small integer > types (bool, int8, uint8), counting directly with a value-indexed array can > be more efficient. This card is to evaluate the approach and upstream a patch > if workable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
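The two counting strategies described in ARROW-9699 can be compared in a small sketch. This is illustrative Python, not the Arrow C++ kernel from pull request 7963; the function names and the tie-break rule (smallest value wins) are assumptions made for the example.

```python
from collections import Counter

def mode_hash(values):
    """Baseline: count occurrences in a hash table, then pick the mode."""
    counts = Counter(values)
    # Highest count wins; ties go to the smallest value (assumed tie-break).
    return min(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def mode_uint8(values):
    """Count uint8 values in a 256-slot array indexed by the value itself."""
    counts = [0] * 256
    for v in values:
        counts[v] += 1  # direct index: no hashing, no collision handling
    # Same rule as above: highest count, then smallest value.
    best = max(range(256), key=lambda v: (counts[v], -v))
    return best, counts[best]

data = [3, 1, 3, 255, 1, 3]
result_hash = mode_hash(data)
result_array = mode_uint8(data)
```

Both functions return a `(value, count)` pair. The array variant works because the value range is tiny and known in advance; for int8 the index would need an offset of 128, and for wider types the table would be too large to be worthwhile.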
[jira] [Updated] (ARROW-9850) [Go] Defer should not be used in the loop
[ https://issues.apache.org/jira/browse/ARROW-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9850: -- Labels: pull-request-available (was: ) > [Go] Defer should not be used in the loop > - > > Key: ARROW-9850 > URL: https://issues.apache.org/jira/browse/ARROW-9850 > Project: Apache Arrow > Issue Type: Improvement > Components: Go >Affects Versions: 1.0.0 >Reporter: FredGan >Priority: Blocker > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > As described in the second section of > [https://blog.learngoprogramming.com/gotchas-of-defer-in-go-1-8d070894cb01], > > using defer inside a loop may cause unforeseen problems. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9850) [Go] Defer should not be used in the loop
FredGan created ARROW-9850: -- Summary: [Go] Defer should not be used in the loop Key: ARROW-9850 URL: https://issues.apache.org/jira/browse/ARROW-9850 Project: Apache Arrow Issue Type: Improvement Components: Go Affects Versions: 1.0.0 Reporter: FredGan As described in the second section of [https://blog.learngoprogramming.com/gotchas-of-defer-in-go-1-8d070894cb01], using defer inside a loop may cause unforeseen problems. -- This message was sent by Atlassian Jira (v8.3.4#803005)
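In Go, deferred calls run when the surrounding function returns, not at the end of each loop iteration, so a `defer` placed in a loop body keeps every resource open until the function exits. The sketch below is a rough Python model of that behaviour, not Go code and not the Arrow Go fix: `contextlib.ExitStack` stands in for Go's defer stack, and `Tracker` and both function names are hypothetical.

```python
from contextlib import ExitStack

class Tracker:
    """Counts how many resources are open at once (hypothetical helper)."""
    def __init__(self):
        self.open_now = 0
        self.peak = 0

    def open(self):
        self.open_now += 1
        self.peak = max(self.peak, self.open_now)

    def close(self):
        self.open_now -= 1

def cleanup_at_function_exit(n):
    """Models `defer r.Close()` written directly in a loop body."""
    t = Tracker()
    with ExitStack() as deferred:      # callbacks run only when this block ends
        for _ in range(n):
            t.open()
            deferred.callback(t.close)  # queued, not run in this iteration
    return t.peak, t.open_now

def cleanup_per_iteration(n):
    """Models moving the loop body into its own function, or closing explicitly."""
    t = Tracker()
    for _ in range(n):
        with ExitStack() as deferred:  # scope ends with each iteration
            t.open()
            deferred.callback(t.close)
    return t.peak, t.open_now

peak_deferred, _ = cleanup_at_function_exit(5)
peak_scoped, _ = cleanup_per_iteration(5)
```

Each function returns `(peak, still_open)`. In the first variant the peak grows with the number of iterations, which is the kind of resource build-up the issue warns about; in the second it stays at one.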