[jira] [Resolved] (ARROW-10292) [Rust] [DataFusion] Simplify merge
[ https://issues.apache.org/jira/browse/ARROW-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jorge Leitão resolved ARROW-10292. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 8453 [https://github.com/apache/arrow/pull/8453] > [Rust] [DataFusion] Simplify merge > -- > > Key: ARROW-10292 > URL: https://issues.apache.org/jira/browse/ARROW-10292 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Jorge Leitão >Assignee: Jorge Leitão >Priority: Minor > Labels: pull-request-available > Fix For: 3.0.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-10289) [Rust] Support reading dictionary streams
[ https://issues.apache.org/jira/browse/ARROW-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jorge Leitão reassigned ARROW-10289: Assignee: Neville Dipale > [Rust] Support reading dictionary streams > - > > Key: ARROW-10289 > URL: https://issues.apache.org/jira/browse/ARROW-10289 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust >Affects Versions: 2.0.0 >Reporter: Neville Dipale >Assignee: Neville Dipale >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > We support reading dictionaries in the IPC file reader. > We should do the same with the stream reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-10289) [Rust] Support reading dictionary streams
[ https://issues.apache.org/jira/browse/ARROW-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jorge Leitão resolved ARROW-10289. -- Fix Version/s: 2.0.0 Resolution: Fixed Issue resolved by pull request 8450 [https://github.com/apache/arrow/pull/8450] > [Rust] Support reading dictionary streams > - > > Key: ARROW-10289 > URL: https://issues.apache.org/jira/browse/ARROW-10289 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust >Affects Versions: 2.0.0 >Reporter: Neville Dipale >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > We support reading dictionaries in the IPC file reader. > We should do the same with the stream reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-10276) Armv7 orc and flight not supported for build. Compat error on using with spark
[ https://issues.apache.org/jira/browse/ARROW-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213567#comment-17213567 ] utsav commented on ARROW-10276: --- An update. I upgraded to Spark 3.0.1 and received the same error > Armv7 orc and flight not supported for build. Compat error on using with spark > -- > > Key: ARROW-10276 > URL: https://issues.apache.org/jira/browse/ARROW-10276 > Project: Apache Arrow > Issue Type: Bug >Affects Versions: 0.17.0 >Reporter: utsav >Priority: Major > Attachments: arrow_compat_error, build_pip_wheel.sh, > dpu_stream_spark.ipynb, get_arrow_and_create_venv.sh, run_build.sh > > > I'm using a Arm Cortex A9 processor on the Xilinx Pynq Z2 board. People have > tried to use it for the raspberry pi 3 without luck in previous posts. > I figured out how to successfully build it for armv7 using the script below > but cannot use orc and flight flags. People had looked into it in ARROW-8420 > but I don't know if they faced these issues. > I tried converting a spark dataframe to pandas using pyarrow but now it > complains about a compat feature. I have attached images below > Any help would be appreciated. Thanks > Spark Version: 2.4.5. > The code is as follows: > ``` > import pandas as pd > df_pd = df.toPandas() > npArr = df_pd.to_numpy() > ``` > The error is as follows:- > ``` > /opt/spark/python/pyspark/sql/dataframe.py:2110: UserWarning: toPandas > attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is > set to true; however, failed by the reason below: > module 'pyarrow' has no attribute 'compat' > Attempting non-optimization as 'spark.sql.execution.arrow.fallback.enabled' > is set to true. > warnings.warn(msg) > ``` > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10304) [C++][Compute] Optimize variance kernel for integers
Yibo Cai created ARROW-10304: Summary: [C++][Compute] Optimize variance kernel for integers Key: ARROW-10304 URL: https://issues.apache.org/jira/browse/ARROW-10304 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Yibo Cai Assignee: Yibo Cai The current variance kernel converts all data types to `double` before calculation. This is sub-optimal for integers. Integer arithmetic is much faster than floating-point arithmetic; e.g., summation is 4x faster [1]. A quick test calculating int32 variance shows up to a 3x performance gain. Another benefit is that integer arithmetic is exact. [1] https://quick-bench.com/q/_Sz-Peq1MNWYwZYrTtQDx3GI7lQ -- This message was sent by Atlassian Jira (v8.3.4#803005)
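The speedup the report describes comes from accumulating exact integer sums and deferring floating-point math to the very end. A minimal sketch of that idea (illustrative only — the function name and the choice of i64 accumulators are assumptions, not Arrow's kernel code):

```rust
// Population variance of i32 values via exact integer accumulation.
// Each element is widened to i64 once; no per-element double rounding.
fn variance_i32(values: &[i32]) -> f64 {
    if values.is_empty() {
        return f64::NAN;
    }
    let n = values.len() as f64;
    // Integer sums are exact; i64 is wide enough for moderate batch sizes.
    let sum: i64 = values.iter().map(|&v| v as i64).sum();
    let sum_sq: i64 = values.iter().map(|&v| (v as i64) * (v as i64)).sum();
    // var = E[x^2] - E[x]^2, computed in floating point only at the end.
    (sum_sq as f64 - (sum as f64) * (sum as f64) / n) / n
}

fn main() {
    // Population variance of 1..=5 is 2.
    assert_eq!(variance_i32(&[1, 2, 3, 4, 5]), 2.0);
}
```

Note that an i64 sum of squares can overflow for very large batches of large-magnitude values; a production kernel would need overflow handling or chunked accumulation.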
[jira] [Issue Comment Deleted] (ARROW-10276) Armv7 orc and flight not supported for build. Compat error on using with spark
[ https://issues.apache.org/jira/browse/ARROW-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] utsav updated ARROW-10276: -- Comment: was deleted (was: An update. I tried running the code in a script 20/10/13 23:29:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 20/10/13 23:29:13 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041. 20/10/13 23:29:31 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes +---+--+ |_c0|_c1| +---+--+ |1582999200|1| |1582999260|1| |1582999320|1| |1582999380|1| |1582999440|1| |1582999500|1| |1582999560|1| |1582999620|1| |1582999680|1| |1582999740|1| |1582999800|1| |1582999860|1| |158220|1| |158280|1| |158340|1| |1583000100|1| |1583000160|1| |1583000220|1| |1583000280|1| |1583000340|1| +---+--+ only showing top 20 rows /opt/spark/python/pyspark/sql/dataframe.py:2110: UserWarning: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true; however, failed by the reason below: PyArrow >= 0.8.0 must be installed; however, it was not found. Attempting non-optimization as 'spark.sql.execution.arrow.fallback.enabled' is set to true. warnings.warn(msg) I then did:- `pip3 show pyarrow ` Name: pyarrow Version: 0.17.0 Summary: Python library for Apache Arrow Home-page: [https://arrow.apache.org/] Author: Apache Arrow Developers Author-email: d...@arrow.apache.org License: Apache License, Version 2.0 Location: /home/xilinx/.local/lib/python3.6/site-packages Requires: numpy Required-by: It definitely exist in my PYTHONPATH as I added the following in bashrc and sourced it to activate `export PYTHONPATH=/home/xilinx/.local/lib/python3.6/site-packages:$PYTHONPATH` ) > Armv7 orc and flight not supported for build. 
Compat error on using with spark > -- > > Key: ARROW-10276 > URL: https://issues.apache.org/jira/browse/ARROW-10276 > Project: Apache Arrow > Issue Type: Bug >Affects Versions: 0.17.0 >Reporter: utsav >Priority: Major > Attachments: arrow_compat_error, build_pip_wheel.sh, > dpu_stream_spark.ipynb, get_arrow_and_create_venv.sh, run_build.sh > > > I'm using a Arm Cortex A9 processor on the Xilinx Pynq Z2 board. People have > tried to use it for the raspberry pi 3 without luck in previous posts. > I figured out how to successfully build it for armv7 using the script below > but cannot use orc and flight flags. People had looked into it in ARROW-8420 > but I don't know if they faced these issues. > I tried converting a spark dataframe to pandas using pyarrow but now it > complains about a compat feature. I have attached images below > Any help would be appreciated. Thanks > Spark Version: 2.4.5. > The code is as follows: > ``` > import pandas as pd > df_pd = df.toPandas() > npArr = df_pd.to_numpy() > ``` > The error is as follows:- > ``` > /opt/spark/python/pyspark/sql/dataframe.py:2110: UserWarning: toPandas > attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is > set to true; however, failed by the reason below: > module 'pyarrow' has no attribute 'compat' > Attempting non-optimization as 'spark.sql.execution.arrow.fallback.enabled' > is set to true. > warnings.warn(msg) > ``` > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-10276) Armv7 orc and flight not supported for build. Compat error on using with spark
[ https://issues.apache.org/jira/browse/ARROW-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213484#comment-17213484 ] utsav edited comment on ARROW-10276 at 10/13/20, 11:34 PM: --- An update. I tried running the code in a script 20/10/13 23:29:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 20/10/13 23:29:13 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041. 20/10/13 23:29:31 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes +---+--+ |_c0|_c1| +---+--+ |1582999200|1| |1582999260|1| |1582999320|1| |1582999380|1| |1582999440|1| |1582999500|1| |1582999560|1| |1582999620|1| |1582999680|1| |1582999740|1| |1582999800|1| |1582999860|1| |158220|1| |158280|1| |158340|1| |1583000100|1| |1583000160|1| |1583000220|1| |1583000280|1| |1583000340|1| +---+--+ only showing top 20 rows /opt/spark/python/pyspark/sql/dataframe.py:2110: UserWarning: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true; however, failed by the reason below: PyArrow >= 0.8.0 must be installed; however, it was not found. Attempting non-optimization as 'spark.sql.execution.arrow.fallback.enabled' is set to true. 
warnings.warn(msg) I then did:- `pip3 show pyarrow ` Name: pyarrow Version: 0.17.0 Summary: Python library for Apache Arrow Home-page: [https://arrow.apache.org/] Author: Apache Arrow Developers Author-email: d...@arrow.apache.org License: Apache License, Version 2.0 Location: /home/xilinx/.local/lib/python3.6/site-packages Requires: numpy Required-by: It definitely exist in my PYTHONPATH as I added the following in bashrc and sourced it to activate `export PYTHONPATH=/home/xilinx/.local/lib/python3.6/site-packages:$PYTHONPATH` was (Author: utri092): An update. I tried running the code in a script 20/10/13 23:29:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 20/10/13 23:29:13 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041. 20/10/13 23:29:31 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes +--+---+ | _c0|_c1| +--+---+ |1582999200| 1| |1582999260| 1| |1582999320| 1| |1582999380| 1| |1582999440| 1| |1582999500| 1| |1582999560| 1| |1582999620| 1| |1582999680| 1| |1582999740| 1| |1582999800| 1| |1582999860| 1| |158220| 1| |158280| 1| |158340| 1| |1583000100| 1| |1583000160| 1| |1583000220| 1| |1583000280| 1| |1583000340| 1| +--+---+ only showing top 20 rows /opt/spark/python/pyspark/sql/dataframe.py:2110: UserWarning: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true; however, failed by the reason below: PyArrow >= 0.8.0 must be installed; however, it was not found. Attempting non-optimization as 'spark.sql.execution.arrow.fallback.enabled' is set to true. 
warnings.warn(msg) I then did pip3 show pyarrow Name: pyarrow Version: 0.17.0 Summary: Python library for Apache Arrow Home-page: https://arrow.apache.org/ Author: Apache Arrow Developers Author-email: d...@arrow.apache.org License: Apache License, Version 2.0 Location: /home/xilinx/.local/lib/python3.6/site-packages Requires: numpy Required-by: It definitely exist in my PYTHONPATH as I added the following in bashrc and sourced it to activate export PYTHONPATH=/home/xilinx/.local/lib/python3.6/site-packages:$PYTHONPATH > Armv7 orc and flight not supported for build. Compat error on using with spark > -- > > Key: ARROW-10276 > URL: https://issues.apache.org/jira/browse/ARROW-10276 > Project: Apache Arrow > Issue Type: Bug >Affects Versions: 0.17.0 >Reporter: utsav >Priority: Major > Attachments: arrow_compat_error, build_pip_wheel.sh, > dpu_stream_spark.ipynb, get_arrow_and_create_venv.sh, run_build.sh > > > I'm using a Arm Cortex A9 processor on the Xilinx Pynq Z2 board. People have > tried to use it for the raspberry pi 3 without luck in previous posts. > I figured out how to successfully build it for armv7 using the script below > but cannot use orc and flight flags. People had looked into it in ARROW-8420 > but I don't know if they faced these issues. > I tried converting a spark dataframe to pandas using
[jira] [Commented] (ARROW-10276) Armv7 orc and flight not supported for build. Compat error on using with spark
[ https://issues.apache.org/jira/browse/ARROW-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213484#comment-17213484 ] utsav commented on ARROW-10276: --- An update. I tried running the code in a script 20/10/13 23:29:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 20/10/13 23:29:13 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041. 20/10/13 23:29:31 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes +--+---+ | _c0|_c1| +--+---+ |1582999200| 1| |1582999260| 1| |1582999320| 1| |1582999380| 1| |1582999440| 1| |1582999500| 1| |1582999560| 1| |1582999620| 1| |1582999680| 1| |1582999740| 1| |1582999800| 1| |1582999860| 1| |158220| 1| |158280| 1| |158340| 1| |1583000100| 1| |1583000160| 1| |1583000220| 1| |1583000280| 1| |1583000340| 1| +--+---+ only showing top 20 rows /opt/spark/python/pyspark/sql/dataframe.py:2110: UserWarning: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true; however, failed by the reason below: PyArrow >= 0.8.0 must be installed; however, it was not found. Attempting non-optimization as 'spark.sql.execution.arrow.fallback.enabled' is set to true. 
warnings.warn(msg) I then did pip3 show pyarrow Name: pyarrow Version: 0.17.0 Summary: Python library for Apache Arrow Home-page: https://arrow.apache.org/ Author: Apache Arrow Developers Author-email: d...@arrow.apache.org License: Apache License, Version 2.0 Location: /home/xilinx/.local/lib/python3.6/site-packages Requires: numpy Required-by: It definitely exist in my PYTHONPATH as I added the following in bashrc and sourced it to activate export PYTHONPATH=/home/xilinx/.local/lib/python3.6/site-packages:$PYTHONPATH > Armv7 orc and flight not supported for build. Compat error on using with spark > -- > > Key: ARROW-10276 > URL: https://issues.apache.org/jira/browse/ARROW-10276 > Project: Apache Arrow > Issue Type: Bug >Affects Versions: 0.17.0 >Reporter: utsav >Priority: Major > Attachments: arrow_compat_error, build_pip_wheel.sh, > dpu_stream_spark.ipynb, get_arrow_and_create_venv.sh, run_build.sh > > > I'm using a Arm Cortex A9 processor on the Xilinx Pynq Z2 board. People have > tried to use it for the raspberry pi 3 without luck in previous posts. > I figured out how to successfully build it for armv7 using the script below > but cannot use orc and flight flags. People had looked into it in ARROW-8420 > but I don't know if they faced these issues. > I tried converting a spark dataframe to pandas using pyarrow but now it > complains about a compat feature. I have attached images below > Any help would be appreciated. Thanks > Spark Version: 2.4.5. > The code is as follows: > ``` > import pandas as pd > df_pd = df.toPandas() > npArr = df_pd.to_numpy() > ``` > The error is as follows:- > ``` > /opt/spark/python/pyspark/sql/dataframe.py:2110: UserWarning: toPandas > attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is > set to true; however, failed by the reason below: > module 'pyarrow' has no attribute 'compat' > Attempting non-optimization as 'spark.sql.execution.arrow.fallback.enabled' > is set to true. 
> warnings.warn(msg) > ``` > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-10276) Armv7 orc and flight not supported for build. Compat error on using with spark
[ https://issues.apache.org/jira/browse/ARROW-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213463#comment-17213463 ] utsav commented on ARROW-10276: --- [~uwe] according to ARROW-8420 I posted earlier in my issue. Support for armv7 was added only in 0.17.0. I cannot use 0.8.0. I tried to build and it failed. I even set {{export ARROW_PRE_0_15_IPC_FORMAT=1 in conf/spark-env.sh according to the link you sent me but no luck.}} > Armv7 orc and flight not supported for build. Compat error on using with spark > -- > > Key: ARROW-10276 > URL: https://issues.apache.org/jira/browse/ARROW-10276 > Project: Apache Arrow > Issue Type: Bug >Affects Versions: 0.17.0 >Reporter: utsav >Priority: Major > Attachments: arrow_compat_error, build_pip_wheel.sh, > dpu_stream_spark.ipynb, get_arrow_and_create_venv.sh, run_build.sh > > > I'm using a Arm Cortex A9 processor on the Xilinx Pynq Z2 board. People have > tried to use it for the raspberry pi 3 without luck in previous posts. > I figured out how to successfully build it for armv7 using the script below > but cannot use orc and flight flags. People had looked into it in ARROW-8420 > but I don't know if they faced these issues. > I tried converting a spark dataframe to pandas using pyarrow but now it > complains about a compat feature. I have attached images below > Any help would be appreciated. Thanks > Spark Version: 2.4.5. > The code is as follows: > ``` > import pandas as pd > df_pd = df.toPandas() > npArr = df_pd.to_numpy() > ``` > The error is as follows:- > ``` > /opt/spark/python/pyspark/sql/dataframe.py:2110: UserWarning: toPandas > attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is > set to true; however, failed by the reason below: > module 'pyarrow' has no attribute 'compat' > Attempting non-optimization as 'spark.sql.execution.arrow.fallback.enabled' > is set to true. > warnings.warn(msg) > ``` > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-10276) Armv7 orc and flight not supported for build. Compat error on using with spark
[ https://issues.apache.org/jira/browse/ARROW-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213449#comment-17213449 ] utsav edited comment on ARROW-10276 at 10/13/20, 10:07 PM: --- [~uwe] will try and let you know. I guess the orc and flight flags are separate issue in themselves. At the moment it cannot build with them set to On was (Author: utri092): [~uwe] will try and let you know. I guess the orc and flight flags are separate issue in themselves. At the moment it cannot build with them > Armv7 orc and flight not supported for build. Compat error on using with spark > -- > > Key: ARROW-10276 > URL: https://issues.apache.org/jira/browse/ARROW-10276 > Project: Apache Arrow > Issue Type: Bug >Affects Versions: 0.17.0 >Reporter: utsav >Priority: Major > Attachments: arrow_compat_error, build_pip_wheel.sh, > dpu_stream_spark.ipynb, get_arrow_and_create_venv.sh, run_build.sh > > > I'm using a Arm Cortex A9 processor on the Xilinx Pynq Z2 board. People have > tried to use it for the raspberry pi 3 without luck in previous posts. > I figured out how to successfully build it for armv7 using the script below > but cannot use orc and flight flags. People had looked into it in ARROW-8420 > but I don't know if they faced these issues. > I tried converting a spark dataframe to pandas using pyarrow but now it > complains about a compat feature. I have attached images below > Any help would be appreciated. Thanks > Spark Version: 2.4.5. > The code is as follows: > ``` > import pandas as pd > df_pd = df.toPandas() > npArr = df_pd.to_numpy() > ``` > The error is as follows:- > ``` > /opt/spark/python/pyspark/sql/dataframe.py:2110: UserWarning: toPandas > attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is > set to true; however, failed by the reason below: > module 'pyarrow' has no attribute 'compat' > Attempting non-optimization as 'spark.sql.execution.arrow.fallback.enabled' > is set to true. 
> warnings.warn(msg) > ``` > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-10276) Armv7 orc and flight not supported for build. Compat error on using with spark
[ https://issues.apache.org/jira/browse/ARROW-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213449#comment-17213449 ] utsav commented on ARROW-10276: --- [~uwe] will try and let you know. I guess the orc and flight flags are separate issue in themselves. At the moment it cannot build with them > Armv7 orc and flight not supported for build. Compat error on using with spark > -- > > Key: ARROW-10276 > URL: https://issues.apache.org/jira/browse/ARROW-10276 > Project: Apache Arrow > Issue Type: Bug >Affects Versions: 0.17.0 >Reporter: utsav >Priority: Major > Attachments: arrow_compat_error, build_pip_wheel.sh, > dpu_stream_spark.ipynb, get_arrow_and_create_venv.sh, run_build.sh > > > I'm using a Arm Cortex A9 processor on the Xilinx Pynq Z2 board. People have > tried to use it for the raspberry pi 3 without luck in previous posts. > I figured out how to successfully build it for armv7 using the script below > but cannot use orc and flight flags. People had looked into it in ARROW-8420 > but I don't know if they faced these issues. > I tried converting a spark dataframe to pandas using pyarrow but now it > complains about a compat feature. I have attached images below > Any help would be appreciated. Thanks > Spark Version: 2.4.5. > The code is as follows: > ``` > import pandas as pd > df_pd = df.toPandas() > npArr = df_pd.to_numpy() > ``` > The error is as follows:- > ``` > /opt/spark/python/pyspark/sql/dataframe.py:2110: UserWarning: toPandas > attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is > set to true; however, failed by the reason below: > module 'pyarrow' has no attribute 'compat' > Attempting non-optimization as 'spark.sql.execution.arrow.fallback.enabled' > is set to true. > warnings.warn(msg) > ``` > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10303) Parallel type transformation in CSV reader
Sergej Fries created ARROW-10303: Summary: Parallel type transformation in CSV reader Key: ARROW-10303 URL: https://issues.apache.org/jira/browse/ARROW-10303 Project: Apache Arrow Issue Type: Wish Components: Rust Reporter: Sergej Fries Attachments: tracing.png Currently, when a CSV file is read, a single thread is responsible both for reading the file and for transforming the returned string values into the correct data types. In my case, reading a 2 GB CSV file with a dozen float columns takes ~40 seconds. Of this time, only ~10% is spent reading the file, while ~68% is spent transforming the string values into the correct data types. My proposal is to parallelize the part responsible for the data type transformation. This seems quite simple to achieve, since after the CSV reader reads a batch, all projected columns are transformed one by one using an iterator over a vector followed by a map function. I believe that if one uses the rayon crate, the only changes needed are adjusting {{iter()}} to {{par_iter()}} and changing {{impl<R: Read> Reader<R>}} into {{impl<R: Read + std::marker::Sync> Reader<R>}}. But maybe I am overlooking something crucial (being quite new to Rust and Arrow). Any advice from someone experienced is therefore very welcome! -- This message was sent by Atlassian Jira (v8.3.4#803005)
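The proposal above — moving per-column string-to-type conversion onto multiple threads — can be sketched without rayon using std scoped threads (a toy illustration under assumed names: the real reader operates on record batches, and rayon's {{par_iter()}} would replace the explicit spawning):

```rust
use std::thread;

// Parse each projected CSV column's string cells into a typed vector on
// its own thread, instead of one column after another. Unparseable cells
// become NaN here purely for illustration.
fn parse_columns_parallel(columns: &[Vec<String>]) -> Vec<Vec<f64>> {
    thread::scope(|s| {
        // Spawn one parsing task per column; the scope guarantees the
        // borrows of `columns` outlive every thread.
        let handles: Vec<_> = columns
            .iter()
            .map(|col| {
                s.spawn(move || {
                    col.iter()
                        .map(|cell| cell.parse::<f64>().unwrap_or(f64::NAN))
                        .collect::<Vec<f64>>()
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let cols = vec![
        vec!["1.5".to_string(), "2.5".to_string()],
        vec!["10".to_string(), "oops".to_string()],
    ];
    let parsed = parse_columns_parallel(&cols);
    assert_eq!(parsed[0], vec![1.5, 2.5]);
    assert!(parsed[1][1].is_nan());
}
```

With rayon the same shape collapses to a {{par_iter().map(...).collect()}} over the columns, which is why the change proposed in the issue is so small.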
[jira] [Resolved] (ARROW-10253) [Python] Don't bundle plasma-store-server in pyarrow conda package
[ https://issues.apache.org/jira/browse/ARROW-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Korn resolved ARROW-10253. -- Resolution: Duplicate > [Python] Don't bundle plasma-store-server in pyarrow conda package > -- > > Key: ARROW-10253 > URL: https://issues.apache.org/jira/browse/ARROW-10253 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging, Python >Reporter: Uwe Korn >Assignee: Uwe Korn >Priority: Major > > We currently have it in the {{arrow-cpp}} and the {{pyarrow}} conda package, > we should only have it in {{arrow-cpp}} as this is always there and also the > source of the binary. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10302) [Python] Don't double-package plasma-store-server
[ https://issues.apache.org/jira/browse/ARROW-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-10302: --- Labels: pull-request-available (was: ) > [Python] Don't double-package plasma-store-server > - > > Key: ARROW-10302 > URL: https://issues.apache.org/jira/browse/ARROW-10302 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging, Python >Reporter: Uwe Korn >Assignee: Uwe Korn >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > This is part of the {{arrow-cpp}} and {{pyarrow}} conda packages. We > shouldn't ship the version in {{pyarrow}} as this is just a copy to a > different location. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10302) [Python] Don't double-package plasma-store-server
Uwe Korn created ARROW-10302: Summary: [Python] Don't double-package plasma-store-server Key: ARROW-10302 URL: https://issues.apache.org/jira/browse/ARROW-10302 Project: Apache Arrow Issue Type: Improvement Components: Packaging, Python Reporter: Uwe Korn Assignee: Uwe Korn Fix For: 3.0.0 This is part of the {{arrow-cpp}} and {{pyarrow}} conda packages. We shouldn't ship the version in {{pyarrow}} as this is just a copy to a different location. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10301) Add "all" boolean reducing kernel
Andrew Wieteska created ARROW-10301: --- Summary: Add "all" boolean reducing kernel Key: ARROW-10301 URL: https://issues.apache.org/jira/browse/ARROW-10301 Project: Apache Arrow Issue Type: New Feature Components: C++, Python Reporter: Andrew Wieteska Assignee: Andrew Wieteska Fix For: 3.0.0 As discussed on GitHub: [https://github.com/apache/arrow/pull/8294#discussion_r504034461] -- This message was sent by Atlassian Jira (v8.3.4#803005)
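As a sketch of what an "all" reducing kernel computes — true iff every valid value is true — here is a toy Rust version over a nullable boolean column (the null-skipping semantics shown are an assumption for illustration, not necessarily what the Arrow kernel settled on):

```rust
// "all" reduction over a nullable boolean column: None entries (nulls)
// are skipped; the result is true iff every valid value is true.
fn all_kernel(values: &[Option<bool>]) -> bool {
    values.iter().flatten().all(|&v| v)
}

fn main() {
    assert!(all_kernel(&[Some(true), None, Some(true)]));
    assert!(!all_kernel(&[Some(true), Some(false)]));
    assert!(all_kernel(&[])); // vacuously true, like std's all()
}
```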
[jira] [Updated] (ARROW-9164) [C++] Provide APIs for adding "docstrings" to arrow::compute::Function classes that can be accessed by bindings
[ https://issues.apache.org/jira/browse/ARROW-9164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9164: -- Labels: pull-request-available (was: ) > [C++] Provide APIs for adding "docstrings" to arrow::compute::Function > classes that can be accessed by bindings > --- > > Key: ARROW-9164 > URL: https://issues.apache.org/jira/browse/ARROW-9164 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10300) [Rust] Parquet/CSV TPC-H data
Remi Dettai created ARROW-10300: --- Summary: [Rust] Parquet/CSV TPC-H data Key: ARROW-10300 URL: https://issues.apache.org/jira/browse/ARROW-10300 Project: Apache Arrow Issue Type: Wish Components: Rust Reporter: Remi Dettai The TPC-H benchmark for DataFusion works with Parquet/CSV data, but the data generation routine described in the README generates `.tbl` data. Could we describe how the TPC-H Parquet/CSV data can be generated, to make the benchmark easier to set up and more reproducible? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10299) [Rust] Support reading and writing V5 of IPC metadata
Neville Dipale created ARROW-10299: -- Summary: [Rust] Support reading and writing V5 of IPC metadata Key: ARROW-10299 URL: https://issues.apache.org/jira/browse/ARROW-10299 Project: Apache Arrow Issue Type: Sub-task Components: Rust Affects Versions: 2.0.0 Reporter: Neville Dipale This mostly involves alignment issues and tracking where we encounter the v4 legacy padding. I had done this work in another branch, but discarded it without noticing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
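The alignment bookkeeping mentioned above largely reduces to rounding buffer offsets up to a boundary (V5 encapsulated IPC messages are 8-byte aligned, while the legacy V4 framing pads differently). A generic helper, as a sketch — this is not the arrow crate's API:

```rust
// Round `offset` up to the next multiple of `alignment` (a power of two).
fn align_up(offset: usize, alignment: usize) -> usize {
    assert!(alignment.is_power_of_two());
    (offset + alignment - 1) & !(alignment - 1)
}

fn main() {
    assert_eq!(align_up(13, 8), 16);
    assert_eq!(align_up(16, 8), 16); // already aligned: unchanged
    assert_eq!(align_up(0, 64), 0);
}
```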
[jira] [Updated] (ARROW-10295) [Rust] [DataFusion] Simplify accumulators
[ https://issues.apache.org/jira/browse/ARROW-10295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson updated ARROW-10295: Summary: [Rust] [DataFusion] Simplify accumulators (was: [Rist] [DataFusion] Simplify accumulators) > [Rust] [DataFusion] Simplify accumulators > - > > Key: ARROW-10295 > URL: https://issues.apache.org/jira/browse/ARROW-10295 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Jorge Leitão >Assignee: Jorge Leitão >Priority: Minor > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Replace Rc> by Box<>. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-10296) [R] Data saved as integer64 loaded as integer
[ https://issues.apache.org/jira/browse/ARROW-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson resolved ARROW-10296. - Fix Version/s: 2.0.0 Assignee: Neal Richardson Resolution: Duplicate This is a deliberate [feature|https://arrow.apache.org/docs/r/news/index.html#arrow-format-conversion], but in the upcoming release you'll be able to [disable it|https://github.com/apache/arrow/blob/master/r/NEWS.md#bug-fixes-and-other-enhancements]. > [R] Data saved as integer64 loaded as integer > - > > Key: ARROW-10296 > URL: https://issues.apache.org/jira/browse/ARROW-10296 > Project: Apache Arrow > Issue Type: Bug > Components: R >Affects Versions: 1.0.1 > Environment: R3.6.1, arrow 1.0.1, bit64 4.0.5 > full sessionIfno(): > R version 3.6.1 (2019-07-05) > Platform: x86_64-w64-mingw32/x64 (64-bit) > Running under: Windows 10 x64 (build 19041) > Matrix products: default > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United > States.1252LC_MONETARY=English_United States.1252 > [4] LC_NUMERIC=C LC_TIME=English_United States.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > loaded via a namespace (and not attached): > [1] Rcpp_1.0.5 fansi_0.4.1 arrow_1.0.1 dplyr_1.0.2 > crayon_1.3.4 assertthat_0.2.1 R6_2.4.1 lifecycle_0.2.0 > [9] magrittr_1.5 pillar_1.4.6 cli_2.0.2rlang_0.4.7 > rstudioapi_0.11 generics_0.0.2 vctrs_0.3.4 ellipsis_0.3.1 > [17] tools_3.6.1 bit64_4.0.5 feather_0.3.5glue_1.4.2 > purrr_0.3.4 bit_4.0.4hms_0.5.3compiler_3.6.1 > [25] pkgconfig_2.0.3 tidyselect_1.1.0 tibble_3.0.3 >Reporter: Ofek Shilon >Assignee: Neal Richardson >Priority: Major > Fix For: 2.0.0 > > > {{> v <- bit64::as.integer64(1:10)}} > {{> df <- data.frame(v=v)}} > {{> class(df$v)}} > {{[1] "*integer64*"}} > {{> arrow::write_feather(df, "./tmp")}} > {{> df2 <- arrow::read_feather("./tmp")}} > {{> class(df2$v)}} > {{[1] "*integer*"}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-10295) [Rust] [DataFusion] Simplify accumulators
[ https://issues.apache.org/jira/browse/ARROW-10295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-10295. Fix Version/s: 2.0.0 Resolution: Fixed Issue resolved by pull request 8456 [https://github.com/apache/arrow/pull/8456] > [Rust] [DataFusion] Simplify accumulators > - > > Key: ARROW-10295 > URL: https://issues.apache.org/jira/browse/ARROW-10295 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Jorge Leitão >Assignee: Jorge Leitão >Priority: Minor > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Replace Rc> by Box<>. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-10293) [Rust] [DataFusion] Fix benchmarks
[ https://issues.apache.org/jira/browse/ARROW-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-10293. Fix Version/s: 2.0.0 Resolution: Fixed Issue resolved by pull request 8452 [https://github.com/apache/arrow/pull/8452] > [Rust] [DataFusion] Fix benchmarks > -- > > Key: ARROW-10293 > URL: https://issues.apache.org/jira/browse/ARROW-10293 > Project: Apache Arrow > Issue Type: Bug > Components: Rust, Rust - DataFusion >Reporter: Jorge Leitão >Assignee: Jorge Leitão >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > They are only benchmarking planning, not execution. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-10145) [C++][Dataset] Integer-like partition field values outside int32 range error on reading
[ https://issues.apache.org/jira/browse/ARROW-10145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Kietzman reassigned ARROW-10145: Assignee: Ben Kietzman > [C++][Dataset] Integer-like partition field values outside int32 range error > on reading > --- > > Key: ARROW-10145 > URL: https://issues.apache.org/jira/browse/ARROW-10145 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Joris Van den Bossche >Assignee: Ben Kietzman >Priority: Major > Labels: dataset > Fix For: 2.0.1 > > > From > https://stackoverflow.com/questions/64137664/how-to-override-type-inference-for-partition-columns-in-hive-partitioned-dataset > Small reproducer: > {code} > import pyarrow as pa > import pyarrow.parquet as pq > table = pa.table({'part': [3760212050]*10, 'col': range(10)}) > pq.write_to_dataset(table, "test_int64_partition", partition_cols=['part']) > In [35]: pq.read_table("test_int64_partition/") > ... > ArrowInvalid: error parsing '3760212050' as scalar of type int32 > In ../src/arrow/scalar.cc, line 333, code: VisitTypeInline(*type_, this) > In ../src/arrow/dataset/partition.cc, line 218, code: > (_error_or_value26).status() > In ../src/arrow/dataset/partition.cc, line 229, code: > (_error_or_value27).status() > In ../src/arrow/dataset/discovery.cc, line 256, code: > (_error_or_value17).status() > In [36]: pq.read_table("test_int64_partition/", use_legacy_dataset=True) > Out[36]: > pyarrow.Table > col: int64 > part: dictionary > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10298) [Rust] Incorrect offset handling in iterator over dictionary keys
Jörn Horstmann created ARROW-10298: -- Summary: [Rust] Incorrect offset handling in iterator over dictionary keys Key: ARROW-10298 URL: https://issues.apache.org/jira/browse/ARROW-10298 Project: Apache Arrow Issue Type: Bug Components: Rust Reporter: Jörn Horstmann The NullableIterator used by DictionaryArray.keys calls ArrayData.is_null without taking the offset of that ArrayData into account. It would probably be better if ArrayData itself handled the offset in that method. The iterator implementation could now also be replaced with the PrimitiveIter that was recently added. -- This message was sent by Atlassian Jira (v8.3.4#803005)
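The issue above concerns the Rust crate; a minimal Python sketch can still show the failure mode. The `SlicedArrayData` name and its methods are illustrative only, not Arrow's actual API: the bug corresponds to checking index `i` directly against the parent validity bitmap instead of `offset + i`.

```python
# Illustrative sketch of the offset bug (not Arrow's API): an ArrayData-like
# container holding a parent validity bitmap plus a slice offset.
class SlicedArrayData:
    def __init__(self, validity, offset, length):
        self.validity = validity  # parent validity bits, True = valid
        self.offset = offset      # where this slice starts in the parent
        self.length = length

    def is_null_buggy(self, i):
        # Ignores the slice offset -- wrong for any sliced array.
        return not self.validity[i]

    def is_null(self, i):
        # Offset-aware check, as proposed for ArrayData itself.
        return not self.validity[self.offset + i]

# Parent keys: [valid, null, valid, valid]; a slice of length 3 starting at 1.
data = SlicedArrayData([True, False, True, True], offset=1, length=3)
print(data.is_null(0))        # True  (parent index 1 is null)
print(data.is_null_buggy(0))  # False (wrongly inspects parent index 0)
```

Moving the offset arithmetic into the container, as the issue suggests, means every iterator gets the correct behavior for free instead of each call site remembering to add the offset.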
[jira] [Commented] (ARROW-10276) Armv7 orc and flight not supported for build. Compat error on using with spark
[ https://issues.apache.org/jira/browse/ARROW-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213024#comment-17213024 ] Uwe Korn commented on ARROW-10276: -- According to the Spark documentation, you need {{pyarrow==0.8.0}}: http://spark.apache.org/docs/2.4.5/sql-pyspark-pandas-with-arrow.html#ensure-pyarrow-installed So this seems to be a mismatch in installed {{pyarrow}} versions rather than anything actually related to Armv7. > Armv7 orc and flight not supported for build. Compat error on using with spark > -- > > Key: ARROW-10276 > URL: https://issues.apache.org/jira/browse/ARROW-10276 > Project: Apache Arrow > Issue Type: Bug >Affects Versions: 0.17.0 >Reporter: utsav >Priority: Major > Attachments: arrow_compat_error, build_pip_wheel.sh, > dpu_stream_spark.ipynb, get_arrow_and_create_venv.sh, run_build.sh > > > I'm using an Arm Cortex-A9 processor on the Xilinx Pynq Z2 board. People have > tried to use it for the Raspberry Pi 3 without luck in previous posts. > I figured out how to successfully build it for armv7 using the script below > but cannot use the orc and flight flags. People had looked into it in ARROW-8420 > but I don't know if they faced these issues. > I tried converting a Spark dataframe to pandas using pyarrow but now it > complains about a compat feature. I have attached images below. > Any help would be appreciated. Thanks > Spark Version: 2.4.5. > The code is as follows: > ``` > import pandas as pd > df_pd = df.toPandas() > npArr = df_pd.to_numpy() > ``` > The error is as follows: > ``` > /opt/spark/python/pyspark/sql/dataframe.py:2110: UserWarning: toPandas > attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is > set to true; however, failed by the reason below: > module 'pyarrow' has no attribute 'compat' > Attempting non-optimization as 'spark.sql.execution.arrow.fallback.enabled' > is set to true. > warnings.warn(msg) > ``` > -- This message was sent by Atlassian Jira (v8.3.4#803005)
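The version mismatch Uwe describes can be checked mechanically before handing data to Spark. The sketch below is a hedged helper, not part of pyarrow or Spark; the `[0.8.0, 0.15.0)` window used in the example is an assumption for illustration (Spark 2.4's docs name pyarrow 0.8.0; consult them for the release you run).

```python
# Hedged sketch: compare an installed pyarrow version string against a
# caller-supplied supported range. The range bounds are assumptions for
# illustration, not an authoritative compatibility matrix.
def parse_version(v):
    # "0.17.0" -> (0, 17, 0); non-numeric suffixes are ignored.
    parts = []
    for p in v.split("."):
        digits = "".join(ch for ch in p if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def pyarrow_in_range(installed, minimum, below):
    """True if minimum <= installed < below."""
    return parse_version(minimum) <= parse_version(installed) < parse_version(below)

# The reporter's pyarrow 0.17.0 would fall outside a hypothetical
# [0.8.0, 0.15.0) window tested against Spark 2.4:
print(pyarrow_in_range("0.17.0", "0.8.0", "0.15.0"))  # False
print(pyarrow_in_range("0.14.1", "0.8.0", "0.15.0"))  # True
```

In practice one would feed `pyarrow.__version__` into such a check and fail fast with a clear message instead of hitting `module 'pyarrow' has no attribute 'compat'` deep inside `toPandas()`.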
[jira] [Assigned] (ARROW-10297) [Rust] Parameter for parquet-read to output data in json format
[ https://issues.apache.org/jira/browse/ARROW-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jörn Horstmann reassigned ARROW-10297: -- Assignee: Jörn Horstmann > [Rust] Parameter for parquet-read to output data in json format > --- > > Key: ARROW-10297 > URL: https://issues.apache.org/jira/browse/ARROW-10297 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Jörn Horstmann >Assignee: Jörn Horstmann >Priority: Minor > > When analyzing data-related issues I found it really helpful to filter or > postprocess the contents of parquet files on the command line using jq > (https://stedolan.github.io/jq/manual/). > Currently the output of parquet-read is in a custom json-like format; I > propose to add an optional flag that outputs the contents as json using the > serde_json library. This should probably be behind a feature gate to avoid > adding the dependency for everyone. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-10263) [C++][Compute] Improve numerical stability of variances merging
[ https://issues.apache.org/jira/browse/ARROW-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-10263. Fix Version/s: 2.0.0 Resolution: Fixed Issue resolved by pull request 8437 [https://github.com/apache/arrow/pull/8437] > [C++][Compute] Improve numerical stability of variances merging > --- > > Key: ARROW-10263 > URL: https://issues.apache.org/jira/browse/ARROW-10263 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Yibo Cai >Assignee: Yibo Cai >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > For a chunked array, the variance kernel needs to merge per-chunk variances. > Tested with two single-value chunks, [400800490], [400800400]. > The merged variance is 3872. If treated as a single array with two values, the > variance is 3904, same as numpy's output. > So the current merging method is not stable in extreme cases where chunks are very > short and have approximately equal means. -- This message was sent by Atlassian Jira (v8.3.4#803005)
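The standard remedy for this kind of instability (and, as I understand it, the direction of the fix) is the parallel merge formula of Chan et al., which carries `(count, mean, M2)` per chunk, where M2 is the sum of squared deviations from the chunk mean. A Python sketch; note the digits in the report above may be truncated, so the point here is only that the merged statistics agree with a direct single-pass computation:

```python
# Sketch of Chan et al.'s numerically stable variance merge for chunks.
def chunk_stats(values):
    """Return (count, mean, M2) for one chunk."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values)
    return n, mean, m2

def merge(a, b):
    """Combine two (count, mean, M2) triples without revisiting the data."""
    na, mean_a, m2a = a
    nb, mean_b, m2b = b
    n = na + nb
    delta = mean_b - mean_a
    mean = mean_a + delta * nb / n
    # Cross term accounts for the distance between the two chunk means.
    m2 = m2a + m2b + delta * delta * na * nb / n
    return n, mean, m2

# Two single-value chunks, as in the report: merging must match treating
# them as one array.
a = chunk_stats([400800490.0])
b = chunk_stats([400800400.0])
n, mean, m2 = merge(a, b)
print(m2 / n)                                           # 2025.0 (population variance)
print(chunk_stats([400800490.0, 400800400.0])[2] / 2)   # 2025.0 (computed directly)
```

Merging on `(count, mean, M2)` avoids the catastrophic cancellation that occurs when large, nearly equal means are squared and subtracted.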
[jira] [Created] (ARROW-10297) [Rust] Parameter for parquet-read to output data in json format
Jörn Horstmann created ARROW-10297: -- Summary: [Rust] Parameter for parquet-read to output data in json format Key: ARROW-10297 URL: https://issues.apache.org/jira/browse/ARROW-10297 Project: Apache Arrow Issue Type: New Feature Components: Rust Reporter: Jörn Horstmann When analyzing data-related issues I found it really helpful to filter or postprocess the contents of parquet files on the command line using jq (https://stedolan.github.io/jq/manual/). Currently the output of parquet-read is in a custom json-like format; I propose to add an optional flag that outputs the contents as json using the serde_json library. This should probably be behind a feature gate to avoid adding the dependency for everyone. -- This message was sent by Atlassian Jira (v8.3.4#803005)
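The proposed output shape is newline-delimited JSON: one object per row, which jq consumes directly. The Rust implementation would use serde_json; this Python sketch just shows the intended format, and the `--json` flag in the comment is the proposal here, not an existing parquet-read option.

```python
# Sketch of the proposed output format: one JSON object per row
# (newline-delimited JSON), which command-line tools like jq can filter.
import json

rows = [
    {"id": 1, "name": "a", "score": 0.5},
    {"id": 2, "name": "b", "score": None},  # nulls map to JSON null
]

for row in rows:
    print(json.dumps(row))

# With such output, command-line filtering becomes e.g.:
#   parquet-read --json data.parquet | jq 'select(.score != null)'
# (`--json` is the flag proposed in this issue, not yet an existing option.)
```

Each line is independently parseable, so jq, `grep`, and similar tools work row by row without loading the whole file.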
[jira] [Created] (ARROW-10296) [R] Data saved as integer64 loaded as integer
Ofek Shilon created ARROW-10296: --- Summary: [R] Data saved as integer64 loaded as integer Key: ARROW-10296 URL: https://issues.apache.org/jira/browse/ARROW-10296 Project: Apache Arrow Issue Type: Bug Components: R Affects Versions: 1.0.1 Environment: R 3.6.1, arrow 1.0.1, bit64 4.0.5 full sessionInfo(): R version 3.6.1 (2019-07-05) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19041) Matrix products: default locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] Rcpp_1.0.5 fansi_0.4.1 arrow_1.0.1 dplyr_1.0.2 crayon_1.3.4 assertthat_0.2.1 R6_2.4.1 lifecycle_0.2.0 [9] magrittr_1.5 pillar_1.4.6 cli_2.0.2 rlang_0.4.7 rstudioapi_0.11 generics_0.0.2 vctrs_0.3.4 ellipsis_0.3.1 [17] tools_3.6.1 bit64_4.0.5 feather_0.3.5 glue_1.4.2 purrr_0.3.4 bit_4.0.4 hms_0.5.3 compiler_3.6.1 [25] pkgconfig_2.0.3 tidyselect_1.1.0 tibble_3.0.3 Reporter: Ofek Shilon {{> v <- bit64::as.integer64(1:10)}} {{> df <- data.frame(v=v)}} {{> class(df$v)}} {{[1] "*integer64*"}} {{> arrow::write_feather(df, "./tmp")}} {{> df2 <- arrow::read_feather("./tmp")}} {{> class(df2$v)}} {{[1] "*integer*"}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10295) [Rust] [DataFusion] Simplify accumulators
[ https://issues.apache.org/jira/browse/ARROW-10295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-10295: --- Labels: pull-request-available (was: ) > [Rust] [DataFusion] Simplify accumulators > - > > Key: ARROW-10295 > URL: https://issues.apache.org/jira/browse/ARROW-10295 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Jorge Leitão >Assignee: Jorge Leitão >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Replace Rc> by Box<>. -- This message was sent by Atlassian Jira (v8.3.4#803005)