[jira] [Commented] (ARROW-10226) [Rust] [Parquet] Parquet reader reading wrong columns in some batches within a parquet file
[ https://issues.apache.org/jira/browse/ARROW-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211833#comment-17211833 ] Josh Taylor commented on ARROW-10226: - I'm seeing the same issue as the initial title, which was that it never completes. Test file: [https://drive.google.com/file/d/1aCW7SW2rUVioSePduhgo_91F5-xDMyjp/view?usp=sharing] (This is from Snowflake's example data, exported as a single parquet file; the same thing happens for many files). Code that fails (both the group-by with sum of columns and the builder pattern fail): https://github.com/joshuataylor/parquet-group-by/blob/main/src/main.rs > [Rust] [Parquet] Parquet reader reading wrong columns in some batches within > a parquet file > --- > > Key: ARROW-10226 > URL: https://issues.apache.org/jira/browse/ARROW-10226 > Project: Apache Arrow > Issue Type: Bug > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 2.0.0 > > > I re-installed my desktop a few days ago (now using Ubuntu 20.04 LTS) and > when I try and run the TPC-H benchmark, it never completes and eventually > uses up all 64 GB RAM. > I can run Spark against the data set and the query completes in 24 seconds, > which IIRC is how long it took before. > It is possible that something is odd in my environment, but it is also > possible/likely that this is a real bug. > I am investigating this and will update the Jira once I know more. > I also went back to old commits that were working for me before and they show > the same issue, so I don't think this is related to a recent code change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10274) [Rust] arithmetic without SIMD does unnecessary copy
Ritchie created ARROW-10274:
---
Summary: [Rust] arithmetic without SIMD does unnecessary copy
Key: ARROW-10274
URL: https://issues.apache.org/jira/browse/ARROW-10274
Project: Apache Arrow
Issue Type: Improvement
Reporter: Ritchie

The arithmetic kernels that don't use SIMD create a `Vec` in memory and later copy that data into a `Buffer`. Maybe we could write the arithmetic result directly to a mutable buffer and prevent this redundant copy?

{code:java}
let values = (0..left.len())
    .map(|i| op(left.value(i), right.value(i)))
    .collect::<Vec<T::Native>>();
let data = ArrayData::new(
    T::get_data_type(),
    left.len(),
    None,
    null_bit_buffer,
    0,
    vec![Buffer::from(values.to_byte_slice())],
    vec![],
);
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
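The redundant copy the issue describes can be illustrated with a std-only sketch (no Arrow types involved: `Vec<u8>` stands in for Arrow's mutable buffer, and both function names are made up for illustration):

```rust
// Sketch of the pattern the issue describes vs. the proposed fix.
// `Vec<u8>` stands in for Arrow's mutable Buffer.

fn add_with_extra_copy(left: &[i32], right: &[i32]) -> Vec<u8> {
    // 1) collect results into an intermediate Vec<i32> ...
    let values: Vec<i32> = left.iter().zip(right).map(|(l, r)| l + r).collect();
    // 2) ... then copy those same bytes again into the output buffer.
    let mut buffer = Vec::with_capacity(values.len() * 4);
    for v in &values {
        buffer.extend_from_slice(&v.to_ne_bytes());
    }
    buffer
}

fn add_direct(left: &[i32], right: &[i32]) -> Vec<u8> {
    // Write each result straight into the output buffer; no intermediate Vec.
    let mut buffer = Vec::with_capacity(left.len() * 4);
    for (l, r) in left.iter().zip(right) {
        buffer.extend_from_slice(&(l + r).to_ne_bytes());
    }
    buffer
}

fn main() {
    let (l, r) = ([1, 2, 3], [10, 20, 30]);
    // Both produce identical bytes; the second skips one allocation + memcpy.
    assert_eq!(add_with_extra_copy(&l, &r), add_direct(&l, &r));
    println!("buffers match");
}
```

Both functions produce identical bytes; the second avoids one allocation and one full copy of the result data, which is the saving the issue proposes.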
[jira] [Commented] (ARROW-10242) Parquet reader thread terminated due to error: ExecutionError("sending on a disconnected channel")
[ https://issues.apache.org/jira/browse/ARROW-10242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211816#comment-17211816 ] Josh Taylor commented on ARROW-10242:

I couldn't get this to fail again; I rebuilt everything and the basic querying seems to work now. Thanks!

> Parquet reader thread terminated due to error: ExecutionError("sending on a disconnected channel")
> --
>
> Key: ARROW-10242
> URL: https://issues.apache.org/jira/browse/ARROW-10242
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust, Rust - DataFusion
> Affects Versions: 2.0.0
> Reporter: Josh Taylor
> Assignee: Andy Grove
> Priority: Major
>
> *Running the latest code from GitHub for DataFusion & Parquet.*
> When trying to read a directory of around ~210 parquet files (3.2 GB total, each file around 13-18 MB), doing the following:
> {code:java}
> let mut ctx = ExecutionContext::new();
> // register parquet file with the execution context
> ctx.register_parquet(
>     "something",
>     "/home/josh/dev/pat/fff/"
> )?;
> // execute the query
> let df = ctx.sql(
>     "select * from something",
> )?;
> let results = df.collect().await?;
> {code}
> I get the following error shown ~204 times:
> {code:java}
> Parquet reader thread terminated due to error: ExecutionError("sending on a disconnected channel")
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (ARROW-10242) Parquet reader thread terminated due to error: ExecutionError("sending on a disconnected channel")
[ https://issues.apache.org/jira/browse/ARROW-10242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Taylor closed ARROW-10242.
---
Resolution: Fixed

> Parquet reader thread terminated due to error: ExecutionError("sending on a disconnected channel")
> --
>
> Key: ARROW-10242
> URL: https://issues.apache.org/jira/browse/ARROW-10242
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust, Rust - DataFusion
> Affects Versions: 2.0.0
> Reporter: Josh Taylor
> Assignee: Andy Grove
> Priority: Major
>
> *Running the latest code from github for datafusion & parquet.*
> When trying to read a directory of around ~210 parquet files (3.2gb total, each file around 13-18mb), doing the following:
> {code:java}
> let mut ctx = ExecutionContext::new();
> // register parquet file with the execution context
> ctx.register_parquet(
>     "something",
>     "/home/josh/dev/pat/fff/"
> )?;
> // execute the query
> let df = ctx.sql(
>     "select * from something",
> )?;
> let results = df.collect().await?;
> {code}
> I get the following error shown ~204 times:
> {code:java}
> Parquet reader thread terminated due to error: ExecutionError("sending on a disconnected channel")
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
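For context, the `"sending on a disconnected channel"` message is what a channel send reports when the receiving end has been dropped, e.g. because the consumer finished or bailed out early. A minimal std-only reproduction of that mechanism (this is not DataFusion's actual reader code, just the underlying `std::sync::mpsc` behavior):

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel::<i32>();

    // Simulate the consumer going away (e.g. the query was cancelled or the
    // reader finished early) by dropping the receiving end.
    drop(rx);

    // The producer thread's send now fails: the channel is disconnected.
    let handle = thread::spawn(move || tx.send(42));
    let result = handle.join().unwrap();

    assert!(result.is_err());
    println!("send failed: receiver was dropped");
}
```

Seeing the error once per reader thread (~204 times here) is consistent with many producer threads all discovering the same dropped receiver.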
[jira] [Updated] (ARROW-10273) [CI][Homebrew] Fix "brew audit" usage
[ https://issues.apache.org/jira/browse/ARROW-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-10273: --- Labels: pull-request-available (was: ) > [CI][Homebrew] Fix "brew audit" usage > - > > Key: ARROW-10273 > URL: https://issues.apache.org/jira/browse/ARROW-10273 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10273) [CI][Homebrew] Fix "brew audit" usage
Kouhei Sutou created ARROW-10273: Summary: [CI][Homebrew] Fix "brew audit" usage Key: ARROW-10273 URL: https://issues.apache.org/jira/browse/ARROW-10273 Project: Apache Arrow Issue Type: Improvement Components: Continuous Integration Reporter: Kouhei Sutou Assignee: Kouhei Sutou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10266) [CI][macOS] Ensure using Python 3.8 with Homebrew
[ https://issues.apache.org/jira/browse/ARROW-10266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-10266: Fix Version/s: 2.0.0 > [CI][macOS] Ensure using Python 3.8 with Homebrew > - > > Key: ARROW-10266 > URL: https://issues.apache.org/jira/browse/ARROW-10266 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-10266) [CI][macOS] Ensure using Python 3.8 with Homebrew
[ https://issues.apache.org/jira/browse/ARROW-10266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-10266. - Resolution: Fixed > [CI][macOS] Ensure using Python 3.8 with Homebrew > - > > Key: ARROW-10266 > URL: https://issues.apache.org/jira/browse/ARROW-10266 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10272) [Packaging][Python] Pin newer multibuild version to avoid updating homebrew
[ https://issues.apache.org/jira/browse/ARROW-10272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-10272: --- Labels: pull-request-available (was: ) > [Packaging][Python] Pin newer multibuild version to avoid updating homebrew > --- > > Key: ARROW-10272 > URL: https://issues.apache.org/jira/browse/ARROW-10272 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging, Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Build failure: > https://travis-ci.org/github/ursa-labs/crossbow/builds/734324594 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-10272) [Packaging][Python] Pin newer multibuild version to avoid updating homebrew
[ https://issues.apache.org/jira/browse/ARROW-10272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-10272. - Resolution: Fixed Issue resolved by pull request 8431 [https://github.com/apache/arrow/pull/8431] > [Packaging][Python] Pin newer multibuild version to avoid updating homebrew > --- > > Key: ARROW-10272 > URL: https://issues.apache.org/jira/browse/ARROW-10272 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging, Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Fix For: 2.0.0 > > > Build failure: > https://travis-ci.org/github/ursa-labs/crossbow/builds/734324594 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10272) [Packaging][Python] Pin newer multibuild version to avoid updating homebrew
Krisztian Szucs created ARROW-10272: --- Summary: [Packaging][Python] Pin newer multibuild version to avoid updating homebrew Key: ARROW-10272 URL: https://issues.apache.org/jira/browse/ARROW-10272 Project: Apache Arrow Issue Type: Improvement Components: Packaging, Python Reporter: Krisztian Szucs Assignee: Krisztian Szucs Fix For: 2.0.0 Build failure: https://travis-ci.org/github/ursa-labs/crossbow/builds/734324594 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-9553) [Rust] Release script doesn't bump parquet crate's arrow dependency version
[ https://issues.apache.org/jira/browse/ARROW-9553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-9553. Resolution: Fixed Issue resolved by pull request 8429 [https://github.com/apache/arrow/pull/8429] > [Rust] Release script doesn't bump parquet crate's arrow dependency version > --- > > Key: ARROW-9553 > URL: https://issues.apache.org/jira/browse/ARROW-9553 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Affects Versions: 1.0.0 >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > After rebasing on master, the Rust builds have started to fail. > The solution is to bump a version number here: > https://github.com/apache/arrow/pull/7829 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10249) [Rust]: Support Dictionary types for ListArrays in arrow json reader
[ https://issues.apache.org/jira/browse/ARROW-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-10249: --- Labels: pull-request-available (was: ) > [Rust]: Support Dictionary types for ListArrays in arrow json reader > > > Key: ARROW-10249 > URL: https://issues.apache.org/jira/browse/ARROW-10249 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Mahmut Bulut >Assignee: Mahmut Bulut >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently, dictionary types for listarrays are not supported in Arrow JSON > reader. It would be nice to add dictionary type support. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9553) [Rust] Release script doesn't bump parquet crate's arrow dependency version
[ https://issues.apache.org/jira/browse/ARROW-9553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9553: -- Labels: pull-request-available (was: ) > [Rust] Release script doesn't bump parquet crate's arrow dependency version > --- > > Key: ARROW-9553 > URL: https://issues.apache.org/jira/browse/ARROW-9553 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Affects Versions: 1.0.0 >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > After rebasing on master, the Rust builds have started to fail. > The solution is to bump a version number here: > https://github.com/apache/arrow/pull/7829 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-10100) [C++][Dataset] Ability to read/subset a ParquetFileFragment with given set of row group ids
[ https://issues.apache.org/jira/browse/ARROW-10100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-10100. - Resolution: Fixed Issue resolved by pull request 8301 [https://github.com/apache/arrow/pull/8301] > [C++][Dataset] Ability to read/subset a ParquetFileFragment with given set of > row group ids > --- > > Key: ARROW-10100 > URL: https://issues.apache.org/jira/browse/ARROW-10100 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Joris Van den Bossche >Assignee: Joris Van den Bossche >Priority: Major > Labels: dataset, dataset-dask-integration, pull-request-available > Fix For: 2.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > From discussion at > https://github.com/dask/dask/pull/6534#issuecomment-698723009 (dask using the > dataset API in their parquet reader), it might be useful to somehow "subset" > or read a subset of a ParquetFileFragment for a specific set of row group ids. > Use cases: > * Read only a set of row groups ids (this is similar as > {{ParquetFile.read_row_groups}}), eg because you want to control the size of > the resulting table by reading subsets of row groups > * Get a ParquetFileFragment with a subset of row groups (eg based on a > filter) to then eg get the statistics of only those row groups > The first case could for example be solved by adding a {{row_groups}} keyword > to {{ParquetFileFragment.to_table}} (but, this is then a keyword specific to > the parquet format, and we should then probably also add it to {{scan}} et > al). > The second case is something you can in principle do yourself manually by > recreating a fragment with {{fragment.format.make_fragment(fragment.path, > ..., row_groups=[...])}}. However, this is a) a bit cumbersome and b) > statistics might need to be parsed again? 
> The statistics of a set of filtered row groups could also be obtained by > using {{split_by_row_group(filter)}} (and then get the statistics of each of > the fragments), but if you then want a single fragment, you need to recreate > a fragment with the obtained row group ids. > So one idea I have now (mostly brainstorming here): would it be useful to > have a method to create a "subsetted" ParquetFileFragment, either based on a > list of row group ids ({{fragment.subset(row_groups=[...])}}) or based > on a filter ({{fragment.subset(filter=...)}}, which would be equivalent to > split_by_row_group + recombining into a single fragment)? > cc [~bkietz] [~rjzamora] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10271) [Rust] packed_simd is broken and continued under a new project
Ritchie created ARROW-10271:
---
Summary: [Rust] packed_simd is broken and continued under a new project
Key: ARROW-10271
URL: https://issues.apache.org/jira/browse/ARROW-10271
Project: Apache Arrow
Issue Type: Bug
Reporter: Ritchie

The dependency doesn't compile on newer versions of nightly. This is also known by the (new) project maintainers. Due to complications they continued the project under a new name, `packed_simd_2`, which can still be used under the old name via a Cargo dependency rename:

{code}
packed_simd = { version = "0.3.4", package = "packed_simd_2" }
{code}

See: https://github.com/rust-lang/packed_simd

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-7957) [Python] ParquetDataset cannot take HadoopFileSystem as filesystem
[ https://issues.apache.org/jira/browse/ARROW-7957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-7957.
---
Resolution: Fixed

Issue resolved by pull request 8414 [https://github.com/apache/arrow/pull/8414]

> [Python] ParquetDataset cannot take HadoopFileSystem as filesystem
> --
>
> Key: ARROW-7957
> URL: https://issues.apache.org/jira/browse/ARROW-7957
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.16.0
> Reporter: Catherine
> Assignee: Joris Van den Bossche
> Priority: Critical
> Labels: pull-request-available
> Fix For: 2.0.0
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> {code:python}
> from pyarrow.fs import HadoopFileSystem
> import pyarrow.parquet as pq
>
> file_name = "hdfs://localhost:9000/test/file_name.pq"
> hdfs, path = HadoopFileSystem.from_uri(file_name)
> dataset = pq.ParquetDataset(file_name, filesystem=hdfs)
> {code}
> has the error:
> {code}
> OSError: Unrecognized filesystem: <class 'pyarrow._hdfs.HadoopFileSystem'>
> {code}
> When I tried using the deprecated {{HadoopFileSystem}}:
> {code:python}
> import pyarrow
> import pyarrow.parquet as pq
>
> file_name = "hdfs://localhost:9000/test/file_name.pq"
> hdfs = pyarrow.hdfs.connect('localhost', 9000)
> dataset = pq.ParquetDataset(file_names, filesystem=hdfs)
> pa_schema = dataset.schema.to_arrow_schema()
> pieces = dataset.pieces
> for piece in pieces:
>     print(piece.path)
> {code}
> {{piece.path}} loses the {{hdfs://localhost:9000}} prefix.
> I think {{ParquetDataset}} should accept {{pyarrow.fs.HadoopFileSystem}} as filesystem, and {{piece.path}} should keep the prefix.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6043) [Python] Array equals returns incorrectly if NaNs are in arrays
[ https://issues.apache.org/jira/browse/ARROW-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-6043:
---
Fix Version/s: (was: 2.0.0) 3.0.0

> [Python] Array equals returns incorrectly if NaNs are in arrays
> --
>
> Key: ARROW-6043
> URL: https://issues.apache.org/jira/browse/ARROW-6043
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.14.1
> Reporter: Keith Kraus
> Assignee: Krisztian Szucs
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.0.0
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> {code:python}
> import numpy as np
> import pyarrow as pa
>
> data = [0, 1, np.nan, None, 4]
> arr1 = pa.array(data)
> arr2 = pa.array(data)
> pa.Array.equals(arr1, arr2)
> {code}
> Unsure if this is expected behavior, but in Arrow 0.12.1 this returned `True` as compared to `False` in 0.14.1.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
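The behavior difference described above hinges on IEEE 754 NaN semantics: NaN compares unequal to everything, including itself, so strict element-wise equality over arrays containing NaN yields "not equal" unless the comparison deliberately treats two NaNs as matching. A std-only sketch of that semantic fork (illustrative only; this is not pyarrow's implementation):

```rust
fn main() {
    // IEEE 754: NaN is not equal to anything, including itself.
    let nan = f64::NAN;
    assert!(nan != nan);

    let a = [0.0_f64, f64::NAN];
    let b = [0.0_f64, f64::NAN];

    // Strict IEEE element-wise equality: the NaN slot compares false,
    // so the arrays are reported as different ...
    let ieee_equal = a.iter().zip(&b).all(|(x, y)| x == y);
    assert!(!ieee_equal);

    // ... while a comparison that treats NaN-vs-NaN as a match
    // reports them as equal.
    let nan_aware_equal = a
        .iter()
        .zip(&b)
        .all(|(x, y)| x == y || (x.is_nan() && y.is_nan()));
    assert!(nan_aware_equal);
}
```

Which of the two conventions `Array.equals` should follow is exactly the question the issue raises; the 0.12.1 → 0.14.1 change flipped from one to the other.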
[jira] [Resolved] (ARROW-10240) [Rust] [Datafusion] Optionally load tpch data into memory before running benchmark query
[ https://issues.apache.org/jira/browse/ARROW-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-10240. Fix Version/s: 2.0.0 Resolution: Fixed Issue resolved by pull request 8409 [https://github.com/apache/arrow/pull/8409] > [Rust] [Datafusion] Optionally load tpch data into memory before running > benchmark query > > > Key: ARROW-10240 > URL: https://issues.apache.org/jira/browse/ARROW-10240 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Jörn Horstmann >Assignee: Jörn Horstmann >Priority: Minor > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > The tpch benchmark runtime seems to be dominated by CSV parsing code, and it > is really difficult to see any performance hotspots related to actual query > execution in a flamegraph. > With the data in memory and more iterations it should be easier to profile > and find bottlenecks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10251) [Rust] [DataFusion] MemTable::load() should load partitions in parallel
[ https://issues.apache.org/jira/browse/ARROW-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-10251: --- Labels: beginner pull-request-available (was: beginner) > [Rust] [DataFusion] MemTable::load() should load partitions in parallel > --- > > Key: ARROW-10251 > URL: https://issues.apache.org/jira/browse/ARROW-10251 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Priority: Major > Labels: beginner, pull-request-available > Fix For: 3.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > MemTable::load() should load partitions in parallel using async tasks, rather > than loading one partition at a time. > Also, we should make batch size configurable. It is currently hard-coded to > 1024*1024 which can be quite inefficient. -- This message was sent by Atlassian Jira (v8.3.4#803005)
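The fan-out/join shape that parallel partition loading implies can be sketched with std threads (DataFusion itself would use async tasks; `load_partition` and the partition ids here are hypothetical stand-ins for reading one partition's record batches):

```rust
use std::thread;

// Hypothetical stand-in for loading one partition into memory
// (in MemTable::load this would read the partition's record batches).
fn load_partition(id: usize) -> Vec<usize> {
    (0..3).map(|i| id * 100 + i).collect()
}

fn main() {
    let partitions = vec![0, 1, 2, 3];

    // Fan out: spawn one task per partition instead of loading sequentially.
    let handles: Vec<_> = partitions
        .into_iter()
        .map(|id| thread::spawn(move || load_partition(id)))
        .collect();

    // Join: collect the loaded partitions back in partition order.
    let loaded: Vec<Vec<usize>> = handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .collect();

    assert_eq!(loaded.len(), 4);
    assert_eq!(loaded[2][0], 200);
    println!("loaded {} partitions", loaded.len());
}
```

Because each handle is joined in the order it was spawned, results come back in partition order even though the loads run concurrently, which matters for reproducible partition layout.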
[jira] [Created] (ARROW-10270) [R] Fix CSV timestamp_parsers test on R-devel
Neal Richardson created ARROW-10270:
---
Summary: [R] Fix CSV timestamp_parsers test on R-devel
Key: ARROW-10270
URL: https://issues.apache.org/jira/browse/ARROW-10270
Project: Apache Arrow
Issue Type: Bug
Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
Fix For: 2.0.0

Apparently there is a change in the development version of R with respect to timezone handling. I suspect it is this: https://github.com/wch/r-source/blob/trunk/doc/NEWS.Rd#L296-L300

It causes this failure:

{code}
── 1. Failure: read_csv_arrow() can read timestamps (@test-csv.R#216) ─
`tbl` not equal to `df`.
Component "time": 'tzone' attributes are inconsistent ('UTC' and '')

── 2. Failure: read_csv_arrow() can read timestamps (@test-csv.R#219) ─
`tbl` not equal to `df`.
Component "time": 'tzone' attributes are inconsistent ('UTC' and '')
{code}

This needs to be fixed for the CRAN release because they check on the devel version. But it doesn't need to block the 2.0 release candidate because I can (at minimum) skip these tests before submitting to CRAN (FYI [~kszucs]).

I'll also add a CI job to test on R-devel. I just removed 2 R jobs so we can afford to add one back.

cc [~romainfrancois]

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-10267) [Python] Skip flight test if disable_server_verification feature is not available
[ https://issues.apache.org/jira/browse/ARROW-10267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-10267. - Fix Version/s: 2.0.0 Resolution: Fixed Issue resolved by pull request 8427 [https://github.com/apache/arrow/pull/8427] > [Python] Skip flight test if disable_server_verification feature is not > available > - > > Key: ARROW-10267 > URL: https://issues.apache.org/jira/browse/ARROW-10267 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Our nightly builds are failing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-10249) [Rust]: Support Dictionary types for ListArrays in arrow json reader
[ https://issues.apache.org/jira/browse/ARROW-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahmut Bulut reassigned ARROW-10249: Assignee: Mahmut Bulut > [Rust]: Support Dictionary types for ListArrays in arrow json reader > > > Key: ARROW-10249 > URL: https://issues.apache.org/jira/browse/ARROW-10249 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Mahmut Bulut >Assignee: Mahmut Bulut >Priority: Major > > Currently, dictionary types for listarrays are not supported in Arrow JSON > reader. It would be nice to add dictionary type support. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10269) [Rust] Update nightly: Oct 2020 Edition
Neville Dipale created ARROW-10269: -- Summary: [Rust] Update nightly: Oct 2020 Edition Key: ARROW-10269 URL: https://issues.apache.org/jira/browse/ARROW-10269 Project: Apache Arrow Issue Type: Task Components: Rust Reporter: Neville Dipale We should update to a more recent nightly after the 2.0.0 release. It carries some clippy annoyances, which will mean that I have to revert much of what I did around float comparisons. It might also be preferable to do this sooner, so that we can complete the clippy integration and throw away the carrot in favour of the stick. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10268) [Rust] Support writing dictionaries to IPC file and stream
Neville Dipale created ARROW-10268: -- Summary: [Rust] Support writing dictionaries to IPC file and stream Key: ARROW-10268 URL: https://issues.apache.org/jira/browse/ARROW-10268 Project: Apache Arrow Issue Type: Sub-task Components: Rust Affects Versions: 1.0.1 Reporter: Neville Dipale We currently do not support writing dictionary arrays to the IPC file and stream format. When this is supported, we can test the integration with other implementations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10267) [Python] Skip flight test if disable_server_verification feature is not available
[ https://issues.apache.org/jira/browse/ARROW-10267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-10267: --- Labels: pull-request-available (was: ) > [Python] Skip flight test if disable_server_verification feature is not > available > - > > Key: ARROW-10267 > URL: https://issues.apache.org/jira/browse/ARROW-10267 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Our nightly builds are failing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10267) [Python] Skip flight test if disable_server_verification feature is not available
Krisztian Szucs created ARROW-10267: --- Summary: [Python] Skip flight test if disable_server_verification feature is not available Key: ARROW-10267 URL: https://issues.apache.org/jira/browse/ARROW-10267 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Krisztian Szucs Our nightly builds are failing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-10265) [CI] Use smaller build when cache doesn't exist on Travis CI
[ https://issues.apache.org/jira/browse/ARROW-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-10265. -- Fix Version/s: 2.0.0 Resolution: Fixed Issue resolved by pull request 8424 [https://github.com/apache/arrow/pull/8424] > [CI] Use smaller build when cache doesn't exist on Travis CI > -- > > Key: ARROW-10265 > URL: https://issues.apache.org/jira/browse/ARROW-10265 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10266) [CI][macOS] Ensure using Python 3.8 with Homebrew
[ https://issues.apache.org/jira/browse/ARROW-10266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-10266: --- Labels: pull-request-available (was: ) > [CI][macOS] Ensure using Python 3.8 with Homebrew > - > > Key: ARROW-10266 > URL: https://issues.apache.org/jira/browse/ARROW-10266 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10266) [CI][macOS] Ensure using Python 3.8 with Homebrew
Kouhei Sutou created ARROW-10266: Summary: [CI][macOS] Ensure using Python 3.8 with Homebrew Key: ARROW-10266 URL: https://issues.apache.org/jira/browse/ARROW-10266 Project: Apache Arrow Issue Type: Improvement Components: Continuous Integration Reporter: Kouhei Sutou Assignee: Kouhei Sutou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-10105) [FlightRPC] Add client option to disable certificate validation with TLS
[ https://issues.apache.org/jira/browse/ARROW-10105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-10105. - Resolution: Fixed > [FlightRPC] Add client option to disable certificate validation with TLS > > > Key: ARROW-10105 > URL: https://issues.apache.org/jira/browse/ARROW-10105 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, FlightRPC, Java, Python >Reporter: James Duong >Assignee: James Duong >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Users of Flight may want to disable certificate validation if they want to > only use encryption. A use case might be that the Flight server uses a > self-signed certificate and doesn't distribute a certificate for clients to > use. > This feature would be to add an explicit option to FlightClient.Builder to > disable certificate validation. Note that this should not happen implicitly > if a client uses a TLS location, but does not set a certificate. The client > should explicitly set this option so that they are fully aware that they are > making a connection with reduced security. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9704) [Java] TestEndianness.testLittleEndian fails on big endian platform
[ https://issues.apache.org/jira/browse/ARROW-9704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-9704: --- Fix Version/s: (was: 2.0.0) 3.0.0 > [Java] TestEndianness.testLittleEndian fails on big endian platform > --- > > Key: ARROW-9704 > URL: https://issues.apache.org/jira/browse/ARROW-9704 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Kazuaki Ishizaki >Assignee: Kazuaki Ishizaki >Priority: Minor > Labels: pull-request-available > Fix For: 3.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > {{TestEndianness.testLittleEndian}} assumes that the data layout of int is > little-endian. Thus, this test fails on a big-endian platform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10265) [CI] Use smaller build when cache doesn't exist on Travis CI
[ https://issues.apache.org/jira/browse/ARROW-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-10265: --- Labels: pull-request-available (was: ) > [CI] Use smaller build when cache doesn't exist on Travis CI > -- > > Key: ARROW-10265 > URL: https://issues.apache.org/jira/browse/ARROW-10265 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10265) [CI] Use smaller build when cache doesn't exist on Travis CI
Kouhei Sutou created ARROW-10265: Summary: [CI] Use smaller build when cache doesn't exist on Travis CI Key: ARROW-10265 URL: https://issues.apache.org/jira/browse/ARROW-10265 Project: Apache Arrow Issue Type: Improvement Components: Continuous Integration Reporter: Kouhei Sutou Assignee: Kouhei Sutou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-9952) [Python] Use pyarrow.dataset writing for pq.write_to_dataset
[ https://issues.apache.org/jira/browse/ARROW-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-9952. Resolution: Fixed Issue resolved by pull request 8412 [https://github.com/apache/arrow/pull/8412] > [Python] Use pyarrow.dataset writing for pq.write_to_dataset > > > Key: ARROW-9952 > URL: https://issues.apache.org/jira/browse/ARROW-9952 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Joris Van den Bossche >Assignee: Joris Van den Bossche >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Now that ARROW-9658 and ARROW-9893 are in, we can explore using the > {{pyarrow.dataset}} writing capabilities in {{parquet.write_to_dataset}}. > Similarly to what was done in {{pq.read_table}}, we could initially have a keyword > to switch between both implementations, eventually defaulting to the new > datasets one, and to deprecate the old (inefficient) python implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-10256) [C++][Flight] Disable -Werror carefully
[ https://issues.apache.org/jira/browse/ARROW-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-10256. -- Fix Version/s: 2.0.0 Resolution: Fixed Issue resolved by pull request 8419 [https://github.com/apache/arrow/pull/8419] > [C++][Flight] Disable -Werror carefully > --- > > Key: ARROW-10256 > URL: https://issues.apache.org/jira/browse/ARROW-10256 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-10261) [Rust] [BREAKING] Lists should take Field instead of DataType
[ https://issues.apache.org/jira/browse/ARROW-10261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211667#comment-17211667 ] Andrew Lamb commented on ARROW-10261: - [~nevi_me] -- this proposal makes sense to me and I think it brings the Rust implementation closer to the C++ implementation, which is a good thing. From my (cursory) reading of the C++, it appears [source >link|https://github.com/apache/arrow/blob/master/cpp/src/arrow/type.h#L539-L546] > that Lists in C++ use `Field` rather than `DataType` to describe each list >item's type as well > [Rust] [BREAKING] Lists should take Field instead of DataType > - > > Key: ARROW-10261 > URL: https://issues.apache.org/jira/browse/ARROW-10261 > Project: Apache Arrow > Issue Type: Sub-task > Components: Integration, Rust >Affects Versions: 1.0.1 >Reporter: Neville Dipale >Priority: Major > > There is currently no way of tracking nested field metadata on lists. For > example, if a list's children are nullable, there's no way of telling just by > looking at the Field. > This causes problems with integration testing, and also affects Parquet > roundtrips. > I propose the breaking change of [Large|FixedSize]List taking a Field instead > of Box<DataType>, as this will overcome this issue, and ensure that the Rust > implementation passes integration tests. > CC [~andygrove] [~jorgecarleitao] [~alamb] [~jhorstmann] ([~carols10cents] > as this addresses some of the roundtrip failures). > I'm leaning towards this landing in 3.0.0, as I'd love for us to have > completed or made significant traction on the Arrow Parquet writer (and > reader), and integration testing, by then. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-10260) [Python] Missing MapType to Pandas dtype
[ https://issues.apache.org/jira/browse/ARROW-10260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-10260. - Resolution: Fixed Issue resolved by pull request 8422 [https://github.com/apache/arrow/pull/8422] > [Python] Missing MapType to Pandas dtype > > > Key: ARROW-10260 > URL: https://issues.apache.org/jira/browse/ARROW-10260 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Bryan Cutler >Assignee: Derek Marsh >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > The Map type conversion to Pandas done in ARROW-10151 forgot to add dtype > mapping for {{to_pandas_dtype()}} > > {code:java} > In [2]: d = pa.map_(pa.int64(), pa.float64()) >In [3]: d.to_pandas_dtype() > > > --- > NotImplementedError Traceback (most recent call last) > in > > 1 > d.to_pandas_dtype()~/miniconda2/envs/pyarrow-test/lib/python3.7/site-packages/pyarrow/types.pxi > in pyarrow.lib.DataType.to_pandas_dtype()NotImplementedError: map<int64, double>{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-10261) [Rust] [BREAKING] Lists should take Field instead of DataType
[ https://issues.apache.org/jira/browse/ARROW-10261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211654#comment-17211654 ] Neville Dipale commented on ARROW-10261: [~jhorstmann] nullability should be determined by the overall field for consistency; as you could have 1000 batches of 1000 records, but only have say 5 nulls scattered around. The main issue is that if I have a non-nullable list, which in turn has a nullable struct with various child fields with differing nullability; I won't know if the struct is nullable, because I lose that information when only taking the data type. Also, in the hypothetical case where the struct has some metadata of its own, it gets lost because we would only keep the DataType, and not other attributes such as dictionary or metadata (HashMap). Interestingly, looking at the CPP implementation, it looks like they still use List, but I can't see how they preserve the extra details that the Rust implementation currently loses. [~apitrou] any ideas? > [Rust] [BREAKING] Lists should take Field instead of DataType > - > > Key: ARROW-10261 > URL: https://issues.apache.org/jira/browse/ARROW-10261 > Project: Apache Arrow > Issue Type: Sub-task > Components: Integration, Rust >Affects Versions: 1.0.1 >Reporter: Neville Dipale >Priority: Major > > There is currently no way of tracking nested field metadata on lists. For > example, if a list's children are nullable, there's no way of telling just by > looking at the Field. > This causes problems with integration testing, and also affects Parquet > roundtrips. > I propose the breaking change of [Large|FixedSize]List taking a Field instead > of Box<DataType>, as this will overcome this issue, and ensure that the Rust > implementation passes integration tests. > CC [~andygrove] [~jorgecarleitao] [~alamb] [~jhorstmann] ([~carols10cents] > as this addresses some of the roundtrip failures). 
> I'm leaning towards this landing in 3.0.0, as I'd love for us to have > completed or made significant traction on the Arrow Parquet writer (and > reader), and integration testing, by then. -- This message was sent by Atlassian Jira (v8.3.4#803005)
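The tradeoff discussed in this thread — a child {{DataType}} drops nullability and metadata, while a child {{Field}} preserves them — can be illustrated with a deliberately simplified model. This is plain Python with invented stand-in classes, not the actual Rust arrow types:

```python
from dataclasses import dataclass, field as dc_field
from typing import Optional

# Illustrative model only -- the names and shapes are simplified
# stand-ins, not the real arrow-rs definitions.

@dataclass
class DataType:
    name: str
    # List(Box<DataType>): only the child's type survives.
    child_type: Optional["DataType"] = None
    # List(Field): child nullability and metadata survive too.
    child_field: Optional["Field"] = None

@dataclass
class Field:
    name: str
    data_type: DataType
    nullable: bool = True
    metadata: dict = dc_field(default_factory=dict)

# Old design: no way to record that the list's items are non-nullable.
old_list = DataType("list", child_type=DataType("int64"))

# Proposed design: the child Field round-trips nullability and metadata.
new_list = DataType(
    "list",
    child_field=Field("item", DataType("int64"),
                      nullable=False,
                      metadata={"encoding": "plain"}),
)
```

With `old_list` there is simply no slot in which to store the child's nullability, which is exactly the information the integration tests and Parquet roundtrips need.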
[jira] [Commented] (ARROW-10261) [Rust] [BREAKING] Lists should take Field instead of DataType
[ https://issues.apache.org/jira/browse/ARROW-10261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211650#comment-17211650 ] Jörn Horstmann commented on ARROW-10261: For my understanding, this is about metadata? So even if there are no null values in one batch or partition you want to mark the elements as potentially nullable? > [Rust] [BREAKING] Lists should take Field instead of DataType > - > > Key: ARROW-10261 > URL: https://issues.apache.org/jira/browse/ARROW-10261 > Project: Apache Arrow > Issue Type: Sub-task > Components: Integration, Rust >Affects Versions: 1.0.1 >Reporter: Neville Dipale >Priority: Major > > There is currently no way of tracking nested field metadata on lists. For > example, if a list's children are nullable, there's no way of telling just by > looking at the Field. > This causes problems with integration testing, and also affects Parquet > roundtrips. > I propose the breaking change of [Large|FixedSize]List taking a Field instead > of Box<DataType>, as this will overcome this issue, and ensure that the Rust > implementation passes integration tests. > CC [~andygrove] [~jorgecarleitao] [~alamb] [~jhorstmann] ([~carols10cents] > as this addresses some of the roundtrip failures). > I'm leaning towards this landing in 3.0.0, as I'd love for us to have > completed or made significant traction on the Arrow Parquet writer (and > reader), and integration testing, by then. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9945) [C++][Dataset] Refactor Expression::Assume to return a Result
[ https://issues.apache.org/jira/browse/ARROW-9945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-9945: --- Fix Version/s: (was: 2.0.0) 3.0.0 > [C++][Dataset] Refactor Expression::Assume to return a Result > - > > Key: ARROW-9945 > URL: https://issues.apache.org/jira/browse/ARROW-9945 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 1.0.0 >Reporter: Ben Kietzman >Assignee: Ben Kietzman >Priority: Major > Labels: dataset > Fix For: 3.0.0 > > > Expression::Assume can abort if the two expressions are not valid against a > single schema. This is not ideal since a schema is not always easily > available. The method should be able to fail gracefully in the case of a > best-effort simplification where validation against a schema is not desired. > https://github.com/apache/arrow/pull/8037#discussion_r475594117 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-10252) [Python] Add option to skip inclusion of Arrow headers in Python installation
[ https://issues.apache.org/jira/browse/ARROW-10252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-10252. - Fix Version/s: 2.0.0 Resolution: Fixed Issue resolved by pull request 8416 [https://github.com/apache/arrow/pull/8416] > [Python] Add option to skip inclusion of Arrow headers in Python installation > - > > Key: ARROW-10252 > URL: https://issues.apache.org/jira/browse/ARROW-10252 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging, Python >Reporter: Uwe Korn >Assignee: Uwe Korn >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > We don't want to have them as part of the conda package as the single source > should be {{arrow-cpp}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-4960) [R] Add crossbow task for r-arrow-feedstock
[ https://issues.apache.org/jira/browse/ARROW-4960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211621#comment-17211621 ] Krisztian Szucs commented on ARROW-4960: It doesn't seem required for 2.0 so updating the version. > [R] Add crossbow task for r-arrow-feedstock > --- > > Key: ARROW-4960 > URL: https://issues.apache.org/jira/browse/ARROW-4960 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging, R >Reporter: Uwe Korn >Assignee: Uwe Korn >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 5h 40m > Remaining Estimate: 0h > > We also have an R package on conda-forge now: > [https://github.com/conda-forge/r-arrow-feedstock] This should be tested > using crossbow as we do with the other packages. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-4960) [R] Add crossbow task for r-arrow-feedstock
[ https://issues.apache.org/jira/browse/ARROW-4960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-4960: --- Fix Version/s: (was: 2.0.0) 3.0.0 > [R] Add crossbow task for r-arrow-feedstock > --- > > Key: ARROW-4960 > URL: https://issues.apache.org/jira/browse/ARROW-4960 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging, R >Reporter: Uwe Korn >Assignee: Uwe Korn >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0 > > Time Spent: 5h 40m > Remaining Estimate: 0h > > We also have an R package on conda-forge now: > [https://github.com/conda-forge/r-arrow-feedstock] This should be tested > using crossbow as we do with the other packages. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-10230) [JS][Doc] JavaScript documentation fails to build
[ https://issues.apache.org/jira/browse/ARROW-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-10230. - Resolution: Fixed Issue resolved by pull request 8395 [https://github.com/apache/arrow/pull/8395] > [JS][Doc] JavaScript documentation fails to build > - > > Key: ARROW-10230 > URL: https://issues.apache.org/jira/browse/ARROW-10230 > Project: Apache Arrow > Issue Type: Bug > Components: Documentation, JavaScript >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Probably because of typedoc updates. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-3080) [Python] Unify Arrow to Python object conversion paths
[ https://issues.apache.org/jira/browse/ARROW-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche resolved ARROW-3080. -- Fix Version/s: (was: 3.0.0) 2.0.0 Resolution: Fixed Issue resolved by pull request 8349 [https://github.com/apache/arrow/pull/8349] > [Python] Unify Arrow to Python object conversion paths > -- > > Key: ARROW-3080 > URL: https://issues.apache.org/jira/browse/ARROW-3080 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 5h > Remaining Estimate: 0h > > Similar to ARROW-2814, we have inconsistent support for converting Arrow > nested types back to object sequences. For example, a list of structs fails > when calling {{to_pandas}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-10260) [Python] Missing MapType to Pandas dtype
[ https://issues.apache.org/jira/browse/ARROW-10260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-10260: - Assignee: Derek Marsh > [Python] Missing MapType to Pandas dtype > > > Key: ARROW-10260 > URL: https://issues.apache.org/jira/browse/ARROW-10260 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Bryan Cutler >Assignee: Derek Marsh >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > The Map type conversion to Pandas done in ARROW-10151 forgot to add dtype > mapping for {{to_pandas_dtype()}} > > {code:java} > In [2]: d = pa.map_(pa.int64(), pa.float64()) >In [3]: d.to_pandas_dtype() > > > --- > NotImplementedError Traceback (most recent call last) > in > > 1 > d.to_pandas_dtype()~/miniconda2/envs/pyarrow-test/lib/python3.7/site-packages/pyarrow/types.pxi > in pyarrow.lib.DataType.to_pandas_dtype()NotImplementedError: map<int64, double>{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-10248) [C++][Dataset] Dataset writing does not write schema metadata
[ https://issues.apache.org/jira/browse/ARROW-10248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche resolved ARROW-10248. --- Resolution: Fixed Issue resolved by pull request 8415 [https://github.com/apache/arrow/pull/8415] > [C++][Dataset] Dataset writing does not write schema metadata > - > > Key: ARROW-10248 > URL: https://issues.apache.org/jira/browse/ARROW-10248 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Joris Van den Bossche >Assignee: Ben Kietzman >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Not sure if this is related to the writing refactor that landed yesterday, > but `write_dataset` does not preserve the schema metadata (eg used for pandas > metadata): > {code} > In [20]: df = pd.DataFrame({'a': [1, 2, 3]}) > In [21]: table = pa.Table.from_pandas(df) > In [22]: table.schema > Out[22]: > a: int64 > -- schema metadata -- > pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + > 396 > In [23]: ds.write_dataset(table, "test_write_dataset_pandas", > format="parquet") > In [24]: pq.read_table("test_write_dataset_pandas/part-0.parquet").schema > Out[24]: > a: int64 > -- field metadata -- > PARQUET:field_id: '1' > {code} > I tagged it for 2.0.0 for a moment in case it's possible today, but I didn't > yet look into how easy it would be to fix. > cc [~bkietz] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-10175) [CI] Nightly hdfs integration test job fails
[ https://issues.apache.org/jira/browse/ARROW-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211606#comment-17211606 ] Joris Van den Bossche commented on ARROW-10175: --- I opened ARROW-10264 for the failing URI test (which I would think should work) > [CI] Nightly hdfs integration test job fails > > > Key: ARROW-10175 > URL: https://issues.apache.org/jira/browse/ARROW-10175 > Project: Apache Arrow > Issue Type: Bug > Components: Continuous Integration, Python >Reporter: Neal Richardson >Assignee: Joris Van den Bossche >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Two tests fail: > https://github.com/ursa-labs/crossbow/runs/1204680589 > [removed bogus investigation] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10264) [C++][Python] Parquet test failing with HadoopFileSystem URI
[ https://issues.apache.org/jira/browse/ARROW-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-10264: -- Labels: filesystem hdfs (was: ) > [C++][Python] Parquet test failing with HadoopFileSystem URI > > > Key: ARROW-10264 > URL: https://issues.apache.org/jira/browse/ARROW-10264 > Project: Apache Arrow > Issue Type: Bug >Reporter: Joris Van den Bossche >Priority: Major > Labels: filesystem, hdfs > Fix For: 3.0.0 > > > Follow-up on ARROW-10175. In the HDFS integration tests, there is a test > using a URI failing if we use the new filesystem / dataset implementation: > {code} > FAILED > opt/conda/envs/arrow/lib/python3.7/site-packages/pyarrow/tests/test_hdfs.py::TestLibHdfs::test_read_multiple_parquet_files_with_uri > {code} > fails with > {code} > pyarrow.lib.ArrowInvalid: Path > '/tmp/pyarrow-test-838/multi-parquet-uri-48569714efc74397816722c9c6723191/0.parquet' > is not relative to '/user/root' > {code} > while it is passing a URI (and not a filesystem object) to > {{parquet.read_table}}, and the new filesystems/dataset implementation should > be able to handle URIs. > cc [~apitrou] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10264) [C++][Python] Parquet test failing with HadoopFileSystem URI
Joris Van den Bossche created ARROW-10264: - Summary: [C++][Python] Parquet test failing with HadoopFileSystem URI Key: ARROW-10264 URL: https://issues.apache.org/jira/browse/ARROW-10264 Project: Apache Arrow Issue Type: Bug Reporter: Joris Van den Bossche Fix For: 3.0.0 Follow-up on ARROW-10175. In the HDFS integration tests, there is a test using a URI failing if we use the new filesystem / dataset implementation: {code} FAILED opt/conda/envs/arrow/lib/python3.7/site-packages/pyarrow/tests/test_hdfs.py::TestLibHdfs::test_read_multiple_parquet_files_with_uri {code} fails with {code} pyarrow.lib.ArrowInvalid: Path '/tmp/pyarrow-test-838/multi-parquet-uri-48569714efc74397816722c9c6723191/0.parquet' is not relative to '/user/root' {code} while it is passing a URI (and not a filesystem object) to {{parquet.read_table}}, and the new filesystems/dataset implementation should be able to handle URIs. cc [~apitrou] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9812) [Python] Map data type doesn't work from Arrow to Parquet
[ https://issues.apache.org/jira/browse/ARROW-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-9812: - Fix Version/s: 3.0.0 > [Python] Map data type doesn't work from Arrow to Parquet > -- > > Key: ARROW-9812 > URL: https://issues.apache.org/jira/browse/ARROW-9812 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Mayur Srivastava >Priority: Major > Fix For: 3.0.0 > > > Hi, > I'm having problems using the 'map' data type in Arrow/parquet/pandas. > I'm able to convert a pandas data frame to Arrow with a map data type. > When I write Arrow to Parquet, it seems to work, but I'm not sure if the data > type is written correctly. > When I read back Parquet to Arrow, it fails saying "reading list of structs" > is not supported. It seems that map is stored as a list of structs. > There are two problems here: > # -Map data type doesn't work from Arrow -> Pandas-. Fixed in ARROW-10151 > # Map data type doesn't get written to or read from Arrow -> Parquet. > Questions: > 1. Am I doing something wrong? Is there a way to get these to work? > 2. If these are unsupported features, will this be fixed in a future version? > Do you have plans or an ETA? > The following code example (followed by output) should demonstrate the issues: > I'm using Arrow 1.0.0 and Pandas 1.0.5. > Thanks! 
> Mayur > {code:java} > $ cat arrowtest.py > import pyarrow as pa > import pandas as pd > import pyarrow.parquet as pq > import traceback as tb > import io > print(f'PyArrow Version = {pa.__version__}') > print(f'Pandas Version = {pd.__version__}') > df1 = pd.DataFrame({'a': [[('b', '2')]]}) > print(f'df1') > print(f'{df1}') > print(f'Pandas -> Arrow') > try: > t1 = pa.Table.from_pandas(df1, schema=pa.schema([pa.field('a', > pa.map_(pa.string(), pa.string()))])) > print('PASSED') > print(t1) > except: > print(f'FAILED') > tb.print_exc() > print(f'Arrow -> Pandas') > try: > t1.to_pandas() > print('PASSED') > except: > print(f'FAILED') > tb.print_exc() > print(f'Arrow -> Parquet') > fh = io.BytesIO() > try: > pq.write_table(t1, fh) > print('PASSED') > except: > print('FAILED') > tb.print_exc() > > print(f'Parquet -> Arrow') > try: > t2 = pq.read_table(source=fh) > print('PASSED') > print(t2) > except: > print('FAILED') > tb.print_exc() > {code} > {code:java} > $ python3.6 arrowtest.py > PyArrow Version = 1.0.0 > Pandas Version = 1.0.5 > df1 > a 0 [(b, 2)] > > Pandas -> Arrow > PASSED > pyarrow.Table > a: map<string, string> > child 0, entries: struct<key: string not null, value: string> not null > child 0, key: string not null > child 1, value: string > > Arrow -> Pandas > FAILED > Traceback (most recent call last): > File "arrowtest.py", line 26, in t1.to_pandas() > File "pyarrow/array.pxi", line 715, in > pyarrow.lib._PandasConvertible.to_pandas > File "pyarrow/table.pxi", line 1565, in pyarrow.lib.Table._to_pandas File > "XXX/pyarrow/1/0/x/dist/lib/python3.6/pyarrow/pandas_compat.py", line 779, in > table_to_blockmanager blocks = _table_to_blocks(options, table, categories, > ext_columns_dtypes) > File "XXX/pyarrow/1/0/x/dist/lib/python3.6/pyarrow/pandas_compat.py", line > 1115, in _table_to_blocks list(extension_columns.keys())) > File "pyarrow/table.pxi", line 1028, in pyarrow.lib.table_to_blocks File > "pyarrow/error.pxi", line 105, in pyarrow.lib.check_status > pyarrow.lib.ArrowNotImplementedError: No known 
equivalent Pandas block for > Arrow data of type map<string, string> is known. > > Arrow -> Parquet > PASSED > > Parquet -> Arrow > FAILED > Traceback (most recent call last): File "arrowtest.py", line 43, in > t2 = pq.read_table(source=fh) > File "XXX/pyarrow/1/0/x/dist/lib/python3.6/pyarrow/parquet.py", line 1586, in > read_table use_pandas_metadata=use_pandas_metadata) > File "XXX/pyarrow/1/0/x/dist/lib/python3.6/pyarrow/parquet.py", line 1474, in > read use_threads=use_threads > File "pyarrow/_dataset.pyx", line 399, in pyarrow._dataset.Dataset.to_table > File "pyarrow/_dataset.pyx", line 1994, in pyarrow._dataset.Scanner.to_table > File "pyarrow/error.pxi", line 122, in > pyarrow.lib.pyarrow_internal_check_status > File "pyarrow/error.pxi", line 105, in pyarrow.lib.check_status > pyarrow.lib.ArrowNotImplementedError: Reading lists of structs from Parquet > files not yet supported: key_value: list<struct<key: string not null, value: string> not null> not null > {code} > Updated to indicate that the to-Pandas conversion is done, but not yet for Parquet. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-10243) [Rust] [Datafusion] Optimize literal expression evaluation
[ https://issues.apache.org/jira/browse/ARROW-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jörn Horstmann reassigned ARROW-10243: -- Assignee: Jörn Horstmann > [Rust] [Datafusion] Optimize literal expression evaluation > -- > > Key: ARROW-10243 > URL: https://issues.apache.org/jira/browse/ARROW-10243 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Jörn Horstmann >Assignee: Jörn Horstmann >Priority: Major > Attachments: flamegraph.svg > > > While benchmarking the tpch query I noticed that the physical literal > expression takes up a sizable amount of time. I think the creation of the > corresponding array for numeric literals can be sped up by creating Buffer > and ArrayData directly without going through a builder. That also allows skipping > building a null bitmap for non-null literals. > I'm also thinking whether it might be possible to cache the created array. > For queries without a WHERE clause, I'd expect all batches except the last to > have the same length. I'm not sure though where to store the cached value. > Another possible optimization could be to cast literals already on the > logical plan side. In the tpch query the literal `1` is of type `u64` in the > logical plan and then needs to be processed by a cast kernel to convert to > `f64` for usage in an arithmetic expression. > The attached flamegraph is of 10 runs of tpch, with the data being loaded > into memory before running the queries (See ARROW-10240). > {code} > flamegraph ./target/release/tpch --iterations 10 --path ../tpch-dbgen > --format tbl --query 1 --batch-size 4096 -c1 --load > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
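The caching idea raised in the issue above — reuse the materialized literal array while batch lengths repeat — could look roughly like this. Purely illustrative Python with invented names ({{LiteralExpr}}, {{evaluate}}), not DataFusion's actual API:

```python
class LiteralExpr:
    """Illustrative sketch of memoizing a literal's materialized column
    per distinct batch length (not DataFusion's actual code)."""

    def __init__(self, value):
        self.value = value
        self._cache = {}  # batch length -> materialized column

    def evaluate(self, batch_len):
        # Reuse the column if we already built one for this length;
        # with constant batch sizes this builds the array only once.
        col = self._cache.get(batch_len)
        if col is None:
            col = [self.value] * batch_len  # stand-in for an Arrow array
            self._cache[batch_len] = col
        return col
```

Since literal arrays are immutable once built, sharing one instance across batches of the same length is safe; only the final, shorter batch would trigger a second allocation.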
[jira] [Resolved] (ARROW-10244) [Python][Docs] Add docs on using pyarrow.dataset.parquet_dataset
[ https://issues.apache.org/jira/browse/ARROW-10244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche resolved ARROW-10244. --- Resolution: Fixed Issue resolved by pull request 8410 [https://github.com/apache/arrow/pull/8410] > [Python][Docs] Add docs on using pyarrow.dataset.parquet_dataset > > > Key: ARROW-10244 > URL: https://issues.apache.org/jira/browse/ARROW-10244 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Joris Van den Bossche >Assignee: Joris Van den Bossche >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10263) [C++][Compute] Improve numerical stability of variances merging
[ https://issues.apache.org/jira/browse/ARROW-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibo Cai updated ARROW-10263: - Description: For a chunked array, the variance kernel needs to merge variances. Tested with two single-value chunks, [400800490], [400800400]. The merged variance is 3872. If treated as a single array with two values, the variance is 3904, matching numpy's output. So the current merging method is not stable in extreme cases when chunks are very short and have nearly equal means. was: For chunked array, variance kernel needs to merge variances. Tested with two single value chunk, [400800490], [400800400]. The merged variance is 3872. If treated as single array with two values, the variance is 3904, same as numpy outputs. So current merging method is not stable in extreme cases. > [C++][Compute] Improve numerical stability of variances merging > --- > > Key: ARROW-10263 > URL: https://issues.apache.org/jira/browse/ARROW-10263 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Yibo Cai >Assignee: Yibo Cai >Priority: Major > > For a chunked array, the variance kernel needs to merge variances. > Tested with two single-value chunks, [400800490], [400800400]. > The merged variance is 3872. If treated as a single array with two values, the > variance is 3904, matching numpy's output. > So the current merging method is not stable in extreme cases when chunks are very > short and have nearly equal means. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10263) [C++][Compute] Improve numerical stability of variances merging
Yibo Cai created ARROW-10263: Summary: [C++][Compute] Improve numerical stability of variances merging Key: ARROW-10263 URL: https://issues.apache.org/jira/browse/ARROW-10263 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Yibo Cai Assignee: Yibo Cai For a chunked array, the variance kernel needs to merge variances. Tested with two single-value chunks, [400800490], [400800400]. The merged variance is 3872. If treated as a single array with two values, the variance is 3904, matching numpy's output. So the current merging method is not stable in extreme cases. -- This message was sent by Atlassian Jira (v8.3.4#803005)
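For reference, a numerically stable way to merge per-chunk statistics is the pairwise-update formula of Chan et al., which combines counts, means, and sums of squared deviations (M2) rather than the variances themselves. A minimal Python sketch, illustrative only and not the kernel's actual or proposed code:

```python
import statistics

def merge_var(n_a, mean_a, m2_a, n_b, mean_b, m2_b):
    """Merge two partial aggregates (count, mean, M2) with the
    pairwise-update formula from Chan et al.  M2 is the sum of squared
    deviations from the mean, so population variance = M2 / n."""
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2

# Two single-value chunks, as in the report above; a one-element chunk
# has M2 == 0 by definition.
a, b = [400800490.0], [400800400.0]
n, mean, m2 = merge_var(len(a), a[0], 0.0, len(b), b[0], 0.0)
merged_var = m2 / n
direct_var = statistics.pvariance(a + b)  # variance over the combined data
```

Here the merge works only with the delta between the chunk means (90), so it avoids the catastrophic cancellation that occurs when large, nearly equal quantities are subtracted, and it agrees with the single-pass result.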
[jira] [Resolved] (ARROW-9962) [Python] Conversion to pandas with index column using fixed timezone fails
[ https://issues.apache.org/jira/browse/ARROW-9962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche resolved ARROW-9962. -- Resolution: Fixed Issue resolved by pull request 8162 [https://github.com/apache/arrow/pull/8162] > [Python] Conversion to pandas with index column using fixed timezone fails > -- > > Key: ARROW-9962 > URL: https://issues.apache.org/jira/browse/ARROW-9962 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Joris Van den Bossche >Assignee: Joris Van den Bossche >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 3h > Remaining Estimate: 0h > > From https://github.com/pandas-dev/pandas/issues/35997: it seems we are > handling a normal column and index column differently in the conversion to > pandas. > {code} > In [5]: import pandas as pd >...: from datetime import datetime, timezone >...: >...: df = pd.DataFrame([[datetime.now(timezone.utc), > datetime.now(timezone.utc)]], columns=['date_index', 'date_column']) >...: table = pa.Table.from_pandas(df.set_index('date_index')) >...: > In [6]: table > Out[6]: > pyarrow.Table > date_column: timestamp[ns, tz=+00:00] > date_index: timestamp[ns, tz=+00:00] > In [7]: table.to_pandas() > ... > UnknownTimeZoneError: '+00:00' > {code} > So this happens specifically for "fixed offset" timezones, and only for index > columns (eg {{table.select(["date_column"]).to_pandas()}} works fine). > It seems this is because for columns we use our helper {{make_tz_aware}} to > convert the string "+01:00" to a python timezone, which is then understood by > pandas (the string is not handled by pandas). But for the index column we > fail to do this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
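The helper mentioned in the issue above ({{make_tz_aware}}) converts a fixed-offset string such as "+01:00" into a Python timezone object that pandas can interpret. A stdlib sketch of that conversion — the parsing details are assumed here, not pyarrow's actual implementation:

```python
from datetime import timedelta, timezone

def fixed_offset_tz(offset: str) -> timezone:
    """Turn a fixed-offset string like '+01:00' or '-05:30' into a
    datetime.timezone (sketch; pyarrow's real helper may differ)."""
    sign = -1 if offset.startswith("-") else 1
    hours, minutes = offset.lstrip("+-").split(":")
    return timezone(sign * timedelta(hours=int(hours), minutes=int(minutes)))

tz = fixed_offset_tz("+00:00")  # a zero-offset fixed timezone
```

Applying the same conversion on the index-column path would make `table.to_pandas()` treat "+00:00" the same way the regular-column path already does.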
[jira] [Assigned] (ARROW-10260) [Python] Missing MapType to Pandas dtype
[ https://issues.apache.org/jira/browse/ARROW-10260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned ARROW-10260: Assignee: (was: Bryan Cutler) > [Python] Missing MapType to Pandas dtype > > > Key: ARROW-10260 > URL: https://issues.apache.org/jira/browse/ARROW-10260 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Bryan Cutler >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > The Map type conversion to Pandas done in ARROW-10151 forgot to add dtype > mapping for {{to_pandas_dtype()}} > > {code:java} > In [2]: d = pa.map_(pa.int64(), pa.float64()) >In [3]: d.to_pandas_dtype() > > > --- > NotImplementedError Traceback (most recent call last) > in > > 1 > d.to_pandas_dtype()~/miniconda2/envs/pyarrow-test/lib/python3.7/site-packages/pyarrow/types.pxi > in pyarrow.lib.DataType.to_pandas_dtype()NotImplementedError: map<int64, double>{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-10260) [Python] Missing MapType to Pandas dtype
[ https://issues.apache.org/jira/browse/ARROW-10260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned ARROW-10260: Assignee: Bryan Cutler > [Python] Missing MapType to Pandas dtype > > > Key: ARROW-10260 > URL: https://issues.apache.org/jira/browse/ARROW-10260 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Bryan Cutler >Assignee: Bryan Cutler >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > The Map type conversion to Pandas done in ARROW-10151 forgot to add dtype > mapping for {{to_pandas_dtype()}} > > {code:java} > In [2]: d = pa.map_(pa.int64(), pa.float64()) >In [3]: d.to_pandas_dtype() > > > --- > NotImplementedError Traceback (most recent call last) > in > > 1 > d.to_pandas_dtype()~/miniconda2/envs/pyarrow-test/lib/python3.7/site-packages/pyarrow/types.pxi > in pyarrow.lib.DataType.to_pandas_dtype()NotImplementedError: map<int64, double>{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
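The shape of the missing mapping is easy to picture: primitive Arrow types have a direct numpy dtype, while nested types such as map<int64, double> can only fall back to the generic object dtype rather than raising NotImplementedError. A hypothetical sketch (not pyarrow's actual code; the table and function here are illustrative):

```python
# Illustrative Arrow-type-name -> numpy-dtype-name table (tiny subset).
_PRIMITIVE_DTYPES = {
    'int64': 'int64',
    'double': 'float64',
}

def to_pandas_dtype(arrow_type_name: str) -> str:
    """Return a numpy dtype name for a primitive type; nested types
    (map<...>, list<...>, struct<...>) fall back to 'object'."""
    return _PRIMITIVE_DTYPES.get(arrow_type_name, 'object')

print(to_pandas_dtype('int64'))               # int64
print(to_pandas_dtype('map<int64, double>'))  # object
```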
[jira] [Updated] (ARROW-10262) [C++] Some TypeClass in Scalar classes seem incorrect
[ https://issues.apache.org/jira/browse/ARROW-10262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-10262: --- Labels: pull-request-available (was: ) > [C++] Some TypeClass in Scalar classes seem incorrect > - > > Key: ARROW-10262 > URL: https://issues.apache.org/jira/browse/ARROW-10262 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 1.0.1 >Reporter: RUOXI SUN >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The _TypeClass_ aliases in > _[BinaryScalar|https://github.com/apache/arrow/blob/master/cpp/src/arrow/scalar.h#L217]_ > and > _[LargeBinaryScalar|https://github.com/apache/arrow/blob/master/cpp/src/arrow/scalar.h#L242]_ > are _BinaryScalar_ and _LargeBinaryScalar_. Are they supposed to be > _BinaryType_ and _LargeBinaryType_? > I'm having issues when I use _TypeTraits_ on _ScalarType::TypeClass_ - > the compiler complains that the expected members are missing in the specializations > _TypeTraits<BinaryScalar>_ and _TypeTraits<LargeBinaryScalar>_. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10262) [C++] Some TypeClass in Scalar classes seem incorrect
[ https://issues.apache.org/jira/browse/ARROW-10262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] RUOXI SUN updated ARROW-10262: -- Description: The _TypeClass_ aliases in _[BinaryScalar|https://github.com/apache/arrow/blob/master/cpp/src/arrow/scalar.h#L217]_ and _[LargeBinaryScalar|https://github.com/apache/arrow/blob/master/cpp/src/arrow/scalar.h#L242]_ are _BinaryScalar_ and _LargeBinaryScalar_. Are they supposed to be _BinaryType_ and _LargeBinaryType_? I'm having issues when I use _TypeTraits_ on _ScalarType::TypeClass_ - the compiler complains that the expected members are missing in the specializations _TypeTraits<BinaryScalar>_ and _TypeTraits<LargeBinaryScalar>_. was: Alias _TypeClass_ in _[BinaryScalar|https://github.com/apache/arrow/blob/master/cpp/src/arrow/scalar.h#L217]_ and _[LargeBinaryScalar|https://github.com/apache/arrow/blob/master/cpp/src/arrow/scalar.h#L242]_ are being _BinaryScalar_ and _LargeBinaryScalar_. Are they supposed to be _BinaryType_ and _LargeBinaryType_? I'm having issues when I use _TypeTrait_ on _ScalarType::TypeClass_ - compiler complains that there are no whatever members in specialized _TypeTrait_ class _TypeTrait_. > [C++] Some TypeClass in Scalar classes seem incorrect > - > > Key: ARROW-10262 > URL: https://issues.apache.org/jira/browse/ARROW-10262 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 1.0.1 >Reporter: RUOXI SUN >Priority: Minor > > The _TypeClass_ aliases in > _[BinaryScalar|https://github.com/apache/arrow/blob/master/cpp/src/arrow/scalar.h#L217]_ > and > _[LargeBinaryScalar|https://github.com/apache/arrow/blob/master/cpp/src/arrow/scalar.h#L242]_ > are _BinaryScalar_ and _LargeBinaryScalar_. Are they supposed to be > _BinaryType_ and _LargeBinaryType_? > I'm having issues when I use _TypeTraits_ on _ScalarType::TypeClass_ - > the compiler complains that the expected members are missing in the specializations > _TypeTraits<BinaryScalar>_ and _TypeTraits<LargeBinaryScalar>_. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10262) [C++] Some TypeClass in Scalar classes seem incorrect
[ https://issues.apache.org/jira/browse/ARROW-10262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] RUOXI SUN updated ARROW-10262: -- Description: Alias _TypeClass_ in _[BinaryScalar|https://github.com/apache/arrow/blob/master/cpp/src/arrow/scalar.h#L217]_ and _[LargeBinaryScalar|https://github.com/apache/arrow/blob/master/cpp/src/arrow/scalar.h#L242]_ are being _BinaryScalar_ and _LargeBinaryScalar_. Are they supposed to be _BinaryType_ and _LargeBinaryType_? I'm having issues when I use _TypeTrait_ on _ScalarType::TypeClass_ - compiler complains that there are no whatever members in specialized _TypeTrait_ class _TypeTrait_. was: Alias `TypeClass` in [BinaryScalar|https://github.com/apache/arrow/blob/master/cpp/src/arrow/scalar.h#L217] and [LargeBinaryScalar|https://github.com/apache/arrow/blob/master/cpp/src/arrow/scalar.h#L242] are being `BinaryScalar` and `LargeBinaryScalar`. Are they supposed to be `BinaryType` and `LargeBinaryType`? I'm having issues when I use `TypeTrait` on `ScalarType::TypeClass`. > [C++] Some TypeClass in Scalar classes seem incorrect > - > > Key: ARROW-10262 > URL: https://issues.apache.org/jira/browse/ARROW-10262 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 1.0.1 >Reporter: RUOXI SUN >Priority: Minor > > Alias _TypeClass_ in > _[BinaryScalar|https://github.com/apache/arrow/blob/master/cpp/src/arrow/scalar.h#L217]_ > and > _[LargeBinaryScalar|https://github.com/apache/arrow/blob/master/cpp/src/arrow/scalar.h#L242]_ > are being _BinaryScalar_ and _LargeBinaryScalar_. Are they supposed to be > _BinaryType_ and _LargeBinaryType_? > I'm having issues when I use _TypeTrait_ on _ScalarType::TypeClass_ - > compiler complains that there are no whatever members in specialized > _TypeTrait_ class _TypeTrait_. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10262) [C++] Some TypeClass in Scalar classes seem incorrect
RUOXI SUN created ARROW-10262: - Summary: [C++] Some TypeClass in Scalar classes seem incorrect Key: ARROW-10262 URL: https://issues.apache.org/jira/browse/ARROW-10262 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 1.0.1 Reporter: RUOXI SUN The `TypeClass` aliases in [BinaryScalar|https://github.com/apache/arrow/blob/master/cpp/src/arrow/scalar.h#L217] and [LargeBinaryScalar|https://github.com/apache/arrow/blob/master/cpp/src/arrow/scalar.h#L242] are `BinaryScalar` and `LargeBinaryScalar`. Are they supposed to be `BinaryType` and `LargeBinaryType`? I'm having issues when I use `TypeTraits` on `ScalarType::TypeClass`. -- This message was sent by Atlassian Jira (v8.3.4#803005)