[jira] [Updated] (ARROW-11223) [Java] BaseVariableWidthVector/BaseLargeVariableWidthVector setNull and getBufferSizeFor is buggy
[ https://issues.apache.org/jira/browse/ARROW-11223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated ARROW-11223: --- Summary: [Java] BaseVariableWidthVector/BaseLargeVariableWidthVector setNull and getBufferSizeFor is buggy (was: [Java] BaseVariableWidthVector setNull and getBufferSizeFor is buggy)
> Key: ARROW-11223
> URL: https://issues.apache.org/jira/browse/ARROW-11223
> Project: Apache Arrow
> Issue Type: Bug
> Components: Java
> Affects Versions: 2.0.0
> Reporter: Weichen Xu
> Assignee: Weichen Xu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> Calling getBufferSizeFor after setNull may fail with java.lang.IndexOutOfBoundsException: index: 15880, length: 4 (expected: range(0, 15880)). Tested on Arrow 2.0.0. Reproduction code in Scala:
> {code}
> import org.apache.arrow.vector.VarCharVector
> import org.apache.arrow.memory.RootAllocator
>
> val rootAllocator = new RootAllocator(Long.MaxValue)
> val v1 = new VarCharVector("var1", rootAllocator)
> v1.allocateNew()
> val valueCount = 3970 // any number >= 3970 produces a similar error
> for (idx <- 0 until valueCount) {
>   v1.setNull(idx)
> }
> // fails with java.lang.IndexOutOfBoundsException:
> // index: 15880, length: 4 (expected: range(0, 15880))
> v1.getBufferSizeFor(valueCount)
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11314) [Release][APT][Yum] Add support for verifying arm64 packages
Kouhei Sutou created ARROW-11314: Summary: [Release][APT][Yum] Add support for verifying arm64 packages Key: ARROW-11314 URL: https://issues.apache.org/jira/browse/ARROW-11314 Project: Apache Arrow Issue Type: Improvement Components: Developer Tools Reporter: Kouhei Sutou Assignee: Kouhei Sutou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11314) [Release][APT][Yum] Add support for verifying arm64 packages
[ https://issues.apache.org/jira/browse/ARROW-11314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11314: --- Labels: pull-request-available (was: ) > [Release][APT][Yum] Add support for verifying arm64 packages > - > > Key: ARROW-11314 > URL: https://issues.apache.org/jira/browse/ARROW-11314 > Project: Apache Arrow > Issue Type: Improvement > Components: Developer Tools >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11313) [Rust] Size hint of iterators is incorrect
[ https://issues.apache.org/jira/browse/ARROW-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11313: --- Labels: pull-request-available (was: ) > [Rust] Size hint of iterators is incorrect > -- > > Key: ARROW-11313 > URL: https://issues.apache.org/jira/browse/ARROW-11313 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Jorge Leitão >Assignee: Jorge Leitão >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11313) [Rust] Size hint of iterators is incorrect
Jorge Leitão created ARROW-11313: Summary: [Rust] Size hint of iterators is incorrect Key: ARROW-11313 URL: https://issues.apache.org/jira/browse/ARROW-11313 Project: Apache Arrow Issue Type: Bug Components: Rust Reporter: Jorge Leitão Assignee: Jorge Leitão -- This message was sent by Atlassian Jira (v8.3.4#803005)
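The bug class in ARROW-11313 is easier to see on a toy iterator than in the Arrow codebase. A std-only sketch (the `Ints` type below is a hypothetical stand-in, not Arrow's iterator): a correct `size_hint` reports the remaining number of elements and shrinks as `next` consumes them, rather than always returning the initial length.

```rust
// Hypothetical iterator over a fixed-length slice; illustrates the contract
// that size_hint must track consumption, which ARROW-11313 reports as broken.
struct Ints<'a> {
    data: &'a [i32],
    pos: usize,
}

impl<'a> Iterator for Ints<'a> {
    type Item = i32;

    fn next(&mut self) -> Option<i32> {
        let v = self.data.get(self.pos).copied();
        self.pos += v.is_some() as usize;
        v
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        // Remaining, not total: adapters like collect() pre-allocate from this.
        let remaining = self.data.len() - self.pos;
        (remaining, Some(remaining))
    }
}

fn main() {
    let mut it = Ints { data: &[1, 2, 3], pos: 0 };
    assert_eq!(it.size_hint(), (3, Some(3)));
    it.next();
    // After consuming one item the hint must shrink to 2.
    assert_eq!(it.size_hint(), (2, Some(2)));
}
```

An incorrect hint is not memory-unsafe by itself, but consumers that trust it (e.g. buffer pre-allocation) over- or under-allocate.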
[jira] [Updated] (ARROW-11311) [Rust] unset_bit is toggling bits, not unsetting them
[ https://issues.apache.org/jira/browse/ARROW-11311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jorge Leitão updated ARROW-11311: - Component/s: Rust > [Rust] unset_bit is toggling bits, not unsetting them > - > > Key: ARROW-11311 > URL: https://issues.apache.org/jira/browse/ARROW-11311 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Jorge Leitão >Assignee: Jorge Leitão >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > The functions {{bit_util::unset_bit[_raw]}} are currently toggling bits, not > setting them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
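The one-line difference behind ARROW-11311 is easiest to see in isolation. A std-only sketch (`toggle_bit`/`unset_bit` below are hypothetical stand-ins for `bit_util::unset_bit[_raw]`, using the same LSB-first bit numbering): XOR flips a bit, while AND with the negated mask clears it unconditionally.

```rust
/// The buggy behaviour: XOR toggles, so "unsetting" a bit that is 0 sets it to 1.
fn toggle_bit(data: &mut [u8], i: usize) {
    data[i / 8] ^= 1 << (i % 8);
}

/// The intended behaviour: AND with the negated mask always clears the bit,
/// and is idempotent.
fn unset_bit(data: &mut [u8], i: usize) {
    data[i / 8] &= !(1 << (i % 8));
}

fn main() {
    let mut buf = [0u8; 1];
    toggle_bit(&mut buf, 3); // bit was 0, XOR turns it on: the reported bug
    assert_eq!(buf[0], 0b0000_1000);
    unset_bit(&mut buf, 3);
    unset_bit(&mut buf, 3); // calling twice is safe: the bit stays 0
    assert_eq!(buf[0], 0);
}
```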
[jira] [Created] (ARROW-11312) [Rust] Implement FromIter for timestamps, that includes timezone info
Neville Dipale created ARROW-11312: -- Summary: [Rust] Implement FromIter for timestamps, that includes timezone info Key: ARROW-11312 URL: https://issues.apache.org/jira/browse/ARROW-11312 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Neville Dipale We currently have TimestampArray::from_vec and TimestampArray::from_opt_vec in order to provide timezone information. We do not have an option that uses FromIter. When implementing this, we should search the codebase (esp Parquet) and replace the vector-based methods above with iterators where it makes sense. -- This message was sent by Atlassian Jira (v8.3.4#803005)
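The API shape ARROW-11312 asks for can be sketched with std types only. `TimestampArray`, `with_timezone`, and the field layout below are illustrative stand-ins, not the real arrow crate types: collect from an iterator of `Option<i64>`, then attach the timezone metadata that `FromIterator` alone cannot carry.

```rust
use std::iter::FromIterator;

#[derive(Debug, PartialEq)]
struct TimestampArray {
    values: Vec<Option<i64>>, // timestamps since the epoch; None = null slot
    timezone: Option<String>,
}

impl FromIterator<Option<i64>> for TimestampArray {
    fn from_iter<I: IntoIterator<Item = Option<i64>>>(iter: I) -> Self {
        TimestampArray {
            values: iter.into_iter().collect(),
            timezone: None,
        }
    }
}

impl TimestampArray {
    /// Attach timezone metadata after collecting, mirroring what
    /// from_vec/from_opt_vec currently take as an extra argument.
    fn with_timezone(mut self, tz: &str) -> Self {
        self.timezone = Some(tz.to_string());
        self
    }
}

fn main() {
    let arr = vec![Some(1_609_459_200), None, Some(1_609_459_260)]
        .into_iter()
        .collect::<TimestampArray>()
        .with_timezone("UTC");
    assert_eq!(arr.values.len(), 3);
    assert_eq!(arr.timezone.as_deref(), Some("UTC"));
}
```

A builder-style `with_timezone` keeps the `FromIterator` impl itself timezone-free, which is what makes `collect()` usable in generic code.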
[jira] [Updated] (ARROW-11310) [Rust] Implement arrow JSON writer
[ https://issues.apache.org/jira/browse/ARROW-11310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-11310: --- Summary: [Rust] Implement arrow JSON writer (was: implement arrow JSON writer) > [Rust] Implement arrow JSON writer > -- > > Key: ARROW-11310 > URL: https://issues.apache.org/jira/browse/ARROW-11310 > Project: Apache Arrow > Issue Type: Task > Components: Rust >Reporter: QP Hou >Assignee: QP Hou >Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11311) [Rust] unset_bit is toggling bits, not unsetting them
[ https://issues.apache.org/jira/browse/ARROW-11311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-11311: --- Summary: [Rust] unset_bit is toggling bits, not unsetting them (was: unset_bit is toggling bits, not unsetting them) > [Rust] unset_bit is toggling bits, not unsetting them > - > > Key: ARROW-11311 > URL: https://issues.apache.org/jira/browse/ARROW-11311 > Project: Apache Arrow > Issue Type: Bug >Reporter: Jorge Leitão >Assignee: Jorge Leitão >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > The functions {{bit_util::unset_bit[_raw]}} are currently toggling bits, not > setting them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-7396) [Format] Register media types (MIME types) for Apache Arrow formats to IANA
[ https://issues.apache.org/jira/browse/ARROW-7396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267660#comment-17267660 ] QP Hou commented on ARROW-7396: --- Any update on this task? Should we start a vote for what [~maartenbreddels] proposed to move this forward? > [Format] Register media types (MIME types) for Apache Arrow formats to IANA > --- > > Key: ARROW-7396 > URL: https://issues.apache.org/jira/browse/ARROW-7396 > Project: Apache Arrow > Issue Type: Improvement > Components: Format >Reporter: Kouhei Sutou >Priority: Major > > See "MIME types" thread for details: > https://lists.apache.org/thread.html/b15726d0c0da2223ba1b45a226ef86263f688b20532a30535cd5e267%40%3Cdev.arrow.apache.org%3E > Summary: > * If we don't register our media types for Apache Arrow formats (IPC File > Format and IPC Streaming Format) to IANA, we should use "x-" prefix such as > "application/x-apache-arrow-file". > * It may be better that we reuse the same manner as Apache Thrift. Apache > Thrift registers their media types as "application/vnd.apache.thrift.XXX". If > we use the same manner as Apache Thrift, we will use > "application/vnd.apache.arrow.file" or something. > TODO: > * Decide which media types should we register. (Do we need vote?) > * Register our media types to IANA. > ** Media types page: > https://www.iana.org/assignments/media-types/media-types.xhtml > ** Application form for new media types: > https://www.iana.org/form/media-types -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11311) unset_bit is toggling bits, not unsetting them
[ https://issues.apache.org/jira/browse/ARROW-11311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11311: --- Labels: pull-request-available (was: ) > unset_bit is toggling bits, not unsetting them > -- > > Key: ARROW-11311 > URL: https://issues.apache.org/jira/browse/ARROW-11311 > Project: Apache Arrow > Issue Type: Bug >Reporter: Jorge Leitão >Assignee: Jorge Leitão >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The functions {{bit_util::unset_bit[_raw]}} are currently toggling bits, not > setting them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11311) unset_bit is incorrect
[ https://issues.apache.org/jira/browse/ARROW-11311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jorge Leitão updated ARROW-11311: - Description: The functions {{bit_util::unset_bit[_raw]}} are currently toggling bits, not setting them. (was: The functions {{bit_util::set_bit[_raw]}} are currently toggling bits, not setting them.) > unset_bit is incorrect > -- > > Key: ARROW-11311 > URL: https://issues.apache.org/jira/browse/ARROW-11311 > Project: Apache Arrow > Issue Type: Bug >Reporter: Jorge Leitão >Assignee: Jorge Leitão >Priority: Major > > The functions {{bit_util::unset_bit[_raw]}} are currently toggling bits, not > setting them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11311) unset_bit is toggling bits, not unsetting them
[ https://issues.apache.org/jira/browse/ARROW-11311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jorge Leitão updated ARROW-11311: - Summary: unset_bit is toggling bits, not unsetting them (was: unset_bit is incorrect) > unset_bit is toggling bits, not unsetting them > -- > > Key: ARROW-11311 > URL: https://issues.apache.org/jira/browse/ARROW-11311 > Project: Apache Arrow > Issue Type: Bug >Reporter: Jorge Leitão >Assignee: Jorge Leitão >Priority: Major > > The functions {{bit_util::unset_bit[_raw]}} are currently toggling bits, not > setting them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11311) unset_bit is incorrect
[ https://issues.apache.org/jira/browse/ARROW-11311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jorge Leitão updated ARROW-11311: - Summary: unset_bit is incorrect (was: set_bit is incorrect) > unset_bit is incorrect > -- > > Key: ARROW-11311 > URL: https://issues.apache.org/jira/browse/ARROW-11311 > Project: Apache Arrow > Issue Type: Bug >Reporter: Jorge Leitão >Assignee: Jorge Leitão >Priority: Major > > The functions {{bit_util::set_bit[_raw]}} are currently toggling bits, not > setting them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11311) set_bit is incorrect
[ https://issues.apache.org/jira/browse/ARROW-11311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jorge Leitão updated ARROW-11311: - Description: The functions {{bit_util::set_bit[_raw]}} are currently toggling bits, not setting them. (was: The functions {{bit_util::[un]set_bit[_raw]}} are currently flipping a bit, not setting or unsetting it.) > set_bit is incorrect > > > Key: ARROW-11311 > URL: https://issues.apache.org/jira/browse/ARROW-11311 > Project: Apache Arrow > Issue Type: Bug >Reporter: Jorge Leitão >Assignee: Jorge Leitão >Priority: Major > > The functions {{bit_util::set_bit[_raw]}} are currently toggling bits, not > setting them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11311) set_bit is incorrect
[ https://issues.apache.org/jira/browse/ARROW-11311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jorge Leitão updated ARROW-11311: - Summary: set_bit is incorrect (was: set_bit and unset_bit are incorrect) > set_bit is incorrect > > > Key: ARROW-11311 > URL: https://issues.apache.org/jira/browse/ARROW-11311 > Project: Apache Arrow > Issue Type: Bug >Reporter: Jorge Leitão >Assignee: Jorge Leitão >Priority: Major > > The functions {{bit_util::[un]set_bit[_raw]}} are currently flipping a bit, > not setting or unsetting it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11311) set_bit and unset_bit are incorrect
Jorge Leitão created ARROW-11311: Summary: set_bit and unset_bit are incorrect Key: ARROW-11311 URL: https://issues.apache.org/jira/browse/ARROW-11311 Project: Apache Arrow Issue Type: Bug Reporter: Jorge Leitão Assignee: Jorge Leitão The functions {{bit_util::[un]set_bit[_raw]}} are currently flipping a bit, not setting or unsetting it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11310) implement arrow JSON writer
[ https://issues.apache.org/jira/browse/ARROW-11310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11310: --- Labels: pull-request-available (was: ) > implement arrow JSON writer > --- > > Key: ARROW-11310 > URL: https://issues.apache.org/jira/browse/ARROW-11310 > Project: Apache Arrow > Issue Type: Task > Components: Rust >Reporter: QP Hou >Assignee: QP Hou >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11310) implement arrow JSON writer
QP Hou created ARROW-11310: -- Summary: implement arrow JSON writer Key: ARROW-11310 URL: https://issues.apache.org/jira/browse/ARROW-11310 Project: Apache Arrow Issue Type: Task Components: Rust Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-11309) [Release][C#] Use .NET 3.1 for verification
[ https://issues.apache.org/jira/browse/ARROW-11309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-11309. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 9254 [https://github.com/apache/arrow/pull/9254] > [Release][C#] Use .NET 3.1 for verification > --- > > Key: ARROW-11309 > URL: https://issues.apache.org/jira/browse/ARROW-11309 > Project: Apache Arrow > Issue Type: Improvement > Components: Developer Tools >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11309) [Release][C#] Use .NET 3.1 for verification
[ https://issues.apache.org/jira/browse/ARROW-11309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11309: --- Labels: pull-request-available (was: ) > [Release][C#] Use .NET 3.1 for verification > --- > > Key: ARROW-11309 > URL: https://issues.apache.org/jira/browse/ARROW-11309 > Project: Apache Arrow > Issue Type: Improvement > Components: Developer Tools >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11309) [Release][C#] Use .NET 3.1 for verification
Kouhei Sutou created ARROW-11309: Summary: [Release][C#] Use .NET 3.1 for verification Key: ARROW-11309 URL: https://issues.apache.org/jira/browse/ARROW-11309 Project: Apache Arrow Issue Type: Improvement Components: Developer Tools Reporter: Kouhei Sutou Assignee: Kouhei Sutou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11061) [Rust] Validate array properties against schema
[ https://issues.apache.org/jira/browse/ARROW-11061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-11061: --- Fix Version/s: 4.0.0 > [Rust] Validate array properties against schema > --- > > Key: ARROW-11061 > URL: https://issues.apache.org/jira/browse/ARROW-11061 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Neville Dipale >Priority: Major > Fix For: 4.0.0 > > > We have a problem when it comes to nested arrays, where one could create a list array whose child fields can't be null, but > the list itself can have null slots. > This creates a lot of work when working with such nested arrays, because we > have to create work-arounds to account for this, and take unnecessarily > slower paths. > I propose that we prevent this problem at the source, by: > * checking that a batch can't be created with arrays that have incompatible > null contracts > * preventing list and struct children from being non-null if any descendant > of such children are null (might be less of an issue for structs) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11077) [Rust] ParquetFileArrowReader panics when trying to read nested list
[ https://issues.apache.org/jira/browse/ARROW-11077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-11077: --- Fix Version/s: 4.0.0
> Key: ARROW-11077
> URL: https://issues.apache.org/jira/browse/ARROW-11077
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust
> Reporter: Ben Sully
> Assignee: Neville Dipale
> Priority: Major
> Fix For: 4.0.0
> Attachments: small-nested-lists.parquet
>
> I think this is documented in the code, but I can't be 100% sure.
> When trying to execute a DataFusion query over a Parquet file where one field is a struct with a nested list, the thread panics due to unwrapping on an `Option::None` [at this point|https://github.com/apache/arrow/blob/36d80e37373ab49454eb47b2a89c10215ca1b67e/rust/parquet/src/arrow/array_reader.rs#L1334-L1337]. This `None` is returned by [`visit_primitive`|https://github.com/apache/arrow/blob/master/rust/parquet/src/arrow/array_reader.rs#L1243-L1245], but I can't quite make sense of _why_ it returns a `None` rather than an error.
> I added a couple of dbg! calls to see what the item_type and list_type are:
> {code}
> [/home/ben/repos/rust/arrow/rust/parquet/src/arrow/array_reader.rs:1339] _type = PrimitiveType {
>     basic_info: BasicTypeInfo {
>         name: "item",
>         repetition: Some(OPTIONAL),
>         logical_type: UTF8,
>         id: None,
>     },
>     physical_type: BYTE_ARRAY,
>     type_length: -1,
>     scale: -1,
>     precision: -1,
> }
> [/home/ben/repos/rust/arrow/rust/parquet/src/arrow/array_reader.rs:1340] _type = GroupType {
>     basic_info: BasicTypeInfo {
>         name: "tags",
>         repetition: Some(OPTIONAL),
>         logical_type: LIST,
>         id: None,
>     },
>     fields: [
>         GroupType {
>             basic_info: BasicTypeInfo {
>                 name: "list",
>                 repetition: Some(REPEATED),
>                 logical_type: NONE,
>                 id: None,
>             },
>             fields: [
>                 PrimitiveType {
>                     basic_info: BasicTypeInfo {
>                         name: "item",
>                         repetition: Some(OPTIONAL),
>                         logical_type: UTF8,
>                         id: None,
>                     },
>                     physical_type: BYTE_ARRAY,
>                     type_length: -1,
>                     scale: -1,
>                     precision: -1,
>                 },
>             ],
>         },
>     ],
> }
> {code}
> I guess we should at least use `.expect` here instead of `.unwrap` so it's clearer why this is happening! -- This message was sent by Atlassian Jira (v8.3.4#803005)
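The closing suggestion in that report is a one-line change. A std-only illustration (the map, function, and message below are made up for the example, not the real array_reader internals): `.expect` carries a descriptive message into the panic output, while `.unwrap` only reports that a `None` was unwrapped.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the reader lookup that panics in array_reader.rs;
// names and message are illustrative only.
fn reader_for(readers: &HashMap<String, String>, field: &str) -> String {
    readers
        .get(field)
        .cloned()
        // With .unwrap() the panic says only "called `Option::unwrap()` on a
        // `None` value"; .expect() prepends the context below.
        .expect("no ArrayReader was built for this field (unsupported nested type?)")
}

fn main() {
    let mut readers = HashMap::new();
    readers.insert("tags".to_string(), "ListArrayReader".to_string());
    assert_eq!(reader_for(&readers, "tags"), "ListArrayReader");
    // reader_for(&readers, "missing") would panic with the descriptive message.
}
```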
[jira] [Updated] (ARROW-10550) [Rust] [Parquet] Write nested types (struct, list)
[ https://issues.apache.org/jira/browse/ARROW-10550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-10550: --- Fix Version/s: 4.0.0 > [Rust] [Parquet] Write nested types (struct, list) > -- > > Key: ARROW-10550 > URL: https://issues.apache.org/jira/browse/ARROW-10550 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust >Reporter: Neville Dipale >Priority: Major > Fix For: 4.0.0 > > > After being able to compute arbitrarily nested definition and repetitions, we > should be able to write structs and lists -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10927) [Rust] [Parquet] Add Decimal to ArrayBuilderReader for physical type fixed size binary
[ https://issues.apache.org/jira/browse/ARROW-10927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-10927: --- Summary: [Rust] [Parquet] Add Decimal to ArrayBuilderReader for physical type fixed size binary (was: Add Decimal to ArrayBuilderReader for physical type fixed size binary) > [Rust] [Parquet] Add Decimal to ArrayBuilderReader for physical type fixed > size binary > -- > > Key: ARROW-10927 > URL: https://issues.apache.org/jira/browse/ARROW-10927 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust >Reporter: Florian Müller >Assignee: Florian Müller >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11308) [Rust] [Parquet] Add Arrow decimal array writer
Neville Dipale created ARROW-11308: -- Summary: [Rust] [Parquet] Add Arrow decimal array writer Key: ARROW-11308 URL: https://issues.apache.org/jira/browse/ARROW-11308 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10926) Add parquet reader / writer for decimal types
[ https://issues.apache.org/jira/browse/ARROW-10926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-10926: --- Fix Version/s: 4.0.0 > Add parquet reader / writer for decimal types > - > > Key: ARROW-10926 > URL: https://issues.apache.org/jira/browse/ARROW-10926 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Florian Müller >Priority: Major > Fix For: 4.0.0 > > > Decimal values, stored physically as e.g. Fixed Size Binary should be > represented by DecimalArray when the logical type indicates decimal. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-11269) [Rust] Unable to read Parquet file because of mismatch in column-derived and embedded schemas
[ https://issues.apache.org/jira/browse/ARROW-11269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale reassigned ARROW-11269: -- Assignee: Neville Dipale > [Rust] Unable to read Parquet file because of mismatch in column-derived and > embedded schemas > - > > Key: ARROW-11269 > URL: https://issues.apache.org/jira/browse/ARROW-11269 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 3.0.0 >Reporter: Max Burke >Assignee: Neville Dipale >Priority: Blocker > Labels: pull-request-available > Attachments: 0100c937-7c1c-78c4-1f4b-156ef04e79f0.parquet, main.rs > > Time Spent: 40m > Remaining Estimate: 0h > > The issue seems to stem from the new(-ish) behavior of the Arrow Parquet > reader where the embedded arrow schema is used instead of deriving the schema > from the Parquet columns. > > However it seems like some cases still derive the schema type from the column > types, leading to the Arrow record batch reader erroring out that the column > types must match the schema types. > > In our case, the column type is an int96 datetime (ns) type, and the Arrow > type in the embedded schema is DataType::Timestamp(TimeUnit::Nanoseconds, > Some("UTC")). However, the code that constructs the Arrays seems to re-derive > this column type as DataType::Timestamp(TimeUnit::Nanoseconds, None) (because > the Parquet schema has no timezone information). And so, Parquet files that > we were able to read successfully with our branch of Arrow circa October are > now unreadable. > > I've attached an example of a Parquet file that demonstrates the problem. > This file was created in Python (as most of our Parquet files are). > > I've also attached a sample Rust program that will demonstrate the error. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10984) [Rust] Document use of unsafe in parquet crate
[ https://issues.apache.org/jira/browse/ARROW-10984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-10984: --- Fix Version/s: 4.0.0 > [Rust] Document use of unsafe in parquet crate > -- > > Key: ARROW-10984 > URL: https://issues.apache.org/jira/browse/ARROW-10984 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust >Reporter: Andy Grove >Priority: Major > Fix For: 4.0.0 > > > There are ~64 uses of unsafe in the parquet crate > {code:java} > ./parquet/src/util/hash_util.rs:6 > ./parquet/src/util/bit_packing.rs:34 > ./parquet/src/util/bit_util.rs:1 > ./parquet/src/data_type.rs:12 > ./parquet/src/arrow/record_reader.rs:5 > ./parquet/src/arrow/array_reader.rs:8 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8796) [Rust] Allow parquet to be written directly to memory
[ https://issues.apache.org/jira/browse/ARROW-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-8796: -- Fix Version/s: 4.0.0 > [Rust] Allow parquet to be written directly to memory > - > > Key: ARROW-8796 > URL: https://issues.apache.org/jira/browse/ARROW-8796 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Markus Westerlind >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > The `TryClone` bound currently needed in `ParquetWriter` makes it awkward to > write parquet to memory, forcing either a `Rc` + `RefCell` wrapper or to > write to a `File` first. > By explicitly threading lifetimes around, the underlying writer can be passed > mutably through all parts of the writer, allowing `Vec<u8>` or any other > implementors of the basic io traits to be used directly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
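The pattern ARROW-8796 asks for can be sketched with std only: a writer generic over any `std::io::Write`, so a `Vec<u8>` serves as an in-memory sink with no `TryClone` bound, `Rc`, or `RefCell`. `SketchWriter` below is a stand-in for illustration, not the real `ParquetWriter` API.

```rust
use std::io::Write;

// Generic over any sink implementing std::io::Write: File, Vec<u8>,
// Cursor<Vec<u8>>, a TcpStream, etc.
struct SketchWriter<W: Write> {
    sink: W,
}

impl<W: Write> SketchWriter<W> {
    fn new(sink: W) -> Self {
        SketchWriter { sink }
    }

    fn write_record(&mut self, record: &[u8]) -> std::io::Result<()> {
        self.sink.write_all(record)
    }

    /// Hand the sink back to the caller, e.g. to recover the in-memory bytes.
    fn into_inner(self) -> W {
        self.sink
    }
}

fn main() {
    // A Vec<u8> is the whole "file": nothing touches the filesystem.
    let mut w = SketchWriter::new(Vec::new());
    w.write_record(b"PAR1").unwrap();
    let bytes = w.into_inner();
    assert_eq!(bytes, b"PAR1");
}
```

Taking ownership of the sink (or, as the ticket suggests, a mutable borrow threaded through with explicit lifetimes) is what removes the need for the `TryClone` bound.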
[jira] [Updated] (ARROW-10553) [Rust] [Parquet] Panic when reading Parquet file produced with parquet-cpp
[ https://issues.apache.org/jira/browse/ARROW-10553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-10553: --- Fix Version/s: 4.0.0 > [Rust] [Parquet] Panic when reading Parquet file produced with parquet-cpp > -- > > Key: ARROW-10553 > URL: https://issues.apache.org/jira/browse/ARROW-10553 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 2.0.0 > Environment: Windows 10 x86_64 > Cargo nightly >Reporter: Michael Spector >Priority: Major > Fix For: 4.0.0 > > Attachments: 3072786907765935896_0_3.snappy.Parquet > > > See attached Parquet file that was created with parquet-cpp. > The file metadata is: > > {color:#dcdfe4}creator: parquet-cpp version 1.5.1-SNAPSHOT > > file schema: schema > > > __sys_isSystemRelocated: OPTIONAL INT64 R:0 D:1 > __sys_schemaId: OPTIONAL INT64 R:0 D:1 > __sys_invOffsetLSID: OPTIONAL INT64 R:0 D:1 > __sys_invOffsetGroupIdx: OPTIONAL INT64 R:0 D:1 > __sys_invOffsetRecordIdx: OPTIONAL INT64 R:0 D:1 > _rid: OPTIONAL BINARY L:STRING R:0 D:1 > __sys_sequenceNumber: OPTIONAL INT64 R:0 D:1 > __sys_recordIndex: OPTIONAL INT64 R:0 D:1 > __sys_isTombstone: OPTIONAL INT64 R:0 D:1 > _ts: OPTIONAL INT64 R:0 D:1 > partitionKey: OPTIONAL BINARY L:STRING R:0 D:1 > entityType: OPTIONAL BINARY L:STRING R:0 D:1 > ttl: OPTIONAL INT64 R:0 D:1 > tripId: OPTIONAL INT32 R:0 D:1 > vin: OPTIONAL BINARY L:STRING R:0 D:1 > state: OPTIONAL BINARY L:STRING R:0 D:1 > region: OPTIONAL INT32 R:0 D:1 > outsideTemperature: OPTIONAL INT64 R:0 D:1 > engineTemperature: OPTIONAL INT64 R:0 D:1 > speed: OPTIONAL INT64 R:0 D:1 > fuel: OPTIONAL INT64 R:0 D:1 > fuelRate: OPTIONAL DOUBLE R:0 D:1 > engineoil: OPTIONAL INT64 R:0 D:1 > tirepressure: OPTIONAL INT64 R:0 D:1 > odometer: OPTIONAL DOUBLE R:0 D:1 > accelerator_pedal_position: OPTIONAL INT64 R:0 D:1 > parking_brake_status: OPTIONAL BOOLEAN R:0 D:1 > brake_pedal_status: OPTIONAL BOOLEAN R:0 D:1 > headlamp_status: OPTIONAL BOOLEAN R:0 D:1 > 
transmission_gear_position: OPTIONAL INT64 R:0 D:1 > ignition_status: OPTIONAL BOOLEAN R:0 D:1 > windshield_wiper_status: OPTIONAL BOOLEAN R:0 D:1 > abs: OPTIONAL BOOLEAN R:0 D:1 > refrigerationUnitKw: OPTIONAL DOUBLE R:0 D:1 > refrigerationUnitTemp: OPTIONAL DOUBLE R:0 D:1 > timestamp: OPTIONAL BINARY L:STRING R:0 D:1 > id: OPTIONAL BINARY L:STRING R:0 D:1 > _etag: OPTIONAL BINARY L:STRING R:0 D:1 > __sys_value: OPTIONAL BINARY L:STRING R:0 D:1 > > row group 1: RC:27150 TS:2481123 OFFSET:4 > > > __sys_isSystemRelocated: INT64 SNAPPY DO:4 FPO:28 SZ:102/98/0.96 VC:27150 > ENC:PLAIN,PLAIN_DICTIONARY,RLE ST:[min: 0, max: 0, num_nulls: 0] > __sys_schemaId: INT64 SNAPPY DO:205 FPO:220 SZ:51/48/0.94 VC:27150 > ENC:PLAIN,PLAIN_DICTIONARY,RLE ST:[num_nulls: 27150, min/max not defined] > __sys_invOffsetLSID: INT64 SNAPPY DO:308 FPO:323 SZ:51/48/0.94 VC:27150 > ENC:PLAIN,PLAIN_DICTIONARY,RLE ST:[num_nulls: 27150, min/max not defined] > __sys_invOffsetGroupIdx: INT64 SNAPPY DO:416 FPO:431 SZ:51/48/0.94 VC:27150 > ENC:PLAIN,PLAIN_DICTIONARY,RLE ST:[num_nulls: 27150, min/max not defined] > __sys_invOffsetRecordIdx: INT64 SNAPPY DO:528 FPO:543 SZ:51/48/0.94 VC:27150 > ENC:PLAIN,PLAIN_DICTIONARY,RLE ST:[num_nulls: 27150, min/max not defined] > _rid: BINARY SNAPPY DO:641 FPO:137000 SZ:187417/811272/4.33 VC:27150 > ENC:PLAIN,PLAIN_DICTIONARY,RLE ST:[min: o9dcAMA1y14+BA==, max: > o9dcAMA1y17zaQAABA==, num_nulls: 0] > __sys_sequenceNumber: INT64 SNAPPY DO:188156 FPO:296856 > SZ:159746/268260/1.68 VC:27150 ENC:PLAIN,PLAIN_DICTIONARY,RLE ST:[min: 3, > max: 27152, num_nulls: 0] > __sys_recordIndex: INT64 SNAPPY DO:348005 FPO:456699 SZ:159740/268260/1.68 > VC:27150 ENC:PLAIN,PLAIN_DICTIONARY,RLE ST:[min: 0, max: 27149, num_nulls: 0] > __sys_isTombstone: INT64 SNAPPY DO:507845 FPO:507860 SZ:51/48/0.94 VC:27150 > ENC:PLAIN,PLAIN_DICTIONARY,RLE ST:[num_nulls: 27150, min/max not defined] > _ts: INT64 SNAPPY DO:507954 FPO:510167 SZ:3974/6137/1.54 VC:27150 > ENC:PLAIN,PLAIN_DICTIONARY,RLE 
ST:[min: 1597365315, max: 1597365859, > num_nulls: 0] > partitionKey: BINARY SNAPPY DO:512012 FPO:512256 SZ:13967/14026/1.00 > VC:27150 ENC:PLAIN,PLAIN_DICTIONARY,RLE ST:[min: 0A4SMSAGR5CA4LAY6-2020-08, > max: YKO1Q8RX7Z20BVBG0-2020-08, num_nulls: 0] > entityType: BINARY SNAPPY DO:526088 FPO:526124 SZ:110/106/0.96 VC:27150 > ENC:PLAIN,PLAIN_DICTIONARY,RLE ST:[min: VehicleTelemetry, max: > VehicleTelemetry, num_nulls: 0] > ttl: INT64 SNAPPY DO:526285 FPO:526309 SZ:102/98/0.96 VC:27150 > ENC:PLAIN,PLAIN_DICTIONARY,RLE
[jira] [Updated] (ARROW-11269) [Rust] Unable to read Parquet file because of mismatch in column-derived and embedded schemas
[ https://issues.apache.org/jira/browse/ARROW-11269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11269: --- Labels: pull-request-available (was: ) > [Rust] Unable to read Parquet file because of mismatch in column-derived and > embedded schemas > - > > Key: ARROW-11269 > URL: https://issues.apache.org/jira/browse/ARROW-11269 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 3.0.0 >Reporter: Max Burke >Priority: Blocker > Labels: pull-request-available > Attachments: 0100c937-7c1c-78c4-1f4b-156ef04e79f0.parquet, main.rs > > Time Spent: 10m > Remaining Estimate: 0h > > The issue seems to stem from the new(-ish) behavior of the Arrow Parquet > reader where the embedded arrow schema is used instead of deriving the schema > from the Parquet columns. > > However it seems like some cases still derive the schema type from the column > types, leading to the Arrow record batch reader erroring out that the column > types must match the schema types. > > In our case, the column type is an int96 datetime (ns) type, and the Arrow > type in the embedded schema is DataType::Timestamp(TimeUnit::Nanoseconds, > Some("UTC")). However, the code that constructs the Arrays seems to re-derive > this column type as DataType::Timestamp(TimeUnit::Nanoseconds, None) (because > the Parquet schema has no timezone information). And so, Parquet files that > we were able to read successfully with our branch of Arrow circa October are > now unreadable. > > I've attached an example of a Parquet file that demonstrates the problem. > This file was created in Python (as most of our Parquet files are). > > I've also attached a sample Rust program that will demonstrate the error. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-11303) [Release][C++] Enable mimalloc in the windows verification script
[ https://issues.apache.org/jira/browse/ARROW-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou reassigned ARROW-11303: Assignee: Krisztian Szucs > [Release][C++] Enable mimalloc in the windows verification script > - > > Key: ARROW-11303 > URL: https://issues.apache.org/jira/browse/ARROW-11303 > Project: Apache Arrow > Issue Type: Bug > Components: Developer Tools >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-11303) [Release][C++] Enable mimalloc in the windows verification script
[ https://issues.apache.org/jira/browse/ARROW-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-11303. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 9247 [https://github.com/apache/arrow/pull/9247] > [Release][C++] Enable mimalloc in the windows verification script > - > > Key: ARROW-11303 > URL: https://issues.apache.org/jira/browse/ARROW-11303 > Project: Apache Arrow > Issue Type: Bug > Components: Developer Tools >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-11183) [Rust] [Parquet] LogicalType::TIMESTAMP_NANOS missing
[ https://issues.apache.org/jira/browse/ARROW-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267559#comment-17267559 ] Ivan Smirnov commented on ARROW-11183: -- [~nevi_me] Yea, I think I could give it a go in all three if you give a brief outline of what needs to be done and where. > [Rust] [Parquet] LogicalType::TIMESTAMP_NANOS missing > - > > Key: ARROW-11183 > URL: https://issues.apache.org/jira/browse/ARROW-11183 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Ivan Smirnov >Priority: Major > > There's UnitTime::NANOS in parquet-format, but no nanosecond timestamp > support (seemingly) in schema's LogicalType. What is needed to add support > for nanosecond timestamps in Rust Parquet? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11302) [Release][Python] Remove verification of python 3.5 wheel on macOS
[ https://issues.apache.org/jira/browse/ARROW-11302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-11302: Fix Version/s: (was: 4.0.0) 3.0.0 > [Release][Python] Remove verification of python 3.5 wheel on macOS > -- > > Key: ARROW-11302 > URL: https://issues.apache.org/jira/browse/ARROW-11302 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-11307) [Release][Ubuntu][20.10] Add workaround for dependency issue
[ https://issues.apache.org/jira/browse/ARROW-11307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-11307. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 9252 [https://github.com/apache/arrow/pull/9252] > [Release][Ubuntu][20.10] Add workaround for dependency issue > > > Key: ARROW-11307 > URL: https://issues.apache.org/jira/browse/ARROW-11307 > Project: Apache Arrow > Issue Type: Improvement > Components: Developer Tools >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Minor > Labels: pull-request-available > Fix For: 3.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11307) [Release][Ubuntu][20.10] Add workaround for dependency issue
Kouhei Sutou created ARROW-11307: Summary: [Release][Ubuntu][20.10] Add workaround for dependency issue Key: ARROW-11307 URL: https://issues.apache.org/jira/browse/ARROW-11307 Project: Apache Arrow Issue Type: Improvement Components: Developer Tools Reporter: Kouhei Sutou Assignee: Kouhei Sutou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11307) [Release][Ubuntu][20.10] Add workaround for dependency issue
[ https://issues.apache.org/jira/browse/ARROW-11307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11307: --- Labels: pull-request-available (was: ) > [Release][Ubuntu][20.10] Add workaround for dependency issue > > > Key: ARROW-11307 > URL: https://issues.apache.org/jira/browse/ARROW-11307 > Project: Apache Arrow > Issue Type: Improvement > Components: Developer Tools >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-11257) [C++][Parquet] PyArrow Table contains different data after writing and reloading from Parquet
[ https://issues.apache.org/jira/browse/ARROW-11257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267527#comment-17267527 ] Joris Van den Bossche commented on ARROW-11257: --- I am not really sure what the exact bug was, but ARROW-10493 was one of the nested-parquet-related bugs reported after pyarrow 2.0.0 (note that the ability to write the data you have was new in 2.0.0) bq. And when is the next release containing the update scheduled for? There is a release candidate out right now. So if all goes well by the end of this week. > [C++][Parquet] PyArrow Table contains different data after writing and > reloading from Parquet > - > > Key: ARROW-11257 > URL: https://issues.apache.org/jira/browse/ARROW-11257 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 2.0.0 >Reporter: Kari Schoonbee >Priority: Critical > Attachments: anonymised.jsonl, pyarrow_parquet_issue.ipynb > > > * I'm loading a JSONlines object into a table using > {code:java} > pa.json.read_json{code} > It contains one column that is a nested dictionary. > * I select a row by key and inspect its nested dictionary. > * I write the table to parquet > * I load the table again from the parquet file > * I check the same key and the nested dictionary is not the same. > > To reproduce: > > Find the attached JSONLines file and Jupyter Notebook. > The json file contains entries per customer with a generated `msisdn`, > `scoring_request_id` and `scorecard_result` object. Each `scorecard_result` > consists of a list of feature objects, all with the value the same as the > `msisdn` and a score. > The notebook reads the file and demonstrates the issue. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-11306) [Packaging][Ubuntu][16.04] Add missing libprotobuf-dev dependency
[ https://issues.apache.org/jira/browse/ARROW-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-11306. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 9251 [https://github.com/apache/arrow/pull/9251] > [Packaging][Ubuntu][16.04] Add missing libprotobuf-dev dependency > - > > Key: ARROW-11306 > URL: https://issues.apache.org/jira/browse/ARROW-11306 > Project: Apache Arrow > Issue Type: Bug > Components: Packaging >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Minor > Labels: pull-request-available > Fix For: 3.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-11135) Using Maven Central artifacts as dependencies produce runtime errors
[ https://issues.apache.org/jira/browse/ARROW-11135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267509#comment-17267509 ] Kouhei Sutou edited comment on ARROW-11135 at 1/18/21, 8:57 PM: My mistake. I missed that the JAR was macOS only. The tests pass and all is fine on openJDK 11. However, on openJDK 8 and 15, I get the following error after the tests complete which still causes the build to fail. However, I assume this may be because I did not correctly close some Gandiva resources. I have edited the code to try to properly free all resources, and I'll watch what happens with this build. {noformat} pure virtual method called terminate called without an active exception {noformat} was (Author: michaelmior): My mistake. I missed that the JAR was macOS only. The tests pass and all is fine on openJDK 11. However, on openJDK 8 and 15, I get the following error after the tests complete which still causes the build to fail. However, I assume this may be because I did not correctly close some Gandiva resources. I have edited the code to try to properly free all resources, and I'll watch what happens with this build. {{pure virtual method called}} {{ terminate called without an active exception}} > Using Maven Central artifacts as dependencies produce runtime errors > > > Key: ARROW-11135 > URL: https://issues.apache.org/jira/browse/ARROW-11135 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Affects Versions: 2.0.0 >Reporter: Michael Mior >Priority: Major > > I'm working on connecting Arrow/Gandiva with Apache Calcite. Overall the > integration is working well, but I'm having issues . As [suggested on the > mailing > list|https://lists.apache.org/thread.html/r93a4fedb499c746917ab8d62cf5a8db8c93a7f24bc9fac81f90bedaa%40%3Cuser.arrow.apache.org%3E], > using Dremio's public artifacts solves the problem. Between two Apache > projects however, there would be strong preference to use Apache artifacts as > a dependency. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-11301) [C++] Fix reading LZ4-compressed Parquet files produced by Java Parquet implementation
[ https://issues.apache.org/jira/browse/ARROW-11301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-11301. - Resolution: Fixed Issue resolved by pull request 9244 [https://github.com/apache/arrow/pull/9244] > [C++] Fix reading LZ4-compressed Parquet files produced by Java Parquet > implementation > -- > > Key: ARROW-11301 > URL: https://issues.apache.org/jira/browse/ARROW-11301 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Critical > Labels: pull-request-available > Fix For: 3.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > We slightly misunderstood the Hadoop LZ4 format. A compressed buffer can > actually contain several "frames", each prefixed with (de)compressed size. > See > https://issues.apache.org/jira/browse/ARROW-9177?focusedCommentId=17267058=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17267058 -- This message was sent by Atlassian Jira (v8.3.4#803005)
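The multi-frame layout described in ARROW-11301 can be sketched as a frame splitter. This is a hedged illustration of the framing only — it assumes each frame carries a big-endian 32-bit decompressed size followed by a big-endian 32-bit compressed size, per the issue description, and it does not actually LZ4-decompress anything:

```python
import struct

def split_hadoop_lz4_frames(buf: bytes):
    """Walk a Hadoop-style LZ4 buffer and return (decompressed_size, payload) per frame.

    A reader that assumed a single frame would stop after the first
    size-prefixed block; this loop keeps consuming until the buffer ends.
    The payloads are returned still compressed.
    """
    frames = []
    pos = 0
    while pos < len(buf):
        decompressed_size, compressed_size = struct.unpack_from(">II", buf, pos)
        pos += 8
        frames.append((decompressed_size, buf[pos:pos + compressed_size]))
        pos += compressed_size
    return frames

# Two synthetic frames (payloads are placeholder bytes, not real LZ4 data).
raw = struct.pack(">II", 10, 3) + b"abc" + struct.pack(">II", 20, 2) + b"xy"
print(split_hadoop_lz4_frames(raw))  # [(10, b'abc'), (20, b'xy')]
```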
[jira] [Updated] (ARROW-11306) [Packaging][Ubuntu][16.04] Add missing libprotobuf-dev dependency
[ https://issues.apache.org/jira/browse/ARROW-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11306: --- Labels: pull-request-available (was: ) > [Packaging][Ubuntu][16.04] Add missing libprotobuf-dev dependency > - > > Key: ARROW-11306 > URL: https://issues.apache.org/jira/browse/ARROW-11306 > Project: Apache Arrow > Issue Type: Bug > Components: Packaging >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11306) [Packaging][Ubuntu][16.04] Add missing libprotobuf-dev dependency
Kouhei Sutou created ARROW-11306: Summary: [Packaging][Ubuntu][16.04] Add missing libprotobuf-dev dependency Key: ARROW-11306 URL: https://issues.apache.org/jira/browse/ARROW-11306 Project: Apache Arrow Issue Type: Bug Components: Packaging Reporter: Kouhei Sutou Assignee: Kouhei Sutou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-11135) Using Maven Central artifacts as dependencies produce runtime errors
[ https://issues.apache.org/jira/browse/ARROW-11135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267509#comment-17267509 ] Michael Mior commented on ARROW-11135: -- My mistake. I missed that the JAR was macOS only. The tests pass and all is fine on openJDK 11. However, on openJDK 8 and 15, I get the following error after the tests complete which still causes the build to fail. However, I assume this may be because I did not correctly close some Gandiva resources. I have edited the code to try to properly free all resources, and I'll watch what happens with this build. {{pure virtual method called}} {{ terminate called without an active exception}} > Using Maven Central artifacts as dependencies produce runtime errors > > > Key: ARROW-11135 > URL: https://issues.apache.org/jira/browse/ARROW-11135 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Affects Versions: 2.0.0 >Reporter: Michael Mior >Priority: Major > > I'm working on connecting Arrow/Gandiva with Apache Calcite. Overall the > integration is working well, but I'm having issues . As [suggested on the > mailing > list|https://lists.apache.org/thread.html/r93a4fedb499c746917ab8d62cf5a8db8c93a7f24bc9fac81f90bedaa%40%3Cuser.arrow.apache.org%3E], > using Dremio's public artifacts solves the problem. Between two Apache > projects however, there would be strong preference to use Apache artifacts as > a dependency. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-11223) [Java] BaseVariableWidthVector setNull and getBufferSizeFor is buggy
[ https://issues.apache.org/jira/browse/ARROW-11223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou reassigned ARROW-11223: Assignee: Weichen Xu > [Java] BaseVariableWidthVector setNull and getBufferSizeFor is buggy > > > Key: ARROW-11223 > URL: https://issues.apache.org/jira/browse/ARROW-11223 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Affects Versions: 2.0.0 >Reporter: Weichen Xu >Assignee: Weichen Xu >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > We may get error java.lang.IndexOutOfBoundsException: index: 15880, length: > 4 (expected: range(0, 15880)). > I test on arrow 2.0.0 > Reproduce code in scala: > {code} > import org.apache.arrow.vector.VarCharVector > import org.apache.arrow.memory.RootAllocator > val rootAllocator = new RootAllocator(Long.MaxValue) > val v1 = new VarCharVector("var1", rootAllocator) > v1.allocateNew() > val valueCount = 3970 // use any number >= 3970 will get similar error > for (idx <- 0 until valueCount) { > v1.setNull(idx) > } > v1.getBufferSizeFor(valueCount) # failed, get error > java.lang.IndexOutOfBoundsException: index: 15880, length: 4 (expected: > range(0, 15880)) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11223) [Java] BaseVariableWidthVector setNull and getBufferSizeFor is buggy
[ https://issues.apache.org/jira/browse/ARROW-11223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-11223: - Description: We may get error java.lang.IndexOutOfBoundsException: index: 15880, length: 4 (expected: range(0, 15880)). I test on arrow 2.0.0 Reproduce code in scala: {code} import org.apache.arrow.vector.VarCharVector import org.apache.arrow.memory.RootAllocator val rootAllocator = new RootAllocator(Long.MaxValue) val v1 = new VarCharVector("var1", rootAllocator) v1.allocateNew() val valueCount = 3970 // use any number >= 3970 will get similar error for (idx <- 0 until valueCount) { v1.setNull(idx) } v1.getBufferSizeFor(valueCount) # failed, get error java.lang.IndexOutOfBoundsException: index: 15880, length: 4 (expected: range(0, 15880)) {code} was: I test on arrow 2.0.0 Reproduce code in scala: {code} import org.apache.arrow.vector.VarCharVector import org.apache.arrow.memory.RootAllocator val rootAllocator = new RootAllocator(Long.MaxValue) val v1 = new VarCharVector("var1", rootAllocator) v1.allocateNew() val valueCount = 3970 // use any number >= 3970 will get similar error for (idx <- 0 until valueCount) { v1.setNull(idx) } v1.getBufferSizeFor(valueCount) # failed, get error java.lang.IndexOutOfBoundsException: index: 15880, length: 4 (expected: range(0, 15880)) {code} > [Java] BaseVariableWidthVector setNull and getBufferSizeFor is buggy > > > Key: ARROW-11223 > URL: https://issues.apache.org/jira/browse/ARROW-11223 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Affects Versions: 2.0.0 >Reporter: Weichen Xu >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > We may get error java.lang.IndexOutOfBoundsException: index: 15880, length: > 4 (expected: range(0, 15880)). 
> I test on arrow 2.0.0 > Reproduce code in scala: > {code} > import org.apache.arrow.vector.VarCharVector > import org.apache.arrow.memory.RootAllocator > val rootAllocator = new RootAllocator(Long.MaxValue) > val v1 = new VarCharVector("var1", rootAllocator) > v1.allocateNew() > val valueCount = 3970 // any number >= 3970 gives a similar error > for (idx <- 0 until valueCount) { > v1.setNull(idx) > } > v1.getBufferSizeFor(valueCount) // failed, get error > java.lang.IndexOutOfBoundsException: index: 15880, length: 4 (expected: > range(0, 15880)) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
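To see where the number 15880 in the exception comes from: BaseVariableWidthVector uses 4-byte offsets, and getBufferSizeFor(n) reads the offset after the last value, so n values require n + 1 offsets. The arithmetic below is illustrative only (it shows why the read lands exactly at the buffer's end; the Java fix itself is about setNull keeping the offset buffer filled and sized):

```python
OFFSET_WIDTH = 4  # BaseVariableWidthVector stores 4-byte offsets

def last_offset_byte_index(value_count: int) -> int:
    """Byte index of the offset that getBufferSizeFor(value_count) reads."""
    return value_count * OFFSET_WIDTH

def offsets_bytes_needed(value_count: int) -> int:
    """value_count values need value_count + 1 offsets in the offset buffer."""
    return (value_count + 1) * OFFSET_WIDTH

vc = 3970
print(last_offset_byte_index(vc))  # 15880 -- the failing index in the report
print(offsets_bytes_needed(vc))    # 15884 -- what the offset buffer must hold
```

Reading 4 bytes at index 15880 from a buffer whose valid range is (0, 15880) is exactly the IndexOutOfBoundsException quoted above.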
[jira] [Updated] (ARROW-11223) [Java] BaseVariableWidthVector setNull and getBufferSizeFor is buggy
[ https://issues.apache.org/jira/browse/ARROW-11223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-11223: - Summary: [Java] BaseVariableWidthVector setNull and getBufferSizeFor is buggy (was: BaseVariableWidthVector setNull and getBufferSizeFor is buggy, may get error java.lang.IndexOutOfBoundsException: index: 15880, length: 4 (expected: range(0, 15880))) > [Java] BaseVariableWidthVector setNull and getBufferSizeFor is buggy > > > Key: ARROW-11223 > URL: https://issues.apache.org/jira/browse/ARROW-11223 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Affects Versions: 2.0.0 >Reporter: Weichen Xu >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > I test on arrow 2.0.0 > Reproduce code in scala: > {code} > import org.apache.arrow.vector.VarCharVector > import org.apache.arrow.memory.RootAllocator > val rootAllocator = new RootAllocator(Long.MaxValue) > val v1 = new VarCharVector("var1", rootAllocator) > v1.allocateNew() > val valueCount = 3970 // use any number >= 3970 will get similar error > for (idx <- 0 until valueCount) { > v1.setNull(idx) > } > v1.getBufferSizeFor(valueCount) # failed, get error > java.lang.IndexOutOfBoundsException: index: 15880, length: 4 (expected: > range(0, 15880)) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-10344) [Python] Get all columns names (or schema) from Feather file, before loading whole Feather file
[ https://issues.apache.org/jira/browse/ARROW-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267494#comment-17267494 ] al-hadi boublenza commented on ARROW-10344: --- Facing the same issue and wondering how to know if you're dealing with a Feather V1 or Feather V2 file? (Using pyarrow) > [Python] Get all columns names (or schema) from Feather file, before loading > whole Feather file > > > Key: ARROW-10344 > URL: https://issues.apache.org/jira/browse/ARROW-10344 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Affects Versions: 1.0.1 >Reporter: Gert Hulselmans >Priority: Major > > Is there a way to get all column names (or schema) from a Feather file before > loading the full Feather file? > My Feather files are big (like 100GB) and the names of the columns are > different per analysis and can't be hard coded. > {code:python} > import pyarrow.feather as feather > # Code here to check which columns are in the feather file. > ... > my_columns = ... > # Result is pandas.DataFrame > read_df = feather.read_feather('/path/to/file', columns=my_columns) > # Result is pyarrow.Table > read_arrow = feather.read_table('/path/to/file', columns=my_columns) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
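On the follow-up question of telling Feather V1 from V2: Feather V1 files begin with the magic bytes FEA1, while Feather V2 files are Arrow IPC files beginning with ARROW1 (for V2, `pyarrow.ipc.open_file` can then expose the schema without materializing the data). A minimal sketch assuming those magic values:

```python
def feather_version(head: bytes) -> int:
    """Guess the Feather version from a file's leading bytes.

    Pass e.g. open(path, "rb").read(8).  Assumes Feather V1 starts with
    b"FEA1" and Feather V2 (Arrow IPC file format) with b"ARROW1".
    """
    if head.startswith(b"ARROW1"):
        return 2
    if head.startswith(b"FEA1"):
        return 1
    raise ValueError("not a Feather file")

print(feather_version(b"ARROW1\x00\x00"))  # 2
```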
[jira] [Commented] (ARROW-11183) [Rust] [Parquet] LogicalType::TIMESTAMP_NANOS missing
[ https://issues.apache.org/jira/browse/ARROW-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267491#comment-17267491 ] Neville Dipale commented on ARROW-11183: Hey [~aldanor], I had a look at the commit. The nanosecond type is part of the 2.6 format, which we don't fully support yet. There are three tasks: # Wire up the 2.6 changes in the parquet-format, it's clear that we're missing an enum (there might be more that's missing) # Add a reader for ts-nano # Add a writer for ts-nano, that uses the old int96 writer if legacy support is requested, or uses the new ts-nano for non-legacy types Would you like to contribute some of the above? I can help out with 1 as a start. Thanks > [Rust] [Parquet] LogicalType::TIMESTAMP_NANOS missing > - > > Key: ARROW-11183 > URL: https://issues.apache.org/jira/browse/ARROW-11183 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Ivan Smirnov >Priority: Major > > There's UnitTime::NANOS in parquet-format, but no nanosecond timestamp > support (seemingly) in schema's LogicalType. What is needed to add support > for nanosecond timestamps in Rust Parquet? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-11183) [Rust] [Parquet] LogicalType::TIMESTAMP_NANOS missing
[ https://issues.apache.org/jira/browse/ARROW-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267452#comment-17267452 ] Ivan Smirnov commented on ARROW-11183: -- [~nevi_me] See this commit: [https://github.com/apache/parquet-format/commit/b879065ac1bee3fe1d770eb3c4b60ab4267044d7] (PARQUET-1387) > [Rust] [Parquet] LogicalType::TIMESTAMP_NANOS missing > - > > Key: ARROW-11183 > URL: https://issues.apache.org/jira/browse/ARROW-11183 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Ivan Smirnov >Priority: Major > > There's UnitTime::NANOS in parquet-format, but no nanosecond timestamp > support (seemingly) in schema's LogicalType. What is needed to add support > for nanosecond timestamps in Rust Parquet? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-11183) [Rust] [Parquet] LogicalType::TIMESTAMP_NANOS missing
[ https://issues.apache.org/jira/browse/ARROW-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267449#comment-17267449 ] Ivan Smirnov commented on ARROW-11183: -- [~csun] Here's a Python example that works: {code:java} import pandas as pd import pyarrow as pa import pyarrow.parquet as pq df = pd.DataFrame(dict(x=[pd.Timestamp.now() for _ in range(10)])) table = pa.table(df) pq.write_table(table, 'timestamps.parquet', version='2.0') assert (pq.read_table('timestamps.parquet').to_pandas() == df).all().all() {code} What is the Rust equivalent then? Note: the table's schema shows up as {code:java} pyarrow.Table x: timestamp[ns] {code} > [Rust] [Parquet] LogicalType::TIMESTAMP_NANOS missing > - > > Key: ARROW-11183 > URL: https://issues.apache.org/jira/browse/ARROW-11183 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Ivan Smirnov >Priority: Major > > There's UnitTime::NANOS in parquet-format, but no nanosecond timestamp > support (seemingly) in schema's LogicalType. What is needed to add support > for nanosecond timestamps in Rust Parquet? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11305) [Rust]: parquet-rowcount binary tries to open itself as a parquet file
[ https://issues.apache.org/jira/browse/ARROW-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11305: --- Labels: pull-request-available (was: ) > [Rust]: parquet-rowcount binary tries to open itself as a parquet file > -- > > Key: ARROW-11305 > URL: https://issues.apache.org/jira/browse/ARROW-11305 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Jörn Horstmann >Assignee: Jörn Horstmann >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Introduced accidentally during clippy warning cleanups in > https://github.com/apache/arrow/pull/8687/files#diff-f3f978052bd519af87898fa196715ddb445c327045c09ed07be600ca4e1703b6R60 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11305) [Rust]: parquet-rowcount binary tries to open itself as a parquet file
Jörn Horstmann created ARROW-11305: -- Summary: [Rust]: parquet-rowcount binary tries to open itself as a parquet file Key: ARROW-11305 URL: https://issues.apache.org/jira/browse/ARROW-11305 Project: Apache Arrow Issue Type: Bug Reporter: Jörn Horstmann Assignee: Jörn Horstmann Introduced accidentally during clippy warning cleanups in https://github.com/apache/arrow/pull/8687/files#diff-f3f978052bd519af87898fa196715ddb445c327045c09ed07be600ca4e1703b6R60 -- This message was sent by Atlassian Jira (v8.3.4#803005)
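The regression is an argument-handling off-by-one: the binary iterated over the whole argument list, so argv[0] — the binary's own path — was treated as an input file. A language-neutral sketch of the idea in Python (hypothetical helper, not the Rust patch itself):

```python
def files_from_argv(argv: list) -> list:
    """Return only the file arguments, skipping argv[0] (the program's own path).

    Iterating over the full argument list is what made parquet-rowcount
    try to open itself as a Parquet file.
    """
    return argv[1:]

print(files_from_argv(["parquet-rowcount", "a.parquet", "b.parquet"]))
# ['a.parquet', 'b.parquet']
```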
[jira] [Updated] (ARROW-11305) [Rust]: parquet-rowcount binary tries to open itself as a parquet file
[ https://issues.apache.org/jira/browse/ARROW-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jörn Horstmann updated ARROW-11305: --- Component/s: Rust > [Rust]: parquet-rowcount binary tries to open itself as a parquet file > -- > > Key: ARROW-11305 > URL: https://issues.apache.org/jira/browse/ARROW-11305 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Jörn Horstmann >Assignee: Jörn Horstmann >Priority: Major > > Introduced accidentally during clippy warning cleanups in > https://github.com/apache/arrow/pull/8687/files#diff-f3f978052bd519af87898fa196715ddb445c327045c09ed07be600ca4e1703b6R60 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11302) [Release][Python] Remove verification of python 3.5 wheel on macOS
[ https://issues.apache.org/jira/browse/ARROW-11302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-11302: Fix Version/s: (was: 3.0.0) 4.0.0 > [Release][Python] Remove verification of python 3.5 wheel on macOS > -- > > Key: ARROW-11302 > URL: https://issues.apache.org/jira/browse/ARROW-11302 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-11302) [Release][Python] Remove verification of python 3.5 wheel on macOS
[ https://issues.apache.org/jira/browse/ARROW-11302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-11302. - Fix Version/s: (was: 4.0.0) 3.0.0 Resolution: Fixed Issue resolved by pull request 9246 [https://github.com/apache/arrow/pull/9246] > [Release][Python] Remove verification of python 3.5 wheel on macOS > -- > > Key: ARROW-11302 > URL: https://issues.apache.org/jira/browse/ARROW-11302 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11304) Add casts from / to DecimalArray
[ https://issues.apache.org/jira/browse/ARROW-11304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11304: --- Labels: pull-request-available (was: ) > Add casts from / to DecimalArray > > > Key: ARROW-11304 > URL: https://issues.apache.org/jira/browse/ARROW-11304 > Project: Apache Arrow > Issue Type: Sub-task >Reporter: Florian Müller >Assignee: Florian Müller >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > As discussed in [https://github.com/apache/arrow/pull/8880,] several compute > implementations will be required. This task deals with cast. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11304) Add casts from / to DecimalArray
Florian Müller created ARROW-11304: -- Summary: Add casts from / to DecimalArray Key: ARROW-11304 URL: https://issues.apache.org/jira/browse/ARROW-11304 Project: Apache Arrow Issue Type: Sub-task Reporter: Florian Müller Assignee: Florian Müller As discussed in [https://github.com/apache/arrow/pull/8880,] several compute implementations will be required. This task deals with cast. -- This message was sent by Atlassian Jira (v8.3.4#803005)
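Casting to or from a decimal type largely reduces to rescaling the stored unscaled integer by powers of ten. A rough Python sketch of that core step (the rounding choice here — truncation toward zero on downscale — is an assumption for illustration, not necessarily what the Rust cast kernel does):

```python
def rescale_decimal(value: int, from_scale: int, to_scale: int) -> int:
    """Rescale a decimal's unscaled integer between scales.

    Decimal arrays store unscaled integers: 1.50 at scale 2 is value=150.
    Upscaling multiplies by a power of ten; downscaling here truncates
    toward zero (one of several possible rounding modes).
    """
    if to_scale >= from_scale:
        return value * 10 ** (to_scale - from_scale)
    q = 10 ** (from_scale - to_scale)
    sign = -1 if value < 0 else 1
    return sign * (abs(value) // q)

print(rescale_decimal(150, 2, 3))   # 1500  (1.50 -> 1.500)
print(rescale_decimal(1999, 3, 2))  # 199   (1.999 -> 1.99, truncated)
```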
[jira] [Closed] (ARROW-11171) [Go] Build fails on s390x with noasm tag
[ https://issues.apache.org/jira/browse/ARROW-11171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Albrecht closed ARROW-11171. - verified `make test-noasm` on master on s390x thx [~kou]! > [Go] Build fails on s390x with noasm tag > > > Key: ARROW-11171 > URL: https://issues.apache.org/jira/browse/ARROW-11171 > Project: Apache Arrow > Issue Type: Bug > Components: Go > Environment: linux on s390x with -tags='noasm' >Reporter: Jonathan Albrecht >Assignee: Jonathan Albrecht >Priority: Minor > Labels: pull-request-available, s390x > Fix For: 3.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Initial support for s390x was added in > [aca707086160afd92da62aa2f9537a284528e48a|https://github.com/apache/arrow/commit/aca707086160afd92da62aa2f9537a284528e48a] > but if building with -tags='noasm' it fails with: > {code:go} > # github.com/apache/arrow/go/arrow/math > math/float64_s390x.go:21:6: initFloat64Go redeclared in this block > previous declaration at math/float64_noasm.go:23:6 > math/int64_s390x.go:21:6: initInt64Go redeclared in this block > previous declaration at math/int64_noasm.go:23:6 > math/math_s390x.go:24:6: initGo redeclared in this block > previous declaration at math/math_noasm.go:25:6 > math/uint64_s390x.go:21:6: initUint64Go redeclared in this block > previous declaration at math/uint64_noasm.go:23:6 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11303) [Release][C++] Enable mimalloc in the windows verification script
[ https://issues.apache.org/jira/browse/ARROW-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11303: --- Labels: pull-request-available (was: ) > [Release][C++] Enable mimalloc in the windows verification script > - > > Key: ARROW-11303 > URL: https://issues.apache.org/jira/browse/ARROW-11303 > Project: Apache Arrow > Issue Type: Bug > Components: Developer Tools >Reporter: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11303) [Release][C++] Enable mimalloc in the windows verification script
Krisztian Szucs created ARROW-11303: --- Summary: [Release][C++] Enable mimalloc in the windows verification script Key: ARROW-11303 URL: https://issues.apache.org/jira/browse/ARROW-11303 Project: Apache Arrow Issue Type: Bug Components: Developer Tools Reporter: Krisztian Szucs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11301) [C++] Fix reading LZ4-compressed Parquet files produced by Java Parquet implementation
[ https://issues.apache.org/jira/browse/ARROW-11301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-11301: --- Priority: Critical (was: Blocker) > [C++] Fix reading LZ4-compressed Parquet files produced by Java Parquet > implementation > -- > > Key: ARROW-11301 > URL: https://issues.apache.org/jira/browse/ARROW-11301 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Critical > Labels: pull-request-available > Fix For: 3.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > We slightly misunderstood the Hadoop LZ4 format. A compressed buffer can > actually contain several "frames", each prefixed with (de)compressed size. > See > https://issues.apache.org/jira/browse/ARROW-9177?focusedCommentId=17267058&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17267058 -- This message was sent by Atlassian Jira (v8.3.4#803005)
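The framing described in the issue can be sketched as follows. This is a hypothetical walker over the repeated [decompressed size, compressed size, payload] layout the comment describes, not Arrow's actual implementation; `ReadBE32` and `ListFrames` are illustrative names, and the LZ4 block decoding itself is omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Read a 4-byte big-endian integer (Hadoop writes sizes big-endian).
static uint32_t ReadBE32(const uint8_t* p) {
  return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
         (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

// Walk a Hadoop-LZ4 buffer and return (decompressed_size, compressed_size)
// for each frame. A single compressed buffer may hold several frames, each
// with its own 8-byte size prefix, which is what a single-frame reader misses.
std::vector<std::pair<uint32_t, uint32_t>> ListFrames(const uint8_t* buf,
                                                      size_t len) {
  std::vector<std::pair<uint32_t, uint32_t>> frames;
  size_t pos = 0;
  while (pos < len) {
    if (len - pos < 8) throw std::runtime_error("truncated frame header");
    uint32_t decompressed = ReadBE32(buf + pos);
    uint32_t compressed = ReadBE32(buf + pos + 4);
    pos += 8;
    if (len - pos < compressed) throw std::runtime_error("truncated frame body");
    frames.emplace_back(decompressed, compressed);
    pos += compressed;  // skip the LZ4 block payload of this frame
  }
  return frames;
}
```

A reader that assumes exactly one frame per buffer decodes the first frame correctly and then either stops early or fails on the trailing bytes, which matches the symptom reported against files produced by the Java implementation.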
[jira] [Updated] (ARROW-11301) [C++] Fix reading LZ4-compressed Parquet files produced by Java Parquet implementation
[ https://issues.apache.org/jira/browse/ARROW-11301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11301: --- Labels: pull-request-available (was: ) > [C++] Fix reading LZ4-compressed Parquet files produced by Java Parquet > implementation > -- > > Key: ARROW-11301 > URL: https://issues.apache.org/jira/browse/ARROW-11301 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Blocker > Labels: pull-request-available > Fix For: 3.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > We slightly misunderstood the Hadoop LZ4 format. A compressed buffer can > actually contain several "frames", each prefixed with (de)compressed size. > See > https://issues.apache.org/jira/browse/ARROW-9177?focusedCommentId=17267058&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17267058 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-9177) [C++][Parquet] Tracking issue for cross-implementation LZ4 Parquet compression compatibility
[ https://issues.apache.org/jira/browse/ARROW-9177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267338#comment-17267338 ] Antoine Pitrou commented on ARROW-9177: --- Great, thank you. I can confirm that the PR for ARROW-11301 reads the file properly. Depending on specifics of the release procedure, it may go into 3.0.0 or 4.0.0 (or perhaps a hypothetical 3.0.1). > [C++][Parquet] Tracking issue for cross-implementation LZ4 Parquet > compression compatibility > > > Key: ARROW-9177 > URL: https://issues.apache.org/jira/browse/ARROW-9177 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Critical > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > per PARQUET-1878, it seems that there are still problems with our use of LZ4 > compression in the Parquet format. While we should fix this (the Parquet > specification and our implementation of it), we may need to disable use of > LZ4 compression until the appropriate compatibility testing can be done. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-9177) [C++][Parquet] Tracking issue for cross-implementation LZ4 Parquet compression compatibility
[ https://issues.apache.org/jira/browse/ARROW-9177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267334#comment-17267334 ] Steve M. Kim commented on ARROW-9177: - These are the decoded values as line-delimited JSON: https://github.com/chairmank/arrow-9177-example/blob/3a169e32701939de64a8ecafb155cb0b730cd8d8/561120a3094ee4513ba619b518c7a6093fe4e38398219ad172fb75373c3360b8_decoded.jsonl > [C++][Parquet] Tracking issue for cross-implementation LZ4 Parquet > compression compatibility > > > Key: ARROW-9177 > URL: https://issues.apache.org/jira/browse/ARROW-9177 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Critical > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > per PARQUET-1878, it seems that there are still problems with our use of LZ4 > compression in the Parquet format. While we should fix this (the Parquet > specification and our implementation of it), we may need to disable use of > LZ4 compression until the appropriate compatibility testing can be done. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11302) [Release][Python] Remove verification of python 3.5 wheel on macOS
[ https://issues.apache.org/jira/browse/ARROW-11302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11302: --- Labels: pull-request-available (was: ) > [Release][Python] Remove verification of python 3.5 wheel on macOS > -- > > Key: ARROW-11302 > URL: https://issues.apache.org/jira/browse/ARROW-11302 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11302) [Release][Python] Remove verification of python 3.5 wheel on macOS
Krisztian Szucs created ARROW-11302: --- Summary: [Release][Python] Remove verification of python 3.5 wheel on macOS Key: ARROW-11302 URL: https://issues.apache.org/jira/browse/ARROW-11302 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Krisztian Szucs Assignee: Krisztian Szucs Fix For: 4.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11299) [Python] build warning in python
[ https://issues.apache.org/jira/browse/ARROW-11299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibo Cai updated ARROW-11299: - Component/s: Python C++ > [Python] build warning in python > > > Key: ARROW-11299 > URL: https://issues.apache.org/jira/browse/ARROW-11299 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Reporter: Yibo Cai >Priority: Major > > Many warnings about compute kernel options when building Arrow python. > Removing the line below suppresses the warnings. > https://github.com/apache/arrow/blob/140135908c5d131ceac31a0e529f9b9b763b1106/cpp/src/arrow/compute/function.h#L45 > I think the reason is that the virtual destructor makes the structure > non-standard-layout (not C compatible), so the offsetof macro cannot be used > on it safely. > As function options are straightforward, the destructor looks unnecessary. > [~bkietz] > *Steps to reproduce* > build arrow cpp > {code:bash} > ~/arrow/cpp/release $ cmake -GNinja -DCMAKE_BUILD_TYPE=Release > -DARROW_COMPUTE=ON -DARROW_BUILD_TESTS=ON > -DCMAKE_INSTALL_PREFIX=$(pwd)/_install -DCMAKE_INSTALL_LIBDIR=lib > -DARROW_PYTHON=ON -DCMAKE_CXX_COMPILER=/usr/bin/clang++-9 > -DCMAKE_C_COMPILER=/usr/bin/clang-9 .. > ~/arrow/cpp/release $ ninja install > {code} > build arrow python > {code:bash} > ~/arrow/python $ python --version > Python 3.6.9 > ~/arrow/python $ python setup.py build_ext --inplace > .. 
> [ 93%] Building CXX object CMakeFiles/_compute.dir/_compute.cpp.o [27/1691] > In file included from > /usr/include/x86_64-linux-gnu/bits/types/stack_t.h:23:0, > from /usr/include/signal.h:303, > from > /home/cyb/archery/lib/python3.6/site-packages/numpy/core/include/numpy/npy_interrupt.h:84, > from > /home/cyb/archery/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:5, > from > /home/cyb/arrow/cpp/release/_install/include/arrow/python/numpy_interop.h:41, > from /home/cyb/arrow/cpp/release/_install/include/arrow/python/helpers.h:27, > from /home/cyb/arrow/cpp/release/_install/include/arrow/python/api.h:24, > from /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:696: > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp: In function > ‘int __Pyx_modinit_type_init_code()’: > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26034:146: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__CastOptions’ is undefined [-Winvalid-offsetof] > x_type_7pyarrow_8_compute__CastOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__CastOptions, __pyx_base.__pyx_base.__weakref__); > ^ > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26066:150: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__FilterOptions’ is undefined > [-Winvalid-offsetof] > type_7pyarrow_8_compute__FilterOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__FilterOptions, > __pyx_base.__pyx_base.__weakref__); > ^ > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26082:146: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__TakeOptions’ is undefined [-Winvalid-offsetof] > x_type_7pyarrow_8_compute__TakeOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__TakeOptions, __pyx_base.__pyx_base.__weakref__); > ^ > 
/home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26130:150: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__MinMaxOptions’ is undefined > [-Winvalid-offsetof] > type_7pyarrow_8_compute__MinMaxOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__MinMaxOptions, > __pyx_base.__pyx_base.__weakref__); > ^ > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26146:148: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__CountOptions’ is undefined [-Winvalid-offsetof] > _type_7pyarrow_8_compute__CountOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__CountOptions, > __pyx_base.__pyx_base.__weakref__); > ^ > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26162:146: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__ModeOptions’ is undefined [-Winvalid-offsetof] > x_type_7pyarrow_8_compute__ModeOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__ModeOptions, __pyx_base.__pyx_base.__weakref__); > ^ > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26210:154: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__VarianceOptions’ is undefined > [-Winvalid-offsetof] > pe_7pyarrow_8_compute__VarianceOptions.tp_weaklistoffset = offsetof(struct >
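The quoted warnings boil down to a standard-layout question. A minimal sketch (with hypothetical type names, not Arrow's actual options classes) shows how adding a virtual destructor changes a struct's layout class, which is why `offsetof` on the Cython-generated subclasses triggers `-Winvalid-offsetof`:

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// A plain aggregate: standard-layout, so offsetof is well-defined.
struct PlainOptions {
  int64_t length;
};

// The virtual destructor introduces a vtable pointer, so the type (and any
// type derived from it) is no longer standard-layout; offsetof on it is
// conditionally-supported at best, hence gcc/clang's -Winvalid-offsetof.
struct VirtualOptions {
  virtual ~VirtualOptions() = default;
  int64_t length;
};

static_assert(std::is_standard_layout<PlainOptions>::value,
              "offsetof(PlainOptions, length) is well-defined");
static_assert(!std::is_standard_layout<VirtualOptions>::value,
              "offsetof(VirtualOptions, length) is not guaranteed");
```

This is consistent with the suggestion in the report: dropping the virtual destructor restores standard layout and silences the warnings.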
[jira] [Commented] (ARROW-9177) [C++][Parquet] Tracking issue for cross-implementation LZ4 Parquet compression compatibility
[ https://issues.apache.org/jira/browse/ARROW-9177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267275#comment-17267275 ] Antoine Pitrou commented on ARROW-9177: --- [~chairmank] Can you post the decoded values in the file somewhere? At least the first and last N. > [C++][Parquet] Tracking issue for cross-implementation LZ4 Parquet > compression compatibility > > > Key: ARROW-9177 > URL: https://issues.apache.org/jira/browse/ARROW-9177 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Critical > Labels: pull-request-available > Fix For: 2.0.0 > > > per PARQUET-1878, it seems that there are still problems with our use of LZ4 > compression in the Parquet format. While we should fix this (the Parquet > specification and our implementation of it), we may need to disable use of > LZ4 compression until the appropriate compatibility testing can be done. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9177) [C++][Parquet] Tracking issue for cross-implementation LZ4 Parquet compression compatibility
[ https://issues.apache.org/jira/browse/ARROW-9177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9177: -- Labels: pull-request-available (was: ) > [C++][Parquet] Tracking issue for cross-implementation LZ4 Parquet > compression compatibility > > > Key: ARROW-9177 > URL: https://issues.apache.org/jira/browse/ARROW-9177 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Critical > Labels: pull-request-available > Fix For: 2.0.0 > > > per PARQUET-1878, it seems that there are still problems with our use of LZ4 > compression in the Parquet format. While we should fix this (the Parquet > specification and our implementation of it), we may need to disable use of > LZ4 compression until the appropriate compatibility testing can be done. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11301) [C++] Fix reading LZ4-compressed Parquet files produced by Java Parquet implementation
[ https://issues.apache.org/jira/browse/ARROW-11301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-11301: --- Description: We slightly misunderstood the Hadoop LZ4 format. A compressed buffer can actually contain several "frames", each prefixed with (de)compressed size. See https://issues.apache.org/jira/browse/ARROW-9177?focusedCommentId=17267058&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17267058 was:See https://issues.apache.org/jira/browse/ARROW-9177?focusedCommentId=17267058&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17267058 > [C++] Fix reading LZ4-compressed Parquet files produced by Java Parquet > implementation > -- > > Key: ARROW-11301 > URL: https://issues.apache.org/jira/browse/ARROW-11301 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Blocker > Fix For: 3.0.0 > > > We slightly misunderstood the Hadoop LZ4 format. A compressed buffer can > actually contain several "frames", each prefixed with (de)compressed size. > See > https://issues.apache.org/jira/browse/ARROW-9177?focusedCommentId=17267058&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17267058 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11301) [C++] Fix reading LZ4-compressed Parquet files produced by Java Parquet implementation
Antoine Pitrou created ARROW-11301: -- Summary: [C++] Fix reading LZ4-compressed Parquet files produced by Java Parquet implementation Key: ARROW-11301 URL: https://issues.apache.org/jira/browse/ARROW-11301 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Antoine Pitrou Assignee: Antoine Pitrou Fix For: 3.0.0 See https://issues.apache.org/jira/browse/ARROW-9177?focusedCommentId=17267058&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17267058 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-9177) [C++][Parquet] Tracking issue for cross-implementation LZ4 Parquet compression compatibility
[ https://issues.apache.org/jira/browse/ARROW-9177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267244#comment-17267244 ] Antoine Pitrou commented on ARROW-9177: --- Thank you [~chairmank], this helps a lot! Indeed it seems we misunderstood the undocumented Hadoop-LZ4-framing format :-( > [C++][Parquet] Tracking issue for cross-implementation LZ4 Parquet > compression compatibility > > > Key: ARROW-9177 > URL: https://issues.apache.org/jira/browse/ARROW-9177 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Critical > Fix For: 2.0.0 > > > per PARQUET-1878, it seems that there are still problems with our use of LZ4 > compression in the Parquet format. While we should fix this (the Parquet > specification and our implementation of it), we may need to disable use of > LZ4 compression until the appropriate compatibility testing can be done. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-11257) [C++][Parquet] PyArrow Table contains different data after writing and reloading from Parquet
[ https://issues.apache.org/jira/browse/ARROW-11257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267198#comment-17267198 ] Kari Schoonbee commented on ARROW-11257: Hey Joris. Do we know what caused the bug? I'm a bit worried as this has led to data corruption in production for us and it's possible that it has affected others without them being aware. And when is the next release containing the update scheduled for? > [C++][Parquet] PyArrow Table contains different data after writing and > reloading from Parquet > - > > Key: ARROW-11257 > URL: https://issues.apache.org/jira/browse/ARROW-11257 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 2.0.0 >Reporter: Kari Schoonbee >Priority: Critical > Attachments: anonymised.jsonl, pyarrow_parquet_issue.ipynb > > > * I'm loading a JSONlines object into a table using > {code:java} > pa.json.read_json{code} > It contains one column that is a nested dictionary. > * I select a row by key and inspect its nested dictionary. > * I write the table to parquet > * I load the table again from the parquet file > * I check the same key and the nested dictionary is not the same. > > To reproduce: > > Find the attached JSONLines file and Jupyter Notebook. > The json file contains entries per customer with a generated `msisdn`, > `scoring_request_id` and `scorecard_result` object. Each `scorecard_result` > consists of a list of feature objects, all with the value the same as the > `msisdn` and a score. > The notebook reads the file and demonstrates the issue. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11300) [Rust][DataFusion] Improve hash aggregate performance with large number of groups in
Daniël Heres created ARROW-11300: Summary: [Rust][DataFusion] Improve hash aggregate performance with large number of groups in Key: ARROW-11300 URL: https://issues.apache.org/jira/browse/ARROW-11300 Project: Apache Arrow Issue Type: Improvement Components: Rust - DataFusion Reporter: Daniël Heres Attachments: image-2021-01-18-13-00-36-685.png Currently, hash aggregates perform well with a small number of output groups, but the results on the db-benchmark [https://github.com/h2oai/db-benchmark/pull/182] test are poor on data with a high number of output groups. [https://github.com/apache/arrow/pull/9234] improved the situation a bit, but DataFusion is still much slower than even the slowest published result. This seems mostly to have to do with the way we handle individual keys/groups. For each new key, we _take_ the indices of the group, resulting in lots of small allocations, cache unfriendliness, and other overhead when we have many keys with only a small (just 1-2) number of rows per group in a batch. Also, the indices are converted from a Vec to an Array, making the situation worse (this accounts for ~22% of the instructions on the master branch!); other profiling hotspots seem to come from related allocations too. To make this efficient for tiny groups, we should probably change the hash aggregate algorithm to _take_ based on _all_ indices from the batch in one go, and "slice" into the resulting array for the individual accumulators. Here is some profiling info of the db-benchmark questions 1-5 against master: !image-2021-01-18-13-00-36-685.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
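The "one take, then slice" idea from the issue can be illustrated with a plain-Rust sketch. Note that `batched_take` and the data layout here are illustrative assumptions using std types only, not DataFusion's actual API or Arrow arrays: instead of materializing a small gather per group, all group indices are gathered once and each accumulator receives a contiguous slice of the shared buffer.

```rust
/// Gather `values` once in group order and return, per group, the half-open
/// range of the gathered buffer that belongs to it. One large allocation
/// replaces many tiny per-group `take` allocations.
fn batched_take(
    values: &[i64],
    group_indices: &[Vec<usize>],
) -> (Vec<i64>, Vec<(usize, usize)>) {
    let total: usize = group_indices.iter().map(|g| g.len()).sum();
    let mut gathered = Vec::with_capacity(total);
    let mut ranges = Vec::with_capacity(group_indices.len());
    for indices in group_indices {
        let start = gathered.len();
        gathered.extend(indices.iter().map(|&i| values[i]));
        ranges.push((start, gathered.len()));
    }
    (gathered, ranges)
}

fn main() {
    let values = vec![10i64, 20, 30, 40, 50];
    // Three groups, two of them tiny -- the case that hurts per-group takes.
    let groups = vec![vec![0, 3], vec![1], vec![2, 4]];
    let (gathered, ranges) = batched_take(&values, &groups);
    // Each accumulator just slices the shared buffer, with no extra copy.
    let group0 = &gathered[ranges[0].0..ranges[0].1];
    assert_eq!(group0, &[10, 40]);
}
```

The design point is the same as in the issue: the per-batch cost becomes one gather plus cheap slicing, rather than an allocation and a Vec-to-Array conversion per group.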
[jira] [Created] (ARROW-11299) [Python] build warning in python
Yibo Cai created ARROW-11299: Summary: [Python] build warning in python Key: ARROW-11299 URL: https://issues.apache.org/jira/browse/ARROW-11299 Project: Apache Arrow Issue Type: Bug Reporter: Yibo Cai Many warnings about compute kernel options when building Arrow python. Removing the line below suppresses the warnings. https://github.com/apache/arrow/blob/140135908c5d131ceac31a0e529f9b9b763b1106/cpp/src/arrow/compute/function.h#L45 I think the reason is that the virtual destructor makes the structure non-standard-layout (not C compatible), so the offsetof macro cannot be used on it safely. As function options are straightforward, the destructor looks unnecessary. [~bkietz] *Steps to reproduce* build arrow cpp {code:bash} ~/arrow/cpp/release $ cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DARROW_COMPUTE=ON -DARROW_BUILD_TESTS=ON -DCMAKE_INSTALL_PREFIX=$(pwd)/_install -DCMAKE_INSTALL_LIBDIR=lib -DARROW_PYTHON=ON -DCMAKE_CXX_COMPILER=/usr/bin/clang++-9 -DCMAKE_C_COMPILER=/usr/bin/clang-9 .. ~/arrow/cpp/release $ ninja install {code} build arrow python {code:bash} ~/arrow/python $ python --version Python 3.6.9 ~/arrow/python $ python setup.py build_ext --inplace .. 
[ 93%] Building CXX object CMakeFiles/_compute.dir/_compute.cpp.o [27/1691] In file included from /usr/include/x86_64-linux-gnu/bits/types/stack_t.h:23:0, from /usr/include/signal.h:303, from /home/cyb/archery/lib/python3.6/site-packages/numpy/core/include/numpy/npy_interrupt.h:84, from /home/cyb/archery/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:5, from /home/cyb/arrow/cpp/release/_install/include/arrow/python/numpy_interop.h:41, from /home/cyb/arrow/cpp/release/_install/include/arrow/python/helpers.h:27, from /home/cyb/arrow/cpp/release/_install/include/arrow/python/api.h:24, from /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:696: /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp: In function ‘int __Pyx_modinit_type_init_code()’: /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26034:146: warning: offsetof within non-standard-layout type ‘__pyx_obj_7pyarrow_8_compute__CastOptions’ is undefined [-Winvalid-offsetof] x_type_7pyarrow_8_compute__CastOptions.tp_weaklistoffset = offsetof(struct __pyx_obj_7pyarrow_8_compute__CastOptions, __pyx_base.__pyx_base.__weakref__); ^ /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26066:150: warning: offsetof within non-standard-layout type ‘__pyx_obj_7pyarrow_8_compute__FilterOptions’ is undefined [-Winvalid-offsetof] type_7pyarrow_8_compute__FilterOptions.tp_weaklistoffset = offsetof(struct __pyx_obj_7pyarrow_8_compute__FilterOptions, __pyx_base.__pyx_base.__weakref__); ^ /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26082:146: warning: offsetof within non-standard-layout type ‘__pyx_obj_7pyarrow_8_compute__TakeOptions’ is undefined [-Winvalid-offsetof] x_type_7pyarrow_8_compute__TakeOptions.tp_weaklistoffset = offsetof(struct __pyx_obj_7pyarrow_8_compute__TakeOptions, __pyx_base.__pyx_base.__weakref__); ^ /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26130:150: warning: offsetof within non-standard-layout 
type ‘__pyx_obj_7pyarrow_8_compute__MinMaxOptions’ is undefined [-Winvalid-offsetof] type_7pyarrow_8_compute__MinMaxOptions.tp_weaklistoffset = offsetof(struct __pyx_obj_7pyarrow_8_compute__MinMaxOptions, __pyx_base.__pyx_base.__weakref__); ^ /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26146:148: warning: offsetof within non-standard-layout type ‘__pyx_obj_7pyarrow_8_compute__CountOptions’ is undefined [-Winvalid-offsetof] _type_7pyarrow_8_compute__CountOptions.tp_weaklistoffset = offsetof(struct __pyx_obj_7pyarrow_8_compute__CountOptions, __pyx_base.__pyx_base.__weakref__); ^ /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26162:146: warning: offsetof within non-standard-layout type ‘__pyx_obj_7pyarrow_8_compute__ModeOptions’ is undefined [-Winvalid-offsetof] x_type_7pyarrow_8_compute__ModeOptions.tp_weaklistoffset = offsetof(struct __pyx_obj_7pyarrow_8_compute__ModeOptions, __pyx_base.__pyx_base.__weakref__); ^ /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26210:154: warning: offsetof within non-standard-layout type ‘__pyx_obj_7pyarrow_8_compute__VarianceOptions’ is undefined [-Winvalid-offsetof] pe_7pyarrow_8_compute__VarianceOptions.tp_weaklistoffset = offsetof(struct __pyx_obj_7pyarrow_8_compute__VarianceOptions, __pyx_base.__pyx_base.__weakref__); ^ /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26258:156: warning: offsetof within non-standard-layout type ‘__pyx_obj_7pyarrow_8_compute__ArraySortOptions’ is undefined [-Winvalid-offsetof] e_7pyarrow_8_compute__ArraySortOptions.tp_weaklistoffset = offsetof(struct __pyx_obj_7pyarrow_8_compute__ArraySortOptions,