[GitHub] [arrow] mcassels commented on a change in pull request #6770: ARROW-7842: [Rust] [Parquet] implement array_reader for list type columns

2020-04-23 Thread GitBox
mcassels commented on a change in pull request #6770: URL: https://github.com/apache/arrow/pull/6770#discussion_r414276861 ## File path: rust/datafusion/src/logicalplan.rs ## @@ -828,8 +828,8 @@ mod tests { .build()?; let expected = "Projection: #id\ -

[GitHub] [arrow] mcassels commented on a change in pull request #6770: ARROW-7842: [Rust] [Parquet] implement array_reader for list type columns

2020-04-23 Thread GitBox
mcassels commented on a change in pull request #6770: URL: https://github.com/apache/arrow/pull/6770#discussion_r414276380 ## File path: rust/datafusion/src/utils.rs ## @@ -120,6 +143,7 @@ pub fn array_value_to_string(column: array::ArrayRef, row: usize) -> Result {

[GitHub] [arrow] cyb70289 commented on pull request #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-23 Thread GitBox
cyb70289 commented on pull request #6954: URL: https://github.com/apache/arrow/pull/6954#issuecomment-618782973 > Is the function `Armv8CrcHashParallel` used somewhere? Sorry if I overlook it. It's not used. Actually the whole file hash_util.h is not used per [this

[GitHub] [arrow] paddyhoran commented on a change in pull request #7004: ARROW-3827: [Rust] Implement UnionArray Updated

2020-04-23 Thread GitBox
paddyhoran commented on a change in pull request #7004: URL: https://github.com/apache/arrow/pull/7004#discussion_r414241189 ## File path: rust/arrow/src/array/mod.rs ## @@ -85,6 +85,7 @@ mod array; mod builder; mod data; mod equal; +mod union; Review comment: Yea,

[GitHub] [arrow] paddyhoran commented on a change in pull request #7004: ARROW-3827: [Rust] Implement UnionArray Updated

2020-04-23 Thread GitBox
paddyhoran commented on a change in pull request #7004: URL: https://github.com/apache/arrow/pull/7004#discussion_r414240892 ## File path: rust/arrow/src/array/equal.rs ## @@ -1046,6 +1062,30 @@ impl PartialEq for Value { } } +impl JsonEqual for UnionArray { +fn

[GitHub] [arrow] paddyhoran commented on pull request #7004: ARROW-3827: [Rust] Implement UnionArray Updated

2020-04-23 Thread GitBox
paddyhoran commented on pull request #7004: URL: https://github.com/apache/arrow/pull/7004#issuecomment-618761556 @andygrove just going to leave a general comment as it's all related. Overall, I felt this PR was getting big, I was trying to avoid getting into the IPC stuff in this

[GitHub] [arrow] zgramana edited a comment on pull request #6121: ARROW-6603: [C#] - Nullable Array Support

2020-04-23 Thread GitBox
zgramana edited a comment on pull request #6121: URL: https://github.com/apache/arrow/pull/6121#issuecomment-618756547 @eerhardt I'd like to chime in to say that I have just come across this conversation late--and *after* implementing an alternative approach which is much more in line with

[GitHub] [arrow] sunchao commented on a change in pull request #6935: ARROW-8455: [Rust] Parquet Arrow column read on partially compatible files

2020-04-23 Thread GitBox
sunchao commented on a change in pull request #6935: URL: https://github.com/apache/arrow/pull/6935#discussion_r414235616 ## File path: rust/parquet/src/column/reader.rs ## @@ -190,15 +190,12 @@ impl ColumnReaderImpl { (self.num_buffered_values -

[GitHub] [arrow] zgramana commented on pull request #6121: ARROW-6603: [C#] - Nullable Array Support

2020-04-23 Thread GitBox
zgramana commented on pull request #6121: URL: https://github.com/apache/arrow/pull/6121#issuecomment-618756547 @eerhardt I'd like to chime in that I have just come across this conversation after implementing an alternative approach much more in line with other Arrow language

[GitHub] [arrow] paddyhoran commented on a change in pull request #6306: ARROW-7705: [Rust] Initial sort implementation

2020-04-23 Thread GitBox
paddyhoran commented on a change in pull request #6306: URL: https://github.com/apache/arrow/pull/6306#discussion_r414234200 ## File path: rust/arrow/src/compute/kernels/sort.rs ## @@ -0,0 +1,671 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] paddyhoran commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-23 Thread GitBox
paddyhoran commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r414230710 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder {

[GitHub] [arrow] paddyhoran edited a comment on pull request #7024: ARROW-8573: [Rust] Upgrade Rust to 1.44 nightly

2020-04-23 Thread GitBox
paddyhoran edited a comment on pull request #7024: URL: https://github.com/apache/arrow/pull/7024#issuecomment-618749586 CI is failing again, I thought this was fixed by #7010 This is an automated message from the Apache

[GitHub] [arrow] paddyhoran commented on pull request #7024: ARROW-8573: [Rust] Upgrade Rust to 1.44 nightly

2020-04-23 Thread GitBox
paddyhoran commented on pull request #7024: URL: https://github.com/apache/arrow/pull/7024#issuecomment-618749586 CI is failing again, I thought this was fixed by #8558

[GitHub] [arrow] sunchao edited a comment on pull request #6949: ARROW-7681: [Rust] Explicitly seeking a BufReader will discard the internal buffer (2)

2020-04-23 Thread GitBox
sunchao edited a comment on pull request #6949: URL: https://github.com/apache/arrow/pull/6949#issuecomment-618740530 Yes, I think it is beneficial to avoid dropping buffers with `seek`, although it would be nice if `seek_relative` were stabilized soon so we can just use that.

[GitHub] [arrow] sunchao commented on pull request #6949: ARROW-7681: [Rust] Explicitly seeking a BufReader will discard the internal buffer (2)

2020-04-23 Thread GitBox
sunchao commented on pull request #6949: URL: https://github.com/apache/arrow/pull/6949#issuecomment-618740530 Yes, I think it is beneficial to avoid dropping buffers with `seek`, although it would be nice if `seek_relative` were stabilized soon so we can just use that. >
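
For context on the `seek` versus `seek_relative` point in the comment above, here is a minimal Rust sketch. It is not code from the PR. It assumes a toolchain on which `BufReader::seek_relative` is stable (it was still unstable when this thread was written), and the helper name `buffered_after_seek`, the 64-byte input, and the 16-byte buffer capacity are made up for illustration.

```rust
use std::io::{BufReader, Cursor, Read, Seek, SeekFrom};

/// Hypothetical helper: read one byte, skip two bytes forward, then report
/// (bytes still held in the internal buffer after the skip, next byte read).
fn buffered_after_seek(use_relative: bool) -> std::io::Result<(usize, u8)> {
    // 64 bytes whose value equals their index, so positions are easy to check.
    let data: Vec<u8> = (0..64).collect();
    let mut reader = BufReader::with_capacity(16, Cursor::new(data));

    let mut byte = [0u8; 1];
    // The first read fills the 16-byte internal buffer and consumes one byte.
    reader.read_exact(&mut byte)?;

    if use_relative {
        // Stays inside the buffer when the target is already buffered.
        reader.seek_relative(2)?;
    } else {
        // The `Seek` impl for `BufReader` always discards the buffer.
        reader.seek(SeekFrom::Current(2))?;
    }
    let buffered = reader.buffer().len();

    // Both variants land on the same logical position in the stream.
    reader.read_exact(&mut byte)?;
    Ok((buffered, byte[0]))
}

fn main() -> std::io::Result<()> {
    println!("seek_relative: {:?}", buffered_after_seek(true)?);
    println!("seek:          {:?}", buffered_after_seek(false)?);
    Ok(())
}
```

Both calls read the same next byte; the difference is only whether the already-buffered bytes survive the skip, which is the cost the comment is concerned with.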

[GitHub] [arrow] github-actions[bot] commented on pull request #7028: ARROW-8575: [Developer] Add issue_comment workflow to rebase a PR

2020-04-23 Thread GitBox
github-actions[bot] commented on pull request #7028: URL: https://github.com/apache/arrow/pull/7028#issuecomment-618731413 https://issues.apache.org/jira/browse/ARROW-8575

[GitHub] [arrow] nealrichardson opened a new pull request #7028: ARROW-8575: [Developer] Add issue_comment workflow to rebase a PR

2020-04-23 Thread GitBox
nealrichardson opened a new pull request #7028: URL: https://github.com/apache/arrow/pull/7028 Instead of adding a PR comment of "This needs rebase" and waiting for the author to get around to it, with this workflow you can just type "rebase" and GHA will do it for you. If it rebases

[GitHub] [arrow] bkietz commented on a change in pull request #7026: ARROW-7391: [C++][Dataset] Remove Expression subclasses from bindings

2020-04-23 Thread GitBox
bkietz commented on a change in pull request #7026: URL: https://github.com/apache/arrow/pull/7026#discussion_r414184803 ## File path: r/src/expression.cpp ## @@ -21,99 +21,97 @@ // [[arrow::export]] std::shared_ptr dataset___expr__field_ref(std::string name) { - return

[GitHub] [arrow] nealrichardson commented on a change in pull request #7026: ARROW-7391: [C++][Dataset] Remove Expression subclasses from bindings

2020-04-23 Thread GitBox
nealrichardson commented on a change in pull request #7026: URL: https://github.com/apache/arrow/pull/7026#discussion_r414153717 ## File path: r/src/expression.cpp ## @@ -21,99 +21,97 @@ // [[arrow::export]] std::shared_ptr dataset___expr__field_ref(std::string name) { -

[GitHub] [arrow] BryanCutler commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-23 Thread GitBox
BryanCutler commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r414120706 ## File path: java/memory/src/main/java/org/apache/arrow/memory/NettyAllocationManager.java ## @@ -34,31 +33,34 @@ static final

[GitHub] [arrow] wesm commented on pull request #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-23 Thread GitBox
wesm commented on pull request #6992: URL: https://github.com/apache/arrow/pull/6992#issuecomment-618678293 Sweet thanks, merging now

[GitHub] [arrow] jorisvandenbossche commented on pull request #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-23 Thread GitBox
jorisvandenbossche commented on pull request #6992: URL: https://github.com/apache/arrow/pull/6992#issuecomment-618674669 I further cleaned up the shim to remove if/else checks we no longer need, so should be ready now.

[GitHub] [arrow] github-actions[bot] commented on pull request #7027: ARROW-8572: [Python] expose UnionArray fields to Python

2020-04-23 Thread GitBox
github-actions[bot] commented on pull request #7027: URL: https://github.com/apache/arrow/pull/7027#issuecomment-618659884 https://issues.apache.org/jira/browse/ARROW-8572

[GitHub] [arrow] lidavidm opened a new pull request #7027: ARROW-8572: [Python] expose UnionArray fields to Python

2020-04-23 Thread GitBox
lidavidm opened a new pull request #7027: URL: https://github.com/apache/arrow/pull/7027 - Adds an explicit range check to `UnionArray.child` - Exposes `child`, `value_offsets`, and `type_codes` to Python. (In Python, they're wrapped in arrays for you to save you the trouble.)

[GitHub] [arrow] github-actions[bot] commented on pull request #7025: ARROW-2260: [C++][Plasma] Use Gflags for command-line parsing

2020-04-23 Thread GitBox
github-actions[bot] commented on pull request #7025: URL: https://github.com/apache/arrow/pull/7025#issuecomment-618651266 https://issues.apache.org/jira/browse/ARROW-2260

[GitHub] [arrow] github-actions[bot] commented on pull request #7026: ARROW-7391: [C++][Dataset] Remove Expression subclasses from bindings

2020-04-23 Thread GitBox
github-actions[bot] commented on pull request #7026: URL: https://github.com/apache/arrow/pull/7026#issuecomment-618651261 https://issues.apache.org/jira/browse/ARROW-7391

[GitHub] [arrow] bkietz opened a new pull request #7026: ARROW-7391: [C++][Dataset] Remove Expression subclasses from bindings

2020-04-23 Thread GitBox
bkietz opened a new pull request #7026: URL: https://github.com/apache/arrow/pull/7026 Serialization is implemented by converting Expressions to Arrays then writing a tiny IPC file. This is a ridiculous way to serialize Expressions but it should be acceptable since these classes are

[GitHub] [arrow] chrish42 opened a new pull request #7025: ARROW-2260: [C++][Plasma] Use Gflags for command-line parsing

2020-04-23 Thread GitBox
chrish42 opened a new pull request #7025: URL: https://github.com/apache/arrow/pull/7025 The following patch adds Gflags support to `plasma-store-server`, leaves out the backtraces on invalid command-line options, and generally tries to make the error messages more useful in terms of

[GitHub] [arrow] github-actions[bot] commented on pull request #7024: ARROW-8573: [Rust] Upgrade Rust to 1.44 nightly

2020-04-23 Thread GitBox
github-actions[bot] commented on pull request #7024: URL: https://github.com/apache/arrow/pull/7024#issuecomment-618624379 https://issues.apache.org/jira/browse/ARROW-8573

[GitHub] [arrow] andygrove opened a new pull request #7024: ARROW-8573: [Rust] Upgrade Rust to 1.44 nightly

2020-04-23 Thread GitBox
andygrove opened a new pull request #7024: URL: https://github.com/apache/arrow/pull/7024 Now that Rust 1.43.0 is released, we should upgrade to 1.44 nightly. It looks like there were changes in rustfmt rules.

[GitHub] [arrow] andygrove commented on pull request #7018: ARROW-8536: [Rust] [Flight] Check in proto file, conditional build if file exists

2020-04-23 Thread GitBox
andygrove commented on pull request #7018: URL: https://github.com/apache/arrow/pull/7018#issuecomment-618605512 @nevi-me This is looking good, but the generated source file needs the ASF header. CI is failing with ` apache-rat license violation:

[GitHub] [arrow] markhildreth commented on pull request #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-23 Thread GitBox
markhildreth commented on pull request #6972: URL: https://github.com/apache/arrow/pull/6972#issuecomment-618598341 @andygrove Yup, I was planning on doing that in a separate PR. If you'd like I can do that in this one.

[GitHub] [arrow] andygrove commented on pull request #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-23 Thread GitBox
andygrove commented on pull request #6972: URL: https://github.com/apache/arrow/pull/6972#issuecomment-618597619 @markhildreth This looks great, but is now duplicating the code between arrow and datafusion. Can we remove the datafusion utils copy and have datafusion use the arrow utils

[GitHub] [arrow] mayuropensource commented on issue #7020: ARROW-8562: [C++] Parameterize I/O coalescing using s3 storage metrics

2020-04-23 Thread GitBox
mayuropensource commented on issue #7020: URL: https://github.com/apache/arrow/pull/7020#issuecomment-618577963 @wesm sure thing, I'll keep that in mind in the future.

[GitHub] [arrow] mayuropensource commented on a change in pull request #7022: ARROW-8562: [C++] IO: Parameterize I/O Coalescing using S3 metrics

2020-04-23 Thread GitBox
mayuropensource commented on a change in pull request #7022: URL: https://github.com/apache/arrow/pull/7022#discussion_r414031141 ## File path: cpp/src/arrow/io/caching.h ## @@ -27,6 +27,44 @@ namespace arrow { namespace io { + +struct ARROW_EXPORT CacheOptions { + static

[GitHub] [arrow] houqp commented on a change in pull request #7009: ARROW-8552: [Rust] support iterate parquet row columns

2020-04-23 Thread GitBox
houqp commented on a change in pull request #7009: URL: https://github.com/apache/arrow/pull/7009#discussion_r414007919 ## File path: rust/parquet/src/record/api.rs ## @@ -50,6 +50,33 @@ impl Row { pub fn len() -> usize { self.fields.len() } + +pub fn

[GitHub] [arrow] xhochy commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-23 Thread GitBox
xhochy commented on issue #6992: URL: https://github.com/apache/arrow/pull/6992#issuecomment-618538122 A `pandas` version released 2 years ago still sounds very generous. People who cannot upgrade from that to a newer version will probably have the same problems with `pyarrow`

[GitHub] [arrow] lidavidm commented on a change in pull request #7022: ARROW-8562: [C++] IO: Parameterize I/O Coalescing using S3 metrics

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #7022: URL: https://github.com/apache/arrow/pull/7022#discussion_r413990293 ## File path: cpp/src/arrow/io/caching.h ## @@ -27,6 +27,44 @@ namespace arrow { namespace io { + +struct ARROW_EXPORT CacheOptions { + static

[GitHub] [arrow] wesm commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-23 Thread GitBox
wesm commented on issue #6992: URL: https://github.com/apache/arrow/pull/6992#issuecomment-618537844 Actually I'll hold off on merging this to confirm that @jorisvandenbossche has done everything that he planned

[GitHub] [arrow] github-actions[bot] commented on issue #7023: ARROW-8571: [C++] Switch AppVeyor image to VS 2017

2020-04-23 Thread GitBox
github-actions[bot] commented on issue #7023: URL: https://github.com/apache/arrow/pull/7023#issuecomment-618536730 https://issues.apache.org/jira/browse/ARROW-8571

[GitHub] [arrow] wesm commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-23 Thread GitBox
wesm commented on issue #6992: URL: https://github.com/apache/arrow/pull/6992#issuecomment-618536764 The Appveyor failure is unrelated

[GitHub] [arrow] tustvold commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-23 Thread GitBox
tustvold commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r413987657 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder {

[GitHub] [arrow] pitrou commented on a change in pull request #7023: ARROW-8571: [C++] Switch AppVeyor image to VS 2017

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #7023: URL: https://github.com/apache/arrow/pull/7023#discussion_r413987573 ## File path: appveyor.yml ## @@ -61,7 +61,7 @@ environment: - JOB: "Build" GENERATOR: Ninja CONFIGURATION: "Release" -

[GitHub] [arrow] markhildreth edited a comment on issue #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-23 Thread GitBox
markhildreth edited a comment on issue #6972: URL: https://github.com/apache/arrow/pull/6972#issuecomment-618506903 From a purely practical standpoint, this PR is ready for further review and merging. If approved, I would probably add some minor JIRA issue for the following: * Trying

[GitHub] [arrow] xhochy opened a new pull request #7023: ARROW-8571: [C++] Switch AppVeyor image to VS 2017

2020-04-23 Thread GitBox
xhochy opened a new pull request #7023: URL: https://github.com/apache/arrow/pull/7023

[GitHub] [arrow] github-actions[bot] commented on issue #7019: ARROW-8569: [CI] Upgrade xcode version for testing homebrew formulae

2020-04-23 Thread GitBox
github-actions[bot] commented on issue #7019: URL: https://github.com/apache/arrow/pull/7019#issuecomment-618528458 Revision: 5bcfeab4c9bacc0b3a262a7522bfaf985025d3ec Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] nealrichardson commented on issue #7019: ARROW-8569: [CI] Upgrade xcode version for testing homebrew formulae

2020-04-23 Thread GitBox
nealrichardson commented on issue #7019: URL: https://github.com/apache/arrow/pull/7019#issuecomment-618527651 @github-actions crossbow submit homebrew-cpp

[GitHub] [arrow] markhildreth commented on a change in pull request #7006: ARROW-8508: [Rust] FixedSizeListArray improper offset for value

2020-04-23 Thread GitBox
markhildreth commented on a change in pull request #7006: URL: https://github.com/apache/arrow/pull/7006#discussion_r413958397 ## File path: rust/arrow/src/array/array.rs ## @@ -2592,6 +2619,15 @@ mod tests { assert_eq!(DataType::Int32, list_array.value_type());

[GitHub] [arrow] vertexclique commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-23 Thread GitBox
vertexclique commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r413972358 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder {

[GitHub] [arrow] kiszk commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
kiszk commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413970301 ## File path: cpp/src/parquet/level_conversion.h ## @@ -0,0 +1,191 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] tustvold commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-23 Thread GitBox
tustvold commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r413969298 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder {

[GitHub] [arrow] BryanCutler commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-23 Thread GitBox
BryanCutler commented on issue #6992: URL: https://github.com/apache/arrow/pull/6992#issuecomment-618517973 Sounds good to me. FWIW, Spark also has a minimum Pandas version set at 0.23.2.

[GitHub] [arrow] kiszk commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
kiszk commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413960907 ## File path: cpp/src/parquet/level_conversion.h ## @@ -0,0 +1,191 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] bryantbiggs commented on issue #4140: ARROW-5123: [Rust] Parquet derive for simple structs

2020-04-23 Thread GitBox
bryantbiggs commented on issue #4140: URL: https://github.com/apache/arrow/pull/4140#issuecomment-618512883 thanks @andygrove !

[GitHub] [arrow] github-actions[bot] commented on issue #7022: ARROW-8562: [C++] IO: Parameterize I/O Coalescing using S3 metrics

2020-04-23 Thread GitBox
github-actions[bot] commented on issue #7022: URL: https://github.com/apache/arrow/pull/7022#issuecomment-618510578 https://issues.apache.org/jira/browse/ARROW-8562

[GitHub] [arrow] kiszk commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
kiszk commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413954827 ## File path: cpp/src/parquet/level_conversion_test.cc ## @@ -0,0 +1,162 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] wesm commented on issue #7020: ARROW-8562: [C++] Parameterize I/O coalescing using s3 storage metrics

2020-04-23 Thread GitBox
wesm commented on issue #7020: URL: https://github.com/apache/arrow/pull/7020#issuecomment-618508772 @mayuropensource it's not necessary to open a new PR when you want to redo your commits, you can just force push your branch

[GitHub] [arrow] markhildreth commented on a change in pull request #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-23 Thread GitBox
markhildreth commented on a change in pull request #6972: URL: https://github.com/apache/arrow/pull/6972#discussion_r413951619 ## File path: rust/parquet/src/encodings/rle.rs ## @@ -830,7 +826,7 @@ mod tests { values.clear(); let mut rng =

[GitHub] [arrow] kiszk commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
kiszk commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413953833 ## File path: cpp/src/parquet/column_reader.cc ## @@ -50,6 +51,140 @@ using arrow::internal::checked_cast; namespace parquet { +namespace { + +inline

[GitHub] [arrow] markhildreth edited a comment on issue #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-23 Thread GitBox
markhildreth edited a comment on issue #6972: URL: https://github.com/apache/arrow/pull/6972#issuecomment-618501806 @andygrove Thanks for the feedback. I have updated the PR with a less leaky API. I also tweaked the parquet test to workaround the new type inference changes.

[GitHub] [arrow] markhildreth commented on issue #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-23 Thread GitBox
markhildreth commented on issue #6972: URL: https://github.com/apache/arrow/pull/6972#issuecomment-618506903 From a purely practical standpoint, this PR is ready for further review and merging. If approved, I would probably add some minor issue for the following: * Trying to avoid the

[GitHub] [arrow] vertexclique commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-23 Thread GitBox
vertexclique commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r413951534 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder {

[GitHub] [arrow] markhildreth commented on issue #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-23 Thread GitBox
markhildreth commented on issue #6972: URL: https://github.com/apache/arrow/pull/6972#issuecomment-618501806 @andygrove Thanks for the feedback. I have updated the PR with a less leaky API. I also fixed the type inference problem that was caused by the new dependency. @nevi-me True

[GitHub] [arrow] markhildreth edited a comment on issue #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-23 Thread GitBox
markhildreth edited a comment on issue #6972: URL: https://github.com/apache/arrow/pull/6972#issuecomment-618501806 @andygrove Thanks for the feedback. I have updated the PR with a less leaky API. I also fixed the type inference problem that was caused by the new dependency.

[GitHub] [arrow] mayuropensource opened a new pull request #7022: ARROW-8562: [C++] IO: Parameterize I/O Coalescing using S3 metrics

2020-04-23 Thread GitBox
mayuropensource opened a new pull request #7022: URL: https://github.com/apache/arrow/pull/7022 _(Recreating the PR from a clean repo, sorry about earlier PR which was not cleanly merged)._ **JIRA:** https://issues.apache.org/jira/browse/ARROW-8562 This change is not actually

[GitHub] [arrow] github-actions[bot] commented on issue #7021: Wrap docker-compose commands with archery [WIP]

2020-04-23 Thread GitBox
github-actions[bot] commented on issue #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-618494095 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could you

[GitHub] [arrow] lidavidm commented on a change in pull request #7020: ARROW-8562: [C++] Parameterize I/O coalescing using s3 storage metrics

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #7020: URL: https://github.com/apache/arrow/pull/7020#discussion_r413923773 ## File path: cpp/src/arrow/io/caching.h ## @@ -27,6 +27,44 @@ namespace arrow { namespace io { + +struct ARROW_EXPORT CacheOptions { + static

[GitHub] [arrow] mayuropensource commented on issue #7020: ARROW-8562: [C++] Parameterize I/O coalescing using s3 storage metrics

2020-04-23 Thread GitBox
mayuropensource commented on issue #7020: URL: https://github.com/apache/arrow/pull/7020#issuecomment-618486378 I messed up some commits. Will create a new one. This is an automated message from the Apache Git Service. To

[GitHub] [arrow] kszucs opened a new pull request #7021: Wrap docker-compose commands with archery [WIP]

2020-04-23 Thread GitBox
kszucs opened a new pull request #7021: URL: https://github.com/apache/arrow/pull/7021

[GitHub] [arrow] kiszk commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
kiszk commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413908372 ## File path: cpp/src/parquet/column_reader.cc ## @@ -50,6 +51,140 @@ using arrow::internal::checked_cast; namespace parquet { +namespace { + +inline

[GitHub] [arrow] github-actions[bot] commented on issue #7020: Arrow-8562: [C++] Parameterize I/O coalescing using s3 storage metrics

2020-04-23 Thread GitBox
github-actions[bot] commented on issue #7020: URL: https://github.com/apache/arrow/pull/7020#issuecomment-618472360 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could you

[GitHub] [arrow] kiszk commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
kiszk commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413905959 ## File path: cpp/src/parquet/column_reader.cc ## @@ -50,6 +51,140 @@ using arrow::internal::checked_cast; namespace parquet { +namespace { + +inline

[GitHub] [arrow] paddyhoran commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-23 Thread GitBox
paddyhoran commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r413903768 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder {

[GitHub] [arrow] pitrou commented on issue #6846: ARROW-3329: [Python] Python tests for decimal to int and decimal to decimal casts

2020-04-23 Thread GitBox
pitrou commented on issue #6846: URL: https://github.com/apache/arrow/pull/6846#issuecomment-618464806 The CI failure looks unrelated, will merge.

[GitHub] [arrow] mayuropensource opened a new pull request #7020: Arrow-8562: [C++] Parameterize I/O coalescing using s3 storage metrics

2020-04-23 Thread GitBox
mayuropensource opened a new pull request #7020: URL: https://github.com/apache/arrow/pull/7020 JIRA: https://issues.apache.org/jira/browse/ARROW-8562 This change is not actually used until https://github.com/apache/arrow/pull/6744 (@lidavidm) is pushed, however, it doesn't need to

[GitHub] [arrow] wesm commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-23 Thread GitBox
wesm commented on issue #6992: URL: https://github.com/apache/arrow/pull/6992#issuecomment-618459883 I'm OK with this. The maintenance burden of supporting several years' worth of pandas releases seems like a lot to bear. If there are parties which are affected by this they should

[GitHub] [arrow] github-actions[bot] commented on issue #7019: ARROW-8569: [CI] Upgrade xcode version for testing homebrew formulae

2020-04-23 Thread GitBox
github-actions[bot] commented on issue #7019: URL: https://github.com/apache/arrow/pull/7019#issuecomment-618456824 https://issues.apache.org/jira/browse/ARROW-8569

[GitHub] [arrow] nealrichardson commented on issue #7019: ARROW-8569: [CI] Upgrade xcode version for testing homebrew formulae

2020-04-23 Thread GitBox
nealrichardson commented on issue #7019: URL: https://github.com/apache/arrow/pull/7019#issuecomment-618454886 @github-actions crossbow submit homebrew-cpp

[GitHub] [arrow] nealrichardson opened a new pull request #7019: ARROW-8569: [CI] Upgrade xcode version for testing homebrew formulae

2020-04-23 Thread GitBox
nealrichardson opened a new pull request #7019: URL: https://github.com/apache/arrow/pull/7019 See https://github.com/apache/arrow/pull/6996#issuecomment-618053499

[GitHub] [arrow] bkietz commented on a change in pull request #6879: ARROW-8377: [CI][C++][R] Build and run C++ tests on Rtools build

2020-04-23 Thread GitBox
bkietz commented on a change in pull request #6879: URL: https://github.com/apache/arrow/pull/6879#discussion_r413877530 ## File path: ci/scripts/PKGBUILD ## @@ -50,6 +52,12 @@ source_dir="$ARROW_HOME" cpp_build_dir=build-${CARCH}-cpp +# This should be "release" for real

[GitHub] [arrow] lidavidm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413864236 ## File path: cpp/src/parquet/file_reader.cc ## @@ -536,6 +577,14 @@ std::shared_ptr ParquetFileReader::RowGroup(int i) { return

[GitHub] [arrow] andygrove commented on a change in pull request #7009: ARROW-8552: [Rust] support iterate parquet row columns

2020-04-23 Thread GitBox
andygrove commented on a change in pull request #7009: URL: https://github.com/apache/arrow/pull/7009#discussion_r413857080 ## File path: rust/parquet/src/record/api.rs ## @@ -50,6 +50,33 @@ impl Row { pub fn len(&self) -> usize { self.fields.len() } + +pub

[GitHub] [arrow] lidavidm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413819271 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ =

[GitHub] [arrow] fsaintjacques commented on a change in pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2020-04-23 Thread GitBox
fsaintjacques commented on a change in pull request #6979: URL: https://github.com/apache/arrow/pull/6979#discussion_r413811549 ## File path: cpp/src/parquet/arrow/reader.cc ## @@ -260,12 +260,28 @@ class FileReaderImpl : public FileReader { Status

[GitHub] [arrow] fsaintjacques commented on a change in pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2020-04-23 Thread GitBox
fsaintjacques commented on a change in pull request #6979: URL: https://github.com/apache/arrow/pull/6979#discussion_r413812220 ## File path: python/pyarrow/_parquet.pxd ## @@ -334,7 +334,7 @@ cdef extern from "parquet/api/reader.h" namespace "parquet" nogil:

[GitHub] [arrow] pitrou commented on issue #6846: ARROW-3329: [Python] Python tests for decimal to int and decimal to decimal casts

2020-04-23 Thread GitBox
pitrou commented on issue #6846: URL: https://github.com/apache/arrow/pull/6846#issuecomment-618404946 I rebased and improved the tests slightly. Also opened some issues for some oddities.

[GitHub] [arrow] fsaintjacques commented on a change in pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2020-04-23 Thread GitBox
fsaintjacques commented on a change in pull request #6979: URL: https://github.com/apache/arrow/pull/6979#discussion_r413814578 ## File path: python/pyarrow/_parquet.pyx ## @@ -1083,6 +1084,50 @@ cdef class ParquetReader: def set_use_threads(self, bint use_threads):

[GitHub] [arrow] pitrou commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413810237 ## File path: cpp/src/parquet/file_reader.cc ## @@ -536,6 +577,14 @@ std::shared_ptr ParquetFileReader::RowGroup(int i) { return

[GitHub] [arrow] pitrou commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r41381 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ =
