[GitHub] [arrow-rs] velvia commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations
velvia commented on issue #527: URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-880418323 This is definitely a good feature. It seems to me that in most cases time/duration manipulations could be based on existing arithmetic kernels; that'd be one way to go about it, i.e. some combination of cast(date32 -> i32) -> math add kernel -> cast back, for example. Or at least, leveraging existing math kernels would make a ton of sense. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
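The cast-then-add route velvia describes can be illustrated without the arrow crate at all. A minimal sketch, assuming only that Date32 is physically days-since-epoch stored as `i32` (the real kernels operate on Arrow arrays, not slices):

```rust
/// Sketch of "cast(date32 -> i32) -> math add kernel -> cast back":
/// since Date32 is just days-since-epoch as i32, adding a day-granularity
/// duration reduces to the existing integer add kernel.
fn add_days_to_date32(dates: &[i32], days: i32) -> Vec<i32> {
    // reinterpret as i32 (a no-op here), add, reinterpret back as Date32
    dates.iter().map(|d| d + days).collect()
}
```

The same shape works for timestamp + duration with an `i64` reinterpretation.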
[GitHub] [arrow-datafusion] Jimexist opened a new pull request #729: provide more details on required .parquet file extension
Jimexist opened a new pull request #729: URL: https://github.com/apache/arrow-datafusion/pull/729
# Which issue does this PR close?
Closes #.
# Rationale for this change
Provide more details on the required .parquet file extension.
# What changes are included in this PR?
# Are there any user-facing changes?
[GitHub] [arrow-datafusion] Jimexist opened a new pull request #728: implement FromStr for FileType
Jimexist opened a new pull request #728: URL: https://github.com/apache/arrow-datafusion/pull/728
# Which issue does this PR close?
Closes #.
# Rationale for this change
Implement FromStr for FileType so that it can accept lower-case input and is thus more flexible.
# What changes are included in this PR?
# Are there any user-facing changes?
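The idea in this PR can be sketched in plain Rust. The variant names below are assumptions for illustration, not DataFusion's actual `FileType` definition:

```rust
use std::str::FromStr;

#[derive(Debug, PartialEq)]
enum FileType {
    Parquet,
    Csv,
}

impl FromStr for FileType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // normalize case so "CSV", "csv", and "Csv" all parse
        match s.to_ascii_lowercase().as_str() {
            "parquet" => Ok(FileType::Parquet),
            "csv" => Ok(FileType::Csv),
            other => Err(format!("unknown file type: {}", other)),
        }
    }
}
```

With `FromStr` in place, callers get `"CSV".parse::<FileType>()` for free, which is what makes the lower-case acceptance ergonomic.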
[GitHub] [arrow-rs] codecov-commenter commented on pull request #521: Change `nullif` to support arbitrary arrays
codecov-commenter commented on pull request #521: URL: https://github.com/apache/arrow-rs/pull/521#issuecomment-880382473
# [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/521) Report
> Merging [#521](https://codecov.io/gh/apache/arrow-rs/pull/521) (2cbe13c) into [master](https://codecov.io/gh/apache/arrow-rs/commit/f873d77bc77847b95921374aa66ba1d38e9cebf8) (f873d77) will **increase** coverage by `0.09%`.
> The diff coverage is `97.51%`.

```diff
@@            Coverage Diff             @@
##           master     #521      +/-   ##
==========================================
+ Coverage   82.47%   82.57%   +0.09%
==========================================
  Files         167      167
  Lines       46144    46399     +255
==========================================
+ Hits        38059    38315     +256
+ Misses       8085     8084       -1
```

| [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/521) | Coverage Δ | |
|---|---|---|
| [arrow/src/compute/kernels/boolean.rs](https://codecov.io/gh/apache/arrow-rs/pull/521/diff#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9ib29sZWFuLnJz) | `97.53% <97.51%> (+0.76%)` | :arrow_up: |
| [arrow/src/buffer/immutable.rs](https://codecov.io/gh/apache/arrow-rs/pull/521/diff#diff-YXJyb3cvc3JjL2J1ZmZlci9pbW11dGFibGUucnM=) | `98.92% <0.00%> (+1.07%)` | :arrow_up: |
| [arrow/src/array/transform/boolean.rs](https://codecov.io/gh/apache/arrow-rs/pull/521/diff#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9ib29sZWFuLnJz) | `84.61% <0.00%> (+7.69%)` | :arrow_up: |

[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/521).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/521). Last update [f873d77...2cbe13c](https://codecov.io/gh/apache/arrow-rs/pull/521). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] [arrow-rs] bjchambers commented on a change in pull request #521: Change `nullif` to support arbitrary arrays
bjchambers commented on a change in pull request #521: URL: https://github.com/apache/arrow-rs/pull/521#discussion_r670118830 ## File path: arrow/src/compute/kernels/boolean.rs ## @@ -458,101 +458,113 @@ pub fn is_not_null(input: ) -> Result { Ok(BooleanArray::from(data)) } -/// Copies original array, setting null bit to true if a secondary comparison boolean array is set to true. -/// Typically used to implement NULLIF. -// NOTE: For now this only supports Primitive Arrays. Although the code could be made generic, the issue -// is that currently the bitmap operations result in a final bitmap which is aligned to bit 0, and thus -// the left array's data needs to be sliced to a new offset, and for non-primitive arrays shifting the -// data might be too complicated. In the future, to avoid shifting left array's data, we could instead -// shift the final bitbuffer to the right, prepending with 0's instead. -pub fn nullif( -left: , -right: , -) -> Result> -where -T: ArrowNumericType, -{ +/// Copies original array, setting null bit to true if a secondary comparison +/// boolean array is set to true. Typically used to implement NULLIF. +pub fn nullif(left: , right: ) -> Result { if left.len() != right.len() { return Err(ArrowError::ComputeError( "Cannot perform comparison operation on arrays of different length" .to_string(), )); } let left_data = left.data(); -let right_data = right.data(); - -// If left has no bitmap, create a new one with all values set for nullity op later -// left=0 (null) right=null output bitmap=null -// left=0 right=1 output bitmap=null -// left=1 (set)right=null output bitmap=set (passthrough) -// left=1 right=1 & comp=trueoutput bitmap=null -// left=1 right=1 & comp=false output bitmap=set -// -// Thus: result = left null bitmap & (!right_values | !right_bitmap) -// OR left null bitmap & !(right_values & right_bitmap) + +// If right has no trues and no nulls, then it will pass the left array +// unmodified. 
+if right.null_count() == 0 +&& right +.values() +.count_set_bits_offset(right.offset(), right.len()) +== 0 +{ +return Ok(left.clone()); Review comment: Added. ## File path: arrow/src/compute/kernels/boolean.rs ## @@ -1148,12 +1222,215 @@ mod tests { let comp = comp.slice(2, 3); // Some(false), None, Some(true) let comp = comp.as_any().downcast_ref::().unwrap(); let res = nullif(, ).unwrap(); +let res = res.as_any().downcast_ref::().unwrap(); let expected = Int32Array::from(vec![ Some(15), // False => keep it Some(8), // None => keep it None, // true => None ]); -assert_eq!(, ) +assert_eq!(, res) +} + +#[test] +fn test_nullif_int_large_right_offset() { +let a: ArrayRef = Arc::new(Int32Array::from(vec![ +None, // 0 +Some(15), // 1 +Some(8), +Some(1), +Some(9), +])); +let a = a.slice(1, 3); // Some(15), Some(8), Some(1) + +let comp = BooleanArray::from(vec![ +Some(false), // 0 +Some(false), +Some(false), +Some(false), +Some(false), +Some(false), +Some(false), +Some(false), +Some(false), // 8 +Some(false), +Some(false), +Some(false), +Some(false), +Some(false), +Some(false), +Some(false), +Some(false), // 16 +Some(false), // 17 +Some(false), // 18 +None, +Some(true), +Some(false), +None, +]); +let comp = comp.slice(18, 3); // Some(false), None, Some(true) +let comp = comp.as_any().downcast_ref::().unwrap(); +let res = nullif(, ).unwrap(); +let res = res.as_any().downcast_ref::().unwrap(); + +let expected = Int32Array::from(vec![ +Some(15), // False => keep it +Some(8), // None => keep it +None, // true => None +]); +assert_eq!(, res) +} + +#[test] +fn test_nullif_boolean_offset() { +let a: ArrayRef = Arc::new(BooleanArray::from(vec![ +None, // 0 +Some(true), // 1 +Some(false), +Some(true), +Some(true), +])); +let a = a.slice(1, 3); // Some(true), Some(false), Some(true) + +let comp = BooleanArray::from(vec![ +Some(false), // 0 +Some(false), // 1 +Some(false), // 2 +None, +Some(true), +Some(false),
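The truth table in the review thread above reduces to `result = left null bitmap & !(right_values & right_bitmap)`. On a single byte of bitmap (least-significant bit = element 0) that is just one expression, sketched here in plain Rust rather than against Arrow's `Bitmap` type:

```rust
/// An element stays valid only if it was valid on the left AND the right
/// side is not (true AND valid); a null or false on the right passes the
/// left value through.
fn nullif_validity(left_validity: u8, right_values: u8, right_validity: u8) -> u8 {
    left_validity & !(right_values & right_validity)
}
```

The offset-handling complexity in the PR comes from these bitmaps not starting at bit 0 in sliced arrays, not from the formula itself.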
[GitHub] [arrow-rs] bjchambers commented on a change in pull request #521: Change `nullif` to support arbitrary arrays
bjchambers commented on a change in pull request #521: URL: https://github.com/apache/arrow-rs/pull/521#discussion_r670114367 ## File path: arrow/src/compute/kernels/boolean.rs ## @@ -1148,12 +1222,215 @@ mod tests { let comp = comp.slice(2, 3); // Some(false), None, Some(true) let comp = comp.as_any().downcast_ref::().unwrap(); let res = nullif(, ).unwrap(); +let res = res.as_any().downcast_ref::().unwrap(); let expected = Int32Array::from(vec![ Some(15), // False => keep it Some(8), // None => keep it None, // true => None ]); -assert_eq!(, ) +assert_eq!(, res) +} + +#[test] +fn test_nullif_int_large_right_offset() { +let a: ArrayRef = Arc::new(Int32Array::from(vec![ +None, // 0 +Some(15), // 1 +Some(8), +Some(1), +Some(9), +])); +let a = a.slice(1, 3); // Some(15), Some(8), Some(1) + +let comp = BooleanArray::from(vec![ +Some(false), // 0 +Some(false), +Some(false), +Some(false), +Some(false), +Some(false), +Some(false), +Some(false), +Some(false), // 8 +Some(false), +Some(false), +Some(false), +Some(false), +Some(false), +Some(false), +Some(false), +Some(false), // 16 +Some(false), // 17 +Some(false), // 18 +None, +Some(true), +Some(false), +None, +]); +let comp = comp.slice(18, 3); // Some(false), None, Some(true) +let comp = comp.as_any().downcast_ref::().unwrap(); +let res = nullif(, ).unwrap(); +let res = res.as_any().downcast_ref::().unwrap(); + +let expected = Int32Array::from(vec![ +Some(15), // False => keep it +Some(8), // None => keep it +None, // true => None +]); +assert_eq!(, res) +} + +#[test] +fn test_nullif_boolean_offset() { +let a: ArrayRef = Arc::new(BooleanArray::from(vec![ +None, // 0 +Some(true), // 1 +Some(false), +Some(true), +Some(true), +])); +let a = a.slice(1, 3); // Some(true), Some(false), Some(true) + +let comp = BooleanArray::from(vec![ +Some(false), // 0 +Some(false), // 1 +Some(false), // 2 +None, +Some(true), +Some(false), +None, +]); +let comp = comp.slice(2, 3); // Some(false), None, Some(true) +let comp = 
comp.as_any().downcast_ref::().unwrap(); +let res = nullif(, ).unwrap(); +let res = res.as_any().downcast_ref::().unwrap(); + +let expected = BooleanArray::from(vec![ +Some(true), // False => keep it +Some(false), // None => keep it +None,// true => None +]); +assert_eq!(, res) +} + +#[test] +fn test_nullif_passthrough_all_nonnull() { +// DO NOT SUBMIT: Write this test +} + +#[test] +fn test_nullif_passthrough_all_null_or_true() { +// DO NOT SUBMIT: Write this test +} + +struct Foo { +a: Option, +b: Option, +/// Whether the entry should be valid. +is_valid: bool, +} + +impl Foo { +fn new_valid(a: i32, b: bool) -> Foo { +Self { +a: Some(a), +b: Some(b), +is_valid: true, +} +} + +fn new_null() -> Foo { +Self { +a: None, +b: None, +is_valid: false, +} +} +} + +/// Struct Array equality is a bit weird -- we need to have the *child values* +/// correct even if the enclosing struct indicates it is null. But we +/// also need the top level is_valid bits to be correct. +fn create_foo_struct(values: Vec) -> StructArray { +let mut struct_array = StructBuilder::new( +vec![ +Field::new("a", DataType::Int32, true), +Field::new("b", DataType::Boolean, true), +], +vec![ +Box::new(Int32Builder::new(values.len())), +Box::new(BooleanBuilder::new(values.len())), +], +); + +for value in values { +struct_array +.field_builder::(0) +.unwrap() +.append_option(value.a) +.unwrap(); +struct_array +.field_builder::(1) +.unwrap() +.append_option(value.b) +.unwrap(); +
[GitHub] [arrow-rs] bjchambers opened a new pull request #555: failing test
bjchambers opened a new pull request #555: URL: https://github.com/apache/arrow-rs/pull/555
# Which issue does this PR close?
Closes #514.
# Rationale for this change
This was caused by a problem in struct slices and equality. It was fixed by a separate PR, but it seems beneficial to get an explicit test of the expected behavior in as well to prevent regression.
# What changes are included in this PR?
A test that was previously failing for equality on slices of a struct array.
# Are there any user-facing changes?
No.
[GitHub] [arrow-rs] bjchambers commented on issue #514: Struct equality on slices has false negatives
bjchambers commented on issue #514: URL: https://github.com/apache/arrow-rs/issues/514#issuecomment-880371136 I think that #389 seems to have fixed this. I'll make a PR with the corresponding test case to prevent regression.
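The invariant behind #514 (and the review note that "we need to have the child values correct even if the enclosing struct indicates it is null") can be sketched with an assumed struct-of-arrays layout; this is an illustration, not the arrow-rs `StructArray` API:

```rust
/// A struct array as child vectors plus a top-level validity mask.
struct FooArray {
    a: Vec<i32>,
    b: Vec<bool>,
    valid: Vec<bool>,
}

/// Equality must ignore child values wherever the top-level entry is null;
/// comparing children at null slots is exactly the false-negative from #514.
fn struct_eq(x: &FooArray, y: &FooArray) -> bool {
    x.valid == y.valid
        && x.valid.iter().enumerate().all(|(i, &v)| {
            // at null slots, differing child values must not break equality
            !v || (x.a[i] == y.a[i] && x.b[i] == y.b[i])
        })
}
```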
[GitHub] [arrow-experimental-rs-parquet2] jorgecarleitao commented on pull request #1: Adds parquet2
jorgecarleitao commented on pull request #1: URL: https://github.com/apache/arrow-experimental-rs-parquet2/pull/1#issuecomment-880369806 @jhorstmann, did you submit a CLA?
[GitHub] [arrow] domoritz commented on a change in pull request #10698: ARROW-13303: [JS] Revise bundles
domoritz commented on a change in pull request #10698: URL: https://github.com/apache/arrow/pull/10698#discussion_r670105369 ## File path: js/gulp/package-task.js ## @@ -46,14 +46,19 @@ const createMainPackageJson = (target, format) => (orig) => ({ ...createTypeScriptPackageJson(target, format)(orig), bin: orig.bin, name: npmPkgName, -main: `${mainExport}.node`, -browser: `${mainExport}.dom`, -module: `${mainExport}.dom.mjs`, +type: 'commonjs', +main: `${mainExport}.node.js`, +module: `${mainExport}.dom.js`, Review comment: ```suggestion module: `${mainExport}.node.mjs`, ``` To be consistent with https://github.com/apache/arrow/pull/10698/files#diff-9ad0361c0b3692cb897f0641d786ade504a91c8c1d6a36c3abd3cf33224946a0R96 we should use node. Also, don't we need the .mjs file here?
[GitHub] [arrow] domoritz commented on a change in pull request #10698: ARROW-13303: [JS] Revise bundles
domoritz commented on a change in pull request #10698: URL: https://github.com/apache/arrow/pull/10698#discussion_r670105202 ## File path: js/gulp/package-task.js ## @@ -46,14 +46,19 @@ const createMainPackageJson = (target, format) => (orig) => ({ ...createTypeScriptPackageJson(target, format)(orig), bin: orig.bin, name: npmPkgName, -main: `${mainExport}.node`, -browser: `${mainExport}.dom`, -module: `${mainExport}.dom.mjs`, +type: 'commonjs', +main: `${mainExport}.node.js`, Review comment: ```suggestion main: `${mainExport}.node`, ``` ## File path: js/gulp/package-task.js ## @@ -46,14 +46,19 @@ const createMainPackageJson = (target, format) => (orig) => ({ ...createTypeScriptPackageJson(target, format)(orig), bin: orig.bin, name: npmPkgName, -main: `${mainExport}.node`, -browser: `${mainExport}.dom`, -module: `${mainExport}.dom.mjs`, +type: 'commonjs', +main: `${mainExport}.node.js`, +module: `${mainExport}.dom.js`, Review comment: ```suggestion module: `${mainExport}.node.js`, ``` To be consistent with https://github.com/apache/arrow/pull/10698/files#diff-9ad0361c0b3692cb897f0641d786ade504a91c8c1d6a36c3abd3cf33224946a0R96. ## File path: js/gulp/package-task.js ## @@ -46,14 +46,19 @@ const createMainPackageJson = (target, format) => (orig) => ({ ...createTypeScriptPackageJson(target, format)(orig), bin: orig.bin, name: npmPkgName, -main: `${mainExport}.node`, -browser: `${mainExport}.dom`, -module: `${mainExport}.dom.mjs`, +type: 'commonjs', +main: `${mainExport}.node.js`, +module: `${mainExport}.dom.js`, +browser: `${mainExport}.dom.js`, Review comment: ```suggestion browser: `${mainExport}.dom`, ``` This way bundlers can pick up mjs files more easily, no? 
## File path: js/gulp/typescript-task.js ## @@ -61,8 +61,8 @@ function compileTypescript(out, tsconfigPath, tsconfigOverrides) { return Observable.forkJoin(writeSources, writeDTypes, writeJS); } -function cjsMapFile(mapFilePath) { return mapFilePath; } Review comment: Don't we need to undo these changes so we have the correct mapping files?
[GitHub] [arrow] cyb70289 commented on a change in pull request #10663: ARROW-13253: [FlightRPC][C++] Fix segfault with large messages
cyb70289 commented on a change in pull request #10663: URL: https://github.com/apache/arrow/pull/10663#discussion_r670100132 ## File path: cpp/src/arrow/flight/test_util.cc ## @@ -616,6 +640,22 @@ Status ExampleLargeBatches(BatchVector* out) { return Status::OK(); } +arrow::Result> VeryLargeBatch() { + // In CI, some platforms don't let us allocate one very large + // buffer, so allocate a smaller buffer and repeat it a few times + constexpr int64_t nbytes = (1ul << 27ul) + 8ul; + constexpr int64_t nrows = nbytes / 8; + constexpr int64_t ncols = 16; + ARROW_ASSIGN_OR_RAISE(auto values, AllocateBuffer(nbytes)); + std::memset(values->mutable_data(), 0x00, values->capacity()); + std::vector> buffers = {nullptr, std::move(values)}; + auto array = std::make_shared(int64(), nrows, buffers, + /*null_count=*/0); + std::vector> arrays(ncols, array); + std::vector> fields(ncols, field("a", int64())); + return RecordBatch::Make(schema(std::move(fields)), nrows, std::move(arrays)); Review comment: So the ipc payload will be 16 columns in size, though all columns are from same buffer? ## File path: cpp/src/arrow/flight/types.cc ## @@ -21,6 +21,7 @@ #include #include +#include "arrow/buffer.h" Review comment: Is this header used? ## File path: cpp/src/arrow/flight/serialization_internal.cc ## @@ -164,26 +164,18 @@ static const uint8_t kPaddingBytes[8] = {0, 0, 0, 0, 0, 0, 0, 0}; // Update the sizes of our Protobuf fields based on the given IPC payload. 
grpc::Status IpcMessageHeaderSize(const arrow::ipc::IpcPayload& ipc_msg, bool has_body, - size_t* body_size, size_t* header_size, - int32_t* metadata_size) { - DCHECK_LT(ipc_msg.metadata->size(), kInt32Max); + size_t* header_size, int32_t* metadata_size) { + DCHECK_LE(ipc_msg.metadata->size(), kInt32Max); *metadata_size = static_cast(ipc_msg.metadata->size()); // 1 byte for metadata tag *header_size += 1 + WireFormatLite::LengthDelimitedSize(*metadata_size); - for (const auto& buffer : ipc_msg.body_buffers) { -// Buffer may be null when the row length is zero, or when all -// entries are invalid. -if (!buffer) continue; - -*body_size += static_cast(BitUtil::RoundUpToMultipleOf8(buffer->size())); - } - // 2 bytes for body tag if (has_body) { // We write the body tag in the header but not the actual body data -*header_size += 2 + WireFormatLite::LengthDelimitedSize(*body_size) - *body_size; +*header_size += 2 + WireFormatLite::LengthDelimitedSize(ipc_msg.body_length) - Review comment: From the old code, it looks like `ipc_msg.body_length` may be 0. Is that not possible now?
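The `WireFormatLite::LengthDelimitedSize` call in this hunk is just "varint encoding of the length, plus the length itself". A stand-alone sketch of that size computation (not the protobuf implementation):

```rust
/// Number of bytes a u64 occupies as a protobuf varint (7 payload bits per byte).
fn varint_size(mut v: u64) -> usize {
    let mut n = 1;
    while v >= 0x80 {
        v >>= 7;
        n += 1;
    }
    n
}

/// Total wire size of a length-delimited field body: length prefix + payload.
/// Subtracting the payload back out, as the hunk does, leaves only the
/// header's share of the message size.
fn length_delimited_size(payload_len: u64) -> usize {
    varint_size(payload_len) + payload_len as usize
}
```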
[GitHub] [arrow] domoritz closed pull request #10695: ARROW-13299: [JS] Upgrade ix and rxjs
domoritz closed pull request #10695: URL: https://github.com/apache/arrow/pull/10695
[GitHub] [arrow] NinaPeng commented on pull request #10700: ARROW-13306: [Java][JDBC] use ResultSetMetaData.getColumnLabel instead of ResultSetMetaData.getColumnName
NinaPeng commented on pull request #10700: URL: https://github.com/apache/arrow/pull/10700#issuecomment-880357237 > > @liyafan82 Thanks for your review. Is there anything else that needs to be fixed, or can this pull request be merged? > > @NinaPeng The change looks reasonable to me. I want to merge it in a few days if there are no more comments. @liyafan82 Got it, thanks.
[GitHub] [arrow] liyafan82 commented on pull request #10700: ARROW-13306: [Java][JDBC] use ResultSetMetaData.getColumnLabel instead of ResultSetMetaData.getColumnName
liyafan82 commented on pull request #10700: URL: https://github.com/apache/arrow/pull/10700#issuecomment-880355669 > @liyafan82 Thanks for your review. Is there anything else that needs to be fixed, or can this pull request be merged? @NinaPeng The change looks reasonable to me. I want to merge it in a few days if there are no more comments.
[GitHub] [arrow] cyb70289 commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc
cyb70289 commented on pull request #10679: URL: https://github.com/apache/arrow/pull/10679#issuecomment-880342643 > @wesm @cyb70289 @bkietz Is there anything else we could do for the low selectivity cases (1% select)? I don't have any satisfying suggestions. A possible workaround, I guess, is to choose branch/non-branch code per selectivity. Smells like "benchmark-oriented optimization"?
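The "choose branch/non-branch code per selectivity" workaround mentioned above can be sketched like this, using plain slices rather than Arrow arrays; the 1% cutoff is an arbitrary assumption, not a measured threshold from the PR:

```rust
/// Branchy take: fast at low selectivity, where the branch predictor
/// learns "not taken" almost everywhere.
fn filter_branchy(values: &[i64], mask: &[bool]) -> Vec<i64> {
    let mut out = Vec::new();
    for (&v, &m) in values.iter().zip(mask) {
        if m {
            out.push(v);
        }
    }
    out
}

/// Branch-free take: unconditional write, conditional advance; fast at
/// high selectivity, where a per-element branch would mispredict often.
fn filter_branchless(values: &[i64], mask: &[bool]) -> Vec<i64> {
    let mut out = vec![0; values.len()];
    let mut n = 0;
    for (&v, &m) in values.iter().zip(mask) {
        out[n] = v;
        n += m as usize;
    }
    out.truncate(n);
    out
}

/// Dispatch on measured selectivity (the 1% cutoff is a guess).
fn filter(values: &[i64], mask: &[bool]) -> Vec<i64> {
    let selected = mask.iter().filter(|&&m| m).count();
    if selected * 100 < values.len() {
        filter_branchy(values, mask)
    } else {
        filter_branchless(values, mask)
    }
}
```

Counting the set bits first is cheap relative to the take itself, which is what makes this kind of per-batch dispatch plausible.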
[GitHub] [arrow] NinaPeng commented on pull request #10700: ARROW-13306: [Java][JDBC] use ResultSetMetaData.getColumnLabel instead of ResultSetMetaData.getColumnName
NinaPeng commented on pull request #10700: URL: https://github.com/apache/arrow/pull/10700#issuecomment-880340157 @liyafan82 Thanks for your review. Is there anything else that needs to be fixed, or can this pull request be merged?
[GitHub] [arrow] kou commented on pull request #10614: ARROW-13100: [MATLAB] Integrate GoogleTest with MATLAB Interface C++ Code
kou commented on pull request #10614: URL: https://github.com/apache/arrow/pull/10614#issuecomment-880309249 Thanks! I've merged this.
[GitHub] [arrow] kou closed pull request #10614: ARROW-13100: [MATLAB] Integrate GoogleTest with MATLAB Interface C++ Code
kou closed pull request #10614: URL: https://github.com/apache/arrow/pull/10614
[GitHub] [arrow-datafusion] andygrove opened a new pull request #727: UnresolvedShuffleExec should represent a single shuffle
andygrove opened a new pull request #727: URL: https://github.com/apache/arrow-datafusion/pull/727
# Which issue does this PR close?
Closes #726
# Rationale for this change
Small step towards getting shuffle working.
# What changes are included in this PR?
# Are there any user-facing changes?
No
[GitHub] [arrow-datafusion] andygrove opened a new issue #726: Ballista: UnresolvedShuffleExec should only have a single stage_id
andygrove opened a new issue #726: URL: https://github.com/apache/arrow-datafusion/issues/726
**Describe the bug** UnresolvedShuffleExec should represent a single shuffle, not multiple.
**To Reproduce** I discovered this while working on the PR to get shuffles working.
**Expected behavior** N/A
**Additional context** N/A
[GitHub] [arrow] anthonylouisbsb commented on a change in pull request #10604: ARROW-13190: [C++] [Gandiva] Change behavior of INITCAP function
anthonylouisbsb commented on a change in pull request #10604: URL: https://github.com/apache/arrow/pull/10604#discussion_r670020234 ## File path: cpp/src/gandiva/gdv_function_stubs.cc ## @@ -427,7 +427,8 @@ CAST_VARLEN_TYPE_FROM_NUMERIC(VARBINARY) #undef GDV_FN_CAST_VARCHAR_REAL GANDIVA_EXPORT Review comment: I removed it
[GitHub] [arrow] augustoasilva commented on a change in pull request #10711: ARROW-13322: [C++][Gandiva] Add from_unixtime hive function to gandiva
augustoasilva commented on a change in pull request #10711: URL: https://github.com/apache/arrow/pull/10711#discussion_r669991533 ## File path: cpp/src/gandiva/precompiled/time.cc ## @@ -841,6 +843,161 @@ gdv_int64 castBIGINT_daytimeinterval(gdv_day_time_interval in) { extractDay_daytimeinterval(in) * MILLIS_IN_DAY; } +FORCE_INLINE +const char* from_unixtime_int64(gdv_int64 context, gdv_timestamp in, gdv_int32* out_len) { + gdv_int64 year = extractYear_timestamp(in); + gdv_int64 month = extractMonth_timestamp(in); + gdv_int64 day = extractDay_timestamp(in); + gdv_int64 hour = extractHour_timestamp(in); + gdv_int64 minute = extractMinute_timestamp(in); + gdv_int64 second = extractSecond_timestamp(in); + + static const int kTimeStampStringLen = 19; + const int char_buffer_length = kTimeStampStringLen + 1; // snprintf adds \0 + char char_buffer[char_buffer_length]; + + // -MM-dd hh:mm:ss + int res = snprintf(char_buffer, char_buffer_length, + "%04" PRId64 "-%02" PRId64 "-%02" PRId64 " %02" PRId64 ":%02" PRId64 + ":%02" PRId64, + year, month, day, hour, minute, second); + if (res < 0) { +gdv_fn_context_set_error_msg(context, "Could not format the timestamp"); +*out_len = 0; +return ""; + } + + *out_len = kTimeStampStringLen; + + if (*out_len <= 0) { +if (*out_len < 0) { + gdv_fn_context_set_error_msg(context, "Length of output string cannot be negative"); +} +*out_len = 0; +return ""; + } + + char* ret = reinterpret_cast(gdv_fn_context_arena_malloc(context, *out_len)); + if (ret == nullptr) { +gdv_fn_context_set_error_msg(context, "Could not allocate memory for output string"); +*out_len = 0; +return ""; + } + + memcpy(ret, char_buffer, *out_len); + return ret; +} + +FORCE_INLINE +const char* from_unixtime_int64_utf8(gdv_int64 context, gdv_timestamp in, + const char* pattern, gdv_int32 pattern_len, + gdv_int32* out_len) { + gdv_int64 year = extractYear_timestamp(in); + gdv_int64 month = extractMonth_timestamp(in); + gdv_int64 day = extractDay_timestamp(in); + gdv_int64 
hour = extractHour_timestamp(in); + gdv_int64 minute = extractMinute_timestamp(in); + gdv_int64 second = extractSecond_timestamp(in); + gdv_int64 millis = in % MILLIS_IN_SEC; + + if (pattern_len <= 0) { +gdv_fn_context_set_error_msg(context, "Invalid allowed pattern size"); +*out_len = 0; +return ""; + } + + static const int kTimeStampStringLen = pattern_len; + const int char_buffer_length = kTimeStampStringLen + 1; // snprintf adds \0 + char char_buffer[char_buffer_length]; + + const char* regex_format = + "y{4}(-[M]{2})?+.*?(-[d]{2})?+.*?( [h]{2})?+.*?" + "(:[mm]{2})?+.*?(:[s]{2})?+.*?(.[s]{3})?+.*?"; Review comment: OK, I will take a further look at the libraries already used in the project to process more patterns.
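The default-pattern path of `from_unixtime` (seconds since epoch formatted as `yyyy-MM-dd hh:mm:ss`) can be sketched in safe code. The civil-date conversion below follows the well-known days-from-epoch algorithm for the proleptic Gregorian calendar; the whole thing is an illustration of the behavior under test, not the Gandiva code:

```rust
/// Days since 1970-01-01 -> (year, month, day).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = if z >= 0 { z } else { z - 146_096 } / 146_097;
    let doe = z - era * 146_097;                                  // day of era [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);            // day of year [0, 365]
    let mp = (5 * doy + 2) / 153;                                 // March-based month [0, 11]
    let d = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let m = if mp < 10 { mp + 3 } else { mp - 9 };
    let y = yoe + era * 400 + (if m <= 2 { 1 } else { 0 });
    (y, m as u32, d)
}

/// Seconds since epoch -> "yyyy-MM-dd hh:mm:ss" (non-negative seconds assumed).
fn from_unixtime(secs: i64) -> String {
    let (y, mo, d) = civil_from_days(secs.div_euclid(86_400));
    let s = secs.rem_euclid(86_400);
    format!(
        "{:04}-{:02}-{:02} {:02}:{:02}:{:02}",
        y, mo, d, s / 3_600, (s / 60) % 60, s % 60
    )
}
```

This produces the fixed 19-character output the C++ code sizes with `kTimeStampStringLen`.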
[GitHub] [arrow] augustoasilva commented on a change in pull request #10711: ARROW-13322: [C++][Gandiva] Add from_unixtime hive function to gandiva
augustoasilva commented on a change in pull request #10711: URL: https://github.com/apache/arrow/pull/10711#discussion_r669991187 ## File path: cpp/src/gandiva/precompiled/time_test.cc ## @@ -839,4 +839,86 @@ TEST(TestTime, TestToTimeNumeric) { EXPECT_EQ(expected_output, to_time_float64(3601.500)); } -} // namespace gandiva +TEST(TestTime, TestFromUnixtimeWithoutPattern) { Review comment: Ok, will do it.
[GitHub] [arrow] augustoasilva commented on a change in pull request #10711: ARROW-13322: [C++][Gandiva] Add from_unixtime hive function to gandiva
augustoasilva commented on a change in pull request #10711: URL: https://github.com/apache/arrow/pull/10711#discussion_r669990651

## File path: cpp/src/gandiva/precompiled/time.cc

@@ -841,6 +843,161 @@ (same from_unixtime_int64 / from_unixtime_int64_utf8 context as quoted above, continuing past the regex declaration:)

+  bool match = std::regex_match(pattern, std::regex(regex_format));
+
+  if (!match) {
+    gdv_fn_context_set_error_msg(context, "Invalid allowed pattern");
+    *out_len = 0;
+    return "";
+  }
+
+  // length from pattern
+  int res = 0;
+
+  switch (pattern_len) {
+    // yyyy
+    case 4:
+      res = snprintf(char_buffer, char_buffer_length, "%04" PRId64, year);
+      break;
+    // yyyy-MM
+    case 7:
+      res = snprintf(char_buffer, char_buffer_length, "%04" PRId64 "-%02" PRId64, year,
+                     month);
+      break;
+    // yyyy-MM-dd
+    case 10:

Review comment: No, I focused mainly on the first pattern.
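The switch on `pattern_len` quoted above effectively maps a handful of pattern lengths to prefixes of the full timestamp string. A rough Python model of that dispatch follows, covering only the lengths visible in the diff plus the full 19-character form; `format_by_pattern_len` and the rejection message are assumptions for illustration:

```python
from datetime import datetime, timezone

# Each supported pattern length selects a strftime format; unsupported
# lengths are rejected, like the error branch in the C++ switch.
PREFIX_FORMATS = {
    4: "%Y",                   # yyyy
    7: "%Y-%m",                # yyyy-MM
    10: "%Y-%m-%d",            # yyyy-MM-dd
    19: "%Y-%m-%d %H:%M:%S",   # yyyy-MM-dd hh:mm:ss
}

def format_by_pattern_len(ts_millis: int, pattern_len: int) -> str:
    fmt = PREFIX_FORMATS.get(pattern_len)
    if fmt is None:
        raise ValueError("Invalid allowed pattern")
    return datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc).strftime(fmt)
```

Keying on the pattern's length rather than its content is why the review asks about patterns beyond the first one: two different 10-character patterns would format identically under this scheme.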
[GitHub] [arrow] augustoasilva commented on a change in pull request #10711: ARROW-13322: [C++][Gandiva] Add from_unixtime hive function to gandiva
augustoasilva commented on a change in pull request #10711: URL: https://github.com/apache/arrow/pull/10711#discussion_r669990651

## File path: cpp/src/gandiva/precompiled/time.cc (same diff context and `case 10:` hunk as quoted above)

Review comment: Yes, the comment was misplaced.
[GitHub] [arrow] augustoasilva commented on a change in pull request #10711: ARROW-13322: [C++][Gandiva] Add from_unixtime hive function to gandiva
augustoasilva commented on a change in pull request #10711: URL: https://github.com/apache/arrow/pull/10711#discussion_r669990486

## File path: cpp/src/gandiva/precompiled/time.cc (same from_unixtime diff context as quoted above; the comment is on this line:)

+  bool match = std::regex_match(pattern, std::regex(regex_format));

Review comment: Ok, I will do it.
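As a rough illustration of what pattern validation can look like, here is a Python sketch using nested optional groups to accept exactly the prefix patterns the switch handles. The grammar and names here are assumptions for illustration, not the PR's regex:

```python
import re

# Accept "yyyy", "yyyy-MM", "yyyy-MM-dd", on down to
# "yyyy-MM-dd hh:mm:ss.sss"; anything else (e.g. a pattern that does
# not start with the year) is rejected.
ALLOWED_PATTERN = re.compile(r"yyyy(-MM(-dd( hh(:mm(:ss(\.sss)?)?)?)?)?)?")

def is_supported_pattern(pattern: str) -> bool:
    # fullmatch ensures there is no trailing junk after a valid prefix.
    return ALLOWED_PATTERN.fullmatch(pattern) is not None
```

Note that the C++ snippet under review uses possessive quantifiers (`?+`), which `std::regex`'s default ECMAScript grammar does not support, so moving to the project's existing regex utilities (as the reply suggests) seems prudent.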
[GitHub] [arrow-datafusion] yordan-pavlov commented on issue #723: ABS() function in WHERE clause gives unexpected results
yordan-pavlov commented on issue #723: URL: https://github.com/apache/arrow-datafusion/issues/723#issuecomment-88024 @mcassels thank you for reporting this - good find; the implementation of the pruning predicate assumes that for a predicate expression `f(v) OP c`, where v is any value in a column chunk and c is a constant / literal, `f(v_min) <= f(v) <= f(v_max)` holds for every value v in that chunk. What I did not take into account in the initial parquet predicate push-down implementation is that this assumption is, sadly, not true for all functions supported by datafusion (the function `ABS(c0 - 251.10794896957802)` being an obvious example). As a shorter-term fix, to ensure correctness, the predicate push-down feature could be limited to simpler expressions (such as `col < literal`). Longer term, to support predicate push-down with more complex expressions, there must be a way to determine whether a predicate expression satisfies the assumption above (such as keeping a list of compatible functions, as @lvheyang suggested above), without actually evaluating the expression. Sadly, I don't have time at the moment to work on this, due to things in my personal life.
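To make the broken assumption concrete, here is a small self-contained Python counterexample. The chunk values are illustrative; only the constant comes from the reported issue:

```python
# Min/max pruning assumes f is monotonic over [min, max]. For
# f(v) = abs(v - c) that fails: f can dip to 0 inside the range even
# though f(min) and f(max) are both large.
c = 251.10794896957802
chunk = [250.0, 251.2, 252.0]          # values in one column chunk
vmin, vmax = min(chunk), max(chunk)

f = lambda v: abs(v - c)
interval = sorted([f(vmin), f(vmax)])   # what a naive pruner would consult

# Predicate f(v) < 0.2 is true for 251.2, so the chunk must NOT be pruned...
assert any(f(v) < 0.2 for v in chunk)
# ...but the naive endpoint interval says no value can satisfy it,
# so an endpoint-based pruner would wrongly skip this chunk.
assert not (interval[0] < 0.2)
```

This is why restricting push-down to expressions known to preserve ordering (e.g. `col < literal`) is the safe short-term fix.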
[GitHub] [arrow] augustoasilva commented on a change in pull request #10711: ARROW-13322: [C++][Gandiva] Add from_unixtime hive function to gandiva
augustoasilva commented on a change in pull request #10711: URL: https://github.com/apache/arrow/pull/10711#discussion_r669989632

## File path: cpp/src/gandiva/precompiled/time.cc (same from_unixtime_int64 diff context as quoted above)

Review comment: They are similar, but from_unixtime does not have the fractional seconds. Also, for from_unixtime without a pattern the output size is fixed. I think I could call the one that you have mentioned inside it to avoid duplicating the code.
[GitHub] [arrow] github-actions[bot] commented on pull request #10721: ARROW-11673 - [C++] Casting dictionary type to use different index type
github-actions[bot] commented on pull request #10721: URL: https://github.com/apache/arrow/pull/10721#issuecomment-880238653 https://issues.apache.org/jira/browse/ARROW-11673
[GitHub] [arrow] nirandaperera opened a new pull request #10721: ARROW-11673 - [C++] Casting dictionary type to use different index type
nirandaperera opened a new pull request #10721: URL: https://github.com/apache/arrow/pull/10721 This PR adds casting from one dictionary type to another dictionary type, e.g.: ``` dictionary(int8(), int16()) --> dictionary(int32(), int64()) ```
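Conceptually, a dictionary array is an (indices, values) pair, and this cast re-encodes both sides to new integer types. The following pure-Python model is a sketch of that idea only (not the Arrow C++ API); `cast_dictionary` is a made-up helper, and `array.array`'s overflow check stands in for a checked cast:

```python
import array

def cast_dictionary(indices, values, index_typecode, value_typecode):
    # array.array raises OverflowError if an element does not fit the
    # target type, which models a checked (non-lossy) cast.
    return (array.array(index_typecode, indices),
            array.array(value_typecode, values))

idx8 = array.array('b', [0, 1, 1, 0])   # int8 indices
val16 = array.array('h', [100, 200])    # int16 dictionary values

# dictionary(int8(), int16()) --> dictionary(int32(), int64())
# ('i' is C int, int32 on common platforms; 'q' is int64)
idx32, val64 = cast_dictionary(idx8, val16, 'i', 'q')
```

Widening is always safe; narrowing would need the same kind of range check Arrow performs when a dictionary value or index does not fit the target type.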
[GitHub] [arrow] westonpace closed issue #10699: ModuleNotFoundError: No module named 'pyarrow._orc'
westonpace closed issue #10699: URL: https://github.com/apache/arrow/issues/10699
[GitHub] [arrow-rs] alamb commented on issue #554: ArrayData::slice() does not work for nested types such as StructArray
alamb commented on issue #554: URL: https://github.com/apache/arrow-rs/issues/554#issuecomment-880210994 > Thanks for this @alamb, I hadn't noticed that not linking PRs to existing issues affects the changelog. Yes @nevi-me I don't fully understand the intricacies of the changelog generator that @jorgecarleitao found, but it seems to be mostly derived from issues (rather than PRs). I figured I will keep honing the process in future releases.
[GitHub] [arrow] kevingurney commented on pull request #10614: ARROW-13100: [MATLAB] Integrate GoogleTest with MATLAB Interface C++ Code
kevingurney commented on pull request #10614: URL: https://github.com/apache/arrow/pull/10614#issuecomment-880204002 @kou - we updated the description of the pull request to reflect the latest status. We also discovered a small issue during qualification. If you specified `MATLAB_BUILD_TESTS=ON`, then CMake would always use the Arrow C++ libraries built from source and ignore any custom `ARROW_HOME` or system-installed distributions of Arrow. We just pushed a small fix for this issue in b12b6fe1398dc4ee2ebb8fb00ce54ab4bfe4012a. Let us know if you have any questions. Thanks again!
[GitHub] [arrow] lidavidm commented on pull request #10608: ARROW-13136: [C++] Add coalesce function
lidavidm commented on pull request #10608: URL: https://github.com/apache/arrow/pull/10608#issuecomment-880197573 Broke up the main function a bit.
[GitHub] [arrow] lidavidm commented on pull request #10557: ARROW-13064: [C++] Implement select ('case when') function for fixed-width types
lidavidm commented on pull request #10557: URL: https://github.com/apache/arrow/pull/10557#issuecomment-880197464 Removed support for toplevel nulls.
[GitHub] [arrow-datafusion] alamb commented on a change in pull request #719: Optimize min/max queries with table statistics
alamb commented on a change in pull request #719: URL: https://github.com/apache/arrow-datafusion/pull/719#discussion_r669933745

## File path: datafusion/src/physical_plan/parquet.rs

@@ -312,22 +431,47 @@ impl ParquetExec {
 if let Some(x) = _statistics {
 let part_nulls: Vec> = x.iter().map(|c| c.null_count).collect();
-has_null_counts = true;
+has_statistics = true;
+
+let part_max_values: Vec> =
+    x.iter().map(|c| c.max_value.clone()).collect();
+let part_min_values: Vec> =
+    x.iter().map(|c| c.min_value.clone()).collect();
 for in projection.iter() {
 null_counts[i] = part_nulls[i].unwrap_or(0);
+if let Some(part_max_value) = part_max_values[i].clone() {
+    if let Some(max_value) = max_values[i] {
+        let _ = max_value.update(&[part_max_value]);

Review comment: This doesn't seem right to me -- it seems like it ignores errors if the min or max values can't be updated (maybe due to overflow or data type mismatch). I think it might be safer, if there is an error calling `update()`, to reset the `min_values[i]` or `max_values[i]` back to None. Otherwise if you have something like

```
Row Group 1: min=1
Row Group 2: min=MIN_INT <-- and this causes errors for some reason
Row Group 3: min=3
```

the code as written will return `min=1` for this column in the parquet file, which may be incorrect. I think a more conservative approach here would be for this code to return `None` so we don't erroneously use these statistics.
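The conservative behaviour suggested in this review can be sketched in a few lines of Python. `merge_min` is a hypothetical stand-in for folding `min_values[i]` across row groups; the key property is that a missing or unmergeable statistic poisons the whole result instead of being silently skipped:

```python
def merge_min(row_group_mins):
    # Fold per-row-group min values; drop to None as soon as any row
    # group's statistic is missing (or failed to merge), rather than
    # keeping a possibly-incorrect partial answer.
    acc = None
    for m in row_group_mins:
        if m is None:
            return None
        acc = m if acc is None or m < acc else acc
    return acc

assert merge_min([1, 3, 2]) == 1
assert merge_min([1, None, 3]) is None  # safe: never claim min=1 here
```

Returning `None` makes downstream pruning fall back to "cannot prune", which is always correct, just less efficient.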
[GitHub] [arrow-datafusion] alamb merged pull request #687: #554: Lead/lag window function with offset and default value arguments
alamb merged pull request #687: URL: https://github.com/apache/arrow-datafusion/pull/687
[GitHub] [arrow-datafusion] alamb closed issue #554: implement lead and lag with 2nd and 3rd argument
alamb closed issue #554: URL: https://github.com/apache/arrow-datafusion/issues/554
[GitHub] [arrow-datafusion] alamb commented on a change in pull request #687: #554: Lead/lag window function with offset and default value arguments
alamb commented on a change in pull request #687: URL: https://github.com/apache/arrow-datafusion/pull/687#discussion_r669925638 ## File path: datafusion/src/physical_plan/expressions/lead_lag.rs ## @@ -176,6 +240,28 @@ mod tests { .iter() .collect::(), )?; + +test_i32_result( +lag( +"lead".to_owned(), +DataType::Int32, +Arc::new(Column::new("c3", 0)), +None, +Some(ScalarValue::Int32(Some(100))), +), +vec![ +Some(100), Review comment:
[GitHub] [arrow-rs] alamb closed issue #554: ArrayData::slice() does not work for nested types such as StructArray
alamb closed issue #554: URL: https://github.com/apache/arrow-rs/issues/554
[GitHub] [arrow-rs] nevi-me commented on issue #554: ArrayData::slice() does not work for nested types such as StructArray
nevi-me commented on issue #554: URL: https://github.com/apache/arrow-rs/issues/554#issuecomment-880178885 Thanks for this @alamb, I hadn't noticed that not linking PRs to existing issues affects the changelog.
[GitHub] [arrow-rs] alamb merged pull request #389: make slice work for nested types
alamb merged pull request #389: URL: https://github.com/apache/arrow-rs/pull/389
[GitHub] [arrow-rs] nevi-me commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations
nevi-me commented on issue #527: URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-880177931 this also relates to #45
[GitHub] [arrow-rs] alamb opened a new issue #554: ArrayData::slice() does not work for nested types such as StructArray
alamb opened a new issue #554: URL: https://github.com/apache/arrow-rs/issues/554 **Describe the bug** `ArrayData::slice()` does not work for nested types, because only the `ArrayData::buffers` are updated with the new offset and length. This has caused a lot of issues in the past. This blocks us from being able to implement `RecordBatch::slice()`, and has led to creating #381 to sidestep this issue.
[GitHub] [arrow-rs] alamb commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations
alamb commented on issue #527: URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-880175659 @Jimexist no one that I know of is taking this on. If you were interested, that would be awesome. The reference implementation would probably still be postgres in my mind. There is a nice table https://www.postgresql.org/docs/current/functions-datetime.html#OPERATORS-DATETIME-TABLE that describes all the operators, input types, and output types. I think it helps because certain combinations, such as `date + date`, don't really make logical sense, so having the reference to guide what is needed would help a lot. @velvia / @andygrove / @jorgecarleitao any thoughts?
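The point of the operator table - that only certain temporal type combinations are meaningful - can be seen directly with Python's stdlib datetime. This is just an analogy for which combinations type-check, not DataFusion or arrow-rs code:

```python
from datetime import date, timedelta

d = date(2021, 7, 15)
delta = timedelta(days=3)

assert d + delta == date(2021, 7, 18)        # date + interval -> date
assert date(2021, 7, 18) - d == delta        # date - date -> interval

# date + date has no meaningful result, and Python agrees: TypeError.
try:
    d + d
    raise AssertionError("date + date unexpectedly succeeded")
except TypeError:
    pass
```

A temporal arithmetic kernel would encode the same table: for each (left type, operator, right type) triple, either a defined output type or a planning error.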
[GitHub] [arrow] nealrichardson commented on a change in pull request #10624: ARROW-12992: [R] bindings for substr(), substring(), str_sub()
nealrichardson commented on a change in pull request #10624: URL: https://github.com/apache/arrow/pull/10624#discussion_r669913235 ## File path: r/R/dplyr-functions.R ## @@ -280,6 +284,81 @@ nse_funcs$str_trim <- function(string, side = c("both", "left", "right")) { Expression$create(trim_fun, string) } +nse_funcs$substr <- function(string, start, stop) { + assert_that( +length(start) == 1, +msg = "`start` must be length 1 - other lengths are not supported in Arrow" + ) + assert_that( +length(stop) == 1, +msg = "`stop` must be length 1 - other lengths are not supported in Arrow" + ) + + if (start <= 0) { +start <- 1 + } + + if (stop < start) { +stop <- 0 Review comment: Can you explain these? It's not obvious why this is correct, and the base R code and docs don't discuss these cases. ## File path: r/R/dplyr-functions.R ## @@ -280,6 +284,81 @@ nse_funcs$str_trim <- function(string, side = c("both", "left", "right")) { Expression$create(trim_fun, string) } +nse_funcs$substr <- function(string, start, stop) { + assert_that( +length(start) == 1, +msg = "`start` must be length 1 - other lengths are not supported in Arrow" + ) + assert_that( +length(stop) == 1, +msg = "`stop` must be length 1 - other lengths are not supported in Arrow" + ) + + if (start <= 0) { +start <- 1 + } + + if (stop < start) { +stop <- 0 + } + + Expression$create( +"utf8_slice_codeunits", +string, +options = list(start = start - 1L, stop = stop) + ) +} + +nse_funcs$substring <- function(text, first, last = 100L) { Review comment: You could just implement this one by calling nse_funcs$substr. The validation messages won't be exactly right because the argument names are different, but per the docs this function is only kept for compatibility with S, so I don't think it's a big deal. 
## File path: r/R/dplyr-functions.R ## @@ -280,6 +284,81 @@ nse_funcs$str_trim <- function(string, side = c("both", "left", "right")) { Expression$create(trim_fun, string) } +nse_funcs$substr <- function(string, start, stop) { + assert_that( +length(start) == 1, +msg = "`start` must be length 1 - other lengths are not supported in Arrow" + ) + assert_that( +length(stop) == 1, +msg = "`stop` must be length 1 - other lengths are not supported in Arrow" + ) + + if (start <= 0) { +start <- 1 + } + + if (stop < start) { +stop <- 0 + } + + Expression$create( +"utf8_slice_codeunits", +string, +options = list(start = start - 1L, stop = stop) + ) +} + +nse_funcs$substring <- function(text, first, last = 100L) { + assert_that( +length(first) == 1, +msg = "`first` must be length 1 - other lengths are not supported in Arrow" + ) + assert_that( +length(last) == 1, +msg = "`last` must be length 1 - other lengths are not supported in Arrow" + ) + + if (first <= 0) { +first <- 1 + } + + if (last < first) { +last <- 0 + } + + Expression$create( +"utf8_slice_codeunits", +text, +options = list(start = first - 1L, stop = last) + ) +} + +nse_funcs$str_sub <- function(string, start = 1L, end = -1L) { + assert_that( +length(start) == 1, +msg = "`start` must be length 1 - other lengths are not supported in Arrow" + ) + assert_that( +length(end) == 1, +msg = "`end` must be length 1 - other lengths are not supported in Arrow" + ) + + if (start == 0) start <- 1 + + if (end == -1) end <- .Machine$integer.max + + if (end < start) end <- 0 + + if (start > 0) start <- start - 1L Review comment: Why does this version subtract 1 up here but the others don't? Why only if start > 0? Is start <= 0 valid? 
## File path: r/R/dplyr-functions.R ## @@ -391,7 +470,7 @@ nse_funcs$str_split <- function(string, pattern, n = Inf, simplify = FALSE) { string, options = list( pattern = - opts$pattern, +opts$pattern, Review comment: I know this is just linting but IDK why opts$pattern is on its own line instead of next to = above it. ## File path: r/R/dplyr-functions.R ## @@ -280,6 +284,81 @@ nse_funcs$str_trim <- function(string, side = c("both", "left", "right")) { Expression$create(trim_fun, string) } +nse_funcs$substr <- function(string, start, stop) { + assert_that( +length(start) == 1, +msg = "`start` must be length 1 - other lengths are not supported in Arrow" + ) + assert_that( +length(stop) == 1, +msg = "`stop` must be length 1 - other lengths are not supported in Arrow" + ) + + if (start <= 0) { +start <- 1 + } + + if (stop < start) { +stop <- 0 + } + + Expression$create( +"utf8_slice_codeunits", +string, +options = list(start = start - 1L, stop = stop) Review comment: Why do we only subtract 1 from start but not stop? ## File path: r/R/dplyr-functions.R
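The three review questions above all come down to the same index-convention mismatch: R's `substr()` takes a 1-based, *inclusive* `[start, stop]` range, while Arrow's `utf8_slice_codeunits` takes a 0-based, *half-open* `[start, stop)` range. That is why only `start` is shifted by one: the exclusive end already points one past the last wanted character. A minimal sketch of the conversion (Python standing in for the R binding; `substr_like` is a hypothetical name, not part of the package):

```python
def r_substr_to_slice(start, stop):
    """Map R's 1-based inclusive [start, stop] to the 0-based half-open
    [start, stop) range that utf8_slice_codeunits expects."""
    if start <= 0:       # R treats any start < 1 as 1
        start = 1
    if stop < start:     # inverted range yields the empty string
        stop = 0
    # Subtract 1 from start only: the half-open end already excludes
    # the character at position stop + 1.
    return start - 1, stop

def substr_like(s, start, stop):
    lo, hi = r_substr_to_slice(start, stop)
    return s[lo:hi]
```

With this mapping, `substr_like("hello", 2, 4)` reproduces R's `substr("hello", 2, 4)` (`"ell"`), and the `start <= 0` / `stop < start` branches reproduce R's clamping behaviour.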
[GitHub] [arrow-datafusion] alamb commented on pull request #716: #699 fix return type conflict when calling builtin math functions
alamb commented on pull request #716: URL: https://github.com/apache/arrow-datafusion/pull/716#issuecomment-880173938 Any thoughts @Dandandan or @jorgecarleitao ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on a change in pull request #716: #699 fix return type conflict when calling builtin math functions
alamb commented on a change in pull request #716: URL: https://github.com/apache/arrow-datafusion/pull/716#discussion_r669916709 ## File path: datafusion/src/execution/context.rs ## @@ -2364,6 +2365,75 @@ mod tests { assert_batches_sorted_eq!(expected, ); } +#[tokio::test] +async fn case_builtin_math_expression() { +let mut ctx = ExecutionContext::new(); + +let type_values = vec![ +( +DataType::Int8, +Arc::new(Int8Array::from(vec![1])) as ArrayRef, +), +( +DataType::Int16, +Arc::new(Int16Array::from(vec![1])) as ArrayRef, +), +( +DataType::Int32, +Arc::new(Int32Array::from(vec![1])) as ArrayRef, +), +( +DataType::Int64, +Arc::new(Int64Array::from(vec![1])) as ArrayRef, +), +( +DataType::UInt8, +Arc::new(UInt8Array::from(vec![1])) as ArrayRef, +), +( +DataType::UInt16, +Arc::new(UInt16Array::from(vec![1])) as ArrayRef, +), +( +DataType::UInt32, +Arc::new(UInt32Array::from(vec![1])) as ArrayRef, +), +( +DataType::UInt64, +Arc::new(UInt64Array::from(vec![1])) as ArrayRef, +), +( +DataType::Float32, +Arc::new(Float32Array::from(vec![1.0_f32])) as ArrayRef, +), +( +DataType::Float64, +Arc::new(Float64Array::from(vec![1.0_f64])) as ArrayRef, +), +]; + +for (data_type, array) in type_values.iter() { +let schema = +Arc::new(Schema::new(vec![Field::new("v", data_type.clone(), false)])); +let batch = +RecordBatch::try_new(schema.clone(), vec![array.clone()]).unwrap(); +let provider = MemTable::try_new(schema, vec![vec![batch]]).unwrap(); +ctx.register_table("t", Arc::new(provider)).unwrap(); +let expected = vec![ +"+-+", +"| sqrt(v) |", +"+-+", +"| 1 |", +"+-+", +]; +let results = plan_and_collect( ctx, "SELECT sqrt(v) FROM t") Review comment: it is a good test. We can finagle the types later if needed (e.g. if we ever want to support `sqrt(numeric)` --> `numeric`. I think this PR makes DataFusion better than it is on master. -- This is an automated message from the Apache Git Service. 
[GitHub] [arrow] bkietz commented on pull request #10636: ARROW-13153: [C++] `parquet_dataset` loses ordering of files in `_metadata`
bkietz commented on pull request #10636: URL: https://github.com/apache/arrow/pull/10636#issuecomment-880169894 @westonpace needs rebase
[GitHub] [arrow] bkietz closed pull request #10629: ARROW-13218: [Doc] Document/clarify conventions for timestamp storage
bkietz closed pull request #10629: URL: https://github.com/apache/arrow/pull/10629
[GitHub] [arrow] bkietz closed pull request #10628: ARROW-12364: [Python] [Dataset] Add metadata_collector option to ds.write_dataset()
bkietz closed pull request #10628: URL: https://github.com/apache/arrow/pull/10628
[GitHub] [arrow-rs] alamb commented on issue #349: parquet reading hangs when row_group contains more than 2048 rows of data
alamb commented on issue #349: URL: https://github.com/apache/arrow-rs/issues/349#issuecomment-880147276 Thank you @MichaelBitard for taking the time to report it!
[GitHub] [arrow] bkietz closed pull request #10720: ARROW-13341: [C++][Compute] Fix race condition in ScalarAggregateNode
bkietz closed pull request #10720: URL: https://github.com/apache/arrow/pull/10720
[GitHub] [arrow] lidavidm commented on pull request #10557: ARROW-13064: [C++] Implement select ('case when') function for fixed-width types
lidavidm commented on pull request #10557: URL: https://github.com/apache/arrow/pull/10557#issuecomment-880137081 Should be fine. It would certainly trim down the inner loop a lot.
[GitHub] [arrow] bkietz commented on pull request #10557: ARROW-13064: [C++] Implement select ('case when') function for fixed-width types
bkietz commented on pull request #10557: URL: https://github.com/apache/arrow/pull/10557#issuecomment-880136525 @lidavidm what would you think about just raising an error for top level nulls? It doesn't seem like a useful case to me.
[GitHub] [arrow] bkietz commented on a change in pull request #10608: ARROW-13136: [C++] Add coalesce function
bkietz commented on a change in pull request #10608: URL: https://github.com/apache/arrow/pull/10608#discussion_r669873043 ## File path: cpp/src/arrow/compute/kernels/scalar_if_else.cc ## @@ -676,7 +677,339 @@ void AddPrimitiveIfElseKernels(const std::shared_ptr& scalar_fun } } -} // namespace +// Helper to copy or broadcast fixed-width values between buffers. +template +struct CopyFixedWidth {}; +template <> +struct CopyFixedWidth { + static void CopyScalar(const Scalar& scalar, const int64_t length, + uint8_t* raw_out_values, const int64_t out_offset) { +const bool value = UnboxScalar::Unbox(scalar); +BitUtil::SetBitsTo(raw_out_values, out_offset, length, value); + } + static void CopyArray(const DataType&, const uint8_t* in_values, +const int64_t in_offset, const int64_t length, +uint8_t* raw_out_values, const int64_t out_offset) { +arrow::internal::CopyBitmap(in_values, in_offset, length, raw_out_values, out_offset); + } +}; +template +struct CopyFixedWidth> { + using CType = typename TypeTraits::CType; + static void CopyScalar(const Scalar& scalar, const int64_t length, + uint8_t* raw_out_values, const int64_t out_offset) { +CType* out_values = reinterpret_cast(raw_out_values); +const CType value = UnboxScalar::Unbox(scalar); +std::fill(out_values + out_offset, out_values + out_offset + length, value); + } + static void CopyArray(const DataType&, const uint8_t* in_values, +const int64_t in_offset, const int64_t length, +uint8_t* raw_out_values, const int64_t out_offset) { +std::memcpy(raw_out_values + out_offset * sizeof(CType), +in_values + in_offset * sizeof(CType), length * sizeof(CType)); + } +}; +template +struct CopyFixedWidth> { + static void CopyScalar(const Scalar& values, const int64_t length, + uint8_t* raw_out_values, const int64_t out_offset) { +const int32_t width = +checked_cast(*values.type).byte_width(); +uint8_t* next = raw_out_values + (width * out_offset); +const auto& scalar = checked_cast(values); +// Scalar may have null value buffer +if 
(!scalar.value) return; +DCHECK_EQ(scalar.value->size(), width); +for (int i = 0; i < length; i++) { + std::memcpy(next, scalar.value->data(), width); + next += width; +} + } + static void CopyArray(const DataType& type, const uint8_t* in_values, +const int64_t in_offset, const int64_t length, +uint8_t* raw_out_values, const int64_t out_offset) { +const int32_t width = checked_cast(type).byte_width(); +uint8_t* next = raw_out_values + (width * out_offset); +std::memcpy(next, in_values + in_offset * width, length * width); + } +}; +template +struct CopyFixedWidth> { + using ScalarType = typename TypeTraits::ScalarType; + static void CopyScalar(const Scalar& values, const int64_t length, + uint8_t* raw_out_values, const int64_t out_offset) { +const int32_t width = +checked_cast(*values.type).byte_width(); +uint8_t* next = raw_out_values + (width * out_offset); +const auto& scalar = checked_cast(values); +const auto value = scalar.value.ToBytes(); +for (int i = 0; i < length; i++) { + std::memcpy(next, value.data(), width); + next += width; +} + } + static void CopyArray(const DataType& type, const uint8_t* in_values, +const int64_t in_offset, const int64_t length, +uint8_t* raw_out_values, const int64_t out_offset) { +const int32_t width = checked_cast(type).byte_width(); +uint8_t* next = raw_out_values + (width * out_offset); +std::memcpy(next, in_values + in_offset * width, length * width); + } +}; +// Copy fixed-width values from a scalar/array datum into an output values buffer +template +void CopyValues(const Datum& in_values, const int64_t in_offset, const int64_t length, +uint8_t* out_valid, uint8_t* out_values, const int64_t out_offset) { + if (in_values.is_scalar()) { +const auto& scalar = *in_values.scalar(); +if (out_valid) { + BitUtil::SetBitsTo(out_valid, out_offset, length, scalar.is_valid); +} +CopyFixedWidth::CopyScalar(scalar, length, out_values, out_offset); + } else { +const ArrayData& array = *in_values.array(); +if (out_valid) { + if 
(array.MayHaveNulls()) { +arrow::internal::CopyBitmap(array.buffers[0]->data(), array.offset + in_offset, +length, out_valid, out_offset); + } else { +BitUtil::SetBitsTo(out_valid, out_offset, length, true); + } +} +CopyFixedWidth::CopyArray(*array.type, array.buffers[1]->data(), +array.offset + in_offset, length, out_values, +
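The `CopyFixedWidth`/`CopyValues` helpers quoted above reduce to one dispatch: a scalar input is broadcast across the output range, while an array input is copied slice-for-slice at the right offset. A rough sketch of that dispatch (Python lists standing in for Arrow buffers; this is an illustration of the idea, not the C++ API):

```python
def copy_values(src, length, out, out_offset, in_offset=0):
    """Copy or broadcast fixed-width values into an output buffer.

    src is either a scalar (broadcast into out[out_offset:out_offset+length])
    or a list (copied element-for-element from in_offset).
    """
    if not isinstance(src, list):
        # Scalar: fill the whole output range with the one value,
        # like CopyScalar / BitUtil::SetBitsTo / std::fill.
        out[out_offset:out_offset + length] = [src] * length
    else:
        # Array: copy a contiguous slice, like CopyArray / std::memcpy.
        out[out_offset:out_offset + length] = src[in_offset:in_offset + length]
```

The real kernel additionally handles validity bitmaps (the `out_valid` branch above) and specializes the copy per fixed-width type, but the scalar-vs-array split is the core of it.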
[GitHub] [arrow] thisisnic commented on pull request #10624: ARROW-12992: [R] bindings for substr(), substring(), str_sub()
thisisnic commented on pull request #10624: URL: https://github.com/apache/arrow/pull/10624#issuecomment-880129596 @nealrichardson - have made some updates; please could you re-review this when you have a chance? Tomorrow I'm going to add in the tests for warnings/errors raised when incorrect parameter values are supplied, but is there anything else that needs updating in this PR?
[GitHub] [arrow] bkietz closed pull request #10606: ARROW-13005: [C++] Add support for take implementation on dense union type
bkietz closed pull request #10606: URL: https://github.com/apache/arrow/pull/10606
[GitHub] [arrow] thisisnic commented on pull request #10624: ARROW-12992: [R] bindings for substr(), substring(), str_sub()
thisisnic commented on pull request #10624: URL: https://github.com/apache/arrow/pull/10624#issuecomment-880113887 This PR now also contains some unrelated styling changes, as I ran styler on the files I changed before pushing my changes.
[GitHub] [arrow] lidavidm commented on a change in pull request #10693: ARROW-13224: [Python][Doc] Documentation missing for pyarrow.dataset.write_dataset
lidavidm commented on a change in pull request #10693: URL: https://github.com/apache/arrow/pull/10693#discussion_r669830370 ## File path: docs/source/python/dataset.rst ## @@ -456,20 +456,163 @@ is materialized as columns when reading the data and can be used for filtering: dataset.to_table().to_pandas() dataset.to_table(filter=ds.field('year') == 2019).to_pandas() +Another benefit of manually scheduling the files is that the order of the files +controls the order of the data. When performing an ordered read (or a read to +a table) then the rows returned will match the order of the files given. This +only applies when the dataset is constructed with a list of files. There +are no order guarantees given when the files are instead discovered by scanning Review comment: Ah thanks. Sorry for the churn here @westonpace - we should keep stating that there's no guarantee then.
[GitHub] [arrow] bkietz commented on a change in pull request #10693: ARROW-13224: [Python][Doc] Documentation missing for pyarrow.dataset.write_dataset
bkietz commented on a change in pull request #10693: URL: https://github.com/apache/arrow/pull/10693#discussion_r669825752 ## File path: docs/source/python/dataset.rst ## @@ -456,20 +456,163 @@ is materialized as columns when reading the data and can be used for filtering: dataset.to_table().to_pandas() dataset.to_table(filter=ds.field('year') == 2019).to_pandas() +Another benefit of manually scheduling the files is that the order of the files +controls the order of the data. When performing an ordered read (or a read to +a table) then the rows returned will match the order of the files given. This +only applies when the dataset is constructed with a list of files. There +are no order guarantees given when the files are instead discovered by scanning Review comment: We don't guarantee order for selectors because ARROW-8163 (asynchronous fragment discovery) might not guarantee order. Lexicographic sorting *could* be maintained for synchronous discovery from a selector, but in general we'd want to push a fragment into scan as soon as it's yielded by `FileSystem::GetFileInfoGenerator`
[GitHub] [arrow-datafusion] alamb commented on issue #723: ABS() function in WHERE clause gives unexpected results
alamb commented on issue #723: URL: https://github.com/apache/arrow-datafusion/issues/723#issuecomment-880076415
[GitHub] [arrow] github-actions[bot] commented on pull request #10720: ARROW-13341: [C++][Compute] Fix race condition in ScalarAggregateNode
github-actions[bot] commented on pull request #10720: URL: https://github.com/apache/arrow/pull/10720#issuecomment-880075978 https://issues.apache.org/jira/browse/ARROW-13341
[GitHub] [arrow] bkietz opened a new pull request #10720: ARROW-13341: [C++][Compute] Fix race condition in ScalarAggregateNode
bkietz opened a new pull request #10720: URL: https://github.com/apache/arrow/pull/10720 Multiple threads starting DoConsume would already have incremented `num_received_`, so if one were delayed another might erroneously begin to merge/finalize (leaving invalidated states).
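The race described in the PR can be replayed deterministically: if an input is counted before its state has been merged, a delayed thread has already contributed to the count, so another thread can observe the full count and finalize with states still missing. A hedged sketch of both orderings (names like `bump`/`maybe_finalize` are illustrative, not the actual C++ identifiers):

```python
TOTAL = 2  # number of input batches expected

def run(buggy):
    """Replay one fixed interleaving: thread A is preempted after its
    first step while thread B runs to completion, then A resumes."""
    states, received, finalized = [], 0, None

    def consume(name):
        states.append(name)            # merge this thread's state

    def bump():
        nonlocal received
        received += 1                  # count one received input

    def maybe_finalize():
        nonlocal finalized
        if received == TOTAL and finalized is None:
            finalized = list(states)   # merge/finalize aggregation

    if buggy:
        # Buggy ordering: count at entry to DoConsume, consume, check.
        A = [bump, lambda: consume("A"), maybe_finalize]
        B = [bump, lambda: consume("B"), maybe_finalize]
    else:
        # Fixed ordering: only count an input after its state is merged.
        A = [lambda: consume("A"), bump, maybe_finalize]
        B = [lambda: consume("B"), bump, maybe_finalize]

    A[0]()                     # A runs one step, then is delayed
    for step in B:             # B runs to completion in the gap
        step()
    for step in A[1:]:         # A resumes
        step()
    return finalized

buggy_result = run(buggy=True)    # finalizes with batch A missing
fixed_result = run(buggy=False)   # finalizes only once both are merged
```

In the buggy ordering, B's check sees `received == TOTAL` (A already bumped the counter) and finalizes with only B's state merged, which is exactly the premature merge/finalize the PR fixes.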
[GitHub] [arrow-datafusion] jgoday commented on pull request #687: #554: Lead/lag window function with offset and default value arguments
jgoday commented on pull request #687: URL: https://github.com/apache/arrow-datafusion/pull/687#issuecomment-880075811 > > @Jimexist do you think this PR is ready? Do you need help reviewing ? > > looks okay after rebasing. Hi @Jimexist, just made the rebase from master
[GitHub] [arrow-datafusion] lvheyang commented on issue #723: ABS() function in WHERE clause gives unexpected results
lvheyang commented on issue #723: URL: https://github.com/apache/arrow-datafusion/issues/723#issuecomment-880059465 @alamb I think the root problem is that some of the scalar functions (such as abs/sin/cos/pow) are not monotonic. We cannot prune all the rows just because `fun(min) < Value` or `fun(max) > Value` is false. Do you have any idea how we should deal with this situation? A possible way is to maintain a map that marks each scalar function's monotonicity. If a function is not monotonic, we would stop building the pruning predicate. Do you think that is acceptable?
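The proposal above — consult a monotonicity table and refuse to build a pruning predicate for non-monotonic functions — could look roughly like this sketch (hypothetical names and table, not DataFusion's actual API):

```python
import math

# Hypothetical monotonicity table for scalar functions.
MONOTONIC = {"sqrt": True, "exp": True, "abs": False, "sin": False}

def can_prune(fun_name, fun, col_min, col_max, value):
    """Decide whether a row group whose column spans [col_min, col_max]
    can be skipped for the predicate fun(col) = value.

    For a monotonic fun, fun(col) is bounded by fun(col_min) and
    fun(col_max), so the group is safely prunable when value falls
    outside that interval.  For a non-monotonic fun no such bound
    exists, so we never prune.
    """
    if not MONOTONIC.get(fun_name, False):
        return False  # unknown or non-monotonic: keep the row group
    lo, hi = sorted((fun(col_min), fun(col_max)))
    return not (lo <= value <= hi)
```

For example, `can_prune("abs", abs, -10, 5, 3)` must return False: `3` lies outside the naive interval `[abs(5), abs(-10)] = [5, 10]`, yet `abs(-3) = 3` for a value inside the column range, which is precisely the unsound pruning this issue reports.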
[GitHub] [arrow] anthonylouisbsb commented on a change in pull request #10711: ARROW-13322: [C++][Gandiva] Add from_unixtime hive function to gandiva
anthonylouisbsb commented on a change in pull request #10711: URL: https://github.com/apache/arrow/pull/10711#discussion_r669769114 ## File path: cpp/src/gandiva/precompiled/time.cc ## @@ -841,6 +843,161 @@ gdv_int64 castBIGINT_daytimeinterval(gdv_day_time_interval in) { extractDay_daytimeinterval(in) * MILLIS_IN_DAY; } +FORCE_INLINE +const char* from_unixtime_int64(gdv_int64 context, gdv_timestamp in, gdv_int32* out_len) { + gdv_int64 year = extractYear_timestamp(in); + gdv_int64 month = extractMonth_timestamp(in); + gdv_int64 day = extractDay_timestamp(in); + gdv_int64 hour = extractHour_timestamp(in); + gdv_int64 minute = extractMinute_timestamp(in); + gdv_int64 second = extractSecond_timestamp(in); + + static const int kTimeStampStringLen = 19; + const int char_buffer_length = kTimeStampStringLen + 1; // snprintf adds \0 + char char_buffer[char_buffer_length]; + + // -MM-dd hh:mm:ss + int res = snprintf(char_buffer, char_buffer_length, + "%04" PRId64 "-%02" PRId64 "-%02" PRId64 " %02" PRId64 ":%02" PRId64 + ":%02" PRId64, + year, month, day, hour, minute, second); + if (res < 0) { +gdv_fn_context_set_error_msg(context, "Could not format the timestamp"); +*out_len = 0; +return ""; + } + + *out_len = kTimeStampStringLen; + + if (*out_len <= 0) { +if (*out_len < 0) { + gdv_fn_context_set_error_msg(context, "Length of output string cannot be negative"); +} +*out_len = 0; +return ""; + } + + char* ret = reinterpret_cast(gdv_fn_context_arena_malloc(context, *out_len)); + if (ret == nullptr) { +gdv_fn_context_set_error_msg(context, "Could not allocate memory for output string"); +*out_len = 0; +return ""; + } + + memcpy(ret, char_buffer, *out_len); + return ret; +} + +FORCE_INLINE +const char* from_unixtime_int64_utf8(gdv_int64 context, gdv_timestamp in, + const char* pattern, gdv_int32 pattern_len, + gdv_int32* out_len) { + gdv_int64 year = extractYear_timestamp(in); + gdv_int64 month = extractMonth_timestamp(in); + gdv_int64 day = extractDay_timestamp(in); + 
gdv_int64 hour = extractHour_timestamp(in); + gdv_int64 minute = extractMinute_timestamp(in); + gdv_int64 second = extractSecond_timestamp(in); + gdv_int64 millis = in % MILLIS_IN_SEC; + + if (pattern_len <= 0) { +gdv_fn_context_set_error_msg(context, "Invalid allowed pattern size"); +*out_len = 0; +return ""; + } + + static const int kTimeStampStringLen = pattern_len; + const int char_buffer_length = kTimeStampStringLen + 1; // snprintf adds \0 + char char_buffer[char_buffer_length]; + + const char* regex_format = + "y{4}(-[M]{2})?+.*?(-[d]{2})?+.*?( [h]{2})?+.*?" + "(:[mm]{2})?+.*?(:[s]{2})?+.*?(.[s]{3})?+.*?"; + bool match = std::regex_match(pattern, std::regex(regex_format)); + + if (!match) { +gdv_fn_context_set_error_msg(context, "Invalid allowed pattern"); +*out_len = 0; +return ""; + } + + // length from pattern + int res = 0; + + switch (pattern_len) { +// +case 4: + res = snprintf(char_buffer, char_buffer_length, "%04" PRId64, year); + break; +// -MM +case 7: + res = snprintf(char_buffer, char_buffer_length, "%04" PRId64 "-%02" PRId64, year, + month); + break; +// -MM-dd +case 10: Review comment: What happens if the user defines a pattern like this: `dd-MM-`, the method will process it correctly? 
## File path: cpp/src/gandiva/precompiled/time.cc ## @@ -841,6 +843,161 @@ gdv_int64 castBIGINT_daytimeinterval(gdv_day_time_interval in) { extractDay_daytimeinterval(in) * MILLIS_IN_DAY; } +FORCE_INLINE +const char* from_unixtime_int64(gdv_int64 context, gdv_timestamp in, gdv_int32* out_len) { + gdv_int64 year = extractYear_timestamp(in); + gdv_int64 month = extractMonth_timestamp(in); + gdv_int64 day = extractDay_timestamp(in); + gdv_int64 hour = extractHour_timestamp(in); + gdv_int64 minute = extractMinute_timestamp(in); + gdv_int64 second = extractSecond_timestamp(in); + + static const int kTimeStampStringLen = 19; + const int char_buffer_length = kTimeStampStringLen + 1; // snprintf adds \0 + char char_buffer[char_buffer_length]; + + // -MM-dd hh:mm:ss + int res = snprintf(char_buffer, char_buffer_length, + "%04" PRId64 "-%02" PRId64 "-%02" PRId64 " %02" PRId64 ":%02" PRId64 + ":%02" PRId64, + year, month, day, hour, minute, second); + if (res < 0) { +gdv_fn_context_set_error_msg(context, "Could not format the timestamp"); +*out_len = 0; +return ""; + } + + *out_len = kTimeStampStringLen; + + if (*out_len <= 0) { +if (*out_len < 0) { + gdv_fn_context_set_error_msg(context, "Length of output string cannot be negative"); +} +
[GitHub] [arrow-rs] Jimexist commented on pull request #552: use sort_unstable_by in primitive sorting
Jimexist commented on pull request #552: URL: https://github.com/apache/arrow-rs/pull/552#issuecomment-880037932
```
sort 2^10                  time:   [110.68 us 111.64 us 112.55 us]
                           change: [-14.710% -13.112% -11.406%] (p = 0.00 < 0.05)
                           Performance has improved.
Found 3 outliers among 100 measurements (3.00%): 2 (2.00%) high mild, 1 (1.00%) high severe

sort 2^12                  time:   [516.32 us 519.97 us 523.58 us]
                           change: [-10.932% -9.5913% -8.2390%] (p = 0.00 < 0.05)
                           Performance has improved.
Found 3 outliers among 100 measurements (3.00%): 2 (2.00%) high mild, 1 (1.00%) high severe

sort nulls 2^10            time:   [88.568 us 89.197 us 89.907 us]
                           change: [-20.378% -18.966% -17.484%] (p = 0.00 < 0.05)
                           Performance has improved.
Found 4 outliers among 100 measurements (4.00%): 4 (4.00%) high mild

sort nulls 2^12            time:   [372.34 us 375.56 us 378.70 us]
                           change: [-25.293% -24.019% -22.641%] (p = 0.00 < 0.05)
                           Performance has improved.
Found 7 outliers among 100 measurements (7.00%): 2 (2.00%) low mild, 1 (1.00%) high mild, 4 (4.00%) high severe

bool sort 2^12             time:   [184.53 us 185.82 us 187.17 us]
                           change: [-60.022% -59.238% -58.451%] (p = 0.00 < 0.05)
                           Performance has improved.
Found 6 outliers among 100 measurements (6.00%): 3 (3.00%) high mild, 3 (3.00%) high severe

bool sort nulls 2^12       time:   [210.24 us 213.40 us 216.47 us]
                           change: [-54.063% -53.122% -52.165%] (p = 0.00 < 0.05)
                           Performance has improved.
Found 3 outliers among 100 measurements (3.00%): 2 (2.00%) high mild, 1 (1.00%) high severe

sort 2^12 limit 10         time:   [61.881 us 62.244 us 62.614 us]
                           change: [-8.6258% -6.5015% -4.3522%] (p = 0.00 < 0.05)
                           Performance has improved.
Found 8 outliers among 100 measurements (8.00%): 3 (3.00%) high mild, 5 (5.00%) high severe

sort 2^12 limit 100        time:   [67.314 us 67.729 us 68.205 us]
                           change: [-5.5755% -3.5509% -1.4872%] (p = 0.00 < 0.05)
                           Performance has improved.
Found 8 outliers among 100 measurements (8.00%): 5 (5.00%) high mild, 3 (3.00%) high severe

sort 2^12 limit 1000       time:   [178.07 us 179.37 us 180.82 us]
                           change: [-2.1034% +0.0621% +2.3026%] (p = 0.96 > 0.05)
                           No change in performance detected.
Found 13 outliers among 100 measurements (13.00%): 3 (3.00%) high mild, 10 (10.00%) high severe

sort 2^12 limit 2^12       time:   [459.51 us 462.14 us 464.90 us]
                           change: [-25.238% -23.036% -21.081%] (p = 0.00 < 0.05)
                           Performance has improved.
Found 6 outliers among 100 measurements (6.00%): 3 (3.00%) high mild, 3 (3.00%) high severe

sort nulls 2^12 limit 10   time:   [117.28 us 118.29 us 119.29 us]
                           change: [+7.7135% +9.3966% +11.186%] (p = 0.00 < 0.05)
                           Performance has regressed.
Found 13 outliers among 100 measurements (13.00%): 6 (6.00%) low mild, 2 (2.00%) high mild, 5 (5.00%) high severe

sort nulls 2^12 limit 100  time:   [106.53 us 107.89 us 109.50 us]
                           change: [-6.4134% -4.0486% -1.6589%] (p = 0.00 < 0.05)
                           Performance has improved.
Found 4 outliers among 100 measurements (4.00%): 3 (3.00%) high mild, 1 (1.00%) high severe

sort nulls 2^12 limit 1000 time:   [113.71 us 114.30 us 114.92 us]
                           change: [-5.9266% -4.5141% -3.0736%] (p = 0.00 < 0.05)
                           Performance has improved.
Found 7 outliers among 100 measurements (7.00%): 5 (5.00%) high mild, 2 (2.00%) high severe

sort nulls 2^12 limit 2^12 time:   [336.84 us 340.39 us 344.08 us]
                           change: [-31.546% -30.224% -28.822%] (p = 0.00 < 0.05)
                           Performance has improved.
Found 5 outliers among 100 measurements (5.00%): 3 (3.00%) high mild, 2 (2.00%) high severe
```
[GitHub] [arrow-rs] codecov-commenter commented on pull request #552: use sort_unstable_by in primitive sorting
codecov-commenter commented on pull request #552: URL: https://github.com/apache/arrow-rs/pull/552#issuecomment-880036383 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/552) Report > Merging [#552](https://codecov.io/gh/apache/arrow-rs/pull/552) (8418f1c) into [master](https://codecov.io/gh/apache/arrow-rs/commit/fc78af6324513cc3da9fea8c80658d85dfcd8263) (fc78af6) will **not change** coverage. > The diff coverage is `90.90%`.
```diff
@@           Coverage Diff           @@
##           master     #552   +/-   ##
=======================================
  Coverage   82.47%   82.47%
=======================================
  Files         167      167
  Lines       46142    46142
=======================================
  Hits        38056    38056
  Misses       8086     8086
```
| Impacted Files | Coverage Δ |
|---|---|
| [arrow/src/compute/kernels/sort.rs](https://codecov.io/gh/apache/arrow-rs/pull/552/diff#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9zb3J0LnJz) | `94.15% <90.90%> (ø)` |

> **Legend**: `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` Last update [fc78af6...8418f1c](https://codecov.io/gh/apache/arrow-rs/pull/552)
[GitHub] [arrow] pachadotdev commented on a change in pull request #10650: ARROW-13058: This is a draft to provide save-report
pachadotdev commented on a change in pull request #10650: URL: https://github.com/apache/arrow/pull/10650#discussion_r669741245

## File path: dev/archery/archery/crossbow/cli.py

```diff
@@ -233,6 +233,27 @@ def latest_prefix(obj, prefix, fetch):
     click.echo(latest.branch)


+@crossbow.command()
+@click.argument('job-name', required=True)
+@click.pass_obj
+def save_report_data(obj, job_name):
+    """
+    Just print there the state of the job
+    """
+    output = obj['output']
+
+    queue = obj['queue']
+    print(dir(queue))
```

Review comment: not really

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-rs] Jimexist commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations
Jimexist commented on issue #527: URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-880028583 Anyone taking this? Also, is there any reference implementation in other languages? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] lvheyang edited a comment on issue #723: ABS() function in WHERE clause gives unexpected results
lvheyang edited a comment on issue #723: URL: https://github.com/apache/arrow-datafusion/issues/723#issuecomment-880027602 I have reproduced this problem. The problem is in datafusion/src/physical_optimizer/pruning.rs, in `PruningPredicate`. Its comment says:

```
/// A pruning predicate is one that has been rewritten in terms of
/// the min and max values of column references and that evaluates
/// to FALSE if the filter predicate would evaluate FALSE *for
/// every row* whose values fell within the min / max ranges (aka
/// could be pruned).
```

It emphasizes that if the pruning predicate evaluates to FALSE, then the filter must evaluate FALSE for **every row**. In this case, the pruning predicate was built as `abs(c0_min - 251.10794896957802) < 1`, which unfortunately is not correct: c0_min is 107.0090813093981, so the pruning predicate evaluates False, but for the row with value 251.10794896957802 the filter evaluates True. This violates the rule in the comment, so the whole row group was omitted and there is no result when compared with 1 or 111. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
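lvheyang's analysis is easy to check outside DataFusion with a toy model. In the sketch below (plain Python with hypothetical names, not DataFusion code), the rewritten min-based predicate says the row group can be pruned even though a row inside the group matches the original filter:

```python
# Toy model of the unsound rewrite lvheyang describes. The names
# row_predicate / bad_pruning_predicate are illustrative, not DataFusion APIs.

V = 251.10794896957802  # the constant from the query's WHERE clause

def row_predicate(c0):
    # The original per-row filter: ABS(c0 - V) < 1
    return abs(c0 - V) < 1

def bad_pruning_predicate(c0_min):
    # The incorrect rewrite in terms of the row group's min statistic.
    return abs(c0_min - V) < 1

c0_min = 107.0090813093981
group_rows = [c0_min, V]  # both values lie within the group's min/max range

# The pruning predicate evaluates False, so the whole group is skipped...
assert bad_pruning_predicate(c0_min) == False
# ...even though a row in the group satisfies the filter, losing results.
assert row_predicate(V) == True
```

A sound pruning predicate for `ABS(c0 - V) < 1` would have to be false only when *every* value in `[min, max]` fails the filter, e.g. something along the lines of `not (c0_max < V - 1 or c0_min > V + 1)`.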
[GitHub] [arrow] ursabot edited a comment on pull request #10608: ARROW-13136: [C++] Add coalesce function
ursabot edited a comment on pull request #10608: URL: https://github.com/apache/arrow/pull/10608#issuecomment-879986122 Benchmark runs are scheduled for baseline = 9c6d4179fefdf995fd0b940a292b81947fe68035 and contender = e32cf48c8f5f38ed5bbf69eb5d2ea8eda43d2b98. Results will be available as each benchmark for each run completes.

Conbench compare runs links:

- [Skipped :warning: Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/d391a5e50d69453b8c1648f74e28b5e1...0b2618a2ff3f4c8e879c59938408b00f/)
- [Skipped :warning: Only ['Python', 'R'] langs are supported on ursa-i9-9960x] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/158e7a4062ba424aa32aab6410f2675f...354ae9f639ba47928a0c6e4c3781d22c/)
- [Finished :arrow_down:0.53% :arrow_up:0.05%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/4df834022c414dd39486a4bbca516589...f2036834887e4b4b8c6ef3cf6b65c8f6/)

Supported benchmarks:

- ursa-i9-9960x: langs = Python, R
- ursa-thinkcentre-m75q: langs = C++, Java
- ec2-t3-xlarge-us-east-2: cloud = True

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-rs] Jimexist opened a new issue #553: use sort_unstable_by in primitive sorting
Jimexist opened a new issue #553: URL: https://github.com/apache/arrow-rs/issues/553

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

Use `sort_unstable_by` in primitive sorting so that we can have:

1. better memory footprint
2. faster sort
3. consistent stableness behavior with and without `limit`

**Describe the solution you'd like**

Use `sort_unstable_by` in primitive sorting.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
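For context on the "stableness" point: a stable sort preserves the input order of elements that compare equal, while an unstable sort (such as Rust's `sort_unstable_by`) is free to reorder them. A small Python illustration (Python's `sorted` is always stable):

```python
# Elements with equal keys keep their original relative order under a
# stable sort; an unstable sort would be free to swap the tied pairs.
pairs = [("b", 1), ("a", 2), ("b", 0), ("a", 1)]
stable = sorted(pairs, key=lambda p: p[0])

# ("a", 2) stays before ("a", 1), and ("b", 1) before ("b", 0):
assert stable == [("a", 2), ("a", 1), ("b", 1), ("b", 0)]
```

This is why switching a kernel from a stable to an unstable sort can change observable output whenever keys compare equal, and why the issue asks for the behavior to at least be consistent with and without `limit`.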
[GitHub] [arrow-rs] Jimexist opened a new pull request #552: use sort_unstable_by in primitive sorting
Jimexist opened a new pull request #552: URL: https://github.com/apache/arrow-rs/pull/552 # Which issue does this PR close? use [`sort_unstable_by`](https://doc.rust-lang.org/std/primitive.slice.html#method.sort_unstable_by) in primitive sorting Closes #. # Rationale for this change 1. less memory usage 2. likely faster # What changes are included in this PR? # Are there any user-facing changes? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs edited a comment on pull request #10650: ARROW-13058: This is a draft to provide save-report
kszucs edited a comment on pull request #10650: URL: https://github.com/apache/arrow/pull/10650#issuecomment-880020349

> nevermind... a spaces problem, which led to
>
> ```
> ___ ERROR collecting archery/crossbow/tests/test_reports.py
> archery/crossbow/tests/test_reports.py:20: in <module>
>     from archery.crossbow.core import yaml
> archery/crossbow/__init__.py:19: in <module>
>     from .reports import CommentReport, ConsoleReport, EmailReport  # noqa
> archery/crossbow/reports.py:124: in <module>
>     class JsonReport(Report):
> archery/crossbow/reports.py:175: in JsonReport
>     self.getJsonTasks()
> E   NameError: name 'self' is not defined
> ```

`self` is available in the method's body, but it seems like that line is outside of the method. Could you try to indent that line to align with the rest of the method's body?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on pull request #10650: ARROW-13058: This is a draft to provide save-report
kszucs commented on pull request #10650: URL: https://github.com/apache/arrow/pull/10650#issuecomment-880018883

> I got stuck with this
>
> ```
> archery/crossbow/tests/test_reports.py:20: in <module>
>     from archery.crossbow.core import yaml
> archery/crossbow/__init__.py:19: in <module>
>     from .reports import CommentReport, ConsoleReport, EmailReport  # noqa
> E   File "/home/pacha/github/arrow/dev/archery/archery/crossbow/reports.py", line 175
> E     self.getJsonTasks()
> E     ^
> E   IndentationError: unindent does not match any outer indentation level
> ```

This is usually due to an extra or missing whitespace at the beginning of the line. Python requires consistent indentation.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
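The error class kszucs describes can be reproduced in isolation; compiling a source string with a mismatched dedent raises the same `IndentationError` (an illustrative snippet, not the archery code):

```python
# The third line dedents to two spaces, a level that no enclosing
# block ever opened -- exactly the "unindent does not match any outer
# indentation level" failure mode.
bad_source = (
    "def f():\n"
    "    x = 1\n"
    "  return x\n"
)

caught = False
try:
    compile(bad_source, "<example>", "exec")
except IndentationError:
    caught = True
assert caught
```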
[GitHub] [arrow] rok commented on pull request #10476: ARROW-12499: [C++][Compute] Add ScalarAggregateOptions to Any and All kernels
rok commented on pull request #10476: URL: https://github.com/apache/arrow/pull/10476#issuecomment-880017358 Thanks all! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jonkeane closed pull request #10476: ARROW-12499: [C++][Compute] Add ScalarAggregateOptions to Any and All kernels
jonkeane closed pull request #10476: URL: https://github.com/apache/arrow/pull/10476 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] Dandandan commented on issue #725: Global limit isn't really limiting parquet file reads and stops earlier
Dandandan commented on issue #725: URL: https://github.com/apache/arrow-datafusion/issues/725#issuecomment-880011030 I added some form of limit push down to parquet some time ago. Might be that it isn't applied to your dataset somehow? Or maybe getting the metadata / statistics itself might be slow? https://github.com/apache/arrow/pull/9672 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
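For reference, limit push-down amounts to the scan stopping as soon as it has produced enough rows, instead of reading everything and truncating at the end. A toy sketch of the idea (plain Python generator, not DataFusion's reader API):

```python
# Hypothetical scan: `batches` stands in for record batches read from a
# parquet file; with the limit pushed down, later batches are never read.
def scan_with_limit(batches, limit):
    produced = 0
    for batch in batches:
        if produced >= limit:
            break  # push-down: stop requesting more data
        take = min(limit - produced, len(batch))
        yield batch[:take]
        produced += take

batches = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
out = list(scan_with_limit(batches, limit=5))
assert out == [[1, 2, 3], [4, 5]]  # the third batch is never touched
```

If the limit isn't being applied, the symptom Jimexist reports (no speedup even with a small batch size) is what you'd expect: the reader walks every batch anyway.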
[GitHub] [arrow] kszucs commented on pull request #10659: ARROW-12122: [Python] Cannot install via pip M1 mac
kszucs commented on pull request #10659: URL: https://github.com/apache/arrow/pull/10659#issuecomment-880010837 @wesm @xhochy could you please verify locally the produced wheels?

- [pyarrow-5.0.0.dev471-cp39-cp39-macosx_11_0_arm64.whl](https://github.com/ursacomputing/crossbow/releases/download/actions-601-github-wheel-macos-big-sur-cp39-arm64/pyarrow-5.0.0.dev471-cp39-cp39-macosx_11_0_arm64.whl)
- [pyarrow-5.0.0.dev471-cp39-cp39-macosx_11_0_universal2.whl](https://github.com/ursacomputing/crossbow/releases/download/actions-601-github-wheel-macos-big-sur-cp39-universal2/pyarrow-5.0.0.dev471-cp39-cp39-macosx_11_0_universal2.whl)

There is a universal2 installer for Python 3.9: https://www.python.org/ftp/python/3.9.6/python-3.9.6-macos11.pkg, which can be installed using:

```bash
arrow/ci/scripts/install_python.sh macos 3.9
```

Verify the `arm64` wheel:

```bash
rm -rf arrow/python/dist
export ARROW_FLIGHT=OFF
wget -P arrow/python/dist https://github.com/ursacomputing/crossbow/releases/download/actions-601-github-wheel-macos-big-sur-cp39-arm64/pyarrow-5.0.0.dev471-cp39-cp39-macosx_11_0_arm64.whl
arch -arm64 arrow/ci/scripts/python_wheel_macos_test.sh
```

Verify the `universal2` wheel:

```bash
rm -rf arrow/python/dist
export ARROW_FLIGHT=OFF
wget -P arrow/python/dist https://github.com/ursacomputing/crossbow/releases/download/actions-601-github-wheel-macos-big-sur-cp39-universal2/pyarrow-5.0.0.dev471-cp39-cp39-macosx_11_0_universal2.whl
arch -arm64 arrow/ci/scripts/python_wheel_macos_test.sh
arch -x86_64 arrow/ci/scripts/python_wheel_macos_test.sh
```

(have not tested the snippets, the paths might be wrong, but the test script looks for wheels under `arrow/python/dist`)

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pachadotdev commented on pull request #10650: ARROW-13058: This is a draft to provide save-report
pachadotdev commented on pull request #10650: URL: https://github.com/apache/arrow/pull/10650#issuecomment-880009474 nevermind... a spaces problem, which led to

```
___ ERROR collecting archery/crossbow/tests/test_reports.py
archery/crossbow/tests/test_reports.py:20: in <module>
    from archery.crossbow.core import yaml
archery/crossbow/__init__.py:19: in <module>
    from .reports import CommentReport, ConsoleReport, EmailReport  # noqa
archery/crossbow/reports.py:124: in <module>
    class JsonReport(Report):
archery/crossbow/reports.py:175: in JsonReport
    self.getJsonTasks()
E   NameError: name 'self' is not defined
```

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pachadotdev commented on pull request #10650: ARROW-13058: This is a draft to provide save-report
pachadotdev commented on pull request #10650: URL: https://github.com/apache/arrow/pull/10650#issuecomment-880008583 I got stuck with this

```
archery/crossbow/tests/test_reports.py:20: in <module>
    from archery.crossbow.core import yaml
archery/crossbow/__init__.py:19: in <module>
    from .reports import CommentReport, ConsoleReport, EmailReport  # noqa
E   File "/home/pacha/github/arrow/dev/archery/archery/crossbow/reports.py", line 175
E     self.getJsonTasks()
E     ^
E   IndentationError: unindent does not match any outer indentation level
```

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] lidavidm commented on pull request #10412: ARROW-9430: [C++] Implement replace_with_mask kernel
lidavidm commented on pull request #10412: URL: https://github.com/apache/arrow/pull/10412#issuecomment-880001407 Merged, thanks. This should unblock ARROW-9431 if you do still plan to look at it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] lidavidm closed pull request #10412: ARROW-9430: [C++] Implement replace_with_mask kernel
lidavidm closed pull request #10412: URL: https://github.com/apache/arrow/pull/10412 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-rs] MichaelBitard commented on issue #349: parquet reading hangs when row_group contains more than 2048 rows of data
MichaelBitard commented on issue #349: URL: https://github.com/apache/arrow-rs/issues/349#issuecomment-87541 Oops, you are right, sorry. If I generate the sample.parquet with the latest version, it no longer hangs during reading. Thanks for noticing and sorry again! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10412: ARROW-9430: [C++] Implement replace_with_mask kernel
nirandaperera commented on pull request #10412: URL: https://github.com/apache/arrow/pull/10412#issuecomment-879991878 > I think I've addressed all the feedback here. I'm +1 for this! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on a change in pull request #10650: ARROW-13058: This is a draft to provide save-report
kszucs commented on a change in pull request #10650: URL: https://github.com/apache/arrow/pull/10650#discussion_r669719045

## File path: dev/archery/archery/crossbow/reports.py

```diff
@@ -121,6 +121,61 @@ def show(self, outstream, asset_callback=None):
                                             asset))


+class JsonReport(Report):
```

Review comment: You can run the unit tests with:

```bash
cd arrow/dev/archery
pip install -e .[crossbow]  # install archery in develop/editable mode
pytest -sv archery/crossbow/tests/test_reports.py  # execute specific archery unittests
```

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] ursabot commented on pull request #10608: ARROW-13136: [C++] Add coalesce function
ursabot commented on pull request #10608: URL: https://github.com/apache/arrow/pull/10608#issuecomment-879986122 Benchmark runs are scheduled for baseline = 9c6d4179fefdf995fd0b940a292b81947fe68035 and contender = e32cf48c8f5f38ed5bbf69eb5d2ea8eda43d2b98. Results will be available as each benchmark for each run completes.

Conbench compare runs links:

- [Skipped :warning: Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/d391a5e50d69453b8c1648f74e28b5e1...0b2618a2ff3f4c8e879c59938408b00f/)
- [Skipped :warning: Only ['Python', 'R'] langs are supported on ursa-i9-9960x] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/158e7a4062ba424aa32aab6410f2675f...354ae9f639ba47928a0c6e4c3781d22c/)
- [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/4df834022c414dd39486a4bbca516589...f2036834887e4b4b8c6ef3cf6b65c8f6/)

Supported benchmarks:

- ursa-i9-9960x: langs = Python, R
- ursa-thinkcentre-m75q: langs = C++, Java
- ec2-t3-xlarge-us-east-2: cloud = True

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] lidavidm commented on pull request #10608: ARROW-13136: [C++] Add coalesce function
lidavidm commented on pull request #10608: URL: https://github.com/apache/arrow/pull/10608#issuecomment-879985810 @ursabot please benchmark lang=C++ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] lidavidm commented on pull request #10412: ARROW-9430: [C++] Implement replace_with_mask kernel
lidavidm commented on pull request #10412: URL: https://github.com/apache/arrow/pull/10412#issuecomment-879985272 I think I've addressed all the feedback here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on a change in pull request #10650: ARROW-13058: This is a draft to provide save-report
kszucs commented on a change in pull request #10650: URL: https://github.com/apache/arrow/pull/10650#discussion_r669711765

## File path: dev/archery/archery/crossbow/reports.py

```diff
@@ -121,6 +121,61 @@ def show(self, outstream, asset_callback=None):
                                             asset))


+class JsonReport(Report):
+
+    HEADER = textwrap.dedent("""
+        Arrow Build Report for Job {job_name}
+
+        All tasks: {all_tasks_url}
+    """)
+
+    TASK = textwrap.dedent("""
+          - {name}:
+            URL: {url}
+    """).strip()
+
+    STATUS_HEADERS = {
+        # from CombinedStatus
+        'error': 'Errored Tasks:',
+        'failure': 'Failed Tasks:',
+        'pending': 'Pending Tasks:',
+        'success': 'Succeeded Tasks:',
+    }
+
+    def __init__(self, job):
+        super().__init__(job)
+
+    def url(self, query):
+        repo_url = self.job.queue.remote_url.strip('.git')
+        return '{}/branches/all?query={}'.format(repo_url, query)
+
+    def todayStr(self):
+        date = datetime.utcnow()
+        return "{}-{}-{}".format(date.year, date.month, date.day)
+
+    def tasksToDict(self, date, tasks):
+        jsonTasks = []
+        for task_name, task in tasks.items():
+            jsonTasks.append({
```

Review comment: Do we plan to store the log as well or just the metadata about the build?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] lidavidm commented on a change in pull request #10663: ARROW-13253: [FlightRPC][C++] Fix segfault with large messages
lidavidm commented on a change in pull request #10663: URL: https://github.com/apache/arrow/pull/10663#discussion_r669705163

## File path: cpp/src/arrow/flight/serialization_internal.cc

```diff
@@ -201,9 +193,7 @@ grpc::Status FlightDataSerialize(const FlightPayload& msg, ByteBuffer* out,
   // Write the descriptor if present
   int32_t descriptor_size = 0;
   if (msg.descriptor != nullptr) {
-    if (msg.descriptor->size() > kInt32Max) {
-      return ToGrpcStatus(Status::CapacityError("Descriptor size overflow (>= 2**31)"));
```

Review comment: See https://github.com/grpc/grpc/issues/26695

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] thisisnic commented on a change in pull request #10624: ARROW-12992: [R] bindings for substr(), substring(), str_sub()
thisisnic commented on a change in pull request #10624: URL: https://github.com/apache/arrow/pull/10624#discussion_r669142885

## File path: r/src/compute.cpp

```diff
@@ -316,6 +316,19 @@ std::shared_ptr make_compute_options(
     return std::make_shared<Options>(max_splits, reverse);
   }
+  if (func_name == "utf8_slice_codeunits") {
+    using Options = arrow::compute::SliceOptions;
+    int64_t start = 1;
+    int64_t stop = -1;
```

Review comment: Sorry, I didn't explain this properly, my fault! What I mean is that here `stop` has been set to `-1`, but the default C++ value isn't `-1`, so the default value here probably shouldn't be `-1` either. Check out this line of code that shows the default value of `stop` in the C++: https://github.com/apache/arrow/blob/7114c4b6fdb639b3500d77cfd66649af8c5c5e6b/cpp/src/arrow/compute/api_scalar.h#L205-L206 The default value of `stop` is `std::numeric_limits<int64_t>::max()`, which is the biggest int64. So if you don't supply a value for `stop` and just use the default, this allows you to slice to the end of the string. In some of the other functions in this file, this line has been used to set the default values to the ones from the C++ code: `std::make_shared<Options>(Options::Defaults());` Maybe instead of manually setting the value of `stop` to `-1`, if you use the line above, it might make it easier to fix some of the failing tests as now you'll be able to slice to the end of a string. If this doesn't make sense, let me know!

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
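The distinction thisisnic draws here, a `-1` sentinel versus an "effectively infinite" default, can be seen with ordinary Python slicing (an analogy only; Arrow's `SliceOptions` indexing differs in details such as being codeunit-based):

```python
s = "arrow"

# A stop of -1 does NOT mean "to the end": it excludes the last character.
assert s[0:-1] == "arro"

# A huge stop (analogous to int64 max in SliceOptions) slices to the end.
assert s[0:2**63 - 1] == "arrow"
```

That is why defaulting `stop` to the type's maximum, rather than to a sentinel like `-1`, is what makes "slice to the end of the string" work when the caller omits `stop`.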
[GitHub] [arrow-datafusion] Jimexist opened a new issue #725: Global limit isn't really limiting parquet file reads and stops earlier
Jimexist opened a new issue #725: URL: https://github.com/apache/arrow-datafusion/issues/725

**Describe the bug**

When given a global limit:

```sql
select * from some_large_data limit 50;
```

even with `-c` set to 100, i.e. a small batch size, the global limit isn't really reducing the running time.

**To Reproduce**

This is easily reproducible given a large data set.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on a change in pull request #10650: ARROW-13058: This is a draft to provide save-report
kszucs commented on a change in pull request #10650: URL: https://github.com/apache/arrow/pull/10650#discussion_r669699660 ## File path: dev/archery/archery/crossbow/reports.py ## @@ -121,6 +121,61 @@ def show(self, outstream, asset_callback=None): asset)) +class JsonReport(Report): Review comment: We don't have many tests yet for crossbow, but this looks like one we could actually test. You can [reuse the test case including the fixture](https://github.com/apache/arrow/blob/master/dev/archery/archery/crossbow/tests/test_reports.py#L24-L35) from `crossbow/test_reports.py`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org