[GitHub] [arrow-rs] velvia commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations

2021-07-14 Thread GitBox


velvia commented on issue #527:
URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-880418323


   This is definitely a good feature.
   
   It seems to me that in most cases, time/duration manipulations could be 
built on the existing arithmetic kernels; that'd be one way to go about it, 
e.g. some combination of cast(date32 -> i32) -> math add kernel -> cast back.
   
   Or at least, leveraging existing math kernels would make a ton of sense.
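   To make the cast-based route concrete, here is a minimal sketch on plain values (an illustration only: date32 is modeled as an i32 count of days since the UNIX epoch, and none of these helper names come from arrow-rs):

```rust
// Hypothetical sketch of "cast(date32 -> i32) -> add kernel -> cast back".
// A date32 value is an i32 count of days since 1970-01-01, so the casts are
// free reinterpretations and the middle step is ordinary integer addition.
fn date32_to_i32(d: i32) -> i32 {
    d // cast: a no-op reinterpretation
}

fn i32_to_date32(v: i32) -> i32 {
    v // cast back: also a no-op
}

fn add_duration_days(date: i32, days: i32) -> i32 {
    // This is where an existing integer "add" kernel would be reused.
    i32_to_date32(date32_to_i32(date) + days)
}

fn main() {
    assert_eq!(add_duration_days(18_993, 7), 19_000);
    assert_eq!(add_duration_days(0, -1), -1); // one day before the epoch
}
```

   A real kernel would apply this per element across the whole array, inheriting the arithmetic kernel's null handling for free.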


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow-datafusion] Jimexist opened a new pull request #729: provide more details on required .parquet file extension

2021-07-14 Thread GitBox


Jimexist opened a new pull request #729:
URL: https://github.com/apache/arrow-datafusion/pull/729


   # Which issue does this PR close?
   
   provide more details on required .parquet file extension
   
   Closes #.
   
# Rationale for this change
   
   provide more details on required .parquet file extension
   
   # What changes are included in this PR?
   
   
   # Are there any user-facing changes?
   
   
   
   






[GitHub] [arrow-datafusion] Jimexist opened a new pull request #728: implement FromStr for FileType

2021-07-14 Thread GitBox


Jimexist opened a new pull request #728:
URL: https://github.com/apache/arrow-datafusion/pull/728


   # Which issue does this PR close?
   
   implement FromStr for FileType
   
   Closes #.
   
# Rationale for this change
   
   implement FromStr for FileType so that it can accept lowercase input and 
thus be more flexible
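
   A minimal sketch of such a case-insensitive `FromStr` implementation (the variant names here are illustrative, not necessarily DataFusion's actual `FileType` definition):

```rust
use std::str::FromStr;

// Hypothetical FileType enum; the real DataFusion type may differ.
#[derive(Debug, PartialEq)]
enum FileType {
    Parquet,
    Csv,
    NdJson,
}

impl FromStr for FileType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Normalize to lowercase so "CSV", "Csv", and "csv" all parse.
        match s.to_ascii_lowercase().as_str() {
            "parquet" => Ok(FileType::Parquet),
            "csv" => Ok(FileType::Csv),
            "ndjson" => Ok(FileType::NdJson),
            other => Err(format!("unknown file type: {other}")),
        }
    }
}

fn main() {
    assert_eq!("PARQUET".parse::<FileType>(), Ok(FileType::Parquet));
    assert!("xlsx".parse::<FileType>().is_err());
}
```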
   
   # What changes are included in this PR?
   
   
   # Are there any user-facing changes?
   
   
   
   






[GitHub] [arrow-rs] codecov-commenter commented on pull request #521: Change `nullif` to support arbitrary arrays

2021-07-14 Thread GitBox


codecov-commenter commented on pull request #521:
URL: https://github.com/apache/arrow-rs/pull/521#issuecomment-880382473


   # 
[Codecov](https://codecov.io/gh/apache/arrow-rs/pull/521?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#521](https://codecov.io/gh/apache/arrow-rs/pull/521?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (2cbe13c) into 
[master](https://codecov.io/gh/apache/arrow-rs/commit/f873d77bc77847b95921374aa66ba1d38e9cebf8?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (f873d77) will **increase** coverage by `0.09%`.
   > The diff coverage is `97.51%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/arrow-rs/pull/521/graphs/tree.svg?width=650=150=pr=pq9V9qWZ1N_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/521?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #521      +/-   ##
   ==========================================
   + Coverage   82.47%   82.57%   +0.09%
   ==========================================
     Files         167      167
     Lines       46144    46399     +255
   ==========================================
   + Hits        38059    38315     +256
   + Misses       8085     8084       -1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/arrow-rs/pull/521?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[arrow/src/compute/kernels/boolean.rs](https://codecov.io/gh/apache/arrow-rs/pull/521/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9ib29sZWFuLnJz)
 | `97.53% <97.51%> (+0.76%)` | :arrow_up: |
   | 
[arrow/src/buffer/immutable.rs](https://codecov.io/gh/apache/arrow-rs/pull/521/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2J1ZmZlci9pbW11dGFibGUucnM=)
 | `98.92% <0.00%> (+1.07%)` | :arrow_up: |
   | 
[arrow/src/array/transform/boolean.rs](https://codecov.io/gh/apache/arrow-rs/pull/521/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9ib29sZWFuLnJz)
 | `84.61% <0.00%> (+7.69%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/arrow-rs/pull/521?src=pr=continue_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/arrow-rs/pull/521?src=pr=footer_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation).
 Last update 
[f873d77...2cbe13c](https://codecov.io/gh/apache/arrow-rs/pull/521?src=pr=lastupdated_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation).
 Read the [comment 
docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation).
   






[GitHub] [arrow-rs] bjchambers commented on a change in pull request #521: Change `nullif` to support arbitrary arrays

2021-07-14 Thread GitBox


bjchambers commented on a change in pull request #521:
URL: https://github.com/apache/arrow-rs/pull/521#discussion_r670118830



##
File path: arrow/src/compute/kernels/boolean.rs
##
@@ -458,101 +458,113 @@ pub fn is_not_null(input: &dyn Array) -> Result<BooleanArray> {
 Ok(BooleanArray::from(data))
 }
 
-/// Copies original array, setting null bit to true if a secondary comparison 
boolean array is set to true.
-/// Typically used to implement NULLIF.
-// NOTE: For now this only supports Primitive Arrays.  Although the code could 
be made generic, the issue
-// is that currently the bitmap operations result in a final bitmap which is 
aligned to bit 0, and thus
-// the left array's data needs to be sliced to a new offset, and for 
non-primitive arrays shifting the
-// data might be too complicated.   In the future, to avoid shifting left 
array's data, we could instead
-// shift the final bitbuffer to the right, prepending with 0's instead.
-pub fn nullif<T>(
-    left: &PrimitiveArray<T>,
-    right: &BooleanArray,
-) -> Result<PrimitiveArray<T>>
-where
-    T: ArrowNumericType,
-{
+/// Copies original array, setting null bit to true if a secondary comparison
+/// boolean array is set to true. Typically used to implement NULLIF.
+pub fn nullif(left: &dyn Array, right: &BooleanArray) -> Result<ArrayRef> {
 if left.len() != right.len() {
 return Err(ArrowError::ComputeError(
 "Cannot perform comparison operation on arrays of different length"
 .to_string(),
 ));
 }
 let left_data = left.data();
-let right_data = right.data();
-
-// If left has no bitmap, create a new one with all values set for nullity 
op later
-// left=0 (null)   right=null   output bitmap=null
-// left=0  right=1  output bitmap=null
-// left=1 (set)right=null   output bitmap=set   (passthrough)
-// left=1  right=1 & comp=trueoutput bitmap=null
-// left=1  right=1 & comp=false   output bitmap=set
-//
-// Thus: result = left null bitmap & (!right_values | !right_bitmap)
-//  OR left null bitmap & !(right_values & right_bitmap)
+
+// If right has no trues and no nulls, then it will pass the left array
+// unmodified.
+if right.null_count() == 0
+&& right
+.values()
+.count_set_bits_offset(right.offset(), right.len())
+== 0
+{
+return Ok(left.clone());

Review comment:
   Added.
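
   The truth table in the removed comment above boils down to one bitwise identity; a standalone sketch (1-bits mean "valid", and a single u8 word stands in for the real word-at-a-time bitmap kernel):

```rust
// result = left_null_bitmap & !(right_values & right_bitmap)
// i.e. an output slot stays valid only if the left slot was valid and the
// right slot is not a valid `true`.
fn nullif_validity(left_valid: u8, right_values: u8, right_valid: u8) -> u8 {
    left_valid & !(right_values & right_valid)
}

fn main() {
    // left valid, right is a valid true  -> slot becomes null
    assert_eq!(nullif_validity(0b1, 0b1, 0b1) & 0b1, 0b0);
    // left valid, right is false         -> passthrough
    assert_eq!(nullif_validity(0b1, 0b0, 0b1) & 0b1, 0b1);
    // left valid, right is null          -> passthrough
    assert_eq!(nullif_validity(0b1, 0b1, 0b0) & 0b1, 0b1);
    // left already null                  -> stays null
    assert_eq!(nullif_validity(0b0, 0b1, 0b1) & 0b1, 0b0);
}
```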

##
File path: arrow/src/compute/kernels/boolean.rs
##
@@ -1148,12 +1222,215 @@ mod tests {
 let comp = comp.slice(2, 3); // Some(false), None, Some(true)
 let comp = comp.as_any().downcast_ref::().unwrap();
 let res = nullif(, ).unwrap();
+let res = res.as_any().downcast_ref::().unwrap();
 
 let expected = Int32Array::from(vec![
 Some(15), // False => keep it
 Some(8),  // None => keep it
 None, // true => None
 ]);
-assert_eq!(, )
+assert_eq!(, res)
+}
+
+#[test]
+fn test_nullif_int_large_right_offset() {
+let a: ArrayRef = Arc::new(Int32Array::from(vec![
+None, // 0
+Some(15), // 1
+Some(8),
+Some(1),
+Some(9),
+]));
+let a = a.slice(1, 3); // Some(15), Some(8), Some(1)
+
+let comp = BooleanArray::from(vec![
+Some(false), // 0
+Some(false),
+Some(false),
+Some(false),
+Some(false),
+Some(false),
+Some(false),
+Some(false),
+Some(false), // 8
+Some(false),
+Some(false),
+Some(false),
+Some(false),
+Some(false),
+Some(false),
+Some(false),
+Some(false), // 16
+Some(false), // 17
+Some(false), // 18
+None,
+Some(true),
+Some(false),
+None,
+]);
+let comp = comp.slice(18, 3); // Some(false), None, Some(true)
+let comp = comp.as_any().downcast_ref::().unwrap();
+let res = nullif(, ).unwrap();
+let res = res.as_any().downcast_ref::().unwrap();
+
+let expected = Int32Array::from(vec![
+Some(15), // False => keep it
+Some(8),  // None => keep it
+None, // true => None
+]);
+assert_eq!(, res)
+}
+
+#[test]
+fn test_nullif_boolean_offset() {
+let a: ArrayRef = Arc::new(BooleanArray::from(vec![
+None,   // 0
+Some(true), // 1
+Some(false),
+Some(true),
+Some(true),
+]));
+let a = a.slice(1, 3); // Some(true), Some(false), Some(true)
+
+let comp = BooleanArray::from(vec![
+Some(false), // 0
+Some(false), // 1
+Some(false), // 2
+None,
+Some(true),
+Some(false),

[GitHub] [arrow-rs] bjchambers commented on a change in pull request #521: Change `nullif` to support arbitrary arrays

2021-07-14 Thread GitBox


bjchambers commented on a change in pull request #521:
URL: https://github.com/apache/arrow-rs/pull/521#discussion_r670114367



##
File path: arrow/src/compute/kernels/boolean.rs
##
@@ -1148,12 +1222,215 @@ mod tests {
 let comp = comp.slice(2, 3); // Some(false), None, Some(true)
 let comp = comp.as_any().downcast_ref::().unwrap();
 let res = nullif(, ).unwrap();
+let res = res.as_any().downcast_ref::().unwrap();
 
 let expected = Int32Array::from(vec![
 Some(15), // False => keep it
 Some(8),  // None => keep it
 None, // true => None
 ]);
-assert_eq!(, )
+assert_eq!(, res)
+}
+
+#[test]
+fn test_nullif_int_large_right_offset() {
+let a: ArrayRef = Arc::new(Int32Array::from(vec![
+None, // 0
+Some(15), // 1
+Some(8),
+Some(1),
+Some(9),
+]));
+let a = a.slice(1, 3); // Some(15), Some(8), Some(1)
+
+let comp = BooleanArray::from(vec![
+Some(false), // 0
+Some(false),
+Some(false),
+Some(false),
+Some(false),
+Some(false),
+Some(false),
+Some(false),
+Some(false), // 8
+Some(false),
+Some(false),
+Some(false),
+Some(false),
+Some(false),
+Some(false),
+Some(false),
+Some(false), // 16
+Some(false), // 17
+Some(false), // 18
+None,
+Some(true),
+Some(false),
+None,
+]);
+let comp = comp.slice(18, 3); // Some(false), None, Some(true)
+let comp = comp.as_any().downcast_ref::().unwrap();
+let res = nullif(, ).unwrap();
+let res = res.as_any().downcast_ref::().unwrap();
+
+let expected = Int32Array::from(vec![
+Some(15), // False => keep it
+Some(8),  // None => keep it
+None, // true => None
+]);
+assert_eq!(, res)
+}
+
+#[test]
+fn test_nullif_boolean_offset() {
+let a: ArrayRef = Arc::new(BooleanArray::from(vec![
+None,   // 0
+Some(true), // 1
+Some(false),
+Some(true),
+Some(true),
+]));
+let a = a.slice(1, 3); // Some(true), Some(false), Some(true)
+
+let comp = BooleanArray::from(vec![
+Some(false), // 0
+Some(false), // 1
+Some(false), // 2
+None,
+Some(true),
+Some(false),
+None,
+]);
+let comp = comp.slice(2, 3); // Some(false), None, Some(true)
+let comp = comp.as_any().downcast_ref::().unwrap();
+let res = nullif(, ).unwrap();
+let res = res.as_any().downcast_ref::().unwrap();
+
+let expected = BooleanArray::from(vec![
+Some(true),  // False => keep it
+Some(false), // None => keep it
+None,// true => None
+]);
+assert_eq!(, res)
+}
+
+#[test]
+fn test_nullif_passthrough_all_nonnull() {
+// DO NOT SUBMIT: Write this test
+}
+
+#[test]
+fn test_nullif_passthrough_all_null_or_true() {
+// DO NOT SUBMIT: Write this test
+}
+
+struct Foo {
+a: Option,
+b: Option,
+/// Whether the entry should be valid.
+is_valid: bool,
+}
+
+impl Foo {
+fn new_valid(a: i32, b: bool) -> Foo {
+Self {
+a: Some(a),
+b: Some(b),
+is_valid: true,
+}
+}
+
+fn new_null() -> Foo {
+Self {
+a: None,
+b: None,
+is_valid: false,
+}
+}
+}
+
+/// Struct Array equality is a bit weird -- we need to have the *child 
values*
+/// correct even if the enclosing struct indicates it is null. But we
+/// also need the top level is_valid bits to be correct.
+fn create_foo_struct(values: Vec) -> StructArray {
+let mut struct_array = StructBuilder::new(
+vec![
+Field::new("a", DataType::Int32, true),
+Field::new("b", DataType::Boolean, true),
+],
+vec![
+Box::new(Int32Builder::new(values.len())),
+Box::new(BooleanBuilder::new(values.len())),
+],
+);
+
+for value in values {
+struct_array
+.field_builder::(0)
+.unwrap()
+.append_option(value.a)
+.unwrap();
+struct_array
+.field_builder::(1)
+.unwrap()
+.append_option(value.b)
+.unwrap();
+

[GitHub] [arrow-rs] bjchambers opened a new pull request #555: failing test

2021-07-14 Thread GitBox


bjchambers opened a new pull request #555:
URL: https://github.com/apache/arrow-rs/pull/555


   # Which issue does this PR close?
   
   Closes #514.
   
   # Rationale for this change
   
   This was caused by a problem in struct slices and equality. It was fixed by 
a separate PR, but it seems beneficial to get an explicit test of the expected 
behavior in as well to prevent regression. 
   
   # What changes are included in this PR?
   
   A test that was previously failing for equality on slices of a struct array.
   
   # Are there any user-facing changes?
   
   No.
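
   The underlying pitfall — two logically equal slices living at different offsets of different buffers — can be sketched outside Arrow (a toy model, not the arrow-rs API):

```rust
// Sketch of why sliced-array equality must respect offsets: two logically
// equal slices can sit at different positions in their backing buffers, so
// comparing raw buffers gives false negatives.
struct Slice<'a> {
    values: &'a [i32],
    offset: usize,
    len: usize,
}

impl<'a> Slice<'a> {
    fn logical(&self) -> &'a [i32] {
        &self.values[self.offset..self.offset + self.len]
    }
}

fn logically_equal(a: &Slice, b: &Slice) -> bool {
    a.logical() == b.logical() // compare logical values, not buffers
}

fn main() {
    let buf_a = [9, 1, 2, 3];
    let buf_b = [1, 2, 3, 7, 8];
    let a = Slice { values: &buf_a, offset: 1, len: 3 };
    let b = Slice { values: &buf_b, offset: 0, len: 3 };
    assert!(logically_equal(&a, &b));  // equal despite different buffers
    assert!(&buf_a[..3] != &buf_b[..3]); // a naive buffer compare disagrees
}
```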






[GitHub] [arrow-rs] bjchambers commented on issue #514: Struct equality on slices has false negatives

2021-07-14 Thread GitBox


bjchambers commented on issue #514:
URL: https://github.com/apache/arrow-rs/issues/514#issuecomment-880371136


   I think that #389 seems to have fixed this. I'll make a PR with the 
corresponding test case to prevent regression.






[GitHub] [arrow-experimental-rs-parquet2] jorgecarleitao commented on pull request #1: Adds parquet2

2021-07-14 Thread GitBox


jorgecarleitao commented on pull request #1:
URL: 
https://github.com/apache/arrow-experimental-rs-parquet2/pull/1#issuecomment-880369806


   @jhorstmann, did you submit a CLA?






[GitHub] [arrow] domoritz commented on a change in pull request #10698: ARROW-13303: [JS] Revise bundles

2021-07-14 Thread GitBox


domoritz commented on a change in pull request #10698:
URL: https://github.com/apache/arrow/pull/10698#discussion_r670105369



##
File path: js/gulp/package-task.js
##
@@ -46,14 +46,19 @@ const createMainPackageJson = (target, format) => (orig) => 
({
 ...createTypeScriptPackageJson(target, format)(orig),
 bin: orig.bin,
 name: npmPkgName,
-main: `${mainExport}.node`,
-browser: `${mainExport}.dom`,
-module: `${mainExport}.dom.mjs`,
+type: 'commonjs',
+main: `${mainExport}.node.js`,
+module: `${mainExport}.dom.js`,

Review comment:
   ```suggestion
   module: `${mainExport}.node.mjs`,
   ```
   
   To be consistent with 
https://github.com/apache/arrow/pull/10698/files#diff-9ad0361c0b3692cb897f0641d786ade504a91c8c1d6a36c3abd3cf33224946a0R96
 we should use node. Also, don't we need the .mjs file here?








[GitHub] [arrow] domoritz commented on a change in pull request #10698: ARROW-13303: [JS] Revise bundles

2021-07-14 Thread GitBox


domoritz commented on a change in pull request #10698:
URL: https://github.com/apache/arrow/pull/10698#discussion_r670105202



##
File path: js/gulp/package-task.js
##
@@ -46,14 +46,19 @@ const createMainPackageJson = (target, format) => (orig) => 
({
 ...createTypeScriptPackageJson(target, format)(orig),
 bin: orig.bin,
 name: npmPkgName,
-main: `${mainExport}.node`,
-browser: `${mainExport}.dom`,
-module: `${mainExport}.dom.mjs`,
+type: 'commonjs',
+main: `${mainExport}.node.js`,

Review comment:
   ```suggestion
   main: `${mainExport}.node`,
   ```

##
File path: js/gulp/package-task.js
##
@@ -46,14 +46,19 @@ const createMainPackageJson = (target, format) => (orig) => 
({
 ...createTypeScriptPackageJson(target, format)(orig),
 bin: orig.bin,
 name: npmPkgName,
-main: `${mainExport}.node`,
-browser: `${mainExport}.dom`,
-module: `${mainExport}.dom.mjs`,
+type: 'commonjs',
+main: `${mainExport}.node.js`,
+module: `${mainExport}.dom.js`,

Review comment:
   ```suggestion
   module: `${mainExport}.node.js`,
   ```
   
   To be consistent with 
https://github.com/apache/arrow/pull/10698/files#diff-9ad0361c0b3692cb897f0641d786ade504a91c8c1d6a36c3abd3cf33224946a0R96.
 

##
File path: js/gulp/package-task.js
##
@@ -46,14 +46,19 @@ const createMainPackageJson = (target, format) => (orig) => 
({
 ...createTypeScriptPackageJson(target, format)(orig),
 bin: orig.bin,
 name: npmPkgName,
-main: `${mainExport}.node`,
-browser: `${mainExport}.dom`,
-module: `${mainExport}.dom.mjs`,
+type: 'commonjs',
+main: `${mainExport}.node.js`,
+module: `${mainExport}.dom.js`,
+browser: `${mainExport}.dom.js`,

Review comment:
   ```suggestion
   browser: `${mainExport}.dom`,
   ```
   
   This way bundlers can pick up mjs files more easily, no?

##
File path: js/gulp/typescript-task.js
##
@@ -61,8 +61,8 @@ function compileTypescript(out, tsconfigPath, 
tsconfigOverrides) {
 return Observable.forkJoin(writeSources, writeDTypes, writeJS);
 }
 
-function cjsMapFile(mapFilePath) { return mapFilePath; }

Review comment:
   Don't we need to undo these changes so we have the correct mapping files?








[GitHub] [arrow] cyb70289 commented on a change in pull request #10663: ARROW-13253: [FlightRPC][C++] Fix segfault with large messages

2021-07-14 Thread GitBox


cyb70289 commented on a change in pull request #10663:
URL: https://github.com/apache/arrow/pull/10663#discussion_r670100132



##
File path: cpp/src/arrow/flight/test_util.cc
##
@@ -616,6 +640,22 @@ Status ExampleLargeBatches(BatchVector* out) {
   return Status::OK();
 }
 
arrow::Result<std::shared_ptr<RecordBatch>> VeryLargeBatch() {
  // In CI, some platforms don't let us allocate one very large
  // buffer, so allocate a smaller buffer and repeat it a few times
  constexpr int64_t nbytes = (1ul << 27ul) + 8ul;
  constexpr int64_t nrows = nbytes / 8;
  constexpr int64_t ncols = 16;
  ARROW_ASSIGN_OR_RAISE(auto values, AllocateBuffer(nbytes));
  std::memset(values->mutable_data(), 0x00, values->capacity());
  std::vector<std::shared_ptr<Buffer>> buffers = {nullptr, std::move(values)};
  auto array = std::make_shared<ArrayData>(int64(), nrows, buffers,
                                           /*null_count=*/0);
  std::vector<std::shared_ptr<ArrayData>> arrays(ncols, array);
  std::vector<std::shared_ptr<Field>> fields(ncols, field("a", int64()));
  return RecordBatch::Make(schema(std::move(fields)), nrows, std::move(arrays));

Review comment:
   So the IPC payload will be 16 columns in size, though all columns are 
from the same buffer?

##
File path: cpp/src/arrow/flight/types.cc
##
@@ -21,6 +21,7 @@
 #include 
 #include 
 
+#include "arrow/buffer.h"

Review comment:
   Is this header used?

##
File path: cpp/src/arrow/flight/serialization_internal.cc
##
@@ -164,26 +164,18 @@ static const uint8_t kPaddingBytes[8] = {0, 0, 0, 0, 0, 
0, 0, 0};
 
 // Update the sizes of our Protobuf fields based on the given IPC payload.
 grpc::Status IpcMessageHeaderSize(const arrow::ipc::IpcPayload& ipc_msg, bool 
has_body,
-  size_t* body_size, size_t* header_size,
-  int32_t* metadata_size) {
-  DCHECK_LT(ipc_msg.metadata->size(), kInt32Max);
+  size_t* header_size, int32_t* metadata_size) 
{
+  DCHECK_LE(ipc_msg.metadata->size(), kInt32Max);
   *metadata_size = static_cast(ipc_msg.metadata->size());
 
   // 1 byte for metadata tag
   *header_size += 1 + WireFormatLite::LengthDelimitedSize(*metadata_size);
 
-  for (const auto& buffer : ipc_msg.body_buffers) {
-// Buffer may be null when the row length is zero, or when all
-// entries are invalid.
-if (!buffer) continue;
-
-*body_size += 
static_cast(BitUtil::RoundUpToMultipleOf8(buffer->size()));
-  }
-
   // 2 bytes for body tag
   if (has_body) {
 // We write the body tag in the header but not the actual body data
-*header_size += 2 + WireFormatLite::LengthDelimitedSize(*body_size) - 
*body_size;
+*header_size += 2 + 
WireFormatLite::LengthDelimitedSize(ipc_msg.body_length) -

Review comment:
   From the old code, it looks like `ipc_msg.body_length` may be 0. Is that 
not possible now?
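
   For reference, `WireFormatLite::LengthDelimitedSize` in the diff counts a varint length prefix plus the payload itself; a sketch of that accounting (the field tag bytes are counted separately, as in the quoted code):

```rust
// Protobuf sizing for a length-delimited field: varint(len) prefix bytes
// plus len payload bytes. A varint uses 7 payload bits per byte.
fn varint_size(mut n: u64) -> u64 {
    let mut bytes = 1;
    while n >= 0x80 {
        n >>= 7;
        bytes += 1;
    }
    bytes
}

fn length_delimited_size(len: u64) -> u64 {
    varint_size(len) + len
}

fn main() {
    assert_eq!(varint_size(127), 1);
    assert_eq!(varint_size(128), 2);
    // e.g. the (1 << 27) + 8 byte body built by VeryLargeBatch needs a
    // 4-byte varint length prefix
    assert_eq!(length_delimited_size((1 << 27) + 8), (1 << 27) + 8 + 4);
}
```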








[GitHub] [arrow] domoritz closed pull request #10695: ARROW-13299: [JS] Upgrade ix and rxjs

2021-07-14 Thread GitBox


domoritz closed pull request #10695:
URL: https://github.com/apache/arrow/pull/10695


   






[GitHub] [arrow] NinaPeng commented on pull request #10700: ARROW-13306: [Java][JDBC] use ResultSetMetaData.getColumnLabel instead of ResultSetMetaData.getColumnName

2021-07-14 Thread GitBox


NinaPeng commented on pull request #10700:
URL: https://github.com/apache/arrow/pull/10700#issuecomment-880357237


   > > @liyafan82 Thanks for your review. Is there anything else that needs 
to be fixed, or can this pull request be merged?
   > 
   > @NinaPeng The change looks reasonable to me. I want to merge it in a few 
days, if there are no more comments.
   
   @liyafan82 got it. thanks






[GitHub] [arrow] liyafan82 commented on pull request #10700: ARROW-13306: [Java][JDBC] use ResultSetMetaData.getColumnLabel instead of ResultSetMetaData.getColumnName

2021-07-14 Thread GitBox


liyafan82 commented on pull request #10700:
URL: https://github.com/apache/arrow/pull/10700#issuecomment-880355669


   > @liyafan82 Thanks for your review. Is there anything else that needs to 
be fixed, or can this pull request be merged?
   
   @NinaPeng The change looks reasonable to me. I want to merge it in a few 
days, if there are no more comments. 






[GitHub] [arrow] cyb70289 commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

2021-07-14 Thread GitBox


cyb70289 commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-880342643


   > @wesm @cyb70289 @bkietz Is there anything else we could do for the low 
selectivity cases (1% select)?
   
   I don't have satisfying suggestions.
   A possible workaround, I guess, is to choose branchy vs. branch-free code 
per selectivity. Smells like "benchmark-oriented optimization"?
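
   The branch/non-branch trade-off being discussed can be sketched like this (illustrative scalar code, not the actual vector_selection.cc kernels):

```rust
// Branchy filtering skips work when few rows are selected, but the branch
// becomes unpredictable near 50% selectivity. The branch-free form always
// stores and advances the output cursor by the mask bit instead.
fn filter_branchy(values: &[i32], mask: &[bool]) -> Vec<i32> {
    let mut out = Vec::new();
    for (v, keep) in values.iter().zip(mask) {
        if *keep {
            out.push(*v); // data-dependent branch
        }
    }
    out
}

fn filter_branch_free(values: &[i32], mask: &[bool]) -> Vec<i32> {
    let mut out = vec![0; values.len()];
    let mut n = 0usize;
    for (v, keep) in values.iter().zip(mask) {
        out[n] = *v;          // unconditional store
        n += *keep as usize;  // advance only when the mask bit is set
    }
    out.truncate(n);
    out
}

fn main() {
    let values = [1, 2, 3, 4];
    let mask = [true, false, true, false];
    assert_eq!(filter_branchy(&values, &mask), vec![1, 3]);
    assert_eq!(filter_branch_free(&values, &mask), vec![1, 3]);
}
```

   Choosing between the two per batch, based on the selection count, is the "per selectivity" dispatch mentioned above.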






[GitHub] [arrow] NinaPeng commented on pull request #10700: ARROW-13306: [Java][JDBC] use ResultSetMetaData.getColumnLabel instead of ResultSetMetaData.getColumnName

2021-07-14 Thread GitBox


NinaPeng commented on pull request #10700:
URL: https://github.com/apache/arrow/pull/10700#issuecomment-880340157


   @liyafan82 Thanks for your review. Is there anything else that needs to be 
fixed, or can this pull request be merged?






[GitHub] [arrow] kou commented on pull request #10614: ARROW-13100: [MATLAB] Integrate GoogleTest with MATLAB Interface C++ Code

2021-07-14 Thread GitBox


kou commented on pull request #10614:
URL: https://github.com/apache/arrow/pull/10614#issuecomment-880309249


   Thanks!
   I've merged this.






[GitHub] [arrow] kou closed pull request #10614: ARROW-13100: [MATLAB] Integrate GoogleTest with MATLAB Interface C++ Code

2021-07-14 Thread GitBox


kou closed pull request #10614:
URL: https://github.com/apache/arrow/pull/10614


   






[GitHub] [arrow-datafusion] andygrove opened a new pull request #727: UnresolvedShuffleExec should represent a single shuffle

2021-07-14 Thread GitBox


andygrove opened a new pull request #727:
URL: https://github.com/apache/arrow-datafusion/pull/727


   # Which issue does this PR close?
   
   
   
   Closes #726
   
# Rationale for this change
   
   
   Small step towards getting shuffle working.
   
   # What changes are included in this PR?
   
   
   # Are there any user-facing changes?
   
   
   No
   
   
   






[GitHub] [arrow-datafusion] andygrove opened a new issue #726: Ballista: UnresolvedShuffleExec should only have a single stage_id

2021-07-14 Thread GitBox


andygrove opened a new issue #726:
URL: https://github.com/apache/arrow-datafusion/issues/726


   **Describe the bug**
   UnresolvedShuffleExec should represent a single shuffle, not multiple.
   
   **To Reproduce**
   I discovered this while working on the PR to get shuffles working.
   
   **Expected behavior**
   N/A
   
   **Additional context**
   N/A
   






[GitHub] [arrow] anthonylouisbsb commented on a change in pull request #10604: ARROW-13190: [C++] [Gandiva] Change behavior of INITCAP function

2021-07-14 Thread GitBox


anthonylouisbsb commented on a change in pull request #10604:
URL: https://github.com/apache/arrow/pull/10604#discussion_r670020234



##
File path: cpp/src/gandiva/gdv_function_stubs.cc
##
@@ -427,7 +427,8 @@ CAST_VARLEN_TYPE_FROM_NUMERIC(VARBINARY)
 #undef GDV_FN_CAST_VARCHAR_REAL
 
 GANDIVA_EXPORT

Review comment:
   I removed it








[GitHub] [arrow] augustoasilva commented on a change in pull request #10711: ARROW-13322: [C++][Gandiva] Add from_unixtime hive function to gandiva

2021-07-14 Thread GitBox


augustoasilva commented on a change in pull request #10711:
URL: https://github.com/apache/arrow/pull/10711#discussion_r669991533



##
File path: cpp/src/gandiva/precompiled/time.cc
##
@@ -841,6 +843,161 @@ gdv_int64 
castBIGINT_daytimeinterval(gdv_day_time_interval in) {
  extractDay_daytimeinterval(in) * MILLIS_IN_DAY;
 }
 
+FORCE_INLINE
+const char* from_unixtime_int64(gdv_int64 context, gdv_timestamp in, 
gdv_int32* out_len) {
+  gdv_int64 year = extractYear_timestamp(in);
+  gdv_int64 month = extractMonth_timestamp(in);
+  gdv_int64 day = extractDay_timestamp(in);
+  gdv_int64 hour = extractHour_timestamp(in);
+  gdv_int64 minute = extractMinute_timestamp(in);
+  gdv_int64 second = extractSecond_timestamp(in);
+
+  static const int kTimeStampStringLen = 19;
+  const int char_buffer_length = kTimeStampStringLen + 1;  // snprintf adds \0
+  char char_buffer[char_buffer_length];
+
+  // yyyy-MM-dd hh:mm:ss
+  int res = snprintf(char_buffer, char_buffer_length,
+ "%04" PRId64 "-%02" PRId64 "-%02" PRId64 " %02" PRId64 
":%02" PRId64
+ ":%02" PRId64,
+ year, month, day, hour, minute, second);
+  if (res < 0) {
+gdv_fn_context_set_error_msg(context, "Could not format the timestamp");
+*out_len = 0;
+return "";
+  }
+
+  *out_len = kTimeStampStringLen;
+
+  if (*out_len <= 0) {
+if (*out_len < 0) {
+  gdv_fn_context_set_error_msg(context, "Length of output string cannot be 
negative");
+}
+*out_len = 0;
+return "";
+  }
+
+  char* ret = reinterpret_cast<char*>(gdv_fn_context_arena_malloc(context, *out_len));
+  if (ret == nullptr) {
+gdv_fn_context_set_error_msg(context, "Could not allocate memory for 
output string");
+*out_len = 0;
+return "";
+  }
+
+  memcpy(ret, char_buffer, *out_len);
+  return ret;
+}
+
+FORCE_INLINE
+const char* from_unixtime_int64_utf8(gdv_int64 context, gdv_timestamp in,
+ const char* pattern, gdv_int32 
pattern_len,
+ gdv_int32* out_len) {
+  gdv_int64 year = extractYear_timestamp(in);
+  gdv_int64 month = extractMonth_timestamp(in);
+  gdv_int64 day = extractDay_timestamp(in);
+  gdv_int64 hour = extractHour_timestamp(in);
+  gdv_int64 minute = extractMinute_timestamp(in);
+  gdv_int64 second = extractSecond_timestamp(in);
+  gdv_int64 millis = in % MILLIS_IN_SEC;
+
+  if (pattern_len <= 0) {
+gdv_fn_context_set_error_msg(context, "Invalid allowed pattern size");
+*out_len = 0;
+return "";
+  }
+
+  static const int kTimeStampStringLen = pattern_len;
+  const int char_buffer_length = kTimeStampStringLen + 1;  // snprintf adds \0
+  char char_buffer[char_buffer_length];
+
+  const char* regex_format =
+  "y{4}(-[M]{2})?+.*?(-[d]{2})?+.*?( [h]{2})?+.*?"
+  "(:[mm]{2})?+.*?(:[s]{2})?+.*?(.[s]{3})?+.*?";

Review comment:
   Ok, I will take a further look at the libraries already used in the 
project to process more patterns.
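
For reference, the fixed-layout path quoted above (snprintf into a 19-character "yyyy-MM-dd hh:mm:ss" buffer) can be sketched standalone. This is an illustration, not the Gandiva implementation: the helper name `FormatUnixMillis` and the use of `gmtime_r`/`gmtime_s` instead of Gandiva's `extract*_timestamp` helpers are assumptions made for a self-contained example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <ctime>

// Standalone sketch: format epoch milliseconds as the fixed
// 19-character "yyyy-MM-dd hh:mm:ss" string with snprintf.
// Returns true on success; writes exactly 19 chars plus '\0'.
bool FormatUnixMillis(int64_t millis, char* out, size_t out_size) {
  static const size_t kTimeStampStringLen = 19;
  if (out_size < kTimeStampStringLen + 1) return false;  // snprintf adds '\0'
  std::time_t secs = static_cast<std::time_t>(millis / 1000);
  std::tm tm_utc{};
#if defined(_WIN32)
  gmtime_s(&tm_utc, &secs);
#else
  gmtime_r(&secs, &tm_utc);  // UTC, matching Hive's from_unixtime semantics
#endif
  int res = std::snprintf(out, out_size, "%04d-%02d-%02d %02d:%02d:%02d",
                          tm_utc.tm_year + 1900, tm_utc.tm_mon + 1,
                          tm_utc.tm_mday, tm_utc.tm_hour, tm_utc.tm_min,
                          tm_utc.tm_sec);
  return res == static_cast<int>(kTimeStampStringLen);
}
```

For example, `FormatUnixMillis(0, buf, sizeof(buf))` fills `buf` with "1970-01-01 00:00:00".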








[GitHub] [arrow] augustoasilva commented on a change in pull request #10711: ARROW-13322: [C++][Gandiva] Add from_unixtime hive function to gandiva

2021-07-14 Thread GitBox


augustoasilva commented on a change in pull request #10711:
URL: https://github.com/apache/arrow/pull/10711#discussion_r669991187



##
File path: cpp/src/gandiva/precompiled/time_test.cc
##
@@ -839,4 +839,86 @@ TEST(TestTime, TestToTimeNumeric) {
   EXPECT_EQ(expected_output, to_time_float64(3601.500));
 }
 
-}  // namespace gandiva
+TEST(TestTime, TestFromUnixtimeWithoutPattern) {

Review comment:
   Ok, will do it.








[GitHub] [arrow] augustoasilva commented on a change in pull request #10711: ARROW-13322: [C++][Gandiva] Add from_unixtime hive function to gandiva

2021-07-14 Thread GitBox


augustoasilva commented on a change in pull request #10711:
URL: https://github.com/apache/arrow/pull/10711#discussion_r669990651



##
File path: cpp/src/gandiva/precompiled/time.cc
##
@@ -841,6 +843,161 @@ gdv_int64 
castBIGINT_daytimeinterval(gdv_day_time_interval in) {
  extractDay_daytimeinterval(in) * MILLIS_IN_DAY;
 }
 
+FORCE_INLINE
+const char* from_unixtime_int64(gdv_int64 context, gdv_timestamp in, 
gdv_int32* out_len) {
+  gdv_int64 year = extractYear_timestamp(in);
+  gdv_int64 month = extractMonth_timestamp(in);
+  gdv_int64 day = extractDay_timestamp(in);
+  gdv_int64 hour = extractHour_timestamp(in);
+  gdv_int64 minute = extractMinute_timestamp(in);
+  gdv_int64 second = extractSecond_timestamp(in);
+
+  static const int kTimeStampStringLen = 19;
+  const int char_buffer_length = kTimeStampStringLen + 1;  // snprintf adds \0
+  char char_buffer[char_buffer_length];
+
+  // yyyy-MM-dd hh:mm:ss
+  int res = snprintf(char_buffer, char_buffer_length,
+ "%04" PRId64 "-%02" PRId64 "-%02" PRId64 " %02" PRId64 
":%02" PRId64
+ ":%02" PRId64,
+ year, month, day, hour, minute, second);
+  if (res < 0) {
+gdv_fn_context_set_error_msg(context, "Could not format the timestamp");
+*out_len = 0;
+return "";
+  }
+
+  *out_len = kTimeStampStringLen;
+
+  if (*out_len <= 0) {
+if (*out_len < 0) {
+  gdv_fn_context_set_error_msg(context, "Length of output string cannot be 
negative");
+}
+*out_len = 0;
+return "";
+  }
+
+  char* ret = reinterpret_cast<char*>(gdv_fn_context_arena_malloc(context, *out_len));
+  if (ret == nullptr) {
+gdv_fn_context_set_error_msg(context, "Could not allocate memory for 
output string");
+*out_len = 0;
+return "";
+  }
+
+  memcpy(ret, char_buffer, *out_len);
+  return ret;
+}
+
+FORCE_INLINE
+const char* from_unixtime_int64_utf8(gdv_int64 context, gdv_timestamp in,
+ const char* pattern, gdv_int32 
pattern_len,
+ gdv_int32* out_len) {
+  gdv_int64 year = extractYear_timestamp(in);
+  gdv_int64 month = extractMonth_timestamp(in);
+  gdv_int64 day = extractDay_timestamp(in);
+  gdv_int64 hour = extractHour_timestamp(in);
+  gdv_int64 minute = extractMinute_timestamp(in);
+  gdv_int64 second = extractSecond_timestamp(in);
+  gdv_int64 millis = in % MILLIS_IN_SEC;
+
+  if (pattern_len <= 0) {
+gdv_fn_context_set_error_msg(context, "Invalid allowed pattern size");
+*out_len = 0;
+return "";
+  }
+
+  static const int kTimeStampStringLen = pattern_len;
+  const int char_buffer_length = kTimeStampStringLen + 1;  // snprintf adds \0
+  char char_buffer[char_buffer_length];
+
+  const char* regex_format =
+  "y{4}(-[M]{2})?+.*?(-[d]{2})?+.*?( [h]{2})?+.*?"
+  "(:[mm]{2})?+.*?(:[s]{2})?+.*?(.[s]{3})?+.*?";
+  bool match = std::regex_match(pattern, std::regex(regex_format));
+
+  if (!match) {
+gdv_fn_context_set_error_msg(context, "Invalid allowed pattern");
+*out_len = 0;
+return "";
+  }
+
+  // length from pattern
+  int res = 0;
+
+  switch (pattern_len) {
+// yyyy
+case 4:
+  res = snprintf(char_buffer, char_buffer_length, "%04" PRId64, year);
+  break;
+// yyyy-MM
+case 7:
+  res = snprintf(char_buffer, char_buffer_length, "%04" PRId64 "-%02" 
PRId64, year,
+ month);
+  break;
+// yyyy-MM-dd
+case 10:

Review comment:
   No, I focused mainly on the first pattern.

[GitHub] [arrow] augustoasilva commented on a change in pull request #10711: ARROW-13322: [C++][Gandiva] Add from_unixtime hive function to gandiva

2021-07-14 Thread GitBox


augustoasilva commented on a change in pull request #10711:
URL: https://github.com/apache/arrow/pull/10711#discussion_r669990651



##
File path: cpp/src/gandiva/precompiled/time.cc
##
@@ -841,6 +843,161 @@ gdv_int64 
castBIGINT_daytimeinterval(gdv_day_time_interval in) {
  extractDay_daytimeinterval(in) * MILLIS_IN_DAY;
 }
 
+FORCE_INLINE
+const char* from_unixtime_int64(gdv_int64 context, gdv_timestamp in, 
gdv_int32* out_len) {
+  gdv_int64 year = extractYear_timestamp(in);
+  gdv_int64 month = extractMonth_timestamp(in);
+  gdv_int64 day = extractDay_timestamp(in);
+  gdv_int64 hour = extractHour_timestamp(in);
+  gdv_int64 minute = extractMinute_timestamp(in);
+  gdv_int64 second = extractSecond_timestamp(in);
+
+  static const int kTimeStampStringLen = 19;
+  const int char_buffer_length = kTimeStampStringLen + 1;  // snprintf adds \0
+  char char_buffer[char_buffer_length];
+
+  // yyyy-MM-dd hh:mm:ss
+  int res = snprintf(char_buffer, char_buffer_length,
+ "%04" PRId64 "-%02" PRId64 "-%02" PRId64 " %02" PRId64 
":%02" PRId64
+ ":%02" PRId64,
+ year, month, day, hour, minute, second);
+  if (res < 0) {
+gdv_fn_context_set_error_msg(context, "Could not format the timestamp");
+*out_len = 0;
+return "";
+  }
+
+  *out_len = kTimeStampStringLen;
+
+  if (*out_len <= 0) {
+if (*out_len < 0) {
+  gdv_fn_context_set_error_msg(context, "Length of output string cannot be 
negative");
+}
+*out_len = 0;
+return "";
+  }
+
+  char* ret = reinterpret_cast<char*>(gdv_fn_context_arena_malloc(context, *out_len));
+  if (ret == nullptr) {
+gdv_fn_context_set_error_msg(context, "Could not allocate memory for 
output string");
+*out_len = 0;
+return "";
+  }
+
+  memcpy(ret, char_buffer, *out_len);
+  return ret;
+}
+
+FORCE_INLINE
+const char* from_unixtime_int64_utf8(gdv_int64 context, gdv_timestamp in,
+ const char* pattern, gdv_int32 
pattern_len,
+ gdv_int32* out_len) {
+  gdv_int64 year = extractYear_timestamp(in);
+  gdv_int64 month = extractMonth_timestamp(in);
+  gdv_int64 day = extractDay_timestamp(in);
+  gdv_int64 hour = extractHour_timestamp(in);
+  gdv_int64 minute = extractMinute_timestamp(in);
+  gdv_int64 second = extractSecond_timestamp(in);
+  gdv_int64 millis = in % MILLIS_IN_SEC;
+
+  if (pattern_len <= 0) {
+gdv_fn_context_set_error_msg(context, "Invalid allowed pattern size");
+*out_len = 0;
+return "";
+  }
+
+  static const int kTimeStampStringLen = pattern_len;
+  const int char_buffer_length = kTimeStampStringLen + 1;  // snprintf adds \0
+  char char_buffer[char_buffer_length];
+
+  const char* regex_format =
+  "y{4}(-[M]{2})?+.*?(-[d]{2})?+.*?( [h]{2})?+.*?"
+  "(:[mm]{2})?+.*?(:[s]{2})?+.*?(.[s]{3})?+.*?";
+  bool match = std::regex_match(pattern, std::regex(regex_format));
+
+  if (!match) {
+gdv_fn_context_set_error_msg(context, "Invalid allowed pattern");
+*out_len = 0;
+return "";
+  }
+
+  // length from pattern
+  int res = 0;
+
+  switch (pattern_len) {
+// yyyy
+case 4:
+  res = snprintf(char_buffer, char_buffer_length, "%04" PRId64, year);
+  break;
+// yyyy-MM
+case 7:
+  res = snprintf(char_buffer, char_buffer_length, "%04" PRId64 "-%02" 
PRId64, year,
+ month);
+  break;
+// yyyy-MM-dd
+case 10:

Review comment:
   Yes, the comment was misplaced.








[GitHub] [arrow] augustoasilva commented on a change in pull request #10711: ARROW-13322: [C++][Gandiva] Add from_unixtime hive function to gandiva

2021-07-14 Thread GitBox


augustoasilva commented on a change in pull request #10711:
URL: https://github.com/apache/arrow/pull/10711#discussion_r669990486



##
File path: cpp/src/gandiva/precompiled/time.cc
##
@@ -841,6 +843,161 @@ gdv_int64 
castBIGINT_daytimeinterval(gdv_day_time_interval in) {
  extractDay_daytimeinterval(in) * MILLIS_IN_DAY;
 }
 
+FORCE_INLINE
+const char* from_unixtime_int64(gdv_int64 context, gdv_timestamp in, 
gdv_int32* out_len) {
+  gdv_int64 year = extractYear_timestamp(in);
+  gdv_int64 month = extractMonth_timestamp(in);
+  gdv_int64 day = extractDay_timestamp(in);
+  gdv_int64 hour = extractHour_timestamp(in);
+  gdv_int64 minute = extractMinute_timestamp(in);
+  gdv_int64 second = extractSecond_timestamp(in);
+
+  static const int kTimeStampStringLen = 19;
+  const int char_buffer_length = kTimeStampStringLen + 1;  // snprintf adds \0
+  char char_buffer[char_buffer_length];
+
+  // yyyy-MM-dd hh:mm:ss
+  int res = snprintf(char_buffer, char_buffer_length,
+ "%04" PRId64 "-%02" PRId64 "-%02" PRId64 " %02" PRId64 
":%02" PRId64
+ ":%02" PRId64,
+ year, month, day, hour, minute, second);
+  if (res < 0) {
+gdv_fn_context_set_error_msg(context, "Could not format the timestamp");
+*out_len = 0;
+return "";
+  }
+
+  *out_len = kTimeStampStringLen;
+
+  if (*out_len <= 0) {
+if (*out_len < 0) {
+  gdv_fn_context_set_error_msg(context, "Length of output string cannot be 
negative");
+}
+*out_len = 0;
+return "";
+  }
+
+  char* ret = reinterpret_cast<char*>(gdv_fn_context_arena_malloc(context, *out_len));
+  if (ret == nullptr) {
+gdv_fn_context_set_error_msg(context, "Could not allocate memory for 
output string");
+*out_len = 0;
+return "";
+  }
+
+  memcpy(ret, char_buffer, *out_len);
+  return ret;
+}
+
+FORCE_INLINE
+const char* from_unixtime_int64_utf8(gdv_int64 context, gdv_timestamp in,
+ const char* pattern, gdv_int32 
pattern_len,
+ gdv_int32* out_len) {
+  gdv_int64 year = extractYear_timestamp(in);
+  gdv_int64 month = extractMonth_timestamp(in);
+  gdv_int64 day = extractDay_timestamp(in);
+  gdv_int64 hour = extractHour_timestamp(in);
+  gdv_int64 minute = extractMinute_timestamp(in);
+  gdv_int64 second = extractSecond_timestamp(in);
+  gdv_int64 millis = in % MILLIS_IN_SEC;
+
+  if (pattern_len <= 0) {
+gdv_fn_context_set_error_msg(context, "Invalid allowed pattern size");
+*out_len = 0;
+return "";
+  }
+
+  static const int kTimeStampStringLen = pattern_len;
+  const int char_buffer_length = kTimeStampStringLen + 1;  // snprintf adds \0
+  char char_buffer[char_buffer_length];
+
+  const char* regex_format =
+  "y{4}(-[M]{2})?+.*?(-[d]{2})?+.*?( [h]{2})?+.*?"
+  "(:[mm]{2})?+.*?(:[s]{2})?+.*?(.[s]{3})?+.*?";
+  bool match = std::regex_match(pattern, std::regex(regex_format));

Review comment:
   Ok, I will do it.
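
One way to structure the validation step discussed here is to match the incoming pattern against a small whitelist with `std::regex_match` instead of the hand-written regex in the diff. The pattern set below is an illustrative assumption, not the set Gandiva actually supports:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical whitelist of Hive-style from_unixtime patterns, each a
// prefix of "yyyy-MM-dd hh:mm:ss.sss". Nested optional groups let the
// regex accept any valid prefix and reject reordered or unknown fields.
bool IsSupportedPattern(const std::string& pattern) {
  static const std::regex re("yyyy(-MM(-dd( hh(:mm(:ss(\\.sss)?)?)?)?)?)?");
  return std::regex_match(pattern, re);
}
```

With this sketch, `IsSupportedPattern("yyyy-MM-dd hh:mm:ss")` is true, while a reordered pattern such as `"dd-MM-yyyy"` is rejected.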








[GitHub] [arrow-datafusion] yordan-pavlov commented on issue #723: ABS() function in WHERE clause gives unexpected results

2021-07-14 Thread GitBox


yordan-pavlov commented on issue #723:
URL: 
https://github.com/apache/arrow-datafusion/issues/723#issuecomment-88024


   @mcassels thank you for reporting this - good find; the implementation of 
the pruning predicate assumes that, for a predicate expression `f(v) OP c`, 
where `v` is any value in a column chunk and `c` is a constant / literal, 
`f(v_min) <= f(v) <= f(v_max)` holds for every value `v` in that column 
chunk. What I did not take into account in the initial parquet predicate 
push-down implementation is that this assumption is, sadly, not true for all 
functions supported by datafusion (the function `ABS(c0 - 251.10794896957802)` 
being one obvious example).
   
   As a shorter-term fix, in order to ensure correctness, the predicate 
push-down feature could be limited to simpler expressions (such as `col < 
literal`).
   
   Longer term, in order to support predicate push-down with more complex 
expressions, there must be a way to find out if the predicate expression 
satisfies the assumption detailed above (such as having a list of compatible 
functions as @lvheyang suggested above), without actually evaluating the 
expression.
   
   Sadly, I don't have time at the moment to work on this, due to things in my 
personal life.
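
   The failure mode described above can be made concrete with a small sketch. The row-group values and bounds below are hypothetical, chosen only to show that for a non-monotonic function the statistics-derived bound misses the true minimum:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// The predicate function from the issue: ABS(c0 - 251.10794896957802).
// It is not monotonic, so its extremes over a row group need not occur
// at the row group's min/max values.
double AbsPredicate(double v) { return std::fabs(v - 251.10794896957802); }

// Lower bound on f over the chunk as the pruning logic would infer it
// from min/max statistics alone -- sound only for monotonic f.
double ChunkMinViaStats(double (*f)(double), double v_min, double v_max) {
  return std::min(f(v_min), f(v_max));
}
```

For a hypothetical row group with `v_min = 100`, `v_max = 400` that also contains `v = 251`, the true minimum of the predicate is about 0.108, yet `ChunkMinViaStats` reports about 148.9 - so a filter like `ABS(c0 - 251.107...) < 1` would wrongly prune the whole row group.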






[GitHub] [arrow] augustoasilva commented on a change in pull request #10711: ARROW-13322: [C++][Gandiva] Add from_unixtime hive function to gandiva

2021-07-14 Thread GitBox


augustoasilva commented on a change in pull request #10711:
URL: https://github.com/apache/arrow/pull/10711#discussion_r669989632



##
File path: cpp/src/gandiva/precompiled/time.cc
##
@@ -841,6 +843,161 @@ gdv_int64 
castBIGINT_daytimeinterval(gdv_day_time_interval in) {
  extractDay_daytimeinterval(in) * MILLIS_IN_DAY;
 }
 
+FORCE_INLINE
+const char* from_unixtime_int64(gdv_int64 context, gdv_timestamp in, 
gdv_int32* out_len) {
+  gdv_int64 year = extractYear_timestamp(in);
+  gdv_int64 month = extractMonth_timestamp(in);
+  gdv_int64 day = extractDay_timestamp(in);
+  gdv_int64 hour = extractHour_timestamp(in);
+  gdv_int64 minute = extractMinute_timestamp(in);
+  gdv_int64 second = extractSecond_timestamp(in);
+
+  static const int kTimeStampStringLen = 19;
+  const int char_buffer_length = kTimeStampStringLen + 1;  // snprintf adds \0
+  char char_buffer[char_buffer_length];
+
+  // yyyy-MM-dd hh:mm:ss
+  int res = snprintf(char_buffer, char_buffer_length,
+ "%04" PRId64 "-%02" PRId64 "-%02" PRId64 " %02" PRId64 
":%02" PRId64
+ ":%02" PRId64,
+ year, month, day, hour, minute, second);
+  if (res < 0) {
+gdv_fn_context_set_error_msg(context, "Could not format the timestamp");
+*out_len = 0;
+return "";
+  }
+
+  *out_len = kTimeStampStringLen;
+
+  if (*out_len <= 0) {
+if (*out_len < 0) {
+  gdv_fn_context_set_error_msg(context, "Length of output string cannot be 
negative");
+}
+*out_len = 0;
+return "";
+  }
+
+  char* ret = reinterpret_cast<char*>(gdv_fn_context_arena_malloc(context, *out_len));
+  if (ret == nullptr) {
+gdv_fn_context_set_error_msg(context, "Could not allocate memory for 
output string");
+*out_len = 0;
+return "";
+  }
+
+  memcpy(ret, char_buffer, *out_len);
+  return ret;
+}

Review comment:
   They are similar, but from_unixtime does not have the seconds' 
fractions. Also, for from_unixtime without a pattern the size is fixed. I think 
I could call the one that you have mentioned inside it to avoid duplicating the code.








[GitHub] [arrow] github-actions[bot] commented on pull request #10721: ARROW-11673 - [C++] Casting dictionary type to use different index type

2021-07-14 Thread GitBox


github-actions[bot] commented on pull request #10721:
URL: https://github.com/apache/arrow/pull/10721#issuecomment-880238653


   https://issues.apache.org/jira/browse/ARROW-11673






[GitHub] [arrow] nirandaperera opened a new pull request #10721: ARROW-11673 - [C++] Casting dictionary type to use different index type

2021-07-14 Thread GitBox


nirandaperera opened a new pull request #10721:
URL: https://github.com/apache/arrow/pull/10721


   This PR adds casting from one dictionary type to another dictionary type:
   ex:
   ```
   dictionary(int8(), int16()) --> dictionary(int32(), int64())
   ```






[GitHub] [arrow] westonpace closed issue #10699: ModuleNotFoundError: No module named 'pyarrow._orc'

2021-07-14 Thread GitBox


westonpace closed issue #10699:
URL: https://github.com/apache/arrow/issues/10699


   






[GitHub] [arrow-rs] alamb commented on issue #554: ArrayData::slice() does not work for nested types such as StructArray

2021-07-14 Thread GitBox


alamb commented on issue #554:
URL: https://github.com/apache/arrow-rs/issues/554#issuecomment-880210994


   > Thanks for this @alamb, I hadn't noticed that not linking PRs to existing 
issues affects the changelog.
   
   Yes @nevi-me I don't fully understand the intricacies of the changelog 
generator that @jorgecarleitao found, but it seems to be mostly derived from 
issues (rather than PRs). I figured I will keep honing the process in future 
releases






[GitHub] [arrow] kevingurney commented on pull request #10614: ARROW-13100: [MATLAB] Integrate GoogleTest with MATLAB Interface C++ Code

2021-07-14 Thread GitBox


kevingurney commented on pull request #10614:
URL: https://github.com/apache/arrow/pull/10614#issuecomment-880204002


   @kou - we updated the description of the pull request to reflect the latest 
status.
   
   We also discovered a small issue during qualification. If you specified 
`MATLAB_BUILD_TESTS=ON`, then CMake would always use the Arrow C++ libraries 
built from source and ignore any custom `ARROW_HOME` or system-installed 
distributions of Arrow.
   
   We just pushed a small fix for this issue in 
b12b6fe1398dc4ee2ebb8fb00ce54ab4bfe4012a. Let us know if you have any questions.
   
   Thanks again!






[GitHub] [arrow] lidavidm commented on pull request #10608: ARROW-13136: [C++] Add coalesce function

2021-07-14 Thread GitBox


lidavidm commented on pull request #10608:
URL: https://github.com/apache/arrow/pull/10608#issuecomment-880197573


   Broke up the main function a bit.






[GitHub] [arrow] lidavidm commented on pull request #10557: ARROW-13064: [C++] Implement select ('case when') function for fixed-width types

2021-07-14 Thread GitBox


lidavidm commented on pull request #10557:
URL: https://github.com/apache/arrow/pull/10557#issuecomment-880197464


   Removed support for toplevel nulls.






[GitHub] [arrow-datafusion] alamb commented on a change in pull request #719: Optimize min/max queries with table statistics

2021-07-14 Thread GitBox


alamb commented on a change in pull request #719:
URL: https://github.com/apache/arrow-datafusion/pull/719#discussion_r669933745



##
File path: datafusion/src/physical_plan/parquet.rs
##
@@ -312,22 +431,47 @@ impl ParquetExec {
 if let Some(x) = _statistics {
 let part_nulls: Vec> =
 x.iter().map(|c| c.null_count).collect();
-has_null_counts = true;
+has_statistics = true;
+
+let part_max_values: Vec> =
+x.iter().map(|c| c.max_value.clone()).collect();
+let part_min_values: Vec> =
+x.iter().map(|c| c.min_value.clone()).collect();
 
 for &i in projection.iter() {
 null_counts[i] = part_nulls[i].unwrap_or(0);
+if let Some(part_max_value) = part_max_values[i].clone() {
+if let Some(max_value) = &mut max_values[i] {
+let _ = max_value.update(&[part_max_value]);

Review comment:
   This doesn't seem right to me -- it seems like it ignores errors if the 
min or max values can't be updated (maybe due to overflow or data type 
mismatch). I think it might be safer if there is an error calling `update()` to 
reset the `min_values[i]` or `max_values[i]` back to None
   
   Otherwise if you have something like
   ```
   Row Group 1: min=1
   Row Group 2: min=MIN_INT <-- and this causes errors for some reason
   Row Group 3: min=3
   ```
   The code as written will return `min =1` for this column in the parquet file 
which may be incorrect. I think a more conservative approach here would be for 
this code to return `None` so we don't erroneously use these statistics
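
   The conservative folding suggested here can be sketched as follows (in C++ rather than the Rust of parquet.rs; the function name `FoldMin` and the use of `std::optional<int64_t>` are illustrative assumptions). A file-level minimum is accumulated from per-row-group minima, and any missing or unmergeable statistic drops the whole result to "unknown" instead of being silently skipped:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Fold per-row-group minima into a file-level minimum. One unusable
// statistic (nullopt) poisons the result, so the pruning logic never
// sees an over-tight bound like the min=1 example above.
std::optional<int64_t> FoldMin(
    const std::vector<std::optional<int64_t>>& row_group_mins) {
  std::optional<int64_t> acc;
  for (const auto& m : row_group_mins) {
    if (!m.has_value()) return std::nullopt;  // unknown stat -> unknown result
    acc = acc.has_value() ? std::min(*acc, *m) : *m;
  }
  return acc;
}
```

With all statistics present, `FoldMin({1, 3, 2})` yields 1; if any row group's statistic is missing, the result is empty and the file's statistics are not used.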
   








[GitHub] [arrow-datafusion] alamb merged pull request #687: #554: Lead/lag window function with offset and default value arguments

2021-07-14 Thread GitBox


alamb merged pull request #687:
URL: https://github.com/apache/arrow-datafusion/pull/687


   






[GitHub] [arrow-datafusion] alamb closed issue #554: implement lead and lag with 2nd and 3rd argument

2021-07-14 Thread GitBox


alamb closed issue #554:
URL: https://github.com/apache/arrow-datafusion/issues/554


   






[GitHub] [arrow-datafusion] alamb commented on a change in pull request #687: #554: Lead/lag window function with offset and default value arguments

2021-07-14 Thread GitBox


alamb commented on a change in pull request #687:
URL: https://github.com/apache/arrow-datafusion/pull/687#discussion_r669925638



##
File path: datafusion/src/physical_plan/expressions/lead_lag.rs
##
@@ -176,6 +240,28 @@ mod tests {
 .iter()
 .collect::(),
 )?;
+
+test_i32_result(
+lag(
+"lead".to_owned(),
+DataType::Int32,
+Arc::new(Column::new("c3", 0)),
+None,
+Some(ScalarValue::Int32(Some(100))),
+),
+vec![
+Some(100),

Review comment:
    








[GitHub] [arrow-rs] alamb closed issue #554: ArrayData::slice() does not work for nested types such as StructArray

2021-07-14 Thread GitBox


alamb closed issue #554:
URL: https://github.com/apache/arrow-rs/issues/554


   






[GitHub] [arrow-rs] nevi-me commented on issue #554: ArrayData::slice() does not work for nested types such as StructArray

2021-07-14 Thread GitBox


nevi-me commented on issue #554:
URL: https://github.com/apache/arrow-rs/issues/554#issuecomment-880178885


   Thanks for this @alamb, I hadn't noticed that not linking PRs to existing 
issues affects the changelog.






[GitHub] [arrow-rs] alamb merged pull request #389: make slice work for nested types

2021-07-14 Thread GitBox


alamb merged pull request #389:
URL: https://github.com/apache/arrow-rs/pull/389


   






[GitHub] [arrow-rs] nevi-me commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations

2021-07-14 Thread GitBox


nevi-me commented on issue #527:
URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-880177931


   this also relates to #45 






[GitHub] [arrow-rs] alamb opened a new issue #554: ArrayData::slice() does not work for nested types such as StructArray

2021-07-14 Thread GitBox


alamb opened a new issue #554:
URL: https://github.com/apache/arrow-rs/issues/554


   **Describe the bug**
   `ArrayData::slice()` does not work for nested types, because only the 
`ArrayData::buffers` are updated with the new offset and length. This has 
caused a lot of issues in the past.
   
   This blocks us from being able to implement `RecordBatch::slice()`, and has 
led to creating #381 to sidestep this issue.
   
   






[GitHub] [arrow-rs] alamb commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations

2021-07-14 Thread GitBox


alamb commented on issue #527:
URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-880175659


   @Jimexist  no one that I know of is taking this on. If you were interested 
that would be awesome
   
   The reference implementation would probably still be postgres in my mind. 
There is a nice table 
https://www.postgresql.org/docs/current/functions-datetime.html#OPERATORS-DATETIME-TABLE
 that describes all the operators, input types, and output types. I think it 
helps because certain combinations, such as `date + date`, don't really make 
logical sense, so having the reference to guide what is needed would help a lot
   
   @velvia / @andygrove / @jorgecarleitao  any thoughts?
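
   As a rough illustration of the cast-then-add approach discussed earlier in
   this thread (a Python stand-in for the Rust kernels; `add_days` is a
   hypothetical name): date32 stores days since the UNIX epoch, so
   date + duration-in-days reduces to an integer add:

```python
from datetime import date, timedelta

# Sketch of cast(date32 -> i32) -> integer add kernel -> cast back.
EPOCH = date(1970, 1, 1)

def date32_to_i32(d):
    return (d - EPOCH).days  # days since epoch, Arrow's date32 encoding

def i32_to_date32(n):
    return EPOCH + timedelta(days=n)

def add_days(dates, days):
    # The "math add kernel" operates on the casted integer representation.
    return [i32_to_date32(date32_to_i32(d) + days) for d in dates]

print(add_days([date(2021, 7, 14)], 30))  # [datetime.date(2021, 8, 13)]
```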






[GitHub] [arrow] nealrichardson commented on a change in pull request #10624: ARROW-12992: [R] bindings for substr(), substring(), str_sub()

2021-07-14 Thread GitBox


nealrichardson commented on a change in pull request #10624:
URL: https://github.com/apache/arrow/pull/10624#discussion_r669913235



##
File path: r/R/dplyr-functions.R
##
@@ -280,6 +284,81 @@ nse_funcs$str_trim <- function(string, side = c("both", 
"left", "right")) {
   Expression$create(trim_fun, string)
 }
 
+nse_funcs$substr <- function(string, start, stop) {
+  assert_that(
+length(start) == 1,
+msg = "`start` must be length 1 - other lengths are not supported in Arrow"
+  )
+  assert_that(
+length(stop) == 1,
+msg = "`stop` must be length 1 - other lengths are not supported in Arrow"
+  )
+
+  if (start <= 0) {
+start <- 1
+  }
+
+  if (stop < start) {
+stop <- 0

Review comment:
   Can you explain these? It's not obvious why this is correct, and the 
base R code and docs don't discuss these cases.

##
File path: r/R/dplyr-functions.R
##
@@ -280,6 +284,81 @@ nse_funcs$str_trim <- function(string, side = c("both", 
"left", "right")) {
   Expression$create(trim_fun, string)
 }
 
+nse_funcs$substr <- function(string, start, stop) {
+  assert_that(
+length(start) == 1,
+msg = "`start` must be length 1 - other lengths are not supported in Arrow"
+  )
+  assert_that(
+length(stop) == 1,
+msg = "`stop` must be length 1 - other lengths are not supported in Arrow"
+  )
+
+  if (start <= 0) {
+start <- 1
+  }
+
+  if (stop < start) {
+stop <- 0
+  }
+
+  Expression$create(
+"utf8_slice_codeunits",
+string,
+options = list(start = start - 1L, stop = stop)
+  )
+}
+
+nse_funcs$substring <- function(text, first, last = 100L) {

Review comment:
   You could just implement this one by calling nse_funcs$substr. The 
validation messages won't be exactly right because the argument names are 
different, but per the docs this function is only kept for compatibility with 
S, so I don't think it's a big deal.

##
File path: r/R/dplyr-functions.R
##
@@ -280,6 +284,81 @@ nse_funcs$str_trim <- function(string, side = c("both", 
"left", "right")) {
   Expression$create(trim_fun, string)
 }
 
+nse_funcs$substr <- function(string, start, stop) {
+  assert_that(
+length(start) == 1,
+msg = "`start` must be length 1 - other lengths are not supported in Arrow"
+  )
+  assert_that(
+length(stop) == 1,
+msg = "`stop` must be length 1 - other lengths are not supported in Arrow"
+  )
+
+  if (start <= 0) {
+start <- 1
+  }
+
+  if (stop < start) {
+stop <- 0
+  }
+
+  Expression$create(
+"utf8_slice_codeunits",
+string,
+options = list(start = start - 1L, stop = stop)
+  )
+}
+
+nse_funcs$substring <- function(text, first, last = 100L) {
+  assert_that(
+length(first) == 1,
+msg = "`first` must be length 1 - other lengths are not supported in Arrow"
+  )
+  assert_that(
+length(last) == 1,
+msg = "`last` must be length 1 - other lengths are not supported in Arrow"
+  )
+
+  if (first <= 0) {
+first <- 1
+  }
+
+  if (last < first) {
+last <- 0
+  }
+
+  Expression$create(
+"utf8_slice_codeunits",
+text,
+options = list(start = first - 1L, stop = last)
+  )
+}
+
+nse_funcs$str_sub <- function(string, start = 1L, end = -1L) {
+  assert_that(
+length(start) == 1,
+msg = "`start` must be length 1 - other lengths are not supported in Arrow"
+  )
+  assert_that(
+length(end) == 1,
+msg = "`end` must be length 1 - other lengths are not supported in Arrow"
+  )
+
+  if (start == 0) start <- 1
+
+  if (end == -1) end <- .Machine$integer.max
+
+  if (end < start) end <- 0
+
+  if (start > 0) start <- start - 1L

Review comment:
   Why does this version subtract 1 up here but the others don't? Why only 
if start > 0? Is start <= 0 valid?
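
   For reference, a hedged Python model of the stringr-style normalization 
   being discussed (`str_sub_model` is a made-up name; positive positions are 
   1-based, negative positions count from the end):

```python
def str_sub_model(s, start=1, end=-1):
    # Mimics stringr-style indexing: positive positions are 1-based,
    # negative positions count back from the end (-1 == last character).
    n = len(s)
    if start == 0:
        start = 1
    if start > 0:
        start0 = start - 1            # 1-based -> 0-based (the "- 1L" branch)
    else:
        start0 = max(n + start, 0)    # negative start: offset from the end
    if end == -1:
        end0 = n                      # -1 means "through the last character"
    elif end < 0:
        end0 = n + end + 1
    else:
        end0 = end
    if end0 < start0:
        return ""
    return s[start0:end0]

print(str_sub_model("arrow", 2, 3), str_sub_model("arrow", -3))  # rr row
```

   This shows why the subtraction applies only when `start > 0`: a negative 
   start is already an offset from the end, not a 1-based position.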

##
File path: r/R/dplyr-functions.R
##
@@ -391,7 +470,7 @@ nse_funcs$str_split <- function(string, pattern, n = Inf, 
simplify = FALSE) {
 string,
 options = list(
   pattern =
-  opts$pattern,
+opts$pattern,

Review comment:
   I know this is just linting but IDK why opts$pattern is on its own line 
instead of next to = above it.

##
File path: r/R/dplyr-functions.R
##
@@ -280,6 +284,81 @@ nse_funcs$str_trim <- function(string, side = c("both", 
"left", "right")) {
   Expression$create(trim_fun, string)
 }
 
+nse_funcs$substr <- function(string, start, stop) {
+  assert_that(
+length(start) == 1,
+msg = "`start` must be length 1 - other lengths are not supported in Arrow"
+  )
+  assert_that(
+length(stop) == 1,
+msg = "`stop` must be length 1 - other lengths are not supported in Arrow"
+  )
+
+  if (start <= 0) {
+start <- 1
+  }
+
+  if (stop < start) {
+stop <- 0
+  }
+
+  Expression$create(
+"utf8_slice_codeunits",
+string,
+options = list(start = start - 1L, stop = stop)

Review comment:
   Why do we only subtract 1 from start but not stop?
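
   One way to see why only `start` is decremented: R's `substr` is 1-based 
   with an inclusive `stop`, and a 0-based half-open slice `[start-1, stop)` 
   selects exactly the same characters. A quick Python check (illustrative):

```python
def substr_r_style(string, start, stop):
    # R semantics: 1-based start, inclusive stop.
    # Python slice: 0-based start, exclusive stop.
    # [start-1, stop) covers exactly characters start..stop.
    return string[start - 1:stop]

print(substr_r_style("datafusion", 1, 4))   # data
print(substr_r_style("datafusion", 5, 10))  # fusion
```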

##
File path: r/R/dplyr-functions.R

[GitHub] [arrow-datafusion] alamb commented on pull request #716: #699 fix return type conflict when calling builtin math fuctions

2021-07-14 Thread GitBox


alamb commented on pull request #716:
URL: https://github.com/apache/arrow-datafusion/pull/716#issuecomment-880173938


   Any thoughts @Dandandan  or @jorgecarleitao ?






[GitHub] [arrow-datafusion] alamb commented on a change in pull request #716: #699 fix return type conflict when calling builtin math fuctions

2021-07-14 Thread GitBox


alamb commented on a change in pull request #716:
URL: https://github.com/apache/arrow-datafusion/pull/716#discussion_r669916709



##
File path: datafusion/src/execution/context.rs
##
@@ -2364,6 +2365,75 @@ mod tests {
 assert_batches_sorted_eq!(expected, );
 }
 
+#[tokio::test]
+async fn case_builtin_math_expression() {
+let mut ctx = ExecutionContext::new();
+
+let type_values = vec![
+(
+DataType::Int8,
+Arc::new(Int8Array::from(vec![1])) as ArrayRef,
+),
+(
+DataType::Int16,
+Arc::new(Int16Array::from(vec![1])) as ArrayRef,
+),
+(
+DataType::Int32,
+Arc::new(Int32Array::from(vec![1])) as ArrayRef,
+),
+(
+DataType::Int64,
+Arc::new(Int64Array::from(vec![1])) as ArrayRef,
+),
+(
+DataType::UInt8,
+Arc::new(UInt8Array::from(vec![1])) as ArrayRef,
+),
+(
+DataType::UInt16,
+Arc::new(UInt16Array::from(vec![1])) as ArrayRef,
+),
+(
+DataType::UInt32,
+Arc::new(UInt32Array::from(vec![1])) as ArrayRef,
+),
+(
+DataType::UInt64,
+Arc::new(UInt64Array::from(vec![1])) as ArrayRef,
+),
+(
+DataType::Float32,
+Arc::new(Float32Array::from(vec![1.0_f32])) as ArrayRef,
+),
+(
+DataType::Float64,
+Arc::new(Float64Array::from(vec![1.0_f64])) as ArrayRef,
+),
+];
+
+for (data_type, array) in type_values.iter() {
+let schema =
+Arc::new(Schema::new(vec![Field::new("v", data_type.clone(), 
false)]));
+let batch =
+RecordBatch::try_new(schema.clone(), 
vec![array.clone()]).unwrap();
+let provider = MemTable::try_new(schema, 
vec![vec![batch]]).unwrap();
+ctx.register_table("t", Arc::new(provider)).unwrap();
+let expected = vec![
+"+-+",
+"| sqrt(v) |",
+"+-+",
+"| 1   |",
+"+-+",
+];
+let results = plan_and_collect( ctx, "SELECT sqrt(v) FROM t")

Review comment:
   it is a good test. We can finagle the types later if needed (e.g. if 
we ever want to support `sqrt(numeric)` --> `numeric`).
   
   I think this PR makes DataFusion better than it is on master.








[GitHub] [arrow] bkietz commented on pull request #10636: ARROW-13153: [C++] `parquet_dataset` loses ordering of files in `_metadata`

2021-07-14 Thread GitBox


bkietz commented on pull request #10636:
URL: https://github.com/apache/arrow/pull/10636#issuecomment-880169894


   @westonpace  needs rebase






[GitHub] [arrow] bkietz closed pull request #10629: ARROW-13218: [Doc] Document/clarify conventions for timestamp storage

2021-07-14 Thread GitBox


bkietz closed pull request #10629:
URL: https://github.com/apache/arrow/pull/10629


   






[GitHub] [arrow] bkietz closed pull request #10628: ARROW-12364: [Python] [Dataset] Add metadata_collector option to ds.write_dataset()

2021-07-14 Thread GitBox


bkietz closed pull request #10628:
URL: https://github.com/apache/arrow/pull/10628


   






[GitHub] [arrow-rs] alamb commented on issue #349: parquet reading hangs when row_group contains more than 2048 rows of data

2021-07-14 Thread GitBox


alamb commented on issue #349:
URL: https://github.com/apache/arrow-rs/issues/349#issuecomment-880147276


   Thank you @MichaelBitard  for taking the time to report it!






[GitHub] [arrow] bkietz closed pull request #10720: ARROW-13341: [C++][Compute] Fix race condition in ScalarAggregateNode

2021-07-14 Thread GitBox


bkietz closed pull request #10720:
URL: https://github.com/apache/arrow/pull/10720


   






[GitHub] [arrow] lidavidm commented on pull request #10557: ARROW-13064: [C++] Implement select ('case when') function for fixed-width types

2021-07-14 Thread GitBox


lidavidm commented on pull request #10557:
URL: https://github.com/apache/arrow/pull/10557#issuecomment-880137081


   Should be fine. It would certainly trim down the inner loop a lot.






[GitHub] [arrow] bkietz commented on pull request #10557: ARROW-13064: [C++] Implement select ('case when') function for fixed-width types

2021-07-14 Thread GitBox


bkietz commented on pull request #10557:
URL: https://github.com/apache/arrow/pull/10557#issuecomment-880136525


   @lidavidm what would you think about just raising an error for top level 
nulls? It doesn't seem like a useful case to me
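
   For context, `case when` evaluated row-wise picks the value of the first 
   true condition; the open question is what to do when a condition itself is 
   null. A minimal Python model in which top-level nulls raise, as suggested 
   (illustrative only; not the C++ kernel):

```python
def case_when(conds, values, else_value=None):
    # conds: list of per-row condition columns (True / False / None)
    # values: matching list of value columns
    n = len(conds[0])
    out = []
    for row in range(n):
        chosen = else_value
        for cond, val in zip(conds, values):
            c = cond[row]
            if c is None:
                # The behavior under discussion: reject top-level nulls.
                raise ValueError("null condition at top level")
            if c:
                chosen = val[row]
                break
        out.append(chosen)
    return out

print(case_when([[True, False], [False, True]],
                [["a", "a"], ["b", "b"]], "z"))  # ['a', 'b']
```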






[GitHub] [arrow] bkietz commented on a change in pull request #10608: ARROW-13136: [C++] Add coalesce function

2021-07-14 Thread GitBox


bkietz commented on a change in pull request #10608:
URL: https://github.com/apache/arrow/pull/10608#discussion_r669873043



##
File path: cpp/src/arrow/compute/kernels/scalar_if_else.cc
##
@@ -676,7 +677,339 @@ void AddPrimitiveIfElseKernels(const 
std::shared_ptr& scalar_fun
   }
 }
 
-}  // namespace
+// Helper to copy or broadcast fixed-width values between buffers.
+template 
+struct CopyFixedWidth {};
+template <>
+struct CopyFixedWidth {
+  static void CopyScalar(const Scalar& scalar, const int64_t length,
+ uint8_t* raw_out_values, const int64_t out_offset) {
+const bool value = UnboxScalar::Unbox(scalar);
+BitUtil::SetBitsTo(raw_out_values, out_offset, length, value);
+  }
+  static void CopyArray(const DataType&, const uint8_t* in_values,
+const int64_t in_offset, const int64_t length,
+uint8_t* raw_out_values, const int64_t out_offset) {
+arrow::internal::CopyBitmap(in_values, in_offset, length, raw_out_values, 
out_offset);
+  }
+};
+template 
+struct CopyFixedWidth> {
+  using CType = typename TypeTraits::CType;
+  static void CopyScalar(const Scalar& scalar, const int64_t length,
+ uint8_t* raw_out_values, const int64_t out_offset) {
+CType* out_values = reinterpret_cast(raw_out_values);
+const CType value = UnboxScalar::Unbox(scalar);
+std::fill(out_values + out_offset, out_values + out_offset + length, 
value);
+  }
+  static void CopyArray(const DataType&, const uint8_t* in_values,
+const int64_t in_offset, const int64_t length,
+uint8_t* raw_out_values, const int64_t out_offset) {
+std::memcpy(raw_out_values + out_offset * sizeof(CType),
+in_values + in_offset * sizeof(CType), length * sizeof(CType));
+  }
+};
+template 
+struct CopyFixedWidth> {
+  static void CopyScalar(const Scalar& values, const int64_t length,
+ uint8_t* raw_out_values, const int64_t out_offset) {
+const int32_t width =
+checked_cast(*values.type).byte_width();
+uint8_t* next = raw_out_values + (width * out_offset);
+const auto& scalar = checked_cast(values);
+// Scalar may have null value buffer
+if (!scalar.value) return;
+DCHECK_EQ(scalar.value->size(), width);
+for (int i = 0; i < length; i++) {
+  std::memcpy(next, scalar.value->data(), width);
+  next += width;
+}
+  }
+  static void CopyArray(const DataType& type, const uint8_t* in_values,
+const int64_t in_offset, const int64_t length,
+uint8_t* raw_out_values, const int64_t out_offset) {
+const int32_t width = checked_cast(type).byte_width();
+uint8_t* next = raw_out_values + (width * out_offset);
+std::memcpy(next, in_values + in_offset * width, length * width);
+  }
+};
+template 
+struct CopyFixedWidth> {
+  using ScalarType = typename TypeTraits::ScalarType;
+  static void CopyScalar(const Scalar& values, const int64_t length,
+ uint8_t* raw_out_values, const int64_t out_offset) {
+const int32_t width =
+checked_cast(*values.type).byte_width();
+uint8_t* next = raw_out_values + (width * out_offset);
+const auto& scalar = checked_cast(values);
+const auto value = scalar.value.ToBytes();
+for (int i = 0; i < length; i++) {
+  std::memcpy(next, value.data(), width);
+  next += width;
+}
+  }
+  static void CopyArray(const DataType& type, const uint8_t* in_values,
+const int64_t in_offset, const int64_t length,
+uint8_t* raw_out_values, const int64_t out_offset) {
+const int32_t width = checked_cast(type).byte_width();
+uint8_t* next = raw_out_values + (width * out_offset);
+std::memcpy(next, in_values + in_offset * width, length * width);
+  }
+};
+// Copy fixed-width values from a scalar/array datum into an output values 
buffer
+template 
+void CopyValues(const Datum& in_values, const int64_t in_offset, const int64_t 
length,
+uint8_t* out_valid, uint8_t* out_values, const int64_t 
out_offset) {
+  if (in_values.is_scalar()) {
+const auto& scalar = *in_values.scalar();
+if (out_valid) {
+  BitUtil::SetBitsTo(out_valid, out_offset, length, scalar.is_valid);
+}
+CopyFixedWidth::CopyScalar(scalar, length, out_values, out_offset);
+  } else {
+const ArrayData& array = *in_values.array();
+if (out_valid) {
+  if (array.MayHaveNulls()) {
+arrow::internal::CopyBitmap(array.buffers[0]->data(), array.offset + 
in_offset,
+length, out_valid, out_offset);
+  } else {
+BitUtil::SetBitsTo(out_valid, out_offset, length, true);
+  }
+}
+CopyFixedWidth::CopyArray(*array.type, array.buffers[1]->data(),
+array.offset + in_offset, length, 
out_values,
+  
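
   For readers skimming the diff: `coalesce` returns, per row, the first 
   non-null value across its inputs; the patch itself is about doing the 
   buffer copies efficiently. A minimal Python model of the row-wise 
   semantics:

```python
def coalesce(*columns):
    # Row-wise: first non-null value across the inputs, else null (None).
    return [next((v for v in row if v is not None), None)
            for row in zip(*columns)]

print(coalesce([None, 1, None], [7, 8, None], [0, 0, 0]))  # [7, 1, 0]
```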

[GitHub] [arrow] thisisnic commented on pull request #10624: ARROW-12992: [R] bindings for substr(), substring(), str_sub()

2021-07-14 Thread GitBox


thisisnic commented on pull request #10624:
URL: https://github.com/apache/arrow/pull/10624#issuecomment-880129596


   @nealrichardson - have made some updates; please could you re-review this 
when you have a chance?  Tomorrow I'm going to add in the tests for 
warnings/errors raised when incorrect parameter values are supplied but is 
there anything else that needs updating in this PR?






[GitHub] [arrow] bkietz closed pull request #10606: ARROW-13005: [C++] Add support for take implementation on dense union type

2021-07-14 Thread GitBox


bkietz closed pull request #10606:
URL: https://github.com/apache/arrow/pull/10606


   






[GitHub] [arrow] thisisnic commented on pull request #10624: ARROW-12992: [R] bindings for substr(), substring(), str_sub()

2021-07-14 Thread GitBox


thisisnic commented on pull request #10624:
URL: https://github.com/apache/arrow/pull/10624#issuecomment-880113887


   This PR now also contains some unrelated styling changes as I ran styler on 
the files I changed before pushing my changes.






[GitHub] [arrow] lidavidm commented on a change in pull request #10693: ARROW-13224: [Python][Doc] Documentation missing for pyarrow.dataset.write_dataset

2021-07-14 Thread GitBox


lidavidm commented on a change in pull request #10693:
URL: https://github.com/apache/arrow/pull/10693#discussion_r669830370



##
File path: docs/source/python/dataset.rst
##
@@ -456,20 +456,163 @@ is materialized as columns when reading the data and can 
be used for filtering:
 dataset.to_table().to_pandas()
 dataset.to_table(filter=ds.field('year') == 2019).to_pandas()
 
+Another benefit of manually scheduling the files is that the order of the files
+controls the order of the data.  When performing an ordered read (or a read to
+a table) then the rows returned will match the order of the files given.  This
+only applies when the dataset is constructed with a list of files.  There
+are no order guarantees given when the files are instead discovered by scanning

Review comment:
   Ah thanks. Sorry for the churn here @westonpace - we should keep stating 
that there's no guarantee then.








[GitHub] [arrow] bkietz commented on a change in pull request #10693: ARROW-13224: [Python][Doc] Documentation missing for pyarrow.dataset.write_dataset

2021-07-14 Thread GitBox


bkietz commented on a change in pull request #10693:
URL: https://github.com/apache/arrow/pull/10693#discussion_r669825752



##
File path: docs/source/python/dataset.rst
##
@@ -456,20 +456,163 @@ is materialized as columns when reading the data and can 
be used for filtering:
 dataset.to_table().to_pandas()
 dataset.to_table(filter=ds.field('year') == 2019).to_pandas()
 
+Another benefit of manually scheduling the files is that the order of the files
+controls the order of the data.  When performing an ordered read (or a read to
+a table) then the rows returned will match the order of the files given.  This
+only applies when the dataset is constructed with a list of files.  There
+are no order guarantees given when the files are instead discovered by scanning

Review comment:
   We don't guarantee order for selectors because ARROW-8163 (asynchronous 
fragment discovery) might not guarantee order. Lexicographic sorting *could* be 
maintained for synchronous discovery from a selector, but in general we'd want 
to push a fragment into scan as soon as it's yielded by 
`FileSystem::GetFileInfoGenerator`








[GitHub] [arrow-datafusion] alamb commented on issue #723: ABS() function in WHERE clause gives unexpected results

2021-07-14 Thread GitBox


alamb commented on issue #723:
URL: 
https://github.com/apache/arrow-datafusion/issues/723#issuecomment-880076415










[GitHub] [arrow] github-actions[bot] commented on pull request #10720: ARROW-13341: [C++][Compute] Fix race condition in ScalarAggregateNode

2021-07-14 Thread GitBox


github-actions[bot] commented on pull request #10720:
URL: https://github.com/apache/arrow/pull/10720#issuecomment-880075978


   https://issues.apache.org/jira/browse/ARROW-13341






[GitHub] [arrow] bkietz opened a new pull request #10720: ARROW-13341: [C++][Compute] Fix race condition in ScalarAggregateNode

2021-07-14 Thread GitBox


bkietz opened a new pull request #10720:
URL: https://github.com/apache/arrow/pull/10720


   Multiple threads starting DoConsume would already have incremented 
`num_received_`, so if one were delayed another might erroneously begin to 
merge/finalize (leaving invalidated states)
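
   A hedged Python sketch of the invariant behind the fix: a thread must 
   finish its consume work before advancing the received count, so that 
   whichever thread observes the final count can safely finalize 
   (illustrative; not the actual C++ code):

```python
import threading

class Aggregator:
    # Sketch: finalize only once *all* inputs are fully consumed.
    def __init__(self, total):
        self.total = total
        self.count = 0
        self.sum = 0
        self.lock = threading.Lock()
        self.finalized = None

    def consume(self, batch):
        s = sum(batch)                 # do the consume work first...
        with self.lock:
            self.sum += s
            self.count += 1            # ...then publish completion
            if self.count == self.total:
                self.finalized = self.sum  # safe: every batch is merged

agg = Aggregator(total=4)
threads = [threading.Thread(target=agg.consume, args=([i, i],))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(agg.finalized)  # 12
```

   Incrementing the count before doing the work (the bug described above) 
   would let a fast thread see the final count while a slow thread's state 
   is still being written.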






[GitHub] [arrow-datafusion] jgoday commented on pull request #687: #554: Lead/lag window function with offset and default value arguments

2021-07-14 Thread GitBox


jgoday commented on pull request #687:
URL: https://github.com/apache/arrow-datafusion/pull/687#issuecomment-880075811


   > > @Jimexist do you think this PR is ready? Do you need help reviewing ?
   > 
   > looks okay after rebasing.
   
   Hi @Jimexist, just made the rebase from master






[GitHub] [arrow-datafusion] lvheyang commented on issue #723: ABS() function in WHERE clause gives unexpected results

2021-07-14 Thread GitBox


lvheyang commented on issue #723:
URL: 
https://github.com/apache/arrow-datafusion/issues/723#issuecomment-880059465


   @alamb I think the root problem is that some of the scalar functions (such 
as abs / sin / cos / pow) are not monotonic. We cannot prune all the rows when 
`fun(min) < Value` or `fun(max) > Value` is false. Do you have any idea how we 
should deal with such a situation?
   
   A possible way is to maintain a map that marks each scalar function's 
monotonicity. If the function is not monotonic, we would stop building the 
pruning predicate. Do you think that is acceptable? 
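
   The problem can be shown concretely: pruning applies the predicate to a 
   row group's min/max, which is only sound for monotonic functions. An 
   illustrative Python check using `abs`:

```python
def can_prune_monotonic(f, lo, hi, value):
    # Valid ONLY if f is monotonic non-decreasing on [lo, hi]:
    # then f(x) > value for every row iff f(lo) > value.
    return f(lo) > value

rows = [-5, -1, 2]           # a row group with min = -5, max = 2
lo, hi = min(rows), max(rows)

# abs is not monotonic: abs(lo) = 5 > 3 would "prune" the whole group,
# yet abs(-1) = 1 and abs(2) = 2 do satisfy abs(x) <= 3.
pruned = can_prune_monotonic(abs, lo, hi, 3)
survivors = [x for x in rows if abs(x) <= 3]
print(pruned, survivors)  # True [-1, 2]  -> pruning would drop real matches
```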






[GitHub] [arrow] anthonylouisbsb commented on a change in pull request #10711: ARROW-13322: [C++][Gandiva] Add from_unixtime hive function to gandiva

2021-07-14 Thread GitBox


anthonylouisbsb commented on a change in pull request #10711:
URL: https://github.com/apache/arrow/pull/10711#discussion_r669769114



##
File path: cpp/src/gandiva/precompiled/time.cc
##
@@ -841,6 +843,161 @@ gdv_int64 
castBIGINT_daytimeinterval(gdv_day_time_interval in) {
  extractDay_daytimeinterval(in) * MILLIS_IN_DAY;
 }
 
+FORCE_INLINE
+const char* from_unixtime_int64(gdv_int64 context, gdv_timestamp in, 
gdv_int32* out_len) {
+  gdv_int64 year = extractYear_timestamp(in);
+  gdv_int64 month = extractMonth_timestamp(in);
+  gdv_int64 day = extractDay_timestamp(in);
+  gdv_int64 hour = extractHour_timestamp(in);
+  gdv_int64 minute = extractMinute_timestamp(in);
+  gdv_int64 second = extractSecond_timestamp(in);
+
+  static const int kTimeStampStringLen = 19;
+  const int char_buffer_length = kTimeStampStringLen + 1;  // snprintf adds \0
+  char char_buffer[char_buffer_length];
+
+  // yyyy-MM-dd hh:mm:ss
+  int res = snprintf(char_buffer, char_buffer_length,
+ "%04" PRId64 "-%02" PRId64 "-%02" PRId64 " %02" PRId64 
":%02" PRId64
+ ":%02" PRId64,
+ year, month, day, hour, minute, second);
+  if (res < 0) {
+gdv_fn_context_set_error_msg(context, "Could not format the timestamp");
+*out_len = 0;
+return "";
+  }
+
+  *out_len = kTimeStampStringLen;
+
+  if (*out_len <= 0) {
+if (*out_len < 0) {
+  gdv_fn_context_set_error_msg(context, "Length of output string cannot be 
negative");
+}
+*out_len = 0;
+return "";
+  }
+
+  char* ret = reinterpret_cast(gdv_fn_context_arena_malloc(context, 
*out_len));
+  if (ret == nullptr) {
+gdv_fn_context_set_error_msg(context, "Could not allocate memory for 
output string");
+*out_len = 0;
+return "";
+  }
+
+  memcpy(ret, char_buffer, *out_len);
+  return ret;
+}
+
+FORCE_INLINE
+const char* from_unixtime_int64_utf8(gdv_int64 context, gdv_timestamp in,
+ const char* pattern, gdv_int32 
pattern_len,
+ gdv_int32* out_len) {
+  gdv_int64 year = extractYear_timestamp(in);
+  gdv_int64 month = extractMonth_timestamp(in);
+  gdv_int64 day = extractDay_timestamp(in);
+  gdv_int64 hour = extractHour_timestamp(in);
+  gdv_int64 minute = extractMinute_timestamp(in);
+  gdv_int64 second = extractSecond_timestamp(in);
+  gdv_int64 millis = in % MILLIS_IN_SEC;
+
+  if (pattern_len <= 0) {
+gdv_fn_context_set_error_msg(context, "Invalid allowed pattern size");
+*out_len = 0;
+return "";
+  }
+
+  static const int kTimeStampStringLen = pattern_len;
+  const int char_buffer_length = kTimeStampStringLen + 1;  // snprintf adds \0
+  char char_buffer[char_buffer_length];
+
+  const char* regex_format =
+  "y{4}(-[M]{2})?+.*?(-[d]{2})?+.*?( [h]{2})?+.*?"
+  "(:[mm]{2})?+.*?(:[s]{2})?+.*?(.[s]{3})?+.*?";
+  bool match = std::regex_match(pattern, std::regex(regex_format));
+
+  if (!match) {
+gdv_fn_context_set_error_msg(context, "Invalid allowed pattern");
+*out_len = 0;
+return "";
+  }
+
+  // length from pattern
+  int res = 0;
+
+  switch (pattern_len) {
+// yyyy
+case 4:
+  res = snprintf(char_buffer, char_buffer_length, "%04" PRId64, year);
+  break;
+// yyyy-MM
+case 7:
+  res = snprintf(char_buffer, char_buffer_length, "%04" PRId64 "-%02" 
PRId64, year,
+ month);
+  break;
+// yyyy-MM-dd
+case 10:

Review comment:
   What happens if the user defines a pattern like `dd-MM-yyyy`? Will the 
method process it correctly?
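
   The concern can be made concrete: dispatching on pattern length alone 
   cannot distinguish `yyyy-MM-dd` from a reordered `dd-MM-yyyy`, since both 
   are 10 characters. An illustrative Python sketch (`format_by_length` is a 
   made-up name mirroring the switch in the patch):

```python
def format_by_length(pattern, year, month, day):
    # Length-based dispatch, as in the switch under review (sketch only).
    if len(pattern) == 4:
        return f"{year:04d}"
    if len(pattern) == 7:
        return f"{year:04d}-{month:02d}"
    if len(pattern) == 10:
        # Always emits year-month-day order: the pattern text is ignored.
        return f"{year:04d}-{month:02d}-{day:02d}"
    raise ValueError("unsupported pattern")

# Both calls hit the same branch; the field order in the pattern is lost.
print(format_by_length("yyyy-MM-dd", 2021, 7, 14))  # 2021-07-14
print(format_by_length("dd-MM-yyyy", 2021, 7, 14))  # 2021-07-14 (not 14-07-2021)
```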

##
File path: cpp/src/gandiva/precompiled/time.cc
##
@@ -841,6 +843,161 @@ gdv_int64 
castBIGINT_daytimeinterval(gdv_day_time_interval in) {
  extractDay_daytimeinterval(in) * MILLIS_IN_DAY;
 }
 
+FORCE_INLINE
+const char* from_unixtime_int64(gdv_int64 context, gdv_timestamp in, 
gdv_int32* out_len) {
+  gdv_int64 year = extractYear_timestamp(in);
+  gdv_int64 month = extractMonth_timestamp(in);
+  gdv_int64 day = extractDay_timestamp(in);
+  gdv_int64 hour = extractHour_timestamp(in);
+  gdv_int64 minute = extractMinute_timestamp(in);
+  gdv_int64 second = extractSecond_timestamp(in);
+
+  static const int kTimeStampStringLen = 19;
+  const int char_buffer_length = kTimeStampStringLen + 1;  // snprintf adds \0
+  char char_buffer[char_buffer_length];
+
+  // yyyy-MM-dd hh:mm:ss
+  int res = snprintf(char_buffer, char_buffer_length,
+ "%04" PRId64 "-%02" PRId64 "-%02" PRId64 " %02" PRId64 
":%02" PRId64
+ ":%02" PRId64,
+ year, month, day, hour, minute, second);
+  if (res < 0) {
+gdv_fn_context_set_error_msg(context, "Could not format the timestamp");
+*out_len = 0;
+return "";
+  }
+
+  *out_len = kTimeStampStringLen;
+
+  if (*out_len <= 0) {
+if (*out_len < 0) {
+  gdv_fn_context_set_error_msg(context, "Length of output string cannot be 
negative");
+}
+

[GitHub] [arrow-rs] Jimexist commented on pull request #552: use sort_unstable_by in primitive sorting

2021-07-14 Thread GitBox


Jimexist commented on pull request #552:
URL: https://github.com/apache/arrow-rs/pull/552#issuecomment-880037932


   ```
   sort 2^10   time:   [110.68 us 111.64 us 112.55 us]
   change: [-14.710% -13.112% -11.406%] (p = 0.00 < 
0.05)
   Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
 2 (2.00%) high mild
 1 (1.00%) high severe
   
   sort 2^12   time:   [516.32 us 519.97 us 523.58 us]
   change: [-10.932% -9.5913% -8.2390%] (p = 0.00 < 
0.05)
   Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
 2 (2.00%) high mild
 1 (1.00%) high severe
   
   sort nulls 2^10 time:   [88.568 us 89.197 us 89.907 us]
   change: [-20.378% -18.966% -17.484%] (p = 0.00 < 
0.05)
   Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
 4 (4.00%) high mild
   
   sort nulls 2^12 time:   [372.34 us 375.56 us 378.70 us]
   change: [-25.293% -24.019% -22.641%] (p = 0.00 < 
0.05)
   Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
 2 (2.00%) low mild
 1 (1.00%) high mild
 4 (4.00%) high severe
   
   bool sort 2^12  time:   [184.53 us 185.82 us 187.17 us]
   change: [-60.022% -59.238% -58.451%] (p = 0.00 < 
0.05)
   Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
 3 (3.00%) high mild
 3 (3.00%) high severe
   
   bool sort nulls 2^12time:   [210.24 us 213.40 us 216.47 us]
   change: [-54.063% -53.122% -52.165%] (p = 0.00 < 
0.05)
   Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
 2 (2.00%) high mild
 1 (1.00%) high severe
   
   sort 2^12 limit 10  time:   [61.881 us 62.244 us 62.614 us]
   change: [-8.6258% -6.5015% -4.3522%] (p = 0.00 < 
0.05)
   Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
 3 (3.00%) high mild
 5 (5.00%) high severe
   
   sort 2^12 limit 100 time:   [67.314 us 67.729 us 68.205 us]
   change: [-5.5755% -3.5509% -1.4872%] (p = 0.00 < 
0.05)
   Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
 5 (5.00%) high mild
 3 (3.00%) high severe
   
   sort 2^12 limit 1000time:   [178.07 us 179.37 us 180.82 us]
   change: [-2.1034% +0.0621% +2.3026%] (p = 0.96 > 
0.05)
   No change in performance detected.
   Found 13 outliers among 100 measurements (13.00%)
 3 (3.00%) high mild
 10 (10.00%) high severe
   
   sort 2^12 limit 2^12time:   [459.51 us 462.14 us 464.90 us]
   change: [-25.238% -23.036% -21.081%] (p = 0.00 < 
0.05)
   Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
 3 (3.00%) high mild
 3 (3.00%) high severe
   
   sort nulls 2^12 limit 10
   time:   [117.28 us 118.29 us 119.29 us]
   change: [+7.7135% +9.3966% +11.186%] (p = 0.00 < 
0.05)
   Performance has regressed.
   Found 13 outliers among 100 measurements (13.00%)
 6 (6.00%) low mild
 2 (2.00%) high mild
 5 (5.00%) high severe
   
   sort nulls 2^12 limit 100
   time:   [106.53 us 107.89 us 109.50 us]
   change: [-6.4134% -4.0486% -1.6589%] (p = 0.00 < 
0.05)
   Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
 3 (3.00%) high mild
 1 (1.00%) high severe
   
   sort nulls 2^12 limit 1000
   time:   [113.71 us 114.30 us 114.92 us]
   change: [-5.9266% -4.5141% -3.0736%] (p = 0.00 < 
0.05)
   Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
 5 (5.00%) high mild
 2 (2.00%) high severe
   
   sort nulls 2^12 limit 2^12
   time:   [336.84 us 340.39 us 344.08 us]
   change: [-31.546% -30.224% -28.822%] (p = 0.00 < 
0.05)
   Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
 3 (3.00%) high mild
 2 (2.00%) high severe
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow-rs] codecov-commenter commented on pull request #552: use sort_unstable_by in primitive sorting

2021-07-14 Thread GitBox


codecov-commenter commented on pull request #552:
URL: https://github.com/apache/arrow-rs/pull/552#issuecomment-880036383


   # 
[Codecov](https://codecov.io/gh/apache/arrow-rs/pull/552?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#552](https://codecov.io/gh/apache/arrow-rs/pull/552?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (8418f1c) into 
[master](https://codecov.io/gh/apache/arrow-rs/commit/fc78af6324513cc3da9fea8c80658d85dfcd8263?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (fc78af6) will **not change** coverage.
   > The diff coverage is `90.90%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/arrow-rs/pull/552/graphs/tree.svg?width=650=150=pr=pq9V9qWZ1N_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/552?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master #552   +/-   ##
   ===
 Coverage   82.47%   82.47%   
   ===
 Files 167  167   
 Lines   4614246142   
   ===
 Hits3805638056   
 Misses   8086 8086   
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/arrow-rs/pull/552?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[arrow/src/compute/kernels/sort.rs](https://codecov.io/gh/apache/arrow-rs/pull/552/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9zb3J0LnJz)
 | `94.15% <90.90%> (ø)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/arrow-rs/pull/552?src=pr=continue_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/arrow-rs/pull/552?src=pr=footer_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation).
 Last update 
[fc78af6...8418f1c](https://codecov.io/gh/apache/arrow-rs/pull/552?src=pr=lastupdated_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation).
 Read the [comment 
docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pachadotdev commented on a change in pull request #10650: ARROW-13058: This is a draft to provide save-report

2021-07-14 Thread GitBox


pachadotdev commented on a change in pull request #10650:
URL: https://github.com/apache/arrow/pull/10650#discussion_r669741245



##
File path: dev/archery/archery/crossbow/cli.py
##
@@ -233,6 +233,27 @@ def latest_prefix(obj, prefix, fetch):
 click.echo(latest.branch)
 
 
+@crossbow.command()
+@click.argument('job-name', required=True)
+@click.pass_obj
+def save_report_data(obj, job_name):
+"""
+Just print there the state of the job
+"""
+output = obj['output']
+
+queue = obj['queue']
+print(dir(queue))

Review comment:
   not really




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow-rs] Jimexist commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations

2021-07-14 Thread GitBox


Jimexist commented on issue #527:
URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-880028583


   Anyone taking this? Also, is there any reference implementation in other 
languages?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow-datafusion] lvheyang edited a comment on issue #723: ABS() function in WHERE clause gives unexpected results

2021-07-14 Thread GitBox


lvheyang edited a comment on issue #723:
URL: 
https://github.com/apache/arrow-datafusion/issues/723#issuecomment-880027602


   I have reproduced this problem. I found the problem is in 
datafusion/src/physical_optimizer/pruning.rs, the PruningPredicate. 
   
   In its comment :
   
   ```
   /// A pruning predicate is one that has been rewritten in terms of
   /// the min and max values of column references and that evaluates
   /// to FALSE if the filter predicate would evaluate FALSE *for
   /// every row* whose values fell within the min / max ranges (aka
   /// could be pruned).
   ```
   
   It emphasizes that if the pruning predicate evaluates to FALSE, then the 
filter should evaluate to FALSE for **every row**.
   
   In this case, it built the pruning predicate as `abs(c0_min - 
251.10794896957802) < 1`, which unfortunately is not correct. 
   
   Since c0_min is 107.0090813093981, the pruning predicate evaluates to False.
   
   But for the row with value 251.10794896957802, the filter predicate 
evaluates to True. 
   
   So it violates the rule in the comment, which caused the whole row group to 
be omitted, so there is no result when comparing with 1 or 111.
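
   The failure mode is easy to reproduce outside DataFusion. A plain-Python 
sketch (the max value and the row set are made up for illustration): because 
`abs` is not monotonic, a predicate rewritten over `c0_min` alone can be False 
while some row inside the [min, max] range still matches.

```python
# Sketch (not the actual DataFusion pruning code) of why rewriting
# abs(x - k) < 1 as abs(c0_min - k) < 1 is unsafe.
c0_min, c0_max = 107.0090813093981, 300.0  # c0_max is a made-up value
rows = [107.0090813093981, 251.10794896957802, 300.0]  # hypothetical group
k = 251.10794896957802

def row_filter(x):
    return abs(x - k) < 1

# Naive rewrite over the min statistic: claims "no row can match" ...
pruning_predicate = abs(c0_min - k) < 1
assert not pruning_predicate

# ... yet a row in the group does match, so the group is pruned wrongly.
assert any(row_filter(x) for x in rows)

# A safe rule may prune only when the whole [min, max] interval misses
# [k - 1, k + 1], i.e. when c0_max < k - 1 or c0_min > k + 1.
assert not (c0_max < k - 1 or c0_min > k + 1)  # this group must be kept
```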
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ursabot edited a comment on pull request #10608: ARROW-13136: [C++] Add coalesce function

2021-07-14 Thread GitBox


ursabot edited a comment on pull request #10608:
URL: https://github.com/apache/arrow/pull/10608#issuecomment-879986122


   Benchmark runs are scheduled for baseline = 
9c6d4179fefdf995fd0b940a292b81947fe68035 and contender = 
e32cf48c8f5f38ed5bbf69eb5d2ea8eda43d2b98. Results will be available as each 
benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Provided benchmark filters do not have any benchmark 
groups to be executed on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2 
(mimalloc)](https://conbench.ursa.dev/compare/runs/d391a5e50d69453b8c1648f74e28b5e1...0b2618a2ff3f4c8e879c59938408b00f/)
   [Skipped :warning: Only ['Python', 'R'] langs are supported on 
ursa-i9-9960x] [ursa-i9-9960x 
(mimalloc)](https://conbench.ursa.dev/compare/runs/158e7a4062ba424aa32aab6410f2675f...354ae9f639ba47928a0c6e4c3781d22c/)
   [Finished :arrow_down:0.53% :arrow_up:0.05%] [ursa-thinkcentre-m75q 
(mimalloc)](https://conbench.ursa.dev/compare/runs/4df834022c414dd39486a4bbca516589...f2036834887e4b4b8c6ef3cf6b65c8f6/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow-rs] Jimexist opened a new issue #553: use sort_unstable_by in primitive sorting

2021-07-14 Thread GitBox


Jimexist opened a new issue #553:
URL: https://github.com/apache/arrow-rs/issues/553


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   A clear and concise description of what the problem is. Ex. I'm always 
frustrated when [...] 
   (This section helps Arrow developers understand the context and *why* for 
this feature, in addition to  the *what*)
   
   use sort_unstable_by in primitive sorting so that we can have:
   1. better memory footprint
   2. faster sort
   3. consistent stableness behavior with and without `limit`
   
   **Describe the solution you'd like**
   A clear and concise description of what you want to happen.
   
   use sort_unstable_by in primitive sorting
   
   **Describe alternatives you've considered**
   A clear and concise description of any alternative solutions or features 
you've considered.
   
   **Additional context**
   Add any other context or screenshots about the feature request here.
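
   On point 3 (stableness): a stable sort keeps equal keys in their input 
order, while an unstable sort such as Rust's `slice::sort_unstable_by` gives 
no such guarantee (in exchange it avoids the temporary allocation a stable 
merge sort needs). A small illustration of what stability means, using 
Python's `sorted`, which is guaranteed stable:

```python
# Stability illustrated with Python's sorted(), which is guaranteed stable.
# An unstable sort would be free to reorder the tied "a"/"b" entries.
pairs = [("b", 1), ("a", 2), ("b", 3), ("a", 4)]
out = sorted(pairs, key=lambda p: p[0])
# Equal keys keep their original relative order: ("b", 1) before ("b", 3).
assert out == [("a", 2), ("a", 4), ("b", 1), ("b", 3)]
```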
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow-rs] Jimexist opened a new pull request #552: use sort_unstable_by in primitive sorting

2021-07-14 Thread GitBox


Jimexist opened a new pull request #552:
URL: https://github.com/apache/arrow-rs/pull/552


   # Which issue does this PR close?
   
   use 
[`sort_unstable_by`](https://doc.rust-lang.org/std/primitive.slice.html#method.sort_unstable_by)
 in primitive sorting
   
   Closes #.
   
   # Rationale for this change

   1. less memory usage
   2. likely faster
   
   # What changes are included in this PR?
   
   
   
   # Are there any user-facing changes?
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs edited a comment on pull request #10650: ARROW-13058: This is a draft to provide save-report

2021-07-14 Thread GitBox


kszucs edited a comment on pull request #10650:
URL: https://github.com/apache/arrow/pull/10650#issuecomment-880020349


   > nevermind... a spaces problem, which led to
   > 
   > ```
   > ___ ERROR collecting archery/crossbow/tests/test_reports.py 

   > archery/crossbow/tests/test_reports.py:20: in <module>
   > from archery.crossbow.core import yaml
   > archery/crossbow/__init__.py:19: in <module>
   > from .reports import CommentReport, ConsoleReport, EmailReport  # noqa
   > archery/crossbow/reports.py:124: in <module>
   > class JsonReport(Report):
   > archery/crossbow/reports.py:175: in JsonReport
   > self.getJsonTasks()
   > E   NameError: name 'self' is not defined
   > ```
   
   `self` is available in the method's body, but it seems like that line is 
outside of the method. Could you try to indent that line to align with the rest 
of the method's body?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on pull request #10650: ARROW-13058: This is a draft to provide save-report

2021-07-14 Thread GitBox


kszucs commented on pull request #10650:
URL: https://github.com/apache/arrow/pull/10650#issuecomment-880018883


   > I got stuck with this
   > 
   > ```
   > archery/crossbow/tests/test_reports.py:20: in <module>
   > from archery.crossbow.core import yaml
   > archery/crossbow/__init__.py:19: in <module>
   > from .reports import CommentReport, ConsoleReport, EmailReport  # noqa
   > E File 
"/home/pacha/github/arrow/dev/archery/archery/crossbow/reports.py", line 175
   > E   self.getJsonTasks()
   > E  ^
   > E   IndentationError: unindent does not match any outer indentation level
   > ```
   
   This is usually due to an extra or missing whitespace at the beginning of 
the line. Python requires consistent indentation.
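
   A quick way to reproduce this class of error in isolation (illustrative 
snippet, not the actual archery code) is to `compile()` a deliberately 
mis-indented source string:

```python
# Reproducing an "unindent does not match any outer indentation level"
# failure by compiling a mis-indented snippet (class/method names made up
# to echo the traceback above).
bad_src = (
    "class JsonReport:\n"
    "    def __init__(self):\n"
    "      self.tasks = []\n"    # indented 6 spaces
    "   self.getJsonTasks()\n"   # 3 spaces: matches no outer level
)
try:
    compile(bad_src, "<reports.py>", "exec")
    raised = False
except IndentationError:
    raised = True
assert raised  # compile() rejects the inconsistent indentation
```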


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rok commented on pull request #10476: ARROW-12499: [C++][Compute] Add ScalarAggregateOptions to Any and All kernels

2021-07-14 Thread GitBox


rok commented on pull request #10476:
URL: https://github.com/apache/arrow/pull/10476#issuecomment-880017358


   Thanks all!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jonkeane closed pull request #10476: ARROW-12499: [C++][Compute] Add ScalarAggregateOptions to Any and All kernels

2021-07-14 Thread GitBox


jonkeane closed pull request #10476:
URL: https://github.com/apache/arrow/pull/10476


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow-datafusion] Dandandan commented on issue #725: Global limit isn't really limiting parquet file reads and stops earlier

2021-07-14 Thread GitBox


Dandandan commented on issue #725:
URL: 
https://github.com/apache/arrow-datafusion/issues/725#issuecomment-880011030


   I added some form of limit push down to parquet some time ago.
   Might be that it isn't applied to your dataset somehow? Or maybe getting the 
metadata / statistics itself might be slow?
   
   https://github.com/apache/arrow/pull/9672
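
   For reference, the idea behind that limit push down, sketched in plain 
Python rather than the actual DataFusion Parquet reader (function and data 
shapes are made up): stop pulling batches once the limit is satisfied, so 
later row groups are never read.

```python
# Sketch of limit push-down: stop pulling batches once `limit` rows have
# been produced, so the remaining row groups are never touched.
def scan_with_limit(row_groups, limit):
    produced, groups_read = [], 0
    for group in row_groups:        # each group is a list of rows
        groups_read += 1
        produced.extend(group)
        if len(produced) >= limit:  # early exit: skip remaining groups
            break
    return produced[:limit], groups_read

rows, read = scan_with_limit([[1, 2, 3], [4, 5, 6], [7, 8, 9]], limit=4)
assert rows == [1, 2, 3, 4]
assert read == 2  # the third row group was never read
```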


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on pull request #10659: ARROW-12122: [Python] Cannot install via pip M1 mac

2021-07-14 Thread GitBox


kszucs commented on pull request #10659:
URL: https://github.com/apache/arrow/pull/10659#issuecomment-880010837


   @wesm @xhochy could you please verify locally the produced wheels?
   - 
[pyarrow-5.0.0.dev471-cp39-cp39-macosx_11_0_arm64.whl](https://github.com/ursacomputing/crossbow/releases/download/actions-601-github-wheel-macos-big-sur-cp39-arm64/pyarrow-5.0.0.dev471-cp39-cp39-macosx_11_0_arm64.whl)
   - 
[pyarrow-5.0.0.dev471-cp39-cp39-macosx_11_0_universal2.whl](https://github.com/ursacomputing/crossbow/releases/download/actions-601-github-wheel-macos-big-sur-cp39-universal2/pyarrow-5.0.0.dev471-cp39-cp39-macosx_11_0_universal2.whl)
   
   There is a universal2 installer for Python 3.9 at 
https://www.python.org/ftp/python/3.9.6/python-3.9.6-macos11.pkg, which can be 
installed using:
   
   ```bash
   arrow/ci/scripts/install_python.sh macos 3.9
   ```
   
   Verify the `arm64` wheel:
   
   ```bash
   rm -rf arrow/python/dist
   export ARROW_FLIGHT=OFF
   wget -P arrow/python/dist 
https://github.com/ursacomputing/crossbow/releases/download/actions-601-github-wheel-macos-big-sur-cp39-arm64/pyarrow-5.0.0.dev471-cp39-cp39-macosx_11_0_arm64.whl
 
   arch -arm64 arrow/ci/scripts/python_wheel_macos_test.sh
   ```
   
   Verify the `universal2` wheel:
   
   ```bash
   rm -rf arrow/python/dist
   export ARROW_FLIGHT=OFF
   wget -P arrow/python/dist 
https://github.com/ursacomputing/crossbow/releases/download/actions-601-github-wheel-macos-big-sur-cp39-universal2/pyarrow-5.0.0.dev471-cp39-cp39-macosx_11_0_universal2.whl
   arch -arm64 arrow/ci/scripts/python_wheel_macos_test.sh
   arch -x86_64 arrow/ci/scripts/python_wheel_macos_test.sh
   ```
   
   (have not tested the snippets, the paths might be wrong, but the test script 
looks for wheels under `arrow/python/dist`)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pachadotdev commented on pull request #10650: ARROW-13058: This is a draft to provide save-report

2021-07-14 Thread GitBox


pachadotdev commented on pull request #10650:
URL: https://github.com/apache/arrow/pull/10650#issuecomment-880009474


   nevermind... a spaces problem, which led to
   ```
   ___ ERROR collecting archery/crossbow/tests/test_reports.py 

   archery/crossbow/tests/test_reports.py:20: in <module>
   from archery.crossbow.core import yaml
   archery/crossbow/__init__.py:19: in <module>
   from .reports import CommentReport, ConsoleReport, EmailReport  # noqa
   archery/crossbow/reports.py:124: in <module>
   class JsonReport(Report):
   archery/crossbow/reports.py:175: in JsonReport
   self.getJsonTasks()
   E   NameError: name 'self' is not defined
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pachadotdev commented on pull request #10650: ARROW-13058: This is a draft to provide save-report

2021-07-14 Thread GitBox


pachadotdev commented on pull request #10650:
URL: https://github.com/apache/arrow/pull/10650#issuecomment-880008583


   I got stuck with this
   ```
   archery/crossbow/tests/test_reports.py:20: in <module>
   from archery.crossbow.core import yaml
   archery/crossbow/__init__.py:19: in <module>
   from .reports import CommentReport, ConsoleReport, EmailReport  # noqa
   E File 
"/home/pacha/github/arrow/dev/archery/archery/crossbow/reports.py", line 175
   E   self.getJsonTasks()
   E  ^
   E   IndentationError: unindent does not match any outer indentation level
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] lidavidm commented on pull request #10412: ARROW-9430: [C++] Implement replace_with_mask kernel

2021-07-14 Thread GitBox


lidavidm commented on pull request #10412:
URL: https://github.com/apache/arrow/pull/10412#issuecomment-880001407


   Merged, thanks. This should unblock ARROW-9431 if you do still plan to look 
at it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] lidavidm closed pull request #10412: ARROW-9430: [C++] Implement replace_with_mask kernel

2021-07-14 Thread GitBox


lidavidm closed pull request #10412:
URL: https://github.com/apache/arrow/pull/10412


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow-rs] MichaelBitard commented on issue #349: parquet reading hangs when row_group contains more than 2048 rows of data

2021-07-14 Thread GitBox


MichaelBitard commented on issue #349:
URL: https://github.com/apache/arrow-rs/issues/349#issuecomment-87541


   Oops, you are right, sorry. 
   
   If I generate the sample.parquet with the latest version, it no longer 
hangs during reading.
   
   Thanks for noticing and sorry again!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nirandaperera commented on pull request #10412: ARROW-9430: [C++] Implement replace_with_mask kernel

2021-07-14 Thread GitBox


nirandaperera commented on pull request #10412:
URL: https://github.com/apache/arrow/pull/10412#issuecomment-879991878


   > I think I've addressed all the feedback here.
   
   I'm +1 for this!  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on a change in pull request #10650: ARROW-13058: This is a draft to provide save-report

2021-07-14 Thread GitBox


kszucs commented on a change in pull request #10650:
URL: https://github.com/apache/arrow/pull/10650#discussion_r669719045



##
File path: dev/archery/archery/crossbow/reports.py
##
@@ -121,6 +121,61 @@ def show(self, outstream, asset_callback=None):
asset))
 
 
+class JsonReport(Report):

Review comment:
   You can run unittest by 
   
   ```bash
   cd arrow/dev/archery
   pip install -e .[crossbow]  # install archery in develop/editable mode
   pytest -sv archery/crossbow/tests/test_reports.py  # execute specific 
archery unittests
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ursabot commented on pull request #10608: ARROW-13136: [C++] Add coalesce function

2021-07-14 Thread GitBox


ursabot commented on pull request #10608:
URL: https://github.com/apache/arrow/pull/10608#issuecomment-879986122


   Benchmark runs are scheduled for baseline = 9c6d4179fefdf995fd0b940a292b81947fe68035 and contender = e32cf48c8f5f38ed5bbf69eb5d2ea8eda43d2b98. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/d391a5e50d69453b8c1648f74e28b5e1...0b2618a2ff3f4c8e879c59938408b00f/)
   [Skipped :warning: Only ['Python', 'R'] langs are supported on ursa-i9-9960x] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/158e7a4062ba424aa32aab6410f2675f...354ae9f639ba47928a0c6e4c3781d22c/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/4df834022c414dd39486a4bbca516589...f2036834887e4b4b8c6ef3cf6b65c8f6/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   






[GitHub] [arrow] lidavidm commented on pull request #10608: ARROW-13136: [C++] Add coalesce function

2021-07-14 Thread GitBox


lidavidm commented on pull request #10608:
URL: https://github.com/apache/arrow/pull/10608#issuecomment-879985810


   @ursabot please benchmark lang=C++






[GitHub] [arrow] lidavidm commented on pull request #10412: ARROW-9430: [C++] Implement replace_with_mask kernel

2021-07-14 Thread GitBox


lidavidm commented on pull request #10412:
URL: https://github.com/apache/arrow/pull/10412#issuecomment-879985272


   I think I've addressed all the feedback here.






[GitHub] [arrow] kszucs commented on a change in pull request #10650: ARROW-13058: This is a draft to provide save-report

2021-07-14 Thread GitBox


kszucs commented on a change in pull request #10650:
URL: https://github.com/apache/arrow/pull/10650#discussion_r669711765



##########
File path: dev/archery/archery/crossbow/reports.py
##########
@@ -121,6 +121,61 @@ def show(self, outstream, asset_callback=None):
asset))
 
 
+class JsonReport(Report):
+
+    HEADER = textwrap.dedent("""
+        Arrow Build Report for Job {job_name}
+
+        All tasks: {all_tasks_url}
+    """)
+
+    TASK = textwrap.dedent("""
+      - {name}:
+        URL: {url}
+    """).strip()
+
+    STATUS_HEADERS = {
+        # from CombinedStatus
+        'error': 'Errored Tasks:',
+        'failure': 'Failed Tasks:',
+        'pending': 'Pending Tasks:',
+        'success': 'Succeeded Tasks:',
+    }
+
+    def __init__(self, job):
+        super().__init__(job)
+
+    def url(self, query):
+        repo_url = self.job.queue.remote_url.strip('.git')
+        return '{}/branches/all?query={}'.format(repo_url, query)
+
+    def todayStr(self):
+        date = datetime.utcnow()
+        return "{}-{}-{}".format(date.year, date.month, date.day)
+
+    def tasksToDict(self, date, tasks):
+        jsonTasks = []
+        for task_name, task in tasks.items():
+            jsonTasks.append({

Review comment:
   Do we plan to store the log as well, or just the metadata about the build?








[GitHub] [arrow] lidavidm commented on a change in pull request #10663: ARROW-13253: [FlightRPC][C++] Fix segfault with large messages

2021-07-14 Thread GitBox


lidavidm commented on a change in pull request #10663:
URL: https://github.com/apache/arrow/pull/10663#discussion_r669705163



##########
File path: cpp/src/arrow/flight/serialization_internal.cc
##########
@@ -201,9 +193,7 @@ grpc::Status FlightDataSerialize(const FlightPayload& msg, ByteBuffer* out,
   // Write the descriptor if present
   int32_t descriptor_size = 0;
   if (msg.descriptor != nullptr) {
-    if (msg.descriptor->size() > kInt32Max) {
-      return ToGrpcStatus(Status::CapacityError("Descriptor size overflow (>= 2**31)"));

Review comment:
   See https://github.com/grpc/grpc/issues/26695








[GitHub] [arrow] thisisnic commented on a change in pull request #10624: ARROW-12992: [R] bindings for substr(), substring(), str_sub()

2021-07-14 Thread GitBox


thisisnic commented on a change in pull request #10624:
URL: https://github.com/apache/arrow/pull/10624#discussion_r669142885



##########
File path: r/src/compute.cpp
##########
@@ -316,6 +316,19 @@ std::shared_ptr<arrow::compute::FunctionOptions> make_compute_options(
     return std::make_shared<Options>(max_splits, reverse);
   }
 
+  if (func_name == "utf8_slice_codeunits") {
+    using Options = arrow::compute::SliceOptions;
+    int64_t start = 1;
+    int64_t stop = -1;

Review comment:
   Sorry, I didn't explain this properly, my fault! What I mean is that `stop` has been set to `-1` here, but the default C++ value isn't `-1`, so the default value here probably shouldn't be `-1` either.
   
   Check out this line of code, which shows the default value of `stop` in the C++: https://github.com/apache/arrow/blob/7114c4b6fdb639b3500d77cfd66649af8c5c5e6b/cpp/src/arrow/compute/api_scalar.h#L205-L206
   
   The default value of `stop` is `std::numeric_limits<int64_t>::max()`, which is the biggest int64. So if you don't supply a value for `stop` and just use the default, you can slice to the end of the string.
   
   In some of the other functions in this file, this line has been used to set the default values to the ones from the C++ code:
   `std::make_shared<Options>(Options::Defaults());`
   
   Instead of manually setting `stop` to `-1`, using the line above might make it easier to fix some of the failing tests, since you'll then be able to slice to the end of a string. If this doesn't make sense, let me know!
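   To make the default concrete, here is a tiny self-contained Python sketch; the `SliceOptions` class below is a hypothetical mirror of the C++ struct for illustration, not Arrow's actual API. It shows why a `stop` defaulting to the largest int64 slices to the end of the string, while a hard-coded `-1` does not:
   
   ```python
   from dataclasses import dataclass
   
   INT64_MAX = 2**63 - 1  # mirrors std::numeric_limits<int64_t>::max()
   
   @dataclass
   class SliceOptions:
       # Hypothetical mirror of the C++ defaults discussed above.
       start: int = 0
       stop: int = INT64_MAX
       step: int = 1
   
   def slice_codeunits(s: str, opts: SliceOptions) -> str:
       # Clamp stop to the string length, as the kernel effectively does.
       stop = min(opts.stop, len(s))
       return s[opts.start:stop:opts.step]
   
   # With the defaults, an unspecified stop slices to the end of the string.
   assert slice_codeunits("hello", SliceOptions()) == "hello"
   # A hard-coded stop of -1 would drop the last character instead.
   assert "hello"[0:-1] == "hell"
   ```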








[GitHub] [arrow-datafusion] Jimexist opened a new issue #725: Global limit isn't really limiting parquet file reads and stops earlier

2021-07-14 Thread GitBox


Jimexist opened a new issue #725:
URL: https://github.com/apache/arrow-datafusion/issues/725


   **Describe the bug**
   
   When given a global limit:
   
   ```sql
   select * from some_large_data limit 50;
   ```
   
   even with `-c` set to 100, i.e. a small batch size, the global limit isn't really reducing the running time.
   
   **To Reproduce**
   Steps to reproduce the behavior:
   
   This is easily reproducible given a large data set.
   
   

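   For reference, a short-circuiting global limit, sketched generically in Python (the names below are illustrative stand-ins, not DataFusion's actual operators), would stop pulling batches from the scan as soon as enough rows have been produced:
   
   ```python
   class CountingScan:
       """Stand-in for a parquet scan: yields batches lazily and counts reads."""
       def __init__(self, num_batches, batch_size):
           self.num_batches = num_batches
           self.batch_size = batch_size
           self.reads = 0
       def __iter__(self):
           for _ in range(self.num_batches):
               self.reads += 1
               yield list(range(self.batch_size))
   
   def global_limit(batches, limit):
       # A short-circuiting limit: stop pulling batches from upstream
       # once `limit` rows have been produced.
       it = iter(batches)
       produced = []
       while len(produced) < limit:
           batch = next(it, None)
           if batch is None:
               break
           produced.extend(batch[: limit - len(produced)])
       return produced
   
   scan = CountingScan(num_batches=1000, batch_size=100)
   rows = global_limit(scan, limit=50)
   assert len(rows) == 50
   assert scan.reads == 1  # only one batch read, not all 1000
   ```
   
   The bug report suggests the current behavior reads far more batches than this before stopping.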





[GitHub] [arrow] kszucs commented on a change in pull request #10650: ARROW-13058: This is a draft to provide save-report

2021-07-14 Thread GitBox


kszucs commented on a change in pull request #10650:
URL: https://github.com/apache/arrow/pull/10650#discussion_r669699660



##########
File path: dev/archery/archery/crossbow/reports.py
##########
@@ -121,6 +121,61 @@ def show(self, outstream, asset_callback=None):
asset))
 
 
+class JsonReport(Report):

Review comment:
   We don't have many tests yet for crossbow, but this looks like one we 
could actually test.
   
   You can [reuse the test case including the fixture](https://github.com/apache/arrow/blob/master/dev/archery/archery/crossbow/tests/test_reports.py#L24-L35) from `crossbow/test_reports.py`.
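   As a starting point, here is a self-contained sketch of the kind of assertion such a test could make; the class and field names below are stand-ins for illustration, not crossbow's actual API:
   
   ```python
   import json
   
   class Task:
       """Minimal stand-in for a crossbow task, for illustration only."""
       def __init__(self, branch, status):
           self.branch = branch
           self.status = status
   
   def tasks_to_dicts(date, tasks):
       # Mirrors the tasksToDict idea from the diff: one JSON-serializable
       # record per task, keyed by task name.
       return [
           {"date": date, "name": name, "branch": task.branch, "status": task.status}
           for name, task in tasks.items()
       ]
   
   def test_tasks_to_dicts():
       tasks = {"wheel-manylinux": Task("build-123", "success")}
       records = tasks_to_dicts("2021-7-14", tasks)
       # The report must round-trip through JSON without loss.
       assert json.loads(json.dumps(records)) == records
       assert records[0]["name"] == "wheel-manylinux"
   
   test_tasks_to_dicts()
   ```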







