[GitHub] [arrow-rs] sunchao commented on a change in pull request #443: parquet: improve BOOLEAN writing logic and report error on encoding fail
sunchao commented on a change in pull request #443: URL: https://github.com/apache/arrow-rs/pull/443#discussion_r651247944

## File path: parquet/src/data_type.rs

```
@@ -661,8 +661,15 @@ pub(crate) mod private {
         _: W,
         bit_writer: BitWriter,
     ) -> Result<()> {
+        if bit_writer.bytes_written() + values.len() >= bit_writer.capacity() {
```

Review comment: Hmm, I'm sorry, but can you elaborate on why the unit of `values` is also bytes?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on a change in pull request #561: Fix pruning on not equal predicate
alamb commented on a change in pull request #561: URL: https://github.com/apache/arrow-datafusion/pull/561#discussion_r651282910

## File path: datafusion/src/physical_optimizer/pruning.rs

```
@@ -553,12 +553,14 @@ fn build_predicate_expression(
     let corrected_op = expr_builder.correct_operator(op);
     let statistics_expr = match corrected_op {
         Operator::NotEq => {
-            // column != literal => (min, max) = literal => min > literal || literal > max
+            // column != literal => (min, max) = literal =>
+            // !(min != literal && max != literal) ==>
+            // min != literal || literal != max
             let min_column_expr = expr_builder.min_column_expr()?;
             let max_column_expr = expr_builder.max_column_expr()?;
             min_column_expr
-                .gt(expr_builder.scalar_expr().clone())
-                .or(expr_builder.scalar_expr().clone().gt(max_column_expr))
+                .not_eq(expr_builder.scalar_expr().clone())
+                .or(expr_builder.scalar_expr().clone().not_eq(max_column_expr))
```

Review comment: The way I think about it: the predicate is effectively `!(col == min && col == max)`, which, when you apply De Morgan's law, gives `(col != min) || (col != max)`.
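To make the rule discussed in this review concrete, here is a small Python sketch (not the DataFusion code; `may_contain_not_eq` is a hypothetical helper name) of how min/max statistics prune a `col != literal` predicate: a row group can be skipped only when every value it holds equals the literal, i.e. when `min == max == literal`.

```python
def may_contain_not_eq(min_val, max_val, literal):
    # Keep (i.e. cannot prune) a row group for `col != literal` unless
    # every value in it equals the literal, which requires min == max == literal.
    return min_val != literal or max_val != literal

# A row group spanning [5, 5] holds only the value 5, so `col != 5` can prune it.
assert may_contain_not_eq(5, 5, 5) is False
# A row group spanning [1, 5] may hold values other than 5, so it must be kept.
assert may_contain_not_eq(1, 5, 5) is True
```

Note that the earlier `min > literal || literal > max` form was the containment test for `col == literal`, which is why swapping `gt` for `not_eq` changes which row groups survive.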
[GitHub] [arrow] lidavidm commented on pull request #10520: ARROW-12709: [C++] Add binary_join_element_wise
lidavidm commented on pull request #10520: URL: https://github.com/apache/arrow/pull/10520#issuecomment-861035926

The null handling behavior never affects the separator: it only describes how to handle the other values. It's intended to let you mimic libcudf.

```
binary_join_element_wise(['a', 'b', 'c', 'h'],
                         ['d', null, 'e', 'i'],
                         ['f', 'g', null, 'j'],
                         [';', '-', ',', null])
EMIT_NULL: ['a;d;f', null, null, null]
SKIP:      ['a;d;f', 'b-g', 'c,e', null]  # b and g are joined; the middle null value is ignored
REPLACE(replacement=foo): ['a;d;f', 'b-foo-g', 'c,e,foo', null]
```
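The three modes can be emulated in a few lines of Python (a sketch of the semantics described above, not the Arrow C++ implementation; `join_element_wise` is a hypothetical helper name):

```python
def join_element_wise(columns, separators, null_handling="emit_null", replacement=None):
    """Row-wise string join; a null separator always yields null, in every mode."""
    out = []
    for row, sep in zip(zip(*columns), separators):
        if sep is None:
            out.append(None)                    # separator null -> result null
        elif null_handling == "emit_null" and None in row:
            out.append(None)                    # any null value poisons the row
        elif null_handling == "skip":
            out.append(sep.join(v for v in row if v is not None))
        else:                                   # "replace"
            out.append(sep.join(replacement if v is None else v for v in row))
    return out
```

Running it on the example data from the comment reproduces the EMIT_NULL, SKIP, and REPLACE outputs shown above.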
[GitHub] [arrow] ianmcook commented on pull request #10520: ARROW-12709: [C++] Add binary_join_element_wise
ianmcook commented on pull request #10520: URL: https://github.com/apache/arrow/pull/10520#issuecomment-861046053

Oh, I see; I was confused about what happens when the separator is an array; I see now. Thank you!
[GitHub] [arrow-datafusion] Jimexist commented on a change in pull request #520: Implement window functions with `order_by` clause
Jimexist commented on a change in pull request #520: URL: https://github.com/apache/arrow-datafusion/pull/520#discussion_r651361200

## File path: datafusion/src/physical_plan/windows.rs

```
@@ -156,31 +162,72 @@ impl WindowExpr for BuiltInWindowExpr {
         self.window.expressions()
     }

-    fn create_accumulator() -> Result> {
-        self.window.create_accumulator()
+    fn partition_by() -> &[Arc] {
+        _by
+    }
+
+    fn order_by() -> &[PhysicalSortExpr] {
+        _by
+    }
+
+    fn evaluate(, batch: ) -> Result {
+        // FIXME, for now we assume all the rows belong to the same partition, which will not be the
+        // case when partition_by is supported, in which case we'll parallelize the calls.
+        // See https://github.com/apache/arrow-datafusion/issues/299
+        let values = self.evaluate_args(batch)?;
+        self.window.evaluate(batch.num_rows(), )
     }
 }

 /// A window expr that takes the form of an aggregate function
 #[derive(Debug)]
 pub struct AggregateWindowExpr {
     aggregate: Arc,
+    partition_by: Vec>,
+    order_by: Vec,
+    window_frame: Option,
 }

-#[derive(Debug)]
-struct AggregateWindowAccumulator {
-    accumulator: Box,
-}
+impl AggregateWindowExpr {
+    /// the aggregate window function operates based on window frame, and by default the mode is
+    /// "range".
+    fn evaluation_mode() -> WindowFrameUnits {
+        self.window_frame.unwrap_or_default().units
+    }

-impl WindowAccumulator for AggregateWindowAccumulator {
-    fn scan( self, values: &[ScalarValue]) -> Result> {
-        self.accumulator.update(values)?;
-        Ok(None)
+    /// create a new accumulator based on the underlying aggregation function
+    fn create_accumulator() -> Result {
+        let accumulator = self.aggregate.create_accumulator()?;
+        Ok(AggregateWindowAccumulator { accumulator })
     }

-    /// returns its value based on its current state.
-    fn evaluate() -> Result> {
-        Ok(Some(self.accumulator.evaluate()?))
+    /// peer based evaluation based on the fact that batch is pre-sorted given the sort columns
+    /// and then per partition point we'll evaluate the peer group (e.g. SUM or MAX gives the same
+    /// results for peers) and concatenate the results.
+    fn peer_based_evaluate(, batch: ) -> Result {
```

Review comment: Since this is a private function, I guess I can leave the naming part for later changes.
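The "peer" idea in the doc comment above can be sketched in plain Python (a hypothetical helper, assuming a batch already sorted on the order-by keys, as the real code requires): rows that tie on the sort key form a peer group, and every row in the group receives the same running-aggregate value.

```python
def peer_based_running_max(sort_keys, values):
    # Walk the pre-sorted batch, find each peer group (runs of equal sort keys),
    # fold the group into the running aggregate, and emit one value per peer row.
    out, running, i = [], None, 0
    while i < len(values):
        j = i
        while j < len(values) and sort_keys[j] == sort_keys[i]:
            j += 1                                   # j is one past the peer group
        group = max(values[i:j])
        running = group if running is None else max(running, group)
        out.extend([running] * (j - i))              # peers share one result
        i = j
    return out
```

For example, with sort keys `[1, 1, 2, 3]` and values `[3, 5, 4, 1]`, the two rows keyed `1` are peers and both see the group result `5`, which then carries forward.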
[GitHub] [arrow-datafusion] codecov-commenter edited a comment on pull request #558: Implement window functions with `partition_by` clause
codecov-commenter edited a comment on pull request #558: URL: https://github.com/apache/arrow-datafusion/pull/558#issuecomment-860569767

# Codecov Report

> Merging #558 (4a7a499) into master (e3e7e29) will **decrease** coverage by `0.03%`.
> The diff coverage is `81.01%`.

```diff
@@            Coverage Diff             @@
##           master     #558      +/-   ##
==========================================
- Coverage   76.12%   76.08%   -0.04%
==========================================
  Files         156      156
  Lines       27074    27121      +47
==========================================
+ Hits        20609    20635      +26
- Misses       6465     6486      +21
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| datafusion/src/physical\_plan/window\_functions.rs | `86.42% <ø> (+0.71%)` | :arrow_up: |
| datafusion/src/sql/planner.rs | `84.75% <ø> (ø)` | |
| datafusion/src/physical\_plan/planner.rs | `79.84% <33.33%> (+2.30%)` | :arrow_up: |
| datafusion/src/physical\_plan/mod.rs | `80.00% <70.96%> (+0.90%)` | :arrow_up: |
| datafusion/src/physical\_plan/windows.rs | `82.59% <75.00%> (-3.88%)` | :arrow_down: |
| ...afusion/src/physical\_plan/expressions/nth\_value.rs | `79.41% <75.67%> (-11.07%)` | :arrow_down: |
| datafusion/src/execution/context.rs | `92.13% <100.00%> (+0.13%)` | :arrow_up: |
| ...fusion/src/physical\_plan/expressions/row\_number.rs | `94.28% <100.00%> (+13.03%)` | :arrow_up: |
| datafusion/src/physical\_plan/hash\_aggregate.rs | `86.54% <100.00%> (ø)` | |
[GitHub] [arrow-datafusion] Jimexist commented on pull request #520: Implement window functions with `order_by` clause
Jimexist commented on pull request #520: URL: https://github.com/apache/arrow-datafusion/pull/520#issuecomment-861088442

> This looks very nice @Jimexist -- I went over the code and saw only goodness :)
>
> All that this PR needs to be mergeable in my opinion is to reset the Cargo `arrow*` references (now that arrow 4.3.0 has been released)

Thank you for taking the time to review. The changes to arrow are now reverted.
[GitHub] [arrow-datafusion] Jimexist edited a comment on pull request #520: Implement window functions with `order_by` clause
Jimexist edited a comment on pull request #520: URL: https://github.com/apache/arrow-datafusion/pull/520#issuecomment-861088442

> This looks very nice @Jimexist -- I went over the code and saw only goodness :)
>
> All that this PR needs to be mergeable in my opinion is to reset the Cargo `arrow*` references (now that arrow 4.3.0 has been released)

Thank you for taking the time to review. The changes to the arrow references are now reverted.
[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware
rok commented on a change in pull request #10457: URL: https://github.com/apache/arrow/pull/10457#discussion_r651290261

## File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc

```
@@ -143,39 +142,202 @@ TEST(ScalarTemporalTest, TestTemporalComponentExtractionWithDifferentUnits) {
     CheckScalarUnary("quarter", unit, times, int64(), quarter);
     CheckScalarUnary("hour", unit, times, int64(), hour);
     CheckScalarUnary("minute", unit, times, int64(), minute);
-    CheckScalarUnary("second", unit, times, float64(), second);
+    CheckScalarUnary("second", unit, times, int64(), second);
     CheckScalarUnary("millisecond", unit, times, int64(), zeros);
     CheckScalarUnary("microsecond", unit, times, int64(), zeros);
     CheckScalarUnary("nanosecond", unit, times, int64(), zeros);
     CheckScalarUnary("subsecond", unit, times, float64(), zeros);
   }
 }

-TEST(ScalarTemporalTest, TestZonedTemporalComponentExtraction) {
-  std::string timezone = "Asia/Kolkata";
-  const char* times = R"(["1970-01-01T00:00:59", null])";
+TEST(ScalarTemporalTest, TestZoned1) {
+  const char* times =
+      R"(["1970-01-01T00:00:59.123456789","2000-02-29T23:23:23.9",
+          "1899-01-01T00:59:20.001001001","2033-05-18T03:33:20.0",
+          "2020-01-01T01:05:05.001", "2019-12-31T02:10:10.002",
+          "2019-12-30T03:15:15.003", "2009-12-31T04:20:20.004132",
+          "2010-01-01T05:25:25.005321", "2010-01-03T06:30:30.006163",
+          "2010-01-04T07:35:35", "2006-01-01T08:40:40", "2005-12-31T09:45:45",
+          "2008-12-28", "2008-12-29", "2012-01-01 01:02:03", null])";
+  auto unit = timestamp(TimeUnit::NANO, "Pacific/Marquesas");
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+  auto year =
+      "[1969, 2000, 1898, 2033, 2019, 2019, 2019, 2009, 2009, 2010, 2010, 2005, 2005, "
+      "2008, 2008, 2011, null]";
+  auto month = "[12, 2, 12, 5, 12, 12, 12, 12, 12, 1, 1, 12, 12, 12, 12, 12, null]";
+  auto day = "[31, 29, 31, 17, 31, 30, 29, 30, 31, 2, 3, 31, 31, 27, 28, 31, null]";
+  auto day_of_week = "[2, 1, 5, 1, 1, 0, 6, 2, 3, 5, 6, 5, 5, 5, 6, 5, null]";
+  auto day_of_year =
+      "[365, 60, 365, 137, 365, 364, 363, 364, 365, 2, 3, 365, 365, 362, 363, 365, null]";
+  auto iso_year =
+      "[1970, 2000, 1898, 2033, 2020, 2020, 2019, 2009, 2009, 2009, 2009, 2005, 2005, "
+      "2008, 2008, 2011, null]";
+  auto iso_week = "[1, 9, 52, 20, 1, 1, 52, 53, 53, 53, 53, 52, 52, 52, 52, 52, null]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 1970, "iso_week": 1, "iso_day_of_week": 3},
+                        {"iso_year": 2000, "iso_week": 9, "iso_day_of_week": 2},
+                        {"iso_year": 1898, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2033, "iso_week": 20, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2019, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 3},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 4},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 6},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 7},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2011, "iso_week": 52, "iso_day_of_week": 6}, null])");
+  auto quarter = "[4, 1, 4, 2, 4, 4, 4, 4, 4, 1, 1, 4, 4, 4, 4, 4, null]";
+  auto hour = "[14, 13, 15, 18, 15, 16, 17, 18, 19, 21, 22, 23, 0, 14, 14, 15, null]";
+  auto minute = "[30, 53, 41, 3, 35, 40, 45, 50, 55, 0, 5, 10, 15, 30, 30, 32, null]";
+  auto second = "[59, 23, 20, 20, 5, 10, 15, 20, 25, 30, 35, 40, 45, 0, 0, 3, null]";
+  auto millisecond = "[123, 999, 1, 0, 1, 2, 3, 4, 5, 6, 0, 0, 0, 0, 0, 0, null]";
+  auto microsecond = "[456, 999, 1, 0, 0, 0, 0, 132, 321, 163, 0, 0, 0, 0, 0, 0, null]";
+  auto nanosecond = "[789, 999, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, null]";
+  auto subsecond =
+      "[0.123456789, 0.9, 0.001001001, 0.0, 0.001, 0.002, 0.003, 0.004132, "
+      "0.005321, 0.006163, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, null]";

-  for (auto u : internal::AllTimeUnits()) {
-    auto unit = timestamp(u, timezone);
-    auto timestamps = ArrayFromJSON(unit, times);
-
-    ASSERT_RAISES(Invalid, Year(timestamps));
-
```
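The first expected row in the test data above can be spot-checked with the Python standard library (a sketch using a fixed UTC-09:30 offset for Pacific/Marquesas, which happens to be correct for that zone; the real kernel consults the tz database):

```python
from datetime import datetime, timedelta, timezone

# Pacific/Marquesas sits at a fixed -09:30 offset, so 1970-01-01T00:00:59Z
# lands on the previous local day, matching the expected year/hour/minute columns.
marquesas = timezone(timedelta(hours=-9, minutes=-30))
local = datetime(1970, 1, 1, 0, 0, 59, tzinfo=timezone.utc).astimezone(marquesas)

assert (local.year, local.month, local.day) == (1969, 12, 31)
assert (local.hour, local.minute, local.second) == (14, 30, 59)
```

This is exactly the behavior the PR is after: field extraction yields local-time components rather than the UTC ones.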
[GitHub] [arrow-datafusion] Jimexist commented on a change in pull request #520: Implement window functions with `order_by` clause
Jimexist commented on a change in pull request #520: URL: https://github.com/apache/arrow-datafusion/pull/520#discussion_r651360294

## File path: datafusion/src/physical_plan/expressions/nth_value.rs

```
@@ -113,54 +111,32 @@ impl BuiltInWindowFunctionExpr for NthValue {
     }

-    fn create_accumulator() -> Result> {
-        Ok(Box::new(NthValueAccumulator::try_new(
-            self.kind,
-            self.data_type.clone(),
-        )?))
-    }
-}
-
-#[derive(Debug)]
-struct NthValueAccumulator {
-    kind: NthValueKind,
-    offset: u32,
-    value: ScalarValue,
-}
-
-impl NthValueAccumulator {
-    /// new count accumulator
-    pub fn try_new(kind: NthValueKind, data_type: DataType) -> Result {
-        Ok(Self {
-            kind,
-            offset: 0,
-            // null value of that data_type by default
-            value: ScalarValue::try_from(_type)?,
-        })
-    }
-}
-
-impl WindowAccumulator for NthValueAccumulator {
-    fn scan( self, values: &[ScalarValue]) -> Result> {
-        self.offset += 1;
-        match self.kind {
-            NthValueKind::Last => {
-                self.value = values[0].clone();
-            }
-            NthValueKind::First if self.offset == 1 => {
-                self.value = values[0].clone();
-            }
-            NthValueKind::Nth(n) if self.offset == n => {
-                self.value = values[0].clone();
-            }
-            _ => {}
+    fn evaluate(, num_rows: usize, values: &[ArrayRef]) -> Result {
+        if values.is_empty() {
+            return Err(DataFusionError::Execution(format!(
+                "No arguments supplied to {}",
+                self.name()
+            )));
         }
-
-        Ok(None)
-    }
-
-    fn evaluate() -> Result> {
-        Ok(Some(self.value.clone()))
+        let value = [0];
+        if value.len() != num_rows {
+            return Err(DataFusionError::Execution(format!(
+                "Invalid data supplied to {}, expect {} rows, got {} rows",
+                self.name(),
+                num_rows,
+                value.len()
+            )));
+        }
+        if num_rows == 0 {
```

Review comment: But you are right, this would _not_ be passed with 0 length input; this check is just being pedantic.
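The semantics the new `evaluate` implements, FIRST_VALUE / LAST_VALUE / NTH_VALUE over one window partition, can be sketched in a few lines (plain Python with a hypothetical helper name; NTH_VALUE is 1-indexed and yields null when out of range, as in SQL):

```python
def nth_value(values, kind, n=None):
    # One window partition's worth of values; mirrors the NthValueKind variants.
    if not values:
        raise ValueError("No arguments supplied")   # analogous to the Execution error above
    if kind == "first":
        return values[0]
    if kind == "last":
        return values[-1]
    if kind == "nth":                               # 1-indexed, like SQL NTH_VALUE
        return values[n - 1] if 1 <= n <= len(values) else None
    raise ValueError(f"unknown kind: {kind}")
```

For example, `nth_value([10, 20, 30], "nth", n=2)` picks the second value of the partition, while an out-of-range `n` yields null rather than an error.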
[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware
rok commented on a change in pull request #10457: URL: https://github.com/apache/arrow/pull/10457#discussion_r651241888

## File path: python/pyarrow/tests/test_compute.py

```
@@ -1212,6 +1212,80 @@ def test_strptime():
     assert got == expected

+def _check_datetime_components(ts):
+    tsa = [pa.array(ts)]
+    subseconds = ((ts.microsecond * 10**3 + ts.nanosecond) *
+                  10**-9).values.round(9)
+    iso_calendar_fields = [
+        pa.field('iso_year', pa.int64()),
+        pa.field('iso_week', pa.int64()),
+        pa.field('iso_day_of_week', pa.int64())
+    ]
+    iso_calendar = pa.StructArray.from_arrays([
+        ts.isocalendar()["year"].astype(int),
+        ts.isocalendar()["week"].astype(int),
+        ts.isocalendar()["day"].astype(int)],
+        fields=iso_calendar_fields)
+
+    assert pc.call_function("year", tsa).equals(pa.array(ts.year))
```

Review comment: Done.
[GitHub] [arrow] github-actions[bot] commented on pull request #10530: ARROW-13072: [C++] Add bit-wise arithmetic kernels
github-actions[bot] commented on pull request #10530: URL: https://github.com/apache/arrow/pull/10530#issuecomment-860958622

https://issues.apache.org/jira/browse/ARROW-13072
[GitHub] [arrow] rok commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware
rok commented on pull request #10457: URL: https://github.com/apache/arrow/pull/10457#issuecomment-861007346

> I think there was no opposition for this one (i.e. that the field extraction should yield local hour/minute/etc), so I don't think there is a need to close the PR.

Huh, I forget exactly what I was thinking there and you appear to be right. I think the main remaining task is the tz db issue. I'll take a look at it tomorrow.
[GitHub] [arrow] rok commented on a change in pull request #10476: ARROW-12499: [C++][Compute] Add ScalarAggregateOptions to Any and All kernels
rok commented on a change in pull request #10476: URL: https://github.com/apache/arrow/pull/10476#discussion_r651318115

## File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc

```
@@ -166,32 +168,48 @@ struct BooleanAnyImpl : public ScalarAggregator {
   Status MergeFrom(KernelContext*, KernelState&& src) override {
     const auto& other = checked_cast(src);
     this->any |= other.any;
+    this->has_nulls |= other.has_nulls;
     return Status::OK();
   }

-  Status Finalize(KernelContext*, Datum* out) override {
-    out->value = std::make_shared(this->any);
+  Status Finalize(KernelContext* ctx, Datum* out) override {
+    if (!options.skip_nulls && !this->any && this->has_nulls) {
```

Review comment: I documented the new behavior in [compute.rst](https://github.com/apache/arrow/pull/10476/files#diff-ce5b94577014735990903d3d03bd4ea4b8c8e6d32f5227592e60b7dd6a912d59) and [api_aggregate.h](https://github.com/apache/arrow/pull/10476/files#diff-694a635f185cf0d22dd6eefd91acedb48ed0bf71b15fd591aa6e7c508ba4cb09).
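A sketch of the `skip_nulls` behavior the diff implements (plain Python, not the C++ kernel; `any_agg` is a hypothetical helper name): with `skip_nulls=False`, "any" over values containing nulls is null unless a true was seen, following three-valued logic.

```python
def any_agg(values, skip_nulls=True):
    # Mirrors BooleanAnyImpl: track whether a true was seen and whether a null
    # was seen, then combine the two flags at finalize time.
    saw_true = any(v is True for v in values)
    has_nulls = any(v is None for v in values)
    if not skip_nulls and not saw_true and has_nulls:
        return None   # unknown: the nulls could have hidden a true
    return saw_true
```

The interesting case is `[False, None]` with `skip_nulls=False`: the result is null, because the null might have been true; once any true is present, the nulls no longer matter.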
[GitHub] [arrow-rs] Jimexist commented on pull request #448: Use partition for bool sort
Jimexist commented on pull request #448: URL: https://github.com/apache/arrow-rs/pull/448#issuecomment-861078094

I guess 1024 is too short an array, so the benchmark is dominated by memory allocation rather than by sorting.
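The partition idea behind the PR title can be sketched briefly (a sketch of the approach, not the arrow-rs code; `sort_bools` is a hypothetical helper name): a boolean column has only two distinct values, so sorting reduces to a stable partition of falses before trues, one O(n) pass instead of a comparison sort.

```python
def sort_bools(values, descending=False):
    # Stable two-way partition: O(n), no element-to-element comparisons.
    falses = [v for v in values if not v]
    trues = [v for v in values if v]
    return trues + falses if descending else falses + trues
```

On a 1024-element input the pass itself is trivial, which is consistent with the observation above that allocation, not sorting, dominates at that size.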
[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #561: Fix pruning on not equal predicate
Dandandan commented on a change in pull request #561: URL: https://github.com/apache/arrow-datafusion/pull/561#discussion_r651243037

## File path: datafusion/src/physical_optimizer/pruning.rs

```
@@ -553,12 +553,14 @@ fn build_predicate_expression(
     let corrected_op = expr_builder.correct_operator(op);
     let statistics_expr = match corrected_op {
         Operator::NotEq => {
-            // column != literal => (min, max) = literal => min > literal || literal > max
+            // column != literal => (min, max) = literal =>
+            // !(min != literal && max != literal) ==>
+            // min != literal || literal != max
            let min_column_expr = expr_builder.min_column_expr()?;
             let max_column_expr = expr_builder.max_column_expr()?;
             min_column_expr
-                .gt(expr_builder.scalar_expr().clone())
-                .or(expr_builder.scalar_expr().clone().gt(max_column_expr))
+                .not_eq(expr_builder.scalar_expr().clone())
+                .or(expr_builder.scalar_expr().clone().not_eq(max_column_expr))
```

Review comment: Shouldn't this be `and`?
[GitHub] [arrow-datafusion] adsharma edited a comment on issue #533: Add extension plugin to parse SQL into logical plan
adsharma edited a comment on issue #533: URL: https://github.com/apache/arrow-datafusion/issues/533#issuecomment-860999601

I don't have much context about the proposal. Trying to understand things better. Please bear with me.

The reason why SQL doesn't have `select * from t version as of n` is that it violates the isolation property in ACID, by allowing a query to read data from two different points in time (or logical sequence numbers, if you prefer). Instead, they prefer a flow such as:

```
BEGIN  // implicitly selects a version and all queries following would read from that version
select * from t1 union all select * from t2;
COMMIT
```

I can also imagine a variant such as:

```
BEGIN TRANSACTION @n1
...
COMMIT
```

which has the same effect. The benefit of these variants is that it makes it harder to write SQL that violates isolation properties.
[GitHub] [arrow-datafusion] adsharma commented on issue #533: Add extension plugin to parse SQL into logical plan
adsharma commented on issue #533: URL: https://github.com/apache/arrow-datafusion/issues/533#issuecomment-860999601

I don't have much context about the proposal. Trying to understand things better. Please bear with me.

The reason why SQL doesn't have `select * from t version as of n` is that it violates the isolation property in ACID, by allowing a query to read data from two different points in time (or logical sequence numbers, if you prefer). Instead, they prefer a flow such as:

```
BEGIN  // implicitly selects a version and all queries following would read from that version
select * from t1 union all select * from t2;
COMMIT
```

I can also imagine a variant such as:

```
BEGIN TRANSACTION @n1
...
COMMIT
```

which has the same effect. The benefit of these variants is that it makes it harder to write SQL that violates isolation properties.
[GitHub] [arrow] edponce commented on issue #10502: AttributeError: module 'pyarrow.lib' has no attribute '_Weakrefable'
edponce commented on issue #10502: URL: https://github.com/apache/arrow/issues/10502#issuecomment-861009813

@bhargav-inthezone The Kaggle notebook [installs the latest pyarrow by default](https://github.com/Kaggle/docker-python/blob/main/Dockerfile#L347), but it seems the Docker image was created with Arrow 3.0.0. What version of Arrow do you have installed? Does the issue persist with Arrow 4.0.1?

I installed `vaex` in a local Jupyter notebook and a Google Colab notebook and it works fine. I have not been able to test in a Kaggle notebook as I do not have an active account (...yet).
[GitHub] [arrow] bhargav-inthezone commented on issue #10502: AttributeError: module 'pyarrow.lib' has no attribute '_Weakrefable'
bhargav-inthezone commented on issue #10502: URL: https://github.com/apache/arrow/issues/10502#issuecomment-861147363

I had to uninstall pyarrow before installing Vaex before, but I think Kaggle fixed the problem. Now it's working with installing just Vaex in the notebook.
[GitHub] [arrow] p5a0u9l opened a new issue #10531: How to resolve `pyarrow.deserialize` FutureWarning
p5a0u9l opened a new issue #10531: URL: https://github.com/apache/arrow/issues/10531

I'd like to resolve this warning:

```
/usr/lib/python3.7/asyncio/events.py:88: FutureWarning: 'pyarrow.deserialize' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead.
```

I'm calling `pyarrow.deserialize` on a bytes-like object that was initially a dictionary, but then stored in the plasma store with `client.put` and retrieved with

```python
buff = plasma_client.get_buffers([object_id])[0]
reader = pyarrow.BufferReader(buff)
data = reader.read()
```

Starting with the bytes in `data`, is there a different method than `pyarrow.deserialize` to retrieve the original dictionary?

Thank you for arrow! Happy to move this to JIRA if you'd prefer.
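Since the warning itself suggests pickle or Arrow IPC as replacements: for a plain Python dictionary, a minimal pickle-based round trip looks like this (a sketch; the plasma put/get calls are elided and only the byte payload is shown, with a hypothetical `record`):

```python
import pickle

record = {"name": "example", "values": [1, 2, 3]}   # hypothetical payload

payload = pickle.dumps(record)     # bytes you would store in the object store
restored = pickle.loads(payload)   # in place of pyarrow.deserialize(data)

assert restored == record
```

Note this only works for newly written objects; bytes already written with the deprecated `pyarrow.serialize` still need `pyarrow.deserialize` to read back.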
[GitHub] [arrow] BryanCutler commented on pull request #10513: ARROW-13044: [Java] Change UnionVector and DenseUnionVector to extend AbstractContainerVector
BryanCutler commented on pull request #10513: URL: https://github.com/apache/arrow/pull/10513#issuecomment-861157094

merged to master
[GitHub] [arrow] BryanCutler closed pull request #10513: ARROW-13044: [Java] Change UnionVector and DenseUnionVector to extend AbstractContainerVector
BryanCutler closed pull request #10513: URL: https://github.com/apache/arrow/pull/10513
[GitHub] [arrow-rs] jorgecarleitao commented on issue #455: Not able to read nano-second timestamp columns in 1.0 parquet files written by pyarrow
jorgecarleitao commented on issue #455: URL: https://github.com/apache/arrow-rs/issues/455#issuecomment-861181763

This may be related to which priority we give when converting, the parquet schema or the arrow schema. I would expect pyarrow to write them in a consistent manner though, so, as @nevi-me mentioned, an arrow schema in ns with a parquet schema in ms does seem odd.
[GitHub] [arrow-datafusion] Jimexist opened a new issue #562: publish datafusion-cli to brew repo so that macOS users can install easily
Jimexist opened a new issue #562: URL: https://github.com/apache/arrow-datafusion/issues/562 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] (This section helps Arrow developers understand the context and *why* for this feature, in addition to the *what*) publish datafusion-cli to brew repo so that macOS users can install easily **Describe the solution you'd like** A clear and concise description of what you want to happen. **Describe alternatives you've considered** A clear and concise description of any alternative solutions or features you've considered. **Additional context** Add any other context or screenshots about the feature request here.
[GitHub] [arrow] cyb70289 commented on a change in pull request #10530: ARROW-13072: [C++] Add bit-wise arithmetic kernels
cyb70289 commented on a change in pull request #10530: URL: https://github.com/apache/arrow/pull/10530#discussion_r651405645

## File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

```diff
@@ -397,6 +397,130 @@ struct PowerChecked {
   }
 };

+// Bitwise operations
+
+struct BitWiseNot {
+  template <typename T, typename Arg>
+  static T Call(KernelContext*, Arg arg, Status*) {
+    return ~arg;
+  }
+};
+
+struct BitWiseAnd {
+  template <typename T, typename Arg0, typename Arg1>
+  static T Call(KernelContext*, Arg0 lhs, Arg1 rhs, Status*) {
+    return lhs & rhs;
+  }
+};
+
+struct BitWiseOr {
+  template <typename T, typename Arg0, typename Arg1>
+  static T Call(KernelContext*, Arg0 lhs, Arg1 rhs, Status*) {
+    return lhs | rhs;
+  }
+};
+
+struct BitWiseXor {
+  template <typename T, typename Arg0, typename Arg1>
+  static T Call(KernelContext*, Arg0 lhs, Arg1 rhs, Status*) {
+    return lhs ^ rhs;
+  }
+};
+
+struct ShiftLeftLogical {
+  template <typename T, typename Arg0, typename Arg1>
+  static T Call(KernelContext*, Arg0 lhs, Arg1 rhs, Status*) {
+    static_assert(std::is_same<T, Arg0>::value, "");
+    return lhs << rhs;
+  }
+};
+
+// See SEI CERT C Coding Standard rule INT34-C
+struct ShiftLeftLogicalChecked {
+  template <typename T, typename Arg0, typename Arg1>
+  static enable_if_unsigned_integer<T> Call(KernelContext*, Arg0 lhs, Arg1 rhs,
+                                            Status* st) {
+    static_assert(std::is_same<T, Arg0>::value, "");
+    if (ARROW_PREDICT_FALSE(rhs < 0)) {
+      *st = Status::Invalid("Both operands must be non-negative");
+      return lhs;
+    }
+    if (ARROW_PREDICT_FALSE(rhs >= std::numeric_limits<T>::digits)) {
+      *st = Status::Invalid("overflow");
+      return lhs;
+    }
+    return lhs << rhs;
+  }
+
+  template <typename T, typename Arg0, typename Arg1>
+  static enable_if_signed_integer<T> Call(KernelContext*, Arg0 lhs, Arg1 rhs,
+                                          Status* st) {
+    static_assert(std::is_same<T, Arg0>::value, "");
+    if (ARROW_PREDICT_FALSE(lhs < 0 || rhs < 0)) {
+      *st = Status::Invalid("Both operands must be non-negative");
+      return lhs;
+    }
+    if (ARROW_PREDICT_FALSE(rhs >= std::numeric_limits<T>::digits)) {
+      *st = Status::Invalid("overflow");
+      return lhs;
+    }
+    if (ARROW_PREDICT_FALSE(lhs > (std::numeric_limits<T>::max() >> rhs))) {
+      *st = Status::Invalid("overflow");
+      return lhs;
+    }
+    return lhs << rhs;
+  }
+};
+
+struct ShiftRightArithmetic {
+  template <typename T, typename Arg0, typename Arg1>
+  static T Call(KernelContext*, Arg0 lhs, Arg1 rhs, Status*) {
+    using Signed = typename std::make_signed<T>::type;
+    static_assert(std::is_same<T, Arg0>::value, "");
+    // N.B. this is implementation-defined but GCC and MSVC document
+    // this as arithmetic right shift. References:
+    // https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html#Integers-implementation
+    // https://docs.microsoft.com/en-us/cpp/cpp/left-shift-and-right-shift-operators-input-and-output?view=msvc-160#right-shifts
+    // Clang doesn't document their behavior.
+    return static_cast<T>(static_cast<Signed>(lhs) >> rhs);
+  }
+};
```

Review comment (on `ShiftRightArithmetic`): For unsigned char 128, an arithmetic right shift by 1 bit yields 192, which looks a bit surprising. Arithmetic right shift seems more meaningful for signed integers, and logical right shift for unsigned integers.

Review comment (on `ShiftLeftLogical`, quoted above): Though it's not a checked kernel, I think it's still necessary to make sure no UB can happen.
[GitHub] [arrow-rs] emkornfield commented on issue #455: Not able to read nano-second timestamp columns in 1.0 parquet files written by pyarrow
emkornfield commented on issue #455: URL: https://github.com/apache/arrow-rs/issues/455#issuecomment-861130175 > @emkornfield do you know what we should do on the Rust side to roundtrip the file correctly? Sorry, I would have to dig in the code to have a better understanding. In general we try to avoid int96 because it is a deprecated type. If this is an issue with pyarrow can we open a separate bug to track that?
[GitHub] [arrow] frmnboi commented on issue #10488: Passing back and forth from Python and C++ with Pyarrow C++ extension and pybind11.
frmnboi commented on issue #10488: URL: https://github.com/apache/arrow/issues/10488#issuecomment-861180963 After running python with debug symbols in GDB, Here is the relevant part of the GDB backtrace with directory names redacted: ``` #0 __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:314 #1 0x7fffcf18a9d2 in arrow::BufferBuilder::UnsafeAppend (this=0x7fffd100, data=0x7fffcff0, length=8) at SOURCEPATH/Python_venv_3.8.5/lib/python3.8/site-packages/pyarrow/include/arrow/buffer_builder.h:136 #2 0x7fffcf195871 in arrow::TypedBufferBuilder::UnsafeAppend (this=0x7fffd100, value=0.10351288056206089) at SOURCEPATH/Python_venv_3.8.5/lib/python3.8/site-packages/pyarrow/include/arrow/buffer_builder.h:232 #3 0x7fffcf1902c0 in arrow::NumericBuilder::UnsafeAppend (this=0x7fffd070, val=0.10351288056206089) at SOURCEPATH/Python_venv_3.8.5/lib/python3.8/site-packages/pyarrow/include/arrow/array/builder_primitive.h:265 #4 0x7fffcf18b455 in vol_adj_close (close=..., volume=...) at CODEPATH/Database/helperfuncs.cpp:94 ``` in addition to: ``` Thread 1 "python" received signal SIGSEGV, Segmentation fault. __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:314 314 ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory. ``` As suspected, the issue is in **UnsafeAppend**, but I'm not sure why this file is missing in my install/system. Do I need to build pyarrow from source to get this to install? It looks like it is a reference to an X86 AVX-512 SIMD command, as referenced [here](https://arrow.apache.org/docs/format/Columnar.html?highlight=avx). The missing file in question can presumably be found publicly online at places like [this](https://sources.debian.org/src/glibc/2.28-10/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S/). I think part of the problem may be that I have a 3rd gen Ryzen processor, which according to some reports does not support AVX-512. 
I can't really tell whether it isn't working because of hardware limitations, or because the capability and file exist but I am not linking all the dependencies I need.
[GitHub] [arrow-rs] alippai commented on pull request #453: Add C data interface for decimal128
alippai commented on pull request #453: URL: https://github.com/apache/arrow-rs/pull/453#issuecomment-860251015
[GitHub] [arrow-rs] alippai opened a new issue #454: Add missing kernels for decimal128
alippai opened a new issue #454: URL: https://github.com/apache/arrow-rs/issues/454 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Decimal128 type doesn't have the basic kernels like add, take, filter **Describe the solution you'd like** A basic set of operations available
[GitHub] [arrow-datafusion] alamb merged pull request #539: Refactor hash aggregates's planner building code
alamb merged pull request #539: URL: https://github.com/apache/arrow-datafusion/pull/539
[GitHub] [arrow-rs] codecov-commenter commented on pull request #443: parquet: improve BOOLEAN writing logic and report error on encoding fail
codecov-commenter commented on pull request #443: URL: https://github.com/apache/arrow-rs/pull/443#issuecomment-860050867

# Codecov Report

> Merging #443 (eb11b2b) into master (0c00776) will **increase** coverage by `0.00%`.
> The diff coverage is `81.81%`.

```diff
@@           Coverage Diff           @@
##           master     #443   +/-   ##
=======================================
  Coverage   82.71%   82.71%
=======================================
  Files         163      163
  Lines       44795    44805    +10
=======================================
+ Hits        37051    37061    +10
  Misses       7744     7744
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| parquet/src/data_type.rs | `77.64% <60.00%> (-0.27%)` | :arrow_down: |
| parquet/src/util/bit_util.rs | `93.14% <100.00%> (+0.07%)` | :arrow_up: |
| parquet/src/encodings/encoding.rs | `95.04% <0.00%> (+0.19%)` | :arrow_up: |
| arrow/src/array/transform/boolean.rs | `84.61% <0.00%> (+7.69%)` | :arrow_up: |

Last update 0c00776...eb11b2b.
[GitHub] [arrow-datafusion] Jimexist opened a new issue #551: support for automatic value promotion for aggregation functions
Jimexist opened a new issue #551: URL: https://github.com/apache/arrow-datafusion/issues/551 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] (This section helps Arrow developers understand the context and *why* for this feature, in addition to the *what*) for now, ```sql select sum(c5) from test; ``` will err: ``` > select sum(c5) from test; thread 'tokio-runtime-worker' panicked at 'attempt to add with overflow', /Users/jiayu_liu/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/arith.rs:107:1 ArrowError(ExternalError(Canceled)) ``` even avg will do the same because it depends on sum. **Describe the solution you'd like** A clear and concise description of what you want to happen. have automatic value promotion into wider type **Describe alternatives you've considered** A clear and concise description of any alternative solutions or features you've considered. **Additional context** Add any other context or screenshots about the feature request here.
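The panic and the proposed value promotion can be reproduced outside DataFusion (a sketch; variable names are illustrative):

```rust
fn main() {
    let values: Vec<i32> = vec![i32::MAX, 1];

    // `values.iter().sum::<i32>()` would panic in a debug build with
    // "attempt to add with overflow" -- the error quoted in the issue.

    // Option 1: promote the accumulator to a wider type before summing.
    let wide: i64 = values.iter().map(|&v| i64::from(v)).sum();
    assert_eq!(wide, i64::from(i32::MAX) + 1);

    // Option 2: detect the overflow instead of panicking.
    let checked: Option<i32> =
        values.iter().try_fold(0i32, |acc, &v| acc.checked_add(v));
    assert_eq!(checked, None); // overflow detected, no panic
}
```

An aggregate that promotes `i32` inputs to an `i64` (or wider) accumulator sidesteps the panic for `sum`, and `avg` would inherit the fix since it builds on `sum`.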
[GitHub] [arrow-datafusion] Dandandan opened a new pull request #559: Push down filter through UNION
Dandandan opened a new pull request #559: URL: https://github.com/apache/arrow-datafusion/pull/559 # Which issue does this PR close? Closes #557 # Rationale for this change # What changes are included in this PR? Filter is pushed down through union (all), so it can be pushed down further towards other operations like table scans. # Are there any user-facing changes? FYI @nevi-me
[GitHub] [arrow-rs] nevi-me closed issue #193: JSON reader does not implement iterator
nevi-me closed issue #193: URL: https://github.com/apache/arrow-rs/issues/193
[GitHub] [arrow-datafusion] codecov-commenter edited a comment on pull request #552: mv register_schema() to impl
codecov-commenter edited a comment on pull request #552: URL: https://github.com/apache/arrow-datafusion/pull/552#issuecomment-860214243
[GitHub] [arrow-datafusion] Jimexist opened a new pull request #545: turn on clippy rule for needless borrow
Jimexist opened a new pull request #545: URL: https://github.com/apache/arrow-datafusion/pull/545 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing changes?
[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #548: reuse code for now function expr creation
codecov-commenter commented on pull request #548: URL: https://github.com/apache/arrow-datafusion/pull/548#issuecomment-860131843

# Codecov Report

> Merging #548 (b38ef9a) into master (8f4078d) will **decrease** coverage by `0.00%`.
> The diff coverage is `73.33%`.

```diff
@@            Coverage Diff             @@
##           master     #548      +/-   ##
==========================================
- Coverage   76.09%   76.08%   -0.01%
==========================================
  Files         156      156
  Lines       27048    27041       -7
==========================================
- Hits        20581    20574       -7
  Misses       6467     6467
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| datafusion/src/physical_plan/functions.rs | `92.66% <73.33%> (-0.04%)` | :arrow_down: |

Last update 8f4078d...b38ef9a.
[GitHub] [arrow-datafusion] houqp edited a comment on pull request #55: Support qualified columns in queries
houqp edited a comment on pull request #55: URL: https://github.com/apache/arrow-datafusion/pull/55#issuecomment-860150632 @andygrove @Dandandan @jorgecarleitao @alamb @Jimexist this PR is now ready for review. We are now able to pass both tpch-7 and tpch-8. I filed https://github.com/apache/arrow-datafusion/issues/549 to track compound column support as a fast-follow task, since this PR is already very large and keeps growing every time I rebase on top of the latest master. If you all think I should include compound column support in this PR, I am more than happy to add it. Output field names are not yet 100% conforming to https://github.com/apache/arrow-datafusion/blob/master/docs/specification/output-field-name-semantic.md. The remaining differences are all minor formatting issues, like converting function names to lower case, so I decided to leave that to follow-up PRs to reduce the diff.
[GitHub] [arrow-datafusion] jorgecarleitao commented on issue #533: Add extension plugin to parse SQL into logical plan
jorgecarleitao commented on issue #533: URL: https://github.com/apache/arrow-datafusion/issues/533#issuecomment-860156321 I also do not see the issue with the example above, but I would say that, in general, custom SQL parsers effectively modify the SQL dialect that is being used. The responsibility to document variations, including any limitations they may introduce relative to the "default" postgres dialect, therefore lies with the applications that use/install custom parsers.
[GitHub] [arrow-rs] jorgecarleitao commented on a change in pull request #439: [WIP] FFI bridge for Schema, Field and DataType
jorgecarleitao commented on a change in pull request #439: URL: https://github.com/apache/arrow-rs/pull/439#discussion_r650633827

## File path: arrow/src/ffi.rs

```diff
@@ -208,15 +213,25 @@ impl FFI_ArrowSchema {
     pub fn name(&self) -> &str {
         assert!(!self.name.is_null());
         // safe because the lifetime of `self.name` equals `self`
-        unsafe { CStr::from_ptr(self.name) }.to_str().unwrap()
+        unsafe { CStr::from_ptr(self.name) }
+            .to_str()
+            .expect("The external API has a non-utf8 as name")
+    }
+
+    pub fn flags(&self) -> Option<Flags> {
+        Flags::from_bits(self.flags)
     }

     pub fn child(&self, index: usize) -> &Self {
         assert!(index < self.n_children as usize);
-        assert!(!self.name.is_null());
+        // assert!(!self.name.is_null());
```

Review comment: why was this commented out?
[GitHub] [arrow-datafusion] Jimexist commented on pull request #550: Upgrade arrow 430
Jimexist commented on pull request #550: URL: https://github.com/apache/arrow-datafusion/pull/550#issuecomment-860197215 > I plan to release arrow 4.3 to crates.io tomorrow (process is underway -- see https://lists.apache.org/thread.html/r5bfdcf0e069b5df14477b1ac6cd3584e5ee0b0be2d0d4bd18db8f9ab%40%3Cdev.arrow.apache.org%3E) > > Once I release to crates.io this change is not necessary (any local checkout will just need to do `cargo update` and will update to arrow 4.3.0) and other datafusion users can use whatever version of arrow that is in Cargo.lock > > Sorry that this is confusing / slow. It's okay, not slow at all. I was just trying to see if things would break. Thanks for the work of releasing!
[GitHub] [arrow-rs] nevi-me commented on issue #455: Not able to read nano-second timestamp columns in 1.0 parquet files written by pyarrow
nevi-me commented on issue #455: URL: https://github.com/apache/arrow-rs/issues/455#issuecomment-860299977 I'd have expected the ns timestamp to be written to `int96` as that is the legacy nanosecond timestamp format. I'll also dig into this to see, maybe if the format is `1.0` then we should coerce the timestamp to a different resolution, but I'd also see this as a pyarrow quirk that could be documented somewhere. @emkornfield do you know what we should do on the Rust side to roundtrip the file correctly?
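If a coercion to a coarser resolution is chosen for 1.0-format files, it amounts to a plain integer rescale of the timestamp values. A sketch of that arithmetic (not arrow-rs's actual writer behavior; the example value is arbitrary):

```rust
fn main() {
    // A nanosecond timestamp as pyarrow would hold it in the Arrow schema.
    let ts_ns: i64 = 1_623_571_458_296_123_456;

    // Microseconds keep more precision than milliseconds and are still
    // representable in older parquet converted types; both are lossy
    // integer divisions below their resolution.
    let ts_us = ts_ns / 1_000;
    let ts_ms = ts_ns / 1_000_000;

    assert_eq!(ts_us, 1_623_571_458_296_123); // sub-microsecond part dropped
    assert_eq!(ts_ms, 1_623_571_458_296);     // sub-millisecond part dropped
}
```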
[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #520: Implement window functions with `order_by` clause
Dandandan commented on a change in pull request #520: URL: https://github.com/apache/arrow-datafusion/pull/520#discussion_r650546802

## File path: ballista/rust/core/Cargo.toml

```diff
@@ -40,7 +40,7 @@ tokio = "1.0"
 tonic = "0.4"
 uuid = { version = "0.8", features = ["v4"] }

-arrow-flight = { version = "4.0" }
+arrow-flight = { git = "https://github.com/apache/arrow-rs.git", rev = "e5cda312b697c3d610637b28c58b6f1b104b41cc" }
```

Review comment: For this we need to wait on 4.3.0; it includes the required changes, no? https://github.com/apache/arrow-rs/pull/444
[GitHub] [arrow] projjal commented on pull request #10501: ARROW-13032: [Java] Update guava version
projjal commented on pull request #10501: URL: https://github.com/apache/arrow/pull/10501#issuecomment-860475551 > Is there any reason not to use the latest version https://mvnrepository.com/artifact/com.google.guava/guava/30.1.1-jre ? Not really. I just updated to the patched version. I will update the PR with the latest version.
[GitHub] [arrow] kiszk commented on pull request #10525: ARROW-12709: [CI] Use LLVM 10 for s390x
kiszk commented on pull request #10525: URL: https://github.com/apache/arrow/pull/10525#issuecomment-860477976 Yes, I will try to merge it myself
[GitHub] [arrow-datafusion] Dandandan edited a comment on pull request #546: parallelize window function evaluations
Dandandan edited a comment on pull request #546: URL: https://github.com/apache/arrow-datafusion/pull/546#issuecomment-860194116 > maybe a better way is to use rayon In my opinion we should try to stay away from Rayon, and probably also stay away from introducing parallelism at a low level (per expression), as it will likely add quite some overhead (in terms of extra threads, memory allocations held on to, and Rayon scheduling overhead). I added some more thoughts in this discussion https://the-asf.slack.com/archives/C01QUFS30TD/p1623571458296000?thread_ts=1623523766.291500=C01QUFS30TD Or we should have some convincing benchmarks / reasoning for why, at a certain part, it is reasonable to introduce parallelism using tool x. I think for other parts of the code we could use Tokio's `spawn_blocking` function at the moment, while we design a better way to do parallelism. On the general level I think we mostly should check whether we perform parallelism at a partition / file level and introduce enough parallelism in the query plan (e.g. by using partitioning https://medium.com/@danilheres/increasing-the-level-of-parallelism-in-datafusion-4-0-d2a15b5a2093 ).
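The partition-level parallelism argued for above can be sketched with plain `std::thread` (illustrative only; DataFusion itself schedules partition work on Tokio, and the partition data here is made up):

```rust
use std::thread;

fn main() {
    // One unit of work per partition, rather than per expression:
    // each partition is summed independently, then results are combined.
    let partitions: Vec<Vec<i64>> = vec![vec![1, 2, 3], vec![4, 5], vec![6]];

    let handles: Vec<_> = partitions
        .into_iter()
        .map(|p| thread::spawn(move || p.iter().sum::<i64>()))
        .collect();

    // Combine the per-partition partial results.
    let total: i64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    assert_eq!(total, 21);
}
```

The coarse granularity keeps scheduling overhead proportional to the number of partitions, not the number of expressions evaluated.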
[GitHub] [arrow-datafusion] alamb commented on a change in pull request #547: support table alias in join clause
alamb commented on a change in pull request #547: URL: https://github.com/apache/arrow-datafusion/pull/547#discussion_r650511150

## File path: datafusion/src/sql/planner.rs

```diff
@@ -424,17 +424,24 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
         ctes: HashMap<String, LogicalPlan>,
     ) -> Result<LogicalPlan> {
         match relation {
-            TableFactor::Table { name, .. } => {
+            TableFactor::Table { name, alias, .. } => {
                 let table_name = name.to_string();
                 let cte = ctes.get(&table_name);
                 match (
                     cte,
                     self.schema_provider.get_table_provider(name.try_into()?),
                 ) {
                     (Some(cte_plan), _) => Ok(cte_plan.clone()),
-                    (_, Some(provider)) => {
-                        LogicalPlanBuilder::scan(&table_name, provider, None)?.build()
-                    }
+                    (_, Some(provider)) => LogicalPlanBuilder::scan(
+                        // take alias into account to support `JOIN table1 as table2`
+                        alias
+                            .as_ref()
+                            .map(|a| a.name.value.as_str())
+                            .unwrap_or(&table_name),
+                        provider,
+                        None,
+                    )?
+                    .build(),
                     (_, None) => Err(DataFusionError::Plan(format!(
```

Review comment:
```suggestion
                    (None, None) => Err(DataFusionError::Plan(format!(
```
Might make it clearer that now both branches are covered?
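A small standalone illustration of the suggestion (hypothetical names, not DataFusion code): with `(None, None)` as the final arm, the compiler itself verifies that every `(cte, provider)` combination is handled, rather than a wildcard silently absorbing the remaining case:

```rust
// Toy version of the CTE-vs-provider resolution in the diff above.
fn resolve(cte: Option<&str>, provider: Option<&str>) -> Result<&'static str, String> {
    match (cte, provider) {
        (Some(_), _) => Ok("cte plan"),       // CTE wins if present
        (_, Some(_)) => Ok("table scan"),     // otherwise use the provider
        // Exhaustiveness check: the compiler proves this is the only
        // case left, so no combination can slip through unhandled.
        (None, None) => Err("table not found".to_string()),
    }
}

fn main() {
    assert_eq!(resolve(Some("t1"), None).unwrap(), "cte plan");
    assert_eq!(resolve(None, Some("t1")).unwrap(), "table scan");
    assert!(resolve(None, None).is_err());
}
```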
[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #414: Doctests for DecimalArray.
codecov-commenter edited a comment on pull request #414: URL: https://github.com/apache/arrow-rs/pull/414#issuecomment-855393423

# Codecov Report

> Merging #414 (fda84cf) into master (e21f576) will **decrease** coverage by `0.00%`.
> The diff coverage is `n/a`.
> :exclamation: Current head fda84cf differs from pull request most recent head 2233ab2. Consider uploading reports for the commit 2233ab2 to get more accurate results.

```diff
@@            Coverage Diff             @@
##           master     #414      +/-   ##
==========================================
- Coverage   82.64%   82.64%   -0.01%
==========================================
  Files         164      164
  Lines       45508    45508
==========================================
- Hits        37609    37608       -1
- Misses       7899     7900       +1
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| arrow/src/array/array_binary.rs | `90.13% <ø> (ø)` | |
| parquet/src/encodings/encoding.rs | `94.85% <0.00%> (-0.20%)` | :arrow_down: |

Last update e21f576...2233ab2.
[GitHub] [arrow-datafusion] alamb commented on pull request #545: turn on clippy rule for needless borrow
alamb commented on pull request #545: URL: https://github.com/apache/arrow-datafusion/pull/545#issuecomment-860193523 Thank you @Jimexist !
[GitHub] [arrow-datafusion] Jimexist commented on pull request #429: implement lead and lag built-in window function
Jimexist commented on pull request #429: URL: https://github.com/apache/arrow-datafusion/pull/429#issuecomment-860225216

> Actually let's park this pull request for a while - I plan to implement sort and partition first and then window frame, after which the window shift approach might not be relevant.

Now that #520 is implemented, this PR is ready.
[GitHub] [arrow] pitrou commented on issue #10488: Passing back and forth from Python and C++ with Pyarrow C++ extension and pybind11.
pitrou commented on issue #10488: URL: https://github.com/apache/arrow/issues/10488#issuecomment-860451738 I would suggest debugging these crashes using gdb.
[GitHub] [arrow-rs] alamb merged pull request #438: use iterator for partition kernel instead of generating vec
alamb merged pull request #438: URL: https://github.com/apache/arrow-rs/pull/438
[GitHub] [arrow-rs] yordan-pavlov commented on issue #200: Use iterators to increase performance of creating Arrow arrays
yordan-pavlov commented on issue #200: URL: https://github.com/apache/arrow-rs/issues/200#issuecomment-860275967 I finally had some time to check how the new `ArrowArrayReader` affects TPC-H benchmark results. For queries which use string columns (queries 1 and 12), there is a performance improvement of about 30%; other queries that I tested, which mostly use non-string columns, are unaffected. This makes sense, as the new `ArrowArrayReader` is only enabled for string arrays currently. Here are the results:

**before new ArrowArrayReader:**
Query 1 avg time: 822.14 ms
Query 3 avg time: 432.85 ms
Query 5 avg time: 698.90 ms
Query 6 avg time: 319.38 ms
Query 12 avg time: 682.50 ms

**after new ArrowArrayReader:**
Query 1 avg time: 514.88 ms
Query 3 avg time: 441.08 ms
Query 5 avg time: 702.91 ms
Query 6 avg time: 324.05 ms
Query 12 avg time: 425.38 ms
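The relative change implied by the averages above can be computed directly. This is an editorial sketch, not part of the benchmark harness; the query names and timings are simply copied from the comment.

```rust
// Relative improvement: how much of the "before" time was saved.
fn improvement_pct(before_ms: f64, after_ms: f64) -> f64 {
    (before_ms - after_ms) / before_ms * 100.0
}

fn main() {
    // (query, avg ms before, avg ms after), from the comment above.
    let runs = [
        ("Query 1", 822.14, 514.88),
        ("Query 3", 432.85, 441.08),
        ("Query 5", 698.90, 702.91),
        ("Query 6", 319.38, 324.05),
        ("Query 12", 682.50, 425.38),
    ];
    for (name, before, after) in runs {
        // Positive = faster; small negative values are within noise.
        println!("{name}: {:+.1}%", improvement_pct(before, after));
    }
}
```

The string-column queries (1 and 12) show the large gains; the others move by under 2%, consistent with the reader only being enabled for string arrays.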
[GitHub] [arrow-rs] alamb merged pull request #419: Remove DictionaryArray::keys_array method
alamb merged pull request #419: URL: https://github.com/apache/arrow-rs/pull/419
[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #559: Filter push down for Union
codecov-commenter commented on pull request #559: URL: https://github.com/apache/arrow-datafusion/pull/559#issuecomment-860438515 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/559) Report > Merging [#559](https://codecov.io/gh/apache/arrow-datafusion/pull/559) (1385342) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/d3828541a61b5681b93590a47e22d63715949136) (d382854) will **increase** coverage by `0.00%`. > The diff coverage is `100.00%`.

```diff
@@           Coverage Diff            @@
##           master     #559    +/-   ##
========================================
  Coverage   76.09%   76.10%
========================================
  Files         156      156
  Lines       27047    27056      +9
========================================
+ Hits        20581    20590      +9
  Misses       6466     6466
```

| [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/559) | Coverage Δ | |
|---|---|---|
| [datafusion/src/optimizer/filter\_push\_down.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/559/diff#diff-ZGF0YWZ1c2lvbi9zcmMvb3B0aW1pemVyL2ZpbHRlcl9wdXNoX2Rvd24ucnM=) | `97.84% <100.00%> (+0.04%)` | :arrow_up: |

[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/559). Last update [d382854...1385342](https://codecov.io/gh/apache/arrow-datafusion/pull/559). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] [arrow-datafusion] codecov-commenter edited a comment on pull request #548: reuse code for now function expr creation
codecov-commenter edited a comment on pull request #548: URL: https://github.com/apache/arrow-datafusion/pull/548#issuecomment-860131843 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/548) Report > Merging [#548](https://codecov.io/gh/apache/arrow-datafusion/pull/548) (5083dfe) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/d3828541a61b5681b93590a47e22d63715949136) (d382854) will **decrease** coverage by `0.00%`. > The diff coverage is `73.33%`.

```diff
@@            Coverage Diff             @@
##           master     #548      +/-   ##
==========================================
- Coverage   76.09%   76.08%   -0.01%
==========================================
  Files         156      156
  Lines       27047    27040       -7
==========================================
- Hits        20581    20574       -7
  Misses       6466     6466
```

| [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/548) | Coverage Δ | |
|---|---|---|
| [datafusion/src/physical\_plan/functions.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/548/diff#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9mdW5jdGlvbnMucnM=) | `92.66% <73.33%> (-0.04%)` | :arrow_down: |

[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/548). Last update [d382854...5083dfe](https://codecov.io/gh/apache/arrow-datafusion/pull/548). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] [arrow-rs] novemberkilo commented on a change in pull request #414: Doctests for DecimalArray.
novemberkilo commented on a change in pull request #414: URL: https://github.com/apache/arrow-rs/pull/414#discussion_r650718790

## File path: arrow/src/array/array_binary.rs

@@ -613,6 +613,32 @@ impl Array for FixedSizeBinaryArray { }

 /// A type of `DecimalArray` whose elements are binaries.
+///
+/// # Examples
+///
+/// ```
+/// use arrow::array::{ArrayData, Array, DecimalArray};
+/// use arrow::buffer::Buffer;
+/// use arrow::datatypes::DataType;
+/// // let val_8887: [u8; 16] = [192, 219, 180, 17, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
+/// // let val_neg_8887: [u8; 16] = [64, 36, 75, 238, 253, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255];
+/// let values: [u8; 32] = [
+///     192, 219, 180, 17, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 64, 36, 75, 238, 253,
+///     255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
+/// ];
+/// let array_data = ArrayData::builder(DataType::Decimal(23, 6))
+///     .len(2)
+///     .add_buffer(Buffer::from(&values[..]))
+///     .build();
+/// let decimal_array = DecimalArray::from(array_data);
+/// assert_eq!(2, decimal_array.len());
+/// assert_eq!(8_887_000_000, decimal_array.value(0));
+/// assert_eq!(-8_887_000_000, decimal_array.value(1));
+/// assert_eq!(16, decimal_array.value_length());
+/// assert_eq!(23, decimal_array.precision());
+/// assert_eq!(6, decimal_array.scale());
+/// ```

Review comment: Thanks @alippai - I have added those doctests per your suggestion. // @alamb
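The byte arrays asserted in the doctest above are 16-byte little-endian two's-complement encodings, which can be checked independently with `i128::to_le_bytes` (plain std Rust, no Arrow dependency):

```rust
fn main() {
    // DecimalArray stores each value as a 16-byte little-endian
    // two's-complement integer; i128 uses exactly that layout.
    let pos: [u8; 16] = 8_887_000_000i128.to_le_bytes();
    let neg: [u8; 16] = (-8_887_000_000i128).to_le_bytes();

    // These match the `values` buffer in the doctest above.
    assert_eq!(pos, [192, 219, 180, 17, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]);
    assert_eq!(
        neg,
        [64, 36, 75, 238, 253, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255]
    );

    // Round-trip back to i128, as DecimalArray::value effectively does.
    assert_eq!(i128::from_le_bytes(neg), -8_887_000_000);
}
```

Note that with scale 6, the stored integer 8_887_000_000 represents the decimal 8887.000000.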
[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10507: ARROW-13022: [R] bindings for lubridate's year, isoyear, quarter, month, day, wday, yday, isoweek, minute, and second
jorisvandenbossche commented on a change in pull request #10507: URL: https://github.com/apache/arrow/pull/10507#discussion_r650738642

## File path: r/R/expression.R

@@ -28,8 +28,17 @@
   # stringr spellings of those
   "str_length" = "utf8_length",
   "str_to_lower" = "utf8_lower",
-  "str_to_upper" = "utf8_upper"
+  "str_to_upper" = "utf8_upper", # str_trim is defined in dplyr.R
+  "year" = "year",
+  "isoyear" = "iso_year",
+  "quarter" = "quarter",
+  "month" = "month",
+  "day" = "day",
+  "yday" = "day_of_year",
+  "isoweek" = "iso_week",
+  "minute" = "minute",
+  "second" = "second"

Review comment: Agreed. I would also say that just because lubridate does not support nanoseconds, it does not follow that *if* you actually have nanoseconds (which is possible, since Arrow supports them) those should be discarded.
[GitHub] [arrow-rs] garyanaplan commented on a change in pull request #443: parquet: improve BOOLEAN writing logic and report error on encoding fail
garyanaplan commented on a change in pull request #443: URL: https://github.com/apache/arrow-rs/pull/443#discussion_r650746454

## File path: parquet/src/data_type.rs

@@ -661,8 +661,15 @@ pub(crate) mod private {
     _: &mut W,
     bit_writer: &mut BitWriter,
 ) -> Result<()> {
+    if bit_writer.bytes_written() + values.len() >= bit_writer.capacity() {

Review comment: I think this calculation is entirely in terms of bytes, so units should all be correct as is.
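On the units question: each BOOLEAN occupies one bit, so `values.len()` bits can need at most `values.len()` bytes, making the byte-unit comparison a conservative pre-flight bound. The sketch below is a hypothetical minimal writer illustrating that pattern; `TinyBitWriter` and `write_all` are illustrative names, not parquet's real `BitWriter` API.

```rust
// Minimal bit-writer sketch: capacity tracked in bytes, writes in bits.
struct TinyBitWriter {
    buf: Vec<u8>,
    bit_len: usize,   // bits written so far
    capacity: usize,  // capacity in bytes
}

impl TinyBitWriter {
    fn new(capacity: usize) -> Self {
        Self { buf: vec![0; capacity], bit_len: 0, capacity }
    }
    fn bytes_written(&self) -> usize {
        (self.bit_len + 7) / 8
    }
    fn capacity(&self) -> usize {
        self.capacity
    }
    /// Grow the backing buffer by `extra` bytes.
    fn extend(&mut self, extra: usize) {
        self.capacity += extra;
        self.buf.resize(self.capacity, 0);
    }
    /// Returns false instead of silently dropping bits when full.
    fn put_bool(&mut self, v: bool) -> bool {
        if self.bit_len >= self.capacity * 8 {
            return false;
        }
        if v {
            self.buf[self.bit_len / 8] |= 1 << (self.bit_len % 8);
        }
        self.bit_len += 1;
        true
    }
}

fn write_all(w: &mut TinyBitWriter, values: &[bool]) -> Result<(), String> {
    // Conservative byte-unit check mirroring the PR's logic: one byte
    // reserved per boolean bit, so it can over-provision but never under.
    if w.bytes_written() + values.len() >= w.capacity() {
        w.extend(256);
    }
    for v in values {
        if !w.put_bool(*v) {
            return Err("unable to put boolean value".to_string());
        }
    }
    Ok(())
}

fn main() {
    let mut w = TinyBitWriter::new(1); // deliberately tiny initial capacity
    assert!(write_all(&mut w, &vec![true; 100]).is_ok());
    assert_eq!(w.bytes_written(), 13); // 100 bits round up to 13 bytes
}
```

So the check is correct but over-allocates by up to a factor of eight, which is likely why the unit question came up in review.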
[GitHub] [arrow-rs] garyanaplan commented on a change in pull request #443: parquet: improve BOOLEAN writing logic and report error on encoding fail
garyanaplan commented on a change in pull request #443: URL: https://github.com/apache/arrow-rs/pull/443#discussion_r650750468

## File path: parquet/src/data_type.rs

@@ -661,8 +661,15 @@ pub(crate) mod private {
     _: &mut W,
     bit_writer: &mut BitWriter,
 ) -> Result<()> {
+    if bit_writer.bytes_written() + values.len() >= bit_writer.capacity() {
+        bit_writer.extend(256);
+    }
     for value in values {
-        bit_writer.put_value(*value as u64, 1);
+        if !bit_writer.put_value(*value as u64, 1) {

Review comment: I found it hard to think of a good way to test this with the fix in place. I preferred the "don't auto-expand memory at the point of failure" approach because I'm fairly conservative and didn't want to make a change that was too wide in impact without a better understanding of the code; i.e., my fix specifically targeted the error I reported and made it possible to detect the failure in other locations. I think a better fix would be to (somehow) pre-size the vector, or to avoid having to size a vector for all the bytes that could be written, but that would be a much bigger scope for the fix.
[GitHub] [arrow] kou edited a comment on pull request #10525: ARROW-12709: [CI] Use LLVM 10 for s390x
kou edited a comment on pull request #10525: URL: https://github.com/apache/arrow/pull/10525#issuecomment-860263925 @kiszk Do you want to merge this yourself?
[GitHub] [arrow] jonkeane closed pull request #10508: ARROW-13041: [C++] Ensure unary kernels zero-initialize data behind null entries
jonkeane closed pull request #10508: URL: https://github.com/apache/arrow/pull/10508
[GitHub] [arrow] anthonylouisbsb commented on pull request #10195: ARROW-12595: [C++][Gandiva] Implement TO_HEX([binary] field), HEX, UNHEX and FROM_HEX([string]field] functions
anthonylouisbsb commented on pull request #10195: URL: https://github.com/apache/arrow/pull/10195#issuecomment-859521494 @projjal The functions related to `HEX` and `UNHEX` were added inside this PR too, so I think you need to review the PR again.
[GitHub] [arrow-datafusion] andygrove commented on a change in pull request #541: WIP: ShuffleReaderExec now supports multiple locations per partition
andygrove commented on a change in pull request #541: URL: https://github.com/apache/arrow-datafusion/pull/541#discussion_r650138952

## File path: ballista/rust/client/src/context.rs

@@ -74,32 +71,6 @@ impl BallistaContextState {
     }
 }

-struct WrappedStream {
-    stream: Pin<Box<dyn Stream<Item = ArrowResult<RecordBatch>> + Send + Sync>>,
-    schema: SchemaRef,
-}
-
-impl RecordBatchStream for WrappedStream {
-    fn schema(&self) -> SchemaRef {
-        self.schema.clone()
-    }
-}
-
-impl Stream for WrappedStream {
-    type Item = ArrowResult<RecordBatch>;
-
-    fn poll_next(
-        mut self: Pin<&mut Self>,
-        cx: &mut std::task::Context<'_>,
-    ) -> std::task::Poll<Option<Self::Item>> {
-        self.stream.poll_next_unpin(cx)
-    }
-
-    fn size_hint(&self) -> (usize, Option<usize>) {
-        self.stream.size_hint()
-    }
-}
-

Review comment: This is moved to ballista-core utils

## File path: ballista/rust/core/src/execution_plans/shuffle_reader.rs

@@ -86,23 +87,18 @@ impl ExecutionPlan for ShuffleReaderExec {
         partition: usize,
     ) -> Result<Pin<Box<dyn RecordBatchStream + Send + Sync>>> {
         info!("ShuffleReaderExec::execute({})", partition);
-        let partition_location = &self.partition_location[partition];
-
-        let mut client = BallistaClient::try_new(
-            &partition_location.executor_meta.host,
-            partition_location.executor_meta.port,
-        )
-        .await
-        .map_err(|e| DataFusionError::Execution(format!("Ballista Error: {:?}", e)))?;
-        client
-            .fetch_partition(
-                &partition_location.partition_id.job_id,
-                partition_location.partition_id.stage_id,
-                partition,
-            )
+        let x = self.partition[partition].clone();
+        let result = future::join_all(x.into_iter().map(fetch_partition))
             .await
-            .map_err(|e| DataFusionError::Execution(format!("Ballista Error: {:?}", e)))
+            .into_iter()
+            .collect::<Result<Vec<_>>>()?;
+
+        let result = WrappedStream::new(
+            Box::pin(futures::stream::iter(result).flatten()),
+            Arc::new(self.schema.as_ref().clone()),
+        );
+        Ok(Box::pin(result))
     }

Review comment: This is the main change

## File path: ballista/rust/core/src/execution_plans/shuffle_reader.rs

(same hunk as above, review comment anchored at the `futures::stream::iter(result).flatten()` line)

Review comment: Once the basic shuffle mechanism is implemented, there will be a lot of optimization work to follow
[GitHub] [arrow-datafusion] edrevo commented on a change in pull request #541: ShuffleReaderExec now supports multiple locations per partition
edrevo commented on a change in pull request #541: URL: https://github.com/apache/arrow-datafusion/pull/541#discussion_r650183206

## File path: ballista/rust/core/src/execution_plans/shuffle_reader.rs

@@ -86,23 +87,18 @@ impl ExecutionPlan for ShuffleReaderExec {
         partition: usize,
     ) -> Result<Pin<Box<dyn RecordBatchStream + Send + Sync>>> {
         info!("ShuffleReaderExec::execute({})", partition);
-        let partition_location = &self.partition_location[partition];
-
-        let mut client = BallistaClient::try_new(
-            &partition_location.executor_meta.host,
-            partition_location.executor_meta.port,
-        )
-        .await
-        .map_err(|e| DataFusionError::Execution(format!("Ballista Error: {:?}", e)))?;
-        client
-            .fetch_partition(
-                &partition_location.partition_id.job_id,
-                partition_location.partition_id.stage_id,
-                partition,
-            )
+        let x = self.partition[partition].clone();
+        let result = future::join_all(x.into_iter().map(fetch_partition))

Review comment: nit: if you change `fetch_partition` to take a reference, you can avoid the `.clone()`:

```suggestion
        let partition_locations = &self.partition[partition];
        let result = future::join_all(partition_locations.iter().map(fetch_partition))
```

## File path: ballista/rust/core/src/execution_plans/shuffle_reader.rs

(same hunk as above, review comment anchored at the `futures::stream::iter(result).flatten()` line)

Review comment:
> I wonder if it will serialize them all (aka not start reading from the second until the first is entirely consumed)?

Yes, that's exactly what it does.
[GitHub] [arrow-rs] Jimexist opened a new pull request #448: Use partition for bool sort
Jimexist opened a new pull request #448: URL: https://github.com/apache/arrow-rs/pull/448

# Which issue does this PR close?

Closes #447

# Rationale for this change

# What changes are included in this PR?

# Are there any user-facing changes?
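The idea behind "use partition for bool sort" is that a boolean column has only two distinct values, so a sorted order falls out of partitioning the indices into the `false` group followed by the `true` group, with no comparison sort. The sketch below illustrates that with plain std Rust; `sort_bool_indices` is an illustrative name, not the arrow-rs kernel's API, and null handling is omitted.

```rust
// Produce indices that would sort a boolean slice ascending
// (false before true) by stable partition rather than comparison sort.
fn sort_bool_indices(values: &[bool]) -> Vec<usize> {
    // Iterator::partition keeps relative order within each group,
    // so the result is a stable sort of the indices.
    let (falses, trues): (Vec<usize>, Vec<usize>) =
        (0..values.len()).partition(|&i| !values[i]);
    let mut out = falses;
    out.extend(trues);
    out
}

fn main() {
    let v = [true, false, true, false, false];
    assert_eq!(sort_bool_indices(&v), vec![1, 3, 4, 0, 2]);
}
```

A descending sort is the same partition with the two groups concatenated in the opposite order.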
[GitHub] [arrow-datafusion] andygrove opened a new issue #540: Ballista ShuffleReaderExec should be able to read from multiple locations per partition
andygrove opened a new issue #540: URL: https://github.com/apache/arrow-datafusion/issues/540

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
As a step towards implementing true shuffle, we need `ShuffleReaderExec` to be able to read from multiple locations per partition instead of from a single location.

**Describe the solution you'd like**
`ShuffleReaderExec` should accept `Vec<Vec<PartitionLocation>>` instead of `Vec<PartitionLocation>`.

**Describe alternatives you've considered**
None

**Additional context**
None
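The requested shape can be sketched synchronously: each output partition is backed by several locations, and reading chains their batches one location after another (as the review discussion on #541 notes, the second location is not read until the first is consumed). `PartitionLocation`, `fetch_partition`, and `Batch` here are simplified stand-ins for the Ballista types, not the real API.

```rust
#[derive(Clone)]
struct PartitionLocation {
    executor: String,
}

type Batch = Vec<i64>;

// Stand-in fetch: a real implementation would open a Flight stream
// to the executor named in `loc` and return its record batches.
fn fetch_partition(loc: &PartitionLocation) -> Vec<Batch> {
    vec![vec![loc.executor.len() as i64]]
}

struct ShuffleReader {
    // One entry per output partition, each with multiple source locations:
    // the Vec<Vec<..>> shape requested in the issue above.
    partition: Vec<Vec<PartitionLocation>>,
}

impl ShuffleReader {
    fn execute(&self, partition: usize) -> Vec<Batch> {
        // flat_map chains the locations: all batches from the first
        // location are yielded before the second is touched.
        self.partition[partition]
            .iter()
            .flat_map(fetch_partition)
            .collect()
    }
}

fn main() {
    let reader = ShuffleReader {
        partition: vec![vec![
            PartitionLocation { executor: "a".into() },
            PartitionLocation { executor: "bb".into() },
        ]],
    };
    // Batches from the second location follow those from the first.
    assert_eq!(reader.execute(0), vec![vec![1], vec![2]]);
}
```

The async version in #541 replaces the iterator chain with `future::join_all` to fetch concurrently, then flattens the resulting streams in order.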
[GitHub] [arrow] nealrichardson commented on a change in pull request #10507: ARROW-13022: [R] bindings for lubridate's year, isoyear, quarter, month, day, wday, yday, isoweek, minute, and second func
nealrichardson commented on a change in pull request #10507: URL: https://github.com/apache/arrow/pull/10507#discussion_r650076060

## File path: r/R/expression.R

@@ -28,8 +28,17 @@
   # stringr spellings of those
   "str_length" = "utf8_length",
   "str_to_lower" = "utf8_lower",
-  "str_to_upper" = "utf8_upper"
+  "str_to_upper" = "utf8_upper", # str_trim is defined in dplyr.R
+  "year" = "year",
+  "isoyear" = "iso_year",
+  "quarter" = "quarter",
+  "month" = "month",
+  "day" = "day",
+  "yday" = "day_of_year",
+  "isoweek" = "iso_week",
+  "minute" = "minute",
+  "second" = "second"

Review comment: IDK that we should truncate the data like this. A slight (technical) API difference is probably better than throwing away precision.

## File path: r/tests/testthat/test-dplyr-lubridate.R

@@ -0,0 +1,119 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+library(lubridate)
+library(dplyr)
+
+test_date <- ymd_hms("1987-10-09 23:00:00", tz = NULL)
+test_df <- tibble::tibble(date = test_date)
+
+test_that("extract datetime components from date", {
+  expect_dplyr_equal(
+    input %>%
+      mutate(x = year(date)) %>%
+      collect(),
+    test_df
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(x = isoyear(date)) %>%
+      collect(),
+    test_df
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(x = quarter(date)) %>%
+      collect(),
+    test_df
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(x = month(date)) %>%
+      collect(),
+    test_df
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(x = wday(date)) %>%
+      collect(),
+    test_df
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(x = wday(date, week_start = 3)) %>%
+      collect(),
+    test_df
+  )
+
+  expect_warning(
+    test_df %>%
+      Table$create() %>%
+      mutate(x = wday(date, label = TRUE)) %>%
+      collect(),
+    regexp = "Label argument not supported by Arrow; pulling data into R"
+  )
+
+  expect_warning(
+    test_df %>%
+      Table$create() %>%
+      mutate(x = wday(date, locale = Sys.getlocale("LC_TIME"))) %>%
+      collect(),
+    regexp = 'Expression wday(date, locale = Sys.getlocale("LC_TIME")) not supported in Arrow; pulling data into R',
+    fixed = TRUE
+  )

Review comment: Yeah I think we can do this more unit-testy version where appropriate; we test elsewhere that the errors we throw in the nse_funcs get handled by filter/mutate correctly.

## File path: r/R/dplyr-functions.R

@@ -442,3 +442,37 @@ nse_funcs$strptime <- function(x, format = "%Y-%m-%d %H:%M:%S", tz = NULL, unit
   Expression$create("strptime", x, options = list(format = format, unit = unit))
 }
+
+nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption("lubridate.week.start", 7)) {
+  if (label) {
+    arrow_not_supported("Label argument")

Review comment: What is the label argument? Is it something we should support? I think it's good to document these unsupported exceptions in comments, to note either why Arrow should behave differently, or what Jira issue exists to remove the exception.

## File path: r/R/dplyr-functions.R

+  }
+  offset <- get_date_offset(week_start)
+  Expression$create("add", Expression$create("day_of_week", x), Expression$scalar(offset))
+}
+
+#' Get date offset
+#'
+#' Arrow's `day_of_week` kernel counts from 0 (Monday) to 6 (Sunday), whereas
+#' `lubridate::wday` counts from 1 to 7, and allows users to specify which day
+#' of the week is first (Sunday by default). This function converts the returned

Review comment: Please leave a comment with that JIRA issue somewhere in this code that we expect to remove.
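The day-numbering conversion discussed in this review can be stated precisely: Arrow's `day_of_week` counts 0 (Monday) through 6 (Sunday), while `lubridate::wday` counts 1 through 7 with a configurable first day (`week_start`, Sunday = 7 by default). The sketch below shows one consistent mapping as a worked example; it is an illustration, not the R binding itself.

```rust
// Convert Arrow's day_of_week (0 = Monday .. 6 = Sunday) to lubridate's
// wday (1..7), where `week_start` (1 = Monday .. 7 = Sunday) names the
// day that should be numbered 1.
fn lubridate_wday(arrow_dow: i64, week_start: i64) -> i64 {
    // Shift so that `week_start` becomes day 1, wrapping modulo 7.
    (arrow_dow - (week_start - 1)).rem_euclid(7) + 1
}

fn main() {
    // Default week_start = 7 (Sunday): Sunday -> 1, Monday -> 2.
    assert_eq!(lubridate_wday(6, 7), 1); // Arrow Sunday
    assert_eq!(lubridate_wday(0, 7), 2); // Arrow Monday
    // week_start = 1 (Monday): Monday -> 1, Sunday -> 7.
    assert_eq!(lubridate_wday(0, 1), 1);
    assert_eq!(lubridate_wday(6, 1), 7);
    // week_start = 3 (Wednesday), as in the test file above: Wednesday -> 1.
    assert_eq!(lubridate_wday(2, 3), 1);
}
```

In the R binding this shift is applied as a scalar offset added to the `day_of_week` kernel's result, which is what `get_date_offset` computes.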
[GitHub] [arrow] kou closed pull request #10515: ARROW-13030: [CI][Go] Setup Arm64 golang CI
kou closed pull request #10515: URL: https://github.com/apache/arrow/pull/10515
[GitHub] [arrow] github-actions[bot] commented on pull request #10517: ARROW-13050: [C++][Gandiva] Implement SPACE Hive function on Gandiva
github-actions[bot] commented on pull request #10517: URL: https://github.com/apache/arrow/pull/10517#issuecomment-859490703
[GitHub] [arrow] AlenkaF commented on a change in pull request #10519: ARROW-12867: [R] Bindings for abs()
AlenkaF commented on a change in pull request #10519: URL: https://github.com/apache/arrow/pull/10519#discussion_r650068914 ## File path: r/R/dplyr-functions.R ## @@ -108,6 +108,10 @@ nse_funcs$is.infinite <- function(x) { is_inf & !nse_funcs$is.na(is_inf) } +nse_funcs$abs <- function(x){ + Expression$create("abs_checked", x) Review comment: I used `abs_checked` to be in sync with [ARROW-12575: [R] Use unary negative kernel](https://github.com/apache/arrow/pull/10196), see [discussion](https://github.com/apache/arrow/pull/10196/files/288e06ed681c170de587591b88c6b37f96a3eddd#r625885713).
[GitHub] [arrow-rs] codecov-commenter commented on pull request #449: remove clippy unnecessary wraps in cast kernel
codecov-commenter commented on pull request #449: URL: https://github.com/apache/arrow-rs/pull/449#issuecomment-859619048 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/449) Report > Merging [#449](https://codecov.io/gh/apache/arrow-rs/pull/449) (e1c2dc4) into [master](https://codecov.io/gh/apache/arrow-rs/commit/0c0077697e55eb154dbfcf3127a3f39e63be2df8) (0c00776) will **decrease** coverage by `0.05%`. > The diff coverage is `n/a`.

```diff
@@            Coverage Diff             @@
##           master     #449      +/-   ##
==========================================
- Coverage   82.71%   82.65%   -0.06%
==========================================
  Files         163      164       +1
  Lines       44795    45468     +673
==========================================
+ Hits        37051    37581     +530
- Misses       7744     7887     +143
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| arrow/src/compute/kernels/cast.rs | `94.53% <ø> (ø)` | |
| parquet/src/util/test_common/page_util.rs | `91.00% <0.00%> (-0.67%)` | :arrow_down: |
| parquet/src/arrow/record_reader.rs | `93.44% <0.00%> (-0.54%)` | :arrow_down: |
| parquet/src/column/reader.rs | `74.36% <0.00%> (-0.38%)` | :arrow_down: |
| arrow/src/array/transform/mod.rs | `86.06% <0.00%> (-0.09%)` | :arrow_down: |
| parquet/src/column/page.rs | `98.68% <0.00%> (ø)` | |
| parquet/src/schema/types.rs | `88.07% <0.00%> (ø)` | |
| parquet/src/arrow/arrow_array_reader.rs | `78.12% <0.00%> (ø)` | |
| parquet/src/file/serialized_reader.rs | `94.98% <0.00%> (+0.02%)` | :arrow_up: |
| arrow/src/compute/kernels/filter.rs | `91.98% <0.00%> (+0.07%)` | :arrow_up: |
| ... and 4 more | | |
[GitHub] [arrow] github-actions[bot] commented on pull request #10521: ARROW-13065: [Packaging][RPM] Add missing required LZ4 version information
github-actions[bot] commented on pull request #10521: URL: https://github.com/apache/arrow/pull/10521#issuecomment-859921976
[GitHub] [arrow-datafusion] viirya commented on issue #531: `cargo build` cannot build the project
viirya commented on issue #531: URL: https://github.com/apache/arrow-datafusion/issues/531#issuecomment-859727700
[GitHub] [arrow] kou opened a new pull request #10514: ARROW-13045: [Packaging][RPM][deb] Don't install system utf8proc if it's old
kou opened a new pull request #10514: URL: https://github.com/apache/arrow/pull/10514 See also:
* #10477 30f52a202d0a2f6393366ea1e4a8e5182077c72a
* ARROW-13002
[GitHub] [arrow] AlenkaF opened a new pull request #10519: ARROW-12867: [R] Bindings for abs()
AlenkaF opened a new pull request #10519: URL: https://github.com/apache/arrow/pull/10519
[GitHub] [arrow] github-actions[bot] commented on pull request #10515: ARROW-13030: [CI][Go] Setup Arm64 golang CI
github-actions[bot] commented on pull request #10515: URL: https://github.com/apache/arrow/pull/10515#issuecomment-859308213 https://issues.apache.org/jira/browse/ARROW-13030
[GitHub] [arrow-datafusion] Dandandan closed issue #131: Implement hash partitioning
Dandandan closed issue #131: URL: https://github.com/apache/arrow-datafusion/issues/131
[GitHub] [arrow-datafusion] Jimexist commented on a change in pull request #539: Refactor hash aggregates's planner building code
Jimexist commented on a change in pull request #539: URL: https://github.com/apache/arrow-datafusion/pull/539#discussion_r649920230 ## File path: datafusion/src/physical_plan/mod.rs ## @@ -343,7 +343,8 @@ pub enum Partitioning { RoundRobinBatch(usize), /// Allocate rows based on a hash of one of more expressions and the specified /// number of partitions -/// This partitioning scheme is not yet fully supported. See [ARROW-11011](https://issues.apache.org/jira/browse/ARROW-11011) +/// FIXME: This partitioning scheme is not yet fully supported. Review comment: done
[GitHub] [arrow-datafusion] edrevo commented on a change in pull request #532: reuse datafusion physical planner in ballista building from protobuf
edrevo commented on a change in pull request #532: URL: https://github.com/apache/arrow-datafusion/pull/532#discussion_r649800622 ## File path: datafusion/src/physical_plan/windows.rs ## @@ -61,24 +63,27 @@ pub struct WindowAggExec { /// Create a physical expression for window function pub fn create_window_expr( fun: &WindowFunction, +name: String, args: &[Arc<dyn PhysicalExpr>], +_partition_by: &[Arc<dyn PhysicalExpr>], +_order_by: &[PhysicalSortExpr], +_window_frame: WindowFrame, Review comment: I just realized that this PR changes the behavior when there is a non-empty partition_by/order_by/window_frame: before (at least in ballista) it would error, whereas now it is silently ignored. Maybe it is worth erroring if they aren't empty to make it explicit that there is no support? Review comment: nit: once this is merged, it would be cool to add a link to this code in the respective issues that are tracking these missing features to make it easier for new contributors to start contributing.
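A guard of the sort edrevo suggests could look like this minimal sketch (the function and error messages are assumptions for illustration, not the actual DataFusion code):

```rust
// Hypothetical sketch: reject unsupported window clauses explicitly
// instead of silently ignoring them.
fn check_window_clauses_supported(
    partition_by_len: usize,
    order_by_len: usize,
) -> Result<(), String> {
    if partition_by_len != 0 {
        return Err("window functions with PARTITION BY are not yet supported".to_string());
    }
    if order_by_len != 0 {
        return Err("window functions with ORDER BY are not yet supported".to_string());
    }
    Ok(())
}

fn main() {
    // Empty clauses pass; a non-empty PARTITION BY fails loudly.
    assert!(check_window_clauses_supported(0, 0).is_ok());
    assert!(check_window_clauses_supported(1, 0).is_err());
    println!("ok");
}
```

The point of the guard is that an unsupported query fails at planning time rather than returning silently wrong results.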
[GitHub] [arrow] edponce commented on issue #10502: AttributeError: module 'pyarrow.lib' has no attribute '_Weakrefable'
edponce commented on issue #10502: URL: https://github.com/apache/arrow/issues/10502#issuecomment-859270240
[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10507: ARROW-13022: [R] bindings for lubridate's year, isoyear, quarter, month, day, wday, yday, isoweek, minute, and second
jorisvandenbossche commented on a change in pull request #10507: URL: https://github.com/apache/arrow/pull/10507#discussion_r649723619 ## File path: r/R/dplyr-functions.R ## @@ -442,3 +442,37 @@ nse_funcs$strptime <- function(x, format = "%Y-%m-%d %H:%M:%S", tz = NULL, unit Expression$create("strptime", x, options = list(format = format, unit = unit)) } + +nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption("lubridate.week.start", 7)) { + if (label) { +arrow_not_supported("Label argument") + } + offset <- get_date_offset(week_start) + Expression$create("add", Expression$create("day_of_week", x), Expression$scalar(offset)) +} + +#' Get date offset +#' +#' Arrow's `day_of_week` kernel counts from 0 (Monday) to 6 (Sunday), whereas +#' `lubridate::wday` counts from 1 to 7, and allows users to specify which day +#' of the week is first (Sunday by default). This function converts the returned Review comment: I was going to mention that we actually also have an "ISO weekday" in the C++ kernels, but only as field in the `iso_calendar` kernel, and not as separate one. We could also add `iso_day_of_week` if that helps, or even just make `day_of_week` follow the ISO conventions. Because the 0 (monday) - 6 (sunday) might be something Python specific. But then looked at what C++ does, and there the default is actually 0 (sunday) - 6 (saturday), so yet something else ;) (although I see that also Postgres uses that for `dow`) In the end, once you have it in one form, it's an easy conversion to any of the other of course, so it might not matter that much. 
## File path: r/R/expression.R ## @@ -28,8 +28,17 @@ # stringr spellings of those "str_length" = "utf8_length", "str_to_lower" = "utf8_lower", - "str_to_upper" = "utf8_upper" + "str_to_upper" = "utf8_upper", # str_trim is defined in dplyr.R + "year" = "year", + "isoyear" = "iso_year", + "quarter" = "quarter", + "month" = "month", + "day" = "day", + "yday" = "day_of_year", + "isoweek" = "iso_week", + "minute" = "minute", Review comment: "hour" is missing here? ## File path: r/R/expression.R ## @@ -28,8 +28,17 @@ # stringr spellings of those "str_length" = "utf8_length", "str_to_lower" = "utf8_lower", - "str_to_upper" = "utf8_upper" + "str_to_upper" = "utf8_upper", # str_trim is defined in dplyr.R + "year" = "year", + "isoyear" = "iso_year", + "quarter" = "quarter", + "month" = "month", + "day" = "day", + "yday" = "day_of_year", + "isoweek" = "iso_week", + "minute" = "minute", + "second" = "second" Review comment: Note that second might do something different. I think "second" in lubridate is the equivalent of "second + subsecond" in arrow ## File path: r/R/expression.R ## @@ -28,8 +28,17 @@ # stringr spellings of those "str_length" = "utf8_length", "str_to_lower" = "utf8_lower", - "str_to_upper" = "utf8_upper" + "str_to_upper" = "utf8_upper", # str_trim is defined in dplyr.R + "year" = "year", + "isoyear" = "iso_year", + "quarter" = "quarter", + "month" = "month", + "day" = "day", + "yday" = "day_of_year", + "isoweek" = "iso_week", + "minute" = "minute", + "second" = "second" Review comment: What do you mean exactly with rounding? 
A quick try gives me: ``` > second(ymd_hms("2011-06-04 12:00:01.123456")) [1] 1.123456 ``` which seems to give all decimals ## File path: r/R/expression.R ## @@ -28,8 +28,17 @@ # stringr spellings of those "str_length" = "utf8_length", "str_to_lower" = "utf8_lower", - "str_to_upper" = "utf8_upper" + "str_to_upper" = "utf8_upper", # str_trim is defined in dplyr.R + "year" = "year", + "isoyear" = "iso_year", + "quarter" = "quarter", + "month" = "month", + "day" = "day", + "yday" = "day_of_year", + "isoweek" = "iso_week", + "minute" = "minute", + "second" = "second" Review comment: I don't think lubridate / R supports nanosecond resolution
[GitHub] [arrow-datafusion] andygrove merged pull request #535: Make BallistaContext::collect streaming
andygrove merged pull request #535: URL: https://github.com/apache/arrow-datafusion/pull/535
[GitHub] [arrow-rs] Jimexist opened a new issue #446: sort kernel has a lot of unnecessary wrapping
Jimexist opened a new issue #446: URL: https://github.com/apache/arrow-rs/issues/446 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** we can get rid of the wrapping and remove the suppression on clippy warning as well
[GitHub] [arrow-rs] Jimexist opened a new issue #447: sort kernel boolean sort can be O(n)
Jimexist opened a new issue #447: URL: https://github.com/apache/arrow-rs/issues/447 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** we can use partition to speed up boolean sort
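The idea in the issue can be sketched as follows (a simple counting pass, not arrow-rs's actual kernel, which also has to produce sort indices and handle nulls):

```rust
// Sketch: booleans can be "sorted" by counting, in a single O(n) pass,
// instead of the O(n log n) comparison sort used for general types.
fn sort_booleans(values: &[bool]) -> Vec<bool> {
    let trues = values.iter().filter(|&&v| v).count();
    let falses = values.len() - trues;
    // Ascending order: all `false` values first, then all `true` values.
    let mut out = vec![false; falses];
    out.extend(std::iter::repeat(true).take(trues));
    out
}

fn main() {
    assert_eq!(
        sort_booleans(&[true, false, true, false, false]),
        vec![false, false, true, true, true]
    );
    println!("ok");
}
```

Because a boolean column partitions cleanly into two runs, one counting pass fully determines the sorted output.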
[GitHub] [arrow-datafusion] Dandandan edited a comment on issue #299: Support window functions with PARTITION BY clause
Dandandan edited a comment on issue #299: URL: https://github.com/apache/arrow-datafusion/issues/299#issuecomment-859516691
[GitHub] [arrow] jpedroantunes opened a new pull request #10516: ARROW-13049: [C++][Gandiva] Implement BIN Hive function on Gandiva
jpedroantunes opened a new pull request #10516: URL: https://github.com/apache/arrow/pull/10516 Implement BIN Hive function on Gandiva
[GitHub] [arrow-datafusion] Jimexist commented on a change in pull request #532: reuse datafusion physical planner in ballista building from protobuf
Jimexist commented on a change in pull request #532: URL: https://github.com/apache/arrow-datafusion/pull/532#discussion_r649843237 ## File path: datafusion/src/physical_plan/windows.rs ## @@ -61,24 +63,27 @@ pub struct WindowAggExec { /// Create a physical expression for window function pub fn create_window_expr( fun: &WindowFunction, +name: String, args: &[Arc<dyn PhysicalExpr>], +_partition_by: &[Arc<dyn PhysicalExpr>], +_order_by: &[PhysicalSortExpr], +_window_frame: WindowFrame, Review comment: it'll be further developed in https://github.com/apache/arrow-datafusion/pull/520 Review comment: but that's a good point, let me add a guard here. Review comment: this is fixed now
[GitHub] [arrow-rs] Jimexist opened a new pull request #449: remove clippy unnecessary wraps
Jimexist opened a new pull request #449: URL: https://github.com/apache/arrow-rs/pull/449
[GitHub] [arrow] github-actions[bot] commented on pull request #10518: ARROW-13052: [C++][Gandiva] Implements REGEXP_EXTRACT function
github-actions[bot] commented on pull request #10518: URL: https://github.com/apache/arrow/pull/10518#issuecomment-859529990 https://issues.apache.org/jira/browse/ARROW-13052
[GitHub] [arrow-rs] Jimexist closed pull request #445: remove unnecessary wraps in sort
Jimexist closed pull request #445: URL: https://github.com/apache/arrow-rs/pull/445
[GitHub] [arrow-datafusion] andygrove commented on pull request #543: Ballista: Implement map-side shuffle
andygrove commented on pull request #543: URL: https://github.com/apache/arrow-datafusion/pull/543#issuecomment-859939472 @edrevo fyi
[GitHub] [arrow] domoritz closed pull request #10500: ARROW-13031: [JS] Support arm in closure compiler on macOS
domoritz closed pull request #10500: URL: https://github.com/apache/arrow/pull/10500
[GitHub] [arrow] kiszk commented on pull request #10501: ARROW-13032: [Java] Update guava version
kiszk commented on pull request #10501: URL: https://github.com/apache/arrow/pull/10501#issuecomment-859690248 Is there any reason not to use the latest version https://mvnrepository.com/artifact/com.google.guava/guava/30.1.1-jre ?
[GitHub] [arrow] github-actions[bot] commented on pull request #10519: ARROW-12867: [R] Bindings for abs()
github-actions[bot] commented on pull request #10519: URL: https://github.com/apache/arrow/pull/10519#issuecomment-859534414 https://issues.apache.org/jira/browse/ARROW-12867
[GitHub] [arrow] kou commented on pull request #10510: ARROW-13043: [GLib][Ruby] Add GArrowEqualOptions
kou commented on pull request #10510: URL: https://github.com/apache/arrow/pull/10510#issuecomment-859913376 +1
[GitHub] [arrow] nirandaperera commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
nirandaperera commented on pull request #10487: URL: https://github.com/apache/arrow/pull/10487#issuecomment-859187060 @github-actions autotune
[GitHub] [arrow] jonkeane closed pull request #10506: ARROW-13039: [R] Fix error message handling
jonkeane closed pull request #10506: URL: https://github.com/apache/arrow/pull/10506
[GitHub] [arrow-datafusion] alamb commented on a change in pull request #541: ShuffleReaderExec now supports multiple locations per partition
alamb commented on a change in pull request #541: URL: https://github.com/apache/arrow-datafusion/pull/541#discussion_r650205163 ## File path: ballista/rust/core/src/execution_plans/shuffle_reader.rs ## @@ -86,23 +87,18 @@ impl ExecutionPlan for ShuffleReaderExec { partition: usize, ) -> Result<Pin<Box<dyn RecordBatchStream + Send + Sync>>> { info!("ShuffleReaderExec::execute({})", partition); -let partition_location = &self.partition_location[partition]; - -let mut client = BallistaClient::try_new( -&partition_location.executor_meta.host, -partition_location.executor_meta.port, -) -.await -.map_err(|e| DataFusionError::Execution(format!("Ballista Error: {:?}", e)))?; -client -.fetch_partition( -&partition_location.partition_id.job_id, -partition_location.partition_id.stage_id, -partition, -) +let x = self.partition[partition].clone(); +let result = future::join_all(x.into_iter().map(fetch_partition)) .await -.map_err(|e| DataFusionError::Execution(format!("Ballista Error: {:?}", e))) +.into_iter() +.collect::<Result<Vec<_>>>()?; + +let result = WrappedStream::new( +Box::pin(futures::stream::iter(result).flatten()), Review comment: this is a clever way of flattening the streams (though I wonder if it will serialize them all, aka not start reading from the second until the first is entirely consumed?) Review comment: I guess I figured I would point it out (that the different partitions wouldn't be producing in parallel) ## File path: ballista/rust/client/src/context.rs ## @@ -74,32 +71,6 @@ impl BallistaContextState { } } -struct WrappedStream { -stream: Pin<Box<dyn Stream<Item = ArrowResult<RecordBatch>> + Send + Sync>>, -schema: SchemaRef, -} - -impl RecordBatchStream for WrappedStream { -fn schema(&self) -> SchemaRef { -self.schema.clone() -} -} - -impl Stream for WrappedStream { -type Item = ArrowResult<RecordBatch>; - -fn poll_next( -mut self: Pin<&mut Self>, -cx: &mut std::task::Context<'_>, -) -> std::task::Poll<Option<Self::Item>> { -self.stream.poll_next_unpin(cx) -} - -fn size_hint(&self) -> (usize, Option<usize>) { -self.stream.size_hint() -} -} - Review comment: this is a cool abstraction -- i have had need of something similar elsewhere -- perhaps it would be good to move to datafusion itself eventually
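alamb's concern about `flatten` can be illustrated with plain iterators (futures' `StreamExt::flatten` behaves analogously for streams): the flattened sequence drains the first inner source completely before yielding anything from the second, so the sources are consumed serially rather than in parallel.

```rust
// Illustration: `flatten` consumes inner sources strictly in order.
fn main() {
    let partitions = vec![vec![1, 2], vec![3, 4]];
    let flattened: Vec<i32> = partitions.into_iter().flatten().collect();
    // Every element of the first partition appears before any element
    // of the second, i.e. consumption is serialized, not interleaved.
    assert_eq!(flattened, vec![1, 2, 3, 4]);
    println!("ok");
}
```

For concurrent consumption one would reach for something like `futures::stream::select_all` instead, at the cost of losing the per-partition ordering.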