On Thu, Oct 19, 2023 at 12:18 PM Kenneth Knowles wrote:
> +1 to more helpful guide on "how to usefully participate in RC validation"
> but also big +1 to Robert, Jack, Johanna.
>
> TL;DR the RC validation is an opportunity for downstream testing.
>
> Robert alluded to the origin of the
On Thu, Oct 19, 2023 at 12:53 PM Reuven Lax wrote:
>
> Is the schema Group transform (in Java) something along these lines?
Yes, for sure it is. It (and Python's and Typescript's equivalent) are
linked in the original post. The open question is how to best express
this in YAML.
> On Wed, Oct
Is the schema Group transform (in Java) something along these lines?
On Wed, Oct 18, 2023 at 1:11 PM Robert Bradshaw via dev
wrote:
> Beam Yaml has good support for IOs and mappings, but one key missing
> feature for even writing a WordCount is the ability to do Aggregations
> [1]. While the
Or are you specifically referring to the declarative YAML pipelines?
On Thu, Oct 19, 2023 at 12:53 PM Reuven Lax wrote:
> Is the schema Group transform (in Java) something along these lines?
>
> On Wed, Oct 18, 2023 at 1:11 PM Robert Bradshaw via dev <
> dev@beam.apache.org> wrote:
>
>> Beam
Yeah, I already implemented these partitioners for my use case (I just
pasted the classnames/docstrings for them) and I used both combiners.Top
and combiners.Sample.
In fact, before writing these partitioners I had misunderstood those
combiners and thought they would partition my PCollections.
FYI, there is a Top transform[1] that will fetch the greatest n elements in
the Python SDK. It is not a partitioner, but it may be useful for your reference.
[1]
https://github.com/apache/beam/blob/68e9c997a9085b0cb045238ae406d534011e7c21/sdks/python/apache_beam/transforms/combiners.py#L191
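To illustrate the distinction drawn above between a combiner and a partitioner, here is a minimal pure-Python sketch. It does not use the Beam API; `top_n` only mimics what a Top-style combiner returns, and `partition_top_n` is a hypothetical helper showing what a partitioner with the same criterion might return instead.

```python
import heapq


def top_n(elements, n):
    """Combiner-style: return only the n greatest elements."""
    return heapq.nlargest(n, elements)


def partition_top_n(elements, n):
    """Partitioner-style (hypothetical): return (top n, everything else)."""
    elements = list(elements)
    # Rank indices rather than values so that duplicate values are kept apart.
    top_indices = set(
        heapq.nlargest(n, range(len(elements)), key=elements.__getitem__)
    )
    top = sorted((elements[i] for i in top_indices), reverse=True)
    rest = [e for i, e in enumerate(elements) if i not in top_indices]
    return top, rest


values = [5, 1, 9, 3, 9, 7]
greatest = top_n(values, 2)             # the remaining elements are dropped
top, rest = partition_top_n(values, 2)  # the remaining elements are kept
```

The combiner-style call discards everything outside the top n, while the partitioner-style call keeps every input element in exactly one of its two outputs.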
On Thu,
Yes, both need to be small enough to fit into state.
Yeah a percentage sampler would also be great, we have a bunch of use cases
for that ourselves. Not sure if it'd be too clever, but I was imagining
three public sampling partitioners: FixedSample, PercentageSample, and
Sample. Sample could
On Wed, Oct 18, 2023 at 4:19 PM Byron Ellis via dev
wrote:
> Awesome!
>
> On Wed, Oct 18, 2023 at 1:14 PM Alexey Romanenko
> wrote:
>
>> Heads up!
>>
>> Finally, all Avro-related code and the Avro dependency, which were deprecated
>> earlier (see a message above), have been removed from Beam
+1 to more helpful guide on "how to usefully participate in RC validation"
but also big +1 to Robert, Jack, Johanna.
TL;DR the RC validation is an opportunity for downstream testing.
Robert alluded to the origin of the spreadsheet: I created it long ago to
validate that the human language on our
On Thu, Oct 19, 2023 at 11:12 AM Kenneth Knowles wrote:
>
> Using SQL expressions in strings is maybe OK given we are all
> relational all the time. Either way you have to define what the
> universe of `fn` is. Here's a compact possibility:
>
> type: Combine
> config:
>   group_by: [field1,
I'm interested in adding something like this; I could see these being
generally useful for a number of cases (one that immediately comes to mind
is partitioning datasets into train/test/validation sets and writing each
to a different place).
I'm assuming Top (or FixedSample) needs to be small enough
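As a rough illustration of the train/test/validation use case mentioned above, here is a minimal sketch of a deterministic partition function. The name `split_fn`, the 80/10/10 weights, and the hashing scheme are assumptions made for the example. The `(element, num_partitions)` signature matches what the Python SDK's `beam.Partition` expects of its partition function, so a function shaped like this could plausibly be passed to it, but the sketch itself runs without Beam.

```python
import hashlib


def split_fn(element, num_partitions, weights=(0.8, 0.1, 0.1)):
    """Return a stable partition index for `element` (illustrative weights)."""
    if len(weights) != num_partitions:
        raise ValueError("need exactly one weight per partition")
    # Use a content hash instead of hash() so the assignment is stable
    # across processes and workers.
    digest = hashlib.md5(repr(element).encode("utf-8")).hexdigest()
    point = (int(digest, 16) % 10_000) / 10_000 * sum(weights)
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if point < cumulative:
            return index
    # Guard against floating-point rounding at the upper edge.
    return num_partitions - 1


# Count how many of 10,000 sample elements land in each partition.
counts = [0, 0, 0]
for i in range(10_000):
    counts[split_fn(i, 3)] += 1
```

Because the index depends only on the element's content, the same element always lands in the same split, which keeps the sets from leaking into each other across reruns.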
On Thu, Oct 19, 2023 at 11:42 AM Jan Lukavský wrote:
>
> On 10/19/23 19:41, Robert Bradshaw via dev wrote:
> > On Thu, Oct 19, 2023 at 10:25 AM Jan Lukavský wrote:
> >> On 10/19/23 18:28, Robert Bradshaw via dev wrote:
> >>> On Thu, Oct 19, 2023 at 9:00 AM Byron Ellis wrote:
> Rill is
Makes sense to me. Let's deprecate for the 2.52.0 release unless there is
some objection. You can also look at the Maven Central downloads (I believe
all PMC and maybe all committers can view this) compared to other Beam jars.
Kenn
On Mon, Oct 16, 2023 at 9:28 AM Jan Lukavský wrote:
> Sure,
On 10/19/23 19:41, Robert Bradshaw via dev wrote:
On Thu, Oct 19, 2023 at 10:25 AM Jan Lukavský wrote:
On 10/19/23 18:28, Robert Bradshaw via dev wrote:
On Thu, Oct 19, 2023 at 9:00 AM Byron Ellis wrote:
Rill is definitely SQL-oriented but I think that's going to be the most common.
Well, I accidentally conflated "stateful" and "persisting", but anyhow,
yeah, we aren't aiming to have one Beam primitive for each thing that
is probably a runner primitive.
On Thu, Oct 19, 2023 at 2:25 PM Kenneth Knowles wrote:
>
> On Fri, Oct 13, 2023 at 12:51 PM Jan Lukavský wrote:
> >
> >
On Fri, Oct 13, 2023 at 12:51 PM Jan Lukavský wrote:
>
> Hi,
>
> I think nearly everything has already been said in this thread, but ...
> it is time for Friday discussions. :)
>
> Today I recalled a discussion we had a long time ago, when we were
> designing Euphoria (btw, deprecating
Using SQL expressions in strings is maybe OK given we are all
relational all the time. Either way you have to define what the
universe of `fn` is. Here's a compact possibility:
type: Combine
config:
  group_by: [field1, field2]
  aggregates:
    max_cost: "MAX(cost)"
    total_cost: "SUM(cost)"
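To make the intended semantics of the compact form above concrete, here is a minimal pure-Python sketch of what such a Combine might compute on a handful of rows. It is an illustration only, not Beam YAML's implementation: the helper `combine_rows`, the sample rows, and the mapping of "MAX" and "SUM" to Python built-ins are all assumptions, and the SQL strings are represented as pre-parsed (function, field) pairs rather than parsed.

```python
from collections import defaultdict

# Hypothetical stand-ins for the SQL aggregate functions named in the config.
AGGREGATE_FNS = {"MAX": max, "SUM": sum}


def combine_rows(rows, group_by, aggregates):
    """Group dict rows by the `group_by` fields and aggregate each group.

    `aggregates` maps an output field name to a (function name, input field)
    pair, e.g. {"max_cost": ("MAX", "cost")}.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in group_by)
        groups[key].append(row)

    results = []
    for key, members in groups.items():
        out = dict(zip(group_by, key))
        for out_field, (fn_name, in_field) in aggregates.items():
            out[out_field] = AGGREGATE_FNS[fn_name](m[in_field] for m in members)
        results.append(out)
    return results


rows = [
    {"field1": "a", "field2": 1, "cost": 3.0},
    {"field1": "a", "field2": 1, "cost": 5.0},
    {"field1": "b", "field2": 2, "cost": 7.0},
]
result = combine_rows(
    rows,
    group_by=["field1", "field2"],
    aggregates={"max_cost": ("MAX", "cost"), "total_cost": ("SUM", "cost")},
)
```

Each output row carries the grouping fields plus one field per entry under `aggregates`, which is the shape the config suggests.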
On Thu, Oct 19, 2023 at 10:25 AM Jan Lukavský wrote:
>
> On 10/19/23 18:28, Robert Bradshaw via dev wrote:
> > On Thu, Oct 19, 2023 at 9:00 AM Byron Ellis wrote:
> >> Rill is definitely SQL-oriented but I think that's going to be the most
> >> common. Dataframes are explicitly modeled on the
On 10/19/23 18:28, Robert Bradshaw via dev wrote:
On Thu, Oct 19, 2023 at 9:00 AM Byron Ellis wrote:
Rill is definitely SQL-oriented but I think that's going to be the most common.
Dataframes are explicitly modeled on the relational approach so that's going to
look a lot like SQL,
I think
On Thu, Oct 19, 2023 at 9:28 AM Robert Bradshaw wrote:
> On Thu, Oct 19, 2023 at 9:00 AM Byron Ellis wrote:
> >
> > Rill is definitely SQL-oriented but I think that's going to be the most
> common. Dataframes are explicitly modeled on the relational approach so
> that's going to look a lot like
On Thu, Oct 19, 2023 at 9:00 AM Byron Ellis wrote:
>
> Rill is definitely SQL-oriented but I think that's going to be the most
> common. Dataframes are explicitly modeled on the relational approach so
> that's going to look a lot like SQL,
I think pretty much any approach that fits here is
Rill is definitely SQL-oriented but I think that's going to be the most
common. Dataframes are explicitly modeled on the relational approach so
that's going to look a lot like SQL, which leaves us with S-style formulas
(which I like but are pretty niche) and I guess pivot tables coming from
the
Hey all,
While writing a few pipelines, I was surprised by how few partitioners
there were in the Python SDK. I wrote a couple that are pretty generic and
possibly generally useful. Just wanted to do a quick poll to see if they
seem useful enough to be in the SDK's library of transforms. If so, I
This is your daily summary of Beam's current high priority issues that may need
attention.
See https://beam.apache.org/contribute/issue-priorities for the meaning and
expectations around issue priorities.
Unassigned P1 Issues:
https://github.com/apache/beam/issues/29022 [Failing Test]: