Re: Bigtable for BeamSQL - question about the schema design

2020-11-16 Thread Piotr Szuberski
> > Is there a jira for this issue?
Sorry for the delay. Luckily Rui answered it better than I would.
> https://issues.apache.org/jira/browse/BEAM-10896 is the one that I am aware of. Though it says to aim to improve UNNEST, I think it could improve ARRAY in general. Also like Kenneth

Re: Getting ClassCastException

2020-11-16 Thread Sonam Ramchand
Here you are, org.apache.beam.sdk.extensions.sql.zetasql.translation.SqlOperators$1 cannot be cast to org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlAggFunction java.lang.ClassCastException: org.apache.beam.sdk.extensions.sql.zetasql.translation.SqlOperators$1 cannot be cast to

Website Revamp Update – Week 3 & 4

2020-11-16 Thread Griselda Cuevas
Hi Beam Community! Sprints 3 and 4 of the website revamp project have concluded, and below is a recap of the work the team has done. The working notes can be found here [1] and the presentations from the past two meetings here [2]. The PRD has been split into the requirements doc and a

Re: PTransform Annotations Proposal

2020-11-16 Thread Robert Bradshaw
I agree things like GPU, high-mem, etc. belong to the environment. If annotations are truly advisory, one can imagine merging environments by taking the union of annotations and still producing a correct pipeline. (This would mean that annotations would have to be a multi-map...) On the other
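The "union of annotations" idea in the message above can be sketched as follows. This is only an illustration in plain Python, not Beam's API: it assumes annotations are modeled as a multi-map from a key to a set of values, and the function name `merge_annotations` and the keys `accelerator` and `memory` are hypothetical.

```python
from collections import defaultdict


def merge_annotations(*annotation_maps):
    """Union several multi-maps of annotation key -> set of values.

    Assumption: annotations are purely advisory, so keeping every value
    seen for a key (rather than picking a winner) cannot affect correctness.
    """
    merged = defaultdict(set)
    for annotations in annotation_maps:
        for key, values in annotations.items():
            merged[key].update(values)
    # Return a plain dict with fresh sets so the inputs are never aliased.
    return {key: set(values) for key, values in merged.items()}


# Hypothetical annotation keys and values, for illustration only.
env_a = {"accelerator": {"gpu"}, "memory": {"high"}}
env_b = {"memory": {"high", "16GB"}}
merged = merge_annotations(env_a, env_b)
```

Because a key may end up with several values after the merge, a single-valued map would not be enough, which is why the message notes that annotations would have to be a multi-map.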

Re: PTransform Annotations Proposal

2020-11-16 Thread Robert Burke
That's good historical context. But then we'd still need to codify that the annotation is optional and does not affect correctness. Conflicts become easier to manage (as environments with conflicting annotations simply don't get merged, and stay as distinct environments) but are still

Re: PTransform Annotations Proposal

2020-11-16 Thread Kenneth Knowles
I am +1 to the proposal but believe it should be moved to the Environment. I could be convinced otherwise, but would want to really understand the details. I think we haven't done a great job communicating the purpose of the Environment proto. It was explicitly created for this purpose. 1. It

Re: Bigtable for BeamSQL - question about the schema design

2020-11-16 Thread Rui Wang
On Tue, Nov 10, 2020 at 10:25 AM Brian Hulette wrote: > > > On Tue, Nov 10, 2020 at 5:46 AM Piotr Szuberski < > piotr.szuber...@polidea.com> wrote: > >> Unfortunately according to the documentation, BeamSQL doesn't work well >> with ARRAY, like ARRAY> which I confirmed empirically. >> >> > Is

Re: Bigtable for BeamSQL - question about the schema design

2020-11-16 Thread Kenneth Knowles
If I recall correctly, we need to upgrade Calcite for this. On Tue, Nov 10, 2020 at 10:24 AM Brian Hulette wrote: > > > On Tue, Nov 10, 2020 at 5:46 AM Piotr Szuberski < > piotr.szuber...@polidea.com> wrote: > >> Unfortunately according to the documentation, BeamSQL doesn't work well >> with

Re: PTransform Annotations Proposal

2020-11-16 Thread Jan Lukavský
Minor correction: the CoGBK broadcast vs. full shuffle is probably not an ideal example, because it still requires grouping the larger PCollection (if not already grouped). If we take a Join PTransform that acts on the cartesian product of these groups, then it works well. Jan On 11/16/20 8:39 PM,

Re: PTransform Annotations Proposal

2020-11-16 Thread Jan Lukavský
Hi, could this proposal be generalized to annotations of PCollections as well? Maybe that reduces to several types of annotations of a PTransform - e.g.  a) runtime annotations of a PTransform (that might be scheduling hints - i.e. schedule this task to nodes with GPUs, etc.)  b) output

Re: PTransform Annotations Proposal

2020-11-16 Thread Robert Burke
I imagine it is up to the specific annotation to define that. The runner notionally doesn't need to do anything with them, as they are optional and not required for correctness. On Mon, Nov 16, 2020, 10:56 AM Reuven Lax wrote: > PTransforms are hierarchical - namely a

Re: PTransform Annotations Proposal

2020-11-16 Thread Reuven Lax
PTransforms are hierarchical - namely a PTransform contains other PTransforms, and so on. Is the runner expected to resolve all annotations down to leaf nodes? What happens if that results in conflicting annotations? On Mon, Nov 16, 2020 at 10:54 AM Robert Burke wrote: > That's a good question.

Re: PTransform Annotations Proposal

2020-11-16 Thread Robert Burke
That's a good question. I think the main difference is a matter of scope. Annotations would apply to a PTransform, while an environment applies to sets of transforms. Another difference is the optional nature of the annotations: they don't affect correctness. Runners don't need to do anything with them

Re: PTransform Annotations Proposal

2020-11-16 Thread Chad Dombrova
> Another example of an optional annotation is marking a transform to run on secure hardware, or to give hints to profiling/dynamic analysis tools.
There seems to be a lot of overlap between this idea and Environments. Can you talk about how you feel they may be different or related?

Beam Dependency Check Report (2020-11-16)

2020-11-16 Thread Apache Jenkins Server
High Priority Dependency Updates Of Beam Python SDK:
Dependency Name | Current Version | Latest Version | Release Date Of The Current Used Version | Release Date Of The Latest Release | JIRA Issue
chromedriver-binary | 86.0.4240.22.0

Re: PTransform Annotations Proposal

2020-11-16 Thread Reza Ardeshir Rokni
+1 having a NeedsRam(x) annotation would be incredibly helpful. On Fri, 13 Nov 2020 at 05:57, Robert Burke wrote: > (Disclaimer, Mirac and their team did approach me about this beforehand as > their interest is in the Go SDK.) > > +1 I think it's a good idea. As you've pointed out, there are

Re: Question about saving data to use across runner's instances

2020-11-16 Thread Reza Ardeshir Rokni
Hi, Do you have an upper bound on how large the file will become? If it's small enough to fit into a side input, you may be able to make use of the slowly updating side input pattern: https://beam.apache.org/documentation/patterns/side-inputs/ If not, then a stateful DoFn would be a good choice, but note