Vincent, I will be out in the sense that I cannot really take on more
activities: apart from your review I have two more pending, plus other work to
finish, so I prefer not to add work I cannot finish. I am still available
for the review, however, so let's get this finally finished.
It has been really interesting to read all the perspectives in this thread, and I
have even switched sides back and forth given the advantages and issues raised
here, which means we have clear pros and cons.
One 'not so nice' discovery related to this discussion for me was BEAM-10375 [1].
tl;dr: Reads
OK, I'm +0 on this change. Using the PTransform as an element is
probably better than duplicating the full API on another interface,
and I think it's worth getting this unblocked. This will require a Read2
if we have to add options in an upgrade-compatible way.
On Tue, Jul 7, 2020 at 3:19 PM Luke
On Tue, Jul 7, 2020 at 2:06 PM Luke Cwik wrote:
>
> Robert, the intent is that the Read object would use a schema coder and for
> XLang purposes would be no different than a POJO.
Just to clarify, you're saying that the Read PTransform would be
encoded via the schema coder? That still feels a
Robert, the intent is that the Read object would use a schema coder and for
XLang purposes would be no different than a POJO. The issue of how to deal
with closures applies to both equally, and that is why I suggested favoring
data over closures. Once there is an implementation for how to
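A minimal sketch of what "uses a schema coder, no different than a POJO" can look like in practice, assuming the standard @DefaultSchema annotation; the config class and its fields are made up for illustration:

  import org.apache.beam.sdk.schemas.JavaFieldSchema;
  import org.apache.beam.sdk.schemas.annotations.DefaultSchema;

  // Hypothetical read configuration carried as plain data. The annotation lets
  // the SDK infer a Beam schema (and hence a SchemaCoder) for it, so another
  // SDK sees it exactly like any other schema'd POJO: fields only, no closures.
  @DefaultSchema(JavaFieldSchema.class)
  public class MyReadConfig {
    public String table;      // what to read
    public int numSplits;     // how to read it
    public MyReadConfig() {}  // schema inference needs a default constructor
  }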
On Thu, Jul 2, 2020 at 10:26 AM Kenneth Knowles wrote:
>
> On Wed, Jul 1, 2020 at 4:17 PM Eugene Kirpichov
> wrote:
>
>> Kenn - I don't mean an enum of common closures, I mean expressing
>> closures in a restricted sub-language such as the language of SQL
>> expressions.
>>
>
> My lack of
On Wed, Jul 1, 2020 at 4:17 PM Eugene Kirpichov
wrote:
> Kenn - I don't mean an enum of common closures, I mean expressing closures
> in a restricted sub-language such as the language of SQL expressions.
>
My lack of clarity: enums was my phrasing of Luke's item 1). I understood
what you meant.
I, too, have a general preference for using a POJO (and cross-language
compatible) config object for ReadAll rather than having the Read elements
themselves as PCollection elements. It seems the motivation is that the Read
object already has all the builder methods (plus some other configuration
Kenn - I don't mean an enum of common closures, I mean expressing closures
in a restricted sub-language such as the language of SQL expressions. That
would only work if there is a portable way to interpret SQL expressions,
but if there isn't, maybe there should be - for the sake of, well,
From the Go side of the table, the Go language doesn't provide a mechanism
to serialize or access closure data, which means DoFns can't be functional
closures. This, combined with the move to have the "Structural DoFns" be
serialized using Beam Schemas, has the net result that if Go transforms are
I don't think an enum of the most common closures will work. The input types
are typically generics that are made concrete by the caller who also
provides the closures. I think Luke's (2) is the same idea as my "Java
still assembles it [using opaque Python closures/transforms]". It seems
like an
Yeah, mainly I just feel like dynamic reads and dynamic writes (and perhaps
not-yet-invented similar transforms of other kinds) are tightly related -
they are either very similar, or are duals of each other - so they should
use the same approach. If they are using different approaches, it is a
Kenn, I'm not too worried about closures since:
1) the expansion service for a transform could have a well-defined set of
closures by name that are returned as serialized objects that don't need to
be interpretable by the caller
2) the language could store serialized functions of another language
Sorry for jumping into this late and casting a vote against the
consensus... but I think I'd prefer standardizing on a pattern like
PCollection<SourceDescriptor> rather than PCollection<Read>. That
approach clearly separates the parameters that are allowed to vary across a
ReadAll (the ones defined in
Eugene, let me see if I understand correctly. Your initial email suggested
we consider adopting the "dynamic write" style for ReadAll transforms.
Supposing Read had three fields x, y, and z, and you have a
PCollection with all the info you need.
(Ismaël's) "Read" way: Use a PTransform to
I'd like to raise one more time the question of consistency between dynamic
reads and dynamic writes, per my email at the beginning of the thread.
If the community prefers ReadAll to read from Read, then should
dynamicWrites write to Write?
On Mon, Jun 29, 2020 at 8:57 AM Boyuan Zhang wrote:
>
It seems like most of us agree on the idea that ReadAll should read from
Read. I'm going to update the Kafka ReadAll with the same pattern.
Thanks for all your help!
On Fri, Jun 26, 2020 at 12:12 PM Chamikara Jayalath
wrote:
>
>
> On Fri, Jun 26, 2020 at 11:49 AM Luke Cwik wrote:
>
>> I would
On Fri, Jun 26, 2020 at 11:49 AM Luke Cwik wrote:
> I would also like to suggest that transforms that implement ReadAll via
> Read should also provide methods like:
>
> // Uses the specified values if unspecified in the input element from the
> PCollection.
> withDefaults(Read read);
> // Uses
I would also like to suggest that transforms that implement ReadAll via
Read should also provide methods like:
// Uses the specified values if unspecified in the input element from the
PCollection.
withDefaults(Read read);
// Uses the specified values regardless of what the input element from the
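A self-contained sketch of the merge semantics being suggested here, using a made-up three-field config instead of the real Read transform (all names are illustrative only):

  // "Defaults" fill in fields the element left unset; "forced" values win
  // over whatever the element carries.
  final class ReadConfig {
    final String topic;        // null means "unset"
    final Integer maxRecords;  // null means "unset"

    ReadConfig(String topic, Integer maxRecords) {
      this.topic = topic;
      this.maxRecords = maxRecords;
    }

    // withDefaults: keep the element's value when present, otherwise fall back.
    ReadConfig withDefaults(ReadConfig defaults) {
      return new ReadConfig(
          topic != null ? topic : defaults.topic,
          maxRecords != null ? maxRecords : defaults.maxRecords);
    }

    // withForcedValues: the supplied values apply regardless of the element.
    ReadConfig withForcedValues(ReadConfig forced) {
      return new ReadConfig(
          forced.topic != null ? forced.topic : topic,
          forced.maxRecords != null ? forced.maxRecords : maxRecords);
    }
  }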
Ismael, it is good to hear that using Read as the input didn't have a bunch
of parameters that were being skipped/ignored. Also, for the polymorphism
issue you have to rely on the user correctly telling you the type in such a
way that it is a common ancestor of all the runtime types that will
Discussion regarding cross-language transforms is a slight tangent here.
But I think, in general, it's great if we can use existing transforms (for
example, IO connectors) as cross-language transforms without having to
build more composites (irrespective of whether in ExternalTransformBuilders
or
For unbounded SDF in Kafka, we also consider the upgrading/downgrading
compatibility in the pipeline update scenario (for detailed discussion,
please refer to
https://lists.apache.org/thread.html/raf073b8741317244339eb5b2bce844c0f9e0d700c3e4de392fc648d6%40%3Cdev.beam.apache.org%3E).
In order to
We forgot to mention (5) External.Config used in cross-lang, see KafkaIO
ExternalTransformBuilder. This approach is the predecessor of (4) and probably a
really good candidate to be replaced by the Row-based configuration Boyuan is
envisioning (so good to be aware of this).
Thanks for the clear
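For readers who have not looked at (5): an ExternalTransformBuilder maps a plain configuration object coming from the expansion request into an ordinary Java transform. A hedged sketch of the general shape; the interface is real, while the config class and the transform it builds are stand-ins:

  import org.apache.beam.sdk.transforms.Create;
  import org.apache.beam.sdk.transforms.ExternalTransformBuilder;
  import org.apache.beam.sdk.transforms.PTransform;
  import org.apache.beam.sdk.values.PBegin;
  import org.apache.beam.sdk.values.PCollection;

  // Configuration as plain data, populated from the cross-language request.
  class MyReadConfiguration {
    private String table;
    public void setTable(String table) { this.table = table; }
    public String getTable() { return table; }
  }

  // The builder turns that data back into a Java transform.
  class MyReadBuilder
      implements ExternalTransformBuilder<MyReadConfiguration, PBegin, PCollection<String>> {
    @Override
    public PTransform<PBegin, PCollection<String>> buildExternal(MyReadConfiguration config) {
      // A real IO would return something like MyIO.read().withTable(config.getTable());
      // Create.of(...) only keeps this sketch self-contained.
      return Create.of(config.getTable());
    }
  }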
I had mentioned that approach 1 and approach 2 work for cross-language. The
difference is that the cross-language transform would take a well-known
definition and convert it to the Read transform. A normal user would have a
pipeline that would look like:
1: PCollection -> PTransform(ReadAll) ->
I believe that the initial goal of unifying ReadAll as a general
"PTransform<PCollection<Read>, PCollection<OutputType>>" was to reduce the
amount of code duplication and the error-prone approaches related to this. It
makes much sense since usually we have all the needed configuration set in Read
objects and, as Ismael mentioned,
Sorry for the typo. I mean I think we can go with *(3)* and (4): use the
data type that is schema-aware as the input of ReadAll.
On Wed, Jun 24, 2020 at 7:42 PM Boyuan Zhang wrote:
> Thanks for the summary, Cham!
>
> I think we can go with (2) and (4): use the data type that is schema-aware
>
Thanks for the summary, Cham!
I think we can go with (2) and (4): use the data type that is schema-aware
as the input of ReadAll.
Converting Read into ReadAll helps us stick with SDF-like IO. But only
having (3) is not enough to solve the problem of using ReadAll in the x-lang
case.
The key
I see. So it seems like there are three options discussed so far when it
comes to defining source descriptors for ReadAll-type transforms:
(1) Use Read PTransform as the element type of the input PCollection
(2) Use a POJO that describes the source as the data element of the input
PCollection
(3)
I believe we do require PTransforms to be serializable since anonymous
DoFns typically capture the enclosing PTransform.
On Wed, Jun 24, 2020 at 10:52 AM Chamikara Jayalath
wrote:
> Seems like Read in PCollection<Read> refers to a transform, at least here:
>
Seems like Read in PCollection<Read> refers to a transform, at least here:
https://github.com/apache/beam/blob/master/sdks/java/io/hbase/src/main/java/org/apache/beam/sdk/io/hbase/HBaseIO.java#L353
I'm in favour of separating construction-time transforms from execution-time
data objects that we store
Hi Ismael,
I think the ReadAll in the IO connector refers to the IO with the SDF
implementation regardless of the type of input, where Read refers to
UnboundedSource. One major pushback against using KafkaIO.Read as the source
description is that not all configurations of KafkaIO.Read are meaningful
to populate
To provide additional context, the KafkaIO ReadAll transform takes a
PCollection<KafkaSourceDescriptor>. This KafkaSourceDescriptor is a POJO
that contains the configurable parameters for reading from Kafka. This is
different from the pattern that Ismael listed because they take
PCollection<Read> as input and the Read is the
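As a rough illustration of the descriptor idea (field names are made up, this is not the real KafkaSourceDescriptor): the element carries only the values that can meaningfully vary per read, as plain data, while connection-level settings stay on the transform itself:

  import java.io.Serializable;

  // Simplified, hypothetical source descriptor: data only, no execution logic
  // and no closures, so it is trivially encodable and usable across SDKs.
  public class TopicPartitionDescriptor implements Serializable {
    public final String topic;
    public final int partition;
    public final long startOffset;  // -1 meaning "start from the committed offset"

    public TopicPartitionDescriptor(String topic, int partition, long startOffset) {
      this.topic = topic;
      this.partition = partition;
      this.startOffset = startOffset;
    }
  }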
Hi Ismael,
Thanks for taking this on. Have you considered an approach similar (or
dual) to FileIO.write(), where we in a sense also have to configure a
dynamic number of different IO transforms of the same type (file writes)?
E.g. how in this example we configure many aspects of many file writes:
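A sketch of that FileIO.writeDynamic() style, written from memory, so treat the exact method names as approximate; Event with getCountry() and toJson() is a made-up user type. Each element is routed to a destination, and each destination gets its own naming and sink configuration:

  import org.apache.beam.sdk.coders.StringUtf8Coder;
  import org.apache.beam.sdk.io.FileIO;
  import org.apache.beam.sdk.io.TextIO;
  import org.apache.beam.sdk.transforms.Contextful;
  import org.apache.beam.sdk.values.PCollection;

  // Made-up user type for the sketch.
  static class Event implements java.io.Serializable {
    String country;
    String json;
    String getCountry() { return country; }
    String toJson() { return json; }
  }

  static void writePerCountry(PCollection<Event> events) {
    events.apply(
        FileIO.<String, Event>writeDynamic()
            .by(Event::getCountry)                      // one destination per country
            .via(Contextful.fn(Event::toJson), TextIO.sink())
            .to("gs://my-bucket/events")
            .withDestinationCoder(StringUtf8Coder.of())
            .withNaming(country ->
                FileIO.Write.defaultNaming("events-" + country, ".json")));
  }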