Re: Supporting Dynamic Destinations in a portable context

2024-04-03 Thread Robert Bradshaw via dev
On Wed, Apr 3, 2024 at 4:15 AM Kenneth Knowles  wrote:

> Let me summarize the most recent proposal on-list to frame my question
> about this last suggestion. It looks like this:
>
> 1. user has an element, call it `data`
> 2. user maps `data` to an arbitrary metadata row, call it `dest`
> 3. we can do things like shuffle on `dest` because it isn't too big
> 4. we map `dest` to a concrete destination (aka URL) to write to by a
> string format that uses fields of `dest`
>
> I believe steps 1-3 are identical in expressivity to non-portable
> DynamicDestinations. So, Reuven, the question is for step 4: what are the
> mappings from `dest` to URL that cannot be expressed by string formatting
> but need SQL or Lua, etc? That would be a useful guide to consideration of
> those possibilities.
>

I think any non-trivial mapping can be done in step 2. It may be possible
to come up with a case where something other than string substitution is
needed to make dest small enough to shuffle, but I think that'd be a
really rare corner case, and then it's just an optimization rather than a
feature-completeness question.
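A minimal plain-Python sketch of that point, assuming hypothetical field names (`event_time`, `user_id`) and a hypothetical helper `to_dest`; none of this is a Beam API. The user's map in step 2 does the non-trivial work (date bucketing and a stable hash shard), so the destination pattern itself only needs plain substitution:

```python
import zlib
from datetime import datetime, timezone
from typing import Any, Dict, NamedTuple


class Dest(NamedTuple):
    """Hypothetical small destination row produced in step 2."""
    date: str
    shard: int


def to_dest(data: Dict[str, Any], num_shards: int = 4) -> Dest:
    # Non-trivial logic (date bucketing, stable hashing) lives in the
    # user's map, not in the destination pattern.
    ts = datetime.fromtimestamp(data["event_time"], tz=timezone.utc)
    shard = zlib.crc32(data["user_id"].encode("utf-8")) % num_shards
    return Dest(date=ts.strftime("%Y-%m-%d"), shard=shard)


# Step 4 is then pure substitution of fields of `dest`.
TEMPLATE = "logs/{dest.date}/shard-{dest.shard}"

record = {"event_time": 0, "user_id": "alice"}
print(TEMPLATE.format(dest=to_dest(record)))
```

zlib.crc32 is used rather than the built-in hash() because Python randomizes string hashing per process, which would make the shard (and thus the destination) differ across workers.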


> FWIW, I think that even if we add a mini-language, string formatting has
> better ease of use (it can easily be displayed in a UI, etc.), so it would be
> the first choice, and more advanced stuff would be a fallback for rare cases.
> So they are both valuable, and I'd be happy to implement the easier-to-use
> path right away while we discuss.
>

+1. Note that this even lets us share the config "path/table/..." field
that is a static string for non-dynamic destinations.

In light of the above, let's avoid a complex mini-language. I'd start with
nothing but plugging things in w/o any formatting options.
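A rough sketch of what "nothing but plugging things in" could mean, in plain Python. The function `resolve_destination` and its dotted-field placeholder syntax are assumptions for illustration, not the proposed Beam implementation. Format specs such as `{x:>8}` are rejected, and a pattern with no placeholders passes through unchanged, which is what lets one config field serve both static and dynamic destinations:

```python
import re
from typing import Any, Mapping

# A placeholder is {name} or {name.sub.field}; no ":spec" or "!conv" support.
_PLACEHOLDER = re.compile(
    r"\{([A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*)\}"
)


def resolve_destination(pattern: str, row: Mapping[str, Any]) -> str:
    """Plug (stringified) field values into `pattern`, nothing more.

    A pattern with no placeholders is returned unchanged, so the same config
    field can hold a static destination.
    """
    # Any brace left after removing valid placeholders is unsupported syntax.
    leftover = _PLACEHOLDER.sub("", pattern)
    if "{" in leftover or "}" in leftover:
        raise ValueError(f"unsupported placeholder syntax in {pattern!r}")

    def lookup(match: "re.Match[str]") -> str:
        value: Any = row
        for part in match.group(1).split("."):
            value = value[part]  # KeyError if the field is missing
        return str(value)

    return _PLACEHOLDER.sub(lookup, pattern)


row = {"dest_info": {"user": "alice"}}
print(resolve_destination("long/common/path/to/destination/files/{dest_info.user}", row))
# long/common/path/to/destination/files/alice
print(resolve_destination("path/to/static/output", row))
# path/to/static/output
```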


Re: Supporting Dynamic Destinations in a portable context

2024-04-03 Thread Kenneth Knowles
Let me summarize the most recent proposal on-list to frame my question
about this last suggestion. It looks like this:

1. user has an element, call it `data`
2. user maps `data` to an arbitrary metadata row, call it `dest`
3. we can do things like shuffle on `dest` because it isn't too big
4. we map `dest` to a concrete destination (aka URL) to write to by a
string format that uses fields of `dest`
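The four steps can be simulated in a few lines of plain Python. This is only a sketch under stated assumptions: `Dest`, `to_dest`, the example fields, and the table-like `PATTERN` are hypothetical, and a dict keyed on `dest` stands in for the shuffle in step 3:

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Dest(NamedTuple):
    # Hypothetical metadata row: small enough to shuffle on.
    table: str
    region: str


# Step 1: the user's elements (`data`).
elements = [
    {"kind": "click", "region": "eu", "payload": "a"},
    {"kind": "view", "region": "us", "payload": "b"},
    {"kind": "click", "region": "eu", "payload": "c"},
]


# Step 2: user-supplied map from `data` to `dest`.
def to_dest(data: dict) -> Dest:
    return Dest(table=data["kind"] + "s", region=data["region"])


# Step 3: group ("shuffle") on the small `dest` key.
grouped: Dict[Dest, List[dict]] = defaultdict(list)
for data in elements:
    grouped[to_dest(data)].append(data)

# Step 4: turn each `dest` into a concrete URL with a string format.
PATTERN = "project.dataset.{dest.table}_{dest.region}"
for dest, batch in grouped.items():
    print(PATTERN.format(dest=dest), [d["payload"] for d in batch])
# project.dataset.clicks_eu ['a', 'c']
# project.dataset.views_us ['b']
```

Only the small `dest` rows cross the shuffle; the long URL is produced from the pattern after grouping.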

I believe steps 1-3 are identical in expressivity to non-portable
DynamicDestinations. So, Reuven, the question is for step 4: what are the
mappings from `dest` to URL that cannot be expressed by string formatting
but need SQL or Lua, etc? That would be a useful guide to consideration of
those possibilities.

FWIW, I think that even if we add a mini-language, string formatting has
better ease of use (it can easily be displayed in a UI, etc.), so it would be
the first choice, and more advanced stuff would be a fallback for rare cases.
So they are both valuable, and I'd be happy to implement the easier-to-use
path right away while we discuss.

Kenn

On Tue, Apr 2, 2024 at 2:59 PM Reuven Lax via dev 
wrote:

> I do suspect that over time we'll find more and more cases we can't
> express, and will be asked to extend this little templating in more
> directions. To head that off - could we easily just reuse an existing
> language (SQL, Lua, something of that form?) instead of creating something
> new?

Re: Supporting Dynamic Destinations in a portable context

2024-04-02 Thread Reuven Lax via dev
I do suspect that over time we'll find more and more cases we can't
express, and will be asked to extend this little templating in more
directions. To head that off - could we easily just reuse an existing
language (SQL, Lua, something of that form?) instead of creating something
new?

On Tue, Apr 2, 2024 at 8:55 AM Kenneth Knowles  wrote:

> I really like this proposal. I think it has narrowed down and solved the
> essential problem of not shuffling excess redundant data, and also provides
> the vast majority of the functionality that a lambda would, with
> significantly better debuggability and usability too, since the dynamic
> destination pattern string can be in display data, etc.
>
> Kenn


Re: Supporting Dynamic Destinations in a portable context

2024-04-02 Thread Kenneth Knowles
I really like this proposal. I think it has narrowed down and solved the
essential problem of not shuffling excess redundant data, and also provides
the vast majority of the functionality that a lambda would, with
significantly better debuggability and usability too, since the dynamic
destination pattern string can be in display data, etc.

Kenn

On Wed, Mar 27, 2024 at 1:58 PM Robert Bradshaw via dev 
wrote:

> On Wed, Mar 27, 2024 at 10:20 AM Reuven Lax  wrote:
>
>> Can the prefix still be generated programmatically at graph creation time?
>>
>
> Yes. It's just a property of the transform passed by the user at
> configuration time.


Re: Supporting Dynamic Destinations in a portable context

2024-03-27 Thread Robert Bradshaw via dev
On Wed, Mar 27, 2024 at 10:20 AM Reuven Lax  wrote:

> Can the prefix still be generated programmatically at graph creation time?
>

Yes. It's just a property of the transform passed by the user at
configuration time.
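For example, a sketch in plain Python (the function name `build_destination_pattern` and its `base` and `env` parameters are hypothetical) of computing the prefix while constructing the pipeline and storing the result as an ordinary string. It assumes the pattern uses `str.format`-style braces, so any literal braces in the computed prefix are escaped by doubling them:

```python
from types import SimpleNamespace


def build_destination_pattern(base: str, env: str) -> str:
    # Runs once, at pipeline-construction time; the result is just a string
    # that would be stored in the transform's configuration.
    prefix = f"{base}/{env}/files"
    # Escape any literal braces in the computed prefix (str.format convention).
    safe_prefix = prefix.replace("{", "{{").replace("}", "}}")
    # Doubled braces in the f-string keep a literal {dest_info.user} placeholder.
    return f"{safe_prefix}/{{dest_info.user}}"


pattern = build_destination_pattern("long/common/path", "prod")
print(pattern)
# long/common/path/prod/files/{dest_info.user}

# Per element, only the substitution happens.
print(pattern.format(dest_info=SimpleNamespace(user="alice")))
# long/common/path/prod/files/alice
```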


Re: Supporting Dynamic Destinations in a portable context

2024-03-27 Thread Reuven Lax via dev
Can the prefix still be generated programmatically at graph creation time?

On Wed, Mar 27, 2024 at 9:40 AM Robert Bradshaw  wrote:

> On Wed, Mar 27, 2024 at 9:12 AM Reuven Lax  wrote:
>
>> This does seem like the best compromise, though I think there will still
>> end up being performance issues. A common pattern I've seen is that there
>> is a long common prefix to the dynamic destination followed by the dynamic
>> component. e.g. the destination might be
>> long/common/path/to/destination/files/. In this case, the
>> prefix is often much larger than messages themselves and is what gets
>> effectively encoded in the lambda.
>>
>
> The idea here is that the destination would be given as a format string,
> say, "long/common/path/to/destination/files/{dest_info.user}". Another way
> to put this is that we support (only) "lambdas" that are represented as
> string substitutions. (The fact that dest_info does not have to be part of
> the record, and can be the output of an arbitrary map if need be, makes
> this restriction not so bad.)
>
> As well as solving the performance issues, I think this is actually a
> pretty convenient and natural way for the user to name their destination
> (for the common use case, even easier than providing a lambda), and has the
> benefit of being much more transparent than an arbitrary callable as well
> for introspection (for both machines and humans that may look at the
> resulting pipeline).


Re: Supporting Dynamic Destinations in a portable context

2024-03-27 Thread Robert Bradshaw via dev
On Wed, Mar 27, 2024 at 9:12 AM Reuven Lax  wrote:

> This does seem like the best compromise, though I think there will still
> end up being performance issues. A common pattern I've seen is that there
> is a long common prefix to the dynamic destination followed by the dynamic
> component. e.g. the destination might be
> long/common/path/to/destination/files/. In this case, the
> prefix is often much larger than messages themselves and is what gets
> effectively encoded in the lambda.
>

The idea here is that the destination would be given as a format string,
say, "long/common/path/to/destination/files/{dest_info.user}". Another way
to put this is that we support (only) "lambdas" that are represented as
string substitutions. (The fact that dest_info does not have to be part of
the record, and can be the output of an arbitrary map if need be, makes
this restriction not so bad.)

As well as solving the performance issues, I think this is actually a
pretty convenient and natural way for the user to name their destination
(for the common use case, even easier than providing a lambda), and has the
benefit of being much more transparent than an arbitrary callable as well
for introspection (for both machines and humans that may look at the
resulting pipeline).
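To make the transparency point concrete, here is a sketch in plain Python of a "lambda" restricted to string substitution. The class `FormatDestination` and its `display_data` method are hypothetical and are not Beam's DisplayData API. Because the object keeps the pattern, the fields it reads can be listed with `string.Formatter().parse`, whereas printing an arbitrary lambda only shows something like `<function <lambda> at 0x...>`:

```python
import string
from types import SimpleNamespace
from typing import Any, Dict, List


class FormatDestination:
    """A 'lambda' restricted to string substitution (hypothetical class)."""

    def __init__(self, pattern: str) -> None:
        self.pattern = pattern
        # Formatter().parse yields (literal_text, field_name, format_spec,
        # conversion); field_name is None for trailing literal text.
        self.fields: List[str] = [
            name for _, name, _, _ in string.Formatter().parse(pattern) if name
        ]

    def __call__(self, **dest: Any) -> str:
        return self.pattern.format(**dest)

    def display_data(self) -> Dict[str, Any]:
        # Everything needed to show or inspect the mapping is plain data.
        return {"pattern": self.pattern, "fields": self.fields}


dest_fn = FormatDestination("long/common/path/to/destination/files/{dest_info.user}")
print(dest_fn(dest_info=SimpleNamespace(user="alice")))
# long/common/path/to/destination/files/alice
print(dest_fn.display_data())
# {'pattern': 'long/common/path/to/destination/files/{dest_info.user}', 'fields': ['dest_info.user']}
```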


> I'm not entirely sure how to address this in a portable context. We might
> simply have to accept the extra overhead when going cross language.
>
> Reuven


Re: Supporting Dynamic Destinations in a portable context

2024-03-27 Thread Ahmed Abualsaud via dev
> This does seem like the best compromise, though I think there will still
> end up being performance issues. A common pattern I've seen is that there
> is a long common prefix to the dynamic destination followed by the dynamic
> component. e.g. the destination might be
> long/common/path/to/destination/files/. In this case, the
> prefix is often much larger than messages themselves and is what gets
> effectively encoded in the lambda.

The last option is meant to address this issue. The prefix is specified in
the configuration instead of being present with each message. The "K" in KV
will contain just the part(s) to be appended to the prefix (via string
substitution). This way only the minimal/necessary destination information
gets encoded with the message.
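A back-of-the-envelope sketch of that idea in plain Python, assuming a hypothetical `CONFIG` entry, a hypothetical `to_kv` helper, and JSON as a stand-in for a real coder. The long prefix is stored once in the configuration, each element's key carries only `user`, and the two sums compare how much destination information would travel with the elements under each scheme:

```python
import json
from typing import Any, Dict, Tuple

# Hypothetical transform configuration: the long prefix is stored here once.
CONFIG = {"destination": "long/common/path/to/destination/files/{user}"}


def to_kv(record: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    # The "K" holds only the part(s) substituted into the prefix.
    return {"user": record["user"]}, record


def encoded_size(obj: Any) -> int:
    # JSON bytes as a rough stand-in for a real coder's output size.
    return len(json.dumps(obj).encode("utf-8"))


records = [{"user": "alice", "value": 1}, {"user": "bob", "value": 2}]
kvs = [to_kv(r) for r in records]

# Bytes of destination info carried with the elements under each scheme.
key_only = sum(encoded_size(k) for k, _ in kvs)
full_path = sum(encoded_size(CONFIG["destination"].format(**k)) for k, _ in kvs)
print(f"key only: {key_only} bytes, full path: {full_path} bytes")

# The writer resolves the full destination once per distinct key.
for k in {k["user"]: k for k, _ in kvs}.values():
    print(CONFIG["destination"].format(**k))
```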

On Wed, Mar 27, 2024 at 12:12 PM Reuven Lax via dev 
wrote:

> This does seem like the best compromise, though I think there will still
> end up being performance issues. A common pattern I've seen is that there
> is a long common prefix to the dynamic destination followed the dynamic
> component. e.g. the destination might be
> long/common/path/to/destination/files/. In this case, the
> prefix is often much larger than messages themselves and is what gets
> effectively encoded in the lambda.
>
> I'm not entirely sure how to address this in a portable context. We might
> simply have to accept the extra overhead when going cross language.
>
> Reuven


Re: Supporting Dynamic Destinations in a portable context

2024-03-27 Thread Reuven Lax via dev
This does seem like the best compromise, though I think there will still
end up being performance issues. A common pattern I've seen is that there
is a long common prefix to the dynamic destination followed by the dynamic
component. e.g. the destination might be
long/common/path/to/destination/files/. In this case, the
prefix is often much larger than messages themselves and is what gets
effectively encoded in the lambda.

I'm not entirely sure how to address this in a portable context. We might
simply have to accept the extra overhead when going cross language.

Reuven

On Wed, Mar 27, 2024 at 8:51 AM Robert Bradshaw via dev 
wrote:

> Thanks for putting this together, it will be a really useful feature to
> have.
>
> I am in favor of the string-pattern approaches. I think we need to support
> both the {record=..., dest_info=...} and the elide-fields approaches, as
> the former is nicer when one has a fixed representation for the
> output record (e.g. a proto or Avro schema) and the flattened form for ease
> of use in more free-form contexts (e.g. when producing records from YAML
> and SQL).
>
> Also left some comments on the doc.


Re: Supporting Dynamic Destinations in a portable context

2024-03-27 Thread Robert Bradshaw via dev
Thanks for putting this together; it will be a really useful feature to
have.

I am in favor of the string-pattern approaches. I think we need to support
both the {record=..., dest_info=...} and the elide-fields approaches: the
former is nicer when one has a fixed representation for the output record
(e.g. a proto or Avro schema), and the flattened form is nicer for ease of
use in more free-form contexts (e.g. when producing records from YAML and
SQL).

Also left some comments on the doc.
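A plain-Python sketch of the two shapes, under an assumed reading of the doc: in the nested form each row is `{record: ..., dest_info: ...}`, and in the flattened form the destination fields sit alongside the record fields and are elided before writing. `split_nested`, `split_flat`, `PATTERN`, and the field names are hypothetical:

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
PATTERN = "path/to/{country}/{date}"  # hypothetical destination pattern


def split_nested(row: Row) -> Tuple[Row, Row]:
    # Nested form (assumed): {record=..., dest_info=...}. The record is
    # written unchanged; dest_info only feeds the destination pattern.
    return row["record"], row["dest_info"]


def split_flat(row: Row, elide: List[str]) -> Tuple[Row, Row]:
    # Flattened form (assumed): one flat row; the fields named in `elide`
    # feed the destination and are dropped from the written record.
    dest = {name: row[name] for name in elide}
    record = {k: v for k, v in row.items() if k not in elide}
    return record, dest


nested = {
    "record": {"id": 1, "amount": 9.5},
    "dest_info": {"country": "fr", "date": "2024-03-27"},
}
flat = {"id": 1, "amount": 9.5, "country": "fr", "date": "2024-03-27"}

for record, dest in (split_nested(nested), split_flat(flat, ["country", "date"])):
    print(PATTERN.format(**dest), record)
# path/to/fr/2024-03-27 {'id': 1, 'amount': 9.5}
# path/to/fr/2024-03-27 {'id': 1, 'amount': 9.5}
```

Both forms yield the same record and destination here; the nested form leaves a fixed record schema untouched, while the flat form avoids having to build a wrapper row.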


On Wed, Mar 27, 2024 at 6:51 AM Ahmed Abualsaud via dev 
wrote:

> Hey all,
>
> There have been some conversations lately about how best to enable dynamic
> destinations in a portable context. Usually, this comes up for
> cross-language transforms and more recently for Beam YAML.
>
> I've started a short doc outlining some routes we could take. The purpose
> is to establish a good standard for supporting dynamic destinations with
> portability, one that can be applied to most use cases and IOs. Please take
> a look and add any thoughts!
>
> https://s.apache.org/portable-dynamic-destinations
>
> Best,
> Ahmed
>