Re: Increase Portable SDK Harness share of memory?

2019-04-03 Thread Lukasz Cwik
Turns out much of the work was completed to populate and consume the urn +
payloads.

I have deprecated the single "url" field in environment with
https://github.com/apache/beam/pull/8213 which will allow us to close out
BEAM-5433.
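
For illustration only, here is a minimal sketch of how a consumer might prefer
the urn + payload fields and fall back to the deprecated "url" field. The
`Environment` dataclass, the `container_image` helper, and the plain-text
payload encoding are assumptions made for this sketch, not the actual proto
or runner code; the URN string is likewise assumed to mirror the docker entry
in `StandardEnvironments`.

```python
from dataclasses import dataclass

# Assumption: mirrors the docker URN listed under StandardEnvironments.
DOCKER_URN = "beam:env:docker:v1"


@dataclass
class Environment:
    """Hypothetical stand-in for the Environment proto message."""
    urn: str = ""
    payload: bytes = b""
    url: str = ""  # deprecated single field, kept so old producers still work


def container_image(env: Environment) -> str:
    """Return the docker image, preferring urn + payload over the old url."""
    if env.urn == DOCKER_URN:
        # Assumption: the payload is just the UTF-8 image name in this sketch;
        # a real payload would be a serialized proto message.
        return env.payload.decode("utf-8")
    if not env.urn and env.url:
        # Legacy producer that only filled in the deprecated field.
        return env.url
    raise ValueError(f"not a docker environment: {env.urn!r}")
```

Reading the new fields first means the old field can be dropped from producers
without changing what consumers resolve.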

On Mon, Apr 1, 2019 at 1:48 PM Lukasz Cwik  wrote:

> Yes, need to use the new fields everywhere and then deprecate the old
> fields.
>
> On Mon, Apr 1, 2019 at 1:33 PM Kenneth Knowles  wrote:
>
>>
>>
>> On Mon, Apr 1, 2019 at 8:59 AM Lukasz Cwik  wrote:
>>
>>> To clarify, docker isn't the only environment type we are using. We have
>>> process-based and "existing" environment modes that don't fit the current
>>> protobuf and are being worked around.
>>>
>>
>> Ah, understood.
>>
>>
>>> The idea would be to move to a URN + payload model like our PTransforms
>>> and coders with a docker specific one. Using the URN + payload would allow
>>> us to have a versioned way to update the environment specifications and
>>> deprecate/remove things that are ill defined.
>>>
>>
>> Makes sense to me. It looks like this migration path is already in place
>> in `message Environment` in beam_runner_api.proto, with `message
>> StandardEnvironments` enumerating some URNs and corresponding payload
>> messages just below. So is the gap just getting the two portable runners to
>> look at the new fields?
>>
>> Kenn
>>
>>
>>> On Fri, Mar 29, 2019 at 6:41 PM Kenneth Knowles  wrote:
>>>


>>>> On Thu, Mar 28, 2019 at 9:30 AM Lukasz Cwik  wrote:
>>>>
>>>>> The intention is that these kinds of hints such as CPU and/or memory
>>>>> should be embedded in the environment specification that is associated
>>>>> with the transforms that need resource hints.
>>>>>
>>>>> The environment spec is woefully ill prepared as it only has a docker
>>>>> URL right now.
>>>>>
>>>>
>>>> FWIW I think this is actually "extremely well prepared" :-)
>>>>
>>>> Protobuf is great for adding fields when you need more but removing is
>>>> nearly impossible once deployed, so it is best to do the absolute minimum
>>>> until you need to expand.
>>>>
>>>> Kenn
>>>>
>>>>
>>>>>
>>>>> On Thu, Mar 28, 2019 at 8:45 AM Robert Burke  wrote:
>>>>>
>>>>>> A question came over the beam-go slack that I wasn't able to answer,
>>>>>> in particular for Dataflow*, is there a way to increase how much of a
>>>>>> Portable FnAPI worker is dedicated for the SDK side, vs the Runner side?
>>>>>>
>>>>>> My assumption is that runners should manage it, and have the Runner
>>>>>> Harness side be as lightweight as possible, to operate under reasonable
>>>>>> memory bounds, allowing the user-code more room to spread, since it's
>>>>>> largely unknown.
>>>>>>
>>>>>> I saw there's the Provisioning API,
>>>>>> which communicates resource limits to the SDK side, but is there a way
>>>>>> to make the request (probably on job start up) in the other direction?
>>>>>>
>>>>>> I imagine it has to do with the container boot code, but I have only
>>>>>> vague knowledge of how that works at present.
>>>>>>
>>>>>> If there's a portable way for it, that's ideal, but I suspect this
>>>>>> will require a Dataflow-specific answer.
>>>>>>
>>>>>> Thanks!
>>>>>> Robert B
>>>>>>
>>>>>> *Dataflow doesn't support the Go SDK, but the Go SDK supports
>>>>>> Dataflow.
>>>>>>
>>>>>


Re: Increase Portable SDK Harness share of memory?

2019-04-01 Thread Lukasz Cwik
Yes, need to use the new fields everywhere and then deprecate the old
fields.

On Mon, Apr 1, 2019 at 1:33 PM Kenneth Knowles  wrote:

>
>
> On Mon, Apr 1, 2019 at 8:59 AM Lukasz Cwik  wrote:
>
>> To clarify, docker isn't the only environment type we are using. We have
>> process-based and "existing" environment modes that don't fit the current
>> protobuf and are being worked around.
>>
>
> Ah, understood.
>
>
>> The idea would be to move to a URN + payload model like our PTransforms
>> and coders with a docker specific one. Using the URN + payload would allow
>> us to have a versioned way to update the environment specifications and
>> deprecate/remove things that are ill defined.
>>
>
> Makes sense to me. It looks like this migration path is already in place
> in `message Environment` in beam_runner_api.proto, with `message
> StandardEnvironments` enumerating some URNs and corresponding payload
> messages just below. So is the gap just getting the two portable runners to
> look at the new fields?
>
> Kenn
>
>
>> On Fri, Mar 29, 2019 at 6:41 PM Kenneth Knowles  wrote:
>>
>>>
>>>
>>> On Thu, Mar 28, 2019 at 9:30 AM Lukasz Cwik  wrote:
>>>
>>>> The intention is that these kinds of hints such as CPU and/or memory
>>>> should be embedded in the environment specification that is associated with
>>>> the transforms that need resource hints.
>>>>
>>>> The environment spec is woefully ill prepared as it only has a docker
>>>> URL right now.

>>>
>>> FWIW I think this is actually "extremely well prepared" :-)
>>>
>>> Protobuf is great for adding fields when you need more but removing is
>>> nearly impossible once deployed, so it is best to do the absolute minimum
>>> until you need to expand.
>>>
>>> Kenn
>>>
>>>

>>>> On Thu, Mar 28, 2019 at 8:45 AM Robert Burke  wrote:
>>>>
>>>>> A question came over the beam-go slack that I wasn't able to answer,
>>>>> in particular for Dataflow*, is there a way to increase how much of a
>>>>> Portable FnAPI worker is dedicated for the SDK side, vs the Runner side?
>>>>>
>>>>> My assumption is that runners should manage it, and have the Runner
>>>>> Harness side be as lightweight as possible, to operate under reasonable
>>>>> memory bounds, allowing the user-code more room to spread, since it's
>>>>> largely unknown.
>>>>>
>>>>> I saw there's the Provisioning API,
>>>>> which communicates resource limits to the SDK side, but is there a way
>>>>> to make the request (probably on job start up) in the other direction?
>>>>>
>>>>> I imagine it has to do with the container boot code, but I have only
>>>>> vague knowledge of how that works at present.
>>>>>
>>>>> If there's a portable way for it, that's ideal, but I suspect this
>>>>> will require a Dataflow-specific answer.
>>>>>
>>>>> Thanks!
>>>>> Robert B
>>>>>
>>>>> *Dataflow doesn't support the Go SDK, but the Go SDK supports Dataflow.
>>>>>



Re: Increase Portable SDK Harness share of memory?

2019-04-01 Thread Kenneth Knowles
On Mon, Apr 1, 2019 at 8:59 AM Lukasz Cwik  wrote:

> To clarify, docker isn't the only environment type we are using. We have
> process-based and "existing" environment modes that don't fit the current
> protobuf and are being worked around.
>

Ah, understood.


> The idea would be to move to a URN + payload model like our PTransforms
> and coders with a docker specific one. Using the URN + payload would allow
> us to have a versioned way to update the environment specifications and
> deprecate/remove things that are ill defined.
>

Makes sense to me. It looks like this migration path is already in place in
`message Environment` in beam_runner_api.proto, with `message
StandardEnvironments` enumerating some URNs and corresponding payload
messages just below. So is the gap just getting the two portable runners to
look at the new fields?

Kenn


> On Fri, Mar 29, 2019 at 6:41 PM Kenneth Knowles  wrote:
>
>>
>>
>> On Thu, Mar 28, 2019 at 9:30 AM Lukasz Cwik  wrote:
>>
>>> The intention is that these kinds of hints such as CPU and/or memory
>>> should be embedded in the environment specification that is associated with
>>> the transforms that need resource hints.
>>>
>>> The environment spec is woefully ill prepared as it only has a docker
>>> URL right now.
>>>
>>
>> FWIW I think this is actually "extremely well prepared" :-)
>>
>> Protobuf is great for adding fields when you need more but removing is
>> nearly impossible once deployed, so it is best to do the absolute minimum
>> until you need to expand.
>>
>> Kenn
>>
>>
>>>
>>> On Thu, Mar 28, 2019 at 8:45 AM Robert Burke  wrote:
>>>
>>>> A question came over the beam-go slack that I wasn't able to answer, in
>>>> particular for Dataflow*, is there a way to increase how much of a Portable
>>>> FnAPI worker is dedicated for the SDK side, vs the Runner side?
>>>>
>>>> My assumption is that runners should manage it, and have the Runner
>>>> Harness side be as lightweight as possible, to operate under reasonable
>>>> memory bounds, allowing the user-code more room to spread, since it's
>>>> largely unknown.
>>>>
>>>> I saw there's the Provisioning API,
>>>> which communicates resource limits to the SDK side, but is there a way
>>>> to make the request (probably on job start up) in the other direction?
>>>>
>>>> I imagine it has to do with the container boot code, but I have only
>>>> vague knowledge of how that works at present.
>>>>
>>>> If there's a portable way for it, that's ideal, but I suspect this will
>>>> require a Dataflow-specific answer.
>>>>
>>>> Thanks!
>>>> Robert B
>>>>
>>>> *Dataflow doesn't support the Go SDK, but the Go SDK supports Dataflow.

>>>


Re: Increase Portable SDK Harness share of memory?

2019-04-01 Thread Lukasz Cwik
To clarify, docker isn't the only environment type we are using. We have
process-based and "existing" environment modes that don't fit the current
protobuf and are being worked around.

The idea would be to move to a URN + payload model like our PTransforms and
coders with a docker specific one. Using the URN + payload would allow us
to have a versioned way to update the environment specifications and
deprecate/remove things that are ill defined.
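
As a rough sketch of the URN + payload idea described above: each environment
type gets a versioned URN and its own payload shape, and a consumer dispatches
on the URN. Everything here is hypothetical (the `example:env:...` URNs, the
JSON payload encoding, and the `register` / `decode_environment` helpers); it
only shows the dispatch pattern, not the real proto messages.

```python
import json
from typing import Callable, Dict

# Hypothetical URNs; the version suffix is what lets a spec evolve over time.
DOCKER_V1 = "example:env:docker:v1"
PROCESS_V1 = "example:env:process:v1"

# Maps a URN to the function that knows how to interpret its payload.
_DECODERS: Dict[str, Callable[[bytes], dict]] = {}


def register(urn: str):
    """Decorator that registers a payload decoder for one URN."""
    def wrap(fn):
        _DECODERS[urn] = fn
        return fn
    return wrap


@register(DOCKER_V1)
def _docker(payload: bytes) -> dict:
    spec = json.loads(payload)
    return {"kind": "docker", "image": spec["container_image"]}


@register(PROCESS_V1)
def _process(payload: bytes) -> dict:
    spec = json.loads(payload)
    return {"kind": "process", "command": spec["command"]}


def decode_environment(urn: str, payload: bytes) -> dict:
    """Dispatch on the URN; unknown URNs fail loudly instead of being guessed."""
    try:
        decoder = _DECODERS[urn]
    except KeyError:
        raise ValueError(f"unsupported environment URN: {urn}") from None
    return decoder(payload)
```

A new or revised environment type would then be a new URN (for example a
`:v2` suffix) with its own decoder, leaving existing URNs untouched.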

On Fri, Mar 29, 2019 at 6:41 PM Kenneth Knowles  wrote:

>
>
> On Thu, Mar 28, 2019 at 9:30 AM Lukasz Cwik  wrote:
>
>> The intention is that these kinds of hints such as CPU and/or memory
>> should be embedded in the environment specification that is associated with
>> the transforms that need resource hints.
>>
>> The environment spec is woefully ill prepared as it only has a docker URL
>> right now.
>>
>
> FWIW I think this is actually "extremely well prepared" :-)
>
> Protobuf is great for adding fields when you need more but removing is
> nearly impossible once deployed, so it is best to do the absolute minimum
> until you need to expand.
>
> Kenn
>
>
>>
>> On Thu, Mar 28, 2019 at 8:45 AM Robert Burke  wrote:
>>
>>> A question came over the beam-go slack that I wasn't able to answer, in
>>> particular for Dataflow*, is there a way to increase how much of a Portable
>>> FnAPI worker is dedicated for the SDK side, vs the Runner side?
>>>
>>> My assumption is that runners should manage it, and have the Runner
>>> Harness side be as lightweight as possible, to operate under reasonable
>>> memory bounds, allowing the user-code more room to spread, since it's
>>> largely unknown.
>>>
>>> I saw there's the Provisioning API,
>>> which communicates resource limits to the SDK side, but is there a way
>>> to make the request (probably on job start up) in the other direction?
>>>
>>> I imagine it has to do with the container boot code, but I have only
>>> vague knowledge of how that works at present.
>>>
>>> If there's a portable way for it, that's ideal, but I suspect this will
>>> require a Dataflow-specific answer.
>>>
>>> Thanks!
>>> Robert B
>>>
>>> *Dataflow doesn't support the Go SDK, but the Go SDK supports Dataflow.
>>>
>>


Re: Increase Portable SDK Harness share of memory?

2019-03-29 Thread Kenneth Knowles
On Thu, Mar 28, 2019 at 9:30 AM Lukasz Cwik  wrote:

> The intention is that these kinds of hints such as CPU and/or memory
> should be embedded in the environment specification that is associated with
> the transforms that need resource hints.
>
> The environment spec is woefully ill prepared as it only has a docker URL
> right now.
>

FWIW I think this is actually "extremely well prepared" :-)

Protobuf is great for adding fields when you need more but removing is
nearly impossible once deployed, so it is best to do the absolute minimum
until you need to expand.
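
To make the add-versus-remove asymmetry concrete, here is a toy simulation.
Messages are modelled as plain `{field_number: value}` dicts, which is a
simplification and not the protobuf wire format; the field numbers and the
`read_v1` / `read_v2` readers are invented for this sketch.

```python
def read_v1(message: dict) -> dict:
    """Reader built against v1 of the schema: it only knows field 1 (url)."""
    return {"url": message.get(1, "")}


def read_v2(message: dict) -> dict:
    """Reader built against v2: field 1 plus added fields 2 and 3."""
    return {
        "url": message.get(1, ""),
        "urn": message.get(2, ""),
        "payload": message.get(3, b""),
    }


v1_message = {1: "img:1"}
v2_message = {1: "img:1", 2: "example:env:docker:v1", 3: b"img:1"}
v2_without_url = {2: "example:env:docker:v1", 3: b"img:1"}

# Adding fields is safe: the old reader skips what it does not know,
# and the new reader sees defaults for fields an old writer never set.
old_reads_new = read_v1(v2_message)
new_reads_old = read_v2(v1_message)

# Removing a field is not: an already-deployed v1 reader silently gets
# an empty url and has no way to know anything is missing.
old_reads_removed = read_v1(v2_without_url)
```

The last case is the one that makes removal hard in practice: nothing fails at
parse time, so the breakage only shows up wherever the empty value is used.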

Kenn


>
> On Thu, Mar 28, 2019 at 8:45 AM Robert Burke  wrote:
>
>> A question came over the beam-go slack that I wasn't able to answer, in
>> particular for Dataflow*, is there a way to increase how much of a Portable
>> FnAPI worker is dedicated for the SDK side, vs the Runner side?
>>
>> My assumption is that runners should manage it, and have the Runner
>> Harness side be as lightweight as possible, to operate under reasonable
>> memory bounds, allowing the user-code more room to spread, since it's
>> largely unknown.
>>
>> I saw there's the Provisioning API,
>> which communicates resource limits to the SDK side, but is there a way
>> to make the request (probably on job start up) in the other direction?
>>
>> I imagine it has to do with the container boot code, but I have only
>> vague knowledge of how that works at present.
>>
>> If there's a portable way for it, that's ideal, but I suspect this will
>> require a Dataflow-specific answer.
>>
>> Thanks!
>> Robert B
>>
>> *Dataflow doesn't support the Go SDK, but the Go SDK supports Dataflow.
>>
>


Re: Increase Portable SDK Harness share of memory?

2019-03-28 Thread Lukasz Cwik
The intention is that these kinds of hints such as CPU and/or memory should
be embedded in the environment specification that is associated with the
transforms that need resource hints.

The environment spec is woefully ill prepared as it only has a docker URL
right now.
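
One possible shape for this, sketched with made-up names: resource hints live
on the environment, and each transform points at an environment by id, so a
runner can look up the hints for any transform. The `resource_hints` field,
the hint keys, and the `hints_for` helper are all assumptions for illustration
and do not describe the current environment spec.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class EnvironmentSpec:
    """Hypothetical environment spec extended with resource hints."""
    urn: str
    payload: bytes
    # Assumption: free-form string hints such as {"min_ram": "8GiB"}.
    resource_hints: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class Transform:
    """A transform only records which environment it runs in."""
    name: str
    environment_id: str


def hints_for(transform: Transform,
              environments: Dict[str, EnvironmentSpec]) -> Dict[str, str]:
    """Resolve the resource hints that apply to one transform."""
    return dict(environments[transform.environment_id].resource_hints)


environments = {
    "default": EnvironmentSpec("example:env:docker:v1", b"img:1"),
    "big_mem": EnvironmentSpec("example:env:docker:v1", b"img:1",
                               {"min_ram": "8GiB"}),
}
heavy = Transform("GroupAndSort", "big_mem")
light = Transform("ParseLines", "default")
```

Here `hints_for(heavy, environments)` yields the memory hint while
`hints_for(light, environments)` yields an empty dict, so transforms without
special needs carry no extra configuration.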

On Thu, Mar 28, 2019 at 8:45 AM Robert Burke  wrote:

> A question came over the beam-go slack that I wasn't able to answer, in
> particular for Dataflow*, is there a way to increase how much of a Portable
> FnAPI worker is dedicated for the SDK side, vs the Runner side?
>
> My assumption is that runners should manage it, and have the Runner
> Harness side be as lightweight as possible, to operate under reasonable
> memory bounds, allowing the user-code more room to spread, since it's
> largely unknown.
>
> I saw there's the Provisioning API,
> which communicates resource limits to the SDK side, but is there a way
> to make the request (probably on job start up) in the other direction?
>
> I imagine it has to do with the container boot code, but I have only vague
> knowledge of how that works at present.
>
> If there's a portable way for it, that's ideal, but I suspect this will
> require a Dataflow-specific answer.
>
> Thanks!
> Robert B
>
> *Dataflow doesn't support the Go SDK, but the Go SDK supports Dataflow.
>