Re: Potential bug with Kafka (or other external) IO in Python Portable Runner

2021-05-06 Thread Chamikara Jayalath
On Thu, May 6, 2021 at 4:58 AM Nir Gazit  wrote:

> Hey,
> Not sure I follow - from the code you sent me, it seems that the
> environment is chosen according to the pipeline options
> (JAVA_SDK_HARNESS_ENVIRONMENT is used by createOrGetDefaultEnvironment only
> AFAIU). So if I'm passing `--environment_type=EXTERNAL` why wouldn't I get
> to using an external env as in line 152 there?
>
>
Did you mean that you set a Python pipeline option ? Note that Python
pipeline options do not automatically get converted to Java pipeline
options when using cross-language transforms.


> Nir
>
> On Wed, May 5, 2021 at 12:07 AM Chamikara Jayalath 
> wrote:
>
>> When you use cross-language Java transforms from Python we use the
>> default environment for Java transforms which always gets set to Docker.
>>
>> https://github.com/apache/beam/blob/7f0d11e65bbcd3e9c565a50aa6a56c0631c4358b/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L447
>>
>> https://github.com/apache/beam/blob/7f0d11e65bbcd3e9c565a50aa6a56c0631c4358b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L134
>>
>> I don't think we support running Java cross-language transforms in other
>> environments yet.
>>
>> Thanks,
>> Cham
>>
>> On Tue, May 4, 2021 at 12:30 PM Nir Gazit  wrote:
>>
>>> But looking at the code of the exception
>>> 
>>>  it
>>> seems that it tries to use docker only because it thinks it's in a docker
>>> environment, no? Shouldn't it use the external environment (that's available
>>> there as well
>>> )
>>> if it got that in PipelineOptions?
>>>
>>> Otherwise, how can I have docker on a k8s pod? I couldn't seem to find
>>> any examples for that and saw that it's highly unrecommended.
>>>
>>> Thanks!
>>> Nir
>>>
>>> On Tue, May 4, 2021 at 9:08 PM Chamikara Jayalath 
>>> wrote:
>>>
 Ah, I think you need the DOCKER environment to use cross-language
 transforms not the EXTERNAL environment (agree that the terminology is
 confusing).

 On Tue, May 4, 2021 at 11:04 AM Nir Gazit  wrote:

> Yes that’s on purpose. I’m running in Kubernetes which makes it hard
> to install docker on the pods so I don’t want to use the docker
> environment. That’s why I specified EXTERNAL environment in
> PipelineOptions. However, it seems that it doesn’t get propagated.
>
> On Tue, 4 May 2021 at 20:59 Chamikara Jayalath 
> wrote:
>
>> Is it possible that you don't have the "docker" command available in
>> your system ?
>>
>> On Tue, May 4, 2021 at 10:28 AM Nir Gazit  wrote:
>>
>>> Hey,
>>> I'm trying to run a pipeline with a Kafka Source, using an EXTERNAL
>>> environment. However, when the pipeline is run, the error below is 
>>> thrown,
>>> which implies that for some reason the external environment pipeline
>>> options didn't get in. When replacing the Kafka Source with an S3 source
>>> (for example) I don't get this error so it implies that it's
>>> somewhere around the external transform / expansion service area. How 
>>> can I
>>> debug this?
>>>
>>> Caused by: java.io.IOException: Cannot run program "docker":
>>> error=2, No such file or directory
>>> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
>>> at
>>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:189)
>>> at
>>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:171)
>>> at
>>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runImage(DockerCommand.java:95)
>>> at
>>> org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory.createEnvironment(DockerEnvironmentFactory.java:131)
>>> at
>>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:252)
>>> at
>>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:231)
>>>
>>


Re: Potential bug with Kafka (or other external) IO in Python Portable Runner

2021-05-06 Thread Nir Gazit
Hey,
Not sure I follow - from the code you sent me, it seems that the
environment is chosen according to the pipeline options
(JAVA_SDK_HARNESS_ENVIRONMENT is used by createOrGetDefaultEnvironment only
AFAIU). So if I'm passing `--environment_type=EXTERNAL` why wouldn't I get
to using an external env as in line 152 there?

Nir

On Wed, May 5, 2021 at 12:07 AM Chamikara Jayalath 
wrote:

> When you use cross-language Java transforms from Python we use the default
> environment for Java transforms which always gets set to Docker.
>
> https://github.com/apache/beam/blob/7f0d11e65bbcd3e9c565a50aa6a56c0631c4358b/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L447
>
> https://github.com/apache/beam/blob/7f0d11e65bbcd3e9c565a50aa6a56c0631c4358b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L134
>
> I don't think we support running Java cross-language transforms in other
> environments yet.
>
> Thanks,
> Cham
>
> On Tue, May 4, 2021 at 12:30 PM Nir Gazit  wrote:
>
>> But looking at the code of the exception
>> 
>>  it
>> seems that it tries to use docker only because it thinks it's in a docker
>> environment, no? Shouldn't it use the external environment (that's available
>> there as well
>> )
>> if it got that in PipelineOptions?
>>
>> Otherwise, how can I have docker on a k8s pod? I couldn't seem to find
>> any examples for that and saw that it's highly unrecommended.
>>
>> Thanks!
>> Nir
>>
>> On Tue, May 4, 2021 at 9:08 PM Chamikara Jayalath 
>> wrote:
>>
>>> Ah, I think you need the DOCKER environment to use cross-language
>>> transforms not the EXTERNAL environment (agree that the terminology is
>>> confusing).
>>>
>>> On Tue, May 4, 2021 at 11:04 AM Nir Gazit  wrote:
>>>
 Yes that’s on purpose. I’m running in Kubernetes which makes it hard to
 install docker on the pods so I don’t want to use the docker environment.
 That’s why I specified EXTERNAL environment in PipelineOptions. However, it
 seems that it doesn’t get propagated.

 On Tue, 4 May 2021 at 20:59 Chamikara Jayalath 
 wrote:

> Is it possible that you don't have the "docker" command available in
> your system ?
>
> On Tue, May 4, 2021 at 10:28 AM Nir Gazit  wrote:
>
>> Hey,
>> I'm trying to run a pipeline with a Kafka Source, using an EXTERNAL
>> environment. However, when the pipeline is run, the error below is 
>> thrown,
>> which implies that for some reason the external environment pipeline
>> options didn't get in. When replacing the Kafka Source with an S3 source
>> (for example) I don't get this error so it implies that it's
>> somewhere around the external transform / expansion service area. How 
>> can I
>> debug this?
>>
>> Caused by: java.io.IOException: Cannot run program "docker": error=2,
>> No such file or directory
>> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
>> at
>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:189)
>> at
>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:171)
>> at
>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runImage(DockerCommand.java:95)
>> at
>> org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory.createEnvironment(DockerEnvironmentFactory.java:131)
>> at
>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:252)
>> at
>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:231)
>>
>


Re: Potential bug with Kafka (or other external) IO in Python Portable Runner

2021-05-04 Thread Chamikara Jayalath
When you use cross-language Java transforms from Python we use the default
environment for Java transforms which always gets set to Docker.
https://github.com/apache/beam/blob/7f0d11e65bbcd3e9c565a50aa6a56c0631c4358b/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L447
https://github.com/apache/beam/blob/7f0d11e65bbcd3e9c565a50aa6a56c0631c4358b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L134

I don't think we support running Java cross-language transforms in other
environments yet.

Thanks,
Cham

On Tue, May 4, 2021 at 12:30 PM Nir Gazit  wrote:

> But looking at the code of the exception
> 
>  it
> seems that it tries to use docker only because it thinks it's in a docker
> environment, no? Shouldn't it use the external environment (that's available
> there as well
> )
> if it got that in PipelineOptions?
>
> Otherwise, how can I have docker on a k8s pod? I couldn't seem to find any
> examples for that and saw that it's highly unrecommended.
>
> Thanks!
> Nir
>
> On Tue, May 4, 2021 at 9:08 PM Chamikara Jayalath 
> wrote:
>
>> Ah, I think you need the DOCKER environment to use cross-language
>> transforms not the EXTERNAL environment (agree that the terminology is
>> confusing).
>>
>> On Tue, May 4, 2021 at 11:04 AM Nir Gazit  wrote:
>>
>>> Yes that’s on purpose. I’m running in Kubernetes which makes it hard to
>>> install docker on the pods so I don’t want to use the docker environment.
>>> That’s why I specified EXTERNAL environment in PipelineOptions. However, it
>>> seems that it doesn’t get propagated.
>>>
>>> On Tue, 4 May 2021 at 20:59 Chamikara Jayalath 
>>> wrote:
>>>
 Is it possible that you don't have the "docker" command available in
 your system ?

 On Tue, May 4, 2021 at 10:28 AM Nir Gazit  wrote:

> Hey,
> I'm trying to run a pipeline with a Kafka Source, using an EXTERNAL
> environment. However, when the pipeline is run, the error below is thrown,
> which implies that for some reason the external environment pipeline
> options didn't get in. When replacing the Kafka Source with an S3 source
> (for example) I don't get this error so it implies that it's
> somewhere around the external transform / expansion service area. How can 
> I
> debug this?
>
> Caused by: java.io.IOException: Cannot run program "docker": error=2,
> No such file or directory
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
> at
> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:189)
> at
> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:171)
> at
> org.apache.beam.runners.fnexecution.environment.DockerCommand.runImage(DockerCommand.java:95)
> at
> org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory.createEnvironment(DockerEnvironmentFactory.java:131)
> at
> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:252)
> at
> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:231)
>



Re: Potential bug with Kafka (or other external) IO in Python Portable Runner

2021-05-04 Thread Nir Gazit
But looking at the code of the exception

it
seems that it tries to use docker only because it thinks it's in a docker
environment, no? Shouldn't it use the external environment (that's available
there as well
)
if it got that in PipelineOptions?

Otherwise, how can I have docker on a k8s pod? I couldn't seem to find any
examples for that and saw that it's highly unrecommended.

Thanks!
Nir

On Tue, May 4, 2021 at 9:08 PM Chamikara Jayalath 
wrote:

> Ah, I think you need the DOCKER environment to use cross-language
> transforms not the EXTERNAL environment (agree that the terminology is
> confusing).
>
> On Tue, May 4, 2021 at 11:04 AM Nir Gazit  wrote:
>
>> Yes that’s on purpose. I’m running in Kubernetes which makes it hard to
>> install docker on the pods so I don’t want to use the docker environment.
>> That’s why I specified EXTERNAL environment in PipelineOptions. However, it
>> seems that it doesn’t get propagated.
>>
>> On Tue, 4 May 2021 at 20:59 Chamikara Jayalath 
>> wrote:
>>
>>> Is it possible that you don't have the "docker" command available in
>>> your system ?
>>>
>>> On Tue, May 4, 2021 at 10:28 AM Nir Gazit  wrote:
>>>
 Hey,
 I'm trying to run a pipeline with a Kafka Source, using an EXTERNAL
 environment. However, when the pipeline is run, the error below is thrown,
 which implies that for some reason the external environment pipeline
 options didn't get in. When replacing the Kafka Source with an S3 source
 (for example) I don't get this error so it implies that it's
 somewhere around the external transform / expansion service area. How can I
 debug this?

 Caused by: java.io.IOException: Cannot run program "docker": error=2,
 No such file or directory
 at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
 at
 org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:189)
 at
 org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:171)
 at
 org.apache.beam.runners.fnexecution.environment.DockerCommand.runImage(DockerCommand.java:95)
 at
 org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory.createEnvironment(DockerEnvironmentFactory.java:131)
 at
 org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:252)
 at
 org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:231)

>>>


Re: Potential bug with Kafka (or other external) IO in Python Portable Runner

2021-05-04 Thread Chamikara Jayalath
Ah, I think you need the DOCKER environment to use cross-language
transforms not the EXTERNAL environment (agree that the terminology is
confusing).

On Tue, May 4, 2021 at 11:04 AM Nir Gazit  wrote:

> Yes that’s on purpose. I’m running in Kubernetes which makes it hard to
> install docker on the pods so I don’t want to use the docker environment.
> That’s why I specified EXTERNAL environment in PipelineOptions. However, it
> seems that it doesn’t get propagated.
>
> On Tue, 4 May 2021 at 20:59 Chamikara Jayalath 
> wrote:
>
>> Is it possible that you don't have the "docker" command available in your
>> system ?
>>
>> On Tue, May 4, 2021 at 10:28 AM Nir Gazit  wrote:
>>
>>> Hey,
>>> I'm trying to run a pipeline with a Kafka Source, using an EXTERNAL
>>> environment. However, when the pipeline is run, the error below is thrown,
>>> which implies that for some reason the external environment pipeline
>>> options didn't get in. When replacing the Kafka Source with an S3 source
>>> (for example) I don't get this error so it implies that it's
>>> somewhere around the external transform / expansion service area. How can I
>>> debug this?
>>>
>>> Caused by: java.io.IOException: Cannot run program "docker": error=2, No
>>> such file or directory
>>> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
>>> at
>>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:189)
>>> at
>>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:171)
>>> at
>>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runImage(DockerCommand.java:95)
>>> at
>>> org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory.createEnvironment(DockerEnvironmentFactory.java:131)
>>> at
>>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:252)
>>> at
>>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:231)
>>>
>>


Re: Potential bug with Kafka (or other external) IO in Python Portable Runner

2021-05-04 Thread Nir Gazit
Yes that’s on purpose. I’m running in Kubernetes which makes it hard to
install docker on the pods so I don’t want to use the docker environment.
That’s why I specified EXTERNAL environment in PipelineOptions. However, it
seems that it doesn’t get propagated.

On Tue, 4 May 2021 at 20:59 Chamikara Jayalath  wrote:

> Is it possible that you don't have the "docker" command available in your
> system ?
>
> On Tue, May 4, 2021 at 10:28 AM Nir Gazit  wrote:
>
>> Hey,
>> I'm trying to run a pipeline with a Kafka Source, using an EXTERNAL
>> environment. However, when the pipeline is run, the error below is thrown,
>> which implies that for some reason the external environment pipeline
>> options didn't get in. When replacing the Kafka Source with an S3 source
>> (for example) I don't get this error so it implies that it's
>> somewhere around the external transform / expansion service area. How can I
>> debug this?
>>
>> Caused by: java.io.IOException: Cannot run program "docker": error=2, No
>> such file or directory
>> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
>> at
>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:189)
>> at
>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:171)
>> at
>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runImage(DockerCommand.java:95)
>> at
>> org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory.createEnvironment(DockerEnvironmentFactory.java:131)
>> at
>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:252)
>> at
>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:231)
>>
>


Re: Potential bug with Kafka (or other external) IO in Python Portable Runner

2021-05-04 Thread Chamikara Jayalath
Is it possible that you don't have the "docker" command available in your
system ?

On Tue, May 4, 2021 at 10:28 AM Nir Gazit  wrote:

> Hey,
> I'm trying to run a pipeline with a Kafka Source, using an EXTERNAL
> environment. However, when the pipeline is run, the error below is thrown,
> which implies that for some reason the external environment pipeline
> options didn't get in. When replacing the Kafka Source with an S3 source
> (for example) I don't get this error so it implies that it's
> somewhere around the external transform / expansion service area. How can I
> debug this?
>
> Caused by: java.io.IOException: Cannot run program "docker": error=2, No
> such file or directory
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
> at
> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:189)
> at
> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:171)
> at
> org.apache.beam.runners.fnexecution.environment.DockerCommand.runImage(DockerCommand.java:95)
> at
> org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory.createEnvironment(DockerEnvironmentFactory.java:131)
> at
> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:252)
> at
> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:231)
>


Potential bug with Kafka (or other external) IO in Python Portable Runner

2021-05-04 Thread Nir Gazit
Hey,
I'm trying to run a pipeline with a Kafka Source, using an EXTERNAL
environment. However, when the pipeline is run, the error below is thrown,
which implies that for some reason the external environment pipeline
options didn't get in. When replacing the Kafka Source with an S3 source
(for example) I don't get this error so it implies that it's
somewhere around the external transform / expansion service area. How can I
debug this?

Caused by: java.io.IOException: Cannot run program "docker": error=2, No
such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at
org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:189)
at
org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:171)
at
org.apache.beam.runners.fnexecution.environment.DockerCommand.runImage(DockerCommand.java:95)
at
org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory.createEnvironment(DockerEnvironmentFactory.java:131)
at
org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:252)
at
org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:231)