[jira] [Updated] (BEAM-7975) error syncing pod - failed to start container artifact (python SDK)

2020-06-01 Thread Beam JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beam JIRA Bot updated BEAM-7975:

Labels: stale-P2  (was: )

> error syncing pod - failed to start container artifact (python SDK)
> ---
>
> Key: BEAM-7975
> URL: https://issues.apache.org/jira/browse/BEAM-7975
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.13.0
>Reporter: James Hutchison
>Priority: P2
>  Labels: stale-P2
>
> {code:java}
> Error syncing pod 5966e59c ("<job name>-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
> "StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
> restarting failed container=artifact pod=<job name>-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
> Seeing these in a streaming pipeline. Running the pipeline in batch mode, I'm not 
> seeing anything. Messages appear about every 0.5 - 5 seconds.
> I've been trying to efficiently scale my streaming pipeline and found that 
> adding more workers / dividing into more groups isn't scaling as well as I 
> expect. Perhaps this is contributing (how do I tell if workers are being 
> utilized or not?)
> One pipeline which never completed (got to one of the last steps and then log 
> messages simply ceased without error on the workers) had this going on in the 
> kubelet logs. I checked some of my other streaming pipelines and found the 
> same thing going on, even though they would complete.
> In a couple of my streaming pipelines, I've gotten the following error 
> message, despite the pipeline eventually finishing:
> {code:java}
> Processing stuck in step s01 for at least 05m00s without outputting or 
> completing in state process{code}
> Perhaps they are related?
> This is running with 5 or 7 (or more) workers in streaming mode. I don't see 
> this when running with 1 worker.
> The pipeline uses requirements.txt and setup.py, as well as an extra 
> package and save_main_session.
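
The staging options mentioned in the report (requirements.txt, setup.py, an extra package, save_main_session) are normally wired up through Beam's SetupOptions. The sketch below is only a plausible reconstruction of such a configuration: the file names, the extra-package path, and the trivial transform are assumptions, not details taken from this issue.
{code:python}
# Hypothetical reconstruction of the staging setup described in the report.
# File names and the extra-package path are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import (
    PipelineOptions,
    SetupOptions,
    StandardOptions,
)

options = PipelineOptions()
options.view_as(StandardOptions).streaming = True         # issue is seen in streaming mode

setup = options.view_as(SetupOptions)
setup.save_main_session = True                            # pickle __main__ globals for the workers
setup.requirements_file = 'requirements.txt'              # pip dependencies staged to each worker
setup.setup_file = './setup.py'                           # local package built and staged as an sdist
setup.extra_packages = ['dist/extra_package-0.1.tar.gz']  # assumed path for the "extra package"

with beam.Pipeline(options=options) as p:
    # Placeholder transform; the report does not describe the actual pipeline.
    _ = p | beam.Create(['placeholder']) | beam.Map(print)
{code}
All of these options affect what is staged to and installed on each worker at startup, which is why they are worth noting next to a worker container that repeatedly fails to start.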
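
On the report's question of how to tell whether workers are actually being utilized, one lightweight option is Beam's Metrics API: counters incremented inside a DoFn are aggregated across workers and shown per step in the runner's monitoring UI, giving a rough throughput signal to compare across worker counts. A minimal sketch, with a made-up DoFn name and namespace:
{code:python}
# Minimal sketch: instrument a DoFn with a counter so per-step throughput is
# visible in the runner's metrics UI. Names are illustrative only.
import apache_beam as beam
from apache_beam.metrics.metric import Metrics


class InstrumentedDoFn(beam.DoFn):
    def __init__(self):
        super().__init__()
        # Counter is aggregated across all workers processing this step.
        self.elements_processed = Metrics.counter('throughput', 'elements_processed')

    def process(self, element):
        self.elements_processed.inc()
        yield element
{code}
Comparing such counters (or the runner's built-in per-step element counts and worker CPU graphs) between, say, 1 and 7 workers is one way to judge whether the extra workers are doing proportionally more work.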



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-7975) error syncing pod - failed to start container artifact (python SDK)

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7975:
---
Status: Open  (was: Triage Needed)

> error syncing pod - failed to start container artifact (python SDK)
> ---
>
> Key: BEAM-7975
> URL: https://issues.apache.org/jira/browse/BEAM-7975
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.13.0
>Reporter: James Hutchison
>Priority: Major
>
> {code:java}
> Error syncing pod 5966e59c ("<job name>-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
> "StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
> restarting failed container=artifact pod=<job name>-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
> Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
> seeing anything. Messages appear about every 0.5 - 5 seconds
> I've been trying to efficiently scale my streaming pipeline and found that 
> adding more workers / dividing into more groups isn't scaling as well as I 
> expect. Perhaps this is contributing (how do I tell if workers are being 
> utilized or not?)
> One pipeline which never completed (got to one of the last steps and then log 
> messages simply ceased without error on the workers) had this going on in the 
> kubelet logs. I checked some of my other streaming pipelines and found the 
> same thing going on, even though they would complete.
> In a couple of my streaming pipelines, I've gotten the following error 
> message, despite the pipeline eventually finishing:
> {code:java}
> Processing stuck in step s01 for at least 05m00s without outputting or 
> completing in state process{code}
> Perhaps they are related?
> This is running with 5 or 7 (or more) workers in streaming mode. I don't see 
> this when running with 1 worker
> The pipeline uses requirements.txt and setup.py, as well as using an extra 
> package and using save_main_session.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7975) error syncing pod - failed to start container artifact (python SDK)

2019-08-14 Thread James Hutchison (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Hutchison updated BEAM-7975:
--
Description: 
{code:java}
Error syncing pod 5966e59c ("-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
"StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
restarting failed container=artifact pod=-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
Seeing these in a streaming pipeline. Running the pipeline in batch mode, I'm not 
seeing anything. Messages appear about every 0.5 - 5 seconds.

I've been trying to efficiently scale my streaming pipeline and found that 
adding more workers / dividing into more groups isn't scaling as well as I 
expect. Perhaps this is contributing (how do I tell if workers are being 
utilized or not?)

One pipeline which never completed (got to one of the last steps and then log 
messages simply ceased without error on the workers) had this going on in the 
kubelet logs. I checked some of my other streaming pipelines and found the same 
thing going on, even though they would complete.

In a couple of my streaming pipelines, I've gotten the following error message, 
despite the pipeline eventually finishing:
{code:java}
Processing stuck in step s01 for at least 05m00s without outputting or 
completing in state process{code}
Perhaps they are related?

This is running with 5 or 7 (or more) workers in streaming mode. I don't see 
this when running with 1 worker.

The pipeline uses requirements.txt and setup.py, as well as an extra 
package and save_main_session.

  was:
{code:java}
Error syncing pod 5966e59c ("-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
"StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
restarting failed container=artifact pod=-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
seeing anything. Messages appear about every 0.5 - 5 seconds

I've been trying to efficiently scale my streaming pipeline and found that 
adding more workers / dividing into more groups seems to have minimal 
improvement. Perhaps this is part of the problem?

One pipeline which never completed (got to one of the last steps and then log 
messages simply ceased without error on the workers) had this going on in the 
kubelet logs. I checked some of my other streaming pipelines and found the same 
thing going on, even though they would complete.

In a couple of my streaming pipelines, I've gotten the following error message, 
despite the pipeline eventually finishing:
{code:java}
Processing stuck in step s01 for at least 05m00s without outputting or 
completing in state process{code}
Perhaps they are related?

This is running with 5 or 7 (or more) workers in streaming mode. I don't see 
this when running with 1 worker

The pipeline uses requirements.txt and setup.py, as well as using an extra 
package and using save_main_session.


> error syncing pod - failed to start container artifact (python SDK)
> ---
>
> Key: BEAM-7975
> URL: https://issues.apache.org/jira/browse/BEAM-7975
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.13.0
>Reporter: James Hutchison
>Priority: Major
>
> {code:java}
> Error syncing pod 5966e59c ("<job name>-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
> "StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
> restarting failed container=artifact pod=<job name>-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
> Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
> seeing anything. Messages appear about every 0.5 - 5 seconds
> I've been trying to efficiently scale my streaming pipeline and found that 
> adding more workers / dividing into more groups isn't scaling as well as I 
> expect. Perhaps this is contributing (how do I tell if workers are being 
> utilized or not?)
> One pipeline which never completed (got to one of the last steps and then log 
> messages simply ceased without error on the workers) had this going on in the 
> kubelet logs. I checked some of my other streaming pipelines and found the 
> same thing going on, even though they would complete.
> In a couple of my streaming pipelines, I've gotten the following error 
> message, despite the pipeline eventually finishing:
> {code:java}
> Processing stuck in step s01 for at least 05m00s without outputting or 
> completing in state process{code}
> Perhaps they are related?
> This is running with 5 or 7 (or more) workers in streaming mode. I don't see 
> this when running with 1 worker
> The pipeline uses requirements.txt and setup.py, as well as using an extra 
> package and using save_main_session.

[jira] [Updated] (BEAM-7975) error syncing pod - failed to start container artifact (python SDK)

2019-08-13 Thread James Hutchison (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Hutchison updated BEAM-7975:
--
Description: 
{code:java}
Error syncing pod 5966e59c ("-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
"StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
restarting failed container=artifact pod=-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
seeing anything. Messages appear about every 0.5 - 5 seconds

I've been trying to efficiently scale my streaming pipeline and found that 
adding more workers / dividing into more groups seems to have minimal 
improvement. Perhaps this is part of the problem?

One pipeline which never completed (got to one of the last steps and then log 
messages simply ceased without error on the workers) had this going on in the 
kubelet logs. I checked some of my other streaming pipelines and found the same 
thing going on, even though they would complete.

In a couple of my streaming pipelines, I've gotten the following error message, 
despite the pipeline eventually finishing:
{code:java}
Processing stuck in step s01 for at least 05m00s without outputting or 
completing in state process{code}
Perhaps they are related?

This is running with 5 or 7 (or more) workers in streaming mode. I don't see 
this when running with 1 worker

The pipeline uses requirements.txt and setup.py, as well as using an extra 
package and using save_main_session.

  was:
{code:java}
Error syncing pod 5966e59c ("-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
"StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
restarting failed container=artifact pod=-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
seeing anything. Messages appear about every 0.5 - 5 seconds

I've been trying to efficiently scale my streaming pipeline and found that 
adding more workers / dividing into more groups seems to have minimal 
improvement. Perhaps this is part of the problem?

One pipeline which never completed (got to one of the last steps and then log 
messages simply ceased without error on the workers) had this going on in the 
kubelet logs. I checked some of my other streaming pipelines and found the same 
thing going on, even though they would complete.

In a couple of my streaming pipelines, I've gotten the following error message, 
despite the pipeline eventually finishing:
{code:java}
Processing stuck in step s01 for at least 05m00s without outputting or 
completing in state process{code}
Perhaps they are related?

This is running with 5 or 7 workers in streaming mode. I don't see this when 
running with 1 worker

The pipeline uses requirements.txt and setup.py, as well as using an extra 
package and using save_main_session.


> error syncing pod - failed to start container artifact (python SDK)
> ---
>
> Key: BEAM-7975
> URL: https://issues.apache.org/jira/browse/BEAM-7975
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.13.0
>Reporter: James Hutchison
>Priority: Major
>
> {code:java}
> Error syncing pod 5966e59c ("<job name>-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
> "StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
> restarting failed container=artifact pod=<job name>-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
> Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
> seeing anything. Messages appear about every 0.5 - 5 seconds
> I've been trying to efficiently scale my streaming pipeline and found that 
> adding more workers / dividing into more groups seems to have minimal 
> improvement. Perhaps this is part of the problem?
> One pipeline which never completed (got to one of the last steps and then log 
> messages simply ceased without error on the workers) had this going on in the 
> kubelet logs. I checked some of my other streaming pipelines and found the 
> same thing going on, even though they would complete.
> In a couple of my streaming pipelines, I've gotten the following error 
> message, despite the pipeline eventually finishing:
> {code:java}
> Processing stuck in step s01 for at least 05m00s without outputting or 
> completing in state process{code}
> Perhaps they are related?
> This is running with 5 or 7 (or more) workers in streaming mode. I don't see 
> this when running with 1 worker
> The pipeline uses requirements.txt and setup.py, as well as using an extra 
> package and using save_main_session.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7975) error syncing pod - failed to start container artifact (python SDK)

2019-08-13 Thread James Hutchison (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Hutchison updated BEAM-7975:
--
Description: 
{code:java}
Error syncing pod 5966e59c ("-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
"StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
restarting failed container=artifact pod=-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
seeing anything. Messages appear about every 0.5 - 5 seconds

I've been trying to efficiently scale my streaming pipeline and found that 
adding more workers / dividing into more groups seems to have minimal 
improvement. Perhaps this is part of the problem?

One pipeline which never completed (got to one of the last steps and then log 
messages simply ceased without error on the workers) had this going on in the 
kubelet logs. I checked some of my other streaming pipelines and found the same 
thing going on, even though they would complete.

In a couple of my streaming pipelines, I've gotten the following error message, 
despite the pipeline eventually finishing:
{code:java}
Processing stuck in step s01 for at least 05m00s without outputting or 
completing in state process{code}
Perhaps they are related?

This is running with 5 or 7 workers in streaming mode. I don't see this when 
running with 1 worker

The pipeline uses requirements.txt and setup.py, as well as using an extra 
package and using save_main_session.

  was:
{code:java}
Error syncing pod 5966e59c ("-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
"StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
restarting failed container=artifact pod=-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
seeing anything. Messages appear about every 0.5 - 5 seconds

I've been trying to efficiently scale my streaming pipeline and found that 
adding more workers / dividing into more groups seems to have minimal 
improvement. Perhaps this is part of the problem?

One pipeline which never completed (got to one of the last steps and then log 
messages simply ceased without error on the workers) had this going on in the 
kubelet logs. I checked some of my other streaming pipelines and found the same 
thing going on, even though they would complete.

In a couple of my streaming pipelines, I've gotten the following error message, 
despite the pipeline eventually finishing:
{code:java}
Processing stuck in step s01 for at least 05m00s without outputting or 
completing in state process{code}
Perhaps they are related?

This is running with 5 and 7 workers in streaming mode. I don't see this when 
running with 1 worker

The pipeline uses requirements.txt and setup.py, as well as using an extra 
package and using save_main_session.


> error syncing pod - failed to start container artifact (python SDK)
> ---
>
> Key: BEAM-7975
> URL: https://issues.apache.org/jira/browse/BEAM-7975
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.13.0
>Reporter: James Hutchison
>Priority: Major
>
> {code:java}
> Error syncing pod 5966e59c ("<job name>-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
> "StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
> restarting failed container=artifact pod=<job name>-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
> Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
> seeing anything. Messages appear about every 0.5 - 5 seconds
> I've been trying to efficiently scale my streaming pipeline and found that 
> adding more workers / dividing into more groups seems to have minimal 
> improvement. Perhaps this is part of the problem?
> One pipeline which never completed (got to one of the last steps and then log 
> messages simply ceased without error on the workers) had this going on in the 
> kubelet logs. I checked some of my other streaming pipelines and found the 
> same thing going on, even though they would complete.
> In a couple of my streaming pipelines, I've gotten the following error 
> message, despite the pipeline eventually finishing:
> {code:java}
> Processing stuck in step s01 for at least 05m00s without outputting or 
> completing in state process{code}
> Perhaps they are related?
> This is running with 5 or 7 workers in streaming mode. I don't see this when 
> running with 1 worker
> The pipeline uses requirements.txt and setup.py, as well as using an extra 
> package and using save_main_session.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7975) error syncing pod - failed to start container artifact (python SDK)

2019-08-13 Thread James Hutchison (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Hutchison updated BEAM-7975:
--
Description: 
{code:java}
Error syncing pod 5966e59c ("-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
"StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
restarting failed container=artifact pod=-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
seeing anything. Messages appear about every 0.5 - 5 seconds

I've been trying to efficiently scale my streaming pipeline and found that 
adding more workers / dividing into more groups seems to have minimal 
improvement. Perhaps this is part of the problem?

One pipeline which never completed (got to one of the last steps and then log 
messages simply ceased without error on the workers) had this going on in the 
kubelet logs. I checked some of my other streaming pipelines and found the same 
thing going on, even though they would complete.

In a couple of my streaming pipelines, I've gotten the following error message, 
despite the pipeline eventually finishing:
{code:java}
Processing stuck in step s01 for at least 05m00s without outputting or 
completing in state process{code}
Perhaps they are related?

This is running with 5 and 7 workers in streaming mode. I don't see this when 
running with 1 worker

The pipeline uses requirements.txt and setup.py, as well as using an extra 
package and using save_main_session.

  was:
{code:java}
Error syncing pod 5966e59c ("-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
"StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
restarting failed container=artifact pod=-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
seeing anything. Messages appear about every 0.5 - 5 seconds

I've been trying to efficiently scale my streaming pipeline and found that 
adding more workers / dividing into more groups seems to have minimal 
improvement. Perhaps this is part of the problem?

One pipeline which never completed (got to one of the last steps and then log 
messages simply ceased without error on the workers) had this going on in the 
kubelet logs. I checked some of my other streaming pipelines and found the same 
thing going on, even though they would complete.

In a couple of my streaming pipelines, I've gotten the following error message, 
despite the pipeline eventually finishing:
{code:java}
Processing stuck in step s01 for at least 05m00s without outputting or 
completing in state process{code}
Perhaps they are related?

This is running with 5 and 7 workers in streaming mode. I haven't checked to 
see if number of workers is a factor.

The pipeline uses requirements.txt and setup.py, as well as using an extra 
package and using save_main_session.


> error syncing pod - failed to start container artifact (python SDK)
> ---
>
> Key: BEAM-7975
> URL: https://issues.apache.org/jira/browse/BEAM-7975
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.13.0
>Reporter: James Hutchison
>Priority: Major
>
> {code:java}
> Error syncing pod 5966e59c ("<job name>-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
> "StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
> restarting failed container=artifact pod=<job name>-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
> Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
> seeing anything. Messages appear about every 0.5 - 5 seconds
> I've been trying to efficiently scale my streaming pipeline and found that 
> adding more workers / dividing into more groups seems to have minimal 
> improvement. Perhaps this is part of the problem?
> One pipeline which never completed (got to one of the last steps and then log 
> messages simply ceased without error on the workers) had this going on in the 
> kubelet logs. I checked some of my other streaming pipelines and found the 
> same thing going on, even though they would complete.
> In a couple of my streaming pipelines, I've gotten the following error 
> message, despite the pipeline eventually finishing:
> {code:java}
> Processing stuck in step s01 for at least 05m00s without outputting or 
> completing in state process{code}
> Perhaps they are related?
> This is running with 5 and 7 workers in streaming mode. I don't see this when 
> running with 1 worker
> The pipeline uses requirements.txt and setup.py, as well as using an extra 
> package and using save_main_session.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7975) error syncing pod - failed to start container artifact (python SDK)

2019-08-13 Thread James Hutchison (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Hutchison updated BEAM-7975:
--
Description: 
{code:java}
Error syncing pod 5966e59c ("-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
"StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
restarting failed container=artifact pod=-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
seeing anything. Messages appear about every 0.5 - 5 seconds

I've been trying to efficiently scale my streaming pipeline and found that 
adding more workers / dividing into more groups seems to have minimal 
improvement. Perhaps this is part of the problem?

One pipeline which never completed (got to one of the last steps and then log 
messages simply ceased without error on the workers) had this going on in the 
kubelet logs. I checked some of my other streaming pipelines and found the same 
thing going on, even though they would complete.

In a couple of my streaming pipelines, I've gotten the following error message, 
despite the pipeline eventually finishing:
{code:java}
Processing stuck in step s01 for at least 05m00s without outputting or 
completing in state process{code}
Perhaps they are related?

This is running with 5 and 7 workers in streaming mode. I haven't checked to 
see if number of workers is a factor.

The pipeline uses requirements.txt and setup.py, as well as using an extra 
package and using save_main_session.

  was:
{code:java}
Error syncing pod 5966e59c ("-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
"StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
restarting failed container=artifact pod=-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
seeing anything. Messages appear about every 0.5 - 5 seconds

I've been trying to efficiently scale my streaming pipeline and found that 
adding more workers / dividing into more groups seems to have minimal 
improvement. Perhaps this is part of the problem?

One pipeline which never completed (got to one of the last steps and then log 
messages simply ceased without error on the workers) had this going on in the 
kubelet logs. I checked some of my other streaming pipelines and found the same 
thing going on, even though they would complete.

In a couple of my streaming pipelines, I've gotten the following error message, 
despite the pipeline eventually finishing:
{code:java}
Processing stuck in step s01 for at least 05m00s without outputting or 
completing in state process{code}
Perhaps they are related?

This is running with 7 workers in streaming mode. I haven't checked to see if 
number of workers is a factor.

The pipeline uses requirements.txt and setup.py, as well as using an extra 
package and using save_main_session.


> error syncing pod - failed to start container artifact (python SDK)
> ---
>
> Key: BEAM-7975
> URL: https://issues.apache.org/jira/browse/BEAM-7975
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.13.0
>Reporter: James Hutchison
>Priority: Major
>
> {code:java}
> Error syncing pod 5966e59c ("<job name>-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
> "StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
> restarting failed container=artifact pod=<job name>-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
> Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
> seeing anything. Messages appear about every 0.5 - 5 seconds
> I've been trying to efficiently scale my streaming pipeline and found that 
> adding more workers / dividing into more groups seems to have minimal 
> improvement. Perhaps this is part of the problem?
> One pipeline which never completed (got to one of the last steps and then log 
> messages simply ceased without error on the workers) had this going on in the 
> kubelet logs. I checked some of my other streaming pipelines and found the 
> same thing going on, even though they would complete.
> In a couple of my streaming pipelines, I've gotten the following error 
> message, despite the pipeline eventually finishing:
> {code:java}
> Processing stuck in step s01 for at least 05m00s without outputting or 
> completing in state process{code}
> Perhaps they are related?
> This is running with 5 and 7 workers in streaming mode. I haven't checked to 
> see if number of workers is a factor.
> The pipeline uses requirements.txt and setup.py, as well as using an extra 
> package and using save_main_session.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)