[jira] [Commented] (BEAM-7930) bundle_processor log spam using python SDK on dataflow runner

2019-08-14 Thread James Hutchison (JIRA)


[ https://issues.apache.org/jira/browse/BEAM-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907722#comment-16907722 ]

James Hutchison commented on BEAM-7930:
---

From what I can tell, this is coming from the grouping steps.
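For context, the per-window distinct the pipeline runs is implemented via grouping steps; the behavior can be sketched in plain Python (this is an illustration only, not Beam SDK code — the function name and window-assignment logic are made up for the sketch):

```python
from collections import defaultdict

def distinct_per_window(events, window_size):
    """Illustrative sketch: deduplicate (timestamp, value) events within
    fixed processing-time windows, the way a per-window distinct behaves."""
    seen = defaultdict(set)  # window start -> values already emitted
    out = []
    for ts, value in events:
        window = ts - (ts % window_size)  # assign the event to its window
        if value not in seen[window]:
            seen[window].add(value)
            out.append((window, value))
    return out
```

In Beam, a distinct like this expands into a grouping (GroupByKey-style) stage, which is where the synthetic `generatedPtransform-NN` names in the log messages would come from.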

> bundle_processor log spam using python SDK on dataflow runner
> -
>
> Key: BEAM-7930
> URL: https://issues.apache.org/jira/browse/BEAM-7930
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Affects Versions: 2.13.0
>Reporter: James Hutchison
>Priority: Minor
>
> When running my pipeline on dataflow, I can see in the stackdriver logs a 
> large amount of spam for the following messages (note that the numbers in 
> them change every message):
>  * [INFO] (bundle_processor.create_operation) No unique name set for 
> transform generatedPtransform-67
>  * [INFO] (bundle_processor.create_operation) No unique name for transform -19
>  * [ERROR] (bundle_processor.create) Missing required coder_id on grpc_port 
> for -19; using deprecated fallback.
> Running locally under the direct runner with the debugger, I set breakpoints 
> where these log messages originate and they were never hit, so I don't know 
> specifically what is causing them.
> I also tried raising the threshold via the logging module, and mocked out the 
> logging attribute in the bundle_processor module to set the level to CRITICAL, 
> but I still see the log messages.
> The pipeline is a streaming pipeline that reads from two pubsub topics, 
> merges the inputs, runs distinct on them over each processing-time window, 
> fetches from an external service, does processing, and inserts into 
> elasticsearch, with failures going to bigquery. The log messages seem to 
> cluster, and they appear before any log messages from the other steps, so I 
> wonder if they are coming from the pubsub read or the windowing portion.
> Expected behavior:
>  * I don't expect to see these noisy log messages, which seem to indicate 
> something is wrong.
>  * The missing required coder_id message is logged at the ERROR level, so it 
> pollutes the error logs. WARNING or INFO would be more appropriate.
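The suppression attempt described above (raising the emitting logger's level) might look like the sketch below. The logger name is an assumption based on the SDK module path, and, as the report indicates, this did not silence the messages on Dataflow — likely because they are emitted by remote worker processes rather than the process where the level is set:

```python
import logging

# Assumed logger name; the real module path inside the Beam SDK may differ.
LOGGER_NAME = "apache_beam.runners.worker.bundle_processor"

logger = logging.getLogger(LOGGER_NAME)
logger.setLevel(logging.CRITICAL)  # drop ERROR-and-below records locally

# Records below CRITICAL from this logger are now filtered out
# in this process only; remote workers are unaffected.
```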



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7930) bundle_processor log spam using python SDK on dataflow runner

2019-08-12 Thread Ismaël Mejía (JIRA)


[ https://issues.apache.org/jira/browse/BEAM-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904983#comment-16904983 ]

Ismaël Mejía commented on BEAM-7930:
---

[~robertwb] Do you know about this one? Or could you pass it to someone who can 
check?






[jira] [Commented] (BEAM-7930) bundle_processor log spam using python SDK on dataflow runner

2019-08-08 Thread James Hutchison (JIRA)


[ https://issues.apache.org/jira/browse/BEAM-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903189#comment-16903189 ]

James Hutchison commented on BEAM-7930:
---

If this isn't already a known issue, I can try to provide more information.



