[jira] [Commented] (BEAM-7930) bundle_processor log spam using python SDK on dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907722#comment-16907722 ]

James Hutchison commented on BEAM-7930:
---------------------------------------

From what I can tell this is coming from the grouping steps

> bundle_processor log spam using python SDK on dataflow runner
> -------------------------------------------------------------
>
>                 Key: BEAM-7930
>                 URL: https://issues.apache.org/jira/browse/BEAM-7930
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow, sdk-py-core
>    Affects Versions: 2.13.0
>            Reporter: James Hutchison
>            Priority: Minor
>
> When running my pipeline on dataflow, I can see in the stackdriver logs a
> large amount of spam for the following messages (note that the numbers in
> them change every message):
> * [INFO] (bundle_processor.create_operation) No unique name set for
> transform generatedPtransform-67
> * [INFO] (bundle_processor.create_operation) No unique name for transform -19
> * [ERROR] (bundle_processor.create) Missing required coder_id on grpc_port
> for -19; using deprecated fallback.
> I tried running locally using the debugger and setting breakpoints on where
> these log messages originate using the direct runner and it never hit it, so
> I don't know specifically what is causing them.
> I also tried using the logging module to change the threshold and also mocked
> out the logging attribute in the bundle_processor module to change the log
> level to CRITICAL and I still see the log messages.
> The pipeline is a streaming pipeline that reads from two pubsub topics,
> merges the inputs and runs distinct on the inputs over each processing time
> window, fetches from an external service, does processing, and inserts into
> elasticsearch with failures going into bigquery. I notice the log messages
> seem to cluster and this appears early on before any other log messages in
> any of the other steps so I wonder if maybe this is coming from the pubsub
> read or windowing portion.
> Expected behavior:
> * I don't expect to see these noisy log messages which seem to indicate
> something is wrong
> * The missing required coder_id message is at the ERROR log level so it
> pollutes the error logs. I would expect this to be at the WARNING or INFO
> level.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
[jira] [Commented] (BEAM-7930) bundle_processor log spam using python SDK on dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904983#comment-16904983 ]

Ismaël Mejía commented on BEAM-7930:
------------------------------------

[~robertwb] You may know about this one? or can you pass to someone who can check.
[jira] [Commented] (BEAM-7930) bundle_processor log spam using python SDK on dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903189#comment-16903189 ]

James Hutchison commented on BEAM-7930:
---------------------------------------

If this isn't already a known issue I can try to provide more information.

> bundle_processor log spam using python SDK on dataflow runner
> -------------------------------------------------------------
>
>                 Key: BEAM-7930
>                 URL: https://issues.apache.org/jira/browse/BEAM-7930
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>    Affects Versions: 2.13.0
>            Reporter: James Hutchison
>            Priority: Minor
>
> When running my pipeline on dataflow, I can see in the stackdriver logs a
> large amount of spam for the following messages (note that the numbers in
> them change every message):
> * [INFO] (bundle_processor.create_operation) No unique name set for
> transform generatedPtransform-67
> * [INFO] (bundle_processor.create_operation) No unique name for transform -19
> * [ERROR] (bundle_processor.create) Missing required coder_id on grpc_port
> for -19; using deprecated fallback.
> I tried using a breakpoint on where these log messages originate using the
> direct runner and it never hit it, so I don't know specifically what is
> causing them.
> I also tried using the logging module to change the threshold and also mocked
> out the logging attribute in the bundle_processor module to change the log
> level to CRITICAL and I still see the log messages.
> The pipeline is a streaming pipeline that reads from two pubsub topics,
> merges the inputs and runs distinct on the inputs over each processing time
> window, fetches from an external service, does processing, and inserts into
> elasticsearch with failures going into bigquery. I notice the log messages
> seem to cluster and this appears early on before any other log messages in
> any of the other steps so I wonder if maybe this is coming from the pubsub
> read or windowing portion.
> Expected behavior:
> * I don't expect to see these noisy log messages which seem to indicate
> something is wrong
> * The missing required coder_id message is at the ERROR log level so it
> pollutes the error logs. I would expect this to be at the WARNING or INFO
> level.
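
For readers following this thread, the kind of log suppression the reporter describes attempting can be sketched with the standard library's logging filters. This is only an illustration and not a confirmed fix: the names DropNoisyMessages, NOISY and install_filter are hypothetical, the substrings are copied from the messages quoted in the issue, and it is an assumption that such a filter would even run inside the Dataflow worker process (the report says raising the log level there had no effect).

```python
import logging


class DropNoisyMessages(logging.Filter):
    """Drop records whose formatted message contains any listed substring."""

    def __init__(self, substrings):
        super().__init__()
        self.substrings = tuple(substrings)

    def filter(self, record):
        message = record.getMessage()
        # Returning False tells the logging machinery to discard the record.
        return not any(s in message for s in self.substrings)


# Substrings taken from the log messages quoted in the issue description.
NOISY = (
    "No unique name set for transform",
    "No unique name for transform",
    "Missing required coder_id on grpc_port",
)


def install_filter(logger=None):
    """Attach the filter to each handler of `logger` (root logger if None).

    Hypothetical helper: it only affects handlers that already exist in the
    process where it is called.
    """
    if logger is None:
        logger = logging.getLogger()
    noisy_filter = DropNoisyMessages(NOISY)
    for handler in logger.handlers:
        handler.addFilter(noisy_filter)
    return noisy_filter
```

The filter is attached to handlers rather than to a logger because a logger-level filter is not applied to records that propagate up from child loggers. Whether the worker's log handlers are reachable this way on Dataflow is not established anywhere in this thread.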