Re: [I] [Bug]: upload_graph option does not work with Dataflow Java Runner v2 [beam]
kennknowles commented on issue #32159: URL: https://github.com/apache/beam/issues/32159#issuecomment-2286599532 > > The automatic opt in happens on the service side so I don't think it will trigger this. > > Actually, this is on the SDK side ( > > https://github.com/apache/beam/blob/b2d26b6b5f376db079679d620a812af25c4a90f8/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L1387 > > ) - fixed by my change to only do it for v1. I mean that automatic opt in to v2 is on the service side. For jobs that do not specify v1 or v2, we submit the necessary information for both and the service decides which is which. Jobs like that won't trigger the problem. This bug doesn't impact the involuntary migration to v2. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Bug]: upload_graph option does not work with Dataflow Java Runner v2 [beam]
damccorm commented on issue #32159: URL: https://github.com/apache/beam/issues/32159#issuecomment-2286576847 > The automatic opt in happens on the service side so I don't think it will trigger this. Actually, this is on the SDK side (https://github.com/apache/beam/blob/b2d26b6b5f376db079679d620a812af25c4a90f8/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L1387) - fixed by my change to only do it for v1. > Didn't someone add automatic upload_graph logic when the graph is too big, so we don't need to ever set it explicitly any more? It was always an extraneous knob that could/should be either automated or made the v1 default. Yes, but I assume folks still have it set manually -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Bug]: upload_graph option does not work with Dataflow Java Runner v2 [beam]
kennknowles commented on issue #32159: URL: https://github.com/apache/beam/issues/32159#issuecomment-2286528990 Didn't someone add automatic `upload_graph` logic when the graph is too big, so we don't need to ever set it explicitly any more? It was always an extraneous knob that could/should be either automated or made the v1 default. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Bug]: upload_graph option does not work with Dataflow Java Runner v2 [beam]
kennknowles commented on issue #32159: URL: https://github.com/apache/beam/issues/32159#issuecomment-2286526045 The automatic opt in happens on the service side so I don't think it will trigger this. But I also agree that makes it do nothing is probably better than failing it, because if people have command lines assembled (either manually or automatically) and they add `user_runner_v2` we would like it to work if possible. So if they have some old v1 command line that has `upload_graph` on it, I think it is benign to ignore it, since there is no change in the semantics of the job. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Bug]: upload_graph option does not work with Dataflow Java Runner v2 [beam]
damccorm commented on issue #32159: URL: https://github.com/apache/beam/issues/32159#issuecomment-2286213770 Mostly, I don't want to fail the job because it harms our ability to auto-upgrade users to runner v2. Removing the option seems reasonable. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Bug]: upload_graph option does not work with Dataflow Java Runner v2 [beam]
liferoad commented on issue #32159: URL: https://github.com/apache/beam/issues/32159#issuecomment-2286155239 Agree with @kennknowles. We can just error it out when `upload_graph` is used with V2. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Bug]: upload_graph option does not work with Dataflow Java Runner v2 [beam]
kennknowles commented on issue #32159: URL: https://github.com/apache/beam/issues/32159#issuecomment-2285146885 Yea, he most internally-consistent thing to do is to ignore or disable `upload_graph` if V2 is used. That'll be more targeted than a null check and won't mask other errors. Just to state it for posterity, though you probably already know: - The intent of `upload_graph` is that we upload the V1 post-translation Dataflow graph in case it is too big to be inlined in the CreateJob request due to various service-side limitations. - With V2 we always upload the portable proto and it is translated on the service side. When I say they may be mutually exclusive, I think there is a chance that some piece of code somewhere looks in the same variable for the uploaded file, since the mechanism looks the same if you aren't type-conscious. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Bug]: upload_graph option does not work with Dataflow Java Runner v2 [beam]
damccorm commented on issue #32159: URL: https://github.com/apache/beam/issues/32159#issuecomment-2284898025 Also reproduced on 2.56.0 ([example failures](https://github.com/apache/beam/actions/runs/10359229549/job/28675182221)) - another datapoint towards https://github.com/apache/beam/pull/30604 being the thing that exposed this FYI @kennknowles - looks like we missed an assumption. I think the easy fix may be to just do a null check on https://github.com/apache/beam/blob/edf4c4f5f19ef6ffd25493262261c713ba045980/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L1404 before calling `clear()`, but I need to test it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Bug]: upload_graph option does not work with Dataflow Java Runner v2 [beam]
damccorm commented on issue #32159: URL: https://github.com/apache/beam/issues/32159#issuecomment-2284872555 I think https://github.com/apache/beam/pull/30604 likely caused this. https://github.com/kennknowles/beam/blob/0a0186de555221a41fc851c805bf6b3b6714/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java#L431 is the only runner v1/v2 logic on this path AFAIK, it only reproes on v2, and I confirmed it does work on 2.55.1 (before this change) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org