Re: [PR] Fix upload_graph on v2 [beam]
kennknowles commented on PR #32165: URL: https://github.com/apache/beam/pull/32165#issuecomment-2297452105 (that was what caused the bug) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix upload_graph on v2 [beam]
kennknowles commented on PR #32165: URL: https://github.com/apache/beam/pull/32165#issuecomment-2297451860 If the user explicitly requests runner v2, we already clear the steps from the REST API request. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix upload_graph on v2 [beam]
chamikaramj commented on PR #32165: URL: https://github.com/apache/beam/pull/32165#issuecomment-2297077025 > stage both the v1 dataflow graph and the portable pipeline to GCS Note that this will require service changes since I believe currently we use the same Dataflow job request property for both. But agree that if we are not hitting request RPC errors with Runner v2, trimming steps in the the Dataflow job request for Runner v2 might not be a priority. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix upload_graph on v2 [beam]
kennknowles commented on PR #32165: URL: https://github.com/apache/beam/pull/32165#issuecomment-2297057635 I think these are the cases: - choose v2: we don't even do the v1 translation, and `upload_graph` is turned off / does nothing, portable pipeline staged to GCS - choose v1: `upload_graph` works and also we will set it automatically if necessary - leave up to the service: v1 graph follows the usual v1 path and portable pipeline staged to GCS So there might be a corner case for a very large graph that does not choose v1 or v2. In this case, we will automatically turn on `upload_graph` and stage both the v1 dataflow graph and the portable pipeline to GCS. I am not certain if this works. What I said before would suggest it doesn't but I was probably wrong. Basically if the staging locations or service-side code paths have collisions then it won't work, otherwise it will. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix upload_graph on v2 [beam]
damccorm commented on PR #32165: URL: https://github.com/apache/beam/pull/32165#issuecomment-2295901909 > I think there's a secondary fix needed (separate PR) where we just minimize the unnecessary steps submitted in the Dataflow job request when using Runner v2 so the the request does not go over the Dataflow RPC limits (Dataflow with Runner v2 do not use such steps anyways but we just submit them for historical reasons). I don't think we need this - upload_graph has never worked with Runner v2 and users don't seem to be running into this problem, so I don't think the problem persists in the v2 case. There's also already some divergence here (e.g. https://github.com/apache/beam/blob/391dbf6b24355b613ade4446a94518fd61784722/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L1236). There may be other places which need updating, but they at least don't seem problematic at this point -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix upload_graph on v2 [beam]
chamikaramj commented on PR #32165: URL: https://github.com/apache/beam/pull/32165#issuecomment-2294126864 I think there's a secondary fix needed (separate PR) where we just minimize the unnecessary steps submitted in the Dataflow job request when using Runner v2 so the the request does not go over the Dataflow RPC limits (Dataflow with Runner v2 do not use such steps anyways but we just submit them for historical reasons). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix upload_graph on v2 [beam]
damccorm merged PR #32165: URL: https://github.com/apache/beam/pull/32165 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix upload_graph on v2 [beam]
github-actions[bot] commented on PR #32165: URL: https://github.com/apache/beam/pull/32165#issuecomment-2286528758 Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment `assign set of reviewers` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix upload_graph on v2 [beam]
damccorm commented on PR #32165: URL: https://github.com/apache/beam/pull/32165#issuecomment-2286525632 R: @kennknowles @liferoad -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix upload_graph on v2 [beam]
damccorm commented on PR #32165: URL: https://github.com/apache/beam/pull/32165#issuecomment-2286364836 Example run of real streaming jobs with this option - https://github.com/apache/beam/actions/runs/10371405701/job/28711629663 - I added it to the streaming test suite, ran the suite, then removed it after it was clear it worked. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix upload_graph on v2 [beam]
damccorm commented on PR #32165: URL: https://github.com/apache/beam/pull/32165#issuecomment-2286231917 Example run of real streaming jobs with this option - https://github.com/apache/beam/actions/runs/10370496169 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org