Re: [PR] Fix upload_graph on v2 [beam]

2024-08-19 Thread via GitHub


kennknowles commented on PR #32165:
URL: https://github.com/apache/beam/pull/32165#issuecomment-2297452105

   (that was what caused the bug)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix upload_graph on v2 [beam]

2024-08-19 Thread via GitHub


kennknowles commented on PR #32165:
URL: https://github.com/apache/beam/pull/32165#issuecomment-2297451860

   If the user explicitly requests runner v2, we already clear the steps from 
the REST API request.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix upload_graph on v2 [beam]

2024-08-19 Thread via GitHub


chamikaramj commented on PR #32165:
URL: https://github.com/apache/beam/pull/32165#issuecomment-2297077025

   > stage both the v1 dataflow graph and the portable pipeline to GCS
   
   Note that this will require service changes since I believe currently we use 
the same Dataflow job request property for both.
   
   But agree that if we are not hitting request RPC errors with Runner v2, 
trimming steps in the the Dataflow job request for Runner v2 might not be a 
priority.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix upload_graph on v2 [beam]

2024-08-19 Thread via GitHub


kennknowles commented on PR #32165:
URL: https://github.com/apache/beam/pull/32165#issuecomment-2297057635

   I think these are the cases:
   
- choose v2: we don't even do the v1 translation, and `upload_graph` is 
turned off / does nothing, portable pipeline staged to GCS
- choose v1: `upload_graph` works and also we will set it automatically if 
necessary
- leave up to the service: v1 graph follows the usual v1 path and portable 
pipeline staged to GCS
   
   So there might be a corner case for a very large graph that does not choose 
v1 or v2. In this case, we will automatically turn on `upload_graph` and stage 
both the v1 dataflow graph and the portable pipeline to GCS. I am not certain 
if this works. What I said before would suggest it doesn't but I was probably 
wrong. Basically if the staging locations or service-side code paths have 
collisions then it won't work, otherwise it will.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix upload_graph on v2 [beam]

2024-08-19 Thread via GitHub


damccorm commented on PR #32165:
URL: https://github.com/apache/beam/pull/32165#issuecomment-2295901909

   > I think there's a secondary fix needed (separate PR) where we just 
minimize the unnecessary steps submitted in the Dataflow job request when using 
Runner v2 so the the request does not go over the Dataflow RPC limits (Dataflow 
with Runner v2 do not use such steps anyways but we just submit them for 
historical reasons).
   
   I don't think we need this - upload_graph has never worked with Runner v2 
and users don't seem to be running into this problem, so I don't think the 
problem persists in the v2 case.
   
   There's also already some divergence here (e.g. 
https://github.com/apache/beam/blob/391dbf6b24355b613ade4446a94518fd61784722/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L1236).
 There may be other places which need updating, but they at least don't seem 
problematic at this point


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix upload_graph on v2 [beam]

2024-08-16 Thread via GitHub


chamikaramj commented on PR #32165:
URL: https://github.com/apache/beam/pull/32165#issuecomment-2294126864

   I think there's a secondary fix needed (separate PR) where we just minimize 
the unnecessary steps submitted in the Dataflow job request when using Runner 
v2 so the the request does not go over the Dataflow RPC limits (Dataflow with 
Runner v2 do not use such steps anyways but we just submit them for historical 
reasons).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix upload_graph on v2 [beam]

2024-08-13 Thread via GitHub


damccorm merged PR #32165:
URL: https://github.com/apache/beam/pull/32165


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix upload_graph on v2 [beam]

2024-08-13 Thread via GitHub


github-actions[bot] commented on PR #32165:
URL: https://github.com/apache/beam/pull/32165#issuecomment-2286528758

   Stopping reviewer notifications for this pull request: review requested by 
someone other than the bot, ceding control. If you'd like to restart, comment 
`assign set of reviewers`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix upload_graph on v2 [beam]

2024-08-13 Thread via GitHub


damccorm commented on PR #32165:
URL: https://github.com/apache/beam/pull/32165#issuecomment-2286525632

   R: @kennknowles @liferoad 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix upload_graph on v2 [beam]

2024-08-13 Thread via GitHub


damccorm commented on PR #32165:
URL: https://github.com/apache/beam/pull/32165#issuecomment-2286364836

   Example run of real streaming jobs with this option - 
https://github.com/apache/beam/actions/runs/10371405701/job/28711629663 - I 
added it to the streaming test suite, ran the suite, then removed it after it 
was clear it worked.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix upload_graph on v2 [beam]

2024-08-13 Thread via GitHub


damccorm commented on PR #32165:
URL: https://github.com/apache/beam/pull/32165#issuecomment-2286231917

   Example run of real streaming jobs with this option - 
https://github.com/apache/beam/actions/runs/10370496169


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org