Re: [I] [Bug]: upload_graph option does not work with Dataflow Java Runner v2 [beam]

2024-08-13 Thread via GitHub


kennknowles commented on issue #32159:
URL: https://github.com/apache/beam/issues/32159#issuecomment-2286599532

   > > The automatic opt in happens on the service side so I don't think it 
will trigger this.
   > 
   > Actually, this is on the SDK side (
   > 
   > 
https://github.com/apache/beam/blob/b2d26b6b5f376db079679d620a812af25c4a90f8/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L1387
   > 
   > ) - fixed by my change to only do it for v1.
   
   I mean that automatic opt in to v2 is on the service side. For jobs that do 
not specify v1 or v2, we submit the necessary information for both and the 
service decides which is which. Jobs like that won't trigger the problem. This 
bug doesn't impact the involuntary migration to v2.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug]: upload_graph option does not work with Dataflow Java Runner v2 [beam]

2024-08-13 Thread via GitHub


damccorm commented on issue #32159:
URL: https://github.com/apache/beam/issues/32159#issuecomment-2286576847

   > The automatic opt in happens on the service side so I don't think it will 
trigger this.
   
   Actually, this is on the SDK side 
(https://github.com/apache/beam/blob/b2d26b6b5f376db079679d620a812af25c4a90f8/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L1387)
 - fixed by my change to only do it for v1.
   
   > Didn't someone add automatic upload_graph logic when the graph is too big, 
so we don't need to ever set it explicitly any more? It was always an 
extraneous knob that could/should be either automated or made the v1 default.
   
   Yes, but I assume folks still have it set manually


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug]: upload_graph option does not work with Dataflow Java Runner v2 [beam]

2024-08-13 Thread via GitHub


kennknowles commented on issue #32159:
URL: https://github.com/apache/beam/issues/32159#issuecomment-2286528990

   Didn't someone add automatic `upload_graph` logic when the graph is too big, 
so we don't need to ever set it explicitly any more? It was always an 
extraneous knob that could/should be either automated or made the v1 default.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug]: upload_graph option does not work with Dataflow Java Runner v2 [beam]

2024-08-13 Thread via GitHub


kennknowles commented on issue #32159:
URL: https://github.com/apache/beam/issues/32159#issuecomment-2286526045

   The automatic opt in happens on the service side so I don't think it will 
trigger this.
   
   But I also agree that makes it do nothing is probably better than failing 
it, because if people have command lines assembled (either manually or 
automatically) and they add `user_runner_v2` we would like it to work if 
possible. So if they have some old v1 command line that has `upload_graph` on 
it, I think it is benign to ignore it, since there is no change in the 
semantics of the job.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug]: upload_graph option does not work with Dataflow Java Runner v2 [beam]

2024-08-13 Thread via GitHub


damccorm commented on issue #32159:
URL: https://github.com/apache/beam/issues/32159#issuecomment-2286213770

   Mostly, I don't want to fail the job because it harms our ability to 
auto-upgrade users to runner v2. Removing the option seems reasonable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug]: upload_graph option does not work with Dataflow Java Runner v2 [beam]

2024-08-13 Thread via GitHub


liferoad commented on issue #32159:
URL: https://github.com/apache/beam/issues/32159#issuecomment-2286155239

   Agree with @kennknowles. We can just error it out when `upload_graph` is 
used with V2.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug]: upload_graph option does not work with Dataflow Java Runner v2 [beam]

2024-08-12 Thread via GitHub


kennknowles commented on issue #32159:
URL: https://github.com/apache/beam/issues/32159#issuecomment-2285146885

   Yea, he most internally-consistent thing to do is to ignore or disable 
`upload_graph` if V2 is used. That'll be more targeted than a null check and 
won't mask other errors.
   
   Just to state it for posterity, though you probably already know:
   
- The intent of `upload_graph` is that we upload the V1 post-translation 
Dataflow graph in case it is too big to be inlined in the CreateJob request due 
to various service-side limitations.
- With V2 we always upload the portable proto and it is translated on the 
service side.
   
   When I say they may be mutually exclusive, I think there is a chance that 
some piece of code somewhere looks in the same variable for the uploaded file, 
since the mechanism looks the same if you aren't type-conscious.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug]: upload_graph option does not work with Dataflow Java Runner v2 [beam]

2024-08-12 Thread via GitHub


damccorm commented on issue #32159:
URL: https://github.com/apache/beam/issues/32159#issuecomment-2284898025

   Also reproduced on 2.56.0 ([example 
failures](https://github.com/apache/beam/actions/runs/10359229549/job/28675182221))
 - another datapoint towards https://github.com/apache/beam/pull/30604 being 
the thing that exposed this 
   
   FYI @kennknowles - looks like we missed an assumption. I think the easy fix 
may be to just do a null check on 
https://github.com/apache/beam/blob/edf4c4f5f19ef6ffd25493262261c713ba045980/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L1404
 before calling `clear()`, but I need to test it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug]: upload_graph option does not work with Dataflow Java Runner v2 [beam]

2024-08-12 Thread via GitHub


damccorm commented on issue #32159:
URL: https://github.com/apache/beam/issues/32159#issuecomment-2284872555

   I think https://github.com/apache/beam/pull/30604 likely caused this.
   
   
https://github.com/kennknowles/beam/blob/0a0186de555221a41fc851c805bf6b3b6714/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java#L431
 is the only runner v1/v2 logic on this path AFAIK, it only reproes on v2, and 
I confirmed it does work on 2.55.1 (before this change)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org