Do you see any errors in the Dataflow Cloud Console logs? Note that you have to click the Worker Logs tab and select all "log names" under Dataflow to see all logs.
I suspect your job might not be starting up properly but hard to say without looking at details.

On Thu, Aug 25, 2022 at 5:36 PM Drew Forbes <[email protected]> wrote:

> Yeah I'm not sure, I've tried a couple different ways to run it and still
> no luck. I'm able to read from that subscription with a Java beam job
> (which is likely what you were seeing in the internal metrics) but I just
> can't get the python job to do anything.
>
> Unfortunately my Java skills have atrophied since college, and I'm having
> a lot of trouble developing within the Beam paradigm in Java, so we're
> going to try moving in a different direction for now.
>
> On Wed, Aug 24, 2022 at 11:53 AM Daniel Collins <[email protected]>
> wrote:
>
>> Hello Drew,
>>
>> The object type is the SequencedMessage type here
>> https://github.com/googleapis/python-pubsublite/blob/b77cf6ddeaae4e950ed069b652a22a1fc79f74ea/google/cloud/pubsublite_v1/types/common.py#L109,
>> so the correct lambda is likely `lambda x: json.loads(x.message.data)`
>>
>> It does appear from internal metrics that your client is reading data
>> from that subscription. If your json.loads call fails, I'm unsure why this
>> wouldn't surface as an error in the runtime.
>>
>> -Daniel
>>
>> On Tuesday, August 23, 2022 at 2:52:54 PM UTC-4 Drew Forbes wrote:
>>
>>> Hey all, thank you for these updates. We ran into some other issues with
>>> our Spark - PubSubLite pipeline so I've had time to re-evaluate Beam with
>>> PubSubLite. I was able to get the package importing correctly and the job
>>> to build using "from apache_beam.io.gcp.pubsublite import *". Not sure why
>>> that worked when the other permutations on that failed, but I'll take it.
>>>
>>> At risk of turning this into a troubleshooting thread (please feel free
>>> to turn me away if that's not something y'all have interest in), I'm going
>>> to ask if you've got any ideas on why this pipeline isn't actually reading
>>> from PSL. I'm submitting jobs either to DirectRunner or DataflowRunner and
>>> they are spinning up and staying up without error, but they're not actually
>>> reading anything from the PSL subscription and thus not doing any work.
>>> I've got several Java Beam jobs that read correctly from PSL subscriptions
>>> as well as a test Python Beam job that can read from regular PubSub, but
>>> I'm not sure what's happening here. There aren't any logs to go off of
>>> either.
>>>
>>> Below is the pipeline code, I actually expect this to fail since I'm not
>>> sure how to do the parsing, this is the same code I used with
>>> ReadFromPubSub except for with Lite. But I'm not even getting errors yet
>>> because it's not reading from the Subscription. Any rough ideas I can try?
>>>
>>> with beam.Pipeline(options=pipeline_options) as p:
>>>>
>>>>     p | 'Read From PubSubLite' >> ReadFromPubSubLite(
>>>>
>>>>         subscription_path='projects/starwatch/locations/us-west1-a/subscriptions/sw-test-sky-events-timescale'
>>>>     ) | 'JSONParse' >> beam.Map(lambda x: json.loads(x)
>>>>     ) | 'Extract JSON Columns' >> beam.Map(extract_json_columns
>>>>     ) | 'To string' >> beam.ToString.Element() | beam.Map(print
>>>>     ) | 'Writing to DB' >> relational_db.Write(
>>>>         source_config=source_config,
>>>>         table_config=table_config
>>>>     )
>>>>
>>>
>>> On Fri, Aug 5, 2022 at 1:11 AM Austin Bennett <
>>> [email protected]> wrote:
>>>
>>>> @cham thanks for bringing the conversation back to the list ( esp. for
>>>> anyone else searching/wondering in the future )!
>>>>
>>>> From what I understand/summary: Python should be able to call via
>>>> X-Lang the [ Java ] PubSubLite IO for use with any underlying runner (
>>>> well, that utilizes portable runner, ex: Spark, Flink, DataflowV2, etc )
>>>>
>>>>
>>>>
>>>> On Thu, Aug 4, 2022 at 5:49 PM Chamikara Jayalath via user <
>>>> [email protected]> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Aug 4, 2022 at 5:29 PM Daniel Collins <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hello Drew,
>>>>>>
>>>>>> > I upgraded to apache-beam 2.40.0 and tried to access
>>>>>> apache_beam.io.gcp.pubsublite.ReadFromPubSubLite
>>>>>>
>>>>>> You should ensure to import `apache_beam.io.gcp.pubsublite.*`. I have
>>>>>> no idea why the specific import isn't working- but that should work. If
>>>>>> its not, I'll look into it more.
>>>>>>
>>>>>> > writing native Spark code to pull from PubSub Lite
>>>>>>
>>>>>> Note that we have a spark native source you can use. I'm unsure if
>>>>>> spark works with beam python however, Chamikara would know that better.
>>>>>> https://github.com/googleapis/java-pubsublite-spark
>>>>>>
>>>>>
>>>>> It should be supported. See instructions here under "Portable
>>>>> (Java/Python/Go)":
>>>>> https://beam.apache.org/documentation/runners/spark/
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> -Daniel
>>>>>>
>>>>>> On Thu, Aug 4, 2022 at 7:48 PM Drew Forbes <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> I've actually not used PyBeam, I just meant writing Beam code with
>>>>>>> Python. Didn't realize there was a whole separate PyBeam package.
>>>>>>>
>>>>>>
>>>>> Thanks for clarifying.
>>>>>
>>>>> Thanks,
>>>>> Cham
>>>>>
>>>>>
>>>>>>
>>>>>>> I feel dumb asking, but basically we just couldn't get the import to
>>>>>>> work. I upgraded to apache-beam 2.40.0 and tried to access
>>>>>>> apache_beam.io.gcp.pubsublite.ReadFromPubSubLite through various
>>>>>>> means (regular import, proto_api, something like .external., etc) within
>>>>>>> Python and determined that there just wasn't anything to access. We could
>>>>>>> definitely have been wrong about that but it wasn't clear how to move
>>>>>>> forward so we just switched our focus to writing native Spark code to pull
>>>>>>> from PubSub Lite
>>>>>>>
>>>>>>> On Thu, Aug 4, 2022 at 6:46 PM Chamikara Jayalath <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> I believe this should be fully working. I'm not familiar with
>>>>>>>> PyBeam though. Is the execution mechanism the same as running a regular
>>>>>>>> Beam pipeline ? Also, note that for multi-language, you need to use a
>>>>>>>> portable Beam runner.
>>>>>>>>
>>>>>>>> +Daniel Collins <[email protected]> who implemented this.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Cham
>>>>>>>>
>>>>>>>> On Thu, Aug 4, 2022 at 11:24 AM Austin Bennett <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Users/Devs,
>>>>>>>>>
>>>>>>>>> Drew, copied, reported having troubles with PubSub Lite:
>>>>>>>>>
>>>>>>>>> "we just weren’t able to get PubSub Lite working with PyBeam. It’s
>>>>>>>>> been a few weeks since we last tried, but we were just trying to use
>>>>>>>>> `apache_beam.io.gcp.pubsublite.ReadFromPubSubLite` (here
>>>>>>>>> <https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.pubsublite.html>
>>>>>>>>> ) in PyBeam and couldn’t get it to import so we just gave up. From the
>>>>>>>>> looks of the repo we couldn’t tell if it was ever actually fully
>>>>>>>>> implemented and published"
>>>>>>>>>
>>>>>>>>> I haven't used myself, and figured others might be able to
>>>>>>>>> comment/share at least if any have had success using and/or at least
>>>>>>>>> whether fully tested/implemented IO ( whether available via
>>>>>>>>> cross-language or 'native' python ).
>>>>>>>>>
>>>>>>>>> Please share any thoughts here.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Austin
>>>>>>>>>
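
For anyone finding this thread later, here is a rough sketch of the parsing step suggested above, where each element is a SequencedMessage and the JSON payload sits in `message.data`. The two `Fake...` classes, the `parse_sequenced_message` helper, and the sample payload are hypothetical stand-ins added only so the snippet runs without a Pub/Sub Lite subscription; they are not part of Beam or the Pub/Sub Lite client.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FakePubSubMessage:
    """Hypothetical stand-in for the inner message; the payload lives in `data`."""
    data: bytes = b""


@dataclass
class FakeSequencedMessage:
    """Hypothetical stand-in for the SequencedMessage type linked in the thread."""
    message: FakePubSubMessage = field(default_factory=FakePubSubMessage)


def parse_sequenced_message(element):
    """Decode the JSON payload from element.message.data (bytes).

    Equivalent to the lambda suggested in the thread:
    lambda x: json.loads(x.message.data)
    """
    return json.loads(element.message.data)


# Made-up sample payload, for local testing only.
sample = FakeSequencedMessage(FakePubSubMessage(b'{"event": "sky", "count": 3}'))
parsed = parse_sequenced_message(sample)
print(parsed)
```

If this matches what the source emits, the 'JSONParse' step in the pipeline above would presumably become `beam.Map(parse_sequenced_message)` in place of `beam.Map(lambda x: json.loads(x))`. A named function also makes it easier to add logging around the parse if nothing shows up in the worker logs.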
