Do you see any errors in Dataflow Cloud Console logs ?
Note that you have to click the Worker Logs tab and select all "log names"
under Dataflow to see all logs.

I suspect your job might not be starting up properly but hard to say
without looking at details.

On Thu, Aug 25, 2022 at 5:36 PM Drew Forbes <[email protected]>
wrote:

> Yeah I'm not sure, I've tried a couple different ways to run it and still
> no luck. I'm able to read from that subscription with a Java beam job
> (which is likely what you were seeing in the internal metrics) but I just
> can't get the python job to do anything.
>
> Unfortunately my Java skills have atrophied since college, and I'm having
> a lot of trouble developing within the Beam paradigm in Java, so we're
> going to try moving in a different direction for now.
>
> On Wed, Aug 24, 2022 at 11:53 AM Daniel Collins <[email protected]>
> wrote:
>
>> Hello Drew,
>>
>> The object type is the SequencedMessage type here
>> https://github.com/googleapis/python-pubsublite/blob/b77cf6ddeaae4e950ed069b652a22a1fc79f74ea/google/cloud/pubsublite_v1/types/common.py#L109,
>> so the correct lambda is likely `lambda x: json.loads(x.message.data)`
>>
>> It does appear from internal metrics that your client is reading data
>> from that subscription. If your json.loads call fails, I'm unsure why this
>> wouldn't surface as an error in the runtime.
>>
>> -Daniel
>>
>> On Tuesday, August 23, 2022 at 2:52:54 PM UTC-4 Drew Forbes wrote:
>>
>>> Hey all, thank you for these updates. We ran into some other issues with
>>> our Spark - PubSubLite pipeline so I've had time to re-evaluate Beam with
>>> PubSubLite. I was able to get the package importing correctly and the job
>>> to build using "from apache_beam.io.gcp.pubsublite import *". Not sure why
>>> that worked when the other permutations on that failed, but I'll take it.
>>>
>>> At risk of turning this into a troubleshooting thread (please feel free
>>> to turn me away if that's not something y'all have interest in), I'm going
>>> to ask if you've got any ideas on why this pipeline isn't actually reading
>>> from PSL. I'm submitting jobs either to DirectRunner or DataflowRunner and
>>> they are spinning up and staying up without error, but they're not actually
>>> reading anything from the PSL subscription and thus not doing any work.
>>> I've got several Java Beam jobs that read correctly from PSL subscriptions
>>> as well as a test Python Beam job that can read from regular PubSub, but
>>> I'm not sure what's happening here. There aren't any logs to go off of
>>> either.
>>>
>>> Below is the pipeline code, I actually expect this to fail since I'm not
>>> sure how to do the parsing, this is the same code I used with
>>> ReadFromPubSub except for with Lite. But I'm not even getting errors yet
>>> because it's not reading from the Subscription. Any rough ideas I can try?
>>>
>>>     with beam.Pipeline(options=pipeline_options) as p:
>>>>
>>>>         p | 'Read From PubSubLite' >> ReadFromPubSubLite(
>>>>
>>>> subscription_path='projects/starwatch/locations/us-west1-a/subscriptions/sw-test-sky-events-timescale'
>>>>         ) | 'JSONParse' >> beam.Map(lambda x: json.loads(x)
>>>>         ) | 'Extract JSON Columns' >> beam.Map(extract_json_columns
>>>>         ) | 'To string' >> beam.ToString.Element() | beam.Map(print
>>>>         ) | 'Writing to DB' >> relational_db.Write(
>>>>             source_config=source_config,
>>>>             table_config=table_config
>>>>         )
>>>>
>>>
>>> On Fri, Aug 5, 2022 at 1:11 AM Austin Bennett <
>>> [email protected]> wrote:
>>>
>>>> @cham thanks for bringing the conversation back to the list ( esp. for
>>>> anyone else searching/wondering in the future )!
>>>>
>>>> From what I understand/summary:  Python should be able to call via
>>>> X-Lang the [ Java ] PubSubLite IO for use with any underlying runner (
>>>> well, that utilizes portable runner, ex: Spark, Flink, DataflowV2, etc  )
>>>>
>>>>
>>>>
>>>> On Thu, Aug 4, 2022 at 5:49 PM Chamikara Jayalath via user <
>>>> [email protected]> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Aug 4, 2022 at 5:29 PM Daniel Collins <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hello Drew,
>>>>>>
>>>>>> > I upgraded to apache-beam 2.40.0 and tried to access
>>>>>> apache_beam.io.gcp.pubsublite.ReadFromPubSubLite
>>>>>>
>>>>>> You should ensure to import `apache_beam.io.gcp.pubsublite.*`. I have
>>>>>> no idea why the specific import isn't working- but that should work. If
>>>>>> its not, I'll look into it more.
>>>>>>
>>>>>> > writing native Spark code to pull from PubSub Lite
>>>>>>
>>>>>> Note that we have a spark native source you can use. I'm unsure if
>>>>>> spark works with beam python however, Chamikara would know that better.
>>>>>> https://github.com/googleapis/java-pubsublite-spark
>>>>>>
>>>>>
>>>>> It should be supported. See instructions here under "Portable
>>>>> (Java/Python/Go)":
>>>>> https://beam.apache.org/documentation/runners/spark/
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> -Daniel
>>>>>>
>>>>>> On Thu, Aug 4, 2022 at 7:48 PM Drew Forbes <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> I've actually not used PyBeam, I just meant writing Beam code with
>>>>>>> Python. Didn't realize there was a whole separate PyBeam package.
>>>>>>>
>>>>>>
>>>>> Thanks for clarifying.
>>>>>
>>>>> Thanks,
>>>>> Cham
>>>>>
>>>>>
>>>>>>
>>>>>>> I feel dumb asking, but basically we just couldn't get the import to
>>>>>>> work. I upgraded to apache-beam 2.40.0 and tried to access
>>>>>>> apache_beam.io.gcp.pubsublite.ReadFromPubSubLite through various
>>>>>>> means (regular import, proto_api, something like .external., etc) within
>>>>>>> Python and determined that there just wasn't anything to access. We 
>>>>>>> could
>>>>>>> definitely have been wrong about that but it wasn't clear how to move
>>>>>>> forward so we just switched our focus to writing native Spark code to 
>>>>>>> pull
>>>>>>> from PubSub Lite
>>>>>>>
>>>>>>> On Thu, Aug 4, 2022 at 6:46 PM Chamikara Jayalath <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> I believe this should be fully working. I'm not familiar with
>>>>>>>> PyBeam though. Is the execution mechanism the same as running a regular
>>>>>>>> Beam pipeline ? Also, note that for multi-language, you need to use a
>>>>>>>> portable Beam runner.
>>>>>>>>
>>>>>>>> +Daniel Collins <[email protected]> who implemented this.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Cham
>>>>>>>>
>>>>>>>> On Thu, Aug 4, 2022 at 11:24 AM Austin Bennett <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Users/Devs,
>>>>>>>>>
>>>>>>>>> Drew, copied, reported having troubles with PubSub Lite:
>>>>>>>>>
>>>>>>>>> "we just weren’t able to get PubSub Lite working with PyBeam. It’s
>>>>>>>>> been a few weeks since we last tried, but we were just trying to use
>>>>>>>>> `apache_beam.io.gcp.pubsublite.ReadFromPubSubLite` (here
>>>>>>>>> <https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.pubsublite.html>
>>>>>>>>> ) in PyBeam and couldn’t get it to import so we just gave up. From the
>>>>>>>>> looks of the repo we couldn’t tell if it was ever actually fully
>>>>>>>>> implemented and published"
>>>>>>>>>
>>>>>>>>> I haven't used myself, and figured others might be able to
>>>>>>>>> comment/share at least if any have had success using and/or at least
>>>>>>>>> whether fully tested/implemented IO ( whether available via 
>>>>>>>>> cross-language
>>>>>>>>> or 'native' python ).
>>>>>>>>>
>>>>>>>>> Please share any thoughts here.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Austin
>>>>>>>>>
>>>>>>>>>

Reply via email to