Do you see anything under the "kubelet" log? For example, that might show
errors related to container startup. (You have to specifically select it to
see those logs.)
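
If clicking through the Console gets tedious, one option is to pull those
worker log streams with the Cloud Logging client library. This is only a rough
sketch (the project ID and job ID below are placeholders, and the exact log
names available depend on your job):

    # Rough sketch: list recent "kubelet" entries for a Dataflow job using the
    # google-cloud-logging client. Swap in your own project ID and job ID.
    from google.cloud import logging as cloud_logging

    client = cloud_logging.Client(project="your-project-id")  # placeholder
    log_filter = (
        'resource.type="dataflow_step" '
        'AND resource.labels.job_id="YOUR_DATAFLOW_JOB_ID" '  # placeholder
        'AND log_id("dataflow.googleapis.com/kubelet")'
    )
    for entry in client.list_entries(filter_=log_filter):
        print(entry.timestamp, entry.payload)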

If you are actually not seeing any errors, you might have to go
through Google Cloud support so that they can look at your specific
Dataflow job.

On Thu, Aug 25, 2022 at 5:57 PM Drew Forbes <[email protected]>
wrote:

> The attached screenshot is all I get from the Worker Logs at debug level. The
> job logs all look pretty normal; it even has an "Autoscaling: Reduced the
> number of workers to 0 based on low average worker CPU utilization, and the
> pipeline having sufficiently low backlog and keeping up with input rate."
> message, which seems to me like it's just not really reading things correctly.
>
> On Thu, Aug 25, 2022 at 8:44 PM Chamikara Jayalath <[email protected]>
> wrote:
>
>> Do you see any errors in the Dataflow Cloud Console logs?
>> Note that you have to click the Worker Logs tab and select all "log
>> names" under Dataflow to see all logs.
>>
>> I suspect your job might not be starting up properly, but it's hard to say
>> without looking at the details.
>>
>> On Thu, Aug 25, 2022 at 5:36 PM Drew Forbes <
>> [email protected]> wrote:
>>
>>> Yeah, I'm not sure; I've tried a couple of different ways to run it and
>>> still no luck. I'm able to read from that subscription with a Java Beam job
>>> (which is likely what you were seeing in the internal metrics), but I just
>>> can't get the Python job to do anything.
>>>
>>> Unfortunately my Java skills have atrophied since college, and I'm
>>> having a lot of trouble developing within the Beam paradigm in Java, so
>>> we're going to try moving in a different direction for now.
>>>
>>> On Wed, Aug 24, 2022 at 11:53 AM Daniel Collins <[email protected]>
>>> wrote:
>>>
>>>> Hello Drew,
>>>>
>>>> The object type is the SequencedMessage type here
>>>> https://github.com/googleapis/python-pubsublite/blob/b77cf6ddeaae4e950ed069b652a22a1fc79f74ea/google/cloud/pubsublite_v1/types/common.py#L109,
>>>> so the correct lambda is likely `lambda x: json.loads(x.message.data)`
>>>>
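>>>> In pipeline terms, the parse step would then look roughly like this (an
>>>> untested sketch; only the lambda changes relative to your snippet):
>>>>
>>>>     import json
>>>>     import apache_beam as beam
>>>>
>>>>     # Elements emitted by ReadFromPubSubLite are SequencedMessages, so
>>>>     # the JSON bytes live on .message.data rather than on the element
>>>>     # itself.
>>>>     parse_json = beam.Map(lambda x: json.loads(x.message.data))
>>>>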
>>>> It does appear from internal metrics that your client is reading data
>>>> from that subscription. If your json.loads call fails, I'm unsure why this
>>>> wouldn't surface as an error in the runtime.
>>>>
>>>> -Daniel
>>>>
>>>> On Tuesday, August 23, 2022 at 2:52:54 PM UTC-4 Drew Forbes wrote:
>>>>
>>>>> Hey all, thank you for these updates. We ran into some other issues
>>>>> with our Spark / PubSub Lite pipeline, so I've had time to re-evaluate
>>>>> Beam with PubSubLite. I was able to get the package importing correctly
>>>>> and the job to build using "from apache_beam.io.gcp.pubsublite import *".
>>>>> Not sure why that worked when the other permutations of that import
>>>>> failed, but I'll take it.
>>>>>
>>>>> At risk of turning this into a troubleshooting thread (please feel
>>>>> free to turn me away if that's not something y'all have interest in), I'm
>>>>> going to ask if you've got any ideas on why this pipeline isn't actually
>>>>> reading from PSL. I'm submitting jobs either to DirectRunner or
>>>>> DataflowRunner and they are spinning up and staying up without error, but
>>>>> they're not actually reading anything from the PSL subscription and thus
>>>>> not doing any work. I've got several Java Beam jobs that read correctly
>>>>> from PSL subscriptions as well as a test Python Beam job that can read 
>>>>> from
>>>>> regular PubSub, but I'm not sure what's happening here. There aren't any
>>>>> logs to go off of either.
>>>>>
>>>>> Below is the pipeline code. I actually expect this to fail, since I'm
>>>>> not sure how to do the parsing; this is the same code I used with
>>>>> ReadFromPubSub, except with Lite. But I'm not even getting errors yet,
>>>>> because it's not reading from the subscription. Any rough ideas I can try?
>>>>>
>>>>>>     with beam.Pipeline(options=pipeline_options) as p:
>>>>>>         p | 'Read From PubSubLite' >> ReadFromPubSubLite(
>>>>>>             subscription_path='projects/starwatch/locations/us-west1-a/subscriptions/sw-test-sky-events-timescale'
>>>>>>         ) | 'JSONParse' >> beam.Map(lambda x: json.loads(x)
>>>>>>         ) | 'Extract JSON Columns' >> beam.Map(extract_json_columns
>>>>>>         ) | 'To string' >> beam.ToString.Element() | beam.Map(print
>>>>>>         ) | 'Writing to DB' >> relational_db.Write(
>>>>>>             source_config=source_config,
>>>>>>             table_config=table_config
>>>>>>         )
>>>>>>
>>>>>
>>>>> On Fri, Aug 5, 2022 at 1:11 AM Austin Bennett <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> @cham thanks for bringing the conversation back to the list ( esp.
>>>>>> for anyone else searching/wondering in the future )!
>>>>>>
>>>>>> From what I understand, as a summary: Python should be able to call the
>>>>>> [Java] PubSubLite IO via X-Lang for use with any underlying runner (well,
>>>>>> any that utilizes the portable runner, e.g. Spark, Flink, DataflowV2,
>>>>>> etc.).
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 4, 2022 at 5:49 PM Chamikara Jayalath via user <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 4, 2022 at 5:29 PM Daniel Collins <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello Drew,
>>>>>>>>
>>>>>>>> > I upgraded to apache-beam 2.40.0 and tried to access
>>>>>>>> apache_beam.io.gcp.pubsublite.ReadFromPubSubLite
>>>>>>>>
>>>>>>>> You should make sure to import `apache_beam.io.gcp.pubsublite.*`. I
>>>>>>>> have no idea why the specific import isn't working, but that should
>>>>>>>> work. If it's not, I'll look into it more.
>>>>>>>>
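>>>>>>>> For example, something along these lines (a minimal sketch against
>>>>>>>> apache-beam 2.40.0; the subscription path is a placeholder):
>>>>>>>>
>>>>>>>>     # Wildcard import brings ReadFromPubSubLite into scope.
>>>>>>>>     from apache_beam.io.gcp.pubsublite import *
>>>>>>>>
>>>>>>>>     read = ReadFromPubSubLite(
>>>>>>>>         subscription_path=(
>>>>>>>>             'projects/PROJECT/locations/LOCATION/subscriptions/SUB'
>>>>>>>>         )
>>>>>>>>     )
>>>>>>>>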
>>>>>>>> > writing native Spark code to pull from PubSub Lite
>>>>>>>>
>>>>>>>> Note that we have a native Spark source you can use. I'm unsure
>>>>>>>> whether Spark works with Beam Python, however; Chamikara would know
>>>>>>>> that better.
>>>>>>>> https://github.com/googleapis/java-pubsublite-spark
>>>>>>>>
>>>>>>>
>>>>>>> It should be supported. See instructions here under "Portable
>>>>>>> (Java/Python/Go)":
>>>>>>> https://beam.apache.org/documentation/runners/spark/
>>>>>>>
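>>>>>>> For instance, the Python side would point at a Spark job server with
>>>>>>> options roughly like these (a sketch; the endpoint depends on how you
>>>>>>> start the job server, and LOOPBACK only suits local testing):
>>>>>>>
>>>>>>>     # Portable Spark runner options for a Beam Python pipeline; the
>>>>>>>     # job_endpoint must match the running Spark job server.
>>>>>>>     from apache_beam.options.pipeline_options import PipelineOptions
>>>>>>>
>>>>>>>     pipeline_options = PipelineOptions([
>>>>>>>         '--runner=PortableRunner',
>>>>>>>         '--job_endpoint=localhost:8099',
>>>>>>>         '--environment_type=LOOPBACK',
>>>>>>>     ])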
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -Daniel
>>>>>>>>
>>>>>>>> On Thu, Aug 4, 2022 at 7:48 PM Drew Forbes <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> I've actually not used PyBeam; I just meant writing Beam code with
>>>>>>>>> Python. I didn't realize there was a whole separate PyBeam package.
>>>>>>>>>
>>>>>>>>
>>>>>>> Thanks for clarifying.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Cham
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> I feel dumb asking, but basically we just couldn't get the import
>>>>>>>>> to work. I upgraded to apache-beam 2.40.0 and tried to access
>>>>>>>>> apache_beam.io.gcp.pubsublite.ReadFromPubSubLite through various
>>>>>>>>> means (regular import, proto_api, something like .external., etc.)
>>>>>>>>> within Python and determined that there just wasn't anything to
>>>>>>>>> access. We could definitely have been wrong about that, but it wasn't
>>>>>>>>> clear how to move forward, so we just switched our focus to writing
>>>>>>>>> native Spark code to pull from PubSub Lite.
>>>>>>>>>
>>>>>>>>> On Thu, Aug 4, 2022 at 6:46 PM Chamikara Jayalath <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I believe this should be fully working. I'm not familiar with
>>>>>>>>>> PyBeam though. Is the execution mechanism the same as running a
>>>>>>>>>> regular Beam pipeline? Also, note that for multi-language, you need
>>>>>>>>>> to use a portable Beam runner.
>>>>>>>>>>
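>>>>>>>>>> For example, a run on Dataflow (which satisfies the portable-runner
>>>>>>>>>> requirement via Runner v2) would be configured roughly like this
>>>>>>>>>> sketch; the project, region, and bucket are placeholders:
>>>>>>>>>>
>>>>>>>>>>     from apache_beam.options.pipeline_options import PipelineOptions
>>>>>>>>>>
>>>>>>>>>>     options = PipelineOptions([
>>>>>>>>>>         '--runner=DataflowRunner',
>>>>>>>>>>         '--project=your-project-id',             # placeholder
>>>>>>>>>>         '--region=us-west1',                     # placeholder
>>>>>>>>>>         '--temp_location=gs://your-bucket/tmp',  # placeholder
>>>>>>>>>>     ])
>>>>>>>>>>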
>>>>>>>>>> +Daniel Collins <[email protected]> who implemented this.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Cham
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 4, 2022 at 11:24 AM Austin Bennett <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Users/Devs,
>>>>>>>>>>>
>>>>>>>>>>> Drew, copied here, reported having trouble with PubSub Lite:
>>>>>>>>>>>
>>>>>>>>>>> "we just weren’t able to get PubSub Lite working with PyBeam.
>>>>>>>>>>> It’s been a few weeks since we last tried, but we were just trying 
>>>>>>>>>>> to use
>>>>>>>>>>> `apache_beam.io.gcp.pubsublite.ReadFromPubSubLite` (here
>>>>>>>>>>> <https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.pubsublite.html>
>>>>>>>>>>> ) in PyBeam and couldn’t get it to import so we just gave up. From 
>>>>>>>>>>> the
>>>>>>>>>>> looks of the repo we couldn’t tell if it was ever actually fully
>>>>>>>>>>> implemented and published"
>>>>>>>>>>>
>>>>>>>>>>> I haven't used it myself, and figured others might be able to
>>>>>>>>>>> comment/share whether any of you have had success with it, and/or
>>>>>>>>>>> whether the IO is fully tested/implemented (whether available via
>>>>>>>>>>> cross-language or 'native' Python).
>>>>>>>>>>>
>>>>>>>>>>> Please share any thoughts here.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Austin
>>>>>>>>>>>
>>>>>>>>>>>
