Hello Beamer:
We have been aware of some of the shortcomings with reporting, so we made it
part of our batch code to include Beam counters for what we want to track (we
standardized on “elements” and “errors” as the main ones). Then, with the
outcome of
result = p.run()
we call the following function, which stores a JSON representation of the
counters for the Direct Runner, Flink Runner and Dataflow Runner in a somewhat
standardized way. Notice that we added support for doing a POST to an endpoint,
and that the function returns the entries so they can be printed as JSON on
stdout if wanted.
# =============================================================================
#
# Report metrics as a JSON payload to a local path or a remote POST endpoint
#
# =============================================================================
import apache_beam as beam  # needed for beam.metrics.MetricsFilter


def reportMetrics(result,
                  reportTarget=None,
                  counterNames=("errors", "elements")):
    import re
    import json
    import requests

    # Build the JSON structure for a single counter
    def toJsonEntry(metric):
        step = metric.key.step
        # Check to see if it has Flink Runner naming
        m = re.match(r"ref_AppliedPTransform_(.+)_\d+$", step)
        if m is not None:
            step = m.group(1).replace("-", " ")
        # Standardize on a lower-case step name
        step = step.lower().strip()
        return {
            "kind": "counter",
            "step": step,
            "name": metric.key.metric.name,
            "value": metric.committed
        }

    counters = result.metrics().query(beam.metrics.MetricsFilter())['counters']
    metrics = [
        toJsonEntry(counter)
        for counter in counters
        if counter.key.metric.name in counterNames
    ]

    # Report to an external resource
    if reportTarget is not None:
        payload = json.dumps(metrics)
        # If HTTP(S) then POST, otherwise write to a local path
        if reportTarget.startswith("http"):
            requests.post(reportTarget, data=payload)
        else:
            with open(reportTarget, "w") as fd:
                fd.write(payload)

    return metrics
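The step-name handling inside toJsonEntry can be tried on its own, without
running a pipeline. This is only a minimal sketch: `normalize_step` is a
hypothetical helper name, and the sample step names are made up to match the
shape the regex above targets.

```python
import re

# Same pattern as in toJsonEntry: Flink-style names look like
# ref_AppliedPTransform_<label>_<number>.
FLINK_STEP = re.compile(r"ref_AppliedPTransform_(.+)_\d+$")


def normalize_step(step):
    """Return a lower-case step label that compares equal across runners."""
    m = FLINK_STEP.match(step)
    if m is not None:
        # As in the function above, map "-" in the label back to a space.
        step = m.group(1).replace("-", " ")
    return step.lower().strip()


if __name__ == "__main__":
    # Hypothetical step names, made up for this example.
    for name in ["ref_AppliedPTransform_Parse-Rows_12", "Parse Rows"]:
        print(normalize_step(name))
```

Both sample names normalize to the same label, which is what lets us compare
counters between runners.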
It sure would be nice to have a standardized way to get this out of the box.
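In the meantime, whatever receives the payload only has to deal with a flat
list of entries. As a rough sketch of the consuming side (the `summarize`
helper and the sample values are hypothetical, not part of our code), the
counters can be totalled by name across steps:

```python
import json
from collections import defaultdict


def summarize(payload):
    """Sum counter values by counter name across all steps."""
    totals = defaultdict(int)
    for entry in json.loads(payload):
        # Each entry carries the keys written by reportMetrics:
        # "kind", "step", "name" and "value".
        if entry["kind"] == "counter":
            totals[entry["name"]] += entry["value"]
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample payload in the shape reportMetrics writes.
    sample = json.dumps([
        {"kind": "counter", "step": "parse rows", "name": "elements", "value": 100},
        {"kind": "counter", "step": "parse rows", "name": "errors", "value": 3},
        {"kind": "counter", "step": "write rows", "name": "elements", "value": 97},
    ])
    print(summarize(sample))
```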
From: LDesire <[email protected]>
Sent: Friday, March 15, 2024 5:30 AM
To: [email protected]
Subject: [EXTERNAL] [QUESTION] about Metric REST spec
Hello Apache Beam community.
I'm reading the programming guide in the official documentation, and I'm
looking at this section:
<https://beam.apache.org/documentation/programming-guide/#export-metrics>
There it states the following:
"""
As for now only the REST HTTP and the Graphite sinks are supported and only
Flink and Spark runners support metrics export.
"""
Is there any specification for this REST HTTP sink?
Thank you.