How about writing out a lineage record to a separate topic from every
Samza job? Throw a little metadata along with the data such that every
job that touches a piece of data also writes read/wrote records to a
separate tracking topic.
A read/wrote record would look something like:
{orig_id,
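As a rough illustration of the idea, here is a minimal sketch of building such a read/wrote record in Python. All field names beyond orig_id (job, action, topic, ts) are hypothetical, not an established Samza convention:

```python
import json
import time

def lineage_record(orig_id, job_name, action, topic):
    """Build a read/wrote lineage record (field names are hypothetical)."""
    return {
        "orig_id": orig_id,   # identifier carried with the original message
        "job": job_name,      # the Samza job that touched the data
        "action": action,     # "read" or "wrote"
        "topic": topic,       # the stream the data came from / went to
        "ts": int(time.time() * 1000),
    }

# Each job would serialize this and send it to the tracking topic
# alongside its normal output:
payload = json.dumps(lineage_record("msg-123", "enrich-job", "read", "raw-events"))
```

Stitching these records together downstream (grouping by orig_id, ordering by ts) would give a per-message trail of every job that read or wrote it.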
Tim,
I've done something similar where I bury control messages in my
regular data streams (with some silliness of putting a message out per
partition); it gives me eventual consistency for minor logic updates like a
config change. For the larger incompatible stuff I spin a new job up
concurrently
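The per-partition trick above can be sketched as a small fan-out helper: because each Samza task owns a subset of a topic's partitions, a control message must land on every partition for every task to see it. This is a hedged illustration, not the poster's actual code; the envelope shape and function name are made up:

```python
import json

def per_partition_control_messages(num_partitions, payload):
    """Fan one control message out to every partition of a topic,
    so each task (which consumes only some partitions) receives it."""
    envelope = json.dumps({"type": "control", "body": payload})
    return [(p, envelope) for p in range(num_partitions)]

# e.g. with an 8-partition input topic, produce one copy per partition:
msgs = per_partition_control_messages(8, {"action": "reload-config"})
```

Each (partition, message) pair would then be sent with an explicit partition key or partition id on the producer side.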
Yi,
What you just summarized makes a whole lot more sense to me. Shamelessly,
I am looking at this shift as a customer with a production workflow riding
on it, so I am looking for some kind of consistency in the future of
Samza. This makes me feel a lot better about it.
Thank you!
On Sun, Ju
Hey all, just want to chime in before it's too late. I've been following Samza
for a long time, and have been using it in production for the past 6 months or so.
In no particular order, the things I like most about Samza are:
- YARN support; the resiliency of my deployment is paramount. This is why I
use Samza ove
It must have been junk data. I started using a new topic for everything
(metrics/source/dest) and data is flowing fine now. When I switch it back
to the other topics I get the hung behavior with nothing erroring out in
the logs.
On Wed, Jun 3, 2015 at 10:10 AM, Garrett Barton
wrote:
> Tha
If you can check
> > the Samza log and show us, we will be more helpful. Also, maybe pasting
> the
> > config here (if you don't mind), we can help see if you missed something.
> >
> > Thanks,
> >
> > Fang, Yan
> > yanfang...@gmail.com
> >
Greetings all,
I am trying to translate an existing workflow from MR into Samza. Thus far
everything is coded and the kinks with deployment have been worked out. My task
deploys into YARN (2.6.0) and consumes records from Kafka (0.8.2.1) fine, but
I get no data from metrics, and my output streams are showing