Re: Thinking about message formats

2014-02-13 Thread Steve Yates
It's not an easy problem to solve; however, we chose Protocol Buffers in the past because schema evolution and maintaining compatibility with older messages were supported, as long as you appended the additional attributes at the bottom of the schema definition. -S

Re: Review Request 17875: SAMZA-30

2014-02-13 Thread Steve Yates
Thanks Jakob, allow me to correct this and merge master into my branch, etc. -S

Re: Review Request 17875: SAMZA-30

2014-02-13 Thread Jakob Homan
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17875/#review34384 --- samza-kafka/src/main/scala/org/apache/samza/system/kafka/BrokerProx

Re: Thinking about message formats

2014-02-13 Thread Martin Kleppmann
Hi Garry, Yes, that issue often trips up people who are new to Avro. It's also the main way Avro differs from Thrift and Protocol Buffers, which use a different approach to schema evolution. This blog post may be helpful for understanding the different approaches: http://martin.kleppmann.com/2012/12

Re: Review Request 18088: SAMZA-146

2014-02-13 Thread Chris Riccomini
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18088/#review34385 --- Ship it! +1 Some white space nits, but other than that, looks great

Review Request 18088: SAMZA-146

2014-02-13 Thread Jakob Homan
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18088/ --- Review request for samza. Bugs: SAMZA-146 https://issues.apache.org/jira/br

Re: Thinking about message formats

2014-02-13 Thread Jakob Homan
Yeah, schema evolution for never-ending jobs has been a bit annoying so far. So long as the Samza job has the latest version of the schema for producing and the messages arrive as generic records with their schema intact, one can transcode the newly arrived messages to the latest schema. This is

RE: Thinking about message formats

2014-02-13 Thread Garry Turkington
Hi Martin, Thanks for the input, and don't worry, it did make sense. :) Though it did identify a hole in my understanding of Avro; I think I've been spoiled using container files that included the schema. I thought that a reader could process an Avro message with a previous version of the schem

Re: Thinking about message formats

2014-02-13 Thread Martin Kleppmann
Hi Garry, We use Avro a lot, and it works well with Samza. Schema evolution is a very good thing to have in your toolbox. One thing to keep in mind with Avro: in order to parse a message, you need to know the exact schema with which the data was written. You may have multiple different producers

Thinking about message formats

2014-02-13 Thread Garry Turkington
Hi, I was thinking about how best to do testing on Samza jobs. The ability to replay streams appears to help a lot here: by pushing some data into the consumed streams and then rewinding, it is always possible to get the same data fed through the tasks. So that helps a lot in terms of dealing with

RE: Using bootstrap streams

2014-02-13 Thread Garry Turkington
Hi Chris, Woot! Yes, bootstrapped streams look like they are working properly with the SAMZA-145 patch. Thanks for the quick fix, appreciated. Setting the default offset to smallest at the system level is a bit of a kludge for me; I'll indeed need to set up separate system definitions to have diff