Avro is supported by most Big Data streaming systems (Kafka, Pulsar,
Flink, Beam, etc.), so many companies end up using it. People choose
Avro not only because it structures data with schemas and has a compact
binary representation, but also because of its support for schema
versioning/evolution.

You can find more details about this in Confluent's Avro documentation:
https://docs.confluent.io/platform/current/schema-registry/avro.html
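
To make the versioning/evolution point concrete, here is a minimal sketch
of how a reader with a newer schema can still consume records written with
an older one. This is a toy illustration in plain Python, not the Avro
library: the `resolve` helper, the `User` record and its fields are all
hypothetical, and only the two record rules are modelled (reader fields
missing from the writer take their default, writer fields unknown to the
reader are dropped).

```python
# Toy illustration of Avro-style record schema resolution.
# Hypothetical helper and schemas; this is NOT the Avro library API.

WRITER_V1 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
}

READER_V2 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        # New field: the default lets v2 readers consume v1 data.
        {"name": "email", "type": "string", "default": ""},
    ],
}


def resolve(record, writer_schema, reader_schema):
    """Project a record written with writer_schema onto reader_schema."""
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            # Field known to both sides: keep the written value.
            resolved[name] = record[name]
        elif "default" in field:
            # Field added in the reader schema: fall back to its default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    # Writer fields the reader does not declare (here "name") are dropped.
    return resolved


old_record = {"id": 42, "name": "Ada"}
print(resolve(old_record, WRITER_V1, READER_V2))  # {'id': 42, 'email': ''}
```

A schema registry essentially checks rules like these before a new schema
version is accepted, so producers and consumers can be deployed
independently.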

A good reference that compares the formats and covers the advantages of
Avro, and that you can cite in your paper, is chapter 4 of Martin
Kleppmann's book "Designing Data-Intensive Applications":
https://www.oreilly.com/library/view/designing-data-intensive-applications/9781491903063/ch04.html
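
As a rough illustration of why the binary representation is compact, the
sketch below hand-encodes one small record following the Avro binary
encoding rules as I understand them (a long is zigzag-encoded and written
as a variable-length integer, a string is its byte length as a long
followed by UTF-8 bytes, and a record is just its fields in schema order
with no field names). The helper names and the sample event are
hypothetical, and this is not a replacement for a real Avro library.

```python
import json


def encode_long(n):
    """Zigzag-encode a 64-bit integer, then emit it as a varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes become small numbers
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s):
    """Length-prefixed UTF-8, with the length written as an Avro long."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


# Hypothetical event; the field order is fixed by the (implied) schema.
event = {"sensor_id": 1234, "status": "ok", "reading": 87}

avro_bytes = (
    encode_long(event["sensor_id"])
    + encode_string(event["status"])
    + encode_long(event["reading"])
)
json_bytes = json.dumps(event, separators=(",", ":")).encode("utf-8")

print(len(avro_bytes), "bytes as Avro-style binary")
print(len(json_bytes), "bytes as compact JSON")
```

The field names live in the schema rather than in every message, which is
where most of the saving comes from for small, repetitive records.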

On Wed, Jan 27, 2021 at 10:27 AM Lee Hambley <[email protected]> wrote:
>>
>> Thanks for the extensive response! I think a lot of what you are saying
>> is very spot on!
>
>
> I hope it's useful, if and when your paper is published, I'd love to give it 
> a read.
>
>>
>> I'm working on a paper surveying ~13 binary schema-less and
>> schema-driven serialization formats (including Avro) that can handle any
>> data structure that JSON can represent. Therefore, I was particularly
>> interested in why you wanted to convert JSON Schema to Avro IDL.
>
>
> So GraphQL's IDL (interface definition language) isn't JSON Schema per se, 
> although the responses are often represented as JSON.
>
> For us the use-case is very different, even if Avro and GraphQL's IDLs could 
> _almost_ be losslessly interchanged at some level, they both have a decent 
> type system, they both allow definition of RPC services, one is a great 
> candidate for public APIs (GraphQL has "directives", and really nice 
> annotation and documentation generation tools), and Avro is ideal for our 
> internal APIs. An Avro payload for us runs ~12-30 bytes, where JSON would be 
> at least 2-3x the size (we send a lot of very small messages, very similar 
> ones, so JSON reserializing the keys every time would kill us). So Avro gives 
> us something nothing JSON oriented can. Also, we use Avro as our archival 
> format using the 
> https://avro.apache.org/docs/current/spec.html#Object+Container+Files which I 
> believe is also sort-of unique.
>
>>
>> Is JSON Schema the ubiquitous "contract" language that you are using in
>> your company, so you want to keep it as the source of truth while also
>> being able to work with Avro?
>
>
> It's just public vs. private (or, internal) APIs, and being deliberate about 
> storing those IDL files in separate repositories and training teams to get 
> into the habit of planning and co-designing changes to these ubiquitous 
> contracts before they need to do implementation work, since changing the 
> contract affects everyone. (be that some "near realtime" RPC service that is 
> in the hot path of customer requests  on the web API, or whether that's 
> offline processing by our BI teams who are running reports based on the 
> archived data from the datawarehouse)
>
> The company just went through explosive growth, so, whilst we 
> adopted/inherited Avro as part of adopting JVM/Akka stuff for some parts of 
> the infra, the pivot to nominate these IDLs as the point at which teams have 
> to synchronize and coordinate is still something we are building out.
>
>>
>> On Tue, Jan 26, 2021 at 04:38:18PM +0100, Lee Hambley wrote:
>> > I would say that in general, being around the industry for 15 or so years
>> > now, that there has been a definite uptake in these binary protocols.
>> >
>> > If I had to speculate, I'd say that outside a few niches, ASN.1 and
>> > similar protocols never *really* took-off outside telecoms, which is
>> > regrettable because they are really fantastic protocols (they are used
>> > extensively in certificates, DER/PEM are in the ASN.1 family of things, SSL
>> > certs are all ASN.1 encoded, usually, etc.)
>> >
>> > These days it seems like everyone has some "big data" pipe, and having
>> > Hadoop/Spark/etc has become the must-have thing in most SMEs, so you
>> > inherit some of these things by "accident".
>> >
>> > I personally come from the event-sourcing, CQRS, domain-driven-design
>> > circles, here having a ubiquitous language "contract", preferably a
>> > bullet-proof one with good change management tooling is something that you
>> > explicitly go looking for. In that sphere you come across msgpack,
>> > capnproto, protobufs, thrift, etc which all offer insane performance, very
>> > compact payloads, but Avro is unique in offering something like a schema
>> > registry and concrete guarantees about coordinating rolling deploys
>> > between producers and consumers (note: I _think_ protobufs got something
>> > like a schema registry now, but I never used it)
>> >
>> > Another increasingly good option for this in the "SDL" (schema definition
>> > language) spec space is GraphQL which isn't a _binary_ packing format, but
>> > does offer a standalone schema definition language for defining service
>> > contracts. Whilst Avro does account for RPC protocols
>> > <https://avro.apache.org/docs/current/spec.html#Protocol+Declaration>, I
>> > haven't really seen that used so much in the wild, but maybe that's just my
>> > "bubble" speaking. GraphQL doesn't *really* have the schema migration tools
>> > that Avro has, but at least when dealing with GraphQL payloads, most
>> > language implementations give you the underlying syntax tree for the
>> > payload, so it's a bit easier to see what clients are requesting and what
>> > fields need various levels of scrutiny before being changed.
>> >
>> > Anyway, probably none of this is really interesting to your paper, but I
>> > never miss a good opportunity to share unsolicited opinions :D
>> >
>> > Lee Hambley
>> > http://lee.hambley.name/
>> > +49 (0) 170 298 5667
>> >
>> >
>> > On Tue, 26 Jan 2021 at 16:27, Juan Cruz Viotti <[email protected]> wrote:
>> >
>> > > > I don't mean to make light of your question, just to point out that I
>> > > > don't think many companies are proudly announcing to the world that
>> > > > they use Avro... why would they?
>> > >
>> > > Indeed, I totally agree. I'm writing a research paper involving Apache
>> > > Avro and just wanted to enrich the historical sections a bit with some
>> > > industry usage information!
>> > >
>> > > On Mon, Jan 25, 2021 at 10:40:31PM +0100, Lee Hambley wrote:
>> > > > I work for two companies using Avro (contractor, I won't name them) but I
>> > > > don't know what good it serves anyone knowing that we use them. Would you
>> > > > ask the same question about JSON, or XML, or whether we use nginx or
>> > > > apache?
>> > > >
>> > > > Avro is one of about 5 components in the distributed messaging
>> > > > architectures, and aside from that it is very nicely designed (I believe
>> > > > the schema versioning and rigorously documented canonical forms are an
>> > > > almost unique point of attraction)
>> > > >
>> > > > I don't mean to make light of your question, just to point out that I
>> > > > don't think many companies are proudly announcing to the world that they
>> > > > use Avro... why would they?
>> > > >
>> > > > Lee Hambley
>> > > > http://lee.hambley.name/
>> > > > +49 (0) 170 298 5667
>> > > >
>> > > >
>> > > > On Mon, 25 Jan 2021 at 22:30, M. Manna <[email protected]> wrote:
>> > > >
>> > > > >
>> > > > > I believe Confluent and Imply are the two companies I know of.
>> > > > >
>> > > > >
>> > > > > On Mon, 25 Jan 2021 at 20:28, Juan Cruz Viotti <[email protected]> 
>> > > > > wrote:
>> > > > >
>> > > > >> Hey there!
>> > > > >>
>> > > > >> Do you know where I can find a list of relatively well-known companies
>> > > > >> that make use of Apache Avro? I'm trying to collect a small list for
>> > > > >> research purposes and my search is not yielding many results apart from
>> > > > >> Facebook.
>> > > > >>
>> > > > >> Thanks in advance,
>> > > > >>
>> > > > >> --
>> > > > >> Juan Cruz Viotti
>> > > > >> Software Engineer
>> > > > >> https://www.jviotti.com
>> > > > >>
>> > > > >
>> > >
>> > > --
>> > > Juan Cruz Viotti
>> > > Software Engineer
>> > > https://www.jviotti.com
>> > >
>>
>> --
>> Juan Cruz Viotti
>> Software Engineer
>> https://www.jviotti.com
