Re: Radical idea for 1.9.0
> Perhaps I don’t understand you. What is “a recursive record”? AFAIU real data can not be recursive. (I would love to be shown to be wrong about this.)

Real data is very recursive; consider the canonical definitions for List/Tree, and anything with cons/cdr: https://en.wikipedia.org/wiki/Cons

JSON is a recursive record. At any point, a Json object contains a map from a string to a Json object. The recursion is there for most AST-like descriptions. Denormalizing this data means that you'd insert parent/child pointers and emit multiple messages, but this kinda sucks, since you'd end up writing something like a WITH RECURSIVE SQL query or similar to be able to unmelt the data back into its recursive form.

Incidentally, with the stuff in AVRO-530, you can write a transform to take a true recursive type and turn it into a sorta-recursive Avro type. See my issue here: https://github.com/sksamuel/avro4s/issues/307 since generalized to arbitrary fixpoint types.

Instead of straight indexing, to index lower than the toplevel the thing to use would be an F-Algebra/visitor pattern, as well as binary serialization. Most data that's not recursive would simply apply the F-Algebra once, but in the case of recursion, you'd apply it multiple times to generate the binary serialization, and annotate with something like a coelgot algebra to give you a toplevel schema.

Between AVRO-530 + AVRO-248, the sketch of what to do is already there. I'm confident I could do this in Haskell/Scala, but I don't know Java so can't contribute to Avro.

--
Sent from: http://apache-avro.679487.n3.nabble.com/Avro-Developers-f679485.html
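For illustration, the denormalization described in this message might look like the following minimal Python sketch. The tree shape, the row layout, and the function names `flatten` and `rebuild` are assumptions made for this example; none of them come from Avro or from the thread.

```python
from typing import Dict, List, Optional, Tuple

Node = dict  # hypothetical shape: {"name": str, "children": [Node, ...]}
Row = Tuple[int, Optional[int], str]  # hypothetical row: (id, parent_id, name)


def flatten(root: Node) -> List[Row]:
    """Denormalize a recursive tree into flat rows with parent pointers."""
    rows: List[Row] = []

    def visit(node: Node, parent_id: Optional[int]) -> None:
        node_id = len(rows)  # ids are assigned in pre-order
        rows.append((node_id, parent_id, node["name"]))
        for child in node["children"]:
            visit(child, node_id)

    visit(root, None)
    return rows


def rebuild(rows: List[Row]) -> Node:
    """Reassemble the tree, the job a WITH RECURSIVE query would otherwise do."""
    nodes: Dict[int, Node] = {}
    root: Optional[Node] = None
    for node_id, parent_id, name in rows:  # pre-order: parents come first
        node = {"name": name, "children": []}
        nodes[node_id] = node
        if parent_id is None:
            root = node
        else:
            nodes[parent_id]["children"].append(node)
    if root is None:
        raise ValueError("no root row found")
    return root


tree = {"name": "a", "children": [
    {"name": "b", "children": [{"name": "d", "children": []}]},
    {"name": "c", "children": []},
]}
rows = flatten(tree)
```

Every consumer of the flat rows needs something like `rebuild` before it can treat the data as a tree again, which is the overhead the message objects to.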
Re: Radical idea for 1.9.0
(Thanks for the [Discuss] tip.)

Recursion is primarily a code-maintenance problem. I'm not sure how to quantify the complexity, but certainly Schema.java itself has a lot of logic in it to deal with recursion, as do all the "Grammar Generator" classes plus Generic Data -- the classes that are performance sensitive.

I don't think that recursion is inherently expensive performance-wise, but by making the encoding/resolution/decoding logic unnecessarily complicated, it makes it difficult to implement more aggressive strategies for higher performance (e.g., dynamic code generation). On a similar note, the complexity of recursion could make it hard to add new features to Avro.

I've yet to see compelling uses of recursion surface in this thread. Perhaps we deprecate recursion in 1.9, with the goal of eliminating it in 1.10? (Specifically, we write error messages to stderr when we parse recursive types -- with a flag to silence those messages in case they get in someone's way.) If this deprecation creates howls of complaint because the feature is more useful than this thread seems to suggest, then we can keep it in.

On Tue, Dec 4, 2018 at 9:25 AM Sean Busbey wrote:
> In the future please use "[DISCUSS]" at the start of your subject line for these kinds of proposals. that'll get more folks to see the discussion, e.g. when they filter this list.
>
> as a point of clarification, 1.9.0 is a major version change for the Avro project. the "1" is a file format version. that's why API incompatibilities are allowed in a new 1.y version. As Doug mentioned, "would not be able to read binary files" means you're talking about a file format incompatibility, which I don't think we should do.
>
> Are you interested in removing this feature to improve code maintenance or to improve performance? both?
>
> Can you quantify the amount of complexity you're referring to?
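As a rough illustration of the deprecation idea in this message, the sketch below detects a self-referencing record in an already-parsed schema and warns on stderr. It is not Avro's parser: it works on plain Python dicts, ignores namespaces and aliases, and the environment variable `AVRO_SILENCE_RECURSION_WARNING` is a made-up name standing in for "a flag to silence those messages".

```python
import os
import sys
from typing import Any, Tuple


def is_recursive(schema: Any, enclosing: Tuple[str, ...] = ()) -> bool:
    """Return True if a record refers, by name, to a record that encloses it."""
    if isinstance(schema, str):
        # A bare name is recursive only if it names a record we are inside of.
        return schema in enclosing
    if isinstance(schema, list):  # union: check every branch
        return any(is_recursive(branch, enclosing) for branch in schema)
    if isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            inner = enclosing + (schema["name"],)
            return any(is_recursive(field["type"], inner)
                       for field in schema.get("fields", []))
        if kind == "array":
            return is_recursive(schema["items"], enclosing)
        if kind == "map":
            return is_recursive(schema["values"], enclosing)
        return is_recursive(kind, enclosing)  # e.g. {"type": "string"}
    return False


def warn_if_recursive(schema: Any) -> bool:
    """Print a deprecation warning to stderr unless the (hypothetical) flag is set."""
    recursive = is_recursive(schema)
    if recursive and not os.environ.get("AVRO_SILENCE_RECURSION_WARNING"):
        print("warning: recursive schemas are deprecated", file=sys.stderr)
    return recursive


long_list = {
    "type": "record", "name": "LongList",
    "fields": [
        {"name": "value", "type": "long"},
        {"name": "next", "type": ["null", "LongList"]},
    ],
}
flat = {
    "type": "record", "name": "Point",
    "fields": [{"name": "x", "type": "long"}, {"name": "y", "type": "long"}],
}
```

Tracking only the enclosing records matters here: a field that reuses a previously defined sibling type by name is an ordinary reference, not recursion.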
Re: Radical idea for 1.9.0
In the future please use "[DISCUSS]" at the start of your subject line for these kinds of proposals. that'll get more folks to see the discussion, e.g. when they filter this list.

as a point of clarification, 1.9.0 is a major version change for the Avro project. the "1" is a file format version. that's why API incompatibilities are allowed in a new 1.y version. As Doug mentioned, "would not be able to read binary files" means you're talking about a file format incompatibility, which I don't think we should do.

Are you interested in removing this feature to improve code maintenance or to improve performance? both?

Can you quantify the amount of complexity you're referring to?

On Fri, Nov 30, 2018 at 12:22 PM Raymie Stata wrote:
> I understand we've been willing to introduce backward-incompatible API changes (not file-format changes) into minor release versions. If so, here's an idea for consideration:
>
> Let's eliminate recursive records from Avro 1.9.x. Recursion introduces a _lot_ of complexity into many parts of the Avro code base. We could vastly simplify the code base, and probably speed things up, by getting rid of this feature.
>
> The specific proposal would be that Avro 1.9.x would refuse to accept recursive records, and thus would not be able to read binary files written by older versions of Avro. I haven't heard of anyone actually using them, so I don't think this would be a problem.

--
busbey
Re: Radical idea for 1.9.0
> using Avro to store millions to billions of non-recursive records

Perhaps I don’t understand you. What is “a recursive record”? AFAIU real data can not be recursive. (I would love to be shown to be wrong about this.)

Given any real data, proportional coffee, sandwiches and time, I can write an avro schema for that data without using recursion. However, for some real data, writing such a schema would be onerous, repetitive and error prone. Updating that schema would be likewise a chore.

So, do we want to make the avro codebase easier to support, or do we want to make (some not-insignificant fraction of) avro schema easier to write? I’m in the latter camp.

-Michael Smith

On Sat, Dec 1, 2018 at 06:29 Raymie Stata wrote:
> (Keep the following in mind: perhaps 95%+ of Avro users do not depend on recursion but don't understand the opportunity costs of maintaining it (and thus won't speak up on this thread); the remaining 5% who depend on recursion are highly motivated to speak out against its removal. In such a world, I'm speaking for 95% of the Avro community :-)
>
> So far, the main justification of retaining recursion in Avro seems to be that it allows Avro to be a binary representation of JSON using the schema "share/schemas/org/apache/avro/data/Json.avsc". This justification is a bit odd. The founding philosophy of Avro is that data should have schemas (and those schemas should be able to evolve). The Avro-as-binary-rep-for-JSON argument is really the following: "recursion in Avro is good because it allows us to model JSON which allows us to model data with no schema." Grumble.
>
> But let's leave aside philosophical arguments regarding static vs dynamic typing. Let's consider communities. Json.avsc was committed in 2011 and hasn't changed since. Clients who are using Avro as a binary representation of JSON can continue to depend on the 1.8.x line of Avro. If that community of users is big-enough/mission-critical enough to maintain 1.8.x going forward, then all power to them. 1.8.x can live forever.
>
> In the meantime, for the 95% of the Avro community that is using Avro to store millions to billions of non-recursive records, and who are tired of putting up with the (opportunity) costs of supporting recursion that they never use, let's move on.
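The trade-off raised in this reply (a non-recursive schema can always be written, but it is repetitive) can be made concrete with a small sketch. The `Node` tree schema and the helper `unrolled_tree_schema` are hypothetical examples, not anything shipped with Avro; the helper unrolls the recursion to a fixed maximum depth, giving each level its own uniquely named record.

```python
import json
from typing import Any, Dict

# Recursive form: one record definition covers trees of any depth.
recursive_schema: Dict[str, Any] = {
    "type": "record", "name": "Node",
    "fields": [
        {"name": "label", "type": "string"},
        {"name": "children", "type": {"type": "array", "items": "Node"}},
    ],
}


def unrolled_tree_schema(depth: int) -> Dict[str, Any]:
    """Build a non-recursive schema for trees of at most `depth` levels."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # Innermost level: a leaf record that cannot have children.
    schema: Dict[str, Any] = {
        "type": "record", "name": f"Node{depth}",
        "fields": [{"name": "label", "type": "string"}],
    }
    # Wrap outward, one distinct record definition per level.
    for level in range(depth - 1, 0, -1):
        schema = {
            "type": "record", "name": f"Node{level}",
            "fields": [
                {"name": "label", "type": "string"},
                {"name": "children",
                 "type": {"type": "array", "items": schema}},
            ],
        }
    return schema


unrolled = unrolled_tree_schema(3)
record_count = json.dumps(unrolled).count('"record"')
```

The unrolled schema grows by one record definition per supported level and silently caps the depth of the data, and any change to the node's fields has to be repeated at every level.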
Re: Radical idea for 1.9.0
(Keep the following in mind: perhaps 95%+ of Avro users do not depend on recursion but don't understand the opportunity costs of maintaining it (and thus won't speak up on this thread); the remaining 5% who depend on recursion are highly motivated to speak out against its removal. In such a world, I'm speaking for 95% of the Avro community :-)

So far, the main justification of retaining recursion in Avro seems to be that it allows Avro to be a binary representation of JSON using the schema "share/schemas/org/apache/avro/data/Json.avsc". This justification is a bit odd. The founding philosophy of Avro is that data should have schemas (and those schemas should be able to evolve). The Avro-as-binary-rep-for-JSON argument is really the following: "recursion in Avro is good because it allows us to model JSON which allows us to model data with no schema." Grumble.

But let's leave aside philosophical arguments regarding static vs dynamic typing. Let's consider communities. Json.avsc was committed in 2011 and hasn't changed since. Clients who are using Avro as a binary representation of JSON can continue to depend on the 1.8.x line of Avro. If that community of users is big-enough/mission-critical enough to maintain 1.8.x going forward, then all power to them. 1.8.x can live forever.

In the meantime, for the 95% of the Avro community that is using Avro to store millions to billions of non-recursive records, and who are tired of putting up with the (opportunity) costs of supporting recursion that they never use, let's move on.

On Fri, Nov 30, 2018 at 2:33 PM Ryan Blue wrote:
> I've used recursion in the past to use Avro to get a binary representation of JSON. Given the popularity of JSON and the fact that Avro includes support for converting it, I think it makes sense to continue allowing recursive schemas.
>
> --
> Ryan Blue
> Software Engineer
> Netflix
Re: Radical idea for 1.9.0
I think that recursive schemas are a powerful feature of Avro that enable the modelling of hierarchies, a pretty common data structure. I myself have used this feature in a large scale system before. The recursion is handled elegantly and naturally in the code with recursive functions so I don't think it adds much complexity.

On Fri, 30 Nov 2018 at 22:12, Michael A. Smith wrote:
> I’m against this proposal. Sure, recursion adds complexity, but recursive types are also extremely powerful and one of the most interesting features of a tool like this. I have been experimenting in the other direction; considering a way to compose avro schema descriptions in avro. Recursion is crucial for that, so that we can lay out possible types as a union of named types that include themselves.
>
> Granted, it’s an experiment that would also imply a compatibility break with previous avros, but it opens what I think is an interesting set of doors instead of closing them.
>
> My 2 cents.
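To show what "handled with recursive functions" can look like, here is a hand-rolled sketch that encodes values of a recursive `LongList` record (a `long` field `value` followed by a `["null", "LongList"]` union field `next`). It does not use the Avro library. It assumes the binary encoding rules from the Avro specification: longs are zig-zag variable-length integers, a union is its branch index followed by the value, and null occupies zero bytes. The dict-based value layout and the function names are choices made for this example.

```python
from typing import Any, Dict, Optional


def encode_long(n: int) -> bytes:
    """Zig-zag, variable-length encoding of a 64-bit signed integer."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: small magnitudes become small numbers
    out = bytearray()
    while z & ~0x7F:  # more than 7 bits left: emit a continuation byte
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_long_list(node: Dict[str, Any]) -> bytes:
    """Encode one LongList record; the recursion mirrors the schema's."""
    head = encode_long(node["value"])
    rest: Optional[Dict[str, Any]] = node["next"]
    if rest is None:
        return head + encode_long(0)  # union branch 0 is null, no payload
    return head + encode_long(1) + encode_long_list(rest)  # branch 1: LongList


# The list 1 -> 2 -> end
cells = {"value": 1, "next": {"value": 2, "next": None}}
encoded = encode_long_list(cells)
```

The encoder for the recursive field is a single self-call; a production implementation would also need to guard against very deep nesting, which this sketch does not attempt.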
Re: Radical idea for 1.9.0
I've used recursion in the past to use Avro to get a binary representation of JSON. Given the popularity of JSON and the fact that Avro includes support for converting it, I think it makes sense to continue allowing recursive schemas.

On Fri, Nov 30, 2018 at 2:12 PM Michael A. Smith wrote:
> I’m against this proposal. Sure, recursion adds complexity, but recursive types are also extremely powerful and one of the most interesting features of a tool like this. I have been experimenting in the other direction; considering a way to compose avro schema descriptions in avro. Recursion is crucial for that, so that we can lay out possible types as a union of named types that include themselves.
>
> Granted, it’s an experiment that would also imply a compatibility break with previous avros, but it opens what I think is an interesting set of doors instead of closing them.
>
> My 2 cents.

--
Ryan Blue
Software Engineer
Netflix
Re: Radical idea for 1.9.0
I’m against this proposal. Sure, recursion adds complexity, but recursive types are also extremely powerful and one of the most interesting features of a tool like this. I have been experimenting in the other direction; considering a way to compose avro schema descriptions in avro. Recursion is crucial for that, so that we can lay out possible types as a union of named types that include themselves.

Granted, it’s an experiment that would also imply a compatibility break with previous avros, but it opens what I think is an interesting set of doors instead of closing them.

My 2 cents.

On Fri, Nov 30, 2018 at 13:22 Raymie Stata wrote:
> I understand we've been willing to introduce backward-incompatible API changes (not file-format changes) into minor release versions. If so, here's an idea for consideration:
>
> Let's eliminate recursive records from Avro 1.9.x. Recursion introduces a _lot_ of complexity into many parts of the Avro code base. We could vastly simplify the code base, and probably speed things up, by getting rid of this feature.
>
> The specific proposal would be that Avro 1.9.x would refuse to accept recursive records, and thus would not be able to read binary files written by older versions of Avro. I haven't heard of anyone actually using them, so I don't think this would be a problem.
Re: Radical idea for 1.9.0
First, this creates a data incompatibility, not just an API incompatibility, so it should not be permitted in 1.9.0. Apps that worked, even when updated for API changes, will not be able to read data they could before they upgraded.

Second, folks might actually use this feature in reasonable ways. For example, Avro provides a recursive schema for arbitrary JSON data in the class org.apache.avro.data.Json. This class is used by at least some code on GitHub.

https://github.com/search?q=%22org.apache.avro.data.Json%22=Code

Many of these are probably dead code, or forks of Avro itself, but a few appear to be valid uses.

Does anyone reading this list use this feature?

Thanks,
Doug

On Fri, Nov 30, 2018 at 10:22 AM Raymie Stata wrote:
> I understand we've been willing to introduce backward-incompatible API changes (not file-format changes) into minor release versions. If so, here's an idea for consideration:
>
> Let's eliminate recursive records from Avro 1.9.x. Recursion introduces a _lot_ of complexity into many parts of the Avro code base. We could vastly simplify the code base, and probably speed things up, by getting rid of this feature.
>
> The specific proposal would be that Avro 1.9.x would refuse to accept recursive records, and thus would not be able to read binary files written by older versions of Avro. I haven't heard of anyone actually using them, so I don't think this would be a problem.
Radical idea for 1.9.0
I understand we've been willing to introduce backward-incompatible API changes (not file-format changes) into minor release versions. If so, here's an idea for consideration:

Let's eliminate recursive records from Avro 1.9.x. Recursion introduces a _lot_ of complexity into many parts of the Avro code base. We could vastly simplify the code base, and probably speed things up, by getting rid of this feature.

The specific proposal would be that Avro 1.9.x would refuse to accept recursive records, and thus would not be able to read binary files written by older versions of Avro. I haven't heard of anyone actually using them, so I don't think this would be a problem.