Hi John,
I recommend opening a JIRA for the suggested doc updates. Others may have
different opinions on what to document.

Thanks,
Bob

On Sat, Apr 2, 2016 at 6:38 AM, John Omernik <[email protected]> wrote:

> This has been an interesting topic, and I am sorry I could not participate
> more since my original post due to traveling.  Stefán is obviously
> frustrated, and I can empathize with him. Being in a position of making
> architectural decisions as well, it can be difficult to help define a
> strategy for your org based on available documentation, be willing to work
> through problems (these are "new" projects), and feel like you are
> yelling in a canyon.  The level of frustration there is real.  I do think,
> as mentioned, the documentation for Avro should be updated ASAP.
>
> To that end, here is a recommendation: Avro needs to be called out as
> experimental. On the documentation page, under "Query Data -> Querying a
> File System", let's add "Querying Avro Files". On this page, I think we
> should state in the first paragraph that Avro support has been moved to
> experimental, and that as of now the Drill project is working through the
> following problems with Avro files. Basically, let's take Stefán's list
> and outline the problems, the JIRAs, and the errors that are coming up, as
> well as what works and how it works. I am willing to work on this with
> Stefán. My reasoning is this: obviously, Avro support has been implied in
> the docs thus far, so others who chose Avro may be going down a path like
> Stefán's based on the documentation, and may end up in a similarly
> frustrated state. I want to avoid that. This situation has caused community
> tension, and it does nothing for the project if we don't look to fix it.
> Yes, this is a different approach than for other "experimental" features in
> Drill, but to avoid this situation, particularly with Avro, I feel it makes
> sense to call this out.
>
> Now, this does not fix Stefán's current problem. As a user and community
> member who doesn't code Java, I often struggle to balance asking for
> help/changes with the fact that I personally can't force that change or
> write the change myself, and thus I am looking for ways to contribute in
> other ways. Stefán has been contributing, and I do think we need to
> acknowledge that. We are all busy, and we all have commitments, from the
> developers, to those with day jobs, and even Stefán in his job. In this
> situation it's easy to point fingers and pass the blame around, and yet I
> don't think any individual is completely to blame; a confluence of
> situations has contributed here. Frustrations are high, but we can handle
> this, and I think we should be able to handle it in a way that ends
> positively for Drill, for the community, and for Stefán. To that end, here
> are my suggestions for discussion:
>
>
>    1. My documentation suggestion above. It puts it clearly out there that
>    Avro is experimental, and lets users know the risks of Avro. As the issues
>    get knocked off the list, we can track them there as well as in JIRA. This
>    is "extra" work, and one may ask "why can't we just use JIRA?", but I think
>    since the documentation in the past has been wrong on this, in response we
>    should use the documentation in this special case to pull out of the
>    situation. I commit to helping with this by facilitating the Avro page; I
>    just need discussion and approval to go this route, and someone who has
>    access to change the pages to work with me. In addition, it may help pull
>    in others who have Java/Avro knowledge to contribute to some of the fixes.
>    2. Let's ensure going forward we consider the challenges of new features
>    like this and mark them as experimental for a while. I think for new
>    plugins/readers we could develop a process where we mark them as
>    experimental for a number of releases to help work out test cases from
>    users. The issues that are brought up by users will help identify bugs as
>    well as test cases we can use in the code to not only ensure solid
>    interfaces, but help prevent regressions in future releases.
>    3. I know this one will be asking a lot, but if 1 and 2 seem reasonable,
>    let's roll up our sleeves on the Avro stuff. Identify those "I can't use
>    this" issues and separate them from "I really want this" issues for
>    prioritization, and work to resolve the issues starting with the blockers.
>    For the Drill project, "our bad" on the supported nature of Avro in the
>    docs, and instead of pulling back and forth on resources, commitments,
>    etc., on user lists (which in my opinion really hurts the community), we
>    say, "this sucks, it puts everyone in a bad position, let's steer out of
>    this and get on track". Based on some of the responses, I don't think this
>    is unreasonable thus far. I think Stefán, while I don't speak for him,
>    understands the nature of what the community can provide to "him" and that
>    the community doesn't work for him; at the same time, this is a really good
>    opportunity for us to band together, and right the course here.
>
> I welcome discussion here. Jacques and Julian, I know that there are some
> challenges around topics like this, and you've outlined them, and I can't
> disagree with your points. At the same time, I don't think anyone is saying
> the project path, the project itself, or anything Dremio, MapR, or
> individual committers are doing is at fault or should be responsible for
> fixing stuff on their own.  I think as I've stated before we have a
> confluence of little things that have added up, and in the end looking for
> a community solution is our best path.
>
> Cheers,
>
> John
>
>
>
>
> On Sat, Apr 2, 2016 at 1:37 AM, Stefán Baxter <[email protected]> wrote:
>
> > Hi Jason,
> >
> > Thank you for writing this up, it's appreciated.
> >
> > First things first. We would be more than happy to help on these Avro
> > related issues, but the Drill code base is quite complex, with a fairly
> > steep learning curve, and lately a lot of my time has been spent on dealing
> > with the repercussions of having decided to use Avro for fresh/inbound
> > data. (I realize some here might not see this as a contribution, but I beg
> > to differ. Any project requires regular users to put in the time to adapt
> > new/unhardened projects to their solutions, and in the case of using Avro
> > with Drill it's been more like testing and duct-taping than a "simple
> > adaption of free software".)
> >
> > I find these Avro problems interesting for other reasons as well:
> >
> >    - They raise the question of the commitment behind accepting a plugin
> >    like this (and not marking it experimental)
> >
> >    - There are design decisions that I think are very wrong
> >    - enforcing schema looks to me like a serious violation of the (nowhere
> >    to be found) "Drill Manifesto" that I have asked about
> >    - see the original entry
> >
> >    - The level of noise required to get feedback on a topic like this
> >    - I apologize to everyone but ask them to appreciate that this
> >    provocative approach was by no means the first option
> >
> > As a "user" I'm obviously not a person who can call for or insist on
> > having these things addressed, but perhaps that changes with time.
> >
> > Now on to fixing the outstanding bugs. If someone can point us in the
> > right direction and discuss the best approach to fixing each bug, then we
> > can at least try to help (and we do so gladly).
> >
> > It's at least clear to me that many users of Drill, those working on
> > streaming data, need support for a schema-capable format to store their
> > inbound/fresh data before it's converted into Parquet.
> > Currently there seems to be no real alternative.
> >
> > So, if we can help then we are willing, and I suggest that, if you want,
> > we take this to JIRA and try to work it from there.
> >
> > Regards,
> >  -Stefán
> >
> >
> >
> >
> >
> > On Fri, Apr 1, 2016 at 9:56 PM, Jason Altekruse <[email protected]> wrote:
> >
> > > I take some responsibility for the lack of response you got on this,
> > > because I had said I would try to take a look at the dirN issue that has
> > > been outstanding for some time with Avro. This might have prevented others
> > > from jumping in to help, and I will work on communicating when I don't have
> > > time to work on something that I raise my hand for.
> > >
> > > That being said, there are lots of parts of Drill that still need
> > > attention. I do think that you are the only active user of the Avro
> > > support that I know of. Even though that is the case, I have been trying
> > > to make the feature usable for you and other possible users, like John.
> > >
> > > One thing that would likely be worth discussing as a follow-up to this is
> > > our expectations for the code quality we accept from contributors. There
> > > were several issues with Avro when it was merged, and no one ever really
> > > took on the task of fully testing it.
> > > I do know there is another issue around a lack of responses to recent
> > > requests, but I'm tabling that for a little bit. I would like to see it
> > > discussed, but I want to keep the scope of this discussion narrow for now.
> > >
> > > I don't think the plugin is far from fully complete, and I have been
> > > working to improve the tests each time I fix an issue with it. I think it
> > > would be very useful for us to define a clear set of criteria for a
> > > feature like a format plugin to be considered fully tested and ready for
> > > inclusion in the core project. I think this would have the benefit of both
> > > helping users to avoid issues and giving a clearer definition of the task
> > > of writing a format plugin. This is a community contribution that should
> > > be easier and more strongly encouraged than it is today, and could really
> > > help new users adopt Drill if they are using other data formats.
> > >
> > > Jason Altekruse
> > > Software Engineer at Dremio
> > > Apache Drill Committer
> > >
> > > On Fri, Apr 1, 2016 at 1:42 PM, Stefán Baxter <[email protected]> wrote:
> > >
> > > > Yes Parth, you are 100% right and we are willing to help.
> > > >
> > > > The relationship one builds with a community also depends on the
> > > > "vibe/feeling" of the community, and I know it reflects on me here, as
> > > > well as on the community, that many of my attempts to help and get help
> > > > have not been fruitful.
> > > >
> > > > I also acknowledge that this topic gets me frustrated and that my
> > > > manners could easily improve, but it's not as if that is a "first
> > > > response"; it is an eventual state caused by indifference on one side
> > > > and the determination to get some response on the other.
> > > >
> > > > Marking Avro as experimental is considerate towards new users and is
> > > > something I wish had been in place before we decided to depend on it and
> > > > spend all this time trying to make it work.
> > > >
> > > > Ideally, for us, the decision would be to support Avro properly.
> > > >
> > > > My +1 for improving Avro support so that it can truly be used as an
> > > > interim file format before data is converted to Parquet. (I see no real
> > > > alternative here.)
> > > >
> > > > - Stefán
> > > >
> > > >
> > > > On Fri, Apr 1, 2016 at 8:25 PM, Parth Chandra <[email protected]> wrote:
> > > >
> > > > > +1 on marking Avro experimental.
> > > > >
> > > > > @Stefan, we have been trying to help you as much as our time permits. I
> > > > > know that I held up the 1.6 release while Jason fixed the issues that you
> > > > > brought up. As was said earlier, this is personal time we are spending to
> > > > > help users in the community, so providing an immediate response to
> > > > > everyone is difficult. Ultimately, it boils down to the relationships one
> > > > > builds within the community. Folks with shared goals help each other and
> > > > > everyone benefits.
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Apr 1, 2016 at 11:10 AM, Jacques Nadeau <[email protected]> wrote:
> > > > >
> > > > > > Stefan,
> > > > > >
> > > > > > It makes sense to me to mark the Avro plugin experimental. Clearly,
> > > > > > there are bugs. I also want to note that your requirements and
> > > > > > expectations haven't always been in alignment with what the Avro
> > > > > > plugin developers built/envisioned (especially around schemas). As
> > > > > > part of trying to address these gaps, I'd like to ask again for you
> > > > > > to provide actual data and test cases so we make sure that the Avro
> > > > > > plugin includes those as future test cases. (This is absolutely the
> > > > > > best way to ensure that the project continues to work for your use
> > > > > > case.)
> > > > > >
> > > > > > The bigger issue I see here is that you expect the community to spend
> > > > > > time doing what you want. You have already received a lot of that via
> > > > > > free support and numerous bug fixes by myself, Jason and others. You
> > > > > > need to remember: this community is run by a bunch of volunteers.
> > > > > > Everybody here has a day job. A lot of the time I spend in the
> > > > > > community is at the cost of my personal life. For others, it is the
> > > > > > same.
> > > > > >
> > > > > > This is a good place to ask for help, but you should never demand it.
> > > > > > If you want paid support, I know Ted offered this from MapR, and I'm
> > > > > > sure if you went that route, your issues would get addressed very
> > > > > > quickly. If you don't want to go that route, then I suggest that you
> > > > > > help by creating more example data and test cases and focusing on the
> > > > > > most important issues that you need to solve. From there, you can
> > > > > > continue to expect that people will help you, as they can. There are
> > > > > > no guarantees in open source. Everything comes through the kindness
> > > > > > and shared goals of those in the community.
> > > > > >
> > > > > > thanks,
> > > > > > Jacques
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Jacques Nadeau
> > > > > > CTO and Co-Founder, Dremio
> > > > > >
> > > > > > On Fri, Apr 1, 2016 at 5:43 AM, Stefán Baxter <[email protected]> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Is it at all possible that we are the only company trying to use Avro
> > > > > > > with Drill to some serious extent?
> > > > > > >
> > > > > > > We continue to come across all sorts of embarrassing shortcomings like
> > > > > > > the one we are dealing with now, where a schema change exception is
> > > > > > > thrown even when working with a single Avro file (that has the same
> > > > > > > schema).
> > > > > > >
> > > > > > > Can a non-project member call for a discussion on this topic and the
> > > > > > > level of support that is offered for Avro in Drill?
> > > > > > >
> > > > > > > My discussion topics would be:
> > > > > > >
> > > > > > >    - Strange schema validation that ...:
> > > > > > >    ... currently fails on a single file
> > > > > > >    ... prevents dirX variables from working
> > > > > > >    ... would require Drill to scan all Avro files to establish the
> > > > > > >    schema (even when pruning would be used)
> > > > > > >    ... would ALWAYS fail for old queries if an old Avro file,
> > > > > > >    containing the original fields, was removed and could not be
> > > > > > >    scanned
> > > > > > >    ... does not rhyme with the "eliminate ETL" and "Evolving Schema"
> > > > > > >    goals of Drill
> > > > > > >
> > > > > > >    - Simple union types do not work to declare nullable fields
> > > > > > >
> > > > > > >    - Drill cannot read Parquet that is created by parquet-mr-avro
> > > > > > >
> > > > > > >    - What is the intention for Avro in Drill?
> > > > > > >    - Should we choose some other format to buffer/batch data before
> > > > > > >    creating a Parquet file for it?
> > > > > > >
> > > > > > >    - The culture here regarding talking about boring/hard topics like
> > > > > > >    this
> > > > > > >    - Where serious complaints/issues are met with silence
> > > > > > >    - I know full well that my frustration shines through here and that
> > > > > > >    it is not helping, but this Drill+Avro mess is really getting too
> > > > > > >    much for us to handle
> > > > > > >
> > > > > > > I look forward to discussing this here or during the next hangout.
> > > > > > >
> > > > > > > Regards,
> > > > > > >  -Stefán (or ... mr. old & frustrated)
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
