I am frustrated that I couldn't make the hangout, and I apologize to all
here.  Based on the notes from Jason, things seem to be on track. Are there
any action items at this point, or was a good plan laid out on the hangout?
If there is anything else I can help with, let me know.

John

On Saturday, April 2, 2016, John Omernik <[email protected]> wrote:

> Prior to opening a JIRA for the doc change, I thought we could discuss it
> here, unless I am misinterpreting how to use JIRA. My thought is that the
> documentation is more than a dev-only concern. The "exception" of
> documenting Avro this way stems from how Avro support was originally
> explained, so for this case I suggest we establish the alternative doc page
> in sync with JIRA to help bring this issue around.  By putting the
> discussion here, I am hoping to reach more than JIRA users in asking for
> opinions and thoughts on the subject.
>
> John
>
> On Sat, Apr 2, 2016 at 10:43 AM, Bob Rumsby <[email protected]> wrote:
>
>> Hi John,
>> I recommend opening a JIRA for the suggested doc updates. Others may have
>> different opinions on what to document.
>>
>> Thanks,
>> Bob
>>
>> On Sat, Apr 2, 2016 at 6:38 AM, John Omernik <[email protected]> wrote:
>>
>> > This has been an interesting topic, and I am sorry I could not participate
>> > more since my original post due to traveling.  Stefán is obviously
>> > frustrated, and I can empathize with him. Being in a position of making
>> > architectural decisions as well, it can be difficult to help define a
>> > strategy for your org based on available documentation, be willing to
>> > work through problems (these are "new" projects), and feel like you are
>> > yelling in a canyon.  The level of frustration there is real.  I do think,
>> > as mentioned, the documentation for Avro should be updated ASAP.
>> >
>> > To that end, here is a recommendation:  Avro needs to be called out as
>> > experimental.  On the documentation page, under "Query Data -> Querying a
>> > File System", let's add "Querying Avro Files".  On this page, I think we
>> > should, in the first paragraph, state that Avro support has been moved to
>> > experimental, and that as of now the Drill project is working through the
>> > following problems with Avro files. Basically, let's take Stefán's list,
>> > and outline the problems, the JIRAs, and the errors that are coming up, as
>> > well as outline what works and how it works. I will be willing to work on
>> > this with Stefán. My reasoning is this:  obviously Avro support has been
>> > implied in the docs thus far, others who may have chosen Avro may be going
>> > down a path like Stefán based on the documentation, and may end up in a
>> > similar frustrated state. I want to avoid that. This situation has caused
>> > community tension, and does nothing for the project if we don't look to
>> > fix it. Yes, this is a different approach than other "experimental" type
>> > features in Drill, but I feel in order to avoid this situation,
>> > particularly on Avro, it makes sense to call this out.
>> >
>> > Now, this does not fix Stefán's current problem.  As a user and community
>> > member who doesn't code Java, I often struggle to balance asking for
>> > help/changes with the fact that I personally can't force that change or
>> > write the change myself, and thus am looking for other ways to
>> > contribute.  Stefán has been contributing, and I do think we need to
>> > acknowledge that. We are all busy, we all have commitments, from the
>> > developer side, to those with day jobs, and even Stefán in his job.  We
>> > all do; in this situation it's easy to point fingers and send the blame
>> > around, and yet I don't think any individual is completely to blame;
>> > there is a confluence of situations that has contributed here.
>> > Frustrations are high, but we can handle this, and I think we should be
>> > able to handle it in a way that ends positively for Drill, for the
>> > community, and for Stefán.  To that end, here are my suggestions for
>> > discussion:
>> >
>> >
>> >    1. My documentation suggestion above. It puts it clearly out there that
>> >    Avro is experimental, and lets users know the risks of Avro. As the
>> >    issues get knocked off the list, we can track them there as well as in
>> >    JIRAs. This is "extra" work, and one may ask "why can't we just use
>> >    JIRA?"  I think since the documentation in the past has been wrong on
>> >    this, in response we should use the documentation in this special case
>> >    to pull out of the situation.  I commit to helping this by facilitating
>> >    the Avro page; I just need discussion and approval to go this route,
>> >    and someone who has access to change the pages to work with me. In
>> >    addition, it may help pull others who have Java/Avro knowledge into
>> >    contributing to some of the fixes.
>> >    2. Let's ensure going forward we consider the challenges of new
>> >    features like this and mark them as experimental for a while.  I think
>> >    for new plugins/readers we could develop a process where we mark them
>> >    as experimental for a number of releases to help work out test cases
>> >    from users.  The issues that are brought up by users will help identify
>> >    bugs as well as test cases we can use in the code to not only ensure
>> >    solid interfaces, but help prevent regressions in future releases.
>> >    3. I know this one will be asking a lot: if 1 and 2 seem reasonable,
>> >    let's roll up our sleeves on the Avro stuff.  Identify those "I can't
>> >    use this" issues and separate them from "I really want this" issues for
>> >    prioritization, and work to resolve the issues starting with the
>> >    blockers. For the Drill project, "our bad" on the supported nature of
>> >    Avro in the docs, and instead of pulling back and forth on resources,
>> >    commitments, etc. on user lists (which in my opinion really hurts
>> >    community) we say, "this sucks, it puts everyone in a bad position,
>> >    let's steer out of this and get on track".  Based on some of the
>> >    responses, I don't think this is unreasonable thus far. I think Stefán,
>> >    while I don't speak for him, understands the nature of what the
>> >    community can provide to "him" and that the community doesn't work for
>> >    him. At the same time, this is a really good opportunity for us to band
>> >    together and right the course here.
>> >
>> > I welcome discussion here. Jacques and Julian, I know that there are some
>> > challenges around topics like this, and you've outlined them, and I can't
>> > disagree with your points. At the same time, I don't think anyone is
>> > saying the project path, the project itself, or anything Dremio, MapR, or
>> > individual committers are doing is at fault or should be responsible for
>> > fixing stuff on their own.  I think, as I've stated before, we have a
>> > confluence of little things that have added up, and in the end looking
>> > for a community solution is our best path.
>> >
>> > Cheers,
>> >
>> > John
>> >
>> >
>> >
>> >
>> > On Sat, Apr 2, 2016 at 1:37 AM, Stefán Baxter <[email protected]>
>> > wrote:
>> >
>> > > Hi Jason,
>> > >
>> > > Thank you for writing this up, it's appreciated.
>> > >
>> > > First things first. We would be more than happy to help on these Avro
>> > > related issues but the Drill code base is quite complex, with a fairly
>> > > steep learning curve, and lately a lot of my time has been spent on
>> > > dealing with the repercussions of having decided to use Avro for
>> > > fresh/inbound data.  (I realize some here might not see this as a
>> > > contribution but I beg to differ. Any project requires regular users to
>> > > put in the time to adapt new/unhardened projects to their solutions and
>> > > in the case of using Avro with Drill it's been more like testing and
>> > > duct-taping than a "simple adaptation of free software".)
>> > >
>> > > I find these Avro problems interesting for other reasons as well:
>> > >
>> > >    - They raise the question of the commitment behind accepting a
>> > >    plugin like this (and not marking it experimental)
>> > >
>> > >    - There are design decisions that I think are very wrong
>> > >    - enforcing schema looks to me like a serious violation of the,
>> > >    nowhere to be found, "Drill Manifesto" that I have asked about
>> > >    - see the original entry
>> > >
>> > >    - The level of noise required to get feedback on a topic like this
>> > >    - I apologize to everyone but ask them to appreciate that this
>> > >    provocative approach was by no means the first option
>> > >
>> > > As a "user" I'm obviously not a person that can call for or insist on
>> > > having these things addressed but perhaps that changes with time.
>> > >
>> > > Now on towards fixing the outstanding bugs.  If someone can point us in
>> > > the right direction and discuss the best approach to fixing each bug
>> > > then we can at least try to help (and we do so gladly).
>> > >
>> > > It's at least clear to me that many users of Drill, those working on
>> > > streaming data, need the support for a schema capable format to store
>> > > their inbound/fresh data before it's converted into Parquet.
>> > > Currently there seems to be no real alternative.
>> > >
>> > > So, if we can help then we are willing and I suggest that, if you want,
>> > > we take this to Jira and try to work it from there.
>> > >
>> > > Regards,
>> > >  -Stefán
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Fri, Apr 1, 2016 at 9:56 PM, Jason Altekruse <[email protected]>
>> > > wrote:
>> > >
>> > > > I take some responsibility for the lack of response you have received
>> > > > on this, because I had said I would try to take a look at the dirN
>> > > > issue that has been outstanding for some time with Avro. This might
>> > > > have prevented others from jumping in to help and I will work on
>> > > > communicating when I don't have time to work on something that I
>> > > > raise my hand for.
>> > > >
>> > > > That being said, there are lots of parts of Drill that still need
>> > > > attention. I do think that you are the only active user of the Avro
>> > > > support that I know of. Even though that is the case, I have been
>> > > > trying to make the feature usable for you and other possible users,
>> > > > like John.
>> > > >
>> > > > One thing that would likely be worth discussing as a follow up to this
>> > > > is our expectations for code quality we accept from contributors.
>> > > > There were several issues with Avro when it was merged, and no one
>> > > > ever really took on the task of fully testing it.
>> > > > I do know there is another issue around a lack of responses to recent
>> > > > requests, but I'm tabling that for a little bit. I would like to see
>> > > > it discussed, but I want to scope this discussion for now.
>> > > >
>> > > > I don't think the plugin is far from fully complete, and I have been
>> > > > working to improve the tests each time I fix an issue with it. I think
>> > > > it would be very useful for us to define a clear set of criteria for a
>> > > > feature like a format plugin to be considered fully tested and ready
>> > > > for inclusion in the core project. I think this would have the benefit
>> > > > of both helping users to avoid issues and giving a clearer definition
>> > > > of the task of writing a format plugin. This is a community
>> > > > contribution that should be easier and more strongly encouraged than
>> > > > it is today, and could really help new users adopt Drill if they are
>> > > > using other data formats.
>> > > >
>> > > > Jason Altekruse
>> > > > Software Engineer at Dremio
>> > > > Apache Drill Committer
>> > > >
>> > > > On Fri, Apr 1, 2016 at 1:42 PM, Stefán Baxter <[email protected]>
>> > > > wrote:
>> > > >
>> > > > > Yes Parth, you are 100% right and we are willing to help.
>> > > > >
>> > > > > The relationship one builds with a community also depends on the
>> > > > > "vibe/feeling" of the community and I know it reflects on me here,
>> > > > > as well as the community, that many of my attempts to help and get
>> > > > > help have not been fruitful.
>> > > > >
>> > > > > I also acknowledge that this topic gets me frustrated and that my
>> > > > > manners could easily improve but it's not as if that is a "first
>> > > > > response" but an eventual state caused by indifference on one side
>> > > > > and the determination to get some response on the other.
>> > > > >
>> > > > > Marking Avro as experimental is considerate towards new users and
>> > > > > something I wish was in place before we decided to depend on it and
>> > > > > spend all this time on trying to make it work.
>> > > > >
>> > > > > Ideally, for us, the decision would be to support Avro properly.
>> > > > >
>> > > > > My +1 for improving Avro support so that it can truly be used as an
>> > > > > interim file format before data is converted to Parquet. (I see no
>> > > > > real alternative here)
>> > > > >
>> > > > > - Stefán
>> > > > >
>> > > > >
>> > > > > On Fri, Apr 1, 2016 at 8:25 PM, Parth Chandra <[email protected]>
>> > > > > wrote:
>> > > > >
>> > > > > > +1 on marking Avro experimental.
>> > > > > >
>> > > > > > @Stefan, we have been trying to help you as much as our time
>> > > > > > permits. I know that I held up the 1.6 release while Jason fixed
>> > > > > > the issues that you brought up. As was said earlier, this is
>> > > > > > personal time we are spending to help users in the community, so
>> > > > > > providing an immediate response to everyone is difficult.
>> > > > > > Ultimately, it boils down to the relationships one builds within
>> > > > > > the community. Folks with shared goals help each other and
>> > > > > > everyone benefits.
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Fri, Apr 1, 2016 at 11:10 AM, Jacques Nadeau <[email protected]>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Stefan,
>> > > > > > >
>> > > > > > > It makes sense to me to mark the Avro plugin experimental.
>> > > > > > > Clearly, there are bugs. I also want to note your requirements
>> > > > > > > and expectations haven't always been in alignment with what the
>> > > > > > > Avro plugin developers built/envisioned (especially around
>> > > > > > > schemas). As part of trying to address these gaps, I'd like to
>> > > > > > > ask again for you to provide actual data and test cases so we
>> > > > > > > make sure that the Avro plugin includes those as future test
>> > > > > > > cases. (This is absolutely the best way to ensure that the
>> > > > > > > project continues to work for your use case.)
>> > > > > > >
>> > > > > > > The bigger issue I see here is that you expect the community to
>> > > > > > > spend time doing what you want. You have already received a lot
>> > > > > > > of that via free support and numerous bug fixes by myself, Jason
>> > > > > > > and others. You need to remember: this community is run by a
>> > > > > > > bunch of volunteers. Everybody here has a day job. A lot of time
>> > > > > > > I spend in the community is at the cost of my personal life. For
>> > > > > > > others, it is the same.
>> > > > > > >
>> > > > > > > This is a good place to ask for help but you should never demand
>> > > > > > > it. If you want paid support, I know Ted offered this from MapR
>> > > > > > > and I'm sure if you went that route, your issues would get
>> > > > > > > addressed very quickly. If you don't want to go that route, then
>> > > > > > > I suggest that you help by creating more example data and test
>> > > > > > > cases and focusing on what are the most important issues that
>> > > > > > > you need to solve. From there, you can continue to expect that
>> > > > > > > people will help you--as they can. There are no guarantees in
>> > > > > > > open source. Everything comes through the kindness and shared
>> > > > > > > goals of those in the community.
>> > > > > > >
>> > > > > > > thanks,
>> > > > > > > Jacques
>> > > > > > >
>> > > > > > >
>> > > > > > > --
>> > > > > > > Jacques Nadeau
>> > > > > > > CTO and Co-Founder, Dremio
>> > > > > > >
>> > > > > > > On Fri, Apr 1, 2016 at 5:43 AM, Stefán Baxter <[email protected]>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > Hi,
>> > > > > > > >
>> > > > > > > > Is it at all possible that we are the only company trying to
>> > > > > > > > use Avro with Drill to some serious extent?
>> > > > > > > >
>> > > > > > > > We continue to come across all sorts of embarrassing
>> > > > > > > > shortcomings like the one we are dealing with now where a
>> > > > > > > > schema change exception is thrown even when working with a
>> > > > > > > > single Avro file (that has the same schema).
>> > > > > > > >
>> > > > > > > > Can a non project member call for a discussion on this topic
>> > > > > > > > and the level of support that is offered for Avro in Drill?
>> > > > > > > >
>> > > > > > > > My discussion topics would be:
>> > > > > > > >
>> > > > > > > >    - Strange schema validation that ... :
>> > > > > > > >    ... currently fails on single file
>> > > > > > > >    ... prevents dirX variables from working
>> > > > > > > >    ... would require Drill to scan all Avro files to establish
>> > > > > > > >    schema (even when pruning would be used)
>> > > > > > > >    ... would ALWAYS fail for old queries if an old Avro file,
>> > > > > > > >    containing the original fields, was removed and could not be
>> > > > > > > >    scanned
>> > > > > > > >    ... does not rhyme with the "eliminate ETL" and "Evolving
>> > > > > > > >    Schema" goals of Drill
>> > > > > > > >
>> > > > > > > >    - Simple union types do not work to declare nullable fields
>> > > > > > > >
>> > > > > > > >    - Drill can not read Parquet that is created by
>> > > > > > > >    parquet-mr-avro
>> > > > > > > >
>> > > > > > > >    - What is the intention for Avro in Drill
>> > > > > > > >    - Should we select to use some other format to buffer/batch
>> > > > > > > >    data before creating a Parquet file for it?
>> > > > > > > >
>> > > > > > > >    - The culture here regarding talking about boring/hard
>> > > > > > > >    topics like this
>> > > > > > > >    - Where serious complaints/issues are met with silence
>> > > > > > > >    - I know full well that my frustration shines through here
>> > > > > > > >    and that it is not helping but this Drill+Avro mess is
>> > > > > > > >    really getting too much for us to handle
>> > > > > > > >
>> > > > > > > > Looking forward to discussing this here or during the next
>> > > > > > > > hangout.
>> > > > > > > >
>> > > > > > > > Regards,
>> > > > > > > >  -Stefán (or ... mr. old & frustrated)
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

-- 
Sent from my iThing
