I am frustrated that I couldn't make the hangout, and I apologize to all here. Based on the notes from Jason, things seem to be on track. Are there any action items at this point, or was a good plan laid out on the hangout? If there is anything else I can help with, let me know.
John

On Saturday, April 2, 2016, John Omernik <[email protected]> wrote: > Prior to opening a JIRA for the doc change, I thought we could discuss > here, unless I am misinterpreting how to use JIRA. My thought is this is > more than a dev only thing, the documentation, and this "exception" to > documenting Avro like this is due to how Avro support was explained, and I > was thinking for this case, we establish this alternative doc page in sync > with JIRA to help bring this issue around. By putting discussion here, I > am hoping to reach more than JIRA users in asking for opinion/thought on > the subject. > > John > > On Sat, Apr 2, 2016 at 10:43 AM, Bob Rumsby <[email protected]> wrote: > >> Hi John, >> I recommend opening a JIRA for the suggested doc updates. Others may have >> different opinions on what to document. >> >> Thanks, >> Bob >> >> On Sat, Apr 2, 2016 at 6:38 AM, John Omernik <[email protected]> wrote: >> >> > This has been an interesting topic, and I am sorry I could not >> participate >> > more since my original post due to traveling. Stefán is obviously >> > frustrated, and I can empathize with him. Being in a position of making >> > architectural decisions as well, it can be difficult to help define a >> > strategy for your org based on available documentation, be willing to >> > work through problems (these are "new" projects), and feel like you >> are >> > yelling in a canyon. The level of frustration there is real. I do >> think, >> > as mentioned, the documentation for Avro should be updated ASAP. >> > >> > To that end, here is a recommendation: Avro needs to be called out as >> > experimental. On the documentation page, under "Query Data -> Querying >> a >> > File System", let's add "Querying Avro Files". 
On this page, I think we >> > should, in the first paragraph, state Avro Support has been moved to >> > experimental, and as of now the Drill project is working through the >> > following problems with Avro files. Basically, let's take Stefán's list, >> > and outline the problems, the JIRAs, and the errors that are coming up, as >> well >> > as outline what works and how it works. I will be willing to work on >> this >> > with Stefán. My reasoning is this: obviously Avro support has been >> implied >> > in the docs thus far, others who may have chosen Avro may be going down >> a >> > path like Stefán based on the documentation, and may end up in a similar >> > frustrated state. I want to avoid that. This situation has caused >> community >> > tension, and does nothing for the project if we don't look to fix it. >> Yes, >> > this is a different approach than other "experimental" type features in >> > Drill, but I feel in order to avoid this situation particularly on >> Avro, it >> > makes sense to call this out. >> > >> > Now, this does not fix Stefán's current problem. As a user and >> community >> > member who doesn't code Java, I often struggle to balance asking for >> > help/changes with the fact that I personally can't force that change or >> > write the change myself, and thus am looking for ways to contribute in other >> > ways. Stefán has been contributing, and I do think we need to >> acknowledge >> > that. We are all busy, we all have commitments, from the developer >> side, to >> > those with day jobs, and even Stefán in his job. We all do; in this >> > situation it's easy to point fingers and send the blame around, and >> yet, I >> > don't think any individual is completely to blame; there is a confluence of >> > situations that has contributed here. Frustrations are high, but we can >> > handle this, and I think we should be able to handle it in a way that >> ends >> > positively for Drill, for the community, and for Stefán. 
To that end, >> here >> > are my suggestions for discussion: >> > >> > >> > 1. My documentation suggestion above. It puts it clearly out there >> that >> > Avro is experimental, and lets users know the risks of Avro. As the >> > issues >> > get knocked off the list, we can track there as well as JIRAs. While >> > this >> > is "extra" work, and one may ask "why can't we just use JIRA?". I >> think >> > since the documentation in the past has been wrong on this, in >> response >> > we >> > should use the documentation in this special case to pull out of the >> > situation. I commit to helping this by facilitating the Avro page, I >> > just >> > need discussion and approval to go this route, and someone who has >> > access >> > to change the pages to work with me. In addition, it may help pull >> > others >> > in who have Java/Avro knowledge into contributing to some of the >> fixes. >> > 2. Let's ensure going forward we consider the challenges of new >> features >> > like this and making them as experimental for a while. I think for >> new >> > plugins/readers we could develop a process where we mark as >> experimental >> > for a number of releases to help work out test cases from users. >> The >> > issues that are brought up by users will help identify bugs as well >> as >> > test >> > cases we can use in the code to not only ensure solid interfaces, but >> > help >> > prevent regressions in future releases. >> > 3. I know this one will be asking a lot, if 1 and 2 seem reasonable, >> > let's roll up our sleeves on the Avro stuff. Identify those "I can't >> > use >> > this" issues and separate from "I really want this" issues for >> > prioritization, and work to resolve the issues starting with the >> > blockers. 
>> > For the Drill project, "our bad" on the supported nature of Avro in >> the >> > docs, and instead of pulling back and forth on resources, >> commitments, >> > etc, >> > on user lists, (which in my opinion really hurts community) we say , >> > "this >> > sucks, it puts everyone in a bad position, let's steer out of this >> and >> > get >> > on track". Based on some of the responses, I don't think this is >> > unreasonable thus far. I think Stefán, while I don't speak for him, >> > understands the nature of what the community can provide to "him" and >> > that >> > the community doesn't work for him, at the same time, this is a >> really >> > good >> > opportunity for us to band together, and right the course here. >> > >> > I welcome discussion here. Jacques and Julian, I know that there are >> some >> > challenges around topics like this, and you've outlined them, and I >> can't >> > disagree with your points. At the same time, I don't think anyone is >> saying >> > the project path, the project itself, or anything Dremio, MapR, or >> > individual committers are doing is at fault or should be responsible >> for >> > fixing stuff on their own. I think as I've stated before we have a >> > confluence of little things that have added up, and in the end looking >> for >> > a community solution is our best path. >> > >> > Cheers, >> > >> > John >> > >> > >> > >> > >> > On Sat, Apr 2, 2016 at 1:37 AM, Stefán Baxter < >> [email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>> >> > wrote: >> > >> > > Hi Jason, >> > > >> > > Thank you for writing this up, it's appreciated. >> > > >> > > First things first. We would be more than happy to help on these Avro >> > > related issues but the Drill code base is quite complex, with a fairly >> > > steep learning curve, and lately a lot of my time has been spent on >> > dealing >> > > with the repercussions of having decided to use Avro for fresh/inbound >> > > data. 
(I realize some here might not see this as a contribution but I >> beg >> > to >> > > differ. Any project requires regular users to put in the time to adapt >> > > new/unhardened projects to their solutions and in the case of using >> Avro >> > > with Drill it's been more like testing and duct-taping than a "simple >> > > adaption of free software") >> > > >> > > I find these Avro problems interesting for other reasons as well: >> > > >> > > - They raise the question of the commitment behind accepting a >> plugin >> > > like this (and not marking it experimental) >> > > >> > > - There are design decisions that I think are very wrong >> > > - enforcing schema looks to me like a serious violation of the, no >> > where >> > > to be found, "Drill Manifesto" that I have asked about >> > > - see the original entry >> > > >> > > - The level of noise required to get feedback on a topic like this >> > > - I apologize to everyone but ask them to appreciate that this >> > > provocative approach was by no means the first option >> > > >> > > As a "user" I'm obviously not a person that can call for or insist on >> > > having these things addressed but perhaps that changes with time. >> > > >> > > Now on towards fixing the outstanding bugs. If someone can point us >> in >> > the >> > > right direction and discuss the best approach to fixing each bug then >> we >> > > can at least try to help (and we do so gladly). >> > > >> > > It's at least clear to me that many users of Drill, those working on >> > > streaming data, need the support for a schema capable format to store >> > their >> > > inbound/fresh data before it's converted into Parquet. >> > > Currently there seems to be no real alternative. >> > > >> > > So, if we can help then we are willing and I suggest that, if you >> want, >> > we >> > > take this to Jira and try to work it from there. 
>> > > >> > > Regards, >> > > -Stefán >> > > >> > > >> > > >> > > >> > > >> > > On Fri, Apr 1, 2016 at 9:56 PM, Jason Altekruse <[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>> >> > wrote: >> > > >> > > > I take some responsibility for your lack of response on this, >> because I >> > > had >> > > > said I would try to take a look at the dirN issue that has been >> > > outstanding >> > > > for some time with Avro. This might have prevented others from >> jumping >> > in >> > > > to help and I will work on communicating when I don't have time to >> work >> > > on >> > > > something that I raise my hand for. >> > > > >> > > > That being said, there are lots of parts of Drill that still need >> > > > attention. I do think that you are the only active user of the Avro >> > > support >> > > > that I know of. Even though that is the case, I have been trying to >> > make >> > > > the feature useable for you and and other possible users, like John. >> > > > >> > > > One thing that would likely be worth discussing as a follow up to >> this >> > is >> > > > our expectations for code quality we accept from contributors. There >> > were >> > > > several issues with Avro when it was merged, and no one ever really >> > took >> > > on >> > > > the task of fully testing it. >> > > > I do know there is another issue around a lack of responses of >> recent >> > > > requests, but I'm tabling that for a little bit. I would like to >> see it >> > > > discussed, but I want to scope this discussion for now. >> > > > >> > > > I don't think the plugin is far from fully complete, and I have been >> > > > working to improve the tests each time I fix an issue with it. I >> think >> > it >> > > > would be very useful for us to define a clear set of criteria for a >> > > feature >> > > > like a format plugin to be considered fully tested and ready for >> > > inclusion >> > > > in the core project. 
I think this would have the benefit of both >> > helping >> > > > users to avoid issues, as well as give a clearer definition of the >> task >> > > of >> > > > writing a format plugin. This is a community contribution that >> should >> > be >> > > > easier and more strongly encouraged than it is today, and could >> really >> > > help >> > > > new users adopt Drill if they are using other data formats. >> > > > >> > > > Jason Altekruse >> > > > Software Engineer at Dremio >> > > > Apache Drill Committer >> > > > >> > > > On Fri, Apr 1, 2016 at 1:42 PM, Stefán Baxter <[email protected]> wrote: >> > > > >> > > > > Yes Parth, you are 100% right and we are willing to help. >> > > > > >> > > > > The relationship one builds with a community also depends on the >> > > > > "vibe/feeling" of the community and I know it reflects on me >> here, as >> > > > well >> > > > > as the community, that many of my attempts to help and get help >> have >> > > not >> > > > > been fruitful. >> > > > > >> > > > > I also acknowledge that this topic gets me frustrated and that >> my >> > > > > manners could easily improve but it's not as if that is a "first >> > > > response" >> > > > > but an eventual state caused by indifference on one side and the >> > > > > determination to get some response on the other. >> > > > > >> > > > > Marking Avro as experimental is a consideration towards new users and >> > > > > something I wish was in place before we decided to depend on it >> and >> > > spend >> > > > > all this time on trying to make it work. >> > > > > >> > > > > Ideally, for us, the decision would be to support Avro properly. >> > > > > >> > > > > My +1 for improving Avro support so that it can truly be used as >> an >> > > > interim >> > > > > file format before data is converted to Parquet. 
(I see no real >> > > > alternative >> > > > > here) >> > > > > >> > > > > - Stefán >> > > > > >> > > > > >> > > > > On Fri, Apr 1, 2016 at 8:25 PM, Parth Chandra <[email protected]> wrote: >> > > > > >> > > > > > +1 on marking Avro experimental. >> > > > > > >> > > > > > @Stefan, we have been trying to help you as much as our time >> > > permits. I >> > > > > > know that I held up the 1.6 release while Jason fixed the issues >> > that >> > > > you >> > > > > > brought up. As was said earlier, this is personal time we are >> > > spending >> > > > to >> > > > > > help users in the community, so providing an immediate response >> to >> > > > > everyone >> > > > > > is difficult. Ultimately, it boils down to the relationships one >> > > builds >> > > > > > within the community. Folks with shared goals help each other >> and >> > > > > everyone >> > > > > > benefits. >> > > > > > >> > > > > > >> > > > > > >> > > > > > On Fri, Apr 1, 2016 at 11:10 AM, Jacques Nadeau <[email protected]> wrote: >> > > > > > >> > > > > > > Stefan, >> > > > > > > >> > > > > > > It makes sense to me to mark the Avro plugin experimental. >> > Clearly, >> > > > > there >> > > > > > > are bugs. I also want to note your requirements and >> expectations >> > > > > haven't >> > > > > > > always been in alignment with what the Avro plugin developers >> > > > > > > built/envisioned (especially around schemas). As part of >> trying >> > to >> > > > > > address >> > > > > > > these gaps, I'd like to ask again for you to provide actual >> data >> > > and >> > > > > > test >> > > > > > > cases so we make sure that the Avro plugin includes those as >> > future >> > > > > test >> > > > > > > cases. (This is absolutely the best way to ensure that the >> > project >> > > > > > > continues to work for your use case.) 
>> > > > > > > >> > > > > > > The bigger issue I see here is that you expect the community >> to >> > > spend >> > > > > > time >> > > > > > > doing what you want. You have already received a lot of that >> via >> > > free >> > > > > > > support and numerous bug fixes by myself, Jason and others. >> You >> > > need >> > > > to >> > > > > > > remember: this community is run by a bunch of volunteers. >> > Everybody >> > > > > here >> > > > > > > has a day job. A lot of time I spend in the community is at >> the >> > > cost >> > > > of >> > > > > > my >> > > > > > > personal life. For others, it is the same. >> > > > > > > >> > > > > > > This is a good place to ask for help but you should never >> demand >> > > it. >> > > > If >> > > > > > you >> > > > > > > want paid support, I know Ted offered this from MapR and I'm >> sure >> > > if >> > > > > you >> > > > > > > went that route, your issues would get addressed very >> quickly. If >> > > you >> > > > > > don't >> > > > > > > want to go that route, then I suggest that you help by >> creating >> > > more >> > > > > > > example data and test cases and focusing on what are the most >> > > > important >> > > > > > > issues that you need to solve. From there, you can continue to >> > > expect >> > > > > > that >> > > > > > > people will help you--as they can. There are no guarantees in >> > open >> > > > > > source. >> > > > > > > Everything comes through the kindness and shared goals of >> those >> > in >> > > > the >> > > > > > > community. 
>> > > > > > > >> > > > > > > thanks, >> > > > > > > Jacques >> > > > > > > >> > > > > > > >> > > > > > > -- >> > > > > > > Jacques Nadeau >> > > > > > > CTO and Co-Founder, Dremio >> > > > > > > >> > > > > > > On Fri, Apr 1, 2016 at 5:43 AM, Stefán Baxter < >> > > > > [email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');> >> > > > > > > >> > > > > > > wrote: >> > > > > > > >> > > > > > > > Hi, >> > > > > > > > >> > > > > > > > Is it at all possible that we are the only company trying to >> > use >> > > > Avro >> > > > > > > with >> > > > > > > > Drill to some serious extent? >> > > > > > > > >> > > > > > > > We continue to coma across all sorts of embarrassing >> > shortcomings >> > > > > like >> > > > > > > the >> > > > > > > > one we are dealing with now where a schema change exception >> is >> > > > thrown >> > > > > > > even >> > > > > > > > when working with a single Avro file (that has the same >> > schema). >> > > > > > > > >> > > > > > > > Can a non project member call for a discussion on this topic >> > and >> > > > the >> > > > > > > level >> > > > > > > > of support that is offered for Avro in Drill? >> > > > > > > > >> > > > > > > > My discussion topics would be: >> > > > > > > > >> > > > > > > > - Strange schema validation that ... : >> > > > > > > > ... currently fails on single file >> > > > > > > > ... prevents dirX variables to work >> > > > > > > > ... would require Drill to scan all Avro files to >> establish >> > > > schema >> > > > > > > (even >> > > > > > > > when pruning would be used) >> > > > > > > > ... would ALWAY fail for old queries if the an old Avro >> > file, >> > > > > > > containing >> > > > > > > > the original fields, was removed and could not be scanned >> > > > > > > > ... 
does not rhyme with the "eliminate ETL" and "Evolving >> > > > Schema" >> > > > > > > goals >> > > > > > > > of Drill >> > > > > > > > >> > > > > > > > - Simple union types do not work to declare nullable >> fields >> > > > > > > > >> > > > > > > > - Drill can not read Parquet that is created by >> > > parquet-mr-avro >> > > > > > > > >> > > > > > > > - What is the intention for Avro in Drill >> > > > > > > > - Should we select to use some other format to >> buffer/batch >> > > data >> > > > > > > before >> > > > > > > > creating a Parquet file for it? >> > > > > > > > >> > > > > > > > - The culture here regarding talking about boring/hard >> > topics >> > > > like >> > > > > > > this >> > > > > > > > - Where serious complaints/issues are met with silence >> > > > > > > > - I know full well that my frustration shines through >> here >> > and >> > > > > that >> > > > > > it >> > > > > > > > is not helping but this Drill+Avro mess is really getting >> too >> > > much >> > > > > for >> > > > > > us >> > > > > > > > to >> > > > > > > > handle >> > > > > > > > >> > > > > > > > Look forward to discussing this here or during the next >> hangout. >> > > > > > > > >> > > > > > > > Regards, >> > > > > > > > -Stefán (or ... mr. old & frustrated) >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> > > -- Sent from my iThing
