This is all great news. Seeing movement, and seeing articulated at a higher level what has led to the discrepancies in Avro, is also very helpful (thanks, Paul).
On Sat, Aug 19, 2017 at 6:29 PM, Stefán Baxter <[email protected]> wrote:
> Thank you Saurabh,
>
> I was not really expecting a constructive reply to my previous email; it was
> appreciated.
>
> I guess some old frustration got the better of me.
>
> All the best,
> -Stefan
>
> On Fri, Aug 18, 2017 at 10:55 PM, Saurabh Mahapatra <[email protected]> wrote:
>
> > Thank you for this candid feedback, Stefan. The fact that you even decided
> > to write an email offering this feedback despite moving away from Drill
> > just suggests to me that you are still a supporter. We need all the help
> > that we can get from every member in this community to make Drill provide
> > value to all users, including you.
> >
> > I am new to the community but I have looked at your emails where your past
> > attempts at doing this have not taken you anywhere. We have to change that.
> >
> > We cannot undo the past as far as addressing your needs is concerned but I
> > want to assure you that we are bringing reform to the community in general.
> > The stakeholders who are impacted by Drill have increased beyond the small
> > group that existed a couple of years ago. So rest assured that you have
> > a voice here.
> >
> > I think the biggest challenge we have in the community is that there are
> > users who could get a lot of value if some work was done to support
> > integrations. I know for sure that there are many developers who would love
> > to participate in this community and do the work for a modest fee. It helps
> > them get interested in the project, helps them provide support beyond just
> > the open source aspect and also helps users such as you to get the value
> > that you need where you need it.
> >
> > Please let me know if you would be willing to pursue that route.
> >
> > On the Avro front, I do hear a lot of users asking for it but I hear a lot
> > more requests on Parquet.
> > Plus, there are core issues in Drill that need
> > to be addressed first. The community is definitely trying to prioritize
> > given what we have. But we do not have to feel constrained. We can get more
> > developers to participate in this and help out. And I am very positive
> > about that approach. I know that I helped a user here to get help on using
> > Apache Drill inside a commercial setting where their asks were very
> > specific.
> >
> > Those are my thoughts but please do not give up on us. Your critical
> > feedback may not sound nice to the ears but is exactly the kind of feedback
> > that will make this project truly successful.
> >
> > Best,
> > Saurabh
> >
> > On Fri, Aug 18, 2017 at 1:42 PM, Stefán Baxter <[email protected]> wrote:
> >
> > > Hi John,
> > >
> > > Love Drill but we no longer use it in production as our main query tool.
> > >
> > > I do have a fairly long list of pet peeves but I also have a long list of
> > > features that I love and would not want to be without.
> > >
> > > In my opinion it's time for Drill to decide where its commitment lies
> > > regarding evolving schema and ETL elimination and if it wants to be
> > > something more than a cog in a Hadoop distribution wheel or an effort
> > > some see as a way to their startup stardom.
> > >
> > > There is no denying the great effect it has had and its usefulness (Arrow
> > > also making waves now).
> > > I am, as I have been, just frustrated by
> > > shortcomings I feel are not addressed because they are addressed elsewhere
> > > (where the true loyalties lie).
> > >
> > > I can name a few (I have not upgraded to 1.11):
> > > - Empty values still default to double for partial/segment lists, which
> > >   triggers all sorts of problems (no attempt is made to convert values to
> > >   the lowest common denominator (string))
> > > - Two NullableX values both containing nothing (Null) still produce
> > >   schema change errors instead of waiting for a type to become apparent
> > > - Syntax error reporting is terrible
> > > - Schema change reporting is almost absent
> > > - Avro schema is fixed/strict even though text formats support
> > >   evolving/variable schema (with all sorts of side effects)
> > > - Avro still does not support dirN
> > >
> > > and so many more things (not to mention the politics and the defensive
> > > attitude when trying to address shortcomings).
> > >
> > > My only regret here is that I never had proper resources to contribute a
> > > fix to some of these.
> > >
> > > All the best,
> > > -Stefán
> > >
> > > On Thu, Aug 17, 2017 at 2:20 PM, Charles Givre <[email protected]> wrote:
> > >
> > > > I’m not an Avro user, but I’d definitely vote for improving this.
> > > > - C
> > > >
> > > > > On Aug 17, 2017, at 10:17, John Omernik <[email protected]> wrote:
> > > > >
> > > > > I was guessing you would chime in with a response ;)
> > > > >
> > > > > Are you still using Drill w/ Avro? How have things been lately?
> > > > >
> > > > > On Thu, Aug 17, 2017 at 8:00 AM, Stefán Baxter <[email protected]> wrote:
> > > > >
> > > > >> woha!!!
> > > > >>
> > > > >> (sorry, I just had to)
> > > > >>
> > > > >> Best of luck with that!
> > > > >>
> > > > >> Regards,
> > > > >> -Stefán
> > > > >>
> > > > >> On Thu, Aug 17, 2017 at 12:37 PM, John Omernik <[email protected]> wrote:
> > > > >>
> > > > >>> I know Avro is the unwanted child of the Drill world. (I know others have
> > > > >>> tried to mature the Avro support and that has been something that still is
> > > > >>> in an "experimental" state.)
> > > > >>>
> > > > >>> That said, isn't it time for us to clean it up?
> > > > >>>
> > > > >>> I am sure there are some open JIRAs out there (the last doc update on the
> > > > >>> Avro page, Nov 21, 2016, points to this):
> > > > >>> https://issues.apache.org/jira/browse/DRILL/component/12328941/?selectedTab=com.atlassian.jira.jira-projects-plugin:component-summary-panel
> > > > >>>
> > > > >>> And I just ran into an issue... I am going to run it by here to see if it's
> > > > >>> JIRA worthy or known:
> > > > >>>
> > > > >>> I have two directories, one json (brodns) and one avro (brodnsavro).
> > > > >>>
> > > > >>> They both have subdirectories that are YYYY-MM-DD dates.
> > > > >>>
> > > > >>> When I run
> > > > >>>
> > > > >>> select dir0, count(*) from `brodns` group by dir0 - This works great!
> > > > >>>
> > > > >>> When I run
> > > > >>>
> > > > >>> select dir0, count(*) from `brodnsavro` group by dir0 - I get:
> > > > >>>
> > > > >>> VALIDATION ERROR: From line 1, column 58 to line 1, column 61: Column
> > > > >>> 'dir0' not found in any table
> > > > >>>
> > > > >>> If I run
> > > > >>>
> > > > >>> select count(*) from `brodnsavro/2017-08-17` this works
> > > > >>>
> > > > >>> If I run
> > > > >>>
> > > > >>> select count(*) from `brodnsavro` this also works
> > > > >>>
> > > > >>> But dir0 doesn't appear to be applied to Avro.
> > > > >>>
> > > > >>> I really feel this should be consistent (in addition to fixing the
> > > > >>> other issues in Avro), and let's make Avro a first class citizen of the
> > > > >>> Drill world.
> > > > >>>
> > > > >>> (If folks are interested, I'd be happy to discuss my use case. It involves
> > > > >>> applying a schema to json records on kafka/maprstreams in streamsets, and
> > > > >>> then outputting to avro files... from there I hope to convert to parquet,
> > > > >>> but don't want to use mapreduce, hence drill!)
