Thanks very much Todd, perfectly clear on both counts. Yeah, as a convention we will only be exposing views to analysts/report-writers/BI tools (for several reasons), so storing the values as long in the underlying tables will only be a concern for pipeline developers.
-m

On Fri, Jan 5, 2018 at 3:23 PM, Todd Lipcon <[email protected]> wrote:

> Hey Mauricio,
>
> Answers inline below
>
> On Fri, Jan 5, 2018 at 2:50 PM, Mauricio Aristizabal <
> [email protected]> wrote:
>
>> Todd, since you bring it up in this thread... what CDH version do you
>> expect DECIMAL support to make it into? I recently asked Icaro Vazquez
>> about it but still no news. We're hoping it makes it into 5.14 otherwise
>> according to the roadmap there might not be another minor release and we'd
>> be waiting till Summer for CDH 6.
>>
>
> As this is an open source project mailing list, it would be inappropriate
> for me to comment on a vendor's release schedule. Please note that Kudu is
> a product of the Apache Software Foundation and the ASF doesn't have any
> influence on or knowledge of Cloudera's release plans.
>
> Of course it happens that I and many other contributors are also employees
> of Cloudera, but we participate in the ASF as individuals and not
> representatives of our employer, and so generally won't comment on
> questions like this in this forum. Please refer to Cloudera's forums for
> questions about CDH release plans, etc.
>
>>
>> And just in case we're forced to make do without DECIMAL initially, is
>> the recommendation really to store as string and convert? I was thinking
>> of storing as int/long and dividing by 10 or 1000 as needed in an impala
>> view over the kudu table. Wouldn't a division be way more performant than
>> a conversion from string, especially when aggregating over thousands of
>> records in a report query?
>>
>
> You're right -- using an integer type and division by a power of 10 is
> going to be much faster than casting from a string. Division by a constant
> would be JITted by Impala into a pretty minimal sequence of assembly
> instructions (two bitshifts, an integer multiplication, and a subtraction)
> which likely take about 6 cycles total.
> In contrast, a cast from string to
> decimal probably takes many thousands of cycles.
>
> The only downside is that if you have end users using the data they might
> be confused by the integer representation whereas a string representation
> would be a little clearer.
>
> Thanks
> -Todd
>
>>
>> On Fri, Jan 5, 2018 at 11:13 AM, Todd Lipcon <[email protected]> wrote:
>>
>>> Oh, one other piece of feedback: maybe worth editing the title to say
>>> "vs Apache Parquet" instead of "vs Apache Impala" since in all cases you
>>> are using Impala as the query engine?
>>>
>>> -Todd
>>>
>>> On Fri, Jan 5, 2018 at 11:06 AM, Todd Lipcon <[email protected]> wrote:
>>>
>>>> Hey Boris,
>>>>
>>>> Thanks for publishing this. It's a great look at how an end user
>>>> evaluates Kudu. I appreciate that you cover both the pros and cons of the
>>>> technology, and glad to see that your conclusion leaves you excited about
>>>> Kudu :)
>>>>
>>>> One quick note is that I think you'll be even more pleased when you
>>>> upgrade to a later version (eg Kudu 1.5). We've improved performance in
>>>> several areas and also improved scalability compared to the version you're
>>>> testing. TIMESTAMP is also supported now, with DECIMAL soon to follow. It
>>>> might be worth noting this as an addendum to the blog post if you feel like
>>>> it.
>>>>
>>>> -Todd
>>>>
>>>> On Fri, Jan 5, 2018 at 10:51 AM, Boris Tyukin <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi guys,
>>>>>
>>>>> we just finished testing Kudu, mostly comparing Kudu to Impala on
>>>>> HDFS/parquet. I wanted to share my blog post and results. We used typical
>>>>> (and real) healthcare data for the test, not a synthetic data which I think
>>>>> makes it is a bit more interesting.
>>>>>
>>>>> I welcome any feedback!
>>>>>
>>>>> http://boristyukin.com/benchmarking-apache-kudu-vs-apache-impala/
>>>>>
>>>>> We are really impressed with Kudu and I wanted to take an opportunity
>>>>> to thank Kudu developers for such an amazing and much-needed product.
>>>>>
>>>>> Boris
>>>>>
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>>
>> --
>> *MAURICIO ARISTIZABAL*
>> Architect - Business Intelligence + Data Science
>> [email protected] (m) +1 323 309 4260
>> 223 E. De La Guerra St. | Santa Barbara, CA 93101
>>
>> Overview <http://www.impactradius.com/?src=slsap> | Twitter
>> <https://twitter.com/impactradius> | Facebook
>> <https://www.facebook.com/pages/Impact-Radius/153376411365183> | LinkedIn
>> <https://www.linkedin.com/company/impact-radius-inc->
>>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

--
*MAURICIO ARISTIZABAL*
Architect - Business Intelligence + Data Science
[email protected] (m) +1 323 309 4260
223 E. De La Guerra St. | Santa Barbara, CA 93101

Overview <http://www.impactradius.com/?src=slsap> | Twitter
<https://twitter.com/impactradius> | Facebook
<https://www.facebook.com/pages/Impact-Radius/153376411365183> | LinkedIn
<https://www.linkedin.com/company/impact-radius-inc->
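
For anyone following the thread, here is a rough sketch of the integer-plus-division idea discussed above, written in Python rather than as an Impala view. The names (SCALE, to_scaled_int, from_scaled_int) and the two-digit scale are made up for illustration and are not from Kudu or Impala. The point is only that values are stored and summed as plain integers, with a single division by a power of 10 at read time.

```python
from decimal import Decimal

# Hypothetical scale: two fractional digits (e.g. cents). Pick per column.
SCALE = 2
FACTOR = 10 ** SCALE


def to_scaled_int(text: str) -> int:
    """Pipeline side: turn a decimal string like '19.99' into an integer
    count of 10**-SCALE units, suitable for an int/long column."""
    scaled = Decimal(text) * FACTOR
    if scaled != scaled.to_integral_value():
        # More fractional digits than SCALE allows; refuse rather than round.
        raise ValueError(f"{text!r} does not fit scale {SCALE}")
    return int(scaled)


def from_scaled_int(value: int) -> Decimal:
    """View side: divide by the power of 10 to recover the decimal value."""
    return Decimal(value) / FACTOR


amounts = ["19.99", "0.01", "100.00"]
stored = [to_scaled_int(a) for a in amounts]

# Aggregate on the integers, then divide once at the end.
total = from_scaled_int(sum(stored))
print(stored, total)
```

Summing the integers first and dividing once keeps the aggregate exact, which a float column would not guarantee. Whether the view should divide per row or once per aggregate is a design choice left open here.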
