Thanks for sharing, Dan. The diagrams explained clearly how the current system works.
As for what I have in mind: take a schema of <host,metric,time,...>. Say I am interested in the data for the past 5 mins, 10 mins, etc., or in aggregates at 5-min intervals over the past 3 days, 7 days, and so on. It looks like I need to introduce a special 5-min bar column and range partition on that column to spread the data across the tablet servers, so that I can leverage parallel filtering. The cost of this extra column (INT8) is not ideal but not too bad either (storage-wise, compression should do wonders). So I am wondering whether it would be better to accept "functions" as split rows instead of only constants.

Of course, if the business requires dropping down to 1-min bars, the data has to be re-sharded. So a more cost-effective way of doing this on a production cluster would be good.

On Sat, May 7, 2016 at 8:50 AM, Dan Burkert <d...@cloudera.com> wrote:

> Hi Sand,
>
> I've been working on some diagrams to help explain some of the more
> advanced partitioning types, it's attached. Still pretty rough at this
> point, but the goal is to clean it up and move it into the Kudu
> documentation proper. I'm interested to hear what kind of time series you
> are interested in Kudu for. I'm tasked with improving Kudu for time
> series, you can follow progress here
> <https://issues.apache.org/jira/browse/KUDU-1306>. If you have any
> additional ideas I'd love to hear them. You may also be interested in a
> small project that a JD and I have been working on in the past week to
> build an OpenTSDB style store on top of Kudu, you can find it here
> <https://github.com/danburkert/kudu-ts>. Still quite feature limited at
> this point.
>
> - Dan
>
> On Fri, May 6, 2016 at 4:51 PM, Sand Stone <sand.m.st...@gmail.com> wrote:
>
>> Thanks. Will read.
>>
>> Given that I am researching time series data, row locality is crucial :-)
>>
>>
>> On Fri, May 6, 2016 at 3:57 PM, Jean-Daniel Cryans <jdcry...@apache.org>
>> wrote:
>>
>>> We do have non-covering range partitions coming in the next few months,
>>> here's the design (in review):
>>> http://gerrit.cloudera.org:8080/#/c/2772/9/docs/design-docs/non-covering-range-partitions.md
>>>
>>> The "Background & Motivation" section should give you a good idea of why
>>> I'm mentioning this.
>>>
>>> Meanwhile, if you don't need row locality, using hash partitioning could
>>> be good enough.
>>>
>>> J-D
>>>
>>> On Fri, May 6, 2016 at 3:53 PM, Sand Stone <sand.m.st...@gmail.com>
>>> wrote:
>>>
>>>> Makes sense.
>>>>
>>>> Yeah it would be cool if users could specify/control the split rows
>>>> after the table is created. Now, I have to "think ahead" to pre-create the
>>>> range buckets.
>>>>
>>>> On Fri, May 6, 2016 at 3:49 PM, Jean-Daniel Cryans <jdcry...@apache.org>
>>>> wrote:
>>>>
>>>>> You will only get 1 tablet and no data distribution, which is bad.
>>>>>
>>>>> That's also how HBase works, but it will split regions as you insert
>>>>> data and eventually you'll get some data distribution even if it doesn't
>>>>> start in an ideal situation. Tablet splitting will come later for Kudu.
>>>>>
>>>>> J-D
>>>>>
>>>>> On Fri, May 6, 2016 at 3:42 PM, Sand Stone <sand.m.st...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> One more questions, how does the range partition work if I don't
>>>>>> specify the split rows?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> On Fri, May 6, 2016 at 3:37 PM, Sand Stone <sand.m.st...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks, Misty. The "advanced" impala example helped.
>>>>>>>
>>>>>>> I was just reading the Java API,CreateTableOptions.java, it's
>>>>>>> unclear how the range partition column names associated with the partial
>>>>>>> rows params in the addSplitRow API.
>>>>>>>
>>>>>>> On Fri, May 6, 2016 at 3:08 PM, Misty Stanley-Jones
>>>>>>> <mstanleyjo...@cloudera.com> wrote:
>>>>>>>
>>>>>>>> Hi Sand,
>>>>>>>>
>>>>>>>> Please have a look at
>>>>>>>> http://getkudu.io/docs/kudu_impala_integration.html#partitioning_tables
>>>>>>>> and see if it is helpful to you.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Misty
>>>>>>>>
>>>>>>>> On Fri, May 6, 2016 at 2:00 PM, Sand Stone <sand.m.st...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi, I am new to Kudu. I wonder how the split rows work. I know
>>>>>>>>> from some docs, this is currently for pre-creation the table. I am
>>>>>>>>> researching how to partition (hash+range) some time series test data.
>>>>>>>>>
>>>>>>>>> Is there an example? or notes somewhere I could read upon.
>>>>>>>>>
>>>>>>>>> Thanks much.
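
P.S. To make the 5-min bar idea at the top of this message concrete, here is a minimal sketch of the bucketing arithmetic, assuming timestamps are integer epoch seconds. The names `bar_start` and `split_points` are hypothetical helpers for illustration only; they are not part of the Kudu API, and nothing here talks to a cluster. Note that the value computed is a full epoch offset, so it would need a wider column than the INT8 mentioned above; a more compact encoding is left open.

```python
from typing import List

BAR_SECONDS = 5 * 60  # width of one bar in seconds (assumed 5-min bars)


def bar_start(epoch_seconds: int, bar_seconds: int = BAR_SECONDS) -> int:
    """Return the epoch second at which the bar containing the timestamp begins."""
    return epoch_seconds - (epoch_seconds % bar_seconds)


def split_points(start: int, end: int, bar_seconds: int = BAR_SECONDS) -> List[int]:
    """Return the bar boundaries strictly after `start` and strictly before `end`.

    These are the constant values one would have to pre-compute as range
    splits, since a function cannot be supplied as a split.
    """
    first = bar_start(start, bar_seconds) + bar_seconds
    return list(range(first, end, bar_seconds))


if __name__ == "__main__":
    # A timestamp 1 second past a bar boundary maps back to that boundary.
    print(bar_start(301))
    # Boundaries inside a 15-minute window starting at epoch 0.
    print(split_points(0, 900))
```

Switching to 1-min bars only changes `bar_seconds` in this sketch, but every stored bar value and every pre-created split would change with it, which is the re-sharding cost mentioned above.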