Using key filters on a big bucket could cause performance problems.

On Jul 25, 2012 9:53 PM, "Andrew Kondratovich" <[email protected]> wrote:
> Yeap.. half a thousand requests to riak isn't cool =( I'm looking for some
> strategy of storing data so that I could fetch all items with 1 request.
>
> I could use index MR at time and filter results at map phase. I could use
> special keys with from data and use key filters (with time filtering at map
> phase)... I wish I could use several 2i at MR or combine 2i with
> keyfilters, or perform MR on buckets... I wish... =)
>
> On Wed, Jul 25, 2012 at 5:35 PM, Andres Jaan Tack <
> [email protected]> wrote:
>
>> Is that a realistic strategy for low latency requirements? Imagine this
>> were some web service, and people generate this query at some reasonable
>> frequency.
>>
>> (not that I know what Andrew is looking for, exactly)
>>
>>
>> 2012/7/25 Yousuf Fauzan <[email protected]>
>>
>>> Since 500 is not that big a number, I think you can run that many M/Rs
>>> with each emitting only records having "time" greater than specified. Input
>>> would be {index, <<"bucket">>, <<"from_bin">>, <<"from_field_value">>}
>>>
>>> If you decide to split the data into separate buckets based on "from"
>>> field, input would be {index, <<"from_field_value">>, <<"time_bin">>,
>>> <<"time_low">>, <<"time_high">>}
>>>
>>>
>>> --
>>> Yousuf
>>>
>>> On Wed, Jul 25, 2012 at 6:35 PM, Andrew Kondratovich <
>>> [email protected]> wrote:
>>>
>>>> Hello, Yousuf.
>>>>
>>>> Thanks for your reply.
>>>>
>>>> We have several million items. There are about 10 000 unique 'from'
>>>> values (about 1000 items for each). Usually, we need to get items for about
>>>> 500 'from' identifiers with a 'time' limit (about 5% of items
>>>> match).
>>>>
>>>> On Wed, Jul 25, 2012 at 1:02 PM, Yousuf Fauzan
>>>> <[email protected]> wrote:
>>>>
>>>>> Hi Andrew,
>>>>>
>>>>> First of all, the correct answer to your question is the proverbial
>>>>> "it depends". Having said that, here is what I could do in your case
>>>>>
>>>>> 1. If there are enough data points with the same "from" field, I will
>>>>>    make it a bucket and then index on time.
>>>>> 2. If the above is not true, I will index on "from" and "time" field.
>>>>>    a. If number of records where "time" is greater than the one you
>>>>>       require is small, I will run a map/reduce with the initial input
>>>>>       as those records.
>>>>>    b. If number of records having a particular "from" is small, I
>>>>>       will do the above with the initial input as records having that
>>>>>       "from" field. This could be a problem as Riak only supports range
>>>>>       and exact queries so if you want to query multiple identifiers,
>>>>>       you will have to run multiple queries.
>>>>>    In both the above cases, I will use secondary indexes to get the
>>>>>    initial records.
>>>>>    Note that we are using M/R as Riak does not support querying by
>>>>>    multiple indexes.
>>>>>
>>>>> What I would also suggest is to partition your data into different
>>>>> buckets. You will need to understand the queries that you will be
>>>>> supporting and partition it accordingly.
>>>>>
>>>>> --
>>>>> Yousuf
>>>>>
>>>>> On Wed, Jul 25, 2012 at 2:50 PM, Andrew Kondratovich <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Good afternoon.
>>>>>>
>>>>>> I am considering several storage solutions for my project, and now I
>>>>>> look at Riak.
>>>>>> We work with the following pattern of data:
>>>>>> {
>>>>>>   time: unixtime
>>>>>>   from: int
>>>>>>   data: binary
>>>>>>   ...
>>>>>> }
>>>>>>
>>>>>> The amount of data is about several million items for now, but it's
>>>>>> growing. It is necessary to handle the following requests: for a list of
>>>>>> identifiers (about 500 items) return all records where id = from and time
>>>>>> greater than a certain value.
>>>>>>
>>>>>> How to store such data and to effectively handle such requests with
>>>>>> Riak?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> --
>>>>>> Andrew Kondratovich
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> riak-users mailing list
>>>>>> [email protected]
>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Andrew Kondratovich
>>>>
>>>
>>
>
> --
> Andrew Kondratovich
>
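
For reference, the two MapReduce inputs Yousuf describes in the quoted thread can be sketched as JSON jobs for Riak's HTTP /mapred endpoint. This is only a rough sketch under assumptions that are not in the thread: the bucket name "items" and the helper function names are hypothetical, each object is assumed to be stored as JSON with a numeric "time" field, and the index names from_bin and time_bin are taken from the Erlang tuples above. It builds the job bodies only and does not talk to a cluster.

```python
import json

# JavaScript map phase: keep only records newer than arg.min_time.
# Assumes each stored object is a JSON document with a numeric "time" field.
MAP_NEWER_THAN = (
    "function(value, keyData, arg) {"
    "  var doc = JSON.parse(value.values[0].data);"
    "  return doc.time > arg.min_time ? [doc] : [];"
    "}"
)


def from_index_job(bucket, from_value, min_time):
    """One job per 'from' id: exact-match 2i input, time filter in the map phase."""
    return {
        "inputs": {"bucket": bucket, "index": "from_bin", "key": str(from_value)},
        "query": [
            {
                "map": {
                    "language": "javascript",
                    "source": MAP_NEWER_THAN,
                    "arg": {"min_time": min_time},
                    "keep": True,
                }
            }
        ],
    }


def time_range_job(from_bucket, time_low, time_high):
    """Bucket-per-'from' layout: range 2i input on the time index, no filtering needed."""
    return {
        "inputs": {
            "bucket": from_bucket,
            "index": "time_bin",
            "start": str(time_low),
            "end": str(time_high),
        },
        # Identity map: return the stored document as-is.
        "query": [
            {
                "map": {
                    "language": "javascript",
                    "source": "function(value) { return [JSON.parse(value.values[0].data)]; }",
                    "keep": True,
                }
            }
        ],
    }


def to_body(job):
    """Serialize a job to the JSON body that would be POSTed to /mapred."""
    return json.dumps(job)
```

With this shape, fetching 500 identifiers still means 500 jobs (one from_index_job or time_range_job per identifier), which is the cost Andrew and Andres are worried about above.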
