Yeah... half a thousand requests to Riak isn't cool =( I'm looking for a strategy for storing the data so that I can fetch all the items with one request.
I could run a MapReduce job over the time index and filter the results in the map phase. I could use special keys that encode the "from" value and apply key filters (with the time filtering done in the map phase)... I wish I could use several 2i queries in one MR job, or combine 2i with key filters, or run one MR job across several buckets... I wish... =)

On Wed, Jul 25, 2012 at 5:35 PM, Andres Jaan Tack wrote:
> Is that a realistic strategy for low-latency requirements? Imagine this
> were some web service, and people generate this query at some reasonable
> frequency.
>
> (Not that I know what Andrew is looking for, exactly.)
>
> 2012/7/25 Yousuf Fauzan
>
>> Since 500 is not that big a number, I think you can run that many M/Rs,
>> each emitting only the records whose "time" is greater than the specified
>> value. The input would be
>> {index, <<"bucket">>, <<"from_bin">>, <<"from_field_value">>}
>>
>> If you decide to split the data into separate buckets based on the "from"
>> field, the input would be
>> {index, <<"from_field_value">>, <<"time_bin">>, <<"time_low">>, <<"time_high">>}
>>
>> --
>> Yousuf
>>
>> On Wed, Jul 25, 2012 at 6:35 PM, Andrew Kondratovich wrote:
>>
>>> Hello, Yousuf.
>>>
>>> Thanks for your reply.
>>>
>>> We have several million items. There are about 10,000 unique "from"
>>> values (about 1,000 items each). Usually we need to get the items for
>>> about 500 "from" identifiers with a "time" limit (about 5% of the items
>>> match).
>>>
>>> On Wed, Jul 25, 2012 at 1:02 PM, Yousuf Fauzan wrote:
>>>
>>>> Hi Andrew,
>>>>
>>>> First of all, the correct answer to your question is the proverbial "it
>>>> depends". Having said that, here is what I would do in your case:
>>>>
>>>> 1. If there are enough data points with the same "from" field, I will
>>>>    make it a bucket and then index on time.
>>>> 2. If the above is not true, I will index on the "from" and "time" fields.
>>>>    a. If the number of records where "time" is greater than the one you
>>>>       require is small, I will run a map/reduce with those records as
>>>>       the initial input.
>>>>    b. If the number of records having a particular "from" is small, I
>>>>       will do the above with the records having that "from" value as the
>>>>       initial input. This could be a problem, as Riak only supports range
>>>>       and exact-match queries, so if you want to query multiple
>>>>       identifiers, you will have to run multiple queries.
>>>>    In both of the above cases, I will use secondary indexes to get the
>>>>    initial records.
>>>> Note that we are using M/R because Riak does not support querying by
>>>> multiple indexes.
>>>>
>>>> I would also suggest partitioning your data into different buckets. You
>>>> will need to understand the queries that you will be supporting and
>>>> partition the data accordingly.
>>>>
>>>> --
>>>> Yousuf
>>>>
>>>> On Wed, Jul 25, 2012 at 2:50 PM, Andrew Kondratovich wrote:
>>>>
>>>>> Good afternoon.
>>>>>
>>>>> I am considering several storage solutions for my project, and now I
>>>>> am looking at Riak.
>>>>> We work with the following data pattern:
>>>>> {
>>>>>   time: unixtime
>>>>>   from: int
>>>>>   data: binary
>>>>>   ...
>>>>> }
>>>>>
>>>>> The amount of data is several million items for now, and it is
>>>>> growing. We need to handle the following requests: for a list of
>>>>> identifiers (about 500 items), return all records where id = from and
>>>>> time is greater than a certain value.
>>>>>
>>>>> How should we store such data and handle such requests efficiently
>>>>> with Riak?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> --
>>>>> Andrew Kondratovich
>>>>>
>>>>> _______________________________________________
>>>>> riak-users mailing list
>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>>> --
>>> Andrew Kondratovich

--
Andrew Kondratovich
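A rough sketch of Yousuf's per-"from" approach, written as Python that only builds JSON request bodies for Riak's HTTP MapReduce endpoint (POST /mapred); it does not talk to a cluster. The bucket name `items`, the index names `from_bin` and `time_int`, and the assumption that each stored value is a JSON object with a numeric top-level `time` field are all hypothetical. The bucket-per-"from" variant uses an integer index (`time_int`) instead of the `time_bin` from the message so that the range compares numerically. The sketch makes the cost Andrew objects to concrete: one job per "from" value, so about 500 requests per query.

```python
import json

# Hypothetical names: bucket "items", 2i index "from_bin", and stored values
# that are JSON objects with a numeric top-level "time" field.
BUCKET = "items"

# JavaScript map phase: parse each object and keep it only when its
# "time" is greater than the threshold passed in as `arg`.
TIME_FILTER_JS = (
    "function(v, keyData, arg) {"
    " var o = JSON.parse(v.values[0].data);"
    " return o.time > arg ? [o] : [];"
    " }"
)


def per_from_job(from_id, min_time):
    """One job: 2i exact match on from_bin, then the time filter in the map phase."""
    return {
        "inputs": {"bucket": BUCKET, "index": "from_bin", "key": str(from_id)},
        "query": [{"map": {"language": "javascript",
                           "source": TIME_FILTER_JS,
                           "arg": min_time,
                           "keep": True}}],
    }


def per_bucket_job(from_id, time_low, time_high):
    """Variant with one bucket per "from" value and a range query on time_int.

    2i ranges are inclusive, so pass threshold + 1 as time_low to get a
    strict "greater than".
    """
    return {
        "inputs": {"bucket": str(from_id), "index": "time_int",
                   "start": time_low, "end": time_high},
        "query": [{"map": {"language": "javascript",
                           "name": "Riak.mapValuesJson",
                           "keep": True}}],
    }


if __name__ == "__main__":
    # The ~500 identifiers in one query each need their own job.
    jobs = [per_from_job(f, 1343200000) for f in range(500)]
    print(len(jobs), "separate POST /mapred bodies")
    print(json.dumps(jobs[0], indent=2))
```

Either variant keeps each job small, but the client still has to send and merge roughly 500 jobs per query, which is exactly the latency concern Andres raised.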
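Andrew's alternative, special keys that encode the "from" value plus key filters, could look roughly like the sketch below. It assumes a hypothetical key layout `"<from>_<unixtime>"` and a hypothetical bucket `items`, and it again only builds the JSON body for a single POST /mapred request. The `matches` helper is a local stand-in for the filter, included only to check the key layout; it is not part of Riak. As far as I know, key filters are evaluated against the bucket's full key list, so this trades 500 small requests for one request that scans every key. With several million items, that single request may well be slower.

```python
import json


# Hypothetical key layout: "<from>_<unixtime>", e.g. "42_1343215000".
def make_key(from_id, ts):
    return f"{from_id}_{ts}"


def key_filter_job(bucket, from_ids, min_time):
    """A single MapReduce job that selects keys by both "from" and "time"."""
    # First token of the key must be one of the requested "from" values.
    from_filter = [["tokenize", "_", 1],
                   ["set_member", *[str(f) for f in from_ids]]]
    # Second token, converted to an integer, must exceed the time threshold.
    time_filter = [["tokenize", "_", 2],
                   ["string_to_int"],
                   ["greater_than", min_time]]
    return {
        "inputs": {"bucket": bucket,
                   "key_filters": [["and", from_filter, time_filter]]},
        "query": [{"map": {"language": "javascript",
                           "name": "Riak.mapValuesJson",
                           "keep": True}}],
    }


def matches(key, from_ids, min_time):
    """Local stand-in for the filter above, handy for checking the key layout."""
    from_part, time_part = key.split("_", 1)
    return from_part in {str(f) for f in from_ids} and int(time_part) > min_time


if __name__ == "__main__":
    job = key_filter_job("items", [17, 42], 1343200000)
    print(json.dumps(job))
```

With this layout the time filtering happens on the key itself, so the map phase does not need to parse values just to drop old records.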
