Re: How to store data

Yousuf Fauzan Wed, 25 Jul 2012 10:54:07 -0700

Using key filter on a big bucket could cause performance problems.
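
For reference, the key-filter variant mentioned in the quoted message could be sketched roughly as below. This is only an illustration in Python that builds the JSON job one would POST to Riak's HTTP /mapred endpoint. The '<from>_<time>' key layout, the bucket name "items" and the helper name are assumptions made for the example, and objects are assumed to be stored as JSON with a numeric "time" field. As far as I know, key filters are evaluated against the bucket's full key list, which is where the cost on a big bucket comes from.

```python
import json

# JavaScript map phase: keep only records newer than arg.min_time.
# Assumes each stored object is a JSON document with a numeric "time" field.
TIME_FILTER_JS = (
    "function(v, keyData, arg) {"
    "  var rec = JSON.parse(v.values[0].data);"
    "  return rec.time > arg.min_time ? [rec] : [];"
    "}"
)


def key_filter_job(bucket, from_ids, min_time):
    """Build a MapReduce job that selects keys shaped like '<from>_<time>'.

    The key layout is hypothetical; it is not something the thread specifies.
    """
    key_filters = [
        ["tokenize", "_", 1],                         # take the <from> part of the key
        ["set_member"] + [str(i) for i in from_ids],  # keep keys whose <from> is listed
    ]
    return {
        "inputs": {"bucket": bucket, "key_filters": key_filters},
        "query": [{"map": {"language": "javascript",
                           "source": TIME_FILTER_JS,
                           "arg": {"min_time": min_time},
                           "keep": True}}],
    }


if __name__ == "__main__":
    # Example values only; the ids and timestamp are made up.
    print(json.dumps(key_filter_job("items", [17, 42], 1343000000), indent=2))
```

One job covers all the identifiers, but the time check still runs in the map phase on every object whose key passes the filter.
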
On Jul 25, 2012 9:53 PM, "Andrew Kondratovich" <
[email protected]> wrote:


> Yeap.. half a thousand requests to Riak isn't cool =( I'm looking for some
> strategy for storing data so that I could fetch all items in 1 request.
>
> I could run an index MR on time and filter results in the map phase. I could use
> special keys containing the 'from' value and use key filters (with time filtering
> in the map phase)... I wish I could use several 2i in one MR, or combine 2i with
> key filters, or perform MR on buckets... I wish... =)
>
> On Wed, Jul 25, 2012 at 5:35 PM, Andres Jaan Tack <
> [email protected]> wrote:
>
>> Is that a realistic strategy for low latency requirements? Imagine this
>> were some web service, and people generate this query at some reasonable
>> frequency.
>>
>> (not that I know what Andrew is looking for, exactly)
>>
>>
>> 2012/7/25 Yousuf Fauzan <[email protected]>
>>
>>> Since 500 is not that big a number, I think you can run that many M/Rs
>>> with each emitting only records having "time" greater than specified. Input
>>> would be {index, <<"bucket">>, <<"from_bin">>, <<"from_field_value">>}
>>>
>>> If you decide to split the data into separate buckets based on "from"
>>> field, input would be {index, <<"from_field_value">>, <<"time_bin">>,
>>> <<"time_low">>, <<"time_high">>}
>>>
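
The two index inputs above are written as Erlang terms. A rough equivalent for the HTTP interface, sketched in Python, is shown below. It only builds the JSON jobs one would POST to /mapred. The helper names are made up for illustration, the index names follow the ones quoted above, and the map phase assumes each object is stored as JSON with a numeric "time" field.

```python
import json

# JavaScript map phase: keep only records newer than arg.min_time.
# Assumes each stored object is a JSON document with a numeric "time" field.
TIME_FILTER_JS = (
    "function(v, keyData, arg) {"
    "  var rec = JSON.parse(v.values[0].data);"
    "  return rec.time > arg.min_time ? [rec] : [];"
    "}"
)


def job_by_from(bucket, from_value, min_time):
    """Exact-match 2i input on from_bin; time is filtered in the map phase."""
    return {
        "inputs": {"bucket": bucket, "index": "from_bin", "key": str(from_value)},
        "query": [{"map": {"language": "javascript",
                           "source": TIME_FILTER_JS,
                           "arg": {"min_time": min_time},
                           "keep": True}}],
    }


def job_by_time_range(from_bucket, time_low, time_high):
    """Range 2i input on time_bin for a bucket dedicated to one 'from' value."""
    return {
        "inputs": {"bucket": from_bucket, "index": "time_bin",
                   "start": str(time_low), "end": str(time_high)},
        # Built-in map function that returns each object's value parsed as JSON.
        "query": [{"map": {"language": "javascript",
                           "name": "Riak.mapValuesJson",
                           "keep": True}}],
    }


if __name__ == "__main__":
    # Example values only; the ids and timestamps are made up.
    print(json.dumps(job_by_from("bucket", 42, 1343000000), indent=2))
    print(json.dumps(job_by_time_range("42", 1343000000, 1343237647), indent=2))
```

Either way it is still one job per "from" value. The second form lets the index do the time filtering, so the map phase only has to return the values.
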
>>>
>>> --
>>> Yousuf
>>>
>>> On Wed, Jul 25, 2012 at 6:35 PM, Andrew Kondratovich <
>>> [email protected]> wrote:
>>>
>>>> Hello,  Yousuf.
>>>>
>>>> Thanks for your reply.
>>>>
>>>> We have several million items, with about 10 000 unique 'from'
>>>> values (about 1000 items for each). Usually, we need to get items for about
>>>> 500 'from' identifiers with a 'time' limit (about 5% of the items
>>>> match).
>>>>
>>>> On Wed, Jul 25, 2012 at 1:02 PM, Yousuf Fauzan
>>>> <[email protected]> wrote:
>>>>
>>>>> Hi Andrew,
>>>>>
>>>>> First of all, the correct answer to your question is the proverbial
>>>>> "it depends". Having said that, here is what I would do in your case:
>>>>>
>>>>> 1. If there are enough data points with the same "from" field, I will
>>>>> make it a bucket and then index on time.
>>>>> 2. If the above is not true, I will index on "from" and "time" field.
>>>>>     a. If the number of records where "time" is greater than the one you
>>>>> require is small, I will run a map/reduce with the initial input as those
>>>>> records.
>>>>>     b. If the number of records having a particular "from" is small, I
>>>>> will do the above with the initial input as records having that "from"
>>>>> field. This could be a problem as Riak only supports range and exact
>>>>> queries, so if you want to query multiple identifiers, you will have to run
>>>>> multiple queries.
>>>>>     In both the above cases, I will use secondary indexes to get the
>>>>> initial records.
>>>>>     Note that we are using M/R as Riak does not support querying by
>>>>> multiple indexes.
>>>>>
>>>>> What I would also suggest is to partition your data into different
>>>>> buckets. You will need to understand the queries that you will be
>>>>> supporting and partition it accordingly.
>>>>>
>>>>> --
>>>>> Yousuf
>>>>>
>>>>> On Wed, Jul 25, 2012 at 2:50 PM, Andrew Kondratovich <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Good afternoon.
>>>>>>
>>>>>> I am considering several storage solutions for my project, and now I
>>>>>> look at Riak.
>>>>>> We work with the following pattern of data:
>>>>>> {
>>>>>>   time: unixtime
>>>>>>   from: int
>>>>>>   data: binary
>>>>>>   ...
>>>>>> }
>>>>>>
>>>>>> The amount of data is several million items for now, but it's
>>>>>> growing. It is necessary to handle the following requests: for a list of
>>>>>> identifiers (about 500 items), return all records where id = from and time
>>>>>> is greater than a certain value.
>>>>>>
>>>>>> How should we store such data and handle such requests effectively with
>>>>>> Riak?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> --
>>>>>> Andrew Kondratovich
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> riak-users mailing list
>>>>>> [email protected]
>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Andrew Kondratovich
>>>>
>>>>
>>>
>>>
>>>
>>
>
>
> --
> Andrew Kondratovich
>
>

