Re: How to store data

Andrew Kondratovich Wed, 25 Jul 2012 09:23:06 -0700

Yeah.. half a thousand requests to Riak isn't cool =( I'm looking for some
strategy of storing data so that I could fetch all items with 1 request.


I could run an index MR on time and filter results in the map phase. I could use
special keys containing the 'from' value and use key filters (with time filtering
in the map phase)... I wish I could use several 2i in one MR, or combine 2i with
key filters, or perform MR on buckets... I wish... =)
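
To make the "special keys" idea concrete, here is a rough sketch of a single MapReduce job that uses key filters. It assumes a hypothetical key layout of '<from>_<time>' (for example '42_1343200000'), objects stored as JSON, and a bucket called 'items'; none of that is settled. In this variant the time check is pushed into the key filter as well, rather than done in the map phase. As far as I understand, key filters still have to walk every key in the bucket, so this may be slow on millions of items.

```python
import json


def key_filter_job(bucket, from_ids, time_low):
    """Build one MapReduce job for keys shaped '<from>_<time>' (hypothetical layout)."""
    # First token of the key must be one of the wanted 'from' identifiers.
    from_part = [["tokenize", "_", 1],
                 ["set_member"] + [str(i) for i in from_ids]]
    # Second token of the key, read as an integer, must exceed the time limit.
    time_part = [["tokenize", "_", 2],
                 ["to_int"],
                 ["greater_than", int(time_low)]]
    return {
        "inputs": {"bucket": bucket,
                   "key_filters": [["and", from_part, time_part]]},
        # Built-in JavaScript map function that returns each object parsed as JSON.
        "query": [{"map": {"language": "javascript",
                           "name": "Riak.mapValuesJson",
                           "keep": True}}],
    }


if __name__ == "__main__":
    # The printed JSON is what would be POSTed to the /mapred HTTP endpoint.
    print(json.dumps(key_filter_job("items", [12, 15, 42], 1343200000), indent=2))
```

All 500 identifiers would go into the one set_member list, so it is a single request, at the cost of scanning the key space.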

On Wed, Jul 25, 2012 at 5:35 PM, Andres Jaan Tack <[email protected]> wrote:

> Is that a realistic strategy for low latency requirements? Imagine this
> were some web service, and people generate this query at some reasonable
> frequency.
>
> (not that I know what Andrew is looking for, exactly)
>
>
> 2012/7/25 Yousuf Fauzan <[email protected]>
>
>> Since 500 is not that big a number, I think you can run that many M/Rs
>> with each emitting only records having "time" greater than specified. Input
>> would be {index, <<"bucket">>, <<"from_bin">>, <<"from_field_value">>}
>>
>> If you decide to split the data into separate buckets based on "from"
>> field, input would be {index, <<"from_field_value">>, <<"time_bin">>,
>> <<"time_low">>, <<"time_high">>}
>>
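
For reference, a minimal sketch of how those two inputs might look as JSON jobs for the HTTP /mapred endpoint. The bucket name 'items', the JSON object format, and the function names are assumptions for illustration only; the index names 'from_bin' and 'time_bin' are the ones from the message above.

```python
import json


def from_index_job(bucket, from_value, time_low):
    """One bucket: exact 2i match on 'from', time filtered in the map phase."""
    # Assumption: each object is stored as JSON with a numeric "time" field.
    map_source = (
        "function(v) {"
        " var o = JSON.parse(v.values[0].data);"
        " return o.time > %d ? [o] : [];"
        " }" % int(time_low)
    )
    return {
        "inputs": {"bucket": bucket,
                   "index": "from_bin",
                   "key": str(from_value)},
        "query": [{"map": {"language": "javascript",
                           "source": map_source,
                           "keep": True}}],
    }


def time_range_job(from_bucket, time_low, time_high):
    """One bucket per 'from' value: 2i range query on time, no filtering needed."""
    return {
        "inputs": {"bucket": from_bucket,
                   "index": "time_bin",
                   "start": str(time_low),
                   "end": str(time_high)},
        # Built-in JavaScript map function that returns each object parsed as JSON.
        "query": [{"map": {"language": "javascript",
                           "name": "Riak.mapValuesJson",
                           "keep": True}}],
    }


if __name__ == "__main__":
    print(json.dumps(from_index_job("items", 42, 1343200000), indent=2))
    print(json.dumps(time_range_job("42", 1343200000, 1343286400), indent=2))
```

Either way this is still one job per 'from' value, so about 500 jobs for the query in question.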
>>
>> --
>> Yousuf
>>
>> On Wed, Jul 25, 2012 at 6:35 PM, Andrew Kondratovich <
>> [email protected]> wrote:
>>
>>> Hello,  Yousuf.
>>>
>>> Thanks for your reply.
>>>
>>> We have several million items, with about 10 000 unique 'from'
>>> values (about 1000 items for each). Usually, we need to get items for about
>>> 500 'from' identifiers with a 'time' limit (about 5% of the items
>>> match).
>>>
>>> On Wed, Jul 25, 2012 at 1:02 PM, Yousuf Fauzan
>>> <[email protected]> wrote:
>>>
>>>> Hi Andrew,
>>>>
>>>> First of all, the correct answer to your question is the proverbial "it
>>>> depends". Having said that, here is what I could do in your case
>>>>
>>>> 1. If there are enough data points with the same "from" field, I will
>>>> make it a bucket and then index on time.
>>>> 2. If the above is not true, I will index on "from" and "time" field.
>>>>     a. If the number of records where "time" is greater than the one you
>>>> require is small, I will run a map/reduce with the initial input as those
>>>> records.
>>>>     b. If number of records having a particular "from" is small, I will
>>>> do the above with the initial input as records having that "from" field.
>>>> This could be a problem as Riak only supports range and exact queries so if
>>>> you want to query multiple identifiers, you will have to run multiple
>>>> queries.
>>>>     In both the above cases, I will use secondary indexes to get the
>>>> initial records.
>>>>     Note that we are using M/R as Riak does not support querying by
>>>> multiple indexes.
>>>>
>>>> What I would also suggest is to partition your data into different
>>>> buckets. You will need to understand the queries that you will be
>>>> supporting and partition it accordingly.
>>>>
>>>> --
>>>> Yousuf
>>>>
>>>> On Wed, Jul 25, 2012 at 2:50 PM, Andrew Kondratovich <
>>>> [email protected]> wrote:
>>>>
>>>>> Good afternoon.
>>>>>
>>>>> I am considering several storage solutions for my project, and now I
>>>>> look at Riak.
>>>>> We work with the following pattern of data:
>>>>> {
>>>>>   time: unixtime
>>>>>   from: int
>>>>>   data: binary
>>>>>   ...
>>>>> }
>>>>>
>>>>> The amount of data is several million items for now, but it's
>>>>> growing. It is necessary to handle the following requests: for a list of
>>>>> identifiers (about 500 items), return all records where from = id and time
>>>>> is greater than a certain value.
>>>>>
>>>>> How should we store such data and handle such requests effectively
>>>>> with Riak?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> --
>>>>> Andrew Kondratovich
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> riak-users mailing list
>>>>> [email protected]
>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Andrew Kondratovich
>>>
>>>
>>
>


-- 
Andrew Kondratovich
