Hi Harsh,
Thanks for moving the post to the correct list.

William

On Wed, Feb 13, 2013 at 12:29 AM, Harsh J <[email protected]> wrote:
> Please do not use the general@ lists for any user-oriented questions.
> Please redirect them to [email protected] lists, which is where
> the user community and questions lie.
>
> I've moved your post there and have added you on CC in case you
> haven't subscribed there. Please reply back only to the user@
> addresses. The general@ list is for Apache Hadoop project-level
> management and release-oriented discussions alone.
>
> On Wed, Feb 13, 2013 at 10:54 AM, William Kang <[email protected]> wrote:
>> Hi All,
>> I am trying to figure out a good solution for the following scenario.
>>
>> 1. I have a 2T file (let's call it A) filled with key/value pairs,
>> stored in HDFS with the default 64M block size. In A, each key is
>> less than 1K and each value is about 20M.
>>
>> 2. Occasionally, I run an analysis using a different type of data
>> (usually less than 10G; let's call it B) and perform lookup-table-like
>> operations using the values in A. B resides in HDFS as well.
>>
>> 3. This analysis requires loading only a small number of values from
>> A (usually fewer than 1000) into memory for fast lookups against the
>> data in B. B finds the few values it needs by looking up their keys
>> in A.
>>
>> Is there an efficient way to do this?
>>
>> I was thinking that if I could identify the locality of the blocks
>> that contain the few values, I might be able to push B to the few
>> nodes that hold those values of A. Since I only need to do this
>> occasionally, maintaining a distributed database such as HBase can't
>> be justified.
>>
>> Many thanks.
>>
>>
>> Cao
>
>
>
> --
> Harsh J
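
For reference, a minimal sketch of the block-locality idea in the question
above, using the standard FileSystem#getFileBlockLocations call. It assumes
the byte offset of each needed value inside A is already known, e.g. from a
side index mapping key to offset (not shown); the path /data/A and the
offsets are hypothetical:

  import java.util.Arrays;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.BlockLocation;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ValueLocality {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);

      // Hypothetical path to the 2T key/value file A.
      Path a = new Path("/data/A");
      FileStatus status = fs.getFileStatus(a);

      // Byte offset of one ~20M value inside A; in practice this would
      // come from a key -> offset side index built once over A.
      long offset = 128L * 1024 * 1024; // hypothetical offset
      long length = 20L * 1024 * 1024;  // roughly one value, per the sizes above

      // Ask the NameNode which blocks cover that byte range and which
      // datanodes hold replicas of them.
      BlockLocation[] locations = fs.getFileBlockLocations(status, offset, length);
      for (BlockLocation loc : locations) {
        System.out.println("block offset=" + loc.getOffset()
            + " length=" + loc.getLength()
            + " hosts=" + Arrays.toString(loc.getHosts()));
      }
    }
  }

With the host lists in hand, one way to "push B to those nodes" would be a
custom InputFormat whose getSplits() reports those hosts as the preferred
locations for its splits, which is the same mechanism MapReduce uses for
ordinary data locality. That part is only an idea, not shown here.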
