Can you try storing your file as bytes instead of Strings? I can't think of any reason why this would require 6 GB of heap space. Could you also explain your use case? That might help in suggesting some alternatives, if you are interested.
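As a rough, untested sketch of what I mean (assuming the tab-separated key/value layout from your snippet, UTF-8 data, and a made-up class name ByteLookupLoader), something like this keeps both sides of each entry as raw bytes instead of Strings:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;

public class ByteLookupLoader {

    // Load tab-separated key/value pairs, keeping both sides as UTF-8 bytes.
    // ByteBuffer is used as the map key only because byte[] does not have
    // value-based equals()/hashCode().
    public static Map<ByteBuffer, byte[]> load(String file) throws IOException {
        Map<ByteBuffer, byte[]> mapData = new HashMap<ByteBuffer, byte[]>(11000000);
        BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream(file)), StandardCharsets.UTF_8));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] data = line.split("\t");
                mapData.put(ByteBuffer.wrap(data[0].getBytes(StandardCharsets.UTF_8)),
                            data[1].getBytes(StandardCharsets.UTF_8));
            }
        } finally {
            reader.close();
        }
        return mapData;
    }

    // Look a key up the same way it was stored.
    public static byte[] get(Map<ByteBuffer, byte[]> mapData, String key) {
        return mapData.get(ByteBuffer.wrap(key.getBytes(StandardCharsets.UTF_8)));
    }
}

For ASCII data the UTF-8 bytes take roughly half the memory of a String's char[], and you avoid one String object header per value. Each entry still pays for the HashMap node and array headers, so it is not free, but it should be noticeably smaller. Again, this is only a sketch under those assumptions, not something I have tested against your data.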
Regards
Prav

On Tue, Mar 25, 2014 at 7:31 AM, Nivrutti Shinde <[email protected]> wrote:

> Yes, it is in the setup method. I am just reading the file, which is
> stored at HDFS.
>
>
> On Tuesday, 25 March 2014 12:01:08 UTC+5:30, Praveenesh Kumar wrote:
>
>> And I am guessing you are not doing this inside the map() method, right?
>> It is in the setup() method?
>>
>>
>> On Tue, Mar 25, 2014 at 6:05 AM, Nivrutti Shinde <[email protected]> wrote:
>>
>>> private Map<String, String> mapData =
>>>     new ConcurrentHashMap<String, String>(11000000);
>>> FileInputStream fis = new FileInputStream(file);
>>> GZIPInputStream gzipInputStream = new GZIPInputStream(fis);
>>> BufferedReader bufferedReader =
>>>     new BufferedReader(new InputStreamReader(gzipInputStream));
>>> String line = null;
>>> while ((line = bufferedReader.readLine()) != null) {
>>>     String data[] = line.split("\t");
>>>     mapData.put(data[0], data[1]);
>>> }
>>>
>>> On Monday, 24 March 2014 19:17:12 UTC+5:30, Praveenesh Kumar wrote:
>>>
>>>> Can you please share your code snippet? I just want to see how you are
>>>> loading your file into the mapper.
>>>>
>>>>
>>>> On Mon, Mar 24, 2014 at 1:15 PM, Nivrutti Shinde <[email protected]> wrote:
>>>>
>>>>> Thanks for your reply.
>>>>>
>>>>> Harsh,
>>>>>
>>>>> I tried THashMap but ran into the same issue.
>>>>>
>>>>> David,
>>>>>
>>>>> I tried the map-side join and Cascading approaches, but they take a
>>>>> lot of time.
>>>>>
>>>>> On Friday, 21 March 2014 12:03:28 UTC+5:30, Nivrutti Shinde wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have a use case where I am loading a 200 MB file with 11 million
>>>>>> records (each record is 12 characters long) into a map, so that
>>>>>> while running the Hadoop job I can quickly look up the value for the
>>>>>> key of each input record in the mapper.
>>>>>>
>>>>>> It is such a small file, but to load the data into the map I have to
>>>>>> allocate a 6 GB heap. When I run a small standalone application to
>>>>>> load this file, it requires 2 GB of memory.
>>>>>>
>>>>>> I don't understand why Hadoop requires 6 GB to load the data into
>>>>>> memory. The Hadoop job runs fine after that, but the number of
>>>>>> mappers I can run is 2. I need to get this done in 2-3 GB only, so I
>>>>>> can run at least 8-9 mappers per node.
>>>>>>
>>>>>> I have created a gzip file (which is now only 17 MB). I have kept
>>>>>> the file on HDFS and am using the HDFS API to read the file and load
>>>>>> the data into the map. The block size is 128 MB, on Cloudera Hadoop.
>>>>>>
>>>>>> Any help or alternate approaches to load the data into memory with a
>>>>>> minimum heap size, so I can run many mappers with 2-3 GB allocated
>>>>>> to each?
>>>>>>
>>>>>> Thanks
