Can you try storing your file as bytes instead of Strings? I can't think of any reason why this would require 6 GB of heap space. Could you also explain your use case? That might help in suggesting some alternatives, if you are interested.
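As a rough, untested sketch of what I mean (assuming the tab-separated key/value layout from your snippet, UTF-8 data, and a made-up class name ByteLookupLoader), something like this keeps both sides of each entry as raw bytes instead of Strings:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;

public class ByteLookupLoader {

    // Load tab-separated key/value pairs, keeping both sides as UTF-8 bytes.
    // ByteBuffer is used as the map key only because byte[] does not have
    // value-based equals()/hashCode().
    public static Map<ByteBuffer, byte[]> load(String file) throws IOException {
        Map<ByteBuffer, byte[]> mapData = new HashMap<ByteBuffer, byte[]>(11000000);
        BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream(file)), StandardCharsets.UTF_8));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] data = line.split("\t");
                mapData.put(ByteBuffer.wrap(data[0].getBytes(StandardCharsets.UTF_8)),
                            data[1].getBytes(StandardCharsets.UTF_8));
            }
        } finally {
            reader.close();
        }
        return mapData;
    }

    // Look a key up the same way it was stored.
    public static byte[] get(Map<ByteBuffer, byte[]> mapData, String key) {
        return mapData.get(ByteBuffer.wrap(key.getBytes(StandardCharsets.UTF_8)));
    }
}

For ASCII data the UTF-8 bytes take roughly half the memory of a String's char[], and you avoid one String object header per value. Each entry still pays for the HashMap node and array headers, so it is not free, but it should be noticeably smaller. Again, this is only a sketch under those assumptions, not something I have tested against your data.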
Regards
Prav

On Tue, Mar 25, 2014 at 7:31 AM, Nivrutti Shinde <[email protected]> wrote:

> Yes, it is in the setup method. I am just reading the file, which is
> stored at HDFS.
>
>
> On Tuesday, 25 March 2014 12:01:08 UTC+5:30, Praveenesh Kumar wrote:
>
>> And I am guessing you are not doing this inside the map() method, right?
>> It is in the setup() method?
>>
>>
>> On Tue, Mar 25, 2014 at 6:05 AM, Nivrutti Shinde <[email protected]> wrote:
>>
>>> private Map<String, String> mapData =
>>>     new ConcurrentHashMap<String, String>(11000000);
>>> FileInputStream fis = new FileInputStream(file);
>>> GZIPInputStream gzipInputStream = new GZIPInputStream(fis);
>>> BufferedReader bufferedReader =
>>>     new BufferedReader(new InputStreamReader(gzipInputStream));
>>> String line = null;
>>> while ((line = bufferedReader.readLine()) != null) {
>>>     String data[] = line.split("\t");
>>>     mapData.put(data[0], data[1]);
>>> }
>>>
>>> On Monday, 24 March 2014 19:17:12 UTC+5:30, Praveenesh Kumar wrote:
>>>
>>>> Can you please share your code snippet? I just want to see how you are
>>>> loading your file into the mapper.
>>>>
>>>>
>>>> On Mon, Mar 24, 2014 at 1:15 PM, Nivrutti Shinde <[email protected]> wrote:
>>>>
>>>>> Thanks for your reply.
>>>>>
>>>>> Harsh,
>>>>>
>>>>> I tried THashMap but ran into the same issue.
>>>>>
>>>>> David,
>>>>>
>>>>> I tried the map-side join and Cascading approaches, but they take a
>>>>> lot of time.
>>>>>
>>>>> On Friday, 21 March 2014 12:03:28 UTC+5:30, Nivrutti Shinde wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have a use case where I am loading a 200 MB file with 11 million
>>>>>> records (each record is 12 characters long) into a map, so that
>>>>>> while running the Hadoop job I can quickly look up the value for the
>>>>>> key of each input record in the mapper.
>>>>>>
>>>>>> It is such a small file, but to load the data into the map I have to
>>>>>> allocate a 6 GB heap. When I run a small standalone application to
>>>>>> load this file, it requires 2 GB of memory.
>>>>>>
>>>>>> I don't understand why Hadoop requires 6 GB to load the data into
>>>>>> memory. The Hadoop job runs fine after that, but the number of
>>>>>> mappers I can run is 2. I need to get this done in 2-3 GB only, so I
>>>>>> can run at least 8-9 mappers per node.
>>>>>>
>>>>>> I have created a gzip file (which is now only 17 MB). I have kept
>>>>>> the file on HDFS and am using the HDFS API to read the file and load
>>>>>> the data into the map. The block size is 128 MB, on Cloudera Hadoop.
>>>>>>
>>>>>> Any help or alternate approaches to load the data into memory with a
>>>>>> minimum heap size, so I can run many mappers with 2-3 GB allocated
>>>>>> to each?
>>>>>>
>>>>>> Thanks
