What kind of disks are you using, SAS or SATA? How much of the CPU time is system and how much is user? You can also use jstack to check what the map tasks are doing. Are too many map tasks started on one node?

________________________________
Wangwenli
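One way to act on the jstack suggestion is to save a thread dump from a busy map task JVM (for example `jstack <pid> > dump.txt`) and count which methods sit on top of the RUNNABLE threads. The following is only a rough sketch: `hot_frames` and `SAMPLE_DUMP` are names made up for this illustration, the `com.example.*` frames are hypothetical, and the parsing assumes the usual HotSpot dump layout (a quoted thread name, a `java.lang.Thread.State:` line, then `at ...` frames).

```python
import re
from collections import Counter

# Patterns for the usual HotSpot thread dump layout (an assumption about the input).
THREAD_HEADER = re.compile(r'^"[^"]+"')
STATE_LINE = re.compile(r'^\s*java\.lang\.Thread\.State:\s+(?P<state>\w+)')
FRAME_LINE = re.compile(r'^\s*at\s+(?P<frame>[^\s(]+)')


def hot_frames(dump_text, depth=1):
    """Count the top `depth` stack frames of every RUNNABLE thread in a dump."""
    counts = Counter()
    state = None      # state of the thread currently being read
    frames_seen = 0   # frames already counted for that thread
    for line in dump_text.splitlines():
        if THREAD_HEADER.match(line):
            # A new thread starts: forget the previous thread's state.
            state, frames_seen = None, 0
            continue
        m = STATE_LINE.match(line)
        if m:
            state = m.group("state")
            continue
        m = FRAME_LINE.match(line)
        if m and state == "RUNNABLE" and frames_seen < depth:
            counts[m.group("frame")] += 1
            frames_seen += 1
    return counts


# Hypothetical dump, for illustration only. It is not taken from the cluster
# discussed in this thread.
SAMPLE_DUMP = """
"map-task-1" #11 prio=5 tid=0x01 nid=0x1a runnable [0x10]
   java.lang.Thread.State: RUNNABLE
    at com.example.CsvParser.parse(CsvParser.java:42)
    at com.example.LoadMapper.map(LoadMapper.java:17)

"map-task-2" #12 prio=5 tid=0x02 nid=0x1b runnable [0x20]
   java.lang.Thread.State: RUNNABLE
    at com.example.CsvParser.parse(CsvParser.java:42)
    at com.example.LoadMapper.map(LoadMapper.java:17)

"idle-worker" #13 prio=5 tid=0x03 nid=0x1c in Object.wait() [0x30]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
"""

if __name__ == "__main__":
    for frame, n in hot_frames(SAMPLE_DUMP).most_common():
        print(n, frame)
```

If the same few frames dominate across several dumps taken a few seconds apart, that is a hint about where the mappers spend their CPU time. A single dump is only a snapshot, so it should not be over-interpreted.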
From: Bulvik, Noam <[email protected]>
Date: 2015-01-07 21:29
To: [email protected]
Subject: RE: high CPU when using bulk loading

Only when doing bulk loading, and only during the mapping phase.

-----Original Message-----
From: Puneet Kumar Ojha [[email protected]]
Received: Wednesday, 07 Jan 2015, 15:03
To: [email protected] [[email protected]]
Subject: RE: high CPU when using bulk loading

Is the CPU usage at 100% all the time, or only while doing bulk loading?

From: Bulvik, Noam [mailto:[email protected]]
Sent: Wednesday, January 07, 2015 6:26 PM
To: [email protected]
Subject: high CPU when using bulk loading

Hi,

We are tuning our system for bulk loading. We managed to load ~250M records per hour (~96G of raw input CSV data) on a cluster with 8 nodes. We use the MR bulk loading tool with a pre-split table and salted keys.

What we currently see is that while the mappers are working, we have 100% CPU usage across the cluster. It was our impression that the mappers would be I/O bound rather than CPU intensive.

Any idea what else we can tune or check?

Regards,
Noam
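As a rough sanity check on the I/O-bound expectation, the figures quoted in the original message (~250M records per hour, ~96G of CSV, 8 nodes) can be converted into per-node rates. This is a minimal sketch under two assumptions that the thread does not state: that "96G" means 96 * 2^30 bytes, and that the load is spread evenly across the nodes. The function name `per_node_rates` is made up for this illustration.

```python
def per_node_rates(records_per_hour, input_bytes_per_hour, nodes):
    """Convert hourly cluster-wide totals into per-node, per-second rates."""
    seconds_per_hour = 3600
    return {
        "records_per_sec_per_node": records_per_hour / seconds_per_hour / nodes,
        "mib_per_sec_per_node": input_bytes_per_hour / seconds_per_hour / nodes / 2**20,
        "avg_record_bytes": input_bytes_per_hour / records_per_hour,
    }


# Figures from the thread; the byte interpretation of "96G" is an assumption.
rates = per_node_rates(
    records_per_hour=250e6,
    input_bytes_per_hour=96 * 2**30,
    nodes=8,
)

if __name__ == "__main__":
    for name, value in rates.items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions each node reads only a few MiB of raw input per second, so reading the CSV input by itself seems unlikely to keep the disks busy. The sketch does not account for map output, sorting, or writes, so it says nothing definitive about total I/O.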
