[jira] [Commented] (MAPREDUCE-7241) FileInputFormat listStatus with less memory footprint
[ https://issues.apache.org/jira/browse/MAPREDUCE-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073229#comment-17073229 ] Zhihua Deng commented on MAPREDUCE-7241: Thanks for reviewing, [~jlowe]! > FileInputFormat listStatus with less memory footprint > - > > Key: MAPREDUCE-7241 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7241 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: job submission >Affects Versions: 2.6.1 >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Fix For: 3.4.0 > > Attachments: MAPREDUCE-7241.03.patch, MAPREDUCE-7241.04.patch, > MAPREDUCE-7241.05.patch, MAPREDUCE-7241.06.patch, > MAPREDUCE-7241.trunk.02.patch, MAPREDUCE-7241.trunk.patch, filestatus.png > > > This case sometimes sees in hive when user issues queries over all partitions > by mistakes. The file status cached when listing status could accumulate to > over 3g. After digging into the dumped memory, the LocatedBlock occupies > about 50%(sometimes over 60%) memory that retained by LocatedFileStatus, as > shows followed, > !filestatus.png! > Right now we only extract the block locations info from LocatedFileStatus, > the datanode infos(types) or block token are not taken into account. So there > is no need to cache LocatedBlock, as do like this: > BlockLocation[] blockLocations = dedup(stat.getBlockLocations()); > LocatedFileStatus shrink = new LocatedFileStatus(stat, blockLocations); > private static BlockLocation[] dup(BlockLocation[] blockLocations) { > BlockLocation[] copyLocs = new BlockLocation[blockLocations.length]; > int i = 0; > for (BlockLocation location : blockLocations) > { copyLocs[i++] = new BlockLocation(location); } > return copyLocs; > } > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-7241) FileInputFormat listStatus with less memory footprint
[ https://issues.apache.org/jira/browse/MAPREDUCE-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihua Deng updated MAPREDUCE-7241: --- Attachment: MAPREDUCE-7241.06.patch > FileInputFormat listStatus with less memory footprint > - > > Key: MAPREDUCE-7241 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7241 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: job submission >Affects Versions: 2.6.1 >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Fix For: 3.4.0 > > Attachments: MAPREDUCE-7241.03.patch, MAPREDUCE-7241.04.patch, > MAPREDUCE-7241.05.patch, MAPREDUCE-7241.06.patch, > MAPREDUCE-7241.trunk.02.patch, MAPREDUCE-7241.trunk.patch, filestatus.png > > > This case sometimes sees in hive when user issues queries over all partitions > by mistakes. The file status cached when listing status could accumulate to > over 3g. After digging into the dumped memory, the LocatedBlock occupies > about 50%(sometimes over 60%) memory that retained by LocatedFileStatus, as > shows followed, > !filestatus.png! > Right now we only extract the block locations info from LocatedFileStatus, > the datanode infos(types) or block token are not taken into account. So there > is no need to cache LocatedBlock, as do like this: > BlockLocation[] blockLocations = dedup(stat.getBlockLocations()); > LocatedFileStatus shrink = new LocatedFileStatus(stat, blockLocations); > private static BlockLocation[] dup(BlockLocation[] blockLocations) { > BlockLocation[] copyLocs = new BlockLocation[blockLocations.length]; > int i = 0; > for (BlockLocation location : blockLocations) > { copyLocs[i++] = new BlockLocation(location); } > return copyLocs; > } > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-7241) FileInputFormat listStatus with less memory footprint
[ https://issues.apache.org/jira/browse/MAPREDUCE-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072768#comment-17072768 ] Hudson commented on MAPREDUCE-7241: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18108 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18108/]) MAPREDUCE-7241. FileInputFormat listStatus with less memory footprint. (jlowe: rev c613296dc85ac7b22c171c84f578106b315cc012) * (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/LocatedFileStatusFetcher.java * (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestFileInputFormat.java * (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java * (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java > FileInputFormat listStatus with less memory footprint > - > > Key: MAPREDUCE-7241 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7241 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: job submission >Affects Versions: 2.6.1 >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Fix For: 3.4.0 > > Attachments: MAPREDUCE-7241.03.patch, MAPREDUCE-7241.04.patch, > MAPREDUCE-7241.05.patch, MAPREDUCE-7241.trunk.02.patch, > MAPREDUCE-7241.trunk.patch, filestatus.png > > > This case sometimes sees in hive when user issues queries over all partitions > by mistakes. The file status cached when listing status could accumulate to > over 3g. After digging into the dumped memory, the LocatedBlock occupies > about 50%(sometimes over 60%) memory that retained by LocatedFileStatus, as > shows followed, > !filestatus.png! > Right now we only extract the block locations info from LocatedFileStatus, > the datanode infos(types) or block token are not taken into account. So there > is no need to cache LocatedBlock, as do like this: > BlockLocation[] blockLocations = dedup(stat.getBlockLocations()); > LocatedFileStatus shrink = new LocatedFileStatus(stat, blockLocations); > private static BlockLocation[] dup(BlockLocation[] blockLocations) { > BlockLocation[] copyLocs = new BlockLocation[blockLocations.length]; > int i = 0; > for (BlockLocation location : blockLocations) > { copyLocs[i++] = new BlockLocation(location); } > return copyLocs; > } > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-7241) FileInputFormat listStatus with less memory footprint
[ https://issues.apache.org/jira/browse/MAPREDUCE-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Darrell Lowe updated MAPREDUCE-7241: -- Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the contribution, [~dengzh]! I committed this to trunk. > FileInputFormat listStatus with less memory footprint > - > > Key: MAPREDUCE-7241 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7241 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: job submission >Affects Versions: 2.6.1 >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Fix For: 3.4.0 > > Attachments: MAPREDUCE-7241.03.patch, MAPREDUCE-7241.04.patch, > MAPREDUCE-7241.05.patch, MAPREDUCE-7241.trunk.02.patch, > MAPREDUCE-7241.trunk.patch, filestatus.png > > > This case sometimes sees in hive when user issues queries over all partitions > by mistakes. The file status cached when listing status could accumulate to > over 3g. After digging into the dumped memory, the LocatedBlock occupies > about 50%(sometimes over 60%) memory that retained by LocatedFileStatus, as > shows followed, > !filestatus.png! > Right now we only extract the block locations info from LocatedFileStatus, > the datanode infos(types) or block token are not taken into account. So there > is no need to cache LocatedBlock, as do like this: > BlockLocation[] blockLocations = dedup(stat.getBlockLocations()); > LocatedFileStatus shrink = new LocatedFileStatus(stat, blockLocations); > private static BlockLocation[] dup(BlockLocation[] blockLocations) { > BlockLocation[] copyLocs = new BlockLocation[blockLocations.length]; > int i = 0; > for (BlockLocation location : blockLocations) > { copyLocs[i++] = new BlockLocation(location); } > return copyLocs; > } > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Assigned] (MAPREDUCE-7241) FileInputFormat listStatus with less memory footprint
[ https://issues.apache.org/jira/browse/MAPREDUCE-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Darrell Lowe reassigned MAPREDUCE-7241: - Assignee: Zhihua Deng > FileInputFormat listStatus with less memory footprint > - > > Key: MAPREDUCE-7241 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7241 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: job submission >Affects Versions: 2.6.1 >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Attachments: MAPREDUCE-7241.03.patch, MAPREDUCE-7241.04.patch, > MAPREDUCE-7241.05.patch, MAPREDUCE-7241.trunk.02.patch, > MAPREDUCE-7241.trunk.patch, filestatus.png > > > This case sometimes sees in hive when user issues queries over all partitions > by mistakes. The file status cached when listing status could accumulate to > over 3g. After digging into the dumped memory, the LocatedBlock occupies > about 50%(sometimes over 60%) memory that retained by LocatedFileStatus, as > shows followed, > !filestatus.png! > Right now we only extract the block locations info from LocatedFileStatus, > the datanode infos(types) or block token are not taken into account. So there > is no need to cache LocatedBlock, as do like this: > BlockLocation[] blockLocations = dedup(stat.getBlockLocations()); > LocatedFileStatus shrink = new LocatedFileStatus(stat, blockLocations); > private static BlockLocation[] dup(BlockLocation[] blockLocations) { > BlockLocation[] copyLocs = new BlockLocation[blockLocations.length]; > int i = 0; > for (BlockLocation location : blockLocations) > { copyLocs[i++] = new BlockLocation(location); } > return copyLocs; > } > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org