Re: Do we support concatenated/splittable bzip2 files in branch-1?
Hi Harsh,

Sorry for the slightly late response; I've been busy with other work these days. I have posted my test steps and results on HADOOP-7386. If my testing approach is correct, I think concatenated BZip2 file support is implemented and already in branch-1. I also did some sanity testing and confirmed that BZip2 splitting support is in branch-1 as well. Please let me know if you have any comments, thanks.
-- Best Regards, Li Yu

On 4 December 2012 12:07, Harsh J ha...@cloudera.com wrote:
Thanks Yu, I'd appreciate it if you could post your observations on https://issues.apache.org/jira/browse/HADOOP-7386.
-- Harsh J

On Mon, Dec 3, 2012 at 9:22 PM, Yu Li car...@gmail.com wrote:
Hi Harsh, Thanks a lot for the information! My fault for not looking into HADOOP-4012 carefully. I will try to verify whether HADOOP-7823 has resolved the issue on both the write and read sides, and report back.
-- Best Regards, Li Yu

On 3 December 2012 19:42, Harsh J ha...@cloudera.com wrote:
Hi Yu Li, The JIRA HADOOP-7823 backported support for splitting Bzip2 files, plus MR support for it, into branch-1, and it is already available in the 1.1.x releases currently out. Concatenated Bzip2 files, i.e., HADOOP-7386, are not implemented yet (AFAIK), but Chris over on HADOOP-6335 suggests that HADOOP-4012 may have fixed it, so can you try it and report back?
-- Harsh J

On Mon, Dec 3, 2012 at 3:19 PM, Yu Li car...@gmail.com wrote:
Dear all, About splitting support for bzip2: I checked the JIRA list and found HADOOP-7386 marked as Won't Fix. I also found some work done in branch-0.21 (also in trunk), e.g. HADOOP-4012 and MAPREDUCE-830, that was not integrated/migrated into branch-1, so I guess we don't support concatenated bzip2 in branch-1, correct? If so, is there any special reason? Many thanks!
-- Best Regards, Li Yu
About FileBasedGroupMapping provider and Virtual Groups
Hi everyone,

Before I open a JIRA, I'd like to know what you think of a file-based group mapping provider. The idea is as follows.

1. Add a new user group mapping provider, such as FileBasedGroupMapping, which consumes a mapping file like the one below:

$HADOOP_CONF/groupsMapping.txt:
group1:user1,user2
group2:user3,user4
groupX:user5 group1
groupY:user6 group2
...

According to this file, the provider will resolve the group list for each user as:

user1 -> group1,groupX  #same for user2
user3 -> group2,groupY  #same for user4
user5 -> groupX
user6 -> groupY

Note that user1 gets group1 directly from the mapping file above; then, since group1 belongs to groupX, user1 must also belong to groupX, so groupX is also one of user1's groups.

2. So what are the benefits?

1) It opens the door to role-based access control for Hadoop. As you can see, in the mapping file we can define virtual groups (or roles) like groupX and groupY to hold users and other groups. Such virtual groups can be used just like real groups: for example, assigned to an HDFS file as its owner group, added to an MR queue-level ACL list, or, in HBase/Hive, granted privileges on databases and tables.

2) It makes it possible for HDFS to allow users from more than one group to read/write a file/folder while disallowing everyone else. For example, if we want to allow only user1 plus the users in group1 and group2 to read/write /data/secure, we can define a virtual group in the mapping file as secureGroup:user1 group1,group2, then chgrp the folder to secureGroup, and chmod the folder with g+rw.

3) As noted above, this makes real sense and is not just an attempt to resolve a corner case. As you may know, Hive supports HDFS as backend storage, as well as role-based access control. Using Hive, one can create a database and then grant some users/groups/roles the CREATE privilege on it. After that, a granted user (granted directly or via a granted group or role) runs a command to create a table in that database. The command can pass the access control check in Hive but still fail in HDFS when Hive tries to create a file for the table in the database folder on the user's behalf, simply because the user doesn't have write permission on the folder! This provider makes such issues easy to resolve.

4) Minor, but very convenient: we can use this mapping file and provider to define some users and groups for test purposes, when we don't want to involve ShellBasedGroupMapping or LdapGroupMapping.

Thanks for your feedback!
Kai
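The resolution rule in point 1 (a user belongs to every group that lists them, plus every group that transitively contains one of those groups) can be sketched in plain Java. This is a minimal sketch, not an implementation of the proposal: the class name FileBasedGroupMappingSketch is hypothetical, and the line format `group:users[ memberGroups]` is inferred from the examples above (first whitespace-separated field is a comma-separated user list, optional second field is a comma-separated list of nested groups). A real provider would presumably plug in through Hadoop's `org.apache.hadoop.security.GroupMappingServiceProvider` interface and the `hadoop.security.group.mapping` setting; that wiring is omitted here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of the proposed FileBasedGroupMapping resolution logic.
 * Assumed line format (inferred from the examples): group:user1,user2[ subgroup1,subgroup2]
 */
final class FileBasedGroupMappingSketch {
    // group -> users listed directly in that group
    private final Map<String, Set<String>> usersByGroup = new LinkedHashMap<>();
    // group -> groups nested directly inside that group
    private final Map<String, Set<String>> memberGroupsByGroup = new LinkedHashMap<>();

    FileBasedGroupMappingSketch(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("Malformed mapping line: " + raw);
            }
            String group = line.substring(0, colon).trim();
            // First field: users; optional second field: member groups.
            String[] fields = line.substring(colon + 1).trim().split("\\s+", 2);
            // A later line for the same group replaces an earlier one.
            usersByGroup.put(group, splitCommaList(fields[0]));
            memberGroupsByGroup.put(group,
                    fields.length > 1 ? splitCommaList(fields[1]) : new LinkedHashSet<>());
        }
    }

    private static Set<String> splitCommaList(String list) {
        Set<String> items = new LinkedHashSet<>();
        for (String item : list.split(",")) {
            String trimmed = item.trim();
            if (!trimmed.isEmpty()) {
                items.add(trimmed);
            }
        }
        return items;
    }

    /** Direct groups of the user, plus every group that transitively contains one of them. */
    List<String> getGroups(String user) {
        Set<String> resolved = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        for (Map.Entry<String, Set<String>> e : usersByGroup.entrySet()) {
            if (e.getValue().contains(user)) {
                pending.add(e.getKey());
            }
        }
        while (!pending.isEmpty()) {
            String group = pending.poll();
            if (!resolved.add(group)) {
                continue; // already visited; this also stops cycles such as "a:u b" + "b:v a"
            }
            for (Map.Entry<String, Set<String>> e : memberGroupsByGroup.entrySet()) {
                if (e.getValue().contains(group)) {
                    pending.add(e.getKey()); // e.getKey() contains 'group', so the user is in it too
                }
            }
        }
        return new ArrayList<>(resolved);
    }
}
```

Loaded with the four example lines (for instance via `Files.readAllLines` on the mapping file), `getGroups("user1")` returns `[group1, groupX]` and `getGroups("user6")` returns `[groupY]`. Adding `secureGroup:user1 group1,group2` would also put user2, user3 and user4 into secureGroup through their nested groups. Under this reading of the format, a line whose only entries are nested groups cannot be told apart from a line of users, which is one detail the JIRA would need to pin down.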
[jira] [Resolved] (HADOOP-7386) Support concatenated bzip2 files
[ https://issues.apache.org/jira/browse/HADOOP-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HADOOP-7386.
Resolution: Duplicate

Thanks for confirming! Resolving as dupe.

Support concatenated bzip2 files
Key: HADOOP-7386
URL: https://issues.apache.org/jira/browse/HADOOP-7386
Project: Hadoop Common
Issue Type: Improvement
Reporter: Allen Wittenauer
Assignee: Karthik Kambatla

HADOOP-6835 added the framework and direct support for concatenated gzip files. We should do the same for bzip files.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
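To make "concatenated" concrete: such a file is several independently compressed members appended byte-for-byte (as `cat a.gz b.gz > ab.gz` would produce), and a supporting reader decompresses them as one continuous stream. The JDK ships no bzip2 codec, so this sketch swaps in gzip, the format HADOOP-6835 covers, and uses only `java.util.zip`; the class ConcatenatedGzipDemo is hypothetical and does not exercise Hadoop's codecs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical demo of concatenated compressed members, using gzip as a stand-in for bzip2. */
final class ConcatenatedGzipDemo {
    /** Compresses text as one complete gzip member (header, deflate data, trailer). */
    static byte[] gzipMember(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } // close() finishes the member and writes its trailer
        return bytes.toByteArray();
    }

    /** Appends members byte-for-byte, like `cat a.gz b.gz > ab.gz`. */
    static byte[] concatenate(byte[]... members) {
        ByteArrayOutputStream joined = new ByteArrayOutputStream();
        for (byte[] member : members) {
            joined.write(member, 0, member.length);
        }
        return joined.toByteArray();
    }

    /** Decompresses the whole input; the JDK's GZIPInputStream continues into following members. */
    static String decompressAll(byte[] data) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Decompressing `concatenate(gzipMember("first\n"), gzipMember("second\n"))` yields `"first\nsecond\n"`. A reader that stopped at the first member's trailer would return only `"first\n"`, which is the kind of silent truncation that concatenated-file support is meant to prevent; a bzip2 reader would presumably need the analogous behavior of starting a new stream after each end-of-stream marker.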
Re: [DISCUSS] create a hadoop-build subproject (a follow up on the thread 'introduce Python as build-time...')
On Fri, Dec 7, 2012 at 5:31 PM, Radim Kolar h...@filez.com wrote:
1. cmake and protoc maven plugins already exist. Why do you want to write new ones?

This has already been discussed; see https://groups.google.com/forum/?fromgroups=#!topic/cmake-maven-project-users/5FpfUHmg5Ho

Actually, the situation is even worse than it might seem from that thread, since it turns out that com.googlecode.cmake-maven-project has no support for any platforms but Windows. It also has no support for running native unit tests, which is a big motivation behind HADOOP-8887.

2. Groovy accepts Java syntax. Just rewrite saveVersion.sh in Java (it's already done in JIRA) and put it in pom.xml; no overhaul of the build infrastructure is needed.

Part of the reason for this thread is to come up with a solution for both branch-1 and later branches. Putting all the logic into a pom.xml file would not accomplish that, since branch-1 doesn't use Maven.

best,
Colin
[jira] [Created] (HADOOP-9128) MetricsDynamicMBeanBase can cause high cpu load
Nate Putnam created HADOOP-9128:

Summary: MetricsDynamicMBeanBase can cause high cpu load
Key: HADOOP-9128
URL: https://issues.apache.org/jira/browse/HADOOP-9128
Project: Hadoop Common
Issue Type: Bug
Components: metrics
Affects Versions: 0.20.2
Environment: cdh3u4
Reporter: Nate Putnam

I noticed high load on some of our Hadoop services. On closer inspection, we found several threads that were consuming CPU (80-90% of a core) doing:

java.lang.Thread.State: RUNNABLE
    at java.util.HashMap.get(HashMap.java:303)
    at org.apache.hadoop.metrics.util.MetricsDynamicMBeanBase.getAttribute(MetricsDynamicMBeanBase.java:135)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)

This is a known issue in Java when a non-thread-safe java.util.HashMap is used from multiple threads: unsynchronized concurrent modification can corrupt the map's internal structure, after which get() may loop indefinitely.
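One common remedy for this access pattern, sketched here under the assumption that the MBean's attributes live in a name-keyed map that JMX threads read while other threads register entries, is to back that map with `java.util.concurrent.ConcurrentHashMap`. The class MetricsAttributeRegistry and its methods are hypothetical; this is not Hadoop's MetricsDynamicMBeanBase and not necessarily the fix committed for HADOOP-9128. It also throws IllegalArgumentException instead of JMX's AttributeNotFoundException to stay self-contained.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

/** Hypothetical attribute registry; not Hadoop's MetricsDynamicMBeanBase. */
final class MetricsAttributeRegistry {
    // ConcurrentHashMap supports reads that run concurrently with writes without
    // external locking; a plain HashMap under the same pattern can be corrupted.
    private final ConcurrentMap<String, Supplier<Object>> attributes = new ConcurrentHashMap<>();

    /** Registers or replaces the value source for an attribute. */
    void register(String name, Supplier<Object> source) {
        attributes.put(name, source);
    }

    /** Safe to call from many JMX threads while other threads register attributes. */
    Object getAttribute(String name) {
        Supplier<Object> source = attributes.get(name);
        if (source == null) {
            throw new IllegalArgumentException("Unknown attribute: " + name);
        }
        return source.get();
    }

    /** Number of registered attributes. */
    int size() {
        return attributes.size();
    }
}
```

Wrapping the existing map with `Collections.synchronizedMap` would also remove the data race, but every JMX read would then contend on a single lock; ConcurrentHashMap avoids that for a read-heavy map such as this one.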
[jira] [Created] (HADOOP-9129) ViewFs does not validate internal names in the mount table
Chris Nauroth created HADOOP-9129:

Summary: ViewFs does not validate internal names in the mount table
Key: HADOOP-9129
URL: https://issues.apache.org/jira/browse/HADOOP-9129
Project: Hadoop Common
Issue Type: Bug
Components: viewfs
Affects Versions: 3.0.0
Reporter: Chris Nauroth

Currently, there is no explicit validation of {{ViewFs}} internal names in the mount table during initialization.
[jira] [Created] (HADOOP-9130) TestMapReduceChildJVM fails in branch-trunk-win
Chris Nauroth created HADOOP-9130:

Summary: TestMapReduceChildJVM fails in branch-trunk-win
Key: HADOOP-9130
URL: https://issues.apache.org/jira/browse/HADOOP-9130
Project: Hadoop Common
Issue Type: Bug
Components: test
Affects Versions: trunk-win
Reporter: Chris Nauroth
Assignee: Chris Nauroth

The YARN-233 patch for getting YARN working on Windows forgot to include a corresponding change in {{TestMapReduceChildJVM}}, so the test is failing now.