Re: Do we support concatenated/splittable bzip2 files in branch-1?

2012-12-10 Thread Yu Li
Hi Harsh,

Sorry for the slightly late response; I have been busy with other work these
days. I have pasted my test steps and results onto HADOOP-7386, and if my
testing approach is correct, I think concatenated BZip2 file support is
implemented and already in branch-1. I also did some sanity testing and
confirmed that BZip2 splitting support is also in branch-1. Please let me know
if you have any comments, thanks.

On 4 December 2012 12:07, Harsh J ha...@cloudera.com wrote:

 Thanks Yu, I would appreciate it if you could post your observations on
 https://issues.apache.org/jira/browse/HADOOP-7386.

 On Mon, Dec 3, 2012 at 9:22 PM, Yu Li car...@gmail.com wrote:
  Hi Harsh,
 
  Thanks a lot for the information!
 
  My fault for not looking into HADOOP-4012 carefully; I will try to verify
  whether HADOOP-7823 has resolved the issue on both the write and read sides,
  and report back.
 
  On 3 December 2012 19:42, Harsh J ha...@cloudera.com wrote:
 
  Hi Yu Li,
 
  The JIRA HADOOP-7823 backported support for splitting Bzip2 files plus
  MR support for it, into branch-1, and it is already available in the
  1.1.x releases out currently.
 
  Concatenated Bzip2 files, i.e., HADOOP-7386, is not implemented yet
  (AFAIK), but Chris over HADOOP-6335 suggests that HADOOP-4012 may have
  fixed it - so can you try and report back?
 
  On Mon, Dec 3, 2012 at 3:19 PM, Yu Li car...@gmail.com wrote:
   Dear all,
  
   Regarding splitting support for bzip2, I checked the JIRA list and found
   HADOOP-7386 marked as Won't Fix; I also found some work done in
   branch-0.21 (also in trunk), namely HADOOP-4012 and MAPREDUCE-830, that was
   not integrated/migrated into branch-1, so I guess we don't support
   concatenated bzip2 in branch-1, correct? If so, is there any special
   reason? Many thanks!
  
   --
   Best Regards,
   Li Yu
 
 
 
  --
  Harsh J
 
 
 
 
  --
  Best Regards,
  Li Yu



 --
 Harsh J




-- 
Best Regards,
Li Yu


About FileBasedGroupMapping provider and Virtual Groups

2012-12-10 Thread Zheng, Kai
Hi everyone,

Before I open a JIRA, I'd like to hear what you think of a file-based group
mapping provider. The idea is as follows.
1. Add a new user group mapping provider, such as FileBasedGroupMapping, which
consumes a mapping file like the one below:
$HADOOP_CONF/groupsMapping.txt:
group1:user1,user2
group2:user3,user4
groupX:user5 group1
groupY:user6 group2
...
According to this file, the provider resolves the group list for each user as:
user1-group1,groupX #same for user2
user3-group2,groupY #same for user4
user5-groupX
user6-groupY
Note that user1 gets group1 directly from the mapping file above; then, since
group1 belongs to groupX, user1 must also belong to groupX, so groupX is also
one of user1's groups.

2. What are the benefits?
1) It opens a door to role-based access control for Hadoop. As you can see, in
the mapping file we can define virtual groups (or roles) like groupX and groupY
to hold users and other groups. Such virtual groups can be used just like real
groups: for example, assigned to an HDFS file as its owner group, added to an
MR queue-level ACL list, or, in HBase/Hive, granted privileges on databases and
tables.
2) It makes it possible in HDFS to allow users from more than one group to
read/write a file/folder while disallowing everyone else. For example, if we
want to allow only user1 plus the users in group1 and group2 to read/write
/data/secure, we can define a virtual group in the mapping file as
secureGroup:user1 group1,group2, then chgrp the folder to secureGroup and
chmod the folder with g+rw.
3) As said above, this makes sense in general and does not just resolve a
corner case. As you may know, Hive supports HDFS as backend storage, and also
role-based access control. Using Hive, one can create a database and then grant
some users/groups/roles the CREATE privilege on it. After that, a granted user
(granted directly or via a granted group or role) runs a command to create a
table in that database. It can pass the access control check in Hive but may
still be rejected by HDFS when Hive tries to create a file for the table in the
database folder on behalf of the user, simply because the user has no write
permission on the folder! Such issues can be resolved easily using this
provider.
4) Minor but very convenient: we can use this mapping file and provider to
define some users and groups for test purposes, when we don't want to involve
ShellBasedGroupMapping or LdapGroupMapping.
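
The lookup described in point 1 could be sketched roughly as below. This is only an illustration under stated assumptions, not an existing Hadoop class: the name FileBasedGroupMappingSketch, its constructor and its getGroups method are hypothetical, and the line format is assumed to be "group:user1,user2 memberGroup1,memberGroup2" (comma-separated users, then optionally a space and comma-separated member groups), as in the sample file. A line that lists member groups but no users is ambiguous in this format and is not handled here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the proposed provider; not part of Hadoop. */
final class FileBasedGroupMappingSketch {
    // group -> users listed directly in that group
    private final Map<String, Set<String>> users = new LinkedHashMap<>();
    // group -> groups listed as members of that group
    private final Map<String, Set<String>> memberGroups = new LinkedHashMap<>();

    /** Parses lines assumed to look like "group:user1,user2 group1,group2". */
    FileBasedGroupMappingSketch(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // skip blank or malformed lines
            }
            String group = line.substring(0, colon).trim();
            String[] parts = line.substring(colon + 1).trim().split("\\s+", 2);
            users.put(group, splitCsv(parts[0]));
            memberGroups.put(group,
                parts.length > 1 ? splitCsv(parts[1]) : new LinkedHashSet<>());
        }
    }

    private static Set<String> splitCsv(String csv) {
        Set<String> out = new LinkedHashSet<>();
        for (String token : csv.split(",")) {
            String name = token.trim();
            if (!name.isEmpty()) {
                out.add(name);
            }
        }
        return out;
    }

    /** Returns the user's direct groups plus every group that contains them. */
    List<String> getGroups(String user) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        for (Map.Entry<String, Set<String>> e : users.entrySet()) {
            if (e.getValue().contains(user) && result.add(e.getKey())) {
                pending.add(e.getKey());
            }
        }
        // Walk upward through enclosing groups; the result set doubles as a
        // visited set, so cyclic definitions cannot loop forever.
        while (!pending.isEmpty()) {
            String member = pending.poll();
            for (Map.Entry<String, Set<String>> e : memberGroups.entrySet()) {
                if (e.getValue().contains(member) && result.add(e.getKey())) {
                    pending.add(e.getKey());
                }
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        FileBasedGroupMappingSketch mapping =
            new FileBasedGroupMappingSketch(Arrays.asList(
                "group1:user1,user2",
                "group2:user3,user4",
                "groupX:user5 group1",
                "groupY:user6 group2"));
        System.out.println(mapping.getGroups("user1")); // [group1, groupX]
        System.out.println(mapping.getGroups("user5")); // [groupX]
    }
}
```

A real provider would presumably plug into Hadoop's group mapping extension point and reload the file on change; both are left out of this sketch.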

Thanks for your feedback!

Kai


[jira] [Resolved] (HADOOP-7386) Support concatenated bzip2 files

2012-12-10 Thread Harsh J (JIRA)

 [ https://issues.apache.org/jira/browse/HADOOP-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HADOOP-7386.
-

Resolution: Duplicate

Thanks for confirming! Resolving as dupe.

 Support concatenated bzip2 files
 

 Key: HADOOP-7386
 URL: https://issues.apache.org/jira/browse/HADOOP-7386
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Allen Wittenauer
Assignee: Karthik Kambatla

 HADOOP-6835 added the framework and direct support for concatenated gzip 
 files.  We should do the same for bzip files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [DISCUSS] create a hadoop-build subproject (a follow up on the thread 'introduce Python as build-time...')

2012-12-10 Thread Colin McCabe
On Fri, Dec 7, 2012 at 5:31 PM, Radim Kolar h...@filez.com wrote:
 1. cmake and protoc maven plugins already exist. Why do you want to write
 new ones?

This has already been discussed; see
https://groups.google.com/forum/?fromgroups=#!topic/cmake-maven-project-users/5FpfUHmg5Ho

Actually the situation is even worse than it might seem from that
thread, since it turns out that com.googlecode.cmake-maven-project has
no support for any platforms but Windows.  It also has no support for
running native unit tests, which is a big motivation behind
HADOOP-8887.

 2. Groovy accepts Java syntax. Just rewrite saveVersion.sh in Java (it's
 already done in JIRA) and put it in pom.xml - no overhaul of the build
 infrastructure needed.

Part of the reason for this thread is so that we can come up with a
solution for both branch-1 and later branches.  This would not be
accomplished by putting all the logic into a pom.xml file, since
branch-1 doesn't use Maven.

best,
Colin


[jira] [Created] (HADOOP-9128) MetricsDynamicMBeanBase can cause high cpu load

2012-12-10 Thread Nate Putnam (JIRA)
Nate Putnam created HADOOP-9128:
---

 Summary: MetricsDynamicMBeanBase can cause high cpu load
 Key: HADOOP-9128
 URL: https://issues.apache.org/jira/browse/HADOOP-9128
 Project: Hadoop Common
  Issue Type: Bug
  Components: metrics
Affects Versions: 0.20.2
 Environment: cdh3u4
Reporter: Nate Putnam


I noticed high load on some of our Hadoop services. On closer inspection we
found several threads that were each consuming CPU (80-90% of a core) in the
following stack:

java.lang.Thread.State: RUNNABLE
at java.util.HashMap.get(HashMap.java:303)
at org.apache.hadoop.metrics.util.MetricsDynamicMBeanBase.getAttribute(MetricsDynamicMBeanBase.java:135)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)

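This is a known issue in Java when a non-thread-safe HashMap is used from
multiple threads: unsynchronized concurrent modification can corrupt the map's
internal structure and leave get() spinning in a loop.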
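
One common way to avoid this class of problem is to back the attribute lookup with a ConcurrentHashMap instead of a plain HashMap. The sketch below only illustrates that idea. It is not the actual MetricsDynamicMBeanBase code or the fix adopted for this issue, and the class ThreadSafeAttributeRegistrySketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical attribute registry; not the Hadoop metrics implementation. */
final class ThreadSafeAttributeRegistrySketch {
    // ConcurrentHashMap tolerates concurrent readers and writers without
    // external locking. Note that it rejects null keys and null values.
    private final Map<String, Object> attributes = new ConcurrentHashMap<>();

    void register(String name, Object value) {
        attributes.put(name, value);
    }

    Object getAttribute(String name) {
        Object value = attributes.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Unknown attribute: " + name);
        }
        return value;
    }

    int size() {
        return attributes.size();
    }

    /** Registers and reads distinct attributes from several threads at once. */
    static int runConcurrently(int threads, int perThread) {
        ThreadSafeAttributeRegistrySketch registry =
            new ThreadSafeAttributeRegistrySketch();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    String name = "metric-" + id + "-" + i;
                    registry.register(name, i);
                    registry.getAttribute(name);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
        return registry.size();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 1000)); // 4000
    }
}
```

Synchronizing every access to the existing HashMap would also work; the concurrent map simply avoids serializing the JMX reader threads.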



[jira] [Created] (HADOOP-9129) ViewFs does not validate internal names in the mount table

2012-12-10 Thread Chris Nauroth (JIRA)
Chris Nauroth created HADOOP-9129:
-

 Summary: ViewFs does not validate internal names in the mount table
 Key: HADOOP-9129
 URL: https://issues.apache.org/jira/browse/HADOOP-9129
 Project: Hadoop Common
  Issue Type: Bug
  Components: viewfs
Affects Versions: 3.0.0
Reporter: Chris Nauroth


Currently, there is no explicit validation of {{ViewFs}} internal names in the 
mount table during initialization.



[jira] [Created] (HADOOP-9130) TestMapReduceChildJVM fails in branch-trunk-win

2012-12-10 Thread Chris Nauroth (JIRA)
Chris Nauroth created HADOOP-9130:
-

 Summary: TestMapReduceChildJVM fails in branch-trunk-win
 Key: HADOOP-9130
 URL: https://issues.apache.org/jira/browse/HADOOP-9130
 Project: Hadoop Common
  Issue Type: Bug
  Components: test
Affects Versions: trunk-win
Reporter: Chris Nauroth
Assignee: Chris Nauroth


The YARN-233 patch for getting YARN working on Windows forgot to include a 
corresponding change in {{TestMapReduceChildJVM}}, so the test is failing now.
