[jira] [Created] (HADOOP-10624) Fix some minor typos and add more test cases for hadoop_err

2014-05-21 Thread Wenwu Peng (JIRA)
Wenwu Peng created HADOOP-10624:
---

 Summary: Fix some minor typos and add more test cases for 
hadoop_err
 Key: HADOOP-10624
 URL: https://issues.apache.org/jira/browse/HADOOP-10624
 Project: Hadoop Common
  Issue Type: Sub-task
Affects Versions: HADOOP-10388
Reporter: Wenwu Peng
Assignee: Wenwu Peng


Changes:
1. Add more test cases to cover the methods hadoop_lerr_alloc and hadoop_uverr_alloc
2. Fix typos as follows:
1) Change hadoop_uverr_alloc(int cod to hadoop_uverr_alloc(int code in 
hadoop_err.h
2) Change OutOfMemory to OutOfMemoryException to be consistent with the other 
Exceptions in hadoop_err.c
3) Change DBUG to DEBUG in messenger.c
4) Change DBUG to DEBUG in reactor.c




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10624) Fix some minor typos and add more test cases for hadoop_err

2014-05-21 Thread Wenwu Peng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenwu Peng updated HADOOP-10624:


Attachment: HADOOP-10624-pnative.001.patch

Submitted the first version of the patch.

 Fix some minor typos and add more test cases for hadoop_err
 ---

 Key: HADOOP-10624
 URL: https://issues.apache.org/jira/browse/HADOOP-10624
 Project: Hadoop Common
  Issue Type: Sub-task
Affects Versions: HADOOP-10388
Reporter: Wenwu Peng
Assignee: Wenwu Peng
 Attachments: HADOOP-10624-pnative.001.patch


 Changes:
 1. Add more test cases to cover the methods hadoop_lerr_alloc and 
 hadoop_uverr_alloc
 2. Fix typos as follows:
 1) Change hadoop_uverr_alloc(int cod to hadoop_uverr_alloc(int code in 
 hadoop_err.h
 2) Change OutOfMemory to OutOfMemoryException to be consistent with the other 
 Exceptions in hadoop_err.c
 3) Change DBUG to DEBUG in messenger.c
 4) Change DBUG to DEBUG in reactor.c



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10608) Support incremental data copy in DistCp

2014-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004479#comment-14004479
 ] 

Hadoop QA commented on HADOOP-10608:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12645910/HADOOP-10608.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1280 javac 
compiler warnings (more than the trunk's current 1278 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs 
hadoop-tools/hadoop-distcp:

  org.apache.hadoop.fs.TestFilterFileSystem
  org.apache.hadoop.fs.TestHarFileSystem
  org.apache.hadoop.hdfs.TestDistributedFileSystem

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3959//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3959//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-distcp.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3959//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3959//console

This message is automatically generated.

 Support incremental data copy in DistCp
 ---

 Key: HADOOP-10608
 URL: https://issues.apache.org/jira/browse/HADOOP-10608
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HADOOP-10608.000.patch, HADOOP-10608.001.patch


 Currently, when doing distcp with the -update option, for two files with the 
 same file name but different file lengths or checksums, we overwrite the whole 
 file. It would be good if we could detect the case where (sourceFile = 
 targetFile + appended_data) and only transfer the appended data segment to 
 the target. This will be very useful for doing incremental distcp.
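A minimal sketch of the append-detection idea, assuming a prefix-checksum API such as {{FileSystem#getFileChecksum(Path, long)}}; the helper name {{canAppend}} is illustrative, not from the attached patches:

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;

class AppendDetection {
  // The target can be extended by an append only if it is a strict length
  // prefix of the source and the checksums of that prefix agree.
  static boolean canAppend(FileSystem srcFs, FileStatus src,
                           FileSystem dstFs, FileStatus dst)
      throws IOException {
    if (dst.getLen() >= src.getLen()) {
      return false; // nothing appended, or the files diverge in length
    }
    FileChecksum srcPrefix =
        srcFs.getFileChecksum(src.getPath(), dst.getLen());
    FileChecksum target = dstFs.getFileChecksum(dst.getPath());
    return srcPrefix != null && srcPrefix.equals(target);
  }
}
{code}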



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces

2014-05-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004498#comment-14004498
 ] 

Yi Liu commented on HADOOP-10603:
-

Andrew, thanks for your detailed review. Although your comments are based on a 
slightly older version, most are still valid for the latest patch :-).  
HADOOP-10617 holds the common-side test cases for the crypto streams; since we 
already have lots of test cases, still need more, and this patch would get 
quite large if they were merged in, I made it a separate JIRA. 
Sure, per your suggestion, I will merge it into this JIRA.

Following are my responses to your comments; I will update the patch later.

{quote}
Need class javadoc and interface annotations on all new classes
Need p/ to actually line break in javadoc
Some tab characters present
{quote}
I will update them.

{quote}
s/mod/mode
What does calIV mean? Javadoc here would be nice.
calIV would be simpler if we used ByteBuffer.wrap and getLong. I think right 
now, we also need to cast each value to a long before shifting, else it only 
works up to an int. Would be good to unit test this function.
{quote}
Right, I will update them.
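For reference, a sketch of the suggested simplification, assuming a 16-byte CTR IV whose low 8 bytes carry the counter ({{calcIV}} and the carry-free arithmetic here are illustrative assumptions, not the patch's code):

{code}
import java.nio.ByteBuffer;

class IVCalc {
  // Derive the IV for a given block counter from the initial IV using
  // ByteBuffer.getLong/putLong instead of manual shifts and long casts.
  static byte[] calcIV(byte[] initIV, long counter) {
    ByteBuffer bb = ByteBuffer.wrap(initIV.clone()); // big-endian by default
    long low = bb.getLong(8);      // low 64 bits of the 16-byte IV
    bb.putLong(8, low + counter);  // carry into the high 64 bits ignored here
    return bb.array();
  }
}
{code}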

{quote}
Could you define the term block in the #encrypt javadoc?
{quote}
It was the wrong word and should be “buffer”.  I already updated it in the latest patch.

{quote}
I don't understand the reinit conditions, do you mind explaining this a bit? 
The javadoc for Cipher#update indicates that it always fully reads the input 
buffer, so is the issue that the cipher sometimes doesn't flush all the input 
to the output buffer?
{quote}
Andrew, I agree with you. The javadoc for Cipher#update indicates that it 
always fully reads the input buffer and decrypts all input data.  This is 
always correct for CTR mode; for some other modes, input data may be buffered 
when padding is requested (CTR doesn’t need padding).  Charles was concerned 
that some custom JCE provider implementation might not decrypt all data for 
CTR mode using {{Cipher#update}}, so I added the reinit conditions. I think if 
a specific provider can’t decrypt all input data of {{Cipher#update}} for CTR 
mode, that is a bug in that provider, since it doesn't follow the definition 
of {{Cipher#update}}.
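A quick self-contained check of that definition (a sketch, not from the patch): for AES/CTR/NoPadding, {{Cipher#update}} should emit output for every input byte, since CTR buffers nothing.

{code}
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class CtrUpdateCheck {
  public static void main(String[] args) throws Exception {
    Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
    c.init(Cipher.ENCRYPT_MODE,
        new SecretKeySpec(new byte[16], "AES"),  // all-zero demo key
        new IvParameterSpec(new byte[16]));      // all-zero demo IV
    byte[] in = new byte[1000];
    byte[] out = c.update(in);
    // A conforming provider returns as many bytes as it was given.
    System.out.println(out.length == in.length); // prints: true
  }
}
{code}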

{quote}
 If this API only accepts direct ByteBuffers, we should Precondition check that 
in the implementation
{quote}
I’m not sure we have this restriction; a Java heap ByteBuffer is also OK.  A 
direct ByteBuffer is more efficient (no copy) when the cipher provider is 
native code using JNI. I will add the check if you prefer.

{quote}
 Javadoc for {{encrypt}} should link to {{javax.crypto.ShortBufferException}}, 
not {{#ShortBufferException}}. I also don't see this being thrown because we 
wrap everything in an IOException.
{quote}
Right, I will revise this.

{quote}
How was the default buffer size of 8KB chosen? This should probably be a new 
configuration parameter, or respect io.file.buffer.size.
{quote}
OK. I will add configuration parameter for the default buffer size.

{quote}
Potential for int overflow in {{#write}} where we check {{off+len < 0}}. I also 
find this if statement hard to parse, would prefer if it were expanded.
{quote}
OK. I will expand it in the next patch.
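For illustration, the usual overflow-safe expansion (a sketch; the method name is hypothetical): comparing {{len > b.length - off}} avoids the {{off + len}} wraparound entirely.

{code}
static void checkBounds(byte[] b, int off, int len) {
  // off + len can overflow int; b.length - off cannot, given off >= 0.
  if (off < 0 || len < 0 || len > b.length - off) {
    throw new IndexOutOfBoundsException(
        "off=" + off + ", len=" + len + ", length=" + b.length);
  }
}
{code}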

{quote}
Is the {{16}} in {{updateEncryptor}} something that should be hard-coded? Maybe 
pull it out into a constant and javadoc why it's 16. I'm curious if this is 
dependent on the Encryptor implementation.
{quote}
Let’s pull it out into a constant.  16 bytes is 128 bits, which is fixed by the 
definition of AES: http://en.wikipedia.org/wiki/Advanced_Encryption_Standard. 
Let’s also define it as a configuration parameter, since other algorithms may 
have a different block size, although we use AES.
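Something like the following (a sketch; the constant and config key names are assumptions):

{code}
import org.apache.hadoop.conf.Configuration;

class CipherBlockSize {
  /** AES block size is fixed at 128 bits (16 bytes) regardless of key length. */
  static final int AES_BLOCK_SIZE = 16;

  // Hypothetical config key for ciphers whose block size differs from AES.
  static int blockSize(Configuration conf) {
    return conf.getInt("hadoop.security.crypto.block.size", AES_BLOCK_SIZE);
  }
}
{code}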

{quote}
We need to be careful with direct BBs, since they don't trigger GC. We should 
be freeing them manually when the stream is closed, or pooling them somehow for 
reuse.
{quote}
Good point.  As for pooling them: they may be created with different buffer 
sizes, so perhaps they are not suitable for a pool? Instead, I will free them 
manually when the stream is closed.
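A sketch of freeing a direct buffer eagerly on {{close()}}; this relies on the non-public {{sun.nio.ch.DirectBuffer}} API of Java 7/8-era JDKs, so treat it as an assumption rather than a portable recipe:

{code}
import java.nio.ByteBuffer;

class BufferFreer {
  static void freeDirectBuffer(ByteBuffer buf) {
    if (buf != null && buf.isDirect()) {
      sun.misc.Cleaner cleaner = ((sun.nio.ch.DirectBuffer) buf).cleaner();
      if (cleaner != null) {
        cleaner.clean(); // release the native memory now, not at GC time
      }
    }
  }
}
{code}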

{quote}
In {{#process}}, we flip the inBuf, then if there's no data we just return. 
Shouldn't we restore inBuf to its previous padded state first? Also, IIUC 
{{inBuffer.remaining()}} cannot be less than padding since the inBuffer 
position does not move backwards, so I'd prefer to see a Precondition check 
that {{inBuf.remaining() == padding}}. Test case would be nice if I'm right 
about this.
{quote}

You are right, there is a potential issue; I will fix it and add a test case.  
In our code we only go to {{#process}} when we have input data, so 
{{inBuffer}} should contain real data. But from the view of the code logic, we 
should handle it as you said. And I agree we should have a Precondition check.

{quote}
Rename {{#process}} to {{#encrypt}}?
{quote}
Good, let’s do that.

{quote}
Do we need the special-case logic with tmpBuf? It looks like 

[jira] [Updated] (HADOOP-10621) Remove CRLF for xattr value base64 encoding for better display.

2014-05-21 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HADOOP-10621:
-

Issue Type: Sub-task  (was: Improvement)
Parent: HADOOP-10514

 Remove CRLF for xattr value base64 encoding for better display.
 ---

 Key: HADOOP-10621
 URL: https://issues.apache.org/jira/browse/HADOOP-10621
 Project: Hadoop Common
  Issue Type: Sub-task
Affects Versions: HDFS XAttrs (HDFS-2006)
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Minor
 Fix For: HDFS XAttrs (HDFS-2006)

 Attachments: HDFS-6426.patch


 {{Base64.encodeBase64String(value)}} encodes binary data using the base64 
 algorithm into 76 character blocks separated by CRLF.
 In fs shell, xattrs display like:
 {code}
 # file: /user
 user.a1=0sMTIz
 user.a2=0sMTIzNDU2
 user.a3=0sMTIzNDU2
 {code}
 We don't need multiple lines and CRLF for the xattr value; we can use:
 {code}
 Base64 base64 = new Base64(0);
 base64.encodeToString(value);
 {code}
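A small demo of the difference (a sketch, assuming commons-codec 1.4, where {{encodeBase64String}} chunks output into 76-character CRLF-separated lines):

{code}
import org.apache.commons.codec.binary.Base64;

public class XattrEncodeDemo {
  public static void main(String[] args) {
    byte[] value = new byte[100]; // long enough to force chunking
    String chunked = Base64.encodeBase64String(value);
    String single = new Base64(0).encodeToString(value); // line length 0 = no chunking
    System.out.println(chunked.contains("\r\n")); // true under commons-codec 1.4
    System.out.println(single.contains("\r\n"));  // false
  }
}
{code}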



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10608) Support incremental data copy in DistCp

2014-05-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HADOOP-10608:
-

Hadoop Flags: Reviewed

+1 patch looks good.

 Support incremental data copy in DistCp
 ---

 Key: HADOOP-10608
 URL: https://issues.apache.org/jira/browse/HADOOP-10608
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HADOOP-10608.000.patch, HADOOP-10608.001.patch


 Currently, when doing distcp with the -update option, for two files with the 
 same file name but different file lengths or checksums, we overwrite the whole 
 file. It would be good if we could detect the case where (sourceFile = 
 targetFile + appended_data) and only transfer the appended data segment to 
 the target. This will be very useful for doing incremental distcp.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10561) Copy command with preserve option should handle Xattrs

2014-05-21 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HADOOP-10561:
-

Issue Type: Improvement  (was: Bug)

 Copy command with preserve option should handle Xattrs
 --

 Key: HADOOP-10561
 URL: https://issues.apache.org/jira/browse/HADOOP-10561
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Yi Liu

 The design docs for XAttrs stated that we handle preserve options with copy 
 commands.
 From the doc:
 The preserve option of commands like the “cp -p” shell command and “distcp -p” 
 should work on XAttrs. 
 In the case where the source fs supports XAttrs but the target fs does not, 
 XAttrs will be ignored with a warning message.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10561) Copy command with preserve option should handle Xattrs

2014-05-21 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HADOOP-10561:
-

Issue Type: Bug  (was: Sub-task)
Parent: (was: HADOOP-10514)

 Copy command with preserve option should handle Xattrs
 --

 Key: HADOOP-10561
 URL: https://issues.apache.org/jira/browse/HADOOP-10561
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Yi Liu

 The design docs for XAttrs stated that we handle preserve options with copy 
 commands.
 From the doc:
 The preserve option of commands like the “cp -p” shell command and “distcp -p” 
 should work on XAttrs. 
 In the case where the source fs supports XAttrs but the target fs does not, 
 XAttrs will be ignored with a warning message.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10561) Copy command with preserve option should handle Xattrs

2014-05-21 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HADOOP-10561:
-

Affects Version/s: (was: HDFS XAttrs (HDFS-2006))
   3.0.0

 Copy command with preserve option should handle Xattrs
 --

 Key: HADOOP-10561
 URL: https://issues.apache.org/jira/browse/HADOOP-10561
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Yi Liu

 The design docs for XAttrs stated that we handle preserve options with copy 
 commands.
 From the doc:
 The preserve option of commands like the “cp -p” shell command and “distcp -p” 
 should work on XAttrs. 
 In the case where the source fs supports XAttrs but the target fs does not, 
 XAttrs will be ignored with a warning message.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10561) Copy command with preserve option should handle Xattrs

2014-05-21 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004855#comment-14004855
 ] 

Uma Maheswara Rao G commented on HADOOP-10561:
--

Moved to a top-level JIRA since the HDFS-2006 branch was merged to trunk!

 Copy command with preserve option should handle Xattrs
 --

 Key: HADOOP-10561
 URL: https://issues.apache.org/jira/browse/HADOOP-10561
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Yi Liu

 The design docs for XAttrs stated that we handle preserve options with copy 
 commands.
 From the doc:
 The preserve option of commands like the “cp -p” shell command and “distcp -p” 
 should work on XAttrs. 
 In the case where the source fs supports XAttrs but the target fs does not, 
 XAttrs will be ignored with a warning message.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-9902) Shell script rewrite

2014-05-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004967#comment-14004967
 ] 

Allen Wittenauer commented on HADOOP-9902:
--

{code}
  CLASSPATH=${CLASSPATH}:$YARN_CONF_DIR/ahs-config/log4j.properties
...
  CLASSPATH=${CLASSPATH}:$YARN_CONF_DIR/timelineserver-config/log4j.properties
{code}

The timeline server added more custom (and likely equally undocumented) 
log4j.properties locations. Needless to say, those are going away too, just 
like their rm-config and nm-config brethren.  

 Shell script rewrite
 

 Key: HADOOP-9902
 URL: https://issues.apache.org/jira/browse/HADOOP-9902
 Project: Hadoop Common
  Issue Type: Improvement
  Components: scripts
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Allen Wittenauer
Assignee: Allen Wittenauer
 Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt


 Umbrella JIRA for shell script rewrite.  See more-info.txt for more details.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-9902) Shell script rewrite

2014-05-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005165#comment-14005165
 ] 

Allen Wittenauer commented on HADOOP-9902:
--

Ran across an interesting discrepancy. hadoop-env.sh says:

{code}
# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER
{code}

This implies that it could be something that isn't a user.  However...

{code}
  chown $HADOOP_IDENT_STRING $HADOOP_LOG_DIR
{code}

... we clearly have that assumption.  Since the chown has already been removed 
from the new code, this problem goes away.  But should we explicitly state that 
HADOOP_IDENT_STRING needs to be a user?  Is anyone aware of anything else that 
uses this outside of the Hadoop shell scripts?

 Shell script rewrite
 

 Key: HADOOP-9902
 URL: https://issues.apache.org/jira/browse/HADOOP-9902
 Project: Hadoop Common
  Issue Type: Improvement
  Components: scripts
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Allen Wittenauer
Assignee: Allen Wittenauer
 Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt


 Umbrella JIRA for shell script rewrite.  See more-info.txt for more details.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-9902) Shell script rewrite

2014-05-21 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005174#comment-14005174
 ] 

Mark Grover commented on HADOOP-9902:
-

Hi Allen,
Good point. In Bigtop, where we create RPM and DEB packages for Hadoop and 
bundle them into our Bigtop distribution, we do rely on this property.
And, looking at the code, it looks like we set it to a user (the hdfs user in 
our case).

Here are the references:
These get used in the scripts we deploy using puppet for our integration 
testing:
https://github.com/apache/bigtop/blob/master/bigtop-deploy/puppet/modules/hadoop/templates/hadoop-env.sh#L78
https://github.com/apache/bigtop/blob/master/bigtop-deploy/puppet/modules/hadoop/templates/hadoop-hdfs#L20

This gets used in the default configuration for our secure clusters for 
integration testing:
https://github.com/apache/bigtop/blob/master/bigtop-packages/src/common/hadoop/conf.secure/hadoop-env.sh#L56

This gets used in the init script that starts the datanode services:
https://github.com/apache/bigtop/blob/master/bigtop-packages/src/common/hadoop/hadoop-hdfs-datanode.svc#L39

And, this gets used to set certain environment variables before starting 
various HDFS services:
https://github.com/apache/bigtop/blob/master/bigtop-packages/src/common/hadoop/hdfs.default#L20

Hope that helps but please let me know if you need any further info.

 Shell script rewrite
 

 Key: HADOOP-9902
 URL: https://issues.apache.org/jira/browse/HADOOP-9902
 Project: Hadoop Common
  Issue Type: Improvement
  Components: scripts
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Allen Wittenauer
Assignee: Allen Wittenauer
 Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt


 Umbrella JIRA for shell script rewrite.  See more-info.txt for more details.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-9902) Shell script rewrite

2014-05-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005196#comment-14005196
 ] 

Allen Wittenauer commented on HADOOP-9902:
--

That's very helpful! (Especially since that was going to be the next place I 
looked, as I just happen to have it cloned from git on my dev machine... 
it's going to be one of the first big tests I do as I work towards a 
commit-able patch. :D )

Bigtop looks like it is doing what I would expect: setting it for Hadoop, but 
not using it directly.  Which seems to indicate that, at least as far as 
Bigtop is concerned, we could expand the definition beyond "it must be a user".

Hadoop also uses HADOOP_IDENT_STRING as the setting for the Java hadoop.id.str 
property.  But I can't find a single place where this property is used. IIRC, 
it was used in ancient times for logging and/or display, but if we don't need 
the property set anymore because we've gotten wiser, I'd like to just yank the 
property completely.

 Shell script rewrite
 

 Key: HADOOP-9902
 URL: https://issues.apache.org/jira/browse/HADOOP-9902
 Project: Hadoop Common
  Issue Type: Improvement
  Components: scripts
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Allen Wittenauer
Assignee: Allen Wittenauer
 Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt


 Umbrella JIRA for shell script rewrite.  See more-info.txt for more details.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10611) KeyVersion name should not be assumed to be the 'key name @ the version number'

2014-05-21 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005209#comment-14005209
 ] 

Owen O'Malley commented on HADOOP-10611:


I disagree on this one. There is a lot of value in having semantics behind the 
key version. For example, the MapReduce task ids used to be randomly generated. 
That was easy, but it was a pain in the tail to figure out which tasks were 
related to which job. 

 KeyVersion name should not be assumed to be the 'key name @ the version 
 number'
 ---

 Key: HADOOP-10611
 URL: https://issues.apache.org/jira/browse/HADOOP-10611
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 3.0.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur

 The KeyProvider public API should treat keyversion names as opaque values. 
 Same for the KMS client/server.
 Methods like {{KeyProvider#buildVersionName()}} and 
 {{KeyProvider#getBaseName()}} should not be part of the {{KeyProvider}} 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-9902) Shell script rewrite

2014-05-21 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005242#comment-14005242
 ] 

Mark Grover commented on HADOOP-9902:
-

Great! Yeah, sounds good to me, and in my personal opinion Bigtop will be OK 
with expanding the definition. Just let us know when you make that change and 
what release it will show up in :-)

And we don't use HADOOP_IDENT_STRING, so no objections from the Bigtop side there.

Let me (or d...@bigtop.apache.org) know if you need anything else. Thank you!

 Shell script rewrite
 

 Key: HADOOP-9902
 URL: https://issues.apache.org/jira/browse/HADOOP-9902
 Project: Hadoop Common
  Issue Type: Improvement
  Components: scripts
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Allen Wittenauer
Assignee: Allen Wittenauer
 Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt


 Umbrella JIRA for shell script rewrite.  See more-info.txt for more details.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces

2014-05-21 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005400#comment-14005400
 ] 

Charles Lamb commented on HADOOP-10603:
---

Hi Yi,

Good work so far. I took your latest patch and incorporated it into my sandbox 
and got my unit tests running with it. I have also made some edits to 
CryptoInputStream and CryptoOutputStream. I have attached the whole file for 
those two rather than diffs.

CryptoFactory.java
Perhaps rename this to Crypto.
getEncryptor/getDecryptor should also declare throws GeneralSecurityException

Encryptor.java
encrypt should declare throws GeneralSecurityException
decl for encrypt > 80 chars
Consider making this interface an inner class of Crypto (aka CryptoFactory).
Remind me again why encrypt/decrypt don't take a position argument?
I wonder if, in general, we'll also want byte[] overloadings of the methods (as 
well as BB) for encrypt()/decrypt().

Decryptor.java
decrypt should throw GeneralSecurityException
The decl for decrypt > 80 chars
Consider making this interface a subclass of Crypto (aka CryptoFactory).

JCEAESCTRCryptoFactory.java
This file needs an apache license header
Perhaps rename it to JCEAESCTRCrypto.java
getDecryptor/getEncryptor should throw GeneralSecurityException

JCEAESCTRDecryptor.java
ctor should throw GeneralSecurityException instead of RTException
decrypt should throw GeneralSecurityException

JCEAESCTREncryptor.java
ctor should throw GeneralSecurityException instead of RTException
encrypt should throw GeneralSecurityException

CryptoUtils.java
put a newline after public class CryptoUtils {
Could calIV be renamed to calcIV?

CryptoFSDataOutputStream.java
Why is fsOut needed? Why can't you just reference out for (e.g.) getPos()?

CryptoInputStream.java
You'll need a getWrappedStream() method.

Why 8192? Should this be moved to a static final int CONSTANT?
IWBNI the name of the interface that a particular method is implementing were 
put in a comment before the @Override. For instance,
// PositionedRead
@Override
public int read(long position ...)

IWBNI all of the methods for a particular interface were grouped together in 
the code.

In read(byte[], int, int), regarding the if (!usingByteBufferRead) branch: I am 
worried that throwing and catching UnsupportedOperationException will be 
expensive. It seems very likely that for any particular stream, the same byte 
buffer will be passed in for the life of the stream. That means that for every 
call to read(...) there is potential for the UnsupportedOperationException to 
be thrown. That will be expensive. Perhaps keep a piece of state in the stream 
that gets set on the first time through, indicating whether the BB is readable 
or not. Or keep a reference to the BB along with a bool. If the reference 
changes (on the off chance that the caller switched BBs for the same stream), 
then you can redetermine whether read is supported or not.
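A sketch of the cached-state idea (names are illustrative; an instanceof probe is shown here, though a first-call try/catch that records the outcome works just as well for streams that declare the interface but throw at runtime):

{code}
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import org.apache.hadoop.fs.ByteBufferReadable;

class CachedByteBufferRead {
  private final InputStream in;
  private Boolean usingByteBufferRead; // null until the first read probes it

  CachedByteBufferRead(InputStream in) { this.in = in; }

  int read(ByteBuffer buf, byte[] scratch) throws IOException {
    if (usingByteBufferRead == null) {
      usingByteBufferRead = in instanceof ByteBufferReadable; // probe once
    }
    if (usingByteBufferRead) {
      return ((ByteBufferReadable) in).read(buf);
    }
    // Fallback path: read into a scratch array, then copy into the buffer.
    int n = in.read(scratch, 0, Math.min(scratch.length, buf.remaining()));
    if (n > 0) {
      buf.put(scratch, 0, n);
    }
    return n;
  }
}
{code}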

In readFully, you could simplify the implementation by just calling into 
read(long, byte[]...), like this:

  @Override // PositionedReadable
  public void readFully(long position, byte[] buffer, int offset, int length)
      throws IOException {
    int nread = 0;
    while (nread < length) {
      int nbytes =
          read(position + nread, buffer, offset + nread, length - nread);
      if (nbytes < 0) {
        throw new EOFException("End of file reached before reading fully.");
      }
      nread += nbytes;
    }
  }

That way you can let read(long...) do all the unwinding of the seek position.

In seek(), you can do a check for forward == 0 and return immediately, thus 
saving the two calls to position() in the no-op case. Ditto skip().
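I.e., something like this (a sketch; {{getPos()}} stands in for however the stream tracks its current position):

{code}
@Override // Seekable
public void seek(long pos) throws IOException {
  long forward = pos - getPos();
  if (forward == 0) {
    return; // no-op seek: skip the position() bookkeeping entirely
  }
  // ... existing repositioning logic ...
}
{code}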

I noticed that you implemented read(ByteBufferPool), but not releaseBuffer(BB). 
Is that because you didn't have time (it's ok if that's the case, I'm just 
wondering why one and not the other)?

CryptoOutputStream.java
You'll need a getWrappedStream() method.



 Crypto input and output streams implementing Hadoop stream interfaces
 -

 Key: HADOOP-10603
 URL: https://issues.apache.org/jira/browse/HADOOP-10603
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Alejandro Abdelnur
Assignee: Yi Liu
 Fix For: fs-encryption (HADOOP-10150 and HDFS-6134)

 Attachments: HADOOP-10603.1.patch, HADOOP-10603.2.patch, 
 HADOOP-10603.3.patch, HADOOP-10603.4.patch, HADOOP-10603.5.patch, 
 HADOOP-10603.6.patch, HADOOP-10603.7.patch, HADOOP-10603.8.patch, 
 HADOOP-10603.patch


 A common set of Crypto Input/Output streams. They would be used by 
 CryptoFileSystem, HDFS encryption, MapReduce intermediate data and spills. 
 Note we cannot use the JDK Cipher Input/Output streams 

[jira] [Updated] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces

2014-05-21 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HADOOP-10603:
--

Attachment: CryptoOutputStream.java
CryptoInputStream.java

 Crypto input and output streams implementing Hadoop stream interfaces
 -

 Key: HADOOP-10603
 URL: https://issues.apache.org/jira/browse/HADOOP-10603
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Alejandro Abdelnur
Assignee: Yi Liu
 Fix For: fs-encryption (HADOOP-10150 and HDFS-6134)

 Attachments: CryptoInputStream.java, CryptoOutputStream.java, 
 HADOOP-10603.1.patch, HADOOP-10603.2.patch, HADOOP-10603.3.patch, 
 HADOOP-10603.4.patch, HADOOP-10603.5.patch, HADOOP-10603.6.patch, 
 HADOOP-10603.7.patch, HADOOP-10603.8.patch, HADOOP-10603.patch


 A common set of Crypto Input/Output streams. They would be used by 
 CryptoFileSystem, HDFS encryption, MapReduce intermediate data and spills. 
 Note we cannot use the JDK Cipher Input/Output streams directly because we 
 need to support the additional interfaces that the Hadoop FileSystem streams 
 implement (Seekable, PositionedReadable, ByteBufferReadable, 
 HasFileDescriptor, CanSetDropBehind, CanSetReadahead, 
 HasEnhancedByteBufferAccess, Syncable, CanSetDropBehind).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces

2014-05-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005481#comment-14005481
 ] 

Yi Liu commented on HADOOP-10603:
-

Thanks, Charles, for the good comments. I'm refining the patch per Andrew's 
comments; I will respond to you later and also want to address your comments 
in the new patch :-)

 Crypto input and output streams implementing Hadoop stream interfaces
 -

 Key: HADOOP-10603
 URL: https://issues.apache.org/jira/browse/HADOOP-10603
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Alejandro Abdelnur
Assignee: Yi Liu
 Fix For: fs-encryption (HADOOP-10150 and HDFS-6134)

 Attachments: CryptoInputStream.java, CryptoOutputStream.java, 
 HADOOP-10603.1.patch, HADOOP-10603.2.patch, HADOOP-10603.3.patch, 
 HADOOP-10603.4.patch, HADOOP-10603.5.patch, HADOOP-10603.6.patch, 
 HADOOP-10603.7.patch, HADOOP-10603.8.patch, HADOOP-10603.patch


 A common set of Crypto Input/Output streams. They would be used by 
 CryptoFileSystem, HDFS encryption, MapReduce intermediate data and spills. 
 Note we cannot use the JDK Cipher Input/Output streams directly because we 
 need to support the additional interfaces that the Hadoop FileSystem streams 
 implement (Seekable, PositionedReadable, ByteBufferReadable, 
 HasFileDescriptor, CanSetDropBehind, CanSetReadahead, 
 HasEnhancedByteBufferAccess, Syncable, CanSetDropBehind).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties

2014-05-21 Thread Wangda Tan (JIRA)
Wangda Tan created HADOOP-10625:
---

 Summary: Configuration: names should be trimmed when 
putting/getting to properties
 Key: HADOOP-10625
 URL: https://issues.apache.org/jira/browse/HADOOP-10625
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.4.0
Reporter: Wangda Tan


Currently, Hadoop does not trim the name when putting a k/v pair into 
properties, but when loading a configuration from file, names are trimmed:
(In Configuration.java)
{code}
  if ("name".equals(field.getTagName()) && field.hasChildNodes())
    attr = StringInterner.weakIntern(
        ((Text)field.getFirstChild()).getData().trim());
  if ("value".equals(field.getTagName()) && field.hasChildNodes())
    value = StringInterner.weakIntern(
        ((Text)field.getFirstChild()).getData());
{code}
With this behavior, the following steps become problematic:
1. User incorrectly sets " hadoop.key"=value (with a space before hadoop.key)
2. User tries to get "hadoop.key" and cannot get the value
3. The configuration is serialized and deserialized (like what is done in MR)
4. User tries to get "hadoop.key" and now gets the value, which creates an 
inconsistency.
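A minimal sketch of the proposed fix (simplified; the real {{Configuration#set}}/{{#get}} do considerably more than this):

{code}
// Trim names on both write and read so programmatic puts behave like
// properties loaded from XML, which are already trimmed.
public void set(String name, String value) {
  properties.setProperty(name.trim(), value);
}

public String get(String name) {
  return properties.getProperty(name.trim());
}
{code}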



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties

2014-05-21 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HADOOP-10625:


Attachment: HADOOP-10625.patch

Attached a patch for this.

 Configuration: names should be trimmed when putting/getting to properties
 -

 Key: HADOOP-10625
 URL: https://issues.apache.org/jira/browse/HADOOP-10625
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.4.0
Reporter: Wangda Tan
 Attachments: HADOOP-10625.patch


 Currently, Hadoop does not trim the name when putting a k/v pair into 
 properties, but when loading a configuration from file, names are trimmed:
 (In Configuration.java)
 {code}
   if ("name".equals(field.getTagName()) && field.hasChildNodes())
     attr = StringInterner.weakIntern(
         ((Text)field.getFirstChild()).getData().trim());
   if ("value".equals(field.getTagName()) && field.hasChildNodes())
     value = StringInterner.weakIntern(
         ((Text)field.getFirstChild()).getData());
 {code}
 With this behavior, the following steps become problematic:
 1. User incorrectly sets " hadoop.key"=value (with a space before hadoop.key)
 2. User tries to get "hadoop.key" and cannot get the value
 3. The configuration is serialized and deserialized (like what is done in MR)
 4. User tries to get "hadoop.key" and now gets the value, which creates an 
 inconsistency.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties

2014-05-21 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HADOOP-10625:


Status: Patch Available  (was: Open)

 Configuration: names should be trimmed when putting/getting to properties
 -

 Key: HADOOP-10625
 URL: https://issues.apache.org/jira/browse/HADOOP-10625
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.4.0
Reporter: Wangda Tan
 Attachments: HADOOP-10625.patch


 Currently, Hadoop does not trim the name when putting a k/v pair into 
 properties, but when loading a configuration from file, names are trimmed:
 (In Configuration.java)
 {code}
   if ("name".equals(field.getTagName()) && field.hasChildNodes())
     attr = StringInterner.weakIntern(
         ((Text)field.getFirstChild()).getData().trim());
   if ("value".equals(field.getTagName()) && field.hasChildNodes())
     value = StringInterner.weakIntern(
         ((Text)field.getFirstChild()).getData());
 {code}
 With this behavior, the following steps become problematic:
 1. User incorrectly sets " hadoop.key"=value (with a space before hadoop.key)
 2. User tries to get "hadoop.key" and cannot get the value
 3. The configuration is serialized and deserialized (like what is done in MR)
 4. User tries to get "hadoop.key" and now gets the value, which creates an 
 inconsistency.



--
This message was sent by Atlassian JIRA
(v6.2#6252)