[jira] [Created] (HADOOP-13310) S3A reporting of file group as null is harmful to compatibility for the shell.

2016-06-21 Thread Chris Nauroth (JIRA)
Chris Nauroth created HADOOP-13310:
--

 Summary: S3A reporting of file group as null is harmful to 
compatibility for the shell.
 Key: HADOOP-13310
 URL: https://issues.apache.org/jira/browse/HADOOP-13310
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Chris Nauroth
Priority: Minor


S3A does not persist group information in file metadata.  Instead, it stubs the 
value of the group to an empty string.  Although the JavaDocs for 
{{FileStatus#getGroup}} indicate that empty string is a possible return value, 
this is likely to cause compatibility problems.  Most notably, shell scripts 
that expect to be able to perform positional parsing on the output of things 
like {{hadoop fs -ls}} will stop working if retargeted from HDFS to S3A.
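The positional-parsing hazard can be illustrated with two sample lines (illustrative only, not real shell output): when the group column is empty, whitespace splitting shifts every later field by one.

```java
// Sketch: why an empty group field breaks positional parsing of -ls output.
// The two sample lines are illustrative; the second mimics S3A's empty group.
public class LsParsing {
    public static void main(String[] args) {
        String hdfsLine = "-rw-r--r--   1 alice hadoop 1024 2016-06-21 10:00 /data/file";
        String s3aLine  = "-rw-r--r--   1 alice  1024 2016-06-21 10:00 /data/file"; // group missing
        // A script that grabs "field 5" for the file size gets the wrong token on S3A.
        System.out.println(hdfsLine.trim().split("\\s+")[4]); // 1024
        System.out.println(s3aLine.trim().split("\\s+")[4]);  // 2016-06-21 -- shifted by one
    }
}
```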



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-13309) Document S3A known limitations in file ownership and permission model.

2016-06-21 Thread Chris Nauroth (JIRA)
Chris Nauroth created HADOOP-13309:
--

 Summary: Document S3A known limitations in file ownership and 
permission model.
 Key: HADOOP-13309
 URL: https://issues.apache.org/jira/browse/HADOOP-13309
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Chris Nauroth
Priority: Minor


S3A does not match the implementation of HDFS in its handling of file ownership 
and permissions.  Fundamental S3 limitations prevent it.  This is a frequent 
source of confusion for end users.  This issue proposes to document these known 
limitations.






[jira] [Created] (HADOOP-13308) S3A delete may fail to preserve parent directory.

2016-06-21 Thread Chris Nauroth (JIRA)
Chris Nauroth created HADOOP-13308:
--

 Summary: S3A delete may fail to preserve parent directory.
 Key: HADOOP-13308
 URL: https://issues.apache.org/jira/browse/HADOOP-13308
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Chris Nauroth


When a file or directory is deleted in S3A, and the result of that deletion 
makes the parent empty, S3A must store a fake directory (a pure metadata 
object) at the parent to indicate that the directory still exists.  The logic 
for restoring fake directories is not resilient to a process death.  This may 
cause a directory to vanish unexpectedly after a deletion of its last child.
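The ordering hazard can be sketched with a toy simulation (the method names and the two-step sequence below are illustrative, not S3A's actual code):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the hazard: the child object is deleted first, and the fake parent
// marker is restored second. A process death between the two steps leaves
// neither object, so the parent "directory" vanishes.
public class FakeDirDemo {
    // Simulated object-store keyspace.
    static Set<String> store = new HashSet<>();

    static void deleteLastChild(String child, String parentMarker, boolean crashBetween) {
        store.remove(child);      // step 1: delete the object
        if (crashBetween) {
            return;               // simulated process death
        }
        store.add(parentMarker);  // step 2: restore the fake directory marker
    }

    public static void main(String[] args) {
        store.add("dir/file");
        deleteLastChild("dir/file", "dir/", true);
        // Neither the child nor the parent marker exists: "dir" has vanished.
        System.out.println(store.isEmpty()); // true
    }
}
```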






[jira] [Created] (HADOOP-13307) add rsync to Dockerfile so that precommit archive works

2016-06-21 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created HADOOP-13307:
-

 Summary: add rsync to Dockerfile so that precommit archive works
 Key: HADOOP-13307
 URL: https://issues.apache.org/jira/browse/HADOOP-13307
 Project: Hadoop Common
  Issue Type: Improvement
  Components: build
Reporter: Allen Wittenauer
Priority: Trivial


Apache Yetus 0.4.0 adds an archiving capability to store files from the build 
tree.  In order to use the Hadoop Dockerfile, the rsync package needs to be 
added.






[jira] [Created] (HADOOP-13306) add filter does not check if it exists

2016-06-21 Thread chillon_m (JIRA)
chillon_m created HADOOP-13306:
--

 Summary: add filter does not check if it exists
 Key: HADOOP-13306
 URL: https://issues.apache.org/jira/browse/HADOOP-13306
 Project: Hadoop Common
  Issue Type: Improvement
  Components: net
Affects Versions: 2.5.2
Reporter: chillon_m
Priority: Minor


In hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/http/HttpServer2.java, 
line 705, defineFilter() does not check whether the filter already exists. We 
need to check for an existing filter before adding it.

Additionally, No_Cache_Filter is added twice when creating an HttpServer2 
object. I think the call to addNoCacheFilter() from addDefaultApps() is 
unnecessary.
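A minimal sketch of the proposed guard, using a hypothetical stand-alone registry rather than HttpServer2's real defineFilter signature:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch of the proposed check: remember which filter names were already
// defined and skip duplicates. HttpServer2's actual API differs; this only
// demonstrates the guard pattern.
public class FilterRegistry {
    private final Set<String> defined = new LinkedHashSet<>();

    /** Returns true only the first time a filter name is defined. */
    public boolean defineFilter(String name) {
        if (defined.contains(name)) {
            return false; // already registered: skip the duplicate add
        }
        defined.add(name);
        return true;
    }

    public static void main(String[] args) {
        FilterRegistry r = new FilterRegistry();
        System.out.println(r.defineFilter("No_Cache_Filter")); // true: first add
        System.out.println(r.defineFilter("No_Cache_Filter")); // false: duplicate blocked
    }
}
```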






[jira] [Created] (HADOOP-13305) Define common statistics names across schemes

2016-06-21 Thread Mingliang Liu (JIRA)
Mingliang Liu created HADOOP-13305:
--

 Summary: Define common statistics names across schemes
 Key: HADOOP-13305
 URL: https://issues.apache.org/jira/browse/HADOOP-13305
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs
Affects Versions: 2.8.0
Reporter: Mingliang Liu
Assignee: Mingliang Liu


The {{StorageStatistics}} class provides a fairly general interface, i.e. 
{{getLong(name)}} and {{getLongStatistics()}}. There are no shared or standard 
names for the storage statistics, so the behavior of {{getLong(name)}} is up to 
each storage statistics implementation. The problems:
# For common statistics, downstream applications expect the same statistic name 
across different storage statistics and/or file system schemes. Chances are 
they have to use {{DFSOpsCountStorageStatistics#getLong("getStatus")}} and 
{{S3A.Statistics#getLong("get_status")}} to retrieve the same getStatus 
operation stat.
# Moreover, probing per-operation stats is hard if there are no standard, 
shared names.

It makes a lot of sense for different schemes to issue per-operation stats 
under the same names. Meanwhile, every FS will have its own internal things to 
count, which can't be centrally defined or managed; but there are some common 
statistics that would be easier to manage if they all had the same name.

Another motivation is that a common set of names will encourage uniform 
instrumentation of all filesystems; it will also make it easier to analyze the 
output of runs, were the stats to be published to a "performance log" similar 
to the audit log. See Steve's work for S3 (e.g. [HADOOP-13171]).

This jira is to track the effort of defining common StorageStatistics entry 
names.
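A sketch of what shared entry names could look like; the constant names here are illustrative, not the ones this jira eventually standardizes:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a shared constants class so every scheme reports the same key
// (e.g. "op_get_status") regardless of the filesystem implementation.
public class CommonStatisticNames {
    public static final String OP_GET_STATUS = "op_get_status";
    public static final String OP_MKDIRS = "op_mkdirs";

    public static void main(String[] args) {
        // Both an "HDFS-like" and an "S3A-like" stats map use the shared key,
        // so downstream code probes one name across schemes.
        Map<String, Long> hdfsStats = new HashMap<>();
        Map<String, Long> s3aStats = new HashMap<>();
        hdfsStats.put(OP_GET_STATUS, 7L);
        s3aStats.put(OP_GET_STATUS, 3L);
        System.out.println(hdfsStats.get(OP_GET_STATUS) + " " + s3aStats.get(OP_GET_STATUS));
    }
}
```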






Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2016-06-21 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/70/

[Jun 20, 2016 8:07:26 AM] (aajisaka) HADOOP-13192. 
org.apache.hadoop.util.LineReader cannot handle multibyte
[Jun 20, 2016 4:56:53 PM] (sjlee) YARN-4958. The file localization process 
should allow for wildcards to
[Jun 20, 2016 5:42:50 PM] (cmccabe) HDFS-10328. Add per-cache-pool default 
replication num configuration
[Jun 20, 2016 5:46:18 PM] (cmccabe) HADOOP-13280. 
FileSystemStorageStatistics#getLong(“readOps“) should
[Jun 20, 2016 8:46:56 PM] (atm) HDFS-10423. Increase default value of httpfs 
maxHttpHeaderSize.
[Jun 20, 2016 9:25:07 PM] (cmccabe) HADOOP-13288. Guard null stats key in 
FileSystemStorageStatistics
[Jun 20, 2016 11:05:32 PM] (ozawa) HADOOP-9613. [JDK8] Update jersey version to 
latest 1.x release.
[Jun 20, 2016 11:25:30 PM] (jitendra) HADOOP-13291. Probing stats in
[Jun 21, 2016 12:22:55 AM] (jitendra) HDFS-10538. Remove 
AsyncDistributedFileSystem. Contributed by Xiaobing
[Jun 21, 2016 1:25:09 AM] (cmccabe) HDFS-10448. CacheManager#addInternal tracks 
bytesNeeded incorrectly when




-1 overall


The following subsystems voted -1:
unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Failed CTEST tests :

   test_test_libhdfs_threaded_hdfs_static 
   test_test_libhdfs_zerocopy_hdfs_static 
   test_test_native_mini_dfs 

[jira] [Created] (HADOOP-13304) distributed database for storage, MapReduce for compute

2016-06-21 Thread jiang hehui (JIRA)
jiang hehui created HADOOP-13304:


 Summary: distributed database for storage, MapReduce for compute
 URL: https://issues.apache.org/jira/browse/HADOOP-13304
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs
Affects Versions: 2.6.4
Reporter: jiang hehui


In Hadoop, HDFS is responsible for storage and MapReduce is responsible for 
compute.
My idea is that data is stored in a distributed database, and computation works 
like MapReduce.

!http://images2015.cnblogs.com/blog/439702/201606/439702-2016062112414-32823985.png!

* insert: 
using two-phase commit, route the insert to the nodes chosen by the split 
policy

* delete: 
using two-phase commit, route the delete to the nodes chosen by the split 
policy

* update:
using two-phase commit; if the split policy keeps the record on the same node, 
just execute the update there; if the record's node changes, first delete the 
old value on the source node, then insert the new value on the destination node
* select:
** a simple select (data on a single node, no cross-node data fusion needed) 
works just like a standalone database server;
** a complex select (distinct, group by, order by, subqueries, joins across 
multiple nodes) is called a job
{panel}
{color:red}A job is parsed into stages; stages have lineage, and all the 
stages in a job form a DAG (directed acyclic graph). Every stage contains a 
mapsql, a shuffle, and a reducesql.
When a SQL request is received, an execution plan containing the DAG is 
generated from the metadata, including the stages and the mapsql, shuffle, and 
reducesql within each stage; the plan is then executed and the result returned 
to the client.

The analogy with Spark: an RDD is a table, a job is a job.
The analogy with MapReduce in Hadoop: mapsql is map, shuffle is shuffle, 
reducesql is reduce.{color}
{panel}
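The update rule above can be sketched as follows; nodeFor() is a stand-in for the split policy (simple hash partitioning across N nodes here), and all names are illustrative:

```java
// Sketch of the update-routing rule: if the split policy maps the new key to
// the same node, update in place; otherwise delete on the old node and insert
// on the new one.
public class UpdateRouter {
    /** Stand-in for the split policy: hash partitioning across numNodes. */
    static int nodeFor(String key, int numNodes) {
        return Math.floorMod(key.hashCode(), numNodes);
    }

    /** Returns a human-readable plan for the update. */
    static String planUpdate(String oldKey, String newKey, int numNodes) {
        int src = nodeFor(oldKey, numNodes);
        int dst = nodeFor(newKey, numNodes);
        if (src == dst) {
            return "update in place on node " + src;
        }
        return "delete on node " + src + ", insert on node " + dst;
    }

    public static void main(String[] args) {
        System.out.println(planUpdate("k1", "k1", 4)); // key unchanged: in-place update
        System.out.println(planUpdate("k1", "k2", 4)); // key moved: delete + insert
    }
}
```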






[jira] [Created] (HADOOP-13303) Details of KMS High Availability

2016-06-21 Thread qiushi fan (JIRA)
qiushi fan created HADOOP-13303:
---

 Summary: Details of KMS High Availability
 Key: HADOOP-13303
 URL: https://issues.apache.org/jira/browse/HADOOP-13303
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ha, kms
Affects Versions: 2.7.2
Reporter: qiushi fan


I have some confusion about KMS HA recently.

1. We can set up multiple KMS instances behind a load balancer. Among these 
instances there is only one master KMS; the others are slaves. The master KMS 
can handle key create/store/rollover/delete operations by directly contacting 
the JCE keystore file, while the slave KMS instances handle those operations by 
delegating them to the master.

So although we set up multiple KMS instances, there is only one JCE keystore 
file, and only the master KMS can access it. Neither the JCE keystore file nor 
the master KMS has a backup; if either of them dies, there is no way to avoid 
losing data.

Is all of the above true? Does KMS have no solution for handling failure of 
the master KMS or the JCE keystore file?

2. I have heard of another way to achieve KMS HA: using 
LoadBalancingKMSClientProvider. But I cannot find detailed information about 
LoadBalancingKMSClientProvider. How does it achieve KMS HA?
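For question 2, the rough idea (as I understand it) is client-side: the client holds several KMS endpoints and rotates or fails over across them, so no single instance is a client-visible single point of failure. The class below is an illustrative sketch only, not the real provider API:

```java
import java.util.List;

// Illustrative sketch of the load-balancing idea: hold several endpoints and
// pick them round-robin. The real LoadBalancingKMSClientProvider also retries
// the next endpoint when a call fails.
public class FailoverClient {
    private final List<String> endpoints;
    private int next = 0;

    FailoverClient(List<String> endpoints) {
        this.endpoints = endpoints;
    }

    /** Round-robin pick; a real client would also retry on failure. */
    String pickEndpoint() {
        String e = endpoints.get(next % endpoints.size());
        next++;
        return e;
    }

    public static void main(String[] args) {
        FailoverClient c = new FailoverClient(List.of("kms1:16000", "kms2:16000"));
        System.out.println(c.pickEndpoint()); // kms1:16000
        System.out.println(c.pickEndpoint()); // kms2:16000
        System.out.println(c.pickEndpoint()); // back to kms1:16000
    }
}
```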


