[jira] [Created] (HADOOP-13310) S3A reporting of file group as null is harmful to compatibility for the shell.
Chris Nauroth created HADOOP-13310:
--

Summary: S3A reporting of file group as null is harmful to compatibility for the shell.
Key: HADOOP-13310
URL: https://issues.apache.org/jira/browse/HADOOP-13310
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/s3
Reporter: Chris Nauroth
Priority: Minor

S3A does not persist group information in file metadata. Instead, it stubs the value of the group to an empty string. Although the JavaDocs for {{FileStatus#getGroup}} indicate that empty string is a possible return value, this is likely to cause compatibility problems. Most notably, shell scripts that expect to be able to perform positional parsing on the output of things like {{hadoop fs -ls}} will stop working if retargeted from HDFS to S3A.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
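The positional-parsing hazard can be sketched with made-up listing lines (an illustrative Python sketch, not Hadoop code; the file names and sizes are invented): on HDFS the group occupies its own column, while an empty S3A group makes a whitespace-splitting parser pick up the next column instead.

```python
# Illustrative sketch: why an empty group field breaks positional parsing
# of `hadoop fs -ls`-style output. Both lines below are fabricated examples.
hdfs_line = "-rw-r--r--   3 alice hadoop       1024 2016-06-21 10:00 /data/f"
s3a_line  = "-rw-r--r--   1 alice            1024 2016-06-21 10:00 /data/f"

def group_field(line):
    # A naive positional parser: on HDFS, field 4 (index 3) is the group.
    return line.split()[3]

print(group_field(hdfs_line))  # "hadoop" -- the group, as expected
print(group_field(s3a_line))   # "1024" -- the size column has shifted left
```

Because `str.split()` collapses runs of whitespace, the empty group silently shifts every later column, which is exactly what breaks scripts retargeted from HDFS to S3A.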
[jira] [Created] (HADOOP-13309) Document S3A known limitations in file ownership and permission model.
Chris Nauroth created HADOOP-13309:
--

Summary: Document S3A known limitations in file ownership and permission model.
Key: HADOOP-13309
URL: https://issues.apache.org/jira/browse/HADOOP-13309
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/s3
Reporter: Chris Nauroth
Priority: Minor

S3A does not match the implementation of HDFS in its handling of file ownership and permissions. Fundamental S3 limitations prevent it. This is a frequent source of confusion for end users. This issue proposes to document these known limitations.
[jira] [Created] (HADOOP-13308) S3A delete may fail to preserve parent directory.
Chris Nauroth created HADOOP-13308:
--

Summary: S3A delete may fail to preserve parent directory.
Key: HADOOP-13308
URL: https://issues.apache.org/jira/browse/HADOOP-13308
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/s3
Reporter: Chris Nauroth

When a file or directory is deleted in S3A, and the result of that deletion makes the parent empty, S3A must store a fake directory (a pure metadata object) at the parent to indicate that the directory still exists. The logic for restoring fake directories is not resilient to a process death. This may cause a directory to vanish unexpectedly after a deletion of its last child.
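The delete-then-restore sequence can be sketched as follows (an illustrative Python sketch over a plain dict, not the S3A Java implementation; keys and the marker convention are assumptions): the fake directory marker is only written after the child is deleted, so a crash between the two steps loses the parent.

```python
# Illustrative sketch of the fake-directory-restore logic described in the
# issue. An object store is modeled as a dict of key -> bytes; an empty
# object whose key ends in "/" stands in for a fake directory marker.
store = {"data/logs/part-0000": b"..."}

def delete(key):
    del store[key]
    parent = key.rsplit("/", 1)[0] + "/"
    # --- if the process dies here, the marker below is never written and
    # --- the parent "directory" vanishes, which is the reported bug.
    if not any(k.startswith(parent) for k in store):
        store[parent] = b""   # restore the fake directory marker

delete("data/logs/part-0000")
print("data/logs/" in store)  # True: parent still "exists" (no crash case)
```

The sketch makes the non-atomicity visible: the delete and the marker write are two separate store operations with no transaction around them.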
[jira] [Created] (HADOOP-13307) add rsync to Dockerfile so that precommit archive works
Allen Wittenauer created HADOOP-13307:
--

Summary: add rsync to Dockerfile so that precommit archive works
Key: HADOOP-13307
URL: https://issues.apache.org/jira/browse/HADOOP-13307
Project: Hadoop Common
Issue Type: Improvement
Components: build
Reporter: Allen Wittenauer
Priority: Trivial

Apache Yetus 0.4.0 adds an archiving capability to store files from the build tree. In order to use it with the Hadoop Dockerfile, the rsync package needs to be added.
[jira] [Created] (HADOOP-13306) add filter does not check if it exists
chillon_m created HADOOP-13306:
--

Summary: add filter does not check if it exists
Key: HADOOP-13306
URL: https://issues.apache.org/jira/browse/HADOOP-13306
Project: Hadoop Common
Issue Type: Improvement
Components: net
Affects Versions: 2.5.2
Reporter: chillon_m
Priority: Minor

In hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/http/HttpServer2.java, line 705, defineFilter() does not check whether the filter already exists; it should check before adding it. In addition, NO_CACHE_FILTER is added twice when an HttpServer2 object is created. I think the call to addNoCacheFilter() from addDefaultApps() is unnecessary.
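The proposed existence check can be sketched as below (an illustrative Python sketch; the real code is the Java HttpServer2 class, and the registry class and method names here are invented for illustration):

```python
# Illustrative sketch of a define-filter call that refuses duplicates,
# the behavior the issue asks for. Not the actual HttpServer2 code.
class FilterRegistry:
    def __init__(self):
        self._filters = {}          # filter name -> handler

    def define_filter(self, name, handler):
        if name in self._filters:   # the missing existence check
            return False            # already defined; skip the duplicate
        self._filters[name] = handler
        return True

reg = FilterRegistry()
print(reg.define_filter("NoCacheFilter", object()))  # True: first add
print(reg.define_filter("NoCacheFilter", object()))  # False: duplicate rejected
```

With such a guard in place, a second registration of the no-cache filter (whether from addDefaultApps() or elsewhere) would become a harmless no-op.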
[jira] [Created] (HADOOP-13305) Define common statistics names across schemes
Mingliang Liu created HADOOP-13305:
--

Summary: Define common statistics names across schemes
Key: HADOOP-13305
URL: https://issues.apache.org/jira/browse/HADOOP-13305
Project: Hadoop Common
Issue Type: Sub-task
Components: fs
Affects Versions: 2.8.0
Reporter: Mingliang Liu
Assignee: Mingliang Liu

{{StorageStatistics}} provides a fairly general interface, i.e. {{getLong(name)}} and {{getLongStatistics()}}. There are no shared or standard names for the storage statistics, so the meaning of {{getLong(name)}} is up to each storage statistics implementation. The problems:
# For common statistics, downstream applications expect the same statistic name across different storage statistics and/or file system schemes. As it stands, they may have to use {{DFSOpsCountStorageStatistics#getLong("getStatus")}} and {{S3A.Statistics#getLong("get_status")}} to retrieve the same getStatus operation stat.
# Moreover, probing per-operation stats is hard if there are no standard, shared names. It makes a lot of sense for different schemes to issue per-operation stats under the same name.

Meanwhile, every FS will have its own internal things to count, which can't be centrally defined or managed. But there are some common statistics that would be easier to manage if they all had the same name. Another motivation is that a common set of names will encourage uniform instrumentation of all filesystems; it will also make it easier to analyze the output of runs, were the stats to be published to a "performance log" similar to the audit log. See Steve's work for S3 (e.g. [HADOOP-13171]).

This jira is to track the effort of defining common StorageStatistics entry names.
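The idea can be sketched as a shared set of statistic-name constants that every scheme reports under (an illustrative Python sketch; the constant name, class names, and counter values below are hypothetical, not the final Hadoop API):

```python
# Illustrative sketch: one shared statistic name instead of per-scheme
# spellings like "getStatus" vs "get_status". All names here are invented.
OP_GET_FILE_STATUS = "op_get_file_status"   # hypothetical shared key

class DFSStats:
    """Stand-in for an HDFS-side StorageStatistics implementation."""
    def __init__(self):
        self._counters = {OP_GET_FILE_STATUS: 7}
    def get_long(self, name):
        return self._counters.get(name)

class S3AStats:
    """Stand-in for an S3A-side StorageStatistics implementation."""
    def __init__(self):
        self._counters = {OP_GET_FILE_STATUS: 3}
    def get_long(self, name):
        return self._counters.get(name)

# Downstream code can probe one name regardless of the scheme:
for stats in (DFSStats(), S3AStats()):
    print(stats.get_long(OP_GET_FILE_STATUS))
```

Scheme-specific counters would still live alongside the shared ones; only the common subset needs centrally defined names.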
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/70/

[Jun 20, 2016 8:07:26 AM] (aajisaka) HADOOP-13192. org.apache.hadoop.util.LineReader cannot handle multibyte
[Jun 20, 2016 4:56:53 PM] (sjlee) YARN-4958. The file localization process should allow for wildcards to
[Jun 20, 2016 5:42:50 PM] (cmccabe) HDFS-10328. Add per-cache-pool default replication num configuration
[Jun 20, 2016 5:46:18 PM] (cmccabe) HADOOP-13280. FileSystemStorageStatistics#getLong("readOps") should
[Jun 20, 2016 8:46:56 PM] (atm) HDFS-10423. Increase default value of httpfs maxHttpHeaderSize.
[Jun 20, 2016 9:25:07 PM] (cmccabe) HADOOP-13288. Guard null stats key in FileSystemStorageStatistics
[Jun 20, 2016 11:05:32 PM] (ozawa) HADOOP-9613. [JDK8] Update jersey version to latest 1.x release.
[Jun 20, 2016 11:25:30 PM] (jitendra) HADOOP-13291. Probing stats in
[Jun 21, 2016 12:22:55 AM] (jitendra) HDFS-10538. Remove AsyncDistributedFileSystem. Contributed by Xiaobing
[Jun 21, 2016 1:25:09 AM] (cmccabe) HDFS-10448. CacheManager#addInternal tracks bytesNeeded incorrectly when

-1 overall

The following subsystems voted -1:
    unit

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

    Failed CTEST tests:
        test_test_libhdfs_threaded_hdfs_static
        test_test_libhdfs_zerocopy_hdfs_static
        test_test_native_mini_dfs
[jira] [Created] (HADOOP-13304) distributed database for storage, MapReduce for compute
jiang hehui created HADOOP-13304:
--

Summary: distributed database for storage, MapReduce for compute
Key: HADOOP-13304
URL: https://issues.apache.org/jira/browse/HADOOP-13304
Project: Hadoop Common
Issue Type: New Feature
Components: fs
Affects Versions: 2.6.4
Reporter: jiang hehui

In Hadoop, HDFS is responsible for storage and MapReduce is responsible for compute. My idea is that data are stored in a distributed database, and data computation works like MapReduce.

!http://images2015.cnblogs.com/blog/439702/201606/439702-2016062112414-32823985.png!

* insert: using two-phase commit, according to the split policy, just execute the insert in the relevant nodes
* delete: using two-phase commit, according to the split policy, just execute the delete in the relevant nodes
* update: using two-phase commit, according to the split policy; if the record's node does not change, just execute the update in place, but if the record's node changes, first delete the old value in the source node, then insert the new value in the destination node
* select:
** a simple select (data in a single node, or no data fusion needed across multiple nodes) works just like on a standalone database server;
** a complex select (distinct, group by, order by, sub-query, or join across multiple nodes) is what we call a job
{panel}
{color:red}Jobs are parsed into stages; stages have lineage, and all the stages in a job make up a DAG (directed acyclic graph). Every stage contains a mapsql, a shuffle, and a reducesql. When a request SQL is received, an execution plan containing the DAG is generated according to the metadata, including the stages and the mapsql, shuffle, and reducesql in each stage; then the plan is executed and the result returned to the client. It is the same as in Spark: an RDD is a table, and a job is a job. It is the same as MapReduce in Hadoop: mapsql is map, shuffle is shuffle, and reducesql is reduce.{color}
{panel}
[jira] [Created] (HADOOP-13303) Detailed Information on KMS High Availability
qiushi fan created HADOOP-13303:
--

Summary: Detailed Information on KMS High Availability
Key: HADOOP-13303
URL: https://issues.apache.org/jira/browse/HADOOP-13303
Project: Hadoop Common
Issue Type: Improvement
Components: ha, kms
Affects Versions: 2.7.2
Reporter: qiushi fan

I have some confusion about KMS HA recently.

1. We can set up multiple KMS instances behind a load balancer. Among all these KMS instances there is only one master KMS; the others are slaves. The master KMS handles key create/store/rollover/delete operations by directly contacting the JCE keystore file. The slave KMS instances handle key create/store/rollover/delete operations by delegating them to the master KMS. So although we set up multiple KMS instances, there is only one JCE keystore file, and only the master KMS can access this file. Neither the JCE keystore file nor the master KMS has a backup; if either of them dies, there is no way to avoid losing data. Is all of the above true? Does KMS have no solution for handling the failure of the master KMS or the JCE keystore file?

2. I heard of another way to achieve KMS HA: making use of LoadBalancingKMSClientProvider. But I can't find detailed information about LoadBalancingKMSClientProvider. Why can LoadBalancingKMSClientProvider achieve KMS HA?