[jira] [Created] (HDFS-13903) Writing data into HDFS is very slow when HDFS is mounted to the local file system via NFSv3

2018-09-06 Thread Liao Chunbo (JIRA)
Liao Chunbo created HDFS-13903:
--

 Summary: Writing data into HDFS is very slow when HDFS is 
mounted to the local file system via NFSv3
 Key: HDFS-13903
 URL: https://issues.apache.org/jira/browse/HDFS-13903
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 3.1.1
Reporter: Liao Chunbo


I installed Hadoop 3.1.1 and mounted HDFS to the local filesystem via 
"mount -t nfs -o vers=3,nolock,rw,async,wsize=32768,rsize=32768 
hdfsserver:/ /mydir".

When I use the cp command to copy files to the mounted directory (/mydir), 
the speed is very slow, and exceptions like the following appear in the logs:
2018-09-07 11:04:51,952 INFO security.ShellBasedIdMapping: Update cache now
2018-09-07 11:08:48,305 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:08:48,806 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:08:48,816 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:08:48,829 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:08:49,001 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:10:24,975 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:11:19,747 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:11:19,756 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:11:19,768 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:12:17,220 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:12:17,233 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:12:35,109 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:12:36,398 ERROR nfs3.RpcProgramNfs3: Setting file size is not supported when setattr, fileId: 27204
2018-09-07 11:12:44,424 ERROR nfs3.RpcProgramNfs3: Setting file size is not supported when setattr, fileId: 27204
2018-09-07 11:12:53,382 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:12:53,394 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:12:59,658 ERROR nfs3.RpcProgramNfs3: Setting file size is not supported when setattr, fileId: 27204
2018-09-07 11:13:17,519 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:13:17,533 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:13:18,602 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:13:18,613 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:13:18,933 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:13:42,596 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:13:42,608 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:18:30,308 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:19:23,438 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:19:23,446 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:19:23,666 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:19:31,833 INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
2018-09-07 11:19:51,967 INFO security.ShellBasedIdMapping: Update cache now
2018-09-07 11:20:02,884 WARN hdfs.DataStreamer: Exception for BP-1952000504-10.56.233.182-1536129789677:blk_1073751181_10452
java.io.EOFException: Unexpected EOF while trying to read response from server
    at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:549)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
    at 
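
The repeated "Can't map group" lines appear to come from ShellBasedIdMapping 
failing to resolve the group to a numeric gid and falling back to the group 
name's Java string hash. A minimal illustration (not the actual class) 
reproduces the exact value seen in the log:

{code}
// Minimal sketch of the hashcode fallback visible in the log above: when a
// group name cannot be mapped to a gid, the name's String.hashCode() is
// used instead. "supergroup".hashCode() is exactly the logged -1710818332.
public class GroupHashFallback {
  public static void main(String[] args) {
    System.out.println("supergroup".hashCode()); // prints -1710818332
  }
}
{code}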

Re: [Vote] Merge discussion for Node attribute support feature YARN-3409

2018-09-06 Thread Gour Saha
+1 for merge

-Gour




Re: [Vote] Merge discussion for Node attribute support feature YARN-3409

2018-09-06 Thread Sunil G
+1 for the merge.

- Sunil


On Wed, Sep 5, 2018 at 6:01 PM Naganarasimha Garla <
naganarasimha...@apache.org> wrote:

> Hi All,
>  Thanks for feedback folks, based on the positive response starting
> a Vote thread for merging YARN-3409 to master.
>
> Regards,
> + Naga & Sunil
>
> On Wed, 5 Sep 2018 2:51 am Wangda Tan,  wrote:
>
> > +1 for the merge; it's going to be a great addition to the 3.2.0
> > release. Thanks to everybody for pushing this feature to completion.
> >
> > Best,
> > Wangda
> >
> > On Tue, Sep 4, 2018 at 8:25 AM Bibinchundatt 
> > wrote:
> >
> >> +1 for merge. The feature would be a good addition to the 3.2 release.
> >>
> >> --
> >> Bibin A Chundatt
> >> M: +91-9742095715
> >> E: bibin.chund...@huawei.com
> >> 2012 Laboratories - India Research IT BU Branch Dept.
> >> From:Naganarasimha Garla
> >> To:common-...@hadoop.apache.org,Hdfs-dev,yarn-...@hadoop.apache.org,
> >> mapreduce-...@hadoop.apache.org,
> >> Date:2018-08-29 20:00:44
> >> Subject:[Discuss] Merge discussion for Node attribute support feature
> >> YARN-3409
> >>
> >> Hi All,
> >>
> >> We would like to hear your thoughts on merging “Node Attributes Support
> in
> >> YARN” branch (YARN-3409) [2] into trunk in a few weeks. The goal is to
> get
> >> it in for HADOOP 3.2.
> >>
> >> *Major work happened in this branch*
> >>
> >> YARN-6858. Attribute Manager to store and provide node attributes in RM
> >> YARN-7871. Support Node attributes reporting from NM to RM (distributed
> >> node attributes)
> >> YARN-7863. Modify placement constraints to support node attributes
> >> YARN-7875. Node Attribute store for storing and recovering attributes
> >>
> >> *Detailed Design:*
> >>
> >> Please refer to [1] for the detailed design document.
> >>
> >> *Testing Efforts:*
> >>
> >> We did detailed testing of the feature over the last few weeks.
> >> The feature is enabled only when Node Attributes constraints are
> >> specified through the SchedulingRequest from the AM.
> >> The manager implementation stores and recovers Node Attributes, and it
> >> works with the existing placement constraints.
> >>
> >> *Regarding API stability:*
> >>
> >> All newly added @Public APIs are @Unstable.
> >>
> >> The documentation JIRA [3] provides detailed configuration details.
> >> The feature works end-to-end and we tested it in our local cluster.
> >> The branch is run against trunk and tracked via [4].
> >>
> >> We would love to get your thoughts before opening a voting thread.
> >>
> >> Special thanks to the team of folks who worked hard and contributed
> >> towards this effort, including design discussion / patch / reviews,
> >> etc.: Weiwei Yang, Bibin Chundatt, Wangda Tan, Vinod Kumar Vavilapalli,
> >> Konstantinos
> >> Karanasos, Arun Suresh, Varun Saxena, Devaraj Kavali, Lei Guo, Chong
> Chen.
> >>
> >> [1] :
> >>
> >>
> https://issues.apache.org/jira/secure/attachment/12937633/Node-Attributes-Requirements-Design-doc_v2.pdf
> >> [2] : https://issues.apache.org/jira/browse/YARN-3409
> >> [3] : https://issues.apache.org/jira/browse/YARN-7865
> >> [4] : https://issues.apache.org/jira/browse/YARN-8718
> >>
> >> Thanks,
> >> + Naga & Sunil Govindan
> >>
> >
>


[jira] [Created] (HDFS-13902) Add jmx, conf, and stacks menus to the datanode page

2018-09-06 Thread fengchuang (JIRA)
fengchuang created HDFS-13902:
-

 Summary: Add jmx, conf, and stacks menus to the datanode page
 Key: HDFS-13902
 URL: https://issues.apache.org/jira/browse/HDFS-13902
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.0.3
Reporter: fengchuang









[jira] [Resolved] (HDDS-2) Chill Mode to consider percentage of container reports

2018-09-06 Thread Anu Engineer (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-2?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer resolved HDDS-2.
-
Resolution: Duplicate

HDDS-351

> Chill Mode to consider percentage of container reports
> --
>
> Key: HDDS-2
> URL: https://issues.apache.org/jira/browse/HDDS-2
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: SCM
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Critical
> Fix For: 0.2.1
>
> Attachments: Chill Mode.pdf, HDDS-02.002.patch, HDDS-02.003.patch, 
> HDDS-2.004.patch, HDFS-13500.00.patch, HDFS-13500.01.patch, 
> HDFS-13500.02.patch
>
>
> Currently, SCM comes out of chill mode as soon as one datanode is
> registered.
> This needs to be changed to consider the percentage of container reports
> received.
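
A hedged sketch of what such a percentage-based exit rule could look like 
(the class name and threshold are illustrative, not from an actual patch):

{code}
// Hypothetical sketch: instead of leaving chill mode after the first
// datanode registration, wait until a configured fraction of the known
// containers has been reported back to SCM.
class ChillModeExitRule {
  private final double threshold; // e.g. 0.99 = 99% of containers reported

  ChillModeExitRule(double threshold) {
    this.threshold = threshold;
  }

  boolean canExitChillMode(long containersReported, long totalContainers) {
    if (totalContainers == 0) {
      return true; // nothing to wait for on an empty cluster
    }
    return (double) containersReported / totalContainers >= threshold;
  }
}
{code}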






[jira] [Resolved] (HDDS-296) OMMetadataManagerLock is held by getPendingDeletionKeys for a full table scan

2018-09-06 Thread Anu Engineer (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer resolved HDDS-296.
---
Resolution: Implemented

Fixed via HDDS-355, HDDS-356, HDDS-357, HDDS-358...

> OMMetadataManagerLock is held by getPendingDeletionKeys for a full table scan
> -
>
> Key: HDDS-296
> URL: https://issues.apache.org/jira/browse/HDDS-296
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Elek, Marton
>Assignee: Anu Engineer
>Priority: Critical
> Fix For: 0.2.1
>
> Attachments: local.png
>
>
> We identified the problem during freon tests on real clusters. First I saw it 
> on a Kubernetes-based pseudo cluster (50 datanodes, 1 freon). After a while 
> the rate of key allocation slowed down (see the attached image).
> I could also reproduce the problem with a local cluster (I used the 
> hadoop-dist/target/compose/ozoneperf setup). After the first 1 million keys, 
> key creation almost stopped.
> With the help of [~nandakumar131] we identified the problem as the lock in 
> the Ozone Manager. (We profiled the OM with VisualVM and found that the code 
> is locked for an extremely long time; we also checked the rocksdb/rpc metrics 
> from Prometheus and everything else worked well.)
> [~nandakumar131] suggested using an instrumented lock in the OMMetadataManager. 
> With a custom build we identified that the problem is that the deletion 
> service holds the OMMetadataManager lock for a full range scan. For 1 million 
> keys this took about 10 seconds (on my local developer machine with an SSD).
> {code}
> ozoneManager_1  | 2018-07-25 12:45:03 WARN  OMMetadataManager:143 - Lock held time above threshold: lock identifier: OMMetadataManagerLock lockHeldTimeMs=2648 ms. Suppressed 0 lock warnings. The stack trace is: java.lang.Thread.getStackTrace(Thread.java:1559)
> ozoneManager_1  | org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> ozoneManager_1  | org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> ozoneManager_1  | org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> ozoneManager_1  | org.apache.hadoop.util.InstrumentedReadLock.unlock(InstrumentedReadLock.java:78)
> ozoneManager_1  | org.apache.hadoop.ozone.om.KeyManagerImpl.getPendingDeletionKeys(KeyManagerImpl.java:506)
> ozoneManager_1  | org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:98)
> ozoneManager_1  | org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:85)
> ozoneManager_1  | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1  | java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ozoneManager_1  | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1  | java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ozoneManager_1  | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1  | java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> ozoneManager_1  | java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> ozoneManager_1  | java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> ozoneManager_1  | java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> ozoneManager_1  | java.lang.Thread.run(Thread.java:748)
> {code}
> I checked it with the DeletionService disabled and it worked well.
> The deletion service should be improved to work without long-term locking.
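
A hedged sketch of the batching approach such a fix implies (hypothetical 
names, not the actual OM code): scan in bounded batches and release the 
shared lock between batches so writers are not starved for a full scan.

{code}
// Hypothetical sketch: bounded-batch scanning instead of holding the
// metadata read lock across an entire table scan.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class PendingDeletionScanner {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  List<String> nextBatch(Iterator<String> cursor, int batchSize) {
    List<String> batch = new ArrayList<>(batchSize);
    lock.readLock().lock(); // held only for one bounded batch
    try {
      while (cursor.hasNext() && batch.size() < batchSize) {
        batch.add(cursor.next());
      }
    } finally {
      lock.readLock().unlock(); // release before processing the batch
    }
    return batch;
  }
}
{code}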






[jira] [Resolved] (HDDS-314) ozoneShell putKey command overwrites an existing key with the same name

2018-09-06 Thread Anu Engineer (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer resolved HDDS-314.
---
Resolution: Not A Problem

> ozoneShell putKey command overwrites an existing key with the same name
> --
>
> Key: HDDS-314
> URL: https://issues.apache.org/jira/browse/HDDS-314
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Critical
> Fix For: 0.2.1
>
> Attachments: HDDS-314.001.patch, HDDS-314.002.patch, 
> HDDS-314.003.patch
>
>
> Steps taken: 
> 1) Created a volume root-volume and a bucket root-bucket.
> 2) Ran the following command to put a key named 'passwd':
>  
> {noformat}
> hadoop@08315aa4b367:~/bin$ ./ozone oz -putKey /root-volume/root-bucket/passwd 
> -file /etc/services -v
> 2018-08-02 09:20:17 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> Volume Name : root-volume
> Bucket Name : root-bucket
> Key Name : passwd
> File Hash : 567c100888518c1163b3462993de7d47
> 2018-08-02 09:20:18 INFO ConfUtils:41 - raft.rpc.type = GRPC (default)
> 2018-08-02 09:20:18 INFO ConfUtils:41 - raft.grpc.message.size.max = 33554432 
> (custom)
> 2018-08-02 09:20:18 INFO ConfUtils:41 - raft.client.rpc.retryInterval = 300 
> ms (default)
> 2018-08-02 09:20:18 INFO ConfUtils:41 - 
> raft.client.async.outstanding-requests.max = 100 (default)
> 2018-08-02 09:20:18 INFO ConfUtils:41 - raft.client.async.scheduler-threads = 
> 3 (default)
> 2018-08-02 09:20:18 INFO ConfUtils:41 - raft.grpc.flow.control.window = 1MB 
> (=1048576) (default)
> 2018-08-02 09:20:18 INFO ConfUtils:41 - raft.grpc.message.size.max = 33554432 
> (custom)
> 2018-08-02 09:20:18 INFO ConfUtils:41 - raft.client.rpc.request.timeout = 
> 3000 ms (default)
> Aug 02, 2018 9:20:18 AM 
> org.apache.ratis.shaded.io.grpc.internal.ProxyDetectorImpl detectProxy
>  
> {noformat}
> 3) Ran the following command to put a key named 'passwd' again:
> {noformat}
> hadoop@08315aa4b367:~/bin$ ./ozone oz -putKey /root-volume/root-bucket/passwd 
> -file /etc/passwd -v
> 2018-08-02 09:20:41 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> Volume Name : root-volume
> Bucket Name : root-bucket
> Key Name : passwd
> File Hash : b056233571cc80d6879212911cb8e500
> 2018-08-02 09:20:41 INFO ConfUtils:41 - raft.rpc.type = GRPC (default)
> 2018-08-02 09:20:42 INFO ConfUtils:41 - raft.grpc.message.size.max = 33554432 
> (custom)
> 2018-08-02 09:20:42 INFO ConfUtils:41 - raft.client.rpc.retryInterval = 300 
> ms (default)
> 2018-08-02 09:20:42 INFO ConfUtils:41 - 
> raft.client.async.outstanding-requests.max = 100 (default)
> 2018-08-02 09:20:42 INFO ConfUtils:41 - raft.client.async.scheduler-threads = 
> 3 (default)
> 2018-08-02 09:20:42 INFO ConfUtils:41 - raft.grpc.flow.control.window = 1MB 
> (=1048576) (default)
> 2018-08-02 09:20:42 INFO ConfUtils:41 - raft.grpc.message.size.max = 33554432 
> (custom)
> 2018-08-02 09:20:42 INFO ConfUtils:41 - raft.client.rpc.request.timeout = 
> 3000 ms (default)
> Aug 02, 2018 9:20:42 AM 
> org.apache.ratis.shaded.io.grpc.internal.ProxyDetectorImpl 
> detectProxy{noformat}
>  
> The key 'passwd' was overwritten with the new content, and no error was 
> thrown saying that the key already exists.
> Expectation:
> ---
> Overwriting an existing key with the same name should not be allowed.






[jira] [Resolved] (HDFS-13872) Only some protocol methods should perform msync wait

2018-09-06 Thread Erik Krogen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen resolved HDFS-13872.

Resolution: Duplicate

Closing in favor of HDFS-13880

> Only some protocol methods should perform msync wait
> 
>
> Key: HDFS-13872
> URL: https://issues.apache.org/jira/browse/HDFS-13872
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13872-HDFS-12943.000.patch
>
>
> Currently the implementation of msync added in HDFS-13767 waits until the 
> server has caught up to the client-specified transaction ID regardless of 
> what the inbound RPC is. This particularly causes problems for 
> ObserverReadProxyProvider (see HDFS-13779) when we try to fetch the state 
> from an observer/standby; this should be a quick operation, but it has to 
> wait for the node to catch up to the most current state. I initially thought 
> all {{HAServiceProtocol}} methods should thus be excluded from the wait 
> period, but actually I think the right approach is that _only_ 
> {{ClientProtocol}} methods should be subjected to the wait period. I propose 
> that we can do this via an annotation on client protocol which can then be 
> checked within {{ipc.Server}}.
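
A hedged sketch of the annotation-based gating the description proposes 
(the annotation name and server-side check are illustrative, not from an 
actual patch):

{code}
// Hypothetical sketch: mark only the protocols whose methods should be
// subject to the msync wait; ipc.Server would skip the wait otherwise.
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface RequiresMsyncWait {}

// Inside ipc.Server (illustrative pseudologic):
// if (protocolClass.isAnnotationPresent(RequiresMsyncWait.class)) {
//   waitUntilCaughtUp(clientSeenTxId); // hypothetical helper
// }
{code}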






[jira] [Resolved] (HDDS-402) add separate unit tests for SCM chill mode exit rules

2018-09-06 Thread Ajay Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar resolved HDDS-402.
-
Resolution: Not A Problem

>  add separate unit tests for SCM chill mode exit rules
> --
>
> Key: HDDS-402
> URL: https://issues.apache.org/jira/browse/HDDS-402
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Ajay Kumar
>Priority: Major
>
>  add separate unit tests for SCM chill mode exit rules






[jira] [Created] (HDDS-409) Ozone acceptance-test and integration-test packages have undefined hadoop component

2018-09-06 Thread Hanisha Koneru (JIRA)
Hanisha Koneru created HDDS-409:
---

 Summary: Ozone acceptance-test and integration-test packages have 
undefined hadoop component
 Key: HDDS-409
 URL: https://issues.apache.org/jira/browse/HDDS-409
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Hanisha Koneru


When building the ozone package, the acceptance-test and integration-test 
packages create an UNDEF hadoop component in the share folder:
 * ./hadoop-ozone/acceptance-test/target/hadoop-ozone-acceptance-test-3.2.0-SNAPSHOT/share/hadoop/UNDEF/lib
 * ./hadoop-ozone/integration-test/target/hadoop-ozone-integration-test-0.2.1-SNAPSHOT/share/hadoop/UNDEF/lib






[jira] [Created] (HDDS-407) ozone logs are written to ozone.log.2018-09-05 instead of ozone.log

2018-09-06 Thread Nilotpal Nandi (JIRA)
Nilotpal Nandi created HDDS-407:
---

 Summary: ozone logs are written to ozone.log.2018-09-05 instead of 
ozone.log
 Key: HDDS-407
 URL: https://issues.apache.org/jira/browse/HDDS-407
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Reporter: Nilotpal Nandi
 Fix For: 0.2.1


Please refer to the details below.

Ozone-related logs are written to ozone.log.2018-09-05 instead of ozone.log. 
Also, please check the timestamps of the logs. The cluster was created 
{noformat}
[root@ctr-e138-1518143905142-459606-01-02 logs]# ls -lhart 
/root/hadoop_trunk/ozone-0.2.1-SNAPSHOT/logs/
total 968K
drwxr-xr-x 9 root root 4.0K Sep 5 10:04 ..
-rw-r--r-- 1 root root 0 Sep 5 10:04 fairscheduler-statedump.log
-rw-r--r-- 1 root root 17K Sep 5 10:05 
hadoop-root-om-ctr-e138-1518143905142-459606-01-02.hwx.site.out.1
-rw-r--r-- 1 root root 16K Sep 5 10:10 
hadoop-root-om-ctr-e138-1518143905142-459606-01-02.hwx.site.out
-rw-r--r-- 1 root root 11K Sep 5 10:10 
hadoop-root-om-ctr-e138-1518143905142-459606-01-02.hwx.site.log
-rw-r--r-- 1 root root 17K Sep 6 05:42 
hadoop-root-datanode-ctr-e138-1518143905142-459606-01-02.hwx.site.out
-rw-r--r-- 1 root root 2.1K Sep 6 13:20 ozone.log
-rw-r--r-- 1 root root 67K Sep 6 13:22 
hadoop-root-datanode-ctr-e138-1518143905142-459606-01-02.hwx.site.log
drwxr-xr-x 2 root root 4.0K Sep 6 13:31 .
-rw-r--r-- 1 root root 811K Sep 6 13:39 ozone.log.2018-09-05
[root@ctr-e138-1518143905142-459606-01-02 logs]# date
Thu Sep 6 13:39:47 UTC 2018{noformat}
 

tail of ozone.log
{noformat}
[root@ctr-e138-1518143905142-459606-01-02 logs]# tail -f ozone.log
2018-09-06 10:51:56,616 [IPC Server handler 13 on 9889] DEBUG 
(KeyManagerImpl.java:255) - Key 0file allocated in volume test-vol2 bucket 
test-bucket2
2018-09-06 10:52:18,570 [IPC Server handler 9 on 9889] DEBUG 
(KeyManagerImpl.java:255) - Key 0file1 allocated in volume test-vol2 bucket 
test-bucket2
2018-09-06 10:52:32,256 [IPC Server handler 12 on 9889] DEBUG 
(KeyManagerImpl.java:255) - Key 0file2 allocated in volume test-vol2 bucket 
test-bucket2
2018-09-06 10:53:11,008 [IPC Server handler 14 on 9889] DEBUG 
(KeyManagerImpl.java:255) - Key 0file2 allocated in volume test-vol2 bucket 
test-bucket2
2018-09-06 10:53:28,316 [IPC Server handler 10 on 9889] DEBUG 
(KeyManagerImpl.java:255) - Key 0file2 allocated in volume test-vol2 bucket 
test-bucket2
2018-09-06 10:53:39,509 [IPC Server handler 17 on 9889] DEBUG 
(KeyManagerImpl.java:255) - Key 0file3 allocated in volume test-vol2 bucket 
test-bucket2
2018-09-06 11:31:02,388 [IPC Server handler 19 on 9889] DEBUG 
(KeyManagerImpl.java:255) - Key 2GBFILE allocated in volume test-vol2 bucket 
test-bucket2
2018-09-06 11:32:44,269 [IPC Server handler 12 on 9889] DEBUG 
(KeyManagerImpl.java:255) - Key 2GBFILE_1 allocated in volume test-vol2 bucket 
test-bucket2
2018-09-06 13:17:33,408 [IPC Server handler 16 on 9889] DEBUG 
(KeyManagerImpl.java:255) - Key FILEWITHZEROS allocated in volume test-vol2 
bucket test-bucket2
2018-09-06 13:20:13,897 [IPC Server handler 15 on 9889] DEBUG 
(KeyManagerImpl.java:255) - Key FILEWITHZEROS1 allocated in volume test-vol2 
bucket test-bucket2{noformat}
 

tail of ozone.log.2018-09-05:
{noformat}
root@ctr-e138-1518143905142-459606-01-02 logs]# tail -50 
ozone.log.2018-09-05
2018-09-06 13:28:57,866 [BlockDeletingService#8] DEBUG 
(TopNOrderedContainerDeletionChoosingPolicy.java:79) - Stop looking for next 
container, there is no pending deletion block contained in remaining containers.
2018-09-06 13:29:07,816 [Datanode State Machine Thread - 0] DEBUG 
(DatanodeStateMachine.java:145) - Executing cycle Number : 3266
2018-09-06 13:29:13,687 [Datanode ReportManager Thread - 0] DEBUG 
(ContainerSet.java:191) - Starting container report iteration.
2018-09-06 13:29:37,816 [Datanode State Machine Thread - 0] DEBUG 
(DatanodeStateMachine.java:145) - Executing cycle Number : 3267
2018-09-06 13:29:57,866 [BlockDeletingService#8] DEBUG 
(TopNOrderedContainerDeletionChoosingPolicy.java:79) - Stop looking for next 
container, there is no pending deletion block contained in remaining containers.
2018-09-06 13:30:07,816 [Datanode State Machine Thread - 0] DEBUG 
(DatanodeStateMachine.java:145) - Executing cycle Number : 3268
2018-09-06 13:30:19,186 [Datanode ReportManager Thread - 0] DEBUG 
(ContainerSet.java:191) - Starting container report iteration.
2018-09-06 13:30:37,816 [Datanode State Machine Thread - 0] DEBUG 
(DatanodeStateMachine.java:145) - Executing cycle Number : 3269
2018-09-06 13:30:57,866 [BlockDeletingService#8] DEBUG 
(TopNOrderedContainerDeletionChoosingPolicy.java:79) - Stop looking for next 
container, there is no pending deletion block contained in remaining containers.
2018-09-06 13:31:07,816 [Datanode State Machine Thread - 0] DEBUG 
(DatanodeStateMachine.java:145) - Executing cycle Number : 

[jira] [Created] (HDFS-13901) INode access time is ignored because of race between open and rename

2018-09-06 Thread Jinglun (JIRA)
Jinglun created HDFS-13901:
--

 Summary: INode access time is ignored because of race between open 
and rename
 Key: HDFS-13901
 URL: https://issues.apache.org/jira/browse/HDFS-13901
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Jinglun


In getBlockLocations there is a gap between releasing the read lock and 
re-acquiring the write lock (to update the access time). If a rename occurs 
in that gap, the access-time update is ignored because the previously 
resolved path is stale. We can compute the new path from the INode and use 
it to update the access time. 
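
A hedged sketch of that fix (stand-in names; only INode#getFullPathName is 
assumed from HDFS, the rest is illustrative):

{code}
// Hypothetical sketch: after re-acquiring the write lock, re-derive the
// path from the INode itself so a rename that happened in the lock gap is
// reflected; setTimes here is an illustrative helper, not the real method.
void updateAccessTime(INodesInPath iip, long now) {
  writeLock();
  try {
    INode inode = iip.getLastINode();
    String freshPath = inode.getFullPathName(); // reflects any rename
    setTimes(freshPath, inode, -1, now);        // illustrative helper
  } finally {
    writeUnlock();
  }
}
{code}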






[jira] [Created] (HDDS-406) Enable acceptance test of putKey for the RPC protocol

2018-09-06 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-406:
-

 Summary: Enable acceptance test of putKey for the RPC protocol
 Key: HDDS-406
 URL: https://issues.apache.org/jira/browse/HDDS-406
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Affects Versions: 0.2.1
Reporter: Elek, Marton
Assignee: Elek, Marton


The current acceptance tests do not test the putKey behaviour with the RPC 
protocol, only with the REST interface (maybe there were some issues at the 
time the tests were created).

I would like to enable the putKey test for all ozone shell usages (RPC/REST).







[jira] [Created] (HDFS-13900) NameNode: Unable to trigger a roll of the active NN

2018-09-06 Thread liuhongtong (JIRA)
liuhongtong created HDFS-13900:
--

 Summary: NameNode: Unable to trigger a roll of the active NN
 Key: HDFS-13900
 URL: https://issues.apache.org/jira/browse/HDFS-13900
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: liuhongtong


I backported multi-standby NN support to our own HDFS version and found an 
issue with EditLog rolling.
h2. Reproducible Steps:
h3. 1.original state

nn1 active

nn2 standby

nn3 standby
h3. 2. stop nn1
h3. 3. new state

nn1 stopped

nn2 active

nn3 standby
h3. 4. nn3 unable to trigger a roll of the active NN

[2018-08-22T10:33:38.025+08:00] [WARN] namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java 307) [Edit log tailer] : Unable to trigger a roll of the active NN
java.net.ConnectException: Call From  to  failed on connection exception: java.net.ConnectException: Connection refused; For more details see: [http://wiki.apache.org/hadoop/ConnectionRefused]
at sun.reflect.GeneratedConstructorAccessor17.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:782)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:722)
at org.apache.hadoop.ipc.Client.call(Client.java:1536)
at org.apache.hadoop.ipc.Client.call(Client.java:1463)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:237)
at com.sun.proxy.$Proxy16.rollEditLog(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:148)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$2.doWork(EditLogTailer.java:301)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$2.doWork(EditLogTailer.java:298)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$MultipleNameNodeProxy.call(EditLogTailer.java:414)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:304)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$800(EditLogTailer.java:69)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:346)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:315)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:332)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:328)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:521)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:485)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:658)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:756)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:419)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1585)
at org.apache.hadoop.ipc.Client.call(Client.java:1502)
... 14 more






[jira] [Created] (HDDS-405) User/volume mapping is not cleaned up during the deletion of the last volume

2018-09-06 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-405:
-

 Summary: User/volume mapping is not cleaned up during the deletion 
of the last volume 
 Key: HDDS-405
 URL: https://issues.apache.org/jira/browse/HDDS-405
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Affects Versions: 0.2.1
Reporter: Elek, Marton
Assignee: Elek, Marton


I found this bug while executing (improved) acceptance tests: after creating 
and deleting a key, listKey no longer worked because a stale record remained 
in the user table after the deletion.

The root cause is that the user is removed from the default column family 
instead of the table's column family.
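
A hedged sketch of the fix in RocksDB terms (handle and key names are 
illustrative; only the RocksDB delete overloads are assumed):

{code}
// Hypothetical sketch: the delete must go through the user table's column
// family handle; delete(key) without a handle hits the default column
// family and leaves the real record behind.
import org.rocksdb.ColumnFamilyHandle;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

class UserTableCleanup {
  void deleteUser(RocksDB db, ColumnFamilyHandle userTable, byte[] userKey)
      throws RocksDBException {
    // Bug: db.delete(userKey);      // targets the default column family
    db.delete(userTable, userKey);   // correct: targets the user table
  }
}
{code}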






[jira] [Created] (HDFS-13899) unexpected message type: PooledUnsafeDirectByteBuf when getting datanode info through an HTTP proxy

2018-09-06 Thread sunlisheng (JIRA)
sunlisheng created HDFS-13899:
-

 Summary: unexpected message type: PooledUnsafeDirectByteBuf when 
getting datanode info through an HTTP proxy
 Key: HDFS-13899
 URL: https://issues.apache.org/jira/browse/HDFS-13899
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: sunlisheng
 Attachments: HDFS-.diff

The error "unexpected message type: PooledUnsafeDirectByteBuf" occurs when 
getting datanode info through an HTTP proxy.

There is no HttpRequestDecoder in the Netty inbound pipeline, so when a 
message is read, the handler sees an unexpected (undecoded) message type.
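
A hedged sketch of the kind of pipeline setup that avoids this (class names 
other than the Netty codecs are illustrative):

{code}
// Hypothetical sketch: install an HttpRequestDecoder (and typically an
// HttpResponseEncoder) before the business handler, so the handler
// receives decoded HttpRequest objects instead of raw ByteBufs such as
// PooledUnsafeDirectByteBuf.
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.codec.http.HttpRequestDecoder;
import io.netty.handler.codec.http.HttpResponseEncoder;

class ProxyChannelInitializer extends ChannelInitializer<SocketChannel> {
  @Override
  protected void initChannel(SocketChannel ch) {
    ch.pipeline().addLast(new HttpRequestDecoder());  // bytes -> HttpRequest
    ch.pipeline().addLast(new HttpResponseEncoder()); // HttpResponse -> bytes
    // ch.pipeline().addLast(new DatanodeInfoHandler()); // illustrative
  }
}
{code}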

 






[jira] [Created] (HDFS-13898) Throw retriable exception for getBlockLocations when ObserverNameNode is in safemode

2018-09-06 Thread Chao Sun (JIRA)
Chao Sun created HDFS-13898:
---

 Summary: Throw retriable exception for getBlockLocations when 
ObserverNameNode is in safemode
 Key: HDFS-13898
 URL: https://issues.apache.org/jira/browse/HDFS-13898
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chao Sun


When ObserverNameNode is in safe mode, {{getBlockLocations}} may throw a safe 
mode exception if the given file doesn't have any blocks yet. 

{code}
try {
  checkOperation(OperationCategory.READ);
  res = FSDirStatAndListingOp.getBlockLocations(
  dir, pc, srcArg, offset, length, true);
  if (isInSafeMode()) {
for (LocatedBlock b : res.blocks.getLocatedBlocks()) {
  // if safemode & no block locations yet then throw safemodeException
  if ((b.getLocations() == null) || (b.getLocations().length == 0)) {
SafeModeException se = newSafemodeException(
"Zero blocklocations for " + srcArg);
if (haEnabled && haContext != null &&
haContext.getState().getServiceState() == 
HAServiceState.ACTIVE) {
  throw new RetriableException(se);
} else {
  throw se;
}
  }
}
  }
{code}

It only throws {{RetriableException}} for the active NN, so requests served by 
an observer may simply fail.
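
A hedged sketch of the change this implies (assuming the HDFS-12943 branch's 
HAServiceState has an OBSERVER value; variable names follow the snippet 
above):

{code}
// Hypothetical sketch: also wrap the SafeModeException in a
// RetriableException when the node is serving as an observer, so observer
// reads are retried instead of failing outright.
if (haEnabled && haContext != null &&
    (haContext.getState().getServiceState() == HAServiceState.ACTIVE ||
     haContext.getState().getServiceState() == HAServiceState.OBSERVER)) {
  throw new RetriableException(se);
} else {
  throw se;
}
{code}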






[jira] [Created] (HDFS-13897) DiskBalancer: print a WARN message in the console output for invalid configurations while executing DiskBalancer commands

2018-09-06 Thread Harshakiran Reddy (JIRA)
Harshakiran Reddy created HDFS-13897:


 Summary: DiskBalancer: print a WARN message in the console output 
for invalid configurations while executing DiskBalancer commands
 Key: HDFS-13897
 URL: https://issues.apache.org/jira/browse/HDFS-13897
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: diskbalancer
Reporter: Harshakiran Reddy


{{Scenario:}}

1. Configure an invalid value for any disk balancer configuration and restart 
the Datanode.
2. Run the disk balancer commands.

{{Actual output:}}

It silently continues with the default configurations.

{{Expected output:}}

It should print a WARN message on the console, such as *configured an invalid 
value; taking the default value for this configuration*, so the user knows 
their configuration did not take effect on the current disk balancer run; 
otherwise they will assume it is using their configured values.
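
A hedged sketch of the requested behaviour (the key name, default, and class 
are illustrative, not actual DiskBalancer constants):

{code}
// Hypothetical sketch: validate the configured value and emit a
// console-visible WARN before falling back to the default.
class DiskBalancerConfSketch {
  static final String THROUGHPUT_KEY =
      "dfs.disk.balancer.max.disk.throughputInMBperSec"; // illustrative key
  static final long THROUGHPUT_DEFAULT = 10;

  long readMaxDiskThroughput(org.apache.hadoop.conf.Configuration conf) {
    try {
      return conf.getLong(THROUGHPUT_KEY, THROUGHPUT_DEFAULT);
    } catch (NumberFormatException e) {
      System.err.println("WARN: invalid value configured for "
          + THROUGHPUT_KEY + "; taking the default value "
          + THROUGHPUT_DEFAULT);
      return THROUGHPUT_DEFAULT;
    }
  }
}
{code}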


