[jira] [Created] (HDFS-16343) Add some debug logs when the dfsUsed are not used during Datanode startup

2021-11-20 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDFS-16343:


 Summary: Add some debug logs when the dfsUsed are not used during 
Datanode startup
 Key: HDFS-16343
 URL: https://issues.apache.org/jira/browse/HDFS-16343
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Mukul Kumar Singh
Assignee: Mukul Kumar Singh






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16145) CopyListing fails with FNF exception with snapshot diff

2021-07-27 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDFS-16145.
--
Fix Version/s: 3.3.2
   Resolution: Fixed

> CopyListing fails with FNF exception with snapshot diff
> ---
>
> Key: HDFS-16145
> URL: https://issues.apache.org/jira/browse/HDFS-16145
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.2
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Distcp with snapshotdiff and with filters marks a Rename as a delete 
> operation on the target if the rename target is a directory which is 
> excluded by the filter. But in such cases, files/subdirs created/modified 
> after the old snapshot and prior to the Rename will still be present as 
> modified/created entries in the final copy list. Since the parent directory 
> is marked for deletion, these subsequent create/modify entries should be 
> ignored while building the final copy list. 
> In such cases, when the final copy list is built, distcp tries to do a 
> lookup for each created/modified file in the newer snapshot, which fails 
> because the parent dir has already been moved to a new location in the later snapshot.
>  
> {code:java}
> sudo -u kms hadoop key create testkey
> hadoop fs -mkdir -p /data/gcgdlknnasg/
> hdfs crypto -createZone -keyName testkey -path /data/gcgdlknnasg/
> hadoop fs -mkdir -p /dest/gcgdlknnasg
> hdfs crypto -createZone -keyName testkey -path /dest/gcgdlknnasg
> hdfs dfs -mkdir /data/gcgdlknnasg/dir1
> hdfs dfsadmin -allowSnapshot /data/gcgdlknnasg/ 
> hdfs dfsadmin -allowSnapshot /dest/gcgdlknnasg/ 
> [root@nightly62x-1 logs]# hdfs dfs -ls -R /data/gcgdlknnasg/
> drwxrwxrwt   - hdfs supergroup  0 2021-07-16 14:05 
> /data/gcgdlknnasg/.Trash
> drwxr-xr-x   - hdfs supergroup  0 2021-07-16 13:07 
> /data/gcgdlknnasg/dir1
> [root@nightly62x-1 logs]# hdfs dfs -ls -R /dest/gcgdlknnasg/
> [root@nightly62x-1 logs]#
> hdfs dfs -put /etc/hosts /data/gcgdlknnasg/dir1/
> hdfs dfs -rm -r /data/gcgdlknnasg/dir1/
> hdfs dfs -mkdir /data/gcgdlknnasg/dir1/
> ===> Run BDR with “Abort on Snapshot Diff Failures” CHECKED now in the 
> replication schedule. You get into below error and failure of the BDR job.
> 21/07/16 15:02:30 INFO distcp.DistCp: Failed to use snapshot diff - 
> java.io.FileNotFoundException: File does not exist: 
> /data/gcgdlknnasg/.snapshot/distcp-5-46485360-new/dir1/hosts
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1494)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1487)
> ……..
> {code}
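The fix described above amounts to an ancestor check while building the copy list: any created/modified entry whose parent directory is already marked for deletion must be skipped. A minimal conceptual sketch of that filtering (helper names are illustrative, not the actual DistCp code):

```python
def build_copy_list(entries, deleted_dirs):
    """Filter diff entries: drop any created/modified path whose
    ancestor directory is already marked for deletion on the target.

    entries      -- iterable of paths from the snapshot diff (created/modified)
    deleted_dirs -- set of directory paths marked as deletes
    """
    def under_deleted_dir(path):
        # Walk up the path components and check each proper ancestor.
        parts = path.strip("/").split("/")
        for i in range(1, len(parts)):
            ancestor = "/" + "/".join(parts[:i])
            if ancestor in deleted_dirs:
                return True
        return False

    return [p for p in entries if not under_deleted_dir(p)]


# A rename of /data/dir1 excluded by a filter marks /data/dir1 as a delete;
# the file created under it must not survive into the final copy list.
copy_list = build_copy_list(
    ["/data/dir1/hosts", "/data/dir2/report.txt"],
    deleted_dirs={"/data/dir1"},
)
```

The lookup failure in the report then never happens, because entries under a deleted parent are dropped before any snapshot lookup is attempted.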






[jira] [Resolved] (HDFS-16121) Iterative snapshot diff report can generate duplicate records for creates, deletes and Renames

2021-07-08 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDFS-16121.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

> Iterative snapshot diff report can generate duplicate records for creates, 
> deletes and Renames
> --
>
> Key: HDFS-16121
> URL: https://issues.apache.org/jira/browse/HDFS-16121
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Srinivasu Majeti
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, the iterative snapshot diff report first traverses the created list 
> for a directory diff and then the deleted list. If the deleted list is 
> smaller than the created list, the offset calculation in the respective 
> list is wrong, so the next diff report generation call starts iterating 
> over entries already processed in the created list, leading to 
> duplicate entries in the report.
> The fix is to correct the offset calculation during the traversal of the deleted 
> list.
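The offset bug can be pictured with a toy pager over the two lists: the diff is created-list entries followed by deleted-list entries, and each call must resume from an offset computed against that combined sequence, not against the created list alone. A simplified model (not the actual HDFS code):

```python
def diff_page(created, deleted, start, limit):
    """Return one page of diff entries plus the offset to resume from.

    The combined sequence is created-list entries first, then deleted-list
    entries; `start` is an absolute offset into the combined sequence, which
    is what keeps repeated calls from re-reading created entries.
    """
    combined = [("create", p) for p in created] + [("delete", p) for p in deleted]
    page = combined[start:start + limit]
    return page, start + len(page)

created = ["a", "b", "c"]
deleted = ["x"]               # deleted list smaller than created list

entries, off = [], 0
while True:
    page, off = diff_page(created, deleted, off, limit=2)
    if not page:
        break
    entries.extend(page)
```

With a correct absolute offset, iterating to completion yields each entry exactly once, with no duplicates across page boundaries.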






[jira] [Resolved] (HDFS-15865) Interrupt DataStreamer thread

2021-05-01 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDFS-15865.
--
Resolution: Fixed

> Interrupt DataStreamer thread
> -
>
> Key: HDFS-15865
> URL: https://issues.apache.org/jira/browse/HDFS-15865
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Karthik Palanisamy
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> We have noticed HiveServer2 halting in DataStreamer#waitForAckedSeqno. 
> I think we have to interrupt the DataStreamer if no packet ack arrives from 
> the datanodes. This likely happens with infra/network issues.
> {code:java}
> "HiveServer2-Background-Pool: Thread-35977576" #35977576 prio=5 os_prio=0 
> cpu=797.65ms elapsed=3406.28s tid=0x7fc0c6c29800 nid=0x4198 in 
> Object.wait()  [0x7fc1079f3000]
>     java.lang.Thread.State: TIMED_WAITING (on object monitor)
>  at java.lang.Object.wait(java.base(at)11.0.5/Native Method)
>  - waiting on 
>  at 
> org.apache.hadoop.hdfs.DataStreamer.waitForAckedSeqno(DataStreamer.java:886)
>  - waiting to re-lock in wait() <0x7fe6eda86ca0> (a 
> java.util.LinkedList){code}
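The guard proposed here, waiting for the ack with a deadline instead of blocking forever, can be sketched with plain threading primitives. This is illustrative only; the real DataStreamer ack handling is more involved:

```python
import threading

class AckWaiter:
    """Wait for an acked sequence number with a timeout instead of forever."""

    def __init__(self):
        self._cond = threading.Condition()
        self._acked = -1

    def ack(self, seqno):
        # Called by the ack-processing side when datanodes acknowledge a packet.
        with self._cond:
            self._acked = max(self._acked, seqno)
            self._cond.notify_all()

    def wait_for_acked_seqno(self, seqno, timeout):
        """Return True if the ack arrived in time, False on timeout
        (the caller would then interrupt/abort the streamer)."""
        with self._cond:
            return self._cond.wait_for(lambda: self._acked >= seqno, timeout)

w = AckWaiter()
timed_out = not w.wait_for_acked_seqno(5, timeout=0.05)   # no ack -> times out
threading.Timer(0.01, w.ack, args=(5,)).start()
acked = w.wait_for_acked_seqno(5, timeout=1.0)            # ack arrives in time
```

The timeout path is what turns an indefinite hang (as in the HiveServer2 thread dump above) into an error the client can act on.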






[jira] [Created] (HDFS-15518) Wrong operation name in FsNamesystem for listSnapshots

2020-08-07 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDFS-15518:


 Summary: Wrong operation name in FsNamesystem for listSnapshots
 Key: HDFS-15518
 URL: https://issues.apache.org/jira/browse/HDFS-15518
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Mukul Kumar Singh


List snapshots uses listSnapshotDirectory as the operation name string in place 
of ListSnapshot.

https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L7026






[jira] [Created] (HDFS-15501) Update Apache documentation for new ordered snapshot deletion feature

2020-07-29 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDFS-15501:


 Summary: Update Apache documentation for new ordered snapshot 
deletion feature
 Key: HDFS-15501
 URL: https://issues.apache.org/jira/browse/HDFS-15501
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Mukul Kumar Singh


Update Apache documentation for new ordered snapshot deletion feature.






[jira] [Created] (HDFS-15500) Add more assertions about ordered deletion of snapshot

2020-07-29 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDFS-15500:


 Summary: Add more assertions about ordered deletion of snapshot
 Key: HDFS-15500
 URL: https://issues.apache.org/jira/browse/HDFS-15500
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Mukul Kumar Singh
Assignee: Tsz-wo Sze


The jira proposes to add new assertions. One assertion to start with:
a) Assert that, with the ordered snapshot deletion flag set to true, the prior 
snapshot in cleanSubtree is null.






[jira] [Created] (HDFS-15496) Add UI for deleted snapshots

2020-07-27 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDFS-15496:


 Summary: Add UI for deleted snapshots
 Key: HDFS-15496
 URL: https://issues.apache.org/jira/browse/HDFS-15496
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Mukul Kumar Singh


Add a UI for deleted snapshots:

a) Show the list of snapshots per snapshottable directory.
b) Add deleted status in the JMX output for the snapshot, along with a snap ID.
c) The NN UI should sort the snapshots by snap ID.
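The listing shape implied by these items, per-directory snapshot records carrying an ID and a deleted flag, sorted by ID, can be sketched like this (the field names are illustrative, not the actual NN JMX schema):

```python
# Hypothetical snapshot records as they might appear in a JMX-style listing.
snapshots = [
    {"snapshotID": 7, "dir": "/data", "deleted": False},
    {"snapshotID": 2, "dir": "/data", "deleted": True},
    {"snapshotID": 5, "dir": "/dest", "deleted": False},
]

# Group per snapshottable directory, sorted by snapshot ID within each group.
listing = {}
for s in sorted(snapshots, key=lambda s: s["snapshotID"]):
    listing.setdefault(s["dir"], []).append((s["snapshotID"], s["deleted"]))
```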






Re: Apply for edit permission of cwiki.apache.org to edit page "Mountable+Ozone+File+System"

2020-06-08 Thread Mukul Kumar Singh

Hi Baoloongmao,

Can you please sign up to cwiki and share your user id?

Thanks,

Mukul

On 08/06/20 8:56 am, baoloongmao(毛宝龙) wrote:

Hi all,

Last week I built dfs-fuse and tried to test it, but I could not get dfs-fuse to 
access Ozone successfully by following the document 
“https://cwiki.apache.org/confluence/display/HADOOP/Mountable+Ozone+File+System”.
Luckily, I got help from Nanda, who helped me get dfs-fuse working. I think it 
is time to update this document with some more necessary information, 
but I have no edit permission for that page.

Is it possible to grant me permission to edit that wiki page?





[jira] [Resolved] (HDFS-15301) statfs function in hdfs-fuse is not working

2020-04-28 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDFS-15301.
--
Fix Version/s: 3.4.0
   3.3.0
   Resolution: Fixed

> statfs function in hdfs-fuse is not working
> ---
>
> Key: HDFS-15301
> URL: https://issues.apache.org/jira/browse/HDFS-15301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fuse-dfs, libhdfs
>Reporter: Aryan Gupta
>Assignee: Aryan Gupta
>Priority: Major
>  Labels: https://github.com/apache/hadoop/pull/1980
> Fix For: 3.3.0, 3.4.0
>
>
> *statfs function in hdfs-fuse is not working.* It gives an error like:
> could not find method org/apache/hadoop/fs/FsStatus from class 
> org/apache/hadoop/fs/FsStatus with signature getUsed
> hdfsGetUsed: FsStatus#getUsed error:
> NoSuchMethodError: org/apache/hadoop/fs/FsStatusjava.lang.NoSuchMethodError: 
> org/apache/hadoop/fs/FsStatus
>  
> Problem: Incorrect passing of parameters to the invokeMethod function.
> invokeMethod(env, , INSTANCE, fss, JC_FS_STATUS,
> HADOOP_FSSTATUS,"getUsed", "()J");
>  






Re: [DISCUSS] Shade guava into hadoop-thirdparty

2020-04-06 Thread Mukul Kumar Singh

+1

On 07/04/20 7:05 am, Zhankun Tang wrote:

Thanks, Wei-Chiu for the proposal. +1.

On Mon, 6 Apr 2020 at 20:17, Ayush Saxena  wrote:


+1

-Ayush


On 05-Apr-2020, at 12:43 AM, Wei-Chiu Chuang  wrote:

Hi Hadoop devs,

I spent a good part of the past 7 months working with a dozen colleagues
to update the guava version in Cloudera's software (that includes Hadoop,
HBase, Spark, Hive, Cloudera Manager ... more than 20 projects).

After 7 months, I finally came to a conclusion: updating to Hadoop 3.3 /
3.2.1 / 3.1.3, even if you just come from Hadoop 3.0 / 3.1.0, is going to be
really hard because of guava. Because of guava, the amount of work to
certify a minor release update is almost equivalent to a major release
update.

That is because:
(1) Going from guava 11 to guava 27 is a big jump. There are several
incompatible API changes in many places. Too bad the Google developers are
not sympathetic about their users.
(2) guava is used in all Hadoop jars. Not just Hadoop servers but also
client jars and Hadoop common libs.
(3) The Hadoop library is used in practically all software at Cloudera.

Here is my proposal:
(1) shade guava into hadoop-thirdparty, relocate the classpath to
org.hadoop.thirdparty.com.google.common.*
(2) make a hadoop-thirdparty 1.1.0 release.
(3) update existing references to guava to the relocated path. There are
more than 2k imports that need an update.
(4) release Hadoop 3.3.1 / 3.2.2 that contains this change.

In this way, we will be able to update guava in Hadoop in the future
without disrupting Hadoop applications.
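Step (3), updating the 2k+ guava imports to the relocated path, is essentially a mechanical source rewrite. A rough sketch of such a rewrite, using the relocated prefix as stated in this proposal (the function and its details are illustrative, not the tooling the Hadoop project actually used):

```python
import re

SHADED_PREFIX = "org.hadoop.thirdparty."  # relocated prefix per the proposal above

def relocate_guava_imports(source):
    """Rewrite guava imports in a Java source string to the shaded package,
    handling both plain and static imports."""
    return re.sub(
        r"^(import\s+(?:static\s+)?)com\.google\.common\.",
        r"\1" + SHADED_PREFIX + "com.google.common.",
        source,
        flags=re.MULTILINE,
    )

java = ("import com.google.common.base.Preconditions;\n"
        "import com.google.common.collect.Lists;\n")
relocated = relocate_guava_imports(java)
```

Anchoring the match at the start of an `import` line keeps string literals and comments that merely mention `com.google.common` untouched.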

Note: HBase already did this, and this guava update project would have
been much more difficult if HBase hadn't done so.

Thoughts? Other options include:
(1) Force downstream applications to migrate to the Hadoop client artifacts
as listed here:
https://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-common/DownstreamDev.html
but that's nearly impossible.
(2) Migrate Guava usage to Java APIs. I suppose this is a big project and I
can't estimate how much work it's going to be.

Weichiu








Re: [Discuss] Ozone moving to Beta tag

2020-02-23 Thread Mukul Kumar Singh
+1 for the Beta. The data pipeline and Ozone Manager improvements have 
certainly helped in the latest runs.


Thanks,

Mukul

On 24/02/20 8:44 am, Bharat Viswanadham wrote:

+1 for Beta, given the major performance improvement work that went into Ozone
Manager and the Datanode Pipeline.

I have been testing Teragen runs; we now have consistent runs, and
performance is almost on par with HDFS on a disaggregated storage and compute
cluster.



Thanks,
Bharat


On Sun, Feb 23, 2020 at 6:35 PM Sammi Chen  wrote:


+1,  Impressive performance achievement on OzoneManager, let's move to
Beta.

Bests,
Sammi Chen

On Thu, Feb 20, 2020 at 4:17 AM Anu Engineer  wrote:


Hi All,


I would like to propose moving Ozone from 'Alpha' tags to 'Beta' tags when
we do future releases. Here are a couple of reasons why I think we should
make this move.



1. Ozone Manager, or the Namenode for Ozone, scales to more than 1 billion
keys. We tested this in our labs in an organic fashion; that is, we were
able to create more than 1 billion keys from external clients with no loss
in performance.
2. The Ozone Manager meets the performance and resource constraints that
we set out to achieve. We were able to sustain the same throughput at the
Ozone Manager for over three days, which is what it took us to reach 1
billion keys. That is, we did not have to shut down or resize memory for
the namenode as we went through this exercise.
3. Most critical, we did this experiment with 64GB of memory allocation in
the JVM and 64GB of RAM off-heap allocation. That is, the Ozone Manager was
able to achieve this scale with a far smaller memory footprint than HDFS.
4. Ozone's performance is at par with HDFS when running workloads like
Hive (
https://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-common/DownstreamDev.html
)
5. We have been able to run long-running clusters with Ozone.
4. Ozone's performance is at par with HDFS when running workloads like
Hive (



https://blog.cloudera.com/benchmarking-ozone-clouderas-next-generation-storage-for-cdp/

)
5. We have been able to run long-running clusters with Ozone.


Having achieved these goals, I propose that we move from the planned
0.4.2-Alpha release to 0.5.0-Beta as our next release. If we hear no
concerns about this, we would like to move Ozone from Alpha to Beta
releases.


Thanks

Anu


P.S. I am CC-ing HDFS dev since many people who are interested in Ozone
still have not subscribed to the Ozone dev lists. My apologies if it feels
like spam; I promise that over time we will become less noisy in the HDFS
channel.


PPS. I know lots of you will want to know more specifics; our blog presses
are working overtime, and I promise you that you will get to see all the
details pretty soon.






[jira] [Created] (HDDS-2600) Move chaos test to org.apache.hadoop.ozone.chaos package

2019-11-20 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDDS-2600:
---

 Summary: Move chaos test to org.apache.hadoop.ozone.chaos package
 Key: HDDS-2600
 URL: https://issues.apache.org/jira/browse/HDDS-2600
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Mukul Kumar Singh


This is a simple refactoring change where all the chaos tests are moved to the 
org.apache.hadoop.ozone.chaos package.






[jira] [Created] (HDDS-2389) add toStateMachineLogEntryString provider in Ozone's ContainerStateMachine

2019-10-31 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDDS-2389:
---

 Summary: add toStateMachineLogEntryString provider in Ozone's 
ContainerStateMachine
 Key: HDDS-2389
 URL: https://issues.apache.org/jira/browse/HDDS-2389
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.4.1
Reporter: Mukul Kumar Singh


This jira proposes to add a new toStateMachineLogEntryString provider in 
Ozone's ContainerStateMachine to print extra debug log statements in Ratis.






[jira] [Resolved] (HDDS-2056) Datanode unable to start command handler thread with security enabled

2019-10-25 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDDS-2056.
-
Resolution: Duplicate

> Datanode unable to start command handler thread with security enabled
> -
>
> Key: HDDS-2056
> URL: https://issues.apache.org/jira/browse/HDDS-2056
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Xiaoyu Yao
>Priority: Major
> Fix For: 0.5.0
>
>
>  
> {code:java}
> 2019-08-29 02:50:23,536 ERROR 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
> Critical Error : Command processor thread encountered an error. Thread: 
> Thread[Command processor thread,5,main]
> java.lang.IllegalArgumentException: Null user
>         at 
> org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1269)
>         at 
> org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1256)
>         at 
> org.apache.hadoop.hdds.security.token.BlockTokenVerifier.verify(BlockTokenVerifier.java:116)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.XceiverServer.submitRequest(XceiverServer.java:68)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.submitRequest(XceiverServerRatis.java:482)
>         at 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CloseContainerCommandHandler.handle(CloseContainerCommandHandler.java:109)
>         at 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:93)
>         at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$1(DatanodeStateMachine.java:432)
>         at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Created] (HDDS-2364) Add an OM metric to find the false positive rate for keyMayExist

2019-10-24 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDDS-2364:
---

 Summary: Add an OM metric to find the false positive rate for 
keyMayExist
 Key: HDDS-2364
 URL: https://issues.apache.org/jira/browse/HDDS-2364
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Manager
Affects Versions: 0.5.0
Reporter: Mukul Kumar Singh


Add an OM metric to find the false positive rate for the keyMayExist check.
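The metric in question counts how often keyMayExist says "maybe" but the real lookup then misses. A sketch of that bookkeeping (a dict stands in for the actual RocksDB-backed table, and the prefix-based keyMayExist is a deliberately crude stand-in for the real probabilistic check; all names are illustrative):

```python
class KeyMayExistMetrics:
    """Track false positives of a probabilistic keyMayExist check."""

    def __init__(self, table):
        self.table = table
        self.lookups = 0
        self.false_positives = 0

    def key_may_exist(self, key):
        # Stand-in for RocksDB's keyMayExist: may return false positives
        # (here: any key with the right prefix), never false negatives.
        return key.startswith("/vol1/")

    def get(self, key):
        self.lookups += 1
        if not self.key_may_exist(key):
            return None                   # definite miss, not a false positive
        value = self.table.get(key)
        if value is None:
            self.false_positives += 1     # "maybe" turned out to be a miss
        return value

    def false_positive_rate(self):
        return self.false_positives / max(1, self.lookups)

m = KeyMayExistMetrics({"/vol1/bucket/a": b"v"})
m.get("/vol1/bucket/a")     # real hit
m.get("/vol1/bucket/zzz")   # keyMayExist says maybe, lookup misses
rate = m.false_positive_rate()
```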






[jira] [Created] (HDDS-2339) Add OzoneManager to MiniOzoneChaosCluster

2019-10-21 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDDS-2339:
---

 Summary: Add OzoneManager to MiniOzoneChaosCluster
 Key: HDDS-2339
 URL: https://issues.apache.org/jira/browse/HDDS-2339
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: om
Reporter: Mukul Kumar Singh


This jira proposes to add OzoneManager to MiniOzoneChaosCluster now that the 
Ozone HA implementation is done. This will help in discovering bugs in Ozone Manager HA.






[jira] [Resolved] (HDDS-2280) HddsUtils#CheckForException should not return null in case the ratis exception cause is not set

2019-10-20 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDDS-2280.
-
Fix Version/s: 0.5.0
   Resolution: Fixed

Thanks for the contribution [~shashikant] and [~bharat] for the review. I have 
committed this.

> HddsUtils#CheckForException should not return null in case the ratis 
> exception cause is not set
> ---
>
> Key: HDDS-2280
> URL: https://issues.apache.org/jira/browse/HDDS-2280
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HddsUtils#CheckForException checks that the cause is set properly to one of 
> the defined/expected exceptions. In case Ratis throws any runtime 
> exception, HddsUtils#CheckForException can return null and lead to a 
> NullPointerException during write.
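The fix amounts to a defensive fallback: when no expected cause is found on the exception chain, return a wrapped generic exception instead of null so callers can never hit a NullPointerException. A sketch of that pattern (the expected-exception types here are stand-ins, not the actual HDDS classes):

```python
# Hypothetical stand-ins for the expected Ratis/container exception types.
class ContainerNotOpenException(Exception): pass
class RaftRetryFailureException(Exception): pass

EXPECTED = (ContainerNotOpenException, RaftRetryFailureException)

def check_for_exception(exc):
    """Walk the cause chain; return the first expected cause, or a wrapped
    fallback (never None) when the chain holds only unexpected causes."""
    cause = exc
    while cause is not None:
        if isinstance(cause, EXPECTED):
            return cause
        cause = cause.__cause__
    # Fallback: wrap rather than return None, so callers can't dereference null.
    return IOError(str(exc))

known = check_for_exception(RaftRetryFailureException("retry failed"))
unknown = check_for_exception(RuntimeError("ratis runtime error"))
```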






[jira] [Resolved] (HDDS-2281) ContainerStateMachine#handleWriteChunk should ignore close container exception

2019-10-20 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDDS-2281.
-
Resolution: Fixed

Thanks for the contribution [~shashikant]. I have committed this.

> ContainerStateMachine#handleWriteChunk should ignore close container 
> exception 
> ---
>
> Key: HDDS-2281
> URL: https://issues.apache.org/jira/browse/HDDS-2281
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, ContainerStateMachine#applyTransaction ignores the close container 
> exception. Similarly, the ContainerStateMachine#handleWriteChunk call should 
> also ignore the close container exception.






[jira] [Resolved] (HDDS-2283) Container creation on datanodes take time because of Rocksdb option creation.

2019-10-20 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDDS-2283.
-
Resolution: Fixed

> Container creation on datanodes take time because of Rocksdb option creation.
> -
>
> Key: HDDS-2283
> URL: https://issues.apache.org/jira/browse/HDDS-2283
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>    Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-2283.00.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Container creation on datanodes takes around 300ms due to rocksdb creation. 
> Rocksdb creation takes a considerable time and this needs to be optimized.
> Creating one rocksdb per disk should be enough, and each container can be a 
> table inside that rocksdb.
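One cheap piece of the optimization described above is to stop rebuilding the expensive per-container setup (e.g. the RocksDB options) on every container creation and reuse it per disk. That half can be sketched as plain memoization (`build_options` is a stand-in for the real option construction, with a sleep simulating its cost):

```python
import functools
import time

@functools.lru_cache(maxsize=None)
def build_options(disk):
    """Stand-in for expensive RocksDB option construction: with the cache,
    each disk pays the construction cost only once."""
    time.sleep(0.01)  # simulate the costly setup measured in this jira
    return {"disk": disk, "create_if_missing": True}

# 100 container creations on the same disk hit the cached options 99 times.
for _ in range(100):
    opts = build_options("/disk1")
```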






[jira] [Resolved] (HDDS-2286) Add a log info in ozone client and scm to print the exclusion list during allocate block

2019-10-20 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDDS-2286.
-
Resolution: Fixed

Thanks for the contribution [~swagle] and [~adoroszlai] for the review. I have 
committed this.

> Add a log info in ozone client and scm to print the exclusion list during 
> allocate block
> 
>
> Key: HDDS-2286
> URL: https://issues.apache.org/jira/browse/HDDS-2286
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>







[jira] [Resolved] (HDDS-2204) Avoid buffer copying in checksum verification

2019-10-14 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDDS-2204.
-
Fix Version/s: 0.5.0
   Resolution: Fixed

I have committed this to master. Thanks for the contribution [~szetszwo] and 
[~shashikant] for the review.

> Avoid buffer copying in checksum verification
> 
>
> Key: HDDS-2204
> URL: https://issues.apache.org/jira/browse/HDDS-2204
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
> Attachments: o2204_20190930.patch, o2204_20190930b.patch, 
> o2204_20191001.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In Checksum.verifyChecksum(ByteString, ..), it first converts the ByteString 
> to a byte array. This leads to an unnecessary buffer copy.
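The idea, checksumming a read-only view of the buffer instead of first copying it to a byte array, can be illustrated in Python with memoryview, which wraps the underlying bytes without copying (zlib.crc32 merely stands in for the Hadoop checksum; this is not the HDDS code):

```python
import zlib

data = bytes(range(256)) * 4

# Copying approach: materialize a new byte array, then checksum it.
copied = bytes(bytearray(data))     # explicit copy of the payload
crc_copy = zlib.crc32(copied)

# Zero-copy approach: checksum a read-only view over the same buffer.
view = memoryview(data)
crc_view = zlib.crc32(view)
```

Both paths produce the same checksum; the second simply skips allocating and filling a duplicate buffer per verification.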






[jira] [Created] (HDDS-2306) Fix TestWatchForCommit failure

2019-10-14 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDDS-2306:
---

 Summary: Fix TestWatchForCommit failure
 Key: HDDS-2306
 URL: https://issues.apache.org/jira/browse/HDDS-2306
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Affects Versions: 0.4.1
Reporter: Mukul Kumar Singh



{code}
[ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 203.385 
s <<< FAILURE! - in org.apache.hadoop.ozone.client.rpc.TestWatchForCommit
[ERROR] 
test2WayCommitForTimeoutException(org.apache.hadoop.ozone.client.rpc.TestWatchForCommit)
  Time elapsed: 27.093 s  <<< ERROR!
java.util.concurrent.TimeoutException
at 
java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at 
org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:283)
at 
org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.test2WayCommitForTimeoutException(TestWatchForCommit.java:391)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}






[jira] [Created] (HDDS-2305) Update Ozone to later ratis snapshot.

2019-10-14 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDDS-2305:
---

 Summary: Update Ozone to later ratis snapshot.
 Key: HDDS-2305
 URL: https://issues.apache.org/jira/browse/HDDS-2305
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Mukul Kumar Singh


This jira will update ozone to the latest ratis snapshot, corresponding to the following commit:

{code}
commit 3f446aaf27704b0bf929bd39887637a6a71b4418 (HEAD -> master, origin/master, origin/HEAD)
Author: Tsz Wo Nicholas Sze 
Date:   Fri Oct 11 16:35:38 2019 +0800

    RATIS-705. GrpcClientProtocolClient#close Interrupts itself.  Contributed by Lokesh Jain
{code}






[jira] [Created] (HDDS-2285) GetBlock and ReadChunk commands from the client should be sent to the same datanode to re-use the same connection

2019-10-11 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDDS-2285:
---

 Summary: GetBlock and ReadChunk commands from the client should be sent to the same datanode to re-use the same connection
 Key: HDDS-2285
 URL: https://issues.apache.org/jira/browse/HDDS-2285
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Reporter: Mukul Kumar Singh


It can be observed that the GetBlock and ReadChunk commands are sent to two different datanodes. They should be sent to the same datanode to re-use the connection.

{code}
19/10/10 00:43:42 INFO scm.XceiverClientGrpc: Send command GetBlock to datanode 172.26.32.224
19/10/10 00:43:42 INFO scm.XceiverClientGrpc: Send command ReadChunk to datanode 172.26.32.231
{code}
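A minimal sketch of the idea, using hypothetical names (DatanodePinner, chooseNode — not the actual XceiverClientGrpc API): remember which datanode served the first command on a pipeline, and route the follow-up commands for that pipeline to the same node so the connection can be reused.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DatanodePinner {
  // Maps a pipeline id to the datanode that served the first command on it.
  private final Map<String, String> pinned = new ConcurrentHashMap<>();

  // Returns the pinned datanode for this pipeline, pinning the first
  // candidate on first use so GetBlock and ReadChunk hit the same node.
  public String chooseNode(String pipelineId, List<String> nodes) {
    return pinned.computeIfAbsent(pipelineId, id -> nodes.get(0));
  }

  public static void main(String[] args) {
    DatanodePinner pinner = new DatanodePinner();
    List<String> nodes = Arrays.asList("172.26.32.224", "172.26.32.231");
    String first = pinner.chooseNode("pipeline-1", nodes);   // GetBlock
    String second = pinner.chooseNode("pipeline-1", nodes);  // ReadChunk
    System.out.println(first.equals(second));  // prints true
  }
}
```

The same map would naturally be invalidated when the pipeline is closed or the node becomes unreachable.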






[jira] [Created] (HDDS-2284) XceiverClientMetrics should be initialised as part of XceiverClientManager constructor

2019-10-11 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDDS-2284:
---

 Summary: XceiverClientMetrics should be initialised as part of 
XceiverClientManager constructor
 Key: HDDS-2284
 URL: https://issues.apache.org/jira/browse/HDDS-2284
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


XceiverClientMetrics is currently initialized in the read/write path; the metrics should instead be initialized while creating the XceiverClientManager.
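A sketch of the proposed change, with placeholder classes standing in for XceiverClientManager and XceiverClientMetrics: the metrics object is created once in the constructor, so the read/write hot path carries no lazy-initialization branch.

```java
public class ClientManagerSketch {

  static class MetricsSketch {
    private long bytesWritten;
    void incrBytesWritten(long n) { bytesWritten += n; }
    long getBytesWritten() { return bytesWritten; }
  }

  // Eagerly initialized in the constructor; the hot path never has to
  // null-check or race to create the metrics instance.
  private final MetricsSketch metrics;

  public ClientManagerSketch() {
    this.metrics = new MetricsSketch();
  }

  public void write(byte[] data) {
    metrics.incrBytesWritten(data.length);  // no lazy-init branch here
  }

  public long bytesWritten() {
    return metrics.getBytesWritten();
  }

  public static void main(String[] args) {
    ClientManagerSketch mgr = new ClientManagerSketch();
    mgr.write(new byte[16]);
    System.out.println(mgr.bytesWritten());  // prints 16
  }
}
```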






[jira] [Created] (HDDS-2283) Container creation on datanodes takes around 300ms due to rocksdb creation

2019-10-11 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDDS-2283:
---

 Summary: Container creation on datanodes takes around 300ms due to rocksdb creation
 Key: HDDS-2283
 URL: https://issues.apache.org/jira/browse/HDDS-2283
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Mukul Kumar Singh


Container creation on datanodes takes around 300ms, largely due to rocksdb creation. Creating the rocksdb instance takes a considerable amount of time, and this needs to be optimized.






[jira] [Created] (HDDS-2260) Avoid evaluation of LOG.trace and LOG.debug statement in the read/write path

2019-10-06 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDDS-2260:
---

 Summary: Avoid evaluation of LOG.trace and LOG.debug statement in 
the read/write path
 Key: HDDS-2260
 URL: https://issues.apache.org/jira/browse/HDDS-2260
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client, Ozone Datanode
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


The arguments of LOG.trace and LOG.debug statements are evaluated even when debug/trace logging is disabled. This jira proposes to wrap all the trace/debug logging with LOG.isDebugEnabled and LOG.isTraceEnabled checks to avoid that evaluation.
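The pattern being proposed, illustrated with java.util.logging so the example is self-contained (the Hadoop code uses SLF4J, where the equivalent guard is LOG.isDebugEnabled()): the expensive argument is only computed when the log level is actually enabled.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedLogging {
  private static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());
  static int expensiveCalls = 0;

  // Stands in for an expensive toString()/serialization in the hot path.
  static String expensiveDescription() {
    expensiveCalls++;
    return "chunk details ...";
  }

  public static void main(String[] args) {
    LOG.setLevel(Level.INFO);  // debug (FINE) disabled

    // Unguarded: the argument is built even though nothing is logged.
    LOG.log(Level.FINE, "write: " + expensiveDescription());

    // Guarded: the argument is only built when FINE is enabled.
    if (LOG.isLoggable(Level.FINE)) {
      LOG.log(Level.FINE, "write: " + expensiveDescription());
    }

    System.out.println(expensiveCalls);  // prints 1 — the guard skipped one call
  }
}
```

With SLF4J, the `{}` parameterized form avoids string concatenation but still evaluates each argument expression, which is why the explicit isDebugEnabled/isTraceEnabled guard is needed for costly arguments.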






[jira] [Created] (HDDS-2235) Ozone Datanode web page doesn't exist

2019-10-02 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDDS-2235:
---

 Summary: Ozone Datanode web page doesn't exist
 Key: HDDS-2235
 URL: https://issues.apache.org/jira/browse/HDDS-2235
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


On trying to access the dn UI, the following error is seen.

http://dn_ip:9882/

{code}
HTTP ERROR 403
Problem accessing /. Reason:

Forbidden
{code}






[jira] [Created] (HDDS-2215) SCM should exclude the datanode if the pipeline initialisation fails

2019-10-01 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDDS-2215:
---

 Summary: SCM should exclude the datanode if the pipeline initialisation fails
 Key: HDDS-2215
 URL: https://issues.apache.org/jira/browse/HDDS-2215
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Reporter: Mukul Kumar Singh


One of the nodes, y131, is not accessible; however, the RatisPipelineProvider keeps choosing the same node for the pipeline initialization.


{code}
2019-10-01 06:03:46,023 INFO org.apache.hadoop.hdds.scm.node.SCMNodeManager: Registered Data node : b647db83-836d-41d5-bf8d-a04cb816025e{ip: 172.26.32.233, host: y133, networkLocation: /default-rack, certSerialId: null}
2019-10-01 06:03:46,044 INFO org.apache.hadoop.hdds.scm.node.SCMNodeManager: Registered Data node : cdf3c007-cf76-4997-85fc-a3385d826053{ip: 172.26.32.231, host: y131, networkLocation: /default-rack, certSerialId: null}
2019-10-01 06:03:46,099 INFO org.apache.hadoop.hdds.scm.node.SCMNodeManager: Registered Data node : 1c699d4f-28a1-41ae-aa9c-8358f52b5d8d{ip: 172.26.32.230, host: y130, networkLocation: /default-rack, certSerialId: null}
2019-10-01 06:03:46,106 INFO org.apache.hadoop.hdds.scm.node.SCMNodeManager: Registered Data node : feba726b-2fcc-4b37-b112-8ed2e9fc8f94{ip: 172.26.32.224, host: y124, networkLocation: /default-rack, certSerialId: null}
2019-10-01 06:03:46,146 INFO org.apache.hadoop.hdds.scm.node.SCMNodeManager: Registered Data node : 9c78c807-be23-415b-b1a2-5eaf6e8925b8{ip: 172.26.32.226, host: y126, networkLocation: /default-rack, certSerialId: null}
2019-10-01 06:03:46,235 INFO org.apache.hadoop.hdds.scm.node.SCMNodeManager: Registered Data node : 0f6c93d2-c63f-4d1a-b57a-6012dd097bd1{ip: 172.26.32.225, host: y125, networkLocation: /default-rack, certSerialId: null}
2019-10-01 06:03:46,395 INFO org.apache.hadoop.hdds.scm.node.SCMNodeManager: Registered Data node : 3e4db9bd-20ee-4e2a-8512-fddd37bf5cc2{ip: 172.26.32.228, host: y128.l42scl.hortonworks.com, networkLocation: /default-rack, certSerialId: null}
2019-10-01 06:03:46,395 INFO org.apache.hadoop.hdds.scm.node.SCMNodeManager: Registered Data node : ba096716-6942-4358-bb21-84623fd06d2c{ip: 172.26.32.232, host: y132, networkLocation: /default-rack, certSerialId: null}
2019-10-01 06:03:46,440 INFO org.apache.hadoop.hdds.scm.node.SCMNodeManager: Registered Data node : 935dd070-8497-4b7d-a0be-ecb115586ed3{ip: 172.26.32.227, host: y127.l42scl.hortonworks.com, networkLocation: /default-rack, certSerialId: null}
2019-10-01 06:03:47,370 INFO org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: 8ba7dda5-fcf6-45e3-a333-f4811311d34a, Nodes: b647db83-836d-41d5-bf8d-a04cb816025e{ip: 172.26.32.233, host: y133, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:ONE, State:OPEN]
{code}
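A sketch of the exclusion idea under hypothetical names (PipelinePlacementSketch is not the actual SCM class): nodes whose pipeline initialization failed are dropped from the candidate set for subsequent placement attempts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PipelinePlacementSketch {
  // Datanodes whose pipeline initialization failed; skipped on retry.
  private final Set<String> excluded = new HashSet<>();

  public void markInitFailed(String datanode) {
    excluded.add(datanode);
  }

  // Candidate nodes for the next pipeline, minus the excluded ones.
  public List<String> candidates(List<String> healthyNodes) {
    List<String> result = new ArrayList<>();
    for (String node : healthyNodes) {
      if (!excluded.contains(node)) {
        result.add(node);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    PipelinePlacementSketch placement = new PipelinePlacementSketch();
    placement.markInitFailed("y131");  // the unreachable node from the log above
    System.out.println(placement.candidates(Arrays.asList("y130", "y131", "y133")));
    // prints [y130, y133]
  }
}
```

A real implementation would also need to age entries out of the exclusion set so a recovered node becomes eligible again.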






[jira] [Created] (HDFS-14884) Add sanity check that zone key equals feinfo key while setting Xattrs

2019-09-30 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDFS-14884:


 Summary: Add sanity check that zone key equals feinfo key while 
setting Xattrs
 Key: HDFS-14884
 URL: https://issues.apache.org/jira/browse/HDFS-14884
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption, hdfs
Reporter: Mukul Kumar Singh


Currently, it is possible to set an extended attribute where the zone key is not the same as the feinfo key. This jira will add a precondition check before setting it.
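A sketch of the kind of precondition intended, with hypothetical method and parameter names (the real check would live in the NameNode's xattr-setting path):

```java
public class EncryptionKeyCheck {
  // Hypothetical sanity check: the key named in a file's encryption info
  // must match the key of the enclosing encryption zone.
  public static void checkZoneKeyMatches(String zoneKeyName, String feInfoKeyName) {
    if (!zoneKeyName.equals(feInfoKeyName)) {
      throw new IllegalArgumentException(
          "FileEncryptionInfo key '" + feInfoKeyName
              + "' does not match encryption zone key '" + zoneKeyName + "'");
    }
  }

  public static void main(String[] args) {
    checkZoneKeyMatches("zoneKey1", "zoneKey1");  // passes silently
    try {
      checkZoneKeyMatches("zoneKey1", "otherKey");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```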






[jira] [Resolved] (HDDS-2207) Update Ratis to latest snapshot

2019-09-30 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDDS-2207.
-
Resolution: Fixed

Thanks for working on this [~shashikant]. I have committed this to trunk.

> Update Ratis to latest snapshot
> ---
>
> Key: HDDS-2207
> URL: https://issues.apache.org/jira/browse/HDDS-2207
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This Jira aims to update ozone with the latest ratis snapshot, which has a critical fix for the retry behaviour on getting a not-leader exception in the client.






[jira] [Created] (HDDS-2194) Replication of Container fails with "Only closed containers could be exported"

2019-09-26 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDDS-2194:
---

 Summary: Replication of Container fails with "Only closed 
containers could be exported"
 Key: HDDS-2194
 URL: https://issues.apache.org/jira/browse/HDDS-2194
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.5.0
Reporter: Mukul Kumar Singh


Replication of Container fails with "Only closed containers could be exported"

cc: [~nanda]

{code}
2019-09-26 15:00:17,640 [grpc-default-executor-13] INFO  replication.GrpcReplicationService (GrpcReplicationService.java:download(57)) - Streaming container data (37) to other datanode
Sep 26, 2019 3:00:17 PM org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor run
SEVERE: Exception while executing runnable org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed@70e641f2
java.lang.IllegalStateException: Only closed containers could be exported: ContainerId=37
    at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.exportContainerData(KeyValueContainer.java:527)
    at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.exportContainer(KeyValueHandler.java:875)
    at org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.exportContainer(ContainerController.java:134)
    at org.apache.hadoop.ozone.container.replication.OnDemandContainerReplicationSource.copyData(OnDemandContainerReplicationSource.java:64)
    at org.apache.hadoop.ozone.container.replication.GrpcReplicationService.download(GrpcReplicationService.java:63)
    at org.apache.hadoop.hdds.protocol.datanode.proto.IntraDatanodeProtocolServiceGrpc$MethodHandlers.invoke(IntraDatanodeProtocolServiceGrpc.java:217)
    at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
    at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
    at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:710)
    at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

2019-09-26 15:00:17,644 [grpc-default-executor-17] ERROR replication.GrpcReplicationClient (GrpcReplicationClient.java:onError(142)) - Container download was unsuccessfull
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNKNOWN
    at org.apache.ratis.thirdparty.io.grpc.Status.asRuntimeException(Status.java:526)
    at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434)
    at org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at org.apache.ratis.thirdparty.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678)
    at org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at org.apache.ratis.thirdparty.io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397)
    at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
    at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
    at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546)
    at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467)
    at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(Cli
{code}

[jira] [Created] (HDDS-2188) Implement LocatedFileStatus & getFileBlockLocations to provide node/localization information to Yarn/Mapreduce

2019-09-26 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDDS-2188:
---

 Summary: Implement LocatedFileStatus & getFileBlockLocations to 
provide node/localization information to Yarn/Mapreduce
 Key: HDDS-2188
 URL: https://issues.apache.org/jira/browse/HDDS-2188
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Filesystem
Affects Versions: 0.5.0
Reporter: Mukul Kumar Singh
Assignee: Mukul Kumar Singh


For applications like Hive/MapReduce to take advantage of the data locality in 
Ozone, Ozone should return the location of the Ozone blocks. This is needed for 
better read performance for Hadoop Applications.
{code}
if (file instanceof LocatedFileStatus) {
  blkLocations = ((LocatedFileStatus) file).getBlockLocations();
} else {
  blkLocations = fs.getFileBlockLocations(file, 0, length);
}
{code}






[jira] [Created] (HDDS-2102) HddsVolumeChecker should use java optional in place of Guava optional

2019-09-09 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDDS-2102:
---

 Summary: HddsVolumeChecker should use java optional in place of 
Guava optional
 Key: HDDS-2102
 URL: https://issues.apache.org/jira/browse/HDDS-2102
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


HddsVolumeChecker should use java.util.Optional in place of the Guava Optional, as the Guava dependency is marked unstable.
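A sketch of the migration, using a hypothetical findFailedVolume helper (not the actual HddsVolumeChecker API): java.util.Optional maps directly onto the Guava one, with Optional.empty() replacing Guava's Optional.absent().

```java
import java.util.Optional;

public class VolumeCheck {
  // Guava:     Optional.absent(), Optional.of(v), isPresent()
  // java.util: Optional.empty(),  Optional.of(v), isPresent()/orElse()
  public static Optional<String> findFailedVolume(String[] volumes, String failed) {
    for (String volume : volumes) {
      if (volume.equals(failed)) {
        return Optional.of(volume);   // same as Guava Optional.of(...)
      }
    }
    return Optional.empty();          // replaces Guava Optional.absent()
  }

  public static void main(String[] args) {
    String[] volumes = {"/data/disk1", "/data/disk2"};
    System.out.println(findFailedVolume(volumes, "/data/disk2").orElse("none"));
    System.out.println(findFailedVolume(volumes, "/data/disk9").orElse("none"));
  }
}
```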






[jira] [Created] (HDDS-2088) Different components in MiniOzoneChaosCluster should log to different files

2019-09-05 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDDS-2088:
---

 Summary: Different components in MiniOzoneChaosCluster should log 
to different files
 Key: HDDS-2088
 URL: https://issues.apache.org/jira/browse/HDDS-2088
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Affects Versions: 0.4.0
Reporter: Shashikant Banerjee


Different components/nodes in MiniOzoneChaosCluster should log to different log 
files.
Thanks [~shashikant] for suggesting this.






[jira] [Resolved] (HDDS-1806) Handle writeStateMachine Failures in Ozone

2019-09-03 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDDS-1806.
-
Resolution: Duplicate

This is same as HDDS-1485. Duping it.

> Handle writeStateMachine Failures in Ozone
> --
>
> Key: HDDS-1806
> URL: https://issues.apache.org/jira/browse/HDDS-1806
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Supratim Deka
>Priority: Major
> Fix For: 0.5.0
>
>
>  
> {code:java}
> Unexpected Storage Container Exception: 
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  ContainerID 3 does not exist
> Stacktrace
> java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 3 does not exist
>     at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.setIoException(BlockOutputStream.java:549)
>     at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:540)
>     at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$2(BlockOutputStream.java:615)
>     at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>     at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>     at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 3 does not exist
>     at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:536)
>     at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:537)
>     ... 7 more
> {code}
> The error propagated to the client is erroneous. The container creation failed as a result of a disk-full condition, but this was never propagated to the client.
>  






[jira] [Created] (HDDS-2076) Read fails because the block cannot be located in the container

2019-09-03 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDDS-2076:
---

 Summary: Read fails because the block cannot be located in the 
container
 Key: HDDS-2076
 URL: https://issues.apache.org/jira/browse/HDDS-2076
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client, Ozone Datanode
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh
 Attachments: log.zip

Read fails as the client is not able to read the block from the container.

{code}
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Unable to find the block with bcsID 2515 .Container 7 bcsId is 0.
    at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:536)
    at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$getValidatorList$0(ContainerProtocolCalls.java:569)
2019-08-30 12:51:20,081 | INFO  | SCMAudit | user=msingh | ip=192.168.0.103 |
{code}


The client eventually exits here
{code}
2019-08-30 12:51:20,081 [pool-224-thread-6] ERROR ozone.MiniOzoneLoadGenerator (MiniOzoneLoadGenerator.java:readData(176)) - LOADGEN: Read key:pool-224-thread-6_330651 failed with exception
ERROR ozone.MiniOzoneLoadGenerator (MiniOzoneLoadGenerator.java:load(121)) - LOADGEN: Exiting due to exception
{code}






[jira] [Created] (HDDS-2009) scm web ui should publish the list of scm pipeline by type and factor

2019-08-21 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDDS-2009:
---

 Summary: scm web ui should publish the list of scm pipeline by 
type and factor
 Key: HDDS-2009
 URL: https://issues.apache.org/jira/browse/HDDS-2009
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Affects Versions: 0.5.0
Reporter: Mukul Kumar Singh


scm web ui should publish the list of scm pipeline by type and factor, this 
helps in monitoring the cluster in real time.






[jira] [Resolved] (HDDS-1923) static/docs/start.html page doesn't render correctly on Firefox

2019-08-14 Thread Mukul Kumar Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDDS-1923.
-
Resolution: Invalid

Thanks for looking into this [~adoroszlai]. I just started a docker instance 
and the rendering looks fine. Resolving this.

> static/docs/start.html page doesn't render correctly on Firefox
> ---
>
> Key: HDDS-1923
> URL: https://issues.apache.org/jira/browse/HDDS-1923
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.4.0
>        Reporter: Mukul Kumar Singh
>Assignee: Anu Engineer
>Priority: Blocker
>
> static/docs/start.html page doesn't render correctly on Firefox






[jira] [Created] (HDDS-1957) MiniOzoneChaosCluster exits because of ArrayIndexOutOfBoundsException in load generator

2019-08-13 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1957:
---

 Summary: MiniOzoneChaosCluster exits because of 
ArrayIndexOutOfBoundsException in load generator
 Key: HDDS-1957
 URL: https://issues.apache.org/jira/browse/HDDS-1957
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh
Assignee: Mukul Kumar Singh


MiniOzoneChaosCluster exits because of ArrayIndexOutOfBoundsException in load 
generator.

It is exiting because of the following exception.
{code}
java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.hadoop.ozone.MiniOzoneLoadGenerator.readData(MiniOzoneLoadGenerator.java:153)
    at org.apache.hadoop.ozone.MiniOzoneLoadGenerator.startAgedFilesLoad(MiniOzoneLoadGenerator.java:216)
    at org.apache.hadoop.ozone.MiniOzoneLoadGenerator.lambda$startIO$1(MiniOzoneLoadGenerator.java:242)
    at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
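The exception comes from indexing past the end of a collection in MiniOzoneLoadGenerator.readData. A generic fix pattern (hypothetical names, not the actual patch) is to derive the random index bound from the live size of the collection:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class SafeRandomPick {
  // Picks a random element using the list's current size as the bound,
  // so the index is always within [0, size).
  public static <T> T pick(List<T> items) {
    if (items.isEmpty()) {
      throw new IllegalStateException("nothing to pick from");
    }
    return items.get(ThreadLocalRandom.current().nextInt(items.size()));
  }

  public static void main(String[] args) {
    List<String> buffers = Arrays.asList("buf-0", "buf-1", "buf-2");
    for (int i = 0; i < 1000; i++) {
      pick(buffers);  // never throws ArrayIndexOutOfBoundsException
    }
    System.out.println("ok");
  }
}
```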






[jira] [Created] (HDDS-1955) TestBlockOutputStreamWithFailures#test2DatanodesFailure failing because of assertion error

2019-08-12 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1955:
---

 Summary: TestBlockOutputStreamWithFailures#test2DatanodesFailure 
failing because of assertion error
 Key: HDDS-1955
 URL: https://issues.apache.org/jira/browse/HDDS-1955
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh
Assignee: Mukul Kumar Singh


The test is failing because the pipeline can be closed due to the datanode shutdown. This can also cause a ContainerNotOpenException to be raised.






[jira] [Created] (HDDS-1939) Owner/group information for a file should be returned from OzoneFileStatus

2019-08-09 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1939:
---

 Summary: Owner/group information for a file should be returned 
from OzoneFileStatus
 Key: HDDS-1939
 URL: https://issues.apache.org/jira/browse/HDDS-1939
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Security
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


BasicOzoneFilesystem returns the file's user/group information as the current user/group. This should instead default to the information read from the ACLs for the file.

cc [~xyao]






[jira] [Created] (HDDS-1933) Datanode should use hostname in place of ip addresses to allow DNs to work when the IP address changes

2019-08-08 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1933:
---

 Summary: Datanode should use hostname in place of ip addresses to allow DNs to work when the IP address changes
 Key: HDDS-1933
 URL: https://issues.apache.org/jira/browse/HDDS-1933
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode, SCM
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


This was noticed by [~elek] while deploying Ozone on Kubernetes based 
environment.

When the datanode IP address changes on restart, the DatanodeDetails cease to be correct for the datanode, and this prevents the cluster from functioning after a restart.
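A sketch of the hostname-based approach with hypothetical names (DatanodeEndpoint is not the actual DatanodeDetails class): store the stable hostname and resolve it to an IP only at connection time, so a restarted pod with a new IP is still reachable by name.

```java
import java.net.InetSocketAddress;

public class DatanodeEndpoint {
  private final String hostName;
  private final int port;

  public DatanodeEndpoint(String hostName, int port) {
    this.hostName = hostName;  // store the stable name, not a resolved IP
    this.port = port;
  }

  // Resolve at connection time; a datanode that came back with a new IP
  // (e.g. a restarted Kubernetes pod) is still reachable by name.
  public InetSocketAddress toSocketAddress() {
    return new InetSocketAddress(hostName, port);
  }

  public static void main(String[] args) {
    DatanodeEndpoint dn = new DatanodeEndpoint("localhost", 9882);
    System.out.println(dn.toSocketAddress().getHostString());  // prints localhost
  }
}
```

The trade-off is a DNS lookup per connection, which is usually acceptable and can be mitigated by the JVM's address cache.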






[jira] [Created] (HDDS-1932) Add support for object expiration in the s3 api

2019-08-08 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1932:
---

 Summary: Add support for object expiration in the s3 api
 Key: HDDS-1932
 URL: https://issues.apache.org/jira/browse/HDDS-1932
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: S3
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


This jira proposes to add support for object expiration in the s3 API. Objects are deleted once the object's lifecycle time has elapsed.

https://aws.amazon.com/blogs/aws/amazon-s3-object-expiration/
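For reference, a standard AWS S3 lifecycle rule that such an API would need to accept looks like the following (AWS lifecycle configuration syntax, not an existing Ozone feature):

```xml
<LifecycleConfiguration>
  <Rule>
    <ID>expire-old-logs</ID>
    <Filter>
      <Prefix>logs/</Prefix>
    </Filter>
    <Status>Enabled</Status>
    <Expiration>
      <Days>30</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
```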






[jira] [Created] (HDDS-1924) ozone sh bucket path command does not exist

2019-08-07 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1924:
---

 Summary: ozone sh bucket path command does not exist
 Key: HDDS-1924
 URL: https://issues.apache.org/jira/browse/HDDS-1924
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Manager
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


The ozone sh bucket path command does not exist, but it is mentioned in static/docs/interface/s3.html. The command should either be added back or the documentation should be updated.






[jira] [Created] (HDDS-1923) static/docs/start.html page doesn't render correctly on Firefox

2019-08-07 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1923:
---

 Summary: static/docs/start.html page doesn't render correctly on 
Firefox
 Key: HDDS-1923
 URL: https://issues.apache.org/jira/browse/HDDS-1923
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


static/docs/start.html page doesn't render correctly on Firefox








[jira] [Created] (HDDS-1922) Next button on the bottom of "static/docs/index.html" landing page does not work

2019-08-07 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1922:
---

 Summary: Next button on the bottom of "static/docs/index.html" 
landing page does not work
 Key: HDDS-1922
 URL: https://issues.apache.org/jira/browse/HDDS-1922
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Manager
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


On the Ozone landing doc page, the "Next" link doesn't work.







[jira] [Created] (HDDS-1899) DeleteBlocksCommandHandler is unable to find the container in SCM

2019-08-02 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1899:
---

 Summary: DeleteBlocksCommandHandler is unable to find the 
container in SCM
 Key: HDDS-1899
 URL: https://issues.apache.org/jira/browse/HDDS-1899
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


DeleteBlocksCommandHandler is unable to find a container in SCM.

{code}
2019-08-02 14:04:56,735 WARN  commandhandler.DeleteBlocksCommandHandler (DeleteBlocksCommandHandler.java:lambda$handle$0(140)) - Failed to delete blocks for container=33, TXID=184
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Unable to find the container 33
    at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteBlocksCommandHandler.lambda$handle$0(DeleteBlocksCommandHandler.java:122)
    at java.util.ArrayList.forEach(ArrayList.java:1257)
    at java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
    at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteBlocksCommandHandler.handle(DeleteBlocksCommandHandler.java:114)
    at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:93)
    at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$1(DatanodeStateMachine.java:432)
    at java.lang.Thread.run(Thread.java:748)
{code}






[jira] [Created] (HDDS-1898) GrpcReplicationService#download cannot replicate the container

2019-08-02 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1898:
---

 Summary: GrpcReplicationService#download cannot replicate the 
container
 Key: HDDS-1898
 URL: https://issues.apache.org/jira/browse/HDDS-1898
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


Container replication is failing because RocksDB is unable to find the 
underlying files.

{code}
2019-08-02 14:07:26,670 INFO  replication.GrpcReplicationService 
(GrpcReplicationService.java:close(124)) - 663284 bytes written to th
e rpc stream from container 12
2019-08-02 14:07:26,670 ERROR replication.GrpcReplicationService 
(GrpcReplicationService.java:download(65)) - Can't stream the contain
er data
java.io.FileNotFoundException: 
/Users/msingh/code/apache/ozone/github/chaos_runs/hadoop-ozone/integration-test/target/test/data/MiniOzoneClusterImpl-403d87c2-5cbe-4511-8e14-dce727f10cf9/datanode-7/data/containers/hdds/9f2a75dc-3243-462a-a90e-c83f63ad0d55/current/containerDir0/12/metadata/12-dn-container.db/002084.log
 (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.(FileInputStream.java:138)
at 
org.apache.hadoop.ozone.container.keyvalue.TarContainerPacker.includeFile(TarContainerPacker.java:243)
at 
org.apache.hadoop.ozone.container.keyvalue.TarContainerPacker.includePath(TarContainerPacker.java:233)
at 
org.apache.hadoop.ozone.container.keyvalue.TarContainerPacker.pack(TarContainerPacker.java:164)
at 
org.apache.hadoop.ozone.container.replication.OnDemandContainerReplicationSource.copyData(OnDemandContainerReplicationSource.java:67)
at 
org.apache.hadoop.ozone.container.replication.GrpcReplicationService.download(GrpcReplicationService.java:63)
at 
org.apache.hadoop.hdds.protocol.datanode.proto.IntraDatanodeProtocolServiceGrpc$MethodHandlers.invoke(IntraDatanodeProtocolServiceGrpc.java:217)
at 
org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:710)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at 
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.io.IOException: This archives contains unclosed 
entries.
at 
org.apache.commons.compress.archivers.tar.TarArchiveOutputStream.finish(TarArchiveOutputStream.java:214)
at 
org.apache.commons.compress.archivers.tar.TarArchiveOutputStream.close(TarArchiveOutputStream.java:229)
at 
org.apache.hadoop.ozone.container.keyvalue.TarContainerPacker.pack(TarContainerPacker.java:173)
... 11 more
{code}






[jira] [Created] (HDDS-1897) SCMNodeManager.java#getNodeByAddress cannot find nodes by addresses

2019-08-02 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1897:
---

 Summary: SCMNodeManager.java#getNodeByAddress cannot find nodes by 
addresses
 Key: HDDS-1897
 URL: https://issues.apache.org/jira/browse/HDDS-1897
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Reporter: Mukul Kumar Singh


SCMNodeManager cannot find the nodes by their IP addresses in MiniOzoneChaosCluster.

{code}
2019-08-02 13:57:01,501 WARN  node.SCMNodeManager 
(SCMNodeManager.java:getNodeByAddress(599)) - Cannot find node for address 
127.0.0.1
{code}

cc: [~xyao] & [~Sammi]






[jira] [Resolved] (HDDS-1804) TestCloseContainerHandlingByClient#estBlockWrites fails intermittently

2019-07-26 Thread Mukul Kumar Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDDS-1804.
-
Resolution: Duplicate

This is fixed via HDDS-1817. Duping it.

> TestCloseContainerHandlingByClient#estBlockWrites fails intermittently
> --
>
> Key: HDDS-1804
> URL: https://issues.apache.org/jira/browse/HDDS-1804
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
>
> The test fails intermittently as reported here:
> [https://builds.apache.org/job/hadoop-multibranch/job/PR-1082/1/testReport/org.apache.hadoop.ozone.client.rpc/TestCloseContainerHandlingByClient/testBlockWrites/]
> {code:java}
> java.lang.IllegalArgumentException
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>   at 
> org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:150)
>   at 
> org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClientForReadData(XceiverClientManager.java:143)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:154)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:118)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:222)
>   at 
> org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171)
>   at 
> org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47)
>   at java.io.InputStream.read(InputStream.java:101)
>   at 
> org.apache.hadoop.ozone.container.ContainerTestHelper.validateData(ContainerTestHelper.java:709)
>   at 
> org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.validateData(TestCloseContainerHandlingByClient.java:401)
>   at 
> org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.testBlockWrites(TestCloseContainerHandlingByClient.java:471)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBoot

[jira] [Created] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete

2019-07-26 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1868:
---

 Summary: Ozone pipelines should be marked as ready only after the 
leader election is complete
 Key: HDDS-1868
 URL: https://issues.apache.org/jira/browse/HDDS-1868
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode, SCM
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh
 Fix For: 0.5.0


On restart, Ozone pipelines start in the ALLOCATED state and are moved into the 
OPEN state after all the datanodes in the pipeline have reported to SCM. However, 
this can still leave the pipeline accepting incoming IO operations before it is 
actually ready, because the Ratis leader election may not have completed yet.
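A minimal sketch of the proposed gating. All names here are hypothetical, not the actual SCM pipeline classes; the point is the ordering fix: a pipeline becomes OPEN only after both all replicas have reported and a leader has been elected, instead of on reports alone.

```java
// Illustrative only: PipelineGate, reportNode, onLeaderElected are made-up
// names standing in for the SCM pipeline state handling.
public class PipelineGate {
    private final int expectedReports;
    private int reports = 0;
    private boolean leaderElected = false;
    private boolean open = false;

    public PipelineGate(int expectedReports) {
        this.expectedReports = expectedReports;
    }

    public synchronized void reportNode() {
        reports++;
        maybeOpen();
    }

    public synchronized void onLeaderElected() {
        leaderElected = true;
        maybeOpen();
    }

    private void maybeOpen() {
        // Only when BOTH conditions hold is it safe to accept IO.
        open = (reports >= expectedReports) && leaderElected;
    }

    public synchronized boolean isOpen() {
        return open;
    }
}
```

With this gate, a fully reported pipeline still stays ALLOCATED until the leader-election signal arrives.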






[jira] [Created] (HDDS-1866) Enable purging of raft logs in ContainerStateMachine in 0.5.0

2019-07-26 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1866:
---

 Summary: Enable purging of raft logs in ContainerStateMachine in 
0.5.0
 Key: HDDS-1866
 URL: https://issues.apache.org/jira/browse/HDDS-1866
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh
 Fix For: 0.5.0


The current purge gap for the raft logs in ContainerStateMachine is set to 
1 billion; it should be lowered to 100,000 or a similar value to re-enable 
purging.
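For illustration, the change amounts to lowering a single Ratis server property. The key name `raft.server.log.purge.gap` is my understanding of the Ratis configuration and should be verified against the Ratis version in use; the sketch below just builds a plain `Properties` object rather than touching real Ratis APIs.

```java
import java.util.Properties;

// Sketch only: the property name is assumed, not confirmed against a
// specific Ratis release.
public class PurgeGapConfig {
    static final String PURGE_GAP_KEY = "raft.server.log.purge.gap";

    // Lower the gap from ~1 billion (effectively "never purge") to a value
    // small enough that raft log segments actually get purged.
    public static Properties withPurgeGap(Properties props, int gap) {
        props.setProperty(PURGE_GAP_KEY, Integer.toString(gap));
        return props;
    }
}
```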






[jira] [Created] (HDDS-1835) Improve metric name for CSM Metrics

2019-07-19 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1835:
---

 Summary: Improve metric name for CSM Metrics
 Key: HDDS-1835
 URL: https://issues.apache.org/jira/browse/HDDS-1835
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


CSMMetrics currently uses the fully qualified class name as the metric name. 
This should be shortened.






[jira] [Created] (HDDS-1833) RefCountedDB printing of stacktrace should be moved to trace logging

2019-07-18 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1833:
---

 Summary: RefCountedDB printing of stacktrace should be moved to 
trace logging
 Key: HDDS-1833
 URL: https://issues.apache.org/jira/browse/HDDS-1833
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


RefCountedDB logs the stack trace for both increment and decrement operations, 
which pollutes the logs; this should be logged at trace level only.
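A sketch of the proposed fix, using `java.util.logging` as a stand-in for the project's actual logger: the stack trace is captured and emitted only when trace-level logging is enabled, so normal runs stay clean and the allocation cost of `new Exception()` is avoided.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Stand-in sketch; the real RefCountedDB uses the project's own logger.
public class RefCountTrace {
    private static final Logger LOG = Logger.getLogger("RefCountedDB");
    private int refCount = 0;

    public int incrementRef() {
        refCount++;
        if (LOG.isLoggable(Level.FINEST)) {
            // The Exception (and its stack trace) is only allocated when
            // trace-level logging is actually on.
            LOG.log(Level.FINEST, "refCount incremented to " + refCount,
                    new Exception("ref trace"));
        }
        return refCount;
    }
}
```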






[jira] [Created] (HDDS-1832) Improve logging for PipelineActions handling in SCM and datanode

2019-07-18 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1832:
---

 Summary: Improve logging for PipelineActions handling in SCM and 
datanode
 Key: HDDS-1832
 URL: https://issues.apache.org/jira/browse/HDDS-1832
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode, SCM
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


XceiverServerRatis should log the reason when sending a PipelineAction to 
the datanode. The PipelineActionHandler should likewise log the detailed 
reason for the action.






[jira] [Created] (HDDS-1823) RatisPipelineProvider#initializePipeline logging needs to be verbose on debugging

2019-07-18 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1823:
---

 Summary: RatisPipelineProvider#initializePipeline logging needs to 
be verbose on debugging
 Key: HDDS-1823
 URL: https://issues.apache.org/jira/browse/HDDS-1823
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


RatisPipelineProvider#initializePipeline does not log the pipeline details or 
the failed nodes when initializePipeline fails. The logging needs to be more 
verbose to help with debugging.






[jira] [Created] (HDDS-1822) NPE in SCMCommonPolicy.chooseDatanodes

2019-07-18 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1822:
---

 Summary: NPE in SCMCommonPolicy.chooseDatanodes
 Key: HDDS-1822
 URL: https://issues.apache.org/jira/browse/HDDS-1822
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


The following NPE is thrown in SCMCommonPolicy.chooseDatanodes:
{code}
java.lang.NullPointerException
at java.util.Objects.requireNonNull(Objects.java:203)
at java.util.ArrayList.removeAll(ArrayList.java:693)
at 
org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMCommonPolicy.chooseDatanodes(SCMCommonPolicy.java:112)
at 
org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRandom.chooseDatanodes(SCMContainerPlacementRandom.java:74)
at 
org.apache.hadoop.hdds.scm.container.placement.algorithms.TestContainerPlacementFactory.testDefaultPolicy(TestContainerPlacementFactory.java:104)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}

cc : [~xyao] [~Sammi]
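The trace bottoms out in `Objects.requireNonNull` inside `ArrayList.removeAll`, which means the excluded-node list handed to `chooseDatanodes` was null; `removeAll(null)` always throws. A minimal reproduction plus a defensive variant (method and variable names are illustrative, not the SCM code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RemoveAllNpe {
    // ArrayList.removeAll(null) throws NullPointerException from
    // Objects.requireNonNull -- the exact frame in the stack trace above.
    public static boolean triggersNpe() {
        List<String> healthy = new ArrayList<>(List.of("dn1", "dn2"));
        try {
            healthy.removeAll(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Defensive variant: treat a null excluded list as "exclude nothing".
    public static List<String> exclude(List<String> healthy,
                                       List<String> excluded) {
        List<String> result = new ArrayList<>(healthy);
        result.removeAll(excluded == null ? Collections.emptyList() : excluded);
        return result;
    }
}
```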






Re: [VOTE] Force "squash and merge" option for PR merge on github UI

2019-07-17 Thread Mukul Kumar Singh

+1. Let's have "Squash and merge" as the default and only option in the GitHub UI.

Thanks,
Mukul


On 7/17/19 11:37 AM, Elek, Marton wrote:

Hi,

Github UI (ui!) helps to merge Pull Requests to the proposed branch.
There are three different ways to do it [1]:

1. Keep all the different commits from the PR branch and create one
additional merge commit ("Create a merge commit")

2. Squash all the commits and commit the change as one patch ("Squash
and merge")

3. Keep all the different commits from the PR branch but rebase, merge
commit will be missing ("Rebase and merge")



As only option 2 is compatible with the existing development
practices of Hadoop (1 issue = 1 patch = 1 commit), I call for a lazy
consensus vote: if there are no objections within 3 days, I will ask INFRA to
disable options 1 and 3 to make the process less error-prone.

Please let me know what you think.

Thanks a lot
Marton

ps: Personally I prefer to merge from local as it enables signing the
commits and doing a final build before pushing. But this is a different story;
this proposal is only about removing the options which are obviously
risky...

ps2: You can always do any kind of merge / commits from CLI, for example
to merge a feature branch together with keeping the history.

[1]:
https://help.github.com/en/articles/merging-a-pull-request#merging-a-pull-request-on-github







[jira] [Created] (HDDS-1812) Du while calculating used disk space reports that chunk files are file not found

2019-07-16 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1812:
---

 Summary: Du while calculating used disk space reports that chunk 
files are file not found
 Key: HDDS-1812
 URL: https://issues.apache.org/jira/browse/HDDS-1812
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh




{code}
2019-07-16 08:16:49,787 WARN org.apache.hadoop.fs.CachingGetSpaceUsed: Could 
not get disk usage information for path /data/3/ozone-0715
ExitCodeException exitCode=1: du: cannot access 
'/data/3/ozone-0715/hdds/1b467d25-46cd-4de0-a4a1-e9405bde23ff/current/containerDir3/1724/chunks/b113dd390e68e914d3ff405f3deec564_stream_60448f
77-6349-48fa-ae86-b2d311730569_chunk_1.tmp.1.14118085': No such file or 
directory
du: cannot access 
'/data/3/ozone-0715/hdds/1b467d25-46cd-4de0-a4a1-e9405bde23ff/current/containerDir3/1724/chunks/37993af2849bdd0320d0f9d4a6ef4b92_stream_1f68be9f-e083-45e5-84a9-08809bc392ed
_chunk_1.tmp.1.14118091': No such file or directory
du: cannot access 
'/data/3/ozone-0715/hdds/1b467d25-46cd-4de0-a4a1-e9405bde23ff/current/containerDir3/1724/chunks/a38677def61389ec0be9105b1b4fddff_stream_9c3c3741-f710-4482-8423-7ac6695be96b
_chunk_1.tmp.1.14118102': No such file or directory
du: cannot access 
'/data/3/ozone-0715/hdds/1b467d25-46cd-4de0-a4a1-e9405bde23ff/current/containerDir3/1724/chunks/a689c89f71a75547471baf6182f3be01_stream_baf0f21d-2fb0-4cd8-84b0-eff1723019a0
_chunk_1.tmp.1.14118105': No such file or directory
du: cannot access 
'/data/3/ozone-0715/hdds/1b467d25-46cd-4de0-a4a1-e9405bde23ff/current/containerDir3/1724/chunks/f58cf0fa5cb9360058ae25e8bc983e84_stream_d8d5ea61-995f-4ff5-88fb-4a9e97932f00
_chunk_1.tmp.1.14118109': No such file or directory
du: cannot access 
'/data/3/ozone-0715/hdds/1b467d25-46cd-4de0-a4a1-e9405bde23ff/current/containerDir3/1724/chunks/a1d13ee6bbefd1f8156b1bd8db0d1b67_stream_db214bdd-a0c0-4f4a-8bc7-a3817e047e45_chunk_1.tmp.1.14118115':
 No such file or directory
du: cannot access 
'/data/3/ozone-0715/hdds/1b467d25-46cd-4de0-a4a1-e9405bde23ff/current/containerDir3/1724/chunks/8f8a4bd3f6c31161a70f82cb5ab8ee60_stream_d532d657-3d87-4332-baf8-effad9b3db23_chunk_1.tmp.1.14118127':
 No such file or directory

at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008)
at org.apache.hadoop.util.Shell.run(Shell.java:901)
at org.apache.hadoop.fs.DU$DUShell.startRefresh(DU.java:62)
at org.apache.hadoop.fs.DU.refresh(DU.java:53)
at 
org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:181)
at java.lang.Thread.run(Thread.java:748)
{code}
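`du` is an external process racing against chunk-file deletion, so transient `.tmp` chunk files can vanish between listing and stat, producing the "No such file or directory" noise above. An in-process sketch of a tolerant usage scan follows; it illustrates the failure mode and one mitigation, not the actual CachingGetSpaceUsed fix.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.stream.Stream;

public class TolerantDiskUsage {
    // Walk the tree and sum file sizes, ignoring files that disappear
    // between listing and stat -- the same race that makes `du` complain.
    public static long usedBytes(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (NoSuchFileException e) {
                                return 0L; // deleted mid-scan; skip it
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }
}
```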






[jira] [Created] (HDDS-1797) Add per volume operation metrics in datanode dispatcher

2019-07-15 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1797:
---

 Summary: Add per volume operation metrics in datanode dispatcher
 Key: HDDS-1797
 URL: https://issues.apache.org/jira/browse/HDDS-1797
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


Add per-volume operation metrics to the Ozone datanode dispatcher.
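A hypothetical sketch of per-volume counters. The real dispatcher would wire this into the Hadoop metrics2 framework; the names here are made up. One `LongAdder` per volume, created on first use, keeps the hot path contention-free.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Illustrative only: not the actual HddsDispatcher metrics wiring.
public class PerVolumeMetrics {
    private final Map<String, LongAdder> opCounts = new ConcurrentHashMap<>();

    public void recordOp(String volume) {
        // computeIfAbsent gives a thread-safe create-on-first-use counter.
        opCounts.computeIfAbsent(volume, v -> new LongAdder()).increment();
    }

    public long getOpCount(String volume) {
        LongAdder a = opCounts.get(volume);
        return a == null ? 0L : a.sum();
    }
}
```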






[jira] [Created] (HDDS-1796) SCMClientProtocolServer#getContainerWithPipeline should check for admin access

2019-07-14 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1796:
---

 Summary: SCMClientProtocolServer#getContainerWithPipeline should 
check for admin access
 Key: HDDS-1796
 URL: https://issues.apache.org/jira/browse/HDDS-1796
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


SCMClientProtocolServer#getContainerWithPipeline currently calls 
checkAdminAccess with a null user, so admin access is not actually enforced.






[jira] [Created] (HDDS-1792) Use ConcurrentHashSet in place of ConcurrentHashMap in ContainerStateMachine.

2019-07-12 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1792:
---

 Summary: Use ConcurrentHashSet in place of ConcurrentHashMap in 
ContainerStateMachine.
 Key: HDDS-1792
 URL: https://issues.apache.org/jira/browse/HDDS-1792
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh
Assignee: Mukul Kumar Singh


Use ConcurrentHashSet in place of ConcurrentHashMap in ContainerStateMachine.
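For reference, the plain-JDK equivalent of a ConcurrentHashSet is `ConcurrentHashMap.newKeySet()` (the ConcurrentHashSet class shipped in Ratis thirdparty is, as I understand it, equivalent). When the map values are only dummy placeholders, the set form states the intent directly:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentSetDemo {
    // A concurrent hash set backed by ConcurrentHashMap: same concurrency
    // guarantees as the map, without carrying around dummy values.
    public static Set<Long> newPendingSet() {
        return ConcurrentHashMap.newKeySet();
    }
}
```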






[jira] [Created] (HDDS-1786) Datanodes takeSnapshot should delete previously created snapshots

2019-07-11 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1786:
---

 Summary: Datanodes takeSnapshot should delete previously created 
snapshots
 Key: HDDS-1786
 URL: https://issues.apache.org/jira/browse/HDDS-1786
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


Right now, after taking a new snapshot, the previous snapshot file is 
left in the raft log directory. When a new snapshot is taken, the previous 
snapshots should be deleted.
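A minimal sketch of the cleanup. The `snapshot.` file-name prefix, zero-padded index ordering, and keep-only-newest policy are assumptions for illustration, not the actual Ratis snapshot layout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SnapshotCleaner {
    // Delete every snapshot file except the newest one. Lexicographic order
    // stands in for term/index ordering and assumes zero-padded names; real
    // code would parse the snapshot's term and index instead.
    public static void deleteOlderSnapshots(Path dir) throws IOException {
        List<Path> snapshots;
        try (Stream<Path> s = Files.list(dir)) {
            snapshots = s.filter(p -> p.getFileName().toString()
                                       .startsWith("snapshot."))
                         .sorted(Comparator.comparing(Path::toString))
                         .collect(Collectors.toList());
        }
        for (int i = 0; i < snapshots.size() - 1; i++) {
            Files.deleteIfExists(snapshots.get(i));
        }
    }
}
```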






[jira] [Created] (HDDS-1782) Add an option to MiniOzoneChaosCluster to read files multiple times.

2019-07-10 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1782:
---

 Summary: Add an option to MiniOzoneChaosCluster to read files 
multiple times.
 Key: HDDS-1782
 URL: https://issues.apache.org/jira/browse/HDDS-1782
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh
Assignee: Mukul Kumar Singh


Right now MiniOzoneChaosCluster writes a file, reads it, and deletes it 
immediately. This jira proposes adding an option to read the file multiple 
times in MiniOzoneChaosCluster.






[jira] [Created] (HDDS-1777) JVM crash while shutting down Ozone datanode in ShutdownHook

2019-07-09 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1777:
---

 Summary: JVM crash while shutting down Ozone datanode in 
ShutdownHook
 Key: HDDS-1777
 URL: https://issues.apache.org/jira/browse/HDDS-1777
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh
 Attachments: hs_err_pid1459.log

The JVM crashes while shutting down the Ozone datanode in a shutdown hook, 
with the following stack:

{code}
Stack: [0x70008791,0x700087a1],  sp=0x700087a0db20,  free 
space=1014k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libsystem_platform.dylib+0x1d09]  _platform_memmove$VARIANT$Haswell+0x29
C  [libzip.dylib+0x3399]  newEntry+0x65b
C  [libzip.dylib+0x352d]  ZIP_GetEntry2+0xd4
C  [libzip.dylib+0x2238]  Java_java_util_zip_ZipFile_getEntry+0xcf
J 108  java.util.zip.ZipFile.getEntry(J[BZ)J (0 bytes) @ 0x0001087d23ce 
[0x0001087d2300+0xce]
J 4302 C2 
java.util.jar.JarFile.getEntry(Ljava/lang/String;)Ljava/util/zip/ZipEntry; (22 
bytes) @ 0x000108d659e8 [0x000108d65660+0x388]
J 4583 C2 
sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;
 (85 bytes) @ 0x000108954b24 [0x000108954aa0+0x84]
J 25559 C2 java.net.URLClassLoader$2.run()Ljava/lang/Object; (5 bytes) @ 
0x00010c2c04c8 [0x00010c2c0380+0x148]
v  ~StubRoutines::call_stub
V  [libjvm.dylib+0x2ef1f6]
V  [libjvm.dylib+0x34fb24]
J 4197  
java.security.AccessController.doPrivileged(Ljava/security/PrivilegedAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;
 (0 bytes) @ 0x000108e36055 [0x000108e35f80+0xd5]
J 25557 C2 
java.net.URLClassLoader.findResource(Ljava/lang/String;)Ljava/net/URL; (37 
bytes) @ 0x000109c8505c [0x000109c84fc0+0x9c]
J 25556 C2 java.lang.ClassLoader.getResource(Ljava/lang/String;)Ljava/net/URL; 
(36 bytes) @ 0x00010c2bb984 [0x00010c2bb640+0x344]
j  
org.apache.hadoop.conf.Configuration.getResource(Ljava/lang/String;)Ljava/net/URL;+5
j  
org.apache.hadoop.conf.Configuration.getStreamReader(Lorg/apache/hadoop/conf/Configuration$Resource;Z)Lorg/codehaus/stax2/XMLStreamReader2;+51
J 7480 C1 
org.apache.hadoop.conf.Configuration.loadResource(Ljava/util/Properties;Lorg/apache/hadoop/conf/Configuration$Resource;Z)Lorg/apache/hadoop/conf/Configuration$Resource;
 (322 bytes) @ 0x000109964bf4 [0x000109964700+0x4f4]
j  
org.apache.hadoop.conf.Configuration.loadResources(Ljava/util/Properties;Ljava/util/ArrayList;Z)V+50
J 8094 C2 org.apache.hadoop.conf.Configuration.getProps()Ljava/util/Properties; 
(162 bytes) @ 0x000109af1fc0 [0x000109af1d40+0x280]
J 15086 C2 
org.apache.hadoop.conf.Configuration.get(Ljava/lang/String;)Ljava/lang/String; 
(64 bytes) @ 0x00010ae9ee78 [0x00010ae9eb20+0x358]
J 21716 C1 
org.apache.hadoop.conf.Configuration.getTimeDuration(Ljava/lang/String;JLjava/util/concurrent/TimeUnit;)J
 (25 bytes) @ 0x00010b8ab4d4 [0x00010b8ab3c0+0x114]
j  
org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(Lorg/apache/hadoop/conf/Configuration;)J+9
j  
org.apache.hadoop.util.ShutdownHookManager$HookEntry.(Ljava/lang/Runnable;I)V+10
j  
org.apache.hadoop.util.ShutdownHookManager.removeShutdownHook(Ljava/lang/Runnable;)Z+30
j  org.apache.hadoop.ozone.container.common.volume.VolumeSet.shutdown()V+22
j  org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.stop()V+43
j  
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.close()V+159
j  
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.stopDaemon()V+25
j  org.apache.hadoop.ozone.HddsDatanodeService.stop()V+101
j  org.apache.hadoop.ozone.MiniOzoneClusterImpl.restartHddsDatanode(IZ)V+15
j  org.apache.hadoop.ozone.MiniOzoneChaosCluster.shutdownNodes()V+103
j  org.apache.hadoop.ozone.MiniOzoneChaosCluster.fail()V+48
j  org.apache.hadoop.ozone.MiniOzoneChaosCluster$$Lambda$507.run()V+4
J 22514 C2 
java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object; (14 
bytes) @ 0x00010ba89fec [0x00010ba89fa0+0x4c]
J 23026 C1 java.util.concurrent.FutureTask.runAndReset()Z (128 bytes) @ 
0x00010bd067ec [0x00010bd06580+0x26c]
J 22790 C2 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V (59 
bytes) @ 0x00010bdabd8c [0x00010bdabb20+0x26c]
J 7745 C1 
java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V
 (225 bytes) @ 0x000109a1c9e4 [0x000109a1b9c0+0x1024]
J 7178 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @ 
0x0001098452c4 [0x0001098451c0+0x104]
J 6840 C1 java.lang.Thread.run()V (17 bytes) @ 0x00010977d2c4 
[0x00010977d180+0x144]
v  ~StubRoutines::call_stub
V  [libjvm.dylib+0x2ef1f6]
V  [libjvm.dylib+0x2ef99a]
V  [libjvm.dylib+0x2efb46]
V

[jira] [Created] (HDDS-1758) Add replication and key deletion tests to MiniOzoneChaosCluster

2019-07-02 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1758:
---

 Summary: Add replication and key deletion tests to 
MiniOzoneChaosCluster
 Key: HDDS-1758
 URL: https://issues.apache.org/jira/browse/HDDS-1758
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh
Assignee: Mukul Kumar Singh


This jira adds the capability to delete keys and to exercise the Replication 
Manager code in MiniOzoneChaosCluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1756) DeleteContainerCommandHandler fails with NPE

2019-07-02 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1756:
---

 Summary: DeleteContainerCommandHandler fails with NPE
 Key: HDDS-1756
 URL: https://issues.apache.org/jira/browse/HDDS-1756
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


DeleteContainerCommandHandler fails with NPE

{code}
Thread[Command processor 
thread,5,org.apache.hadoop.ozone.TestMiniChaosOzoneCluster]
java.lang.NullPointerException
at 
org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.getHandler(ContainerController.java:138)
at 
org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.deleteContainer(ContainerController.java:128)
at 
org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteContainerCommandHandler.handle(DeleteContainerCommandHandler.java:57)
at 
org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:93)
at 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$1(DatanodeStateMachine.java:432)
at java.lang.Thread.run(Thread.java:748)
{code}






[jira] [Created] (HDDS-1755) getContainerWithPipeline should log the container ID in case of failure

2019-07-02 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1755:
---

 Summary: getContainerWithPipeline should log the container ID in 
case of failure
 Key: HDDS-1755
 URL: https://issues.apache.org/jira/browse/HDDS-1755
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


getContainerWithPipeline should log the container ID in the SCM logs for easier 
debugging.






[jira] [Created] (HDDS-1754) getContainerWithPipeline fails with PipelineNotFoundException

2019-07-02 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1754:
---

 Summary: getContainerWithPipeline fails with 
PipelineNotFoundException
 Key: HDDS-1754
 URL: https://issues.apache.org/jira/browse/HDDS-1754
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


If a pipeline is closed or finalized before all the containers inside it could 
be closed, getContainerWithPipeline will still try to fetch the pipeline state 
from the PipelineManager after the pipeline has been closed, which fails with a 
PipelineNotFoundException.

{code}
2019-07-02 20:48:20,370 INFO  ipc.Server (Server.java:logException(2726)) - IPC 
Server handler 13 on 50130, call Call#17339 Retry#0 
org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocol.getContainerWithPipeline
 from 192.168.0.2:51452
org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: 
PipelineID=e1a7b16a-48d9-4194-9774-ad49ec9ad78b not found
at 
org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getPipeline(PipelineStateMap.java:132)
at 
org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.getPipeline(PipelineStateManager.java:66)
at 
org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.getPipeline(SCMPipelineManager.java:184)
at 
org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer.getContainerWithPipeline(SCMClientProtocolServer.java:244)
at 
org.apache.hadoop.ozone.protocolPB.StorageContainerLocationProtocolServerSideTranslatorPB.getContainerWithPipeline(StorageContainerLocationProtocolServerSideTranslatorPB.java:144)
at 
org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:16390)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
{code}






[jira] [Created] (HDDS-1752) ConcurrentModificationException while handling DeadNodeHandler event

2019-07-02 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1752:
---

 Summary: ConcurrentModificationException while handling 
DeadNodeHandler event
 Key: HDDS-1752
 URL: https://issues.apache.org/jira/browse/HDDS-1752
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


ConcurrentModificationException while handling DeadNodeHandler event

{code}
2019-07-02 19:29:25,190 ERROR events.SingleThreadExecutor 
(SingleThreadExecutor.java:lambda$onMessage$1(88)) - Error on execution message 
56591ec5-c9e4-416c-9a36-db0507739fe5{ip: 192.168.0.2, host: 192.16
8.0.2, networkLocation: /default-rack, certSerialId: null}
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1442)
at java.util.HashMap$KeyIterator.next(HashMap.java:1466)
at java.lang.Iterable.forEach(Iterable.java:74)
at 
java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
at 
org.apache.hadoop.hdds.scm.node.DeadNodeHandler.lambda$destroyPipelines$1(DeadNodeHandler.java:99)
at java.util.Optional.ifPresent(Optional.java:159)
at 
org.apache.hadoop.hdds.scm.node.DeadNodeHandler.destroyPipelines(DeadNodeHandler.java:98)
at 
org.apache.hadoop.hdds.scm.node.DeadNodeHandler.onMessage(DeadNodeHandler.java:78)
at 
org.apache.hadoop.hdds.scm.node.DeadNodeHandler.onMessage(DeadNodeHandler.java:44)
at 
org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}
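The trace above comes from mutating a collection while `Iterable.forEach` is walking it. A self-contained illustration (generic Java, not the actual DeadNodeHandler code) of both the failure mode and the usual fix of iterating over a snapshot copy:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Set;

public class CmeDemo {
    /** Returns true if mutating the set during forEach threw a CME. */
    static boolean mutateDuringForEach() {
        Set<String> pipelines = new HashSet<>(Arrays.asList("p1", "p2", "p3"));
        final boolean[] removedOne = {false};
        try {
            // Removing from the backing set while forEach iterates it
            // invalidates the fail-fast iterator, as in the trace above.
            pipelines.forEach(p -> {
                if (!removedOne[0]) {
                    pipelines.remove(p);
                    removedOne[0] = true;
                }
            });
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    /** Safe variant: iterate over a snapshot copy, mutate only the original. */
    static int removeOverSnapshot() {
        Set<String> pipelines = new HashSet<>(Arrays.asList("p1", "p2", "p3"));
        for (String p : new ArrayList<>(pipelines)) {
            if (p.equals("p2")) {
                pipelines.remove(p);
            }
        }
        return pipelines.size();
    }

    public static void main(String[] args) {
        System.out.println("CME thrown: " + mutateDuringForEach());
        System.out.println("size after safe removal: " + removeOverSnapshot());
    }
}
```

Copying before iterating trades a small allocation for immunity to concurrent removal; the alternative is synchronizing both the iteration and the mutation on the same lock.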






[jira] [Created] (HDDS-1753) Datanode unable to find chunk while replication data using ratis.

2019-07-02 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1753:
---

 Summary: Datanode unable to find chunk while replication data 
using ratis.
 Key: HDDS-1753
 URL: https://issues.apache.org/jira/browse/HDDS-1753
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


The leader datanode is unable to read a chunk while replicating data from the 
leader to a follower.
Note that key deletion is also in progress while the data is being replicated.

{code}
2019-07-02 19:39:22,604 INFO  impl.RaftServerImpl 
(RaftServerImpl.java:checkInconsistentAppendEntries(972)) - 
5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. 
Reply:76a3eb0f-d7cd-477b-8973-db1
014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#70:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
2019-07-02 19:39:22,605 ERROR impl.ChunkManagerImpl 
(ChunkUtils.java:readData(161)) - Unable to find the chunk file. chunk info : 
ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3
-4d64-93d8-fa2ebafee933_chunk_1, offset=0, len=2048}
2019-07-02 19:39:22,605 INFO  impl.RaftServerImpl 
(RaftServerImpl.java:checkInconsistentAppendEntries(990)) - 
5ac88709-a3a2-4c8f-91de-5e54b617f05e: Failed appendEntries as latest snapshot 
(9770) already h
as the append entries (first index: 1)
2019-07-02 19:39:22,605 INFO  impl.RaftServerImpl 
(RaftServerImpl.java:checkInconsistentAppendEntries(972)) - 
5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. 
Reply:76a3eb0f-d7cd-477b-8973-db1
014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#71:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
2019-07-02 19:39:22,605 INFO  keyvalue.KeyValueHandler 
(ContainerUtils.java:logAndReturnError(146)) - Operation: ReadChunk : Trace ID: 
4216d461a4679e17:4216d461a4679e17:0:0 : Message: Unable to find the c
hunk file. chunk info 
ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1,
 offset=0, len=2048} : Result: UNABLE_TO_FIND_CHUNK
2019-07-02 19:39:22,605 INFO  impl.RaftServerImpl 
(RaftServerImpl.java:checkInconsistentAppendEntries(990)) - 
5ac88709-a3a2-4c8f-91de-5e54b617f05e: Failed appendEntries as latest snapshot 
(9770) already h
as the append entries (first index: 2)
2019-07-02 19:39:22,606 INFO  impl.RaftServerImpl 
(RaftServerImpl.java:checkInconsistentAppendEntries(972)) - 
5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. 
Reply:76a3eb0f-d7cd-477b-8973-db1
014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#72:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
19:39:22.606 [pool-195-thread-19] ERROR DNAudit - user=null | ip=null | 
op=READ_CHUNK {blockData=conID: 3 locID: 102372189549953034 bcsId: 0} | 
ret=FAILURE
java.lang.Exception: Unable to find the chunk file. chunk info 
ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1,
 offset=0, len=2048}
at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:320)
 ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148)
 ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:346)
 ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.readStateMachineData(ContainerStateMachine.java:476)
 ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$getCachedStateMachineData$2(ContainerStateMachine.java:495)
 ~[hadoop-hdds-container-service-0.5.0-SN
APSHOT.jar:?]
at 
com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767)
 ~[guava-11.0.2.jar:?]
at 
com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
 ~[guava-11.0.2.jar:?]
at 
com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350) 
~[guava-11.0.2.jar:?]
at 
com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
 ~[guava-11.0.2.jar:?]
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228) 
~[guava-11.0.2.jar:?]
at com.google.common.cache.LocalCache.get(LocalCache.java:3965) 
~[guava-11.0.2.jar:?]
at 
com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764) 
~[guava-11.0.2.jar:?]
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.getCachedStateMachineData(ContainerStateMac

[jira] [Created] (HDDS-1751) replication of underReplicated container fails with SCMContainerPlacementRackAware policy

2019-07-02 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1751:
---

 Summary: replication of underReplicated container fails with 
SCMContainerPlacementRackAware policy
 Key: HDDS-1751
 URL: https://issues.apache.org/jira/browse/HDDS-1751
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


SCM container replication fails with

{code}
2019-07-02 18:26:41,564 WARN  container.ReplicationManager 
(ReplicationManager.java:handleUnderReplicatedContainer(501)) - Exception while 
replicating container 18.
org.apache.hadoop.hdds.scm.exceptions.SCMException: No enough datanodes to 
choose.
at 
org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseDatanodes(SCMContainerPlacementRackAware.java:100)
at 
org.apache.hadoop.hdds.scm.container.ReplicationManager.handleUnderReplicatedContainer(ReplicationManager.java:487)
at 
org.apache.hadoop.hdds.scm.container.ReplicationManager.processContainer(ReplicationManager.java:293)
at 
java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649)
at 
java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
at 
org.apache.hadoop.hdds.scm.container.ReplicationManager.run(ReplicationManager.java:205)
at java.lang.Thread.run(Thread.java:748)
{code}






[jira] [Created] (HDDS-1749) Ozone Client should randomize the list of nodes in pipeline for reads

2019-07-02 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1749:
---

 Summary: Ozone Client should randomize the list of nodes in 
pipeline for reads
 Key: HDDS-1749
 URL: https://issues.apache.org/jira/browse/HDDS-1749
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


Currently the list of nodes returned by SCM is static and is returned in the 
same order to all clients. Ideally the nodes should be sorted by network 
topology and then returned to the client.

However, even when network topology is not available, the SCM/client should 
randomly shuffle the nodes before choosing a replica to connect to.
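A minimal sketch of the proposed client-side randomization, with plain strings standing in for DatanodeDetails (the method and names here are illustrative, not the actual Ozone client API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ReadReplicaOrder {
    /**
     * Returns a copy of the pipeline's node list in random order, so read
     * load spreads across replicas instead of always hitting the first
     * node SCM returned. The seed parameter exists only to make this
     * sketch testable; a real client would use an unseeded Random.
     */
    static List<String> shuffledReadOrder(List<String> nodes, long seed) {
        List<String> copy = new ArrayList<>(nodes);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        List<String> pipeline = Arrays.asList("dn1", "dn2", "dn3");
        System.out.println(shuffledReadOrder(pipeline, 42L));
    }
}
```

Shuffling a copy keeps the SCM-provided list intact, so a later topology-aware sort can still be applied when network location data becomes available.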






[jira] [Created] (HDDS-1748) Error message for 3 way commit failure is not verbose

2019-07-02 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1748:
---

 Summary: Error message for 3 way commit failure is not verbose
 Key: HDDS-1748
 URL: https://issues.apache.org/jira/browse/HDDS-1748
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


The error message for a 3-way commit failure is not verbose; it should include 
the block ID and pipeline ID, along with node details, for debugging.

{code}
2019-07-02 09:58:12,025 WARN  scm.XceiverClientRatis 
(XceiverClientRatis.java:watchForCommit(262)) - 3 way commit failed 
java.util.concurrent.ExecutionException: 
org.apache.ratis.protocol.NotReplicatedException: Request with call Id 39482 
and log index 11562 is not yet replicated to ALL_COMMITTED
at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at 
org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:259)
at 
org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:194)
at 
org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchOnFirstIndex(CommitWatcher.java:135)
at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.watchForCommit(BlockOutputStream.java:355)
at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFullBuffer(BlockOutputStream.java:332)
at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:259)
at 
org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:129)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:211)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:193)
at 
org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
at java.io.OutputStream.write(OutputStream.java:75)
at 
org.apache.hadoop.ozone.MiniOzoneLoadGenerator.load(MiniOzoneLoadGenerator.java:103)
at 
org.apache.hadoop.ozone.MiniOzoneLoadGenerator.lambda$startIO$0(MiniOzoneLoadGenerator.java:147)
at 
java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.ratis.protocol.NotReplicatedException: Request with call 
Id 39482 and log index 11562 is not yet replicated to ALL_COMMITTED
at 
org.apache.ratis.client.impl.ClientProtoUtils.toRaftClientReply(ClientProtoUtils.java:245)
at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:254)
at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:249)
at 
org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:421)
at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:519)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at 
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
... 3 more
{code}






[jira] [Created] (HDDS-1729) Ozone Client should timeout if the put block futures are taking a long time

2019-06-27 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1729:
---

 Summary: Ozone Client should timeout if the put block futures are 
taking a long time
 Key: HDDS-1729
 URL: https://issues.apache.org/jira/browse/HDDS-1729
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh
Assignee: Mukul Kumar Singh


The Ozone client currently enqueues a put-block future into the future map. 
However, if the pipeline is slow, the client does not time out and waits 
indefinitely for the future to finish. To keep latency in the system 
reasonable, the client should time out.
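The proposed behaviour can be sketched with plain CompletableFuture: bound the wait instead of blocking indefinitely. This is a generic illustration, not the actual XceiverClient code, and the failover comment is an assumption:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PutBlockTimeout {
    /**
     * Waits for a pending put-block future, but gives up after the supplied
     * timeout instead of blocking forever on a slow pipeline.
     * Returns true if the future completed in time.
     */
    static boolean awaitPutBlock(CompletableFuture<Void> putBlockFuture,
                                 long timeoutMs) {
        try {
            putBlockFuture.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Slow pipeline: cancel so the caller can fail over elsewhere.
            putBlockFuture.cancel(true);
            return false;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // A future that never completes stands in for a stuck pipeline.
        System.out.println(awaitPutBlock(new CompletableFuture<>(), 50));
        System.out.println(
            awaitPutBlock(CompletableFuture.completedFuture(null), 50));
    }
}
```

On Java 9+ the same bound can be attached non-blockingly with `orTimeout(timeoutMs, TimeUnit.MILLISECONDS)` on the future itself.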






[jira] [Created] (HDDS-1728) Add metrics for leader's latency in ContainerStateMachine

2019-06-27 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1728:
---

 Summary: Add metrics for leader's latency in ContainerStateMachine
 Key: HDDS-1728
 URL: https://issues.apache.org/jira/browse/HDDS-1728
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


This jira proposes to add metrics for the leader's round-trip reply to the 
Ratis client. This will be done via the startTransaction API.






[jira] [Created] (HDDS-1724) Add metrics for ratis pipeline latency

2019-06-24 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1724:
---

 Summary: Add metrics for ratis pipeline latency 
 Key: HDDS-1724
 URL: https://issues.apache.org/jira/browse/HDDS-1724
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


This jira adds metrics for Ratis pipeline latency.






[jira] [Created] (HDDS-1707) SCMContainerPlacementRackAware#chooseDatanodes throws not enough datanodes when all nodes(40) are up

2019-06-19 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1707:
---

 Summary: SCMContainerPlacementRackAware#chooseDatanodes throws not 
enough datanodes when all nodes(40) are up
 Key: HDDS-1707
 URL: https://issues.apache.org/jira/browse/HDDS-1707
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Reporter: Mukul Kumar Singh


SCMContainerPlacementRackAware#chooseDatanodes is failing with the following 
error repeatedly.

{code}
2019-06-17 22:15:52,455 WARN 
org.apache.hadoop.hdds.scm.container.ReplicationManager: Exception while 
replicating container 407.
org.apache.hadoop.hdds.scm.exceptions.SCMException: No enough datanodes to 
choose.
at 
org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseDatanodes(SCMContainerPlacementRackAware.java:100)
at 
org.apache.hadoop.hdds.scm.container.ReplicationManager.handleUnderReplicatedContainer(ReplicationManager.java:487)
at 
org.apache.hadoop.hdds.scm.container.ReplicationManager.processContainer(ReplicationManager.java:293)
at 
java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649)
at 
java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
at 
org.apache.hadoop.hdds.scm.container.ReplicationManager.run(ReplicationManager.java:205)
at java.lang.Thread.run(Thread.java:745)
{code}






[jira] [Created] (HDDS-1706) Replication Manager thread running too frequently

2019-06-19 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1706:
---

 Summary: Replication Manager thread running too frequently
 Key: HDDS-1706
 URL: https://issues.apache.org/jira/browse/HDDS-1706
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


The Replication Manager is running too frequently, at a 3s interval instead of 
the intended 300s.

{code}
host: vc1337.halxg.cloudera.com, networkLocation: /default-rack, certSerialId: 
null}.
2019-06-18 03:11:51,687 INFO 
org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor 
Thread took 4 milliseconds for processing 739 containers.
.
2019-06-18 03:11:54,692 INFO 
org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor 
Thread took 4 milliseconds for processing 739 containers.
{code}

This is caused by the following configuration default:

{code}
@Config(key = "thread.interval",
type = ConfigType.TIME,
defaultValue = "3s",
tags = {SCM, OZONE},
{code}
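A minimal sketch of the implied fix, assuming only what the snippet above shows (the elided annotation attributes are left untouched): raise the default interval to 300s.

```java
// Sketch: only defaultValue changes, from "3s" to "300s".
@Config(key = "thread.interval",
    type = ConfigType.TIME,
    defaultValue = "300s",  // was "3s": the monitor ran every 3 seconds
    tags = {SCM, OZONE},
    ...)  // remaining attributes unchanged
```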






[jira] [Created] (HDDS-1697) Optimize the KeyManagerImpl#*get Apis using Rocksdb#multiGet api

2019-06-17 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1697:
---

 Summary: Optimize the KeyManagerImpl#*get Apis using 
Rocksdb#multiGet api
 Key: HDDS-1697
 URL: https://issues.apache.org/jira/browse/HDDS-1697
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Manager
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


RocksDB provides a multi-get API to fetch multiple keys in a single RocksDB 
call.

This will help optimize multiple RocksDB lookups by replacing them with a 
single multiGetAsList call, using the following API:

https://github.com/facebook/rocksdb/blob/7a8d7358bb40b13a06c2c6adc62e80295d89ed05/java/src/main/java/org/rocksdb/RocksDB.java#L2050






[jira] [Created] (HDDS-1693) Enable Partitioned-Index-Filters for OM Metadata Manager

2019-06-16 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1693:
---

 Summary: Enable Partitioned-Index-Filters for OM Metadata Manager
 Key: HDDS-1693
 URL: https://issues.apache.org/jira/browse/HDDS-1693
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Manager
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


Enable Partitioned-Index-Filters for the OM Metadata Manager; this will help 
cache metadata blocks effectively as the size of the objects increases.

https://github.com/facebook/rocksdb/wiki/Partitioned-Index-Filters#how-to-use-it






[jira] [Created] (HDDS-1692) RDBTable#iterator should disabled caching of the keys during iterator

2019-06-16 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1692:
---

 Summary: RDBTable#iterator should disabled caching of the keys 
during iterator
 Key: HDDS-1692
 URL: https://issues.apache.org/jira/browse/HDDS-1692
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Manager
Reporter: Mukul Kumar Singh


Iterators normally do a bulk load of the keys; this causes thrashing of the 
cache entries for the actual keys in the DB.

This option is documented here:
https://github.com/facebook/rocksdb/wiki/Basic-Operations#cache






[jira] [Created] (HDDS-1691) RDBTable#isExist should use Rocksdb#keyMayExist

2019-06-16 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1691:
---

 Summary: RDBTable#isExist should use Rocksdb#keyMayExist
 Key: HDDS-1691
 URL: https://issues.apache.org/jira/browse/HDDS-1691
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Mukul Kumar Singh


RDBTable#isExist can use RocksDB#keyMayExist; this avoids the cost of reading 
the value for the key.

Please refer to
https://github.com/facebook/rocksdb/blob/7a8d7358bb40b13a06c2c6adc62e80295d89ed05/java/src/main/java/org/rocksdb/RocksDB.java#L2184






[jira] [Created] (HDDS-1681) TestNodeReportHandler failing because of NPE

2019-06-13 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1681:
---

 Summary: TestNodeReportHandler failing because of NPE
 Key: HDDS-1681
 URL: https://issues.apache.org/jira/browse/HDDS-1681
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Mukul Kumar Singh



{code}
[INFO] Running org.apache.hadoop.hdds.scm.node.TestNodeReportHandler
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.469 s 
<<< FAILURE! - in org.apache.hadoop.hdds.scm.node.TestNodeReportHandler
[ERROR] testNodeReport(org.apache.hadoop.hdds.scm.node.TestNodeReportHandler)  
Time elapsed: 0.31 s  <<< ERROR!
java.lang.NullPointerException
at 
org.apache.hadoop.hdds.scm.node.SCMNodeManager.(SCMNodeManager.java:122)
at 
org.apache.hadoop.hdds.scm.node.TestNodeReportHandler.resetEventCollector(TestNodeReportHandler.java:53)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}






[jira] [Created] (HDDS-1679) TestBCSID failing because of dangling db references

2019-06-13 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1679:
---

 Summary: TestBCSID failing because of dangling db references
 Key: HDDS-1679
 URL: https://issues.apache.org/jira/browse/HDDS-1679
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Mukul Kumar Singh
Assignee: Mukul Kumar Singh


TestBCSID failing because of dangling db references.






[jira] [Created] (HDDS-1671) Multiple unit test fails because of assertion while validating Acls

2019-06-11 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1671:
---

 Summary: Multiple unit test fails because of assertion while 
validating Acls
 Key: HDDS-1671
 URL: https://issues.apache.org/jira/browse/HDDS-1671
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Security
Reporter: Mukul Kumar Singh


There are multiple unit test failures because of an assertion in validateAcls:
https://builds.apache.org/job/hadoop-multibranch/job/PR-846/7/testReport/







[jira] [Created] (HDDS-1658) RaftRetryFailureException & AlreadyClosedException should not exclude pipeline from client

2019-06-06 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1658:
---

 Summary: RaftRetryFailureException & AlreadyClosedException should 
not exclude pipeline from client
 Key: HDDS-1658
 URL: https://issues.apache.org/jira/browse/HDDS-1658
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


This problem can be seen at 
https://builds.apache.org/job/hadoop-multibranch/job/PR-846/6/testReport/org.apache.hadoop.ozone.client.rpc/TestBCSID/testBCSID/.

As seen here, after a RaftRetryFailureException the client adds the pipeline to 
its exclude list, and that leads SCM to create a new pipeline. Creating a new 
pipeline might not be possible in a test cluster because of the limited number 
of nodes.

{code}
2019-06-06 22:29:23,311 WARN  KeyOutputStream - Encountered exception 
java.io.IOException: Unexpected Storage Container Exception: 
java.util.concurrent.CompletionException: 
java.util.concurrent.CompletionException: 
org.apache.ratis.protocol.RaftRetryFailureException: Failed 
RaftClientRequest:client-AD0A1CB44582->73f367e6-7f91-4409-b4d3-b831e0bfb585@group-31FAD62742D6,
 cid=1, seq=1*, RW, 
org.apache.hadoop.hdds.scm.XceiverClientRatis$$Lambda$313/142004@60d08041 
for 180 attempts with RetryLimited(maxAttempts=180, sleepTime=1000ms) on the 
pipeline Pipeline[ Id: 27d23af1-7180-42f5-b3c7-31fad62742d6, Nodes: 
73f367e6-7f91-4409-b4d3-b831e0bfb585{ip: 172.17.0.2, host: 5e847226af57, 
networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:ONE, 
State:OPEN]. The last committed block length is 0, uncommitted data length is 5 
retry count 0
2019-06-06 22:29:23,343 WARN  BlockManagerImpl - Pipeline creation failed for 
type:RATIS factor:ONE. Retrying get pipelines call once.
org.apache.hadoop.hdds.scm.pipeline.InsufficientDatanodesException: Cannot 
create pipeline of factor 1 using 0 nodes.
at 
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.create(RatisPipelineProvider.java:151)
at 
org.apache.hadoop.hdds.scm.pipeline.PipelineFactory.create(PipelineFactory.java:57)
at 
org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.createPipeline(SCMPipelineManager.java:149)
at 
org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:190)
at 
org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:172)
at 
org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:82)
at 
{code}
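The proposed policy can be sketched as follows. This is an illustrative sketch, not the actual Ozone client code; the exception classes below are local stand-ins for the real org.apache.ratis types:

```java
// Sketch of the proposed exclude-list policy: retry exhaustion and an
// already-closed client do not prove the pipeline itself is bad, so they
// should not push the pipeline onto the client's exclude list.
public final class ExcludeListPolicy {
    // Stand-ins for the Ratis exception types (the real classes live in org.apache.ratis).
    static class RaftRetryFailureException extends RuntimeException { }
    static class AlreadyClosedException extends RuntimeException { }

    /** Returns true only for failures that justify excluding the pipeline. */
    static boolean shouldExcludePipeline(Throwable t) {
        return !(t instanceof RaftRetryFailureException)
            && !(t instanceof AlreadyClosedException);
    }

    public static void main(String[] args) {
        System.out.println(shouldExcludePipeline(new RaftRetryFailureException())); // false
        System.out.println(shouldExcludePipeline(new RuntimeException()));          // true
    }
}
```

With such a policy the test above would keep reusing the single available pipeline instead of asking SCM for one it cannot create.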






[jira] [Resolved] (HDDS-1614) Container Missing in the datanode after restart

2019-05-30 Thread Mukul Kumar Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDDS-1614.
-
Resolution: Duplicate

> Container Missing in the datanode after restart
> ---
>
> Key: HDDS-1614
> URL: https://issues.apache.org/jira/browse/HDDS-1614
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>        Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: MiniOzoneChaosCluster
>
> Container missing on the datanode after a restart.
> {code}
> 08:10:44.308 [pool-2131-thread-1] ERROR DNAudit - user=null | ip=null | 
> op=WRITE_CHUNK {blockData=conID: 34 locID: 102182684750055212 bcsId: 6198} | 
> ret=FAILURE
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  ContainerID 34 has been lost and and cannot be recreated on this DataNode
> at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:207)
>  ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:149)
>  ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:347)
>  ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:354)
>  ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$0(ContainerStateMachine.java:385)
>  ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>  [?:1.8.0_171]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [?:1.8.0_171]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [?:1.8.0_171]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
> {code}






[jira] [Created] (HDDS-1616) ManagedChannel references are being leaked while removing RaftGroup

2019-05-30 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1616:
---

 Summary: ManagedChannel references are being leaked while 
removing RaftGroup
 Key: HDDS-1616
 URL: https://issues.apache.org/jira/browse/HDDS-1616
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


ManagedChannel references are being leaked while removing RaftGroup:

{code}
May 30, 2019 8:12:20 AM 
org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference
 cleanQueue
SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=1805, target=192.168.0.3:49867} 
was not shutdown properly!!! ~*~*~*
Make sure to call shutdown()/shutdownNow() and wait until 
awaitTermination() returns true.
java.lang.RuntimeException: ManagedChannel allocation site
at 
org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.&lt;init&gt;(ManagedChannelOrphanWrapper.java:103)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.&lt;init&gt;(ManagedChannelOrphanWrapper.java:53)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.&lt;init&gt;(ManagedChannelOrphanWrapper.java:44)
at 
org.apache.ratis.thirdparty.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:411)
at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient.&lt;init&gt;(GrpcClientProtocolClient.java:118)
at 
org.apache.ratis.grpc.client.GrpcClientRpc.lambda$new$0(GrpcClientRpc.java:55)
at 
org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$getProxy$0(PeerProxyMap.java:61)
at 
org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
at 
org.apache.ratis.util.PeerProxyMap$PeerAndProxy.getProxy(PeerProxyMap.java:60)
at org.apache.ratis.util.PeerProxyMap.getProxy(PeerProxyMap.java:107)
at 
org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:91)
at 
org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:409)
at 
org.apache.ratis.client.impl.RaftClientImpl.groupRemove(RaftClientImpl.java:281)
at 
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils.destroyPipeline(RatisPipelineUtils.java:97)
at 
org.apache.hadoop.hdds.scm.pipeline.PipelineReportHandler.processPipelineReport(PipelineReportHandler.java:100)
at 
org.apache.hadoop.hdds.scm.pipeline.PipelineReportHandler.onMessage(PipelineReportHandler.java:80)
at 
org.apache.hadoop.hdds.scm.pipeline.PipelineReportHandler.onMessage(PipelineReportHandler.java:44)
at 
org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}
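The shutdown discipline the warning asks for (shutdown()/shutdownNow() followed by awaitTermination()) can be sketched as below, using a stdlib ExecutorService as a stand-in resource, since the real ManagedChannel lives in the shaded gRPC dependency:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of the cleanup the gRPC orphan detector expects,
// applied here to an ExecutorService as a stand-in for ManagedChannel.
public final class ChannelLifecycle {
    static boolean shutdownGracefully(ExecutorService resource) throws InterruptedException {
        resource.shutdown();                                   // stop accepting new work
        if (!resource.awaitTermination(5, TimeUnit.SECONDS)) { // wait for in-flight work
            resource.shutdownNow();                            // force-release on timeout
        }
        return resource.isTerminated();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(shutdownGracefully(Executors.newFixedThreadPool(1))); // true
    }
}
```

In the trace above the fix would be to apply the equivalent channel shutdown when the RaftClient for the removed group is closed.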






[jira] [Created] (HDDS-1615) ManagedChannel references are being leaked in ReplicationSupervisor.java

2019-05-30 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1615:
---

 Summary: ManagedChannel references are being leaked in 
ReplicationSupervisor.java
 Key: HDDS-1615
 URL: https://issues.apache.org/jira/browse/HDDS-1615
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


ManagedChannel references are being leaked in ReplicationSupervisor.java

{code}
May 30, 2019 8:10:56 AM 
org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference
 cleanQueue
SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=1495, target=192.168.0.3:49868} 
was not shutdown properly!!! ~*~*~*
Make sure to call shutdown()/shutdownNow() and wait until 
awaitTermination() returns true.
java.lang.RuntimeException: ManagedChannel allocation site
at 
org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.&lt;init&gt;(ManagedChannelOrphanWrapper.java:103)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.&lt;init&gt;(ManagedChannelOrphanWrapper.java:53)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.&lt;init&gt;(ManagedChannelOrphanWrapper.java:44)
at 
org.apache.ratis.thirdparty.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:411)
at 
org.apache.hadoop.ozone.container.replication.GrpcReplicationClient.&lt;init&gt;(GrpcReplicationClient.java:65)
at 
org.apache.hadoop.ozone.container.replication.SimpleContainerDownloader.getContainerDataFromReplicas(SimpleContainerDownloader.java:87)
at 
org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator.replicate(DownloadAndImportReplicator.java:118)
at 
org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:115)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}






[jira] [Created] (HDDS-1614) Container Missing in the datanode after restart

2019-05-30 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1614:
---

 Summary: Container Missing in the datanode after restart
 Key: HDDS-1614
 URL: https://issues.apache.org/jira/browse/HDDS-1614
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


Container missing on the datanode after a restart.

{code}
08:10:44.308 [pool-2131-thread-1] ERROR DNAudit - user=null | ip=null | 
op=WRITE_CHUNK {blockData=conID: 34 locID: 102182684750055212 bcsId: 6198} | 
ret=FAILURE
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: 
ContainerID 34 has been lost and and cannot be recreated on this DataNode
at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:207)
 ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:149)
 ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:347)
 ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:354)
 ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$0(ContainerStateMachine.java:385)
 ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
 [?:1.8.0_171]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_171]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_171]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
{code}






[jira] [Created] (HDDS-1613) Read key fails with "Unable to find the block"

2019-05-30 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1613:
---

 Summary: Read key fails with "Unable to find the block"
 Key: HDDS-1613
 URL: https://issues.apache.org/jira/browse/HDDS-1613
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


Block read fails with 

{code}
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: 
Unable to find the block with bcsID 11777 .Container 68 bcsId is 0.
at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:573)
at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:120)
at 
org.apache.hadoop.ozone.client.io.KeyInputStream$ChunkInputStreamEntry.initializeBlockInputStream(KeyInputStream.java:295)
at 
org.apache.hadoop.ozone.client.io.KeyInputStream$ChunkInputStreamEntry.getStream(KeyInputStream.java:265)
at 
org.apache.hadoop.ozone.client.io.KeyInputStream$ChunkInputStreamEntry.access$000(KeyInputStream.java:229)
at 
org.apache.hadoop.ozone.client.io.KeyInputStream.getStreamEntry(KeyInputStream.java:107)
at 
org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:140)
at 
org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47)
at java.io.InputStream.read(InputStream.java:101)
at 
org.apache.hadoop.ozone.MiniOzoneLoadGenerator.load(MiniOzoneLoadGenerator.java:114)
at 
org.apache.hadoop.ozone.MiniOzoneLoadGenerator.lambda$startIO$0(MiniOzoneLoadGenerator.java:147)
at 
java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}


Looking at the 3 datanodes, the container replicas are at bcsIds 11748, 11748 and 0.

{code}
2019-05-30 08:28:05,348 INFO  keyvalue.KeyValueHandler 
(ContainerUtils.java:logAndReturnError(146)) - Operation: GetBlock : Trace ID: 
93a2a596076d2ee4:93a2a596076d2ee4:0:0 : Message: Unable to find the block with 
bcsID 11777 .Container 68 bcsId is 11748. : Result: UNKNOWN_BCSID


2019-05-30 08:28:05,363 INFO  keyvalue.KeyValueHandler 
(ContainerUtils.java:logAndReturnError(146)) - Operation: GetBlock : Trace ID: 
93a2a596076d2ee4:93a2a596076d2ee4:0:0 : Message: Unable to find the block with 
bcsID 11777 .Container 68 bcsId is 11748. : Result: UNKNOWN_BCSID


2019-05-30 08:28:05,377 INFO  keyvalue.KeyValueHandler 
(ContainerUtils.java:logAndReturnError(146)) - Operation: GetBlock : Trace ID: 
93a2a596076d2ee4:93a2a596076d2ee4:0:0 : Message: Unable to find the block with 
bcsID 11777 .Container 68 bcsId is 0. : Result: UNKNOWN_BCSID
{code}
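The server-side check behind UNKNOWN_BCSID can be sketched as a simple comparison; this is an illustrative reconstruction, not the actual KeyValueHandler code:

```java
// Sketch: a replica can serve GetBlock only if it has applied transactions
// at least up to the requested block commit sequence id (bcsId).
public final class BcsIdCheck {
    static boolean isReadable(long requestedBcsId, long replicaBcsId) {
        return requestedBcsId <= replicaBcsId;
    }

    public static void main(String[] args) {
        // The values from the logs above: the key was committed at bcsId 11777,
        // but no replica has applied past 11748, so every read fails.
        System.out.println(isReadable(11777, 11748)); // false -> UNKNOWN_BCSID
        System.out.println(isReadable(11777, 0));     // false -> UNKNOWN_BCSID
        System.out.println(isReadable(11748, 11748)); // true
    }
}
```

Since all three replicas lag the requested bcsId, the failure points at the commit path rather than the read path.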






[jira] [Created] (HDDS-1593) Improve logging for failures during pipeline creation and usage.

2019-05-27 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1593:
---

 Summary: Improve logging for failures during pipeline creation and 
usage.
 Key: HDDS-1593
 URL: https://issues.apache.org/jira/browse/HDDS-1593
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Affects Versions: 0.3.0
Reporter: Mukul Kumar Singh


When pipeline creation fails, the pipeline ID along with all the nodes in 
the pipeline should be printed. The node for which pipeline creation 
failed should be printed as well.






[jira] [Created] (HDDS-1582) Fix BindException due to address already in use in unit tests

2019-05-22 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1582:
---

 Summary: Fix BindException due to address already in use in unit 
tests
 Key: HDDS-1582
 URL: https://issues.apache.org/jira/browse/HDDS-1582
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Affects Versions: 0.3.0
Reporter: Mukul Kumar Singh
Assignee: Mukul Kumar Singh


This jira fixes the issues seen in HDDS-1384 & HDDS-1282, where unit tests 
time out because of BindException.

The fix is to use Socket.bind in place of ServerSocket. The key difference is 
that a ServerSocket listens and accepts after binding, which leaves the socket 
in the TIME_WAIT state after close. Please refer to 
https://docs.oracle.com/javase/tutorial/networking/sockets/definition.html
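A minimal sketch of the approach, assuming the goal is to probe for a free port without leaving it in TIME_WAIT (class and method names are illustrative):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch: bind a plain client Socket (never listen/accept) to port 0 so the
// kernel picks a free port; closing an unconnected socket leaves no TIME_WAIT
// entry, unlike closing a ServerSocket that has accepted connections.
public final class FreePortFinder {
    static int getFreePort() throws IOException {
        try (Socket s = new Socket()) {
            s.setReuseAddress(true);
            s.bind(new InetSocketAddress("localhost", 0)); // port 0 = any free port
            return s.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(getFreePort() > 0); // true
    }
}
```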









[jira] [Resolved] (HDDS-1112) Add a ozoneFilesystem related api's to OzoneManager to reduce redundant lookups

2019-05-22 Thread Mukul Kumar Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDDS-1112.
-
Resolution: Fixed

Resolving this as all the subtasks have been completed.

> Add a ozoneFilesystem related api's to OzoneManager to reduce redundant 
> lookups
> ---
>
> Key: HDDS-1112
> URL: https://issues.apache.org/jira/browse/HDDS-1112
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Affects Versions: 0.4.0
>    Reporter: Mukul Kumar Singh
>    Assignee: Mukul Kumar Singh
>Priority: Critical
>
> With the current OzoneFileSystem design, most of the lookups during create 
> happen via the getFileStatus api, which in turn does a getKey or a listKey 
> for the keys in the Ozone bucket. 
> In most cases, the file does not exist before creation, so these lookups 
> amount to wasted time. This jira proposes to optimize the "create" and 
> "getFileStatus" apis in OzoneFileSystem by introducing OzoneFileSystem 
> friendly apis in OM.
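The idea can be sketched with a toy in-memory OM: a single call both checks existence and creates the key, replacing the getFileStatus + create round trips. Names such as createIfAbsent are illustrative, not the actual OM API:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the optimization: one OM-side call that creates the key and
// reports whether it already existed, instead of a lookup followed by a create.
public final class OmCreateSketch {
    private final Map<String, byte[]> keys = new HashMap<>();

    /** Returns true if the key was created, false if it already existed. */
    public boolean createIfAbsent(String key) {
        return keys.putIfAbsent(key, new byte[0]) == null; // check + create in one call
    }

    public static void main(String[] args) {
        OmCreateSketch om = new OmCreateSketch();
        System.out.println(om.createIfAbsent("/vol/bucket/a")); // true: created
        System.out.println(om.createIfAbsent("/vol/bucket/a")); // false: existed
    }
}
```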






[jira] [Created] (HDDS-1562) Add Chaos tests for Replication Manager

2019-05-20 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1562:
---

 Summary: Add Chaos tests for Replication Manager
 Key: HDDS-1562
 URL: https://issues.apache.org/jira/browse/HDDS-1562
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Affects Versions: 0.3.0
Reporter: Mukul Kumar Singh


This jira proposes to add a new Chaos test variant for the Replication Manager 
to identify possible bugs in it.






[jira] [Created] (HDDS-1561) Add a new replica state to identify containers which have not been able to apply all the transactions.

2019-05-20 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1561:
---

 Summary: Add a new replica state to identify containers which have 
not been able to apply all the transactions.
 Key: HDDS-1561
 URL: https://issues.apache.org/jira/browse/HDDS-1561
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode, SCM
Affects Versions: 0.3.0
Reporter: Mukul Kumar Singh
Assignee: Nanda kumar


Right now, if a pipeline is destroyed by SCM, all the containers on the pipeline 
are marked as quasi closed. SCM, while processing these container reports, 
marks these containers as closed once a majority of the nodes are available.

This is however not a sufficient condition in cases where the raft log 
directory is missing or corrupted, because such containers will not have 
applied all the transactions. To solve this problem, a new container replica 
state needs to be added to differentiate these from quasi closed containers.


cc [~jnp], [~shashikant], [~sdeka], [~nandakumar131]
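One possible shape of the proposal (state and method names are illustrative, not the actual HDDS code):

```java
// Sketch: a replica state set extended with a state for replicas that have
// not applied all transactions (e.g. lost/corrupt raft log). Such a replica
// must not count toward the quorum that lets SCM close the container.
public final class ReplicaStateSketch {
    enum State { OPEN, QUASI_CLOSED, UNHEALTHY, CLOSED } // UNHEALTHY is the proposed addition

    static boolean countsTowardClose(State s) {
        return s == State.QUASI_CLOSED || s == State.CLOSED;
    }

    public static void main(String[] args) {
        System.out.println(countsTowardClose(State.QUASI_CLOSED)); // true
        System.out.println(countsTowardClose(State.UNHEALTHY));    // false
    }
}
```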






[jira] [Created] (HDDS-1560) RejectedExecutionException on datanode after shutting it down

2019-05-19 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1560:
---

 Summary: RejectedExecutionException on datanode after shutting it 
down
 Key: HDDS-1560
 URL: https://issues.apache.org/jira/browse/HDDS-1560
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.3.0
Reporter: Mukul Kumar Singh


RejectedExecutionException on datanode after shutting it down

{code}
2019-05-20 00:38:52,757 ERROR statemachine.DatanodeStateMachine 
(DatanodeStateMachine.java:start(199)) - Unable to finish the execution.
java.util.concurrent.RejectedExecutionException: Task 
java.util.concurrent.ExecutorCompletionService$QueueingFuture@74b926e9 rejected 
from org.apache.hadoop.util.concurrent.HadoopThreadPoolExecutor@15e1f6
9d[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed 
tasks = 90]
at 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
at 
java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:181)
at 
org.apache.hadoop.ozone.container.common.states.datanode.RunningDatanodeState.execute(RunningDatanodeState.java:90)
at 
org.apache.hadoop.ozone.container.common.statemachine.StateContext.execute(StateContext.java:375)
at 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.start(DatanodeStateMachine.java:186)
at 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$startDaemon$0(DatanodeStateMachine.java:349)
at java.lang.Thread.run(Thread.java:748)

{code}
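A guard of the kind that would avoid this can be sketched as follows; this is illustrative, not the actual DatanodeStateMachine code:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Sketch: tolerate the race between the state-machine loop and shutdown by
// checking executor state and treating a late rejection as a clean stop.
public final class SafeSubmit {
    static boolean trySubmit(ExecutorService executor, Runnable task) {
        if (executor.isShutdown()) {
            return false;                 // already stopping; drop the task quietly
        }
        try {
            executor.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            return false;                 // lost the race with shutdown; not an error
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.shutdownNow();
        System.out.println(trySubmit(pool, () -> {})); // false
    }
}
```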





