[jira] [Resolved] (HDFS-15610) Reduce datanode upgrade/hardlink thread

2020-10-08 Thread Lokesh Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain resolved HDFS-15610.

Fix Version/s: 3.4.0
   Resolution: Fixed

> Reduce datanode upgrade/hardlink thread
> ---
>
> Key: HDFS-15610
> URL: https://issues.apache.org/jira/browse/HDFS-15610
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0, 3.1.4
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> There is kernel overhead on datanode upgrade. For a datanode with millions of 
> blocks and 10+ disks, the block-layout migration becomes very expensive 
> during its hardlink operation. Slowness is observed when running with a large 
> number of hardlink threads (dfs.datanode.block.id.layout.upgrade.threads, 
> default is 12 threads per disk, i.e. 10*12=120 threads for 10 disks), and the 
> migration runs for 2+ hours.
> Small test:
> RHEL7, 32 cores, 20 GB RAM, 8 GB DN heap
> ||dfs.datanode.block.id.layout.upgrade.threads||Blocks||Disks||Time taken||
> |12|3.3 Million|1|2 minutes and 59 seconds|
> |6|3.3 Million|1|2 minutes and 35 seconds|
> |3|3.3 Million|1|2 minutes and 51 seconds|
> Tried the same test twice and the results are ~95% consistent (only a few 
> seconds of difference on each iteration). Using 6 threads is faster than 12 
> threads because of the per-thread overhead.
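A minimal sketch of tuning this down, shown as a Java Configuration override for brevity (in practice the property goes in hdfs-site.xml; the value 6 is just the fastest setting from this particular test, not a general recommendation):

{code:java}
// Fewer hardlink threads per disk during the block-layout upgrade.
// 6 threads was fastest in the small test above; tune for your hardware.
Configuration conf = new HdfsConfiguration();
conf.setInt("dfs.datanode.block.id.layout.upgrade.threads", 6);
{code}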






[jira] [Created] (HDFS-15401) Namenode should log warning if concat/append finds file with large number of blocks

2020-06-09 Thread Lokesh Jain (Jira)
Lokesh Jain created HDFS-15401:
--

 Summary: Namenode should log warning if concat/append finds file 
with large number of blocks
 Key: HDFS-15401
 URL: https://issues.apache.org/jira/browse/HDFS-15401
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lokesh Jain


Namenode should log a warning if concat/append finds that a file has more than 
the configured number of blocks. 

This is based on [~weichiu]'s comment 
https://issues.apache.org/jira/browse/HDFS-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128732#comment-17128732.
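A minimal sketch of the proposed check; the config key and the helper below are assumptions for illustration, not settings defined by this Jira:

{code:java}
// Hypothetical helper invoked from concat/append handling.
// "dfs.namenode.file.blocks.warn.threshold" is an illustrative key name.
static void warnIfManyBlocks(Logger log, String op, String src,
    int blockCount, int threshold) {
  if (blockCount > threshold) {
    log.warn("{}: file {} has {} blocks, exceeding the configured threshold {}",
        op, src, blockCount, threshold);
  }
}
{code}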






[jira] [Created] (HDFS-15400) fsck should log a warning if it finds a file with large number of blocks

2020-06-09 Thread Lokesh Jain (Jira)
Lokesh Jain created HDFS-15400:
--

 Summary: fsck should log a warning if it finds a file with large 
number of blocks
 Key: HDFS-15400
 URL: https://issues.apache.org/jira/browse/HDFS-15400
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lokesh Jain


fsck should log a warning if it finds that a file has more than the configured 
number of blocks.

This is based on [~weichiu]'s comment 
https://issues.apache.org/jira/browse/HDFS-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128732#comment-17128732.






[jira] [Created] (HDFS-15392) DistributedFileSystem#concat api can create large number of small blocks

2020-06-05 Thread Lokesh Jain (Jira)
Lokesh Jain created HDFS-15392:
--

 Summary: DistributedFileSystem#concat api can create large number 
of small blocks
 Key: HDFS-15392
 URL: https://issues.apache.org/jira/browse/HDFS-15392
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lokesh Jain


DistributedFileSystem#concat moves blocks from the source files to the target. If 
the api is repeatedly used on small files, it can create a large number of small 
blocks in the target file. This Jira aims to optimize the api to avoid the issue 
of small blocks.
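For illustration, a sketch of the pattern that triggers this; the paths and loop are hypothetical, but DistributedFileSystem#concat does move the source's blocks into the target without merging them:

{code:java}
// Each small source file contributes its own small block(s) to the target,
// so N repeated concat calls leave N small blocks in the target file.
DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
Path target = new Path("/data/target");
for (Path smallFile : smallFiles) {
  dfs.concat(target, new Path[] { smallFile });
}
{code}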






[jira] [Resolved] (HDFS-15201) SnapshotCounter hits MaxSnapshotID limit

2020-03-24 Thread Lokesh Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain resolved HDFS-15201.

Resolution: Fixed

> SnapshotCounter hits MaxSnapshotID limit
> 
>
> Key: HDFS-15201
> URL: https://issues.apache.org/jira/browse/HDFS-15201
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Major
>
> Users reported that they are unable to take HDFS snapshots because their 
> snapshotCounter hit the MaxSnapshotID limit, which is 16777215.
> {code:java}
> SnapshotManager.java
> private static final int SNAPSHOT_ID_BIT_WIDTH = 24;
> /**
>  * Returns the maximum allowable snapshot ID based on the bit width of the
>  * snapshot ID.
>  *
>  * @return maximum allowable snapshot ID.
>  */
>  public int getMaxSnapshotID() {
>  return ((1 << SNAPSHOT_ID_BIT_WIDTH) - 1);
> }
> {code}
>  
> I think SNAPSHOT_ID_BIT_WIDTH is too low. It may be a good idea to increase 
> SNAPSHOT_ID_BIT_WIDTH to 31, to align with our CURRENT_STATE_ID limit 
> (Integer.MAX_VALUE - 1).
>  
> {code:java}
> /**
>  * This id is used to indicate the current state (vs. snapshots)
>  */
> public static final int CURRENT_STATE_ID = Integer.MAX_VALUE - 1;
> {code}
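A quick check of the two widths, assuming the same getMaxSnapshotID formula:

{code:java}
// Current 24-bit width: (1 << 24) - 1 = 16777215 snapshot IDs.
// Proposed 31-bit width: (1 << 31) - 1 overflows in 32-bit int arithmetic to
// Integer.MAX_VALUE = 2147483647, right next to
// CURRENT_STATE_ID = Integer.MAX_VALUE - 1.
System.out.println((1 << 24) - 1);  // 16777215
System.out.println((1 << 31) - 1);  // 2147483647
{code}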






[jira] [Resolved] (HDDS-2347) XCeiverClientGrpc's parallel use leads to NPE

2019-10-30 Thread Lokesh Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain resolved HDDS-2347.
---
Fix Version/s: 0.5.0
   Resolution: Fixed

> XCeiverClientGrpc's parallel use leads to NPE
> -
>
> Key: HDDS-2347
> URL: https://issues.apache.org/jira/browse/HDDS-2347
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.5.0
>
> Attachments: changes.diff, logs.txt
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This issue came up when testing Hive with ORC tables on the Ozone storage 
> backend; so far I could not reproduce it locally within a JUnit test.
> I am attaching a diff file that shows the logging I added in 
> XceiverClientGrpc and in KeyInputStream to get the results that led me to 
> the following understanding of the scenario:
> - Hive starts a couple of threads to work on the table data during query 
> execution
> - There is one RPCClient that is being used by these threads
> - The threads are opening different streams to read from the same key in Ozone
> - The InputStreams internally are using the same XceiverClientGrpc
> - XceiverClientGrpc throws the following NPE intermittently:
> {code}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandAsync(XceiverClientGrpc.java:398)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:295)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:259)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:242)
> at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:118)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:169)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:118)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:224)
> at 
> org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:173)
> at 
> org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:52)
> at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
> at 
> org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121)
> at 
> org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112)
> at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:555)
> at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:370)
> at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:61)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:105)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:1708)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1596)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2900(OrcInputFormat.java:1383)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1568)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1565)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1565)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1383)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> {code}
> I have two proposals to fix this issue: one is the easy answer of adding 
> synchronization to the XceiverClientGrpc code; the other is a bit more 
> complicated, let me explain below.
> Naively I would assume that when I get a client SPI instance from 
> XceiverClientManager, that instance is ready to use. In fact it is not: the 
> client only becomes essentially ready when the user of the SPI instance 
> sends the first request. Adding synchronization to this code is the easy 
> solution, but my pragmatic half screams for a better solution that ensures 
> that 
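A minimal sketch of the first option, synchronizing the lazy connection setup; the method and field names here are illustrative, not the actual XceiverClientGrpc members:

{code:java}
// Guard lazy channel setup so parallel readers cannot race a half-initialized
// client into sendCommandAsync and hit the NPE above.
private synchronized void ensureConnected(DatanodeDetails dn) throws IOException {
  if (!isConnected(dn)) {   // hypothetical check for an established channel
    connectToDatanode(dn);  // hypothetical one-time lazy connect per datanode
  }
}
{code}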

[jira] [Updated] (HDDS-2342) ContainerStateMachine$chunkExecutor threads hold onto native memory

2019-10-21 Thread Lokesh Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-2342:
--
Description: 
In a heap dump, many threads in ContainerStateMachine$chunkExecutor hold onto 
native memory in the ThreadLocal map. Every such thread holds onto a chunk worth 
of DirectByteBuffer. Since these threads are involved in write and read chunk 
operations, the JVM allocates a chunk (16MB) worth of DirectByteBuffer in the 
ThreadLocalMap for every thread involved in IO. Also, the native memory is not 
GC'ed as long as the thread is alive.

It would be better to reduce the default number of chunk executor threads and 
keep them in proportion to the number of disks on the datanode. We should also 
use DirectByteBuffers for the IO on the datanode. Currently we allocate a 
HeapByteBuffer, which needs to be backed by a DirectByteBuffer. If we can use a 
DirectByteBuffer we can avoid a buffer copy.

  was:
In a heap dump, many threads in ContainerStateMachine$chunkExecutor hold onto 
native memory in the ThreadLocal map. Every such thread holds onto a chunk worth 
of DirectByteBuffer. Since these threads are involved in write and read chunk 
operations, the JVM allocates a chunk (16MB) worth of DirectByteBuffer in the 
ThreadLocalMap for every thread involved in IO. Also, the native memory is not 
GC'ed as long as the thread is alive.

It would be better to reduce the default number of chunk executor threads and 
keep them in proportion to the number of disks on the datanode.


> ContainerStateMachine$chunkExecutor threads hold onto native memory
> ---
>
> Key: HDDS-2342
> URL: https://issues.apache.org/jira/browse/HDDS-2342
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>
> In a heap dump, many threads in ContainerStateMachine$chunkExecutor hold onto 
> native memory in the ThreadLocal map. Every such thread holds onto a chunk 
> worth of DirectByteBuffer. Since these threads are involved in write and read 
> chunk operations, the JVM allocates a chunk (16MB) worth of DirectByteBuffer in 
> the ThreadLocalMap for every thread involved in IO. Also, the native memory is 
> not GC'ed as long as the thread is alive.
> It would be better to reduce the default number of chunk executor threads and 
> keep them in proportion to the number of disks on the datanode. We should also 
> use DirectByteBuffers for the IO on the datanode. Currently we allocate a 
> HeapByteBuffer, which needs to be backed by a DirectByteBuffer. If we can use a 
> DirectByteBuffer we can avoid a buffer copy.
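A minimal illustration of the pattern at issue, with illustrative names: a per-thread direct buffer means every IO thread pins chunk-sized native memory until the thread dies.

{code:java}
// Each IO thread lazily gets its own 16MB direct buffer, cached in its
// ThreadLocalMap; the native allocation is only released when the thread dies.
private static final ThreadLocal<ByteBuffer> CHUNK_BUFFER =
    ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(16 * 1024 * 1024));
{code}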






[jira] [Created] (HDDS-2342) ContainerStateMachine$chunkExecutor threads hold onto native memory

2019-10-21 Thread Lokesh Jain (Jira)
Lokesh Jain created HDDS-2342:
-

 Summary: ContainerStateMachine$chunkExecutor threads hold onto 
native memory
 Key: HDDS-2342
 URL: https://issues.apache.org/jira/browse/HDDS-2342
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Lokesh Jain
Assignee: Lokesh Jain


In a heap dump, many threads in ContainerStateMachine$chunkExecutor hold onto 
native memory in the ThreadLocal map. Every such thread holds onto a chunk worth 
of DirectByteBuffer. Since these threads are involved in write and read chunk 
operations, the JVM allocates a chunk (16MB) worth of DirectByteBuffer in the 
ThreadLocalMap for every thread involved in IO. Also, the native memory is not 
GC'ed as long as the thread is alive.

It would be better to reduce the default number of chunk executor threads and 
keep them in proportion to the number of disks on the datanode.






[jira] [Commented] (HDDS-2332) BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future

2019-10-20 Thread Lokesh Jain (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955757#comment-16955757
 ] 

Lokesh Jain commented on HDDS-2332:
---

[~cxorm] It is difficult to reproduce the issue; I saw it in one of the runs. 
It is happening because of RATIS-718. Once that is fixed, it should not appear in 
the runs. But we might need to support request timeouts in Ozone as well.
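A minimal sketch of what such a client-side timeout could look like around the combined putBlock future; the timeout handling and names below are assumptions, not the eventual design:

{code:java}
// Bound the wait instead of blocking indefinitely in waitOnFlushFutures.
CompletableFuture<Void> combined =
    CompletableFuture.allOf(putBlockFutures.toArray(new CompletableFuture[0]));
try {
  combined.get(flushTimeoutSeconds, TimeUnit.SECONDS); // hypothetical config value
} catch (TimeoutException e) {
  throw new IOException("Timed out waiting for putBlock futures", e);
} catch (InterruptedException | ExecutionException e) {
  throw new IOException("Flush failed", e);
}
{code}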

> BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future
> ---
>
> Key: HDDS-2332
> URL: https://issues.apache.org/jira/browse/HDDS-2332
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Lokesh Jain
>Priority: Major
>
> BlockOutputStream blocks on waitOnFlushFutures call. Two jstacks show that 
> the thread is blocked on the same condition.
> {code:java}
> 2019-10-18 06:30:38
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode):
> "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on 
> condition [0x7fbea96d6000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xe4739888> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.waitOnFlushFutures(BlockOutputStream.java:518)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:481)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:496)
>   at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:143)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:439)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:232)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:190)
>   at 
> org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
>   at java.io.DataOutputStream.write(DataOutputStream.java:107)
>   - locked <0xa6a75930> (a 
> org.apache.hadoop.fs.FSDataOutputStream)
>   at 
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:77)
>   - locked <0xa6a75918> (a 
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter)
>   at 
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:64)
>   at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670)
>   at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>   at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>   at 
> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>   at 
> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:203)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> 2019-10-18 07:02:50
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode):
> "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on 
> condition [0x7fbea96d6000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xe4739888> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  

[jira] [Commented] (HDDS-2328) Support large-scale listing

2019-10-20 Thread Lokesh Jain (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955756#comment-16955756
 ] 

Lokesh Jain commented on HDDS-2328:
---

Currently we do not implement the FileSystem#listLocatedStatus api in Ozone. 
Therefore it ends up calling listStatus for the entire directory at once, which 
can lead to OOM. I think we just need an implementation of listLocatedStatus and 
other such related apis in BasicOzoneFileSystem, as sketched below.
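A minimal sketch of the idea, assuming a hypothetical paged listing helper (listStatusBatch); 5000 mirrors the S3A-style batch size mentioned in the report:

{code:java}
// A paged RemoteIterator so a million-entry directory is never materialized
// at once.
@Override
public RemoteIterator<LocatedFileStatus> listLocatedStatus(final Path f)
    throws IOException {
  return new RemoteIterator<LocatedFileStatus>() {
    private String startAfter = "";  // resume key for the next page
    private Iterator<LocatedFileStatus> page = Collections.emptyIterator();

    @Override
    public boolean hasNext() throws IOException {
      if (!page.hasNext()) {
        List<LocatedFileStatus> next = listStatusBatch(f, startAfter, 5000);
        if (next.isEmpty()) {
          return false;
        }
        startAfter = next.get(next.size() - 1).getPath().getName();
        page = next.iterator();
      }
      return true;
    }

    @Override
    public LocatedFileStatus next() throws IOException {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      return page.next();
    }
  };
}
{code}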

> Support large-scale listing 
> 
>
> Key: HDDS-2328
> URL: https://issues.apache.org/jira/browse/HDDS-2328
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Rajesh Balamohan
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: performance
>
> Large-scale listing of directory contents takes a long time and also 
> has the potential to run into OOM. I have > 1 million entries at the same 
> level and it took a long time with {{RemoteIterator}} (it didn't complete as 
> it was stuck in RDB::seek).
> S3A batches it with 5K listings per fetch IIRC. It would be good to have this 
> feature in ozone as well.






[jira] [Commented] (HDDS-2332) BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future

2019-10-18 Thread Lokesh Jain (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954689#comment-16954689
 ] 

Lokesh Jain commented on HDDS-2332:
---

Should we support a timeout in the client as well, which kicks in if Ratis does 
not time out? The call currently fails because Ratis is not able to retry the 
request.

> BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future
> ---
>
> Key: HDDS-2332
> URL: https://issues.apache.org/jira/browse/HDDS-2332
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Lokesh Jain
>Priority: Major
>
> BlockOutputStream blocks on waitOnFlushFutures call. Two jstacks show that 
> the thread is blocked on the same condition.
> {code:java}
> 2019-10-18 06:30:38
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode):
> "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on 
> condition [0x7fbea96d6000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xe4739888> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.waitOnFlushFutures(BlockOutputStream.java:518)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:481)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:496)
>   at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:143)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:439)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:232)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:190)
>   at 
> org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
>   at java.io.DataOutputStream.write(DataOutputStream.java:107)
>   - locked <0xa6a75930> (a 
> org.apache.hadoop.fs.FSDataOutputStream)
>   at 
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:77)
>   - locked <0xa6a75918> (a 
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter)
>   at 
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:64)
>   at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670)
>   at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>   at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>   at 
> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>   at 
> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:203)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> 2019-10-18 07:02:50
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode):
> "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on 
> condition [0x7fbea96d6000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xe4739888> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>   at 
> 

[jira] [Created] (HDDS-2332) BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future

2019-10-18 Thread Lokesh Jain (Jira)
Lokesh Jain created HDDS-2332:
-

 Summary: BlockOutputStream#waitOnFlushFutures blocks on putBlock 
combined future
 Key: HDDS-2332
 URL: https://issues.apache.org/jira/browse/HDDS-2332
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Reporter: Lokesh Jain


BlockOutputStream blocks on waitOnFlushFutures call. Two jstacks show that the 
thread is blocked on the same condition.
{code:java}
2019-10-18 06:30:38
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode):
"main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on 
condition [0x7fbea96d6000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0xe4739888> (a 
java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
at 
java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
at 
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.waitOnFlushFutures(BlockOutputStream.java:518)
at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:481)
at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:496)
at 
org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:143)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:439)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:232)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:190)
at 
org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
- locked <0xa6a75930> (a 
org.apache.hadoop.fs.FSDataOutputStream)
at 
org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:77)
- locked <0xa6a75918> (a 
org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter)
at 
org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:64)
at 
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670)
at 
org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at 
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at 
org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
at 
org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:203)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)


2019-10-18 07:02:50
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode):
"main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on 
condition [0x7fbea96d6000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0xe4739888> (a 
java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
at 
java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
at 
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.waitOnFlushFutures(BlockOutputStream.java:518)
at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:481)
at 

[jira] [Updated] (HDDS-2299) BlockManager should allocate a block in excluded pipelines if none other left

2019-10-14 Thread Lokesh Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-2299:
--
Description: In SCM, BlockManager#allocateBlock does not allocate a block 
in the excluded pipelines or datanodes if requested by the client. But there 
can be cases where the excluded pipelines and datanodes are the only ones left. In 
such a case SCM should allocate a block in such pipelines and return it to the 
client. The client can choose to use or discard the block.  (was: In SCM, 
BlockManager#allocateBlock does not allocate a block in the excluded pipelines 
or datanodes if requested by the client. But there can be cases where the excluded 
pipelines are the only pipelines left. In such a case SCM should allocate a 
block in such pipelines and return it to the client. The client can choose to use 
or discard the block.)

> BlockManager should allocate a block in excluded pipelines if none other left
> -
>
> Key: HDDS-2299
> URL: https://issues.apache.org/jira/browse/HDDS-2299
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>
> In SCM, BlockManager#allocateBlock does not allocate a block in the excluded 
> pipelines or datanodes if requested by the client. But there can be cases 
> where the excluded pipelines and datanodes are the only ones left. In such a case 
> SCM should allocate a block in such pipelines and return it to the client. The 
> client can choose to use or discard the block.
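A minimal sketch of the intended fallback, with hypothetical helper names rather than the actual BlockManager code:

{code:java}
// Honor the client's exclude list first; if nothing else is left, allocate
// from an excluded pipeline anyway and let the client use or discard the block.
Pipeline pipeline = selectOpenPipeline(excludeList);    // hypothetical helper
if (pipeline == null) {
  pipeline = selectOpenPipeline(new ExcludeList());     // last resort: ignore exclusions
}
return newBlock(pipeline, size);                        // hypothetical helper
{code}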






[jira] [Created] (HDDS-2299) BlockManager should allocate a block in excluded pipelines if none other left

2019-10-14 Thread Lokesh Jain (Jira)
Lokesh Jain created HDDS-2299:
-

 Summary: BlockManager should allocate a block in excluded 
pipelines if none other left
 Key: HDDS-2299
 URL: https://issues.apache.org/jira/browse/HDDS-2299
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Reporter: Lokesh Jain
Assignee: Lokesh Jain


In SCM, BlockManager#allocateBlock does not allocate a block in the excluded 
pipelines or datanodes if requested by the client. But there can be cases where 
the excluded pipelines are the only pipelines left. In such a case SCM should 
allocate a block in such pipelines and return it to the client. The client can 
choose to use or discard the block.






[jira] [Commented] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions

2019-09-27 Thread Lokesh Jain (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939290#comment-16939290
 ] 

Lokesh Jain commented on HDDS-2186:
---

[~timmylicheng] You are right. This might be related to multiple ratis 
pipelines in the datanode. I would suggest taking a heap dump and analysing the 
heap and direct memory usage.

> Fix tests using MiniOzoneCluster for its memory related exceptions
> --
>
> Key: HDDS-2186
> URL: https://issues.apache.org/jira/browse/HDDS-2186
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Affects Versions: HDDS-1564
>Reporter: Li Cheng
>Priority: Major
>  Labels: flaky-test
>
> After multi-raft usage, MiniOzoneCluster seems to be fishy and reports a 
> bunch of 'out of memory' exceptions in ratis. Attached sample stacks.
>  
> 2019-09-26 15:12:22,824 
> [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] 
> ERROR segmented.SegmentedRaftLogWorker (SegmentedRaftLogWorker.java:run(323)) - 
> 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker 
> hit exception
> java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:694)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.<init>(BufferedWriteChannel.java:41)
> at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.<init>(SegmentedRaftLogOutputStream.java:72)
> at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566)
> at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289)
> at java.lang.Thread.run(Thread.java:748)
>  
> which leads to:
> 2019-09-26 15:12:23,029 [RATISCREATEPIPELINE1] ERROR 
> pipeline.RatisPipelineProvider (RatisPipelineProvider.java:lambda$null$2(181)) 
> - Failed invoke Ratis rpc 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990
>  for c1f4d375-683b-42fe-983b-428a63aa8803
> org.apache.ratis.protocol.TimeoutIOException: deadline exceeded after 2999881264ns
> at org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:82)
> at org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:75)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient.blockingCall(GrpcClientProtocolClient.java:178)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupAdd(GrpcClientProtocolClient.java:147)
> at org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:94)
> at org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:278)
> at org.apache.ratis.client.impl.RaftClientImpl.groupAdd(RaftClientImpl.java:205)
> at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$initializePipeline$1(RatisPipelineProvider.java:142)
> at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$null$2(RatisPipelineProvider.java:177)
> at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
> at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
> at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
> at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)
> at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
> at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
> at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
> at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
> at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583)
> at 
> 

[jira] [Assigned] (HDDS-2189) Datanode should send PipelineAction on RaftServer failure

2019-09-26 Thread Lokesh Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain reassigned HDDS-2189:
-

Assignee: Lokesh Jain

> Datanode should send PipelineAction on RaftServer failure
> -
>
> Key: HDDS-2189
> URL: https://issues.apache.org/jira/browse/HDDS-2189
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>
> {code:java}
> 2019-09-26 08:03:07,152 ERROR 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 
> 664c4e90-08f3-46c9-a073-c93ef2a55da3@group-93F633896F08-SegmentedRaftLogWorker
>  hit exception
> java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:694)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at 
> org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.<init>(BufferedWriteChannel.java:41)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.<init>(SegmentedRaftLogOutputStream.java:72)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289)
> at java.lang.Thread.run(Thread.java:748)
> 2019-09-26 08:03:07,155 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 664c4e90-08f3-46c9-a073-c93ef2a55da3@group-93F633896F08: shutdown
> {code}
> On RaftServer shutdown datanode should send a PipelineAction denoting that 
> the pipeline has been closed exceptionally in the datanode.
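A sketch of what that could look like with the HDDS pipeline-action protos; the exact wiring and reason text are assumptions:

{code:java}
// On RaftServer failure/shutdown, queue a close action for the pipeline so the
// next heartbeat tells SCM the pipeline is gone.
PipelineAction action = PipelineAction.newBuilder()
    .setAction(PipelineAction.Action.CLOSE)
    .setClosePipeline(ClosePipelineInfo.newBuilder()
        .setPipelineID(PipelineID.valueOf(groupId.getUuid()).getProtobuf())
        .setReason(ClosePipelineInfo.Reason.PIPELINE_FAILED)
        .setDetailedReason("RaftServer shutdown for group " + groupId))
    .build();
context.addPipelineActionIfAbsent(action);  // picked up by the heartbeat task
{code}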






[jira] [Created] (HDDS-2189) Datanode should send PipelineAction on RaftServer failure

2019-09-26 Thread Lokesh Jain (Jira)
Lokesh Jain created HDDS-2189:
-

 Summary: Datanode should send PipelineAction on RaftServer failure
 Key: HDDS-2189
 URL: https://issues.apache.org/jira/browse/HDDS-2189
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Lokesh Jain


{code:java}
2019-09-26 08:03:07,152 ERROR 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 
664c4e90-08f3-46c9-a073-c93ef2a55da3@group-93F633896F08-SegmentedRaftLogWorker 
hit exception
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:694)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
at 
org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.<init>(BufferedWriteChannel.java:41)
at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.<init>(SegmentedRaftLogOutputStream.java:72)
at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566)
at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289)
at java.lang.Thread.run(Thread.java:748)
2019-09-26 08:03:07,155 INFO org.apache.ratis.server.impl.RaftServerImpl: 
664c4e90-08f3-46c9-a073-c93ef2a55da3@group-93F633896F08: shutdown
{code}
On RaftServer shutdown datanode should send a PipelineAction denoting that the 
pipeline has been closed exceptionally in the datanode.






[jira] [Commented] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete

2019-09-25 Thread Lokesh Jain (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937742#comment-16937742
 ] 

Lokesh Jain commented on HDDS-1868:
---

[~swagle] Thanks for updating the patch! The changes look good to me. Please 
find my comments below.
 # Pipeline#setLeaderId - It can be made package private. We can document that 
the pipeline object is immutable but we can allow classes in pipeline package 
to set the leaderId.
 # Pipeline#getFromProtobuf - We need a null check for leaderId.
 # XceiverServerRatis - We need to update the leaderId when 
StateMachine#notifyLeader is called. Also we should triggerHeartbeat once a 
leader change occurs (see the sketch after this list).
 # PipelineReportHandler - We can't map a datanode to a leaderId due to 
multi-raft. Can we keep it simple so that we call pipeline.reportDatanode(dn) 
once a pipeline report with leaderId set is received? Also we can update the 
leaderId in the pipeline every time a pipelineReport reports a change in 
leaderID.
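A minimal sketch of item 3, with assumed method names (only notifyLeader itself is confirmed by this discussion):

{code:java}
// On a Ratis leadership change, remember the new leader for the pipeline and
// push an out-of-band heartbeat so SCM does not wait for the next report cycle.
void onLeaderChange(RaftGroupId groupId, RaftPeerId newLeader) {
  updateLeaderId(PipelineID.valueOf(groupId.getUuid()), newLeader); // assumption
  context.getParent().triggerHeartbeat();                           // assumption
}
{code}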

> Ozone pipelines should be marked as ready only after the leader election is 
> complete
> 
>
> Key: HDDS-1868
> URL: https://issues.apache.org/jira/browse/HDDS-1868
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch, 
> HDDS-1868.03.patch, HDDS-1868.04.patch
>
>
> Ozone pipelines on restart start in the allocated state; they are moved into 
> the open state after all the datanodes in the pipeline have reported to it. 
> However this potentially can lead to an issue where the pipeline is still not 
> ready to accept any incoming IO operations.
> The pipelines should be marked as ready only after the leader election is 
> complete and the leader is ready to accept incoming IO.






[jira] [Updated] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete

2019-09-25 Thread Lokesh Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1868:
--
Status: Open  (was: Patch Available)

> Ozone pipelines should be marked as ready only after the leader election is 
> complete
> 
>
> Key: HDDS-1868
> URL: https://issues.apache.org/jira/browse/HDDS-1868
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch, 
> HDDS-1868.03.patch, HDDS-1868.04.patch
>
>
> Ozone pipelines on restart start in the allocated state; they are moved into 
> the open state after all the datanodes in the pipeline have reported to it. 
> However this potentially can lead to an issue where the pipeline is still not 
> ready to accept any incoming IO operations.
> The pipelines should be marked as ready only after the leader election is 
> complete and the leader is ready to accept incoming IO.






[jira] [Commented] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete

2019-09-19 Thread Lokesh Jain (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933383#comment-16933383
 ] 

Lokesh Jain commented on HDDS-1868:
---

[~swagle] I think there is a case which is not handled. There can be an 
elected leader s1 and two followers s2 and s3. The pipeline reports from s2 and 
s3 can arrive after the pipeline action, or may not arrive at all. In both 
these cases we would have opened the pipeline in SCM. I think we need to send 
either only the pipeline report or only the pipeline action from the datanodes 
in this case. Once we get this action or report from all the datanodes, after a 
leader has been elected and acknowledged by all the datanodes, SCM can open the 
pipeline?

> Ozone pipelines should be marked as ready only after the leader election is 
> complete
> 
>
> Key: HDDS-1868
> URL: https://issues.apache.org/jira/browse/HDDS-1868
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch, 
> HDDS-1868.03.patch
>
>
> Ozone pipelines on restart start in the allocated state; they are moved into 
> the open state after all the datanodes in the pipeline have reported to it. 
> However this potentially can lead to an issue where the pipeline is still not 
> ready to accept any incoming IO operations.
> The pipelines should be marked as ready only after the leader election is 
> complete and the leader is ready to accept incoming IO.






[jira] [Commented] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete

2019-09-18 Thread Lokesh Jain (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932224#comment-16932224
 ] 

Lokesh Jain commented on HDDS-1868:
---

[~swagle] Thanks for working on this! I think we should include pipeline 
reports from the followers as well. Otherwise there can be cases where followers 
have not yet registered or cannot communicate with SCM but the pipeline is still 
active in SCM. In RATIS-678, if we include leader information in the api, we can 
use it to update the leader information in SCM as well.

> Ozone pipelines should be marked as ready only after the leader election is 
> complete
> 
>
> Key: HDDS-1868
> URL: https://issues.apache.org/jira/browse/HDDS-1868
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch, 
> HDDS-1868.03.patch
>
>
> Ozone pipelines on restart start in the allocated state; they are moved into 
> the open state after all the datanodes in the pipeline have reported to it. 
> However this potentially can lead to an issue where the pipeline is still not 
> ready to accept any incoming IO operations.
> The pipelines should be marked as ready only after the leader election is 
> complete and the leader is ready to accept incoming IO.






[jira] [Updated] (HDDS-2117) ContainerStateMachine#writeStateMachineData times out

2019-09-17 Thread Lokesh Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-2117:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> ContainerStateMachine#writeStateMachineData times out
> -
>
> Key: HDDS-2117
> URL: https://issues.apache.org/jira/browse/HDDS-2117
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The issue seems to be happening because the precondition check below fails 
> when two writeChunk operations get executed in parallel, and the runtime 
> exception thrown is not handled correctly in ContainerStateMachine.
>  
> HddsDispatcher.java:239
> {code:java}
> Preconditions
> .checkArgument(!container2BCSIDMap.containsKey(containerID));
> {code}
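One tolerant variant, as an illustration only and not necessarily the committed fix (containerBCSID is an assumed variable name):

{code:java}
// putIfAbsent keeps the first registration and turns the parallel second
// writeChunk into a no-op instead of an IllegalArgumentException.
container2BCSIDMap.putIfAbsent(containerID, containerBCSID);
{code}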






[jira] [Updated] (HDDS-2114) Rename does not preserve non-explicitly created interim directories

2019-09-13 Thread Lokesh Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-2114:
--
Status: Patch Available  (was: Open)

> Rename does not preserve non-explicitly created interim directories
> ---
>
> Key: HDDS-2114
> URL: https://issues.apache.org/jira/browse/HDDS-2114
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Istvan Fajth
>Assignee: Lokesh Jain
>Priority: Critical
>  Labels: pull-request-available
> Attachments: demonstrative_test.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I am attaching a patch that adds a test demonstrating the problem.
> The scenario comes from the way Hive implements ACID transactions 
> with the ORC table format, but the test is reduced to the simplest possible 
> code that reproduces the issue.
> The scenario:
>  * Given a 3 level directory structure, where the top level directory was 
> explicitly created, and the interim directory is implicitly created (for 
> example either by creating a file with create("/top/interim/file") or by 
> creating a directory with mkdirs("top/interim/dir"))
>  * When the leaf is moved out from the implicitly created directory making 
> this directory an empty directory
>  * Then a FileNotFoundException is thrown when getFileStatus or listStatus is 
> called on the interim directory.
> The expected behaviour:
> after the directory is becoming empty, the directory should still be part of 
> the file system, moreover an empty FileStatus array should be returned when 
> listStatus is called on it, and also a valid FileStatus object should be 
> returned when getFileStatus is called on it.
>  
>  
> As this issue is present with Hive, and as this is how a FileSystem is 
> expected to work, this seems to be at least a critical issue as I see it; 
> please feel free to change the priority if needed.
> Also please note that if the interim directory is explicitly created with 
> mkdirs("top/interim") before creating the leaf, then the issue does not 
> appear.
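A minimal repro sketch of the scenario above (paths illustrative):

{code:java}
FileSystem fs = FileSystem.get(conf);
fs.mkdirs(new Path("/top"));                                   // explicit top level
fs.create(new Path("/top/interim/file")).close();              // /top/interim implicit
fs.rename(new Path("/top/interim/file"), new Path("/top/file"));
// Expected: /top/interim still exists and listStatus returns an empty array.
// Observed: getFileStatus on /top/interim throws FileNotFoundException.
fs.getFileStatus(new Path("/top/interim"));
{code}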






[jira] [Updated] (HDDS-2103) TestContainerReplication fails due to unhealthy container

2019-09-11 Thread Lokesh Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-2103:
--
Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> TestContainerReplication fails due to unhealthy container
> -
>
> Key: HDDS-2103
> URL: https://issues.apache.org/jira/browse/HDDS-2103
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.5.0
>Reporter: Doroszlai, Attila
>Assignee: Doroszlai, Attila
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code:title=https://github.com/elek/ozone-ci/blob/master/trunk/trunk-nightly-20190907-l8mkd/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.container.TestContainerReplication.txt}
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.771 s <<< 
> FAILURE! - in org.apache.hadoop.ozone.container.TestContainerReplication
> testContainerReplication(org.apache.hadoop.ozone.container.TestContainerReplication)
>   Time elapsed: 12.702 s  <<< FAILURE!
> java.lang.AssertionError: Container is not replicated to the destination 
> datanode
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertNotNull(Assert.java:621)
>   at 
> org.apache.hadoop.ozone.container.TestContainerReplication.testContainerReplication(TestContainerReplication.java:153)
> {code}
> caused by:
> {code:title=https://github.com/elek/ozone-ci/blob/master/trunk/trunk-nightly-20190907-l8mkd/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.container.TestContainerReplication-output.txt}
> java.lang.IllegalStateException: Only closed containers could be exported: 
> ContainerId=1
>   at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.exportContainerData(KeyValueContainer.java:525)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.exportContainer(KeyValueHandler.java:875)
>   at 
> org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.exportContainer(ContainerController.java:134)
>   at 
> org.apache.hadoop.ozone.container.replication.OnDemandContainerReplicationSource.copyData(OnDemandContainerReplicationSource.java:64)
>   at 
> org.apache.hadoop.ozone.container.replication.GrpcReplicationService.download(GrpcReplicationService.java:63)
> {code}
> Container is in an unhealthy state because the pipeline is not found for it in 
> {{CloseContainerCommandHandler}}.






[jira] [Commented] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete

2019-09-11 Thread Lokesh Jain (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927278#comment-16927278
 ] 

Lokesh Jain commented on HDDS-1868:
---

ContainerStateMachine already has an api called notifyLeader which notifies the 
state machine that the server has been elected as leader. We can use that api 
to trigger a pipeline report from the leader. For the followers we will either 
need to add another api or leverage notifyLeader to notify the follower datanode 
about the elected leader. This would require a change in Ratis.

> Ozone pipelines should be marked as ready only after the leader election is 
> complete
> 
>
> Key: HDDS-1868
> URL: https://issues.apache.org/jira/browse/HDDS-1868
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch
>
>
> On restart, Ozone pipelines start in the ALLOCATED state; they are moved 
> into the OPEN state after all the datanodes in the pipeline have reported 
> it. However, this can potentially lead to an issue where the pipeline is 
> still not ready to accept any incoming IO operations.
> The pipelines should be marked as ready only after the leader election is 
> complete and the leader is ready to accept incoming IO.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete

2019-09-10 Thread Lokesh Jain (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926426#comment-16926426
 ] 

Lokesh Jain commented on HDDS-1868:
---

[~swagle] The changes look good to me. I am not able to open the links to the 
checkstyle and test results. There are a few issues related to the datanode 
and SCM here. PipelineReports are published by PipelineReportPublisher, which 
works at a default frequency of 60 seconds. Suppose the first report did not 
include the pipeline report because no leader had been elected by then; the 
second pipeline report will only be sent after 60 seconds. I think we will 
need to trigger a pipeline report as soon as the leader gets elected.
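A rough sketch of such a trigger, assuming an illustrative publisher shape 
(the real PipelineReportPublisher implementation differs):
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch only: a 60-second periodic publisher with an extra on-demand
// trigger; not the actual PipelineReportPublisher implementation.
class TriggerablePublisher {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final Runnable publishReport;

  TriggerablePublisher(Runnable publishReport) {
    this.publishReport = publishReport;
    // Default behaviour: publish every 60 seconds.
    scheduler.scheduleAtFixedRate(publishReport, 60, 60, TimeUnit.SECONDS);
  }

  // Called from the leader-election callback so SCM does not wait up to a
  // full interval for the first post-election pipeline report.
  void triggerNow() {
    scheduler.execute(publishReport);
  }
}
{code}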

> Ozone pipelines should be marked as ready only after the leader election is 
> complete
> 
>
> Key: HDDS-1868
> URL: https://issues.apache.org/jira/browse/HDDS-1868
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch
>
>
> On restart, Ozone pipelines start in the ALLOCATED state; they are moved 
> into the OPEN state after all the datanodes in the pipeline have reported 
> it. However, this can potentially lead to an issue where the pipeline is 
> still not ready to accept any incoming IO operations.
> The pipelines should be marked as ready only after the leader election is 
> complete and the leader is ready to accept incoming IO.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete

2019-09-09 Thread Lokesh Jain (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925401#comment-16925401
 ] 

Lokesh Jain commented on HDDS-1868:
---

[~swagle] In the patch, the condition below would only be true for the leader 
datanode.

 
{code:java}
if (reply.getRoleInfoProto().hasLeaderInfo()) {
  reports.add(PipelineReport.newBuilder()
  .setPipelineID(
  PipelineID.valueOf(groupId.getUuid()).getProtobuf())
  .build());
}
{code}
We would end up sending the pipeline report only from the leader to SCM. SCM 
should ideally receive pipeline reports from all datanodes in a pipeline in 
order to mark the pipeline as OPEN.

The follower does get a roleInfoProto, but 
reply.getRoleInfoProto().hasLeaderInfo() is false for a follower, as it 
carries a followerInfo rather than a leaderInfo.
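A self-contained sketch of a condition that reports from followers as well 
once a leader is known; the role and leader-id accessors below are simplified 
stand-ins for the actual RoleInfoProto fields:
{code:java}
// Sketch only: simplified stand-ins for the RoleInfoProto accessors.
final class PipelineReportCheck {
  enum Role { LEADER, FOLLOWER, CANDIDATE }

  // A member should report once a leader is known: the leader always,
  // a follower once its followerInfo names a valid leader.
  static boolean shouldReport(Role role, String knownLeaderId) {
    switch (role) {
      case LEADER:
        return true;
      case FOLLOWER:
        return knownLeaderId != null && !knownLeaderId.isEmpty();
      default:
        return false; // a candidate has no elected leader yet
    }
  }
}
{code}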

 

> Ozone pipelines should be marked as ready only after the leader election is 
> complete
> 
>
> Key: HDDS-1868
> URL: https://issues.apache.org/jira/browse/HDDS-1868
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-1868.01.patch
>
>
> On restart, Ozone pipelines start in the ALLOCATED state; they are moved 
> into the OPEN state after all the datanodes in the pipeline have reported 
> it. However, this can potentially lead to an issue where the pipeline is 
> still not ready to accept any incoming IO operations.
> The pipelines should be marked as ready only after the leader election is 
> complete and the leader is ready to accept incoming IO.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1899) DeleteBlocksCommandHandler is unable to find the container in SCM

2019-09-06 Thread Lokesh Jain (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924282#comment-16924282
 ] 

Lokesh Jain commented on HDDS-1899:
---

[~nandakumar131] The exception seems harmless. It is thrown when the container 
cannot be found before processing a DeleteBlocks command. As you mentioned, 
this can happen because the replication manager deleted the container before 
the block deletion was processed.

There is another issue, however. Currently all synchronization is done by 
locking the container object itself. In the delete-container path the 
container is removed from the containerSet, but the container object may 
still be alive and can be used to acquire a lock on the container. Also, in 
deleteContainer we delete the container outside the lock, which can race with 
other operations.

With the current locking semantics we need to check whether the container 
still exists after acquiring its lock, and container deletion should be done 
inside the lock itself.
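A minimal sketch of that locking discipline, with illustrative types rather 
than the actual ContainerSet/KeyValueHandler code:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: illustrative container type and set, not the Ozone classes.
final class ContainerSetSketch {
  static final class Container {
    final long id;
    Container(long id) { this.id = id; }
  }

  private final Map<Long, Container> containers = new ConcurrentHashMap<>();

  void deleteBlocks(long containerId) {
    Container c = containers.get(containerId);
    if (c == null) {
      return; // already gone, e.g. deleted by the replication manager
    }
    synchronized (c) {
      // Re-check after acquiring the lock: the container object can outlive
      // its membership in the set, so existence must be validated here.
      if (!containers.containsKey(containerId)) {
        return;
      }
      // ... process the delete-blocks transaction under the lock ...
    }
  }

  void deleteContainer(long containerId) {
    Container c = containers.get(containerId);
    if (c == null) {
      return;
    }
    synchronized (c) {
      // Remove and delete inside the same lock so deletion cannot race
      // with other operations that hold the container lock.
      containers.remove(containerId);
      // ... delete the on-disk container data here ...
    }
  }
}
{code}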

> DeleteBlocksCommandHandler is unable to find the container in SCM
> -
>
> Key: HDDS-1899
> URL: https://issues.apache.org/jira/browse/HDDS-1899
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Priority: Major
>  Labels: MiniOzoneChaosCluster
>
> DeleteBlocksCommandHandler is unable to find a container in SCM.
> {code}
> 2019-08-02 14:04:56,735 WARN  commandhandler.DeleteBlocksCommandHandler 
> (DeleteBlocksCommandHandler.java:lambda$handle$0(140)) - Failed to delete 
> blocks for container=33, TXID=184
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  Unable to find the container 33
> at 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteBlocksCommandHandler.lambda$handle$0(DeleteBlocksCommandHandler.java:122)
> at java.util.ArrayList.forEach(ArrayList.java:1257)
> at 
> java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
> at 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteBlocksCommandHandler.handle(DeleteBlocksCommandHandler.java:114)
> at 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:93)
> at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$1(DatanodeStateMachine.java:432)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete

2019-09-05 Thread Lokesh Jain (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923635#comment-16923635
 ] 

Lokesh Jain commented on HDDS-1868:
---

On restart, SCM marks the pipeline as OPEN only if all the datanodes have 
reported the pipeline. With this change only the leader would report the 
pipeline; therefore the pipeline would not be marked as OPEN in SCM.

> Ozone pipelines should be marked as ready only after the leader election is 
> complete
> 
>
> Key: HDDS-1868
> URL: https://issues.apache.org/jira/browse/HDDS-1868
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-1868.01.patch
>
>
> On restart, Ozone pipelines start in the ALLOCATED state; they are moved 
> into the OPEN state after all the datanodes in the pipeline have reported 
> it. However, this can potentially lead to an issue where the pipeline is 
> still not ready to accept any incoming IO operations.
> The pipelines should be marked as ready only after the leader election is 
> complete and the leader is ready to accept incoming IO.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete

2019-09-05 Thread Lokesh Jain (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923519#comment-16923519
 ] 

Lokesh Jain commented on HDDS-1868:
---

[~swagle] Thanks for working on this! I think leaderInfo in the RoleInfoProto 
object is only set for the leader itself. For followers it will not be set.

> Ozone pipelines should be marked as ready only after the leader election is 
> complete
> 
>
> Key: HDDS-1868
> URL: https://issues.apache.org/jira/browse/HDDS-1868
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-1868.01.patch
>
>
> On restart, Ozone pipelines start in the ALLOCATED state; they are moved 
> into the OPEN state after all the datanodes in the pipeline have reported 
> it. However, this can potentially lead to an issue where the pipeline is 
> still not ready to accept any incoming IO operations.
> The pipelines should be marked as ready only after the leader election is 
> complete and the leader is ready to accept incoming IO.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1561) Mark OPEN containers as QUASI_CLOSED as part of Ratis groupRemove

2019-09-04 Thread Lokesh Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1561:
--
Status: Patch Available  (was: Open)

> Mark OPEN containers as QUASI_CLOSED as part of Ratis groupRemove
> -
>
> Key: HDDS-1561
> URL: https://issues.apache.org/jira/browse/HDDS-1561
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Lokesh Jain
>Priority: Blocker
>  Labels: pull-request-available
> Attachments: HDDS-1561.001.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Right now, if a pipeline is destroyed by SCM, all the containers on the 
> pipeline are marked as QUASI_CLOSED when the datanode receives the close 
> container command. SCM, while processing these container reports, marks 
> these containers as closed once a majority of the nodes are available.
> This is however not a sufficient condition in cases where the Raft log 
> directory is missing or corrupted, as the containers will not have all the 
> applied transactions.
> To solve this problem, we should QUASI_CLOSE the containers in the datanode 
> as part of Ratis groupRemove. If a container is in the OPEN state in the 
> datanode without any active pipeline, it will be marked as unhealthy while 
> processing the close container command.
> cc [~jnp], [~shashikant], [~sdeka], [~nandakumar131]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1561) Mark OPEN containers as QUASI_CLOSED as part of Ratis groupRemove

2019-09-04 Thread Lokesh Jain (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922496#comment-16922496
 ] 

Lokesh Jain commented on HDDS-1561:
---

This Jira needs a Ratis snapshot upgrade. I have uploaded a patch without the 
Ratis snapshot change.

> Mark OPEN containers as QUASI_CLOSED as part of Ratis groupRemove
> -
>
> Key: HDDS-1561
> URL: https://issues.apache.org/jira/browse/HDDS-1561
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Lokesh Jain
>Priority: Blocker
> Attachments: HDDS-1561.001.patch
>
>
> Right now, if a pipeline is destroyed by SCM, all the containers on the 
> pipeline are marked as QUASI_CLOSED when the datanode receives the close 
> container command. SCM, while processing these container reports, marks 
> these containers as closed once a majority of the nodes are available.
> This is however not a sufficient condition in cases where the Raft log 
> directory is missing or corrupted, as the containers will not have all the 
> applied transactions.
> To solve this problem, we should QUASI_CLOSE the containers in the datanode 
> as part of Ratis groupRemove. If a container is in the OPEN state in the 
> datanode without any active pipeline, it will be marked as unhealthy while 
> processing the close container command.
> cc [~jnp], [~shashikant], [~sdeka], [~nandakumar131]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1561) Mark OPEN containers as QUASI_CLOSED as part of Ratis groupRemove

2019-09-04 Thread Lokesh Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1561:
--
Attachment: HDDS-1561.001.patch

> Mark OPEN containers as QUASI_CLOSED as part of Ratis groupRemove
> -
>
> Key: HDDS-1561
> URL: https://issues.apache.org/jira/browse/HDDS-1561
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Lokesh Jain
>Priority: Blocker
> Attachments: HDDS-1561.001.patch
>
>
> Right now, if a pipeline is destroyed by SCM, all the containers on the 
> pipeline are marked as QUASI_CLOSED when the datanode receives the close 
> container command. SCM, while processing these container reports, marks 
> these containers as closed once a majority of the nodes are available.
> This is however not a sufficient condition in cases where the Raft log 
> directory is missing or corrupted, as the containers will not have all the 
> applied transactions.
> To solve this problem, we should QUASI_CLOSE the containers in the datanode 
> as part of Ratis groupRemove. If a container is in the OPEN state in the 
> datanode without any active pipeline, it will be marked as unhealthy while 
> processing the close container command.
> cc [~jnp], [~shashikant], [~sdeka], [~nandakumar131]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2048) State check during container state transition in datanode should be lock protected

2019-08-30 Thread Lokesh Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-2048:
--
Status: Patch Available  (was: Open)

> State check during container state transition in datanode should be lock 
> protected
> --
>
> Key: HDDS-2048
> URL: https://issues.apache.org/jira/browse/HDDS-2048
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>
> Currently, container state checks during state transitions are not 
> lock-protected in KeyValueHandler. This can cause invalid state transitions.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2048) State check during container state transition in datanode should be lock protected

2019-08-28 Thread Lokesh Jain (Jira)
Lokesh Jain created HDDS-2048:
-

 Summary: State check during container state transition in datanode 
should be lock protected
 Key: HDDS-2048
 URL: https://issues.apache.org/jira/browse/HDDS-2048
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Lokesh Jain
Assignee: Lokesh Jain


Currently, container state checks during state transitions are not 
lock-protected in KeyValueHandler. This can cause invalid state transitions.
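A minimal sketch of the intended fix, with illustrative types 
(KeyValueHandler's real transition code differs):
{code:java}
// Sketch only: check-then-transition done atomically under one lock.
final class ContainerStateSketch {
  enum State { OPEN, CLOSING, CLOSED, UNHEALTHY }

  private final Object lock = new Object();
  private State state = State.OPEN;

  void close() {
    synchronized (lock) {
      // The state check and the transition must share one critical section;
      // checking outside the lock lets two threads both observe OPEN and
      // perform conflicting transitions.
      if (state == State.OPEN || state == State.CLOSING) {
        state = State.CLOSED;
      } else {
        throw new IllegalStateException("Invalid transition from " + state);
      }
    }
  }
}
{code}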



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1553) Add metrics in rack aware container placement policy

2019-08-27 Thread Lokesh Jain (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916620#comment-16916620
 ] 

Lokesh Jain commented on HDDS-1553:
---

[~Sammi] Can you please attach a link to the PR?

> Add metrics in rack aware container placement policy
> 
>
> Key: HDDS-1553
> URL: https://issues.apache.org/jira/browse/HDDS-1553
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> To collect the following statistics:
> 1. total requested datanode count (A)
> 2. successfully allocated datanode count without constraint compromise (B)
> 3. successfully allocated datanode count with some constraint compromise (C)
> B includes C, failed allocation = (A - B)



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1981) Datanode should sync db when container is moved to CLOSED or QUASI_CLOSED state

2019-08-26 Thread Lokesh Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1981:
--
Fix Version/s: 0.5.0

> Datanode should sync db when container is moved to CLOSED or QUASI_CLOSED 
> state
> ---
>
> Key: HDDS-1981
> URL: https://issues.apache.org/jira/browse/HDDS-1981
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> The datanode should sync the db when a container is moved to the CLOSED or 
> QUASI_CLOSED state. This ensures that the metadata is persisted.
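A minimal sketch of the intent, using an illustrative store interface rather 
than the actual RocksDB-backed container metadata store:
{code:java}
import java.nio.charset.StandardCharsets;

// Sketch only: illustrative metadata store, not the Ozone container db API.
interface MetadataStore {
  void put(byte[] key, byte[] value);
  void sync(); // force buffered writes to durable storage (fsync/WAL sync)
}

final class CloseSketch {
  static void markClosed(MetadataStore store, long containerId) {
    store.put(("state#" + containerId).getBytes(StandardCharsets.UTF_8),
        "CLOSED".getBytes(StandardCharsets.UTF_8));
    // Without an explicit sync the CLOSED marker may sit in OS/db buffers
    // and be lost on a crash; syncing on the transition persists it.
    store.sync();
  }
}
{code}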



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1981) Datanode should sync db when container is moved to CLOSED or QUASI_CLOSED state

2019-08-26 Thread Lokesh Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1981:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Datanode should sync db when container is moved to CLOSED or QUASI_CLOSED 
> state
> ---
>
> Key: HDDS-1981
> URL: https://issues.apache.org/jira/browse/HDDS-1981
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> The datanode should sync the db when a container is moved to the CLOSED or 
> QUASI_CLOSED state. This ensures that the metadata is persisted.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1981) Datanode should sync db when container is moved to CLOSED or QUASI_CLOSED state

2019-08-20 Thread Lokesh Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1981:
--
Status: Patch Available  (was: Open)

> Datanode should sync db when container is moved to CLOSED or QUASI_CLOSED 
> state
> ---
>
> Key: HDDS-1981
> URL: https://issues.apache.org/jira/browse/HDDS-1981
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The datanode should sync the db when a container is moved to the CLOSED or 
> QUASI_CLOSED state. This ensures that the metadata is persisted.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1959) Decrement purge interval for Ratis logs in datanode

2019-08-17 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1959:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Decrement purge interval for Ratis logs in datanode
> ---
>
> Key: HDDS-1959
> URL: https://issues.apache.org/jira/browse/HDDS-1959
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: kevin su
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently the purge interval for the Ratis log 
> ("dfs.container.ratis.log.purge.gap") is set at 10. This Jira aims to 
> reduce the interval and set it to 100.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1959) Decrement purge interval for Ratis logs in datanode

2019-08-17 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1959:
--
Fix Version/s: 0.5.0

> Decrement purge interval for Ratis logs in datanode
> ---
>
> Key: HDDS-1959
> URL: https://issues.apache.org/jira/browse/HDDS-1959
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: kevin su
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently the purge interval for the Ratis log 
> ("dfs.container.ratis.log.purge.gap") is set at 10. This Jira aims to 
> reduce the interval and set it to 100.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1959) Decrement purge interval for Ratis logs in datanode

2019-08-16 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1959:
--
Status: Patch Available  (was: Open)

> Decrement purge interval for Ratis logs in datanode
> ---
>
> Key: HDDS-1959
> URL: https://issues.apache.org/jira/browse/HDDS-1959
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: kevin su
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently the purge interval for the Ratis log 
> ("dfs.container.ratis.log.purge.gap") is set at 10. This Jira aims to 
> reduce the interval and set it to 100.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-1959) Decrement purge interval for Ratis logs in datanode

2019-08-15 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907886#comment-16907886
 ] 

Lokesh Jain edited comment on HDDS-1959 at 8/15/19 7:33 AM:


[~pingsutw] Sorry! I had posted the wrong configuration in the description. The 
configuration to be changed is "dfs.container.ratis.log.purge.gap". The default 
value also needs to be changed to 100. Can you please update the PR with 
the same?


was (Author: ljain):
[~pingsutw] Sorry! I had posted the wrong configuration in the description. The 
configuration to be changed is "dfs.container.ratis.log.purge.gap". Can you 
please update the PR with the same?

> Decrement purge interval for Ratis logs in datanode
> ---
>
> Key: HDDS-1959
> URL: https://issues.apache.org/jira/browse/HDDS-1959
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: kevin su
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the purge interval for the Ratis log 
> ("dfs.container.ratis.log.purge.gap") is set at 10. This Jira aims to 
> reduce the interval and set it to 100.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1959) Decrement purge interval for Ratis logs in datanode

2019-08-15 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1959:
--
Description: Currently purge interval for ratis 
log("dfs.container.ratis.log.purge.gap") is set at 10. The Jira aims to 
reduce the interval and set it to 100.  (was: Currently purge interval for 
ratis log("dfs.container.ratis.log.purge.gap") is set at 10. The Jira 
aims to reduce the interval and set it to 10.)

> Decrement purge interval for Ratis logs in datanode
> ---
>
> Key: HDDS-1959
> URL: https://issues.apache.org/jira/browse/HDDS-1959
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: kevin su
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the purge interval for the Ratis log 
> ("dfs.container.ratis.log.purge.gap") is set at 10. This Jira aims to 
> reduce the interval and set it to 100.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1959) Decrement purge interval for Ratis logs in datanode

2019-08-15 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1959:
--
Summary: Decrement purge interval for Ratis logs in datanode  (was: 
Decrement purge interval for Ratis logs)

> Decrement purge interval for Ratis logs in datanode
> ---
>
> Key: HDDS-1959
> URL: https://issues.apache.org/jira/browse/HDDS-1959
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: kevin su
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the purge interval for the Ratis log 
> ("dfs.container.ratis.log.purge.gap") is set at 10. This Jira aims to 
> reduce the interval and set it to 10.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1959) Decrement purge interval for Ratis logs

2019-08-15 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1959:
--
Description: Currently purge interval for ratis 
log("dfs.container.ratis.log.purge.gap") is set at 10. The Jira aims to 
reduce the interval and set it to 10.  (was: Currently purge interval for 
ratis log("ozone.om.ratis.log.purge.gap") is set at 100. The Jira aims to 
reduce the interval and set it to 10.)

> Decrement purge interval for Ratis logs
> ---
>
> Key: HDDS-1959
> URL: https://issues.apache.org/jira/browse/HDDS-1959
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: kevin su
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the purge interval for the Ratis log 
> ("dfs.container.ratis.log.purge.gap") is set at 10. This Jira aims to 
> reduce the interval and set it to 10.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1959) Decrement purge interval for Ratis logs

2019-08-15 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907886#comment-16907886
 ] 

Lokesh Jain commented on HDDS-1959:
---

[~pingsutw] Sorry! I had posted the wrong configuration in the description. The 
configuration to be changed is "dfs.container.ratis.log.purge.gap". Can you 
please update the PR with the same?

> Decrement purge interval for Ratis logs
> ---
>
> Key: HDDS-1959
> URL: https://issues.apache.org/jira/browse/HDDS-1959
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: kevin su
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the purge interval for the Ratis log 
> ("ozone.om.ratis.log.purge.gap") is set at 100. This Jira aims to reduce 
> the interval and set it to 10.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1959) Decrement purge interval for Ratis logs

2019-08-13 Thread Lokesh Jain (JIRA)
Lokesh Jain created HDDS-1959:
-

 Summary: Decrement purge interval for Ratis logs
 Key: HDDS-1959
 URL: https://issues.apache.org/jira/browse/HDDS-1959
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Lokesh Jain


Currently the purge interval for the Ratis log 
("ozone.om.ratis.log.purge.gap") is set at 100. This Jira aims to reduce the 
interval and set it to 10.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14692) Upload button should not encode complete url

2019-08-01 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898275#comment-16898275
 ] 

Lokesh Jain commented on HDFS-14692:


The patch fixes the issue by encoding just the directory part of the URL. The 
upload file button still fails with a Mixed Content error after the fix; that 
error will require a separate fix.
{code:java}
jquery-3.3.1.min.js:2 Mixed Content: The page at 
'https://127.0.0.1:/gateway/default/hdfs/explorer.html#/app-logs' was 
loaded over HTTPS, but requested an insecure XMLHttpRequest endpoint 
'http://nn-host:50075/webhdfs/v1/app-logs/BUILDING.txt?op=CREATE=drwho=nn-host:8020==true=false'.
 This request has been blocked; the content must be served over HTTPS.
{code}
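For illustration, here is a sketch in Java of the encoding difference (the 
actual patch is in explorer.js; the helper below is hypothetical):
{code:java}
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch only: contrasts encoding the whole URL with encoding only the
// directory path segments.
final class UploadUrlSketch {
  public static void main(String[] args) {
    String base = "https://127.0.0.1/gateway/default/webhdfs/v1";
    String dir = "/app-logs/dir with spaces";

    // Buggy: encoding the complete URL also escapes the scheme separator,
    // yielding a malformed absolute URL like the "https%3A//..." above.
    String malformed = URLEncoder.encode(base + dir, StandardCharsets.UTF_8);

    // Fix idea: encode only the path segments, keeping the base untouched.
    String[] segments = dir.split("/", -1);
    StringBuilder path = new StringBuilder();
    for (int i = 0; i < segments.length; i++) {
      if (i > 0) {
        path.append('/');
      }
      // Note: URLEncoder form-encodes (space -> '+'); a production fix
      // would percent-encode path segments instead.
      path.append(URLEncoder.encode(segments[i], StandardCharsets.UTF_8));
    }
    String wellFormed = base + path;

    System.out.println(malformed);
    System.out.println(wellFormed);
  }
}
{code}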

> Upload button should not encode complete url
> 
>
> Key: HDFS-14692
> URL: https://issues.apache.org/jira/browse/HDFS-14692
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDFS-14692.001.patch
>
>
> explorer.js#modal-upload-file-button currently does not work with Knox. The 
> function encodes the complete URL and thus creates a malformed URL. This 
> leads to an error while uploading the file.
> Example of a malformed URL - 
> "https%3A//127.0.0.1%3A/gateway/default/webhdfs/v1/app-logs/BUILDING.txt?op=CREATE=true"



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14692) Upload button should not encode complete url

2019-08-01 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDFS-14692:
---
Status: Patch Available  (was: Open)

> Upload button should not encode complete url
> 
>
> Key: HDFS-14692
> URL: https://issues.apache.org/jira/browse/HDFS-14692
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDFS-14692.001.patch
>
>
> explorer.js#modal-upload-file-button currently does not work with Knox. The 
> function encodes the complete URL and thus creates a malformed URL. This 
> leads to an error while uploading the file.
> Example of a malformed URL - 
> "https%3A//127.0.0.1%3A/gateway/default/webhdfs/v1/app-logs/BUILDING.txt?op=CREATE=true"



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14692) Upload button should not encode complete url

2019-08-01 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDFS-14692:
---
Attachment: HDFS-14692.001.patch

> Upload button should not encode complete url
> 
>
> Key: HDFS-14692
> URL: https://issues.apache.org/jira/browse/HDFS-14692
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDFS-14692.001.patch
>
>
> explorer.js#modal-upload-file-button currently does not work with Knox. The 
> function encodes the complete URL and thus creates a malformed URL. This 
> leads to an error while uploading the file.
> Example of a malformed URL - 
> "https%3A//127.0.0.1%3A/gateway/default/webhdfs/v1/app-logs/BUILDING.txt?op=CREATE=true"



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14692) Upload button should not encode complete url

2019-08-01 Thread Lokesh Jain (JIRA)
Lokesh Jain created HDFS-14692:
--

 Summary: Upload button should not encode complete url
 Key: HDFS-14692
 URL: https://issues.apache.org/jira/browse/HDFS-14692
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lokesh Jain
Assignee: Lokesh Jain


explorer.js#modal-upload-file-button currently does not work with Knox. The 
function encodes the complete URL and thus creates a malformed URL. This leads 
to an error while uploading the file.

Example of a malformed URL - 
"https%3A//127.0.0.1%3A/gateway/default/webhdfs/v1/app-logs/BUILDING.txt?op=CREATE=true"



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-1834) parent directories not found in secure setup due to ACL check

2019-07-25 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892790#comment-16892790
 ] 

Lokesh Jain edited comment on HDDS-1834 at 7/25/19 2:14 PM:


There are two bugs associated with checkAccess.
 # In OzoneFileSystem use cases, checkAccess is not performed on the ancestors 
of the accessed path. Currently, while accessing a/b/c.txt we do not check 
access for a/ and a/b/; we do an access check only for the path a/b/c.txt.
 # Since HDDS-1481, mkdir no longer creates the ancestor directories if they 
do not exist. The checkAccess method only checks for the key provided and 
therefore fails with a KEY_NOT_FOUND error. It should check for the existence 
of a directory using getFileStatus, as sketched after the example below.

KeyManagerImpl#checkAccess:1645-1657
{code:java}
OmKeyInfo keyInfo = metadataManager.getKeyTable().get(objectKey);
if (keyInfo == null) {
  objectKey = OzoneFSUtils.addTrailingSlashIfNeeded(objectKey);
  keyInfo = metadataManager.getKeyTable().get(objectKey);
  
  if(keyInfo == null) {
keyInfo = metadataManager.getOpenKeyTable().get(objectKey);
if (keyInfo == null) {
  throw new OMException("Key not found, checkAccess failed. Key:" +
  objectKey, KEY_NOT_FOUND);
}
  }
}
{code}
Example illustrating problem 2:
{code:java}
ozone sh key list o3://om/fstest/bucket1/
[ {
"version" : 0,
"md5hash" : null,
"createdOn" : "Thu, 25 Jul 2019 11:26:02 GMT",
"modifiedOn" : "Thu, 25 Jul 2019 11:26:02 GMT",
"size" : 0,
"keyName" : "testdir/deep/",
"type" : null
}, {
"version" : 0,
"md5hash" : null,
"createdOn" : "Thu, 25 Jul 2019 11:26:09 GMT",
"modifiedOn" : "Thu, 01 Jan 1970 00:12:54 GMT",
"size" : 22808,
"keyName" : "testdir/deep/MOVED.TXT",
"type" : null
}, {
"version" : 0,
"md5hash" : null,
"createdOn" : "Thu, 25 Jul 2019 11:26:18 GMT",
"modifiedOn" : "Thu, 01 Jan 1970 00:12:44 GMT",
"size" : 22808,
"keyName" : "testdir/deep/PUTFILE.txt",
"type" : null
} ]

ozone sh key info o3://om/fstest/bucket1/testdir
KEY_NOT_FOUND Key not found, checkAccess failed. Key:/fstest/bucket1/testdir/
{code}
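A sketch of the suggested direction for bug 2, with illustrative signatures 
(the real KeyManagerImpl and getFileStatus APIs differ):
{code:java}
// Sketch only: illustrative lookup interface, not the Ozone API.
interface FileStatusLookup {
  // True if the path exists as a key, a directory key, or an implicit
  // parent of an existing key (the getFileStatus-style resolution).
  boolean exists(String volume, String bucket, String path);
}

final class CheckAccessSketch {
  static void checkAccess(FileStatusLookup fs, String volume, String bucket,
      String key) {
    // Instead of probing only the exact key (and its "key/" variant),
    // resolve the path the way getFileStatus does, so implicit parent
    // directories such as "testdir/" are recognised.
    if (!fs.exists(volume, bucket, key)) {
      throw new IllegalArgumentException(
          "Key not found, checkAccess failed. Key:" + key);
    }
    // ... evaluate the ACLs for the resolved key or directory here ...
  }
}
{code}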


was (Author: ljain):
The problem exists in general for checkAccess. There are two bugs associated 
with checkAccess.
 # In OzoneFileSystem use cases, checkAccess is not performed on the ancestors 
of the accessed path. Currently, while accessing a/b/c.txt we do not check 
access for a/ and a/b/; we do an access check only for the path a/b/c.txt.
 # Since HDDS-1481, mkdir no longer creates the ancestor directories if they 
do not exist. The checkAccess method only checks for the key provided and 
therefore fails with a KEY_NOT_FOUND error. It should check for the existence 
of a directory using getFileStatus.

KeyManagerImpl#checkAccess:1645-1657
{code:java}
OmKeyInfo keyInfo = metadataManager.getKeyTable().get(objectKey);
if (keyInfo == null) {
  objectKey = OzoneFSUtils.addTrailingSlashIfNeeded(objectKey);
  keyInfo = metadataManager.getKeyTable().get(objectKey);
  
  if(keyInfo == null) {
keyInfo = metadataManager.getOpenKeyTable().get(objectKey);
if (keyInfo == null) {
  throw new OMException("Key not found, checkAccess failed. Key:" +
  objectKey, KEY_NOT_FOUND);
}
  }
}
{code}
Example illustrating problem 2:
{code:java}
ozone sh key list o3://om/fstest/bucket1/
[ {
"version" : 0,
"md5hash" : null,
"createdOn" : "Thu, 25 Jul 2019 11:26:02 GMT",
"modifiedOn" : "Thu, 25 Jul 2019 11:26:02 GMT",
"size" : 0,
"keyName" : "testdir/deep/",
"type" : null
}, {
"version" : 0,
"md5hash" : null,
"createdOn" : "Thu, 25 Jul 2019 11:26:09 GMT",
"modifiedOn" : "Thu, 01 Jan 1970 00:12:54 GMT",
"size" : 22808,
"keyName" : "testdir/deep/MOVED.TXT",
"type" : null
}, {
"version" : 0,
"md5hash" : null,
"createdOn" : "Thu, 25 Jul 2019 11:26:18 GMT",
"modifiedOn" : "Thu, 01 Jan 1970 00:12:44 GMT",
"size" : 22808,
"keyName" : "testdir/deep/PUTFILE.txt",
"type" : null
} ]

ozone sh key info o3://om/fstest/bucket1/testdir
KEY_NOT_FOUND Key not found, checkAccess failed. Key:/fstest/bucket1/testdir/
{code}

> parent directories not found in secure setup due to ACL check
> -
>
> Key: HDDS-1834
> URL: https://issues.apache.org/jira/browse/HDDS-1834
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Reporter: Doroszlai, Attila
>Assignee: Doroszlai, Attila
>Priority: Blocker
>
> ozonesecure-ozonefs acceptance test is failing, because {{ozone fs -mkdir 
> -p}} only creates key for the specific directory, not its parents.
> {noformat}
> ozone fs -mkdir -p o3fs://bucket1.fstest/testdir/deep
> {noformat}
> Previous result:
> {noformat:title=https://ci.anzix.net/job/ozone-nightly/176/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2}
> $ ozone sh key 

[jira] [Commented] (HDDS-1834) parent directories not found in secure setup due to ACL check

2019-07-25 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892790#comment-16892790
 ] 

Lokesh Jain commented on HDDS-1834:
---

The problem exists in general for checkAccess. There are two bugs associated 
with checkAccess.
 # In OzoneFileSystem use cases, checkAccess is not performed on the ancestors 
of the accessed path. Currently, while accessing a/b/c.txt we do not check 
access for a/ and a/b/; we do an access check only for the path a/b/c.txt.
 # Since HDDS-1481, mkdir no longer creates the ancestor directories if they 
do not exist. The checkAccess method only checks for the key provided and 
therefore fails with a KEY_NOT_FOUND error. It should check for the existence 
of a directory using getFileStatus.

KeyManagerImpl#checkAccess:1645-1657
{code:java}
OmKeyInfo keyInfo = metadataManager.getKeyTable().get(objectKey);
if (keyInfo == null) {
  objectKey = OzoneFSUtils.addTrailingSlashIfNeeded(objectKey);
  keyInfo = metadataManager.getKeyTable().get(objectKey);
  
  if(keyInfo == null) {
keyInfo = metadataManager.getOpenKeyTable().get(objectKey);
if (keyInfo == null) {
  throw new OMException("Key not found, checkAccess failed. Key:" +
  objectKey, KEY_NOT_FOUND);
}
  }
}
{code}
Example illustrating problem 2:
{code:java}
ozone sh key list o3://om/fstest/bucket1/
[ {
"version" : 0,
"md5hash" : null,
"createdOn" : "Thu, 25 Jul 2019 11:26:02 GMT",
"modifiedOn" : "Thu, 25 Jul 2019 11:26:02 GMT",
"size" : 0,
"keyName" : "testdir/deep/",
"type" : null
}, {
"version" : 0,
"md5hash" : null,
"createdOn" : "Thu, 25 Jul 2019 11:26:09 GMT",
"modifiedOn" : "Thu, 01 Jan 1970 00:12:54 GMT",
"size" : 22808,
"keyName" : "testdir/deep/MOVED.TXT",
"type" : null
}, {
"version" : 0,
"md5hash" : null,
"createdOn" : "Thu, 25 Jul 2019 11:26:18 GMT",
"modifiedOn" : "Thu, 01 Jan 1970 00:12:44 GMT",
"size" : 22808,
"keyName" : "testdir/deep/PUTFILE.txt",
"type" : null
} ]

ozone sh key info o3://om/fstest/bucket1/testdir
KEY_NOT_FOUND Key not found, checkAccess failed. Key:/fstest/bucket1/testdir/
{code}

> parent directories not found in secure setup due to ACL check
> -
>
> Key: HDDS-1834
> URL: https://issues.apache.org/jira/browse/HDDS-1834
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Reporter: Doroszlai, Attila
>Assignee: Doroszlai, Attila
>Priority: Blocker
>
> ozonesecure-ozonefs acceptance test is failing, because {{ozone fs -mkdir 
> -p}} only creates key for the specific directory, not its parents.
> {noformat}
> ozone fs -mkdir -p o3fs://bucket1.fstest/testdir/deep
> {noformat}
> Previous result:
> {noformat:title=https://ci.anzix.net/job/ozone-nightly/176/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2}
> $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r 
> '.[].keyName'
> testdir/
> testdir/deep/
> {noformat}
> Current result:
> {noformat:title=https://ci.anzix.net/job/ozone-nightly/177/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2}
> $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r 
> '.[].keyName'
> testdir/deep/
> {noformat}
> The failure happens on first operation that tries to use {{testdir/}} 
> directly:
> {noformat}
> $ ozone fs -touch o3fs://bucket1.fstest/testdir/TOUCHFILE.txt
> ls: `o3fs://bucket1.fstest/testdir': No such file or directory
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1834) ozone fs -mkdir -p does not create parent directories in ozonesecure

2019-07-25 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892634#comment-16892634
 ] 

Lokesh Jain commented on HDDS-1834:
---

HDDS-1481 changed the mkdir logic for OzoneFileSystem. Earlier, all the parent 
directories were created as part of mkdir; we changed that to add a key only 
for the corresponding directory.

The failure here might be related to ACLs being enabled in the ozonesecure 
compose file.

> ozone fs -mkdir -p does not create parent directories in ozonesecure
> 
>
> Key: HDDS-1834
> URL: https://issues.apache.org/jira/browse/HDDS-1834
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Reporter: Doroszlai, Attila
>Priority: Blocker
>
> ozonesecure-ozonefs acceptance test is failing, because {{ozone fs -mkdir 
> -p}} only creates key for the specific directory, not its parents.
> {noformat}
> ozone fs -mkdir -p o3fs://bucket1.fstest/testdir/deep
> {noformat}
> Previous result:
> {noformat:title=https://ci.anzix.net/job/ozone-nightly/176/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2}
> $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r 
> '.[].keyName'
> testdir/
> testdir/deep/
> {noformat}
> Current result:
> {noformat:title=https://ci.anzix.net/job/ozone-nightly/177/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2}
> $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r 
> '.[].keyName'
> testdir/deep/
> {noformat}
> The failure happens on first operation that tries to use {{testdir/}} 
> directly:
> {noformat}
> $ ozone fs -touch o3fs://bucket1.fstest/testdir/TOUCHFILE.txt
> ls: `o3fs://bucket1.fstest/testdir': No such file or directory
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1816) ContainerStateMachine should limit number of pending apply transactions

2019-07-24 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1816:
--
Status: Patch Available  (was: Open)

> ContainerStateMachine should limit number of pending apply transactions
> ---
>
> Key: HDDS-1816
> URL: https://issues.apache.org/jira/browse/HDDS-1816
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ContainerStateMachine should limit the number of pending apply transactions 
> in order to avoid excessive heap usage by the pending transactions.
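A minimal sketch of one way to bound the pending applies, assuming a 
semaphore-based back-pressure approach (the actual ContainerStateMachine 
internals differ):
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;

// Sketch only: bounds in-flight apply work with a semaphore.
final class BoundedApplySketch {
  private final Semaphore pendingApplies;

  BoundedApplySketch(int maxPending) {
    this.pendingApplies = new Semaphore(maxPending);
  }

  CompletableFuture<Void> applyTransaction(Runnable work) {
    // Blocks the caller once maxPending applies are outstanding, applying
    // back-pressure instead of queueing unbounded work on the heap.
    pendingApplies.acquireUninterruptibly();
    return CompletableFuture.runAsync(work)
        .whenComplete((v, t) -> pendingApplies.release());
  }
}
{code}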



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1816) ContainerStateMachine should limit number of pending apply transactions

2019-07-22 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890287#comment-16890287
 ] 

Lokesh Jain commented on HDDS-1816:
---

[~nandakumar131] It is good to have, but not a blocker for the 0.4.1 release.

> ContainerStateMachine should limit number of pending apply transactions
> ---
>
> Key: HDDS-1816
> URL: https://issues.apache.org/jira/browse/HDDS-1816
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>
> ContainerStateMachine should limit the number of pending apply transactions 
> in order to avoid excessive heap usage by the pending transactions.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1834) ozone fs -mkdir -p does not create parent directories in ozonesecure

2019-07-22 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain reassigned HDDS-1834:
-

Assignee: (was: Lokesh Jain)

> ozone fs -mkdir -p does not create parent directories in ozonesecure
> 
>
> Key: HDDS-1834
> URL: https://issues.apache.org/jira/browse/HDDS-1834
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Reporter: Doroszlai, Attila
>Priority: Major
>
> ozonesecure-ozonefs acceptance test is failing, because {{ozone fs -mkdir 
> -p}} only creates key for the specific directory, not its parents.
> {noformat}
> ozone fs -mkdir -p o3fs://bucket1.fstest/testdir/deep
> {noformat}
> Previous result:
> {noformat:title=https://ci.anzix.net/job/ozone-nightly/176/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2}
> $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r 
> '.[].keyName'
> testdir/
> testdir/deep/
> {noformat}
> Current result:
> {noformat:title=https://ci.anzix.net/job/ozone-nightly/177/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2}
> $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r 
> '.[].keyName'
> testdir/deep/
> {noformat}
> The failure happens on first operation that tries to use {{testdir/}} 
> directly:
> {noformat}
> $ ozone fs -touch o3fs://bucket1.fstest/testdir/TOUCHFILE.txt
> ls: `o3fs://bucket1.fstest/testdir': No such file or directory
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-1834) ozone fs -mkdir -p does not create parent directories

2019-07-19 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888798#comment-16888798
 ] 

Lokesh Jain edited comment on HDDS-1834 at 7/19/19 11:28 AM:
-

[~adoroszlai] Thanks for reporting the issue! On my local setup it is working.  

 
{code:java}
hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -mkdir -p 
o3fs://bucket1.vol1/testdir/deep
hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -touch 
o3fs://bucket1.vol1/testdir/TOUCHFILE.txt
hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -ls 
o3fs://bucket1.vol1/testdir/TOUCHFILE.txt
-rw-rw-rw- 1 hadoop hadoop 0 2019-07-19 11:19 
o3fs://bucket1.vol1/testdir/TOUCHFILE.txt
hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -ls 
o3fs://bucket1.vol1/testdir/
Found 2 items
-rw-rw-rw- 1 hadoop hadoop 0 2019-07-19 11:19 
o3fs://bucket1.vol1/testdir/TOUCHFILE.txt
drwxrwxrwx - hadoop hadoop 0 2019-07-19 11:19 o3fs://bucket1.vol1/testdir/deep
hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -ls o3fs://bucket1.vol1/
Found 1 items
drwxrwxrwx - hadoop hadoop 0 2019-07-19 11:19 o3fs://bucket1.vol1/testdir
{code}
 

 


was (Author: ljain):
[~adoroszlai] Thanks for reporting the issue! On my local setup it is working.  

 
{code:java}
hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -mkdir -p 
o3fs://bucket1.vol1/testdir/deep
hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -touch 
o3fs://bucket1.vol1/testdir/TOUCHFILE.txt
hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -ls 
o3fs://bucket1.vol1/testdir/TOUCHFILE.txt
-rw-rw-rw- 1 hadoop hadoop 0 2019-07-19 11:19 
o3fs://bucket1.vol1/testdir/TOUCHFILE.txt
hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -ls 
o3fs://bucket1.vol1/testdir/
Found 2 items
-rw-rw-rw- 1 hadoop hadoop 0 2019-07-19 11:19 
o3fs://bucket1.vol1/testdir/TOUCHFILE.txt
drwxrwxrwx - hadoop hadoop 0 2019-07-19 11:19 o3fs://bucket1.vol1/testdir/deep
hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -ls o3fs://bucket1.vol1/
Found 1 items
drwxrwxrwx - hadoop hadoop 0 2019-07-19 11:19 o3fs://bucket1.vol1/testdir
{code}
 

 

> ozone fs -mkdir -p does not create parent directories
> -
>
> Key: HDDS-1834
> URL: https://issues.apache.org/jira/browse/HDDS-1834
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Reporter: Doroszlai, Attila
>Assignee: Lokesh Jain
>Priority: Major
>
> ozonesecure-ozonefs acceptance test is failing, because {{ozone fs -mkdir 
> -p}} only creates key for the specific directory, not its parents.
> {noformat}
> ozone fs -mkdir -p o3fs://bucket1.fstest/testdir/deep
> {noformat}
> Previous result:
> {noformat:title=https://ci.anzix.net/job/ozone-nightly/176/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2}
> $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r 
> '.[].keyName'
> testdir/
> testdir/deep/
> {noformat}
> Current result:
> {noformat:title=https://ci.anzix.net/job/ozone-nightly/177/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2}
> $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r 
> '.[].keyName'
> testdir/deep/
> {noformat}
> The failure happens on first operation that tries to use {{testdir/}} 
> directly:
> {noformat}
> $ ozone fs -touch o3fs://bucket1.fstest/testdir/TOUCHFILE.txt
> ls: `o3fs://bucket1.fstest/testdir': No such file or directory
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1834) ozone fs -mkdir -p does not create parent directories

2019-07-19 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888798#comment-16888798
 ] 

Lokesh Jain commented on HDDS-1834:
---

[~adoroszlai] Thanks for reporting the issue! On my local setup it is working.  

 
{code:java}
hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -mkdir -p 
o3fs://bucket1.vol1/testdir/deep
hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -touch 
o3fs://bucket1.vol1/testdir/TOUCHFILE.txt
hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -ls 
o3fs://bucket1.vol1/testdir/TOUCHFILE.txt
-rw-rw-rw- 1 hadoop hadoop 0 2019-07-19 11:19 
o3fs://bucket1.vol1/testdir/TOUCHFILE.txt
hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -ls 
o3fs://bucket1.vol1/testdir/
Found 2 items
-rw-rw-rw- 1 hadoop hadoop 0 2019-07-19 11:19 
o3fs://bucket1.vol1/testdir/TOUCHFILE.txt
drwxrwxrwx - hadoop hadoop 0 2019-07-19 11:19 o3fs://bucket1.vol1/testdir/deep
hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -ls o3fs://bucket1.vol1/
Found 1 items
drwxrwxrwx - hadoop hadoop 0 2019-07-19 11:19 o3fs://bucket1.vol1/testdir
{code}
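For anyone following along, a minimal sketch of the parent-creation loop that 
{{mkdir -p}} semantics require; {{keyExists}} and {{createDirectoryKey}} are 
hypothetical stand-ins for the adapter calls, not the actual 
BasicOzoneFileSystem API.
{code:java}
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;
import org.apache.hadoop.fs.Path;

// Walk up from the target path, remember every missing ancestor, then create
// them shallowest-first so "testdir/" exists before "testdir/deep/".
void mkdirWithParents(Path path) throws IOException {
  Deque<Path> missing = new ArrayDeque<>();
  for (Path p = path; p != null && !keyExists(p); p = p.getParent()) {
    missing.push(p); // deepest path is pushed first, shallowest ends on top
  }
  while (!missing.isEmpty()) {
    createDirectoryKey(missing.pop()); // parents are created before children
  }
}
{code}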
 

 

> ozone fs -mkdir -p does not create parent directories
> -
>
> Key: HDDS-1834
> URL: https://issues.apache.org/jira/browse/HDDS-1834
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Reporter: Doroszlai, Attila
>Assignee: Lokesh Jain
>Priority: Major
>
> The ozonesecure-ozonefs acceptance test is failing, because {{ozone fs -mkdir 
> -p}} only creates a key for the specified directory, not its parents.
> {noformat}
> ozone fs -mkdir -p o3fs://bucket1.fstest/testdir/deep
> {noformat}
> Previous result:
> {noformat:title=https://ci.anzix.net/job/ozone-nightly/176/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2}
> $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r 
> '.[].keyName'
> testdir/
> testdir/deep/
> {noformat}
> Current result:
> {noformat:title=https://ci.anzix.net/job/ozone-nightly/177/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2}
> $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r 
> '.[].keyName'
> testdir/deep/
> {noformat}
> The failure happens on the first operation that tries to use {{testdir/}} 
> directly:
> {noformat}
> $ ozone fs -touch o3fs://bucket1.fstest/testdir/TOUCHFILE.txt
> ls: `o3fs://bucket1.fstest/testdir': No such file or directory
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1834) ozone fs -mkdir -p does not create parent directories

2019-07-19 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain reassigned HDDS-1834:
-

Assignee: Lokesh Jain

> ozone fs -mkdir -p does not create parent directories
> -
>
> Key: HDDS-1834
> URL: https://issues.apache.org/jira/browse/HDDS-1834
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Reporter: Doroszlai, Attila
>Assignee: Lokesh Jain
>Priority: Major
>
> The ozonesecure-ozonefs acceptance test is failing, because {{ozone fs -mkdir 
> -p}} only creates a key for the specified directory, not its parents.
> {noformat}
> ozone fs -mkdir -p o3fs://bucket1.fstest/testdir/deep
> {noformat}
> Previous result:
> {noformat:title=https://ci.anzix.net/job/ozone-nightly/176/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2}
> $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r 
> '.[].keyName'
> testdir/
> testdir/deep/
> {noformat}
> Current result:
> {noformat:title=https://ci.anzix.net/job/ozone-nightly/177/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2}
> $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r 
> '.[].keyName'
> testdir/deep/
> {noformat}
> The failure happens on the first operation that tries to use {{testdir/}} 
> directly:
> {noformat}
> $ ozone fs -touch o3fs://bucket1.fstest/testdir/TOUCHFILE.txt
> ls: `o3fs://bucket1.fstest/testdir': No such file or directory
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1824) IllegalArgumentException in NetworkTopologyImpl causes SCM to shut down

2019-07-18 Thread Lokesh Jain (JIRA)
Lokesh Jain created HDDS-1824:
-

 Summary: IllegalArgumentException in NetworkTopologyImpl causes 
SCM to shut down
 Key: HDDS-1824
 URL: https://issues.apache.org/jira/browse/HDDS-1824
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Reporter: Lokesh Jain


 

 
{code:java}
2019-07-18 02:22:18,005 ERROR 
org.apache.hadoop.hdds.scm.container.ReplicationManager: Exception in 
Replication Monitor Thread.
java.lang.IllegalArgumentException: Affinity node /default-rack/10.17.213.25 is 
not a member of topology
at 
org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.checkAffinityNode(NetworkTopologyImpl.java:780)
at 
org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.chooseRandom(NetworkTopologyImpl.java:408)
at 
org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseNode(SCMContainerPlacementRackAware.java:242)
at 
org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseDatanodes(SCMContainerPlacementRackAware.java:168)
at 
org.apache.hadoop.hdds.scm.container.ReplicationManager.handleUnderReplicatedContainer(ReplicationManager.java:487)
at 
org.apache.hadoop.hdds.scm.container.ReplicationManager.processContainer(ReplicationManager.java:293)
at 
java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649)
at java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
at 
org.apache.hadoop.hdds.scm.container.ReplicationManager.run(ReplicationManager.java:205)
at java.lang.Thread.run(Thread.java:745)
2019-07-18 02:22:18,008 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
status 1: java.lang.IllegalArgumentException: Affinity node 
/default-rack/10.17.213.25 is not a member of topology
2019-07-18 02:22:18,010 INFO 
org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: SHUTDOWN_MSG:
{code}
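A guard of the following shape would keep one bad container from taking down 
SCM; this is a sketch of the idea only, with {{containers}}, 
{{processContainer}} and {{LOG}} loosely mirroring the ReplicationManager 
fields rather than quoting the actual fix.
{code:java}
// Isolate per-container failures inside the replication monitor loop so an
// unexpected RuntimeException (like the IllegalArgumentException above) is
// logged and skipped instead of propagating up and triggering ExitUtil.
for (ContainerID id : containers) {
  try {
    processContainer(id);
  } catch (RuntimeException e) {
    LOG.error("Replication monitor failed to process container {}", id, e);
  }
}
{code}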
 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1767) ContainerStateMachine should have its own executors for executing applyTransaction calls

2019-07-18 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1767:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> ContainerStateMachine should have its own executors for executing 
> applyTransaction calls
> 
>
> Key: HDDS-1767
> URL: https://issues.apache.org/jira/browse/HDDS-1767
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently ContainerStateMachine uses the executors provided by 
> XceiverServerRatis for executing applyTransaction calls. This results in 
> two or more ContainerStateMachines sharing the same set of executors. Delay 
> or load in one ContainerStateMachine would then adversely affect the 
> performance of the other state machines. It is better to have a separate 
> set of executors for each ContainerStateMachine.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1481) Cleanup BasicOzoneFileSystem#mkdir

2019-07-18 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1481:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Cleanup BasicOzoneFileSystem#mkdir
> --
>
> Key: HDDS-1481
> URL: https://issues.apache.org/jira/browse/HDDS-1481
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently BasicOzoneFileSystem#mkdir does not have the optimizations made in 
> HDDS-1300. The changes for this function were missed in HDDS-1460.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1481) Cleanup BasicOzoneFileSystem#mkdir

2019-07-17 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1481:
--
Status: Patch Available  (was: Open)

> Cleanup BasicOzoneFileSystem#mkdir
> --
>
> Key: HDDS-1481
> URL: https://issues.apache.org/jira/browse/HDDS-1481
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently BasicOzoneFileSystem#mkdir does not have the optimizations made in 
> HDDS-1300. The changes for this function were missed in HDDS-1460.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1816) ContainerStateMachine should limit number of pending apply transactions

2019-07-17 Thread Lokesh Jain (JIRA)
Lokesh Jain created HDDS-1816:
-

 Summary: ContainerStateMachine should limit number of pending 
apply transactions
 Key: HDDS-1816
 URL: https://issues.apache.org/jira/browse/HDDS-1816
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Lokesh Jain
Assignee: Lokesh Jain


ContainerStateMachine should limit the number of pending apply transactions in 
order to avoid excessive heap usage by those transactions.
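One possible bounding strategy, sketched with a counting semaphore; the limit 
of 1024 and the method names are assumptions for illustration, not the 
eventual patch.
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;

class BoundedApplier {
  // Assumed bound on in-flight applyTransaction calls.
  private final Semaphore pendingApplies = new Semaphore(1024);

  CompletableFuture<Void> applyBounded(Runnable txn) throws InterruptedException {
    pendingApplies.acquire(); // blocks new applies once the bound is reached
    return CompletableFuture.runAsync(txn)
        .whenComplete((result, error) -> pendingApplies.release());
  }
}
{code}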



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1767) ContainerStateMachine should have its own executors for executing applyTransaction calls

2019-07-12 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1767:
--
Status: Patch Available  (was: Open)

> ContainerStateMachine should have its own executors for executing 
> applyTransaction calls
> 
>
> Key: HDDS-1767
> URL: https://issues.apache.org/jira/browse/HDDS-1767
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
>
> Currently ContainerStateMachine uses the executors provided by 
> XceiverServerRatis for executing applyTransaction calls. This results in 
> two or more ContainerStateMachines sharing the same set of executors. Delay 
> or load in one ContainerStateMachine would then adversely affect the 
> performance of the other state machines. It is better to have a separate 
> set of executors for each ContainerStateMachine.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1767) ContainerStateMachine should have its own executors for executing applyTransaction calls

2019-07-12 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1767:
--
Labels: pull-request-available  (was: )

> ContainerStateMachine should have its own executors for executing 
> applyTransaction calls
> 
>
> Key: HDDS-1767
> URL: https://issues.apache.org/jira/browse/HDDS-1767
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
>
> Currently ContainerStateMachine uses the executors provided by 
> XceiverServerRatis for executing applyTransaction calls. This results in 
> two or more ContainerStateMachines sharing the same set of executors. Delay 
> or load in one ContainerStateMachine would then adversely affect the 
> performance of the other state machines. It is better to have a separate 
> set of executors for each ContainerStateMachine.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Moved] (HDDS-1779) TestWatchForCommit tests are flaky

2019-07-10 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain moved RATIS-620 to HDDS-1779:
-

Fix Version/s: (was: 0.4.0)
Affects Version/s: (was: 0.4.0)
 Target Version/s: 0.4.1  (was: 0.4.0)
  Component/s: (was: client)
 Workflow: patch-available, re-open possible  (was: 
no-reopen-closed, patch-avail)
  Key: HDDS-1779  (was: RATIS-620)
  Project: Hadoop Distributed Data Store  (was: Ratis)

> TestWatchForCommit tests are flaky
> --
>
> Key: HDDS-1779
> URL: https://issues.apache.org/jira/browse/HDDS-1779
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1767) ContainerStateMachine should have its own executors for executing applyTransaction calls

2019-07-05 Thread Lokesh Jain (JIRA)
Lokesh Jain created HDDS-1767:
-

 Summary: ContainerStateMachine should have its own executors for 
executing applyTransaction calls
 Key: HDDS-1767
 URL: https://issues.apache.org/jira/browse/HDDS-1767
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Lokesh Jain
Assignee: Lokesh Jain


Currently ContainerStateMachine uses the executors provided by 
XceiverServerRatis for executing applyTransaction calls. This results in two 
or more ContainerStateMachines sharing the same set of executors. Delay or 
load in one ContainerStateMachine would then adversely affect the performance 
of the other state machines. It is better to have a separate set of executors 
for each ContainerStateMachine.
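A sketch of what per-state-machine executors could look like; the pool size, 
the container-to-executor mapping and the lifecycle are illustrative 
assumptions, not the committed design.
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Each ContainerStateMachine owns its own executors instead of sharing the
// server-wide pool, so a slow pipeline cannot starve the other pipelines.
class StateMachineExecutors {
  private final ExecutorService[] executors;

  StateMachineExecutors(int numExecutors) {
    executors = new ExecutorService[numExecutors];
    for (int i = 0; i < numExecutors; i++) {
      // Single-threaded executors keep a container's transactions in order.
      executors[i] = Executors.newSingleThreadExecutor();
    }
  }

  ExecutorService executorFor(long containerId) {
    // Pin each container to one executor deterministically.
    return executors[(int) Math.floorMod(containerId, (long) executors.length)];
  }

  void shutdown() {
    for (ExecutorService e : executors) {
      e.shutdown();
    }
  }
}
{code}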



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1766) ContainerStateMachine is unable to increment lastAppliedIndex

2019-07-05 Thread Lokesh Jain (JIRA)
Lokesh Jain created HDDS-1766:
-

 Summary: ContainerStateMachine is unable to increment 
lastAppliedIndex
 Key: HDDS-1766
 URL: https://issues.apache.org/jira/browse/HDDS-1766
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Lokesh Jain


ContainerStateMachine#updateLastApplied currently updates the 
lastAppliedTermIndex using the applyTransactionCompletionMap. There are null 
entries in the applyTransactionCompletionMap, which prevent the 
lastAppliedIndex from being incremented.
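The fix presumably advances the applied index only across contiguous 
completed entries; a sketch of that shape (the field and setter names are 
illustrative, not the actual ContainerStateMachine code):
{code:java}
// Advance lastApplied only while consecutive indices have a recorded term;
// a missing or null entry stops the walk so no transaction is skipped over.
void updateLastApplied() {
  long next = getLastAppliedTermIndex().getIndex() + 1;
  Long term;
  while ((term = applyTransactionCompletionMap.get(next)) != null) {
    updateLastAppliedTermIndex(term, next);
    applyTransactionCompletionMap.remove(next);
    next++;
  }
}
{code}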



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1750) Add block allocation metric for pipelines in SCM

2019-07-02 Thread Lokesh Jain (JIRA)
Lokesh Jain created HDDS-1750:
-

 Summary: Add block allocation metric for pipelines in SCM
 Key: HDDS-1750
 URL: https://issues.apache.org/jira/browse/HDDS-1750
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Lokesh Jain
Assignee: Lokesh Jain


This Jira aims to add block allocation metrics for pipelines in SCM. This 
would help in determining how block allocations are distributed among the 
various pipelines in SCM.
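A sketch of the counter side of such a metric; the map-based counters and the 
string pipeline id are illustrative, and a real implementation would plug 
into the Hadoop metrics system instead.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Count block allocations per pipeline so their distribution can be observed.
class PipelineAllocationMetrics {
  private final Map<String, LongAdder> allocations = new ConcurrentHashMap<>();

  void blockAllocated(String pipelineId) {
    allocations.computeIfAbsent(pipelineId, id -> new LongAdder()).increment();
  }

  long allocationCount(String pipelineId) {
    LongAdder count = allocations.get(pipelineId);
    return count == null ? 0 : count.sum();
  }
}
{code}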



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1626) Optimize allocateBlock for cases when excludeList is provided

2019-06-02 Thread Lokesh Jain (JIRA)
Lokesh Jain created HDDS-1626:
-

 Summary: Optimize allocateBlock for cases when excludeList is 
provided
 Key: HDDS-1626
 URL: https://issues.apache.org/jira/browse/HDDS-1626
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Lokesh Jain
Assignee: Lokesh Jain


This Jira aims to optimize allocateBlock for cases when an excludeList is 
provided, covering both the case where the excludeList is empty and the cases 
where it is not.
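The two cases suggest an early-out plus a single up-front filter; a sketch, 
with {{Pipeline}} and the id-based exclusion standing in for the actual SCM 
types and criteria:
{code:java}
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Fast path: when nothing is excluded, return the candidates untouched;
// otherwise filter the excluded pipelines out once instead of per-allocation.
static List<Pipeline> candidatePipelines(List<Pipeline> all,
    Set<String> excludedPipelineIds) {
  if (excludedPipelineIds.isEmpty()) {
    return all;
  }
  return all.stream()
      .filter(p -> !excludedPipelineIds.contains(p.getId()))
      .collect(Collectors.toList());
}
{code}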



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1461) Optimize listStatus api in OzoneFileSystem

2019-05-21 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1461:
--
   Resolution: Fixed
Fix Version/s: 0.5.0
   Status: Resolved  (was: Patch Available)

> Optimize listStatus api in OzoneFileSystem
> --
>
> Key: HDDS-1461
> URL: https://issues.apache.org/jira/browse/HDDS-1461
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Filesystem, Ozone Manager
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently listStatus makes multiple getFileStatus calls. This can be 
> optimized by converting them into a single RPC call for listStatus.
> listStatus also has to traverse a directory recursively in order to list its 
> immediate children, because in OzoneManager all the metadata is stored in 
> rocksdb sorted by key name. The Jira also aims to fix this by using the seek 
> API provided by rocksdb.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1503) Reduce garbage generated by non-netty threads in datanode ratis server

2019-05-14 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1503:
--
   Resolution: Fixed
Fix Version/s: 0.5.0
   Status: Resolved  (was: Patch Available)

> Reduce garbage generated by non-netty threads in datanode ratis server
> --
>
> Key: HDDS-1503
> URL: https://issues.apache.org/jira/browse/HDDS-1503
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> We use the gRPC protocol for RPC communication in Ratis. By default, thread 
> caches are created even for non-netty threads. This Jira aims to add a 
> default JVM parameter that disables thread caches for non-netty threads in 
> the datanode ratis server.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1503) Reduce garbage generated by non-netty threads in datanode ratis server

2019-05-13 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1503:
--
Status: Patch Available  (was: Open)

> Reduce garbage generated by non-netty threads in datanode ratis server
> --
>
> Key: HDDS-1503
> URL: https://issues.apache.org/jira/browse/HDDS-1503
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We use the gRPC protocol for RPC communication in Ratis. By default, thread 
> caches are created even for non-netty threads. This Jira aims to add a 
> default JVM parameter that disables thread caches for non-netty threads in 
> the datanode ratis server.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-12735) Make ContainerStateMachine#applyTransaction async

2019-05-11 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain resolved HDFS-12735.

Resolution: Duplicate

> Make ContainerStateMachine#applyTransaction async
> -
>
> Key: HDFS-12735
> URL: https://issues.apache.org/jira/browse/HDFS-12735
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: performance
> Attachments: HDFS-12735-HDFS-7240.000.patch, 
> HDFS-12735-HDFS-7240.001.patch, HDFS-12735-HDFS-7240.002.patch
>
>
> Currently ContainerStateMachine#applyTransaction makes a synchronous call to 
> dispatch client requests. The idea is to have a thread pool that dispatches 
> client requests and returns a CompletableFuture.
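The async shape described above, sketched; the request/response types and 
{{dispatcher}} are stand-ins for the container dispatcher, not the committed 
patch:
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Dispatch on a dedicated pool and return a CompletableFuture so the Ratis
// apply path is no longer blocked by the synchronous dispatch call.
private final ExecutorService applyExecutor = Executors.newFixedThreadPool(8);

CompletableFuture<ContainerResponse> applyTransactionAsync(ContainerRequest r) {
  return CompletableFuture.supplyAsync(() -> dispatcher.dispatch(r),
      applyExecutor);
}
{code}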



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1503) Reduce garbage generated by non-netty threads in datanode ratis server

2019-05-08 Thread Lokesh Jain (JIRA)
Lokesh Jain created HDDS-1503:
-

 Summary: Reduce garbage generated by non-netty threads in datanode 
ratis server
 Key: HDDS-1503
 URL: https://issues.apache.org/jira/browse/HDDS-1503
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Lokesh Jain
Assignee: Lokesh Jain


We use the gRPC protocol for RPC communication in Ratis. By default, thread 
caches are created even for non-netty threads. This Jira aims to add a default 
JVM parameter that disables thread caches for non-netty threads in the 
datanode ratis server.
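For context, netty's allocator reads a system property for this; a hedged 
sketch, assuming the property is set before the allocator class first loads 
(the equivalent JVM flag is -Dio.netty.allocator.useCacheForAllThreads=false):
{code:java}
// Disable netty's per-thread buffer caches for non-event-loop threads. This
// must run before PooledByteBufAllocator is initialized to take effect.
public final class DisableNettyThreadCaches {
  public static void main(String[] args) {
    System.setProperty("io.netty.allocator.useCacheForAllThreads", "false");
    // ... start the datanode ratis server after this point ...
  }
}
{code}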



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1481) Cleanup BasicOzoneFileSystem#mkdir

2019-05-01 Thread Lokesh Jain (JIRA)
Lokesh Jain created HDDS-1481:
-

 Summary: Cleanup BasicOzoneFileSystem#mkdir
 Key: HDDS-1481
 URL: https://issues.apache.org/jira/browse/HDDS-1481
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Filesystem
Reporter: Lokesh Jain
Assignee: Lokesh Jain


Currently BasicOzoneFileSystem#mkdir does not have the optimizations made in 
HDDS-1300. The changes for this function were missed in HDDS-1460.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1461) Optimize listStatus api in OzoneFileSystem

2019-04-29 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1461:
--
Status: Patch Available  (was: Open)

> Optimize listStatus api in OzoneFileSystem
> --
>
> Key: HDDS-1461
> URL: https://issues.apache.org/jira/browse/HDDS-1461
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Filesystem, Ozone Manager
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently listStatus makes multiple getFileStatus calls. This can be 
> optimized by converting them into a single RPC call for listStatus.
> listStatus also has to traverse a directory recursively in order to list its 
> immediate children, because in OzoneManager all the metadata is stored in 
> rocksdb sorted by key name. The Jira also aims to fix this by using the seek 
> API provided by rocksdb.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1460) Add the optimizations of HDDS-1300 to BasicOzoneFileSystem

2019-04-26 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1460:
--
   Resolution: Fixed
Fix Version/s: 0.5.0
   Status: Resolved  (was: Patch Available)

> Add the optimizations of HDDS-1300 to BasicOzoneFileSystem
> -
>
> Key: HDDS-1460
> URL: https://issues.apache.org/jira/browse/HDDS-1460
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Some of the optimizations made in HDDS-1300 were reverted in HDDS-1333. This 
> Jira aims to bring back those optimizations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1461) Optimize listStatus api in OzoneFileSystem

2019-04-24 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1461:
--
Summary: Optimize listStatus api in OzoneFileSystem  (was: Optimize 
listStatus api in OzoneFileStatus)

> Optimize listStatus api in OzoneFileSystem
> --
>
> Key: HDDS-1461
> URL: https://issues.apache.org/jira/browse/HDDS-1461
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Filesystem, Ozone Manager
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>
> Currently listStatus makes multiple getFileStatus calls. This can be 
> optimized by converting them into a single RPC call for listStatus.
> listStatus also has to traverse a directory recursively in order to list its 
> immediate children, because in OzoneManager all the metadata is stored in 
> rocksdb sorted by key name. The Jira also aims to fix this by using the seek 
> API provided by rocksdb.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1461) Optimize listStatus api in OzoneFileStatus

2019-04-24 Thread Lokesh Jain (JIRA)
Lokesh Jain created HDDS-1461:
-

 Summary: Optimize listStatus api in OzoneFileStatus
 Key: HDDS-1461
 URL: https://issues.apache.org/jira/browse/HDDS-1461
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
  Components: Ozone Filesystem, Ozone Manager
Reporter: Lokesh Jain
Assignee: Lokesh Jain


Currently listStatus makes multiple getFileStatus calls. This can be 
optimized by converting them into a single RPC call for listStatus.

listStatus also has to traverse a directory recursively in order to list its 
immediate children, because in OzoneManager all the metadata is stored in 
rocksdb sorted by key name. The Jira also aims to fix this by using the seek 
API provided by rocksdb.
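A sketch of the seek-based listing, assuming keys are full paths stored in 
lexicographic order as described above; a production version would 
additionally seek past each child's subtree instead of scanning through it:
{code:java}
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksIterator;

// List the immediate children of "dir/" with a single seek to the prefix,
// deriving each child name from the first path component after the prefix.
static Set<String> listImmediateChildren(RocksDB db, String dir) {
  Set<String> children = new LinkedHashSet<>();
  try (RocksIterator it = db.newIterator()) {
    for (it.seek(dir.getBytes(StandardCharsets.UTF_8)); it.isValid(); it.next()) {
      String key = new String(it.key(), StandardCharsets.UTF_8);
      if (!key.startsWith(dir)) {
        break; // past the directory's key range
      }
      String rest = key.substring(dir.length());
      if (rest.isEmpty()) {
        continue; // the directory key itself
      }
      int slash = rest.indexOf('/');
      children.add(slash < 0 ? rest : rest.substring(0, slash));
    }
  }
  return children;
}
{code}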



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1460) Add the optimizations of HDDS-1300 to BasicOzoneFileSystem

2019-04-24 Thread Lokesh Jain (JIRA)
Lokesh Jain created HDDS-1460:
-

 Summary: Add the optimizations of HDDS-1300 to BasicOzoneFileSystem
 Key: HDDS-1460
 URL: https://issues.apache.org/jira/browse/HDDS-1460
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Lokesh Jain
Assignee: Lokesh Jain


Some of the optimizations made in HDDS-1300 were reverted in HDDS-1333. This 
Jira aims to bring back those optimizations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1448) RatisPipelineProvider should only consider open pipeline while excluding dn for pipeline allocation

2019-04-22 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823698#comment-16823698
 ] 

Lokesh Jain commented on HDDS-1448:
---

The changes required for this Jira would enable multiple three-node pipelines 
on a datanode. It was implemented this way to make sure that a datanode is not 
part of more than one factor-three pipeline.

> RatisPipelineProvider should only consider open pipeline while excluding dn 
> for pipeline allocation
> ---
>
> Key: HDDS-1448
> URL: https://issues.apache.org/jira/browse/HDDS-1448
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: MiniOzoneChaosCluster
>
> While allocating pipelines, the Ratis pipeline provider considers all 
> pipelines irrespective of their state. This can lead to a case where all the 
> datanodes are up but the pipelines are in closing state in SCM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1405) ITestOzoneContractCreate is failing

2019-04-09 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain resolved HDDS-1405.
---
Resolution: Resolved

> ITestOzoneContractCreate is failing
> ---
>
> Key: HDDS-1405
> URL: https://issues.apache.org/jira/browse/HDDS-1405
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> ITestOzoneContractCreate and ITestOzoneContractMkdir are failing with 
> FileAlreadyExistsException. The issue is with an import in 
> BasicOzoneClientAdapterImpl. The class needs to import 
> org.apache.hadoop.fs.FileAlreadyExistsException but currently imports 
> java.nio.file.FileAlreadyExistsException.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1405) ITestOzoneContractCreate is failing

2019-04-09 Thread Lokesh Jain (JIRA)
Lokesh Jain created HDDS-1405:
-

 Summary: ITestOzoneContractCreate is failing
 Key: HDDS-1405
 URL: https://issues.apache.org/jira/browse/HDDS-1405
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Lokesh Jain
Assignee: Lokesh Jain


ITestOzoneContractCreate and ITestOzoneContractMkdir are failing with 
FileAlreadyExistsException. The issue is with an import in 
BasicOzoneClientAdapterImpl. The class needs to import 
org.apache.hadoop.fs.FileAlreadyExistsException but currently imports 
java.nio.file.FileAlreadyExistsException.
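The fix is effectively a one-line import change, since the FileSystem 
contract tests expect the Hadoop exception type rather than the JDK NIO one:
{code:java}
// Correct: the Hadoop filesystem exception expected by the contract tests.
import org.apache.hadoop.fs.FileAlreadyExistsException;
// Wrong: the JDK NIO type previously imported by mistake.
// import java.nio.file.FileAlreadyExistsException;
{code}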



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1301) Optimize recursive ozone filesystem apis

2019-04-08 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1301:
--
Status: Patch Available  (was: Open)

> Optimize recursive ozone filesystem apis
> 
>
> Key: HDDS-1301
> URL: https://issues.apache.org/jira/browse/HDDS-1301
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-1301.001.patch
>
>
> This Jira aims to optimize recursive APIs in the ozone file system. These 
> are the APIs with a recursive flag, which requires an operation to be 
> performed on all the children of a directory. The Jira would add support for 
> recursive APIs in Ozone Manager in order to reduce the number of RPC calls to 
> Ozone Manager. These operations are also currently not atomic; this Jira 
> would make all such operations in the ozone filesystem atomic.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1301) Optimize recursive ozone filesystem apis

2019-04-08 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812538#comment-16812538
 ] 

Lokesh Jain commented on HDDS-1301:
---

Uploaded the v1 patch for review. Will create a pull request for the same.

> Optimize recursive ozone filesystem apis
> 
>
> Key: HDDS-1301
> URL: https://issues.apache.org/jira/browse/HDDS-1301
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-1301.001.patch
>
>
> This Jira aims to optimize recursive APIs in the ozone file system. These 
> are the APIs with a recursive flag, which requires an operation to be 
> performed on all the children of a directory. The Jira would add support for 
> recursive APIs in Ozone Manager in order to reduce the number of RPC calls to 
> Ozone Manager. These operations are also currently not atomic; this Jira 
> would make all such operations in the ozone filesystem atomic.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1301) Optimize recursive ozone filesystem apis

2019-04-08 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1301:
--
Attachment: HDDS-1301.001.patch

> Optimize recursive ozone filesystem apis
> 
>
> Key: HDDS-1301
> URL: https://issues.apache.org/jira/browse/HDDS-1301
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-1301.001.patch
>
>
> This Jira aims to optimize recursive APIs in the ozone file system. These 
> are the APIs with a recursive flag, which requires an operation to be 
> performed on all the children of a directory. The Jira would add support for 
> recursive APIs in Ozone Manager in order to reduce the number of RPC calls to 
> Ozone Manager. These operations are also currently not atomic; this Jira 
> would make all such operations in the ozone filesystem atomic.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1294) ExcludeList should be an RPC Client config so that multiple streams can avoid the same error.

2019-04-05 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811164#comment-16811164
 ] 

Lokesh Jain commented on HDDS-1294:
---

[~shashikant] Thanks for updating the patch! In ExcludeList#getPipelineIds, 
iteration over the list would still need to be synchronized explicitly.
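For reference, a minimal illustration of the point: 
{{Collections.synchronizedList}} only guards individual calls, so iterating 
(or copying) still needs the list's monitor held explicitly.
{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

private final List<String> pipelineIds =
    Collections.synchronizedList(new ArrayList<>());

List<String> getPipelineIdsSnapshot() {
  synchronized (pipelineIds) { // required for safe iteration/copying
    return new ArrayList<>(pipelineIds);
  }
}
{code}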

> ExcludeList should be an RPC Client config so that multiple streams can 
> avoid the same error.
> ---
>
> Key: HDDS-1294
> URL: https://issues.apache.org/jira/browse/HDDS-1294
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: MiniOzoneChaosCluster
> Attachments: HDDS-1294.000.patch, HDDS-1294.001.patch
>
>
> ExcludeList is currently a per-BlockOutputStream value; this can result in 
> multiple keys created from the same client running into the same exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1294) ExcludeList should be an RPC Client config so that multiple streams can avoid the same error.

2019-04-05 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810704#comment-16810704
 ] 

Lokesh Jain commented on HDDS-1294:
---

[~shashikant] Thanks for working on this! The patch looks good to me. Please 
find my comments below.
 # ExcludeList.java - We should synchronize the getProtobuf and getPipelineIds 
calls as well.
 # TestCloseContainerHandlingByClient#testContainerExclusionWithMultipleClients: 
We should remove the ignore annotation and rename the function to 
testContainerExclusionWithMultipleStreams.

> ExcludeList should be an RPC Client config so that multiple streams can 
> avoid the same error.
> ---
>
> Key: HDDS-1294
> URL: https://issues.apache.org/jira/browse/HDDS-1294
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: MiniOzoneChaosCluster
> Attachments: HDDS-1294.000.patch
>
>
> ExcludeList is currently a per-BlockOutputStream value; this can result in 
> multiple keys created from the same client running into the same exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1349) Remove watchClient from XceiverClientRatis

2019-04-03 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809500#comment-16809500
 ] 

Lokesh Jain commented on HDDS-1349:
---

[~shashikant] Thanks for working on this! The patch looks good to me. +1.

> Remove watchClient from XceiverClientRatis
> --
>
> Key: HDDS-1349
> URL: https://issues.apache.org/jira/browse/HDDS-1349
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-1349.000.patch
>
>
> WatchForCommit now bypasses the sliding window of RaftClient, and hence 
> creating a new raft client for calling watchForCommit is not required, as it 
> won't block any subsequent calls. This Jira aims to remove the watchClient 
> from XceiverClientRatis.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDDS-1134) OzoneFileSystem#create should allocate at least one block for future writes.

2019-03-29 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain reopened HDDS-1134:
---

Reopening the issue as it was not fixed in HDDS-1300.

> OzoneFileSystem#create should allocate at least one block for future writes.
> ---
>
> Key: HDDS-1134
> URL: https://issues.apache.org/jira/browse/HDDS-1134
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Attachments: HDDS-1134.001.patch
>
>
> While opening a new key, OM should allocate at least one block for the key; 
> this should be done in case the client is not sure about the number of 
> blocks. However, for users of OzoneFS, if the key is being created for a 
> directory, then no blocks should be allocated.
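A sketch of the rule as described; {{createKeyEntry}}, {{allocateBlock}} and 
the session methods are hypothetical OM-side names, not the actual API:
{code:java}
// Allocate one block up front for file keys so the first write has somewhere
// to go, but never allocate blocks for directory keys.
OpenKeySession openKey(String keyName, boolean isDirectory) throws IOException {
  OpenKeySession session = createKeyEntry(keyName);
  if (!isDirectory) {
    session.addBlock(allocateBlock(keyName)); // at least one block for files
  }
  return session;
}
{code}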



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1300) Optimize non-recursive ozone filesystem apis

2019-03-29 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1300:
--
Resolution: Resolved
Status: Resolved  (was: Patch Available)

> Optimize non-recursive ozone filesystem apis
> 
>
> Key: HDDS-1300
> URL: https://issues.apache.org/jira/browse/HDDS-1300
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Filesystem, Ozone Manager
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-1300.001.patch, HDDS-1300.002.patch, 
> HDDS-1300.003.patch, HDDS-1300.004.patch, HDDS-1300.005.patch, 
> HDDS-1300.006.patch, HDDS-1300.007.patch, HDDS-1300.008.patch
>
>
> This Jira aims to optimize non-recursive APIs in the ozone file system. The 
> Jira would add support for such APIs in Ozone Manager in order to reduce the 
> number of RPC calls to Ozone Manager.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1300) Optimize non-recursive ozone filesystem apis

2019-03-29 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1300:
--
Fix Version/s: 0.5.0

> Optimize non-recursive ozone filesystem apis
> 
>
> Key: HDDS-1300
> URL: https://issues.apache.org/jira/browse/HDDS-1300
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Filesystem, Ozone Manager
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-1300.001.patch, HDDS-1300.002.patch, 
> HDDS-1300.003.patch, HDDS-1300.004.patch, HDDS-1300.005.patch, 
> HDDS-1300.006.patch, HDDS-1300.007.patch, HDDS-1300.008.patch
>
>
> This Jira aims to optimize non-recursive APIs in the ozone file system. The 
> Jira would add support for such APIs in Ozone Manager in order to reduce the 
> number of RPC calls to Ozone Manager.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1300) Optimize non-recursive ozone filesystem apis

2019-03-29 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805006#comment-16805006
 ] 

Lokesh Jain commented on HDDS-1300:
---

[~msingh] [~bharatviswa] Thanks for reviewing the patch! I have committed the 
patch to trunk.

> Optimize non-recursive ozone filesystem apis
> 
>
> Key: HDDS-1300
> URL: https://issues.apache.org/jira/browse/HDDS-1300
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Filesystem, Ozone Manager
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-1300.001.patch, HDDS-1300.002.patch, 
> HDDS-1300.003.patch, HDDS-1300.004.patch, HDDS-1300.005.patch, 
> HDDS-1300.006.patch, HDDS-1300.007.patch, HDDS-1300.008.patch
>
>
> This Jira aims to optimize non-recursive APIs in the ozone file system. The 
> Jira would add support for such APIs in Ozone Manager in order to reduce the 
> number of RPC calls to Ozone Manager.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1300) Optimize non-recursive ozone filesystem apis

2019-03-29 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804739#comment-16804739
 ] 

Lokesh Jain commented on HDDS-1300:
---

[~bharatviswa] Thanks for reviewing the patch! The v8 patch removes the 
allocateBlock call in the createFile function. The allocateBlock call can be 
added back in a follow-up jira.

> Optimize non-recursive ozone filesystem apis
> 
>
> Key: HDDS-1300
> URL: https://issues.apache.org/jira/browse/HDDS-1300
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Filesystem, Ozone Manager
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-1300.001.patch, HDDS-1300.002.patch, 
> HDDS-1300.003.patch, HDDS-1300.004.patch, HDDS-1300.005.patch, 
> HDDS-1300.006.patch, HDDS-1300.007.patch, HDDS-1300.008.patch
>
>
> This Jira aims to optimize non-recursive APIs in the ozone file system. The 
> Jira would add support for such APIs in Ozone Manager in order to reduce the 
> number of RPC calls to Ozone Manager.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1300) Optimize non-recursive ozone filesystem apis

2019-03-29 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-1300:
--
Attachment: HDDS-1300.008.patch

> Optimize non-recursive ozone filesystem apis
> 
>
> Key: HDDS-1300
> URL: https://issues.apache.org/jira/browse/HDDS-1300
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Filesystem, Ozone Manager
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-1300.001.patch, HDDS-1300.002.patch, 
> HDDS-1300.003.patch, HDDS-1300.004.patch, HDDS-1300.005.patch, 
> HDDS-1300.006.patch, HDDS-1300.007.patch, HDDS-1300.008.patch
>
>
> This Jira aims to optimize non-recursive APIs in the ozone file system. The 
> Jira would add support for such APIs in Ozone Manager in order to reduce the 
> number of RPC calls to Ozone Manager.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1300) Optimize non-recursive ozone filesystem apis

2019-03-28 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803954#comment-16803954
 ] 

Lokesh Jain commented on HDDS-1300:
---

[~msingh] Based on offline discussion, the v7 patch avoids the allocateBlock 
call while the lock is held in createFile.

> Optimize non-recursive ozone filesystem apis
> 
>
> Key: HDDS-1300
> URL: https://issues.apache.org/jira/browse/HDDS-1300
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Filesystem, Ozone Manager
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-1300.001.patch, HDDS-1300.002.patch, 
> HDDS-1300.003.patch, HDDS-1300.004.patch, HDDS-1300.005.patch, 
> HDDS-1300.006.patch, HDDS-1300.007.patch
>
>
> This Jira aims to optimize non-recursive APIs in the ozone file system. The 
> Jira would add support for such APIs in Ozone Manager in order to reduce the 
> number of RPC calls to Ozone Manager.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


