[jira] [Commented] (HDFS-7056) Snapshot support for truncate

2014-10-14 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171346#comment-14171346
 ] 

Hari Mankude commented on HDFS-7056:


Is the proposal to copy the list of blocks to the snapshot copy only when the 
file is truncated, or whenever a snapshot is taken, irrespective of whether the 
file has been truncated?

After the file is truncated and then appended again, will all subsequent 
snapshots of the file get a copy of the block list?

 Snapshot support for truncate
 -

 Key: HDFS-7056
 URL: https://issues.apache.org/jira/browse/HDFS-7056
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko

 Implementation of truncate in HDFS-3107 does not allow truncating files which 
 are in a snapshot. It is desirable to be able to truncate and still keep the 
 old state of the file in the snapshot.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5427) not able to read deleted files from snapshot directly under snapshottable dir after checkpoint and NN restart

2013-11-05 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814338#comment-13814338
 ] 

Hari Mankude commented on HDFS-5427:


Is this patch going to be backported to 2.2 also?

 not able to read deleted files from snapshot directly under snapshottable dir 
 after checkpoint and NN restart
 -

 Key: HDFS-5427
 URL: https://issues.apache.org/jira/browse/HDFS-5427
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
Priority: Blocker
 Fix For: 2.3.0

 Attachments: HDFS-5427-v2.patch, HDFS-5427.patch, HDFS-5427.patch


 1. allow snapshots under dir /foo
 2. create a file /foo/bar
 3. create a snapshot s1 under /foo
 4. delete the file /foo/bar
 5. wait till checkpoint or do saveNameSpace
 6. restart NN.
 7. Now try to read the file from snapshot /foo/.snapshot/s1/bar
 client will get BlockMissingException
 Reason: 
 while loading the deleted-file list for a snapshottable dir from the fsimage, 
 blocks were not updated in the blocksmap.
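
The failure mode described above can be sketched in a few lines of Python (a toy model; the names and structures are illustrative, not actual HDFS classes):

```python
# Simplified model of the HDFS-5427 failure mode: the fsimage loader
# registers blocks for live files but skips the snapshot deleted-file
# list, so reads through /.snapshot paths cannot resolve their blocks
# after an NN restart. All names here are illustrative only.

def load_fsimage(live_files, snapshot_deleted, register_snapshot_blocks):
    """Rebuild the (toy) block map from a (toy) fsimage."""
    blocks_map = {}
    for path, blocks in live_files.items():
        for b in blocks:
            blocks_map[b] = path
    if register_snapshot_blocks:       # the fix: also walk the deleted list
        for path, blocks in snapshot_deleted.items():
            for b in blocks:
                blocks_map[b] = path
    return blocks_map

live = {}                              # /foo/bar was deleted after snapshot s1
deleted = {"/foo/.snapshot/s1/bar": ["blk_1"]}

buggy = load_fsimage(live, deleted, register_snapshot_blocks=False)
fixed = load_fsimage(live, deleted, register_snapshot_blocks=True)

print("blk_1" in buggy)   # False -> BlockMissingException on read
print("blk_1" in fixed)   # True  -> snapshot read succeeds
```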



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-4872) Idempotent delete operation.

2013-06-05 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13675915#comment-13675915
 ] 

Hari Mankude commented on HDFS-4872:


modTime cannot be used as a unique parameter due to the race between delete and 
append. 

If HDFS had a cTime, it would have been a unique file-specific value in 
combination with the file path. 
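
A toy model of the ambiguity (illustrative only; HDFS inodes do not look like this):

```python
# Why (path, modTime) is ambiguous for a retried delete, while a
# per-generation creation stamp would not be: with a coarse wall clock,
# a file deleted and recreated at the same path can carry the same
# modTime, but a cTime assigned once per file generation always differs.

import itertools

_gen = itertools.count(1)          # monotonically increasing cTime source

def new_file(path, wall_clock):
    return {"path": path, "ctime": next(_gen), "mtime": wall_clock}

clock = 100                        # coarse clock: all events in one tick
old = new_file("/a/b", clock)      # generation 1
new = new_file("/a/b", clock)      # deleted, recreated, then appended
new["mtime"] = clock               # the append lands in the same clock tick

same_by_mtime = (old["path"], old["mtime"]) == (new["path"], new["mtime"])
same_by_ctime = (old["path"], old["ctime"]) == (new["path"], new["ctime"])
print(same_by_mtime)  # True  -> modTime cannot tell the generations apart
print(same_by_ctime)  # False -> path + cTime is unique per generation
```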

 Idempotent delete operation.
 

 Key: HDFS-4872
 URL: https://issues.apache.org/jira/browse/HDFS-4872
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.4-alpha
Reporter: Konstantin Shvachko

 Making delete idempotent is important to provide uninterrupted job execution 
 in case of HA failover.
 This is to discuss different approaches to idempotent implementation of 
 delete.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4873) callGetBlockLocations returns incorrect number of blocks for snapshotted files

2013-06-03 Thread Hari Mankude (JIRA)
Hari Mankude created HDFS-4873:
--

 Summary: callGetBlockLocations returns incorrect number of blocks 
for snapshotted files
 Key: HDFS-4873
 URL: https://issues.apache.org/jira/browse/HDFS-4873
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0
Reporter: Hari Mankude
Assignee: Jing Zhao


callGetBlockLocations() returns all the blocks of a file even when they are not 
present in the snap version



[jira] [Commented] (HDFS-4873) callGetBlockLocations returns incorrect number of blocks for snapshotted files

2013-06-03 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673558#comment-13673558
 ] 

Hari Mankude commented on HDFS-4873:


The sequence of operations for creating the problem

1. create a file of size one block
2. take a snapshot
3. append some data to this file.
4. use DfsClient.callGetBlockLocations() to get block locations of the snapshot 
version of the file. The file len is specified as Long.MAX_VALUE.
5. This call returns two LocatedBlocks for the snapshot version of the file 
instead of one block.
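
The repro above can be modeled in a few lines (a simplified sketch; block names and sizes are made up, and this is not DfsClient's actual logic):

```python
# Toy model of the repro: the snapshot view of the file should expose
# only the one block that existed when s1 was taken; an uncapped length
# request (Long.MAX_VALUE) over the current block list leaks the block
# appended after the snapshot.

BLOCK = 128                                      # toy block size in bytes

blocks = [("blk_1", BLOCK)]                      # file is one full block
snapshot_len = sum(size for _, size in blocks)   # snapshot s1 taken here
blocks.append(("blk_2", 40))                     # append after the snapshot

def located_blocks(blocks, length):
    """Return the blocks overlapping the byte range [0, length)."""
    out, offset = [], 0
    for blk, size in blocks:
        if offset >= length:
            break
        out.append(blk)
        offset += size
    return out

print(located_blocks(blocks, 2**63 - 1))     # ['blk_1', 'blk_2']  (the bug)
print(located_blocks(blocks, snapshot_len))  # ['blk_1']           (expected)
```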

 callGetBlockLocations returns incorrect number of blocks for snapshotted files
 --

 Key: HDFS-4873
 URL: https://issues.apache.org/jira/browse/HDFS-4873
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0
Reporter: Hari Mankude
Assignee: Jing Zhao

 callGetBlockLocations() returns all the blocks of a file even when they are 
 not present in the snap version



[jira] [Commented] (HDFS-4873) callGetBlockLocations returns incorrect number of blocks for snapshotted files

2013-06-03 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673562#comment-13673562
 ] 

Hari Mankude commented on HDFS-4873:


Looks like the problem is in getBlockLocationsUpdateTimes(), where the length is 
not truncated to fileSize before calling createLocatedBlocks(). Other solutions 
are possible if the snapshot inode is passed in.
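
The suggested fix amounts to clamping the requested range, roughly as follows (a sketch; the function name and signature are illustrative, not the actual getBlockLocationsUpdateTimes() code):

```python
# Clamp the requested byte range [offset, offset + length) to the file
# size recorded for the snapshot before building the located-block list,
# so a Long.MAX_VALUE length cannot reach past the snapshot's last block.

def clamp_range(offset, length, file_size):
    """Clamp a requested byte range to the snapshot's file size."""
    end = min(offset + length, file_size)
    return offset, max(0, end - offset)

# The file was one 128-byte block at snapshot time; caller asks for "everything".
offset, length = clamp_range(0, 2**63 - 1, file_size=128)
print((offset, length))   # (0, 128) -> only snapshot-era blocks are returned
```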




[jira] [Commented] (HDFS-4817) make HDFS advisory caching configurable on a per-file basis

2013-05-17 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13660724#comment-13660724
 ] 

Hari Mankude commented on HDFS-4817:


Colin,

Can this feature be extended to determine where data needs to be stored on the 
DN? For example, a DN might have SSDs and SATA/SAS drives, and depending on 
hints provided by the user about the access patterns (random reads vs. long 
sequential reads), it might be useful to put the data on SSDs vs. SATA. I 
understand that the NN has to be involved to keep this information persistent 
across block relocation. 

A nice goal would be to make the DN smarter than it is right now (or give it 
the ability to learn with minimal involvement from the NN), given that nodes 
can have storage devices with vastly different characteristics. Another option 
is to use access patterns to move data across the various storages in a DN 
[a sort of HSM].

It looks like the current patch is mainly about managing the OS page cache. 
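
The tiering idea above could be sketched as follows (purely hypothetical; no such API exists in the patch under discussion, and the hint names are made up):

```python
# Hypothetical sketch of hint-driven tier selection: a DataNode with
# mixed media maps a client-supplied access-pattern hint to a storage
# tier when placing a replica. The hint names and tiers are invented
# for illustration.

TIERS = {
    "RANDOM_READ": "ssd",        # seek-heavy workloads benefit from SSD
    "SEQUENTIAL_READ": "hdd",    # long scans are fine on SATA/SAS
    "WRITE_ONCE": "hdd",
}

def choose_tier(hint, default="hdd"):
    """Pick a storage tier for a replica from an access-pattern hint."""
    return TIERS.get(hint, default)

print(choose_tier("RANDOM_READ"))      # ssd
print(choose_tier("SEQUENTIAL_READ"))  # hdd
print(choose_tier(None))               # hdd (no hint -> default placement)
```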

 make HDFS advisory caching configurable on a per-file basis
 ---

 Key: HDFS-4817
 URL: https://issues.apache.org/jira/browse/HDFS-4817
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-4817.001.patch


 HADOOP-7753 and related JIRAs introduced some performance optimizations for 
 the DataNode.  One of them was readahead.  When readahead is enabled, the 
 DataNode starts reading the next bytes it thinks it will need in the block 
 file, before the client requests them.  This helps hide the latency of 
 rotational media and send larger reads down to the device.  Another 
 optimization was drop-behind.  Using this optimization, we could remove 
 files from the Linux page cache after they were no longer needed.
 Using {{dfs.datanode.drop.cache.behind.writes}} and 
 {{dfs.datanode.drop.cache.behind.reads}} can improve performance  
 substantially on many MapReduce jobs.  In our internal benchmarks, we have 
 seen speedups of 40% on certain workloads.  The reason is that if we know 
 the block data will not be read again any time soon, keeping it out of memory 
 allows more memory to be used by the other processes on the system.  See 
 HADOOP-7714 for more benchmarks.
 We would like to turn on these configurations on a per-file or per-client 
 basis, rather than on the DataNode as a whole.  This will allow more users to 
 actually make use of them.  It would also be good to add unit tests for the 
 drop-cache code path, to ensure that it is functioning as we expect.



[jira] [Commented] (HDFS-4817) make HDFS advisory caching configurable on a per-file basis

2013-05-17 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13660880#comment-13660880
 ] 

Hari Mankude commented on HDFS-4817:


I would look at the patch as giving the user the ability to provide hints to 
the DN regarding access patterns (random reads, sequential reads, write-once 
only, multiple accesses, etc.). It is incidental that these hints are currently 
used to manage the page cache. The same or similar hints could be used for 
moving blocks to different storage tiers at the DN. 

Another suggestion I had is to provide an fadvise()-like interface on the 
iostream that a user can use to send hints.

I am aware of HDFS-4672. It is a complicated but correct way of managing 
storage pools.




[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-27 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643734#comment-13643734
 ] 

Hari Mankude commented on HDFS-4750:


I would recommend thinking through NFS write operations. The client does 
caching, and the page cache can result in a lot of weirdness. For example, as 
long as the data is cached in the client's page cache, the client can do random 
writes and overwrites. When the page cache is flushed to the HDFS data store, 
some writes would fail (they translate to overwrites in HDFS) while others 
might succeed (the offsets happen to be appends). 

An alternative to consider for supporting NFS writes is to require clients to 
do NFS mounts with directio enabled. Directio will bypass the client cache and 
might alleviate some of the funky behavior.



 Support NFSv3 interface to HDFS
 ---

 Key: HDFS-4750
 URL: https://issues.apache.org/jira/browse/HDFS-4750
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 3.0.0
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HADOOP-NFS-Proposal.pdf


 Access to HDFS is usually done through the HDFS client or webHDFS. Lack of 
 seamless integration with the client’s file system makes it difficult for 
 users, and impossible for some applications, to access HDFS. NFS interface 
 support is one way for HDFS to have such easy integration.
 This JIRA is to track the NFS protocol support for accessing HDFS. With the 
 HDFS client, webHDFS and the NFS interface, HDFS will be easier to access and 
 able to support more applications and use cases. 
 We will upload the design document and the initial implementation. 



[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-25 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642102#comment-13642102
 ] 

Hari Mankude commented on HDFS-4750:


Implementing writes might not be easy. The client implementations in various 
kernels do not guarantee that writes are issued in sequential order. 
Page-flushing algorithms try to find contiguous pages (offsets), but there are 
other factors in play, so writes from the client are not guaranteed to be 
sequential, as HDFS requires them to be. This is true whether the writes come 
in lazily from the client or due to a sync() before close(). A possible 
solution is for the NFS gateway on the DFS client to cache and reorder the 
writes to be sequential. But this might still result in holes, which HDFS 
cannot handle. Also, the cache requirements might not be trivial and might 
require a flush to local disk.
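
The gateway-side reorder cache suggested above might look like this toy model (illustrative only; a real NFS gateway would also handle timeouts, eviction, and spilling to disk):

```python
# Out-of-order NFS writes are buffered by offset; the contiguous prefix
# is flushed sequentially (as HDFS requires), and flushing stops at a
# gap. A gap that never fills is exactly the kind of hole HDFS cannot
# represent.

def flush_sequential(pending, next_offset):
    """Flush buffered writes that are contiguous from next_offset."""
    flushed = []
    while next_offset in pending:
        data = pending.pop(next_offset)
        flushed.append((next_offset, data))
        next_offset += len(data)
    return flushed, next_offset

pending = {0: b"aaaa", 8: b"cccc"}       # write at offset 4 not yet arrived
flushed, nxt = flush_sequential(pending, 0)
print([off for off, _ in flushed])       # [0] -- flushing stops at the gap
print(sorted(pending))                   # [8] -- a hole if offset 4 never comes

pending[4] = b"bbbb"                     # late write fills the gap
flushed, nxt = flush_sequential(pending, nxt)
print([off for off, _ in flushed])       # [4, 8]
```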

NFS interfaces are very useful for reads. 




[jira] [Commented] (HDFS-4758) Disallow nested snapshottable directories

2013-04-25 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642270#comment-13642270
 ] 

Hari Mankude commented on HDFS-4758:


Actually, one use case for nested snapshots that I see is that a user might 
have different backup policies for /user (once every day) and /user/hive (every 
8 hrs). When backing up /user, it is possible to set up exclusions for the 
/user/hive directory so that two copies of /user/hive are not made. However, if 
snapshots cannot be taken of /user and /user/hive at the same time, that would 
be a disadvantage.

 Disallow nested snapshottable directories
 -

 Key: HDFS-4758
 URL: https://issues.apache.org/jira/browse/HDFS-4758
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE

 Nested snapshottable directories are supported by the current implementation. 
  However, it seems that there are no good use cases for nested snapshottable 
 directories.  So we disable it for now until someone has a valid use case for 
 it.



[jira] [Commented] (HDFS-4758) Disallow nested snapshottable directories

2013-04-25 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642316#comment-13642316
 ] 

Hari Mankude commented on HDFS-4758:


The trade-off is between usability and complexity. In this case, it might 
result in situations where a user has taken a snapshot of /user/foo/dir1 and 
the admin finds that system-wide snapshots cannot be taken at, say, the /user 
level, since several users have their own snapshots at lower directories. This 
might limit the usability of the feature.




[jira] [Commented] (HDFS-2576) Namenode should have a favored nodes hint to enable clients to have control over block placement.

2013-03-08 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13597548#comment-13597548
 ] 

Hari Mankude commented on HDFS-2576:


Is data skew going to be an issue, where some DNs are overloaded relative to 
other DNs? Would this be an issue when there is other data stored in HDFS along 
with HBase?

 Namenode should have a favored nodes hint to enable clients to have control 
 over block placement.
 -

 Key: HDFS-2576
 URL: https://issues.apache.org/jira/browse/HDFS-2576
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Pritam Damania
 Attachments: hdfs-2576-1.txt, hdfs-2576-trunk-1.patch


 Sometimes clients like HBase need to dynamically compute the 
 datanodes on which they wish to place the blocks of a file for a higher level 
 of locality. For this purpose, there needs to be a way to give the Namenode a 
 hint in terms of a favoredNodes parameter about the locations where the 
 client wants to put each block. The proposed solution is a favored nodes 
 parameter in the addBlock() method and in the create() file method to enable 
 the clients to give the hints to the NameNode about the locations of each 
 replica of the block. Note that this would be just a hint and finally the 
 NameNode would look at disk usage, datanode load etc. and decide whether it 
 can respect the hints or not.



[jira] [Commented] (HDFS-4087) Protocol changes for listSnapshots functionality

2013-02-21 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13583360#comment-13583360
 ] 

Hari Mankude commented on HDFS-4087:


Has the listSnap CLI call been added?

 Protocol changes for listSnapshots functionality
 

 Key: HDFS-4087
 URL: https://issues.apache.org/jira/browse/HDFS-4087
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: Snapshot (HDFS-2802)
Reporter: Brandon Li
Assignee: Brandon Li
  Labels: needs-test
 Fix For: Snapshot (HDFS-2802)

 Attachments: HDFS-4087.patch, HDFS-4087.patch, HDFS-4087.patch, 
 HDFS-4087.patch


 SnapInfo saves information about a snapshot. This JIRA also updates the Java 
 protocol classes and translation for the listSnapshot operation.
 Given a snapshot root, the snapshots created under it can be listed.



[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS

2012-10-22 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481669#comment-13481669
 ] 

Hari Mankude commented on HDFS-2802:


Nicholas is right in that we do start off with O(1) memory usage. But depending 
on writes and updates to the base filesystem, memory usage for the snapshot 
will increase. The worst case is when an application updates all the files in 
the snapshotted subtree. Even in this scenario, the snap inodes are minimized 
versions of the actual file inodes and retain only the information relevant to 
snapshots. Additionally (in the prototype), if multiple snapshots are taken of 
the same subtree, significant optimizations are done to reduce the memory 
footprint by representing more than one snapshot in a single snapINode. 
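
The copy-on-write behavior described above can be modeled in a few lines (a simplified sketch; the names are illustrative and do not match the prototype's classes):

```python
# A snapshot starts with no per-file state, and grows a trimmed-down
# record only for files modified after the snapshot was taken -- so the
# worst case is one preserved record per updated file in the subtree.

class SnapshottedDir:
    def __init__(self):
        self.snapshots = []           # list of dicts: path -> preserved state
        self.live = {}

    def take_snapshot(self):
        self.snapshots.append({})     # O(1): nothing is copied up front

    def write(self, path, data):
        old = self.live.get(path)
        if self.snapshots and old is not None and path not in self.snapshots[-1]:
            self.snapshots[-1][path] = old    # copy-on-write, first change only
        self.live[path] = data

d = SnapshottedDir()
d.write("/f", "v1")
d.take_snapshot()
print(len(d.snapshots[0]))   # 0 -- the snapshot costs nothing until a change
d.write("/f", "v2")
print(d.snapshots[0])        # {'/f': 'v1'} -- only the modified file is kept
```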

 Support for RW/RO snapshots in HDFS
 ---

 Key: HDFS-2802
 URL: https://issues.apache.org/jira/browse/HDFS-2802
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, name-node
Reporter: Hari Mankude
Assignee: Hari Mankude
 Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf


 Snapshots are point in time images of parts of the filesystem or the entire 
 filesystem. Snapshots can be a read-only or a read-write point in time copy 
 of the filesystem. There are several use cases for snapshots in HDFS. I will 
 post a detailed write-up soon with more information.



[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS

2012-10-20 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480745#comment-13480745
 ] 

Hari Mankude commented on HDFS-2802:


Todd, another option is to look at the inodesUnderConstruction in the NN and 
query the DNs for the exact file size at the time the snapshot is taken. Even 
with this, the file size obtained is only valid at that instant. Applications 
like HBase will have to deal with hlogs that could have incomplete log entries 
when an uncoordinated snapshot is taken at the HDFS level. A better approach is 
to have the application reach a quiesce point and then take a snap. This is 
normally done for Oracle (hot backup mode) and SQL Server so that an 
application-consistent snapshot can be taken.

Also, createSnap()/removeSnap() take the writeLock() on the FSNamesystem, which 
ensures that there are no other metadata updates while the snap is being taken.







[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS

2012-10-20 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480784#comment-13480784
 ] 

Hari Mankude commented on HDFS-2802:


Todd,

I do not agree that your solution will be any more beneficial to HBase than 
what is being proposed. Any type of txid information in the DNs will be from 
the beginning of the transaction. If the client is writing in the middle of a 
block, there is no way to know the exact size when the snap was taken. Querying 
inodesUnderConstruction will give the block length at the time of the query. It 
is not possible to take an application-consistent snapshot (one which does not 
require recovery) without coordination with the application.  

In fact, communication with the DNs when snapshots are being taken will make 
the process of taking snapshots very slow while giving very little additional 
benefit. 




[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS

2012-10-20 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480785#comment-13480785
 ] 

Hari Mankude commented on HDFS-2802:


Sorry, hit the comment early.

Additionally, including the sizes of non-finalized blocks in snapshots has the 
implication that if the client dies and the non-finalized section is discarded, 
the snapshot might have pointers to non-existent blocks.






[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS

2012-07-12 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412969#comment-13412969
 ] 

Hari Mankude commented on HDFS-2802:


A quick user's guide:

hadoop dfsadmin -createsnap <snapname> <path> <ro|rw> 
    creates a snap with the given name (read-only or read-write) at the 
    location mentioned

hadoop dfsadmin -removesnap <snapname> 
    removes the snapshot

hadoop dfsadmin -listsnap / 
    lists all snaps that have been taken under / 

 Support for RW/RO snapshots in HDFS
 ---

 Key: HDFS-2802
 URL: https://issues.apache.org/jira/browse/HDFS-2802
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, name-node
Affects Versions: 0.24.0
Reporter: Hari Mankude
Assignee: Hari Mankude
 Attachments: snap.patch, snapshot-one-pager.pdf


 Snapshots are point in time images of parts of the filesystem or the entire 
 filesystem. Snapshots can be a read-only or a read-write point in time copy 
 of the filesystem. There are several use cases for snapshots in HDFS. I will 
 post a detailed write-up soon with more information.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS

2012-07-11 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412508#comment-13412508
 ] 

Hari Mankude commented on HDFS-2802:


I am attaching an early version of the patch based on trunk. It took longer 
than I expected to rebase the patch. The code needs cleanup, further 
optimization of memory usage in the NN, fixes to the checkpointing code to 
handle some border conditions, and more tests. Next steps: splitting the patch 
into smaller, easier-to-review pieces. Branch HDFS-2802 has been created for 
this work. The next version of the design document will be posted soon 
(covering some of what we discussed during the HDFS meetup). 

 Support for RW/RO snapshots in HDFS
 ---

 Key: HDFS-2802
 URL: https://issues.apache.org/jira/browse/HDFS-2802
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, name-node
Affects Versions: 0.24.0
Reporter: Hari Mankude
Assignee: Hari Mankude
 Attachments: snapshot-one-pager.pdf


 Snapshots are point in time images of parts of the filesystem or the entire 
 filesystem. Snapshots can be a read-only or a read-write point in time copy 
 of the filesystem. There are several use cases for snapshots in HDFS. I will 
 post a detailed write-up soon with more information.





[jira] [Updated] (HDFS-2802) Support for RW/RO snapshots in HDFS

2012-07-11 Thread Hari Mankude (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Mankude updated HDFS-2802:
---

Attachment: snap.patch

 Support for RW/RO snapshots in HDFS
 ---

 Key: HDFS-2802
 URL: https://issues.apache.org/jira/browse/HDFS-2802
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, name-node
Affects Versions: 0.24.0
Reporter: Hari Mankude
Assignee: Hari Mankude
 Attachments: snap.patch, snapshot-one-pager.pdf


 Snapshots are point in time images of parts of the filesystem or the entire 
 filesystem. Snapshots can be a read-only or a read-write point in time copy 
 of the filesystem. There are several use cases for snapshots in HDFS. I will 
 post a detailed write-up soon with more information.





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-05-12 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13274035#comment-13274035
 ] 

Hari Mankude commented on HDFS-3370:


Can the hard-linked files be reopened for append?


 HDFS hardlink
 -

 Key: HDFS-3370
 URL: https://issues.apache.org/jira/browse/HDFS-3370
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Hairong Kuang
Assignee: Liyin Tang
 Attachments: HDFS-HardLink.pdf


 We'd like to add a new hardlink feature to HDFS that allows hardlinked files 
 to share data without copying. Initially we will support hardlinking only 
 closed files, but it could be extended to unclosed files as well.
 Among the many potential use cases for this feature, the following two are 
 the primary ones at Facebook:
 1. It provides a lightweight way for applications like HBase to create a 
 snapshot;
 2. It allows an application like Hive to move a table to a different 
 directory without breaking currently running Hive queries.





[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS

2012-05-09 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13271616#comment-13271616
 ] 

Hari Mankude commented on HDFS-2802:


@Eli,
Regarding scenario #3, consider an HBase setup with a huge dataset in 
production. A new app has been developed that needs to be validated against the 
production dataset. It is not feasible to copy the entire dataset to a test 
setup. At the same time, the app is not ready for production and it is not safe 
to let it modify the data in the production database. One solution for this 
type of problem is to take a RW snapshot of the production dataset and run the 
development app against the RW snapshot. After the app testing is done, the RW 
snap is deleted. This assumes that the cluster has sufficient compute capacity 
and incremental storage capacity to support RW snaps.

Regarding appends, the current snapshot prototype relies on the file size that 
is available at the namenode. So if a file is appended to after the snap is 
taken, the append is a no-op from the snap's perspective. If a snap is taken of 
a file with an append pipeline set up, the inode is of type under-construction 
in the NN, and the prototype again relies on the file size available on the NN. 
This might not be perfect, and I have some ideas on acquiring a more up-to-date 
file size.
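
The append behavior described above can be sketched as follows. This is an illustrative model only; the class and method names are hypothetical, not the prototype's code. The idea is that the file length captured at snapshot time bounds what a snapshot read sees, so later appends are invisible through the snapshot.

```java
// Hypothetical model of a length-based snapshot view: the snapshot
// records the NN's file length when it is taken, and later appends
// only affect the live view.
public class SnapshotLengthSketch {
    private long currentLength;        // live file length known to the NN
    private long snapshotLength = -1;  // -1 means no snapshot taken yet

    public SnapshotLengthSketch(long initialLength) {
        this.currentLength = initialLength;
    }

    // Taking a snapshot just captures the current file length.
    public void takeSnapshot() {
        snapshotLength = currentLength;
    }

    // Appending grows the live file; the snapshot view is unaffected.
    public void append(long bytes) {
        currentLength += bytes;
    }

    public long snapshotViewLength() {
        return snapshotLength;
    }

    public long liveViewLength() {
        return currentLength;
    }
}
```

Under this model, an append after the snap is a no-op from the snapshot reader's point of view, which matches the behavior described for the prototype.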

I thought that truncate is not currently supported in trunk. If you are 
referring to deletes, the prototype handles them correctly without issues.

I will post a more detailed doc after I am done with HA related work.

 Support for RW/RO snapshots in HDFS
 ---

 Key: HDFS-2802
 URL: https://issues.apache.org/jira/browse/HDFS-2802
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, name-node
Affects Versions: 0.24.0
Reporter: Hari Mankude
Assignee: Hari Mankude
 Attachments: snapshot-one-pager.pdf


 Snapshots are point in time images of parts of the filesystem or the entire 
 filesystem. Snapshots can be a read-only or a read-write point in time copy 
 of the filesystem. There are several use cases for snapshots in HDFS. I will 
 post a detailed write-up soon with more information.





[jira] [Updated] (HDFS-3293) Implement equals for storageinfo and journainfo class.

2012-04-30 Thread Hari Mankude (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Mankude updated HDFS-3293:
---

Attachment: hdfs-3293-2.patch

 Implement equals for storageinfo and journainfo class. 
 ---

 Key: HDFS-3293
 URL: https://issues.apache.org/jira/browse/HDFS-3293
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Hari Mankude
Assignee: Hari Mankude
Priority: Minor
 Attachments: hdfs-3293-1.patch, hdfs-3293-2.patch, hdfs-3293.patch


 Implement equals for storageinfo and journalinfo class. Also journalinfo 
 class needs a toString() method.
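
As an illustration of what the issue asks for, a value class with equals/hashCode and a toString() might look like the following. This is a hedged sketch with hypothetical fields, not the actual Hadoop StorageInfo or JournalInfo code.

```java
// Illustrative value class (hypothetical fields, not the real Hadoop
// classes) showing the equals/hashCode/toString pattern the issue asks for.
public class StorageInfoSketch {
    private final int layoutVersion;
    private final int namespaceID;
    private final long cTime;

    public StorageInfoSketch(int layoutVersion, int namespaceID, long cTime) {
        this.layoutVersion = layoutVersion;
        this.namespaceID = namespaceID;
        this.cTime = cTime;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof StorageInfoSketch)) return false;
        StorageInfoSketch that = (StorageInfoSketch) o;
        return layoutVersion == that.layoutVersion
            && namespaceID == that.namespaceID
            && cTime == that.cTime;
    }

    @Override
    public int hashCode() {
        // hashCode must always be overridden together with equals.
        int result = layoutVersion;
        result = 31 * result + namespaceID;
        result = 31 * result + (int) (cTime ^ (cTime >>> 32));
        return result;
    }

    @Override
    public String toString() {
        return "StorageInfoSketch{layoutVersion=" + layoutVersion
            + ", namespaceID=" + namespaceID + ", cTime=" + cTime + "}";
    }
}
```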





[jira] [Assigned] (HDFS-3325) When configuring dfs.namenode.safemode.threshold-pct to a value greater or equal to 1 there is mismatch in the UI report

2012-04-30 Thread Hari Mankude (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Mankude reassigned HDFS-3325:
--

Assignee: Hari Mankude

 When configuring dfs.namenode.safemode.threshold-pct to a value greater or 
 equal to 1 there is mismatch in the UI report
 --

 Key: HDFS-3325
 URL: https://issues.apache.org/jira/browse/HDFS-3325
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: J.Andreina
Assignee: Hari Mankude
Priority: Minor
 Fix For: 2.0.0, 3.0.0


 When dfs.namenode.safemode.threshold-pct is configured to n,
 the namenode stays in safemode until n percent of the blocks satisfying 
 the minimal replication requirement defined by 
 dfs.namenode.replication.min have been reported to the namenode.
 But the UI displays that (n percent of total blocks) + 1 blocks are 
 additionally needed
 to come out of safemode.
 Scenario 1:
 
 Configurations:
 dfs.namenode.safemode.threshold-pct = 2
 dfs.replication = 2
 dfs.namenode.replication.min =2
 Step 1: Start NN,DN1,DN2
 Step 2: Write a file a.txt which has got 167 blocks
 step 3: Stop NN,DN1,DN2
 Step 4: start NN
 In UI report the Number of blocks needed to come out of safemode and number 
 of blocks actually present is different.
 {noformat}
 Cluster Summary
 Security is OFF 
 Safe mode is ON. The reported blocks 0 needs additional 335 blocks to reach 
 the threshold 2. of total blocks 167. Safe mode will be turned off 
 automatically.
 2 files and directories, 167 blocks = 169 total.
 Heap Memory used 57.05 MB is 2% of Commited Heap Memory 2 GB. Max Heap Memory 
 is 2 GB. 
 Non Heap Memory used 23.37 MB is 17% of Commited Non Heap Memory 130.44 MB. 
 Max Non Heap Memory is 176 MB.{noformat}
 Scenario 2:
 ===
 Configurations:
 dfs.namenode.safemode.threshold-pct = 1
 dfs.replication = 2
 dfs.namenode.replication.min =2
 Step 1: Start NN,DN1,DN2
 Step 2: Write a file a.txt which has got 167 blocks
 step 3: Stop NN,DN1,DN2
 Step 4: start NN
 In UI report the Number of blocks needed to come out of safemode and number 
 of blocks actually present is different
 {noformat}
 Cluster Summary
 Security is OFF 
 Safe mode is ON. The reported blocks 0 needs additional 168 blocks to reach 
 the threshold 1. of total blocks 167. Safe mode will be turned off 
 automatically.
 2 files and directories, 167 blocks = 169 total.
 Heap Memory used 56.2 MB is 2% of Commited Heap Memory 2 GB. Max Heap Memory 
 is 2 GB. 
 Non Heap Memory used 23.37 MB is 17% of Commited Non Heap Memory 130.44 MB. 
 Max Non Heap Memory is 176 MB.{noformat}
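
The numbers in both scenarios are consistent with the UI computing the shortfall against pct * total and then adding one. The following is an assumed reconstruction of that arithmetic, not the actual NameNode code: 2 * 167 - 0 + 1 = 335 and 1 * 167 - 0 + 1 = 168, matching the two banners above.

```java
// Assumed reconstruction of the arithmetic behind the safemode banner
// (not the actual NameNode code).
public class SafemodeThresholdSketch {
    public static long blocksNeeded(double thresholdPct, long totalBlocks,
                                    long reportedBlocks) {
        long threshold = (long) (thresholdPct * totalBlocks);
        long shortfall = threshold - reportedBlocks;
        return shortfall + 1;  // the "+ 1" reproduces the mismatch in the UI
    }
}
```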





[jira] [Updated] (HDFS-3293) Implement equals for storageinfo and journainfo class.

2012-04-27 Thread Hari Mankude (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Mankude updated HDFS-3293:
---

Target Version/s: 0.24.0
  Status: Patch Available  (was: Open)

 Implement equals for storageinfo and journainfo class. 
 ---

 Key: HDFS-3293
 URL: https://issues.apache.org/jira/browse/HDFS-3293
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Hari Mankude
Assignee: Hari Mankude
Priority: Minor
 Attachments: hdfs-3293.patch


 Implement equals for storageinfo and journalinfo class. Also journalinfo 
 class needs a toString() method.





[jira] [Updated] (HDFS-3293) Implement equals for storageinfo and journainfo class.

2012-04-27 Thread Hari Mankude (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Mankude updated HDFS-3293:
---

Attachment: hdfs-3293.patch

 Implement equals for storageinfo and journainfo class. 
 ---

 Key: HDFS-3293
 URL: https://issues.apache.org/jira/browse/HDFS-3293
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Hari Mankude
Assignee: Hari Mankude
Priority: Minor
 Attachments: hdfs-3293.patch


 Implement equals for storageinfo and journalinfo class. Also journalinfo 
 class needs a toString() method.





[jira] [Commented] (HDFS-3293) Implement equals for storageinfo and journainfo class.

2012-04-27 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263987#comment-13263987
 ] 

Hari Mankude commented on HDFS-3293:


The changes are trivial, so a test is not included.

 Implement equals for storageinfo and journainfo class. 
 ---

 Key: HDFS-3293
 URL: https://issues.apache.org/jira/browse/HDFS-3293
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Hari Mankude
Assignee: Hari Mankude
Priority: Minor
 Attachments: hdfs-3293.patch


 Implement equals for storageinfo and journalinfo class. Also journalinfo 
 class needs a toString() method.





[jira] [Resolved] (HDFS-3205) testHANameNodesWithFederation is failing in trunk

2012-04-27 Thread Hari Mankude (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Mankude resolved HDFS-3205.


Resolution: Duplicate

This is a dup of HDFS-2960.

 testHANameNodesWithFederation is failing in trunk
 -

 Key: HDFS-3205
 URL: https://issues.apache.org/jira/browse/HDFS-3205
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, name-node
Reporter: Hari Mankude
Assignee: Hari Mankude
Priority: Minor

 The test is failing with the error
 org.junit.ComparisonFailure: expected:&lt;ns1-nn1.example.com[]:8020&gt; but 
 was:&lt;ns1-nn1.example.com[/50.28.50.93]:8020&gt;





[jira] [Updated] (HDFS-3293) Implement equals for storageinfo and journainfo class.

2012-04-27 Thread Hari Mankude (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Mankude updated HDFS-3293:
---

Attachment: hdfs-3293-1.patch

 Implement equals for storageinfo and journainfo class. 
 ---

 Key: HDFS-3293
 URL: https://issues.apache.org/jira/browse/HDFS-3293
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Hari Mankude
Assignee: Hari Mankude
Priority: Minor
 Attachments: hdfs-3293-1.patch, hdfs-3293.patch


 Implement equals for storageinfo and journalinfo class. Also journalinfo 
 class needs a toString() method.





[jira] [Commented] (HDFS-3293) Implement equals for storageinfo and journainfo class.

2012-04-27 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264022#comment-13264022
 ] 

Hari Mankude commented on HDFS-3293:


Fixed all the issues mentioned by Nicholas.

 Implement equals for storageinfo and journainfo class. 
 ---

 Key: HDFS-3293
 URL: https://issues.apache.org/jira/browse/HDFS-3293
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Hari Mankude
Assignee: Hari Mankude
Priority: Minor
 Attachments: hdfs-3293-1.patch, hdfs-3293.patch


 Implement equals for storageinfo and journalinfo class. Also journalinfo 
 class needs a toString() method.
