[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2015-08-25 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710744#comment-14710744
 ] 

Vinayakumar B commented on HDFS-5851:
-

Should this Jira be marked as a duplicate of HDFS-6581?

 Support memory as a storage medium
 --

 Key: HDFS-5851
 URL: https://issues.apache.org/jira/browse/HDFS-5851
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 3.0.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Attachments: 
 SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf, 
 SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf, 
 SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf


 Memory can be used as a storage medium for smaller/transient files for fast 
 write throughput.
 More information/design will be added later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-08-22 Thread Henry Saputra (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107876#comment-14107876
 ] 

Henry Saputra commented on HDFS-5851:
-

Thanks [~arpitagarwal], I will check out the new JIRA.


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-08-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101208#comment-14101208
 ] 

Colin Patrick McCabe commented on HDFS-5851:


Let's create a branch for this work, just like we did for the original 
HDFS-4949 work.  This is clearly a big feature, and I think we'll need to 
iterate a bit to get the best design.


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-06-28 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047035#comment-14047035
 ] 

Konstantin Shvachko commented on HDFS-5851:
---

A few questions, as I don't see them covered in the design document:
# Heterogeneous storage is implemented but not enabled; for now it can only 
allocate StorageType.DEFAULT blocks. This seems to be the first extension 
to other StorageTypes. Why is the memory type being prioritized ahead of, say, SSDs?
# I understand the design assumes that only one DN will hold a memory replica 
of a DDM block. This will reduce the latency of accessing that block for a 
single client, but it also makes this DN a bottleneck for many clients trying 
to access the same data.
# If I understand correctly, the proposal is not to introduce new APIs for 
discarding unnecessary or lost data, but to handle it using a discardability 
policy. What is the policy?
#- In this regard Eric's idea of ZK ephemeral nodes is interesting, but 
probably not directly applicable, as a file should not be discarded only 
because its creator quit.
# Eviction policy is another thing which needs clarification.
# What do you mean by static allocation of memory? A configuration parameter 
for DNs?


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-06-27 Thread Henry Saputra (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046247#comment-14046247
 ] 

Henry Saputra commented on HDFS-5851:
-

Hi [~sanjay.radia], I was looking at the JIRA and the proposal, and I have some 
questions about it:
1. I did not see where the memory will be allocated for the DDM proposal. Is it 
similar to HDFS-4949 in using memory from the DataNode?
2. As for the APIs, would they be new Hadoop FS (Java) APIs or a higher-level 
construct for storing data in memory? The proposal seems to rely only on the 
file path to indicate that the in-memory cache should be used.
3. The problem statement of the proposal suggests there would be a policy to 
manage how data is stored in memory per application, but I could not find 
details about how to achieve it. Some applications may need quick access to a 
small but more significant portion of the data (e.g. newer time-series data), 
whereas others may need to store more (e.g. a large Hive query).
4. In terms of discardability, what is the eviction policy for such data, and 
how can it be controlled or fine-tuned if needed?

Maybe this was discussed in the earlier in-person meeting, but I could not find 
it in the meeting summary.
Thanks for driving this new feature.


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-06-27 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046280#comment-14046280
 ] 

Andrew Purtell commented on HDFS-5851:
--

bq.  In term of discardability, what is the eviction policy for such data and 
how control or fine tune it if needed.

Related: I was talking with [~cmccabe] in the context of HDFS-4949 about 
possible LRU- or LFU-based eviction policies and how they might work. An 
interesting open question is how to revoke access to mapped pages that the 
datanode shares with another process without causing the client process to 
segfault. I don't see this addressed in the design doc here. One possibility is 
a callback protocol advising the client process of pending invalidations?
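A minimal sketch of LRU eviction with advisory invalidation callbacks, assuming a hypothetical DataNode-side {{LruReplicaCache}} that tracks replicas by block id and size; nothing here is an existing HDFS API. Clients that map a block subscribe to it, and the cache notifies them before the block is dropped:

```python
from collections import OrderedDict
from typing import Callable, Dict, List


class LruReplicaCache:
    """Hypothetical DataNode-side LRU cache of in-memory block replicas."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._blocks: OrderedDict = OrderedDict()  # block id -> size, oldest first
        self._listeners: Dict[str, List[Callable[[str], None]]] = {}

    def subscribe(self, block_id: str, on_invalidate: Callable[[str], None]) -> None:
        """Register a client callback to be told before block_id is evicted."""
        self._listeners.setdefault(block_id, []).append(on_invalidate)

    def access(self, block_id: str) -> None:
        """Mark a cached block as most recently used."""
        if block_id in self._blocks:
            self._blocks.move_to_end(block_id)

    def add(self, block_id: str, size: int) -> List[str]:
        """Cache a block and return the ids evicted to make room for it."""
        if size > self.capacity_bytes:
            raise ValueError("block is larger than the cache capacity")
        if block_id in self._blocks:  # re-adding replaces the old entry
            self.used_bytes -= self._blocks.pop(block_id)
        evicted: List[str] = []
        while self.used_bytes + size > self.capacity_bytes:
            victim, victim_size = self._blocks.popitem(last=False)  # LRU entry
            for notify in self._listeners.pop(victim, []):
                notify(victim)  # advise mapping clients before the memory is released
            self.used_bytes -= victim_size
            evicted.append(victim)
        self._blocks[block_id] = size
        self.used_bytes += size
        return evicted
```

The callback is only advisory: it tells a client to stop using its mapping but cannot by itself revoke pages the client has already mapped, so the segfault question remains open. Replacing the recency ordering with per-block access counters would turn this into an LFU variant.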


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-06-17 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034164#comment-14034164
 ] 

Todd Lipcon commented on HDFS-5851:
---

Yep, the native checksumming that James is working on is one big part of it. 
The other half is the work that Trevor Robinson was doing on using 
DirectByteBuffers on the DN side to avoid some copies to/from byte arrays.


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-06-17 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034632#comment-14034632
 ] 

Sanjay Radia commented on HDFS-5851:


Colin wrote:
bq. why a separate namespace under hdfs://namespace/.reserved/ddm ? We have 
xattrs now, so files ...
I did not explain it well. It is a separation of policy and mechanism. HDFS has 
to support such files under ANY name; hence we can use an xattr to mark files 
created in the write cache.

The policy of managing the memory space and the underlying swap space (e.g. 
hdfs://namespace/.reserved/ddm) is separate from the write-cache mechanism that 
HDFS needs to support in ANY part of its namespace, so I believe we are in 
agreement here. I will explain the policy I am proposing in a separate comment.


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-06-12 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029500#comment-14029500
 ] 

Arpit Agarwal commented on HDFS-5851:
-

Thanks for the pointer, James!


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-06-11 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028500#comment-14028500
 ] 

Arpit Agarwal commented on HDFS-5851:
-

[~tlipcon] - I think you mentioned you had prototyped some write-pipeline 
improvements that showed significant gains. Would you be able to share those?


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-06-11 Thread James Thomas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028535#comment-14028535
 ] 

James Thomas commented on HDFS-5851:


Hi Arpit, I posted some preliminary work on native checksumming in the write 
path at HDFS-3528.


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-06-03 Thread Ali Ghodsi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016877#comment-14016877
 ] 

Ali Ghodsi commented on HDFS-5851:
--

[~sanjay.radia] Sanjay, Tachyon has in fact supported discardability for about a 
year. Lineage support was only added in the latest version, and Spark's current 
releases use Tachyon without storing lineage in Tachyon. If data falls out of 
Tachyon, Spark recomputes it. Please update the design doc accordingly. Thanks.


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-06-03 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016907#comment-14016907
 ] 

Andrew Purtell commented on HDFS-5851:
--

bq. Can we make these files transient like ZK ephemeral nodes?

This is an interesting idea, but ZK ephemeral nodes rely on session semantics. 
Would the HDFS equivalent be delete-on-close, with lease == session? Perhaps it 
can be done generically for all types of files on all media by implementing 
delete-on-close with refcounting. You could imagine a file marked 
delete-on-close and opened for append, with multiple readers tailing data behind 
the writer. Once the writer and all readers close the file, it could be garbage 
collected.
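A minimal sketch of delete-on-close with refcounting, assuming a hypothetical {{DeleteOnCloseFile}} class and an {{on_collect}} callback that stands in for whatever would actually schedule block deletion; neither is an existing HDFS API. The writer and each reader hold a reference, and the last close triggers collection:

```python
import threading
from typing import Callable


class DeleteOnCloseFile:
    """Hypothetical refcounted file that is collected when the last handle closes."""

    def __init__(self, path: str, on_collect: Callable[[str], None]) -> None:
        self.path = path
        self._on_collect = on_collect  # e.g. schedule deletion of the file's blocks
        self._refs = 0
        self._collected = False
        self._lock = threading.Lock()

    def open(self) -> None:
        """Take a reference (the appending writer or a tailing reader)."""
        with self._lock:
            if self._collected:
                raise FileNotFoundError(self.path)
            self._refs += 1

    def close(self) -> None:
        """Drop a reference; the last close hands the file to garbage collection."""
        with self._lock:
            if self._refs == 0:
                raise RuntimeError("close() without a matching open()")
            self._refs -= 1
            collect = self._refs == 0
            if collect:
                self._collected = True
        if collect:
            self._on_collect(self.path)  # invoked outside the lock
```

Usage: a writer opens the file for append, readers open it to tail, and only when every one of them has called {{close()}} does {{on_collect}} fire.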


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-05-23 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007631#comment-14007631
 ] 

Todd Lipcon commented on HDFS-5851:
---

{quote}
 The case of a local short circuit read having access to the open file is 
 interesting... does this pin the memory until the possibly misbehaved client 
 process closes the socket / FD?
 Yes this is correct. This should be the existing behavior with short-circuit 
 reads.
{quote}
Not quite: it doesn't pin the memory, because pinning is implemented by the 
datanode calling mlock. The datanode can still munlock the memory even if the 
client holds the fd open, and the client will then fall back to reading from 
disk (through the normal Linux read path).
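A toy model of this point, assuming hypothetical {{BlockReplica}} and {{ShortCircuitReader}} classes in which a boolean stands in for the DataNode's mlock/munlock state (real pinning happens inside the DataNode process, not in this code). The reader's open handle survives an unpin; only where the bytes come from changes:

```python
class BlockReplica:
    """Hypothetical replica whose pinned state is owned by the DataNode."""

    def __init__(self, block_id: str) -> None:
        self.block_id = block_id
        self.pinned = False  # stands in for the DataNode's mlock()/munlock()
        self.open_client_fds = 0

    def pin(self) -> None:
        self.pinned = True

    def unpin(self) -> None:
        # The DataNode may unpin even while clients still hold the file open.
        self.pinned = False


class ShortCircuitReader:
    """Hypothetical short-circuit reader holding an fd to the replica."""

    def __init__(self, replica: BlockReplica) -> None:
        self.replica = replica
        replica.open_client_fds += 1

    def read_source(self) -> str:
        # Same fd either way; unpinned pages go through the normal read path
        # (page cache if still resident, otherwise disk).
        return "pinned memory" if self.replica.pinned else "normal read path"

    def close(self) -> None:
        self.replica.open_client_fds -= 1
```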


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-05-23 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007896#comment-14007896
 ] 

Colin Patrick McCabe commented on HDFS-5851:


Todd is correct here.  It's the DataNode that pins the block file and metadata 
file, and it can unpin them if the client takes too long.

A few concerns:
* Why a separate namespace under {{hdfs://namespace/.reserved/ddm}}?  We 
have xattrs now, so files that have been created in the write cache could be 
identified with a given (system) xattr.
* Discarding files when memory gets tight is one policy; spilling them to disk 
is another.  We should have both policies available so that this can be useful 
in things like Spark.
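A sketch of offering both memory-pressure policies side by side, assuming a hypothetical {{relieve_pressure}} function, an oldest-first ordering of in-memory files, and a per-file policy map (which could, for instance, be derived from a system xattr as suggested here; that mapping is an assumption). Files with no policy default to spilling, the safer choice:

```python
from enum import Enum
from typing import Dict, List


class PressurePolicy(Enum):
    DISCARD = "discard"  # drop the data; the framework regenerates it if needed
    SPILL = "spill"      # persist the data to disk before freeing memory


def relieve_pressure(files: Dict[str, int],
                     policies: Dict[str, PressurePolicy],
                     bytes_needed: int) -> Dict[str, List[str]]:
    """Free at least bytes_needed from in-memory files (hypothetical model).

    files maps path -> in-memory size, oldest first, and is mutated in place.
    Returns the paths that were discarded and the paths that were spilled.
    """
    outcome: Dict[str, List[str]] = {"discarded": [], "spilled": []}
    freed = 0
    for path in list(files):  # copy keys so we can pop while iterating
        if freed >= bytes_needed:
            break
        size = files.pop(path)
        if policies.get(path, PressurePolicy.SPILL) is PressurePolicy.DISCARD:
            outcome["discarded"].append(path)
        else:
            outcome["spilled"].append(path)  # a real DN would write blocks to disk here
        freed += size
    return outcome
```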


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-05-21 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005285#comment-14005285
 ] 

Arpit Agarwal commented on HDFS-5851:
-

Hi [~eric14], I apologize for the long delayed response.

bq. The case of a local short circuit read having access to the open file is 
interesting... does this pin the memory until the possibly misbehaved client 
process closes the socket / FD?
Yes this is correct. This should be the existing behavior with short-circuit 
reads.

bq. Single replicas? Why would one want to triple replicate discardable memory? 
One should at least have the option to only keep a single local copy in HDFS.
We will have a single replica to begin with. There are use cases for triple 
replicas to share memory across sessions/applications. Multiple replicas would 
have to be optional as you suggest.

bq. If we can not prevent random access writes to DDM (we could presumably 
limit this in client API), then I don't think we can checksum or replicate 
until a file is closed. My gut is delaying such until close is the right call...
Direct writes to DN memory are appealing for performance reasons, but there are 
some open questions, including the ones you raised. Hence phase 1 will avoid 
short-circuit writes.

I will let Sanjay respond to the remaining questions since he is more familiar 
with those aspects.


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-05-20 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004031#comment-14004031
 ] 

Arpit Agarwal commented on HDFS-5851:
-

Minutes from Google Hangout:

With respect to the mechanism for supporting memory caching, there was 
high-level agreement on roughly these implementation phases:
* 1st phase - streaming socket writes, but with mlock on the DN side so that the 
data stays in memory for readers.
** Make this work for a single replica
** Separately (in another Jira), investigate write-pipeline improvements, since 
the write pipeline has not been optimized. This should give us some initial 
performance numbers, and one can start using this mechanism. [~tlipcon] (?) has 
a prototype.

* 2nd phase - Explore short-circuit writes, but the datanode still mlocks. We had 
a quick discussion on short-circuit writes being tricky:
** Recovery issues (RBW)
** The client can do things that confuse the DN (e.g. truncate/append the 
file after close)

* Future phases
** Add lazy replication to other replicas (note earlier phases allowed only 1 
replica)
** Direct writes to memory by memory-mapping the file
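A sketch of the lazy-replication idea, assuming a hypothetical {{LazyReplicator}} in which a write completes once the single local in-memory replica exists (as in phase 1) and the remaining replicas are copied later by a background step; real DataNode transfers and NameNode bookkeeping are not modeled:

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class LazyReplicator:
    """Hypothetical: ack after one local replica, replicate the rest lazily."""

    def __init__(self, target_replication: int) -> None:
        self.target = target_replication
        self.replicas: Dict[str, List[str]] = {}
        self._pending: Deque[Tuple[str, str]] = deque()

    def write(self, block_id: str, local_dn: str, other_dns: List[str]) -> None:
        # The write is complete as soon as the local in-memory replica exists.
        self.replicas[block_id] = [local_dn]
        for dn in other_dns[: self.target - 1]:
            self._pending.append((block_id, dn))

    def discard(self, block_id: str) -> None:
        # Discardable data may be dropped before it was ever replicated.
        self.replicas.pop(block_id, None)

    def run_background(self, max_copies: int) -> int:
        """Make up to max_copies pending replicas; return how many were made."""
        done = 0
        while self._pending and done < max_copies:
            block_id, dn = self._pending.popleft()
            if block_id in self.replicas:  # skip blocks discarded meanwhile
                self.replicas[block_id].append(dn)
                done += 1
        return done
```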

Discussion on discardability:
* Shouldn't this be a property of the file (such as a replica count of 1) rather 
than a property of /.reserved/ddm?
** This needs further discussion on the jira.
* Why the two-layer approach?
** We don't necessarily want to put load on the NN for intermediate files, hence 
the 2nd layer.


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-04-30 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986160#comment-13986160
 ] 

Arpit Agarwal commented on HDFS-5851:
-

The hangout is in progress right now. We are having some technical difficulties 
with Hangouts on Air. :-)

If you'd like to join, please send me an email at aagarwal @ hortonworks.com 
from a Google account and I'll add you.


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-04-30 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986159#comment-13986159
 ] 

Colin Patrick McCabe commented on HDFS-5851:


This link is a direct link to the hangout:

https://plus.google.com/hangouts/_/stream/72cpimt8aamrmkuj0036v19u08?pqs=1authuser=0hl=en


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-04-29 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984544#comment-13984544
 ] 

Sanjay Radia commented on HDFS-5851:


BTW we will host the meeting at Hortonworks for those that are local and want 
to attend in person:
Hortonworks
3460 W. Bayshore Rd
Palo Alto CA 94303



[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-04-29 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985036#comment-13985036
 ] 

Andrew Wang commented on HDFS-5851:
---

Hey Arpit/Sanjay, as a heads up, a bunch of us from Cloudera are planning on 
attending in person (myself, Colin, ATM, Todd, Charlie, maybe Eli). Looking 
forward to the meeting tomorrow.


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-04-28 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983835#comment-13983835
 ] 

Arpit Agarwal commented on HDFS-5851:
-

I scheduled a Google+ hangout for 4/30 3-4pm PDT - [link 
here|https://plus.google.com/events/ckvo7ui46qihd6cfq0sqptrhogo?authkey=CMvgrcTOv9n12wE].

Let me know if you are unable to access it.


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-04-25 Thread eric baldeschwieler (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980741#comment-13980741
 ] 

eric baldeschwieler commented on HDFS-5851:
---

The case of a local short circuit read having access to the open file is 
interesting...  does this pin the memory until the possibly misbehaved client 
process closes the socket / FD?

Single replicas?  Why would one want to triple replicate discardable memory?  
One should at least have the option to only keep a single local copy in HDFS.

If we cannot prevent random-access writes to DDM (we could presumably limit 
this in the client API), then I don't think we can checksum or replicate until a 
file is closed.  My gut feeling is that delaying these until close is the right call...
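A sketch of deferring checksums until close, assuming a hypothetical {{DeferredChecksumBuffer}} that accepts random-access writes. It checksums 512-byte chunks (the HDFS default {{dfs.bytes-per-checksum}}) but uses Python's {{zlib.crc32}} as a stand-in for the CRC32C that HDFS uses by default:

```python
import zlib
from typing import List

BYTES_PER_CHECKSUM = 512  # HDFS default chunk size for checksums


class DeferredChecksumBuffer:
    """Hypothetical in-memory file: random writes allowed, checksums on close."""

    def __init__(self) -> None:
        self._data = bytearray()
        self.closed = False

    def pwrite(self, offset: int, payload: bytes) -> None:
        if self.closed:
            raise ValueError("file is closed")
        end = offset + len(payload)
        if end > len(self._data):  # zero-fill any gap before the write
            self._data.extend(b"\0" * (end - len(self._data)))
        self._data[offset:end] = payload  # may overwrite an earlier chunk

    def close(self) -> List[int]:
        # Checksumming once at close avoids recomputing chunks that a later
        # random write would have invalidated.
        self.closed = True
        return [zlib.crc32(self._data[i:i + BYTES_PER_CHECKSUM])
                for i in range(0, len(self._data), BYTES_PER_CHECKSUM)]
```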

How are discarded or lost (node fails) blocks / files handled?  Do the names 
remain in the NN and get reported in FSCK and other operations?  We want to be 
sure this doesn't add work to operators.  

Can we make these files transient like ZK ephemeral nodes?

Once one assumes you don't need to replicate discardable files, then one can 
think about allocating only an arena name (think directory) in the NN and then 
creating individual files only at the DN, limiting NN interaction.  This would 
be a lot faster.  (You could still have remote access via 
.../ARENA/DN-NAME/name style URLs.)  With this you could vastly reduce NN 
interactions, which is probably good for latency reduction and scalability.  
You could then imagine using this mechanism for MR / Tez / Spark shuffle files, 
which has been a long-term project goal.  Maybe we should break this idea out 
into another JIRA?  Happy to chat if folks want to flesh this out.
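A sketch of the arena idea, assuming hypothetical {{NameNodeModel}} and {{DataNodeArena}} classes: the NameNode records only the arena name, while files inside the arena are created on a single DataNode and addressed with .../ARENA/DN-NAME/name style paths. The RPC counter simply shows NameNode interaction staying constant as files are added:

```python
from typing import Dict, Set


class NameNodeModel:
    """Hypothetical NameNode that only tracks arena names."""

    def __init__(self) -> None:
        self.arenas: Set[str] = set()
        self.rpc_count = 0

    def create_arena(self, arena: str) -> None:
        self.rpc_count += 1  # the only NameNode interaction in this model
        self.arenas.add(arena)


class DataNodeArena:
    """Hypothetical per-DataNode arena whose files never touch the NameNode."""

    def __init__(self, namenode: NameNodeModel, arena: str, dn_name: str) -> None:
        if arena not in namenode.arenas:
            raise KeyError(f"unknown arena {arena}")
        self.arena = arena
        self.dn_name = dn_name
        self.files: Dict[str, bytes] = {}

    def create(self, name: str, data: bytes) -> str:
        # Local to this DataNode: no NameNode RPC.
        self.files[name] = data
        return f"/{self.arena}/{self.dn_name}/{name}"  # .../ARENA/DN-NAME/name
```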

Involving Yarn in HDFS resource management is interestingly circular.  Is this 
needed?  One would want the right abstraction to allow other solutions to be 
applied to Yarnless deployments.

 Support memory as a storage medium
 --

 Key: HDFS-5851
 URL: https://issues.apache.org/jira/browse/HDFS-5851
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 3.0.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Attachments: 
 SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf


 Memory can be used as a storage medium for smaller/transient files for fast 
 write throughput.
 More information/design will be added later.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-04-24 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980183#comment-13980183
 ] 

Colin Patrick McCabe commented on HDFS-5851:


I took a quick look at the design doc.  I think the focus on discardable 
memory makes sense in light of next-gen frameworks like Spark, Tez, etc.  One 
note: Tachyon, Spark's caching layer, does not currently incorporate the 
concept of RDDs, although that support is planned, as I understand.  It's just 
caching (serialized) files at this point, and I think the semantics match up 
pretty well with what we're talking about here.  The execution framework can 
re-generate the data if needed... this re-generating support does not need to 
be included in HDFS.

I think that some HDFS applications will want the ability to treat multiple 
files as a single eviction unit... i.e., if you evict one file, you evict them 
all.  (Things like Hive tables are multiple files, but probably ought to be 
treated as a single unit for caching purposes.)  There are also some questions 
about when eviction can occur... it seems like it would be very inconvenient to 
do it while the file was being read.  On the other hand, we probably need a 
timeout to prevent a selfish process (or a process on a disconnected node) from 
pinning something in the cache forever by keeping a file open.
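One way to model the eviction semantics discussed here makes three assumptions: a group of files is evicted as a unit, readers hold time-limited leases that block eviction, and a reader that stops renewing its lease loses it after a timeout. `EvictionGroup`, its methods and the 60-second default are illustrative assumptions. The clock is injectable so the timeout can be exercised without sleeping:

```python
import time


class EvictionGroup:
    """Hypothetical eviction unit, e.g. all files of one Hive table."""

    def __init__(self, files, lease_timeout_s=60.0, clock=time.monotonic):
        self.files = set(files)
        self.lease_timeout_s = lease_timeout_s
        self._clock = clock
        self._leases = {}  # reader id -> lease expiry time

    def open_for_read(self, reader_id):
        self._leases[reader_id] = self._clock() + self.lease_timeout_s

    def renew(self, reader_id):
        if reader_id in self._leases:
            self._leases[reader_id] = self._clock() + self.lease_timeout_s

    def close(self, reader_id):
        self._leases.pop(reader_id, None)

    def can_evict(self):
        now = self._clock()
        # Drop leases of readers that stopped renewing (a selfish or
        # disconnected client), so they cannot pin the group forever.
        self._leases = {r: t for r, t in self._leases.items() if t > now}
        return not self._leases

    def evict(self, cache):
        """Evict every file in the group at once, or nothing at all."""
        if not self.can_evict():
            return False
        for f in self.files:
            cache.pop(f, None)
        return True
```

Eviction is all-or-nothing per group, and an open reader only defers it until its lease lapses. This addresses both the "evict them all" and the "pinning forever" concerns.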

Clearly we want the ability to do things like skip checksums when reading the 
cached files.  This will reuse a lot of the HDFS-4949 code.  It's less clear 
what other aspects of the HDFS-4949 code we'll want to reuse.  I think cache 
pools might be one such thing.  There is a potential to reuse some of the 
implementation as well, such as mlocking and so forth.  An mlocked file in 
/dev/shm could be a good way to go here.

I am free all of next week, except for Friday.  Let's schedule a webex so we 
can figure this stuff out.


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-04-24 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980276#comment-13980276
 ] 

Arpit Agarwal commented on HDFS-5851:
-

How about 4/30 (next Wednesday) at 3pm PDT? We will set up a Google+ hangout or 
webex.

I will also be attending remotely since I am not in the Bay Area. If there is 
interest, we can host a conference room at the Hortonworks office in Palo Alto 
for folks to attend in person.


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-04-24 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980470#comment-13980470
 ] 

Colin Patrick McCabe commented on HDFS-5851:


Next Wednesday at 3pm works for me.  I can come by in person if you are hosting 
at the Hortonworks office.  Alternately, we can host at Cloudera, if you like.  
Thanks


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-04-22 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13977302#comment-13977302
 ] 

Arpit Agarwal commented on HDFS-5851:
-

Hi Colin, a call sounds like a good idea and we are open to collaborating on 
the feature implementation too. Let's have a call next week. We would like to 
get an initial proposal out by then. There will be ample time to discuss the 
approach within the community. I will propose a date and time within a couple 
of days.

 Support memory as a storage medium
 --

 Key: HDFS-5851
 URL: https://issues.apache.org/jira/browse/HDFS-5851
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 3.0.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal

 Memory can be used as a storage medium for smaller/transient files for fast 
 write throughput.
 More information/design will be added later.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-04-22 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13977604#comment-13977604
 ] 

Colin Patrick McCabe commented on HDFS-5851:


Thanks, Arpit.  Would be nice to get the time worked out soon so everyone can 
fit it into their schedule.


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-04-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13975760#comment-13975760
 ] 

Colin Patrick McCabe commented on HDFS-5851:


Let's organize a webex about this.  It shouldn't take more than an hour of 
everyone's time.

If the community gets involved later rather than sooner, I think this may get 
unpleasant (it's always unpleasant to reject a design doc and I want to avoid 
that by sharing use cases and ideas up front).


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-04-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13975777#comment-13975777
 ] 

Colin Patrick McCabe commented on HDFS-5851:


Hey guys, how does Thursday (April 24) at 3pm-4pm sound for a webex?  I can 
organize.

I'd like to figure out:
* what are the use cases for HDFS-5851
* how is it different than HDFS-4949 (what motivates a separate implementation)
* how it fits into our long-term plans for heterogeneous storage

If we have time we can brainstorm about implementation (Andrew and I have 
actually thought about some ways of extending HDFS-4949 recently, so we'd like 
to share some of those ideas with the community).


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-04-18 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13974472#comment-13974472
 ] 

Arpit Agarwal commented on HDFS-5851:
-

Hi Andrew,

bq. There are also plans to move towards sub-block caching. Whole-block caching 
is wasteful for columnar formats like ORC and Parquet. With sub-block caching, 
automatic cache replacement looks a lot more attractive (another planned 
feature). These are both things we can support with HDFS-4949's infrastructure. 
I'm not sure about HSM.
Does CCM support block-level caching or is it just _cache all blocks in a file_?

bq. Anyway, either pulling this towards HDFS-4949 or vice versa, we should 
figure out these details before moving ahead. I'll echo Colin's desire for a 
meeting to discuss this. We're willing to host at our Palo Alto office.
Thanks for the offer! We are discussing use cases and proposed API at high 
level. I am not sure there is much overlap between CCM and our work and I 
expect them to solve different use cases. However I am also open to discussing 
how and whether we can align them more closely once we have shared our initial 
proposal.


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-04-18 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13974489#comment-13974489
 ] 

Andrew Wang commented on HDFS-5851:
---

bq. Does CCM support block-level caching or is it just cache all blocks in a 
file?

Right now, the user APIs are all in terms of files, but the backend could do 
single blocks pretty easily. We didn't add block-level cache directives because 
we feel automatic cache eviction is a better solution (and could operate at the 
block or subblock level).
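Automatic eviction at block granularity could look like the following least-recently-used sketch. LRU is an assumed policy chosen for illustration, since the comment does not name one, and `LruBlockCache` is a hypothetical name. The cache is bounded by bytes. Each access either refreshes a block's recency or admits the block, evicting the oldest blocks first:

```python
from collections import OrderedDict


class LruBlockCache:
    """Hypothetical block cache with automatic LRU eviction, bounded by bytes."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._blocks = OrderedDict()  # block id -> size, oldest first

    def access(self, block_id, size):
        """Record a read of block_id and return the ids evicted to admit it."""
        if block_id in self._blocks:
            self._blocks.move_to_end(block_id)  # now most recently used
            return []
        if size > self.capacity_bytes:
            return []  # can never fit, so do not cache it at all
        evicted = []
        while self.used_bytes + size > self.capacity_bytes:
            victim, victim_size = self._blocks.popitem(last=False)
            self.used_bytes -= victim_size
            evicted.append(victim)
        self._blocks[block_id] = size
        self.used_bytes += size
        return evicted

    def cached(self):
        return list(self._blocks)
```

No user-issued directive is involved. Residency follows the read pattern, which is the property that makes automatic eviction attractive compared with per-block directives.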

Looking forward to the design doc! Feel free to ping us even with preliminary 
use cases/ideas.


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-04-17 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973379#comment-13973379
 ] 

Arpit Agarwal commented on HDFS-5851:
-

Hi Eric,

Unlike Tachyon, we won't deal with data regeneration or checkpointing, leaving 
it to the application. We are still discussing use cases and this task got 
pushed out due to the 2.4 release.
* No durability or replication guarantees.
* Lost files/blocks will be discarded.
* The application is responsible for timely checkpointing, i.e. moving blocks to 
persistent storage.


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-04-17 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973462#comment-13973462
 ] 

Andrew Wang commented on HDFS-5851:
---

I'd really like to integrate this with HDFS-4949 where possible. One concern is 
that we should avoid having another pool of memory carved off from the cluster. 
HDFS-4949's cache pools were designed to eventually integrate with YARN, but 
this might introduce another separate pool for a memory quota, putting us back 
in the same place.

There are also plans to move towards sub-block caching. Whole-block caching is 
wasteful for columnar formats like ORC and Parquet. With sub-block caching, 
automatic cache replacement looks a lot more attractive (another planned 
feature). These are both things we can support with HDFS-4949's infrastructure. 
I'm not sure about HSM.
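Here is a sketch of the bookkeeping that sub-block caching would need. It assumes the cache tracks resident byte ranges per block rather than whole blocks, so a columnar reader that touches only a few column chunks would pin only those ranges. `SubBlockCache` and its methods are hypothetical:

```python
class SubBlockCache:
    """Hypothetical cache of resident [start, end) byte ranges per block."""

    def __init__(self):
        self._ranges = {}  # block id -> sorted, non-overlapping ranges

    def add(self, block_id, start, end):
        ranges = sorted(self._ranges.get(block_id, []) + [(start, end)])
        merged = [ranges[0]]
        for s, e in ranges[1:]:
            last_s, last_e = merged[-1]
            if s <= last_e:  # overlapping or adjacent: coalesce
                merged[-1] = (last_s, max(last_e, e))
            else:
                merged.append((s, e))
        self._ranges[block_id] = merged

    def covers(self, block_id, start, end):
        """True if the read [start, end) can be served entirely from cache."""
        return any(s <= start and end <= e
                   for s, e in self._ranges.get(block_id, []))

    def cached_bytes(self, block_id):
        return sum(e - s for s, e in self._ranges.get(block_id, []))
```

Overlapping or adjacent ranges are coalesced on insert. A read that is fully resident therefore always falls inside a single stored range, so `covers` can check each range independently.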

It'd also be nice if apps could use zero-copy reads (ZCR) on these memory-only 
replicas, ideally reusing the existing auto-ZCR infrastructure.

Anyway, either pulling this towards HDFS-4949 or vice versa, we should figure 
out these details before moving ahead. I'll echo Colin's desire for a meeting 
to discuss this. We're willing to host at our Palo Alto office.


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-04-16 Thread eric baldeschwieler (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972103#comment-13972103
 ] 

eric baldeschwieler commented on HDFS-5851:
---

A very interesting design space!

How does this relate to Tachyon's design goals?
What is the target use case?  
What durability / replication guarantees are you providing?
What happens when RAM is exhausted?

Assuming low durability, what happens when a file or block is lost?


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-01-30 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13887111#comment-13887111
 ] 

Arpit Agarwal commented on HDFS-5851:
-

I have not given much thought to the specifics except that it would fit within 
the Heterogeneous Storage framework.

Spilling writes is an interesting idea. It could be done with extensions to 
_Storage Preferences_. Or we don't spill writes silently and limit memory 
consumption with the quota extensions we described in HDFS-2832. DFSClient or 
the app could handle the failure.
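The non-spilling alternative can be sketched as a memory tier that rejects writes exceeding a quota, leaving the fallback decision to the client or application. The exception type, `MemoryTier` and `write_with_fallback` are hypothetical names. The quota is modeled as a single byte count rather than anything specific to the HDFS-2832 proposal:

```python
class MemoryQuotaExceeded(Exception):
    """Raised when a write would push the memory tier past its quota."""


class MemoryTier:
    """Hypothetical memory tier that enforces a quota and never spills."""

    def __init__(self, quota_bytes):
        self.quota_bytes = quota_bytes
        self.used_bytes = 0
        self.files = {}

    def write(self, path, data):
        old = len(self.files.get(path, b""))  # an overwrite frees the old copy
        if self.used_bytes - old + len(data) > self.quota_bytes:
            raise MemoryQuotaExceeded(
                f"{path}: {len(data)} bytes would exceed the quota "
                f"({self.used_bytes}/{self.quota_bytes} bytes used)")
        self.files[path] = data
        self.used_bytes += len(data) - old


def write_with_fallback(mem, disk, path, data):
    """Application-side policy: try memory first and, on a quota failure,
    deliberately write to disk instead."""
    try:
        mem.write(path, data)
        return "memory"
    except MemoryQuotaExceeded:
        disk[path] = data
        return "disk"
```

The tier itself stays simple and predictable. The cost is that every writer must handle the failure, which is the adoption concern raised in the spilling discussion.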

Do you see overlap with your CCM work?

 Support memory as a storage medium
 --

 Key: HDFS-5851
 URL: https://issues.apache.org/jira/browse/HDFS-5851
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 3.0.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal

 Memory can be used as a storage medium for smaller/transient files for fast 
 write throughput.
 More information/design will be added later.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-01-30 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13887408#comment-13887408
 ] 

Colin Patrick McCabe commented on HDFS-5851:


The HDFS-4949 work was centered around caching small, often-used files that 
were already stored durably to disk.  If we supported a temporary / 
non-durable storage tier, there would be some overlap with internal 
implementation, but probably not much with interface.  We should probably have 
a conference call about this at some point.


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-01-29 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886212#comment-13886212
 ] 

Colin Patrick McCabe commented on HDFS-5851:


Hi Arpit,

I don't know if you were present for some of the discussions around in-memory 
caching and HDFS-4949.  See 
https://issues.apache.org/jira/browse/HDFS-4949?focusedCommentId=13707389page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13707389
  for some discussion around this.

In the past, we've talked about having a transient tier for files that we 
write, but don't necessarily want to put on-disk.  I think many applications 
would choose to write to a tier that would put stuff into memory if space was 
available, but if not, would spill it to disk.  It's crucial to implement 
spilling, though.  Otherwise, we make the applications worry about how much 
memory is left on the DataNode, which I think would lead to limited adoption.  
In this sense, memory gets used as a temporary area during a job, not so much a 
storage area (at least that's how I look at it.)  Does this line up with your 
thinking in this area?
