Hi Ben,

In response to your questions (and apologies if I'm teaching anyone on this 
list to suck eggs):-

- What do you feel you might gain by placing 500Gb+ files into a repository, 
compared with having them in an addressable filestore?
My understanding is that many funding bodies now require research data to be 
made available along with the academic paper, so that people can 
investigate/reproduce published research. The gain in having the research 
data in a repository like DSpace would, I guess, be that it makes it easy for 
people to find the data alongside the papers. That's not to say that the actual 
data has to be in the traditional DSpace asset store. From what I've read so 
far on DSpace, the data can also be "referenced", as long as the reference is 
to a file store accessible to DSpace. Further responses to this thread have 
talked about using other ideas like CKAN etc. - of which I know next to 
nothing, so I will need to investigate further. At Exeter we are very much 
geared up for using DSpace since we already have a number of DSpace 
repositories running, so whatever solution we end up with, I think currently 
it will involve DSpace.
The other gain of course is the proper, managed curation of research data 
through a workflow process, rather than the data ending up on a DVD on a 
professor's shelf.

- Have people been able to download files of that size from DSpace, Fedora or 
EPrints?
No idea!! But I take the point and it's one I've already alluded to - it's all 
very well getting this stuff into or referenced from DSpace, but how will Joe 
Researcher download it easily? I suspect there is a requirement for providing 
other mechanisms for download from or via the DSpace repository rather than a 
normal web interface - isn't this where SWORD comes in?
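
As a rough illustration of the download concern (my own sketch, not something 
from the thread): one common alternative to pulling a 500GB file in a single 
web request is to fetch it in pieces using the standard HTTP Range header, so 
an interrupted transfer can pick up where it left off. The code below only 
works out the header values. The function name and chunk size are made up for 
illustration, and whether a given DSpace, Fedora or EPrints install honours 
Range requests is an assumption that would need checking.

```python
def remaining_ranges(total_size, chunk_size, already_have=0):
    """Yield HTTP Range header values for the bytes still to be fetched.

    total_size   -- size of the remote file in bytes
    chunk_size   -- maximum number of bytes to request per piece
    already_have -- bytes already saved locally (0 for a fresh download)

    Hypothetical helper for illustration; it is not part of any
    repository's API.
    """
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size > 0")
    if not 0 <= already_have <= total_size:
        raise ValueError("already_have must be between 0 and total_size")

    start = already_have
    while start < total_size:
        # Range end offsets are inclusive, hence the -1.
        end = min(start + chunk_size, total_size) - 1
        yield f"bytes={start}-{end}"
        start = end + 1


if __name__ == "__main__":
    # A 10-byte "file" fetched 4 bytes at a time, resuming after 4 bytes.
    for header in remaining_ranges(10, 4, already_have=4):
        print("Range:", header)
```

Each value would go into the Range header of a separate GET request, with the 
responses appended to the local file in order.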

- Has the repository been allocated space on a suitable filesystem? XFS, EBS, 
Thumper or similar?
Yes, I think so. We have an EMC Atmos currently providing 860TB of raw storage. 
This is basically object cloud storage in a similar vein to Amazon S3, but it 
also provides NFS/CIFS access via what EMC calls an IFS server. We are 
currently running a DSpace asset store on our Atmos using an IFS server. Atmos 
also has a REST-based interface, and an Amazon S3 proxy (i.e. making it work 
with many Amazon S3 clients) is in development; we have been beta testing 
this. We are also hoping to use the Atmos for backup of live research data. 
Atmos is good for archiving, for serving up objects to the web and for the 
sort of things people use S3 for, but it's not tier 1 storage - it's not 
designed to be. The fit as a DSpace asset store seems to be a good one. Caveat 
- this still remains to be proved in production DSpace use.

- Once the file is ingested into DSpace or Fedora for example, is there any 
other route to retrieve this, aside from HTTP? (Coding your own servlet/addon 
is not a real answer to this.) Is it easily accessible via Grid-FTP or HPN-SSH 
for example?
Not yet for us, but I agree that another route such as HPN-SSH will be needed 
for large data sets.

So, in short, weigh up the benefits against the downsides and not in 
hypotheticals. Actually do it, and get real researchers to try and use it. 
You'll soon have a metric to show what is useful and what isn't.
That's what we are aiming to do as part of the JISC-funded OpenExeter project - 
we will be piloting with researchers and using this to develop procedures, 
workflows etc.

Many thanks for all comments and emails so far on this thread - very useful and 
interesting.

Best regards,

Pete

_______________________________________________
sword-app-tech mailing list
sword-app-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sword-app-tech
