Re: [Dspace-general] DSpace scalability (tens of hundreds TBs)

2013-04-08 Thread Vlastimil Krejcir
   Thanks for the reply; however, file systems and disk capacity are not the 
trouble. My question is aimed directly at DSpace.

   Regards,

   Vlastik



--
Minimize network downtime and maximize team effectiveness.
Reduce network management and security costs. Learn how to hire 
the most talented Cisco Certified professionals. Visit the 
Employer Resources Portal
http://www.cisco.com/web/learning/employer_resources/index.html
___
Dspace-general mailing list
Dspace-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-general


Re: [Dspace-general] DSpace scalability (tens of hundreds TBs)

2013-04-08 Thread Tim Donohue
Hi Vlastik,

Unfortunately, as far as I'm aware there are no DSpace installations 
with many TBs worth of data. (If anyone out there is running DSpace with 
large amounts of data, we'd definitely love to hear from your experiences!)

I'd hope that DSpace could scale to that level. But, to be completely 
honest, we've never had anyone attempt it. However, should you have the 
resources to do this sort of scalability testing, we'd definitely 
appreciate feedback on any issues you run into (if any).

We do our best to ensure that DSpace is scalable. But, as we are a team 
of volunteers, we don't always have the resources to do extensive, 
large-scale testing (and therefore, we depend on the community to help 
report such issues to us). However, we'd do our best to help resolve any 
issues you'd encounter -- we've worked with others in the past when 
they've noticed scalability or memory-leak issues in DSpace.

If you were to encounter issues, they'd likely be memory-related. In 
recent releases we've done some work to plug some long-standing memory 
leaks, but I cannot guarantee we've located them all. Again, this is 
something we'd love feedback on -- we want to fix memory leaks as 
quickly as we can.

I'm not sure if that helps or not.

- Tim



--
Minimize network downtime and maximize team effectiveness.
Reduce network management and security costs.Learn how to hire 
the most talented Cisco Certified professionals. Visit the 
Employer Resources Portal
http://www.cisco.com/web/learning/employer_resources/index.html
___
Dspace-general mailing list
Dspace-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-general


Re: [Dspace-general] DSpace scalability (tens of hundreds TBs)

2013-04-08 Thread Tim Donohue
Hi Vlastik,

This had slipped my mind, but there was some scalability testing by the 
University of Cambridge in 2010, using DSpace 1.6.2. At the time, they 
ran into scalability/memory issues when loading it with 12 TB worth of 
data:

http://dspace.2283337.n4.nabble.com/Dspace-tech-Scalability-issues-report-DSpace-Cambridge-td3287701.html

However, based on Cambridge's reported issues, we made many 
scalability/memory-usage enhancements in DSpace 1.7.0 (Cambridge later 
verified that those resolved their issues, though I cannot seem to track 
down that email). More notes on the performance improvements are in the 
1.7.0 release notes:
https://wiki.duraspace.org/display/DSPACE/DSpace+Release+1.7.0+Notes

Since then, we've kept a closer watch for possible memory leaks. I 
cannot guarantee we've caught all of them, but if any are noticed, we'd 
gladly try to resolve them ASAP.

U of Cambridge is one of the larger (known) DSpace instances. I'm not 
sure how much data they currently have. But, at least in 2010 they said 
they had around 12TB (200K items).

- Tim






Re: [Dspace-general] DSpace scalability (tens of hundreds TBs)

2013-04-08 Thread LEWIS Stuart
Hi Vlastik,

The extent to which DSpace will scale also depends a lot on the usage of 
the repository.  For example, if it is to be used as a management tool 
with very little access, it will scale further than if you plan on 
having many simultaneous users all interacting with the contents.

There are also options for 'scaling out' the repository, depending on your
planned usage patterns.  For example, if there would be a lot of 'reads' of
items, you can install multiple front-end servers and replicate the
Solr search indexes.  One front-end server could be configured to allow
logins, whilst all the others have logins disabled and are restricted to
read-only operations.  Other parts of the infrastructure, such as the
database (PostgreSQL / Oracle), will also have their own methods of being
scaled up and out.
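A minimal sketch of the read/write split described above, with entirely hypothetical host names. In a real deployment this routing would live in a load balancer (e.g. a reverse proxy), not in application code; this toy dispatcher only illustrates the pattern.

```python
# Toy sketch of the scale-out pattern: writes go to the single
# login-enabled front end, reads are spread round-robin across
# read-only replicas. All host names are purely hypothetical.
from itertools import cycle

WRITE_NODE = "dspace-rw.example.org"      # logins enabled
READ_NODES = cycle([                      # logins disabled, read-only
    "dspace-ro1.example.org",
    "dspace-ro2.example.org",
    "dspace-ro3.example.org",
])

def route(method: str) -> str:
    """Pick a back end for an incoming HTTP request by its method."""
    if method in ("GET", "HEAD"):
        return next(READ_NODES)           # reads: rotate through replicas
    return WRITE_NODE                     # POST/PUT/DELETE: submission, admin

print(route("GET"))    # a read-only replica
print(route("POST"))   # dspace-rw.example.org
```

The same split implies replicating the Solr indexes to each read-only node and pointing all nodes at one writable database (or a primary with read replicas).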

If you do decide to use DSpace in this fashion, or indeed any system, you
will probably need to invest a reasonable amount of time in tuning the
system for performance.  If you learn any lessons from this, the DSpace
community would benefit greatly if you were happy to share them.

Best wishes,


Stuart Lewis
Head of Research and Learning Services
Deputy Director Library  University Collections, Information Services
University of Edinburgh
stuart.le...@ed.ac.uk






[Dspace-general] DSpace scalability (tens of hundreds TBs)

2013-04-04 Thread Vlastimil Krejcir
   Hello all,

   I have recently been asked a question about DSpace scalability. Assume the 
following project:

16 million items (bitstreams totalling about 230 TB), growing by 3 
million items (86 TB) per year.

   Is DSpace able to handle this? My answer was: I don't know. Is anyone 
working with such large loads of data? What is your opinion?

   Regards,

   Vlastik


Vlastimil Krejčíř
Library and Information Centre, Institute of Computer Science
Masaryk University, Brno, Czech Republic
Email: krejcir (at) ics (dot) muni (dot) cz
Phone: +420 549 49 3872
ICQ: 163963217
Jabber: kre...@jabber.org
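For a rough sense of scale, the figures in the question can be turned into a back-of-the-envelope projection. This is plain arithmetic on the numbers given (decimal terabytes assumed), with no DSpace APIs involved:

```python
# Rough capacity arithmetic for the proposed repository,
# using the figures from the question (decimal TB assumed).
TB = 10**12

existing_items = 16_000_000
existing_bytes = 230 * TB
yearly_items = 3_000_000
yearly_bytes = 86 * TB

avg_existing_mb = existing_bytes / existing_items / 10**6
avg_incoming_mb = yearly_bytes / yearly_items / 10**6

print(f"average existing bitstream size: {avg_existing_mb:.1f} MB")  # 14.4 MB
print(f"average incoming bitstream size: {avg_incoming_mb:.1f} MB")  # 28.7 MB

# Five-year storage projection: ~660 TB after year 5.
for year in range(1, 6):
    total_tb = (existing_bytes + year * yearly_bytes) / TB
    print(f"after year {year}: {total_tb:.0f} TB")
```

Note that incoming items average roughly twice the size of existing ones, so the growth rate in bytes matters more than the item count for storage planning.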




Re: [Dspace-general] DSpace scalability (tens of hundreds TBs)

2013-04-04 Thread Hilton Gibson
Perhaps this will help you start the evaluation:
http://en.wikipedia.org/wiki/Comparison_of_file_systems
The main capacity consideration for you would be storage on a suitable
platform and therefore file system capacity is paramount.
Also see:
http://wiki.lib.sun.ac.za/index.php/SUNScholar/Capacity_Planning#Priority_2_-_Digital_asset_storage

Cheers

hg






-- 
*Hilton Gibson*
Systems Administrator
JS Gericke Library
Room 1025D
Stellenbosch University
Private Bag X5036
Stellenbosch
7599
South Africa

Tel: +27 21 808 4100 | Cell: +27 84 646 4758
http://library.sun.ac.za
http://scholar.sun.ac.za
http://ar1.sun.ac.za
http://aj1.sun.ac.za