Re: [Dspace-general] DSpace scalability (tens of hundreds TBs)
Thanks for the reply; however, file systems and disk capacity are not the trouble. My question is aimed directly at DSpace itself.

Regards,
Vlastik

___
Dspace-general mailing list
Dspace-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-general
Re: [Dspace-general] DSpace scalability (tens of hundreds TBs)
Hi Vlastik,

Unfortunately, as far as I'm aware, there are no DSpace installations with many TBs worth of data. (If anyone out there is running DSpace with large amounts of data, we'd definitely love to hear about your experiences!)

I'd hope that DSpace could scale to that level. But, to be completely honest, we've never had anyone attempt it. However, should you have the resources to do this sort of scalability testing, we'd definitely appreciate feedback on any issues you run into.

We do our best to ensure that DSpace is scalable. But, as we are a team of volunteers, we don't always have the resources to do extensive large-scale testing, and therefore we depend on the community to report such issues to us. We'd do our best to help resolve any issues you encounter; we've worked with others in the past when they've noticed scalability or memory leak issues in DSpace.

If you were to encounter issues, they'd likely be memory-related. In recent releases we've done some work to plug some longer-standing memory leaks, but I cannot guarantee we've located them all. Again, this is something we'd love feedback on, as we'd want to fix memory leaks as quickly as we can.

I'm not sure if that helps or not.

- Tim
Re: [Dspace-general] DSpace scalability (tens of hundreds TBs)
Hi Vlastik,

This had slipped my mind, but U of Cambridge did some scalability testing in 2010 with DSpace 1.6.2. At the time, they ran into scalability/memory issues when loading DSpace 1.6.2 with 12 TB worth of data:
http://dspace.2283337.n4.nabble.com/Dspace-tech-Scalability-issues-report-DSpace-Cambridge-td3287701.html

Based on Cambridge's reported issues, we made many scalability/memory usage enhancements in DSpace 1.7.0 (and Cambridge verified that those resolved their issues, though I cannot seem to track down that email). More notes on the performance improvements in 1.7.0 are in our 1.7.0 release notes:
https://wiki.duraspace.org/display/DSPACE/DSpace+Release+1.7.0+Notes

Since then, we've kept a closer watch for possible memory leaks. I cannot guarantee we've caught all of them, but if any are noticed, we'd gladly try to resolve them ASAP.

U of Cambridge is one of the larger (known) DSpace instances. I'm not sure how much data they currently have, but at least in 2010 they said they had around 12 TB (200K items).

- Tim
Re: [Dspace-general] DSpace scalability (tens of hundreds TBs)
Hi Vlastik,

How far DSpace will scale also depends a lot on how the repository is used. For example, if it is to be used as a management tool with very little access, then it will scale further than if you plan on having many simultaneous users all interacting with the contents.

There are also options for 'scaling out' the repository, depending on your planned usage patterns. For example, if there will be a lot of 'reads' of items, then you can install multiple front-end servers and replicate the Solr search indexes. One front-end server could be configured to allow logins, whilst all the others have logins disabled and are restricted to read-only operations. Other parts of the infrastructure, such as the database (PostgreSQL / Oracle), also have their own methods of being scaled up and out.

If you do decide to use DSpace in this fashion, or indeed any system, you will probably need to invest a reasonable amount of time in tuning the system for performance. If you learn any lessons from this, the DSpace community would benefit greatly if you were happy to share them.

Best wishes,

Stuart Lewis
Head of Research and Learning Services
Deputy Director Library University Collections, Information Services
University of Edinburgh
stuart.le...@ed.ac.uk
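The read/write split Stuart describes (one login-enabled front end, several read-only ones) could be sketched as a routing rule in front of the servers. This is only an illustration of the idea: the host names, the path prefixes, and the `choose_frontend` helper are all hypothetical and are not DSpace configuration or API.

```python
import itertools

# Hypothetical host names, used only for illustration.
WRITE_FRONTEND = "dspace-rw.example.org"
READ_FRONTENDS = ["dspace-ro1.example.org", "dspace-ro2.example.org"]

# Placeholder prefixes for login/submission pages; these are assumptions,
# not actual DSpace URLs, and would need to match the real user interface.
WRITE_PATH_PREFIXES = ("/login", "/submit")
SAFE_METHODS = {"GET", "HEAD"}

# Simple round-robin over the read-only front ends.
_read_cycle = itertools.cycle(READ_FRONTENDS)


def choose_frontend(method, path):
    """Route plain reads to a read-only front end, everything else to the
    single login-enabled front end."""
    is_read = (method.upper() in SAFE_METHODS
               and not path.startswith(WRITE_PATH_PREFIXES))
    return next(_read_cycle) if is_read else WRITE_FRONTEND


if __name__ == "__main__":
    print(choose_frontend("GET", "/handle/123/456"))
    print(choose_frontend("POST", "/submit"))
```

A real deployment would more likely express this in a load balancer's configuration, and would also need to keep logged-in sessions on the login-enabled server; the sketch ignores both points.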
[Dspace-general] DSpace scalability (tens of hundreds TBs)
Hello all,

I was recently asked a question about DSpace scalability. Assume the following project:

16 million items (bitstreams totalling about 230 TB)
increasing by 3 million items (86 TB) per year

Is DSpace able to handle this? My answer was "I don't know". Is anyone working with such large volumes of data? What is your opinion?

Regards,
Vlastik

Vlastimil Krejčíř
Library and Information Centre, Institute of Computer Science
Masaryk University, Brno, Czech Republic
Email: krejcir (at) ics (dot) muni (dot) cz
Phone: +420 549 49 3872
ICQ: 163963217
Jabber: kre...@jabber.org
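To make the scale of the question concrete, the figures above can be projected forward. This is a minimal sketch that assumes growth stays linear at the quoted rate and that 1 TB = 1,000,000 MB; the function names are made up for illustration.

```python
# Figures quoted in the question; "TB" is taken as reported.
INITIAL_ITEMS = 16_000_000
INITIAL_TB = 230
ITEMS_PER_YEAR = 3_000_000
TB_PER_YEAR = 86


def project(years):
    """Return (items, terabytes) after `years` of linear growth."""
    return (INITIAL_ITEMS + ITEMS_PER_YEAR * years,
            INITIAL_TB + TB_PER_YEAR * years)


def average_item_mb(items, terabytes):
    """Average bitstream volume per item in MB (assumes 1 TB = 1,000,000 MB)."""
    return terabytes * 1_000_000 / items


if __name__ == "__main__":
    for years in (0, 1, 5, 10):
        items, tb = project(years)
        print(f"year {years:2d}: {items:>11,} items, {tb:>5} TB, "
              f"~{average_item_mb(items, tb):.1f} MB/item")
```

Under these assumptions the repository would hold about 31 million items and 660 TB after five years. For comparison, the Cambridge instance mentioned elsewhere in this thread was reported at roughly 200K items and 12 TB in 2010.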
Re: [Dspace-general] DSpace scalability (tens of hundreds TBs)
Perhaps this will help you start the evaluation:
http://en.wikipedia.org/wiki/Comparison_of_file_systems

The main capacity consideration for you would be storage on a suitable platform, and therefore file system capacity is paramount.

Also see:
http://wiki.lib.sun.ac.za/index.php/SUNScholar/Capacity_Planning#Priority_2_-_Digital_asset_storage

Cheers
hg

--
Hilton Gibson
Systems Administrator
JS Gericke Library
Room 1025D
Stellenbosch University
Private Bag X5036
Stellenbosch 7599
South Africa
Tel: +27 21 808 4100 | Cell: +27 84 646 4758
http://library.sun.ac.za
http://scholar.sun.ac.za
http://ar1.sun.ac.za
http://aj1.sun.ac.za