[Dspace-tech] Assetstore clean-up? Advice/reassurance/gotchas?

2013-03-12 Thread Michael White
Hi,

Running DSpace v1.6.2 (JSPUI).

Our DSpace server ran out of disk space unexpectedly (as we thought we had 
allowed for substantial growth) and I have now determined it is because of a 
huge amount of orphaned content in our Assetstore.

Looking at our bitstream table, there are 38,488 rows but 22,287 of them are 
marked as deleted=true.
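
For anyone wanting to reproduce figures like these, here is a minimal sketch rather than a definitive recipe. It assumes the bitstream table has `deleted` and `size_bytes` columns (check the schema of your own DSpace version), and it uses an in-memory SQLite database purely as a stand-in so that the example runs anywhere. Against a real instance you would pass in a connection from your PostgreSQL driver instead, and the SQL may need adjusting for other databases. The names `summarize_bitstreams` and `make_demo_db` are made up for illustration.

```python
import sqlite3


def summarize_bitstreams(conn):
    """Return (total_rows, deleted_rows, deleted_bytes) for the bitstream table.

    Assumes columns named `deleted` and `size_bytes`; verify against your schema.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT COUNT(*), "
        "COALESCE(SUM(CASE WHEN deleted THEN 1 ELSE 0 END), 0), "
        "COALESCE(SUM(CASE WHEN deleted THEN size_bytes ELSE 0 END), 0) "
        "FROM bitstream"
    )
    total, deleted_rows, deleted_bytes = cur.fetchone()
    return total, deleted_rows, deleted_bytes


def make_demo_db():
    """Build a tiny stand-in table (hypothetical data, not a real DSpace DB)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bitstream ("
        "bitstream_id INTEGER PRIMARY KEY, size_bytes INTEGER, deleted INTEGER)"
    )
    conn.executemany(
        "INSERT INTO bitstream (size_bytes, deleted) VALUES (?, ?)",
        [(1000, 0), (2500, 1), (4000, 1)],
    )
    return conn


if __name__ == "__main__":
    total, deleted_rows, deleted_bytes = summarize_bitstreams(make_demo_db())
    print(f"{deleted_rows} of {total} rows marked deleted, "
          f"{deleted_bytes} bytes reclaimable")
```

Summing `size_bytes` over the deleted rows gives a rough idea of how much disk space a clean-up might free, which is useful to know before and after running it.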

I have determined that this orphaned content is a result of the integration 
between our CRIS system (Converis) and DSpace - each time a Publication record 
(that has full text attached and that has already been exported to DSpace) is 
updated in our CRIS it updates the corresponding record in DSpace - only it 
does this by deleting the existing record and creating a new one with the same 
handle, which results in the files belonging to the deleted record lying around 
in the Assetstore as orphaned bitstreams.

So, I've been reading up in the manual, and I note the existence of the 
/dspace/bin/cleanup script which I believe will resolve this problem by 
deleting the old rows from the bitstream table and the corresponding files in 
the assetstore (?).

However, before I run this and potentially muck everything up, I was hoping 
that someone could confirm that this script will do what I'm after, and that it 
will be able to handle such a large cleanup? And if there are any other 
gotchas or advice relating to this that anyone out there can offer?

I'll back up the assetstore and DB before doing anything, but any advice or 
reassurance would be most welcome as I'm obviously nervous about running 
something that could, potentially, do more harm than good :-).

Cheers,

Mike

P.S. If there is anyone else out there with an integration between Converis and 
DSpace, you might also want to look into this!

Michael White 
eLearning Liaison and Development (eLD)
Information Services
S8, Library
University of Stirling 
Stirling SCOTLAND 
FK9 4LA 
Email: michael.wh...@stir.ac.uk 
Tel: +44 (0) 1786 466877 
Fax: +44 (0) 1786 466880
http://www.stir.ac.uk/is/staff/about/teams/aldt/#eld



-- 
The University of Stirling is ranked in the top 50 in the world in The Times 
Higher Education 100 Under 50 table, which ranks the world's best 100 
universities under 50 years old.
The University of Stirling is a charity registered in Scotland, 
 number SC 011159.


--
Symantec Endpoint Protection 12 positioned as A LEADER in The Forrester  
Wave(TM): Endpoint Security, Q1 2013 and remains a good choice in the  
endpoint security space. For insight on selecting the right partner to 
tackle endpoint security challenges, access the full report. 
http://p.sf.net/sfu/symantec-dev2dev
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette


Re: [Dspace-tech] Assetstore clean-up? Advice/reassurance/gotchas?

2013-03-12 Thread Hilton Gibson
We ran it on: http://etd.sun.ac.za using DSpace 1.5.2 with no problems.


-- 
*Hilton Gibson*
Systems Administrator
JS Gericke Library
Room 1025D
Stellenbosch University
Private Bag X5036
Stellenbosch
7599
South Africa

Tel: +27 21 808 4100 | Cell: +27 84 646 4758
http://library.sun.ac.za
http://scholar.sun.ac.za
http://ar1.sun.ac.za
http://aj1.sun.ac.za

Re: [Dspace-tech] Assetstore clean-up? Advice/reassurance/gotchas?

2013-03-12 Thread helix84
Hi Mike,

Yes, the cleanup script deletes bitstreams which are marked as deleted='t'.

I wouldn't have any worries about running it; it works just fine.

Tip: run it with the -v parameter to see which files it's currently deleting.
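
As a rough illustration of this tip, the sketch below wraps the script in a few lines of Python so that the verbose output is kept in a log file for later checking. The script path `/dspace/bin/cleanup` and the `-v` flag are taken from this thread. The function names and the idea of logging to a file are assumptions added here for illustration, so adapt them to your own installation.

```python
import subprocess


def build_cleanup_command(script="/dspace/bin/cleanup", verbose=True):
    """Assemble the argument list for the cleanup script.

    The default path is the one mentioned in the thread; yours may differ.
    """
    cmd = [script]
    if verbose:
        cmd.append("-v")  # verbose: report which files are being deleted
    return cmd


def run_cleanup(cmd, log_path):
    """Run the command, writing stdout and stderr to log_path.

    Returns the exit code of the process.
    """
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


if __name__ == "__main__":
    # Only prints the command; call run_cleanup() once backups are in place.
    print(" ".join(build_cleanup_command()))
```

Keeping the log means there is a record of what was removed, which can be compared against the backup if anything looks wrong afterwards.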


Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette



Re: [Dspace-tech] Assetstore clean-up? Advice/reassurance/gotchas?

2013-03-12 Thread Michael White
Many thanks helix84 (such an enigmatic nomenclature ;-) ) and Hilton for your 
reassuring replies, and for the -v tip . . .

I now feel confident enough to give this a bash :-)

Much appreciated.

Mike
