Re: [Dspace-tech] Batch metadata corrections question: does anyone know why the limit is set to just 20 items at a time? AND: question about regular recompiling

2011-07-29 Thread Berry, Irene (CIV)
Thank you, Peter, Stuart, Andrea, Alex -
So it sounds like it comes down to how much RAM is available on our
server. 

We're going to try 1000 and see how it goes; we are still on a
development server for just this kind of reason.  We are looking at
correcting a very large batch of records (retrospective ETDs) before we go
live, so this is somewhat exceptional, but looking forward I can see we'll
have to be correcting/recompiling regularly.  I'm still fairly new, so if
anyone has advice or warnings about doing batch corrections as a regular
thing, I'd be glad to hear it.

How do you handle updates and recompiling? Quarterly?  After hours?
Thanks everyone,
Irene Berry, MLIS
Digital Services Librarian
Dudley Knox Library, Naval Postgraduate School


--
Got Input?   Slashdot Needs You.
Take our quick survey online.  Come on, we don't ask for help often.
Plus, you'll get a chance to win $100 to spend on ThinkGeek.
http://p.sf.net/sfu/slashdot-survey
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


Re: [Dspace-tech] Batch metadata corrections question: does anyone know why the limit is set to just 20 items at a time?

2011-07-28 Thread Lemann, Alexander Bernard
 -Original Message-
 From: Andrea Schweer [mailto:schw...@waikato.ac.nz]
 Sent: Wednesday, July 27, 2011 6:38 PM
 To: dspace-tech@lists.sourceforge.net
 Subject: Re: [Dspace-tech] Batch metadata corrections question: does
 anyone know why the limit is set to just 20 items at a time?
 
 Hi,
 
 On 28/07/11 10:19, Peter Dietz wrote:
  Doing batch changes with a large number of changes will keep your
  system busy, and the reindexing can take a while. I've noticed that
  when we set the limit to be really high, it appears that nothing will
  happen from the user's browser for 20+ minutes, so I've connected to
  the server from the command line, and noticed that the reindexing task
  was taking a long time, but still running. So you might be safe with
  setting this to a really high number (several thousand), you'll just
  have to have the patience to not disrupt it. But smaller / more
  manageable batch sizes will complete in a reasonable amount of time.
  With this set to 1000 or more, I'm guessing you're more likely to run
  into Out-Of-Memory errors.
 
 With one of 'my' repositories, when we increased the limit (to 1000 I think),
 completing the changes took so long that the Apache-Tomcat connection
 timed out. This meant that the user saw an error in their browser even
 though the changes actually went through fine. In our case we decided to
 stick with a lower limit to avoid confusion. Though 20 really does feel very
 low.

There are some reports of this working with as many as 1000 items:
http://dspace.2283337.n4.nabble.com/Excel-to-Dspace-td3310564.html
though it seems to depend on the amount of RAM available on your server.

Relatedly, does anyone know how safe it is to test higher limit values?  I 
looked at the code to determine whether these batch edits occur within a single 
database transaction, but I didn't see any evidence either way.

Thanks,
Alex Lemann


--
Got Input?   Slashdot Needs You.
Take our quick survey online.  Come on, we don't ask for help often.
Plus, you'll get a chance to win $100 to spend on ThinkGeek.
http://p.sf.net/sfu/slashdot-survey
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


Re: [Dspace-tech] Batch metadata corrections question: does anyone know why the limit is set to just 20 items at a time?

2011-07-28 Thread helix84
On Thu, Jul 28, 2011 at 14:00, Lemann, Alexander Bernard
ablem...@bsu.edu wrote:
 Relatedly, does anyone know how safe it is to test higher limit values?  I 
 looked at the code to determine whether these batch edits occur within a 
 single database transaction, but I didn't see any evidence either way.

This shouldn't be any problem at the database layer. The limit and the
timeout problems concern only the web UI. I upped the limit to 2000
because I wanted to look at the diff in a web browser, and it wasn't
a problem - but I only did the preview, not the actual import step. I
routinely run my imports of 2000 items via the command-line interface.
This DSpace machine has only 1 GB of memory dedicated to it.
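For reference, the command-line round trip described here looks roughly like this (the collection handle, file names, and [dspace] install path are placeholders, and exact flags vary between DSpace versions - check the documentation for your version):

```shell
# Export a collection's metadata to CSV (handle shown is a placeholder)
[dspace]/bin/dspace metadata-export -f items.csv -i 123456789/10

# ... edit items.csv in a spreadsheet or script ...

# Apply the edited CSV; the tool prints a summary of the changes
# and asks for confirmation before touching the database
[dspace]/bin/dspace metadata-import -f items.csv
```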

Regards,
~~helix84

--
Got Input?   Slashdot Needs You.
Take our quick survey online.  Come on, we don't ask for help often.
Plus, you'll get a chance to win $100 to spend on ThinkGeek.
http://p.sf.net/sfu/slashdot-survey
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


Re: [Dspace-tech] Batch metadata corrections question: does anyone know why the limit is set to just 20 items at a time?

2011-07-28 Thread Heitor Barbieri

helix84 

Usually, how long does it take to import 2000 items via the command-line
interface?

Regards 
Heitor 

From: helix84 heli...@centrum.sk 
To: Alexander Bernard Lemann ablem...@bsu.edu 
Cc: dspace-tech@lists.sourceforge.net 
Sent: Thursday, 28 July 2011 9:17:16 
Subject: Re: [Dspace-tech] Batch metadata corrections question: does anyone 
know why the limit is set to just 20 items at a time? 



Re: [Dspace-tech] Batch metadata corrections question: does anyone know why the limit is set to just 20 items at a time?

2011-07-28 Thread helix84
On Thu, Jul 28, 2011 at 15:37, Heitor Barbieri
heitor.barbi...@bireme.org wrote:
 Usually, how much time do you wait to import 2000 items via the command line
 interface?

Not more than a minute.

Regards,
~~helix84

--
Got Input?   Slashdot Needs You.
Take our quick survey online.  Come on, we don't ask for help often.
Plus, you'll get a chance to win $100 to spend on ThinkGeek.
http://p.sf.net/sfu/slashdot-survey
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


Re: [Dspace-tech] Batch metadata corrections question: does anyone know why the limit is set to just 20 items at a time?

2011-07-28 Thread helix84
I think the most error-prone part will be displaying the diff (the
list of all changes) via HTTP. This is what is most likely to cause
long waiting times and timeouts, because a lot of HTML will be
generated. I had no problem with command line imports - you can even
halve the time it takes if you use the -s option to suppress
confirmation of changes, but I do not recommend this.
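As a sketch of the -s variant mentioned here (the path and file name are placeholders):

```shell
# -s suppresses the confirmation-of-changes step, roughly halving the
# runtime, at the cost of not seeing the diff before it is applied
[dspace]/bin/dspace metadata-import -f items.csv -s
```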

Regards,
~~helix84

--
Got Input?   Slashdot Needs You.
Take our quick survey online.  Come on, we don't ask for help often.
Plus, you'll get a chance to win $100 to spend on ThinkGeek.
http://p.sf.net/sfu/slashdot-survey
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


Re: [Dspace-tech] Batch metadata corrections question: does anyone know why the limit is set to just 20 items at a time?

2011-07-28 Thread helix84
On Thu, Jul 28, 2011 at 15:50, helix84 heli...@centrum.sk wrote:
 I think the most error-prone part will be displaying the diff (the
 list of all changes) via HTTP. This is what is most likely to cause
 long waiting times and timeouts, because a lot of HTML will be
 generated.

I would suggest that the developers improve this by paging the diff in
the UI. This should be practically as fast as a command-line import; it
would allow those who want to review the changes to do so page by page,
and it would allow those who don't to simply confirm the import.

Regards,
~~helix84

--
Got Input?   Slashdot Needs You.
Take our quick survey online.  Come on, we don't ask for help often.
Plus, you'll get a chance to win $100 to spend on ThinkGeek.
http://p.sf.net/sfu/slashdot-survey
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


Re: [Dspace-tech] Batch metadata corrections question: does anyone know why the limit is set to just 20 items at a time?

2011-07-27 Thread Stuart Lewis
Hi Irene,

 We're experimenting with making batch corrections to metadata using the 
 Import Metadata feature in the jsp.  We'd like to raise the limit on the 
 number of items that may be changed at a time.
 
 I can see the file: 
 http://scm.dspace.org/svn/repo/dspace/tags/dspace-1.6.2/dspace-jspui/dspace-jspui-api/src/main/java/org/dspace/app/webui/servlet/MetadataImportServlet.java
 
 Where it says this:
 // Set the limit to the number of items that may be changed in one go,
 // default to 20
 limit = ConfigurationManager.getIntProperty("bulkedit.gui-item-limit", 20);
 log.debug("Setting bulk edit limit to " + limit + " items");

Correct - adjust the setting 'bulkedit.gui-item-limit' in dspace.cfg.
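For anyone following along, that is a one-line change in [dspace]/config/dspace.cfg (20 is the shipped default; the value below is just an example - test before raising it in production):

```
# Maximum number of items the JSPUI batch metadata edit GUI
# will change in one go (default: 20)
bulkedit.gui-item-limit = 500
```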

 …We’d like to up it from 20 to maybe 500 as an experiment -- but potentially 
 much higher.  Does anyone know if that's a really bad idea?  We just don’t 
 know what the consequence is of making this limit larger, but 20 seems way 
 too low for a typical batch of changes.

We set it to 20 initially so that there is a low risk of anything going wrong.  
If you are happy with the tool, and the way it works, then it is fine to 
set it higher (but of course it carries the risk of potentially wrecking 500 
records at once instead of 20!).  

We don't recommend setting it too high as the changes are all made in one hit, 
and this can cause timeouts or memory problems on the server.  500 should be 
fine.  We've heard of problems when people start going over 1 or 2 thousand 
records at a time.

Thanks,


Stuart Lewis
Digital Development Manager
Te Tumu Herenga The University of Auckland Library
Auckland Mail Centre, Private Bag 92019, Auckland 1142, New Zealand
Ph: +64 (0)9 373 7599 x81928


--
Got Input?   Slashdot Needs You.
Take our quick survey online.  Come on, we don't ask for help often.
Plus, you'll get a chance to win $100 to spend on ThinkGeek.
http://p.sf.net/sfu/slashdot-survey
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


Re: [Dspace-tech] Batch metadata corrections question: does anyone know why the limit is set to just 20 items at a time?

2011-07-27 Thread Peter Dietz
Hi Irene,

We've wondered that too at my university, Ohio State University, so we've
upped our setting to 600 which we feel is safe, but users typically do
smaller batches.

I'm guessing the too-low-for-practical-use limit of 20 is to be overly
conservative by default, so that there is no risk of data loss or
interruption.

Doing batch changes with a large number of changes will keep your system
busy, and the reindexing can take a while. I've noticed that when we set the
limit to be really high, it appears that nothing will happen from the user's
browser for 20+ minutes, so I've connected to the server from the command
line, and noticed that the reindexing task was taking a long time, but still
running. So you might be safe with setting this to a really high number
(several thousand), you'll just have to have the patience to not disrupt it.
But smaller / more manageable batch sizes will complete in a reasonable
amount of time. With this set to 1000 or more, I'm guessing you're more likely
to run into Out-Of-Memory errors.

There is a note in
https://wiki.duraspace.org/display/DSDOC/System+Administration#SystemAdministration-BatchMetadataEditing
that cautions against doing batch edits that are too large.


Peter Dietz



On Wed, Jul 27, 2011 at 5:25 PM, Berry, Irene (CIV) icbe...@nps.edu wrote:

   Hello,
   We're experimenting with making batch corrections to metadata using the
 Import Metadata feature in the jsp.  We'd like to raise the limit on the
 number of items that may be changed at a time.

  I can see the file:
 http://scm.dspace.org/svn/repo/dspace/tags/dspace-1.6.2/dspace-jspui/dspace-jspui-api/src/main/java/org/dspace/app/webui/servlet/MetadataImportServlet.java

  Where it says this:
 // Set the limit to the number of items that may be changed in one go,
 // default to 20
 limit = ConfigurationManager.getIntProperty("bulkedit.gui-item-limit", 20);
 log.debug("Setting bulk edit limit to " + limit + " items");


  …We’d like to up it from 20 to maybe 500 as an experiment -- but
 potentially much higher.  Does anyone know if that's a really bad idea?  We
 just don’t know what the consequence is of making this limit larger, but 20
 seems way too low for a typical batch of changes.

  Thanks,

  Irene Berry, MLIS
 Digital Services Librarian
 Dudley Knox Library, Naval Postgraduate School



 --
 Got Input?   Slashdot Needs You.
 Take our quick survey online.  Come on, we don't ask for help often.
 Plus, you'll get a chance to win $100 to spend on ThinkGeek.
 http://p.sf.net/sfu/slashdot-survey
 ___
 DSpace-tech mailing list
 DSpace-tech@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/dspace-tech


--
Got Input?   Slashdot Needs You.
Take our quick survey online.  Come on, we don't ask for help often.
Plus, you'll get a chance to win $100 to spend on ThinkGeek.
http://p.sf.net/sfu/slashdot-survey___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


Re: [Dspace-tech] Batch metadata corrections question: does anyone know why the limit is set to just 20 items at a time?

2011-07-27 Thread Andrea Schweer
Hi,

On 28/07/11 10:19, Peter Dietz wrote:
 Doing batch changes with a large number of changes will keep your
 system busy, and the reindexing can take a while. I've noticed that
 when we set the limit to be really high, it appears that nothing will
 happen from the user's browser for 20+ minutes, so I've connected to
 the server from the command line, and noticed that the reindexing
 task was taking a long time, but still running. So you might be safe
 with setting this to a really high number (several thousand), you'll
 just have to have the patience to not disrupt it. But smaller / more
 manageable batch sizes will complete in a reasonable amount of time.
 With this set to 1000 or more, I'm guessing you're more likely to run
 into Out-Of-Memory errors.

With one of 'my' repositories, when we increased the limit (to 1000 I
think), completing the changes took so long that the Apache-Tomcat
connection timed out. This meant that the user saw an error in their
browser even though the changes actually went through fine. In our case
we decided to stick with a lower limit to avoid confusion. Though 20
really does feel very low.

cheers,
Andrea

-- 
Andrea Schweer
IRR Technical Specialist, ITS Information Systems
The University of Waikato, Hamilton, New Zealand

--
Got Input?   Slashdot Needs You.
Take our quick survey online.  Come on, we don't ask for help often.
Plus, you'll get a chance to win $100 to spend on ThinkGeek.
http://p.sf.net/sfu/slashdot-survey
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech