Re: [Dspace-tech] Batch metadata corrections question: does anyone know why the limit is set to just 20 items at a time? AND: question about regular recompiling
Thank you, Peter, Stuart, Andrea, and Alex. So it sounds like it comes down to how much RAM is available on our server. We're going to try 1000 and see how it goes; we are still on a development server for just this kind of reason.

We are looking at correcting a very large batch of records (retrospective ETDs) before we go live, so this is somewhat exceptional... but looking forward, I can see we'll have to be correcting and recompiling regularly. I'm still fairly new, so if anyone has advice or warnings about making batch corrections a routine practice, I'd be glad to hear it. How do you handle updates and recompiling? Quarterly? After hours?

Thanks, everyone,

Irene Berry, MLIS
Digital Services Librarian
Dudley Knox Library, Naval Postgraduate School
Re: [Dspace-tech] Batch metadata corrections question: does anyone know why the limit is set to just 20 items at a time?
-----Original Message-----
From: Andrea Schweer [mailto:schw...@waikato.ac.nz]
Sent: Wednesday, July 27, 2011 6:38 PM
To: dspace-tech@lists.sourceforge.net
Subject: Re: [Dspace-tech] Batch metadata corrections question: does anyone know why the limit is set to just 20 items at a time?

> Hi,
>
> On 28/07/11 10:19, Peter Dietz wrote:
>> Doing batch changes with a large number of changes will keep your system busy, and the reindexing can take a while. I've noticed that when we set the limit really high, it can appear that nothing is happening from the user's browser for 20+ minutes; when I connected to the server from the command line, I saw that the reindexing task was taking a long time but still running. So you might be safe setting this to a really high number (several thousand); you'll just have to have the patience not to disrupt it. Smaller, more manageable batch sizes will complete in a reasonable amount of time. With this set to 1000 or more, I'm guessing you're more likely to run into out-of-memory errors.
>
> With one of 'my' repositories, when we increased the limit (to 1000, I think), completing the changes took so long that the Apache-Tomcat connection timed out. This meant that the user saw an error in their browser even though the changes actually went through fine. In our case we decided to stick with a lower limit to avoid confusion. Though 20 really does feel very low.

There are some reports of this working with as many as 1000 items: http://dspace.2283337.n4.nabble.com/Excel-to-Dspace-td3310564.html Though it seems to depend on the amount of RAM available on your server.

Relatedly, does anyone know how safe it is to test higher limit values? I looked at the code to determine whether these batch edits occur within a single database transaction, but I didn't see any evidence either way.

Thanks,
Alex Lemann
Re: [Dspace-tech] Batch metadata corrections question: does anyone know why the limit is set to just 20 items at a time?
On Thu, Jul 28, 2011 at 14:00, Lemann, Alexander Bernard ablem...@bsu.edu wrote:
> Relatedly, does anyone know how safe it is to test higher limit values? I looked at the code to determine whether these batch edits occur within a single database transaction, but I didn't see any evidence either way.

This shouldn't be a problem at the database layer; the limit and the timeout problems concern only the web UI. I upped the limit to 2000 because I wanted to look at the diff in a web browser, and it wasn't a problem -- but I only did the preview, not the actual import step. I routinely run my imports of 2000 items via the command-line interface. This DSpace machine has only 1 GB of memory dedicated to it.

Regards,
~~helix84
Re: [Dspace-tech] Batch metadata corrections question: does anyone know why the limit is set to just 20 items at a time?
helix84,

Usually, how much time do you wait to import 2000 items via the command-line interface?

Regards,
Heitor

From: helix84 heli...@centrum.sk
To: Alexander Bernard Lemann ablem...@bsu.edu
Cc: dspace-tech@lists.sourceforge.net
Sent: Thursday, 28 July 2011 9:17:16
Subject: Re: [Dspace-tech] Batch metadata corrections question: does anyone know why the limit is set to just 20 items at a time?

> On Thu, Jul 28, 2011 at 14:00, Lemann, Alexander Bernard ablem...@bsu.edu wrote:
>> Relatedly, does anyone know how safe it is to test higher limit values? I looked at the code to determine whether these batch edits occur within a single database transaction, but I didn't see any evidence either way.
>
> This shouldn't be a problem at the database layer; the limit and the timeout problems concern only the web UI. I upped the limit to 2000 because I wanted to look at the diff in a web browser, and it wasn't a problem -- but I only did the preview, not the actual import step. I routinely run my imports of 2000 items via the command-line interface. This DSpace machine has only 1 GB of memory dedicated to it.
>
> Regards,
> ~~helix84
Re: [Dspace-tech] Batch metadata corrections question: does anyone know why the limit is set to just 20 items at a time?
On Thu, Jul 28, 2011 at 15:37, Heitor Barbieri heitor.barbi...@bireme.org wrote:
> Usually, how much time do you wait to import 2000 items via the command-line interface?

Not more than a minute.

Regards,
~~helix84
Re: [Dspace-tech] Batch metadata corrections question: does anyone know why the limit is set to just 20 items at a time?
I think the most error-prone part will be displaying the diff (the list of all changes) over HTTP; that is what is most likely to cause long waits and timeouts, because a lot of HTML has to be generated. I had no problem with command-line imports. You can even halve the time they take by using the -s option to suppress confirmation of the changes, though I don't recommend doing so.

Regards,
~~helix84
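For concreteness, a minimal sketch of the command-line import described above, assuming a DSpace 1.6-era launcher; the -f flag is the usual way to point at the CSV file, and corrections.csv is a hypothetical file name:

# Preview the proposed changes, then confirm interactively to apply them.
[dspace]/bin/dspace metadata-import -f corrections.csv

# With -s the confirmation step is skipped -- faster, but you lose the
# chance to review the diff before it is applied.
[dspace]/bin/dspace metadata-import -f corrections.csv -s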
Re: [Dspace-tech] Batch metadata corrections question: does anyone know why the limit is set to just 20 items at a time?
On Thu, Jul 28, 2011 at 15:50, helix84 heli...@centrum.sk wrote:
> I think the most error-prone part will be displaying the diff (the list of all changes) over HTTP; that is what is most likely to cause long waits and timeouts, because a lot of HTML has to be generated.

I would suggest that the developers improve this by paging the diff in the UI. That should be practically as fast as a command-line import; those who want to review the changes could do so page by page, and those who don't could simply confirm the import.

Regards,
~~helix84
Re: [Dspace-tech] Batch metadata corrections question: does anyone know why the limit is set to just 20 items at a time?
Hi Irene,

> We're experimenting with making batch corrections to metadata using the Import Metadata feature in the JSPUI. We'd like to raise the limit on the number of items that may be changed at a time. I can see the file:
>
> http://scm.dspace.org/svn/repo/dspace/tags/dspace-1.6.2/dspace-jspui/dspace-jspui-api/src/main/java/org/dspace/app/webui/servlet/MetadataImportServlet.java
>
> where it says this:
>
> // Set the limit to the number of items that may be changed in one go, default to 20
> limit = ConfigurationManager.getIntProperty("bulkedit.gui-item-limit", 20);
> log.debug("Setting bulk edit limit to " + limit + " items");

Correct -- adjust the setting 'bulkedit.gui-item-limit' in dspace.cfg.

> We'd like to up it from 20 to maybe 500 as an experiment -- but potentially much higher. Does anyone know if that's a really bad idea? We just don't know what the consequence of raising this limit is, but 20 seems far too low for a typical batch of changes.

We set it to 20 initially so that there is a low risk of anything going wrong. If you are happy with the tool and the way it works, then it is fine to set it higher (but of course that carries the risk of potentially wrecking 500 records at once instead of 20!). We don't recommend setting it too high, as the changes are all made in one hit, and this can cause timeouts or memory problems on the server. 500 should be fine; we've heard of problems when people start going over one or two thousand records at a time.

Thanks,

Stuart Lewis
Digital Development Manager
Te Tumu Herenga The University of Auckland Library
Auckland Mail Centre, Private Bag 92019, Auckland 1142, New Zealand
Ph: +64 (0)9 373 7599 x81928
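In dspace.cfg the change looks like this (a sketch: the property name comes from the code above, and the value 500 follows Stuart's suggestion; a Tomcat restart is typically needed before it takes effect):

# [dspace]/config/dspace.cfg
# Raise the web-UI batch edit limit from the default of 20.
bulkedit.gui-item-limit = 500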
Re: [Dspace-tech] Batch metadata corrections question: does anyone know why the limit is set to just 20 items at a time?
Hi Irene,

We've wondered that too at my university, Ohio State University, so we've upped our setting to 600, which we feel is safe, though users typically do smaller batches. I'm guessing the too-low-for-practical-use limit of 20 is meant to be overly conservative by default, so that there is no risk of data loss or interruption.

Doing batch changes with a large number of changes will keep your system busy, and the reindexing can take a while. I've noticed that when we set the limit really high, it can appear that nothing is happening from the user's browser for 20+ minutes; when I connected to the server from the command line, I saw that the reindexing task was taking a long time but still running. So you might be safe setting this to a really high number (several thousand); you'll just have to have the patience not to disrupt it. Smaller, more manageable batch sizes will complete in a reasonable amount of time. With this set to 1000 or more, I'm guessing you're more likely to run into out-of-memory errors.

There is a note in https://wiki.duraspace.org/display/DSDOC/System+Administration#SystemAdministration-BatchMetadataEditing that cautions against doing too-large batch edits.

Peter Dietz

On Wed, Jul 27, 2011 at 5:25 PM, Berry, Irene (CIV) icbe...@nps.edu wrote:
> Hello,
>
> We're experimenting with making batch corrections to metadata using the Import Metadata feature in the JSPUI. We'd like to raise the limit on the number of items that may be changed at a time. I can see the file:
>
> http://scm.dspace.org/svn/repo/dspace/tags/dspace-1.6.2/dspace-jspui/dspace-jspui-api/src/main/java/org/dspace/app/webui/servlet/MetadataImportServlet.java
>
> where it says this:
>
> // Set the limit to the number of items that may be changed in one go, default to 20
> limit = ConfigurationManager.getIntProperty("bulkedit.gui-item-limit", 20);
> log.debug("Setting bulk edit limit to " + limit + " items");
>
> We'd like to up it from 20 to maybe 500 as an experiment -- but potentially much higher. Does anyone know if that's a really bad idea? We just don't know what the consequence of raising this limit is, but 20 seems far too low for a typical batch of changes.
>
> Thanks,
> Irene Berry, MLIS
> Digital Services Librarian
> Dudley Knox Library, Naval Postgraduate School
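As a side note, one way to reassure yourself that the reindex is still running while the browser appears to hang -- an assumption on my part, not something Peter specified -- is to watch the DSpace log from the shell:

# Indexing activity shows up in the main DSpace log as it happens.
tail -f [dspace]/log/dspace.log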
Re: [Dspace-tech] Batch metadata corrections question: does anyone know why the limit is set to just 20 items at a time?
Hi,

On 28/07/11 10:19, Peter Dietz wrote:
> Doing batch changes with a large number of changes will keep your system busy, and the reindexing can take a while. I've noticed that when we set the limit really high, it can appear that nothing is happening from the user's browser for 20+ minutes; when I connected to the server from the command line, I saw that the reindexing task was taking a long time but still running. So you might be safe setting this to a really high number (several thousand); you'll just have to have the patience not to disrupt it. Smaller, more manageable batch sizes will complete in a reasonable amount of time. With this set to 1000 or more, I'm guessing you're more likely to run into out-of-memory errors.

With one of 'my' repositories, when we increased the limit (to 1000, I think), completing the changes took so long that the Apache-Tomcat connection timed out. This meant that the user saw an error in their browser even though the changes actually went through fine. In our case we decided to stick with a lower limit to avoid confusion. Though 20 really does feel very low.

cheers,
Andrea

--
Andrea Schweer
IRR Technical Specialist, ITS Information Systems
The University of Waikato, Hamilton, New Zealand
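If you hit the timeout Andrea describes but still want larger batches, one mitigation is to lengthen the proxy timeout -- a sketch only, assuming Apache httpd fronting Tomcat via mod_proxy/mod_proxy_ajp, with an illustrative 30-minute value:

# Apache httpd configuration: give the Tomcat backend up to 30 minutes
# to answer before the proxy gives up and returns an error page.
ProxyPass /jspui ajp://localhost:8009/jspui
ProxyTimeout 1800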