Re: [Dspace-tech] Persistent identifiers in DSpace -- thoughts please
I looked through the Persistent Identifier (PI) wiki page and came up with a few questions and comments.

1) You created the prototype with a stackable interface, something I thought about doing, but now I wonder whether it causes more problems than it's worth. Why would an institution use more than one PI system? How do you determine which PI system generates a PId (is it based on collection or community)? What if one PI system fails (URL unreachable, temporarily down) and it is needed to resolve the PId? Could PIds that resolve to different PI systems form a loop while moving through the PI system stack?

2) The wiki mentions that HTTP isn't "persistent". Could someone explain why HTTP isn't as persistent as any other protocol?

3) Including special characters in the URL string doesn't seem like a good idea. While they are valid characters, it takes extra processing to encode and decode them from layer to layer. Why not just leave the URL alone, or change /handle to something like /uri, /id, or /pid? Why encode the PI system into the URI?

4) Assigning persistent identifiers to bitstreams seems dangerous. At the very least, the application and the PI system need version control and a history function to determine whether the PId actually points to what was requested. Also, how are multiple bitstreams handled when they are assigned to one item? Does each bitstream get a PId? How does a user see all the bitstreams the item groups together when the PId references only a single bitstream?

As far as having a default PI system out of the box for DSpace, I would recommend a local identifier scheme that uses the existing URLs. Include the Handle PI system in the release as a configurable option, but do not turn it on by default. This would remove the fake handle assigned to all objects and clean up the default URLs out of the box.
-- Brad

On 05/22/2007 05:06 AM, James Rutherford wrote:
> Hi all,
>
> I've recently started looking into the way DSpace deals (or doesn't)
> with persistent identifiers (prompted in part by patch #1690912 and a
> conversation I had with Mark Diggory). I've put some thoughts on the
> wiki:
>
> http://wiki.dspace.org/index.php/PersistentIdentifiers
>
> and I'd like to gather some input. I've already implemented everything
> discussed on the wiki in a prototype, and it seems to be working well.
> Note that the implementation is being done in parallel with the DAO
> prototype:
>
> http://wiki.dspace.org/index.php/DaoPrototype
>
> The most controversial aspects that I've come up against are:
>
> * deciding which persistent identifier method is used (if more than one
>   is supported); and
> * what the URLs should look like (http://dspace.me.ac.uk/uri/hdl:12/34
>   rather than http://dspace.me.ac.uk/handle/12/34, for instance)
>
> I'm particularly interested in hearing from folks who already need to
> support other identifiers (PURLs, DOIs, etc), but any input would be
> appreciated.
>
> cheers,
>
> Jim

___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
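To make the questions about a stacked resolver concrete (which PI system handles a PId, what happens when one system is down, and whether PIds can chase each other in a loop), here is a minimal sketch in Python. It is not the prototype's code, which this thread does not show. The `scheme:value` form follows the `hdl:12/34` example in Jim's message; the `local` scheme, the `resolve` function, the `EXAMPLE_RESOLVERS` table, and the convention that a resolver returns `None` when its PI system is unreachable are all assumptions made for illustration.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical resolver type: takes the scheme-specific part of a PId and
# returns a final http(s) URL, another "scheme:value" PId, or None when the
# PI system behind it cannot answer (unreachable, temporarily down).
Resolver = Callable[[str], Optional[str]]


class ResolutionError(Exception):
    """Raised when a PId cannot be resolved to a URL."""


def split_identifier(pid: str) -> Tuple[str, str]:
    """Split 'hdl:12/34' into ('hdl', '12/34')."""
    scheme, sep, value = pid.partition(":")
    if not sep or not scheme or not value:
        raise ResolutionError(f"not a scheme:value identifier: {pid!r}")
    return scheme, value


def resolve(pid: str, resolvers: Dict[str, Resolver], max_hops: int = 10) -> str:
    """Follow a PId through the resolver stack until an http(s) URL is reached."""
    seen = set()
    current = pid
    while not current.startswith(("http://", "https://")):
        if current in seen:
            # The same PId came up twice: the PI systems point at each other.
            raise ResolutionError(f"loop detected at {current!r}")
        if len(seen) >= max_hops:
            raise ResolutionError(f"gave up after {max_hops} hops")
        seen.add(current)
        scheme, value = split_identifier(current)
        resolver = resolvers.get(scheme)
        if resolver is None:
            raise ResolutionError(f"no PI system registered for {scheme!r}")
        target = resolver(value)
        if target is None:
            raise ResolutionError(f"PI system {scheme!r} could not resolve {value!r}")
        current = target
    return current


# Illustrative stack: a made-up 'local' scheme that defers to 'hdl', which
# maps onto the /handle/ URL form quoted in the thread.
EXAMPLE_RESOLVERS: Dict[str, Resolver] = {
    "local": lambda value: f"hdl:{value}",
    "hdl": lambda value: f"http://dspace.me.ac.uk/handle/{value}",
}

print(resolve("local:12/34", EXAMPLE_RESOLVERS))
```

In this sketch a loop or a failed PI system surfaces as a `ResolutionError` rather than hanging the request. Whether the real prototype should fail, fall back to another PI system, or serve a cached target is exactly the open design question raised in point 1.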
Re: [Dspace-tech] DSpace a memory hog?
Cory,

Comments below:

On 04/18/2007 01:54 PM, Cory Snavely wrote:
> Well, as I said at first, it all depends on your definition of what a
> memory hog is. Today's hog fits in tomorrow's pocket. We better all
> already be used to that.

Thank you for proving my point about how pervasive memory bloat is in the IT industry. This type of thinking allows vendors (whether open source or proprietary) to drive up the "base" system requirements without greatly improving functionality, because the growth is treated as predestined.

> Also, I don't think for a *minute* that the original developers of
> DSpace made a casual choice about their development environment--in
> fact, I think they made a responsible choice given the alternatives.
> Let's give our colleagues credit that's due. Their choice permits
> scaling and fits well for an open-source project. Putting the general
> problem of memory bloat in their laps seems pretty angsty to me.
>
> Lastly, dedicating a server to DSpace is a choice, not a necessity. We
> as implementors have complete freedom to separate out the database and
> storage tiers, and mechanisms exist for scaling Tomcat horizontally as
> well. In the other direction, I suspect people are running DSpace on
> VMware or xen virtual machines, too.

I didn't say they made a casual choice about their development environment. I said the functional requirements of the application don't justify the memory footprint required to run it. Whether or not they made a choice that "fits well for an open-source project" depends on your definition of open source. However, I don't think that debate is relevant to this discussion.

As far as scaling requirements go, it depends on where you want scalability. As you pointed out, web applications can naturally be scaled vertically through hardware, or horizontally through Tomcat's now-native approach. Since either approach needs hardware, the memory footprint of an application has to be taken into account. The higher the "base" system requirements, the lower the likelihood that someone ends up with a scalable system, because of total cost of ownership (TCO). While virtual machine technology can help with some TCO issues, it brings in a whole new batch of problems which are out of scope for this discussion.

The general problem of memory bloat rests in all developers' laps (mine included). As an industry, we need to constantly weigh our use of memory against the functionality we are providing. The functionality provided by DSpace isn't rocket science, and it shouldn't require a memory footprint greater than most of the systems that get people into space.

--
Brad Teale
Web Application Developer
Digital Library Development Lab
University of Minnesota Libraries
[EMAIL PROTECTED]

> On Wed, 2007-04-18 at 13:40 -0500, Brad Teale wrote:
>> Pan,
>>
>> Dspace is a memory hog considering the functionality the application
>> provides. This is mainly due to the technological choices made by the
>> founders of the Dspace project, and not the functional requirements the
>> Dspace project fulfills.
>>
>> Application and memory bloat are pervasive in the IT industry. Each
>> individual organization should look at their requirements whether they
>> are hardware, software or both. Having to dedicate a machine to an
>> application, especially a relatively simple application like Dspace, is
>> wasteful for hardware resources and people resources.
>>
>> Web applications should _not_ need 2G of memory to "run comfortably".
Re: [Dspace-tech] DSpace a memory hog?
Pan,

DSpace is a memory hog considering the functionality the application provides. This is mainly due to the technological choices made by the founders of the DSpace project, and not to the functional requirements the project fulfills.

Application and memory bloat are pervasive in the IT industry. Each individual organization should look at its own requirements, whether they are hardware, software, or both. Having to dedicate a machine to an application, especially a relatively simple application like DSpace, is wasteful of both hardware and people resources.

Web applications should _not_ need 2G of memory to "run comfortably".

--
Brad Teale
Web Application Developer
Digital Library Development Lab
University of Minnesota Libraries
[EMAIL PROTECTED]
Re: [Dspace-tech] OS for DSpace
I just find it odd that RedHat doesn't seem to provide all the Apache modules. I _did_ find a mod_jk RPM mentioned on rhn.redhat.com, but it seems to be only a source RPM, whereas Debian and the other distros I've used package mod_jk binaries. It just seems funny that RedHat is afraid(?) of packaging a mod_jk binary.

-Brad

On 02/08/2007 03:16 PM, Tim Donohue wrote:
> Brad Teale wrote:
>> All,
>> I've looked around RedHat EL and couldn't find a properly supported
>> Apache/Java/Tomcat stack from RedHat. Our institution-wide IT department
>> recommends using RH Apache and Java/Tomcat installed by the user. I'm
>> still working with them for the mod_jk package. They want me to build
>> one, but I'm not sure why I would build a C++ package when RH supplies
>> one. Any ideas on that front?
>
> Brad,
>
> The way we have things set up on RedHat EL 3 is as follows:
>
> - RedHat's Apache Web Server
> - Ant (downloaded & installed from Apache)
> - Java (downloaded & installed from Sun)
> - Tomcat (downloaded & installed from Apache)
> - mod_jk (compiled & installed following Wiki instructions:
>   http://wiki.dspace.org/index.php/ModJk )
>
> When I first installed DSpace a little over 1.5 years ago this seemed to
> be the best way to do things on RHEL (since Ant, Java and Tomcat from
> RedHat were all outdated at that time). In addition, at that time, I
> wasn't able to locate a mod_jk package from RedHat or anywhere else
> (hence compiling it from source).
>
> It actually wasn't too incredibly painful to compile mod_jk to install
> it (the hardest part was finding all the prerequisites to get it to
> actually compile). But, I documented it as detailed as I could on the
> Wiki, so hopefully it *should* be relatively straightforward. Let me
> know if you hit any snags, and maybe I can help out.
>
> - Tim

--
Brad Teale
Web Application Developer
Digital Library Development Lab
University of Minnesota Libraries
[EMAIL PROTECTED]
612-625-0473
Re: [Dspace-tech] OS for DSpace
All,

On 02/08/2007 09:21 AM, Mark Diggory wrote:
> Nobody has yet to actually answer the deeper question about RHEL...
> How about you guys? Are you running DSpace on Java/Tomcat provided
> by RHEL support channels/updates or are you running on a "rolled
> your own" installation of java/tomcat? Or alternatively, are you
> using JPackage?

I've looked around RedHat EL and couldn't find a properly supported Apache/Java/Tomcat stack from RedHat. Our institution-wide IT department recommends using RH Apache with Java and Tomcat installed by the user. I'm still working with them on the mod_jk package. They want me to build one, but I'm not sure why I would build a C++ package when RH supplies one. Any ideas on that front?

My $0.02,
-Brad

--
Brad Teale
Web Application Developer
Digital Library Development Lab
University of Minnesota Libraries
[EMAIL PROTECTED]
612-625-0473
Re: [Dspace-tech] Multiple Metadata Schema w/ Batch import
There is some code which looks like it would use a schema tag. However, it does not work as intended, because the code behind it calls addDC() instead of making a proper call to addMetadata(). ItemImport relies on addDC(), and I haven't had time to rework that code yet. I was looking for something out of the box, but it doesn't appear that DSpace provides the functionality I would like.

-Brad

On 01/26/2007 01:23 PM, Don Gourley wrote:
> As I recall (I don't have access to the code right now) ItemImport as
> of 1.4 does support 'schema' as an attribute of, I think, the
> element. There is a problem mixing schemas because it just checks the
> attribute for the first dcvalue, but that shouldn't affect you if you
> are going to use your own schema for everything in the dublin_core.xml
> file. As mentioned before you will need to define that schema in the
> metadata registry and it must be a "flat" schema (no nested structures).
> Also, I imagine you will have to customize the item display in JSP and
> tag libraries pretty extensively.
>
> -Don
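For anyone who wants to test Don's suggestion against their own release, here is a small Python sketch that writes a flat metadata file with a `schema` attribute. It only generates the XML; it does not call DSpace. Several things are assumptions rather than facts from this thread: Don was not sure which element carries the attribute, so putting it on the root `dublin_core` element is a guess; the `element` and `qualifier` attribute names on `dcvalue`, the `umn` schema name, and the `build_metadata_xml` helper are illustrative. As noted above, an ItemImport that still calls addDC() may ignore the attribute entirely.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# One flat metadata field: (element, qualifier or None, value).
Field = Tuple[str, Optional[str], str]


def build_metadata_xml(schema: str, fields: List[Field]) -> str:
    """Serialize flat (element, qualifier, value) fields for one schema.

    ASSUMPTION: the schema attribute sits on the root element. The thread
    leaves open which element actually carries it.
    """
    root = ET.Element("dublin_core", {"schema": schema})
    for element, qualifier, value in fields:
        if not isinstance(value, str):
            # Don's note: the schema must be "flat", so no nested structures.
            raise TypeError(f"value for {element!r} must be plain text")
        dcvalue = ET.SubElement(
            root,
            "dcvalue",
            {"element": element, "qualifier": qualifier or "none"},
        )
        dcvalue.text = value
    return ET.tostring(root, encoding="unicode")


# 'umn' is a made-up schema name; it would first have to be defined in the
# metadata registry, as Don points out.
xml_text = build_metadata_xml(
    "umn",
    [("title", None, "Example item"), ("date", "created", "2007-01-26")],
)
print(xml_text)
```

Importing one such file into a scratch collection and checking which schema the stored values land in would show whether a given release honours the attribute or falls back to dc.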
Re: [Dspace-tech] Multiple Metadata Schema w/ Batch import
Christophe,

I read through both the DSpace wiki and your blog, and am still a little confused. It looks like you added fields to the dc schema and are just mapping the new schema onto the modified dc schema. However, we do not want to use the dc schema at all. Instead, we would like to use our own schema for most of the collections in our DSpace instance and use the dc schema for a few other collections. Is this possible? Or must everything be converted to a modified dc schema?

Thanks,
Brad

On 01/25/2007 12:48 AM, Christophe Dupriez wrote:
> Hi Brad!
>
> You can look at the blog I started on the subject:
> http://pubmed-dspace.blogspot.com
>
> I am working to load (very soon now) 46k bibliographic records from
> Medline.
>
> Wishing this may help,
>
> Christophe
>
> Brad Teale wrote:
>
>> Has anyone run a batch import of data that doesn't comply with the
>> Dublin Core? We created a new metadata schema and would like to
>> import the data directly into the new schema and not bother with the
>> Dublin Core at all. Does Dspace support this?
>>
>> I've looked at the ItemImport object, but it doesn't allow this, and I
>> looked into the Packager object but it seemed a little convoluted.
>> I've loaded my metadata schema in the metadatafieldregistry table, but
>> I can't get anything to actually look at it during import. We have
>> around 25K objects with this new schema and would rather not have to
>> convert them to DC. Any ideas?
>>
>> Thanks,
>> Brad
[Dspace-tech] Multiple Metadata Schema w/ Batch import
Has anyone run a batch import of data that doesn't comply with the Dublin Core? We created a new metadata schema and would like to import the data directly into the new schema and not bother with the Dublin Core at all. Does DSpace support this?

I've looked at the ItemImport object, but it doesn't allow this, and I looked into the Packager object but it seemed a little convoluted. I've loaded my metadata schema into the metadatafieldregistry table, but I can't get anything to actually look at it during import. We have around 25K objects with this new schema and would rather not have to convert them to DC. Any ideas?

Thanks,
Brad

--
Brad Teale
Web Application Developer
Digital Library Development Lab
University of Minnesota Libraries
[EMAIL PROTECTED]
612-625-0473