Re: [Dspace-tech] Persistent identifiers in DSpace -- thoughts please

2007-05-25 Thread Brad Teale
I looked through the Persistent Identifier (PI) wiki page and came up
with a few questions/comments.

1) You created the prototype with a stackable interface, something I
thought about doing, but now I'm wondering whether it causes more
problems than it's worth.  Why would an institution use more than one PI
system?  How do you determine which PI system generates a PId (based
on collection or community)?  What if one PI system fails (URL
unreachable, temporarily down) and it is needed to resolve the PId?
Could the PIds form a loop, each resolving to a different PI system,
while moving through the PI system stack?

2) It is mentioned that HTTP isn't "persistent".  Could someone explain
why HTTP is any less persistent than other protocols?

3) Including special characters in the URL string doesn't seem like a
good idea.  While they are valid characters, it does take extra
processing to encode/decode them from layer to layer.  Why not just
leave the URL alone or change /handle to something like /uri, /id, or
/pid?  Why encode the PI system into the URI?

4) Assigning bitstreams persistent identifiers seems dangerous.  At the
very least, version control and a history function are required by the
application and PI system to determine if the PId is actually pointing
to what was requested.  Also, how are multiple bitstreams handled when
assigned to an item?  Does each bitstream get a PId?  How does a user
look at all bitstreams associated together by the item when the PId
references only a single bitstream?
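
The loop question in (1) can be made concrete with a minimal Python
sketch (not DSpace code).  The REDIRECTS table, the resolve() function
and every identifier in it are hypothetical placeholders; the only point
is that remembering which PIds have already been visited is enough to
detect a cycle in a stack of PI systems.

```python
# Hypothetical sketch: each entry maps a PId either to another PId
# (a hop into a different PI system) or to a final URL. None of these
# identifiers or URLs come from DSpace; they are placeholders.
REDIRECTS = {
    "purl:abc": "hdl:12/34",
    "hdl:12/34": "http://dspace.example.org/item/1",
    "purl:loop": "hdl:99/1",
    "hdl:99/1": "purl:loop",  # deliberately circular
}


class ResolutionLoop(Exception):
    """Raised when a PId resolves back to one that was already seen."""


def resolve(pid, table):
    """Follow pid through the table until an http(s) URL is reached."""
    seen = set()
    current = pid
    while not current.startswith(("http://", "https://")):
        if current in seen:
            raise ResolutionLoop(f"loop detected at {current}")
        seen.add(current)
        if current not in table:
            raise KeyError(f"no PI system can resolve {current}")
        current = table[current]
    return current
```

Calling resolve("purl:abc", REDIRECTS) follows two hops to the final
URL, while resolve("purl:loop", REDIRECTS) raises ResolutionLoop instead
of spinning forever.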

As for a default PI system out of the box for DSpace, I would
recommend a local identifier scheme that uses the existing URLs.
Include the Handle PI system in the release as a configurable option,
but not turned on by default.  This would remove the fake handle being
assigned to all objects and clean up the default URLs out of the box.
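
The extra encode/decode step raised in (3) can be illustrated with a
small Python sketch.  It uses only the standard library's urllib.parse;
the hdl:12/34 value comes from the proposed URL form, and the two helper
names are hypothetical.  It assumes some layer chooses to carry the
whole identifier as a single path segment, which is not something the
prototype is known to do.

```python
from urllib.parse import quote, unquote


def pid_to_path_segment(pid):
    """Encode a PId so it can travel as one URL path segment."""
    # safe="" forces ':' and '/' to be percent-encoded as well.
    return quote(pid, safe="")


def path_segment_to_pid(segment):
    """Reverse the encoding and split off the PI system prefix."""
    scheme, _, local_id = unquote(segment).partition(":")
    return scheme, local_id


segment = pid_to_path_segment("hdl:12/34")
print(segment)                       # hdl%3A12%2F34
print(path_segment_to_pid(segment))  # ('hdl', '12/34')
```

Every layer that handles the URL has to agree on when this encoding is
applied and removed, which is the extra processing in question.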

--
Brad



On 05/22/2007 05:06 AM, James Rutherford wrote:
> Hi all,
> 
> I've recently started looking into the way DSpace deals (or doesn't)
> with persistent identifiers (prompted in part by patch #1690912 and a
> conversation I had with Mark Diggory). I've put some thoughts on the
> wiki:
> 
> http://wiki.dspace.org/index.php/PersistentIdentifiers
> 
> and I'd like to gather some input. I've already implemented everything
> discussed on the wiki in a prototype, and it seems to be working well.
> Note that the implementation is being done in parallel with the DAO
> prototype:
> 
> http://wiki.dspace.org/index.php/DaoPrototype
> 
> The most controversial aspects that I've come up against are:
> 
>  * deciding which persistent identifier method is used (if more than one
>    is supported); and
>  * what the URLs should look like (http://dspace.me.ac.uk/uri/hdl:12/34
>    rather than http://dspace.me.ac.uk/handle/12/34, for instance)
> 
> 
> I'm particularly interested in hearing from folks who already need to
> support other identifiers (PURLs, DOIs, etc), but any input would be
> appreciated.
> 
> cheers,
> 
> Jim
> 

-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


Re: [Dspace-tech] DSpace a memory hog?

2007-04-20 Thread Brad Teale
Cory,

Comments below:

On 04/18/2007 01:54 PM, Cory Snavely wrote:
> Well, as I said at first, it all depends on your definition of what a
> memory hog is. Today's hog fits in tomorrow's pocket. We better all
> already be used to that.

Thank you for proving my point about how pervasive memory bloat is in
the IT industry.  This type of thinking allows vendors (whether open
source or proprietary) to drive up the "base" system requirements
without greatly improving functionality, because the growth is treated
as predestined.

> Also, I don't think for a *minute* that the original developers of
> DSpace made a casual choice about their development environment--in
> fact, I think they made a responsible choice given the alternatives.
> Let's give our colleagues credit that's due. Their choice permits
> scaling and fits well for an open-source project. Putting the general
> problem of memory bloat in their laps seems pretty angsty to me.
> 
> Lastly, dedicating a server to DSpace is a choice, not a necessity. We
> as implementors have complete freedom to separate out the database and
> storage tiers, and mechanisms exist for scaling Tomcat horizontally as
> well. In the other direction, I suspect people are running DSpace on
> VMware or xen virtual machines, too.

I didn't say they made a casual choice about their development
environment.  I said the functional requirements of the application
didn't justify the memory footprint required to run this application.
Whether or not they made a choice that "fits well for an open-source
project" depends on your definition of Open Source.  However, I don't
think that debate is relevant to this discussion.

As far as scaling requirements go, it depends on where you want
scalability.  As you pointed out, web applications can naturally be
scaled vertically through hardware, or horizontally through Tomcat's
now-native approach.  Since either approach needs hardware, the memory
footprint of an application needs to be taken into account.  The higher
the "base" system requirements, the lower the likelihood that someone
will have a scalable system, because of total cost of ownership (TCO).
While virtual machine technology can help lower some TCO issues, it
brings in a whole new batch of problems which are out of scope for this
discussion.

The general problem of memory bloat rests in all developers' laps (mine
included).  As an industry, we need to constantly weigh our use of
memory against the functionality we are providing.  The functionality
provided by DSpace isn't rocket science, and shouldn't require a memory
footprint greater than that of most of the systems that get people into
space.

-- 
Brad Teale                        Web Application Developer
Digital Library Development Lab   University of Minnesota Libraries
[EMAIL PROTECTED]


> On Wed, 2007-04-18 at 13:40 -0500, Brad Teale wrote:
>> Pan,
>>
>> DSpace is a memory hog considering the functionality the application
>> provides.  This is mainly due to the technological choices made by the
>> founders of the DSpace project, and not the functional requirements the
>> DSpace project fulfills.
>>
>> Application and memory bloat are pervasive in the IT industry.  Each
>> individual organization should look at their requirements whether they
>> are hardware, software or both.  Having to dedicate a machine to an
>> application, especially a relatively simple application like DSpace, is
>> wasteful of hardware resources and people resources.
>>
>> Web applications should _not_ need 2G of memory to "run comfortably".
>>





Re: [Dspace-tech] DSpace a memory hog?

2007-04-18 Thread Brad Teale
Pan,

DSpace is a memory hog considering the functionality the application
provides.  This is mainly due to the technological choices made by the
founders of the DSpace project, and not the functional requirements the
DSpace project fulfills.

Application and memory bloat are pervasive in the IT industry.  Each
individual organization should look at their requirements whether they
are hardware, software or both.  Having to dedicate a machine to an
application, especially a relatively simple application like DSpace, is
wasteful of hardware resources and people resources.

Web applications should _not_ need 2G of memory to "run comfortably".

-- 
Brad Teale                        Web Application Developer
Digital Library Development Lab   University of Minnesota Libraries
[EMAIL PROTECTED]



Re: [Dspace-tech] OS for DSpace

2007-02-08 Thread Brad Teale
I just find it odd that RedHat doesn't seem to provide all the Apache
modules.  I _did_ find a mod_jk rpm mentioned on rhn.redhat.com, but it
seems to only be a source RPM.  Debian and other distros I've used
package mod_jk binaries, so it just seems funny that RedHat is afraid(?)
of packaging a mod_jk binary.

-Brad


On 02/08/2007 03:16 PM, Tim Donohue wrote:
> 
> 
> Brad Teale wrote:
>> All,
>> I've looked around RedHat EL and couldn't find a properly supported
>> Apache/Java/Tomcat stack from RedHat.  Our institution-wide IT department
>> recommends using RH Apache and Java/Tomcat installed by the user.  I'm
>> still working with them for the mod_jk package.  They want me to build
>> one, but I'm not sure why I would build a C++ package when RH supplies
>> one.  Any ideas on that front?
> 
> Brad,
> 
> The way we have things set up on RedHat EL 3 is as follows:
> 
> - RedHat's Apache Web Server
> - Ant  (downloaded & installed from Apache)
> - Java (downloaded & installed from Sun)
> - Tomcat (downloaded & installed from Apache)
> - mod_jk (compiled & installed following Wiki instructions:
>   http://wiki.dspace.org/index.php/ModJk )
> 
> When I first installed DSpace a little over 1.5 years ago this seemed to
> be the best way to do things on RHEL (since Ant, Java and Tomcat from
> RedHat were all outdated at that time).  In addition, at that time, I
> wasn't able to locate a mod_jk package from RedHat or anywhere else
> (hence compiling it from source).
> 
> It actually wasn't too painful to compile and install mod_jk
> (the hardest part was finding all the prerequisites to get it to
> actually compile).  But I documented it in as much detail as I could on
> the Wiki, so hopefully it *should* be relatively straightforward.  Let me
> know if you hit any snags, and maybe I can help out.
> 
> - Tim
> 


-- 
Brad Teale                        Web Application Developer
Digital Library Development Lab   University of Minnesota Libraries
[EMAIL PROTECTED]  612-625-0473



Re: [Dspace-tech] OS for DSpace

2007-02-08 Thread Brad Teale
All,

On 02/08/2007 09:21 AM, Mark Diggory wrote:
> Nobody has yet actually answered the deeper question about RHEL...
> How about you guys? Are you running DSpace on Java/Tomcat provided
> by RHEL support channels/updates or are you running on a "rolled
> your own" installation of java/tomcat? Or alternatively, are you
> using JPackage?
I've looked around RedHat EL and couldn't find a properly supported
Apache/Java/Tomcat stack from RedHat.  Our institution-wide IT department
recommends using RH Apache and Java/Tomcat installed by the user.  I'm
still working with them for the mod_jk package.  They want me to build
one, but I'm not sure why I would build a C++ package when RH supplies
one.  Any ideas on that front?

My $0.02,
-Brad



-- 
Brad Teale                        Web Application Developer
Digital Library Development Lab   University of Minnesota Libraries
[EMAIL PROTECTED]  612-625-0473



Re: [Dspace-tech] Multiple Metadata Schema w/ Batch import

2007-01-26 Thread Brad Teale
There is some code which looks like it would use a schema tag.  However,
this code is not written correctly, since the code behind it uses addDC()
instead of a proper call to addMetadata().  The addDC method is used
through the ItemImport object, and I haven't had time to rework this
code yet.  I was looking for something out of the box, but it doesn't
appear that DSpace provides the functionality I would like.
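
A minimal sketch of the difference, in Python rather than Java and not
taken from the DSpace source.  It assumes, purely for illustration, that
the schema attribute sits on a dublin_core root element and that each
dcvalue carries element and qualifier attributes; the "mylib" schema
name and the read_metadata() helper are hypothetical.  The schema is
read once and carried along with every value instead of being
hard-coded to dc, which is roughly what separates an addDC()-style call
from an addMetadata()-style call.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata file using a locally defined "mylib" schema.
SAMPLE = """
<dublin_core schema="mylib">
  <dcvalue element="title" qualifier="none">Field notes, 1923</dcvalue>
  <dcvalue element="creator">Example, A.</dcvalue>
</dublin_core>
"""


def read_metadata(xml_text, default_schema="dc"):
    """Return (schema, element, qualifier, value) tuples for one file."""
    root = ET.fromstring(xml_text.strip())
    # Read the schema once from the root; fall back to dc if absent.
    schema = root.get("schema", default_schema)
    rows = []
    for node in root.findall("dcvalue"):
        qualifier = node.get("qualifier")
        if qualifier in (None, "", "none"):
            qualifier = None  # treat "none" as an unqualified field
        value = (node.text or "").strip()
        rows.append((schema, node.get("element"), qualifier, value))
    return rows


for row in read_metadata(SAMPLE):
    print(row)
```

Each tuple has the shape a schema-aware importer would need to pass on;
a file with no schema attribute falls back to dc.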

-Brad

On 01/26/2007 01:23 PM, Don Gourley wrote:
> As I recall (I don't have access to the code right now) ItemImport as
> of 1.4 does support 'schema' as an attribute of, I think, the 
> element.  There is a problem mixing schemas because it just checks the
> attribute for the first dcvalue, but that shouldn't affect you if you
> are going to use your own schema for everything in the dublin_core.xml
> file.  As mentioned before you will need to define that schema in the
> metadata registry and it must be a "flat" schema (no nested structures).
> Also, I imagine you will have to customize the item display in JSP and
> tag libraries pretty extensively.
> 
> -Don
> 
> 



Re: [Dspace-tech] Multiple Metadata Schema w/ Batch import

2007-01-25 Thread Brad Teale
Christophe,

I read through both the DSpace wiki and your blog, and am still a little
confused.  It looks like you added to the dc schema and are just mapping
the new schema to the modified dc schema.  However, we do not want to
use the dc schema at all.  Instead we would like to use our own defined
schema for most of the collections in our DSpace instance and use the dc
schema for a few other collections.

Is this possible?  Or must everything be converted to a modified dc schema?

Thanks,
Brad

On 01/25/2007 12:48 AM, Christophe Dupriez wrote:
> Hi Brad!
> 
> You can look at the blog I started on the subject:
> http://pubmed-dspace.blogspot.com
> 
> I am working to load (very soon now) 46k bibliographic records from 
> Medline.
> 
> Wishing this may help,
> 
> Christophe
> 
> Brad Teale a écrit :
> 
>> Has anyone run a batch import of data that doesn't comply with the
>> Dublin Core?  We created a new metadata schema and would like to
>> import the data directly into the new schema and not bother with the
>> Dublin Core at all.  Does DSpace support this?
>>
>> I've looked at the ItemImport object, but it doesn't allow this, and I
>> looked into the Packager object but it seemed a little convoluted.
>> I've loaded my metadata schema in the metadatafieldregistry table, but
>> I can't get anything to actually look at it during import.  We have
>> around 25K objects with this new schema and would rather not have to
>> convert them to DC.  Any ideas?
>>
>> Thanks,
>> Brad
>>
>>   



[Dspace-tech] Multiple Metadata Schema w/ Batch import

2007-01-24 Thread Brad Teale
Has anyone run a batch import of data that doesn't comply with the
Dublin Core?  We created a new metadata schema and would like to import
the data directly into the new schema and not bother with the Dublin
Core at all.  Does DSpace support this?

I've looked at the ItemImport object, but it doesn't allow this, and I
looked into the Packager object but it seemed a little convoluted.  I've
loaded my metadata schema in the metadatafieldregistry table, but I
can't get anything to actually look at it during import.  We have around
25K objects with this new schema and would rather not have to convert
them to DC.  Any ideas?

Thanks,
Brad

-- 
Brad Teale                        Web Application Developer
Digital Library Development Lab   University of Minnesota Libraries
[EMAIL PROTECTED]  612-625-0473
