[CODE4LIB] DL Systems (allowing search within documents and access restrictions)?

2010-10-20 Thread Deng, Sai
Hello, list,
Do you know of any digital library systems that can search within documents 
(e.g. PDFs) and handle access restrictions (e.g. DRM)?
Have any of you compared these DL systems?

Thanks for any information!
Sophie


Re: [CODE4LIB] DL Systems (allowing search within documents and access restrictions)?

2010-10-20 Thread Han, Yan
I would think DSpace, Fedora, and EPrints. DSpace is fairly easy to implement 
and has embargo support in 1.6 
(https://wiki.duraspace.org/display/DSTEST/Embargo).
I have an article comparing DSpace and Fedora, but it was written 6 years ago. 
DSpace has not changed much since then, but Fedora is a different story.
Yan


Re: [CODE4LIB] DL Systems (allowing search within documents and access restrictions)?

2010-10-20 Thread Deng, Sai
Maybe my question was not clear. We are looking for a system that can search 
the full text of the deposited documents; these are licensed materials, so 
we'll also need access restrictions.
We use DSpace, but I don't think DSpace does full text search, i.e. it doesn't 
search content in bitstreams (PDFs, PPTs...).

Any suggestions?
Thanks!
Sophie



Re: [CODE4LIB] DL Systems (allowing search within documents and access restrictions)?

2010-10-20 Thread Bill Janssen
Deng, Sai sai.d...@wichita.edu wrote:

> Do you know the Digital Library systems which can search within the
> documents (e.g. PDFs) and handle access restrictions (e.g. DRM)?

Not sure what you mean by "handle access restrictions".  Do you mean it
can index the documents put into it even if they have DRM encumbrances?

UpLib has search within the documents -- if you search for a word or
phrase, it shows you all the documents which match, and also all the
pages in each document which match.  It supports a wide variety of document
formats, from JPEG2000 to PDF to PowerPoint.  But as far as I know it
doesn't deal with DRM restrictions.
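
The page-level matching described above can be sketched as a small inverted index. This is only an illustration of the idea, not UpLib's implementation; the function names, document names, and page texts are made up for the example.

```python
from collections import defaultdict
import re


def build_page_index(documents):
    """Map each word to the set of (document id, page number) pairs containing it."""
    index = defaultdict(set)
    for doc_id, pages in documents.items():
        # Pages are numbered from 1, as a reader would see them.
        for page_no, text in enumerate(pages, start=1):
            for word in re.findall(r"\w+", text.lower()):
                index[word].add((doc_id, page_no))
    return index


def search(index, word):
    """Return {document id: sorted page numbers} for every document matching the word."""
    hits = defaultdict(list)
    for doc_id, page_no in sorted(index.get(word.lower(), ())):
        hits[doc_id].append(page_no)
    return dict(hits)


# Hypothetical sample collection: each document is a list of page texts.
documents = {
    "report.pdf": ["Digital library systems", "Access control and embargo", "Library budgets"],
    "slides.ppt": ["Search within documents", "Library access"],
}
index = build_page_index(documents)
print(search(index, "library"))
```

A real system would first have to extract the text of each page from the PDF or PowerPoint file; the sketch starts from text that has already been extracted.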

Bill


Re: [CODE4LIB] DL Systems (allowing search within documents and access restrictions)?

2010-10-20 Thread Deng, Sai
By access restriction, I mean we would like to have certain documents open 
only to certain communities (UpLib cannot do that, right?). I don't know how 
DRM affects file indexing.

On second thought, I searched for DSpace full text search and found this: 
https://wiki.duraspace.org/display/DSPACE/Configure+full+text+indexing
However, I haven't seen any instance that shows full text search results 
the way vendor databases do.

Any idea which system might be best for search within documents and DRM?
Thank you for the reply!
Sophie




Re: [CODE4LIB] DL Systems (allowing search within documents and access restrictions)?

2010-10-20 Thread Bill Janssen
Deng, Sai sai.d...@wichita.edu wrote:

> For access restriction, I mean we would like to have certain documents
> open only to certain communities (UpLib cannot do that, right?).

OK, that's not what I typically think of when I hear "DRM".  "Access control"
is (I think) the way it's usually put.

No, UpLib has no built-in access control system, though the hooks are
there, and I know that some have used them to do access control.  I know
of one UpLib application which requires incoming connections to provide
a client certificate, which it uses to give different clients different
access rights.  Probably overkill for most uses.

You'd probably want to do an application-specific Web UI, though -- you
could put the access restrictions there.  I recently saw a Tomcat app
which uses the UpLib Java client-side library to search for documents,
then provided a completely custom UI.

> On second thought, I searched for DSpace full text search and found
> this:
> https://wiki.duraspace.org/display/DSPACE/Configure+full+text+indexing
> However, I haven't seen any instance which shows the full text search
> results as I would see from vendor databases.
>
> Any idea on what system might be good/best for search within documents and
> DRM?

How about Greenstone?

Bill


Re: [CODE4LIB] DL Systems (allowing search within documents and access restrictions)?

2010-10-20 Thread Han, Yan
DSpace does full-text search; you need to turn it on in the configuration file.
See UAL: http://arizona.openrepository.com/arizona/
Yan



Re: [CODE4LIB] DL Systems (allowing search within documents and access restrictions)?

2010-10-20 Thread Deng, Sai
How can people tell that it searches content in bitstreams (PDFs, Word docs)? It 
looks like it only searches metadata.
Thanks.
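
One way to check this, offered only as a sketch: pick a word that appears in a deposited file's text but not in its metadata record, then search the repository for it. If the item comes back, the hit can only have come from the full text. The helper below picks such candidate words; its name and the sample strings are hypothetical, and it assumes the file's text has already been extracted.

```python
import re
from typing import List


def body_only_terms(metadata: str, body: str, min_len: int = 6) -> List[str]:
    """Return words that occur in the document body but nowhere in its metadata."""
    def tokenize(text: str) -> set:
        # Keep only reasonably long alphabetic words; short ones make poor probes.
        return {w.lower() for w in re.findall(r"[A-Za-z]+", text) if len(w) >= min_len}

    return sorted(tokenize(body) - tokenize(metadata))


# Hypothetical metadata record and extracted body text for one item.
metadata = "Annual report 2009, Library Systems"
body = "The annual report covers circulation statistics and interlibrary lending."
print(body_only_terms(metadata, body))
```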



Re: [CODE4LIB] DL Systems (allowing search within documents and access restrictions)?

2010-10-20 Thread Deng, Sai
Thanks for the information!
Greenstone has full text search, but I heard that its access control is much 
weaker than DSpace's. Will it be able to open certain documents only to 
certain people or certain departments?
Thanks.
Sophie



Re: [CODE4LIB] DL Systems (allowing search within documents and access restrictions)?

2010-10-20 Thread Mark Jordan
Sophie,

It might help some of us on the list to understand what types of access control 
you need if you can describe some of the ways that the allowed users (people 
and/or departments, to use your examples) will identify themselves. Will they 
have already logged into the system with a local (to the system) account, or 
with a campus account that knows that they are part of a specific department? 
Will they need to log into the system when they request to see a specific 
document? Will where they are sitting matter (i.e., restricted by IP address)?
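
To make the options in these questions concrete, here is a minimal sketch of an access check that accepts either group membership (from a local or campus login) or a campus IP address. All names are hypothetical and the network ranges are reserved documentation addresses, not real campus ranges; no particular repository system is assumed.

```python
import ipaddress

# Hypothetical campus ranges (addresses reserved for documentation).
CAMPUS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def on_campus(client_ip: str) -> bool:
    """Return True if the client address falls inside any campus network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in CAMPUS_NETWORKS)


def may_view(document_groups, user_groups=None, client_ip=None, allow_campus_ip=False):
    """Grant access by group membership, or by campus IP if the document allows it."""
    # A logged-in user is allowed if they share at least one group with the document.
    if user_groups and set(document_groups) & set(user_groups):
        return True
    # Anonymous access by location only applies when explicitly enabled.
    if allow_campus_ip and client_ip is not None and on_campus(client_ip):
        return True
    return False


print(may_view({"physics"}, user_groups={"physics"}))
print(may_view({"physics"}, client_ip="203.0.113.5", allow_campus_ip=True))
```

Whether the groups come from a local account or a campus directory only changes how `user_groups` is filled in; the check itself stays the same.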

Mark

Mark Jordan
Head of Library Systems
W.A.C. Bennett Library, Simon Fraser University
Burnaby, British Columbia, V5A 1S6, Canada
Voice: 778.782.5753 / Fax: 778.782.3023 / Skype: mark.jordan50
mjor...@sfu.ca



Re: [CODE4LIB] DL Systems (allowing search within documents and access restrictions)?

2010-10-20 Thread Deng, Sai
Thanks for the questions!
We don't have a clear idea yet; we are looking for a system now. The basic 
idea is that we'll deposit some licensed materials for a department and open 
them only to that group. I guess a local account would be OK, though if a 
campus account can be recognized, that's better. They'll need to log in to see 
the document if it's not IP restricted, right? IP restriction might not be the 
best way since faculty members will not always be in their departments.

Sophie  



Re: [CODE4LIB] DL Systems (allowing search within documents and access restrictions)?

2010-10-20 Thread Bill Janssen
Deng, Sai sai.d...@wichita.edu wrote:

> Thanks for the questions!
>
> We don't have a clear idea yet and we are looking for a system
> now. The basic idea is that we'll deposit some licensed materials for
> some department and open them only to that group. I guess a local
> account would be ok, of course, if a campus account can be recognized,
> that's better.

In which case you'll need some access control system which can
understand your campus login system.

> They'll need to log in to see the document if it's not
> ip restricted, right? IP restriction might not be the best way since
> faculty members will not always be in their departments.

Will you let them search for documents, and show the search results,
even if they can't retrieve the full document, as the ACM Digital
Library does?  Or do search results have to be filtered, too?

How many different access groups will you have?  One per department?
One per licensed set of material?  And what's the approximate size of
each of those numbers?
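
The two policies in the first question can be sketched side by side: either drop documents the user cannot open from the result list, or show every match and flag which ones are retrievable. This is an illustration with hypothetical names and sample data, not the behavior of any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    title: str
    groups: frozenset  # access groups allowed to retrieve the full document


def can_retrieve(doc, user_groups):
    """A user may retrieve a document if they belong to one of its groups."""
    return bool(doc.groups & frozenset(user_groups))


def filtered_results(matches, user_groups):
    """Policy A: hide documents the user cannot retrieve."""
    return [d.doc_id for d in matches if can_retrieve(d, user_groups)]


def annotated_results(matches, user_groups):
    """Policy B: show every match, flagging which ones the user can open."""
    return [(d.doc_id, can_retrieve(d, user_groups)) for d in matches]


# Hypothetical search matches, one per department.
matches = [
    Doc("d1", "Quantum optics notes", frozenset({"physics"})),
    Doc("d2", "Archival methods", frozenset({"history"})),
]
print(filtered_results(matches, {"physics"}))
print(annotated_results(matches, {"physics"}))
```

Policy B reveals that a restricted document exists, and possibly its title, so the choice depends on what the license for the material permits.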

A simple thing to do would be to install something like DocuShare, which
already does all this stuff and is built on top of Autonomy, one of the
better suites for extracting and indexing content from documents.

Bill



 