Re: [CODE4LIB] pdf2txt

2013-10-15 Thread Arash.Joorabchi
Eric,

You might want to consider using http://www.documentcloud.org to host
your users' documents. That would also take care of the
privacy/authentication concerns. I know of a project in the journalism
domain (http://overview.ap.org/) that does this.

As far as I remember, they provide an API and do some named-entity
recognition as well.

Regards,
Arash

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Eric Lease Morgan
Sent: 11 October 2013 18:58
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] pdf2txt

On Oct 11, 2013, at 1:49 PM, Matthew Sherman <matt.r.sher...@gmail.com>
wrote:

>> For a limited period of time I am making publicly available a
>> Web-based program called PDF2TXT -- http://bit.ly/1bJRyh8
>
> Very slick, good work.  I can see where this tool can be very helpful.
>
> It does have some issues with some characters, but this is rather
> common with most systems.

Again, thank you for the support. Yes, there are some escaping issues to
be resolved. Release early. Release often. I need help with the
graphic design in general. 

Here's an enhancement I thought of:

  1. allow readers to authenticate
  2. allow readers to upload documents
  3. documents get saved in readers' cache
  4. allow interface to list documents in the cache
  5. provide text mining services against reader-selected documents
  6. go to Step #1

It would also be cool if I could figure out how to finish the
installation of Tesseract to enable OCRing. [1]

[1] OCRing -
http://serials.infomotions.com/code4lib/archive/2013/201303/1554.html

--
Eric Morgan

-
No virus found in this message.
Checked by AVG - www.avg.com
Version: 2014.0.4142 / Virus Database: 3604/6734 - Release Date:
10/08/13


[CODE4LIB] WorldCat API - myTags

2013-06-06 Thread Arash.Joorabchi
Hi all,

 

When viewing a work's metadata on the WorldCat.org website, the tag
section of the page gives you the option to add new tags after
logging in with your (free) account. I was wondering if there is a
WorldCat API to do this from within my Java code.

 

Thanks,

Arash

 

 


[CODE4LIB] MARC field for FAST

2012-11-23 Thread Arash.Joorabchi
Given a collection of scientific documents annotated with FAST subject
headings, I was wondering what MARC field should be used to represent
FAST?

DDC (MARC-082)
LCC (MARC-050)
LCSH (MARC-650)
FAST ?

Thanks,
Arash


Re: [CODE4LIB] articles using ddc

2012-07-12 Thread Arash.Joorabchi
Thanks Karen. Rene also mentioned BASE (thanks). They only go as far
as the third level of the DDC, and in all the cases I checked the DDC
classes had been assigned automatically. Meanwhile, I have found out that
some university libraries have assigned subject metadata to the
technical reports and articles archived in their institutional
repositories; e.g., see:

http://www.worldcat.org/title/supporting-oo-design-heuristics/oclc/191481189&referer=brief_results

None of them has all of the DDC, LCC, and LCSH metadata assigned, but it
is a good start anyway. Some collections, such as the 1400 research
papers from the Carnegie Mellon University School of Computer Science,
have proper LCSHs assigned but share the same DDC and LCC numbers (i.e.,
LCC: QA76.C37, DDC: 510.7808), which I suppose is understandable
considering the amount of work required for their proper manual
classification.

Thanks,
Arash

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Karen Coyle
Sent: 12 July 2012 18:12
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] articles using ddc

Someone asked a while back about a source of journal articles that had 
been indexed using DDC. I have found such a source here:

http://www.base-search.net/Browse/Home

No idea if it meets your needs, but it reminded me.

kc

-- 
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet



[CODE4LIB] Test dataset for evaluation of automatic classification of research documents according to FAST and DDC

2012-06-29 Thread Arash.Joorabchi
Hi all,

 

I am developing a software system designed to analyze the content of
research documents (e.g., research papers, articles, etc.) archived in
scientific repositories (e.g., http://citeseerx.ist.psu.edu/ and
http://arxiv.org/) and automatically classify them according to FAST and
DDC. In order to objectively quantify the performance of the system, a
collection of research documents that have been manually classified
according to the DDC and assigned FAST subject headings is required. I
was wondering if anyone is aware of such a dataset existing online.

 

Regards,

Arash


Re: [CODE4LIB] OCLC Classify API - sfa vs. nsfa

2012-06-21 Thread Arash.Joorabchi
controlNumber: 47151174 DDC - afa:FIC nsfa:null
controlNumber: 30576709 DDC - afa:510.7808 nsfa:510.78
controlNumber: 36240850 DDC - afa:510.7808 nsfa:510.78
controlNumber: 36240846 DDC - afa:510.7808 nsfa:510.78
controlNumber: 25415527 DDC - afa:510.7808 nsfa:510.78
controlNumber: 32043473 DDC - afa:510.7808 nsfa:510.78
controlNumber: 7559271 DDC - afa:748.2917 nsfa:748.291
controlNumber: 38735328 DDC - afa:E nsfa:null
controlNumber: 122704504 DDC - afa:516.158 nsfa:516.15
controlNumber: 47198847 DDC - afa:E nsfa:null




-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Steve 
Meyer
Sent: 21 June 2012 13:46
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] OCLC Classify API - sfa vs. nsfa

For the Classify service at OCLC, when it is LCC we use a regular
expression: ^[a-zA-Z]{1,3}[1-9].*$. For DDC we filter out the
truncation symbols, spaces, quotes, etc.
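
The two rules described above could be sketched in Java roughly as follows. The LCC pattern is the one quoted in this message; the set of characters stripped from DDC numbers (slash, prime mark, quotes, whitespace) is only an assumption, since the message does not list them exactly, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class ClassNumberFilter {
    // LCC pattern exactly as quoted in the message above.
    private static final Pattern LCC = Pattern.compile("^[a-zA-Z]{1,3}[1-9].*$");

    // ASSUMPTION: slash, prime mark, quotes and whitespace are the
    // "truncation symbols, spaces, quotes" to drop from a DDC number.
    private static final Pattern DDC_NOISE = Pattern.compile("[/'\"\\s\u2032]");

    /** True if the value has the shape of an LCC class number. */
    static boolean looksLikeLcc(String value) {
        return value != null && LCC.matcher(value).matches();
    }

    /** Removes the assumed noise characters from a DDC number. */
    static String cleanDdc(String value) {
        return DDC_NOISE.matcher(value).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeLcc("QA76.C37"));
        System.out.println(cleanDdc("510.78/08"));
    }
}
```

Note that this only illustrates the filtering step; it does not by itself explain why nsfa can be shorter than sfa.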

-Steve

On Wed, Jun 20, 2012 at 8:54 AM, Arash.Joorabchi arash.joorab...@ul.ie wrote:
 Hi all,

 I am using the OCLC Classify API. As shown in the sample response snippet
 below, the two attributes sfa and nsfa can hold different values.

 According to
 http://oclc.org/developer/documentation/classify/response-details:

 sfa - classification number from the subfield $a of 082/092 or 050/090,
 or 060/096

 nsfa - normalized classification number from the subfield $a of 082/092
 or 050/090, or 060/096

 However, I would like to know how this normalization is done.

 Thanks,
 Arash


 <recommendations>
   <graph>http://chart.apis.google.com/chart?cht=p&amp;chs=350x200&amp;chd=t:100.0&amp;chtt=All+Editions&amp;chdl=Classified (100.00%)</graph>
   <fast>
     <graph>http://chart.apis.google.com/chart?cht=p&amp;chs=475x175&amp;chd=t:100.0,16.68,16.68,16.68&amp;chl=Functional programming (Computer science)|Lambda calculus|Modality (Logic)|Type theory|</graph>
     <headings>
       <heading heldby="6" ident="fst00936086">Functional programming (Computer science)</heading>
       <heading heldby="1" ident="fst00991011">Lambda calculus</heading>
       <heading heldby="1" ident="fst01024350">Modality (Logic)</heading>
       <heading heldby="1" ident="fst01159972">Type theory</heading>
     </headings>
   </fast>
   <ddc>
     <mostPopular holdings="6" nsfa="510.78" sfa="510.7808"/>
     <mostRecent holdings="6" sfa="510.7808"/>
     <graph>http://chart.apis.google.com/chart?cht=p&amp;chs=350x200&amp;chd=t:100.0&amp;chtt=DDC&amp;chdl=510.7808</graph>
   </ddc>
 </recommendations>



[CODE4LIB] OCLC Classify API - sfa vs. nsfa

2012-06-20 Thread Arash.Joorabchi
Hi all,

I am using the OCLC Classify API. As shown in the sample response snippet
below, the two attributes sfa and nsfa can hold different values.

According to
http://oclc.org/developer/documentation/classify/response-details:

sfa - classification number from the subfield $a of 082/092 or 050/090,
or 060/096

nsfa - normalized classification number from the subfield $a of 082/092
or 050/090, or 060/096

However, I would like to know how this normalization is done.

Thanks,
Arash


<recommendations>
  <graph>http://chart.apis.google.com/chart?cht=p&amp;chs=350x200&amp;chd=t:100.0&amp;chtt=All+Editions&amp;chdl=Classified (100.00%)</graph>
  <fast>
    <graph>http://chart.apis.google.com/chart?cht=p&amp;chs=475x175&amp;chd=t:100.0,16.68,16.68,16.68&amp;chl=Functional programming (Computer science)|Lambda calculus|Modality (Logic)|Type theory|</graph>
    <headings>
      <heading heldby="6" ident="fst00936086">Functional programming (Computer science)</heading>
      <heading heldby="1" ident="fst00991011">Lambda calculus</heading>
      <heading heldby="1" ident="fst01024350">Modality (Logic)</heading>
      <heading heldby="1" ident="fst01159972">Type theory</heading>
    </headings>
  </fast>
  <ddc>
    <mostPopular holdings="6" nsfa="510.78" sfa="510.7808"/>
    <mostRecent holdings="6" sfa="510.7808"/>
    <graph>http://chart.apis.google.com/chart?cht=p&amp;chs=350x200&amp;chd=t:100.0&amp;chtt=DDC&amp;chdl=510.7808</graph>
  </ddc>
</recommendations>
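
For anyone who wants to compare the two attributes programmatically, here is a minimal sketch that pulls sfa and nsfa out of the mostPopular element of a response shaped like the snippet above. It assumes the elements carry no namespace prefix; the class and method names are hypothetical, not part of any OCLC library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ClassifyDdcReader {
    /**
     * Returns {sfa, nsfa} of the first mostPopular element inside ddc,
     * or null if the response has no such element. A missing attribute
     * comes back as an empty string.
     */
    static String[] mostPopularDdc(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Response is not well-formed XML", e);
        }
        NodeList ddcs = doc.getElementsByTagName("ddc");
        if (ddcs.getLength() == 0) {
            return null;
        }
        NodeList popular = ((Element) ddcs.item(0)).getElementsByTagName("mostPopular");
        if (popular.getLength() == 0) {
            return null;
        }
        Element element = (Element) popular.item(0);
        return new String[] { element.getAttribute("sfa"), element.getAttribute("nsfa") };
    }
}
```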


Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set

2012-05-22 Thread Arash.Joorabchi
Thank you Roy and Simon for the info.

As for your second point, I suppose one advantage of using the WorldCat
API at this experimental stage is that the returned bib records are
already FRBR-ized.

Ross - Thanks for the link to the Open Library data dump. The WorldCat
collection is 2 orders of magnitude larger than Open Library, which makes
a significant difference considering the skewness and sparsity of bib
records classified according to library taxonomies, e.g., DDC, LCC (for
more info, see:
http://cdm15003.contentdm.oclc.org/cdm/singleitem/collection/p267701coll27/id/277/rec/28)


Thanks,
Arash


-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Simon Spero
Sent: 22 May 2012 19:47
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of records
without a DDC no from the result set

Arash - you might not want to use a straight dump of WorldCat catalog
records, at least not without the associated holdings information.*

There are a lot of quasi-duplicate records that are sufficiently broken
that the WorldCat de-duplication algorithm refuses to merge them.  These
records will usually only be used by a handful of institutions; the
better records will tend to have more associated holdings.  The holdings
count should be used to weight the strength of association between class
numbers and features.
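
One possible reading of the weighting suggestion above, as a rough Java sketch: each record contributes its holdings count, rather than 1, to every (feature, class number) pair it contains. The Bib type, its fields, and the choice of a plain sum are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HoldingsWeightedCounts {
    /** Hypothetical, minimal view of a bib record for this sketch. */
    static final class Bib {
        final String ddc;
        final List<String> features;
        final int holdingInstitutions;

        Bib(String ddc, List<String> features, int holdingInstitutions) {
            this.ddc = ddc;
            this.features = features;
            this.holdingInstitutions = holdingInstitutions;
        }
    }

    /** Maps feature -> (DDC number -> summed holdings of co-occurring records). */
    static Map<String, Map<String, Integer>> weigh(List<Bib> bibs) {
        Map<String, Map<String, Integer>> weights = new HashMap<>();
        for (Bib bib : bibs) {
            for (String feature : bib.features) {
                weights.computeIfAbsent(feature, k -> new HashMap<>())
                       .merge(bib.ddc, bib.holdingInstitutions, Integer::sum);
            }
        }
        return weights;
    }
}
```

With this scheme a poorly held quasi-duplicate adds little to any association, while a widely held record dominates it.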

Also, since classification/categorization is something that is usually
considered to be a property of works, rather than manifestations, one
might get better results by using Work sets for training.

I would suggest, er, contacting  Thom Hickey.

Simon

* Well, not precisely holdings - you just need the number of distinct
institutions with at least one copy.  I call them 'hasings'.

On Sat, May 19, 2012 at 8:42 PM, Roy Tennant roytenn...@gmail.com
wrote:

 Arash,
 Yes, we have made WorldCat available to researchers under a special
 license agreement. I suggest contacting Thom Hickey <hic...@oclc.org>
 about such an arrangement. Thanks,
 Roy

 On Fri, May 18, 2012 at 3:46 AM, Arash.Joorabchi arash.joorab...@ul.ie
 wrote:
  Dear Karen,

  I am conducting a research experiment on automatic text classification
  and I am trying to retrieve top matching bib records (which include DDC
  fields) for a set of keyphrases extracted from a given document. So, I
  suppose this is a rather exceptional use case. In fact, the right
  approach for this experiment is to process the full dump of WorldCat
  database directly rather than sending a limited number of queries via
  the API.

  I read here:
  http://dltj.org/article/worldcat-lld-may-become-available-under-odc-by/
  that WorldCat might become available as open linked data in future,
  which would solve my problem and help similar text mining projects.
  However, I wonder if it is currently available to researchers under a
  research/non-commercial use license agreement.
 
  Regards,
  Arash
 
  -Original Message-
  From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
Of
 Karen Coombs
  Sent: 17 May 2012 08:37
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of
records
 without a DDC no from the result set
 
  I forwarded this thread to the Product Manager for the WorldCat Search
  API. She responded back that unfortunately this query is not possible
  using the API at this time.

  FYI, the SRU interface to WorldCat Search API doesn't currently
  support any scan type searches either.

  Is there a particular use case you're trying to support? Knowing that
  would help us document this as a possible enhancement.
 
  Karen
 
  Karen Coombs
  Senior Product Analyst
  Web Services
  OCLC
  coom...@oclc.org
 
  On Wed, May 16, 2012 at 9:49 PM, Arash.Joorabchi
arash.joorab...@ul.ie
 wrote:
  Hi Andy,
 
 
 
  I am a SRU newbie myself, so I don't know how this could be
achieved
  using scan operations and could not find much info on SRU website
  (http://www.loc.gov/standards/sru/).
 
  As for the wildcards, according to this guide:
 

http://www.oclc.org/support/documentation/worldcat/searching/refcard/sea
  rchworldcatquickreference.pdf the symbols should be preceded by at
least
  3 characters, and therefore clauses like:
 
 
 
  ... AND srw.dd=*
 
  ... AND srw.dd=?.*
 
  ... AND srw/dd=###.*
 
  ... AND srw/dd=?3.*
 
 
 
 
 
  do not work and result in the following error:
 
  Diagnostics
 
  Identifier:
 
  info:srw/diagnostic/1/9
 
  Meaning:
 
 
 
  Details:
 
 
 
  Message:
 
  Not enough chars in truncated term:Truncated words too short(9)
 
 
 
 
 
  Thanks,
 
  Arash
 
 
 
  
 
  From: Houghton,Andrew [mailto:hough...@oclc.org]
  Sent: 16 May 2012 11:58
  To: Arash.Joorabchi
  Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of
records
  without a DDC no from the result set
 
 
 
  I'm not an SRU guru, but is it possible to do a scan and look for a
  postings of zero?
 
 
 
  Andy.
 
  On May

Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set

2012-05-18 Thread Arash.Joorabchi
Dear Karen,

I am conducting a research experiment on automatic text classification, and I am
trying to retrieve the top matching bib records (which include DDC fields) for a
set of keyphrases extracted from a given document. So, I suppose this is a
rather exceptional use case. In fact, the right approach for this experiment is
to process a full dump of the WorldCat database directly rather than sending a
limited number of queries via the API.

I read here:
http://dltj.org/article/worldcat-lld-may-become-available-under-odc-by/
that WorldCat might become available as open linked data in the future, which
would solve my problem and help similar text mining projects. However, I wonder
if it is currently available to researchers under a research/non-commercial use
license agreement.

Regards,
Arash

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen 
Coombs
Sent: 17 May 2012 08:37
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a 
DDC no from the result set

I forwarded this thread to the Product Manager for the WorldCat Search
API. She responded back that unfortunately this query is not possible
using the API at this time.

FYI, the SRU interface to WorldCat Search API doesn't currently
support any scan type searches either.

Is there a particular use case you're trying to support? Knowing that
would help us document this as a possible enhancement.

Karen

Karen Coombs
Senior Product Analyst
Web Services
OCLC
coom...@oclc.org

On Wed, May 16, 2012 at 9:49 PM, Arash.Joorabchi arash.joorab...@ul.ie wrote:
 Hi Andy,



 I am a SRU newbie myself, so I don't know how this could be achieved
 using scan operations and could not find much info on SRU website
 (http://www.loc.gov/standards/sru/).

 As for the wildcards, according to this guide:
 http://www.oclc.org/support/documentation/worldcat/searching/refcard/sea
 rchworldcatquickreference.pdf the symbols should be preceded by at least
 3 characters, and therefore clauses like:



 ... AND srw.dd=*

 ... AND srw.dd=?.*

 ... AND srw/dd=###.*

 ... AND srw/dd=?3.*





 do not work and result in the following error:

 Diagnostics

 Identifier:

 info:srw/diagnostic/1/9

 Meaning:



 Details:



 Message:

 Not enough chars in truncated term:Truncated words too short(9)





 Thanks,

 Arash



 

 From: Houghton,Andrew [mailto:hough...@oclc.org]
 Sent: 16 May 2012 11:58
 To: Arash.Joorabchi
 Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of records
 without a DDC no from the result set



 I'm not an SRU guru, but is it possible to do a scan and look for a
 postings of zero?



 Andy.

 On May 16, 2012, at 6:39, Arash.Joorabchi arash.joorab...@ul.ie
 wrote:

        Hi mark,

        Srw.dd=* does not work either:

        Identifier:     info:srw/diagnostic/1/27
        Meaning:
        Details:        srw.dd
        Message:        The index [srw.dd] did not include a searchable
 value

        I suppose the only option left is to retrieve everything and
 filter the results on the client side.

        Thanks for your quick reply.
        Arash


        -Original Message-
        From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On
 Behalf Of Mike Taylor
        Sent: 16 May 2012 10:43
        To: CODE4LIB@LISTSERV.ND.EDU
        Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of
 records without a DDC no from the result set

        There is no standard way in CQL to express field X is not
 empty.
        Depending on implementations, NOT srw.dd= might work (but
 evidently
        doesn't in this case).  Another possibility is srw.dd=*, but
 again
        that may or may not work, and might be appallingly inefficient
 if it
        does.  NOT srw.dd=null will definitely not work: null is not a
        special word in CQL.

        -- Mike.


        On 16 May 2012 10:32, Arash.Joorabchi arash.joorab...@ul.ie
 wrote:
          Hi all,
        
         I am sending SRU queries to the WorldCat in the following
 form:
        
        
                String host = "http://worldcat.org/webservices/catalog/search/";
                String query = "sru?query=srw.kw=\"" + keyword + "\""
                        + " AND srw.ln exact \"eng\""
                        + " AND srw.mt all \"bks\""
                        + " AND srw.nt=\"" + keyword + "\""
                        + "&servicelevel=full"
                        + "&maximumRecords=100"
                        + "&sortKeys=relevance,,0"
                        + "&wskey=[wskey]";
        
         And it is working fine, however I'd like to limit the results
 to those
         records that have a DDC number assigned to them, but I don't
 know what's
         the right way to specify this limit in the query

[CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set

2012-05-16 Thread Arash.Joorabchi
 Hi all,

I am sending SRU queries to the WorldCat in the following form:


String host = "http://worldcat.org/webservices/catalog/search/";
String query = "sru?query=srw.kw=\"" + keyword + "\""
        + " AND srw.ln exact \"eng\""
        + " AND srw.mt all \"bks\""
        + " AND srw.nt=\"" + keyword + "\""
        + "&servicelevel=full"
        + "&maximumRecords=100"
        + "&sortKeys=relevance,,0"
        + "&wskey=[wskey]";

This works fine; however, I'd like to limit the results to those
records that have a DDC number assigned to them, and I don't know the
right way to specify this limit in the query.

 NOT srw.dd=""
 NOT srw.dd=null

Neither of the above works.
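
As an aside on the query string itself: it contains spaces and double quotes, which generally need percent-encoding before being sent over HTTP. Below is a minimal sketch of building the same request URL with the standard URLEncoder. The class name SruUrlBuilder and its method are hypothetical, and the wskey argument stands in for a real key.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SruUrlBuilder {
    private static final String BASE =
            "http://worldcat.org/webservices/catalog/search/sru";

    /** Builds the search URL for one keyword, encoding every parameter value. */
    static String build(String keyword, String wskey) {
        // The CQL query is assembled in plain text first, then encoded as a whole.
        String cql = "srw.kw=\"" + keyword + "\""
                + " AND srw.ln exact \"eng\""
                + " AND srw.mt all \"bks\""
                + " AND srw.nt=\"" + keyword + "\"";
        return BASE
                + "?query=" + encode(cql)
                + "&servicelevel=full"
                + "&maximumRecords=100"
                + "&sortKeys=" + encode("relevance,,0")
                + "&wskey=" + encode(wskey);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

This does not solve the DDC restriction; it only makes the request robust to keywords containing spaces or punctuation.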


Thanks,
Arash

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Chad Benjamin Nelson
Sent: 15 May 2012 21:54
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Atlanta Digital Libraries meetup - May 23rd

The first / next Atlanta Digital Libraries meetup is coming up soon:

Wednesday, May 23rd 7pm
Manuel's Tavern <http://www.manuelstavern.com/location.php>
602 N Highland Avenue Northeast
Atlanta, GA 30307
North Avenue Room

We have two scheduled talks, and are still looking for others interested in
presenting. It's informal, so even if it is just a short topic you want
to get some feedback on, we'd love to hear it.

So, come along if you are interested and in the area.


Chad


Chad Nelson
Web Services Programmer
University Library
Georgia State University

e: cnelso...@gsu.edu
t: 404 413 2771
My Calendar <http://bit.ly/qybPLJ>



Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set

2012-05-16 Thread Arash.Joorabchi
Hi Mike,

srw.dd=* does not work either:

Identifier: info:srw/diagnostic/1/27
Meaning:
Details:    srw.dd
Message:    The index [srw.dd] did not include a searchable value

I suppose the only option left is to retrieve everything and filter the results 
on the client side.
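
A minimal sketch of such a client-side filter, assuming the records come back as MARCXML (namespace http://www.loc.gov/MARC21/slim) and that "has a DDC number" means a non-empty subfield $a in field 082. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DdcRecordFilter {
    private static final String MARC_NS = "http://www.loc.gov/MARC21/slim";

    /** Returns the first 082 $a of every record that has one; other records are skipped. */
    static List<String> ddcNumbers(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Response is not well-formed XML", e);
        }
        List<String> result = new ArrayList<>();
        NodeList records = doc.getElementsByTagNameNS(MARC_NS, "record");
        for (int i = 0; i < records.getLength(); i++) {
            String ddc = firstDdc((Element) records.item(i));
            if (ddc != null) {
                result.add(ddc);
            }
        }
        return result;
    }

    /** Looks for a non-empty subfield $a in any 082 field of one record. */
    private static String firstDdc(Element record) {
        NodeList fields = record.getElementsByTagNameNS(MARC_NS, "datafield");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            if (!"082".equals(field.getAttribute("tag"))) {
                continue;
            }
            NodeList subfields = field.getElementsByTagNameNS(MARC_NS, "subfield");
            for (int j = 0; j < subfields.getLength(); j++) {
                Element subfield = (Element) subfields.item(j);
                String text = subfield.getTextContent().trim();
                if ("a".equals(subfield.getAttribute("code")) && !text.isEmpty()) {
                    return text;
                }
            }
        }
        return null;
    }
}
```

The obvious cost is that records without a DDC number still count against maximumRecords, so more pages may have to be fetched.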

Thanks for your quick reply.
Arash 


-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Mike 
Taylor
Sent: 16 May 2012 10:43
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a 
DDC no from the result set

There is no standard way in CQL to express "field X is not empty".
Depending on implementations, NOT srw.dd="" might work (but evidently
doesn't in this case).  Another possibility is srw.dd=*, but again
that may or may not work, and might be appallingly inefficient if it
does.  NOT srw.dd=null will definitely not work: "null" is not a
special word in CQL.

-- Mike.


On 16 May 2012 10:32, Arash.Joorabchi arash.joorab...@ul.ie wrote:
  Hi all,

 I am sending SRU queries to the WorldCat in the following form:


 String host = "http://worldcat.org/webservices/catalog/search/";
 String query = "sru?query=srw.kw=\"" + keyword + "\""
         + " AND srw.ln exact \"eng\""
         + " AND srw.mt all \"bks\""
         + " AND srw.nt=\"" + keyword + "\""
         + "&servicelevel=full"
         + "&maximumRecords=100"
         + "&sortKeys=relevance,,0"
         + "&wskey=[wskey]";

 And it is working fine, however I'd like to limit the results to those
 records that have a DDC number assigned to them, but I don't know what's
 the right way to specify this limit in the query.

  NOT srw.dd=
  NOT srw.dd=null

 Neither of above work


 Thanks,
 Arash



Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set

2012-05-16 Thread Arash.Joorabchi
Hi Andy,

 

I am an SRU newbie myself, so I don't know how this could be achieved
using scan operations, and I could not find much info on the SRU website
(http://www.loc.gov/standards/sru/).

As for the wildcards, according to this guide:
http://www.oclc.org/support/documentation/worldcat/searching/refcard/searchworldcatquickreference.pdf
the symbols should be preceded by at least 3 characters, and therefore
clauses like:

 

... AND srw.dd=*

... AND srw.dd=?.*

... AND srw.dd=###.*

... AND srw.dd=?3.*

 

 

do not work and result in the following error:

Diagnostics

Identifier: info:srw/diagnostic/1/9
Meaning:
Details:
Message:    Not enough chars in truncated term:Truncated words too short(9)

 

 

Thanks,

Arash

 



From: Houghton,Andrew [mailto:hough...@oclc.org] 
Sent: 16 May 2012 11:58
To: Arash.Joorabchi
Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of records
without a DDC no from the result set

 

I'm not an SRU guru, but is it possible to do a scan and look for a
postings of zero?

 

Andy.

On May 16, 2012, at 6:39, Arash.Joorabchi arash.joorab...@ul.ie
wrote:

Hi mark,

Srw.dd=* does not work either:

Identifier: info:srw/diagnostic/1/27
Meaning:   
Details:srw.dd
Message:The index [srw.dd] did not include a searchable
value

I suppose the only option left is to retrieve everything and
filter the results on the client side.

Thanks for your quick reply.
Arash


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On
Behalf Of Mike Taylor
Sent: 16 May 2012 10:43
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of
records without a DDC no from the result set

There is no standard way in CQL to express field X is not
empty.
Depending on implementations, NOT srw.dd= might work (but
evidently
doesn't in this case).  Another possibility is srw.dd=*, but
again
that may or may not work, and might be appallingly inefficient
if it
does.  NOT srw.dd=null will definitely not work: null is not a
special word in CQL.

-- Mike.


On 16 May 2012 10:32, Arash.Joorabchi arash.joorab...@ul.ie
wrote:
  Hi all,

 I am sending SRU queries to the WorldCat in the following
form:


String host = "http://worldcat.org/webservices/catalog/search/";
String query = "sru?query=srw.kw=\"" + keyword + "\""
        + " AND srw.ln exact \"eng\""
        + " AND srw.mt all \"bks\""
        + " AND srw.nt=\"" + keyword + "\""
        + "&servicelevel=full"
        + "&maximumRecords=100"
        + "&sortKeys=relevance,,0"
        + "&wskey=[wskey]";

 And it is working fine, however I'd like to limit the results
to those
 records that have a DDC number assigned to them, but I don't
know what's
 the right way to specify this limit in the query.

  NOT srw.dd=
  NOT srw.dd=null

 Neither of above work


 Thanks,
 Arash






Re: [CODE4LIB] linked data endpoints [wikipedia-miner]

2011-05-17 Thread Arash.Joorabchi
It also has a built-in ML-based disambiguator, reportedly achieving a
high F1-measure of 97.1% [1].

[1]
http://www.cs.waikato.ac.nz/~dnk2/publications/CIKM08-LearningToLinkWithWikipedia.pdf


-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Eric Lease Morgan
Sent: 17 May 2011 16:25
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] linked data endpoints [wikipedia-miner]

On May 16, 2011, at 9:13 AM, Arash.Joorabchi wrote:

 If you think wikipedia articles could be used as good endpoints for
 your purposes then have a look at this opensource tool

   http://wikipedia-miner.sourceforge.net/

Wikipedia-miner is a pretty cool tool; it is a good example of various
text mining techniques. It even supports a Web services interface. Thank
you for bringing it to our attention.

-- 
Eric Morgan
University of Notre Dame


Re: [CODE4LIB] linked data endpoints

2011-05-16 Thread Arash.Joorabchi
Hi Eric,

If you think Wikipedia articles could be used as good endpoints for your
purposes, then have a look at this open-source tool:
http://wikipedia-miner.sourceforge.net/

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Eric Lease Morgan
Sent: 16 May 2011 13:34
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] linked data endpoints

What are some of the ways to best insert Linked Data endpoints into an
XML file?

I have been playing lately with named-entity recognition/extraction
technology. [1] Feed a text file, such as a novel, into the recognition
program. Get back a rudimentary XML file where things like names,
places, and organizations are marked with simple tags. I can then
extract all the place names from a text, tabulate them, display a
word-cloud, allow the reader to select items, guess latitude and
longitude of the place, and finally plot them on a map. [2] This process
works pretty well, but Google Maps only allows me to plot a limited
number of items at a time. Consequently, I am thinking about
preprocessing my data by looping through the XML file and adding
latitude and longitude attributes to the place name elements.

I then got to thinking about names and organizations. It would be nice
to supplement these entities with canonical Linked Data endpoints. My
application could then read the endpoints, extract the links associated
with them, and display some sort of graphic illustrating relationships.
Finally, I could allow the reader to select a relationship for further
investigation.

Given a name -- say, Plato or Thoreau -- how would one go about
identifying good endpoints? What sort of query would I send to what sort
of database? What might I get back? Assuming my goal is to enrich the
text, what sort of link(s) should I insert into my XML?

[1] NER - http://bit.ly/e0SnA6
[2] geo-location for WebKit mobile - http://bit.ly/msIu16

-- 
Eric Morgan
University of Notre Dame


Re: [CODE4LIB] A call for your OPAC (or other system) statistics! (Browse interfaces)

2010-05-03 Thread Arash.Joorabchi
The stats reported in this paper might help:

http://homes.ukoln.ac.uk/~kg249/publ/RenardusFinal.pdf

-----Original Message-----
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Bill Dueber
Sent: 03 May 2010 19:09
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] A call for your OPAC (or other system) statistics!
(Browse interfaces)

I got email from a person today saying, and I quote,

 I must say that [the lack of a browse interface] come as a shock
 (*which interface cannot browse??*)

[Emphasis mine]

Here, a browse interface is one where you can get a giant list of all
the titles/authors/subjects, whatever -- a view on the data devoid of any
searching.

Will those of you out there with browse interfaces in your system take
a couple of minutes to send along a guesstimate of what percentage of
patron sessions involve their use?

[Note that for right now, I'm excluding type-ahead search boxes,
although there's an obvious and, in my mind, strong argument to be made
that they're substantially similar for many types of data]

We don't have a browse interface on our (VuFind) OPAC right now. But in
the interest of paying it forward, I can tell you that Mirlyn, our OPAC,
has numbers like this:

Pct of Mirlyn sessions, Feb/March/April 2010, which included at least
one basic search and also:

  Go to full record view                        46%    (we put a lot of info in search results)
  Select/favorite an item                       15%
  Add a facet                                   13%
  Export record(s) to email/refworks/RIS/etc.   3.4%
  Send to phone (sms)                           0.21%
  Click on faq/help/AskUs in footer             0.17%  (324 total)

Based on 187,784 sessions, 2010.02.01 to 2010.04.31

So...anyone out there able to tell me anything about browse interfaces?

-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library