Re: [CODE4LIB] text mining software

2013-08-27 Thread Alan Darnell
Do any of these work on Hadoop, using MapReduce as the programming model? Text 
mining and analysis seem like a natural use case for Hadoop.
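
Purely as an illustrative sketch of why the model fits - not drawn from any of 
the tools mentioned in this thread - here is the classic word-count job written 
as a Hadoop Streaming mapper and reducer in Python; most simple term-counting 
tasks take this shape under MapReduce. The file names and tokenizing regex are 
assumptions.

  #!/usr/bin/env python
  # mapper.py -- emit one "word<TAB>1" pair for every token read from stdin
  import re, sys
  for line in sys.stdin:
      for word in re.findall(r"[a-z']+", line.lower()):
          print("%s\t1" % word)

  #!/usr/bin/env python
  # reducer.py -- sum the counts per word (Hadoop delivers mapper output sorted by key)
  import sys
  current, total = None, 0
  for line in sys.stdin:
      word, n = line.rstrip("\n").split("\t")
      if word != current:
          if current is not None:
              print("%s\t%d" % (current, total))
          current, total = word, 0
      total += int(n)
  if current is not None:
      print("%s\t%d" % (current, total))

The pair would be run with the streaming jar, along the lines of: hadoop jar 
hadoop-streaming.jar -input corpus/ -output counts/ -mapper mapper.py -reducer 
reducer.py -file mapper.py -file reducer.py (the jar's exact path and name vary 
by Hadoop distribution).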

Alan

On Aug 27, 2013, at 7:44 PM, Riley, Jenn jlri...@email.unc.edu wrote:

 This is still command-line, but Mallet is heavily used in the DH
 community: http://mallet.cs.umass.edu/. I think MONK
 (http://monkproject.org/) has a UI, but I'm not overly familiar with its
 features.
 
 Jenn
 
 
 Jenn Riley
 Head, Carolina Digital Library and Archives
 The University of North Carolina at Chapel Hill
 http://cdla.unc.edu/
 http://www.lib.unc.edu/users/jlriley
 
 jennri...@unc.edu
 (919) 843-5910
 
 
 
 
 
 On 8/27/13 11:24 AM, Eric Lease Morgan emor...@nd.edu wrote:
 
 What sorts of text mining software do y'all support / use in your
 libraries?
 
 We here in the Hesburgh Libraries at the University of Notre Dame have
 all but opened a place called the Center for Digital Scholarship. We are,
 or soon will be, providing a number of different services to a number of
 different audiences. These services include, but are not necessarily
 limited to:
 
 * data management consultation
 * data analysis and visualization
 * geographic information systems support
 * text mining investigations
 * referrals to other centers across campus
 
 I am expected to support the text mining investigations. I have
 traditionally used open source tools to do my work. Many of these tools
 require some sort of programming to exploit. To some degree I am
 expected to mount text mining software on our local Windows and Macintosh
 computers here in our Center. I am familiar with the lists of tools
 available at Bamboo as well as Hermeneuti.ca. [0, 1] TAPoRware is good
 too, but a bit long in the tooth. [2]
 
 Do you know of other sets of tools to choose from? Are you familiar with
 SAS® Text Analytics, STATISTICA Data Miner, or RapidMiner? [3, 4, 5]
 
 [0] Bamboo Dirt - http://dirt.projectbamboo.org
 [1] Hermeneuti.ca - http://hermeneuti.ca/voyeur/tools
 [2] TAPoRware - http://taporware.ualberta.ca
 [3] Text Analytics - http://www.sas.com/text-analytics/
 [4] Data Miner - http://www.statsoft.com/Products/STATISTICA/Data-Miner/
 [5] RapidMiner - http://rapid-i.com/content/view/181/190/
 
 --
 Eric Lease Morgan, Digital Initiatives Librarian
 Hesburgh Libraries
 University of Notre Dame
 
 574/631-8604


Re: [CODE4LIB] term co-occurrence analysis?

2012-10-01 Thread Alan Darnell
Beth,

MarkLogic has community and academic editions that do a really good job at 
co-occurrence analysis.  We use it for all our XML projects.

http://www.marklogic.com/products-and-services/marklogic-6/

Alan

Alan Darnell
Scholars Portal



On Oct 1, 2012, at 7:55 PM, Bess Sadler bess.sad...@gmail.com wrote:

For a full-text search system we're prototyping, we are being asked to provide 
term co-occurrence analysis. I'm not very familiar with this concept, so maybe 
someone on the list can describe it better, but I believe that what is wanted 
is to be able to query a text corpus for a given word, and to receive in return 
a list of words that co-occur with the search term, along with some indication 
of how often those words co-occur. Something like this IBM Many Eyes demo: 
http://www-958.ibm.com/software/data/cognos/manyeyes/visualizations/clint-eastwood-applause-lines-at-r
 (but we're not necessarily looking for a visualization, just a way to do the 
query).
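
As a toy sketch of the idea (the window size, tokenizer, and corpus here are 
arbitrary assumptions, not anything proposed on the list), counting every word 
that falls within a few tokens of the search term looks something like this:

  # cooccur.py -- count words appearing within N tokens of a target term
  import re
  from collections import Counter

  def cooccurrences(docs, target, window=5):
      counts = Counter()
      for doc in docs:
          tokens = re.findall(r"[a-z']+", doc.lower())
          for i, tok in enumerate(tokens):
              if tok == target:
                  lo, hi = max(0, i - window), i + window + 1
                  counts.update(t for t in tokens[lo:hi] if t != target)
      return counts

  docs = ["go ahead, make my day", "do you feel lucky, punk? well, do you?"]
  print(cooccurrences(docs, "do").most_common(5))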

Some Google searching gives me lots of scholarly articles from computational 
linguistics and humanities computing, but nothing like "here's a recipe for how 
to do this in Solr," which is what I would really love.
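
One Solr-only approximation, offered here only as a hedged sketch rather than a 
proven recipe: query for the term with rows=0 and facet on the tokenized 
full-text field, so the facet counts become "terms occurring in documents that 
contain the search term." The core URL and field name below are assumptions, 
and faceting on a large free-text field can be expensive:

  # solr_cooccur.py -- document-level co-occurrence via Solr field faceting
  import json
  import urllib.parse
  import urllib.request

  SOLR = "http://localhost:8983/solr/collection1/select"   # assumed core URL

  def cooccurring_terms(term, limit=25):
      params = urllib.parse.urlencode({
          "q": "text:%s" % term,    # assumes a tokenized "text" field
          "rows": 0,                # we only need the facet counts
          "facet": "true",
          "facet.field": "text",
          "facet.limit": limit,
          "wt": "json",
      })
      with urllib.request.urlopen("%s?%s" % (SOLR, params)) as resp:
          data = json.load(resp)
      counts = data["facet_counts"]["facet_fields"]["text"]
      # Solr returns the facet as a flat [term, count, term, count, ...] list
      return list(zip(counts[::2], counts[1::2]))

  for word, count in cooccurring_terms("eastwood"):
      print(word, count)

This counts co-occurrence at the document level rather than within a sentence 
or window; for finer granularity the usual trick is to index smaller units 
(sentences or paragraphs) as the Solr documents.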

Has anyone done this? How did you approach it? Are there tools you can 
recommend? Articles or books I should read?

Many thanks in advance,
Bess


Re: [CODE4LIB] Serials Solutions Summon

2009-04-21 Thread Alan Darnell
It is possible for a consortium to build the same sort of service as Serials 
Solutions.  Besides the OhioLINK example, we've been doing that in Ontario for 
the last 7 years or so - aggregating ejournal content (15 million articles), 
abstract and index databases (over 100 now, in partnership with ProQuest), and 
ebooks (about 50,000 commercial ebooks and 170,000-plus digitized ebooks from 
the Open Content Alliance).  It is a significant effort to deal with all the 
data feeds, but as publishers migrate their production processes to XML we're 
finding that it gets a little easier each year.  We aggregate everything in a 
single XML database from a company called MarkLogic.  The biggest issues we 
struggle with are currency (it's never as fast as the publisher site, though it 
isn't far behind when things are working well) and quality control (publisher 
production processes are shifting to XML, but the quality of the data varies).  
But hey, it's a library, and these are age-old issues present even in the 
print world.
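
To make that normalization work concrete - purely an invented sketch, not 
anything about MarkLogic or the actual Scholars Portal feeds - much of it 
amounts to mapping each publisher's element names onto one common article 
record and flagging missing data before load; every element name below is 
hypothetical:

  # normalize_feed.py -- map differently-shaped publisher XML onto a common record
  import xml.etree.ElementTree as ET

  # Hypothetical per-publisher element paths; real feeds differ per DTD/schema.
  FIELD_MAP = {
      "pubA": {"title": ".//ArticleTitle", "doi": ".//DOI", "issn": ".//ISSN"},
      "pubB": {"title": ".//title",        "doi": ".//doi", "issn": ".//issn"},
  }

  def normalize(record_xml, publisher):
      root = ET.fromstring(record_xml)
      out = {}
      for field, xpath in FIELD_MAP[publisher].items():
          node = root.find(xpath)
          # Quality control: record None rather than loading an empty element.
          out[field] = node.text.strip() if node is not None and node.text else None
      return out

  print(normalize("<rec><ArticleTitle>On Cats</ArticleTitle><DOI>10.1/x</DOI></rec>", "pubA"))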

Alan


On 4/21/09 2:13 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

Peter Murray wrote:
 I don't think it is part of SerSol's business model to offer a feed of
 the full metadata it aggregates, but it does seem to be part of the
 business model to offer an API upon which you could put your own
 interface to the underlying aggregated data.


Yep, it's not presently, but I'm hoping that in the future they expand
to that business model as well.  I think it's feasible.

An API that lets you act on their index is great.  But actually
having the data to re-index yourself, in exactly the way you want, would
give you even more power (if you were willing to do the extra work to get it).
And it would still be worth paying SerSol for the work of aggregating
and normalizing the data.

Jonathan


Re: [CODE4LIB] Video encoding done - Mashup idea request

2007-03-16 Thread Alan Darnell

There is a software product called Wirecast that syncs the
recordings and the slides automatically.

http://www.varasoftware.com/products/wirecast/

We've used it for a number of web cast projects and it works well.

Alan


On 16-Mar-07, at 4:38 PM, Smith,Devon wrote:


2. Two cameras so the person at the podium can be composed inside
their slides / demos.

If I understand this, you're thinking that one camera would capture the
speaker, and the other would capture the slide presentation.

If it's possible, wouldn't it be better to put the slides directly into
the video, rather than showing video of the projected slides?
Screen capture could give the same option for demos, right?

Or is that really tough?
What I don't know about video capture/edit could just about fill the
Grand Canyon.
Given that, thanks to all the people who've pulled the video together.
Community Effort++

/dev



-Original Message-
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Noel Peden
Sent: Friday, March 16, 2007 12:02 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Video encoding done - Mashup idea request

You are all certainly welcome.  I'm glad to be able to do the video.
Thanks are due to Dan Scott and Karen Schneider for bringing their
cameras and providing tapes.  Karen shot the first 1.5 days of video
too.  Ryan Eby has done pretty much all the work getting the files on
Google and posting / embedding on code4lib.org.

I expect to be there next year, and perhaps we can raise the bar a bit
more (but not too much.)  Here are some ideas (suggestions are
welcome):
1. Wired microphone for consistent sound
2. Two cameras so the person at the podium can be composed inside their
slides / demos.

We'll wait for the year after to try out 'bullet time' Matrix style
shots.  :)  Any other suggestions?

Regards,
Noel