Alan,
If you are looking for data mining software that runs well on Hadoop, I would
definitely recommend looking into Apache Mahout [1]. This software is
specifically focused on categorization and clustering, and these algorithms
tend to work well in the distributed architecture of a
What sorts of text mining software do y'all support / use in your libraries?
We here in the Hesburgh Libraries at the University of Notre Dame have all but
opened a place called the Center For Digital Scholarship. We are / will be
providing a number of different services to a number of
Hi, Eric, I don't have any experience in this field, but I went looking a
while ago when the topic came up, and these two links are in my notes for
further exploration, if the topic ever comes around again:
http://wordseer.berkeley.edu/
http://mininghumanities.com/
May they serve you well.
--
Subject: Re: [CODE4LIB] text mining software
On Behalf Of
Pottinger, Hardy J.
Sent: Tuesday, August 27, 2013 11:51 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] text mining software
Hi, Eric, I don't have any experience in this field, but I went looking a
while ago when the topic came up, and these two links are in my notes
This is still command-line, but Mallet is heavily used in the DH
community: http://mallet.cs.umass.edu/. I think MONK
(http://monkproject.org/) has a UI, but I'm not overly familiar with its
features.
Jenn
Jenn Riley
Head, Carolina Digital Library and Archives
Do any of these work in Hadoop using MapReduce as a programming model? It seems
like Hadoop would be a natural use case for text mining and analysis.
Alan
On Aug 27, 2013, at 7:44 PM, Riley, Jenn jlri...@email.unc.edu wrote:
This is still command-line, but Mallet is heavily used in the DH
There have been some great software recommendations in this thread that
I really don't want to quibble with. What I'd like to quibble with is
the software-first approach. We've all tried the software-first
approach; how many of us were happy with it?
There is a standard in this area and that
I worked a lot with GATE in a previous position (not in a library, but in a
research position at the Univ. of Texas at Austin). It's handy in that
there is both a UI version (GATE Developer) and a set of APIs (GATE
Embedded), which were the only versions I worked with. Also nice is the
fact that