Re: [CODE4LIB] Linked data questions list

2013-09-07 Thread Matt Jones
Askbot is free and open source and has worked well for us for QA on our
servers. They also offer paid plans where they host the site.

On Sep 7, 2013 12:34 AM, Karen Coyle wrote:

 I made a crude list of the questions that I got from my "what do you want
 to learn" email:**html

 This, of course, just begs for a stack overflow-type QA capability. I
 don't have that on my own web site and am looking around for an appropriate
 place with that feature. (This may convince me to go the Drupal route;
 unless anyone knows a better way?)

 Meanwhile, I'm working on some answers of my own, and will update the file
 as I get those done. This is not ideal, but it is helping me understand
 what we need to do in terms of training. I'll also work to get my training
 materials more transferable.

 More questions are always welcome. This should be an ongoing process.


 Karen Coyle
 ph: 1-510-540-7596
 m: 1-510-435-8234
 skype: kcoylenet

Re: [CODE4LIB] Seeking feedback on database design for an open source software registry

2011-08-09 Thread Matt Jones
As some points for comparison, you might look at two existing and similar
systems for registering software...

First,  a software tools database that is maintained for the environmental
sciences community:

An example of one of my tool entries in this system is here:

The system is easy to use, has some nice descriptions of the software, and
is user-maintained.  Maybe some of their use cases and yours overlap?  I'm
not sure which CMS they use, but I found it easy to edit entries myself.

Second, the open source site Ohloh has some nice features for characterizing
a project, such as languages used, licenses, etc. Here's the page for the
same Kepler system in Ohloh:

Ohloh is nice because much of its information is harvested directly from
links to the open source code repositories for the project, which allows it
to show some nice trends in the software project's life.

Hope these are helpful to you in designing your system.


On Tue, Aug 9, 2011 at 10:21 AM, Jonathan Rochkind wrote:

 I agree with Brice; I think you might be over-thinking/over-architecting
 it, although over-thinking is one of my sins too and I'm not always sure how
 to get out of it.

 But am I correct that you're going to be relying on user-submitted content
 in large part? Then it's important to keep it simple, so it's easy for users
 to add content without having to go through a million steps and understand
 a complicated data model.  If you can keep it simple in a way that is
 flexible (the 'tags' idea for instance), you also may find users using it in
 ways that you didn't anticipate, but which are still useful.

 On 8/9/2011 12:47 PM, Brice Stacey wrote:

 I'd be curious to know if this project itself would be open source.

 Second, I'm intrigued because I've never seen a UML diagram so close
 before in the wild and it's fascinating to discover the jokes are true (I
 kid, I kid...). Let's get serious and pull out your Refactoring book by
 Fowler and turn to page 336... you can Extract superclass to get
 Provider/Institution/Person to inherit from Entity. Then Merge Hierarchy
 to tear it down into a single Entity class and add a self-referencing
 association for employs. ProviderType should be renamed to Services and be
 made an association allowing 0..* services. At that point, the DB design is
 pretty straightforward and the architecture astronauts can come back down
 to earth.

 Seriously though, I think that technically, you might be over thinking
 this. If you replace Package with Blog, Release with Post, Technology with
 Tag, Provider/Institution/Person with User, Keep Comment as Comment, and
 ignore Event for now. It's just a simple collection of blogs with posts
 with tags and users that have roles and can leave comments.

 Lastly, you may want to look into Drupal's project module. I think that's
 what they use to run their module directory. It seems like it would be a
 good starting point and may work out of the box.

 It's a bold project. The library needs it and it's something no single
 institution would ever pay to have done, so I'm glad to see there is a grant
 for it.

 Brice Stacey
 Digital Library Services
 University of Massachusetts Boston

 On 15 July 2011 19:42, Peter 

 Colleagues --

 As part of the Mellon Foundation grant funding the start-up of LYRASIS
 Technology Services, LTS is establishing a registry to provide in-depth
 comparative, evaluative, and version information about open source products.
  This registry will be free for viewing and editing (all libraries, not just
 LYRASIS members, and any provider offering services for open source software
 in libraries).  Drupal will be the underlying content system, and it will be
 hosted by LYRASIS.

 I'm seeking input on a data model that is intended to answer these

* What open source options exist to meet a particular need of my
* What are the strengths and weaknesses of an open source package?
* My library has developers with skills in specific technologies.
 What open source packages mesh well with the skills my library has in-house?
* Where can my library go to get training, documentation, hosting,
 and/or contract software development for a specific open source package?
* Are any peers using this open source software?
* Where is there more information about this open source software

 The E-R diagram and narrative surrounding it are on the Code4Lib wiki:**index.php/Registry_E-R_Diagram

 Comments on the data model can be made as changes to the wiki document,
 replies posted here, or e-mail sent directly to me.  In addition to comments

Re: [CODE4LIB] Seeking feedback on database design for an open source software registry

2011-08-09 Thread Matt Jones
On Tue, Aug 9, 2011 at 3:50 PM, stuart yeates

 Ohloh is great. However it relies almost completely on metrics which are
 easily gamed by the technically competent. Use of these kinds of metrics in
 ways which encourage gaming will only be productive in the short term,
 perhaps the very short term.

 For example: it's easy to set up dummy version control accounts and there
 can be good technical reasons for doing so. It's easy to set up a build/test
 suite to update a file in the version control after its daily run and there
 can be good technical reasons for doing so. But doing these things can also
 transform a very-low activity single user project into a high-activity dual
 user project, in the eyes of ohloh.

 Turning on template-derived comments in the next big migration handles the
 "is the code commented?" metric.

 The more metrics are used, the more motivation there is to use tools (which
 admittedly have other motivations) which make a project look good.

I agree the ohloh metrics are easily gamed.  What metrics do you recommend
that can't be gamed but still provide a synopsis of the project for
evaluation, comparison, and selection? I think there is some utility even
though they can be gamed.  The metrics are not a substitute for critical
evaluation, but provide a nice synopsis as a jumping off point.  For
example, if I am interested in projects that have a demonstrable lifespan >
5 years, and that have had more than 10 developers contribute, I can find
that via these metrics.  I can then assess for myself if any of the
resulting projects are false positives (e.g., the commit log will give some
idea of the types of commits made by each person).

If you're concerned about the system being gamed via metrics, then you
should also be concerned about user-submitted project descriptions.
 Projects have a tendency to over-generalize on what their software does,
under-report defects, and generally paint a rosy picture.  Will there be
some sort of quality control/editing/verification of the claims made by
submitters? Will it matter if some of the projects are described more
generously than in reality?  Won't the system still be useful even if they


Re: [CODE4LIB] Sign up to present at the Code4Lib Virtual Lightning Talks on April 4th

2011-04-01 Thread Matt Jones
I also like this idea, and it also conflicts with my schedule, but I was
contemplating skipping my other commitment to attend this.  Hadn't decided
for sure, as I was waiting to see a few of the Virtual LT talk titles that
signed up (hah!).

Also, I am a bit of a newcomer to code4lib, having never attended the
conference, and coming from a related but somewhat different community
(environmental informatics).  I thought this might be a good way to hear a
little more about what is developing in the code4lib community.  I
considered giving a Virtual LT as well, but thought it best to hear a few
to gauge what might be of interest from my projects.


On Fri, Apr 1, 2011 at 2:48 PM, Edward M. Corrado ecorr...@ecorrado.us wrote:

 I agree with Luciano that the lead time was a bit short for me. Well,
 maybe not specifically because it was short, but it does conflict
 with something else I have to do and I don't have time to reschedule.
 I really like this idea and I hope it can be successful, so I hope
 this message brought a rash of sign-ups and it goes on, or it is


 On Fri, Apr 1, 2011 at 6:42 PM, Luciano Ramalho
  On Fri, Apr 1, 2011 at 3:01 PM, Peter Murray
  So far no one has signed up to present on Monday and only one person has
 signed up to attend.  It sounds like the idea of virtual lightning talks
 isn't going to fly.  If you have feedback (e.g., not interesting, not enough
 lead time to prepare, wrong time of day/week/year), I'd appreciate hearing
  As a Pythonista I am a huge fan of lightning talks, a staple of PyCons
  all over the World.
  Virtual lightning talks is a novel idea to me, but sounds great.
  I'd be interested in attending and even presenting, but I think the
  lead time was too short, particularly for an activity intended for
  business hours (1:30pm EDT is 2:30pm BRT / UTC-3).
  How about trying again, but aiming at a date in late April?
  Luciano Ramalho
  programador repentista || stand-up programmer
  Twitter: @luciano

[CODE4LIB] Summer Internships with DataONE available

2010-03-16 Thread Matt Jones
The Data Observation Network for Earth (DataONE) is a
virtual organization dedicated to providing open, persistent, robust, and
secure access to biodiversity, ecology, and environmental data.  DataONE is
supported by the U.S. National Science Foundation, with headquarters at the
University of New Mexico.

DataONE invites applications for summer research internships for
undergraduates, graduate students and postgraduate (MS, PhD) students. As
part of a larger virtual organization, interns will work in virtual groups
of 2-3 with multiple mentors from DataONE, and are not expected to be at the
same location or institution as the mentors and other team members.  Regular
communication (including videoconference) with mentors and other team
members will be needed.  Face-to-face meetings at the start and end of the
session are currently planned, and any required travel will be fully paid
for by DataONE.

Projects cover a range of topic areas including software development,
library science, and sociotechnical aspects of scientific data.  They also
vary in the extent and type of technical expertise required.  The interests
and expertise of the applicants will determine which projects will be
selected for the program.  Interns will have the opportunity to receive
credit for their contribution to peer-reviewed papers and/or open source
software tools.  Interns will also be encouraged to present their results at
conferences and symposia.

ELIGIBILITY: The program is open to all undergraduate, graduate, and
post-doctoral/post-masters students who are currently in, and eligible to
work in, the United States and are at least 18 years of age by May 23, 2010.
Applicants must also be enrolled or employed at a university or other
accredited research institution; postgrads must have received their most
recent degree within the past 5 years. Given the broad range of projects,
there are no restrictions on academic backgrounds or field of study.
 Interns are expected to be available approximately 40 hours/week during the
internship period (noted below) with significant availability during normal
business hours.  Interns from previous years are eligible to participate.

SUPPORT: Interns will receive a stipend of $4,500 for participation, paid in
two installments (one at the midterm and one at the conclusion of the
program). In addition, required travel and communication expenses will be
borne by DataONE.   Participation in the program after the midterm is
contingent on prior performance.  Interns will need to supply their own
computing equipment and Internet connection.

TO APPLY:  Applicants are requested to submit responses to a short
questionnaire with their resume, and arrange for a letter of reference from
a faculty member, researcher, or instructor.  All application materials must
be submitted by midnight Pacific time on April 2, 2010.  For the list of
projects and detailed application instructions, please see

SCHEDULE: Applicants will be notified by April 19th.  The internship will
run from May 24 - July 30th, and allowance will be made for interns who are
unavailable to start on May 24th due to their academic or work schedule.

Re: [CODE4LIB] A wiki for all things digital collection

2010-03-11 Thread Matt Jones
This sounds like an interesting idea. Within the environmental sciences, an
extensive list that is community maintained is available here:

The list includes several hundred metadata standards, community
vocabularies, ontologies, and related specifications, and it changes
dynamically as new specifications and resources become available.  The site
also maintains a list of software tools (like metadata editors) and other
useful references.

Maybe you could build off of existing community resources such as this to
build a meta-index that covers a broader set of disciplines. Combining
forces with groups like MMI could be a fast way to build up the list of
resources that you propose.


On Thu, Mar 11, 2010 at 9:58 AM, Roy Tennant wrote:

 This sounds like it's completely appropriate to do this on the Library
 Success wiki:


 On 3/11/10 10:30 AM, Ingrid Schneider

  Apologies for cross-posting. Some of you may have seen this last week.
  I have an idea that I've been tossing around for a while, and I'd like
  to ask your opinion on it and gauge possible interest.
  As a new metadata librarian working on building a digital program from
  the ground up, I've spent a lot of time searching for little bits of
  information. Information on different metadata schemes, what software is
  available for what purpose, functions of the software we currently have
  that might help us on our way, exporting and importing data, etc., etc.,
  The idea I had was to start a wiki where all the myriad knowledge and
  information on metadata, digital collections, digital objects, IRs, etc.
  can be gathered. I envision gathering such information as:
  *Different metadata schemes: maybe summaries, intended uses, pros and
  cons, idiosyncrasies, etc.
  *Different software options: coverage of software being used in our
  institutions; again with pros and cons, idiosyncrasies, workarounds, etc.
  *Processes: how exactly do you export from FileMaker Pro (for example)?
  Or build a tab delimited file for importing? How does the tab delimited
  file link to the digital objects?
  *Resources: Information on relevant blogs, mailing lists, associations,
  interest groups, etc.
  *Information from mailing lists: Answers to questions that come up
  frequently, topics that generate special interest among the
  participants, topics that may be of interest to people outside of the
  Of course, I'm still learning much of this, so I'd have to ask for
  community participation. If enough people seem interested in having the
  resource available and/or contributing I'll go ahead and move forward
  with it. Feel free to email me off list, and thank you for reading my
  Ingrid Schneider

Re: [CODE4LIB] Need VirtualBox Help

2009-12-01 Thread Matt Jones
In Windows on your VM, you need to mount your Mac shared folder as a network
drive.  You can do this using the 'Tools | Map Network Drive...' menu
command in Windows. Choose a drive letter (e.g., X:), then for the 'Folder'
name, use the VirtualBox share syntax, which is '\\vboxsvr\sharename',
where 'sharename' is the name of the share that you set up in the VirtualBox
configuration window.  Once done, you should see your Mac folder under
the drive letter you chose.  The VirtualBox manual explains this more


On Tue, Dec 1, 2009 at 3:57 PM, Nicole Engard wrote:

 I have virtualbox installed on my mac.  I have several virtual
 machines, the one that's causing problems is my windows xp machine.  I
 want to transfer files from my mac to my windows machine.  I tried
 sharing a folder but can't find it in windows - I tried putting a CD
 in but windows can't read it - I installed Filezilla FTP Server - but
 now I'm stuck with how to set that up so that I can FTP from my Mac to
 the Windows Virtual Machine.

 Any tips from the experts?

 Thanks in advance,

Re: [CODE4LIB] calling another webpage within CGI script

2009-11-23 Thread Matt Jones
Hi Ken,

This may be obvious, but when running from the command line, stdout and
stderr are often interleaved together, but on the web server you see stdout
in the browser and stderr in the web server error log.  Your script is
probably exiting with an error either at the 'get' line (line 6) or at the
'die' line (line 7), which is what 'die' does -- terminate your script.
Have you checked your web server error log to see what the error is on your
'get' call?
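
As a rough illustration of the stdout/stderr point (a sketch only, not your
actual script): wrapping the risky step in eval lets a CGI script print the
failure into the page instead of leaving it only in the error log.  The
helper name run_and_report and the always-failing stand-in step are
hypothetical; in your script the step would be the 'get' call followed by
the 'die'.

```perl
use strict;
use warnings;

# Hypothetical helper: run a step and return its outcome as text, so that a
# die() message ends up in the page (stdout) rather than only on stderr.
sub run_and_report {
    my ($step) = @_;
    my $result;
    my $ok = eval { $result = $step->(); 1 };
    return $ok ? "OK: $result" : "ERROR: $@";
}

print "Content-type: text/plain\n\n";

# Stand-in for the real fetch; it always fails so the error path is visible.
print run_and_report( sub { die "Couldn't get the page\n" } );
```

This is only meant for debugging; once the cause is known, the web server
error log is still the better place for the message.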


On Mon, Nov 23, 2009 at 7:17 AM, Ken Irwin wrote:

 Hi all,

 I'm moving to a new web server and struggling to get it configured
 properly. The problem of the moment: having a Perl CGI script call another
 web page in the background and make decisions based on its content. On the
 old server I used an antique Perl script called hcat (from the Pelican
 book); I've also tried
 curl and LWP::Simple.

 In all three cases, I get the same behavior: it works just fine on the
 command line, but when called by the web server through a CGI script, the
 LWP (or other socket connection) gets no results. It sounds like a
 permissions thing, but I don't know what kind of permissions setting to
 tinker with. In the test script below, my command line outputs:

 Content-type: text/plain
 Getting URL:
 885 lines

 Whereas the web output just says "Getting URL:" and
 doesn't even get to the "Couldn't get" error message.

 Any clue how I can make use of a web page's contents from w/in a CGI
 script? (The actual application has to do with exporting data from our
 catalog, but I need to work out the basic mechanism first.)

 Here's the script I'm using.

 use LWP::Simple;
 print "Content-type: text/plain\n\n";
 my $url = "...";
 print "Getting URL: $url\n";
 my $content = get $url;
 die "Couldn't get $url" unless defined $content;
 @lines = split (/\n/, $content);
 foreach (@lines) { $i++; }
 print "\n\n$i lines\n\n";

 Any ideas?


Re: [CODE4LIB] lingua::stem::snowball

2009-10-12 Thread Matt Jones
Presumably the call to stem() is the expensive part of your loop, so I'd
want to cut that out if that is true. It looks to me that you can pass in an
array reference to stem(), so there's no need for calling stem() in a loop
at all.   I'd think something like the code below should help reduce your
calls to stem() to one call for the idea and one call for the list of
words. Note I used a sorted set of keys in order to assure that I keep the
counts and the words that are stemmed in the same order when adding up the
totals.  The sort could be expensive too, so this may not work out better
for you, depending on your input data and the performance of sort() and
stem(). You could also use stem_in_place() if you don't want to make a copy
of the array.  Changing to use an array of @ideas instead of the scalar
$idea would use an analogous technique.


use strict;
use warnings;
use Lingua::Stem::Snowball;

my $idea  = 'books';
my %words = (
    'books'         => 5,
    'library'       => 6,
    'librarianship' => 5,
    'librarians'    => 3,
    'librarian'     => 3,
    'book'          => 3,
    'museums'       => 2,
);

# Stem the idea once.
my $stemmer   = Lingua::Stem::Snowball->new( lang => 'en' );
my $idea_stem = $stemmer->stem( $idea );
print "$idea ($idea_stem)\n";

# Stem all of the words in a single call; sorting fixes the key order so
# that $stemwords[$i] lines up with $wordkeys[$i].
my @wordkeys  = sort( keys(%words) );
my @stemwords = $stemmer->stem( \@wordkeys );

# Add up the counts of the words that share the idea's stem.
my $i     = 0;
my $total = 0;
foreach my $word (@wordkeys) {
    if ( $idea_stem eq $stemwords[$i] ) { $total += $words{ $word } }
    $i++;
}
print "$total\n";
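
For the @ideas variant mentioned above, one possible sketch is to add up the
counts per stem once and then look up each idea's stem in that table.  To
keep the example runnable without the module, it uses a hypothetical
stand-in stemmer (toy_stem, which only strips a trailing "s"); the helper
name totals_by_stem is also an assumption.  The stemmer is assumed to take
an array reference and return the stems in the same order as the input.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real stemmer: strips one trailing "s".
# It takes an array reference and returns the stems in the same order.
sub toy_stem {
    my ($words) = @_;
    return map { my $w = lc $_; $w =~ s/s$//; $w } @$words;
}

# Stem every word in one call and add up the counts per stem.
sub totals_by_stem {
    my ( $counts, $stem ) = @_;
    my @keys  = keys %$counts;
    my @stems = $stem->( \@keys );
    my %totals;
    $totals{ $stems[$_] } += $counts->{ $keys[$_] } for 0 .. $#keys;
    return \%totals;
}

my %words = ( books => 5, library => 6, librarians => 3, librarian => 3, book => 3 );
my @ideas = ( 'books', 'librarians' );

# One stemming call for the words, one for the ideas.
my $totals     = totals_by_stem( \%words, \&toy_stem );
my @idea_stems = toy_stem( \@ideas );
for my $i ( 0 .. $#ideas ) {
    my $total = $totals->{ $idea_stems[$i] } // 0;
    print "$ideas[$i] ($idea_stems[$i]): $total\n";
}
```

With Lingua::Stem::Snowball, the second argument could instead be something
like sub { $stemmer->stem( $_[0] ) }.  Because the keys are copied into one
array before stemming, this version does not need the sort.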