Re: [CODE4LIB] Running a repository on Debian Stable

2010-04-12 Thread Mike Taylor
Thanks to all who responded to this.  I went with EPrints, using the
Debian/Ubuntu package pointed out by Thomas and others, and it seems
to be working OK.


On 8 April 2010 16:25, Thomas Krichel kric...@openlib.org wrote:
  Mike Taylor writes

 I was surprised to find that there seems to be no package for DSpace,
 EPrints,

 http://wiki.eprints.org/w/Installing_EPrints_3_via_apt_%28Debian/Ubuntu%29

 Fedora,

  The problem there, as I understand it, is that Fedora expects
  everything to be in one directory. This setup is inimical to the
  Debian setup.

 Most of all, I want something that I can install from the standard
 operating system packages, using apt-get.

  I suggest you use aptitude instead. It has superior dependency
  resolution.


  Cheers,

  Thomas Krichel                    http://openlib.org/home/krichel
                                http://authorclaim.org/profile/pkr1
                                               skype: thomaskrichel




[CODE4LIB] code4lib.hu workshop

2010-04-12 Thread Király Péter

Dear code4lib-ers,

Last week (Wednesday afternoon) we held the first
code4lib.hu workshop in Debrecen, at the University Library.
The purpose of the meeting was for library developers
and power users of library information systems to meet and talk
to each other, so that in the future different systems
can communicate over standard protocols, which is the
basic condition of any mashupable, shareable service.

Beforehand only 9 people said that they would be there for
sure, but in the end 28 developers participated, from libraries
and software companies. The result was not a workshop for
hardcore coders, but an interesting and (more importantly)
productive conversation. Since participants were not tied to any
concrete project, we could discuss a somewhat 'ideal' state of the art:
how to get there, and what development and library policy steps
would be involved. The discussion focused on uniform
library authentication (one entry point for all Hungarian libraries)
and inter-library loan. Some important statements:

- the services should be based on standards, either international
or, if we cannot find a suitable one, a domestic
(Hungarian) standard that we define ourselves

- the authentication system provided by the National
Infrastructure Agency does not fit all libraries, since
even university libraries have users who are not university
citizens and so lack university identifiers

- bilateral agreements between libraries are a must for
unified authentication: library A accepts the authentication
system of library B and provides services to the users
of library B

- the current statistical measurements are outdated and cannot
reflect such shared services; but since statistics are
the most important measuring tool for the owners of libraries,
libraries tend not to develop shared services, because they
could lose some of their resources (they would spend on things
that are not reflected in the statistics...)

- inter-library loans could be initiated by the users, which
would take some of the burden off the librarians. The librarians
could still control the whole process, but not as the only player.

The meeting was not aimed at agreeing on anything, so we did not create
any document or manifesto, but there were some ideas about the
continuation. Since then, one of the participants bought the code4lib.hu
domain and offered it to the community for free. We restarted
an older listserv (at http://groups.google.com/group/ikr-fejlesztok),
and we decided that we will continue the meeting in the near
future with lightning talks and discussions on library standards
(like NCIP, inter-library loan, etc.), and personally I hope
that we can hold a mashathon-like meeting.

Final note: somebody said on the code4lib IRC that we
would miss the bbq. Well, we didn't have bbq, but as I promised
we had slambuc, a traditional shepherds' dish from the Debrecen area.

Thank you for your support!

Király Péter
http://eXtensibleCatalog.org 


[CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Thomas Dowling
So let's say (hypothetically, of course) that a colleague tells you he's
considering a NoSQL database like MongoDB or CouchDB, to store a couple
tens of millions of documents, where a document is pretty much an
article citation, abstract, and the location of full text (not the full
text itself).  Would your reaction be:

That's a sensible, forward-looking approach.  Lots of sites are putting
lots of data into these databases and they'll only get better.

This guy's on the bleeding edge.  Personally, I'd hold off, but it could
work.

Schedule that 2012 re-migration to Oracle or Postgres now.

Bwahahahah!!!

Or something else?



(http://en.wikipedia.org/wiki/NoSQL is a good jumping-in point.)


-- 
Thomas Dowling
tdowl...@ohiolink.edu


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Kozlowski,Brendon
I personally would vote for: "This guy's on the bleeding edge.  Personally, I'd
hold off, but it could work."  However, I attended a webinar on MongoDB, and the
representative stated that SourceForge has moved to a NoSQL platform using
MongoDB, tested it with 100x the growth and visits they are already
seeing, and had zero issues with scalability.  That's pretty impressive.
 
Oh, it also managed to be more efficient than a traditional RDBMS.
 
 
 
Brendon Kozlowski
Web Administrator
Saratoga Springs Public Library
49 Henry Street
Saratoga Springs, NY, 12866
[518] 584-7860 x217





Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Robert Sanderson
Depends on the sort of features required, in particular the access
patterns, and the hardware it's going to run on.

In my experience, NoSQL systems (for example apache's Cassandra) have
extremely good distribution properties over multiple machines, much
better than SQL databases.  Essentially, it's easier to store a bunch
of key/values in a distributed fashion, as you don't need to do joins
across tables (there aren't any) and eventually consistent systems
(such as Cassandra) don't even need to always be internally consistent
between nodes.

If many concurrent write accesses are required, then NoSQL can also be
a good choice, for the same reasons as it's easily distributed.
And for the same reasons, it can be much faster than SQL systems with
the same data given a data model that fits the access patterns.

The flip side is that if later you want to do something that just
requires the equivalent of table joins, it has to be done at the
application level.  This is going to be MUCH MUCH slower and harder
than if there was SQL underneath.
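
The "join at the application level" point can be sketched as follows. This is a minimal illustration only: the two dictionaries stand in for key-value buckets, the `get` helper stands in for a network round trip, and all names and sample records are invented for the example rather than taken from any particular NoSQL client.

```python
# Two key-value "buckets" with no join support (sample data is invented).
articles = {
    "a1": {"title": "Article one", "journal_id": "j1"},
    "a2": {"title": "Article two", "journal_id": "j2"},
    "a3": {"title": "Article three", "journal_id": "j1"},
}
journals = {
    "j1": {"name": "Journal A"},
    "j2": {"name": "Journal B"},
}


def get(store, key):
    """Stand-in for one round trip to the key-value store."""
    return store[key]


def articles_with_journal_names(article_ids):
    """Join done in the application: one extra lookup per article."""
    joined = []
    for article_id in article_ids:
        article = get(articles, article_id)
        journal = get(journals, article["journal_id"])
        joined.append((article["title"], journal["name"]))
    return joined


result = articles_with_journal_names(["a1", "a2", "a3"])
for title, journal_name in result:
    print(title, "-", journal_name)
```

Each joined row costs two lookups here, which is where the extra latency and code come from compared with a single SQL JOIN.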


Rob





[CODE4LIB] Job Posting: Associate Vice President for Library and Information Services at Wheaton College in Norton, MA

2010-04-12 Thread Rosalyn Metz
Please excuse cross-postings.

--
Associate Vice President for Library and Information Services at
Wheaton College in Norton, MA

Located between Boston and Providence, Wheaton College is a four-year,
private liberal arts college with 1,550 students. The College invites
applications and nominations for the Associate Vice President for
Library and Information Services. This person provides leadership for
Library and Information Services in developing innovative strategies
and cultivating strong partnerships in the delivery and use of
academic information and technologies to support the mission and
priorities of the college.

The Wheaton Curriculum offers more than 600 courses in 40 majors and
50 minors. Interdisciplinarity, which lies at the heart of our
curriculum, is implemented through connected courses. Our
student-faculty ratio of 10-1 and average class size of 15-20 students
help foster the close collaborative relationships that develop between
our undergraduates and faculty.

With a nationally recognized record of achievement in using technology
and information resources to enhance teaching and learning, Wheaton
College considers a unified vision of library and information
technology critical to fulfilling its liberal arts mission. In 2004,
the College merged the Library, Academic Computing, and Information
and Technology Services to create Library and Information Services
(LIS), which encompasses the functions of research and instruction,
collections and public access, technology support and infrastructure.
The Associate Vice President for Library and Information Services will
lead a team of five individuals who oversee these areas, to continue
the development of current programs, provide support for the research
and teaching activities of faculty and students, and raise funds by
seeking further grant support. In addition, the successful candidate
will chair the Administrative Technology Committee and work with the
faculty's Educational Policy Committee and the Library, Technology,
and Learning Committee, to develop new initiatives that fulfill
curricular goals, including integrating information fluency, new
media, and digital scholarship in the educational experience of
Wheaton College students.

The successful candidate will create a shared vision through leading
collaborative, team-based processes; manage external relationships;
and work strategically with college leaders for both library and
college-wide interests. This person will

- implement technology and strategic plans to deliver comprehensive,
integrated library and information services for the college
- manage resources, facilities, and services that respond to the needs
of students, faculty, and staff
- oversee personnel and resource administration, budget planning and
allocation, and overall project management
- practice outreach and communication with students, faculty members
and administrative staff
- set and maintain standards of service and quality
- and establish instruments for benchmarking and continuous assessment.

The successful candidate for this position will report directly to the
Provost (Chief Academic Officer), work closely with the Vice President
for Finance and Operations, and regularly consult with the President's
Council of senior advisors. Where appropriate, the position will carry
faculty status.

Minimum Qualifications: Wheaton College seeks a collaborative and
visionary leader with extensive experience in one or more areas of
information technology and service in an academic setting. The
successful candidate has demonstrated the ability to foster teamwork
and work effectively with faculty members and staff at all levels. The
new Associate Vice President for Library and Information Services
should possess a graduate degree in a relevant field, such as
librarianship, information science, computer science, or related
field, or have equivalent experience or certification.


[CODE4LIB] Job Posting: Senior Programmer Analyst - Office of Digital Assets and Infrastructure, Yale University

2010-04-12 Thread Michael Appleby

Senior Programmer Analyst
Office of Digital Assets and Infrastructure, Yale University
New Haven, CT
( http://tinyurl.com/yyn7dgz ) 

ODAI is charged with developing a digital information management 
strategy for Yale and building digital collections and technical 
infrastructure in a coordinated and collaborative manner across the 
entire campus. Programs include the development and deployment of 
large-scale digital asset management systems, 
long-term preservation repositories for Yale digital content in all 
formats, cross-collection search capabilities to enable discovery of 
collections hosted by numerous departments and many other innovative 
initiatives.


The Senior Programmer Analyst will lead the planning, development, 
implementation, maintenance, and support of software applications that 
stand alone, extend functionality of existing systems, bridge systems 
through interoperability, and provide end-user functionality to 
the academic community. The software development includes but is not 
limited to digital asset management systems, digital library 
systems, knowledge management systems, media processing systems, storage 
systems, and related ancillary products and services.




--

Michael Appleby
Senior Software Developer
Office of Digital Assets and Infrastructure
Yale University

e michael.appl...@yale.edu


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Ross Singer
The advantage of the NoSQL DBs is that they're schema-less which
allows much more flexibility in your data going in.

However, it sounds like your schema may be pretty standardized -- I'm
not sure of a huge advantage (outside the aforementioned replication
functionality) you'd get.

-Ross.




Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Peter Schlumpf
I'd opt for the first response.  I hope NoSQL is not a flash in the pan.  It 
makes eminent sense to me.  SQL is just one way of looking at data.  A level of 
abstraction.  What authority says that SQL is the only or the best way of 
looking at a dataset?  Or the MARC record format for that matter?  They 
certainly weren't inscribed on stone tablets.   These things can become mind 
prisons.  I think it's refreshing that there are those willing to look at 
databases beyond SQL.

Peter Schlumpf
www.avantilibrarysystems.com




Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Benjamin Young
I'd actually vote for the sensible, forward-looking approach. The BBC 
(for one) is already using CouchDB in production: 
http://damienkatz.net/2010/03/bbc_and_couchdb.html


That said, NoSQL as a movement is as wide and varied as the RDBMS 
world, and there are pros and cons to each. I'm personally a proponent 
of CouchDB because of its RESTful API, JSON storage system, and JavaScript 
(or Erlang, PHP, Python, Ruby, etc) map/reduce view engine. If your 
project needs replication at all (whether for scaling, data sharing, 
etc), I'd take a good hard look at CouchDB, as that's its core 
distinction among the other NoSQL databases.


Hope that helps,
Benjamin

--
President
BigBlueHat
P: 864.232.9553
W: http://www.bigbluehat.com/
http://www.linkedin.com/in/benjaminyoung




   


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Benjamin Young
SQL-style JOINs can be done in CouchDB (can't speak for the other NoSQL 
DBs).


In CouchDB, it's called view collation:
http://chrischandler.name/couchdb/view-collation-for-join-like-behavior-in-couchdb/

It's a different way of thinking (as there are no tables, and map/reduce 
goes through every document to generate its output), but it is possible 
to get interestingly combined data out of the whole database.
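
The idea behind view collation can be sketched in plain Python. This is only a simulation under stated assumptions: real CouchDB views are normally written as JavaScript map functions and sorted by CouchDB's own collation rules, whereas here `map_doc`, the document fields, and the sample records are all invented, and Python's list ordering stands in for the view's key ordering.

```python
# Sample documents of two types in one database (invented data).
docs = [
    {"_id": "art1", "type": "article", "author": "auth1", "title": "Citation parsing"},
    {"_id": "auth2", "type": "author", "name": "B. Author"},
    {"_id": "auth1", "type": "author", "name": "A. Author"},
    {"_id": "art2", "type": "article", "author": "auth2", "title": "Link resolvers"},
    {"_id": "art3", "type": "article", "author": "auth1", "title": "Abstract indexing"},
]


def map_doc(doc):
    """Emit (key, value) rows; the key is chosen so related rows sort together."""
    if doc["type"] == "author":
        # The author row sorts first within its group because of the 0.
        yield ([doc["_id"], 0], doc["name"])
    elif doc["type"] == "article":
        # Article rows share the author's id and sort after it because of the 1.
        yield ([doc["author"], 1], doc["title"])


# The "view": every document is mapped, then rows are ordered by key.
rows = sorted(
    (row for doc in docs for row in map_doc(doc)),
    key=lambda row: row[0],
)

for key, value in rows:
    print(key, value)
```

Reading the sorted rows in order gives each author followed by that author's articles, which is the join-like result; no tables are involved, only the choice of keys.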


Later,
Benjamin

--
President
BigBlueHat
P: 864.232.9553
W: http://www.bigbluehat.com/
http://www.linkedin.com/in/benjaminyoung



 


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Jonathan Rochkind
The thing is, the NoSQL stuff is pretty much just a key-value store.  
There's generally no way to query the store, instead you can simply 
look up a document by ID.


If this meets the needs of your application, and all you need is a key-value 
store and not any kind of query, then it's definitely going to be a lot 
less overhead than an actual SQL rdbms, and simpler to manage, with 
advantages for scalability and replication etc.  The reason it's simpler 
and more performant is, well, because it's _simpler_: you don't actually 
have querying or joining abilities.


But if you are actually going to need querying on values other than 
ID...   SQL rdbms is a pretty standardized, well understood way to do 
this.  There are certainly other ways -- you could combine a noSQL 
key-value store with Solr/Lucene, for instance.  Which in some cases may 
get you even better performance and more flexibility than an rdbms 
solution.  But it's (IMO) going to be a bit harder to set up and manage 
and use in your favorite development environment, precisely because 
rdbms is such a time-tested standardized mature approach. 

So, as usual, the right tool for the job. If all you really need is a 
key-value store on ID, then a NoSQL solution may be the right thing.  
But if you need actual querying and joining, then personally I'd stick 
with rdbms unless I had some concrete reason to think a more complicated 
nosql+solr solution was required.  Certainly if you are planning on 
using Solr _anyway_ because your application is a search engine of some 
type, that would lessen the incremental 'cost' of a nosql+solr solution.


[ Note that if all you want is a schemaless storage, you CAN just 
stick large chunks of binary or text in an rdbms 'blob' or 'text' 
column.  You won't be able to efficiently search on these -- but you 
aren't able to efficiently search in a 'nosql' solution either.  So you 
_can_ use an rdbms like a nosql solution to store arbitrary data, no 
problem.  If you're using an rdbms, you can have _other_ columns in 
addition to your blob/text one, that you can populate for select and 
join.  If you _aren't_ going to need those -- then there'd be no reason 
to do it in an rdbms (even though you could), you would indeed then just 
want to use a 'nosql' key-value store solution which will be higher 
performance.  So the conclusion again I think is that rdbms is _more 
powerful_ than nosql, but that power comes with a performance cost.  If 
you don't need it, nosql.  If you do need it -- there's no reason you 
can't store structureless units of data in text/blob in an rdbms too. ]
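
The bracketed note above (a schemaless blob plus a few ordinary columns in an rdbms) might look like this with Python's built-in `sqlite3` module. It is a sketch under assumptions: the `citation` table, its columns, the `put`/`get` helpers, and the sample records are all invented for illustration, and SQLite is used only because it needs no server.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE citation ("
    " id TEXT PRIMARY KEY,"   # lookup key, as in a key-value store
    " year INTEGER,"          # extracted column usable in SELECT and JOIN
    " body TEXT)"             # opaque JSON payload, no fixed schema
)
conn.execute("CREATE INDEX citation_year ON citation (year)")


def put(doc_id, doc):
    """Store the whole document as text and copy one field into a real column."""
    conn.execute(
        "INSERT OR REPLACE INTO citation (id, year, body) VALUES (?, ?, ?)",
        (doc_id, doc.get("year"), json.dumps(doc)),
    )


def get(doc_id):
    """Key-value style retrieval by id; returns None if the id is unknown."""
    row = conn.execute(
        "SELECT body FROM citation WHERE id = ?", (doc_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


# Documents with different shapes go into the same table.
put("c1", {"title": "First citation", "year": 2009, "fulltext_url": "http://example.org/c1"})
put("c2", {"title": "Second citation", "year": 2010, "abstract": "Short abstract."})

# The extracted column can be queried; the blob itself cannot (efficiently).
ids_2010 = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM citation WHERE year = ? ORDER BY id", (2010,)
    )
]
print(ids_2010)
```

Only fields copied into their own columns are searchable this way; anything left solely inside `body` is retrievable by id but not by value.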





Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Joe Hourcle

On Mon, 12 Apr 2010, Jonathan Rochkind wrote:

So, as usual, the right tool for the job. If all you really need is a 
key-value store on ID, then a NoSQL solution may be the right thing.  But 
if you need actual querying and joining, then personally I'd stick with 
rdbms unless I had some concrete reason to think a more complicated 
nosql+solr solution was required.  Certainly if you are planning on using 
Solr _anyway_ because your application is a search engine of some type, that 
would lessen the incremental 'cost' of a nosql+solr solution.


I'm surprised that I keep hearing so much about NoSQL for key-value 
stores, and everyone seems to forget the *old* key-value stores, such as 
directory services (X.500 and LDAP, although that's actually the protocol 
used to query them, not the storage implementation).


Yes, there are things that LDAP doesn't do so well (relationships being 
one of them), but it supports querying, and you can adjust the matching by 
attribute (i.e., this one's matched as a number, this one's matched as a 
string, this one's a case-insensitive string ...). I think some 
implementations have functionality to run the search term through a 
function for things like soundex, so it might be possible to add hooks for 
stemming and query expansion, etc.
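
Per-attribute matching can be illustrated roughly like this. It is not LDAP and uses no directory library; the `MATCHERS` table, the attribute names, and the sample entries are all assumptions made up for the sketch, which only shows the idea that each attribute carries its own comparison rule.

```python
# One comparison rule per attribute (all names here are hypothetical).
MATCHERS = {
    # Compared as numbers, so "042" and "42" are equal.
    "pages": lambda stored, term: int(stored) == int(term),
    # Compared as an exact, case-sensitive string.
    "identifier": lambda stored, term: stored == term,
    # Compared as a case-insensitive string.
    "title": lambda stored, term: stored.casefold() == term.casefold(),
}


def matches(entry, attribute, term):
    """Apply the attribute's own rule; a missing attribute never matches."""
    if attribute not in entry:
        return False
    return MATCHERS[attribute](entry[attribute], term)


def search(entries, attribute, term):
    """Return every entry whose attribute matches the term under its rule."""
    return [entry for entry in entries if matches(entry, attribute, term)]


entries = [
    {"identifier": "ABC-1", "title": "NoSQL Basics", "pages": "042"},
    {"identifier": "abc-1", "title": "Directory Services", "pages": "310"},
]

by_pages = search(entries, "pages", "42")
by_title = search(entries, "title", "nosql basics")
by_identifier = search(entries, "identifier", "abc-1")
print(len(by_pages), len(by_title), len(by_identifier))
```

A hook for soundex, stemming, or query expansion would be one more entry in such a table, normalizing both sides before comparing.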



I think that NoSQL got a lot of press because of Google having used it 
(and their having a *VERY* large data system -- but not everyone has that 
large a system; also, Google did it 10+ years ago -- you can now 
throw a lot more CPU and RAM at an RDBMS, so the point at which the 
database becomes a problem isn't the same as it was when Google first came 
out.)


...

So, I think that there are cases where NoSQL is the right solution for the 
job, and I think there are times when an RDBMS is the right solution ... 
there are also plenty of times for flat file databases, XML, LDAP, and a 
slew of other storage standards.


-Joe


hmm ... now I'm going to have to try to bring back my attempt to put my 
catalogs into a directory service ... I have a feeling I'm going to run 
into issues with unit conversions when searching.


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Ross Singer
On Mon, Apr 12, 2010 at 12:22 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 The thing is, the NoSQL stuff is pretty much just a key-value store.
  There's generally no way to query the store, instead you can simply look
 up a document by ID.

Actually, this depends largely on the NoSQL DBMS in question.  Some
are key value stores (Redis, Tokyo Cabinet, Cassandra), some are
document-based (CouchDB, MongoDB), some are graph-based (Neo4J), so I
think blanket statements like this are somewhat misleading.

CouchDB and MongoDB (for example) have the capacity to index the
values within the document - you don't just have to look up things by
document ID.
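
The difference between a plain key-value lookup and indexing values inside the document can be sketched as below. This is not the CouchDB or MongoDB API; `save`, `find_by_journal`, the `journal` field, and the sample documents are hypothetical names chosen for the illustration, and the store is just an in-memory dictionary.

```python
from collections import defaultdict

documents = {}                   # primary storage: document id -> document
by_journal = defaultdict(set)    # secondary index: journal value -> document ids


def save(doc_id, doc):
    """Store a document and keep the secondary index in step with it."""
    old = documents.get(doc_id)
    if old is not None and "journal" in old:
        # Drop the stale index entry when a document is replaced.
        by_journal[old["journal"]].discard(doc_id)
    documents[doc_id] = doc
    if "journal" in doc:
        by_journal[doc["journal"]].add(doc_id)


def find_by_journal(journal):
    """Look up documents by a value inside them, not by their id."""
    return sorted(by_journal.get(journal, ()))


save("d1", {"title": "First citation", "journal": "J1"})
save("d2", {"title": "Second citation", "journal": "J2"})
save("d3", {"title": "Third citation", "journal": "J1"})

in_j1 = find_by_journal("J1")
print(in_j1)
```

A pure key-value store would offer only the `documents[doc_id]` lookup; a document store maintains something like `by_journal` for you once you declare the index or view.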

-Ross.


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Jay Luker
On Mon, Apr 12, 2010 at 12:22 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 The thing is, the NoSQL stuff is pretty much just a key-value store.
  There's generally no way to query the store, instead you can simply look
 up a document by ID.


Schemaless != no way to query.

Key-value stores, like memcache,  are just one end of what most consider the
nosql spectrum. For instance, I can query my CouchDB instances through the
different views I create.

I thought this blog post had an interesting take on NoSQL, although this
guy, Mike Stonebraker of VoltDB, obviously has a horse in the race.
http://cacm.acm.org/blogs/blog-cacm/50678-the-nosql-discussion-has-nothing-to-do-with-sql/fulltext

--jay


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Jonathan Rochkind

Yeah, I may have gotten it completely wrong.

Okay, help this grasshopper (possibly by pointing me to relevant 
documentation), what's the difference between document-based and 
key-value store?  When I've looked at CouchDB before, despite it 
describing itself as document based, I haven't been able to tell what 
the difference is between it and a key value store.  It seemed to 
support storing a document by key, and retrieving it by key.  It 
didn't seem to _do_ anything special with the document other than 
storing it there (maybe it DOES, but I missed it?).  So you can call it 
a document instead of a value, but I couldn't figure out how that 
differed from a key-value store.


I guess it's that CouchDB _does_ let you build indexes on values other 
than the key?  Wacky, wonder how I missed that when I reviewed it last.


Jonathan


  


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Ryan Eby
On Mon, Apr 12, 2010 at 10:55 AM, Thomas Dowling tdowl...@ohiolink.edu wrote:
 So let's say (hypothetically, of course) that a colleague tells you he's
 considering a NoSQL database like MongoDB or CouchDB, to store a couple
 tens of millions of documents, where a document is pretty much an
 article citation, abstract, and the location of full text (not the full
 text itself).  Would your reaction be:


There are really two reactions in here: one about NoSQL and the other
about your colleague.

As for NoSQL, I would be on the side that the ecosystem is here to stay,
although individual projects may or may not take off/evolve. The best
description I've seen of NoSQL as a whole is "choice"[1]: not
having to shove everything into a similar style of database for every
project, and instead making the database fit the data/use. There's a large
number of projects now, each with their own priorities and the
trade-offs they've made to reach them. Some care about consistency,
for others eventual consistency is good enough, and others go as far as
distributed transactions over nodes. Some do lazy writes to disk,
others do not. How you query your data also varies quite a bit, with
SQL-like languages, map/reduce, Hadoop, etc.

From your brief description it sounds like quite a few projects could
fit the bill, including rdbms-types, and which one you want would
probably depend on what you think you might do in the future. If you
foresee yourself having lots of fields that might only cover certain
subsets of the dataset then couchdb or the like are probably worth
looking at.
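
To make the "fields that only cover certain subsets" point concrete, here is
a minimal sketch in plain Python. The record types, field names and values
are all made up for illustration, and nothing here talks to CouchDB or any
other real database; it only counts how many empty cells a single wide table
would need for records that a document store would hold as-is.

```python
# Hypothetical citation records; every field name and value is illustrative.
citations = [
    {"_id": "a1", "type": "article", "title": "On Views",
     "journal": "Example Journal", "volume": 12},
    {"_id": "b1", "type": "book-chapter", "title": "On Keys",
     "isbn": "000-0-00-000000-0", "editor": "E. Editor"},
    {"_id": "p1", "type": "preprint", "title": "On Graphs",
     "repository_url": "http://example.org/p1"},
]

# A document store keeps each record with only the fields it actually has.
# One wide relational table would need a column for every field seen anywhere.
all_fields = sorted({field for doc in citations for field in doc})
rows = [[doc.get(field) for field in all_fields] for doc in citations]

# Cells that would be NULL in the wide-table layout.
empty_cells = sum(cell is None for row in rows for cell in row)

print(len(all_fields), "columns,", empty_cells, "empty cells out of",
      len(all_fields) * len(rows))
```

A normalized relational schema avoids most of those empty cells too, at the
cost of more tables and joins, which is the kind of trade-off described above.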

As for the colleague, I guess the question is why? If it is because of
trendiness then Bwahahahah!!! might be the best answer. But I'm
guessing they've thought about the data and what benefits they would
get out of the backend.

[1] http://blog.couch.io/post/511008668/nosql-is-about


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Joe Hourcle

On Mon, 12 Apr 2010, Ryan Eby wrote:

[trimmed]


But I'm
guessing they've thought about the data and what benefits they would
get out of the backend.



Wow.  You obviously don't work with the same folks that I do.

I've been attached to one project for about 16 months now, while the rest 
of the team's been together for 4 years ... I've been trying to get a few 
changes made to better support my user community (basically, all of the 
people who don't have access to their system, or don't want to spend the 6 
months using the system 'to be able to do something almost useful').


About 2-3 months ago, the main project team finally realized that they 
have *no*idea* what the user community wants or needs.


Oh, and they have to go live on April 21st.  I'm expecting a major 'wtf?' 
reaction from the majority of the community.


-Joe


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Benjamin Young
From my understanding of key/value stores, one can put documents on the 
other side of the key, but any and all parsing/processing of that value 
happens outside of the database. In CouchDB, the entire document is 
query-able from within map/reduce views. After being queried on, those 
keys are indexed for faster future queries. So, in that way, CouchDB 
jumps over the key/value limitations and becomes a document database.
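
A minimal sketch of that distinction, in plain Python rather than against a
real database. The class and function names (DocumentStore, by_author, and
so on) are hypothetical, the "view" is only an analogy for a CouchDB map
function, and the index handling is deliberately naive.

```python
import json

# Key/value style: the store holds opaque strings, so any search by author
# has to fetch and parse values on the client side.
kv_store = {
    "cit-1": json.dumps({"title": "On Views", "authors": ["Smith", "Jones"]}),
    "cit-2": json.dumps({"title": "On Keys", "authors": ["Smith"]}),
}
client_side_hits = sorted(
    key for key, raw in kv_store.items()
    if "Smith" in json.loads(raw)["authors"]
)


class DocumentStore:
    """Toy document store: it can look inside documents via map functions."""

    def __init__(self):
        self._docs = {}
        self._views = {}    # view name -> map function
        self._indexes = {}  # view name -> sorted list of (key, doc_id)

    def add_view(self, name, map_fn):
        self._views[name] = map_fn
        self._indexes.pop(name, None)

    def put(self, doc_id, doc):
        self._docs[doc_id] = doc
        self._indexes.clear()  # naive: any write invalidates every index

    def query(self, name, key):
        # Build the index on first use, then reuse it for later queries.
        if name not in self._indexes:
            rows = []
            for doc_id, doc in self._docs.items():
                for emitted_key in self._views[name](doc):
                    rows.append((emitted_key, doc_id))
            self._indexes[name] = sorted(rows)
        return [doc_id for k, doc_id in self._indexes[name] if k == key]


def by_author(doc):
    """Map function: emit one index key per author in the document."""
    for author in doc.get("authors", []):
        yield author


store = DocumentStore()
store.add_view("by_author", by_author)
store.put("cit-1", {"title": "On Views", "authors": ["Smith", "Jones"]})
store.put("cit-2", {"title": "On Keys", "authors": ["Smith"]})
print(store.query("by_author", "Smith"))  # ['cit-1', 'cit-2']
```

The point of the sketch is only where the parsing happens: in the first half
the client decodes every value, in the second the store itself builds and
keeps an index on something other than the document ID.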


In addition to map/reduce output, there's also a handy _update system 
that can be used to validate a JSON document prior to its insertion in 
the database--again, something not possible with key/value storage.
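
As a rough illustration of validate-before-insert, here is a Python analogy.
It is not CouchDB's API: to my knowledge the CouchDB hook is a JavaScript
validate_doc_update function kept in a design document, and the rules,
class names and field names below are assumptions made up for this sketch.

```python
class ValidationError(Exception):
    """Raised when a document is rejected before it is stored."""


def validate_citation(new_doc):
    """Hypothetical rules for a citation document."""
    title = new_doc.get("title")
    if not isinstance(title, str) or not title.strip():
        raise ValidationError("title is required")
    authors = new_doc.get("authors")
    if not isinstance(authors, list) or not authors:
        raise ValidationError("at least one author is required")


class ValidatingStore:
    """Toy store that runs a validator inside put(), before writing."""

    def __init__(self, validator):
        self._validator = validator
        self._docs = {}

    def put(self, doc_id, doc):
        self._validator(doc)      # raises and stores nothing on bad input
        self._docs[doc_id] = doc

    def __contains__(self, doc_id):
        return doc_id in self._docs


citation_store = ValidatingStore(validate_citation)
citation_store.put("cit-1", {"title": "On Views", "authors": ["Smith"]})

try:
    citation_store.put("cit-2", {"title": "", "authors": []})
except ValidationError as err:
    print("rejected:", err)
```

With a plain key/value store the same check would have to live in every
client that writes to it, since the store never looks at the value.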


You can, though, use CouchDB in a key/value fashion by storing binary 
data (or HTML, XML, RDF, etc) as attachments or JSON encoded strings 
(where possible). In that case, you would just be retrieving them by id 
(or URL), but you could store all kinds of ad hoc metadata about those 
attachments and use those to query with later.


Also, the blog article Ryan Eby just posted is a great (and quick) 
overview of the varied noSQL ecosystem. In many ways, these systems are 
as different as they are similar.


Hope your (re)search goes well,
Benjamin

--
President
BigBlueHat
P: 864.232.9553
W: http://www.bigbluehat.com/
http://www.linkedin.com/in/benjaminyoung


On 4/12/10 2:42 PM, Jonathan Rochkind wrote:

Yeah, I may have gotten it completely wrong.

Okay, help this grasshopper (possibly by pointing me to relevant 
documentation), what's the difference between document-based and 
key-value store?  When I've looked at CouchDB before, despite it 
describing itself as document based, I haven't been able to tell 
what the difference is between it and a key value store.  It seemed 
to support storing a document by key, and retrieving it by key.  It 
didn't seem to _do_ anything special with the document other than 
storing it there (maybe it DOES, but I missed it?).  So you can call 
it a document instead of a value, but I couldn't figure out how 
that differed from a key-value store.


I guess it's that CouchDB _does_ let you build indexes on values other 
than the key?  Wacky, wonder how I missed that when I reviewed it last.


Jonathan

Ross Singer wrote:
On Mon, Apr 12, 2010 at 12:22 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

The thing is, the NoSQL stuff is pretty much just a key-value store.
 There's generally no way to query the store, instead you can simply look
up a document by ID.


Actually, this depends largely on the NoSQL DBMS in question.  Some
are key value stores (Redis, Tokyo Cabinet, Cassandra), some are
document-based (CouchDB, MongoDB), some are graph-based (Neo4J), so I
think blanket statements like this are somewhat misleading.

CouchDB and MongoDB (for example) have the capacity to index the
values within the document - you don't just have to look up things by
document ID.

-Ross.



Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Sam Kome
Michael Stonebraker *is* the horse, and yet has pointed out that RDBMSs 
aren't always the hammer you're looking for.  Next time you use a B-tree or 
R-tree (spatial search, anyone?), give him a toast with your favorite beverage.

http://cacm.acm.org/blogs/blog-cacm/32212-the-end-of-a-dbms-era-might-be-upon-us/fulltext

http://en.wikipedia.org/wiki/Michael_Stonebraker


-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Jay 
Luker
Sent: Monday, April 12, 2010 10:38 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

On Mon, Apr 12, 2010 at 12:22 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 The thing is, the NoSQL stuff is pretty much just a key-value store.
  There's generally no way to query the store, instead you can simply look
 up a document by ID.


Schemaless != no way to query.

Key-value stores, like memcache,  are just one end of what most consider the
nosql spectrum. For instance, I can query my CouchDB instances through the
different views I create.

I thought this blog post had an interesting take on NoSQL, although this
guy, Mike Stonebraker of VoltDB, obviously has a horse in the race.
http://cacm.acm.org/blogs/blog-cacm/50678-the-nosql-discussion-has-nothing-to-do-with-sql/fulltext

--jay


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Thomas Dowling
On 04/12/2010 03:26 PM, Ryan Eby wrote:

 
 As for the colleague, I guess the question is why?...

He's hoping it'll impress the babes.  :-)

Seriously (and not to draw the conversation to a close), thanks to all for
their insights.


-- 
Thomas Dowling
tdowl...@ohiolink.edu


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Chad Fennell
 So let's say (hypothetically, of course) that a colleague tells you he's
 considering a NoSQL database like MongoDB or CouchDB, to store a couple
 tens of millions of documents, where a document is pretty much an
 article citation, abstract, and the location of full text (not the full
 text itself).  Would your reaction be:

Noo!!! NoSQL is terrible for startup projects ;)
http://labs.mudynamics.com/2010/04/01/why-nosql-is-bad-for-startups/

But seriously, it depends.  You know, a lotta ins, lotta outs, lotta
what-have-yous.  I sort of like MongoDB's characterization of the
landscape as tradeoffs between scale & performance on the one hand and
depth of functionality on the other:
http://www.mongodb.org/display/DOCS/Philosophy I suspect we'll
continue to see more hybrid systems for some time to come with various
data stores handling the pieces they do best.


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Benjamin Young

On 4/12/10 4:47 PM, Ryan Eby wrote:

You could put your logs, marc records broken out by fields or
arrays/hashes (types in couchdb) in any of them but the approach each
takes would limit you (or empower you) differently.
   
Once there's a good marc2json script (and format) out there, it'd be 
grand to see marc records dumped into CouchDB to allow them to be 
replicated between groups of librarians (and even up to OpenLibrary). 
I'm still up for helping make that possible if anyone's into that. :)


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Andrew Hankinson
Couldn't you do MARC -> MARCXML -> JSON?

-Andrew

On 2010-04-12, at 5:00 PM, Benjamin Young wrote:

 On 4/12/10 4:47 PM, Ryan Eby wrote:
 You could put your logs, marc records broken out by fields or
 arrays/hashes (types in couchdb) in any of them but the approach each
 takes would limit you (or empower you) differently.
   
 Once there's a good marc2json script (and format) out there, it'd be grand to 
 see marc records dumped into CouchDB to allow them to be replicated between 
 groups of librarians (and even up to OpenLibrary). I'm still up for helping 
 make that possible if anyone's into that. :)


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Jonathan Rochkind
There are at least TWO good marc2json formats, and several open source 
scripts at least for Bill Dueber's, no?


Benjamin Young wrote:

On 4/12/10 4:47 PM, Ryan Eby wrote:
  

You could put your logs, marc records broken out by fields or
arrays/hashes (types in couchdb) in any of them but the approach each
takes would limit you (or empower you) differently.
   

Once there's a good marc2json script (and format) out there, it'd be 
grand to see marc records dumped into CouchDB to allow them to be 
replicated between groups of librarians (and even up to OpenLibrary). 
I'm still up for helping make that possible if anyone's into that. :)


  


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Benjamin Young

On 4/12/10 5:04 PM, Andrew Hankinson wrote:

Couldn't you do MARC -> MARCXML -> JSON?

-Andrew
   
Certainly, but the hard part is knowing what you want MARC to look like 
once it's in JSON. XML 2 JSON conversions generally need some love to 
make the data meaningful on the JSON side (as attributes and such make a 
1-to-1 conversion complicated--though there have been attempts at 
general conversion scripts).


Once a JSON output format for MARC is done, then converting from MARCXML 
to marc.json (or whatever) would be an easy first step.
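
To show what such a step could look like, here is a sketch using only the
Python standard library. The JSON layout (a "leader" string plus an ordered
"fields" list) is just one possible shape chosen for illustration, not an
agreed format, and the sample record is invented.

```python
import json
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented sample record, for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">12345</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
    <subfield code="c">A. Author.</subfield>
  </datafield>
</record>"""


def marcxml_record_to_dict(record):
    """Convert one MARCXML <record> element into a JSON-friendly dict."""
    fields = []
    for cf in record.findall("marc:controlfield", MARC_NS):
        fields.append({cf.get("tag"): cf.text or ""})
    for df in record.findall("marc:datafield", MARC_NS):
        # Lists (not dicts) keep repeated subfields and their order.
        subfields = [
            {sf.get("code"): sf.text or ""}
            for sf in df.findall("marc:subfield", MARC_NS)
        ]
        fields.append({
            df.get("tag"): {
                "ind1": df.get("ind1", " "),
                "ind2": df.get("ind2", " "),
                "subfields": subfields,
            }
        })
    leader = record.findtext("marc:leader", default="", namespaces=MARC_NS)
    return {"leader": leader, "fields": fields}


marc_doc = marcxml_record_to_dict(ET.fromstring(SAMPLE))
print(json.dumps(marc_doc, indent=2))
```

Fields and subfields are kept as lists of single-key objects because MARC
tags and subfield codes can repeat and their order can matter; a flat
tag-to-value object would silently drop repeats. That is the sort of design
decision that has to be settled before the conversion itself becomes easy.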

On 2010-04-12, at 5:00 PM, Benjamin Young wrote:

   

On 4/12/10 4:47 PM, Ryan Eby wrote:
 

You could put your logs, marc records broken out by fields or
arrays/hashes (types in couchdb) in any of them but the approach each
takes would limit you (or empower you) differently.

   

Once there's a good marc2json script (and format) out there, it'd be grand to see marc 
records dumped into CouchDB to allow them to be replicated between groups of librarians 
(and even up to OpenLibrary). I'm still up for helping make that possible if anyone's 
into that. :)