[CODE4LIB] AW: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-13 Thread Claußnitzer, Ralf
 Noo!!! NoSQL is terrible for startup projects ;)
 http://labs.mudynamics.com/2010/04/01/why-nosql-is-bad-for-startups/

Yes, this one is great :)

But I think there are some real issues for companies in using NoSQL databases. 
Mature RDBMS technology is backed by mathematical theory. As far as I know, 
this does not hold for NoSQL systems. Maybe there is no need for it here, hence 
NoSQL DBs don't aim to support ACID-style transactions and schemas at all.

Having a database schema is crucial for integrating applications, and that 
is what relational DBs were actually built for. Their main purpose is not 
driving multi-server web applications that deal with forum users.

http://www.mountainman.com.au/software/history/it2.html 

Having a data store set up quickly, without the need to think about actual 
data structures, seems a perfect match for agile, feature-driven application 
development: changing data structures can be handled in a snap, and 
domain model objects map easily to documents.
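A minimal Python sketch of that point, assuming a plain dict stands in for the document store (the `Citation` class, the `save`/`load` helpers and the field names are all hypothetical): a domain object is serialized straight to a JSON document, and adding a field in a later release needs no schema migration, only a default for older documents.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class Citation:
    title: str
    year: int
    fulltext_url: str
    abstract: Optional[str] = None  # hypothetical field added in a later release

# Hypothetical in-memory document store: document id -> JSON text.
store = {}

def save(doc_id, citation):
    # The domain object maps directly onto a JSON document.
    store[doc_id] = json.dumps(asdict(citation))

def load(doc_id):
    # Older documents simply lack newer keys; the dataclass default fills them in.
    return Citation(**json.loads(store[doc_id]))

# A document written by the first release, before 'abstract' existed.
store["cit-1"] = json.dumps({"title": "On Widgets", "year": 2009,
                             "fulltext_url": "http://example.org/1.pdf"})
# A document written by the later release.
save("cit-2", Citation("More Widgets", 2010, "http://example.org/2.pdf",
                       abstract="A follow-up study."))

print(load("cit-1").abstract)  # None
print(load("cit-2").abstract)  # A follow-up study.
```

The flip side is that nothing in the store stops two releases from writing incompatible documents; the compatibility rules live in application code rather than in a database schema.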

An RDBMS forces you to do some detailed analysis of your application domain 
before actually implementing your data model. Complex relational schemas, once 
rolled out, are likely to resist change. But there are approaches to this:

http://www.informit.com/store/product.aspx?isbn=032150206X 

Regards!

-Ralf


[CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Thomas Dowling
So let's say (hypothetically, of course) that a colleague tells you he's
considering a NoSQL database like MongoDB or CouchDB, to store a couple
tens of millions of documents, where a document is pretty much an
article citation, abstract, and the location of full text (not the full
text itself).  Would your reaction be:

That's a sensible, forward-looking approach.  Lots of sites are putting
lots of data into these databases and they'll only get better.

This guy's on the bleeding edge.  Personally, I'd hold off, but it could
work.

Schedule that 2012 re-migration to Oracle or Postgres now.

Bwahahahah!!!

Or something else?



(http://en.wikipedia.org/wiki/NoSQL is a good jumping-in point.)


-- 
Thomas Dowling
tdowl...@ohiolink.edu


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Kozlowski,Brendon
I personally would vote for: This guy's on the bleeding edge.  Personally, I'd 
hold off, but it could work.  However, I attended a webinar on MongoDB, and the 
representative stated that SourceForge has moved to a NoSQL platform using 
MongoDB, load-tested it at 100x the traffic they are already seeing, and had 
zero issues with scalability.  That's pretty impressive.
 
Oh, it also managed to be more efficient than a traditional RDBMS.
 
 
 
Brendon Kozlowski
Web Administrator
Saratoga Springs Public Library
49 Henry Street
Saratoga Springs, NY, 12866
[518] 584-7860 x217





Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Robert Sanderson
Depends on the sort of features required, in particular the access
patterns, and the hardware it's going to run on.

In my experience, NoSQL systems (for example Apache's Cassandra) have
extremely good distribution properties over multiple machines, much
better than SQL databases.  Essentially, it's easier to store a bunch
of key/values in a distributed fashion, as you don't need to do joins
across tables (there aren't any) and eventually consistent systems
(such as Cassandra) don't even need to always be internally consistent
between nodes.
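To illustrate why a pure key/value model distributes so easily, here is a toy Python sketch. It is not how Cassandra works internally (Cassandra uses a token ring with replication), and the class and node names are hypothetical. Every key is routed to exactly one node by hashing it, so a read or write of one key never has to consult the other nodes.

```python
import hashlib

class HashPartitionedStore:
    """Toy key/value store that spreads keys across several nodes (plain dicts here)."""

    def __init__(self, node_names):
        self.names = list(node_names)
        self.nodes = {name: {} for name in self.names}

    def node_for(self, key):
        # A stable hash (Python's built-in hash() of str is salted per process).
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return self.names[int.from_bytes(digest[:8], "big") % len(self.names)]

    def put(self, key, value):
        # A write touches exactly one node; no join or cross-node coordination needed.
        self.nodes[self.node_for(key)][key] = value

    def get(self, key):
        return self.nodes[self.node_for(key)].get(key)

store = HashPartitionedStore(["node-a", "node-b", "node-c"])
for i in range(1000):
    store.put(f"doc-{i}", {"n": i})
print({name: len(keys) for name, keys in store.nodes.items()})
```

Plain modulo hashing is the simplest scheme; it reshuffles most keys when a node is added, which is why production systems tend to use consistent hashing instead.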

If many concurrent write accesses are required, then NoSQL can also be
a good choice, for the same reasons as it's easily distributed.
And for the same reasons, it can be much faster than SQL systems with
the same data given a data model that fits the access patterns.

The flip side is that if later you want to do something that just
requires the equivalent of table joins, it has to be done at the
application level.  This is going to be MUCH MUCH slower and harder
than if there was SQL underneath.
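A sketch of what "done at the application level" means, in Python, assuming both collections have already been fetched from the store (the collection and field names are hypothetical). It is a hand-written hash join; in practice much of the cost is pulling both sides over the network, which this toy does not show.

```python
# Two "collections" already fetched from a store that has no JOIN operator.
articles = [
    {"_id": "art-1", "title": "On Widgets", "author_id": "au-1"},
    {"_id": "art-2", "title": "More Widgets", "author_id": "au-2"},
    {"_id": "art-3", "title": "Widgets Revisited", "author_id": "au-1"},
]
authors = [
    {"_id": "au-1", "name": "Ada"},
    {"_id": "au-2", "name": "Grace"},
]

def app_level_join(articles, authors):
    """Client-side equivalent of: SELECT title, name FROM articles JOIN authors ON ..."""
    by_id = {a["_id"]: a for a in authors}  # build a hash table on one side
    joined = []
    for art in articles:                    # probe it with the other side
        author = by_id.get(art["author_id"])
        if author is not None:              # inner-join semantics: drop unmatched rows
            joined.append((art["title"], author["name"]))
    return joined

print(app_level_join(articles, authors))
# [('On Widgets', 'Ada'), ('More Widgets', 'Grace'), ('Widgets Revisited', 'Ada')]
```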


Rob





Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Ross Singer
The advantage of the NoSQL DBs is that they're schema-less, which
allows much more flexibility in your data going in.

However, it sounds like your schema may be pretty standardized -- I'm
not sure of a huge advantage (outside the aforementioned replication
functionality) you'd get.

-Ross.




Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Peter Schlumpf
I'd opt for the first response.  I hope NoSQL is not a flash in the pan.  It 
makes eminent sense to me.  SQL is just one way of looking at data.  A level of 
abstraction.  What authority says that SQL is the only or the best way of 
looking at a dataset?  Or the MARC record format for that matter?  They 
certainly weren't inscribed on stone tablets.   These things can become mind 
prisons.  I think it's refreshing that there are those willing to look at 
databases beyond SQL.

Peter Schlumpf
www.avantilibrarysystems.com




Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Benjamin Young
I'd actually vote for the sensible, forward-looking approach. The BBC 
(for one) is already using CouchDB in production: 
http://damienkatz.net/2010/03/bbc_and_couchdb.html


That said, NoSQL as a movement is as wide and varied as the RDBMS 
world, and there are pros and cons to each. I'm personally a proponent 
of CouchDB because of its RESTful API, JSON storage system, and JavaScript 
(or Erlang, PHP, Python, Ruby, etc.) map/reduce view engine. If your 
project needs replication at all (whether for scaling, data sharing, 
etc.), I'd take a good hard look at CouchDB, as that's its core 
distinction among the other NoSQL databases.


Hope that helps,
Benjamin

--
President
BigBlueHat
P: 864.232.9553
W: http://www.bigbluehat.com/
http://www.linkedin.com/in/benjaminyoung




Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Benjamin Young
SQL-style JOINs can be done in CouchDB (can't speak for the other NoSQL 
DBs).


In CouchDB, it's called view collation:
http://chrischandler.name/couchdb/view-collation-for-join-like-behavior-in-couchdb/

It's a different way of thinking (as there are no tables, and map/reduce 
goes through every document to generate its output), but it is possible 
to get interestingly combined data out of the whole database.
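A Python simulation of the view-collation idea, assuming tuples stand in for CouchDB's JSON array keys and a sorted list stands in for the view index (the document shapes and names are hypothetical, and CouchDB's real collation rules across mixed value types are more involved). The map step emits key (post_id, 0) for a post and (post_id, 1) for each of its comments; because the index is ordered by key, one range query returns a post followed by its comments.

```python
from bisect import bisect_left, bisect_right

# Hypothetical documents: posts, and comments that point at their post.
docs = [
    {"_id": "post-1", "type": "post", "title": "NoSQL?"},
    {"_id": "c-1", "type": "comment", "post": "post-1", "text": "It depends."},
    {"_id": "c-2", "type": "comment", "post": "post-1", "text": "Use Postgres."},
    {"_id": "post-2", "type": "post", "title": "MARC in JSON"},
    {"_id": "c-3", "type": "comment", "post": "post-2", "text": "Which format?"},
]

def map_fn(doc):
    # Same shape as a CouchDB map function: emit (key, value) pairs.
    # Comments get (post_id, 1), so they sort right after their post's (post_id, 0).
    if doc["type"] == "post":
        yield (doc["_id"], 0), doc["title"]
    elif doc["type"] == "comment":
        yield (doc["post"], 1), doc["text"]

def build_view(docs):
    rows = [row for doc in docs for row in map_fn(doc)]
    rows.sort(key=lambda row: row[0])  # the view index is ordered by key
    return rows

def post_with_comments(rows, post_id):
    # One range query over the sorted keys plays the role of a JOIN.
    keys = [key for key, _ in rows]
    lo = bisect_left(keys, (post_id, 0))
    hi = bisect_right(keys, (post_id, 1))
    return [value for _, value in rows[lo:hi]]

view = build_view(docs)
print(post_with_comments(view, "post-1"))  # ['NoSQL?', 'It depends.', 'Use Postgres.']
```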


Later,
Benjamin



Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Jonathan Rochkind
The thing is, the NoSQL stuff is pretty much just a key-value store.  
There's generally no way to query the store, instead you can simply 
look up a document by ID.


If all your application needs is a key-value store, and not any kind of 
query, then it's definitely going to be a lot less overhead than an actual 
SQL rdbms, and simpler to manage, with advantages for scalability and 
replication etc.  The reason it's simpler and more performant is, well, 
because it's _simpler_: you don't actually have querying or joining 
abilities.


But if you are actually going to need querying on values other than 
ID...   an SQL rdbms is a pretty standardized, well understood way to do 
this.  There are certainly other ways -- you could combine a noSQL 
key-value store with Solr/Lucene, for instance, which in some cases may 
get you even better performance and more flexibility than an rdbms 
solution.  But it's (IMO) going to be a bit harder to set up and manage 
and use in your favorite development environment, precisely because 
rdbms is such a time-tested standardized mature approach.
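A toy Python sketch of the "key-value store plus search engine" combination, assuming a dict stands in for the NoSQL store and a hand-built inverted index stands in for Solr/Lucene (none of this is Solr's API; all names are hypothetical). It also shows part of the management cost mentioned above: the application itself has to keep the index in step with the store.

```python
import re
from collections import defaultdict

kv = {}                   # primary store: doc id -> document; lookup by ID only
index = defaultdict(set)  # separate search index: term -> ids of documents containing it

def terms_of(doc):
    text = " ".join(str(doc.get(f, "")) for f in ("title", "abstract"))
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def put(doc_id, doc):
    # Keeping the two systems in sync is the application's job:
    # remove the old version's terms before indexing the new version.
    old = kv.get(doc_id)
    if old is not None:
        for term in terms_of(old):
            index[term].discard(doc_id)
    kv[doc_id] = doc
    for term in terms_of(doc):
        index[term].add(doc_id)

def search(query):
    # Intersect the posting sets (AND semantics), then fetch full documents by ID.
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return []
    ids = set.intersection(*(index.get(t, set()) for t in terms))
    return [kv[i] for i in sorted(ids)]

put("a", {"title": "NoSQL databases", "abstract": "Key value stores"})
put("b", {"title": "Relational databases"})
print([d["title"] for d in search("databases")])  # ['NoSQL databases', 'Relational databases']
```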

So, as usual, the right tool for the job. If all you really need is a 
key-value store on ID, then a NoSQL solution may be the right thing.  
But if you need actual querying and joining, then personally I'd stick 
with rdbms unless I had some concrete reason to think a more complicated 
nosql+solr solution was required.  Certainly if you are planning on 
using Solr _anyway_ because your application is a search engine of some 
type, that would lessen the incremental 'cost' of a nosql+solr solution.


[ Note that if all you want is a schemaless storage, you CAN just 
stick large chunks of binary or text in an rdbms 'blob' or 'text' 
column.  You won't be able to efficiently search on these -- but you 
aren't able to efficiently search in a 'nosql' solution either.  So you 
_can_ use an rdbms like a nosql solution to store arbitrary data, no 
problem.  If you're using an rdbms, you can have _other_ columns in 
addition to your blob/text one, that you can populate for select and 
join.  If you _aren't_ going to need those, then there'd be no reason 
to do it in an rdbms (even though you could), you would indeed then just 
want to use a 'nosql' key-value store solution which will be higher 
performance.  So the conclusion again I think is that rdbms is _more 
powerful_ than nosql, but that power comes with a performance cost.  If 
you don't need it, nosql.  If you do need it -- there's no reason you 
can't store structureless units of data in text/blob in an rdbms too. ]
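A minimal sketch of that blob/text approach using Python's built-in sqlite3 module (any rdbms would do; the table and column names are hypothetical, and INSERT OR REPLACE is SQLite-specific syntax). The whole document goes into a TEXT column as JSON, while one field is copied into an ordinary indexed column so it can be used in WHERE clauses.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE citations (
        id   TEXT PRIMARY KEY,
        year INTEGER,        -- promoted out of the document so it can be indexed and queried
        body TEXT NOT NULL   -- the schemaless part: an arbitrary JSON document
    )
""")
conn.execute("CREATE INDEX citations_year ON citations(year)")

def put(doc_id, doc):
    conn.execute("INSERT OR REPLACE INTO citations (id, year, body) VALUES (?, ?, ?)",
                 (doc_id, doc.get("year"), json.dumps(doc)))

def get(doc_id):
    row = conn.execute("SELECT body FROM citations WHERE id = ?", (doc_id,)).fetchone()
    return None if row is None else json.loads(row[0])

put("cit-1", {"title": "On Widgets", "year": 2009})
put("cit-2", {"title": "More Widgets", "year": 2010, "abstract": "A follow-up."})

# Key/value-style access by ID...
print(get("cit-2")["title"])  # More Widgets
# ...plus an ordinary SQL query on the promoted column.
print([r[0] for r in conn.execute("SELECT id FROM citations WHERE year >= 2010")])  # ['cit-2']
```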





Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Joe Hourcle

On Mon, 12 Apr 2010, Jonathan Rochkind wrote:

So, as usual, the right tool for the job. If all you really need is a 
key-value store on ID, then a NoSQL solution may be the right thing.  But 
if you need actual querying and joining, then personally I'd stick with 
rdbms unless I had some concrete reason to think a more complicated 
nosql+solr solution was required.  Certainly if you are planning on using 
Solr _anyway_ because your application is a search engine of some type, that 
would lessen the incremental 'cost' of a nosql+solr solution.


I'm surprised that I keep hearing so much about NoSQL for key-value 
stores, and everyone seems to forget the *old* key-value stores, such as 
directory services (X.500 and LDAP, although that's actually the protocol 
used to query them, not the storage implementation).


Yes, there are things that LDAP doesn't do so well (relationships being 
one of them), but it supports querying, and you can adjust the matching by 
attribute (i.e., this one's matched as a number, this one's matched as a 
string, this one's a case-insensitive string ... I think some 
implementations have functionality to run the search term through 
functions for things like soundex, so it might be possible to add hooks for 
stemming and query expansion, etc.)
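A small Python sketch of per-attribute matching, loosely modelled on LDAP equality matching rules such as integerMatch, caseExactMatch and caseIgnoreMatch. It is not an LDAP client: the schema, the entries and their DNs are hypothetical, and real LDAP rules do more normalization (whitespace handling, for example) than these one-line lambdas.

```python
# Each matching rule decides whether a stored value equals an asserted value.
MATCHING_RULES = {
    "integerMatch": lambda stored, asserted: int(stored) == int(asserted),
    "caseExactMatch": lambda stored, asserted: stored == asserted,
    "caseIgnoreMatch": lambda stored, asserted: stored.casefold() == asserted.casefold(),
}

# Hypothetical schema: which rule each attribute uses.
SCHEMA = {"pubYear": "integerMatch", "callNumber": "caseExactMatch", "title": "caseIgnoreMatch"}

# Hypothetical directory entries for catalog records; values are stored as strings.
ENTRIES = [
    {"dn": "cn=rec1,ou=catalog,dc=example,dc=org",
     "title": "Widgets", "pubYear": "2010", "callNumber": "QA76.9"},
    {"dn": "cn=rec2,ou=catalog,dc=example,dc=org",
     "title": "Gadgets", "pubYear": "2009", "callNumber": "qa76.9"},
]

def search(attr, value):
    """Return DNs of entries whose attr matches value under that attribute's rule."""
    rule = MATCHING_RULES[SCHEMA[attr]]
    return [e["dn"] for e in ENTRIES if attr in e and rule(e[attr], value)]

print(search("pubYear", "02010"))     # matched as a number
print(search("title", "WIDGETS"))     # matched as a case-insensitive string
print(search("callNumber", "QA76.9")) # matched as an exact string
```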



I think that NoSQL got a lot of press because Google used it (and Google 
has a *VERY* large data system, but not everyone has a system that large). 
Also, Google did it 10+ years ago; you can now throw a lot more CPU and RAM 
at an RDBMS, so the point at which the database becomes a problem isn't the 
same as it was when Google first came out.


...

So, I think that there are cases where NoSQL is the right solution for the 
job, and I think there are times when an RDBMS is the right solution ... 
there are also plenty of times for flat file databases, XML, LDAP, and a 
slew of other storage standards.


-Joe


hmm ... now I'm going to have to try to bring back my attempt to put my 
catalogs into a directory service ... I have a feeling I'm going to run 
into issues with unit conversions when searching.


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Ross Singer
On Mon, Apr 12, 2010 at 12:22 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 The thing is, the NoSQL stuff is pretty much just a key-value store.
  There's generally no way to query the store, instead you can simply look
 up a document by ID.

Actually, this depends largely on the NoSQL DBMS in question.  Some
are key value stores (Redis, Tokyo Cabinet, Cassandra), some are
document-based (CouchDB, MongoDB), some are graph-based (Neo4J), so I
think blanket statements like this are somewhat misleading.

CouchDB and MongoDB (for example) have the capacity to index the
values within the document - you don't just have to look up things by
document ID.

-Ross.


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Jay Luker
On Mon, Apr 12, 2010 at 12:22 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 The thing is, the NoSQL stuff is pretty much just a key-value store.
  There's generally no way to query the store, instead you can simply look
 up a document by ID.


Schemaless != no way to query.

Key-value stores, like memcache,  are just one end of what most consider the
nosql spectrum. For instance, I can query my CouchDB instances through the
different views I create.

I thought this blog post had an interesting take on NoSQL, although this
guy, Mike Stonebraker of VoltDB, obviously has a horse in the race.
http://cacm.acm.org/blogs/blog-cacm/50678-the-nosql-discussion-has-nothing-to-do-with-sql/fulltext

--jay


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Jonathan Rochkind

Yeah, I may have gotten it completely wrong.

Okay, help this grasshopper (possibly by pointing me to relevant 
documentation), what's the difference between document-based and 
key-value store?  When I've looked at CouchDB before, despite it 
describing itself as document based, I haven't been able to tell what 
the difference is between it and a key value store.  It seemed to 
support storing a document by key, and retrieving it by key.  It 
didn't seem to _do_ anything special with the document other than 
storing it there (maybe it DOES, but I missed it?).  So you can call it 
a document instead of a value, but I couldn't figure out how that 
differed from a key-value store.


I guess it's that CouchDB _does_ let you build indexes on values other 
than the key?  Wacky, wonder how I missed that when I reviewed it last.


Jonathan



Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Ryan Eby
On Mon, Apr 12, 2010 at 10:55 AM, Thomas Dowling tdowl...@ohiolink.edu wrote:
 So let's say (hypothetically, of course) that a colleague tells you he's
 considering a NoSQL database like MongoDB or CouchDB, to store a couple
 tens of millions of documents, where a document is pretty much an
 article citation, abstract, and the location of full text (not the full
 text itself).  Would your reaction be:


There are really two reactions in here. One about NoSQL and the other
about your colleague.

As for NoSQL, I would be on the side that the ecosystem is here to stay,
although individual projects may or may not take off/evolve. The best
description I've seen about nosql as a whole is choice[1]: not
having to shove everything into the same style of database for every
project, and making the database fit the data/use. There's a large
number of projects now, each with their own priorities and the
trade-offs they've made to reach them. Some care about consistency,
for others eventual consistency is good enough, and others go as far as
distributed transactions over nodes. Some do lazy writes to disk,
others not. How you query your data also varies quite a bit, with
sql-like, map/reduce, hadoop, etc.

From your brief description it sounds like quite a few projects could
fit the bill, including rdbms-types, and which one you want would
probably depend on what you think you might do in the future. If you
foresee yourself having lots of fields that might only cover certain
subsets of the dataset then couchdb or the like are probably worth
looking at.

As for the colleague, I guess the question is why? If it is because of
trendiness then Bwahahahah!!! might be the best answer. But I'm
guessing they've thought about the data and what benefits they would
get out of the backend.

[1] http://blog.couch.io/post/511008668/nosql-is-about


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Joe Hourcle

On Mon, 12 Apr 2010, Ryan Eby wrote:

[trimmed]


But I'm
guessing they've thought about the data and what benefits they would
get out of the backend.



Wow.  You obviously don't work with the same folks that I do.

I've been attached to one project for about 16 months now, while the rest 
of the team's been together for 4 years ... I've been trying to get a few 
changes made to better support my user community (basically, all of the 
people who don't have access to their system, or don't want to spend the 6 
months using the system 'to be able to do something almost useful').


About 2-3 months ago, the main project team finally realized that they 
have *no*idea* what the user community wants or needs.


Oh, and they have to go live on April 21st.  I'm expecting a major 'wtf?' 
reaction from the majority of the community.


-Joe


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Benjamin Young
From my understanding of key/value stores, one can put documents on the 
other side of the key, but any and all parsing/processing of that value 
happens outside of the database. In CouchDB, the entire document is 
queryable from within map/reduce views. After being queried, those 
keys are indexed for faster future queries. So, in that way, CouchDB 
jumps over the key/value limitations and becomes a document database.


In addition to map/reduce output, there's also a handy _update system 
that can be used to validate a JSON document prior to its insertion in 
the database--again, something not possible with key/value storage.


You can, though, use CouchDB in a key/value fashion by storing binary 
data (or HTML, XML, RDF, etc) as attachments or JSON encoded strings 
(where possible). In that case, you would just be retrieving them by id 
(or URL), but you could store all kinds of ad hoc metadata about those 
attachments and use those to query with later.


Also, the blog article Ryan Eby just posted is a great (and quick) 
overview of the varied noSQL ecosystem. In many ways, these systems are 
as different as they are similar.


Hope your (re)search goes well,
Benjamin




Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Sam Kome
Michael Stonebraker *is* the horse, and yet has pointed out that RDBMSs 
aren't always the hammer you're looking for.  Next time you use a B-tree or 
R-tree (spatial search, anyone?), give him a toast with your favorite beverage.

http://cacm.acm.org/blogs/blog-cacm/32212-the-end-of-a-dbms-era-might-be-upon-us/fulltext

http://en.wikipedia.org/wiki/Michael_Stonebraker




Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Thomas Dowling
On 04/12/2010 03:26 PM, Ryan Eby wrote:

 
 As for the colleague, I guess the question is why?...

He's hoping it'll impress the babes.  :-)

Seriously (and not to draw the conversation to a close), thanks to all for
their insights.


-- 
Thomas Dowling
tdowl...@ohiolink.edu


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Chad Fennell
 So let's say (hypothetically, of course) that a colleague tells you he's
 considering a NoSQL database like MongoDB or CouchDB, to store a couple
 tens of millions of documents, where a document is pretty much an
 article citation, abstract, and the location of full text (not the full
 text itself).  Would your reaction be:

Noo!!! NoSQL is terrible for startup projects ;)
http://labs.mudynamics.com/2010/04/01/why-nosql-is-bad-for-startups/

But seriously, it depends.  You know, a lotta ins, lotta outs, lotta
what-have-yous.  I sort of like MongoDB's characterization of the
landscape as tradeoffs between scale and performance on the one hand and
depth of functionality on the other:
http://www.mongodb.org/display/DOCS/Philosophy
I suspect we'll continue to see more hybrid systems for some time to come,
with various data stores handling the pieces they do best.


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Benjamin Young

On 4/12/10 4:47 PM, Ryan Eby wrote:

You could put your logs, marc records broken out by fields or
arrays/hashes (types in couchdb) in any of them but the approach each
takes would limit you (or empower you) differently.
   
Once there's a good marc2json script (and format) out there, it'd be 
grand to see marc records dumped into CouchDB to allow them to be 
replicated between groups of librarians (and even up to OpenLibrary). 
I'm still up for helping make that possible if anyone's into that. :)


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Andrew Hankinson
Couldn't you do MARC -> MARCXML -> JSON?

-Andrew



Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Jonathan Rochkind
There are at least TWO good marc2json formats, and several open source 
scripts at least for Bill Dueber's, no?




Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Benjamin Young

On 4/12/10 5:04 PM, Andrew Hankinson wrote:

Couldn't you do MARC -> MARCXML -> JSON?

-Andrew
   
Certainly, but the hard part is knowing what you want MARC to look like 
once it's in JSON. XML-to-JSON conversions generally need some love to 
make the data meaningful on the JSON side (as attributes and such make a 
1-to-1 conversion complicated, though there have been attempts at 
general conversion scripts).


Once a JSON output format for MARC is done, then converting from MARCXML 
to marc.json (or whatever) would be an easy first step.
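To make the "what should MARC look like in JSON" question concrete, here is a Python sketch of one possible shape. The record, the function name and the shape itself are hypothetical and are not any of the published marc2json formats mentioned in this thread. The design choice it illustrates is that MARC fields and subfields can repeat and their order matters, so a naive JSON object keyed by tag or subfield code would silently drop data; lists are used wherever order or repetition matters.

```python
import json

# A simplified, hypothetical MARC record:
#   control fields: (tag, value)
#   data fields:    (tag, ind1, ind2, [(subfield code, value), ...])
record = {
    "leader": "00000nam a2200000 a 4500",
    "control": [("001", "12345")],
    "data": [
        ("245", "1", "0", [("a", "Example title :"), ("b", "a subtitle /"), ("c", "Jane Doe.")]),
        ("650", " ", "0", [("a", "Databases.")]),
        ("650", " ", "0", [("a", "Nonrelational databases.")]),
    ],
}

def marc_to_json_doc(rec):
    """One possible JSON shape: lists wherever order or repetition matters."""
    fields = [{tag: value} for tag, value in rec["control"]]
    for tag, ind1, ind2, subfields in rec["data"]:
        fields.append({tag: {
            "ind1": ind1,
            "ind2": ind2,
            # A list of single-key objects keeps subfield order and repeats,
            # which a plain {"a": ..., "b": ...} object would lose.
            "subfields": [{code: value} for code, value in subfields],
        }})
    return {"leader": rec["leader"], "fields": fields}

doc = marc_to_json_doc(record)
print(json.dumps(doc, indent=2))
```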
