Re: Alternative Storage choice?

lasizoillo Sun, 24 May 2009 07:40:34 -0700

2009/5/24 Jorge Vargas <[email protected]>:
>
> Hello,
>
> I know this may be a little off-topic, but I know a lot of people here
> have experience in this sort of thing, which is why I'd like to ask.
> The very short summary of what I need for this project is:
>
> A "document" with a metadata object (Bunch) and a unique id (possibly
> cross-domain) per document.
>
> If you don't know what a Bunch object is, it's simply a dict with
> attribute access.
>
> Then these documents could be interlinked in a tree-like data
> structure (think threaded comments in a blog post)
>
> Each one of these documents is some form of content that will
> eventually be html, so they can get big but not huge.
>
> Therefore the storage will grow in number of 'documents' rather than
> the size of each one.
>
> As for search, I need only two forms: search by metadata and full-text search.
>
Why not use a full-text search engine? They provide both full-text search
and attributes (metadata).


You can use Solr, Whoosh, Xapian, ...
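
To make the idea concrete, here is a toy in-memory sketch of what such an
engine gives you: indexed text plus metadata fields you can filter on. It is
not the API of Solr, Whoosh or Xapian; the ToyIndex class and its method
names are made up for illustration only.

```python
import re
from collections import defaultdict


class ToyIndex:
    """Hypothetical in-memory stand-in for a search engine with metadata."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> ids of documents containing it
        self.metadata = {}                # document id -> metadata dict

    def add(self, doc_id, text, **metadata):
        """Index the words of text and remember the document's metadata."""
        self.metadata[doc_id] = metadata
        for word in re.findall(r"\w+", text.lower()):
            self.postings[word].add(doc_id)

    def search(self, query="", **filters):
        """Return sorted ids matching every query word and every filter."""
        hits = set(self.metadata)
        for word in re.findall(r"\w+", query.lower()):
            hits &= self.postings.get(word, set())
        return sorted(
            doc_id for doc_id in hits
            if all(self.metadata[doc_id].get(k) == v for k, v in filters.items())
        )


index = ToyIndex()
index.add("post-1", "Storage options for threaded comments", author="jorge")
index.add("post-2", "Full-text search over comments", author="javi")
print(index.search("comments"))                 # both documents
print(index.search("comments", author="javi"))  # only post-2
```

A real engine adds stemming, ranking and persistence on top of this, which
is the reason to use one instead of writing your own.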

> From my research into this there are a ton of options.
>
I have deleted the options I know nothing about.

> couchdb,
I am porting a CouchDB app to another storage. I am getting a lot of
database corruption and I have no tools to repair it. CouchDB is great
for read-only (or append-only) data. Modifications and deletes are
painful.


> tokyo tyrant / cabinet
>
If you want full-text search you also need Tokyo Dystopia. There are a
lot of incomplete bindings for the Tokyo libraries. Most wrappers do not
implement transactions or many of the functions, and they do not provide
a Pythonic encapsulation. I am working on this, but it is my first
wrapper, so for now it is more a toy than a production project [1].
Tokyo Cabinet's TableDB performs better than Berkeley DB
joined/associated tables, but worse than a pre-generated B-tree of the
joined result (CouchDB does this, but creating a new index or changing
one is very slow).

You can also try Durus [2]. Durus has a storage backend on top of
Berkeley DB [3] (written by the core bsddb maintainer). You can extend
it to provide replication, distributed transactions, ... and a lot of
other things Berkeley DB provides, without changing the Durus frontend.

Durus is very Pythonic, it is documented in Pylons, and you can try
different backends easily because Durus is simple. You can write a
Pythonic trie structure for a full-text search index and save it to
Durus easily. Berkeley DB has a lot of tools to recover a database,
perform hot backups, and more [4].
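
A minimal sketch of the kind of trie I mean, in plain Python and kept in
memory only. The Trie class, index_text helper and sample sentences are
illustrative assumptions; saving the nodes to Durus is not shown here.

```python
class Trie:
    """Prefix tree mapping words to the ids of documents that contain them."""

    def __init__(self):
        self.children = {}    # next character -> child Trie node
        self.doc_ids = set()  # documents containing the word that ends here

    def add(self, word, doc_id):
        """Walk (and create) one node per character, then record the id."""
        node = self
        for char in word.lower():
            node = node.children.setdefault(char, Trie())
        node.doc_ids.add(doc_id)

    def search_prefix(self, prefix):
        """Return ids of all documents with a word starting with prefix."""
        node = self
        for char in prefix.lower():
            node = node.children.get(char)
            if node is None:
                return set()
        found, stack = set(), [node]
        while stack:  # collect ids from the whole subtree below the prefix
            current = stack.pop()
            found |= current.doc_ids
            stack.extend(current.children.values())
        return found


def index_text(trie, doc_id, text):
    """Split text on whitespace, strip punctuation, and index each word."""
    for raw in text.split():
        word = raw.strip(".,;:!?()\"'")
        if word:
            trie.add(word, doc_id)


trie = Trie()
index_text(trie, 1, "Durus stores persistent Python objects.")
index_text(trie, 2, "Berkeley DB persists key/value pairs.")
print(trie.search_prefix("persist"))  # documents 1 and 2
print(trie.search_prefix("dur"))      # document 1
```

A trie gives you prefix search for free, which a plain dict of words does
not.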

> And to be honest I'll prefer to spend more time on getting the UI part
> of this project than trying out all the storage options. For now I
> just need a solid prototype.
>

Right. If your library is Pythonic you only need to write Python. With
many storage libraries you will do:

import uuid

my_table = customInitFunction()  # different implementations for different storages
...
my_table[uuid.uuid4().int] = my_dict  # save a dict into the storage under a globally unique key
...
my_table.commit()  # only in transactional environments, preferably in a with block
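
For a version of the same pattern that runs as-is, here is a sketch using
the standard library's shelve module as a stand-in storage. This is my
assumption for illustration, not one of the storages discussed above:
shelve is not transactional (sync() only flushes to disk) and it needs
string keys, so the UUID is stored as hex. The document fields are made up.

```python
import os
import shelve
import tempfile
import uuid

# Throwaway location for the example database file(s).
path = os.path.join(tempfile.mkdtemp(), "documents")

# The with block closes the shelf at the end, which also flushes it.
with shelve.open(path) as my_table:
    key = uuid.uuid4().hex  # globally unique key, as a string
    my_table[key] = {"title": "First comment", "parent": None}
    my_table.sync()  # flush to disk; the closest shelve has to commit()

# Reopen the storage to check that the document was really saved.
with shelve.open(path) as my_table:
    print(my_table[key]["title"])
```

With a transactional backend the dict-style code stays the same and only
the open and commit calls change.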

[1] http://bitbucket.org/lasizoillo/tokyocabinet/
[2] http://www.mems-exchange.org/software/durus/
[3] http://www.jcea.es/programacion/durus-berkeleydbstorage.htm
[4] http://www.oracle.com/technology/documentation/berkeley-db/db/utility/index.html


Excuse my poor English. I hope this helps.

Regards,

Javi

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"pylons-discuss" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/pylons-discuss?hl=en
-~----------~----~----~----~------~----~------~--~---
