Hi, Jason,

This might be a little too big for SQLite.

Maybe a "big iron" database like PostgreSQL or the Greenplum Database will fit 
your requirements better.

Best regards

Markus Schaber

CODESYS® is a trademark of 3S-Smart Software Solutions GmbH

Inspiring Automation Solutions

3S-Smart Software Solutions GmbH
Dipl.-Inf. Markus Schaber | Product Development Core Technology
Memminger Str. 151 | 87439 Kempten | Germany
Tel. +49-831-54031-979 | Fax +49-831-54031-50

E-Mail: m.scha...@codesys.com | Web: http://www.codesys.com | CODESYS store: 
http://store.codesys.com
CODESYS forum: http://forum.codesys.com

Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade 
register: Kempten HRB 6186 | Tax ID No.: DE 167014915

> -----Original Message-----
> From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
> boun...@sqlite.org] On Behalf Of Jason H
> Sent: Monday, September 16, 2013 23:04
> To: sqlite-users@sqlite.org
> Subject: [sqlite] SQLite clusters?
> 
> I'm transitioning my job from embedded space to Hadoop space. I was
> wondering if it is possible to come up with a SQLite cluster
> adaptation.
> 
> I'll give you a crash course in Hadoop. Basically we get a very
> large CSV, which is chopped up into 64MB chunks and distributed to a
> number of nodes. Each chunk is replicated twice, for a total of
> three copies of every chunk on the cluster (no two copies of a chunk
> are stored on the same node). Then MapReduce logic is run, and the
> results are combined. Instrumental to this is that the keys are
> returned to the reducers in sorted order.
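> 
> To make that concrete, here is a minimal word-count sketch against
> the stock Hadoop MapReduce API (the class names are just
> illustrative):
> 
>     import java.io.IOException;
>     import org.apache.hadoop.io.IntWritable;
>     import org.apache.hadoop.io.LongWritable;
>     import org.apache.hadoop.io.Text;
>     import org.apache.hadoop.mapreduce.Mapper;
>     import org.apache.hadoop.mapreduce.Reducer;
> 
>     // Mapper: runs once per chunk, emits (word, 1) for every token.
>     public class WordCountMapper
>             extends Mapper<LongWritable, Text, Text, IntWritable> {
>         private static final IntWritable ONE = new IntWritable(1);
>         private final Text word = new Text();
> 
>         @Override
>         protected void map(LongWritable offset, Text line, Context ctx)
>                 throws IOException, InterruptedException {
>             for (String token : line.toString().split("\\s+")) {
>                 word.set(token);
>                 ctx.write(word, ONE); // shuffled to reducers in sorted key order
>             }
>         }
>     }
> 
>     // Reducer: sees each key exactly once, with all values from all mappers.
>     class WordCountReducer
>             extends Reducer<Text, IntWritable, Text, IntWritable> {
>         @Override
>         protected void reduce(Text key, Iterable<IntWritable> counts, Context ctx)
>                 throws IOException, InterruptedException {
>             int sum = 0;
>             for (IntWritable c : counts) sum += c.get();
>             ctx.write(key, new IntWritable(sum));
>         }
>     }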
> 
> All of this is done in Java (roughly 70% slower than C, on average,
> and with some non-trivial start-up cost). Everyone is clamoring for
> SQL to be run on the nodes. Hive attempts to leverage SQL, and is
> successful to some degree, but being able to use full SQL would be a
> huge improvement. Akin to Hadoop is HBase.
> 
> HBase is similar to Hadoop, but it approaches things in a more
> conventional, columnar format. It is a copy of Google's "BigTable".
> Here, the notion of "column families" is important, because column
> families are files. A row is made up of a key and at least one column
> family. There is an implied join between the key and each column
> family; when the table is viewed, it is viewed as a join between the
> key and all column families. What goes into a column family (cf) is
> not specified; the idea is to group columns into cfs by usage, so
> that cf1 holds your most commonly needed data and cfN the least
> often needed.
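> 
> A minimal sketch of that layout through the HBase client API (the
> table, family, and column names are made up, and the exact calls
> vary a bit between client versions):
> 
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>     import org.apache.hadoop.hbase.TableName;
>     import org.apache.hadoop.hbase.client.Connection;
>     import org.apache.hadoop.hbase.client.ConnectionFactory;
>     import org.apache.hadoop.hbase.client.Get;
>     import org.apache.hadoop.hbase.client.Put;
>     import org.apache.hadoop.hbase.client.Result;
>     import org.apache.hadoop.hbase.client.Table;
>     import org.apache.hadoop.hbase.util.Bytes;
> 
>     public class CfDemo {
>         public static void main(String[] args) throws Exception {
>             try (Connection conn = ConnectionFactory
>                      .createConnection(HBaseConfiguration.create());
>                  Table users = conn.getTable(TableName.valueOf("users"))) {
>                 // One row, columns split across a "hot" and a "cold" family.
>                 Put put = new Put(Bytes.toBytes("user#42"));      // row key
>                 put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("name"),
>                               Bytes.toBytes("Jason"));
>                 put.addColumn(Bytes.toBytes("cf2"), Bytes.toBytes("bio"),
>                               Bytes.toBytes("rarely read"));
>                 users.put(put);
> 
>                 // Reading only cf1 touches only cf1's files on disk.
>                 Get get = new Get(Bytes.toBytes("user#42"));
>                 get.addFamily(Bytes.toBytes("cf1"));
>                 Result row = users.get(get);
>                 System.out.println(Bytes.toString(
>                     row.getValue(Bytes.toBytes("cf1"), Bytes.toBytes("name"))));
>             }
>         }
>     }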
> 
> HBase is queried through a specialized API. This API is written to
> work over very large datasets, operating directly on the data.
> However, not all uses of HBase need this. The majority of queries
> are distributed just because they are over a huge dataset, with a
> modest number of rows returned. Distribution allows for much more
> parallel disk reading. For this case, a SQLite cluster makes perfect
> sense.
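> 
> Continuing the sketch above, such a query is typically a scan that is
> limited to one family, so each region server reads only that family's
> files (this reuses the Table handle and Bytes helper from before,
> plus org.apache.hadoop.hbase.client.Scan and ResultScanner):
> 
>     // Scan every row but read only the hot family, cf1.
>     static void scanHot(Table users) throws java.io.IOException {
>         Scan scan = new Scan();
>         scan.addFamily(Bytes.toBytes("cf1"));  // cf2..cfN files are skipped
>         try (ResultScanner rows = users.getScanner(scan)) {
>             for (Result row : rows) {
>                 System.out.println(Bytes.toString(row.getRow()));
>             }
>         }
>     }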
> 
> Mapping all of this to SQLite, I could see a bit of work going a
> long way. Column families can be implemented as separate files, which
> are ATTACHed and joined as needed. The most complicated operation is
> a join, where we have to broadcast the list of distinct values of
> the join key to all other nodes for join matching, and then move
> all of the matching data to the same node to perform the join.
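> 
> On a single node, the ATTACH-and-join part already works today. A
> sketch, assuming the xerial sqlite-jdbc driver and invented file and
> table names:
> 
>     import java.sql.Connection;
>     import java.sql.DriverManager;
>     import java.sql.ResultSet;
>     import java.sql.Statement;
> 
>     // One SQLite file per column family, ATTACHed and joined on the row key.
>     public class CfJoin {
>         public static void main(String[] args) throws Exception {
>             try (Connection c = DriverManager.getConnection("jdbc:sqlite:cf1.db");
>                  Statement s = c.createStatement()) {
>                 s.execute("ATTACH DATABASE 'cf2.db' AS cf2");
>                 ResultSet rs = s.executeQuery(
>                     "SELECT t.key, t.name, b.bio " +
>                     "FROM main.users_cf1 AS t " +
>                     "JOIN cf2.users_cf2 AS b ON t.key = b.key");
>                 while (rs.next()) {
>                     System.out.println(rs.getString(1) + " " + rs.getString(2));
>                 }
>             }
>         }
>     }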
> 
> The non-data input is a traditional SQL statement, but we will have
> to parse and restructure the statement to join in the needed column
> families. Also needed is a way to ship a row to another server for
> processing.
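> 
> As a purely hypothetical example of that rewrite, using the same
> invented names as above:
> 
>     // What the user writes (the table looks like one wide relation):
>     String userSql =
>         "SELECT key, name, bio FROM users WHERE name = 'Jason'";
> 
>     // What a node actually runs, after mapping columns to families:
>     String nodeSql =
>         "SELECT t.key, t.name, b.bio " +
>         "FROM main.users_cf1 AS t " +
>         "JOIN cf2.users_cf2 AS b ON t.key = b.key " +
>         "WHERE t.name = 'Jason'";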
> 
> I'm just putting this out there, thinking out loud. I wonder how it
> would turn out. Comments?
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
