Hi Jason,

It might be that this is a little bit too big for SQLite.
Maybe a "big iron" database like PostgreSQL or the Greenplum Database will fit your requirements better. Best regards Markus Schaber CODESYS® a trademark of 3S-Smart Software Solutions GmbH Inspiring Automation Solutions 3S-Smart Software Solutions GmbH Dipl.-Inf. Markus Schaber | Product Development Core Technology Memminger Str. 151 | 87439 Kempten | Germany Tel. +49-831-54031-979 | Fax +49-831-54031-50 E-Mail: m.scha...@codesys.com | Web: http://www.codesys.com | CODESYS store: http://store.codesys.com CODESYS forum: http://forum.codesys.com Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade register: Kempten HRB 6186 | Tax ID No.: DE 167014915 > -----Ursprüngliche Nachricht----- > Von: sqlite-users-boun...@sqlite.org [mailto:sqlite-users- > boun...@sqlite.org] Im Auftrag von Jason H > Gesendet: Montag, 16. September 2013 23:04 > An: sqlite-users@sqlite.org > Betreff: [sqlite] SQLite clusters? > > I'm transitioning my job from embedded space to Hadoop space. I was > wondering if it is possible to come up with a SQLite cluster > adaptation. > > I will give you a crash course in hadoop. Basically we get a very > large CSV, which is chopped up into 64MB chunks, and distributed to a > number of nodes. The file is actually replicated 2 times for a total > of 3 copies of all chunks on the cluster (no chunk is repeatedly > stored on the same node). Then MapReduce logic is run, and the > results are combined. Instrumental to this is the keys are returned > in sorted order. > > All of this is done in java (70% slower than C, on average, and with > some non-trivial start-up cost). Everyone is clamoring for SQL to be > run on the nodes. Hive attempts to leverage SQL, and is successful to > some degree. But being able to use Full SQL would be a huge > improvement. Akin to Hadoop is HBase > > HBase is similar with Hadoop, but it approaches things in a more > conventional columnar format It a copy of "BigTable" form google.. 
> Here, the notion of "column families" is important, because column
> families are files. A row is made up of a key and at least one column
> family. There is an implied join between the key and each column
> family. As the table is viewed, though, it is viewed as a join between
> the key and all column families. What denotes a column family (cf) is
> not specified; however, the idea is to group columns into cfs by
> usage. That is, cf1 is your most commonly needed data, and cfN is the
> least often needed.
>
> HBase is queried by a specialized API. This API is written to work
> over very large datasets, working directly with the data. However, not
> all uses of HBase need this. The majority of queries are distributed
> just because they are over a huge dataset, with a modest number of
> rows returned. Distribution allows for much more parallel disk
> reading. For this case, a SQLite cluster makes perfect sense.
>
> Mapping all of this to SQLite, I could see a bit of work going a long
> way. Column families can be implemented as separate files, which are
> ATTACHed and joined as needed. The most complicated operation is a
> join, where we have to coordinate the list of distinct values of the
> join key to all other nodes for join matching. We then have to move
> all of that data to the same node for the join.
>
> The non-data input is a traditional SQL statement, but we will have to
> parse and restructure the statement to join in the needed column
> families. Also needed is a way to ship a row to another server for
> processing.
>
> I'm just putting this out there as me thinking out loud. I wonder how
> it would turn out. Comments?
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
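The "column families as separate files, ATTACHed and joined as needed" idea from the quoted message can be sketched with SQLite's own ATTACH mechanism via Python's sqlite3 module. The table and column names (cf1, cf2.stats) are made up for illustration, and in-memory databases stand in for the per-family files a real deployment would use:

```python
import sqlite3

# Primary "column family" lives in the main database.
main = sqlite3.connect(":memory:")
main.execute("CREATE TABLE cf1 (key TEXT PRIMARY KEY, name TEXT)")
main.execute("INSERT INTO cf1 VALUES ('row1', 'alice'), ('row2', 'bob')")

# A second column family lives in its own database; in practice this
# would be ATTACH DATABASE 'cf2.db' AS cf2 on a separate file.
main.execute("ATTACH DATABASE ':memory:' AS cf2")
main.execute("CREATE TABLE cf2.stats (key TEXT PRIMARY KEY, visits INT)")
main.execute("INSERT INTO cf2.stats VALUES ('row1', 7)")

# The "implied join" between the key and each column family; LEFT JOIN
# keeps rows whose sparser families have no entry, as in HBase.
rows = main.execute("""
    SELECT cf1.key, cf1.name, cf2.stats.visits
    FROM cf1 LEFT JOIN cf2.stats ON cf1.key = cf2.stats.key
    ORDER BY cf1.key
""").fetchall()
# rows: [('row1', 'alice', 7), ('row2', 'bob', None)]
```

A query rewriter on each node would only need to ATTACH and join the families the incoming SQL actually references; the hard part the message identifies, shipping join keys and rows between nodes, sits outside SQLite itself.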