Interesting -- thanks for the link! Let me know if you have any more Kiji questions. Cheers - Aaron
On Wed, Jan 30, 2013 at 6:49 PM, Russell Jurney <[email protected]> wrote:

> I'm looking at Panthera; I'll check out Kiji too. Inferring the schema
> from the first record and creating a table is what is done in Voldemort's
> build/push job, so I'll look into that.
>
> https://github.com/voldemort/voldemort/wiki/Build-and-Push-Jobs-for-Voldemort-Read-Only-Stores
>
> Russell Jurney http://datasyndrome.com
>
> On Jan 30, 2013, at 6:33 PM, Aaron Kimball <[email protected]> wrote:
>
> Hi Russell,
>
> Great question. Kiji is more strongly typed than systems like MongoDB.
> While your schema can evolve (using Avro evolution) without structurally
> updating existing data (see the first sketch below the quoted thread),
> you still need to specify your Avro schemas in a data dictionary. It's
> challenging to author systems in Java (as is typical of
> HBase/HDFS/MapReduce-facing applications) without some strong typing in
> the persistence layer. You wind up reading a lot of other people's code
> to figure out what types were written -- assuming you can find the code
> (or the HBase columns) in the first place.
>
> You can create table schemas either "manually" by filling out a JSON /
> Avro-based table layout specification, or you can use the DDL shell,
> which lets you CREATE TABLE, ALTER TABLE, etc. in a pretty quick way
> (sketch below). Once the table's set up, you can write to it. I think the
> DDL shell included with the bento box makes this a reasonably
> low-overhead process.
>
> We don't currently have any Pig integration. We've made some initial
> proof-of-concept progress on a StorageHandler that lets Hive query Kiji,
> but it's not in a ready state yet. Someone (you? :) could write a Pig
> integration; Pig already supports Avro, I think (sketch below). And you
> could even make it analyze the first output tuple and use that to infer
> types/column names to set up a result table with the appropriate table
> schema by invoking the DDL procedurally.
>
> Sorry I don't have a "magic wand" answer for you -- for the use cases we
> target, these sorts of setup costs often pay off in the long run, so
> that's the case we've optimized the design around. Let me know if there's
> anything else I can help with.
>
> Thanks,
> - Aaron
>
>
> On Wed, Jan 30, 2013 at 5:48 PM, Russell Jurney <[email protected]> wrote:
>
>> Aaron - is there a way to create a Kiji table from Pig? I'm in the habit
>> of not specifying schemas with Voldemort and MongoDB: I just store a Pig
>> relation, and the schema is set in the store. If I can arrange that
>> somehow, I'm all over Kiji. Panthera is a fork :/
>>
>>
>> On Wed, Jan 30, 2013 at 3:20 PM, Aaron Kimball <[email protected]> wrote:
>>
>>> Hi ccleve,
>>>
>>> I'd definitely urge you to try out Kiji -- we who work on it think it's
>>> a pretty good fit for this specific use case. If you've got further
>>> questions about Kiji and how to use it, please send them to me, or ask
>>> the kiji user mailing list: http://www.kiji.org/getinvolved#Mailing_Lists
>>>
>>> Cheers,
>>> - Aaron
>>>
>>>
>>> On Tue, Jan 29, 2013 at 3:24 PM, Doug Cutting <[email protected]> wrote:
>>>
>>>> Avro and Trevni files do not support record update or delete.
>>>>
>>>> For large changing datasets you might use Kiji (http://www.kiji.org/)
>>>> to store Avro data in HBase.
>>>>
>>>> Doug
>>>>
>>>> On Mon, Jan 28, 2013 at 12:00 PM, ccleve <[email protected]> wrote:
>>>> > I've gone through the documentation, but haven't been able to get a
>>>> > definite answer: is Avro, or specifically Trevni, only for read-only
>>>> > data?
>>>> >
>>>> > Is it possible to update or delete records?
>>>> >
>>>> > If records can be deleted, is there any code that will merge row sets
>>>> > to get rid of the unused space?
>>
>> --
>> Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
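A concrete example of the Avro schema evolution Aaron mentions: you can add a field to a record schema without rewriting existing data, as long as the new field carries a default value for readers to fall back on. The "User" record here is hypothetical:

    {
      "type": "record",
      "name": "User",
      "fields": [
        {"name": "name", "type": "string"},
        {"name": "email", "type": ["null", "string"], "default": null}
      ]
    }

If "email" is the newly added field, records written with the old schema (name only) can still be read with this new schema: Avro's schema resolution fills in the default (null) for the missing field.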
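A rough sketch of the DDL-shell flow described in the thread. The table, family, and column names are made up for illustration, and the grammar here is approximate and from memory; the KijiSchema DDL reference on kiji.org has the exact syntax:

    CREATE TABLE users WITH DESCRIPTION 'A table of users'
    ROW KEY FORMAT HASHED
    WITH LOCALITY GROUP default WITH DESCRIPTION 'main storage' (
      MAXVERSIONS = 10,
      FAMILY info WITH DESCRIPTION 'basic user info' (
        name "string" WITH DESCRIPTION 'full name',
        email "string" WITH DESCRIPTION 'email address'
      )
    );

Column types are given as Avro schemas (the quoted "string" above), which is where the data dictionary Aaron describes lives.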
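As for Pig's Avro support: as of the Pig 0.10-era piggybank, AvroStorage can load and store Avro data, reading the schema from the data files themselves. The paths below are placeholders, and the piggybank jar location depends on your installation:

    REGISTER /path/to/piggybank.jar;

    -- AvroStorage reads the Avro schema embedded in the input files.
    events = LOAD '/data/events.avro'
        USING org.apache.pig.piggybank.storage.avro.AvroStorage();

    -- The output Avro schema is derived from the relation's Pig schema.
    STORE events INTO '/data/events-out'
        USING org.apache.pig.piggybank.storage.avro.AvroStorage();

A store function along these lines would also be the natural place to hook in the idea from the thread: inspect the first output tuple's schema and issue the corresponding DDL programmatically before writing.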
