Yes, maybe try a different schema (it's hard to help without knowing exactly how you end up overwriting the same triples all the time, though).
Setting timestamps yourself is usually bad, yes.

J-D

On Tue, Oct 4, 2011 at 7:14 AM, Christopher Dorner <[email protected]> wrote:
> Why do you advise against setting timestamps oneself? Is it generally not
> a good practice?
>
> If I do not want to insert any more data later, then it shouldn't be a
> problem. Of course I will probably have trouble if I want to insert
> something later (e.g. from another file, where the byte offset could be
> exactly the same and again overwrite my data). I didn't think about that
> yet.
>
> The thing is that I do not want to lose data while inserting, and I need to
> insert all of it. Maybe I could consider a different schema.
>
> I will try it with a reduce step, but I am pretty sure I will again have
> some loss of data.
>
> Thank you,
>
> Christopher
>
>
> On 03.10.2011 20:31, Jean-Daniel Cryans wrote:
>>
>> I would advise against setting the timestamps yourself and instead
>> reduce in order to prune the versions you don't need to insert in
>> HBase.
>>
>> J-D
>>
>> On Sat, Oct 1, 2011 at 11:05 AM, Christopher Dorner
>> <[email protected]> wrote:
>>>
>>> Hi again,
>>>
>>> I think I solved my issue.
>>>
>>> I simply use the byte offset of the row currently read by the Mapper as
>>> the timestamp for the Put. This is unique for my input file, which
>>> contains one triple for each row. So the timestamps are unique.
>>>
>>> Regards,
>>> Christopher
>>>
>>>
>>> On 01.10.2011 13:19, Christopher Dorner wrote:
>>>>
>>>> Hello,
>>>>
>>>> I am reading a file containing RDF triples in a Map job. The RDF triples
>>>> are then stored in a table where columns can have lots of versions.
>>>> So I need to store many values for one rowKey in the same column.
>>>>
>>>> I observed that reading the file is very fast, and thus some
>>>> values are put into the table with the same timestamp, thereby
>>>> overwriting an existing value.
>>>>
>>>> How can I avoid that?
>>>> The timestamps are not necessary for later usage.
>>>>
>>>> Could I simply use some sort of custom counter?
>>>>
>>>> How would that work in fully distributed mode? I am working in
>>>> pseudo-distributed mode for testing purposes right now.
>>>>
>>>> Thank you and regards,
>>>> Christopher
>>>
>>>
>
>
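
To make the overwrite problem discussed in this thread concrete, here is a minimal sketch in plain Java. It is a toy model and does not use the HBase client API: `ToyTable`, the row `alice`, and the predicate `knows` are all hypothetical names. It only assumes that a cell is addressed by (row, column, timestamp) and that a later write to the same address hides the earlier one. The "schema B" layout, which moves the triple's object into the column name, is one possible reading of "a different schema" and is not something the thread itself prescribes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a versioned table. This is NOT the HBase client API;
// it only mimics cell addressing by (row, column, timestamp).
class ToyTable {
    private final Map<List<Object>, String> cells = new HashMap<>();

    // A write to an address that already exists replaces the stored value.
    void put(String row, String column, long timestamp, String value) {
        cells.put(Arrays.<Object>asList(row, column, timestamp), value);
    }

    int cellCount() {
        return cells.size();
    }

    public static void main(String[] args) {
        long sameMillisecond = 1317463500000L; // hypothetical clock reading

        // Schema A: predicate as column, object as value.
        // Two triples written in the same millisecond share one address.
        ToyTable a = new ToyTable();
        a.put("alice", "knows", sameMillisecond, "bob");
        a.put("alice", "knows", sameMillisecond, "carol"); // replaces "bob"
        System.out.println("schema A cells: " + a.cellCount()); // 1

        // Schema B (assumption): object moved into the column name,
        // so each triple gets its own address regardless of timestamp.
        ToyTable b = new ToyTable();
        b.put("alice", "knows:bob", sameMillisecond, "");
        b.put("alice", "knows:carol", sameMillisecond, "");
        System.out.println("schema B cells: " + b.cellCount()); // 2
    }
}
```

With a layout like schema B, the triples no longer depend on distinct timestamps, so the server-assigned timestamp can be left alone. Whether that fits depends on how the table is queried, which the thread does not say.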
