Why do you advise against setting the timestamps oneself? Is it generally
considered bad practice?
If I do not want to insert any more data later, then it shouldn't be a
problem. Of course I will probably have trouble if I do want to insert
something later (e.g. from another file, where the byte offset could be
exactly the same and would again overwrite my data). I hadn't thought
about that yet.
The thing is that I do not want to lose data while inserting, and I
need to insert all of it. Maybe I should consider a different schema.
I will try it with a reduce step, but I am pretty sure I will again have
some loss of data.
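For what it's worth, the reduce step J-D suggests could prune duplicates per (row, column) before any Put is issued, so no artificial timestamps are needed at all. A minimal, self-contained sketch of that pruning logic in plain Java (no HBase or Hadoop dependency; class and method names are illustrative only):

```java
import java.util.*;

public class TripleReducerSketch {
    // Simulates the reduce step: all object values for one
    // (subject, predicate) key arrive together; keep only the
    // distinct ones so no Put can silently overwrite another.
    static List<String> pruneDuplicates(Iterable<String> values) {
        // LinkedHashSet preserves first-seen order while dropping duplicates
        Set<String> distinct = new LinkedHashSet<>();
        for (String v : values) {
            distinct.add(v);
        }
        return new ArrayList<>(distinct);
    }

    public static void main(String[] args) {
        List<String> objects = Arrays.asList("o1", "o2", "o1", "o3");
        System.out.println(pruneDuplicates(objects)); // prints [o1, o2, o3]
    }
}
```

In a real job this would live in the reducer, with the (rowKey, column) pair as the intermediate key, and the surviving values written out with HBase's default timestamps.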
Thank you,
Christopher
On 03.10.2011 20:31, Jean-Daniel Cryans wrote:
I would advise against setting the timestamps yourself and instead
reduce in order to prune the versions you don't need to insert in
HBase.
J-D
On Sat, Oct 1, 2011 at 11:05 AM, Christopher Dorner
<[email protected]> wrote:
Hi again,
I think I solved my issue.
I simply use the byte offset of the row currently read by the Mapper as
the timestamp for the Put. This is unique within my input file, which
contains one triple per row. So the timestamps are unique.
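To illustrate why this works within a single file: with one triple per line, each line starts at a distinct byte offset, which is exactly the LongWritable key a TextInputFormat mapper receives. A self-contained sketch (plain Java, no Hadoop dependency; it only demonstrates the uniqueness property, not the actual mapper):

```java
import java.nio.charset.StandardCharsets;
import java.util.*;

public class ByteOffsetSketch {
    // Returns the byte offset at which each line starts -- the same
    // value a TextInputFormat mapper would see as its input key.
    static List<Long> lineOffsets(String fileContents) {
        List<Long> offsets = new ArrayList<>();
        long pos = 0;
        for (String line : fileContents.split("\n", -1)) {
            offsets.add(pos);
            pos += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
        }
        return offsets;
    }

    public static void main(String[] args) {
        String triples = "<s1> <p> <o1> .\n<s1> <p> <o2> .\n<s1> <p> <o3> .";
        List<Long> offsets = lineOffsets(triples);
        // Offsets are strictly increasing, hence unique within one file
        System.out.println(new HashSet<>(offsets).size() == offsets.size()); // prints true
    }
}
```

Note the caveat raised later in the thread: the offsets are only unique per file, so loading a second file could reuse the same timestamps and overwrite cells.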
Regards,
Christopher
On 01.10.2011 13:19, Christopher Dorner wrote:
Hello,
I am reading a file containing RDF triples in a Map job. The RDF triples
are then stored in a table whose columns can have many versions, because
I need to store many values for one rowKey in the same column.
I noticed that reading the file is very fast, so some values are put
into the table with the same timestamp and therefore overwrite an
existing value.
How can I avoid that? The timestamps are not needed for later usage.
Could I simply use some sort of custom counter?
How would that work in fully distributed mode? I am working in
pseudo-distributed mode for testing purposes right now.
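One conceivable shape for such a counter in fully distributed mode (an assumption on my part, not something established in this thread): combine a per-task identifier with a counter local to that task, e.g. by packing the task ID into the high bits of the timestamp long, so two mappers can never collide. A minimal sketch:

```java
public class TaskCounterSketch {
    // Hypothetical scheme: the upper bits hold the map task id, the
    // lower 40 bits a counter local to that task, so values produced
    // by different tasks can never be equal.
    static long uniqueTimestamp(int taskId, long localCounter) {
        return ((long) taskId << 40) | (localCounter & ((1L << 40) - 1));
    }

    public static void main(String[] args) {
        long a = uniqueTimestamp(0, 7);
        long b = uniqueTimestamp(1, 7);
        System.out.println(a != b); // prints true: same counter, different tasks
    }
}
```

The trade-off, and likely part of why manual timestamps are discouraged: these values are not real wall-clock times, so time-based features that key off cell timestamps (TTL expiry, time-range scans) no longer behave meaningfully.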
Thank You and Regards,
Christopher