Yeah, manually creating HFiles with Long.MAX_VALUE Delete markers for those
cells would be my next suggestion. It would be nice to confirm how those
cells could get through with a Long.MAX_VALUE timestamp; it would be
surprising if it was WAL replay, since I would expect replay to reuse the
timestamp checks from the client write path.

On Wed, 13 May 2020 at 06:33, Bharath Vissapragada <
bhara...@apache.org> wrote:

> Interesting behavior. I just tried this out on my local setup (master/HEAD)
> out of curiosity, to check whether we can trick HBase into deleting this bad
> row, and the following worked for me. I don't know how you ended up with
> that row, though (bad bulk load? just guessing).
>
> To have a table with the Long.MAX_VALUE timestamp, I commented out some
> pieces of HBase code so that it doesn't override the timestamp with the
> current millis on the region server (otherwise I just see the expected
> behavior of the current ms being used).
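>
> For context, this is roughly the behavior I disabled (a paraphrase of the
> write-path logic around CellUtil.updateLatestStamp, not the exact HBase
> code):
>
> import java.io.IOException;
> import org.apache.hadoop.hbase.Cell;
> import org.apache.hadoop.hbase.CellUtil;
> import org.apache.hadoop.hbase.HConstants;
>
> final class LatestTsRewrite {
>   // A cell that arrives with HConstants.LATEST_TIMESTAMP (== Long.MAX_VALUE)
>   // normally gets its timestamp overwritten with the server's current time;
>   // cells carrying any other explicit timestamp are left untouched.
>   static void replaceLatestTimestamp(Cell cell, long now) throws IOException {
>     if (cell.getTimestamp() == HConstants.LATEST_TIMESTAMP) {
>       CellUtil.setTimestamp(cell, now);
>     }
>   }
> }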
>
> *Step 1: Create a table and generate the problematic row*
>
> hbase(main):002:0> create 't1', 'f'
> Created table t1
>
> -- patch hbase to accept Long.MAX_VALUE ts ---
>
> hbase(main):005:0> put 't1', 'row1', 'f:a', 'val', 9223372036854775807
> Took 0.0054 seconds
>
> -- make sure the put with the ts is present --
> hbase(main):006:0> scan 't1'
> ROW                COLUMN+CELL
>  row1              column=f:a, timestamp=*9223372036854775807*, value=val
> 1 row(s)
> Took 0.0226 seconds
>
> *Step 2: Hand craft an HFile with the delete marker*
>
>  ...with this row/col/max ts. [Let me know if you want the code, I can put
> it somewhere; I just used the StoreFileWriter utility. A rough sketch is
> below.]
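>
> A minimal sketch of what that could look like (not my exact code; it assumes
> the StoreFileWriter builder API on master, and the output path, row, and
> family below just match this example):
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.KeyValue;
> import org.apache.hadoop.hbase.io.hfile.CacheConfig;
> import org.apache.hadoop.hbase.io.hfile.HFileContextBuilder;
> import org.apache.hadoop.hbase.regionserver.StoreFileWriter;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class MaxTsDeleteMarkerFile {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = HBaseConfiguration.create();
>     FileSystem fs = FileSystem.getLocal(conf);
>     // completebulkload expects the <dir>/<column-family>/<hfile> layout.
>     Path hfile = new Path("file:///tmp/hfiles/f/deletemarker");
>     StoreFileWriter writer =
>         new StoreFileWriter.Builder(conf, new CacheConfig(conf), fs)
>             .withFilePath(hfile)
>             .withFileContext(new HFileContextBuilder().build())
>             .build();
>     // A Delete marker for row1/f:a at Long.MAX_VALUE; at equal timestamps
>     // the Delete type sorts ahead of (and masks) the Put.
>     writer.append(new KeyValue(Bytes.toBytes("row1"), Bytes.toBytes("f"),
>         Bytes.toBytes("a"), Long.MAX_VALUE, KeyValue.Type.Delete));
>     writer.close();
>   }
> }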
>
> -- dump the contents of hfile using the utility ---
>
> $ bin/hbase hfile -f file:///tmp/hfiles/f/bf84f424544f4675880494e09b750ce8 -p
> ......
> Scanned kv count -> 1
> K: row1/f:a/LATEST_TIMESTAMP/Delete/vlen=0/seqid=0 V:  <==== Delete marker
>
> *Step 3: Bulk load this HFile with the delete marker*
>
> bin/hbase completebulkload file:///tmp/hfiles t1
>
> *Step 4: Make sure the delete marker is inserted correctly.*
>
> hbase(main):001:0> scan 't1'
> ......
>
> 0 row(s)
> Took 0.1387 seconds
>
> -- Raw scan to make sure the delete marker is inserted and nothing funky is
> happening ---
>
> hbase(main):003:0> scan 't1', {RAW=>true}
> ROW                COLUMN+CELL
>  row1              column=f:a, timestamp=9223372036854775807, type=Delete
>  row1              column=f:a, timestamp=9223372036854775807, value=val
> 1 row(s)
> Took 0.0044 seconds
>
> Thoughts?
>
> On Tue, May 12, 2020 at 2:00 PM Alexander Batyrshin <0x62...@gmail.com>
> wrote:
>
> > The table is ~10 TB of SNAPPY-compressed data. I don’t have such a big
> > time window on production for re-inserting all of the data.
> >
> > I don’t know how we got those cells. I can only assume it was Phoenix
> > and/or WAL replay after a region server crash.
> >
> > > On 12 May 2020, at 18:25, Wellington Chevreuil <
> > wellington.chevre...@gmail.com> wrote:
> > >
> > > How large is this table? Can you afford to re-insert all the current
> > > data into a new, temp table? If so, you could write a MapReduce job that
> > > scans this table and rewrites all its cells to that new, temp table. I
> > > have verified that 1.4.10 does have the timestamp-replacing logic here:
> > > https://github.com/apache/hbase/blob/branch-1.4/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L3395
> > >
> > > So if you re-insert all of this table's cells into a new one, the
> > > timestamps would be set correctly, and you would then be able to delete
> > > them.
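> > >
> > > (One hedged shortcut for that rewrite: the bundled CopyTable utility
> > > re-inserts cells as ordinary client puts, so it should hit the same
> > > timestamp-replacing logic; the target table name below is just a
> > > placeholder.)
> > >
> > > bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
> > >   --new.name=TRACET_COPY TRACET
> > >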
> > > Now, how did those cells manage to get inserted with the max timestamp?
> > > Was this cluster running on an old version that then got upgraded to
> > > 1.4.10?
> > >
> > >
> > > On Tue, 12 May 2020 at 13:49, Alexander Batyrshin <
> > > 0x62...@gmail.com> wrote:
> > >
> > >> Any ideas on how to delete these rows?
> > >>
> > >> The only way I see is:
> > >> - backup data from region that contains “damaged” rows
> > >> - close region
> > >> - remove region files from HDFS
> > >> - assign region
> > >> - copy needed rows from backup to recreated region
> > >>
> > >>> On 30 Apr 2020, at 21:00, Alexander Batyrshin <0x62...@gmail.com>
> > wrote:
> > >>>
> > >>> The same effect for CF:
> > >>>
> > >>> d = org.apache.hadoop.hbase.client.Delete.new("\x0439d58wj434dd".to_s.to_java_bytes)
> > >>> d.deleteFamily("d".to_s.to_java_bytes, 9223372036854775807.to_java(Java::long))
> > >>> table.delete(d)
> > >>>
> > >>> ROW               COLUMN+CELL
> > >>> \x0439d58wj434dd  column=d:, timestamp=1588269277879, type=DeleteFamily
> > >>>
> > >>>
> > >>>> On 29 Apr 2020, at 18:30, Wellington Chevreuil <
> > >>>> wellington.chevre...@gmail.com> wrote:
> > >>>>
> > >>>> Well, it's weird that puts with such TS values were allowed,
> > >>>> according to the current code state. Can you afford to delete the
> > >>>> whole CF for those rows?
> > >>>>
> > >>>> On Wed, 29 Apr 2020 at 14:41, junhyeok park <
> > >>>> runnerren...@gmail.com> wrote:
> > >>>>
> > >>>>> I've been through the same thing. I use HBase 2.2.0.
> > >>>>>
> > >>>>> On Wed, 29 Apr 2020 at 22:32, Alexander Batyrshin
> > >>>>> <0x62...@gmail.com> wrote:
> > >>>>>
> > >>>>>> As you can see in the example, I already tried the DELETE operation
> > >>>>>> with timestamp = Long.MAX_VALUE, without any success.
> > >>>>>>
> > >>>>>>> On 29 Apr 2020, at 12:41, Wellington Chevreuil <
> > >>>>>>> wellington.chevre...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> That's expected behaviour [1]. If you are "travelling to the
> > >>>>>>> future", you need to do a delete specifying Long.MAX_VALUE as the
> > >>>>>>> optional timestamp parameter of the delete operation [2]. If you
> > >>>>>>> don't specify a timestamp on the delete, it will assume the current
> > >>>>>>> time for the delete marker, which will be smaller than the
> > >>>>>>> Long.MAX_VALUE set on your cells, so scans wouldn't filter it.
> > >>>>>>>
> > >>>>>>> [1] https://hbase.apache.org/book.html#version.delete
> > >>>>>>> [2]
> > >>>>>>> https://github.com/apache/hbase/blob/branch-1.4/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Delete.java#L98
> > >>>>>>>
> > >>>>>>> On Wed, 29 Apr 2020 at 08:57, Alexander Batyrshin <
> > >>>>>>> 0x62...@gmail.com>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Hello all,
> > >>>>>>>> We have faced a strange situation: a table has rows with a
> > >>>>>>>> Long.MAX_VALUE timestamp.
> > >>>>>>>> These rows are impossible to delete, because the DELETE mutation
> > >>>>>>>> uses a System.currentTimeMillis() timestamp.
> > >>>>>>>> Is there any way to delete these rows?
> > >>>>>>>> We use HBase 1.4.10.
> > >>>>>>>>
> > >>>>>>>> Example:
> > >>>>>>>>
> > >>>>>>>> hbase(main):037:0> scan 'TRACET', { ROWPREFIXFILTER =>
> > >>>>>>>> "\x0439d58wj434dd", RAW=>true, VERSIONS=>10}
> > >>>>>>>> ROW               COLUMN+CELL
> > >>>>>>>> \x0439d58wj434dd  column=d:_0, timestamp=9223372036854775807, value=x
> > >>>>>>>>
> > >>>>>>>> hbase(main):045:0* delete 'TRACET', "\x0439d58wj434dd", "d:_0"
> > >>>>>>>> 0 row(s) in 0.0120 seconds
> > >>>>>>>>
> > >>>>>>>> hbase(main):046:0> scan 'TRACET', { ROWPREFIXFILTER =>
> > >>>>>>>> "\x0439d58wj434dd", RAW=>true, VERSIONS=>10}
> > >>>>>>>> ROW               COLUMN+CELL
> > >>>>>>>> \x0439d58wj434dd  column=d:_0, timestamp=9223372036854775807, value=x
> > >>>>>>>> \x0439d58wj434dd  column=d:_0, timestamp=1588146570005, type=Delete
> > >>>>>>>>
> > >>>>>>>> hbase(main):047:0> delete 'TRACET', "\x0439d58wj434dd", "d:_0",
> > >>>>>>>> 9223372036854775807
> > >>>>>>>> 0 row(s) in 0.0110 seconds
> > >>>>>>>>
> > >>>>>>>> hbase(main):048:0> scan 'TRACET', { ROWPREFIXFILTER =>
> > >>>>>>>> "\x0439d58wj434dd", RAW=>true, VERSIONS=>10}
> > >>>>>>>> ROW               COLUMN+CELL
> > >>>>>>>> \x0439d58wj434dd  column=d:_0, timestamp=9223372036854775807, value=x
> > >>>>>>>> \x0439d58wj434dd  column=d:_0, timestamp=1588146678086, type=Delete
> > >>>>>>>> \x0439d58wj434dd  column=d:_0, timestamp=1588146570005, type=Delete
> >
> >
>
