[jira] [Created] (HBASE-25204) Nightly job failed as the name of jdk and maven changed

2020-10-19 Thread Guanghao Zhang (Jira)
Guanghao Zhang created HBASE-25204:
--

 Summary: Nightly job failed as the name of jdk and maven changed
 Key: HBASE-25204
 URL: https://issues.apache.org/jira/browse/HBASE-25204
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang


See 
[https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/85/console]
[https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/103/console]
 
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
WorkflowScript: 508: Tool type "maven" does not have an install of
"Maven (latest)" configured - did you mean "maven_latest"? @ line 508,
column 19.
   maven 'Maven (latest)'
 ^

WorkflowScript: 510: Tool type "jdk" does not have an install of "JDK
1.8 (latest)" configured - did you mean "jdk_1.8_latest"? @ line 510,
column 17.
   jdk "JDK 1.8 (latest)"
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25166) MobFileCompactionChore is closing the master's shared cluster connection

2020-10-19 Thread Ankit Singhal (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal resolved HBASE-25166.
---
Fix Version/s: 3.0.0-alpha-1
   Resolution: Fixed

> MobFileCompactionChore is closing the master's shared cluster connection
> 
>
> Key: HBASE-25166
> URL: https://issues.apache.org/jira/browse/HBASE-25166
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0-alpha-1
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> The code in MobFileCompactionChore that closes the connection:
> {code:java}
> try (Connection conn = master.getConnection();
> Admin admin = conn.getAdmin();) { {code}
> The master uses this connection to read meta and other system tables, so
> once it is closed none of the meta operations through the master will work.
> Symptoms in the master logs:
> {code:java}
> s, events=841, succcessCount=123, totalEvents=12824192, 
> totalSuccessCount=1891300
> 2020-10-05 16:34:25,062 INFO 
> org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor: Unable 
> to get remote Address
> 2020-10-05 16:34:25,062 ERROR 
> org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore: Failed to 
> normalize regions.
> java.io.IOException: connection is closed
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:241)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:797)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:768)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:727)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.fullScanTables(MetaTableAccessor.java:215)
> at 
> org.apache.hadoop.hbase.master.TableStateManager.getTablesInStates(TableStateManager.java:189)
> at 
> org.apache.hadoop.hbase.master.HMaster.normalizeRegions(HMaster.java:1821)
> at 
> org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore.chore(RegionNormalizerChore.java:48)
> at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:188)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748) {code}
> Symptoms at the client:
> {code:java}
>  RpcRetryingCaller{globalStartTime=1602099132430, pause=100, maxAttempts=11}, 
> java.io.IOException: java.io.IOException: connection is closed
>    at 
> org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:241)
>    at 
> org.apache.hadoop.hbase.MetaTableAccessor.getTableState(MetaTableAccessor.java:1116)
>    at 
> org.apache.hadoop.hbase.master.TableStateManager.readMetaState(TableStateManager.java:258)
>    at 
> org.apache.hadoop.hbase.master.TableStateManager.isTablePresent(TableStateManager.java:175)
>    at 
> org.apache.hadoop.hbase.master.HMaster.getTableDescriptors(HMaster.java:3277)
>    at 
> org.apache.hadoop.hbase.master.HMaster.listTableDescriptors(HMaster.java:3221)
>    at 
> org.apache.hadoop.hbase.master.MasterRpcServices.getTableDescriptors(MasterRpcServices.java:1064)
>    at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:418)
>    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
>    at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
>    at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318{code}
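
The failure mode described in HBASE-25166 can be reproduced in miniature without HBase. The sketch below is only an illustration and not the committed patch: `FakeConnection` and `FakeMaster` are hypothetical stand-ins for the real `Connection` and master classes. It shows that try-with-resources closes whatever object it is handed, including a connection the caller merely borrowed, and that simply not wrapping the borrowed connection avoids the problem.

```java
import java.io.IOException;

public class SharedConnectionDemo {
    /** Hypothetical stand-in for a cluster connection; not the HBase Connection interface. */
    static final class FakeConnection implements AutoCloseable {
        private boolean closed = false;

        /** Models a meta read that fails once the connection has been closed. */
        String readMeta() throws IOException {
            if (closed) {
                throw new IOException("connection is closed");
            }
            return "meta-row";
        }

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical stand-in for the master, which owns one long-lived connection. */
    static final class FakeMaster {
        private final FakeConnection shared = new FakeConnection();

        FakeConnection getConnection() {
            return shared;
        }
    }

    /** Buggy pattern: try-with-resources closes a connection the chore does not own. */
    static void buggyChore(FakeMaster master) throws IOException {
        try (FakeConnection conn = master.getConnection()) {
            conn.readMeta();
        }
    }

    /** Safer pattern: borrow the shared connection and leave its lifecycle to the owner. */
    static void safeChore(FakeMaster master) throws IOException {
        FakeConnection conn = master.getConnection();
        conn.readMeta();
    }

    public static void main(String[] args) throws IOException {
        FakeMaster healthy = new FakeMaster();
        safeChore(healthy);
        System.out.println("after safeChore, closed = " + healthy.getConnection().isClosed());

        FakeMaster broken = new FakeMaster();
        buggyChore(broken);
        System.out.println("after buggyChore, closed = " + broken.getConnection().isClosed());
        try {
            broken.getConnection().readMeta();
        } catch (IOException e) {
            System.out.println("later meta read fails: " + e.getMessage());
        }
    }
}
```

Running `main` reports `closed = false` after `safeChore` and `closed = true` after `buggyChore`, and the later read fails with the same "connection is closed" message seen in the stack traces above.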





[jira] [Created] (HBASE-25203) Change the reference url to flaky list in nightly and flaky tests job

2020-10-19 Thread Duo Zhang (Jira)
Duo Zhang created HBASE-25203:
-

 Summary: Change the reference url to flaky list in nightly and 
flaky tests job
 Key: HBASE-25203
 URL: https://issues.apache.org/jira/browse/HBASE-25203
 Project: HBase
  Issue Type: Sub-task
  Components: flakies, jenkins
Reporter: Duo Zhang








[jira] [Resolved] (HBASE-25194) Do not publish workspace in flaky find job

2020-10-19 Thread Duo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-25194.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

Pushed to all active branches.

Thanks [~busbey] for reviewing.

> Do not publish workspace in flaky find job
> --
>
> Key: HBASE-25194
> URL: https://issues.apache.org/jira/browse/HBASE-25194
> Project: HBase
>  Issue Type: Sub-task
>  Components: jenkins
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.3, 1.7.0, 2.4.0, 1.4.14, 2.2.7
>
>
> As said in the parent issue.
> I also looked at the publishHTML target on the Jenkins job configuration
> page. The help message for reportDir is
> {noformat}
> The path to the HTML report directory relative to the workspace.
> {noformat}
> and for reportFiles it is
> {noformat}
> The file(s) to provide links inside the report directory
> {noformat}
> I think this clearly means that the plugin publishes all the files under
> reportDir, and that reportFiles only names the index page.





Re: branch-2.2/branch-2.3 nightly job failed to run

2020-10-19 Thread Duo Zhang
Master is failing too, the same problem.

https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/98/console

I think it is what Nick said: the names of the jdk and maven tools seem to
have changed. Mind opening an issue to fix it, Guanghao?

Thanks.



On Tue, Oct 20, 2020 at 11:42 AM Sean Busbey  wrote:

> that sounds probable given the symptoms.
>
> On Mon, Oct 19, 2020 at 9:16 PM Nick Dimiduk  wrote:
> >
> > The Jenkins server was restarted within the last 24 hours, presumably for
> > maintenance. Maybe they changed the name of this binary mapping without
> > telling us?
> >
> > On Mon, Oct 19, 2020 at 17:42 Guanghao Zhang  wrote:
> >
> > > See
> > >
> > >
> https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/85/console
> > >
> > >
> https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/103/console
> > >
> > > org.codehaus.groovy.control.MultipleCompilationErrorsException: startup
> > > failed:
> > > WorkflowScript: 508: Tool type "maven" does not have an install of
> > > "Maven (latest)" configured - did you mean "maven_latest"? @ line 508,
> > > column 19.
> > >maven 'Maven (latest)'
> > >  ^
> > >
> > > WorkflowScript: 510: Tool type "jdk" does not have an install of "JDK
> > > 1.8 (latest)" configured - did you mean "jdk_1.8_latest"? @ line 510,
> > > column 17.
> > >jdk "JDK 1.8 (latest)"
> > >
> > >
> > > But the Jenkinsfile did not change recently. Any ideas?
> > >
>
>
>
> --
> Sean
>


Re: Recommended way of using HBase Cell Tags.

2020-10-19 Thread Anoop John
The best way, IMO, would be to pass this information as attributes on the
Put/Delete (any Mutation) and have the CP process it, convert it to Tags,
and attach them to the Cells before they are written to the region. This is
how cell-level ACLs, TTLs, etc. work. I agree with Andy on not exposing tags
to clients.

Anoop
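
The flow described above can be modelled in plain Java. This is a minimal sketch under stated assumptions, not HBase code: `MutationModel`, `CellModel`, `TagModel`, the attribute name `mutation.source`, and the tag type code `100` are all hypothetical stand-ins invented for illustration. In a real deployment the client side would use `Mutation.setAttribute(String, byte[])`, and the conversion would live in a coprocessor hook that runs before the write reaches the region.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AttributeToTagDemo {
    /** Hypothetical tag type code chosen for this sketch only. */
    static final byte SOURCE_TAG_TYPE = (byte) 100;
    /** Hypothetical attribute name the client and the coprocessor agree on. */
    static final String SOURCE_ATTR = "mutation.source";

    /** Stand-in for a tag: a type byte plus an opaque value. */
    static final class TagModel {
        final byte type;
        final byte[] value;

        TagModel(byte type, byte[] value) {
            this.type = type;
            this.value = value;
        }
    }

    /** Stand-in for a cell; tags are only ever attached on the "server" side. */
    static final class CellModel {
        final String qualifier;
        final List<TagModel> tags = new ArrayList<>();

        CellModel(String qualifier) {
            this.qualifier = qualifier;
        }
    }

    /** Stand-in for a Put or Delete: a list of cells plus free-form attributes. */
    static final class MutationModel {
        final List<CellModel> cells = new ArrayList<>();
        final Map<String, byte[]> attributes = new HashMap<>();
    }

    /** Models a pre-write hook: copy the agreed attribute onto every cell as a tag. */
    static void attachSourceTag(MutationModel mutation) {
        byte[] source = mutation.attributes.get(SOURCE_ATTR);
        if (source == null) {
            return; // the client sent no annotation, so the cells stay untagged
        }
        for (CellModel cell : mutation.cells) {
            cell.tags.add(new TagModel(SOURCE_TAG_TYPE, source.clone()));
        }
    }

    public static void main(String[] args) {
        // "Client" side: annotate the mutation with an attribute, never with a tag.
        MutationModel delete = new MutationModel();
        delete.cells.add(new CellModel("cf:q1"));
        delete.cells.add(new CellModel("cf:q2"));
        delete.attributes.put(SOURCE_ATTR, "bulk-delete-job".getBytes(StandardCharsets.UTF_8));

        // "Server" side: the hook turns the attribute into per-cell tags.
        attachSourceTag(delete);

        for (CellModel cell : delete.cells) {
            TagModel tag = cell.tags.get(0);
            System.out.println(cell.qualifier + " -> type " + tag.type + ", value "
                + new String(tag.value, StandardCharsets.UTF_8));
        }
    }
}
```

The point of the design is that clients never construct tags themselves. They can only set attributes, and the server-side code decides which attributes become tags and with which type, so security-related tag types cannot be spoofed from the client.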


On Mon, Oct 19, 2020 at 11:45 PM Andrew Purtell  wrote:

> Thanks for the clarification.
>
> My opinion about client use of cell tags remains unchanged. Also, the offer
> to assist with any coprocessor side API issues with using tags on the
> server side.
>
>
> On Mon, Oct 19, 2020 at 11:07 AM Rushabh Shah wrote:
>
> >
> > Thank you Andrew and Geoffrey for the comments.
> >
> >
> > >  Why not add values to Deletes instead. Values (in the key value sense)
> > are ignored for Deletes, so could serve as annotations as
> > you like. Nobody has proposed changing value semantics on Puts or
> whatever.
> >
> > I think I didn't do a good job in explaining the use case and focused my
> > discussion/question on Deletes _only_.
> > The current requirement is to add tags to Deletes but we want a solution
> > which can be extensible to Puts also. Since the annotation we want to add
> > (i.e source of operation) is not limited to only Deletes.
> > Here are the use cases we considered to add tags for puts mutations other
> > than adding source of mutation information:
> > 1. Identifying whether the put came from primary cluster or replicated
> > cluster so that we can make the backup tool more smarter and not backup
> the
> > same put twice in source and replicated cluster.
> > 2. We have a multi-tenancy concept in Phoenix. We want to track whether
> > the upsert (put operation in hbase) came from Global or Tenant
> connection.
> >
> > > There should be no limitations on tag use to coprocessors. If there are
> > API issues in that regard we can certainly improve the situation.
> >
> > I am writing POC for this. Thank you for the suggestion, Andrew !
> >
> >
> > Rushabh Shah
> >
> >
> >
> >
> >
> > On Mon, Oct 19, 2020 at 10:09 AM Andrew Purtell 
> > wrote:
> >
> >> Just to be clear I think we are talking past each other somehow. You ask
> >> to
> >> add tags to Deletes. Why not add values to Deletes instead. Values (in
> the
> >> key value sense) are ignored for Deletes, so could serve as annotations
> as
> >> you like. Nobody has proposed changing value semantics on Puts or
> >> whatever.
> >>
> >> On Mon, Oct 19, 2020 at 10:07 AM Andrew Purtell 
> >> wrote:
> >>
> >> > Because tags are meant to be a server side internal feature. There is
> no
> >> > strong technical rationale to change here because values in Deletes
> can
> >> > serve just as well as tags. Unless there is something I am missing. If
> >> > there were it could be reasonable to reconsider. In the absence of an
> >> > actual need, it is not.
> >> >
> >> > On Mon, Oct 19, 2020 at 9:34 AM Geoffrey Jacoby 
> >> > wrote:
> >> >
> >> >> I completely understand why HBase wouldn't want to expose tags that
> it
> >> >> uses
> >> >> for internal security purposes, like ACLs or visibility, to clients.
> >> >> However, making _all_ tags be off-limits seems to me to limit quite a
> >> few
> >> >> useful features.
> >> >>
> >> >> Overloading the delete marker's value solves one particular problem,
> >> but
> >> >> not the general case, because it can't be extended to Puts, which
> >> already
> >> >> use their value field for real data. The motivating example in
> >> HBASE-25118
> >> >> is distinguishing a bulk delete from customer operations. But there
> are
> >> >> times we may want to distinguish an ETL or bulk write from customer
> >> >> operations.
> >> >>
> >> >> Let's say I have a batch job that does an ETL into a cluster at the
> >> same
> >> >> time the cluster is taking other writes. I want to be really sure
> that
> >> all
> >> >> my data got loaded properly, so I generate a checksum from the ETL
> >> dataset
> >> >> before I load it. After the ETL, I want to generate a checksum for
> the
> >> >> loaded data on the cluster and compare. So I need to write a Filter
> >> that
> >> >> distinguishes the loaded data from any other operations going on at
> the
> >> >> same time. (Let's assume I'm scanning raw and have major compaction
> >> >> disabled so nothing gets purged, and there's nothing distinguishing
> >> about
> >> >> the data itself)
> >> >>
> >> >> The simplest way to do this would be to have a (hopefully tiny)
> >> Cell-level
> >> >> annotation that identifies that it originally came from my ETL.
> That's
> >> >> exactly what the Tag array field would provide. Now, I could hack
> >> >> something
> >> >> into the Put value and change all my applications to ignore part of
> the
> >> >> value array, but that assumes that I have full control over the
> value's
> >> >> format (not true if I'm using, say, Phoenix). And like using the
> Delete
> >> >> value, that's just hacking my own proprietary "Tag" capability into
> >> HBase
> >> >> when a real one 

Re: branch-2.2/branch-2.3 nightly job failed to run

2020-10-19 Thread Sean Busbey
that sounds probable given the symptoms.

On Mon, Oct 19, 2020 at 9:16 PM Nick Dimiduk  wrote:
>
> The Jenkins server was restarted within the last 24 hours, presumably for
> maintenance. Maybe they changed the name of this binary mapping without
> telling us?
>
> On Mon, Oct 19, 2020 at 17:42 Guanghao Zhang  wrote:
>
> > See
> >
> > https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/85/console
> >
> > https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/103/console
> >
> > org.codehaus.groovy.control.MultipleCompilationErrorsException: startup
> > failed:
> > WorkflowScript: 508: Tool type "maven" does not have an install of
> > "Maven (latest)" configured - did you mean "maven_latest"? @ line 508,
> > column 19.
> >maven 'Maven (latest)'
> >  ^
> >
> > WorkflowScript: 510: Tool type "jdk" does not have an install of "JDK
> > 1.8 (latest)" configured - did you mean "jdk_1.8_latest"? @ line 510,
> > column 17.
> >jdk "JDK 1.8 (latest)"
> >
> >
> > But the Jenkinsfile did not change recently. Any ideas?
> >



-- 
Sean


Re: branch-2.2/branch-2.3 nightly job failed to run

2020-10-19 Thread Nick Dimiduk
The Jenkins server was restarted within the last 24 hours, presumably for
maintenance. Maybe they changed the name of this binary mapping without
telling us?

On Mon, Oct 19, 2020 at 17:42 Guanghao Zhang  wrote:

> See
>
> https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/85/console
>
> https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/103/console
>
> org.codehaus.groovy.control.MultipleCompilationErrorsException: startup
> failed:
> WorkflowScript: 508: Tool type "maven" does not have an install of
> "Maven (latest)" configured - did you mean "maven_latest"? @ line 508,
> column 19.
>maven 'Maven (latest)'
>  ^
>
> WorkflowScript: 510: Tool type "jdk" does not have an install of "JDK
> 1.8 (latest)" configured - did you mean "jdk_1.8_latest"? @ line 510,
> column 17.
>jdk "JDK 1.8 (latest)"
>
>
> But the Jenkinsfile did not change recently. Any ideas?
>


branch-2.2/branch-2.3 nightly job failed to run

2020-10-19 Thread Guanghao Zhang
See
https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/85/console
https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/103/console

org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
WorkflowScript: 508: Tool type "maven" does not have an install of
"Maven (latest)" configured - did you mean "maven_latest"? @ line 508,
column 19.
   maven 'Maven (latest)'
 ^

WorkflowScript: 510: Tool type "jdk" does not have an install of "JDK
1.8 (latest)" configured - did you mean "jdk_1.8_latest"? @ line 510,
column 17.
   jdk "JDK 1.8 (latest)"


But the Jenkinsfile did not change recently. Any ideas?


Re: Recommended way of using HBase Cell Tags.

2020-10-19 Thread Andrew Purtell
Thanks for the clarification.

My opinion about client use of cell tags remains unchanged. So does the offer
to assist with any coprocessor-side API issues in using tags on the
server side.


On Mon, Oct 19, 2020 at 11:07 AM Rushabh Shah 
wrote:

>
> Thank you Andrew and Geoffrey for the comments.
>
>
> >  Why not add values to Deletes instead. Values (in the key value sense)
> are ignored for Deletes, so could serve as annotations as
> you like. Nobody has proposed changing value semantics on Puts or whatever.
>
> I think I didn't do a good job in explaining the use case and focused my
> discussion/question on Deletes _only_.
> The current requirement is to add tags to Deletes but we want a solution
> which can be extensible to Puts also. Since the annotation we want to add
> (i.e source of operation) is not limited to only Deletes.
> Here are the use cases we considered to add tags for puts mutations other
> than adding source of mutation information:
> 1. Identifying whether the put came from primary cluster or replicated
> cluster so that we can make the backup tool more smarter and not backup the
> same put twice in source and replicated cluster.
> 2. We have a multi-tenancy concept in Phoenix. We want to track whether
> the upsert (put operation in hbase) came from Global or Tenant connection.
>
> > There should be no limitations on tag use to coprocessors. If there are
> API issues in that regard we can certainly improve the situation.
>
> I am writing POC for this. Thank you for the suggestion, Andrew !
>
>
> Rushabh Shah
>
>
>
>
>
> On Mon, Oct 19, 2020 at 10:09 AM Andrew Purtell 
> wrote:
>
>> Just to be clear I think we are talking past each other somehow. You ask
>> to
>> add tags to Deletes. Why not add values to Deletes instead. Values (in the
>> key value sense) are ignored for Deletes, so could serve as annotations as
>> you like. Nobody has proposed changing value semantics on Puts or
>> whatever.
>>
>> On Mon, Oct 19, 2020 at 10:07 AM Andrew Purtell 
>> wrote:
>>
>> > Because tags are meant to be a server side internal feature. There is no
>> > strong technical rationale to change here because values in Deletes can
>> > serve just as well as tags. Unless there is something I am missing. If
>> > there were it could be reasonable to reconsider. In the absence of an
>> > actual need, it is not.
>> >
>> > On Mon, Oct 19, 2020 at 9:34 AM Geoffrey Jacoby 
>> > wrote:
>> >
>> >> I completely understand why HBase wouldn't want to expose tags that it
>> >> uses
>> >> for internal security purposes, like ACLs or visibility, to clients.
>> >> However, making _all_ tags be off-limits seems to me to limit quite a
>> few
>> >> useful features.
>> >>
>> >> Overloading the delete marker's value solves one particular problem,
>> but
>> >> not the general case, because it can't be extended to Puts, which
>> already
>> >> use their value field for real data. The motivating example in
>> HBASE-25118
>> >> is distinguishing a bulk delete from customer operations. But there are
>> >> times we may want to distinguish an ETL or bulk write from customer
>> >> operations.
>> >>
>> >> Let's say I have a batch job that does an ETL into a cluster at the
>> same
>> >> time the cluster is taking other writes. I want to be really sure that
>> all
>> >> my data got loaded properly, so I generate a checksum from the ETL
>> dataset
>> >> before I load it. After the ETL, I want to generate a checksum for the
>> >> loaded data on the cluster and compare. So I need to write a Filter
>> that
>> >> distinguishes the loaded data from any other operations going on at the
>> >> same time. (Let's assume I'm scanning raw and have major compaction
>> >> disabled so nothing gets purged, and there's nothing distinguishing
>> about
>> >> the data itself)
>> >>
>> >> The simplest way to do this would be to have a (hopefully tiny)
>> Cell-level
>> >> annotation that identifies that it originally came from my ETL. That's
>> >> exactly what the Tag array field would provide. Now, I could hack
>> >> something
>> >> into the Put value and change all my applications to ignore part of the
>> >> value array, but that assumes that I have full control over the value's
>> >> format (not true if I'm using, say, Phoenix). And like using the Delete
>> >> value, that's just hacking my own proprietary "Tag" capability into
>> HBase
>> >> when a real one already exists.
>> >>
>> >> So I'm curious why, so long as HBase internal tags continue to be
>> >> suppressed, is the Tag capability a bad thing to expose?
>> >>
>> >> Geoffrey
>> >>
>> >>
>> >>
>> >> On Fri, Oct 16, 2020 at 12:58 PM Andrew Purtell 
>> >> wrote:
>> >>
>> >> > I responded on the JIRA.
>> >> >
>> >> > You would be far better served adapting values for your proposal
>> >> instead of
>> >> > tags. Tags are not a client side feature. Tags were and are designed
>> for
>> >> > server side use only, and are stripped from client inbound and
>> outbound
>> >> > RPCs.
>> >> >
>> >> > On Wed, Oct 

Re: Recommended way of using HBase Cell Tags.

2020-10-19 Thread Rushabh Shah
Thank you Andrew and Geoffrey for the comments.


> Why not add values to Deletes instead. Values (in the key value sense)
> are ignored for Deletes, so could serve as annotations as you like.
> Nobody has proposed changing value semantics on Puts or whatever.

I think I didn't do a good job of explaining the use case and focused my
discussion/question on Deletes _only_.
The current requirement is to add tags to Deletes, but we want a solution
that can be extended to Puts as well, since the annotation we want to add
(i.e., the source of the operation) is not limited to Deletes.
Here are the use cases we considered for adding tags to Put mutations, other
than recording the source of the mutation:
1. Identifying whether the put came from the primary cluster or a replicated
cluster, so that we can make the backup tool smarter and not back up the
same put twice in the source and replicated clusters.
2. We have a multi-tenancy concept in Phoenix. We want to track whether the
upsert (a put operation in HBase) came from a Global or a Tenant connection.

> There should be no limitations on tag use to coprocessors. If there are
> API issues in that regard we can certainly improve the situation.

I am writing a POC for this. Thank you for the suggestion, Andrew!


Rushabh Shah





On Mon, Oct 19, 2020 at 10:09 AM Andrew Purtell  wrote:

> Just to be clear I think we are talking past each other somehow. You ask to
> add tags to Deletes. Why not add values to Deletes instead. Values (in the
> key value sense) are ignored for Deletes, so could serve as annotations as
> you like. Nobody has proposed changing value semantics on Puts or whatever.
>
> On Mon, Oct 19, 2020 at 10:07 AM Andrew Purtell 
> wrote:
>
> > Because tags are meant to be a server side internal feature. There is no
> > strong technical rationale to change here because values in Deletes can
> > serve just as well as tags. Unless there is something I am missing. If
> > there were it could be reasonable to reconsider. In the absence of an
> > actual need, it is not.
> >
> > On Mon, Oct 19, 2020 at 9:34 AM Geoffrey Jacoby 
> > wrote:
> >
> >> I completely understand why HBase wouldn't want to expose tags that it
> >> uses
> >> for internal security purposes, like ACLs or visibility, to clients.
> >> However, making _all_ tags be off-limits seems to me to limit quite a
> few
> >> useful features.
> >>
> >> Overloading the delete marker's value solves one particular problem, but
> >> not the general case, because it can't be extended to Puts, which
> already
> >> use their value field for real data. The motivating example in
> HBASE-25118
> >> is distinguishing a bulk delete from customer operations. But there are
> >> times we may want to distinguish an ETL or bulk write from customer
> >> operations.
> >>
> >> Let's say I have a batch job that does an ETL into a cluster at the same
> >> time the cluster is taking other writes. I want to be really sure that
> all
> >> my data got loaded properly, so I generate a checksum from the ETL
> dataset
> >> before I load it. After the ETL, I want to generate a checksum for the
> >> loaded data on the cluster and compare. So I need to write a Filter that
> >> distinguishes the loaded data from any other operations going on at the
> >> same time. (Let's assume I'm scanning raw and have major compaction
> >> disabled so nothing gets purged, and there's nothing distinguishing
> about
> >> the data itself)
> >>
> >> The simplest way to do this would be to have a (hopefully tiny)
> Cell-level
> >> annotation that identifies that it originally came from my ETL. That's
> >> exactly what the Tag array field would provide. Now, I could hack
> >> something
> >> into the Put value and change all my applications to ignore part of the
> >> value array, but that assumes that I have full control over the value's
> >> format (not true if I'm using, say, Phoenix). And like using the Delete
> >> value, that's just hacking my own proprietary "Tag" capability into
> HBase
> >> when a real one already exists.
> >>
> >> So I'm curious why, so long as HBase internal tags continue to be
> >> suppressed, is the Tag capability a bad thing to expose?
> >>
> >> Geoffrey
> >>
> >>
> >>
> >> On Fri, Oct 16, 2020 at 12:58 PM Andrew Purtell 
> >> wrote:
> >>
> >> > I responded on the JIRA.
> >> >
> >> > You would be far better served adapting values for your proposal
> >> instead of
> >> > tags. Tags are not a client side feature. Tags were and are designed
> for
> >> > server side use only, and are stripped from client inbound and
> outbound
> >> > RPCs.
> >> >
> >> > On Wed, Oct 14, 2020 at 9:40 AM Rushabh Shah
> >> >  wrote:
> >> >
> >> > > Thank you Ram for your response !
> >> > >
> >> > > > For your case, is there a possibility to have yournew feature as a
> >> > first
> >> > > class feature using Tags? Just asking?
> >> > >
> >> > > Could you elaborate what you mean by first class feature ?
> >> > >
> >> > >
> >> > > Rushabh Shah
> >> > >
> >> > >- Software 

Re: Recommended way of using HBase Cell Tags.

2020-10-19 Thread Andrew Purtell
For what it's worth I am not in favor of exposing tags to clients nor
letting clients submit tags because either leaking security tags or
spoofing security tags would be a CVE worthy mistake. There is no strong
rationale here to overcome the risk, in my opinion. Of course, because
Phoenix is a coprocessor application (in part), if you did want to use tags
on the server side, you are welcome to add them in the Phoenix coprocessors
and consume them in those code paths as well. There should be no
limitations on tag use to coprocessors. If there are API issues in that
regard we can certainly improve the situation.

On Mon, Oct 19, 2020 at 10:07 AM Andrew Purtell  wrote:

> Because tags are meant to be a server side internal feature. There is no
> strong technical rationale to change here because values in Deletes can
> serve just as well as tags. Unless there is something I am missing. If
> there were it could be reasonable to reconsider. In the absence of an
> actual need, it is not.
>
> On Mon, Oct 19, 2020 at 9:34 AM Geoffrey Jacoby 
> wrote:
>
>> I completely understand why HBase wouldn't want to expose tags that it
>> uses
>> for internal security purposes, like ACLs or visibility, to clients.
>> However, making _all_ tags be off-limits seems to me to limit quite a few
>> useful features.
>>
>> Overloading the delete marker's value solves one particular problem, but
>> not the general case, because it can't be extended to Puts, which already
>> use their value field for real data. The motivating example in HBASE-25118
>> is distinguishing a bulk delete from customer operations. But there are
>> times we may want to distinguish an ETL or bulk write from customer
>> operations.
>>
>> Let's say I have a batch job that does an ETL into a cluster at the same
>> time the cluster is taking other writes. I want to be really sure that all
>> my data got loaded properly, so I generate a checksum from the ETL dataset
>> before I load it. After the ETL, I want to generate a checksum for the
>> loaded data on the cluster and compare. So I need to write a Filter that
>> distinguishes the loaded data from any other operations going on at the
>> same time. (Let's assume I'm scanning raw and have major compaction
>> disabled so nothing gets purged, and there's nothing distinguishing about
>> the data itself)
>>
>> The simplest way to do this would be to have a (hopefully tiny) Cell-level
>> annotation that identifies that it originally came from my ETL. That's
>> exactly what the Tag array field would provide. Now, I could hack
>> something
>> into the Put value and change all my applications to ignore part of the
>> value array, but that assumes that I have full control over the value's
>> format (not true if I'm using, say, Phoenix). And like using the Delete
>> value, that's just hacking my own proprietary "Tag" capability into HBase
>> when a real one already exists.
>>
>> So I'm curious why, so long as HBase internal tags continue to be
>> suppressed, is the Tag capability a bad thing to expose?
>>
>> Geoffrey
>>
>>
>>
>> On Fri, Oct 16, 2020 at 12:58 PM Andrew Purtell 
>> wrote:
>>
>> > I responded on the JIRA.
>> >
>> > You would be far better served adapting values for your proposal
>> instead of
>> > tags. Tags are not a client side feature. Tags were and are designed for
>> > server side use only, and are stripped from client inbound and outbound
>> > RPCs.
>> >
>> > On Wed, Oct 14, 2020 at 9:40 AM Rushabh Shah
>> >  wrote:
>> >
>> > > Thank you Ram for your response !
>> > >
>> > > > For your case, is there a possibility to have yournew feature as a
>> > first
>> > > class feature using Tags? Just asking?
>> > >
>> > > Could you elaborate what you mean by first class feature ?
>> > >
>> > >
>> > > Rushabh Shah
>> > >
>> > >- Software Engineering SMTS | Salesforce
>> > >-
>> > >   - Mobile: 213 422 9052
>> > >
>> > >
>> > >
>> > > On Wed, Oct 14, 2020 at 9:35 AM ramkrishna vasudevan <
>> > > ramkrishna.s.vasude...@gmail.com> wrote:
>> > >
>> > > > Hi Rushabh
>> > > >
>> > > > If I remember correctly, the decision was not to expose tags for
>> > clients
>> > > > directly. All the tags were used as internal to the cell formation
>> at
>> > the
>> > > > server side (for eg ACL and Visibility labels).
>> > > >
>> > > > For your case, is there a possibility to have yournew feature as a
>> > first
>> > > > class feature using Tags? Just asking?
>> > > >
>> > > > Regards
>> > > > Ram
>> > > >
>> > > > On Wed, Oct 14, 2020 at 8:17 PM Rushabh Shah
>> > > >  wrote:
>> > > >
>> > > > > Hi Everyone,
>> > > > > I want to understand how to use the Hbase Cell Tags feature. We
>> have
>> > a
>> > > > use
>> > > > > case to identify the source of deletes (not the same as
>> authenticated
>> > > > > kerberos user). I have added more details about my use case in
>> > > > HBASE-25118
>> > > > > . At my day
>> job
>> > we
>> > > > use
>> > > > > Phoenix to interact with 

Re: Recommended way of using HBase Cell Tags.

2020-10-19 Thread Andrew Purtell
Just to be clear, I think we are talking past each other somehow. You ask to
add tags to Deletes. Why not add values to Deletes instead? Values (in the
key-value sense) are ignored for Deletes, so they could serve as annotations
as you like. Nobody has proposed changing value semantics on Puts or whatever.


Re: Recommended way of using HBase Cell Tags.

2020-10-19 Thread Andrew Purtell
Because tags are meant to be a server-side, internal feature. There is no
strong technical rationale to change here, because values in Deletes can
serve just as well as tags, unless there is something I am missing. If
there were, it could be reasonable to reconsider. In the absence of an
actual need, it is not.


Re: Recommended way of using HBase Cell Tags.

2020-10-19 Thread Geoffrey Jacoby
I completely understand why HBase wouldn't want to expose tags that it uses
for internal security purposes, like ACLs or visibility, to clients.
However, making _all_ tags be off-limits seems to me to limit quite a few
useful features.

Overloading the delete marker's value solves one particular problem, but
not the general case, because it can't be extended to Puts, which already
use their value field for real data. The motivating example in HBASE-25118
is distinguishing a bulk delete from customer operations. But there are
times we may want to distinguish an ETL or bulk write from customer
operations.

Let's say I have a batch job that does an ETL into a cluster at the same
time the cluster is taking other writes. I want to be really sure that all
my data got loaded properly, so I generate a checksum from the ETL dataset
before I load it. After the ETL, I want to generate a checksum for the
loaded data on the cluster and compare. So I need to write a Filter that
distinguishes the loaded data from any other operations going on at the
same time. (Let's assume I'm scanning raw and have major compaction
disabled so nothing gets purged, and there's nothing distinguishing about
the data itself.)

The simplest way to do this would be to have a (hopefully tiny) Cell-level
annotation that identifies that it originally came from my ETL. That's
exactly what the Tag array field would provide. Now, I could hack something
into the Put value and change all my applications to ignore part of the
value array, but that assumes that I have full control over the value's
format (not true if I'm using, say, Phoenix). And like using the Delete
value, that's just hacking my own proprietary "Tag" capability into HBase
when a real one already exists.
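
A minimal sketch of the checksum comparison described above, for
illustration only. It does not use the HBase API: AnnotatedCell, sourceTag,
checksumOf and the "etl-run-42" marker are all hypothetical names, and the
annotation is modelled as a plain byte array rather than a real Tag.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical model only: none of these types are HBase classes.
final class EtlChecksumSketch {

  // Simplified stand-in for a cell: row key, value, and an optional annotation.
  static final class AnnotatedCell {
    final String row;
    final byte[] value;
    final byte[] sourceTag; // assumed cell-level annotation; null when absent

    AnnotatedCell(String row, byte[] value, byte[] sourceTag) {
      this.row = row;
      this.value = value;
      this.sourceTag = sourceTag;
    }
  }

  static byte[] utf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  // CRC32 over the row and value of each cell whose annotation equals
  // wantedTag, visited in list order. Cells with another or no annotation
  // are skipped, which is the role the Filter plays in the description.
  static long checksumOf(List<AnnotatedCell> cells, byte[] wantedTag) {
    CRC32 crc = new CRC32();
    for (AnnotatedCell cell : cells) {
      if (Arrays.equals(cell.sourceTag, wantedTag)) {
        crc.update(utf8(cell.row));
        crc.update(cell.value);
      }
    }
    return crc.getValue();
  }

  public static void main(String[] args) {
    byte[] etlTag = utf8("etl-run-42");

    // What the batch job intends to load.
    List<AnnotatedCell> etlDataset = new ArrayList<>();
    etlDataset.add(new AnnotatedCell("row1", utf8("a"), etlTag));
    etlDataset.add(new AnnotatedCell("row3", utf8("c"), etlTag));

    // What the cluster holds afterwards, including a concurrent customer write.
    List<AnnotatedCell> onCluster = new ArrayList<>();
    onCluster.add(new AnnotatedCell("row1", utf8("a"), etlTag));
    onCluster.add(new AnnotatedCell("row2", utf8("b"), null));
    onCluster.add(new AnnotatedCell("row3", utf8("c"), etlTag));

    long expected = checksumOf(etlDataset, etlTag);
    long actual = checksumOf(onCluster, etlTag);
    System.out.println("checksums match: " + (expected == actual));
  }
}
```

The point of the sketch is only that the comparison needs some per-cell
marker that is independent of the value bytes; whether that marker should be
a client-visible Tag is the question under discussion in this thread.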

So I'm curious why, so long as HBase internal tags continue to be
suppressed, is the Tag capability a bad thing to expose?

Geoffrey



On Fri, Oct 16, 2020 at 12:58 PM Andrew Purtell wrote:

> I responded on the JIRA.
>
> You would be far better served adapting values for your proposal instead of
> tags. Tags are not a client side feature. Tags were and are designed for
> server side use only, and are stripped from client inbound and outbound
> RPCs.
>
> On Wed, Oct 14, 2020 at 9:40 AM Rushabh Shah wrote:
>
> > Thank you Ram for your response!
> >
> > > For your case, is there a possibility to have your new feature as a
> > > first class feature using Tags? Just asking?
> >
> > Could you elaborate on what you mean by first class feature?
> >
> > Rushabh Shah
> > Software Engineering SMTS | Salesforce
> >
> > On Wed, Oct 14, 2020 at 9:35 AM ramkrishna vasudevan <
> > ramkrishna.s.vasude...@gmail.com> wrote:
> >
> > > Hi Rushabh
> > >
> > > If I remember correctly, the decision was not to expose tags for
> > > clients directly. All the tags were used as internal to the cell
> > > formation at the server side (e.g. ACL and Visibility labels).
> > >
> > > For your case, is there a possibility to have your new feature as a
> > > first class feature using Tags? Just asking?
> > >
> > > Regards
> > > Ram
> > >
> > > On Wed, Oct 14, 2020 at 8:17 PM Rushabh Shah wrote:
> > >
> > > > Hi Everyone,
> > > > I want to understand how to use the HBase Cell Tags feature. We have
> > > > a use case to identify the source of deletes (not the same as the
> > > > authenticated Kerberos user). I have added more details about my use
> > > > case in HBASE-25118. At my day job we use Phoenix to interact with
> > > > HBase and we are passing this information via Phoenix
> > > > ConnectionProperties. We are exploring the Cell Tags feature to add
> > > > this metadata to HBase Cells (only to Delete Markers as of now).
> > > >
> > > > Via HBASE-18995, we have moved all the createCell methods which use
> > > > Tag(s) as an argument to the PrivateCellUtil class and made the
> > > > InterfaceAudience of that class Private. I saw some discussion on
> > > > that jira
> > > > <https://issues.apache.org/jira/browse/HBASE-18995?focusedCommentId=16219960=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16219960>
> > > > to expose some methods as LimitedPrivate accessible to CP, but it
> > > > was decided to do it later. We only expose CellBuilderFactory
> > > > <https://github.com/apache/hbase/blob/master/hbase-common/src/main/java/org/apache/hadoop/hbase/CellBuilderFactory.java>
> > > > which returns an instance of CellBuilder
> > > > <https://github.com/apache/hbase/blob/master/hbase-common/src/main/java/org/apache/hadoop/hbase/CellBuilder.java>
> > > > which doesn't have a setTags method. Also the code is vastly
> > > > different in branch-1.
> > > >
> > > > Could 

[jira] [Resolved] (HBASE-24628) Region normalizer now respects a rate limit

2020-10-19 Thread Duo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-24628.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

The flaky board is back to normal.

Resolve.

> Region normalizer now respects a rate limit
> ---
>
> Key: HBASE-24628
> URL: https://issues.apache.org/jira/browse/HBASE-24628
> Project: HBase
> Issue Type: Improvement
> Components: Normalizer
> Affects Versions: 3.0.0-alpha-1, 2.4.0
> Reporter: Nick Dimiduk
> Assignee: Nick Dimiduk
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> There are no limits on the normalizer right now. It will iterate through the 
> tables one at a time until it's touched all tables. For any table, it 
> generates a complete plan and executes that plan in totality.
> It would be nice to allow operators to configure some limits here. Off the 
> top of my head, the two metrics that might be interesting are {{split|merge 
> actions / hour}} or {{hfile mb volume / hour}}. Either way, we'd need to 
> track a little more metadata.
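
As a rough illustration of the kind of limit the description asks for, here
is a sketch of a fixed-window hourly budget that could cap either
split/merge actions or HFile megabytes per hour. It is not the
implementation that was committed for this issue; the class name
HourlyBudgetSketch, its method, and the explicit clock parameter are
assumptions made for the example.

```java
// Hypothetical sketch, not HBase code: a fixed-window budget per hour.
final class HourlyBudgetSketch {
  private static final long HOUR_MILLIS = 60L * 60L * 1000L;

  private final long budgetPerHour; // e.g. actions per hour, or MB per hour
  private long windowStartMillis;
  private long usedInWindow;

  HourlyBudgetSketch(long budgetPerHour, long startMillis) {
    this.budgetPerHour = budgetPerHour;
    this.windowStartMillis = startMillis;
    this.usedInWindow = 0L;
  }

  // Returns true and records the cost if it fits in what is left of the
  // current hour's budget; returns false without recording it otherwise.
  // The caller passes the current time so the behaviour is deterministic.
  synchronized boolean tryAcquire(long cost, long nowMillis) {
    if (nowMillis - windowStartMillis >= HOUR_MILLIS) {
      // A full hour has elapsed: start a new window with a fresh budget.
      windowStartMillis = nowMillis;
      usedInWindow = 0L;
    }
    if (usedInWindow + cost > budgetPerHour) {
      return false;
    }
    usedInWindow += cost;
    return true;
  }

  public static void main(String[] args) {
    // Assume a budget of 100 MB of HFile volume per hour, starting at t=0.
    HourlyBudgetSketch budget = new HourlyBudgetSketch(100L, 0L);
    System.out.println(budget.tryAcquire(60L, 0L));
    System.out.println(budget.tryAcquire(50L, 1_000L));
    System.out.println(budget.tryAcquire(40L, 2_000L));
    System.out.println(budget.tryAcquire(1L, HOUR_MILLIS));
  }
}
```

A plan step that is refused would simply be deferred to a later normalizer
run. The "little more metadata" mentioned in the description corresponds
here to the cost attached to each action, which the caller has to supply.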


