Re: [DISCUSSION] Upgrading core dependencies

2017-02-08 Thread Jerry He
My thinking was more about avoiding/reducing pulling hbase dependencies
into hbase-spark, and that maybe hbase-spark could even depend on the shaded
client and server -- it will be easier and more feasible if the shaded
client becomes the default, as you mentioned.

Your idea that hbase-spark itself becomes a shaded artifact sounds better
if I understand you correctly. The spark/scala dependencies are 'provided'
already.

Jerry



On Wed, Feb 8, 2017 at 6:14 PM, Nick Dimiduk  wrote:

> On Wed, Feb 8, 2017 at 10:24 AM Jerry He  wrote:
>
> > Yeah.  Talking about the dependency, the hbase-spark module already has a
> > dependency on hbase-server (coming from the spark bulk load producing
> > hfiles).
> > This is not very good. We have to be careful not to entangle it more.
> > Also, there are already problems running hbase-spark due to dependency
> > conflicts, and one has to be careful about the order of the classpath to
> > make it work.
>
> We own the hbase-spark module, do we not? In that case, we control our own
> destiny. An explicit goal of this effort would be to make use of that
> module agnostic to classpath load order. As per my earlier reply,
> hbase-spark could itself be an artifact shaded over hbase-server and all of
> its dependencies. That way the user doesn't need to think about it at all.
>
> It further seems to me that the maven-shade-plugin could gain a new analyze
> goal, similar to that of the dependency-plugin, which would audit the
> classes packaged in a jar against the contract defined in configuration. This
> could further be used to fail the build if there are any warnings reported.
> I've found myself wanting this very functionality as I consume ES and
> Phoenix, shaded in downstream projects.
>


[jira] [Reopened] (HBASE-17280) Add mechanism to control hbase cleaner behavior

2017-02-08 Thread Ajay Jadhav (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Jadhav reopened HBASE-17280:
-

> Add mechanism to control hbase cleaner behavior
> ---
>
> Key: HBASE-17280
> URL: https://issues.apache.org/jira/browse/HBASE-17280
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, hbase, shell
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Ajay Jadhav
>Assignee: Ajay Jadhav
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17280.branch-1.2.patch, 
> HBASE-17280.branch-2.0.patch, HBASE-17280.master.003.patch, 
> HBASE-17280.master.004.patch, HBASE-17280.master.005.patch, 
> HBASE-17280.v1-branch-1.2.patch, HBASE-17280.v2-branch-1.2.patch, 
> HBASE-17280.v2-branch-2.patch
>
>
> The cleaner is used to get rid of archived HFiles and old WALs in HBase.
> Under a heavy workload, the cleaner can affect query performance by
> creating a lot of connections to perform costly reads/writes against the
> underlying filesystem.
> This patch allows the user to control HBase cleaner behavior by providing
> shell commands to enable/disable it and to run it manually.
> Our main intention with this patch was to avoid running the expensive cleaner
> chore during peak times. During our experimentation, we saw a lot of HFiles
> and WAL-related files getting created inside the archive dir (we didn't see
> ZK-lock-related files). Since we had replaced HDFS with S3, these delete
> calls would take forever to complete.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HBASE-17615) Use nonce and procedure v2 for add/remove replication peer

2017-02-08 Thread Guanghao Zhang (JIRA)
Guanghao Zhang created HBASE-17615:
--

 Summary: Use nonce and procedure v2 for add/remove replication peer
 Key: HBASE-17615
 URL: https://issues.apache.org/jira/browse/HBASE-17615
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.0
Reporter: Guanghao Zhang








Re: [DISCUSSION] Upgrading core dependencies

2017-02-08 Thread Nick Dimiduk
On Wed, Feb 8, 2017 at 10:24 AM Jerry He  wrote:

> Yeah.  Talking about the dependency, the hbase-spark module already has a
> dependency on hbase-server (coming from the spark bulk load producing
> hfiles).
> This is not very good. We have to be careful not to entangle it more.
> Also, there are already problems running hbase-spark due to dependency
> conflicts, and one has to be careful about the order of the classpath to
> make it work.

We own the hbase-spark module, do we not? In that case, we control our own
destiny. An explicit goal of this effort would be to make use of that
module agnostic to classpath load order. As per my earlier reply,
hbase-spark could itself be an artifact shaded over hbase-server and all of
its dependencies. That way the user doesn't need to think about it at all.

It further seems to me that the maven-shade-plugin could gain a new analyze
goal, similar to that of the dependency-plugin, which would audit the
classes packaged in a jar against the contract defined in configuration. This
could further be used to fail the build if there are any warnings reported.
I've found myself wanting this very functionality as I consume ES and
Phoenix, shaded in downstream projects.
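[Editorial note: the "analyze" idea above can be sketched in plain Java. The `ShadeAudit` class, the `violations` helper, and the prefix list below are illustrative stand-ins, not a real maven-shade-plugin API: enumerate the `.class` entries in a jar and flag anything that falls outside the allowed relocation prefixes.]

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class ShadeAudit {
    /**
     * Return the .class entries in the jar that fall outside the allowed
     * package prefixes (the "contract" a shaded artifact claims to honor).
     */
    public static List<String> violations(File jar, List<String> allowedPrefixes)
            throws IOException {
        List<String> bad = new ArrayList<>();
        try (JarFile jf = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jf.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (!name.endsWith(".class")) {
                    continue;
                }
                // A class is fine if it lives under some allowed prefix,
                // e.g. "org/apache/hadoop/hbase/".
                boolean ok = allowedPrefixes.stream().anyMatch(name::startsWith);
                if (!ok) {
                    bad.add(name); // a leaked, unrelocated dependency class
                }
            }
        }
        return bad;
    }
}
```

A plugin goal built on something like this could fail the build when `violations` is non-empty, mirroring how `dependency:analyze` reports undeclared dependencies.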


[jira] [Reopened] (HBASE-17572) HMaster: Caught throwable while processing event C_M_MERGE_REGION (UndeclaredThrowableException)

2017-02-08 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell reopened HBASE-17572:


Reopening.

I was able to reproduce again with branch-1 (not branch-1.3). I misjudged this.
In retrospect it's obvious: the exception thrown out of the doAs is
UndeclaredThrowableException. We have to catch and transform the ServiceException
inside the doAs block. Will revert the earlier commit and replace it with the
proper fix.

> HMaster: Caught throwable while processing event C_M_MERGE_REGION 
> (UndeclaredThrowableException)
> 
>
> Key: HBASE-17572
> URL: https://issues.apache.org/jira/browse/HBASE-17572
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.0
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 1.4.0, 1.3.1
>
> Attachments: HBASE-17572-branch-1.3.patch
>
>
> Running ITBLL 1B rows against branch-1.3 compiled against Hadoop 2.7.3 with 
> the noKill monkey policy, I see both masters go down with
> master.HMaster: Caught throwable while processing event C_M_MERGE_REGION
> java.lang.reflect.UndeclaredThrowableException
> In ServerManager#sendRegionsMerge we call ProtobufUtil#mergeRegions, which 
> does a doAs, and the code within that block invokes 
> RSRpcServices#mergeRegions, but is not resilient against 
> RegionOpeningException ("region is opening")
> An UndeclaredThrowableException is "thrown by a method invocation on a proxy 
> instance if its invocation handler's invoke method throws a checked exception 
> (a Throwable that is not assignable to RuntimeException or Error) that is not 
> assignable to any of the exception types declared in the throws clause of the 
> method that was invoked on the proxy instance and dispatched to the 
> invocation handler." 
> (http://docs.oracle.com/javase/7/docs/api/java/lang/reflect/UndeclaredThrowableException.html)
>  
> {noformat}
> 2017-01-31 07:21:17,495 FATAL [MASTER_TABLE_OPERATIONS-node-1:16000-0] 
> master.HMaster: Caught throwable while processing event C_M_MERGE_REGION
> java.lang.reflect.UndeclaredThrowableException
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1737)
> at 
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.mergeRegions(ProtobufUtil.java:1990)
> at 
> org.apache.hadoop.hbase.master.ServerManager.sendRegionsMerge(ServerManager.java:925)
> at 
> org.apache.hadoop.hbase.master.handler.DispatchMergingRegionHandler.process(DispatchMergingRegionHandler.java:153)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: com.google.protobuf.ServiceException: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.exceptions.RegionOpeningException):
>  org.apache.hadoop.hbase.exceptions.RegionOpeningException: Region 
> IntegrationTestBigLinkedList,|\xFFnk\x1C\x85<[\x1Ef\xFDE\xF9\xAA\xAC\x08,1485846598043.f56ad22121e872777468020c4452a7c7.
>  is opening on node-2.cluster,16020,1485822382322
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2964)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1139)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.mergeRegions(RSRpcServices.java:1497)
> at 
> org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22749)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2355)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:244)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:340)
> at 
> org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.mergeRegions(AdminProtos.java:23695)
> at 
> org.apache.hadoop.hbase.protobuf.ProtobufUtil$1.run(ProtobufUtil.java:1993)
> at 
> org.apache.hadoop.hbase.protobuf.ProtobufUtil$1.run(ProtobufUtil.java:1990)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:42
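[Editorial note: the proxy behavior quoted from the javadoc above is easy to reproduce independently of HBase. The interface and exception names in this minimal sketch are made up for illustration: a dynamic proxy whose invocation handler throws a checked exception not declared by the interface method surfaces it as UndeclaredThrowableException.]

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class ProxyDemo {
    // The interface method declares no checked exceptions...
    interface MergeService {
        void mergeRegions();
    }

    // ...but the handler throws one (standing in for ServiceException).
    static class CheckedFailure extends Exception {}

    public static MergeService badProxy() {
        InvocationHandler handler = (proxy, method, args) -> {
            // Checked exception not in the method's throws clause.
            throw new CheckedFailure();
        };
        return (MergeService) Proxy.newProxyInstance(
                MergeService.class.getClassLoader(),
                new Class<?>[] { MergeService.class },
                handler);
    }
}
```

Calling `badProxy().mergeRegions()` throws `java.lang.reflect.UndeclaredThrowableException` with the checked exception as its cause — which is why the fix is to catch the checked ServiceException inside the doAs block rather than let it escape.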

Re: [DISCUSSION] Upgrading core dependencies

2017-02-08 Thread Josh Elser

(late to the party, but..)

+1 Nick sums this up better than I could have.

Nick Dimiduk wrote:

For the client: I'm a fan of shaded client modules by default and
minimizing the exposure of that surface area of 3rd party libs (none, if
possible). For example, Elastic Search has a similar set of challenges; they
solve it by advocating users shade from step 1. It's addressed first thing
in the docs for their client libs. We could take it a step further by
making the shaded client the default client (o.a.hbase:hbase-client)
artifact and internally consume an hbase-client-unshaded. Turns the whole
thing on its head in a way that's better for the naive user.

For MR/Spark/etc connectors: We're probably stuck as it is until necessary
classes can be extracted from hbase-server. I haven't looked into this
lately, so I hesitate to give a prescription.

For coprocessors: They forfeit their right to 3rd party library dependency
stability by entering our process space. Maybe in 3.0 or 4.0 we can rebuild
on jigsaw or OSGi, but for today I think the best we should do is provide
relatively stable internal APIs. I also find it unlikely that we'd want to
spend loads of cycles optimizing for this usecase. There's other, bigger
fish, IMHO.

For size/compile time: I think these ultimately matter less than user
experience. Let's find a solution that sucks less for downstreamers and
work backward on reducing bloat.

On the point of leaning heavily on Guava: their pace is traditionally too
fast for us to expose in any public API. Maybe that's changing, in which
case we could reconsider for 3.0. Better to start using the new APIs
available in Java 8...

Thanks for taking this up, Stack.
-n

On Tue, Feb 7, 2017 at 12:22 PM Stack  wrote:


Here's an old thorny issue that won't go away. I'd like to hear what folks
are thinking these times.

My immediate need is that I want to upgrade Guava [1]. I want to move us to
guava 21.0, the latest release [2]. We currently depend on guava 12.0.
Hadoop's guava -- 11.0 -- is also on our CLASSPATH (three times). We could
just do it in an hbase-2.0.0, a major version release, but then
downstreamers and coprocessors that may have been a little lazy and that
have transitively come to depend on our versions of libs will break [3].
Then there is the murky area around the running of YARN/MR/Spark jobs where
the ordering of libs on the CLASSPATH gets interesting, where fat-jarring or
command-line antics can get you over (most) problems if you persevere.

Multiply the above by netty, jackson, and a few other favorites.

Our proffered solution to the above is the shaded hbase artifact project;
have applications and tasks refer to the shaded hbase client instead.
Because we've not done the work to narrow the surface area we expose to
downstreamers, most consumers of our API -- certainly in a spark/MR context
since our MR utility is buried in hbase-server module still -- need both
the shaded hbase client and server on their CLASSPATH (i.e. near all of
hbase).

Leaving aside for the moment that our shaded client and server need
untangling, getting folks up on the shaded artifacts takes effort
evangelizing. We also need to be doing work to make sure our shading
doesn't leak dependencies, that it works for all deploy scenarios, and that
this route forward is well doc'd, and so on.

I don't see much evidence of our pushing the shaded artifacts route nor of
their being used. What is the perception of others?

I played with adding a new module to host shaded 3rd party libs [4]. The
downsides are a couple: we would have to refer internally to the offset
(relocated) version of each lib, and we bulk up our tarball by a bunch of megs
(the build gets a few seconds longer, not much). The upside is that we can
float over a variety of hadoop/spark versions using whatever guava or netty we
want; downstreamers and general users should have an easier time of it too,
because they'll be less likely to run into library clashes. Is this project
worth finishing?

WDYT?
St.Ack

1. I wanted to make use of the protobuf to-json tool. It is in the
extra jar, protobuf-util. It requires guava 16.0.
2. Guava is a quality lib that should be at the core of all our dev, but we
are gun-shy about using it because it semvers with gusto, at a rate that is
orders of magnitude ahead of the Hadoop/HBase cadence.
3. We are trying to minimize breakage when we go to hbase-2.0.0.
4. HBASE-15749 suggested this but was shut down because it made no case for
why we'd want to do it.





[jira] [Created] (HBASE-17614) Move Backup/Restore into separate module

2017-02-08 Thread Vladimir Rodionov (JIRA)
Vladimir Rodionov created HBASE-17614:
-

 Summary: Move Backup/Restore into separate module 
 Key: HBASE-17614
 URL: https://issues.apache.org/jira/browse/HBASE-17614
 Project: HBase
  Issue Type: Task
Reporter: Vladimir Rodionov


Move all the backup code into a separate hbase-backup module.





Re: [DISCUSSION] Upgrading core dependencies

2017-02-08 Thread Jerry He
Yeah.  Talking about the dependency, the hbase-spark module already has a
dependency on hbase-server (coming from the spark bulk load producing
hfiles).
This is not very good. We have to be careful not to entangle it more.
Also, there are already problems running hbase-spark due to dependency
conflicts, and one has to be careful about the order of the classpath to
make it work.

Jerry


Successful: HBase Generate Website

2017-02-08 Thread Apache Jenkins Server
Build status: Successful

If successful, the website and docs have been generated. To update the live 
site, follow the instructions below. If failed, skip to the bottom of this 
email.

Use the following commands to download the patch and apply it to a clean branch 
based on origin/asf-site. If you prefer to keep the hbase-site repo around 
permanently, you can skip the clone step.

  git clone https://git-wip-us.apache.org/repos/asf/hbase-site.git

  cd hbase-site
  wget -O- https://builds.apache.org/job/hbase_generate_website/483/artifact/website.patch.zip | funzip > d8f3c6cff93c62d68ac3f68703bad86deaa03f14.patch
  git fetch
  git checkout -b asf-site-d8f3c6cff93c62d68ac3f68703bad86deaa03f14 origin/asf-site
  git am --whitespace=fix d8f3c6cff93c62d68ac3f68703bad86deaa03f14.patch

At this point, you can preview the changes by opening index.html or any of the 
other HTML pages in your local 
asf-site-d8f3c6cff93c62d68ac3f68703bad86deaa03f14 branch.

There are lots of spurious changes, such as timestamps and CSS styles in 
tables, so a generic git diff is not very useful. To see a list of files that 
have been added, deleted, renamed, changed type, or are otherwise interesting, 
use the following command:

  git diff --name-status --diff-filter=ADCRTXUB origin/asf-site

To see only files that had 100 or more lines changed:

  git diff --stat origin/asf-site | grep -E '[1-9][0-9]{2,}'

When you are satisfied, publish your changes to origin/asf-site using these 
commands:

  git commit --allow-empty -m "Empty commit" # to work around a current ASF INFRA bug
  git push origin asf-site-d8f3c6cff93c62d68ac3f68703bad86deaa03f14:asf-site
  git checkout asf-site
  git branch -D asf-site-d8f3c6cff93c62d68ac3f68703bad86deaa03f14

Changes take a couple of minutes to be propagated. You can verify whether they 
have been propagated by looking at the Last Published date at the bottom of 
http://hbase.apache.org/. It should match the date in the index.html on the 
asf-site branch in Git.

As a courtesy, reply-all to this email to let other committers know you pushed
the site.



If failed, see https://builds.apache.org/job/hbase_generate_website/483/console

FINAL REMINDER: CFP for ApacheCon closes February 11th

2017-02-08 Thread Rich Bowen
Dear Apache Enthusiast,

This is your FINAL reminder that the Call for Papers (CFP) for ApacheCon
Miami is closing this weekend - February 11th. This is your final
opportunity to submit a talk for consideration at this event.

This year, we are running several mini conferences in conjunction with
the main event, so if you're submitting for one of those events, please
pay attention to the instructions below.

Apache: Big Data
* Event information:
http://events.linuxfoundation.org/events/apache-big-data-north-america
* CFP:
http://events.linuxfoundation.org/events/apache-big-data-north-america/program/cfp

Apache: IoT (Internet of Things)
* Event Information: http://us.apacheiot.org/
* CFP -
http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp
(Indicate 'IoT' in the Target Audience field)

CloudStack Collaboration Conference
* Event information: http://us.cloudstackcollab.org/
* CFP -
http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp
(Indicate 'CloudStack' in the Target Audience field)

FlexJS Summit
* Event information - http://us.apacheflexjs.org/
* CFP -
http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp
(Indicate 'Flex' in the Target Audience field)

TomcatCon
* Event information - https://tomcat.apache.org/conference.html
* CFP -
http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp
(Indicate 'Tomcat' in the Target Audience field)

All other topics and projects
* Event information -
http://events.linuxfoundation.org/events/apachecon-north-america/program/about
* CFP -
http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp

Admission to any of these events also grants you access to all of the
others.

Thanks, and we look forward to seeing you in Miami!

-- 
Rich Bowen
VP Conferences, Apache Software Foundation
rbo...@apache.org
Twitter: @apachecon



(You are receiving this email because you are subscribed to a dev@ or
users@ list of some Apache Software Foundation project. If you do not
wish to receive email from these lists any more, you must follow that
list's unsubscription procedure. View the headers of this message for
unsubscription instructions.)


[jira] [Resolved] (HBASE-17607) Rest api for scan should return 404 when table not exists

2017-02-08 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-17607.
---
Resolution: Duplicate

Duplicated by HBASE-17603.

> Rest api for scan should return 404 when table not exists
> -
>
> Key: HBASE-17607
> URL: https://issues.apache.org/jira/browse/HBASE-17607
> Project: HBase
>  Issue Type: Bug
>  Components: REST, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
>
> The problem was introduced by HBASE-17508. After HBASE-17508 we will not
> contact the RS in getScanner, so for REST, getting a scanner will not return
> 404 either. But we should get a 404 when fetching data from the scanner;
> instead it now returns 204.





[jira] [Created] (HBASE-17613) avoid copy of family when initializing the FSWALEntry

2017-02-08 Thread ChiaPing Tsai (JIRA)
ChiaPing Tsai created HBASE-17613:
-

 Summary: avoid copy of family when initializing the FSWALEntry
 Key: HBASE-17613
 URL: https://issues.apache.org/jira/browse/HBASE-17613
 Project: HBase
  Issue Type: Improvement
Reporter: ChiaPing Tsai
Priority: Minor
 Fix For: 2.0.0


We should compare the families before cloning them.
{noformat}
Set<byte[]> familySet = Sets.newTreeSet(Bytes.BYTES_COMPARATOR);
for (Cell cell : cells) {
  if (!CellUtil.matchingFamily(cell, WALEdit.METAFAMILY)) {
    // TODO: Avoid this clone?
    familySet.add(CellUtil.cloneFamily(cell));
  }
}
{noformat}
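[Editorial note: the optimization hinted at above can be sketched in plain Java without HBase types. The `FamilyDedup` class, the `collectFamilies` helper, and the byte-wise comparator are illustrative stand-ins for `CellUtil`/`Bytes.BYTES_COMPARATOR`: look the family up in the comparator-backed set first, and copy the bytes only when the family is actually new.]

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class FamilyDedup {
    // Unsigned lexicographic byte[] comparator, like Bytes.BYTES_COMPARATOR.
    static final Comparator<byte[]> BYTES_COMPARATOR = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    };

    static Set<byte[]> collectFamilies(List<byte[]> cellFamilies) {
        Set<byte[]> familySet = new TreeSet<>(BYTES_COMPARATOR);
        for (byte[] family : cellFamilies) {
            // contains() compares via the comparator, so no copy is made
            // for the common case of a repeated family.
            if (!familySet.contains(family)) {
                familySet.add(Arrays.copyOf(family, family.length)); // clone once per family
            }
        }
        return familySet;
    }
}
```

With this shape, the clone cost is paid once per distinct family instead of once per cell; cells typically arrive grouped by family, so most iterations do only a comparison.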


