Re: Need help in HBase 0.98.1 CoProcessor Execution
Hi Ted,

I attached the coprocessor using the shell. The coprocessor jar contains RowCountEndpoint + ExampleProtos (taken from hbase-examples.jar); do I need to add anything else to that jar? The client has some debugging code + RowCountEndpointTest.

Commands that I used:

disable 'author_30YR'
alter 'author_30YR', METHOD => 'table_att', 'coprocessor' => '/user/cloud/ICDS/CoPro/lib/RowCountCoPro0.095.jar|org.apache.hadoop.hbase.coprocessor.RowCountEndpoint|1001'
enable 'author_30YR'

On Mon, Sep 29, 2014 at 8:23 PM, Ted Yu yuzhih...@gmail.com wrote:

bq. rowcount endpoint and the Example protos
Can you describe how you deployed the rowcount endpoint on the region servers?

bq. want to utilize Bucket Cache of HBase
You need 0.96+ in order to utilize Bucket Cache.

Cheers

On Mon, Sep 29, 2014 at 1:48 AM, Vikram Singh Chandel vikramsinghchan...@gmail.com wrote:

Hi,

We are trying to migrate to HBase 0.98.1 (CDH 5.1.1) from 0.94.6 in order to use Bucket Cache + coprocessors and to check the performance improvement, but looking into the API I found that a lot has changed.

I tried using the hbase-examples jar for the row count coprocessor. The coprocessor jar contains the rowcount endpoint and the Example protos (do I need to add anything else?), and I used the TestRowCountEndpoint code as my client (with a main method added to call the coprocessor service).

The table is split into 13 regions over a 4-node cluster (POC test cluster).

Getting the following exceptions:

RS1 (Region Server 1)

Unexpected throwable object
com.google.protobuf.UninitializedMessageException: Message missing required fields: count
  at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
  at org.apache.hadoop.hbase.coprocessor.example.generated.ExampleProtos$CountResponse$Builder.build(ExampleProtos.java:684)
  at org.apache.hadoop.hbase.coprocessor.example.generated.ExampleProtos$CountResponse$Builder.build(ExampleProtos.java:628)
  at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:5554)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.execServiceOnRegion(HRegionServer.java:3300)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.execService(HRegionServer.java:3282)
  at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29501)
  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
  at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
  at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
  at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
  at java.lang.Thread.run(Thread.java:745)

RS2

(responseTooSlow): {processingtimems:37683, call:ExecService(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$CoprocessorServiceRequest), client:10.206.55.0:53769, starttimems:1411725518216, queuetimems:3, class:HRegionServer, responsesize:199, method:ExecService}

RpcServer.listener,port=60020: count of bytes read: 0
java.io.IOException: Connection reset by peer
  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
  at sun.nio.ch.IOUtil.read(IOUtil.java:197)
  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
  at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2229)
  at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1415)
  at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:790)
  at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:581)
  at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:556)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)

RS3

Scanner 69 lease expired on region imsf_urg_sep23_Author,,1411488149043.8b17b191ff617b79f0e47137d969cc7e.

We basically want to utilize the Bucket Cache of HBase, so if there is any version of HBase that has the older API plus Bucket Cache, that would do for us, because right now we are stuck with this newer HBase version.

--
Regards
VIKRAM SINGH CHANDEL
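For reference, the RS1 error points at the endpoint side rather than the deployment. CountResponse.count is a required protobuf field, so the endpoint must call setCount(...) on every path that hands a response back to the framework; a builder returned without it produces exactly the UninitializedMessageException above. Below is a minimal sketch modeled on the RowCountEndpoint example shipped in hbase-examples, simplified for illustration; the class and package names are only placeholders and must match whatever fully-qualified name is given in the alter command.

package org.apache.hadoop.hbase.coprocessor.example;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.google.protobuf.RpcCallback;
import com.google.protobuf.RpcController;
import com.google.protobuf.Service;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.Coprocessor;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.CoprocessorException;
import org.apache.hadoop.hbase.coprocessor.CoprocessorService;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.example.generated.ExampleProtos;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.protobuf.ResponseConverter;
import org.apache.hadoop.hbase.regionserver.InternalScanner;

// Sketch of a 0.98 endpoint coprocessor; not the exact shipped code.
public class RowCountEndpoint extends ExampleProtos.RowCountService
    implements Coprocessor, CoprocessorService {

  private RegionCoprocessorEnvironment env;

  @Override
  public Service getService() {
    return this;
  }

  @Override
  public void getRowCount(RpcController controller,
      ExampleProtos.CountRequest request,
      RpcCallback<ExampleProtos.CountResponse> done) {
    Scan scan = new Scan();
    // FirstKeyOnlyFilter returns one cell per row, so counting cells counts rows.
    scan.setFilter(new FirstKeyOnlyFilter());
    ExampleProtos.CountResponse response = null;
    InternalScanner scanner = null;
    try {
      scanner = env.getRegion().getScanner(scan);
      List<Cell> results = new ArrayList<Cell>();
      long count = 0;
      boolean hasMore;
      do {
        hasMore = scanner.next(results);
        count += results.size();
        results.clear();
      } while (hasMore);
      // count is a required field: without setCount() the framework throws
      // UninitializedMessageException when it builds the reply.
      response = ExampleProtos.CountResponse.newBuilder().setCount(count).build();
    } catch (IOException ioe) {
      // Report the error through the controller instead of returning a half-built message.
      ResponseConverter.setControllerException(controller, ioe);
    } finally {
      if (scanner != null) {
        try { scanner.close(); } catch (IOException ignored) { }
      }
    }
    done.run(response);
  }

  @Override
  public void start(CoprocessorEnvironment env) throws IOException {
    if (env instanceof RegionCoprocessorEnvironment) {
      this.env = (RegionCoprocessorEnvironment) env;
    } else {
      throw new CoprocessorException("Must be loaded on a table region");
    }
  }

  @Override
  public void stop(CoprocessorEnvironment env) {
    // nothing to clean up
  }
}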
ExportSnapshot webhdfs problems
I'm trying to use ExportSnapshot to copy a snapshot from a Hadoop 1 cluster to a Hadoop 2 cluster using the webhdfs protocol. I've done this successfully before, though there are always mapper failures and retries in the job log. However, I'm not having success with a rather large table due to an excessive number of failures. The exceptions in the job log are always:

14/09/29 20:28:11 INFO mapred.JobClient: Task Id : attempt_201409241055_0024_m_05_1, Status : FAILED
org.apache.hadoop.ipc.RemoteException
  at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:114)
  at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:290)
  at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$500(WebHdfsFileSystem.java:98)
  at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$2.close(WebHdfsFileSystem.java:653)
  at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.copyFile(ExportSnapshot.java:204)
  at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.map(ExportSnapshot.java:146)
  …

So I presume the real exception is taking place on the target system. However, examining the namenode logs and a handful of the datanode logs has not revealed any exceptions that correlate with those in the job log. Is there some other log I should be looking at?

I reduced the number of mappers to 6 and the target cluster has 10 datanodes, so it's hard to believe it's a capacity problem.

Thanks
Brian
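For context, the kind of invocation being described is roughly the following; the snapshot name, target host, and webhdfs port below are placeholders, not values taken from the original mail:

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot my_table_snapshot \
  -copy-to webhdfs://target-namenode:50070/hbase \
  -mappers 6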
Getting Class not Found Exception while attaching CoProcessor jar (in HDFS) to table
Hi,

HBase: 0.98.1, CDH 5.1.1

When I try to attach the coprocessor jar to the table, I get the following exceptions in the region server logs:

ERROR org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Failed to load coprocessor org.apache.hadoop.hbase.coprocessor.RowCountEndpointCoPro
java.io.IOException: Cannot load external coprocessor class org.apache.hadoop.hbase.coprocessor.RowCountEndpointCoPro
  at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.load(CoprocessorHost.java:208)
  at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.loadTableCoprocessors(RegionCoprocessorHost.java:207)
  at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.init(RegionCoprocessorHost.java:163)
  at org.apache.hadoop.hbase.regionserver.HRegion.init(HRegion.java:623)
  at org.apache.hadoop.hbase.regionserver.HRegion.init(HRegion.java:530)
  at sun.reflect.GeneratedConstructorAccessor15.newInstance(Unknown Source)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:4137)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4448)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4421)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4377)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4328)
  at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:465)
  at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:139)
  at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.coprocessor.RowCountEndpointCoPro
  at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
  at org.apache.hadoop.hbase.util.CoprocessorClassLoader.loadClass(CoprocessorClassLoader.java:275)
  at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.load(CoprocessorHost.java:206)

Command used to load the jar:

alter 'vsc_sample', METHOD => 'table_att', 'coprocessor' => 'hdfs://InCites-head.amers1b.ciscloud:8020/user/cloud/ICDS/CoPro/lib/RowCountCoPro0.004.jar|org.apache.hadoop.hbase.coprocessor.RowCountEndpointCoPro|1001'

--
Regards
VIKRAM SINGH CHANDEL
HBase 0.98.6.1 maven dependency
Hi,

I'm relatively new to HBase and Maven. Currently I'm trying to add a dependency on hbase-0.98.6.1-hadoop2.jar to my application, but when I run 'mvn package' after adding the dependency, it fails with the following error:

[ERROR] Failed to execute goal on project MyApp: Could not resolve dependencies for project com.innowireless.xcapvuze:MyApp:jar:0.0.1-SNAPSHOT: Failure to find org.apache.hbase:hbase:jar:0.98.6.1-hadoop2 in http://repo.maven.apache.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced - [Help 1]

'mvn package -U' also failed.

When I inspected the local repository in .m2\repository\org\apache\hbase\hbase\0.98.6.1-hadoop2, there was no jar file, while the 0.94.6 directory has one.

How can I get the 0.98.6.1 jar? Should I build it myself? Or should I add each 'component' jar (hbase-client, hbase-common, hbase-protocol, ...) to the dependencies?

Thank you.
Re: HBase 0.98.6.1 maven dependency
Please add the modules (the components, as you mentioned) to your dependencies.

Cheers

On Sep 30, 2014, at 5:21 AM, innowireless TaeYun Kim taeyun@innowireless.co.kr wrote:

Hi, I'm relatively new to HBase and Maven. Currently I'm trying to add a dependency on hbase-0.98.6.1-hadoop2.jar to my application, but when I run 'mvn package' after adding the dependency, it fails ...
Re: A use case for ttl deletion?
The OP wants to know good use cases for the TTL setting.

Answer 1: any situation where the cost of retaining the data exceeds the value to be gained from it. Using TTL allows automatic purging of that data.

Answer 2: any situation where you have to enforce specific retention policies for compliance reasons. As an example, not retaining client or customer access information longer than 12 months. (I can't give a specific citation, but there are EU data retention laws which limit how long you may retain the data.) Again here, you want to be able to show that there is an automated method for removing aged data to ensure compliance.

When you start to get into the IoT, a lot of data is generated and the potential value of the data can easily exceed the cost of storage. While there is some value in capturing telemetry from your Android phone to show the path you take from your desk down to the local Starbucks, and which local Starbucks you go to, three years from now that raw data has very little value. So it would make sense to purge it.

On Sep 26, 2014, at 11:21 AM, Ted Yu yuzhih...@gmail.com wrote:

This is a good writeup that should probably go into the refguide.

bq. example would be password reset attempts
In some systems such information would have a long retention period (maybe to conform to certain regulations).

Cheers

On Fri, Sep 26, 2014 at 9:10 AM, Wilm Schumacher wilm.schumac...@cawoom.com wrote:

Hi,

your mail got me thinking about a general answer. I think a good answer would be: all data that are only useful for a specific time AND are potentially generated without bound for a finite number of users should have a TTL, OR any case where the available space is very small compared to the number of users.

An example is cookies. A single user generates a handful of cookie events per day. Let's just look at the creation of a session, perhaps once a day. So for a finite number of users and a finite amount of data per user, the number of cookies would grow and grow by the day, without any useful purpose (under the assumption that you use a cookie system with sessions that expire).

Another example would be password reset attempts or something like that in a web app. These events should expire after a number of days and should be deleted after a longer time (to mark an attempt as out of date there could even be two different expiration times). Without that, the password reset attempts would just be old junk in your db, or you would have to run MR jobs to clean the db on a regular basis.

An example could also be an aggregation service, where a user can make a list of things to be saved that are generated elsewhere (e.g. news headlines). A finite number of users would generate an unbounded number of rows just by waiting. So you could make a policy where only the last 30 days are aggregated, and this could be implemented by a TTL.

A further example would be a mechanism to prevent brute-force attacks, where you save the last attempts, and if a user has more than N attempts in M seconds the attempt fails. This could be implemented by a column family "attempts" where the last attempts are saved; if there are more than N, the attempt fails. And when you set the TTL to M seconds, you are ready to go.

An example for the second use case (finite space for a large number of users) would be a service that serves files for fast and easy sharing between users, paid for by ads. You have a large user base but very little space. An example would be one-click hosting or something like that, where the users use the files for perhaps a week and then forget all about them. So your policy could be something like "expire 30 days after last use", which you can implement with just a TTL and without MR jobs.

All these examples come from using HBase to implement user-driven systems, web apps and the like. However, it should be easy to find examples for more general applications of HBase. I once read a question from an HBase user whose logging (which was stored in HBase) grew too large; he only wanted to keep the last N days and asked for help implementing an MR job that regularly removes older log messages. A TTL and he was good to go ;). A shell sketch of how these TTLs are set follows below.

Hope this helped.

Best wishes

Wilm

On 26.09.2014 at 17:20, yonghu wrote:

Hello,

Can anyone give me a concrete use case for TTL deletions? I mean, in which situation should we set the TTL property?

regards!

Yong
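As a concrete illustration of the brute-force and log-retention examples above: the TTL is just a column-family attribute given in seconds. A sketch in the HBase shell, with table and family names made up for the example:

# keep login attempts only for M = 60 seconds
create 'login_attempts', {NAME => 'attempts', TTL => 60}

# keep raw log messages for 30 days on an existing table
disable 'app_logs'
alter 'app_logs', {NAME => 'raw', TTL => 2592000}
enable 'app_logs'

Expired cells stop being returned to clients once the TTL has passed and are physically removed at the next major compaction, so no MR cleanup job is needed.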
Re: Getting Class not Found Exception while attaching CoProcessor jar (in HDFS) to table
What endpoint class is in your jar? The ClassNotFoundException means the class given in your command cannot be found.

Cheers

On Sep 30, 2014, at 4:52 AM, Vikram Singh Chandel vikramsinghchan...@gmail.com wrote:

Hi, HBase: 0.98.1, CDH 5.1.1. When I try to attach the coprocessor jar to the table, I get the following exceptions in the region server logs: ... Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.coprocessor.RowCountEndpointCoPro ... Command used to load the jar: alter 'vsc_sample', METHOD => 'table_att', 'coprocessor' => 'hdfs://InCites-head.amers1b.ciscloud:8020/user/cloud/ICDS/CoPro/lib/RowCountCoPro0.004.jar|org.apache.hadoop.hbase.coprocessor.RowCountEndpointCoPro|1001'
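One way to check that, assuming a local copy of the jar (the file name below is the one from the command; the grep pattern is just the class's simple name):

jar tf RowCountCoPro0.004.jar | grep RowCountEndpointCoPro

The entry printed (e.g. some/package/RowCountEndpointCoPro.class) has to correspond, slash for dot, to the fully-qualified class name given after the first '|' in the alter command; if the class actually lives in a package other than org.apache.hadoop.hbase.coprocessor, the command needs that package name instead.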
Re: HBase 0.98.6.1 maven dependency
For 0.98.x you should use hbase-client. See the FAQ item on updating a Maven-managed project from 0.94 to 0.98: http://hbase.apache.org/book.html#d0e22846

-- Sean

On Sep 30, 2014 7:20 AM, innowireless TaeYun Kim taeyun@innowireless.co.kr wrote:

Hi, I'm relatively new to HBase and Maven. Currently I'm trying to add a dependency on hbase-0.98.6.1-hadoop2.jar to my application, but when I run 'mvn package' after adding the dependency, it fails ...
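In pom.xml terms that would look roughly like the block below; hbase-client is usually enough for a pure client application, and the exact set of module dependencies depends on which parts of the API you use:

<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>0.98.6.1-hadoop2</version>
</dependency>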
are column qualifiers safe as user-inputted values?
Hi,

I'm wondering if it's safe to use user-inputted values as column qualifiers. I realise there may be a sensible size limit, but that's easily checked.

The scenario is storing simple key/value pairs as column/value, like someone's preferences:

FavouriteColour=Red
FavouritePet=Cat

where the user may get to choose both the key and the value.

Basically the concern is special characters and/or special parsing of the column names. Column names are written as family_name:column_qualifier, so what happens if people put more colons in the qualifier, or escape characters like backspace or other control characters, etc.? Is there any danger, or is it all just uninterpreted byte values after the first colon?

thanks
-- Ted.
Re: are column qualifiers safe as user-inputted values?
This depends more on your parsing code than on HBase. All values are converted into byte[]s for HBase. Once your code has parsed the user input and generated the byte[], there's no place for ambiguity on the HBase side.

On Tue, Sep 30, 2014 at 5:19 PM, Ted r6squee...@gmail.com wrote:

Hi, I'm wondering if it's safe to use user-inputted values as column qualifiers. ... Basically the concern is special characters and/or special parsing of the column names ... Is there any danger, or is it all just uninterpreted byte values after the first colon?
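A small sketch of that point against the 0.98 client API; the table name, family name, and sample input below are made up for illustration. The family and the qualifier are passed as separate byte[] arguments, so the family:qualifier colon syntax only exists in the shell and in a few string-based helpers, and a colon or control character in the user-supplied key is never re-parsed, it is just bytes.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class UserPrefs {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // hypothetical table "user_prefs" with a single family "p"
    HTable table = new HTable(conf, "user_prefs");
    try {
      String userId = "user42";
      // user input, colons and control characters included
      String userKey = "Favourite:Colour\u0008";
      String userValue = "Red";

      // family and qualifier are separate byte[] arguments; nothing
      // re-parses the qualifier for colons or escapes
      Put put = new Put(Bytes.toBytes(userId));
      put.add(Bytes.toBytes("p"), Bytes.toBytes(userKey), Bytes.toBytes(userValue));
      table.put(put);

      // read it back with exactly the same qualifier bytes
      Get get = new Get(Bytes.toBytes(userId));
      Result r = table.get(get);
      byte[] stored = r.getValue(Bytes.toBytes("p"), Bytes.toBytes(userKey));
      System.out.println(Bytes.toString(stored));   // prints "Red"
    } finally {
      table.close();
    }
  }
}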