HBase Negation or NOT operator
Hello All,

Does HBase not support an SQL NOT operator on complex filters? I would like to filter out whatever matches a complex nested filter. My use case is to parse a query like the one below and build an HBase filter from it:

(field1=value1 AND NOT ((field2=value2 OR field3=value3) AND field4=value4))

How should I go about this? Any ideas? Which would be the better approach: implementing a custom filter that excludes rows qualified by another filter, or converting the input query into its opposite?

Thanks,
Ashwin
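One way to see the "convert the input query into an opposite query" option: push the NOT inward with De Morgan's laws so that only leaf comparisons end up negated. A minimal plain-Java sketch under stated assumptions (the class and method names here are stand-ins, not HBase API; in real HBase client code the AND/OR nodes would map to FilterList with Operator.MUST_PASS_ALL / MUST_PASS_ONE, and a negated leaf to a comparison filter with CompareOp.NOT_EQUAL):

```java
import java.util.Map;

public class DeMorgan {
    interface Expr { boolean eval(Map<String, String> row); Expr negate(); }

    static Expr eq(String f, String v) {
        return new Expr() {
            public boolean eval(Map<String, String> r) { return v.equals(r.get(f)); }
            public Expr negate() { return neq(f, v); }
        };
    }
    static Expr neq(String f, String v) {
        return new Expr() {
            public boolean eval(Map<String, String> r) { return !v.equals(r.get(f)); }
            public Expr negate() { return eq(f, v); }
        };
    }
    static Expr and(Expr a, Expr b) {
        return new Expr() {
            public boolean eval(Map<String, String> r) { return a.eval(r) && b.eval(r); }
            // NOT (a AND b) == (NOT a) OR (NOT b)
            public Expr negate() { return or(a.negate(), b.negate()); }
        };
    }
    static Expr or(Expr a, Expr b) {
        return new Expr() {
            public boolean eval(Map<String, String> r) { return a.eval(r) || b.eval(r); }
            // NOT (a OR b) == (NOT a) AND (NOT b)
            public Expr negate() { return and(a.negate(), b.negate()); }
        };
    }

    // field1=value1 AND NOT ((field2=value2 OR field3=value3) AND field4=value4)
    static Expr sampleQuery() {
        Expr inner = and(or(eq("field2", "value2"), eq("field3", "value3")),
                         eq("field4", "value4"));
        return and(eq("field1", "value1"), inner.negate());
    }

    public static void main(String[] args) {
        Map<String, String> hit  = Map.of("field1", "value1", "field2", "zzz",
                                          "field3", "zzz", "field4", "value4");
        Map<String, String> miss = Map.of("field1", "value1", "field2", "value2",
                                          "field3", "zzz", "field4", "value4");
        System.out.println(sampleQuery().eval(hit));  // true: inner disjunction fails, so NOT passes
        System.out.println(sampleQuery().eval(miss)); // false: inner conjunct holds, so NOT excludes it
    }
}
```

After the rewrite, every node is a plain AND/OR/compare, which is exactly the shape HBase's filter tree can express without any NOT operator.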
Re: user_permission ERROR: Unknown table
Well, we are trying to find out why our application fails when we use the 'Selectors' table. When we use 'Selectors2' it works just fine. So we wanted to see if it was a permission error. That is why we tried user_permission, but when it gave errors we wondered whether that might reinforce the idea that it is a permissions problem.

bg

--
View this message in context: http://apache-hbase.679495.n3.nabble.com/user-permission-ERROR-Unknown-table-tp4050797p4050838.html
Sent from the HBase User mailing list archive at Nabble.com.
Running HBase Client in WebSphere Application Server
Hello All,

Does anyone have experience using the HBase client in WebSphere (8.0)? My application processes messages from an MDB and writes them into an HBase cluster. For testing I used TomEE and everything worked perfectly. But now, in the production environment using WebSphere Application Server, I have some problems.

First, the login causes problems:

Caused by: javax.security.auth.login.LoginException: java.lang.NullPointerException
    at com.ibm.security.auth.module.LinuxLoginModule.login(LinuxLoginModule.java:165)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at javax.security.auth.login.LoginContext.invoke(LoginContext.java:795)
    at javax.security.auth.login.LoginContext.access$000(LoginContext.java:209)
    at javax.security.auth.login.LoginContext$5.run(LoginContext.java:732)
    at java.security.AccessController.doPrivileged(AccessController.java:314)
    at javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:729)
    at javax.security.auth.login.LoginContext.login(LoginContext.java:599)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:675)

Second, and more worrying, is my other problem:

ACWA0022I: In this particular case, a request was made to use the WorkArea service outside the lifecycle of a valid container (e.g. EJB/Web), which is not allowed by the WorkArea service.
[9/17/13 15:10:03:995 CEST] 0055 SystemOut O
    at com.ibm.ws.workarea.UserWorkAreaServerImpl.begin(UserWorkAreaServerImpl.java:299)
    at ch.css.cesar.base.log4j.WebSphereLogContext.getProperty(WebSphereLogContext.java:121)
    at ch.css.cesar.base.log4j.LogContextManager.getProperty(LogContextManager.java:226)
    at ch.css.cesar.base.log4j.DefaultLogInterceptor.preLog(DefaultLogInterceptor.java:77)
    at ch.css.cesar.base.log4j.CSSLogger.callAppenders(CSSLogger.java:49)
    at org.apache.log4j.Category.forcedLog(Category.java:391)
    at org.apache.log4j.Category.log(Category.java:856)
    at org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:209)
    at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:815)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)

Has anyone already seen these problems?

Thanks,
Lukas
show processlist equivalent in Hbase
Hi Guys,

I want to know: is there any tool for HBase equivalent to MySQL's show processlist? The HBase Master web page shows only requestsPerSecond and table details. I want to know which processes are generating load. Please guide me.

-Dhanasekaran.

Did I learn something today? If not, I wasted it.
RE: HBase Negation or NOT operator
https://github.com/forcedotcom/phoenix might be of help here. It is worth taking a look, at least.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com

From: Ashwin Jain [ashvyn.j...@gmail.com]
Sent: Tuesday, September 17, 2013 1:34 AM
To: user@hbase.apache.org
Subject: HBase Negation or NOT operator
Re: HBase Negation or NOT operator
You can always remove the NOT clause by changing the statement, but I'm wondering what your use case really is. HBase doesn't have secondary indexes so, unless you are doing a short-ish scan (let's say a million rows), it means you want to do a full table scan and that doesn't scale.

J-D

On Tue, Sep 17, 2013 at 1:34 AM, Ashwin Jain ashvyn.j...@gmail.com wrote:
Re: HBase and Hive
Many thanks for the assist. That makes sense. The map keys represent the column-family columns, and the values the column values, right?

I think I've made progress, but now am trying to resolve a classpath issue with Hive that others have run into:

java.io.IOException: Cannot create an instance of InputSplit class = org.apache.hadoop.hive.hbase.HBaseSplit:org.apache.hadoop.hive.hbase.HBaseSplit

and

Error: java.lang.ClassNotFoundException: org.apache.zookeeper.KeeperException

I am trying to solve that by setting up $HIVE_HOME/auxlib with the correct jars and defining auxlib in hive-site.xml. I noticed Hive 0.11.0 comes with hbase-0.94.6.1, but I'm running hbase 0.94.11. Could that also cause incompatibilities?

-Michael

From: kulkarni.swar...@gmail.com
To: user@hbase.apache.org; Michael Kintzer rock...@yahoo.com
Sent: Monday, September 16, 2013 4:59 PM
Subject: Re: HBase and Hive

"hbase.columns.mapping" = ":key,f:vals"

This is where the error is. Instead of vals, you should have the name of a column under that column family. If you want to pull in all the columns, you can simply change the mapping to "f:" and it will pull in all the columns. However, be sure to change the corresponding Hive column to be of a map type then. Hope this helps.

On Mon, Sep 16, 2013 at 6:33 PM, Michael Kintzer rock...@yahoo.com wrote:

Hi, newbie here.

hbase 0.94.11
hadoop 1.2.1
hive 0.11.0

I've created an HBase table in the hbase shell using the command:

create 'mytable', 'f'

I've loaded data into that table using a Thrift Ruby client. A table row has a string key like 'someurl.com:yyyy-mm-dd'. The column family 'f' has a variable number of columns/cells of data that look like 'f:n' (timestamp=some timestamp, value='some JSON'), where n is an integer. All this data is viewable in the hbase shell and via Thrift.

I've created a Hive external table pointing to this HBase table with:

create external table hbase_mytable (id string, vals ARRAY<string>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f:vals")
TBLPROPERTIES ("hbase.table.name" = "mytable");

This succeeds. However, any HiveQL queries against this table return no data. Even select count(*) from hbase_mytable returns 0 records. Does anyone have a suggestion as to what might be missing from my Hive installation that prevents it from seeing the data in the HBase table?

Much appreciated,
Michael

--
Swarnim
hbase schema design
Howdy all,

I'm trying to use HBase for the first time (plenty of prior experience with RDBMSs, though), and I have a couple of questions after reading The Book.

I am a bit confused by the advice in the HBase book to reduce the row size. It states that every cell value is accompanied by its coordinates (row, column and timestamp). Just to be thorough: am I to understand that the row key is stored and/or read for every cell value in a record, or just once per column family in a record?

I am intrigued by the rows-as-columns design described in the book at http://hbase.apache.org/book.html#rowkey.design. To make a long story short, I will end up with a table storing event types and the number of occurrences on each day. I would prefer to have the event description as the row key and the dates when it happened as columns - up to 7300 of them for roughly 20 years. However, the event description is a string of 0.1 to 2 KB, and if it is stored with each cell value I will need to use a (shorter) surrogate value. Is there built-in functionality in HBase to generate (integer) surrogate values that can be used in the row key, or does it need to be hand-coded from scratch?
Re: HBase and Hive
hbase 0.94.6.1 and hbase 0.94.11 are compatible. Looks like the Hive user mailing list would be a better place for future discussion of the remaining issues.

On Tue, Sep 17, 2013 at 10:19 AM, Michael Kintzer rock...@yahoo.com wrote:
Re: user_permission ERROR: Unknown table
Ah, I see. Well, unless you set up Secure HBase there won't be any permissions enforcement. So in which way is your application failing to use Selectors? Do you have an error message or stack trace handy?

J-D

On Tue, Sep 17, 2013 at 5:43 AM, BG bge...@mitre.org wrote:
Re: show processlist equivalent in Hbase
(putting the CDH user list in BCC, please don't cross-post)

The web UIs for both the master and the region server have a section called Tasks, with links like these:

Tasks
  Show All Monitored Tasks
  Show non-RPC Tasks
  Show All RPC Handler Tasks
  Show Active RPC Calls
  Show Client Operations
  View as JSON

J-D

On Tue, Sep 17, 2013 at 5:41 AM, Dhanasekaran Anbalagan bugcy...@gmail.com wrote:
Re: HBase and Hive
Hi Michael,

One way to solve that is to create symbolic links from <hive directory>/<version>/libexec/lib/zookeeper-3.4.3.jar and <hive directory>/<version>/libexec/lib/hbase-0.94.6.1.jar to your Hadoop lib folder: <hadoop directory>/<version>/libexec/lib. Or simply copy those jars. I'm not sure this is the ideal solution, though. I hope this helps!

2013/9/17 Michael Kintzer rock...@yahoo.com
Re: hbase schema design
I guess you were referring to section 6.3.2.

bq. rowkey is stored and/ or read for every cell value

The above is true.

bq. the event description is a string of 0.1 to 2Kb

You can enable Data Block Encoding to reduce storage.

Cheers

On Tue, Sep 17, 2013 at 9:44 AM, Adrian CAPDEFIER chivas314...@gmail.com wrote:
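To make the Data Block Encoding suggestion concrete: the PREFIX scheme stores, for each key in a block, how many leading bytes it shares with the previous key plus only the differing suffix, so long repetitive row keys (like a 2 KB event description repeated per cell) shrink considerably. A toy, in-memory model of the idea (the real on-disk format in HBase is more involved; this is only a sketch of the principle):

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixEncode {
    // Length of the longest common prefix of two keys.
    static int commonPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length()), i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }

    // Encode sorted keys as "sharedPrefixLength:suffix" entries.
    static List<String> encode(List<String> sortedKeys) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (String k : sortedKeys) {
            int p = commonPrefix(prev, k);
            out.add(p + ":" + k.substring(p)); // reuse p bytes of the previous key
            prev = k;
        }
        return out;
    }

    // Rebuild the full keys from the encoded entries.
    static List<String> decode(List<String> encoded) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (String e : encoded) {
            int sep = e.indexOf(':');
            int p = Integer.parseInt(e.substring(0, sep));
            String k = prev.substring(0, p) + e.substring(sep + 1);
            out.add(k);
            prev = k;
        }
        return out;
    }

    public static void main(String[] args) {
        // Consecutive sorted keys share most of their bytes, so only short
        // suffixes are stored after the first key.
        List<String> keys = List.of("event:login:2013-09-15", "event:login:2013-09-16",
                                    "event:login:2013-09-17", "event:logout:2013-09-17");
        List<String> enc = encode(keys);
        System.out.println(enc);
        System.out.println(decode(enc).equals(keys)); // round-trips losslessly
    }
}
```

Whether this helps Adrian's case depends on how much adjacent event descriptions overlap; unrelated 2 KB strings share little prefix, which is why a surrogate key can still win.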
Re: hbase schema design
w.r.t. Data Block Encoding, you can find some performance numbers here: https://issues.apache.org/jira/browse/HBASE-4218?focusedCommentId=13123337page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13123337

On Tue, Sep 17, 2013 at 10:49 AM, Adrian CAPDEFIER chivas314...@gmail.com wrote:
Re: hbase schema design
Thanks for the tip. In the data warehousing world I used to call them surrogate keys - I wonder if there's any difference between the two.

On Tue, Sep 17, 2013 at 6:41 PM, Vladimir Rodionov vrodio...@carrieriq.com wrote:

Is there a built-in functionality to generate (integer) surrogate values in hbase that can be used on the rowkey or does it need to be hand code it from scratch?

There is no such functionality in HBase. What you are asking for is known as dictionary compression: a unique 1-1 association between arbitrary strings and numeric values.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com
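The dictionary compression Vladimir describes can be hand-rolled: assign each distinct event description a compact numeric surrogate exactly once, and keep a reverse mapping for lookups. In HBase this would typically be two tables (description to id and id to description) with the counter maintained atomically, e.g. via incrementColumnValue and checkAndPut; the in-process sketch below only illustrates the shape of the idea, not a distributed implementation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class SurrogateDictionary {
    private final Map<String, Long> toId = new HashMap<>();
    private final Map<Long, String> toDescription = new HashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    // Return the surrogate for a description, minting a new id on first sight.
    public synchronized long idFor(String description) {
        return toId.computeIfAbsent(description, d -> {
            long id = nextId.getAndIncrement();
            toDescription.put(id, d);   // keep the reverse mapping in sync
            return id;
        });
    }

    // Reverse lookup: recover the original description from a surrogate.
    public synchronized String descriptionFor(long id) {
        return toDescription.get(id);
    }

    public static void main(String[] args) {
        SurrogateDictionary dict = new SurrogateDictionary();
        long a = dict.idFor("a very long event description, up to 2 KB");
        long b = dict.idFor("another long event description");
        // The same description always maps to the same short id,
        // so the id can serve as the row key in place of the 2 KB string.
        System.out.println(a == dict.idFor("a very long event description, up to 2 KB"));
        System.out.println(dict.descriptionFor(b));
    }
}
```

The hard part in a real cluster is making "mint a new id" atomic across clients, which is where HBase's atomic increment and check-and-put operations come in.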
Re: How to manage retry failures in the HBase client
bq. What is the formal way to request a specific documentation change?

Once the suggested change is acknowledged, a JIRA can be opened where you attach a patch.

bq. Do I need to sign a contributor agreement?

I don't think so.

On Tue, Sep 17, 2013 at 10:48 AM, Tom Brown tombrow...@gmail.com wrote:
Re: hbase schema design
Thank you for confirming the rowkey is written for every cell value (I was referring to 6.3.2 indeed). I have looked into data block encoding, but I'm not sure it would help me (more so if I need to link this table to a separate table later on). I will look into the surrogate value option.

On Tue, Sep 17, 2013 at 5:53 PM, Ted Yu yuzhih...@gmail.com wrote:
Re: How to manage retry failures in the HBase client
I had read that section for those values, but it was unclear (the hbase.client.retries.number description subtly switches to describing hbase.client.pause, and I missed that context switch). If I could make a recommendation for changing those items' descriptions, I would rearrange them like so:

hbase.client.pause
General client pause value. Used mostly as the value to wait before running a retry of a failed get, region lookup, etc. The actual retry interval is a rough function based on this setting. At first we retry at this interval, but then with backoff we pretty quickly reach retrying every ten seconds. See HConstants#RETRY_BACKOFF for how the backoff ramps up.
Default: 100

hbase.client.retries.number
Maximum retries. Used as the maximum for all retryable operations such as getting a cell's value, starting a row update, etc. Change this setting and hbase.client.pause to suit your workload.
Default: 35

What is the formal way to request a specific documentation change? Do I need to sign a contributor agreement?

--Tom

On Tue, Sep 17, 2013 at 11:40 AM, Ted Yu yuzhih...@gmail.com wrote:
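The backoff Tom describes can be sketched numerically: the client sleeps hbase.client.pause milliseconds scaled by a per-attempt multiplier table. The multipliers below are paraphrased from HConstants.RETRY_BACKOFF and should be verified against the HBase version in use (the table has changed across releases); the point is how quickly a 100 ms pause ramps to roughly ten seconds per retry:

```java
import java.util.Arrays;

public class RetryBackoff {
    // Assumed multiplier table in the spirit of HConstants.RETRY_BACKOFF;
    // check your HBase version's source for the exact values.
    static final int[] RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    // Sleep before the given retry attempt (0-based), in milliseconds.
    // Attempts beyond the table reuse its last entry.
    static long pauseMillis(long clientPauseMs, int attempt) {
        int idx = Math.min(attempt, RETRY_BACKOFF.length - 1);
        return clientPauseMs * RETRY_BACKOFF[idx];
    }

    public static void main(String[] args) {
        long pause = 100; // hbase.client.pause default per the quoted docs
        long[] schedule = new long[10];
        for (int i = 0; i < schedule.length; i++) schedule[i] = pauseMillis(pause, i);
        // With these multipliers, retries settle at 10-second intervals by
        // the eighth attempt, matching "retrying every ten seconds".
        System.out.println(Arrays.toString(schedule));
    }
}
```

This also shows why a storm of retrying clients stays painful for minutes with 35 retries: the total sleep across all attempts is dominated by those repeated 10-20 second waits.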
How to manage retry failures in the HBase client
I have a region-server coprocessor that scans its portion of a table based on a request and summarizes the results (designed this way to reduce network data transfer). In certain circumstances the HBase cluster gets a bit overloaded, and a query will take too long. In that instance, the HBase client will retry the query (up to N times). When this happens, any other running queries will often time out and generate retries as well. This results in the cluster becoming unresponsive until I'm able to kill the clients that are retrying their requests.

I have found the hbase.client.retries.number property, but its description doesn't seem to describe the number of retries, rather the amount of time between retries. Is there a different property I can use to set the maximum number of retries? Or is this property mis-documented?

Thanks in advance!
--Tom
Re: How to manage retry failures in the HBase client
Have you looked at http://hbase.apache.org/book.html#hbase_default_configurations where hbase.client.retries.number and hbase.client.pause are explained? Cheers On Tue, Sep 17, 2013 at 10:34 AM, Tom Brown tombrow...@gmail.com wrote: I have a region-server coprocessor that scans its portion of a table based on a request and summarizes the results (designed this way to reduce network data transfer). In certain circumstances, the HBase cluster gets a bit overloaded and a query will take too long. In that instance, the HBase client will retry the query (up to N times). When this happens, any other running queries will often time out and generate retries as well. This results in the cluster becoming unresponsive until I'm able to kill the clients that are retrying their requests. I have found the hbase.client.retries.number property, but its description doesn't appear to cover the number of retries; rather, it describes the amount of time between retries. Is there a different property I can use to set the maximum number of retries? Or is this property mis-documented? Thanks in advance! --Tom
Re: HBase and Hive
Thanks Ted. I'll move further questions to the Hive list. From: Ted Yu yuzhih...@gmail.com To: user@hbase.apache.org user@hbase.apache.org; Michael Kintzer rock...@yahoo.com Cc: kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com Sent: Tuesday, September 17, 2013 10:24 AM Subject: Re: HBase and Hive hbase 0.94.6.1 and hbase 0.94.11 are compatible. Looks like the Hive user mailing list would be a better place for future discussion of the remaining issues. On Tue, Sep 17, 2013 at 10:19 AM, Michael Kintzer rock...@yahoo.com wrote: Many thanks for the assist. That makes sense. The map keys represent the column-family columns, and the values the column values, right? I think I've made progress but now am trying to resolve a classpath issue with Hive that others have run into: java.io.IOException: Cannot create an instance of InputSplit class = org.apache.hadoop.hive.hbase.HBaseSplit:org.apache.hadoop.hive.hbase.HBaseSplit and Error: java.lang.ClassNotFoundException: org.apache.zookeeper.KeeperException I'm trying to solve that by setting up $HIVE_HOME/auxlib with the correct jars and defining auxlib in hive-site.xml. I noticed Hive 0.11.0 ships with hbase-0.94.6.1, but I'm running hbase 0.94.11. Could that also cause incompatibilities? -Michael From: kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com To: user@hbase.apache.org user@hbase.apache.org; Michael Kintzer rock...@yahoo.com Sent: Monday, September 16, 2013 4:59 PM Subject: Re: HBase and Hive "hbase.columns.mapping" = ":key,f:vals" This is where the error is. Instead of vals, you should have the name of a column under that column family. If you want to pull in all the columns, you can simply change it to "f:" and it will pull in all the columns. However, be sure to change the corresponding Hive column to be of a map type then. Hope this helps. On Mon, Sep 16, 2013 at 6:33 PM, Michael Kintzer rock...@yahoo.com wrote: Hi, Newbie here.
hbase 0.94.11, hadoop 1.2.1, hive 0.11.0. I've created an HBase table in the hbase shell using the command: create 'mytable', 'f' I've loaded data into that table using a Thrift Ruby client. A table row has a string key like 'someurl.com:-mm-dd'. The column family 'f' has a variable number of columns/cells of data that look like: 'f:n', timestamp=some timestamp, value='some JSON', where n is an integer. All this data is viewable in the hbase shell and via Thrift. I've created a Hive external table pointing to this HBase table with: create external table hbase_mytable (id string, vals ARRAY<string>) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f:vals") TBLPROPERTIES ("hbase.table.name" = "mytable"); This succeeds. However, any Hive QL queries against this table return no data. Even select count(*) from hbase_mytable returns 0 records. Does anyone have a suggestion as to what might be missing from my Hive installation that prevents it from seeing the data in the HBase table? Much appreciated, Michael -- Swarnim
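For reference, a sketch of the corrected DDL that Swarnim's advice points toward: mapping the whole column family to a Hive MAP column instead of naming a non-existent qualifier. Table and family names are the ones used in this thread; this assumes the qualifiers ('1', '2', ...) and JSON values are both wanted as strings.

```sql
-- Map the entire column family 'f' to a Hive MAP column.
-- The map keys become the HBase column qualifiers and the
-- map values become the cell contents (the JSON strings).
CREATE EXTERNAL TABLE hbase_mytable (id STRING, vals MAP<STRING, STRING>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f:")
TBLPROPERTIES ("hbase.table.name" = "mytable");
```

With this mapping, a row's variable number of 'f:n' cells arrives in Hive as a single map per row, e.g. vals['1'] returns the JSON stored under qualifier '1'.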
Re: user_permission ERROR: Unknown table
Ah. I assumed user_permissions would work, but I guess more setup is necessary. We have just found the problem, and of course the problem was equivalent to leaving a semicolon off your Java. At some point either the 's' was dropped or the coprocessor was renamed: the table that works points to 'Thecoprocessors.jar', and the table that does not work points to 'Thecoprocessor.jar' (without the 's'). Thanks for everyone's help. Bg -- View this message in context: http://apache-hbase.679495.n3.nabble.com/user-permission-ERROR-Unknown-table-tp4050797p4050845.html Sent from the HBase User mailing list archive at Nabble.com.
HFile2 issue
Does HBase 0.94 use HFile v2 by default? I've read that HFile v2's encoding/de-duplication can reduce storage space by a further 20% or so. How do I enable HFile v2, and how do I configure it? -- In the Hadoop world I am just a novice exploring the entire Hadoop ecosystem; I hope one day I can contribute my own code. YanBit yankunhad...@gmail.com
Re: HFile2 issue
Hi Kuan, Are you migrating from a previous HBase version to 0.94? If not, all your HFiles should already be v2... JM 2013/9/17 kun yan yankunhad...@gmail.com Does HBase 0.94 use HFile v2 by default? I've read that HFile v2's encoding/de-duplication can reduce storage space by a further 20% or so. How do I enable HFile v2, and how do I configure it? -- In the Hadoop world I am just a novice exploring the entire Hadoop ecosystem; I hope one day I can contribute my own code. YanBit yankunhad...@gmail.com
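If you ever do need to check or pin the HFile version explicitly, the relevant property (to the best of my knowledge; verify against your hbase-default.xml) is hfile.format.version in hbase-site.xml. Since 0.94 already writes v2 by default, setting it is normally unnecessary:

```xml
<!-- hbase-site.xml: HFile format version. HBase 0.94 already defaults
     to version 2, so this is shown only for completeness. -->
<property>
  <name>hfile.format.version</name>
  <value>2</value>
</property>
```

Files written under an older version are rewritten in the current format as compactions run, which is why a cluster that was never on pre-0.92 releases should already be entirely v2.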
Re: HFile2 issue
Thanks Jean-Marc. I am now using HBase 0.94. 2013/9/18 Jean-Marc Spaggiari jean-m...@spaggiari.org Hi Kuan, Are you migrating from a previous HBase version to 0.94? If not, all your HFiles should already be v2... JM 2013/9/17 kun yan yankunhad...@gmail.com Does HBase 0.94 use HFile v2 by default? I've read that HFile v2's encoding/de-duplication can reduce storage space by a further 20% or so. How do I enable HFile v2, and how do I configure it? -- In the Hadoop world I am just a novice exploring the entire Hadoop ecosystem; I hope one day I can contribute my own code. YanBit yankunhad...@gmail.com