HBase Negation or NOT operator

2013-09-17 Thread Ashwin Jain
Hello All,

Does HBase not support an SQL NOT operator on complex filters? I would like
to filter out whatever matches a complex nested filter.

My use case is to parse a query like the one below and build an HBase filter
from it:
(field1=value1 AND NOT ((field2=value2 OR field3=value3) AND field4=value4))

How do I go about this? Any ideas? Which would be the better approach:
implementing a custom filter that excludes rows matched by another filter,
or converting the input query into its opposite?

Thanks,
Ashwin
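One concrete way to take the "convert the input query into an opposite query" route is to push the NOT inward with De Morgan's laws until only leaf comparisons are negated; each negated leaf then maps onto a SingleColumnValueFilter with CompareOp.NOT_EQUAL, and the AND/OR structure onto nested FilterLists (MUST_PASS_ALL / MUST_PASS_ONE). A self-contained sketch of just the rewrite step (the tuple-based AST is a made-up illustration, not an HBase API):

```python
def negate(node):
    """Return the logical negation of a predicate AST node."""
    kind = node[0]
    if kind == "cmp":                       # NOT (f = v)  ->  (f != v)
        field, op, value = node[1:]
        flip = {"=": "!=", "!=": "="}
        return ("cmp", field, flip[op], value)
    if kind == "and":                       # De Morgan: NOT(A AND B) -> NOT A OR NOT B
        return ("or",) + tuple(negate(c) for c in node[1:])
    if kind == "or":                        # De Morgan: NOT(A OR B) -> NOT A AND NOT B
        return ("and",) + tuple(negate(c) for c in node[1:])
    if kind == "not":                       # double negation cancels
        return node[1]
    raise ValueError("unknown node kind: %s" % kind)

def push_not(node):
    """Rewrite the tree so no NOT nodes remain."""
    kind = node[0]
    if kind == "not":
        return push_not(negate(node[1]))
    if kind in ("and", "or"):
        return (kind,) + tuple(push_not(c) for c in node[1:])
    return node

# (field1=value1 AND NOT ((field2=value2 OR field3=value3) AND field4=value4))
query = ("and",
         ("cmp", "field1", "=", "value1"),
         ("not", ("and",
                  ("or", ("cmp", "field2", "=", "value2"),
                         ("cmp", "field3", "=", "value3")),
                  ("cmp", "field4", "=", "value4"))))

print(push_not(query))
```

Running it on the query above yields (field1=value1 AND ((field2!=value2 AND field3!=value3) OR field4!=value4)), which contains no NOT and can be assembled from stock filters.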


Re: user_permission ERROR: Unknown table

2013-09-17 Thread BG
Well, we are trying to find out why our application does not work when we use
the 'Selectors' table. When we use 'Selectors2' it works just fine. So we
wanted to see if it was a permission error. That is why we tried
user_permission, but when it gave errors we wondered if that might confirm
that it is a permissions problem.

bg



--
View this message in context: 
http://apache-hbase.679495.n3.nabble.com/user-permission-ERROR-Unknown-table-tp4050797p4050838.html
Sent from the HBase User mailing list archive at Nabble.com.


Running HBase Client in WebSphere Application Server

2013-09-17 Thread lukas.wei...@bluewin.ch




Hello All,

Has anyone experience in using the HBase client in WebSphere (8.0)? My
application processes messages from an MDB and writes them into an HBase
cluster. For testing this, I used TomEE and everything worked perfectly.
But now, in the production environment using WebSphere Application Server, I
have some problems.

 

First, the login causes some problems:

 

Caused by: javax.security.auth.login.LoginException: java.lang.NullPointerException
at com.ibm.security.auth.module.LinuxLoginModule.login(LinuxLoginModule.java:165)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
at java.lang.reflect.Method.invoke(Method.java:611)
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:795)
at javax.security.auth.login.LoginContext.access$000(LoginContext.java:209)
at javax.security.auth.login.LoginContext$5.run(LoginContext.java:732)
at java.security.AccessController.doPrivileged(AccessController.java:314)
at javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:729)
at javax.security.auth.login.LoginContext.login(LoginContext.java:599)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:675)

 

 

The second problem is scarier:

 

ACWA0022I: In this particular case, a request was made to use the WorkArea service outside the lifecycle of a valid container (e.g. EJB/Web), which is not allowed by the WorkArea service.
[9/17/13 15:10:03:995 CEST] 0055 SystemOut O at com.ibm.ws.workarea.UserWorkAreaServerImpl.begin(UserWorkAreaServerImpl.java:299)
[9/17/13 15:10:03:995 CEST] 0055 SystemOut O at ch.css.cesar.base.log4j.WebSphereLogContext.getProperty(WebSphereLogContext.java:121)
[9/17/13 15:10:03:996 CEST] 0055 SystemOut O at ch.css.cesar.base.log4j.LogContextManager.getProperty(LogContextManager.java:226)
[9/17/13 15:10:03:996 CEST] 0055 SystemOut O at ch.css.cesar.base.log4j.DefaultLogInterceptor.preLog(DefaultLogInterceptor.java:77)
[9/17/13 15:10:03:996 CEST] 0055 SystemOut O at ch.css.cesar.base.log4j.CSSLogger.callAppenders(CSSLogger.java:49)
[9/17/13 15:10:03:996 CEST] 0055 SystemOut O at org.apache.log4j.Category.forcedLog(Category.java:391)
[9/17/13 15:10:03:996 CEST] 0055 SystemOut O at org.apache.log4j.Category.log(Category.java:856)
[9/17/13 15:10:03:996 CEST] 0055 SystemOut O at org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:209)
[9/17/13 15:10:03:996 CEST] 0055 SystemOut O at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:815)
[9/17/13 15:10:03:996 CEST] 0055 SystemOut O at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94)
[9/17/13 15:10:03:996 CEST] 0055 SystemOut O at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
[9/17/13 15:10:03:996 CEST] 0055 SystemOut O at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)

 

Has anyone already seen those problems?

 

Thanks,
Lukas


 

show processlist equivalent in Hbase

2013-09-17 Thread Dhanasekaran Anbalagan
Hi Guys,

I want to know: is there an equivalent of MySQL's 'show processlist' in HBase,
or any tool for it?
The HBase Master web page only shows requestsPerSecond and table details.
I want to know which processes are generating the load.

Please guide me.

-Dhanasekaran.

Did I learn something today? If not, I wasted it.


RE: HBase Negation or NOT operator

2013-09-17 Thread Vladimir Rodionov
https://github.com/forcedotcom/phoenix may be of help here. It is worth taking
a look at, at the least.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com


From: Ashwin Jain [ashvyn.j...@gmail.com]
Sent: Tuesday, September 17, 2013 1:34 AM
To: user@hbase.apache.org
Subject: HBase Negation or NOT operator

Hello All,

Does HBase not support an SQL NOT operator on complex filters? I would like
to filter out whatever matches a complex nested filter.

my use case is to parse a query like this(below) and build a HBase filter
from it.
(field1=value1 AND NOT ((field2=value2 OR field3=value3) AND field4=value4))

How to go about this , any ideas?  What will be a better approach -
implement a custom filter that excludes a row qualified by another filter
or to convert input query into an opposite query.

Thanks,
Ashwin

Confidentiality Notice:  The information contained in this message, including 
any attachments hereto, may be confidential and is intended to be read only by 
the individual or entity to whom this message is addressed. If the reader of 
this message is not the intended recipient or an agent or designee of the 
intended recipient, please note that any review, use, disclosure or 
distribution of this message or its attachments, in any form, is strictly 
prohibited.  If you have received this message in error, please immediately 
notify the sender and/or notificati...@carrieriq.com and delete or destroy any 
copy of this message and its attachments.


Re: HBase Negation or NOT operator

2013-09-17 Thread Jean-Daniel Cryans
You can always remove the NOT clause by changing the statement, but I'm
wondering what your use case really is. HBase doesn't have secondary
indexes so, unless you are doing a short-ish scan (let's say a million
rows), it means you want to do a full table scan and that doesn't scale.

J-D


On Tue, Sep 17, 2013 at 1:34 AM, Ashwin Jain ashvyn.j...@gmail.com wrote:

 Hello All,

 Does HBase not support an SQL NOT operator on complex filters? I would like
 to filter out whatever matches a complex nested filter.

 my use case is to parse a query like this(below) and build a HBase filter
 from it.
 (field1=value1 AND NOT ((field2=value2 OR field3=value3) AND
 field4=value4))

 How to go about this , any ideas?  What will be a better approach -
 implement a custom filter that excludes a row qualified by another filter
 or to convert input query into an opposite query.

 Thanks,
 Ashwin



Re: HBase and Hive

2013-09-17 Thread Michael Kintzer
Many thanks for the assist.  That makes sense.  The map keys represent the 
column-family columns, and the values the column values, right?

I think I've made progress but now am trying to resolve a classpath issue with 
hive that others have run into:

java.io.IOException: Cannot create an instance of InputSplit class = 
org.apache.hadoop.hive.hbase.HBaseSplit:org.apache.hadoop.hive.hbase.HBaseSplit


and 

Error: java.lang.ClassNotFoundException: org.apache.zookeeper.KeeperException


Trying to solve that by setting up $HIVE_HOME/auxlib with the correct jars and 
defining auxlib in hive-site.xml.  I noticed hive 0.11.0 comes with 
hbase-0.94.6.1, but I'm running hbase 0.94.11.  Could that also cause 
incompatibilities?

-Michael



 From: kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com
To: user@hbase.apache.org user@hbase.apache.org; Michael Kintzer 
rock...@yahoo.com 
Sent: Monday, September 16, 2013 4:59 PM
Subject: Re: HBase and Hive
 


 hbase.columns.mapping = :key,f:vals


This is where the error is. Instead of vals, you should have the name of the 
column under that column family. If you want to pull in all the columns, you can 
simply change the mapping to f: and it will pull in all the columns. However, be 
sure to change the corresponding Hive column to a map type then.

Hope this helps.
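Applied to the table from the original post, the external table definition would look roughly like this (a sketch only: the map value type is assumed to be string since the cells hold JSON text, and a query such as select vals['1'] from hbase_mytable would then read column f:1):

```sql
create external table hbase_mytable (id string, vals map<string,string>)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f:")
  TBLPROPERTIES ("hbase.table.name" = "mytable");
```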



On Mon, Sep 16, 2013 at 6:33 PM, Michael Kintzer rock...@yahoo.com wrote:

Hi,

Newbie here.

hbase 0.94.11
hadoop 1.2.1
hive 0.11.0

I've created an HBase table in hbase shell using command:   create 'mytable', 
'f'

I've loaded data into that table using a thrift Ruby client.   A table row has 
a string key like 'someurl.com:-mm-dd'.   The column-family 'f' has a 
variable number of columns/cells of data that look like:

'f:n'  timestamp=some timestamp, value='some JSON',  where n is an integer.

All this data is viewable in hbase shell and via thrift.

I've created a Hive external table pointing to this HBase table with:

create external table hbase_mytable (id string, vals ARRAY<string>)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f:vals")
  TBLPROPERTIES ("hbase.table.name" = "mytable");


This succeeds.  However, any Hive QL queries against this table return no data. 
Even select count(*) from hbase_mytable returns 0 records.

Anyone have a suggestion for me as to what might be missing from my hive 
installation that prevents it from seeing the data in the HBase table?

Much appreciated,

Michael



-- 
Swarnim 

hbase schema design

2013-09-17 Thread Adrian CAPDEFIER
Howdy all,

I'm trying to use hbase for the first time (plenty of other experience with
RDBMS database though), and I have a couple of questions after reading The
Book.

I am a bit confused by the advice to reduce the row size in the hbase
book. It states that every cell value is accompanied by its coordinates
(row, column and timestamp). I'm just trying to be thorough, so am I to
understand that the rowkey is stored and/or read for every cell value in a
record, or just once per column family in a record?

I am intrigued by the rows as columns design as described in the book at
http://hbase.apache.org/book.html#rowkey.design. To make a long story
short, I will end up with a table to store event types and number of
occurrences in each day. I would prefer to have the event description as
the row key and the dates when it happened as columns - up to 7300 for
roughly 20 years.
However, the event description is a string of 0.1 to 2Kb and if it is
stored for each cell value, I will need to use a surrogate (shorter) value.

Is there built-in functionality to generate (integer) surrogate values in
hbase that can be used in the rowkey, or does it need to be hand-coded
from scratch?


Re: HBase and Hive

2013-09-17 Thread Ted Yu
hbase 0.94.6.1 and hbase 0.94.11 are compatible.

Looks like the Hive user mailing list would be a better place for future
discussion of the remaining issues.


On Tue, Sep 17, 2013 at 10:19 AM, Michael Kintzer rock...@yahoo.com wrote:

 Many thanks for the assist.  That makes sense.  The map keys represent the
 column-family columns, and the values the column values, right?

 I think I've made progress but now am trying to resolve a classpath issue
 with hive that others have run into:

 java.io.IOException: Cannot create an instance of InputSplit class =
 org.apache.hadoop.hive.hbase.HBaseSplit:org.apache.hadoop.hive.hbase.HBaseSplit


 and

 Error: java.lang.ClassNotFoundException:
 org.apache.zookeeper.KeeperException


 Trying to solve that by setting up $HIVE_HOME/auxlib with the correct jars
 and defining auxlib in hive-site.xml.  I noticed hive 0.11.0 comes with
 hbase-0.94.6.1, but I'm running hbase 0.94.11.  Could that also cause
 incompatibilities?

 -Michael


 
  From: kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com
 To: user@hbase.apache.org user@hbase.apache.org; Michael Kintzer 
 rock...@yahoo.com
 Sent: Monday, September 16, 2013 4:59 PM
 Subject: Re: HBase and Hive



  hbase.columns.mapping = :key,f:vals


 This is where the error is. Instead of vals, you should have the name of
 the column under that column family. If you want to pull in all the column
 you can simply change it to f: and it will pull in all the columns.
 However be sure to change the corresponding hive column to be of a map type
 then.

 Hope this helps.



 On Mon, Sep 16, 2013 at 6:33 PM, Michael Kintzer rock...@yahoo.com
 wrote:

 Hi,
 
 Newbie here.
 
 hbase 0.94.11
 hadoop 1.2.1
 hive 0.11.0
 
 I've created an HBase table in hbase shell using command:   create
 'mytable', 'f'
 
 I've loaded data into that table using a thrift Ruby client.   A table
 row has a string key like 'someurl.com:-mm-dd'.   The column-family
 'f' has a variable number of columns/cells of data that look like:
 
 'f:n'  timestamp=some timestamp, value='some JSON',  where n is an
 integer.
 
 All this data is viewable in hbase shell and via thrift.
 
 I've created a Hive external table pointing to this HBase table with:
 
 create external table hbase_mytable (id string, vals ARRAYstring)

   STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   WITH SERDEPROPERTIES (hbase.columns.mapping = :key,f:vals)

   TBLPROPERTIES (hbase.table.name = mytable);
 
 
 This succeeds.  However any Hive QL queries against this table return no
 data.   Even select count(*) from hbase_mytable return 0 records.
 
 Anyone have a suggestion for me as to what might be missing from my hive
 installation that prevents it from seeing the data in the HBase table?
 
 Much appreciated,
 
 Michael
 


 --
 Swarnim



Re: user_permission ERROR: Unknown table

2013-09-17 Thread Jean-Daniel Cryans
Ah I see. Well, unless you set up Secure HBase there won't be any permission
enforcement.

So in which way is your application failing when using 'Selectors'? Do you
have an error message or stack trace handy?

J-D


On Tue, Sep 17, 2013 at 5:43 AM, BG bge...@mitre.org wrote:

 Well we are trying to find out why our application works when we use
 'Selectors' table.
 When we use 'Selectors2' it works just fine. So we wanted to see if it was
 a
 permission
 error. That is why we tried out user_permissions, but when they gave errors
 we wondered
 if that might enforce that maybe it is a permissions problem.

 bg



 --
 View this message in context:
 http://apache-hbase.679495.n3.nabble.com/user-permission-ERROR-Unknown-table-tp4050797p4050838.html
 Sent from the HBase User mailing list archive at Nabble.com.



Re: show processlist equivalent in Hbase

2013-09-17 Thread Jean-Daniel Cryans
(putting cdh user in BCC, please don't cross-post)

The web UIs for both the master and the region server have a section called
Tasks with a bunch of links like these:

Tasks

Show All Monitored Tasks | Show non-RPC Tasks | Show All RPC Handler Tasks |
Show Active RPC Calls | Show Client Operations | View as JSON

J-D


On Tue, Sep 17, 2013 at 5:41 AM, Dhanasekaran Anbalagan
bugcy...@gmail.comwrote:

 Hi Guys,

 I want know show processlist in mysql equivalent in hbase any tool is
 there?.
 In Hbase Master webpage says only requestsPerSecond and table details only.
 I want know which process hitting load

 Please guide me.

 -Dhanasekaran.

 Did I learn something today? If not, I wasted it.



Re: HBase and Hive

2013-09-17 Thread Pedro Assis
Hi Michael,

One way to solve that is to create symbolic links from
<hive directory>/<version>/libexec/lib/zookeeper-3.4.3.jar and
<hive directory>/<version>/libexec/lib/hbase-0.94.6.1.jar to your hadoop lib
folder: <hadoop directory>/<version>/libexec/lib. Or simply copy those jars.

I'm not sure if this is the ideal solution though.
I hope this helps!


2013/9/17 Michael Kintzer rock...@yahoo.com

 Many thanks for the assist.  That makes sense.  The map keys represent the
 column-family columns, and the values the column values, right?

 I think I've made progress but now am trying to resolve a classpath issue
 with hive that others have run into:

 java.io.IOException: Cannot create an instance of InputSplit class =
 org.apache.hadoop.hive.hbase.HBaseSplit:org.apache.hadoop.hive.hbase.HBaseSplit


 and

 Error: java.lang.ClassNotFoundException:
 org.apache.zookeeper.KeeperException


 Trying to solve that by setting up $HIVE_HOME/auxlib with the correct jars
 and defining auxlib in hive-site.xml.  I noticed hive 0.11.0 comes with
 hbase-0.94.6.1, but I'm running hbase 0.94.11.  Could that also cause
 incompatibilities?

 -Michael


 
  From: kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com
 To: user@hbase.apache.org user@hbase.apache.org; Michael Kintzer 
 rock...@yahoo.com
 Sent: Monday, September 16, 2013 4:59 PM
 Subject: Re: HBase and Hive



  hbase.columns.mapping = :key,f:vals


 This is where the error is. Instead of vals, you should have the name of
 the column under that column family. If you want to pull in all the column
 you can simply change it to f: and it will pull in all the columns.
 However be sure to change the corresponding hive column to be of a map type
 then.

 Hope this helps.



 On Mon, Sep 16, 2013 at 6:33 PM, Michael Kintzer rock...@yahoo.com
 wrote:

 Hi,
 
 Newbie here.
 
 hbase 0.94.11
 hadoop 1.2.1
 hive 0.11.0
 
 I've created an HBase table in hbase shell using command:   create
 'mytable', 'f'
 
 I've loaded data into that table using a thrift Ruby client.   A table
 row has a string key like 'someurl.com:-mm-dd'.   The column-family
 'f' has a variable number of columns/cells of data that look like:
 
 'f:n'  timestamp=some timestamp, value='some JSON',  where n is an
 integer.
 
 All this data is viewable in hbase shell and via thrift.
 
 I've created a Hive external table pointing to this HBase table with:
 
 create external table hbase_mytable (id string, vals ARRAYstring)

   STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   WITH SERDEPROPERTIES (hbase.columns.mapping = :key,f:vals)

   TBLPROPERTIES (hbase.table.name = mytable);
 
 
 This succeeds.  However any Hive QL queries against this table return no
 data.   Even select count(*) from hbase_mytable return 0 records.
 
 Anyone have a suggestion for me as to what might be missing from my hive
 installation that prevents it from seeing the data in the HBase table?
 
 Much appreciated,
 
 Michael
 


 --
 Swarnim



Re: hbase schema design

2013-09-17 Thread Ted Yu
I guess you were referring to section 6.3.2

bq. rowkey is stored and/ or read for every cell value

The above is true.

bq. the event description is a string of 0.1 to 2Kb

You can enable Data Block encoding to reduce storage.

Cheers
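For reference, data block encoding is set per column family; from the HBase shell in 0.94 it looks like this (FAST_DIFF is just one of the available encodings, and the disable/enable pair is needed unless online schema change is enabled):

```
disable 'mytable'
alter 'mytable', {NAME => 'f', DATA_BLOCK_ENCODING => 'FAST_DIFF'}
enable 'mytable'
```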



On Tue, Sep 17, 2013 at 9:44 AM, Adrian CAPDEFIER chivas314...@gmail.comwrote:

 Howdy all,

 I'm trying to use hbase for the first time (plenty of other experience with
 RDBMS database though), and I have a couple of questions after reading The
 Book.

 I am a bit confused by the advice to reduce the row size in the hbase
 book. It states that every cell value is accomplished by the coordinates
 (row, column and timestamp). I'm just trying to be thorough, so am I to
 understand that the rowkey is stored and/ or read for every cell value in a
 record or just once per column family in a record?

 I am intrigued by the rows as columns design as described in the book at
 http://hbase.apache.org/book.html#rowkey.design. To make a long story
 short, I will end up with a table to store event types and number of
 occurrences in each day. I would prefer to have the event description as
 the row key and the dates when it happened as columns - up to 7300 for
 roughly 20 years.
 However, the event description is a string of 0.1 to 2Kb and if it is
 stored for each cell value, I will need to use a surrogate (shorter) value.

 Is there a built-in functionality to generate (integer) surrogate values in
 hbase that can be used on the rowkey or does it need to be hand code it
 from scratch?



Re: hbase schema design

2013-09-17 Thread Ted Yu
w.r.t. Data Block Encoding, you can find some performance numbers here:

https://issues.apache.org/jira/browse/HBASE-4218?focusedCommentId=13123337page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13123337


On Tue, Sep 17, 2013 at 10:49 AM, Adrian CAPDEFIER
chivas314...@gmail.comwrote:

 Thank you for confirming the rowkey is written for every cell value (I was
 referring to 6.3.2 indeed). I have looked into data block encoding, but I'm
 not sure that would help me (more so if I need to link this table to a
 separate table later on).

 I will look into the surrogate value option.




 On Tue, Sep 17, 2013 at 5:53 PM, Ted Yu yuzhih...@gmail.com wrote:

  I guess you were referring to section 6.3.2
 
  bq. rowkey is stored and/ or read for every cell value
 
  The above is true.
 
  bq. the event description is a string of 0.1 to 2Kb
 
  You can enable Data Block encoding to reduce storage.
 
  Cheers
 
 
 
  On Tue, Sep 17, 2013 at 9:44 AM, Adrian CAPDEFIER 
 chivas314...@gmail.com
  wrote:
 
   Howdy all,
  
   I'm trying to use hbase for the first time (plenty of other experience
  with
   RDBMS database though), and I have a couple of questions after reading
  The
   Book.
  
   I am a bit confused by the advice to reduce the row size in the hbase
   book. It states that every cell value is accomplished by the
 coordinates
   (row, column and timestamp). I'm just trying to be thorough, so am I to
   understand that the rowkey is stored and/ or read for every cell value
  in a
   record or just once per column family in a record?
  
   I am intrigued by the rows as columns design as described in the book
 at
   http://hbase.apache.org/book.html#rowkey.design. To make a long story
   short, I will end up with a table to store event types and number of
   occurrences in each day. I would prefer to have the event description
 as
   the row key and the dates when it happened as columns - up to 7300 for
   roughly 20 years.
   However, the event description is a string of 0.1 to 2Kb and if it is
   stored for each cell value, I will need to use a surrogate (shorter)
  value.
  
   Is there a built-in functionality to generate (integer) surrogate
 values
  in
   hbase that can be used on the rowkey or does it need to be hand code it
   from scratch?
  
 



Re: hbase schema design

2013-09-17 Thread Adrian CAPDEFIER
Thanks for the tip. In the data warehousing world I used to call them
surrogate keys - I wonder if there's any difference between the two.


On Tue, Sep 17, 2013 at 6:41 PM, Vladimir Rodionov
vrodio...@carrieriq.comwrote:

  Is there a built-in functionality to generate (integer) surrogate values
 in
  hbase that can be used on the rowkey or does it need to be hand code it
  from scratch?

 There is no such functionality in HBase. What you are asking for is known as
 dictionary compression:
 a unique 1-1 association between arbitrary strings and numeric values.

 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Ted Yu [yuzhih...@gmail.com]
 Sent: Tuesday, September 17, 2013 9:53 AM
 To: user@hbase.apache.org
 Subject: Re: hbase schema design

 I guess you were referring to section 6.3.2

 bq. rowkey is stored and/ or read for every cell value

 The above is true.

 bq. the event description is a string of 0.1 to 2Kb

 You can enable Data Block encoding to reduce storage.

 Cheers



 On Tue, Sep 17, 2013 at 9:44 AM, Adrian CAPDEFIER chivas314...@gmail.com
 wrote:

  Howdy all,
 
  I'm trying to use hbase for the first time (plenty of other experience
 with
  RDBMS database though), and I have a couple of questions after reading
 The
  Book.
 
  I am a bit confused by the advice to reduce the row size in the hbase
  book. It states that every cell value is accomplished by the coordinates
  (row, column and timestamp). I'm just trying to be thorough, so am I to
  understand that the rowkey is stored and/ or read for every cell value
 in a
  record or just once per column family in a record?
 
  I am intrigued by the rows as columns design as described in the book at
  http://hbase.apache.org/book.html#rowkey.design. To make a long story
  short, I will end up with a table to store event types and number of
  occurrences in each day. I would prefer to have the event description as
  the row key and the dates when it happened as columns - up to 7300 for
  roughly 20 years.
  However, the event description is a string of 0.1 to 2Kb and if it is
  stored for each cell value, I will need to use a surrogate (shorter)
 value.
 
  Is there a built-in functionality to generate (integer) surrogate values
 in
  hbase that can be used on the rowkey or does it need to be hand code it
  from scratch?
 

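Vladimir's dictionary-compression suggestion is usually hand-rolled: an atomic counter hands out integer ids, and two lookup tables store the term-to-id and id-to-term mappings. In HBase the counter would typically be bumped with incrementColumnValue and the first-writer-wins insert done with checkAndPut; the sketch below models only that logic in memory (class and method names are invented for illustration):

```python
import itertools

class SurrogateDictionary:
    """In-memory model of a two-table surrogate-key dictionary.

    In HBase, term_to_id and id_to_term would each be a table, and
    next_id an atomic counter row bumped with incrementColumnValue;
    the first-writer-wins insert would be a checkAndPut."""

    def __init__(self):
        self.term_to_id = {}
        self.id_to_term = {}
        self.next_id = itertools.count(1)

    def get_or_assign(self, term):
        # Assign a fresh integer id only on first sight of the term.
        if term not in self.term_to_id:
            new_id = next(self.next_id)
            self.term_to_id[term] = new_id
            self.id_to_term[new_id] = term
        return self.term_to_id[term]

d = SurrogateDictionary()
print(d.get_or_assign("some long event description"))  # short surrogate: 1
print(d.get_or_assign("another event"))                # 2
print(d.get_or_assign("some long event description"))  # stable: 1 again
```

The short integer then replaces the 0.1-2Kb description in every rowkey, with the reverse table used to recover the description at read time.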



Re: How to manage retry failures in the HBase client

2013-09-17 Thread Ted Yu
bq. What is the formal way to request a specific documentation change?

Once the suggested change is acknowledged, a JIRA can be opened where you
attach a patch.

bq. Do I need to sign a contributor agreement?

I don't think so.


On Tue, Sep 17, 2013 at 10:48 AM, Tom Brown tombrow...@gmail.com wrote:

 I had read that section for those values, but it was unclear (the
 hbase.client.retries.number description subtly switches to describe
 hbase.client.pause, and I missed that context switch).

 If I could make a recommendation as to changing those items descriptions, I
 would rearrange it like so:

 hbase.client.pause
 General client pause value. Used mostly as value to wait before running a
 retry of a failed get, region lookup, etc. The actual retry interval is a
 rough function based on this setting. At first we retry at this interval
 but then with backoff, we pretty quickly reach retrying every ten seconds.
 See HConstants#RETRY_BACKOFF for how the backup ramps up.

 Default: 100

 hbase.client.retries.number
 Maximum retries. Used as maximum for all retryable operations such as the
 getting of a cell's value, starting a row update, etc. Change this setting
 and hbase.client.pause to suit your workload.

 Default: 35


 What is the formal way to request a specific documentation change? Do I
 need to sign a contributor agreement?

 --Tom


 On Tue, Sep 17, 2013 at 11:40 AM, Ted Yu yuzhih...@gmail.com wrote:

  Have you looked at
  http://hbase.apache.org/book.html#hbase_default_configurations where
  hbase.client.retries.number
  and hbase.client.pause are explained ?
 
  Cheers
 
 
  On Tue, Sep 17, 2013 at 10:34 AM, Tom Brown tombrow...@gmail.com
 wrote:
 
   I have a region-server coprocessor that scans it's portion of a table
  based
   on a request and summarizes the results (designed this way to reduce
   network data transfer).
  
   In certain circumstances, the HBase cluster gets a bit overloaded, and
 a
   query will take too long. In that instance, the HBase client will retry
  the
   query (up to N times). When this happens, any other running queries
 will
   often timeout and generate retries as well. This results in the cluster
   becoming unresponsive, until I'm able to kill the clients that are
  retrying
   their requests.
  
   I have found the hbase.client.retries.number property, but that
 doesn't
   claim to set the number of retries, rather the amount of time between
   retries. Is there a different property I can use to set the maximum
  number
   of retries? Or is this property mis-documented?
  
   Thanks in advance!
  
   --Tom
  
 



Re: hbase schema design

2013-09-17 Thread Adrian CAPDEFIER
Thank you for confirming the rowkey is written for every cell value (I was
referring to 6.3.2 indeed). I have looked into data block encoding, but I'm
not sure that would help me (more so if I need to link this table to a
separate table later on).

I will look into the surrogate value option.




On Tue, Sep 17, 2013 at 5:53 PM, Ted Yu yuzhih...@gmail.com wrote:

 I guess you were referring to section 6.3.2

 bq. rowkey is stored and/ or read for every cell value

 The above is true.

 bq. the event description is a string of 0.1 to 2Kb

 You can enable Data Block encoding to reduce storage.

 Cheers



 On Tue, Sep 17, 2013 at 9:44 AM, Adrian CAPDEFIER chivas314...@gmail.com
 wrote:

  Howdy all,
 
  I'm trying to use hbase for the first time (plenty of other experience
 with
  RDBMS database though), and I have a couple of questions after reading
 The
  Book.
 
  I am a bit confused by the advice to reduce the row size in the hbase
  book. It states that every cell value is accomplished by the coordinates
  (row, column and timestamp). I'm just trying to be thorough, so am I to
  understand that the rowkey is stored and/ or read for every cell value
 in a
  record or just once per column family in a record?
 
  I am intrigued by the rows as columns design as described in the book at
  http://hbase.apache.org/book.html#rowkey.design. To make a long story
  short, I will end up with a table to store event types and number of
  occurrences in each day. I would prefer to have the event description as
  the row key and the dates when it happened as columns - up to 7300 for
  roughly 20 years.
  However, the event description is a string of 0.1 to 2Kb and if it is
  stored for each cell value, I will need to use a surrogate (shorter)
 value.
 
  Is there a built-in functionality to generate (integer) surrogate values
 in
  hbase that can be used on the rowkey or does it need to be hand code it
  from scratch?
 



Re: How to manage retry failures in the HBase client

2013-09-17 Thread Tom Brown
I had read that section for those values, but it was unclear (the
hbase.client.retries.number description subtly switches to describing
hbase.client.pause, and I missed that context switch).

If I could make a recommendation as to changing those items descriptions, I
would rearrange it like so:

hbase.client.pause
General client pause value. Used mostly as value to wait before running a
retry of a failed get, region lookup, etc. The actual retry interval is a
rough function based on this setting. At first we retry at this interval
but then with backoff, we pretty quickly reach retrying every ten seconds.
See HConstants#RETRY_BACKOFF for how the backup ramps up.

Default: 100

hbase.client.retries.number
Maximum retries. Used as maximum for all retryable operations such as the
getting of a cell's value, starting a row update, etc. Change this setting
and hbase.client.pause to suit your workload.

Default: 35
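For anyone applying the two settings described above, they go in the client's hbase-site.xml; the values shown here are illustrative, not recommendations:

```xml
<property>
  <name>hbase.client.retries.number</name>
  <!-- maximum number of retries for retryable operations -->
  <value>5</value>
</property>
<property>
  <name>hbase.client.pause</name>
  <!-- base pause in ms; the backoff ramp multiplies this between retries -->
  <value>100</value>
</property>
```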


What is the formal way to request a specific documentation change? Do I
need to sign a contributor agreement?

--Tom


On Tue, Sep 17, 2013 at 11:40 AM, Ted Yu yuzhih...@gmail.com wrote:

 Have you looked at
 http://hbase.apache.org/book.html#hbase_default_configurations where
 hbase.client.retries.number
 and hbase.client.pause are explained ?

 Cheers


 On Tue, Sep 17, 2013 at 10:34 AM, Tom Brown tombrow...@gmail.com wrote:

  I have a region-server coprocessor that scans its portion of a table
 based
  on a request and summarizes the results (designed this way to reduce
  network data transfer).
 
  In certain circumstances, the HBase cluster gets a bit overloaded, and a
  query will take too long. In that instance, the HBase client will retry
 the
  query (up to N times). When this happens, any other running queries will
  often timeout and generate retries as well. This results in the cluster
  becoming unresponsive, until I'm able to kill the clients that are
 retrying
  their requests.
 
  I have found the hbase.client.retries.number property, but that doesn't
  claim to set the number of retries, rather the amount of time between
  retries. Is there a different property I can use to set the maximum
 number
  of retries? Or is this property mis-documented?
 
  Thanks in advance!
 
  --Tom
 



How to manage retry failures in the HBase client

2013-09-17 Thread Tom Brown
I have a region-server coprocessor that scans its portion of a table based
on a request and summarizes the results (designed this way to reduce
network data transfer).

In certain circumstances, the HBase cluster gets a bit overloaded, and a
query will take too long. In that instance, the HBase client will retry the
query (up to N times). When this happens, any other running queries will
often timeout and generate retries as well. This results in the cluster
becoming unresponsive, until I'm able to kill the clients that are retrying
their requests.

I have found the hbase.client.retries.number property, but that doesn't
claim to set the number of retries, rather the amount of time between
retries. Is there a different property I can use to set the maximum number
of retries? Or is this property mis-documented?

Thanks in advance!

--Tom


Re: How to manage retry failures in the HBase client

2013-09-17 Thread Ted Yu
Have you looked at
http://hbase.apache.org/book.html#hbase_default_configurations where
hbase.client.retries.number
and hbase.client.pause are explained ?

Cheers


On Tue, Sep 17, 2013 at 10:34 AM, Tom Brown tombrow...@gmail.com wrote:

 I have a region-server coprocessor that scans its portion of a table based
 on a request and summarizes the results (designed this way to reduce
 network data transfer).

 In certain circumstances, the HBase cluster gets a bit overloaded, and a
 query will take too long. In that instance, the HBase client will retry the
 query (up to N times). When this happens, any other running queries will
 often timeout and generate retries as well. This results in the cluster
 becoming unresponsive, until I'm able to kill the clients that are retrying
 their requests.

 I have found the hbase.client.retries.number property, but that doesn't
 claim to set the number of retries, rather the amount of time between
 retries. Is there a different property I can use to set the maximum number
 of retries? Or is this property mis-documented?

 Thanks in advance!

 --Tom



Re: HBase and Hive

2013-09-17 Thread Michael Kintzer
Thanks Ted.  I'll move further questions to the hive list.



 From: Ted Yu yuzhih...@gmail.com
To: user@hbase.apache.org user@hbase.apache.org; Michael Kintzer 
rock...@yahoo.com 
Cc: kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com 
Sent: Tuesday, September 17, 2013 10:24 AM
Subject: Re: HBase and Hive
 

hbase 0.94.6.1 and hbase 0.94.11 are compatible.

Looks like Hive user mailing list would be better place for future
discussion on remaining issues.


On Tue, Sep 17, 2013 at 10:19 AM, Michael Kintzer rock...@yahoo.com wrote:

 Many thanks for the assist.  That makes sense.  The map keys represent the
 column-family columns, and the values the column values, right?

 I think I've made progress but now am trying to resolve a classpath issue
 with hive that others have run into:

 java.io.IOException: Cannot create an instance of InputSplit class =
 org.apache.hadoop.hive.hbase.HBaseSplit:org.apache.hadoop.hive.hbase.HBaseSplit


 and

 Error: java.lang.ClassNotFoundException:
 org.apache.zookeeper.KeeperException


 Trying to solve that by setting up $HIVE_HOME/auxlib with the correct jars
 and defining auxlib in hive-site.xml.  I noticed hive 0.11.0 comes with
 hbase-0.94.6.1, but I'm running hbase 0.94.11.  Could that also cause
 incompatibilities?

 -Michael


 
  From: kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com
 To: user@hbase.apache.org user@hbase.apache.org; Michael Kintzer 
 rock...@yahoo.com
 Sent: Monday, September 16, 2013 4:59 PM
 Subject: Re: HBase and Hive



  "hbase.columns.mapping" = ":key,f:vals"


 This is where the error is. Instead of "vals", you should have the name of
 an actual column under that column family. If you want to pull in all the
 columns, you can simply change the mapping to "f:" and it will pull in the
 whole family. However, be sure to change the corresponding Hive column to
 be of a map type then.

 Hope this helps.



 On Mon, Sep 16, 2013 at 6:33 PM, Michael Kintzer rock...@yahoo.com
 wrote:

 Hi,
 
 Newbie here.
 
 hbase 0.94.11
 hadoop 1.2.1
 hive 0.11.0
 
 I've created an HBase table in hbase shell using command:   create
 'mytable', 'f'
 
 I've loaded data into that table using a thrift Ruby client.   A table
 row has a string key like 'someurl.com:yyyy-mm-dd'.   The column-family
 'f' has a variable number of columns/cells of data that look like:
 
 'f:n'  timestamp=some timestamp, value='some JSON',  where n is an
 integer.
 
 All this data is viewable in hbase shell and via thrift.
 
 I've created a Hive external table pointing to this HBase table with:
 
 create external table hbase_mytable (id string, vals ARRAY<string>)

   STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f:vals")

   TBLPROPERTIES ("hbase.table.name" = "mytable");
 
 
 This succeeds.  However any Hive QL queries against this table return no
 data.   Even select count(*) from hbase_mytable returns 0 records.
 
 Anyone have a suggestion for me as to what might be missing from my hive
 installation that prevents it from seeing the data in the HBase table?
 
 Much appreciated,
 
 Michael
 


 --
 Swarnim
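For readers following this thread: putting Swarnim's two suggestions together, the whole-family mapping would look roughly like this (a sketch, untested against the poster's data; the map keys become the qualifiers 1..n and the values the JSON strings):

```sql
-- Sketch: map the entire 'f' family into a Hive MAP column.
create external table hbase_mytable (id string, vals map<string,string>)
  stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  with serdeproperties ("hbase.columns.mapping" = ":key,f:")
  tblproperties ("hbase.table.name" = "mytable");
```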


Re: user_permission ERROR: Unknown table

2013-09-17 Thread BG
Ah. I assumed user_permission would work, but I guess more setup is
necessary.

We have just found the problem. And of course the problem was equivalent to
leaving a semicolon off in your Java: at some point either the 's' was
dropped or the coprocessor was renamed.

The table that works points to 'Thecoprocessors.jar', while the table
that does not work points to 'Thecoprocessor.jar' (without the 's').

Thanks for everyone's help.

Bg



--
View this message in context: 
http://apache-hbase.679495.n3.nabble.com/user-permission-ERROR-Unknown-table-tp4050797p4050845.html
Sent from the HBase User mailing list archive at Nabble.com.


HFile2 issue

2013-09-17 Thread kun yan
 Does HBase 0.94 use HFile v2 by default? Does HFile v2's encoding
de-duplicate data and reduce storage space by about 20%? How do I enable
HFile v2, and how do I configure it?

-- 

In the Hadoop world, I am just a novice, explore the entire Hadoop
ecosystem, I hope one day I can contribute their own code

YanBit
yankunhad...@gmail.com


Re: HFile2 issue

2013-09-17 Thread Jean-Marc Spaggiari
Hi Kun,

Are you migrating from a previous HBase version to 0.94? If not, all your
HFiles should already be v2...

JM
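For completeness: on 0.92+ the write format can also be pinned explicitly in hbase-site.xml (property name per HFile.FORMAT_VERSION_KEY; this is a sketch, and on 0.94 v2 is already the default):

```xml
<!-- hbase-site.xml: pin the HFile write format (v2 is already the 0.94 default). -->
<property>
  <name>hfile.format.version</name>
  <value>2</value>
</property>
```

Note that the ~20% space saving usually comes from data block encoding, which is enabled per column family (e.g. DATA_BLOCK_ENCODING => 'FAST_DIFF' in the shell), not from the HFile version itself.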


2013/9/17 kun yan yankunhad...@gmail.com

  Does HBase 0.94 use HFile v2 by default? Does HFile v2's encoding
 de-duplicate data and reduce storage space by about 20%? How do I enable
 HFile v2, and how do I configure it?

 --

 In the Hadoop world, I am just a novice, explore the entire Hadoop
 ecosystem, I hope one day I can contribute their own code

 YanBit
 yankunhad...@gmail.com



Re: HFile2 issue

2013-09-17 Thread kun yan
Thanks Jean-Marc. I am now using HBase 0.94.


2013/9/18 Jean-Marc Spaggiari jean-m...@spaggiari.org

 Hi Kun,

 Are you migrating from a previous HBase version to 0.94? If not, all your
 HFiles should already be v2...

 JM


 2013/9/17 kun yan yankunhad...@gmail.com

   Does HBase 0.94 use HFile v2 by default? Does HFile v2's encoding
  de-duplicate data and reduce storage space by about 20%? How do I enable
  HFile v2, and how do I configure it?
 
  --
 
  In the Hadoop world, I am just a novice, explore the entire Hadoop
  ecosystem, I hope one day I can contribute their own code
 
  YanBit
  yankunhad...@gmail.com
 




-- 

In the Hadoop world, I am just a novice, explore the entire Hadoop
ecosystem, I hope one day I can contribute their own code

YanBit
yankunhad...@gmail.com