Re: Limit number of columns in column family

2013-09-19 Thread M. BagherEsmaeily
Any cell in the same row.
Sorry for my poor language!


On Thu, Sep 19, 2013 at 9:28 AM, Jean-Marc Spaggiari 
jean-m...@spaggiari.org wrote:

 Hi MBE,

 When you say cells with the least timestamp being removed, do you mean
 versions of the same cell, or any cell in the same row/CF?

 JM


 2013/9/18 M. BagherEsmaeily mbesmae...@gmail.com

  Hi,
  I have a column family for which I want the number of columns to have a
  specific limit; when the count exceeds the limit, the cells with the
  oldest timestamps should be removed, like a TTL based on count rather
  than time.
  Please guide me to the best and most efficient way to do this.
 
  Thanks.
  MBE
 



Re: Limit number of columns in column family

2013-09-19 Thread Jean-Marc Spaggiari
Don't worry about the language ;)

I don't think there is any mechanism today to limit the number of columns
in a column family.

There might be multiple options, but they will all have some drawbacks.

One option is to have a daily MapReduce job looking at each row and doing
the cleanup. This can work if you don't have millions of huge columns,
because you will have to keep track of all of them to see how many you have
and how many you need to remove...

There might be some other options, like keeping the index in the column name,
so you know you need to remove all columns with a name below XXX, where XXX
is the last index value minus the number of columns you want to keep.

etc.

JM
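
(For illustration only; this sketch is not part of the original message.) A
minimal Java sketch of the last idea, assuming the 0.94-era client API and
zero-padded numeric qualifiers; the class and parameter names are invented:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class ColumnTrimmer {
  // Keep only the newest 'limit' indexed columns of a row, assuming the
  // column qualifiers are zero-padded index strings ("00000001", ...).
  public static void trim(HTable table, byte[] row, byte[] family,
      long oldestIndex, long newestIndex, long limit) throws IOException {
    if (newestIndex - limit < oldestIndex) {
      return;  // nothing to trim yet
    }
    Delete delete = new Delete(row);
    // Remove every indexed qualifier older than (newestIndex - limit + 1).
    for (long i = oldestIndex; i <= newestIndex - limit; i++) {
      delete.deleteColumns(family, Bytes.toBytes(String.format("%08d", i)));
    }
    table.delete(delete);
  }
}

Such a trim could run in the daily MapReduce cleanup or right after each batch
of writes; it still has the drawback JM mentions of needing to know the
current index range.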


2013/9/18 M. BagherEsmaeily mbesmae...@gmail.com

 Any cell in the same row.
 Sorry for my poor language!


 On Thu, Sep 19, 2013 at 9:28 AM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:

  Hi MBE,
 
  When you say cells with the least timestamp being removed, do you mean
  versions of the same cell, or any cell in the same row/CF?
 
  JM
 
 
  2013/9/18 M. BagherEsmaeily mbesmae...@gmail.com
 
   Hi,
   I have a column family for which I want the number of columns to have a
   specific limit; when the count exceeds the limit, the cells with the
   oldest timestamps should be removed, like a TTL based on count rather
   than time.
   Please guide me to the best and most efficient way to do this.
  
   Thanks.
   MBE
  
 



Re: Limit number of columns in column family

2013-09-19 Thread M. BagherEsmaeily
Thanks,
I think this kind of limiting would not perform well for my millions of
records, and it is better to change my design.


Re: Running HBase on Yarn … HoYa ?

2013-09-19 Thread Steve Loughran
On 18 September 2013 21:43, Jay Vyas jayunit...@gmail.com wrote:

 How Are vendor specific versions of hbase running on yarn? Are they using
 Hoya?



I don't know who else is playing with it right now, but all it takes is a .tar
or .gz file (or a path to HBASE_HOME), and it execs hbase.sh after some (minor)
patching of hbase-site.xml. One irritant there is that, as the .tar file
keeps bin/hbase under a build-specific path, you need to specify the hbase
version (hbase-0.95.2) purely to find that path. I'd love hbase.tar to
have a version.properties file at the root to locate things.

What do you do in RPMs there? They have all that metadata to include
version info, don't they?

-steve



Re: Namenode log - /hbase/.archive/table_name is non empty

2013-09-19 Thread Jason Huang
Thanks Ted and JM.

Jason


On Wed, Sep 18, 2013 at 6:46 PM, Jean-Marc Spaggiari 
jean-m...@spaggiari.org wrote:

 But...

 if you can't update, then you will have to check out the 0.94.3 version from
 SVN, apply the patch manually, build, and re-deploy. The patch might be pretty
 easy to apply.

 JM


 2013/9/18 Ted Yu yuzhih...@gmail.com

  The fix is in 0.94.4
 
  It would be easier for you to upgrade to a newer release, since rolling
  restart is supported.
 
  Cheers
 
 
  On Wed, Sep 18, 2013 at 12:24 PM, Jason Huang jason.hu...@icare.com
  wrote:
 
   Hello,
  
   We are using hadoop 1.1.2 and HBase 0.94.3 and we found the following
   entries appear every minute in namenode's log:
  
   2013-09-17 14:00:25,710 INFO org.apache.hadoop.ipc.Server: IPC Server
   handler 5 on 54310, call delete(/hbase/.archive/mytable, false)
   from **.**.**.**:42912 error: java.io.IOException:
  /hbase/.archive/mytable
   is non empty
   .
  
   Searches found that this is likely due to HBASE-7465:
   https://issues.apache.org/jira/browse/HBASE-7465
  
   Since we do not have any plan to upgrade the HBase version, what's the best
   way to fix this? Could we take a source code package for 0.94.3 from
   somewhere, apply this patch manually, and then rebuild the jars? Does this
   patch have any other dependency code that's not in 0.94.3?
  
   thanks,
  
   Jason
  
 



Hbase in embedded mode

2013-09-19 Thread samar.opensource

Hi Guys,
Can we use HBase in an embedded mode, so that the whole of HBase starts in
the same JVM and there are no RPC calls? Something like the embedded
Java DBs.


Do we have something like this, or something close to it?
Regards,
Samar


Re: Hbase in embedded mode

2013-09-19 Thread Ted Yu
See 2.2.1 in http://hbase.apache.org/book.html#standalone_dist

On Sep 19, 2013, at 6:49 AM, samar.opensource samar.opensou...@gmail.com 
wrote:

 Hi Guys,
 Can we use HBase in an embedded mode, so that the whole of HBase starts in
 the same JVM and there are no RPC calls? Something like the embedded Java DBs.
 
 Do we have something like this, or something close to it?
 Regards,
 Samar


Re: openTSDB lose large amount of data when the client are writing

2013-09-19 Thread Jean-Daniel Cryans
Could happen if a region moves since locks aren't persisted, but if I were
you I'd ask on the opentsdb mailing list first.

J-D


On Thu, Sep 19, 2013 at 10:09 AM, Tianying Chang tich...@ebaysf.com wrote:

 Hi,

 I have a customer who uses openTSDB. Recently we found that less than 10%
 of the data is written; the rest is lost. Checking the RS log, there are
 many row-lock-related issues, like the ones below. It seems the large amount
 of writes to tsdb that need row locks caused the problem. Has anyone else seen
 a similar problem? Is it a bug in openTSDB, or is it due to HBase exposing a
 vulnerable API?

 org.apache.hadoop.hbase.UnknownRowLockException: Invalid row lock
 at
 org.apache.hadoop.hbase.regionserver.HRegionServer.getLockFromId(HRegionServer.java:2732)
 at
 org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2071)
 at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
 at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
 13/09/18 12:08:30 ERROR regionserver.HRegionServer:
 org.apache.hadoop.hbase.UnknownRowLockException: -6180307918863136448
 at
 org.apache.hadoop.hbase.regionserver.HRegionServer.unlockRow(HRegionServer.java:2765)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)


 Thanks
 Tian-Ying



Re: Hbase in embedded mode

2013-09-19 Thread samar kumar
Hi Ted
   I am aware of the standalone mode, but I was looking for something which
will not have any IPC calls; everything should be a local API call.

So no listening on ports, e.g. embedded DBs like Derby.

Regards
Samar
On 19 Sep 2013 19:20, Ted Yu yuzhih...@gmail.com wrote:

 See 2.2.1 in http://hbase.apache.org/book.html#standalone_dist

 On Sep 19, 2013, at 6:49 AM, samar.opensource 
 samar.opensou...@gmail.com wrote:

  Hi Guys,
  Can we use HBase in an embedded mode, so that the whole of HBase starts in
  the same JVM and there are no RPC calls? Something like the embedded
  Java DBs.
  
  Do we have something like this, or something close to it?
  Regards,
  Samar



Fwd: Stable version of Hadoop with Hbase

2013-09-19 Thread hadoop hive
-- Forwarded message --
From: hadoop hive hadooph...@gmail.com
Date: Thu, Sep 19, 2013 at 1:02 AM
Subject: Stable version of Hadoop
To: u...@hadoop.apache.org


Hi Folks,

I want to use HBase for my data storage on top of HDFS. Please help me
find the best version I should use, e.g. CDH4.

My data size would be around 500 GB - 5 TB.

My operations would be write-intensive.

Thanks


Bulkload into empty table with configureIncrementalLoad()

2013-09-19 Thread Dolan Antenucci
I have about 1 billion values I am trying to load into a new HBase table
(with just one column and column family), but am running into some issues.
 Currently I am trying to use MapReduce to import these by first converting
them to HFiles and then using LoadIncrementalHFiles.doBulkLoad().  I also
use HFileOutputFormat.configureIncrementalLoad() as part of my MR job.  My
code is essentially the same as this example:
https://github.com/Paschalis/HBase-Bulk-Load-Example/blob/master/src/cy/ac/ucy/paschalis/hbase/bulkimport/Driver.java

The problem I'm running into is that only 1 reducer is created
by configureIncrementalLoad(), and there is not enough space on this node
to handle all this data.  configureIncrementalLoad() should start one
reducer for every region the table has, so apparently the table only has 1
region -- maybe because it is empty and brand new (my understanding of how
regions work is not crystal clear)?  The cluster has 5 region servers, so
I'd at least like that many reducers to handle this loading.

On a side note, I also tried the command line tool, completebulkload, but
am running into other issues with this (timeouts, possible heap issues) --
probably due to only one server being assigned the task of inserting all
the records (i.e. I look at the region servers' logs, and only one of the
servers has log entries; the rest are idle).

Any help is appreciated

-Dolan Antenucci
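
(Added for illustration; not part of the original message.) A self-contained
sketch of the driver flow described above, assuming the 0.94-era APIs; the
table name, paths, and the simple tab-separated input format are assumptions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {
  // Turns "rowkey<TAB>value" lines into KeyValues for family "cf", qualifier "c".
  static class LineMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws java.io.IOException, InterruptedException {
      String[] parts = line.toString().split("\t", 2);
      byte[] row = Bytes.toBytes(parts[0]);
      KeyValue kv = new KeyValue(row, Bytes.toBytes("cf"),
          Bytes.toBytes("c"), Bytes.toBytes(parts[1]));
      ctx.write(new ImmutableBytesWritable(row), kv);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "bulkload");
    job.setJarByClass(BulkLoadDriver.class);
    job.setMapperClass(LineMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(KeyValue.class);
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    Path hfileDir = new Path(args[1]);
    FileOutputFormat.setOutputPath(job, hfileDir);

    HTable table = new HTable(conf, "mytable");
    // Sets the reducer count to the table's current region count (and wires in
    // the total-order partitioner), which is why an empty, un-split table
    // gets exactly one reducer.
    HFileOutputFormat.configureIncrementalLoad(job, table);

    if (job.waitForCompletion(true)) {
      new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, table);
    }
  }
}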


Re: Bulkload into empty table with configureIncrementalLoad()

2013-09-19 Thread Jean-Daniel Cryans
You need to create the table with pre-splits, see
http://hbase.apache.org/book.html#perf.writing

J-D
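
(Added for illustration; not part of the original reply.) A minimal sketch of
creating the target table pre-split before running the job, assuming the
0.94-era admin API; the table/family names and the one-character hex split
points (reasonable only if the row keys look hex-like) are assumptions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor desc = new HTableDescriptor("mytable");
    desc.addFamily(new HColumnDescriptor("cf"));

    // 15 split points give 16 regions, so configureIncrementalLoad() will
    // launch 16 reducers instead of 1.
    String hex = "123456789abcdef";
    byte[][] splits = new byte[hex.length()][];
    for (int i = 0; i < splits.length; i++) {
      splits[i] = Bytes.toBytes(hex.substring(i, i + 1));
    }
    admin.createTable(desc, splits);
    admin.close();
  }
}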


On Thu, Sep 19, 2013 at 9:52 AM, Dolan Antenucci antenucc...@gmail.comwrote:

 I have about 1 billion values I am trying to load into a new HBase table
 (with just one column and column family), but am running into some issues.
  Currently I am trying to use MapReduce to import these by first converting
 them to HFiles and then using LoadIncrementalHFiles.doBulkLoad().  I also
 use HFileOutputFormat.configureIncrementalLoad() as part of my MR job.  My
 code is essentially the same as this example:

 https://github.com/Paschalis/HBase-Bulk-Load-Example/blob/master/src/cy/ac/ucy/paschalis/hbase/bulkimport/Driver.java

 The problem I'm running into is that only 1 reducer is created
 by configureIncrementalLoad(), and there is not enough space on this node
 to handle all this data.  configureIncrementalLoad() should start one
 reducer for every region the table has, so apparently the table only has 1
 region -- maybe because it is empty and brand new (my understanding of how
 regions work is not crystal clear)?  The cluster has 5 region servers, so
 I'd at least like that many reducers to handle this loading.

 On a side note, I also tried the command line tool, completebulkload, but
 am running into other issues with this (timeouts, possible heap issues) --
 probably due to only one server being assigned the task of inserting all
 the records (i.e. I look at the region servers' logs, and only one of the
 servers has log entries; the rest are idle).

 Any help is appreciated

 -Dolan Antenucci



Re: openTSDB lose large amount of data when the client are writing

2013-09-19 Thread Stack
On Thu, Sep 19, 2013 at 10:09 AM, Tianying Chang tich...@ebaysf.com wrote:

 Hi,

 I have a customer who uses openTSDB. Recently we found that less than 10%
 of the data is written; the rest is lost. Checking the RS log, there are
 many row-lock-related issues, like the ones below. It seems the large amount
 of writes to tsdb that need row locks caused the problem. Has anyone else seen
 a similar problem? Is it a bug in openTSDB, or is it due to HBase exposing a
 vulnerable API?

 org.apache.hadoop.hbase.UnknownRowLockException: Invalid row lock
 at
 org.apache.hadoop.hbase.regionserver.HRegionServer.getLockFromId(HRegionServer.java:2732)
 at
 org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2071)
 at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
 at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
 13/09/18 12:08:30 ERROR regionserver.HRegionServer:
 org.apache.hadoop.hbase.UnknownRowLockException: -6180307918863136448
 at
 org.apache.hadoop.hbase.regionserver.HRegionServer.unlockRow(HRegionServer.java:2765)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)


 Thanks
 Tian-Ying



Local filesystem or hdfs?
St.Ack


Re: Hbase in embedded mode

2013-09-19 Thread Enis Söztutar
Right now we do not have what you suggest.

Eric has created an issue for this:
https://issues.apache.org/jira/browse/HBASE-8016

I think it makes a lot of sense, especially enabling HRegion as a library
to work on top of shared hdfs and building a simple layer to embed the
client side, etc.

The closest thing right now is MiniHBaseCluster, but that requires an
in-memory ZooKeeper, master, regionserver, etc., and still uses RPCs.

Enis
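
(Added for illustration; not part of the original reply.) A rough sketch of
the MiniHBaseCluster route via HBaseTestingUtility, which needs the HBase
test jar on the classpath; everything runs inside one JVM, but master,
regionserver, and ZooKeeper still exist and RPCs are still used, so it is
in-process rather than truly embedded:

import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class InProcessHBase {
  public static void main(String[] args) throws Exception {
    HBaseTestingUtility util = new HBaseTestingUtility();
    util.startMiniCluster();                    // master + RS + ZK in one JVM

    HTable table = util.createTable(Bytes.toBytes("t"), Bytes.toBytes("cf"));
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
    table.put(put);

    table.close();
    util.shutdownMiniCluster();
  }
}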


On Thu, Sep 19, 2013 at 11:29 AM, samar kumar samar.opensou...@gmail.comwrote:

 Hi Ted
 I am aware of the standalone mode, but I was looking for something which
 will not have any IPC calls; everything should be a local API call.

 So no listening on ports, e.g. embedded DBs like Derby.

 Regards
 Samar
 On 19 Sep 2013 19:20, Ted Yu yuzhih...@gmail.com wrote:

  See 2.2.1 in http://hbase.apache.org/book.html#standalone_dist
 
  On Sep 19, 2013, at 6:49 AM, samar.opensource 
  samar.opensou...@gmail.com wrote:
 
   Hi Guys,
   Can we use HBase in an embedded mode, so that the whole of HBase starts in
   the same JVM and there are no RPC calls? Something like the embedded
   Java DBs.
   
   Do we have something like this, or something close to it?
   Regards,
   Samar
 



openTSDB lose large amount of data when the client are writing

2013-09-19 Thread Tianying Chang
Hi, 

I have a customer who uses openTSDB. Recently we found that less than 10% of
the data is written; the rest is lost. Checking the RS log, there are many
row-lock-related issues, like the ones below. It seems the large amount of
writes to tsdb that need row locks caused the problem. Has anyone else seen a
similar problem? Is it a bug in openTSDB, or is it due to HBase exposing a
vulnerable API?

org.apache.hadoop.hbase.UnknownRowLockException: Invalid row lock
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getLockFromId(HRegionServer.java:2732)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2071)
at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
13/09/18 12:08:30 ERROR regionserver.HRegionServer: 
org.apache.hadoop.hbase.UnknownRowLockException: -6180307918863136448
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.unlockRow(HRegionServer.java:2765)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)


Thanks
Tian-Ying 


Re: Bulkload into empty table with configureIncrementalLoad()

2013-09-19 Thread Dolan Antenucci
Thanks J-D.  Any recommendations on how to determine what splits to use?
 For the keys I'm using strings, so wasn't sure what to put for my startKey
and endKey. For number of regions, I have a table pre-populated with the
same data (not using bulk load), so I can see that it has 68 regions.


On Thu, Sep 19, 2013 at 12:55 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote:

 You need to create the table with pre-splits, see
 http://hbase.apache.org/book.html#perf.writing

 J-D


 On Thu, Sep 19, 2013 at 9:52 AM, Dolan Antenucci antenucc...@gmail.com
 wrote:

  I have about 1 billion values I am trying to load into a new HBase table
  (with just one column and column family), but am running into some
 issues.
   Currently I am trying to use MapReduce to import these by first
 converting
  them to HFiles and then using LoadIncrementalHFiles.doBulkLoad().  I also
  use HFileOutputFormat.configureIncrementalLoad() as part of my MR job.
  My
  code is essentially the same as this example:
 
 
 https://github.com/Paschalis/HBase-Bulk-Load-Example/blob/master/src/cy/ac/ucy/paschalis/hbase/bulkimport/Driver.java
 
  The problem I'm running into is that only 1 reducer is created
  by configureIncrementalLoad(), and there is not enough space on this node
  to handle all this data.  configureIncrementalLoad() should start one
  reducer for every region the table has, so apparently the table only has
 1
  region -- maybe because it is empty and brand new (my understanding of
 how
  regions work is not crystal clear)?  The cluster has 5 region servers, so
  I'd at least like that many reducers to handle this loading.
 
  On a side note, I also tried the command line tool, completebulkload, but
  am running into other issues with this (timeouts, possible heap issues)
 --
  probably due to only one server being assigned the task of inserting all
  the records (i.e. I look at the region servers' logs, and only one of the
  servers has log entries; the rest are idle).
 
  Any help is appreciated
 
  -Dolan Antenucci
 



storing custom bloomfilter/BitSet

2013-09-19 Thread John
Hi,

Is there a way to store a custom BitSet for every row and add new bits
while importing? I can't use the bloom filter that is already there because
every column name contains 2 elements.

Here is my scenario:
My table looks like this:
rowKey1 - cf:data1,data2,  cf:data3,data4, ...
rowKey2 - cf:data234,data5. ...

The column name includes data1 and data2.

This setup works for me now, but I am trying to improve it. I'm using the
BulkLoad feature. At first I import a CSV file that looks like this:

ROWKEY    COLUMNFAMILY    COLUMNNAME     HASH_INDEX_1    HASH_INDEX_2
rowKey1   cf              data1,data2    5               12
rowKey1   cf              data3,data4    8               5

For every hash in HASH_INDEX_1/2 I create a new column with the index as the
name and the column family bloomfilter1 or bloomfilter2. I store the
column name as a 4-byte integer string. For the example above I would store
this: bloomfilter1:5 and bloomfilter2:12. This method works fine, but the
export and back-transformation to a BitSet become very slow if the
bloom filter is too big (> 1 million). So a better solution would be to store
only the BitSet instead of a 4-byte integer for every index.

Does anyone know if it is possible to create this filter while importing the
data?

thanks
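
(Added for illustration; not part of the original question.) One possible
direction, assuming the 0.94-era client API and Java 7's
BitSet.toByteArray()/valueOf(): serialize the whole BitSet into a single cell
value instead of one column per set index. The family/qualifier names are made
up, and this does not by itself solve merging new bits into the BitSet during
a bulk import; the bits for a row would have to be combined before the cell
is written.

import java.io.IOException;
import java.util.BitSet;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class BitSetCell {
  static final byte[] CF = Bytes.toBytes("bloomfilter1");  // hypothetical family
  static final byte[] QUAL = Bytes.toBytes("bits");

  // Store the serialized BitSet as one cell value.
  static void write(HTable table, byte[] row, BitSet bits) throws IOException {
    Put put = new Put(row);
    put.add(CF, QUAL, bits.toByteArray());
    table.put(put);
  }

  // Read the cell back and rebuild the BitSet.
  static BitSet read(HTable table, byte[] row) throws IOException {
    Result result = table.get(new Get(row).addColumn(CF, QUAL));
    byte[] value = result.getValue(CF, QUAL);
    return value == null ? new BitSet() : BitSet.valueOf(value);
  }
}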


Stopping hbase results in core dump.

2013-09-19 Thread Kim Chew
Hello there,

I use stop-hbase.sh to shut down HBase, but I always get a core dump:

stopping hbase./home/kchew/hbase-0.94.8/bin/stop-hbase.sh: line 58: 55477
Aborted (core dumped) nohup nice -n ${HBASE_NICENESS:-0}
$HBASE_HOME/bin/hbase --config ${HBASE_CONF_DIR} master stop $@ >
$logout 2>&1 < /dev/null

It happens whether I run HBase in standalone or pseudo-distributed
mode. My OS is a RHEL 6 VM running on vmplayer. Also, I have not
modified any scripts under hbase/bin.

THT

Kim


Re: Stopping hbase results in core dump.

2013-09-19 Thread Jean-Marc Spaggiari
Hi Kim,

Which java version are you using and which HBase version?

JM


2013/9/19 Kim Chew kchew...@gmail.com

 Hello there,

 I use stop-hbase.sh to shut down HBase, but I always get a core dump:

 stopping hbase./home/kchew/hbase-0.94.8/bin/stop-hbase.sh: line 58: 55477
 Aborted (core dumped) nohup nice -n ${HBASE_NICENESS:-0}
 $HBASE_HOME/bin/hbase --config ${HBASE_CONF_DIR} master stop $@ >
 $logout 2>&1 < /dev/null

 It happens whether I run HBase in standalone or pseudo-distributed
 mode. My OS is a RHEL 6 VM running on vmplayer. Also, I have not
 modified any scripts under hbase/bin.

 THT

 Kim



Re: HFile2 issue

2013-09-19 Thread Jean-Marc Spaggiari
So you should be on V2 already all over the place. No need to set it up.


2013/9/17 kun yan yankunhad...@gmail.com

 Thanks Jean-Marc. Now I use the HBase 0.94 version.


 2013/9/18 Jean-Marc Spaggiari jean-m...@spaggiari.org

  Hi Kuan,
 
  Are you migrating from a previous HBase version to 0.94? If not, all your
  HFiles should already be v2...
 
  JM
 
 
  2013/9/17 kun yan yankunhad...@gmail.com
 
    Is HBase 0.94's default to use HFile v2? Does HFile v2 encode the data
    with de-duplication, so that storage space can be further reduced by
    about 20%? How do I enable HFile v2, and how do I set it?
  
   --
  
    In the Hadoop world, I am just a novice exploring the entire Hadoop
    ecosystem; I hope one day I can contribute my own code.
  
   YanBit
   yankunhad...@gmail.com
  
 



 --

  In the Hadoop world, I am just a novice exploring the entire Hadoop
  ecosystem; I hope one day I can contribute my own code.

 YanBit
 yankunhad...@gmail.com



Re: Stopping hbase results in core dump.

2013-09-19 Thread Kim Chew
Hi Jean-Marc,

JDK 1.7 and hbase-0.94.8

Kim


On Thu, Sep 19, 2013 at 5:18 PM, Jean-Marc Spaggiari 
jean-m...@spaggiari.org wrote:

 Hi Kim,

 Which java version are you using and which HBase version?

 JM


 2013/9/19 Kim Chew kchew...@gmail.com

  Hello there,
 
  I use stop-hbase.sh to shut down HBase, but I always get a core dump:

  stopping hbase./home/kchew/hbase-0.94.8/bin/stop-hbase.sh: line 58: 55477
  Aborted (core dumped) nohup nice -n ${HBASE_NICENESS:-0}
  $HBASE_HOME/bin/hbase --config ${HBASE_CONF_DIR} master stop $@ >
  $logout 2>&1 < /dev/null

  It happens whether I run HBase in standalone or pseudo-distributed
  mode. My OS is a RHEL 6 VM running on vmplayer. Also, I have not
  modified any scripts under hbase/bin.
 
  THT
 
  Kim
 



Re: Bulkload into empty table with configureIncrementalLoad()

2013-09-19 Thread Dolan Antenucci
To follow up on my previous question about how best to do the pre-splits, I
ended up using the following when creating my table:

admin.createTable(desc, Bytes.toBytes(0), Bytes.toBytes(2147483647),
100);

This was somewhat of a stab in the dark, but I based it
on RegionSplitter.MD5StringSplit's documentation, which says rows are long
values in the range 00000000 => 7FFFFFFF. (Reminder: I'm using strings,
probably not uniformly distributed, as my row IDs.)

It looks like about 80 of the regions received very few keys (many
received 0), and the other 20 received between 35M and 70M each. Glancing at
the nodes responsible for the 20 popular regions, it looks like a fairly
even distribution across my cluster, so overall I'm optimistic about the
result (performance at first glance seems fine too).

Question: is there something I can do to achieve an even better
distribution across my regions?  As mentioned before, I have a table that I
populated via puts, so maybe this can be used to guide my pre-splits?  I
did try passing the result of this table's HTable.getStartKeys() (as well
as getEndKeys()) in as the splits, but got an error along the lines of key
cannot be empty.

Thanks again for your help.

-Dolan Antenucci
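
(Added for illustration; not part of the original message.) One possible way
to reuse the already-populated table, sketched against the 0.94 API with
hypothetical table names: the first element of HTable.getStartKeys() is the
empty start key of the first region, which is likely what triggered the
"key cannot be empty" error, so drop it and pass the remaining boundaries as
the split points for the new table.

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;

public class SplitsFromExistingTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    HTable existing = new HTable(conf, "populated_table");  // the 68-region table
    byte[][] startKeys = existing.getStartKeys();
    // Element 0 is the empty start key of the first region; skip it.
    byte[][] splits = Arrays.copyOfRange(startKeys, 1, startKeys.length);
    existing.close();

    HTableDescriptor desc = new HTableDescriptor("new_table");
    desc.addFamily(new HColumnDescriptor("cf"));
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.createTable(desc, splits);
    admin.close();
  }
}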


On Thu, Sep 19, 2013 at 2:53 PM, Dolan Antenucci antenucc...@gmail.comwrote:

 Thanks J-D.  Any recommendations on how to determine what splits to use?
  For the keys I'm using strings, so wasn't sure what to put for my startKey
 and endKey. For number of regions, I have a table pre-populated with the
 same data (not using bulk load), so I can see that it has 68 regions.


 On Thu, Sep 19, 2013 at 12:55 PM, Jean-Daniel Cryans 
 jdcry...@apache.orgwrote:

 You need to create the table with pre-splits, see
 http://hbase.apache.org/book.html#perf.writing

 J-D


 On Thu, Sep 19, 2013 at 9:52 AM, Dolan Antenucci antenucc...@gmail.com
 wrote:

  I have about 1 billion values I am trying to load into a new HBase table
  (with just one column and column family), but am running into some
 issues.
   Currently I am trying to use MapReduce to import these by first
 converting
  them to HFiles and then using LoadIncrementalHFiles.doBulkLoad().  I
 also
  use HFileOutputFormat.configureIncrementalLoad() as part of my MR job.
  My
  code is essentially the same as this example:
 
 
 https://github.com/Paschalis/HBase-Bulk-Load-Example/blob/master/src/cy/ac/ucy/paschalis/hbase/bulkimport/Driver.java
 
  The problem I'm running into is that only 1 reducer is created
  by configureIncrementalLoad(), and there is not enough space on this
 node
  to handle all this data.  configureIncrementalLoad() should start one
  reducer for every region the table has, so apparently the table only
 has 1
  region -- maybe because it is empty and brand new (my understanding of
 how
  regions work is not crystal clear)?  The cluster has 5 region servers,
 so
  I'd at least like that many reducers to handle this loading.
 
  On a side note, I also tried the command line tool, completebulkload,
 but
  am running into other issues with this (timeouts, possible heap issues)
 --
  probably due to only one server being assigned the task of inserting all
  the records (i.e. I look at the region servers' logs, and only one of
 the
  servers has log entries; the rest are idle).
 
  Any help is appreciated
 
  -Dolan Antenucci
 





Re: Stopping hbase results in core dump.

2013-09-19 Thread Jean-Marc Spaggiari
Hi Kim,

Oracle JDK? Or OpenJDK?

Anything on the hbase .out file?

JM


2013/9/19 Kim Chew kchew...@gmail.com

 Hi Jean-Marc,

 JDK 1.7 and hbase-0.94.8

 Kim


 On Thu, Sep 19, 2013 at 5:18 PM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:

  Hi Kim,
 
  Which java version are you using and which HBase version?
 
  JM
 
 
  2013/9/19 Kim Chew kchew...@gmail.com
 
   Hello there,
  
   I use stop-hbase.sh to shut down HBase, but I always get a core dump:

   stopping hbase./home/kchew/hbase-0.94.8/bin/stop-hbase.sh: line 58: 55477
   Aborted (core dumped) nohup nice -n ${HBASE_NICENESS:-0}
   $HBASE_HOME/bin/hbase --config ${HBASE_CONF_DIR} master stop $@ >
   $logout 2>&1 < /dev/null

   It happens whether I run HBase in standalone or pseudo-distributed
   mode. My OS is a RHEL 6 VM running on vmplayer. Also, I have not
   modified any scripts under hbase/bin.
  
   THT
  
   Kim