How to scan only Memstore from end point co-processor

2015-06-01 Thread Gautam Borah
Hi all,

Here is our use case,

We have a very write-heavy cluster. We also run periodic endpoint coprocessor
based jobs, every 10 minutes, that operate on the data written in the last
10-15 minutes.

Is there a way to query only the MemStore from the endpoint coprocessor? The
periodic job scans for data using a time range. We would like to implement a
simple piece of logic:

a. If the query time range is within the MemStore's TimeRangeTracker, query
only the MemStore.
b. If the end time of the query range is within the MemStore's
TimeRangeTracker but the query start time is outside it (a MemStore flush
happened), query both the MemStore and the files.
c. If both the start time and the end time of the query are outside the
MemStore's TimeRangeTracker, query only the files.

The incoming data is time series, and we do not allow old data (out of sync
with the clock) to come into the system (HBase).

Cloudera has a scanner, org.apache.hadoop.hbase.regionserver.InternalScan,
that has methods like checkOnlyMemStore() and checkOnlyStoreFiles(). Is this
available in trunk?

Also, how do I access the MemStore for a column family in the endpoint
coprocessor from the CoprocessorEnvironment?


Re: How to scan only Memstore from end point co-processor

2015-06-01 Thread ramkrishna vasudevan
We have a postScannerOpen hook in the CP, but it may not give you direct
access to know which of the internal scanners are on the MemStore and which
are on the store files. This is possible, but we may need to add some new
hooks at the place where we explicitly add the internal scanners required for
a scan.
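
For orientation, the hook being referred to looks roughly like this in a
0.98-era RegionObserver (a minimal sketch; as noted, it only exposes the
already-assembled RegionScanner, not the individual memstore/store-file
scanners):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.RegionScanner;

    public class ScannerOpenObserver extends BaseRegionObserver {
        @Override
        public RegionScanner postScannerOpen(ObserverContext<RegionCoprocessorEnvironment> ctx,
                                             Scan scan, RegionScanner s) throws IOException {
            // The fully assembled RegionScanner is visible here, but the underlying
            // memstore and store-file scanners it wraps are not exposed individually.
            return s;
        }
    }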

Still, a general question - are you sure that your data will only be in the
MemStore, and that the latest data would not have been flushed from your
MemStore to the HFiles by that time? I see that your scenario is
write-centric, so how can you guarantee that your data will be in the MemStore
only? Even though your time range may cover only the latest data (maybe 10 to
15 minutes), you would need to configure your MemStore flushing so that no
flushes happen for the latest data in that 10-to-15-minute window. Just
sharing my thoughts here.




On Mon, Jun 1, 2015 at 11:46 AM, Gautam Borah gbo...@appdynamics.com
wrote:

 Hi all,

 Here is our use case,

 We have a very write heavy cluster. Also we run periodic end point co
 processor based jobs that operate on the data written in the last 10-15
 mins, every 10 minute.

 Is there a way to only query in the MemStore from the end point
 co-processor? The periodic job scans for data using a time range. We would
 like to implement a simple logic,

 a. if query time range is within MemStore's TimeRangeTracker, then query
 only memstore.
 b. If end Time of the query time range is within MemStore's
 TimeRangeTracker, but query start Time is outside MemStore's
 TimeRangeTracker (memstore flush happened), then query both MemStore and
 Files.
 c. If start time and end time of the query is outside of MemStore
 TimeRangeTracker we query only files.

 The incoming data is time series and we do not allow old data (out of sync
 from clock) to come into the system(HBase).

 Cloudera has a scanner org.apache.hadoop.hbase.regionserver.InternalScan,
 that has methods like checkOnlyMemStore() and checkOnlyStoreFiles(). Is
 this available in Trunk?

 Also, how do I access the Memstore for a Column Family in the end point
 co-processor from CoprocessorEnvironment?



HBase client: refreshing the connection

2015-06-01 Thread Hariharan_Sethuraman
Hi All,

We are using 0.94.15 in our Opendaylight/TSDR project currently.

We observed a put operation hang for 20 minutes (with all default timeouts) and
then throw an IOException. Even when we re-attempt the same put operation, it
hangs for 20 minutes again. We observed a zxid mismatch in the HBase server
logs.

We would like clarification on the following items.

1)  Reducing this hang time from 20 minutes to 5 minutes: there seem to be many
timeout configurations (hbase-client, zookeeper, client.pause, etc.), and it is
slightly confusing how they all combine with the backoff series. If I add
hbase.client.retries.number=3 to hbase-site.xml, will that bring it down to 5
minutes?
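
(For reference, the setting being considered would go into hbase-site.xml
roughly as shown below; whether three retries actually lands near 5 minutes
depends on hbase.client.pause and the backoff schedule, so the value is only
illustrative.)

    <property>
      <name>hbase.client.retries.number</name>
      <value>3</value>
    </property>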



2)  When we receive this exception, we call deleteAllConnections and the
subsequent put operation succeeds. We wish to continue with this approach. The
following is our code where we create the HTable:

    HTableInterface htableResult = null;
    htableResult = htableMap.get(tableName);
    ..
    if (htableResult == null) {
        if (htablePool == null || htablePool.getTable(tableName) == null) {
            htablePool = new HTablePool(getConfiguration(), poolSize);
        }
        if (htablePool != null) {
            htableResult = htablePool.getTable(tableName);
            ..
        }
    }
    htableMap.put(tableName, htableResult);



We create 5 tables in our application. Will there be 5 HConnections in total,
one HConnection per table? If yes, how do I delete the connection for a given
table, since most of the delete(All)Connections methods in HConnectionManager
are deprecated in 0.94.15 and no alternatives are given in the javadoc? Even if
we use deleteConnection, it takes a conf, which doesn't bind to any table,
correct?
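
For context, the recovery path described above would look roughly like the
following sketch, assuming the 0.94 HConnectionManager API quoted below and the
htableMap/htablePool fields from the snippet above (put is a hypothetical Put
being retried):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HConnectionManager;

    try {
        htableResult.put(put);
    } catch (IOException e) {
        // Drop the cached connection so the next attempt rebuilds its ZooKeeper state.
        // deleteConnection(conf) is keyed by the Configuration identity, not by table,
        // so it affects every table that shares that Configuration.
        HConnectionManager.deleteConnection(getConfiguration());
        htableMap.clear();
        htablePool = null;  // forces the pool (and its HTables) to be recreated lazily
        throw e;            // or retry the put, depending on what the caller expects
    }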

deleteConnection
@Deprecated
public static void deleteConnection(org.apache.hadoop.conf.Configuration conf)
Deprecated.
Delete connection information for the instance specified by configuration. If 
there are no more references to it, this will then close connection to the 
zookeeper ensemble and let go of all resources.
Parameters:
conf - configuration whose identity is used to find HConnection instance.


deleteAllConnections
@Deprecated
public static void deleteAllConnections(boolean stopProxy)
Deprecated. use deleteAllConnections() instead
Delete information for all connections.
Parameters:
stopProxy - No longer used. This parameter is ignored.


deleteAllConnections
@Deprecated
public static void deleteAllConnections()
Deprecated.
Delete information for all connections.
Throws:
IOException

Thanks,
Hari


Monitor off heap Bucket Cache

2015-06-01 Thread Dejan Menges
Hi,

What's the best way to monitor / know how the bucket cache is being used, how
much is cached there, etc.?

Our RegionServer can use 32G of heap, so we exported HBASE_OFFHEAPSIZE as 24G
in hbase-env.sh, set hfile.block.cache.size to 0.05, and set a couple of block
sizes that we know we are using, based on our usage patterns. And this is where
the strange part starts - in the web UI we now see, with turning this off, with
those values, that the total BlockCache available is 1G - before it was 10G.
What we basically tried to achieve was to double it to 20G.
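
(For reference, the settings described above amount to roughly the following;
keys and values are as stated in this message, shown here only for clarity.)

    # hbase-env.sh
    export HBASE_OFFHEAPSIZE=24G

    <!-- hbase-site.xml -->
    <property>
      <name>hfile.block.cache.size</name>
      <value>0.05</value>
    </property>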

The documentation we were referring to was
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/admin_hbase_blockcache_configure.html#concept_cp3_fhy_dr_unique_1__section_m3r_2cz_dr_unique_1
as the HBase book does not go into much detail about how to properly configure
this and get what you want.

Btw, if we put hfile.block.cache.size to 0.2 we see in the web UI that the
total available cache is 24G, but then after some time we had the region server
crashing. The host server has enough RAM, considering only a data node and a
region server run there (128G of RAM in total), so we thought we could increase
caching by turning on this functionality.

Do you maybe see what exactly we are doing wrong? How exactly do we increase
off-heap caching - and is it possible to monitor it at all, as in the metrics we
don't see anything associated with it?

Thanks,
Dejan


Java Hbase Client or Rest approach

2015-06-01 Thread Mahadevappa, Shobha
Hi,
We have a Java-based web application.
There is a requirement to fetch data from HBase and build some dashboards.

What is the best way to go about fetching the data from HBase?
1  Using the Java HBase client API, or
2  Using the HBase REST API.

I would appreciate it if anyone could provide the pros and cons of both
approaches.


Regards,
Shobha



Re: How to scan only Memstore from end point co-processor

2015-06-01 Thread Vladimir Rodionov
InternalScan has a constructor that takes a Scan object.

See https://issues.apache.org/jira/browse/HBASE-12720

You can instantiate an InternalScan from a Scan, set checkOnlyMemStore, then
open a RegionScanner. But the best approach is to cache data on write and run a
regular RegionScanner over the memstore and block cache.
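
A rough sketch of that first option, assuming a 0.98-era endpoint where
RegionCoprocessorEnvironment#getRegion() returns the HRegion (method and
variable names here are only illustrative):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.HRegion;
    import org.apache.hadoop.hbase.regionserver.InternalScan;
    import org.apache.hadoop.hbase.regionserver.RegionScanner;

    long countRecentCells(RegionCoprocessorEnvironment env, long startTs, long endTs)
            throws IOException {
        Scan scan = new Scan();
        scan.setTimeRange(startTs, endTs);        // lets HBase skip files outside the range anyway

        InternalScan internalScan = new InternalScan(scan); // ctor added by HBASE-12720
        internalScan.checkOnlyMemStore();         // restrict this scan to the memstore

        HRegion region = env.getRegion();
        RegionScanner scanner = region.getScanner(internalScan);
        long count = 0;
        try {
            List<Cell> cells = new ArrayList<Cell>();
            boolean more;
            do {
                more = scanner.next(cells);
                count += cells.size();
                cells.clear();
            } while (more);
        } finally {
            scanner.close();
        }
        return count;
    }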

best,
-Vlad




On Sun, May 31, 2015 at 11:45 PM, Anoop John anoop.hb...@gmail.com wrote:

 If your scan is having a time range specified in it, HBase internally will
 check this against the time range of files etc and will avoid those which
 are clearly out of your interested time range.  You dont have to do any
 thing for this.  Make sure you set the TimeRange for ur read

 -Anoop-

 On Mon, Jun 1, 2015 at 12:09 PM, ramkrishna vasudevan 
 ramkrishna.s.vasude...@gmail.com wrote:

  We have a postScannerOpen hook in the CP but that may not give you a
 direct
  access to know which one are the internal scanners on the Memstore and
  which one are on the store files. But this is possible but we may need to
  add some new hooks at this place where we explicitly add the internal
  scanners required for a scan.
 
  But still a general question - are you sure that your data will be only
 in
  the memstore and that the latest data would not have been flushed by that
  time from your memstore to the Hfiles.  I see that your scenario is write
  centric and how can you guarentee your data can be in memstore only?
  Though your time range may say it is the latest data (may be 10 to 15
 min)
  but you should be able to configure your memstore flushing in such a way
  that there are no flushes happening for the latest data in that 10 to 15
  min time.  Just saying my thoughts here.
 
 
 
 
  On Mon, Jun 1, 2015 at 11:46 AM, Gautam Borah gbo...@appdynamics.com
  wrote:
 
   Hi all,
  
   Here is our use case,
  
   We have a very write heavy cluster. Also we run periodic end point co
   processor based jobs that operate on the data written in the last 10-15
   mins, every 10 minute.
  
   Is there a way to only query in the MemStore from the end point
   co-processor? The periodic job scans for data using a time range. We
  would
   like to implement a simple logic,
  
   a. if query time range is within MemStore's TimeRangeTracker, then
 query
   only memstore.
   b. If end Time of the query time range is within MemStore's
   TimeRangeTracker, but query start Time is outside MemStore's
   TimeRangeTracker (memstore flush happened), then query both MemStore
 and
   Files.
   c. If start time and end time of the query is outside of MemStore
   TimeRangeTracker we query only files.
  
   The incoming data is time series and we do not allow old data (out of
  sync
   from clock) to come into the system(HBase).
  
   Cloudera has a scanner
 org.apache.hadoop.hbase.regionserver.InternalScan,
   that has methods like checkOnlyMemStore() and checkOnlyStoreFiles(). Is
   this available in Trunk?
  
   Also, how do I access the Memstore for a Column Family in the end point
   co-processor from CoprocessorEnvironment?
  
 



Re: hfile.bucket.BucketAllocatorException: Allocation too big size

2015-06-01 Thread Dejan Menges
Oh, cool, something that will push us to upgrade sooner rather than later :)

Just for my information - what limit was used then in 2.1 as the maximum cache
block size (or whatever the name was)? The size of the block, or something else?

On Mon, Jun 1, 2015 at 5:00 PM Ted Yu yuzhih...@gmail.com wrote:

 Dejan:
 hbase.bucketcache.bucket.sizes was introduced by:
 HBASE-10641 Configurable Bucket Sizes in bucketCache

 which was integrated to 0.98.4

 HDP 2.2 has the fix while HDP 2.1 didn't.

 FYI

 On Mon, Jun 1, 2015 at 7:23 AM, Dejan Menges dejan.men...@gmail.com
 wrote:

  Hi Ted,
 
  It's 0.98.0 with bunch of patches (from Hortonworks).
 
  Let me try with that key, on my way :)
 
  On Mon, Jun 1, 2015 at 4:19 PM Ted Yu yuzhih...@gmail.com wrote:
 
   Which hbase release are you using ?
  
   I seem to recall that hbase.bucketcache.bucket.sizes was the key.
  
   Cheers
  
   On Mon, Jun 1, 2015 at 7:04 AM, Dejan Menges dejan.men...@gmail.com
   wrote:
  
Hi,
   
I'm getting messages like:
   
015-06-01 14:02:29,529 WARN
org.apache.hadoop.hbase.io.hfile.bucket.BucketCache: Failed
 allocating
   for
block ce18012f4dfa424db88e92de29e76a9b_25809098330
   
org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocatorException:
Allocation too big size=750465
   
at
   
org.apache.hadoop.hbase.io
   .hfile.bucket.BucketAllocator.allocateBlock(BucketAllocator.java:400)
   
at
   
org.apache.hadoop.hbase.io
  
 
 .hfile.bucket.BucketCache$RAMQueueEntry.writeToCache(BucketCache.java:1153)
   
at
   
org.apache.hadoop.hbase.io
   .hfile.bucket.BucketCache$WriterThread.doDrain(BucketCache.java:703)
   
at
   
org.apache.hadoop.hbase.io
   .hfile.bucket.BucketCache$WriterThread.run(BucketCache.java:675)
   
at java.lang.Thread.run(Thread.java:745)
   
   
However, not sure why is this. If I understood it correctly (and
   probably I
didn't :/) this should fit in one of those:
   
    <property>
      <name>hbase.bucketcache.sizes</name>
      <value>65536,131072,196608,262144,327680,393216,655360,1310720</value>
    </property>
   
In the same time, hbase,bucketcache.size is 24G. Not sure what I did
(again) wrong?
   
  
 



Re: hfile.bucket.BucketAllocatorException: Allocation too big size

2015-06-01 Thread Ted Yu
Which HBase release are you using?

I seem to recall that hbase.bucketcache.bucket.sizes was the key.

Cheers

On Mon, Jun 1, 2015 at 7:04 AM, Dejan Menges dejan.men...@gmail.com wrote:

 Hi,

 I'm getting messages like:

 015-06-01 14:02:29,529 WARN
 org.apache.hadoop.hbase.io.hfile.bucket.BucketCache: Failed allocating for
 block ce18012f4dfa424db88e92de29e76a9b_25809098330

 org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocatorException:
 Allocation too big size=750465

 at

 org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator.allocateBlock(BucketAllocator.java:400)

 at

 org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$RAMQueueEntry.writeToCache(BucketCache.java:1153)

 at

 org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.doDrain(BucketCache.java:703)

 at

 org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.run(BucketCache.java:675)

 at java.lang.Thread.run(Thread.java:745)


 However, not sure why is this. If I understood it correctly (and probably I
 didn't :/) this should fit in one of those:

    <property>
      <name>hbase.bucketcache.sizes</name>
      <value>65536,131072,196608,262144,327680,393216,655360,1310720</value>
    </property>

 In the same time, hbase,bucketcache.size is 24G. Not sure what I did
 (again) wrong?



Re: hfile.bucket.BucketAllocatorException: Allocation too big size

2015-06-01 Thread Ted Yu
Dejan:
hbase.bucketcache.bucket.sizes was introduced by:
HBASE-10641 Configurable Bucket Sizes in bucketCache

which was integrated into 0.98.4

HDP 2.2 has the fix while HDP 2.1 didn't.

FYI

On Mon, Jun 1, 2015 at 7:23 AM, Dejan Menges dejan.men...@gmail.com wrote:

 Hi Ted,

 It's 0.98.0 with bunch of patches (from Hortonworks).

 Let me try with that key, on my way :)

 On Mon, Jun 1, 2015 at 4:19 PM Ted Yu yuzhih...@gmail.com wrote:

  Which hbase release are you using ?
 
  I seem to recall that hbase.bucketcache.bucket.sizes was the key.
 
  Cheers
 
  On Mon, Jun 1, 2015 at 7:04 AM, Dejan Menges dejan.men...@gmail.com
  wrote:
 
   Hi,
  
   I'm getting messages like:
  
   015-06-01 14:02:29,529 WARN
   org.apache.hadoop.hbase.io.hfile.bucket.BucketCache: Failed allocating
  for
   block ce18012f4dfa424db88e92de29e76a9b_25809098330
  
   org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocatorException:
   Allocation too big size=750465
  
   at
  
   org.apache.hadoop.hbase.io
  .hfile.bucket.BucketAllocator.allocateBlock(BucketAllocator.java:400)
  
   at
  
   org.apache.hadoop.hbase.io
 
 .hfile.bucket.BucketCache$RAMQueueEntry.writeToCache(BucketCache.java:1153)
  
   at
  
   org.apache.hadoop.hbase.io
  .hfile.bucket.BucketCache$WriterThread.doDrain(BucketCache.java:703)
  
   at
  
   org.apache.hadoop.hbase.io
  .hfile.bucket.BucketCache$WriterThread.run(BucketCache.java:675)
  
   at java.lang.Thread.run(Thread.java:745)
  
  
   However, not sure why is this. If I understood it correctly (and
  probably I
   didn't :/) this should fit in one of those:
  
    <property>
      <name>hbase.bucketcache.sizes</name>
      <value>65536,131072,196608,262144,327680,393216,655360,1310720</value>
    </property>
  
   In the same time, hbase,bucketcache.size is 24G. Not sure what I did
   (again) wrong?
  
 



Re: hfile.bucket.BucketAllocatorException: Allocation too big size

2015-06-01 Thread Dejan Menges
Hi Ted,

 It's 0.98.0 with a bunch of patches (from Hortonworks).

Let me try with that key, on my way :)

On Mon, Jun 1, 2015 at 4:19 PM Ted Yu yuzhih...@gmail.com wrote:

 Which hbase release are you using ?

 I seem to recall that hbase.bucketcache.bucket.sizes was the key.

 Cheers

 On Mon, Jun 1, 2015 at 7:04 AM, Dejan Menges dejan.men...@gmail.com
 wrote:

  Hi,
 
  I'm getting messages like:
 
  015-06-01 14:02:29,529 WARN
  org.apache.hadoop.hbase.io.hfile.bucket.BucketCache: Failed allocating
 for
  block ce18012f4dfa424db88e92de29e76a9b_25809098330
 
  org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocatorException:
  Allocation too big size=750465
 
  at
 
  org.apache.hadoop.hbase.io
 .hfile.bucket.BucketAllocator.allocateBlock(BucketAllocator.java:400)
 
  at
 
  org.apache.hadoop.hbase.io
 .hfile.bucket.BucketCache$RAMQueueEntry.writeToCache(BucketCache.java:1153)
 
  at
 
  org.apache.hadoop.hbase.io
 .hfile.bucket.BucketCache$WriterThread.doDrain(BucketCache.java:703)
 
  at
 
  org.apache.hadoop.hbase.io
 .hfile.bucket.BucketCache$WriterThread.run(BucketCache.java:675)
 
  at java.lang.Thread.run(Thread.java:745)
 
 
  However, not sure why is this. If I understood it correctly (and
 probably I
  didn't :/) this should fit in one of those:
 
    <property>
      <name>hbase.bucketcache.sizes</name>
      <value>65536,131072,196608,262144,327680,393216,655360,1310720</value>
    </property>
 
  In the same time, hbase,bucketcache.size is 24G. Not sure what I did
  (again) wrong?
 



hfile.bucket.BucketAllocatorException: Allocation too big size

2015-06-01 Thread Dejan Menges
Hi,

I'm getting messages like:

2015-06-01 14:02:29,529 WARN
org.apache.hadoop.hbase.io.hfile.bucket.BucketCache: Failed allocating for
block ce18012f4dfa424db88e92de29e76a9b_25809098330

org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocatorException:
Allocation too big size=750465

at
org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator.allocateBlock(BucketAllocator.java:400)

at
org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$RAMQueueEntry.writeToCache(BucketCache.java:1153)

at
org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.doDrain(BucketCache.java:703)

at
org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.run(BucketCache.java:675)

at java.lang.Thread.run(Thread.java:745)


However, I'm not sure why this is. If I understood it correctly (and probably I
didn't :/), this should fit into one of these:

    <property>
      <name>hbase.bucketcache.sizes</name>
      <value>65536,131072,196608,262144,327680,393216,655360,1310720</value>
    </property>

At the same time, hbase.bucketcache.size is 24G. Not sure what I did wrong
(again)?


Re: hfile.bucket.BucketAllocatorException: Allocation too big size

2015-06-01 Thread Anoop John
Yes Ted is right. hbase.bucketcache.bucket.sizes is the correct config
name...  I think wrong name was added to hbase-default.xml..   There was
bug already raised for this? Some thing related to bucket cache was already
there.. Am not sure.. We need fix in xml.
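
So, assuming that key is right, the property from the original message would
read:

    <property>
      <name>hbase.bucketcache.bucket.sizes</name>
      <value>65536,131072,196608,262144,327680,393216,655360,1310720</value>
    </property>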

-Anoop-

On Mon, Jun 1, 2015 at 7:49 PM, Ted Yu yuzhih...@gmail.com wrote:

 Which hbase release are you using ?

 I seem to recall that hbase.bucketcache.bucket.sizes was the key.

 Cheers

 On Mon, Jun 1, 2015 at 7:04 AM, Dejan Menges dejan.men...@gmail.com
 wrote:

  Hi,
 
  I'm getting messages like:
 
  015-06-01 14:02:29,529 WARN
  org.apache.hadoop.hbase.io.hfile.bucket.BucketCache: Failed allocating
 for
  block ce18012f4dfa424db88e92de29e76a9b_25809098330
 
  org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocatorException:
  Allocation too big size=750465
 
  at
 
 
 org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator.allocateBlock(BucketAllocator.java:400)
 
  at
 
 
 org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$RAMQueueEntry.writeToCache(BucketCache.java:1153)
 
  at
 
 
 org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.doDrain(BucketCache.java:703)
 
  at
 
 
 org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.run(BucketCache.java:675)
 
  at java.lang.Thread.run(Thread.java:745)
 
 
  However, not sure why is this. If I understood it correctly (and
 probably I
  didn't :/) this should fit in one of those:
 
    <property>
      <name>hbase.bucketcache.sizes</name>
      <value>65536,131072,196608,262144,327680,393216,655360,1310720</value>
    </property>
 
  In the same time, hbase,bucketcache.size is 24G. Not sure what I did
  (again) wrong?
 



Re: Monitor off heap Bucket Cache

2015-06-01 Thread Nick Dimiduk
Also note that the configuration changed slightly between 0.98 and 1.0; see
HBASE-11520. From the release note:

 Remove hbase.bucketcache.percentage.in.combinedcache. Simplifies config
of block cache. If you are using this config., after
 this patch goes in, it will be ignored. The L1 LruBlockCache will be
whatever hfile.block.cache.size is set to and the L2
 BucketCache will be whatever hbase.bucketcache.size is set to.
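
So on 1.0+ a minimal combined-cache setup would look roughly like the following
(sizes are only examples; depending on release, hbase.bucketcache.size is read
either as megabytes or as a fraction):

    <property>
      <name>hbase.bucketcache.ioengine</name>
      <value>offheap</value>
    </property>
    <property>
      <name>hfile.block.cache.size</name>
      <value>0.05</value>       <!-- L1 LruBlockCache, as a fraction of heap -->
    </property>
    <property>
      <name>hbase.bucketcache.size</name>
      <value>24576</value>      <!-- L2 BucketCache; 24G expressed in MB here -->
    </property>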

On Mon, Jun 1, 2015 at 8:10 AM, Stack st...@duboce.net wrote:

 On Mon, Jun 1, 2015 at 2:24 AM, Dejan Menges dejan.men...@gmail.com
 wrote:

  Hi,
 
  What's the best way to monitor / know how's bucket cache being used, how
  much stuff is cached there, etc?
 
 
 See the UI on a regionserver. Look down the page to the 'Block Cache'
 section. It has detail on both onheap and LRU offheap. See also the
 documentation on offheap cache:
 http://hbase.apache.org/book.html#offheap.blockcache Look also at metrics
 where we report block cache stats as well as offheap used by the JVM.



  Our RegionServer can use 32G of heap size, so we exported
 HBASE_OFFHEAPSIZE
  to 24G in hbase-env.sh, set hfile.block.cache.size to 0.05, and set
 couple
  of block sizes that we know we are using knowing our usage patterns. And
  this is where strange part starts - in web UI we see now, with turning
 this
  off,


 Turning off what?



  with those values, that total BlockCache available is 1G - before it
  was 10G. What we basically tried to achieve was to double it to 20G.
 
 
 Are you sure this not the onheap portion of BlockCache?



  Documentation we were referring to was
 
 
 http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/admin_hbase_blockcache_configure.html#concept_cp3_fhy_dr_unique_1__section_m3r_2cz_dr_unique_1
  as HBase book is not going into too much details how to properly
 configure
  this and get what you want.
 
 
 Please let us know what is missing from here:
 http://hbase.apache.org/book.html#offheap.blockcache  We'd like to fix it.



  Btw. if we put hfile.block.cache.size to 0.2 we see in web UI that total
  available cache is 24G, but then after some time we had region server
  crashing. Host server have enough RAM, considering only data node and
  region server running there (128G of RAM in total) so we thought we could
  increase caching with turning on this functionality.
 
 
 Make sure your hbase has HBASE-11678



  Do you maybe see what exactly we are doing wrong? How to exactly increase
  offheap caching - and is it possible to monitor it anyhow, as in metrics
 we
  don't see anything associating to it?
 
 
 Please provide more detail. Your configurations and what you've changed in
 hbase-env.sh.

 St.Ack



  Thanks,
  Dejan
 



PhoenixIOException resolved only after compaction, is there a way to avoid it?

2015-06-01 Thread Siva
Hi Everyone,

We load data into HBase tables through bulk imports.

If the data set is small, we can query the imported data from Phoenix with no
issues.

If the data size is huge (with respect to our cluster - we have a very small
cluster), I encounter the following error
(org.apache.phoenix.exception.PhoenixIOException).

0: jdbc:phoenix:172.31.45.176:2181:/hbase> select count(*)
. . . . . . . . . . . . . . . . . . . . .> from ldll_compression ldll
. . . . . . . . . . . . . . . . . . . . .> join ds_compression ds on (ds.statusid = ldll.statusid)
. . . . . . . . . . . . . . . . . . . . .> where ldll.logdate >= '2015-02-04'
. . . . . . . . . . . . . . . . . . . . .> and ldll.logdate <= '2015-02-06'
. . . . . . . . . . . . . . . . . . . . .> and ldll.dbname = 'lmguaranteedrate';
+--+
| COUNT(1) |
+--+
java.lang.RuntimeException:
org.apache.phoenix.exception.PhoenixIOException:
org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=36,
exceptions:
Mon Jun 01 13:50:57 EDT 2015, null, java.net.SocketTimeoutException:
callTimeout=6, callDuration=62358: row '' on table 'ldll_compression'
at
region=ldll_compression,,1432851434288.1a8b511def7d0c9e69a5491c6330d715.,
hostname=ip-172-31-32-181.us-west-2.compute.internal,60020,1432768597149,
seqNum=16566

at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2440)
at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2074)
at sqlline.SqlLine.print(SqlLine.java:1735)
at sqlline.SqlLine$Commands.execute(SqlLine.java:3683)
at sqlline.SqlLine$Commands.sql(SqlLine.java:3584)
at sqlline.SqlLine.dispatch(SqlLine.java:821)
at sqlline.SqlLine.begin(SqlLine.java:699)
at sqlline.SqlLine.mainWithInputRedirection(SqlLine.java:441)
at sqlline.SqlLine.main(SqlLine.java:424)

I did a major compaction on ldll_compression through the HBase shell
(major_compact 'ldll_compression'). The same query ran successfully after the
compaction.

0: jdbc:phoenix:172.31.45.176:2181:/hbase> select count(*)
. . . . . . . . . . . . . . . . . . . . .> from ldll_compression ldll
. . . . . . . . . . . . . . . . . . . . .> join ds_compression ds on (ds.statusid = ldll.statusid)
. . . . . . . . . . . . . . . . . . . . .> where ldll.logdate >= '2015-02-04'
. . . . . . . . . . . . . . . . . . . . .> and ldll.logdate <= '2015-02-06'
. . . . . . . . . . . . . . . . . . . . .> and ldll.dbname = 'lmguaranteedrate'
. . . . . . . . . . . . . . . . . . . . .> ;
+--+
| COUNT(1) |
+--+
| 13480|
+--+
1 row selected (72.36 seconds)

Did anyone face a similar issue? Is the IOException because Phoenix was not
able to read from multiple regions, given that the error was resolved after the
compaction? Or any other thoughts?

Thanks,
Siva.


Re: Hbase vs Cassandra

2015-06-01 Thread Jerry He
Another point to add is the new HBase read high-availability feature using
timeline-consistent region replicas, available from HBase 1.0 onward, which
brings HBase closer to Cassandra in terms of read availability during node
failures. You have a choice for read availability now.

https://issues.apache.org/jira/browse/HBASE-10070
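
For illustration, a minimal client-side sketch of opting into timeline-consistent
reads on 1.0+ (the Table handle and row key are placeholders):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Consistency;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;

    public static Result timelineGet(Table table, byte[] row) throws IOException {
        Get get = new Get(row);
        get.setConsistency(Consistency.TIMELINE);  // allow the read to be served by a replica
        Result result = table.get(get);
        if (result.isStale()) {
            // Served by a secondary replica; the data may lag the primary slightly.
        }
        return result;
    }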



On Sun, May 31, 2015 at 12:32 PM, Vladimir Rodionov vladrodio...@gmail.com
wrote:

 Couple more + for HBase

 * Coprocessor framework (custom code inside Region Server and Master
 Servers), which Cassandra is missing, afaik.
Coprocessors have been widely used by hBase users (Phoenix SQL, for
 example) since inception (in 0.92).
 * HBase security model is more mature and align well with Hadoop/HDFS
 security. Cassandra provides just basic authentication/authorization/SSL
 encryption, no Kerberos, no end-to-end data encryption, no cell level
 security.

 -Vlad

 On Sun, May 31, 2015 at 12:05 PM, lars hofhansl la...@apache.org wrote:

  You really have to try out both if you want to be sure.
 
  The fundamental differences that come to mind are:
  * HBase is always consistent. Machine outages lead to inability to read
 or
  write data on that machine. With Cassandra you can always write.
 
  * Cassandra defaults to a random partitioner, so range scans are not
  possible (by default)
  * HBase has a range partitioner (if you don't want that the client has to
  prefix the rowkey with a prefix of a hash of the rowkey). The main
 feature
  that set HBase apart are range scans.
 
  * HBase is much more tightly integrated with Hadoop/MapReduce/HDFS, etc.
  You can map reduce directly into HFiles and map those into HBase
 instantly.
 
  * Cassandra has a dedicated company supporting (and promoting) it.
  * Getting started is easier with Cassandra. For HBase you need to run
 HDFS
  and Zookeeper, etc.
  * I've heard lots of anecdotes about Cassandra working nicely with small
  cluster ( 50 nodes) and quick degenerating above that.
  * HBase does not have a query language (but you can use Phoenix for full
  SQL support)
  * HBase does not have secondary indexes (having an eventually consistent
  index, similar to what Cassandra has, is easy in HBase, but making it as
  consistent as the rest of HBase is hard)
 
  * Everything you'll hear here is biased :)
 
 
 
  From personal experience... At Salesforce we spent a few months
  prototyping various stores (including Cassandra) and arrived at HBase.
 Your
  mileage may vary.
 
 
  -- Lars
 
 
  - Original Message -
  From: Ajay ajay.ga...@gmail.com
  To: user@hbase.apache.org
  Cc:
  Sent: Friday, May 29, 2015 12:12 PM
  Subject: Hbase vs Cassandra
 
  Hi,
 
  I need some info on Hbase vs Cassandra as a data store (in general plus
  specific to time series data).
 
  The comparison in the following helps:
  1: features
  2: deployment and monitoring
  3: performance
  4: anything else
 
  Thanks
  Ajay
 



Re: zookeeper closing socket connection exception

2015-06-01 Thread Ted Yu
How many ZooKeeper servers do you have?

Cheers

On Mon, Jun 1, 2015 at 12:15 PM, jeevi tesh jeevitesh...@gmail.com wrote:

 Hi,
 I'm running into this issue several times but still not able resolve kindly
 help me in this regard.
 I have written a crawler which will be keep running for several days after
 4 days of continuous interaction of data base with my application system.
 Data base fails to responsed. I'm not able to figure where things all of a
 sudden can go wrong after 4 days of proper running.
 My configuration i have used hbase 0.96.2 single server.
 jdk 1.7

 issue is this following error
 WARN  [http-bio-8080-exec-4-SendThread(hadoop2:2181)] zookeeper.ClientCnxn
 (ClientCnxn.java:run(1089)) - Session 0x14da00e69e001ad for server null,
 unexpected error, closing socket connection and attempting reconnect
 java.net.ConnectException: Connection refused
 If this exception happens only solution i have is restart hbase that is not
 a viable solution because that will corrupt my system data.



Re: zookeeper closing socket connection exception

2015-06-01 Thread Esteban Gutierrez
Hi Jeevi,

Have you looked into why the ZooKeeper server is no longer accepting
connections? What is the number of clients you have running per host, and what
is the configured value of maxClientCnxns in the ZooKeeper servers? Also, is
the issue impacting clients only, or is it also impacting the RegionServers?
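
(For reference, the limit Esteban mentions is set either in zoo.cfg or, when
HBase manages ZooKeeper, via hbase-site.xml; the value below is only an
example.)

    # zoo.cfg
    maxClientCnxns=300

    <!-- hbase-site.xml, for an HBase-managed ZooKeeper -->
    <property>
      <name>hbase.zookeeper.property.maxClientCnxns</name>
      <value>300</value>
    </property>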

cheers,
esteban.




--
Cloudera, Inc.


On Mon, Jun 1, 2015 at 12:15 PM, jeevi tesh jeevitesh...@gmail.com wrote:

 Hi,
 I'm running into this issue several times but still not able resolve kindly
 help me in this regard.
 I have written a crawler which will be keep running for several days after
 4 days of continuous interaction of data base with my application system.
 Data base fails to responsed. I'm not able to figure where things all of a
 sudden can go wrong after 4 days of proper running.
 My configuration i have used hbase 0.96.2 single server.
 jdk 1.7

 issue is this following error
 WARN  [http-bio-8080-exec-4-SendThread(hadoop2:2181)] zookeeper.ClientCnxn
 (ClientCnxn.java:run(1089)) - Session 0x14da00e69e001ad for server null,
 unexpected error, closing socket connection and attempting reconnect
 java.net.ConnectException: Connection refused
 If this exception happens only solution i have is restart hbase that is not
 a viable solution because that will corrupt my system data.



zookeeper closing socket connection exception

2015-06-01 Thread jeevi tesh
Hi,
 I'm running into this issue repeatedly but am still not able to resolve it;
 kindly help me in this regard.
 I have written a crawler which keeps running for several days. After 4 days of
 continuous interaction between the database and my application system, the
 database fails to respond. I'm not able to figure out where things can all of a
 sudden go wrong after 4 days of proper running.
 My configuration: HBase 0.96.2, single server.
 JDK 1.7

 The issue is the following error:
 WARN  [http-bio-8080-exec-4-SendThread(hadoop2:2181)] zookeeper.ClientCnxn
 (ClientCnxn.java:run(1089)) - Session 0x14da00e69e001ad for server null,
 unexpected error, closing socket connection and attempting reconnect
 java.net.ConnectException: Connection refused
 If this exception happens, the only solution I have is to restart HBase, which
 is not a viable solution because that will corrupt my system data.


Re: Hbase vs Cassandra

2015-06-01 Thread Michael Segel
Well, since you brought up coprocessors… let's talk about the lack of security
and stability that's been introduced by coprocessors. ;-)

I'm not saying that you don't want server-side extensibility, but you need to
recognize the risks introduced by coprocessors.


 On May 31, 2015, at 3:32 PM, Vladimir Rodionov vladrodio...@gmail.com wrote:
 
 Couple more + for HBase
 
 * Coprocessor framework (custom code inside Region Server and Master
 Servers), which Cassandra is missing, afaik.
   Coprocessors have been widely used by hBase users (Phoenix SQL, for
 example) since inception (in 0.92).
 * HBase security model is more mature and align well with Hadoop/HDFS
 security. Cassandra provides just basic authentication/authorization/SSL
 encryption, no Kerberos, no end-to-end data encryption, no cell level
 security.
 
 -Vlad
 
 On Sun, May 31, 2015 at 12:05 PM, lars hofhansl la...@apache.org wrote:
 
 You really have to try out both if you want to be sure.
 
 The fundamental differences that come to mind are:
 * HBase is always consistent. Machine outages lead to inability to read or
 write data on that machine. With Cassandra you can always write.
 
 * Cassandra defaults to a random partitioner, so range scans are not
 possible (by default)
 * HBase has a range partitioner (if you don't want that the client has to
 prefix the rowkey with a prefix of a hash of the rowkey). The main feature
 that set HBase apart are range scans.
 
 * HBase is much more tightly integrated with Hadoop/MapReduce/HDFS, etc.
 You can map reduce directly into HFiles and map those into HBase instantly.
 
 * Cassandra has a dedicated company supporting (and promoting) it.
 * Getting started is easier with Cassandra. For HBase you need to run HDFS
 and Zookeeper, etc.
 * I've heard lots of anecdotes about Cassandra working nicely with small
 cluster ( 50 nodes) and quick degenerating above that.
 * HBase does not have a query language (but you can use Phoenix for full
 SQL support)
 * HBase does not have secondary indexes (having an eventually consistent
 index, similar to what Cassandra has, is easy in HBase, but making it as
 consistent as the rest of HBase is hard)
 
 * Everything you'll hear here is biased :)
 
 
 
 From personal experience... At Salesforce we spent a few months
 prototyping various stores (including Cassandra) and arrived at HBase. Your
 mileage may vary.
 
 
 -- Lars
 
 
 - Original Message -
 From: Ajay ajay.ga...@gmail.com
 To: user@hbase.apache.org
 Cc:
 Sent: Friday, May 29, 2015 12:12 PM
 Subject: Hbase vs Cassandra
 
 Hi,
 
 I need some info on Hbase vs Cassandra as a data store (in general plus
 specific to time series data).
 
 The comparison in the following helps:
 1: features
 2: deployment and monitoring
 3: performance
 4: anything else
 
 Thanks
 Ajay
 



Re: Hbase vs Cassandra

2015-06-01 Thread Andrew Purtell
You are both making correct points, but FWIW HBase does not require use of 
Hadoop YARN or MapReduce. We do require HDFS of course. Some of the tools we 
ship are MapReduce applications but these are not core functions. We know of 
several large production use cases where the HBase(+HDFS) clusters are used as 
a data store backing online applications without colocated computation.


On Jun 2, 2015, at 7:29 AM, Vladimir Rodionov vladrodio...@gmail.com wrote:

 The key issue is that unless you need or want to use Hadoop, you
 shouldn’t be using HBase. Its not a stand alone product or system.
 Hello, what is use case of a big data application w/o Hadoop?
 
 -Vlad
 
 On Mon, Jun 1, 2015 at 2:26 PM, Michael Segel michael_se...@hotmail.com
 wrote:
 
 Saying Ambari rules is like saying that you like to drink MD 20/20 and
 calling it a fine wine.
 
 Sorry to all the Hortonworks guys but Amabari has a long way to go…. very
 immature.
 
 What that has to do with Cassandra vs HBase? I haven’t a clue.
 
 The key issue is that unless you need or want to use Hadoop, you shouldn’t
 be using HBase. Its not a stand alone product or system.
 
 
 
 
 On May 30, 2015, at 7:40 AM, Serega Sheypak serega.shey...@gmail.com
 wrote:
 
 1. No killer features comparing to hbase
 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool
 for
 Cassandra but it doesn't support vnodes.
 3. Rumors say it fast when it works;) the reason- it can silently drop
 data
 you try to write.
 4. Timeseries is a nightmare. The easiest approach is just replicate data
 to hdfs, partition it by hour/day and run spark/scalding/pig/hive/Impala
 
 пятница, 29 мая 2015 г. пользователь Ajay написал:
 
 Hi,
 
 I need some info on Hbase vs Cassandra as a data store (in general plus
 specific to time series data).
 
 The comparison in the following helps:
 1: features
 2: deployment and monitoring
 3: performance
 4: anything else
 
 Thanks
 Ajay
 
 


Re: Hbase vs Cassandra

2015-06-01 Thread Michael Segel
Saying Ambari rules is like saying that you like to drink MD 20/20 and calling
it a fine wine.

Sorry to all the Hortonworks guys, but Ambari has a long way to go… very
immature.

What does that have to do with Cassandra vs HBase? I haven't a clue.

The key issue is that unless you need or want to use Hadoop, you shouldn't be
using HBase. It's not a standalone product or system.




 On May 30, 2015, at 7:40 AM, Serega Sheypak serega.shey...@gmail.com wrote:
 
 1. No killer features comparing to hbase
 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool for
 Cassandra but it doesn't support vnodes.
 3. Rumors say it fast when it works;) the reason- it can silently drop data
 you try to write.
 4. Timeseries is a nightmare. The easiest approach is just replicate data
 to hdfs, partition it by hour/day and run spark/scalding/pig/hive/Impala
 
 пятница, 29 мая 2015 г. пользователь Ajay написал:
 
 Hi,
 
 I need some info on Hbase vs Cassandra as a data store (in general plus
 specific to time series data).
 
 The comparison in the following helps:
 1: features
 2: deployment and monitoring
 3: performance
 4: anything else
 
 Thanks
 Ajay
 



Re: Hbase vs Cassandra

2015-06-01 Thread Vladimir Rodionov
 The key issue is that unless you need or want to use Hadoop, you
shouldn’t be using HBase. Its not a stand alone product or system.

Hello, what is the use case for a big data application w/o Hadoop?

-Vlad

On Mon, Jun 1, 2015 at 2:26 PM, Michael Segel michael_se...@hotmail.com
wrote:

 Saying Ambari rules is like saying that you like to drink MD 20/20 and
 calling it a fine wine.

 Sorry to all the Hortonworks guys but Amabari has a long way to go…. very
 immature.

 What that has to do with Cassandra vs HBase? I haven’t a clue.

 The key issue is that unless you need or want to use Hadoop, you shouldn’t
 be using HBase. Its not a stand alone product or system.




  On May 30, 2015, at 7:40 AM, Serega Sheypak serega.shey...@gmail.com
 wrote:
 
  1. No killer features comparing to hbase
  2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool
 for
  Cassandra but it doesn't support vnodes.
  3. Rumors say it fast when it works;) the reason- it can silently drop
 data
  you try to write.
  4. Timeseries is a nightmare. The easiest approach is just replicate data
  to hdfs, partition it by hour/day and run spark/scalding/pig/hive/Impala
 
  пятница, 29 мая 2015 г. пользователь Ajay написал:
 
  Hi,
 
  I need some info on Hbase vs Cassandra as a data store (in general plus
  specific to time series data).
 
  The comparison in the following helps:
  1: features
  2: deployment and monitoring
  3: performance
  4: anything else
 
  Thanks
  Ajay
 




Re: Hbase vs Cassandra

2015-06-01 Thread Otis Gospodnetic
Hi Ajay,

You won't be able to get an unbiased opinion here easily.  You'll need to try
each and see how it works for your use case.  We use HBase for the SPM backend
and it has worked well for us - it's stable, handles billions and billions of
rows (I lost track of the actual number many moons ago), and is fast if you get
your key design right.  I'll answer your question about monitoring:

I'd say both are equally well monitorable.  SPM http://sematext.com/spm
can monitor both HBase and Cassandra equally well.  Because Cassandra is a
bit simpler (vs. HBase having multiple processes one needs to run), it's a
bit simpler to add monitoring to Cassandra, but the difference is small.

SPM is at http://sematext.com/spm if you want to have a look.  We expose
our own HBase clusters in the live demo, so you can see what metrics HBase
exposes.  We don't run Cassandra, so we can't show its graphs, but you can
see some charts, metrics, and filters for Cassandra at
http://blog.sematext.com/2014/06/02/announcement-cassandra-performance-monitoring-in-spm/

I hope this helps.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Fri, May 29, 2015 at 3:12 PM, Ajay ajay.ga...@gmail.com wrote:

 Hi,

 I need some info on Hbase vs Cassandra as a data store (in general plus
 specific to time series data).

 The comparison in the following helps:
 1: features
 2: deployment and monitoring
 3: performance
 4: anything else

 Thanks
 Ajay



Re: Hbase vs Cassandra

2015-06-01 Thread Michael Segel
The point is that HBase is part of the Hadoop ecosystem, not a standalone
database like Cassandra.

This is one thing that gets lost when people want to compare NoSQL databases /
data stores.

As to Big Data without Hadoop? Well, there's Spark on Mesos … :-P
And there are other Big Data systems out there, but they are not as well known.
LexisNexis had their proprietary system that they've been trying to sell …


 On Jun 1, 2015, at 5:29 PM, Vladimir Rodionov vladrodio...@gmail.com wrote:
 
 The key issue is that unless you need or want to use Hadoop, you
 shouldn’t be using HBase. Its not a stand alone product or system.
 
 Hello, what is use case of a big data application w/o Hadoop?
 
 -Vlad
 
 On Mon, Jun 1, 2015 at 2:26 PM, Michael Segel michael_se...@hotmail.com
 wrote:
 
 Saying Ambari rules is like saying that you like to drink MD 20/20 and
 calling it a fine wine.
 
 Sorry to all the Hortonworks guys but Amabari has a long way to go…. very
 immature.
 
 What that has to do with Cassandra vs HBase? I haven’t a clue.
 
 The key issue is that unless you need or want to use Hadoop, you shouldn’t
 be using HBase. Its not a stand alone product or system.
 
 
 
 
 On May 30, 2015, at 7:40 AM, Serega Sheypak serega.shey...@gmail.com
 wrote:
 
 1. No killer features comparing to hbase
 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool
 for
 Cassandra but it doesn't support vnodes.
 3. Rumors say it fast when it works;) the reason- it can silently drop
 data
 you try to write.
 4. Timeseries is a nightmare. The easiest approach is just replicate data
 to hdfs, partition it by hour/day and run spark/scalding/pig/hive/Impala
 
 пятница, 29 мая 2015 г. пользователь Ajay написал:
 
 Hi,
 
 I need some info on Hbase vs Cassandra as a data store (in general plus
 specific to time series data).
 
 The comparison in the following helps:
 1: features
 2: deployment and monitoring
 3: performance
 4: anything else
 
 Thanks
 Ajay
 
 
 



Re: How to scan only Memstore from end point co-processor

2015-06-01 Thread Gautam Borah
Thanks Vladimir. We will try this out soon.

Regards,
Gautam

On Mon, Jun 1, 2015 at 12:22 AM, Vladimir Rodionov vladrodio...@gmail.com
wrote:

 InternalScan has ctor from Scan object

 See https://issues.apache.org/jira/browse/HBASE-12720

 You can instantiate InternalScan from Scan, set checkOnlyMemStore, then
 open RegionScanner, but the best approach is
 to cache data on write and run regular RegionScanner from memstore and
 block cache.

 best,
 -Vlad




 On Sun, May 31, 2015 at 11:45 PM, Anoop John anoop.hb...@gmail.com
 wrote:

  If your scan is having a time range specified in it, HBase internally
 will
  check this against the time range of files etc and will avoid those which
  are clearly out of your interested time range.  You dont have to do any
  thing for this.  Make sure you set the TimeRange for ur read
 
  -Anoop-
 
  On Mon, Jun 1, 2015 at 12:09 PM, ramkrishna vasudevan 
  ramkrishna.s.vasude...@gmail.com wrote:
 
   We have a postScannerOpen hook in the CP but that may not give you a
  direct
   access to know which one are the internal scanners on the Memstore and
   which one are on the store files. But this is possible but we may need
 to
   add some new hooks at this place where we explicitly add the internal
   scanners required for a scan.
  
   But still a general question - are you sure that your data will be only
  in
   the memstore and that the latest data would not have been flushed by
 that
   time from your memstore to the Hfiles.  I see that your scenario is
 write
   centric and how can you guarentee your data can be in memstore only?
   Though your time range may say it is the latest data (may be 10 to 15
  min)
   but you should be able to configure your memstore flushing in such a
 way
   that there are no flushes happening for the latest data in that 10 to
 15
   min time.  Just saying my thoughts here.
  
  
  
  
   On Mon, Jun 1, 2015 at 11:46 AM, Gautam Borah gbo...@appdynamics.com
   wrote:
  
Hi all,
   
Here is our use case,
   
We have a very write heavy cluster. Also we run periodic end point co
processor based jobs that operate on the data written in the last
 10-15
mins, every 10 minute.
   
Is there a way to only query in the MemStore from the end point
co-processor? The periodic job scans for data using a time range. We
   would
like to implement a simple logic,
   
a. if query time range is within MemStore's TimeRangeTracker, then
  query
only memstore.
b. If end Time of the query time range is within MemStore's
TimeRangeTracker, but query start Time is outside MemStore's
TimeRangeTracker (memstore flush happened), then query both MemStore
  and
Files.
c. If start time and end time of the query is outside of MemStore
TimeRangeTracker we query only files.
   
The incoming data is time series and we do not allow old data (out of
   sync
from clock) to come into the system(HBase).
   
Cloudera has a scanner
  org.apache.hadoop.hbase.regionserver.InternalScan,
that has methods like checkOnlyMemStore() and checkOnlyStoreFiles().
 Is
this available in Trunk?
   
Also, how do I access the Memstore for a Column Family in the end
 point
co-processor from CoprocessorEnvironment?
   
  
 



Re: Hbase vs Cassandra

2015-06-01 Thread Russell Jurney
HBase can do range scans, and one can attack many problems with range scans.
Cassandra can't do range scans.

HBase has a master. Cassandra does not.

Those are the two main differences.

On Monday, June 1, 2015, Andrew Purtell andrew.purt...@gmail.com wrote:

 HBase can very well be a standalone database, but we are debating
 semantics not technology I suspect. HBase uses some Hadoop ecosystem
 technologies but is absolutely a first class data store. I need to look no
 further than my employer for an example of a rather large production deploy
 of HBase* as a (internal) service, a high scale data storage platform.

 * - Strictly speaking HBase accessed with Apache Phoenix's JDBC driver.


  On Jun 2, 2015, at 10:32 AM, Michael Segel michael_se...@hotmail.com
 javascript:; wrote:
 
  The point is that HBase is part of the Hadoop ecosystem. Not a stand
 alone database like Cassandra.
 
  This is one thing that gets lost when people want to compare NoSQL
 databases / data stores.
 
  As to Big Data without Hadoop? Well, there’s spark on mesos … :-P
  And there are other Big Data systems out there but are not as well known.
  Lexus/Nexus had their proprietary system that they’ve been trying to
 sell …
 
 
  On Jun 1, 2015, at 5:29 PM, Vladimir Rodionov vladrodio...@gmail.com
 javascript:; wrote:
 
  The key issue is that unless you need or want to use Hadoop, you
  shouldn’t be using HBase. Its not a stand alone product or system.
 
  Hello, what is use case of a big data application w/o Hadoop?
 
  -Vlad
 
  On Mon, Jun 1, 2015 at 2:26 PM, Michael Segel 
 michael_se...@hotmail.com javascript:;
  wrote:
 
  Saying Ambari rules is like saying that you like to drink MD 20/20 and
  calling it a fine wine.
 
  Sorry to all the Hortonworks guys but Amabari has a long way to go….
 very
  immature.
 
  What that has to do with Cassandra vs HBase? I haven’t a clue.
 
  The key issue is that unless you need or want to use Hadoop, you
 shouldn’t
  be using HBase. Its not a stand alone product or system.
 
 
 
 
  On May 30, 2015, at 7:40 AM, Serega Sheypak serega.shey...@gmail.com
 javascript:;
  wrote:
 
  1. No killer features comparing to hbase
  2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own
 tool
  for
  Cassandra but it doesn't support vnodes.
  3. Rumors say it fast when it works;) the reason- it can silently drop
  data
  you try to write.
  4. Timeseries is a nightmare. The easiest approach is just replicate
 data
  to hdfs, partition it by hour/day and run
 spark/scalding/pig/hive/Impala
 
  пятница, 29 мая 2015 г. пользователь Ajay написал:
 
  Hi,
 
  I need some info on Hbase vs Cassandra as a data store (in general
 plus
  specific to time series data).
 
  The comparison in the following helps:
  1: features
  2: deployment and monitoring
  3: performance
  4: anything else
 
  Thanks
  Ajay
 
 
 
 



-- 
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com


Re: Hbase vs Cassandra

2015-06-01 Thread Andrew Purtell
HBase can very well be a standalone database, but we are debating semantics, not
technology, I suspect. HBase uses some Hadoop ecosystem technologies but is
absolutely a first-class data store. I need look no further than my employer for
an example of a rather large production deployment of HBase* as an (internal)
service, a high-scale data storage platform.

* - Strictly speaking, HBase accessed with Apache Phoenix's JDBC driver.


 On Jun 2, 2015, at 10:32 AM, Michael Segel michael_se...@hotmail.com wrote:
 
 The point is that HBase is part of the Hadoop ecosystem. Not a stand alone 
 database like Cassandra. 
 
 This is one thing that gets lost when people want to compare NoSQL databases 
 / data stores. 
 
 As to Big Data without Hadoop? Well, there’s spark on mesos … :-P
 And there are other Big Data systems out there but are not as well known. 
 Lexus/Nexus had their proprietary system that they’ve been trying to sell … 
 
 
 On Jun 1, 2015, at 5:29 PM, Vladimir Rodionov vladrodio...@gmail.com wrote:
 
 The key issue is that unless you need or want to use Hadoop, you
 shouldn’t be using HBase. Its not a stand alone product or system.
 
 Hello, what is use case of a big data application w/o Hadoop?
 
 -Vlad
 
 On Mon, Jun 1, 2015 at 2:26 PM, Michael Segel michael_se...@hotmail.com
 wrote:
 
 Saying Ambari rules is like saying that you like to drink MD 20/20 and
 calling it a fine wine.
 
 Sorry to all the Hortonworks guys but Amabari has a long way to go…. very
 immature.
 
 What that has to do with Cassandra vs HBase? I haven’t a clue.
 
 The key issue is that unless you need or want to use Hadoop, you shouldn’t
 be using HBase. Its not a stand alone product or system.
 
 
 
 
 On May 30, 2015, at 7:40 AM, Serega Sheypak serega.shey...@gmail.com
 wrote:
 
 1. No killer features comparing to hbase
 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool
 for
 Cassandra but it doesn't support vnodes.
 3. Rumors say it fast when it works;) the reason- it can silently drop
 data
 you try to write.
 4. Timeseries is a nightmare. The easiest approach is just replicate data
 to hdfs, partition it by hour/day and run spark/scalding/pig/hive/Impala
 
 пятница, 29 мая 2015 г. пользователь Ajay написал:
 
 Hi,
 
 I need some info on Hbase vs Cassandra as a data store (in general plus
 specific to time series data).
 
 The comparison in the following helps:
 1: features
 2: deployment and monitoring
 3: performance
 4: anything else
 
 Thanks
 Ajay
 
 
 
 


[OFFTOPIC] Big Data Application Meetup

2015-06-01 Thread Alex Baranau
Hi everyone,

I wanted to drop a note about a newly organized developer meetup in the Bay
Area, the Big Data Application Meetup (http://meetup.com/bigdataapps), and a
call for speakers. The plan is for meetup topics to be focused on application
use cases: how developers can build end-to-end solutions with open-source big
data technologies.

HBase is extremely popular among developers building on the Hadoop stack, and
we would love to see talks about using it in big data solutions. If you want to
share your experience, please email me back. If you have any questions, I will
be happy to answer them.

We plan for the first event to be hosted by Cask at its HQ in Palo Alto at the
end of June.

Thank you,
Alex Baranau