Client-side batch processing is done at the RegionServer level, i.e., all Action
objects are grouped together on a per-RS basis and sent in one RPC. Once the
batch arrives at a RS, it gets distributed across the corresponding Regions, and
these Action objects are processed one by one. This include
Hello,
our test data generator client uses HTable.batch to transfer puts to the
server. For example, is auto-flush at the table level off when
running batches? Any other client-API-side optimizations when working
with HTable.batch? We have writing to the WAL disabled already.
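For reference, our generator boils down to roughly this (an untested sketch against the 0.90-era Java client; the table name, family, qualifier and buffer size below are made up, and it obviously needs a running cluster):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Row;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchPutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "testdata");   // hypothetical table name
        table.setAutoFlush(false);                     // buffer puts client-side
        table.setWriteBufferSize(8 * 1024 * 1024);     // e.g. an 8 MB write buffer

        List<Row> batch = new ArrayList<Row>();
        for (int i = 0; i < 1000; i++) {
            Put p = new Put(Bytes.toBytes(String.format("row-%06d", i)));
            p.setWriteToWAL(false);                    // skip the WAL, as in our generator
            p.add(Bytes.toBytes("f"), Bytes.toBytes("q"),
                  Bytes.toBytes("value-" + i));
            batch.add(p);
        }
        table.batch(batch);      // actions are grouped per regionserver and shipped
        table.flushCommits();    // flush anything still buffered
        table.close();
    }
}
```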
Thanks,
Excuse me, I've got a question here. Can we use coprocessors in hbase-0.90-*?
I've heard that we can't use coprocessors until
hbase-0.91-*. And is there any way to get the hbase-0.91-*
release, or anything after that?
Thanks.
With regards.
Woo
2011/8/11 Himanshu Vashishtha
Hi,
I started to use bulk upload and encountered a strange problem.
I'm using Cloudera cdh3-u1.
I'm using HFileOutputFormat.configureIncrementalLoad() to configure my job.
This method creates a partition file for the TotalOrderPartitioner and saves it
to HDFS.
When the TotalOrderPartitioner initiated
Thanks all.
I've seen that there is no LIMIT with HBase. I mean the following statement:
SELECT ... FROM ... LIMIT 1. (Because there is this method with Mongo^^)
Is it implemented?
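For context, the closest thing I found so far is capping a scan (an untested sketch; the table name is invented and this needs a live cluster):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PageFilter;

public class LimitOneExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");  // hypothetical table name

        Scan scan = new Scan();
        // PageFilter(1) tells each region to stop after one row; the client
        // then just takes the first result it sees.
        scan.setFilter(new PageFilter(1));
        ResultScanner scanner = table.getScanner(scan);
        try {
            Result first = scanner.next();  // the LIMIT 1 equivalent
            System.out.println(first == null ? "empty table" : first);
        } finally {
            scanner.close();
            table.close();
        }
    }
}
```

Note that PageFilter applies per region, so for LIMIT n with n > 1 the client still has to stop reading after n rows itself.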
2011/8/11 Jason Rutherglen jason.rutherg...@gmail.com
Laurent,
This could be implemented with Lucene, eg,
Now I see that it uses the distributed cache - but for some reason
the TotalOrderPartitioner does not grab it.
Ophir
On Thu, Aug 11, 2011 at 11:08, Ophir Cohen oph...@gmail.com wrote:
Hi,
I started to use bulk upload and encounter a strange problem.
I'm using Cloudera cdh3-u1.
I'm using
I did some more tests and found the problem: on a local run the distributed
cache does not work.
On full cluster it works.
Sorry for your time...
Ophir
PS
Is there any way to use the distributed cache locally as well (i.e. when I'm
running MR from IntelliJ IDEA)?
On Thu, Aug 11, 2011 at 11:20, Ophir
Hi,
we had an interesting failure yesterday on the old 0.20.4 version of hbase.
I realize that this is a very old version but am wondering whether this is
an issue that is still present and should be fixed.
We added a new node to a 44-node cluster, starting the datanode and
regionserver processes
Hi,
I am relatively new to HBase and I am trying to find the path that is followed
(i.e. the lines executed in the source code) in a query, say select *. Can
anyone help me in this regard?
Thanks
Hi Anurag,
Before starting this, I hope you have created a table and inserted data. In
HBase, inserts are done using Put.
So for tracking the read flow in HBase, it is about getting data from a table.
There are two things
- Get entire table contents
- get a specific row.
For getting a specific row using a java
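Roughly, the two read paths look like this (an untested sketch; the table, family and qualifier names are placeholders and a running cluster is assumed):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadPathsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");  // hypothetical table name

        // 1) Get the entire table contents: a Scan
        ResultScanner scanner = table.getScanner(new Scan());
        for (Result row : scanner) {
            System.out.println(row);
        }
        scanner.close();

        // 2) Get a specific row: a Get
        Get get = new Get(Bytes.toBytes("row1"));
        Result result = table.get(get);
        byte[] value = result.getValue(Bytes.toBytes("f"), Bytes.toBytes("q"));
        System.out.println(value == null ? "no value" : Bytes.toString(value));

        table.close();
    }
}
```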
Thanks for your reply Ram. Yes I have set up a 2 node cluster.
That does help to some extent. However, I am aiming to modify the HBase source
code itself and see the lines executed when I pass any query in the HBase
shell, select * or create table for example. Any heads-up on this front?
Hi, everyone, I have some questions about the region-split mechanism. I've
learnt that a region contains several rows, and when a store file in a
region grows larger than the configured hbase.hregion.max.filesize, the
region is split in two. But what if I keep putting millions of values into a
Hi
Creating a table is done with HBaseAdmin.
On every table object you will have put and get.
Read the functionality of HBase and documentation on HBase. It will help
you.
Regards
Ram
-Original Message-
From: Anurag Awasthi [mailto:anuragawasth...@gmail.com]
Sent: Thursday, August
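In code, the basics look roughly like this (an untested sketch; the table and family names are placeholders and a running cluster is assumed):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class CreateTableExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // Table creation goes through HBaseAdmin
        HBaseAdmin admin = new HBaseAdmin(conf);
        if (!admin.tableExists("mytable")) {
            HTableDescriptor desc = new HTableDescriptor("mytable");
            desc.addFamily(new HColumnDescriptor("f"));
            admin.createTable(desc);
        }

        // Put and Get go through the table object
        HTable table = new HTable(conf, "mytable");
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("hello"));
        table.put(put);

        Result result = table.get(new Get(Bytes.toBytes("row1")));
        System.out.println(Bytes.toString(
            result.getValue(Bytes.toBytes("f"), Bytes.toBytes("q"))));
        table.close();
    }
}
```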
Thanks for your reply Woo. However, as I just posted to Ram, I am aiming to
modify the HBase source code itself and see the lines executed when I pass any
query in the HBase shell, select * or create table for example. Any heads-up on
this front?
No offense, but may I ask why you want to modify the HBase source code?
If you just want to provide some SQL operations to others using HBase,
maybe you can use filters to realize some basic SQL operations. If you are
interested in filters, please read the API documentation on HBase.
With
Ah, cool!
Thanks,
Thomas
-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Mittwoch, 10. August 2011 13:28
To: user@hbase.apache.org
Cc: user@hbase.apache.org
Subject: Re: Filters for non-Java clients?
See HBASE-4176.
Cheers
On Aug 10, 2011, at 2:36 AM,
Actually I wanted to modify the source code, and in the long run I aim to make
some structural changes in an attempt to optimise the query-processing
approaches for specific hardware. In this process I was looking for
the program flow of a query in the source code.
Regards,
Anurag
On
Hi there-
Please see the HBase book. http://hbase.apache.org/book.html
This has a chapter on developing with HBase. As for SQL, that's not
supported directly because HBase is a NoSQL database (see the Data Model
chapter).
Doug
On 8/11/11 6:44 AM, Anurag Awasthi anuragawasth...@gmail.com
Sorry for off topic, but just as a sample to understand fundamental
difference:
1. SELECT COUNT will take a few hours on MySQL InnoDB in most typical
cases, and _it_is_ implemented.
2. Same with HBase: full table scan. However, with MapReduce it might take
less time. Or, we can query Solr
Hi,
we've recently moved to HBase 0.90.3 (cdh3u1) from 0.20.6, which
resolved most of our previous issues, but we are now seeing many more
ScannerTimeoutExceptions than before. All these exceptions come from a
trace like this
org.apache.hadoop.hbase.client.ScannerTimeoutException: 307127ms
On Thu, Aug 11, 2011 at 3:38 AM, Xian Woo infinity0...@gmail.com wrote:
Hi, everyone, I have some questions about the region-split mechanism. I've
learnt that a region contains several rows, and when a store file in a
region grows larger than the configured hbase.hregion.max.filesize, the
region
Mind making an issue and pasting full stack traces with some
surrounding log? My guess is we will likely do the same in 0.90. Your
snippets will help us figure out where to dig in.
Thanks Matthias,
St.Ack
On Thu, Aug 11, 2011 at 2:33 AM, Matthias Hofschen hofsc...@gmail.com wrote:
Hi,
we had an
Coprocessors are only in trunk and will be part of 0.92. If you want to use
them now you'll have to build hbase from source yourself.
-- Lars
Xian Woo infinity0...@gmail.com schrieb:
Excuse me, I've got a question here. Can we use coprocessors in hbase-0.90-*?
I've heard that we
so I deleted the corrupt .logs files. OK, fine, no more issues there. But a
handful of regions in a very large table (2000+ regions) are offline (.META.
says offline=true).
How do I go about trying to get the regions online, and why does restarting
hbase have no effect (the regions are still offline)?
Thanks Himanshu,
but that is not quite what I meant.
Yes, a batch operation is broken up into chunks per regionserver, and then the
chunks are shipped to the individual regionservers.
But then there is no way to interact with those chunks (as a whole) at the
regionserver through coprocessors.
Hi All,
I am running HBase in distributed mode on a seven-node cluster with a backup
master. HBase is running properly in the backup-master setup. I want to run
this HBase on top of High Availability Hadoop. I saw the Avatar node mentioned
in the following link
On Wed, Aug 10, 2011 at 10:46 PM, lars hofhansl lhofha...@yahoo.com wrote:
I guess there could either be a {pre|post}Multi on RegionObserver (although
HRegionServer.multi does a lot of munging).
Or maybe a general {pre|post}Request with no arguments - in which case it
would be at least
Hey Lars,
Sorry if I have misled you.
The current Coprocessor infrastructure is at the _Region_ level, not at
the _RegionServer_ level.
All these batch operations ultimately end up at some rows in some
Regions, where you have hooked your CPs.
I am not able to follow your example. If you end up
Why not just read the source code? It isn't that many LOC, and it
doesn't really use anything that obscures the call chain: few
interfaces, etc. A solid IDE with code inspection will make short
work of it, just go at it!
Start at HRegionServer - it has the top level RPC calls that are
made.
I ran into the same issue. I tried check_meta.rb --fix and add_table.rb, and
still get the same hbck inconsistent-table report;
however, I am able to do a rowcount for the table and there is no problem.
Jimmy
--
From: Geoff Hendrey ghend...@decarta.com
Thanks Gary (and Himanshu),
correct, using batch operations is just an optimization and not a semantic
difference from single-row ops.
Should then all RPCs triggered by a coprocessor be avoided (and hence the use
of the env-provided HTableInterface be generally discouraged)?
The 2ndary index
Hello,
We need to do a range query on a 20-million-row table. I thought using
RowFilter with BinaryPrefixComparator would help. It gets the result
correctly. However, the speed is not acceptable. Here is the code
snippet:
1. Scan s = new Scan();
2. s.addFamily(myFamily);
3.
Hey -
Our table behaves fine until we try to do a mapreduce job that reads and
writes from the table. When we try to retrieve keys from the afflicted
regions, the job just hangs forever. It's interesting because we never
get timeouts of any sort. This is different than other failures we've
seen
On Thu, Aug 11, 2011 at 2:20 PM, Allan Yan hailun...@gmail.com wrote:
Hello,
Scan s = new Scan();
s.addFamily(myFamily);
s.setStartRow(startRow);
Filter rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL, new
BinaryPrefixComparator(startRow));
s.setFilter(rowFilter);
The usual reasons would be a mix of taking a long time to process
rows in the mapper and scanners that grab a lot of rows (using scanner
caching and maybe filters).
Did you enable DEBUG for HBase in your mapreduce context? This would
give relevant information, like if the client was doing lots of
I'm sure you've already seen this but just to be sure, do read
http://hbase.apache.org/book/perf.writing.html
Auto-flush is always on unless you turn it off yourself. Using
HTable.batch still respects this, except that it flushes all the rows
at the same time.
If you have fat values to insert and
as I said, run hbase org.jruby.Main add_table.rb table_name first, then
run hbase org.jruby.Main check_meta.rb --fix,
then restart hbase.
It doesn't completely solve the problem for me, as hbck still complains,
but at least it recovers all the data and I can do a full rowcount for the table.
Jimmy.
Hey Gary,
Thanks a lot!
With WhileMatchFilter, I am able to get the same performance as the stopRow
solution. And if the search is in the same HBase cluster, I am able to
get the result in a few milliseconds on a 6-node cluster.
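For the archives, both variants boil down to something like this (a sketch; the key prefix is made up, and the stop-row trick assumes the last prefix byte is not 0xFF):

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryPrefixComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.WhileMatchFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class PrefixScanExample {
    public static void main(String[] args) {
        byte[] prefix = Bytes.toBytes("user123|");  // hypothetical key prefix

        // Option 1: explicit stop row. Incrementing the last byte of the
        // prefix gives the first key just past the prefix range.
        byte[] stopRow = prefix.clone();
        stopRow[stopRow.length - 1]++;
        Scan byRange = new Scan(prefix, stopRow);

        // Option 2: WhileMatchFilter. A bare RowFilter keeps scanning to the
        // end of the table and merely drops non-matching rows; wrapping it in
        // WhileMatchFilter aborts the scan at the first non-matching row.
        Scan byFilter = new Scan(prefix);
        byFilter.setFilter(new WhileMatchFilter(
            new RowFilter(CompareFilter.CompareOp.EQUAL,
                new BinaryPrefixComparator(prefix))));
    }
}
```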
On Thu, Aug 11, 2011 at 2:26 PM, Gary Helmling ghelml...@gmail.com
One thing you might want to look at is HTableUtil. It's on trunk, but you
can look at the source and port it to whatever version you are using.
We've found that region-sorting helps a lot by minimizing the number of RS
calls in any given flush.
On 8/11/11 5:57 PM, Jean-Daniel Cryans
I am trying to bulk load data from an HDFS location into an HBase table. My key
is an IP address stored as a string in the HDFS location. How can I load
this IP address as binary while doing a bulk load?
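One way I'm considering is to parse the dotted quad into its 4 raw bytes in the mapper and emit those as the row key (a plain-Java sketch; the class and method names are my own):

```java
public class IpKeyUtil {
    // Convert a dotted-quad IPv4 string (e.g. "10.0.0.1") into its 4 raw
    // bytes, suitable for use as a binary HBase row key. Such keys sort
    // numerically by address rather than lexicographically as strings.
    public static byte[] ipToRowKey(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not a dotted-quad address: " + ip);
        }
        byte[] key = new byte[4];
        for (int i = 0; i < 4; i++) {
            int octet = Integer.parseInt(parts[i]);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + ip);
            }
            key[i] = (byte) octet;
        }
        return key;
    }

    public static void main(String[] args) {
        byte[] key = ipToRowKey("192.168.1.255");
        System.out.println(key.length);  // 4
    }
}
```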
Thanks,
check_meta.rb stack traces with NPE...
[hroot@doop10 bin]$ hbase org.jruby.Main check_meta.rb
Writables.java:75:in
`org.apache.hadoop.hbase.util.Writables.getWritable':
java.lang.NullPointerException: null (NativeException)
from Writables.java:119:in