Hey Stack,
It looks like 0.90.0 hbase is showing in the releases repository now.
Let me know if there's an issue with it.
I've been out of the loop. Sorry that I didn't catch this earlier but
I'm glad you managed to figure it out. Thank you very much for the
release.
Cheers,
Lars
Thanks a lot St.Ack. Will
https://repository.apache.org/content/repositories/releases/ be
synchronized with Maven Central?
Regards,
Imran
On Mon, Jan 31, 2011 at 10:33 AM, Stack st...@duboce.net wrote:
Daniel:
It looks like 0.90.0 hbase is showing in the releases repository now.
Let me know
I was wondering if there was a function in hbase that would translate to
running this code atomically on the server side...
checkAndPut(byte[] key, byte[] previousValue, byte[] newValue)
OR maybe there is something similar to the code below in hbase
such that this code is run(so I
In both 0.20.6 and 0.90 there's exactly that: checkAndPut. Check out the HTable class.
On 1/31/11 4:00 PM, Hiller, Dean (Contractor) wrote:
I was wondering if there was a function in hbase that would translate to
running this code atomically on the server side...
checkAndPut(byte[] key, byte[]
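For reference, a minimal sketch of the checkAndPut call as it appears in
the 0.90 HTable API; the table and column names here are hypothetical:

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  HTable table = new HTable(HBaseConfiguration.create(), "mytable");
  byte[] row = Bytes.toBytes("row1");
  byte[] fam = Bytes.toBytes("f");
  byte[] qual = Bytes.toBytes("q");

  Put put = new Put(row);
  put.add(fam, qual, Bytes.toBytes("newValue"));

  // The put is applied only if the checked cell currently equals the
  // expected value; the check and the put happen atomically on the
  // region server.
  boolean applied = table.checkAndPut(row, fam, qual,
      Bytes.toBytes("previousValue"), put);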
Hi Stack,
Good news, thanks. I've used it and compiled the project successfully.
No issues; one remark, though.
We use MiniHBaseCluster from hbase-tests library and this class is used
together with MiniDFSCluster from hadoop-tests.
The dependency on MiniDFSCluster being in our code is not
Great stuff, thanks.
-Pete
-Original Message-
From: Lars George [mailto:lars.geo...@gmail.com]
Sent: Sunday, January 30, 2011 10:07 PM
To: user@hbase.apache.org
Subject: Re: Row Keys
Hi Pete,
Look into the Mozilla Socorro project
(http://code.google.com/p/socorro/) for how to salt the
(moving this to the user mailing list where it belongs)
You need to make sure that your webapp knows the address of the
JobTracker. Usually this is done either by putting mapred-site.xml on
your app's classpath or by setting mapred.job.tracker correctly, so
that in createSubmittableJob you would
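A minimal sketch of the second option, assuming a 0.20-era Hadoop
client; the JobTracker host below is hypothetical:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;

  Configuration conf = HBaseConfiguration.create();
  // Point the client at the JobTracker explicitly instead of relying
  // on mapred-site.xml being on the webapp's classpath.
  conf.set("mapred.job.tracker", "jobtracker.example.com:9001");
  // ... then hand conf to createSubmittableJob / the Job constructor ...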
The HBase cluster doesn't have the master problems with hadoop-append turned
on: we will try finding out why it wasn't working with a non-append version of
hadoop (with a previous version of hadoop, it was getting stuck while splitting
logs).
But there are other issues now (with append turned
On Mon, Jan 31, 2011 at 9:54 AM, Vidhyashankar Venkataraman
vidhy...@yahoo-inc.com wrote:
The HBase cluster doesn't have the master problems with hadoop-append turned
on: we will try finding out why it wasn't working with a non-append version
of hadoop (with a previous version of hadoop, it
Yes, I will file an issue after collecting the right logs.
We will try finding the cause of the META server choke.
Another question: the master still seems to be taking (a lot of) time to load
the table during startup: I found that the regions percheckin config variable
isn't used anymore. I
(moving to the user mailing list, where it belongs)
My educated guess is that you had a GC pause that lasted for more than a
minute while a file was being written to. Even if the write wasn't
happening, your region server would have committed suicide anyway since it
was probably past its lease
On Mon, Jan 31, 2011 at 6:18 AM, Imran M Yousuf imyou...@gmail.com wrote:
Thanks a lot St.Ack. Will
https://repository.apache.org/content/repositories/releases/ be
synchronized with Maven Central?
It looks like other Apache projects' published artifacts show up in Maven
Central. Maybe there is a step
Thanks for the below Daniel. Any chance of your filing a JIRA, adding
the below as a patch, and marking it as fix against 0.90.1 (critical)
which should be out soon?
St.Ack
On Fri, Jan 14, 2011 at 10:51 PM, tsuna tsuna...@gmail.com wrote:
On Fri, Jan 14, 2011 at 4:06 PM, Sean Bigdatafun
sean.bigdata...@gmail.com wrote:
But how can the client understand which k-v belongs to an individual RS?
Does it need to scan the .META. table? (if so, it's an expensive op).
Hi there,
As a part of the ASF Infrastructure Team I think it's just plain wrong
to add people's accounts as a Maven repository. This will just lead to a
lot of traffic and a single point of failure, as it will not use any mirrors.
Bye
Norman
On Monday, 31 January 2011, Stack st...@duboce.net wrote:
Thanks for
Hi,
is there a way to get top ... rows from a table using the shell, or rows
for a range of keys?
Thank you,
Mark
It's not in the manual yet, Vidhya. Assignment has completely changed
in 0.90. No more do we assign by adding payload to the heartbeat.
Now we assign by direct RPC from master to regionserver, with master
and regionserver moving the region through state changes up in ZK
until the region is successfully
Top? You mean those that sort first? You could start a scan in the
shell at a particular row and then limit the return:
hbase> scan 'TABLENAME', {STARTROW => 'xyz', LIMIT => 10}
Run 'help' to see more on the scan command.
St.Ack
On Mon, Jan 31, 2011 at 11:02 AM, Mark Kerzner
Worked perfectly well. Some people know stuff...:)
Thank you
On Mon, Jan 31, 2011 at 1:09 PM, Stack st...@duboce.net wrote:
Top? You mean those that sort first? You could start a scan in the
shell at a particular row and then limit the return:
hbase> scan 'TABLENAME', {STARTROW => 'xyz',
Norman:
Yes.
The repo has a (signed) build of a Hadoop branch that has not had a
release made from it. We're in negotiations about making a release
from this branch, but it's 'complicated'. Hosting in a personal account
is, as we see it, a temporary stopgap until we get to an official
release.
Hi everyone,
In my company we are experimenting with HBase and I'd like to know the best
way to persist a semi-structured complex (3 levels) entity represented as
JSON to HBase.
I've already successfully written a Java client that persists rows in a
table, and now my goal is to persist this JSON.
I've
Don't use MapWritable.
In the layer above HBase, inside whatever is hosting the HBase client,
serialize the JSON to bytes and then write that to an HBase cell. In
the same layer, when reading, do the deserialization.
HBase only does byte arrays.
St.Ack
On Mon, Jan 31, 2011 at 11:30 AM, Pablo
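A minimal sketch of what Stack describes, with hypothetical table and
column names:

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  HTable table = new HTable(HBaseConfiguration.create(), "entities");
  String json = "{\"id\":\"42\",\"name\":\"example\"}";

  // Write: the JSON is just an opaque byte[] in a single cell.
  Put put = new Put(Bytes.toBytes("row-42"));
  put.add(Bytes.toBytes("d"), Bytes.toBytes("json"), Bytes.toBytes(json));
  table.put(put);

  // Read: deserialize back in the client layer; HBase only sees bytes.
  Result r = table.get(new Get(Bytes.toBytes("row-42")));
  String back = Bytes.toString(r.getValue(Bytes.toBytes("d"),
      Bytes.toBytes("json")));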
Hi,
what should be used instead of
TableMap (http://hbase.apache.org/docs/r0.20.5/api/org/apache/hadoop/hbase/mapred/TableMap.html),
going forward?
Thank you,
Mark
Use the mapreduce package if you can. Otherwise, keep on with mapred
package and ignore the deprecations. There is nothing 'wrong' w/ the
mapred package; we just followed hadoop's deprecation. Also our
mapreduce has functionality not in mapred.
St.Ack
On Mon, Jan 31, 2011 at 11:59 AM, Mark
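For illustration, a minimal mapper against the newer mapreduce package;
the class name and output types here are hypothetical:

  import java.io.IOException;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.mapreduce.TableMapper;
  import org.apache.hadoop.io.Text;

  public class MyTableMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      // Emit the row key; real logic would inspect the Result's KeyValues.
      context.write(new Text(row.get()), new Text(""));
    }
  }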
Thanks for the feedback Stack!
So you suggest just serializing the JSON representation as a String or as a Map?
Something like this:
(supposing item is a String or a Map)
Put row = new Put(Bytes.toBytes(item.id));
row.add(Bytes.toBytes("json"), Bytes.toBytes(1), Bytes.toBytes(item));
table.put(row);
Thank you.
Actually, I am OK with this, since it was taken out in 0.89. It is things
like RowResult that cause problems now - but for those I already know that I
should use Get/Put API.
Cheers,
Mark
On Mon, Jan 31, 2011 at 2:02 PM, Stack st...@duboce.net wrote:
Use the mapreduce package if you
How you serialize your objects to hbase depends on how you want to use your
objects later. Assuming that you have a good serialization to json already,
and all you want to do is put and get the items, then just convert your json
string to a byte array and put it in a column qualifier (e.g.
Thanks Dave!
It really makes sense. It all depends on how you want to process the data
later.
I guess I'm going to persist the Map instead of a String to leverage the
filters without having to parse the JSON.
Pablo
On Mon, Jan 31, 2011 at 5:20 PM, Buttler, David buttl...@llnl.gov wrote:
How you
It's easy to understand how single-row operations can be atomic, but
checkAndPut allows a check on one row and a put on another while promising this
will be atomic. Is it really guaranteed to be atomic in that case? How does
that work if the two rows are in different regions (and possibly
My use of HBase is essentially what Stack describes: I serialize little log
entry objects with (mostly) protobuf and store them in a single cell in HBase.
I did this at first because it was easy, and made a note to go back and break
out the fields into their own columns, and in fact into
After doing many tests (10k serialized scans) we see that on average opening
the scanner takes 2/3 of the read time if the read is fresh
(scannerOpenWithStop=~35ms, scannerGetList=~10ms). The second time around (1
minute later) we assume the region cache is hot and the open scanner is
much faster
Hi,
Good catch. While the API does let you specify 2 different row keys,
one in the 'put' and one in the call, doing so would be ... not
advised. Right now there is no check for this, and if you were to
pass 2 different rows, things would not be so good.
Here is an issue:
Hey,
The region location cache is held by a soft reference, so as long as
you don't have memory pressure, it will never get invalidated just
because of time.
Another thing to consider, in HBase, the open scanner code also seeks
and reads the first block of the scan. This may incur a read to disk
We have heavy writes always going on so there is always memory pressure.
If the open scanner reads the first block maybe that explains the 8ms the
second time a test is run, but why is the first run averaging 35ms to open
and when the same read requests are sent again the open is only 8ms? There
On Mon, Jan 31, 2011 at 1:38 PM, Wayne wav...@gmail.com wrote:
After doing many tests (10k serialized scans) we see that on average opening
the scanner takes 2/3 of the read time if the read is fresh
(scannerOpenWithStop=~35ms, scannerGetList=~10ms).
I saw that this w/e. The getScanner
The Regionserver caches blocks, so a second read would benefit from
the caching of the first read. Over time blocks get evicted in an LRU
manner, and things would get slow again.
Does this make sense to you?
On Mon, Jan 31, 2011 at 1:50 PM, Wayne wav...@gmail.com wrote:
We have heavy writes
On Mon, Jan 31, 2011 at 4:54 PM, Stack st...@duboce.net wrote:
On Mon, Jan 31, 2011 at 1:38 PM, Wayne wav...@gmail.com wrote:
After doing many tests (10k serialized scans) we see that on average
opening
the scanner takes 2/3 of the read time if the read is fresh
I assume BLOCKCACHE => 'false' would turn this off? We have turned off cache
on all tables.
On Mon, Jan 31, 2011 at 4:54 PM, Ryan Rawson ryano...@gmail.com wrote:
The Regionserver caches blocks, so a second read would benefit from
the caching of the first read. Over time blocks get evicted in
Even without block caching, the Linux buffer cache is still a factor,
and your reads still go through it (via the datanode).
When Stack is talking about the StoreScanner, this is a particular
class inside of HBase that does the job of reading from 1 column
family. The first time you
The file system buffer cache explains what is going on: the open scanner
reads the first block, and the subsequent read hits the same block,
which is served out of the file buffer cache.
Thanks.
On Mon, Jan 31, 2011 at 5:22 PM, Ryan Rawson ryano...@gmail.com wrote:
Even without block
The way I understand it is that old versions do not actually disappear until a
compaction occurs. A compaction should occur once per day unless you have
changed the major compaction settings, or whenever a region splits.
Dave
-Original Message-
From: Mike Percy
You are correct: since we do not prune extra versions except during
these major compactions that happen about once a day, if you delete a
recent version and it exposes an older version, you will see this.
I might consider this a mis-feature. I would encourage you to
consider using the
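If waiting for the daily major compaction is an issue, it can also be
forced from the client; a hedged sketch, with a hypothetical table name:

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
  // Asynchronously asks the cluster to major-compact the table; old
  // versions and deleted cells are pruned when the compaction runs.
  admin.majorCompact("mytable");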
On Mon, Jan 31, 2011 at 5:13 PM, Jim X jim.p...@gmail.com wrote:
Does HTable.getWriteBuffer() do a rollback?
I guess not --- this only allows you to know what has not been successfully
committed to the server after you catch the exception.
Correct me if I am wrong.
Sean
Jim
On Mon, Jan
It just retrieves the current state of the buffer. The buffer is
mutated to remove successful edits as they occur; during an exception,
the ones that were determined to be successful are also removed.
So if you catch an exception, you can inspect this buffer and know
these puts need to be sent
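A minimal sketch of that inspection, assuming autoFlush is off and a
hypothetical table name:

  import java.io.IOException;
  import java.util.List;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;

  HTable table = new HTable(HBaseConfiguration.create(), "mytable");
  table.setAutoFlush(false);
  try {
    table.flushCommits();
  } catch (IOException e) {
    // Successful edits were removed from the buffer as they were
    // applied, so whatever remains still needs to be (re)sent.
    List<Put> pending = table.getWriteBuffer();
    System.err.println(pending.size() + " edits not yet committed");
  }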
Thank you very much.
I will check more about the cause according to your suggestion. Can you
explain more about 'GC pause'?
Because of my company's network restrictions, I'm not able to post a large
amount of logs to pastebin (about 15KB of logs in 30 minutes).
But thanks all the same, if I have some way to
1) Version numbers:
hadoop-0.20.2
hbase-0.20.6
2) Setting autoFlush to 'true' works, but wouldn't that slow down the insertion
process?
3) Here's how I had set it up:
In my Mapper's setup method:
table = new HTable(new HBaseConfiguration(), XYZ_TABLE);
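On (2), a hedged sketch of the tradeoff: with autoFlush on, every put()
is a separate RPC; with it off, puts are batched in the client-side
write buffer:

  table = new HTable(new HBaseConfiguration(), XYZ_TABLE);
  // false: puts are buffered until the buffer fills or flushCommits()
  // is called (faster for bulk insertion; remember to flush at the end).
  table.setAutoFlush(false);
  table.setWriteBufferSize(4 * 1024 * 1024); // bytes; tune to the workload
  // ... put()s happen in map() ...
  table.flushCommits(); // e.g. in the Mapper's cleanup()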
I feel this is a great discussion, so let's think of HDFS' customers.
(1) MapReduce --- definitely a perfect fit as Nathan has pointed out
(2) HBase --- it seems HBase (Bigtable's log-structured file) did a great
job on this. The solution comes out of Google, so it must be right. But would
Google
I use HBase 0.20.6 in the latest Cygwin under Windows Vista. I got the
following error when I created and deleted table from a Java client. A
table can be created, listed and deleted. But the error is just
annoying.
11/01/31 22:09:18 INFO zookeeper.ZooKeeper: Initiating client
connection,
On Tue, Feb 1, 2011 at 12:39 AM, Stack st...@duboce.net wrote:
On Mon, Jan 31, 2011 at 6:18 AM, Imran M Yousuf imyou...@gmail.com wrote:
Thanks a lot St.Ack. Will
https://repository.apache.org/content/repositories/releases/ be
synchronized with Maven Central?
It looks like other published