Re: maven repository

2011-01-31 Thread Lars Francke
Hey Stack, "It looks like 0.90.0 hbase is showing in the releases repository now. Let me know if an issue with it." I've been out of the loop. Sorry that I didn't catch this earlier but I'm glad you managed to figure it out. Thank you very much for the release. Cheers, Lars

Re: maven repository

2011-01-31 Thread Imran M Yousuf
Thanks a lot St.Ack. Will https://repository.apache.org/content/repositories/releases/ be synchronized with Maven Central? Regards, Imran On Mon, Jan 31, 2011 at 10:33 AM, Stack st...@duboce.net wrote: Daniel: It looks like 0.90.0 hbase is showing in the releases repository now. Let me know

is there an atomic checkAndPut in hbase?

2011-01-31 Thread Hiller, Dean (Contractor)
I was wondering if there was a function in hbase that would translate to running this code atomically on the server side... checkAndPut(byte[] key, byte[] previousValue, byte[] newValue) OR maybe there is something similar to the code below in hbase such that this code is run(so I

Re: is there an atomic checkAndPut in hbase?

2011-01-31 Thread Claudio Martella
In both 0.20.6 and 0.90 there's exactly checkAndPut. Check out the HTable class. On 1/31/11 4:00 PM, Hiller, Dean (Contractor) wrote: I was wondering if there was a function in hbase that would translate to running this code atomically on the server side... checkAndPut(byte[] key, byte[]
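To make the semantics being asked about concrete, here is a minimal in-memory sketch of a compare-then-write step. It is not HBase code: `CheckAndPutSketch` and its `String` keys are hypothetical, and treating a `null` expected value as "the cell must not exist yet" is an assumption of this sketch. The point is only that the comparison and the write happen as one indivisible step, which is what HTable's checkAndPut is meant to provide on the server side.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the cells of one row; not HBase code. */
final class CheckAndPutSketch {
    private final Map<String, byte[]> cells = new HashMap<>();

    /**
     * Writes newValue only if the stored value equals expected.
     * In this sketch a null expected value means "the cell must not exist yet".
     * synchronized makes the comparison and the write one indivisible step.
     */
    public synchronized boolean checkAndPut(String key, byte[] expected, byte[] newValue) {
        byte[] current = cells.get(key);
        if (!Arrays.equals(current, expected)) {
            return false; // someone else changed the cell; nothing is written
        }
        cells.put(key, newValue.clone());
        return true;
    }

    /** Returns a copy of the stored value, or null if the cell is absent. */
    public synchronized byte[] get(String key) {
        byte[] value = cells.get(key);
        return value == null ? null : value.clone();
    }
}
```

A caller would typically read the current value, compute the new one, call `checkAndPut`, and retry from the read if it returns `false`.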

Re: maven repository

2011-01-31 Thread Daniel Iancu
Hi Stack, Good news, thanks. I've used it and compiled the project successfully. Issues, no, one remark still. We use MiniHBaseCluster from hbase-tests library and this class is used together with MiniDFSCluster from hadoop-tests. The dependency to MiniDFSCluster being in our code is not

RE: Row Keys

2011-01-31 Thread Peter Haidinyak
Great stuff, thanks. -Pete -----Original Message----- From: Lars George [mailto:lars.geo...@gmail.com] Sent: Sunday, January 30, 2011 10:07 PM To: user@hbase.apache.org Subject: Re: Row Keys Hi Pete, Look into the Mozilla Socorro project (http://code.google.com/p/socorro/) for how to salt the

Re: submitting jobs from a webapp

2011-01-31 Thread Jean-Daniel Cryans
(moving this to the user mailing list where it belongs) You need to make sure that your webapp knows the address of the JobTracker, usually this is done by either putting mapred-site.xml on your app's classpath or you can set mapred.job.tracker correctly so that in createSubmittableJob you would

Re: Unresponsive master in Hbase 0.90.0

2011-01-31 Thread Vidhyashankar Venkataraman
The Hbase cluster doesn't have the master problems with hadoop-append turned on: we will try finding out why it wasn't working with a non-append version of hadoop (with a previous version of hadoop, it was getting stuck while splitting logs). But there are other issues now (with append turned

Re: Unresponsive master in Hbase 0.90.0

2011-01-31 Thread Stack
On Mon, Jan 31, 2011 at 9:54 AM, Vidhyashankar Venkataraman vidhy...@yahoo-inc.com wrote: The Hbase cluster doesn't have the master problems with hadoop-append turned on: we will try finding out why it wasn't working with a non-append version of hadoop (with a previous version of hadoop, it

Re: Unresponsive master in Hbase 0.90.0

2011-01-31 Thread Vidhyashankar Venkataraman
Yes, I will file an issue after collecting the right logs. We will try finding the cause of the META server choke. Another question: the master still seems to be taking (a lot of) time to load the table during startup: I found that the regions percheckin config variable isn't used anymore. I

Re: IPC Server Responder out put error causing RegionServer down

2011-01-31 Thread Jean-Daniel Cryans
(moving to the user mailing list, where it belongs) My educated guess is that you had a GC pause that lasted for more than a minute while a file was being written to. Even if the write wasn't happening, your region server would have committed suicide anyway since it was probably past its lease

Re: maven repository

2011-01-31 Thread Stack
On Mon, Jan 31, 2011 at 6:18 AM, Imran M Yousuf imyou...@gmail.com wrote: Thanks a lot St.Ack. Will https://repository.apache.org/content/repositories/releases/ be synchronized with Maven Central? It looks like other published artifacts of Apaches show in maven central. Maybe there is a step

Re: maven repository

2011-01-31 Thread Stack
Thanks for the below Daniel. Any chance of your filing a JIRA, adding the below as a patch, and marking it as fix against 0.90.1 (critical) which should be out soon? St.Ack

Re: HTable.put(List<Put> puts) perform batch insert?

2011-01-31 Thread Sean Bigdatafun
On Fri, Jan 14, 2011 at 10:51 PM, tsuna tsuna...@gmail.com wrote: On Fri, Jan 14, 2011 at 4:06 PM, Sean Bigdatafun sean.bigdata...@gmail.com wrote: But how can the client understand which k-v belongs to an individual RS? Does it need to scan the .META. table? (if so, it's an expensive op).

Re: maven repository

2011-01-31 Thread Norman Maurer
Hi there, As a part of the ASF Infrastructure Team I think it's just plain wrong to add people as maven repository. This will just lead to a lot of traffic and a single point of failure, as it will not use any mirrors. Bye Norman On Monday, 31 January 2011, Stack st...@duboce.net wrote: Thanks for

Using the Hbase shell to browse?

2011-01-31 Thread Mark Kerzner
Hi, is there a way to get top ... rows from a table using the shell, or rows for a range of keys? Thank you, Mark

Re: Unresponsive master in Hbase 0.90.0

2011-01-31 Thread Stack
It's not in the manual yet Vidhya. Assignment has completely changed in 0.90. No more do we assign by adding payload to the heartbeat. Now we assign by direct rpc from master to regionserver with master and regionserver moving the region through state changes up in zk until region is successfully

Re: Using the Hbase shell to browse?

2011-01-31 Thread Stack
Top? You mean those that sort first? You could start a scan in the shell at a particular row and then limit the return: hbase> scan 'TABLENAME', {STARTROW => 'xyz', LIMIT => 10} Run 'help' to see more on the scan command. St.Ack On Mon, Jan 31, 2011 at 11:02 AM, Mark Kerzner
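As a rough illustration of what a scan with a start row and a limit returns, the sketch below models a table as a sorted map, since HBase keeps rows ordered by key. `ScanSketch` and its `String` keys are hypothetical stand-ins for this note; HBase itself compares raw bytes, which gives the same order for plain ASCII keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical model of "start at a row, return at most N rows"; not HBase code. */
final class ScanSketch {
    /** Returns up to limit row keys, in sorted order, beginning at startRow (inclusive). */
    static List<String> scan(NavigableMap<String, String> table, String startRow, int limit) {
        List<String> rows = new ArrayList<>();
        for (String key : table.tailMap(startRow, true).keySet()) {
            if (rows.size() >= limit) {
                break; // LIMIT reached
            }
            rows.add(key);
        }
        return rows;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("a", "1");
        table.put("b", "2");
        table.put("c", "3");
        table.put("d", "4");
        System.out.println(scan(table, "b", 2));
    }
}
```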

Re: Using the Hbase shell to browse?

2011-01-31 Thread Mark Kerzner
Worked perfectly well. Some people know stuff...:) Thank you On Mon, Jan 31, 2011 at 1:09 PM, Stack st...@duboce.net wrote: Top? You mean those that sort first? You could start a scan in the shell at a particular row and then limit the return: hbase> scan 'TABLENAME', {STARTROW => 'xyz',

Re: maven repository

2011-01-31 Thread Stack
Norman: Yes. The repo. has a (signed) build of a Hadoop branch that has not had a release made from it. We're in negotiations about making a release from this branch but it's 'complicated'. Hosting in a personal account is, as we see it, a temporary stopgap until we get to an official release.

Persist JSON into HBase

2011-01-31 Thread Pablo Molnar
Hi everyone, In my company we are experimenting with HBase and I'd like to know the best way to persist a semi-structured complex (3 levels) entity represented as JSON to HBase. I've already successfully written a Java client that persists rows in a table, and now my goal is to persist this JSON. I've

Re: Persist JSON into HBase

2011-01-31 Thread Stack
Don't use MapWritable. In the layer above HBase, inside whatever is hosting the HBase client, serialize the JSON to bytes and then write that to an HBase cell. In the same layer, reading, do the deserializations. HBase only does byte arrays. St.Ack On Mon, Jan 31, 2011 at 11:30 AM, Pablo
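The advice above amounts to treating the JSON as an opaque byte array at the client layer. A minimal sketch of that round trip, using only the JDK (the class and method names are hypothetical, and UTF-8 is an assumed encoding choice); the resulting bytes are what would be handed to a Put as the cell value:

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that turns a JSON string into cell bytes and back. */
final class JsonCellCodec {
    /** Encodes the JSON text as UTF-8 bytes, ready to be stored as a cell value. */
    static byte[] toCell(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes bytes read back from a cell into the original JSON text. */
    static String fromCell(byte[] cell) {
        return new String(cell, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String json = "{\"id\":1,\"tags\":[\"a\",\"b\"]}";
        byte[] cell = toCell(json);
        System.out.println(fromCell(cell).equals(json));
    }
}
```

Parsing the decoded string into objects stays in the same client layer, so HBase never needs to know the payload is JSON.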

TableMap is deprecated

2011-01-31 Thread Mark Kerzner
Hi, what should be used instead of TableMap <http://hbase.apache.org/docs/r0.20.5/api/org/apache/hadoop/hbase/mapred/TableMap.html>, going forward? Thank you, Mark

Re: TableMap is deprecated

2011-01-31 Thread Stack
Use the mapreduce package if you can. Otherwise, keep on with mapred package and ignore the deprecations. There is nothing 'wrong' w/ the mapred package; we just followed hadoop's deprecation. Also our mapreduce has functionality not in mapred. St.Ack On Mon, Jan 31, 2011 at 11:59 AM, Mark

Re: Persist JSON into HBase

2011-01-31 Thread Pablo Molnar
Thanks for the feedback Stack! So you suggest to just serialize the JSON represented as a String or as a Map? Something like this: (supposing item is a String or a Map) Put row = new Put(Bytes.toBytes(item.id)) row.add(Bytes.toBytes(json), Bytes.toBytes(1), Bytes.toBytes(item)) table.put(row)

Re: TableMap is deprecated

2011-01-31 Thread Mark Kerzner
Thank you. Actually, I am OK on this, since it was taken out in 0.89. It is things like RowResult that cause problems now - but for these I already know that I should use Get/Put API. Cheers, Mark On Mon, Jan 31, 2011 at 2:02 PM, Stack st...@duboce.net wrote: Use the mapreduce package if you

RE: Persist JSON into HBase

2011-01-31 Thread Buttler, David
How you serialize your objects to hbase depends on how you want to use your objects later. Assuming that you have a good serialization to json already, and all you want to do is put and get the items, then just convert your json string to a byte array and put it in a column qualifier (e.g.

Re: Persist JSON into HBase

2011-01-31 Thread Pablo Molnar
Thanks Dave! It really makes sense. It all depends on how you want to process the data later. I guess I'm going to persist the Map instead of a String to leverage the filters having to parse the JSON. Pablo On Mon, Jan 31, 2011 at 5:20 PM, Buttler, David buttl...@llnl.gov wrote: How you

How does checkAndPut guarantee atomicity?

2011-01-31 Thread Joe Pallas
It's easy to understand how single-row operations can be atomic, but checkAndPut allows a check on one row and a put on another while promising this will be atomic. Is it really guaranteed to be atomic in that case? How does that work if the two rows are in different regions (and possibly

RE: Persist JSON into HBase

2011-01-31 Thread Sandy Pratt
My use of HBase is essentially what Stack describes: I serialize little log entry objects with (mostly) protobuf and store them in a single cell in HBase. I did this at first because it was easy, and made a note to go back and break out the fields into their own columns, and in fact into

Open Scanner Latency

2011-01-31 Thread Wayne
After doing many tests (10k serialized scans) we see that on average opening the scanner takes 2/3 of the read time if the read is fresh (scannerOpenWithStop=~35ms, scannerGetList=~10ms). The second time around (1 minute later) we assume the region cache is hot and the open scanner is much faster

Re: How does checkAndPut guarantee atomicity?

2011-01-31 Thread Ryan Rawson
Hi, Good catch, while the API does let you specify 2 different row keys, one in the 'put' and one in the call, doing so would be ... not advised. Right now there is no check for this, and if you were to pass 2 different rows, things would not be so good. Here is an issue:

Re: Open Scanner Latency

2011-01-31 Thread Ryan Rawson
Hey, The region location cache is held by a soft reference, so as long as you don't have memory pressure, it will never get invalidated just because of time. Another thing to consider, in HBase, the open scanner code also seeks and reads the first block of the scan. This may incur a read to disk

Re: Open Scanner Latency

2011-01-31 Thread Wayne
We have heavy writes always going on so there is always memory pressure. If the open scanner reads the first block maybe that explains the 8ms the second time a test is run, but why is the first run averaging 35ms to open and when the same read requests are sent again the open is only 8ms? There

Re: Open Scanner Latency

2011-01-31 Thread Stack
On Mon, Jan 31, 2011 at 1:38 PM, Wayne wav...@gmail.com wrote: After doing many tests (10k serialized scans) we see that on average opening the scanner takes 2/3 of the read time if the read is fresh (scannerOpenWithStop=~35ms, scannerGetList=~10ms). I saw that this weekend. The getScanner

Re: Open Scanner Latency

2011-01-31 Thread Ryan Rawson
The Regionserver caches blocks, so a second read would benefit from the caching of the first read. Over time blocks get evicted in a LRU manner, and things would get slow again. Does this make sense to you? On Mon, Jan 31, 2011 at 1:50 PM, Wayne wav...@gmail.com wrote: We have heavy writes
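The "fast on the second read, slow again after eviction" behaviour described above is what any least-recently-used cache produces. A minimal sketch using the JDK's access-ordered `LinkedHashMap`; this is an illustration of LRU eviction only, not the region server's actual block cache, and `LruBlockCacheSketch` and `maxBlocks` are hypothetical names:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache: the least recently accessed entry is evicted first. */
final class LruBlockCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxBlocks;

    LruBlockCacheSketch(int maxBlocks) {
        super(16, 0.75f, true); // true = iterate in access order, oldest access first
        this.maxBlocks = maxBlocks;
    }

    /** Called by LinkedHashMap after each insert; returning true drops the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxBlocks;
    }
}
```

With a capacity of two, reading block "a" and then inserting a third block evicts the untouched block "b", so the next read of "b" has to go back to the slower layer underneath.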

Re: Open Scanner Latency

2011-01-31 Thread Wayne
On Mon, Jan 31, 2011 at 4:54 PM, Stack st...@duboce.net wrote: On Mon, Jan 31, 2011 at 1:38 PM, Wayne wav...@gmail.com wrote: After doing many tests (10k serialized scans) we see that on average opening the scanner takes 2/3 of the read time if the read is fresh

Re: Open Scanner Latency

2011-01-31 Thread Wayne
I assume BLOCKCACHE = 'false' would turn this off? We have turned off cache on all tables. On Mon, Jan 31, 2011 at 4:54 PM, Ryan Rawson ryano...@gmail.com wrote: The Regionserver caches blocks, so a second read would benefit from the caching of the first read. Over time blocks get evicted in

Re: Open Scanner Latency

2011-01-31 Thread Ryan Rawson
Even without block caching, the linux buffer cache is still a factor, and your reads still go through them (via the datanode). When Stack is talking about the StoreScanner, this is a particular class inside of HBase that does the job of reading from 1 column family. The first time you

Re: Open Scanner Latency

2011-01-31 Thread Wayne
The file system buffer cache explains what is going on. The open scanner reads the first block and the subsequent read goes against the same block, thereby getting served out of the file system buffer cache. Thanks. On Mon, Jan 31, 2011 at 5:22 PM, Ryan Rawson ryano...@gmail.com wrote: Even without block

RE: Delete reveals older version of a column even when VERSIONS=1

2011-01-31 Thread Buttler, David
The way I understand it is that old versions do not actually disappear until a compaction occurs. A compaction should occur once per day unless you have changed the major compaction settings, or whenever a region splits. Dave -----Original Message----- From: Mike Percy

Re: Delete reveals older version of a column even when VERSIONS=1

2011-01-31 Thread Ryan Rawson
You are correct, since we do not prune extra versions except during these major compactions that happen about once a day, if you delete a recent version and it exposes an older version, you will see this. I might consider this a mis-feature. I would encourage you to consider using the

Re: HTable.put(List<Put> puts) perform batch insert?

2011-01-31 Thread Sean Bigdatafun
On Mon, Jan 31, 2011 at 5:13 PM, Jim X jim.p...@gmail.com wrote: Does Htable.getWriteBuffer() do a roll back? I guess not --- this only allows you to know what has not been successfully committed to the server after you catch the exception. Correct me if I am wrong. Sean Jim On Mon, Jan

Re: HTable.put(List<Put> puts) perform batch insert?

2011-01-31 Thread Ryan Rawson
It just retrieves the current state of the buffer. The buffer is mutated to remove successful edits as they occur, during an exception the ones that were determined to be successful were also removed. So if you catch an exception, you can inspect this buffer and know these puts need to be sent
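The buffer behaviour described above can be sketched in a few lines: edits are removed as they succeed, so whatever is left after an exception is exactly what still needs sending. This is a hypothetical model, not HTable's code; `WriteBufferSketch`, its `Sender` interface, and the use of `String` edits in place of Put objects are all assumptions made for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical client-side write buffer; edits are Strings for simplicity. */
final class WriteBufferSketch {
    /** Hypothetical transport: throws IOException when an edit cannot be committed. */
    interface Sender {
        void send(String edit) throws IOException;
    }

    private final List<String> buffer = new ArrayList<>();

    void add(String edit) {
        buffer.add(edit);
    }

    /** Exposes the current state of the buffer; no rollback is performed. */
    List<String> getWriteBuffer() {
        return buffer;
    }

    /** Sends edits in order, removing each one only after it has succeeded. */
    void flush(Sender sender) throws IOException {
        Iterator<String> it = buffer.iterator();
        while (it.hasNext()) {
            String edit = it.next();
            sender.send(edit); // on failure the exception propagates and this edit stays buffered
            it.remove();
        }
    }
}
```

After catching the exception, a caller can inspect `getWriteBuffer()` and decide whether to retry or report the remaining edits.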

Re: IPC Server Responder out put error causing RegionServer down

2011-01-31 Thread Zhou Shuaifeng
Thank you very much. I will check more about the cause according to your suggestion. Can you explain more about 'GC pause'? Because of my company's network restrictions, I'm not able to post a large amount of logs to pastebin (about 15KB logs in 30 minutes). But thanks all the same, if I have some way to

Re: Tables rows disappear

2011-01-31 Thread Something Something
1) Version numbers: hadoop-0.20.2 hbase-0.20.6 2) autoFlush to 'true' works, but wouldn't that slow down the insertion process? 3) Here's how I had set it up: In my Mapper's setup method: table = new HTable(new HBaseConfiguration(), XYZ_TABLE);

Re: HDFS without Hadoop: Why?

2011-01-31 Thread Sean Bigdatafun
I feel this is a great discussion, so let's think of HDFS' customers. (1) MapReduce --- definitely a perfect fit as Nathan has pointed out (2) HBase --- it seems HBase (Bigtable's log structured file) did a great job on this. The solution comes out of Google, it must be right. But would Google

Annoying message from ZooKeeper

2011-01-31 Thread Jim X
I use HBase 0.20.6 in the latest cygwin under Windows Vista. I got the following error when I created and deleted a table from a Java client. A table can be created, listed and deleted. But the error is just annoying. 11/01/31 22:09:18 INFO zookeeper.ZooKeeper: Initiating client connection,

Re: maven repository

2011-01-31 Thread Imran M Yousuf
On Tue, Feb 1, 2011 at 12:39 AM, Stack st...@duboce.net wrote: On Mon, Jan 31, 2011 at 6:18 AM, Imran M Yousuf imyou...@gmail.com wrote: Thanks a lot St.Ack. Will https://repository.apache.org/content/repositories/releases/ be synchronized with Maven Central? It looks like other published