Re: Overwrite a row

2013-04-20 Thread Kristoffer Sjögren
The schema is known beforehand so this is exactly what I need. Great! One more question. What guarantees does the batch operation have? Are the operations contained within each batch atomic? I.e., will all mutations be given the same timestamp? If something fails, do all operations fail or can it fail

Re: RefGuide schema design examples

2013-04-20 Thread Ravindranath Akila
+1 R. A. On 20 Apr 2013 12:07, Viral Bajaria viral.baja...@gmail.com wrote: +1! On Fri, Apr 19, 2013 at 4:09 PM, Marcos Luis Ortiz Valmaseda marcosluis2...@gmail.com wrote: Wow, great work, Doug. 2013/4/19 Doug Meil doug.m...@explorysmedical.com Hi folks, I reorganized

Re: Slow region server recoveries

2013-04-20 Thread Nicolas Liochon
Hi, I looked at it again with a fresh eye. As Varun was saying, the root cause is the wrong order of the block locations. The root cause of the root cause is actually simple: HBase started the recovery while the node was not yet stale from an HDFS point of view. Varun mentioned this timing: Lost Beat:

Re: Overwrite a row

2013-04-20 Thread Ted Yu
Operations within each batch are atomic. They would either all succeed or all fail. Time stamps would all refer to the latest cell (KeyVal). Cheers On Apr 20, 2013, at 12:17 AM, Kristoffer Sjögren sto...@gmail.com wrote: The schema is known beforehand so this is exactly what I need. Great!

Re: talk list table

2013-04-20 Thread Amit Sela
Hope I'm not too late here... regarding hot spotting with sequential keys, I'd suggest you read this Sematext blog post - http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/ They present a nice idea there for this kind of issue.
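
For readers skimming the archive, here is a minimal sketch of the general idea behind spreading sequential keys across buckets. It is not the HBaseWD API: the class name SaltedKeys, the two-digit prefix format, and the use of String.hashCode are all assumptions made for illustration. Each key gets a deterministic bucket prefix, so consecutive keys land in different parts of the key space instead of on one region server.

```java
import java.util.Locale;

// Hypothetical helper, not part of HBase or HBaseWD: prefixes a row key with a
// deterministic bucket number so that sequential keys are spread over buckets.
final class SaltedKeys {

    // Returns "<bucket>-<rowKey>" where bucket is a two-digit number in [0, buckets).
    static String salt(String rowKey, int buckets) {
        if (buckets <= 0 || buckets > 100) {
            throw new IllegalArgumentException("buckets must be in 1..100");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        int bucket = Math.floorMod(rowKey.hashCode(), buckets);
        return String.format(Locale.ROOT, "%02d-%s", bucket, rowKey);
    }

    // Strips the three-character "NN-" prefix added by salt().
    static String unsalt(String saltedKey) {
        return saltedKey.substring(3);
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 5; i++) {
            String key = String.format(Locale.ROOT, "20130420-%06d", i);
            System.out.println(salt(key, 8));
        }
    }
}
```

The trade-off of any scheme like this is on the read side: a range scan over the original keys has to be run once per bucket and the results merged, since neighbouring keys no longer sit next to each other.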

hbase + mapreduce

2013-04-20 Thread Adrian Acosta Mitjans
Hello: I'm working on a project and I'm using HBase to store the data. I have this method that works great, but without the performance I'm looking for, so I want to do the same thing using MapReduce. public ArrayList<MyObject> findZ(String z) throws IOException {

Re: Overwrite a row

2013-04-20 Thread Kristoffer Sjögren
Just to be absolutely clear, is this also true for a batch that spans multiple rows? On Sat, Apr 20, 2013 at 2:42 PM, Ted Yu yuzhih...@gmail.com wrote: Operations within each batch are atomic. They would either all succeed or all fail. Time stamps would all refer to the latest cell (KeyVal).

Re: Slow region server recoveries

2013-04-20 Thread Varun Sharma
Hi Nicolas, Regarding the following, I think this is not a recovery - the file below is an HFile and is being accessed on a get request. On this cluster, I don't have block locality. I see these exceptions for a while and then they are gone, which means the stale node thing kicks in. 2013-04-19

default region splitting on which value?

2013-04-20 Thread Pal Konyves
Hi, I am just reading about region splitting. By default - as I understand - HBase handles splitting the regions. I just can't picture which key it splits the regions on. 1) For example when I write MD5 hashes of rowkeys, they are most probably evenly distributed from 00... to

Re: Slow region server recoveries

2013-04-20 Thread Varun Sharma
The important thing to note is that the block for this rogue WAL is in the UNDER_RECOVERY state. I have repeatedly asked HDFS dev if the stale node thing kicks in correctly for UNDER_RECOVERY blocks but failed. On Sat, Apr 20, 2013 at 10:47 AM, Varun Sharma va...@pinterest.com wrote: Hi Nicolas,

Re: default region splitting on which value?

2013-04-20 Thread Ted Yu
How many column families do you have? For #3, pre-splitting the table at the row keys corresponding to peaks makes sense. On Apr 20, 2013, at 10:52 AM, Pal Konyves paul.kony...@gmail.com wrote: Hi, I am just reading about region splitting. By default - as I understand - HBase handles
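
As an illustration of choosing pre-split keys from the data distribution rather than from the raw key range, here is a minimal sketch that picks split points at quantiles of a sample of row keys, so that dense ("peak") areas get more regions. The class name SplitPoints and the method fromSample are hypothetical and not part of HBase. The sketch also assumes plain ASCII string keys, for which Java's String ordering matches byte ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of HBase: derives region split keys from a
// sample of row keys so that each region receives a similar share of the sample.
final class SplitPoints {

    // Returns up to (regions - 1) split keys taken at evenly spaced quantiles.
    static List<String> fromSample(List<String> sampleKeys, int regions) {
        if (regions < 2 || sampleKeys.size() < regions) {
            throw new IllegalArgumentException("need regions >= 2 and at least one key per region");
        }
        List<String> sorted = new ArrayList<>(sampleKeys);
        Collections.sort(sorted);
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < regions; i++) {
            // Index of the i-th quantile boundary in the sorted sample.
            int index = (int) ((long) i * sorted.size() / regions);
            String candidate = sorted.get(index);
            // Skip duplicates so the split keys stay strictly increasing.
            if (splits.isEmpty() || !splits.get(splits.size() - 1).equals(candidate)) {
                splits.add(candidate);
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("h", "c", "a", "f", "b", "g", "d", "e");
        System.out.println(fromSample(sample, 4)); // prints [c, e, g]
    }
}
```

The resulting keys would then be handed to whatever table-creation call accepts explicit split keys; how representative the sample is determines how even the regions end up.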

Re: default region splitting on which value?

2013-04-20 Thread Pal Konyves
Hi Ted, Only one family. My data is very simple key-value, but I want to do sequential scans, so making a hash of the key is not an option. On Sat, Apr 20, 2013 at 10:07 PM, Ted Yu yuzhih...@gmail.com wrote: How many column families do you have? For #3, pre-splitting the table at the row

Re: default region splitting on which value?

2013-04-20 Thread Ted Yu
The answer to your first question is yes - midkey of the key range would be chosen as split key. For #2, can you tell us how you plan to randomize the loading? Bulk load normally means preparing HFiles which would be loaded directly into your table. Cheers On Apr 20, 2013, at 1:11 PM, Pal

Re: default region splitting on which value?

2013-04-20 Thread Pal Konyves
I am writing a paper for school about HBase, so the data I chose is not a real usable example. I am familiar with GTFS, which is a de facto standard for storing information about public transportation schedules: when a vehicle arrives at a stop and where it is headed. I chose to generate the rows on

Re: talk list table

2013-04-20 Thread Otis Gospodnetic
+ http://blog.sematext.com/2012/12/24/hbasewd-and-hbasehut-handy-hbase-libraries-available-in-public-maven-repo/ if you use Maven and want to use HBaseWD. Otis -- HBASE Performance Monitoring - http://sematext.com/spm/index.html On Sat, Apr 20, 2013 at 11:24 AM, Amit Sela am...@infolinks.com

Re: Overwrite a row

2013-04-20 Thread Ted Yu
Here is code from the 0.94 code base: public void mutateRow(final RowMutations rm) throws IOException { new ServerCallable<Void>(connection, tableName, rm.getRow(), operationTimeout) { public Void call() throws IOException {

Re: default region splitting on which value?

2013-04-20 Thread Ted Yu
Thanks for sharing the information below. How do you plan to store time (when the bus gets to each stop) in the row? Or maybe it is not of importance to you? On Sat, Apr 20, 2013 at 2:24 PM, Pal Konyves paul.kony...@gmail.com wrote: I am writing a paper for school about HBase, so the data I