Re: Coprocessors and batch processing

2011-08-11 Thread Himanshu Vashishtha
Client-side batch processing is done at the RegionServer level, i.e., all Action objects are grouped together on a per-RS basis and sent in one RPC. Once the batch arrives at a RS, it gets distributed across the corresponding Regions, and these Action objects are processed one by one. This include
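
A minimal sketch of what that looks like from the client side (0.90-era API; 'table' and 'actions' are assumed to already exist, and the per-slot failure convention is from my reading of the javadoc of that era):

    // 'table' is an org.apache.hadoop.hbase.client.HTable and 'actions'
    // a java.util.List<Row> mixing Puts/Gets/Deletes (all hypothetical).
    Object[] results = new Object[actions.size()];
    table.batch(actions, results);  // grouped per RS under the hood, one RPC each
    // each action gets one slot in 'results'; a null slot marks a failed action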

Using HTable.batch - Still room for performance improvements?

2011-08-11 Thread Steinmaurer Thomas
Hello, our test data generator client uses HTable.batch to transfer puts to the server. For example, is auto-flush at the table level off when running batches? Are there any other client-API-side optimizations when working with HTable.batch? We have writing to the WAL disabled already. Thanks,
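
A sketch of the client-side knobs being discussed (0.90-era API; the table name, family, row count, and buffer size are made up for illustration):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Row;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchPutSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "test_table"); // hypothetical table
            table.setAutoFlush(false);                     // put() now buffers client-side
            table.setWriteBufferSize(4 * 1024 * 1024);     // 4 MB, example value

            List<Row> batch = new ArrayList<Row>();
            for (int i = 0; i < 10000; i++) {
                Put p = new Put(Bytes.toBytes("row-" + i));
                p.setWriteToWAL(false);                    // WAL off, as in the thread
                p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(i));
                batch.add(p);
            }
            table.batch(batch);    // ships immediately, chunked per regionserver
            table.flushCommits();  // drains anything buffered via put()
            table.close();
        }
    }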

Re: Coprocessors and batch processing

2011-08-11 Thread Xian Woo
Excuse me, I've got a question here. Can we use coprocessors in hbase-0.90-*? I've heard that coprocessors can't be used until hbase-0.91-*. Is there any way to get the hbase-0.91-* release, or a later one? Thanks. With regards. Woo 2011/8/11 Himanshu Vashishtha

Bulk upload

2011-08-11 Thread Ophir Cohen
Hi, I started to use bulk upload and encountered a strange problem. I'm using Cloudera cdh3-u1. I'm using HFileOutputFormat.configureIncrementalLoad() to configure my job. This method creates a partition file for the TotalOrderPartitioner and saves it to HDFS. When the TotalOrderPartitioner is initiated
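
For reference, a sketch of the job setup being described (cdh3u1/0.90-era API; the mapper class, table name, and paths are hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "bulk-load");
    job.setJarByClass(MyBulkLoadMapper.class);   // hypothetical mapper
    job.setMapperClass(MyBulkLoadMapper.class);
    FileInputFormat.addInputPath(job, new Path("/input"));     // example paths
    FileOutputFormat.setOutputPath(job, new Path("/hfiles"));

    HTable table = new HTable(conf, "my_table"); // hypothetical table
    // Sets the reducer and partitioner, and writes the TotalOrderPartitioner
    // partitions file (derived from the table's region boundaries) to HDFS:
    HFileOutputFormat.configureIncrementalLoad(job, table);
    job.waitForCompletion(true);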

Re: Mongo vs HBase

2011-08-11 Thread Laurent Hatier
Thanks all. I've seen that there is no LIMIT with HBase. I mean the following statement: SELECT ... FROM ... LIMIT 1. (Because there is this method with Mongo^^) Is it implemented? 2011/8/11 Jason Rutherglen jason.rutherg...@gmail.com Laurent, This could be implemented with Lucene, e.g.,
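
There is no SQL LIMIT as such, but a rough analogue can be sketched with PageFilter plus a client that stops reading (0.90-era API; 'table' assumed open). Note PageFilter is applied independently on each region, so the client must still enforce the limit itself:

    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.PageFilter;

    Scan scan = new Scan();
    scan.setFilter(new PageFilter(1));          // at most 1 row *per region*
    ResultScanner scanner = table.getScanner(scan);
    Result first = scanner.next();              // take the first row and stop
    scanner.close();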

Re: Bulk upload

2011-08-11 Thread Ophir Cohen
Now I see that it uses the distributed cache, but for some reason the TotalOrderPartitioner does not grab it. Ophir On Thu, Aug 11, 2011 at 11:08, Ophir Cohen oph...@gmail.com wrote: Hi, I started to use bulk upload and encountered a strange problem. I'm using Cloudera cdh3-u1. I'm using

Re: Bulk upload

2011-08-11 Thread Ophir Cohen
I did some more tests and found the problem: in a local run the distributed cache does not work. On a full cluster it works. Sorry for your time... Ophir PS Is there any way to use the distributed cache locally as well (i.e. when I'm running MR from IntelliJ IDEA)? On Thu, Aug 11, 2011 at 11:20, Ophir

Check permissions on unix filesystem?

2011-08-11 Thread Matthias Hofschen
Hi, we had an interesting failure yesterday on the old 0.20.4 version of hbase. I realize that this is a very old version but am wondering whether this is an issue that is still present and should be fixed. We added a new node to a 44-node cluster, starting the datanode and regionserver processes

Finding the trace of a query

2011-08-11 Thread Anurag Awasthi
Hi, I am relatively new to HBase and I am trying to find the path that is followed (i.e. the lines executed in the source code) by a query, say select *. Can anyone help me in this regard? Thanks

RE: Finding the trace of a query

2011-08-11 Thread Ramkrishna S Vasudevan
Hi Anurag, before starting this, I hope you have created a table and inserted data. In HBase, inserts are done using Put. So tracking the read flow in HBase means getting data from a table. There are two cases: get the entire table contents, or get a specific row. For getting a specific row using a Java
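
Both read paths look roughly like this from the client (0.90-era API; row key, family, and qualifier names are made up, and 'table' is assumed open):

    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    // A specific row:
    Get get = new Get(Bytes.toBytes("row1"));                  // example key
    Result row = table.get(get);
    byte[] value = row.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q"));

    // The entire table:
    ResultScanner scanner = table.getScanner(new Scan());
    for (Result r : scanner) {
        // each Result is one row
    }
    scanner.close();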

Re: Finding the trace of a query

2011-08-11 Thread Anurag Awasthi
Thanks for your reply, Ram. Yes, I have set up a 2-node cluster. That does help to some extent. However, I am aiming to modify the HBase source code itself and see the lines executed when I pass any query in the HBase shell (select * or create table, for example). Any heads-up on this front?

About Region Split

2011-08-11 Thread Xian Woo
Hi, everyone, I have some questions about the region-split mechanism. I've learnt that a region contains several rows, and when a store file in a region grows larger than the configured hbase.hregion.max.filesize, the region is split in two. But what if I keep putting millions of values into a
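
The split threshold in question is normally set in hbase-site.xml; for illustration it can also be set on a client/embedded Configuration (the value shown is the 0.90 default, used here only as an example):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    Configuration conf = HBaseConfiguration.create();
    conf.setLong("hbase.hregion.max.filesize", 256L * 1024 * 1024);  // 256 MB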

RE: Finding the trace of a query

2011-08-11 Thread Ramkrishna S Vasudevan
Hi, creating a table is done via HBaseAdmin. On every table object you will have put and get. Read the functionality of HBase and the documentation on HBase; it will help you. Regards Ram -Original Message- From: Anurag Awasthi [mailto:anuragawasth...@gmail.com] Sent: Thursday, August
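
The HBaseAdmin path looks like this (0.90-era API; table and family names are examples):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("test_table"); // example name
    desc.addFamily(new HColumnDescriptor("cf"));                // example family
    admin.createTable(desc);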

Re: Finding the trace of a query

2011-08-11 Thread Anurag Awasthi
Thanks for your reply, Woo. However, as I just posted to Ram, I am aiming to modify the HBase source code itself and see the lines executed when I pass any query in the HBase shell (select * or create table, for example). Any heads-up on this front?

Re: Finding the trace of a query

2011-08-11 Thread Xian Woo
No offense, but may I ask whether you want to modify the HBase source code? If you just want to provide some SQL operations to others using HBase, maybe you can use filters to realize some basic SQL operations. If you are interested in filters, please read some API documentation on HBase. With
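
As a taste of the filter approach, a rough analogue of SELECT * FROM t WHERE cf:col = 'foo' (0.90-era API; the family, qualifier, and value are made up, 'table' assumed open):

    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.CompareFilter;
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    Scan scan = new Scan();
    scan.setFilter(new SingleColumnValueFilter(
            Bytes.toBytes("cf"), Bytes.toBytes("col"),         // example names
            CompareFilter.CompareOp.EQUAL, Bytes.toBytes("foo")));
    ResultScanner scanner = table.getScanner(scan);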

RE: Filters for non-Java clients?

2011-08-11 Thread Steinmaurer Thomas
Ah, cool! Thanks, Thomas -Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Wednesday, 10 August 2011 13:28 To: user@hbase.apache.org Cc: user@hbase.apache.org Subject: Re: Filters for non-Java clients? See HBASE-4176. Cheers On Aug 10, 2011, at 2:36 AM,

Re: Finding the trace of a query

2011-08-11 Thread Anurag Awasthi
Actually I wanted to modify the source code, and in the long run I aim to make some structural changes in an attempt to optimise the query-processing approaches for specific hardware. In this process I was looking for the program flow of a query through the source code. Regards, Anurag On

Re: Finding the trace of a query

2011-08-11 Thread Doug Meil
Hi there- Please see the HBase book: http://hbase.apache.org/book.html It has a chapter on developing with HBase. As for SQL, that's not supported directly because HBase is a NoSQL database (see the Data Model chapter). Doug On 8/11/11 6:44 AM, Anurag Awasthi anuragawasth...@gmail.com

Re: Mongo vs HBase

2011-08-11 Thread Fuad Efendi
Sorry for the off topic, but... just as a sample to understand the fundamental difference: 1. SELECT COUNT will take a few hours on MySQL InnoDB in most typical cases, and _it_is_ implemented. 2. Same with HBase: full table scan. However, with MapReduce it might take less time. Or, we can query Solr
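
A client-side COUNT in HBase is indeed a full scan; the usual trick is FirstKeyOnlyFilter plus scanner caching to keep the per-row cost down (0.90-era API, 'table' assumed open; HBase also ships a RowCounter MapReduce job for this):

    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;

    Scan scan = new Scan();
    scan.setFilter(new FirstKeyOnlyFilter());  // only the first KV of each row
    scan.setCaching(1000);                     // example value
    long count = 0;
    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
        count++;
    }
    scanner.close();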

ScannerTimeoutException during MapReduce

2011-08-11 Thread Jan Lukavský
Hi, we've recently moved to HBase 0.90.3 (cdh3u1) from 0.20.6, which resolved most of our previous issues, but we are now seeing many more ScannerTimeoutExceptions than before. All these exceptions come from a trace like this: org.apache.hadoop.hbase.client.ScannerTimeoutException: 307127ms

Re: About Region Split

2011-08-11 Thread Stack
On Thu, Aug 11, 2011 at 3:38 AM, Xian Woo infinity0...@gmail.com wrote: Hi, everyone, I have some questions about the region-split mechanism. I've learnt that a region contains several rows, and when a store file in a region grows larger than the configured hbase.hregion.max.filesize, the region

Re: Check permissions on unix filesystem?

2011-08-11 Thread Stack
Mind making an issue and pasting full stack traces with some surrounding log? My guess is we likely will do the same in 0.90. Your snippets will help us figure out where to dig in. Thanks Matthias, St.Ack On Thu, Aug 11, 2011 at 2:33 AM, Matthias Hofschen hofsc...@gmail.com wrote: Hi, we had an

Re: Coprocessors and batch processing

2011-08-11 Thread Lars
Coprocessors are only in trunk and will be part of 0.92. If you want to use them now you'll have to build HBase from source yourself. -- Lars Xian Woo infinity0...@gmail.com wrote: Excuse me, I've got a question here. Can we use coprocessors in hbase-0.90-*? I've heard that we

RE: corrupt .logs block

2011-08-11 Thread Geoff Hendrey
So I deleted the corrupt .logs files. OK, fine, no more issue there. But a handful of regions in a very large table (2000+ regions) are offline (.META. says offline=true). How do I go about trying to get the regions online, and how come restarting HBase has no effect (regions still offline)?

Re: Coprocessors and batch processing

2011-08-11 Thread lars hofhansl
Thanks Himanshu, but that is not quite what I meant. Yes, a batch operation is broken up into chunks per regionserver and then the chunks are shipped to the individual regionservers. But then there is no way to interact with those chunks at the regionserver through coprocessors (as a whole).

Avatar namenode?

2011-08-11 Thread shanmuganathan.r
Hi All, I am running HBase in distributed mode on a seven-node cluster with a backup master. HBase is running properly in the backup-master environment. I want to run this HBase on top of High Availability Hadoop. I saw the Avatar node described at the following link

Re: Coprocessors and batch processing

2011-08-11 Thread Gary Helmling
On Wed, Aug 10, 2011 at 10:46 PM, lars hofhansl lhofha...@yahoo.com wrote: I guess there could either be a {pre|post}Multi on RegionObserver (although HRegionServer.multi does a lot of munging). Or maybe a general {pre|post}Request with no arguments - in which case it would be at least

Re: Coprocessors and batch processing

2011-08-11 Thread Himanshu Vashishtha
Hey Lars, sorry if I have misled you. The current Coprocessor infrastructure is at the _Region_ level, not at the _RegionServer_ level. All these batch operations ultimately end up at some rows in some Regions, where you have hooked your CPs. I am not able to follow your example. If you end up

Re: Finding the trace of a query

2011-08-11 Thread Ryan Rawson
Why not just read the source code? It isn't that many LOC, and it doesn't really use anything that obscures the call chain - few interfaces, etc. A solid IDE with code inspection will make short work of it, so just go at it! Start at HRegionServer - it has the top-level RPC calls that are made.

Re: corrupt .logs block

2011-08-11 Thread Jinsong Hu
I ran into the same issue. I tried check_meta.rb --fix and add_table.rb, and still get the same hbck inconsistent-table report; however, I am able to do a rowcount for the table and there is no problem. Jimmy -- From: Geoff Hendrey ghend...@decarta.com

Re: Coprocessors and batch processing

2011-08-11 Thread lars hofhansl
Thanks Gary (and Himanshu), correct, using batch operations is just an optimization and not a semantic difference from single-row ops. Should all RPCs triggered by a coprocessor then be avoided (and hence the use of the env-provided HTableInterface be generally discouraged)? The 2ndary index

Why RowFilter plus BinaryPrefixComparator solution is so slow

2011-08-11 Thread Allan Yan
Hello, we need to do a range query on a 20-million-row table. I thought using RowFilter with BinaryPrefixComparator would help. It gets the result correctly; however, the speed is not acceptable. Here is the code snippet: 1. Scan s = new Scan(); 2. s.addFamily(myFamily); 3.

RE: corrupt .logs block

2011-08-11 Thread Geoff Hendrey
Hey - Our table behaves fine until we try to do a mapreduce job that reads from and writes to the table. When we try to retrieve keys from the afflicted regions, the job just hangs forever. It's interesting because we never get timeouts of any sort. This is different from other failures we've seen

Re: Why RowFilter plus BinaryPrefixComparator solution is so slow

2011-08-11 Thread Gary Helmling
On Thu, Aug 11, 2011 at 2:20 PM, Allan Yan hailun...@gmail.com wrote: Hello, 1. Scan s = new Scan(); 2. s.addFamily(myFamily); 3. s.setStartRow(startRow); 4. Filter rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL, new BinaryPrefixComparator(startRow)); 5. s.setFilter(rowFilter);

Re: ScannerTimeoutException during MapReduce

2011-08-11 Thread Jean-Daniel Cryans
Usual reasons would be a mix of taking a long time to process rows in the mapper and scanners that grab a lot of rows (using scanner caching and maybe filters). Did you enable DEBUG for HBase in your mapreduce context? This would give relevant information, like whether the client was doing lots of
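
The trade-off in play: smaller scanner caching means fewer rows fetched per next() RPC, so the mapper does less work between calls and the scanner lease is less likely to expire. A sketch (the value is illustrative, not a recommendation):

    import org.apache.hadoop.hbase.client.Scan;

    Scan scan = new Scan();
    scan.setCaching(100);  // tune against per-row mapper cost

The complementary server-side knob in 0.90 is hbase.regionserver.lease.period in hbase-site.xml (60000 ms by default).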

Re: Using HTable.batch - Still room for performance improvements?

2011-08-11 Thread Jean-Daniel Cryans
I'm sure you've already seen this but just to be sure, do read http://hbase.apache.org/book/perf.writing.html Auto-flush is always on unless you turn it off yourself. Using HTable.batch still respects this except that it flushes all the rows at the same time. If you have fat values to insert and

Re: corrupt .logs block

2011-08-11 Thread Jinsong Hu
As I said, run hbase org.jruby.Main add_table.rb table_name first, then run hbase org.jruby.Main check_meta.rb --fix, then restart hbase. It doesn't completely solve the problem for me, as hbck still complains, but at least it recovers all the data and I can do a full rowcount for the table. Jimmy.

Re: Why RowFilter plus BinaryPrefixComparator solution is so slow

2011-08-11 Thread Allan Yan
Hey Gary, thanks a lot! With WhileMatchFilter, I am able to get the same performance as the stopRow solution. And if the search is in the same HBase cluster, I am able to get the result in a few milliseconds on a 6-node cluster. On Thu, Aug 11, 2011 at 2:26 PM, Gary Helmling ghelml...@gmail.com
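
For the archive, the two approaches compared in this thread, sketched side by side (0.90-era API; the prefix bytes are made up, and the stop-row trick assumes the prefix's last byte isn't 0xFF):

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.BinaryPrefixComparator;
    import org.apache.hadoop.hbase.filter.CompareFilter;
    import org.apache.hadoop.hbase.filter.RowFilter;
    import org.apache.hadoop.hbase.filter.WhileMatchFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    byte[] prefix = Bytes.toBytes("user123|");   // example prefix

    // (a) Explicit stop row: the prefix with its last byte incremented.
    byte[] stopRow = prefix.clone();
    stopRow[stopRow.length - 1]++;
    Scan a = new Scan(prefix, stopRow);

    // (b) WhileMatchFilter aborts the scan at the first non-matching row;
    // a bare RowFilter would keep reading (and discarding) the rest of the table.
    Scan b = new Scan(prefix);
    b.setFilter(new WhileMatchFilter(new RowFilter(
            CompareFilter.CompareOp.EQUAL, new BinaryPrefixComparator(prefix))));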

Re: Using HTable.batch - Still room for performance improvements?

2011-08-11 Thread Doug Meil
One thing you might want to look at is HTableUtil. It's on trunk, but you can look at the source and port it to whatever version you are using. We've found that region-sorting helps a lot by minimizing the number of RS calls in any given flush. On 8/11/11 5:57 PM, Jean-Daniel Cryans

Loading RowKeys as Binary into HBase (instead of storing key as a String)

2011-08-11 Thread mkrupa
I am trying to bulk load data from an HDFS location into an HBase table. My key is an IP address stored as a string in the HDFS location. How can I load this IP address as binary while doing a bulk load?
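
One way to do the conversion in the mapper is to pack the dotted-quad string into 4 bytes before emitting the row key; a sketch (hypothetical helper, assumes well-formed IPv4 input):

    // e.g. ipToRowKey("10.0.0.1") -> {0x0A, 0x00, 0x00, 0x01}; keys then
    // sort numerically rather than lexically as strings.
    public static byte[] ipToRowKey(String ip) {
        String[] parts = ip.split("\\.");
        byte[] key = new byte[4];
        for (int i = 0; i < 4; i++) {
            key[i] = (byte) Integer.parseInt(parts[i]);
        }
        return key;
    }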

RE: corrupt .logs block

2011-08-11 Thread Geoff Hendrey
Thanks - check_meta.rb stack-traces with an NPE... [hroot@doop10 bin]$ hbase org.jruby.Main check_meta.rb Writables.java:75:in `org.apache.hadoop.hbase.util.Writables.getWritable': java.lang.NullPointerException: null (NativeException) from Writables.java:119:in