Re: Problems with scan after lot of Puts

2012-05-30 Thread Ondřej Stašek
Here it is: http://pastebin.com/0AgsQjur On 29.5.2012 22:44, Jean-Daniel Cryans wrote: Care to share that TestPutScan? Just attach it in a pastebin Thx, J-D On Tue, May 29, 2012 at 6:13 AM, Ondřej Stašek ondrej.sta...@firma.seznam.cz wrote: My program writes changes to HBase table by

Re: performance of a hbase map/reduce job

2012-05-30 Thread Ey-Chih chow
It's a 3 node cluster running zookeeper-3.3.3; no modifications have been made by us. This is separate and dedicated only for HBase. We were looking at the ganglia graphs: not much load, almost none, and traffic is 1-2kb/sec. Thanks. Ey-Chih Chow On May 29, 2012, at 7:39 PM, Ted Yu wrote: Can you

Re: performance of a hbase map/reduce job

2012-05-30 Thread Ey-Chih chow
By the way, the maximum number of client connections is set to the default value, i.e. 60. Does this matter? Thanks. Ey-Chih Chow On May 29, 2012, at 11:48 PM, Ey-Chih chow wrote: It's a 3 node cluster running zookeeper-3.3.3; no modifications have been made by us. This is separate and

Distinct counters and counting rows

2012-05-30 Thread David Koch
Hello, I am testing HBase for distinct counters - more concretely, counting unique users from a fairly large stream of user_ids. For some time to come the volume will be limited enough to use exact counting rather than approximation but already it's too big to hold the entire set of user_ids in
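One way to keep the exact count without holding the set in application memory, suggested by this thread, is to make each user_id a row key so the table itself is the set; a duplicate Put is idempotent. A minimal sketch of that idea in plain Java (a sorted in-memory set stands in for the table's row-key space; in HBase the insert would be a Put and the count a row-count scan):

```java
import java.util.NavigableSet;
import java.util.TreeSet;

// Exact distinct-user counting by treating each user_id as a row key.
// The TreeSet is a stand-in for an HBase table's sorted row keys.
public class DistinctUsers {
    private final NavigableSet<String> rowKeys = new TreeSet<>();

    // Idempotent insert: duplicate user_ids collapse onto one "row".
    public void recordEvent(String userId) {
        rowKeys.add(userId);
    }

    public long distinctCount() {
        return rowKeys.size();
    }

    public static void main(String[] args) {
        DistinctUsers d = new DistinctUsers();
        for (String id : new String[] {"u1", "u2", "u1", "u3", "u2"}) {
            d.recordEvent(id);
        }
        System.out.println(d.distinctCount()); // prints 3: duplicates collapsed
    }
}
```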

Re: HBase dies after some time

2012-05-30 Thread Harsh J
You may colocate your ZK with the HBase Master as it's not very heavy. Depending on your cluster size, 1-3 may be enough and you can divide it among HBM, SNN and perhaps NN/JT machines. On Wed, May 30, 2012 at 2:54 AM, Something Something mailinglist...@gmail.com wrote: Hmm.. due to budget

RE: Distinct counters and counting rows

2012-05-30 Thread Ramkrishna.S.Vasudevan
To answer this question Alternatively, is there a way to trigger an increment in another table (say count) whenever a row was added to user? You can try to use Coprocessors here. Like once a put is done to the table 'user' using the coprocessor hooks you can trigger an Increment() operation on
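The increment-on-put pattern described above can be sketched in plain Java. This is not the real HBase Coprocessor API (in 0.92+ you would implement RegionObserver.postPut and issue an Increment against the 'count' table); here a postPut-style callback is simulated with in-memory maps so the bookkeeping is visible:

```java
import java.util.HashMap;
import java.util.Map;

// Illustration of the "increment a counter on every new put" pattern
// that a coprocessor hook enables. HashMaps stand in for the 'user'
// and 'count' tables; postPut stands in for the coprocessor callback.
public class IncrementOnPut {
    private final Map<String, String> userTable = new HashMap<>();
    private final Map<String, Long> countTable = new HashMap<>();

    // Stand-in for a Put on 'user'; the "hook" fires only for new rows.
    public void put(String rowKey, String value) {
        boolean newRow = !userTable.containsKey(rowKey);
        userTable.put(rowKey, value);
        if (newRow) {
            postPut(rowKey); // coprocessor-style callback
        }
    }

    // Stand-in for an Increment() on the 'count' table.
    private void postPut(String rowKey) {
        countTable.merge("user_rows", 1L, Long::sum);
    }

    public long rowCount() {
        return countTable.getOrDefault("user_rows", 0L);
    }
}
```

Overwriting an existing row key leaves the counter untouched, so the counter tracks distinct rows rather than total puts.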

Testing HBase Performance and Scalability

2012-05-30 Thread Konrad Tendera
Hello, recently I've been trying to launch some performance tests using the Apache tool: http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation, but I'm facing many problems. First I have to add that we're using HBase 0.92 with Hadoop 0.22. When I launch the tool I get the following error:

Re: Testing HBase Performance and Scalability

2012-05-30 Thread Andrew Purtell
You will need to recompile HBase against version 0.22 and use the resulting jars / tarball instead. On May 30, 2012, at 2:51 AM, Konrad Tendera ema...@tendera.eu wrote: Hello, Recently I'm trying to launch some performance tests using Apache tool:

Re: Efficient way to read a large number of files in S3 and upload their content to HBase

2012-05-30 Thread Marcos Ortiz Valmaseda
Like I said before, I need to store all click streams of an advertising network to do deeper analysis later on this huge data set. We want to store it in two places: - first to Amazon S3 - then to HBase. But I think that we don't need S3 if we can store in a proper HBase cluster using the asynchbase

HBase to MapReduce Scans missing rows

2012-05-30 Thread Whitney Sorenson
We have been using HBase Scans to feed MapReduce jobs for over a year now. However, on close inspection, we have seen instances where some blocks of rows are inexplicably missing. We thought that this may happen during region splits or with jobs with many mappers, but we have seen, for example,

Re: Scan addFamily vs FamilyFilter(EQUAL, ...)

2012-05-30 Thread Stack
On Wed, May 30, 2012 at 9:59 AM, Kevin kevin.macksa...@gmail.com wrote: I am curious and trying to learn which method is best when wanting to limit a scan to a particular column or column family. The Scan class carries a Filter instance and a TreeMap of the family map and I am unsure how they

Re: HBase to MapReduce Scans missing rows

2012-05-30 Thread Stack
On Wed, May 30, 2012 at 9:37 AM, Whitney Sorenson wsoren...@hubspot.com wrote: We have been using HBase Scans to feed MapReduce jobs for over a year now. However, on close inspection, we have seen instances where some block of rows are inexplicably missing. We thought that this may happen

Re: HBase to MapReduce Scans missing rows

2012-05-30 Thread Whitney Sorenson
HBase Version   0.90.4-cdh3u2, r (HBase version and svn revision)
HBase Compiled  Thu Oct 13 20:32:26 PDT 2011, jenkins (when HBase version was compiled and by whom)
Hadoop Version  0.20.2-cdh3u2, r95a824e4005b2a94fe1c11f1ef9db4c672ba43cb (Hadoop version and svn revision)
Hadoop Compiled

Re: HBase to MapReduce Scans missing rows

2012-05-30 Thread Stack
On Wed, May 30, 2012 at 10:45 AM, Whitney Sorenson wsoren...@hubspot.com wrote:
HBase Version   0.90.4-cdh3u2, r (HBase version and svn revision)
HBase Compiled  Thu Oct 13 20:32:26 PDT 2011, jenkins (when HBase version was compiled and by whom)
Hadoop Version  0.20.2-cdh3u2,

Re: Problems with scan after lot of Puts

2012-05-30 Thread Jean-Daniel Cryans
I'm running it here, but I just remembered about this issue: HTable.ClientScanner needs to clone the Scan object https://issues.apache.org/jira/browse/HBASE-4891 And since you are reusing that Scan object, you could definitely hit this issue. J-D On Tue, May 29, 2012 at 11:37 PM, Ondřej Stašek
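The pitfall behind HBASE-4891 (the scanner mutating the caller's Scan instead of working on a clone) can be sketched in plain Java. The ScanSpec class below is an illustrative stand-in, not the real Scan; the safe pattern is to build a fresh spec per scan rather than reuse a shared one:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Why reusing a mutable Scan-like object bites: the scanner advances
// the caller's spec as it pages through rows, so a second scan with
// the same object silently starts past the end.
public class ScanReuse {
    static class ScanSpec {
        int startRow = 0; // mutated by the scanner as it iterates
    }

    // Like the pre-fix ClientScanner, this advances the caller's spec
    // instead of operating on a private copy.
    static List<Integer> scan(ScanSpec spec, List<Integer> rows) {
        List<Integer> out = new ArrayList<>();
        for (int i = spec.startRow; i < rows.size(); i++) {
            out.add(rows.get(i));
            spec.startRow = i + 1; // side effect visible to the caller
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(10, 20, 30);
        ScanSpec shared = new ScanSpec();
        System.out.println(scan(shared, rows));          // [10, 20, 30]
        System.out.println(scan(shared, rows));          // [] -- reused spec was mutated
        System.out.println(scan(new ScanSpec(), rows));  // fresh spec: [10, 20, 30]
    }
}
```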

Re: Problems with scan after lot of Puts

2012-05-30 Thread Jean-Daniel Cryans
There you go:
12/05/30 18:54:17 DEBUG client.MetaScanner: Scanning .META. starting at row=testtable,,00 for max=10 rows using org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@f593af
12/05/30 18:54:17 DEBUG client.HConnectionManager$HConnectionImplementation:

Re: Disable timestamp in HBase Table a.k.a Disable Versioning in HBase Table

2012-05-30 Thread anil gupta
@Anoop: We recently finished our first phase of POC. It went quite well. Now we are trying to see which features we are going to use for the final implementation. We are still in research mode, trying out different options. We are also trying out the LZO and Snappy compression algos. Yes, in my POC V1

Deployment Best Practices

2012-05-30 Thread Peter Naudus
Hello All, Is there a community standard / best way to deploy HBase to a cluster? We're in the process of setting up a ~15 node cluster and I'm curious how you all go about your deployments. Do you package the code into an RPM, place it into a central YUM repository, and then drive the

Re: Deployment Best Practices

2012-05-30 Thread Elliott Clark
I've used puppet for the job. Adobe posted a set of puppet scripts a while back that would deploy a tar.gz. They were my starting point (here: http://hstack.org/hstack-automated-deployment-using-puppet/). Those configs and values are pretty old so make sure you look at the docs for whatever

Re: Deployment Best Practices

2012-05-30 Thread Andrew Purtell
We package into RPMs and manage configuration with Puppet. Have a look at Apache Bigtop (incubating) for RPM and DEB package build harness and integration/smoke tests. It's a promising project. You can also Google around for presentations on this topic by Arvind at StumbleUpon. He talks

Re: Distinct counters and counting rows

2012-05-30 Thread Andrew Purtell
A common question about HBase is whether statistics on row index cardinality are maintained. The short answer is no, because in some sense each HBase table region is its own database, and each region is partly in memory and partly (log structured) on disk, perhaps including tombstones, so discovering

Re: Distinct counters and counting rows

2012-05-30 Thread Andrew Purtell
I should add that getting an exact count at open time would be expensive and probably not necessary. On Wednesday, May 30, 2012, Andrew Purtell wrote: A common question about HBase is if statistics on row index cardinality are maintained. The short answer is no, because in some sense each

Re: Issues with Java sample for connecting to remote Hbase

2012-05-30 Thread Christian Schäfer
Hi, just double-check that you use the correct IP / hostname by comparing against the ifconfig output, and that the hostname resolves to the correct IP, e.g. via arp honeywel-4a7632 or ping honeywel-4a7632. I recommend uncommenting the IPv6 line in your hosts file because I experienced trouble with such

RE: Scan addFamily vs FamilyFilter(EQUAL, ...)

2012-05-30 Thread Buttler, David
One thing I ran into when using the Scan.addFamily / Scan.addColumn is that those two methods overwrite each other. So, if you do Scan.addFamily(a), and the family contains qualifiers x, y, and z; and then do Scan.addColumn(a,x), you will not get the columns y and z back. Similarly, if you
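The "last call wins" behavior described above can be sketched in plain Java. The names below are illustrative, not the real HBase internals, but they mirror the documented semantics: addFamily maps the family to null (meaning every qualifier), and addColumn replaces that entry with an explicit qualifier set, so each call clobbers the other:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of a Scan-style family map: family -> qualifiers,
// where a null qualifier set means "every qualifier in the family".
public class FamilyMapDemo {
    private final Map<String, Set<String>> familyMap = new HashMap<>();

    public FamilyMapDemo addFamily(String family) {
        familyMap.put(family, null); // clobbers any prior addColumn for this family
        return this;
    }

    public FamilyMapDemo addColumn(String family, String qualifier) {
        Set<String> quals = familyMap.get(family);
        if (quals == null) {
            quals = new TreeSet<>(); // clobbers a prior addFamily: only explicit columns survive
        }
        quals.add(qualifier);
        familyMap.put(family, quals);
        return this;
    }

    // Would this (family, qualifier) cell be returned by the scan?
    public boolean selects(String family, String qualifier) {
        if (!familyMap.containsKey(family)) {
            return false;
        }
        Set<String> quals = familyMap.get(family);
        return quals == null || quals.contains(qualifier);
    }
}
```

With this model, addFamily("a") followed by addColumn("a", "x") selects only a:x, dropping a:y and a:z, exactly the surprise described in the message above.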

Re: RefGuide updated

2012-05-30 Thread Doug Meil
Gotcha. Will do. On 5/25/12 2:23 PM, anil gupta anilgupt...@gmail.com wrote: Hi Doug, Nice work. I went through the bulk loader part. It would be great if you can incorporate a note on loading a file with separator other than tab character. Here is the mailing list discussion regarding

Re: Scan addFamily vs FamilyFilter(EQUAL, ...)

2012-05-30 Thread Stack
On Wed, May 30, 2012 at 5:38 PM, Buttler, David buttl...@llnl.gov wrote: One thing I ran into when using the Scan.addFamily / Scan.addColumn is that those two methods overwrite each other.  So, if you do Scan.addFamily(a), and the family contains qualifiers x, y, and z; and then do