Re: rest leaking ZooKeeper connections

2011-10-26 Thread Jean-Daniel Cryans
Joshua, I don't know if you are still around, but I took another look at this and found a big culprit: https://issues.apache.org/jira/browse/HBASE-4684 J-D On Tue, Sep 20, 2011 at 7:13 AM, Joshua Napoli jnap...@swipely.com wrote: In our HBase 0.90.0 installation, the rest interface quickly

Re: What happens if HFile contains the key that already present?

2011-10-25 Thread Jean-Daniel Cryans
Keep in mind that the timestamp is part of the key, and in your descriptions you talk about a key, but in HBase there are multiple keys: row key, family, qualifier and timestamp. The rule in HBase is that the highest timestamp wins, so when you import your HFiles they should normally have a higher

Re: Lease does not exist exceptions

2011-10-24 Thread Jean-Daniel Cryans
:34 AM, Eran Kutner e...@gigya.com wrote: Perfect! Thanks. -eran On Thu, Oct 20, 2011 at 23:27, Jean-Daniel Cryans jdcry...@apache.org wrote: hbase.regionserver.lease.period Set it bigger than 6. J-D On Thu, Oct 20, 2011 at 2:23 PM, Eran Kutner e...@gigya.com wrote

Re: Lease does not exist exceptions

2011-10-24 Thread Jean-Daniel Cryans
have other suggestions, please let me know! Thanks, Lucian On Mon, Oct 24, 2011 at 8:00 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote: Did you restart the region servers after changing the config? Are you sure it's the same exception/stack trace? J-D On Mon, Oct 24, 2011 at 8:04 AM

Re: data mining

2011-10-24 Thread Jean-Daniel Cryans
needs. -Jignesh On Mon, Oct 24, 2011 at 2:49 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote: That's a 250$/hr type of question, I don't think you'll get much help here unless you have some more specific questions or someone feels _really_ generous of their time. My free tip is going

Re: HMaster issues

2011-10-20 Thread Jean-Daniel Cryans
Message - From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org; Ben West bwsithspaw...@yahoo.com Cc: Sent: Tuesday, October 18, 2011 1:23 PM Subject: Re: HMaster issues This line: java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem

Re: If I run Hive queries on external HBase tables, then does blooms will be used?

2011-10-20 Thread Jean-Daniel Cryans
(this is a usage question, putting dev@ in BCC and sending to user@) Yes, it will, as it's run inside the region servers. J-D On Thu, Oct 20, 2011 at 11:40 AM, AnilKumar B akumarb2...@gmail.com wrote: Hi I have created an HBase table and set the bloom filter on some column family. And I

Re: Couple of interesting observations for Hbase 0.90.4.

2011-10-20 Thread Jean-Daniel Cryans
This is a discussion that comes every once in a while, not sure if we have a FAQ entry or in the book... But anyways, short answer is putting your computer to sleep is not supported. Long answer is that putting your computer to sleep is like partitioning that node from the rest of the cluster,

Re: Lease does not exist exceptions

2011-10-20 Thread Jean-Daniel Cryans
On Wed, Oct 19, 2011 at 12:51 PM, Eran Kutner e...@gigya.com wrote: Hi J-D, Thanks for the detailed explanation. So if I understand correctly the lease we're talking about is a scanner lease and the timeout is between two scanner calls, correct? I think that makes sense because I now realize

Re: Lease does not exist exceptions

2011-10-20 Thread Jean-Daniel Cryans
to configure the lease timeout? -eran On Thu, Oct 20, 2011 at 23:16, Jean-Daniel Cryans jdcry...@apache.orgwrote: On Wed, Oct 19, 2011 at 12:51 PM, Eran Kutner e...@gigya.com wrote: Hi J-D, Thanks for the detailed explanation. So if I understand correctly the lease we're talking about

Re: release 0.90.5

2011-10-20 Thread Jean-Daniel Cryans
When it's ready :) I would love to take a look at the list of opened jiras but it's been broken for a few hours for me. Anyways Stack was mentioning this morning that we should do it pretty soon, I agree. At SU we're running in prod something that's close to what 0.90.5 would be, so we know it's

Re: Error in stopping zookeeper

2011-10-19 Thread Jean-Daniel Cryans
If you started ZK via HBase, use bin/hbase-daemons.sh stop zookeeper As you can see the stop command you are using doesn't know about the process it should be looking for... J-D On Wed, Oct 19, 2011 at 4:38 AM, Arsalan Bilal charsalanbi...@gmail.comwrote: i am trying to stop zookeeper but

Re: data loss when splitLog()

2011-10-19 Thread Jean-Daniel Cryans
, archiveLogs would execute even before writeThread ends. 2011/10/19 Jean-Daniel Cryans jdcry...@apache.org Even if the files aren't closed properly, the fact that you are appending should persist them. Are you using a version of Hadoop that supports sync? Do you have logs that show

Re: HMaster issues

2011-10-18 Thread Jean-Daniel Cryans
This line: java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.getFileLength() Is because it's using the local filesystem and not HDFS (so it's expected). As far as I can tell the master starts ok, maybe the client is trying to use the ZK port 2181?

Re: Increase number of reducers for bulk data load to empty HBase table

2011-10-18 Thread Jean-Daniel Cryans
(putting dev@ in bcc, please don't cross-post) You need to pre-split that table, see http://hbase.apache.org/book.html#precreate.regions J-D On Tue, Oct 18, 2011 at 2:00 AM, Matthew Tovbin matt...@tovbin.com wrote: Hello, Guys, I'm willing to bulk load data from hdfs folders into HBase, for
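For context, a minimal sketch of pre-splitting at table-creation time with the 0.90-era Java API; the table name, family and split keys below are made-up examples, not taken from the thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    // Create the table with explicit split points so the bulk load is spread
    // over several regions (and thus several reducers) from the start.
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("my_table");
    desc.addFamily(new HColumnDescriptor("f"));
    byte[][] splits = {
        Bytes.toBytes("row-25000000"),
        Bytes.toBytes("row-50000000"),
        Bytes.toBytes("row-75000000"),
    };
    admin.createTable(desc, splits);  // 4 regions, one per gap between split keys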

Re: HBase growing after issuing alter command with TTL and COMPRESSION

2011-10-18 Thread Jean-Daniel Cryans
First thing about TTL, it's specified in seconds (whereas everything else in HBase is ms) so watch out for that. Regarding the size of your table, what you are asking is hard to answer without having access to your machine. What actually grew? The store files? Or is new store files due to the
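As an illustration, a hedged shell example of setting the TTL in seconds (table and family names are placeholders; in 0.90 the table has to be disabled before an alter, and space is only reclaimed once compactions rewrite the files):

    hbase> disable 'my_table'
    hbase> alter 'my_table', {NAME => 'cf', TTL => '604800'}    # 7 days, expressed in SECONDS
    hbase> enable 'my_table'
    hbase> major_compact 'my_table'                             # expired cells are dropped when files are rewritten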

Re: Lease does not exist exceptions

2011-10-18 Thread Jean-Daniel Cryans
Actually the important setting is: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setCaching(int) This decides how many rows are fetched each time the client exhausts its local cache and goes back to the server. Reasons to have setCaching low: - Do you have a filter
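A minimal sketch of what that looks like on the client side (table, family and caching value are illustrative, not from the thread):

    // Scanner caching: how many rows each next() round-trip brings back.
    HTable table = new HTable(conf, "my_table");
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("cf"));
    scan.setCaching(100);               // higher = fewer RPCs, but more memory and longer per-call time
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result row : scanner) {
        // process row
      }
    } finally {
      scanner.close();                  // releases the server-side scanner lease
    }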

Re: data loss when splitLog()

2011-10-18 Thread Jean-Daniel Cryans
Even if the files aren't closed properly, the fact that you are appending should persist them. Are you using a version of Hadoop that supports sync? Do you have logs that show the issue where the logs were moved but not written? Thx, J-D On Tue, Oct 18, 2011 at 7:40 AM, Mingjian Deng

Re: Snappy for 0.90.4

2011-10-17 Thread Jean-Daniel Cryans
On Fri, Oct 14, 2011 at 12:38 AM, Yves Langisch y...@langisch.ch wrote: Hi, Is it right that there is no official release supporting Snappy yet? Yep. In my case I'd like to use it with 0.90.4 but HBASE-3691 [1] is for 0.92.0 only. Looks easy enough to backport, it may even apply

Re: Reduce-side-join, input from hbase and hdfs

2011-10-17 Thread Jean-Daniel Cryans
You cannot have 2 input formats, so at this point you need to write your own input format that is both an input format for HDFS files and HBase. Currently there's no MultipleTableInputFormat, although it wouldn't solve your problem because it won't take HDFS inputs. Your other option sounds

Re: Error while running several MapReduce jobs

2011-10-17 Thread Jean-Daniel Cryans
It says the connection was already closed, anything else before those lines in the log? J-D On Mon, Oct 17, 2011 at 10:21 AM, JohnJohnGa johnjoh...@gmail.com wrote: I run HBase standalone mode (hbase-0.90.4 (stable)). When I launch in a loop several MapReduce Job and then I try to put

Re: setting ulimit in os Lion X

2011-10-17 Thread Jean-Daniel Cryans
http://serverfault.com/questions/15564/where-are-the-default-ulimits-specified-on-os-x-10-5 J-D On Mon, Oct 17, 2011 at 10:51 AM, Jignesh Patel jigneshmpa...@gmail.comwrote: While setting Hbase there is a article to setup limit for the file. However I didn't find any appropriate command to

Re: setting ulimit in os Lion X

2011-10-17 Thread Jean-Daniel Cryans
That's usually the first problem a user will hit, so we're very up-front. J-D On Mon, Oct 17, 2011 at 11:22 AM, Jignesh Patel jigneshmpa...@gmail.comwrote: Harsh, As you understand from some other post we have exchanged. I am at very early stage of evaluating hadoop,base. We will start the

Re: setting ulimit in os Lion X

2011-10-17 Thread Jean-Daniel Cryans
as Harsh said, in test it will work without changing anything even with the open files limit of 256. Is that correct? _jginesh On Mon, Oct 17, 2011 at 2:27 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: That's usually the first problem a user will hit, so we're very up-front. J-D

Re: FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append

2011-10-14 Thread Jean-Daniel Cryans
Also make sure this isn't an exception that came after a bunch of others, look for the first FATAL. J-D On Fri, Oct 14, 2011 at 2:25 AM, Ramkrishna S Vasudevan ramakrish...@huawei.com wrote: Hi Are you sure there was no insert. The stack trace shows a put operation was going on. You can

Re: Hbase 0.90.4 Matser is shutting down while starting

2011-10-14 Thread Jean-Daniel Cryans
Intuitively I would say that you didn't put the hadoop 0.20.204 jar in the HBase lib folder (or you think you did but it's using another one) and it's getting a version mismatch error that the Namenode isn't kind enough to send back. Taking a look at the namenode log would confirm that. You also

Re: problem in starting in Hbase

2011-10-14 Thread Jean-Daniel Cryans
:567) 6. at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119) On Thu, Oct 13, 2011 at 1:33 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: Have you done what the exception message tells you to do? It's related to ZooKeeper, not HDFS. J-D

Re: HTable.autoFlush not exposed via Thrift?

2011-10-13 Thread Jean-Daniel Cryans
On Thu, Oct 13, 2011 at 8:17 AM, Norbert Burger norbert.bur...@gmail.comwrote: Thanks Ted, I created https://issues.apache.org/jira/browse/HBASE-4586. In the ticket, I mentioned setting hbase.client.write.buffer as a workaround, but unfortunately it doesn't seem that autoflush (which I

Re: Hive+HBase performance is much poorer than Hive+HDFS

2011-10-13 Thread Jean-Daniel Cryans
the execution of your hive query. Cheers, Akash On Wed, Oct 12, 2011 at 8:34 AM, Weihua JIANG weihua.ji...@gmail.com wrote: Since I am using Hive to perform query, I don't know how to set it. Can you tell me how to do so? Thanks Weihua 2011/10/12 Jean-Daniel Cryans jdcry

Re: problem in starting in Hbase

2011-10-13 Thread Jean-Daniel Cryans
Have you done what the exception message tells you to do? It's related to ZooKeeper, not HDFS. J-D On Wed, Oct 12, 2011 at 2:58 PM, Jignesh Patel jigneshmpa...@gmail.com wrote: I have the following setup in my hbase-site.xml configuration <property> <name>hbase.rootdir</name>

Re: HTable.autoFlush not exposed via Thrift?

2011-10-13 Thread Jean-Daniel Cryans
On Thu, Oct 13, 2011 at 1:47 PM, Norbert Burger norbert.bur...@gmail.comwrote: BTW, can your app support the semantics of having auto flush off? What's your use case like? By semantics, do you mean the possibility that unflushed data might be lost if the client goes away? Yep, or if

Re: Hive+HBase performance is much poorer than Hive+HDFS

2011-10-11 Thread Jean-Daniel Cryans
This is one big factor and you didn't mention configuring it: http://hbase.apache.org/book.html#perf.hbase.client.caching J-D On Tue, Oct 11, 2011 at 7:47 PM, Weihua JIANG weihua.ji...@gmail.comwrote: Hi all, I have made some perf test about Hive+HBase. The table is a normal 2D table with

Re: Hive+HBase performance is much poorer than Hive+HDFS

2011-10-11 Thread Jean-Daniel Cryans
: Since I am using Hive to perform query, I don't know how to set it. Can you tell me how to do so? Thanks Weihua 2011/10/12 Jean-Daniel Cryans jdcry...@apache.org: This is one big factor and you didn't mention configuring it: http://hbase.apache.org/book.html#perf.hbase.client.caching

Re: Question on timestamp, timeranges

2011-10-10 Thread Jean-Daniel Cryans
On Thu, Oct 6, 2011 at 10:10 PM, Steinmaurer Thomas thomas.steinmau...@scch.at wrote: Hard to tell without really knowing what you're trying to do, but my default answer is no. If the timestamp is part of your data model, it should be inside your row key or a column. It's part of our

Re: Adjusting column value size.

2011-10-06 Thread Jean-Daniel Cryans
(BCC'd common-user@ since this seems strictly HBase related) Interesting question... And you probably need all those ints at the same time right? No streaming? I'll assume no. So the second solution seems better due to the overhead of storing each cell. Basically, storing one int per cell you
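To make the trade-off concrete, a small hedged sketch of the second approach, packing a batch of ints into a single cell value so the per-KeyValue key overhead is paid once (row, family and qualifier names are illustrative):

    import java.nio.ByteBuffer;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    int[] values = {7, 13, 42, 99};                        // the ints stored together in one cell
    ByteBuffer buf = ByteBuffer.allocate(values.length * 4);
    for (int v : values) {
      buf.putInt(v);
    }
    Put put = new Put(Bytes.toBytes("row-1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("ints"), buf.array());
    // table.put(put);
    // read back with ByteBuffer.wrap(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("ints")))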

Re: Question on timestamp, timeranges

2011-10-06 Thread Jean-Daniel Cryans
We had a discussion about timestamps recently, and like I was saying there the general rule is not to try using the timestamps. About the other questions: On Thu, Oct 6, 2011 at 6:03 AM, Steinmaurer Thomas thomas.steinmau...@scch.at wrote: Hello, we think about using the internal

Re: HBase write latency issues with hdfs replication of 3

2011-10-06 Thread Jean-Daniel Cryans
Inline at the end. J-D On Tue, Oct 4, 2011 at 11:06 AM, Ronen Itkin ro...@taykey.com wrote: Hi all! I am getting really bad write performance when writing to HBase that relies on hdfs (with replication parameter of 3) in an Amazon Web Services environment. I am using Cloudera CDH3u1

Re: How to retrieve all columns of a CF and adding it in a put call

2011-10-06 Thread Jean-Daniel Cryans
Well you need to insert all the columns so yes you need to iterate them all. There's a shorter way to do it tho, look at the Import class in the HBase code: private static Put resultToPut(ImmutableBytesWritable key, Result result) throws IOException { Put put = new Put(key.get());
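The snippet is cut off by the archive; reconstructed from memory, the rest of that 0.90-era method looks roughly like this (treat it as a sketch, not a verbatim copy of the Import class):

    private static Put resultToPut(ImmutableBytesWritable key, Result result)
        throws IOException {
      Put put = new Put(key.get());
      for (KeyValue kv : result.raw()) {
        put.add(kv);            // re-adds every cell: all families, qualifiers and versions
      }
      return put;
    }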

Re: question about writing to columns with lots of versions in map task

2011-10-06 Thread Jean-Daniel Cryans
, Jean-Daniel Cryans wrote: Maybe try a different schema yeah (hard to help without knowing exactly how you end up overwriting the same triples all the time tho). Setting timestamps yourself is usually bad yes. J-D On Tue, Oct 4, 2011 at 7:14 AM, Christopher Dorner christopher.dor

Re: Performance characteristics of scans using timestamp as the filter

2011-10-06 Thread Jean-Daniel Cryans
(super late answer, I'm cleaning up my old unread emails) This sort of sounds like what Mozilla did for the crash reports. The issue with your solution is when you're looking to get only a small portion of your whole dataset you still have to go over the rest of the data to reach it. So if you

Re: WAN HBase Replication

2011-10-04 Thread Jean-Daniel Cryans
Hey Jeff, Usually what people do is either setup a VPN between both datacenters or get a point-to-point connection. We're doing the former. J-D On Tue, Oct 4, 2011 at 1:19 PM, Jeff Whiting je...@qualtrics.com wrote: We have 2 data centers (lets call them A and B), one on the west coast and

Re: WAN HBase Replication

2011-10-04 Thread Jean-Daniel Cryans
? ~Jeff On 10/4/2011 3:10 PM, Jean-Daniel Cryans wrote: Hey Jeff, Usually what people do is either setup a VPN between both datacenters or get a point-to-point connection. We're doing the former. J-D On Tue, Oct 4, 2011 at 1:19 PM, Jeff Whitingje...@qualtrics.com  wrote: We have 2 data

Re: Best way to write to multiple tables in one map-only job

2011-10-04 Thread Jean-Daniel Cryans
. What about autoflush then? Is that also something i can set using the config on job setup? Or does it only work with an HTable instance? Somehow i can't really find the right information :) Regards, Christopher On 03.10.2011 19:20, Jean-Daniel Cryans wrote: Option a) and b) are the same

Re: question about writing to columns with lots of versions in map task

2011-10-04 Thread Jean-Daniel Cryans
of data. Thank you, Christopher On 03.10.2011 20:31, Jean-Daniel Cryans wrote: I would advise against setting the timestamps yourself and instead reduce in order to prune the versions you don't need to insert in HBase. J-D On Sat, Oct 1, 2011 at 11:05 AM, Christopher Dorner

Re: HBase minus shell?

2011-10-04 Thread Jean-Daniel Cryans
Yeah the shell is really just a wrapper around HTable and HBaseAdmin, both the REST and Thrift servers also do those functions (but they are also missing a few). J-D On Tue, Oct 4, 2011 at 2:46 PM, Joe Pallas joseph.pal...@oracle.com wrote: Could you manage an HBase deployment without the HBase

Re: Strange behavior on scan while writing

2011-10-04 Thread Jean-Daniel Cryans
I tried it a bunch of times, even trying it on a table that was splitting all the time, and I wasn't able to hit that situation. One way I was able to do it is by truncating the table, the count was abruptly hitting the end of the table since it became empty all of a sudden :) J-D On

Re: HBase put.heapSize()

2011-10-03 Thread Jean-Daniel Cryans
Have you looked at the code? You should also take a look at TestHeapSize where we compare the estimated size versus the heapSize and AFAIK it passes. J-D On Mon, Oct 3, 2011 at 1:18 AM, lakshmi ponnapalli lakshmiponnapa...@gmail.com wrote: Hi, I noticed that put.heapSize() is bloating data by

Re: Best way to write to multiple tables in one map-only job

2011-10-03 Thread Jean-Daniel Cryans
Option a) and b) are the same since MultiTableOutputFormat internally uses multiple HTables. See for yourself: https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableOutputFormat.java Also you can set the write buffer but setting
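For illustration, a hedged fragment of what using MultiTableOutputFormat from a map-only job looks like; the key handed to context.write() names the destination table (table and column names are made up):

    // Job setup:
    job.setOutputFormatClass(MultiTableOutputFormat.class);
    job.setNumReduceTasks(0);

    // Inside the mapper's map() method:
    ImmutableBytesWritable tableA = new ImmutableBytesWritable(Bytes.toBytes("table_a"));
    ImmutableBytesWritable tableB = new ImmutableBytesWritable(Bytes.toBytes("table_b"));
    Put put = new Put(Bytes.toBytes("row-1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
    context.write(tableA, put);   // written to table_a
    context.write(tableB, put);   // written to table_b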

Re: Backing-up HBASE

2011-10-03 Thread Jean-Daniel Cryans
I saw one broken link (the Mozilla backup tool), but the rest works and the explanations are there. Currently there's not a single perfect way that's fast yet secure to use, so it would be very difficult to know which one to recommend without first knowing which tradeoffs you're willing to make.

Re: question about writing to columns with lots of versions in map task

2011-10-03 Thread Jean-Daniel Cryans
I would advise against setting the timestamps yourself and instead reduce in order to prune the versions you don't need to insert in HBase. J-D On Sat, Oct 1, 2011 at 11:05 AM, Christopher Dorner christopher.dor...@gmail.com wrote: Hi again, i think i solved my issue. I simply use the byte

Re: storefileIndexsize

2011-09-30 Thread Jean-Daniel Cryans
From the discussion in HBASE-3551, you can compute the numbers you need. This comment is important: https://issues.apache.org/jira/browse/HBASE-3551?focusedCommentId=13005272&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13005272 You can use the HFile tool too on

Re: setTimeRange for HBase Increment

2011-09-29 Thread Jean-Daniel Cryans
My advice usually regarding timestamps is if it's part of your data model, it should appear somewhere in an HBase key. 99% of the time overloading the HBase timestamps is a bad idea, especially with counters since there's auto-pruning done in the Memstore! I would suggest you make time part of
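A minimal sketch of that advice, assuming an hourly bucket in the row key and a plain incrementColumnValue call; the key layout and names are invented for the example:

    // Hourly counter: the time bucket lives in the row key, not in the cell timestamp.
    long hourBucket = System.currentTimeMillis() / (3600L * 1000L);
    byte[] row = Bytes.toBytes("pageviews:" + hourBucket);
    HTable table = new HTable(conf, "counters");
    table.incrementColumnValue(row, Bytes.toBytes("cf"), Bytes.toBytes("hits"), 1L);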

Re: Installing as standalone on mac

2011-09-29 Thread Jean-Daniel Cryans
Have you taken a look at the online book? Like this: http://hbase.apache.org/book/getting_started.html J-D On Thu, Sep 29, 2011 at 9:18 PM, gerberdata dger...@socal.rr.com wrote: I am trying to install hbase and hadoop on as standalone on mac.  Is there any tutorial that will show me the best

Re: dfs.datanode.max.xcievers 4k

2011-09-28 Thread Jean-Daniel Cryans
the number of Threads its running... Thanks! Rob On Sep 27, 2011, at 12:50 PM, Jean-Daniel Cryans wrote: On Tue, Sep 27, 2011 at 12:31 PM, Robert J Berger rber...@runa.com wrote: Its not enough. We're still having errors and it caused a regionserver to shutdown again. No data loss

Re: Recommended backup/restore solution for hbase

2011-09-28 Thread Jean-Daniel Cryans
I'd suggest you first read this: http://blog.sematext.com/2011/03/11/hbase-backup-options/ Then review your questions and follow up with the list. J-D On Wed, Sep 28, 2011 at 9:15 AM, Vinod Gupta Tankala tvi...@readypulse.com wrote: Hi, Can someone answer these basic but important questions

Re: HBql query performance

2011-09-27 Thread Jean-Daniel Cryans
Inline. J-D On Mon, Sep 26, 2011 at 4:14 PM, Joan Han joan...@gmail.com wrote: Hi, Has anyone used HBql ? I don't see many discussion on this in the mailing list. Thought to ask around to see if anyone has opinion on the usage. Yeah not a lot. Here is my question: 1) Can HBql be used

Re: Java client throws WrongRegionException but same key accessible via hbase shell

2011-09-27 Thread Jean-Daniel Cryans
with remaining inconsistencies. thanks vinod On Tue, Sep 27, 2011 at 11:05 AM, Jean-Daniel Cryans jdcry...@apache.orgwrote: Hi vinod, Yeah WREs are never fun, hopefully we can help you fixing it. First, about the difference when querying from the shell and your java client.  - Is it a long

Re: Using TTL tout purge data automatically ?

2011-09-23 Thread Jean-Daniel Cryans
From the book http://hbase.apache.org/book/ttl.html ColumnFamilies can set a TTL length in seconds and you have: TTL = '60' TTL = '30' It's just three orders of magnitude different from what you thought you set the TTL to :) J-D On Fri, Sep 23, 2011 at 2:22 AM, Damien Hardy

Re: Hbase-Hive integration performance issues

2011-09-19 Thread Jean-Daniel Cryans
(replying to user@, dev@ in BCC) AFAIK the HBase handler doesn't have the wits to understand that you are doing a prefix scan and thus limit the scan to only the required rows. There's a bunch of optimizations like that that need to be done. I'm pretty sure Pig does the same thing, but don't

Re: Unexpected shutdown of Zookeeper

2011-09-19 Thread Jean-Daniel Cryans
I think this is just: https://issues.apache.org/jira/browse/HBASE-3130 J-D On Sun, Sep 18, 2011 at 10:15 PM, Stuti Awasthi stutiawas...@hcl.com wrote: Hi All, I was running a 2 node cluster with 1 zookeeper node and 2 region server node. I had also setup cluster replication with another

Re: REcovering from SocketTimeout during scan in 90.3

2011-09-19 Thread Jean-Daniel Cryans
it directly to me, and if I do find something I'll post the findings back here. Thanks, J-D On Fri, Sep 16, 2011 at 10:58 PM, Douglas Campbell deegs...@yahoo.com wrote: Answers below. From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org

Re: HBase Master not picking up dead regionserver

2011-09-16 Thread Jean-Daniel Cryans
This happens often to users with a broken reverse DNS setup, look at the master log around when it was supposed to process the dead node and it should tell you that it doesn't know who that is (because the server name it sees is different from the one registered in the master). One example from

Re: REcovering from SocketTimeout during scan in 90.3

2011-09-16 Thread Jean-Daniel Cryans
Yeah this should be at the HTable level... Your solution sounds right. How did you get yourself in this situation btw? Using a large scanner caching value plus filters? J-D On Fri, Sep 16, 2011 at 10:18 AM, Douglas Campbell deegs...@yahoo.com wrote: What's the best way to recover from this?

Re: REcovering from SocketTimeout during scan in 90.3

2011-09-16 Thread Jean-Daniel Cryans
On Fri, Sep 16, 2011 at 12:01 PM, Douglas Campbell deegs...@yahoo.com wrote: I'm reducing keys by regions and then building a Scan with a. start/stop = minkey/max+1 b. 100 cache rows c. cache blocks false d. configuring scan like this   public static void setRowFilters(Scan scan,

Re: REcovering from SocketTimeout during scan in 90.3

2011-09-16 Thread Jean-Daniel Cryans
On Fri, Sep 16, 2011 at 12:17 PM, Douglas Campbell deegs...@yahoo.com wrote: The min/max keys are for each region right? Are they pretty big? doug : Typically around 100 keys and each key is 24bytes Sorry, I meant to ask how big the regions were, not the rows. Are you sharing scanners

Re: Issues in Scheduling CopyTable from java.

2011-09-15 Thread Jean-Daniel Cryans
Why not just cron it? J-D On Wed, Sep 14, 2011 at 11:08 PM, Stuti Awasthi stutiawas...@hcl.com wrote: Hi Friends, I wanted to write a scheduler which will take backup of Hbase tables by timestamp using CopyTable utility. I created a thread which will get timestamp range required and call
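For example, a hedged cron entry that kicks off CopyTable once a day (the path, ZooKeeper quorum and table name are all placeholders):

    # /etc/cron.d/hbase-copytable -- run as the 'hbase' user, every day at 01:00
    0 1 * * *  hbase  /opt/hbase/bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
        --peer.adr=backup-zk1,backup-zk2,backup-zk3:2181:/hbase mytable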

Re: update/increment only if row is present

2011-09-15 Thread Jean-Daniel Cryans
For checkAndPut, is there a column that you know will exist that you're not updating that you know for sure which value it will have? Worst case you could use a dummy column just for that. For increments, I can't think of a way to do it without either implementing a checkAndIncrement or doing 2
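A small hedged sketch of the dummy-column idea with checkAndPut: the Put is applied only if the guard column still holds the expected value, i.e. the row already exists (names and values are illustrative):

    HTable table = new HTable(conf, "my_table");
    Put put = new Put(Bytes.toBytes("row-1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(42L));
    boolean applied = table.checkAndPut(
        Bytes.toBytes("row-1"),
        Bytes.toBytes("cf"),
        Bytes.toBytes("exists"),     // guard column, written once when the row is created
        Bytes.toBytes("1"),          // expected current value of the guard
        put);                        // applied atomically only if the guard matches
    if (!applied) {
      // the row was missing (or the guard changed) -- nothing was written
    }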

Re: update/increment only if row is present

2011-09-15 Thread Jean-Daniel Cryans
You can implement anything you want :) Contributing back is another story. So if you can't predict the value of a single column, see my other recommendation. J-D On Thu, Sep 15, 2011 at 5:04 PM, sagar naik sn...@attributor.com wrote: Thanks J-D On Thu, Sep 15, 2011 at 4:15 PM, Jean-Daniel

Re: scanner deadlock?

2011-09-14 Thread Jean-Daniel Cryans
Yeah like Stack said, the ClosedChannelException is how we figure the client is gone. As you have a 60s timeout on the RPC call the client _will_ go away (and possibly come right back in through another handler) when a call takes longer than that. One of my theories was that in your case if a

Re: tutorial : HBase performance testing

2011-09-14 Thread Jean-Daniel Cryans
On Tue, Sep 13, 2011 at 11:20 PM, Sujee Maniyam su...@sujee.net wrote: hehe J-D (hopefully first name!) :) I agree with your point that pre-splitting the table can make a big difference. Do the later versions of the 'PerformanceEvaluation' class have an option to pre-split the table? I

Re: Using the same row key across column families?

2011-09-13 Thread Jean-Daniel Cryans
Yes, families are detached. J-D On Tue, Sep 13, 2011 at 4:26 PM, Neerja Bhatnagar neerja...@gmail.com wrote: Hi, Can I use the same row key across multiple column families? I have table T with column families cf1 and cf2. Can I use the same row key to refer to cf1:col1 and cf2:col1?

Re: HBase region merge problem

2011-09-12 Thread Jean-Daniel Cryans
Usually it means you need to set fs.default.name in hbase-site.xml J-D On Mon, Sep 12, 2011 at 10:33 AM, Parmod Mehta parmod.me...@gmail.com wrote: Running the merge tool to merge regions on hbase-0.90.1-cdh3u0 run into this exception trace. The first INFO level log message I guess is ok
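For reference, a hedged hbase-site.xml fragment (the NameNode host and port are placeholders):

    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode.example.com:8020</value>
    </property>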

Re: scanner deadlock?

2011-09-12 Thread Jean-Daniel Cryans
Two other thoughts to add on top of the rest: - You really should also consider using HotSpot instead of OpenJDK. - Depending on the size of your RS's heap, the number of concurrent scanners and if they are doing pre-caching, you may be just GCing like mad. Did you check that? J-D On Mon, Sep

Re: Using multiple column families

2011-09-12 Thread Jean-Daniel Cryans
Ok it's small enough that you won't be bothered. J-D On Fri, Sep 9, 2011 at 6:25 PM, Imran M Yousuf imyou...@gmail.com wrote: Hi J-D, Thanks for your feedback. (replies inline) On Sat, Sep 10, 2011 at 5:39 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: 20k rows? If this is your

Re: scanner deadlock?

2011-09-12 Thread Jean-Daniel Cryans
I thought that as long as I specified neither -client nor -server, that Server Class detection would automatically invoke the -server option. http://download.oracle.com/javase/6/docs/technotes/guides/vm/server-class.html We are running 12-core AMD Opteron which is AMD64, so according to

Re: Balancer is running automatically

2011-09-12 Thread Jean-Daniel Cryans
Have a look at the master log, it's usually because a region is stuck in transition or it's still processing (or think it's processing) a dead server. J-D On Mon, Sep 12, 2011 at 3:10 PM, Jeff Whiting je...@qualtrics.com wrote: Any hints as to why the balancer isn't running automatically.  I'll

Re: hbase importtsv completebulkload - permissions error

2011-09-12 Thread Jean-Daniel Cryans
HBase is a bit greedy and expects to own all files it touches. What would be a better behavior in your opinion for this case? J-D On Mon, Sep 12, 2011 at 3:56 PM, Sateesh Lakkarsu lakka...@gmail.com wrote: I used importTsv to create the hfiles, which say end up in: - /user/slakkarsu/table/F1

Re: tutorial : HBase performance testing

2011-09-12 Thread Jean-Daniel Cryans
Hi Sujee, Both tools are lacking a step where they create the tables pre-split. The difference can be staggering and possibly misleading. Also you referred to George Lars, I don't understand those crazy Europeans that give their child two first names either (joking) but I'm 99.% sure

Re: scanner deadlock?

2011-09-12 Thread Jean-Daniel Cryans
-Original Message- From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans Sent: Monday, September 12, 2011 11:44 AM To: user@hbase.apache.org Subject: Re: scanner deadlock? I thought that as long as I specified neither -client nor -server, that Server Class

Re: hbase is not starting

2011-09-11 Thread Jean-Daniel Cryans
(sending to @user since it's not a dev question, bcc'ing the latter) So the important part is: Unhandled exception. Starting shutdown. java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration And it seems to be coming from the hadoop security stack. From what I can tell

Re: HBase Vs CitrusLeaf?

2011-09-08 Thread Jean-Daniel Cryans
Your company sounds lovely. J-D On Wed, Sep 7, 2011 at 11:10 PM, Something Something mailinglist...@gmail.com wrote: This is GREAT information folks.  This is why I like open source communities -:)  I will present this to management, but in the mean time, the management has thrown another

Re: Calculating the optimal number of regions (WAS - Re: big compaction queue size)

2011-09-08 Thread Jean-Daniel Cryans
(that is 1 G or more). About my case, I want to reduce the pressure of compaction (that is only one thread) -----Original Message----- From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans Sent: September 8, 2011 2:13 To: user@hbase.apache.org Subject: Calculating the optimal number of regions

Re: Error - Hbase Batch Import Insertion Method

2011-09-07 Thread Jean-Daniel Cryans
Same answer as last time this was asked: http://search-hadoop.com/m/z1aDB4my9g2 J-D On Wed, Sep 7, 2011 at 6:15 AM, Arsalan Bilal charsalanbi...@gmail.com wrote: Hi Dear Plateform: Ubuntu Hadoop Hbase From CDH3 Tool: NetBeans 6.0.9 I am trying to write Hbase Batch Import Insertion

Calculating the optimal number of regions (WAS - Re: big compaction queue size)

2011-09-07 Thread Jean-Daniel Cryans
(Branching this discussion since it's not directly relevant to the other thread) I think if we ever come up with a formula, it needs to come with a big your mileage may vary sign. The reasons being: - If only a subset of the regions are getting written to, then only those regions need to be

Re: Copying tables from one server to another

2011-09-07 Thread Jean-Daniel Cryans
Inline. J-D On Wed, Sep 7, 2011 at 8:02 PM, Tom Goren t...@tomgoren.com wrote: It completed successfully on server A as destination and as source, however only after I created the table with all the correlating column families (specified by --new.name=new_table_name). Without that step being

Re: big compaction queue size

2011-09-06 Thread Jean-Daniel Cryans
Inline. J-D We're running a 33-regionserver hbase cluster on top of cdh3u0 suites. On average, we have 2400 regions hosted on each regionserver. (hbase.hregion.max.filesize is 1.5GB, and we have value sizes up to 4MB per object). 2400 regions is just too many; if you are importing data at a
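A rough back-of-envelope, assuming the 0.90 default memstore flush size of 64 MB (actual numbers depend on the configuration):

    2400 regions/server x 64 MB memstore flush size ~= 150 GB of memstore
    if every region took writes at once -- far beyond a typical 8-16 GB heap,
    so memstores get flushed long before they fill, producing many tiny files
    and a compaction queue that never drains.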

Re: Hbase copytable and export/import

2011-08-30 Thread Jean-Daniel Cryans
Your remote cluster reports the location of -ROOT- as on localhost on port 41181: 11/08/30 19:27:47 DEBUG client.HConnectionManager$HConnectionImplementation: Lookedup root region location,

Re: multiple insert in hbase

2011-08-30 Thread Jean-Daniel Cryans
Well, it doesn't. I mean, AFAIK there's no bug in the code that does that, so it's probably your code. Wild guess, maybe you are using your own HTable to put plus you emit a Put in the reducer? J-D On Tue, Aug 30, 2011 at 9:21 AM, sriram rsriram...@gmail.com wrote: After the reducer phase the

Re: CopyTable

2011-08-30 Thread Jean-Daniel Cryans
It's all optional. J-D On Mon, Aug 29, 2011 at 10:56 PM, Steinmaurer Thomas thomas.steinmau...@scch.at wrote: Hi Doug, thanks, but the given example is exactly the one I provided in the link. I don't use Replication yet, so I wonder what the rs.class properties should be ... Or is this just

Re: HBase Scan returns fewer columns after a few minutes of insertion

2011-08-30 Thread Jean-Daniel Cryans
? setBatch public void *setBatch*(int batch) Set the maximum number of values to return for each call to next() *Parameters:* batch - the maximum number of values. Your help is much appreciated. Cheers, Neerja On Mon, Aug 29, 2011 at 7:07 PM, Jean-Daniel Cryans jdcry...@apache.org wrote

Re: multiple insert in hbase

2011-08-30 Thread Jean-Daniel Cryans
Put only one, either the HTable or the output format. J-D On Tue, Aug 30, 2011 at 10:56 AM, sriram rsriram...@gmail.com wrote: Yes you are right. What should i do?

Re: Where are .META. and ROOT tables data

2011-08-29 Thread Jean-Daniel Cryans
Don't worry Lars you got it right :) I might also add that we don't use a DHT for the same reason the user tables don't use a DHT. And finally ROOT and META regions tend not to move too much, so once their location is cached you never need to ask their parent where to find them. Even if they do

Re: hbase-orm

2011-08-29 Thread Jean-Daniel Cryans
mmm who are you talking to exactly? J-D 2011/8/29 蔡忠达 1027100...@qq.com: Great that you have thought of doing an ORM for HBase. Recently, our company has a project based on HBase, and I'm a newcomer who just graduated from university last June. For study, I need the lightweight ORM

Re: Unable to connect the regionservers to the master.

2011-08-29 Thread Jean-Daniel Cryans
This is the important bit: Caused by: java.net.BindException: Problem binding to ubuntu/ 192.168.71.206:60020 : Cannot assign requested address It's not able to create its own server socket. Make sure that the IP address belongs to that machine and that nothing else is on port 60020. J-D On

Re: Distributed Data structure

2011-08-29 Thread Jean-Daniel Cryans
HBase as is works like a big distributed sorted list. If the key is a hash, then it becomes the backend of a hash table. Anything in particular you need help with? J-D On Sat, Aug 27, 2011 at 11:39 PM, vamshi krishna vamshi2...@gmail.com wrote: Hi folks, i am new to Hbase and recently

Re: mini-hbase configuration for tests

2011-08-29 Thread Jean-Daniel Cryans
(cleaning up my unread emails hehe) We are using 0.90.x. Which exact timing param can we set? (I'm not sure which one looking http://hbase.apache.org/book/config.files.html#hbase.site) We set configuration via htu.getConfiguration().set... and not via hbase-default.xml. Is it OK to do like

Re: Bulk Load question

2011-08-29 Thread Jean-Daniel Cryans
That's pretty much it. J-D On Wed, Aug 24, 2011 at 11:25 AM, Albert Shau as...@yahoo-inc.com wrote: Hi, I want to do bulk loads by following http://hbase.apache.org/bulk-loads.html to create HFiles, and then using LoadIncrementalHFiles to load the data into a table.  Suppose the data I'm

Re: How to move regions inherited from a 0.20.6

2011-08-29 Thread Jean-Daniel Cryans
You have to use the old encoded name which is only available in the .META. table. For example (sorry for the probably bad formatting): hbase(main):004:0 get '.META.', 'some_table,31905260,1273870975233' COLUMN CELL info:regioninfo

Re: loading data in HBase table using APIs

2011-08-29 Thread Jean-Daniel Cryans
Can you give me some example where I can use TableOutputFormat to insert data to HBase (which does not have a reduce step)? Just set the output of your map to use TableOutputFormat, an example comes with HBase:
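A hedged sketch of that map-only setup using TableMapReduceUtil (class, table and path names are placeholders):

    Job job = new Job(conf, "load-into-hbase");
    job.setJarByClass(MyLoader.class);                    // MyLoader / MyMapper are placeholder classes
    job.setMapperClass(MyMapper.class);
    FileInputFormat.addInputPath(job, new Path("/input/data"));
    TableMapReduceUtil.initTableReducerJob("my_table", null, job);  // null reducer: just wires up TableOutputFormat
    job.setNumReduceTasks(0);                             // map-only; the mapper writes the Puts
    job.waitForCompletion(true);

    // The mapper emits (ImmutableBytesWritable rowKey, Put) pairs:
    //   context.write(new ImmutableBytesWritable(rowKey), put);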

Re: Amazon EC2 Virtualization vs. Dedicated Servers and Garbage Collection Times

2011-08-29 Thread Jean-Daniel Cryans
Looks normal to me considering the platform. It's not so much a GC as the machine was unavailable for more than 2 minutes since there's no user or sys CPU involved. J-D On Mon, Aug 8, 2011 at 12:27 PM, Fuad Efendi f...@efendi.ca wrote: Hello, How to explain that:
