I don't think this issue can resolve the problem.
ZKWatcher is removed, but the Configuration and HConnectionImplementation
objects are still in HConnectionManager,
so this may still cause a memory leak.
But calling HConnectionManager.deleteConnection may resolve the HBASE-5073
problem.
I can see
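A minimal sketch of the cleanup being discussed, assuming the 0.90.x client API (the table name is made up; the deleteConnection signature may differ in later versions):

```java
// Sketch only: assumes HBase 0.90.x client API.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTable;

public class ConnectionCleanup {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");
        try {
            // ... reads/writes ...
        } finally {
            table.close();
            // Without this, the cached HConnectionImplementation keyed by
            // this Configuration stays in HConnectionManager and can leak.
            HConnectionManager.deleteConnection(conf, true);
        }
    }
}
```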
On 17/04/12 18:45, Alex Baranau wrote:
I don't think that your error is related to CPs stuff. What lib versions do
you use? Can you compare with those of the HBaseHUT pom?
Ok, I've managed to track down the source of my error. If I do normal
Put modifications in my prePut/postPut method
Hi,
fwiw, the close method was added in HBaseAdmin for HBase 0.90.5.
N.
On Thu, Apr 19, 2012 at 8:09 AM, Eason Lee softse@gmail.com wrote:
I don't think this issue can resolve the problem.
ZKWatcher is removed, but the configuration and HConnectionImplementation
objects are still in
I see, thanks to all~~
Hi Ian,
Thank you very much, that pretty much answers it.
Best regards,
Andre Medeiros
From: Ian Varley [ivar...@salesforce.com]
Sent: Wednesday, April 18, 2012 17:11
To: user@hbase.apache.org
Subject: Re: Performance issues of prepending a table
I would
I have an issue with my HBase cluster. We have a 4 node HBase/Hadoop (4*32
GB RAM and 4*6 TB disk space) cluster. We are using the Cloudera distribution
for maintaining our cluster. I have a single tweets table in which we store
the tweets, one tweet per row (it has millions of rows currently).
Now I
So in your step 2 you have the following:
FOREACH row IN TABLE alpha:
    SELECT something
    FROM TABLE alpha
    WHERE alpha.url = row.url
Right?
And you are wondering why you are getting timeouts?
...
...
And how long does it take to do a full table scan? ;-)
(there's more, but that's the
Hi Michel
Yes, that is exactly what I do in step 2. I am aware of the reason for the
scanner timeout exceptions: it is the time between two consecutive
invocations of next() on a specific scanner object. I increased the
scanner timeout to 10 min on the region server and still I keep seeing
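For reference, the scanner lease is controlled on the region server by `hbase.regionserver.lease.period` (milliseconds, in 0.90.x); a 10-minute setting would look something like:

```xml
<!-- hbase-site.xml on each region server; a restart is required -->
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>600000</value> <!-- 10 minutes, in milliseconds -->
</property>
```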
Narendra,
Are you trying to solve a real problem, or is this a class project?
Your solution doesn't scale. It's a non-starter. 130 seconds for each iteration
times 1 million iterations is how long? 130 million seconds, which is ~36,000 hours,
or over 4 years to complete.
(the numbers are rough but
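Michael's back-of-the-envelope numbers check out; the arithmetic can be sketched as:

```java
// Quick arithmetic behind the estimate above: 130 s per full scan,
// one scan per row over ~1 million rows.
public class ScanCost {
    public static void main(String[] args) {
        long secondsPerScan = 130;
        long iterations = 1_000_000;
        long totalSeconds = secondsPerScan * iterations; // 130,000,000 s
        double hours = totalSeconds / 3600.0;            // ~36,111 hours
        double years = hours / (24 * 365);               // ~4.1 years
        System.out.printf("%d s -> %.0f hours -> %.1f years%n",
                totalSeconds, hours, years);
    }
}
```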
Hi Narendra,
I have a few doubts:
1. Which version are you using?
2. What's the size of each KeyValue?
3. Did you change the GC parameters on the client side or the server side? After
changing the GC parameters, did you keep an eye on the GC logs?
Thank you.
Regards,
Jieshan
-----Original Message-----
On 17/04/12 18:45, Alex Baranau wrote:
I don't think that your error is related to CPs stuff. What lib versions do
you use? Can you compare with those of the HBaseHUT pom?
Ok, I've managed to track down the source of my error. If I do normal
Put modifications in my prePut/postPut method
Michael,
Thanks for the response. This is a real problem and not a class project.
The boxes themselves cost 9k ;)
I think there is some difference in understanding of the problem. The table
has 2m rows, but I am looking at the latest 10k rows only in the outer for
loop. Only in the inner for loop I am
Are you sure you need to do table.close() after each put? Looks incorrect.
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase
On Thu, Apr 19, 2012 at 2:48 AM, Marcin Cylke m...@touk.pl wrote:
On 17/04/12 18:45, Alex Baranau wrote:
I don't think that
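To Alex's point, a sketch of opening the table once, writing many puts, and closing once at the end (table and column names are hypothetical; API as of HBase 0.90.x):

```java
// Sketch: reuse one HTable instance instead of closing after every put.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchPuts {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "tweets");
        table.setAutoFlush(false); // batch writes client-side
        try {
            for (int i = 0; i < 10000; i++) {
                Put put = new Put(Bytes.toBytes("row-" + i));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("text"),
                        Bytes.toBytes("value-" + i));
                table.put(put);
            }
            table.flushCommits(); // push the buffered puts
        } finally {
            table.close(); // close once, not after each put
        }
    }
}
```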
Hi Jieshan
HBase version : Version 0.90.4-cdh3u3
The size of a KeyValue pair should not be more than 2KB.
I changed the GC parameters on the server side. I have not looked into the GC
logs yet, but I have noticed that it pauses the batch process every now and
then. How do I look at the server GC logs?
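On the GC-log question: GC logging is usually enabled through standard HotSpot flags in hbase-env.sh (the exact file location depends on the CDH packaging; the log path below is an example):

```shell
# hbase-env.sh: enable GC logging for the HBase JVMs
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-hbase.log"
```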
Narendra,
I think you are still missing the point.
130 seconds to scan the table per iteration.
Even if you have 10K rows
130 * 10^4 or 1.3*10^6 seconds. ~361 hours
Compare that to 10K rows where you then select a single row in your sub-select
that has a list of all of the associated rows.
Narendra,
Since I didn't see the client logs, a full GC is one probable reason I suspect,
no matter whether it happens on the client side or the server side. So I suggest checking the
GC logs (enable GC logging on both the server and the client side) to see whether
full GCs happen with high frequency, and check
Tom,
The overall tradeoff with table vs prefix is that the former adds some
(small) amount of cluster management overhead for each new table, whereas the
latter adds runtime overhead (memory, cpu, disk, etc) on every operation. In
your case, since you're just talking about ~3 tables vs 1, my
Thanks for the reply.
I see. Would HBase cache the results of the first scan so it wouldn't take as
long to collect the results? Say there were 5 facets selected one after another.
A new scan would take place with stricter filtering each time on the whole
table rather than using the results
Michael,
I will do the redesign and build the index. Thanks a lot for the insights.
Narendra
On Thu, Apr 19, 2012 at 9:56 PM, Michael Segel michael_se...@hotmail.com wrote:
Narendra,
I think you are still missing the point.
130 seconds to scan the table per iteration.
Even if you have 10K
No problem.
One of the hardest things to do is to try to be open to other design ideas and
not become wedded to one.
I think once you get that working you can start to look at your cluster.
On Apr 19, 2012, at 1:26 PM, Narendra yadala wrote:
Michael,
I will do the redesign and build the
Would it be possible for you to pastebin a much bigger portion of the
hbase log?
Thx,
J-D
On Tue, Apr 17, 2012 at 10:35 AM, Xin Liu codeoe...@gmail.com wrote:
Hi there,
I setup hadoop and hbase on top of EC2 in Pseudo-distributed mode. I
can use hbase shell to connect. However, when I use
A good way of doing that is to start replicating to the new cluster using HBase
replication.
Then *after* replication has been set up and enabled, you would issue a CopyTable
M/R job for each table.
After the CopyTable jobs are finished, you have a backup cluster that is behind
by only a few seconds
(however
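The sequence above, sketched as shell commands (peer id, ZK quorum, and table name are placeholders; CopyTable flags as of the 0.90.x-era tool, so check your version's usage output):

```shell
# 1. In the hbase shell on the source cluster, add the backup cluster as a
#    replication peer (requires hbase.replication=true set beforehand):
#      hbase> add_peer '1', 'backup-zk1,backup-zk2,backup-zk3:2181:/hbase'
# 2. Then copy the pre-existing data for each table with a CopyTable M/R job:
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
  --peer.adr=backup-zk1,backup-zk2,backup-zk3:2181:/hbase \
  tweets
```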
Regarding caching during scans there are two types of caches:
* caching (buffering) the records before returning them to the client,
enabled via scan.setCaching(numRows)
* block cache on a regionserver, enabled via setCacheBlocks(true)
The latter one (block cache) is what you are looking for.
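A sketch of the two knobs, set on a client-side Scan (API as of 0.90.x; the table name is hypothetical):

```java
// Sketch: the two caches mentioned above, configured on a Scan.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ScanCaching {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "tweets");
        Scan scan = new Scan();
        scan.setCaching(500);      // rows buffered per RPC to the client
        scan.setCacheBlocks(true); // keep read blocks in the RS block cache
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                // process r
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}
```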
Thanks for pointing me towards setCacheBlocks() and explaining the
difference between those two types of caching in HBase.
According to the API documentation, setCacheBlocks defaults to true, so it
looks like HBase will take care of what I am looking for automatically.
Thanks so much for your
Thanks for pointing out setCacheBlocks();
its HBase default value will provide better performance for subsequent
filters as well as for Kevin's multiple-facet search.
-Alok
On Fri, Apr 20, 2012 at 7:02 AM, Kevin M kevin.macksa...@gmail.com wrote:
Thanks for pointing me towards