Re: Later version of HBase Client has a problem with DNS

2013-05-17 Thread Stack
This has come up in the past: http://search-hadoop.com/m/mDn0i2kjGA32/NumberFormatException+dns&subj=unable+to+resolve+the+DNS+name Or check out this old thread: http://mail.openjdk.java.net/pipermail/jdk7-dev/2010-October/001605.html St.Ack On Fri, May 17, 2013 at 11:17 AM, Heng Sok wrote: …

HEADS-UP: Upcoming bay area meetups and hbasecon

2013-05-17 Thread Stack
We have some meetups happening over the next few months. Sign up if you are interested in attending (or if you would like to present, write me off-list). First up, there is hbasecon2013 (http://hbasecon.com) on June 13th in SF. It is shaping up to be a great community day out with a bursting age…

Later version of HBase Client has a problem with DNS

2013-05-17 Thread Heng Sok
Hi all, I have been trying to run a MapReduce job that uses HBase as both source and sink. I have HBase 0.94.2 and Hadoop 2.0 installed from the Cloudera repository, following their instructions. When I use HBase client package version 0.94.2 and above, it gives the following DNS-related err…
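For context, a minimal sketch of the kind of job being described, using the stock TableMapReduceUtil helpers to wire one HBase table in as the source and another as the sink (0.94-era API; the table, family and qualifier names are made up, and the mapper just copies one column):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.IdentityTableReducer;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.mapreduce.Job;

    public class CopyColumnJob {

      // Reads rows from the source table and emits one Put per row for the sink table.
      static class CopyMapper extends TableMapper<ImmutableBytesWritable, Put> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
          byte[] v = value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q"));
          if (v == null) return;
          Put put = new Put(row.get());
          // Copy one (hypothetical) column as-is; a real job would transform here.
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), v);
          context.write(row, put);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "hbase-source-and-sink");
        job.setJarByClass(CopyColumnJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // larger scanner caching for MR scans
        scan.setCacheBlocks(false);  // don't pollute the block cache

        // HBase as source ...
        TableMapReduceUtil.initTableMapperJob("source_table", scan, CopyMapper.class,
            ImmutableBytesWritable.class, Put.class, job);
        // ... and HBase as sink (IdentityTableReducer just writes the Puts out).
        TableMapReduceUtil.initTableReducerJob("sink_table", IdentityTableReducer.class, job);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }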

Re: bulk load skipping tsv files

2013-05-17 Thread Jinyuan Zhou
Will try that. Thanks, On Fri, May 17, 2013 at 8:57 AM, Shahab Yunus wrote: > If I understood your use case correctly: if you don't need to maintain > older versions of the data, why don't you set the 'max version' parameter > for your table to 1? I believe that the increase in data even in…

Re: bulk load skipping tsv files

2013-05-17 Thread Jinyuan Zhou
I had thought about a coprocessor. But I had the impression that a coprocessor is the last option one should try because it is so invasive to the JVM running HBase. Not sure about the current status, though. However, what the coprocessor can give me in this case is less network load. My problem is the HBase's ho…

Re: GET performance degrades over time

2013-05-17 Thread Viral Bajaria
On Fri, May 17, 2013 at 8:23 AM, Jeremy Carroll wrote: > Look at how much hard disk utilization you have (IOPS / svctm). You may > just be under-scaled for the QPS you desire for both read + write load. If > you are performing random gets, you could expect around the low to mid > 100s IOPS/sec p…

Re: GET performance degrades over time

2013-05-17 Thread Anoop John
> Yes, bloom filters have been enabled: ROWCOL Can you try with a ROW bloom? -Anoop- On Fri, May 17, 2013 at 12:20 PM, Viral Bajaria wrote: > Thanks for all the help in advance! > > Answers inline.. > > Hi Viral, > > > > some questions: > > > > > > Are you adding new data or deleting data over time?
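For anyone wanting to try Anoop's suggestion, a rough sketch of switching the family from ROWCOL to ROW blooms with the 0.94-era admin API (table and family names are placeholders; the change only fully applies once the store files are rewritten, hence the major compaction at the end):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.regionserver.StoreFile;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SwitchToRowBloom {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
          HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("mytable"));
          HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("cf"));
          // ROWCOL blooms index row+qualifier; ROW blooms index only the row key
          // and are usually much smaller when a row has many qualifiers.
          cf.setBloomFilterType(StoreFile.BloomType.ROW);

          admin.disableTable("mytable");
          admin.modifyColumn(Bytes.toBytes("mytable"), cf);
          admin.enableTable("mytable");
          // Existing HFiles keep their old blooms until rewritten.
          admin.majorCompact("mytable");
        } finally {
          admin.close();
        }
      }
    }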

Re: bulk load skipping tsv files

2013-05-17 Thread Ted Yu
Jinyuan: bq. no new data needed, only some value will be changed by recalculation. Have you considered using a coprocessor to fulfil the above task? Cheers On Fri, May 17, 2013 at 8:57 AM, Shahab Yunus wrote: > If I understood your use case correctly: if you don't need to maintain > older…
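Ted's suggestion could take several forms, and the sketch below is not necessarily what he has in mind: a RegionObserver that does the recalculation region-side as writes arrive, so only the raw input crosses the network. The column names and the doubling logic are made up, and a true batch recalculation over existing rows would more likely want an Endpoint coprocessor instead. Against the 0.94-era observer API:

    import java.io.IOException;
    import java.util.List;

    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
    import org.apache.hadoop.hbase.util.Bytes;

    // Recomputes a derived column server-side whenever the source column is written,
    // so the recalculated value never has to travel from a client.
    public class RecalcObserver extends BaseRegionObserver {

      private static final byte[] CF = Bytes.toBytes("cf");
      private static final byte[] SRC = Bytes.toBytes("score");
      private static final byte[] DERIVED = Bytes.toBytes("score_x2");

      @Override
      public void prePut(ObserverContext<RegionCoprocessorEnvironment> e, Put put,
          WALEdit edit, boolean writeToWAL) throws IOException {
        List<KeyValue> kvs = put.get(CF, SRC);
        if (kvs.isEmpty()) return;
        long score = Bytes.toLong(kvs.get(0).getValue());
        // Hypothetical recalculation; attach the derived cell to the same Put.
        put.add(CF, DERIVED, Bytes.toBytes(score * 2));
      }
    }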

Re: [ANNOUNCE] Phoenix 1.2 is now available

2013-05-17 Thread James Taylor
Anil, Yes, everything is in the Phoenix GitHub repo. I will give you more detail on the specific packages and classes off-list. Thanks, James On 05/16/2013 05:33 PM, anil gupta wrote: Hi James, Is this implementation present in the GitHub repo of Phoenix? If yes, can you provide me the package nam…

Re: bulk load skipping tsv files

2013-05-17 Thread Shahab Yunus
If I understood your use case correctly: if you don't need to maintain older versions of the data, why don't you set the 'max version' parameter for your table to 1? I believe that the increase in data even in the case of updates is due to that (?) Have you tried that? Regards, Shahab On Fri, Ma…
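A rough sketch of Shahab's suggestion using the 0.94-era client API (table and family names are placeholders); the same change can also be made from the HBase shell with alter:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class LimitToOneVersion {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
          HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("mytable"));
          HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("cf"));

          // Keep only the latest cell per row/column; older versions become
          // eligible for removal at the next major compaction.
          cf.setMaxVersions(1);

          admin.disableTable("mytable");
          admin.modifyColumn(Bytes.toBytes("mytable"), cf);
          admin.enableTable("mytable");
        } finally {
          admin.close();
        }
      }
    }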

Re: bulk load skipping tsv files

2013-05-17 Thread Jinyuan Zhou
Actually, I wanted to update each row of a table each day. No new data is needed; only some values will be changed by recalculation. It looks like every time I do this, the data in the table doubles, even though it is an update. I believe even an update will result in new HFiles and the cluster is then very b…

Re: GET performance degrades over time

2013-05-17 Thread Jeremy Carroll
Look at how much hard disk utilization you have (IOPS / svctm). You may just be under-scaled for the QPS you desire for both read + write load. If you are performing random gets, you could expect around the low to mid 100s IOPS/sec per HDD. Use bonnie++ / IOZone / IOPing to verify. Also you could…
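A back-of-envelope calculation along the lines Jeremy describes; every number below is an illustrative assumption, not a measurement:

    public class RandomGetBudget {
      public static void main(String[] args) {
        int regionServers = 10;       // hypothetical cluster size
        int disksPerServer = 12;      // HDDs per region server
        int randomIopsPerDisk = 150;  // "low to mid 100s" per 7.2k SATA HDD

        long diskBoundGets = (long) regionServers * disksPerServer * randomIopsPerDisk;
        // 10 * 12 * 150 = 18,000 uncached random GETs/sec, cluster-wide.
        System.out.println("Rough ceiling for uncached random GETs/sec: " + diskBoundGets);
        // Anything above this has to be served from the block cache / bloom filters,
        // and writes (flushes, compactions) compete for the same IOPS budget.
      }
    }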

Re: bulk load skipping tsv files

2013-05-17 Thread Ted Yu
bq. What I want is to read from some HBase table and create HFiles directly. Can you describe your use case in more detail? Thanks On Fri, May 17, 2013 at 7:52 AM, Jinyuan Zhou wrote: > Hi, > I wonder if there is a tool similar > to org.apache.hadoop.hbase.mapreduce.ImportTsv. ImportTsv reads f…

bulk load skipping tsv files

2013-05-17 Thread Jinyuan Zhou
Hi, I wonder if there is a tool similar to org.apache.hadoop.hbase.mapreduce.ImportTsv. ImportTsv reads from a TSV file and creates HFiles which are ready to be loaded into the corresponding regions by another tool, org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles. What I want is to read from som…
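One way to get what Jinyuan describes is to reuse the building blocks ImportTsv itself uses: a TableMapper as the source plus HFileOutputFormat.configureIncrementalLoad for the sink. A rough sketch against the 0.94-era API (the table, column and path names, and the "recalculation" itself, are made up):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TableToHFiles {

      // Scans the source table and emits recalculated Puts keyed by row.
      static class RecalcMapper extends TableMapper<ImmutableBytesWritable, Put> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
          byte[] old = value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("score"));
          if (old == null) return;
          Put put = new Put(row.get());
          // Hypothetical recalculation: double the stored 8-byte long.
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("score"),
              Bytes.toBytes(Bytes.toLong(old) * 2));
          context.write(row, put);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "table-to-hfiles");
        job.setJarByClass(TableToHFiles.class);

        Scan scan = new Scan();
        scan.setCaching(500);
        scan.setCacheBlocks(false);

        TableMapReduceUtil.initTableMapperJob("the_table", scan, RecalcMapper.class,
            ImmutableBytesWritable.class, Put.class, job);

        // Sets the reducer, partitioner and output format so the job writes HFiles
        // partitioned to match the table's current region boundaries.
        HTable table = new HTable(conf, "the_table");
        HFileOutputFormat.configureIncrementalLoad(job, table);
        FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles-out"));

        if (job.waitForCompletion(true)) {
          // Load the generated HFiles into the table's regions.
          new LoadIncrementalHFiles(conf).doBulkLoad(new Path("/tmp/hfiles-out"), table);
        }
        table.close();
      }
    }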

Re: Doubt Regarding HLogs

2013-05-17 Thread Nicolas Liochon
Yes, it's by design. The last log file is the one being written by HBase. The safe option is to wait for this file to be closed by HBase. As Yong said, you can change the roll parameter if you want it to be terminated sooner, but changing this parameter impacts the HDFS namenode load. 10 minutes is li…

Re: Scanner returning keys out of order

2013-05-17 Thread Jean-Marc Spaggiari
Hi Jan, 0.90.6 is a very old version of HBase... Will you have a chance to migrate to a more recent one? Most of your issues have probably already been fixed. JM 2013/5/17 Jan Lukavský > Hi all, > > we are seeing very strange behavior of HBase (version 0.90.6-cdh3u5) in > the following scenario…

Scanner returning keys out of order

2013-05-17 Thread Jan Lukavský
Hi all, we are seeing very strange behavior of HBase (version 0.90.6-cdh3u5) in the following scenario: 1) Open a scanner and start scanning. 2) Check the order of returned keys (a simple test that the next key is lexicographically greater than the previous one). 3) The check may occasionally fail. When…
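For reference, a minimal version of the ordering check being described, written against the old client API of that era (the table name is a placeholder; it assumes no scanner batching, so each row comes back as one Result):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanOrderCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");  // hypothetical table name
        try {
          ResultScanner scanner = table.getScanner(new Scan());
          byte[] previous = null;
          long count = 0;
          for (Result result : scanner) {
            byte[] current = result.getRow();
            // Row keys must come back in strictly increasing byte order.
            if (previous != null && Bytes.compareTo(previous, current) >= 0) {
              System.err.println("Out of order after " + count + " rows: "
                  + Bytes.toStringBinary(previous) + " >= " + Bytes.toStringBinary(current));
            }
            previous = current;
            count++;
          }
          scanner.close();
        } finally {
          table.close();
        }
      }
    }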

Re: Doubt Regarding HLogs

2013-05-17 Thread yonghu
In this situation, you can set the hbase.regionserver.logroll.period property (default 360…) to a short value, let's say 3000, and then you can see your log file with its current size after 3 seconds. To Nicolas: I guess he wants to somehow analyze the HLog. Regards! Yong On Fri, May 1…

RE: Doubt Regarding HLogs

2013-05-17 Thread Rishabh Agrawal
Thanks Nicolas, When will this file be finalized? Is it time-bound? Or will it always be zero for the last one (even if it contains data)? -Original Message- From: Nicolas Liochon [mailto:nkey...@gmail.com] Sent: Friday, May 17, 2013 4:39 PM To: user Subject: Re: Doubt Regarding HLogs

Re: Doubt Regarding HLogs

2013-05-17 Thread Nicolas Liochon
That's HDFS. When a file is currently being written, the size is not known, as the write is in progress. So the namenode reports a size of zero (more exactly, it does not take into account the HDFS block being written when it calculates the size). When you read, you go to the datanode owning the data, …
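A small probe that illustrates Nicolas's point using the plain HDFS client API: the namenode-reported length of a file under construction lags, but reading from the datanodes still returns whatever has been synced. The WAL path is passed in as an argument, and this just counts raw bytes rather than parsing the HLog format:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WalSizeProbe {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical path to a region server's current (still open) HLog file.
        Path wal = new Path(args[0]);
        FileSystem fs = FileSystem.get(conf);

        // The namenode-reported length excludes the block currently being written,
        // so this is often 0 for the newest HLog.
        System.out.println("Reported length: " + fs.getFileStatus(wal).getLen());

        // Reads go to the datanodes, so data already synced there is readable
        // even though the listed size lags behind.
        FSDataInputStream in = fs.open(wal);
        byte[] buf = new byte[8192];
        long readable = 0;
        int n;
        while ((n = in.read(buf)) > 0) {
          readable += n;
        }
        in.close();
        System.out.println("Bytes actually readable: " + readable);
      }
    }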

RE: Doubt Regarding HLogs

2013-05-17 Thread Rishabh Agrawal
Is it a bug or part of the design? It seems more like a design decision to me. Can someone guide me through the purpose of this feature? Thanks, Rishabh From: Rishabh Agrawal Sent: Friday, May 17, 2013 4:24 PM To: user@hbase.apache.org Subject: Doubt Regarding HLogs Hello, I am working with the HLogs of HBase and…

Re: Question about HFile seeking

2013-05-17 Thread ramkrishna vasudevan
Generally we start by seeking on all the HFiles corresponding to the region and load the blocks that correspond to the row key specified in the scan. If row1 and row1c are in the same block, then we may start with row1. If they are in different blocks, then we will start with the block containin…

Re: Question about HFile seeking

2013-05-17 Thread Varun Sharma
Thanks Stack and Lars for the detailed answers - this question is not really motivated by performance problems... So the index indeed knows what part of the HFile key is the row and which part is the column qualifier. That's what I needed to know. I initially thought it saw it as an opaque concaten…
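A tiny illustration of that structure using the client-side KeyValue class (column names are arbitrary): the key carries explicit row/family/qualifier components, and the comparator compares them component-wise rather than as one flat byte string.

    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.util.Bytes;

    public class KeyAnatomy {
      public static void main(String[] args) {
        // An HFile key is not an opaque concatenation: it is a structured KeyValue key
        // (row length, row, family, qualifier, timestamp, type), so readers can tell
        // where the row ends and the qualifier begins.
        KeyValue kv = new KeyValue(Bytes.toBytes("row1"), Bytes.toBytes("cf"),
            Bytes.toBytes("c"), Bytes.toBytes("value"));

        System.out.println("row       = " + Bytes.toString(kv.getRow()));
        System.out.println("family    = " + Bytes.toString(kv.getFamily()));
        System.out.println("qualifier = " + Bytes.toString(kv.getQualifier()));

        // Comparison is component-wise, so row1 (qualifier c) sorts before row1c
        // (qualifier a), even though a flat concatenation would order them the other way.
        KeyValue other = new KeyValue(Bytes.toBytes("row1c"), Bytes.toBytes("cf"),
            Bytes.toBytes("a"), Bytes.toBytes("value"));
        System.out.println("row1 before row1c? "
            + (KeyValue.COMPARATOR.compare(kv, other) < 0));
      }
    }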