Re: HBase file encryption, inconsistencies observed and data loss

2014-07-27 Thread Anoop John
ck to the main > working for. > Sent from mobile excuse any typos. > On Jul 27, 2014 10:07 AM, "Anoop John" wrote: > >> As per Shankar he can get things work with below configs >> >> >> hbase.regionserver.hlog.reader.impl >

Re: HBase file encryption, inconsistencies observed and data loss

2014-07-28 Thread Anoop John
ot moved under corrupt logs is a concerning thing. Need to look at that. > > Agreed. > > >> On Jul 27, 2014, at 1:07 AM, Anoop John wrote: >> >> As per Shankar he can get things work with below configs >> >> >>hbase.regionser

Re: What is in a HBase block index entry?

2014-08-06 Thread Anoop John
It will be the key of the KeyValue. Key includes rk + cf + qualifier + ts + type. So all these are part of the key. Your answer #1 is correct (but with the addition of type also).. Hope this makes it clear for you. -Anoop- On Tue, Aug 5, 2014 at 9:43 AM, innowireless TaeYun Kim < taeyun@innowireless.c
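For reference, a minimal sketch of that key layout using the old KeyValue client API (variable names here are illustrative):

    // Key portion of a KeyValue, which is what a block index entry carries:
    // rowLen(2) | row | familyLen(1) | family | qualifier | timestamp(8) | type(1)
    KeyValue kv = new KeyValue(row, family, qualifier, ts, KeyValue.Type.Put, value);
    byte[] key = kv.getKey();  // rk + cf + qualifier + ts + type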

Re: flush table with AES encryption enabled for CF, randomly wrapException observed

2014-08-07 Thread Anoop John
Shankar, you are getting this only when HFile encryption is enabled? Seeing the exception, I don't think it is directly related to encryption as such. Suggest testing without encryption multiple times and see whether you get the same in that case too -Anoop- On Fri, Aug 8, 2014 at 11:29 AM, Esteban

Re: [VOTE] The 1st HBase 0.98.5 release candidate (RC0) is available, vote closing 8/11/2014

2014-08-10 Thread Anoop John
Running the IT IntegrationTestIngestWithVisibilityLabels fails. This is because we are not handling the Deletes in LoadTestDataGeneratorWithVisibilityLabels. As it is an issue only with the IT test and there are no code-level issues, you can take a call, Andy. I have raised HBASE-11716 and attached a simple

Re: [VOTE] The 1st HBase 0.98.5 release candidate (RC0) is available, vote closing 8/11/2014

2014-08-11 Thread Anoop John
to fail an RC. > > Regards > Ram > > > On Mon, Aug 11, 2014 at 10:50 AM, Anoop John > wrote: > > > Running the IT IntegrationTestIngestWithVisibilityLabels fails. This > is > > because we are not handling the Deletes in > > LoadTestDataGeneratorWithVisib

Re: random reads

2014-08-15 Thread Anoop John
What about your KV size and HFile block size for the table? For a random-read type of use case a lower value for the HFile block size might help. -Anoop- On Fri, Aug 15, 2014 at 1:56 AM, Esteban Gutierrez wrote: > If not set in hbase-site.xml both tcpnodelay and tcpkeepalive are set to > true (th

Re: blockcache usage

2014-08-15 Thread Anoop John
Pls have a look at HFileBlock#heapSize(). You can learn the overhead by reading this. -Anoop- On Fri, Aug 15, 2014 at 1:50 AM, Nick Dimiduk wrote: > I'm not aware of specifically this experiment. You might have a look at our > HeapSize interface and it's implementations for things like HFileBlock.

Re: Shout-out for Misty

2014-08-20 Thread Anoop John
Great work! Thanks a lot Misty... -Anoop- On Wed, Aug 20, 2014 at 11:56 AM, ramkrishna vasudevan < ramkrishna.s.vasude...@gmail.com> wrote: > Great job !! Keep it up.!!! > > Regards > Ram > > > On Wed, Aug 20, 2014 at 11:49 AM, rajeshbabu chintaguntla < > rajeshbabu.chintagun...@huawei.com> wr

Re: Is it possible for HTable.put(Š) to not make it into the table and silently fail?

2014-08-22 Thread Anoop John
>Is it possible that the put method call on Htable does not actually put the record in the database while also not throwing an exception? You can. Implement a region CP (implementing RegionObserver) and implement prePut(). In this you can bypass the operation using ObserverContext#bypass(). So cor
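A minimal sketch of such an observer, assuming the 0.98-era prePut signature (the class name is illustrative):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Durability;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

    public class SilentPutDropper extends BaseRegionObserver {
      @Override
      public void prePut(ObserverContext<RegionCoprocessorEnvironment> ctx,
          Put put, WALEdit edit, Durability durability) throws IOException {
        ctx.bypass();  // skip the core put; the client still sees success
      }
    }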

Re: Custom Filter on hbase Column

2014-09-11 Thread Anoop John
And you have to implement transformCell(final Cell v) in your custom Filter. JFYI -Anoop- On Fri, Sep 12, 2014 at 4:36 AM, Nishanth S wrote: > Sure Sean.This is much needed. > > -Nishan > > On Thu, Sep 11, 2014 at 3:57 PM, Sean Busbey wrote: > > > I filed HBASE-11950 to get some details adde
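A minimal sketch of that override, assuming a FilterBase subclass on the 0.96+ Cell API (class name illustrative):

    import java.io.IOException;
    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.filter.FilterBase;

    public class MyColumnFilter extends FilterBase {
      // ... the custom filterKeyValue() logic goes here ...

      @Override
      public Cell transformCell(final Cell v) throws IOException {
        // Return the (possibly rewritten) cell that should flow back to the client.
        return v;
      }
    }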

Re: Scan vs Parallel scan.

2014-09-14 Thread Anoop John
Again, the full code snippet can speak better. But I'm not getting what you are doing with the below code: private List generatePartitions() { List regionScanners = new ArrayList(); byte[] startKey; byte[] stopKey; HConnection connection = null; HBaseAdmin hbaseAdmin = null;

Re: HBase 0.98.1 batch Increment throws OperationConflictException

2014-09-17 Thread Anoop John
You have more than one increment for the same key in one batch? On Wed, Sep 17, 2014 at 12:33 PM, Vinay Gupta wrote: > Also the regionserver keeps throwing exceptions like > > 2014-09-17 06:56:07,151 DEBUG [RpcServer.handler=10,port=60020] > regionserver.ServerNonceManager: Conflict detected by

Re: HBase 0.98.1 batch Increment throws OperationConflictException

2014-09-17 Thread Anoop John
s. -Anoop- On Wed, Sep 17, 2014 at 1:04 PM, Vin Gup wrote: > Yes possibly. Why would that be a problem? > Earlier client (0.94) didn't complain about it. > > Thanks, > -Vinay > > > On Sep 17, 2014, at 12:16 AM, Anoop John wrote: > > > > You have mo

Re: HBase 0.98.1 batch Increment throws OperationConflictException

2014-09-17 Thread Anoop John
his error even with > batches with no row key duplicates. I still suspect that client is timing > out and retrying too often and needs to back off as the region server is > heavily loaded. > > -Vinay > > > On Sep 17, 2014, at 3:14 AM, Anoop John wrote: > > > > This

Re: scan + filter failed with OutOfOrderScannerNextException

2014-09-29 Thread Anoop John
Hi Even when the RS throws this Exception, the client side will start a new Scanner and retry. Do you just see this in the log, or is the scan failing altogether? What is the caching you use on the Scan? When most of the rows are filtered out at the server side, it takes more time to fetch and return the 'cachin

Re: scan + filter failed with OutOfOrderScannerNextException

2014-09-29 Thread Anoop John
I receive this error in client side, and pretty sure the scan failed. > > I'm using default caching, so it should be 100, right? > > About scan time out period, I will try to set it higher, probably 1 hour. > > > > BTW, I'm using hbase 0.96.0. > > > > Bes

Re: Increasing write throughput..

2014-11-02 Thread Anoop John
You have ~280 regions per RS. And your memstore size % is 40% and heap size 48GB. This means the heap available for memstores is 48 * 0.4 = 19.2GB (I am just considering the upper watermark alone). If you have to consider all 280 regions, each with 512 MB, you need a much bigger heap. And your writ
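To put rough numbers on that: if all 280 regions were allowed to fill a 512 MB memstore, the worst case would be 280 * 512 MB, roughly 140 GB, against the 19.2 GB actually reserved, so flushes get forced long before regions reach their configured flush size.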

Re: Version in HBase

2014-11-12 Thread Anoop John
So you want one version with ts <= given ts? Have a look at Scan#setTimeRange(long minStamp, long maxStamp) If you know the exact ts for cells, you can use Scan#setTimeStamp(long timestamp) -Anoop- On Wed, Nov 12, 2014 at 11:17 AM, Krishna Kalyan wrote: > For Example for table 'test_table', Valu
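A minimal sketch of the two options, assuming the maxStamp bound of setTimeRange() is exclusive (so givenTs + 1 makes givenTs itself visible); the scan variable is assumed:

    Scan scan = new Scan();
    scan.setTimeRange(0L, givenTs + 1);  // maxStamp is exclusive
    scan.setMaxVersions(1);              // newest version with ts <= givenTs
    // or, when the exact cell timestamp is known:
    // scan.setTimeStamp(exactTs);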

Re: Replacing a full Row content in HBase

2014-11-20 Thread Anoop John
If you want the delete and the new row put in a single transaction (well, that is the best thing to do), you can try using mutateRow(final RowMutations rm). Add a delete-row mutation followed by a Put. You should be careful about the timestamps of the 2 mutations. You should provide the ts from the client side. May b
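A minimal sketch of that pattern against the 0.98-era client API (row/family/qualifier/value are assumed byte[]; the explicit timestamps are the caveat being called out):

    long ts = System.currentTimeMillis();
    RowMutations rm = new RowMutations(row);
    Delete delete = new Delete(row, ts - 1);     // wipe everything at or below ts - 1
    rm.add(delete);
    Put put = new Put(row, ts);
    put.add(family, qualifier, ts, value);       // new content wins with the newer ts
    rm.add(put);
    table.mutateRow(rm);                         // atomic within the single row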

Re: scan column qualifiers in column family

2014-11-20 Thread Anoop John
byte[] email_b=Bytes.toBytes(mail);//column qualifier byte[] colmnfamily=Bytes.toBytes("colmn_fam");//column family Scan scan_col=new Scan (Bytes.toBytes("colmn_fam"),email_b); That Scan constructor takes start and stop rows (rowkeys). You seem to be passing CF and qualifier names. Scan s = new Sca
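Presumably the intended fix looks something like this (selecting the CF and qualifier on the Scan instead of passing them as row bounds):

    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("colmn_fam"), email_b);  // cf + qualifier selection
    // Scan(byte[] startRow, byte[] stopRow) takes rowkey bounds, not cf/qualifier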

Re: HBase region assignment by range?

2015-04-08 Thread Anoop John
You can pre-split the table as per the key ranges and use a custom Load Balancer to keep the regions on the required nodes (?) It seems you have to collocate the 2 tables' regions on these nodes (to do the join)... So I hope you already work with the LB -Anoop- On Wed, Apr 8, 2015 at 8:17 AM, Alok Singh wrot

Re: HBase region assignment by range?

2015-04-08 Thread Anoop John
bq. while the region can surely split when more data added-on, but can HBase keep the new regions still on the same regionServer according to the predefined boundary? You need a custom LB for that.. With that in place, it is possible to restrict -Anoop- On Thu, Apr 9, 2015 at 12:09 AM, Demai Ni wrote: > hi

Re: scan startrow and stoprow

2015-04-22 Thread Anoop John
If you want data of timeStamp2 also (all 6 rows as shown in the e.g. above), then you have to put timeStamp3 in the stop row.. The stop row is exclusive. startRow: aabb|timeStamp1| stopRow: aabb|timeStamp3 -Anoop- On Wed, Apr 22, 2015 at 5:09 PM, Shahab Yunus wrote: > I see that you are already
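In code, a sketch of the above (the stop row being exclusive is why timeStamp3 works as the bound):

    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes("aabb|timeStamp1|"));
    scan.setStopRow(Bytes.toBytes("aabb|timeStamp3"));  // exclusive, so timeStamp2 rows come back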

Re: Where is code in hbase that physically delete a record?

2012-10-19 Thread Anoop John
Yes the KVs coming out from your delegate Scanner will be in sorted form.. Also with all other logic applied like removing TTL expired data, handling max versions etc.. Thanks for updating.. -Anoop- On Sat, Oct 20, 2012 at 1:11 AM, PG wrote: > Hi, Anoop and Ram, > As I have coded the idea, th

Re: Hbase import Tsv performance (slow import)

2012-10-23 Thread Anoop John
Hi, using the ImportTSV tool you are trying to bulk load your data. Can you see and tell how many mappers and reducers there were? Out of the total time, what is the time taken by the mapper phase and by the reducer phase? Seems like an MR-related issue (maybe some conf issue). In this bulk load case most

Re: How do I get multiple rows under coprocessor

2012-10-23 Thread Anoop John
How many rows do you want to get within the CP? What is the time taken now? Did you enable block caching? Pls observe the cache hit ratio. As you issue the Gets at the server side only (CP), one-by-one get is the only way. Also how many CFs, and how many HFiles for each of the CFs? Have you tried blooms? -Anoo

Re: Hbase import Tsv performance (slow import)

2012-10-23 Thread Anoop John
create > > HFiles directly. > > > > Regards > > Ram > > > > On Wed, Oct 24, 2012 at 8:59 AM, Anoop John > wrote: > > > > > Hi > > > Using ImportTSV tool you are trying to bulk load your data. Can you > > see > > >

Re: Hbase import Tsv performance (slow import)

2012-10-23 Thread Anoop John
RS will be used.. So there is no point in WAL there... Am I making it clear for you? The data is already present in the form of raw data in some txt or csv file :) -Anoop- On Wed, Oct 24, 2012 at 10:41 AM, Anoop John wrote: > Hi Anil > > > > On Wed, Oct 24, 2012 at 10:39 AM, a

Re: Hbase import Tsv performance (slow import)

2012-10-23 Thread Anoop John
's a very interesting fact. You made it clear but my custom Bulk Loader > generates an unique ID for every row in map phase. So, all my data is not > in csv or text. Is there a way that i can explicitly turn on WAL for bulk > loading? > > On Tue, Oct 23, 2012 at 10:14 PM, Anoop Jo

Re: Hbase import Tsv performance (slow import)

2012-10-23 Thread Anoop John
n that mapper again on the same data > set.. Then the unique id will be different? > > Anil: Yes, for the same dataset also the UniqueId will be different. > UniqueID does not depends on the data. > > Thanks, > Anil Gupta > > On Tue, Oct 23, 2012 at 11:07 PM, Anoop John >

Re: Coprocessor end point vs MapReduce?

2012-10-25 Thread Anoop John
>What I still don’t understand is, since both CP and MR are both >running on the region side, why is the MR better than the CP? For the case of bulk delete alone, CP (Endpoint) will be better than MR for sure.. Considering your overall need, people were suggesting MR as better.. You need a scan and move s

Re: SingleColumnValueFilter for empty column qualifier

2012-10-31 Thread Anoop John
You have one CF such that all rows will have KVs for that CF? You need to implement your own filter. Your scan can select the above CF and the one on which you need the filtering. Have a look at the QualifierFilter.. a similar approach you might need in the new filter.. Good luck :) -Anoop- On Thu

Re: Coprocessor slow down problem!

2012-12-01 Thread Anoop John
Ram, this issue was for prePut().. postPut() was fine... Can you take a look at what the corresponding RS threads are doing at the time of the slow put? Maybe we can get some clues from that. -Anoop- On Fri, Nov 30, 2012 at 2:04 PM, ramkrishna vasudevan < ramkrishna.s.vasude...@gmail.com> wrote

Re: Coprocessor slow down problem!

2012-12-02 Thread Anoop John
ramkrishna vasudevan > wrote: > > Ok...fine...Ya seeing what is happening in postPut should give an idea. > > > > Regards > > Ram > > > > On Sat, Dec 1, 2012 at 1:52 PM, Anoop John > wrote: > > > >> Ram, This issue was for prePut()..po

Re: Reg:delete performance on HBase table

2012-12-05 Thread Anoop John
Hi Manoj, can you tell more about your use case.. Do you know the rowkey range which needs to be deleted (all the rowkeys)? Or is it that, based on some condition, you want to delete a set of rows? Which version of HBase are you using? HBASE-6284 provided some performance improvement in c

Re: Can a RegionObserver coprocessor increment counter of a row key that may not belong to the region ?

2012-12-05 Thread Anoop John
In that case the CP hook might need to make an RPC call.. Might be to another RS? In this case, why can't you do both table updates from the client side? Sorry, I am not fully sure about your use case -Anoop- On Wed, Dec 5, 2012 at 11:00 PM, Amit Sela wrote: > And if they are not i

Re: Bulk Loading From Local File

2012-12-13 Thread Anoop John
>Can I load file rows to hbase table without importing to hdfs Where do you want the data to get stored finally.. I mean the raw data.. I hope in HDFS only (you want). Have a look at the ImportTSV tool.. -Anoop- On Thu, Dec 13, 2012 at 9:23 PM, Mehmet Simsek wrote: > Can I load file rows to hbase t

Re: HBase - Secondary Index

2012-12-27 Thread Anoop John
>how the massive number of get() is going to perform against the main table Didn't follow you completely here. There won't be any get() happening.. As we get the exact rowkey in a region from the index table, we can seek to the exact position and return that row. -Anoop- On Thu, Dec 27, 2012 at 6:37

Re: Increment operations in hbase

2013-01-12 Thread Anoop John
Hi, can you check using the API HTable#batch()? Here you can batch a number of increments for many rows in just one RPC call. It might help you to reduce the net time taken. Good luck. -Anoop- On Sat, Jan 12, 2013 at 4:07 PM, kiran wrote: > Hi, > > My usecase is I need to increment 1 milli
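A minimal sketch of that batching (Increment implements Row, so many of them can go into one batch; table, rowKeys, family, and qualifier are assumed):

    List<Row> actions = new ArrayList<Row>();
    for (byte[] row : rowKeys) {
      Increment inc = new Increment(row);
      inc.addColumn(family, qualifier, 1L);  // +1 for this row's counter
      actions.add(inc);
    }
    Object[] results = new Object[actions.size()];
    table.batch(actions, results);           // many increments, one client call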

Re: Tune MapReduce over HBase to insert data

2013-01-12 Thread Anoop John
Hi, can you think of using HFileOutputFormat? You use TableOutputFormat now; there will be put calls to HTable. Instead, with HFileOutputFormat the MR will write the HFiles directly [no flushes, compactions]. Later, using LoadIncrementalHFiles, you need to load the HFiles into the regions.
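A rough sketch of that flow (the mapper class here is hypothetical; it would emit ImmutableBytesWritable/Put pairs):

    Job job = new Job(conf, "bulk-import");
    job.setMapperClass(MyPutGeneratingMapper.class);         // hypothetical mapper emitting Puts
    HTable table = new HTable(conf, "my_table");
    HFileOutputFormat.configureIncrementalLoad(job, table);  // wires reducer + total-order partitioner
    FileOutputFormat.setOutputPath(job, hfileOutDir);
    job.waitForCompletion(true);
    // Then move the generated HFiles into the regions:
    new LoadIncrementalHFiles(conf).doBulkLoad(hfileOutDir, table);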

Re: Increment operations in hbase

2013-01-12 Thread Anoop John
t; > > > > > Thanks > > > > > > On Sat, Jan 12, 2013 at 10:57 AM, Asaf Mesika > > > wrote: > > > > > > > Most time is spent reading from Store file and not on network > transfer > > > time > > > > of Increment obje

Re: exception during coprocessor run

2013-01-13 Thread Anoop John
HBase throws this exception when the connection between the client process and the RS process (the connection through which the op request came) is broken. Any issues with your client app or network? The operation will be getting retried from the client, right? -Anoop- On Sun, Jan 13, 2013 at 8:24 PM, Ted Yu wrot

Re: Coprocessor / threading model

2013-01-13 Thread Anoop John
In your CP methods you will get an ObserverContext object from which you can get the HRS object: ObserverContext.getEnvironment().getRegionServerServices(). From this HRS you can get hold of any of the regions served by that RS. Then directly call methods on HRegion to insert data. :) Good luck.. -Anoop-
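A sketch of that chain inside a hook, assuming the target region is online on the same RS (ctx is the ObserverContext; the encoded-name lookup is illustrative):

    RegionServerServices rss = ctx.getEnvironment().getRegionServerServices();
    HRegion other = rss.getFromOnlineRegions(encodedRegionName);
    if (other != null) {
      other.put(put);  // direct server-side insert, no client RPC
    }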

Re: best read path explanation

2013-01-13 Thread Anoop John
At a read time, if there are more than one HFile for a store, HBase will read that row from all the HFiles (check whether this row is there and if so read) and also from memstore. So it can get the latest data. Also remember that there will be compaction happening for HFiles which will merge more

Re: Loading data, hbase slower than Hive?

2013-01-17 Thread Anoop John
In the case of Hive, data insertion means placing the file under the table path in HDFS. HBase needs to read the data and convert it into its format (HFiles). MR is doing this work.. So this makes it clear that HBase will be slower. :) As Michael said, the read operation... -Anoop- On Thu, Jan 17, 2013

Re: Pagination with HBase - getting previous page of data

2013-02-03 Thread Anoop John
>lets say for a scan setCaching is 10 and scan is done across two regions. 9 Results(satisfying the filter) are in Region1 and 10 Results(satisfying the filter) are in Region2. Then will this scan return 19 (9+10) results? @Anil. No, it will return 10 results only, not 19. The client here takes into

Re: Co-Processor in scanning the HBase's Table

2013-02-21 Thread Anoop John
What is this filtering at the client side doing exactly? postScannerClose() won't deal with any scanned data. This hook will be called later.. You should be using the hooks on the scanner's next calls. Mind telling the exact thing you are doing now at the client side? Then we might be able to suggest some thin

Re: Co-Processor in scanning the HBase's Table

2013-02-22 Thread Anoop John
ng on > the > >> server side then maybe an EndPoint coprocessor would be more fitting. > >> You can iterate over the InternalScanner and return a Map<> with your > >> filtered values. > >> > >> You can also check this link: > >> > h

Re: queries and MR jobs

2013-03-02 Thread Anoop John
HBase data is ultimately persisted in HDFS, and there it will be replicated across different nodes. But each region of an HBase table will be associated with exactly one RS. So for doing any operation on that region, any client needs to contact this RS only. -Anoop- On Sat, Feb 16, 2013 at 2:07 AM, Pamecha, A

Re: Concurrently Reading Still Got Exceptions

2013-03-02 Thread Anoop John
Is this really related to concurrent reads? I think it is something else.. Will dig into the code tomorrow. Can you attach a junit test case which will produce the NPE? -Anoop- On Sat, Mar 2, 2013 at 9:29 PM, Ted Yu wrote: > Looks like the issue might be related to HTable: > > at org.apache.had

Re: Eliminating duplicate values

2013-03-03 Thread Anoop John
Matt Corgan, I remember someone else also sent a mail some days back looking for the same use case. Yes, a CP can help. Maybe do deletion of duplicates at major compact time? -Anoop- On Sun, Mar 3, 2013 at 9:12 AM, Matt Corgan wrote: > I have a few use cases where I'd like to leverage

Re: [HBase] dummy cached location when batch insert result in bad performances.

2013-03-04 Thread Anoop John
The guide explains it well.. Region moves across RSs and region splits will cause the location cache (at the client) to be stale, and it will look into META again. Memstore flush/compaction and such will not make that happen. When there is a change to the META entry for a region, the location

Re: How HBase perform per-column scan?

2013-03-09 Thread Anoop John
When you say column, you mean one column family (CF) or column qualifier? If this is one column qualifier and there are other qualifiers in the same CF? -Anoop- On Sun, Mar 10, 2013 at 12:41 AM, yun peng wrote: > Hi, All, > I want to find all existing values for a given column in a HBase, and w

Re: How HBase perform per-column scan?

2013-03-10 Thread Anoop John
As per the above, you will need a full table scan on that CF. As Ted said, consider having a look at your schema design. -Anoop- On Sun, Mar 10, 2013 at 8:10 PM, Ted Yu wrote: > bq. physically column family should be able to perform efficiently (storage > layer > > When you scan a row, da

Re: Compaction problem

2013-03-22 Thread Anoop John
How many regions per RS? And how many CFs in the table? What is the -Xmx for the RS process? You will get 35% of that memory for all the memstores in the RS. hbase.hregion.memstore.flush.size = 1GB!! Can you closely observe the flushQ size and compactionQ size? You may be getting so many small file flushes (due t

Re: Essential column family performance

2013-04-08 Thread Anoop John
Agree here. The effectiveness depends on what % of data satisfies the condition and how it is distributed across HFile blocks. We will get a performance gain when we are able to skip some HFile blocks (from non-essential CFs). Can you test with a different HFile block size (lower value)? -Anoop- On

Re: Overwrite a row

2013-04-21 Thread Anoop John
You can use MultiRowMutationEndpoint for atomic op on multiple rows (within same region).. On Sun, Apr 21, 2013 at 5:55 AM, Ted Yu wrote: > Here is code from 0.94 code base: > > public void mutateRow(final RowMutations rm) throws IOException { > new ServerCallable(connection, tableName, r

Re: HBase - Performance issue

2013-04-24 Thread Anoop John
Hi, how many request handlers are there in your RS? Can you up this number and see? -Anoop- On Wed, Apr 24, 2013 at 3:42 PM, kzurek wrote: > The problem is that when I'm putting my data (multithreaded client, ~30MB/s > traffic outgoing) into the cluster the load is equally spread over a

Re: writing and reading from a region at once

2013-04-25 Thread Anoop John
>But it seems that I'm losing writes somewhere, is it possible the writes could fail silently Which version are you using? How do you say writes are missed silently? The current read, which was going on, has not returned the row that you just wrote? Or have you created a new scan afterwards and in th

Re: max regionserver handler count

2013-04-29 Thread Anoop John
You are making use of batch Gets? get(List<Get>) -Anoop- On Tue, Apr 30, 2013 at 11:40 AM, Viral Bajaria wrote: > Thanks for getting back, Ted. I totally understand other priorities and > will wait for some feedback. I am adding some more info to this post to > allow better diagnosing of performance
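i.e., something along these lines (table and rowKeys assumed):

    List<Get> gets = new ArrayList<Get>();
    for (byte[] row : rowKeys) {
      gets.add(new Get(row));
    }
    Result[] results = table.get(gets);  // grouped by region server instead of one RPC per row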

Re: max regionserver handler count

2013-04-30 Thread Anoop John
Apr 30, 2013 at 12:12 PM, Viral Bajaria wrote: > I am using asynchbase which does not have the notion of batch gets. It > allows you to batch at a rowkey level in a single get request. > > -Viral > > On Mon, Apr 29, 2013 at 11:29 PM, Anoop John > wrote: > > > You a

Re: Very poor read performance with composite keys in hbase

2013-05-01 Thread Anoop John
Navis, thanks for the issue link. So currently the read queries will start MR jobs as usual for reading from HBase, correct? Is there any plan for supporting no MR? -Anoop- On Thu, May 2, 2013 at 7:09 AM, Navis류승우 wrote: > Currently, hive storage handler reads rows one by one. > > https://

Re: Endpoint vs. Observer Coprocessors

2013-05-03 Thread Anoop John
>data in one common variable Didn't follow you completely. Can you tell us a little more about your usage? How exactly is the endpoint to be related to the CP hook (you said postPut)? -Anoop- On Fri, May 3, 2013 at 4:04 PM, Pavel Hančar wrote: > Hello, > I've just started to discover coprocessors. Namely t

Re: Documentation for append

2013-05-08 Thread Anoop John
I have just gone through the code and am trying to answer your questions. >From what I currently understand, this operation allows me to append bytes to an existing cell. Yes >Does this append by creating a new cell with a new timestamp? Yes >Does this update the cell while maintaining its timestamp? No
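For context, a minimal usage sketch of the Append API being discussed (row/family/qualifier/table are assumed):

    Append append = new Append(row);
    append.add(family, qualifier, Bytes.toBytes("-suffix"));  // bytes to tack onto the cell
    Result result = table.append(append);  // comes back with the new, longer value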

Re: Error about rs block seek

2013-05-13 Thread Anoop John
> Current pos = 32651; currKeyLen = 45; currValLen = 80; block limit = 32775 This means after the cur position we need to have at least 45+80+4 (key length stored as 4 bytes) + 4 (value length, 4 bytes) = 133 bytes. So at least 32784 should have been the limit. If we have the memstoreTS also written with this KV some

Re: Error about rs block seek

2013-05-13 Thread Anoop John
tables, > does something make troubles? > > > 2013/5/13 Anoop John > > > > Current pos = 32651; > > currKeyLen = 45; currValLen = 80; block limit = 32775 > > > > This means after the cur position we need to have atleast 45+80+4(key > > length stored as 4 b

Re: Block size of HBase files

2013-05-13 Thread Anoop John
Praveen, how many regions are there in your table and how many CFs? Under /hbase/ there will be many files and dirs you will be able to see. There will be a .tableinfo file, every region will have a .regioninfo file, and then under each CF the data files (HFiles). Your total data is 250GB. When your block size is

Re: Block size of HBase files

2013-05-13 Thread Anoop John
>now have 731 regions (each about ~350 mb !!). I checked the configuration in CM, and the value for hbase.hregion.max.filesize is 1 GB too !!! Did you specify the splits at the time of table creation? How did you create the table? -Anoop- On Mon, May 13, 2013 at 5:18 PM, Praveen Bysani wrote: > Hi,

Re: Block size of HBase files

2013-05-13 Thread Anoop John
re not providing any details in the configuration object , > except for the zookeeper quorum, port number. Should we specify explicitly > at this stage ? > > On 13 May 2013 19:54, Anoop John wrote: > > > >now have 731 regions (each about ~350 mb !!). I checked the > > co

Re: GET performance degrades over time

2013-05-17 Thread Anoop John
>Yes bloom filters have been enabled: ROWCOL Can you try with a ROW bloom? -Anoop- On Fri, May 17, 2013 at 12:20 PM, Viral Bajaria wrote: > Thanks for all the help in advance! > > Answers inline.. > > Hi Viral, > > > > some questions: > > > > > > Are you adding new data or deleting data over time?

Re: What is in BlockCache?

2013-05-20 Thread Anoop John
>So in BlockCache, does HBase store b1 and b2 separately, or store the merged form? It stores b1 and b2 separately.. It stores the blocks as read from the HFiles. -Anoop- On Mon, May 20, 2013 at 5:37 PM, yun peng wrote: > Hi, All, > I am wondering what is exactly stored in BlockCache: Is it the same raw >

Re: How does BlockCache find relevant Blocks?

2013-05-29 Thread Anoop John
There is an index for the blocks in an HFile. This index contains details like the start row of each block and its offset and length in the HFile... So as a 1st step to get a rowkey, we will find in which HFile block this rk can be present.. (I am assuming only one HFile as of now).. Now we will see whe

Re: How does BlockCache find relevant Blocks?

2013-05-29 Thread Anoop John
vel of index (like meta data index) here in memory? if > there > is, is it a hash index or other?... > > Regards > Yun > > > On Wed, May 29, 2013 at 7:45 AM, Anoop John wrote: > > > There is an index for the blocks in a HFile. This index contains details > > like s

Re: HConnectionManager$HConnectionImplementation.locateRegionInMeta

2013-05-29 Thread Anoop John
Can you have a look at issue HBASE-8476? Seems related? A fix is available in HBASE-8346's patch.. -Anoop- On Thu, May 30, 2013 at 9:21 AM, Kireet wrote: > We are running hbase 0.94.6 in a concurrent environment and we are seeing > the majority of our code stuck in this method at the synchron

Re: RPC Replication Compression

2013-06-04 Thread Anoop John
> 0.96 will support HBase RPC compression Yes > Replication between master and slave will enjoy it as well (important since bandwidth between geographically distant data centers is scarce and more expensive) But I cannot see it being utilized in replication. Maybe we can do improvements in t

Re: Replication is on columnfamily level or table level?

2013-06-04 Thread Anoop John
Yes the replication can be specified at the CF level.. You have used HCD#setScope() right? > S => '3', BLOCKSIZE => '65536'}, {*NAME => 'cf2', REPLICATION_SCOPE => '2'*, You set scope as 2?? You have to set one CF to be replicated to one cluster and another to another cluster. I don't think it
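For reference, setting the scope per CF would look roughly like this (REPLICATION_SCOPE 1 = replicate, 0 = local only; family names are illustrative):

    HColumnDescriptor cf1 = new HColumnDescriptor("cf1");
    cf1.setScope(1);  // ship this family's edits to the peer cluster
    HColumnDescriptor cf2 = new HColumnDescriptor("cf2");
    cf2.setScope(0);  // keep this family local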

Re: Questions about HBase

2013-06-04 Thread Anoop John
>4. This one is related to what I read in the HBase definitive guide bloom filter section Given a random row key you are looking for, it is very likely that this key will fall in between two block start keys. The only way for HBase to figure out if the key actually exists is by loading

Re: Scan + Gets are disk bound

2013-06-04 Thread Anoop John
When you set a time range on the Scan, some files can get skipped based on the max/min ts values in each file. Having said this, when you do a major compact and then scan based on a time range, I don't think you will get that advantage. -Anoop- On Wed, Jun 5, 2013 at 10:11 AM, Rahul Ravindran wrote: > Our row-keys do

Re: Questions about HBase

2013-06-05 Thread Anoop John
Why are there so many misses for the index blocks? What is the block cache memory you use? On Wed, Jun 5, 2013 at 12:37 PM, ramkrishna vasudevan < ramkrishna.s.vasude...@gmail.com> wrote: > I get your point Pankaj. > Going thro the code to confirm it > // Data index. We also read statistics about

Re: Replication is on columnfamily level or table level?

2013-06-05 Thread Anoop John
> > > On Wed, Jun 5, 2013 at 12:37 AM, Anoop John wrote: > > > Yes the replication can be specified at the CF level.. You have used > > HCD#setScope() right? > > > > > S => '3', BLOCKSIZE => '65536'}, {*NAME => 'cf2',

Re: Handling regionserver crashes in production cluster

2013-06-05 Thread Anoop John
How many total RSs in the cluster? You mean you cannot do any operation on other regions in the live cluster? That should not happen.. Is it happening that the client ops are targeted at the regions which were on the dead RS (and in transition now)? Can you have a closer look and see? If not, pl

Re: Questions about HBase

2013-06-05 Thread Anoop John
e > of scan for finding a key in a block. I feel that warming up the block and > index cache could be a useful feature for many workflows. Would it be a > good idea to have a JIRA for that? > > Thanks, > Pankaj > > > On Wed, Jun 5, 2013 at 1:24 AM, Anoop John wrote: > &

Re: observer coprocessor question regarding puts

2013-06-08 Thread Anoop John
You want to have an index for every CF+CQ, right? You want to maintain diff tables for diff columns? Put has a getFamilyMap() method: a Map of CF vs. List of KVs. From this list of KVs you can get all the CQ names and values etc.. -Anoop- On Sat, Jun 8, 2013 at 11:24 PM, rob mancuso wrote: > Hi, >
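A sketch of walking that map inside a hook, assuming the 0.94-era Put where the map values are KeyValue lists:

    Map<byte[], List<KeyValue>> familyMap = put.getFamilyMap();
    for (Map.Entry<byte[], List<KeyValue>> entry : familyMap.entrySet()) {
      byte[] cf = entry.getKey();
      for (KeyValue kv : entry.getValue()) {
        byte[] cq = kv.getQualifier();
        byte[] value = kv.getValue();
        // derive the index table / index row for this CF+CQ here
      }
    }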

Re: how can I extend or replace the read cache

2013-06-09 Thread Anoop John
Shixiaolong, you would like to contribute your work to open source? If so, mind raising a jira and attaching a solution doc for all of us? -Anoop- On Thu, Jun 6, 2013 at 10:31 AM, Ted Yu wrote: > HBASE-7404 Bucket Cache has done some work in this regard. > > Please refer to the late

Re: bulk-load bug ?

2013-06-21 Thread Anoop John
When adding data to HBase with the same key, it is the timestamp (ts) which determines the version. Diff ts will make diff versions for the cell. But in the case of bulk load using the ImportTSV tool, the ts used by one mapper will be the same. All the Puts created from it will have the same ts. The tool allows us

Re: Possibility of using timestamp as row key in HBase

2013-06-21 Thread Anoop John
You can specify a max size to indicate the region split (when a region should get split). But this size is the size of the HFile; to be precise, it is the size of the biggest HFile under that region. If you specify this size as 10G, then when the region has a file of size bigger than 10G the region w

Re: Scan performance

2013-06-21 Thread Anoop John
Have a look at FuzzyRowFilter -Anoop- On Sat, Jun 22, 2013 at 9:20 AM, Tony Dean wrote: > I understand more, but have additional questions about the internals... > > So, in this example I have 6000 rows X 40 columns in this table. In this > test my startRow and stopRow do not narrow the scan c
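A minimal sketch of FuzzyRowFilter usage (mask byte 0 = position must match, 1 = wildcard; the 6-byte key layout here is made up for illustration):

    // say: 4 wildcard bytes of prefix, then 2 fixed bytes that must equal "AB"
    byte[] rowTemplate = new byte[] {0, 0, 0, 0, 'A', 'B'};
    byte[] fuzzyMask   = new byte[] {1, 1, 1, 1, 0, 0};
    Scan scan = new Scan();
    scan.setFilter(new FuzzyRowFilter(
        Arrays.asList(new Pair<byte[], byte[]>(rowTemplate, fuzzyMask))));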

Re: flushing + compactions after config change

2013-06-27 Thread Anoop John
>the flush size is at 128m and there is no memory pressure You mean there is enough memstore-reserved heap in the RS, so that there won't be premature flushes because of global heap pressure? What is the RS max mem, and how many regions and CFs in each? Can you check whether the flushes are happening b

Re: 答复: flushing + compactions after config change

2013-06-27 Thread Anoop John
The config "hbase.regionserver.maxlogs" specifies what is the max #logs and defaults to 32. But remember if there are so many log files to replay then the MTTR will become more (RS down case ) -Anoop- On Thu, Jun 27, 2013 at 1:59 PM, Viral Bajaria wrote: > Thanks Liang! > > Found the logs. I had

Re: 答复: flushing + compactions after config change

2013-06-28 Thread Anoop John
Viral, basically when you increase the memstore flush size (well, your aim there is to reduce flushes and make data sit in memory for a longer time) you need to carefully consider 2 things: 1. What is the max heap, and what is the max % memory you have allocated for all the memstores in a RS. An

Re: Problems while exporting from Hbase to CSV file

2013-06-28 Thread Anoop John
> so i can not use default scan() constructor as it will scan whole table in one go which results in OutOfMemory error in client process Not getting what you mean by this. The client calls next() on the Scanner and gets the rows. The setCaching() and setBatch() determine how much data (rows, cells

Re: Help in designing row key

2013-07-03 Thread Anoop John
When you make the RK and convert the int parts into byte[] (use org.apache.hadoop.hbase.util.Bytes#toBytes(int)), it will give 4 bytes for every int.. Be careful about the ordering... When you convert a +ve and a -ve integer into byte[] and you do a lexicographical compare (as done in HBase) you will
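A common fix for this ordering problem is to flip the sign bit before serializing; a sketch (the helper name is illustrative):

    // Integer.MIN_VALUE has only the sign bit set; XOR-ing it flips that bit,
    // so negative ints then sort lexicographically before positive ones.
    public static byte[] toSortableBytes(int value) {
      return Bytes.toBytes(value ^ Integer.MIN_VALUE);
    }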

Re: Deleting range of rows from Hbase.

2013-07-04 Thread Anoop John
It is not supported from the shell. Not directly from the Delete API either.. You can have a look at BulkDeleteEndpoint, which can do what you want to -Anoop- On Thu, Jul 4, 2013 at 4:09 PM, yonghu wrote: > I check the latest api of Delete class. I am afraid you have to do it by > yourself. > > regards!

Re: Deleting range of rows from Hbase.

2013-07-04 Thread Anoop John
wrote: > Hi Anoop > one more question. Can I use BulkDeleteEndpoint at the client side or > should I use it like coprocessor which deployed in the server side? > > Thanks! > > Yong > > > On Thu, Jul 4, 2013 at 12:50 PM, Anoop John wrote: > > > It is not s

Re: question about clienttrace logs in hdfs and shortcircuit read

2013-07-05 Thread Anoop John
checksumFailures will get updated when the HBase-handled checksum feature is in use and the checksum check done at the RS side failed.. If that happens we will try to read from the DN with the DN checksum check enabled. Agree that right now the HBase-handled checksum will work only with SCR. But it might work wit

Re: question about clienttrace logs in hdfs and shortcircuit read

2013-07-05 Thread Anoop John
Viral DFS client uses org.apache.hadoop.hdfs.BlockReaderLocal for SCR.. I can see some debug level logs in this * LOG*.debug("New BlockReaderLocal for file " + blkfile + " of size " + blkfile.length() + " startOffset " + startOffset + " length " + length + " short circuit checksu

Re: question about clienttrace logs in hdfs and shortcircuit read

2013-07-05 Thread Anoop John
re basically logs for > HDFS_READ and HDFS_WRITE ops. I wanted to see if it's a valid assumption > that SCR is working if I don't see any clienttrace logs for the RS that is > hosted on the same box as the DN. > > Hopefully I clarified it. > > On Fri, Jul 5, 2013 at 1

Re: Bulk loading HFiles via LoadIncrementalHFiles fails at a region that is being compacted, a bug?

2013-07-09 Thread Anoop John
Hello Stan, is your bulk load trying to load data into multiple column families? -Anoop- On Wed, Jul 10, 2013 at 11:13 AM, Stack wrote: > File a bug Stan please. Paste your log snippet and surrounding what is > going on at the time. It looks broke that a bulk load would be kept o

Re: can coprocessor be used to do distinct operation?

2013-07-11 Thread Anoop John
Can you be a little more specific? CPs work on a per-region basis. So they can be utilized for distinct ops within one region.. If you want to do it overall at the table level, then some work at the client side will also be needed.. Have a look at Phoenix. http://forcedotcom.github.io/phoenix/functions.html On Thu, Ju
