Re: SingleColumnValueFilter for empty column qualifier

2012-11-01 Thread Anoop John
You have one CF such that all rows will have KVs for that CF? You need to implement your own filter. Your scan can select the above CF and the one on which you need the filtering. Have a look at the QualifierFilter.. a similar approach you might need in the new filter.. Good luck :) -Anoop- On
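
A minimal sketch of such a filter, assuming the 0.94-era Filter API (the class name is made up, and the Writable serialization is left as a stub):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.util.Bytes;

public class EmptyQualifierValueFilter extends FilterBase {
  private byte[] family;
  private byte[] value;
  private boolean matched = false;

  public EmptyQualifierValueFilter() {} // required for deserialization
  public EmptyQualifierValueFilter(byte[] family, byte[] value) {
    this.family = family;
    this.value = value;
  }

  @Override
  public ReturnCode filterKeyValue(KeyValue kv) {
    // look for a KV in the target family with an empty qualifier and the wanted value
    if (kv.matchingFamily(family) && kv.getQualifierLength() == 0
        && Bytes.equals(kv.getValue(), value)) {
      matched = true;
    }
    return ReturnCode.INCLUDE;
  }

  @Override
  public boolean filterRow() {
    return !matched; // drop rows in which no such KV was seen
  }

  @Override
  public void reset() {
    matched = false;
  }

  public void write(DataOutput out) throws IOException { /* write family and value */ }
  public void readFields(DataInput in) throws IOException { /* read family and value */ }
}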

Re: Coprocessor slow down problem!

2012-12-01 Thread Anoop John
Ram, This issue was for prePut().. postPut() was fine.. Can you take a look at what the corresponding RS threads are doing at the time of the slow put. Maybe we can get some clues from that. -Anoop- On Fri, Nov 30, 2012 at 2:04 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote:

Re: Coprocessor slow down problem!

2012-12-02 Thread Anoop John
ramkrishna.s.vasude...@gmail.com wrote: Ok...fine...Ya seeing what is happening in postPut should give an idea. Regards Ram On Sat, Dec 1, 2012 at 1:52 PM, Anoop John anoop.hb...@gmail.com wrote: Ram, This issue was for prePut()..postPut() was fine only... Can you take a look

Re: Reg:delete performance on HBase table

2012-12-05 Thread Anoop John
Hi Manoj Can you tell more about your use case.. Do you know the rowkey range which needs to be deleted? (all the rowkeys) Or is it that based on some condition you want to delete a set of rows? Which version of HBase are you using? HBASE-6284 provided some performance improvement in

Re: Can a RegionObserver coprocessor increment counter of a row key that may not belong to the region ?

2012-12-05 Thread Anoop John
In that case the CP hook might need to make an RPC call.. Might be to another RS? In this case why can't you do both table updates from the client side? Sorry, I am not fully sure about your use case -Anoop- On Wed, Dec 5, 2012 at 11:00 PM, Amit Sela am...@infolinks.com wrote:

Re: Bulk Loading From Local File

2012-12-13 Thread Anoop John
Can I load file rows to hbase table without importing to hdfs Where do you want the data to get stored finally.. I mean the raw data.. I assume in HDFS only. Have a look at the ImportTSV tool.. -Anoop- On Thu, Dec 13, 2012 at 9:23 PM, Mehmet Simsek nurettinsim...@gmail.comwrote: Can I
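
A typical ImportTsv invocation (a sketch — the table name, column mapping and paths are made up; the input file has to be copied into HDFS first so the MR job can read it):

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1,cf:col2 \
  mytable /user/me/input

Adding -Dimporttsv.bulk.output=/tmp/hfiles makes it write HFiles instead of doing puts; those are then loaded with the completebulkload tool (LoadIncrementalHFiles).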

Re: HBase - Secondary Index

2012-12-27 Thread Anoop John
how the massive number of get() is going to perform against the main table Didn't follow you completely here. There won't be any get() happening.. As we get the exact rowkey in a region from the index table, we can seek to the exact position and return that row. -Anoop- On Thu, Dec 27, 2012 at 6:37

Re: Increment operations in hbase

2013-01-12 Thread Anoop John
Hi Can you check using the API HTable#batch()? Here you can batch a number of increments for many rows in just one RPC call. Might help you to reduce the net time taken. Good luck. -Anoop- On Sat, Jan 12, 2013 at 4:07 PM, kiran kiran.sarvabho...@gmail.com wrote: Hi, My usecase is I
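
A sketch of such batching (table and column names are made up; batch() groups the actions per region server, so many increments travel in far fewer RPCs):

List<Row> actions = new ArrayList<Row>();
for (byte[] rowKey : rowKeys) {
  Increment inc = new Increment(rowKey);
  inc.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("hits"), 1L);
  actions.add(inc);
}
Object[] results = new Object[actions.size()];
table.batch(actions, results); // instead of one table.increment() per row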

Re: Tune MapReduce over HBase to insert data

2013-01-12 Thread Anoop John
Hi Can you think of using HFileOutputFormat? Now you use TableOutputFormat. There will be put calls to HTable. Instead, with HFileOutputFormat the MR will write the HFiles directly. [No flushes, compactions] Later, using LoadIncrementalHFiles, you need to load the HFiles to the
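
A sketch of that wiring (paths, table name and the mapper are made up; the mapper must emit ImmutableBytesWritable/Put pairs):

Job job = new Job(conf, "bulkload");
job.setMapperClass(MyPutEmittingMapper.class); // hypothetical mapper
FileInputFormat.addInputPath(job, new Path("/input"));
FileOutputFormat.setOutputPath(job, new Path("/hfiles-out"));
HTable table = new HTable(conf, "mytable");
// wires in the reducer, partitioner and output format from the table's region boundaries
HFileOutputFormat.configureIncrementalLoad(job, table);
job.waitForCompletion(true);
// then move the generated HFiles into the table
new LoadIncrementalHFiles(conf).doBulkLoad(new Path("/hfiles-out"), table);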

Re: Increment operations in hbase

2013-01-12 Thread Anoop John
and not on network transfer time of Increment objects. Sent from my iPhone On 12 Jan 2013, at 17:40, Anoop John anoop.hb...@gmail.com wrote: Hi Can you check using the API HTable#batch()? Here you can batch a number of increments for many rows in just

Re: exception during coprocessor run

2013-01-13 Thread Anoop John
HBase throws this exception when the connection between the client process and the RS process (the connection through which the op request came) is broken. Any issues with your client app or network? The operation will be getting retried from the client, right? -Anoop- On Sun, Jan 13, 2013 at 8:24 PM, Ted Yu

Re: Coprocessor / threading model

2013-01-13 Thread Anoop John
In your CP methods you will get an ObserverContext object from which you can get the HRS object. ObserverContext.getEnvironment().getRegionServerServices() From this HRS you can get hold of any of the regions served by that RS. Then directly call methods on HRegion to insert data. :) Good luck.. -Anoop-
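
A sketch of that pattern (the observer below is illustrative; the encoded region name and the secondary-put construction are hypothetical):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.HRegion;
import org.apache.hadoop.hbase.regionserver.RegionServerServices;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

public class SecondaryWriteObserver extends BaseRegionObserver {
  private String targetEncodedRegionName; // hypothetical: resolved elsewhere

  @Override
  public void postPut(ObserverContext<RegionCoprocessorEnvironment> ctx,
      Put put, WALEdit edit, boolean writeToWAL) throws IOException {
    RegionServerServices rss = ctx.getEnvironment().getRegionServerServices();
    // only works for regions hosted on this same RS
    HRegion target = rss.getFromOnlineRegions(targetEncodedRegionName);
    if (target != null) {
      target.put(buildSecondaryPut(put));
    }
  }

  private Put buildSecondaryPut(Put original) {
    // hypothetical: derive the row/columns to write into the target region
    return original;
  }
}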

Re: best read path explanation

2013-01-13 Thread Anoop John
At read time, if there is more than one HFile for a store, HBase will read that row from all the HFiles (check whether this row is there and if so read it) and also from the memstore. So it can get the latest data. Also remember that there will be compactions happening for HFiles which will merge

Re: Loading data, hbase slower than Hive?

2013-01-17 Thread Anoop John
In case of Hive, data insertion means placing the file under the table path in HDFS. HBase needs to read the data and convert it into its format (HFiles). MR is doing this work.. So this makes it clear that HBase will be slower. :) As Michael said, the read operation... -Anoop- On Thu, Jan 17,

Re: Pagination with HBase - getting previous page of data

2013-02-03 Thread Anoop John
lets say for a scan setCaching is 10 and scan is done across two regions. 9 Results(satisfying the filter) are in Region1 and 10 Results(satisfying the filter) are in Region2. Then will this scan return 19 (9+10) results? @Anil. No, it will return only 10 results, not 19. The client here takes into

Re: Co-Processor in scanning the HBase's Table

2013-02-21 Thread Anoop John
What is this filtering at the client side doing exactly? postScannerClose() won't deal with any scanned data. This hook will be called later.. You should be using the hooks around the scanner's next() calls. Mind telling the exact thing you are doing now at the client side. Then we might be able to suggest some
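
For instance, a RegionObserver could trim results in postScannerNext() — a sketch, where shouldKeep() stands in for whatever server-side check is needed:

@Override
public boolean postScannerNext(ObserverContext<RegionCoprocessorEnvironment> ctx,
    InternalScanner s, List<Result> results, int limit, boolean hasMore)
    throws IOException {
  Iterator<Result> it = results.iterator();
  while (it.hasNext()) {
    if (!shouldKeep(it.next())) { // hypothetical predicate
      it.remove(); // dropped before the batch is returned to the client
    }
  }
  return hasMore;
}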

Re: Co-Processor in scanning the HBase's Table

2013-02-22 Thread Anoop John
the InternalScanner and return a Map with your filtered values. You can also check this link: http://hbase-coprocessor-experiments.blogspot.co.il/2011/05/extending.html Good Luck! On Thu, Feb 21, 2013 at 5:52 PM, Anoop John anoop.hb...@gmail.com wrote: What is this filtering at client side

Re: queries and MR jobs

2013-03-02 Thread Anoop John
HBase data is ultimately persisted in HDFS and there it will be replicated on different nodes. But each region of an HBase table will be associated with exactly one RS. So for doing any operation on that region, any client needs to contact this HRS only. -Anoop- On Sat, Feb 16, 2013 at 2:07 AM, Pamecha,

Re: Concurrently Reading Still Got Exceptions

2013-03-02 Thread Anoop John
Is this really related to concurrent reads? I think it is something else.. Will dig into the code tomorrow. Can you attach a junit test case which will produce the NPE. -Anoop- On Sat, Mar 2, 2013 at 9:29 PM, Ted Yu yuzhih...@gmail.com wrote: Looks like the issue might be related to HTable:

Re: Eliminating duplicate values

2013-03-03 Thread Anoop John
Matt Corgan I remember someone else also sent a mail some days back looking for the same use case Yes, CP can help. Maybe do deletion of duplicates at major compaction time? -Anoop- On Sun, Mar 3, 2013 at 9:12 AM, Matt Corgan mcor...@hotpads.com wrote: I have a few use cases where

Re: [HBase] dummy cached location when batch insert result in bad performances.

2013-03-04 Thread Anoop John
The guide explains it well.. Region moves across RSs and region splits will cause the location cache (at the client) to become stale, and it will look into META again. Memstore flushes/compactions will not make that happen. When there is a change to the META entry for a region, the

Re: How HBase perform per-column scan?

2013-03-10 Thread Anoop John
As per the above said, you will need a full table scan on that CF. As Ted said, consider having a look at your schema design. -Anoop- On Sun, Mar 10, 2013 at 8:10 PM, Ted Yu yuzhih...@gmail.com wrote: bq. physically column family should be able to perform efficiently (storage layer When

Re: Compaction problem

2013-03-22 Thread Anoop John
How many regions per RS? And CFs in the table? What is the -Xmx for the RS process? You will get 35% of that memory for all the memstores in the RS. hbase.hregion.memstore.flush.size = 1GB!! Can you closely observe the flushQ size and compactionQ size? You may be getting so many small file flushes (Due

Re: Essential column family performance

2013-04-08 Thread Anoop John
Agree here. The effectiveness depends on what % of data satisfies the condition and how it is distributed across HFile blocks. We will get a performance gain when we are able to skip some HFile blocks (from non essential CFs). Can you test with a different HFile block size (lower value)? -Anoop-

Re: Overwrite a row

2013-04-21 Thread Anoop John
You can use MultiRowMutationEndpoint for atomic op on multiple rows (within same region).. On Sun, Apr 21, 2013 at 5:55 AM, Ted Yu yuzhih...@gmail.com wrote: Here is code from 0.94 code base: public void mutateRow(final RowMutations rm) throws IOException { new
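
A sketch of invoking it through the 0.94 coprocessor proxy API (row/column names are made up; the endpoint must be loaded on the table and all rows must belong to the same region):

List<Mutation> mutations = new ArrayList<Mutation>();
Put p = new Put(row1);
p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
mutations.add(p);
mutations.add(new Delete(row2));
MultiRowMutationProtocol proxy =
    table.coprocessorProxy(MultiRowMutationProtocol.class, row1);
proxy.mutateRows(mutations); // applied atomically within the region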

Re: HBase - Performance issue

2013-04-24 Thread Anoop John
Hi How many request handlers are there in your RS? Can you up this number and see? -Anoop- On Wed, Apr 24, 2013 at 3:42 PM, kzurek kzu...@proximetry.pl wrote: The problem is that when I'm putting my data (multithreaded client, ~30MB/s traffic outgoing) into the cluster the load is
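
For reference, the knob is hbase.regionserver.handler.count in hbase-site.xml (a sketch; the 0.94 default is 10, the right value depends on the workload):

<property>
  <name>hbase.regionserver.handler.count</name>
  <value>30</value>
</property>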

Re: writing and reading from a region at once

2013-04-25 Thread Anoop John
But it seems that I'm losing writes somewhere, is it possible the writes could fail silently Which version are you using? How do you say writes are missed silently? The current read, which was going on, has not returned the row that you just wrote? Or have you created a new scan afterwards and in

Re: max regionserver handler count

2013-04-30 Thread Anoop John
Are you making use of batch Gets? get(List<Get>) -Anoop- On Tue, Apr 30, 2013 at 11:40 AM, Viral Bajaria viral.baja...@gmail.comwrote: Thanks for getting back, Ted. I totally understand other priorities and will wait for some feedback. I am adding some more info to this post to allow better
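
A batched-get sketch for reference (the table variable is assumed):

List<Get> gets = new ArrayList<Get>();
for (byte[] rowKey : rowKeys) {
  gets.add(new Get(rowKey));
}
Result[] results = table.get(gets); // grouped per region server, not one RPC per Get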

Re: max regionserver handler count

2013-04-30 Thread Anoop John
, Apr 30, 2013 at 12:12 PM, Viral Bajaria viral.baja...@gmail.comwrote: I am using asynchbase which does not have the notion of batch gets. It allows you to batch at a rowkey level in a single get request. -Viral On Mon, Apr 29, 2013 at 11:29 PM, Anoop John anoop.hb...@gmail.com wrote: You

Re: Very poor read performance with composite keys in hbase

2013-05-01 Thread Anoop John
Navis Thanks for the issue link. Currently the read queries will start MR jobs as usual for reading from HBase. Correct? Is there any plan for supporting no-MR? -Anoop- On Thu, May 2, 2013 at 7:09 AM, Navis류승우 navis@nexr.com wrote: Currently, hive storage handler reads rows one by

Re: Endpoint vs. Observer Coprocessors

2013-05-03 Thread Anoop John
data in one common variable Didn't follow you completely. Can you tell us a little more about your usage. How exactly is the endpoint related to the CP hook (you said postPut) -Anoop- On Fri, May 3, 2013 at 4:04 PM, Pavel Hančar pavel.han...@gmail.com wrote: Hello, I've just started to discover

Re: Documentation for append

2013-05-08 Thread Anoop John
I have just gone through the code and will try answering your questions. From what I currently understand, this operation allows me to append bytes to an existing cell. Yes Does this append by creating a new cell with a new timestamp? Yes Does this update the cell while maintaining its timestamp? No.
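
A small usage sketch (row/column names made up):

Append append = new Append(Bytes.toBytes("row1"));
append.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("-suffix"));
Result r = table.append(append); // merged value, stored under a new timestamp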

Re: Error about rs block seek

2013-05-13 Thread Anoop John
Current pos = 32651; currKeyLen = 45; currValLen = 80; block limit = 32775 This means after the cur position we need to have at least 45+80+4 (key length stored as 4 bytes) +4 (value length, 4 bytes). So the limit should have been at least 32651+45+80+4+4 = 32784. If we have memstoreTS also written with this KV some

Re: Error about rs block seek

2013-05-13 Thread Anoop John
tables, does something make troubles? 2013/5/13 Anoop John anoop.hb...@gmail.com Current pos = 32651; currKeyLen = 45; currValLen = 80; block limit = 32775 This means after the cur position we need to have atleast 45+80+4(key length stored as 4 bytes) +4(value length 4 bytes) So

Re: Block size of HBase files

2013-05-13 Thread Anoop John
Praveen, How many regions are there in your table and how many CFs? Under /hbase/table-name there will be many files and dirs you will be able to see. There will be a .tableinfo file, every region will have a .regioninfo file, and then under each CF the data files (HFiles). Your total data is 250GB. When your

Re: Block size of HBase files

2013-05-13 Thread Anoop John
now have 731 regions (each about ~350 mb !!). I checked the configuration in CM, and the value for hbase.hregion.max.filesize is 1 GB too !!! You mentioned splits at the time of table creation? How did you create the table? -Anoop- On Mon, May 13, 2013 at 5:18 PM, Praveen Bysani

Re: Block size of HBase files

2013-05-13 Thread Anoop John
.*) . However we are not providing any details in the configuration object , except for the zookeeper quorum, port number. Should we specify explicitly at this stage ? On 13 May 2013 19:54, Anoop John anoop.hb...@gmail.com wrote: now have 731 regions (each about ~350 mb !!). I checked

Re: GET performance degrades over time

2013-05-17 Thread Anoop John
Yes bloom filters have been enabled: ROWCOL Can you try with a ROW bloom? -Anoop- On Fri, May 17, 2013 at 12:20 PM, Viral Bajaria viral.baja...@gmail.comwrote: Thanks for all the help in advance! Answers inline.. Hi Viral, some questions: Are you adding new data or deleting data

Re: What is in BlockCache?

2013-05-20 Thread Anoop John
So in BlockCache, does HBase store b1 and b2 separately, or store the merged form? store b1 and b2 separately.. Stores the blocks read from HFiles. -Anoop- On Mon, May 20, 2013 at 5:37 PM, yun peng pengyunm...@gmail.com wrote: Hi, All, I am wondering what is exactly stored in BlockCache: Is

Re: Where is code in hbase that physically delete a record?

2012-10-19 Thread Anoop John
Yes the KVs coming out from your delegate Scanner will be in sorted form.. Also with all other logic applied like removing TTL expired data, handling max versions etc.. Thanks for updating.. -Anoop- On Sat, Oct 20, 2012 at 1:11 AM, PG pengyunm...@gmail.com wrote: Hi, Anoop and Ram, As I have

Re: Hbase import Tsv performance (slow import)

2012-10-23 Thread Anoop John
Hi Using the ImportTSV tool you are trying to bulk load your data. Can you see and tell how many mappers and reducers were there. Out of the total time, what is the time taken by the mapper phase and by the reducer phase? Seems like an MR related issue (maybe some conf issue). In this bulk load case

Re: How do I get multiple rows under coprocessor

2012-10-23 Thread Anoop John
How many rows do you want to get within the CP? What is the time taken now? Did you enable block caching? Pls observe the cache hit ratio. As you issue the Get at the server side only (CP), one-by-one get is the only way. Also how many CFs and how many HFiles for each of the CFs? Have you tried blooms?

Re: Hbase import Tsv performance (slow import)

2012-10-23 Thread Anoop John
at 8:59 AM, Anoop John anoop.hb...@gmail.com wrote: Hi Using ImportTSV tool you are trying to bulk load your data. Can you see and tell how many mappers and reducers were there. Out of total time what is the time taken by the mapper phase and by the reducer phase. Seems like

Re: Hbase import Tsv performance (slow import)

2012-10-23 Thread Anoop John
RS will be used.. So there is no point in WAL there... Am I making it clear for you? The data is already present in the form of raw data in some txt or csv file :) -Anoop- On Wed, Oct 24, 2012 at 10:41 AM, Anoop John anoop.hb...@gmail.com wrote: Hi Anil On Wed, Oct 24, 2012 at 10:39 AM, anil

Re: Hbase import Tsv performance (slow import)

2012-10-24 Thread Anoop John
: That's a very interesting fact. You made it clear but my custom Bulk Loader generates an unique ID for every row in map phase. So, all my data is not in csv or text. Is there a way that i can explicitly turn on WAL for bulk loading? On Tue, Oct 23, 2012 at 10:14 PM, Anoop John anoop.hb...@gmail.com

Re: Coprocessor end point vs MapReduce?

2012-10-25 Thread Anoop John
What I still don’t understand is, since both CP and MR are running on the region side, why is the MR better than the CP? For the case of bulk delete alone, CP (Endpoint) will be better than MR for sure.. Considering your overall need, people were suggesting MR.. You need a scan and move

Re: How does BlockCache find relevant Blocks?

2013-05-29 Thread Anoop John
There is an index for the blocks in an HFile. This index contains details like the start row in the block, its offset and length in the HFile... So as a 1st step to get a rowkey, we will find which HFile block this rk can be present in.. (I am assuming only one HFile as of now).. Now we will see

Re: How does BlockCache find relevant Blocks?

2013-05-29 Thread Anoop John
level of index (like meta data index) here in memory? if there is, is it a hash index or other?... Regards Yun On Wed, May 29, 2013 at 7:45 AM, Anoop John anoop.hb...@gmail.com wrote: There is an index for the blocks in a HFile. This index contains details like start row in the block, its

Re: HConnectionManager$HConnectionImplementation.locateRegionInMeta

2013-05-29 Thread Anoop John
Can you have a look at issue HBASE-8476? Seems related? A fix is available in HBASE-8346's patch.. -Anoop- On Thu, May 30, 2013 at 9:21 AM, Kireet kir...@feedly.com wrote: We are running hbase 0.94.6 in a concurrent environment and we are seeing the majority of our code stuck in this method

Re: RPC Replication Compression

2013-06-04 Thread Anoop John
0.96 will support HBase RPC compression Yes Replication between master and slave will enjoy it as well (important since bandwidth between geographically distant data centers is scarce and more expensive) But I cannot see it being utilized in replication. Maybe we can do improvements in

Re: Replication is on columnfamily level or table level?

2013-06-04 Thread Anoop John
Yes, the replication can be specified at the CF level.. You have used HCD#setScope() right? S = '3', BLOCKSIZE = '65536'}, {NAME = 'cf2', REPLICATION_SCOPE = '2'}, You set scope as 2?? You have to set one CF to be replicated to one cluster and another to another cluster. I don't think it is
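
For reference, a sketch of setting the scope via the client API (table/CF names made up; the valid scopes in 0.94 are HConstants.REPLICATION_SCOPE_LOCAL = 0 and REPLICATION_SCOPE_GLOBAL = 1):

HBaseAdmin admin = new HBaseAdmin(conf);
admin.disableTable("t1");
HTableDescriptor htd = admin.getTableDescriptor(Bytes.toBytes("t1"));
HColumnDescriptor hcd = htd.getFamily(Bytes.toBytes("cf1"));
hcd.setScope(HConstants.REPLICATION_SCOPE_GLOBAL); // replicate this CF
admin.modifyColumn("t1", hcd);
admin.enableTable("t1");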

Re: Questions about HBase

2013-06-04 Thread Anoop John
4. This one is related to what I read in the HBase definitive guide bloom filter section Given a random row key you are looking for, it is very likely that this key will fall in between two block start keys. The only way for HBase to figure out if the key actually exists is by loading

Re: Scan + Gets are disk bound

2013-06-04 Thread Anoop John
When you set a time range on the Scan, some files can get skipped based on the max/min ts values in that file. That said, if you do a major compaction and then scan based on time range, I don't think you will get that advantage. -Anoop- On Wed, Jun 5, 2013 at 10:11 AM, Rahul Ravindran rahu...@yahoo.com wrote:
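
For reference, the client-side part is just (a sketch):

Scan scan = new Scan();
scan.setTimeRange(startTs, endTs); // store files whose [min,max] ts range falls fully outside may be skipped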

Re: Questions about HBase

2013-06-05 Thread Anoop John
Why are there so many misses for the index blocks? What is the block cache memory you use? On Wed, Jun 5, 2013 at 12:37 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: I get your point Pankaj. Going thro the code to confirm it // Data index. We also read statistics about the

Re: Replication is on columnfamily level or table level?

2013-06-05 Thread Anoop John
, Jun 5, 2013 at 12:37 AM, Anoop John anoop.hb...@gmail.com wrote: Yes the replication can be specified at the CF level.. You have used HCD#setScope() right? S = '3', BLOCKSIZE = '65536'}, {*NAME = 'cf2', REPLICATION_SCOPE = '2'*, You set scope as 2?? You have to set one CF

Re: Handling regionserver crashes in production cluster

2013-06-05 Thread Anoop John
How many total RS in the cluster? You mean you cannot do any operation on other regions in the live cluster? It should not happen.. Is it happening that the client ops are targeted at the regions which were in the dead RS (and in transition now)? Can you have a closer look and see? If not

Re: Questions about HBase

2013-06-05 Thread Anoop John
feel that warming up the block and index cache could be a useful feature for many workflows. Would it be a good idea to have a JIRA for that? Thanks, Pankaj On Wed, Jun 5, 2013 at 1:24 AM, Anoop John anoop.hb...@gmail.com wrote: Why there are so many miss for the index blocks? WHat

Re: observer coprocessor question regarding puts

2013-06-08 Thread Anoop John
You want to have an index per every CF+CQ right? You want to maintain diff tables for diff columns? Put has a getFamilyMap() method: a Map of CF vs List of KVs. From this List of KVs you can get all the CQ names and values etc.. -Anoop- On Sat, Jun 8, 2013 at 11:24 PM, rob mancuso rcuso...@gmail.com
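
A sketch of walking that map inside a put hook (what gets built from each KV is up to the indexing scheme):

Map<byte[], List<KeyValue>> familyMap = put.getFamilyMap();
for (Map.Entry<byte[], List<KeyValue>> entry : familyMap.entrySet()) {
  byte[] family = entry.getKey();
  for (KeyValue kv : entry.getValue()) {
    byte[] qualifier = kv.getQualifier();
    byte[] value = kv.getValue();
    // build the index row for family:qualifier here (hypothetical)
  }
}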

Re: how can I extend or replace the read cache

2013-06-10 Thread Anoop John
Shixiaolong, Would you like to contribute your work to open source? If so, mind raising a jira and attaching a solution doc for all of us? -Anoop- On Thu, Jun 6, 2013 at 10:31 AM, Ted Yu yuzhih...@gmail.com wrote: HBASE-7404 Bucket Cache has done some work in this regard. Please

Re: bulk-load bug ?

2013-06-21 Thread Anoop John
When adding data to HBase with the same key, it is the timestamp (ts) which determines the version. A diff ts will make a diff version for the cell. But in case of bulk load using the ImportTSV tool, the ts used by one mapper will be the same. All the Puts created from it will have the same ts. The tool allows

Re: Possibility of using timestamp as row key in HBase

2013-06-21 Thread Anoop John
You can specify a max size to indicate the region split (when a region should get split). But this size is the size of the HFile. To be precise, it is the size of the biggest HFile under that region. If you specify this size as 10G, then when the region has a file of size bigger than 10G the region
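
The corresponding setting (a sketch; it can also be set per table via HTableDescriptor#setMaxFileSize):

<property>
  <name>hbase.hregion.max.filesize</name>
  <value>10737418240</value> <!-- 10G -->
</property>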

Re: Scan performance

2013-06-22 Thread Anoop John
Have a look at FuzzyRowFilter -Anoop- On Sat, Jun 22, 2013 at 9:20 AM, Tony Dean tony.d...@sas.com wrote: I understand more, but have additional questions about the internals... So, in this example I have 6000 rows X 40 columns in this table. In this test my startRow and stopRow do not
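
A FuzzyRowFilter sketch (the 8-byte key layout below — a 4-byte userId then a 4-byte actionId — is an assumption for illustration; here only the actionId part is known):

byte[] fuzzyKey = new byte[8];
System.arraycopy(Bytes.toBytes(actionId), 0, fuzzyKey, 4, 4);
// mask: 1 = this byte may be anything, 0 = this byte must match fuzzyKey
byte[] mask = new byte[] {1, 1, 1, 1, 0, 0, 0, 0};
Scan scan = new Scan();
scan.setFilter(new FuzzyRowFilter(
    Collections.singletonList(new Pair<byte[], byte[]>(fuzzyKey, mask))));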

Re: flushing + compactions after config change

2013-06-27 Thread Anoop John
the flush size is at 128m and there is no memory pressure You mean there is enough memstore-reserved heap in the RS, so that there won't be premature flushes because of global heap pressure? What is the RS max mem and how many regions and CFs in each? Can you check whether the flushes happening

Re: 答复: flushing + compactions after config change

2013-06-27 Thread Anoop John
The config hbase.regionserver.maxlogs specifies the max #logs and defaults to 32. But remember, if there are so many log files to replay then the MTTR will become longer (RS down case) -Anoop- On Thu, Jun 27, 2013 at 1:59 PM, Viral Bajaria viral.baja...@gmail.comwrote: Thanks Liang!

Re: 答复: flushing + compactions after config change

2013-06-28 Thread Anoop John
Viral, Basically when you increase the memstore flush size (well, your aim there is to reduce flushes and make data sit in memory for a longer time) you need to carefully consider 2 things: 1. What is the max heap and what is the % memory you have allocated max for all the memstores in a RS.
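
The knobs involved, as an hbase-site.xml sketch (the values are illustrative only):

<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.4</value> <!-- fraction of RS heap usable by all memstores together -->
</property>
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>268435456</value> <!-- 256M per-region flush threshold -->
</property>
<property>
  <name>hbase.regionserver.maxlogs</name>
  <value>32</value> <!-- more WALs allow more unflushed data, at the cost of MTTR -->
</property>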

Re: Problems while exporting from Hbase to CSV file

2013-06-28 Thread Anoop John
so i can not use default scan() constructor as it will scan whole table in one go which results in OutOfMemory error in client process Not getting what you mean by this. The client calls next() on the Scanner and gets the rows. The setCaching() and setBatch() determine how much of the data (rows,
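
A sketch of the usual pattern (the table variable is assumed):

Scan scan = new Scan();   // still a full-table scan, but streamed in chunks
scan.setCaching(500);     // rows fetched per RPC and buffered at the client
scan.setBatch(100);       // max columns per Result, useful for very wide rows
ResultScanner scanner = table.getScanner(scan);
for (Result r : scanner) {
  // process one row (or one batch of a wide row)
}
scanner.close();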

Re: Help in designing row key

2013-07-03 Thread Anoop John
When you make the RK and convert the int parts into byte[] (use org.apache.hadoop.hbase.util.Bytes#toBytes(int)) it will give 4 bytes for every int.. Be careful about the ordering... When you convert a +ve and a -ve integer into byte[] and you do a lexicographical compare (as done in HBase) you will
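
A sketch of the usual fix (an assumption, not from the mail itself): Bytes.toBytes(int) writes two's complement, so negative ints sort after positive ones lexicographically; flipping the sign bit restores numeric order.

public static byte[] toSortableBytes(int value) {
  // -1 -> 0x7FFFFFFF, 1 -> 0x80000001, so -1 now sorts before 1
  return Bytes.toBytes(value ^ Integer.MIN_VALUE);
}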

Re: Deleting range of rows from Hbase.

2013-07-04 Thread Anoop John
It is not supported from the shell. Not directly from the delete API either.. You can have a look at the BulkDeleteEndpoint which can do what you want to -Anoop- On Thu, Jul 4, 2013 at 4:09 PM, yonghu yongyong...@gmail.com wrote: I check the latest api of Delete class. I am afraid you have to do it by

Re: Deleting range of rows from Hbase.

2013-07-04 Thread Anoop John
yongyong...@gmail.com wrote: Hi Anoop one more question. Can I use BulkDeleteEndpoint at the client side or should I use it like coprocessor which deployed in the server side? Thanks! Yong On Thu, Jul 4, 2013 at 12:50 PM, Anoop John anoop.hb...@gmail.com wrote: It is not supported

Re: question about clienttrace logs in hdfs and shortcircuit read

2013-07-05 Thread Anoop John
checksumFailures will get updated when the HBase-handled checksum feature is in use and the checksum check done at the RS side failed.. If that happens we will try to read from the DN with the DN checksum check enabled. Agree that right now the HBase-handled checksum will work only with SCR. But it might work

Re: question about clienttrace logs in hdfs and shortcircuit read

2013-07-05 Thread Anoop John
Viral DFS client uses org.apache.hadoop.hdfs.BlockReaderLocal for SCR.. I can see some debug level logs in this LOG.debug("New BlockReaderLocal for file " + blkfile + " of size " + blkfile.length() + " startOffset " + startOffset + " length " + length + " short circuit checksum " +

Re: question about clienttrace logs in hdfs and shortcircuit read

2013-07-05 Thread Anoop John
for HDFS_READ and HDFS_WRITE ops. I wanted to see if it's a valid assumption that SCR is working if I don't see any clienttrace logs for the RS that is hosted on the same box as the DN. Hopefully I clarified it. On Fri, Jul 5, 2013 at 12:55 AM, Anoop John anoop.hb...@gmail.com wrote: Agree

Re: Bulk loading HFiles via LoadIncrementalHFiles fails at a region that is being compacted, a bug?

2013-07-10 Thread Anoop John
Hello Stan, Is your bulk load trying to load data into multiple column families? -Anoop- On Wed, Jul 10, 2013 at 11:13 AM, Stack st...@duboce.net wrote: File a bug Stan please. Paste your log snippet and surrounding what is going on at the time. It looks broke that a bulk load

Re: can coprocessor be used to do distinct operation?

2013-07-11 Thread Anoop John
Can you be a little more specific? CPs work on a per-region basis. So it can be utilized for a distinct op within one region.. If you want to do it overall at the table level, then some work at the client side will also be needed.. Have a look at Phoenix. http://forcedotcom.github.io/phoenix/functions.html On Thu,

Re: Region server going down when deleting a table from HBase

2013-07-11 Thread Anoop John
What is the full trace for the DroppedSnapshotException? The caused-by trace? -Anoop- On Thu, Jul 11, 2013 at 6:38 PM, Sandeep L sandeepvre...@outlook.comwrote: Hi, We are using hbase-0.94.1 with hadoop-1.0.2. Recently couple of time we faced a strange issue while deleting a table. Whenever

Re: Region server going down when deleting a table from HBase

2013-07-11 Thread Anoop John
After the flush, while opening the HFile reader, you are getting FileNotFoundException! LOG.warn("Unable to rename " + path + " to " + dstPath); Do you see the above warn in the log? In the code I can see that even if this rename fails we try to open the file for read. Also as part of the region close, if we do

Re: Region server going down when deleting a table from HBase

2013-07-11 Thread Anoop John
Ya, as Ted said, if this warn is not there, we need to see what happened to that file (just created) -Anoop- On Thu, Jul 11, 2013 at 8:07 PM, Anoop John anoop.hb...@gmail.com wrote: After the flush, while opening the HFile reader, you are getting FileNotFoundException! LOG.warn("Unable to rename

Re: the scan will be executed parallel if not use coprocessor?

2013-07-15 Thread Anoop John
Yes, it may be good to visit HBASE-1935.. Whether or not CP Observers (pre/post hooks) are used, the scanning is sequential from the HBase client side. Phoenix has its own client-side code to make multiple parallel scan requests to the servers (splitting the scan range). We have Endpoints.

Re: [ANNOUNCE] Secondary Index in HBase - from Huawei

2013-08-13 Thread Anoop John
Good to see this Rajesh. Thanks a lot to Huawei HBase team! -Anoop- On Tue, Aug 13, 2013 at 11:49 AM, rajeshbabu chintaguntla rajeshbabu.chintagun...@huawei.com wrote: Hi, We have been working on implementing secondary index in HBase, and had shared an overview of our design in the 2012

Re: Get on a row with multiple columns

2013-08-13 Thread Anoop John
You mean BulkDeleteProtocol? Can you paste the trace of the exception that you are getting? Nearby logs.. -Anoop- On Wed, Aug 14, 2013 at 9:22 AM, Mrudula Madiraju mrudulamadir...@yahoo.com wrote: Hi Varun, I tried BulkDeletePoint and it is giving me an UnsupportedProtocolException - no handler

Re: Table and Family

2013-08-13 Thread Anoop John
Hi Try using SingleColumnValueFilter # setFilterIfMissing() The default value of this is false. Set it to true to filter out rows where cf:q is missing -Anoop- On Mon, Aug 12, 2013 at 9:52 PM, Bing Li lbl...@gmail.com wrote: Hi, all, My understandings about HBase table and its family are as
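
A sketch (CF/qualifier/value are made up):

SingleColumnValueFilter filter = new SingleColumnValueFilter(
    Bytes.toBytes("cf"), Bytes.toBytes("q"),
    CompareFilter.CompareOp.EQUAL, Bytes.toBytes("some-value"));
filter.setFilterIfMissing(true); // rows lacking cf:q are filtered out
Scan scan = new Scan();
scan.setFilter(filter);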

Re: Get on a row with multiple columns

2013-08-14 Thread Anoop John
/coprocessor_introduction*.. It works now. Regards, Mrudula *Reduce,Recycle,Reuse!!!* -- *From:* Anoop John anoop.hb...@gmail.com *To:* user@hbase.apache.org; Mrudula Madiraju mrudulamadir...@yahoo.com *Sent:* Wednesday, 14 August 2013 10:33 AM *Subject:* Re: Get

Re: passing a parameter to an observer coprocessor

2013-08-22 Thread Anoop John
This will require you to pass the attribute with every Mutation. If you don't want that level of dynamism, then as Andy said you can impl an Observer and Endpoint and some singleton object which both can share.. -Anoop- On Fri, Aug 23, 2013 at 12:18 AM, Anoop John anoop.hb...@gmail.com wrote: Can

Re: best approach for write and immediate read use case

2013-08-23 Thread Anoop John
What would be the behavior for inserting data using map reduce job? would the recently added records be in the memstore? or I need to load them for read queries after the insert is done? Using MR you have 2 options for insertion. One will create the HFiles directly as o/p (Using

Re: Programming practices for implementing composite row keys

2013-09-05 Thread Anoop John
Hi Have a look at Phoenix[1]. There you can define a composite RK model and it handles the -ve number ordering. Also the scan model you mentioned will be well supported with start/stop RK on entity1 and using SkipScanFilter for the others. -Anoop- [1] https://github.com/forcedotcom/phoenix

Re: java.lang.NegativeArraySizeException: -1 in hbase

2013-09-09 Thread Anoop John
That sounds correct. Can we mention it somewhere in our doc? Will that be good? -Anoop- On Mon, Sep 9, 2013 at 11:24 PM, lars hofhansl la...@apache.org wrote: The 0.94.5 change (presumably HBASE-3996) is only forward compatible. M/R is a bit special in that the jars are shipped with the job.

Re: Please welcome our newest committer, Nick Dimiduk

2013-09-11 Thread Anoop John
Congratulations Nick... Welcome... -Anoop- On Wed, Sep 11, 2013 at 10:23 AM, Marcos Luis Ortiz Valmaseda marcosluis2...@gmail.com wrote: Congratulations, Nick !!! Keep doing this great work 2013/9/10 ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com Congratulations Nick.!!!

Re: Please welcome our newest committer, Rajeshbabu Chintaguntla

2013-09-11 Thread Anoop John
Congrats Rajesh. All the best! -Anoop- On Wed, Sep 11, 2013 at 10:38 PM, Jimmy Xiang jxi...@cloudera.com wrote: Congrats! On Wed, Sep 11, 2013 at 9:54 AM, Stack st...@duboce.net wrote: Hurray for Rajesh! On Wed, Sep 11, 2013 at 9:17 AM, ramkrishna vasudevan

Re: how to delete quickly?

2013-09-28 Thread Anoop John
So you want to do delete where col=? right? Have a look at the BulkDelete Endpoint -Anoop- On Sat, Sep 28, 2013 at 8:28 AM, Azuryy Yu azury...@gmail.com wrote: Hi dear, I want to delete some rows with the specified column value, how to do it more quickly? Thanks.
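
A sketch of driving it from the client, going by the 0.94 example package (names and signatures from memory — verify against your version; cf/qual/value are made up and the endpoint must be deployed on the table):

final Scan scan = new Scan();
scan.setFilter(new SingleColumnValueFilter(cf, qual,
    CompareFilter.CompareOp.EQUAL, value)); // selects the rows to delete
Map<byte[], BulkDeleteResponse> perRegion = table.coprocessorExec(
    BulkDeleteProtocol.class, scan.getStartRow(), scan.getStopRow(),
    new Batch.Call<BulkDeleteProtocol, BulkDeleteResponse>() {
      public BulkDeleteResponse call(BulkDeleteProtocol instance) throws IOException {
        // DeleteType.ROW removes the whole matching row, 500 rows per internal batch
        return instance.delete(scan, BulkDeleteProtocol.DeleteType.ROW, null, 500);
      }
    });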

Re: HBase Client Performance Bottleneck in a Single Virtual Machine

2013-11-03 Thread Anoop John
If you have used con.getTable(), the close on the HTable won't close the underlying connection -Anoop- On Mon, Nov 4, 2013 at 11:21 AM, michael.grund...@high5games.com wrote: Our current usage is how I would do this in a typical database app with table acting like a statement. It looks like this:

Re: HBase Client Performance Bottleneck in a Single Virtual Machine

2013-11-03 Thread Anoop John
He uses HConnection.getTable() which in turn uses the HTable constructor HTable(final byte[] tableName, final HConnection connection, final ExecutorService pool) So no worry. On HTable#close() the connection won't get closed :) -Anoop- On Mon, Nov 4, 2013 at 11:29 AM, Sriram
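
The pattern in question, sketched (table name is made up; HConnection#getTable needs a reasonably recent 0.94 release):

HConnection connection = HConnectionManager.createConnection(conf);
HTableInterface table = connection.getTable("mytable");
try {
  // ... gets / puts ...
} finally {
  table.close(); // releases the table; the shared connection stays open
}
// only on application shutdown:
connection.close();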

Re: setCaching and setBatch

2013-11-04 Thread Anoop John
Have you tested this? AFAIK it happens the way you'd like. -Anoop- On Mon, Nov 4, 2013 at 10:49 PM, John johnnyenglish...@gmail.com wrote: Hi, I have a question about the setCaching function. As I know, the caching value influences how many rows will be cached in the memory of the region

Re: Region server block cache and memstore size

2013-11-28 Thread Anoop John
So you use bulk load with HFileOutputFormat for writing data? Then you can reduce hbase.regionserver.global.memstore.upperLimit and hbase.regionserver.global.memstore.lowerLimit and give more heap % to the block cache. Not getting why you try to reduce that also. -Anoop- On Thu, Nov 28, 2013
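
As an hbase-site.xml sketch (values illustrative; note 0.94 warns when the memstore upper limit plus the block cache size exceeds 0.8 of the heap):

<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.25</value>
</property>
<property>
  <name>hfile.block.cache.size</name>
  <value>0.5</value>
</property>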

Re: Thrift Error in HBase

2013-12-17 Thread Anoop John
As per the line no, it comes at byte[][] famAndQf = KeyValue.parseColumn(getBytes(m.column)); The column inside the Mutation comes as null... Can you check the client code -Anoop- On Tue, Dec 17, 2013 at 2:59 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: Due to some reason the

Re: Did major compaction really happen?

2013-12-19 Thread Anoop John
The minor compaction is selecting all files. So there is no harm in it being promoted to a major compaction (by the system).. Are you finding some issues with that, or did just the logs make you worry? -Anoop- On Fri, Dec 20, 2013 at 11:55 AM, ramkrishna vasudevan

Re: secondary index feature

2013-12-22 Thread Anoop John
HIndex is a local indexing mechanism. It is per-region indexing. Phoenix does not yet have a local indexing mechanism (global indexing is in place). The Phoenix team does have a roadmap for that. Have you done something like Lily, Henning? -Anoop- On Sun, Dec 22, 2013 at 9:39 PM, Ted Yu

Re: Filter thread safe?

2013-12-22 Thread Anoop John
Filter methods are called within thread-safe code already. So no need to take extra steps to make them so. Make sure you do a proper reset in the reset() method -Anoop- On Mon, Dec 23, 2013 at 11:44 AM, Asaf Mesika asaf.mes...@gmail.com wrote: Hi, Are Filter implementations need to be

Re: secondary index feature

2014-01-03 Thread Anoop John
Is there any data on how RLI (or in particular Phoenix) query throughput correlates with the number of region servers assuming homogeneously distributed data? Phoenix is yet to add RLI. Now it has global indexing only. Correct, James? The RLI impl from Huawei (HIndex) has some numbers wrt

Re: secondary index feature

2014-01-03 Thread Anoop John
, Rajeshbabu From: Anoop John [anoop.hb...@gmail.com] Sent: Friday, January 03, 2014 3:22 PM To: user@hbase.apache.org Subject: Re: secondary index feature Is there any data on how RLI (or in particular Phoenix) query throughput correlates with the number

Re: A question about Scan

2014-03-21 Thread Anoop John
Scan s = new Scan(); s.addColumn(cf1, cq1) This will return you rows, but each row will contain only this column's value (cf1:cq1). I guess you want the entire row (all columns) A query like select * from table where c1 != null Correct, Vimal? You will need a Filter then. -Anoop- On Thu, Mar 20,
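
One way to express the "c1 != null" query (a sketch; it relies on SingleColumnValueFilter's setFilterIfMissing, comparing against an empty value so that only the column's presence matters — verify the comparison semantics for your data):

SingleColumnValueFilter f = new SingleColumnValueFilter(
    cf1, cq1, CompareFilter.CompareOp.NOT_EQUAL, new byte[0]);
f.setFilterIfMissing(true); // rows without cf1:cq1 are dropped
Scan s = new Scan();
s.setFilter(f);             // whole rows come back, unlike addColumn()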
