Re: Region balancing query

2015-02-13 Thread Shahab Yunus
Thanks, we will try that and report back. Regards, Shahab On Fri, Feb 13, 2015 at 4:56 PM, Ted Yu wrote: > You can make TableSkewCostFunction more prominent by increasing the value > for config parameter: > > hbase.master.balancer.stochastic.tableSkewCost > > Its default is 35. > > See if raisi

Re: Region balancing query

2015-02-13 Thread Ted Yu
You can make TableSkewCostFunction more prominent by increasing the value of the config parameter hbase.master.balancer.stochastic.tableSkewCost. Its default is 35. See if raising it to 100 or 200 helps. On Fri, Feb 13, 2015 at 1:09 PM, Shahab Yunus wrote: > Yes, this server hosts other regions from
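
For illustration, the setting described in this message could be placed in hbase-site.xml on the master roughly as follows. This is only a sketch: the value 100 is the lower of the two values suggested in the message, not a tested recommendation, and it assumes the stochastic load balancer is the balancer in use. The master would typically need a restart to pick up the change.

```xml
<!-- hbase-site.xml (master): give table skew more weight in the stochastic balancer. -->
<!-- Default is 35 per the message above; 100 is an illustrative value to try first. -->
<property>
  <name>hbase.master.balancer.stochastic.tableSkewCost</name>
  <value>100</value>
</property>
```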

HMaster does not start when upgrading HBase 0.94 to 0.98 (needed by new version of Drill)

2015-02-13 Thread Alexander Zarei
Hi, I was wondering if you could help me solve an issue I am facing in setting up a new HBase 0.98. We had HBase 0.94 running on our Drillbit machine. I stopped 0.94 and installed a fresh 0.98 from the Apache website. I configured hbase-site.xml based on the Apache getting started page

Re: Streaming data to htable

2015-02-13 Thread Nicolas Liochon
You should first try the 'autoflush' boolean on the HTable: set it to false. It buffers the writes for you and performs them asynchronously, so all the multithreading / buffering work is done for you. If you need a synchronisation point (to free the resources on the sending side), you can ca

Re: Region balancing query

2015-02-13 Thread Shahab Yunus
Yes, this server hosts other regions from other tables as well. Regards Shahab On Fri, Feb 13, 2015 at 1:45 PM, Ted Yu wrote: > Interesting, server7.ec3.internal,60020,1423845018628 was consistently > chosen as destination for the table. > Did server7.ec3.internal,60020,1423845018628 host region

Re: Streaming data to htable

2015-02-13 Thread Geovanie Marquez
We use Spark to convert large batches of data directly into HFiles. We've found it to be extremely performant, but we do not batch since our use case is not streaming. We bring it in about 50GB at a time so we would not suffer from the small files issue mentioned, but we do manually manage our regi

Re: Streaming data to htable

2015-02-13 Thread Alok Singh
Have you considered placing something like a Kafka queue between the data stream and the HBase consumer/writer? I have used Kafka in the past to consume a very high volume of event data and write it to HBase. Problems we ran into when writing large amounts of data continuously to HBase are stalls/timeo

Re: Region balancing query

2015-02-13 Thread Ted Yu
Interesting, server7.ec3.internal,60020,1423845018628 was consistently chosen as the destination for the table. Did server7.ec3.internal,60020,1423845018628 host regions from other tables? Cheers On Fri, Feb 13, 2015 at 10:27 AM, Shahab Yunus wrote: > Table name is: > MYTABLE_RECENT_4W_V2 > > Paste

Re: Region balancing query

2015-02-13 Thread Shahab Yunus
Table name is: MYTABLE_RECENT_4W_V2 Pastebin snippet 1: http://pastebin.com/dQzMhGyP Pastebin snippet 2: http://pastebin.com/Y7ZsNAgF This is the master log after invoking balancer command from hbase shell. Regards, Shahab On Fri, Feb 13, 2015 at 12:00 PM, Ted Yu wrote: > bq. all the regions

Re: Streaming data to htable

2015-02-13 Thread Andrey Stepachev
Hi Jaime. That's a bit of magic, using HFiles directly without considering keys and data layout (as Nick mentioned, you will face the task of manually splitting keys, so you will effectively redo what HBase already does well). The original answer was for a concrete use case: it is known where ke

Re: Streaming data to htable

2015-02-13 Thread Nick Dimiduk
Writing HFiles can become cumbersome if the data is spread evenly across regions -- you'll end up with lots of small files rather than a few big ones. You can batch writes through the client API. I would recommend you start with HTableInterface#put(List). You can tune the client-side buffer (#setW

Re: Region balancing query

2015-02-13 Thread Ted Yu
bq. all the regions of this table were back on this same RS! Interesting. Please check master log around the time this RS was brought online. You can pastebin the relevant snippet. Thanks On Fri, Feb 13, 2015 at 8:55 AM, Shahab Yunus wrote: > Hi Ted. > > Yes, the cluster itself is balanced. On

Re: Region balancing query

2015-02-13 Thread Shahab Yunus
Hi Ted. Yes, the cluster itself is balanced. On average 300 regions per node on 10 nodes. # of tables is 53 of varying sizes. Balancer was invoked and it didn't do anything (i.e. no movement of regions) but we didn't check the master's logs. We can do that. Interestingly, we restarted the RS wh

Re: Region balancing query

2015-02-13 Thread Ted Yu
How many tables are there in your cluster? Is the cluster balanced overall (in terms of number of regions per server) but this table is not? What happens (check master log) when you issue the 'balancer' command through the shell? Cheers On Fri, Feb 13, 2015 at 8:19 AM, Shahab Yunus wrote: > CDH 5.

Re: Streaming data to htable

2015-02-13 Thread Jaime Solano
Hi Andrey, We're facing a similar situation, where we plan to load a lot of data into HBase directly. We considered writing the HFiles without MapReduce. Is this something you've done in the past? Is there any sample code we could use as a guide? On another note, what would you consider "big enoug

Region balancing query

2015-02-13 Thread Shahab Yunus
CDH 5.3 HBase 0.98.6 We are writing data to an HBase table through an M/R job. We pre-split the table before each job run. The problem is that most of the regions end up on the same RS. This results in that one RS being severely overloaded and subsequent M/R jobs failing trying to write to the region

Re: Streaming data to htable

2015-02-13 Thread Andrey Stepachev
Hi hongbin, It seems that depends on how much data you ingest. If it is big enough, I'd look at creating HFiles directly without MapReduce (for example using HFileOutputFormat without MapReduce or using HFileWriter directly). The created files can be imported by LoadIncrementalHFiles#doBulkLoad direct

Re: Multiple Filterlists

2015-02-13 Thread Ted Yu
Pragalbh: You can refer to testTransformMPO in TestFilterList, which shows how a hierarchical filter is constructed. Cheers On Fri, Feb 13, 2015 at 5:34 AM, Harsh J wrote: > You can build a single FilterList consisting of multiple FilterLists, > if that is what you're looking for. > > On Fri, Feb

hbase as logging dump => design for mapred

2015-02-13 Thread Wilm Schumacher
Hi, I have a design question and I'm kind of stuck. I can't find an easy solution, but I think there is one. The problem: suppose you have an application where users can "open" an object. Then they can perform an operation on that object, or go on to another object. And now I want to make

Re: Streaming data to htable

2015-02-13 Thread Wilm Schumacher
On 13.02.2015 at 10:39, Sleiman Jneidi wrote: > I would go with second option, HtableInterface.put(List). The first > option sounds dodgy, where 5 minutes is a good time for things to go wrong > and you lose your data I agree with Sleiman. In my opinion the "multi put" option is the best plan. T

Re: Fwd: data base design question

2015-02-13 Thread Wilm Schumacher
Hi, On 13.02.2015 at 04:08, Jignesh Patel wrote: > How about Option 1: Create an embedded entity of results and store it as > list object inside order table as one of the column field. The problem is that an HBase cell value must be a byte array. Thus you have to convert the "list object" to a by

Re: Multiple Filterlists

2015-02-13 Thread Harsh J
You can build a single FilterList consisting of multiple FilterLists, if that is what you're looking for. On Fri, Feb 13, 2015 at 5:03 PM, Pragalbh Garg wrote: > Is it possible to have multiple FilterLists while performing a scan in HBase > ? If yes, how ?

Re: Re: managing HConnection

2015-02-13 Thread Serega Sheypak
What's the problem with calling HConnectionManager.getConnection in the Servlet.init method and passing it to your class responsible for HBase interaction? 2015-02-13 14:49 GMT+03:00 Sleiman Jneidi: > a single HConnection > > On Fri, Feb 13, 2015 at 11:12 AM, Serega Sheypak > > wrote: > > > What are you t

Re: Re: managing HConnection

2015-02-13 Thread Sleiman Jneidi
a single HConnection On Fri, Feb 13, 2015 at 11:12 AM, Serega Sheypak wrote: > What are you trying to achieve? > > 2015-02-13 12:36 GMT+03:00 Sleiman Jneidi: > > > To be honest guys I am still confused, especially since HConnection > > implements Closeable and hence everyone has the right

Multiple Filterlists

2015-02-13 Thread Pragalbh Garg
Is it possible to have multiple FilterLists while performing a scan in HBase ? If yes, how ?

Re: Re: managing HConnection

2015-02-13 Thread Serega Sheypak
What are you trying to achieve? 2015-02-13 12:36 GMT+03:00 Sleiman Jneidi: > To be honest guys I am still confused, especially since HConnection > implements Closeable and hence everyone has the right to close the > connection. I wrote this code to manage connections but I am not sure about

Re: Streaming data to htable

2015-02-13 Thread Sleiman Jneidi
I would go with the second option, HTableInterface.put(List). The first option sounds dodgy: 5 minutes is plenty of time for things to go wrong and for you to lose your data. On Fri, Feb 13, 2015 at 6:20 AM, hongbin ma wrote: > hi, > > I'm trying to use a htable to store data that comes in a streaming fa

Re: Re: managing HConnection

2015-02-13 Thread Sleiman Jneidi
To be honest guys, I am still confused, especially since HConnection implements Closeable and hence everyone has the right to close the connection. I wrote this code to manage connections but I am not sure about its correctness. private static class HConnectionProvider { private static HCo

Re: Re: managing HConnection

2015-02-13 Thread Serega Sheypak
Hi, really, I can share one HConnection for the whole application. It's done by design. I have several servlets. Each servlet has 1-2 controllers working with HBase internally (put/get/etc.). Right now I don't see any reason to refactor the code and share a single HConnection for all controllers in servl