Is this map creation happening on the client side? And how does it know which region server will hold that row key for a put operation without asking the .META. table? Does the HBase client first fetch the key ranges of each region server and then group the Put objects by region server?
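To make the question above concrete, here is a minimal, self-contained sketch of the idea being asked about: the client caches region boundaries (learned from the meta table) and routes each row key locally by finding the region whose start key is the greatest one not exceeding the row. This is illustrative only, not real HBase client code; the class name, method names, and server names are all made up.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of client-side row-key routing. After a one-time
// meta lookup, the client can resolve keys from its local cache.
public class RegionLocatorSketch {
    // Each region's start key -> the server hosting that region.
    private final TreeMap<String, String> cache = new TreeMap<>();

    public void cacheRegion(String startKey, String server) {
        cache.put(startKey, server);
    }

    // A row belongs to the region whose start key is the greatest
    // key that is <= the row key (TreeMap.floorEntry).
    public String locate(String rowKey) {
        Map.Entry<String, String> e = cache.floorEntry(rowKey);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        RegionLocatorSketch loc = new RegionLocatorSketch();
        loc.cacheRegion("", "rs1");   // region ["", "8")
        loc.cacheRegion("8", "rs2");  // region ["8", end)
        System.out.println(loc.locate("3f2a")); // rs1
        System.out.println(loc.locate("9bcd")); // rs2
    }
}
```

In the real client the cached locations can go stale (e.g. after a region split or move), in which case a put fails, the cache entry is invalidated, and meta is consulted again.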
On Fri, Jul 17, 2015 at 7:48 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Internally AsyncProcess uses a Map which is keyed by server name:
>
>     Map<ServerName, MultiAction<Row>> actionsByServer =
>         new HashMap<ServerName, MultiAction<Row>>();
>
> Here MultiAction would group the Puts in your example which are destined
> for the same server.
>
> Cheers
>
> On Fri, Jul 17, 2015 at 5:15 AM, Shushant Arora <shushantaror...@gmail.com> wrote:
>
>> Thanks!
>>
>> My key is random (hexadecimal), so a hot spot should not be created.
>>
>> Is there any concept of a bulk put? Say I want to raise one put request
>> for a batch of 1000, which will hit a region server once instead of
>> issuing an individual put for each key.
>>
>> Does HTable.put(List<Put>) handle batching of puts based on the region
>> server they will finally land on? Say in my batch there are 10 puts:
>> 5 for RS1, 3 for RS2 and 2 for RS3. Does it handle that?
>>
>> On Thu, Jul 16, 2015 at 8:31 PM, Michael Segel <michael_se...@hotmail.com> wrote:
>>
>>> You ask an interesting question…
>>>
>>> Let's set aside Spark and look at the overall ingestion pattern.
>>>
>>> It's really an ingestion pattern where your input into the system is
>>> from a queue.
>>>
>>> Are the events discrete or continuous? (This is kind of important.)
>>>
>>> If the events are continuous, then more than likely you're going to be
>>> ingesting data where the key is somewhat sequential. If you use put(),
>>> you end up with hot spotting, and you'll end up with regions half full.
>>> So you would be better off batching up the data and doing bulk imports.
>>>
>>> If the events are discrete, then you'll want to use put(), because the
>>> odds are you will not be using a sequential key. (You could, but I'd
>>> suggest that you rethink your primary key.)
>>>
>>> Depending on the rate of ingestion, you may want to do a manual flush.
>>> (It depends on the velocity of the data to be ingested and your use case.)
>>> (Remember what caching occurs, and where, when dealing with HBase.)
>>>
>>> A third option… Depending on how you use the data, you may want to
>>> avoid storing the data in HBase, and only use HBase as an index to
>>> where you store the data files for quick access. Again, it depends on
>>> your data ingestion flow and how you intend to use the data.
>>>
>>> So really this is less a Spark issue than an HBase issue when it comes
>>> to design.
>>>
>>> HTH
>>>
>>> -Mike
>>>
>>> > On Jul 15, 2015, at 11:46 AM, Shushant Arora <shushantaror...@gmail.com> wrote:
>>> >
>>> > Hi
>>> >
>>> > I have a requirement of writing to an HBase table from a Spark
>>> > streaming app after some processing.
>>> > Is the HBase put operation the only way of writing to HBase, or is
>>> > there any specialised connector or RDD of Spark for HBase writes?
>>> >
>>> > Should bulk load to HBase from a streaming app be avoided if the
>>> > output of each batch interval is just a few MBs?
>>> >
>>> > Thanks
>>>
>>> The opinions expressed here are mine; while they may reflect a cognitive
>>> thought, that is purely accidental.
>>> Use at your own risk.
>>> Michael Segel
>>> michael_segel (AT) hotmail.com