I tried to build an MR job, but in my case that doesn't work, because if I set the batch size to, for example, 1000 and a row has 5000 columns, the map function is called once per batch chunk. I want to emit something for rows whose column count is bigger than 2500, BUT since the map function is executed for every batch chunk, I can't tell inside a single call whether the whole row has more than 2500 columns.
any ideas?

2013/10/25 lars hofhansl <[email protected]>

> We need to finish up HBASE-8369
>
> ________________________________
> From: Dhaval Shah <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Thursday, October 24, 2013 4:38 PM
> Subject: Re: RE: Add Columnsize Filter for Scan Operation
>
> Well that depends on your use case ;)
>
> There are many nuances/code complexities to keep in mind:
> - merging results of various HFiles (each region can have more than one)
> - merging results of the WAL
> - applying delete markers
> - how about data which is only in memory of region servers and nowhere else
> - applying bloom filters for efficiency
> - what about HBase filters?
>
> At some point you would basically start rewriting an HBase region server
> in your MapReduce job, which is not ideal for maintainability.
>
> Do we ever read MySQL data files directly, or do we issue a SQL query? Kind of
> goes back to the same argument ;)
>
> Sent from Yahoo Mail on Android
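Until something like HBASE-8369 lands, one workaround is to treat the batch chunks as partial counts and sum them per row key before applying the threshold, i.e. have the map phase emit (rowKey, columnsInChunk) and do the threshold check in the reduce phase. A minimal sketch of that counting logic in plain Java (the class and method names are illustrative, not HBase API; in a real job the chunks would come from a batched Scan):

```java
import java.util.*;

// Sketch: a batched Scan splits one row into several chunks, so the
// column count must be summed per row key before the threshold check.
// This mirrors a map phase emitting (rowKey, chunkSize) and a reduce
// phase summing the counts; names are illustrative, not HBase API.
public class RowColumnCount {

    // Sum the per-chunk column counts by row key (the "reduce" step).
    static Map<String, Integer> countColumns(List<Map.Entry<String, Integer>> chunks) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map.Entry<String, Integer> chunk : chunks) {
            totals.merge(chunk.getKey(), chunk.getValue(), Integer::sum);
        }
        return totals;
    }

    // Keep only rows whose total column count exceeds the threshold.
    static Set<String> rowsAbove(Map<String, Integer> totals, int threshold) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            if (e.getValue() > threshold) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // row-1 has 5000 columns delivered in five batches of 1000;
        // row-2 has 2000 columns delivered in two batches.
        List<Map.Entry<String, Integer>> chunks = List.of(
            Map.entry("row-1", 1000), Map.entry("row-1", 1000),
            Map.entry("row-1", 1000), Map.entry("row-1", 1000),
            Map.entry("row-1", 1000),
            Map.entry("row-2", 1000), Map.entry("row-2", 1000));

        // Only row-1 exceeds the 2500-column threshold.
        System.out.println(rowsAbove(countColumns(chunks), 2500));
    }
}
```

This sidesteps the per-chunk visibility problem at the cost of shuffling one count per chunk, while still letting the server side handle the nuances Dhaval lists (HFile merging, delete markers, memstore contents).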
