You will need to batch your puts or use bulk loading: http://hbase.apache.org/book.html#arch.bulk.load
JMS

2015-12-17 0:35 GMT-05:00 Rajeshkumar J:

> Hi,
>
> I have an input file of 1 GB in HDFS. I am inserting it into an HBase
> table and also generating the column qualifier names dynamically. I am
> doing this in Java, and it took about 20 hours to finish. Can anyone
> help me optimize the Java code below?
>
> import java.io.BufferedReader;
> import java.io.IOException;
> import java.io.InputStreamReader;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.KeyValue;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.client.Put;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.client.ResultScanner;
> import org.apache.hadoop.hbase.client.Scan;
> import org.apache.hadoop.hbase.filter.BinaryComparator;
> import org.apache.hadoop.hbase.filter.CompareFilter;
> import org.apache.hadoop.hbase.filter.RowFilter;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class HbaseinsertionScan {
>
>     // As I am generating column names dynamically and also need them in a
>     // certain order, I decided to make each column name a combination of a
>     // numerical value and one of the letters below.
>     public static String[] ColumnNames = {"a", "aa", "b", "bb", "c", "cc",
>         "d", "dd", "e", "ee", "f", "ff", "g", "gg", "h", "hh", "i", "j", "k", "l",
>         "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"};
>
>     public static void main(String[] args) throws IOException {
>
>         // Initializing required classes
>         BufferedReader br = null;
>         Configuration conf = new Configuration();
>         Configuration config = HBaseConfiguration.create();
>         Scan scan = new Scan();
>
>         // table1 holds the actual data
>         HTable table1 = new HTable(config, "table1");
>         // table2 holds a numerical value for each row key
>         HTable table2 = new HTable(config, "table2");
>
>         // Reading the files from HDFS and inserting them into the HBase tables
>         FileSystem fs = FileSystem.get(conf);
>         FileStatus[] status = fs.listStatus(new Path("/filename"));
>         for (int i = 0; i < status.length; i++) {
>             br = new BufferedReader(new InputStreamReader(fs.open(status[i].getPath())));
>             String line = "";
>             while ((line = br.readLine()) != null) {
>                 String[] colvalues = line.split(",");
>
>                 // Generating column names dynamically
>                 String colno = "199999";
>                 RowFilter filter = new RowFilter(CompareFilter.CompareOp.EQUAL,
>                     new BinaryComparator(Bytes.toBytes(colvalues[1])));
>                 scan.setFilter(filter);
>                 ResultScanner scanner = table2.getScanner(scan);
>                 for (Result result : scanner) {
>                     for (KeyValue kv : result.raw()) {
>                         colno = Bytes.toString(kv.getValue());
>                         long newval = Long.valueOf(colno) - 1;
>                         System.out.println(newval);
>                         colno = String.valueOf(newval);
>                     }
>                 }
>                 scanner.close(); // release the server-side scanner
>
>                 for (int index = 0; index < ColumnNames.length; index++) {
>                     // Adding row key
>                     Put p = new Put(Bytes.toBytes(colvalues[1]));
>                     // Adding column family name, qualifier name, value
>                     p.add(Bytes.toBytes("test_family"),
>                         Bytes.toBytes(colno + ColumnNames[index]),
>                         Bytes.toBytes(colvalues[index]));
>                     table1.put(p);
>
>                     Put col = new Put(Bytes.toBytes(colvalues[1]));
>                     // Adding column family name, qualifier name, value
>                     col.add(Bytes.toBytes("test_family"),
>                         Bytes.toBytes("randomno"),
>                         Bytes.toBytes(colno));
>                     table2.put(col);
>                     //System.out.println("row inserted");
>                 }
>             }
>             br.close(); // close each file once it has been read
>         }
>         table1.close();
>         table2.close();
>     }
> }
>
> On Mon, Dec 14, 2015 at 9:35 PM, Jean-Marc Spaggiari wrote:
>
> > Hi Rajesh,
> >
> > For the column qualifiers, there is no need to "create" them in advance.
> > Just set whatever you want when you build your Put and HBase will take
> > it...
> >
> > JMS
> >
> > 2015-12-14 6:05 GMT-05:00 Rajeshkumar J:
> >
> > > Hi
> > >
> > > Thanks. This is what I need, and I am considering this as the flat-wide
> > > table approach.
> > >
> > > I have some doubts, and the first of them is how to create dynamic column
> > > qualifiers. Do you know the command, or any other sites which are useful
> > > for this approach?
> > >
> > > Thanks
> > >
> > > On Mon, Dec 14, 2015 at 4:28 PM, Jean-Marc Spaggiari wrote:
> > >
> > > > That is correct, as long as the column qualifier is different. But they
> > > > will still go to the same region and, after compactions, will end up in
> > > > the same file.
> > > >
> > > > JMS
> > > >
> > > > On 2015-12-14 6:55 AM, "Rajeshkumar J" wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > So, as per your reply, inserting the second row will not update the
> > > > > existing row key, and it will be added as new column qualifiers to the
> > > > > existing row key.
> > > > >
> > > > > Thanks
> > > > >
> > > > > On Mon, Dec 14, 2015 at 4:13 PM, Jean-Marc Spaggiari wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > When you insert the 2nd row, HBase will just add it after the first
> > > > > > one. On the storage side it will be another key/value entry AFTER the
> > > > > > first one. In the conceptual view, it will be seen as another column
> > > > > > for the same row (wide approach). HBase will not update the previous
> > > > > > existing entry. It will create a new one for the new key/value. The
> > > > > > 1002-xxx | url.com that you inserted before will not be touched.
> > > > > >
> > > > > > You have to see all those key/values as totally independent. If they
> > > > > > have a different column name, what you do with one will not have any
> > > > > > impact on the others.
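The behaviour described above, where a second input line with the same row key becomes one more cell of that row rather than an update of it, can be illustrated with a toy in-memory model. This is a minimal sketch under stated assumptions, not HBase code: WideRowModel and its methods are hypothetical, the row key is taken to be id-name and the qualifier the URL, as in this thread, and unlike HBase it stores no column family or timestamp and keeps a single value per cell.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a flat-wide table, NOT HBase client code.
// Assumptions (taken from this thread): row key = "id-name", column qualifier = url,
// cell value = time. Real HBase keys also carry the column family and a timestamp,
// and older versions may be kept; this model keeps one value per cell.
public class WideRowModel {
    // row key -> (qualifier -> value); both levels stay sorted, like HBase keys
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    // Store one input line "id | name | url | time" as a single cell.
    public void insert(String line) {
        String[] f = line.split("\\|");
        String rowKey = f[0].trim() + "-" + f[1].trim();
        String qualifier = f[2].trim();
        String value = f[3].trim();
        // Same row key + new qualifier: a new cell is added, the row is not replaced.
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(qualifier, value);
    }

    // All cells of one row, sorted by qualifier (roughly what one Get returns).
    public Map<String, String> row(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }

    // Every cell in storage order: by row key first, then by qualifier.
    // A region boundary can only fall between two different row keys.
    public List<String> cellsInStorageOrder() {
        List<String> out = new ArrayList<>();
        rows.forEach((rowKey, cells) ->
            cells.forEach((qualifier, value) -> out.add(rowKey + " | " + qualifier + " | " + value)));
        return out;
    }

    public static void main(String[] args) {
        WideRowModel table = new WideRowModel();
        // Input order from the original question, deliberately unsorted.
        table.insert("1002 | xxx | www.sample.com | xx:xx:xx");
        table.insert("1003 | yyy | www.url,com | xx:xx:yy");
        table.insert("1002 | xxx | url.com | yy:yy:yy");
        table.insert("1002 | xxx | urrl2.com | zz:zz:zz");
        table.cellsInStorageOrder().forEach(System.out::println);
    }
}
```

Running main prints the cells sorted by row key and then by qualifier, regardless of insertion order, which matches the sorted layout shown later in the thread:

    1002-xxx | url.com | yy:yy:yy
    1002-xxx | urrl2.com | zz:zz:zz
    1002-xxx | www.sample.com | xx:xx:xx
    1003-yyy | www.url,com | xx:xx:yy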
> > > > > >
> > > > > > JMS
> > > > > >
> > > > > > 2015-12-14 5:37 GMT-05:00 Rajeshkumar J:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Thanks for your response, but in your previous answer you
> > > > > > > mentioned the following:
> > > > > > >
> > > > > > > --------------------------------------------------------------
> > > > > > > 1002 | xxx | url.com | yy:yy:yy
> > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > > --------------------------------------------------------------
> > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > --------------------------------------------------------------
> > > > > > >
> > > > > > > "This is if 1002-xxx is your key and "url.com" is your column qualifier"
> > > > > > >
> > > > > > > I have input rows as follows:
> > > > > > >
> > > > > > > 1002 | xxx | url.com | yy:yy:yy
> > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > >
> > > > > > > When I insert the first row into my HBase table, 1002-xxx will be
> > > > > > > inserted as the row key and url.com will be one of my column
> > > > > > > qualifiers.
> > > > > > >
> > > > > > > What happens when I try to insert the next row, i.e.
> > > > > > > "1002 | xxx | urrl2.com | zz:zz:zz"? For this one the row key will
> > > > > > > also be 1002-xxx. As far as I know, when we try to insert the same
> > > > > > > row key, the row will be updated.
> > > > > > >
> > > > > > > What should I do in this case?
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > On Mon, Dec 14, 2015 at 3:49 PM, Jean-Marc Spaggiari wrote:
> > > > > > >
> > > > > > > > Hi Rajesh,
> > > > > > > >
> > > > > > > > This is not a tall table. Tall would be something where you put
> > > > > > > > your domain name in the key, not in the column qualifier.
> > > > > > > > Putting the domain in the columns means you will have many, many
> > > > > > > > columns for the same key. In the end, HBase always stores the key
> > > > > > > > for each and every column, whether it is tall or wide.
> > > > > > > >
> > > > > > > > Reading 1000 rows or reading 1000 columns is exactly the same thing
> > > > > > > > for HBase. The only difference is that HBase might split 1000 rows
> > > > > > > > into 2 regions. If you have 1000 columns, HBase will not split them.
> > > > > > > >
> > > > > > > > HBase can return a row in a few milliseconds. 2 seconds for one
> > > > > > > > Cell is a lot...
> > > > > > > >
> > > > > > > > HTH
> > > > > > > >
> > > > > > > > JMS
> > > > > > > >
> > > > > > > > 2015-12-14 5:14 GMT-05:00 Rajeshkumar J:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > Thanks for your response, but you are suggesting a tall and
> > > > > > > > > narrow table, which is not working for me right now. As my use
> > > > > > > > > case involves a real-time solution, I need to retrieve data from
> > > > > > > > > the HBase table within one or two seconds. I have tried what you
> > > > > > > > > suggested, which may lead to 1000 rows for a given id, and the
> > > > > > > > > retrieval takes more than a minute.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Rajeshkumar
> > > > > > > > >
> > > > > > > > > On Mon, Dec 14, 2015 at 3:29 PM, Jean-Marc Spaggiari wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > HBase is a key-value store.
> > > > > > > > > > So what you are pushing here will be stored as:
> > > > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > > > > 1002 | xxx | url.com | yy:yy:yy
> > > > > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > > >
> > > > > > > > > > HOWEVER.... HBase will never split a region within a key, and
> > > > > > > > > > keys are always ordered. So in the end, what you will have
> > > > > > > > > > exactly is:
> > > > > > > > > >
> > > > > > > > > > --------------------------------------------------------------
> > > > > > > > > > 1002 | xxx | url.com | yy:yy:yy
> > > > > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > > > > > --------------------------------------------------------------
> > > > > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > > > > --------------------------------------------------------------
> > > > > > > > > >
> > > > > > > > > > The only places where HBase will split are marked with "-----".
> > > > > > > > > >
> > > > > > > > > > This is if 1002-xxx is your key and "url.com" is your column
> > > > > > > > > > qualifier.
> > > > > > > > > >
> > > > > > > > > > HTH
> > > > > > > > > >
> > > > > > > > > > JMS
> > > > > > > > > >
> > > > > > > > > > 2015-12-14 3:39 GMT-05:00 Rajeshkumar J:
> > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > I am going to use flat-wide tables in HBase for my use case,
> > > > > > > > > > > and I have some doubts regarding this.
> > > > > > > > > > >
> > > > > > > > > > > 1. As per my knowledge, flat-wide stores one column value as
> > > > > > > > > > > the key and the others as its values in a key-value pair
> > > > > > > > > > > relationship (correct me if I am wrong).
> > > > > > > > > > >
> > > > > > > > > > > I have rows as follows:
> > > > > > > > > > >
> > > > > > > > > > > id | name | url | time
> > > > > > > > > > >
> > > > > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > > > > > 1002 | xxx | url.com | yy:yy:yy
> > > > > > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > > > >
> > > > > > > > > > > I need to store them in a flat-wide table as follows:
> > > > > > > > > > >
> > > > > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx | 1002 | xxx | url.com | yy:yy:yy | 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > > > > >
> > > > > > > > > > > How can I store it like this?
> > > > > > > > > > > Can anyone help me with this?
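The first reply in this thread recommends batching the puts, or bulk loading, instead of sending one Put per cell as the quoted HbaseinsertionScan code does. The sketch below models client-side batching in plain Java so that it runs without a cluster. BatchBuffer is a hypothetical class, not part of HBase, and its sink stands in for the real write. In the HBase client the same idea is available by passing a List<Put> to the table's put method, or, from HBase 1.0 on, by writing through a BufferedMutator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical client-side write buffer, NOT an HBase class. It models the idea
// behind table.put(List<Put>) or a BufferedMutator: many mutations per round
// trip instead of one round trip per Put.
public class BatchBuffer<T> implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<T>> sink; // in real code: send the batch to the table
    private final List<T> pending = new ArrayList<>();
    private int flushes = 0;

    public BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Queue one mutation; hand over a full batch as soon as one is ready.
    public void add(T mutation) {
        pending.add(mutation);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Send whatever is queued; does nothing when the buffer is empty.
    public void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // pass a copy, then reuse the buffer
        pending.clear();
        flushes++;
    }

    // Number of batches handed to the sink so far (one "round trip" each).
    public int flushCount() {
        return flushes;
    }

    // Flush the last, partial batch so no mutation is lost.
    @Override
    public void close() {
        flush();
    }

    public static void main(String[] args) {
        int lines = 1_000;
        int cellsPerLine = 34; // ColumnNames.length in the quoted code
        int[] sent = {0};
        BatchBuffer<String> buffer = new BatchBuffer<>(1_000, batch -> sent[0] += batch.size());
        for (int line = 0; line < lines; line++) {
            for (int col = 0; col < cellsPerLine; col++) {
                buffer.add("line" + line + "/col" + col); // stand-in for one Put
            }
        }
        buffer.close();
        System.out.println(sent[0] + " cells sent in " + buffer.flushCount() + " batches");
    }
}
```

With 1,000 input lines of 34 cells each and a batch size of 1,000, main prints "34000 cells sent in 34 batches", i.e. 34 round trips instead of one per cell. The batch size of 1,000 is an arbitrary assumption for the illustration, not a recommended HBase setting.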
