Hi,

Thanks, it works. After storing the data in a flat-wide table, I tried to query it using a RowFilter, and it takes 50 seconds to return the results. Is this the expected behaviour of HBase?
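A possible reason for the slow query, offered as an assumption rather than a diagnosis: a Scan whose only restriction is a RowFilter still reads every row and applies the filter to each one, whereas a lookup by exact row key can seek directly to that row. The sketch below is a toy model in plain Java, not the HBase API. The class ScanVersusGet, its methods, and the TreeMap standing in for a table sorted by row key are all hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only: a TreeMap stands in for a table sorted by row key.
// None of these names come from the HBase client library.
class ScanVersusGet {
    static final TreeMap<String, String> TABLE = new TreeMap<>();

    // Fill the model table with n rows named row000000, row000001, ...
    static void load(int n) {
        TABLE.clear();
        for (int i = 0; i < n; i++) {
            TABLE.put(String.format("row%06d", i), "v" + i);
        }
    }

    // Models a scan restricted only by a row filter: every row is visited
    // and compared, even after the single matching row has been found.
    static int rowsVisitedByFilteredScan(String wantedKey) {
        int visited = 0;
        for (Map.Entry<String, String> row : TABLE.entrySet()) {
            visited++;
            boolean matches = row.getKey().equals(wantedKey);
            // A real scan would return the row when matches is true;
            // here we only count how many rows had to be examined.
        }
        return visited;
    }

    // Models a lookup by exact row key: the sorted structure is searched
    // directly, without walking over the other rows.
    static String lookup(String key) {
        return TABLE.get(key);
    }

    public static void main(String[] args) {
        load(100000);
        System.out.println("rows visited by filtered scan: "
                + rowsVisitedByFilteredScan("row050000"));
        System.out.println("value from keyed lookup: " + lookup("row050000"));
    }
}
```

In the HBase client, the keyed lookup corresponds to table.get(new Get(rowKey)), and a Scan can be narrowed by giving it start and stop rows. Whether this accounts for the 50 seconds seen here is not something the thread confirms.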

Thanks

On Thu, Dec 17, 2015 at 6:53 PM, Jean-Marc Spaggiari <[email protected]> wrote:

> You will need to batch your puts or use bulk loading...
>
> http://hbase.apache.org/book.html#arch.bulk.load
>
> JMS
>
> 2015-12-17 0:35 GMT-05:00 Rajeshkumar J <[email protected]>:
>
> > Hi,
> >
> > I have an input file of 1 GB in HDFS. I am inserting it into an HBase
> > table and also generating the column qualifier names dynamically. I am
> > doing this in Java, and it took about 20 hours to finish. Can anyone
> > help me optimize the Java code below?
> >
> > public class HbaseinsertionScan {
> >
> >     // As I am generating column names dynamically, and I also need the
> >     // column names to be in a certain order, I decided to build each
> >     // column name from a numerical value plus one of the letters below.
> >     public static String[] ColumnNames = {"a", "aa", "b", "bb", "c", "cc",
> >         "d", "dd", "e", "ee", "f", "ff", "g", "gg", "h", "hh", "i", "j", "k", "l",
> >         "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"};
> >
> >     public static void main(String[] args) throws IOException {
> >
> >         // Initializing required classes
> >         BufferedReader br = null;
> >         Configuration conf = new Configuration();
> >         Configuration config = HBaseConfiguration.create();
> >         Scan scan = new Scan();
> >
> >         // Creating objects for the tables
> >         // table1 holds the actual data
> >         HTable table1 = new HTable(config, "table1");
> >
> >         // table2 holds the numerical value for each rowkey
> >         HTable table2 = new HTable(config, "table2");
> >
> >         // Reading the files from HDFS and inserting them into the HBase tables
> >         FileSystem fs = FileSystem.get(conf);
> >         FileStatus[] status = fs.listStatus(new Path("/filename"));
> >         for (int i = 0; i < status.length; i++) {
> >             br = new BufferedReader(new InputStreamReader(fs.open(status[i].getPath())));
> >             String line = "";
> >
> >             while ((line = br.readLine()) != null) {
> >
> >                 String[] colvalues = line.split(",");
> >
> >                 // Generating column names dynamically
> >                 String colno = "199999";
> >
> >                 RowFilter filter = new RowFilter(CompareFilter.CompareOp.EQUAL,
> >                         new BinaryComparator(Bytes.toBytes(colvalues[1])));
> >                 scan.setFilter(filter);
> >                 ResultScanner scanner = table2.getScanner(scan);
> >
> >                 for (Result result : scanner) {
> >                     for (KeyValue kv : result.raw()) {
> >                         colno = Bytes.toString(kv.getValue());
> >                         long newval = Long.valueOf(colno) - 1;
> >                         System.out.println(newval);
> >                         colno = String.valueOf(newval);
> >                     }
> >                 }
> >
> >                 for (int index = 0; index < ColumnNames.length; index++) {
> >
> >                     // Adding rowkey
> >                     Put p = new Put(Bytes.toBytes(colvalues[1]));
> >
> >                     // Adding column family name, qualifier name, value
> >                     p.add(Bytes.toBytes("test_family"),
> >                             Bytes.toBytes(colno + ColumnNames[index]),
> >                             Bytes.toBytes(colvalues[index]));
> >
> >                     table1.put(p);
> >
> >                     Put col = new Put(Bytes.toBytes(colvalues[1]));
> >
> >                     // Adding column family name, qualifier name, value
> >                     col.add(Bytes.toBytes("test_family"),
> >                             Bytes.toBytes("randomno"),
> >                             Bytes.toBytes(colno));
> >
> >                     table2.put(col);
> >
> >                     //System.out.println("row inserted");
> >                 }
> >             }
> >         }
> >         table1.close();
> >         table2.close();
> >         br.close();
> >     }
> > }
> >
> > On Mon, Dec 14, 2015 at 9:35 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> >
> > > Hi Rajesh,
> > >
> > > For the column qualifiers, there is no need to "create" them in advance.
> > > Just set up whatever you want when you build your Put and HBase will
> > > take it...
> > >
> > > JMS
> > >
> > > 2015-12-14 6:05 GMT-05:00 Rajeshkumar J <[email protected]>:
> > >
> > > > Hi
> > > >
> > > > Thanks. This is what I need, and I am considering this the flat-wide
> > > > table approach.
> > > >
> > > > I have some doubts, and the first of them is how to create dynamic
> > > > column qualifiers. Do you know the command, or any sites that are
> > > > useful for this approach?
> > > >
> > > > Thanks
> > > >
> > > > On Mon, Dec 14, 2015 at 4:28 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> > > >
> > > > > That is correct, as long as the column qualifier is different. But
> > > > > they will still go to the same region and, after compactions, will
> > > > > end up in the same file.
> > > > >
> > > > > JMS
> > > > >
> > > > > On 2015-12-14 6:55 AM, "Rajeshkumar J" <[email protected]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > So as per your reply, inserting the second row will not update the
> > > > > > existing row-key; it will be added as new column qualifiers on the
> > > > > > existing row-key.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Mon, Dec 14, 2015 at 4:13 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > When you insert the 2nd row, HBase will just add it after the
> > > > > > > first one. On the storage side it will be another key/value entry
> > > > > > > AFTER the first one. In the conceptual view, it will be seen as
> > > > > > > another column for the same row (wide approach). HBase will not
> > > > > > > update the previously existing entry. It will create a new one
> > > > > > > for the new key/value. The 1002-xxx | url.com that you have
> > > > > > > inserted before will not be touched.
> > > > > > >
> > > > > > > You have to see all those key/values as totally independent. If
> > > > > > > they have a different column name, what you do with one will not
> > > > > > > have any impact on the others.
> > > > > > >
> > > > > > > JMS
> > > > > > >
> > > > > > > 2015-12-14 5:37 GMT-05:00 Rajeshkumar J <[email protected]>:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Thanks for your response, but in your previous answer you
> > > > > > > > mentioned the following:
> > > > > > > >
> > > > > > > > --------------------------------------------------------------
> > > > > > > > 1002 | xxx | url.com | yy:yy:yy
> > > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > > > --------------------------------------------------------------
> > > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > > --------------------------------------------------------------
> > > > > > > >
> > > > > > > > "This is if 1002-xxx is your key and "url.com" is your column
> > > > > > > > qualifier"
> > > > > > > >
> > > > > > > > I have input rows as follows:
> > > > > > > >
> > > > > > > > 1002 | xxx | url.com | yy:yy:yy
> > > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > > >
> > > > > > > > When I insert the first row into my HBase table, 1002-xxx will
> > > > > > > > be inserted as the row-key and url.com will be one of my column
> > > > > > > > qualifiers.
> > > > > > > >
> > > > > > > > What happens when I try to insert the next row, i.e.
> > > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz? For this one the row-key will
> > > > > > > > also be 1002-xxx. As far as I know, when we try to insert the
> > > > > > > > same row-key the row will be updated.
> > > > > > > >
> > > > > > > > What should I do in this case?
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > On Mon, Dec 14, 2015 at 3:49 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi Rajesh,
> > > > > > > > >
> > > > > > > > > This is not a tall table. Tall would be something where you
> > > > > > > > > put your domain name in the key, not in the column qualifier.
> > > > > > > > > Putting the domain in the columns means you will have many,
> > > > > > > > > many columns for the same key. In the end, HBase always
> > > > > > > > > stores the key for each and every column, whether the table
> > > > > > > > > is tall or wide.
> > > > > > > > >
> > > > > > > > > Reading 1000 rows or reading 1000 columns is exactly the same
> > > > > > > > > thing for HBase. The only difference is that with 1000 rows
> > > > > > > > > HBase might split the rows into 2 regions. If you have 1000
> > > > > > > > > columns, HBase will not split them.
> > > > > > > > >
> > > > > > > > > HBase can return a row in a few milliseconds. 2 seconds for
> > > > > > > > > one Cell is a lot...
> > > > > > > > >
> > > > > > > > > HTH
> > > > > > > > >
> > > > > > > > > JMS
> > > > > > > > >
> > > > > > > > > 2015-12-14 5:14 GMT-05:00 Rajeshkumar J <[email protected]>:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > Thanks for your response, but you are suggesting a tall and
> > > > > > > > > > narrow table, which is not working for me right now. As my
> > > > > > > > > > use case involves a real-time solution, I need to retrieve
> > > > > > > > > > data from the HBase table within one or two seconds. I have
> > > > > > > > > > tried what you suggested, which may lead to 1000 rows for a
> > > > > > > > > > given id, and retrieval takes more than a minute.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > > Rajeshkumar
> > > > > > > > > >
> > > > > > > > > > On Mon, Dec 14, 2015 at 3:29 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > HBase is a key-value store. So what you are pushing here
> > > > > > > > > > > will be stored as:
> > > > > > > > > > >
> > > > > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > > > > > 1002 | xxx | url.com | yy:yy:yy
> > > > > > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > > > >
> > > > > > > > > > > HOWEVER... HBase will never split a region within a key,
> > > > > > > > > > > and keys are always ordered. So in the end, what you will
> > > > > > > > > > > have exactly is:
> > > > > > > > > > >
> > > > > > > > > > > --------------------------------------------------------------
> > > > > > > > > > > 1002 | xxx | url.com | yy:yy:yy
> > > > > > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > > > > > > --------------------------------------------------------------
> > > > > > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > > > > > --------------------------------------------------------------
> > > > > > > > > > >
> > > > > > > > > > > The only places where HBase will split are marked with
> > > > > > > > > > > "-----".
> > > > > > > > > > >
> > > > > > > > > > > This is if 1002-xxx is your key and "url.com" is your
> > > > > > > > > > > column qualifier.
> > > > > > > > > > >
> > > > > > > > > > > HTH
> > > > > > > > > > >
> > > > > > > > > > > JMS
> > > > > > > > > > >
> > > > > > > > > > > 2015-12-14 3:39 GMT-05:00 Rajeshkumar J <[email protected]>:
> > > > > > > > > > >
> > > > > > > > > > > > Hi,
> > > > > > > > > > > >
> > > > > > > > > > > > I am going to use flat-wide tables in HBase for my use
> > > > > > > > > > > > case and I have some doubts about this.
> > > > > > > > > > > >
> > > > > > > > > > > > 1. As far as I know, flat-wide stores one column value
> > > > > > > > > > > > as the key and the others as its values in a key-value
> > > > > > > > > > > > pair relationship (correct me if I am wrong).
> > > > > > > > > > > >
> > > > > > > > > > > > I have rows as follows:
> > > > > > > > > > > >
> > > > > > > > > > > > id | name | url | time
> > > > > > > > > > > >
> > > > > > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > > > > > > 1002 | xxx | url.com | yy:yy:yy
> > > > > > > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > > > > >
> > > > > > > > > > > > I need to store them in a flat-wide table as follows:
> > > > > > > > > > > >
> > > > > > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx | 1002 | xxx | url.com | yy:yy:yy | 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > > > > > >
> > > > > > > > > > > > How do I store it like this?
> > > > > > > > > > > > Can anyone help me with this?
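
The storage behaviour described in this thread (cells kept in sorted order, with a second put on the same row key adding a new column rather than replacing the row) can be illustrated with a toy model. This is a minimal sketch in plain Java, not HBase code: the class WideRowModel and its methods are hypothetical, the row key is assumed to be the id and name joined as "1002-xxx", and the model ignores column families, timestamps and versions, and regions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model only: row key -> (column qualifier -> value), sorted at both
// levels, to mimic how cells are ordered by row key and then by qualifier.
class WideRowModel {
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    // Adds one cell. A new qualifier on an existing row key becomes a new
    // column of that row; the cells already stored there are not touched.
    void put(String rowKey, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(qualifier, value);
    }

    // Returns every cell as "rowKey | qualifier | value" in sorted order.
    List<String> cells() {
        List<String> out = new ArrayList<>();
        rows.forEach((row, cols) ->
                cols.forEach((q, v) -> out.add(row + " | " + q + " | " + v)));
        return out;
    }

    public static void main(String[] args) {
        WideRowModel t = new WideRowModel();
        // Inserted in the same order as the input rows in the thread.
        t.put("1002-xxx", "www.sample.com", "xx:xx:xx");
        t.put("1003-yyy", "www.url,com", "xx:xx:yy");
        t.put("1002-xxx", "url.com", "yy:yy:yy");
        t.put("1002-xxx", "urrl2.com", "zz:zz:zz");
        t.cells().forEach(System.out::println);
        // prints:
        // 1002-xxx | url.com | yy:yy:yy
        // 1002-xxx | urrl2.com | zz:zz:zz
        // 1002-xxx | www.sample.com | xx:xx:xx
        // 1003-yyy | www.url,com | xx:xx:yy
    }
}
```

The printed order matches the sorted layout shown in the thread: the three 1002-xxx cells sit together, ordered by qualifier, followed by the 1003-yyy cell.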
