Re: Flat-wide table Hbase

Rajeshkumar J Thu, 17 Dec 2015 21:49:49 -0800

Hi,

  Thanks, it works. After storing the data in a flat-wide table, I tried to
query it using a RowFilter, and it takes 50 seconds to return the results. Is
this the expected behaviour of HBase?


Thanks
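
One possible explanation, sketched under assumptions: a RowFilter set on a Scan
that has no start or stop row is evaluated against every row the scan visits,
so the lookup time grows with the table size, whereas a Get on the row key (or
a Scan bounded by start and stop rows) seeks straight to the row. The model
below is not HBase client code; it uses a sorted in-memory map as a stand-in
for the table, and RowLookupModel, scanWithFilter and directGet are
hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for a table sorted by row key; NOT HBase client code.
final class RowLookupModel {
    private final TreeMap<String, String> rows = new TreeMap<>();
    private long rowsExamined = 0;

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Models Scan + RowFilter(EQUAL) with no start/stop row:
    // the filter is tested against every row, even after a match.
    String scanWithFilter(String rowKey) {
        String found = null;
        for (Map.Entry<String, String> e : rows.entrySet()) {
            rowsExamined++;
            if (e.getKey().equals(rowKey)) {
                found = e.getValue();
            }
        }
        return found;
    }

    // Models a Get: the sorted structure leads straight to the row.
    String directGet(String rowKey) {
        rowsExamined++;
        return rows.get(rowKey);
    }

    long rowsExamined() {
        return rowsExamined;
    }

    void resetCounter() {
        rowsExamined = 0;
    }

    public static void main(String[] args) {
        RowLookupModel table = new RowLookupModel();
        for (int i = 0; i < 100_000; i++) {
            table.put(String.format("row-%06d", i), "v" + i);
        }
        table.scanWithFilter("row-050000");
        System.out.println("filter scan examined " + table.rowsExamined() + " rows");
        table.resetCounter();
        table.directGet("row-050000");
        System.out.println("direct get examined " + table.rowsExamined() + " row(s)");
    }
}
```

If the row key is known in advance, as it is with an EQUAL comparison, the
direct lookup is usually the one to try first.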

On Thu, Dec 17, 2015 at 6:53 PM, Jean-Marc Spaggiari <
[email protected]> wrote:

> You will need to batch your puts or use bulk loading...
>
> http://hbase.apache.org/book.html#arch.bulk.load
>
> JMS
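
To illustrate the batching idea: the HBase client accepts a list of puts in one
call (HTable.put(List<Put>)), so records can be buffered and sent in groups
instead of one round trip per cell. The sketch below shows only the buffering
pattern with plain Java collections; BatchingWriter, its sink callback and the
batch size of 4 are assumptions made for illustration, not part of HBase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: buffers records and hands them to a sink in batches.
// In real code the sink would forward each batch to the HBase client.
final class BatchingWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();
    private int flushCount = 0;

    BatchingWriter(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Adds one record and flushes automatically once the buffer is full.
    void write(T record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever is buffered; call once more at the end of the input.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
        flushCount++;
    }

    int flushCount() {
        return flushCount;
    }

    public static void main(String[] args) {
        List<List<String>> sent = new ArrayList<>();
        BatchingWriter<String> writer = new BatchingWriter<>(4, sent::add);
        for (int i = 0; i < 10; i++) {
            writer.write("put-" + i);
        }
        writer.flush(); // send the final partial batch
        System.out.println(writer.flushCount() + " batches sent for 10 records");
    }
}
```

The final flush() matters: without it, the last partial batch would never be
written.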
>
> 2015-12-17 0:35 GMT-05:00 Rajeshkumar J <[email protected]>:
>
> > Hi,
> >
> >    I have a 1 GB input file in HDFS. I am inserting it into an HBase
> > table and generating the column qualifier names dynamically. I am doing
> > this in Java, and the process took about 20 hours to finish. Can anyone
> > help me optimize the Java code below?
> >
> >
> > import java.io.BufferedReader;
> > import java.io.IOException;
> > import java.io.InputStreamReader;
> >
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.hadoop.fs.FileStatus;
> > import org.apache.hadoop.fs.FileSystem;
> > import org.apache.hadoop.fs.Path;
> > import org.apache.hadoop.hbase.HBaseConfiguration;
> > import org.apache.hadoop.hbase.KeyValue;
> > import org.apache.hadoop.hbase.client.HTable;
> > import org.apache.hadoop.hbase.client.Put;
> > import org.apache.hadoop.hbase.client.Result;
> > import org.apache.hadoop.hbase.client.ResultScanner;
> > import org.apache.hadoop.hbase.client.Scan;
> > import org.apache.hadoop.hbase.filter.BinaryComparator;
> > import org.apache.hadoop.hbase.filter.CompareFilter;
> > import org.apache.hadoop.hbase.filter.RowFilter;
> > import org.apache.hadoop.hbase.util.Bytes;
> >
> > public class HbaseinsertionScan {
> >
> >     // I am generating column names dynamically and need them in a certain
> >     // order, so each column name is a numerical value combined with one
> >     // of the suffixes below.
> >     public static String[] ColumnNames = {"a", "aa", "b", "bb", "c", "cc",
> >             "d", "dd", "e", "ee", "f", "ff", "g", "gg", "h", "hh", "i", "j",
> >             "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w",
> >             "x", "y", "z"};
> >
> >     public static void main(String[] args) throws IOException {
> >
> >         // Initializing required classes
> >         Configuration conf = new Configuration();
> >         Configuration config = HBaseConfiguration.create();
> >         Scan scan = new Scan();
> >
> >         // table1 holds the actual data
> >         HTable table1 = new HTable(config, "table1");
> >         // table2 holds the numerical value for each rowkey
> >         HTable table2 = new HTable(config, "table2");
> >
> >         // Reading the files from HDFS and inserting them into the HBase tables
> >         FileSystem fs = FileSystem.get(conf);
> >         FileStatus[] status = fs.listStatus(new Path("/filename"));
> >         for (int i = 0; i < status.length; i++) {
> >             // try-with-resources closes the reader of every file, not just the last
> >             try (BufferedReader br = new BufferedReader(
> >                     new InputStreamReader(fs.open(status[i].getPath())))) {
> >                 String line;
> >                 while ((line = br.readLine()) != null) {
> >
> >                     String[] colvalues = line.split(",");
> >
> >                     // Generating column names dynamically
> >                     String colno = "199999";
> >
> >                     // Look up the current numerical value for this rowkey in table2
> >                     RowFilter filter = new RowFilter(CompareFilter.CompareOp.EQUAL,
> >                             new BinaryComparator(Bytes.toBytes(colvalues[1])));
> >                     scan.setFilter(filter);
> >                     // Close the scanner after each lookup so scanners do not leak
> >                     try (ResultScanner scanner = table2.getScanner(scan)) {
> >                         for (Result result : scanner) {
> >                             for (KeyValue kv : result.raw()) {
> >                                 colno = Bytes.toString(kv.getValue());
> >                                 long newval = Long.valueOf(colno) - 1;
> >                                 System.out.println(newval);
> >                                 colno = String.valueOf(newval);
> >                             }
> >                         }
> >                     }
> >
> >                     for (int index = 0; index < ColumnNames.length; index++) {
> >
> >                         // Adding rowkey
> >                         Put p = new Put(Bytes.toBytes(colvalues[1]));
> >
> >                         // Adding column family name, qualifier name, value
> >                         p.add(Bytes.toBytes("test_family"),
> >                                 Bytes.toBytes(colno + ColumnNames[index]),
> >                                 Bytes.toBytes(colvalues[index]));
> >
> >                         table1.put(p);
> >
> >                         Put col = new Put(Bytes.toBytes(colvalues[1]));
> >
> >                         // Adding column family name, qualifier name, value
> >                         col.add(Bytes.toBytes("test_family"),
> >                                 Bytes.toBytes("randomno"),
> >                                 Bytes.toBytes(colno));
> >
> >                         table2.put(col);
> >
> >                         //System.out.println("row inserted");
> >                     }
> >                 }
> >             }
> >         }
> >         // Close the tables that were actually opened above
> >         table1.close();
> >         table2.close();
> >     }
> > }
> >
> > On Mon, Dec 14, 2015 at 9:35 PM, Jean-Marc Spaggiari <
> > [email protected]> wrote:
> >
> > > Hi Rajesh,
> > >
> > > For the column qualifiers, there is no need to "create" them in advance.
> > > Just set whatever qualifier you want when you build your Put and HBase
> > > will take it...
> > >
> > > JMS
> > >
> > > 2015-12-14 6:05 GMT-05:00 Rajeshkumar J <[email protected]>:
> > >
> > > > Hi
> > > >
> > > >    Thanks. This is what I need, and I am considering this as the
> > > > flat-wide table approach.
> > > >
> > > >    I have some doubts, and the first of them is how to create dynamic
> > > > column qualifiers. Do you know the command, or any sites that are
> > > > useful for this approach?
> > > >
> > > > Thanks
> > > >
> > > > On Mon, Dec 14, 2015 at 4:28 PM, Jean-Marc Spaggiari <
> > > > [email protected]> wrote:
> > > >
> > > > > That is correct, as long as the column qualifier is different. But
> > > > > they will still go to the same region and, after compactions, will
> > > > > end up in the same file.
> > > > >
> > > > > JMS
> > > > >
> > > > > On 2015-12-14 6:55 AM, "Rajeshkumar J" <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > >    So, as per your reply, inserting the second row will not update
> > > > > > the existing row-key; it will be added as a new column qualifier on
> > > > > > the existing row-key?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Mon, Dec 14, 2015 at 4:13 PM, Jean-Marc Spaggiari <
> > > > > > [email protected]> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > When you insert the 2nd row, HBase will just add it after the
> > > > > > > first one. On the storage side, it will be another key/value entry
> > > > > > > AFTER the first one. In the conceptual view, it will be seen as
> > > > > > > another column for the same row (wide approach). HBase will not
> > > > > > > update the previously existing entry. It will create a new one for
> > > > > > > the new key/value. The 1002-xxx | url.com that you inserted before
> > > > > > > will not be touched.
> > > > > > >
> > > > > > > You have to see all those key/values as totally independent. If
> > > > > > > they have a different column name, what you do with one will not
> > > > > > > have any impact on the others.
> > > > > > >
> > > > > > > JMS
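
The behaviour described above can be modelled with nested sorted maps. This is
a minimal sketch, assuming 1002-xxx is the row key and the URL is the column
qualifier, as in the thread; WideRowModel and its methods are hypothetical
names, the code is plain Java rather than the HBase client, and it ignores
cell versions and timestamps.

```java
import java.util.TreeMap;

// Plain-Java model of a wide row: row key -> (qualifier -> value).
// Both levels are sorted, mirroring HBase's ordered key/values.
final class WideRowModel {
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    // A put with an existing row key but a new qualifier adds a column;
    // it does not touch the columns already stored for that row.
    void put(String rowKey, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(qualifier, value);
    }

    TreeMap<String, String> getRow(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }

    int rowCount() {
        return rows.size();
    }

    public static void main(String[] args) {
        WideRowModel table = new WideRowModel();
        // Same insertion order as the input rows in the thread
        table.put("1002-xxx", "www.sample.com", "xx:xx:xx");
        table.put("1003-yyy", "www.url,com", "xx:xx:yy");
        table.put("1002-xxx", "url.com", "yy:yy:yy");
        table.put("1002-xxx", "urrl2.com", "zz:zz:zz");

        System.out.println("rows: " + table.rowCount());
        System.out.println("1002-xxx: " + table.getRow("1002-xxx"));
    }
}
```

In this model, only a put that reuses both the row key and the qualifier
replaces a value; in HBase such a put would be stored as a newer version of
the same cell.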
> > > > > > >
> > > > > > > 2015-12-14 5:37 GMT-05:00 Rajeshkumar J <
> > > [email protected]
> > > > >:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > >   Thanks for your response, but in your previous answer you
> > > > > > > > mentioned the following:
> > > > > > > >
> > > > > > > > --------------------------------------------------------------
> > > > > > > > 1002 | xxx | url.com  | yy:yy:yy
> > > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > > > --------------------------------------------------------------
> > > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > > --------------------------------------------------------------
> > > > > > > >
> > > > > > > > "This is if 1002-xxx is your key and "url.com" is your column
> > > > > > > > qualifier"
> > > > > > > >
> > > > > > > > I have input rows as follows:
> > > > > > > >
> > > > > > > > 1002 | xxx | url.com  | yy:yy:yy
> > > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > > >
> > > > > > > > When I insert the first row into my HBase table, 1002-xxx will be
> > > > > > > > inserted as the rowkey and url.com will be one of my column
> > > > > > > > qualifiers.
> > > > > > > >
> > > > > > > > What happens when I try to insert the next row, i.e.
> > > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz? For this one the row-key will
> > > > > > > > also be 1002-xxx. As far as I know, when we insert with the same
> > > > > > > > row-key, the row will be updated.
> > > > > > > >
> > > > > > > > What should I do in this case?
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > On Mon, Dec 14, 2015 at 3:49 PM, Jean-Marc Spaggiari <
> > > > > > > > [email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi Rajesh,
> > > > > > > > >
> > > > > > > > > This is not a tall table. Tall would be something where you put
> > > > > > > > > your domain name in the key, not in the column qualifier.
> > > > > > > > > Putting the domain in the columns means you will have many,
> > > > > > > > > many columns for the same key. In the end, HBase always stores
> > > > > > > > > the key for each and every column, whether it is tall or wide.
> > > > > > > > >
> > > > > > > > > Reading 1000 rows or reading 1000 columns is exactly the same
> > > > > > > > > thing for HBase. The only difference is that with 1000 rows
> > > > > > > > > HBase might split the rows into 2 regions. If you have 1000
> > > > > > > > > columns, HBase will not split them.
> > > > > > > > >
> > > > > > > > > HBase can return a row in a few milliseconds. 2 seconds for one
> > > > > > > > > Cell is a lot...
> > > > > > > > >
> > > > > > > > > HTH
> > > > > > > > >
> > > > > > > > > JMS
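
For contrast with the wide layout, a tall layout puts the domain into the row
key, so all entries for one id sit next to each other in key order and can be
read with a range scan over the key prefix. The sketch below models this with
a sorted map in plain Java; TallKeyModel, the "id-name|domain" key format and
the "|" separator are assumptions made for illustration and do not come from
the thread or from HBase.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Plain-Java model of a tall table: one sorted entry per (id, name, domain).
final class TallKeyModel {
    private final TreeMap<String, String> cells = new TreeMap<>();

    // Hypothetical composite key: "<id>-<name>|<domain>"
    void put(String id, String name, String domain, String time) {
        cells.put(id + "-" + name + "|" + domain, time);
    }

    // Range scan over every key that starts with the given prefix.
    // The upper bound is the prefix followed by the largest char value.
    SortedMap<String, String> scanPrefix(String prefix) {
        return cells.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        TallKeyModel table = new TallKeyModel();
        table.put("1002", "xxx", "www.sample.com", "xx:xx:xx");
        table.put("1003", "yyy", "www.url,com", "xx:xx:yy");
        table.put("1002", "xxx", "url.com", "yy:yy:yy");
        table.put("1002", "xxx", "urrl2.com", "zz:zz:zz");

        // All entries for 1002-xxx come back together, in key order
        System.out.println(table.scanPrefix("1002-xxx|"));
    }
}
```

Either way the same number of key/values is read; the layouts differ in
whether those key/values can be split across regions.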
> > > > > > > > >
> > > > > > > > > 2015-12-14 5:14 GMT-05:00 Rajeshkumar J <
> > > > > [email protected]
> > > > > > >:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > >    Thanks for your response, but you are suggesting a tall and
> > > > > > > > > > narrow table, which is not working for me right now. As my use
> > > > > > > > > > case involves a real-time solution, I need to retrieve data
> > > > > > > > > > from the HBase table within one or two seconds. I have tried
> > > > > > > > > > what you suggested, which may lead to 1000 rows for a given id,
> > > > > > > > > > and the retrieval takes more than a minute.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > > Rajeshkumar
> > > > > > > > > >
> > > > > > > > > > On Mon, Dec 14, 2015 at 3:29 PM, Jean-Marc Spaggiari <
> > > > > > > > > > [email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > HBase is a key-value store. So what you are pushing here will
> > > > > > > > > > > be stored as:
> > > > > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > > > > > 1002 | xxx | url.com  | yy:yy:yy
> > > > > > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > > > >
> > > > > > > > > > > HOWEVER.... HBase will never split a region within a key, and
> > > > > > > > > > > keys are always ordered. So in the end, what you will have
> > > > > > > > > > > exactly is:
> > > > > > > > > > >
> > > > > > > > > > > --------------------------------------------------------------
> > > > > > > > > > > 1002 | xxx | url.com  | yy:yy:yy
> > > > > > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > > > > > > --------------------------------------------------------------
> > > > > > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > > > > > --------------------------------------------------------------
> > > > > > > > > > >
> > > > > > > > > > > The only places where HBase will split are marked with "-----".
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > This is if 1002-xxx is your key and "url.com" is your column
> > > > > > > > > > > qualifier.
> > > > > > > > > > >
> > > > > > > > > > > HTH
> > > > > > > > > > >
> > > > > > > > > > > JMS
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > 2015-12-14 3:39 GMT-05:00 Rajeshkumar J <
> > > > > > > [email protected]
> > > > > > > > >:
> > > > > > > > > > >
> > > > > > > > > > > > Hi,
> > > > > > > > > > > >
> > > > > > > > > > > >    I am going to use flat-wide tables in HBase for my use
> > > > > > > > > > > > case, and I have some doubts regarding this.
> > > > > > > > > > > >
> > > > > > > > > > > >    1. As per my knowledge, flat-wide stores one column value
> > > > > > > > > > > > as the key and the others as its values in a key-value pair
> > > > > > > > > > > > relationship (correct me if I am wrong).
> > > > > > > > > > > >
> > > > > > > > > > > > I have rows as follows:
> > > > > > > > > > > >
> > > > > > > > > > > > id  | name | url | time
> > > > > > > > > > > >
> > > > > > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx
> > > > > > > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > > > > > > 1002 | xxx | url.com  | yy:yy:yy
> > > > > > > > > > > > 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I need to store it in a flat-wide table as follows:
> > > > > > > > > > > >
> > > > > > > > > > > > 1002 | xxx | www.sample.com | xx:xx:xx | 1002 | xxx | url.com | yy:yy:yy | 1002 | xxx | urrl2.com | zz:zz:zz
> > > > > > > > > > > > 1003 | yyy | www.url,com | xx:xx:yy
> > > > > > > > > > > >
> > > > > > > > > > > > How do I store it like this?
> > > > > > > > > > > > Can anyone help me with this?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
