How is DataXceiver being used?

2013-01-16 Thread Liu, Raymond
Hi

I have a table with about 24 regions on one regionserver, and each
region has about 20 block files on HDFS.
The xceiverCount is set to 1024, which I thought was quite
enough, since at most 480 blocks will be opened.
But when I run a MR job to scan the table, with 24 map tasks each opening
and scanning a different region at the same time, it turns out that the
DataXceivers run out...

I am a little bit puzzled: each block will only be read by one task,
so shouldn't the region server scan the blocks one by one? And since there
are 480 blocks at most, how can they use up the DataXceivers?
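
(For reference, the xceiverCount limit in question is the DataNode setting
below, configured in hdfs-site.xml; note the property name's historical
misspelling.)

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>1024</value>
</property>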

Best Regards,
Raymond Liu



Re: H-Rider / HTable UI

2013-01-16 Thread Kyle Lin
Hi there

Another similar tool on sourceforge.

http://sourceforge.net/projects/haredbhbaseclie/


2012/12/11 Jean-Marc Spaggiari 

> For those who are not following the HBase group on LinkedIn, Roi Amir
> just posted about a tool they built to look at/update HTables in HBase.
>
> You can take a look here: https://github.com/NiceSystems/hrider/wiki
>
> I'm not part of this project, so don't ask me anything about it ;) I'm
> just relaying the information here.
>


RE: Hbase as mongodb

2013-01-16 Thread Anoop Sam John
Yes Mohammad. A smarter way like this is needed.  I was saying that even if the
full JSON is stored as a column value, it will be possible to achieve what
Panshul needs. :) But a full table scan will not be acceptable, I guess.

As Ted suggested, please check Panthera also. Panthera seems to use Hive-HBase
integration in a smart way.

-Anoop-
__
From: Mohammad Tariq [donta...@gmail.com]
Sent: Wednesday, January 16, 2013 7:08 PM
To: user@hbase.apache.org
Subject: Re: Hbase as mongodb

@Anoop sir : Does it make sense to extract the timestamp of the JSON
object beforehand and use it as the rowkey? After that, serialize the
JSON object and store it in the HBase cell. Gets would be a lot faster
then?

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Wed, Jan 16, 2013 at 7:02 PM, Imran M Yousuf  wrote:

> We have used Jackson library for converting Java Object to JSON String
> and eventually to byte[] and vice-versa; but that is not scan/query
> friendly, so we integrated Apache Solr to the stack to get that done.
> http://smart-cms.org
>
> Thank you,
>
> Imran
>
> On Wed, Jan 16, 2013 at 7:27 PM, Anoop Sam John 
> wrote:
> >>Such as I can directly say Mongodb to get me
> > all the objects having timestamp value of xxx date where timestamp is a
> > field in Json objects stored in Mongodb
> >
> > It is possible to store any data in HBase which can be converted into
> byte[].  Yes using filters one can perform above kind of query. There is no
> built in filter for above kind of need but custom one can be created.  But
> remember that there is no built in secondary indexing capability in HBase.
>  Here by I can see you have a need for indexing a part of column value.
> [timestamp is a field in Json objects ]
> >
> > -Anoop-
> > 
> > From: Panshul Whisper [ouchwhis...@gmail.com]
> > Sent: Wednesday, January 16, 2013 6:36 PM
> > To: user@hbase.apache.org
> > Subject: Re: Hbase as mongodb
> >
> > Hello Tariq,
> >
> > Thank you for the reply.
> >
> > My concern is that I have been working with MongoDB, but now I am
> switching
> > over to Hadoop and I want to use HBase for certain reasons. I was
> wondering
> > if I can store Json files in Hbase in a way that I can query the Json
> files
> > in Hbase as I can in Mongodb. Such as I can directly say Mongodb to get
> me
> > all the objects having timestamp value of xxx date where timestamp is a
> > field in Json objects stored in Mongodb. Can I perform similar operations
> > on Hbase or does it have another approach for doing similar operations.
> > I do not have much knowledge on Hbase yet. I am beginning to learn it,
> but
> > I just want to be sure i am investing my time in the right direction.
> >
> > Thank you so much for the help,
> >
> > Regards,
> > Panshul.
> >
> >
> > On Wed, Jan 16, 2013 at 11:45 AM, Mohammad Tariq 
> wrote:
> >
> >> Hello Panshul,
> >>
> >> Hbase and MongoDB are built to serve different purposes. You
> can't
> >> replace one with the other. They have different strengths and
> weaknesses.
> >> So, if you are using Hbase for something, think well before switching to
> >> MongoDB or vice verca.
> >>
> >> Coming back to the actual question, you can store anything which can be
> >> converted into a sequence of bytes into Hbase and query it. Could you
> >> please elaborate your problem a bit?It will help us to answer your
> question
> >> in a better manner.
> >>
> >> Warm Regards,
> >> Tariq
> >> https://mtariq.jux.com/
> >> cloudfront.blogspot.com
> >>
> >>
> >> On Wed, Jan 16, 2013 at 4:03 PM, Panshul Whisper wrote:
> >>
> >> > Hello,
> >> >
> >> > Is it possible to use hbase to query json documents in a same way as
> we
> >> can
> >> > do with Mongodb
> >> >
> >> > Suggestions please.
> >> > If we can then a small example as how.. not the query but the process
> >> > flow..
> >> > Thanku so much
> >> > Regards,
> >> > Panshul.
> >> >
> >>
> >
> >
> >
> > --
> > Regards,
> > Ouch Whisper
> > 010101010101
>
>
>
> --
> Imran M Yousuf
> Entrepreneur & CEO
> Smart IT Engineering Ltd.
> Dhaka, Bangladesh
> Twitter: @imyousuf - http://twitter.com/imyousuf
> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
> Mobile: +880-1711402557
>+880-1746119494
>

Troubleshooting process for a random lagging-region issue.

2013-01-16 Thread Liu, Raymond
Hi

I have recorded my troubleshooting process for a random lagging-region issue in
MR scans. Sharing it here in case you meet a similar problem and need to
diagnose it.

Full text with image here: http://blog.csdn.net/colorant/article/details/8510254

Only text, as below:

--

=== Problem observation ===

When scanning certain tables, there are always some lagging, slow map tasks
(usually costing 150%~200% of the average task run time), and the top 10
slowest tasks usually reside on the same Region Server. If I run the same
scan job multiple times, the slowest tasks and their location do not change.

Judging only by the above behavior, you would suspect that the lagging Region
Server must have some problem which slows down the whole system. But the truth
is: if you run the scan job on a different table, the lagging Region Server is
not the same one. Say, e.g., with table 1, region server A has a lot of lagging
tasks, while for table 2, it might be region server B which lags behind.

Last but not least, all these tables worked fine a few days ago. It seems the
problem occurred (or was first observed) after a few cluster restarts.

=== Environment ===

1 master node + 4 data/region nodes, each with 4 disks, 48G RAM, 16 CPU cores.
Hadoop 1.1.1, HBase 0.94.1. 24/20 Map/Reduce slots on each node.

Each table is around 50GB, with 64~96 regions distributed evenly across the 4
Region Servers. The data is generated, and each region has exactly the same
number of keyvalues and almost exactly the same size. All tables have been
major-compacted.

A MapReduce job does the whole-table scan. Each region is assigned to a local
map task; the map task just scans the local region and counts rows. Since the
map slot number is equal to or larger than the region number, the tasks can be
assigned within one batch.
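
(For readers who want to reproduce this setup, a minimal sketch of such a
scan-and-count job against HBase 0.94; the table name and caching value are
placeholders, not necessarily the settings used here.)

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class ScanCount {
  // One map task per region; each task scans its (ideally local) region.
  static class CountMapper extends TableMapper<ImmutableBytesWritable, Result> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      ctx.getCounter("scan", "rows").increment(1); // just count rows
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "scan-count");
    job.setJarByClass(ScanCount.class);
    Scan scan = new Scan();
    scan.setCaching(1000); // client scan cache size, one of the knobs tuned below
    TableMapReduceUtil.initTableMapperJob("testtable", scan,
        CountMapper.class, ImmutableBytesWritable.class, Result.class, job);
    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);
    job.waitForCompletion(true);
  }
}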

=== Trouble shooting ===

My troubleshooting procedure is recorded below (including some dead ends and
misleading results, but also a little knowledge gained as a byproduct, at
least for a newbie like me).

== Any bottleneck? ==

First of all, monitor the lagging Region Server to check whether there is any
bottleneck while the scan job runs. There appears to be nothing abnormal:
CPU/disk IO are OK and not at peak, except that the overall disk IO throughput
is a little bit lower than on the other Region Servers.

== Data locality? == 

If a region's data does not actually reside on the local data node, that would
also lead to a hot-spot region, since the region server would need to read
data from other nodes.

To make sure that all data is actually read from the local data node, I do a
second major compaction on the table, to eliminate the possibility that regions
got relocated and balanced since the last major compaction. Then I inspect the
network IO while running the MapReduce scan jobs.

Knowledge: a simple, fast way to inspect network IO together with other system
resources is "dstat", e.g. "dstat -cdnm" monitors CPU / disk IO / network IO /
memory, cache and buffers all together.

The observation shows that there is no data locality issue: all data is read
from the local data node, and there is no notable network IO. The lagging
issue still exists after another major compaction. But some changes were
observed: after each major compaction, the top 10 slow regions seem to change
randomly, with only a weak relationship to the previous run (say, probably
still on the same region server before/after the major compaction).

Thus, this issue is not related to data locality.

== Cluster configuration == 

Since this problem is random across tables, I also wondered: is there any
configuration I made over the past days which impacts the cluster's
stability? E.g. memory-related settings? Some fine-tuned parameters on the
MapReduce framework?

* First of all I looked into the GC behavior, since GC brings a lot of
randomness, and a lot of settings might influence GC behavior: say Hadoop/HBase
heap size, GC strategy, eden area size, HBase block cache enable/disable, etc.

After tuning and comparing different settings for these parameters (including
restoring them to the settings that I know worked before this problem
occurred), the lagging issue still exists. Some settings do behave better in
terms of GC time, but they don't solve the lagging-region issue.

Knowledge: disabling the HBase block cache will reduce GC time a lot for
whole-table-scan-like jobs; for my 50G of data, it saves about 10s of GC time,
as observed with the jvisualvm GC plugin. By default, TableInputFormat does
disable the block cache (obviously, since all the data is accessed only once,
it doesn't need to be cached), while if you are writing a custom InputFormat,
you need to disable it yourself.
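
(Concretely, in a custom InputFormat that would be something like the
following on the Scan you construct:)

Scan scan = new Scan();
scan.setCacheBlocks(false); // a full scan reads each block once; don't cache it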

* Then I tried to tune some parameters related to HDFS/MapReduce/HBase
concurrency, e.g. DataNode Xceiver/Handler number, RegionServer handler
number, map slot number, client scan cache size, etc. These settings are
synced across all nodes, so they should not bring ra

Re: Constructing rowkeys and HBASE-7221

2013-01-16 Thread Aaron Kimball
Hi Doug,

This HBase feature is really interesting. It is quite related to some work
we're doing on Kiji, our schema management project. In particular, we've
also been focusing on building composite row keys correctly. One thing that
jumped out at me in that ticket is that with a composition of md5hash and
other (string, int, etc) components, you probably don't want the whole
hash. If you're using that to shard your rows more efficiently across
regions, you might want to just use a subset of the md5 bytes as a prefix.
It might be a good idea to offer users control of this.
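
(A minimal sketch of that prefixing idea, assuming a two-byte prefix; the
helper below is hypothetical, not part of the HBASE-7221 patch or Kiji.)

import java.security.MessageDigest;

public class HashPrefix {
  // Rowkey = first prefixLen bytes of md5(key) + the key itself.
  // prefixLen trades shard spread against scan locality.
  static byte[] prefixed(byte[] key, int prefixLen) throws Exception {
    byte[] md5 = MessageDigest.getInstance("MD5").digest(key);
    byte[] out = new byte[prefixLen + key.length];
    System.arraycopy(md5, 0, out, 0, prefixLen);          // shard prefix
    System.arraycopy(key, 0, out, prefixLen, key.length); // original key
    return out;
  }
}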

Our own thoughts on this on the Kiji side are being tracked at
https://jira.kiji.org/browse/schema-3 where we have a design doc that goes
into a bit more detail.

Cheers,
- Aaron


On Tue, Jan 15, 2013 at 2:01 PM, Doug Meil wrote:

>
> Hi there, well, this request for input fell like a thud.  :-)
>
> But I think perhaps it has to do with the fact that I sent it to the
> dev-list instead of the user-list, as people that are actively writing
> HBase itself (devs) need less help with such keybuilding utilities.
>
> So one last request for feedback, but this time aimed at users of HBase:
> how has your key-building experience been?
>
> Thanks!
>
>
>
> On 1/7/13 11:04 AM, "Doug Meil"  wrote:
>
> >
> >Greetings folks-
> >
> >I would like to restart the conversation on
> >https://issues.apache.org/jira/browse/HBASE-7221 because there continue
> >to be conversations on the dist-list about creating composite rowkeys,
> >and while HBase makes just about anything possible, it doesn't make much
> >easy in this respect.
> >
> >What I'm lobbying for is a utility class (see the v3 patch in HBASE-7221)
> >that can both create and read rowkeys (so this isn't just a one-way
> >builder pattern).
> >
> >This is currently stuck because it was noted that Bytes has an issue with
> >sort-order of numbers specifically if you have both negative and positive
> >values, which is really a different issue, but because this patch uses
> >Bytes it's related.
> >
> >What are people's thoughts on this topic in general, and the v3 version
> >of the patch specifically?  (and the last set of comments).  Thanks!
> >
> >One of the unit tests shows the example of usage.  The last set of
> >comments suggested that RowKey be renamed FixedLengthRowKey, which I
> >think is a good idea.  A follow-on patch could include
> >VariableLengthRowKey for folks that use strings in the rowkeys.
> >
> >
> >  public void testCreate() throws Exception {
> >
> >    int elements[] = {RowKeySchema.SIZEOF_MD5_HASH,
> >        RowKeySchema.SIZEOF_INT, RowKeySchema.SIZEOF_LONG};
> >    RowKeySchema schema = new RowKeySchema(elements);
> >
> >    RowKey rowkey = schema.createRowKey();
> >    rowkey.setHash(0, hashVal);
> >    rowkey.setInt(1, intVal);
> >    rowkey.setLong(2, longVal);
> >
> >    byte bytes[] = rowkey.getBytes();
> >    Assert.assertEquals("key length", schema.getRowKeyLength(),
> >        bytes.length);
> >
> >    Assert.assertEquals("e1", rowkey.getInt(1), intVal);
> >    Assert.assertEquals("e2", rowkey.getLong(2), longVal);
> >  }
> >
> >Doug Meil
> >Chief Software Architect, Explorys
> >doug.m...@explorys.com
> >
>
>
>


Re: Hbase as mongodb

2013-01-16 Thread Ted
Project Panthera seems to serve your use case well. 

You can refer to 
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201209.mbox/%3c521452fcf7acde4795c7e150d240afca0fde1...@shsmsx102.ccr.corp.intel.com%3E

On Jan 16, 2013, at 5:38 AM, Mohammad Tariq  wrote:

> @Anoop sir : Does it make sense to extract the timestamp of JSON
> object beforehand and use it as the rowkey? After that serialize the
> JSON object and store it in the Hbase cell. Gets would a lot faster
> then???
> 
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
> 
> 
> On Wed, Jan 16, 2013 at 7:02 PM, Imran M Yousuf  wrote:
> 
>> We have used Jackson library for converting Java Object to JSON String
>> and eventually to byte[] and vice-versa; but that is not scan/query
>> friendly, so we integrated Apache Solr to the stack to get that done.
>> http://smart-cms.org
>> 
>> Thank you,
>> 
>> Imran
>> 
>> On Wed, Jan 16, 2013 at 7:27 PM, Anoop Sam John 
>> wrote:
>>>> Such as I can directly say Mongodb to get me
>>>> all the objects having timestamp value of xxx date where timestamp is a
>>>> field in Json objects stored in Mongodb
>>> 
>>> It is possible to store any data in HBase which can be converted into
>> byte[].  Yes using filters one can perform above kind of query. There is no
>> built in filter for above kind of need but custom one can be created.  But
>> remember that there is no built in secondary indexing capability in HBase.
>> Here by I can see you have a need for indexing a part of column value.
>> [timestamp is a field in Json objects ]
>>> 
>>> -Anoop-
>>> 
>>> From: Panshul Whisper [ouchwhis...@gmail.com]
>>> Sent: Wednesday, January 16, 2013 6:36 PM
>>> To: user@hbase.apache.org
>>> Subject: Re: Hbase as mongodb
>>> 
>>> Hello Tariq,
>>> 
>>> Thank you for the reply.
>>> 
>>> My concern is that I have been working with MongoDB, but now I am
>> switching
>>> over to Hadoop and I want to use HBase for certain reasons. I was
>> wondering
>>> if I can store Json files in Hbase in a way that I can query the Json
>> files
>>> in Hbase as I can in Mongodb. Such as I can directly say Mongodb to get
>> me
>>> all the objects having timestamp value of xxx date where timestamp is a
>>> field in Json objects stored in Mongodb. Can I perform similar operations
>>> on Hbase or does it have another approach for doing similar operations.
>>> I do not have much knowledge on Hbase yet. I am beginning to learn it,
>> but
>>> I just want to be sure i am investing my time in the right direction.
>>> 
>>> Thank you so much for the help,
>>> 
>>> Regards,
>>> Panshul.
>>> 
>>> 
>>> On Wed, Jan 16, 2013 at 11:45 AM, Mohammad Tariq 
>> wrote:
>>> 
>>>> Hello Panshul,
>>>>
>>>> Hbase and MongoDB are built to serve different purposes. You can't
>>>> replace one with the other. They have different strengths and weaknesses.
>>>> So, if you are using Hbase for something, think well before switching to
>>>> MongoDB or vice verca.
>>>>
>>>> Coming back to the actual question, you can store anything which can be
>>>> converted into a sequence of bytes into Hbase and query it. Could you
>>>> please elaborate your problem a bit?It will help us to answer your
>>>> question in a better manner.
>>>>
>>>> Warm Regards,
>>>> Tariq
>>>> https://mtariq.jux.com/
>>>> cloudfront.blogspot.com
>>>>
>>>>
>>>> On Wed, Jan 16, 2013 at 4:03 PM, Panshul Whisper wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Is it possible to use hbase to query json documents in a same way as we
>>>>> can do with Mongodb
>>>>>
>>>>> Suggestions please.
>>>>> If we can then a small example as how.. not the query but the process
>>>>> flow..
>>>>> Thanku so much
>>>>> Regards,
>>>>> Panshul.
>>> 
>>> 
>>> 
>>> --
>>> Regards,
>>> Ouch Whisper
>>> 010101010101
>> 
>> 
>> 
>> --
>> Imran M Yousuf
>> Entrepreneur & CEO
>> Smart IT Engineering Ltd.
>> Dhaka, Bangladesh
>> Twitter: @imyousuf - http://twitter.com/imyousuf
>> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
>> Mobile: +880-1711402557
>>   +880-1746119494
>> 


Re: Hbase as mongodb

2013-01-16 Thread Mohammad Tariq
@Anoop sir : Does it make sense to extract the timestamp of the JSON
object beforehand and use it as the rowkey? After that, serialize the
JSON object and store it in the HBase cell. Gets would be a lot faster
then?
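
(A sketch of what that would look like, with invented family/qualifier names;
jsonBytes is the serialized JSON object, timestampMillis the extracted field.)

byte[] rowkey = Bytes.toBytes(timestampMillis); // timestamp pulled out of the JSON
Put put = new Put(rowkey);
put.add(Bytes.toBytes("d"), Bytes.toBytes("json"), jsonBytes); // whole JSON as one cell
table.put(put); // later, new Get(rowkey) fetches it directly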

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Wed, Jan 16, 2013 at 7:02 PM, Imran M Yousuf  wrote:

> We have used Jackson library for converting Java Object to JSON String
> and eventually to byte[] and vice-versa; but that is not scan/query
> friendly, so we integrated Apache Solr to the stack to get that done.
> http://smart-cms.org
>
> Thank you,
>
> Imran
>
> On Wed, Jan 16, 2013 at 7:27 PM, Anoop Sam John 
> wrote:
> >>Such as I can directly say Mongodb to get me
> > all the objects having timestamp value of xxx date where timestamp is a
> > field in Json objects stored in Mongodb
> >
> > It is possible to store any data in HBase which can be converted into
> byte[].  Yes using filters one can perform above kind of query. There is no
> built in filter for above kind of need but custom one can be created.  But
> remember that there is no built in secondary indexing capability in HBase.
>  Here by I can see you have a need for indexing a part of column value.
> [timestamp is a field in Json objects ]
> >
> > -Anoop-
> > 
> > From: Panshul Whisper [ouchwhis...@gmail.com]
> > Sent: Wednesday, January 16, 2013 6:36 PM
> > To: user@hbase.apache.org
> > Subject: Re: Hbase as mongodb
> >
> > Hello Tariq,
> >
> > Thank you for the reply.
> >
> > My concern is that I have been working with MongoDB, but now I am
> switching
> > over to Hadoop and I want to use HBase for certain reasons. I was
> wondering
> > if I can store Json files in Hbase in a way that I can query the Json
> files
> > in Hbase as I can in Mongodb. Such as I can directly say Mongodb to get
> me
> > all the objects having timestamp value of xxx date where timestamp is a
> > field in Json objects stored in Mongodb. Can I perform similar operations
> > on Hbase or does it have another approach for doing similar operations.
> > I do not have much knowledge on Hbase yet. I am beginning to learn it,
> but
> > I just want to be sure i am investing my time in the right direction.
> >
> > Thank you so much for the help,
> >
> > Regards,
> > Panshul.
> >
> >
> > On Wed, Jan 16, 2013 at 11:45 AM, Mohammad Tariq 
> wrote:
> >
> >> Hello Panshul,
> >>
> >> Hbase and MongoDB are built to serve different purposes. You
> can't
> >> replace one with the other. They have different strengths and
> weaknesses.
> >> So, if you are using Hbase for something, think well before switching to
> >> MongoDB or vice verca.
> >>
> >> Coming back to the actual question, you can store anything which can be
> >> converted into a sequence of bytes into Hbase and query it. Could you
> >> please elaborate your problem a bit?It will help us to answer your
> question
> >> in a better manner.
> >>
> >> Warm Regards,
> >> Tariq
> >> https://mtariq.jux.com/
> >> cloudfront.blogspot.com
> >>
> >>
> >> On Wed, Jan 16, 2013 at 4:03 PM, Panshul Whisper wrote:
> >>
> >> > Hello,
> >> >
> >> > Is it possible to use hbase to query json documents in a same way as
> we
> >> can
> >> > do with Mongodb
> >> >
> >> > Suggestions please.
> >> > If we can then a small example as how.. not the query but the process
> >> > flow..
> >> > Thanku so much
> >> > Regards,
> >> > Panshul.
> >> >
> >>
> >
> >
> >
> > --
> > Regards,
> > Ouch Whisper
> > 010101010101
>
>
>
> --
> Imran M Yousuf
> Entrepreneur & CEO
> Smart IT Engineering Ltd.
> Dhaka, Bangladesh
> Twitter: @imyousuf - http://twitter.com/imyousuf
> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
> Mobile: +880-1711402557
>+880-1746119494
>


Re: Hbase as mongodb

2013-01-16 Thread Imran M Yousuf
We have used the Jackson library for converting Java objects to JSON strings
and eventually to byte[] and vice versa; but that is not scan/query
friendly, so we integrated Apache Solr into the stack to get that done.
http://smart-cms.org
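
(A minimal sketch of such a Jackson round-trip; the helper class is
illustrative only.)

import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonCodec {
  private static final ObjectMapper MAPPER = new ObjectMapper();

  // Object -> byte[] suitable for an HBase Put.
  static byte[] toBytes(Object value) throws Exception {
    return MAPPER.writeValueAsBytes(value);
  }

  // byte[] from a Get/Scan Result -> object again.
  static <T> T fromBytes(byte[] bytes, Class<T> type) throws Exception {
    return MAPPER.readValue(bytes, type);
  }
}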

Thank you,

Imran

On Wed, Jan 16, 2013 at 7:27 PM, Anoop Sam John  wrote:
>>Such as I can directly say Mongodb to get me
> all the objects having timestamp value of xxx date where timestamp is a
> field in Json objects stored in Mongodb
>
> It is possible to store any data in HBase which can be converted into byte[]. 
>  Yes using filters one can perform above kind of query. There is no built in 
> filter for above kind of need but custom one can be created.  But remember 
> that there is no built in secondary indexing capability in HBase.  Here by I 
> can see you have a need for indexing a part of column value. [timestamp is a 
> field in Json objects ]
>
> -Anoop-
> 
> From: Panshul Whisper [ouchwhis...@gmail.com]
> Sent: Wednesday, January 16, 2013 6:36 PM
> To: user@hbase.apache.org
> Subject: Re: Hbase as mongodb
>
> Hello Tariq,
>
> Thank you for the reply.
>
> My concern is that I have been working with MongoDB, but now I am switching
> over to Hadoop and I want to use HBase for certain reasons. I was wondering
> if I can store Json files in Hbase in a way that I can query the Json files
> in Hbase as I can in Mongodb. Such as I can directly say Mongodb to get me
> all the objects having timestamp value of xxx date where timestamp is a
> field in Json objects stored in Mongodb. Can I perform similar operations
> on Hbase or does it have another approach for doing similar operations.
> I do not have much knowledge on Hbase yet. I am beginning to learn it, but
> I just want to be sure i am investing my time in the right direction.
>
> Thank you so much for the help,
>
> Regards,
> Panshul.
>
>
> On Wed, Jan 16, 2013 at 11:45 AM, Mohammad Tariq  wrote:
>
>> Hello Panshul,
>>
>> Hbase and MongoDB are built to serve different purposes. You can't
>> replace one with the other. They have different strengths and weaknesses.
>> So, if you are using Hbase for something, think well before switching to
>> MongoDB or vice verca.
>>
>> Coming back to the actual question, you can store anything which can be
>> converted into a sequence of bytes into Hbase and query it. Could you
>> please elaborate your problem a bit?It will help us to answer your question
>> in a better manner.
>>
>> Warm Regards,
>> Tariq
>> https://mtariq.jux.com/
>> cloudfront.blogspot.com
>>
>>
>> On Wed, Jan 16, 2013 at 4:03 PM, Panshul Whisper wrote:
>>
>> > Hello,
>> >
>> > Is it possible to use hbase to query json documents in a same way as we
>> can
>> > do with Mongodb
>> >
>> > Suggestions please.
>> > If we can then a small example as how.. not the query but the process
>> > flow..
>> > Thanku so much
>> > Regards,
>> > Panshul.
>> >
>>
>
>
>
> --
> Regards,
> Ouch Whisper
> 010101010101



-- 
Imran M Yousuf
Entrepreneur & CEO
Smart IT Engineering Ltd.
Dhaka, Bangladesh
Twitter: @imyousuf - http://twitter.com/imyousuf
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557
   +880-1746119494


RE: Hbase as mongodb

2013-01-16 Thread Anoop Sam John
> Such as I can directly say Mongodb to get me
> all the objects having timestamp value of xxx date where timestamp is a
> field in Json objects stored in Mongodb

It is possible to store any data in HBase which can be converted into byte[].
Yes, using filters, one can perform the above kind of query. There is no
built-in filter for the above kind of need, but a custom one can be created.
But remember that there is no built-in secondary indexing capability in HBase.
Here I can see you have a need for indexing a part of a column value
[timestamp is a field in the JSON objects].
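
(For illustration, the closest built-in approximation is a substring match on
the JSON cell value; it still scans the whole table, as said, and the family,
qualifier and timestamp literal are invented for the example.)

Scan scan = new Scan();
scan.setFilter(new SingleColumnValueFilter(
    Bytes.toBytes("d"), Bytes.toBytes("json"),     // assumed column
    CompareFilter.CompareOp.EQUAL,
    new SubstringComparator("\"timestamp\":\"2013-01-16\"")));
// rows whose JSON cell does not contain the substring are skipped server side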

-Anoop-

From: Panshul Whisper [ouchwhis...@gmail.com]
Sent: Wednesday, January 16, 2013 6:36 PM
To: user@hbase.apache.org
Subject: Re: Hbase as mongodb

Hello Tariq,

Thank you for the reply.

My concern is that I have been working with MongoDB, but now I am switching
over to Hadoop and I want to use HBase for certain reasons. I was wondering
if I can store JSON files in HBase in a way that lets me query them
as I can in MongoDB. For example, I can directly ask MongoDB to get me
all the objects having a timestamp value of xxx date, where timestamp is a
field in the JSON objects stored in MongoDB. Can I perform similar operations
on HBase, or does it have another approach for doing similar operations?
I do not have much knowledge of HBase yet. I am beginning to learn it, but
I just want to be sure I am investing my time in the right direction.

Thank you so much for the help,

Regards,
Panshul.


On Wed, Jan 16, 2013 at 11:45 AM, Mohammad Tariq  wrote:

> Hello Panshul,
>
> Hbase and MongoDB are built to serve different purposes. You can't
> replace one with the other. They have different strengths and weaknesses.
> So, if you are using Hbase for something, think well before switching to
> MongoDB or vice verca.
>
> Coming back to the actual question, you can store anything which can be
> converted into a sequence of bytes into Hbase and query it. Could you
> please elaborate your problem a bit?It will help us to answer your question
> in a better manner.
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Wed, Jan 16, 2013 at 4:03 PM, Panshul Whisper wrote:
>
> > Hello,
> >
> > Is it possible to use hbase to query json documents in a same way as we
> can
> > do with Mongodb
> >
> > Suggestions please.
> > If we can then a small example as how.. not the query but the process
> > flow..
> > Thanku so much
> > Regards,
> > Panshul.
> >
>



--
Regards,
Ouch Whisper
010101010101

Re: Hbase as mongodb

2013-01-16 Thread Mohammad Tariq
You can do that; the approach might vary though, depending upon the scenario.
You just have to think well about your schema in order to make sure that it
fits your put and get requirements.

I have not personally worked on the JSON+HBase combo, so I cannot give any
direct suggestion at the moment. But we have a lot of folks here on the list
who have experience with such things. Let us hope they come across this
thread.

I have a few things in mind which I would like to share with you though. Try
not to store overly large JSON files. You might need to consider making the
column family compressed. I would first serialize the JSON into bytes, then
write it to HBase cells, and while pulling the data, deserialize it back in
the same layer.

You may find this thread useful.

HTH

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Wed, Jan 16, 2013 at 6:36 PM, Panshul Whisper wrote:

> Hello Tariq,
>
> Thank you for the reply.
>
> My concern is that I have been working with MongoDB, but now I am switching
> over to Hadoop and I want to use HBase for certain reasons. I was wondering
> if I can store Json files in Hbase in a way that I can query the Json files
> in Hbase as I can in Mongodb. Such as I can directly say Mongodb to get me
> all the objects having timestamp value of xxx date where timestamp is a
> field in Json objects stored in Mongodb. Can I perform similar operations
> on Hbase or does it have another approach for doing similar operations.
> I do not have much knowledge on Hbase yet. I am beginning to learn it, but
> I just want to be sure i am investing my time in the right direction.
>
> Thank you so much for the help,
>
> Regards,
> Panshul.
>
>
> On Wed, Jan 16, 2013 at 11:45 AM, Mohammad Tariq 
> wrote:
>
> > Hello Panshul,
> >
> > Hbase and MongoDB are built to serve different purposes. You
> can't
> > replace one with the other. They have different strengths and weaknesses.
> > So, if you are using Hbase for something, think well before switching to
> > MongoDB or vice verca.
> >
> > Coming back to the actual question, you can store anything which can be
> > converted into a sequence of bytes into Hbase and query it. Could you
> > please elaborate your problem a bit?It will help us to answer your
> question
> > in a better manner.
> >
> > Warm Regards,
> > Tariq
> > https://mtariq.jux.com/
> > cloudfront.blogspot.com
> >
> >
> > On Wed, Jan 16, 2013 at 4:03 PM, Panshul Whisper wrote:
> >
> > > Hello,
> > >
> > > Is it possible to use hbase to query json documents in a same way as we
> > can
> > > do with Mongodb
> > >
> > > Suggestions please.
> > > If we can then a small example as how.. not the query but the process
> > > flow..
> > > Thanku so much
> > > Regards,
> > > Panshul.
> > >
> >
>
>
>
> --
> Regards,
> Ouch Whisper
> 010101010101
>


Re: Hbase as mongodb

2013-01-16 Thread Panshul Whisper
Hello Tariq,

Thank you for the reply.

My concern is that I have been working with MongoDB, but now I am switching
over to Hadoop and I want to use HBase for certain reasons. I was wondering
if I can store JSON files in HBase in a way that lets me query them
as I can in MongoDB. For example, I can directly ask MongoDB to get me
all the objects having a timestamp value of xxx date, where timestamp is a
field in the JSON objects stored in MongoDB. Can I perform similar operations
on HBase, or does it have another approach for doing similar operations?
I do not have much knowledge of HBase yet. I am beginning to learn it, but
I just want to be sure I am investing my time in the right direction.

Thank you so much for the help,

Regards,
Panshul.


On Wed, Jan 16, 2013 at 11:45 AM, Mohammad Tariq  wrote:

> Hello Panshul,
>
> Hbase and MongoDB are built to serve different purposes. You can't
> replace one with the other. They have different strengths and weaknesses.
> So, if you are using Hbase for something, think well before switching to
> MongoDB or vice verca.
>
> Coming back to the actual question, you can store anything which can be
> converted into a sequence of bytes into Hbase and query it. Could you
> please elaborate your problem a bit?It will help us to answer your question
> in a better manner.
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Wed, Jan 16, 2013 at 4:03 PM, Panshul Whisper wrote:
>
> > Hello,
> >
> > Is it possible to use hbase to query json documents in a same way as we
> can
> > do with Mongodb
> >
> > Suggestions please.
> > If we can then a small example as how.. not the query but the process
> > flow..
> > Thanku so much
> > Regards,
> > Panshul.
> >
>



-- 
Regards,
Ouch Whisper
010101010101


Re: hbase olap cube

2013-01-16 Thread Otis Gospodnetic
Hi,

There is one getting-started project on GitHub. Also google for "HBase
Lattice".

We've also built something like that at Sematext, with real-time in-memory
aggregation and HBase persistence (we use it for the thing in my sig, and
more), but we have not open-sourced it.

Otis
--
HBASE Performance Monitoring - http://sematext.com/spm/index.html





On Wed, Jan 16, 2013 at 5:23 AM, Oleg Ruchovets wrote:

> Hi ,
>
> I have a timeseries data  and I am looking for capabilities to store
> and aggregates the events on such level granularity:   YEAR | MONTH  | WEEK
> | DAY  | HOUR.
> I need functionality like sum() , average().
> For example to calculate average for event X which was at 14:00 PM every
> sunday during JANUARY.
>
> Questions:
> 1) I see it like Olap Cube solution , Am I right?
> 2) Can someone point me on  Olap Cube open source project over hbase?
>
> Thanks
> Oleg.
>


Re: Loading Endpoint coprocessor from shell

2013-01-16 Thread Amit Sela
Forget about it, my bad :)

On Wed, Jan 16, 2013 at 2:48 PM, Amit Sela  wrote:

> Hi all,
>
> It seems like I can't load Endpoint coprocessor from shell but I have no
> problem loading RegionObserver from shell.
> In both cases I pack a jar file, copy it to HDFS and load from shell using
> table_att but only the RegionObserver is loaded (I can see it in the
> webapp).
>
> Is it supposed to be like that ? bug maybe ?
>
> I'm using HBase 0.94.2
>
> Thanks,
>
> Amit.
>


Loading Endpoint coprocessor from shell

2013-01-16 Thread Amit Sela
Hi all,

It seems like I can't load an Endpoint coprocessor from the shell, but I have
no problem loading a RegionObserver from the shell.
In both cases I pack a jar file, copy it to HDFS and load it from the shell
using table_att, but only the RegionObserver gets loaded (I can see it in the
webapp).

Is it supposed to be like that? A bug, maybe?
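
(For reference, the table_att form used here looks like this in the 0.92/0.94
shell; the jar path, class name and priority are placeholders.)

hbase> alter 'mytable', METHOD => 'table_att',
  'coprocessor' => 'hdfs:///user/hbase/cp.jar|com.example.MyEndpoint|1001|'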

I'm using HBase 0.94.2

Thanks,

Amit.


Re: hbase olap cube

2013-01-16 Thread Amit Sela
Hi Oleg,

Try this: http://github.com/urbanairship/datacube
Andrew Purtell recommended it here when I had a similar need. I didn't need
things like sum and average, so I ended up using a RegionObserver
coprocessor, but I did take a look, and it looks like it might help you.

Good Luck!

On Wed, Jan 16, 2013 at 12:23 PM, Oleg Ruchovets wrote:

> Hi ,
>
> I have a timeseries data  and I am looking for capabilities to store
> and aggregates the events on such level granularity:   YEAR | MONTH  | WEEK
> | DAY  | HOUR.
> I need functionality like sum() , average().
> For example to calculate average for event X which was at 14:00 PM every
> sunday during JANUARY.
>
> Questions:
> 1) I see it like Olap Cube solution , Am I right?
> 2) Can someone point me on  Olap Cube open source project over hbase?
>
> Thanks
> Oleg.
>


Re: Tune MapReduce over HBase to insert data

2013-01-16 Thread Farrokh Shahriari
I've noticed that if I comment out the write call in the map function
(context.write(row, put)), the job takes just 40 sec. The difference is about
30 seconds; that seems weird to me. What do you think?

The parameters that have been useful so far:
hbase.hstore.blockingStoreFiles => 20
hbase.hregion.memstore.block.multiplier => 4
hbase.hregion.memstore.flush.size => 1073741824
speculative.execution => false
wal => false

Should I change these two parameters: io.sort.mb & io.sort.factor?

Mohandes

On Tue, Jan 15, 2013 at 5:03 AM, Bing Jiang wrote:

> Hi, mohandes.zebeleh
> you can adjust parameter as below( Major Compaction, Minor Compaction,
> Split):
> if you do not set, it will retain default value(1).
>
> <property>
>   <name>hbase.regionserver.thread.compaction.large</name>
>   <value>5</value>
> </property>
> <property>
>   <name>hbase.regionserver.thread.compaction.small</name>
>   <value>10</value>
> </property>
> <property>
>   <name>hbase.regionserver.thread.split</name>
>   <value>5</value>
> </property>
>
> Regards!
>
> Bing
>
> 2013/1/14 Farrokh Shahriari 
>
>> Bing Jiang, What do you mean by add compaction thread number ? Because, in
>> Hbase-site.xml we have compactionqueuesize or compactionthreshold but not
>> the parameter that you have said.
>>
>> Thanks you if you guide me.
>>
>> On Sun, Jan 13, 2013 at 7:00 PM, Ted Yu  wrote:
>>
>> > Both HFileOutputFormat and LoadIncrementalHFiles are in mapreduce
>> package.
>> >
>> > Cheers
>> >
>> > > On Sun, Jan 13, 2013 at 1:31 AM, Bing Jiang wrote:
>> >
>> > > hi,anoop.
>> > > Why not hbase mapreduce package contains the tools like this?
>> > >
>> > > Anoop John 编写:
>> > >
>> > > >Hi
>> > > > Can you think of using HFileOutputFormat ?  Here you use
>> > > >TableOutputFormat now. There will be put calls to HTable. Instead in
>> > > >HFileOutput format the MR will write the HFiles directly.[No flushes
>> ,
>> > > >compactions] Later using LoadIncrementalHFiles need to load the
>> HFiles
>> > to
>> > > >the regions.  May help you..
>> > > >
>> > > >-Anoop-
>> > > >
>> > > >On Sun, Jan 13, 2013 at 10:59 AM, Farrokh Shahriari <
>> > > >mohandes.zebeleh...@gmail.com> wrote:
>> > > >
>> > > >> Thank you guys,let me change these configuration & test mapreduce
>> > again.
>> > > >>
>> > > >> On Tue, Jan 8, 2013 at 10:31 PM, Asaf Mesika <
>> asaf.mes...@gmail.com>
>> > > >> wrote:
>> > > >>
>> > > >> > Start by testing HDFS throughput by doing s simple copyFromLocal
>> > using
>> > > >> > Hadoop command line shell (bin/hadoop fs -copyFromLocal
>> > pathTo8GBFile
>> > > >> > /tmp/dummyFile1). If you have 1000Mbit/sec network between the
>> > > computers,
>> > > >> > you should get around 75 MB/sec.
>> > > >> >
>> > > >> > On Tuesday, January 8, 2013, Bing Jiang wrote:
>> > > >> >
>> > > >> > > In our experience, it can enhance mapreduce insert by
>> > > >> > > 1.add regionserver flush thread number
>> > > >> > > 2.add memstore/jvm_heap
>> > > >> > > 3.pre split table region before mapreduce
>> > > >> > > 4.add large and small compaction thread number.
>> > > >> > >
>> > > >> > > please correct me if wrong, or any other better ideas.
>> > > >> > > > On Jan 8, 2013 4:02 PM, "lars hofhansl" wrote:
>> > > >> > >
>> > > >> > > > What type of disks and how many?
>> > > >> > > > With the default replication factor your 2 (or 6) GB are
>> > actually
>> > > >> > > > replicated 3 times.
>> > > >> > > > 6GB/80s = 75MB/s, twice that if you do not disable the WAL,
>> > which
>> > > a
>> > > >> > > > reasonable machine should be able to absorb.
>> > > >> > > > The fact that deferred log flush does not help you seems to
>> > > indicate
>> > > >> > that
>> > > >> > > > you're over IO bound.
>> > > >> > > >
>> > > >> > > >
>> > > >> > > > What's your memstore flush size? Potentially the data is
>> written
>> > > many
>> > > >> > > > times during compactions.
>> > > >> > > >
>> > > >> > > >
>> > > >> > > > In your case you dial down the HDFS replication, since you
>> only
>> > > have
>> > > >> > two
>> > > >> > > > physical machines anyway.
>> > > >> > > > (Set it to 2. If you do not specify any failure zones, you
>> might
>> > > as
>> > > >> > well
>> > > >> > > > set it to 1... You will lose data if one of your server
>> machines
>> > > dies
>> > > >> > > > anyway).
>> > > >> > > >
>> > > >> > > > It does not really make that much sense to deploy HBase and
>> HDFS
>> > > on
>> > > >> > > > virtual nodes like this.
>> > > >> > > > -- Lars
>> > > >> > > >
>> > > >> > > >
>> > > >> > > >
>> > > >> > > > 
>> > > >> > > > From: Farrokh Shahriari
>> > > >> > > > To: user@hbase.apache.org 
>> > > >> > > > Sent: Monday, January 7, 2013 9:38 PM
>> > > >> > > > Subject: Re: Tune MapReduce over HBase to insert data
>> > > >> > > >
>> > > >> > > > Hi again,
>> > > >> > > > I'm using HBase 0.92.1-cdh4.0.0.
>> > > >> > > > I have two server machine with 48Gb RAM,12 physical core & 24
>> > > logical
>> > > >> > > core
>> > > >> > > > that contain 12 nodes(6 nodes on each server). Each node has
>> 8Gb
>> > > RAM
>> > > >> &
>> > > >> > 2
>> > > >> > > > VCPU.
>> >
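
(A minimal sketch of the bulk-load path Anoop suggests above, writing
region-aligned HFiles and then moving them into the table; the table name and
output path are placeholders.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.mapreduce.Job;

public class BulkLoad {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "bulk-load-prepare");
    job.setJarByClass(BulkLoad.class);
    // ... set your mapper and input format here; the mapper emits
    // (ImmutableBytesWritable rowkey, Put) pairs as before ...
    HTable table = new HTable(conf, "testtable");
    // Wires up HFileOutputFormat plus a total-order partitioner so map
    // output becomes region-aligned HFiles instead of live Puts.
    HFileOutputFormat.configureIncrementalLoad(job, table);
    HFileOutputFormat.setOutputPath(job, new Path("/tmp/bulkout"));
    if (job.waitForCompletion(true)) {
      // No flushes or compactions on write; just move the HFiles in.
      new LoadIncrementalHFiles(conf).doBulkLoad(new Path("/tmp/bulkout"), table);
    }
  }
}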

Re: Hbase as mongodb

2013-01-16 Thread Mohammad Tariq
Hello Panshul,

Hbase and MongoDB are built to serve different purposes. You can't
replace one with the other. They have different strengths and weaknesses.
So, if you are using HBase for something, think well before switching to
MongoDB, or vice versa.

Coming back to the actual question: you can store anything which can be
converted into a sequence of bytes into HBase and query it. Could you
please elaborate your problem a bit? It will help us answer your question
in a better manner.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Wed, Jan 16, 2013 at 4:03 PM, Panshul Whisper wrote:

> Hello,
>
> Is it possible to use hbase to query json documents in a same way as we can
> do with Mongodb
>
> Suggestions please.
> If we can then a small example as how.. not the query but the process
> flow..
> Thanku so much
> Regards,
> Panshul.
>


Re: Hbase as mongodb

2013-01-16 Thread Nitin Pawar
may be this will help
http://sites.ieee.org/scv-cs/files/2011/03/IBM-Jaql-by-Kevin-Beyer.pdf


On Wed, Jan 16, 2013 at 4:03 PM, Panshul Whisper wrote:

> Hello,
>
> Is it possible to use hbase to query json documents in a same way as we can
> do with Mongodb
>
> Suggestions please.
> If we can then a small example as how.. not the query but the process
> flow..
> Thanku so much
> Regards,
> Panshul.
>



-- 
Nitin Pawar


Hbase as mongodb

2013-01-16 Thread Panshul Whisper
Hello,

Is it possible to use HBase to query JSON documents in the same way as we can
do with MongoDB?

Suggestions please.
If we can, then a small example of how: not the query, but the process flow.
Thank you so much,
Regards,
Panshul.


hbase olap cube

2013-01-16 Thread Oleg Ruchovets
Hi,

I have time-series data and I am looking for capabilities to store and
aggregate the events at these levels of granularity: YEAR | MONTH | WEEK |
DAY | HOUR.
I need functionality like sum() and average(),
for example to calculate the average for event X which occurred at 14:00
every Sunday during January.

Questions:
1) I see this as an OLAP cube problem. Am I right?
2) Can someone point me to an open source OLAP cube project over HBase?
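
(For flavor, the usual hand-rolled approach, absent a cube library, is to bump
one counter pair per granularity bucket on write and divide at read time; the
table, family and row layout below are invented for the example.)

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class HourlyAgg {
  // Maintain sum and count per (event, hour) bucket; average = sum / count.
  static void record(HTable table, String event, String hourBucket, long value)
      throws Exception {
    byte[] row = Bytes.toBytes(event + ":" + hourBucket); // e.g. "X:2013-01-16T14"
    table.incrementColumnValue(row, Bytes.toBytes("a"), Bytes.toBytes("count"), 1);
    table.incrementColumnValue(row, Bytes.toBytes("a"), Bytes.toBytes("sum"), value);
  }
}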

Thanks
Oleg.