about schema design problem

2013-06-28 Thread ch huang
Hi all: I have two search patterns: searching for a row with a specific id, and searching for a row with a specific token. How can I design a schema that fits both search patterns effectively?

RE: about schema design problem

2013-06-28 Thread Jyothi Mandava
Hi, do you mean that search with id is a search by row key, and search with token is a search by column value? Jyothi From: ch huang [justlo...@gmail.com] Sent: Friday, June 28, 2013 12:11 PM To: user@hbase.apache.org Subject: about schema design problem

Re: about schema design problem

2013-06-28 Thread ch huang
No, I have an old RDBMS (MySQL) table with two search patterns. I want to convert it to an HBase schema, but I do not know how to design it to get the best performance (can I put id and token together into the row key?). On Fri, Jun 28, 2013 at 2:55 PM, Jyothi Mandava jyothi.mand...@huawei.com wrote: Hi, do you mean

RE: about schema design problem

2013-06-28 Thread Jyothi Mandava
If you add both (id + token) to the row key and want to search with either one of them: given only the id, you can use a prefix filter to get the rows; given only the token, you can use FuzzyRowFilter (token), but the id must then have a fixed length for FuzzyRowFilter to work. If your
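The composite-key idea above can be illustrated with a minimal sketch. This is a pure-Python simulation, not the HBase client API: ID_WIDTH, make_row_key, matches_prefix, token_pattern and matches_fuzzy are hypothetical names chosen for illustration, and the 8-character zero-padded id is an assumption. The mask convention (0 = byte must match, 1 = any byte) mirrors the one FuzzyRowFilter uses.

```python
ID_WIDTH = 8  # assumed fixed width of the id part (hypothetical value)


def make_row_key(row_id: int, token: str) -> bytes:
    """Build a composite key: zero-padded fixed-width id followed by the token."""
    id_part = str(row_id).zfill(ID_WIDTH)
    if len(id_part) > ID_WIDTH:
        raise ValueError("id does not fit in the fixed-width id field")
    return id_part.encode("ascii") + token.encode("utf-8")


def matches_prefix(row_key: bytes, row_id: int) -> bool:
    """Search by id only: a plain prefix comparison on the id part."""
    return row_key.startswith(str(row_id).zfill(ID_WIDTH).encode("ascii"))


def token_pattern(token: str):
    """Pattern and mask for a search by token only.

    The id positions are wildcards (mask 1), the token positions are fixed (mask 0).
    This only works because the id has a fixed width.
    """
    token_bytes = token.encode("utf-8")
    pattern = b"?" * ID_WIDTH + token_bytes
    mask = [1] * ID_WIDTH + [0] * len(token_bytes)
    return pattern, mask


def matches_fuzzy(row_key: bytes, pattern: bytes, mask) -> bool:
    """Compare only the positions whose mask value is 0."""
    if len(row_key) < len(pattern):
        return False
    return all(m == 1 or k == p for k, p, m in zip(row_key, pattern, mask))
```

Note that in this sketch the fuzzy match accepts any key whose token part starts with the given token, because positions past the end of the pattern are not compared. A search by token still has to examine keys across the whole id range, so it is likely to be slower than the prefix search by id.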

Re: about schema design problem

2013-06-28 Thread ch huang
As far as I know, HBase can only index the key (row key + column qualifier + timestamp). How can I set a secondary key on an HBase table? On Fri, Jun 28, 2013 at 4:28 PM, Jyothi Mandava jyothi.mand...@huawei.com wrote: If you add both(id+token) to the row key and if you want to search with any one of these (id ,

Re: Schema design for filters

2013-06-28 Thread Kristoffer Sjögren
Interesting. I'm actually building something similar. A full-blown SQL implementation is a bit overkill for my particular use case, and the query API is the final piece of the puzzle. But I'll definitely have a look for some inspiration. Thanks! On Fri, Jun 28, 2013 at 3:55 AM, James Taylor

RE: about schema design problem

2013-06-28 Thread Jyothi Mandava
There is no built-in support, but you can check these: http://jyates.github.io/2013/06/11/hbase-consistent-secondary-indexing.html https://github.com/jyates/phoenix/tree/hbase-index/contrib/hbase-index Recently there was a discussion in the HBase users group on indexing columns. Please find it

Re: 答复: flushing + compactions after config change

2013-06-28 Thread Anoop John
Viral, basically when you increase the memstore flush size (your aim there is to reduce flushes and make data sit in memory for a longer time), you need to carefully consider two things: 1. What is the max heap, and what is the maximum % of memory you have allocated for all the memstores in a RS.
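The first consideration above comes down to simple arithmetic, sketched here under stated assumptions. The function name max_full_memstores is hypothetical, and the heap size, memstore fraction and flush sizes below are illustrative values, not settings taken from this thread.

```python
def max_full_memstores(heap_bytes: int, memstore_fraction: float, flush_size_bytes: int) -> int:
    """How many memstores can reach the flush size before the region server's
    overall memstore budget (heap * fraction) is used up."""
    return int(heap_bytes * memstore_fraction // flush_size_bytes)


GIB = 2 ** 30
MIB = 2 ** 20

# Illustrative numbers only: an 8 GiB heap with 40% of it reserved for memstores.
with_128_mib = max_full_memstores(8 * GIB, 0.4, 128 * MIB)
with_512_mib = max_full_memstores(8 * GIB, 0.4, 512 * MIB)
```

With these assumed numbers, raising the flush size from 128 MiB to 512 MiB drops the count from 25 to 6, so on a server with many actively written regions the overall memstore limit may be reached well before any single region hits the larger flush size.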

Re: Problems while exporting from Hbase to CSV file

2013-06-28 Thread Anoop John
"so i can not use default scan() constructor as it will scan whole table in one go which results in OutOfMemory error in client process" Not getting what you mean by this. The client calls next() on the Scanner and gets the rows. setCaching() and setBatch() determine how much data (rows,
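The fetch-and-iterate pattern described above can be sketched as follows. This is a pure-Python simulation, not the HBase client API: fetch_batches stands in for a scanner that returns a limited number of rows per round trip, export_csv is a hypothetical helper, and the caching parameter only loosely mirrors what setCaching() controls.

```python
import csv
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, str]  # (row key, value); a simplified stand-in for a scan result


def fetch_batches(rows: Iterable[Row], caching: int) -> Iterator[List[Row]]:
    """Yield rows in chunks of at most `caching`, like one scanner round trip each."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == caching:
            yield batch
            batch = []
    if batch:
        yield batch


def export_csv(rows: Iterable[Row], out, caching: int = 100) -> int:
    """Write rows to `out` one batch at a time and return the number of rows written.

    Only one batch is held in memory at any moment, so memory use is bounded
    by `caching`, not by the size of the table.
    """
    writer = csv.writer(out)
    written = 0
    for batch in fetch_batches(rows, caching):
        writer.writerows(batch)
        written += len(batch)
    return written
```

As long as the caller writes each batch out and does not collect every row into a list first, the client's memory footprint stays roughly constant regardless of table size.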

Re: How many column families in one table ?

2013-06-28 Thread Ted Yu
Vimal: Please also refer to: http://search-hadoop.com/m/qOx8l15Z1q42/column+families+fbsubj=Re+HBase+Column+Family+Limit+Reasoning On Fri, Jun 28, 2013 at 1:37 PM, Michel Segel michael_se...@hotmail.com wrote: Short answer... As few as possible. 14 CF doesn't make too much sense. Sent from a

Re: How many column families in one table ?

2013-06-28 Thread Vimal Jain
Hi All, Thanks for your replies. Ted, Thanks for the link, but it's not working. :( On Fri, Jun 28, 2013 at 5:57 PM, Ted Yu yuzhih...@gmail.com wrote: Vimal: Please also refer to: http://search-hadoop.com/m/qOx8l15Z1q42/column+families+fbsubj=Re+HBase+Column+Family+Limit+Reasoning On

Re: How many column families in one table ?

2013-06-28 Thread Michael Segel
Beyond the physical limitations (cost constraints) there's a logical one in terms of design. I just did a talk at the CHUG on schema design and the key was to understand how and why one should use column families. From a logical design perspective you would want to limit data within a CF to

Re: Problems while exporting from Hbase to CSV file

2013-06-28 Thread Michael Segel
Yeah, that's the point. You fetch, you iterate through the returned set, you get the next batch. The only way he could get OOM is in his code. On Jun 28, 2013, at 7:23 AM, Anoop John anoop.hb...@gmail.com wrote: so i can not use default scan() constructor as it will scan whole table in one

Re: Schema design for filters

2013-06-28 Thread Michael Segel
Why is it that if all you have is a hammer, everything looks like a nail? ;-) On Jun 27, 2013, at 8:55 PM, James Taylor jtay...@salesforce.com wrote: Hi Kristoffer, Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)? You could model your schema much like an O/R mapper

Re: 答复: flushing + compactions after config change

2013-06-28 Thread Jean-Daniel Cryans
On Thu, Jun 27, 2013 at 4:27 PM, Viral Bajaria viral.baja...@gmail.com wrote: Hey JD, Thanks for the clarification. I also came across a previous thread which sort of talks about a similar problem.

Re: Poor HBase map-reduce scan performance

2013-06-28 Thread lars hofhansl
If we can make a clean patch with minimal impact to existing code I would be supportive of a backport to 0.94. -- Lars - Original Message - From: Bryan Keller brya...@gmail.com To: user@hbase.apache.org; lars hofhansl la...@apache.org Cc: Sent: Tuesday, June 25, 2013 1:56 AM Subject:

Re: Schema design for filters

2013-06-28 Thread Otis Gospodnetic
Kristoffer, You could also consider using something other than HBase, something that supports secondary indices, like anything that is Lucene based - Solr and ElasticSearch for example. We recently compared how we aggregate data in HBase (see my signature) and how we would do it if we were to

Re: How many column families in one table ?

2013-06-28 Thread Otis Gospodnetic
Hm, works for me - http://search-hadoop.com/m/qOx8l15Z1q42/column+families+fbsubj=Re+HBase+Column+Family+Limit+Reasoning Shorter version: http://search-hadoop.com/m/qOx8l15Z1q42 Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On

Re: is hbase cluster support multi-instance?

2013-06-28 Thread Otis Gospodnetic
You can have multiple and completely separate tables inside the same HBase cluster... Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Fri, Jun 28, 2013 at 12:35 AM, ch huang justlo...@gmail.com wrote: hi all: can hbase start

Re: Schema design for filters

2013-06-28 Thread Otis Gospodnetic
Hi, I see. Btw, isn't HBase overkill for 1M rows? Note that Lucene is schemaless and both Solr and Elasticsearch can detect field types, so in a way they are schemaless, too. Otis -- Performance Monitoring -- http://sematext.com/spm On Fri, Jun 28, 2013 at 2:53 PM, Kristoffer Sjögren

RE: is hbase cluster support multi-instance?

2013-06-28 Thread rajeshbabu chintaguntla
You can start multiple instances of HBase by maintaining a unique ZooKeeper quorum and HDFS for each HBase instance. Thanks and Regards, Rajeshbabu From: ch huang [justlo...@gmail.com] Sent: Friday, June 28, 2013 10:05 AM To: user@hbase.apache.org Subject:

Re: Schema design for filters

2013-06-28 Thread Asaf Mesika
Yep. Other DBs like Mongo may have the stuff you need out of the box. Another option is to encode the whole class using Avro and write a filter on top of that. You basically use one column and store it there. Yes, you pay the penalty of loading your entire class and extracting the fields you need

Re: 答复: flushing + compactions after config change

2013-06-28 Thread Viral Bajaria
On Fri, Jun 28, 2013 at 9:31 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Thu, Jun 27, 2013 at 4:27 PM, Viral Bajaria viral.baja...@gmail.com wrote: It's not random, it picks the region with the most data in its memstores. That's weird, because I see some of my regions which receive
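The selection rule quoted above (flush the region holding the most memstore data, not a random one) can be sketched as follows. This is an illustration only: pick_region_to_flush is a hypothetical helper, the region names and sizes are invented, and the real region server logic may apply further conditions that this snippet does not show.

```python
from typing import Dict


def pick_region_to_flush(memstore_bytes: Dict[str, int]) -> str:
    """Return the name of the region whose memstores hold the most data."""
    if not memstore_bytes:
        raise ValueError("no regions to choose from")
    return max(memstore_bytes, key=memstore_bytes.get)


# Hypothetical per-region memstore sizes in bytes.
sizes = {"region-a": 10_000_000, "region-b": 90_000_000, "region-c": 40_000_000}
chosen = pick_region_to_flush(sizes)
```

Under this rule a region that receives only a trickle of writes would normally not be the one chosen while busier regions hold more data.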

RE: How to specify the hbase.zookeeper.quorum on command line invoking hbase shell

2013-06-28 Thread rob mancuso
If you have the separate conf files in their own directories, you can run a script to set HBASE_CONF_DIR and HADOOP_CONF_DIR. Then just run hbase shell. On Jun 24, 2013 10:21 AM, rajeshbabu chintaguntla rajeshbabu.chintagun...@huawei.com wrote: --config conf dir arguments of a command will be

Re: How to specify the hbase.zookeeper.quorum on command line invoking hbase shell

2013-06-28 Thread Stack
On Sun, Jun 23, 2013 at 10:33 PM, Stephen Boesch java...@gmail.com wrote: We want to connect to a non-default / remote hbase server by setting hbase.zookeeper.quorum=our.remote.hbase.server on the command line invocation of hbase shell (and not disturbing the existing hbase-env.sh or

Re: Schema design for filters

2013-06-28 Thread Michel Segel
This doesn't make sense in that the OP wants a schemaless structure, yet wants filtering on columns. The issue is that you do have a limited schema, so "schemaless" is a misnomer. In order to do filtering, you need to enforce object type within a column, which requires a schema to be enforced.

Re: 答复: flushing + compactions after config change

2013-06-28 Thread Jean-Daniel Cryans
On Fri, Jun 28, 2013 at 2:39 PM, Viral Bajaria viral.baja...@gmail.com wrote: On Fri, Jun 28, 2013 at 9:31 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Thu, Jun 27, 2013 at 4:27 PM, Viral Bajaria viral.baja...@gmail.com wrote: It's not random, it picks the region with the most data