for your deletes?
On Wed, Nov 21, 2012 at 10:17 AM, Bing Jiang jiangbinglo...@gmail.com
wrote:
Yes, HBase ran a compaction between the batch-put and the deletes. Any ideas?
On Nov 21, 2012 11:10 PM, Michael Segel michael_se...@hotmail.com
wrote:
Some time later?
Time of course is relative, so
Salting is not a good idea and I don't know why people suggest it.
Case in point: you want to fetch a single row/record back. Because the salt
is arbitrary, you will need to send N get()s, one for each salt value.
Doing a simple one-way hash of the data, even appending the data,
there is not a
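A minimal sketch of the deterministic alternative Michael describes: derive the prefix from a one-way hash of the natural key itself, so a single get() can recompute the full row key from the key alone, unlike a random salt. The 4-byte MD5 prefix, the hex encoding, and the "-" separator are assumptions for illustration, not anything from the thread.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Deterministic hash prefix: anyone holding the natural key can rebuild
// the exact row key, so a point lookup is one get(), not N.
public class HashPrefixKey {

    // First 4 bytes of the MD5 of the key, hex-encoded (8 chars).
    public static String prefixFor(String naturalKey) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(naturalKey.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 4; i++) {
                sb.append(String.format("%02x", d[i]));
            }
            return sb.toString();
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Full row key: deterministic prefix plus the readable natural key.
    public static String rowKeyFor(String naturalKey) {
        return prefixFor(naturalKey) + "-" + naturalKey;
    }

    public static void main(String[] args) {
        System.out.println(rowKeyFor("user123"));
    }
}
```

The prefix spreads writes across regions like a salt would, but stays recomputable at read time.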
more comprehensive composite atomic operation available. If there is a
good reason for the API to include appends, then that reason applies here.
If there is no such reason, then you may ignore the appends in my
question.
Thanks,
Mike
From: Michael Segel michael_se
/HTableInterface.html#append%28org.apache.hadoop.hbase.client.Append%29
However, the point of my question is not specific to appends. I am asking
why HBase does not have checkMany-and-mutateMany.
Thanks,
Mike
From: Michael Segel michael_se...@hotmail.com
To: user@hbase.apache.org
Date
Ok, maybe this is a silly question on my part
Could you define what you mean by check and append?
With respect to HBase, how would that be different from check and put?
On Nov 18, 2012, at 12:28 AM, Mike Spreitzer mspre...@us.ibm.com wrote:
I am not looking at the trunk. I am just a
Just a guess... have you done any compactions on the table post bulk load?
On Nov 12, 2012, at 8:44 AM, Marcos Ortiz mlor...@uci.cu wrote:
Regards, Amit.
Did you tune the RegionServer where you have that data range hosted?
Why do you say that scans are slower after a bulk load?
Did you test
Ok...
First, if you're estimating that the raw data would be 10TB, you will find out
that you will need a bit more to handle the data in terms of indexing and
denormalized structures.
The short answer to your question is yes, you can do it.
Longer answer...
You can bake a solution in
There's an HDFS bandwidth setting which is set to 10MB/s.
Way too low for even 1GbE.
Have you modified this setting yet?
-Mike
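If the setting referred to here is the HDFS balancer/datanode bandwidth cap (an assumption; the property is never named in the thread), raising it in hdfs-site.xml would look something like:

```xml
<!-- hdfs-site.xml: raise the per-datanode balancing bandwidth cap.
     The property name and value here are assumptions, not from the thread. -->
<property>
  <name>dfs.datanode.balance.bandwidthPerSec</name>
  <value>104857600</value> <!-- 100 MB/s instead of a 10 MB/s-class default -->
</property>
```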
On Nov 3, 2012, at 2:50 PM, David Koch ogd...@googlemail.com wrote:
Hello Ted,
We never initiate major compaction manually. I have not looked at I/O
balance
2012, at 15:05, Michael Segel michael_se...@hotmail.com wrote:
There's an HDFS bandwidth setting which is set to 10MB/s.
Way too low for even 1GbE.
Have you modified this setting yet?
-Mike
On Nov 3, 2012, at 2:50 PM, David Koch ogd...@googlemail.com wrote:
Hello Ted,
We never
Just out of curiosity...
What's the impact on having regions of 10GB or larger?
What does that do to your footprint in memory and the time it takes to split or
compact a region?
-Mike
On Nov 1, 2012, at 8:35 AM, Kevin O'dell kevin.od...@cloudera.com wrote:
Couple thoughts (it is still
No, sorry, you have to disable the table in order to modify the table.
On Oct 30, 2012, at 9:33 AM, Mike mike20...@gmail.com wrote:
Hi All,
I use hbase 0.92 and I am trying to add a column family to hbase table
and I get the below error.
ERROR:
When I hear experimental and production in the same conversation, I get shivers
up my spine.
Which release(s) contain this flag?
On Oct 30, 2012, at 9:35 AM, Kevin O'dell kevin.od...@cloudera.com wrote:
Mike,
I have not messed around with the online schema changes too much. It is
still
.
-- Lars
From: Michael Segel michael_se...@hotmail.com
To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com
Sent: Monday, October 22, 2012 9:18 PM
Subject: Re: How to config hbase0.94.2 to retain deleted data
Curious, why do you think this is better than using the keep-deleted-cells
nobody should use TTL/VERSIONS, which is nonsense.
From: Michael Segel michael_se...@hotmail.com
To: lars hofhansl lhofha...@yahoo.com
Cc: user@hbase.apache.org user@hbase.apache.org
Sent: Tuesday, October 23, 2012 4:41 AM
Subject: Re: How to config
,
Delete, Increment, Append, RowMutations, etc)
Curious, why do you think this is better than using the keep-deleted-cells
feature?
(It might well be, just curious)
-- Lars
- Original Message -
From: Michael Segel michael_se...@hotmail.com
To: user@hbase.apache.org
I would suggest that you use your coprocessor to copy the data to a 'backup'
table when you mark them for delete.
Then as major compaction hits, the rows are deleted from the main table, but
still reside undeleted in your delete table.
Call it a history table.
On Oct 21, 2012, at 3:53 PM,
the keep-deleted-cells
feature?
(It might well be, just curious)
-- Lars
- Original Message -
From: Michael Segel michael_se...@hotmail.com
To: user@hbase.apache.org
Cc:
Sent: Sunday, October 21, 2012 4:34 PM
Subject: Re: How to config hbase0.94.2 to retain deleted data
I
Ouch...
That could get very nasty. You may end up with a lot of uneven splits.
Suppose your 'metric1' spans 3 regions, 'metric2' 1, but it's still in the same
split as 'metric1', and then 'metric3' is in two regions, 'metric4' is in two
regions where it's split between the end of 'metric3' and
someone
could describe the performance trade-off between Scan vs. Get.
Thanks again for anyone who read this far.
Neil Yalowitz
neilyalow...@gmail.com
On Wed, Oct 17, 2012 at 10:45 AM, Michael Segel
michael_se...@hotmail.comwrote:
Neil,
Since you asked
Actually your
Lars,
I think we need to clarify what we think of as a SAN.
It's possible to have a SAN where the disks appear as attached storage, while
the traditional view is that the disks are detached.
There are some design considerations like cluster density where one would want
to use a SAN like
Neil,
Since you asked
Actually your question is kind of a boring question. ;-) [Note I will probably
get flamed for saying it, even if it is the truth!]
Having said that...
Boring as it is, it's an important topic that many still seem to trivialize in
terms of its impact on performance.
Hi,
I'm a firm believer in KISS (Keep It Simple, Stupid)
The Map/Reduce (map job only) is the simplest and least prone to failure.
Not sure why you would want to do this using coprocessors.
How often are you running this job? It sounds like it's going to be sporadic.
-Mike
On Oct 17,
?
Or should I go with the initial idea of doing the Put with the M/R job
and the delete with HBASE-6942?
Thanks,
JM
2012/10/17, Michael Segel michael_se...@hotmail.com:
Hi,
I'm a firm believer in KISS (Keep It Simple, Stupid)
The Map/Reduce (map job only) is the simplest
Not really a good idea.
JDC hit the nail on the head.
You want to handle the setting on the HTable instance and not on the pool.
Just saying...
On Oct 10, 2012, at 3:09 AM, Jeroen Hoek jer...@lable.org wrote:
If you want to disable auto-flush in 0.92.1, one approach is to
override the
Silly question(s).
1) What sort of indexes do you want to build?
2) Why would you want to store your indexes outside of HBase?
(Ok they are not so silly. But I don't want people to think that I'm against
the idea, just that it's more of an issue of design.)
-Mike
On Oct 12, 2012, at 7:03
Well you don't want to do joins in HBase.
There are a couple of ways to do this; however, I think based on what you have
said... the larger issue for either solution (HBase or MySQL) would be your
schema design.
Basically you said you have Table A w 50 Million rows and Table B of 7 Million
:06, Michael Segel michael_se...@hotmail.com wrote:
I took it that the OP wants to store the rows A1-A3 in the order in which
they came in. So it could be A3, A1, A2 as an example.
So to do this you end up prefixing the rowkey with a timestamp or something.
This is not a good idea, and I
Actually I think you'd want to do the reverse.
Store your Lucene index in HBase. Which is what we did a while back.
This could be extended to SOLR, but we never had time to do it.
On Oct 5, 2012, at 4:11 AM, Lars George lars.geo...@gmail.com wrote:
Hi Otis,
My initial reaction was,
Depends.
What sort of system are you tuning?
Sorry, but we have to start somewhere and if we don't know what you have in
terms of hardware, we don't have a good starting point.
On Oct 5, 2012, at 7:47 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
Do most people start out with default
Silly question. Why do you care how your data is being stored?
Does it matter if the data is stored in rows where A1, A2, A3 are the order of
the keys, or
if it's A3, A1, A2?
If you say that you want to store the rows in order based on entry time, you're
going to also have to deal with a little
, 10 is before 9. So if your row key includes 1 ... 10,
it is necessary to pad the single digit with a leading 0.
Best Wishes
Dan Han
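A quick sketch of the padding Dan describes, assuming plain string keys: HBase compares row keys byte by byte, so unpadded numbers sort lexicographically and fixed-width padding restores numeric order.

```java
import java.util.Arrays;

// HBase sorts row keys as raw bytes, so "10" < "9" lexicographically.
// Zero-padding to a fixed width restores numeric ordering.
public class PaddedKeys {

    public static String pad(int n) {
        return String.format("%02d", n); // width must cover the max value
    }

    public static void main(String[] args) {
        String[] unpadded = {"1", "2", "9", "10"};
        Arrays.sort(unpadded);
        System.out.println(Arrays.toString(unpadded)); // [1, 10, 2, 9]

        String[] padded = {pad(1), pad(2), pad(9), pad(10)};
        Arrays.sort(padded);
        System.out.println(Arrays.toString(padded));   // [01, 02, 09, 10]
    }
}
```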
On Thu, Oct 4, 2012 at 9:19 AM, Michael Segel
michael_se...@hotmail.comwrote:
Silly question. Why do you care how your data is being stored?
Does
You really don't want to go to 20GB.
Without knowing the number of regions... going beyond 1-2 GB may cause more
headaches than it's worth.
Sorry, but I tend to be very cautious when it comes to tuning.
-Mike
On Oct 2, 2012, at 9:20 AM, Damien Hardy dha...@viadeoteam.com wrote:
Hello
Interesting.
So how do you manage the transaction? On the client or on the cluster?
On Oct 1, 2012, at 6:12 PM, de Souza Medeiros Andre andre.medei...@aalto.fi
wrote:
Hello all at this mailing list,
I'm glad to finally announce an HBase addon that I have been working on.
HAcid is a
I wouldn't 'prefix' the hash to the key, but actually replace the key with a
hash and store the unhashed key in a column.
But that's a different discussion.
In a nutshell, the problem is that there are a lot of potential use cases where
you want to store data in a sequence dependent fashion.
How much memory do you have?
What's the size of the underlying row?
What does your network look like? 1GbE or 10GbE?
There's more to it, and I think that you'll find that YMMV on what is an
optimum scan size...
HTH
-Mike
On Sep 12, 2012, at 7:57 AM, Amit Sela am...@infolinks.com wrote:
Hi
can use a fast and simple hashing algorithm, because
you do not need the hash to be unique.
Depends again on various aspects.
- Original Message -
From: Michael Segel michael_se...@hotmail.com
To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com
Cc:
Sent: Wednesday
On Sep 10, 2012, at 12:32 PM, Tom Brown tombrow...@gmail.com wrote:
We have our system setup such that all interaction is done through
co-processors. We update the database via a co-processor (it has the
appropriate logic for dealing with concurrent access to rows), and we
also
Well,
Lets actually skip a few rounds of questions... and start from the beginning.
What does your physical cluster look like?
On Sep 10, 2012, at 12:40 PM, Ramasubramanian
ramasubramanian.naraya...@gmail.com wrote:
Hi,
Will be helpful if u say specific things to look into. Pls help
I think the issue is that you are misinterpreting what you are seeing and what
Doug was trying to tell you...
The short simple answer is that you're getting one split per region. Each split
is assigned to a specific mapper task and that task will sequentially walk
through the table finding the
I think you have to understand what happens as a table splits.
If you have a composite key where the first field has the value between 0-9 and
you pre-split your table, you will have all of your 1's going to the single
region until it splits. But both splits will start on the same node until
a little slow until the regions split and distribute
effectively.
That make sense?
On Tue, Sep 4, 2012 at 1:34 PM, Michael Segel
michael_se...@hotmail.comwrote:
I think you have to understand what happens as a table splits.
If you have a composite key where the first field has
What row keys are you skipping?
Using your example...
You have a start row of 200, and an end key of \xFF\xFF\xFF\xFF\xFF\xFF00350.
Note that you could also write that end key as \xFF(1..6)01 since it looks like
you're trying to match the 00 in positions 7 and 8 of your numeric string.
Can you disable the table?
How much free disk space do you have?
Is this a production cluster?
Can you upgrade to CDH3u5?
Are you running a capacity scheduler or fair scheduler?
Just out of curiosity, what would happen if you could disable the table, alter
the table's max file size and then
Ah... schema design...
Yes you have both options identified... but just to add a twist... in the
column name, prepend the (epoch - timestamp) to the message id. This will put
the messages in reverse order.
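A sketch of the reverse-order qualifier described above, using Long.MAX_VALUE rather than a literal "epoch" as the fixed maximum; the zero-padding width and the "-" separator are assumptions for illustration.

```java
// Subtracting the timestamp from a fixed maximum makes newer messages
// sort first under HBase's ascending byte order for column qualifiers.
public class ReverseOrderQualifier {

    public static String qualifierFor(long timestampMillis, String messageId) {
        // Zero-pad so the numeric part keeps a fixed width (Long.MAX_VALUE
        // has 19 digits), otherwise byte ordering breaks again.
        return String.format("%019d-%s", Long.MAX_VALUE - timestampMillis, messageId);
    }

    public static void main(String[] args) {
        String older = qualifierFor(1000L, "msg-a");
        String newer = qualifierFor(2000L, "msg-b");
        // The newer message's qualifier sorts before the older one's.
        System.out.println(newer.compareTo(older) < 0);
    }
}
```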
The only drawback to this is that it's theoretically possible to create a row
which
I think you need to think outside of the box...
I've thought about it a little more and while there's validity to indexing at
the RS, there's a bit more of a headache.
But I think you've been too dismissive of looking at the index at the table
level and not at the region level.
On Aug 14,
wrote:
On Tue, Aug 14, 2012 at 7:38 PM, Michael Segel
michael_se...@hotmail.com wrote:
I think you need to think outside of the box...
But I think you've been too dismissive of looking at the index at the table
level and not at the region level.
I'd be interested if you can point out exactly
:58 AM, Michael Segel
michael_se...@hotmail.comwrote:
Yes, it can.
You can see RS failure causing a cascading RS failure. Of course YMMV and
it depends on which version you are running.
OP is on CDH3u2 which still had some issues. CDH3u4 is the latest and he
should upgrade.
(Or go
will be relying
heavily on Fault Tolerance.
If HBase Bulk Loader is fault tolerant to failure of RS in a viable
environment then I don't have any issue. I hope this clears up my purpose
of posting on this topic.
Thanks,
Anil
On Mon, Aug 13, 2012 at 12:39 PM, Michael Segel michael_se
Not really a good idea or anything new.
Essentially a full table scan where you're doing a closer inspection on the key
to see if it matches your search regex, before actually fetching the entire row
and returning it.
Secondary indexes are pretty straightforward.
You have your primary key
, Michael Segel michael_se...@hotmail.com wrote:
Anil,
I don't know if you can call it a bug if you don't have enough memory
available.
I mean if you don't use HBase, then you may have more leeway in terms of
swap.
You can also do more tuning of HBase to handle the additional latency
You don't want to do that.
I mean you really don't want to do that. ;-)
You would be better off doing a strong encryption at the cell level. You can
use co-processors to do that if you'd like.
YMMV
On Aug 8, 2012, at 9:17 AM, Mohammad Tariq donta...@gmail.com wrote:
Hello Stack,
Would
,
Mohammad Tariq
On Wed, Aug 8, 2012 at 8:01 PM, Michael Segel michael_se...@hotmail.com
wrote:
You don't want to do that.
I mean you really don't want to do that. ;-)
You would be better off doing a strong encryption at the cell level. You can
use co-processors to do that if you'd
While this may be a trivial fix, have you considered possible downsides to the
implementation?
I'm not sure it's a bad idea, but one that could have some potential issues when
put into practice.
-Mike
On Aug 7, 2012, at 7:30 PM, lars hofhansl lhofha...@yahoo.com wrote:
I filed HBASE-6522.
.
(the canonical example is that nothing stops a RegionObserver implementation
from calling System.exit(), taking the RegionServer with it).
-- Lars
- Original Message -
From: Michael Segel michael_se...@hotmail.com
To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com
or substring (using special
delimiters for date)
Second, the extracted value must be parsed to Long and set to a RowFilter
Comparator like this:
- Original Message -
From: Michael Segel michael_se...@hotmail.com
To: user@hbase.apache.org
CC:
Sent: 13:52 Wednesday, 1
Actually with coprocessors you can create a secondary index in short order.
Then your cost is going to be 2 fetches. Trying to do a partial table scan will
be more expensive.
On Jul 31, 2012, at 12:41 PM, Matt Corgan mcor...@hotpads.com wrote:
When deciding between a table scan vs secondary
Which release?
On Jul 31, 2012, at 5:13 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
I am seeing null row key and I am wondering how I got the nulls in there.
Is it possible when using HBaseClient that a null row might have got
inserted?
Amian,
Like always the answer to your question is... it depends.
First, how much data are we talking about?
What's the value of the underlying data?
One possible scenario...
You run a M/R job to copy data from the table to an HDFS file, that is then
copied to attached storage on an edge
Message-
From: Michael Segel [mailto:michael_se...@hotmail.com]
Sent: Monday, July 23, 2012 8:19 PM
To: user@hbase.apache.org
Subject: Re: Hbase bkup options
Amian,
Like always the answer to your question is... it depends.
First, how much data are we talking about?
What's
Ok, I'll take a stab at the shorter one. :-)
You can create a base data table which contains your raw data.
Depending on your index... like an inverted table, you can run a map/reduce job
that builds up a second table. And a third, a fourth... depending on how many
inverted indexes you want.
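A minimal sketch of the inverted-index build described above. In the map/reduce version each map() call would emit a Put to the index table; here the emission is modeled as plain string pairs so the shape of the index rows is visible without a cluster. The "value:rowKey" composite index key is an assumption, used so duplicate values stay unique.

```java
import java.util.ArrayList;
import java.util.List;

// For each (rowKey, value) in the base table, the index table gets a row
// keyed by the value, pointing back at the base row.
public class InvertedIndexSketch {

    // Each element of baseRows is {rowKey, value};
    // each result element is {indexRowKey, baseRowKey}.
    public static List<String[]> invert(List<String[]> baseRows) {
        List<String[]> indexRows = new ArrayList<>();
        for (String[] row : baseRows) {
            String rowKey = row[0], value = row[1];
            // Appending the base row key keeps duplicate values unique.
            indexRows.add(new String[]{value + ":" + rowKey, rowKey});
        }
        return indexRows;
    }

    public static void main(String[] args) {
        List<String[]> base = new ArrayList<>();
        base.add(new String[]{"row1", "chicago"});
        base.add(new String[]{"row2", "chicago"});
        for (String[] idx : invert(base)) {
            System.out.println(idx[0] + " -> " + idx[1]);
        }
    }
}
```

A second or third inverted index is just another pass with a different value column.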
Find a different row key?
The problem with merging regions is that once you merge the regions, any net
new regions will still have the same problem. So you'll have to merge again,
and again and again.
You're always filling to the left of the last key.
In order to merge, you have to take the
First,
A caveat... Schema design in HBase is one of the hardest things to teach/learn
because it's so open. There is more than one correct answer when it comes to
creating a good design...
Ian's presentation kind of tries to relate HBase schema design to relational
modeling.
From past
Currently there is a hardcoded limit on the number of regions that a region
server can manage.
It's 1500.
Note that if the number of regions gets to around 1000 regions per region
server, you end up with a performance hit. (YMMV)
So if you have 1 region per table, there's a real limit of 1500
I'm going from memory. There was a hardcoded number. I'd have to go back and
try to find it.
From a practical standpoint, going over 1000 regions per RS will put you on
thin ice.
Too many regions can kill your system.
On Jul 13, 2012, at 12:36 PM, Kevin O'dell wrote:
Mike,
I just saw
Uhm... I'd take a step back...
Thanks for the reply. I didn't realize that all the non-MR tasks were this
CPU bound; plus my naive assumption was that four spindles will have a hard
time supplying data to MR fast enough for it to become bogged down.
Your gut feel is correct.
If you go w
Regardless,
It's still a bad design.
On Jul 9, 2012, at 10:02 PM, Jonathan Hsieh wrote:
Keith,
The HBASE-3584 feature is in 0.94, and we are strongly considering a 0.94
version for a future CDH4 update. There is very little chance this
will get into a CDH3 release.
Jon.
On Thu,
help a lot with the defect tracking when
someone faces the issue and sees the stack trace.
JM
2012/7/9, Michael Segel michael_se...@hotmail.com:
Jean-Marc,
I think you misunderstood.
At run time, you can query HBase to find out the table schema and its column
families.
While I agree
This may beg the question ...
Why do you not know the CF?
Your table schemas only consist of tables and CFs. So you should know them at
the start of your job or m/r Mapper.setup();
On Jul 9, 2012, at 7:25 AM, Jean-Marc Spaggiari wrote:
Hi,
When we try to add a value to a CF which does
really.
2012/7/9, Michael Segel michael_se...@hotmail.com:
This may beg the question ...
Why do you not know the CF?
Your table schemas only consist of tables and CFs. So you should know them
at the start of your job or m/r Mapper.setup();
On Jul 9, 2012, at 7:25 AM, Jean-Marc Spaggiari
I was going to post this yesterday, but real work got in the way...
I have to ask... why are you deleting anything from your columns?
The reason I ask is that you're sync'ing an object from an RDBMS to HBase.
While HBase allows fields that contain NULL not to exist, your RDBMS doesn't.
Your
, Michael Segel michael_se...@hotmail.com wrote:
I was going to post this yesterday, but real work got in the way...
I have to ask... why are you deleting anything from your columns?
The reason I ask is that you're sync'ing an object from an RDBMS to
HBase. While HBase allows fields that contain
No, you need to know your key ranges for each split. If you don't and you guess
wrong, you may end up not seeing any benefits because your data may still end
up going to a single region...
(It's data dependent.)
I am personally not a fan of pre-splitting a table.
The way I look at it, you
Timestamps on the cells themselves?
# Versions?
On Jul 3, 2012, at 4:54 AM, Sever Fundatureanu wrote:
Hello,
I have a simple table with 1.5 billion rows and one column family 'F'.
Each row key is 33 bytes and the cell values are void. By doing the math I
would expect this table to take
Hi,
You're over thinking this.
Take a step back and remember that you can store anything you want as a byte
stream in a column.
Literally.
So you have a record that could be a text blob. Store it in one column. Use
JSON to define its structure and fields.
The only thing that makes it
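The blob-in-a-column idea above can be sketched as follows. The HBase Put itself is omitted, and the JSON string is built by hand to stay dependency-free; in real code any JSON library would do, and the field names here are assumptions.

```java
import java.nio.charset.StandardCharsets;

// The whole record is serialized to UTF-8 bytes and stored in a single
// column; JSON carries the structure instead of the schema.
public class JsonBlobColumn {

    public static byte[] encode(String id, String body) {
        String json = "{\"id\":\"" + id + "\",\"body\":\"" + body + "\"}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    public static String decode(byte[] cellValue) {
        return new String(cellValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] cell = encode("42", "hello");
        // In real code: put.addColumn(family, qualifier, cell)
        System.out.println(decode(cell));
    }
}
```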
What's the status of Hadoop and IPV6 vs IPV4?
On Jul 3, 2012, at 7:07 AM, AnandaVelMurugan Chandra Mohan wrote:
Hi,
These are text from the files
/etc/hosts
127.0.0.1 localhost
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost
!
JM
2012/7/3, Michael Segel michael_se...@hotmail.com:
Hi,
You're over thinking this.
Take a step back and remember that you can store anything you want as a byte
stream in a column.
Literally.
So you have a record that could be a text blob. Store it in one column. Use
JSON to define
Lloyd Bridges talks about how today was a bad day
for giving up insert your favorite drug ...
Sorry to side track but I thought I'd give a more detailed explanation ...
On Jul 2, 2012, at 2:51 AM, Stack wrote:
On Mon, Jul 2, 2012 at 7:11 AM, Michael Segel michael_se...@hotmail.com
wrote
of explanation is this???
Regards,
Mohammad Tariq
On Mon, Jul 2, 2012 at 5:10 PM, Michael Segel michael_se...@hotmail.com
wrote:
Sorry St. Ack,
Which is why I said that I was losing it...
The entire quote was...
On Sun, Jul 1, 2012 at 2:05 PM, Jay Wilson
registrat
I'm sorry I'm losing it.
Running RS on a machine where DN isn't running?
So then the RS can't store its regions locally. Not sure if that would ever be
a good idea or recommended.
Though the initial question was about running ZK on the same node as an RS, which
isn't a good idea and a recipe for
Sounds like your .Meta. table is corrupted.
Thought that was fixed in 90.4...
On Jun 28, 2012, at 1:26 PM, Kasturi wrote:
Hi,
I have HBase master running on 3 nodes and region server on 4 other nodes on a
Mapr hadoop cluster. We have been using it for a while, and it was working
fine.
.
Schema design is a bit tricky to master because it's going to be data dependent
along with your use case.
On Jun 25, 2012, at 2:32 AM, Marcin Cylke wrote:
On 21/06/12 14:33, Michael Segel wrote:
I think the version issue is the killer factor here.
Usually performing a simple get() where you
One way..,
Create an NFS mountable directory for your cluster and mount on all of the DNs.
You can either place a symbolic link in /usr/lib/hadoop/lib or add the jar to
the classpath in /etc/hadoop/conf/hadoop-env.sh
(Assuming Cloudera)
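The two options sketched as they might look on a Cloudera layout (the NFS mount point and jar name are assumptions):

```shell
# Option 1: symlink the jar from the NFS mount into Hadoop's lib directory
#   ln -s /mnt/nfs/jars/shared-lib.jar /usr/lib/hadoop/lib/shared-lib.jar
# Option 2: extend the classpath in /etc/hadoop/conf/hadoop-env.sh
#   export HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:/mnt/nfs/jars/*"
```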
On Jun 27, 2012, at 12:47 PM, Evan Pollan wrote:
Network is always good to check, it's all fun and games until an
interface negotiates 100Mb.
50ms per get sounds a bit extreme.
mini-rant
Funny you should mention hardware.
I did submit a talk on cluster design to Strata (NY and London) Seems it didn't
make the cut on NY, but who knows
There are a couple of issues and I'm sure others will point them out.
If you turn off speculative execution on the job, you don't get duplicate tasks
running in parallel.
You could create a table to store your aggregations on a per job basis where
your row-id could incorporate your job-id.
I think the version issue is the killer factor here.
Usually performing a simple get() where you are getting the latest version of
the data on the row/cell occurs in some constant time k. This is constant
regardless of the size of the cluster and should scale in a near linear curve.
As JD C
While data locality is nice, you may see it becoming less of a bonus or issue.
With Co-processors available, indexing becomes viable. So you may see things
where within the M/R you process a row from table A, maybe hit an index to find
a value in table B and then do some processing.
16, 2012 at 6:33 PM, Michael Segel
michael_se...@hotmail.comwrote:
Jean-Marc,
You indicated that you didn't want to do full table scans when you want
to
find out which files hadn't been touched since X time has past.
(X could be months, weeks, days, hours, etc ...)
So here's the thing
Assuming that you have an Apache release (Apache, HW, Cloudera) ...
(If MapR, replace the drive and you should be able to repair the cluster from
the console. Node doesn't go down. )
Node goes down.
10 min later, cluster sees node down. Should then be able to replicate the
missing blocks.
Hi,
The simple way to do this as a map/reduce is the following
Use the HTable Input and scan the records you want to delete.
Inside Mapper.setup(), create a connection to the HTable where you want to
delete the records.
Inside Mapper.map(), for each iteration you will get a row which
with job!);
}
} catch (Exception e) {
LOG.error(e.getMessage(), e);
}
}
}
On Wed, Jun 20, 2012 at 7:41 AM, Michael Segel
michael_se...@hotmail.comwrote:
Hi,
The simple way to do this as a map/reduce is the following
Use the HTable
Sure, why not?
You can always open a connection to the counter table in your Mapper.setup()
method and then increment the counters within the Mapper.map() method.
Your update of the counter is an artifact and not the output of the
Mapper.map() method.
On Jun 18, 2012, at 7:49 PM, Sid Kumar
Since you don't have OLTP, the terms need to be better defined.
What is meant by an 'uncommitted' write in HBase?
RLL in RDBMS is different than RLL in HBase. You don't have the concept of a
transaction in HBase.
-Mike
On Jun 17, 2012, at 10:32 PM, Anoop Sam John wrote:
Hi
You
if
more efficient?
JM
2012/6/15, Michael Segel michael_se...@hotmail.com:
Thought about this a little bit more...
You will want two tables for a solution.
Table 1: Key: Unique ID
Column: FilePath, Value: full path to file
Column: Last
to achieve this
goal (using co-processors). I don't know yet how this part is working,
so I will dig the documentation for it.
Thanks,
JM
2012/6/14, Michael Segel michael_se...@hotmail.com:
Jean-Marc,
You do realize that this really isn't a good use case for HBase, assuming
that what
Actually I think you should revisit your key design
Look at your access path to the data for each of the types of queries you are
going to run.
From your post:
I have a table with a uniq key, a file path and a last update field.
I can easily find back the file with the ID and find when it
will be, the more up to date I will be able to keep it.
JM
2012/6/14, Michael Segel michael_se...@hotmail.com:
Actually I think you should revisit your key design
Look at your access path to the data for each of the types of queries you
are going to run.
From your post:
I have a table with a uniq
the documentation for it.
Thanks,
JM
2012/6/14, Michael Segel michael_se...@hotmail.com:
Jean-Marc,
You do realize that this really isn't a good use case for HBase, assuming
that what you are describing is a stand alone system.
It would be easier and better if you just used a simple
UUIDs are unique but not necessarily random and even in random samplings, you
may not see an even distribution except over time.
Sent from my iPhone
On Jun 12, 2012, at 3:18 AM, Simon Kelly simongdke...@gmail.com wrote:
Hi
I'm getting some unexpected results with a pre-split table where
been able to find any
docs on what format the splits keys should be in so I've used what's
produced by Bytes.toStringBinary. Is that correct?
Simon
On 12 June 2012 10:23, Michael Segel michael_se...@hotmail.com wrote:
UUIDs are unique but not necessarily random and even in random
servers so I need to try and get as much as I can from the get go.
Simon
On 12 June 2012 13:37, Michael Segel michael_se...@hotmail.com wrote:
Ok,
Now that I'm awake, and am drinking my first cup of joe...
If you just generate UUIDs you are not going to have an even distribution.
Nor