You could start with YCSB, http://hbase.apache.org/book.html#d470e4911, Jason?
St.Ack
On Fri, Jun 3, 2011 at 5:31 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
I'm looking for a sample data set to benchmark the Lucene FST,
specifically the keys. I'm guessing a common key type for
I varied the ms increment randomly between 1-20, then created 10 mil
dates. The FST was then 58,481,582 bytes, i.e., about 55.8 MB. Guess it's
not perfect! 19,739,994 bytes, i.e., 18.8 MB, for random 1-5 increments. I
think that's still pretty good. I need to try varying the long value
stored alongside to
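A minimal sketch of how a key set like the one described above could be generated, for anyone wanting to reproduce the experiment. The function name, the start timestamp, and the 8-byte big-endian encoding are assumptions made for illustration; the thread does not say how the dates were actually encoded before going into the FST.

```python
import random


def make_timestamp_keys(count, max_increment_ms, start_ms=1_307_145_600_000, seed=0):
    """Return `count` strictly increasing timestamp keys as 8-byte strings.

    Each key is the previous timestamp plus a random increment in
    [1, max_increment_ms] milliseconds. start_ms is an arbitrary
    (hypothetical) starting point; seed makes the run repeatable.
    """
    rng = random.Random(seed)
    keys = []
    current = start_ms
    for _ in range(count):
        current += rng.randint(1, max_increment_ms)
        # Big-endian encoding keeps byte-wise order equal to numeric order,
        # which matters for structures that expect keys in sorted order.
        keys.append(current.to_bytes(8, "big"))
    return keys


if __name__ == "__main__":
    sample = make_timestamp_keys(count=5, max_increment_ms=20)
    for key in sample:
        print(key.hex())
```

Raising max_increment_ms spreads the keys further apart, so neighbouring keys share less structure.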
From: Todd Lipcon t...@cloudera.com
Not to be too mean and discouraging to everyone passing around patches
against CDH3 and/or 0.20-append, but just an FYI: there is no chance
that these things will get committed to an 0.20 branch without first
going through trunk. Sharing patches and testing
Here's some more data for the 10 mil dates:
68.1 MB random increment up to 1000
87.1 MB random increment up to 10,000
162.1 MB total not using the FST
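To put these numbers side by side, here is a small sketch that converts the byte counts quoted earlier in the thread to MB (assuming 1 MB = 2^20 bytes, which matches the 18.8 MB figure) and expresses each FST size as a fraction of the 162.1 MB non-FST total. The helper names are made up for this example.

```python
MIB = 2 ** 20  # assumption: "MB" in the thread means 2^20 bytes


def bytes_to_mb(n_bytes):
    """Convert a byte count to MB using 1 MB = 2^20 bytes."""
    return n_bytes / MIB


def fraction_of_baseline(size_mb, baseline_mb=162.1):
    """Size of the FST relative to the 162.1 MB non-FST total."""
    return size_mb / baseline_mb


# Sizes reported in the thread for 10 million dates.
fst_sizes_mb = {
    "random 1-5 ms increments": bytes_to_mb(19_739_994),
    "random 1-20 ms increments": bytes_to_mb(58_481_582),
    "random increment up to 1000": 68.1,
    "random increment up to 10,000": 87.1,
}

if __name__ == "__main__":
    for label, size_mb in fst_sizes_mb.items():
        share = fraction_of_baseline(size_mb)
        print(f"{label}: {size_mb:.1f} MB, {share:.0%} of the non-FST total")
```

On these figures the FST stays at a little over half the non-FST size even with increments up to 10,000.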
On Fri, Jun 3, 2011 at 10:57 PM, Stack st...@duboce.net wrote:
That can't be true? (smile) How would you search a 'key' in the FST?
St.Ack
On Fri, Jun 3, 2011 at 7:03 PM, Matt Corgan mcor...@hotpads.com wrote:
Pluggable formats would help here so you could tune for mem vs cpu.
More history. At the time of KV and hfile incubation, we thought
about making these building blocks pluggable but it was thought that
there would be a
I want to take a wh/hack at creating a pluggable block index, is there
an open issue for this? I looked and couldn't find one.
You'd have to change how the Scanner code works, etc. You'll find out.
Nice! Sounds fun.
On Sat, Jun 4, 2011 at 3:27 PM, Ryan Rawson ryano...@gmail.com wrote:
What are the specs/goals of a pluggable block index? Right now the
block index is fairly tied deep in how HFile works. You'd have
Also, don't break it :-)
Part of the goal of HFile was to build something quick and reliable.
It can be hard to know you have all the corner cases down and you
won't find out in 6 months that every single piece of data you have
put in HBase is corrupt. Keeping it simple is one strategy.
I have
It can be hard to know you have all the corner cases down and you
won't find out in 6 months that every single piece of data you have
put in HBase is corrupt. Keeping it simple is one strategy.
Isn't the block index separate from the actual data? So corruption in
that case is unlikely.
I
Oh BTW, you can't mmap anything in HBase unless you copy it to local
disk first. HDFS = no mmap.
Just thought you'd like to know.
On Sat, Jun 4, 2011 at 3:41 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
It can be hard to know you have all the corner cases down and you
won't find out
Oh BTW, you can't mmap anything in HBase unless you copy it to local
disk first. HDFS = no mmap.
Right. I know that! Once the block index is pluggable, the FST would
be an in heap byte[].
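For readers following along, a rough sketch of what "pluggable" could mean here: the reader only needs something that maps a search key to the block that may contain it. The class and method names below are hypothetical and this is not how HFile is actually written; it is a plain sorted-array index held on the heap, which an FST-backed implementation exposing the same find_block method could replace.

```python
import bisect


class SortedArrayBlockIndex:
    """Hypothetical in-heap block index: first key of each block -> offset."""

    def __init__(self, entries):
        # entries: list of (first_key, block_offset), sorted by first_key.
        self._first_keys = [first_key for first_key, _ in entries]
        self._offsets = [offset for _, offset in entries]

    def find_block(self, key):
        """Return the offset of the block that could hold `key`, or None."""
        # The candidate block is the last one whose first key is <= key.
        i = bisect.bisect_right(self._first_keys, key) - 1
        if i < 0:
            return None  # key sorts before the first block
        return self._offsets[i]


if __name__ == "__main__":
    index = SortedArrayBlockIndex([(b"apple", 0), (b"mango", 4096), (b"zebra", 8192)])
    print(index.find_block(b"banana"))
```

Any replacement index would have to return the same block for the same key as the structure it replaces, which is one way to test a new implementation against the old one.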
On Sat, Jun 4, 2011 at 3:49 PM, Ryan Rawson ryano...@gmail.com wrote:
Oh BTW, you can't mmap anything
I mentioned a bunch of stuff in that prefix compression email about cache
lines, prefetching, trie node sizes, etc... The gist of it all is that
memory has become relatively slow, to the point where you need to start
thinking about it the way we think about disk and network.
I dug up and cleaned