Jason Rutherglen:
Hello,
I'm curious as to what a 'good' approach would be for implementing
search in HBase (using Lucene), with the end goal being the integration
of realtime search into HBase. I think the use case makes sense, as
HBase is realtime and has a write-ahead log, performs ...
On Fri, Feb 11, 2011 at 4:13 PM, Ted Dunning tdunn...@maprtech.com wrote:
On Fri, Feb 11, 2011 at 3:50 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
I can't imagine that the speed achieved by using HBase would be even
within orders of magnitude of what you can do in Lucene 4.
Hello,
Can you please tell me if this is the proper way of designing a table that
has an auto-increment key? If there's a better way, please let me know that
as well.
After reading the mail archives, I learned that the best way is to use the
'incrementColumnValue' method of HTable.
So ...
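For context, HTable.incrementColumnValue performs an atomic, server-side increment of a single cell and returns the new value, which is what makes it safe for generating sequence numbers without a read-modify-write race. A minimal pure-Java sketch of those semantics follows; the class and the map-based storage are illustrative, not HBase API:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative model of incrementColumnValue semantics: one atomic
// counter per (row, column) cell; the increment returns the new value.
class CounterTableSketch {
    private final ConcurrentHashMap<String, AtomicLong> cells =
            new ConcurrentHashMap<>();

    long incrementColumnValue(String row, String column, long amount) {
        return cells.computeIfAbsent(row + "/" + column, k -> new AtomicLong())
                    .addAndGet(amount);
    }
}
```

Because the increment and the read of the new value happen as one operation, two clients asking for the next id can never receive the same number.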
On Sun, Feb 13, 2011 at 8:29 AM, Mike Spreitzer mspre...@us.ibm.com wrote:
Yes, I simply took the Hadoop 0.20.2 release, deleted its hadoop-core.jar,
and replaced it with the contents of
lib/hadoop-core-0.20-append-r1056497.jar from hbase.
I'm not sure what to do with this approach might ...
Transactional consistency isn't going to happen if you involve more
than one HBase row.
What does this mean? Or rather, can you elaborate?
What they need is that documents can be found very shortly
after they are inserted and that crashes won't compromise that.
Right. I think HBase ...
Google's percolator paper.
Can you post a link?
Another issue is that the scalability needs for search may be
different. An HBase region is only ever active on one region server; there
are no active replicas. For search, you often need replicas to scale,
since a search will ...
HBase bulk load (using the configureIncrementalLoad helper method) configures the
job to create as many reducer tasks as there are regions in the HBase table. So if
there are a few hundred regions, the job would spawn a few hundred reducer
tasks. This could get very slow on a small cluster...
Is there any ...
Do you want to do Term- or Document partitioning?
It sounds like no one uses term partitioning, doc-partitioning seems
to be the most logical default?
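For the terminology above: term partitioning splits the index by term (each shard holds the full postings for a subset of terms), while document partitioning routes each whole document to one shard by its id, so indexing touches a single shard and a query fans out to all shards and merges results. A small illustrative sketch of the document-routing half, with hypothetical names:

```java
// Illustrative sketch of document partitioning: each document is routed
// to exactly one index shard by hashing its id, so all postings for a
// given document live together and indexing touches only that shard.
class DocPartitioner {
    private final int numShards;

    DocPartitioner(int numShards) {
        this.numShards = numShards;
    }

    // Deterministic routing: the same document id always maps to the
    // same shard, which is what lets updates find the right shard.
    int shardFor(String docId) {
        return Math.floorMod(docId.hashCode(), numShards);
    }
}
```

Queries, by contrast, must be broadcast to all numShards shards; that fan-out is the cost paid for the simpler failure mode, since losing one shard only loses that shard's documents rather than a slice of every posting list.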
serve the index shards from memory
In Lucene-land this is a function of allocating enough RAM for the
system I/O cache.
On Sun, Feb 13, 2011 at ...
I think there's another way to look at this, and that is: what types of
queries do HBase users perform that search could enhance? E.g., given that we
can index extremely quickly with Lucene, and with RT we can search with
near-zero latency, perhaps there are new queries that would be of
interest/useful to ...
you can also stripe, e.g.:
c_1 starts at 1, skip=100
c_2 starts at 2, skip=100
c_$i starts at $i, skip=100, for $i in 3..100
now you have 100x speed/parallelism. If single-regionserver
assignment becomes a problem, use multiple tables.
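The striping scheme above works because counter i hands out values i, i+100, i+200, and so on, so the 100 counters emit disjoint id sequences and clients can increment different counters in parallel. A self-contained sketch (class name and layout are illustrative):

```java
// Illustrative sketch of striped counters: counter i starts at i and
// advances by `stripes`, so stripes never collide and different clients
// can increment different counters concurrently.
class StripedSequence {
    private final long stripes;  // total number of stripes, e.g. 100
    private long next;           // next id this stripe will hand out

    StripedSequence(long stripe, long stripes) {
        this.stripes = stripes;
        this.next = stripe;      // stripe in 1..stripes
    }

    long nextId() {
        long id = next;
        next += stripes;         // jump over the other stripes' ids
        return id;
    }
}
```

Note the ids are unique but not globally ordered across stripes; if strict ordering matters, striping is the wrong tool.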
On Sun, Feb 13, 2011 at 10:12 PM, Lars George lars.geo...@gmail.com wrote:
Doc-partitioning has much better failure modes and is universal in my
experience for serious applications.
On Sun, Feb 13, 2011 at 6:01 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
Do you want to do Term- or Document partitioning?
It sounds like no one uses term partitioning,
I would avoid this, personally.
Serious transactions and complex queries are pretty much incompatible with
simple implementation and large scale.
Flow-based updates and write-behind are more the norm.
On Sun, Feb 13, 2011 at 6:09 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
I ...
The DFS errors appear after the server aborts. What is in the log before the
server abort? It doesn't seem to show any reason here, which is unusual.
Anything in the master? Did it time out this RS? Are you running with
replication = 1?
-Original Message-
From: Bradford Stephens
We've got dfs.replication = 3 in hdfs-site.xml
doing a grep for FATAL and the surrounding 50 lines yields this:
Regionserver log: http://pastebin.com/3cYYNhct
The HMaster and DataNode logs seem pretty boring, no errors; just long
sections of scheduling/deleting blocks...
Restarted the HBase ...