Sent to you by Sean McBride via Google Reader: Anticipating the Next
Generation of Search via Alt Search Engines by Guest Author on 9/30/08

Hank Williams

Most of the world’s most important information has structure. To be
clear, by structure I mean that the data has separate fields for its
component parts. For example, a contact has separate fields for
first name, last name, address, city, country, etc.

This structured information is where most of the value from the
Internet resides. For example, all e-commerce is centered around
structured data like product information records which have fields like
part numbers, prices, descriptions, etc. I would suggest that
structured data, on the whole, has one or more orders of magnitude
greater economic value than unstructured data.

And yet we currently have no centralized way to find structured
information. Today the process is very ad hoc. We must know where to
look in order to find what we are looking for. When we want to buy a
new TV perhaps we check Amazon, Best Buy, and Circuit City. If we want
to find people to potentially hire, perhaps we go to LinkedIn. If we
want to buy Beanie Babies we go to eBay.

But shouldn’t it be possible to find structured information from a
centralized source like Google? Yes, we have vertical search engines,
but that is entirely the wrong concept. There is nothing at all
vertical (i.e. narrow) about structured search, and creating separate
types of engines for each type of data is really the wrong thing to do.
Perhaps we think of this problem as “vertical” just because it happens
to be hard. But this is indeed one of the broadest remaining problems
on the Internet.

While the technical challenges are significant, the opportunity here is
huge. This is because the direct economic value of structured data is,
as I suggested above, *much* greater than that of raw text. And so the
company that brings us structured search might have the potential to
be *at least* as valuable as Google.

Understanding the Problem

The current form of the search engine makes lots of sense when the data
you are searching is just a river of text, and all you are looking for
is whether a set of words is present, or even with semantic search,
whether a concept is present. But as the Internet becomes a web of
structured data and you want to find records of a particular type with,
for example, fields within a particular range, how will that work?

The first thing to consider is that the Web, despite what people would
like to think, is not really a collection of millions of independent
servers operated by different people and companies. The web is a
collection of servers that are all hooked into the major search engines
— forming a kind of singular hive. These search engines operate as the
brain of the Internet. They mirror a copy of most every bit of data on
the Internet and index it inside their own servers.

This mirroring is doable because the task is, at the most basic level,
relatively simple: store text and build an inverted index of it. I
don’t mean to minimize the implementation complexities of modern
search, but the basic concept is very simple. Building a web-scale
structured search engine is not nearly so simple conceptually.
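
To make "store text and build an inverted index of it" concrete, here is a minimal sketch in Python. It assumes whitespace tokenization and a tiny made-up corpus; the function names and sample documents are hypothetical and leave out everything a real engine needs (ranking, stemming, distribution across servers).

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical toy corpus, for illustration only.
docs = {
    1: "cheap flights from new york",
    2: "new tv deals",
    3: "flights and hotels in york",
}
index = build_inverted_index(docs)
matches = search(index, "new york")
```

The index only knows which words occur where. It has no notion of a field such as a price or a destination, which is why structured search needs more than this.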

We have many years of experience storing structured information in SQL
databases. But there are no web scale SQL databases comparable to the
hundreds of thousands of servers Google has under the hood for storing
and indexing text. And even if such a beast did exist, the whole
concept of the SQL/relational database doesn’t work at web scale
because you have to know what types of records you are going to store
up front. You cannot just have every new user adding new record types.
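
The "up front" constraint can be illustrated with a small sketch using Python's built-in sqlite3 module. The table, field names, and the list-of-dicts store are assumptions made for illustration; the point is only that a relational table rejects a record type it was never told about, while a schema-free store accepts any shape.

```python
import sqlite3

# A relational store: the contact type must be declared before use.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (first_name TEXT, last_name TEXT, city TEXT)")
conn.execute("INSERT INTO contact VALUES (?, ?, ?)", ("John", "Doe", "New York"))


def try_insert_product(connection):
    """Attempt to store a record type that was never declared."""
    try:
        connection.execute(
            "INSERT INTO product (part_number, price) VALUES (?, ?)",
            ("A-100", 19.99),
        )
        return True
    except sqlite3.OperationalError:
        # SQLite reports that the table does not exist.
        return False


relational_accepts_new_type = try_insert_product(conn)

# A hypothetical schema-free store: any record type can be added at any time.
store = []


def add_record(record_type, **fields):
    """Append a record of any type without declaring its fields first."""
    store.append({"type": record_type, **fields})


add_record("contact", first_name="John", last_name="Doe", city="New York")
add_record("product", part_number="A-100", price=19.99)
known_types = sorted({record["type"] for record in store})
```

The schema-free list is trivially flexible but has no indexes at all, so it only shows the shape of the data model, not how to make it fast at web scale.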

And yet this ability to search through and understand any kind of
structure is *exactly* what you want a structured search engine to do.
In the next generation of search, new structures must be as easy to add
to the index as new web pages are to add to Google. Just as today’s
search engines store any kind of text, tomorrow’s search engines must be
able to grab structured information, understand it, and understand the
relationships between structures. For example, you need to be able to
ask your search engine, “Who are John Doe’s friends?” You need to be able
to tell it that John Doe is a person and to find all the people that
are connected to John as friends. This will require a fundamental
rethinking of what a search engine does.
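
One way to picture the John Doe question is as a set of subject-predicate-object facts. The sketch below is an assumption about how such relationships might be represented, not a description of any existing engine; the class, the predicate names ("is_a", "friend", "employer"), and the people are all hypothetical.

```python
from collections import defaultdict


class TripleStore:
    """A tiny in-memory store of (subject, predicate, object) facts."""

    def __init__(self):
        self._by_subject = defaultdict(list)

    def add(self, subject, predicate, obj):
        self._by_subject[subject].append((predicate, obj))

    def objects(self, subject, predicate):
        """Return every object linked to subject by the given predicate."""
        facts = self._by_subject.get(subject, [])
        return [obj for pred, obj in facts if pred == predicate]


def friends_of(store, person):
    """Find the people connected to person as friends."""
    return [
        friend
        for friend in store.objects(person, "friend")
        if "person" in store.objects(friend, "is_a")
    ]


# Hypothetical facts, for illustration only.
store = TripleStore()
store.add("john_doe", "is_a", "person")
store.add("jane_roe", "is_a", "person")
store.add("sam_poe", "is_a", "person")
store.add("acme", "is_a", "company")
store.add("john_doe", "friend", "jane_roe")
store.add("john_doe", "friend", "sam_poe")
store.add("john_doe", "employer", "acme")

johns_friends = friends_of(store, "john_doe")
```

New kinds of things and new kinds of relationships can be added here without declaring them first, which is the property the text asks for.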

The solution to this is really a database problem. You need an
infinitely horizontally scalable database that understands structure but is not
limited by it. And then you need some new kind of crawler to extract
data. Ideally you also have some sort of notification system that
allows this new search engine to be notified when individual records
change.

Getting There

A broad-based solution to this problem is what I would call Web 3.0
search, and it will be necessary for Web 3.0 to reach its full
potential in the same way search engines are critical to the current
Web universe. But despite its importance and obviousness, I think this
problem will not be solved by one of the major players, but by a
startup. The majors have too much work on their plates already, and it
just makes more sense to let a focused startup figure all of this stuff
out and to acquire it later.

But once this nut is cracked, it will be possible to explore the world
of information in a way that makes the current incarnation of Google
seem almost silly. And I believe that creating such a search engine
would provide the motivation for almost every holder of actionable,
relevant data to make that data available in a form that is searchable
by such an engine. I do believe this is an “if you build it, they will
come” situation, because of the scale of problems that such a search
engine solves, and because long before something like this reached
Google scale it would be invaluable.

As an example, imagine being able to search for all of the American
Airlines flights from New York to anywhere that cost less than $200.
From each of the flights you could click to see the cities
associated with each flight. From each city you could see the top rated
restaurants with prices under $20 apiece. From there you could explore
their neighborhoods, etc.
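
The flight example combines a typed search, range conditions on fields, and links from one record type to another. Here is a minimal sketch of that combination; the records, field names, and the find helper are all invented for illustration and say nothing about how a real engine would index or rank them.

```python
# Hypothetical records of two different types, for illustration only.
records = [
    {"type": "flight", "airline": "American Airlines", "origin": "New York",
     "destination": "Chicago", "price": 150},
    {"type": "flight", "airline": "American Airlines", "origin": "New York",
     "destination": "Miami", "price": 260},
    {"type": "flight", "airline": "Other Air", "origin": "New York",
     "destination": "Boston", "price": 90},
    {"type": "restaurant", "name": "Deep Dish Diner", "city": "Chicago",
     "rating": 4.6, "price": 18},
    {"type": "restaurant", "name": "Lakeside Grill", "city": "Chicago",
     "rating": 4.8, "price": 35},
    {"type": "restaurant", "name": "Harbor Cafe", "city": "Boston",
     "rating": 4.2, "price": 12},
]


def matches(record, conditions):
    """Check each condition: a callable is a predicate, anything else an exact value."""
    for field, condition in conditions.items():
        if field not in record:
            return False
        value = record[field]
        if callable(condition):
            if not condition(value):
                return False
        elif value != condition:
            return False
    return True


def find(all_records, record_type, **conditions):
    """Return records of one type whose fields satisfy every condition."""
    return [
        record for record in all_records
        if record.get("type") == record_type and matches(record, conditions)
    ]


# Step 1: American Airlines flights from New York under $200.
flights = find(records, "flight", airline="American Airlines",
               origin="New York", price=lambda p: p < 200)

# Step 2: follow each flight to its destination city.
cities = {flight["destination"] for flight in flights}

# Step 3: restaurants in those cities under $20, best rated first.
restaurants = sorted(
    find(records, "restaurant", city=lambda c: c in cities,
         price=lambda p: p < 20),
    key=lambda r: r["rating"],
    reverse=True,
)
```

Each step is a query over a different record type, joined only by a shared value (the city name), which is the kind of exploration a keyword index cannot express.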

In many respects an interesting presage of this is Metaweb’s Freebase.
Freebase allows you to explore data in much the way that I describe,
but it is a database, not a search engine. It presents itself
as the structured version of Wikipedia, not the structured version
of Google. In its present form, I think Freebase needs either to
become the next-generation “structured data” search engine or to
hope that someone else invents it, because without a good
central search system users will just never think of Freebase.

There is no doubt that such a structured search engine will come to
pass. The need is too obvious and important. The most interesting
question is what the smallest possible implementation that
actually does something useful will look like. Because while the need
is clear, the most capital-efficient way to get there never is.


