Tommy Pham wrote:
> On Tue, Mar 30, 2010 at 2:27 PM, Nathan Rixham <> wrote:
>> nope never been able to find any significant advantage; and thus ended
>> up using http uri's in my own domain space(s) which are always
>> guaranteed to be unique as I'm issuing them. bonus is that they can be
>> dereferenced and serve as both a universal (resource) identifier and a
>> locator.
>> ps: resource = anything that can be named.
> Hi Nathan,
> I'm interested in hearing your technique of generating your own uuid
> if you don't mind sharing :).  I'm building a project to test my idea
> about search engine and testing of different RDBMSes.  So naturally,
> it (the app) would crawl the net and I'd have over 1 billion rows.
> Thanks,
> Tommy

Hi Tommy,

Always good to see somebody experimenting and questioning things :)

With regard to generating UUIDs which are http scheme URIs: this is
something I got hung up on early on, but with practice I realised that
much of the world's data already has globally known and used http
scheme identifiers. For instance, if I'm talking about a web page then
its identifier is simply its URL, and a user or a country may likewise
already have a well-known URI. In the rare case where I actually need
to create an identifier, I'll use anything from a Freebase-style GUID
through to a generated meaningful URI ending in /project/project-name,
or even just a class URI with the microtime appended:
$uri = '' . microtime(true);
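To make the last two options concrete, here is a minimal PHP sketch of
minting both kinds of identifier. It assumes a hypothetical base URI
(http://example.org/) standing in for a domain you control; the
function names mintTimeUri() and mintSlugUri() are made up for
illustration.

```php
<?php
// Sketch only: BASE_URI is a hypothetical placeholder for a domain you own.
const BASE_URI = 'http://example.org/';

// Class + microtime, e.g. http://example.org/user/1270000000.123456
function mintTimeUri(string $class): string
{
    // microtime(true) returns seconds as a float; keep 6 decimals (microseconds)
    return BASE_URI . $class . '/' . sprintf('%.6F', microtime(true));
}

// Meaningful URI, e.g. http://example.org/project/my-search-engine
function mintSlugUri(string $class, string $name): string
{
    // lowercase, collapse each run of non-alphanumerics into one hyphen,
    // then trim hyphens from the ends
    $slug = trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($name)), '-');
    return BASE_URI . $class . '/' . $slug;
}
```

Bear in mind that a microtime-based URI is only as unique as the clock:
two processes minting in the same microsecond can collide, so if that
matters, append something like a process id or a random suffix.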

There are millions of approaches, but it's worth noting that with each
of them you gain extra functionality from the identifier/locator
duality of http scheme URIs (thanks to the domain name system): the
same string both names a resource and can be dereferenced to fetch it.

With regard to what you are doing, if I may, here are a few things you
could try:

You can create identifiers that are spatial POINT()s and store them in
MySQL or PostgreSQL using the MySQL spatial extensions or PostGIS
respectively. For example, an identifier of the form
POINT(timestamp, float-id) again serves a dual purpose: it timestamps
each record and identifies it. Moreover, you'll be shocked at the speed
gains from spatial indexing (seriously amazing), and it opens up some
pretty cool functionality.
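A minimal PHP sketch of such an identifier, assuming X = Unix timestamp
and Y = a float id. It builds the WKT (well-known text) form you would
pass to ST_GeomFromText() in MySQL or PostGIS, and parses it back. The
helper names and the table/column names in the comments (records, pt)
are hypothetical, and the DDL may need adjusting for your version.

```php
<?php
// Hypothetical schema, for orientation only:
//   MySQL:   CREATE TABLE records (pt POINT NOT NULL, SPATIAL INDEX (pt));
//   PostGIS: CREATE TABLE records (pt geometry(Point));
//            CREATE INDEX records_pt_idx ON records USING GIST (pt);
// Insert with: INSERT INTO records (pt) VALUES (ST_GeomFromText(?))

// Encode (timestamp, id) as a WKT point. Note WKT separates coordinates
// with a space, POINT(x y); the comma form POINT(x, y) is MySQL's
// constructor function, not WKT.
function pointId(float $timestamp, float $id): string
{
    return sprintf('POINT(%.6F %.6F)', $timestamp, $id);
}

// Decode a WKT point back into both halves of the identifier.
function parsePointId(string $wkt): array
{
    if (!preg_match('/^POINT\(\s*(-?[0-9.]+)\s+(-?[0-9.]+)\s*\)$/', $wkt, $m)) {
        throw new InvalidArgumentException("Not a WKT point: $wkt");
    }
    return ['timestamp' => (float) $m[1], 'id' => (float) $m[2]];
}
```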

Spatial indexing lets you leverage your information in some pretty cool
ways, at phenomenal speed. Because your data is now essentially points
in a virtual plane where X is time and Y is identity, you can pull
information out by drawing MBRs (minimum bounding rectangles) around
the data, selecting, say, all records between timestampA and timestampB
with identities in the range 0 to 1832.1234 (we use floats rather than
ints: far more scope, and they lend themselves to spatial optimisation
and boxing). Further, you aren't limited to basic geometries; you can
create chains of data using LINESTRINGs, test intersections on time,
disjointness and much more, again with shocking speed even over the
biggest data sets (in my experience, many billions of rows in under
0.001s).
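Here is a rough PHP sketch of that MBR selection: mbrPolygon() builds
the rectangle as a WKT polygon for the database to match against its
spatial index, and inBox() performs the same selection naively in
memory so you can see what the query means. The function names are
hypothetical, and the SQL in the comments is indicative only; check
your MySQL/PostGIS docs for exact function names and how points on the
boundary are treated.

```php
<?php
// Indicative queries (table "records", column "pt" are hypothetical):
//   MySQL:   SELECT * FROM records WHERE MBRContains(ST_GeomFromText(?), pt)
//   PostGIS: SELECT * FROM records WHERE pt && ST_MakeEnvelope(?, ?, ?, ?)

// WKT polygon for the box [t1, t2] x [idMin, idMax].
// The ring is closed: the first vertex is repeated at the end.
function mbrPolygon(float $t1, float $t2, float $idMin, float $idMax): string
{
    return sprintf(
        'POLYGON((%1$F %3$F, %2$F %3$F, %2$F %4$F, %1$F %4$F, %1$F %3$F))',
        $t1, $t2, $idMin, $idMax
    );
}

// The same selection done by brute force over [timestamp, id] pairs;
// bounds are inclusive here. A spatial index avoids this full scan.
function inBox(array $points, float $t1, float $t2, float $idMin, float $idMax): array
{
    return array_values(array_filter(
        $points,
        function (array $p) use ($t1, $t2, $idMin, $idMax) {
            return $p[0] >= $t1 && $p[0] <= $t2
                && $p[1] >= $idMin && $p[1] <= $idMax;
        }
    ));
}
```

For example, mbrPolygon($timestampA, $timestampB, 0, 1832.1234) gives
the box from the paragraph above, ready to bind as the ? parameter.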

You may also want to test some non-relational databases, since with
large datasets you typically have to strip out the relational parts of
the database (foreign keys, etc.) just to run the thing efficiently.
There are many key-value databases and other NoSQL solutions, plus my
personal favourites: quad/triple stores for EAV-modelled
(entity-attribute-value) data.

In my experience, working with and storing all data as EAV triples is
by far the fastest and most efficient way to build both small and large
sites: everything is stored in a single flat "table", and you can query
across all your data with great speed, chaining queries together by
linking up IDs to access your data in ways you can't even imagine ;)
Personally, I'm running triple stores with 3-4 billion rows on many
machines, even on my desktop!
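To illustrate the EAV idea, here is a deliberately tiny in-memory triple
store in PHP: one flat list of (entity, attribute, value) triples, a
pattern match where null is a wildcard, and a chained query that feeds
the results of one match into the next. The class, the attribute names
(about, linksTo) and the page:N ids are all hypothetical; a real triple
store keeps several indexes over the triples instead of scanning them
linearly as this sketch does.

```php
<?php
// Toy EAV store: every fact is one (entity, attribute, value) row.
final class TripleStore
{
    /** @var array<int, array{0: string, 1: string, 2: string}> */
    private $triples = [];

    public function add(string $e, string $a, string $v): void
    {
        $this->triples[] = [$e, $a, $v];
    }

    // Return all triples matching the pattern; null matches anything.
    public function match(?string $e, ?string $a, ?string $v): array
    {
        $out = [];
        foreach ($this->triples as $t) {
            if (($e === null || $t[0] === $e)
                && ($a === null || $t[1] === $a)
                && ($v === null || $t[2] === $v)) {
                $out[] = $t;
            }
        }
        return $out;
    }
}

// Chained query: which entities link to something "about" $topic?
function linkersOf(TripleStore $store, string $topic): array
{
    $result = [];
    // step 1: entities whose "about" attribute equals $topic
    foreach ($store->match(null, 'about', $topic) as [$page]) {
        // step 2: reuse each matched id as the value of a second pattern
        foreach ($store->match(null, 'linksTo', $page) as [$linker]) {
            $result[] = $linker;
        }
    }
    return $result;
}
```

For example, after adding (page:1, about, php), (page:3, linksTo,
page:1) and (page:5, linksTo, page:1), linkersOf($store, 'php') returns
['page:3', 'page:5'].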

I'll leave it there, but that should be enough to get you started.


PHP General Mailing List