Re: [PHP] Re: using UID in DB

2010-03-31 Thread Nathan Rixham
Tommy Pham wrote:
> On Wed, Mar 31, 2010 at 11:01 AM, Nathan Rixham  wrote:
>> Tommy Pham wrote:
>>> On Wed, Mar 31, 2010 at 4:06 AM, Nathan Rixham  wrote:
 Tommy Pham wrote:
> As for spatial data types, I've never find much use for non scientific
> related.  (example) If using point as a PK, if MySQL stores it the
> same way as PostgreSQL which is 16 bytes, how is that any different -
> performance wise - than using UUID and storing it as binary(16) for
> MySQL or uniqueidentifier (16 bytes) for PostgreSQL?
>
 it's all about the indexing (R-Tree)

 http://en.wikipedia.org/wiki/R-tree

>>> I can see where the performance would be between B-Tree vs R-Tree for
>>> the same field size but I've yet to see real life application of it.
>>> Case in point, if using point for GPS data coordinates, then wouldn't
>>> it still be a lot better to use Lon (float), Lat (float) which is 2
>>> fields of 4 bytes each vs 1 field of 16 bytes?  The index would still
>>> be faster (8 bytes less) in B-Tree right?  Not to mention smaller row
>>> & DB size.
>> wish I still had an application to point you at; and I can't assert in
>> any other way just how much faster it is; especially over large datasets
>> - regardless of the amount of rows the style of index doesn't slow
>> (spatial) queries down.
> 
> If I see the PoC w/o having to see the actual setup, then I would
> probably understand your point better.
> 
>> if index byte size is of importance then negate spatial indexes (which
>> are rather a lot bigger).
>>
> 
> Since the spatial indexes are bigger, wouldn't that work against it
> due disk access having to read more?

depends on how seriously you take a solution tbh; if you just wanna run
a huge dataset on a sata drive w/ 2gb ram then your screwed whatever
approach (unless you turn the dataset in to many fine grained resources,
heavily cached and only accessed on a need to have basis) but even then
there are many limitations.

one approach is to throw more ram at a situation; where you are
definitely better loading a huge dataset in to one box with massive
amounts of ram than splitting it over multiple boxes. [1]

http://rickonrails.wordpress.com/2009/03/30/big-ole-mysql-spatial-table-optimization-tricks/

if you're really serious then you want to be on a box with a tonne of
ram and pci express solid state storage devices which give you
throughput of 700MB/s and higher; prices aren't that bad comparatively
and one machine w/ a $800 card added works much better than 2-3 servers
(although you still need to cover replication and high availability).

http://www.fusionio.com/ioxtreme/

wandering off course a bit maybe; but these are situations we need to
consider.

I don't know how big you think big is; but realistically aws ec2
instances coupled with various "big data" / nosql - non rdbms approaches
are the way to go (see many many many recent discussions on the subject
around the net)

>> regardless though, it was just an idea I was throwing at you, not
>> suggesting it's the best approach, but worth consideration depending on
>> your priorities.
>>
> 
> Thanks for your idea.  I'd appreciate it very much.  I'm told that I
> over analyze things at times ... lol ... so please don't take it the
> wrong way :)

ty for clarifying, and deal - I'm the same :)

>>  As for calculating the distance between 2 'points', it
>>> would be simpler for querying purposes to use 'point'.  ATM, I don't
>>> have need to do any such calculations.  If I do use 'point' for PK,
>>> I'd run into problems with scaling and migration later.
>> the points don't matter tbh; or should i say specifying an x and a y
>> doesn't matter, because you can simply get by a single id, or get a
>> range of ids very quickly; the other value comes in to play when you
>> want temporal queries.
>>
>> again though, i reiterate only useful if you want speed and don't care
>> about index bytesize - and certainly not forcing it down your neck.
>>
>> re scaling.. unsure theoretically you have 15 digit precision which is a
>> mysql DOUBLE/REAL, and that's for each value in the pair, so an almost
>> incomprehensible number of rows are possible before a collision.
>>
>> all in all: there are many good reasons why I've only used this on
>> occassion (specifically when dealing with 1 million+ rows in an RDBMS
>> table); and many more good reasons why i stick to non-rdbms databases
>> and use http uri's + eav model data only nowadays.
>>
>> The latter i would advocate, the former not so unless you are stuck w/
>> mysql postgres - in which case you can't get faster. [1]
>>
>> [1] get some numbers to prove when i have time.
>>
>> Nathan
>>
> 
> Using HTTP URI's + EAV model is something I'll definitely look into.
> 

good good; i can assure you it's the future; all the major corps and
governments are moving to this (age-old) model under the banner of
linked data, odata, and partially gdata - and meanwhile the development
community is slowly 

Re: [PHP] Re: using UID in DB

2010-03-31 Thread Tommy Pham
On Wed, Mar 31, 2010 at 11:16 AM, tedd  wrote:
>
> I didn't say otherwise.
>
> At some point, you asked how others solve the unique problem and I provided
> my solution.
>
> Cheers,
>
> tedd
>

Sorry, I misread that.

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Re: using UID in DB

2010-03-31 Thread Tommy Pham
On Wed, Mar 31, 2010 at 11:01 AM, Nathan Rixham  wrote:
> Tommy Pham wrote:
>> On Wed, Mar 31, 2010 at 4:06 AM, Nathan Rixham  wrote:
>>> Tommy Pham wrote:
 As for spatial data types, I've never find much use for non scientific
 related.  (example) If using point as a PK, if MySQL stores it the
 same way as PostgreSQL which is 16 bytes, how is that any different -
 performance wise - than using UUID and storing it as binary(16) for
 MySQL or uniqueidentifier (16 bytes) for PostgreSQL?

>>> it's all about the indexing (R-Tree)
>>>
>>> http://en.wikipedia.org/wiki/R-tree
>>>
>>
>> I can see where the performance would be between B-Tree vs R-Tree for
>> the same field size but I've yet to see real life application of it.
>> Case in point, if using point for GPS data coordinates, then wouldn't
>> it still be a lot better to use Lon (float), Lat (float) which is 2
>> fields of 4 bytes each vs 1 field of 16 bytes?  The index would still
>> be faster (8 bytes less) in B-Tree right?  Not to mention smaller row
>> & DB size.
>
> wish I still had an application to point you at; and I can't assert in
> any other way just how much faster it is; especially over large datasets
> - regardless of the amount of rows the style of index doesn't slow
> (spatial) queries down.

If I see the PoC w/o having to see the actual setup, then I would
probably understand your point better.

>
> if index byte size is of importance then negate spatial indexes (which
> are rather a lot bigger).
>

Since the spatial indexes are bigger, wouldn't that work against it
due disk access having to read more?

> regardless though, it was just an idea I was throwing at you, not
> suggesting it's the best approach, but worth consideration depending on
> your priorities.
>

Thanks for your idea.  I'd appreciate it very much.  I'm told that I
over analyze things at times ... lol ... so please don't take it the
wrong way :)

>  As for calculating the distance between 2 'points', it
>> would be simpler for querying purposes to use 'point'.  ATM, I don't
>> have need to do any such calculations.  If I do use 'point' for PK,
>> I'd run into problems with scaling and migration later.
>
> the points don't matter tbh; or should i say specifying an x and a y
> doesn't matter, because you can simply get by a single id, or get a
> range of ids very quickly; the other value comes in to play when you
> want temporal queries.
>
> again though, i reiterate only useful if you want speed and don't care
> about index bytesize - and certainly not forcing it down your neck.
>
> re scaling.. unsure theoretically you have 15 digit precision which is a
> mysql DOUBLE/REAL, and that's for each value in the pair, so an almost
> incomprehensible number of rows are possible before a collision.
>
> all in all: there are many good reasons why I've only used this on
> occassion (specifically when dealing with 1 million+ rows in an RDBMS
> table); and many more good reasons why i stick to non-rdbms databases
> and use http uri's + eav model data only nowadays.
>
> The latter i would advocate, the former not so unless you are stuck w/
> mysql postgres - in which case you can't get faster. [1]
>
> [1] get some numbers to prove when i have time.
>
> Nathan
>

Using HTTP URI's + EAV model is something I'll definitely look into.

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Re: using UID in DB

2010-03-31 Thread tedd

At 11:08 AM -0700 3/31/10, Tommy Pham wrote:

On Wed, Mar 31, 2010 at 10:23 AM, tedd  wrote:

 At 5:56 PM -0700 3/30/10, Tommy Pham wrote:


 On Tue, Mar 30, 2010 at 2:27 PM, Nathan Rixham  wrote:


  nope never been able to find any significant advantage; and thus ended
  up using http uri's in my own domain space(s) which are always
  guaranteed to be unique as I'm issuing them. bonus is that they can be
  dereferenced and server as both a universal (resource) identifier and a
  locater.

  ps: resource = anything that can be named.



 Hi Nathan,

 I'm interested in hearing your technique of generating your own uuid
 if you don't mind sharing :).  I'm building a project to test my idea
 about search engine and testing of different RDBMSes.  So naturally,
 it (the app) would crawl the net and I'd have over a 1 billion rows.

 Thanks,
 Tommy

 PS: Here are some info for those who haven't heard of UUID/GUID:

 http://affy.blogspot.com/2003/03/why-use-uuid-values.html
 http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/



 I've read your links and see the problem presented.

 The solution is to create an absolutely unique user ID and therein lies the
 problem. How do you create something that is absolutely unique?

 Clearly, if you have one database creating records, an auto_increment will
 work for creating an unique number for that's what increment does.  However,
 if you have more than one database creating records at the same time, then
 conflicts would occur.

 I had a similar problem recently where I used two fields to identify a
 record as being "unique". One field used the exact time the record was added
 to the database and the other field was a random number. Combining the two
 numbers I believed I had a unique number for the taks I was doing. Of
 course, if activity was spread across hundreds of servers with millions of
 entries per second, then my method would not be a solution. As such, the
 solution should be tailored to the problem.

 Cheers,

 tedd



The original implementation of UUID/GUID includes MAC of the NIC which
will guarantee that uniqueness.  But because of certain privacy issues
brought up, they have to come up with other methods, though slightly
less unique. I've not heard of someone running into collision using
UUID/GUID.


I didn't say otherwise.

At some point, you asked how others solve the unique problem and I 
provided my solution.


Cheers,

tedd

--
---
http://sperling.com  http://ancientstones.com  http://earthstones.com

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Re: using UID in DB

2010-03-31 Thread Tommy Pham
On Wed, Mar 31, 2010 at 10:23 AM, tedd  wrote:
> At 5:56 PM -0700 3/30/10, Tommy Pham wrote:
>>
>> On Tue, Mar 30, 2010 at 2:27 PM, Nathan Rixham  wrote:
>>
>>>  nope never been able to find any significant advantage; and thus ended
>>>  up using http uri's in my own domain space(s) which are always
>>>  guaranteed to be unique as I'm issuing them. bonus is that they can be
>>>  dereferenced and server as both a universal (resource) identifier and a
>>>  locater.
>>>
>>>  ps: resource = anything that can be named.
>>>
>>
>> Hi Nathan,
>>
>> I'm interested in hearing your technique of generating your own uuid
>> if you don't mind sharing :).  I'm building a project to test my idea
>> about search engine and testing of different RDBMSes.  So naturally,
>> it (the app) would crawl the net and I'd have over a 1 billion rows.
>>
>> Thanks,
>> Tommy
>>
>> PS: Here are some info for those who haven't heard of UUID/GUID:
>>
>> http://affy.blogspot.com/2003/03/why-use-uuid-values.html
>> http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/
>>
>
> I've read your links and see the problem presented.
>
> The solution is to create an absolutely unique user ID and therein lies the
> problem. How do you create something that is absolutely unique?
>
> Clearly, if you have one database creating records, an auto_increment will
> work for creating an unique number for that's what increment does.  However,
> if you have more than one database creating records at the same time, then
> conflicts would occur.
>
> I had a similar problem recently where I used two fields to identify a
> record as being "unique". One field used the exact time the record was added
> to the database and the other field was a random number. Combining the two
> numbers I believed I had a unique number for the taks I was doing. Of
> course, if activity was spread across hundreds of servers with millions of
> entries per second, then my method would not be a solution. As such, the
> solution should be tailored to the problem.
>
> Cheers,
>
> tedd
>

The original implementation of UUID/GUID includes MAC of the NIC which
will guarantee that uniqueness.  But because of certain privacy issues
brought up, they have to come up with other methods, though slightly
less unique. I've not heard of someone running into collision using
UUID/GUID.

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Re: using UID in DB

2010-03-31 Thread Nathan Rixham
Tommy Pham wrote:
> On Wed, Mar 31, 2010 at 4:06 AM, Nathan Rixham  wrote:
>> Tommy Pham wrote:
>>> As for spatial data types, I've never find much use for non scientific
>>> related.  (example) If using point as a PK, if MySQL stores it the
>>> same way as PostgreSQL which is 16 bytes, how is that any different -
>>> performance wise - than using UUID and storing it as binary(16) for
>>> MySQL or uniqueidentifier (16 bytes) for PostgreSQL?
>>>
>> it's all about the indexing (R-Tree)
>>
>> http://en.wikipedia.org/wiki/R-tree
>>
> 
> I can see where the performance would be between B-Tree vs R-Tree for
> the same field size but I've yet to see real life application of it.
> Case in point, if using point for GPS data coordinates, then wouldn't
> it still be a lot better to use Lon (float), Lat (float) which is 2
> fields of 4 bytes each vs 1 field of 16 bytes?  The index would still
> be faster (8 bytes less) in B-Tree right?  Not to mention smaller row
> & DB size.

wish I still had an application to point you at; and I can't assert in
any other way just how much faster it is; especially over large datasets
- regardless of the amount of rows the style of index doesn't slow
(spatial) queries down.

if index byte size is of importance then negate spatial indexes (which
are rather a lot bigger).

regardless though, it was just an idea I was throwing at you, not
suggesting it's the best approach, but worth consideration depending on
your priorities.

  As for calculating the distance between 2 'points', it
> would be simpler for querying purposes to use 'point'.  ATM, I don't
> have need to do any such calculations.  If I do use 'point' for PK,
> I'd run into problems with scaling and migration later.

the points don't matter tbh; or should i say specifying an x and a y
doesn't matter, because you can simply get by a single id, or get a
range of ids very quickly; the other value comes in to play when you
want temporal queries.

again though, i reiterate only useful if you want speed and don't care
about index bytesize - and certainly not forcing it down your neck.

re scaling.. unsure theoretically you have 15 digit precision which is a
mysql DOUBLE/REAL, and that's for each value in the pair, so an almost
incomprehensible number of rows are possible before a collision.

all in all: there are many good reasons why I've only used this on
occassion (specifically when dealing with 1 million+ rows in an RDBMS
table); and many more good reasons why i stick to non-rdbms databases
and use http uri's + eav model data only nowadays.

The latter i would advocate, the former not so unless you are stuck w/
mysql postgres - in which case you can't get faster. [1]

[1] get some numbers to prove when i have time.

Nathan

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Re: using UID in DB

2010-03-31 Thread Tommy Pham
On Wed, Mar 31, 2010 at 4:06 AM, Nathan Rixham  wrote:
> Tommy Pham wrote:
>>
>> As for spatial data types, I've never find much use for non scientific
>> related.  (example) If using point as a PK, if MySQL stores it the
>> same way as PostgreSQL which is 16 bytes, how is that any different -
>> performance wise - than using UUID and storing it as binary(16) for
>> MySQL or uniqueidentifier (16 bytes) for PostgreSQL?
>>
>
> it's all about the indexing (R-Tree)
>
> http://en.wikipedia.org/wiki/R-tree
>

I can see where the performance would be between B-Tree vs R-Tree for
the same field size but I've yet to see real life application of it.
Case in point, if using point for GPS data coordinates, then wouldn't
it still be a lot better to use Lon (float), Lat (float) which is 2
fields of 4 bytes each vs 1 field of 16 bytes?  The index would still
be faster (8 bytes less) in B-Tree right?  Not to mention smaller row
& DB size.  As for calculating the distance between 2 'points', it
would be simpler for querying purposes to use 'point'.  ATM, I don't
have need to do any such calculations.  If I do use 'point' for PK,
I'd run into problems with scaling and migration later.

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Re: using UID in DB

2010-03-31 Thread tedd

At 5:56 PM -0700 3/30/10, Tommy Pham wrote:

On Tue, Mar 30, 2010 at 2:27 PM, Nathan Rixham  wrote:


 nope never been able to find any significant advantage; and thus ended
 up using http uri's in my own domain space(s) which are always
 guaranteed to be unique as I'm issuing them. bonus is that they can be
 dereferenced and server as both a universal (resource) identifier and a
 locater.

 ps: resource = anything that can be named.



Hi Nathan,

I'm interested in hearing your technique of generating your own uuid
if you don't mind sharing :).  I'm building a project to test my idea
about search engine and testing of different RDBMSes.  So naturally,
it (the app) would crawl the net and I'd have over a 1 billion rows.

Thanks,
Tommy

PS: Here are some info for those who haven't heard of UUID/GUID:

http://affy.blogspot.com/2003/03/why-use-uuid-values.html
http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/



I've read your links and see the problem presented.

The solution is to create an absolutely unique user ID and therein 
lies the problem. How do you create something that is absolutely 
unique?


Clearly, if you have one database creating records, an auto_increment 
will work for creating an unique number for that's what increment 
does.  However, if you have more than one database creating records 
at the same time, then conflicts would occur.


I had a similar problem recently where I used two fields to identify 
a record as being "unique". One field used the exact time the record 
was added to the database and the other field was a random number. 
Combining the two numbers I believed I had a unique number for the 
taks I was doing. Of course, if activity was spread across hundreds 
of servers with millions of entries per second, then my method would 
not be a solution. As such, the solution should be tailored to the 
problem.


Cheers,

tedd

--
---
http://sperling.com  http://ancientstones.com  http://earthstones.com

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Re: using UID in DB

2010-03-31 Thread Nathan Rixham
Tommy Pham wrote:
> 
> As for spatial data types, I've never find much use for non scientific
> related.  (example) If using point as a PK, if MySQL stores it the
> same way as PostgreSQL which is 16 bytes, how is that any different -
> performance wise - than using UUID and storing it as binary(16) for
> MySQL or uniqueidentifier (16 bytes) for PostgreSQL?
> 

it's all about the indexing (R-Tree)

http://en.wikipedia.org/wiki/R-tree




-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Re: using UID in DB

2010-03-31 Thread Tommy Pham
On Wed, Mar 31, 2010 at 2:57 AM, Nathan Rixham  wrote:
> Tommy Pham wrote:
>> On Tue, Mar 30, 2010 at 2:27 PM, Nathan Rixham  wrote:
>>
>>> nope never been able to find any significant advantage; and thus ended
>>> up using http uri's in my own domain space(s) which are always
>>> guaranteed to be unique as I'm issuing them. bonus is that they can be
>>> dereferenced and server as both a universal (resource) identifier and a
>>> locater.
>>>
>>> ps: resource = anything that can be named.
>>>
>>
>> Hi Nathan,
>>
>> I'm interested in hearing your technique of generating your own uuid
>> if you don't mind sharing :).  I'm building a project to test my idea
>> about search engine and testing of different RDBMSes.  So naturally,
>> it (the app) would crawl the net and I'd have over a 1 billion rows.
>>
>> Thanks,
>> Tommy
>
> Hi Tommy,
>
> Always good to see somebody experimenting and questioning things :)
>
> With regards generating UUID's which are http schema uri's; this is
> something I got hooked up about early on, but then with practise
> realised much of the worlds data already has globally known and used
> http scheme identifiers; for instance if I'm talking about a web page
> then it's the URL for it; a user may as well be http://twitter.com/webr3
> a country could be http://dbpedia.org/resource/United_Kingdom in the
> rare occurrence where i actually need to create an identifier then
> anything from a freebase style GUID (http://example.org/GUID-HERE)
> through to a generated meaningful URI http://mydomain.com/user/username
> /project/project-name or even just strapped to a class + microtime:
> $uri = 'http://mydomain.com/__CLASS__/' . microtime(true);
>
> There are milions of approaches; but it's worth noting that with each
> you can have extra functionality due to the identifier and locator
> duality of http scheme uri's (thanks to the domain name system).
>

Very interesting approach.  I'll have to think and research more into it.

> With regards what you are doing, if I may suggest a few things that you
> could try:
>
> You can create Identifiers that are spatial POINT()s and store them in
> mysql/postgres using either the MySQL spatial extension or PostGIS
> respectively. You can create identifiers using something like POINT(
> timestamp, float-id ) which again serves a duality of timestamping each
> record and identifying it. Moreover you'll be shocked at the speed gains
> from spatial indexing (seriously amazing), and further it allows you to
> do some pretty cool functionality with amazing speed.
>
> The spatial indexing lets you leverage your information in some pretty
> cool ways, at phenomenal speed. Because your data is essentially now
> points in a virtual world where X is time and Y is identity, you can
> pull information out by drawing MBRs around the data and thus selecting
> say all records between timestampA and timestampB with identities in the
> range 0-1832.1234 (we use floats rather than ints, far more scope and
> lends to great spatial optimisation / boxing). Further you aren't
> limited to basic geometries; you can create chains of data using
> linestring, test intersections on time, disjunctions and much more;
> again all with shocking speed over even the biggest of data sets (many
> billions in under 0.001s).
>
> You may also want to test out some non relational databases; as
> typically with large datasets you have to remove all the relational
> parts of the database (foreign keys etc) just to be able to run the
> thing efficiently. There are many kv db's; nosql solutions and my
> personal favourites which are quad/triple stores for EAV modeled data.
>
> Taking an datachanging approach and working with + storing all data as
> EAV triples is by far the fastest and most efficient way to make both
> small and large sites; everything is stored in a single flat "table" and
> you can query across all your data with great speed and chain queries
> together linking up id's to access your data in ways you can't even
> imagine ;) personally I'm running triple stores with 3-4 billion rows on
> many machines, even on my desktop!
>
> I'll leave it there, but something to get you started..
>
> Regards!
>

As for spatial data types, I've never find much use for non scientific
related.  (example) If using point as a PK, if MySQL stores it the
same way as PostgreSQL which is 16 bytes, how is that any different -
performance wise - than using UUID and storing it as binary(16) for
MySQL or uniqueidentifier (16 bytes) for PostgreSQL?

Thanks,
Tommy

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Re: using UID in DB

2010-03-31 Thread Nathan Rixham
Tommy Pham wrote:
> On Tue, Mar 30, 2010 at 2:27 PM, Nathan Rixham  wrote:
> 
>> nope never been able to find any significant advantage; and thus ended
>> up using http uri's in my own domain space(s) which are always
>> guaranteed to be unique as I'm issuing them. bonus is that they can be
>> dereferenced and server as both a universal (resource) identifier and a
>> locater.
>>
>> ps: resource = anything that can be named.
>>
> 
> Hi Nathan,
> 
> I'm interested in hearing your technique of generating your own uuid
> if you don't mind sharing :).  I'm building a project to test my idea
> about search engine and testing of different RDBMSes.  So naturally,
> it (the app) would crawl the net and I'd have over a 1 billion rows.
> 
> Thanks,
> Tommy

Hi Tommy,

Always good to see somebody experimenting and questioning things :)

With regards generating UUID's which are http schema uri's; this is
something I got hooked up about early on, but then with practise
realised much of the worlds data already has globally known and used
http scheme identifiers; for instance if I'm talking about a web page
then it's the URL for it; a user may as well be http://twitter.com/webr3
a country could be http://dbpedia.org/resource/United_Kingdom in the
rare occurrence where i actually need to create an identifier then
anything from a freebase style GUID (http://example.org/GUID-HERE)
through to a generated meaningful URI http://mydomain.com/user/username
/project/project-name or even just strapped to a class + microtime:
$uri = 'http://mydomain.com/__CLASS__/' . microtime(true);

There are milions of approaches; but it's worth noting that with each
you can have extra functionality due to the identifier and locator
duality of http scheme uri's (thanks to the domain name system).

With regards what you are doing, if I may suggest a few things that you
could try:

You can create Identifiers that are spatial POINT()s and store them in
mysql/postgres using either the MySQL spatial extension or PostGIS
respectively. You can create identifiers using something like POINT(
timestamp, float-id ) which again serves a duality of timestamping each
record and identifying it. Moreover you'll be shocked at the speed gains
from spatial indexing (seriously amazing), and further it allows you to
do some pretty cool functionality with amazing speed.

The spatial indexing lets you leverage your information in some pretty
cool ways, at phenomenal speed. Because your data is essentially now
points in a virtual world where X is time and Y is identity, you can
pull information out by drawing MBRs around the data and thus selecting
say all records between timestampA and timestampB with identities in the
range 0-1832.1234 (we use floats rather than ints, far more scope and
lends to great spatial optimisation / boxing). Further you aren't
limited to basic geometries; you can create chains of data using
linestring, test intersections on time, disjunctions and much more;
again all with shocking speed over even the biggest of data sets (many
billions in under 0.001s).

You may also want to test out some non relational databases; as
typically with large datasets you have to remove all the relational
parts of the database (foreign keys etc) just to be able to run the
thing efficiently. There are many kv db's; nosql solutions and my
personal favourites which are quad/triple stores for EAV modeled data.

Taking an datachanging approach and working with + storing all data as
EAV triples is by far the fastest and most efficient way to make both
small and large sites; everything is stored in a single flat "table" and
you can query across all your data with great speed and chain queries
together linking up id's to access your data in ways you can't even
imagine ;) personally I'm running triple stores with 3-4 billion rows on
many machines, even on my desktop!

I'll leave it there, but something to get you started..

Regards!

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Re: using UID in DB

2010-03-30 Thread Tommy Pham
On Tue, Mar 30, 2010 at 2:27 PM, Nathan Rixham  wrote:

> nope never been able to find any significant advantage; and thus ended
> up using http uri's in my own domain space(s) which are always
> guaranteed to be unique as I'm issuing them. bonus is that they can be
> dereferenced and server as both a universal (resource) identifier and a
> locater.
>
> ps: resource = anything that can be named.
>

Hi Nathan,

I'm interested in hearing your technique of generating your own uuid
if you don't mind sharing :).  I'm building a project to test my idea
about search engine and testing of different RDBMSes.  So naturally,
it (the app) would crawl the net and I'd have over a 1 billion rows.

Thanks,
Tommy

PS: Here are some info for those who haven't heard of UUID/GUID:

http://affy.blogspot.com/2003/03/why-use-uuid-values.html
http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



[PHP] Re: using UID in DB

2010-03-30 Thread Nathan Rixham
Tommy Pham wrote:
> Hi,
> 
> I'm just wondering if anyone on this list using some type of
> UID/UUID/GUID in any of the DB?  If so, what DBMS/RDBMS are you using
> and how many rows do you have for the table(s) using it?  What data
> type are you using for that column?

nope never been able to find any significant advantage; and thus ended
up using http uri's in my own domain space(s) which are always
guaranteed to be unique as I'm issuing them. bonus is that they can be
dereferenced and server as both a universal (resource) identifier and a
locater.

ps: resource = anything that can be named.

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php