Since the list will be closing soon, I thought I'd put up for discussion an idea a friend and I had back in 2000. I put it here partly as a thought provoker, and partly so it doesn't get lost. I had it on Microsoft's community groups, but it never got any traffic so died a death. (By the way, my Outlook spelling check wanted to change "Microsoft's" to "Microfossil's")
I'm adding my original stuff to the bottom of this message, and I extended the idea in some thoughts I put up at http://emeraldglenlodge.co.nz/superpick.html. Part of the extension was to point out that if the marks (like attribute marks) were changed to 9C-9F, there wouldn't be a clash with the characters used on the internet. Even using 0C-0F would work better.

Interestingly, having followed the recent threads Dawn has engaged in at comp.databases.theory, I have retreated a bit from my original position. When you think about it, the fact that we have (in general) only one level of data 'nesting' means we don't get a hierarchical structure that is difficult to understand as a single conceptual "thang". Codd's original paper drew back from having relations within relations, maybe because he didn't consider the Pick idea of limiting the depth of this structure?

Anyway, following is the original idea (although maybe calling it "SuperPick" was a conceit - I could be modest and follow established precedent and call it "Johnson").

Regards, Keith.

SuperPick
Copyright Keith Johnson 2000

Background

My experience has been as an application programmer using Pick-type databases. Within these, all data is represented as an ASCII string using delimiters to separate fields. Pick allows three levels of fields, called attributes, values, and sub-values, using characters 254, 253, and 252 as the respective delimiters.

I was seeking a method of storing data which would be similar, but which could cope with a theoretically unlimited nested structure. This structure would work well for the sort of data I see in my work - names, dates, addresses, money, product codes, etc. It would also convert easily to an XML form, which I see as the coming data interchange format.

My colleague Ron Knox one day came up with the idea that 'brackets' would allow for nesting of any depth. We refined this idea over time into a data structure that Ron has called "Noble" (as in, it's not base!).
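Pick's fixed three-level nesting can be sketched in a few lines. The following Python is my own illustration of the delimiter scheme described above, not anything from Pick itself; the function name is hypothetical.

```python
# Sketch only: CHAR(254), CHAR(253), and CHAR(252) are the attribute,
# value, and sub-value marks mentioned in the text.
AM, VM, SVM = chr(254), chr(253), chr(252)

def explode(item):
    """Split a Pick-style dynamic array into nested lists:
    attributes -> values -> sub-values (three fixed levels)."""
    return [[v.split(SVM) for v in a.split(VM)] for a in item.split(AM)]
```

Splitting `"Amy" + AM + "5.95" + VM + "6.58"` this way yields two attributes, the second holding two values - which is exactly the structure that stops at three levels and motivates the unlimited-nesting idea below.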
"SuperPick" is Noble with a data map concept added, which allows the data to be easily converted to XML.

Considering XML

XML itself is interesting and I could see Pick-like things in it, such as repeating fields, but it annoys me to see the verbosity associated with the tag mechanism. In Pick, one describes fields by their position within the record, that is, by counting delimiters. From my experience with Pick, parsing out fields for manipulation does not have an adverse effect on performance as long as you try to avoid extremely long strings, and actively code to avoid re-parsing long strings as much as possible. XML, being verbose, would be more vulnerable to that sort of performance problem. The mechanism required to pull a field out of XML is more complex than delimiter counting, as it has to match the tag strings surrounding the field. This is more difficult than it sounds, because tags do not have to be unique.

Format of example

Described below is a mechanism for storing data - "SuperPick". Under this mechanism there is the data itself, and a map. Unlike Pick, the map is required. Both map and data are stored as text strings with four special characters: file start, file end, record separator, and field separator. In this example I have used the left square bracket as the file start, the right square bracket as the file end, the pipe as the record separator, and the backslash as the field separator - that is, []|\ respectively. This is not to say these are the characters that would in fact be used, just that they are clear.

The example

The map is

[Customer\0\file|FirstName\1|LastName\2|CreditLimit\3|OrderEntry\4\file|OrderID\4,1|OrderDetail\4,2\file|Title\4,2,1|Author\4,2,2|Price\4,2,3]

while the data is

[Amy\Higginbottom\5000\[16273\[Number, the Language of Science\Danzig\5.95|Tales of Grandpa Cat\Wardlaw, Lee\6.58]]]

The map

Taking the map first, it consists of a file with ten records.
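The map format above is simple enough that a reader can be sketched directly. This Python is my own illustration (the function name and dictionary shape are assumptions, not part of the proposal); it uses the example's [ ] | \ delimiters and the "type defaults to a generic field" rule.

```python
# Sketch: read a SuperPick map string into {position-tuple: (name, type)}.
def parse_map(m):
    """Each map record is name\position[\type]; the type defaults
    to a generic field when absent."""
    entries = {}
    for rec in m.strip("[]").split("|"):
        fields = rec.split("\\")
        name, pos = fields[0], fields[1]
        ftype = fields[2] if len(fields) > 2 else "field"
        key = tuple(int(n) for n in pos.split(","))
        entries[key] = (name, ftype)
    return entries

example_map = (r"[Customer\0\file|FirstName\1|LastName\2|CreditLimit\3"
               r"|OrderEntry\4\file|OrderID\4,1|OrderDetail\4,2\file"
               r"|Title\4,2,1|Author\4,2,2|Price\4,2,3]")
```

Looking up `(4, 2, 2)` in the result gives `("Author", "field")`, mirroring the delimiter-counting reference the text describes.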
If we put each record on a separate line, we can see that each one is made up of two or three fields. The fields are a name, a position, and a type. The type defaults to a standard one, which may be called a generic field.

Customer\0\file
FirstName\1
LastName\2
CreditLimit\3
OrderEntry\4\file
OrderID\4,1
OrderDetail\4,2\file
Title\4,2,1
Author\4,2,2
Price\4,2,3

The map means that data is held in a file called "Customer" in four fields. The first three fields are standard ones called FirstName, LastName, and CreditLimit. The fourth is a sub-file called OrderEntry. The OrderEntry sub-file has two fields; the first is a standard one called OrderID and the second is a further sub-file called OrderDetail. OrderDetail contains three fields, which are Title, Author, and Price.

Positional referencing

The positional numbers are the key concept in the mechanism. The field separators define what field a datum actually is, and a missing one will totally destroy meaning. They do, however, give a way to refer to data within the structure in an unambiguous way, using a notation whereby Customer{1} is the first name, Customer{4,2,2} is the set of authors, and Customer{4,2.2,3} is the price "6.58" in the example. That is, while Customer is the entire file, Customer{} is a record from that file.

A perspicacious comment I recently read said that the dynamic reference (like VAR<3>) in Pick was not a variable, but a process. Here I am claiming for SuperPick the curly brackets in a line like

READ CUSTOMER.REC{} FROM CUSTOMERFILE, ID ELSE STOP

Then I could have a line like

AUTHORS = CUSTOMER.REC{4,2,2}

which in Pick terms would be "Danzig" : CHAR(254) : "Wardlaw, Lee". This would be different from

AUTHORS{} = CUSTOMER.REC{4,2,2}

which would be "Danzig|Wardlaw, Lee", the pipe representing whatever was used as a record delimiter.
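The Customer{4,2,2}-style reference can be sketched against a parsed record where sub-files are nested lists. This is my own illustration of the idea (the record literal and function name are assumptions); the record-selection syntax like {4,2.2,3} is not handled in this sketch.

```python
# The example record, parsed: a field is a string or a sub-file
# (a list of records, each a list of fields).
customer_rec = [
    "Amy", "Higginbottom", "5000",
    [["16273",
      [["Number, the Language of Science", "Danzig", "5.95"],
       ["Tales of Grandpa Cat", "Wardlaw, Lee", "6.58"]]]],
]

def pick(record, path):
    """Resolve a 1-based positional path against a record.  When the
    selected field is a sub-file, the rest of the path is applied to
    every record of it, so (4, 2, 2) collects the authors."""
    value = record[path[0] - 1]
    if len(path) == 1:
        return value
    return [pick(rec, path[1:]) for rec in value]
```

Here `pick(customer_rec, (1,))` returns "Amy", and `pick(customer_rec, (4, 2, 2))` returns the authors grouped per OrderEntry record, which is the "set of authors" behaviour described above.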
The data

If we now look at the data

[Amy\Higginbottom\5000\[16273\[Number, the Language of Science\Danzig\5.95|Tales of Grandpa Cat\Wardlaw, Lee\6.58]]]

we can see that it is a file of one record. Laying this out with an indent for each sub-file shows the record structure as below:

Amy\Higginbottom\5000\
  16273\
    Number, the Language of Science\Danzig\5.95
    Tales of Grandpa Cat\Wardlaw, Lee\6.58

Expanding this and labeling each field gives

FirstName   Amy
LastName    Higginbottom
CreditLimit 5000
OrderEntry
  OrderID   16273
  OrderDetail
    Title   Number, the Language of Science
    Author  Danzig
    Price   5.95
  OrderDetail
    Title   Tales of Grandpa Cat
    Author  Wardlaw, Lee
    Price   6.58

In XML form

And from there, I can put the data into the XML form

<Customer>
  <FirstName>Amy</FirstName>
  <LastName>Higginbottom</LastName>
  <CreditLimit>5000</CreditLimit>
  <OrderEntry>
    <OrderID>16273</OrderID>
    <OrderDetail>
      <Title>Number, the Language of Science</Title>
      <Author>Danzig</Author>
      <Price>5.95</Price>
    </OrderDetail>
    <OrderDetail>
      <Title>Tales of Grandpa Cat</Title>
      <Author>Wardlaw, Lee</Author>
      <Price>6.58</Price>
    </OrderDetail>
  </OrderEntry>
</Customer>

Comparison

The previous XML was the original example I picked up from the Internet when I was first looking at XML and thought "this is SO verbose". While it's not entirely a fair comparison, the XML version is 402 characters while the map and data added together are 263 characters. I have done tests that indicate one could reduce the typical XML by about 50-60%, and that a zipped file would be about 20% shorter than zipping the original XML.

Some extra thoughts

The mechanism as described does not cover using XML attributes interchangeably with tags for storing data. This is not outside the XML specification, but attributes do seem to belong in the DTD, in my opinion. However, an extension to the types in the map could easily cover this. The map does not intrude into areas covered by the XML DTD, but perhaps it should for SuperPick itself.
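The whole round trip - parse the data string, then re-emit it as XML using the map's names - can be sketched under the example's [ ] | \ delimiters. All names below are my own illustration (a sketch with no escaping or validation), not part of the proposal.

```python
def parse_file(s, i=0):
    """Parse a file at s[i] == '['; return (list of records, next index).
    Each record is a list of fields; a field is a string or a sub-file."""
    i += 1                                    # skip the file-start bracket
    records, fields = [], []
    field, pending = "", None                 # pending holds a sub-file field
    while s[i] != "]":
        c = s[i]
        if c == "[":
            pending, i = parse_file(s, i)     # nested sub-file
            continue
        if c == "\\":                         # field separator
            fields.append(pending if pending is not None else field)
            field, pending = "", None
        elif c == "|":                        # record separator
            fields.append(pending if pending is not None else field)
            records.append(fields)
            fields, field, pending = [], "", None
        else:
            field += c
        i += 1
    fields.append(pending if pending is not None else field)
    records.append(fields)
    return records, i + 1

def record_to_xml(rec, names, pos=()):
    """Tag each field from a position->name map; a sub-file becomes one
    element per sub-record, matching the OrderDetail repetition above."""
    out = []
    for n, fld in enumerate(rec, start=1):
        p, tag = pos + (n,), names[pos + (n,)]
        if isinstance(fld, list):
            out += [f"<{tag}>{record_to_xml(sub, names, p)}</{tag}>" for sub in fld]
        else:
            out.append(f"<{tag}>{fld}</{tag}>")
    return "".join(out)

names = {(1,): "FirstName", (2,): "LastName", (3,): "CreditLimit",
         (4,): "OrderEntry", (4, 1): "OrderID", (4, 2): "OrderDetail",
         (4, 2, 1): "Title", (4, 2, 2): "Author", (4, 2, 3): "Price"}
data = (r"[Amy\Higginbottom\5000\[16273\[Number, the Language of Science"
        r"\Danzig\5.95|Tales of Grandpa Cat\Wardlaw, Lee\6.58]]]")
records, _ = parse_file(data)
xml = "<Customer>" + record_to_xml(records[0], names) + "</Customer>"
```

Running this against the example data produces a compact (unindented) version of the XML shown above, with two OrderDetail elements inside the single OrderEntry.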
I see defining here whether a field is a date or a number, and whether it is mandatory. A useful extension would be to have a limit on the number of records allowed in a file. The practical circumstance covered here is something like an address (a multi-value in Pick, a sub-file with one field in SuperPick) where you want to limit the number of lines so it fits on a label.

Within a SuperPick dictionary, you could refer to the map field names directly, or to something like @SRECORD{4,2.2,2} perhaps. The map is an intrinsic part of a file, but one could have a separate dictionary item (something like Pick's file translations) that links this file to another. A query would then return its results in the form of another map (which covers only the data set requested) and the requested data. The query results could include stuff from files at one, two, or more removes from the one initially interrogated. Logically, any 'jump' to another file results in a new sub-file in the query results.

One way to implement this would be to use the Berkeley DB (http://www.sleepycat.com/products.html) and add keys to the map, like keypartA\1\key|keypartB\2\key... The key would be a string, and a record would be another string - Berkeley DB will support this.

Weaknesses

The obvious one is that the data is not as easily read as XML. However, it is not impossible to read the data if one has the map, and it would be easy to write software that presents the data in a readable form (something like the intermediate forms I used above), with perhaps a zooming facility for sub-files.

A map that put a lot of usually empty fields first would make records with lots of leading delimiters. The data would then take more space than required, possibly more than an XML version.

It is not possible to build the map without knowing all the structure required, because there is a difference between fields and sub-files.
My feeling is that this would provide a gentle push to make the structure an efficient representation of the data, in these terms. Also, I can see that it would be relatively easy to go through a file, counting the fields, and then to re-structure the file to be more 'efficient' with an automatic process.

--
u2-users mailing list
[EMAIL PROTECTED]
http://www.oliver.com/mailman/listinfo/u2-users
