Re: Object field storage: Questions and alternatives
> I think it was unfortunate that "some thing like a hash table" was mentioned,
> in passing, when the object feature was explained in detail.

Thanks *very* much for the clarification. Of course now, we don't even know that about object field indexes ;-)

> it was simply a demo to show how using an object property name, which is case sensitive,
> is much faster than "Find in array". that was all.

Presumably, the comparison was with an unsorted array. A binary search is competitive with a hash table, depending a lot on the hash. And binary searches are great for range searches whereas hash tables are *not* optimized for ranges. So even here "faster" depends on what you're searching for and why.

**
4D Internet Users Group (4D iNUG)
FAQ: http://lists.4d.com/faqnug.html
Archive: http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub: mailto:4d_tech-unsubscr...@lists.4d.com
**
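To make the hash-versus-binary-search point concrete, here is a minimal Python sketch. It says nothing about how 4D implements object properties or "Find in array"; the key list and the helper names `binary_find` and `range_find` are made up for illustration.

```python
import bisect

# Hypothetical data: the same keys held in a sorted list and in a dict (hash table).
keys = sorted(["apple", "banana", "cherry", "grape", "mango", "peach"])
lookup = {key: index for index, key in enumerate(keys)}

def binary_find(sorted_keys, target):
    """Exact-match lookup by binary search; returns the index or -1."""
    i = bisect.bisect_left(sorted_keys, target)
    return i if i < len(sorted_keys) and sorted_keys[i] == target else -1

def range_find(sorted_keys, low, high):
    """All keys with low <= key < high, found with two binary searches."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_left(sorted_keys, high)
    return sorted_keys[start:end]

# Exact match: both structures answer quickly.
exact_hash = lookup.get("grape", -1)
exact_binary = binary_find(keys, "grape")

# Range query: the sorted list needs two searches and a slice,
# while the hash table has no ordering and must scan every key.
in_range = range_find(keys, "b", "h")
in_range_scan = sorted(key for key in lookup if "b" <= key < "h")
```

Both approaches return the same answers; the difference is only in how much work the range query costs as the key count grows.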
Re: Object field storage: Questions and alternatives
Keisuke,

You are *the* man on character encoding stuff in 4D, so thanks for adding your observations. I'm likely to be working with nothing but low ASCII, Latin1 data, for what it's worth. Just simple names for keys and numbers for most values.

As far as the planning question goes, I'm just trying to get some rules of thumb, not totally perfect numbers. And I think that I have that now:

* 4D stores objects/text in UTF-16.
* The data is stored in a binary format.
* The binary format isn't really meaningfully more compact than the original text. (In fact, it may be larger for some reason.)

That's enough for planning, if correct. And, for the application part:

* An object field can be searched in various ways, so long as you use 4D's supported styles.
* These styles of JSON are not (meant to be) space efficient.
* If you want to do space-efficient, valid JSON, I don't think 4D offers any options. Nor does its parser. You need NTK to do very compact, valid JSON.

I haven't used 4D's object fields much (obviously), but I've done a ton with JSON with various formats at scale. I've had to rework formats repeatedly in the past to address space and performance constraints. This is one reason I'd like a more complete JSON parser and code set natively in 4D. NTK is great, but not everyone has it and, in my experience, it's significantly slower than 4D's native JSON tools. (They're remarkably fast.)
Re: Object field storage: Questions and alternatives
one more thing,

it is not incorrect to do something like

{ "\u0066\u006F\u006F":"\u0062\u0061\u0072" }

instead of

{"foo":"bar"}

it's an extreme example, but it happens all the time in non-ASCII JSON. I may prefer one over the other, but really I don't care, and nor should I care, because it would be wrong to assume a particular implementation.

in hindsight, I think it was unfortunate that "some thing like a hash table" was mentioned, in passing, when the object feature was explained in detail. it gave the wrong signal that there was some kind of clever optimisation going on that 4D would not talk about in detail. that was not the context.

it was simply a demo to show how using an object property name, which is case sensitive, is much faster than "Find in array". that was all.

it had nothing to do with QUERY BY ATTRIBUTE,
it had nothing to do with automatic indexes,
it definitely had nothing to do with optimising storage.
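The equivalence of the escaped and plain forms is easy to check with any standards-compliant JSON parser. The sketch below uses Python's standard `json` module, not 4D's parser, so it only shows what the JSON format allows; the variable names are illustrative.

```python
import json

# The same object written with \uXXXX escapes and written plainly.
escaped = r'{ "\u0066\u006F\u006F": "\u0062\u0061\u0072" }'
plain = '{"foo":"bar"}'

# Both texts parse to the same value, so a consumer cannot (and should not)
# rely on which spelling the producer chose.
same = json.loads(escaped) == json.loads(plain)

# The escaped spelling is much longer as text, though.
sizes = (len(escaped), len(plain))

# Python's encoder escapes non-ASCII characters by default, which is one way
# such JSON arises in practice; the round trip still returns the original text.
japanese_json = json.dumps("日本語")
round_trip = json.loads(japanese_json)
```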
Re: Object field storage: Questions and alternatives
Hello,

I think it would not be telling the full story if you just focus on how UTF-16 takes double the size for standard ASCII and is therefore uneconomical. and I am not saying this just because UTF-16 actually takes less space than UTF-8 for Japanese text.

one must take into account that the standard Windows API (wide characters) and the macOS API (NSString) both use UTF-16, as does the file system (paths). so it makes sense that 4D uses UTF-16 for its internal string representation, since it can be used directly with native APIs.

if UTF-8 were used for storage, then the data file might be smaller, but 4D would have to do the conversion back to UTF-16 before loading it into memory.

> On 2017/07/12 9:03, David Adams via 4D_Tech <4d_tech@lists.4d.com> wrote:
>
> In 4D, I think you're always
> using UTF-16, is that right?
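The size trade-off between the two encodings can be measured directly. This is a small Python sketch with made-up sample strings; it counts encoded bytes only and says nothing about 4D's on-disk format. The `utf-16-le` codec is used so that no byte-order mark is added to the count.

```python
def encoded_sizes(text):
    """Byte counts for the same text in UTF-8 and in UTF-16 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }

# 13 ASCII characters: 1 byte each in UTF-8, 2 bytes each in UTF-16.
ascii_sizes = encoded_sizes("customer_name")

# 8 Japanese characters: 3 bytes each in UTF-8, 2 bytes each in UTF-16.
japanese_sizes = encoded_sizes("日本語のテキスト")
```

For ASCII-only data UTF-16 doubles the byte count, while for this Japanese sample it is a third smaller than UTF-8.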
Re: Object field storage: Questions and alternatives
On Wed, Jul 12, 2017 at 6:41 AM, Lee Hinde via 4D_Tech <4d_tech@lists.4d.com> wrote:

> On Tue, Jul 11, 2017 at 3:10 AM, David Adams via 4D_Tech <4d_tech@lists.4d.com> wrote:
>
> > * Use a header object that describes the 'columns' and then use compact
> > JSON arrays for the data. Rob Laveaux reminded me about this option some
> > months back and it's a really decent compromise.
>
> That seems like a good option - it's the same, mostly, as a TSV/CSV. But it
> seems like JSON isn't appropriate for your storage needs (compactness
> trumps all), irrespective of how 4D stores it internally.

Hey Lee! It depends. In the system I just described with 4D+MySQL+Mustache+PHP+DataTables+D3, some of the pre-rendered data was stored as JSON and some was stored as TSV and converted to JSON on the fly. The example I brought up here wasn't about my big flabby arrays - there are lots of ways to rework those. But bytes are bytes. Imagine that you had 100KB of sensible JSON, not silly JSON. It's the behavior of the storage engine that I'm interested in.

As to storing JSON in a more compact format, well, I think you then lose all of the nifty search, etc. features that 4D offers. At that point, there's no strong argument for using JSON at all. I guess it's a bit easier to parse than plain text, so there's that. Actually, TSV is super easy to parse.

As far as I can tell, price of admission for the enticing object field features is that you lay your JSON out as pure name-value pairs (at whatever level of nesting.) If that's how it works, fine. I just want to make sure so that I can make the right design choices. I tend to iterate through implementations from easier to more optimized. This whole question was just an iteration in that sort of process. Day 1: Stuff everything into an object field. OMG, it's huge! That's the point where the thread came in...but now I think I have a grip on the storage costs of JSON (however formatted) in 4D's engine.
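For readers who have not seen the "header object plus compact arrays" layout, here is a rough Python sketch of the idea. The record fields (`id`, `name`, `price`) and the `columns`/`rows` key names are invented for illustration; nothing here is a 4D or NTK API.

```python
import json

# Hypothetical records laid out as plain name-value pairs,
# with every key repeated in every record.
records = [{"id": i, "name": f"item{i}", "price": i * 1.5} for i in range(100)]

# The same data with the keys stated once in a header and the values in arrays.
columns = ["id", "name", "price"]
packed = {"columns": columns, "rows": [[rec[col] for col in columns] for rec in records]}

def compact_size(value):
    """Character count of the JSON text with no optional whitespace."""
    return len(json.dumps(value, separators=(",", ":")))

def unpack(packed_obj):
    """Rebuild the name-value records from the header and the rows."""
    return [dict(zip(packed_obj["columns"], row)) for row in packed_obj["rows"]]

pairs_size = compact_size(records)
packed_size = compact_size(packed)
```

The packed form is smaller because the key names appear once instead of once per record, but the values are then only reachable by position, which is the trade-off against name-based searching discussed above.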
Re: Object field storage: Questions and alternatives
Okay, I'm back and think that I get it now. And, yeah, storage space with 4D object fields is definitely a planning issue. Headline result: Copy and paste a 1MB JSON from a text editor and save it in 4D and it's likely to take about 2MB to store. More below. Corrections welcomed.

I've also posted my 'findings' (current notions) on the forums in France where, hopefully, any errors I've made will be addressed. (I've experienced some vigorous correction already on that thread, so clearly folks aren't being shy about telling me when I'm wrong.) On this thread, I'm also adding a story and some comments at the bottom for background and thought. The take-away from that is that if you've got a small system, you'll probably never care about this, and if you have a big system, you very likely might care a lot.

Oh, and I posted a link to a feature request I made but Cannon Smith made a better one already, please vote for it:

Option To Compress Object (and Text) Fields
http://forums.4d.fr/Post/FR/17748608/1/17748609#17748609

Okay, as far as I can tell, the answer I was after is really simple:

Question: How much room do objects require for storage in an object field?

Answer: Very nearly the amount the object-as-JSON-without-whitespace would take stored as UTF-16. Put another way, if you take a 1MB JSON in a text editor, paste it into an object field and save it, it's likely to take roughly 2MB to store.

Below are several points that I think are right (or close to it), but improvements and corrections are welcome:

* Object fields don't store straight JSON, they use some kind of binary format.

* The binary format seems to take up very nearly as much space as the raw JSON would, stripped of whitespace. (There may be some padding between elements.)

* It is likely that the storage space is double what you would expect because 4D tables always use UTF-16. If you need to store large character data, then UTF-16 is what you need. If you don't need to store large character data, then UTF-16 doubles the storage requirements for all your alpha/text/object data. If you're using something like SQLite, MySQL/MariaDB, or PostgreSQL, you can choose the character set (per database, or per table or column in MySQL/MariaDB). (All support UTF-8, all but SQLite support even smaller character encodings.) So, if you're dealing with pure ASCII-like data, you could potentially save it without compression in another character set that takes about half as much space as UTF-16. In 4D, I think you're always using UTF-16, is that right?

* I don't care what the binary format is. No one else cares either ;-)

* The binary format may change in future versions. Again, no one minds or cares about the details.

* There are a lot of ways to format and store JSON. Some are inefficient (like the example I was using), some are more compact. This is a pretty well-travelled subject in the world as JSON is so common, regardless of language. I'll skip discussing how to pack and organize JSON as it's a different subject entirely from how the JSON is stored and handled by the DB engine.

* If you want to use 4D's object field magic features, your data needs to be laid out in name-value pairs. And the name-value pairs are *stored in the data without compression.* You can see them right in the hex of the 4DD.

* If you don't need to use 4D's object field magic features, you can rewrite your JSON to be more compact. Or, for that matter, you can convert it to something even more compact than that and store it as text.

It would be great to have the option to compress JSON stored by 4D, so long as it allows indexed searching to work properly. The idea is that if you're using 4D as a repository for flabby data (logs, instrumentation reports, etc.) or pre-rendered JSON for export/serving, you don't typically need to search into it; indexes are just fine. There's a cost to the compression/decompression, but that could pay for itself quickly in some situations.
I put in a request like this and then Cannon Smith posted a link to a better-written existing request:

Option To Compress Object (and Text) Fields
http://forums.4d.fr/Post/FR/17748608/1/17748609#17748609

Please vote for Cannon's request.
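The rule of thumb above (compact JSON stored as UTF-16, so roughly double the text-editor size for ASCII data) can be turned into a quick planning estimate. The Python sketch below is only an approximation under that assumption: it does not reproduce 4D's binary format, the function name `storage_estimate` and the sample data are invented, and the `zlib` figure is included only to suggest what a compression option might save.

```python
import json
import zlib

def storage_estimate(json_text):
    """Rough planning numbers for one JSON document (not 4D's actual format)."""
    # Re-serialize without optional whitespace, keeping non-ASCII characters as-is.
    compact = json.dumps(json.loads(json_text), separators=(",", ":"), ensure_ascii=False)
    utf16 = compact.encode("utf-16-le")  # little-endian, no byte-order mark
    return {
        "source_chars": len(json_text),
        "compact_chars": len(compact),
        "utf16_bytes": len(utf16),
        "zlib_bytes": len(zlib.compress(utf16)),
    }

# Hypothetical pretty-printed input, as it might look in a text editor.
sample = json.dumps([{"name": "sensor", "value": n} for n in range(1000)], indent=2)
estimate = storage_estimate(sample)
```

For ASCII-only input, `utf16_bytes` is exactly twice `compact_chars`, and repetitive name-value data like this sample compresses to a small fraction of that.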
Re: Object field storage: Questions and alternatives
On Tue, Jul 11, 2017 at 3:10 AM, David Adams via 4D_Tech <4d_tech@lists.4d.com> wrote:

> * Use a header object that describes the 'columns' and then use compact
> JSON arrays for the data. Rob Laveaux reminded me about this option some
> months back and it's a really decent compromise.

That seems like a good option - it's the same, mostly, as a TSV/CSV. But it seems like JSON isn't appropriate for your storage needs (compactness trumps all), irrespective of how 4D stores it internally.
Re: Object field storage: Questions and alternatives
I was making some related comments on the forums in France today and Rob Laveaux, a hero of the 4D world, made a great technical post:

> Hi David,
>
> Please elaborate what makes you conclude this? You seem to imply that object fields are stored in JSON format, but I'm sorry ... that is not true. Take a hex-editor to examine your datafile and you will see they are stored in a binary format. I just want to set this straight, before it is taken as a fact by others.
>
> Of course, if you store large objects, they occupy disk space. Just like large text, picture or blob fields. Compression of data is possible, but it comes at a cost: the cost of decompression and recompression. This takes CPU time, which is more costly than the cost of disk space. But I agree, it would be nice to have data compression as an option.
>
> Greetings,
> Rob

I wanted to repost it here for the sake of the archives because, according to Rob, I'm starting an erroneous superstition. I'm still in the dark about the real-world story and trade-offs in 4D's object fields (I tried them for the first time yesterday), and hopefully we'll get some more details out of the folks in France. For now, it looks like, based on what Rob says, things aren't as dire as I feared - but I don't have an easy way to quantify that yet. Not keen to get out a hex editor and start experimenting/counting when France could just tell us.