Re: Object field storage: Questions and alternatives

2017-07-11 Thread David Adams via 4D_Tech
> I think it was unfortunate that "some thing like a hash table" was
> mentioned, in passing, when the object feature was explained in detail.

Thanks *very* much for the clarification.

Of course now, we don't even know that about object field indexes ;-)

> it was simply a demo to show how using an object property name, which is
> case sensitive, is much faster than "Find in array". that was all.

Presumably, the comparison was with an unsorted array. A binary search is
competitive with a hash table, depending a lot on the hash. And binary
searches are great for range searches whereas hash tables are *not*
optimized for ranges. So even here "faster" depends on what you're
searching for and why.
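
To make that trade-off concrete, here is a minimal sketch in Python (not 4D code; every name in it is made up for illustration). It compares a linear scan of an array, a binary search on a sorted array, and a hash-table lookup, and shows why a range query suits the sorted array better:

```python
import bisect

# Hypothetical sample data, already sorted so binary search applies.
names = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot"]

def find_linear(items, target):
    """Linear scan, the only option on an unsorted array: O(n)."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1

def find_binary(sorted_items, target):
    """Binary search on a sorted array: O(log n)."""
    i = bisect.bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1

def find_range(sorted_items, low, high):
    """Range query on a sorted array: two binary searches and a slice."""
    start = bisect.bisect_left(sorted_items, low)
    end = bisect.bisect_right(sorted_items, high)
    return sorted_items[start:end]

# Hash table: O(1) average for exact matches, but the keys have no order,
# so a range query has to test every key.
index = {name: i for i, name in enumerate(names)}

def hash_range(table, low, high):
    return sorted(k for k in table if low <= k <= high)
```

Here `find_range(names, "bravo", "delta")` and `hash_range(index, "bravo", "delta")` return the same three names, but the first touches only the matching slice while the second visits every key.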
**
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:4d_tech-unsubscr...@lists.4d.com
**

Re: Object field storage: Questions and alternatives

2017-07-11 Thread David Adams via 4D_Tech
Keisuke,

You are *the* man on character encoding stuff in 4D, so thanks for adding
your observations.

I'm likely to be working with nothing but low ASCII, Latin1 data, for what
it's worth. Just simple names for keys and numbers for most values.

As far as the planning question goes, I'm just trying to get some rules of
thumb, not totally perfect numbers. And I think that I have that now:

* 4D stores objects/text in UTF-16.

* The data is stored in a binary format.

* The binary format isn't really meaningfully more compact than the
original text. (In fact, it may be larger for some reason.)

That's enough for planning, if correct.

And, for the application part:

* An object field can be searched in various ways, so long as you use 4D's
supported styles.

* These styles of JSON are not (meant to be) space efficient.

* If you want to do space-efficient, valid JSON, I don't think 4D offers
any options. Nor does its parser. You need NTK to do very compact, valid
JSON.

I haven't used 4D's object fields much (obviously), but I've done a ton
with JSON with various formats at scale. I've had to rework formats
repeatedly in the past to address space and performance constraints. This
is one reason I'd like a more complete JSON parser and code set natively in
4D. NTK is great, but not everyone has it and, in my experience, it's
significantly slower than 4D's native JSON tools. (They're remarkably fast.)

Re: Object field storage: Questions and alternatives

2017-07-11 Thread Keisuke Miyako via 4D_Tech
one more thing,

it is not incorrect to do something like

{
"\u0066\u006F\u006F":"\u0062\u0061\u0072"
}

instead of {"foo":"bar"}

it's an extreme example,
but it happens all the time in non-ASCII JSON.

I may prefer one over the other,
but really I don't care, and nor should I care,
because it would be wrong to assume a particular implementation.
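
a quick way to check the equivalence, sketched in Python rather than 4D (the variable names are only for illustration): any conforming JSON parser should decode both spellings to the same object, even though the escaped text is much longer.

```python
import json

# Raw string so the backslashes reach the JSON parser untouched.
escaped = r'{"\u0066\u006F\u006F":"\u0062\u0061\u0072"}'
plain = '{"foo":"bar"}'

decoded_escaped = json.loads(escaped)
decoded_plain = json.loads(plain)

# Same object after parsing, different size as text.
same_object = decoded_escaped == decoded_plain
size_difference = len(escaped) - len(plain)
```

so the size of the JSON text says little about what an implementation keeps once the text has been parsed.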

in hindsight,

I think it was unfortunate that  "some thing like a hash table" was mentioned,
in passing, when the object feature was explained in detail.

it gave the wrong signal that there was some kind of clever optimisation
going on, but 4D will not talk about it in detail.

that was not the context.

it was simply a demo to show how using an object property name, which is
case sensitive, is much faster than "Find in array". that was all.

it had nothing to do with QUERY BY ATTRIBUTE,
it had nothing to do with automatic indexes,
it definitely had nothing to do with optimising storage.

Re: Object field storage: Questions and alternatives

2017-07-11 Thread Keisuke Miyako via 4D_Tech
Hello,

I think it would not be telling the full story if you just focus on how UTF-16
takes double the size for standard ASCII, and is therefore uneconomic.
and I am not saying this just because UTF-16 actually takes less space than
UTF-8 for Japanese text.

one must take into account that the standard Windows API (wide characters) and
macOS API (NSString) both use UTF-16, as does the file system (paths).
so it makes sense that 4D uses UTF-16 for internal string representation, since
it can be used directly with native APIs.

if UTF-8 was used for storage, then the data file may be smaller,
but 4D would have to do the conversion back to UTF-16 before loading it to
memory.
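
the size trade-off between the two encodings can be checked with a small Python sketch (not 4D code; the sample strings are arbitrary). it counts the encoded bytes of an ASCII string and a Japanese string, without a byte order mark:

```python
def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-16 byte count) for a string."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

ascii_text = "object field"
japanese_text = "オブジェクトフィールド"

ascii_utf8, ascii_utf16 = encoded_sizes(ascii_text)      # (12, 24)
japanese_utf8, japanese_utf16 = encoded_sizes(japanese_text)
```

for the ASCII string UTF-16 is twice the size of UTF-8. for the katakana string each character takes 3 bytes in UTF-8 but 2 bytes in UTF-16, so UTF-16 is the smaller one.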

> On 2017/07/12 9:03, David Adams via 4D_Tech <4d_tech@lists.4d.com> wrote:
> In 4D, I think you're always
> using UTF-16, is that right?

Re: Object field storage: Questions and alternatives

2017-07-11 Thread David Adams via 4D_Tech
On Wed, Jul 12, 2017 at 6:41 AM, Lee Hinde via 4D_Tech
<4d_tech@lists.4d.com> wrote:

> On Tue, Jul 11, 2017 at 3:10 AM, David Adams via 4D_Tech <
> 4d_tech@lists.4d.com> wrote:
>
> > * Use a header object that describes the 'columns' and then use compact
> > JSON arrays for the data. Rob Laveaux reminded me about this option some
> > months back and it's a really decent compromise.
> >
>
> That seems like a good option - it's the same, mostly, as a TSV/CSV. But it
> seems like JSON isn't appropriate for your storage needs (compactness
> trumps all), irrespective of how 4D stores it internally.
>
>
Hey Lee!

It depends. In the system I just described with
4D+MySQL+Mustache+PHP+DataTables+D3, some of the pre-rendered data was
stored as JSON and some was stored as TSV and converted to JSON on the fly.
The example I brought up here wasn't about my big flabby arrays - there are
lots of ways to rework those. But bytes are bytes. Imagine that you had
100KB of sensible JSON, not silly JSON. It's the behavior of the storage
engine that I'm interested in.

As to storing JSON in a more compact format, well, I think you then lose
all of the nifty search, etc. features that 4D offers. At that point,
there's no strong argument for using JSON at all. I guess it's a bit easier
to parse than plain text, so there's that. Actually, TSV is super easy to
parse.

As far as I can tell, the price of admission for the enticing object field
features is that you lay your JSON out as pure name-value pairs (at
whatever level of nesting.) If that's how it works, fine. I just want to
make sure so that I can make the right design choices. I tend to iterate
through implementations from easier to more optimized. This whole question
was just an iteration in that sort of process. Day 1: Stuff everything into
an object field. OMG, it's huge! That's the point where the thread came
in...but now I think I have a grip on the storage costs of JSON (however
formatted) in 4D's engine.

Re: Object field storage: Questions and alternatives

2017-07-11 Thread David Adams via 4D_Tech
Okay, I'm back and think that I get it now. And, yeah, storage space with
4D object fields is definitely a planning issue. Headline result: Copy and
paste a 1MB JSON from a text editor and save it in 4D and it's likely to
take about 2MB to store. More below. Corrections welcomed. I've also posted
my 'findings' (current notions) on the forums in France where, hopefully,
any errors I've made will be addressed. (I've experienced some vigorous
correction already on that thread, so clearly folks aren't being shy about
telling me when I'm wrong.)

On this thread, I'm also adding a story and some comments at the bottom for
background and thought. The take-away from that is if you've got a small
system, you'll probably never care about this and that if you have a big
system you very likely might care a lot.

Oh, and I posted a link to a feature request I made but Cannon Smith made a
better one already, please vote for it:

Option To Compress Object (and Text) Fields
http://forums.4d.fr/Post/FR/17748608/1/17748609#17748609

Okay, as far as I can tell, the answer I was after is really simple:

Question: How much room do objects require for storage in an object field?

Answer: Very nearly the amount the object-as-JSON-without-whitespace would
take stored as UTF-16.

Put another way, if you take a 1MB JSON in a text editor, paste it into an
object field and save it, it's likely to take roughly 2MB to store.
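
As a planning aid, that rule of thumb can be sketched in a few lines of Python (not 4D code, and only an estimate: it assumes the stored size tracks whitespace-free JSON encoded as UTF-16, which is my working notion here and not a documented format; the function name and sample object are made up):

```python
import json

def estimate_sizes(obj):
    """Estimate byte counts for an object serialized as compact JSON."""
    # separators=(",", ":") strips the whitespace json.dumps adds by default.
    compact = json.dumps(obj, separators=(",", ":"), ensure_ascii=False)
    return {
        "utf8": len(compact.encode("utf-8")),
        "utf16": len(compact.encode("utf-16-le")),  # assumed on-disk cost
    }

sample = {"name": "widget", "qty": 12, "tags": ["a", "b"]}
sizes = estimate_sizes(sample)
```

For pure ASCII data like `sample`, `sizes["utf16"]` comes out at exactly twice `sizes["utf8"]`, which is the 1MB-becomes-2MB effect.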

Below are several points that I think are right (or close to it), but
improvements and corrections are welcome:

* Object fields don't store straight JSON, they use some kind of binary
format.

* The binary format seems to take up very nearly as much space as the raw
JSON would, stripped of whitespace. (Padding between elements.)

* It is likely that the storage space is double what you would expect
because 4D tables always use UTF-16. If you need to store large character
data, then UTF-16 is what you need. If you don't need to store large
character data, then UTF-16 doubles the storage requirements for all your
alpha/text/object data. If you're using something like SQLite,
MySQL/MariaDB, or PostgreSQL, you can specify the character set on a
per-table basis. (All support UTF8, all but SQLite support even smaller
character encodings.) So, if you're dealing with pure ASCII-like data, you
could potentially save it without compression in another character set that
takes 1/4 or 1/2 as much space as UTF-16. In 4D, I think you're always
using UTF-16, is that right?

* I don't care what the binary format is. No one else cares either ;-)

* The binary format may change in future versions. Again, no one minds or
cares about the details.

* There are a lot of ways to format and store JSON. Some are inefficient
(like the example I was using), some are more compact. This is a pretty
well-travelled subject in the world as JSON is so common, regardless of
language. I'll skip discussing how to pack and organize JSON as it's a
different subject entirely from how the JSON is stored and handled by the
DB engine.

* If you want to use 4D's object field magic features, your data needs to
be laid out in name-value pairs. And the name-value pairs are *stored in
the data without compression.* You can see them right in the hex of the 4DD.

* If you don't need to use 4D's object field magic features, you can
rewrite your JSON to be more compact. Or, for that matter, you can convert
it to something even more compact than that and store it as text.

It would be great to have the option to compress JSON stored by 4D, so long
as it allows indexed searching to work properly. The idea is that if you're
using 4D as a repository for flabby data (logs, instrumentation reports,
etc.) or pre-rendered JSON for export/serving, you don't typically need to
search into it, indexes are just fine. There's a cost to the
compression/decompression, but that could pay for itself quickly in some
situations. I put in a request like this and then Cannon Smith posted a
link to a better-written existing request:

Option To Compress Object (and Text) Fields
http://forums.4d.fr/Post/FR/17748608/1/17748609#17748609

Please vote for Cannon's request.
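
To give a feel for what compression could buy on flabby, repetitive data, here is a rough Python sketch (not 4D code). It uses zlib purely as a stand-in, since nothing here says what algorithm 4D would use, and the log records are invented:

```python
import json
import zlib

# Hypothetical log-style records with highly repetitive keys and values.
records = [
    {"timestamp": 1499731200 + i, "level": "info", "message": "request served"}
    for i in range(1000)
]

# Compact JSON encoded as UTF-16, the assumed storage cost without compression.
raw = json.dumps(records, separators=(",", ":")).encode("utf-16-le")

# Compress, then reverse the steps to confirm nothing was lost.
packed = zlib.compress(raw, 6)
restored = json.loads(zlib.decompress(packed).decode("utf-16-le"))

ratio = len(packed) / len(raw)
```

On data this repetitive `ratio` ends up far below 1, and `restored` equals `records`. The CPU cost of the round trip is the price, and it has to be measured on real data before deciding whether it pays for itself.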

Re: Object field storage: Questions and alternatives

2017-07-11 Thread Lee Hinde via 4D_Tech
On Tue, Jul 11, 2017 at 3:10 AM, David Adams via 4D_Tech <
4d_tech@lists.4d.com> wrote:

> * Use a header object that describes the 'columns' and then use compact
> JSON arrays for the data. Rob Laveaux reminded me about this option some
> months back and it's a really decent compromise.
>

That seems like a good option - it's the same, mostly, as a TSV/CSV. But it
seems like JSON isn't appropriate for your storage needs (compactness
trumps all), irrespective of how 4D stores it internally.
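
For reference, the header-plus-arrays layout under discussion can be sketched like this in Python (not 4D code; the function names and sample records are made up, and it assumes every record has the same keys). The column names are written once and each record becomes a bare array, much like a TSV header row followed by data rows:

```python
import json

def to_columnar(records):
    """Turn a list of name-value objects into one header plus row arrays."""
    columns = list(records[0].keys())
    rows = [[rec[col] for col in columns] for rec in records]
    return {"columns": columns, "rows": rows}

def from_columnar(table):
    """Rebuild the original list of name-value objects."""
    return [dict(zip(table["columns"], row)) for row in table["rows"]]

records = [{"id": i, "name": f"user{i}", "score": i * 0.5} for i in range(100)]
table = to_columnar(records)

def compact_size(obj):
    return len(json.dumps(obj, separators=(",", ":")))

pairs_size = compact_size(records)
columnar_size = compact_size(table)
```

`columnar_size` is smaller than `pairs_size` because the property names are no longer repeated per record, and `from_columnar(table)` gives back the original list. The catch is that the data is no longer laid out as name-value pairs.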

Re: Object field storage: Questions and alternatives

2017-07-11 Thread David Adams via 4D_Tech
I was making some related comments on the forums in France today and
Rob Laveaux, a hero of the 4D world, made a great technical post:

Hi David,

Please elaborate what makes you conclude this?

You seem to imply that object fields are stored in JSON format, but I'm
sorry ... that is not true. Take a hex-editor to examine your datafile and
you will see they are stored in a binary format. I just want to set this
straight, before it is taken as a fact by others.

Of course, if you store large objects, they occupy disk space. Just like
large text, picture or blob fields.

Compression of data is possible, but it comes at a cost: the cost of
decompression and recompression. This takes CPU time, which is more costly
than the cost of disk space. But I agree, it would be nice to have data
compression as an option.

Greetings,

Rob


I wanted to repost it here for the sake of the archives because, according
to Rob, I'm starting an erroneous superstition. I'm still in the dark about
the real-world story and trade-offs in 4D's object fields (I tried them for
the first time yesterday), and hopefully we'll get some more details out of
the folks in France.

For now, it looks like, based on what Rob says, things aren't as dire as I
feared - but I don't have an easy way to quantify that yet. Not keen to get
out a hex editor and start experimenting/counting when France could just
tell us.