OK, so what you are asking for is a JSON minify option for the property replacer.
Is there an option in ES to have it not store the _source field? That would save
FAR more space.
Although, when you are putting things in ES, you are not going for the most
space-efficient storage in the first place. ES deliberately uses a LOT of space
to optimize its searches.
With 1-character values and field names, the extra whitespace could amount to a
significant percentage of the raw data, but once you add the indexing data that
ES also stores, the overall percentage gets knocked down drastically.
'{ "a" : "b" }, ' is about the worst possible case: total length 15 characters,
5 of them 'wasted', so it seems like about 30%. But when you realize that
everything gets stored at least twice, that cuts it down to about 15%. Then you
add in the index data, and the fact that real field names and data values are
going to be longer....
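As a rough check of that arithmetic, here is a minimal Python sketch. The "stored twice" step is only the simplification used in this message (the whitespace counted once against two full copies), not a measurement of what ES really writes to disk. The exact figures come out slightly above the rounded 30% and 15%.

```python
# Worst-case fragment from the message: one-character key and value,
# with a space around every token.
worst_case = '{ "a" : "b" }, '

total = len(worst_case)                # characters in the fragment
wasted = worst_case.count(" ")         # whitespace that minifying would remove

raw_share = wasted / total             # share of the raw text that is whitespace

# Assumption: everything is stored (at least) twice, so the same wasted
# characters are spread over twice as much stored data.
stored_twice_share = wasted / (2 * total)

print(f"total={total} wasted={wasted}")
print(f"raw share:          {raw_share:.1%}")
print(f"stored-twice share: {stored_twice_share:.1%}")
```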
I have trouble believing that this would make even a 1% difference on anything
resembling real-world data. And if it did, the best thing would be for ES to
store the data in a compressed form that would make the whitespace effectively
free (assuming it doesn't already).
Could you run a test: set up two ES instances, one that you populate with
'normal' data and one that you populate with the identical data 'minified', and
see if you can measure the difference?
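Before setting up the two instances, it may be worth checking how much whitespace the raw JSON carries at all. A minimal sketch, assuming Python's standard json module and a made-up record (the host name, content and the 100 "fieldN" keys are hypothetical, not real log data). It only compares the serialized bytes and says nothing about ES index size or on-disk compression.

```python
import json

# Hypothetical record with many short fields, standing in for a log line
# that has "hundreds of fields".
record = {
    "@timestamp": "2015-05-25T07:29:35+08:00",
    "host": "web032.example.com",
    "content": "Traceroute Result of api.example.com",
}
record.update({f"field{i}": i for i in range(100)})


def json_sizes(doc):
    """Return (spaced_bytes, minified_bytes) for one document."""
    # Default separators are ", " and ": ", i.e. one space after each.
    spaced = json.dumps(doc)
    # Compact separators remove all whitespace between tokens.
    minified = json.dumps(doc, separators=(",", ":"))
    return len(spaced.encode("utf-8")), len(minified.encode("utf-8"))


spaced_bytes, minified_bytes = json_sizes(record)
saved = spaced_bytes - minified_bytes
print(f"spaced={spaced_bytes} minified={minified_bytes} "
      f"saved={saved} ({saved / spaced_bytes:.1%} of raw JSON)")
```

This gives an upper bound on what minifying could save in _source alone; the two-instance test is still needed to see what is left of it after indexing.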
David Lang
On Tue, 26 May 2015, chenlin rao wrote:
No, ES stores the raw JSON in the _source field. We can set `"_size": {
"enabled": true }` to check the record size.
$ curl 10.19.0.97:9200/testindex/testtype/AU2OSfj0ZRvQT5qcC_l3?fields=_size,_source
{"_index":"testindex","_type":"testtype","_id":"AU2OSfj0ZRvQT5qcC_l3","_version":1,"found":true,"_source":{"@timestamp":"2015-05-25T07:29:35+08:00","host":"
web032.mweibo.yf.sinanode.com", "content": "Traceroute Result of
api.weibo.cn:\nDNS is 202.103.224.68\nIP is
180.149.153.216\n1|192:168:1:1|34.482ms\n2|219:159:136:1|27.005ms\n3|218:65:201:21|206.549ms\n4|202:103:236:53|116.733ms\n5|*\n6|*\n7|*\n8|180:149:128:54|339.948ms\n9|*\n10|180:149:129:178|302.180ms\n11|180:149:153:216|1.840s\n\n",
"__date": 1432477827.545300 },"fields":{"_size":420}}
$ curl 10.19.0.97:9200/testindex/testtype/AU2OSflQZRvQT5qcC_l4?fields=_size,_source
{"_index":"testindex","_type":"testtype","_id":"AU2OSflQZRvQT5qcC_l4","_version":1,"found":true,"_source":{"@timestamp":"2015-05-25T07:29:35+08:00","host":"
web032.mweibo.yf.sinanode.com", "content":"Traceroute Result of
api.weibo.cn:\nDNS
is 202.103.224.68\nIP is
180.149.153.216\n1|192:168:1:1|34.482ms\n2|219:159:136:1|27.005ms\n3|218:65:201:21|206.549ms\n4|202:103:236:53|116.733ms\n5|*\n6|*\n7|*\n8|180:149:128:54|339.948ms\n9|*\n10|180:149:129:178|302.180ms\n11|180:149:153:216|1.840s\n\n","__date":1432477827.545300},"fields":{"_size":416}}
Well, I know this is not a good example, but I have some other log lines
that have hundreds of fields and hundreds of blank spaces...
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.