OK, so what you are asking for is a JSON minify option for the property replacer.

Is there an option in ES to have it not store the _source field? That would save FAR more space.

Although, when you are putting things in ES, you are not going for the most space-efficient storage in the first place. ES deliberately uses a LOT of space to optimize its searches.

With 1-character values and field names, the extra whitespace could amount to a significant percentage of the raw data, but once you add the indexing data that ES also stores, the overall percentage gets knocked down drastically.

'{ "a" : "b" }, ' is about the worst possible case. Total length is 15 characters, 5 of them 'wasted', so it seems like about 33%. But when you realize that everything gets stored at least twice, that cuts it down to about 17%, and then you add in the index data, and the fact that the field names and data values are going to be longer....
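
The worst-case arithmetic above can be checked with a few lines of Python. This is only a sketch: the factor of two for "stored at least twice" is the estimate from this message, not something measured, and the variable names are made up for illustration.

```python
# Worst-case example from above: one-character key and value, padded with spaces.
padded = '{ "a" : "b" }, '

total = len(padded)                       # all characters, including the trailing space
wasted = sum(ch == " " for ch in padded)  # spaces that minifying would remove

raw_share = wasted / total
# Assumption from the text: everything is stored at least twice, so the
# whitespace is at most half as large a share of what ends up on disk.
stored_twice_share = wasted / (2 * total)

print(f"{wasted} of {total} characters are whitespace ({raw_share:.0%})")
print(f"share if the data is stored twice: {stored_twice_share:.0%}")
```

Longer field names and values, plus the index data, push the share down further from there.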

I have trouble believing that this would make even a 1% difference on anything resembling real-world data. And if it did, the best thing would be for ES to store the data in a compressed form that would make the whitespace effectively free (assuming it doesn't already).
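
To get a rough feel for how much compression shrinks the whitespace penalty, here is a minimal sketch using Python's zlib on made-up records. It says nothing about what ES actually does on disk; the sample fields, the record count, and the choice of zlib are all assumptions picked only for illustration.

```python
import json
import zlib

# Hypothetical log records; real data will have different field names and sizes.
records = [
    {"host": f"web{i:03d}.example.com", "latency_ms": (i * 37) % 1000,
     "status": 200 if i % 7 else 500, "msg": f"request {i} handled"}
    for i in range(200)
]

def serialize(separators):
    """Serialize all records, one JSON object per line, with the given separators."""
    lines = (json.dumps(r, separators=separators) for r in records)
    return "\n".join(lines).encode("utf-8")

spaced = serialize((", ", ": "))   # same spacing json.dumps uses by default
minified = serialize((",", ":"))   # no optional whitespace

raw_gap = len(spaced) - len(minified)
compressed_gap = len(zlib.compress(spaced)) - len(zlib.compress(minified))

print(f"raw: {len(spaced)} vs {len(minified)} bytes (gap {raw_gap})")
print(f"zlib-compressed gap: {compressed_gap} bytes")
```

If the compressed gap comes out far smaller than the raw gap, that supports the idea that compressed storage makes the whitespace nearly free.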

Could you run a test: set up two ES instances, one that you populate with 'normal' data and one that you populate with the identical data 'minified', and see if you can measure the difference?
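
For preparing the two data sets, something along these lines might do. It is a sketch under assumptions: the input is taken to be one JSON document per line, the function names are hypothetical, and loading the results into the two ES instances is left out entirely.

```python
import json

def minify_json_line(line: str) -> str:
    """Re-serialize one JSON document without optional whitespace.

    Note: a parse/re-serialize round trip can also change number formatting
    (for example trailing zeros), so this is not a pure whitespace strip.
    """
    return json.dumps(json.loads(line), separators=(",", ":"), ensure_ascii=False)

def minified_copy(lines):
    """Return (normal, minified) lists holding the same documents."""
    normal = [ln.strip() for ln in lines if ln.strip()]
    minified = [minify_json_line(ln) for ln in normal]
    # Sanity check: both sets must decode to identical data.
    assert [json.loads(a) for a in normal] == [json.loads(b) for b in minified]
    return normal, minified

# Hypothetical sample input; in a real test this would be the actual log lines.
sample = ['{ "host" : "web032", "content" : "DNS is 202.103.224.68" }',
          '{ "a" : "b" }']
normal, minified = minified_copy(sample)
print(sum(map(len, normal)), "bytes normal,", sum(map(len, minified)), "bytes minified")
```

The built-in check that both sets decode to the same documents matters here, since any difference in content would make the size comparison meaningless.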

David Lang

On Tue, 26 May 2015, chenlin rao wrote:

No, ES stores the raw JSON in the _source field. We can set `"_size": {"enabled": true}` to check the record size.

$ curl '10.19.0.97:9200/testindex/testtype/AU2OSfj0ZRvQT5qcC_l3?fields=_size,_source'
{"_index":"testindex","_type":"testtype","_id":"AU2OSfj0ZRvQT5qcC_l3","_version":1,"found":true,"_source":{"@timestamp":"2015-05-25T07:29:35+08:00","host":"
web032.mweibo.yf.sinanode.com", "content": "Traceroute Result of
api.weibo.cn:\nDNS is 202.103.224.68\nIP is
180.149.153.216\n1|192:168:1:1|34.482ms\n2|219:159:136:1|27.005ms\n3|218:65:201:21|206.549ms\n4|202:103:236:53|116.733ms\n5|*\n6|*\n7|*\n8|180:149:128:54|339.948ms\n9|*\n10|180:149:129:178|302.180ms\n11|180:149:153:216|1.840s\n\n",
"__date": 1432477827.545300 },"fields":{"_size":420}}

$ curl '10.19.0.97:9200/testindex/testtype/AU2OSflQZRvQT5qcC_l4?fields=_size,_source'
{"_index":"testindex","_type":"testtype","_id":"AU2OSflQZRvQT5qcC_l4","_version":1,"found":true,"_source":{"@timestamp":"2015-05-25T07:29:35+08:00","host":"
web032.mweibo.yf.sinanode.com", "content":"Traceroute Result of
api.weibo.cn:\nDNS
is 202.103.224.68\nIP is
180.149.153.216\n1|192:168:1:1|34.482ms\n2|219:159:136:1|27.005ms\n3|218:65:201:21|206.549ms\n4|202:103:236:53|116.733ms\n5|*\n6|*\n7|*\n8|180:149:128:54|339.948ms\n9|*\n10|180:149:129:178|302.180ms\n11|180:149:153:216|1.840s\n\n","__date":1432477827.545300},"fields":{"_size":416}}

Well, I know this is not a good example, but I have some other log lines
that have hundreds of fields and hundreds of blank spaces...
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.
