[ 
https://issues.apache.org/jira/browse/OAK-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344143#comment-15344143
 ] 

Ian Boston commented on OAK-4471:
---------------------------------

bq.  However we would still benefit with size of stuff that goes over the wire.

Wire protocol compression may be available for free in the drivers currently 
used by Oak.

https://jira.mongodb.org/browse/SERVER-3018?focusedCommentId=897375&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-897375
  compression can be enabled if OpenSSL is compiled with compression enabled, 
in which case zlib can be used by the driver (not checked whether the Java 
driver supports this, but the server and mongos do).
The ticket remains open because some don't want to use SSL-based compression, 
even with a null cipher.
Most common JDBC drivers already support wire protocol compression.
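
To give a feel for what zlib on the wire could save, here is a minimal, 
self-contained sketch using the JDK's own Deflater (the same zlib algorithm 
the OpenSSL-based compression in SERVER-3018 would apply). The JSON fragment 
is a made-up, hypothetical stand-in for a repetitive DocumentNodeStore 
revision map; this is not driver code, just an illustration of the 
compressibility of that kind of payload.

```java
import java.util.zip.Deflater;

public class ZlibSizeDemo {
    // Compress a byte array with zlib and return the compressed length.
    static int zlibSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[input.length + 64];
        int n = 0;
        while (!deflater.finished()) {
            n += deflater.deflate(buf, n, buf.length - n);
        }
        deflater.end();
        return n;
    }

    public static void main(String[] args) {
        // Hypothetical JSON resembling a revision map: repetitive
        // keys and values, which zlib handles very well.
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < 50; i++) {
            sb.append("\"r157a4f3cd2b-0-1\":\"c\",");
        }
        sb.append("\"_id\":\"1:/content\"}");
        byte[] raw = sb.toString().getBytes();
        System.out.println("raw=" + raw.length
                + " bytes, zlib=" + zlibSize(raw) + " bytes");
    }
}
```

The repetitive structure of revision maps is exactly the case where wire 
compression pays off, independent of any storage-format change.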



bq. One of the driver for me was Dynamo limit of 4k as 1 read unit

Does Oak deploy on Dynamo? AFAIK it has RDBMK and MongoMK support, but no 
Dynamo or Cassandra support.



bq. OAK-1312 - Bundle multiple nodes in single Document

agreed.

bq. Hybrid lucene index etc

Implementing a real-time index in a cluster using a shadow local index has 
been tried many times by others and abandoned after production experience 
with reliability and stability. I guess Oak might succeed where many others 
have failed. Most abandoned the segment-shipping model in favour of sharded 
indexes, with replication at the index-update level coupled with a 
write-ahead log to cover resilience and real-time, high-volume throughput.

Eliminating the mix of semi-transactional in-database property indexes, 
non-real-time Lucene indexes and offloaded Solr indexes, in favour of a 
performant, resilient, real-time index capability for all properties, would 
fix some aspects of the data record explosion that exists in the DocumentMK 
variants. OAK-1312 should address most of the other aspects.

> More compact storage format for Documents
> -----------------------------------------
>
>                 Key: OAK-4471
>                 URL: https://issues.apache.org/jira/browse/OAK-4471
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: documentmk
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>              Labels: performance
>             Fix For: 1.6
>
>         Attachments: node-doc-size2.png
>
>
> Aim of this task is to evaluate storage cost of current approach for various 
> Documents in DocumentNodeStore. And then evaluate possible alternative to see 
> if we can get a significant reduction in storage size.
> Possible areas of improvement
> # NodeDocument
> ## Use binary encoding for property values - Currently property values are 
> stored in JSON encoding, i.e. arrays and single values are encoded in JSON 
> along with their types
> ## Use binary encoding for Revision values - In a given document Revision 
> instances are a major part of storage size. A binary encoding might provide 
> more compact storage
> # Journal - The journal entries can be stored in compressed form
> Any new approach should support working with existing setups i.e. provide 
> gradual change in storage format. 
> *Possible Benefits*
> More compact storage would help in the following ways
> # Lower memory footprint of Documents in Mongo and RDB
> # Lower memory footprint for in-memory NodeDocument instances - e.g. 
> property values stored in binary format would consume less memory
> # Reduction in IO over the wire - that should reduce latency in, say, 
> distributed deployments where Oak has to talk to a remote primary
> Note that before doing any such change we must analyze the gains. Any 
> change in encoding would make interpreting stored data harder, and also 
> represents a significant change to the stored data, where we need to be 
> careful not to introduce any bugs!
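
As a rough illustration of the binary Revision encoding proposed above, here 
is a minimal sketch. The field layout (timestamp as long, counter and 
clusterId as ints, 16 bytes fixed) is an assumption for illustration, not 
Oak's actual format; note that for small counters the fixed layout is about 
the same size as the textual "r<ts-hex>-<counter>-<clusterId>" form, so real 
savings would need variable-length encoding on top.

```java
import java.nio.ByteBuffer;

public class RevisionEncoding {
    // Hypothetical fixed 16-byte layout: timestamp, counter, clusterId.
    static byte[] encode(long timestamp, int counter, int clusterId) {
        return ByteBuffer.allocate(16)
                .putLong(timestamp)
                .putInt(counter)
                .putInt(clusterId)
                .array();
    }

    // Decode the 16-byte layout back into its three fields.
    static long[] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new long[] { buf.getLong(), buf.getInt(), buf.getInt() };
    }

    // Textual form, resembling the current string representation.
    static String textual(long timestamp, int counter, int clusterId) {
        return "r" + Long.toHexString(timestamp) + "-"
                + Integer.toHexString(counter) + "-"
                + Integer.toHexString(clusterId);
    }

    public static void main(String[] args) {
        long ts = System.currentTimeMillis();
        System.out.println("text=" + textual(ts, 0, 1).length()
                + " chars, binary=" + encode(ts, 0, 1).length + " bytes");
    }
}
```

Even where the sizes are comparable, a binary form avoids repeated string 
parsing when Revision maps are read back into memory.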



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)