Apache Avro is a schema-based binary format with a JSON-like data model, so the same content would be smaller. Looking around the web, it might be 2X to 4X smaller than the JSON.
Gzipping the JSON can be a big win, especially if there are lots of repeated keys, like in state.json. Gzip has the advantage that some editors can natively unpack it.

This page has some size comparisons for one data set.

https://www.adaltas.com/en/2021/03/22/performance-comparison-of-file-formats/

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Oct 17, 2023, at 11:09 AM, Christine Poerschke (BLOOMBERG/ LONDON/ V) <cpoersc...@bloomberg.net> wrote:
>
> Hi Florin and Matthias,
>
> Thanks for sharing about this!
>
> Looking into where the JSON indentation in storage comes from -- from code reading only -- I think this is the code trail:
> * https://github.com/apache/solr/blob/releases/solr/9.4.0/solr/modules/ltr/src/java/org/apache/solr/ltr/store/rest/ManagedModelStore.java#L46
> * https://github.com/apache/solr/blob/releases/solr/9.4.0/solr/core/src/java/org/apache/solr/rest/ManagedResource.java#L348
> * https://github.com/apache/solr/blob/releases/solr/9.4.0/solr/core/src/java/org/apache/solr/rest/ManagedResource.java#L238
> * https://github.com/apache/solr/blob/releases/solr/9.4.0/solr/core/src/java/org/apache/solr/rest/ManagedResourceStorage.java#L443
> * https://github.com/apache/solr/blob/releases/solr/9.4.0/solr/solrj/src/java/org/apache/solr/common/util/Utils.java#L218-L227
>
> Thinking out aloud ...
>
> ... how might storing the model in non-JSON format work? Haven't looked into it.
>
> ... what are the concerns about DefaultWrapperModel usage, I'm guessing it's the split nature (i.e. the wrapper part of the model in ZK but the resource part on disk) -- is that so?
>
> Looking at the https://solr.apache.org/docs/9_4_0/modules/ltr/org/apache/solr/ltr/model/DefaultWrapperModel.html javadocs for an example configuration and there noticing the
>
> "resource": "models/myModel.json"
>
> element made me wonder ...
>
> ... what if the resource being wrapped was not external (giving rise to the split nature scenario) but internal, inlined in the model?
>
> ... Yes, something like
>
> "content": "{ \"class\": \"org.apache.solr.ltr.model.LinearModel\", \"name\": \"myModelName\", \"params\": { ... } }"
>
> would be a bit human-unreadable but would it save enough space?
>
> And/Or could representation in non-JSON format be of interest in some use cases?
>
> "format": "foobar",
> "content": " ... model representation in foobar format goes here ... "
>
> https://github.com/apache/solr/pull/2018 explores the AlternativeFormatWrapperModel idea.
>
> Hope that helps.
>
> Best wishes,
> Christine
>
> From: users@solr.apache.org At: 10/16/23 15:02:23 UTC+1:00
> To: users@solr.apache.org
> Subject: Re: Zk big files issues and model store
>
> Hi Florin,
>
> What has worked for me was making model deployment a software deployment task and bundling the model in the JAR I deployed with Solr, as the DefaultWrapperModel also loads resources from the classpath.
>
> Cheers
> Matthias
>
> On Sun, Oct 15, 2023 at 8:54 AM Florin Babes <babesflo...@gmail.com> wrote:
>
>> Hello,
>> We reached the limit of zk for storing LTR models. I want to avoid the usage of DefaultWrapperModel for as long as possible because we have deployed in a container orchestrator and this implementation can be really risky if you cannot guarantee the presence of the model on disk all the time.
>> So I want to use the managed model feature to upload some bigger models but zk is dying with OOM. What we noticed is that solr stores the models in a json indented file. By saving the models in compacted json our models will be 60% smaller.
>> Do you think that we should try to implement this? Could this work and allow us to postpone the moment of using DefaultWrapperModel?
>> Thanks,
>> Florin Babes
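
For anyone who wants to check the size claims in this thread against their own models, here is a rough Python sketch. It is only an illustration under assumptions: the model dict is a made-up stand-in shaped like a LinearModel, `encoded_sizes` is a hypothetical helper name, and Python's `indent=2` is not necessarily the same indentation Solr writes to ZooKeeper, so the ratios will differ from the 60% Florin measured. It compares indented JSON, compact JSON, the compact JSON inlined as an escaped "content" string, and gzip over both.

```python
import gzip
import json

# Hypothetical stand-in for an LTR model; real models are usually far larger
# and tree ensembles nest much deeper, which makes indentation cost more.
model = {
    "class": "org.apache.solr.ltr.model.LinearModel",
    "name": "myModelName",
    "features": [{"name": f"feature{i}"} for i in range(200)],
    "params": {"weights": {f"feature{i}": i * 0.01 for i in range(200)}},
}


def encoded_sizes(obj):
    """Return the byte size of several encodings of the same JSON value."""
    indented = json.dumps(obj, indent=2).encode("utf-8")
    # separators=(",", ":") drops the spaces json.dumps adds by default.
    compact = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    # Inlining the model as a string value, as in the "content" idea above:
    # every double quote inside the model gets a backslash escape.
    inlined = json.dumps(
        {"content": compact.decode("utf-8")}, separators=(",", ":")
    ).encode("utf-8")
    return {
        "indented": len(indented),
        "compact": len(compact),
        "inlined_string": len(inlined),
        "indented_gzip": len(gzip.compress(indented)),
        "compact_gzip": len(gzip.compress(compact)),
    }


sizes = encoded_sizes(model)
for label, size in sizes.items():
    share = size / sizes["indented"]
    print(f"{label:>15}: {size:7d} bytes ({share:.0%} of indented)")
```

To try it on a real model, replace the `model` dict with `json.load(open(path))` for an exported model file. Note that gzip output is not valid JSON, so storing it would still need a format change on the Solr side, along the lines of the "format" / "content" idea above.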