Hey! First off, Mahout is pretty much the bee's knees.

Anyhoo, I'm deploying my Mahout classifier using ZooKeeper, following the
technique from Mahout in Action, but my models often exceed the 1MB limit
ZooKeeper wants you to stick to. I'm using the AdaptiveLogisticRegression
algorithm, and I think I'm doing everything I'm supposed to (only
serializing the best model, etc.).

Here is my code:

ModelSerializer.writeBinary("/var/www/shared/model/products.model",
learningAlgorithm.getBest().getPayload().getLearner().getModels().get(0));

I feel like I'm missing something; most of my models are clocking in at
around 1.8MB. The complete model is, of course, somewhere around 200MB.

Do most people boost the znode size? Am I simply being too ambitious with
the number of features I'm using?
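For scale, here's a rough back-of-envelope estimate of model size. It assumes a dense logistic-regression model storing one 8-byte double per (category, feature) weight; the feature and category counts below are hypothetical, just chosen to land near the sizes I'm seeing:

```java
public class ModelSizeEstimate {

    // Rough estimate: a dense model stores one 8-byte double
    // per (category, feature) weight.
    public static long estimateBytes(int numFeatures, int numCategories) {
        return (long) numFeatures * numCategories * 8L;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: ~100k hashed features, binary classifier.
        // 100,000 * 2 * 8 bytes = 1,600,000 bytes, i.e. roughly 1.6MB.
        System.out.println(estimateBytes(100_000, 2));
    }
}
```

So if the feature-vector dimension is in the hundreds of thousands, a model well over 1MB seems expected rather than a serialization mistake.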

ZooKeeper lists boosting znode size under "unsafe operations," but I don't
know how big a deal this is.

(From http://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html)

*jute.maxbuffer (Java system property: jute.maxbuffer)

This option can only be set as a Java system property. There is no
zookeeper prefix on it. It specifies the maximum size of the data that can
be stored in a znode. The default is 0xfffff, or just under 1M. If this
option is changed, the system property must be set on all servers and
clients otherwise problems will arise. This is really a sanity check.
ZooKeeper is designed to store data on the order of kilobytes in size.*

Any help would be much appreciated, thanks!

Brandon Root
