I think that may have to do with the very large update that has been sitting on my machine because I forgot to commit the change.
On Wed, Dec 29, 2010 at 5:05 PM, Chris Schilling <[email protected]> wrote:
> Hey Ted,
>
> Sorry for the noise. I am looking around in the
> o.a.m.classifier.sgd.ModelSerializer and I only see methods for writeJson...
>
>
> On Dec 29, 2010, at 4:01 PM, Ted Dunning wrote:
>
> > Yes.
> >
> > That is evil. The problem is that GSON recurses on lists and that makes
> > memory use crazy bad.
> >
> > Try serializing as binary. I committed a change to allow that a few weeks
> > ago that added a method to ModelSerializer. The SGD models are also all
> > Writable's now which should make rolling your own serialization very easy..
> >
> >
> > On Wed, Dec 29, 2010 at 3:59 PM, Chris Schilling
> > <[email protected]> wrote:
> >
> >> Hi again,
> >>
> >> I notice that if I try to write the model for the 20 NG example, I am
> >> running out of memory. I am running on a small ec2 instance, so I run with
> >> the JVM with -Xmx1400m.
> >>
> >> So, I can train and dissect the model just fine. However, when I try to
> >> write the weights:
> >> ModelSerializer.writeJson("/tmp/sgd_adaptive.model", learningAlgorithm);
> >>
> >> My feature vector size is 10000.
> >>
> >> I get an OOM exception:
> >>
> >> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> >>     at org.apache.mahout.classifier.sgd.ModelSerializer$MatrixTypeAdapter.serialize(ModelSerializer.java:221)
> >>     at org.apache.mahout.classifier.sgd.ModelSerializer$MatrixTypeAdapter.serialize(ModelSerializer.java:210)
> >>     at com.google.gson.JsonSerializationVisitor.visitFieldUsingCustomHandler(JsonSerializationVisitor.java:148)
> >>     at com.google.gson.ObjectNavigator.navigateClassFields(ObjectNavigator.java:141)
> >>     at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:122)
> >>     at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:47)
> >>     at com.google.gson.DefaultTypeAdapters$CollectionTypeAdapter.serialize(DefaultTypeAdapters.java:445)
> >>     at com.google.gson.DefaultTypeAdapters$CollectionTypeAdapter.serialize(DefaultTypeAdapters.java:431)
> >>     at com.google.gson.JsonSerializationVisitor.visitFieldUsingCustomHandler(JsonSerializationVisitor.java:148)
> >>     at com.google.gson.ObjectNavigator.navigateClassFields(ObjectNavigator.java:141)
> >>     at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:122)
> >>     at com.google.gson.JsonSerializationVisitor.getJsonElementForChild(JsonSerializationVisitor.java:117)
> >>     at com.google.gson.JsonSerializationVisitor.addAsChildOfObject(JsonSerializationVisitor.java:95)
> >>     at com.google.gson.JsonSerializationVisitor.visitObjectField(JsonSerializationVisitor.java:90)
> >>     at com.google.gson.ObjectNavigator.navigateClassFields(ObjectNavigator.java:147)
> >>     at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:122)
> >>     at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:47)
> >>     at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:40)
> >>     at org.apache.mahout.classifier.sgd.ModelSerializer$StateTypeAdapter.serialize(ModelSerializer.java:333)
> >>     at org.apache.mahout.classifier.sgd.ModelSerializer$StateTypeAdapter.serialize(ModelSerializer.java:287)
> >>     at com.google.gson.JsonSerializationVisitor.visitUsingCustomHandler(JsonSerializationVisitor.java:128)
> >>     at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:96)
> >>     at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:47)
> >>     at org.apache.mahout.classifier.sgd.ModelSerializer$EvolutionaryProcessTypeAdapter.serialize(ModelSerializer.java:375)
> >>     at org.apache.mahout.classifier.sgd.ModelSerializer$EvolutionaryProcessTypeAdapter.serialize(ModelSerializer.java:339)
> >>     at com.google.gson.JsonSerializationVisitor.visitUsingCustomHandler(JsonSerializationVisitor.java:128)
> >>     at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:96)
> >>     at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:47)
> >>     at org.apache.mahout.classifier.sgd.ModelSerializer$AdaptiveLogisticRegressionTypeAdapter.serialize(ModelSerializer.java:189)
> >>     at org.apache.mahout.classifier.sgd.ModelSerializer$AdaptiveLogisticRegressionTypeAdapter.serialize(ModelSerializer.java:153)
> >>     at com.google.gson.JsonSerializationVisitor.visitUsingCustomHandler(JsonSerializationVisitor.java:128)
> >>     at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:96)
> >>
> >> Does this make sense? seems like too much memory to serialize.
> >>
> >> Thanks
> >> Chris
> >
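
For readers following the "serialize as binary" suggestion in the quoted thread, here is a minimal sketch of what rolling your own Writable-style serialization might look like. It uses only java.io and mirrors the shape of Hadoop's Writable contract (write(DataOutput) / readFields(DataInput)) without depending on Hadoop or Mahout. The class name BinaryWeightsSketch, its methods, and the plain double[][] weight matrix are hypothetical stand-ins for illustration, not Mahout's actual API or on-disk format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical stand-in for a model's weight matrix; not a Mahout class.
final class BinaryWeightsSketch {

    // Same shape as Writable.write(DataOutput): stream the dimensions, then
    // each value, without building an intermediate object tree in memory.
    // Assumes the matrix is rectangular (all rows have the same length).
    static void write(double[][] weights, DataOutput out) throws IOException {
        int rows = weights.length;
        int cols = rows == 0 ? 0 : weights[0].length;
        out.writeInt(rows);
        out.writeInt(cols);
        for (double[] row : weights) {
            for (double v : row) {
                out.writeDouble(v);
            }
        }
    }

    // Counterpart of Writable.readFields(DataInput): read back in the same order.
    static double[][] read(DataInput in) throws IOException {
        int rows = in.readInt();
        int cols = in.readInt();
        double[][] weights = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                weights[r][c] = in.readDouble();
            }
        }
        return weights;
    }

    // Convenience wrappers for in-memory round trips.
    static byte[] toBytes(double[][] weights) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            write(weights, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static double[][] fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Dimensions chosen only for illustration (10000 features, as in the thread).
        double[][] weights = new double[20][10000];
        weights[3][42] = 0.5;
        byte[] bytes = toBytes(weights);
        boolean same = Arrays.deepEquals(weights, fromBytes(bytes));
        System.out.println(bytes.length + " bytes written, round trip ok: " + same);
    }
}
```

The point of the sketch is the design choice: each double is written straight to the stream as 8 bytes, so the extra memory needed while saving stays small and roughly constant, whereas a JSON serializer that first builds a tree of element objects needs memory proportional to the whole model. For a real file, the same write method can be handed a DataOutputStream wrapped around a FileOutputStream.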
