Hi, I have a problem implementing Solr classification. This is my schema:

<field name="pagetext_mlt" type="text_mlt" indexed="true" stored="true" required="false" multiValued="false" termVectors="true"/>
<field name="knn_tags" type="string" indexed="true" stored="true" required="false" multiValued="true"/>

<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true" useDocValuesAsStored="true"/>
<fieldType name="text_mlt" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_id.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

And this is my solrconfig:

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">classi</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="classi">
  <processor class="solr.ClassificationUpdateProcessorFactory">
    <str name="inputFields">pagetext_mlt</str>
    <str name="classField">knn_tags</str>
    <str name="predictedClassField">prebayes_tags</str>
    <field name="prebayes_tags" type="string" indexed="true" stored="true" required="false" multiValued="true"/>
    <str name="algorithm">bayes</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

But this is not working. Steps:

1. Insert document A with pagetext_mlt="something A" and knn_tags="aaa"
2. Insert document B with pagetext_mlt="something B" and knn_tags="bbb"
3. Insert document C with pagetext_mlt="something B" and knn_tags=null

The prebayes_tags field is always empty (I can't see it even though the field is stored). Is there something I am missing?

Thanks,

Alessandro Benedetti wrote
> But how big is your index?
> Are you expecting Solr to automatically
> classify your documents without any knowledge groundbase?
> Please attach an example of schema.
> There was a reason if I asked you :)
> Seems related to the fact we get no token from the text analysis.
>
> Cheers
>
> On Fri, Jul 15, 2016 at 12:11 PM, Tomas Ramanauskas <Tomas.Ramanauskas@> wrote:
>
>> Hi, Alessandro,
>>
>> sorry for the delay. What do you mean?
>>
>> As I mentioned earlier, I followed a super simple set of steps.
>>
>> 1. Download Solr
>> 2. Configure classification
>> 3. Create some documents using curl over HTTP.
>>
>> Is it difficult to reproduce the steps / problem?
>>
>> Tomas
>>
>> > On 23 Jun 2016, at 16:42, Alessandro Benedetti <benedetti.alex85@> wrote:
>> >
>> > Can you give an example of your schema, and can you run a simple query for
>> > your index, curious to see how the input fields are analyzed.
>> >
>> > Cheers
>> >
>> > On Wed, Jun 22, 2016 at 6:05 PM, Alessandro Benedetti <benedetti.alex85@> wrote:
>> >
>> >> This is better! At least the classifier is invoked!
>> >> How many docs in the index have the class assigned?
>> >> Take a look at the stack trace and you should find the cause!
>> >> I am now on mobile, I will check the code tomorrow!
>> >>
>> >> Cheers
>> >>
>> >> On 22 Jun 2016 5:26 pm, "Tomas Ramanauskas" <Tomas.Ramanauskas@> wrote:
>> >>
>> >>> I also tried with this config (adding **):
>> >>>
>> >>> <initParams path="/update/**">
>> >>>   <lst name="defaults">
>> >>>     <str name="update.chain">classification</str>
>> >>>   </lst>
>> >>> </initParams>
>> >>>
>> >>> And I get the error:
>> >>>
>> >>> $ curl http://localhost:8983/solr/demo/update -d '
>> >>> [
>> >>> {"id" : "book15",
>> >>> "title_t":["The Way of Kings"],
>> >>> "author_s":"Brandon Sanderson",
>> >>> "cat_s": null,
>> >>> "pubyear_i":2010,
>> >>> "ISBN_s":"978-0-7653-2635-5"
>> >>> }
>> >>> ]'
>> >>>
>> >>> {"responseHeader":{"status":500,"QTime":29},"error":{"trace":"java.lang.NullPointerException\n\tat
>> >>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.getTokenArray(SimpleNaiveBayesDocumentClassifier.java:202)\n\tat
>> >>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.analyzeSeedDocument(SimpleNaiveBayesDocumentClassifier.java:162)\n\tat
>> >>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignNormClasses(SimpleNaiveBayesDocumentClassifier.java:121)\n\tat
>> >>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignClass(SimpleNaiveBayesDocumentClassifier.java:81)\n\tat
>> >>> org.apache.solr.update.processor.ClassificationUpdateProcessor.processAdd(ClassificationUpdateProcessor.java:94)\n\tat
>> >>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:474)\n\tat
>> >>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:138)\n\tat
>> >>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:114)\n\tat
>> >>> org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:77)\n\tat
>> >>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)\n\tat
>> >>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)\n\tat
>> >>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)\n\tat
>> >>> org.apache.solr.core.SolrCore.execute(SolrCore.java:2036)\n\tat
>> >>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:657)\n\tat
>> >>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)\n\tat
>> >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)\n\tat
>> >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)\n\tat
>> >>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)\n\tat
>> >>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)\n\tat
>> >>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
>> >>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>> >>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat
>> >>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)\n\tat
>> >>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\n\tat
>> >>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
>> >>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)\n\tat
>> >>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
>> >>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat
>> >>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat
>> >>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
>> >>> org.eclipse.jetty.server.Server.handle(Server.java:518)\n\tat
>> >>> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)\n\tat
>> >>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)\n\tat
>> >>> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\tat
>> >>> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\tat
>> >>> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat
>> >>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)\n\tat
>> >>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)\n\tat
>> >>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)\n\tat
>> >>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)\n\tat
>> >>> java.lang.Thread.run(Thread.java:745)\n","code":500}}
>> >>>
>> >>> Tomas
>> >>>
>> >>> On 22 Jun 2016, at 17:22, Tomas Ramanauskas <Tomas.Ramanauskas@> wrote:
>> >>>
>> >>> Thanks for the response, Alessandro.
>> >>>
>> >>> I tried this and it didn’t work either:
>> >>>
>> >>> $ curl http://localhost:8983/solr/demo/update -d '
>> >>> [
>> >>> {"id" : "book14",
>> >>> "title_t":["The Way of Kings"],
>> >>> "author_s":"Brandon Sanderson",
>> >>> "cat_s": null,
>> >>> "pubyear_i":2010,
>> >>> "ISBN_s":"978-0-7653-2635-5"
>> >>> }
>> >>> ]'
>> >>>
>> >>> {"responseHeader":{"status":0,"QTime":2}}
>> >>>
>> >>> $ curl http://localhost:8983/solr/demo/get?id=book14
>> >>> {
>> >>> "doc":
>> >>> {
>> >>> "id":"book14",
>> >>> "title_t":["The Way of Kings"],
>> >>> "author_s":"Brandon Sanderson",
>> >>> "pubyear_i":2010,
>> >>> "ISBN_s":"978-0-7653-2635-5",
>> >>> "_version_":1537854598189940736}}
>> >>>
>> >>> I don’t see the “cat_s” field in the results at all.
>> >>>
>> >>> Tomas
>> >>>
>> >>> On 22 Jun 2016, at 16:39, Alessandro Benedetti <abenedetti@> wrote:
>> >>>
>> >>> Hi Tomas,
>> >>> first consideration:
>> >>> an empty string is different from a NULL string.
>> >>> This is controversial; I would suggest you never use the empty String, as
>> >>> this can cause other side effects.
>> >>> Apart from that, the plugin will add the class only if the class field is
>> >>> without any value
>> >>>
>> >>> Object documentClass = doc.getFieldValue(classFieldName);
>> >>> if (documentClass == null) {
>> >>>
>> >>> Saying that, I would suggest you build a sample index with some
>> >>> documents and then try to classify.
>> >>> If this doesn't solve your issue, I can help you further.
>> >>>
>> >>> Cheers
>> >>>
>> >>> On Wed, Jun 22, 2016 at 3:45 PM, Tomas Ramanauskas <Tomas.Ramanauskas@> wrote:
>> >>>
>> >>> I also tried this configuration, but couldn't get the feature to work:
>> >>>
>> >>> <initParams path="/update/">
>> >>>   <lst name="defaults">
>> >>>     <str name="update.chain">classification</str>
>> >>>   </lst>
>> >>> </initParams>
>> >>>
>> >>> <updateRequestProcessorChain name="classification">
>> >>>   <processor class="solr.ClassificationUpdateProcessorFactory">
>> >>>     <str name="inputFields">title_t,author_s</str>
>> >>>     <str name="classField">cat_s</str>
>> >>>     <str name="algorithm">bayes</str>
>> >>>   </processor>
>> >>> </updateRequestProcessorChain>
>> >>>
>> >>> Tomas
>> >>>
>> >>> On 22 Jun 2016, at 13:46, Tomas Ramanauskas <Tomas.Ramanauskas@> wrote:
>> >>>
>> >>> P.S. The version I use:
>> >>>
>> >>> 6.1.0-68
>> >>>
>> >>> Also, earlier I said “If I modify an existing record, I think the
>> >>> functionality works:”, but I think it doesn’t work for me at all.
>> >>>
>> >>> $ curl http://localhost:8983/solr/demo/get?id=book1
>> >>> {
>> >>> "doc":
>> >>> {
>> >>> "id":"book1",
>> >>> "title_t":["The Way of Kings"],
>> >>> "author_s":"Brandon Sanderson",
>> >>> "cat_s":"fantasy",
>> >>> "pubyear_i":2010,
>> >>> "ISBN_s":"978-0-7653-2635-5",
>> >>> "_version_":1535488016326328320}}
>> >>>
>> >>> $ curl http://localhost:8983/solr/demo/update -d '
>> >>> [
>> >>> {"id" : "book1",
>> >>> "title_t":["The Way of Kings"],
>> >>> "author_s":"Brandon Sanderson",
>> >>> "cat_s":"aaa",
>> >>> "pubyear_i":2010,
>> >>> "ISBN_s":"978-0-7653-2635-5"
>> >>> }
>> >>> ]'
>> >>> {"responseHeader":{"status":0,"QTime":0}}
>> >>>
>> >>> $ curl http://localhost:8983/solr/demo/get?id=book1
>> >>> {
>> >>> "doc":
>> >>> {
>> >>> "id":"book1",
>> >>> "title_t":["The Way of Kings"],
>> >>> "author_s":"Brandon Sanderson",
>> >>> "cat_s":"fantasy",
>> >>> "pubyear_i":2010,
>> >>> "ISBN_s":"978-0-7653-2635-5",
>> >>> "_version_":1535488016326328320}}
>> >>>
>> >>> Tomas
>> >>>
>> >>> On 22 Jun 2016, at 12:47, Tomas Ramanauskas <Tomas.Ramanauskas@> wrote:
>> >>>
>> >>> Hi, everyone,
>> >>>
>> >>> would someone be able to share a working example (step by step) that
>> >>> demonstrates the use of the Naive Bayes classifier in Solr?
>> >>>
>> >>> I followed this blog post:
>> >>> https://alexbenedetti.blogspot.co.uk/2015/07/solr-document-classification-part-1.html?showComment=1464358093048#c2489902302085000947
>> >>>
>> >>> And this tutorial:
>> >>> http://yonik.com/solr-tutorial/
>> >>>
>> >>> And this JIRA ticket:
>> >>> https://issues.apache.org/jira/browse/SOLR-7739
>> >>>
>> >>> So this is my configuration file (only what I added or modified):
>> >>>
>> >>> <initParams path="/update/**">
>> >>>   <lst name="defaults">
>> >>>     <str name="update.chain">classification</str>
>> >>>   </lst>
>> >>> </initParams>
>> >>>
>> >>> <updateRequestProcessorChain name="classification">
>> >>>   <processor class="solr.ClassificationUpdateProcessorFactory">
>> >>>     <str name="inputFields">title_t,author_s</str>
>> >>>     <str name="classField">cat_s</str>
>> >>>     <str name="algorithm">bayes</str>
>> >>>   </processor>
>> >>> </updateRequestProcessorChain>
>> >>>
>> >>> If I modify an existing record, I think the functionality works:
>> >>>
>> >>> $ curl http://localhost:8983/solr/demo/update -d '
>> >>> [
>> >>> {"id" : "book1",
>> >>> "title_t":["The Way of Kings"],
>> >>> "author_s":"Brandon Sanderson",
>> >>> "cat_s":"",
>> >>> "pubyear_i":2010,
>> >>> "ISBN_s":"978-0-7653-2635-5"
>> >>> }
>> >>> ]'
>> >>> {"responseHeader":{"status":0,"QTime":8}}
>> >>> $ curl http://localhost:8983/solr/demo/get?id=book1
>> >>> {
>> >>> "doc":
>> >>> {
>> >>> "id":"book1",
>> >>> "title_t":["The Way of Kings"],
>> >>> "author_s":"Brandon Sanderson",
>> >>> "cat_s":"fantasy",
>> >>> "pubyear_i":2010,
>> >>> "ISBN_s":"978-0-7653-2635-5",
>> >>> "_version_":1535488016326328320}}
>> >>>
>> >>> If I add a new document, something isn’t quite working:
>> >>>
>> >>> $ curl http://localhost:8983/solr/demo/update -d '
>> >>> [
>> >>> {"id" : "book7",
>> >>> "title_t":["The Way of Kings"],
>> >>> "author_s":"Brandon Sanderson",
>> >>> "cat_s":"",
>> >>> "pubyear_i":2010,
>> >>> "ISBN_s":"978-0-7653-2635-5"
>> >>> }
>> >>> ]'
>> >>> {"responseHeader":{"status":0,"QTime":0}}
>> >>> $ curl http://localhost:8983/solr/demo/get?id=book7
>> >>> {
>> >>> "doc":null}
>> >>>
>> >>> --
>> >>> --------------------------
>> >>>
>> >>> Benedetti Alessandro
>> >>> Visiting card : http://about.me/alessandro_benedetti
>> >>>
>> >>> "Tyger, tyger burning bright
>> >>> In the forests of the night,
>> >>> What immortal hand or eye
>> >>> Could frame thy fearful symmetry?"
>> >>>
>> >>> William Blake - Songs of Experience -1794 England
>> >
>> > --
>> > --------------------------
>> >
>> > Benedetti Alessandro
>> > Visiting card - http://about.me/alessandro_benedetti
>> > Blog - http://alexbenedetti.blogspot.co.uk

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html