The easiest route is to create Lucene indexes in Solr that match the Lucene index format used by the Mahout code.
Otherwise, I would write an input reader for Hadoop based on the SolrJ library. It would need a way to configure the actual query, the response fields, and how those fields map to the mapper inputs. If it reads too slowly, you can switch from the SolrJ client to embedded Solr, which reads directly from the indexes instead of going through a servlet container.

On Sun, Aug 28, 2011 at 11:28 AM, Ramo Karahasan <[email protected]> wrote:

> Thank you Ted,
>
> I'll have a look at the example in the next few days.
>
> I guess I'll get a copy of the ebook; the shipping costs for the printed
> version are very high...
>
> Thanks,
> RK
>
> -----Original Message-----
> From: Ted Dunning [mailto:[email protected]]
> Sent: Sunday, August 28, 2011 19:57
> To: [email protected]
> Subject: Re: Workflow for categorization/classifiying
>
> See https://github.com/tdunning/Chapter-16 for example code.
>
> The book has a lot of background material on why things are as they appear
> in the example, but you should be able to get some benefit from the example
> anyway.
>
>
> On Sunday, August 28, 2011, Ramo Karahasan <[email protected]>
> wrote:
> > Hi,
> >
> > I'm not primarily looking for the right algorithms, more for a way to
> > implement this in a web application that processes the workflow
> > "on-the-fly".
> >
> > Thanks,
> > Ramo
> >
> > -----Original Message-----
> > From: myn [mailto:[email protected]]
> > Sent: Sunday, August 28, 2011 18:33
> > To: [email protected]
> > Subject: Re:AW: Workflow for categorization/classifiying
> >
> > I think the Bayes or DecisionForest classification methods will be
> > suitable; look at the following links:
> >
> > https://cwiki.apache.org/confluence/display/MAHOUT/Bayesian
> > https://cwiki.apache.org/confluence/display/MAHOUT/Random+Forests
> > https://cwiki.apache.org/confluence/display/MAHOUT/Logistic+Regression
> >
> > At 2011-08-28 23:59:03, "Ramo Karahasan" <[email protected]> wrote:
> >>Hi Ted,
> >>
> >>No, I have not looked at the book. I'd have to buy a copy, but there is
> >>the money problem ;).
> >>What I want to do is quite simple... I have some manually chosen
> >>categories (topics) and a lot of documents on a webpage. These documents
> >>aren't well categorized/classified into the right topics. The documents
> >>all reside in a Solr index. The aim was to try auto-categorization with
> >>Mahout: to train the system incrementally every time new documents
> >>arrive, and to update the categories on the website.
> >>
> >>Thanks,
> >>RK
> >>
> >>-----Original Message-----
> >>From: Ted Dunning [mailto:[email protected]]
> >>Sent: Saturday, August 27, 2011 17:13
> >>To: [email protected]
> >>Subject: Re: Workflow for categorization/classifiying
> >>
> >>Yes. That is a reasonable workflow. Have you looked at the book Mahout
> >>in Action? (Conflict alert: I am an author.) We provide extensive details
> >>on how you can use categorization and clustering on real problems in the
> >>last two sections of the book.
> >>
> >>Also, if you say just a bit more about what you want to do, it would be
> >>easier to help you.
> >>
> >>On Sat, Aug 27, 2011 at 6:01 AM, Ramo Karahasan <[email protected]> wrote:
> >>
> >>> Hello,
> >>>
> >>> I wanted to ask whether there is a common workflow for
> >>> categorizing/classifying documents with Mahout. For me, one possible
> >>> workflow with Solr could be:
> >>>
> >>> index documents into solr -> fetch data from solr -> prepare data
> >>> for training -> operate training -> get data model -> operate with
> >>> algorithms on data model -> get a result list -> ?
> >>>
> >>> Is that a possible workflow with Mahout, and what do I do after getting
> >>> the processed categorizations? How would I make use of this result?
> >>>
> >>> Thanks,
> >>> RK
> >>>
> >>> -----Original Message-----
> >>> From: Sean Owen [mailto:[email protected]]
> >>> Sent: Saturday, August 27, 2011 09:40
> >>> To: [email protected]
> >>> Subject: Re: How to get recommendation demo example working
> >>>
> >>> No there is not.
> >>>
> >>> On Sat, Aug 27, 2011 at 8:33 AM, Ramo Karahasan <[email protected]> wrote:
> >>>
> >>> > Thank you Sean,
> >>> >
> >>> > I'll try that today.
> >>> >
> >>> > Is there a similar example for classification with a web
> >>> > application?

--
Lance Norskog
[email protected]
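
A rough sketch of the configurable reader idea from the reply at the top of this message, for illustration only. It is not SolrJ and not a Hadoop input format: it is plain Python that pages through Solr's HTTP select handler (the standard q, fl, start, rows and wt=json parameters) and maps each returned document to a (key, value) pair of the kind a mapper would consume. The function names, the tab-separated value layout, and the field names in the usage note are all assumptions made for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url, query, fields, start, rows):
    """Build a Solr select URL that returns only the configured fields as JSON."""
    params = {
        "q": query,
        "fl": ",".join(fields),
        "start": start,
        "rows": rows,
        "wt": "json",
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def to_mapper_input(doc, key_field, value_fields):
    """Map one Solr document (a dict) to a (key, value) pair.

    Assumption for this sketch: the value is the configured fields joined by
    tabs; multi-valued fields are joined by spaces, missing fields are empty.
    """
    key = str(doc[key_field])
    parts = []
    for name in value_fields:
        value = doc.get(name, "")
        if isinstance(value, list):
            value = " ".join(str(item) for item in value)
        parts.append(str(value))
    return key, "\t".join(parts)


def read_records(base_url, query, key_field, value_fields, rows=100, fetch=None):
    """Page through the result set and yield (key, value) pairs.

    `fetch` takes a URL and returns the parsed JSON response; it defaults to
    a real HTTP request and can be replaced with a stub for testing.
    """
    if fetch is None:
        fetch = lambda url: json.load(urlopen(url))
    fields = [key_field] + list(value_fields)
    start = 0
    while True:
        url = build_select_url(base_url, query, fields, start, rows)
        response = fetch(url)["response"]
        docs = response["docs"]
        for doc in docs:
            yield to_mapper_input(doc, key_field, value_fields)
        start += len(docs)
        if not docs or start >= response["numFound"]:
            break
```

With a Solr instance at a hypothetical address such as http://localhost:8983/solr, calling read_records(address, "*:*", "id", ["title", "text"]) would yield one pair per indexed document. Keeping the query, the key field, and the value fields as parameters is the part that corresponds to the configuration mentioned in the reply; a real Hadoop reader would also need to split the result range across map tasks, which this sketch does not attempt.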
