To kick this off, I have created a design document that is open for
comments.  Much detail is still needed.  I will create a JIRA as well, but
a Google Doc is much easier for collating lots of input into a coherent
document.

The directory that the document is stored in is accessible at

http://bit.ly/18vbbaT

Once we get going, we can talk about how to coordinate tasks between
hangouts.  One option is a public Trello project: https://trello.com/ or we
can use JIRA sub-tasks.
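
To make the approach discussed in the quoted thread below concrete (build a
sparse item-item matrix off-line, then drop it into a search engine), here is
a minimal sketch in Python. It is only an illustration under assumptions: raw
co-occurrence counts stand in for whatever similarity measure the off-line
analysis ends up using, the field name "indicators" and the toy histories are
hypothetical, and nothing is actually sent to Solr. The update documents use
Solr's atomic-update "set" modifier.

```python
import json
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_indicators(histories, top_k=2):
    """Count item pairs seen in the same user history; keep top_k partners per item."""
    counts = defaultdict(Counter)
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    indicators = {}
    for item, partners in counts.items():
        # Sort by descending count, then by item id so the result is deterministic.
        ranked = sorted(partners.items(), key=lambda kv: (-kv[1], kv[0]))
        indicators[item] = [other for other, _ in ranked[:top_k]]
    return indicators


def solr_atomic_updates(indicators, field="indicators"):
    """Build atomic-update documents that overwrite only the (hypothetical) indicator field."""
    return [{"id": item, field: {"set": related}}
            for item, related in sorted(indicators.items())]


def recommendation_query(recent_items, field="indicators"):
    """Build a plain OR query over the indicator field from a user's recent items."""
    return f"{field}:(" + " OR ".join(recent_items) + ")"


# Toy data: user id -> items the user interacted with (assumed, not real data).
histories = {
    "u1": ["a", "b", "c"],
    "u2": ["a", "b"],
    "u3": ["b", "c"],
    "u4": ["a", "d"],
}
indicators = cooccurrence_indicators(histories)
updates = solr_atomic_updates(indicators)
print(json.dumps(updates, indent=2))
print(recommendation_query(["a", "c"]))
```

At query time the search engine does the ranking: items whose indicator field
matches more of the user's recent items score higher, and the same query can
be combined with ordinary text or geo filters.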


On Sat, Jul 20, 2013 at 11:25 AM, Andrew Psaltis <
[email protected]> wrote:

> I am very interested in collaborating on the off-line to Solr part. Just
> let me know how we want to get going.
>
> Thanks,
> Andrew
>
>
>
>
>
> On 7/19/13 4:45 PM, "Ted Dunning" <[email protected]> wrote:
>
> >OK.  I think the crux here is the off-line to Solr part so let's see who
> >else pops up.
> >
> >Having a solr maven could be very helpful.
> >
> >
> >On Fri, Jul 19, 2013 at 3:39 PM, Luis Carlos Guerrero Covo <
> >[email protected]> wrote:
> >
> >> I'm currently working for a portal that has a similar use case and I was
> >> thinking of implementing this in a similar way. I'm generating
> >> recommendations using Python scripts based on similarity measures
> >> (content-based recommendation), using only Euclidean distance and some
> >> weights for each attribute. I want to use Mahout's
> >> GenericItemBasedRecommender to generate these same recommendations
> >> without user data (we are not tracking user-to-item relationships right
> >> now). I was thinking of pushing the generated recommendations to Solr
> >> using atomic updates, since all my fields are currently stored. Since
> >> this is very similar to what I'm trying to accomplish, I would sign up to
> >> collaborate in any way I can, since I'm fairly familiar with Solr and I'm
> >> starting to learn my way around Mahout.
> >>
> >>
> >> On Fri, Jul 19, 2013 at 5:12 PM, Sebastian Schelter <[email protected]>
> >> wrote:
> >>
> >> > I would also be willing to provide guidance and advice for anyone taking
> >> > this on. I can especially help with the offline analysis part.
> >> >
> >> > --sebastian
> >> >
> >> >
> >> > 2013/7/19 Ted Dunning <[email protected]>
> >> >
> >> > > I would be happy to supervise a project to implement a demo of this if
> >> > > anybody is willing to do the grunt work of gluing things together.
> >> > >
> >> > > Sooo, if you would like to work on this, here is a suggested project.
> >> > >
> >> > > This project would entail:
> >> > >
> >> > > a) build a synthetic data source
> >> > >
> >> > > b) write scripts to do the off-line analysis
> >> > >
> >> > > c) write scripts to export to Solr
> >> > >
> >> > > d) write a very quick web facade over Solr to make it look like a
> >> > > recommendation engine.  This would include
> >> > >
> >> > >   d.1) a "most popular page" that does combined popularity rise and
> >> > > recommendation
> >> > >
> >> > >   d.2) a "personal recommendation page" that does just recommendation
> >> > > with dithering
> >> > >
> >> > >   d.3) item pages with "related items" at the bottom
> >> > >
> >> > > e) work with others to provide high quality system walk-through and
> >> > > install directions
> >> > >
> >> > > If you want to bite on this, we should arrange a weekly video hangout.
> >> > > I am willing to commit to guiding and providing detailed technical
> >> > > approaches.  You should be willing to commit to actually doing stuff.
> >> > >
> >> > > The goal would be to provide a fully worked out scaffolding of a
> >> > > practical recommendation system that presumably would become an example
> >> > > module in Mahout.
> >> > >
> >> > >
> >> > > On Fri, Jul 19, 2013 at 1:08 PM, B Lyon <[email protected]> wrote:
> >> > >
> >> > > > +1 as well.  Sounds fun.
> >> > > >
> >> > > > On Fri, Jul 19, 2013 at 4:06 PM, Dominik Hübner <[email protected]> wrote:
> >> > > >
> >> > > > > +1 for getting something like that in a future release of Mahout
> >> > > > >
> >> > > > > On Jul 19, 2013, at 10:02 PM, Sebastian Schelter <[email protected]> wrote:
> >> > > > >
> >> > > > > > It would be awesome if we could get a nice, easily deployable
> >> > > > > > implementation of that approach into Mahout before 1.0
> >> > > > > >
> >> > > > > >
> >> > > > > > 2013/7/19 Ted Dunning <[email protected]>
> >> > > > > >
> >> > > > > >> My current advice is to use Hadoop (if necessary) to build a sparse
> >> > > > > >> item-item matrix based on each kind of behavior you have and then
> >> > > > > >> drop those similarities into a search engine to deliver the actual
> >> > > > > >> recommendations.  This allows lots of flexibility in terms of which
> >> > > > > >> kinds of inputs you use for the recommendation and lets you blend
> >> > > > > >> recommendations with search and geo-location.
> >> > > > > >>
> >> > > > > >>
> >> > > > > >> On Fri, Jul 19, 2013 at 12:33 PM, Helder Martins <
> >> > > > > >> [email protected]> wrote:
> >> > > > > >>
> >> > > > > >>> Hi,
> >> > > > > >>> I'm a dev working for a web portal in Brazil and I'm particularly
> >> > > > > >>> interested in building an item-based collaborative filtering
> >> > > > > >>> recommender for our database of news articles.
> >> > > > > >>> After some coding, I was able to get some recommendations using a
> >> > > > > >>> GenericItemBasedRecommender, a CassandraDataModel and some custom
> >> > > > > >>> classes that store item similarities and migrated item IDs in
> >> > > > > >>> Cassandra. But now I'm unsure what is normally done with this
> >> > > > > >>> recommender: Should I run it as a daemon, cache the recommendations
> >> > > > > >>> in memory and set up a web service to query it online? Should I
> >> > > > > >>> precompute these recommendations for each recent user and store
> >> > > > > >>> them somewhere? My first idea was to store all these recs back in
> >> > > > > >>> Cassandra, but from looking at some classes it seems to me that the
> >> > > > > >>> norm is to always read the input data and store the output using
> >> > > > > >>> files. Is this a common practice that benefits from HDFS?
> >> > > > > >>> My use case here is around 70k recommendation requests per second.
> >> > > > > >>>
> >> > > > > >>> Thanks in advance,
> >> > > > > >>>
> >> > > > > >>> --
> >> > > > > >>>
> >> > > > > >>> Regards,
> >> > > > > >>> Helder Martins
> >> > > > > >>> Portal Architecture and Backend Systems
> >> > > > > >>> +55 (51) 3284-4475
> >> > > > > >>> Terra
> >> > > > > >>>
> >> > > > > >>>
> >> > > > > >>>
> >> > > > > >>
> >> > > > >
> >> > > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > BF Lyon
> >> > > > http://www.nowherenearithaca.com
> >> > > >
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Luis Carlos Guerrero Covo
> >> M.S. Computer Engineering
> >> (57) 3183542047
> >>
>
>
