So we talked about this and want to get it through within a few weeks, so stay tuned. I will add a jira for that soon.
2012/10/11 Thomas Jungblut <[email protected]>

> Yes, that is great. I will help you with that.
>
> 2012/10/11 Panos Mandros <[email protected]>
>
>> Hey Thomas,
>>
>> Implementing PLANET was part of my bachelor's thesis. It works not only
>> for single-label learning but also for multi-label learning, as this is
>> one of the areas my professor is interested in. It works fine but still
>> has things that need to be done. One of these is to port it to Hama.
>> Another is to find a more efficient way to transfer data from the
>> mappers to the reducer, because right now the output is really big. If
>> you want, we can cooperate on this.
>>
>> 2012/10/10 Thomas Jungblut <[email protected]>
>>
>>> Hey Panos,
>>>
>>> thanks for transferring this.
>>>
>>> Here is the paper for the others:
>>> http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/de//pubs/archive/36296.pdf
>>>
>>> I wanted to do this, but didn't have enough time :/
>>> As I said on Stack Overflow, I think the graph package is the wrong
>>> approach here; you can clearly translate the MapReduce algorithm to BSP
>>> and make use of the faster iterations.
>>>
>>> Do you already have the code in MapReduce? I can simply turn it into
>>> BSP. I would also like to support the creation of random forests, by
>>> training a decision tree in every task and combining them later.
>>>
>>> 2012/10/10 Panos Mandros <[email protected]>
>>>
>>>> I have currently implemented Google's framework for building decision
>>>> trees (also known as PLANET) in Hadoop. It is supposed to scale well
>>>> to very large datasets, but it has many problems. It scales well only
>>>> if the dataset has few attributes: a dataset with many attributes
>>>> means many map/reduce jobs, which means a big start-up cost for all of
>>>> those jobs. Google, however, uses it with a lot of modifications on
>>>> its Hadoop-like platform, not on the algorithm itself. PLANET starts
>>>> with a single vertex, and with MapReduce jobs you add more and more
>>>> until the tree is fully built.
>>>>
>>>> I have seen many times that Apache Hama is suitable for iterative
>>>> algorithms like graphs. Can someone build a new graph with Hama, or do
>>>> you just take a graph as input and run computations on it? Will it be
>>>> easy to port my project to Hama? Thanks
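The random-forest idea mentioned in the thread (train one decision tree per task, combine them later) can be sketched in plain Java. This is a minimal sketch, not anyone's actual implementation: the class names `ForestSketch` and `Stump` are hypothetical, and the per-task "trees" are one-level decision stumps to keep the example self-contained. A real Hama/MapReduce port would train full trees per task on each task's data slice and do the voting in a final combine step.

```java
import java.util.ArrayList;
import java.util.List;

public class ForestSketch {

    // Hypothetical stand-in for a per-task tree: splits on one feature
    // at a fixed threshold and predicts a binary class label (0 or 1).
    static class Stump {
        final int feature;
        final double threshold;
        final int leftLabel, rightLabel;

        Stump(int feature, double threshold, int leftLabel, int rightLabel) {
            this.feature = feature;
            this.threshold = threshold;
            this.leftLabel = leftLabel;
            this.rightLabel = rightLabel;
        }

        int predict(double[] x) {
            return x[feature] <= threshold ? leftLabel : rightLabel;
        }
    }

    // Combining step: majority vote over the per-task trees.
    static int vote(List<Stump> forest, double[] x) {
        int ones = 0;
        for (Stump s : forest) {
            ones += s.predict(x);
        }
        return ones * 2 >= forest.size() ? 1 : 0; // ties go to class 1
    }

    public static void main(String[] args) {
        // Pretend three tasks each produced one stump on their data slice.
        List<Stump> forest = new ArrayList<>();
        forest.add(new Stump(0, 0.5, 0, 1));
        forest.add(new Stump(1, 1.0, 0, 1));
        forest.add(new Stump(0, 2.0, 0, 1));

        System.out.println(vote(forest, new double[]{0.2, 0.3})); // prints 0
        System.out.println(vote(forest, new double[]{3.0, 2.0})); // prints 1
    }
}
```

In a BSP setting, each peer could hold its trained tree locally and broadcast it (or its predictions) in one superstep, which avoids re-launching a job per tree level the way per-level MapReduce does.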
