Yes that is great, I will help you with that.

2012/10/11 Panos Mandros <[email protected]>
> Hey Thomas,
> implementing PLANET was part of my bachelor thesis. It works not only
> for single-label learning but also for multi-label learning, as this is
> one of the areas my professor is interested in. It works fine, but there
> are still things that need to be done. One of these is to transfer it to
> Hama. Another is to find a more efficient way to transfer data from the
> mappers to the reducer, because right now the output is really big. If
> you want, we can cooperate on this.
>
> 2012/10/10 Thomas Jungblut <[email protected]>
>
> > Hey Panos,
> >
> > thanks for transferring this.
> >
> > Here is the paper for the others:
> >
> > http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/de//pubs/archive/36296.pdf
> >
> > I wanted to do this, but did not have enough time :/
> > As I said on stackoverflow, I think the graph package is the wrong
> > approach here; you can clearly translate the MapReduce algorithm to BSP
> > and make use of the faster iterations.
> >
> > Do you already have the code in MapReduce? I can simply turn this into
> > BSP.
> > I would like to support the creation of random forests as well, by
> > training a decision tree in every task and combining them later.
> >
> >
> > 2012/10/10 Panos Mandros <[email protected]>
> >
> > > I have currently implemented Google's framework for building
> > > decision trees (also known as PLANET) in Hadoop. It is supposed to
> > > scale well to very large datasets, but it has many problems. It only
> > > scales well if the dataset has a few attributes. If a dataset has a
> > > lot of attributes, it will need a lot of map/reduce jobs, which means
> > > a big start-up cost for all of these jobs. Google, however, uses it
> > > with a lot of modifications to its Hadoop-like platform rather than
> > > to the algorithm itself. PLANET starts with a single vertex, and with
> > > map/reduce jobs you add more and more until the tree is fully built.
> > >
> > > I have seen many times that Apache Hama is suitable for iterative
> > > algorithms like graphs. Can you build a new graph with Hama, or do
> > > you only take a graph as input and run computations on it? Will it be
> > > easy to transfer my project to Hama? Thanks
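
For anyone following the thread, here is a minimal sketch of the random-forest idea mentioned above: train one tree per task on that task's local partition, then combine the trees by majority vote. It is only an illustration under stated assumptions, not PLANET and not Hama code. Each "task" is simulated by a plain Python list, the trees are depth-1 stumps chosen by Gini impurity to keep it short, and all names (`train_stump`, `train_forest`, `predict_forest`) are hypothetical.

```python
from collections import Counter


def majority(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def gini(labels):
    """Gini impurity of a list of labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def train_stump(rows):
    """Train a depth-1 tree on rows of (features, label).

    Returns (feature_index, threshold, left_label, right_label).
    """
    best = None
    for f in range(len(rows[0][0])):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            if not left or not right:
                continue  # not a real split
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, threshold, majority(left), majority(right))
    if best is None:
        # No valid split exists: fall back to a constant predictor.
        label = majority([y for _, y in rows])
        return (0, float("inf"), label, label)
    return best[1:]


def predict_stump(stump, x):
    f, threshold, left_label, right_label = stump
    return left_label if x[f] <= threshold else right_label


def train_forest(partitions):
    """One stump per partition; each partition stands in for one task's data."""
    return [train_stump(rows) for rows in partitions]


def predict_forest(forest, x):
    """Combine the per-task models by majority vote."""
    return majority([predict_stump(s, x) for s in forest])


if __name__ == "__main__":
    # Toy data (assumed): label is 1 when the first feature exceeds 5.
    rows = [((i, i % 3), int(i > 5)) for i in range(12)]
    partitions = [rows[k::3] for k in range(3)]  # three simulated tasks
    forest = train_forest(partitions)
    print(predict_forest(forest, (2, 0)), predict_forest(forest, (8, 0)))
```

In a real BSP job each peer would train on its own input split and send its model to one peer for combining; that messaging step is left out here on purpose.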
