Although I think this is a great project, I don't think you will meet the requirements. You need a community and a charter to get it into the Incubator.
What about hosting it on GitHub?

2012/9/7 Leonidas Fegaras <[email protected]>
> Yes, this is a great idea. I have used Git on my own server, but I don't know how to do this for the ASF. Could you please send me a link for setting up an open-source Apache project?
>
> On 09/05/2012 10:51 AM, Edward J. Yoon wrote:
>> If you can open source this, then I'm sure the ASF community can help you and make this software better.
>>
>> Please feel free to ask us if you need any assistance donating source code to the ASF or contributing to the Hama project in the future.
>>
>> On Thu, Aug 30, 2012 at 11:40 PM, Leonidas Fegaras <[email protected]> wrote:
>>> Yes, sure. I have fixed the bug with the repeat stopping condition, but I have only tested pagerank on my small cluster. I still need to fix the k-means clustering (it's a special case because you improve a fixed number of points).
>>> Leonidas
>>>
>>> On Aug 30, 2012, at 9:02 AM, Edward J. Yoon wrote:
>>>> Shall we work together?
>>>>
>>>> On Fri, Aug 24, 2012 at 9:01 PM, Leonidas Fegaras <[email protected]> wrote:
>>>>> Thank you very much for your interest and for testing my system. It seems that my release was premature: it worked for some random data but not for others. It's a minor logical error that I will try to fix in the next few days. The problem is with the stopping condition of the repeat expression that calculates the new pagerank from the old one. It must stop only when ALL peers have reached the specified precision. This is done by having the peers that need to continue send a message to the others telling them to continue. It seems that right now, when all peers agree at the same time, the program works fine.
>>>>> But if one finishes sooner, instead of continuing the repeat loop, it runs away to the next BSP step that follows the repeat, then exits prematurely, and the system hangs. The casting errors are due to the runaway peers executing the wrong BSP steps and reading the wrong messages. Queries without repeat are OK, though.
>>>>>
>>>>> By the way, I had a problem exchanging large amounts of data during sync (I discussed this with Thomas). My solution was to break a BSP superstep into multiple substeps so that each substep handles at most a fixed maximum number of messages. Of course, my program has to collect all messages in a vector in memory; when the vector gets too big, it is spilled to a local file. This moved the problem from the Hama side to my side and allowed me to handle larger data, especially in joins. I think this problem of exchanging large amounts of data during a superstep is currently a weakness of Hama.
>>>>> Leonidas
>>>>>
>>>>> On 08/24/2012 04:15 AM, Thomas Jungblut wrote:
>>>>>> BTW, should we feature this on our website?
>>>>>>
>>>>>> 2012/8/24 Thomas Jungblut <[email protected]>
>>>>>>> Hi Leonidas!
>>>>>>>
>>>>>>> I have to admit that I knew what was going on (and had to keep silent), but I have to say: thank you very much! This will help many people write BSPs in an easier way.
>>>>>>>
>>>>>>> Of course this is not as fast as native BSP code; Hive and Pig suffer from the same problem in MR. But it gives people the opportunity to develop faster and get their code into production with just a minor time expense.
>>>>>>>
>>>>>>> And I think we will gladly help you improve the BSP part of your framework. At least I would ;)
>>>>>>>
>>>>>>> Thanks!
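The stopping rule Leonidas describes for the repeat loop (every peer keeps iterating until ALL peers reach the precision, with unfinished peers sending a "continue" message to the others) can be sketched as a single-process simulation. This is not Hama's API or MRQL's code: the `Peer` class, the Newton square-root workload, `precision`, and `repeat_until_all_converged` are hypothetical stand-ins, and the barrier is modelled as nothing more than the boundary between two loop phases. The point it illustrates is that every peer makes its decision from the same set of votes after the sync, so a peer that converges early keeps iterating instead of running ahead into the next BSP step.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Peer:
    """Hypothetical BSP peer: refines x toward sqrt(target) with Newton steps."""
    pid: int
    target: float
    x: float = 1.0
    inbox: List[str] = field(default_factory=list)
    supersteps: int = 0

    def step(self, precision: float) -> bool:
        """One local iteration; True if this peer has not yet reached precision."""
        new_x = (self.x + self.target / self.x) / 2
        still_changing = abs(new_x - self.x) > precision
        self.x = new_x
        self.supersteps += 1
        return still_changing


def repeat_until_all_converged(peers: List[Peer], precision: float = 1e-9,
                               max_steps: int = 100) -> None:
    for _ in range(max_steps):
        # Compute phase: a peer that still needs work sends a "continue"
        # vote to every peer, itself included.
        outgoing: List[Tuple[int, str]] = []
        for p in peers:
            if p.step(precision):
                outgoing.extend((q.pid, "continue") for q in peers)
        # Barrier (sync): votes become visible only after every peer is done.
        for p in peers:
            p.inbox = [msg for dest, msg in outgoing if dest == p.pid]
        # Decision phase: all peers read the same votes, so they either all
        # iterate again or all leave the loop in the same superstep.
        keep_going = {bool(p.inbox) for p in peers}
        assert len(keep_going) == 1, "peers disagree on whether to continue"
        if not keep_going.pop():
            return
    raise RuntimeError("no global convergence within max_steps")
```

With `Peer(0, 2.0)` and `Peer(1, 1e6)`, the first peer reaches the precision several supersteps before the second, yet both end with the same `supersteps` count, which is the behaviour the fixed stopping condition needs.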
>>>>>>>
>>>>>>> 2012/8/24 Edward J. Yoon <[email protected]>
>>>>>>>> Here are a few of my test results on Oracle BDA (40 Gb/s InfiniBand network).
>>>>>>>>
>>>>>>>> It seems slower than our PageRank example.
>>>>>>>>
>>>>>>>> P.S. There are some errors, so I couldn't test at large scale (java.lang.ClassCastException: hadoop.mrql.MR_int cannot be cast to hadoop.mrql.Inv, and java.lang.Error: Cannot clear a non-materialized sequence ..., etc.)
>>>>>>>>
>>>>>>>> == 100K nodes and 1M edges ==
>>>>>>>>
>>>>>>>> *** Using 10 BSP tasks (out of a max 10). Each task will handle about 2383611 bytes of input data.
>>>>>>>>
>>>>>>>> Run time: 30.384 secs
>>>>>>>>
>>>>>>>> *** Using 20 BSP tasks (out of a max 20). Each task will handle about 1191805 bytes of input data.
>>>>>>>>
>>>>>>>> Run time: 24.412 secs
>>>>>>>>
>>>>>>>> On Fri, Aug 24, 2012 at 9:36 AM, Edward J. Yoon <[email protected]> wrote:
>>>>>>>>> Wow, very interesting. I'm going to install and test on my large cluster.
>>>>>>>>>
>>>>>>>>> On Fri, Aug 24, 2012 at 4:41 AM, Leonidas Fegaras <[email protected]> wrote:
>>>>>>>>>> Dear Hama users,
>>>>>>>>>>
>>>>>>>>>> I am pleased to announce that the MRQL query processing system can now evaluate SQL-like queries on a Hama cluster. MRQL is available at:
>>>>>>>>>>
>>>>>>>>>> http://lambda.uta.edu/mrql/
>>>>>>>>>>
>>>>>>>>>> MRQL (the Map-Reduce Query Language) is an SQL-like query language for large-scale, distributed data analysis. MRQL is powerful enough to express most common data analysis tasks over many different kinds of raw data, including hierarchical data and nested collections, such as XML data.
>>>>>>>>>> MRQL can run in two modes: in MR (Map-Reduce) mode using Apache Hadoop and in BSP (Bulk Synchronous Parallel) mode using Apache Hama. Both modes use Apache's HDFS to read and write their data.
>>>>>>>>>>
>>>>>>>>>> Note that the BSP mode is currently experimental (not fine-tuned yet) and lacks any fault tolerance (if an error occurs, the entire job must be restarted). Due to our limited resources, MRQL has only been tested on a small cluster (7 nodes/28 cores). We compared the BSP mode with the MR mode by evaluating a pagerank query over a small graph (100K nodes, 1M edges) and found that the BSP mode is about 4.5 times faster than the MR mode. Please let me know if you'd like to contribute to this project by testing MRQL on a larger cluster.
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Leonidas Fegaras
>>>>>>>>>> University of Texas at Arlington
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best Regards, Edward J. Yoon
>>>>>>>>> @eddieyoon
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards, Edward J. Yoon
>>>>>>>> @eddieyoon
>>>>
>>>> --
>>>> Best Regards, Edward J. Yoon
>>>> @eddieyoon
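Leonidas's August 24 workaround for exchanging large amounts of data (split one BSP superstep into substeps that each carry at most a maximum number of messages, collect everything in an in-memory vector, and spill that vector to a local file when it grows too big) might look roughly like this. It is a single-process sketch, not MRQL or Hama code: `substeps`, `SpillingBuffer`, `max_messages`, and `spill_threshold` are hypothetical names, the per-substep sync is only mentioned in a comment, and Python's `pickle` with `tempfile.TemporaryFile` stands in for whatever serialization and local storage MRQL actually uses.

```python
import pickle
import tempfile
from typing import Iterable, Iterator, List


def substeps(messages: List, max_messages: int) -> Iterator[List]:
    """Split one logical superstep's outgoing messages into bounded batches.

    In a real BSP job each batch would be sent and followed by its own sync;
    here the batches are simply yielded in order.
    """
    for start in range(0, len(messages), max_messages):
        yield messages[start:start + max_messages]


class SpillingBuffer:
    """Collects received messages in memory, spilling to a local temp file
    whenever the in-memory list grows past spill_threshold."""

    def __init__(self, spill_threshold: int) -> None:
        self.spill_threshold = spill_threshold
        self.memory: List = []
        self.spilled = 0  # number of messages written to disk so far
        self.spill_file = tempfile.TemporaryFile()  # binary, removed on close

    def add_all(self, batch: Iterable) -> None:
        self.memory.extend(batch)
        if len(self.memory) > self.spill_threshold:
            # Write the whole in-memory vector as one pickled chunk, then reset.
            pickle.dump(self.memory, self.spill_file)
            self.spilled += len(self.memory)
            self.memory = []

    def drain(self) -> Iterator:
        """Yield every message in arrival order, then release the spill file.
        Call only after the last substep has been received."""
        self.spill_file.seek(0)
        try:
            while True:
                yield from pickle.load(self.spill_file)  # one chunk per dump
        except EOFError:
            pass
        finally:
            self.spill_file.close()
        yield from self.memory  # the most recent, unspilled messages
```

Sending ten messages with `max_messages=3` gives substeps of 3, 3, 3, and 1 messages; with `spill_threshold=4` the receiver spills the first six to disk and keeps the last four in memory, and `drain()` still returns all ten in order. The trade-off is the one Leonidas notes: the memory pressure moves from the framework's sync to the application's own buffer.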
