It seems that the file is lost in communication. Here is a copy: https://dl.dropboxusercontent.com/u/42489708/MasterSlaveBSP.py
Roman

On Tue, Oct 1, 2013 at 6:13 PM, Roman Shapovalov <[email protected]> wrote:
> Hi Martin,
>
>> it seems you have forgotten the attachment.
>
> I can see one in the message I sent. Attaching again, try this.
>
>
>> But currently the Hama Streaming API [2] does not support partitioning.
>
> So, the text protocol does not support it, or does it lack only in the
> Python wrapper?
>
> So, the default partitioning is arbitrary, regardless of who is
> reading and who is not? Then it seems the easiest way to work it
> around is to have the master thread resend those records to slaves...
> if they are not very big.
>
> Thanks,
> Roman
>
> On Tue, Oct 1, 2013 at 9:30 AM, Martin Illecker <[email protected]> wrote:
>> Hi Roman,
>>
>> it seems you have forgotten the attachment. (your code)
>>
>> ad 1)
>> I would solve this by using a custom partitioner.
>> A custom partitioner defines which records are distributed to which tasks.
>>
>> Here is some C++ partitioner example [1].
>> e.g., key 3,6,9 partitioner should return 1
>> and key 2,5,8 should return 2
>>
>> But currently the Hama Streaming API [2] does not support partitioning.
>> Only Hama Pipes C++ supports it.
>>
>> ad 2)
>> Please submit your code, I will have a look at this exception.
>> Or please submit the tasklog.
>>
>> Martin
>>
>> [1]
>> https://github.com/apache/hama/blob/trunk/c%2B%2B/src/main/native/examples/impl/matrixmultiplication.cc#L131-138
>> [2]
>> https://github.com/millecker/HamaStreaming/blob/1009bb1a6472d11f5dd3af9dc07fe64547dd0290/BinaryProtocol.py#L37-38
>>
>> 2013/9/30 Roman Shapovalov <[email protected]>
>>
>>> Hello all,
>>>
>>> I am developing a toy master-slave application for the Python
>>> streaming interface. There are two issues.
>>>
>>> 1. What is the semantics of the readNext command?
>>>
>>> If I run 3 tasks -- one of them is master who does not read input, --
>>> slaves take turn to read records, but each of them reads only each
>>> third example, e.g. slave#1 reads records 3,6,9, while slave#2 reads
>>> 2,5,8. So 1/3 of records are skipped, as if the master task would read
>>> them.
>>>
>>> So, what is the exact semantics? Is there any best practice to make
>>> each example read by some task (but not the master).
>>>
>>>
>>> 2. After the code is executed (and the output is written), the job
>>> fails. All the task logs contain the following text:
>>>
>>> 13/09/30 16:32:09 ERROR protocol.UplinkReader:
>>> java.lang.NullPointerException
>>> at
>>> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:127)
>>>
>>> The exception is raised even if I don't use pipes at all. Since it
>>> shows up after cleanup, it is not critical for the program, but it may
>>> indicate some misuse by me or bugs in the Hama code.
>>>
>>> Please look at that issue. My code is attached.
>>>
>>> Roman
>>>
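
For reference, the workaround discussed in the thread (the master resends the records that landed in its share to the slaves) can be sketched in plain Python. This is only a simulation under stated assumptions: it does not call the Hama Streaming API at all, the names MASTER, default_split and forward_master_share are made up for illustration, and the round-robin split merely imitates the behaviour reported above (which task receives which third is arbitrary). In a real job the hand-off would presumably go through BSP messages followed by a sync, which is not shown here.

```python
from collections import defaultdict

MASTER = 0  # hypothetical: task index 0 plays the master role


def default_split(records, num_tasks):
    """Imitate the reported behaviour: records are dealt round-robin
    over all tasks, so the master also receives a share."""
    shares = defaultdict(list)
    for i, rec in enumerate(records):
        shares[i % num_tasks].append(rec)
    return shares


def forward_master_share(shares, num_tasks):
    """Workaround sketch: the master hands its own records to the
    slaves in turn, so that every record ends up at some slave."""
    slaves = [t for t in range(num_tasks) if t != MASTER]
    result = {t: list(shares[t]) for t in slaves}
    for i, rec in enumerate(shares[MASTER]):
        result[slaves[i % len(slaves)]].append(rec)
    return result


if __name__ == "__main__":
    shares = default_split(range(1, 10), num_tasks=3)
    for task, recs in sorted(forward_master_share(shares, 3).items()):
        print("slave task", task, "processes", recs)
```

As noted in the thread, this only makes sense if the records are not very big, since each forwarded record travels over the network a second time.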
