Hi Roman, sorry for the delay.
So, the text protocol does not support it, or does it lack only in the > Python wrapper? Yes only the Python wrapper does not support it. The Hama Pipes protocol (C++) does support custom partitioning. I could reproduce the exception, which is occurring during the protocol shutdown only in Streaming API. 13/09/30 16:32:09 ERROR protocol.UplinkReader: > java.lang.NullPointerException > at org.apache.hama.pipes.protocol.UplinkReader.run( > UplinkReader.java:127) It was a problem in the Python wrapper and I fixed it [1] in my github repository [2]. If I send any message of the length L, I receive the message with > additional (L-1)/2 '^@' symbols after it. > So, I use the following workaround. In the BSPPeer.getCurrentMessage(): > return line[:len(line)-len(line)//3] > instead of > return line I could not locate the cause [3] for your currentMessage length problem, so I committed your workaround [4]. Then it seems the easiest way to work it > around is to have the master thread resend those records to slaves... > if they are not very big. > Yes this would be an option but partitioning should be preferred. Any chance to switch to Hama Pipes C++? Martin [1] https://github.com/millecker/HamaStreaming/commit/95466287e883a892f85427303ff255bc52b00b9a [2] https://github.com/millecker/HamaStreaming [3] https://github.com/apache/hama/blob/trunk/core/src/main/java/org/apache/hama/pipes/protocol/StreamingProtocol.java#L94 [4] https://github.com/millecker/HamaStreaming/commit/12ceb0d2db5d088004566592e42431ee40558fa7 2013/10/1 Roman Shapovalov <[email protected]> > It seems that the file is lost in communication. Here is a copy: > > https://dl.dropboxusercontent.com/u/42489708/MasterSlaveBSP.py > > Roman > > > On Tue, Oct 1, 2013 at 6:13 PM, Roman Shapovalov > <[email protected]> wrote: > > Hi Martin, > > > >> it seems you have forgotten the attachment. > > > > I can see one in the message I sent. Attaching again, try this. > > > > > >> But currently the Hama Streaming API [2] does not support partitioning. > > > > So, the text protocol does not support it, or does it lack only in the > > Python wrapper? > > > > So, the default partitioning is arbitrary, regardless of who is > > reading and who is not? Then it seems the easiest way to work it > > around is to have the master thread resend those records to slaves... > > if they are not very big. > > > > Thanks, > > Roman > > > > On Tue, Oct 1, 2013 at 9:30 AM, Martin Illecker <[email protected]> > wrote: > >> Hi Roman, > >> > >> it seems you have forgotten the attachment. (your code) > >> > >> ad 1) > >> I would solve this by using a custom partitioner. > >> A custom partitioner defines which records are distributed to which > tasks. > >> > >> Here is some C++ partitioner example [1]. > >> e.g., key 3,6,9 partitioner should return 1 > >> and key 2,5,8 should return 2 > >> > >> But currently the Hama Streaming API [2] does not support partitioning. > >> Only Hama Pipes C++ supports it. > >> > >> ad 2) > >> Please submit your code, I will have a look at this exception. > >> Or please submit the tasklog. > >> > >> Martin > >> > >> [1] > >> > https://github.com/apache/hama/blob/trunk/c%2B%2B/src/main/native/examples/impl/matrixmultiplication.cc#L131-138 > >> [2] > >> > https://github.com/millecker/HamaStreaming/blob/1009bb1a6472d11f5dd3af9dc07fe64547dd0290/BinaryProtocol.py#L37-38 > >> > >> 2013/9/30 Roman Shapovalov <[email protected]> > >> > >>> Hello all, > >>> > >>> I am developing a toy master-slave application for the Python > >>> streaming interface. There are two issues. > >>> > >>> 1. What is the semantics of the readNext command? > >>> > >>> If I run 3 tasks -- one of them is master who does not read input, -- > >>> slaves take turn to read records, but each of them reads only each > >>> third example, e.g. slave#1 reads records 3,6,9, while slave#2 reads > >>> 2,5,8. So 1/3 of records are skipped, as if the master task would read > >>> them. > >>> > >>> So, what is the exact semantics? Is there any best practice to make > >>> each example read by some task (but not the master). > >>> > >>> > >>> 2. After the code is executed (and the output is written), the job > >>> fails. All the task logs contain the following text: > >>> > >>> 13/09/30 16:32:09 ERROR protocol.UplinkReader: > >>> java.lang.NullPointerException > >>> at > >>> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:127) > >>> > >>> The exception is raised even if I don't use pipes at all. Since it > >>> shows up after cleanup, it is not critical for the program, but it may > >>> indicate some misuse by me or bugs in the Hama code. > >>> > >>> Please look at that issue. My code is attached. > >>> > >>> Roman > >>> >
