Hi, does it work as you expected? I thought bsp.input.runtime.partitioning should be true. :0
--
Best Regards, Edward J. Yoon
Chief Executive Officer
DataSayer Co., Ltd.

> On 2014. 10. 21., at 6:31 AM, Leonidas Fegaras <[email protected]> wrote:
>
> Hi Edward,
> OK. It works now. I used the following in hama-site.xml:
>
> <property>
> <name>bsp.input.runtime.partitioning</name>
> <value>false</value>
> </property>
>
> and restarted bspd. The correct code for the job is:
>
> job.setNumBspTask(10);
> job.setPartitioner(org.apache.hama.bsp.HashPartitioner.class);
>
> Maybe you should explain this in the Hama Wiki.
> Thanks.
> Leonidas
>
> On 10/20/2014 02:19 PM, Leonidas Fegaras wrote:
>> Hi Edward,
>> Thank you for the reply.
>> But I want the opposite: I want to create more tasks than blocks, not
>> fewer tasks than blocks.
>> That is, I want to be able to send less than one block to each task (for
>> example, only 10000 bytes). Sending less data to a task will speed up
>> execution and will require less memory at each node. Hadoop map-reduce,
>> Spark, and Flink allow you to use a split size smaller than a block.
>> Also, I used to be able to do this with Hama 0.5.0 but not with Hama
>> 0.6.4. Did you remove this capability because it is a bad idea or
>> because it is very hard to implement?
>>
>> Based on your instructions, I tried the following:
>>
>> job.setNumBspTask(10);
>> job.setBoolean("bsp.input.runtime.partitioning",false);
>> job.setPartitioner(org.apache.hama.bsp.HashPartitioner.class);
>>
>> I get the following error:
>>
>> java.lang.ArrayIndexOutOfBoundsException: 1
>> at org.apache.hama.bsp.BSPJobClient.writeSplits(BSPJobClient.java:556)
>> at org.apache.hama.bsp.BSPJobClient.submitJobInternal(BSPJobClient.java:354)
>> at org.apache.hama.bsp.BSPJobClient.submitJob(BSPJobClient.java:296)
>> at org.apache.hama.bsp.BSPJob.submit(BSPJob.java:219)
>> at org.apache.hama.bsp.BSPJob.waitForCompletion(BSPJob.java:226)
>>
>> Thanks.
>> Leonidas
>>
>>
>> On 10/20/2014 10:06 AM, Edward J.
Yoon wrote:
>>> Hi Leonidas,
>>>
>>> The bsp.min.split.size property is used to prevent the creation of too many
>>> tasks, like Hadoop MR (NOTE: if bsp.min.split.size is less than the block
>>> size, then 1 block is sent to each task).
>>>
>>> I guess this will work fine. BTW, if you set the input partitioner,
>>> then the input partitioner creates the new partitions as you specified in
>>> the setNumBspTask() method (a graph job pre-processes the (hash) input
>>> partition by default).
>>>
>>> Thanks.
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>> Chief Executive Officer
>>> DataSayer Co., Ltd.
>>>
>>>> On 2014. 10. 20., at 10:51 PM, Leonidas Fegaras <[email protected]
>>>> <mailto:[email protected]>> wrote:
>>>>
>>>> Dear Hama developers,
>>>> I still have a problem setting the split size of an HDFS input file
>>>> using Hama 0.6.4. For example, when I use:
>>>>
>>>> BSPJob job = new BSPJob(conf,BSPop.class);
>>>> job.setNumBspTask(10);
>>>> job.setLong("bsp.min.split.size",10000L); // 10000 bytes
>>>>
>>>> For a small file with 2 blocks, this will use only 2 BSP tasks (one
>>>> for each block), instead of 10.
>>>> This used to work in Hama 0.5.0.
>>>> Any suggestions?
>>>> Thanks.
>>>> Leonidas Fegaras
>>>>
>>>> On 01/04/2013 05:45 PM, Edward J. Yoon wrote:
>>>>> Hello,
>>>>>
>>>>>> than a block. But if you have more nodes in your cluster than data
>>>>>> blocks,
>>>>>> you may get faster execution if you allow splits smaller than a
>>>>>> block. Is
>>>>> You're right. So, we're working on partitioning issues now.
>>>>>
>>>>>> you may get faster execution if you allow splits smaller than a
>>>>>> block. Is
>>>>>> there any way to use splits smaller than a block in Hama 0.6.0?
>>>>> Yes. But, Hama 0.6.1 version will support it.
>>>>>
>>>>> On Sat, Jan 5, 2013 at 4:59 AM, Leonidas Fegaras
>>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>>> Dear Hama developers,
>>>>>> It seems that the splits generated by the FileInputFormat in Hama 0.6.0
>>>>>> cannot be smaller than a block. In Hama 0.5.0, I could set any
>>>>>> split size
>>>>>> using job.set("bsp.min.split.size",...) and set the task numbers using
>>>>>> job.setNumBspTask(...). This is ignored by Hama 0.6.0 for a split
>>>>>> smaller
>>>>>> than a block. But if you have more nodes in your cluster than data
>>>>>> blocks,
>>>>>> you may get faster execution if you allow splits smaller than a
>>>>>> block. Is
>>>>>> there any way to use splits smaller than a block in Hama 0.6.0?
>>>>>> Thanks for your help,
>>>>>> Leonidas
>>>>>>
>>>>>
>
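
As described in the thread, once an input partitioner is set, the input is re-partitioned into as many partitions as setNumBspTask() requests, so the task count no longer depends on the number of HDFS blocks. The idea can be sketched in plain Java without Hama. The class HashPartitionSketch and its methods partitionFor and partition are hypothetical names used only for illustration, and the hash-modulo rule is an assumption about how a hash partitioner typically behaves, not a copy of org.apache.hama.bsp.HashPartitioner.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not Hama code.
class HashPartitionSketch {

    // Assumed rule: non-negative hash of the key, modulo the number of tasks.
    static int partitionFor(Object key, int numTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numTasks;
    }

    // Distributes records over numTasks buckets, one bucket per task.
    static List<List<String>> partition(List<String> records, int numTasks) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String record : records) {
            buckets.get(partitionFor(record, numTasks)).add(record);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Stand-in for the records of a small input file.
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            records.add("record-" + i);
        }
        // 10 mirrors job.setNumBspTask(10) from the thread.
        List<List<String>> buckets = partition(records, 10);
        for (int t = 0; t < buckets.size(); t++) {
            System.out.println("task " + t + ": " + buckets.get(t).size() + " records");
        }
    }
}
```

In this sketch every record lands in exactly one of the 10 buckets, whether the underlying file has 2 blocks or 200; how evenly the buckets fill depends on the hash function and the keys.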
