Can you point me to the example? I am not able to understand what you mean by partitioning the data accordingly.
If I have a 3-node Storm cluster and I want to do a global count, how will it work? Please explain if possible.

Regards

On Fri, May 29, 2015 at 12:04 PM, Andrew Xor <[email protected]> wrote:

> I am guessing that as you currently do it you spawn the different tasks
> counting the same thing, hence you are basically reading the data twice. I
> suspect if you set a parallelism hint of 3, 4 and so on you would get 21 x
> that number. To do a global count you need to partition the data
> accordingly; please do check the documentation as it has some useful
> examples as to how to accomplish that.
>
> Regards.
>
> On Fri, May 29, 2015 at 6:56 PM P. Taylor Goetz <[email protected]> wrote:
>
>> Try moving “.parallelismHint(2)” to after the groupBy.
>>
>> With the current placement (before the groupBy) Storm is creating two
>> instances of your spout, each outputting the same data set.
>>
>> -Taylor
>>
>> On May 29, 2015, at 11:09 AM, Ashish Soni <[email protected]> wrote:
>>
>> > Hi All,
>> >
>> > I am trying to run a global count using Trident, and when I use a
>> > parallelism hint of 2 it is getting double counted. Please tell me what
>> > I am doing wrong; below are the code and a sample data set.
>> >
>> > I am just trying to count the number of calls made by a particular phone
>> > number. When I do not specify the parallelism hint I get the count as 21,
>> > but when I specify a parallelism of 2 I get the count as 42.
>> >
>> > public static void main(String[] args) {
>> >     TridentTopology topology = new TridentTopology();
>> >     topology.newStream("cdrevent", new CSVSpout("testdata.csv", ',', false))
>> >             .parallelismHint(2)
>> >             .groupBy(new Fields("field_1"))
>> >             .aggregate(new Fields("field_1"), new Count(), new Fields("count"))
>> >             .each(new Fields("field_1", "count"), new Utils.PrintFilter());
>> >
>> >     Config config = new Config();
>> >     config.put(RichSpoutBatchExecutor.MAX_BATCH_SIZE_CONF, 100);
>> >
>> >     LocalCluster cluster = new LocalCluster();
>> >
>> >     cluster.submitTopology("cdreventTopology", config, topology.build());
>> >
>> >     backtype.storm.utils.Utils.sleep(10000);
>> >     cluster.killTopology("cdreventTopology");
>> > }
>> >
>> > 0,05051111111,0,05055555555,22/01/2014,22/01/2014,21:15:00,21:20:00,Local,20.0
>> > 1,05051111111,0,05055555555,13/01/2014,13/01/2014,18:00:00,18:05:00,Local,5.0
>> > 2,05051111111,0,05055555555,21/01/2014,21/01/2014,20:35:00,20:35:00,Local,0.0
>> > 3,05051111111,0,05055555555,11/02/2014,11/02/2014,00:00:00,00:00:00,Local,0.0
>> > 4,05051111111,0,05055555555,22/02/2014,22/02/2014,20:25:00,20:25:00,Local,0.0
>> > 5,05051111111,1,38722222222,20/01/2014,20/01/2014,18:20:00,18:22:00,Intl,50.0
>> > 6,05051111111,0,05055555555,15/01/2014,15/01/2014,01:25:00,01:25:00,Local,0.0
>> > 7,05051111111,0,08453350000,31/12/2013,31/12/2013,12:40:00,12:44:00,National,32.0
>> > 8,05051111111,1,05055555555,30/12/2013,30/12/2013,18:40:00,18:40:00,Local,0.0
>> > 9,05051111111,0,08453350000,10/01/2014,10/01/2014,13:10:00,13:14:00,National,32.0
>> > 10,05051111111,0,05055555555,17/02/2014,17/02/2014,17:50:00,17:50:00,Local,0.0
>> > 11,05051111111,0,08453350000,09/02/2014,09/02/2014,18:15:00,18:17:00,National,8.0
>> > 12,05051111111,1,05055555555,09/02/2014,09/02/2014,17:05:00,17:05:00,Local,0.0
>> > 13,05051111111,1,05055555555,15/02/2014,15/02/2014,18:45:00,18:45:00,Local,0.0
>> > 14,05051111111,0,08453350000,20/02/2014,20/02/2014,19:20:00,19:21:00,National,8.0
>> > 15,05051111111,1,08453350000,05/01/2014,05/01/2014,13:50:00,13:53:00,National,12.0
>> > 16,05051111111,1,07776122222,26/01/2014,26/01/2014,14:00:00,14:00:00,Mobile,0.0
>> > 17,05051111111,0,05055555555,04/02/2014,04/02/2014,12:30:00,12:35:00,Local,20.0
>> > 18,05051111111,1,05055555555,16/01/2014,16/01/2014,20:20:00,20:25:00,Local,20.0
>> > 19,05051111111,0,05055555555,23/02/2014,23/02/2014,16:55:00,17:07:00,Local,12.0
>> > 20,05051111111,0,05055555555,07/01/2014,07/01/2014,16:20:00,16:23:00,Local,12.0
>> >
>> > Ashish
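For reference, a minimal sketch of what Taylor's suggestion could look like, reusing the CSVSpout and Utils.PrintFilter classes from the original post (the class name GroupedCountTopology and the exact hint placement after the aggregate are assumptions for illustration, not taken from the thread):

    import backtype.storm.Config;
    import backtype.storm.LocalCluster;
    import backtype.storm.tuple.Fields;
    import storm.trident.TridentTopology;
    import storm.trident.operation.builtin.Count;

    // hypothetical class name; CSVSpout and Utils.PrintFilter are from the original post
    public class GroupedCountTopology {
        public static void main(String[] args) {
            TridentTopology topology = new TridentTopology();
            topology.newStream("cdrevent", new CSVSpout("testdata.csv", ',', false))
                    // groupBy repartitions the stream on field_1, so all tuples for a
                    // given phone number reach the same aggregator task regardless of
                    // how many tasks are running
                    .groupBy(new Fields("field_1"))
                    .aggregate(new Fields("field_1"), new Count(), new Fields("count"))
                    // the hint now sits after the repartitioning step: it scales the
                    // aggregation instead of creating a second copy of the spout
                    .parallelismHint(2)
                    .each(new Fields("field_1", "count"), new Utils.PrintFilter());

            Config config = new Config();
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("cdreventTopology", config, topology.build());
        }
    }

With the hint placed before the groupBy, as in the original code, it applies to the spout segment of the topology, which is why two spout instances each emit the full data set and the count doubles.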

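As for the 3-node question: my understanding is that an aggregate on an ungrouped Trident stream is a global aggregation, so the partial counts from every partition are combined into a single result per batch no matter how many workers or nodes the topology spans. A rough sketch under that assumption, again reusing CSVSpout and Utils.PrintFilter from the original post:

    import backtype.storm.tuple.Fields;
    import storm.trident.TridentTopology;
    import storm.trident.operation.builtin.Count;

    TridentTopology topology = new TridentTopology();
    topology.newStream("cdrevent", new CSVSpout("testdata.csv", ',', false))
            // no groupBy: Trident combines the per-partition partial counts
            // into one global count for each batch
            .aggregate(new Count(), new Fields("count"))
            .each(new Fields("count"), new Utils.PrintFilter());

Note that this counts per batch; for a running total across batches, the usual approach is persistentAggregate with a state backend (for example new MemoryMapState.Factory() from storm.trident.testing for local testing) instead of the plain aggregate call.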