Thanks Matthias, thanks a lot! Please help me with my other questions; if any of them are unclear, please tell me. Once again, thanks a lot!
> Date: Thu, 23 Apr 2015 13:47:44 +0200
> From: [email protected]
> To: [email protected]
> Subject: Re: Your Help Is Required !
>
> Hi,
>
> here is a short explanation of the different grouping patterns you can
> use. Let's say you have 10 tuples, each having two attributes, for
> example:
>
>   (a,1),(b,1),(c,1),(a,2),(b,2),(c,2),(a,3),(b,3),(c,3),(a,4)
>
> And let's assume you have two bolt tasks (i.e., a single bolt with
> dop=2), b1 and b2.
>
> * If you use shuffle grouping, each tuple is "randomly" sent to a
>   single bolt task; tuples are NOT replicated. For example:
>
>   b1: (a,1),(c,1),(b,2),(a,3),(c,3)
>   b2: (b,1),(a,2),(c,2),(b,3),(a,4)
>
> * If you use fields grouping, the data is partitioned according to the
>   specified key. Let's assume we use the first attribute as the key.
>   Thus, we have three partitions:
>
>   (a,1),(a,2),(a,3),(a,4)
>   (b,1),(b,2),(b,3)
>   (c,1),(c,2),(c,3)
>
>   Storm ensures that all tuples of a single partition are processed by
>   a single task. Because we have only two bolt tasks in our example,
>   one bolt task will process two partitions and the other one
>   partition. Tuples are not replicated in this case either:
>
>   b1: (a,1),(a,2),(a,3),(a,4)
>   b2: (b,1),(c,1),(b,2),(c,2),(b,3),(c,3)
>
>   Note that b2 receives the tuples of both partitions interleaved.
>
> * If you use all grouping, each tuple is replicated to each bolt task:
>
>   b1: (a,1),(b,1),(c,1),(a,2),(b,2),(c,2),(a,3),(b,3),(c,3),(a,4)
>   b2: (a,1),(b,1),(c,1),(a,2),(b,2),(c,2),(a,3),(b,3),(c,3),(a,4)
>
> * If you use global grouping, all tuples are sent to a single bolt
>   task:
>
>   b1: (a,1),(b,1),(c,1),(a,2),(b,2),(c,2),(a,3),(b,3),(c,3),(a,4)
>   b2:
>
>   For this case, a dop greater than one is not useful.
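> In code, you choose the grouping when you wire the bolt to the spout.
> Just a minimal sketch to illustrate (MySpout and MyBolt are
> placeholders for your own spout and bolt classes; in a real topology
> you would keep only the one grouping you need):
>
>   import backtype.storm.topology.TopologyBuilder;
>   import backtype.storm.tuple.Fields;
>
>   TopologyBuilder builder = new TopologyBuilder();
>   builder.setSpout("spout", new MySpout());
>
>   // shuffle grouping: tuples distributed "randomly", never replicated
>   builder.setBolt("shuffleBolt", new MyBolt(), 2)
>          .shuffleGrouping("spout");
>
>   // fields grouping: partition by the attribute named "m"
>   builder.setBolt("fieldsBolt", new MyBolt(), 2)
>          .fieldsGrouping("spout", new Fields("m"));
>
>   // all grouping: every tuple replicated to every bolt task
>   builder.setBolt("allBolt", new MyBolt(), 2)
>          .allGrouping("spout");
>
>   // global grouping: all tuples go to one single bolt task
>   builder.setBolt("globalBolt", new MyBolt(), 2)
>          .globalGrouping("spout");
>
> If you print each tuple together with the task id in the bolt's
> execute() method, you can verify the distributions from above yourself.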
> About question 4:
> If you read from multiple files and use multiple spout tasks, you need
> to make sure that each task is reading from a different file. If I
> understood correctly, in your case each task is reading every file, and
> thus they get each tuple multiple times.
> -> To ensure that each spout task reads a different file, you can
> exploit the task id of the spout tasks to assign different files to
> different tasks.
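> For example, the file assignment could happen in the spout's open()
> method. A minimal sketch (pre-1.0 backtype.storm API; "fileLocation" is
> the config key from your example code, the output fields match your
> Trident example, and the actual reading logic is omitted):
>
>   import java.io.File;
>   import java.util.ArrayList;
>   import java.util.Arrays;
>   import java.util.List;
>   import java.util.Map;
>   import backtype.storm.spout.SpoutOutputCollector;
>   import backtype.storm.task.TopologyContext;
>   import backtype.storm.topology.OutputFieldsDeclarer;
>   import backtype.storm.topology.base.BaseRichSpout;
>   import backtype.storm.tuple.Fields;
>
>   public class PartitionedFileSpout extends BaseRichSpout {
>       private final List<String> myFiles = new ArrayList<String>();
>       private SpoutOutputCollector collector;
>
>       @Override
>       public void open(Map conf, TopologyContext context,
>               SpoutOutputCollector collector) {
>           this.collector = collector;
>           String[] allFiles =
>               new File((String) conf.get("fileLocation")).list();
>           Arrays.sort(allFiles); // same order in every task
>           int taskIndex = context.getThisTaskIndex();
>           int numTasks = context
>               .getComponentTasks(context.getThisComponentId()).size();
>           // round-robin: task i takes files i, i+numTasks, i+2*numTasks,
>           // ... so no two tasks ever read the same file
>           for (int i = taskIndex; i < allFiles.length; i += numTasks) {
>               myFiles.add(allFiles[i]);
>           }
>       }
>
>       @Override
>       public void nextTuple() {
>           // read and emit records from myFiles only
>       }
>
>       @Override
>       public void declareOutputFields(OutputFieldsDeclarer declarer) {
>           declarer.declare(new Fields("m", "v"));
>       }
>   }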
>
> Unfortunately, I cannot answer your other questions.
>
>
> -Matthias
>
>
> On 04/23/2015 11:20 AM, prasad ch wrote:
> > Hi,
> >
> > I read the complete Storm documentation, but I am not understanding
> > the following things. Please help me.
> >
> > 1) We have the concept of stream grouping, but practically I cannot
> > see any difference between the groupings. According to the
> > documentation, shuffle grouping means tuples are equally distributed
> > to all tasks, i.e., if the spout emits only 10 tuples and my bolt has
> > 5 tasks, then the bolt finally receives 50 tuples.
> >
> > Fields grouping means all equal tuples will go to the same task. I am
> > unable to verify any of the groupings practically. Please help me
> > with stream grouping.
> >
> > 2) To process tuples in the bolt faster, I used 10 executors with
> > fields grouping. The problem: if the spout emits 10 tuples, I do not
> > receive the same tuples in the bolt; duplicates are coming. With
> > global grouping it works fine, but it is slow in cluster mode.
> >
> > 3) What are the minimum basic properties for a topology to run fast?
> >
> > 4) I read a list of files in a spout using Java. When I use multiple
> > executors to read faster, already-processed records come in again. I
> > handled that scenario in plain Java; is it possible in Storm? I am
> > unable to successfully aggregate 10000 records (across 10 files) when
> > reading from the files and processing them.
> >
> > 5) In a Trident topology I did aggregations with 10 files (each file
> > 1000 records) fine. But when I use 10 files with 100000 records in
> > total (each file 10000 records), my application keeps processing and
> > nothing gets done; I mean control never reaches the corresponding
> > filter or function. It takes a lot of time to emit and never comes to
> > the filter or function. (Here I used IRichSpout.)
> >
> > Example code (main application):
> >
> > Config con = new Config();
> > con.setDebug(true);
> > con.put("fileLocation", args[0]);
> > con.put("ext", args[1]);
> > con.setNumAckers(10);
> > file = args[2];
> > // con.setNumWorkers(Integer.parseInt(args[3]));
> > System.out.println("application start time: " + new Date());
> >
> > TridentTopology topology = new TridentTopology();
> > Stream s = topology.newStream("spout1",
> >         new ReadingSpout1(9080000)).parallelismHint(10);
> > s.groupBy(new Fields("m"))
> >  .aggregate(new Fields("v"), new Sum(), new Fields("r"))
> >  .each(new Fields("m", "r"), new MyFun1(file), new Fields("o"))
> >  .parallelismHint(40);
> >
> > LocalCluster cluster = new LocalCluster();
> > cluster.submitTopology("TD", con, topology.build());
> >
> > 6) Without Trident state, are failed tuples never replayed?
> >
> > 7) When I run Trident with a specified number of workers, it also
> > does not run. Please help me; did I miss any configuration?
> >
> > 8) I have a requirement to perform streaming by reading a list of
> > files from a specified location (a large number of files), to
> > aggregate them with sliding-window operations, and to write the
> > result to some files or another destination. Which way is better, a
> > normal topology or Trident? Please help me!