Thanks Matthias, thanks a lot! Please help me with my other questions; if any of them are unclear, please tell me. Once again, thanks a lot!
> Date: Thu, 23 Apr 2015 13:47:44 +0200
> From: [email protected]
> To: [email protected]
> Subject: Re: Your Help Is Required !
>
> Hi,
>
> here is a short explanation of the different grouping patterns you can
> use. Let's say you have 10 tuples, each having two attributes, for
> example:
>
>   (a,1),(b,1),(c,1),(a,2),(b,2),(c,2),(a,3),(b,3),(c,3),(a,4)
>
> And let's assume you have two bolt tasks (i.e., a single bolt with
> dop=2), b1 and b2.
>
> * If you use shuffle grouping, each tuple is "randomly" sent to a
>   single bolt task; tuples are NOT replicated. For example:
>
>   b1: (a,1),(c,1),(b,2),(a,3),(c,3)
>   b2: (b,1),(a,2),(c,2),(b,3),(a,4)
>
> * If you use fields grouping, the data is partitioned according to the
>   specified key. Let's assume we use the first attribute as the key.
>   Thus, we have three partitions:
>
>   (a,1),(a,2),(a,3),(a,4)
>   (b,1),(b,2),(b,3)
>   (c,1),(c,2),(c,3)
>
>   Storm ensures that all tuples of a single partition are processed by
>   a single task. Because we have only two bolt tasks in our example,
>   one bolt task will process two partitions and the other one
>   partition. Tuples are not replicated in this case either:
>
>   b1: (a,1),(a,2),(a,3),(a,4)
>   b2: (b,1),(c,1),(b,2),(c,2),(b,3),(c,3)
>
>   Note that b2 receives the tuples of both partitions interleaved.
>
> * If you use all grouping, each tuple is replicated to each bolt task:
>
>   b1: (a,1),(b,1),(c,1),(a,2),(b,2),(c,2),(a,3),(b,3),(c,3),(a,4)
>   b2: (a,1),(b,1),(c,1),(a,2),(b,2),(c,2),(a,3),(b,3),(c,3),(a,4)
>
> * If you use global grouping, all tuples are sent to a single bolt
>   task:
>
>   b1: (a,1),(b,1),(c,1),(a,2),(b,2),(c,2),(a,3),(b,3),(c,3),(a,4)
>   b2:
>
>   For this case, a dop greater than one is not useful.
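> In code, you choose the grouping when you wire the bolt to the spout.
> Just a minimal sketch to illustrate (MySpout and MyBolt are
> placeholders for your own spout and bolt classes; in a real topology
> you would keep only the one grouping you need):
>
>   import backtype.storm.topology.TopologyBuilder;
>   import backtype.storm.tuple.Fields;
>
>   TopologyBuilder builder = new TopologyBuilder();
>   builder.setSpout("spout", new MySpout());
>
>   // shuffle grouping: tuples distributed "randomly", never replicated
>   builder.setBolt("shuffleBolt", new MyBolt(), 2)
>          .shuffleGrouping("spout");
>
>   // fields grouping: partition by the attribute named "m"
>   builder.setBolt("fieldsBolt", new MyBolt(), 2)
>          .fieldsGrouping("spout", new Fields("m"));
>
>   // all grouping: every tuple replicated to every bolt task
>   builder.setBolt("allBolt", new MyBolt(), 2)
>          .allGrouping("spout");
>
>   // global grouping: all tuples go to one single bolt task
>   builder.setBolt("globalBolt", new MyBolt(), 2)
>          .globalGrouping("spout");
>
> If you print each tuple together with the task id in the bolt's
> execute() method, you can verify the distributions from above yourself.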
> About question 4:
> If you read from multiple files and use multiple spout tasks, you need
> to make sure that each task is reading from a different file. If I
> understood correctly, in your case each task is reading every file, and
> thus they get each tuple multiple times.
> -> To ensure that each spout task reads a different file, you can
> exploit the task id of the spout tasks to assign different files to
> different tasks.
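> For example, the file assignment could happen in the spout's open()
> method. A minimal sketch (pre-1.0 backtype.storm API; "fileLocation" is
> the config key from your example code, the output fields match your
> Trident example, and the actual reading logic is omitted):
>
>   import java.io.File;
>   import java.util.ArrayList;
>   import java.util.Arrays;
>   import java.util.List;
>   import java.util.Map;
>   import backtype.storm.spout.SpoutOutputCollector;
>   import backtype.storm.task.TopologyContext;
>   import backtype.storm.topology.OutputFieldsDeclarer;
>   import backtype.storm.topology.base.BaseRichSpout;
>   import backtype.storm.tuple.Fields;
>
>   public class PartitionedFileSpout extends BaseRichSpout {
>       private final List<String> myFiles = new ArrayList<String>();
>       private SpoutOutputCollector collector;
>
>       @Override
>       public void open(Map conf, TopologyContext context,
>               SpoutOutputCollector collector) {
>           this.collector = collector;
>           String[] allFiles =
>               new File((String) conf.get("fileLocation")).list();
>           Arrays.sort(allFiles); // same order in every task
>           int taskIndex = context.getThisTaskIndex();
>           int numTasks = context
>               .getComponentTasks(context.getThisComponentId()).size();
>           // round-robin: task i takes files i, i+numTasks, i+2*numTasks,
>           // ... so no two tasks ever read the same file
>           for (int i = taskIndex; i < allFiles.length; i += numTasks) {
>               myFiles.add(allFiles[i]);
>           }
>       }
>
>       @Override
>       public void nextTuple() {
>           // read and emit records from myFiles only
>       }
>
>       @Override
>       public void declareOutputFields(OutputFieldsDeclarer declarer) {
>           declarer.declare(new Fields("m", "v"));
>       }
>   }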
>
> Unfortunately, I cannot answer your other questions.
>
>
> -Matthias
>
>
> On 04/23/2015 11:20 AM, prasad ch wrote:
> > Hi,
> >
> > I read the complete Storm documentation, but I am not understanding
> > the following things. Please help me.
> >
> > 1) We have the concept of stream grouping, but practically I cannot
> > see any difference between the groupings. According to the
> > documentation, shuffle grouping means tuples are equally distributed
> > to all tasks, i.e., if the spout emits only 10 tuples and my bolt has
> > 5 tasks, then the bolt finally receives 50 tuples.
> >
> > Fields grouping means all equal tuples will go to the same task. I am
> > unable to verify any of the groupings practically. Please help me
> > with stream grouping.
> >
> > 2) To process tuples in the bolt faster, I used 10 executors with
> > fields grouping. The problem: if the spout emits 10 tuples, I do not
> > receive the same tuples in the bolt; duplicates are coming. With
> > global grouping it works fine, but it is slow in cluster mode.
> >
> > 3) What are the minimum basic properties for a topology to run fast?
> >
> > 4) I read a list of files in a spout using Java. When I use multiple
> > executors to read faster, already-processed records come in again. I
> > handled that scenario in plain Java; is it possible in Storm? I am
> > unable to successfully aggregate 10000 records (across 10 files) when
> > reading from the files and processing them.
> >
> > 5) In a Trident topology I did aggregations with 10 files (each file
> > 1000 records) fine. But when I use 10 files with 100000 records in
> > total (each file 10000 records), my application keeps processing and
> > nothing gets done; I mean control never reaches the corresponding
> > filter or function. It takes a lot of time to emit and never comes to
> > the filter or function. (Here I used IRichSpout.)
> >
> > Example code (main application):
> >
> > Config con = new Config();
> > con.setDebug(true);
> > con.put("fileLocation", args[0]);
> > con.put("ext", args[1]);
> > con.setNumAckers(10);
> > file = args[2];
> > // con.setNumWorkers(Integer.parseInt(args[3]));
> > System.out.println("application start time: " + new Date());
> >
> > TridentTopology topology = new TridentTopology();
> > Stream s = topology.newStream("spout1",
> >         new ReadingSpout1(9080000)).parallelismHint(10);
> > s.groupBy(new Fields("m"))
> >  .aggregate(new Fields("v"), new Sum(), new Fields("r"))
> >  .each(new Fields("m", "r"), new MyFun1(file), new Fields("o"))
> >  .parallelismHint(40);
> >
> > LocalCluster cluster = new LocalCluster();
> > cluster.submitTopology("TD", con, topology.build());
> >
> > 6) Without Trident state, are failed tuples never replayed?
> >
> > 7) When I run Trident with a specified number of workers, it also
> > does not run. Please help me; did I miss any configuration?
> >
> > 8) I have a requirement to perform streaming by reading a list of
> > files from a specified location (a large number of files), to
> > aggregate them with sliding-window operations, and to write the
> > result to some files or another destination. Which way is better, a
> > normal topology or Trident? Please help me!