FW: Unable to work with storm

prasad ch Thu, 23 Apr 2015 02:19:40 -0700


From: [email protected]
To: [email protected]
Subject: Unable to work with storm
Date: Thu, 23 Apr 2015 10:24:12 +0530





Hi,
I have read the complete Storm documentation, but I do not understand the
following things. Please help me.

1) Storm has the concept of stream groupings, but in practice I do not see any
difference from one-to-one delivery. According to the documentation, shuffle
grouping means tuples are distributed equally to all tasks, i.e. if the spout
emits only 10 tuples and my bolt has 5 tasks, then the bolt finally receives
50 tuples. Fields grouping means all tuples with the same field values go to
the same task. I am unable to demonstrate this in practice with any of the
groupings. Please help me with stream groupings.
2) To process tuples faster in the bolt I used 10 executors with fields
grouping. I had a problem here: if the spout emits 10 tuples, I do not receive
the same tuples in the bolt; duplicates arrive. When I use global grouping it
works fine, but it is slow in cluster mode.
3) What are the minimum basic properties needed for a topology to run fast?
4) I wrote a spout in Java that reads a list of files. When I use multiple
executors to read faster, already-processed records come through again. I
handled that scenario in plain Java; is it possible in Storm? I am unable to
successfully perform aggregations (reading from files and processing) over
10000 records (across 10 files).
5) In a Trident topology my aggregations worked fine with 10 files (1000
records each), but when I use 10 files with 100000 records in total (10000
records each) my application keeps processing and nothing gets done. I mean
control never reaches the corresponding filter or function; it takes a long
time to emit and never gets to the filter or function. (Here I used
IRichSpout.)
Example code (main application code):

        Config con = new Config();
        con.setDebug(true);
        con.put("fileLocation", args[0]);
        con.put("ext", args[1]);
        con.setNumAckers(10);
        file = args[2];
        //con.setNumWorkers(Integer.parseInt(args[3]));
        System.out.println("application start time :" + new Date());

        TridentTopology topology = new TridentTopology();
        Stream s = topology.newStream("spout1", new ReadingSpout1(9080000))
                .parallelismHint(10);
        s.groupBy(new Fields("m"))
                .aggregate(new Fields("v"), new Sum(), new Fields("r"))
                .each(new Fields("m", "r"), new MyFun1(file), new Fields("o"))
                .parallelismHint(40);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("TD", con, topology.build());
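
Note on questions 1 and 2: the stream groupings can be compared with a small
plain-Java model. This is only a sketch and does not use the Storm API; the
class and method names are made up for illustration, and the hash-modulo rule
for fields grouping is a simplification of what Storm does internally. It
counts how many tuples each bolt task would receive under shuffle, fields and
all grouping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java model of three Storm stream groupings. It does not use the
// Storm API; every name in this class is hypothetical.
class GroupingSketch {

    // Shuffle grouping (simplified): every tuple is delivered to exactly one
    // task, and the tuples are spread evenly over the tasks.
    static int[] shuffle(List<String> keys, int numTasks) {
        List<Integer> order = new ArrayList<>();
        for (int t = 0; t < numTasks; t++) {
            order.add(t);
        }
        Collections.shuffle(order); // random order in which tasks are visited
        int[] received = new int[numTasks];
        for (int i = 0; i < keys.size(); i++) {
            received[order.get(i % numTasks)]++;
        }
        return received;
    }

    // Fields grouping (simplified): the task is derived from a hash of the
    // grouping field, so equal keys always land on the same task.
    static int fieldsTask(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    static int[] fields(List<String> keys, int numTasks) {
        int[] received = new int[numTasks];
        for (String key : keys) {
            received[fieldsTask(key, numTasks)]++;
        }
        return received;
    }

    // All grouping: every tuple is replicated to every task.
    static int[] all(List<String> keys, int numTasks) {
        int[] received = new int[numTasks];
        Arrays.fill(received, keys.size());
        return received;
    }

    // Total number of deliveries across all tasks.
    static int total(int[] received) {
        int sum = 0;
        for (int n : received) {
            sum += n;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> keys =
                Arrays.asList("a", "b", "a", "c", "b", "a", "d", "e", "a", "b");
        System.out.println("shuffle: " + Arrays.toString(shuffle(keys, 5)));
        System.out.println("fields:  " + Arrays.toString(fields(keys, 5)));
        System.out.println("all:     " + Arrays.toString(all(keys, 5)));
    }
}
```

With 10 tuples and 5 tasks, shuffle and fields grouping each deliver 10 tuples
in total, while only all grouping delivers 50 (every task sees every tuple).
In this model fields grouping does not duplicate tuples by itself.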


6) Without Trident state, are failed tuples never replayed?

7) When I run Trident with a specified number of workers, it also does not
run. Please help me: is there any configuration I missed?

8) I have a requirement to perform streaming by reading a list of files from
specified locations (a large number of files). I need to aggregate using
sliding-window operations and write the results to files or some other
destination. Which approach is better: a normal topology or Trident? Please
help me!
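
Note on question 8: the sliding-window aggregation itself can be sketched in
plain Java, independent of whether it ends up inside a bolt or a Trident
function. The class below is hypothetical and not part of Storm. It assumes
the window is divided into a fixed number of slots and that something external
(for example a timer) calls advance() once per slot interval.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a Storm class: keeps per-key sums over the last
// numSlots slots (the current slot included).
class SlidingWindowSum {
    private final Map<String, long[]> slots = new HashMap<>();
    private final int numSlots;
    private int head = 0; // index of the slot currently being filled

    SlidingWindowSum(int numSlots) {
        if (numSlots < 1) {
            throw new IllegalArgumentException("numSlots must be at least 1");
        }
        this.numSlots = numSlots;
    }

    // Add a value for a key to the current slot.
    void add(String key, long value) {
        slots.computeIfAbsent(key, k -> new long[numSlots])[head] += value;
    }

    // Sum of each key over all slots that are still inside the window.
    Map<String, Long> totals() {
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, long[]> e : slots.entrySet()) {
            long sum = 0;
            for (long v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    // Slide the window by one slot: the oldest slot is cleared and reused.
    void advance() {
        head = (head + 1) % numSlots;
        for (long[] perKey : slots.values()) {
            perKey[head] = 0;
        }
    }
}
```

For example, with two slots, adding 5 for key "a", advancing, and adding 3
gives a total of 8; after one more advance the 5 has left the window and the
total is 3.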
