From: [email protected]
To: [email protected]
Subject: Unable to work with storm
Date: Thu, 23 Apr 2015 10:24:12 +0530
Hi,
I have read the complete Storm documentation, but I do not understand the
following points. Please help.
1) Stream groupings: in practice I do not see any difference between the
groupings and a plain one-to-one distribution. According to the documentation,
shuffle grouping distributes tuples evenly across all tasks. My reading is that
if the spout emits 10 tuples and the bolt has 5 tasks, the bolt receives 50
tuples in total; is that correct?
Fields grouping means that all tuples with the same field value go to the same
task, but I am unable to demonstrate this in practice for any of the groupings.
Please help me understand stream groupings.
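For comparison, here is a minimal plain-Java sketch of the alternative reading,
in which each tuple is delivered to exactly one task. As far as I can tell, that
is what the documentation describes for shuffle and fields grouping, while
delivery of every tuple to every task is described under all grouping. The
sketch does not use Storm: GroupingSketch and its methods are hypothetical
names, shuffle grouping is modeled as plain round-robin, and fields grouping as
hashCode modulo the task count. Both are simplifying assumptions, not Storm's
actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of two stream groupings; NOT Storm's implementation.
class GroupingSketch {

    // Shuffle grouping modeled as round-robin (assumption):
    // every tuple is delivered to exactly one task.
    static int[] shuffleCounts(List<String> tuples, int numTasks) {
        int[] counts = new int[numTasks];
        for (int i = 0; i < tuples.size(); i++) {
            counts[i % numTasks]++;
        }
        return counts;
    }

    // Fields grouping modeled as hash(key) mod numTasks (assumption):
    // tuples with equal keys always map to the same task index.
    static int fieldsTask(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        List<String> tuples = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            tuples.add("key" + (i % 3)); // 10 tuples, 3 distinct keys
        }
        System.out.println("shuffle counts per task: "
                + Arrays.toString(shuffleCounts(tuples, 5)));
        for (String t : tuples) {
            System.out.println(t + " -> task " + fieldsTask(t, 5));
        }
    }
}
```

In this model the per-task counts for 10 tuples over 5 tasks add up to 10, not
50, and each key is always printed with the same task index.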
2) To process tuples faster in the bolt, I used 10 executors with fields
grouping. The problem is that when the spout emits 10 tuples, the bolt does not
receive those same 10 tuples; duplicates arrive. With global grouping the
results are correct, but it is slow in cluster mode.
3) What are the minimum basic settings needed for a topology to run fast?
4) I wrote a spout in Java that reads a list of files. When I use multiple
executors to read faster, records that were already processed arrive again. I
handled that scenario in plain Java; is it possible to handle it in Storm? I am
unable to run aggregations successfully (reading from files and processing)
over 10000 records (10 files in total).
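One way I could imagine avoiding the repeated records is to give each spout
task a disjoint subset of the files. A minimal plain-Java sketch of that idea
follows. FilePartitionSketch and filesForTask are hypothetical names, and the
sketch assumes every task sees the same file list in the same order. Inside a
spout, the task index and task count could presumably come from
TopologyContext.getThisTaskIndex() and
getComponentTasks(getThisComponentId()).size() in open(), but I have not
verified this in my topology.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits a shared, ordered file list among spout tasks.
class FilePartitionSketch {

    // Returns the files owned by the task with the given index.
    // File i belongs to task (i mod numTasks), so the subsets never overlap.
    static List<String> filesForTask(List<String> files, int taskIndex, int numTasks) {
        List<String> mine = new ArrayList<>();
        for (int i = 0; i < files.size(); i++) {
            if (i % numTasks == taskIndex) {
                mine.add(files.get(i));
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        // Illustrative file names only.
        List<String> files = Arrays.asList(
                "f0.txt", "f1.txt", "f2.txt", "f3.txt", "f4.txt",
                "f5.txt", "f6.txt", "f7.txt", "f8.txt", "f9.txt");
        int numTasks = 3;
        for (int task = 0; task < numTasks; task++) {
            System.out.println("task " + task + " reads "
                    + filesForTask(files, task, numTasks));
        }
    }
}
```

With this split, each file is read by exactly one task, so raising the number
of executors should not by itself cause the same file to be read twice.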
5) In a Trident topology, aggregations over 10 files (1000 records each) work
fine. But with 10 files totalling 100000 records (10000 records each), the
application keeps processing and nothing gets done: control never reaches the
corresponding filter or function. The spout takes a long time to emit, and the
tuples never reach the filter or function. (Here I used IRichSpout.)
Example code (main application):

    Config con = new Config();
    con.setDebug(true);
    con.put("fileLocation", args[0]);
    con.put("ext", args[1]);
    con.setNumAckers(10);
    file = args[2];
    // con.setNumWorkers(Integer.parseInt(args[3]));
    System.out.println("application start time: " + new Date());

    TridentTopology topology = new TridentTopology();
    Stream s = topology.newStream("spout1", new ReadingSpout1(9080000))
                       .parallelismHint(10);
    // Group by "m", sum "v" into "r", then apply MyFun1 to each (m, r) pair.
    s.groupBy(new Fields("m"))
     .aggregate(new Fields("v"), new Sum(), new Fields("r"))
     .each(new Fields("m", "r"), new MyFun1(file), new Fields("o"))
     .parallelismHint(40);

    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("TD", con, topology.build());
6) Without Trident state, are failed tuples never replayed?
7) When I run Trident with a specified number of workers, it also does not
run. Have I missed some configuration? Please help.
8) I have a requirement to stream data by reading a large number of files from
specified locations, aggregate it using sliding-window operations, and write
the results to files or some other destination. Which is the better fit, a
normal topology or Trident? Please help.
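To be precise about what I mean by a sliding-window aggregation, here is a
minimal plain-Java sketch of a count-based sliding sum. SlidingSumSketch is a
hypothetical name and the window length of 3 records is only an illustrative
assumption. The sketch is independent of Storm and Trident.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical count-based sliding window that keeps a running sum
// of the most recent windowSize values.
class SlidingSumSketch {
    private final int windowSize;
    private final Deque<Long> window = new ArrayDeque<>();
    private long sum = 0;

    SlidingSumSketch(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
    }

    // Adds a value, evicts the oldest one if the window is full,
    // and returns the sum over the current window.
    long add(long value) {
        window.addLast(value);
        sum += value;
        if (window.size() > windowSize) {
            sum -= window.removeFirst();
        }
        return sum;
    }

    public static void main(String[] args) {
        SlidingSumSketch w = new SlidingSumSketch(3);
        for (long v = 1; v <= 5; v++) {
            System.out.println("after " + v + ": window sum = " + w.add(v));
        }
    }
}
```

In my case there would be one such window per value of the grouping field
("m" in the code above), and the window sum would be written out after each
update or at fixed intervals.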