Dongchao, the problem is that I would not want to write each (very small) entry to HDFS; that would make Hive loading very inefficient (though I could do file merging in a separate job). So ideally I would like to write all entries within the same 6 minutes to the same file. Right now I am actually thinking about adding a timer (say 6 min) in my bolt, collecting all input in memory, and writing it out to a single file on timeout...
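Something along the lines of this rough, untested sketch is what I have in mind. It uses Storm's tick tuples as the timer and the Hadoop FileSystem API for the write; the class name, the output path, and the "entry" tuple field are only placeholders, not our real names:

import backtype.storm.Config;
import backtype.storm.Constants;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.*;

public class HourlyTenthWriterBolt extends BaseRichBolt {
    private OutputCollector collector;
    private final List<String> buffer = new ArrayList<String>();
    private FileSystem fs;

    @Override
    public Map<String, Object> getComponentConfiguration() {
        // Ask Storm to send this bolt a tick tuple every 6 minutes.
        Config conf = new Config();
        conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 6 * 60);
        return conf;
    }

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        try {
            fs = FileSystem.get(new Configuration()); // expects the HDFS config on the classpath
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void execute(Tuple tuple) {
        if (isTickTuple(tuple)) {
            flush();                                     // timer fired: write the buffered batch
        } else {
            buffer.add(tuple.getStringByField("entry")); // "entry" is a placeholder field name
        }
        collector.ack(tuple);                            // acked before flush, so no replay if the write fails
    }

    private boolean isTickTuple(Tuple tuple) {
        return Constants.SYSTEM_COMPONENT_ID.equals(tuple.getSourceComponent())
                && Constants.SYSTEM_TICK_STREAM_ID.equals(tuple.getSourceStreamId());
    }

    private void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // One file per flush, under a directory named after the hourly tenth, so the
        // directory can later be added as a Hive partition. The window is derived from
        // the flush time only to keep the sketch short; really it should come from the
        // entry timestamps.
        Path path = new Path("/data/events/" + hourlyTenth(System.currentTimeMillis())
                + "/part-" + UUID.randomUUID());
        try {
            FSDataOutputStream out = fs.create(path);
            for (String line : buffer) {
                out.writeBytes(line + "\n");
            }
            out.close();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        buffer.clear();
    }

    private String hourlyTenth(long ts) {
        // e.g. 2014-01-08-07-36, rounded down to the nearest 6 minutes
        long windowStart = ts - ts % (6 * 60 * 1000L);
        return new SimpleDateFormat("yyyy-MM-dd-HH-mm").format(new Date(windowStart));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // nothing emitted downstream in this sketch
    }
}

Chen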
On Tue, Jan 7, 2014 at 5:00 PM, Ding,Dongchao <[email protected]> wrote:

> Hi, some suggestions.
>
> You don't need to "instruct data within the same hourly tenth to the same
> bolt"; just write the entries within the same hourly tenth (6 min) to the
> same HDFS directory.
>
> This works because a Hive partition maps to one HDFS directory, not to one
> HDFS file.
>
> thks
>
> ding
>
> From: Chen Wang [mailto:[email protected]]
> Sent: January 8, 2014 7:47
> To: [email protected]
> Subject: write to the same file in bolt?
>
> Hey Guys,
>
> I am using Storm to read data from our socket server, entry by entry. Each
> entry has a timestamp. In my bolt, I need to write the entries within the
> same hourly tenth (6 min) to the same HDFS file, so that later I can load
> them into Hive (with the hourly tenth, 6 min, as the partition).
>
> In order to achieve that, I will need to either
>
> 1. instruct data within the same hourly tenth to the same bolt, or
> 2. share the same file writer across all bolts that deal with data within
> the same hourly tenth.
>
> How can I achieve this? Or is there some other approach to this problem?
>
> Thank you very much!
>
> Chen
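(Adding below the quote for reference: option 1 from my original question, i.e. sending all entries of the same hourly tenth to the same bolt task, would look roughly like the wiring below. SocketSpout is only a placeholder for whatever reads our socket server, and the "hourly_tenth" field name is made up; the spout is assumed to emit the window key as a tuple field so fieldsGrouping can key on it.)

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

public class HourlyTenthTopology {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        // SocketSpout is hypothetical: it would read entries from the socket server
        // and emit tuples with the fields ("hourly_tenth", "entry").
        builder.setSpout("socket-spout", new SocketSpout(), 1);
        // fieldsGrouping on the window key sends every entry of a given 6-minute
        // window to the same writer task, so each window ends up in one file.
        builder.setBolt("hdfs-writer", new HourlyTenthWriterBolt(), 4)
               .fieldsGrouping("socket-spout", new Fields("hourly_tenth"));

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("hourly-tenth", new Config(), builder.createTopology());
    }
}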
