Dongchao,
The problem is that I would not want to write each entry (very small) to
HDFS; that would make Hive loading very inefficient (though I could do file
merging in a separate job). So ideally, I would like to write all entries
within the same 6 minutes to the same file.
Right now I am actually thinking about adding a timer (say, 6 minutes) in my
bolt, collecting all input in memory, and writing it to a single file on timeout.
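
Something along these lines (a rough, untested sketch; the class name, the
"ts"/"entry" field names, and the output path are placeholders I made up),
using Storm's tick tuples as the timer:

import backtype.storm.Config;
import backtype.storm.Constants;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BufferingHdfsBolt extends BaseRichBolt {
    private OutputCollector collector;
    private int taskId;
    // entries buffered in memory, keyed by the start minute of their hourly tenth
    private Map<Long, List<String>> buffers = new HashMap<Long, List<String>>();

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.taskId = context.getThisTaskId();
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        // ask Storm to send this bolt a tick tuple every 360 seconds (6 minutes)
        Map<String, Object> conf = new HashMap<String, Object>();
        conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 360);
        return conf;
    }

    @Override
    public void execute(Tuple tuple) {
        boolean isTick = Constants.SYSTEM_COMPONENT_ID.equals(tuple.getSourceComponent())
                && Constants.SYSTEM_TICK_STREAM_ID.equals(tuple.getSourceStreamId());
        if (isTick) { flush(); return; }

        long minutes = tuple.getLongByField("ts") / 1000 / 60; // assuming epoch millis
        long bucket = (minutes / 6) * 6;                       // round down to the tenth
        List<String> buf = buffers.get(bucket);
        if (buf == null) { buf = new ArrayList<String>(); buffers.put(bucket, buf); }
        buf.add(tuple.getStringByField("entry"));
        collector.ack(tuple);
    }

    private void flush() {
        try {
            FileSystem fs = FileSystem.get(new Configuration());
            for (Map.Entry<Long, List<String>> e : buffers.entrySet()) {
                // one part file per bucket per flush; task id + timestamp in the
                // name keeps files from different bolt tasks from colliding
                Path out = new Path("/data/events/tenth=" + e.getKey()
                        + "/part-" + taskId + "-" + System.currentTimeMillis());
                FSDataOutputStream os = fs.create(out);
                for (String line : e.getValue()) os.writeBytes(line + "\n");
                os.close();
            }
            buffers.clear();
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) { }
}

Since a Hive partition points at a directory rather than a single file, each
bolt task can drop its own part file into the same tenth's directory and the
partition will still pick all of them up.
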
Chen


On Tue, Jan 7, 2014 at 5:00 PM, Ding,Dongchao <[email protected]> wrote:

>   Hi, some suggestions:
>
> You don’t need to “instruct data within the same hourly tenth to the
> same bolt”; just write the entries within the same hourly tenth (6
> min) to the same HDFS directory.
>
> Because a Hive partition maps to one HDFS directory, not one HDFS file.
>
>
> thks
>
> ding
>
> *From:* Chen Wang [mailto:[email protected]]
> *Sent:* January 8, 2014 7:47
> *To:* [email protected]
> *Subject:* write to the same file in bolt?
>
>
>
> Hey Guys,
>
> I am using Storm to read data from our socket server, entry by entry. Each
> entry has a timestamp. In my bolt, I need to write the entries within
> the same hourly tenth (6 min) to the same HDFS file, so that later I can
> load them into Hive (with the hourly tenth, 6 min, as the partition).
>
>
>
> In order to achieve that, I will need to either
>
>     1. instruct data within the same hourly tenth to the same bolt, or
>
>     2. share the same file writer among all bolts that deal with data within
> the same hourly tenth.
>
>
>
> How can I achieve this? Or is there some other approach to this
> problem?
>
> Thank you very much!
>
> Chen
>
