OK, but I have a few questions:
- Sometimes I have to remove some messages from HDFS (cancel/replace cases).
Is that possible?
- In the case of one big zip file, is it easy to run Pig on it
directly?

Tks
Nicolas

----- Original Message -----
From: "Tao Lu" <[email protected]>
To: [email protected]
Cc: "Ted Yu" <[email protected]>, "user" <[email protected]>
Sent: Wednesday, 2 September 2015 19:09:23
Subject: Re: Small File to HDFS


You may consider storing them in one big HDFS file and appending new
messages to it.


For instance, 
one message -> zip it -> append it to the HDFS as one line 
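
The pipeline above could be sketched roughly as follows in Python. This is only an illustration under several assumptions: gzip stands in for "zip", the compressed bytes are Base64-encoded so that a record cannot contain a newline, and a local file stands in for the HDFS file (the real append would go through whatever HDFS client you use). The function names are hypothetical.

```python
import base64
import gzip


def encode_message(message: str) -> str:
    """Compress one message and return it as a single newline-free line."""
    compressed = gzip.compress(message.encode("utf-8"))
    # Raw gzip bytes may contain b"\n", so Base64 keeps one record per line.
    return base64.b64encode(compressed).decode("ascii")


def decode_line(line: str) -> str:
    """Reverse encode_message for one line read back from the file."""
    return gzip.decompress(base64.b64decode(line.strip())).decode("utf-8")


def append_message(path: str, message: str) -> None:
    # Assumption: a local file opened in append mode stands in for HDFS here.
    with open(path, "a", encoding="ascii") as f:
        f.write(encode_message(message) + "\n")


def read_messages(path: str) -> list:
    """Read every record back, skipping blank lines."""
    with open(path, encoding="ascii") as f:
        return [decode_line(line) for line in f if line.strip()]
```

Base64 adds roughly a third to the compressed size; a length-prefixed binary container would avoid that, at the cost of no longer being line-oriented.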


On Wed, Sep 2, 2015 at 12:43 PM, < [email protected] > wrote: 


Hi,
I already store them in MongoDB in parallel for operational access, and I don't
want to add another database to the loop.
Is that the only solution?

Tks 
Nicolas 

----- Original Message -----
From: "Ted Yu" < [email protected] >
To: [email protected]
Cc: "user" < [email protected] >
Sent: Wednesday, 2 September 2015 18:34:17
Subject: Re: Small File to HDFS




Instead of storing those messages in HDFS, have you considered storing them in
a key-value store (e.g. HBase)?


Cheers 


On Wed, Sep 2, 2015 at 9:07 AM, < [email protected] > wrote: 


Hello,
I am currently using Spark Streaming to collect small messages (events). Each is
under 50 KB, the volume is high (several million per day), and I have to store
those messages in HDFS.
I understand that storing small files can be problematic in HDFS. How can I
manage this?

Tks 
Nicolas 

--------------------------------------------------------------------- 
To unsubscribe, e-mail: [email protected] 
For additional commands, e-mail: [email protected] 








-- 


Thanks!
Tao
