What about concurrent access (read/update) to a small file with the same key?
That can get a bit tricky.

On Thu, Sep 3, 2015 at 2:47 PM, Jörn Franke <jornfra...@gmail.com> wrote:
> Well, it is the same as in normal HDFS: deleting the file and putting a new
> one with the same name works.
>
> On Thu, 3 Sep 2015 at 21:18, <nib...@free.fr> wrote:
>
>> A HAR archive seems a good idea, but one last question to be sure I make
>> the best choice:
>> - Is it possible to override (remove/replace) a file inside the HAR?
>> Basically, the names of my small files will be the keys of my records,
>> and sometimes I will need to replace the content of a file with new
>> content (remove/replace).
>>
>> Thanks a lot
>> Nicolas
>>
>> ----- Original Message -----
>> From: "Jörn Franke" <jornfra...@gmail.com>
>> To: nib...@free.fr
>> Cc: user@spark.apache.org
>> Sent: Thursday, September 3, 2015 19:29:42
>> Subject: Re: Small File to HDFS
>>
>> HAR is transparent and adds hardly any performance overhead. You may
>> decide not to compress, or to use a fast compression algorithm such as
>> Snappy (recommended).
>>
>> On Thu, 3 Sep 2015 at 16:17, <nib...@free.fr> wrote:
>>
>> My main question in the case of HAR usage is: is it possible to use Pig
>> on it, and what about performance?
>>
>> ----- Original Message -----
>> From: "Jörn Franke" <jornfra...@gmail.com>
>> To: nib...@free.fr, user@spark.apache.org
>> Sent: Thursday, September 3, 2015 15:54:42
>> Subject: Re: Small File to HDFS
>>
>> Store them as a Hadoop archive (HAR).
>>
>> On Wed, 2 Sep 2015 at 18:07, <nib...@free.fr> wrote:
>>
>> Hello,
>> I am currently using Spark Streaming to collect small messages (events),
>> each under 50 KB, at high volume (several million per day), and I have
>> to store those messages in HDFS.
>> I understand that storing many small files can be problematic in HDFS;
>> how can I manage this?
>> Thanks
>> Nicolas
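For reference, packing a directory of small files into a HAR and reading it back might look like the sketch below. The paths and archive name are made up for illustration; the `hadoop archive` tool and the `har://` URI scheme are standard Hadoop.

```shell
# Pack everything under /user/nicolas/events into a single archive.
# -p sets the parent path; the last argument is the output directory.
hadoop archive -archiveName events.har -p /user/nicolas/events /user/nicolas/archives

# The archive is browsable through the har:// filesystem scheme,
# so tools like Pig can read it like a normal directory.
hdfs dfs -ls har:///user/nicolas/archives/events.har
```

Since HAR creation runs as a MapReduce job, it is usually done periodically (e.g. once a day over the previous day's small files) rather than per message.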
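The delete-and-put replacement Jörn describes applies to plain HDFS paths; a rough sketch (with a hypothetical key and path) would be:

```shell
# Replace the record stored under key "order-12345":
# remove the old file, then upload the new content under the same name.
hdfs dfs -rm /user/nicolas/events/order-12345
hdfs dfs -put order-12345.json /user/nicolas/events/order-12345
```

Note that a .har archive itself is immutable once written, so replacing a member of an existing HAR generally means regenerating the archive; the delete/put cycle only works on files that have not yet been archived.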