I would recommend an object store such as OpenStack Swift as another option.
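As a rough illustration of what that would look like for the photo use case, here is a minimal Python sketch of storing and fetching a single photo through Swift's object REST API. The storage URL, auth token, container and object names are hypothetical placeholders; in a real deployment the token and storage URL come from the Keystone authentication step, and error handling would need to be more robust.

```python
# Minimal sketch (not a drop-in implementation): store and fetch one photo
# through the OpenStack Swift REST API. STORAGE_URL, TOKEN, container and
# object names are placeholders.
import requests

STORAGE_URL = "https://swift.example.com/v1/AUTH_photoarchive"  # placeholder
TOKEN = "gAAAAAB..."                                            # placeholder token
container = "photos-2017"
object_name = "2017/09/04/img_000123.jpg"
headers = {"X-Auth-Token": TOKEN}

# Upload: a simple PUT of the raw bytes. Swift returns the object's MD5 as
# the Etag header, which can be recorded for long-term integrity checks.
with open("img_000123.jpg", "rb") as f:
    put = requests.put(f"{STORAGE_URL}/{container}/{object_name}",
                       headers=headers, data=f)
put.raise_for_status()
print("stored, MD5 Etag:", put.headers.get("Etag"))

# Download a single photo directly; no packing into larger files is required.
get = requests.get(f"{STORAGE_URL}/{container}/{object_name}", headers=headers)
get.raise_for_status()
with open("img_000123_copy.jpg", "wb") as out:
    out.write(get.content)
```

Replication and background checksum auditing are handled by Swift itself, which maps well onto the retention and integrity requirements discussed further down in the thread.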
On Mon, Sep 4, 2017 at 1:09 PM Uwe Geercken <uwe.geerc...@web.de> wrote:

> just my two cents:
>
> Maybe you can use Hadoop for storage and pack multiple files together to
> use HDFS in a smarter way, and at the same time store a limited,
> time-based amount of data/photos in parallel in a different solution. I
> assume you won't need high-performance access to the whole time span.
>
> Yes, it would be a duplication, but maybe - without knowing all the
> details - that would be acceptable and an easy way to go.
>
> Cheers,
>
> Uwe
>
> *Sent:* Monday, 04 September 2017 at 21:32
> *From:* "Alexey Eremihin" <a.eremi...@corp.badoo.com.INVALID>
> *To:* "Ralph Soika" <ralph.so...@imixs.com>
> *Cc:* "user@hadoop.apache.org" <user@hadoop.apache.org>
> *Subject:* Re: Is Hadoop basically not suitable for a photo archive?
>
> Hi Ralph,
>
> In general Hadoop is able to store such data. Even HAR archives can be
> used in conjunction with WebHDFS, by passing offset and length
> attributes (see the sketch below the thread). What are your reading
> requirements? FS metadata is not distributed, and reading the data is
> limited by the performance of the HDFS NameNode, so downloading files
> at a high request rate would not work well.
>
> On Monday, September 4, 2017, Ralph Soika <ralph.so...@imixs.com> wrote:
>>
>> Hi,
>>
>> I know that the small-file problem is asked about frequently, and not
>> only on this mailing list. I have also already read some books about
>> Hadoop and have started working with it. But I still do not really
>> understand whether Hadoop is the right choice for my goals.
>>
>> To simplify my problem domain, I would like to use the use case of a
>> photo archive:
>>
>> - An external application produces about 10 million photos per year.
>>   The files contain important, business-critical data.
>> - A single photo file has a size between 1 and 10 MB.
>> - The photos need to be stored for several years (10-30 years).
>> - The data store should support replication over several servers.
>> - A checksum concept is needed to guarantee the data integrity of all
>>   files over a long period of time.
>> - A REST API is preferred for writing and reading the files.
>>
>> So far Hadoop seems to be absolutely the perfect solution. But my last
>> requirement seems to throw Hadoop out of the race:
>>
>> - The photos need to be readable with very short latency from an
>>   external enterprise application.
>>
>> With Hadoop HDFS and the Web Proxy everything seems perfect. But it
>> seems that most Hadoop experts advise against this usage when the size
>> of the data files (1-10 MB) is well below the Hadoop block size of 64
>> or 128 MB.
>>
>> I think I understand the concepts of HAR and SequenceFiles. But if I
>> pack my files together into a large file of many gigabytes, it becomes
>> impossible to access a single photo from the Hadoop repository in a
>> reasonable time. In my eyes it makes no sense to pack thousands of
>> files into a large file just so that Hadoop jobs can handle them
>> better. For simply accessing a single file from a web interface - as
>> in my case - it all seems counterproductive.
>>
>> So my question is: Is Hadoop only feasible for archiving large
>> web-server log files, and not designed to handle big archives of
>> small files that also contain business-critical data?
>>
>> Thanks for your advice in advance.
>>
>> Ralph
>> --

--
Hayati Gonultas
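To make Alexey's suggestion above more concrete: once the photos have been packed with the `hadoop archive` tool (e.g. `hadoop archive -archiveName photos-2017.har -p /raw/photos /archives`), an individual photo can be fetched over WebHDFS by reading a byte range of the archive's part file. The sketch below is a rough illustration only; the NameNode address and port, the paths, and the offset/length values are placeholders, and in practice the per-photo offset and length would be looked up in the HAR's _index file or kept in an external index such as a small database keyed by photo id.

```python
# Minimal sketch: fetch one photo that has been packed into a Hadoop archive
# by reading a byte range of the HAR's part file over WebHDFS. Host, paths,
# offset and length below are placeholders.
import requests

WEBHDFS = "http://namenode.example.com:9870/webhdfs/v1"  # placeholder NameNode
PART_FILE = "/archives/photos-2017.har/part-0"           # data file inside the HAR
offset = 123_456_789   # byte offset of the photo within part-0 (from the index)
length = 2_345_678     # photo size in bytes (from the index)

# WebHDFS OPEN accepts offset and length parameters, so only the requested
# range is transferred. The NameNode redirects to a DataNode holding the
# data, and requests follows that redirect automatically.
resp = requests.get(
    f"{WEBHDFS}{PART_FILE}",
    params={"op": "OPEN", "offset": offset, "length": length},
)
resp.raise_for_status()
with open("img_000123.jpg", "wb") as out:
    out.write(resp.content)
```

Alexey's caveat still applies: every read goes through the NameNode for metadata, so this pattern suits moderate request rates rather than very high RPS.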