I would recommend an object store such as OpenStack Swift as another option.
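As a rough illustration of what that would look like for the photo use case, here is a minimal Python sketch of storing and fetching a single photo through Swift's object REST API. The storage URL, auth token, container and object names are hypothetical placeholders; in a real deployment the token and storage URL come from the Keystone authentication step, and error handling would need to be more robust.

```python
# Minimal sketch (not a drop-in implementation): store and fetch one photo
# through the OpenStack Swift REST API. STORAGE_URL, TOKEN, container and
# object names are placeholders.
import requests

STORAGE_URL = "https://swift.example.com/v1/AUTH_photoarchive"  # placeholder
TOKEN = "gAAAAAB..."                                            # placeholder token
container = "photos-2017"
object_name = "2017/09/04/img_000123.jpg"
headers = {"X-Auth-Token": TOKEN}

# Upload: a simple PUT of the raw bytes. Swift returns the object's MD5 as
# the Etag header, which can be recorded for long-term integrity checks.
with open("img_000123.jpg", "rb") as f:
    put = requests.put(f"{STORAGE_URL}/{container}/{object_name}",
                       headers=headers, data=f)
put.raise_for_status()
print("stored, MD5 Etag:", put.headers.get("Etag"))

# Download a single photo directly; no packing into larger files is required.
get = requests.get(f"{STORAGE_URL}/{container}/{object_name}", headers=headers)
get.raise_for_status()
with open("img_000123_copy.jpg", "wb") as out:
    out.write(get.content)
```

Replication and background checksum auditing are handled by Swift itself, which maps well onto the retention and integrity requirements discussed further down in the thread.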
On Mon, Sep 4, 2017 at 1:09 PM Uwe Geercken <uwe.geerc...@web.de> wrote:

> just my two cents:
>
> Maybe you can use Hadoop for storage and pack multiple files together to
> use HDFS in a smarter way, and at the same time store a limited,
> time-based amount of data/photos in parallel in a different solution. I
> assume you won't need high-performance access to the whole time span.
>
> Yes, it would be a duplication, but maybe - without knowing all the
> details - that would be acceptable and an easy way to go.
>
> Cheers,
>
> Uwe
>
> *Sent:* Monday, 04 September 2017 at 21:32
> *From:* "Alexey Eremihin" <a.eremi...@corp.badoo.com.INVALID>
> *To:* "Ralph Soika" <ralph.so...@imixs.com>
> *Cc:* "user@hadoop.apache.org" <user@hadoop.apache.org>
> *Subject:* Re: Is Hadoop basically not suitable for a photo archive?
>
> Hi Ralph,
>
> In general Hadoop is able to store such data. Even HAR archives can be
> used in conjunction with WebHDFS, by passing offset and length
> attributes (see the sketch below the thread). What are your reading
> requirements? FS metadata is not distributed, and reading the data is
> limited by the performance of the HDFS NameNode, so downloading files
> at a high request rate would not work well.
>
> On Monday, September 4, 2017, Ralph Soika <ralph.so...@imixs.com> wrote:
>>
>> Hi,
>>
>> I know that the small-file problem is asked about frequently, and not
>> only on this mailing list. I have also already read some books about
>> Hadoop and have started working with it. But I still do not really
>> understand whether Hadoop is the right choice for my goals.
>>
>> To simplify my problem domain, I would like to use the use case of a
>> photo archive:
>>
>> - An external application produces about 10 million photos per year.
>>   The files contain important, business-critical data.
>> - A single photo file has a size between 1 and 10 MB.
>> - The photos need to be stored for several years (10-30 years).
>> - The data store should support replication over several servers.
>> - A checksum concept is needed to guarantee the data integrity of all
>>   files over a long period of time.
>> - A REST API is preferred for writing and reading the files.
>>
>> So far Hadoop seems to be absolutely the perfect solution. But my last
>> requirement seems to throw Hadoop out of the race:
>>
>> - The photos need to be readable with very short latency from an
>>   external enterprise application.
>>
>> With Hadoop HDFS and the Web Proxy everything seems perfect. But it
>> seems that most Hadoop experts advise against this usage when the size
>> of the data files (1-10 MB) is well below the Hadoop block size of 64
>> or 128 MB.
>>
>> I think I understand the concepts of HAR and SequenceFiles. But if I
>> pack my files together into a large file of many gigabytes, it becomes
>> impossible to access a single photo from the Hadoop repository in a
>> reasonable time. In my eyes it makes no sense to pack thousands of
>> files into a large file just so that Hadoop jobs can handle them
>> better. For simply accessing a single file from a web interface - as
>> in my case - it all seems counterproductive.
>>
>> So my question is: Is Hadoop only feasible for archiving large
>> web-server log files, and not designed to handle big archives of
>> small files that also contain business-critical data?
>>
>> Thanks for your advice in advance.
>>
>> Ralph
>> --

--
Hayati Gonultas
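To make Alexey's suggestion above more concrete: once the photos have been packed with the `hadoop archive` tool (e.g. `hadoop archive -archiveName photos-2017.har -p /raw/photos /archives`), an individual photo can be fetched over WebHDFS by reading a byte range of the archive's part file. The sketch below is a rough illustration only; the NameNode address and port, the paths, and the offset/length values are placeholders, and in practice the per-photo offset and length would be looked up in the HAR's _index file or kept in an external index such as a small database keyed by photo id.

```python
# Minimal sketch: fetch one photo that has been packed into a Hadoop archive
# by reading a byte range of the HAR's part file over WebHDFS. Host, paths,
# offset and length below are placeholders.
import requests

WEBHDFS = "http://namenode.example.com:9870/webhdfs/v1"  # placeholder NameNode
PART_FILE = "/archives/photos-2017.har/part-0"           # data file inside the HAR
offset = 123_456_789   # byte offset of the photo within part-0 (from the index)
length = 2_345_678     # photo size in bytes (from the index)

# WebHDFS OPEN accepts offset and length parameters, so only the requested
# range is transferred. The NameNode redirects to a DataNode holding the
# data, and requests follows that redirect automatically.
resp = requests.get(
    f"{WEBHDFS}{PART_FILE}",
    params={"op": "OPEN", "offset": offset, "length": length},
)
resp.raise_for_status()
with open("img_000123.jpg", "wb") as out:
    out.write(resp.content)
```

Alexey's caveat still applies: every read goes through the NameNode for metadata, so this pattern suits moderate request rates rather than very high RPS.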