I would recommend Hadoop only if you are ingesting a lot of data and need reasonable performance at scale. Otherwise, start with <insert language/tool of choice> to ingest and transform the data, and keep doing that until the process starts taking too long.
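To give a sense of what I mean by "start simple", a single streaming pass like the sketch below is often all it takes. The file name, column names, and aggregation are made-up placeholders, not anything specific to our setup:

    import csv

    # Stream a large CSV and compute a per-key total without loading
    # the whole file into memory. "measurements.csv", "station_id",
    # and "reading" are placeholder names -- substitute your own.
    totals = {}

    with open("measurements.csv", newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            key = row["station_id"]
            value = float(row["reading"])
            totals[key] = totals.get(key, 0.0) + value

    for key, total in sorted(totals.items()):
        print(key, total)

If a script like that runs in minutes on a single machine, Hadoop is probably not buying you anything yet.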
For example, one of our researchers at the University of Michigan had to process ~150GB of data. Using Python, processing that data took about 45 minutes, so it was not worth spending extra development time to run it on Hadoop. Naturally, this time will vary depending on what you need to do and the hardware available. So until you need to frequently process large amounts of data, I'd stick with something you're already familiar with.

Alec Ten Harmsel

On 09/04/2014 03:30 AM, Henrik Aagaard Jørgensen wrote:
> Dear all,
>
> I’m very new to Hadoop as I’m still trying to grasp its value and
> purpose. I do hope my question on this mailing list is OK.
>
> I manage our open data platform at our municipality, using CKAN.org.
> It works very well for its purpose of showing data and adding APIs to
> data.
>
> However, I’m very interested in knowing more about Hadoop and whether it
> would fit into an (open) data platform, as we are getting more and more
> data to show and to work with internally at our municipality.
>
> However, I cannot figure out whether Hadoop is the right tool for this
> purpose, or whether it is “overkill”…
>
> Could someone elaborate on this topic?
>
> I’ve Googled around a lot and looked at various videos online, and
> Hadoop seems to have its place, also in an open data platform environment.
>
> Best regards,
>
> Henrik
