Hi,
You can use the National Climatic Data Center (NCDC) data, which is a good
candidate for Hadoop.
Below are the steps to download the data.
1. Create a folder on your local drive.
I created "/home/sujit/Desktop/Data/".
2. Create the script below and run it:
for i in {1901..2012}
do
cd /home/sujit/Desktop/Data/
wget -r --no-parent --reject "index.html*" http://ftp3.ncdc.noaa.gov/pub/data/noaa/$i/
done
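
If it helps, the same loop could also be written as a small script with a dry-run switch, so you can check the URLs before downloading anything. This is only a sketch: DATA_DIR, BASE_URL, START_YEAR, END_YEAR, DRY_RUN, year_url and fetch_year are names made up for illustration, and it assumes wget and seq are installed and that the server still uses the same per-year directory layout.

```shell
#!/usr/bin/env bash
# Sketch only: all variable and function names here are illustrative assumptions.
set -eu

DATA_DIR="${DATA_DIR:-./Data}"                       # assumed target folder; change as needed
BASE_URL="http://ftp3.ncdc.noaa.gov/pub/data/noaa"   # base URL taken from the loop above
START_YEAR="${START_YEAR:-1901}"
END_YEAR="${END_YEAR:-2012}"
DRY_RUN="${DRY_RUN:-1}"                              # 1 = only print commands, 0 = download

# Build the directory URL for one year.
year_url() {
  printf '%s/%s/\n' "$BASE_URL" "$1"
}

# Download one year of data, or just print the command in dry-run mode.
fetch_year() {
  local url
  url="$(year_url "$1")"
  if [ "$DRY_RUN" = "1" ]; then
    echo "wget -r --no-parent --reject 'index.html*' -P $DATA_DIR $url"
  else
    mkdir -p "$DATA_DIR"
    # -P (--directory-prefix) saves files under DATA_DIR, so no cd is needed.
    wget -r --no-parent --reject "index.html*" -P "$DATA_DIR" "$url"
  fi
}

for year in $(seq "$START_YEAR" "$END_YEAR"); do
  fetch_year "$year"
done
```

By default this only prints the wget commands. Running it with DRY_RUN=0 starts the real download, and setting START_YEAR and END_YEAR to the same year is a quick way to try a single year first.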
Kind Regards
Sujit Dhamale
(+91 9970086652)
On Sat, Dec 8, 2012 at 4:05 AM, Mohammad Tariq <[email protected]> wrote:
> Hello Yin,
>
> You may find this interesting :
> https://github.com/unitedstates
>
> Regards,
> Mohammad Tariq
>
>
>
> On Sat, Dec 8, 2012 at 3:25 AM, Chris Nauroth <[email protected]> wrote:
>
>> Another suggestion is Google Books Ngrams:
>>
>> http://storage.googleapis.com/books/ngrams/books/datasetsv2.html
>>
>>
>> On Fri, Dec 7, 2012 at 7:57 AM, Phillip Rhodes <[email protected]> wrote:
>>
>>> On Fri, Dec 7, 2012 at 10:48 AM, Harsh J <[email protected]> wrote:
>>> >
>>> > On Fri, Dec 7, 2012 at 8:31 PM, Yin Steve <[email protected]> wrote:
>>> >> Hello, I'm Steve who need some raw big data for studying mapreduce
>>> >> programming. Where can i find them? especially those about weblog, traffic
>>> >> info etc. My English is not so well, if you can give me a URL which directly
>>> >> help me download the big file, That'll be great.
>>> >> Waiting for your reply......
>>>
>>> Try some of the links off of this Quora thread:
>>>
>>>
>>> http://www.quora.com/Data/Where-can-I-find-large-datasets-for-modeling-confidence-during-the-financial-crisis-which-is-open-to-the-public
>>>
>>> You might also try googling "Enron corpus". Or check out
>>> CommonCrawl.org.
>>>
>>>
>>> Phil
>>>
>>
>>
>