A fun project I did to learn some of the pieces:

- Python and Beautiful Soup to crawl whatever you want (I did stock forums)
- write it to Kafka
- then Flume to S3
- then Spark to read in the data and make pretty graphs using seaborn in Jupyter/Zeppelin
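
For what it's worth, the first two steps (crawl a page, turn each post into a Kafka message) might look roughly like the sketch below. To keep it runnable without extra installs it uses the standard library's html.parser as a stand-in for Beautiful Soup, and it only builds the message bytes rather than talking to a broker. The sample HTML, the "post"/"ticker" class names, and the "forum-posts" topic are all made up for illustration; a real forum's markup will differ.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample page; real forum markup will look different.
SAMPLE_HTML = """
<div class="post"><span class="ticker">ACME</span><p>Great quarter, buying more.</p></div>
<div class="post"><span class="ticker">ACME</span><p>Terrible guidance, selling.</p></div>
"""


class PostParser(HTMLParser):
    """Collects the ticker and text of each <div class="post">."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._field = None  # field that the current text node belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "post":
            self.posts.append({"ticker": "", "text": ""})
        elif tag == "span" and attrs.get("class") == "ticker":
            self._field = "ticker"
        elif tag == "p":
            self._field = "text"

    def handle_endtag(self, tag):
        if tag in ("span", "p"):
            self._field = None

    def handle_data(self, data):
        # Ignore text outside the fields we care about.
        if self._field and self.posts:
            self.posts[-1][self._field] += data.strip()


def extract_posts(html):
    """Parse the page and return a list of {"ticker", "text"} dicts."""
    parser = PostParser()
    parser.feed(html)
    return parser.posts


def to_messages(posts):
    """Serialize each post as UTF-8 JSON bytes, one message per post."""
    return [json.dumps(p, sort_keys=True).encode("utf-8") for p in posts]


messages = to_messages(extract_posts(SAMPLE_HTML))
for m in messages:
    # With the kafka-python client this step would be roughly:
    #   producer.send("forum-posts", m)   # topic name is an assumption
    print(m.decode("utf-8"))
```

One JSON object per message keeps the downstream Spark job simple, since it can read the files that land in S3 as line-delimited JSON.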

I was trying to score sentiment and graph it vs. price. You could also just pull in the Twitter sample stream and do word counts, etc.

It really depends on what you want to learn, of course. Just pick something you find interesting and work through the steps. It would be best to pick a goal, then research which tech is relevant for each piece of your pipeline.

RE: interviews, you probably don't want to be interviewing for big data jobs with no experience. If it's a junior role and you're already a solid dev, it makes sense. But interviewing for, or worse yet landing, a job where you're expected to manage any of these pieces after only playing with them in tutorials would be bad for both parties.

> On Jan 12, 2017, at 7:22 AM, Jeya Vimalan <[email protected]> wrote:
>
> Dear All,
>
> Apologies for naive questions.
>
> I am learning hadoop myself and have sql background 3+ Years.
>
> 1) I need your suggestions getting focused on the topics
> and tutorials to become an expert.
>
> 2) Any suggestions for working on real time project
>
> 3) Also, in this regard, would you mind sharing me some important questions
> that have been asked on your interviews, that help me to
> get prepared.
>
> I am going through all the weblinks and tutorials,
> but your answers may trim me the best.
>
> Look forward hearing from you.
> Thanks in advance.
>
> Thanks and best regards,
> Vimal
