RE: Real time data processing on Hadoop WITHOUT Java

Huang, Roger Thu, 17 Jul 2014 15:08:57 -0700

Hi Saurabh,
In general, Hadoop MapReduce is not a good fit for real-time processing; it is
designed for batch processing.
If you don't want to use Java, Hadoop Streaming lets you write MapReduce jobs in
any language that can read from standard input and write to standard output.
http://wiki.apache.org/hadoop/HadoopStreaming
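
To make the Hadoop Streaming idea concrete, here is a rough word-count sketch in
Python. The file name wordcount.py and the map/reduce command-line switch are my
own choices for illustration, not anything Hadoop requires; the only contract is
"read lines on stdin, write tab-separated key/value lines on stdout".

```python
#!/usr/bin/env python3
"""Word count for Hadoop Streaming (illustrative sketch).

Run as `wordcount.py map` for the map step or `wordcount.py reduce`
for the reduce step. The file name and the switch are assumptions.
"""
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts for each key.

    Relies on the records arriving sorted by key, which is what the
    shuffle/sort phase provides between the map and reduce steps.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) == 2:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

You can try it locally with no cluster at all:

    cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce

On a cluster the invocation would look roughly like the following; the jar
location and the HDFS paths are placeholders that depend on your installation:

    hadoop jar /path/to/hadoop-streaming.jar -files wordcount.py \
        -input /path/in -output /path/out \
        -mapper "python3 wordcount.py map" \
        -reducer "python3 wordcount.py reduce"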


For real-time analytics, use Storm.  You can implement Storm spouts and bolts in
a non-JVM language via the multilang protocol, which exchanges JSON messages
with your process over stdin/stdout.
http://storm.incubator.apache.org/documentation/Using-non-JVM-languages-with-Storm.html
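
To give a feel for what the multilang protocol involves, here is a minimal
hand-rolled sketch of a sentence-splitting bolt in Python. In practice you would
normally use the storm.py helper module that ships with Storm rather than write
this yourself. The function names below (read_message, send_message, run_bolt)
are mine, and the sketch assumes the protocol as I understand it: each message
is JSON followed by a line containing only "end", and the process must write a
PID file and report its pid during the initial handshake. Check the page above
for details this sketch leaves out.

```python
import json
import os
import sys


def read_message(stream):
    """Read lines up to a line containing only 'end' and parse them as JSON."""
    lines = []
    for line in stream:
        if line.strip() == "end":
            return json.loads("".join(lines))
        lines.append(line)
    return None  # stream closed before a complete message arrived


def send_message(message, stream):
    """Write one JSON message followed by the 'end' delimiter line."""
    stream.write(json.dumps(message) + "\nend\n")
    stream.flush()


def run_bolt(stdin, stdout):
    """Handshake with Storm, then split each incoming sentence into words."""
    setup = read_message(stdin)
    if setup is None:
        return
    # Handshake: create an empty file named after our pid, then report the pid.
    pid = os.getpid()
    open(os.path.join(setup["pidDir"], str(pid)), "w").close()
    send_message({"pid": pid}, stdout)

    while True:
        tup = read_message(stdin)
        if tup is None:
            break
        if not isinstance(tup, dict) or "tuple" not in tup:
            continue  # skip non-tuple messages, e.g. task-id lists sent after an emit
        for word in tup["tuple"][0].split():
            # Anchor each emitted word to the input tuple, then ack the input.
            send_message(
                {"command": "emit", "anchors": [tup["id"]], "tuple": [word]},
                stdout,
            )
        send_message({"command": "ack", "id": tup["id"]}, stdout)
```

To use this as the bolt's entry point, the script would end with a call to
run_bolt(sys.stdin, sys.stdout). Note that the topology definition that wires
the script in is still declared on the JVM side.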

For low-latency SQL (HiveQL-style) queries, look at Impala or Spark SQL.

-R
-----Original Message-----
From: Db-Blog [mailto:[email protected]] 
Sent: Thursday, July 17, 2014 4:03 PM
To: [email protected]
Subject: Real time data processing on Hadoop WITHOUT Java

Hello Experts, 

I am new to real-time processing on Hadoop, and to Storm as well. I checked the
implementation details in the Storm documentation, but it all seems to be Java
code. I'm a database guy and haven't worked with Java before.

I have worked with Hive, Impala, and Pig for batch processing, and with Oozie
for orchestration. I have shell scripting experience for automation and have
done some beginner-level tasks on NoSQL (HBase).

It would be really helpful if you could guide me regarding:
- Any alternative tool that can be used for real-time data processing on
Hadoop, given my technical experience
- Where to start learning the Java essentials for Hadoop
- Whether there is any SQL dialect available for real-time data processing on
big data

Thanks in advance for your time on this. 

Regards,
Saurabh

Sent from my iPhone, please avoid typos.
