Hi,
I am trying to use Spring Data Hadoop with CDH4 to write a MapReduce job.
On startup, I get the following exception:
Exception in thread "SimpleAsyncTaskExecutor-1" java.lang.ExceptionInInitializerError
    at org.springframework.data.hadoop.mapreduce.JobExecutor$2.run(JobExecutor.java:183)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NullPointerException
    at org.springframework.util.ReflectionUtils.makeAccessible(ReflectionUtils.java:405)
    at org.springframework.data.hadoop.mapreduce.JobUtils.<clinit>(JobUtils.java:123)
    ... 2 more
I suspect there is a problem with my Hadoop-related dependencies. I couldn't find
any reference showing how to configure Spring Data Hadoop together with CDH4, but
Costin's build plan shows that it can be done:
https://build.springsource.org/browse/SPRINGDATAHADOOP-CDH4-JOB1
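To illustrate how I read the trace: the NullPointerException is raised inside a static initializer (`<clinit>`) and therefore surfaces as an ExceptionInInitializerError. The sketch below reproduces that pattern with plain `java.lang.reflect`, under the assumption that a field looked up by name is simply missing in the library version on the classpath. I have not checked what `JobUtils` actually does at line 123, so `VersionedJob`, `ReflectiveUtils`, and the field name `info` are hypothetical stand-ins, not Spring or Hadoop code.

```java
import java.lang.reflect.Field;

public class InitFailureDemo {

    // Hypothetical stand-in for a library class whose private fields
    // differ between releases. This "release" has no field named "info".
    static class VersionedJob {
        private String status = "NEW";
    }

    // Hypothetical stand-in for a utility class that reflects on
    // VersionedJob from a static initializer.
    static class ReflectiveUtils {
        static final Field INFO_FIELD;

        static {
            // The lookup yields null because the field does not exist.
            Field found = findField(VersionedJob.class, "info");
            // Dereferencing null throws NullPointerException inside <clinit>.
            found.setAccessible(true);
            INFO_FIELD = found;
        }

        // Any first use of the class triggers the static initializer.
        static void touch() {
        }
    }

    // Returns the declared field, or null when the class has no such field.
    static Field findField(Class<?> type, String name) {
        try {
            return type.getDeclaredField(name);
        } catch (NoSuchFieldException e) {
            return null;
        }
    }

    private static Throwable cachedCause;

    // Forces initialization of ReflectiveUtils once and returns the root cause.
    // The result is cached because a second use of a class whose initializer
    // failed raises NoClassDefFoundError instead.
    static synchronized Throwable triggerInit() {
        if (cachedCause == null) {
            try {
                ReflectiveUtils.touch();
            } catch (ExceptionInInitializerError e) {
                cachedCause = e.getCause();
            }
        }
        return cachedCause;
    }

    public static void main(String[] args) {
        System.out.println("Root cause: " + triggerInit());
    }
}
```

If this reading is right, the error would point to a mismatch between the Hadoop classes on my classpath and the ones Spring Data Hadoop expects, rather than to the application context itself.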
**Maven Setup**
<properties>
    <spring.hadoop.version>1.0.0.BUILD-SNAPSHOT</spring.hadoop.version>
    <hadoop.version>2.0.0-cdh4.1.3</hadoop.version>
</properties>
<dependencies>
    ...
    <dependency>
        <groupId>org.springframework.data</groupId>
        <artifactId>spring-data-hadoop</artifactId>
        <version>${spring.hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-streaming</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-test</artifactId>
        <version>2.0.0-mr1-cdh4.1.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-tools</artifactId>
        <version>2.0.0-mr1-cdh4.1.3</version>
    </dependency>
    ...
</dependencies>
...
<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
    <repository>
        <id>spring-snapshot</id>
        <name>Spring Maven SNAPSHOT Repository</name>
        <url>http://repo.springframework.org/snapshot</url>
    </repository>
</repositories>
**Application Context**
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:hdp="http://www.springframework.org/schema/hadoop"
    xmlns:context="http://www.springframework.org/schema/context"
    xmlns:hadoop="http://www.springframework.org/schema/hadoop"
    xsi:schemaLocation="
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/hadoop
        http://www.springframework.org/schema/hadoop/spring-hadoop.xsd
        http://www.springframework.org/schema/context/spring-context.xsd
        http://www.springframework.org/schema/integration
        http://www.springframework.org/schema/context
        http://www.springframework.org/schema/context/spring-context-3.1.xsd">

    <context:property-placeholder location="classpath:hadoop.properties" />

    <hdp:configuration id="hadoopConfiguration">
        fs.defaultFS=${hd.fs}
    </hdp:configuration>

    <hdp:job id="wordCountJob" input-path="${input.path}"
        output-path="${output.path}" mapper="com.example.WordMapper"
        reducer="com.example.WordReducer" />

    <hdp:job-runner job-ref="wordCountJob" run-at-startup="true"
        wait-for-completion="true"/>
</beans>
**Cluster version**
Hadoop 2.0.0-cdh4.1.3
**Note:**
This small unit test runs fine with the current configuration:
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = { "classpath:/applicationContext.xml" })
public class Starter {

    @Autowired
    private Configuration configuration;

    @Test
    public void shellOps() {
        Assert.assertNotNull(this.configuration);
        FsShell fsShell = new FsShell(this.configuration);
        final Collection<FileStatus> coll = fsShell.ls("/user");
        System.out.println(coll);
    }
}
It would be great if someone could provide an example configuration.
Best Regards,
Christian.