I am running Hadoop 2.9.1 and doing a reduce-side join in which the reduce
function performs the local join using SQL, but I am getting this error
(with MySQL):

java.sql.SQLException: No suitable driver found for
jdbc:mysql://localhost:3306/acm_ex

It comes from this line of code:

Connection connection =
DriverManager.getConnection("jdbc:mysql://localhost:3306/acm_ex", "root",
"root");

*Each computer on the cluster has MySQL installed with the database acm_ex.


I have a Maven project that declares the following JDBC dependencies:

   <dependency>
       <groupId>mysql</groupId>
       <artifactId>mysql-connector-java</artifactId>
       <version>5.1.39</version>
   </dependency>

   <dependency>
       <groupId>com.microsoft.sqlserver</groupId>
       <artifactId>mssql-jdbc</artifactId>
       <version>7.0.0.jre8</version>
   </dependency>


I compile the project, build a jar from it, and try to run it with the
following reduce function:

public void reduce(TextPair key, Iterable<Text> values, Context context)
    throws IOException, InterruptedException
{
  try { Class.forName("com.mysql.jdbc.Driver").newInstance(); }
  catch (Exception e) { System.out.println(e.toString()); }

  try {
    Connection connection =
        DriverManager.getConnection("jdbc:mysql://localhost:3306/acm_ex", "root", "root");

    Statement statement =
        connection.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE);

    LOG.info("SQL-  connection: " + connection + " statement: " + statement);

    //create 3 tables names
    .
    .
    .
  } //try
}//reduce


The code for the reduce function works perfectly when I run it locally in
Eclipse (user and password are both "root"), but it fails when the same code
runs inside Hadoop's reduce task.

I have tried adding the mysql-connector-java jar to the classpath, even
though Maven has already done that, and it did not help.
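
To see whether the driver is actually visible inside the reduce task's JVM
(as opposed to the JVM Eclipse starts), a small diagnostic along these lines
could be called from the reducer and its output logged. This is only a
sketch: the class DriverCheck and its method names are made up for
illustration and are not part of Hadoop or the MySQL connector.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Hadoop or Connector/J.
final class DriverCheck {

    // Returns true if the named class can be loaded by this class's loader.
    static boolean isClassLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Returns the class names of the JDBC drivers that DriverManager
    // makes visible to the calling code.
    static List<String> registeredDrivers() {
        List<String> names = new ArrayList<>();
        for (Driver driver : Collections.list(DriverManager.getDrivers())) {
            names.add(driver.getClass().getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("com.mysql.jdbc.Driver loadable: "
                + isClassLoadable("com.mysql.jdbc.Driver"));
        System.out.println("Registered JDBC drivers: " + registeredDrivers());
    }
}
```

If the class turns out not to be loadable in the task, that would suggest the
connector jar is not reaching the task's classpath, rather than a problem
with MySQL itself.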

I am not sure whether this is a permissions issue for the reduce container
on port 3306, a Maven problem, or even a hostname problem.
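
The port and hostname possibilities could be checked separately from JDBC
with a plain TCP connection attempt made from inside the task. The sketch
below assumes nothing beyond the JDK; PortCheck and canConnect are
hypothetical names, and the 2000 ms timeout is an arbitrary choice.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for testing plain TCP reachability.
final class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("localhost:3306 reachable: "
                + canConnect("localhost", 3306, 2000));
    }
}
```

A successful connection here only shows that something is listening on the
port; it does not verify the MySQL credentials or the database name.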

Does anyone know how to solve this particular issue, or know of another way
to do a reduce-side join with SQL? (I am familiar with MySQL, but I can
switch if you believe it makes a difference.)

*Using Hive or a map-side join is not an option, and doing naive for loops
works but is of course not as fast as SQL.
