Spark Streaming + Kafka + Hive: delayed

2017-09-20 Thread toletum
Hello.

I have a Python process that reads a Kafka topic and, for each record, checks whether its id exists in a table.

# Load the lookup table into memory and cache it so repeated filters reuse it
table = sqlContext.sql("select id from table")
table.cache()

def processForeach(time, rdd):
    print(time)
    for k in rdd.collect():
        if table.filter("id = '%s'" % k["id"]).count() > 0:
            print(k)

kafkaTopic.foreachRDD(processForeach)

The problem is that, little by little, Spark's batch time lags further behind; I can see it in the "print(time)" output. The Kafka topic carries at most 3 messages per second.
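A likely cause is that each record triggers its own Spark job on the driver (table.filter(...).count()), so every micro-batch takes longer than the batch interval and the stream falls further behind. Below is a minimal sketch of an alternative, assuming the lookup table is small enough to collect and broadcast; the names sc, sqlContext and kafkaTopic are taken from the snippet above.

# Sketch: broadcast the id set once, then filter each micro-batch on the
# executors instead of running one Spark job per record on the driver.
ids = set(row["id"] for row in sqlContext.sql("select id from table").collect())
ids_bc = sc.broadcast(ids)

def processForeach(time, rdd):
    print(time)
    # records are assumed to be dict-like with an "id" field, as in the
    # original loop
    for k in rdd.filter(lambda r: r["id"] in ids_bc.value).collect():
        print(k)

kafkaTopic.foreachRDD(processForeach)

If the table is too large to broadcast, joining each batch (converted to a DataFrame) with the cached table would be the usual alternative.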


Re[4]: Trying to connect Spark 1.6 to Hive

2017-08-09 Thread toletum
Yes... I know... but
The cluster is not administered by me
On Wed., Aug. 9, 2017 at 13:46, Gourav Sengupta wrote:
Hi,

Just out of sheer curiosity - why are you using Spark 1.6? Since then Spark has made significant advancements and improvements; why not take advantage of that?
Regards,
Gourav
On Wed, Aug 9, 2017 at 10:41 AM, toletum  wrote:

Thanks Matteo
I fixed it
Regards,
JCS
On Wed., Aug. 9, 2017 at 11:22, Matteo Cossu wrote:

Hello,
try to use these options when starting Spark:
--conf "spark.driver.userClassPathFirst=true" --conf "spark.executor.userClassPathFirst=true"

This way you can be sure that both the driver and the executors of Spark use the classpath you define.

Best Regards,
Matteo Cossu
On 5 August 2017 at 23:04, toletum  wrote:
Hi everybody
I'm trying to connect Spark to Hive. 
Hive uses Derby Server for metastore_db. 
$SPARK_HOME/conf/hive-site.xml:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://derby:1527/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.ClientDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
I have copied derby.jar, derbyclient.jar and derbytools.jar to $SPARK_HOME/lib, and added the three jars to CLASSPATH too:
$SPARK_HOME/lib/derby.jar:$SPARK_HOME/lib/derbytools.jar:$SPARK_HOME/lib/derbyclient.jar
But spark-sql says:
org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The specified datastore driver ("org.apache.derby.jdbc.ClientDriver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
Plain java does find the class; running it directly fails only because it has no main method:
java org.apache.derby.jdbc.ClientDriver
Error: Main method not found in class org.apache.derby.jdbc.ClientDriver, please define the main method as:
   public static void main(String[] args)
or a JavaFX application class must extend javafx.application.Application
It seems Spark can't find the driver
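One way to check, from PySpark, whether the driver JVM can see the Derby class at all, and from which jar it is loaded, is a py4j lookup. This is only a diagnostic sketch and assumes an active SparkContext named sc:

# Diagnostic sketch (assumes a running PySpark SparkContext `sc`): resolve the
# Derby client driver through the driver JVM and print which jar it came from.
# A Py4JJavaError wrapping ClassNotFoundException means the driver JVM really
# cannot see the jar on its classpath.
cls = sc._jvm.java.lang.Class.forName("org.apache.derby.jdbc.ClientDriver")
print(cls.getProtectionDomain().getCodeSource().getLocation().toString())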


Re[2]: Trying to connect Spark 1.6 to Hive

2017-08-09 Thread toletum
Thanks Matteo
I fixed it
Regards,
JCS
On Wed., Aug. 9, 2017 at 11:22, Matteo Cossu wrote: Hello,
try to use these options when starting Spark:
--conf "spark.driver.userClassPathFirst=true" --conf "spark.executor.userClassPathFirst=true"

This way you can be sure that both the driver and the executors of Spark use the classpath you define.

Best Regards,
Matteo Cossu
On 5 August 2017 at 23:04, toletum  wrote:
Hi everybody
I'm trying to connect Spark to Hive. 
Hive uses Derby Server for metastore_db. 
$SPARK_HOME/conf/hive-site.xml:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://derby:1527/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.ClientDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
I have copied derby.jar, derbyclient.jar and derbytools.jar to $SPARK_HOME/lib, and added the three jars to CLASSPATH too:
$SPARK_HOME/lib/derby.jar:$SPARK_HOME/lib/derbytools.jar:$SPARK_HOME/lib/derbyclient.jar
But spark-sql says:
org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The specified datastore driver ("org.apache.derby.jdbc.ClientDriver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
Plain java does find the class; running it directly fails only because it has no main method:
java org.apache.derby.jdbc.ClientDriver
Error: Main method not found in class org.apache.derby.jdbc.ClientDriver, please define the main method as:
   public static void main(String[] args)
or a JavaFX application class must extend javafx.application.Application
It seems Spark can't find the driver


Trying to connect Spark 1.6 to Hive

2017-08-05 Thread toletum
Hi everybody
I'm trying to connect Spark to Hive. 
Hive uses Derby Server for metastore_db. 
$SPARK_HOME/conf/hive-site.xml:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://derby:1527/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.ClientDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
I have copied derby.jar, derbyclient.jar and derbytools.jar to $SPARK_HOME/lib, and added the three jars to CLASSPATH too:
$SPARK_HOME/lib/derby.jar:$SPARK_HOME/lib/derbytools.jar:$SPARK_HOME/lib/derbyclient.jar
But spark-sql says:
org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The specified datastore driver ("org.apache.derby.jdbc.ClientDriver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
Plain java does find the class; running it directly fails only because it has no main method:
java org.apache.derby.jdbc.ClientDriver
Error: Main method not found in class org.apache.derby.jdbc.ClientDriver, please define the main method as:
   public static void main(String[] args)
or a JavaFX application class must extend javafx.application.Application
It seems Spark can't find the driver
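Once the Derby client jars are actually visible to Spark, a short PySpark smoke test can confirm that a HiveContext reaches the remote metastore configured in hive-site.xml. This is only a minimal sketch for the Spark 1.6 API; the application name is arbitrary.

# Minimal Spark 1.6 sketch: if conf/hive-site.xml and the Derby client jars are
# picked up, listing the databases should hit the remote Derby-backed metastore
# instead of creating a local metastore_db.
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

conf = SparkConf().setAppName("hive-metastore-smoke-test")
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)
sqlContext.sql("show databases").show()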