Spark SQL - JDBC connectivity

2016-08-09 Thread Soni spark
Hi,

I would like to know the steps to connect to Spark SQL from the Spring
framework (web UI), and also how to run and deploy the web application.
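
One common approach is to expose Spark SQL through the Spark Thrift JDBC/ODBC
server (started with sbin/start-thriftserver.sh) and connect with the
HiveServer2 JDBC driver; a Spring web application can then wire the connection
up as an ordinary DataSource. A minimal sketch, assuming the Thrift server is
running on localhost:10000 and a table named src exists (host, port,
credentials, and table name are all placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SparkSqlJdbcClient {
    public static void main(String[] args) throws Exception {
        // The Spark Thrift server speaks the HiveServer2 protocol,
        // so the standard Hive JDBC driver is used.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "user", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT key, value FROM src LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1) + "\t" + rs.getString(2));
            }
        }
    }
}

Deployment then follows the usual Spring path (package the web app as a WAR
and drop it into a servlet container); only the JDBC URL ties it to Spark.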


Unable to Run Spark Streaming Job in Hadoop YARN mode

2016-03-30 Thread Soni spark
Hi All,

I am unable to run a Spark Streaming job in my Hadoop cluster; it is behaving
unexpectedly. When I submit a job, it fails with a socket exception in HDFS;
if I run the same job a second or third time, it runs for some time and then
stops.

I am confused. Is there any configuration in yarn-site.xml specific to Spark?

Please suggest a solution.
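
Generally there is no Spark-specific setting in yarn-site.xml; Spark settings
live in spark-defaults.conf or are passed via spark-submit --conf. What
yarn-site.xml does control is how YARN polices containers, and abrupt job
deaths like this are often YARN killing containers that exceed their memory
allowance. A hedged sketch of one commonly adjusted property (a first
experiment, not a guaranteed fix):

<!-- yarn-site.xml: YARN kills containers that exceed virtual-memory limits
     by default, which can surface as jobs dying after running for a while.
     Raising spark.yarn.executor.memoryOverhead at submit time is the other
     common experiment. -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>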


Issues facing while Running Spark Streaming Job in YARN cluster mode

2016-03-22 Thread Soni spark
Hi,

I am able to run my Spark Streaming job in local mode, but when I try to run
the same job on my YARN cluster, it throws errors.

Any help in this regard is appreciated.

Here are my exception logs:

Exception 1:

java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write.
ch : java.nio.channels.SocketChannel[connected local=/172.16.28.192:50010 remote=/172.16.28.193:46147]
    at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
    at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
    at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:559)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:728)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:496)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
    at java.lang.Thread.run(Thread.java:745)


Exception 2:


2016-03-22 12:17:47,838 WARN org.apache.hadoop.hdfs.BlockReaderFactory: I/O error constructing remote block reader.
java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:658)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)
    at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)
    at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)
    at org.apache.hadoop.hdfs.DFSInputStream.seekToBlockSource(DFSInputStream.java:1460)
    at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:773)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:806)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:847)
    at java.io.DataInputStream.read(DataInputStream.java:100)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:84)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:52)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:265)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2016-03-22 12:17:47,838 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1458629096860_0001_01_01 transitioned from KILLING to DONE
2016-03-22 12:17:47,841 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Removing container_1458629096860_0001_01_01 from application application_1458629096860_0001
2016-03-22 12:17:47,842 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1458629096860_0001
2016-03-22 12:17:47,842 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /node1:50010 for block, add to deadNodes and continue. java.nio.channels.ClosedByInterruptException
java.nio.channels.ClosedByInterruptException


How to Catch Spark Streaming Twitter Exception ( Written Java)

2016-03-14 Thread Soni spark
Dear All,

I am facing a problem with my Spark Twitter streaming code: whenever twitter4j
throws an exception, I am unable to catch it. Could anyone help me catch that
exception?

Here is the pseudo code:
SparkConf sparkConf = new SparkConf().setMaster("local[2]").setAppName("Test");
// SparkConf sparkConf = new SparkConf().setMaster("yarn-client").setAppName("Test");

final int batchIntervalInSec = 60; // long interval to reduce the number of files created

Duration batchInterval = Durations.seconds(batchIntervalInSec);
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, batchInterval);

try {
    // a2 is the twitter4j Authorization, params the filter keywords
    JavaReceiverInputDStream<Status> receiverStream =
            TwitterUtils.createStream(jssc, a2, params);
    JavaDStream<String> tweets = receiverStream.map(
            new Function<Status, String>() {
                public String call(Status tweet) throws Exception {
                    Gson gson = new Gson();
                    return gson.toJson(tweet); // serialize each status to JSON
                }
            });

    jssc.start();
    jssc.awaitTermination();
} catch (Exception e) {
    // twitter4j failures inside the receiver never reach this block;
    // Spark restarts the receiver instead (see the sketch below)
    e.printStackTrace();
}



I have no issue streaming the Twitter data; but whenever the Twitter
credentials expire, I need to catch this exception and apply a workaround
for it.


Here is the exception trace:
INFO  spark.streaming.receiver.BlockGenerator - Started BlockGenerator
INFO  spark.streaming.scheduler.ReceiverTracker - Registered receiver for stream 0 from 172.16.28.183:34829
INFO  spark.streaming.receiver.ReceiverSupervisorImpl - Starting receiver
INFO  spark.streaming.twitter.TwitterReceiver - Twitter receiver started
INFO  spark.streaming.receiver.ReceiverSupervisorImpl - Called receiver onStart
INFO  spark.streaming.receiver.ReceiverSupervisorImpl - Waiting for receiver to be stopped
INFO  twitter4j.TwitterStreamImpl - Establishing connection.
INFO  twitter4j.TwitterStreamImpl - 401:Authentication credentials (https://dev.twitter.com/pages/auth) were missing or incorrect. Ensure that you have set valid consumer key/secret, access token/secret, and the system clock is in sync.
Error 401 Unauthorized
HTTP ERROR: 401
Problem accessing '/1.1/statuses/filter.json'. Reason:
Unauthorized
INFO  twitter4j.TwitterStreamImpl - Waiting for 1 milliseconds
WARN  spark.streaming.receiver.ReceiverSupervisorImpl - Restarting receiver with delay 2000 ms: Error receiving tweets
401:Authentication credentials (https://dev.twitter.com/pages/auth) were missing or incorrect. Ensure that you have set valid consumer key/secret, access token/secret, and the system clock is in sync.
Error 401 Unauthorized
HTTP ERROR: 401
Problem accessing '/1.1/statuses/filter.json'. Reason:
Unauthorized
Relevant discussions can be found on the Internet at:
http://www.google.co.jp/search?q=944a924a or
http://www.google.co.jp/search?q=24fd66dc
TwitterException{exceptionCode=[944a924a-24fd66dc], statusCode=401, message=null, code=-1, retryAfter=-1, rateLimitStatus=null, version=3.0.3}
    at twitter4j.internal.http.HttpClientImpl.request(HttpClientImpl.java:177)
    at twitter4j.internal.http.HttpClientWrapper.request(HttpClientWrapper.java:61)
    at twitter4j.internal.http.HttpClientWrapper.post(HttpClientWrapper.java:98)
    at twitter4j.TwitterStreamImpl.getFilterStream(TwitterStreamImpl.java:304)
    at twitter4j.TwitterStreamImpl$7.getStream(TwitterStreamImpl.java:292)
    at twitter4j.TwitterStreamImpl$TwitterStreamConsumer.run(TwitterStreamImpl.java:462)


Please let me know if you need more clarity on my question.
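
For the underlying question: exceptions thrown inside the receiver (as in the
trace above) are handled by Spark itself, which restarts the receiver, so a
try/catch around jssc.start()/awaitTermination() never sees them. One way to
observe them on the driver is a StreamingListener; a hedged sketch, assuming
Spark 1.6 (implementing the Scala trait from Java requires stubbing every
callback):

import org.apache.spark.streaming.scheduler.*;

// Register with: jssc.ssc().addStreamingListener(new ReceiverErrorListener());
public class ReceiverErrorListener implements StreamingListener {
    @Override
    public void onReceiverError(StreamingListenerReceiverError receiverError) {
        // e.g. log the twitter4j 401, alert, or trigger a controlled shutdown
        System.err.println("Receiver error: "
                + receiverError.receiverInfo().lastErrorMessage());
    }
    // Remaining callbacks of the trait, stubbed out:
    @Override public void onReceiverStarted(StreamingListenerReceiverStarted started) {}
    @Override public void onReceiverStopped(StreamingListenerReceiverStopped stopped) {}
    @Override public void onBatchSubmitted(StreamingListenerBatchSubmitted submitted) {}
    @Override public void onBatchStarted(StreamingListenerBatchStarted started) {}
    @Override public void onBatchCompleted(StreamingListenerBatchCompleted completed) {}
    @Override public void onOutputOperationStarted(StreamingListenerOutputOperationStarted started) {}
    @Override public void onOutputOperationCompleted(StreamingListenerOutputOperationCompleted completed) {}
}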



Thanks,
Sony.


Terminate Spark job in eclipse

2016-03-14 Thread Soni spark
Hi Friends,

Can anyone tell me how to terminate a Spark job in Eclipse using Java code?
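
A hedged sketch of the usual calls; jssc and sc stand for the streaming and
batch contexts the application has already created:

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class JobTerminator {

    // Streaming job: stop gracefully, letting in-flight batches finish,
    // and shut down the underlying SparkContext as well.
    public static void stopStreaming(JavaStreamingContext jssc) {
        jssc.stop(true, true); // (stopSparkContext, stopGracefully)
    }

    // Batch job: cancel the running jobs without tearing the context down,
    // or stop the context entirely.
    public static void cancelBatch(JavaSparkContext sc) {
        sc.cancelAllJobs();
        // sc.stop(); // uncomment to shut the context down completely
    }
}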


Thanks
Soniya


Spark Twitter streaming

2016-03-07 Thread Soni spark
Hello friends,

I need urgent help.

I am using Spark Streaming to fetch tweets from Twitter and load the data
into HDFS. I want to find out the tweet source: whether it is from the web,
mobile web, Facebook, etc. Could you please help me with the logic?
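
A hedged sketch: twitter4j's Status.getSource() returns the posting client
wrapped in an HTML anchor, e.g.
<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>,
so stripping the tags leaves just the client name:

import twitter4j.Status;

public class TweetSource {
    // Returns e.g. "Twitter Web Client", "Twitter for Android", "Facebook".
    public static String sourceName(Status tweet) {
        return tweet.getSource().replaceAll("<[^>]*>", ""); // drop the anchor tags
    }
}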

Thanks
Soniya


Re: spark job submisson on yarn-cluster mode failing

2016-01-21 Thread Soni spark
Hi,

I am now facing the error message below. Please help me.

2016-01-21 16:06:14,123 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /xxx.xx.xx.xx:50010 for block, add to deadNodes and continue. java.nio.channels.ClosedByInterruptException
java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:658)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)
    at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)
    at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)
    at org.apache.hadoop.hdfs.DFSInputStream.seekToBlockSource(DFSInputStream.java:1460)
    at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:773)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:806)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:847)
    at java.io.DataInputStream.read(DataInputStream.java:100)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:84)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:52)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:265)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)


Thanks
Soniya

On Thu, Jan 21, 2016 at 5:42 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Please also check AppMaster log.
>
> Thanks
>
> On Jan 21, 2016, at 3:51 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>
> Can you look in the executor logs and see why the sparkcontext is being
> shutdown? Similar discussion happened here previously.
> http://apache-spark-user-list.1001560.n3.nabble.com/RECEIVED-SIGNAL-15-SIGTERM-td23668.html
>
> Thanks
> Best Regards
>
> On Thu, Jan 21, 2016 at 5:11 PM, Soni spark <soni2015.sp...@gmail.com>
> wrote:
>
>> Hi Friends,
>>
>> I spark job is successfully running on local mode but failing on cluster 
>> mode. Below is the error message i am getting. anyone can help me.
>>
>>
>>
>> 16/01/21 16:38:07 INFO twitter4j.TwitterStreamImpl: Establishing connection.
>> 16/01/21 16:38:07 INFO twitter.TwitterReceiver: Twitter receiver started
>> 16/01/21 16:38:07 INFO receiver.ReceiverSupervisorImpl: Called receiver onStart
>> 16/01/21 16:38:07 INFO receiver.ReceiverSupervisorImpl: Waiting for receiver to be stopped
>> 16/01/21 16:38:10 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
>> 16/01/21 16:38:10 INFO streaming.StreamingContext: Invoking stop(stopGracefully=false) from shutdown hook
>> 16/01/21 16:38:10 INFO scheduler.ReceiverTracker: Sent stop signal to all 1 receivers
>> 16/01/21 16:38:10 INFO receiver.ReceiverSupervisorImpl: Received stop signal
>> 16/01/21 16:38:10 INFO receiver.ReceiverSupervisorImpl: Stopping receiver with message: Stopped by driver:
>> 16/01/21 16:38:10 INFO twitter.TwitterReceiver: Twitter receiver stopped
>> 16/01/21 16:38:10 INFO receiver.ReceiverSupervisorImpl: Called receiver onStop
>> 16/01/21 16:38:10 INFO receiver.ReceiverSupervisorImpl: Deregistering receiver 0
>> 16/01/21 16:38:10 ERROR scheduler.ReceiverTracker: Deregistered receiver for stream 0: Stopped by driver
>> 16/01/21 16:38:10 INFO receiver.ReceiverSupervisorImpl: Stopped receiver 0
>> 16/01/21 16:38:10 INFO receiver.BlockGenerator: Stopping BlockGenerator
>> 16/01/21 16:38:10 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
>>
>> Thanks
>>
>> Soniya
>>
>>
>


spark job submisson on yarn-cluster mode failing

2016-01-21 Thread Soni spark
Hi Friends,

My Spark job runs successfully in local mode but fails in cluster mode.
Below is the error message I am getting. Can anyone help me?



16/01/21 16:38:07 INFO twitter4j.TwitterStreamImpl: Establishing connection.
16/01/21 16:38:07 INFO twitter.TwitterReceiver: Twitter receiver started
16/01/21 16:38:07 INFO receiver.ReceiverSupervisorImpl: Called receiver onStart
16/01/21 16:38:07 INFO receiver.ReceiverSupervisorImpl: Waiting for receiver to be stopped
16/01/21 16:38:10 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
16/01/21 16:38:10 INFO streaming.StreamingContext: Invoking stop(stopGracefully=false) from shutdown hook
16/01/21 16:38:10 INFO scheduler.ReceiverTracker: Sent stop signal to all 1 receivers
16/01/21 16:38:10 INFO receiver.ReceiverSupervisorImpl: Received stop signal
16/01/21 16:38:10 INFO receiver.ReceiverSupervisorImpl: Stopping receiver with message: Stopped by driver:
16/01/21 16:38:10 INFO twitter.TwitterReceiver: Twitter receiver stopped
16/01/21 16:38:10 INFO receiver.ReceiverSupervisorImpl: Called receiver onStop
16/01/21 16:38:10 INFO receiver.ReceiverSupervisorImpl: Deregistering receiver 0
16/01/21 16:38:10 ERROR scheduler.ReceiverTracker: Deregistered receiver for stream 0: Stopped by driver
16/01/21 16:38:10 INFO receiver.ReceiverSupervisorImpl: Stopped receiver 0
16/01/21 16:38:10 INFO receiver.BlockGenerator: Stopping BlockGenerator
16/01/21 16:38:10 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...

Thanks

Soniya


Unable to create hive table using HiveContext

2015-12-23 Thread Soni spark
Hi friends,

I am trying to create a Hive table through Spark with Java code in Eclipse,
using the code below.

HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(sc.sc());
sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");


but I am getting this error:

ERROR XBM0J: Directory /home/workspace4/Test/metastore_db already exists.

I am not sure why the metastore is being created in the workspace. Please help me.
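
With no hive-site.xml on the classpath, HiveContext starts an embedded Derby
metastore in the JVM's working directory, which in Eclipse is the project
directory; Derby then refuses to reopen a stale or locked metastore_db.
Deleting the leftover metastore_db directory clears the immediate error; to
pin the metastore somewhere fixed, a hedged hive-site.xml sketch (the path is
a placeholder):

<!-- hive-site.xml on the application classpath; /tmp path is a placeholder -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/tmp/spark-metastore/metastore_db;create=true</value>
  </property>
</configuration>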

Thanks
Soniya


create hive table in Spark with Java code

2015-12-20 Thread Soni spark
Hi Friends,

I have created a Hive external table with a partition. I want to alter the
Hive table partition through Spark with Java code:

alter table table1
add if not exists
partition(datetime='2015-12-01')
location 'hdfs://localhost:54310/spark/twitter/datetime=2015-12-01/'

I am executing the query above manually; I want to execute it through Spark
with Java code. Please help me.
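
The same statement can be issued through HiveContext.sql(); a minimal sketch,
assuming an existing JavaSparkContext named jsc:

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class AddPartition {
    public static void addPartition(JavaSparkContext jsc) {
        HiveContext hiveContext = new HiveContext(jsc.sc());
        // HiveContext passes DDL straight through to Hive, including ALTER TABLE.
        hiveContext.sql(
            "ALTER TABLE table1 ADD IF NOT EXISTS"
            + " PARTITION (datetime='2015-12-01')"
            + " LOCATION 'hdfs://localhost:54310/spark/twitter/datetime=2015-12-01/'");
    }
}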


Thanks
Soniya


epoch date time problem to load data into in spark

2015-12-08 Thread Soni spark
Hi Friends,

I have written a Spark Streaming program in Java to access Twitter tweets,
and it is working fine. I am able to copy the Twitter feeds to an HDFS
location batch by batch. For each batch, it creates a folder with an epoch
timestamp. For example, if I give the HDFS location as
hdfs://localhost:54310/twitter/, the folders are created like below:

/spark/twitter/-144958080/
/spark/twitter/-144957984/

I want to create folder names in yyyy-MM-dd-HH format instead of the default
epoch format, like below, so that I can create Hive partitions easily to
access the data:

/spark/twitter/2015-12-08-01/

Can anyone help me? Thank you so much in advance.
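
One way, as a hedged sketch: replace saveAsTextFiles (which appends the batch
epoch to the prefix) with foreachRDD, which hands over the batch Time, and
format the directory name yourself. Here tweets stands for the
JavaDStream<String> the job produces, and the batch epoch is still appended
below the hourly folder so that batches within the same hour do not collide:

import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.streaming.Time;
import org.apache.spark.streaming.api.java.JavaDStream;

public class HourlyTweetSink {
    public static void save(JavaDStream<String> tweets) {
        tweets.foreachRDD(new Function2<JavaRDD<String>, Time, Void>() {
            public Void call(JavaRDD<String> rdd, Time time) {
                // Created inside call() so the formatter is never serialized.
                SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd-HH");
                // e.g. /spark/twitter/2015-12-08-01/1449540000000
                String dir = "hdfs://localhost:54310/spark/twitter/"
                        + fmt.format(new Date(time.milliseconds()))
                        + "/" + time.milliseconds();
                if (rdd.count() > 0) { // skip empty batches
                    rdd.saveAsTextFile(dir);
                }
                return null;
            }
        });
    }
}

Hive can then be pointed at the hourly folder as the partition location.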


Thanks
Soniya


Spark twitter streaming in Java

2015-11-18 Thread Soni spark
Dear Friends,

I am struggling with Spark Twitter streaming: I am not getting any data.
Please correct the code below if you find any mistakes.

import org.apache.spark.*;
import org.apache.spark.api.java.function.*;
import org.apache.spark.streaming.*;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.twitter.*;
import twitter4j.Status;

public class SparkTwitterStreaming {

    public static void main(String[] args) {

        final String consumerKey = "XXX";
        final String consumerSecret = "XX";
        final String accessToken = "XX";
        final String accessTokenSecret = "XXX";
        SparkConf conf = new SparkConf().setMaster("local[2]")
                .setAppName("SparkTwitterStreaming");
        JavaStreamingContext jssc = new JavaStreamingContext(conf,
                new Duration(60000)); // 60-second batch interval
        System.setProperty("twitter4j.oauth.consumerKey", consumerKey);
        System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret);
        System.setProperty("twitter4j.oauth.accessToken", accessToken);
        System.setProperty("twitter4j.oauth.accessTokenSecret",
                accessTokenSecret);
        String[] filters = new String[] { "Narendra Modi" };
        JavaReceiverInputDStream<Status> twitterStream =
                TwitterUtils.createStream(jssc, filters);

        // Output the text of all tweets matching the filter
        JavaDStream<String> statuses = twitterStream.map(
                new Function<Status, String>() {
                    public String call(Status status) {
                        return status.getText();
                    }
                });
        statuses.print();
        statuses.dstream().saveAsTextFiles("/home/apache/tweets", "txt");

        // Without these two calls the context never runs, which would
        // explain receiving no data at all.
        jssc.start();
        jssc.awaitTermination();
    }
}