Spark SQL - JDBC connectivity
Hi, I would like to know the steps to connect to Spark SQL from the Spring framework (web UI). Also, how do I run and deploy the web application?
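One common way to reach Spark SQL over JDBC from a Java/Spring web app is to start the Spark Thrift Server (sbin/start-thriftserver.sh) and connect with the standard Hive JDBC driver, since the Thrift Server speaks the HiveServer2 protocol. A minimal sketch follows; the host/port and the class and method names are my own assumptions, and it presumes hive-jdbc is on the classpath:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class SparkSqlJdbcClient {

    // Builds a HiveServer2-style JDBC URL; the Spark Thrift Server speaks
    // the same protocol, so the standard Hive JDBC driver can connect to it.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Opens a connection and prints the first column of each row.
    // Requires a running Spark Thrift Server, e.g.:
    //   runQuery(buildUrl("localhost", 10000, "default"), "SELECT 1");
    static void runQuery(String url, String sql) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

The web application itself is built and deployed like any other Spring WAR (e.g. on Tomcat); only the JDBC driver jars need to be bundled, and the Spark cluster side needs no change.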
Unable to Run Spark Streaming Job in Hadoop YARN mode
Hi All, I am unable to run a Spark Streaming job in my Hadoop cluster; it behaves unexpectedly. When I submit a job, it fails with a socket exception from HDFS. If I run the same job a second or third time, it runs for a while and then stops. I am confused. Is there any configuration in yarn-site.xml specific to Spark? Please advise.
Issues faced while running a Spark Streaming job in YARN cluster mode
Hi, I am able to run my Spark Streaming job in local mode, but when I try to run the same job on my YARN cluster, it throws errors. Any help in this regard is appreciated. Here are my exception logs:

Exception 1:

java.net.SocketTimeoutException: 48 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.16.28.192:50010 remote=/172.16.28.193:46147]
    at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
    at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
    at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:559)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:728)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:496)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
    at java.lang.Thread.run(Thread.java:745)

Exception 2:

2016-03-22 12:17:47,838 WARN org.apache.hadoop.hdfs.BlockReaderFactory: I/O error constructing remote block reader.
java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:658)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)
    at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)
    at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)
    at org.apache.hadoop.hdfs.DFSInputStream.seekToBlockSource(DFSInputStream.java:1460)
    at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:773)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:806)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:847)
    at java.io.DataInputStream.read(DataInputStream.java:100)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:84)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:52)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:265)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2016-03-22 12:17:47,838 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1458629096860_0001_01_01 transitioned from KILLING to DONE
2016-03-22 12:17:47,841 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Removing container_1458629096860_0001_01_01 from application application_1458629096860_0001
2016-03-22 12:17:47,842 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1458629096860_0001
2016-03-22 12:17:47,842 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /node1:50010 for block, add to deadNodes and continue. java.nio.channels.ClosedByInterruptException
java.nio.channels.ClosedByInterruptException
How to Catch a Spark Streaming Twitter Exception (Written in Java)
Dear All, I am facing a problem with my Spark Twitter streaming code: whenever twitter4j throws an exception, I am unable to catch it. Could anyone help me catch this exception? Here is the pseudocode:

SparkConf sparkConf = new SparkConf().setMaster("local[2]").setAppName("Test");
// SparkConf sparkConf = new SparkConf().setMaster("yarn-client").setAppName("Test");
// SparkTwitterStreaming sss = new SparkTwitterStreaming();
final int batchIntervalInSec = 60; // choose a long interval to reduce the number of files created
Duration batchInterval = Durations.seconds(batchIntervalInSec);
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, batchInterval);
try {
    JavaReceiverInputDStream<Status> receiverStream =
        TwitterUtils.createStream(jssc, a2, params);
    JavaDStream<String> tweets = receiverStream.map(new Function<Status, String>() {
        public String call(Status tweet) throws Exception {
            Gson gson = new Gson();
            return gson.toJson(tweet);
        }
    });
    jssc.start();
    jssc.awaitTermination();
} catch (Exception e) {
    // never reached when twitter4j fails inside the receiver
}

I have no issue streaming the Twitter data. But whenever the Twitter account credentials expire, I need to catch this exception and apply a workaround. Here is the exception trace:

INFO spark.streaming.receiver.BlockGenerator - Started BlockGenerator
INFO spark.streaming.scheduler.ReceiverTracker - Registered receiver for stream 0 from 172.16.28.183:34829
INFO spark.streaming.receiver.ReceiverSupervisorImpl - Starting receiver
INFO spark.streaming.twitter.TwitterReceiver - Twitter receiver started
INFO spark.streaming.receiver.ReceiverSupervisorImpl - Called receiver onStart
INFO spark.streaming.receiver.ReceiverSupervisorImpl - Waiting for receiver to be stopped
INFO twitter4j.TwitterStreamImpl - Establishing connection.
INFO twitter4j.TwitterStreamImpl - 401:Authentication credentials (https://dev.twitter.com/pages/auth) were missing or incorrect. Ensure that you have set valid consumer key/secret, access token/secret, and the system clock is in sync.
Error 401 Unauthorized: Problem accessing '/1.1/statuses/filter.json'. Reason: Unauthorized
INFO twitter4j.TwitterStreamImpl - Waiting for 1 milliseconds
WARN spark.streaming.receiver.ReceiverSupervisorImpl - Restarting receiver with delay 2000 ms: Error receiving tweets 401:Authentication credentials (https://dev.twitter.com/pages/auth) were missing or incorrect. Ensure that you have set valid consumer key/secret, access token/secret, and the system clock is in sync.
Error 401 Unauthorized: Problem accessing '/1.1/statuses/filter.json'. Reason: Unauthorized
Relevant discussions can be found on the Internet at:
    http://www.google.co.jp/search?q=944a924a
    http://www.google.co.jp/search?q=24fd66dc
TwitterException{exceptionCode=[944a924a-24fd66dc], statusCode=401, message=null, code=-1, retryAfter=-1, rateLimitStatus=null, version=3.0.3}
    at twitter4j.internal.http.HttpClientImpl.request(HttpClientImpl.java:177)
    at twitter4j.internal.http.HttpClientWrapper.request(HttpClientWrapper.java:61)
    at twitter4j.internal.http.HttpClientWrapper.post(HttpClientWrapper.java:98)
    at twitter4j.TwitterStreamImpl.getFilterStream(TwitterStreamImpl.java:304)
    at twitter4j.TwitterStreamImpl$7.getStream(TwitterStreamImpl.java:292)
    at twitter4j.TwitterStreamImpl$TwitterStreamConsumer.run(TwitterStreamImpl.java:462)

Please let me know if you need more clarity on my question. Thanks, Sony.
Terminate a Spark job in Eclipse
Hi Friends, Can anyone help me with how to terminate a Spark job from Eclipse using Java code? Thanks, Soniya
Spark Twitter streaming
Hello friends, I need urgent help. I am using Spark Streaming to get tweets from Twitter and load the data into HDFS. I want to find out the tweet source: whether it is from the web, mobile web, Facebook, etc. Could you please help me with the logic? Thanks, Soniya
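If you are using twitter4j (as the Spark Twitter receiver does), Status.getSource() typically returns the client as an HTML anchor, e.g. <a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>, though it can also be a bare string like "web". A small helper can strip the markup so you can store just the client name alongside the tweet; this is a sketch, and the class and method names are mine:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TweetSource {

    // Matches the text between the closing ">" of the opening <a ...> tag
    // and the "</a>" that ends the anchor.
    private static final Pattern ANCHOR_TEXT = Pattern.compile(">([^<]+)</a>");

    // Extracts the client name from a Status.getSource()-style string.
    // Falls back to returning the input unchanged when it is not an anchor.
    static String clientName(String source) {
        Matcher m = ANCHOR_TEXT.matcher(source);
        return m.find() ? m.group(1) : source;
    }
}
```

Inside the map function you would call clientName(status.getSource()) before writing the record to HDFS.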
Re: spark job submission in yarn-cluster mode failing
Hi, I am facing the error message below now. Please help me.

2016-01-21 16:06:14,123 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /xxx.xx.xx.xx:50010 for block, add to deadNodes and continue. java.nio.channels.ClosedByInterruptException
java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:658)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)
    at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)
    at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)
    at org.apache.hadoop.hdfs.DFSInputStream.seekToBlockSource(DFSInputStream.java:1460)
    at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:773)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:806)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:847)
    at java.io.DataInputStream.read(DataInputStream.java:100)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:84)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:52)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:265)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Thanks
Soniya

On Thu, Jan 21, 2016 at 5:42 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> Please also check the AppMaster log.
>
> Thanks
>
> On Jan 21, 2016, at 3:51 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>
> Can you look in the executor logs and see why the sparkcontext is being
> shut down? A similar discussion happened here previously.
> http://apache-spark-user-list.1001560.n3.nabble.com/RECEIVED-SIGNAL-15-SIGTERM-td23668.html
>
> Thanks
> Best Regards
>
> On Thu, Jan 21, 2016 at 5:11 PM, Soni spark <soni2015.sp...@gmail.com>
> wrote:
>
>> Hi Friends,
>>
>> My Spark job runs successfully in local mode but fails in cluster mode.
>> Below is the error message I am getting. Can anyone help me?
>>
>> 16/01/21 16:38:07 INFO twitter4j.TwitterStreamImpl: Establishing connection.
>> 16/01/21 16:38:07 INFO twitter.TwitterReceiver: Twitter receiver started
>> 16/01/21 16:38:07 INFO receiver.ReceiverSupervisorImpl: Called receiver onStart
>> 16/01/21 16:38:07 INFO receiver.ReceiverSupervisorImpl: Waiting for receiver to be stopped
>> 16/01/21 16:38:10 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
>> 16/01/21 16:38:10 INFO streaming.StreamingContext: Invoking stop(stopGracefully=false) from shutdown hook
>> 16/01/21 16:38:10 INFO scheduler.ReceiverTracker: Sent stop signal to all 1 receivers
>> 16/01/21 16:38:10 INFO receiver.ReceiverSupervisorImpl: Received stop signal
>> 16/01/21 16:38:10 INFO receiver.ReceiverSupervisorImpl: Stopping receiver with message: Stopped by driver:
>> 16/01/21 16:38:10 INFO twitter.TwitterReceiver: Twitter receiver stopped
>> 16/01/21 16:38:10 INFO receiver.ReceiverSupervisorImpl: Called receiver onStop
>> 16/01/21 16:38:10 INFO receiver.ReceiverSupervisorImpl: Deregistering receiver 0
>> 16/01/21 16:38:10 ERROR scheduler.ReceiverTracker: Deregistered receiver for stream 0: Stopped by driver
>> 16/01/21 16:38:10 INFO receiver.ReceiverSupervisorImpl: Stopped receiver 0
>> 16/01/21 16:38:10 INFO receiver.BlockGenerator: Stopping BlockGenerator
>> 16/01/21 16:38:10 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
>>
>> Thanks
>>
>> Soniya
spark job submission in yarn-cluster mode failing
Hi Friends, My Spark job runs successfully in local mode but fails in cluster mode. Below is the error message I am getting. Can anyone help me?

16/01/21 16:38:07 INFO twitter4j.TwitterStreamImpl: Establishing connection.
16/01/21 16:38:07 INFO twitter.TwitterReceiver: Twitter receiver started
16/01/21 16:38:07 INFO receiver.ReceiverSupervisorImpl: Called receiver onStart
16/01/21 16:38:07 INFO receiver.ReceiverSupervisorImpl: Waiting for receiver to be stopped
16/01/21 16:38:10 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
16/01/21 16:38:10 INFO streaming.StreamingContext: Invoking stop(stopGracefully=false) from shutdown hook
16/01/21 16:38:10 INFO scheduler.ReceiverTracker: Sent stop signal to all 1 receivers
16/01/21 16:38:10 INFO receiver.ReceiverSupervisorImpl: Received stop signal
16/01/21 16:38:10 INFO receiver.ReceiverSupervisorImpl: Stopping receiver with message: Stopped by driver:
16/01/21 16:38:10 INFO twitter.TwitterReceiver: Twitter receiver stopped
16/01/21 16:38:10 INFO receiver.ReceiverSupervisorImpl: Called receiver onStop
16/01/21 16:38:10 INFO receiver.ReceiverSupervisorImpl: Deregistering receiver 0
16/01/21 16:38:10 ERROR scheduler.ReceiverTracker: Deregistered receiver for stream 0: Stopped by driver
16/01/21 16:38:10 INFO receiver.ReceiverSupervisorImpl: Stopped receiver 0
16/01/21 16:38:10 INFO receiver.BlockGenerator: Stopping BlockGenerator
16/01/21 16:38:10 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...

Thanks Soniya
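"RECEIVED SIGNAL 15: SIGTERM" in the ApplicationMaster log usually means YARN itself killed the container, most often for exceeding its memory allocation (the NodeManager log will say "killing container" with the reason). One thing worth trying is raising the driver/executor memory and the YARN memory overhead at submit time; the values below are illustrative, not a known fix for this specific job:

```
spark-submit \
  --master yarn-cluster \
  --driver-memory 2g \
  --executor-memory 2g \
  --conf spark.yarn.driver.memoryOverhead=1024 \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  your-streaming-app.jar
```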
Unable to create a Hive table using HiveContext
Hi friends, I am trying to create a Hive table through Spark with Java code in Eclipse, using the code below:

HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(sc.sc());
sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");

but I am getting this error:

ERROR XBM0J: Directory /home/workspace4/Test/metastore_db already exists.

I am not sure why the metastore is being created in the workspace. Please help me. Thanks Soniya
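For context: when HiveContext finds no hive-site.xml on the classpath, it spins up an embedded Derby metastore in the current working directory, which in Eclipse is the project folder; a stale metastore_db directory (or a second HiveContext pointing at it) can then produce the Derby XBM0J error. One workaround, sketched here with example paths, is to put a hive-site.xml on the application classpath that points the metastore at a fixed location:

```
<!-- hive-site.xml on the application classpath; both paths are examples -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/tmp/spark-metastore/metastore_db;create=true</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
```

Deleting the leftover /home/workspace4/Test/metastore_db directory (or its db.lck lock file) between runs may also clear the error.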
create hive table in Spark with Java code
Hi Friends, I have created a Hive external table with a partition. I want to alter the Hive table partition through Spark with Java code:

alter table table1 add if not exists partition (datetime='2015-12-01')
location 'hdfs://localhost:54310/spark/twitter/datetime=2015-12-01/'

I am currently executing the query above manually; I want to execute it through Spark with Java code. Please help me. Thanks Soniya
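The manual DDL above can be issued from Java by passing the same string to HiveContext.sql(...). A sketch of building that statement follows; the table, date, and HDFS path are the ones from the post, while the class and method names are mine:

```java
public class HivePartitionDdl {

    // Builds an ALTER TABLE ... ADD PARTITION statement like the one above,
    // so it can be passed to HiveContext from Java, e.g.:
    //   sqlContext.sql(addPartition("table1", "2015-12-01",
    //       "hdfs://localhost:54310/spark/twitter/datetime=2015-12-01/"));
    static String addPartition(String table, String datetime, String location) {
        return "ALTER TABLE " + table
             + " ADD IF NOT EXISTS PARTITION (datetime='" + datetime + "')"
             + " LOCATION '" + location + "'";
    }
}
```

Note that ADD PARTITION does not quote or escape its arguments here, so only pass trusted values.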
Epoch date-time problem when loading data into Spark
Hi Friends, I have written a Spark Streaming program in Java to access Twitter tweets, and it is working fine. I can copy the Twitter feeds to an HDFS location batch by batch. For each batch, it creates a folder with an epoch timestamp. For example, if I give the HDFS location as hdfs://localhost:54310/twitter/, the folders are created like below:

/spark/twitter/-144958080/
/spark/twitter/-144957984/

I want to create folder names in yyyy-MM-dd-HH format instead of the default epoch format, like below, so that I can do Hive partitioning easily to access the data:

/spark/twitter/2015-12-08-01/

Can anyone help me? Thank you so much in advance. Thanks Soniya
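Since saveAsTextFiles always appends the batch's epoch timestamp, the usual workaround is to write each batch yourself from foreachRDD and build the folder path from the batch time. A sketch of the path-building part (the class and method names are mine):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class BatchFolder {

    // Turns a batch time in epoch milliseconds into the yyyy-MM-dd-HH
    // folder layout wanted for Hive partitions. Inside the streaming job
    // this would be called from foreachRDD with the batch's Time, e.g.:
    //   tweets.foreachRDD(new Function2<JavaRDD<String>, Time, Void>() {
    //       public Void call(JavaRDD<String> rdd, Time time) {
    //           rdd.saveAsTextFile(folderFor("/spark/twitter", time.milliseconds()));
    //           return null;
    //       }
    //   });
    static String folderFor(String base, long epochMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd-HH");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return base + "/" + fmt.format(new Date(epochMillis));
    }
}
```

Using UTC (or a fixed cluster time zone) keeps folder names unambiguous for the Hive partition scheme.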
Spark twitter streaming in Java
Dear Friends, I am struggling with Spark Twitter streaming: I am not getting any data. Please correct the code below if you find any mistakes.

import org.apache.spark.*;
import org.apache.spark.api.java.function.*;
import org.apache.spark.streaming.*;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.twitter.*;
import twitter4j.GeoLocation;
import twitter4j.Status;
import java.util.Arrays;
import scala.Tuple2;

public class SparkTwitterStreaming {

    public static void main(String[] args) {
        final String consumerKey = "XXX";
        final String consumerSecret = "XX";
        final String accessToken = "XX";
        final String accessTokenSecret = "XXX";
        SparkConf conf = new SparkConf().setMaster("local[2]")
                .setAppName("SparkTwitterStreaming");
        // Duration is in milliseconds; the original new Duration(6) gave a
        // 6 ms batch interval, which is almost certainly not what was meant
        JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(60000));
        System.setProperty("twitter4j.oauth.consumerKey", consumerKey);
        System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret);
        System.setProperty("twitter4j.oauth.accessToken", accessToken);
        System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret);

        String[] filters = new String[] { "Narendra Modi" };
        JavaReceiverInputDStream<Status> twitterStream =
                TwitterUtils.createStream(jssc, filters);

        // Output the text of all matching tweets
        JavaDStream<String> statuses = twitterStream.map(
                new Function<Status, String>() {
                    public String call(Status status) {
                        return status.getText();
                    }
                });
        statuses.print();
        statuses.dstream().saveAsTextFiles("/home/apache/tweets", "txt");

        // Without these two calls the streaming job never starts, which is
        // the most likely reason no data was arriving
        jssc.start();
        jssc.awaitTermination();
    }
}