Re: Issue with Spark Twitter Streaming
Hey, did you manage to solve the problem? I have exactly the same problem and have not been able to solve it.

Thank you!
Spark-twitter Streaming
Hello everyone,

This is in reference to Spark Twitter streaming.

    val stream = TwitterUtils.createStream(ssc, None)

Can anybody tell me why this DStream is not a proper JSON object? I am not able to parse it further.

spark-version = 2.0.1
spark-api = scala
streaming jar = org.apache.bahir

Regards,
Deependra
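`TwitterUtils.createStream` yields a DStream of twitter4j `Status` objects rather than JSON strings, so fields are read through getters such as `getId()` and `getText()` instead of being parsed. If JSON text is needed downstream, one option is to build it from those fields. The following is a minimal sketch in plain Java under that assumption; `TweetJson`, `escape`, and `toJson` are hypothetical names, and in a real job a JSON library would normally replace the hand-written escaping.

```java
// Hypothetical helper for illustration; not part of Spark or twitter4j.
class TweetJson {
    // Escapes the characters that JSON string literals require to be escaped.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Builds one JSON object from fields that would come from Status getters,
    // e.g. status.getId(), status.getUser().getScreenName(), status.getText().
    static String toJson(long id, String user, String text) {
        return "{\"id\":" + id
                + ",\"user\":\"" + escape(user) + "\""
                + ",\"text\":\"" + escape(text) + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(toJson(1L, "alice", "say \"hi\""));
    }
}
```

Inside the streaming job, a call like this would sit in a `map` over the stream, turning each `Status` into one JSON line.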
Re: Spark Twitter streaming
Do you mean listening to the Twitter stream data? Maybe you can use the Twitter Stream API or the Twitter Search API for this purpose.

Imre

On Tue, Mar 8, 2016 at 2:54 PM, Soni spark wrote:
> Hello friends,
>
> I urgently need help.
>
> I am using Spark Streaming to get tweets from Twitter and load the
> data into HDFS. I want to find out the source of each tweet: whether it
> came from the web, the mobile web, Facebook, etc. Could you please help
> me with the logic?
>
> Thanks,
> Soniya
Spark Twitter streaming
Hello friends,

I urgently need help.

I am using Spark Streaming to get tweets from Twitter and load the data into HDFS. I want to find out the source of each tweet: whether it came from the web, the mobile web, Facebook, etc. Could you please help me with the logic?

Thanks,
Soniya
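For the tweet-source question: twitter4j's `Status` exposes `getSource()`, which returns the posting client as an HTML anchor string rather than a bare name. The following is a minimal sketch of pulling the readable label out of such a string, assuming that anchor form; `TweetSource` and `label` are hypothetical names, and the sample string is only illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Spark or twitter4j.
class TweetSource {
    // Matches <a ...>label</a> and captures the label text.
    private static final Pattern ANCHOR =
            Pattern.compile("<a\\b[^>]*>(.*?)</a>", Pattern.CASE_INSENSITIVE);

    // Returns the text inside the anchor, or the trimmed input if no anchor is found.
    static String label(String source) {
        if (source == null) {
            return "";
        }
        Matcher m = ANCHOR.matcher(source);
        return m.find() ? m.group(1).trim() : source.trim();
    }

    public static void main(String[] args) {
        String sample =
                "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>";
        System.out.println(label(sample));
    }
}
```

In the streaming job, the helper would be applied to `status.getSource()` inside a `map`, and the resulting labels could then be grouped into buckets such as web or mobile before writing to HDFS.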
Re: Spark twitter streaming in Java
Hi Soni,

I think you need to start the JavaStreamingContext. Add something like this at the end of your program:

    jssc.start();
    jssc.awaitTermination(6);
    jssc.stop();

- Yogesh

On Thu, Nov 19, 2015 at 12:34 PM, Soni spark <soni2015.sp...@gmail.com> wrote:
> Dear Friends,
>
> I am struggling with Spark Twitter streaming. I am not getting any data.
> Please correct the code below if you find any mistakes.
>
> import org.apache.spark.*;
> import org.apache.spark.api.java.function.*;
> import org.apache.spark.streaming.*;
> import org.apache.spark.streaming.api.java.*;
> import org.apache.spark.streaming.twitter.*;
> import twitter4j.GeoLocation;
> import twitter4j.Status;
> import java.util.Arrays;
> import scala.Tuple2;
>
> public class SparkTwitterStreaming {
>
>     public static void main(String[] args) {
>
>         final String consumerKey = "XXX";
>         final String consumerSecret = "XX";
>         final String accessToken = "XX";
>         final String accessTokenSecret = "XXX";
>         SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("SparkTwitterStreaming");
>         JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(6));
>         System.setProperty("twitter4j.oauth.consumerKey", consumerKey);
>         System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret);
>         System.setProperty("twitter4j.oauth.accessToken", accessToken);
>         System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret);
>         String[] filters = new String[] {"Narendra Modi"};
>         JavaReceiverInputDStream twitterStream = TwitterUtils.createStream(jssc,filters);
>
>         // Without filter: Output text of all tweets
>         JavaDStream statuses = twitterStream.map(
>             new Function<Status, String>() {
>                 public String call(Status status) { return status.getText(); }
>             }
>         );
>         statuses.print();
>         statuses.dstream().saveAsTextFiles("/home/apache/tweets", "txt");
>
>     }
>
> }
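A note on units, since both `awaitTermination(6)` in the reply and `new Duration(6)` in the quoted program pass a bare 6: Spark's `Duration` constructor and the timed `awaitTermination` overload take milliseconds, so 6 means 6 ms, not 6 seconds. The following is a minimal sketch of the conversion in plain Java; `StreamingIntervals` and `secondsToMillis` are hypothetical names for illustration, not Spark API.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of Spark.
class StreamingIntervals {
    // Converts a length in seconds to the millisecond value Spark's Duration expects.
    static long secondsToMillis(long seconds) {
        return TimeUnit.SECONDS.toMillis(seconds);
    }

    public static void main(String[] args) {
        long batchMillis = secondsToMillis(6);
        System.out.println(batchMillis);
        // In the streaming program this value would replace the bare 6, e.g.
        //   new JavaStreamingContext(conf, new Duration(batchMillis));
    }
}
```

Recent Spark versions also provide `Durations.seconds(6)` in `org.apache.spark.streaming.Durations`, which expresses the same interval directly. Calling `jssc.awaitTermination()` with no timeout keeps the job running until it is stopped, which is usually what a long-running receiver needs.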
Spark twitter streaming in Java
Dear Friends,

I am struggling with Spark Twitter streaming. I am not getting any data. Please correct the code below if you find any mistakes.

    import org.apache.spark.*;
    import org.apache.spark.api.java.function.*;
    import org.apache.spark.streaming.*;
    import org.apache.spark.streaming.api.java.*;
    import org.apache.spark.streaming.twitter.*;
    import twitter4j.GeoLocation;
    import twitter4j.Status;
    import java.util.Arrays;
    import scala.Tuple2;

    public class SparkTwitterStreaming {

        public static void main(String[] args) {

            final String consumerKey = "XXX";
            final String consumerSecret = "XX";
            final String accessToken = "XX";
            final String accessTokenSecret = "XXX";
            SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("SparkTwitterStreaming");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(6));
            System.setProperty("twitter4j.oauth.consumerKey", consumerKey);
            System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret);
            System.setProperty("twitter4j.oauth.accessToken", accessToken);
            System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret);
            String[] filters = new String[] {"Narendra Modi"};
            JavaReceiverInputDStream twitterStream = TwitterUtils.createStream(jssc,filters);

            // Without filter: Output text of all tweets
            JavaDStream statuses = twitterStream.map(
                new Function<Status, String>() {
                    public String call(Status status) { return status.getText(); }
                }
            );
            statuses.print();
            statuses.dstream().saveAsTextFiles("/home/apache/tweets", "txt");

        }

    }
Issue with Spark Twitter Streaming
All,

We are using Spark Streaming to receive data from the Twitter stream. This is running behind a proxy. We have made the following configuration inside Spark Streaming for twitter4j to work behind the proxy.

    def main(args: Array[String]) {
      val filters = Array("Modi")
      System.setProperty("twitter4j.oauth.consumerKey", "*")
      System.setProperty("twitter4j.oauth.consumerSecret", "*")
      System.setProperty("twitter4j.oauth.accessToken", "*")
      System.setProperty("twitter4j.oauth.accessTokenSecret", "*")
      System.setProperty("twitter4j.http.proxyHost", "X.X.X.X");
      System.setProperty("twitter4j.http.proxyPort", "");  // port value omitted in the original message
      System.setProperty("twitter4j.http.useSSL", "true");
      val conf = new SparkConf().setAppName("TwitterPopularTags")
      val ssc = new StreamingContext(conf, Seconds(60))
      val stream = TwitterUtils.createStream(ssc, None, filters)
      stream.print()
      ssc.start()
      ssc.awaitTermination()
    }

    spark-streaming-twitter_2.10-1.1.0
    twitter4j-core-3.0.3.jar
    twitter4j-stream-3.0.3.jar

When the Spark job is run with local[2] on a single node, not on a cluster, it is able to pull the data with the settings above and works like a charm behind the proxy.

When the same code is run on a cluster (below) on the same network with the settings above, it throws the error below. We are not sure what is going wrong. Any help is appreciated.

We checked the executors' environment variables, and all the system properties above are set.
bin/spark-submit --class SparkTwitter2Kafka --master spark://IPADDRESS:7077 spark-twitter.jar

14/10/13 14:00:10 ERROR scheduler.ReceiverTracker: Deregistered receiver for stream 0: Restarting receiver with delay 2000ms: Error receiving tweets - connect timed out
Relevant discussions can be found on the Internet at:
    http://www.google.co.jp/search?q=944a924a or
    http://www.google.co.jp/search?q=24fd66dc
TwitterException{exceptionCode=[944a924a-24fd66dc 944a924a-24fd66b2], statusCode=-1, message=null, code=-1, retryAfter=-1, rateLimitStatus=null, version=3.0.5}
    at twitter4j.internal.http.HttpClientImpl.request(HttpClientImpl.java:177)
    at twitter4j.internal.http.HttpClientWrapper.request(HttpClientWrapper.java:61)
    at twitter4j.internal.http.HttpClientWrapper.post(HttpClientWrapper.java:98)
    at twitter4j.TwitterStreamImpl.getFilterStream(TwitterStreamImpl.java:304)
    at twitter4j.TwitterStreamImpl$7.getStream(TwitterStreamImpl.java:292)
    at twitter4j.TwitterStreamImpl$TwitterStreamConsumer.run(TwitterStreamImpl.java:462)
Caused by: java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:618)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
    at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:275)
    at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:371)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1091)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:250)
    at twitter4j.internal.http.HttpClientImpl.request(HttpClientImpl.java:135)
    ... 5 more
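One thing that may be worth ruling out: in local[2] mode the driver and the receiver share one JVM, so properties set with `System.setProperty` in `main` are visible to twitter4j. On a cluster the receiver runs inside an executor JVM, and properties set in the driver's `main` are not automatically set there. The message says the executor properties were checked, so this may not be the cause here. Still, a common way to make sure is to pass the proxy settings as `-D` flags through Spark's `spark.executor.extraJavaOptions` setting. The following is a minimal sketch of building that option string in plain Java; `ExecutorJavaOptions` and `toJavaOptions` are hypothetical names, and the host and port values are placeholders, not values from the post.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of Spark or twitter4j.
class ExecutorJavaOptions {
    // Renders each property as -Dkey=value, separated by spaces.
    // Values containing spaces are not handled in this sketch.
    static String toJavaOptions(Map<String, String> props) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Map.Entry<String, String> e : props.entrySet()) {
            joiner.add("-D" + e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the flags in insertion order.
        Map<String, String> proxy = new LinkedHashMap<>();
        proxy.put("twitter4j.http.proxyHost", "proxy.example.com"); // placeholder host
        proxy.put("twitter4j.http.proxyPort", "8080");              // placeholder port
        System.out.println(toJavaOptions(proxy));
    }
}
```

The resulting string would be set on the `SparkConf` under the key `spark.executor.extraJavaOptions` before the `StreamingContext` is created, so that each executor JVM starts with the proxy properties already in place.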