[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/3246 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3246#issuecomment-73262238 Good for me. Let's close this pr. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3246#issuecomment-73252849 Since the JIRA is a duplicate of 2 others, can I suggest we close this and try to converge on SPARK-2788? not suggesting its PR is the one to merge, but at least we can centralize the discussion in one place. See also https://github.com/apache/spark/pull/1717 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3246#issuecomment-73161950 @tdas Ah, that's great. Thanks for let me know that! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3246#issuecomment-73101267 Apologies, I have been really backlogged with review because of the upcoming Spark 1.3 release. The feature merge window for that has closed, so we can consider this for Spark 1.4. Also, there has been multiple attempts to add geolocation filtering to Spark Streaming. In order decide between the different approaches, we will have to do a proper design discussion with a design doc. TD On Wed, Feb 4, 2015 at 10:21 PM, Liang-Chi Hsieh notificati...@github.com wrote: @tdas https://github.com/tdas Can you have a quick look of this pr? Thanks! â Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/3246#issuecomment-73000140. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3246#issuecomment-73000140 @tdas Can you have a quick look of this pr? Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3246#issuecomment-72351744 @srowen Does @tdas have time to look at this? If no, maybe others can? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3246#issuecomment-71946175 @tdas Do you have time to look at this to see if it is ready to merge? Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3246#issuecomment-71433530 @viirya OK by me but @tdas will need to weigh in. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3246#issuecomment-71419585 @srowen Is it ok going to merge this feature? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3246#discussion_r23388400 --- Diff: external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala --- @@ -85,9 +93,14 @@ class TwitterReceiver( } }) - val query = new FilterQuery - if (filters.size 0) { -query.track(filters.toArray) + if (filters.size 0 || locations.size 0) { --- End diff -- Trivial but can this be `.nonEmpty` instead of `.size 0` around here? slightly more readable --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3246#issuecomment-71042483 Hi @srowen, I refactor the codes for the comments. If you have time to review it, that would be great. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3246#issuecomment-71052031 Fixed in new commit. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3246#issuecomment-71050223 I think this is a lot better without all the overloads, yes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3246#discussion_r23388449 --- Diff: external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterUtils.scala --- @@ -53,7 +67,7 @@ object TwitterUtils { * @param jssc JavaStreamingContext object */ def createStream(jssc: JavaStreamingContext): JavaReceiverInputDStream[Status] = { -createStream(jssc.ssc, None) +createStream(jssc.ssc, None) --- End diff -- There's a couple spurious changes adding whitespace at the ends of lines. Not really worth fixing for its own sake but check the diff to see if you can fix it if you make another commit. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3246#issuecomment-68342204 Anyone would like to review this pr? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3246#discussion_r22343345 --- Diff: external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterUtils.scala --- @@ -25,6 +25,91 @@ import org.apache.spark.streaming.api.java.{JavaReceiverInputDStream, JavaDStrea import org.apache.spark.streaming.dstream.{ReceiverInputDStream, DStream} object TwitterUtils { + + // For implicit parameter used to avoid to have same type after erasure + case class Ignore(value: String ) { --- End diff -- This looks like a big hack. Just use different method names if you are trying to avoid type conflict after erasure. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3246#discussion_r22343369 --- Diff: external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala --- @@ -60,6 +61,7 @@ private[streaming] class TwitterReceiver( twitterAuth: Authorization, filters: Seq[String], +locations: Seq[Seq[Double]], --- End diff -- These are supposed to be a bunch of (lat,lon) pairs, right? why not `Seq[(Double,Double)]`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3246#discussion_r22343378 --- Diff: external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala --- @@ -88,6 +90,11 @@ class TwitterReceiver( val query = new FilterQuery if (filters.size 0) { query.track(filters.toArray) + } + if (locations.size 0) { +query.locations(locations.map(_.toArray).toArray) + } + if (filters.size 0 || locations.size 0) { --- End diff -- It seems like this could be rewritten to avoid the redundant checks and create of `FilterQuery` when there is no filtering. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3246#discussion_r22343430 --- Diff: external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterUtils.scala --- @@ -112,20 +269,91 @@ object TwitterUtils { ): JavaReceiverInputDStream[Status] = { createStream(jssc.ssc, Some(twitterAuth), filters) } - + + /** + * Create a input stream that returns tweets received from Twitter. + * Storage level of the data will be the default StorageLevel.MEMORY_AND_DISK_SER_2. + * @param jsscJavaStreamingContext object + * @param twitterAuth Twitter4J Authorization + * @param filters Set of filter strings to get only those tweets that match them + * @param locations Set of longitude, latitude pairs to get only those tweets + *that falling within the requested bounding boxes + */ + def createStream( + jssc: JavaStreamingContext, + twitterAuth: Authorization, + filters: Array[String], + locations: Array[Array[Double]] +): JavaReceiverInputDStream[Status] = { +createStream(jssc.ssc, Some(twitterAuth), filters, locations.map(_.toSeq).toSeq) + } + + /** + * Create a input stream that returns tweets received from Twitter. + * Storage level of the data will be the default StorageLevel.MEMORY_AND_DISK_SER_2. + * @param jsscJavaStreamingContext object + * @param twitterAuth Twitter4J Authorization + * @param locations Set of longitude, latitude pairs to get only those tweets + *that falling within the requested bounding boxes + */ + def createStream( + jssc: JavaStreamingContext, + twitterAuth: Authorization, + locations: Array[Array[Double]] +): JavaReceiverInputDStream[Status] = { +createStream(jssc.ssc, Some(twitterAuth), Nil, locations.map(_.toSeq).toSeq) + } + + /** + * Create a input stream that returns tweets received from Twitter. + * @param jssc JavaStreamingContext object + * @param twitterAuth Twitter4J Authorization object + * @param filters Set of filter strings to get only those tweets that match them + * @param storageLevel Storage level to use for storing the received objects + */ + def createStream( + jssc: JavaStreamingContext, + twitterAuth: Authorization, + filters: Array[String], + storageLevel: StorageLevel +): JavaReceiverInputDStream[Status] = { +createStream(jssc.ssc, Some(twitterAuth), filters, Nil, storageLevel) + } + /** * Create a input stream that returns tweets received from Twitter. * @param jssc JavaStreamingContext object * @param twitterAuth Twitter4J Authorization object * @param filters Set of filter strings to get only those tweets that match them + * @param locations Set of longitude, latitude pairs to get only those tweets + *that falling within the requested bounding boxes * @param storageLevel Storage level to use for storing the received objects */ def createStream( jssc: JavaStreamingContext, twitterAuth: Authorization, filters: Array[String], + locations: Array[Array[Double]], + storageLevel: StorageLevel +): JavaReceiverInputDStream[Status] = { +createStream(jssc.ssc, Some(twitterAuth), filters, locations.map(_.toSeq).toSeq, storageLevel) + } + + /** + * Create a input stream that returns tweets received from Twitter. + * @param jssc JavaStreamingContext object + * @param twitterAuth Twitter4J Authorization object + * @param locations Set of longitude, latitude pairs to get only those tweets + *that falling within the requested bounding boxes + * @param storageLevel Storage level to use for storing the received objects + */ + def createStream( --- End diff -- There are *12* new overloads of `createStream` on top of the existing 4. This seems like big overkill. There should be one version in Java/Scala that takes all arguments, one each that takes minimal arguments, and any others needed to retain binary compatibility. The rest seem superfluous. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/3246#discussion_r22343534 --- Diff: external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala --- @@ -60,6 +61,7 @@ private[streaming] class TwitterReceiver( twitterAuth: Authorization, filters: Seq[String], +locations: Seq[Seq[Double]], --- End diff -- I remember that seems to be for using under both scala and java. So we can simply give a lat, lon pair in array. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3246#discussion_r22344218 --- Diff: external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala --- @@ -60,6 +61,7 @@ private[streaming] class TwitterReceiver( twitterAuth: Authorization, filters: Seq[String], +locations: Seq[Seq[Double]], --- End diff -- There is already a separate set of methods for Java, no? that use `Array` and not things like `Seq`. This is private to the Scala-based Spark code. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/3246#discussion_r22344303 --- Diff: external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala --- @@ -60,6 +61,7 @@ private[streaming] class TwitterReceiver( twitterAuth: Authorization, filters: Seq[String], +locations: Seq[Seq[Double]], --- End diff -- There is. I meant that the main problem is the inconsistency between Scala and Java apis. One use `Seq[(Double,Double)]` and other uses `Array[Array[Double]]`. If it is ok, I would revise it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3246#issuecomment-64071312 Any idea about this PR? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org