[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2015-02-06 Thread viirya
Github user viirya closed the pull request at:

https://github.com/apache/spark/pull/3246


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2015-02-06 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/3246#issuecomment-73262238
  
Good for me. Let's close this pr.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2015-02-06 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3246#issuecomment-73252849
  
Since the JIRA is a duplicate of 2 others, can I suggest we close this and 
try to converge on SPARK-2788? not suggesting its PR is the one to merge, but 
at least we can centralize the discussion in one place. See also 
https://github.com/apache/spark/pull/1717


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2015-02-05 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/3246#issuecomment-73161950
  
@tdas Ah, that's great. Thanks for let me know that! 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2015-02-05 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/3246#issuecomment-73101267
  
Apologies, I have been really backlogged with review because of the
upcoming Spark 1.3 release. The feature merge window for that has closed,
so we can consider this for Spark 1.4.

Also, there has been multiple attempts to add geolocation filtering to
Spark Streaming. In order decide between the different approaches, we will
have to do a proper design discussion with a design doc.

TD

On Wed, Feb 4, 2015 at 10:21 PM, Liang-Chi Hsieh notificati...@github.com
wrote:

 @tdas https://github.com/tdas Can you have a quick look of this pr?
 Thanks!

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/3246#issuecomment-73000140.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2015-02-04 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/3246#issuecomment-73000140
  
@tdas Can you have a quick look of this pr? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2015-01-31 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/3246#issuecomment-72351744
  
@srowen Does @tdas have time to look at this? If no, maybe others can?






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2015-01-28 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/3246#issuecomment-71946175
  
@tdas Do you have time to look at this to see if it is ready to merge? 
Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2015-01-26 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3246#issuecomment-71433530
  
@viirya OK by me but @tdas will need to weigh in.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2015-01-25 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/3246#issuecomment-71419585
  
@srowen Is it ok going to merge this feature?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2015-01-22 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3246#discussion_r23388400
  
--- Diff: 
external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala
 ---
@@ -85,9 +93,14 @@ class TwitterReceiver(
 }
   })
 
-  val query = new FilterQuery
-  if (filters.size  0) {
-query.track(filters.toArray)
+  if (filters.size  0 || locations.size  0) {
--- End diff --

Trivial but can this be `.nonEmpty` instead of `.size  0` around here? 
slightly more readable


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2015-01-22 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/3246#issuecomment-71042483
  
Hi @srowen, I refactor the codes for the comments. If you have time to 
review it, that would be great. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2015-01-22 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/3246#issuecomment-71052031
  
Fixed in new commit. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2015-01-22 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3246#issuecomment-71050223
  
I think this is a lot better without all the overloads, yes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2015-01-22 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3246#discussion_r23388449
  
--- Diff: 
external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterUtils.scala
 ---
@@ -53,7 +67,7 @@ object TwitterUtils {
* @param jssc   JavaStreamingContext object
*/
   def createStream(jssc: JavaStreamingContext): 
JavaReceiverInputDStream[Status] = {
-createStream(jssc.ssc, None)
+createStream(jssc.ssc, None) 
--- End diff --

There's a couple spurious changes adding whitespace at the ends of lines. 
Not really worth fixing for its own sake but check the diff to see if you can 
fix it if you make another commit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2014-12-30 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/3246#issuecomment-68342204
  
Anyone would like to review this pr?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2014-12-30 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3246#discussion_r22343345
  
--- Diff: 
external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterUtils.scala
 ---
@@ -25,6 +25,91 @@ import 
org.apache.spark.streaming.api.java.{JavaReceiverInputDStream, JavaDStrea
 import org.apache.spark.streaming.dstream.{ReceiverInputDStream, DStream}
 
 object TwitterUtils {
+
+  // For implicit parameter used to avoid to have same type after erasure
+  case class Ignore(value: String ) {
--- End diff --

This looks like a big hack. Just use different method names if you are 
trying to avoid type conflict after erasure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2014-12-30 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3246#discussion_r22343369
  
--- Diff: 
external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala
 ---
@@ -60,6 +61,7 @@ private[streaming]
 class TwitterReceiver(
 twitterAuth: Authorization,
 filters: Seq[String],
+locations: Seq[Seq[Double]],
--- End diff --

These are supposed to be a bunch of (lat,lon) pairs, right? why not 
`Seq[(Double,Double)]`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2014-12-30 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3246#discussion_r22343378
  
--- Diff: 
external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala
 ---
@@ -88,6 +90,11 @@ class TwitterReceiver(
   val query = new FilterQuery
   if (filters.size  0) {
 query.track(filters.toArray)
+  }
+  if (locations.size  0) {
+query.locations(locations.map(_.toArray).toArray)
+  }
+  if (filters.size  0 || locations.size  0) {
--- End diff --

It seems like this could be rewritten to avoid the redundant checks and 
create of `FilterQuery` when there is no filtering.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2014-12-30 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3246#discussion_r22343430
  
--- Diff: 
external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterUtils.scala
 ---
@@ -112,20 +269,91 @@ object TwitterUtils {
 ): JavaReceiverInputDStream[Status] = {
 createStream(jssc.ssc, Some(twitterAuth), filters)
   }
-
+ 
+  /**
+   * Create a input stream that returns tweets received from Twitter.
+   * Storage level of the data will be the default 
StorageLevel.MEMORY_AND_DISK_SER_2.
+   * @param jsscJavaStreamingContext object
+   * @param twitterAuth Twitter4J Authorization
+   * @param filters Set of filter strings to get only those tweets 
that match them
+   * @param locations Set of longitude, latitude pairs to get only those 
tweets
+   *that falling within the requested bounding boxes
+   */
+  def createStream(
+  jssc: JavaStreamingContext,
+  twitterAuth: Authorization,
+  filters: Array[String],
+  locations: Array[Array[Double]]
+): JavaReceiverInputDStream[Status] = {
+createStream(jssc.ssc, Some(twitterAuth), filters, 
locations.map(_.toSeq).toSeq)
+  }
+ 
+  /**
+   * Create a input stream that returns tweets received from Twitter.
+   * Storage level of the data will be the default 
StorageLevel.MEMORY_AND_DISK_SER_2.
+   * @param jsscJavaStreamingContext object
+   * @param twitterAuth Twitter4J Authorization
+   * @param locations Set of longitude, latitude pairs to get only those 
tweets
+   *that falling within the requested bounding boxes
+   */
+  def createStream(
+  jssc: JavaStreamingContext,
+  twitterAuth: Authorization,
+  locations: Array[Array[Double]]
+): JavaReceiverInputDStream[Status] = {
+createStream(jssc.ssc, Some(twitterAuth), Nil, 
locations.map(_.toSeq).toSeq)
+  }
+ 
+  /**
+   * Create a input stream that returns tweets received from Twitter.
+   * @param jssc JavaStreamingContext object
+   * @param twitterAuth  Twitter4J Authorization object
+   * @param filters  Set of filter strings to get only those tweets 
that match them
+   * @param storageLevel Storage level to use for storing the received 
objects
+   */
+  def createStream(
+  jssc: JavaStreamingContext,
+  twitterAuth: Authorization,
+  filters: Array[String],
+  storageLevel: StorageLevel
+): JavaReceiverInputDStream[Status] = {
+createStream(jssc.ssc, Some(twitterAuth), filters, Nil, storageLevel)
+  }
+ 
   /**
* Create a input stream that returns tweets received from Twitter.
* @param jssc JavaStreamingContext object
* @param twitterAuth  Twitter4J Authorization object
* @param filters  Set of filter strings to get only those tweets 
that match them
+   * @param locations Set of longitude, latitude pairs to get only those 
tweets
+   *that falling within the requested bounding boxes
* @param storageLevel Storage level to use for storing the received 
objects
*/
   def createStream(
   jssc: JavaStreamingContext,
   twitterAuth: Authorization,
   filters: Array[String],
+  locations: Array[Array[Double]],
+  storageLevel: StorageLevel
+): JavaReceiverInputDStream[Status] = {
+createStream(jssc.ssc, Some(twitterAuth), filters, 
locations.map(_.toSeq).toSeq, storageLevel)
+  }
+ 
+  /**
+   * Create a input stream that returns tweets received from Twitter.
+   * @param jssc JavaStreamingContext object
+   * @param twitterAuth  Twitter4J Authorization object
+   * @param locations Set of longitude, latitude pairs to get only those 
tweets
+   *that falling within the requested bounding boxes
+   * @param storageLevel Storage level to use for storing the received 
objects
+   */
+  def createStream(
--- End diff --

There are *12* new overloads of `createStream` on top of the existing 4. 
This seems like big overkill. There should be one version in Java/Scala that 
takes all arguments, one each that takes minimal arguments, and any others 
needed to retain binary compatibility. The rest seem superfluous.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: 

[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2014-12-30 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/3246#discussion_r22343534
  
--- Diff: 
external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala
 ---
@@ -60,6 +61,7 @@ private[streaming]
 class TwitterReceiver(
 twitterAuth: Authorization,
 filters: Seq[String],
+locations: Seq[Seq[Double]],
--- End diff --

I remember that seems to be for using under both scala and java. So we can 
simply give a lat, lon pair in array.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2014-12-30 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3246#discussion_r22344218
  
--- Diff: 
external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala
 ---
@@ -60,6 +61,7 @@ private[streaming]
 class TwitterReceiver(
 twitterAuth: Authorization,
 filters: Seq[String],
+locations: Seq[Seq[Double]],
--- End diff --

There is already a separate set of methods for Java, no? that use `Array` 
and not things like `Seq`. This is private to the Scala-based Spark code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2014-12-30 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/3246#discussion_r22344303
  
--- Diff: 
external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala
 ---
@@ -60,6 +61,7 @@ private[streaming]
 class TwitterReceiver(
 twitterAuth: Authorization,
 filters: Seq[String],
+locations: Seq[Seq[Double]],
--- End diff --

There is. I meant that the main problem is the inconsistency between Scala 
and Java apis. One use `Seq[(Double,Double)]` and other uses 
`Array[Array[Double]]`. If it is ok, I would revise it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2014-11-21 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/3246#issuecomment-64071312
  
Any idea about this PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org