[jira] [Commented] (SPARK-12178) Expose reporting of StreamInputInfo for custom made streams
[ https://issues.apache.org/jira/browse/SPARK-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058121#comment-15058121 ] Rodrigo Boavida commented on SPARK-12178: - I plan onto to make my akka direct stream implementation open sourced - but this would be absolutely necessary to have it complete. I heard there is someone working on a flume based implementation of direct stream and I'm sure other streaming engines will follow soon. Is there something I could do to push this forward? I don't mind being the one doing the change. Tnks, Rod > Expose reporting of StreamInputInfo for custom made streams > --- > > Key: SPARK-12178 > URL: https://issues.apache.org/jira/browse/SPARK-12178 > Project: Spark > Issue Type: Improvement > Components: Streaming >Reporter: Rodrigo Boavida >Priority: Minor > > For custom made direct streams, the Spark Streaming context needs to be > informed of the RDD count per batch execution. This is not exposed by the > InputDStream abstract class. > The suggestion is to create a method in the InputDStream class that reports > to the streaming context and make that available to child classes of > InputDStream. > Signature example: > def reportInfo(validTime : org.apache.spark.streaming.Time, inputInfo : > org.apache.spark.streaming.scheduler.StreamInputInfo) > I have already done this on my own private branch. I can merge that change in > if approval is given. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12178) Expose reporting of StreamInputInfo for custom made streams
[ https://issues.apache.org/jira/browse/SPARK-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046602#comment-15046602 ] Saisai Shao commented on SPARK-12178: - This is a good idea to make it generic if there's more direct stream other than Kafka. I thought about this when implementing this InputInfoTracker, but at that time there's only one special case (Kafka direct stream). > Expose reporting of StreamInputInfo for custom made streams > --- > > Key: SPARK-12178 > URL: https://issues.apache.org/jira/browse/SPARK-12178 > Project: Spark > Issue Type: Improvement > Components: Streaming >Reporter: Rodrigo Boavida >Priority: Minor > > For custom made direct streams, the Spark Streaming context needs to be > informed of the RDD count per batch execution. This is not exposed by the > InputDStream abstract class. > The suggestion is to create a method in the InputDStream class that reports > to the streaming context and make that available to child classes of > InputDStream. > Signature example: > def reportInfo(validTime : org.apache.spark.streaming.Time, inputInfo : > org.apache.spark.streaming.scheduler.StreamInputInfo) > I have already done this on my own private branch. I can merge that change in > if approval is given. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12178) Expose reporting of StreamInputInfo for custom made streams
[ https://issues.apache.org/jira/browse/SPARK-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044846#comment-15044846 ] Rodrigo Boavida commented on SPARK-12178: - For any new implementation of a custom stream. For example, the KafkaDirectInputDStream is a custom stream which has its own compute method with its way of calculating the StreamInputInfo that feeds into the Spark Streaming context the ingestion rate and information differently than the ReceiverInputDStream. I'm currently implementing a similar DStream to the KafkaDirectStream which feeds on Akka to retrieve data from each worker thus the ingestion report needs to be custom made as well. If we don't have this reporting function exposed, the spark streaming page will not be able to show us the events/sec rate. I hope this helps understand the requirement. tnks, Rod > Expose reporting of StreamInputInfo for custom made streams > --- > > Key: SPARK-12178 > URL: https://issues.apache.org/jira/browse/SPARK-12178 > Project: Spark > Issue Type: Improvement > Components: Streaming >Reporter: Rodrigo Boavida >Priority: Minor > > For custom made direct streams, the Spark Streaming context needs to be > informed of the RDD count per batch execution. This is not exposed by the > InputDStream abstract class. > The suggestion is to create a method in the InputDStream class that reports > to the streaming context and make that available to child classes of > InputDStream. > Signature example: > def reportInfo(validTime : org.apache.spark.streaming.Time, inputInfo : > org.apache.spark.streaming.scheduler.StreamInputInfo) > I have already done this on my own private branch. I can merge that change in > if approval is given. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12178) Expose reporting of StreamInputInfo for custom made streams
[ https://issues.apache.org/jira/browse/SPARK-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044829#comment-15044829 ] Sean Owen commented on SPARK-12178: --- What is the use case for this? > Expose reporting of StreamInputInfo for custom made streams > --- > > Key: SPARK-12178 > URL: https://issues.apache.org/jira/browse/SPARK-12178 > Project: Spark > Issue Type: Improvement > Components: Streaming >Reporter: Rodrigo Boavida >Priority: Minor > > For custom made direct streams, the Spark Streaming context needs to be > informed of the RDD count per batch execution. This is not exposed by the > InputDStream abstract class. > The suggestion is to create a method in the InputDStream class that reports > to the streaming context and make that available to child classes of > InputDStream. > Signature example: > def reportInfo(validTime : org.apache.spark.streaming.Time, inputInfo : > org.apache.spark.streaming.scheduler.StreamInputInfo) > I have already done this on my own private branch. I can merge that change in > if approval is given. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12178) Expose reporting of StreamInputInfo for custom made streams
[ https://issues.apache.org/jira/browse/SPARK-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044850#comment-15044850 ] Rodrigo Boavida commented on SPARK-12178: - This is probably the first step onto the effort of making the concept of Direct Streams generic and reusable for different technologies (not just Kafka). Reactive streams concept is an example of a further step. I'd like to tag here Prakash Chockalingam from Databricks with who I had this conversation about on the latest Spark Summit, but can't find his user name. CCing Iulian as well [~dragos] Tnks, Rod > Expose reporting of StreamInputInfo for custom made streams > --- > > Key: SPARK-12178 > URL: https://issues.apache.org/jira/browse/SPARK-12178 > Project: Spark > Issue Type: Improvement > Components: Streaming >Reporter: Rodrigo Boavida >Priority: Minor > > For custom made direct streams, the Spark Streaming context needs to be > informed of the RDD count per batch execution. This is not exposed by the > InputDStream abstract class. > The suggestion is to create a method in the InputDStream class that reports > to the streaming context and make that available to child classes of > InputDStream. > Signature example: > def reportInfo(validTime : org.apache.spark.streaming.Time, inputInfo : > org.apache.spark.streaming.scheduler.StreamInputInfo) > I have already done this on my own private branch. I can merge that change in > if approval is given. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org