[jira] [Commented] (SPARK-12178) Expose reporting of StreamInputInfo for custom made streams

2015-12-15 Thread Rodrigo Boavida (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058121#comment-15058121
 ] 

Rodrigo Boavida commented on SPARK-12178:
-

I plan onto to make my akka direct stream implementation open sourced - but 
this would be absolutely necessary to have it complete. 
I heard there is someone working on a flume based implementation of direct 
stream and I'm sure other streaming engines will follow soon.
Is there something I could do to push this forward? I don't mind being the one 
doing the change.

Tnks,
Rod



> Expose reporting of StreamInputInfo for custom made streams
> ---
>
> Key: SPARK-12178
> URL: https://issues.apache.org/jira/browse/SPARK-12178
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Rodrigo Boavida
>Priority: Minor
>
> For custom made direct streams, the Spark Streaming context needs to be 
> informed of the RDD count per batch execution. This is not exposed by the 
> InputDStream abstract class. 
> The suggestion is to create a method in the InputDStream class that reports 
> to the streaming context and make that available to child classes of 
> InputDStream.
> Signature example:
> def reportInfo(validTime : org.apache.spark.streaming.Time, inputInfo : 
> org.apache.spark.streaming.scheduler.StreamInputInfo)
> I have already done this on my own private branch. I can merge that change in 
> if approval is given.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12178) Expose reporting of StreamInputInfo for custom made streams

2015-12-08 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046602#comment-15046602
 ] 

Saisai Shao commented on SPARK-12178:
-

This is a good idea to make it generic if there's more direct stream other than 
Kafka. I thought about this when implementing this InputInfoTracker, but at 
that time there's only one special case (Kafka direct stream).

> Expose reporting of StreamInputInfo for custom made streams
> ---
>
> Key: SPARK-12178
> URL: https://issues.apache.org/jira/browse/SPARK-12178
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Rodrigo Boavida
>Priority: Minor
>
> For custom made direct streams, the Spark Streaming context needs to be 
> informed of the RDD count per batch execution. This is not exposed by the 
> InputDStream abstract class. 
> The suggestion is to create a method in the InputDStream class that reports 
> to the streaming context and make that available to child classes of 
> InputDStream.
> Signature example:
> def reportInfo(validTime : org.apache.spark.streaming.Time, inputInfo : 
> org.apache.spark.streaming.scheduler.StreamInputInfo)
> I have already done this on my own private branch. I can merge that change in 
> if approval is given.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12178) Expose reporting of StreamInputInfo for custom made streams

2015-12-07 Thread Rodrigo Boavida (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044846#comment-15044846
 ] 

Rodrigo Boavida commented on SPARK-12178:
-

For any new implementation of a custom stream. 
For example, the KafkaDirectInputDStream is a custom stream which has its own 
compute method with its way of calculating the StreamInputInfo that feeds into 
the Spark Streaming context the ingestion rate and information differently than 
the ReceiverInputDStream.

I'm currently implementing a similar DStream to the KafkaDirectStream which 
feeds on Akka to retrieve data from each worker thus the ingestion report needs 
to be custom made as well. 

If we don't have this reporting function exposed, the spark streaming page will 
not be able to show us the events/sec rate.

I hope this helps understand the requirement.

tnks,
Rod

> Expose reporting of StreamInputInfo for custom made streams
> ---
>
> Key: SPARK-12178
> URL: https://issues.apache.org/jira/browse/SPARK-12178
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Rodrigo Boavida
>Priority: Minor
>
> For custom made direct streams, the Spark Streaming context needs to be 
> informed of the RDD count per batch execution. This is not exposed by the 
> InputDStream abstract class. 
> The suggestion is to create a method in the InputDStream class that reports 
> to the streaming context and make that available to child classes of 
> InputDStream.
> Signature example:
> def reportInfo(validTime : org.apache.spark.streaming.Time, inputInfo : 
> org.apache.spark.streaming.scheduler.StreamInputInfo)
> I have already done this on my own private branch. I can merge that change in 
> if approval is given.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12178) Expose reporting of StreamInputInfo for custom made streams

2015-12-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044829#comment-15044829
 ] 

Sean Owen commented on SPARK-12178:
---

What is the use case for this?

> Expose reporting of StreamInputInfo for custom made streams
> ---
>
> Key: SPARK-12178
> URL: https://issues.apache.org/jira/browse/SPARK-12178
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Rodrigo Boavida
>Priority: Minor
>
> For custom made direct streams, the Spark Streaming context needs to be 
> informed of the RDD count per batch execution. This is not exposed by the 
> InputDStream abstract class. 
> The suggestion is to create a method in the InputDStream class that reports 
> to the streaming context and make that available to child classes of 
> InputDStream.
> Signature example:
> def reportInfo(validTime : org.apache.spark.streaming.Time, inputInfo : 
> org.apache.spark.streaming.scheduler.StreamInputInfo)
> I have already done this on my own private branch. I can merge that change in 
> if approval is given.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12178) Expose reporting of StreamInputInfo for custom made streams

2015-12-07 Thread Rodrigo Boavida (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044850#comment-15044850
 ] 

Rodrigo Boavida commented on SPARK-12178:
-

This is probably the first step onto the effort of making the concept of Direct 
Streams generic and reusable for different technologies (not just Kafka). 
Reactive streams concept is an example of a further step.

I'd like to tag here Prakash Chockalingam from Databricks with who I had this 
conversation about on the latest Spark Summit, but can't find his user name.
CCing Iulian as well
[~dragos]

Tnks,
Rod

> Expose reporting of StreamInputInfo for custom made streams
> ---
>
> Key: SPARK-12178
> URL: https://issues.apache.org/jira/browse/SPARK-12178
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Rodrigo Boavida
>Priority: Minor
>
> For custom made direct streams, the Spark Streaming context needs to be 
> informed of the RDD count per batch execution. This is not exposed by the 
> InputDStream abstract class. 
> The suggestion is to create a method in the InputDStream class that reports 
> to the streaming context and make that available to child classes of 
> InputDStream.
> Signature example:
> def reportInfo(validTime : org.apache.spark.streaming.Time, inputInfo : 
> org.apache.spark.streaming.scheduler.StreamInputInfo)
> I have already done this on my own private branch. I can merge that change in 
> if approval is given.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org