[jira] [Commented] (SPARK-3146) Improve the flexibility of Spark Streaming Kafka API to offer user the ability to process message before storing into BM
[ https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15558550#comment-15558550 ]

Cody Koeninger commented on SPARK-3146:
---------------------------------------

SPARK-4964 / the direct stream added a messageHandler.

> Improve the flexibility of Spark Streaming Kafka API to offer user the
> ability to process message before storing into BM
>
>              Key: SPARK-3146
>              URL: https://issues.apache.org/jira/browse/SPARK-3146
>          Project: Spark
>       Issue Type: Improvement
>       Components: Streaming
> Affects Versions: 1.0.2, 1.1.0
>         Reporter: Saisai Shao
>
> Currently the Spark Streaming Kafka API stores only the key and value of each
> message into the BlockManager (BM) for processing; this potentially loses
> flexibility for different requirements:
> 1. Topic/partition/offset information for each message is discarded by
> KafkaInputDStream. In some scenarios people may need this information to
> better filter messages, as SPARK-2388 describes.
> 2. People may need to add a timestamp to each message when feeding it into
> Spark Streaming, to better measure system latency.
> 3. Checkpointing the partition/offsets, or others...
> So here we add a messageHandler to the interface to give people the
> flexibility to preprocess a message before storing it into the BM. In the
> meantime this improvement keeps compatibility with the current API.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
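For reference, the direct stream's messageHandler is supplied when the stream is created. A minimal sketch against the Spark 1.3-era spark-streaming-kafka API; the broker address, topic name, and result tuple type are illustrative, not prescribed by the API:

```scala
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("messageHandler-sketch")
val ssc = new StreamingContext(conf, Seconds(5))

// Assumed setup: a local broker and a starting offset for one partition.
val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
val fromOffsets = Map(TopicAndPartition("events", 0) -> 0L)

// The handler runs per message and decides what ends up in the DStream;
// here we keep topic/partition/offset alongside the payload instead of
// discarding them as the receiver-based API does.
val stream = KafkaUtils.createDirectStream[
    String, String, StringDecoder, StringDecoder, (String, Int, Long, String)](
  ssc, kafkaParams, fromOffsets,
  (mmd: MessageAndMetadata[String, String]) =>
    (mmd.topic, mmd.partition, mmd.offset, mmd.message()))
```

The handler's result type is the stream's element type, so exposing the metadata (or not) is entirely the caller's choice.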
[ https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15139250#comment-15139250 ]

Cody Koeninger commented on SPARK-3146:
---------------------------------------

I think this can be safely closed, given the messageHandler in the direct stream.
[ https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14258593#comment-14258593 ]

Tathagata Das commented on SPARK-3146:
--------------------------------------

I am a little wary of adding any functionality to a single input stream that we would like to add to all input streams in a general way in the future. It leads to a lot of API mess. However, it is also true that I don't see (yet) an obvious way to add the general interceptor (M => Iterable[T]) pattern to all receiver input streams. So I am conflicted on this. I am going to create a JIRA for this feature, on which we as a community can brainstorm different ways to create the feature.
[ https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14258598#comment-14258598 ]

Tathagata Das commented on SPARK-3146:
--------------------------------------

[~c...@koeninger.org] [~jerryshao] If either of you wants to take a crack at the general interceptor pattern for all receivers, here is the JIRA - SPARK-4960
[ https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14254100#comment-14254100 ]

Cody Koeninger commented on SPARK-3146:
---------------------------------------

This is a real problem for production use, not an abstract concern about what users might or might not do if you gave them full access to power they already should have had. The patch is a working implementation that solves the problem; we've been using it in production.

I think the patch needs to be improved to take a function MessageAndMetadata => Iterable[R] instead of the current MessageAndMetadata => R (in other words, it needs to be a flatMap operation instead of the current map).

If for some reason this needs to be added to all receivers, and it needs to be done via a setter instead of a constructor argument, I don't understand why, but it will solve the problem, so I'm all for it. If it really needs to be a function (Any, Any) => Iterable[R], I think that's worse from a user interface point of view, but again it will solve the problem, so I'm all for it.

If someone with the power to merge PRs will say which of those options they would be willing to accept, I'll happily implement it. What I definitely don't want to see is for this ticket to languish for another 6 months, during which time I keep having to do manual maintenance of a patched Spark.
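The map-versus-flatMap distinction being requested can be sketched in plain Scala; the case class below is a hypothetical stand-in for Kafka's MessageAndMetadata, not the real class:

```scala
// Hypothetical stand-in for kafka.message.MessageAndMetadata[K, V].
case class Msg(topic: String, partition: Int, offset: Long, value: String)

// Current patch: a map-style handler must produce exactly one record per message.
val mapHandler: Msg => String =
  m => s"${m.topic}-${m.partition}@${m.offset}: ${m.value}"

// Proposed: a flatMap-style handler can also drop a message (empty result)
// or expand one message into several records.
val flatMapHandler: Msg => Iterable[String] =
  m => if (m.value.isEmpty) Nil
       else Seq(s"${m.topic}-${m.partition}@${m.offset}: ${m.value}")

val msgs = Seq(Msg("t", 0, 1L, "a"), Msg("t", 0, 2L, ""), Msg("t", 0, 3L, "b"))
msgs.map(mapHandler).size         // 3: one output per input, always
msgs.flatMap(flatMapHandler).size // 2: the empty message is filtered out
```

The point of running this before storage is that filtering happens before the block (and its offsets) is checkpointed, which a downstream DStream.flatMap cannot do.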
[ https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14254123#comment-14254123 ]

Hari Shreedharan commented on SPARK-3146:
-----------------------------------------

For now, I am ok with just adding it to individual DStreams, not necessarily everything - though that would be the best case. Unfortunately, that would break the current interface (unless of course we add an implementation) - and since each event is received by an individual receiver, using this means the receiver would need to call a method like intercept or something. Either way, I am also a strong +1 on getting this in soon!
[ https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252113#comment-14252113 ]

Hari Shreedharan commented on SPARK-3146:
-----------------------------------------

I am a +1 for using something similar to the interceptor pattern. I have worked quite a bit on Flume to know that it is an incredibly useful feature, especially for filtering and enrichment, which can usually be pretty lightweight. If there is enough demand, I can help with this - perhaps implement it on the lines of Flume's Interceptor implementation?
[ https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252146#comment-14252146 ]

Cody Koeninger commented on SPARK-3146:
---------------------------------------

From my point of view, the interceptor pattern is little more than an overly complex way of spelling flatMap. What does it actually gain you?

The specific issue here is two-fold:

1. Exposing MessageAndMetadata to users, not just (key, message). This is a Kafka-specific concern, not applicable to all types of receivers. If it had been originally written as a DStream[MessageAndMetadata], there would be less of a need for this patch.

2. Controlling timing, and thus checkpointing, based on the contents of the individual item. I can see this being applicable to many types of receivers, but not all. For instance, if I sleep inside this function, what happens for Kafka is not the same as what happens for e.g. Twitter.
[ https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252160#comment-14252160 ]

Hari Shreedharan commented on SPARK-3146:
-----------------------------------------

The difference between the interceptor pattern and flatMap would be latency. Rather than have the operation run on the Spark cluster every batch, the operation would happen when the data is received, and thus any intercepted messages would change/be filtered before the batch is generated. Sufficient checks would need to be present to ensure that all user code does not end up in an intercept call. I am not exactly sure what (1) buys you, but (2) seems to be easily possible via the interceptor pattern, no?
[ https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252210#comment-14252210 ]

Cody Koeninger commented on SPARK-3146:
---------------------------------------

(1) is important because MessageAndMetadata contains more information than just a key and value, namely topic, partition and offset. There's no reason to hide that information from users, and exposing it lets them fix all kinds of problems for themselves, namely SPARK-2388, manual offset management, and things I can't imagine yet.

(2) Regarding the interceptor, maybe this is just a semantic difference. The function you're talking about _is_ flatMap, namely (Container[A], (A) => Container[B]): Container[B]. The important thing, as you note, is that it runs before storage and offset checkpointing happen. I'd find it unfortunate if the API was exposed in terms of an Interceptor interface that users had to implement and then use a setter for, rather than simply passing a function to the constructor... but ultimately, what matters is that the functionality is exposed.
[ https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252256#comment-14252256 ]

Hari Shreedharan commented on SPARK-3146:
-----------------------------------------

I understand what you mean by flatMap (the reason I don't want to call it flatMap is that we already have a flatMap which behaves differently in terms of when and where it is executed). Interceptors can simply be functions passed in (the reason it is not a function in Flume is that Flume is all Java - functions could not be passed around till Java 8); the idea is the same. I think we both agree on the idea; we can discuss how it is exposed in more detail once we have more consensus on whether it makes sense.

For (1), I see that the metadata is something that might be helpful. What if we pass the metadata to the interceptor method too (see how the BlockGenerator gets it now)? It is better than rewriting it as a DStream[MessageAndMetadata] - which may not be as easy to get done.
[ https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252299#comment-14252299 ]

Cody Koeninger commented on SPARK-3146:
---------------------------------------

Yes, for the specific case of Kafka, regardless of the name of the API, the user-supplied function needs to be of type MessageAndMetadata => Iterable[R]
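One concrete shape such a function might take, for the SPARK-2388-style use case of filtering on topic while keeping offsets for manual management. Both case classes here are hypothetical stand-ins invented for illustration, not Spark or Kafka classes:

```scala
// Hypothetical stand-in for kafka.message.MessageAndMetadata[K, V].
case class Msg(topic: String, partition: Int, offset: Long, value: String)

// Hypothetical record type that keeps the offset for manual offset management.
case class Tagged(offset: Long, payload: String)

// MessageAndMetadata => Iterable[R]: drop messages from unwanted topics
// (returning Nil) and tag survivors with their offset.
val handler: Msg => Iterable[Tagged] = m =>
  if (m.topic.startsWith("metrics.")) Seq(Tagged(m.offset, m.value)) else Nil

handler(Msg("metrics.cpu", 0, 42L, "0.93")) // kept: one Tagged record
handler(Msg("debug.log", 0, 7L, "noise"))   // dropped: Nil
```

Because the single signature covers filtering, enrichment, and one-to-many expansion, it subsumes both the map-style messageHandler and the interceptor variants discussed above.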
[ https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252319#comment-14252319 ]

Hari Shreedharan commented on SPARK-3146:
-----------------------------------------

If you look at the addDataWithCallback method, we pass (data, metadata) to the BlockGenerator, where data here is K,V and metadata is the (topicAndPartition, msgAndMetadata.offset). Would passing data, metadata to an intercept method be useful?
[ https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252397#comment-14252397 ]

Cody Koeninger commented on SPARK-3146:
---------------------------------------

((K, V), (topicAndPartition, offset)) is all of the public information in a MessageAndMetadata object, aside from the decoders, so that's probably ok. If you're trying to come up with a uniform interface, (Any, Any) => Iterable[R] doesn't seem much better than Any => Iterable[R], though.

Also, just to be clear, the particular example calls you're talking about only exist in ReliableKafkaReceiver, not KafkaInputDStream, and we need something that works for both.
[ https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252742#comment-14252742 ]

Saisai Shao commented on SPARK-3146:
------------------------------------

Hi all, thanks a lot for your comments. My original purpose with this proposal was to solve SPARK-2388 and make the solution more general. I'm not sure whether other receivers have such requirements; also, is it reasonable to add this interceptor to receivers like socket or file? We can further improve the flexibility of the streaming API with these interceptors, e.g. filtering out unwanted messages, or tracking latency by timestamping each message once it is received. But the problem is that such flexibility may lead users to abuse this API to process data as soon as it is received, rather than through transformation functions, which potentially disobeys the semantics of Spark Streaming. I think we should probably redesign the whole thing to make it both flexible and meaningful.
[ https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195146#comment-14195146 ]

Cody Koeninger commented on SPARK-3146:
---------------------------------------

I think this PR is an elegant way to solve SPARK-2388, which is an otherwise blocking bug for our usage of Kafka. Absent a concrete design for doing something equivalent for all InputDStreams, I'd encourage merging it.
[jira] [Commented] (SPARK-3146) Improve the flexibility of Spark Streaming Kafka API to offer user the ability to process message before storing into BM
[ https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119128#comment-14119128 ] Saisai Shao commented on SPARK-3146: --- Hi [~tdas], sorry for the late response, and thanks a lot for your reply. My initial thought was to add flexibility to Spark Streaming's Kafka API, because in some cases people need to know the topic of a message, among other details. It may be worthwhile to change this to a generic mechanism for all input dstreams; I will think carefully about the implementation.
[jira] [Commented] (SPARK-3146) Improve the flexibility of Spark Streaming Kafka API to offer user the ability to process message before storing into BM
[ https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111614#comment-14111614 ] Tathagata Das commented on SPARK-3146: --- I think this can be further generalized to the interceptor pattern of Flume, which is applicable to all types of receivers. So I feel that this should be implemented in such a way that it generally applies to all input dstreams at the API level. For example: InputDStream.setInterceptorFunction(function: (T) => Iterable[T]). Then all receivers would be able to apply a filter function. [~jerryshao] What do you think?
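The interceptor idea above can be sketched outside of Spark: a per-record function of type `T => Iterable[T]` lets a receiver drop a record (empty result), keep it (singleton), or expand it, before anything is stored. This is a minimal illustration of the pattern only; `intercept` and the surrounding object are hypothetical names, not Spark's actual API.

```scala
// Minimal sketch of a Flume-style interceptor: each record is mapped to
// zero or more output records before storage. Names are illustrative.
object InterceptorSketch {
  // Apply the interceptor lazily across a stream of raw records.
  def intercept[T](records: Iterator[T], f: T => Iterable[T]): Iterator[T] =
    records.flatMap(f)

  def main(args: Array[String]): Unit = {
    val raw = Iterator("keep-1", "drop-2", "keep-3")
    // An interceptor acting as a filter: discard records marked "drop".
    val kept = intercept[String](raw,
      r => if (r.startsWith("keep")) Seq(r) else Seq.empty)
    println(kept.toList) // List(keep-1, keep-3)
  }
}
```

Because the result type is `Iterable[T]`, the same hook covers filtering, enrichment (one-to-one rewriting), and fan-out, which is what makes it strictly more general than a plain filter function.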
[jira] [Commented] (SPARK-3146) Improve the flexibility of Spark Streaming Kafka API to offer user the ability to process message before storing into BM
[ https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103486#comment-14103486 ] Saisai Shao commented on SPARK-3146: --- This issue can actually solve the problem mentioned in SPARK-2388, besides offering a more general way.
[jira] [Commented] (SPARK-3146) Improve the flexibility of Spark Streaming Kafka API to offer user the ability to process message before storing into BM
[ https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103520#comment-14103520 ] Apache Spark commented on SPARK-3146: --- User 'jerryshao' has created a pull request for this issue: https://github.com/apache/spark/pull/2053
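The messageHandler idea behind the PR can also be sketched without Spark: instead of storing only `(key, value)`, a user-supplied function maps the full message metadata to an arbitrary record type `R`, so topic/partition/offset survive into the stream. The `MessageAndMetadata` case class below is a simplified stand-in for Kafka's class of the same name, not the real one.

```scala
// Stand-in for Kafka's MessageAndMetadata, carrying per-message metadata
// that the plain (key, value) stream would otherwise discard.
case class MessageAndMetadata[K, V](
  topic: String, partition: Int, offset: Long, key: K, message: V)

object MessageHandlerSketch {
  // A handler that keeps the offset alongside the payload, e.g. so the
  // application can checkpoint offsets itself.
  val handler: MessageAndMetadata[String, String] => (Long, String) =
    m => (m.offset, m.message)

  def main(args: Array[String]): Unit = {
    val m = MessageAndMetadata("logs", 0, 42L, "k", "payload")
    println(handler(m)) // (42,payload)
  }
}
```

This matches the shape of the `messageHandler: MessageAndMetadata[K, V] => R` parameter that the direct stream (SPARK-4964, mentioned in the later comments) eventually exposed: the handler runs once per message, before the record reaches the rest of the pipeline.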