I have a simple program which takes some input from the command line (a
socket stream) and then aggregates based on the key.
When running this program on my local machine I see some output that is
counterintuitive to my understanding of windows in Flink.
The start time of the window is around the
My algorithm is roughly as follows, taking the top-K words problem as an
example (the purpose of computing a local “word count” is to deal with data
imbalance); a hedged code sketch follows the pipeline:
DataStream of words ->
timeWindow of 1h ->
converted to DataSet of words ->
random partitioning by rebalance ->
local “word count” using
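For reference, a minimal sketch of just the keyed hourly-count step in the
DataStream API (the socket source, class name, and port are illustrative
assumptions; it does not cover the rebalance-based local pre-aggregation or
the DataStream-to-DataSet hand-off):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class HourlyWordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Tuple2<String, Integer>> counts = env
            .socketTextStream("localhost", 9999)       // stand-in word source
            .map(new MapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public Tuple2<String, Integer> map(String word) {
                    return new Tuple2<>(word, 1);
                }
            })
            .keyBy(0)                  // group by the word itself
            .timeWindow(Time.hours(1)) // hourly tumbling window
            .sum(1);                   // per-word count within the window

        counts.print();
        env.execute("hourly word count");
    }
}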
Hi,
I'm working on a project which uses Flink to compute hourly log statistics
like top-K. The logs are fetched from Kafka by a FlinkKafkaConsumer and
packed into a DataStream.
The problem is, I find the computation quite challenging to express with
Flink's DataStream API:
1. If I use something
Hi all,
I have an incoming stream of event objects, each with its session ID. I am
writing a task that aggregates the events by session. The general logic
looks like:
case class Event(sessionId: Int, data: String)
case class Session(id: Int, var events: List[Event])

val events = ... // some source
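For illustration, the grouping step in the Java API could look roughly like
this (the Event POJO mirrors the case class above; the class and method
names are assumptions):

import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;

public class SessionGrouping {
    // Mirrors case class Event(sessionId: Int, data: String).
    public static class Event {
        public int sessionId;
        public String data;
    }

    public static KeyedStream<Event, Tuple> groupBySession(DataStream<Event> events) {
        // Route all events with the same session ID to the same partition,
        // so a per-session aggregation can run downstream.
        return events.keyBy("sessionId");
    }
}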
On Fri, Jun 3, 2016 at 6:47 AM, Kostas Kloudas
wrote:
> To see a more complex example of such a trigger and how to
> implement it,
> you can have a look at this implementation:
>
On Sun, Jun 5, 2016 at 3:16 PM, Stephan Ewen wrote:
> You raised a good point. Fortunately, there should be a simple way to fix
> this.
>
> The Kafka Sink Function should implement the "Checkpointed" interface. It
> will get a call to the "snapshotState()" method whenever a
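A minimal sketch of what implementing that interface might look like (the
sink class and its state field are illustrative assumptions, not the actual
Kafka sink code):

import org.apache.flink.streaming.api.checkpoint.Checkpointed;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class MySink extends RichSinkFunction<String>
        implements Checkpointed<Long> {

    private long recordsSeen; // illustrative state to snapshot

    @Override
    public void invoke(String value) throws Exception {
        recordsSeen++;
        // ... hand the value to the producer here ...
    }

    @Override
    public Long snapshotState(long checkpointId, long checkpointTimestamp) {
        // called whenever a checkpoint barrier passes this operator
        return recordsSeen;
    }

    @Override
    public void restoreState(Long state) {
        recordsSeen = state; // reset to the last successful snapshot
    }
}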
The question is how to encapsulate numerous transformations into one object,
or maybe a function, in an Apache Flink Java setting. I have tried to
investigate this question using an example of Pi calculation (see below). I
am wondering whether or not the suggested approach is valid from Flink's
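One common pattern, sketched under the assumption that the goal is simply to
reuse a chain of operators (all names below are illustrative, not taken from
the Pi example):

import org.apache.flink.api.java.DataSet;

// A reusable bundle of transformations: callers pass a DataSet in and get
// the transformed DataSet back, so such bundles compose like functions.
public class CleanWords {
    public static DataSet<String> apply(DataSet<String> input) {
        return input
            .map(s -> s.trim().toLowerCase()) // normalize each entry
            .filter(s -> !s.isEmpty());       // drop empty entries
    }
}

// usage: DataSet<String> cleaned = CleanWords.apply(rawWords);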
Hi Josh,
I have not yet :-( . I am working on getting a REST service set up on AWS
that can do it rather than using the Flink client remotely.
This way the Akka communication stays within AWS.
However, I still need the solution for running some of the integration/system
tests.
~ Abhi
From: Josh
open() not being called is a valuable observation, but from then on I have
problems following you.
Within hasNext() we first check whether exhausted is true, and if so return
false. Since it is initialized to false, we will not return here.
This seems like correct behaviour.
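For readers without the code at hand, the pattern under discussion looks
roughly like this (reconstructed for illustration; the actual class is not
shown in the thread):

private boolean exhausted = false; // starts false, so the guard below
                                   // cannot fire on the first call

public boolean hasNext() {
    if (exhausted) {
        return false; // only after the source ran dry
    }
    // ... fetch the next element; once the source runs dry:
    // exhausted = true; return false;
    return true;
}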
I have one
Hi,
I tried streaming the source data in two ways:
1. A simple, straightforward way of sending the data without using the
serving-speed concept:
http://pastebin.com/cTv0Pk5U
2. One where I use the TaxiRide source, which is exactly the same except
that it loads the data into the proper data structures.
I just did that:
public T nextRecord(final T reuse) throws IOException {
    if (this.rs == null) {
        // throw new IOException("No table result scanner provided!");
        return null;
    }
    ...
because in the class FileSourceFunction we have:
@Override
public void run(SourceContext ctx) throws Exception {
Hey Iñaki,
you can use the KeySelector as described here:
https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/common/index.html#specifying-keys
But you only have a local view of the current element, e.g. the library
you use to determine the similarity has to know the similarities
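To make that concrete, a hedged sketch of keying by a canonical form
computed inside the KeySelector (the Person type and the normalize() helper
are hypothetical):

import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;

public class SimilarityKeying {
    public static class Person {
        public String name;
    }

    // Hypothetical canonicalizer: similar spellings must map to the SAME
    // key, because keyBy can only compare the keys it is given.
    static String normalize(String name) {
        return name.toLowerCase().replaceAll("[^a-z]", "");
    }

    public static KeyedStream<Person, String> keyBySimilarName(DataStream<Person> people) {
        return people.keyBy(new KeySelector<Person, String>() {
            @Override
            public String getKey(Person p) {
                return normalize(p.name);
            }
        });
    }
}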
Unless someone really invests time into debugging this, I fear that the
different misspellings are not really helpful, Flavio.
On Mon, Jun 6, 2016 at 10:31 AM, Flavio Pompermaier
wrote:
> This time I had the following exception (obviously
>
From the code it looks like the open method of the TableInputFormat is
never called. What are you doing differently in the
StreamingTableInputFormat?
– Ufuk
On Mon, Jun 6, 2016 at 1:49 PM, Christophe Salperwyck
wrote:
> Hi all,
>
> I am trying to read data
Hi,
the error being logged comes from the Kafka code we are referring to. The
Kafka user@ list is probably a better place to seek help on this one.
I've searched for the error message on Google. It seems that the message
does not immediately indicate an error. Also note, it is logged at
Thanks for clarification. I think it might be related to the YARN
properties file, which is still being used for the batch jobs. Can you
try to delete it between submissions as a temporary workaround to
check whether it's related?
– Ufuk
On Mon, Jun 6, 2016 at 3:18 PM, LINZ, Arnaud
Hi,
The ZooKeeper path is only for my persistent container, and I do use a
different one for each of my persistent containers.
The -Drecovery.mode=standalone was passed inside the JVM_ARGS ("${JVM_ARGS}
-Drecovery.mode=standalone -Dyarn.properties-file.location=/tmp/flink/batch")
I've tried
Hi Flink community,
In the Flink worker's log I see the following INFO message appear 30 times
or more, roughly every 10 minutes:
`org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking
the coordinator 2147483637 dead.`
Would this indicate the Kafka-Flink consumer group is
Hi,
I use Flink 1.0.0. I have a persistent YARN container set (a persistent Flink
job manager) that I use for streaming jobs, and I use the “yarn-cluster” mode
to launch my batches.
I’ve just switched “HA” mode on for my streaming persistent job manager and it
seems to work; however, my
Hi all,
I am trying to read data from HBase and use the window functions of Flink
streaming. I can read my data using the ExecutionEnvironment but not from
the StreamExecutionEnvironment.
Is that a known issue?
Are the InputSplits used in the streaming environment?
Here is a sample of my code:
Hi,
I am using a data source generator which very closely resembles the one
here on the data Artisans GitHub page:
https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/java/com/dataartisans/flinktraining/exercises/datastream_java/sources/TaxiRideSource.java
This
the JDBC InputFormat does not, and never has, used the Configuration.
On 06.06.2016 09:27, Aljoscha Krettek wrote:
The problem could be that open() is not called with a proper
Configuration object in streaming mode.
On Sun, 5 Jun 2016 at 19:33, Stephan Ewen wrote:
Hi guys,
I am using Flink on my project and I have a question. (I am using Java)
Is it possible to modify the keyBy method in order to key by similarity
and not by the exact name?
Example: I receive 2 DataStreams; in the first one, the value of the field
that I want to key by is "John Locke",
The problem could be that open() is not called with a proper Configuration
object in streaming mode.
On Sun, 5 Jun 2016 at 19:33 Stephan Ewen wrote:
> Hi David!
>
> You are using the JDBC format that was written for the batch API in the
> streaming API.
>
> While that should
Hi,
I think the problem is that the case class has generic parameters. You can
try making the TypeInformation for those parameters implicitly available at
the call site, i.e.:

implicit val typeT = createTypeInformation[T] // where you insert the
specific type for T and do the same for the other