What is the recommended practice for using a dedicated ExecutionContext
inside Flink code?
We are making some external network calls using Futures. Currently all of
them are running on the global execution context (import
scala.concurrent.ExecutionContext.Implicits.global).
Thanks
-Soumya
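One common alternative to the global context is a dedicated, bounded pool for blocking network calls. A minimal plain-Java sketch (in Scala the same pool would be wrapped with ExecutionContext.fromExecutorService; the pool size and the task body are made up for illustration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DedicatedPool {
    public static void main(String[] args) throws Exception {
        // A bounded pool dedicated to blocking I/O, instead of the shared
        // global pool that other library code may also be using.
        ExecutorService ioPool = Executors.newFixedThreadPool(8);
        Future<Integer> f = ioPool.submit(() -> {
            // stand-in for an external network call
            return 42;
        });
        System.out.println(f.get());
        ioPool.shutdown();
    }
}
```

Keeping blocking calls on their own pool prevents them from starving whatever shares the global context.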
Hi,
When the web monitor restarts, the uploaded jars disappear; in fact, every time
it restarts the upload directory is different.
This seems intentional:
https://github.com/apache/flink/blob/master/flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/WebRuntimeMonitor.java#L162
You can use the Mahout XmlInputFormat with Flink via the HadoopInputFormat
wrapper. See
http://stackoverflow.com/questions/29429428/xmlinputformat-for-apache-flink
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Read-XML-from-HDFS-td7023.html
On Tue, Jun 7, 2016 at 10:11 PM, Jamie
Hi Andrea,
How large are these data files? The implementation you've mentioned here
is only usable if they are very small. If so, you're fine. If not, read
on...
Processing XML input files in parallel is tricky. It's not a great format
for this type of processing as you've seen. They are
Chesnay: Just want to thank you. I might have one or two related questions later
on, but for now, just thanks.
On Tuesday, June 7, 2016 8:18 AM, Greg Hogan wrote:
"The question is how to encapsulate numerous transformations into one object
or maybe a function in
I'm assuming what you're trying to do is essentially sum over two different
fields of your data. I would do this with my own ReduceFunction.
stream
.keyBy("someKey")
.reduce(CustomReduceFunction) // sum whatever fields you want and return
the result
I think it does make sense that Flink
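A minimal plain-Java sketch of what that ReduceFunction body would do (the Event record and its fields are hypothetical; in Flink this logic would live inside ReduceFunction.reduce and run per key):

```java
import java.util.Arrays;
import java.util.List;

public class TwoFieldSum {
    // Hypothetical record with two numeric fields to sum.
    record Event(String key, long clicks, long views) {}

    // The body a ReduceFunction.reduce(a, b) would have: combine two
    // records of the same key by summing both fields at once.
    static Event reduce(Event a, Event b) {
        return new Event(a.key(), a.clicks() + b.clicks(), a.views() + b.views());
    }

    public static void main(String[] args) {
        List<Event> stream = Arrays.asList(
            new Event("k", 1, 10), new Event("k", 2, 20), new Event("k", 3, 30));
        Event total = stream.stream().reduce(TwoFieldSum::reduce).get();
        System.out.println(total.clicks() + " " + total.views());
    }
}
```

Writing the reduce yourself keeps both running sums in one record instead of chaining two aggregations.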
If you use open vpn for accessing aws then you can use private IP of ec2
machine from your laptop.
Thanks
Ashutosh
On Tue, Jun 7, 2016 at 11:00 PM, Shannon Carey wrote:
> We're also starting to look at automating job deployment/start to Flink
> running on EMR. There are a
Thanks for the clarification.
On Tue, Jun 7, 2016 at 9:15 PM, Aljoscha Krettek
wrote:
> Hi,
> I'm afraid you're running into a bug in the special processing-time
> window operator. A suggested workaround would be to switch to
> characteristic IngestionTime and use
Hi,
I'm afraid you're running into a bug in the special processing-time
window operator. A suggested workaround would be to switch to
characteristic IngestionTime and use TumblingEventTimeWindows.
I also opened a Jira issue for the bug so that we can keep track of it:
On Tue, Jun 7, 2016 at 4:52 AM, Stephan Ewen wrote:
> The concern you raised about the sink being synchronous is exactly what my
> last suggestion should address:
>
> The internal state backends can return a handle that can do the sync in a
> background thread. The sink would
Ah, sorry, you are right. You could also call keyBy again before the
second sum, but maybe someone else has a better idea.
Best,
Gábor
2016-06-07 16:18 GMT+02:00 Al-Isawi Rami :
> Thanks Gábor, but the first sum call will return
>
> SingleOutputStreamOperator
>
> I
Hi all,
I am evaluating Apache Flink for processing large sets of Geospatial data.
The use case I am working on will involve reading a certain number of GPX
files stored on Amazon S3.
GPX files are actually XML files and therefore cannot be read on a
line-by-line basis.
One GPX file will produce
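If each file is small enough to read whole, the natural parallel unit is one file: read each S3 object fully and hand it to an XML parser. A minimal sketch with the JDK's DOM parser (the inline GPX fragment is made-up sample data):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class GpxCount {
    public static void main(String[] args) throws Exception {
        // Tiny inline GPX fragment, standing in for one whole file read
        // from S3. XML must be parsed as a complete document, which is
        // why line-by-line splitting does not work here.
        String gpx = "<gpx><trk><trkseg>"
                + "<trkpt lat=\"60.0\" lon=\"24.0\"/>"
                + "<trkpt lat=\"60.1\" lon=\"24.1\"/>"
                + "</trkseg></trk></gpx>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(gpx.getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.getElementsByTagName("trkpt").getLength());
    }
}
```

Each parsed document can then be flat-mapped into one record per track point.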
Thanks Gábor, but the first sum call will return
SingleOutputStreamOperator
I could not do another sum call on that. Could you tell me how you managed to do
stream.sum().sum()
Regards,
-Rami
On 7 Jun 2016, at 16:13, Gábor Gévay
> wrote:
Hello,
In
After a second look at KryoSerializer I fear that Input and Output are
never closed... am I right?
On Tue, Jun 7, 2016 at 3:06 PM, Flavio Pompermaier
wrote:
> Hi Aljoscha,
> of course I can :)
> Thanks for helping me... do you think it is the right thing to do calling
>
Hello,
In the case of "sum", you can just specify them one after the other, like:
stream.sum(1).sum(2)
This works, because summing the two fields are independent. However,
in the case of "keyBy", the information is needed from both fields at
the same time to produce the key.
Best,
Gábor
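The independence Gábor describes can be sketched in plain Java (the tuple data is made up; in Flink the two steps would be the chained sum(0).sum(1) calls):

```java
import java.util.Arrays;
import java.util.List;

public class ChainedSums {
    public static void main(String[] args) {
        // (f0, f1) tuples, standing in for a Flink tuple stream.
        List<long[]> stream = Arrays.asList(
            new long[]{1, 10}, new long[]{2, 20}, new long[]{3, 30});

        // Each aggregation reads only its own field, so the two can be
        // applied one after the other; a key built from two fields would
        // instead need both values at the same time.
        long sumF0 = stream.stream().mapToLong(t -> t[0]).sum(); // like .sum(0)
        long sumF1 = stream.stream().mapToLong(t -> t[1]).sum(); // like .sum(1)
        System.out.println(sumF0 + " " + sumF1);
    }
}
```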
Hi Aljoscha,
of course I can :)
Thanks for helping me... do you think it is the right thing to do calling
reset()?
Actually, I don't know whether this is meaningful or not, but I already ran
the job successfully once on the cluster (a second attempt is currently
running) after my accidental
The problem is: why is the window end time in the future?
For example, if my window size is 60 seconds and my window is being evaluated at
3:00 pm, then why is the window end time 3:01 pm and not 3:00 pm, even when the
data being evaluated falls in the window 2:59-3:00?
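For reference, tumbling windows in Flink are aligned to the epoch: an element with timestamp ts falls into the window starting at ts - (ts % size), whose end is start + size. A small sketch of that arithmetic (the example timestamp is made up; under this alignment a window evaluated at 3:00 should report an end of 3:00, not 3:01):

```java
public class WindowAlign {
    // Start of the epoch-aligned tumbling window a timestamp falls into.
    static long windowStart(long tsMillis, long sizeMillis) {
        return tsMillis - (tsMillis % sizeMillis);
    }

    public static void main(String[] args) {
        long size = 60_000; // 60-second windows
        long ts = 179_500;  // 2:59.5 into the stream, i.e. inside [2:59, 3:00)
        long start = windowStart(ts, size);
        long end = start + size; // window boundary is the element's minute
        System.out.println(start + " " + end);
    }
}
```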
That's nice. Can you try it on your cluster with an added "reset" call on
the buffer?
On Tue, 7 Jun 2016 at 14:35 Flavio Pompermaier wrote:
> After "some" digging into this problem I'm quite convinced that the
> problem is caused by a missing reset of the buffer during the
Hi,
Is there any reason why "keyBy" accepts multiple fields, while for example "sum"
does not?
-Rami
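The short answer is that keyBy needs all of its fields at the same time to form one composite key, while each sum touches a single field. A plain-Java sketch of grouping by a two-field composite key (the records are made-up sample data):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CompositeKey {
    public static void main(String[] args) {
        // Records with two key fields and a payload, as String arrays.
        List<String[]> events = Arrays.asList(
            new String[]{"a", "x", "1"},
            new String[]{"a", "x", "2"},
            new String[]{"a", "y", "3"});

        // Like keyBy("f0", "f1"): both fields are combined into one key
        // before any grouping happens, so they must be given together.
        Map<List<String>, Long> counts = events.stream()
            .collect(Collectors.groupingBy(
                e -> List.of(e[0], e[1]), Collectors.counting()));
        System.out.println(counts.get(List.of("a", "x")));
    }
}
```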
After "some" digging into this problem I'm quite convinced that the problem
is caused by a missing reset of the buffer during the Kryo deserialization,
similar to what was fixed by FLINK-2800 (
https://github.com/apache/flink/pull/1308/files).
That fix added an output.clear() in
"The question is how to encapsulate numerous transformations into one
object or maybe a function in the Apache Flink Java setting."
Implement CustomUnaryOperation. This can then be applied to a DataSet by
calling `DataSet result = DataSet.runOperation(new MyOperation<>(...));`.
On Mon, Jun 6, 2016
1a. ah. yeah i see how it could work, but i wouldn't count on it in a
cluster.
you would (most likely) run the sub-job (calculating pi) only on a
single node.
1b. different execution environments generally imply different flink
programs.
2. sure it does, since it's a normal flink job.
Hi Elias!
The concern you raised about the sink being synchronous is exactly what my
last suggestion should address:
The internal state backends can return a handle that can do the sync in a
background thread. The sink would continue processing messages, and the
checkpoint would only be
Chesnay:
1a. The code actually works, that is the point.
1b. What restricts a Flink program from having several execution environments?
2. I am not sure that your modification allows for parallelism. Does it?
3. This code is a simple example of writing/organizing large and complicated programs,
could you state a specific problem?
On 07.06.2016 06:40, Soumya Simanta wrote:
I've a simple program which takes some inputs from a command line
(Socket stream) and then aggregates based on the key.
When running this program on my local machine I see some output that
is counter intuitive to
from what i can tell from your code you are trying to execute a job
within a job. This just doesn't work.
your main method should look like this:
public static void main(String[] args) throws Exception {
    double pi = new classPI().compute();
    System.out.println("We estimate Pi to be: " + pi);
}
On
Hi Robert,
Thank you for checking the issue. That INFO is the only information Flink
workers say.
I agree with your point of view. It looks like it closes the connections to all
other topics which are not used (idle), although it's a bit misleading.
Ref: https://github.com/edenhill/librdkafka/issues/437
Hi Jack,
right now this is not possible except by writing a custom operator. We
are working on support for a time-to-live setting on state; this should
solve your problem.
For writing a custom operator, check out DataStream.transform() and
StreamMap, which is the operator implementation for
Thanks for your answer Ufuk.
However, I have been reading about KeySelector and I don't understand
completely how it works with my idea.
I am using an algorithm that gives me a score between some different
strings. My idea is: if the score is higher than 0.80, for example, then
those two strings