Hi!
Fabian refers to the TypeSerializerOutputFormat [1]. You can save your
types in an efficient binary representation by calling
'dataset.write(new TypeSerializerOutputFormat<Type>(), "/your/file/path");'
Greetings,
Stephan
[1]
I compiled from the release-0.8 branch.
On Thu, Feb 5, 2015 at 8:55 AM, Stephan Ewen se...@apache.org wrote:
Hey Robert!
On which version are you? 0.8 or 0.9-SNAPSHOT?
Am 04.02.2015 14:49 schrieb Robert Waury robert.wa...@googlemail.com:
Hi,
I'm suddenly getting FileNotFoundExceptions
Hi!
I would go with the TypeSerializerInputFormat.
Here is a code sample (in Java, Scala should work the same way):
DataSet<MyType> dataSet = ...;
// write it out
dataSet.write(new TypeSerializerOutputFormat<MyType>(), path);
// read
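The read side of the sample is cut off here. A minimal sketch of the full round trip, assuming the 0.8-era API where TypeSerializerInputFormat takes a TypeSerializer (as noted later in this thread, on 0.9-SNAPSHOT it takes the TypeInformation instead) and where createSerializer() takes no arguments:

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.TypeSerializerInputFormat;
import org.apache.flink.api.java.io.TypeSerializerOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TypeInfoParser;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
String path = "/your/file/path";

DataSet<Tuple2<Long, String>> dataSet = env.fromElements(
    new Tuple2<Long, String>(1L, "a"), new Tuple2<Long, String>(2L, "b"));

// write it out in Flink's binary representation
dataSet.write(new TypeSerializerOutputFormat<Tuple2<Long, String>>(), path);

// read it back: the input format needs the serializer of the stored type
TypeInformation<Tuple2<Long, String>> typeInfo =
    TypeInfoParser.parse("Tuple2<Long, String>");
DataSet<Tuple2<Long, String>> readBack = env.readFile(
    new TypeSerializerInputFormat<Tuple2<Long, String>>(typeInfo.createSerializer()),
    path);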
Nice!
BTW: The TypeSerializerInputFormat just changed (in the 0.9-SNAPSHOT
master) so that it now takes the type information, rather than a type
serializer...
Stephan
On Wed, Feb 4, 2015 at 11:52 AM, Vinh June hoangthevinh@gmail.com
wrote:
Thanks,
I just tried and it works with scala
Thanks,
I just tried and it works with scala also.
A small note for anyone who might be interested: the constructor of
TypeSerializerInputFormat needs a TypeSerializer, not a TypeInformation. So
this would work in Scala:
[SCALA]
val readback = env
Hi Stefan,
DataSet.first(n) produces a child DataSet, while I need the element
Specifically, I have a CSV with a header line, and I want to build a map of
(header, value) pairs for each line.
I talked with the admins. The problem seemed to have been that the disk was
full and Flink couldn't create the directory.
Maybe the error message should reflect whether that is the cause.
While cleaning up the disk we noticed that a lot of temporary blobStore
files were not deleted by Flink after
Hi Vinh,
have a look at the first function:
http://flink.apache.org/docs/0.8/dataset_transformations.html#first-n
Stefan
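For Vinh's question below, a minimal sketch of first(n) (names and path are illustrative; note the result is still a DataSet, not a local value):

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<String> lines = env.readTextFile("file:///path/to/input");
lines.first(1).print();   // keeps (up to) 1 element, emitted via a print sink
env.execute();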
On 5 February 2015 at 15:14, Vinh June hoangthevinh@gmail.com wrote:
Hi,
Is there any way to get 1 element of a DataSet, for example:
I think that process killing (HALT signal) is a very typical way in Linux
to shut down processes. It is the most robust way, since it does not
require sending any custom messages to the process.
This is sort of graceful, as the JVM gets the signal and may do a lot of
things before shutting down,
Hi,
Thank you for your reply. It helped us solve the looping problems in a nicer
way.
We are struggling with some aspects of the cross function.
Still trying to implement the Apriori algorithm, we need to create
combinations of frequent itemSets.
Our problem is that the crossing gives us
Hi Yiannis,
if you scale Flink to larger setups you need to adapt the number of network
buffers.
The background section of the configuration reference explains the details
on that [1].
Let us know, if that helped to solve the problem.
Best, Fabian
[1]
Hi there,
not sure if its a bug of 0.9-SNAPSHOT version, or me just doing something
wrong.
I have this simple scala program:
val input = env.readTextFile("hdfs://my.host.com:54310/home/sampleRaw")
input.map(e => e.split(",")).filter(e => e.length >= 4)
  .map(e => (e(1), e(2), e(3).toDouble))
  .groupBy(0, 1)
Hi,
you are doing everything correct.
This is a bug in the Flink runtime.
I created a JIRA (https://issues.apache.org/jira/browse/FLINK-1574) and
will push a fix later this evening once all tests have passed.
Thanks for reporting the issue!
Cheers, Fabian
2015-02-17 18:00 GMT+01:00 Yiannis
Have you tried to increase the heap size by shrinking the TM-managed memory?
Reduce the fraction (taskmanager.memory.fraction) or fix the amount of TM
memory (taskmanager.memory.size) in the flink-config.yaml [1].
Cheers, Fabian
[1] http://flink.apache.org/docs/0.8/config.html
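For reference, the corresponding flink-conf.yaml entries might look like this (the values are illustrative, not recommendations from this thread):

# flink-conf.yaml
# lower the fraction of the heap that Flink manages itself ...
taskmanager.memory.fraction: 0.5
# ... or pin the managed memory to a fixed amount in megabytes
# taskmanager.memory.size: 256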
On 20 Feb
What happens (in the original stack trace) is the following: The serializer
starts producing the byte stream data and we buffer it, to determine the
length, before sending it over the network. While buffering that data, the
memory runs out.
It may be that you are simply short of memory, it may
I've just looked into the BitSetSerializer of Chill, and it seems to be true
that each bit is encoded as a boolean (for all bit positions <= the logical
length).
Regarding the DataOutputSerializer: it would help to catch OoM exceptions during
resize operations and rethrow them with a more detailed
Hi Flavio!
In Scala:
You can do that, using the "pimp my library" pattern. Define your own data
set (MyDataSet) that has the method myFunction() and define an implicit
conversion from DataSet to MyDataSet. See here for more details:
The topic will be co-presented by Gyula and myself.
Let's keep our fingers crossed for the other Flink talks. :)
On Fri, Jan 30, 2015 at 9:50 AM, Till Rohrmann trohrm...@apache.org wrote:
Great news Márton. Congrats!
On Jan 30, 2015 2:41 AM, Henry Saputra henry.sapu...@gmail.com wrote:
W00t!
Max's answer is the best approach for pre-packaged Jar programs.
In addition, you have the RemoteEnvironment in Flink. It allows you to
write constructs like this:
public class MyProgram {
private static final String JAR_FILE_PATH = "/path/to/jar";
public static void main(String[] args)
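The sample is cut off here. A complete sketch along those lines (host, port, and the job body are assumptions, not from the thread):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class MyProgram {
    private static final String JAR_FILE_PATH = "/path/to/jar";

    public static void main(String[] args) throws Exception {
        // submit against a running cluster; the jar containing this
        // program's classes is shipped to the JobManager
        ExecutionEnvironment env = ExecutionEnvironment
            .createRemoteEnvironment("jobmanager-host", 6123, JAR_FILE_PATH);

        DataSet<String> text = env.readTextFile("hdfs:///path/to/input");
        text.writeAsText("hdfs:///path/to/output");
        env.execute("remote example");
    }
}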
Hi,
I'm trying to write a simple join in flink-scala but for some reasons
flink fails to compile my code. I've tried several reformulations but
can't get rid of the error. Can you tell me how to fix this piece of
code? I'm using flink 0.8.0.
Sebastian
Hi,
does the default configuration of Flink use all of the available memory? I
mean the physical memory installed in the computer, not whatever amount of
memory the JVM allocates by default.
Cheers,
Viktor
Hi,
In graph api there's an single source shortest path library.
DataSet<Vertex<Long, Double>> singleSourceShortestPaths =
    graph.run(new SingleSourceShortestPaths<Long>(srcVertexId,
        maxIterations)).getVertices();
For Multiple Source, would it be possible to run it for all nodes
Hi,
Flink uses only the memory which is configured to the JobManager and
TaskManager JVMs.
By default this is 256MB for the JM and 512MB for the TM (see [1] for
details).
The TM memory is split into equally large chunks for each configured slot
of the TM.
You should definitely configure the TM
From the Flink documentation:
Conditions for a class to be treated as a POJO by Flink:
- The class must be public
- It must have a public constructor without arguments
- All fields either have to be public or there must be getters and
setters for all non-public fields.
In your example, the
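For reference, a minimal class meeting the conditions above might look like this (an illustrative sketch):

// satisfies all of the POJO conditions listed above
public class Person {
    public String name;       // public field

    private int age;          // non-public field with getter and setter

    public Person() {}        // public no-argument constructor

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}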
In case you want to contribute or follow the discussion, here's the JIRA:
https://issues.apache.org/jira/browse/FLINK-1511
Again, thanks for reporting!
2015-02-11 9:59 GMT+01:00 Fabian Hueske fhue...@gmail.com:
Hi,
you are right, there is a problem. I reproduced the problem and it seems
Hi everyone,
From playing around a bit with delta iterations, I saw that you can
update elements from the solution set and add new elements. My question is: is
it possible to remove elements from the solution set (apart from marking them
as deleted somehow)?
My use case at hand for
Hi,
I managed to reproduce the behavior and as far as I can tell it seems to be
a problem with the memory allocation.
I have filed a bug report in JIRA to get the attention of somebody who
knows the runtime better than I do.
https://issues.apache.org/jira/browse/FLINK-1734
Cheers,
Robert
On
Hi Mihail, Robert,
I've tried reproducing this, but I couldn't.
I'm using the same twitter input graph from SNAP that you link to and also
Scala IDE.
The job finishes without a problem (both the SSSP example from Gelly and
the unweighted version).
The only thing I changed to run your version was
This job probably suffers from overly conservative memory assignment,
giving the solution set too little memory.
Can you try to make the solution set unmanaged, excluding it from Flink's
memory management? That may help with the problem.
See here:
I think this is due to a change introduced by Fabian to fix the issue of
trailing slashes. May have this side effect...
I agree, this is critical and should be fixed soon...
On Tue, Mar 17, 2015 at 3:41 PM, Dániel Bali balijanosdan...@gmail.com
wrote:
Hi!
I fetched the new updates from the
Hi Stephan,
The problem is that file:/C: is evaluated as a non-absolute path in
`Path.isAbsolute`
This hack seems to fix the issue:
in Path.java from line 318:
public boolean isAbsolute() {
    final int start = hasWindowsDrive(uri.getPath(), true) ? 3 : 0;
    if (uri.getPath().length()
Thanks for your answer. I guess I'm a bit infected by writing too much
Crunch code, and I also suspected that getDataSet() was the wrong thing to
do :-)
However I was expecting DataSet.sortPartition to do the sorting, but this
method is missing in 0.8.1?
Do you have a minimal example? I was
Hi!
I think sortPartition is the right thing if you have only one partition
(which makes sense if you want a total order). It is not a parallel
operation any more, so use it only after the data size has been reduced
(filters / aggregations).
What about data.sortPartition().setParallelism(1).
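Spelled out, the call might look like this (field index and order are illustrative; sortPartition takes a sort field and an Order, and 'data' stands for the DataSet under discussion):

import org.apache.flink.api.common.operators.Order;

// sort the single partition on field 0; parallelism 1 yields a total order
data.sortPartition(0, Order.ASCENDING).setParallelism(1);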
Hi,
@Emmanuel:
Is the Flink behavior mentioned native or is this something happening when
running Flink on YARN?
The input split assignment behavior Stephan described is implemented into
Flink, so it works in a standalone Flink cluster and in a YARN setup.
In a setup where each machine running a
Hi!
If you are programming in Scala, you can always use Option[String] for an
optional String field.
Stephan
On Mon, Mar 16, 2015 at 4:57 PM, pietro pietro.pin...@gmail.com wrote:
I have to implement a program based on Flink that process some records.
The peculiarity of those records is
Ah, okay. Then how about using a List of Strings?
On Mon, Mar 16, 2015 at 5:34 PM, pietro pietro.pin...@gmail.com wrote:
Hi Stephan, thanks for the reply!
My problem is that I cannot know whether I will have 0, 1, 2, ... or more
strings. Then, Option is not gonna help in my case :(
Sure, that is totally possible. They can be used with any function.
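For a join, a sketch might look like this (illustrative names; inside a rich function the broadcast set is read via the runtime context):

import java.util.Collection;
import org.apache.flink.api.common.functions.RichJoinFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<Tuple2<Integer, String>> ds1 = env.fromElements(new Tuple2<Integer, String>(1, "a"));
DataSet<Tuple2<Integer, String>> ds2 = env.fromElements(new Tuple2<Integer, String>(1, "b"));
DataSet<Integer> toBroadcast = env.fromElements(1, 2, 3);

ds1.join(ds2).where(0).equalTo(0)
   .with(new RichJoinFunction<Tuple2<Integer, String>, Tuple2<Integer, String>, String>() {
       @Override
       public String join(Tuple2<Integer, String> left, Tuple2<Integer, String> right) {
           // the broadcast data set is available under its registered name
           Collection<Integer> bc = getRuntimeContext().getBroadcastVariable("bcSet");
           return left.f1 + right.f1 + bc.size();
       }
   })
   .withBroadcastSet(toBroadcast, "bcSet");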
On Mon, Mar 16, 2015 at 7:04 PM, Vinh June hoangthevinh@gmail.com
wrote:
hello,
Is it possible to use .withBroadcastSet in operations other than Map, say
Join for example?
FYI
Posted my dev cluster deployment in Docker here:
https://github.com/streamnsight/docker-flink
Still need to work on aggregating the logs, but I hope it can get people
started easily.
Cheers
Great addition!
On Tue, Mar 17, 2015 at 5:11 AM, Emmanuel ele...@msn.com wrote:
FYI
Posted my dev cluster deployment in Docker here:
https://github.com/streamnsight/docker-flink
Still need to work on aggregating the logs, but I hope it can get people
started easily.
Cheers
Hey Emmanuel,
thank you for this great contribution. I'm going to test the docker
deployment soon.
I would actually like to include the files into the Flink source repository.
Either into the flink-contrib module, or into the tools directory.
What's the take of the other committers on this?
On
No way... that was it!? :))) Big thanks! :)
The result is also correct now.
Cheers,
M.
On 18.03.2015 22:49, Vasiliki Kalavri wrote:
haha, yes, actually I just confirmed!
If I flip my args, I get the error you mention in the first e-mail.
you're trying to generate a graph giving the
hello ,
I have been making some research on
https://issues.apache.org/jira/browse/FLINK-1617 using
http://hirzels.com/martin/papers/csur14-streamopt.pdf and others. I
found out that there are many optimization techniques available, like
1. OPERATOR REORDERING
2. REDUNDANCY ELIMINATION
3. OPERATOR
Hi Flavio!
The issue that abstract classes and interfaces are not supported is
definitely fixed in 0.9.
Your other fix (adding the call for configuring the output format) - is
that always needed, or just important in a special case? How has the output
format worked before?
If this is critical
0.8.1
On Fri, Mar 20, 2015 at 6:11 PM, Stephan Ewen se...@apache.org wrote:
Hi Flavio!
Is this on Flink 0.9-SNAPSHOT or 0.8.1 ?
Stephan
On Fri, Mar 20, 2015 at 6:03 PM, Flavio Pompermaier pomperma...@okkam.it
wrote:
To make it work I had to clone the Flink repo, import the flink-java
Hello,
I am thinking of implementing the following optimization algorithms
for GSOC2015
These are supposed to do statistical graph analysis and streaming
graph optimization
1. OPERATOR REORDERING
Means changing the order in which the operators appear in the stream
graph to eliminate overheads.
Hi Robert,
thank you for your reply.
I'm starting the job from the Scala IDE. So only one JobManager and one
TaskManager in the same JVM.
I've doubled the memory in the eclipse.ini settings but I still get the
Exception.
-vmargs
-Xmx2048m
-Xms100m
-XX:MaxPermSize=512m
Best,
Mihail
On
Hi!
I fetched the new updates from the master branch recently and now all tests
fail on Windows.
Here is a full stack trace:
https://gist.github.com/balidani/f429b62208ea90015435
The problem appears to be here:
Caused by: java.lang.IllegalArgumentException:
java.net.URISyntaxException:
Hi,
I created a PR, I hope it's ok like this:
https://github.com/apache/flink/pull/491
2015-03-17 16:51 GMT+01:00 Stephan Ewen se...@apache.org:
Looks fine. Can you make a pull request with that fix?
On Tue, Mar 17, 2015 at 4:41 PM, Dániel Bali balijanosdan...@gmail.com
wrote:
Hi Stephan,
Hi Fabian,
I was trying to use the strategy you suggested with flink 0.8.1 but it
seems that the union of the datasets cannot be created programmatically
because the union operator gives a name to the generated dataset that is
the name of the calling function so that only the first dataset is
Hi Emmanuel!
I think there is currently no way to tell the scheduler to pick the least
loaded host. That would be a feature we would need to add to Flink.
If you want, open a feature request, and give us some more information on
how you would expect this to behave.
Greetings,
Stephan
Am
Thanks for the clarification, Gyula.
I had 9 slots per node; however, one node only has 3 CPUs. So, my parallelism
here was 9, and all tasks were allocated to the 9 slots on the one host.
I understand the strategy of trying to minimize network IOs by sending to the
same host, but in this case where
Hi,
I did change my config to have parallelism of 9 and 3 slots on each machine and
now it does distribute properly.
The other day I was told I could have many more slots than CPUs available and
the system would distribute the load properly between the hosts with available
CPU time, but it
Hi Emmanuel,
We are using the GitHub mirror [1] to accept patches and Pull Requests (PR).
I have a ticket item to update the how-to-contribute page about sending
pull requests for Flink, but it is the same as sending a PR to other
Github repos [2].
Like Robert said before, we could do it for you but if
My Java is still rusty and I often run into "OutOfMemoryError: GC overhead
limit exceeded"...
Yes, I need to look for memory leaks...
But first I need to clear up this memory so I can run again without having to
shut down and restart everything.
I've tried using the jcmd <pid> GC.run command on each of the
Dear all,
I have been developing a Flink application that has to run on Amazon Elastic
Map Reduce.
For convenience the data that the application has to read and write are on
the S3.
But I have not been able to access S3. This is the error I got:
Thank you Ufuk! That helped a lot.
But I have another problem now.
Am I missing something?
Caused by: java.net.UnknownHostException: MYBUCKETNAME
at java.net.InetAddress.getAllByName0(InetAddress.java:1250)
at java.net.InetAddress.getAllByName(InetAddress.java:1162)
Hey Pietro!
You have to add the following lines to your flink-conf.yaml:
fs.s3.accessKey: YOUR ACCESS KEY
fs.s3.secretKey: YOUR SECRET KEY
I will fix the error message to include a hint on how to configure this
correctly.
– Ufuk
On Tue, Mar 31, 2015 at 10:53 AM, pietro
Hi Emmanuel,
In Java, the garbage collector will always run periodically. So remotely
executing it won't make any difference.
If you want to reuse the existing Java process without restarting it, you
have to stop the program code from executing which is causing the
OutOfMemoryError. Usually,
hello
I have updated the proposal on Melange and would like you to appraise it
so that I can make any necessary refinements before it is too late for
any corrections.
Thanks @gyfora , @rmetzger, @senorcarbone @mbalassi
--
Wepngong Ngeh Benaiah
Max,
Thanks for the answer...
What I am saying is that my program is indeed not running, yet it doesn't seem
garbage collection occurs after cancelling the job. As you saw in the log, the
memory is still 99% used even though I cancelled the job, and I cannot seem to
run another job. I've had
Any news about this? Could someone look into the problem or should I open a
ticket in JIRA?
On Sun, Mar 22, 2015 at 12:09 PM, Flavio Pompermaier pomperma...@okkam.it
wrote:
Hi Stephan,
the problem is when you try to write into HBase with the
HadoopOutputFormat.
Unfortunately the recordWriter
Hi there,
is it possible to write the results to HDFS in different files based on a
field of a tuple?
Something similar to this:
http://stackoverflow.com/questions/23995040/write-to-multiple-outputs-by-key-spark-one-spark-job
Thanks a lot!
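One workaround sometimes used for this (a sketch under the assumption of a small, known set of key values; not an answer given in this thread) is to fan out with one filter and one sink per key:

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<Tuple2<String, Integer>> data = env.fromElements(
    new Tuple2<String, Integer>("a", 1), new Tuple2<String, Integer>("b", 2));

// one filter + sink per known key value
for (final String key : new String[] {"a", "b"}) {
    data.filter(new FilterFunction<Tuple2<String, Integer>>() {
            @Override
            public boolean filter(Tuple2<String, Integer> t) {
                return t.f0.equals(key);
            }
        })
        .writeAsCsv("hdfs:///output/" + key);
}
env.execute();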
Hi guys,
I'm Giacomo from Italy, and I'm a newbie with Flink.
I set up a cluster with Hadoop 1.2 and Flink.
I would like to ask you how to run the WordCount example taking the
input file from HDFS (for example myuser/testWordCount/hamlet.txt) and
putting the output also inside HDFS (for example
Thank you! This completely solves the problem.
Thanks a lot Marton and Max, it worked perfectly.
Regards from Italy :)
On Tue, Feb 24, 2015 at 11:31 AM, Max Michels m...@apache.org wrote:
Hi Giacomo,
Congratulations on setting up a Flink cluster with HDFS :) To run the
WordCount example provided with Flink, you should first upload your
Hi,
non-equi joins are only supported by building the cross product.
This is essentially the nested-loop join strategy that a conventional
database system would choose. However, such joins are prohibitively
expensive when applied to large data sets.
If you have one small and another large data
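Along those lines, a non-equi join can be sketched as a cross with a filter (illustrative names; crossWithTiny hints to the optimizer that one input is small):

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<Tuple2<Integer, String>> small = env.fromElements(new Tuple2<Integer, String>(1, "s"));
DataSet<Tuple2<Integer, String>> large = env.fromElements(new Tuple2<Integer, String>(2, "l"));

// nested-loop style: build all pairs, keep those satisfying the predicate
large.crossWithTiny(small)
     .filter(new FilterFunction<Tuple2<Tuple2<Integer, String>, Tuple2<Integer, String>>>() {
         @Override
         public boolean filter(Tuple2<Tuple2<Integer, String>, Tuple2<Integer, String>> pair) {
             return pair.f0.f0 < pair.f1.f0;   // e.g. a "less than" join condition
         }
     });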
No worries ;-)
Looking forward to meet you as well!
On Mar 26, 2015 6:58 PM, Henry Saputra henry.sapu...@gmail.com wrote:
Oh my goodness, I am so sorry Fabian =(
I sent the email out in the morning before I hit my coffee.
Looking forward meeting you at the ApacheCon, Fabian =)
- Henry
Just a quick ping on this for the streaming folks: The deadline for the
proposal submissions is Friday, so the GSoC applicants need to get our
feedback asap.
The student asked me today in the #flink channel whether we can review this
proposal.
I have the following comments regarding the
I just tried the image/scripts: Amazing work! Very well documented, also
for docker beginners like me.
@Emmanuel: Would you like to open a pull request to add the files to the
flink-contrib directory?
If you're too busy right now, I can also do it for you (only if you agree
of course ;)) If you
Hi all,
there is no Scala API for Gelly yet and no corresponding JIRA either.
It's definitely in our plans, just not for 0.9 :-)
Cheers,
-V.
On 24 March 2015 at 00:21, Henry Saputra henry.sapu...@gmail.com wrote:
Any JIRA filed to add Scala counterparts for Gelly?
- Henry
On Mon, Mar 23,
haha, yes, actually I just confirmed!
If I flip my args, I get the error you mention in the first e-mail. you're
trying to generate a graph giving the edge list as a vertex list and this
is a way too big dataset for your memory settings (cmp. ~15m edges vs. the
actual 400k).
I hope that clear
Hi Mihail,
I used your code to generate the vertex file, then gave this and the edge
list as input to your SSSP implementation and still couldn't reproduce the
exception. I'm using the same local setup as I describe above.
I'm not aware of any recent changes that might be relevant, but, just in
I'm also using 0 as sourceID. The exact program arguments:
0 /home/vieru/dev/flink-experiments/data/social_network.edgelist
/home/vieru/dev/flink-experiments/data/social_network.verticeslist
/home/vieru/dev/flink-experiments/sssp-output-higgstwitter 10
And yes, I call both methods on the
Creating a JIRA issue never hurts.
Have you tried to add your code snippet to the
HadoopOutputFormatBase.configure() method? Seems to me the right place for
it.
Do you want to open a PR for that?
2015-03-23 16:01 GMT+01:00 Flavio Pompermaier pomperma...@okkam.it:
Any news about this? Could
Congrats Marton and everybody who is working on Flink Streaming!
Looking forward to the blog post :-)
2015-01-30 1:06 GMT+01:00 Márton Balassi mbala...@apache.org:
Hi everyone,
Thanks for your support for the Flink talks at the community choice for
the next Hadoop Summit Europe. The results
W00t! Congrats guys!
On Thu, Jan 29, 2015 at 4:06 PM, Márton Balassi mbala...@apache.org wrote:
Hi everyone,
Thanks for your support for the Flink talks at the community choice for the
next Hadoop Summit Europe. The results are out. [1]
Our submission Real-Time Stream Processing with Apache
Impressive number of votes. Congratulations, Márton!
On Fri, Jan 30, 2015 at 9:50 AM, Till Rohrmann trohrm...@apache.org wrote:
Great news Márton. Congrats!
On Jan 30, 2015 2:41 AM, Henry Saputra henry.sapu...@gmail.com wrote:
W00t! Congrats guys!
On Thu, Jan 29, 2015 at 4:06 PM, Márton
Maven seems to be unable to find the artifact, I also can't find it
under mvn repository:
http://mvnrepository.com/search?q=flink
Best,
Sebastian
On 23.03.2015 23:10, Andra Lungu wrote:
Hi Sebastian,
For me it works just as described there, with 0.9, but there should be
no problem for
No, I haven't. There are some points that are not clear to me:
1) Why do the parameters I set in the job configuration get lost when arriving
at the job and task managers?
2) Do you think I should put the setConf in the configure method? What is the
lifecycle of the OutputFormat?
3) Is it really
Hi,
Is gelly already usable in the 0.8.1 release? I tried adding
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-gelly</artifactId>
    <version>0.8.1</version>
</dependency>
as described in [1], but my project fails to build.
Best,
Sebastian
[1]
Hi Sebastian,
For me it works just as described there, with 0.9, but there should be no
problem for 0.8.1.
Here is an example pom.xml
https://github.com/andralungu/gelly-partitioning/blob/first/pom.xml
Hope that helps!
Andra
On Mon, Mar 23, 2015 at 11:02 PM, Sebastian ssc.o...@googlemail.com
Hi,
Gelly is not part of any official Flink release.
You have to use a snapshot version of Flink if you want to try it out.
Sent from my iPhone
On 23.03.2015, at 23:10, Andra Lungu lungu.an...@gmail.com wrote:
Hi Sebastian,
For me it works just as described there, with 0.9, but there
Is Gelly supposed to be usable from Scala? It looks as if it is hardcoded
to use the Java API.
Best,
Sebastian
On 23.03.2015 23:15, Robert Metzger wrote:
Hi,
Gelly is not part of any official Flink release.
You have to use a Snapshot version of Flink if you want to try it out.
Sent from my
For now it just works with the Java API.
On Mon, Mar 23, 2015 at 11:42 PM, Sebastian ssc.o...@googlemail.com wrote:
Is Gelly supposed to be usable from Scala? It looks as if it is hardcoded to
use the Java API.
Best,
Sebastian
On 23.03.2015 23:15, Robert Metzger wrote:
Hi,
Gelly is not
Hello,
I'm implementing an algorithm which requires nested iterations, and, from
what I understood, this feature was not yet available in Flink [1], and my
experiments with 2 nested bulk iterations seem to confirm that. However I
came across a Flink unit test [2] using nested iterations,
Hi guys,
I have a question about how parallelism works.
If I have a large dataset that I divide into 5 blocks, can I pass
each block of data to a fixed parallel process (for example, if I set up 5
processes)?
And if the results data from each process arrive to the output not in an
ordered
Hi all,
I wonder if the coGroup operator has the ability to sink two outputs
simultaneously. I am trying to mock it by calling a function inside the
operator, in which I sink the first output, and get the second output
myself.
I am not sure if this is the best way, and I would like to hear your
Thanks for the prompt reply.
Maybe the expression "sink" is not suitable for what I need. What if I want
to *collect* two data sets directly from the coGroup operator? Is there
any way to do so?
As far as I know, the operator has only one Collector object, but I wonder if
there is another feature in Flink
Hi,
you can write the output of a coGroup operator to two sinks:
--\            /-- Sink1
   \          /
   (CoGroup)
   /          \
--/            \-- Sink2
You can actually write to as many sinks as you want.
Note that the data written to Sink1 and Sink2 will be
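In code, the pattern might be sketched like this (a trivial CoGroupFunction with illustrative names and paths):

import org.apache.flink.api.common.functions.CoGroupFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<Tuple2<Long, String>> left = env.fromElements(new Tuple2<Long, String>(1L, "a"));
DataSet<Tuple2<Long, String>> right = env.fromElements(new Tuple2<Long, String>(1L, "b"));

DataSet<Tuple2<Long, String>> result =
    left.coGroup(right).where(0).equalTo(0)
        .with(new CoGroupFunction<Tuple2<Long, String>, Tuple2<Long, String>,
                Tuple2<Long, String>>() {
            @Override
            public void coGroup(Iterable<Tuple2<Long, String>> first,
                                Iterable<Tuple2<Long, String>> second,
                                Collector<Tuple2<Long, String>> out) {
                // forward both groups; real logic would go here
                for (Tuple2<Long, String> t : first) { out.collect(t); }
                for (Tuple2<Long, String> t : second) { out.collect(t); }
            }
        });

// the same intermediate result feeds two different sinks
result.writeAsCsv("hdfs:///path/sink1");
result.writeAsText("hdfs:///path/sink2");
env.execute();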
Hi Giacomo,
If I understand you correctly, you want your Flink job to execute with a
parallelism of 5. Just call setDegreeOfParallelism(5) on your
ExecutionEnvironment. That way, all operations, when possible, will be
performed using 5 parallel instances. This is also true for the DataSink
which
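In code, that is a single call on the environment (a sketch; in later Flink versions the method is named setParallelism):

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
env.setDegreeOfParallelism(5);   // all operators run with 5 parallel instances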
Hello there,
What functions should be used to aggregate (unordered) tuples for every
window in a WindowedDataStream to a (ordered) list?
Neither foldWindow nor reduceWindow seems to be applicable, and
aggregate does not, to my understanding, take user-defined functions.
To get started with
Hi Max,
thank you for your reply.
Does the DataSink contain the data ordered? I mean, does it contain output1,
output2 ... output5 in order? Or are they mixed?
Thanks a lot,
Giacomo
On Tue, Apr 14, 2015 at 11:58 AM, Maximilian Michels m...@apache.org wrote:
Hi Giacomo,
If I understand you correctly, you
Hi Giacomo,
If you use a FileOutputFormat as a DataSink (e.g. as in
env.writeAsText("/path")), then the output directory will contain 5 files
named 1, 2, 3, 4, and 5, each containing the output of the corresponding
task. The order of the data in the files follows the order of the
distributed
Woot! Great news!
On Monday, April 13, 2015, Kostas Tzoumas ktzou...@apache.org wrote:
We are very excited to announce Flink 0.9.0-milestone1, a preview release
to give users early access to some Flink 0.9.0 features, including:
- A Table API for SQL-like queries embedded in Java and Scala
If your inner iterations happens to work only on the data of a single
partition, then you can also implement this iteration as part of a
mapPartition operator. The only problem there would be that you have to
keep all the partition's data on the heap, if you need access to it.
Cheers,
Till
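A sketch of that mapPartition pattern (illustrative; 'data' is assumed to be a DataSet<Long>, and the whole partition is buffered on the heap, as Till notes):

import java.util.ArrayList;
import java.util.List;
import org.apache.flink.api.common.functions.MapPartitionFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.util.Collector;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<Long> data = env.generateSequence(1, 100);

data.mapPartition(new MapPartitionFunction<Long, Long>() {
    @Override
    public void mapPartition(Iterable<Long> values, Collector<Long> out) {
        // materialize the partition, then iterate over it locally
        List<Long> buffer = new ArrayList<Long>();
        for (Long v : values) {
            buffer.add(v);
        }
        for (int i = 0; i < 10; i++) {
            // ... inner iteration logic over 'buffer' goes here ...
        }
        for (Long v : buffer) {
            out.collect(v);
        }
    }
});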
On
Hi Benoît!
You are right, the nested iterations are currently not supported.
The test you found actually checks that the Optimizer gives a good error
message when encountering nested iterations.
Can you write your program as one iteration (the inner one) and start the
program multiple times to
Thanks for your quick answers!
The algorithm is the following: I've got a spatial set of data and I want
to find dense regions. The space is beforehand discretized into cells of
a fixed size. Then, for each dense cell (1st iteration), starting with the
most dense, the algorithm tries to
Hello,
After a quite successful benchmark yesterday (Flink being about twice as fast
as Spark on my use cases), I’ve turned instantly from spark-fan to flink-fan
– great job, committers!
So I’ve decided to port my existing Spark tools to Flink. Happily, most of the
difficulty was renaming
Would you elaborate on your use case? It would help to find out where Flink
shines. IMO, it's a great project, but it needs more differentiation from Spark.
On Thu, Apr 23, 2015 at 7:25 AM, LINZ, Arnaud al...@bouyguestelecom.fr
wrote:
Hello,
After a quite successful benchmark yesterday (Flink
I've searched within Flink for a working example of TypeSerializerOutputFormat
usage, but I didn't find anything usable.
Could you show me a simple snippet of code?
Do I have to configure BinaryInputFormat.BLOCK_SIZE_PARAMETER_KEY? Which
size do I have to use? Will flink write a single file or a set