Re: CSV input with unknown # of fields and Custom output format

2015-02-04 Thread Stephan Ewen
Hi! Fabian refers to the TypeSerializerOutputFormat [1]. You can save your types in an efficient binary representation by calling 'dataset.write(new TypeSerializerOutputFormat<Type>(), "/your/file/path");' Greetings, Stephan [1]

Re: Job fails with FileNotFoundException from blobStore

2015-02-05 Thread Robert Waury
I compiled from the release-0.8 branch. On Thu, Feb 5, 2015 at 8:55 AM, Stephan Ewen se...@apache.org wrote: Hey Robert! On which version are you? 0.8 or 0.9-SNAPSHOT? On 04.02.2015 14:49, Robert Waury robert.wa...@googlemail.com wrote: Hi, I'm suddenly getting FileNotFoundExceptions

Re: CSV input with unknown # of fields and Custom output format

2015-02-04 Thread Stephan Ewen
Hi! I would go with the TypeSerializerInputFormat. Here is a code sample (in Java, Scala should work the same way): DataSet<MyType> dataSet = ...; // write it out dataSet.write(new TypeSerializerOutputFormat<MyType>(), path); // read
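
A minimal round-trip sketch in Java, assuming the 0.9-SNAPSHOT API discussed in this thread; the tuple type and the file path are placeholders:

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.io.TypeSerializerInputFormat;
    import org.apache.flink.api.java.io.TypeSerializerOutputFormat;
    import org.apache.flink.api.java.tuple.Tuple2;

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<Tuple2<String, Integer>> dataSet =
            env.fromElements(new Tuple2<String, Integer>("a", 1));

    // write it out in Flink's binary serialization format
    dataSet.write(new TypeSerializerOutputFormat<Tuple2<String, Integer>>(), "/your/file/path");

    // read it back; in 0.9-SNAPSHOT the input format takes the TypeInformation
    DataSet<Tuple2<String, Integer>> readBack = env.readFile(
            new TypeSerializerInputFormat<Tuple2<String, Integer>>(dataSet.getType()),
            "/your/file/path");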

Re: CSV input with unknown # of fields and Custom output format

2015-02-04 Thread Stephan Ewen
Nice! BTW: The TypeSerializerInputFormat just changed (in the 0.9-SNAPSHOT master) so that it now takes the type information, rather than a type serializer... Stephan On Wed, Feb 4, 2015 at 11:52 AM, Vinh June hoangthevinh@gmail.com wrote: Thanks, I just tried and it works with scala

Re: CSV input with unknown # of fields and Custom output format

2015-02-04 Thread Vinh June
Thanks, I just tried and it works with Scala also. A small note for anyone who might be interested: the constructor of TypeSerializerInputFormat needs a TypeSerializer, not a TypeInformation. So this would work in Scala: val readback = env

Re: Get 1 element of DataSet

2015-02-05 Thread Vinh June
Hi Stefan, DataSet.first(n) produces a child DataSet, while I need the element itself. Specifically, I have a CSV with a header line and I want to build the map of (header, value) pairs for each line

Re: Job fails with FileNotFoundException from blobStore

2015-02-05 Thread Robert Waury
I talked with the admins. The problem seemed to have been that the disk was full and Flink couldn't create the directory. Maybe the error message should reflect if that is the cause. While cleaning up the disk we noticed that a lot of temporary blobStore files were not deleted by Flink after

Re: Get 1 element of DataSet

2015-02-05 Thread Stefan Bunk
Hi Vinh, have a look at the first function: http://flink.apache.org/docs/0.8/dataset_transformations.html#first-n Stefan On 5 February 2015 at 15:14, Vinh June hoangthevinh@gmail.com wrote: Hi, Is there any way to get 1 element of a DataSet, for example:
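
A tiny sketch of what Stefan suggests; note that first(n) still yields a DataSet of up to n elements, not the element itself, which is exactly Vinh's follow-up point in this thread:

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<String> lines = env.fromElements("header", "row1", "row2");
    // first(1) is still a DataSet, with at most one element
    lines.first(1).print();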

Re: Job fails with FileNotFoundException from blobStore

2015-02-05 Thread Stephan Ewen
I think that process killing (HALT signal) is a very typical way in Linux to shut down processes. It is the most robust way, since it does not require sending any custom messages to the process. This is sort of graceful, as the JVM gets the signal and may do a lot of things before shutting down,

Re: flink loop

2015-02-08 Thread tanguy racinet
Hi, Thank you for your reply. It helped us solve the looping problems in a nicer way. We are struggling with some aspects of the cross function. Still trying to implement the Apriori algorithm, we need to create combinations of frequent itemSets. Our problem is that the crossing gives us

Re: Exception: Insufficient number of network buffers: required 120, but only 2 of 2048 available

2015-02-18 Thread Fabian Hueske
Hi Yiannis, if you scale Flink to larger setups you need to adapt the number of network buffers. The background section of the configuration reference explains the details on that [1]. Let us know, if that helped to solve the problem. Best, Fabian [1]

Exception in Simple Job: Thread 'SortMerger spilling thread' terminated due to an exception: The user-defined combiner failed in its 'open()' method

2015-02-17 Thread Yiannis Gkoufas
Hi there, not sure if it's a bug of the 0.9-SNAPSHOT version, or me just doing something wrong. I have this simple scala program: val input = env.readTextFile("hdfs://my.host.com:54310/home/sampleRaw") input.map(e => e.split(",")).filter(e => e.length >= 4) .map(e => (e(1), e(2), e(3).toDouble)) .groupBy(0, 1)

Re: Exception in Simple Job: Thread 'SortMerger spilling thread' terminated due to an exception: The user-defined combiner failed in its 'open()' method

2015-02-17 Thread Fabian Hueske
Hi, you are doing everything correct. This is a bug in the Flink runtime. I created a JIRA (https://issues.apache.org/jira/browse/FLINK-1574) and will push a fix later this evening once all tests have passed. Thanks for reporting the issue! Cheers, Fabian 2015-02-17 18:00 GMT+01:00 Yiannis

Re: OutOfMemory during serialization

2015-02-20 Thread Fabian Hueske
Have you tried to increase the heap size by shrinking the TM-managed memory? Reduce the fraction (taskmanager.memory.fraction) or fix the amount of TM memory (taskmanager.memory.size) in flink-conf.yaml [1]. Cheers, Fabian [1] http://flink.apache.org/docs/0.8/config.html On 20 Feb
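
For reference, this is how the two knobs Fabian names would look in the config file; the values below are only illustrative:

    # flink-conf.yaml: shrink Flink-managed memory, leaving more heap for user code
    taskmanager.memory.fraction: 0.5   # default is 0.7
    # ...or fix the managed memory to an absolute amount (in MB):
    # taskmanager.memory.size: 512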

Re: OutOfMemory during serialization

2015-02-20 Thread Stephan Ewen
What happens (in the original stack trace) is the following: The serializer starts producing the byte stream data and we buffer it, to determine the length, before sending it over the network. While buffering that data, the memory runs out. It may be that you are simply short of memory, it may

Re: OutOfMemory during serialization

2015-02-20 Thread Ufuk Celebi
I've just looked into the BitSetSerializer of Chill. And it seems to be true that each bit is encoded as a boolean (for all bit positions <= logical length). Regarding the DataOutputSerializer: it would help to catch OoM exceptions during resize operations and rethrow them with a more detailed

Re: Operators chaining as custom functions

2015-01-27 Thread Stephan Ewen
Hi Flavio! In Scala: You can do that, using the pimp-my-library pattern. Define your own data set type (MyDataSet) that has the method myFunction() and define an implicit conversion from DataSet to MyDataSet. See here for more details:

Re: Community vote for Hadoop Summit result

2015-01-30 Thread Márton Balassi
The topic will be co-presented by Gyula and myself. Let's keep our fingers crossed for the other Flink talks. :) On Fri, Jan 30, 2015 at 9:50 AM, Till Rohrmann trohrm...@apache.org wrote: Great news Márton. Congrats! On Jan 30, 2015 2:41 AM, Henry Saputra henry.sapu...@gmail.com wrote: W00t!

Re: How to submit flink jars from plain Java programs?

2015-01-26 Thread Stephan Ewen
Max's answer is the best approach for pre-packaged Jar programs. In addition, you have the RemoteEnvironment in Flink. It allows you to write constructs like this: public class MyProgram { private static final String JAR_FILE_PATH = "/path/to/jar"; public static void main(String[] args)
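
A hedged completion of Stephan's fragment; the host name and paths are placeholders, and 6123 is Flink's default JobManager RPC port:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.ExecutionEnvironment;

    public class MyProgram {
        private static final String JAR_FILE_PATH = "/path/to/jar";

        public static void main(String[] args) throws Exception {
            // connects to a running JobManager and ships the jar with the user code
            ExecutionEnvironment env = ExecutionEnvironment
                    .createRemoteEnvironment("jobmanager-host", 6123, JAR_FILE_PATH);

            env.readTextFile("hdfs:///input")
               .map(new MapFunction<String, String>() {
                   @Override
                   public String map(String line) {
                       return line.toUpperCase();
                   }
               })
               .writeAsText("hdfs:///output");

            env.execute("My remote program");
        }
    }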

Flink cannot compile my join code...

2015-02-10 Thread Sebastian
Hi, I'm trying to write a simple join in flink-scala but for some reasons flink fails to compile my code. I've tried several reformulations but can't get rid of the error. Can you tell me how to fix this piece of code? I'm using flink 0.8.0. Sebastian

Does Flink use all of the available memory?

2015-02-14 Thread Viktor Rosenfeld
Hi, does the default configuration of Flink use all of the available memory? I mean the physical memory installed in the computer, not whatever amount of memory the JVM allocates by default. Cheers, Viktor

Multiple sources shortest path

2015-02-14 Thread HungChang
Hi, In the Graph API there's a single-source shortest paths library: DataSet<Vertex<Long, Double>> singleSourceShortestPaths = graph.run(new SingleSourceShortestPaths<Long>(srcVertexId, maxIterations)).getVertices(); For multiple sources, would it be possible to run it for all nodes

Re: Does Flink use all of the available memory?

2015-02-14 Thread Fabian Hueske
Hi, Flink uses only the memory which is configured for the JobManager and TaskManager JVMs. By default this is 256MB for the JM and 512MB for the TM (see [1] for details). The TM memory is split into equally large chunks for each configured slot of the TM. You should definitely configure the TM

Re: [Exception]Key expressions are only supported on POJO types and Tuples

2015-02-11 Thread Aljoscha Krettek
From the Flink documentation: Conditions for a class to be treated as a POJO by Flink: - The class must be public - It must have a public constructor without arguments - All fields either have to be public or there must be getters and setters for all non-public fields. In your example, the
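
A minimal class satisfying the conditions Aljoscha lists; the field names are only illustrative:

    // public class with a public no-argument constructor
    public class Person {
        public String name;            // public field: fine as-is
        private int age;               // non-public field: needs getter + setter

        public Person() {}

        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    // with that in place, key expressions work, e.g.: persons.groupBy("name")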

Re: [Exception]Key expressions are only supported on POJO types and Tuples

2015-02-11 Thread Fabian Hueske
In case you want to contribute or follow the discussion, here's the JIRA: https://issues.apache.org/jira/browse/FLINK-1511 Again, thanks for reporting! 2015-02-11 9:59 GMT+01:00 Fabian Hueske fhue...@gmail.com: Hi, you are right, there is a problem. I reproduced the problem and it seems

DeltaIterations: shrink solution set

2015-02-10 Thread Kruse, Sebastian
Hi everyone, From playing around a bit with delta iterations, I saw that you can update elements from the solution set and add new elements. My question is: is it possible to remove elements from the solution set (apart from marking them as deleted somehow)? My use case at hand for

Re: RuntimeException Gelly API: Memory ran out. Compaction failed.

2015-03-18 Thread Robert Waury
Hi, I managed to reproduce the behavior and as far as I can tell it seems to be a problem with the memory allocation. I have filed a bug report in JIRA to get the attention of somebody who knows the runtime better than I do. https://issues.apache.org/jira/browse/FLINK-1734 Cheers, Robert On

Re: RuntimeException Gelly API: Memory ran out. Compaction failed.

2015-03-18 Thread Vasiliki Kalavri
Hi Mihail, Robert, I've tried reproducing this, but I couldn't. I'm using the same twitter input graph from SNAP that you link to and also Scala IDE. The job finishes without a problem (both the SSSP example from Gelly and the unweighted version). The only thing I changed to run your version was

Re: RuntimeException Gelly API: Memory ran out. Compaction failed.

2015-03-18 Thread Stephan Ewen
This job probably suffers from overly conservative memory assignment, giving the solution set too little memory. Can you try to make the solution set unmanaged, excluding it from Flink's memory management? That may help with the problem. See here:
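
A hedged sketch of what "unmanaged" means here, using the plain delta-iteration API of the 0.9 line; the types, names, and key position are placeholders, and a Gelly iteration would need the equivalent switch:

    DeltaIteration<Tuple2<Long, Double>, Tuple2<Long, Double>> iteration =
            initialSolution.iterateDelta(initialWorkset, maxIterations, 0);
    // keep the solution set as a plain heap hash table instead of
    // Flink's managed (compacting) memory
    iteration.setSolutionSetUnManaged(true);
    // ... then define the step functions and closeWith(...) as usual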

Re: Windows file path problems

2015-03-17 Thread Stephan Ewen
I think this is due to a change introduced by Fabian to fix the issue of trailing slashes. May have this side effect... I agree, this is critical and should be fixed soon... On Tue, Mar 17, 2015 at 3:41 PM, Dániel Bali balijanosdan...@gmail.com wrote: Hi! I fetched the new updates from the

Re: Windows file path problems

2015-03-17 Thread Dániel Bali
Hi Stephan, The problem is that file:/C: is evaluated as a non-absolute path in `Path:isAbsolute` This hack seems to fix the issue: in Path.java from line 318: public boolean isAbsolute() { final int start = hasWindowsDrive(uri.getPath(), true) ? 3 : 0; if (uri.getPath().length()

Re: Sort tuple dataset

2015-03-15 Thread Kristoffer Sjögren
Thanks for your answer. I guess I'm a bit infected by writing too much Crunch code and I also suspected that getDataSet() was the wrong thing to do :-) However I was expecting DataSet.sortPartition to do the sorting, but this method is missing in 0.8.1? Do you have a minimal example? I was

Re: Sort tuple dataset

2015-03-15 Thread Stephan Ewen
Hi! I think sort partition is the right thing, if you have only one partition (which makes sense, if you want a total order). It is not a parallel operation any more, so use it only after the data size has been reduced (filters / aggregations). What about data.sortPartition().setParallelism(1).
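
A sketch of Stephan's suggestion against the 0.9-SNAPSHOT API; 'data', the field index, and the output path are placeholders:

    import org.apache.flink.api.common.operators.Order;

    // with parallelism 1 there is a single partition, so the
    // partition-local sort yields a total order
    data.sortPartition(0, Order.ASCENDING)
        .setParallelism(1)
        .writeAsText("/path/to/output");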

Re: Maintaining data locality with list of paths (strings) as input

2015-03-15 Thread Robert Metzger
Hi, @Emmanuel: Is the Flink behavior mentioned native or is this something happening when running Flink on YARN? The input split assignment behavior Stephan described is implemented in Flink itself, so it works in a standalone Flink cluster and in a YARN setup. In a setup where each machine running a

Re: Most convenient data structure for unspecified length objects

2015-03-16 Thread Stephan Ewen
Hi! If you are programming in Scala, you can always use Option[String] for an optional String field. Stephan On Mon, Mar 16, 2015 at 4:57 PM, pietro pietro.pin...@gmail.com wrote: I have to implement a program based on Flink that process some records. The peculiarity of those records is

Re: Most convenient data structure for unspecified length objects

2015-03-16 Thread Stephan Ewen
Ah, okay. Then how about using a List of Strings? On Mon, Mar 16, 2015 at 5:34 PM, pietro pietro.pin...@gmail.com wrote: Hi Stephan, thanks for the reply! My problem is that I cannot know whether I will have 0, 1, 2, ... or more strings. Then, Option is not gonna help in my case :(

Re: using BroadcastSet in Join/CoGroup/Cross

2015-03-16 Thread Stephan Ewen
Sure, that is totally possible. They can be used with any function. On Mon, Mar 16, 2015 at 7:04 PM, Vinh June hoangthevinh@gmail.com wrote: hello, Is it possible to use .withBroadcastSet in other operations than Map, say Join for example?
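
A hedged sketch of a broadcast set on a join in the Java API; ds1 and ds2 are assumed to be DataSet<Tuple2<Long, String>>, and the broadcast name "bcName" is illustrative:

    import java.util.List;
    import org.apache.flink.api.common.functions.RichJoinFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;

    DataSet<Integer> toBroadcast = env.fromElements(1, 2, 3);

    ds1.join(ds2).where(0).equalTo(0)
       .with(new RichJoinFunction<Tuple2<Long, String>, Tuple2<Long, String>, String>() {
           private List<Integer> broadcast;

           @Override
           public void open(Configuration parameters) {
               // the broadcast set is available through the runtime context
               broadcast = getRuntimeContext().getBroadcastVariable("bcName");
           }

           @Override
           public String join(Tuple2<Long, String> left, Tuple2<Long, String> right) {
               return left.f1 + "|" + right.f1 + "|" + broadcast.size();
           }
       })
       .withBroadcastSet(toBroadcast, "bcName");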

Flink cluster dev environment in Docker

2015-03-16 Thread Emmanuel
FYI Posted my dev cluster deployment in Docker here: https://github.com/streamnsight/docker-flink Still need to work on aggregating the logs but I hope it can get people started easily. Cheers

Re: Flink cluster dev environment in Docker

2015-03-17 Thread Flavio Pompermaier
Great addition! On Tue, Mar 17, 2015 at 5:11 AM, Emmanuel ele...@msn.com wrote: FYI Posted my dev cluster deployment in Docker here: https://github.com/streamnsight/docker-flink Still need to work on aggregating the logs but I hope it can get people started easily. Cheers

Re: Flink cluster dev environment in Docker

2015-03-17 Thread Robert Metzger
Hey Emmanuel, thank you for this great contribution. I'm going to test the docker deployment soon. I would actually like to include the files into the Flink source repository. Either into the flink-contrib module, or into the tools directory. What's the take of the other committers on this? On

Re: RuntimeException Gelly API: Memory ran out. Compaction failed.

2015-03-18 Thread Mihail Vieru
No way... that was it!? :))) Big thanks! :) The result is also correct now. Cheers, M. On 18.03.2015 22:49, Vasiliki Kalavri wrote: haha, yes, actually I just confirmed! If I flip my args, I get the error you mention in the first e-mail. You're trying to generate a graph giving the

Questions on GSoC project: Query optimisation layer for Flink Streaming

2015-03-19 Thread Wepngong Benaiah
Hello, I have been doing some research on https://issues.apache.org/jira/browse/FLINK-1617 using http://hirzels.com/martin/papers/csur14-streamopt.pdf and others. I found out that there are many optimization techniques available, like 1. OPERATOR REORDERING 2. REDUNDANCY ELIMINATION 3. OPERATOR

Re: HBase TableOutputFormat

2015-03-21 Thread Stephan Ewen
Hi Flavio! The issue that abstract classes and interfaces are not supported is definitely fixed in 0.9. Your other fix (adding the call for configuring the output format) - is that always needed, or just important in a special case? How has the output format worked before? If this is critical

Re: HBase TableOutputFormat

2015-03-20 Thread Flavio Pompermaier
0.8.1 On Fri, Mar 20, 2015 at 6:11 PM, Stephan Ewen se...@apache.org wrote: Hi Flavio! Is this on Flink 0.9-SNAPSHOT or 0.8.1? Stephan On Fri, Mar 20, 2015 at 6:03 PM, Flavio Pompermaier pomperma...@okkam.it wrote: To make it work I had to clone the Flink repo, import the flink-java

Re: Questions on GSoC project: Query optimisation layer for Flink Streaming

2015-03-20 Thread Wepngong Benaiah
Hello, I am thinking of implementing the following optimization algorithms for GSoC 2015. These are supposed to do statistical graph analysis and streaming graph optimization: 1. OPERATOR REORDERING means changing the order in which the operators appear in the stream graph to eliminate overheads.

Re: RuntimeException Gelly API: Memory ran out. Compaction failed.

2015-03-17 Thread Mihail Vieru
Hi Robert, thank you for your reply. I'm starting the job from the Scala IDE. So only one JobManager and one TaskManager in the same JVM. I've doubled the memory in the eclipse.ini settings but I still get the Exception. -vmargs -Xmx2048m -Xms100m -XX:MaxPermSize=512m Best, Mihail On

Windows file path problems

2015-03-17 Thread Dániel Bali
Hi! I fetched the new updates from the master branch recently and now all tests fail on Windows. Here is a full stack trace: https://gist.github.com/balidani/f429b62208ea90015435 The problem appears to be here: Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException:

Re: Windows file path problems

2015-03-17 Thread Dániel Bali
Hi, I created a PR, I hope it's ok like this: https://github.com/apache/flink/pull/491 2015-03-17 16:51 GMT+01:00 Stephan Ewen se...@apache.org: Looks fine. Can you make a pull request with that fix? On Tue, Mar 17, 2015 at 4:41 PM, Dániel Bali balijanosdan...@gmail.com wrote: Hi Stephan,

Re: Union of multiple datasets vs Join

2015-03-17 Thread Flavio Pompermaier
Hi Fabian, I was trying to use the strategy you suggested with flink 0.8.1 but it seems that the union of the datasets cannot be created programmatically because the union operator gives a name to the generated dataset that is the name of the calling function so that only the first dataset is

RE: Flink logs only written to one host

2015-03-12 Thread Stephan Ewen
Hi Emmanuel! I think there is currently no way to tell the scheduler to pick the least loaded host. That would be a feature we would need to add to Flink. If you want, open a feature request, and give us some more information on how you would expect this to behave. Greetings, Stephan On

RE: Flink logs only written to one host

2015-03-12 Thread Emmanuel
Thanks for the clarification Gyula, I had 9 slots per node, however one node only has 3 CPUs. So, my parallelism here was 9, and all tasks were allocated to the 9 slots on the one host. I understand the strategy of trying to minimize network IOs by sending to the same host, but in this case where

RE: Flink logs only written to one host

2015-03-12 Thread Emmanuel
Hi, I did change my config to have parallelism of 9 and 3 slots on each machine and now it does distribute properly. The other day I was told I could have many more slots than CPUs available and the system would distribute the load properly between the hosts with available CPU time, but it

Re: Flink cluster dev environment in Docker

2015-03-24 Thread Henry Saputra
Hi Emmanuel, We are using the Github mirror [1] to accept patches and Pull Requests (PR). I have a ticket item to update the how-to-contribute page about sending pull requests for Flink but it is the same as sending PRs to other Github repos [2]. Like Robert said before, we could do it for you but if

GC on taskmanagers

2015-03-30 Thread Emmanuel
My Java is still rusty and I often run into OutOfMemoryError: GC overhead limit exceeded... Yes, I need to look for memory leaks... But first I need to clear up this memory so I can run again without having to shut down and restart everything. I've tried using the jcmd pid GC.run command on each of the

Problem with Amazon S3

2015-03-31 Thread pietro
Dear all, I have been developing a Flink application that has to run on Amazon Elastic Map Reduce. For convenience the data that the application has to read and write are on S3. But I have not been able to access S3. This is the error I got:

Re: Problem with Amazon S3

2015-03-31 Thread pietro
Thank you Ufuk! That helped a lot. But I have another problem now. Am I missing something? Caused by: java.net.UnknownHostException: MYBUCKETNAME at java.net.InetAddress.getAllByName0(InetAddress.java:1250) at java.net.InetAddress.getAllByName(InetAddress.java:1162)

Re: Problem with Amazon S3

2015-03-31 Thread Ufuk Celebi
Hey Pietro! You have to add the following lines to your flink-conf.yaml: fs.s3.accessKey: YOUR ACCESS KEY fs.s3.secretKey: YOUR SECRET KEY I will fix the error message to include a hint on how to configure this correctly. – Ufuk On Tue, Mar 31, 2015 at 10:53 AM, pietro

Re: GC on taskmanagers

2015-03-31 Thread Maximilian Michels
Hi Emmanuel, In Java, the garbage collector will always run periodically. So remotely executing it won't make any difference. If you want to reuse the existing Java process without restarting it, you have to stop the program code from executing which is causing the OutOfMemoryError. Usually,

Query Optimisation layer for flink streaming

2015-03-27 Thread Wepngong Benaiah
Hello, I have updated the proposal on Melange and would like you to appraise it so that I can make any necessary refinements before it is too late for corrections. Thanks @gyfora, @rmetzger, @senorcarbone @mbalassi -- Wepngong Ngeh Benaiah

RE: GC on taskmanagers

2015-03-31 Thread Emmanuel
Max, Thanks for the answer... What I am saying is that my program is not running indeed, yet it doesn't seem garbage collection occurs after cancelling the job. As you saw in the log, the memory is still 99% used even though I cancelled the job, and I cannot seem to run another job. I've had

Re: HBase TableOutputFormat

2015-03-23 Thread Flavio Pompermaier
Any news about this? Could someone look into the problem or should I open a ticket in JIRA? On Sun, Mar 22, 2015 at 12:09 PM, Flavio Pompermaier pomperma...@okkam.it wrote: Hi Stephan, the problem is when you try to write into HBase with the HadoopOutputFormat. Unfortunately the recordWriter

MultipleFileOutput based on field

2015-02-23 Thread Yiannis Gkoufas
Hi there, is it possible to write the results to HDFS in different files based on a field of a tuple? Something similar to this: http://stackoverflow.com/questions/23995040/write-to-multiple-outputs-by-key-spark-one-spark-job Thanks a lot!

HDFS Clustering

2015-02-24 Thread Giacomo Licari
Hi guys, I'm Giacomo from Italy, I'm a newbie with Flink. I set up a cluster with Hadoop 1.2 and Flink. I would like to ask you how to run the WordCount example taking the input file from hdfs (example myuser/testWordCount/hamlet.txt) and put the output also inside hdfs (example

Re: error in reduceGroup operator when changing the Flink version from 0.7 to 0.8

2015-02-24 Thread HungChang
Thank you! This completely solves the problem.

Re: HDFS Clustering

2015-02-24 Thread Giacomo Licari
Thanks a lot Marton and Max, it worked perfectly. Regards from Italy :) On Tue, Feb 24, 2015 at 11:31 AM, Max Michels m...@apache.org wrote: Hi Giacomo, Congratulations on setting up a Flink cluster with HDFS :) To run the WordCount example provided with Flink, you should first upload your

Re: Some questions about Join

2015-02-21 Thread Fabian Hueske
Hi, non-equi joins are only supported by building the cross product. This is essentially the nested-loop join strategy that a conventional database system would choose. However, such joins are prohibitively expensive when applied to large data sets. If you have one small and another large data
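
A hedged sketch of the cross-based strategy Fabian describes, broadcasting the small side; 'small' and 'large' are assumed to be DataSet<Tuple2<Integer, String>>, and the predicate is a placeholder:

    import org.apache.flink.api.common.functions.FilterFunction;
    import org.apache.flink.api.java.tuple.Tuple2;

    // crossWithTiny hints that the second input is small and should be broadcast
    large.crossWithTiny(small)
         .filter(new FilterFunction<Tuple2<Tuple2<Integer, String>, Tuple2<Integer, String>>>() {
             @Override
             public boolean filter(Tuple2<Tuple2<Integer, String>, Tuple2<Integer, String>> pair) {
                 return pair.f0.f0 < pair.f1.f0; // the non-equality condition
             }
         });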

Re: ApacheCon 2015 is coming to Austin, Texas, USA

2015-03-26 Thread Fabian Hueske
No worries ;-) Looking forward to meeting you as well! On Mar 26, 2015 6:58 PM, Henry Saputra henry.sapu...@gmail.com wrote: Oh my goodness, I am so sorry Fabian =( I sent the email out in the morning before I hit my coffee. Looking forward to meeting you at the ApacheCon, Fabian =) - Henry

Re: GSoC project proposal: Query optimisation layer for Flink Streaming

2015-03-24 Thread Robert Metzger
Just a quick ping on this for the streaming folks: The deadline for the proposal submissions is Friday, so the GSoC applicants need to get our feedback asap. The student asked me today in the #flink channel whether we can review this proposal. I have the following comments regarding the

Re: Flink cluster dev environment in Docker

2015-03-24 Thread Robert Metzger
I just tried the image/scripts: Amazing work! Very well documented, also for docker beginners like me. @Emmanuel: Would you like to open a pull request to add the files to the flink-contrib directory? If you're too busy right now, I can also do it for you (only if you agree of course ;)) If you

Re: Gelly available already?

2015-03-24 Thread Vasiliki Kalavri
Hi all, there is no Scala API for Gelly yet and no corresponding JIRA either. It's definitely in our plans, just not for 0.9 :-) Cheers, -V. On 24 March 2015 at 00:21, Henry Saputra henry.sapu...@gmail.com wrote: Any JIRA filed to add Scala counterparts for Gelly? - Henry On Mon, Mar 23,

Re: RuntimeException Gelly API: Memory ran out. Compaction failed.

2015-03-18 Thread Vasiliki Kalavri
haha, yes, actually I just confirmed! If I flip my args, I get the error you mention in the first e-mail. You're trying to generate a graph giving the edge list as a vertex list and this is a way too big dataset for your memory settings (cmp. ~15m edges vs. the actual 400k). I hope that clears

Re: RuntimeException Gelly API: Memory ran out. Compaction failed.

2015-03-18 Thread Vasiliki Kalavri
Hi Mihail, I used your code to generate the vertex file, then gave this and the edge list as input to your SSSP implementation and still couldn't reproduce the exception. I'm using the same local setup as I describe above. I'm not aware of any recent changes that might be relevant, but, just in

Re: RuntimeException Gelly API: Memory ran out. Compaction failed.

2015-03-18 Thread Mihail Vieru
I'm also using 0 as sourceID. The exact program arguments: 0 /home/vieru/dev/flink-experiments/data/social_network.edgelist /home/vieru/dev/flink-experiments/data/social_network.verticeslist /home/vieru/dev/flink-experiments/sssp-output-higgstwitter 10 And yes, I call both methods on the

Re: HBase TableOutputFormat

2015-03-23 Thread Fabian Hueske
Creating a JIRA issue never hurts. Have you tried to add your code snippet to the HadoopOutputFormatBase.configure() method? Seems to me the right place for it. Do you want to open a PR for that? 2015-03-23 16:01 GMT+01:00 Flavio Pompermaier pomperma...@okkam.it: Any news about this? Could

Re: Community vote for Hadoop Summit result

2015-01-29 Thread Fabian Hueske
Congrats Marton and everybody who is working on Flink Streaming! Looking forward to the blog post :-) 2015-01-30 1:06 GMT+01:00 Márton Balassi mbala...@apache.org: Hi everyone, Thanks for your support for the Flink talks at the community choice for the next Hadoop Summit Europe. The results

Re: Community vote for Hadoop Summit result

2015-01-29 Thread Henry Saputra
W00t! Congrats guys! On Thu, Jan 29, 2015 at 4:06 PM, Márton Balassi mbala...@apache.org wrote: Hi everyone, Thanks for your support for the Flink talks at the community choice for the next Hadoop Summit Europe. The results are out. [1] Our submission Real-Time Stream Processing with Apache

Re: Community vote for Hadoop Summit result

2015-01-30 Thread Max Michels
Impressive number of votes. Congratulations, Márton! On Fri, Jan 30, 2015 at 9:50 AM, Till Rohrmann trohrm...@apache.org wrote: Great news Márton. Congrats! On Jan 30, 2015 2:41 AM, Henry Saputra henry.sapu...@gmail.com wrote: W00t! Congrats guys! On Thu, Jan 29, 2015 at 4:06 PM, Márton

Re: Gelly available already?

2015-03-23 Thread Sebastian
Maven seems to be unable to find the artifact; I also can't find it in the mvn repository: http://mvnrepository.com/search?q=flink Best, Sebastian On 23.03.2015 23:10, Andra Lungu wrote: Hi Sebastian, For me it works just as described there, with 0.9, but there should be no problem for

Re: HBase TableOutputFormat

2015-03-23 Thread Flavio Pompermaier
No I haven't. There are some points that are not clear to me: 1) Why do the parameters I set in the job configuration get lost when arriving at the job and task managers? 2) Do you think I should put the setConf in the configure method? What is the lifecycle of the OutputFormat? 3) Is it really

Gelly available already?

2015-03-23 Thread Sebastian
Hi, Is Gelly already usable in the 0.8.1 release? I tried adding the dependency <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-gelly</artifactId> <version>0.8.1</version> </dependency> as described in [1], but my project fails to build. Best, Sebastian [1]

Re: Gelly available already?

2015-03-23 Thread Andra Lungu
Hi Sebastian, For me it works just as described there, with 0.9, but there should be no problem for 0.8.1. Here is an example pom.xml https://github.com/andralungu/gelly-partitioning/blob/first/pom.xml Hope that helps! Andra On Mon, Mar 23, 2015 at 11:02 PM, Sebastian ssc.o...@googlemail.com

Re: Gelly available already?

2015-03-23 Thread Robert Metzger
Hi, Gelly is not part of any official Flink release. You have to use a snapshot version of Flink if you want to try it out. Sent from my iPhone On 23.03.2015, at 23:10, Andra Lungu lungu.an...@gmail.com wrote: Hi Sebastian, For me it works just as described there, with 0.9, but there

Re: Gelly available already?

2015-03-23 Thread Sebastian
Is Gelly supposed to be usable from Scala? It looks as if it is hardcoded to use the Java API. Best, Sebastian On 23.03.2015 23:15, Robert Metzger wrote: Hi, Gelly is not part of any official Flink release. You have to use a snapshot version of Flink if you want to try it out. Sent from my

Re: Gelly available already?

2015-03-23 Thread Andra Lungu
For now it just works with the Java API. On Mon, Mar 23, 2015 at 11:42 PM, Sebastian ssc.o...@googlemail.com wrote: Is Gelly supposed to be usable from Scala? It looks as if it is hardcoded to use the Java API. Best, Sebastian On 23.03.2015 23:15, Robert Metzger wrote: Hi, Gelly is not

Nested Iterations supported in Flink?

2015-04-14 Thread Benoît Hanotte
Hello, I'm implementing an algorithm which requires nested iterations, and, from what I understood, this feature was not yet available in Flink [1], and my experiments with 2 nested bulk iterations seem to confirm that. However I came across a Flink unit test [2] using nested iterations,

Parallelism question

2015-04-14 Thread Giacomo Licari
Hi guys, I have a question about how parallelism works. If I have a large dataset and I would divide it into 5 blocks, can I pass each block of data to a fixed parallel process (for example if I set up 5 processes)? And if the result data from each process arrive at the output not in an ordered

CoGroup Operator Data Sink

2015-04-14 Thread Mustafa Elbehery
Hi all, I wonder if the coGroup operator has the ability to sink two outputs simultaneously. I am trying to mock it by calling a function inside the operator, in which I sink the first output, and get the second output myself. I am not sure if this is the best way, and I would like to hear your

Re: CoGroup Operator Data Sink

2015-04-14 Thread Mustafa Elbehery
Thanks for the prompt reply. Maybe the expression Sink is not suitable for what I need. What if I want to *Collect* two data sets directly from the coGroup operator. Is there any way to do so?! As far as I know, the operator has only a Collector object, but I wonder if there is another feature in Flink

Re: CoGroup Operator Data Sink

2015-04-14 Thread Robert Metzger
Hi, you can write the output of a coGroup operator to two sinks:

                    /-- Sink1
    -- (CoGroup) --<
                    \-- Sink2

You can actually write to as many sinks as you want. Note that the data written to Sink1 and Sink2 will be
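
In code, that simply means attaching two sinks to the same coGroup result; a sketch where ds1 and ds2 are assumed to be DataSet<Tuple2<Integer, String>> and the paths are placeholders:

    import org.apache.flink.api.common.functions.CoGroupFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.util.Collector;

    DataSet<String> result = ds1.coGroup(ds2).where(0).equalTo(0)
        .with(new CoGroupFunction<Tuple2<Integer, String>, Tuple2<Integer, String>, String>() {
            @Override
            public void coGroup(Iterable<Tuple2<Integer, String>> first,
                                Iterable<Tuple2<Integer, String>> second,
                                Collector<String> out) {
                out.collect("...");  // emit whatever the group produces
            }
        });

    // the same result feeds both sinks
    result.writeAsText("/path/sink1");
    result.writeAsText("/path/sink2");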

Re: Parallelism question

2015-04-14 Thread Maximilian Michels
Hi Giacomo, If I understand you correctly, you want your Flink job to execute with a parallelism of 5. Just call setDegreeOfParallelism(5) on your ExecutionEnvironment. That way, all operations, when possible, will be performed using 5 parallel instances. This is also true for the DataSink which

Sorting in a WindowedDataStream

2015-04-14 Thread Niklas Semmler
Hello there, What functions should be used to aggregate (unordered) tuples for every window in a WindowedDataStream into an (ordered) list? Neither foldWindow nor reduceWindow seems to be applicable, and aggregate does not, to my understanding, take user-defined functions. To get started with

Re: Parallelism question

2015-04-14 Thread Giacomo Licari
Hi Max, thank you for your reply. Does the DataSink contain the data in order, I mean, does it contain output1, output2 ... output5 in order? Or are they mixed? Thanks a lot, Giacomo On Tue, Apr 14, 2015 at 11:58 AM, Maximilian Michels m...@apache.org wrote: Hi Giacomo, If I understand you

Re: Parallelism question

2015-04-14 Thread Maximilian Michels
Hi Giacomo, If you use a FileOutputFormat as a DataSink (e.g. as in env.writeAsText(/path), then the output directory will contain 5 files named 1, 2, 3, 4, and 5, each containing the output of the corresponding task. The order of the data in the files follows the order of the distributed

Re: Flink 0.9.0-milestone1 released

2015-04-13 Thread Henry Saputra
Woot! Great news! On Monday, April 13, 2015, Kostas Tzoumas ktzou...@apache.org wrote: We are very excited to announce Flink 0.9.0-milestone1, a preview release to give users early access to some Flink 0.9.0 features, including: - A Table API for SQL-like queries embedded in Java and Scala

Re: Nested Iterations supported in Flink?

2015-04-14 Thread Till Rohrmann
If your inner iteration happens to work only on the data of a single partition, then you can also implement this iteration as part of a mapPartition operator. The only problem there would be that you have to keep all the partition's data on the heap, if you need access to it. Cheers, Till On
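
A hedged sketch of Till's suggestion; the element type Point and the helpers converged() and refine() are hypothetical:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.flink.api.common.functions.MapPartitionFunction;
    import org.apache.flink.util.Collector;

    data.mapPartition(new MapPartitionFunction<Point, Point>() {
        @Override
        public void mapPartition(Iterable<Point> values, Collector<Point> out) {
            // materialize the partition on the heap, as Till notes
            List<Point> partition = new ArrayList<Point>();
            for (Point p : values) {
                partition.add(p);
            }
            // run the inner iteration locally until it converges
            while (!converged(partition)) {   // hypothetical convergence check
                refine(partition);            // hypothetical update step
            }
            for (Point p : partition) {
                out.collect(p);
            }
        }
    });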

Re: Nested Iterations supported in Flink?

2015-04-14 Thread Stephan Ewen
Hi Benoît! You are right, nested iterations are currently not supported. The test you found actually checks that the optimizer gives a good error message when encountering nested iterations. Can you write your program as one iteration (the inner) and start the program multiple times to

Re: Nested Iterations supported in Flink?

2015-04-14 Thread Benoît Hanotte
Thanks for your quick answers! The algorithm is the following: I've got a spatial set of data and I want to find dense regions. The space is beforehand discretized into cells of a fixed size. Then, for each dense cell (1st iteration), starting with the most dense, the algorithm tries to

How to make a generic key for groupBy

2015-04-23 Thread LINZ, Arnaud
Hello, After a quite successful benchmark yesterday (Flink being about twice as fast as Spark on my use cases), I've turned instantly from spark-fan to flink-fan – great job, committers! So I've decided to port my existing Spark tools to Flink. Happily, most of the difficulty was renaming
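
The question itself is truncated above; for context, the usual generic keying device in the Java API is a KeySelector. A hedged sketch, where the record type MyRecord, its accessor, and the follow-up reducer are hypothetical:

    import org.apache.flink.api.java.functions.KeySelector;

    records.groupBy(new KeySelector<MyRecord, String>() {
        @Override
        public String getKey(MyRecord record) {
            return record.getCategory(); // any deterministic key derivation works
        }
    })
    .reduceGroup(new MyGroupReducer()); // hypothetical follow-up operator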

Re: How to make a generic key for groupBy

2015-04-23 Thread Soumitra Kumar
Will you elaborate on your use case? It would help to find out where Flink shines. IMO, it's a great project, but needs more differentiation from Spark. On Thu, Apr 23, 2015 at 7:25 AM, LINZ, Arnaud al...@bouyguestelecom.fr wrote: Hello, After a quite successful benchmark yesterday (Flink

Re: Tuples serialization

2015-04-23 Thread Flavio Pompermaier
I've searched within Flink for a working example of TypeSerializerOutputFormat usage but I didn't find anything usable. Could you show me a simple snippet of code? Do I have to configure BinaryInputFormat.BLOCK_SIZE_PARAMETER_KEY? Which size do I have to use? Will Flink write a single file or a set
