Job Statistics
Hello, Is it possible to view job statistics after it finished to execute directly in the command line? If so, could you please explain how? I could not find any mentions about this in the docs. I also tried to set the logs to debug mode, but no other information was presented. Thank you! Regards, Jean
Re: passing variable to filter function
Looking at the program on Pastebin, there are some things that look not right. I would be surprised if this program executes at all. In particular, you are referring to outside distributed data sets inside the filter function. You are calling collect() in every filter function, which actually triggers the program execution (every time the filter function is invoked!) To make this work, you need to pull the collect call out of the filter function. Also, consider using a join, if you want to do an intersection of data sets (or contains-check). Broadcast variables are also available for filter functions. Stephan On Tue, Jun 16, 2015 at 4:28 PM, Fabian Hueske fhue...@gmail.com wrote: Hi, which version of Flink are you working with? The master (0.9-SNAPSHOT) has a RichFilterFunction [1]. Best, Fabian [1] https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/functions/RichFilterFunction.java 2015-06-16 23:52 GMT+02:00 Vinh June hoangthevinh@gmail.com: Hello, How do you pass a parameter to a filter function? With Map, Join, I can use withBroadcastSet to pass to RichMapFunction or RichJoinFunction, but with filter, how can I pass it ? I would like to pass the variable to be able to use as in line 60 here http://pastebin.com/cFZVCLGZ Thanks -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/passing-variable-to-filter-function-tp1666.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.
Re: Memory in local setting
Hi, I had the same problem and setting the solution set to unmanaged helped: VertexCentricConfiguration parameters = new VertexCentricConfiguration(); parameters.setSolutionSetUnmanagedMemory(false); runVertexCentricIteration(..., parameters); Best, Mihail On 17.06.2015 07:01, Sebastian wrote: Hi, Flink has memory problems when I run an algorithm from my local IDE on a 2GB graph. Is there any way that I can increase the memory given to Flink? Best, Sebastian Caused by: java.lang.RuntimeException: Memory ran out. numPartitions: 32 minPartition: 4 maxPartition: 4 number of overflow segments: 151 bucketSize: 146 Overall memory: 14024704 Partition memory: 4194304 at org.apache.flink.runtime.operators.hash.CompactingHashTable.getNextBuffer(CompactingHashTable.java:784) at org.apache.flink.runtime.operators.hash.CompactingHashTable.insertBucketEntryFromSearch(CompactingHashTable.java:668) at org.apache.flink.runtime.operators.hash.CompactingHashTable.insertOrReplaceRecord(CompactingHashTable.java:538) at org.apache.flink.runtime.operators.hash.CompactingHashTable.buildTableWithUniqueKey(CompactingHashTable.java:347) at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.readInitialSolutionSet(IterationHeadPactTask.java:209) at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:270) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745)
Re: Memory in local setting
Hi, look at slide 35 for more details about memory configuration: http://www.slideshare.net/robertmetzger1/apache-flink-hands-on -Matthias On 06/17/2015 09:29 AM, Chiwan Park wrote: Hi. You can increase the memory given to Flink by increasing JVM Heap memory in local. If you are using Eclipse as IDE, add “-XmxHEAPSIZE” option in run configuration. [1]. Although you are using IntelliJ IDEA as IDE, you can increase JVM Heap using the same way. [2] [1] http://help.eclipse.org/luna/index.jsp?topic=%2Forg.eclipse.jdt.doc.user%2Ftasks%2Ftasks-java-local-configuration.htm [2] https://www.jetbrains.com/idea/help/creating-and-editing-run-debug-configurations.html Regards, Chiwan Park On Jun 17, 2015, at 2:01 PM, Sebastian s...@apache.org wrote: Hi, Flink has memory problems when I run an algorithm from my local IDE on a 2GB graph. Is there any way that I can increase the memory given to Flink? Best, Sebastian Caused by: java.lang.RuntimeException: Memory ran out. numPartitions: 32 minPartition: 4 maxPartition: 4 number of overflow segments: 151 bucketSize: 146 Overall memory: 14024704 Partition memory: 4194304 at org.apache.flink.runtime.operators.hash.CompactingHashTable.getNextBuffer(CompactingHashTable.java:784) at org.apache.flink.runtime.operators.hash.CompactingHashTable.insertBucketEntryFromSearch(CompactingHashTable.java:668) at org.apache.flink.runtime.operators.hash.CompactingHashTable.insertOrReplaceRecord(CompactingHashTable.java:538) at org.apache.flink.runtime.operators.hash.CompactingHashTable.buildTableWithUniqueKey(CompactingHashTable.java:347) at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.readInitialSolutionSet(IterationHeadPactTask.java:209) at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:270) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) signature.asc Description: OpenPGP digital signature
Re: Flink Streaming State Management
Hey Hilmi, Flink currently supports user defined state through the Checkpointed interface. Using this interface the user can define what state should the system be aware of when doing snapshots for fault tolerance. The state returned in the snapshotState method will be checkpointed (and restored upon failure). Currently there is no explicit update function so the user need to implement his state updates as part of the operator logic (inside the map function for instance). After the 0.9.0 we will change the state interfaces to have explicit update functions, you can preview some of the functionality in the PR Matthias referenced, or here is a document for a more detailed discussion of this feature: https://docs.google.com/document/d/1nTn4Tpafsnt-TCT6L1vlHtGGgRevU90yRsUQEmkRMjk/edit?usp=sharing Cheers, Gyula Hilmi Yildirim hilmi.yildi...@neofonie.de ezt írta (időpont: 2015. jún. 17., Sze, 9:36): Hi, does Flink Streaming support state management? For example, I have a state which will be used inside the streaming operations but the state can be updated. For example: stream.map( use state for operation).updateState(update state). Best Regards, Hilmi -- -- Hilmi Yildirim Software Developer RD http://www.neofonie.de Besuchen Sie den Neo Tech Blog für Anwender: http://blog.neofonie.de/ Folgen Sie uns: https://plus.google.com/+neofonie http://www.linkedin.com/company/neofonie-gmbh https://www.xing.com/companies/neofoniegmbh Neofonie GmbH | Robert-Koch-Platz 4 | 10115 Berlin Handelsregister Berlin-Charlottenburg: HRB 67460 Geschäftsführung: Thomas Kitlitschko
Re: Flink Streaming State Management
Hi Hilmi, currently, this is not supported. However, state management is already work in progress and should be available soon. See https://github.com/apache/flink/pull/747 -Matthias On 06/17/2015 09:36 AM, Hilmi Yildirim wrote: Hi, does Flink Streaming support state management? For example, I have a state which will be used inside the streaming operations but the state can be updated. For example: stream.map( use state for operation).updateState(update state). Best Regards, Hilmi signature.asc Description: OpenPGP digital signature
Re: Memory in local setting
Hi Ufuk, Can I configure this when running locally in the IDE or do I have to install Flink for that? Best, Sebastian On 17.06.2015 09:34, Ufuk Celebi wrote: Hey Sebastian, with taskmanager.memory.fraction you can give more memory to the Flink runtime. Current default is to give 70% to Flink and leave 30% for the user code. taskmanager.memory.fraction: 0.9 will increase this to 90%. Does this help? [1] http://ci.apache.org/projects/flink/flink-docs-master/setup/config.html On 17 Jun 2015, at 09:29, Chiwan Park chiwanp...@icloud.com wrote: Hi. You can increase the memory given to Flink by increasing JVM Heap memory in local. If you are using Eclipse as IDE, add “-XmxHEAPSIZE” option in run configuration. [1]. Although you are using IntelliJ IDEA as IDE, you can increase JVM Heap using the same way. [2] [1] http://help.eclipse.org/luna/index.jsp?topic=%2Forg.eclipse.jdt.doc.user%2Ftasks%2Ftasks-java-local-configuration.htm [2] https://www.jetbrains.com/idea/help/creating-and-editing-run-debug-configurations.html Regards, Chiwan Park On Jun 17, 2015, at 2:01 PM, Sebastian s...@apache.org wrote: Hi, Flink has memory problems when I run an algorithm from my local IDE on a 2GB graph. Is there any way that I can increase the memory given to Flink? Best, Sebastian Caused by: java.lang.RuntimeException: Memory ran out. numPartitions: 32 minPartition: 4 maxPartition: 4 number of overflow segments: 151 bucketSize: 146 Overall memory: 14024704 Partition memory: 4194304 at org.apache.flink.runtime.operators.hash.CompactingHashTable.getNextBuffer(CompactingHashTable.java:784) at org.apache.flink.runtime.operators.hash.CompactingHashTable.insertBucketEntryFromSearch(CompactingHashTable.java:668) at org.apache.flink.runtime.operators.hash.CompactingHashTable.insertOrReplaceRecord(CompactingHashTable.java:538) at org.apache.flink.runtime.operators.hash.CompactingHashTable.buildTableWithUniqueKey(CompactingHashTable.java:347) at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.readInitialSolutionSet(IterationHeadPactTask.java:209) at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:270) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745)
Re: Memory in local setting
On 17 Jun 2015, at 09:35, Mihail Vieru vi...@informatik.hu-berlin.de wrote: Hi, I had the same problem and setting the solution set to unmanaged helped: VertexCentricConfiguration parameters = new VertexCentricConfiguration(); parameters.setSolutionSetUnmanagedMemory(false); runVertexCentricIteration(..., parameters); That's indeed a very good point, Mihal. Thanks for the pointer. The compacting hash table used in iterations cannot spill at the moment.
Re: sorting groups
Hi Fabian, My dataset is of this type RegionType (Long, String, Long, Long, Char, Array[GValue]) Where GValue is a case class implemented by GString(v:String) GDouble(v:Double) I have two case of sorting: In the first (topk) i have to group by the first field of the regions and sort by a set of fields of the GValue array In the second (topg) i have to sort by the first field of the regions and by a set of fields of the array, then sort by one field of the array For grouping i am using the groupby function with a function as parameter that creates the hash of the desired fields, something like ds.groupby((r:RegionType) = s = new stringBuilder s.append(r._1) grouping.init.foreach((index:int) = s.append(#) s.append(r._6(index)) ) Md5.hash(s.toString) ) Then i sort it using (in the topg case, the second) .sortGroup(((r:RegionType)= r._6(grouping.last ) /*here i am doing some cast, i am writing from my smartphone i don't remember all the details sorry*/ ),Order.ASCENDING) in the first case instead i group only by r._1 and i have a recursive function that appends sortgroup operator to the grouoed dataset Is there a way to solve this? I think i don't understand what a keySelector is Thanks! Michele Da: Fabian Hueske fhue...@gmail.com Inviato: martedì 16 giugno 2015 23.43.03 A: user@flink.apache.org Oggetto: Re: sorting groups Hi, the error is related to the way you specify the grouping and the sorting key. The API is currently restricted in the way, that you can only use a key selector function for the sorting key if you also used a selector function for the grouping key. In Scala the use of key selector functions is often not very obvious. If you post the groupBy().sortGroup() command and the input type, I can help you getting it right. Cheers, Fabian 2015-06-16 23:37 GMT+02:00 Michele Bertoni michele1.bert...@mail.polimi.itmailto:michele1.bert...@mail.polimi.it: Hi everybody, I am trying to sorting a grouped dataset, but i am getting this error: Exception in thread main org.apache.flink.api.common.InvalidProgramException: Sorting on KeySelector keys only works with KeySelector grouping. at org.apache.flink.api.scala.GroupedDataSet.sortGroup(GroupedDataSet.scala:113) at it.polimi.genomics.flink.FlinkImplementation.regionOperation.OrderRD$.sort(OrderRD.scala:82) ... can anybody help me understanding the error? i have no idea what it means and google is not helpful in this case thanks! cheers Michele
Flink Streaming State Management
Hi, does Flink Streaming support state management? For example, I have a state which will be used inside the streaming operations but the state can be updated. For example: stream.map( use state for operation).updateState(update state). Best Regards, Hilmi -- -- Hilmi Yildirim Software Developer RD http://www.neofonie.de Besuchen Sie den Neo Tech Blog für Anwender: http://blog.neofonie.de/ Folgen Sie uns: https://plus.google.com/+neofonie http://www.linkedin.com/company/neofonie-gmbh https://www.xing.com/companies/neofoniegmbh Neofonie GmbH | Robert-Koch-Platz 4 | 10115 Berlin Handelsregister Berlin-Charlottenburg: HRB 67460 Geschäftsführung: Thomas Kitlitschko
Re: Flink Streaming State Management
Hi Matthias, great! Thank you. Best Regards, Hilmi Am 17.06.2015 um 09:38 schrieb Matthias J. Sax: Hi Hilmi, currently, this is not supported. However, state management is already work in progress and should be available soon. See https://github.com/apache/flink/pull/747 -Matthias On 06/17/2015 09:36 AM, Hilmi Yildirim wrote: Hi, does Flink Streaming support state management? For example, I have a state which will be used inside the streaming operations but the state can be updated. For example: stream.map( use state for operation).updateState(update state). Best Regards, Hilmi
Re: Flink Streaming State Management
Hi Gyula, thank you. Best Regards, Hilmi Am 17.06.2015 um 09:44 schrieb Gyula Fóra: Hey Hilmi, Flink currently supports user defined state through the Checkpointed interface. Using this interface the user can define what state should the system be aware of when doing snapshots for fault tolerance. The state returned in the snapshotState method will be checkpointed (and restored upon failure). Currently there is no explicit update function so the user need to implement his state updates as part of the operator logic (inside the map function for instance). After the 0.9.0 we will change the state interfaces to have explicit update functions, you can preview some of the functionality in the PR Matthias referenced, or here is a document for a more detailed discussion of this feature: https://docs.google.com/document/d/1nTn4Tpafsnt-TCT6L1vlHtGGgRevU90yRsUQEmkRMjk/edit?usp=sharing Cheers, Gyula Hilmi Yildirim hilmi.yildi...@neofonie.de mailto:hilmi.yildi...@neofonie.de ezt írta (időpont: 2015. jún. 17., Sze, 9:36): Hi, does Flink Streaming support state management? For example, I have a state which will be used inside the streaming operations but the state can be updated. For example: stream.map( use state for operation).updateState(update state). Best Regards, Hilmi -- -- Hilmi Yildirim Software Developer RD http://www.neofonie.de Besuchen Sie den Neo Tech Blog für Anwender: http://blog.neofonie.de/ Folgen Sie uns: https://plus.google.com/+neofonie http://www.linkedin.com/company/neofonie-gmbh https://www.xing.com/companies/neofoniegmbh Neofonie GmbH | Robert-Koch-Platz 4 | 10115 Berlin Handelsregister Berlin-Charlottenburg: HRB 67460 Geschäftsführung: Thomas Kitlitschko
Re: Memory in local setting
On 17 Jun 2015, at 10:10, Sebastian ssc.o...@googlemail.com wrote: Hi Ufuk, Can I configure this when running locally in the IDE or do I have to install Flink for that? Yes. org.apache.flink.configuration.Configuration conf = new Configuration(); conf.setDouble(ConfigConstants.TASK_MANAGER_MEMORY_FRACTION_KEY, 0.7); LocalEnvironment env = LocalEnvironment.createLocalEnvironment(conf); You can check the size of size of Flink's managed memory in the logs of the task manager: 11:56:28,061 INFO org.apache.flink.runtime.taskmanager.TaskManager - Using 1227 MB for Flink managed memory.
Re: Memory in local setting
Hi. You can increase the memory given to Flink by increasing JVM Heap memory in local. If you are using Eclipse as IDE, add “-XmxHEAPSIZE” option in run configuration. [1]. Although you are using IntelliJ IDEA as IDE, you can increase JVM Heap using the same way. [2] [1] http://help.eclipse.org/luna/index.jsp?topic=%2Forg.eclipse.jdt.doc.user%2Ftasks%2Ftasks-java-local-configuration.htm [2] https://www.jetbrains.com/idea/help/creating-and-editing-run-debug-configurations.html Regards, Chiwan Park On Jun 17, 2015, at 2:01 PM, Sebastian s...@apache.org wrote: Hi, Flink has memory problems when I run an algorithm from my local IDE on a 2GB graph. Is there any way that I can increase the memory given to Flink? Best, Sebastian Caused by: java.lang.RuntimeException: Memory ran out. numPartitions: 32 minPartition: 4 maxPartition: 4 number of overflow segments: 151 bucketSize: 146 Overall memory: 14024704 Partition memory: 4194304 at org.apache.flink.runtime.operators.hash.CompactingHashTable.getNextBuffer(CompactingHashTable.java:784) at org.apache.flink.runtime.operators.hash.CompactingHashTable.insertBucketEntryFromSearch(CompactingHashTable.java:668) at org.apache.flink.runtime.operators.hash.CompactingHashTable.insertOrReplaceRecord(CompactingHashTable.java:538) at org.apache.flink.runtime.operators.hash.CompactingHashTable.buildTableWithUniqueKey(CompactingHashTable.java:347) at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.readInitialSolutionSet(IterationHeadPactTask.java:209) at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:270) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745)