Job Statistics

2015-06-17 Thread Jean Bez
Hello,

Is it possible to view job statistics directly on the command line after a
job has finished executing? If so, could you please explain how? I could not
find any mention of this in the docs. I also tried setting the logs to debug
mode, but no additional information was shown.

Thank you!

Regards,
Jean


Re: passing variable to filter function

2015-06-17 Thread Stephan Ewen
Looking at the program on Pastebin, a few things do not look right. I would
be surprised if this program executes at all.

In particular, you are referring to other (distributed) data sets inside the
filter function and calling collect() on them, which actually triggers
program execution (every time the filter function is invoked!).

To make this work, you need to pull the collect() call out of the filter
function.

Also, consider using a join if you want to compute an intersection of data
sets (or a contains-check). Broadcast variables are also available for filter
functions.
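
For illustration, here is a rough sketch (Scala API, made-up data and names,
not the original program) of how a contains-check done via collect() inside a
filter can be expressed as a join instead:

import org.apache.flink.api.scala._

object JoinInsteadOfCollectSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // made-up stand-ins for the two data sets of the original program
    val records    = env.fromElements((1L, "a"), (2L, "b"), (3L, "c"))
    val allowedIds = env.fromElements(1L, 3L)

    // the contains-check becomes a join, so no collect() is needed inside
    // a user function and the program is executed only once
    val kept = records
      .join(allowedIds)
      .where(_._1)
      .equalTo(id => id)
      .apply((record, _) => record)

    kept.print()  // on 0.9 an explicit env.execute() may still be needed
  }
}

A broadcast variable would serve the same purpose; a sketch of that is shown
after Fabian's reply below.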

Stephan


On Tue, Jun 16, 2015 at 4:28 PM, Fabian Hueske fhue...@gmail.com wrote:

 Hi,

 which version of Flink are you working with?
 The master (0.9-SNAPSHOT) has a RichFilterFunction [1].

 Best, Fabian

 [1]
 https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/functions/RichFilterFunction.java
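
As a rough illustration (Scala API, made-up names and data; this assumes a
build where RichFilterFunction is available), passing a data set into a
filter via a broadcast variable could look like this:

import org.apache.flink.api.common.functions.RichFilterFunction
import org.apache.flink.api.scala._
import org.apache.flink.configuration.Configuration
import scala.collection.JavaConverters._

object BroadcastFilterSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    val data    = env.fromElements(1L, 2L, 3L, 4L)
    val allowed = env.fromElements(2L, 4L)  // the "variable" to pass in

    val filtered = data
      .filter(new RichFilterFunction[Long] {
        private var keep: Set[Long] = _

        // the broadcast set becomes available when the function is opened
        override def open(parameters: Configuration): Unit = {
          keep = getRuntimeContext
            .getBroadcastVariable[Long]("allowed").asScala.toSet
        }

        override def filter(value: Long): Boolean = keep.contains(value)
      })
      .withBroadcastSet(allowed, "allowed")

    filtered.print()  // on 0.9 an explicit env.execute() may still be needed
  }
}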

 2015-06-16 23:52 GMT+02:00 Vinh June hoangthevinh@gmail.com:

 Hello,

 How do you pass a parameter to a filter function? With map and join I can
 use withBroadcastSet to pass it to a RichMapFunction or RichJoinFunction,
 but how can I pass it to a filter?

 I would like to pass the variable so that I can use it as in line 60 here:
 http://pastebin.com/cFZVCLGZ

 Thanks








Re: Memory in local setting

2015-06-17 Thread Mihail Vieru

Hi,

I had the same problem and setting the solution set to unmanaged helped:

VertexCentricConfiguration parameters = new VertexCentricConfiguration();
// true = keep the solution set in unmanaged (heap) memory
parameters.setSolutionSetUnmanagedMemory(true);

runVertexCentricIteration(..., parameters);

Best,
Mihail

On 17.06.2015 07:01, Sebastian wrote:

Hi,

Flink has memory problems when I run an algorithm from my local IDE on 
a 2GB graph. Is there any way that I can increase the memory given to 
Flink?


Best,
Sebastian

Caused by: java.lang.RuntimeException: Memory ran out. numPartitions: 
32 minPartition: 4 maxPartition: 4 number of overflow segments: 151 
bucketSize: 146 Overall memory: 14024704 Partition memory: 4194304
at 
org.apache.flink.runtime.operators.hash.CompactingHashTable.getNextBuffer(CompactingHashTable.java:784)
at 
org.apache.flink.runtime.operators.hash.CompactingHashTable.insertBucketEntryFromSearch(CompactingHashTable.java:668)
at 
org.apache.flink.runtime.operators.hash.CompactingHashTable.insertOrReplaceRecord(CompactingHashTable.java:538)
at 
org.apache.flink.runtime.operators.hash.CompactingHashTable.buildTableWithUniqueKey(CompactingHashTable.java:347)
at 
org.apache.flink.runtime.iterative.task.IterationHeadPactTask.readInitialSolutionSet(IterationHeadPactTask.java:209)
at 
org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:270)
at 
org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)

at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
at java.lang.Thread.run(Thread.java:745)




Re: Memory in local setting

2015-06-17 Thread Matthias J. Sax
Hi,

look at slide 35 for more details about memory configuration:
http://www.slideshare.net/robertmetzger1/apache-flink-hands-on

-Matthias


On 06/17/2015 09:29 AM, Chiwan Park wrote:
 Hi.
 
 You can increase the memory given to Flink by increasing the JVM heap size
 when running locally.
 If you are using Eclipse as your IDE, add an “-Xmx<heap size>” option (e.g.
 “-Xmx4g”) to the run configuration [1].
 If you are using IntelliJ IDEA, you can increase the JVM heap in the same
 way [2].
 
 [1] 
 http://help.eclipse.org/luna/index.jsp?topic=%2Forg.eclipse.jdt.doc.user%2Ftasks%2Ftasks-java-local-configuration.htm
 [2] 
 https://www.jetbrains.com/idea/help/creating-and-editing-run-debug-configurations.html
 
 Regards,
 Chiwan Park
 
 On Jun 17, 2015, at 2:01 PM, Sebastian s...@apache.org wrote:

 Hi,

 Flink has memory problems when I run an algorithm from my local IDE on a 2GB 
 graph. Is there any way that I can increase the memory given to Flink?

 Best,
 Sebastian

 Caused by: java.lang.RuntimeException: Memory ran out. numPartitions: 32 
 minPartition: 4 maxPartition: 4 number of overflow segments: 151 bucketSize: 
 146 Overall memory: 14024704 Partition memory: 4194304
  at 
 org.apache.flink.runtime.operators.hash.CompactingHashTable.getNextBuffer(CompactingHashTable.java:784)
  at 
 org.apache.flink.runtime.operators.hash.CompactingHashTable.insertBucketEntryFromSearch(CompactingHashTable.java:668)
  at 
 org.apache.flink.runtime.operators.hash.CompactingHashTable.insertOrReplaceRecord(CompactingHashTable.java:538)
  at 
 org.apache.flink.runtime.operators.hash.CompactingHashTable.buildTableWithUniqueKey(CompactingHashTable.java:347)
  at 
 org.apache.flink.runtime.iterative.task.IterationHeadPactTask.readInitialSolutionSet(IterationHeadPactTask.java:209)
  at 
 org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:270)
  at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
  at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
  at java.lang.Thread.run(Thread.java:745)
 
 
 
 
 
 
 





Re: Flink Streaming State Management

2015-06-17 Thread Gyula Fóra
Hey Hilmi,

Flink currently supports user-defined state through the Checkpointed
interface. Using this interface, the user can define what state the system
should be aware of when taking snapshots for fault tolerance. The state
returned by the snapshotState method will be checkpointed (and restored
upon failure).

Currently there is no explicit update function, so the user needs to
implement state updates as part of the operator logic (inside the map
function, for instance). After 0.9.0 we will change the state interfaces
to have explicit update functions; you can preview some of the functionality
in the PR Matthias referenced, or see this document for a more detailed
discussion of the feature:
https://docs.google.com/document/d/1nTn4Tpafsnt-TCT6L1vlHtGGgRevU90yRsUQEmkRMjk/edit?usp=sharing
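
As a rough sketch of what this can look like today (Scala, assuming the 0.9
streaming API; the class and its state are made up), a stateful mapper could
implement the Checkpointed interface like this:

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.streaming.api.checkpoint.Checkpointed

// made-up example: a mapper that counts the records it has seen;
// the count is the state that gets snapshotted and restored
class CountingMapper extends RichMapFunction[String, (String, Long)]
    with Checkpointed[java.lang.Long] {

  private var count: Long = 0L

  override def map(value: String): (String, Long) = {
    count += 1              // the "update" happens inside the operator logic
    (value, count)
  }

  // handed to the system whenever a snapshot is taken
  override def snapshotState(checkpointId: Long, timestamp: Long): java.lang.Long = count

  // handed back to the operator after a failure
  override def restoreState(state: java.lang.Long): Unit = { count = state }
}

It would then be used as stream.map(new CountingMapper), with checkpointing
enabled on the environment.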

Cheers,
Gyula

Hilmi Yildirim hilmi.yildi...@neofonie.de wrote (on Wed, 17 Jun 2015, 9:36):

 Hi,
 does Flink Streaming support state management? For example, I have a
 state which will be used inside the streaming operations but the state
 can be updated.

 For example:
 stream.map( use state for operation).updateState(update state).


 Best Regards,
 Hilmi




 --
 --
 Hilmi Yildirim
 Software Developer RD


 http://www.neofonie.de

 Besuchen Sie den Neo Tech Blog für Anwender:
 http://blog.neofonie.de/

 Folgen Sie uns:
 https://plus.google.com/+neofonie
 http://www.linkedin.com/company/neofonie-gmbh
 https://www.xing.com/companies/neofoniegmbh

 Neofonie GmbH | Robert-Koch-Platz 4 | 10115 Berlin
 Handelsregister Berlin-Charlottenburg: HRB 67460
 Geschäftsführung: Thomas Kitlitschko




Re: Flink Streaming State Management

2015-06-17 Thread Matthias J. Sax
Hi Hilmi,

currently, this is not supported. However, state management is already
work in progress and should be available soon. See
https://github.com/apache/flink/pull/747

-Matthias

On 06/17/2015 09:36 AM, Hilmi Yildirim wrote:
 Hi,
 does Flink Streaming support state management? For example, I have a
 state which will be used inside the streaming operations but the state
 can be updated.
 
 For example:
 stream.map( use state for operation).updateState(update state).
 
 
 Best Regards,
 Hilmi
 
 
 
 





Re: Memory in local setting

2015-06-17 Thread Sebastian

Hi Ufuk,

Can I configure this when running locally in the IDE or do I have to 
install Flink for that?


Best,
Sebastian

On 17.06.2015 09:34, Ufuk Celebi wrote:

Hey Sebastian,

with taskmanager.memory.fraction [1] you can give more memory to the Flink
runtime. The current default is to give 70% to Flink and leave 30% for the
user code.

taskmanager.memory.fraction: 0.9

will increase this to 90%.


Does this help?


[1] http://ci.apache.org/projects/flink/flink-docs-master/setup/config.html

On 17 Jun 2015, at 09:29, Chiwan Park chiwanp...@icloud.com wrote:


Hi.

You can increase the memory given to Flink by increasing the JVM heap size
when running locally.
If you are using Eclipse as your IDE, add an “-Xmx<heap size>” option (e.g.
“-Xmx4g”) to the run configuration [1].
If you are using IntelliJ IDEA, you can increase the JVM heap in the same way [2].

[1] 
http://help.eclipse.org/luna/index.jsp?topic=%2Forg.eclipse.jdt.doc.user%2Ftasks%2Ftasks-java-local-configuration.htm
[2] 
https://www.jetbrains.com/idea/help/creating-and-editing-run-debug-configurations.html

Regards,
Chiwan Park


On Jun 17, 2015, at 2:01 PM, Sebastian s...@apache.org wrote:

Hi,

Flink has memory problems when I run an algorithm from my local IDE on a 2GB 
graph. Is there any way that I can increase the memory given to Flink?

Best,
Sebastian

Caused by: java.lang.RuntimeException: Memory ran out. numPartitions: 32 
minPartition: 4 maxPartition: 4 number of overflow segments: 151 bucketSize: 
146 Overall memory: 14024704 Partition memory: 4194304
at 
org.apache.flink.runtime.operators.hash.CompactingHashTable.getNextBuffer(CompactingHashTable.java:784)
at 
org.apache.flink.runtime.operators.hash.CompactingHashTable.insertBucketEntryFromSearch(CompactingHashTable.java:668)
at 
org.apache.flink.runtime.operators.hash.CompactingHashTable.insertOrReplaceRecord(CompactingHashTable.java:538)
at 
org.apache.flink.runtime.operators.hash.CompactingHashTable.buildTableWithUniqueKey(CompactingHashTable.java:347)
at 
org.apache.flink.runtime.iterative.task.IterationHeadPactTask.readInitialSolutionSet(IterationHeadPactTask.java:209)
at 
org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:270)
at 
org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
at java.lang.Thread.run(Thread.java:745)


Re: Memory in local setting

2015-06-17 Thread Ufuk Celebi

On 17 Jun 2015, at 09:35, Mihail Vieru vi...@informatik.hu-berlin.de wrote:

 Hi,
 
 I had the same problem and setting the solution set to unmanaged helped:
 
 VertexCentricConfiguration parameters = new VertexCentricConfiguration();
 parameters.setSolutionSetUnmanagedMemory(true);
 
 runVertexCentricIteration(..., parameters);

That's indeed a very good point, Mihail. Thanks for the pointer. The compacting
hash table used in iterations cannot spill at the moment.

Re: sorting groups

2015-06-17 Thread Michele Bertoni
Hi Fabian,
My dataset is of this type:
RegionType (Long, String, Long, Long, Char, Array[GValue])
where GValue is a base type implemented by the case classes
GString(v: String)
GDouble(v: Double)

I have two cases of sorting:
In the first (topk) I have to group by the first field of the regions and
sort by a set of fields of the GValue array.

In the second (topg) I have to group by the first field of the regions and by
a set of fields of the array, then sort by one field of the array.

For grouping I am using the groupBy function with a function parameter that
builds a hash of the desired fields, something like:
ds.groupBy((r: RegionType) => {
  val s = new StringBuilder
  s.append(r._1)
  grouping.init.foreach((index: Int) => {
    s.append("#")
    s.append(r._6(index))
  })
  Md5.hash(s.toString)
})

Then I sort it (in the topg case, the second one) using:
.sortGroup((r: RegionType) =>
  r._6(grouping.last) /* here I am doing some cast, I am writing from my
smartphone, I don't remember all the details, sorry */, Order.ASCENDING)

In the first case, instead, I group only by r._1 and I have a recursive
function that appends sortGroup operators to the grouped dataset.

Is there a way to solve this?

I don't think I understand what a KeySelector is.


Thanks!
Michele

From: Fabian Hueske fhue...@gmail.com
Sent: Tuesday, 16 June 2015, 23:43:03
To: user@flink.apache.org
Subject: Re: sorting groups

Hi,

the error is related to the way you specify the grouping and the sorting keys.
The API is currently restricted such that you can only use a key selector
function for the sorting key if you also used a key selector function for the
grouping key.

In Scala the use of key selector functions is often not very obvious.

If you post the groupBy().sortGroup() command and the input type, I can help
you get it right.

Cheers, Fabian
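
For illustration, a minimal sketch (Scala API, with a made-up simplification
of RegionType) in which both the grouping key and the sorting key are key
selector functions:

import org.apache.flink.api.common.operators.Order
import org.apache.flink.api.scala._

object GroupSortSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // made-up simplification of RegionType: (id, name, start, stop)
    val ds = env.fromElements((1L, "a", 10L, 20L), (1L, "b", 5L, 15L), (2L, "c", 7L, 9L))

    val firstPerGroup = ds
      .groupBy(r => r._1)                      // key selector for grouping ...
      .sortGroup(r => r._3, Order.ASCENDING)   // ... so a key selector also works for sorting
      .reduceGroup(group => group.next())      // e.g. keep the smallest record per group

    firstPerGroup.print()  // on 0.9 an explicit env.execute() may still be needed
  }
}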

2015-06-16 23:37 GMT+02:00 Michele Bertoni michele1.bert...@mail.polimi.it:
Hi everybody,
I am trying to sort a grouped dataset, but I am getting this error:

Exception in thread main org.apache.flink.api.common.InvalidProgramException: 
Sorting on KeySelector keys only works with KeySelector grouping.
at 
org.apache.flink.api.scala.GroupedDataSet.sortGroup(GroupedDataSet.scala:113)
at 
it.polimi.genomics.flink.FlinkImplementation.regionOperation.OrderRD$.sort(OrderRD.scala:82)
...

Can anybody help me understand the error? I have no idea what it means and
Google is not helpful in this case.


thanks!
cheers
Michele



Flink Streaming State Management

2015-06-17 Thread Hilmi Yildirim

Hi,
does Flink Streaming support state management? For example, I have a 
state which will be used inside the streaming operations but the state 
can be updated.


For example:
stream.map( use state for operation).updateState(update state).


Best Regards,
Hilmi




--
--
Hilmi Yildirim
Software Developer RD


http://www.neofonie.de

Besuchen Sie den Neo Tech Blog für Anwender:
http://blog.neofonie.de/

Folgen Sie uns:
https://plus.google.com/+neofonie
http://www.linkedin.com/company/neofonie-gmbh
https://www.xing.com/companies/neofoniegmbh

Neofonie GmbH | Robert-Koch-Platz 4 | 10115 Berlin
Handelsregister Berlin-Charlottenburg: HRB 67460
Geschäftsführung: Thomas Kitlitschko



Re: Flink Streaming State Management

2015-06-17 Thread Hilmi Yildirim

Hi Matthias,
great! Thank you.

Best Regards,
Hilmi

On 17.06.2015 at 09:38, Matthias J. Sax wrote:

Hi Hilmi,

currently, this is not supported. However, state management is already
work in progress and should be available soon. See
https://github.com/apache/flink/pull/747

-Matthias

On 06/17/2015 09:36 AM, Hilmi Yildirim wrote:

Hi,
does Flink Streaming support state management? For example, I have a
state which will be used inside the streaming operations but the state
can be updated.

For example:
stream.map( use state for operation).updateState(update state).


Best Regards,
Hilmi






Re: Flink Streaming State Management

2015-06-17 Thread Hilmi Yildirim

Hi Gyula,
thank you.

Best Regards,
Hilmi

On 17.06.2015 at 09:44, Gyula Fóra wrote:

Hey Hilmi,

Flink currently supports user defined state through the Checkpointed 
interface. Using this interface the user can define what state should 
the system be aware of when doing snapshots for fault tolerance. The 
state returned in the snapshotState method will be checkpointed (and 
restored upon failure).


Currently there is no explicit update function, so the user needs to
implement state updates as part of the operator logic (inside the map
function, for instance). After 0.9.0 we will change the state interfaces
to have explicit update functions; you can preview some of the functionality
in the PR Matthias referenced, or see this document for a more detailed
discussion of the feature:

https://docs.google.com/document/d/1nTn4Tpafsnt-TCT6L1vlHtGGgRevU90yRsUQEmkRMjk/edit?usp=sharing

Cheers,
Gyula

Hilmi Yildirim hilmi.yildi...@neofonie.de wrote (on Wed, 17 Jun 2015, 9:36):


Hi,
does Flink Streaming support state management? For example, I have a
state which will be used inside the streaming operations but the state
can be updated.

For example:
stream.map( use state for operation).updateState(update state).


Best Regards,
Hilmi




--
--
Hilmi Yildirim
Software Developer RD


http://www.neofonie.de

Besuchen Sie den Neo Tech Blog für Anwender:
http://blog.neofonie.de/

Folgen Sie uns:
https://plus.google.com/+neofonie
http://www.linkedin.com/company/neofonie-gmbh
https://www.xing.com/companies/neofoniegmbh

Neofonie GmbH | Robert-Koch-Platz 4 | 10115 Berlin
Handelsregister Berlin-Charlottenburg: HRB 67460
Geschäftsführung: Thomas Kitlitschko



Re: Memory in local setting

2015-06-17 Thread Ufuk Celebi

On 17 Jun 2015, at 10:10, Sebastian ssc.o...@googlemail.com wrote:

 Hi Ufuk,
 
 Can I configure this when running locally in the IDE or do I have to install 
 Flink for that?

Yes, you can configure it when running locally in the IDE:

org.apache.flink.configuration.Configuration conf = new Configuration();
// fraction of the heap given to Flink's managed memory (default is 0.7)
conf.setDouble(ConfigConstants.TASK_MANAGER_MEMORY_FRACTION_KEY, 0.7);

ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(conf);


You can check the size of Flink's managed memory in the task manager logs:

11:56:28,061 INFO  org.apache.flink.runtime.taskmanager.TaskManager 
 - Using 1227 MB for Flink managed memory.

Re: Memory in local setting

2015-06-17 Thread Chiwan Park
Hi.

You can increase the memory given to Flink by increasing the JVM heap size
when running locally.
If you are using Eclipse as your IDE, add an “-Xmx<heap size>” option (e.g.
“-Xmx4g”) to the run configuration [1].
If you are using IntelliJ IDEA, you can increase the JVM heap in the same way [2].

[1] 
http://help.eclipse.org/luna/index.jsp?topic=%2Forg.eclipse.jdt.doc.user%2Ftasks%2Ftasks-java-local-configuration.htm
[2] 
https://www.jetbrains.com/idea/help/creating-and-editing-run-debug-configurations.html

Regards,
Chiwan Park

 On Jun 17, 2015, at 2:01 PM, Sebastian s...@apache.org wrote:
 
 Hi,
 
 Flink has memory problems when I run an algorithm from my local IDE on a 2GB 
 graph. Is there any way that I can increase the memory given to Flink?
 
 Best,
 Sebastian
 
 Caused by: java.lang.RuntimeException: Memory ran out. numPartitions: 32 
 minPartition: 4 maxPartition: 4 number of overflow segments: 151 bucketSize: 
 146 Overall memory: 14024704 Partition memory: 4194304
   at 
 org.apache.flink.runtime.operators.hash.CompactingHashTable.getNextBuffer(CompactingHashTable.java:784)
   at 
 org.apache.flink.runtime.operators.hash.CompactingHashTable.insertBucketEntryFromSearch(CompactingHashTable.java:668)
   at 
 org.apache.flink.runtime.operators.hash.CompactingHashTable.insertOrReplaceRecord(CompactingHashTable.java:538)
   at 
 org.apache.flink.runtime.operators.hash.CompactingHashTable.buildTableWithUniqueKey(CompactingHashTable.java:347)
   at 
 org.apache.flink.runtime.iterative.task.IterationHeadPactTask.readInitialSolutionSet(IterationHeadPactTask.java:209)
   at 
 org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:270)
   at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
   at java.lang.Thread.run(Thread.java:745)