Re: Problem with Windowing

2015-08-31 Thread Matthias J. Sax
Can you post your whole program (both versions if possible)?

Otherwise I have only a wild guess: A common mistake is not to assign
the stream variable properly:

DataStream ds = ...

ds = ds.APPLY_FUNCTIONS

ds.APPLY_MORE_FUNCTIONS

In your code example, the assignment is missing -- but maybe it is just
missing in your email.
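
For example (a runnable sketch; the filter stands in for whatever
functions the job actually applies):

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AssignmentPitfall {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> ds = env.fromElements("a", "", "b");

        // Mistake: the returned stream is ignored. Later calls on ds
        // still see the unfiltered stream; this filter becomes a dead
        // branch of the dataflow.
        ds.filter(new NonEmptyFilter());

        // Correct: assign the operator's result back.
        ds = ds.filter(new NonEmptyFilter());

        ds.print();
        env.execute();
    }

    private static class NonEmptyFilter implements FilterFunction<String> {
        @Override
        public boolean filter(String value) {
            return !value.isEmpty();
        }
    }
}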

-Matthias


On 08/31/2015 04:38 PM, Dipl.-Inf. Rico Bergmann wrote:
> Hi!
> 
> I have a problem that I cannot really track down. I'll try to describe
> the issue.
> 
> My streaming Flink program computes something. At the end, I'm doing the
> following on my DataStream ds:
> ds.window(2, TimeUnit.SECONDS)
> .groupBy(/*custom KeySelector converting input to a String
> representation*/)
> .mapWindow(/*TypeConversion*/)
> .flatten()
> 
> Then the result is written to a Kafka topic.
> 
> The purpose of this is output deduplication within a 2-second window...
> 
> Without the above the program works fine. But with the above I don't get
> any output and no error appears in the log. The program keeps running.
> Am I doing something wrong?
> 
> I would be happy for help!
> 
> Cheers, Rico.




Re: Problem with Windowing

2015-08-31 Thread Rico Bergmann
The part is exactly as I wrote: ds is assigned a data flow that computes some
stuff. Then the deduplication code as written in my first mail is assigned to
a new variable called output. Then output.addSink(...) is called.
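
For reference, the deduplication the window step is supposed to perform
boils down to this per-window logic (a plain-Java sketch, independent of
the Flink API; names are illustrative):

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class WindowDedup {

    // Emit each distinct string representation once per window; repeats
    // arriving within the same 2-second window are dropped.
    static List<String> deduplicate(Iterable<String> windowElements) {
        LinkedHashSet<String> distinct = new LinkedHashSet<>();
        for (String element : windowElements) {
            distinct.add(element); // duplicates collapse here
        }
        return new ArrayList<>(distinct);
    }
}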


> On 31.08.2015 at 17:45, Matthias J. Sax wrote:
> 
> Can you post your whole program (both versions if possible)?
> 
> Otherwise I have only a wild guess: A common mistake is not to assign
> the stream variable properly:
> 
> DataStream ds = ...
> 
> ds = ds.APPLY_FUNCTIONS
> 
> ds.APPLY_MORE_FUNCTIONS
> 
> In your code example, the assignment is missing -- but maybe it is just
> missing in your email.
> 
> -Matthias
> 
> 
>> On 08/31/2015 04:38 PM, Dipl.-Inf. Rico Bergmann wrote:
>> Hi!
>> 
>> I have a problem that I cannot really track down. I'll try to describe
>> the issue.
>> 
>> My streaming Flink program computes something. At the end, I'm doing the
>> following on my DataStream ds:
>> ds.window(2, TimeUnit.SECONDS)
>> .groupBy(/*custom KeySelector converting input to a String
>> representation*/)
>> .mapWindow(/*TypeConversion*/)
>> .flatten()
>> 
>> Then the result is written to a Kafka topic.
>> 
>> The purpose of this is output deduplication within a 2-second window...
>> 
>> Without the above the program works fine. But with the above I don't get
>> any output and no error appears in the log. The program keeps running.
>> Am I doing something wrong?
>> 
>> I would be happy for help!
>> 
>> Cheers, Rico.
> 


Re: Best way for simple logging in jobs?

2015-08-31 Thread Stephan Ewen
@Arnaud

Are you looking for a separate user log file next to the system log file,
or would Robert's suggestion work?

On Fri, Aug 28, 2015 at 4:20 PM, Robert Metzger  wrote:

> Hi,
>
> Creating a slf4j logger like this:
>
> private static final Logger LOG = 
> LoggerFactory.getLogger(PimpedKafkaSink.class);
>
> Works for me. The messages also end up in the regular YARN logs. System.out
> output should also end up in YARN (when retrieving the logs via log
> aggregation).
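>
> In a rich function, that looks roughly like this (a sketch; the class
> name and message are made up):
>
> import org.apache.flink.api.common.functions.RichMapFunction;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
>
> public class LoggingMapper extends RichMapFunction<String, String> {
>
>     private static final Logger LOG =
>             LoggerFactory.getLogger(LoggingMapper.class);
>
>     @Override
>     public String map(String value) {
>         // Goes to the TaskManager's log file, which YARN log aggregation
>         // collects alongside the system logs.
>         LOG.info("Processing record: {}", value);
>         return value;
>     }
> }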
>
> Regards,
>
> Robert
>
>
> On Fri, Aug 28, 2015 at 3:55 PM, LINZ, Arnaud 
> wrote:
>
>> Hi,
>>
>>
>>
>> I am wondering if it’s possible to get my own logs inside the job
>> functions (sources, mappers, sinks…). It would be nice if I could get
>> those logs into the YARN logs, but writing to System.out/System.err has
>> no effect.
>>
>>
>>
>> For now I’m using a “StringBuffer” accumulator, but it does not work in
>> streaming apps before v0.10 and only shows results at the end.
>>
>>
>>
>> I’ll probably end up using an HDFS logging system, but maybe there is a
>> smarter way?
>>
>>
>>
>> Greetings,
>>
>> Arnaud
>>
>>
>>
>> --
>>
>> The integrity of this message cannot be guaranteed on the Internet. The
>> company that sent this message cannot therefore be held liable for its
>> content nor attachments. Any unauthorized use or dissemination is
>> prohibited. If you are not the intended recipient of this message, then
>> please delete it and notify the sender.
>>
>
>


Re: Custom Class for state checkpointing

2015-08-31 Thread Robert Metzger
We've finally merged the fix for the bug you've reported here (
https://issues.apache.org/jira/browse/FLINK-2543).
You should now be able to use the file-based state handle with user classes
as well.

Please let us know if you encounter more issues.

On Wed, Aug 19, 2015 at 10:20 AM, Rico Bergmann 
wrote:

> Hi.
>
> Thanks for the tip. It seems to work...
>
> Greets.
>
>
>
> On 18.08.2015 at 13:56, Stephan Ewen wrote:
>
> Yep, that is a valid bug!
> State is apparently not resolved with the correct classloader.
>
> As a workaround, you can checkpoint byte arrays and serialize/deserialize
> the state yourself. You can use the Apache Commons SerializationUtils
> class or Flink's InstantiationUtil class for that.
>
> You can get the ClassLoader for the user code (needed for deserialization)
> via "getRuntimeContext().getUserCodeClassLoader()".
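>
> A minimal sketch of that workaround (the state type and class name are
> illustrative, not from your job):
>
> import java.util.HashMap;
>
> import org.apache.flink.api.common.functions.RichMapFunction;
> import org.apache.flink.streaming.api.checkpoint.Checkpointed;
> import org.apache.flink.util.InstantiationUtil;
>
> public class DedupMapper extends RichMapFunction<String, String>
>         implements Checkpointed<byte[]> {
>
>     private HashMap<String, Long> seen = new HashMap<>();
>
>     @Override
>     public String map(String value) {
>         seen.put(value, System.currentTimeMillis());
>         return value;
>     }
>
>     // Checkpoint a plain byte[] so the runtime never has to resolve the
>     // user class when restoring the state handle.
>     @Override
>     public byte[] snapshotState(long checkpointId, long timestamp)
>             throws Exception {
>         return InstantiationUtil.serializeObject(seen);
>     }
>
>     @SuppressWarnings("unchecked")
>     @Override
>     public void restoreState(byte[] state) throws Exception {
>         seen = (HashMap<String, Long>) InstantiationUtil.deserializeObject(
>                 state, getRuntimeContext().getUserCodeClassLoader());
>     }
> }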
>
> Let us know if that workaround works. We'll try to get a fix for that out
> very soon!
>
> Greetings,
> Stephan
>
>
>
> On Tue, Aug 18, 2015 at 12:23 PM, Robert Metzger 
> wrote:
>
>> Java's HashMap is serializable.
>> If it is only the map, you can just use the HashMap<> as the state.
>>
>> If you have more data, you can use TupleX, for example:
>>
>> Tuple2<HashMap<K, V>, Long>(myMap, myLong);
>>
>>
>> On Tue, Aug 18, 2015 at 12:21 PM, Rico Bergmann 
>> wrote:
>>
>>> Hi!
>>>
>>> Using TupleX is not possible since the state is very big (a Hashtable).
>>>
>>> How would I have to do serialization into a byte array?
>>>
>>> Greets. Rico.
>>>
>>>
>>>
>>> On 18.08.2015 at 11:44, Robert Metzger wrote:
>>>
>>> Hi Rico,
>>>
>>> I'm pretty sure that this is a valid bug you've found, since this case
>>> is not yet tested (afaik).
>>> We'll fix the issue ASAP. Until then, are you able to encapsulate your
>>> state in something that is available in Flink, for example a TupleX, or
>>> just serialize it yourself into a byte[]?
>>>
>>> On Tue, Aug 18, 2015 at 11:32 AM, Rico Bergmann 
>>> wrote:
>>>
 Hi!
 Is it possible to use your own class?
 I'm using the file state handler at the JobManager and implemented the
 Checkpointed interface.

 I tried this and got an exception:

 Error: java.lang.RuntimeException: Failed to deserialize state handle
 and setup initial operator state.
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:544)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.ClassNotFoundException:
 com.ottogroup.bi.searchlab.searchsessionizer.OperatorState

 at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:348)
 at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:626)
 at
 java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
 at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
 at
 org.apache.flink.runtime.state.ByteStreamStateHandle.getState(ByteStreamStateHandle.java:63)
 at
 org.apache.flink.runtime.state.ByteStreamStateHandle.getState(ByteStreamStateHandle.java:33)
 at
 org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreInitialState(AbstractUdfStreamOperator.java:83)
 at
 org.apache.flink.streaming.runtime.tasks.StreamTask.setInitialState(StreamTask.java:276)
 at
 org.apache.flink.runtime.state.StateUtils.setOperatorState(StateUtils.java:51)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:541)


>>>
>>
>