Re: Problem with Windowing
Can you post your whole program (both versions if possible)?

Otherwise I have only a wild guess: a common mistake is not to assign the stream variable properly:

DataStream ds = ...
ds = ds.APPLY_FUNCTIONS
ds.APPLY_MORE_FUNCTIONS

In your code example, the assignment is missing -- but maybe it is just missing in your email.

-Matthias

On 08/31/2015 04:38 PM, Dipl.-Inf. Rico Bergmann wrote:
> Hi!
>
> I have a problem that I cannot really track down. I'll try to describe
> the issue.
>
> My streaming Flink program computes something. At the end I'm doing the
> following on my DataStream ds:
>
> ds.window(2, TimeUnit.SECONDS)
>   .groupBy(/* custom KeySelector converting input to a String representation */)
>   .mapWindow(/* type conversion */)
>   .flatten()
>
> Then the result is written to a Kafka topic.
>
> The purpose of this is output deduplication within a 2-second window...
>
> Without the above the program works fine. But with the above I don't get
> any output and no error appears in the log. The program keeps running.
> Am I doing something wrong?
>
> I would be happy for help!
>
> Cheers, Rico.
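Matthias's point about the missing assignment can be illustrated with a minimal, self-contained sketch. The `Pipeline` class below is hypothetical (a stand-in for `DataStream`, which it is not): like Flink's fluent API, each call returns a new object rather than mutating the receiver, so discarding the return value silently discards the transformation.

```java
// Hypothetical immutable fluent API, standing in for DataStream:
// apply() returns a NEW Pipeline; the receiver is never changed.
final class Pipeline {
    private final String ops;
    Pipeline(String ops) { this.ops = ops; }
    Pipeline apply(String op) { return new Pipeline(ops + " -> " + op); }
    String describe() { return ops; }
}

public class AssignmentPitfall {
    public static void main(String[] args) {
        Pipeline ds = new Pipeline("source");

        ds.apply("window");                // WRONG: result ignored,
                                           // like ds.APPLY_FUNCTIONS with no assignment
        System.out.println(ds.describe()); // still just "source"

        ds = ds.apply("window");           // RIGHT: reassign the result
        System.out.println(ds.describe()); // "source -> window"
    }
}
```

If the windowing/deduplication calls in the program were made without assigning the result (or without calling `addSink` on the returned stream), the transformation would never reach the Kafka sink.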
Re: Problem with Windowing
The part is exactly as I wrote. ds is assigned a data flow that computes some stuff. Then the deduplication code as written in my first mail is assigned to a new variable called output. Then output.addSink(.) is called.

> On 31.08.2015 at 17:45, Matthias J. Sax wrote:
>
> Can you post your whole program (both versions if possible)?
>
> Otherwise I have only a wild guess: a common mistake is not to assign
> the stream variable properly:
>
> DataStream ds = ...
> ds = ds.APPLY_FUNCTIONS
> ds.APPLY_MORE_FUNCTIONS
>
> In your code example, the assignment is missing -- but maybe it is just
> missing in your email.
>
> -Matthias
>
>> On 08/31/2015 04:38 PM, Dipl.-Inf. Rico Bergmann wrote:
>> Hi!
>>
>> I have a problem that I cannot really track down. I'll try to describe
>> the issue.
>>
>> My streaming Flink program computes something. At the end I'm doing the
>> following on my DataStream ds:
>>
>> ds.window(2, TimeUnit.SECONDS)
>>   .groupBy(/* custom KeySelector converting input to a String representation */)
>>   .mapWindow(/* type conversion */)
>>   .flatten()
>>
>> Then the result is written to a Kafka topic.
>>
>> The purpose of this is output deduplication within a 2-second window...
>>
>> Without the above the program works fine. But with the above I don't get
>> any output and no error appears in the log. The program keeps running.
>> Am I doing something wrong?
>>
>> I would be happy for help!
>>
>> Cheers, Rico.
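The intent of the groupBy/mapWindow step — keep one element per key within a window — can be sketched in plain Java (this is a hypothetical illustration of the deduplication logic, not the Flink API; `deduplicateWindow` and the String elements are assumptions):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class WindowDedup {
    // Deduplicate the contents of one window: keep only the first
    // occurrence of each key, preserving arrival order. This is what
    // the groupBy + mapWindow step aims to do per 2-second window.
    static List<String> deduplicateWindow(List<String> windowContents) {
        return new ArrayList<>(new LinkedHashSet<>(windowContents));
    }

    public static void main(String[] args) {
        List<String> window = Arrays.asList("a", "b", "a", "c", "b");
        System.out.println(deduplicateWindow(window)); // [a, b, c]
    }
}
```

In the actual job, the window operator buffers elements until the 2-second window fires, which is one reason output can appear delayed or absent if the window never triggers.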
Re: Best way for simple logging in jobs?
@Arnaud Are you looking for a separate user log file next to the system log file, or would Robert's suggestion work?

On Fri, Aug 28, 2015 at 4:20 PM, Robert Metzger wrote:
> Hi,
>
> Creating an slf4j logger like this:
>
> private static final Logger LOG =
>     LoggerFactory.getLogger(PimpedKafkaSink.class);
>
> works for me. The messages also end up in the regular YARN logs. System
> out should actually also end up in YARN (when retrieving the logs via
> log aggregation).
>
> Regards,
> Robert
>
> On Fri, Aug 28, 2015 at 3:55 PM, LINZ, Arnaud wrote:
>> Hi,
>>
>> I am wondering if it's possible to get my own logs inside the job
>> functions (sources, mappers, sinks...). It would be nice if I could get
>> those logs into YARN's logs, but writing to System.out/System.err has
>> no effect.
>>
>> For now I'm using a "StringBuffer" accumulator, but it does not work in
>> streaming apps before v0.10, and it only shows results at the end.
>>
>> I'll probably end up using an HDFS logging system, but maybe there is a
>> smarter way?
>>
>> Greetings,
>> Arnaud
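Robert's snippet uses slf4j, which is the logging facade Flink itself uses. As a self-contained sketch of the same pattern (a static per-class logger created once and used from the function's methods), here is the equivalent with `java.util.logging` from the JDK — the class name `SinkLoggingDemo` is made up for illustration; in a real job you would use slf4j's `LoggerFactory.getLogger(...)` exactly as Robert shows:

```java
import java.util.logging.Logger;

public class SinkLoggingDemo {
    // One static logger per class, named after the class -- the same
    // pattern as LoggerFactory.getLogger(PimpedKafkaSink.class).
    private static final Logger LOG =
            Logger.getLogger(SinkLoggingDemo.class.getName());

    // Exposed so the logger's identity can be inspected.
    static String loggerName() { return LOG.getName(); }

    public static void main(String[] args) {
        // Messages logged this way go to the configured log files,
        // which YARN's log aggregation then picks up.
        LOG.info("element written to sink");
    }
}
```

Because the logger is static, it is created on the task manager's JVM where the function actually runs, so its output lands in that container's log files rather than being lost.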
Re: Custom Class for state checkpointing
We've finally merged the fix for the bug you reported here (https://issues.apache.org/jira/browse/FLINK-2543). You should now be able to use the file-based state handle with user classes as well. Please let us know if you encounter more issues.

On Wed, Aug 19, 2015 at 10:20 AM, Rico Bergmann wrote:
> Hi.
>
> Thanks for the tip. It seems to work...
>
> Greets.
>
> On 18.08.2015 at 13:56, Stephan Ewen wrote:
>
> Yep, that is a valid bug!
> State is apparently not resolved with the correct classloader.
>
> As a workaround, you can checkpoint byte arrays and serialize/deserialize
> the state into byte arrays yourself. You can use the Apache Commons
> SerializationUtils class, or Flink's InstantiationUtil class for that.
>
> You can get the ClassLoader for the user code (needed for deserialization)
> via "getRuntimeContext().getUserCodeClassLoader()".
>
> Let us know if that workaround works. We'll try to get a fix for that out
> very soon!
>
> Greetings,
> Stephan
>
> On Tue, Aug 18, 2015 at 12:23 PM, Robert Metzger wrote:
>> Java's HashMap is serializable.
>> If it is only the map, you can just use the HashMap<> as the state.
>>
>> If you have more data, you can use a TupleX, for example:
>>
>> Tuple2<HashMap<...>, Long>(myMap, myLong);
>>
>> On Tue, Aug 18, 2015 at 12:21 PM, Rico Bergmann wrote:
>>> Hi!
>>>
>>> Using TupleX is not possible since the state is very big (a Hashtable).
>>>
>>> How would I have to do the serialization into a byte array?
>>>
>>> Greets, Rico.
>>>
>>> On 18.08.2015 at 11:44, Robert Metzger wrote:
>>>
>>> Hi Rico,
>>>
>>> I'm pretty sure that this is a valid bug you've found, since this case
>>> is not yet tested (afaik).
>>> We'll fix the issue asap. Until then, are you able to encapsulate your
>>> state in something that is available in Flink, for example a TupleX, or
>>> just serialize it yourself into a byte[]?
>>>
>>> On Tue, Aug 18, 2015 at 11:32 AM, Rico Bergmann wrote:
>>>
>>> Hi! Is it possible to use your own class?
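Stephan's byte-array workaround can be sketched with plain Java serialization (a minimal, self-contained version; class and method names here are illustrative). In the real job you would return the `byte[]` from the `Checkpointed` snapshot method so Flink only ever handles bytes, and on restore set up deserialization against `getRuntimeContext().getUserCodeClassLoader()` so user classes resolve:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

public class ByteArrayStateDemo {
    // Snapshot side: serialize the state map into a byte[], so the
    // checkpointing machinery never needs to resolve the user class.
    static byte[] serialize(HashMap<String, Long> state) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(state);
        }
        return bos.toByteArray();
    }

    // Restore side: deserialize the byte[] back into the state map.
    // (In Flink, use the user-code classloader here for user types.)
    @SuppressWarnings("unchecked")
    static HashMap<String, Long> deserialize(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (HashMap<String, Long>) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        HashMap<String, Long> state = new HashMap<>();
        state.put("sessions", 42L);
        HashMap<String, Long> restored = deserialize(serialize(state));
        System.out.println(restored); // {sessions=42}
    }
}
```

This sidesteps the ClassNotFoundException below because the only class the checkpoint machinery has to resolve is `byte[]`, which is always on the classpath.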
I'm using the file state handler at the JobManager and implemented the Checkpointed interface. I tried this and got an exception:

Error: java.lang.RuntimeException: Failed to deserialize state handle and setup initial operator state.
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:544)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.ottogroup.bi.searchlab.searchsessionizer.OperatorState
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:626)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.flink.runtime.state.ByteStreamStateHandle.getState(ByteStreamStateHandle.java:63)
    at org.apache.flink.runtime.state.ByteStreamStateHandle.getState(ByteStreamStateHandle.java:33)
    at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreInitialState(AbstractUdfStreamOperator.java:83)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.setInitialState(StreamTask.java:276)
    at org.apache.flink.runtime.state.StateUtils.setOperatorState(StateUtils.java:51)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:541)