Error when using FlinkML iterations with KeyedCoProcessFunction

2024-03-27 Thread Komal M
Hi,

As the DataStream API's iterativeStream method has been deprecated for future 
Flink releases, the documentation recommends using Flink ML's iteration as an 
alternative. I am trying to build my understanding of the new iterations API, as 
it will be a requirement for our future projects.

As an exercise, I'm trying to implement a KeyedCoProcessFunction inside the 
iteration body that takes the feedback stream and the non-feedback stream as 
inputs, but I get the following error. Do you know what could be causing it? For 
reference, I do not get any error when applying a .keyBy().flatMap() on each of 
the streams individually inside the iteration body.

Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
    at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
    …
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:139)
    …
    at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
    ... 5 more
Caused by: java.lang.ClassCastException: class org.apache.flink.iteration.IterationRecord cannot be cast to class org.apache.flink.api.java.tuple.Tuple (org.apache.flink.iteration.IterationRecord and org.apache.flink.api.java.tuple.Tuple are in unnamed module of loader 'app')
    at org.apache.flink.api.java.typeutils.runtime.TupleComparator.extractKeys(TupleComparator.java:148)
    at org.apache.flink.streaming.util.keys.KeySelectorUtil$ComparableKeySelector.getKey(KeySelectorUtil.java:195)
    at org.apache.flink.streaming.util.keys.KeySelectorUtil$ComparableKeySelector.getKey(KeySelectorUtil.java:168)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.setKeyContextElement(AbstractStreamOperator.java:502)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.setKeyContextElement1(AbstractStreamOperator.java:478)
    at org.apache.flink.iteration.operator.allround.AbstractAllRoundWrapperOperator.setKeyContextElement1(AbstractAllRoundWrapperOperator.java:203)
    at org.apache.flink.streaming.runtime.io.RecordProcessorUtils.lambda$getRecordProcessor1$1(RecordProcessorUtils.java:87)
    at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory$StreamTaskNetworkOutput.emitRecord(StreamTwoInputProcessorFactory.java:254)
    at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:146)
    at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:110)
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
    at org.apache.flink.streaming.runtime.io.StreamMultipleInputProcessor.processInput(StreamMultipleInputProcessor.java:85)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:550)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:839)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:788)
    at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952)
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:931)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
    at java.base/java.lang.Thread.run(Thread.java:829)


I am attaching the full test code below for reference. All it does is subtract 
1 from the feedback stream until each tuple reaches 0.0. For each subtraction 
it outputs a relevant message to the finalOutput stream. These messages are 
stored in the keyed state of the KeyedCoProcessFunction and are preloaded into 
the parallel instances by a DataStream called initialStates. Each key has 
different messages associated with it, hence the need for MapState.



import java.util.*;
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.iteration.DataStreamList;
import org.apache.flink.iteration.IterationBodyResult;
import 
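The attached code is truncated in the archive. For orientation, here is a hedged sketch of the general shape of a Flink ML unbounded iteration whose body consumes the feedback stream and a non-feedback stream, based only on the imports above and the Flink ML 2.x iteration API; all stream names and element types are illustrative, not the poster's actual code:

```java
// Hedged sketch only: the wiring of a Flink ML unbounded iteration.
// `initialFeedback` and `initialStates` stand in for the poster's streams.
DataStreamList result =
        Iterations.iterateUnboundedStreams(
                DataStreamList.of(initialFeedback),   // variable (feedback) stream
                DataStreamList.of(initialStates),     // non-feedback ("constant") stream
                new IterationBody() {
                    @Override
                    public IterationBodyResult process(
                            DataStreamList variableStreams, DataStreamList dataStreams) {
                        DataStream<Tuple2<String, Double>> feedback = variableStreams.get(0);
                        DataStream<Tuple3<String, Double, String>> states = dataStreams.get(0);

                        // ... connect / keyBy / process the two streams here ...
                        DataStream<Tuple2<String, Double>> newFeedback = feedback; // placeholder
                        DataStream<String> finalOutput = states.map(t -> t.f2);    // placeholder

                        return new IterationBodyResult(
                                DataStreamList.of(newFeedback),  // fed back into the next round
                                DataStreamList.of(finalOutput)); // leaves the iteration
                    }
                });
```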

Re: FlinkML 'DenseVector' object has no attribute 'get_fields_by_names'

2023-09-19 Thread Evgeniy Lyutikov
Thanks for the answer, I'll try.
Are there examples or tutorials somewhere on how to use FlinkML in real-life 
scenarios, such as streaming Kafka through a model?



From: Xin Jiang 
Sent: 19 September 2023, 8:07:11
To: Evgeniy Lyutikov
Cc: user@flink.apache.org
Subject: Re: FlinkML 'DenseVector' object has no attribute 'get_fields_by_names'

Hi Evgeniy,

Yes, the reason for the exception is that you are returning an incorrect data 
type. Flink ML doesn't have a data type for `DenseVector`, but it provides a 
function called `pyflink.ml.functions.array_to_vector` which returns an 
`Expression`. So you could modify your UDF to combine multiple columns into one 
column of `DataTypes.ARRAY()`, and then call 
`pyflink.ml.functions.array_to_vector` on that column.


Best,
Xin


“This message contains confidential information/commercial secret. If you are 
not the intended addressee of this message you may not copy, save, print or 
forward it to any third party and you are kindly requested to destroy this 
message and notify the sender thereof by email.
Данное сообщение содержит конфиденциальную информацию/информацию, являющуюся 
коммерческой тайной. Если Вы не являетесь надлежащим адресатом данного 
сообщения, Вы не вправе копировать, сохранять, печатать или пересылать его 
каким либо иным лицам. Просьба уничтожить данное сообщение и уведомить об этом 
отправителя электронным письмом.”


Re: FlinkML 'DenseVector' object has no attribute 'get_fields_by_names'

2023-09-18 Thread Xin Jiang
Hi Evgeniy,

Yes, the reason for the exception is that you are returning an incorrect data 
type. Flink ML doesn't have a data type for `DenseVector`, but it provides a 
function called `pyflink.ml.functions.array_to_vector` which returns an 
`Expression`. So you could modify your UDF to combine multiple columns into one 
column of `DataTypes.ARRAY()`, and then call 
`pyflink.ml.functions.array_to_vector` on that column.


Best,
Xin
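A hedged sketch of what Xin describes, reusing the table and column names from the question below (`input_table` and `feature1`…`feature5` are assumptions carried over from that code); the essential points are the `DataTypes.ARRAY(DataTypes.DOUBLE())` result type on the UDF and the `array_to_vector` call:

```python
from pyflink.ml.functions import array_to_vector
from pyflink.table import DataTypes
from pyflink.table.expressions import col
from pyflink.table.udf import udf

# Return a plain ARRAY<DOUBLE> from the UDF (not a DenseVector)...
to_array = udf(lambda *args: list(args),
               result_type=DataTypes.ARRAY(DataTypes.DOUBLE()))

# ...then convert that array column into a vector expression for Flink ML.
train_table = input_table.select(
    col("id"),
    array_to_vector(
        to_array(col("feature1"), col("feature2"), col("feature3"),
                 col("feature4"), col("feature5"))).alias("features"),
    col("label").cast(DataTypes.DOUBLE()).alias("label"))
```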

FlinkML 'DenseVector' object has no attribute 'get_fields_by_names'

2023-09-18 Thread Evgeniy Lyutikov
Hello community!

I'm trying to use FlinkML to train a model on data from a PostgreSQL table, and 
I get an error when I try to view the output table after applying the model:


AttributeError: 'DenseVector' object has no attribute 'get_fields_by_names'


My code:

# Create train source table
t_env.execute_sql(
    """
    CREATE TABLE train_source (
        id TINYINT,
        feature1 FLOAT,
        feature2 FLOAT,
        feature3 FLOAT,
        feature4 FLOAT,
        feature5 FLOAT,
        label TINYINT
    ) WITH (
        'url' = 'jdbc:postgresql://localhost:5432/train',
        'table-name' = 'train',
        'username' = 'postgres',
        'password' = 'postgres',
        'connector' = 'jdbc'
    )
    """
)
input_table = t_env.from_path("train_source")

class ListDenseVector(ScalarFunction):
    def eval(self, *args):
        return Vectors.dense(list(args))

# Map DenseVector to features
t_env.create_temporary_function(
    "dense_map", udf(ListDenseVector(), result_type=DataTypes.ROW()))
train_table = t_env.sql_query(
    "SELECT id, dense_map(feature1, feature2, feature3, feature4, feature5) "
    "AS features, CAST(label AS DOUBLE) AS label FROM train_source")

# Train model
logistic_regression = LogisticRegression()
model = logistic_regression.fit(train_table)
output = model.transform(train_table)[0]

# Print result
output.execute().print()

I also tried to change the output type in the UDF, but I get an error:

TypeError: Invalid returnType: returnType should be DataType or str but is 
DenseVectorTypeInfo




FlinkMl

2023-05-14 Thread Danyal Awan
hello,

For my master thesis I am comparing ML frameworks on data streams.

What is the current status of FlinkML? Is distributed learning possible on
multiple nodes? If yes, how?

I played around with FlinkML a bit and modeled a simple pipeline for
sentiment analysis on tweets. For this I used the Sentiment140 dataset,
which contains 1.6 million tweets.
Unfortunately I can only use a small amount of data (about 3 samples)
for training; otherwise the TaskManager gets lost or crashes, even though I
have allocated plenty of memory to the TaskManager (JVM heap size is set to
50 GB). But training should also work with more data, right?

greetings


Re: info about flinkml

2020-09-14 Thread Yun Tang
Hi

FlinkML was dropped as of Flink 1.9 [1], and a new machine learning library 
has been developed under the umbrella of FLIP-39 [2][3].
As far as I know, the new Flink ML library is not yet complete, so you could 
try Alink [4], a machine learning algorithm platform based on Flink, which is 
also developed by active contributors to FLIP-39.


[1] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Usage-of-flink-ml-and-DISCUSS-Delete-flink-ml-td29057.html
[2] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs
[3] https://issues.apache.org/jira/browse/FLINK-12470
[4] https://github.com/alibaba/Alink

Best
Yun Tang


From: Cristian Lorenzetto 
Sent: Monday, September 14, 2020 18:59
To: user@flink.apache.org 
Subject: info about flinkml

Hi, I am evaluating adopting Flink instead of Spark for a data mining processor. 
I knew FlinkML for this purpose, but in the latest release I can't find it. Why?
Can you suggest the best way forward?


--

Cristian Lorenzetto
Direzione ICT e Agenda Digitale
U.O. Demand, Progettazione e Sviluppo Software
Tel: 041 2792619

Ai sensi del vigente D.Lgs. 196/2003 in materia di privacy e del Regolamento 
(UE) 2016/679 del Parlamento europeo e del Consiglio si precisa che le 
informazioni contenute nel messaggio e negli eventuali allegati sono riservate 
esclusivamente al/ai destinatario/i indicato/i. Si invita ad astenersi 
dall'effettuare: inoltri, copie, distribuzioni e divulgazioni non autorizzate 
del presente messaggio e degli eventuali allegati. Nel caso di erroneo 
recapito, si chiede cortesemente a chi legge di dare immediata comunicazione al 
mittente e di cancellare il presente messaggio e gli eventuali allegati. 
Informazioni aggiuntive nella sezione **Privacy** del sito internet: 
www.regione.veneto.it
--
According to the Italian law D.Lgs. 196/2003 and the Regulation (EU) 2016/679 
of the European Parliament and of the Council, the information contained in this 
message and any attachment thereto is addressed exclusively to the 
intended recipient. Please refrain from copying or forwarding the message 
and its attachments or disclosing their content without authorisation.
If this message was delivered to you in error, please immediately inform 
the sender and delete the message and its attachments. Additional information 
is available in the **Privacy** section of the website: 
www.regione.veneto.it


info about flinkml

2020-09-14 Thread Cristian Lorenzetto
Hi, I am evaluating adopting Flink instead of Spark for a data mining processor.
I knew FlinkML for this purpose, but in the latest release I can't find it. Why?
Can you suggest the best way forward?


-- 

Cristian Lorenzetto
Direzione ICT e Agenda Digitale
U.O. Demand, Progettazione e Sviluppo Software
Tel: 041 2792619




Re: FlinkML status

2020-08-03 Thread Till Rohrmann
Hi Mohamed,

the development of FlinkML has been stopped in favour of a new machine
learning library which you can find here [1]. Be aware that this library is
still under development.

[1] https://github.com/apache/flink/tree/master/flink-ml-parent

Cheers,
Till

On Sat, Aug 1, 2020 at 10:35 AM Mohamed Haseeb  wrote:

> Hi,
>
> What's the current status of FlinkML? Is it still part of Flink? The last
> Flink release that has documentation about it is 1.8.
>
> Thanks,
> M. Haseeb
>


FlinkML status

2020-08-01 Thread Mohamed Haseeb
Hi,

What's the current status of FlinkML? Is it still part of Flink? The last
Flink release that has documentation about it is 1.8.

Thanks,
M. Haseeb


Re: FlinkML

2018-04-18 Thread Christophe Salperwyck
Hi,

You could try to plug MOA/Weka library too. I did some preliminary work
with that:
https://moa.cms.waikato.ac.nz/moa-with-apache-flink/

but then they are no longer FlinkML algorithms.

Best regards,
Christophe


2018-04-18 21:13 GMT+02:00 shashank734 <shashank...@gmail.com>:

> There are no active discussions or guide on that. But I found this example
> in the repo:
>
> https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/ml/IncrementalLearningSkeleton.java
>
> Which is trying to do the same thing. Although I haven't checked this yet.
>
>
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.
> nabble.com/
>


Re: FlinkML

2018-04-18 Thread shashank734
There are no active discussions or guide on that. But I found this example in
the repo :

https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/ml/IncrementalLearningSkeleton.java

  

Which is trying to do the same thing. Although I haven't checked this yet.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: FlinkML

2018-04-18 Thread Christophe Jolif
Szymon,

The short answer is no. See:
http://mail-archives.apache.org/mod_mbox/flink-user/201802.mbox/%3ccaadrtt39ciiec1uzwthzgnbkjxs-_h5yfzowhzph_zbidux...@mail.gmail.com%3E


On Mon, Apr 16, 2018 at 11:25 PM, Szymon Szczypiński <simo...@poczta.fm>
wrote:

> Hi,
>
> I wonder if it is possible to build a FlinkML streaming job, not a
> batch job. The examples at
> https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/libs/ml/
> are only batch examples.
>
> Is there any possibility?
>
>
> Best regards.
>
>


-- 
Christophe


FlinkML

2018-04-16 Thread Szymon Szczypiński

Hi,

I wonder if it is possible to build a FlinkML streaming job, not a 
batch job. The examples at 
https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/libs/ml/ 
are only batch examples.


Is there any possibility?


Best regards.



Running FlinkML ALS with more than two features

2018-03-19 Thread Banias H
Hello Flink experts,

I am new to FlinkML and currently playing around with using ALS in a
recommender system. In our dataset, we have more than 2 features. When I
tried running the example towards the bottom of this page:
https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/libs/ml/als.html,
I got a *method not implemented* error in fit(). Here is how I set up
inputDS:

val inputDS: DataSet[(Int, Int, Int, Int, Double)] =
  env.readCsvFile[(Int, Int, Int, Int, Double)](pathToTrainingFile)
...
als.fit(inputDS, parameters)

However, when I used only 2 features (i.e. passing DataSet[(Int, Int, Double)] to
fit()), it ran successfully. Is this a limitation of ALS in general, or is it
a configuration issue?

I would appreciate any info on this. Thanks.

Regards,
BH


Re: FlinkML ALS is taking too long to run

2017-07-12 Thread Sebastian Schelter
I don't think you need to employ a distributed system for working with this
dataset. An SGD implementation on a single machine should easily handle the
job.

Best,
Sebastian

2017-07-12 9:26 GMT+02:00 Andrea Spina <andrea.sp...@radicalbit.io>:

> Dear Ziyad,
>
> Yep, I encountered the same very long runtimes with ALS at the time, and I
> saw improvements by increasing the number of blocks / decreasing the number
> of task slots per TaskManager (#TSs/TM), as you pointed out.
>
> Cheers,
>
> Andrea
>
>
>
>
>
>
> --
> View this message in context: http://apache-flink-user-
> mailing-list-archive.2336050.n4.nabble.com/FlinkML-ALS-is-
> taking-too-long-to-run-tp14154p14192.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive
> at Nabble.com.
>


Re: FlinkML ALS is taking too long to run

2017-07-12 Thread Andrea Spina
Dear Ziyad, 

Yep, I encountered the same very long runtimes with ALS at the time, and I
saw improvements by increasing the number of blocks / decreasing the number
of task slots per TaskManager (#TSs/TM), as you pointed out.

Cheers,

Andrea






--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/FlinkML-ALS-is-taking-too-long-to-run-tp14154p14192.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.


Re: FlinkML ALS is taking too long to run

2017-07-11 Thread Andrea Spina
Dear Ziyad,
could you kindly share some additional info about your environment
(local/cluster, nodes, machines' configuration)?
What exactly do you mean by "indefinitely"? How long does the job hang?

Hope this helps.

Cheers,

Andrea



--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/FlinkML-ALS-is-taking-too-long-to-run-tp14154p14186.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.


FlinkML ALS is taking too long to run

2017-07-07 Thread Ziyad Muhammed
Dear all

I'm trying to run Flink ALS against the Yahoo R2 dataset [1] on HDFS. The
program runs without showing any errors, but it does not finish. The
operators running indefinitely are:

CoGroup (CoGroup at
org.apache.flink.ml.recommendation.ALS$.updateFactors(ALS.scala:606))(11/240)

Join(Join at
org.apache.flink.ml.recommendation.ALS$.updateFactors(ALS.scala:576))(15/240)


I was using the below parameters to run:

val als = ALS().setIterations(10).setNumFactors(10).setBlocks(100)

I didn't set the HDFS temporary path. Can someone tell me which
parameters to set to run ALS on such large datasets? Why are these
operators running indefinitely?

[1] https://webscope.sandbox.yahoo.com/catalog.php?datatype=r

Best
Ziyad


Re: Using FlinkML from Java?

2017-04-21 Thread Till Rohrmann
Hi Steve,

unfortunately, FlinkML's pipeline mechanism depends on Scala's implicit
value feature. Therefore, FlinkML can only be used with Scala if you don't
want to construct the pipelines manually (which I wouldn't recommend).

Cheers,
Till

On Thu, Apr 20, 2017 at 6:56 PM, Steve Jerman <st...@kloudspot.com> wrote:

> Hi Folks,
>
> I’m trying to use FlinkML 1.2 from Java … getting this:
>
> SVM svm = new SVM()
>   .setBlocks(env.getParallelism())
>   .setIterations(100)
>   .setRegularization(0.001)
>   .setStepsize(0.1)
>   .setSeed(42);
>
>
> svm.fit(labelledTraining);
>
> The type org.apache.flink.api.scala.DataSet cannot be resolved. It is
> indirectly referenced from required .class files.
>
> Are there any tricks required to get it running? Or is Java not supported?
>
> Steve
>


Flink Scheduling and FlinkML

2017-03-31 Thread Fábio Dias
Hi to all,

I'm building a recommendation system to my application.
I have a set of logs (that contains the user info, the hour, the button
that was clicked ect...) that arrive to my Flink by kafka, then I save
every log in a HDFS (HADOOP), but know I have a problem, I want to apply ML
to (all) my data.

I think in 2 scenarios:
First : Transform my DataStream in a DataSet and perform the ML task. It is
possible?
Second : Preform a task in flink that get the data from Hadoop and perform
the ML task.

What is the best way to do it?

I already check the IncrementalLearningSkeleton but I didn't understand how
to apply that to an actual real case. Is there some complex example that I
could look?
(
https://github.com/apache/flink/tree/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/ml
)

Another thing that I would like to ask is how to perform the second
scenario, where I need to perform this task every hour, what it is the best
way to do it?

Thanks,
Fábio Dias.


Re: FlinkML and DataStream API

2016-12-21 Thread Márton Balassi
Thanks for mentioning it, Theo.

Here it is: https://github.com/streamline-eu/ML-Pipelines/tree/stream-ml

Look at these examples:
https://github.com/streamline-eu/ML-Pipelines/commit/314e3d940f1f1ac7b762ba96067e13d806476f57

On Wed, Dec 21, 2016 at 9:38 PM, <dromitl...@gmail.com> wrote:

> I'm interested in that code you mentioned too, I hope you can find it.
>
> Regards,
> Matt
>
> On Dec 21, 2016, at 17:12, Theodore Vasiloudis <
> theodoros.vasilou...@gmail.com> wrote:
>
> Hello Mäki,
>
> I think what you would like to do is train a model using batch, and use
> the Flink streaming API as a way to serve your model and make predictions.
>
> While we don't have an integrated way to do that in FlinkML currently, I
> definitely think that's possible. I know Marton Balassi has been working on
> something like this for the ALS algorithm, but I can't find the code right
> now on mobile.
> The general idea is to keep your model as state and use it to make
> predictions on a stream of incoming data.
>
> Model serving is definitely something we'll be working on in the future,
> I'll have a master student working on exactly that next semester.
>
> --
> Sent from a mobile device. May contain autocorrect errors.
>
> On Dec 21, 2016 5:24 PM, "Mäki Hanna" <hanna.m...@comptel.com> wrote:
>
>> Hi,
>>
>>
>>
>> I’m wondering if there is a way to use FlinkML and make predictions
>> continuously for test data coming from a DataStream.
>>
>>
>>
>> I know FlinkML only supports the DataSet API (batch) at the moment, but
>> is there a way to convert a DataStream into DataSets? I’m thinking of
>> something like
>>
>>
>>
>> (0. fit model in batch mode)
>>
>> 1. window the DataStream
>>
>> 2. convert the windowed stream to DataSets
>>
>> 3. use the FlinkML methods to make predictions
>>
>>
>>
>> BR,
>>
>> Hanna
>>
>>
>> Disclaimer: This message and any attachments thereto are intended solely
>> for the addressed recipient(s) and may contain confidential information. If
>> you are not the intended recipient, please notify the sender by reply
>> e-mail and delete the e-mail (including any attachments thereto) without
>> producing, distributing or retaining any copies thereof. Any review,
>> dissemination or other use of, or taking of any action in reliance upon,
>> this information by persons or entities other than the intended
>> recipient(s) is prohibited. Thank you.
>>
>


Re: FlinkML and DataStream API

2016-12-21 Thread dromitlabs
I'm interested in that code you mentioned too, I hope you can find it.

Regards,
Matt

> On Dec 21, 2016, at 17:12, Theodore Vasiloudis 
> <theodoros.vasilou...@gmail.com> wrote:
> 
> Hello Mäki, 
> 
> I think what you would like to do is train a model using batch, and use the 
> Flink streaming API as a way to serve your model and make predictions. 
> 
> While we don't have an integrated way to do that in FlinkML currently, I 
> definitely think that's possible. I know Marton Balassi has been working on 
> something like this for the ALS algorithm, but I can't find the code right 
> now on mobile.  
> The general idea is to keep your model as state and use it to make 
> predictions on a stream of incoming data. 
> 
> Model serving is definitely something we'll be working on in the future, I'll 
> have a master student working on exactly that next semester. 
> 
> -- 
> Sent from a mobile device. May contain autocorrect errors.
> 
>> On Dec 21, 2016 5:24 PM, "Mäki Hanna" <hanna.m...@comptel.com> wrote:
>> Hi,
>> 
>>  
>> 
>> I’m wondering if there is a way to use FlinkML and make predictions 
>> continuously for test data coming from a DataStream.
>> 
>>  
>> 
>> I know FlinkML only supports the DataSet API (batch) at the moment, but is 
>> there a way to convert a DataStream into DataSets? I’m thinking of something 
>> like
>> 
>>  
>> 
>> (0. fit model in batch mode)
>> 
>> 1. window the DataStream
>> 
>> 2. convert the windowed stream to DataSets
>> 
>> 3. use the FlinkML methods to make predictions
>> 
>>  
>> 
>> BR,
>> 
>> Hanna
>> 
>>  
>> 


Re: FlinkML and DataStream API

2016-12-21 Thread Theodore Vasiloudis
Hello Mäki,

I think what you would like to do is train a model using batch, and use the
Flink streaming API as a way to serve your model and make predictions.

While we don't have an integrated way to do that in FlinkML currently, I
definitely think that's possible. I know Marton Balassi has been working on
something like this for the ALS algorithm, but I can't find the code right
now on mobile.
The general idea is to keep your model as state and use it to make
predictions on a stream of incoming data.

Model serving is definitely something we'll be working on in the future,
I'll have a master student working on exactly that next semester.

-- 
Sent from a mobile device. May contain autocorrect errors.

On Dec 21, 2016 5:24 PM, "Mäki Hanna" <hanna.m...@comptel.com> wrote:

> Hi,
>
>
>
> I’m wondering if there is a way to use FlinkML and make predictions
> continuously for test data coming from a DataStream.
>
>
>
> I know FlinkML only supports the DataSet API (batch) at the moment, but is
> there a way to convert a DataStream into DataSets? I’m thinking of
> something like
>
>
>
> (0. fit model in batch mode)
>
> 1. window the DataStream
>
> 2. convert the windowed stream to DataSets
>
> 3. use the FlinkML methods to make predictions
>
>
>
> BR,
>
> Hanna
>
>

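The "model as state" idea Theodore sketches above can be written down roughly like this (a hedged illustration, not code from the thread; the simple linear model and the Array[Double] element types are my own assumptions):

```scala
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction
import org.apache.flink.util.Collector

// Input 1 carries model updates (weights), input 2 carries feature vectors
// to score; the latest model is kept as operator state.
class ServeModel extends RichCoFlatMapFunction[Array[Double], Array[Double], Double] {
  @transient private var weights: Array[Double] = _

  override def flatMap1(newWeights: Array[Double], out: Collector[Double]): Unit =
    weights = newWeights  // swap in the freshly trained model

  override def flatMap2(features: Array[Double], out: Collector[Double]): Unit =
    if (weights != null)  // score only once a model has arrived
      out.collect(weights.zip(features).map { case (w, x) => w * x }.sum)
}

// usage sketch: modelStream.connect(featureStream).flatMap(new ServeModel)
```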

FlinkML and DataStream API

2016-12-21 Thread Mäki Hanna
Hi,

I'm wondering if there is a way to use FlinkML and make predictions 
continuously for test data coming from a DataStream.

I know FlinkML only supports the DataSet API (batch) at the moment, but is 
there a way to convert a DataStream into DataSets? I'm thinking of something 
like

(0. fit model in batch mode)
1. window the DataStream
2. convert the windowed stream to DataSets
3. use the FlinkML methods to make predictions

BR,
Hanna



Re: FlinkML - Fail to execute QuickStart example

2016-10-17 Thread Thomas FOURNIER
Hi,

No problem I'm going to create a JIRA.

Regards
Thomas

2016-10-17 21:34 GMT+02:00 Theodore Vasiloudis <
theodoros.vasilou...@gmail.com>:

> That is my bad, I must have been testing against a private branch when
> writing the guide; the SVM as it stands only has a predict operation for
> Vector, not LabeledVector.
>
> IMHO I would like to have a predict operation for LabeledVector for all
> predictors (which would just call the existing Vector prediction
> internally), but IIRC we decided to go with an evaluate operator instead, as
> written in the evaluation PR.
>
> I'll make a PR to fix the guide; any chance you can create a JIRA for this?
>
> Regards,
> Theodore
>
> On Mon, Oct 17, 2016 at 6:22 PM, Thomas FOURNIER <
> thomasfournier...@gmail.com> wrote:
>
>> Hi,
>>
>> Executing the following code (see QuickStart):
>>
>> val env = ExecutionEnvironment.getExecutionEnvironment
>> val survival = env.readCsvFile[(String, String, String, 
>> String)]("src/main/resources/haberman.data", ",")
>>
>>
>> val survivalLV = survival
>>   .map { tuple =>
>> val list = tuple.productIterator.toList
>> val numList = list.map(_.asInstanceOf[String].toDouble)
>> LabeledVector(numList(3), DenseVector(numList.take(3).toArray))
>>   }
>>
>>
>>
>> val astroTrain = MLUtils.readLibSVM(env, "src/main/resources/svmguide1")
>> val astroTest = MLUtils.readLibSVM(env, "src/main/resources/svmguide1.t")
>>
>>
>> val svm = SVM()
>>   .setBlocks(env.getParallelism)
>>   .setIterations(100)
>>   .setRegularization(0.001)
>>   .setStepsize(0.1)
>>   .setSeed(42)
>>
>> svm.fit(astroTrain)
>> svm.predict(astroTest)
>>
>>
>> I encounter the following error:
>>
>> Exception in thread "main" java.lang.RuntimeException: There is no 
>> PredictOperation defined for org.apache.flink.ml.classification.SVM which 
>> takes a DataSet[org.apache.flink.ml.common.LabeledVector] as input.
>>
> >> Any idea?
>>
>> Thanks
>>
>> Thomas
>>
>>
>>
>>
>>
>


Re: FlinkML - Fail to execute QuickStart example

2016-10-17 Thread Theodore Vasiloudis
That is my bad; I must have been testing against a private branch when
writing the guide. The SVM as it stands only has a predict operation for
Vector, not LabeledVector.

IMHO I would like to have a predict operator for LabeledVector for all
predictors (that would just call the existing Vector prediction
internally), but IIRC we decided to go with an Evaluate operator instead, as
written in the evaluation PR.

I'll make a PR to fix the guide, any chance you can create a JIRA for this?

Regards,
Theodore
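
[Editor's note: a hedged sketch of the workaround described above, i.e. stripping the labels so that the existing Vector predict operation applies. The case classes below are minimal local stand-ins for FlinkML's `DenseVector` and `LabeledVector` (assumed to mirror `org.apache.flink.ml.math` / `org.apache.flink.ml.common`), and the commented `svm.predict(...)` line is the assumed equivalent call against the real API, not a tested invocation.]

```scala
// Minimal stand-ins for the FlinkML types (assumption: the real ones live in
// org.apache.flink.ml.math and org.apache.flink.ml.common).
case class DenseVector(data: Array[Double])
case class LabeledVector(label: Double, vector: DenseVector)

// A tiny labeled test set, standing in for MLUtils.readLibSVM(...) output.
val astroTest = Seq(
  LabeledVector(1.0, DenseVector(Array(0.5, 1.2, 3.3))),
  LabeledVector(-1.0, DenseVector(Array(2.0, 0.1, 0.7)))
)

// Strip the labels before predicting, since only Vector has a PredictOperation.
// With a real DataSet[LabeledVector] this would presumably read:
//   svm.predict(astroTest.map(_.vector))
val unlabeled = astroTest.map(_.vector)
println(unlabeled.length) // prints 2
```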

On Mon, Oct 17, 2016 at 6:22 PM, Thomas FOURNIER <
thomasfournier...@gmail.com> wrote:

> Hi,
>
> Executing the following code (see QuickStart):
>
> val env = ExecutionEnvironment.getExecutionEnvironment
> val survival = env.readCsvFile[(String, String, String, 
> String)]("src/main/resources/haberman.data", ",")
>
>
> val survivalLV = survival
>   .map { tuple =>
> val list = tuple.productIterator.toList
> val numList = list.map(_.asInstanceOf[String].toDouble)
> LabeledVector(numList(3), DenseVector(numList.take(3).toArray))
>   }
>
>
>
> val astroTrain = MLUtils.readLibSVM(env, "src/main/resources/svmguide1")
> val astroTest = MLUtils.readLibSVM(env, "src/main/resources/svmguide1.t")
>
>
> val svm = SVM()
>   .setBlocks(env.getParallelism)
>   .setIterations(100)
>   .setRegularization(0.001)
>   .setStepsize(0.1)
>   .setSeed(42)
>
> svm.fit(astroTrain)
> svm.predict(astroTest)
>
>
> I encounter the following error:
>
> Exception in thread "main" java.lang.RuntimeException: There is no 
> PredictOperation defined for org.apache.flink.ml.classification.SVM which 
> takes a DataSet[org.apache.flink.ml.common.LabeledVector] as input.
>
> Any idea?
>
> Thanks
>
> Thomas
>
>
>
>
>


FlinkML - Fail to execute QuickStart example

2016-10-17 Thread Thomas FOURNIER
Hi,

Executing the following code (see QuickStart):

val env = ExecutionEnvironment.getExecutionEnvironment
val survival = env.readCsvFile[(String, String, String,
String)]("src/main/resources/haberman.data", ",")


val survivalLV = survival
  .map { tuple =>
val list = tuple.productIterator.toList
val numList = list.map(_.asInstanceOf[String].toDouble)
LabeledVector(numList(3), DenseVector(numList.take(3).toArray))
  }



val astroTrain = MLUtils.readLibSVM(env, "src/main/resources/svmguide1")
val astroTest = MLUtils.readLibSVM(env, "src/main/resources/svmguide1.t")


val svm = SVM()
  .setBlocks(env.getParallelism)
  .setIterations(100)
  .setRegularization(0.001)
  .setStepsize(0.1)
  .setSeed(42)

svm.fit(astroTrain)
svm.predict(astroTest)


I encounter the following error:

Exception in thread "main" java.lang.RuntimeException: There is no
PredictOperation defined for org.apache.flink.ml.classification.SVM
which takes a DataSet[org.apache.flink.ml.common.LabeledVector] as
input.

Any idea?

Thanks

Thomas


Re: FlinkML ALS matrix factorization: java.io.IOException: Thread 'SortMerger Reading Thread' terminated due to an exception: null

2016-09-02 Thread ANDREA SPINA
Hi Stefan,
Thank you so much for the answer. Ok, I'll do it asap.
For the sake of argument, could the issue be related to the low number of
blocks? I noticed that the Flink implementation, by default, sets the number
of blocks to the input count (which is actually a lot). So with a low
cardinality and big blocks, maybe they don't fit somewhere...
Thank you again.

Andrea

2016-09-02 10:51 GMT+02:00 Stefan Richter <s.rich...@data-artisans.com>:

> Hi,
>
> unfortunately, the log does not contain the required information for this
> case. It seems like a sender to the SortMerger failed. The best way to find
> this problem is to take a look at the exceptions that are reported in the
> web front-end for the failing job. Could you check if you find any reported
> exceptions there and provide them to us?
>
> Best,
> Stefan
>
> Am 01.09.2016 um 11:56 schrieb ANDREA SPINA <74...@studenti.unimore.it>:
>
> Sure. Here <https://drive.google.com/open?id=0B6TTuPO7UoeFRXY3RW1KQnNrd3c>
> you can find the complete logs file.
> Still can not run through the issue. Thank you for your help.
>
> 2016-08-31 18:15 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>
>> I don't know whether my usual error is related to this one but is very
>> similar and it happens randomly...I still have to figure out the root cause
>> of the error:
>>
>> java.lang.Exception: The data preparation for task 'CHAIN GroupReduce
>> (GroupReduce at createResult(IndexMappingExecutor.java:43)) -> Map (Map
>> at main(Jsonizer.java:90))' , caused an error: Error obtaining the sorted
>> input: Thread 'SortMerger spilling thread' terminated due to an exception:
>> -2
>> at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:456)
>> at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTas
>> k.java:345)
>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.RuntimeException: Error obtaining the sorted input:
>> Thread 'SortMerger spilling thread' terminated due to an exception: -2
>> at org.apache.flink.runtime.operators.sort.UnilateralSortMerger
>> .getIterator(UnilateralSortMerger.java:619)
>> at org.apache.flink.runtime.operators.BatchTask.getInput(BatchT
>> ask.java:1079)
>> at org.apache.flink.runtime.operators.GroupReduceDriver.prepare
>> (GroupReduceDriver.java:94)
>> at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:450)
>> ... 3 more
>> Caused by: java.io.IOException: Thread 'SortMerger spilling thread'
>> terminated due to an exception: -2
>> at org.apache.flink.runtime.operators.sort.UnilateralSortMerger
>> $ThreadBase.run(UnilateralSortMerger.java:800)
>> Caused by: java.lang.ArrayIndexOutOfBoundsException: -2
>> at java.util.ArrayList.elementData(ArrayList.java:418)
>> at java.util.ArrayList.get(ArrayList.java:431)
>> at com.esotericsoftware.kryo.util.MapReferenceResolver.getReadO
>> bject(MapReferenceResolver.java:42)
>> at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:805)
>> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:759)
>> at com.esotericsoftware.kryo.serializers.MapSerializer.read(
>> MapSerializer.java:135)
>> at com.esotericsoftware.kryo.serializers.MapSerializer.read(
>> MapSerializer.java:21)
>> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
>> at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSeriali
>> zer.deserialize(KryoSerializer.java:219)
>> at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSeriali
>> zer.deserialize(KryoSerializer.java:245)
>> at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSeriali
>> zer.copy(KryoSerializer.java:255)
>> at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.
>> copy(PojoSerializer.java:556)
>> at org.apache.flink.api.java.typeutils.runtime.TupleSerializerB
>> ase.copy(TupleSerializerBase.java:75)
>> at org.apache.flink.runtime.operators.sort.NormalizedKeySorter.
>> writeToOutput(NormalizedKeySorter.java:499)
>> at org.apache.flink.runtime.operators.sort.UnilateralSortMerger
>> $SpillingThread.go(UnilateralSortMerger.java:1344)
>> at org.apache.flink.runtime.operators.sort.UnilateralSortMerger
>> $ThreadBase.run(UnilateralSortMerger.java:796)
>>
>>
>> On Wed, Aug 31, 2016 at 5:57 PM, Stefan Richter <
>> s.rich...@data-artisans.com> wrote:
>>
>>> Hi,
>>>
>>> could you provide the log outputs for your job (ideally with debug
>>> logging enabled)?
>>>
>>> Best,
>>> Stefan
>>>
&

Re: FlinkML ALS matrix factorization: java.io.IOException: Thread 'SortMerger Reading Thread' terminated due to an exception: null

2016-09-02 Thread Stefan Richter
Hi,

unfortunately, the log does not contain the required information for this case. 
It seems like a sender to the SortMerger failed. The best way to find this 
problem is to take a look at the exceptions that are reported in the web 
front-end for the failing job. Could you check if you find any reported 
exceptions there and provide them to us?

Best,
Stefan

> Am 01.09.2016 um 11:56 schrieb ANDREA SPINA <74...@studenti.unimore.it>:
> 
> Sure. Here <https://drive.google.com/open?id=0B6TTuPO7UoeFRXY3RW1KQnNrd3c> 
> you can find the complete logs file.
> Still can not run through the issue. Thank you for your help.
> 
> 2016-08-31 18:15 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it 
> <mailto:pomperma...@okkam.it>>:
> I don't know whether my usual error is related to this one but is very 
> similar and it happens randomly...I still have to figure out the root cause 
> of the error:
> 
> java.lang.Exception: The data preparation for task 'CHAIN GroupReduce 
> (GroupReduce at createResult(IndexMappingExecutor.java:43)) -> Map (Map at 
> main(Jsonizer.java:90))' , caused an error: Error obtaining the sorted input: 
> Thread 'SortMerger spilling thread' terminated due to an exception: -2
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:456)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:345)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Error obtaining the sorted input: 
> Thread 'SortMerger spilling thread' terminated due to an exception: -2
>   at 
> org.apache.flink.runtime.operators.sort.UnilateralSortMerger.getIterator(UnilateralSortMerger.java:619)
>   at 
> org.apache.flink.runtime.operators.BatchTask.getInput(BatchTask.java:1079)
>   at 
> org.apache.flink.runtime.operators.GroupReduceDriver.prepare(GroupReduceDriver.java:94)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:450)
>   ... 3 more
> Caused by: java.io.IOException: Thread 'SortMerger spilling thread' 
> terminated due to an exception: -2
>   at 
> org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:800)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -2
>   at java.util.ArrayList.elementData(ArrayList.java:418)
>   at java.util.ArrayList.get(ArrayList.java:431)
>   at 
> com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:42)
>   at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:805)
>   at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:759)
>   at 
> com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:135)
>   at 
> com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:21)
>   at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
>   at 
> org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:219)
>   at 
> org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:245)
>   at 
> org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.copy(KryoSerializer.java:255)
>   at 
> org.apache.flink.api.java.typeutils.runtime.PojoSerializer.copy(PojoSerializer.java:556)
>   at 
> org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.copy(TupleSerializerBase.java:75)
>   at 
> org.apache.flink.runtime.operators.sort.NormalizedKeySorter.writeToOutput(NormalizedKeySorter.java:499)
>   at 
> org.apache.flink.runtime.operators.sort.UnilateralSortMerger$SpillingThread.go(UnilateralSortMerger.java:1344)
>   at 
> org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:796)
> 
> 
> On Wed, Aug 31, 2016 at 5:57 PM, Stefan Richter <s.rich...@data-artisans.com 
> <mailto:s.rich...@data-artisans.com>> wrote:
> Hi,
> 
> could you provide the log outputs for your job (ideally with debug logging 
> enabled)?
> 
> Best,
> Stefan
> 
>> Am 31.08.2016 um 14:40 schrieb ANDREA SPINA <74...@studenti.unimore.it 
>> <mailto:74...@studenti.unimore.it>>:
>> 
>> Hi everyone.
>> I'm running the FlinkML ALS matrix factorization and I bumped into the 
>> following exception:
>> 
>> org.apache.flink.client.program.ProgramInvocationException: The program 
>> execution failed: Job execution failed.
>>  at org.apache.flink.client.program.Client.runBlocking(Client.java:381)
>>  at org.apac

Re: FlinkML ALS matrix factorization: java.io.IOException: Thread 'SortMerger Reading Thread' terminated due to an exception: null

2016-09-01 Thread ANDREA SPINA
Sure. Here <https://drive.google.com/open?id=0B6TTuPO7UoeFRXY3RW1KQnNrd3c>
you can find the complete logs file.
I still cannot get past the issue. Thank you for your help.

2016-08-31 18:15 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:

> I don't know whether my usual error is related to this one but is very
> similar and it happens randomly...I still have to figure out the root cause
> of the error:
>
> java.lang.Exception: The data preparation for task 'CHAIN GroupReduce
> (GroupReduce at createResult(IndexMappingExecutor.java:43)) -> Map (Map
> at main(Jsonizer.java:90))' , caused an error: Error obtaining the sorted
> input: Thread 'SortMerger spilling thread' terminated due to an exception:
> -2
> at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:456)
> at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:345)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Error obtaining the sorted input:
> Thread 'SortMerger spilling thread' terminated due to an exception: -2
> at org.apache.flink.runtime.operators.sort.UnilateralSortMerger.
> getIterator(UnilateralSortMerger.java:619)
> at org.apache.flink.runtime.operators.BatchTask.getInput(
> BatchTask.java:1079)
> at org.apache.flink.runtime.operators.GroupReduceDriver.
> prepare(GroupReduceDriver.java:94)
> at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:450)
> ... 3 more
> Caused by: java.io.IOException: Thread 'SortMerger spilling thread'
> terminated due to an exception: -2
> at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$
> ThreadBase.run(UnilateralSortMerger.java:800)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -2
> at java.util.ArrayList.elementData(ArrayList.java:418)
> at java.util.ArrayList.get(ArrayList.java:431)
> at com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(
> MapReferenceResolver.java:42)
> at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:805)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:759)
> at com.esotericsoftware.kryo.serializers.MapSerializer.
> read(MapSerializer.java:135)
> at com.esotericsoftware.kryo.serializers.MapSerializer.
> read(MapSerializer.java:21)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
> at org.apache.flink.api.java.typeutils.runtime.kryo.
> KryoSerializer.deserialize(KryoSerializer.java:219)
> at org.apache.flink.api.java.typeutils.runtime.kryo.
> KryoSerializer.deserialize(KryoSerializer.java:245)
> at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.copy(
> KryoSerializer.java:255)
> at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.copy(
> PojoSerializer.java:556)
> at org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.copy(
> TupleSerializerBase.java:75)
> at org.apache.flink.runtime.operators.sort.NormalizedKeySorter.
> writeToOutput(NormalizedKeySorter.java:499)
> at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$
> SpillingThread.go(UnilateralSortMerger.java:1344)
> at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$
> ThreadBase.run(UnilateralSortMerger.java:796)
>
>
> On Wed, Aug 31, 2016 at 5:57 PM, Stefan Richter <
> s.rich...@data-artisans.com> wrote:
>
>> Hi,
>>
>> could you provide the log outputs for your job (ideally with debug
>> logging enabled)?
>>
>> Best,
>> Stefan
>>
>> Am 31.08.2016 um 14:40 schrieb ANDREA SPINA <74...@studenti.unimore.it>:
>>
>> Hi everyone.
>> I'm running the FlinkML ALS matrix factorization and I bumped into the
>> following exception:
>>
>> org.apache.flink.client.program.ProgramInvocationException: The program
>> execution failed: Job execution failed.
>> at org.apache.flink.client.program.Client.runBlocking(Client.java:381)
>> at org.apache.flink.client.program.Client.runBlocking(Client.java:355)
>> at org.apache.flink.client.program.Client.runBlocking(Client.java:315)
>> at org.apache.flink.client.program.ContextEnvironment.execute(C
>> ontextEnvironment.java:60)
>> at org.apache.flink.api.scala.ExecutionEnvironment.execute(Exec
>> utionEnvironment.scala:652)
>> at org.apache.flink.ml.common.FlinkMLTools$.persist(FlinkMLTool
>> s.scala:94)
>> at org.apache.flink.ml.recommendation.ALS$$anon$115.fit(ALS.scala:507)
>> at org.apache.flink.ml.recommendation.ALS$$anon$115.fit(ALS.scala:433)
>> at org.apache.flink.ml.pipeline.Estimator$class.fit(Estimator.scala:55)
>> at org.apache.flink.ml.recommendation.ALS.fit(ALS.scala:122)
>> at dima.tu.berlin.bench

Re: FlinkML ALS matrix factorization: java.io.IOException: Thread 'SortMerger Reading Thread' terminated due to an exception: null

2016-08-31 Thread Flavio Pompermaier
I don't know whether my usual error is related to this one but is very
similar and it happens randomly... I still have to figure out the root cause
of the error:

java.lang.Exception: The data preparation for task 'CHAIN GroupReduce
(GroupReduce at createResult(IndexMappingExecutor.java:43)) -> Map (Map at
main(Jsonizer.java:90))' , caused an error: Error obtaining the sorted
input: Thread 'SortMerger spilling thread' terminated due to an exception:
-2
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:456)
at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:345)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Error obtaining the sorted input:
Thread 'SortMerger spilling thread' terminated due to an exception: -2
at
org.apache.flink.runtime.operators.sort.UnilateralSortMerger.getIterator(UnilateralSortMerger.java:619)
at
org.apache.flink.runtime.operators.BatchTask.getInput(BatchTask.java:1079)
at
org.apache.flink.runtime.operators.GroupReduceDriver.prepare(GroupReduceDriver.java:94)
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:450)
... 3 more
Caused by: java.io.IOException: Thread 'SortMerger spilling thread'
terminated due to an exception: -2
at
org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:800)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -2
at java.util.ArrayList.elementData(ArrayList.java:418)
at java.util.ArrayList.get(ArrayList.java:431)
at
com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:42)
at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:805)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:759)
at
com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:135)
at
com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:21)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
at
org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:219)
at
org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:245)
at
org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.copy(KryoSerializer.java:255)
at
org.apache.flink.api.java.typeutils.runtime.PojoSerializer.copy(PojoSerializer.java:556)
at
org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.copy(TupleSerializerBase.java:75)
at
org.apache.flink.runtime.operators.sort.NormalizedKeySorter.writeToOutput(NormalizedKeySorter.java:499)
at
org.apache.flink.runtime.operators.sort.UnilateralSortMerger$SpillingThread.go(UnilateralSortMerger.java:1344)
at
org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:796)


On Wed, Aug 31, 2016 at 5:57 PM, Stefan Richter <s.rich...@data-artisans.com
> wrote:

> Hi,
>
> could you provide the log outputs for your job (ideally with debug logging
> enabled)?
>
> Best,
> Stefan
>
> Am 31.08.2016 um 14:40 schrieb ANDREA SPINA <74...@studenti.unimore.it>:
>
> Hi everyone.
> I'm running the FlinkML ALS matrix factorization and I bumped into the
> following exception:
>
> org.apache.flink.client.program.ProgramInvocationException: The program
> execution failed: Job execution failed.
> at org.apache.flink.client.program.Client.runBlocking(Client.java:381)
> at org.apache.flink.client.program.Client.runBlocking(Client.java:355)
> at org.apache.flink.client.program.Client.runBlocking(Client.java:315)
> at org.apache.flink.client.program.ContextEnvironment.execute(
> ContextEnvironment.java:60)
> at org.apache.flink.api.scala.ExecutionEnvironment.execute(Exec
> utionEnvironment.scala:652)
> at org.apache.flink.ml.common.FlinkMLTools$.persist(FlinkMLTools.scala:94)
> at org.apache.flink.ml.recommendation.ALS$$anon$115.fit(ALS.scala:507)
> at org.apache.flink.ml.recommendation.ALS$$anon$115.fit(ALS.scala:433)
> at org.apache.flink.ml.pipeline.Estimator$class.fit(Estimator.scala:55)
> at org.apache.flink.ml.recommendation.ALS.fit(ALS.scala:122)
> at dima.tu.berlin.benchmark.flink.als.RUN$.main(RUN.scala:78)
> at dima.tu.berlin.benchmark.flink.als.RUN.main(RUN.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
> ssorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
> thodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.flink.client.program.PackagedProgram.callMainMeth
> od(PackagedProgram.java:505)
> at org.apache.flink.client.program.PackagedProgram.invokeIntera
> ctiveModeForExecution(PackagedProgram.java:403)
> at org.apache.flink.

Re: FlinkML ALS matrix factorization: java.io.IOException: Thread 'SortMerger Reading Thread' terminated due to an exception: null

2016-08-31 Thread Stefan Richter
Hi,

could you provide the log outputs for your job (ideally with debug logging 
enabled)?

Best,
Stefan

> Am 31.08.2016 um 14:40 schrieb ANDREA SPINA <74...@studenti.unimore.it>:
> 
> Hi everyone.
> I'm running the FlinkML ALS matrix factorization and I bumped into the 
> following exception:
> 
> org.apache.flink.client.program.ProgramInvocationException: The program 
> execution failed: Job execution failed.
>   at org.apache.flink.client.program.Client.runBlocking(Client.java:381)
>   at org.apache.flink.client.program.Client.runBlocking(Client.java:355)
>   at org.apache.flink.client.program.Client.runBlocking(Client.java:315)
>   at 
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:60)
>   at 
> org.apache.flink.api.scala.ExecutionEnvironment.execute(ExecutionEnvironment.scala:652)
>   at 
> org.apache.flink.ml.common.FlinkMLTools$.persist(FlinkMLTools.scala:94)
>   at org.apache.flink.ml.recommendation.ALS$$anon$115.fit(ALS.scala:507)
>   at org.apache.flink.ml.recommendation.ALS$$anon$115.fit(ALS.scala:433)
>   at org.apache.flink.ml.pipeline.Estimator$class.fit(Estimator.scala:55)
>   at org.apache.flink.ml.recommendation.ALS.fit(ALS.scala:122)
>   at dima.tu.berlin.benchmark.flink.als.RUN$.main(RUN.scala:78)
>   at dima.tu.berlin.benchmark.flink.als.RUN.main(RUN.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:505)
>   at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:403)
>   at org.apache.flink.client.program.Client.runBlocking(Client.java:248)
>   at 
> org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:866)
>   at org.apache.flink.client.CliFrontend.run(CliFrontend.java:333)
>   at 
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1192)
>   at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1243)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:717)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:663)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:663)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.RuntimeException: Initializing the input processing 
> failed: Error obtaining the sorted input: Thread 'SortMerger Reading Thread' 
> terminated due to an exception: null
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:325)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Error obtaining the sorted input: 
> Thread 'SortMerger Reading Thread' terminated due to an exception: null
>   at 
> org.apache.flink.runtime.operators.sort.UnilateralSortMerger.getIterator(UnilateralSortMerger.java:619)
>   at 
> org.apache.flink.runtime.operators.BatchTask.getInput(BatchTask.java:1079)
>   at 
> org.apache.flink.runtime.operators.BatchTask.initLocalStrategies(BatchTask.java:819)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:321)
>   ... 2 more
> Caused by: java.io.IOException: Thread 'SortMerger Reading

FlinkML ALS matrix factorization: java.io.IOException: Thread 'SortMerger Reading Thread' terminated due to an exception: null

2016-08-31 Thread ANDREA SPINA
Hi everyone.
I'm running the FlinkML ALS matrix factorization and I bumped into the
following exception:

org.apache.flink.client.program.ProgramInvocationException: The program
execution failed: Job execution failed.
at org.apache.flink.client.program.Client.runBlocking(Client.java:381)
at org.apache.flink.client.program.Client.runBlocking(Client.java:355)
at org.apache.flink.client.program.Client.runBlocking(Client.java:315)
at
org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:60)
at
org.apache.flink.api.scala.ExecutionEnvironment.execute(ExecutionEnvironment.scala:652)
at org.apache.flink.ml.common.FlinkMLTools$.persist(FlinkMLTools.scala:94)
at org.apache.flink.ml.recommendation.ALS$$anon$115.fit(ALS.scala:507)
at org.apache.flink.ml.recommendation.ALS$$anon$115.fit(ALS.scala:433)
at org.apache.flink.ml.pipeline.Estimator$class.fit(Estimator.scala:55)
at org.apache.flink.ml.recommendation.ALS.fit(ALS.scala:122)
at dima.tu.berlin.benchmark.flink.als.RUN$.main(RUN.scala:78)
at dima.tu.berlin.benchmark.flink.als.RUN.main(RUN.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:505)
at
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:403)
at org.apache.flink.client.program.Client.runBlocking(Client.java:248)
at
org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:866)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:333)
at
org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1192)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1243)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job
execution failed.
at
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:717)
at
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:663)
at
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:663)
at
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.RuntimeException: Initializing the input processing
failed: Error obtaining the sorted input: Thread 'SortMerger Reading
Thread' terminated due to an exception: null
at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:325)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Error obtaining the sorted input:
Thread 'SortMerger Reading Thread' terminated due to an exception: null
at
org.apache.flink.runtime.operators.sort.UnilateralSortMerger.getIterator(UnilateralSortMerger.java:619)
at
org.apache.flink.runtime.operators.BatchTask.getInput(BatchTask.java:1079)
at
org.apache.flink.runtime.operators.BatchTask.initLocalStrategies(BatchTask.java:819)
at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:321)
... 2 more
Caused by: java.io.IOException: Thread 'SortMerger Reading Thread'
terminated due to an exception: null
at
org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:800)
Caused by:
org.apache.flink.runtime.io.network.partition.ProducerFailedException
at
org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.getNextLookAhead(LocalInputChannel.java:270)
at
org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.onNotification(LocalInputChannel.java:238)
at
org.apache.flink.runtime.io.network.partition.PipelinedSubpartition.release(PipelinedSubpartition.java:158)
at
org.apache.flink.runtime.io.network.partition.ResultPartition.release(ResultPartition.java:320)
at
org.apache.flink.runtime.io.network.partition.ResultPartitionManager.releasePartitionsProducedBy(ResultPartitionManager.java:95)
at
org.apache.flink.runtime.io.network.NetworkEnvironment.unregisterTask(NetworkEnvironment.java:370

Re: Using FlinkML algorithms in Streaming

2016-05-11 Thread Simone Robutti
Actually, model portability and persistence is a serious limitation to
practical use of FlinkML in streaming. If you know what you're doing, you
can write a blunt serializer for your model, write it to a file, and rebuild
the model stream-side from the deserialized information.

I tried it for an SVM model and there were no obstacles. It's ugly but it
works.
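Simone's "blunt serializer" approach can be sketched without any Flink dependency: dump the fitted weight vector to a file on the batch side and read it back on the stream side. A minimal illustration follows — `ModelFile` is a hypothetical helper using plain `java.io`, and a real model would typically carry more state than a single weight vector:

```java
import java.io.*;

// Hypothetical "model": just a weight vector, as produced by e.g. an SVM
// or linear-regression fit. Serialize it bluntly to a file batch-side,
// then deserialize it stream-side to rebuild the predictor.
public class ModelFile {
    static void save(double[] weights, File f) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f))) {
            out.writeInt(weights.length);          // vector length first
            for (double w : weights) out.writeDouble(w);
        }
    }

    static double[] load(File f) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
            double[] weights = new double[in.readInt()];
            for (int i = 0; i < weights.length; i++) weights[i] = in.readDouble();
            return weights;
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("model", ".bin");
        double[] w = {0.5, -1.25, 3.0};
        save(w, f);
        double[] restored = load(f);
        System.out.println(java.util.Arrays.equals(w, restored)); // prints "true"
        f.delete();
    }
}
```

Stream-side, the `load` step would run once (e.g. in an operator's open method) and the restored weights would back the per-element prediction.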

2016-05-11 16:18 GMT+02:00 Márton Balassi <balassi.mar...@gmail.com>:

> Currently I am not aware of streaming learners support, you would need to
> implement that yourself at this point.
>
> As for streaming predictors for batch learners I have some preview code
> that you might like. [1]
>
> [1]
> https://github.com/streamline-eu/ML-Pipelines/blob/314e3d940f1f1ac7b762ba96067e13d806476f57/flink-libraries/flink-stream-ml/src/main/scala/org/apache/flink/stream/ml/examples/MLRExample.scala
>
>
>
> On Wed, May 11, 2016 at 3:52 PM, Piyush Shrivastava <piyush...@yahoo.co.in
> > wrote:
>
>> Hi Márton,
>>
>> I want to train and get the residuals.
>>
>> Thanks and Regards,
>> Piyush Shrivastava <piy...@webograffiti.com>
>> [image: WeboGraffiti]
>> http://webograffiti.com
>>
>>
>> On Wednesday, 11 May 2016 7:19 PM, Márton Balassi <
>> balassi.mar...@gmail.com> wrote:
>>
>>
>> Hey Piyush,
>>
>> Would you like to train or predict on the streaming data?
>>
>> Best,
>>
>> Marton
>>
>> On Wed, May 11, 2016 at 3:44 PM, Piyush Shrivastava <
>> piyush...@yahoo.co.in> wrote:
>>
>> Hello all,
>>
>> I want to perform linear regression using FlinkML's
>> MultipleLinearRegression() function on streaming data.
>>
>> This function takes a DataSet as an input and I cannot create a DataSet
>> inside the MapFunction of a DataStream. How can I use this function on my
>> DataStream?
>>
>> Thanks and Regards,
>> Piyush Shrivastava <piy...@webograffiti.com>
>> [image: WeboGraffiti]
>> http://webograffiti.com
>>
>>
>>
>>
>>
>


Re: Using FlinkML algorithms in Streaming

2016-05-11 Thread Márton Balassi
Currently I am not aware of streaming learners support, you would need to
implement that yourself at this point.

As for streaming predictors for batch learners I have some preview code
that you might like. [1]

[1]
https://github.com/streamline-eu/ML-Pipelines/blob/314e3d940f1f1ac7b762ba96067e13d806476f57/flink-libraries/flink-stream-ml/src/main/scala/org/apache/flink/stream/ml/examples/MLRExample.scala



On Wed, May 11, 2016 at 3:52 PM, Piyush Shrivastava 
wrote:

> Hi Márton,
>
> I want to train and get the residuals.
>
> Thanks and Regards,
> Piyush Shrivastava 
> [image: WeboGraffiti]
> http://webograffiti.com
>
>
> On Wednesday, 11 May 2016 7:19 PM, Márton Balassi <
> balassi.mar...@gmail.com> wrote:
>
>
> Hey Piyush,
>
> Would you like to train or predict on the streaming data?
>
> Best,
>
> Marton
>
> On Wed, May 11, 2016 at 3:44 PM, Piyush Shrivastava  > wrote:
>
> Hello all,
>
> I want to perform linear regression using FlinkML's
> MultipleLinearRegression() function on streaming data.
>
> This function takes a DataSet as an input and I cannot create a DataSet
> inside the MapFunction of a DataStream. How can I use this function on my
> DataStream?
>
> Thanks and Regards,
> Piyush Shrivastava 
> [image: WeboGraffiti]
> http://webograffiti.com
>
>
>
>
>
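Márton's preview code aside, the usual workaround for this batch/stream gap is to fit the model on a DataSet, extract the weight vector, and apply it in a plain map over the stream. A Flink-free sketch of the stream-side step — the weights and intercept here are hypothetical placeholders for the output of a batch MultipleLinearRegression fit, and the residual is simply label minus prediction:

```java
import java.util.List;

public class StreamSidePredict {
    // Hypothetical weights/intercept, as extracted from a batch-trained model.
    static final double[] WEIGHTS = {2.0, -1.0};
    static final double INTERCEPT = 0.5;

    // The function you would run inside a stream-side map(): w . x + b
    static double predict(double[] features) {
        double y = INTERCEPT;
        for (int i = 0; i < features.length; i++) {
            y += WEIGHTS[i] * features[i];
        }
        return y;
    }

    public static void main(String[] args) {
        // Stand-in for incoming (features, label) stream elements.
        List<double[]> features = List.of(new double[]{1.0, 0.0}, new double[]{0.0, 2.0});
        double[] labels = {2.5, -1.0};
        for (int i = 0; i < labels.length; i++) {
            double p = predict(features.get(i));
            System.out.printf("prediction=%.2f residual=%.2f%n", p, labels[i] - p);
        }
    }
}
```

This only gives a static model; keeping the weights fresh requires periodically re-training in batch and reloading them, since streaming learners were not supported at this point.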


Re: Using FlinkML algorithms in Streaming

2016-05-11 Thread Piyush Shrivastava
Hi Márton,

I want to train and get the residuals.

Thanks and Regards,
Piyush Shrivastava
http://webograffiti.com


On Wednesday, 11 May 2016 7:19 PM, Márton Balassi wrote:


Hey Piyush,

Would you like to train or predict on the streaming data?

Best,

Marton

On Wed, May 11, 2016 at 3:44 PM, Piyush Shrivastava wrote:

Hello all,

I want to perform linear regression using FlinkML's
MultipleLinearRegression() function on streaming data.

This function takes a DataSet as an input and I cannot create a DataSet
inside the MapFunction of a DataStream. How can I use this function on my
DataStream?

Thanks and Regards,
Piyush Shrivastava
http://webograffiti.com

Re: Using FlinkML algorithms in Streaming

2016-05-11 Thread Márton Balassi
Hey Piyush,

Would you like to train or predict on the streaming data?

Best,

Marton

On Wed, May 11, 2016 at 3:44 PM, Piyush Shrivastava 
wrote:

> Hello all,
>
> I want to perform linear regression using FlinkML's
> MultipleLinearRegression() function on streaming data.
>
> This function takes a DataSet as an input and I cannot create a DataSet
> inside the MapFunction of a DataStream. How can I use this function on my
> DataStream?
>
> Thanks and Regards,
> Piyush Shrivastava 
> [image: WeboGraffiti]
> http://webograffiti.com
>


Re: FlinkML 0.10.1 - Using SparseVectors with MLR does not work

2016-02-04 Thread Till Rohrmann
Hi Sourigna,

it turned out to be a bug in the GradientDescent implementation which
cannot handle sparse gradients. That is not so problematic by itself,
because the sum of gradient vectors is usually dense even if the individual
gradient vectors are sparse. We simply forgot to initialize the initial
vector of the reduce operation to be dense. I’ve created a PR [1] which
should fix the problem. After reviewing it, it should be merged in the next
days.

[1] https://github.com/apache/flink/pull/1587

Cheers,
Till
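The fix Till describes can be illustrated in miniature: if the reduce that sums per-sample gradients is seeded with a sparse zero vector, the accumulated result may stay sparse and trip code that expects dense output; seeding it with a dense zero vector makes the summed gradient dense regardless of the inputs. A toy, Flink-free sketch — the map-based sparse representation here is hypothetical, not FlinkML's actual BLAS code:

```java
import java.util.Map;

public class GradientSum {
    // A sparse gradient: index -> value. Accumulating into a dense array
    // guarantees the sum is dense even though every input is sparse.
    static double[] addInto(double[] dense, Map<Integer, Double> sparse) {
        for (Map.Entry<Integer, Double> e : sparse.entrySet()) {
            dense[e.getKey()] += e.getValue();
        }
        return dense;
    }

    public static void main(String[] args) {
        // Seed the reduction with a DENSE zero vector, mirroring the fix:
        double[] acc = new double[5];
        addInto(acc, Map.of(0, 1.0, 3, 2.0));   // gradient of sample 1
        addInto(acc, Map.of(3, 0.5, 4, -1.0));  // gradient of sample 2
        System.out.println(java.util.Arrays.toString(acc));
        // prints "[1.0, 0.0, 0.0, 2.5, -1.0]"
    }
}
```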

On Thu, Feb 4, 2016 at 5:09 AM, Chiwan Park <chiwanp...@apache.org> wrote:

> Hi Gna,
>
> Thanks for reporting the problem. Because the level 1 operations in the
> FlinkML BLAS library don't support SparseVector, SparseVector is not
> supported currently. I've filed this in JIRA [1].
>
> Maybe I can send a patch to solve this in a few days.
>
> [1]: https://issues.apache.org/jira/browse/FLINK-3330
>
> Regards,
> Chiwan Park
>
> > On Feb 4, 2016, at 5:39 AM, Sourigna Phetsarath <
> gna.phetsar...@teamaol.com> wrote:
> >
> > All:
> >
> > I'm trying to use SparseVectors with FlinkML 0.10.1.  It does not seem
> to be working.  Here is a UnitTest that I created to recreate the problem:
> >
> >
> > package com.aol.ds.arc.ml.poc.flink
> >
> > import org.junit.After
> > import org.junit.Before
> > import org.slf4j.LoggerFactory
> > import org.apache.flink.test.util.ForkableFlinkMiniCluster
> > import scala.concurrent.duration.FiniteDuration
> > import java.util.concurrent.TimeUnit
> > import org.apache.flink.test.util.TestBaseUtils
> > import org.apache.flink.runtime.StreamingMode
> > import org.apache.flink.test.util.TestEnvironment
> > import org.junit.Test
> > import org.apache.flink.ml.common.LabeledVector
> > import org.apache.flink.ml.math.SparseVector
> > import org.apache.flink.api.scala._
> > import org.apache.flink.ml.regression.MultipleLinearRegression
> > import org.apache.flink.ml.math.DenseVector
> > class FlinkMLRTest {
> >   var Logger = LoggerFactory.getLogger(getClass.getName)
> >   var cluster: Option[ForkableFlinkMiniCluster] = None
> >   val parallelism = 4
> >   val DEFAULT_AKKA_ASK_TIMEOUT = 1000
> >   val DEFAULT_TIMEOUT = new FiniteDuration(DEFAULT_AKKA_ASK_TIMEOUT,
> TimeUnit.SECONDS)
> >   @Before
> >   def doBefore(): Unit = {
> > val cl = TestBaseUtils.startCluster(
> >   1,
> >   parallelism,
> >   StreamingMode.BATCH_ONLY,
> >   false,
> >   false,
> >   true)
> > val clusterEnvironment = new TestEnvironment(cl, parallelism)
> > clusterEnvironment.setAsContext()
> > cluster = Some(cl)
> >   }
> >   @After
> >   def doAfter(): Unit = {
> > cluster.map(c => TestBaseUtils.stopCluster(c, DEFAULT_TIMEOUT))
> >   }
> >   @Test
> >   def testMLR() {
> > val env = ExecutionEnvironment.getExecutionEnvironment
> > val training = Seq(
> >   new LabeledVector(1.0, new SparseVector(10, Array(0, 2, 3),
> Array(1.0, 1.0, 1.0))),
> >   new LabeledVector(1.0, new SparseVector(10, Array(0, 1, 5, 9),
> Array(1.0, 1.0, 1.0, 1.0))),
> >   new LabeledVector(0.0, new SparseVector(10, Array(0, 2),
> Array(0.0, 1.0))),
> >   new LabeledVector(0.0, new SparseVector(10, Array(0), Array(0.0))),
> >   new LabeledVector(0.0, new SparseVector(10, Array(0, 2),
> Array(0.0, 1.0))),
> >   new LabeledVector(0.0, new SparseVector(10, Array(0), Array(0.0
> > val testing = Seq(
> >   new LabeledVector(1.0, new SparseVector(10, Array(0, 3),
> Array(1.0, 1.0))),
> >   new LabeledVector(0.0, new SparseVector(10, Array(0, 2, 3),
> Array(0.0, 1.0, 1.0))),
> >   new LabeledVector(0.0, new SparseVector(10, Array(0), Array(0.0
> > val trainingDS = env.fromCollection(training)
> > val testingDS = env.fromCollection(testing)
> > trainingDS.print()
> > val mlr = MultipleLinearRegression()
> >   .setIterations(100)
> >   .setStepsize(2)
> >   .setConvergenceThreshold(0.001)
> > mlr.fit(trainingDS)
> > val weights = mlr.weightsOption match {
> >   case Some(weights) => { weights.collect() }
> >   case None => throw new Exception("Could not calculate the
> weights.")
> > }
> > if (Logger.isInfoEnabled())
> >   Logger.info(s"*** WEIGHTS: ${weights.mkString(";")}")
> > testingDS.print()
> > val predictions = mlr.evaluate(testingDS.map(x => (x.vector,
> x.label)))
> > if (Logger.isInfoEna

FlinkML 0.10.1 - Using SparseVectors with MLR does not work

2016-02-03 Thread Sourigna Phetsarath
All:

I'm trying to use SparseVectors with FlinkML 0.10.1.  It does not seem to
be working.  Here is a UnitTest that I created to recreate the problem:


package com.aol.ds.arc.ml.poc.flink


> import org.junit.After
> import org.junit.Before
> import org.slf4j.LoggerFactory
> import org.apache.flink.test.util.ForkableFlinkMiniCluster
> import scala.concurrent.duration.FiniteDuration
> import java.util.concurrent.TimeUnit
> import org.apache.flink.test.util.TestBaseUtils
> import org.apache.flink.runtime.StreamingMode
> import org.apache.flink.test.util.TestEnvironment
> import org.junit.Test
> import org.apache.flink.ml.common.LabeledVector
> import org.apache.flink.ml.math.SparseVector
> import org.apache.flink.api.scala._
> import org.apache.flink.ml.regression.MultipleLinearRegression
> import org.apache.flink.ml.math.DenseVector
> class FlinkMLRTest {
>   var Logger = LoggerFactory.getLogger(getClass.getName)
>   var cluster: Option[ForkableFlinkMiniCluster] = None
>   val parallelism = 4
>   val DEFAULT_AKKA_ASK_TIMEOUT = 1000
>   val DEFAULT_TIMEOUT = new FiniteDuration(DEFAULT_AKKA_ASK_TIMEOUT,
> TimeUnit.SECONDS)
>   @Before
>   def doBefore(): Unit = {
> val cl = TestBaseUtils.startCluster(
>   1,
>   parallelism,
>   StreamingMode.BATCH_ONLY,
>   false,
>   false,
>   true)
> val clusterEnvironment = new TestEnvironment(cl, parallelism)
> clusterEnvironment.setAsContext()
> cluster = Some(cl)
>   }
>   @After
>   def doAfter(): Unit = {
> cluster.map(c => TestBaseUtils.stopCluster(c, DEFAULT_TIMEOUT))
>   }
>   @Test
>   def testMLR() {
> val env = ExecutionEnvironment.getExecutionEnvironment
> val training = Seq(
>   new LabeledVector(1.0, new SparseVector(10, Array(0, 2, 3),
> Array(1.0, 1.0, 1.0))),
>   new LabeledVector(1.0, new SparseVector(10, Array(0, 1, 5, 9),
> Array(1.0, 1.0, 1.0, 1.0))),
>   new LabeledVector(0.0, new SparseVector(10, Array(0, 2), Array(
> 0.0, 1.0))),
>   new LabeledVector(0.0, new SparseVector(10, Array(0), Array(0.0
> ))),
>   new LabeledVector(0.0, new SparseVector(10, Array(0, 2), Array(
> 0.0, 1.0))),
>   new LabeledVector(0.0, new SparseVector(10, Array(0), Array(0.0
> 
> val testing = Seq(
>   new LabeledVector(1.0, new SparseVector(10, Array(0, 3), Array(
> 1.0, 1.0))),
>   new LabeledVector(0.0, new SparseVector(10, Array(0, 2, 3),
> Array(0.0, 1.0, 1.0))),
>   new LabeledVector(0.0, new SparseVector(10, Array(0), Array(0.0
> 
> val trainingDS = env.fromCollection(training)
> val testingDS = env.fromCollection(testing)
> trainingDS.print()
> val mlr = MultipleLinearRegression()
>   .setIterations(100)
>   .setStepsize(2)
>   .setConvergenceThreshold(0.001)
> mlr.fit(trainingDS)
> val weights = mlr.weightsOption match {
>   case Some(weights) => { weights.collect() }
>   case None => throw new Exception("Could not calculate the
> weights.")
> }
> if (Logger.isInfoEnabled())
>   Logger.info(s"*** WEIGHTS: ${weights.mkString(";")}")
> testingDS.print()
> val predictions = mlr.evaluate(testingDS.map(x => (x.vector, x.label
> )))
> if (Logger.isInfoEnabled()) {
>   Logger.info(predictions.collect().mkString(","))
> }
>   }
>   @Test
>   def testMLR_DenseVector() {
> val env = ExecutionEnvironment.getExecutionEnvironment
> val training = Seq(
>   new LabeledVector(1.0, DenseVector(1.0, 0.0, 0.0, 1.0, 1.0, 0.0,
> 0.0, 0.0, 0.0, 0.0, 0.0)),
>   new LabeledVector(1.0, DenseVector(1.0, 0.0, 1.0, 0.0, 0.0, 0.0,
> 1.0, 0.0, 0.0, 0.0, 1.0)),
>   new LabeledVector(0.0, DenseVector(0.0, 0.0, 0.0, 1.0, 0.0, 0.0,
> 0.0, 0.0, 0.0, 0.0, 0.0)),
>   new LabeledVector(0.0, DenseVector(0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
> 0.0, 0.0, 0.0, 0.0, 0.0)),
>   new LabeledVector(0.0, DenseVector(0.0, 0.0, 0.0, 1.0, 0.0, 0.0,
> 0.0, 0.0, 0.0, 0.0, 0.0)),
>   new LabeledVector(0.0, DenseVector(0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
> 0.0, 0.0, 0.0, 0.0, 0.0)))
> val testing = Seq(
>   new LabeledVector(1.0, DenseVector(1.0, 0.0, 0.0, 0.0, 1.0, 0.0,
> 0.0, 0.0, 0.0, 0.0, 0.0)),
>   new LabeledVector(0.0, DenseVector(0.0, 0.0, 0.0, 1.0, 1.0, 0.0,
> 0.0, 0.0, 0.0, 0.0, 0.0)),
>   new LabeledVector(0.0, DenseVector(0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
> 0.0, 0.0, 0.0, 0.0, 0.0)))
> val trainingDS = env.fromCollection(training)
> val testingDS = env.from