Flink Custom Partitioner Issue in Flink Streaming

2019-01-12 Thread sohimankotia
Hi Team,

I am facing an issue with a custom partitioner in Flink streaming. I am
using a watcher to read files from a folder, and then I have to partition
the records and send them to a sink.

- This happens only if parallelism > 1.
- Checkpointing is enabled.
- If I don't use a partitioner, everything works fine.
- If I use shuffle, it is also fine.
- If I use IdPartitioner, it is also fine.

Attaching logs: ParitionLogs.log

  


*Main Class Code:*

@SuppressWarnings("Duplicates")
public class GroupContentScoring {

  private static final Logger LOGGER = LoggerFactory.getLogger(GroupContentScoring.class);
  private static final String CONFIG_DIR = "configDir";
  private static final String PROPERTIES = "properties";
  private static final String TYPES_TO_PROCESS = "typesToProcess";
  private static final String CONTENT_TYPE = "group";
  private static final String DECAY_SCORE_NAME = "decay-score-" + CONTENT_TYPE;

  public static void main(String[] args) throws Exception {

    // Read and validate the command-line parameters.
    final ParameterTool params = ParameterTool.fromArgs(args);
    final String configDir = params.get(CONFIG_DIR);
    final String propertiesFile = params.get(PROPERTIES);
    final String[] typesToProcess = params.get(TYPES_TO_PROCESS, EMPTY).split(COMMA);
    Preconditions.checkArgument(StringUtils.isNotEmpty(configDir), "Empty config dir is specified");
    Preconditions.checkArgument(StringUtils.isNotEmpty(propertiesFile), "Empty properties file is specified");
    Preconditions.checkArgument(typesToProcess.length > 0, " No type is specified for processing");

    LOGGER.info("config directory for properties and config files : {} ", configDir);
    LOGGER.info("Content type to process are  : {} ", Arrays.asList(typesToProcess));

    final String configFile = configDir + "/config.json";
    System.out.println("Config File " + configFile);

    final Configuration appConfigs =
        ParameterTool.fromPropertiesFile(new File(configDir + "/" + propertiesFile)).getConfiguration();
    final StreamExecutionEnvironment env = prepareStreamEnv(appConfigs);

    final CsConfig groupConfig = CsConfig.fromFile(CONTENT_TYPE, configFile);
    groupConfig.put("watchFromDate", appConfigs.getString("watchFromDate", ""));
    groupConfig.put("entityType", "CONTENT_GROUP");

    final DataStream<Event> input = prepareInput(env, groupConfig);

    // Custom partitioning: the key comes from CsEventKeySelector,
    // the target partition from EventPartitioner.
    final DataStream<Event> partitioned = input
        .partitionCustom(new EventPartitioner(), new CsEventKeySelector());

    partitioned.addSink(new PrintSinkFunction<>());

    env.execute();
  }

  // Builds the file-watcher source from the group configuration.
  private static DataStream<Event> prepareInput(StreamExecutionEnvironment env, CsConfig groupConfig) {

    HdfsConfig hdfsConf = new HdfsConfig();
    hdfsConf.setDir(groupConfig.get("dir").toString());
    hdfsConf.setWatchIntervalInSeconds(groupConfig.getStrAsLong("watchIntervalInSeconds"));
    hdfsConf.setWatchFilter(groupConfig.get("watchFilter").toString());
    hdfsConf.setFileProcessingMode(groupConfig.get("fileProcessingMode").toString());
    hdfsConf.setParallelism(env.getParallelism());
    hdfsConf.setMaxFiles(groupConfig.getStrAsInt("maxFiles"));
    hdfsConf.setWatchFromDate(groupConfig.get("watchFromDate").toString());
    hdfsConf.setWatcherLookupThresholdHour(groupConfig.getStrAsInt("watcherLookupThresholdHour"));
    hdfsConf.setModTimeDiffInSeconds(groupConfig.getStrAsLong("modTimeDiffInSeconds"));

    AvroFileWatcherSource avroSource = new AvroFileWatcherSource(hdfsConf);
    return avroSource.readStream(env, "group");
  }

  // Creates the execution environment with exactly-once checkpointing enabled.
  private static StreamExecutionEnvironment prepareStreamEnv(Configuration appConfigs) {

    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    final int checkPointInterval = appConfigs.getInteger("checkpoint.interval", 10_000);
    final int checkPointMinPause = appConfigs.getInteger("checkpoint.min.pause", 60_000);

    env.enableCheckpointing(checkPointInterval);
    env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
    env.getCheckpointConfig().setMinPauseBetweenCheckpoints(checkPointMinPause);
    final StateBackend stateBackend = new FsStateBackend("hdfs:///new_data_pipeline/dev/phase2/");
    // env.setStateBackend(stateBackend);
    return env;
  }

}


*Partitioner :*

public class EventPartitioner implements Partitioner<String> {

  private static final long serialVersionUID = 1L;

  @Override
  public int partition(String key, int numPartitions) {
    return key.hashCode() % numPartitions;
  }
}
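
A possible cause, not confirmed by the logs in this thread: in Java,
String.hashCode() can be negative and the % operator keeps the sign of the
dividend, so key.hashCode() % numPartitions can return a negative index,
while a partitioner is expected to return a value in [0, numPartitions).
With a single partition the result is always 0, which would fit the
observation that the problem only appears when parallelism > 1. The sketch
below is plain Java with no Flink dependency; the class name
SafePartitioning and its helper methods are made up for illustration, and
Math.floorMod is the standard-library call that keeps the result
non-negative.

```java
public class SafePartitioning {

    // Hypothetical helper: maps any int hash to an index in [0, numPartitions).
    public static int partition(int hash, int numPartitions) {
        return Math.floorMod(hash, numPartitions);
    }

    // Same idea for a String key, mirroring the partition(String, int) signature above.
    public static int partition(String key, int numPartitions) {
        return partition(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int hash = -7; // stands in for a negative hashCode()
        System.out.println(hash % 4);           // prints -3: the remainder keeps the sign
        System.out.println(partition(hash, 4)); // prints 1: floorMod stays inside [0, 4)
        System.out.println(partition("some~key", 4));
    }
}
```

If this is indeed the cause, replacing the body of partition() in
EventPartitioner with Math.floorMod(key.hashCode(), numPartitions) would be
the corresponding change.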

*Key Selector*

public class CsEventKeySelector implements KeySelector<Event, String>, Serializable {

  private static final long serialVersionUID = 1L;

  @Override
  public String getKey(Event value) {
    // The partition key is the second "~"-separated field of the event key.
    final String[] split = value.getKey().split("~");
    return split[1];
  }
}



Thanks

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Is the eval method invoked in a thread safe manner in Flink UDF

2019-01-12 Thread Anil
Thanks Hequn! Is it also thread-safe when the same UDF is called multiple
times for the same record?
Is the UDF called sequentially for each field of a single record? I have a
query like:
 select GetName(data.id, 'city'), GetName(data.id, 'zone') from ..


Re: Reply: Re: Reply: Re: What happen to state in Flink Task Manager when crash?

2019-01-12 Thread Siew Wai Yow
Thanks Qiu,
this is useful information indeed, but this strategy will only reduce the
chance of re-executing the whole graph. I think it won't help if a TM
crashes, in which case the whole cluster needs to restart anyway to
redistribute state. Am I right?


From: Congxian Qiu 
Sent: Sunday, January 13, 2019 9:39 AM
To: Siew Wai Yow
Cc: Jamie Grier; user
Subject: Re: Reply: Re: Reply: Re: What happen to state in Flink Task Manager when crash?

Hi, Yow
I think there is another restart strategy in Flink: region failover [1], but
I could not find the documentation; maybe someone else can help here. For
region failover, please take a look at this issue [2] before you use it. You
can also take a look at this FLIP [3].

[1] https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/failover/RestartPipelinedRegionStrategy.java
[2] https://issues.apache.org/jira/browse/FLINK-10712
[3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures

On Sun, Jan 13, 2019 at 8:49 AM Siew Wai Yow <wai_...@hotmail.com> wrote:
Hi Qiu, thanks again!
Based on my experience with Flink 1.3, when one of the TMs crashes the whole
cluster needs to be restarted, so I guess this is the recovery you mentioned.
But that seems to defeat the purpose of a cluster, as one TM crash should not
bring down the whole cluster. May I know whether this is still the same in
Flink 1.7? The restart strategy is for the job, though, not for TM failure.

Thanks!

Hi, Siew Wai Yow

Yes, David is correct: the TM must be recovered, and the number of TMs before
and after the crash must be the same.
In my last reply, I meant to say that the state may not be on the same TM
after the crash. Sorry for the unclear description.

On Sat, Jan 12, 2019 at 6:44 PM Siew Wai Yow <wai_...@hotmail.com> wrote:
Thanks Qiu, but David has a different view on Stack Overflow. He mentioned
that the crashed TM must be recovered.

https://stackoverflow.com/questions/54149134/what-happen-to-state-in-flink-task-manager-when-crash/54153686?noredirect=1#comment95144500_54153686

"The crashed TM must be recovered, and state updates since the last checkpoint 
will be lost. Of course that lost state should be recreated as the job rewinds 
and resumes."

Hi, Siew Wai Yow

When the job is running, state is stored in the local RocksDB; Flink copies
all the needed state to the checkpoint path when taking a checkpoint.
If there is any failure, the job is restored from the last *successful*
checkpoint, and the restored state is assigned to all the current TMs
(these TMs do not need to be the same as before).
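
To make the recovery behaviour described above concrete, here is a toy model
in plain Java. It is not Flink code and uses no Flink API; the class
CheckpointReplayDemo, its methods, and the choice of a per-key record count
as the "state" are all assumptions for illustration. It takes the five
records from the example in this thread, snapshots the state to a stand-in
for durable storage after the third record, simulates a crash after the
fifth, then restores the snapshot and replays the records that came after it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CheckpointReplayDemo {

    // Working state of the running job: record count per key (lost on a crash).
    private Map<String, Integer> liveState = new HashMap<>();
    // Stand-in for durable storage such as HDFS or S3: the last completed snapshot.
    private Map<String, Integer> checkpoint = new HashMap<>();
    // Input position up to which the snapshot is valid.
    private int checkpointOffset = 0;

    void process(String key) {
        liveState.merge(key, 1, Integer::sum);
    }

    void takeCheckpoint(int offset) {
        checkpoint = new HashMap<>(liveState);
        checkpointOffset = offset;
    }

    // Simulates losing the worker: everything not in the snapshot is gone.
    void crash() {
        liveState = null;
    }

    // Restores the snapshot and replays every record after the snapshot offset.
    void recover(List<String> input) {
        liveState = new HashMap<>(checkpoint);
        for (int i = checkpointOffset; i < input.size(); i++) {
            process(input.get(i));
        }
    }

    public static Map<String, Integer> run() {
        List<String> input = Arrays.asList("KEYA", "KEYB", "KEYA", "KEYC", "KEYB");
        CheckpointReplayDemo job = new CheckpointReplayDemo();
        for (int i = 0; i < input.size(); i++) {
            job.process(input.get(i));
            if (i == 2) {
                job.takeCheckpoint(i + 1); // snapshot covers rec1..rec3
            }
        }
        job.crash();        // updates from rec4 and rec5 are lost
        job.recover(input); // restore the snapshot, then replay rec4 and rec5
        return job.liveState;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

After recovery the counts are 2 for KEYA, 2 for KEYB and 1 for KEYC, the same
as in an uninterrupted run. The replay step assumes the source can be rewound
to the checkpointed position.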

On Sat, Jan 12, 2019 at 11:24 AM Siew Wai Yow <wai_...@hotmail.com> wrote:
Thanks, but this is something I already know. I would like to know whether
the other TMs will take over the crashed TM's state to ensure data
completeness (say the state is keyed, so state for different keys is stored
on different TMs), or whether the crashed TM needs to be recovered before
processing can continue.

For example, 5 records,
rec1:KEYA
rec2:KEYB
rec3:KEYA
rec4:KEYC
rec5:KEYB

TM1 stored state for rec1:KEYA, rec3:KEYA
TM2 stored state for rec2:KEYB, rec5:KEYB <-- crashed; what happens to the
state stored in TM2?
TM3 stored state for rec4:KEYC

If TM2 crashes, will rec2 and rec5 be assigned to another TM? Or are those
records only recovered when TM2 is recovered?

Thanks.



From: Jamie Grier <jgr...@lyft.com>
Sent: Saturday, January 12, 2019 2:26 AM
To: Siew Wai Yow
Cc: user@flink.apache.org
Subject: Re: What happen to state in Flink Task Manager when crash?

Flink is designed such that local state is backed up to a highly available
system such as HDFS or S3. When a TaskManager fails, state is recovered from
there.

I suggest reading this:
https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/state/state_backends.html


On Fri, Jan 11, 2019 at 7:33 AM Siew Wai Yow <wai_...@hotmail.com> wrote:
Hello,

May I know what happens to state stored in a Flink Task Manager when that
Task Manager crashes? Say the state storage is RocksDB: would that data be
transferred to another running Task Manager so that the complete state is
available for processing?

Regards,
Yow



Re: Reply: Re: Reply: Re: What happen to state in Flink Task Manager when crash?

2019-01-12 Thread Congxian Qiu
Hi, Yow
I think there is another restart strategy in Flink: region failover [1], but
I could not find the documentation; maybe someone else can help here. For
region failover, please take a look at this issue [2] before you use it. You
can also take a look at this FLIP [3].

[1] https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/failover/RestartPipelinedRegionStrategy.java
[2] https://issues.apache.org/jira/browse/FLINK-10712
[3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures

-- 
Best,
Congxian


Reply: Re: Reply: Re: What happen to state in Flink Task Manager when crash?

2019-01-12 Thread Siew Wai Yow
Hi Qiu, thanks again!
Based on my experience with Flink 1.3, when one of the TMs crashes the whole
cluster needs to be restarted, so I guess this is the recovery you mentioned.
But that seems to defeat the purpose of a cluster, as one TM crash should not
bring down the whole cluster. May I know whether this is still the same in
Flink 1.7? The restart strategy is for the job, though, not for TM failure.

Thanks!


Re: Java Example of Stochastic Outlier Selection

2019-01-12 Thread Rong Rong
Hi James,

Usually Flink ML is tightly integrated with Scala. I did poke around and try
to make the example work in Java, and it does require a significant amount
of effort, but you can try the following:

First, the implicit type parameters need to be passed to the execution
environment to generate the DataSet appropriately. I found that this link [1]
might be useful, depending on your use case.
Next, you need to extract the TransformDataSetOperation from
StochasticOutlierSelection, since it is also an implicit argument to the
transform method.

--
Rong

[1]: https://www.programcreek.com/java-api-examples/index.php?api=scala.reflect.ClassTag


On Sat, Jan 12, 2019 at 12:39 AM James.Y.Yang (g-mis.cncd02.Newegg) 42035
<james.y.y...@newegg.com> wrote:

> Hi,
>
> I want to use Stochastic Outlier Selection from the ML library, but after
> reading the documentation [1] I found there is no Java example. Sorry, I am
> not familiar with Scala, so I would appreciate it if someone could share a
> Java example.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/libs/ml/sos.html
>
> Best Regards,
> James Yang
>


Re: Reply: Re: What happen to state in Flink Task Manager when crash?

2019-01-12 Thread Congxian Qiu
Hi, Siew Wai Yow

Yes, David is correct: the TM must be recovered, and the number of TMs before
and after the crash must be the same.
In my last reply, I meant to say that the state may not be on the same TM
after the crash. Sorry for the unclear description.

-- 
Best,
Congxian


Reply: Re: What happen to state in Flink Task Manager when crash?

2019-01-12 Thread Siew Wai Yow
Thanks Qiu, but David has a different view on Stack Overflow. He mentioned
that the crashed TM must be recovered.

https://stackoverflow.com/questions/54149134/what-happen-to-state-in-flink-task-manager-when-crash/54153686?noredirect=1#comment95144500_54153686

"The crashed TM must be recovered, and state updates since the last checkpoint 
will be lost. Of course that lost state should be recreated as the job rewinds 
and resumes."


Re: breaking the pipe line

2019-01-12 Thread Hequn Cheng
Hi Alieh,

Which API do you use: Table API, SQL, DataStream, or DataSet?
It would be great if you could show us some information about your pipeline
or provide a way to reproduce the problem.

Best, Hequn

On Sat, Jan 12, 2019 at 1:58 AM Alieh wrote:

> Hello all,
>
> I have a very long pipeline (an implementation of an incremental
> algorithm). It takes a very long time for the Flink execution planner to
> create the plan, so I split the pipeline into several independent
> pipelines by writing out the intermediate results and reading them back in.
> Is there a more efficient way to do it?
>
> Best,
>
> Alieh
>
>


Re: Is the eval method invoked in a thread safe manner in Flink UDF

2019-01-12 Thread Hequn Cheng
Hi Anil,

It is thread-safe.
Each UDF instance runs in only one task, and each UDF processes data
synchronously, i.e., the next record is not processed until the current
record has been processed.
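
As a rough illustration of why that matters, the sketch below keeps a plain,
unsynchronized HashMap cache inside a function object, which is only safe if
calls arrive one at a time as described above. It is plain Java so that it
runs standalone; in a real job the class would extend Flink's ScalarFunction.
The name GetName (borrowed from the query in this thread), the lookup logic,
and the lookups counter are assumptions for illustration. Whether several
GetName calls in one query share a single instance is not settled in this
thread.

```java
import java.util.HashMap;
import java.util.Map;

public class GetName {

    // Per-instance cache with no locking; relies on calls being sequential.
    private final Map<String, String> cache = new HashMap<>();
    // Counts how often the (hypothetical) expensive lookup actually ran.
    private int lookups = 0;

    // Flink locates UDF logic through a public method named eval.
    public String eval(String id, String field) {
        String cacheKey = id + "/" + field;
        String cached = cache.get(cacheKey);
        if (cached == null) {
            cached = lookup(id, field);
            cache.put(cacheKey, cached);
        }
        return cached;
    }

    // Placeholder for a real lookup, e.g. against a dimension table.
    private String lookup(String id, String field) {
        lookups++;
        return field + "-of-" + id;
    }

    public int getLookups() {
        return lookups;
    }

    public static void main(String[] args) {
        GetName udf = new GetName();
        // Two calls for the same record, as in GetName(data.id, 'city'), GetName(data.id, 'zone').
        System.out.println(udf.eval("42", "city"));
        System.out.println(udf.eval("42", "zone"));
        System.out.println(udf.eval("42", "city")); // served from the cache
        System.out.println(udf.getLookups());       // prints 2
    }
}
```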

Best, Hequn

On Sat, Jan 12, 2019 at 3:12 AM Anil  wrote:

> Is the eval method invoked in a thread-safe manner in a Flink UDF?
>


Java Example of Stochastic Outlier Selection

2019-01-12 Thread James.Y.Yang (g-mis.cncd02.Newegg) 42035
Hi,

I want to use Stochastic Outlier Selection from the ML library, but after
reading the documentation [1] I found there is no Java example. Sorry, I am
not familiar with Scala, so I would appreciate it if someone could share a
Java example.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/libs/ml/sos.html

Best Regards,

James Yang