Hi,

I think what Bswaraj want is excatly something like Storm Distributed Cache 
API[1] (if I’m not misunderstanding). 

> The distributed cache feature in storm is used to efficiently distribute 
> files (or blobs, which is the equivalent terminology for a file in the 
> distributed cache and is used interchangeably in this document) that are 
> large and can change during the lifetime of a topology, such as geo-location 
> data, dictionaries, etc. Typical use cases include phrase recognition, entity 
> extraction, document classification, URL re-writing, location/address 
> detection and so forth. Such files may be several KB to several GB in size. 
> For small datasets that don't need dynamic updates, including them in the 
> topology jar could be fine. But for large files, the startup times could 
> become very large. In these cases, the distributed cache feature can provide 
> fast topology startup, especially if the files were previously downloaded for 
> the same submitter and are still in the cache. This is useful with frequent 
> deployments, sometimes few times a day with updated jars, because the large 
> cached files will remain available without changes. The large cached blobs 
> that do not change frequently will remain available in the distributed cache.

We can look into this whether it is a common use case and how to implement it 
in Flink. 

[1] http://storm.apache.org/releases/2.0.0-SNAPSHOT/distcache-blobstore.html 
<http://storm.apache.org/releases/2.0.0-SNAPSHOT/distcache-blobstore.html>
 

- Jark Wu 

> 在 2016年8月23日,下午9:45,Lohith Samaga M <lohith.sam...@mphasis.com> 写道:
> 
> Hi
> May be you could use Cassandra to store and fetch all such reference data.  
> This way the reference data can be updated without restarting your 
> application.
> 
> Lohith
> 
> Sent from my Sony Xperia™ smartphone
> 
> 
> 
> ---- Baswaraj Kasture wrote ----
> 
> Thanks Kostas !
> I am using DataStream API.
> 
> I have few config/property files (key vale text file) and also have business 
> rule files (json).
> These rules and configurations are needed when we process incoming event.
> Is there any way to share them to task nodes from driver program ?
> I think this is very common use case and am sure other users may face similar 
> issues.
> 
> +Baswaraj
> 
> On Mon, Aug 22, 2016 at 4:56 PM, Kostas Kloudas <k.klou...@data-artisans.com 
> <mailto:k.klou...@data-artisans.com>> wrote:
> Hello Baswaraj,
> 
> Are you using the DataSet (batch) or the DataStream API?
> 
> If you are in the first, you can use a broadcast variable 
> <https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/index.html#broadcast-variables>
>  for your task.
> If you are using the DataStream one, then there is no proper support for that.
> 
> Thanks,
> Kostas
> 
>> On Aug 20, 2016, at 12:33 PM, Baswaraj Kasture <kbaswar...@gmail.com 
>> <mailto:kbaswar...@gmail.com>> wrote:
>> 
>> Am running Flink standalone cluster.
>> 
>> I have text file that need to be shared across tasks when i submit my 
>> application.
>> in other words , put this text file in class path of running tasks.
>> 
>> How can we achieve this with flink ?
>> 
>> In spark, spark-submit has --jars option that puts all the files specified 
>> in class path of executors (executors run in separate JVM and spawned 
>> dynamically, so it is possible).
>> 
>> Flink's task managers run tasks in separate thread under taskmanager JVM (?) 
>> , how can we make this text file to be accessible on all tasks spawned by 
>> current application ?
>> 
>> Using HDFS, NFS or including file in program jar is one way that i know, but 
>> am looking for solution that can allows me to provide text file at run time 
>> and still accessible in all tasks.
>> Thanks.
> 
> 
> 
> Information transmitted by this e-mail is proprietary to Mphasis, its 
> associated companies and/ or its customers and is intended 
> for use only by the individual or entity to which it is addressed, and may 
> contain information that is privileged, confidential or 
> exempt from disclosure under applicable law. If you are not the intended 
> recipient or it appears that this mail has been forwarded 
> to you without proper authority, you are notified that any use or 
> dissemination of this information in any manner is strictly 
> prohibited. In such cases, please notify us immediately at 
> mailmas...@mphasis.com and delete this mail from your records.
> 

Reply via email to