Hi again,

After reading https://apacheignite.readme.io/docs/memory-configuration and
https://apacheignite.readme.io/docs/evictions I have been able to configure
the eviction policy and max size for a DataRegionConfiguration, inside a
DataStorageConfiguration, that I have associated with the IGFS
dataCacheConfiguration. I understand that configuring evictions and
expiry through CacheConfiguration only makes sense for on-heap caches,
which I guess was the only option available at the time the book was
written. However, some of my previous questions still apply, and I have a
couple of additional ones:

 - When using IGFS with a secondary file system, the main storage is the
secondary file system, but I might be interested in using Ignite
persistence to cache data not only in the memory of the Ignite
workers, but also on their disks. For example, I might have a data lake in a
huge HDFS cluster, and a separate, smaller compute cluster where I want to
run Spark and cache the data stored in the other HDFS cluster by using IGFS.
If I enable persistence for a data region that is used for the IGFS
dataCacheConfiguration, will the data be deleted from the disk of the
Ignite servers when it is evicted, or only from memory? I would like it
to be deleted from the disk in this case, because the intention is to use
the disk for tiered storage of IGFS, understood as a cache of the external
HDFS cluster. Otherwise the disk might fill up, because the HDFS cluster
is much bigger than the compute cluster where IGFS is running.

 - Does IGFS sync the eviction of entries in the data cache and the metadata
cache, even if I use two different data regions for the two caches? A
metadata entry with no data entries can be useful, but not the other way
around.

 - Is there any recommended ratio between the page size configured in the
DataStorageConfiguration for the DataRegionConfiguration used by the IGFS
dataCacheConfiguration, and the block size configured for IGFS?
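For concreteness, this is roughly the setup the questions above refer to. The region name and the sizes are placeholders rather than a recommendation, and the page size and block size shown are what I believe to be the defaults:

```xml
<!-- Sketch of the setup the questions above refer to; the region name and
     sizes are placeholders, not a recommendation. -->
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
  <property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
      <!-- Page size of the durable memory pages (4 KB is the default). -->
      <property name="pageSize" value="#{4 * 1024}"/>
      <property name="dataRegionConfigurations">
        <list>
          <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
            <property name="name" value="igfsDataRegion"/>
            <property name="maxSize" value="#{2L * 1024 * 1024 * 1024}"/>
            <!-- Eviction configured per the memory-configuration docs; my
                 question is what happens to evicted data on disk when
                 persistence is also enabled for this region. -->
            <property name="pageEvictionMode" value="RANDOM_2_LRU"/>
            <!-- The flag I mean by "enabling Ignite persistence". -->
            <property name="persistenceEnabled" value="true"/>
          </bean>
        </list>
      </property>
    </bean>
  </property>
  <property name="fileSystemConfiguration">
    <list>
      <bean class="org.apache.ignite.configuration.FileSystemConfiguration">
        <property name="name" value="igfs"/>
        <!-- IGFS file block size (64 KB is the default). -->
        <property name="blockSize" value="#{64 * 1024}"/>
        <property name="dataCacheConfiguration">
          <bean class="org.apache.ignite.configuration.CacheConfiguration">
            <property name="dataRegionName" value="igfsDataRegion"/>
          </bean>
        </property>
      </bean>
    </list>
  </property>
</bean>
```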

Thanks again for all your help.

Best Regards,

Juan



On Tue, Dec 12, 2017 at 6:38 PM, Juan Rodríguez Hortalá <
[email protected]> wrote:

> Hi,
>
> I'm trying to understand the configuration parameters for IGFS. My use
> case is using IGFS with a secondary file system, thus acting as a cache for
> a Hadoop file system, without having to modify any existing application
> (just changing the input and output paths to use the igfs scheme). In the
> javadoc for FileSystemConfiguration I see:
>
> int getPerNodeBatchSize()
> Gets number of file blocks buffered on local node before sending batch to
> remote node.
> int getPerNodeParallelBatchCount()
> Gets number of batches that can be concurrently sent to remote node.
> int getPrefetchBlocks()
> Get number of pre-fetched blocks if specific file's chunk is requested.
>
> What is the remote node here? I understand this doesn't have to do with
> other Ignite nodes holding backup copies, as that would be set in the cache
> configuration.
>
> I have also taken a look at
> http://apache-ignite-users.70518.x6.nabble.com/IGFS-Data-cache-size-td2875.html
> but that post seems to refer to a deprecated field
> FileSystemConfiguration.maxSpaceSize that I haven't been able to find
> either in the javadoc or in
> https://github.com/apache/ignite/blob/2.3.0/modules/core/src/main/java/org/apache/ignite/configuration/FileSystemConfiguration.java.
> Other questions that I have regarding Ignite configuration in the context
> of this use case:
>
>  - When I use ATOMIC for the atomicityMode of metaCacheConfiguration I get
> a launch exception: "Failed to start grid: IGFS metadata cache should be
> transactional: igfs". So I understand TRANSACTIONAL is required for
> metaCacheConfiguration, but I get no error when using ATOMIC for
> dataCacheConfiguration. Is there any reason to use TRANSACTIONAL for
> dataCacheConfiguration? I understand ATOMIC gets better performance if you
> don't use the transaction features.
>
>  - Do the readThrough, writeThrough, and writeBehind fields of the
> dataCacheConfiguration and metaCacheConfiguration CacheConfigurations have
> any effect? Or maybe IGFS is setting them according to the IgfsMode
> configured in the defaultMode field of FileSystemConfiguration?
>
> - Similarly, does the setExpiryPolicyFactory in dataCacheConfiguration and
> metaCacheConfiguration have any effect? I'd be interested in
> using the DUAL_ASYNC defaultMode, and I thought that maybe the ExpiryPolicy
> could give an upper bound for the time it takes for a record to be written
> to the secondary file system, because it has been expired from the cache.
> That way I could safely tear down the IGFS cluster after that time without
> any data loss. Is there some way of achieving that? Otherwise I think
> DUAL_ASYNC could only be used in long-lived clusters, because I understand
> there is no functionality to flush the IGFS caches into the secondary file
> system.
>
> - Similarly, does the eviction policy configured for
> dataCacheConfiguration and metaCacheConfiguration have any effect? In any
> case, I understand that IGFS can never fail due to running out of space in
> the caches, because it will evict the required entries, saving them to the
> secondary file system if needed in order to avoid data loss.
>
> It would be nice if someone could point me to some webinar or
> documentation specific to IGFS. I have already watched
> https://www.youtube.com/watch?v=pshM_gy7Wig and I think it is a good
> introduction, but I would like to get more details. I have also read the
> book "High-Performance In-Memory Computing With Apache Ignite".
>
> Thanks a lot for all your help.
>
> Best Regards,
>
> Juan
>