Hi Eric,

Currently the HDFS store writes data in the SequenceFile and HFile
formats. Each value is a serialized event which contains metadata and the
value provided by the user. The value can be deserialized using Geode
classes. Each file can be deserialized independently and does not depend
on a live Geode cluster. A user-level API to consume this data will be
added soon (see GFInputFormat as an example).
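
To make this concrete, a map-only Hadoop job over the archived files
could look roughly like the sketch below. The GFInputFormat package, the
GFKey/PersistedEvent types, the getDeserializedValue() accessor, and the
two configuration property names are all assumptions on my part, not the
final API:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Assumed package and type names; adjust to wherever GFInputFormat and
// its key/value types actually land.
import com.gemstone.gemfire.cache.hdfs.mapreduce.GFInputFormat;
import com.gemstone.gemfire.cache.hdfs.mapreduce.GFKey;
import com.gemstone.gemfire.cache.hdfs.mapreduce.PersistedEvent;

public class GeodeArchiveScan {

  // Each input record is one archived event: GFKey wraps the region key,
  // PersistedEvent wraps the metadata plus the user-provided value.
  public static class EventMapper
      extends Mapper<GFKey, PersistedEvent, Text, Text> {
    @Override
    protected void map(GFKey key, PersistedEvent event, Context ctx)
        throws IOException, InterruptedException {
      // getDeserializedValue() is an assumed accessor; deserialization
      // uses Geode classes and needs no live cluster.
      ctx.write(new Text(key.toString()),
                new Text(String.valueOf(event.getDeserializedValue())));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumed property names telling the input format which HDFS store
    // directory and which region to scan.
    conf.set("mapreduce.input.gfinputformat.homedir", args[0]);
    conf.set("mapreduce.input.gfinputformat.inputregion", args[1]);

    Job job = Job.getInstance(conf, "geode-archive-scan");
    job.setJarByClass(GeodeArchiveScan.class);
    job.setInputFormatClass(GFInputFormat.class);
    job.setMapperClass(EventMapper.class);
    job.setNumReduceTasks(0); // map-only dump of key/value pairs
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(job, new Path(args[2]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Because each file can be deserialized independently, a job like this can
run against a plain HDFS cluster with no Geode members up.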

HDFS can be used as an archive by means of write-only regions. These
regions do not follow an LSM-tree structure; the LSM structure is used
for read-write regions.
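
For the archive case, wiring a write-only region to an HDFS store might
look something like the sketch below. The HDFSStoreFactory,
setHDFSStoreName, and setHDFSWriteOnly names follow the GemFire 8 lineage
of this feature and are assumptions for Geode, along with the package
names:

import com.gemstone.gemfire.cache.Cache;
import com.gemstone.gemfire.cache.CacheFactory;
import com.gemstone.gemfire.cache.Region;
import com.gemstone.gemfire.cache.RegionShortcut;
import com.gemstone.gemfire.cache.hdfs.HDFSStore;
import com.gemstone.gemfire.cache.hdfs.HDFSStoreFactory;

public class WriteOnlyArchive {
  public static void main(String[] args) {
    Cache cache = new CacheFactory().create();

    // Assumed factory API: the store names the NameNode and the
    // directory under which the sequence files / HFiles are written.
    HDFSStoreFactory storeFactory = cache.createHDFSStoreFactory();
    storeFactory.setNameNodeURL("hdfs://namenode:8020");
    storeFactory.setHomeDir("/geode/archive");
    HDFSStore store = storeFactory.create("archiveStore");

    // Write-only: events are appended to HDFS in arrival order with no
    // LSM-tree organization or compaction; reads are served from the
    // in-memory tier only.
    Region<String, String> events = cache
        .<String, String>createRegionFactory(RegionShortcut.PARTITION)
        .setHDFSStoreName("archiveStore")
        .setHDFSWriteOnly(true)
        .create("events");

    events.put("k1", "v1"); // archived to HDFS asynchronously
  }
}

Since the HDFS tier is not consulted on reads for write-only regions, a
get() on such a region only sees what is still in memory; scanning the
archived history is what the InputFormat above is for.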

I am planning to create a JIRA and provide more details. Meanwhile, can
you help us understand your use case? In your opinion, what could this
interface look like? What about old versions of a key? Do you care about
accessing the HDFS files directly, or is the HDFS region interface
better? Any other information relevant to the HDFS region data access
pattern would also help.

Thanks
Ashvin



On Mon, Jul 20, 2015 at 12:57 PM, Eric Pederson <[email protected]> wrote:

> In the spec for HDFS integration it says that data events are archived on
> HDFS for offline analysis.  How do you do offline analysis?  Is there an
> API for the file format so third party tools can read it?  Or do you go
> through an HDFS region?
>
> Also, just curious, are you using a LSM-tree to structure the data?
>
> Thanks,
>
> -- Eric
>
