Hi Eric,

In case you haven't come across it, we have a GemFire Spark connector that can be used to store and retrieve data from Spark:
https://issues.apache.org/jira/browse/GEODE-9

Thanks,
-Anil.

On Wed, Jul 22, 2015 at 8:31 AM, Eric Pederson <[email protected]> wrote:

> Hi Ashvin:
>
> We are using tools like Spark (and Hive for metadata) to process files in
> HDFS. We're interested in both the GemFire RDD and the GemFire HDFS
> integration as ways to access the data we have in GemFire using Spark and
> potentially Drill or Impala.
>
> Thanks,
>
> -- Eric
>
> On Tue, Jul 21, 2015 at 1:35 AM, Ashvin A <[email protected]> wrote:
>
>> Hi Eric,
>>
>> Currently the HDFS store writes data in sequence file format and HFile
>> format. Each value is a serialized event which contains metadata and the
>> value provided by the user. The value can be deserialized using Geode
>> classes. Each file can be deserialized independently and does not depend
>> on a live Geode cluster. A user-level API to construct this data will be
>> added soon (see GFInputFormat as an example).
>>
>> HDFS can be used as an archive by means of write-only regions. These
>> regions do not follow an LSM-tree structure. The LSM structure is used
>> for read-write regions.
>>
>> I am planning to create a JIRA and provide more details. Meanwhile, can
>> you help us understand your use case? In your opinion, what could this
>> interface look like? What about old versions of a key? Do you care about
>> accessing HDFS files directly, or is the HDFS region interface better?
>> Any other information relevant to the HDFS region data access pattern
>> would also help.
>>
>> Thanks
>> Ashvin
>>
>> On Mon, Jul 20, 2015 at 12:57 PM, Eric Pederson <[email protected]>
>> wrote:
>>
>>> In the spec for HDFS integration it says that data events are archived
>>> on HDFS for offline analysis. How do you do offline analysis? Is there
>>> an API for the file format so third-party tools can read it? Or do you
>>> go through an HDFS region?
>>>
>>> Also, just curious, are you using an LSM-tree to structure the data?
>>>
>>> Thanks,
>>>
>>> -- Eric
