Hi Eric,

In case you haven't come across it: we have a GemFire Spark connector, which
can be used from Spark to store and retrieve data.

https://issues.apache.org/jira/browse/GEODE-9

Thanks,
-Anil.


On Wed, Jul 22, 2015 at 8:31 AM, Eric Pederson <[email protected]> wrote:

> Hi Ashvin:
>
> We are using tools like Spark (and Hive for metadata) to process files in
> HDFS. We're interested in both the GemFire RDD and the GemFire HDFS
> integration as ways to access the data we have in GemFire using Spark and
> potentially Drill or Impala.
>
> Thanks,
>
>
> -- Eric
>
> On Tue, Jul 21, 2015 at 1:35 AM, Ashvin A <[email protected]> wrote:
>
>> Hi Eric,
>>
>> Currently the HDFS store writes data in sequence file format and HFile
>> format. Each value is a serialized event which contains metadata and the
>> value provided by the user. The value can be deserialized using Geode
>> classes. Each file can be deserialized independently and does not depend on
>> a live Geode cluster. A user-level API to construct this data will be added
>> soon (see GFInputFormat as an example).
>>
>> HDFS can be used as an archive by means of Write-only regions. These regions
>> do not follow an LSM-tree structure. The LSM structure is used for Read-Write
>> regions.
>>
>> I am planning to create a JIRA and provide more details. Meanwhile, can
>> you help us understand your use case? In your opinion, what could this
>> interface look like? What about old versions of a key? Do you care about
>> accessing HDFS files directly, or is the HDFS Region interface better? Any
>> other information relevant to the HDFS region data access pattern would
>> also help.
>>
>> Thanks
>> Ashvin
>>
>>
>>
>> On Mon, Jul 20, 2015 at 12:57 PM, Eric Pederson <[email protected]>
>> wrote:
>>
>>> In the spec for HDFS integration it says that data events are archived
>>> on HDFS for offline analysis.  How do you do offline analysis?  Is there an
>>> API for the file format so third party tools can read it?  Or do you go
>>> through an HDFS region?
>>>
>>> Also, just curious, are you using an LSM-tree to structure the data?
>>>
>>> Thanks,
>>>
>>> -- Eric
>>>
>>
>>
>
