I'm not sure whether Pig can do this. It's designed to follow the
MapReduce/Hadoop paradigm, which typically involves data on disk ->
MapReduce jobs -> data on disk.

You could try to create a custom InputSplit/RecordReader that reads from
a program's standard output or something similar, but that is kind of
hacky. There are RecordReaders that read from SQL databases. There's also
StreamBaseRecordReader, which can be used with Hadoop Streaming:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/streaming/StreamBaseRecordReader.html

But all of this is fairly involved and would take a bit of work (if
it's even possible). I don't think Pig has direct support yet for the
kind of interface you're looking for.
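
A workaround that stays within the disk-in, disk-out model would be to
have the Java process spill its in-memory records to a delimited file and
point Pig's LOAD at that file. Below is a minimal sketch, assuming
tab-separated fields (PigStorage's default delimiter). The class name
InMemoryToPigFile, the spill helper, the sample records and the schema are
made up for illustration, and the LOAD statement is only printed, not run:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: not part of Pig or Hadoop.
public class InMemoryToPigFile {

    /** Writes each record as one tab-separated line to a new temp file. */
    public static Path spill(List<String[]> records) throws IOException {
        Path file = Files.createTempFile("pig-input-", ".tsv");
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String[] fields : records) {
                out.write(String.join("\t", fields));
                out.write('\n');
            }
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        // Sample in-memory data (illustrative only).
        List<String[]> records = Arrays.asList(
                new String[] {"alice", "3"},
                new String[] {"bob", "7"});
        Path file = spill(records);

        // Assumed next step: hand a statement like this to Pig.
        // The schema (name, n) is an assumption for this sample data.
        System.out.println("A = LOAD '" + file
                + "' USING PigStorage('\\t') AS (name:chararray, n:int);");
    }
}
```

From embedded Pig, the printed statement could then be passed to
PigServer.registerQuery. On a real cluster the file would need to be
written to HDFS rather than a local temp directory.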

That being said, I'm somewhat new to Pig/Hadoop so if there's anyone
else who can chime in with comments or agreements/disagreements, I'd
appreciate it.


On Fri, May 13, 2011 at 1:32 PM, Jianting Cao <[email protected]> wrote:
> Thank you, Mark. Sorry that I wasn't clear enough. What I want is this:
> some programs are running and generating a lot of data. Instead of putting
> this data into a relational database, I want to output it directly to Pig
> and do some analysis along the way or afterwards. So I'm asking whether
> there is a JDBC-like interface with which I could load this newly generated
> data into Pig and run analytics. All of this happens within a Java process.
>
> Jianting
>
> On Fri, May 13, 2011 at 10:14 AM, Mark Laczin <[email protected]> wrote:
>
>> Technically speaking, yes you could store data in memory and keep it
>> there, then have your program present some interface to store data
>> (shared memory or reading from the stdin or something) but I'm not
>> sure why you'd want to do this.
>>
>> Maybe I'm misunderstanding your question, but it sounds like you want
>> to run using a filesystem that's in memory as opposed to on disk.
>>
>> -Mark
>>
>> On Fri, May 13, 2011 at 1:08 PM, Jianting Cao <[email protected]>
>> wrote:
>> > Hi,
>> >
>> >
>> >
>> > Is there only one way to load data into Pig, i.e. using the load command
>> > to load data from files? Can I load data from memory, for example by
>> > creating a table in embedded code and storing data into it?
>> >
>> >
>> >
>> > Thanks,
>> >
>> > Jianting Cao
>> >
>> >
>>
>
