I'm not sure if Pig can do this. It's designed to follow the MapReduce/Hadoop paradigm, which typically involves data on disk -> MapReduce jobs -> data on disk.
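
One way to stay within that disk-in, disk-out model from a Java process would be to spill the in-memory records to a file and point a LOAD at it. Here's a rough sketch using only the JDK. InMemoryToPig, its methods, and the (name, n) schema are made up for illustration and are not part of Pig. It assumes tab-separated fields, which is what Pig's default loader (PigStorage) expects, and a path that Pig can read as-is (e.g. local mode).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Pig: spills in-memory records to a
// tab-separated temp file that a Pig LOAD statement can read.
class InMemoryToPig {

    // Join one record's fields with tabs (PigStorage's default delimiter).
    static String toTsvLine(String... fields) {
        return String.join("\t", fields);
    }

    // Write every record as one line of a new temp file; return its path.
    static Path spill(List<String[]> records) {
        List<String> lines = new ArrayList<>();
        for (String[] record : records) {
            lines.add(toTsvLine(record));
        }
        try {
            Path file = Files.createTempFile("pig-input-", ".tsv");
            Files.write(file, lines, StandardCharsets.UTF_8);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Build a Pig Latin LOAD statement for the spilled file.
    // The schema here is an assumption made for this example.
    static String loadStatement(Path file) {
        return "A = LOAD '" + file.toAbsolutePath()
                + "' AS (name:chararray, n:int);";
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
                new String[] {"alice", "3"},
                new String[] {"bob", "5"});
        Path file = spill(records);
        System.out.println(loadStatement(file));
    }
}
```

In embedded use, the string from loadStatement could then be passed to PigServer.registerQuery. This still goes through the filesystem, so it's a workaround rather than the JDBC-like interface asked about.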

You could try to create a custom InputSplit/RecordReader to read from a program's standard output or something, but this is kind of hacky. There are RecordReaders which read from SQL databases. There's also something like this:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/streaming/StreamBaseRecordReader.html
which can be used with Hadoop streaming. But this is all fairly involved and would require a bit of work (if it's even possible) - I don't think Pig has direct support yet for the kind of interface you're looking for.

That being said, I'm somewhat new to Pig/Hadoop, so if there's anyone else who can chime in with comments or agreements/disagreements, I'd appreciate it.

On Fri, May 13, 2011 at 1:32 PM, Jianting Cao <[email protected]> wrote:
> Thank you Mark. Sorry that I'm not clear enough. What I want is this, there
> are some program running and generating a lot of data, instead of putting
> these data to a relational database, I want to directly output them to Pig
> and do some analysis along the way or afterwards. So I'm asking if there is
> a JDBC-like interface with which I could load these newly generated data
> into Pig and do analytic. all of this is happening within a Java process.
>
> Jianting
>
> On Fri, May 13, 2011 at 10:14 AM, Mark Laczin <[email protected]> wrote:
>
>> Technically speaking, yes you could store data in memory and keep it
>> there, then have your program present some interface to store data
>> (shared memory or reading from the stdin or something) but I'm not
>> sure why you'd want to do this.
>>
>> Maybe I'm misunderstanding your question, but it sounds like you want
>> to run using a filesystem that's in memory as opposed to on disk.
>>
>> -Mark
>>
>> On Fri, May 13, 2011 at 1:08 PM, Jianting Cao <[email protected]>
>> wrote:
>> > Hi,
>> >
>> > Is there only one way to load data into pig, i.e. using load command to
>> > load data from files? Can I load data from memory, for example in embedded
>> > code create a table and store data into it?
>> >
>> > Thanks,
>> >
>> > Jianting Cao
