Hi Cesar,
A UDF is good for processing data. For writing data you should write a
custom Storer. For MongoDB specifically, there is already a Storer
available on GitHub with richer features.
Take a look at: https://github.com/mongodb/mongo-hadoop/tree/master/pig
MongoStorage code:
https://github.com/mongodb/mongo-hadoop/blob/master/pig/src/main/java/com/mongodb/hadoop/pig/MongoStorage.java
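If you do end up writing your own Storer, the point is that the connection is opened once per task and reused for every record, instead of once per line as in the UDF. Below is a rough sketch of that lifecycle only. It deliberately uses no Pig or MongoDB classes: FakeConnection, OpenOnceWriter and OpenOnceDemo are hypothetical names made up for illustration, and FakeConnection just counts how often it is opened. In a real Storer the setup would belong wherever the per-task writer is created (for example around StoreFunc.prepareToWrite), and the close in the writer's close method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a MongoDB connection; counts how often one is opened. */
class FakeConnection {
    static int opened = 0;                       // connections opened so far
    final List<String> written = new ArrayList<>();
    boolean closed = false;

    FakeConnection() { opened++; }

    void insert(String doc) {
        if (closed) throw new IllegalStateException("connection is closed");
        written.add(doc);
    }

    void close() { closed = true; }
}

/** Mimics a Storer-like lifecycle: open once, write many records, close once. */
class OpenOnceWriter {
    private FakeConnection conn;

    void prepareToWrite() { conn = new FakeConnection(); }  // once per task
    void putNext(String line) { conn.insert(line); }        // once per record
    void close() { if (conn != null) conn.close(); }        // once per task
    FakeConnection connection() { return conn; }
}

class OpenOnceDemo {
    /** Writes all lines through one writer and returns how many connections were opened. */
    static int run(List<String> lines) {
        int before = FakeConnection.opened;
        OpenOnceWriter writer = new OpenOnceWriter();
        writer.prepareToWrite();
        try {
            for (String line : lines) writer.putNext(line);
        } finally {
            writer.close();                                  // always release the connection
        }
        return FakeConnection.opened - before;
    }

    public static void main(String[] args) {
        int opens = run(Arrays.asList("{\"a\": 1}", "{\"a\": 2}", "{\"a\": 3}"));
        System.out.println("connections opened: " + opens);
    }
}
```

With the existing MongoStorage you should not need any of this yourself; the sketch is only meant to show why a Storer fits better than a UDF for the write path.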
Thanks
Suraj Nayak
On Wednesday 03 December 2014 01:50 PM, Cesar Pumar García wrote:
Hi there,
We are given a text file containing several lines, where each line
corresponds to a MongoDB document, and we load it as follows:
DEFINE PigToMongo com.beeva.PigToMongo.PigToMongo();
A = LOAD '/home/hduser/pigfiles/input.txt' USING TextLoader() AS
(line:chararray);
B = FOREACH A GENERATE PigToMongo(line);
DUMP B;
Each call to PigToMongo(line) connects to MongoDB, maps the line to a
document, writes it, and closes the connection.
PigToMongo creates a new connection for each line as follows (which ends
up bringing our MongoDB down):
MongoClient mongoClient = new MongoClient( "localhost" , 27017 );
DB db = mongoClient.getDB( "hadoopDB" );
DBCollection coll = db.getCollection("output0");
I wonder whether it is possible to open and close the connection only once,
outside the UDF.
- By the way, does MongoDB support multiple connections at the same
time? (from several reducers storing data during a map/reduce job, for
example)
Thank you,
*CÉSAR PUMAR GARCÍA*
*BEEVA FOR GRADUATES*
*cesar.pu...@beeva.com <cesar.pu...@beeva.com>
<http://www.beeva.com>*