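STRSPLIT returns a single tuple rather than a bag, so applying FLATTEN to its result would spread the pieces across the columns of one row. To get one row per split, you need the pieces in a bag first, and then apply FLATTEN to that bag.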
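
Something along these lines might work. It is an untested sketch, and it assumes a Pig version that ships the STRSPLITTOBAG builtin (as far as I know, 0.14 or later, so newer than what was current when this thread was written). The aliases `bodies` and `entries` are names I made up for illustration, and the regex assumes every INSERT line ends in `);` with no trailing whitespace.

```pig
-- Load the raw mysqldump output, one line per record
mysqldump = LOAD '/Users/tim/Desktop/rollover/dump.txt' USING TextLoader() AS (line:chararray);

-- Keep only the INSERT statements
insertLines = FILTER mysqldump BY (line matches 'INSERT INTO.*');

-- Strip the leading "INSERT INTO ... VALUES (" and the trailing ");"
-- so that only  'a','b'),('c','d'),('e','f'  remains
bodies = FOREACH insertLines GENERATE
    REGEX_EXTRACT(line, '^INSERT INTO .*? VALUES \\((.*)\\);$', 1) AS body;

-- STRSPLITTOBAG (assumed available) returns a bag of one-field tuples;
-- FLATTEN on a bag emits one output row per tuple in it
entries = FOREACH bodies GENERATE FLATTEN(STRSPLITTOBAG(body, '\\),\\(', 0)) AS entry;

DUMP entries;
```

If your Pig version does not have STRSPLITTOBAG, a small UDF that returns a bag would be needed instead. The builtin TOKENIZE does return a bag, but it treats each character of its delimiter argument as a separate delimiter, so it would also split on the commas inside each pair.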
Thanks

On Wed, Dec 18, 2013 at 12:20 AM, Tim Robertson <[email protected]> wrote:

> Hi all,
>
> I am new to Pig, and struggle to split up a long text line into multiple
> lines.
> I have an input format from a legacy mysqldump like:
>
> LOCK TABLES `t` WRITE;
> /*!40000 ALTER TABLE `t` DISABLE KEYS */;
> INSERT INTO `t` VALUES ('a','b'),('c','d'),('e','f');
> /*!40000 ALTER TABLE `t` ENABLE KEYS */;
> UNLOCK TABLES;
> /*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;
>
> and I am trying to turn that into something like:
>
> 'a','b'
> 'c','d'
> 'e','f'
>
> So far I have come up with the following:
>
> -- Load in the raw data that is the actual mysqldump output
> mysqldump = LOAD '/Users/tim/Desktop/rollover/dump.txt' USING TextLoader as
> (line:chararray);
>
> -- Find only those lines starting with the insert statement we care about
> insertLines = FILTER mysqldump BY (line matches 'INSERT INTO.*');
>
> -- split them by the ),(
> splits = FOREACH insertLines GENERATE STRSPLIT(line,'\\),\\(');
>
> Can anyone please help me with the last bit so I can turn those into a line
> per split, instead of a tuple per split?
>
> Sorry that my terminology is probably wrong... it's my first day on Pig.
>
> Thanks,
> Tim
