Which version of Pig are you using? We've fixed many multi-query issues since the feature was released. Please try the queries again with the latest version and let us know of any problems.

Thanks,
-Richard
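For reference, multi-query means a script with several STORE statements is planned as one unit, so a shared input is read only once. A minimal sketch of the idea (paths, schema, and URL patterns below are hypothetical, just to show the shape):

    -- hypothetical input and schema, for illustration only
    logs     = LOAD '/logs/raw' AS (session:chararray, url:chararray, ts:long);
    searches = FILTER logs BY url MATCHES '.*search.*';
    views    = FILTER logs BY url MATCHES '.*product.*';
    STORE searches INTO '/out/search_logs';
    STORE views    INTO '/out/product_view_logs';
    -- with multi-query enabled (the default in recent releases), both
    -- STOREs execute in a single map-only job instead of two passes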
-----Original Message-----
From: Mridul Muralidharan [mailto:[email protected]]
Sent: Thursday, December 24, 2009 10:45 AM
To: [email protected]
Subject: Re: Some newbie questions

That is not exactly the same as what I proposed - not just from a
performance/implementation point of view, but probably also from a
code-reuse point of view. But then, I am not very familiar with the
multi-query work, because of the stability issues I face when using it -
so probably it is comparable! Of course, to each his own :-)

Regards,
Mridul

Richard Ding wrote:
> Pig supports queries that have multiple outputs (multi-query support).
>
> Thanks,
> -Richard
>
> -----Original Message-----
> From: Mridul Muralidharan [mailto:[email protected]]
> Sent: Thursday, December 24, 2009 10:10 AM
> To: [email protected]
> Subject: Re: Some newbie questions
>
> IIRC, Pig does not support multiple output collectors, does it?
> And the lack of a common schema in this case (each type has its own
> schema) is worrying.
>
> Regards,
> Mridul
>
> Richard Ding wrote:
>> Actually, you don't need to use Hadoop to create this map-only job;
>> Pig will do it for you.
>>
>> Thanks,
>> -Richard
>>
>> -----Original Message-----
>> From: Mridul Muralidharan [mailto:[email protected]]
>> Sent: Thursday, December 24, 2009 7:54 AM
>> To: [email protected]
>> Subject: Re: Some newbie questions
>>
>> If this is a one-time operation in your pipeline and you are OK with
>> splitting it, you might want to consider using Hadoop directly and
>> splitting based on a multiple-output collector.
>>
>> It could be a map-only job with a line record reader or similar, and a
>> map function that does the split as you were doing in the existing DB
>> code, writing to the appropriate output collector based on the type.
>>
>> All further analysis can then be done through Pig - which works on a
>> more type-specific, schema-aware form (assuming each type has a fixed
>> schema, while the initial jumble of types does not have a uniform
>> schema).
>>
>> Not sure if it is practical, since I have not used this for map-only
>> jobs ...
>>
>> Regards,
>> Mridul
>>
>> Gökhan Çapan wrote:
>>> Hi, this was probably discussed on this list before, but I couldn't
>>> find it.
>>> We are implementing log-analysis tools for some web sites that have
>>> high traffic. From now on, we want to use Pig to implement such
>>> analysis tools.
>>>
>>> We have millions of logs of a web site in a session-URL-time format.
>>> These are not just search logs or just product views; they consist of
>>> different types of actions. For example, if a URL contains a specific
>>> pattern, we call it a search log, etc.
>>>
>>> Until now, I was using a factory method to instantiate the appropriate
>>> URLHandler and, after extracting some information from the URL, storing
>>> that information in the appropriate database table. For example, if the
>>> program decides a URL is a search log, it extracts session, query, and
>>> time, corrects typos, determines an implicit rating, and stores these
>>> in the Search table (a relational database table). If the program
>>> decides a URL is a product-view log, it extracts session, member_id,
>>> product_id, time, product title, and a rating for the product, and
>>> stores them in the Product_View table. After the storing is finished,
>>> it extracts, for example, popular queries for search assistance.
>>>
>>> If I want to do all of this with Pig:
>>> - Should I partition the global log file into separate files
>>> (search_logs and product_view_logs in separate files)? Or
>>> - Can some Pig commands load the data and treat each tuple according
>>> to its type (e.g., "this is a search log, so it should have
>>> session-query-time-implicit rating"), so that I can avoid partitioning
>>> the data for each type of log?
>>>
>>> I have just downloaded Pig, and it seems able to do such tasks. I
>>> would appreciate it if anyone could show me a starting point for such
>>> an application and share some ideas.
>>> Thank you.
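The second option is what Pig's SPLIT operator covers: it routes each tuple to a branch by a condition in a single pass, so the raw file never has to be pre-partitioned. A minimal Pig Latin sketch, assuming a tab-delimited session/URL/timestamp input and made-up URL patterns (extracting the query string from the URL would need a regex UDF, e.g. from piggybank, and is left as a comment here):

    raw = LOAD '/logs/site' AS (session:chararray, url:chararray, ts:long);

    -- route each tuple by URL pattern in one pass
    SPLIT raw INTO
        search_raw IF url MATCHES '.*[?&]q=.*',
        view_raw   IF url MATCHES '.*/product/.*';
    -- tuples matching neither pattern are dropped

    -- each branch gets its own type-specific schema from here on;
    -- a regex UDF would pull the query string out of the URL
    searches = FOREACH search_raw GENERATE session, url, ts;
    STORE searches INTO '/out/search_logs';
    STORE view_raw INTO '/out/product_view_logs';

    -- downstream analysis, e.g. popular queries (grouping on url here
    -- as a stand-in for the extracted query)
    grouped = GROUP searches BY url;
    counts  = FOREACH grouped GENERATE group AS query, COUNT(searches) AS n;
    ordered = ORDER counts BY n DESC;
    top100  = LIMIT ordered 100;
    STORE top100 INTO '/out/popular_queries';

With multi-query execution, all of these STOREs run as one plan over a single read of the input, which is essentially the map-only split job discussed earlier in the thread.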
