date:20130610

Re: save several 64MB files in Pig Latin

2013-06-10 Thread Bertrand Dechoux

I wasn't clear. Specifying the size of the files is not your real aim, I guess. But you think that's what is needed in order to solve your problem that we don't know about. 500MB is not a really big file in itself and is not an issue for HDFS and MapReduce. There is no absolute way to know how

GROUP BY Issue

2013-06-10 Thread Gourav Sengupta

Hi, On running the following query I am getting multiple records with same value of F1 SELECT F1, COUNT(*) FROM ( SELECT F1, F2, COUNT(*) FROM TABLE1 GROUP BY F1, F2 ) a GROUP BY F1; As per what I understand there are multiple number of records based on number of reducers. Replicating the test

Re: GROUP BY Issue

2013-06-10 Thread Gourav Sengupta

Hi Shahab, It will be great if someone can delete this email from PIG group. I am aware of this mistake and had posted this issue to HIVE group almost immediately. Regards, Gourav On Mon, Jun 10, 2013 at 5:28 PM, Shahab Yunus shahab.yu...@gmail.com wrote: Gourav, this is not a HIVE mailing

Re: problems with .gz

2013-06-10 Thread Alan Crosswell

Ignore what I said and see https://forums.aws.amazon.com/thread.jspa?threadID=51232 bzip2 was documented somewhere as being splittable but this appears to not actually be implemented at least in AWS S3. /a On Mon, Jun 10, 2013 at 12:41 PM, Alan Crosswell a...@crosswell.us wrote: Suggest that

Re: problems with .gz

2013-06-10 Thread Niels Basjes

Bzip2 is only splittable in newer versions of hadoop. On Jun 10, 2013 10:28 PM, Alan Crosswell a...@crosswell.us wrote: Ignore what I said and see https://forums.aws.amazon.com/thread.jspa?threadID=51232 bzip2 was documented somewhere as being splittable but this appears to not actually be

Loading data from ranges of ordered subdirs

2013-06-10 Thread Rodrick Megraw

Let's say I have my input data from the past 12 months organized into subdirs by date: /data/2012-06-10 /data/2012-06-11 ... /data/2013-06-09 And now say that I want to run a Pig script to process data from a range of dates within the last 12 months, say 2012-11-07 through 2013-05-26. The

running pig from eclipse on hadoop cluster

2013-06-10 Thread Weiping Qu

Hi, I am currently running pig from eclipse on hadoop cluster. I added the hadoop conf location to the runtime configuration. But the mapreduce jobs failed as the built class files of pig cannot be called by hadoop. I added class file location to the classpath, but it did not work. Any hints?

Re: Loading data from ranges of ordered subdirs

2013-06-10 Thread Pradeep Gollakota

There are two possibilities that come to mind.
1. Write a custom LoadFunc in which you can handle these regular expressions. *Not the ideal solution*
2. Use HCatalog. The example they have in their documentation seems to fit your use case perfectly.
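Besides the two options above, one lighter-weight approach is to build the list of dated directories outside Pig and hand it to the script as a single comma-separated load path. The sketch below is not from the thread: it assumes the /data/YYYY-MM-DD layout described in the original question, that a directory exists for every day in the range, and that the LOAD statement accepts a comma-separated list of paths. The function names `dated_paths` and `load_spec` are hypothetical.

```python
from datetime import date, timedelta


def dated_paths(start, end, root="/data"):
    """Return one directory path per day from start to end, inclusive.

    Assumes directories are named root/YYYY-MM-DD, as in the question.
    """
    if end < start:
        raise ValueError("end date precedes start date")
    days = (end - start).days + 1
    return [
        f"{root}/{(start + timedelta(days=i)).isoformat()}"
        for i in range(days)
    ]


def load_spec(start, end, root="/data"):
    """Join the daily paths into one comma-separated string for LOAD."""
    return ",".join(dated_paths(start, end, root))


if __name__ == "__main__":
    # Date range taken from the example in the original question.
    print(load_spec(date(2012, 11, 7), date(2013, 5, 26)))
```

The printed string could then be passed to the script as a parameter (for example with `pig -param`) and referenced in the LOAD statement. A missing directory for any day in the range would likely make the load fail, so the list may need filtering against what actually exists in HDFS.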

Re: running pig from eclipse on hadoop cluster

2013-06-10 Thread Weiping Qu

Hi, Forget the question raised before. It's solved. Hi, I am currently running pig from eclipse on hadoop cluster. I added the hadoop conf location to the runtime configuration. But the mapreduce jobs failed as the built class files of pig cannot be called by hadoop. I added class file

RE: Loading data from ranges of ordered subdirs

2013-06-10 Thread Rodrick Megraw

Thank you for the suggestions. Writing a custom LoadFunc seems like a valid solution for me, given that I don't currently have Hive or HCatalog installed and I'm working on more of an ad-hoc problem at this point. HCatalog seems like a good solution for doing this type of thing on a repeated

10 matches
