Re: Need help getting started.

2014-01-10 Thread Mariano Kamp
Hi Josh. Ok, got it. Interesting. Downloaded ant, recompiled and now it works. Thank you. On Fri, Jan 10, 2014 at 10:16 PM, Josh Elser wrote: > Mariano, > > Pig 0.12.0 does work with Hadoop-2.2.0, but you need to recompile Pig > first. > > In your $PIG_HOME, run the following command to rebu

Re: Need help getting started.

2014-01-10 Thread Josh Elser
Mariano, Pig 0.12.0 does work with Hadoop-2.2.0, but you need to recompile Pig first. In your $PIG_HOME, run the following command to rebuild Pig: `ant clean jar-withouthadoop -Dhadoopversion=23` Then, try re-running your script. On 1/9/14, 5:13 PM, Mariano Kamp wrote: Hi, I am trying to ru

Re: Spilling issue - Optimize "GROUP BY"

2014-01-10 Thread Mehmet Tepedelenlioglu
If it is indeed a balancing issue, you could load to counter 1 and 2, filter, group/count, and join. That way you assure that the filtering is done after the mappers, and then the combiner kicks in for the counts, and the join is done on unique keys you grouped on already. Downside is 2 MR steps
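A minimal Pig Latin sketch of the approach described above (the relation names, schema, and filter conditions are hypothetical, not from the thread — only the load-twice / filter / group-count / join shape is Mehmet's):

```
-- Load the same input once per counter (hypothetical schema)
raw1 = LOAD 'input' AS (key:chararray, val:int);
raw2 = LOAD 'input' AS (key:chararray, val:int);

-- Filter after the load, so filtering happens past the map side
f1 = FILTER raw1 BY val > 0;    -- hypothetical condition for counter 1
f2 = FILTER raw2 BY val <= 0;   -- hypothetical condition for counter 2

-- COUNT is algebraic, so the combiner kicks in for these counts
g1 = GROUP f1 BY key;
c1 = FOREACH g1 GENERATE group AS key, COUNT(f1) AS cnt1;
g2 = GROUP f2 BY key;
c2 = FOREACH g2 GENERATE group AS key, COUNT(f2) AS cnt2;

-- Join on keys that are already unique after the GROUP
j = JOIN c1 BY key, c2 BY key;
```

As Mehmet notes, the downside is that this costs two MR steps instead of one.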

RE: Spilling issue - Optimize "GROUP BY"

2014-01-10 Thread Zebeljan, Nebojsa
Yes, you're right. It spills for over 600 sec. (10 min.) and then it fails. I don't want to increase the timeout, and therefore I wonder whether there is a way to optimize the Pig script or to add some arguments to tune the performance ... From: Pradeep Gollakota [p

Re: Spilling issue - Optimize "GROUP BY"

2014-01-10 Thread Pradeep Gollakota
Did you mean to say "timeout" instead of "spill"? Spills don't cause task failures (unless a spill fails). Default timeout for a task is 10 min. It would be very helpful to have a stack trace to look at, at the very least. On Fri, Jan 10, 2014 at 7:53 AM, Zebeljan, Nebojsa < nebojsa.zebel...@adte
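For reference, the 10-minute default Pradeep mentions is the Hadoop task timeout, controlled by `mapred.task.timeout` (in milliseconds; `mapreduce.task.timeout` on newer Hadoop releases). A hedged example of raising it from inside a Pig script — shown only for completeness, since Nebojsa says he'd rather not increase it:

```
-- Raise the task timeout to 30 minutes (value is in milliseconds)
SET mapred.task.timeout 1800000;
```

A task that reports no progress (no input read, no output written, no status update) within this window is killed and reattempted, which is why a long-spilling reducer can hit it.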

trying to understand local mode and core-site.xml

2014-01-10 Thread Peter Sanford
Hello everybody! I'm getting started with pig and I'm trying to understand how to configure io.compression.codecs for local mode. I've got something working but I'm not entirely clear on why it works or if there is a better way to do this. For some background, I'm trying to read data into pig fro
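Since the rest of the message is truncated, here is one common way to get a Hadoop property like `io.compression.codecs` into local-mode Pig (an assumption, not necessarily what Peter ended up doing): Pig merges Java system properties into its job configuration, so the codec list can be passed on the command line instead of via a `core-site.xml` on the classpath:

```
# Pass the codec list as a property to local-mode Pig (sketch; codec class
# names are examples — LZO, for instance, needs its own jars installed)
pig -x local \
  -Dio.compression.codecs=org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec \
  script.pig
```

The same property can also live in `conf/pig.properties`, which keeps it out of every invocation.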

RE: Spilling issue - Optimize "GROUP BY"

2014-01-10 Thread Zebeljan, Nebojsa
Hi Serega, Default task attempts = 4 --> Yes, 4 task attempts Do you use any "balancing" properties, for example pig.exec.reducers.bytes.per.reducer --> No I suppose you have unbalanced data --> I guess so It's better to provide logs --> Unfortunately not possible any more "May be cleaned up by

Re: Spilling issue - Optimize "GROUP BY"

2014-01-10 Thread Serega Sheypak
"and after trying it on several datanodes in the end it fails" Default task attempts = 4? 1. It's better to provide logs 2. Do you use any "balancing" properties, for example pig.exec.reducers.bytes.per.reducer ? I suppose you have unbalanced data 2014/1/10 Zebeljan, Nebojsa > Hi, > I'm encou

Re: How to read a file generated by Pig+BinStorage using the HDFS API ?

2014-01-10 Thread Vincent Barat
Thanks for your help. I succeeded in reading my data. Here is the code: Path path = new Path("/mydata"); BinStorageRecordReader recordReader = new BinStorageRecordReader(); FileStatus fileStatus = fileSystem.getFileStatus(path); recordReader.initialize(new FileSplit(path, 0, file
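Vincent's snippet is cut off by the archive. A hedged reconstruction of how the remainder might look, continuing his variable names (this assumes Pig's `BinStorageRecordReader` and the Hadoop `mapreduce` `FileSplit`/`TaskAttemptContext` APIs, needs the Pig and Hadoop jars on the classpath, and is a sketch rather than his verified code):

```
// Sketch only: continues Vincent's variables (path, recordReader,
// fileStatus, fileSystem); requires Pig + Hadoop 2.x on the classpath.
recordReader.initialize(
    new FileSplit(path, 0, fileStatus.getLen(), null),
    new TaskAttemptContextImpl(fileSystem.getConf(), new TaskAttemptID()));

// BinStorageRecordReader yields Pig Tuples as values
while (recordReader.nextKeyValue()) {
    Tuple tuple = (Tuple) recordReader.getCurrentValue();
    System.out.println(tuple);
}
recordReader.close();
```

The key point from the thread is simply that a BinStorage file can be read outside a Pig job by driving the record reader by hand over a `FileSplit` covering the whole file.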

Spilling issue - Optimize "GROUP BY"

2014-01-10 Thread Zebeljan, Nebojsa
Hi, I'm encountering spilling issues with a "simple" Pig script. All map tasks and reducers succeed pretty fast except the last reducer! The last reducer always starts spilling after ~10 mins, and after trying it on several datanodes in the end it fails. Do you have any idea how I could optimize

Pig 0.11.1 Unable to check name problem

2014-01-10 Thread zhenyings...@hugedata.com.cn
Hi, Can anyone help me with this problem? http://stackoverflow.com/questions/19330860/apache-pig-error-6007-unable-to-check-name-message I still get the same error. I have also tried setting the following to true: pigServer.getPigContext().getProperties().setProperty("fs.hdfs.impl.disable.cac