[jira] Updated: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-31 Thread Yan Zhou (JIRA)
: java -cp /grid/0/gs/conf/current:/grid/0/jars/pig.jar -Djava.library.path=/grid/0/gs/hadoop/current/lib/native/Linux-i386-32 -Dpig.tmpfilecompression=true -Dpig.tmpfilecompression.codec=lzo org.apache.pig.Main ./test.pig need to investigate the impact of compression on pig performance

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-31 Thread Olga Natkovich (JIRA)
is because the default compression is gzip, which is really slow and most of the time not what you want. Because of the licensing issue with lzo, users need to set it up on their own. Once they do the setup, they can enable the compression. need to investigate the impact of compression on pig

[jira] Updated: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-26 Thread Yan Zhou (JIRA)
-Dpig.tmpfilecompression=true -Dpig.tmpfilecompression.codec=lzo org.apache.pig.Main ./test.pig need to investigate the impact of compression on pig performance Key: PIG-1501 URL: https

[jira] Updated: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-26 Thread Thejas M Nair (JIRA)
Patch committed to trunk. Thanks Yan! need to investigate the impact of compression on pig performance Key: PIG-1501 URL: https://issues.apache.org/jira/browse/PIG-1501 Project: Pig

[jira] Updated: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-25 Thread Yan Zhou (JIRA)
the impact of compression on pig performance Key: PIG-1501 URL: https://issues.apache.org/jira/browse/PIG-1501 Project: Pig Issue Type: Test Reporter: Olga

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-25 Thread Thejas M Nair (JIRA)
on pig performance Key: PIG-1501 URL: https://issues.apache.org/jira/browse/PIG-1501 Project: Pig Issue Type: Test Reporter: Olga Natkovich Assignee: Yan

RE: [jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-25 Thread Yan Zhou
Thanks for the quick turnaround, Thejas. Yan -Original Message- From: Thejas M Nair (JIRA) [mailto:j...@apache.org] Sent: Wednesday, August 25, 2010 8:54 AM To: pig-dev@hadoop.apache.org Subject: [jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-24 Thread Thejas M Nair (JIRA)
of compression on pig performance Key: PIG-1501 URL: https://issues.apache.org/jira/browse/PIG-1501 Project: Pig Issue Type: Test Reporter: Olga Natkovich

[jira] Updated: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-20 Thread Yan Zhou (JIRA)
to investigate the impact of compression on pig performance Key: PIG-1501 URL: https://issues.apache.org/jira/browse/PIG-1501 Project: Pig Issue Type: Test Reporter

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-20 Thread Yan Zhou (JIRA)
html files, SampleOptimizer.html and org.apache.pig.impl.util.Utils.html. need to investigate the impact of compression on pig performance Key: PIG-1501 URL: https://issues.apache.org/jira/browse

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-11 Thread Thejas M Nair (JIRA)
wondering if the additional unused features of TFile (index, metadata) result in any overhead compared to SequenceFile. need to investigate the impact of compression on pig performance Key: PIG-1501

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-11 Thread Yan Zhou (JIRA)
comparison. It appears for compressed data, TFile performs better than SeqFile. need to investigate the impact of compression on pig performance Key: PIG-1501 URL: https://issues.apache.org/jira

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-10 Thread Alan Gates (JIRA)
with going with lzo/TFile. As the lzo libs are GPL we cannot ship with that as default. I wasn't clear from your last comment which you were proposing as the default. need to investigate the impact of compression on pig performance

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-10 Thread Yan Zhou (JIRA)
has it, at least in my test cluster. need to investigate the impact of compression on pig performance Key: PIG-1501 URL: https://issues.apache.org/jira/browse/PIG-1501 Project: Pig

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-10 Thread Alan Gates (JIRA)
grids) but you cannot ship lzo with Hadoop or Pig. need to investigate the impact of compression on pig performance Key: PIG-1501 URL: https://issues.apache.org/jira/browse/PIG-1501

[jira] Updated: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-10 Thread Yan Zhou (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1501: -- Attachment: PIG-1501.patch need to investigate the impact of compression on pig performance

[jira] Updated: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-09 Thread Yan Zhou (JIRA)
is in line with the general observation that gzip compresses better but performs worse. need to investigate the impact of compression on pig performance Key: PIG-1501 URL: https://issues.apache.org

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-09 Thread Yan Zhou (JIRA)
go with LZO compression on TFile with the default option to disable compression, which will be the old behavior. need to investigate the impact of compression on pig performance Key: PIG-1501 URL

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-07-29 Thread Yan Zhou (JIRA)
) RCFile as above has not shown a clear advantage in terms of better columnar compression ratio. But this observation could be data-sensitive. need to investigate the impact of compression on pig performance Key: PIG-1501

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-07-15 Thread Alan Gates (JIRA)
in sequence files. For now we can continue with the same serialization used in BinStorage, though in the future we may want to change this as well. need to investigate the impact of compression on pig performance

[jira] Created: (PIG-1501) need to investigate the impact of compression on pig performance

2010-07-13 Thread Olga Natkovich (JIRA)
need to investigate the impact of compression on pig performance Key: PIG-1501 URL: https://issues.apache.org/jira/browse/PIG-1501 Project: Pig Issue Type: Test

[jira] Updated: (PIG-200) Pig Performance Benchmarks

2010-06-16 Thread Daniel Dai (JIRA)
will use pig.jar, pigperf.jar. Scripts are in test/utils/pigmix/scripts. To generate data, use generate_data.sh. To run PigMix2, use runpigmix-adhoc.pl. Pig Performance Benchmarks -- Key: PIG-200 URL: https://issues.apache.org/jira/browse

[jira] Commented: (PIG-200) Pig Performance Benchmarks

2010-03-30 Thread Daniel Dai (JIRA)
pig 0.6 release? What error message did you see? Pig Performance Benchmarks -- Key: PIG-200 URL: https://issues.apache.org/jira/browse/PIG-200 Project: Pig Issue Type: Task Reporter: Amir Youssefi

[jira] Updated: (PIG-200) Pig Performance Benchmarks

2010-03-15 Thread Daniel Dai (JIRA)
to generate input data for Pigmix is: 1. apply perf-0.6.patch on pig 0.6 release 2. ant jar compile-test 3. export PIG_HOME=. 4. test/utils/pigmix/datagen/generate_data.sh Pig Performance Benchmarks -- Key: PIG-200 URL: https

[jira] Commented: (PIG-200) Pig Performance Benchmarks

2010-03-14 Thread duncan (JIRA)
things in the perf.patch. I want to generate data set and use those 14 pig queries for benchmarking. Would you mind telling me more on how to use the perf.patch? Thanks Duncan Pig Performance Benchmarks -- Key: PIG-200 URL: https

[jira] Assigned: (PIG-200) Pig Performance Benchmarks

2009-11-11 Thread Alan Gates (JIRA)
[ https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-200: -- Assignee: Alan Gates Pig Performance Benchmarks -- Key: PIG

[jira] Updated: (PIG-200) Pig Performance Benchmarks

2009-08-03 Thread Ying He (JIRA)
be installed on top of perf.patch. The design doc is here. http://twiki.corp.yahoo.com/view/Tiger/DataGeneratorHadoop Pig Performance Benchmarks -- Key: PIG-200 URL: https://issues.apache.org/jira/browse/PIG-200 Project: Pig

[jira] Issue Comment Edited: (PIG-200) Pig Performance Benchmarks

2009-08-03 Thread Olga Natkovich (JIRA)
://twiki.corp.yahoo.com/view/Tiger/DataGeneratorHadoop Pig Performance Benchmarks -- Key: PIG-200 URL: https://issues.apache.org/jira/browse/PIG-200 Project: Pig Issue Type: Task Reporter: Amir

[jira] Commented: (PIG-200) Pig Performance Benchmarks

2009-08-03 Thread Ying He (JIRA)
://wiki.apache.org/pig/DataGeneratorHadoop Pig Performance Benchmarks -- Key: PIG-200 URL: https://issues.apache.org/jira/browse/PIG-200 Project: Pig Issue Type: Task Reporter: Amir Youssefi Attachments

[jira] Commented: (PIG-200) Pig Performance Benchmarks

2009-06-19 Thread Zheng Shao (JIRA)
the SIGMOD 2009 paper. https://issues.apache.org/jira/browse/HIVE-396 We also spent a lot of time in writing pig programs for those queries, and we have some preliminary results. Will somebody from the pig team take a look and help improve the pig queries? Pig Performance Benchmarks

[jira] Resolved: (PIG-200) Pig Performance Benchmarks

2009-01-26 Thread Olga Natkovich (JIRA)
[ https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-200. Resolution: Fixed PigMix is our set of benchmarks going forward. Pig Performance Benchmarks

RE: Pig performance

2008-12-22 Thread Olga Natkovich
: Saturday, December 20, 2008 10:33 AM To: pig-dev@hadoop.apache.org Cc: pig-dev@hadoop.apache.org Subject: Re: Pig performance I think the key points that Alan brought up in his blog comment were that trunk pig is paradoxically not the most current and that storing intermediate results can

Re: Pig performance

2008-12-20 Thread Alan Gates
I left a comment on the blog addressing some of the issues he brought up. Alan. On Dec 20, 2008, at 1:00 AM, Jeff Hammerbacher wrote: Hey Pig team, Did anyone check out the recent claims about Pig's poor performance versus Cascading? Though I haven't worked extensively with either