[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-31 Thread Yan Zhou (JIRA)
gzip if compression were turned on by default. Currently, the compression method has to be specified explicitly and has no default value. This asks the user to weigh the pros and cons of either compression method. > need to investigate the impact of compression on pig perf

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-31 Thread Olga Natkovich (JIRA)
t is because the default compression is gzip, which is really slow and most of the time not what you want. Because of the licensing issue with lzo, users need to set it up on their own. Once they do the setup, they can enable the compression. > need to investigate the impact of compression

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-31 Thread Ashutosh Chauhan (JIRA)
n is there any specific reason to default pig.tmpfilecompression to false? This seems to be a useful feature, so it should be true by default, no? > need to investigate the impact of compression on pig performance > > >

[jira] Updated: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-31 Thread Yan Zhou (JIRA)
ueryterm' as (query_term); C = join B1 by query_term, B by query_term using 'skewed' parallel 300; D = distinct C parallel 300; store D into 'output.lzo'; which is launched as follows: java -cp /grid/0/gs/conf/current:/grid/0/jars/pig.jar -Djava.library.path=/grid/0/gs/hadoop

[jira] Updated: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-31 Thread Yan Zhou (JIRA)
rallel 300; store D into 'output.lzo'; which is launched as follows: java -cp /grid/0/gs/conf/current:/grid/0/jars/pig.jar -Djava.library.path=/grid/0/gs/hadoop/current/lib/native/Linux-i386-32 -Dpig.tmpfilecompression=true -Dpig.tmpfilecom

[jira] Updated: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-26 Thread Thejas M Nair (JIRA)
Patch committed to trunk. Thanks Yan! > need to investigate the impact of compression on pig performance > > > Key: PIG-1501 > URL: https://issues.apache.org/jira/browse/PIG-1501 >

[jira] Updated: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-26 Thread Yan Zhou (JIRA)
/grid/0/gs/hadoop/current/lib/native/Linux-i386-32 -Dpig.tmpfilecompression=true -Dpig.tmpfilecompression.codec=lzo org.apache.pig.Main ./test.pig > need to investigate the impact of compression on pig performance > >
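The launch command quoted across the 2010-08-26 and 2010-08-31 updates can be pieced together as a small shell sketch. The classpath and native-library paths are the cluster-specific ones from the messages and would need adjusting elsewhere; `build_pig_cmd` is a hypothetical helper introduced here for illustration, not part of Pig. The sketch only prints the command so it can be inspected before running.

```shell
#!/bin/sh
# Sketch only: assembles the launch command quoted in the PIG-1501 updates.
# Both paths are specific to the cluster used in the thread (assumption:
# your installation will differ).
PIG_CLASSPATH="/grid/0/gs/conf/current:/grid/0/jars/pig.jar"
NATIVE_LIB_DIR="/grid/0/gs/hadoop/current/lib/native/Linux-i386-32"

# build_pig_cmd SCRIPT [CODEC]   (hypothetical helper, not a Pig tool)
# Prints a java command line that turns on temp-file compression via the
# two properties named in the thread.
build_pig_cmd() {
    script="$1"
    codec="${2:-lzo}"   # the thread's example uses lzo
    printf '%s\n' "java -cp $PIG_CLASSPATH -Djava.library.path=$NATIVE_LIB_DIR -Dpig.tmpfilecompression=true -Dpig.tmpfilecompression.codec=$codec org.apache.pig.Main $script"
}

build_pig_cmd ./test.pig
```

Since compression is off unless `pig.tmpfilecompression` is set, omitting both `-D` properties should give the old, uncompressed behavior described in the thread.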

RE: [jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-25 Thread Yan Zhou
Thanks for the quick turnaround, Thejas. Yan -Original Message- From: Thejas M Nair (JIRA) [mailto:j...@apache.org] Sent: Wednesday, August 25, 2010 8:54 AM To: pig-dev@hadoop.apache.org Subject: [jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-25 Thread Thejas M Nair (JIRA)
ression on pig performance > > > Key: PIG-1501 > URL: https://issues.apache.org/jira/browse/PIG-1501 > Project: Pig > Issue Type: Test >

[jira] Updated: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-25 Thread Yan Zhou (JIRA)
to investigate the impact of compression on pig performance > > > Key: PIG-1501 > URL: https://issues.apache.org/jira/browse/PIG-1501 > Project: Pig > Issue Type: Test >

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-24 Thread Thejas M Nair (JIRA)
impact of compression on pig performance > > > Key: PIG-1501 > URL: https://issues.apache.org/jira/browse/PIG-1501 > Project: Pig > Issue Type: Test >

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-20 Thread Yan Zhou (JIRA)
arnings are on two html files, SampleOptimizer.html and org.apache.pig.impl.util.Utils.html. > need to investigate the impact of compression on pig performance > > > Key: PIG-1501 > URL: h

[jira] Updated: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-20 Thread Yan Zhou (JIRA)
eed to investigate the impact of compression on pig performance > > > Key: PIG-1501 > URL: https://issues.apache.org/jira/browse/PIG-1501 > Project: Pig >

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-11 Thread Yan Zhou (JIRA)
e vs TFile comparison. It appears for compressed data, TFile performs better than SeqFile. > need to investigate the impact of compression on pig performance > > > Key: PIG-1501 > URL: https:

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-11 Thread Thejas M Nair (JIRA)
I am wondering if the additional unused features of TFile (index, metadata) result in any overhead compared to SequenceFile. > need to investigate the impact of compression on pig performance > > >

[jira] Updated: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-10 Thread Yan Zhou (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1501: -- Attachment: PIG-1501.patch > need to investigate the impact of compression on pig performa

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-10 Thread Alan Gates (JIRA)
its grids) but you cannot ship lzo with Hadoop or Pig. > need to investigate the impact of compression on pig performance > > > Key: PIG-1501 > URL: https://issues.apache.org/j

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-10 Thread Yan Zhou (JIRA)
rs that Hadoop installation has it, at least in my test cluster. > need to investigate the impact of compression on pig performance > > > Key: PIG-1501 > URL: https://issues.apache.org/jira/browse/P

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-10 Thread Alan Gates (JIRA)
I'm +1 with going with lzo/TFile. As the lzo libs are GPL we cannot ship with that as default. I wasn't clear from your last comment which you were proposing as the default. > need to investigate the impact of compress

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-09 Thread Yan Zhou (JIRA)
I'll go with LZO compression on TFile, with the default option being to disable compression, which is the old behavior. > need to investigate the impact of compression on pig performance > > >

[jira] Updated: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-09 Thread Yan Zhou (JIRA)
in line with the general observation that gzip compresses better but performs worse. > need to investigate the impact of compression on pig performance > > > Key: PIG-1501 >

[jira] Updated: (PIG-1501) need to investigate the impact of compression on pig performance

2010-07-29 Thread Yan Zhou (JIRA)
test results as an attachment. > need to investigate the impact of compression on pig performance > > > Key: PIG-1501 > URL: https://issues.apache.org/jira/browse/PIG-1501 >

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-07-29 Thread Yan Zhou (JIRA)
negligible within background noise or within a few percent of the overall run times. But this is not conclusive yet. Larger and more real-life queries would be more suitable for the comparison; 5) RCFile

[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-07-15 Thread Alan Gates (JIRA)
mpression in sequence files. For now we can continue with the same serialization used in BinStorage, though in the future we may want to change this as well. > need to investigate the impact of compression on pig

[jira] Created: (PIG-1501) need to investigate the impact of compression on pig performance

2010-07-13 Thread Olga Natkovich (JIRA)
need to investigate the impact of compression on pig performance Key: PIG-1501 URL: https://issues.apache.org/jira/browse/PIG-1501 Project: Pig Issue Type: Test

[jira] Updated: (PIG-200) Pig Performance Benchmarks

2010-07-06 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-200: --- Attachment: (was: pigmix2.patch) > Pig Performance Benchma

[jira] Updated: (PIG-200) Pig Performance Benchmarks

2010-07-06 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-200: --- Attachment: pigmix2.patch > Pig Performance Benchmarks > -- > >

[jira] Updated: (PIG-200) Pig Performance Benchmarks

2010-06-15 Thread Daniel Dai (JIRA)
will use pig.jar, pigperf.jar. Scripts are in test/utils/pigmix/scripts. To generate data, use generate_data.sh. To run PigMix2, use runpigmix-adhoc.pl. > Pig Performance Benchmarks > -- > > Key: PIG-200 > URL: https://issues.

[jira] Commented: (PIG-200) Pig Performance Benchmarks

2010-03-30 Thread Daniel Dai (JIRA)
you using pig 0.6 release? What error message did you see? > Pig Performance Benchmarks > -- > > Key: PIG-200 > URL: https://issues.apache.org/jira/browse/PIG-200 > Project: Pig > Issue Type: Ta

[jira] Commented: (PIG-200) Pig Performance Benchmarks

2010-03-28 Thread duncan (JIRA)
"ant jar compile-test". What do I need to install before I execute this command? Thanks Duncan > Pig Performance Benchmarks > -- > > Key: PIG-200 > URL: https://issues.apache.org/jira/browse/PIG-200 > Pr

[jira] Commented: (PIG-200) Pig Performance Benchmarks

2010-03-18 Thread duncan (JIRA)
[ https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846782#action_12846782 ] duncan commented on PIG-200: Thank you very much, Daniel. > Pig Performance Ben

[jira] Updated: (PIG-200) Pig Performance Benchmarks

2010-03-15 Thread Daniel Dai (JIRA)
to generate input data for Pigmix is: 1. apply perf-0.6.patch on pig 0.6 release 2. ant jar compile-test 3. export PIG_HOME=. 4. test/utils/pigmix/datagen/generate_data.sh > Pig Performance Benchmarks > -- > > Key: PIG-200 >
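The four steps listed in the message above could be scripted roughly as follows. This is a sketch under stated assumptions: it is run from the root of an unpacked Pig 0.6 source tree, `perf-0.6.patch` has been downloaded into that directory, and the patch applies with `-p0` (the strip level is a guess, not stated in the thread). The `run` and `generate_pigmix_data` helpers are hypothetical names introduced here. By default the script only prints what it would do; set `DRY_RUN=0` to execute.

```shell
#!/bin/sh
# Sketch of the PigMix data-generation steps from the 2010-03-15 message.
# DRY_RUN=1 (default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

# run CMD [ARGS...]: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

generate_pigmix_data() {
    # 1. apply perf-0.6.patch on the pig 0.6 release (-p0 is an assumption)
    run patch -p0 -i perf-0.6.patch || return 1
    # 2. build pig and the test classes
    run ant jar compile-test || return 1
    # 3. point PIG_HOME at the current source tree
    PIG_HOME=.
    export PIG_HOME
    # 4. generate the input data
    run test/utils/pigmix/datagen/generate_data.sh
}

generate_pigmix_data
```

An earlier message in the thread also mentions downloading sdsuLibJKD12.jar and putting it in your lib directory; that step is left out here because its exact destination is not given.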

[jira] Commented: (PIG-200) Pig Performance Benchmarks

2010-03-15 Thread Daniel Dai (JIRA)
http://www.eli.sdsu.edu/java-SDSU/sdsuLibJKD12.jar, and put in your lib > Pig Performance Benchmarks > -- > > Key: PIG-200 > URL: https://issues.apache.org/jira/browse/PIG-200 > Project: Pig > Issue Type: Task

[jira] Commented: (PIG-200) Pig Performance Benchmarks

2010-03-14 Thread duncan (JIRA)
rent things in the perf.patch. I want to generate a data set and use those 14 pig queries for benchmarking. Would you mind telling me more about how to use the perf.patch? Thanks Duncan > Pig Performance Benchmarks > -- > > Key: PIG-200 >

[jira] Commented: (PIG-200) Pig Performance Benchmarks

2010-03-13 Thread Daniel Dai (JIRA)
rate the input file for the queries. > Pig Performance Benchmarks > -- > > Key: PIG-200 > URL: https://issues.apache.org/jira/browse/PIG-200 > Project: Pig > Issue Type: Task >

[jira] Commented: (PIG-200) Pig Performance Benchmarks

2010-03-13 Thread duncan (JIRA)
h in order to run those 14 queries? > Pig Performance Benchmarks > -- > > Key: PIG-200 > URL: https://issues.apache.org/jira/browse/PIG-200 > Project: Pig > Issue Type: Task >

[jira] Assigned: (PIG-200) Pig Performance Benchmarks

2009-11-11 Thread Alan Gates (JIRA)
[ https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-200: -- Assignee: Alan Gates > Pig Performance Benchmarks > -- > >

[jira] Commented: (PIG-200) Pig Performance Benchmarks

2009-08-03 Thread Ying He (JIRA)
http://wiki.apache.org/pig/DataGeneratorHadoop > Pig Performance Benchmarks > -- > > Key: PIG-200 > URL: https://issues.apache.org/jira/browse/PIG-200 > Project: Pig > Issue Type: Task >

[jira] Issue Comment Edited: (PIG-200) Pig Performance Benchmarks

2009-08-03 Thread Olga Natkovich (JIRA)
here. http://twiki.corp.yahoo.com/view/Tiger/DataGeneratorHadoop > Pig Performance Benchmarks > -- > > Key: PIG-200 > URL: https://issues.apache.org/jira/browse/PIG-200 > Project: Pig > Issue Type: T

[jira] Updated: (PIG-200) Pig Performance Benchmarks

2009-08-03 Thread Ying He (JIRA)
be installed on top of perf.patch. The design doc is here. http://twiki.corp.yahoo.com/view/Tiger/DataGeneratorHadoop > Pig Performance Benchmarks > -- > > Key: PIG-200 > URL: https://issues.apache.org/jira

[jira] Commented: (PIG-200) Pig Performance Benchmarks

2009-06-19 Thread Zheng Shao (JIRA)
the SIGMOD 2009 paper. https://issues.apache.org/jira/browse/HIVE-396 We also spent a lot of time in writing pig programs for those queries, and we have some preliminary results. Will somebody from the pig team take a look and help improve the pig queries? > Pig Performance Ben

Re: Pig Performance Benchmarks

2009-02-17 Thread Alan Gates
That's correct. The 10m in the names wasn't really meant to be hardcoded into the patch, as the idea is that the tables could be created at different sizes depending on your cluster size. Sorry for the incomplete state of things; obviously that patch needs some work before I commit it.

Pig Performance Benchmarks

2009-02-13 Thread Ashutosh Chauhan
Hi Alan & Others, I am using pigmix patch at: https://issues.apache.org/jira/browse/PIG-200 and want to generate test data and run pigmix queries on it. As I understand, shell scripts in the patch are intended to generate data for pigmix queries. I have been able to adapt the shell scripts, map-re

[jira] Resolved: (PIG-200) Pig Performance Benchmarks

2009-01-26 Thread Olga Natkovich (JIRA)
[ https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-200. Resolution: Fixed PigMix is our set of benchmarks going forward. > Pig Performance Benchma

Re: Pig performance

2008-12-31 Thread Alan Gates
computations across multiple stores. Olga -Original Message- From: Ted Dunning [mailto:ted.dunn...@gmail.com] Sent: Saturday, December 20, 2008 10:33 AM To: pig-dev@hadoop.apache.org Cc: pig-dev@hadoop.apache.org Subject: Re: Pig performance I think the key points that Alan brought up

Re: Pig performance

2008-12-30 Thread Kevin Weil
combine computations across > multiple stores. > > Olga > > > -Original Message- > > From: Ted Dunning [mailto:ted.dunn...@gmail.com] > > Sent: Saturday, December 20, 2008 10:33 AM > > To: pig-dev@hadoop.apache.org > > Cc: pig-dev@hadoop.apache.

RE: Pig performance

2008-12-22 Thread Olga Natkovich
> Sent: Saturday, December 20, 2008 10:33 AM > To: pig-dev@hadoop.apache.org > Cc: pig-dev@hadoop.apache.org > Subject: Re: Pig performance > > > I think the key points that Alan brought up in his blog > comment were that trunk pig is paradoxically not the most > curr

Re: Pig performance

2008-12-20 Thread Ted Dunning
I think the key points that Alan brought up in his blog comment were that trunk pig is paradoxically not the most current and that storing intermediate results can decrease the scope of optimizations. On Dec 20, 2008, at 10:16, Alan Gates wrote: I left a comment on the blog addressing som

Re: Pig performance

2008-12-20 Thread Alan Gates
I left a comment on the blog addressing some of the issues he brought up. Alan. On Dec 20, 2008, at 1:00 AM, Jeff Hammerbacher wrote: Hey Pig team, Did anyone check out the recent claims about Pig's poor performance versus Cascading? Though I haven't worked extensively with either system,

Pig performance

2008-12-20 Thread Jeff Hammerbacher
Hey Pig team, Did anyone check out the recent claims about Pig's poor performance versus Cascading? Though I haven't worked extensively with either system, I found the statements made fairly bold and am curious to hear more about their validity from the Pig development team: http://www.manamplifie

[jira] Updated: (PIG-200) Pig Performance Benchmarks

2008-12-04 Thread Alan Gates (JIRA)
benchmarks for pig. It contains a set of 14 queries which are designed to try to cover a range of ways users use pig. It also includes implementations of the same queries in java code for map reduce, so that developers can compare pig performance against map reduce performance. See http