: java -cp
/grid/0/gs/conf/current:/grid/0/jars/pig.jar
-Djava.library.path=/grid/0/gs/hadoop/current/lib/native/Linux-i386-32
-Dpig.tmpfilecompression=true -Dpig.tmpfilecompression.codec=lzo
org.apache.pig.Main ./test.pig
need to investigate the impact of compression on pig performance
is because the default compression is gzip
which is really slow and most of the time not what you want. Because of the
licensing issue with lzo, users need to set it up on their own. Once they have
done so, they can enable the compression.
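For reference, the options quoted in the command above could be wrapped in a small shell helper so that the codec becomes a parameter. This is only a sketch: pig_compress_cmd is a hypothetical name, the paths are the ones quoted in the message, and the function prints the command for review rather than running it.

```shell
#!/bin/sh
# Sketch only: pig_compress_cmd is a hypothetical helper, not part of Pig.
# The classpath and native library path are copied from the quoted message
# and will differ on other installations.

pig_compress_cmd() {
    codec="$1"     # value for pig.tmpfilecompression.codec (the thread uses lzo)
    script="$2"    # Pig script to run, e.g. ./test.pig

    # Print each argument followed by a space, then end the line.
    printf '%s ' \
        java -cp /grid/0/gs/conf/current:/grid/0/jars/pig.jar \
        -Djava.library.path=/grid/0/gs/hadoop/current/lib/native/Linux-i386-32 \
        -Dpig.tmpfilecompression=true \
        "-Dpig.tmpfilecompression.codec=${codec}" \
        org.apache.pig.Main "${script}"
    printf '\n'
}

# Show the command that would be run with lzo compression enabled.
pig_compress_cmd lzo ./test.pig
```

Leaving out the two -Dpig.tmpfilecompression options gives the uncompressed behavior described in the thread as the default.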
need to investigate the impact of compression on pig
-Dpig.tmpfilecompression=true -Dpig.tmpfilecompression.codec=lzo
org.apache.pig.Main ./test.pig
need to investigate the impact of compression on pig performance
Key: PIG-1501
URL: https
Patch committed to trunk. Thanks Yan!
need to investigate the impact of compression on pig performance
Key: PIG-1501
URL: https://issues.apache.org/jira/browse/PIG-1501
Project: Pig
the impact of compression on pig performance
Key: PIG-1501
URL: https://issues.apache.org/jira/browse/PIG-1501
Project: Pig
Issue Type: Test
Reporter: Olga
on pig performance
Key: PIG-1501
URL: https://issues.apache.org/jira/browse/PIG-1501
Project: Pig
Issue Type: Test
Reporter: Olga Natkovich
Assignee: Yan
Thanks for the quick turnaround, Thejas.
Yan
-Original Message-
From: Thejas M Nair (JIRA) [mailto:j...@apache.org]
Sent: Wednesday, August 25, 2010 8:54 AM
To: pig-dev@hadoop.apache.org
Subject: [jira] Commented: (PIG-1501) need to investigate the impact of
compression on pig performance
of compression on pig performance
Key: PIG-1501
URL: https://issues.apache.org/jira/browse/PIG-1501
Project: Pig
Issue Type: Test
Reporter: Olga Natkovich
to investigate the impact of compression on pig performance
Key: PIG-1501
URL: https://issues.apache.org/jira/browse/PIG-1501
Project: Pig
Issue Type: Test
Reporter
html files, SampleOptimizer.html and
org.apache.pig.impl.util.Utils.html.
need to investigate the impact of compression on pig performance
Key: PIG-1501
URL: https://issues.apache.org/jira/browse
wondering if the additional
unused features of TFile (index, metadata) result in any overhead compared to
SequenceFile.
need to investigate the impact of compression on pig performance
Key: PIG-1501
comparison. It
appears that for compressed data, TFile performs better than SeqFile.
need to investigate the impact of compression on pig performance
Key: PIG-1501
URL: https://issues.apache.org/jira
with going with lzo/TFile. As the lzo libs are GPL, we cannot ship with
that as default. It wasn't clear to me from your last comment which you were
proposing as the default.
need to investigate the impact of compression on pig performance
has it, at least in
my test cluster.
need to investigate the impact of compression on pig performance
Key: PIG-1501
URL: https://issues.apache.org/jira/browse/PIG-1501
Project: Pig
grids) but you cannot
ship lzo with Hadoop or Pig.
need to investigate the impact of compression on pig performance
Key: PIG-1501
URL: https://issues.apache.org/jira/browse/PIG-1501
[
https://issues.apache.org/jira/browse/PIG-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yan Zhou updated PIG-1501:
--
Attachment: PIG-1501.patch
need to investigate the impact of compression on pig performance
is in line with
the general observation that gzip compresses better but performs worse.
need to investigate the impact of compression on pig performance
Key: PIG-1501
URL: https://issues.apache.org
go with LZO
compression on TFile, with compression disabled by default, which keeps
the old behavior.
need to investigate the impact of compression on pig performance
Key: PIG-1501
URL
) RCFile as above has not shown a clear advantage in columnar
compression ratio. But this observation could be data-sensitive.
need to investigate the impact of compression on pig performance
Key: PIG-1501
in sequence files. For now we can continue with the same
serialization used in BinStorage, though in the future we may want to change
this as well.
need to investigate the impact of compression on pig performance
Key: PIG-1501
URL: https://issues.apache.org/jira/browse/PIG-1501
Project: Pig
Issue Type: Test
will use pig.jar and pigperf.jar. Scripts are in test/utils/pigmix/scripts.
To generate data, use generate_data.sh. To run PigMix2, use runpigmix-adhoc.pl.
Pig Performance Benchmarks
--
Key: PIG-200
URL: https://issues.apache.org/jira/browse
pig 0.6 release? What error
message did you see?
Pig Performance Benchmarks
--
Key: PIG-200
URL: https://issues.apache.org/jira/browse/PIG-200
Project: Pig
Issue Type: Task
Reporter: Amir Youssefi
to
generate input data for Pigmix is:
1. apply perf-0.6.patch on pig 0.6 release
2. ant jar compile-test
3. export PIG_HOME=.
4. test/utils/pigmix/datagen/generate_data.sh
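The four steps above can be sketched as a single script. This is a hedged outline, not a tested procedure: it assumes it is run from the root of a Pig 0.6 source tree, that perf-0.6.patch sits in that directory and applies with -p0, and the DRY_RUN variable, run wrapper, and pigmix_datagen function are hypothetical names added for illustration. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the data-generation steps listed above.
# Assumptions (not from the original message): the patch file is in the
# current directory and applies with -p0; DRY_RUN, run and pigmix_datagen
# are hypothetical helpers.

DRY_RUN="${DRY_RUN:-1}"    # set DRY_RUN=0 to actually execute the steps

run() {
    # Echo the command; execute it only when not in dry-run mode.
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@" || exit 1
    fi
}

pigmix_datagen() {
    # 1. Apply perf-0.6.patch on the Pig 0.6 release.
    run patch -p0 -i perf-0.6.patch
    # 2. Build the jar and the test classes.
    run ant jar compile-test
    # 3. Point PIG_HOME at the source tree root.
    PIG_HOME=.
    export PIG_HOME
    echo "+ export PIG_HOME=."
    # 4. Generate the PigMix input data.
    run test/utils/pigmix/datagen/generate_data.sh
}

pigmix_datagen
```

Running it once in dry-run mode makes it easy to confirm the paths before any patch is applied.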
Pig Performance Benchmarks
--
Key: PIG-200
URL: https
things in the perf.patch.
I want to generate the data set and use those 14 pig queries for benchmarking.
Would you mind telling me more about how to use the perf.patch?
Thanks
Duncan
Pig Performance Benchmarks
--
Key: PIG-200
URL: https
[
https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alan Gates reassigned PIG-200:
--
Assignee: Alan Gates
Pig Performance Benchmarks
--
Key: PIG
be installed on top of perf.patch.
The design doc is here.
http://twiki.corp.yahoo.com/view/Tiger/DataGeneratorHadoop
Pig Performance Benchmarks
--
Key: PIG-200
URL: https://issues.apache.org/jira/browse/PIG-200
Project: Pig
://twiki.corp.yahoo.com/view/Tiger/DataGeneratorHadoop
Pig Performance Benchmarks
--
Key: PIG-200
URL: https://issues.apache.org/jira/browse/PIG-200
Project: Pig
Issue Type: Task
Reporter: Amir
://wiki.apache.org/pig/DataGeneratorHadoop
Pig Performance Benchmarks
--
Key: PIG-200
URL: https://issues.apache.org/jira/browse/PIG-200
Project: Pig
Issue Type: Task
Reporter: Amir Youssefi
Attachments
the SIGMOD 2009 paper.
https://issues.apache.org/jira/browse/HIVE-396
We also spent a lot of time writing pig programs for those queries, and we
have some preliminary results.
Will somebody from the pig team take a look and help improve the pig queries?
Pig Performance Benchmarks
[
https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Olga Natkovich resolved PIG-200.
Resolution: Fixed
PigMix is our set of benchmarks going forward.
Pig Performance Benchmarks
: Saturday, December 20, 2008 10:33 AM
To: pig-dev@hadoop.apache.org
Cc: pig-dev@hadoop.apache.org
Subject: Re: Pig performance
I think the key points that Alan brought up in his blog
comment were that trunk pig is paradoxically not the most
current and that storing intermediate results can
I left a comment on the blog addressing some of the issues he brought
up.
Alan.
On Dec 20, 2008, at 1:00 AM, Jeff Hammerbacher wrote:
Hey Pig team,
Did anyone check out the recent claims about Pig's poor performance
versus
Cascading? Though I haven't worked extensively with either