[
https://issues.apache.org/jira/browse/PIG-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-67:
---
Attachment: FileLocalizer.java
get JobConf from PigMapReduce class so that reducers can operate on files as
well.
[
https://issues.apache.org/jira/browse/PIG-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-67:
---
Attachment: (was: FileLocalizer.java)
FileLocalizer doesn't work on reduce side
[
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732766#action_12732766
]
Ying He commented on PIG-792:
-
For MPCompiler, the job parallelism is reset to deal with situation
[
https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-200:
Attachment: perf.hadoop.patch
perf.hadoop.patch is used to support running DataGenerator in hadoop mode. It
should
[
https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738609#action_12738609
]
Ying He commented on PIG-200:
-
doc for DataGenerator in hadoop mode is here:
[
https://issues.apache.org/jira/browse/PIG-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-929:
Attachment: memusage.patch
change the default value of memusage for skewed join from 0.5 to 0.3.
Default value of
Skewed join fails when pig.skewedjoin.reduce.memusage is not configured
---
Key: PIG-954
URL: https://issues.apache.org/jira/browse/PIG-954
Project: Pig
Issue Type:
[
https://issues.apache.org/jira/browse/PIG-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753888#action_12753888
]
Ying He commented on PIG-954:
-
the sampling job fails when pig.skewedjoin.reduce.memusage is not
[
https://issues.apache.org/jira/browse/PIG-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-954:
Attachment: PIG-954.patch
use default value if pig.skewedjoin.reduce.memusage is not configured in pig
property file
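The fallback described in this patch note can be sketched roughly as follows. This is a minimal illustration, not the code from PIG-954.patch: the class and method names are hypothetical, and the default of 0.3 is an assumption taken from the PIG-929 entry above.

```java
import java.util.Properties;

// Hypothetical helper; not Pig's actual implementation.
class MemUsageConfig {
    static final String KEY = "pig.skewedjoin.reduce.memusage";

    // Assumed default, held in a final constant. 0.3 is the value
    // proposed in PIG-929; the real patch may use something else.
    static final String DEFAULT_MEMUSAGE = "0.3";

    // Returns the configured value, or the default when the key is absent.
    static float memUsage(Properties props) {
        return Float.parseFloat(props.getProperty(KEY, DEFAULT_MEMUSAGE));
    }

    public static void main(String[] args) {
        Properties unconfigured = new Properties();
        System.out.println(memUsage(unconfigured)); // falls back to the default

        Properties configured = new Properties();
        configured.setProperty(KEY, "0.5");
        System.out.println(memUsage(configured)); // uses the configured value
    }
}
```

Keeping the default in one final constant means the lookup code and any tests refer to the same value.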
[
https://issues.apache.org/jira/browse/PIG-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-954:
Attachment: PIG-954.patch
use final variable to define the default value of pig.skewedjoin.reduce.memusage
Skewed
Skewed join generates incorrect results
-
Key: PIG-955
URL: https://issues.apache.org/jira/browse/PIG-955
Project: Pig
Issue Type: Improvement
Reporter: Ying He
Fragmented replicated
[
https://issues.apache.org/jira/browse/PIG-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-955:
Attachment: PIG-955.patch
use tuple type to lookup skewed key map
Skewed join generates incorrect results
[
https://issues.apache.org/jira/browse/PIG-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754319#action_12754319
]
Ying He commented on PIG-955:
-
the sampling process generated a file which contains skewed keys
[
https://issues.apache.org/jira/browse/PIG-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-954:
Attachment: PIG-954.patch2
add JUnit test
Skewed join fails when pig.skewedjoin.reduce.memusage is not configured
[
https://issues.apache.org/jira/browse/PIG-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-955:
Description: SkewedPartitioner doesn't partition the skewed keys in
partition table (first table) correctly. This can
[
https://issues.apache.org/jira/browse/PIG-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-955:
Description: SkewedPartitioner doesn't partition the skewed keys in partition table
correctly. This can cause data loss. (was:
[
https://issues.apache.org/jira/browse/PIG-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-954:
Description: query fails if pig.skewedjoin.reduce.memusage is not
configured. (was: Fragmented replicated join has
[
https://issues.apache.org/jira/browse/PIG-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-929:
Description: default value pig.skewedjoin.reduce.memusage , which is used
in skewed join, should be set to 0.3 (was:
[
https://issues.apache.org/jira/browse/PIG-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-955:
Attachment: PIG-955.patch2
add Junit test
Skewed join generates incorrect results
[
https://issues.apache.org/jira/browse/PIG-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-961:
Attachment: PIG-961.patch
patch for pig to work with hadoop 21 with new API
Integration with Hadoop 21
[
https://issues.apache.org/jira/browse/PIG-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-961:
Attachment: hadoop21.jar
hadoop jar file used by pig
Integration with Hadoop 21
--
[
https://issues.apache.org/jira/browse/PIG-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755748#action_12755748
]
Ying He commented on PIG-961:
-
there are a few problems while porting pig to hadoop 21 new API.
Need a databag that does not register with SpillableMemoryManager and spill
data pro-actively
-
Key: PIG-975
URL: https://issues.apache.org/jira/browse/PIG-975
[
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-975:
Description: POPackage uses DefaultDataBag during reduce process to hold
data. It is registered with
[
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759299#action_12759299
]
Ying He commented on PIG-975:
-
Answer to Olga's questions:
1. The synchronization can be removed.
[
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-975:
Attachment: PIG-975.patch3
remove synchronization
Need a databag that does not register with SpillableMemoryManager
[
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759681#action_12759681
]
Ying He commented on PIG-975:
-
I think this is too implementation specific to expose to end user.
[
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-975:
Attachment: PIG-975.patch4
Add switch to old bag. Setting property pig.cachedbag.type=default would
switch to old
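A property-driven switch of this kind might be read as in the sketch below. Only the property name pig.cachedbag.type and the value "default" come from the note above; the class and method names are hypothetical, and the exact matching rule (here a plain string comparison) is an assumption.

```java
import java.util.Properties;

// Hypothetical helper; not the code from PIG-975.patch4.
class BagTypeSwitch {
    static final String KEY = "pig.cachedbag.type";

    // True only when the property is explicitly set to "default",
    // which the patch note says selects the old bag.
    static boolean useOldBag(Properties props) {
        return "default".equals(props.getProperty(KEY));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(useOldBag(props)); // property unset

        props.setProperty(KEY, "default");
        System.out.println(useOldBag(props)); // property set to "default"
    }
}
```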
InternalCachedBag.java generates javac warning and findbug warning
--
Key: PIG-1000
URL: https://issues.apache.org/jira/browse/PIG-1000
Project: Pig
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/PIG-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-1000:
-
Attachment: PIG-1000.patch
fix javac warning and findbug warning
InternalCachedBag.java generates javac warning
[
https://issues.apache.org/jira/browse/PIG-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-1000:
-
Description: patch submitted by PIG-975 generates javac warning and findbug
warning (was: POPackage uses
explain and dump not working with two UDFs inside inner plan of foreach
---
Key: PIG-1030
URL: https://issues.apache.org/jira/browse/PIG-1030
Project: Pig
Issue Type: Bug
[
https://issues.apache.org/jira/browse/PIG-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-1030:
-
Description:
this script does not work
register /homes/yinghe/owl/string.jar;
a = load '/user/yinghe/a.txt' as
[
https://issues.apache.org/jira/browse/PIG-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-1037:
-
Attachment: PIG-1037.patch
first cut of patch for initial testing purpose. regression tests are not done
yet. It
[
https://issues.apache.org/jira/browse/PIG-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-1037:
-
Attachment: PIG-1037.patch2
fix javac and findbugs warnings
better memory layout and spill for sorted and
[
https://issues.apache.org/jira/browse/PIG-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770246#action_12770246
]
Ying He commented on PIG-1037:
--
Alan, thanks for the feedback.
For the calculation of average
[
https://issues.apache.org/jira/browse/PIG-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-1037:
-
Attachment: PIG-1037.patch3
fix the comments and remove synchronization
better memory layout and spill for
[
https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775054#action_12775054
]
Ying He commented on PIG-979:
-
Without patch from PIG-1038, this patch won't compile. So all tests
[
https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775184#action_12775184
]
Ying He commented on PIG-979:
-
Alan, thanks for the feedback.
1. A test case is already created
[
https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-979:
Attachment: PIG-979.patch
patch to address Alan's comments.
Acummulator Interface for UDFs
[
https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776760#action_12776760
]
Ying He commented on PIG-979:
-
performance tests don't show noticeable difference between trunk
[
https://issues.apache.org/jira/browse/PIG-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-1118:
-
Attachment: PIG_1118.patch
bug fix.
expression with aggregate functions returning null, with accumulate
[
https://issues.apache.org/jira/browse/PIG-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785043#action_12785043
]
Ying He commented on PIG-1118:
--
Olga, thanks for the review. A unit test is in the patch,
[
https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-480:
Attachment: PIG_480.patch
patch to use identity map.
An IdentityMapOptimizer is applied when a MR plan contains at
[
https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-480:
Status: Open (was: Patch Available)
this patch has a conflict with the new code that was just checked in, which results
[
https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-480:
Attachment: PIG_480.patch
fix the compilation error.
PERFORMANCE: Use identity mapper in a chain of M-R jobs
[
https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787060#action_12787060
]
Ying He commented on PIG-480:
-
The javac warnings are caused by the references to hadoop
skewed join partitioner returns negative partition index
-
Key: PIG-1135
URL: https://issues.apache.org/jira/browse/PIG-1135
Project: Pig
Issue Type: Improvement
Reporter:
[
https://issues.apache.org/jira/browse/PIG-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-1135:
-
Description: skewed join returns negative reducer index (was: Fragmented
replicated join has a few limitations:
[
https://issues.apache.org/jira/browse/PIG-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-1135:
-
Status: Patch Available (was: Open)
skewed join partitioner returns negative partition index
[
https://issues.apache.org/jira/browse/PIG-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-1135:
-
Attachment: PIG_1135.patch
if reducer index is greater than 128, the index of streaming table becomes
negative
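One common way an index turns negative past this threshold is narrowing to a signed byte, which holds only -128..127. The sketch below assumes that is the cause here; the entry does not say so, and the class and method names are hypothetical rather than taken from PIG_1135.patch.

```java
// Illustrative only; names are hypothetical, not Pig's.
class PartitionIndexDemo {
    // Assumption: the reducer index is narrowed to a signed byte when stored.
    static byte store(int reducerIndex) {
        return (byte) reducerIndex;
    }

    // Widening a byte back to int sign-extends it, so values
    // stored from 128 upward come back negative.
    static int naiveRead(byte stored) {
        return stored;
    }

    // Masking with 0xFF reads the byte as unsigned (0..255).
    static int unsignedRead(byte stored) {
        return stored & 0xFF;
    }

    public static void main(String[] args) {
        for (int index : new int[] {127, 128, 200}) {
            byte stored = store(index);
            System.out.println(index + " -> naive " + naiveRead(stored)
                    + ", masked " + unsignedRead(stored));
        }
    }
}
```

Masking only helps up to 255 reducers; beyond that a wider type for the index would be needed.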
[
https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-480:
Status: Open (was: Patch Available)
cancel this patch to add new patch to support combiner
PERFORMANCE: Use
[
https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-480:
Attachment: PIG_480.patch
add support for combiner
PERFORMANCE: Use identity mapper in a chain of M-R jobs
[
https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797412#action_12797412
]
Ying He commented on PIG-480:
-
I did more performance tests. It shows the performance is related
[
https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-480:
Status: Patch Available (was: Open)
PERFORMANCE: Use identity mapper in a chain of M-R jobs
[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-1178:
-
Attachment: lp.patch
initial drop for new logical plan framework
LogicalPlan and Optimizer are too complex and
[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798273#action_12798273
]
Ying He commented on PIG-1178:
--
yes, the operator plan that Rule.match returned has the same
[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798880#action_12798880
]
Ying He commented on PIG-1178:
--
the Rule.match() finds a potential match and delegates to
[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-1178:
-
Attachment: PIG_1178.patch
add test cases to TestExperimentalRule, and fix findbugs problems
LogicalPlan and
[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-1178:
-
Status: Patch Available (was: Open)
LogicalPlan and Optimizer are too complex and hard to work with
[
https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799370#action_12799370
]
Ying He commented on PIG-480:
-
I did some tests with larger data set, and the results are
[
https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799376#action_12799376
]
Ying He commented on PIG-480:
-
the option to turn it off is already there. Use
[
https://issues.apache.org/jira/browse/PIG-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-1185:
-
Attachment: PIG_1185.patch
close files after spill file is read out.
Data bags do not close spill files after
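The general idea of closing a spill file once its contents have been read can be sketched with an iterator that releases its reader when it reaches the end of the input. This is an illustration under assumptions, not the change in PIG_1185.patch: the class is hypothetical and reads text lines rather than Pig tuples.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical iterator; stands in for a bag iterator over a spill file.
class ClosingLineIterator implements Iterator<String> {
    private final BufferedReader reader;
    private String next;
    private boolean closed;

    ClosingLineIterator(BufferedReader reader) {
        this.reader = reader;
        advance();
    }

    // Reads one record ahead; closes the reader as soon as input is exhausted.
    private void advance() {
        try {
            next = reader.readLine();
            if (next == null) {
                reader.close();
                closed = true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String current = next;
        advance();
        return current;
    }

    boolean isClosed() {
        return closed;
    }

    public static void main(String[] args) {
        ClosingLineIterator it =
                new ClosingLineIterator(new BufferedReader(new StringReader("a\nb")));
        while (it.hasNext()) {
            System.out.println(it.next());
        }
        System.out.println("closed: " + it.isClosed());
    }
}
```

A caller that abandons the iterator before the end would still leave the reader open, so a real implementation would likely need an explicit close path as well.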
[
https://issues.apache.org/jira/browse/PIG-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799502#action_12799502
]
Ying He commented on PIG-1185:
--
this patch doesn't contain any junit test, because I can't
[
https://issues.apache.org/jira/browse/PIG-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-1185:
-
Status: Patch Available (was: Open)
Data bags do not close spill files after using iterator to read tuples
[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799859#action_12799859
]
Ying He commented on PIG-1178:
--
a couple questions on expression operators:
1. in
[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800289#action_12800289
]
Ying He commented on PIG-1178:
--
+1
LogicalPlan and Optimizer are too complex and hard to work
[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800353#action_12800353
]
Ying He commented on PIG-1178:
--
to answer Daniel's questions:
. In Rule.match, is
[
https://issues.apache.org/jira/browse/PIG-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802439#action_12802439
]
Ying He commented on PIG-1195:
--
+1
POSort should take care of sort order
[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-1178:
-
Attachment: lp.patch
patch to add relational operator, optimization rules and logical plan migration
visitor
[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-1178:
-
Status: Patch Available (was: Open)
LogicalPlan and Optimizer are too complex and hard to work with
[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803856#action_12803856
]
Ying He commented on PIG-1178:
--
Alan, thanks for the review.
for 6), the predecessor of LOFilter
explain plan throws out exception
--
Key: PIG-1202
URL: https://issues.apache.org/jira/browse/PIG-1202
Project: Pig
Issue Type: Bug
Reporter: Ying He
run the following script
a = load
cast ends up with NULL value
Key: PIG-1222
URL: https://issues.apache.org/jira/browse/PIG-1222
Project: Pig
Issue Type: Bug
Reporter: Ying He
I want to generate data with bags, so I did this,
take
It's better for POPartitionRearrange to use List instead of DataBag to hold
duplicated tuples for partitioned keys
--
Key: PIG-1225
URL:
[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832164#action_12832164
]
Ying He commented on PIG-1178:
--
for the annotation resetting, I think it can be implemented as a
[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832313#action_12832313
]
Ying He commented on PIG-1178:
--
Here are my thoughts to use this framework to implement
Accumulator is turned on when a map is used with a non-accumulative UDF
---
Key: PIG-1241
URL: https://issues.apache.org/jira/browse/PIG-1241
Project: Pig
Issue Type: Bug
[
https://issues.apache.org/jira/browse/PIG-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-1241:
-
Attachment: accum.patch
patch to check UDF when it's with map operation
Accumulator is turned on when a map is
[
https://issues.apache.org/jira/browse/PIG-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ying He updated PIG-1241:
-
Status: Patch Available (was: Open)
Accumulator is turned on when a map is used with a non-accumulative UDF
[
https://issues.apache.org/jira/browse/PIG-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834955#action_12834955
]
Ying He commented on PIG-1241:
--
no, by default it is on.
boolean isAccum =
84 matches