allow pig to write output into a JDBC db
Key: PIG-1229
URL: https://issues.apache.org/jira/browse/PIG-1229
Project: Pig
Issue Type: New Feature
Components: impl
Reporter: Ian
[
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ian Holsman updated PIG-1229:
-
Attachment: DbStorage.java
allow pig to write output into a JDBC db
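The contents of DbStorage.java are not shown here. As a rough illustration of the general approach, writing rows to a database through JDBC usually means preparing one parameterized INSERT and sending rows in batches inside a transaction. The sketch below uses a hypothetical helper class, JdbcBatchWriter, that takes plain Object[] rows instead of Pig Tuples. It is not the API of the attached DbStorage.java.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not the DbStorage.java API): batched JDBC inserts. */
class JdbcBatchWriter {
    private final String insertSql;
    private final int batchSize;

    JdbcBatchWriter(String table, List<String> columns, int batchSize) {
        if (columns.isEmpty()) throw new IllegalArgumentException("no columns");
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        this.insertSql = buildInsertSql(table, columns);
        this.batchSize = batchSize;
    }

    /** Builds "INSERT INTO t (a, b) VALUES (?, ?)". Table and column names are assumed trusted. */
    static String buildInsertSql(String table, List<String> columns) {
        String cols = String.join(", ", columns);
        String params = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + params + ")";
    }

    String insertSql() { return insertSql; }

    /** Inserts all rows in one transaction, sending a batch every batchSize rows. */
    void write(Connection conn, List<Object[]> rows) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(insertSql)) {
            int pending = 0;
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameter indexes are 1-based
                }
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) ps.executeBatch(); // flush the last partial batch
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

In a Pig store function, each task would presumably open its own connection and write the tuples it receives. Because Hadoop may re-run failed tasks or launch speculative attempts, direct inserts like these can produce duplicate rows unless the job or the target table guards against that.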
Hey Jian,
Hive supports arbitrary procedural languages through Hadoop Streaming; see
http://wiki.apache.org/hadoop/Hive/LanguageManual/Transform for more.
Also, both Hive and Pig have support for handling skewed joins if you use
their higher-level interface. See
Hi Jeff,
Thank you Jeff.
I know Hive handles skewed joins, but I think that is not enough:
1. It needs a sampling step, which has a cost.
2. It can't control the size of a task.
3. It is not exact.
4. You must use Hive or Pig.
I think this is a fundamental solution to the skew problem, because it adds a
balance step between map and reduce. Maybe I need
We can regard a task as a sleep call whose parameter is how long the task runs:
sleep(N) - for Hive, N is not certain
sleep(M) - for MBR, M is certain
2010/2/8 jian yi eyj...@gmail.com
Jian,
If what you are looking for is something that will let you deal with
skewed data and forget about how the underlying distributed system
works, both Pig and Hive will help you do that to some extent. If you
are looking for something that will let you exercise fine-grained
control over
Also, I looked at the idea you posted, and it seems to me that your
balance step is in effect the sampling step that Pig's skewed-data solution
implements, except that your balance step needs 100% of the data.
Consider how your balancing works when there are 1000 map tasks, each of
which produces outputs that
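To make the sampling-versus-full-data trade-off concrete, here is a generic sketch (not Pig's skewed-join code) that estimates per-key frequencies from a Bernoulli sample with probability p, scales the sampled counts by 1/p, and flags keys whose estimated count exceeds a chosen share of the total. The class name SkewEstimator and the share threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch: exact versus sample-based key frequency estimates. */
class SkewEstimator {
    /** Exact counts: every record must be seen and counted. */
    static Map<String, Long> exactCounts(List<String> keys) {
        Map<String, Long> counts = new HashMap<>();
        for (String k : keys) counts.merge(k, 1L, Long::sum);
        return counts;
    }

    /** Keeps each record with probability p, then scales the sampled counts by 1/p. */
    static Map<String, Long> sampledCounts(List<String> keys, double p, long seed) {
        if (p <= 0 || p > 1) throw new IllegalArgumentException("p must be in (0, 1]");
        Random rnd = new Random(seed);
        Map<String, Long> sampled = new HashMap<>();
        for (String k : keys) {
            if (rnd.nextDouble() < p) sampled.merge(k, 1L, Long::sum);
        }
        Map<String, Long> estimate = new HashMap<>();
        sampled.forEach((k, c) -> estimate.put(k, Math.round(c / p)));
        return estimate;
    }

    /** Keys whose (estimated) count is above share * total. */
    static List<String> heavyKeys(Map<String, Long> counts, double share) {
        long total = counts.values().stream().mapToLong(Long::longValue).sum();
        List<String> heavy = new ArrayList<>();
        counts.forEach((k, c) -> {
            if (c > share * total) heavy.add(k);
        });
        return heavy;
    }
}
```

With p = 1.0 the estimate equals the exact count. Smaller values of p keep only a fraction of the records for counting, at the price of estimates that can miss or misjudge keys near the threshold.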
[
https://issues.apache.org/jira/browse/PIG-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831006#action_12831006
]
Hadoop QA commented on PIG-1227:
-1 overall. Here are the results of testing the latest
[
https://issues.apache.org/jira/browse/PIG-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831008#action_12831008
]
Hadoop QA commented on PIG-1227:
-1 overall. Here are the results of testing the latest
[
https://issues.apache.org/jira/browse/PIG-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pradeep Kamath resolved PIG-1228.
-
Resolution: Invalid
Seems like a JIRA created by accident.
\
-
Key: PIG-1228
[
https://issues.apache.org/jira/browse/PIG-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831048#action_12831048
]
Olga Natkovich commented on PIG-1224:
-
+1, please commit.
Collected group should change
[
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831052#action_12831052
]
Aaron Kimball commented on PIG-1229:
Ian,
This class looks reasonable to me. You'll
[
https://issues.apache.org/jira/browse/PIG-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ashutosh Chauhan updated PIG-1224:
--
Resolution: Fixed
Status: Resolved (was: Patch Available)
Patch checked-in.
Collected
DataBagIterator.hasNext() should be idempotent
--
Key: PIG-1231
URL: https://issues.apache.org/jira/browse/PIG-1231
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions:
[
https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xuefu Zhang updated PIG-1140:
-
Status: Open (was: Patch Available)
[zebra] Use of Hadoop 2.0 APIs
[
https://issues.apache.org/jira/browse/PIG-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831086#action_12831086
]
Yan Zhou commented on PIG-1227:
---
The patch is applicable to the Apache 0.6 branch only,
[
https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831090#action_12831090
]
Xuefu Zhang commented on PIG-1140:
--
New submission. It includes changes required for PIG
[
https://issues.apache.org/jira/browse/PIG-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ashutosh Chauhan updated PIG-1230:
--
Status: Patch Available (was: Open)
Streaming input in POJoinPackage should use nonspillable
[
https://issues.apache.org/jira/browse/PIG-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chao Wang updated PIG-1227:
---
Patch looks good. +1.
[zebra] Missing column group meta file should not be allowed at query time
[
https://issues.apache.org/jira/browse/PIG-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831154#action_12831154
]
Yan Zhou commented on PIG-1227:
---
Patch committed to the 0.6 branch.
[zebra] Missing column
[
https://issues.apache.org/jira/browse/PIG-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yan Zhou updated PIG-1227:
--
Resolution: Fixed
Status: Resolved (was: Patch Available)
[zebra] Missing column group meta file
[
https://issues.apache.org/jira/browse/PIG-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ashutosh Chauhan updated PIG-1215:
--
Attachment: pig-1215.patch
With this patch, job IDs will now be printed as:
2010-02-08
[
https://issues.apache.org/jira/browse/PIG-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ashutosh Chauhan updated PIG-1215:
--
Status: Patch Available (was: Open)
Make Hadoop jobId more prominent in the client log
[
https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831205#action_12831205
]
Alan Gates commented on PIG-259:
Sorry, I missed that it was already for load-store redesign.
[
https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831212#action_12831212
]
Dmitriy V. Ryaboy commented on PIG-259:
---
Doesn't the StoreFunc take care of resource
[
https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831222#action_12831222
]
Alan Gates commented on PIG-259:
If we make overwrite part of the language (as the JIRA
[
https://issues.apache.org/jira/browse/PIG-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831224#action_12831224
]
Hadoop QA commented on PIG-1230:
-1 overall. Here are the results of testing the latest
[
https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Dai updated PIG-1231:
Status: Patch Available (was: Open)
DataBagIterator.hasNext() should be idempotent
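The patch itself (PIG-1231-1.patch) is not reproduced here. A common way to make hasNext() idempotent is to let it prefetch one element and cache it until next() hands it out, so repeated hasNext() calls never consume input. The sketch below shows that lookahead pattern in a hypothetical LookaheadIterator base class. It assumes the underlying source signals end of input by returning null, so it cannot carry null elements.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch: hasNext() is idempotent because it buffers one element. */
abstract class LookaheadIterator<T> implements Iterator<T> {
    private T buffered;
    private boolean hasBuffered = false;
    private boolean exhausted = false;

    /** Reads the next element from the source, or returns null at end of input. */
    protected abstract T readNext();

    @Override
    public boolean hasNext() {
        // Touch the source only when nothing is buffered and input remains,
        // so calling hasNext() twice gives the same answer without reading twice.
        if (!hasBuffered && !exhausted) {
            T t = readNext();
            if (t == null) {
                exhausted = true;
            } else {
                buffered = t;
                hasBuffered = true;
            }
        }
        return hasBuffered;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        T t = buffered;
        buffered = null; // hand out the buffered element exactly once
        hasBuffered = false;
        return t;
    }
}
```

A subclass only implements readNext(), for example by reading the next tuple from memory or from a spill file, and the buffering logic stays in one place.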
[
https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Dai updated PIG-1231:
Attachment: PIG-1231-1.patch
DefaultDataBagIterator is the only DataBag iterator that has this problem. Other databag
[
https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831237#action_12831237
]
Dmitriy V. Ryaboy commented on PIG-259:
---
Yeah I think it makes more sense on that level.
[
https://issues.apache.org/jira/browse/PIG-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ashutosh Chauhan updated PIG-1230:
--
Attachment: pig-1230_1.patch
Fixed findbugs warnings. Result of test-patch:
{code}
[exec]
[
https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831248#action_12831248
]
Viraj Bhat commented on PIG-1131:
-
Olga, I marked it as critical since we mention that Pig can
[
https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831251#action_12831251
]
Viraj Bhat commented on PIG-1131:
-
Ashutosh, I was able to recreate a similar problem using
[
https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Dai updated PIG-1231:
Description:
DefaultDataBagIterator.hasNext() is not repeatable when the following conditions
are met:
1. There is
[
https://issues.apache.org/jira/browse/PIG-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831309#action_12831309
]
Hadoop QA commented on PIG-1215:
-1 overall. Here are the results of testing the latest
[
https://issues.apache.org/jira/browse/PIG-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831316#action_12831316
]
Hadoop QA commented on PIG-1215:
-1 overall. Here are the results of testing the latest
[
https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ashutosh Chauhan updated PIG-834:
-
Status: Patch Available (was: Open)
Trying to get Hudson going on this.
incorrect plan when
[
https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ashutosh Chauhan updated PIG-834:
-
Status: Open (was: Patch Available)
incorrect plan when algebraic functions are nested
[
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankur reassigned PIG-1229:
--
Assignee: Ankur
allow pig to write output into a JDBC db
[
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831337#action_12831337
]
Ankur commented on PIG-1229:
Aaron, thanks for the suggestions.
I'll have an updated patch coming