[jira] Created: (PIG-1229) allow pig to write output into a JDBC db

2010-02-08 Thread Ian Holsman (JIRA)
allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian

[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

2010-02-08 Thread Ian Holsman (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Holsman updated PIG-1229: - Attachment: DbStorage.java allow pig to write output into a JDBC db

Re: Will Pig support SQL?

2010-02-08 Thread Jeff Hammerbacher
Hey Jian, Hive supports arbitrary procedural languages through Hadoop Streaming; see http://wiki.apache.org/hadoop/Hive/LanguageManual/Transform for more. Also, both Hive and Pig have support for handling skewed joins if you use their higher-level interface. See

Re: Will Pig support SQL?

2010-02-08 Thread jian yi
Hi Jeff, Thank you. I know Hive handles skewed joins, but I think that is not enough: 1. Needs the cost of sampling 2. Can't control the size of a task 3. Not exact 4. Must use Hive or Pig. I think adding a balance step between map and reduce is a fundamental solution to the skew problem. Maybe I need

Re: Will Pig support SQL?

2010-02-08 Thread jian yi
We can regard a task as a sleep call whose parameter is the task's duration. sleep(N) - For Hive, N is not certain. sleep(M) - For MBR, M is certain.

Re: Will Pig support SQL?

2010-02-08 Thread Dmitriy Ryaboy
Jian, If what you are looking for is something that will let you deal with skewed data and forget about how the underlying distributed system works, both Pig and Hive will help you do that to some extent. If you are looking for something that will let you exercise fine-grained control over

Re: Will Pig support SQL?

2010-02-08 Thread Dmitriy Ryaboy
Also, I looked at the idea you posted and it seems to me that your balance step is in effect the sample step that Pig's skewed data solution implements, except that your balance step needs 100% of the data. Consider how your balancing works when there are 1000 map tasks, each of which produces outputs that
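The contrast drawn above, a sampling step versus a step that reads 100% of the data, can be illustrated with a minimal sketch. This is not Pig's skewed-join implementation; `SkewSampler`, `estimateCounts`, `reducersFor`, `step`, and `maxPerReducer` are hypothetical names, and every-Nth-record sampling is an assumption chosen to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only: estimate key skew from a sample instead of a full scan. */
final class SkewSampler {

    /** Looks at every step-th key and scales each hit by step to estimate total counts. */
    static Map<String, Long> estimateCounts(List<String> keys, int step) {
        Map<String, Long> estimate = new HashMap<>();
        for (int i = 0; i < keys.size(); i += step) {
            estimate.merge(keys.get(i), (long) step, Long::sum);
        }
        return estimate;
    }

    /** Reducers needed so that no reducer receives more than maxPerReducer records of a key. */
    static int reducersFor(long estimatedCount, long maxPerReducer) {
        long needed = (estimatedCount + maxPerReducer - 1) / maxPerReducer; // ceiling division
        return (int) Math.max(1L, needed);
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 900; i++) keys.add("hot");   // heavily skewed key
        for (int i = 0; i < 100; i++) keys.add("rare");

        // Only 1 record in 10 is inspected, yet the hot key is still identified.
        Map<String, Long> estimate = estimateCounts(keys, 10);
        for (Map.Entry<String, Long> e : estimate.entrySet()) {
            System.out.println(e.getKey() + " -> estimated " + e.getValue()
                + " records, " + reducersFor(e.getValue(), 250) + " reducer(s)");
        }
    }
}
```

The trade-off is the one raised in the thread: the sample gives only an estimate, while an exact balance step would have to see every map output before any reducer could start.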

[jira] Commented: (PIG-1227) [zebra] Missing column group meta file should not be allowed at query time

2010-02-08 Thread Hadoop QA (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831006#action_12831006 ] Hadoop QA commented on PIG-1227: -1 overall. Here are the results of testing the latest

[jira] Commented: (PIG-1227) [zebra] Missing column group meta file should not be allowed at query time

2010-02-08 Thread Hadoop QA (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831008#action_12831008 ] Hadoop QA commented on PIG-1227: -1 overall. Here are the results of testing the latest

[jira] Resolved: (PIG-1228) \

2010-02-08 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath resolved PIG-1228. - Resolution: Invalid Seems like a jira created by accident \ - Key: PIG-1228

[jira] Commented: (PIG-1224) Collected group should change to use new (internal) bag

2010-02-08 Thread Olga Natkovich (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831048#action_12831048 ] Olga Natkovich commented on PIG-1224: - +1; please commit. Collected group should change

[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-02-08 Thread Aaron Kimball (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831052#action_12831052 ] Aaron Kimball commented on PIG-1229: Ian, This class looks reasonable to me. You'll

[jira] Updated: (PIG-1224) Collected group should change to use new (internal) bag

2010-02-08 Thread Ashutosh Chauhan (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated PIG-1224: -- Resolution: Fixed Status: Resolved (was: Patch Available) Patch checked-in. Collected

[jira] Created: (PIG-1231) DataBagIterator.hasNext() should be idempotent

2010-02-08 Thread Daniel Dai (JIRA)
DataBagIterator.hasNext() should be idempotent -- Key: PIG-1231 URL: https://issues.apache.org/jira/browse/PIG-1231 Project: Pig Issue Type: Bug Components: impl Affects Versions:

[jira] Updated: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs

2010-02-08 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-1140: - Status: Open (was: Patch Available) [zebra] Use of Hadoop 2.0 APIs

[jira] Commented: (PIG-1227) [zebra] Missing column group meta file should not be allowed at query time

2010-02-08 Thread Yan Zhou (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831086#action_12831086 ] Yan Zhou commented on PIG-1227: --- The patch is applicable to the Apache 0.6 branch only,

[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs

2010-02-08 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831090#action_12831090 ] Xuefu Zhang commented on PIG-1140: -- New submission. It includes changes required for PIG

[jira] Updated: (PIG-1230) Streaming input in POJoinPackage should use nonspillable bag to collect tuples

2010-02-08 Thread Ashutosh Chauhan (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated PIG-1230: -- Status: Patch Available (was: Open) Streaming input in POJoinPackage should use nonspillable

[jira] Updated: (PIG-1227) [zebra] Missing column group meta file should not be allowed at query time

2010-02-08 Thread Chao Wang (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Wang updated PIG-1227: --- Patch looks good +1. [zebra] Missing column group meta file should not be allowed at query time

[jira] Commented: (PIG-1227) [zebra] Missing column group meta file should not be allowed at query time

2010-02-08 Thread Yan Zhou (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831154#action_12831154 ] Yan Zhou commented on PIG-1227: --- Patch committed to the 0.6 branch. [zebra] Missing column

[jira] Updated: (PIG-1227) [zebra] Missing column group meta file should not be allowed at query time

2010-02-08 Thread Yan Zhou (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1227: -- Resolution: Fixed Status: Resolved (was: Patch Available) [zebra] Missing column group meta file

[jira] Updated: (PIG-1215) Make Hadoop jobId more prominent in the client log

2010-02-08 Thread Ashutosh Chauhan (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated PIG-1215: -- Attachment: pig-1215.patch With this patch, Job ids will now be printed as: 2010-02-08

[jira] Updated: (PIG-1215) Make Hadoop jobId more prominent in the client log

2010-02-08 Thread Ashutosh Chauhan (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated PIG-1215: -- Status: Patch Available (was: Open) Make Hadoop jobId more prominent in the client log

[jira] Commented: (PIG-259) allow store to overwrite existing directroy

2010-02-08 Thread Alan Gates (JIRA)
[ https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831205#action_12831205 ] Alan Gates commented on PIG-259: Sorry, I missed that it was already for load-store redesign.

[jira] Commented: (PIG-259) allow store to overwrite existing directroy

2010-02-08 Thread Dmitriy V. Ryaboy (JIRA)
[ https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831212#action_12831212 ] Dmitriy V. Ryaboy commented on PIG-259: --- Doesn't the StoreFunc take care of resource

[jira] Commented: (PIG-259) allow store to overwrite existing directroy

2010-02-08 Thread Alan Gates (JIRA)
[ https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831222#action_12831222 ] Alan Gates commented on PIG-259: If we make overwrite part of the language (as the JIRA

[jira] Commented: (PIG-1230) Streaming input in POJoinPackage should use nonspillable bag to collect tuples

2010-02-08 Thread Hadoop QA (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831224#action_12831224 ] Hadoop QA commented on PIG-1230: -1 overall. Here are the results of testing the latest

[jira] Updated: (PIG-1231) DataBagIterator.hasNext() should be idempotent

2010-02-08 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1231: Status: Patch Available (was: Open) DataBagIterator.hasNext() should be idempotent

[jira] Updated: (PIG-1231) DataBagIterator.hasNext() should be idempotent

2010-02-08 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1231: Attachment: PIG-1231-1.patch DefaultDataBagIterator is the only DataBag that has this problem. Other databag

[jira] Commented: (PIG-259) allow store to overwrite existing directroy

2010-02-08 Thread Dmitriy V. Ryaboy (JIRA)
[ https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831237#action_12831237 ] Dmitriy V. Ryaboy commented on PIG-259: --- Yeah I think it makes more sense on that level.

[jira] Updated: (PIG-1230) Streaming input in POJoinPackage should use nonspillable bag to collect tuples

2010-02-08 Thread Ashutosh Chauhan (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated PIG-1230: -- Attachment: pig-1230_1.patch Fixed findbugs warnings. Result of test-patch: {code} [exec]

[jira] Commented: (PIG-1131) Pig simple join does not work when it contains empty lines

2010-02-08 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831248#action_12831248 ] Viraj Bhat commented on PIG-1131: - Olga I marked it as critical since we mention that Pig can

[jira] Commented: (PIG-1131) Pig simple join does not work when it contains empty lines

2010-02-08 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831251#action_12831251 ] Viraj Bhat commented on PIG-1131: - Ashutosh I was able to recreate a similar problem using

[jira] Updated: (PIG-1231) Default DataBagIterator.hasNext() should be idempotent in all cases

2010-02-08 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1231: Description: DefaultDataBagIterator.hasNext() is not repeatable when the below conditions are met: 1. There is
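The conditions listed in this description are cut off in the archive, so the following is only a generic sketch of what an idempotent `hasNext()` looks like, not the PIG-1231 patch or Pig's `DefaultDataBagIterator`. `IdempotentIterator` and its null-at-end `source` are hypothetical names introduced for illustration. The idea is that `hasNext()` buffers the element it reads, so calling it repeatedly never consumes more than one element.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

/**
 * Hypothetical illustration (not Pig code): wraps a source that returns null at
 * end-of-data and makes hasNext() safe to call any number of times.
 */
final class IdempotentIterator<T> implements Iterator<T> {
    private final Supplier<T> source; // assumed to return null once exhausted
    private T buffered;               // element read by hasNext() but not yet handed out
    private boolean done;

    IdempotentIterator(Supplier<T> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        if (done) return false;
        if (buffered != null) return true; // already looked ahead; do not read again
        buffered = source.get();
        if (buffered == null) done = true;
        return !done;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        T result = buffered;
        buffered = null; // the next hasNext() call may read a fresh element
        return result;
    }

    public static void main(String[] args) {
        Iterator<String> data = Arrays.asList("a", "b").iterator();
        IdempotentIterator<String> it =
            new IdempotentIterator<>(() -> data.hasNext() ? data.next() : null);
        it.hasNext();
        it.hasNext(); // a second call must not skip the first element
        System.out.println(it.next());
        System.out.println(it.next());
        System.out.println(it.hasNext());
    }
}
```

A non-idempotent variant would read from the source on every `hasNext()` call, silently dropping an element whenever a caller checks twice before calling `next()`.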

[jira] Commented: (PIG-1215) Make Hadoop jobId more prominent in the client log

2010-02-08 Thread Hadoop QA (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831309#action_12831309 ] Hadoop QA commented on PIG-1215: -1 overall. Here are the results of testing the latest

[jira] Commented: (PIG-1215) Make Hadoop jobId more prominent in the client log

2010-02-08 Thread Hadoop QA (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831316#action_12831316 ] Hadoop QA commented on PIG-1215: -1 overall. Here are the results of testing the latest

[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

2010-02-08 Thread Ashutosh Chauhan (JIRA)
[ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated PIG-834: - Status: Patch Available (was: Open) Trying to get hudson going on this. incorrect plan when

[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

2010-02-08 Thread Ashutosh Chauhan (JIRA)
[ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated PIG-834: - Status: Open (was: Patch Available) incorrect plan when algebraic functions are nested

[jira] Assigned: (PIG-1229) allow pig to write output into a JDBC db

2010-02-08 Thread Ankur (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur reassigned PIG-1229: -- Assignee: Ankur allow pig to write output into a JDBC db

[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-02-08 Thread Ankur (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831337#action_12831337 ] Ankur commented on PIG-1229: Aaron, Thanks for the suggestions. I'll have an updated patch coming