Re: LoadFunc.skipNext() function for faster sampling ?

2009-11-03 Thread Thejas Nair
Yes, that should work. I will use InputFormat.getNext from the SampleLoader to skip the records. Thanks, Thejas On 11/3/09 6:39 PM, Alan Gates ga...@yahoo-inc.com wrote: We definitely want to avoid parsing every tuple when sampling. But do we need to implement a special function for it? Pig

[jira] Updated: (PIG-970) Support of HBase 0.20.0

2009-11-03 Thread Jeff Zhang (JIRA)
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-970: Status: Open (was: Patch Available) Support of HBase 0.20.0 Key:

LoadFunc.skipNext() function for faster sampling ?

2009-11-03 Thread Thejas Nair
In the new implementation of SampleLoader subclasses (used by order-by, skew-join ..) as part of the loader redesign, we are not only reading all the records input but also parsing them as pig tuples. This is because the SampleLoaders are wrappers around the actual input loaders specified in the

[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Ankit Modi (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-1036: Attachment: LeftOuterFRJoin.patch Attaching a new patch. The join now only supports two way Left join.

[jira] Commented: (PIG-1048) inner join using 'skewed' produces multiple rows for keys with single row in both input relations

2009-11-03 Thread Alan Gates (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1277#action_1277 ] Alan Gates commented on PIG-1048: When attempting to apply this patch to the 0.5 branch, I

[jira] Commented: (PIG-970) Support of HBase 0.20.0

2009-11-03 Thread Alan Gates (JIRA)
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773103#action_12773103 ] Alan Gates commented on PIG-970: When I run TestHBaseStorage now I get: Testcase:

[jira] Updated: (PIG-970) Support of HBase 0.20.0

2009-11-03 Thread Alan Gates (JIRA)
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-970: Attachment: test-output.tgz TEST-org.apache.pig.test.TestHBaseStorage.txt Test run results plus

Re: LoadFunc.skipNext() function for faster sampling ?

2009-11-03 Thread Alan Gates
We definitely want to avoid parsing every tuple when sampling. But do we need to implement a special function for it? Pig will have access to the InputFormat instance, correct? Can it not call InputFormat.getNext the desired number of times (which will not parse the tuple) and then call

[jira] Commented: (PIG-970) Support of HBase 0.20.0

2009-11-03 Thread Alan Gates (JIRA)
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773348#action_12773348 ] Alan Gates commented on PIG-970: afterside:~/src/pig/PIG-970-3/trunk jar tf

[jira] Resolved: (PIG-1002) FINDBUGS: BC: Equals method should not assume anything about the type of its argument

2009-11-03 Thread Olga Natkovich (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-1002. Resolution: Fixed this has been addressed in other JIRAs FINDBUGS: BC: Equals method should not

[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1036: Resolution: Fixed Fix Version/s: 0.6.0 Hadoop Flags: [Reviewed] Status:

[jira] Commented: (PIG-970) Support of HBase 0.20.0

2009-11-03 Thread Jeff Zhang (JIRA)
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773309#action_12773309 ] Jeff Zhang commented on PIG-970: Alan, do you have file hbase-site.xml in folder test ? ( I

[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Ankit Modi (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-1036: Attachment: (was: LeftOuterFRJoin.patch) Fragment-replicate left outer join

[jira] Commented: (PIG-966) Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces

2009-11-03 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773295#action_12773295 ] Pradeep Kamath commented on PIG-966: I have updated

[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Ankit Modi (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-1036: Status: Open (was: Patch Available) Fragment-replicate left outer join

[jira] Commented: (PIG-970) Support of HBase 0.20.0

2009-11-03 Thread Jeff Zhang (JIRA)
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773297#action_12773297 ] Jeff Zhang commented on PIG-970: yes, Alan, Could you attach the whole log including the logs

[jira] Updated: (PIG-970) Support of HBase 0.20.0

2009-11-03 Thread Jeff Zhang (JIRA)
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-970: Attachment: (was: Pig_HBase_0.20.0.patch) Support of HBase 0.20.0

[jira] Updated: (PIG-1058) FINDBUGS: remaining Correctness Warnings

2009-11-03 Thread Olga Natkovich (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1058: Resolution: Fixed Status: Resolved (was: Patch Available) patch committed. Thanks, Pradeep,

[jira] Commented: (PIG-958) Splitting output data on key field

2009-11-03 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773181#action_12773181 ] Pradeep Kamath commented on PIG-958: bq. 2. Deleting the temporary directory manually in

RE: two-level access problem?

2009-11-03 Thread Pradeep Kamath
The twoLevelAccessRequired flag is not quite a long term solution to the problem. The problem is that we treat output of relations to be bags but their schemas do NOT have twoLevelAccessRequired to be true. Only bag constants and bags from input data have this flag set to true. We need to move

[jira] Commented: (PIG-997) [zebra] Sorted Table Support by Zebra

2009-11-03 Thread Alan Gates (JIRA)
[ https://issues.apache.org/jira/browse/PIG-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773192#action_12773192 ] Alan Gates commented on PIG-997: After applying this patch TestColumnSecurity fails. The

[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Ankit Modi (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-1036: Status: Patch Available (was: Open) Fragment-replicate left outer join

Pig 0.5.0 is released!

2009-11-03 Thread Olga Natkovich
Pig Team is happy to announce Pig 0.5.0 release! Pig is a Hadoop subproject that provides high-level data-flow language and an execution framework for parallel computation on a Hadoop cluster. More details about Pig can be found at http://hadoop.apache.org/pig/. This release makes

[jira] Assigned: (PIG-1071) Support comma separated file/directory names in load statements

2009-11-03 Thread Richard Ding (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding reassigned PIG-1071: Assignee: Richard Ding Support comma separated file/directory names in load statements

[jira] Commented: (PIG-958) Splitting output data on key field

2009-11-03 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773184#action_12773184 ] Pradeep Kamath commented on PIG-958: I saw compile errors while trying to run unit test:

[jira] Commented: (PIG-970) Support of HBase 0.20.0

2009-11-03 Thread Alan Gates (JIRA)
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773314#action_12773314 ] Alan Gates commented on PIG-970: Yes, it's there. Support of HBase 0.20.0

[jira] Updated: (PIG-997) [zebra] Sorted Table Support by Zebra

2009-11-03 Thread Yan Zhou (JIRA)
[ https://issues.apache.org/jira/browse/PIG-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-997: Status: Open (was: Patch Available) The failure is due to a misplaced test in the nightly suite. I'm going to

[jira] Updated: (PIG-1058) FINDBUGS: remaining Correctness Warnings

2009-11-03 Thread Olga Natkovich (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1058: Status: Patch Available (was: Open) FINDBUGS: remaining Correctness Warnings

[jira] Updated: (PIG-970) Support of HBase 0.20.0

2009-11-03 Thread Jeff Zhang (JIRA)
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-970: Attachment: Pig_HBase_0.20.0.patch Alan, I find the problem. Before in eclipse I put the output folder to

[jira] Created: (PIG-1071) Support comma separated file/directory names in load statements

2009-11-03 Thread Richard Ding (JIRA)
Support comma separated file/directory names in load statements Key: PIG-1071 URL: https://issues.apache.org/jira/browse/PIG-1071 Project: Pig Issue Type: New Feature

[jira] Commented: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Hadoop QA (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773273#action_12773273 ] Hadoop QA commented on PIG-1036: +1 overall. Here are the results of testing the latest

[jira] Updated: (PIG-997) [zebra] Sorted Table Support by Zebra

2009-11-03 Thread Yan Zhou (JIRA)
[ https://issues.apache.org/jira/browse/PIG-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-997: Attachment: SortedTable.patch [zebra] Sorted Table Support by Zebra

[jira] Updated: (PIG-1058) FINDBUGS: remaining Correctness Warnings

2009-11-03 Thread Olga Natkovich (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1058: Status: Open (was: Patch Available) FINDBUGS: remaining Correctness Warnings

[jira] Commented: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773229#action_12773229 ] Pradeep Kamath commented on PIG-1036: +1, will commit once hudson QA comes back.

Re: two-level access problem?

2009-11-03 Thread Dmitriy Ryaboy
Thanks Pradeep, I saw that comment. I guess my question is, given the solution this comment describes, what are you referring to in the Load/Store redesign doc when you say we must fix the two level access issues with schema of bags in current schema before we make these changes, otherwise that

RE: two-level access problem?

2009-11-03 Thread Pradeep Kamath
From comments in Schema.java: // In bags which have a schema with a tuple which contains // the fields present in it, if we access the second field (say) // we are actually trying to access the second field in the // tuple in the bag. This is currently true for two cases: // 1)

[jira] Commented: (PIG-970) Support of HBase 0.20.0

2009-11-03 Thread Jeff Zhang (JIRA)
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773339#action_12773339 ] Jeff Zhang commented on PIG-970: Well, it's weird. Alan, could check again that the

[jira] Updated: (PIG-1058) FINDBUGS: remaining Correctness Warnings

2009-11-03 Thread Olga Natkovich (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1058: Attachment: PIG-1058_v2.patch Addressed unit test failures FINDBUGS: remaining Correctness

[jira] Commented: (PIG-958) Splitting output data on key field

2009-11-03 Thread Ankur (JIRA)
[ https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773389#action_12773389 ] Ankur commented on PIG-958: Can you explain this a little bit more - .. In the earlier patch

Problem running Pig 0.60

2009-11-03 Thread Yiping Han
Hi pig team, I'm testing zebra v2 and trying to run the pig 0.60 jar that I got from Yan. However, I got the following error: Caused by: java.lang.ClassNotFoundException: jline.ConsoleReaderInputStream at java.net.URLClassLoader$1.run(URLClassLoader.java:200) at

[jira] Updated: (PIG-997) [zebra] Sorted Table Support by Zebra

2009-11-03 Thread Yan Zhou (JIRA)
[ https://issues.apache.org/jira/browse/PIG-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-997: Status: Patch Available (was: Open) [zebra] Sorted Table Support by Zebra