[jira] Commented: (PIG-765) to implement jdiff

2009-09-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752954#action_12752954
 ] 

Hadoop QA commented on PIG-765:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12419017/pig-765.patch
  against trunk revision 812599.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 268 release audit warnings 
(more than the trunk's current 162 warnings).

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/19/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/19/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/19/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/19/console

This message is automatically generated.

 to implement jdiff
 --

 Key: PIG-765
 URL: https://issues.apache.org/jira/browse/PIG-765
 Project: Pig
  Issue Type: Improvement
  Components: build
Reporter: Giridharan Kesavan
Assignee: Giridharan Kesavan
 Attachments: pig-765.patch, pig-765.patch, pig-765.patch, 
 pig-765.patch, pig-765.patch, pig-765.patch, pig-765.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: questions about integration of pig and HBase

2009-09-09 Thread Vincent BARAT

Thank you for the link.

Anyway, what I was looking for is an example of Pig syntax for loading 
from an HBase table. Is it something like:


queries = LOAD 'HBase Table' USING HBaseStorage();

?

Jeff Zhang wrote:

Use HBaseStorage as your LoadFunc; it uses a custom slicer, HBaseSlice.

You can refer to this link for more information:

http://hadoop.apache.org/pig/docs/r0.3.0/udf.html#Custom+Slicer



2009/9/9 Vincent BARAT vincent.ba...@ubikod.com



Alan Gates wrote:

Pig supports reading from Hbase (in Hadoop/Hbase 0.18 only).

Hello,

Do you have any link to the documentation about how to do that?
I can't find any example...

Thanks,





Re: questions about integration of pig and HBase

2009-09-09 Thread Alan Gates
See the JIRA PIG-6.  See also the HbaseStorage unit test that tests  
the functionality.


Alan.

On Sep 9, 2009, at 5:31 AM, Vincent BARAT wrote:


Thank you for the link.

Anyway, what I was looking for is an example of Pig syntax for loading  
from an HBase table. Is it something like:


queries = LOAD 'HBase Table' USING HBaseStorage();

?

Jeff Zhang wrote:
Use HBaseStorage as your LoadFunc; it uses a custom slicer,  
HBaseSlice.

You can refer to this link for more information:
http://hadoop.apache.org/pig/docs/r0.3.0/udf.html#Custom+Slicer
2009/9/9 Vincent BARAT vincent.ba...@ubikod.com


Alan Gates wrote:

Pig supports reading from Hbase (in Hadoop/Hbase 0.18 only).

Hello,

Do you have any link to the documentation about how to do that?
I can't find any example...

Thanks,





[jira] Commented: (PIG-948) [Usability] Relating pig script with MR jobs

2009-09-09 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753173#action_12753173
 ] 

Daniel Dai commented on PIG-948:


One thing I am not sure about is the way you build the job tracker URL:
{code}
"http://" + jobTrackerAdd + port + "/jobdetails.jsp?jobid=" + job.getAssignedJobID();
{code}
I am not sure we should have this logic in Pig; it looks hacky to me. 

The rest looks good.

 [Usability] Relating pig script with MR jobs
 

 Key: PIG-948
 URL: https://issues.apache.org/jira/browse/PIG-948
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Minor
 Attachments: pig-948.patch


 Currently it is hard to relate a Pig script to a specific MR job. 
 In a loaded cluster with multiple simultaneous job submissions, it is not easy 
 to figure out which specific MR jobs were launched for a given Pig script. If 
 Pig can provide this info, it will be useful for debugging and monitoring the 
 jobs resulting from a Pig script.
 At the very least, Pig should be able to provide the user with the following 
 information:
 1) Job id of the launched job.
 2) Complete web URL of the jobtracker running this job. 




[jira] Commented: (PIG-948) [Usability] Relating pig script with MR jobs

2009-09-09 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753260#action_12753260
 ] 

Ashutosh Chauhan commented on PIG-948:
--

In this string, the job-tracker address, port number and job ids are all 
obtained through APIs, so that part is fine. 
I agree that hardcoding the other part of the URL ( jobdetails.jsp?jobid= ) is 
not the best way to do it, as the link will break if that web URL changes in a 
later Hadoop release. But since there is no way to get that URL 
programmatically, I went ahead with this. If there is a way to get that URL 
programmatically, let me know. If not, I think it is useful enough to have it 
like this and update it if it changes in a later Hadoop release. 
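
One way to contain the fragility discussed above is to keep the hardcoded path in a single named constant, so that a later Hadoop change touches one line only. A minimal sketch, assuming the tracker host, port and job id have already been obtained through the APIs; the class, method and constant names are hypothetical and are not taken from pig-948.patch:

```java
// Hypothetical helper for illustration; not part of pig-948.patch.
public class JobTrackerUrlSketch {
    // The only hardcoded piece: the JobTracker web page that shows one job.
    // If a later Hadoop release renames it, this is the single place to update.
    static final String JOB_DETAILS_PATH = "/jobdetails.jsp?jobid=";

    // host, port and jobId are assumed to come from the Hadoop APIs.
    static String jobDetailsUrl(String host, int port, String jobId) {
        return "http://" + host + ":" + port + JOB_DETAILS_PATH + jobId;
    }

    public static void main(String[] args) {
        // Made-up values, for illustration only.
        System.out.println(jobDetailsUrl("jt.example.com", 50030, "job_200909091731_0001"));
    }
}
```

This does not remove the dependency on the web UI layout; it only makes the assumption explicit and easy to find.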





[jira] Updated: (PIG-938) Pig Docs for 0.4.0

2009-09-09 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel updated PIG-938:


Attachment: PIG-938-2.patch

Patch #2 - includes OUTER JOIN write up.

 Pig Docs for 0.4.0
 --

 Key: PIG-938
 URL: https://issues.apache.org/jira/browse/PIG-938
 Project: Pig
  Issue Type: Task
  Components: documentation
Affects Versions: 0.4.0
Reporter: Corinne Chandel
Priority: Minor
 Attachments: PIG-938-2.patch, PIG-938.patch


 Pig docs for 0.4.0




[jira] Updated: (PIG-938) Pig Docs for 0.4.0

2009-09-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-938:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

patch committed. Thanks, Corinne!

 Pig Docs for 0.4.0
 --

 Key: PIG-938
 URL: https://issues.apache.org/jira/browse/PIG-938
 Project: Pig
  Issue Type: Task
  Components: documentation
Affects Versions: 0.4.0
Reporter: Corinne Chandel
Priority: Minor
 Attachments: PIG-938-2.patch, PIG-938-2b.patch, PIG-938.patch


 Pig docs for 0.4.0




Preparing to branch for Pig 0.4.0 release

2009-09-09 Thread Olga Natkovich
Hi,

I am updating the tree to make it ready for a branch for the release.
Please hold off on any commits until this is done. I will send an email
once the branch is created.

Thanks,

Olga



[jira] Commented: (PIG-944) Zebra schema is taken from Pig through TableStorer's construct

2009-09-09 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753339#action_12753339
 ] 

Yan Zhou commented on PIG-944:
--

The previously attached patch is based on some new features under development 
and consequently might not apply to trunk. I'm going to attach another patch 
shortly, based on the version 1 branch.

In addition to the problem in TableOutputFormat.checkOutputSpecs, 
SchemaConverter.toPigSchema did not convert the Zebra schema to Pig's properly 
when nested types are involved: the lower-level column schemas were simply 
missing. Also, the conversion from Pig schema to Zebra schema is missing 
entirely, apart from a hack that works on specially prefixed column names.

 Zebra schema is taken from Pig through TableStorer's construct
 --

 Key: PIG-944
 URL: https://issues.apache.org/jira/browse/PIG-944
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Yan Zhou
 Attachments: zebra_pig_interface.patch


 The schema should be taken from StoreConfig in the 
 TableOutputFormat.checkOutputSpecs method, because the information is dynamic 
 in Pig's execution engine and should not be taken from a static argument to 
 the constructor.




[jira] Updated: (PIG-944) Zebra schema is taken from Pig through TableStorer's construct

2009-09-09 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-944:
-

Attachment: zebra_pig_interface_1_1.patch





[jira] Commented: (PIG-950) Pig Loader does not handle unix hidden files ( files starting with dot)

2009-09-09 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753361#action_12753361
 ] 

Daniel Dai commented on PIG-950:


I tried it: Hadoop itself ignores files whose names start with "." when 
processing a map-reduce job. So I guess there is nothing we can do except 
avoid giving input files names that start with ".".
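
For reference, Hadoop's FileInputFormat skips input paths whose names start with "_" or "." when it lists job input. The sketch below mimics that rule in plain Java without depending on Hadoop; the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: reproduces the naming rule, not Hadoop's actual classes.
public class HiddenFileFilterSketch {
    // A file is visible to the job unless its name starts with "_" or ".".
    static boolean isVisible(String fileName) {
        return !fileName.startsWith("_") && !fileName.startsWith(".");
    }

    // Returns the names that would survive the filter, in input order.
    static List<String> visibleInputs(List<String> fileNames) {
        List<String> kept = new ArrayList<>();
        for (String name : fileNames) {
            if (isVisible(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // ".btschema" and "_logs" are dropped; only "1.txt" remains.
        System.out.println(visibleInputs(Arrays.asList(".btschema", "1.txt", "_logs")));
    }
}
```

Under this rule a load of '.btschema' sees no input at all, which matches the "Records written : 0" result reported in the issue.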

 Pig Loader does not handle unix hidden files ( files starting with dot)
 ---

 Key: PIG-950
 URL: https://issues.apache.org/jira/browse/PIG-950
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.4.0
Reporter: Jing Huang

 I am trying to load a .btschema file using the Pig loader (.btschema is not an 
 empty file).
 This is what I did:
 grunt> a = load '.btschema';
 grunt> dump a;
 2009-09-09 17:41:21,170 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - MR plan size before optimization: 1
 2009-09-09 17:41:21,170 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - MR plan size after optimization: 1
 2009-09-09 17:41:23,092 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
  - Setting up single store job
 2009-09-09 17:41:23,106 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics 
 - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - 
 already initialized
 2009-09-09 17:41:23,127 [Thread-4] WARN  org.apache.hadoop.mapred.JobClient - 
 Use GenericOptionsParser for parsing the arguments. Applications should 
 implement Tool for the same.
 2009-09-09 17:41:23,623 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 0% complete
 2009-09-09 17:41:28,644 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 100% complete
 2009-09-09 17:41:28,644 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Successfully stored result in: file:/tmp/temp165972/tmp-527102439
 2009-09-09 17:41:28,645 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Records written : 0
 2009-09-09 17:41:28,645 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Bytes written : 0
 2009-09-09 17:41:28,645 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Success!
 grunt> 
 =
 It dumps nothing.




RE: Preparing to branch for Pig 0.4.0 release

2009-09-09 Thread Olga Natkovich
I am having some problems with the docs that I will need to resolve
tomorrow. I would like to keep the tree closed until then. If you
absolutely need to make a checkin, please go ahead and I will integrate
your patch into the branch.

Thanks,

Olga

-Original Message-
From: Olga Natkovich [mailto:ol...@yahoo-inc.com] 
Sent: Wednesday, September 09, 2009 3:31 PM
To: pig-dev@hadoop.apache.org
Subject: Preparing to branch for Pig 0.4.0 release

Hi,

I am updating the tree to make it ready for a branch for the release.
Please hold off on any commits until this is done. I will send an email
once the branch is created.

Thanks,

Olga



[jira] Commented: (PIG-950) Pig Loader does not handle unix hidden files ( files starting with dot)

2009-09-09 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753368#action_12753368
 ] 

Daniel Dai commented on PIG-950:


Here is the experiment I tried:

hadoop fs -ls gutenberg
{code}
/user/jianyong/gutenberg/.2.txt
/user/jianyong/gutenberg/1.txt
{code}
hadoop fs -cat /gutenberg/1.txt
{code}
hello
{code}
hadoop fs -cat /gutenberg/.2.txt
{code}
daniel
{code}
hadoop jar hadoop-0.18.1-examples.jar wordcount gutenberg gutenberg-output
hadoop fs -cat gutenberg-output/part-0
{code}
hello   1
{code}





[jira] Created: (PIG-951) Reset parallelism to 1 for indexing job in MergeJoin

2009-09-09 Thread Ashutosh Chauhan (JIRA)
Reset parallelism to 1 for indexing job in MergeJoin


 Key: PIG-951
 URL: https://issues.apache.org/jira/browse/PIG-951
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan


After sampling one tuple from every block, one reducer is used to sort the 
index entries in the reduce phase, producing the sorted index used in the 
actual join job. Thus, the parallelism of the index job should be explicitly 
set to 1. Currently, it is not.

For now this is a non-issue, since we don't allow any blocking operators in 
the pipeline before merge-join. However, once we do allow blocking operators, 
the parallelism of the indexing job will be that of the preceding blocking 
operator. Even then, the job will complete successfully, because we group on a 
single key ("all") and so all tuples go to one reducer. However, it will waste 
cluster resources by starting extra reducers that get no data and thus do 
nothing.
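
The wasted-reducer effect can be sketched as follows. This assumes Hadoop's default hash partitioning, which assigns a key to reducer (hashCode & Integer.MAX_VALUE) % numReducers, and it uses a plain String in place of Pig's actual key type; the class name, the reducer count of 10 and the tuple count are made up for illustration:

```java
// Illustration only: shows why a single grouping key leaves reducers idle.
public class SingleKeyPartitionSketch {
    // Same formula as Hadoop's default HashPartitioner.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Counts how many tuples each reducer receives when every tuple
    // carries the same key.
    static int[] tuplesPerReducer(String key, int numTuples, int numReducers) {
        int[] counts = new int[numReducers];
        for (int i = 0; i < numTuples; i++) {
            counts[partitionFor(key, numReducers)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = tuplesPerReducer("all", 1000, 10);
        int idle = 0;
        for (int c : counts) {
            if (c == 0) {
                idle++;
            }
        }
        // With one distinct key, exactly one reducer gets data.
        System.out.println("idle reducers: " + idle + " of " + counts.length);
    }
}
```

With the parallelism set to 1 the modulo is always 0, so the single reducer receives everything and nothing is started needlessly.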




[jira] Updated: (PIG-951) Reset parallelism to 1 for indexing job in MergeJoin

2009-09-09 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-951:
-

Attachment: pig-951.patch

One-line patch that fixes this. Also added a test case to catch regressions 
on this.





[jira] Updated: (PIG-948) [Usability] Relating pig script with MR jobs

2009-09-09 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-948:
-

Status: Patch Available  (was: Open)

