[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744953#action_12744953
 ] 

Hadoop QA commented on PIG-924:
---

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12416945/pig_924.3.patch
  against trunk revision 804406.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 8 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/173/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/173/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/173/console

This message is automatically generated.

 Make Pig work with multiple versions of Hadoop
 --

 Key: PIG-924
 URL: https://issues.apache.org/jira/browse/PIG-924
 Project: Pig
  Issue Type: Bug
Reporter: Dmitriy V. Ryaboy
 Attachments: pig_924.2.patch, pig_924.3.patch, pig_924.patch


 The current Pig build scripts package hadoop and other dependencies into the 
 pig.jar file.
 This means that if users upgrade Hadoop, they also need to upgrade Pig.
 Pig has relatively few dependencies on Hadoop interfaces that changed between 
 18, 19, and 20.  It is possible to write a dynamic shim that allows Pig to 
 use the correct calls for any of the above versions of Hadoop. Unfortunately, 
 the build process prevents us from doing this at runtime, and 
 forces an unnecessary Pig rebuild even if dynamic shims are created.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-925) Fix join in local mode

2009-08-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745018#action_12745018
 ] 

Hudson commented on PIG-925:


Integrated in Pig-trunk #527 (See 
[http://hudson.zones.apache.org/hudson/job/Pig-trunk/527/])
: Fix join in local mode


 Fix join in local mode
 --

 Key: PIG-925
 URL: https://issues.apache.org/jira/browse/PIG-925
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.4.0

 Attachments: PIG-925-1.patch, PIG-925-2.patch, PIG-925-3.patch


 Join is broken after the LOJoin patch (Optimizer_Phase5.patch of 
 [PIG-697|https://issues.apache.org/jira/browse/PIG-697]). Even the simplest 
 join script does not work in local mode:
 e.g.:
 a = load '1.txt';
 b = load '2.txt';
 c = join a by $0, b by $0;
 dump c;
 Caused by: java.lang.NullPointerException
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:206)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:191)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
 at org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
 at org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146)
 at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109)
 at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165)




[jira] Commented: (PIG-833) Storage access layer

2009-08-19 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745049#action_12745049
 ] 

He Yongqiang commented on PIG-833:
--

Could you add more description or explanation to this JIRA or the wiki page 
about usage, such as the schema format, storage format, projection, and 
partitioning?

 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
 PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
 TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz


 A layer is needed to provide a high level data access abstraction and a 
 tabular view of data in Hadoop, and could free Pig users from implementing 
 their own data storage/retrieval code.  This layer should also include a 
 columnar storage format in order to provide fast data projection, 
 CPU/space-efficient data serialization, and a schema language to manage 
 physical storage metadata.  Eventually it could also support predicate 
 pushdown for further performance improvement.  Initially, this layer could be 
 a contrib project in Pig and become a hadoop subproject later on.




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-19 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745109#action_12745109
 ] 

Dmitriy V. Ryaboy commented on PIG-924:
---

Regarding deprecation -- I tried setting it back to off, and adding 
@SuppressWarnings("deprecation") to the shims for 20, but ant complained about 
deprecation nonetheless. Not sure what its deal is.

Adding something like this to the main build.xml works. Does this seem like a 
reasonable solution?

{code}
<!-- set deprecation off if hadoop version greater or equals 20 -->
<target name="set_deprecation">
  <condition property="hadoop_is20">
    <equals arg1="${hadoop.version}" arg2="20"/>
  </condition>
  <antcall target="if_hadoop_is20"/>
  <antcall target="if_hadoop_not20"/>
</target>
<target name="if_hadoop_is20" if="hadoop_is20">
  <property name="javac.deprecation" value="off" />
</target>
<target name="if_hadoop_not20" unless="hadoop_is20">
  <property name="javac.deprecation" value="on" />
</target>


<target name="init" depends="set_deprecation">
  []
{code}
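
For context on the annotation mentioned in this comment, here is a minimal Java sketch (not taken from the patch) of `@SuppressWarnings("deprecation")` applied to a method that calls a deprecated API. The class names `OldApi` and `Shim20Example` are hypothetical stand-ins. One possible explanation for the persisting warnings, offered only as a guess: javac releases before Java 9 also warned on `import` statements that name deprecated types, and the annotation cannot be attached to an import.

```java
// Hypothetical stand-in for a Hadoop class whose method was deprecated in 0.20.
final class OldApi {
    @Deprecated
    static String legacyName() {
        return "legacy";
    }
}

// Hypothetical shim class. The annotation silences the deprecation warning
// for the call made inside the annotated method only.
final class Shim20Example {
    @SuppressWarnings("deprecation")
    static String name() {
        return OldApi.legacyName();
    }

    public static void main(String[] args) {
        System.out.println(name());
    }
}
```

If the warnings do come from imports, referring to the deprecated type by its fully qualified name inside the annotated method is one way to avoid the import altogether.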

 Make Pig work with multiple versions of Hadoop
 --

 Key: PIG-924
 URL: https://issues.apache.org/jira/browse/PIG-924
 Project: Pig
  Issue Type: Bug
Reporter: Dmitriy V. Ryaboy
 Attachments: pig_924.2.patch, pig_924.3.patch, pig_924.patch


 The current Pig build scripts package hadoop and other dependencies into the 
 pig.jar file.
 This means that if users upgrade Hadoop, they also need to upgrade Pig.
 Pig has relatively few dependencies on Hadoop interfaces that changed between 
 18, 19, and 20.  It is possible to write a dynamic shim that allows Pig to 
 use the correct calls for any of the above versions of Hadoop. Unfortunately, 
 the build process prevents us from doing this at runtime, and 
 forces an unnecessary Pig rebuild even if dynamic shims are created.




[jira] Commented: (PIG-833) Storage access layer

2009-08-19 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745125#action_12745125
 ] 

Jing Huang commented on PIG-833:


Zebra supports int, long, float, double, bool, collection (equivalent to Pig 
Bag), map, record (equivalent to Pig Tuple), string, bytes (equivalent to Pig 
Bytearray)

 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
 PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
 TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz


 A layer is needed to provide a high level data access abstraction and a 
 tabular view of data in Hadoop, and could free Pig users from implementing 
 their own data storage/retrieval code.  This layer should also include a 
 columnar storage format in order to provide fast data projection, 
 CPU/space-efficient data serialization, and a schema language to manage 
 physical storage metadata.  Eventually it could also support predicate 
 pushdown for further performance improvement.  Initially, this layer could be 
 a contrib project in Pig and become a hadoop subproject later on.




[jira] Updated: (PIG-928) UDFs in scripting languages

2009-08-19 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-928:
---

Attachment: package.zip

Attaching some preliminary work by Kishore Gopalakrishna on this.  This code is 
a good start, but not ready for inclusion.  It needs to be cleaned up, put in 
our class structure, etc.  

Comments from Kishore:

It contains all the libraries required and also the GenericEval UDF and
GenericFilter UDF

I didn't get a chance to get the Algebraic function working.

To test it, just unzip the package and run

rm -rf wordcount/output;
pig -x local wordcount.pig --- to test eval
pig -x local wordcount_filter.pig --- to test filter [sorry it should be named filter.pig]
cat wordcount/output

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
 Attachments: package.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.




[jira] Updated: (PIG-926) Merge-Join phase 2

2009-08-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-926:
-

Attachment: (was: mj_phase2_1.patch)

 Merge-Join phase 2
 --

 Key: PIG-926
 URL: https://issues.apache.org/jira/browse/PIG-926
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Minor

 This jira is created to keep track of phase-2 work for MergeJoin. Various 
 limitations exist in phase-1 for Merge Join which are listed on: 
 http://wiki.apache.org/pig/PigMergeJoin Those will be addressed here.




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-19 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745160#action_12745160
 ] 

Daniel Dai commented on PIG-924:


From your latest patch, the shims work this way:
1. The version of the shims that Pig compiles is controlled by the 
hadoop.version property in build.xml.
2. The version of the shims that Pig uses is determined dynamically by hacking 
the string returned by VersionInfo.getVersion.

As your code comment notes, the version string hack is not safe. My thinking is 
that Pig should use only the bundled hadoop unless overridden:
1. Pig compiles all versions of the shims. There is no conflict between 
different versions of the shims, so why not compile them all? Then the user 
does not need to recompile the code to use a different external hadoop.
2. Pig bundles a default hadoop, which is specified by hadoop.version in 
build.xml. Pig uses this version of the shims by default.
3. If the user wants to use an external hadoop, he/she needs to override the 
default hadoop version explicitly, e.g., with -Dhadoop_version on the command 
line.
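
The selection logic discussed in this comment could be sketched roughly as follows. This is an illustration under stated assumptions, not code from the patch: the class `ShimSelector`, the property name `hadoop_version`, and the `org.example.shims` naming scheme are all hypothetical, and the version string is passed in as a plain argument rather than read from Hadoop.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ShimSelector {
    // Hypothetical property name, modeled on the -Dhadoop_version idea above.
    static final String OVERRIDE_PROPERTY = "hadoop_version";

    // Extracts the second component of a "0.NN..." version string,
    // e.g. 20 from "0.20.0". This is the fragile "version string hack".
    static int minorVersion(String versionString) {
        Matcher m = Pattern.compile("^0\\.(\\d+)").matcher(versionString);
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognized Hadoop version: " + versionString);
        }
        return Integer.parseInt(m.group(1));
    }

    // An explicit override wins; otherwise fall back to the bundled default.
    static int chooseShimVersion(String override, int bundledDefault) {
        if (override == null || override.isEmpty()) {
            return bundledDefault;
        }
        return Integer.parseInt(override);
    }

    // Hypothetical naming scheme for the compiled shim classes.
    static String shimClassName(int version) {
        return "org.example.shims.HadoopShims" + version;
    }

    public static void main(String[] args) {
        int version = chooseShimVersion(System.getProperty(OVERRIDE_PROPERTY), 18);
        System.out.println(shimClassName(version));
    }
}
```

Running `java -Dhadoop_version=20 ShimSelector` would select the 20 shim in this sketch, while omitting the property falls back to the bundled default. Parsing the version string is kept as a separate method to show how little it takes for an unexpected format to break it.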

 Make Pig work with multiple versions of Hadoop
 --

 Key: PIG-924
 URL: https://issues.apache.org/jira/browse/PIG-924
 Project: Pig
  Issue Type: Bug
Reporter: Dmitriy V. Ryaboy
 Attachments: pig_924.2.patch, pig_924.3.patch, pig_924.patch


 The current Pig build scripts package hadoop and other dependencies into the 
 pig.jar file.
 This means that if users upgrade Hadoop, they also need to upgrade Pig.
 Pig has relatively few dependencies on Hadoop interfaces that changed between 
 18, 19, and 20.  It is possible to write a dynamic shim that allows Pig to 
 use the correct calls for any of the above versions of Hadoop. Unfortunately, 
 the build process prevents us from doing this at runtime, and 
 forces an unnecessary Pig rebuild even if dynamic shims are created.




[jira] Updated: (PIG-926) Merge-Join phase 2

2009-08-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-926:
-

Status: Open  (was: Patch Available)

 Merge-Join phase 2
 --

 Key: PIG-926
 URL: https://issues.apache.org/jira/browse/PIG-926
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Minor
 Attachments: mj_phase2_1.patch


 This jira is created to keep track of phase-2 work for MergeJoin. Various 
 limitations exist in phase-1 for Merge Join which are listed on: 
 http://wiki.apache.org/pig/PigMergeJoin Those will be addressed here.




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-19 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745166#action_12745166
 ] 

Todd Lipcon commented on PIG-924:
-

bq. If existing deployments need a single pig.jar without a hadoop dependency, 
it might be possible to create a new target (pig-all) that would create a 
statically bundled jar; but I think the default behavior should be to not 
bundle, build all the shims, and use whatever hadoop is on the path.

+1 for making the default to *not* bundle hadoop inside pig.jar, and adding 
another non-default target for those people who might want it.

bq. The current patch is written as is so that it can be applied to trunk, 
enabling people to compile statically, and only require a change to the ant 
build files to switch to a dynamic compile later on (after 0.4, probably)

From the packager's perspective, I'd love it if this change could get in for 
0.4. If it doesn't, we'll end up applying the patch ourselves for packaging 
purposes: we need the hadoop dependency to be on the user's installed 
hadoop, not on whatever happened to get bundled into pig.jar.

 Make Pig work with multiple versions of Hadoop
 --

 Key: PIG-924
 URL: https://issues.apache.org/jira/browse/PIG-924
 Project: Pig
  Issue Type: Bug
Reporter: Dmitriy V. Ryaboy
 Attachments: pig_924.2.patch, pig_924.3.patch, pig_924.patch


 The current Pig build scripts package hadoop and other dependencies into the 
 pig.jar file.
 This means that if users upgrade Hadoop, they also need to upgrade Pig.
 Pig has relatively few dependencies on Hadoop interfaces that changed between 
 18, 19, and 20.  It is possible to write a dynamic shim that allows Pig to 
 use the correct calls for any of the above versions of Hadoop. Unfortunately, 
 the build process prevents us from doing this at runtime, and 
 forces an unnecessary Pig rebuild even if dynamic shims are created.




[jira] Commented: (PIG-926) Merge-Join phase 2

2009-08-19 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745181#action_12745181
 ] 

Pradeep Kamath commented on PIG-926:



In MRCompiler:
You should change:
{code}
 indexerArgs[0] = rightLoader.getLFile().getFuncName();
to
 indexerArgs[0] = rightLoader.getLFile().getFuncSpec().toString();
{code}
to handle the case where the loader may have constructor args (like 
PigStorage(',') - PigStorage with comma as the delimiter).
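
To illustrate why the name alone is not enough, here is a small self-contained Java sketch. `LoaderSpec` is a hypothetical stand-in written for this example, not Pig's actual class; it only assumes that a loader spec carries a class name plus constructor arguments, as the comment describes.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical stand-in for a loader spec; not Pig's actual implementation.
final class LoaderSpec {
    private final String className;
    private final String[] ctorArgs;

    LoaderSpec(String className, String... ctorArgs) {
        this.className = className;
        this.ctorArgs = ctorArgs.clone();
    }

    // Name only: constructor arguments are lost.
    String getFuncName() {
        return className;
    }

    // Full spec: class name followed by quoted constructor arguments, if any.
    @Override
    public String toString() {
        if (ctorArgs.length == 0) {
            return className;
        }
        return Arrays.stream(ctorArgs)
                .map(arg -> "'" + arg + "'")
                .collect(Collectors.joining(",", className + "(", ")"));
    }

    public static void main(String[] args) {
        LoaderSpec spec = new LoaderSpec("PigStorage", ",");
        System.out.println(spec.getFuncName()); // prints PigStorage
        System.out.println(spec);               // prints PigStorage(',')
    }
}
```

In this sketch, an indexer that is handed only `getFuncName()` would rebuild the loader with its default constructor and so use the wrong delimiter, which is the case the suggested change guards against.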

In the error message when the loader does not implement SamplableLoader, you 
can change:
{noformat}
"This loader doesn't implement it."
to
"The loader specified in " + indexerArgs[0] + " doesn't implement it"
{noformat}

Otherwise looks good.


 Merge-Join phase 2
 --

 Key: PIG-926
 URL: https://issues.apache.org/jira/browse/PIG-926
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Minor
 Attachments: mj_phase2_1.patch


 This jira is created to keep track of phase-2 work for MergeJoin. Various 
 limitations exist in phase-1 for Merge Join which are listed on: 
 http://wiki.apache.org/pig/PigMergeJoin Those will be addressed here.




Build failed in Hudson: Pig-Patch-minerva.apache.org #174

2009-08-19 Thread Apache Hudson Server
See 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/174/changes

Changes:

[daijy] PIG-925: Fix join in local mode

--
started
Building remotely on minerva.apache.org (Ubuntu)
Updating http://svn.apache.org/repos/asf/hadoop/pig/trunk
U test/org/apache/pig/test/TestLocal2.java
U CHANGES.txt
U src/org/apache/pig/backend/local/executionengine/physicalLayer/LocalLogToPhyTranslationVisitor.java
Fetching 'http://svn.apache.org/repos/asf/hadoop/nightly/test-patch' at -1 into 'http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/ws/trunk/test/bin'
At revision 805964
At revision 805964
no change for http://svn.apache.org/repos/asf/hadoop/nightly/test-patch since the previous build
[Pig-Patch-minerva.apache.org] $ /bin/bash /tmp/hudson4495449519603127293.sh
/home/hudson/tools/java/latest1.6/bin/java
Buildfile: build.xml

check-for-findbugs:

findbugs.check:

java5.check:

forrest.check:

hudson-test-patch:
 [exec] 
 [exec] 
 [exec] ==
 [exec] ==
 [exec] Testing patch for PIG-926.
 [exec] ==
 [exec] ==
 [exec] 
 [exec] 
 [exec] Reverted 'test/org/apache/pig/test/MiniCluster.java'
 [exec] Reverted 'src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java'
 [exec] Reverted 'src/org/apache/pig/backend/hadoop/datastorage/HConfiguration.java'
 [exec] Reverted 'src/org/apache/pig/backend/hadoop/datastorage/HDataStorage.java'
 [exec] Reverted 'src/org/apache/pig/tools/pigstats/PigStats.java'
 [exec] Reverted 'src/org/apache/pig/impl/io/NullableBytesWritable.java'
 [exec] Reverted 'build.xml'
 [exec] Reverted 'contrib/piggybank/java/build.xml'
 [exec] 
 [exec] Fetching external item into 'test/bin'
 [exec] A test/bin/test-patch.sh
 [exec] Updated external to revision 805964.
 [exec] 
 [exec] Updated to revision 805964.
 [exec] PIG-926 is not Patch Available.  Exiting.
 [exec] 
 [exec] 
 [exec] ==
 [exec] ==
 [exec] Finished build.
 [exec] ==
 [exec] ==
 [exec] 
 [exec] 

BUILD SUCCESSFUL
Total time: 9 seconds
ERROR: No artifacts found that match the file pattern trunk/build/test/findbugs/newPatchFindbugsWarnings.html,trunk/patchprocess/*Warnings.txt. Configuration error?
ERROR: 'trunk/build/test/findbugs/newPatchFindbugsWarnings.html' doesn't match anything: 'trunk' exists but not 'trunk/build/test/findbugs/newPatchFindbugsWarnings.html'
Recording test results
Description found: PIG-926



[jira] Updated: (PIG-926) Merge-Join phase 2

2009-08-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-926:
-

Attachment: (was: mj_phase2_1.patch)

 Merge-Join phase 2
 --

 Key: PIG-926
 URL: https://issues.apache.org/jira/browse/PIG-926
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Minor
 Attachments: mj_phase2_1.patch


 This jira is created to keep track of phase-2 work for MergeJoin. Various 
 limitations exist in phase-1 for Merge Join which are listed on: 
 http://wiki.apache.org/pig/PigMergeJoin Those will be addressed here.




[jira] Updated: (PIG-926) Merge-Join phase 2

2009-08-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-926:
-

Attachment: mj_phase2_1.patch

Updated patch addressing Pradeep's comments.

 Merge-Join phase 2
 --

 Key: PIG-926
 URL: https://issues.apache.org/jira/browse/PIG-926
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Minor
 Attachments: mj_phase2_1.patch


 This jira is created to keep track of phase-2 work for MergeJoin. Various 
 limitations exist in phase-1 for Merge Join which are listed on: 
 http://wiki.apache.org/pig/PigMergeJoin Those will be addressed here.




[jira] Updated: (PIG-926) Merge-Join phase 2

2009-08-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-926:
-

Status: Patch Available  (was: Open)

 Merge-Join phase 2
 --

 Key: PIG-926
 URL: https://issues.apache.org/jira/browse/PIG-926
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Minor
 Attachments: mj_phase2_1.patch


 This jira is created to keep track of phase-2 work for MergeJoin. Various 
 limitations exist in phase-1 for Merge Join which are listed on: 
 http://wiki.apache.org/pig/PigMergeJoin Those will be addressed here.




[jira] Commented: (PIG-833) Storage access layer

2009-08-19 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745219#action_12745219
 ] 

Raghu Angadi commented on PIG-833:
--

Thanks Jing. There are some Pig examples listed at the bottom of the Zebra 
wiki: http://wiki.apache.org/pig/zebra (the wiki is still under construction).

Just listing the Java strings from Jing's comment without JIRA formatting:

{noformat}
final static String STR_SCHEMA = 
  "s1:bool, s2:int, s3:long, s4:float, s5:string, s6:bytes, " +
  "r1:record(f1:int, f2:long), r2:record(r3:record(f3:float, f4)), " +
  "m1:map(string),m2:map(map(int)), c:collection(f13:double, f14:float, f15:bytes)";

final static String STR_STORAGE = 
  "[s1, s2]; [m1#{a}]; [r1.f1]; [s3, s4, r2.r3.f3]; [s5, s6, m2#{x|y}]; " +
  "[r1.f2, m1#{b}]; [r2.r3.f4, m2#{z}]";
{noformat}
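
As a reading aid for the storage string above, the following Java sketch splits such a spec into its bracketed column groups. It is a naive illustration written for this thread, not Zebra's parser: the class name `ColumnGroupSketch` is hypothetical, and the code assumes groups are separated by `;` and columns by `,`, with no nesting of either character inside a group.

```java
import java.util.ArrayList;
import java.util.List;

final class ColumnGroupSketch {
    // Splits a storage spec such as "[s1, s2]; [r1.f1]" into column groups.
    static List<List<String>> parse(String storageSpec) {
        List<List<String>> groups = new ArrayList<>();
        for (String part : storageSpec.split(";")) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a trailing separator
            }
            if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) {
                throw new IllegalArgumentException("Expected a [..] group: " + trimmed);
            }
            List<String> columns = new ArrayList<>();
            String inner = trimmed.substring(1, trimmed.length() - 1);
            for (String column : inner.split(",")) {
                columns.add(column.trim());
            }
            groups.add(columns);
        }
        return groups;
    }

    public static void main(String[] args) {
        String spec = "[s1, s2]; [m1#{a}]; [r1.f1]; [s3, s4, r2.r3.f3]; [s5, s6, m2#{x|y}]; "
                + "[r1.f2, m1#{b}]; [r2.r3.f4, m2#{z}]";
        for (List<String> group : parse(spec)) {
            System.out.println(group);
        }
    }
}
```

Each printed list corresponds to one column group, so the example string yields seven groups, with entries such as `r2.r3.f3` naming a nested record field and `m1#{a}` naming a map key.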

 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
 PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
 TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz


 A layer is needed to provide a high level data access abstraction and a 
 tabular view of data in Hadoop, and could free Pig users from implementing 
 their own data storage/retrieval code.  This layer should also include a 
 columnar storage format in order to provide fast data projection, 
 CPU/space-efficient data serialization, and a schema language to manage 
 physical storage metadata.  Eventually it could also support predicate 
 pushdown for further performance improvement.  Initially, this layer could be 
 a contrib project in Pig and become a hadoop subproject later on.




[jira] Commented: (PIG-833) Storage access layer

2009-08-19 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745250#action_12745250
 ] 

He Yongqiang commented on PIG-833:
--

Thanks Jing.
I now have a better understanding of schemas and column groups.
What are projection and partition used for?

 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
 PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
 TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz


 A layer is needed to provide a high level data access abstraction and a 
 tabular view of data in Hadoop, and could free Pig users from implementing 
 their own data storage/retrieval code.  This layer should also include a 
 columnar storage format in order to provide fast data projection, 
 CPU/space-efficient data serialization, and a schema language to manage 
 physical storage metadata.  Eventually it could also support predicate 
 pushdown for further performance improvement.  Initially, this layer could be 
 a contrib project in Pig and become a hadoop subproject later on.




[jira] Commented: (PIG-926) Merge-Join phase 2

2009-08-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745270#action_12745270
 ] 

Hadoop QA commented on PIG-926:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12417053/mj_phase2_1.patch
  against trunk revision 805684.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/175/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/175/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/175/console

This message is automatically generated.

 Merge-Join phase 2
 --

 Key: PIG-926
 URL: https://issues.apache.org/jira/browse/PIG-926
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Minor
 Attachments: mj_phase2_1.patch


 This jira is created to keep track of phase-2 work for MergeJoin. Various 
 limitations exist in phase-1 for Merge Join which are listed on: 
 http://wiki.apache.org/pig/PigMergeJoin Those will be addressed here.
