[jira] Created: (PIG-1371) Pig should handle deep casting of complex types
Pig should handle deep casting of complex types
-----------------------------------------------
                 Key: PIG-1371
                 URL: https://issues.apache.org/jira/browse/PIG-1371
             Project: Pig
          Issue Type: Bug
            Reporter: Pradeep Kamath
             Fix For: 0.8.0

Consider input data in BinStorage format which has a field of bag type - bg:{t:(i:int)}. If the schema specified in the load statement gives the type for this field as bg:{t:(c:chararray)}, the current behavior is that Pig treats the field as being of the type specified in the load statement (bg:{t:(c:chararray)}), but no deep cast from bag of int (the real data) to bag of chararray (the user-specified schema) is made. There are two issues currently:

1) The TypeCastInserter only considers the byte 'type' of the loader-presented schema and the user-specified schema to decide whether to introduce a cast. In the above case, since both schemas have the type bag, no cast is inserted. This check has to be extended to consider the full FieldSchema (with inner subschema) in order to decide whether a cast is needed.

2) POCast should be changed to handle casting a complex type to the type specified in the user-supplied FieldSchema. There is one issue to be considered here - if the user specified the cast type to be bg:{t:(i:int, j:int)} and the real data had only one field, what should the result of the cast be?
* A bag with two fields - the int field and a null? In this approach Pig assumes the lone field in the data is the first field, which might be incorrect if it is in fact the second field.
* A null bag, to indicate that the bag is of unknown value - this is the one I personally prefer.
* The cast throws an IncompatibleCastException.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
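The check described in (1) can be sketched as a recursive comparison over field schemas. The classes below are illustrative stand-ins, not Pig's actual FieldSchema API:

```java
import java.util.Arrays;
import java.util.List;

public class DeepSchemaCheck {
    // Hypothetical stand-in for a schema node: a type name plus inner subschema.
    static class FieldSchema {
        final String type;              // e.g. "bag", "tuple", "int", "chararray"
        final List<FieldSchema> inner;  // empty for simple types
        FieldSchema(String type, FieldSchema... inner) {
            this.type = type;
            this.inner = Arrays.asList(inner);
        }
    }

    // A cast is needed if the types differ at any nesting level.
    static boolean castNeeded(FieldSchema data, FieldSchema user) {
        if (!data.type.equals(user.type)) return true;
        if (data.inner.size() != user.inner.size()) return true;
        for (int i = 0; i < data.inner.size(); i++)
            if (castNeeded(data.inner.get(i), user.inner.get(i))) return true;
        return false;
    }

    public static void main(String[] args) {
        // bg:{t:(i:int)} in the data vs. bg:{t:(c:chararray)} in the load statement
        FieldSchema data = new FieldSchema("bag", new FieldSchema("tuple", new FieldSchema("int")));
        FieldSchema user = new FieldSchema("bag", new FieldSchema("tuple", new FieldSchema("chararray")));
        // Comparing only the top-level type says the schemas match (no cast inserted):
        System.out.println(data.type.equals(user.type));  // true
        // The recursive walk detects the int/chararray mismatch inside the tuple:
        System.out.println(castNeeded(data, user));       // true
    }
}
```

With only the top-level type compared, both sides look like a bag; the deep comparison is what catches the mismatch the issue describes.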
[jira] Updated: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases
[ https://issues.apache.org/jira/browse/PIG-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1369:
--------------------------------
    Status: Patch Available  (was: Open)

POProject does not handle null tuples and non existent fields in some cases
---------------------------------------------------------------------------
                 Key: PIG-1369
                 URL: https://issues.apache.org/jira/browse/PIG-1369
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Pradeep Kamath
            Assignee: Pradeep Kamath
         Attachments: PIG-1369.patch

If a field (which is of type Tuple) in the data is null, POProject throws a NullPointerException. Also, while projecting fields from a bag, if a certain tuple in the bag does not contain a field being projected, an IndexOutOfBoundsException is thrown. Since in a similar situation (accessing a non-existent field in the input tuple) POProject catches the IndexOutOfBoundsException and returns null, it should do the same for the above two cases and in other cases where similar situations occur.
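The proposed behavior can be sketched as follows; safeGet is a hypothetical helper (not POProject's actual code), using a plain List to stand in for a Pig tuple:

```java
import java.util.Arrays;
import java.util.List;

public class ProjectSketch {
    // Returns the field at index, or null when the tuple is null or the
    // field does not exist - mirroring how POProject already handles a
    // non-existent field in the input tuple.
    static Object safeGet(List<?> tuple, int index) {
        if (tuple == null) return null;          // null tuple: return null, not NPE
        try {
            return tuple.get(index);
        } catch (IndexOutOfBoundsException e) {  // missing field: return null
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(safeGet(null, 0));                     // null
        System.out.println(safeGet(Arrays.asList("a", "b"), 5));  // null
        System.out.println(safeGet(Arrays.asList("a", "b"), 1));  // b
    }
}
```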
[jira] Updated: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases
[ https://issues.apache.org/jira/browse/PIG-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1369:
--------------------------------
    Status: Open  (was: Patch Available)

The unit tests all run successfully on my local machine - the Hudson QA failure was due to a transient port conflict. I will resubmit; in the meantime the patch is ready for review.

POProject does not handle null tuples and non existent fields in some cases
---------------------------------------------------------------------------
                 Key: PIG-1369
                 URL: https://issues.apache.org/jira/browse/PIG-1369
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Pradeep Kamath
            Assignee: Pradeep Kamath
         Attachments: PIG-1369.patch

If a field (which is of type Tuple) in the data is null, POProject throws a NullPointerException. Also, while projecting fields from a bag, if a certain tuple in the bag does not contain a field being projected, an IndexOutOfBoundsException is thrown. Since in a similar situation (accessing a non-existent field in the input tuple) POProject catches the IndexOutOfBoundsException and returns null, it should do the same for the above two cases and in other cases where similar situations occur.
[jira] Commented: (PIG-1364) Public javadoc on apache site still on 0.2, needs to be updated for each version release
[ https://issues.apache.org/jira/browse/PIG-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856088#action_12856088 ]

Olga Natkovich commented on PIG-1364:
-------------------------------------
+1 for all the branches

Public javadoc on apache site still on 0.2, needs to be updated for each version release
----------------------------------------------------------------------------------------
                 Key: PIG-1364
                 URL: https://issues.apache.org/jira/browse/PIG-1364
             Project: Pig
          Issue Type: Bug
          Components: documentation
    Affects Versions: 0.4.0, 0.5.0, 0.6.0
            Reporter: Alan Gates
            Assignee: Alan Gates
            Priority: Critical
             Fix For: 0.4.0, 0.5.0, 0.6.0, 0.7.0
         Attachments: PIG-1364-0.4.patch, PIG-1364-0.5.patch, PIG-1364-0.6.patch, PIG-1364-0.7.patch, PIG-1364-trunk.patch

See http://hadoop.apache.org/pig/javadoc/docs/api/. This currently contains javadocs for 0.2. It is also versionless. It needs to be changed so that javadocs for recent versions are posted. It also needs to change so that the version is in the API path, so that multiple versions of the API can be posted. It's probably too late to do this for 0.6 and before, but it needs to happen for 0.7.
[jira] Updated: (PIG-1331) Owl Hadoop Table Management Service
[ https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ajay Kidave updated PIG-1331:
-----------------------------
    Attachment: owl.contrib.4.tar.gz

The test issue was because the jettyrunner jar had not been copied into the contrib/owl/ci directory. The manual steps required are:

cp jetty-runner-7.0.0.pre5.jar contrib/owl/ci
cp log4j-1.2.15.jar contrib/owl/java/lib
chmod +x contrib/owl/bin/owl.sh contrib/owl/ci/derby_initialize_schema.sh contrib/owl/ci/derby_cleanup.sh contrib/owl/ci/jetty_check.sh contrib/owl/ci/jetty_start.sh contrib/owl/ci/jetty_stop.sh

Attached an updated patch with the javadoc and findbugs issues fixed.

Owl Hadoop Table Management Service
-----------------------------------
                 Key: PIG-1331
                 URL: https://issues.apache.org/jira/browse/PIG-1331
             Project: Pig
          Issue Type: New Feature
    Affects Versions: 0.8.0
            Reporter: Jay Tang
         Attachments: anttestoutput.tgz, build.log, ivy_version.patch, owl.contrib.3.tgz, owl.contrib.4.tar.gz

This JIRA is a proposal to create a Hadoop table management service: Owl. Today, MapReduce and Pig applications interact directly with HDFS directories and files and must deal with low-level data management issues such as storage format, serialization/compression schemes, data layout, and efficient data access, often with different solutions. Owl aims to provide a standard way to address these issues and abstracts away the complexity of reading/writing huge amounts of data from/to HDFS. Owl has a data access API that is modeled after the traditional Hadoop InputFormat and a management API to manipulate Owl objects. This JIRA is related to PIG-823 (Hadoop Metadata Service), as Owl has an internal metadata store. Owl integrates with different storage modules like Zebra via a pluggable architecture. Initially, the proposal is to submit Owl as a Pig contrib project. Over time, it makes sense to move it to a Hadoop subproject.
[jira] Commented: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases
[ https://issues.apache.org/jira/browse/PIG-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856096#action_12856096 ]

Daniel Dai commented on PIG-1369:
---------------------------------
+1

POProject does not handle null tuples and non existent fields in some cases
---------------------------------------------------------------------------
                 Key: PIG-1369
                 URL: https://issues.apache.org/jira/browse/PIG-1369
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Pradeep Kamath
            Assignee: Pradeep Kamath
         Attachments: PIG-1369.patch

If a field (which is of type Tuple) in the data is null, POProject throws a NullPointerException. Also, while projecting fields from a bag, if a certain tuple in the bag does not contain a field being projected, an IndexOutOfBoundsException is thrown. Since in a similar situation (accessing a non-existent field in the input tuple) POProject catches the IndexOutOfBoundsException and returns null, it should do the same for the above two cases and in other cases where similar situations occur.
[jira] Commented: (PIG-1361) [Zebra] Zebra TableLoader.getSchema() should return the projectionSchema specified in the constructor of TableLoader instead of the projection pruned by Pig
[ https://issues.apache.org/jira/browse/PIG-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856122#action_12856122 ]

Daniel Dai commented on PIG-1361:
---------------------------------
+1. I tested from Pig; getSchema behaves as described. I didn't look at the code - someone who knows the code should also take a look.

[Zebra] Zebra TableLoader.getSchema() should return the projectionSchema specified in the constructor of TableLoader instead of the projection pruned by Pig
-----------------------------------------------------------------------------------------------------------------------------------------------------------
                 Key: PIG-1361
                 URL: https://issues.apache.org/jira/browse/PIG-1361
             Project: Pig
          Issue Type: Improvement
          Components: impl
    Affects Versions: 0.8.0
            Reporter: Gaurav Jain
            Assignee: Gaurav Jain
            Priority: Minor
             Fix For: 0.8.0
         Attachments: PIG-1361.patch

Pig requests, for consistency among the different TableLoaders, that Zebra TableLoader.getSchema() return the projectionSchema specified in the constructor of TableLoader instead of the projection pruned by Pig.
[jira] Updated: (PIG-1338) Pig should exclude hadoop conf in local mode
[ https://issues.apache.org/jira/browse/PIG-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1338:
----------------------------
    Fix Version/s: 0.8.0

Pig should exclude hadoop conf in local mode
--------------------------------------------
                 Key: PIG-1338
                 URL: https://issues.apache.org/jira/browse/PIG-1338
             Project: Pig
          Issue Type: Improvement
          Components: impl
    Affects Versions: 0.7.0
            Reporter: Daniel Dai
            Assignee: Daniel Dai
             Fix For: 0.8.0
         Attachments: PIG-1338-1.patch, PIG-1338-2.patch, PIG-1338-3.patch, PIG-1338-4.patch, PIG-1338-5.patch, PIG-1338-6.patch

Currently, the behavior for hadoop conf lookup is:
* in local mode, if there is a hadoop conf, bail out; if there is no hadoop conf, launch in local mode
* in hadoop mode, if there is a hadoop conf, use this conf to launch Pig; if not, still launch without warning, but much functionality will go wrong

We should bring this to a more intuitive behavior:
* in local mode, always launch Pig in local mode
* in hadoop mode, if there is a hadoop conf, use this conf to launch Pig; if not, bail out with a meaningful message
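The proposed lookup behavior can be sketched as a small decision function (names here are illustrative, not Pig's actual launcher code):

```java
public class ConfLookup {
    enum ExecMode { LOCAL, HADOOP }

    // Proposed behavior: local mode ignores any hadoop conf entirely;
    // hadoop mode requires a conf and bails out loudly when it is missing.
    static String launchDecision(ExecMode mode, boolean hadoopConfPresent) {
        if (mode == ExecMode.LOCAL)
            return "launch local";  // always, whether a hadoop conf exists or not
        return hadoopConfPresent
            ? "launch with hadoop conf"
            : "bail out with a meaningful message";
    }

    public static void main(String[] args) {
        System.out.println(launchDecision(ExecMode.LOCAL, true));
        System.out.println(launchDecision(ExecMode.HADOOP, false));
    }
}
```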
[jira] Updated: (PIG-1330) Move pruned schema tracking logic from LoadFunc to core code
[ https://issues.apache.org/jira/browse/PIG-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1330:
----------------------------
    Status: Resolved  (was: Patch Available)
    Hadoop Flags: [Reviewed]
    Resolution: Fixed

No unit test, since this is mostly a documentation change; no behavior changes for now, it is just to stay consistent as the code evolves. Committed to both the 0.7 branch and trunk.

Move pruned schema tracking logic from LoadFunc to core code
------------------------------------------------------------
                 Key: PIG-1330
                 URL: https://issues.apache.org/jira/browse/PIG-1330
             Project: Pig
          Issue Type: Improvement
          Components: impl
    Affects Versions: 0.7.0
            Reporter: Daniel Dai
            Assignee: Daniel Dai
             Fix For: 0.7.0
         Attachments: PIG-1330-1.patch

Currently, LoadFunc.getSchema requires a schema after column pruning. The good side of this is that LoadFunc.getSchema matches the data it actually loads, which gives a sense of consistency. However, by doing this, every LoadFunc needs to keep track of the columns pruned. This is an unnecessary burden on the LoadFunc writer and is very error prone. This issue is to move this logic from LoadFunc to Pig core. LoadFunc.getSchema then only needs to return the original schema, even after pruning.
[jira] Updated: (PIG-1199) help includes obsolete options
[ https://issues.apache.org/jira/browse/PIG-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1199:
--------------------------------
    Fix Version/s: 0.8.0
                       (was: 0.7.0)

help includes obsolete options
------------------------------
                 Key: PIG-1199
                 URL: https://issues.apache.org/jira/browse/PIG-1199
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.6.0
            Reporter: Olga Natkovich
            Assignee: Olga Natkovich
             Fix For: 0.8.0

This is confusing to users.
[jira] Assigned: (PIG-803) Pig Latin Reference Manual - discussion of Pig streaming is incomplete
[ https://issues.apache.org/jira/browse/PIG-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich reassigned PIG-803:
----------------------------------
    Assignee: Corinne Chandel  (was: Olga Natkovich)

Looks like the streaming discussion in http://hadoop.apache.org/pig/docs/r0.6.0/piglatin_ref2.html#STREAM is not very clear on how DEFINE is used with streaming. Corinne, could we make the connection clearer? Thanks.

Pig Latin Reference Manual - discussion of Pig streaming is incomplete
----------------------------------------------------------------------
                 Key: PIG-803
                 URL: https://issues.apache.org/jira/browse/PIG-803
             Project: Pig
          Issue Type: Bug
          Components: documentation
            Reporter: David Ciemiewicz
            Assignee: Corinne Chandel
             Fix For: 0.7.0

The Pig Latin Reference Manual section on STREAM (http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_STREAM_) is missing broad swaths of information, such as a discussion of the ship() clause. A more complete definition seems to be here: http://wiki.apache.org/pig/PigStreamingFunctionalSpec However, it discusses auto-shipping of scripts, which doesn't seem to be working.
[jira] Commented: (PIG-1359) bin/pig script does not pick up correct jar libraries
[ https://issues.apache.org/jira/browse/PIG-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856165#action_12856165 ]

Daniel Dai commented on PIG-1359:
---------------------------------
Hi Gianmarco, thanks for your concern. Actually, we need one additional step to make bin/pig work: copy $PIG_HOME/build/pig-0.8.0-dev.jar to $PIG_HOME/pig-0.8.0-core.jar. This is handled in ant's package target when releasing, but if you check out from svn, you need this additional step for bin/pig to work.

bin/pig script does not pick up correct jar libraries
-----------------------------------------------------
                 Key: PIG-1359
                 URL: https://issues.apache.org/jira/browse/PIG-1359
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.8.0
         Environment: Linux Ubuntu 8.10, java-6-sun
            Reporter: Gianmarco De Francisci Morales
            Priority: Trivial
             Fix For: 0.8.0
         Attachments: pig-1359.patch

The bin/pig script tries to load the pig jar libraries from the pig-*-core.jar using this bash fragment:

{code}
# for releases, add core pig to CLASSPATH
for f in $PIG_HOME/pig-*core.jar; do
    CLASSPATH=${CLASSPATH}:$f;
done

# during development pig jar might be in build
for f in $PIG_HOME/build/pig-*-core.jar; do
    CLASSPATH=${CLASSPATH}:$f;
done
{code}

The pig-\*-core.jar does not contain the dependencies for pig that are found in build/ivy/lib/Pig/\*.jar (jline). The script does not even pick up the pig.jar in PIG_HOME that is produced as a result of the ant build process. This results in the following error after successfully building pig:

{code}
Exception in thread "main" java.lang.NoClassDefFoundError: jline/ConsoleReaderInputStream
Caused by: java.lang.ClassNotFoundException: jline.ConsoleReaderInputStream
{code}
[jira] Updated: (PIG-1361) [Zebra] Zebra TableLoader.getSchema() should return the projectionSchema specified in the constructor of TableLoader instead of the projection pruned by Pig
[ https://issues.apache.org/jira/browse/PIG-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1361:
----------------------------
    Status: Resolved  (was: Patch Available)
    Hadoop Flags: [Reviewed]
    Resolution: Fixed

Yan has already reviewed it. Committed the patch to both trunk and the 0.7 branch.

[Zebra] Zebra TableLoader.getSchema() should return the projectionSchema specified in the constructor of TableLoader instead of the projection pruned by Pig
-----------------------------------------------------------------------------------------------------------------------------------------------------------
                 Key: PIG-1361
                 URL: https://issues.apache.org/jira/browse/PIG-1361
             Project: Pig
          Issue Type: Improvement
          Components: impl
    Affects Versions: 0.7.0
            Reporter: Gaurav Jain
            Assignee: Gaurav Jain
            Priority: Minor
             Fix For: 0.8.0
         Attachments: PIG-1361.patch

Pig requests, for consistency among the different TableLoaders, that Zebra TableLoader.getSchema() return the projectionSchema specified in the constructor of TableLoader instead of the projection pruned by Pig.
[jira] Updated: (PIG-1361) [Zebra] Zebra TableLoader.getSchema() should return the projectionSchema specified in the constructor of TableLoader instead of the projection pruned by Pig
[ https://issues.apache.org/jira/browse/PIG-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1361:
----------------------------
    Fix Version/s: 0.7.0
                       (was: 0.8.0)
    Affects Version/s: 0.7.0
                       (was: 0.8.0)
    Description: (whitespace-only change to the existing description)

[Zebra] Zebra TableLoader.getSchema() should return the projectionSchema specified in the constructor of TableLoader instead of the projection pruned by Pig
-----------------------------------------------------------------------------------------------------------------------------------------------------------
                 Key: PIG-1361
                 URL: https://issues.apache.org/jira/browse/PIG-1361
             Project: Pig
          Issue Type: Improvement
          Components: impl
    Affects Versions: 0.7.0
            Reporter: Gaurav Jain
            Assignee: Gaurav Jain
            Priority: Minor
             Fix For: 0.7.0
         Attachments: PIG-1361.patch

Pig requests, for consistency among the different TableLoaders, that Zebra TableLoader.getSchema() return the projectionSchema specified in the constructor of TableLoader instead of the projection pruned by Pig.
[jira] Updated: (PIG-1359) bin/pig script does not pick up correct jar libraries
[ https://issues.apache.org/jira/browse/PIG-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1359:
----------------------------
    Status: Resolved  (was: Patch Available)
    Resolution: Won't Fix

bin/pig script does not pick up correct jar libraries
-----------------------------------------------------
                 Key: PIG-1359
                 URL: https://issues.apache.org/jira/browse/PIG-1359
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.8.0
         Environment: Linux Ubuntu 8.10, java-6-sun
            Reporter: Gianmarco De Francisci Morales
            Priority: Trivial
             Fix For: 0.8.0
         Attachments: pig-1359.patch

The bin/pig script tries to load the pig jar libraries from the pig-*-core.jar using this bash fragment:

{code}
# for releases, add core pig to CLASSPATH
for f in $PIG_HOME/pig-*core.jar; do
    CLASSPATH=${CLASSPATH}:$f;
done

# during development pig jar might be in build
for f in $PIG_HOME/build/pig-*-core.jar; do
    CLASSPATH=${CLASSPATH}:$f;
done
{code}

The pig-\*-core.jar does not contain the dependencies for pig that are found in build/ivy/lib/Pig/\*.jar (jline). The script does not even pick up the pig.jar in PIG_HOME that is produced as a result of the ant build process. This results in the following error after successfully building pig:

{code}
Exception in thread "main" java.lang.NoClassDefFoundError: jline/ConsoleReaderInputStream
Caused by: java.lang.ClassNotFoundException: jline.ConsoleReaderInputStream
{code}
[jira] Commented: (PIG-1359) bin/pig script does not pick up correct jar libraries
[ https://issues.apache.org/jira/browse/PIG-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856179#action_12856179 ]

Daniel Dai commented on PIG-1359:
---------------------------------
The comment change from "# Set the version for Hadoop, default to 17" to "# Set the version for Hadoop, default to 20" is totally valid; we will change it.

bin/pig script does not pick up correct jar libraries
-----------------------------------------------------
                 Key: PIG-1359
                 URL: https://issues.apache.org/jira/browse/PIG-1359
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.8.0
         Environment: Linux Ubuntu 8.10, java-6-sun
            Reporter: Gianmarco De Francisci Morales
            Priority: Trivial
             Fix For: 0.8.0
         Attachments: pig-1359.patch

The bin/pig script tries to load the pig jar libraries from the pig-*-core.jar using this bash fragment:

{code}
# for releases, add core pig to CLASSPATH
for f in $PIG_HOME/pig-*core.jar; do
    CLASSPATH=${CLASSPATH}:$f;
done

# during development pig jar might be in build
for f in $PIG_HOME/build/pig-*-core.jar; do
    CLASSPATH=${CLASSPATH}:$f;
done
{code}

The pig-\*-core.jar does not contain the dependencies for pig that are found in build/ivy/lib/Pig/\*.jar (jline). The script does not even pick up the pig.jar in PIG_HOME that is produced as a result of the ant build process. This results in the following error after successfully building pig:

{code}
Exception in thread "main" java.lang.NoClassDefFoundError: jline/ConsoleReaderInputStream
Caused by: java.lang.ClassNotFoundException: jline.ConsoleReaderInputStream
{code}
[jira] Created: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility
Restore PigInputFormat.sJob for backward compatibility
------------------------------------------------------
                 Key: PIG-1372
                 URL: https://issues.apache.org/jira/browse/PIG-1372
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Pradeep Kamath
            Assignee: Pradeep Kamath
             Fix For: 0.7.0

The preferred method to get the job's Configuration object is to use UDFContext.getJobConf(). This jira is to restore PigInputFormat.sJob (but we will mark it deprecated and indicate that UDFContext.getJobConf() should be used instead) to be backward compatible - we can remove it from pig in a future release.
[jira] Updated: (PIG-1323) Communicate whether the call to LoadFunc.setLocation is being made in hadoop's front end or backend
[ https://issues.apache.org/jira/browse/PIG-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1323:
--------------------------------
    Status: Resolved  (was: Patch Available)
    Resolution: Invalid

There is already a hadoop property, mapred.task.id, which is set to the map/reduce task id in the backend and is not set in the front end; it can be used to figure this out. Hence it is best not to introduce new properties in the configuration for this purpose.

Communicate whether the call to LoadFunc.setLocation is being made in hadoop's front end or backend
---------------------------------------------------------------------------------------------------
                 Key: PIG-1323
                 URL: https://issues.apache.org/jira/browse/PIG-1323
             Project: Pig
          Issue Type: Improvement
    Affects Versions: 0.7.0
            Reporter: Pradeep Kamath
            Assignee: Pradeep Kamath
             Fix For: 0.7.0
         Attachments: PIG-1323.patch

Loaders which interact with external systems like a metadata server may need to know if the LoadFunc.setLocation call happens from the frontend (on the client machine) or in the backend (on each map task). The Configuration in the Job argument to setLocation() can contain this information.
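The check Pradeep describes can be sketched as follows, using a plain Properties object as a stand-in for Hadoop's Configuration (which has an equivalent get method); the task id value shown is illustrative:

```java
import java.util.Properties;

public class FrontendBackendCheck {
    // mapred.task.id is set by Hadoop on each map/reduce task (back end)
    // and is absent on the client machine (front end).
    static boolean isBackend(Properties conf) {
        return conf.getProperty("mapred.task.id") != null;
    }

    public static void main(String[] args) {
        Properties frontend = new Properties();  // no task id on the client
        Properties backend = new Properties();
        backend.setProperty("mapred.task.id", "attempt_201004140000_0001_m_000000_0");
        System.out.println(isBackend(frontend)); // false
        System.out.println(isBackend(backend));  // true
    }
}
```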
[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service
[ https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856208#action_12856208 ]

Alan Gates commented on PIG-1331:
---------------------------------
Now that I've figured out how to read directions, I've run the tests on the new patch and they pass. I've also run javadocs and findbugs and all looks good. I'll start a vote on whether to accept this as a contrib.

Owl Hadoop Table Management Service
-----------------------------------
                 Key: PIG-1331
                 URL: https://issues.apache.org/jira/browse/PIG-1331
             Project: Pig
          Issue Type: New Feature
    Affects Versions: 0.8.0
            Reporter: Jay Tang
         Attachments: anttestoutput.tgz, build.log, ivy_version.patch, owl.contrib.3.tgz, owl.contrib.4.tar.gz

This JIRA is a proposal to create a Hadoop table management service: Owl. Today, MapReduce and Pig applications interact directly with HDFS directories and files and must deal with low-level data management issues such as storage format, serialization/compression schemes, data layout, and efficient data access, often with different solutions. Owl aims to provide a standard way to address these issues and abstracts away the complexity of reading/writing huge amounts of data from/to HDFS. Owl has a data access API that is modeled after the traditional Hadoop InputFormat and a management API to manipulate Owl objects. This JIRA is related to PIG-823 (Hadoop Metadata Service), as Owl has an internal metadata store. Owl integrates with different storage modules like Zebra via a pluggable architecture. Initially, the proposal is to submit Owl as a Pig contrib project. Over time, it makes sense to move it to a Hadoop subproject.
[jira] Updated: (PIG-1363) Unnecessary loadFunc instantiations
[ https://issues.apache.org/jira/browse/PIG-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1363:
----------------------------------
    Attachment: pig-1363.patch

The ideal solution to this problem is to have {{LoadFunc}} implement {{Serializable}}. Then the LoadFunc would be instantiated once, the first time it is needed (in LOLoad), and this one object would be used everywhere. But this would be backward incompatible, as all LoadFunc implementations would then necessarily have to implement Serializable. So, for now, we will live with this. This patch gets rid of the multiple load func instantiations in the front end where they could be avoided without making LoadFunc Serializable. No test cases are needed since this is purely code cleanup and doesn't add/delete/modify any existing functionality, so the current regression tests suffice.

Unnecessary loadFunc instantiations
-----------------------------------
                 Key: PIG-1363
                 URL: https://issues.apache.org/jira/browse/PIG-1363
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Ashutosh Chauhan
             Fix For: 0.8.0
         Attachments: pig-1363.patch

In MRCompiler, loadfuncs are instantiated at multiple locations in different visit methods. This is inconsistent and confusing. LoadFunc should be instantiated in only one place, ideally in LogToPhyTranslation#visit(LOLoad). A getter should be added to POLoad to retrieve this instantiated loadFunc wherever it is needed in later stages of compilation.
[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service
[ https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856239#action_12856239 ]

Alan Gates commented on PIG-1331:
---------------------------------
After looking through the rules, it looks like we don't need a vote on contrib projects. This JIRA serves as the place for people to voice their concerns and vote for or against. I'd like to commit this within a few days unless I hear otherwise.

Owl Hadoop Table Management Service
-----------------------------------
                 Key: PIG-1331
                 URL: https://issues.apache.org/jira/browse/PIG-1331
             Project: Pig
          Issue Type: New Feature
    Affects Versions: 0.8.0
            Reporter: Jay Tang
         Attachments: anttestoutput.tgz, build.log, ivy_version.patch, owl.contrib.3.tgz, owl.contrib.4.tar.gz

This JIRA is a proposal to create a Hadoop table management service: Owl. Today, MapReduce and Pig applications interact directly with HDFS directories and files and must deal with low-level data management issues such as storage format, serialization/compression schemes, data layout, and efficient data access, often with different solutions. Owl aims to provide a standard way to address these issues and abstracts away the complexity of reading/writing huge amounts of data from/to HDFS. Owl has a data access API that is modeled after the traditional Hadoop InputFormat and a management API to manipulate Owl objects. This JIRA is related to PIG-823 (Hadoop Metadata Service), as Owl has an internal metadata store. Owl integrates with different storage modules like Zebra via a pluggable architecture. Initially, the proposal is to submit Owl as a Pig contrib project. Over time, it makes sense to move it to a Hadoop subproject.
[jira] Updated: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility
[ https://issues.apache.org/jira/browse/PIG-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1372: Attachment: PIG-1372.patch The attached patch restores PigInputFormat.sJob - however, it is marked deprecated (as is PigMapReduce.sJobConf for user code) and the javadoc comment directs users to UDFContext.getUDFContext().getJobConf() instead. No tests are included since this simply restores a static variable for backward compatibility and it is not used in Pig code. Restore PigInputFormat.sJob for backward compatibility -- Key: PIG-1372 URL: https://issues.apache.org/jira/browse/PIG-1372 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1372.patch The preferred method to get the job's Configuration object is UDFContext.getJobConf(). This JIRA restores PigInputFormat.sJob (marked deprecated and pointing users to UDFContext.getJobConf() instead) for backward compatibility - we can remove it from Pig in a future release.
[jira] Updated: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility
[ https://issues.apache.org/jira/browse/PIG-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1372: Status: Patch Available (was: Open)
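The backward-compatibility pattern described in PIG-1372 can be sketched as follows. This is a hedged illustration, not Pig's actual source: the old static field is kept but marked @Deprecated, with javadoc pointing at the replacement accessor. JobConf is represented by a plain Object stand-in so the example is self-contained.

```java
// Stand-in for UDFContext: the preferred path to the job's configuration.
class UDFContextSketch {
    private static final UDFContextSketch INSTANCE = new UDFContextSketch();
    private final Object jobConf = new Object(); // stand-in for org.apache.hadoop.mapred.JobConf

    static UDFContextSketch getUDFContext() { return INSTANCE; }

    /** The preferred way to get the job's configuration. */
    Object getJobConf() { return jobConf; }
}

// Stand-in for PigInputFormat: the old static field survives, deprecated.
class PigInputFormatSketch {
    /**
     * @deprecated Use {@code UDFContext.getUDFContext().getJobConf()} instead.
     *             Kept only for backward compatibility; may be removed in a
     *             future release.
     */
    @Deprecated
    static Object sJob = UDFContextSketch.getUDFContext().getJobConf();
}
```

Existing user code compiled against the old field keeps working (with a deprecation warning at recompile time), while new code is steered toward the UDFContext accessor.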
[jira] Commented: (PIG-1370) Marking Pig interfaces for org.apache.pig package
[ https://issues.apache.org/jira/browse/PIG-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856246#action_12856246 ] Alan Gates commented on PIG-1370: - bq. In Expression.java the comment reads - // Because this isn't actually used yet. - Expression class is currently used in LoadMetadata.setPartitionFilter() - I agree though that we should mark this not stable - Evolving perhaps? Changed to Evolving, which matches LoadMetadata. bq. In SortColInfo.java, I think we should not mark this public since this is used entirely inside Pig code and any communication with external StoreFuncs is through ResourceSchema. The same comment holds for SortInfo.java But it is accepted as an arg to one of ResourceSchema's constructors. I think that makes it public, unless we want to say that constructor isn't intended for public use (in which case, why is it public?). bq. Is there a way to mark public methods in a class as internal - for example PigServer.getAliases() - currently this is used by unit tests - should we be exposing this method to the user? If not, can we mark it not public through an annotation? (Is there a different policy, e.g. if there are no javadoc comments for a public method, then it is not truly public?) I don't know how to do this with annotations. I've changed the javadocs to have an initial sentence of "Intended to be used by unit tests only." bq. I think CollectableLoadFunc should be evolving Motion carried, I've changed it to Evolving. bq. ComparisonFunc.java unfortunately already had ctrl-m chars - the new additions in the patch also do - if it isn't extensive, we could remove the ctrl-m chars. Another comment is that per http://wiki.apache.org/pig/Pig070IncompatibleChanges, custom comparators are no longer supported since 0.7 - if so, should this be @deprecated? - I think currently custom comparators don't work in local mode. I did mark ComparisonFunc as deprecated.
Are you saying we should just remove it instead of deprecating it? bq. I hope marking LoadFunc stable will not prevent additions to this abstract class (which should not break backward compatibility if default impls are provided) The definition of stable is that it will work across versions without a recompile of Java code. So this won't prevent growing existing abstract classes. Marking Pig interfaces for org.apache.pig package - Key: PIG-1370 URL: https://issues.apache.org/jira/browse/PIG-1370 Project: Pig Issue Type: Sub-task Components: documentation Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.8.0 Attachments: PIG-1370.patch Done as a separate JIRA from PIG-1311 since this alone contains quite a lot of changes.
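The classification scheme being debated above can be sketched with Hadoop-style stability annotations. This is a hedged illustration only: the annotation types are declared locally so the example is self-contained, and the `*Sketch` names are stand-ins, not Pig's real classes.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

class StabilitySketch {
    @Documented @Retention(RetentionPolicy.RUNTIME)
    @interface Stable {}    // must keep working across versions without recompiling callers

    @Documented @Retention(RetentionPolicy.RUNTIME)
    @interface Evolving {}  // may change incompatibly between releases

    // Per the thread, Expression (and LoadMetadata) end up marked Evolving:
    @Evolving
    static class ExpressionSketch {}

    // LoadFunc stays Stable; per Alan's definition, new methods with default
    // implementations can still be added to a Stable abstract class without
    // breaking binary compatibility for existing subclasses.
    @Stable
    abstract static class LoadFuncSketch {}
}
```

With RUNTIME retention the markings are also inspectable programmatically, e.g. by a compatibility-checking tool.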
[jira] Updated: (PIG-1370) Marking Pig interfaces for org.apache.pig package
[ https://issues.apache.org/jira/browse/PIG-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1370: Attachment: PIG-1370_2.patch New patch with changes based on Pradeep's feedback. Marking Pig interfaces for org.apache.pig package - Key: PIG-1370 URL: https://issues.apache.org/jira/browse/PIG-1370 Project: Pig Issue Type: Sub-task Components: documentation Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.8.0 Attachments: PIG-1370.patch, PIG-1370_2.patch Done as a separate JIRA from PIG-1311 since this alone contains quite a lot of changes.
[jira] Updated: (PIG-1370) Marking Pig interfaces for org.apache.pig package
[ https://issues.apache.org/jira/browse/PIG-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1370: Status: Open (was: Patch Available)
[jira] Updated: (PIG-1370) Marking Pig interfaces for org.apache.pig package
[ https://issues.apache.org/jira/browse/PIG-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1370: Status: Patch Available (was: Open)
[jira] Assigned: (PIG-1363) Unnecessary loadFunc instantiations
[ https://issues.apache.org/jira/browse/PIG-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan reassigned PIG-1363: - Assignee: Ashutosh Chauhan Unnecessary loadFunc instantiations --- Key: PIG-1363 URL: https://issues.apache.org/jira/browse/PIG-1363 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.8.0 Attachments: pig-1363.patch In MRCompiler, loadFuncs are instantiated at multiple locations in different visit methods. This is inconsistent and confusing. LoadFunc should be instantiated in only one place, ideally in LogToPhyTranslation#visit(LOLoad). A getter should be added to POLoad to retrieve this instantiated loadFunc wherever it is needed in later stages of compilation.
[jira] Updated: (PIG-1363) Unnecessary loadFunc instantiations
[ https://issues.apache.org/jira/browse/PIG-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated PIG-1363: -- Status: Patch Available (was: Open)
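The refactoring PIG-1363 proposes can be sketched as follows. This is a hedged illustration with minimal stand-in types (not Pig's real LoadFunc/POLoad): the loader is constructed exactly once during logical-to-physical translation, stored on the load operator, and later compilation stages fetch it through a getter instead of re-instantiating it.

```java
// Stand-in for Pig's LoadFunc; the single method is illustrative only.
interface LoaderSketch {
    String relativeToAbsolutePath(String location);
}

// Stand-in for POLoad: owns the one-and-only LoadFunc instance.
class POLoadSketch {
    private final LoaderSketch loadFunc; // created once, in LogToPhyTranslation#visit(LOLoad)

    POLoadSketch(LoaderSketch loadFunc) {
        this.loadFunc = loadFunc;
    }

    /** Later stages (e.g. MRCompiler visit methods) reuse this instance. */
    LoaderSketch getLoadFunc() {
        return loadFunc;
    }
}
```

The getter guarantees every stage of compilation sees the same object, so any state the loader accumulates (configuration, schema) is not silently lost between visit methods.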
[jira] Updated: (PIG-1364) Public javadoc on apache site still on 0.2, needs to be updated for each version release
[ https://issues.apache.org/jira/browse/PIG-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1364: Status: Resolved (was: Patch Available) Resolution: Fixed All of the patches checked in. Confirmed that for 0.4, 0.5, and 0.6 on the website the link now points to the version specific docs. Public javadoc on apache site still on 0.2, needs to be updated for each version release Key: PIG-1364 URL: https://issues.apache.org/jira/browse/PIG-1364 Project: Pig Issue Type: Bug Components: documentation Affects Versions: 0.4.0, 0.5.0, 0.6.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.7.0, 0.6.0, 0.5.0, 0.4.0 Attachments: PIG-1364-0.4.patch, PIG-1364-0.5.patch, PIG-1364-0.6.patch, PIG-1364-0.7.patch, PIG-1364-trunk.patch See http://hadoop.apache.org/pig/javadoc/docs/api/. This currently contains javadocs for 0.2. It is also versionless. It needs to be changed so that javadocs for recent versions are posted. It also needs to change so that the version is in the api so that multiple versions of the API can be posted. It's probably too late to do this for 0.6 and before, but it needs to happen for 0.7.
[jira] Created: (PIG-1373) We need to add jdiff output to docs on the website
We need to add jdiff output to docs on the website -- Key: PIG-1373 URL: https://issues.apache.org/jira/browse/PIG-1373 Project: Pig Issue Type: Bug Reporter: Alan Gates Assignee: Alan Gates Priority: Minor Fix For: 0.8.0 Our build process constructs a jdiff between APIs for different versions, but we don't post the results to the website when we deploy the docs. We should, in order to help users understand changes across versions of Pig.
[jira] Created: (PIG-1374) Order by fails with java.lang.String cannot be cast to org.apache.pig.data.DataBag
Order by fails with java.lang.String cannot be cast to org.apache.pig.data.DataBag -- Key: PIG-1374 URL: https://issues.apache.org/jira/browse/PIG-1374 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0, 0.7.0 Reporter: Viraj Bhat The script loads data via BinStorage(), flattens columns, and then sorts on the second column in descending order. The order by fails with a ClassCastException:
{code}
register loader.jar;
a = load 'c2' using BinStorage();
b = foreach a generate org.apache.pig.CCMLoader(*);
describe b;
c = foreach b generate flatten($0);
describe c;
d = order c by $1 desc;
dump d;
{code}
The sampling job fails with the following error:
===
java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.DataBag
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:407)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:188)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:329)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:232)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:227)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:52)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:159)
===
The schemas for b, c, and d are as follows:
b: {bag_of_tuples: {tuple: (uuid: chararray,velocity: double)}}
c: {bag_of_tuples::uuid: chararray,bag_of_tuples::velocity: double}
d: {bag_of_tuples::uuid: chararray,bag_of_tuples::velocity: double}
If we modify this script to order on the first column, it seems to work:
{code}
register loader.jar;
a = load 'c2' using BinStorage();
b = foreach a generate org.apache.pig.CCMLoader(*);
describe b;
c = foreach b generate flatten($0);
describe c;
d = order c by $0 desc;
dump d;
{code}
(gc639c60-4267-11df-9879-0800200c9a66,2.4227339503478493) (ec639c60-4267-11df-9879-0800200c9a66,1.140175425099138)
There is a workaround: do a projection before the ORDER:
{code}
register loader.jar;
a = load 'c2' using BinStorage();
b = foreach a generate org.apache.pig.CCMLoader(*);
describe b;
c = foreach b generate flatten($0);
describe c;
newc = foreach c generate $0 as uuid, $1 as velocity;
newd = order newc by velocity desc;
dump newd;
{code}
(gc639c60-4267-11df-9879-0800200c9a66,2.4227339503478493) (ec639c60-4267-11df-9879-0800200c9a66,1.140175425099138)
The outputSchema for the loader is as follows:
{code}
public Schema outputSchema(Schema input) {
    try {
        List<Schema.FieldSchema> list = new ArrayList<Schema.FieldSchema>();
        list.add(new Schema.FieldSchema("uuid", DataType.CHARARRAY));
        list.add(new Schema.FieldSchema("velocity", DataType.DOUBLE));
        Schema tupleSchema = new Schema(list);
        Schema.FieldSchema tupleFs = new Schema.FieldSchema("tuple", tupleSchema, DataType.TUPLE);
        Schema bagSchema = new Schema(tupleFs);
        bagSchema.setTwoLevelAccessRequired(true);
        Schema.FieldSchema bagFs = new Schema.FieldSchema("bag_of_tuples", bagSchema, DataType.BAG);
        return new Schema(bagFs);
    } catch (Exception e) {
        return null;
    }
}
{code}
[jira] Assigned: (PIG-1374) Order by fails with java.lang.String cannot be cast to org.apache.pig.data.DataBag
[ https://issues.apache.org/jira/browse/PIG-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich reassigned PIG-1374: --- Assignee: Daniel Dai
[jira] Updated: (PIG-1374) Order by fails with java.lang.String cannot be cast to org.apache.pig.data.DataBag
[ https://issues.apache.org/jira/browse/PIG-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1374: Fix Version/s: 0.7.0