[jira] Commented: (PIG-724) Treating integers and strings in PigStorage

2009-04-08 Alan Gates (JIRA)

[ https://issues.apache.org/jira/browse/PIG-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697054#action_12697054 ]

Alan Gates commented on PIG-724:


Currently Pig doesn't require that all keys and values in a map share the same 
type.  There is a proposal to change it so that key types can only be chararray 
(see PIG-734), as we don't see anyone using anything but chararray and the 
generality is causing us some other issues.  But we still wouldn't require that 
all values in a given map be of the same type.  Are you proposing allowing 
users to put a constraint on a given map so that all values in that particular 
map must be of that type?

 Treating integers and strings in PigStorage
 ---

 Key: PIG-724
 URL: https://issues.apache.org/jira/browse/PIG-724
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.1
Reporter: Santhosh Srinivasan
 Fix For: 0.2.1


 Currently, PigStorage treats the materialized string 123 as an integer 
 with the value 123. If the user intended this to be the string 123, 
 PigStorage cannot deal with it. The same reasoning applies to doubles. As a 
 result, when a map's values are all meant to be of one type but some of them 
 manifest the issue described at the beginning of this paragraph, Pig throws 
 its hands up at runtime. An example will help illustrate the problem.
 In the example below, a sample row in the data (map.txt) contains the 
 following:
 [key01#35,key02#value01]
 When Pig tries to convert the stream to a map, it creates a 
 Map<Object, Object> where the key is a string and the value is an integer. 
 Running the script shown below results in a run-time error.
 {code}
 grunt> a = load 'map.txt' as (themap: map[]);
 grunt> b = filter a by (chararray)(themap#'key01') == 'hello';
 grunt> dump b;
 2009-03-18 15:19:03,773 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
 2009-03-18 15:19:28,797 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Map reduce job failed
 2009-03-18 15:19:28,817 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1081: Cannot cast to chararray. Expected bytearray but received: int
 {code}
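To make the failure concrete, here is a small Java model of what the description says happens: the map conversion guesses each value's type from its text, so key01's value becomes an int, and a cast that only expects raw bytes then rejects it. This is a sketch under stated assumptions: the class `MapTypeInferenceDemo` and its methods are hypothetical and are not Pig's actual `bytesToMap` or `POCast` code, and the int-then-double-then-string guessing order is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the PIG-724 failure; NOT Pig's real bytesToMap or POCast.
class MapTypeInferenceDemo {

    // Guess a value's type from its text (assumed order: int, then double, then string),
    // so "35" becomes an Integer while "value01" stays a String.
    static Object inferValue(String text) {
        try {
            return Integer.valueOf(text);
        } catch (NumberFormatException notAnInt) {
            // not an integer; try a double next
        }
        try {
            return Double.valueOf(text);
        } catch (NumberFormatException notADouble) {
            return text;
        }
    }

    // Parse a row such as "[key01#35,key02#value01]" into a map with String keys
    // and inferred value types (no escaping or nesting is handled).
    static Map<String, Object> parseMap(String row) {
        Map<String, Object> result = new LinkedHashMap<>();
        String body = row.substring(1, row.length() - 1); // strip the surrounding [ and ]
        for (String entry : body.split(",")) {
            int hash = entry.indexOf('#');
            result.put(entry.substring(0, hash), inferValue(entry.substring(hash + 1)));
        }
        return result;
    }

    // A cast that, like the failing run, only accepts raw bytes (bytearray).
    static String strictCastToChararray(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof byte[]) {
            return new String((byte[]) value, StandardCharsets.UTF_8);
        }
        throw new ClassCastException("Cannot cast to chararray. Expected bytearray but received: "
                + value.getClass().getSimpleName());
    }

    public static void main(String[] args) {
        Map<String, Object> themap = parseMap("[key01#35,key02#value01]");
        System.out.println(themap); // {key01=35, key02=value01}
        try {
            strictCastToChararray(themap.get("key01"));
        } catch (ClassCastException e) {
            System.out.println(e.getMessage()); // rejected: the value is an Integer, not bytes
        }
    }
}
```

The point of the model is that the map itself is well formed; the error only surfaces later, when a cast written for bytearray input meets a value whose type was already guessed.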
 There are two ways to resolve this issue:
 1. Change the conversion routine for bytesToMap to return a map where the 
 value is a bytearray and not the actual type. This change breaks backward 
 compatibility.
 2. Introduce checks in POCast so that conversions that are legal for the type 
 checker are also allowed at run time, i.e., run-time checks will verify that 
 a cast is compatible with the value's actual type. In the above example, an 
 int can be converted to a chararray, so the cast will be made. If, on the 
 other hand, it were a chararray to int conversion, an exception would be thrown.
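Option 2 can be sketched as a cast that inspects the value's actual run-time type before deciding. The sketch below assumes the rules stated in the description (int to chararray is allowed, chararray to int is rejected, raw bytes are parsed as before); the class `RuntimeCastCheck` and its methods are hypothetical and are not Pig's `POCast`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of option 2: check cast compatibility at run time.
// NOT Pig's real POCast; the rules follow the issue description.
class RuntimeCastCheck {

    // Cast to chararray: bytes are decoded, strings pass through,
    // numbers use their string form (int 35 -> "35"); anything else is rejected.
    static String castToChararray(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof byte[]) {
            return new String((byte[]) value, StandardCharsets.UTF_8);
        }
        if (value instanceof String) {
            return (String) value;
        }
        if (value instanceof Integer || value instanceof Long
                || value instanceof Float || value instanceof Double) {
            return String.valueOf(value);
        }
        throw new ClassCastException("Cannot cast " + value.getClass().getSimpleName() + " to chararray");
    }

    // Cast to int: bytes are parsed, ints pass through, but a value that is
    // already a chararray (String) is rejected, as the description proposes.
    static Integer castToInt(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof byte[]) {
            return Integer.valueOf(new String((byte[]) value, StandardCharsets.UTF_8).trim());
        }
        if (value instanceof Integer) {
            return (Integer) value;
        }
        throw new ClassCastException("Cannot cast " + value.getClass().getSimpleName() + " to int");
    }

    public static void main(String[] args) {
        Object key01 = 35; // the value the map conversion inferred as an int
        System.out.println(castToChararray(key01).equals("hello")); // false, and no error
        try {
            castToInt("value01");
        } catch (ClassCastException e) {
            System.out.println(e.getMessage()); // Cannot cast String to int
        }
    }
}
```

Compared with option 1, this keeps the existing inferred types (and so backward compatibility) at the cost of a type test on every cast.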

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-724) Treating integers and strings in PigStorage

2009-04-04 Jeff Zhang (JIRA)

[ https://issues.apache.org/jira/browse/PIG-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695724#action_12695724 ]

Jeff Zhang commented on PIG-724:


In my opinion, it would be better if we could explicitly declare the types of 
a map's keys and values, like this:

A = LOAD 'file:data/c.txt' USING PigStorage() AS (themap: map[key:chararray#value:chararray]);
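A rough Java sketch of what a declared value type would change: instead of guessing each value's type from its text, the loader converts every value to the declared type, so with chararray declared, "35" stays the string "35". The `map[key:chararray#value:chararray]` syntax above is only a proposal; `DeclaredMapSchema` and `ValueType` are hypothetical names, not an existing Pig API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a map with a declared value type; NOT an existing Pig API.
class DeclaredMapSchema {

    enum ValueType { CHARARRAY, INT, DOUBLE }

    // Convert one value's text according to the declared type instead of guessing.
    static Object convert(String text, ValueType type) {
        switch (type) {
            case INT:
                return Integer.valueOf(text);
            case DOUBLE:
                return Double.valueOf(text);
            default:
                return text; // CHARARRAY: "35" stays the string "35"
        }
    }

    // Parse a row such as "[key01#35,key02#value01]"; every value gets the declared type.
    static Map<String, Object> parseMap(String row, ValueType valueType) {
        Map<String, Object> result = new LinkedHashMap<>();
        String body = row.substring(1, row.length() - 1); // strip the surrounding [ and ]
        for (String entry : body.split(",")) {
            int hash = entry.indexOf('#');
            result.put(entry.substring(0, hash), convert(entry.substring(hash + 1), valueType));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> themap = parseMap("[key01#35,key02#value01]", ValueType.CHARARRAY);
        // Both values are Strings, so comparing key01 with 'hello' cannot hit a type mismatch.
        System.out.println(themap.get("key01") instanceof String); // true
    }
}
```

This also corresponds to the per-map constraint asked about in the comment above: declaring a value type fixes the type of every value in that map, and a value that does not fit the declared type (e.g. "value01" under INT) would fail at load time rather than at the cast.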



