[jira] Commented: (PIG-724) Treating integers and strings in PigStorage

2009-04-08 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697054#action_12697054
 ] 

Alan Gates commented on PIG-724:


Currently Pig doesn't require that all keys and values in a map share the same 
type.  There is a proposal to change it so that key types can only be chararray 
(see PIG-734), as we don't see anyone using anything but chararray and the 
generality is causing us some other issues.  But we still wouldn't require that 
all values in a given map be of the same type.  Are you proposing allowing 
users to put a constraint on a given map so that all values in that particular 
map must be of that type?

> Treating integers and strings in PigStorage
> ---
>
> Key: PIG-724
> URL: https://issues.apache.org/jira/browse/PIG-724
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.2.1
> Reporter: Santhosh Srinivasan
> Fix For: 0.2.1
>
>
> Currently, PigStorage treats the materialized string 123 as an integer 
> with the value 123. If the user intended this to be the string 123, 
> PigStorage cannot deal with it. This reasoning also applies to doubles. 
> Because of this, when a map contains values that are meant to be of the 
> same type but that manifest the issue described above, Pig throws its 
> hands up at runtime. An example will help illustrate the problem.
> In the example below, a sample row in the data (map.txt) contains the 
> following:
> [key01#35,key02#value01]
> When Pig tries to convert the stream to a map, it creates a Map object 
> in which the keys are strings and the value for key01 is an integer. 
> Running the script shown below results in a run-time error.
> {code}
> grunt> a = load 'map.txt' as (themap: map[]);
> grunt> b = filter a by (chararray)(themap#'key01') == 'hello';
>   
> grunt> dump b;
> 2009-03-18 15:19:03,773 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - 0% complete
> 2009-03-18 15:19:28,797 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - Map reduce job failed
> 2009-03-18 15:19:28,817 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1081: Cannot cast to chararray. Expected bytearray but received: int
> {code} 
> There are two ways to resolve this issue:
> 1. Change the conversion routine for bytesToMap to return a map where the 
> value is a bytearray and not the actual type. This change breaks backward 
> compatibility.
> 2. Introduce checks in POCast so that conversions that are legal in the 
> type-checking world are allowed, i.e., run-time checks will be made for 
> compatible casts. In the above example, an int can be converted to a 
> chararray, so the cast will be made. If, on the other hand, it were a 
> chararray-to-int conversion, an exception would be thrown.
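
The behaviour described in the issue, and the run-time check proposed as option 2, can be sketched in plain Java. This is only an illustration under stated assumptions: the class MapCastSketch and the methods inferValue, parseMap, castToChararray and castToInt are hypothetical names and are not Pig's actual bytesToMap or POCast code. The sketch assumes that values are materialized as Integer or Double when they parse as numbers, and as String otherwise.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names are Pig's actual classes or methods.
class MapCastSketch {

    // Mimics the behaviour described in the issue: a value that looks like a
    // number is materialized as a number, anything else stays a String.
    static Object inferValue(String raw) {
        try {
            return Integer.valueOf(raw);
        } catch (NumberFormatException notAnInt) {
            try {
                return Double.valueOf(raw);
            } catch (NumberFormatException notADouble) {
                return raw;
            }
        }
    }

    // Parses one row of the form [key#value,key#value] into a map.
    static Map<String, Object> parseMap(String row) {
        Map<String, Object> result = new LinkedHashMap<>();
        String body = row.substring(1, row.length() - 1); // strip '[' and ']'
        if (body.isEmpty()) {
            return result;
        }
        for (String entry : body.split(",")) {
            String[] kv = entry.split("#", 2);
            if (kv.length == 2) { // skip malformed entries without '#'
                result.put(kv[0], inferValue(kv[1]));
            }
        }
        return result;
    }

    // Option 2, first half: a number may be converted to a chararray at run time.
    static String castToChararray(Object value) {
        if (value == null || value instanceof String) {
            return (String) value;
        }
        if (value instanceof Number) {
            return value.toString();
        }
        throw new IllegalArgumentException(
                "Cannot cast to chararray: " + value.getClass().getSimpleName());
    }

    // Option 2, second half: per the issue text, chararray to int is rejected.
    static Integer castToInt(Object value) {
        if (value == null || value instanceof Integer) {
            return (Integer) value;
        }
        throw new IllegalArgumentException(
                "Cannot cast to int: " + value.getClass().getSimpleName());
    }

    public static void main(String[] args) {
        Map<String, Object> themap = parseMap("[key01#35,key02#value01]");
        // key01 was materialized as an Integer, which is what trips the cast.
        System.out.println(themap.get("key01").getClass().getSimpleName());
        // With the run-time check, the filter comparison can be evaluated.
        System.out.println(castToChararray(themap.get("key01")).equals("hello"));
    }
}
```

With a check of this kind, the filter in the example would compare the string 35 with 'hello' and evaluate to false instead of failing the job.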

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-724) Treating integers and strings in PigStorage

2009-04-04 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695724#action_12695724
 ] 

Jeff Zhang commented on PIG-724:


In my opinion, it would be better if we could explicitly declare the key and 
value types of a map.

For example:

A = LOAD 'file:data/c.txt' USING PigStorage() AS (themap: 
map[key:chararray#value:chararray]);
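
What a declared value type would have to enforce can be sketched in Java. This is a rough illustration only: TypedMapSketch and valuesConformTo are hypothetical names, the declaration syntax above is a proposal rather than an existing Pig feature, and mapping chararray to java.lang.String is an assumption made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not part of Pig.
class TypedMapSketch {

    // Returns true when every non-null value in the map is an instance of
    // the declared value type.
    static boolean valuesConformTo(Map<String, ?> map, Class<?> valueType) {
        for (Object value : map.values()) {
            if (value != null && !valueType.isInstance(value)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The row [key01#35,key02#value01] as it is materialized today,
        // according to the issue description: one int value, one string value.
        Map<String, Object> themap = new LinkedHashMap<>();
        themap.put("key01", 35);
        themap.put("key02", "value01");

        // Declared as chararray values: the int value violates the constraint.
        System.out.println(valuesConformTo(themap, String.class));
        // No constraint (any value type): the map is accepted.
        System.out.println(valuesConformTo(themap, Object.class));
    }
}
```

A loader that knew the declared value type could instead convert each value to that type while parsing, which would avoid the mismatch rather than only detecting it.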



