[ https://issues.apache.org/jira/browse/PIG-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783733#action_12783733 ]
Olga Natkovich commented on PIG-1031:
-------------------------------------

For starters, we should stop interpreting the data in PigStorage.

> PigStorage interpreting chararray/bytearray for a tuple element inside a bag
> as float or double
> -----------------------------------------------------------------------------------------------
>
>                 Key: PIG-1031
>                 URL: https://issues.apache.org/jira/browse/PIG-1031
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.5.0
>            Reporter: Viraj Bhat
>             Fix For: 0.6.0
>
>
> I have data stored in a text file as:
> {(4153E765)}
> {(AF533765)}
> I try reading it using PigStorage as:
> {code}
> A = load 'pigstoragebroken.dat' using PigStorage() as
> (intersectionBag:bag{T:tuple(term:bytearray)});
> dump A;
> {code}
> I get the following results:
> ({(Infinity)})
> ({(AF533765)})
> The problem seems to be with the method parseFromBytes(byte[] b) in class
> Utf8StorageConverter. This method uses the TextDataParser (a class generated
> via jjt) to infer the type of the data from its content, even though the
> schema says it is a bytearray.
> TextDataParser.jjt sample code
> {code}
> TOKEN :
> {
> ...
> < DOUBLENUMBER: (["-","+"])? <FLOATINGPOINT> ( ["e","E"] ([ "-","+"])? <FLOATINGPOINT> )?>
> < FLOATNUMBER: <DOUBLENUMBER> (["f","F"])? >
> ...
> }
> {code}
> I tried the following options, but they do not work either, because we still
> need to call bytesToBag(byte[] b) in the Utf8StorageConverter class.
> {code}
> A = load 'pigstoragebroken.dat' using PigStorage() as
> (intersectionBag:bag{T:tuple(term)});
> A = load 'pigstoragebroken.dat' using PigStorage() as
> (intersectionBag:bag{T:tuple(term:chararray)});
> {code}
> Viraj
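
To illustrate the misparse reported in the issue: the value 4153E765 has the shape of a floating-point literal (4153 times 10 to the power 765), which overflows a double, while AF533765 does not parse as a number at all. The sketch below is standalone Java and not Pig code. The class SchemaAwareParseSketch, the FieldType enum, and both methods are hypothetical names made up for this illustration. It contrasts guessing the type from the content with letting the declared type decide.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names are Pig APIs.
class SchemaAwareParseSketch {

    // Declared field types, a simplified stand-in for a load schema.
    enum FieldType { BYTEARRAY, CHARARRAY, DOUBLE }

    // Content-based guessing, similar in spirit to what the issue describes:
    // anything that looks like a floating-point literal becomes a Double.
    static Object guessFromContent(String raw) {
        try {
            return Double.parseDouble(raw);
        } catch (NumberFormatException e) {
            return raw; // not numeric, keep the text
        }
    }

    // Schema-driven conversion: the declared type decides, not the content.
    static Object convertBySchema(String raw, FieldType declared) {
        switch (declared) {
            case DOUBLE:
                return Double.parseDouble(raw);
            case CHARARRAY:
                return raw;
            case BYTEARRAY:
            default:
                return raw.getBytes(StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"4153E765", "AF533765"}) {
            Object guessed = guessFromContent(raw);
            byte[] kept = (byte[]) convertBySchema(raw, FieldType.BYTEARRAY);
            System.out.println(raw + " guessed=" + guessed
                    + " bySchema=" + new String(kept, StandardCharsets.UTF_8));
        }
    }
}
```

With the declared type in control, both sample values come back unchanged as bytes. Whether the actual fix should live in PigStorage or in Utf8StorageConverter is a separate question that this sketch does not try to answer.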