[jira] Commented: (PIG-821) simulate NTILE(n) , rank() functionality in pig

2009-05-28 Thread Rekha (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713942#action_12713942
 ] 

Rekha commented on PIG-821:
---

Hi,

I searched the jars and am trying statistics.QUANTILES for NTILE(). Not sure 
if it will work; I am still trying.
For rank(), please suggest.
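
For illustration only, here is a rough sketch of what an NTILE-style UDF over an 
already-sorted bag might look like (the class name, constructor handling, and 
bucket math are assumptions, not an existing Pig built-in); a rank() UDF could 
follow the same pattern, emitting the 1-based row position instead of a bucket 
number:

{code}
// Hypothetical sketch only -- not an existing Pig built-in.
// Assigns Oracle-style NTILE bucket numbers (1..n) to the tuples of a bag
// that is assumed to be sorted already.
import java.io.IOException;
import java.util.Iterator;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class Ntile extends EvalFunc<DataBag> {
    private final int n;

    // Pig passes DEFINE constructor arguments as strings.
    public Ntile(String buckets) {
        this.n = Integer.parseInt(buckets);
    }

    @Override
    public DataBag exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        DataBag sorted = (DataBag) input.get(0);   // expected to be ordered upstream
        long total = sorted.size();
        long small = total / n;                    // base bucket size
        long remainder = total % n;                // first buckets get one extra row
        long bigRows = remainder * (small + 1);    // rows covered by the larger buckets

        DataBag out = BagFactory.getInstance().newDefaultBag();
        long row = 0;
        for (Iterator<Tuple> it = sorted.iterator(); it.hasNext(); row++) {
            Tuple t = it.next();
            long bucket = (row < bigRows)
                    ? row / (small + 1) + 1
                    : remainder + (row - bigRows) / small + 1;
            Tuple result = TupleFactory.getInstance().newTuple();
            for (int i = 0; i < t.size(); i++) {
                result.append(t.get(i));
            }
            result.append((int) bucket);           // bucket number appended as last field
            out.add(result);
        }
        return out;
    }
}
{code}

The sketch assumes the bag was sorted beforehand (for example with a nested ORDER 
inside a FOREACH), so the UDF only has to walk it once.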

Thanks & Regards,
/Rekha.

 simulate NTILE(n) , rank() functionality in pig
 ---

 Key: PIG-821
 URL: https://issues.apache.org/jira/browse/PIG-821
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.2.0
 Environment: mithril gold -gateway 4000
Reporter: Rekha
 Fix For: 0.2.0


 Hi,
 I came across a job with some processing that I can't seem to get easily 
 out-of-the-box from Pig.
 These are the NTILE()/rank() operations available in Oracle.
 I am trying to write a UDF, but that is not working out too well for me yet. :(
 I have an ntile(n) over (partition by x, y, z order by a desc, b desc) 
 operation to be done in Pig scripts.
 Is there a default function in Pig scripting which can do this?
 For example, let's consider the simple example at 
 http://download.oracle.com/docs/cd/B14117_01/server.101/b10759/functions091.htm
 So here, what would we ideally substitute NTILE() with? Is there any Pig 
 counterpart function/UDF?
 SELECT last_name, salary, NTILE(4) OVER (ORDER BY salary DESC) 
AS quartile FROM employees
WHERE department_id = 100;
  
 LAST_NAME    SALARY   QUARTILE
 ---------    ------   --------
 Greenberg     12000          1
 Faviet         9000          1
 Chen           8200          2
 Urman          7800          2
 Sciarra        7700          3
 Popp           6900          4
  
 In my real case, I have ntile over multiple columns, so an ideal way to find 
 histograms/boundaries and spit out the bucket number is needed.
 Similarly, a Pig function is required for rank() over (partition by a,b,c order 
 by d desc) as e.
 Please let me know soon.
 Thanks & Regards,
 /Rekha

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-619) Dumping empty results produces Unable to get results for /tmp/temp-1964806069/tmp256878619 org.apache.pig.builtin.BinStorage message

2009-05-28 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-619:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch checked in.

 Dumping empty results produces Unable to get results for 
 /tmp/temp-1964806069/tmp256878619  org.apache.pig.builtin.BinStorage message
 ---

 Key: PIG-619
 URL: https://issues.apache.org/jira/browse/PIG-619
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
 Environment: Hadoop 18, Multi-node hadoop installation
Reporter: Viraj Bhat
Assignee: Alan Gates
 Fix For: 0.3.0

 Attachments: mydata.txt, PIG-619.patch, tmpfileload.pig


 The following Pig script stores empty filter results into the 'emptyfilteredlogs' 
 HDFS dir. It later reloads this data from the empty HDFS dir for additional 
 grouping and counting. It has been observed that this script succeeds on a 
 single-node Hadoop installation with the following message, since the alias 
 COUNT_EMPTYFILTERED_LOGS contains empty data.
 ==
 2009-01-13 21:47:08,988 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Success!
 ==
 But on a multi-node Hadoop installation, the script fails with the following 
 error:
 ==
 2009-01-13 13:48:34,602 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Success!
 java.io.IOException: Unable to open iterator for alias: 
 COUNT_EMPTYFILTERED_LOGS [Unable to get results for 
 /tmp/temp-1964806069/tmp256878619:org.apache.pig.builtin.BinStorage]
 at 
 org.apache.pig.backend.hadoop.executionengine.HJob.getResults(HJob.java:74)
 at org.apache.pig.PigServer.openIterator(PigServer.java:408)
 at 
 org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
 at org.apache.pig.Main.main(Main.java:306)
 Caused by: org.apache.pig.backend.executionengine.ExecException: Unable to 
 get results for 
 /tmp/temp-1964806069/tmp256878619:org.apache.pig.builtin.BinStorage
 ... 7 more
 Caused by: java.io.IOException: /tmp/temp-1964806069/tmp256878619 does not 
 exist
 at 
 org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:188)
 at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:291)
 at 
 org.apache.pig.backend.hadoop.executionengine.HJob.getResults(HJob.java:69)
 ... 6 more
 ==
 {code}
 RAW_LOGS = load 'mydata.txt' as (url:chararray, numvisits:int);
 RAW_LOGS = limit RAW_LOGS 2;
 FILTERED_LOGS = filter RAW_LOGS by numvisits < 0;
 store FILTERED_LOGS into 'emptyfilteredlogs' using PigStorage();
 EMPTY_FILTERED_LOGS = load 'emptyfilteredlogs' as (url:chararray, 
 numvisits:int);
 GROUP_EMPTYFILTERED_LOGS = group EMPTY_FILTERED_LOGS by numvisits;
 COUNT_EMPTYFILTERED_LOGS = foreach GROUP_EMPTYFILTERED_LOGS generate
  group, COUNT(EMPTY_FILTERED_LOGS);
 explain COUNT_EMPTYFILTERED_LOGS;
 dump COUNT_EMPTYFILTERED_LOGS;
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Proposed design for new merge join in pig

2009-05-28 Thread Alan Gates

http://wiki.apache.org/pig/PigMergeJoin

Alan.


[jira] Updated: (PIG-819) run -param -param; is a valid grunt command

2009-05-28 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-819:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch reviewed and committed. Thanks, Milind!

 run -param -param; is a valid grunt command
 ---

 Key: PIG-819
 URL: https://issues.apache.org/jira/browse/PIG-819
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.3.0
 Environment: all
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
 Attachments: invalidparam.patch


 By mistake, I typed 
 {code}
 run -param -param;
 {code}
 in grunt, and was surprised to find that it is a valid grunt command.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-796) support conversion from numeric types to chararray

2009-05-28 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714209#action_12714209
 ] 

Ashutosh Chauhan commented on PIG-796:
--

Since Pig allows values in a map to be of different types, caching the type may 
not be safe. There are two possible alternatives:

a) Find the type by introspection every time. This ensures we are always 
correct and can handle all cases (including when values in maps are of 
different types), but it incurs a performance overhead for every cast call.
b) Find the type the first time and cache it for subsequent calls. When a 
different type is encountered, Pig will bail out with a ClassCastException. 
This avoids the performance overhead, but Pig will die when values in maps are 
of different types.

In this performance vs. handling-all-cases trade-off, which route should we go?

 support  conversion from numeric types to chararray
 ---

 Key: PIG-796
 URL: https://issues.apache.org/jira/browse/PIG-796
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-796) support conversion from numeric types to chararray

2009-05-28 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714212#action_12714212
 ] 

Olga Natkovich commented on PIG-796:


I think we should be safe and check the type for every value.

 support  conversion from numeric types to chararray
 ---

 Key: PIG-796
 URL: https://issues.apache.org/jira/browse/PIG-796
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-796) support conversion from numeric types to chararray

2009-05-28 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714224#action_12714224
 ] 

Alan Gates commented on PIG-796:


Can options a and b not be combined?  Could we cache the type the first time, 
and if we see the ClassCastException then attempt to infer the type, caching 
whatever we see for the next time?  This will benefit users who have all or 
most of their values of the same type, since we won't be introspecting every 
time.  It will penalize users whose values switch frequently (as exceptions are 
very slow), but it will still work.  I'm guessing the former is much more 
common than the latter.
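
To make the combined approach concrete, here is a minimal sketch under the 
assumption that introspection simply means looking at the runtime class (class 
and method names are hypothetical, not the actual POCast code):

{code}
// Hypothetical sketch of the cache-then-fallback idea discussed above;
// not Pig's actual cast implementation.
public class MapValueToChararray {
    private Class<?> cachedType = null;   // type observed on the first call (option b)

    public String cast(Object value) {
        if (value == null) return null;
        if (cachedType == null) {
            cachedType = value.getClass();        // cache once, skip later introspection
        }
        try {
            return convertAs(cachedType, value);  // fast path: trust the cached type
        } catch (ClassCastException e) {
            cachedType = value.getClass();        // fallback: re-introspect (option a) and re-cache
            return convertAs(cachedType, value);
        }
    }

    private String convertAs(Class<?> type, Object value) {
        if (type == Integer.class) return Integer.toString((Integer) value);
        if (type == Long.class)    return Long.toString((Long) value);
        if (type == Float.class)   return Float.toString((Float) value);
        if (type == Double.class)  return Double.toString((Double) value);
        return (String) value;    // throws ClassCastException if the cached guess was wrong
    }
}
{code}

Values of a stable type pay the introspection cost only once; a type switch costs 
one exception and then runs at full speed again.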

 support  conversion from numeric types to chararray
 ---

 Key: PIG-796
 URL: https://issues.apache.org/jira/browse/PIG-796
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-807) PERFORMANCE: Provide a way for UDFs to use read-once bags (backed by the Hadoop values iterator)

2009-05-28 Thread David Ciemiewicz (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714227#action_12714227
 ] 

David Ciemiewicz commented on PIG-807:
--

@Yiping

I see what you mean.  Maybe we should have FOREACH and FORALL as in B = FORALL 
A GENERATE SUM(m);

Another version of this might be B = OVER A GENERATE SUM(m); or B = OVERALL A 
GENERATE SUM(m);


There was a hallway conversation about the situation of:

{code}
B = GROUP A BY key;
C = FOREACH B {
SORTED = ORDER A BY value;
GENERATE
COUNT(SORTED) as count,
QUANTILES(SORTED.value, 0.0, 0.5, 0.75, 0.9, 1.0) as quantiles: 
(p00, p50, p75, p90, p100);
};
{code}

I was told that a ReadOnce bag would not solve this problem because we'd need 
to pass through SORTED twice, since there are two UDFs.

I disagree.  It is possible to pass over this data once and only once if we 
create a class of Accumulating or Running functions that differs from the 
current DataBag and AlgebraicDataBag functions.

First, functions like SUM, COUNT, AVG, VAR, MIN, MAX, STDEV, ResevoirSampling, 
and statistics.SUMMARY can all be computed on a ReadOnce / Streaming DataBag of 
unknown length or size. For each of these functions, we simply add or 
accumulate the values one row at a time; we can invoke a combiner for 
intermediate results across partitions and produce a final result, all without 
materializing a DataBag as implemented today.

QUANTILES is a different beast.  To compute quantiles, the data must be sorted, 
which I prefer to do outside the UDF at this time.  Also, the COUNT of the data 
is needed a priori.  Fortunately, sorting COULD produce a ReadOnce / Streaming 
DataBag of KNOWN (as opposed to unknown) length or size, so only two scans 
through the data (sorting and quantiles) are needed, rather than three (sort, 
count, quantiles).

So, if Pig could understand two additional data types:

ReadOnceSizeUnknown -- COUNT() counts all the individual rows
ReadOnceSizeKnown -- COUNT() just returns the size attribute of the ReadOnce 
data reference

And if Pig had RunningEval and RunningAlgebraicEval classes of functions which 
accumulate values a row at a time, many computations in Pig could be much much 
more efficient.

In case anyone doesn't get what I mean by having running functions, here's 
some Perl code that implements what I'm suggesting. I'll leave it as an 
exercise for the Pig development team to figure out the RunningAlgebraicEval 
versions of these functions/classes. :^)

runningsums.pl
{code}
#! /usr/bin/perl

use RunningSum;
use RunningCount;

$a_count = RunningCount->new();
$a_sum = RunningSum->new();
$b_sum = RunningSum->new();
$c_sum = RunningSum->new();

while (<>)
{
    s/\r*\n*//g;

    ($a, $b, $c) = split(/\t/);

    $a_count->accumulate($a);
    $a_sum->accumulate($a);
    $b_sum->accumulate($b);
    $c_sum->accumulate($c);
}

print join("\t",
    $a_count->final(),
    $a_sum->final(),
    $b_sum->final(),
    $c_sum->final()
    ), "\n";
{code}

RunningCount.pm
{code}
package RunningCount;

sub new
{
my $class = shift;
my $self = {};
bless $self, $class;
return $self;
}

sub accumulate
{
my $self = shift;
my $value = shift;

$self->{'count'}++;
}

sub final
{
my $self = shift;
return $self->{'count'};
}

1;
{code}

RunningSum.pm
{code}
package RunningSum;

sub new
{
my $class = shift;
my $self = {};
bless $self, $class;
return $self;
}

sub accumulate
{
my $self = shift;
my $value = shift;

$self->{'sum'} += $value;
}

sub final
{
my $self = shift;
return $self->{'sum'};
}

1;
{code}
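
Taking up the "exercise for the Pig development team", here is one hypothetical 
shape such an interface could take on the Java side (the names RunningEval and 
RunningSum below are assumptions, not an existing Pig API):

{code}
// Hypothetical sketch only -- this interface does not exist in Pig today.
// Rows are pushed in one at a time, so the bag never has to be materialized.
import java.io.IOException;
import org.apache.pig.data.Tuple;

public interface RunningEval<T> {
    void accumulate(Tuple row) throws IOException;   // called once per input row
    T getFinal() throws IOException;                 // called after the last row
    void reset();                                    // allow reuse between keys
}

// A running SUM over the first field of each row, mirroring RunningSum.pm above.
class RunningSum implements RunningEval<Double> {
    private double sum = 0.0;

    public void accumulate(Tuple row) throws IOException {
        Object v = row.get(0);
        if (v != null) {
            sum += ((Number) v).doubleValue();
        }
    }

    public Double getFinal() { return sum; }

    public void reset() { sum = 0.0; }
}
{code}

A RunningAlgebraicEval variant would add an intermediate combine step, in the 
spirit of the existing Algebraic Initial/Intermed/Final contract.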








 PERFORMANCE: Provide a way for UDFs to use read-once bags (backed by the 
 Hadoop values iterator)
 

 Key: PIG-807
 URL: https://issues.apache.org/jira/browse/PIG-807
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.1
Reporter: Pradeep Kamath
 Fix For: 0.3.0


 Currently all bags resulting from a group or cogroup are materialized as bags 
 containing all of the contents. The issue with this is that if a particular 
 key has many corresponding values, all these values get stuffed into a bag, 
 which may run out of memory and hence spill, causing a slowdown in performance 
 and sometimes memory exceptions. In many cases, the UDFs which use these bags 
 coming out of a group or cogroup only need to iterate over the bag in a 
 unidirectional read-once manner. This can be implemented by having the bag 
 implement its iterator by simply iterating over the underlying hadoop 
 iterator provided in the reduce. This kind of a bag is also needed in 
 

[jira] Commented: (PIG-796) support conversion from numeric types to chararray

2009-05-28 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714233#action_12714233
 ] 

Milind Bhandarkar commented on PIG-796:
---

Can't the user simply do:

{code}
foreach input generate (chararray)((int)mymap#'key') as myvalue;
{code}

Minimizing implicit casting is a good thing (tm) anyway.

 support  conversion from numeric types to chararray
 ---

 Key: PIG-796
 URL: https://issues.apache.org/jira/browse/PIG-796
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-796) support conversion from numeric types to chararray

2009-05-28 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714234#action_12714234
 ] 

Santhosh Srinivasan commented on PIG-796:
-

Milind,

This issue is in the backend. Users can do that you suggest in the front-end.

 support  conversion from numeric types to chararray
 ---

 Key: PIG-796
 URL: https://issues.apache.org/jira/browse/PIG-796
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (PIG-796) support conversion from numeric types to chararray

2009-05-28 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714234#action_12714234
 ] 

Santhosh Srinivasan edited comment on PIG-796 at 5/28/09 5:07 PM:
--

Milind,

This issue is in the backend. Users can do what you suggest in the front-end.

  was (Author: sms):
Milind,

This issue is in the backend. Users can do that you suggest in the front-end.
  
 support  conversion from numeric types to chararray
 ---

 Key: PIG-796
 URL: https://issues.apache.org/jira/browse/PIG-796
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

2009-05-28 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-802:
---

Status: Patch Available  (was: Open)

 PERFORMANCE: not creating bags for ORDER BY
 ---

 Key: PIG-802
 URL: https://issues.apache.org/jira/browse/PIG-802
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
 Attachments: OrderByOptimization.patch


 Order by should be changed to not use POPackage to put all of the tuples in a 
 bag on the reduce side, as the bag is just immediately flattened. It can 
 instead work like join does for the last input in the join. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Updated PigMix numbers for latest top of trunk

2009-05-28 Thread Alan Gates

http://wiki.apache.org/pig/PigMix

Alan.


Build failed in Hudson: Pig-Patch-minerva.apache.org #62

2009-05-28 Thread Apache Hudson Server
See 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/62/changes

Changes:

[olga] PIG-818: Explain doesn't handle PODemux properly (hagleitn via olgan)

[olga] PIG-819: run -param -param; is a valid grunt command (milindb via olgan)

[gates] Removed extraneous println.

--
[...truncated 91043 lines...]
 [exec] [junit] 09/05/28 19:25:55 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
 [exec] [junit] 09/05/28 19:25:55 INFO dfs.StateChange: BLOCK* ask 
127.0.0.1:59197 to delete  blk_2886084364139085012_1005 
blk_5437118033708517285_1006 blk_-1965900046949440212_1004
 [exec] [junit] 09/05/28 19:25:55 INFO dfs.StateChange: BLOCK* ask 
127.0.0.1:59482 to delete  blk_2886084364139085012_1005 
blk_5437118033708517285_1006
 [exec] [junit] 09/05/28 19:25:56 INFO 
mapReduceLayer.JobControlCompiler: Setting up single store job
 [exec] [junit] 09/05/28 19:25:56 WARN mapred.JobClient: Use 
GenericOptionsParser for parsing the arguments. Applications should implement 
Tool for the same.
 [exec] [junit] 09/05/28 19:25:56 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200905281925_0002/job.jar. 
blk_1722768514708010616_1012
 [exec] [junit] 09/05/28 19:25:56 INFO dfs.DataNode: Receiving block 
blk_1722768514708010616_1012 src: /127.0.0.1:44495 dest: /127.0.0.1:36643
 [exec] [junit] 09/05/28 19:25:56 INFO dfs.DataNode: Receiving block 
blk_1722768514708010616_1012 src: /127.0.0.1:59825 dest: /127.0.0.1:59197
 [exec] [junit] 09/05/28 19:25:56 INFO dfs.DataNode: Receiving block 
blk_1722768514708010616_1012 src: /127.0.0.1:49547 dest: /127.0.0.1:59482
 [exec] [junit] 09/05/28 19:25:56 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:59482 is added to 
blk_1722768514708010616_1012 size 1411066
 [exec] [junit] 09/05/28 19:25:56 INFO dfs.DataNode: Received block 
blk_1722768514708010616_1012 of size 1411066 from /127.0.0.1
 [exec] [junit] 09/05/28 19:25:56 INFO dfs.DataNode: PacketResponder 0 
for block blk_1722768514708010616_1012 terminating
 [exec] [junit] 09/05/28 19:25:56 INFO dfs.DataNode: Received block 
blk_1722768514708010616_1012 of size 1411066 from /127.0.0.1
 [exec] [junit] 09/05/28 19:25:56 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:59197 is added to 
blk_1722768514708010616_1012 size 1411066
 [exec] [junit] 09/05/28 19:25:56 INFO dfs.DataNode: PacketResponder 1 
for block blk_1722768514708010616_1012 terminating
 [exec] [junit] 09/05/28 19:25:56 INFO dfs.DataNode: Received block 
blk_1722768514708010616_1012 of size 1411066 from /127.0.0.1
 [exec] [junit] 09/05/28 19:25:56 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:36643 is added to 
blk_1722768514708010616_1012 size 1411066
 [exec] [junit] 09/05/28 19:25:56 INFO dfs.DataNode: PacketResponder 2 
for block blk_1722768514708010616_1012 terminating
 [exec] [junit] 09/05/28 19:25:56 INFO 
mapReduceLayer.MapReduceLauncher: 0% complete
 [exec] [junit] 09/05/28 19:25:56 INFO fs.FSNamesystem: Increasing 
replication for file 
/tmp/hadoop-hudson/mapred/system/job_200905281925_0002/job.jar. New replication 
is 2
 [exec] [junit] 09/05/28 19:25:56 INFO fs.FSNamesystem: Reducing 
replication for file 
/tmp/hadoop-hudson/mapred/system/job_200905281925_0002/job.jar. New replication 
is 2
 [exec] [junit] 09/05/28 19:25:56 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200905281925_0002/job.split. 
blk_3936907331388633924_1013
 [exec] [junit] 09/05/28 19:25:56 INFO dfs.DataNode: Receiving block 
blk_3936907331388633924_1013 src: /127.0.0.1:44581 dest: /127.0.0.1:49484
 [exec] [junit] 09/05/28 19:25:56 INFO dfs.DataNode: Receiving block 
blk_3936907331388633924_1013 src: /127.0.0.1:49549 dest: /127.0.0.1:59482
 [exec] [junit] 09/05/28 19:25:56 INFO dfs.DataNode: Receiving block 
blk_3936907331388633924_1013 src: /127.0.0.1:59829 dest: /127.0.0.1:59197
 [exec] [junit] 09/05/28 19:25:56 INFO dfs.DataNode: Received block 
blk_3936907331388633924_1013 of size 14547 from /127.0.0.1
 [exec] [junit] 09/05/28 19:25:56 INFO dfs.DataNode: PacketResponder 0 
for block blk_3936907331388633924_1013 terminating
 [exec] [junit] 09/05/28 19:25:56 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:59197 is added to 
blk_3936907331388633924_1013 size 14547
 [exec] [junit] 09/05/28 19:25:56 INFO dfs.DataNode: Received block 
blk_3936907331388633924_1013 of size 14547 from /127.0.0.1
 [exec] [junit] 09/05/28 19:25:56 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:59482 is added to 

[jira] Created: (PIG-822) Flatten semantics are unknown

2009-05-28 Thread George Mavromatis (JIRA)
Flatten semantics are unknown
-

 Key: PIG-822
 URL: https://issues.apache.org/jira/browse/PIG-822
 Project: Pig
  Issue Type: Bug
  Components: documentation
Reporter: George Mavromatis
Priority: Critical


There is no formal specification of the flatten keyword in 
http://hadoop.apache.org/pig/docs/r0.2.0/piglatin.html 
There are only some examples.

I have found flatten to be very fragile and unpredictable with the data types 
it reads and creates. I have wasted too many hours (and Viraj too) trying to 
figure out its peculiarities, the latest of which is here: 
http://bug.corp.yahoo.com/show_bug.cgi?id=2768016 comment #15

Please document:
Flatten should be explained formally in its own dedicated section: what the 
valid input types are, what output types it creates, what transformation it 
performs from input to output, and how the resulting data are named.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-796) support conversion from numeric types to chararray

2009-05-28 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714278#action_12714278
 ] 

Milind Bhandarkar commented on PIG-796:
---

So, can we live with the ClassCastException generated by the front end? I 
recall reading somewhere that pigs do what they are told. If they are told to 
do things that are even impossible for humans to comprehend, i.e. somehow 
interpret a byte array to be an integer and then convert it to a string, how 
would they cope?

IMHO, eliminating such implicit casts would reduce the complexity of Pig and 
would fit the Pig philosophy. But that means being able to convert everything 
to a chararray at most. If someone requests a chararray cast of a bytearray, 
give them a hex representation, and have them write a UDF to convert the hex 
string to a string (i.e. toInt('0x'+myvalue) in the above code).
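
As a rough illustration of the hex idea (the class name is made up and this is 
only a sketch of the proposal, not something that exists in Pig):

{code}
// Hypothetical sketch: render a Pig bytearray (DataByteArray) as a hex chararray.
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataByteArray;
import org.apache.pig.data.Tuple;

public class ToHexChararray extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        byte[] bytes = ((DataByteArray) input.get(0)).get();
        StringBuilder sb = new StringBuilder("0x");
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));   // two hex digits per byte
        }
        return sb.toString();
    }
}
{code}

A user who really wants the numeric value could then decode the hex string with 
their own UDF, along the lines of the toInt('0x'+myvalue) idea above.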

thoughts ?

 support  conversion from numeric types to chararray
 ---

 Key: PIG-796
 URL: https://issues.apache.org/jira/browse/PIG-796
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-796) support conversion from numeric types to chararray

2009-05-28 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714279#action_12714279
 ] 

Milind Bhandarkar commented on PIG-796:
---

Modifying my earlier comment:

 So, can we live with the ClassCastException generated by the front end?

I meant the back end of course.

 support  conversion from numeric types to chararray
 ---

 Key: PIG-796
 URL: https://issues.apache.org/jira/browse/PIG-796
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-796) support conversion from numeric types to chararray

2009-05-28 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714283#action_12714283
 ] 

Santhosh Srinivasan commented on PIG-796:
-

Milind,

You are generalizing a specific problem. Pig can convert a byte array to an 
integer and then to a string, as long as the byte array is convertible to an 
integer. The problem being discussed is for bytes that come out of a Map. The 
title of this jira is incorrect, as I have pointed out in my first comment.

Regarding ClassCastExceptions, Pig fails and the script aborts; here I am 
excluding less than a handful of cases where we do not bail out.

 support  conversion from numeric types to chararray
 ---

 Key: PIG-796
 URL: https://issues.apache.org/jira/browse/PIG-796
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.