[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-09-17 Thread Sandesh Devaraju (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910629#action_12910629
 ] 

Sandesh Devaraju commented on PIG-1229:
---

I narrowed down the problem to org.apache.hadoop.mapred.Task.java lines 411-418.

{code:title=org.apache.hadoop.mapred.Task.java|linenumbers=true|firstline=411}
if (useNewApi) {
  LOG.debug("using new api for output committer");
  outputFormat =
    ReflectionUtils.newInstance(taskContext.getOutputFormatClass(), job);
  committer = outputFormat.getOutputCommitter(taskContext);
} else {
  committer = conf.getOutputCommitter();
}
{code}

But DBStorage UDF assumes that the OutputFormat is in a closure.

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.8.0

 Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, 
 jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch


 UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1618) Switch to new parser generator technology

2010-09-17 Thread Alan Gates (JIRA)
Switch to new parser generator technology
-

 Key: PIG-1618
 URL: https://issues.apache.org/jira/browse/PIG-1618
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Alan Gates
Assignee: Xuefu Zhang
 Fix For: 0.9.0


There are many bugs in Pig related to the parser, particularly bad error 
messages.  After reviewing JavaCC, we feel these will be difficult to address 
using that tool.  Also, the .jjt files used by JavaCC are hard to understand 
and maintain.  

ANTLR is being reviewed as the most likely choice to move to, but other parsers 
will be reviewed as well.

This JIRA will act as an umbrella issue for other parser issues.




[jira] Created: (PIG-1619) Bad error message when a double constant is incorrectly specified

2010-09-17 Thread Alan Gates (JIRA)
Bad error message when a double constant is incorrectly specified
-

 Key: PIG-1619
 URL: https://issues.apache.org/jira/browse/PIG-1619
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Xuefu Zhang
Priority: Minor
 Fix For: 0.9.0


Given the following Pig Latin script (notice that the exponent of the 
floating-point constant is itself a floating-point number when it should be an 
integer):

{code}
A = load '/Users/gates/test/data/studenttab10';
B = foreach A generate $0, 3.0e10.1;
dump B;
{code}

Pig returns
{code}
 ERROR 2999: Unexpected internal error. For input string: 3.0e10.1
{code}

This should be a syntax error caught by the parser.
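
A minimal Java sketch of the distinction, assuming a simplified literal grammar (digits, optional fraction, optional integer exponent) and hypothetical helper names; this is not Pig's parser code. It suggests how a lexical check could reject 3.0e10.1 up front, whereas without one the failure only surfaces when the text is converted to a double.

```java
import java.util.regex.Pattern;

public class DoubleLiteralCheck {
    // Simplified grammar (an assumption, not Pig's actual grammar):
    // digits, optional fraction, optional exponent that must be an integer.
    private static final Pattern DOUBLE_LITERAL =
        Pattern.compile("[0-9]+(\\.[0-9]+)?([eE][+-]?[0-9]+)?");

    // Hypothetical helper: true if the whole text is a well-formed literal.
    public static boolean isDoubleLiteral(String text) {
        return DOUBLE_LITERAL.matcher(text).matches();
    }

    // Without a lexical check, the problem only shows up at conversion time.
    public static boolean failsAtConversion(String text) {
        try {
            Double.parseDouble(text);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isDoubleLiteral("3.0e10"));      // true
        System.out.println(isDoubleLiteral("3.0e10.1"));    // false
        System.out.println(failsAtConversion("3.0e10.1"));  // true
    }
}
```

Rejecting the token at the lexical stage lets the parser report a position and a syntax error rather than an internal error.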




[jira] Created: (PIG-1620) ARRANGE keyword should be deprecated

2010-09-17 Thread Alan Gates (JIRA)
ARRANGE keyword should be deprecated


 Key: PIG-1620
 URL: https://issues.apache.org/jira/browse/PIG-1620
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Xuefu Zhang
Priority: Minor
 Fix For: 0.9.0


ARRANGE is a synonym for ORDER in Pig Latin.  As far as I know no one uses it.  
Its use is not documented.  And I am a strong fan of having one way to do 
things in programming languages.




[jira] Created: (PIG-1621) What does EVAL keyword do?

2010-09-17 Thread Alan Gates (JIRA)
What does EVAL keyword do?
--

 Key: PIG-1621
 URL: https://issues.apache.org/jira/browse/PIG-1621
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Xuefu Zhang
Priority: Minor
 Fix For: 0.9.0


EVAL is listed as a keyword in Pig Latin, in both the documentation and the 
QueryParser.jjt file.  However, it has no productions in the grammar and no 
further mention in the documentation.  We need to either clarify what it does, 
or why we are reserving it as a keyword, or remove it.




[jira] Created: (PIG-1622) DEFINE streaming options are ill defined and not properly documented

2010-09-17 Thread Alan Gates (JIRA)
DEFINE streaming options are ill defined and not properly documented


 Key: PIG-1622
 URL: https://issues.apache.org/jira/browse/PIG-1622
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Xuefu Zhang
Priority: Minor
 Fix For: 0.9.0


According to the documentation 
(http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#DEFINE) the syntax 
for DEFINE when used to define a streaming command is:

DEFINE cmd INPUT(stdin|path) OUTPUT(stdout|stderr|path) SHIP(path [, path, 
...]) CACHE (path [, path, ...])

However, the actual parser accepts something pretty different.  Consider the 
following script:

{code}
define strm `wc -l` INPUT(stdin) 
CACHE('/Users/gates/.vimrc#myvim') 
OUTPUT(stdin)
INPUT('/tmp/fred') 
OUTPUT('/tmp/bob')
SHIP('/Users/gates/.bashrc') 
SHIP('/Users/gates/.vimrc') 
CACHE('/Users/gates/.bashrc#mybash')
stderr('/tmp/errors' limit 10);

A = load '/Users/gates/test/data/studenttab10';
B = stream A through strm;
dump B;
{code}

The above actually parses.  I see several issues here:

# What do multiple INPUT and OUTPUT statements mean in the context of 
streaming?  These should not be allowed.
# The documentation implies an order (INPUT, OUTPUT, SHIP, CACHE) that is not 
enforced by the parser.  We should either enforce the order in the parser or 
update the documentation.  Most likely the latter to avoid breaking existing 
scripts.
# Why are multiple SHIP and CACHE clauses allowed when each can take multiple 
paths?  It seems we should only allow one of each.
# The error clause is completely different than what is given in the 
documentation.  I suspect this is a documentation error and the grammar 
supported by the parser here is what we want.
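
As a rough illustration of the first and third points, here is a Java sketch of a check that flags clauses appearing more than once in a DEFINE statement. The class and method names are hypothetical, and the input is assumed to be the clause names already extracted in script order; this is not how Pig's parser is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StreamingClauseCheck {
    // Hypothetical check: returns the clause names that occur more than once,
    // in the order in which the first repeat of each is seen.
    public static List<String> duplicateClauses(List<String> clauses) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String clause : clauses) {
            String name = clause.toUpperCase(Locale.ROOT);
            // Set.add returns false when the name was already present.
            if (!seen.add(name) && !duplicates.contains(name)) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Clause names taken from the script above, in order of appearance.
        List<String> clauses = Arrays.asList(
            "INPUT", "CACHE", "OUTPUT", "INPUT", "OUTPUT",
            "SHIP", "SHIP", "CACHE", "stderr");
        System.out.println(duplicateClauses(clauses)); // [INPUT, OUTPUT, SHIP, CACHE]
    }
}
```

A parser enforcing "at most one of each clause" could raise a syntax error whenever this list is non-empty.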





[jira] Updated: (PIG-1479) Embed Pig in scripting languages

2010-09-17 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1479:
--

Attachment: pig-greek-test.tar

Attached the test script, modified based on Julien's comment. As for the 
command line option -g, it can also take one parameter (the script file name) 
and let Pig determine the script engine by the file extension.
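
A small Java sketch of what extension-based selection might look like. The extension-to-engine table and all names here are assumptions for illustration only; they are not taken from the attached patch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class EngineByExtension {
    // Hypothetical mapping from file extension to script engine name.
    private static final Map<String, String> ENGINES = new HashMap<>();
    static {
        ENGINES.put("py", "jython");
        ENGINES.put("js", "javascript");
    }

    // Returns the engine name for a script file, or empty if the file has no
    // extension or the extension is not in the table.
    public static Optional<String> engineFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(ENGINES.get(ext));
    }

    public static void main(String[] args) {
        System.out.println(engineFor("wordcount.py")); // Optional[jython]
        System.out.println(engineFor("wordcount"));    // Optional.empty
    }
}
```

With such a lookup, the two-parameter form of -g would only be needed when the extension is missing or ambiguous.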



 Embed Pig in scripting languages
 

 Key: PIG-1479
 URL: https://issues.apache.org/jira/browse/PIG-1479
 Project: Pig
  Issue Type: New Feature
Reporter: Julien Le Dem
 Attachments: PIG-1479.patch, PIG-1479_2.patch, pig-greek-test.tar, 
 pig-greek-test.tar, pig-greek.tgz


 It should be possible to embed Pig calls in a scripting language and make 
 functions defined in the same script available as UDFs.
 This is a spin off of https://issues.apache.org/jira/browse/PIG-928 which 
 lets users define UDFs in scripting languages.




[jira] Created: (PIG-1623) Register syntax is ambiguous

2010-09-17 Thread Alan Gates (JIRA)
Register syntax is ambiguous


 Key: PIG-1623
 URL: https://issues.apache.org/jira/browse/PIG-1623
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Xuefu Zhang
Priority: Minor
 Fix For: 0.9.0


All of the following register statements parse

{code}
register 
/Users/gates/tmp/pig-0.7/pig-0.7.0/./contrib/piggybank/java/piggybank.jar
register 
'/Users/gates/tmp/pig-0.7/pig-0.7.0/./contrib/piggybank/java/piggybank.jar'
register 
'/Users/gates/tmp/pig-0.7/pig-0.7.0/./contrib/piggybank/java/piggybank.jar';
{code}

As far as I know register is the only Pig Latin command that does not require a 
semicolon at the end.  It is also the only command that allows unquoted strings 
for file paths.  We should align this with other similar syntax in Pig Latin.

In order to avoid breaking existing scripts we may need to warn about this 
behavior for a while before no longer supporting it.




[jira] Created: (PIG-1624) FOREACH AS documentation is incorrect

2010-09-17 Thread Alan Gates (JIRA)
FOREACH AS documentation is incorrect
-

 Key: PIG-1624
 URL: https://issues.apache.org/jira/browse/PIG-1624
 Project: Pig
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Corinne Chandel
 Fix For: 0.9.0


According to the Pig Latin manual 
(http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#FOREACH) the 
correct usage of AS in a FOREACH clause is:

{code}
B = foreach A generate $0, $1, $2 as (user, age, gpa);
{code}

However, this is incorrect and produces a syntax error.  The correct syntax for 
AS in FOREACH is:

{code}
B = foreach A generate $0 as user, $1 as age, $2 as gpa;
{code}




[jira] Commented: (PIG-1624) FOREACH AS documentation is incorrect

2010-09-17 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910705#action_12910705
 ] 

Alan Gates commented on PIG-1624:
-

I should note as well that when a flatten is involved, the proper syntax is:

{code}
A = load '/Users/gates/test/data/studenttab10' as (name:chararray, b{}, 
gpa:double);
B = foreach A generate name, flatten(b) as (fred, bob, joe), gpa;
dump B;
{code}

Note the use of parentheses to enclose the list of fields coming from the 
flattened bag.

 FOREACH AS documentation is incorrect
 -

 Key: PIG-1624
 URL: https://issues.apache.org/jira/browse/PIG-1624
 Project: Pig
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Corinne Chandel
 Fix For: 0.9.0





[jira] Updated: (PIG-1624) FOREACH AS documentation is incorrect

2010-09-17 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1624:


Fix Version/s: 0.8.0
   (was: 0.9.0)

We are still updating docs so we should be able to get this in for 0.8

 FOREACH AS documentation is incorrect
 -

 Key: PIG-1624
 URL: https://issues.apache.org/jira/browse/PIG-1624
 Project: Pig
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Corinne Chandel
 Fix For: 0.8.0





[jira] Created: (PIG-1625) Docs incorrectly say SAMPLE can be used in a nested FOREACH and do not mention projections in nested foreach

2010-09-17 Thread Alan Gates (JIRA)
Docs incorrectly say SAMPLE can be used in a nested FOREACH and do not mention 
projections in nested foreach


 Key: PIG-1625
 URL: https://issues.apache.org/jira/browse/PIG-1625
 Project: Pig
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Corinne Chandel
 Fix For: 0.8.0


Currently the docs in 
http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#FOREACH say that 
SAMPLE can be used as an operator in nested foreach.  It cannot.

Also, they do not mention the ability to do projections inside nested foreach, 
such as the following:

{code}
A = load '/Users/gates/test/data/studenttab10';
B = group A all;
C = foreach B {
    C1 = A.$0;
    C2 = distinct C1;
    generate C2;
}
dump C;
{code}




[jira] Updated: (PIG-1598) Pig gobbles up error messages - Part 2

2010-09-17 Thread niraj rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1598:
---

Attachment: PIG-1598_0.patch

 Pig gobbles up error messages - Part 2
 --

 Key: PIG-1598
 URL: https://issues.apache.org/jira/browse/PIG-1598
 Project: Pig
  Issue Type: Improvement
Reporter: Ashutosh Chauhan
Assignee: niraj rai
 Fix For: 0.8.0

 Attachments: PIG-1598_0.patch


 Another case of PIG-1531.




[jira] Updated: (PIG-1598) Pig gobbles up error messages - Part 2

2010-09-17 Thread niraj rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1598:
---

Status: Patch Available  (was: Open)




[jira] Created: (PIG-1626) Need to clarify how COUNT handles nulls

2010-09-17 Thread Olga Natkovich (JIRA)
Need to clarify how COUNT handles nulls
---

 Key: PIG-1626
 URL: https://issues.apache.org/jira/browse/PIG-1626
 Project: Pig
  Issue Type: Bug
  Components: documentation
Reporter: Olga Natkovich
Assignee: Corinne Chandel
 Fix For: 0.8.0


The current documentation just states: The COUNT function ignores NULL values. 
If you want to include NULL values in the count computation, use COUNT_STAR. 

The new text should be something like

The COUNT function follows syntax semantics and ignores nulls. What this means 
is that a tuple in the bag will not be counted if the first field in this tuple 
is NULL. If you want to include NULL values in the count computation, use 
COUNT_STAR. 
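
The difference can be sketched in Java as follows, modelling a bag as a list of tuples and a tuple as an Object array. This is an illustrative model of the semantics described above, not Pig's implementation; the class and method names are hypothetical, and treating an empty tuple as uncounted is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class CountNulls {
    // Models the described COUNT behavior: a tuple is skipped when its first
    // field is null. Empty tuples are also skipped here (an assumption).
    public static long count(List<Object[]> bag) {
        long n = 0;
        for (Object[] tuple : bag) {
            if (tuple.length > 0 && tuple[0] != null) {
                n++;
            }
        }
        return n;
    }

    // Models COUNT_STAR: every tuple counts, whatever its fields contain.
    public static long countStar(List<Object[]> bag) {
        return bag.size();
    }

    public static void main(String[] args) {
        List<Object[]> bag = new ArrayList<>();
        bag.add(new Object[] {"alice", 20});
        bag.add(new Object[] {null, 30});   // first field null: skipped by count
        bag.add(new Object[] {"bob", null}); // null in a later field: still counted
        System.out.println(count(bag));      // 2
        System.out.println(countStar(bag));  // 3
    }
}
```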





[jira] Created: (PIG-1627) Flattening of bags with unknown schemas produces wrong schema

2010-09-17 Thread Alan Gates (JIRA)
Flattening of bags with unknown schemas produces wrong schema
-

 Key: PIG-1627
 URL: https://issues.apache.org/jira/browse/PIG-1627
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.9.0


The following should produce an unknown schema:

{code}
A = load '/Users/gates/test/data/studenttab10';
B = group A by $0;
C = foreach B generate flatten(A);
describe C;
{code}

Instead it gives
{code}
C: {bytearray}
{code}




[jira] Commented: (PIG-1627) Flattening of bags with unknown schemas produces wrong schema

2010-09-17 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910757#action_12910757
 ] 

Alan Gates commented on PIG-1627:
-

The problem is in the flatten, not the group.  The group has the proper schema 
(bytearray, bag{}).  Loading a bag of unknown schema and flattening it produces 
the same result.

Flattening a tuple of unknown content has the same problem as well.

 Flattening of bags with unknown schemas produces wrong schema
 -

 Key: PIG-1627
 URL: https://issues.apache.org/jira/browse/PIG-1627
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.9.0





[jira] Created: (PIG-1628) log this message at debug level : 'Pig Internal storage in use'

2010-09-17 Thread Thejas M Nair (JIRA)
log this message at debug level : 'Pig Internal storage in use'
---

 Key: PIG-1628
 URL: https://issues.apache.org/jira/browse/PIG-1628
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0


The temporary storage functions log at the INFO level. These messages should be 
logged at debug level instead, because they reduce the visibility of more 
useful INFO messages. The messages include 'Pig Internal storage in use' from 
InterStorage and 'TFile storage in use' from TFileStorage.





[jira] Updated: (PIG-1508) Make 'docs' target (forrest) work with Java 1.6

2010-09-17 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1508:


Status: Resolved  (was: Patch Available)
Resolution: Fixed

Patch checked in.  Thanks Carl.

 Make 'docs' target (forrest) work with Java 1.6
 ---

 Key: PIG-1508
 URL: https://issues.apache.org/jira/browse/PIG-1508
 Project: Pig
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.7.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Fix For: 0.8.0

 Attachments: PIG-1508.patch.txt


 FOR-984 covers the very inconvenient fact that Forrest 0.8 does not work with 
 Java 1.6. The same ticket also suggests a workaround: disabling sitemap and 
 stylesheet validation by setting the forrest.validate.sitemap and 
 forrest.validate.stylesheets properties to false.




[jira] Created: (PIG-1629) Need ability to limit bags produced during GROUP + LIMIT

2010-09-17 Thread Olga Natkovich (JIRA)
Need ability to limit bags produced during GROUP + LIMIT


 Key: PIG-1629
 URL: https://issues.apache.org/jira/browse/PIG-1629
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Thejas M Nair
 Fix For: 0.9.0


Currently, the code below will construct the full group in memory and then trim 
it. This uses more memory than needed.

A = load 'data' as (x, y, z);
B = group A by x;
C = foreach B {
    D = limit A 100;
    generate group, MyUDF(D);
}
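
The requested behavior can be sketched in Java as a grouping step that caps each bag while it is being built, instead of materializing the whole group and trimming afterwards. This is only an in-memory illustration with hypothetical names; it says nothing about how Pig would implement the optimization across map and reduce phases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CappedGroup {
    // Groups rows by their first column, keeping at most `limit` rows per key.
    // Rows beyond the limit are dropped as they arrive, so no bag ever holds
    // more than `limit` entries.
    public static Map<String, List<String[]>> groupWithLimit(
            List<String[]> rows, int limit) {
        Map<String, List<String[]>> groups = new LinkedHashMap<>();
        for (String[] row : rows) {
            List<String[]> bag =
                groups.computeIfAbsent(row[0], k -> new ArrayList<>());
            if (bag.size() < limit) {
                bag.add(row);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"a", "1"},
            new String[] {"a", "2"},
            new String[] {"a", "3"},
            new String[] {"b", "4"});
        Map<String, List<String[]>> groups = groupWithLimit(rows, 2);
        System.out.println(groups.get("a").size()); // 2
        System.out.println(groups.get("b").size()); // 1
    }
}
```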
