[jira] Commented: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments

2009-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769032#action_12769032
 ] 

Hadoop QA commented on PIG-598:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422928/PIG-598.patch
  against trunk revision 828891.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 48 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/111/console

This message is automatically generated.

> Parameter substitution ($PARAMETER) should not be performed in comments
> ---
>
> Key: PIG-598
> URL: https://issues.apache.org/jira/browse/PIG-598
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: David Ciemiewicz
>Assignee: Thejas M Nair
> Attachments: PIG-598.patch
>
>
> Compiling the following code example will generate an error that 
> $NOT_A_PARAMETER is an Undefined Parameter.
> This is problematic as sometimes you want to comment out parts of your code, 
> including parameters so that you don't have to define them.
> Thus, I think it would be really good if parameter substitution was not 
> performed in comments.
> {code}
> -- $NOT_A_PARAMETER
> {code}
> {code}
> -bash-3.00$ pig -exectype local -latest comment.pig
> USING: /grid/0/gs/pig/current
> java.lang.RuntimeException: Undefined parameter : NOT_A_PARAMETER
> at org.apache.pig.tools.parameters.PreprocessorContext.substitute(PreprocessorContext.java:221)
> at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.parsePigFile(ParameterSubstitutionPreprocessor.java:106)
> at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.genSubstitutedFile(ParameterSubstitutionPreprocessor.java:86)
> at org.apache.pig.Main.runParamPreprocessor(Main.java:394)
> at org.apache.pig.Main.main(Main.java:296)
> {code}
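
A minimal sketch of the behavior requested above, assuming comments are only the single-line `--` form and that `--` never occurs inside a quoted string. The class and method names are hypothetical, and this is not the code in PIG-598.patch. It substitutes $NAME only in the part of a line that precedes the comment marker and leaves positional references such as $0 untouched:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: substitutes parameters outside "--" comments.
final class CommentAwareSubstitution {
    // A parameter is '$' followed by an identifier; "$0" style column references do not match.
    private static final Pattern PARAM = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    static String substituteLine(String line, Map<String, String> params) {
        // Assumption: the first "--" on the line starts a comment.
        int commentStart = line.indexOf("--");
        String code = commentStart < 0 ? line : line.substring(0, commentStart);
        String comment = commentStart < 0 ? "" : line.substring(commentStart);

        Matcher m = PARAM.matcher(code);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value = params.get(name);
            if (value == null) {
                // Undefined parameters are still an error, but only in the code part.
                throw new RuntimeException("Undefined parameter : " + name);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        // The comment text is appended verbatim, with no substitution.
        return out.append(comment).toString();
    }
}
```

With this sketch, a line consisting only of `-- $NOT_A_PARAMETER` is returned unchanged, while an undefined parameter in the code part of a line still raises the error shown in the stack trace.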

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Pig-trunk #598

2009-10-22 Thread Apache Hudson Server
See 

Changes:

[olga] PIG-1032: FINDBUGS: DM_STRING_CTOR: Method invokes inefficient new
String(String) constructor (olgan)

[gates] PIG-984:  Add map side grouping for data that is already collected when 
it is read into the map.

[gates] PIG-1025: Add ability to set job priority from Pig Latin script.

[olga] PIG-1039: documentation update (chandec via olgan)

--
[...truncated 2547 lines...]

ivy-init-dirs:

ivy-probe-antlib:

ivy-init-antlib:

ivy-init:

ivy-buildJar:
[ivy:resolve] :: resolving dependencies :: 
org.apache.pig#Pig;2009-10-23_02-19-23
[ivy:resolve]   confs: [buildJar]
[ivy:resolve]   found com.jcraft#jsch;0.1.38 in maven2
[ivy:resolve]   found jline#jline;0.9.94 in maven2
[ivy:resolve]   found net.java.dev.javacc#javacc;4.2 in maven2
[ivy:resolve]   found junit#junit;4.5 in default
[ivy:resolve] :: resolution report :: resolve 72ms :: artifacts dl 5ms
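---------------------------------------------------------------------
|          |            modules            ||   artifacts   |
|   conf   | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| buildJar |   4   |   0   |   0   |   0   ||   4   |   0   |
---------------------------------------------------------------------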
[ivy:retrieve] :: retrieving :: org.apache.pig#Pig
[ivy:retrieve]  confs: [buildJar]
[ivy:retrieve]  1 artifacts copied, 3 already retrieved (288kB/5ms)

buildJar:
 [echo] svnString 828918
  [jar] Building jar: 

 [copy] Copying 1 file to 


jarWithOutSvn:

findbugs:
[mkdir] Created dir: 

 [findbugs] Executing findbugs from ant task
 [findbugs] Running FindBugs...
 [findbugs] The following classes needed for analysis were missing:
 [findbugs]   com.jcraft.jsch.SocketFactory
 [findbugs]   com.jcraft.jsch.Logger
 [findbugs]   jline.Completor
 [findbugs]   com.jcraft.jsch.Session
 [findbugs]   com.jcraft.jsch.HostKeyRepository
 [findbugs]   com.jcraft.jsch.JSch
 [findbugs]   com.jcraft.jsch.UserInfo
 [findbugs]   jline.ConsoleReaderInputStream
 [findbugs]   com.jcraft.jsch.HostKey
 [findbugs]   jline.ConsoleReader
 [findbugs]   com.jcraft.jsch.ChannelExec
 [findbugs]   jline.History
 [findbugs]   com.jcraft.jsch.ChannelDirectTCPIP
 [findbugs]   com.jcraft.jsch.JSchException
 [findbugs]   com.jcraft.jsch.Channel
 [findbugs] Warnings generated: 205
 [findbugs] Missing classes: 16
 [findbugs] Calculating exit code...
 [findbugs] Setting 'missing class' flag (2)
 [findbugs] Setting 'bugs found' flag (1)
 [findbugs] Exit code set to: 3
 [findbugs] Java Result: 3
 [findbugs] Classes needed for analysis were missing
 [findbugs] Output saved to 

 [xslt] Processing 

 to 

 [xslt] Loading stylesheet 
/homes/gkesavan/tools/findbugs/latest/src/xsl/default.xsl

BUILD SUCCESSFUL
Total time: 2 minutes 56 seconds
+ mv build/pig-2009-10-23_02-19-23.tar.gz 

+ mv build/test/findbugs 

+ mv build/docs/api 

+ /homes/hudson/tools/ant/apache-ant-1.7.0/bin/ant clean
Buildfile: build.xml

clean:
   [delete] Deleting directory 

   [delete] Deleting directory 

   [delete] Deleting directory 

   [delete] Deleting directory 


BUILD SUCCESSFUL
Total time: 0 seconds
+ /homes/hudson/tools/ant/apache-ant-1.7.0/bin/ant 
-Dtest.junit.output.format=xml -Dtest.output=yes 
-Dcheckstyle.home=/homes/hudson/tools/checkstyle/latest -Drun.clover=true 
-Dclover.home=/homes/hudson/tools/clover/clover-ant-2.3.2 clover test 
generate-clover-reports
Buildfile: build.xml

clover.setup:
[mkdir] Created dir: 

[clover-setup] Clover Version 2.3.2, built on July 15 2008 (build-732)
[clover-setup] Loaded from: 
/homes/hudson/tools/clover/clover-ant-2.3.2/lib/clover.jar

[jira] Commented: (PIG-1041) javac warnings: cast, fallthrough, serial

2009-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769026#action_12769026
 ] 

Hadoop QA commented on PIG-1041:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422919/PIG-1041-1.patch
  against trunk revision 828773.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/110/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/110/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/110/console

This message is automatically generated.

> javac warnings: cast, fallthrough, serial
> -
>
> Key: PIG-1041
> URL: https://issues.apache.org/jira/browse/PIG-1041
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: PIG-1041-1.patch
>
>
> Pig has javac warnings when you build it with the option "-Dall.warnings=1". 
> We need to suppress all of them. This issue is to track the javac warnings in 
> the following categories:
> cast (49)
> fallthrough (1)
> serial (19)
> The number in parentheses is the number of occurrences of that javac 
> warning.




[jira] Updated: (PIG-1042) javac warnings: unchecked

2009-10-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1042:


Attachment: PIG-1042-1.patch

>  javac warnings: unchecked
> --
>
> Key: PIG-1042
> URL: https://issues.apache.org/jira/browse/PIG-1042
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: PIG-1042-1.patch
>
>
> Pig has 164 javac warnings when you build it with the option 
> "-Dall.warnings=1", all of which fall into the category "unchecked". We need to 
> suppress all of them.




[jira] Updated: (PIG-1042) javac warnings: unchecked

2009-10-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1042:


Attachment: (was: PIG-1042-1.patch)




[jira] Updated: (PIG-1042) javac warnings: unchecked

2009-10-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1042:


Attachment: PIG-1042-1.patch

javacc related warnings are still there, we need to find a way to suppress that.




[jira] Created: (PIG-1048) inner join using 'skewed' produces multiple rows for keys with single row in both input relations

2009-10-22 Thread Thejas M Nair (JIRA)
inner join using 'skewed' produces multiple rows for keys with single row in 
both input relations
-

 Key: PIG-1048
 URL: https://issues.apache.org/jira/browse/PIG-1048
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair


{code}
grunt> cat students.txt
asdfxc  M   23  12.44
qwer    F   21  14.44
uhsdf   M   34  12.11
zxldf   M   21  12.56
qwer    F   23  145.5
oiue    M   54  23.33

l1 = load 'students.txt';
l2 = load 'students.txt';
j = join l1 by $0, l2 by $0 ;
store j into 'tmp.txt'

grunt> cat tmp.txt
oiue    M   54  23.33   oiue    M   54  23.33
oiue    M   54  23.33   oiue    M   54  23.33
qwer    F   21  14.44   qwer    F   21  14.44
qwer    F   21  14.44   qwer    F   23  145.5
qwer    F   23  145.5   qwer    F   21  14.44
qwer    F   23  145.5   qwer    F   23  145.5
uhsdf   M   34  12.11   uhsdf   M   34  12.11
uhsdf   M   34  12.11   uhsdf   M   34  12.11
zxldf   M   21  12.56   zxldf   M   21  12.56
zxldf   M   21  12.56   zxldf   M   21  12.56
asdfxc  M   23  12.44   asdfxc  M   23  12.44
asdfxc  M   23  12.44   asdfxc  M   23  12.44
{code}
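
As a rough check on the listing above, the number of rows an inner join should produce is, for each key, the count of matching rows on the left times the count on the right. The sketch below is a hypothetical helper, not Pig code, and assumes the first column is the join key. For the six-row input joined with itself, the key that appears twice contributes 2 x 2 = 4 rows and the four single-row keys contribute 1 each, so 8 rows would be expected, whereas the reported tmp.txt has 12:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Pig: expected row count of an inner equi-join.
final class InnerJoinCardinality {
    static long expectedRows(List<String> leftKeys, List<String> rightKeys) {
        // Count how many rows carry each key on the right side.
        Map<String, Long> rightCounts = new HashMap<>();
        for (String key : rightKeys) {
            rightCounts.merge(key, 1L, Long::sum);
        }
        // Every left row pairs with each right row that has the same key.
        long rows = 0;
        for (String key : leftKeys) {
            rows += rightCounts.getOrDefault(key, 0L);
        }
        return rows;
    }
}
```

Passing the key column of students.txt as both arguments gives the expected count for the self-join in the report.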




[jira] Updated: (PIG-990) Provide a way to pin LogicalOperator Options

2009-10-22 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-990:
--

Attachment: pinned_options_2.patch

- "using regular" will now translate into the default join method (hash join at 
the moment).
- "using hash" is an option in case we change what "regular" is
- renamed REGULAR to HASH in the Join code
- added the new group type option to CoGROUP, pinned where appropriate.

> Provide a way to pin LogicalOperator Options
> 
>
> Key: PIG-990
> URL: https://issues.apache.org/jira/browse/PIG-990
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: pinned_options.patch, pinned_options_2.patch
>
>
> This is a proactive patch, setting up the groundwork for adding an optimizer.
> Some of the LogicalOperators have options. For example, LOJoin has a variety 
> of join types (regular, fr, skewed, merge), which can be set by the user or 
> chosen by a hypothetical optimizer.  If a user selects a join type, Pig 
> philosophy guides us to always respect the user's choice and not explore 
> alternatives.  Therefore, we need a way to "pin" options.  
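
One way the idea of pinning could look, as a sketch only: the class, enum, and method names below are hypothetical and are not taken from pinned_options.patch. HASH stands in for the "regular" join and REPLICATED for "fr". A user's explicit choice pins the option, and an optimizer is allowed to change it only while it is unpinned:

```java
// Hypothetical sketch, not the LOJoin implementation.
final class PinnableJoinSketch {
    enum JoinType { HASH, REPLICATED, SKEWED, MERGE }

    private JoinType joinType = JoinType.HASH; // assumed default join method
    private boolean joinTypePinned = false;

    // A user's explicit choice sets the option and pins it.
    void chooseByUser(JoinType type) {
        joinType = type;
        joinTypePinned = true;
    }

    // An optimizer may change the option only while it is unpinned.
    // Returns true if the proposal was accepted.
    boolean proposeByOptimizer(JoinType type) {
        if (joinTypePinned) {
            return false;
        }
        joinType = type;
        return true;
    }

    JoinType getJoinType() {
        return joinType;
    }

    boolean isJoinTypePinned() {
        return joinTypePinned;
    }
}
```

Keeping the pin next to the option means an optimizer does not need to know where a value came from; it only needs to ask whether it may change it.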




[jira] Created: (PIG-1047) FINDBUGS: URF_UNREAD_FIELD: Unread field

2009-10-22 Thread Olga Natkovich (JIRA)
FINDBUGS: URF_UNREAD_FIELD: Unread field


 Key: PIG-1047
 URL: https://issues.apache.org/jira/browse/PIG-1047
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Olga Natkovich


UrF Unread field: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer.chunkSize
UrF Unread field: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.seen
UrF Unread field: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.log
UrF Unread field: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer.log
UrF Unread field: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.MRPrinter.mIndent
UrF Unread field: org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.load
UrF Unread field: org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.log
UrF Unread field: org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.r
UrF Unread field: org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PlanPrinter.printer
UrF Unread field: org.apache.pig.backend.hadoop.streaming.HadoopExecutableManager.writeHeaderFooter
UrF Unread field: org.apache.pig.builtin.BinStorage.i
UrF Unread field: org.apache.pig.builtin.PigStorage.os
UrF Unread field: org.apache.pig.builtin.Utf8StorageConverter.pigLogger
UrF Unread field: org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.mOriginalPlan
UrF Unread field: org.apache.pig.impl.logicalLayer.LOPrinter.printer
UrF Unread field: org.apache.pig.impl.plan.optimizer.RuleMatcher.mCommonNodes
UrF Unread field: org.apache.pig.impl.plan.optimizer.RulePlanPrinter.printer
UrF Unread field: org.apache.pig.impl.plan.PlanPrinter.printer
UrF Unread field: org.apache.pig.pen.DerivedDataVisitor.pc
UrF Unread field: org.apache.pig.PigServer.cachedScript
UuF Unused field: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReducePOStoreImpl.pc
UuF Unused field: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReducePOStoreImpl.sFile
UuF Unused field: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReducePOStoreImpl.storer
UuF Unused field: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.numQuantiles
UuF Unused field: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.samples




[jira] Resolved: (PIG-1032) FINDBUGS: DM_STRING_CTOR: Method invokes inefficient new String(String) constructor

2009-10-22 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1032.
-

Resolution: Fixed

patch committed

> FINDBUGS: DM_STRING_CTOR: Method invokes inefficient new String(String) 
> constructor
> ---
>
> Key: PIG-1032
> URL: https://issues.apache.org/jira/browse/PIG-1032
> Project: Pig
>  Issue Type: Improvement
>Reporter: Olga Natkovich
> Attachments: PIG-1032.patch
>
>
> Dm Method org.apache.pig.backend.executionengine.PigSlice.init(DataStorage) invokes toString() method on a String
> Dm org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.copyHadoopConfLocally(String) invokes inefficient new String(String) constructor
> Dm org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getFirstLineFromMessage(String) invokes inefficient new String(String) constructor
> Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.BinaryComparisonOperator.initializeRefs() invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.ExpressionOperator.clone() invokes inefficient new String(String) constructor
> Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(String) invokes inefficient new String(String) constructor
> Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(Boolean) invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.clone() invokes inefficient new String(String) constructor
> Dm new org.apache.pig.data.TimestampedTuple(String, String, int, SimpleDateFormat) invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.io.PigNullableWritable.toString() invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.logicalLayer.LOForEach.clone() invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.impl.logicalLayer.LOGenerate.clone() invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.impl.logicalLayer.LogicalPlan.clone() invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.logicalLayer.LOSort.clone() invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.impl.logicalLayer.optimizer.ImplicitSplitInserter.transform(List) invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.impl.logicalLayer.RemoveRedundantOperators.visit(LOProject) invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.logicalLayer.schema.Schema.getField(String) invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.logicalLayer.schema.Schema.reconcile(Schema) invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.insertCastForEachInBetweenIfNecessary(LogicalOperator, LogicalOperator, Schema) invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(Notification, Object) forces garbage collection; extremely dubious except in benchmarking code
> Dm org.apache.pig.pen.AugmentBaseDataVisitor.GetLargerValue(Object) invokes inefficient new String(String) constructor
> Dm org.apache.pig.pen.AugmentBaseDataVisitor.GetSmallerValue(Object) invokes inefficient new String(String) constructor
> Dm org.apache.pig.tools.cmdline.CmdLineParser.getNextOpt() invokes inefficient new String(String) constructor
> Dm org.apache.pig.tools.parameters.PreprocessorContext.substitute(String) invokes inefficient new String(String) constructor
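
To illustrate what these warnings are about, here is a small sketch; the class and method names are hypothetical and the code is not taken from PIG-1032.patch. Boolean.valueOf returns one of the two shared Boolean instances instead of allocating a new object, and because String is immutable an existing String can be reused directly, whereas new String(s) always allocates a distinct copy:

```java
// Hypothetical illustration of the Dm findings, not code from the Pig source tree.
final class CheapWrappers {
    // Returns the shared Boolean.TRUE or Boolean.FALSE instance; nothing is allocated.
    static Boolean box(boolean flag) {
        return Boolean.valueOf(flag);
    }

    // Strings are immutable, so the argument can be handed back as-is
    // instead of being wrapped in new String(s).
    static String reuse(String s) {
        return s;
    }

    // Shows the cost being flagged: new String(s) is equal to s
    // but is always a separate object.
    static boolean copyIsDistinctObject(String s) {
        String copy = new String(s);
        return copy != s && copy.equals(s);
    }
}
```

The findings that mention toString() on a String and forced garbage collection are different Dm patterns and are not covered by this sketch.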




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-10-22 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768956#action_12768956
 ] 

Dmitriy V. Ryaboy commented on PIG-760:
---

David,

If / when I get complex schemas to work, this could theoretically be promoted 
to PigStorage proper, which would be cool. For now, if you try to deserialize a 
complex schema, everything blows up. So that's not so good (especially since I 
let you serialize complex schemas! Actually, maybe I should turn that off).

I'll add some docs on the next iteration, good call.  Briefly -- it's a JSON 
representation of the ResourceSchema, as described on the LoadStore redesign 
proposal: http://wiki.apache.org/pig/LoadStoreRedesignProposal . Once you know 
what the fields are, it's pretty easy to read; the one complexity is that types 
are represented using constants from the DataType class, which are not publicly 
documented.

> Serialize schemas for PigStorage() and other storage types.
> ---
>
> Key: PIG-760
> URL: https://issues.apache.org/jira/browse/PIG-760
> Project: Pig
>  Issue Type: New Feature
>Reporter: David Ciemiewicz
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.6.0
>
> Attachments: pigstorageschema-2.patch, pigstorageschema.patch
>
>
> I'm finding PigStorage() really convenient for storage and data interchange 
> because it compresses well and imports into Excel and other analysis 
> environments well.
> However, it is a pain when it comes to maintenance because the columns are in 
> fixed locations and I'd like to add columns in some cases.
> It would be great if load PigStorage() could read a default schema from a 
> .schema file stored with the data and if store PigStorage() could store a 
> .schema file with the data.
> I have tested this out and both Hadoop HDFS and Pig in -exectype local mode 
> will ignore a file called .schema in a directory of part files.
> So, for example, if I have a chain of Pig scripts I execute such as:
> A = load 'data-1' using PigStorage() as ( a: int , b: int );
> store A into 'data-2' using PigStorage();
> B = load 'data-2' using PigStorage();
> describe B;
> describe B should output something like { a: int, b: int }




[jira] Created: (PIG-1046) join algorithm specification is within double quotes

2009-10-22 Thread Thejas M Nair (JIRA)
join algorithm specification is within double quotes


 Key: PIG-1046
 URL: https://issues.apache.org/jira/browse/PIG-1046
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair


This fails -
j = join l1 by $0, l2 by $0 using 'skewed';
This works -
j = join l1 by $0, l2 by $0 using "skewed";

String constants are single-quoted in pig-latin. If the algorithm specification 
is supposed to be a string, specifying it within single quotes should be 
supported.
Alternatively, we should use identifiers here: since these are predefined in 
Pig, users will not be specifying arbitrary values that might not be valid 
identifiers.
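
A sketch of the first option, accepting either quoting style for the algorithm name. This is a hypothetical helper, not the Pig parser, and it assumes the token has already been isolated from the rest of the statement:

```java
// Hypothetical helper, not part of the Pig grammar.
final class JoinTypeToken {
    // Strips one pair of matching single or double quotes, if present.
    static String unquote(String token) {
        String t = token.trim();
        if (t.length() >= 2) {
            char first = t.charAt(0);
            char last = t.charAt(t.length() - 1);
            if ((first == '\'' || first == '"') && first == last) {
                return t.substring(1, t.length() - 1);
            }
        }
        // Unquoted tokens, such as a bare identifier, are returned as-is.
        return t;
    }
}
```

Because a bare token passes through unchanged, the same helper would also cover the identifier alternative described above.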




[jira] Created: (PIG-1045) Integration with Hadoop 20 New API

2009-10-22 Thread Richard Ding (JIRA)
Integration with Hadoop 20 New API
--

 Key: PIG-1045
 URL: https://issues.apache.org/jira/browse/PIG-1045
 Project: Pig
  Issue Type: New Feature
Reporter: Richard Ding
Assignee: Richard Ding


Hadoop 21 is not yet released, but we know that the switch to the new MR API is 
coming there. This JIRA is for early integration with the portion of this API 
that has been implemented in Hadoop 20.




[jira] Commented: (PIG-1032) FINDBUGS: DM_STRING_CTOR: Method invokes inefficient new String(String) constructor

2009-10-22 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768949#action_12768949
 ] 

Daniel Dai commented on PIG-1032:
-

+1. Target findbugs warnings suppressed.




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-10-22 Thread David Ciemiewicz (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768948#action_12768948
 ] 

David Ciemiewicz commented on PIG-760:
--

@Dmitriy:

> Still flat-schemas only, haven't gotten around to wrestling the Jackson 
> Parser on this one. David - do you need nested schemas?

No, at this time, and for the foreseeable future, I do not need nested schemas 
(hierarchical schemas) in relation to PigStorage.

In general, PigStorage users are doing single atomic value per column. So your 
current implementation sounds like it will suffice for now, with appropriate 
caveats in any docs.

BTW, is there a pointer to documentation of what the .pig_schema file looks like?  
Guess there should be one for .pig_header too. :-)




[jira] Created: (PIG-1044) [zebra] Zebra should implement ReversibleLoadStoreFunc interface

2009-10-22 Thread Olga Natkovich (JIRA)
[zebra] Zebra should implement ReversibleLoadStoreFunc interface


 Key: PIG-1044
 URL: https://issues.apache.org/jira/browse/PIG-1044
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich


This will allow some extra optimizations in Pig and will ensure that Pig + 
Zebra have the best possible performance.

Need to do:

- combine current load and store classes
- implement the interface (no additional functions needed)




[jira] Updated: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-10-22 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-760:
--

Attachment: pigstorageschema-2.patch

New patch to address findbugs and make the classes a little nicer to use.

Made internal fields protected, since having them public *and* having 
getters/setters didn't really make sense.

Setters now return "this", so that they can be chained.

Array setters make a copy of the passed in array.  Getters return the internal 
array, so it's still possible to shoot oneself in the foot (as findbugs points 
out), but side-effecting those arrays is the intended use case.

Still flat-schemas only, haven't gotten around to wrestling the Jackson Parser 
on this one. David -- do you need nested schemas?

Submitting as a patch so that Hudson can have a go. Would appreciate code 
comments, especially with regards to the interfaces (and changes I made to 
them) from the Load/Store redesign proposal. 

We probably want to hold off on committing this until the new interfaces settle 
in a bit.
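
The setter and getter conventions described in this comment could look roughly like the sketch below. It is only an illustration: `SchemaSketch`, its fields, and its method names are hypothetical and are not taken from pigstorageschema-2.patch.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the class from the patch.
class SchemaSketch {
    // Internal fields are protected rather than public.
    protected String[] fieldNames = new String[0];
    protected byte[] fieldTypes = new byte[0];

    // Setters copy the passed-in array and return "this" so calls can be chained.
    SchemaSketch setFieldNames(String[] names) {
        this.fieldNames = Arrays.copyOf(names, names.length);
        return this;
    }

    SchemaSketch setFieldTypes(byte[] types) {
        this.fieldTypes = Arrays.copyOf(types, types.length);
        return this;
    }

    // Getters return the internal array itself, so callers can modify it in place.
    String[] getFieldNames() { return fieldNames; }
    byte[] getFieldTypes() { return fieldTypes; }
}
```

With this shape, `new SchemaSketch().setFieldNames(names).setFieldTypes(types)` builds an object in one expression, and later changes to `names` do not leak into it.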

> Serialize schemas for PigStorage() and other storage types.
> ---
>
> Key: PIG-760
> URL: https://issues.apache.org/jira/browse/PIG-760
> Project: Pig
> Issue Type: New Feature
> Reporter: David Ciemiewicz
> Assignee: Dmitriy V. Ryaboy
> Fix For: 0.6.0
>
> Attachments: pigstorageschema-2.patch, pigstorageschema.patch
>
>
> I'm finding PigStorage() really convenient for storage and data interchange 
> because it compresses well and imports into Excel and other analysis 
> environments well.
> However, it is a pain when it comes to maintenance because the columns are in 
> fixed locations and I'd like to add columns in some cases.
> It would be great if load PigStorage() could read a default schema from a 
> .schema file stored with the data and if store PigStorage() could store a 
> .schema file with the data.
> I have tested this out and both Hadoop HDFS and Pig in -exectype local mode 
> will ignore a file called .schema in a directory of part files.
> So, for example, if I have a chain of Pig scripts I execute such as:
> A = load 'data-1' using PigStorage() as ( a: int , b: int );
> store A into 'data-2' using PigStorage();
> B = load 'data-2' using PigStorage();
> describe B;
> describe B should output something like { a: int, b: int }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-10-22 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-760:
--

Fix Version/s: 0.6.0
   Status: Patch Available  (was: Open)

> Serialize schemas for PigStorage() and other storage types.
> ---
>
> Key: PIG-760
> URL: https://issues.apache.org/jira/browse/PIG-760
> Project: Pig
> Issue Type: New Feature
> Reporter: David Ciemiewicz
> Assignee: Dmitriy V. Ryaboy
> Fix For: 0.6.0
>
> Attachments: pigstorageschema-2.patch, pigstorageschema.patch
>
>
> I'm finding PigStorage() really convenient for storage and data interchange 
> because it compresses well and imports into Excel and other analysis 
> environments well.
> However, it is a pain when it comes to maintenance because the columns are in 
> fixed locations and I'd like to add columns in some cases.
> It would be great if load PigStorage() could read a default schema from a 
> .schema file stored with the data and if store PigStorage() could store a 
> .schema file with the data.
> I have tested this out and both Hadoop HDFS and Pig in -exectype local mode 
> will ignore a file called .schema in a directory of part files.
> So, for example, if I have a chain of Pig scripts I execute such as:
> A = load 'data-1' using PigStorage() as ( a: int , b: int );
> store A into 'data-2' using PigStorage();
> B = load 'data-2' using PigStorage();
> describe B;
> describe B should output something like { a: int, b: int }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-10-22 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-760:
--

Status: Open  (was: Patch Available)

> Serialize schemas for PigStorage() and other storage types.
> ---
>
> Key: PIG-760
> URL: https://issues.apache.org/jira/browse/PIG-760
> Project: Pig
> Issue Type: New Feature
> Reporter: David Ciemiewicz
> Assignee: Dmitriy V. Ryaboy
> Attachments: pigstorageschema.patch
>
>
> I'm finding PigStorage() really convenient for storage and data interchange 
> because it compresses well and imports into Excel and other analysis 
> environments well.
> However, it is a pain when it comes to maintenance because the columns are in 
> fixed locations and I'd like to add columns in some cases.
> It would be great if load PigStorage() could read a default schema from a 
> .schema file stored with the data and if store PigStorage() could store a 
> .schema file with the data.
> I have tested this out and both Hadoop HDFS and Pig in -exectype local mode 
> will ignore a file called .schema in a directory of part files.
> So, for example, if I have a chain of Pig scripts I execute such as:
> A = load 'data-1' using PigStorage() as ( a: int , b: int );
> store A into 'data-2' using PigStorage();
> B = load 'data-2' using PigStorage();
> describe B;
> describe B should output something like { a: int, b: int }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1043) FINDBUGS: SIC_INNER_SHOULD_BE_STATIC: Should be a static inner class

2009-10-22 Thread Olga Natkovich (JIRA)
FINDBUGS: SIC_INNER_SHOULD_BE_STATIC: Should be a static inner class


Key: PIG-1043
URL: https://issues.apache.org/jira/browse/PIG-1043
Project: Pig
Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Olga Natkovich


SIC Should org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer$AlgebraicPlanChecker be a _static_ inner class?
SIC Should org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer$DistinctPatcher be a _static_ inner class?
SIC Should org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer$fixMapProjects be a _static_ inner class?
SIC Should org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.KeyTypeDiscoveryVisitor$PhyPlanKeyTypeVisitor be a _static_ inner class?
SIC Should org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReducePOStoreImpl$StoreFuncAdaptor be a _static_ inner class?
SIC Should org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$FindKeyTypeVisitor be a _static_ inner class?
SIC Should org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.NoopStoreRemover$RemovableStore be a _static_ inner class?
SIC Should org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.DotMRPrinter$InnerOperator be a _static_ inner class?
SIC Should org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator$LoRearrangeDiscoverer be a _static_ inner class?
SIC Should org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator$PackageDiscoverer be a _static_ inner class?
SIC Should org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POCogroup$groupComparator be a _static_ inner class?
SIC Should org.apache.pig.data.DistinctDataBag$DistinctDataBagIterator$TContainer be a _static_ inner class?
SIC Should org.apache.pig.data.SortedDataBag$DefaultComparator be a _static_ inner class?
SIC Should org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor$ScoreFuncSpecListComparator be a _static_ inner class?
SIC Should org.apache.pig.shock.SSHSocketImplFactory$SSHProcess be a _static_ inner class?
SIC Should org.apache.pig.tools.grunt.GruntParser$ExplainState be a _static_ inner class?
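
For context, the change FindBugs suggests for each class listed above is usually a one-word edit. The sketch below shows the pattern on a made-up class; `SortedBagSketch` and its nested comparators are hypothetical and are not Pig code.

```java
import java.util.Comparator;

// Hypothetical example; SortedBagSketch is not a Pig class.
class SortedBagSketch {
    // Before: a non-static inner class keeps a hidden reference to the
    // enclosing SortedBagSketch instance even though it never uses it.
    class InnerComparator implements Comparator<String> {
        @Override
        public int compare(String a, String b) { return a.compareTo(b); }
    }

    // After: a static nested class has no hidden outer reference and can be
    // created without an enclosing instance.
    static class StaticComparator implements Comparator<String> {
        @Override
        public int compare(String a, String b) { return a.compareTo(b); }
    }
}
```

The static form is only valid when the nested class does not touch instance members of the outer class, which is the condition the SIC warning checks for.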

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1042) javac warnings: unchecked

2009-10-22 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768890#action_12768890
 ] 

Daniel Dai commented on PIG-1042:
-

We can subdivide the unchecked warnings into several categories:
1. Logical plan related type conversion checking. The logical layer uses generics 
extensively, and there are places where we do not use them consistently. We shall 
address these issues in our logical plan rework. For now, I simply suppress 
them and leave a note; we will revisit them later.
2. Backend related type conversion checking. We skip type conversion checking 
intentionally for performance reasons. For these, I will simply suppress the warnings.
3. javacc generated warnings. We shall find a way to address them.
4. Some valid warnings, which we shall fix.
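
As a rough illustration of the difference between suppressing a warning (categories 1 and 2) and fixing it (category 4), consider the sketch below. `UncheckedSketch` and its methods are hypothetical and do not correspond to any Pig class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example; not Pig code.
class UncheckedSketch {
    // Intentional unchecked cast: the warning is suppressed on the narrowest
    // possible scope, a single local variable declaration.
    static List<String> asStrings(Object raw) {
        @SuppressWarnings("unchecked")
        List<String> typed = (List<String>) raw;
        return typed;
    }

    // Valid warning fixed: generics are used consistently instead of a raw
    // type, so no suppression is needed.
    static List<String> copy(List<String> in) {
        return new ArrayList<String>(in);
    }
}
```

Keeping the annotation on a local declaration rather than on the whole method or class avoids hiding new unchecked warnings that appear later.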

>  javac warnings: unchecked
> --
>
> Key: PIG-1042
> URL: https://issues.apache.org/jira/browse/PIG-1042
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.4.0
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: 0.6.0
>
>
> Pig has 164 javac warnings when you build it with the option 
> "-Dall.warnings=1" which fall into the category "unchecked". We need to suppress 
> all of them.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1032) FINDBUGS: DM_STRING_CTOR: Method invokes inefficient new String(String) constructor

2009-10-22 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768870#action_12768870
 ] 

Daniel Dai commented on PIG-1032:
-

I am reviewing the patch.

> FINDBUGS: DM_STRING_CTOR: Method invokes inefficient new String(String) 
> constructor
> ---
>
> Key: PIG-1032
> URL: https://issues.apache.org/jira/browse/PIG-1032
> Project: Pig
> Issue Type: Improvement
> Reporter: Olga Natkovich
> Attachments: PIG-1032.patch
>
>
> Dm Method org.apache.pig.backend.executionengine.PigSlice.init(DataStorage) invokes toString() method on a String
> Dm org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.copyHadoopConfLocally(String) invokes inefficient new String(String) constructor
> Dm org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getFirstLineFromMessage(String) invokes inefficient new String(String) constructor
> Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.BinaryComparisonOperator.initializeRefs() invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.ExpressionOperator.clone() invokes inefficient new String(String) constructor
> Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(String) invokes inefficient new String(String) constructor
> Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(Boolean) invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.clone() invokes inefficient new String(String) constructor
> Dm new org.apache.pig.data.TimestampedTuple(String, String, int, SimpleDateFormat) invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.io.PigNullableWritable.toString() invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.logicalLayer.LOForEach.clone() invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.impl.logicalLayer.LOGenerate.clone() invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.impl.logicalLayer.LogicalPlan.clone() invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.logicalLayer.LOSort.clone() invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.impl.logicalLayer.optimizer.ImplicitSplitInserter.transform(List) invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.impl.logicalLayer.RemoveRedundantOperators.visit(LOProject) invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.logicalLayer.schema.Schema.getField(String) invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.logicalLayer.schema.Schema.reconcile(Schema) invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.insertCastForEachInBetweenIfNecessary(LogicalOperator, LogicalOperator, Schema) invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(Notification, Object) forces garbage collection; extremely dubious except in benchmarking code
> Dm org.apache.pig.pen.AugmentBaseDataVisitor.GetLargerValue(Object) invokes inefficient new String(String) constructor
> Dm org.apache.pig.pen.AugmentBaseDataVisitor.GetSmallerValue(Object) invokes inefficient new String(String) constructor
> Dm org.apache.pig.tools.cmdline.CmdLineParser.getNextOpt() invokes inefficient new String(String) constructor
> Dm org.apache.pig.tools.parameters.PreprocessorContext.substitute(String) invokes inefficient new String(String) constructor
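
The two patterns that dominate the list above have simple replacements. The sketch below is a generic illustration, not code from PIG-1032.patch; `DmFixSketch` and its methods are hypothetical names.

```java
// Hypothetical example; not taken from the patch.
class DmFixSketch {
    // Flagged pattern: new String(s) allocates a copy that is never needed,
    // because String is immutable. Returning the existing reference is enough.
    static String name(String s) {
        return s;                    // instead of: new String(s)
    }

    // Flagged pattern: new Boolean(b) allocates a fresh object each time.
    // Boolean.valueOf returns one of the two shared instances.
    static Boolean flag(boolean b) {
        return Boolean.valueOf(b);   // instead of: new Boolean(b)
    }
}
```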

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-984) PERFORMANCE: Implement a map-side group operator to speed up processing of ordered data

2009-10-22 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-984:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed.  Thanks Richard.

> PERFORMANCE: Implement a map-side group operator to speed up processing of 
> ordered data 
> 
>
> Key: PIG-984
> URL: https://issues.apache.org/jira/browse/PIG-984
> Project: Pig
> Issue Type: New Feature
> Reporter: Richard Ding
> Assignee: Richard Ding
> Attachments: PIG-984.patch, PIG-984_1.patch, PIG-984_1.patch, 
> PIG-984_1.patch
>
>
> The general group by operation in Pig needs both mappers and reducers (the 
> aggregation is done in reducers). This incurs disk writes/reads  between 
> mappers and reducers.
> However, in the cases where the input data has the following properties
>1. The records with the same key are grouped together (such as the data is 
> sorted by the keys).
>2. The records with the same key are in the same mapper input.
> the group by operation can be performed in the mappers only and thus remove 
> the overhead of disk writes/reads.
> Alan proposed adding a hint to the group by clause like this one:
> {code}
> A = load 'input' using SomeLoader(...);
> B = group A by $0 using "mapside";
> C = foreach B generate ...
> {code}
> The proposed addition of using "mapside" to group will be a mapside group 
> operator that collects all records for a given key into a buffer. When it 
> sees a key change it will emit the key and bag for records it had buffered. 
> It will assume that all records for a given key are collected together and 
> thus there is no need to buffer across keys. 
> It is expected that "SomeLoader" will be implemented by data systems such as 
> Zebra to ensure the data emitted by the loader satisfies the above properties 
> (1) and (2).
> It will be the responsibility of the user (or the loader) to guarantee these 
> properties (1) & (2) before invoking the mapside hint for the group by 
> clause. The Pig runtime can't check for the errors in the input data.
> For the group by clauses with mapside hint, Pig Latin will only support group 
> by columns (including *), not group by expressions nor group all. 
>   
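
The buffering behaviour described in the quoted issue can be sketched as follows. This is a simplified stand-alone illustration, not the operator committed in the patch: `MapSideGroupSketch` is a hypothetical name, records are modelled as two-element string arrays with a non-null key in position 0, and the whole input is passed as a list rather than streamed.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of grouping input whose equal keys are adjacent.
class MapSideGroupSketch {
    // Only the bag for the current key is buffered; it is emitted on key change.
    static List<Map.Entry<String, List<String>>> group(List<String[]> records) {
        List<Map.Entry<String, List<String>>> out = new ArrayList<>();
        String currentKey = null;
        List<String> bag = null;
        for (String[] rec : records) {              // rec[0] = key, rec[1] = value
            if (bag == null || !rec[0].equals(currentKey)) {
                if (bag != null) {
                    out.add(new SimpleEntry<>(currentKey, bag)); // key changed: emit
                }
                currentKey = rec[0];
                bag = new ArrayList<>();
            }
            bag.add(rec[1]);
        }
        if (bag != null) {
            out.add(new SimpleEntry<>(currentKey, bag));         // emit the last group
        }
        return out;
    }
}
```

If property (1) does not hold, the same key is simply emitted more than once, which matches the statement that the runtime cannot detect such errors in the input.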

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-927) null should be handled consistently in Join

2009-10-22 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768825#action_12768825
 ] 

Alan Gates commented on PIG-927:


Sorry, I missed the \t at the end of the line.  Test looks good.  +1

> null should be handled consistently in Join
> ---
>
> Key: PIG-927
> URL: https://issues.apache.org/jira/browse/PIG-927
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.4.0
> Reporter: Pradeep Kamath
> Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: PIG-927-1.patch, PIG-927-2.patch
>
>
> Currently Pig mostly follows SQL semantics for handling null. However, there 
> are certain cases where Pig may not handle nulls correctly. One example 
> is the join: a join on a single key results in null keys not matching to 
> produce an output. However, if the join is on more than one key and one of 
> the values in the key tuple is null, it still matches another key tuple that 
> has a null for that value. We need to decide the right semantics here. 
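
One candidate for a consistent rule is the SQL-style one, where a null in any key column prevents a match. The sketch below illustrates that rule only; it is not the behaviour chosen in the PIG-927 patches, and `JoinKeySketch` is a hypothetical name.

```java
import java.util.List;

// Hypothetical illustration of SQL-style key matching; not Pig code.
class JoinKeySketch {
    // A key containing a null never matches anything, whether the key has
    // one column or several.
    static boolean matches(List<Object> left, List<Object> right) {
        if (left.size() != right.size()) {
            return false;
        }
        for (int i = 0; i < left.size(); i++) {
            Object l = left.get(i);
            Object r = right.get(i);
            if (l == null || r == null || !l.equals(r)) {
                return false;
            }
        }
        return true;
    }
}
```

Under this rule the single-key and multi-key cases behave the same way: `(1, null)` does not join with `(1, null)`.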

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments

2009-10-22 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-598:
---

Status: Patch Available  (was: Open)

> Parameter substitution ($PARAMETER) should not be performed in comments
> ---
>
> Key: PIG-598
> URL: https://issues.apache.org/jira/browse/PIG-598
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.2.0
> Reporter: David Ciemiewicz
> Assignee: Thejas M Nair
> Attachments: PIG-598.patch
>
>
> Compiling the following code example will generate an error that 
> $NOT_A_PARAMETER is an Undefined Parameter.
> This is problematic as sometimes you want to comment out parts of your code, 
> including parameters so that you don't have to define them.
> Thus I think it would be really good if parameter substitution were not 
> performed in comments.
> {code}
> -- $NOT_A_PARAMETER
> {code}
> {code}
> -bash-3.00$ pig -exectype local -latest comment.pig
> USING: /grid/0/gs/pig/current
> java.lang.RuntimeException: Undefined parameter : NOT_A_PARAMETER
> at 
> org.apache.pig.tools.parameters.PreprocessorContext.substitute(PreprocessorContext.java:221)
> at 
> org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.parsePigFile(ParameterSubstitutionPreprocessor.java:106)
> at 
> org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.genSubstitutedFile(ParameterSubstitutionPreprocessor.java:86)
> at org.apache.pig.Main.runParamPreprocessor(Main.java:394)
> at org.apache.pig.Main.main(Main.java:296)
> {code}
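
The requested behaviour can be sketched in a few lines. This is an illustration under simplifying assumptions, not the approach taken in PIG-598.patch: it only handles `--` line comments, it does not recognise `--` inside quoted strings or `/* */` block comments, and `CommentAwareSubstitutionSketch` is a hypothetical name.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: substitute $NAME parameters, but leave comments alone.
class CommentAwareSubstitutionSketch {
    // Names must start with a letter or underscore, so positional
    // references such as $0 are not treated as parameters.
    private static final Pattern PARAM = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    static String substituteLine(String line, Map<String, String> params) {
        // Split the line at the first "--"; everything after it is a comment.
        int comment = line.indexOf("--");
        String code = comment < 0 ? line : line.substring(0, comment);
        String tail = comment < 0 ? "" : line.substring(comment);

        Matcher m = PARAM.matcher(code);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            if (value == null) {
                throw new RuntimeException("Undefined parameter : " + m.group(1));
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.append(tail).toString();   // the comment is appended unchanged
    }
}
```

With this sketch, the line `-- $NOT_A_PARAMETER` passes through untouched, while an undefined parameter outside a comment still raises the error.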

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-879) Pig should provide a way for input location string in load statement to be passed as-is to the Loader

2009-10-22 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768818#action_12768818
 ] 

Pradeep Kamath commented on PIG-879:


We may want to address this as part of implementing 
http://wiki.apache.org/pig/LoadStoreRedesignProposal. Option 4 seems the most 
extensible since loadfuncs get flexibility of dealing with the location string. 
At the same time we could provide some utility method to LoadFunc writers which 
they can use or refer to and this can be provided by pig as a static utility 
method (this can be the implementation that Pig internally uses in its builtin 
loaders)

> Pig should provide a way for input location string in load statement to be 
> passed as-is to the Loader
> -
>
> Key: PIG-879
> URL: https://issues.apache.org/jira/browse/PIG-879
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.3.0
> Reporter: Pradeep Kamath
>
>  Due to multiquery optimization, Pig always converts the filenames to 
> absolute URIs (see 
> http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification - section 
> about Incompatible Changes - Path Names and Schemes). This is necessary since 
> the script may have "cd .." statements between load or store statements and 
> if the load statements have relative paths, we would need to convert to 
> absolute paths to know where to load/store from. To do this 
> QueryParser.massageFilename() has the code below[1] which basically gives the 
> fully qualified hdfs path
>  
> However the issue with this approach is that if the filename string is 
> something like 
> "hdfs://localhost.localdomain:39125/user/bla/1,hdfs://localhost.localdomain:39125/user/bla/2",
>  the code below[1] actually translates this to 
> hdfs://localhost.localdomain:38264/user/bla/1,hdfs://localhost.localdomain:38264/user/bla/2
>  and throws an exception that it is an incorrect path.
>  
> Some loaders may want to interpret the filenames (the input location string 
> in the load statement) in any way they wish and may want Pig to not make 
> absolute paths out of them.
>  
> There are a few options to address this:
> 1) A command line switch to indicate to Pig that pathnames in the script 
> are all absolute and hence Pig should pass them as-is to Loaders and 
> Storers without altering them. 
> 2) A keyword in the load and store statements to indicate the same intent 
> to Pig.
> 3) A property which users can supply on the command line or in pig.properties 
> to indicate the same intent.
> 4) A method in LoadFunc, relativeToAbsolutePath(String filename, String 
> curDir), which does the conversion to absolute; this way a Loader can choose 
> to implement it as a no-op.
> Thoughts?
>  
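
Option 4 could take roughly the shape sketched below. Only the method name `relativeToAbsolutePath(String, String)` comes from the issue text; the interface `LocationAwareLoader`, the two implementing classes, and the path handling are hypothetical and do not reflect the actual LoadFunc API.

```java
// Hypothetical interface for option 4; not the actual Pig LoadFunc API.
interface LocationAwareLoader {
    String relativeToAbsolutePath(String location, String curDir);
}

// A default-style implementation: qualify relative paths against curDir.
class DefaultPathLoader implements LocationAwareLoader {
    @Override
    public String relativeToAbsolutePath(String location, String curDir) {
        if (location.startsWith("/") || location.contains("://")) {
            return location;                       // already absolute
        }
        return curDir.endsWith("/") ? curDir + location : curDir + "/" + location;
    }
}

// A loader that wants the location string passed as-is implements a no-op.
class PassThroughLoader implements LocationAwareLoader {
    @Override
    public String relativeToAbsolutePath(String location, String curDir) {
        return location;
    }
}
```

A loader that accepts a comma-separated list of URIs, as in the example from the issue, would use the pass-through form and parse the string itself.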

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1032) FINDBUGS: DM_STRING_CTOR: Method invokes inefficient new String(String) constructor

2009-10-22 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768819#action_12768819
 ] 

Olga Natkovich commented on PIG-1032:
-

I ran all unit tests manually. Can one of the committers please review?

> FINDBUGS: DM_STRING_CTOR: Method invokes inefficient new String(String) 
> constructor
> ---
>
> Key: PIG-1032
> URL: https://issues.apache.org/jira/browse/PIG-1032
> Project: Pig
> Issue Type: Improvement
> Reporter: Olga Natkovich
> Attachments: PIG-1032.patch
>
>
> Dm Method org.apache.pig.backend.executionengine.PigSlice.init(DataStorage) invokes toString() method on a String
> Dm org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.copyHadoopConfLocally(String) invokes inefficient new String(String) constructor
> Dm org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getFirstLineFromMessage(String) invokes inefficient new String(String) constructor
> Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.BinaryComparisonOperator.initializeRefs() invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.ExpressionOperator.clone() invokes inefficient new String(String) constructor
> Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(String) invokes inefficient new String(String) constructor
> Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(Boolean) invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.clone() invokes inefficient new String(String) constructor
> Dm new org.apache.pig.data.TimestampedTuple(String, String, int, SimpleDateFormat) invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.io.PigNullableWritable.toString() invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.logicalLayer.LOForEach.clone() invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.impl.logicalLayer.LOGenerate.clone() invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.impl.logicalLayer.LogicalPlan.clone() invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.logicalLayer.LOSort.clone() invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.impl.logicalLayer.optimizer.ImplicitSplitInserter.transform(List) invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.impl.logicalLayer.RemoveRedundantOperators.visit(LOProject) invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.logicalLayer.schema.Schema.getField(String) invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.logicalLayer.schema.Schema.reconcile(Schema) invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.insertCastForEachInBetweenIfNecessary(LogicalOperator, LogicalOperator, Schema) invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(Notification, Object) forces garbage collection; extremely dubious except in benchmarking code
> Dm org.apache.pig.pen.AugmentBaseDataVisitor.GetLargerValue(Object) invokes inefficient new String(String) constructor
> Dm org.apache.pig.pen.AugmentBaseDataVisitor.GetSmallerValue(Object) invokes inefficient new String(String) constructor
> Dm org.apache.pig.tools.cmdline.CmdLineParser.getNextOpt() invokes inefficient new String(String) constructor
> Dm org.apache.pig.tools.parameters.PreprocessorContext.substitute(String) invokes inefficient new String(String) constructor

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1032) FINDBUGS: DM_STRING_CTOR: Method invokes inefficient new String(String) constructor

2009-10-22 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1032:


Attachment: PIG-1032.patch

> FINDBUGS: DM_STRING_CTOR: Method invokes inefficient new String(String) 
> constructor
> ---
>
> Key: PIG-1032
> URL: https://issues.apache.org/jira/browse/PIG-1032
> Project: Pig
> Issue Type: Improvement
> Reporter: Olga Natkovich
> Attachments: PIG-1032.patch
>
>
> Dm Method org.apache.pig.backend.executionengine.PigSlice.init(DataStorage) invokes toString() method on a String
> Dm org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.copyHadoopConfLocally(String) invokes inefficient new String(String) constructor
> Dm org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getFirstLineFromMessage(String) invokes inefficient new String(String) constructor
> Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.BinaryComparisonOperator.initializeRefs() invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.ExpressionOperator.clone() invokes inefficient new String(String) constructor
> Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(String) invokes inefficient new String(String) constructor
> Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(Boolean) invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.clone() invokes inefficient new String(String) constructor
> Dm new org.apache.pig.data.TimestampedTuple(String, String, int, SimpleDateFormat) invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.io.PigNullableWritable.toString() invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.logicalLayer.LOForEach.clone() invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.impl.logicalLayer.LOGenerate.clone() invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.impl.logicalLayer.LogicalPlan.clone() invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.logicalLayer.LOSort.clone() invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.impl.logicalLayer.optimizer.ImplicitSplitInserter.transform(List) invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.impl.logicalLayer.RemoveRedundantOperators.visit(LOProject) invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.logicalLayer.schema.Schema.getField(String) invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.logicalLayer.schema.Schema.reconcile(Schema) invokes inefficient new String(String) constructor
> Dm org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.insertCastForEachInBetweenIfNecessary(LogicalOperator, LogicalOperator, Schema) invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
> Dm org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(Notification, Object) forces garbage collection; extremely dubious except in benchmarking code
> Dm org.apache.pig.pen.AugmentBaseDataVisitor.GetLargerValue(Object) invokes inefficient new String(String) constructor
> Dm org.apache.pig.pen.AugmentBaseDataVisitor.GetSmallerValue(Object) invokes inefficient new String(String) constructor
> Dm org.apache.pig.tools.cmdline.CmdLineParser.getNextOpt() invokes inefficient new String(String) constructor
> Dm org.apache.pig.tools.parameters.PreprocessorContext.substitute(String) invokes inefficient new String(String) constructor

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1028) FINDBUGS: DM_NUMBER_CTOR: Method invokes inefficient Number constructor; use static valueOf instead

2009-10-22 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1028:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch was committed yesterday.

> FINDBUGS: DM_NUMBER_CTOR: Method invokes inefficient Number constructor; use 
> static valueOf instead
> ---
>
> Key: PIG-1028
> URL: https://issues.apache.org/jira/browse/PIG-1028
> Project: Pig
> Issue Type: Bug
> Reporter: Olga Natkovich
> Assignee: Olga Natkovich
> Attachments: PIG-1028.patch
>
>
> Bx Method org.apache.pig.backend.hadoop.datastorage.HDataStorage.getStatistics() invokes inefficient new Long(long) constructor; use Long.valueOf(long) instead
> Bx Method org.apache.pig.backend.hadoop.datastorage.HDataStorage.init() invokes inefficient new Short(short) constructor; use Short.valueOf(short) instead
> Bx Method org.apache.pig.backend.hadoop.datastorage.HPath.getConfiguration() invokes inefficient new Long(long) constructor; use Long.valueOf(long) instead
> Bx Method org.apache.pig.backend.hadoop.datastorage.HPath.getConfiguration() invokes inefficient new Short(short) constructor; use Short.valueOf(short) instead
> Bx Method org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.addShiftedKeyInfoIndex(int, POPackage) invokes inefficient new Integer(int) constructor; use Integer.valueOf(int) instead
> Bx Method org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator$LoRearrangeDiscoverer.visitLocalRearrange(POLocalRearrange) invokes inefficient new Integer(int) constructor; use Integer.valueOf(int) instead
> Bx Method org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Integer) invokes inefficient new Integer(int) constructor; use Integer.valueOf(int) instead
> Bx Method org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Long) invokes inefficient new Long(long) constructor; use Long.valueOf(long) instead
> Bx Method org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Divide.getNext(Integer) invokes inefficient new Integer(int) constructor; use Integer.valueOf(int) instead
> Bx Method org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Divide.getNext(Long) invokes inefficient new Long(long) constructor; use Long.valueOf(long) instead
> BxMethod 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Mod.getNext(Integer)
>  invokes inefficient new Integer(int) constructor; use Integer.valueOf(int) 
> instead
> BxMethod 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Mod.getNext(Long)
>  invokes inefficient new Long(long) constructor; use Long.valueOf(long) 
> instead
> BxMethod 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Multiply.getNext(Integer)
>  invokes inefficient new Integer(int) constructor; use Integer.valueOf(int) 
> instead
> BxMethod 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Multiply.getNext(Long)
>  invokes inefficient new Long(long) constructor; use Long.valueOf(long) 
> instead
> BxMethod 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(Integer)
>  invokes inefficient new Integer(int) constructor; use Integer.valueOf(int) 
> instead
> BxMethod 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(Long)
>  invokes inefficient new Long(long) constructor; use Long.valueOf(long) 
> instead
> BxMethod 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Subtract.getNext(Integer)
>  invokes inefficient new Integer(int) constructor; use Integer.valueOf(int) 
> instead
> BxMethod 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Subtract.getNext(Long)
>  invokes inefficient new Long(long) constructor; use Long.valueOf(long) 
> instead
> BxMethod 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.setIndex(int,
>  boolean) invokes inefficient new Byte(byte) constructor; use 
> Byte.valueOf(byte) instead
> BxMethod 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrangeForIllustrate.constructLROutput(List,
>  Tuple) invokes inefficient new Byte(byte) constructor; use 
> Byte.valueOf(byte) instead
> BxMethod org.apache.pig.builtin.ARITY.exec(Tuple) invokes inefficient new 
> Integer(int) constructor; us
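
As a rough illustration of the DM_NUMBER_CTOR fix listed above, the sketch 
below boxes primitives with the static valueOf factories instead of the 
constructors. The class and method names are hypothetical and are not from 
the Pig source or from PIG-1028.patch.

```java
// Hypothetical illustration of the DM_NUMBER_CTOR fix; not code from Pig.
class NumberCtorSketch {
    // Preferred form: valueOf may return a cached instance instead of allocating.
    static Integer boxInt(int v) {
        return Integer.valueOf(v);
    }

    static Long boxLong(long v) {
        return Long.valueOf(v);
    }

    static Short boxShort(short v) {
        return Short.valueOf(v);
    }

    static Byte boxByte(byte v) {
        return Byte.valueOf(v);
    }

    public static void main(String[] args) {
        // Integer.valueOf is documented to cache values in the range -128..127,
        // so both calls return the same object here.
        System.out.println(boxInt(100) == boxInt(100));
        System.out.println(boxLong(7L).equals(Long.valueOf(7L)));
    }
}
```

Callers should still compare boxed values with equals(), since values outside 
the cached range are not guaranteed to share an instance.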

[jira] Updated: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments

2009-10-22 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-598:
--

Attachment: PIG-598.patch

With this patch:
- Parameters in comments are no longer substituted.
- Line numbers do not change after parameter substitution, as long as 
declare/default statements do not span multiple lines (no multi-line literals, 
etc.). This helps give accurate line numbers in error messages after parameter 
substitution.

Code changes:
Instead of processing the Pig script one line at a time, the parser now 
processes the whole script at once.
Some test cases changed because the original line numbers are now preserved.
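
The behavior described above can be sketched roughly as follows. This is not 
the code from PIG-598.patch; the class name, the method name, and the simple 
scanning approach are assumptions made for illustration only. The sketch 
walks the whole script at once, copies "--" and "/* */" comments through 
unchanged, and replaces $NAME elsewhere. It does not handle quoted string 
literals that contain comment markers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; this is not the implementation in PIG-598.patch.
class CommentAwareSubstitution {
    static String substitute(String script, Map<String, String> params) {
        StringBuilder out = new StringBuilder();
        int n = script.length();
        int i = 0;
        while (i < n) {
            if (script.startsWith("--", i)) {
                // Line comment: copy up to the end of the line without substituting.
                int end = script.indexOf('\n', i);
                if (end < 0) {
                    end = n;
                }
                out.append(script, i, end);
                i = end;
            } else if (script.startsWith("/*", i)) {
                // Block comment: copy through the closing marker without substituting.
                int close = script.indexOf("*/", i + 2);
                int end = (close < 0) ? n : close + 2;
                out.append(script, i, end);
                i = end;
            } else if (script.charAt(i) == '$' && i + 1 < n && isNameStart(script.charAt(i + 1))) {
                // Parameter reference outside a comment: look it up and replace it.
                int j = i + 1;
                while (j < n && isNamePart(script.charAt(j))) {
                    j++;
                }
                String name = script.substring(i + 1, j);
                String value = params.get(name);
                if (value == null) {
                    throw new RuntimeException("Undefined parameter : " + name);
                }
                out.append(value);
                i = j;
            } else {
                out.append(script.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    private static boolean isNameStart(char c) {
        return Character.isLetter(c) || c == '_';
    }

    private static boolean isNamePart(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<String, String>();
        params.put("INPUT", "data.txt");
        String script = "-- $NOT_A_PARAMETER\nA = load '$INPUT';\n";
        System.out.print(substitute(script, params));
    }
}
```

Because newline characters are copied through unchanged and the substituted 
values in this sketch contain no newlines, the output has the same number of 
lines as the input.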



> Parameter substitution ($PARAMETER) should not be performed in comments
> ---
>
> Key: PIG-598
> URL: https://issues.apache.org/jira/browse/PIG-598
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: David Ciemiewicz
>Assignee: Thejas M Nair
> Attachments: PIG-598.patch
>
>
> Compiling the following code example will generate an error that 
> $NOT_A_PARAMETER is an Undefined Parameter.
> This is problematic as sometimes you want to comment out parts of your code, 
> including parameters so that you don't have to define them.
> Thus, I think it would be really good if parameter substitution were not 
> performed in comments.
> {code}
> -- $NOT_A_PARAMETER
> {code}
> {code}
> -bash-3.00$ pig -exectype local -latest comment.pig
> USING: /grid/0/gs/pig/current
> java.lang.RuntimeException: Undefined parameter : NOT_A_PARAMETER
> at 
> org.apache.pig.tools.parameters.PreprocessorContext.substitute(PreprocessorContext.java:221)
> at 
> org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.parsePigFile(ParameterSubstitutionPreprocessor.java:106)
> at 
> org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.genSubstitutedFile(ParameterSubstitutionPreprocessor.java:86)
> at org.apache.pig.Main.runParamPreprocessor(Main.java:394)
> at org.apache.pig.Main.main(Main.java:296)
> {code}




[jira] Resolved: (PIG-267) Don't substitute parameters inside comments

2009-10-22 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair resolved PIG-267.
---

Resolution: Duplicate

Duplicate of PIG-598

> Don't substitute parameters inside comments
> ---
>
> Key: PIG-267
> URL: https://issues.apache.org/jira/browse/PIG-267
> Project: Pig
>  Issue Type: Bug
>Reporter: Amir Youssefi
>Priority: Trivial
> Attachments: Pig267_ParamFix.patch
>
>
> A script with $x in comments fails because Pig thinks it's an undefined 
> parameter. One approach to address it is to skip substitution for comments. 
> java.lang.RuntimeException: Undefined parameter : x
> at 
> org.apache.pig.tools.parameters.PreprocessorContext.substitute(PreprocessorContext.java:221)
> at 
> org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.parsePigFile(ParameterSubstitutionPreprocessor.java:106)
> at 
> org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.genSubstitutedFile(ParameterSubstitutionPreprocessor.java:86)
> at org.apache.pig.Main.runParamPreprocessor(Main.java:382)
> at org.apache.pig.Main.main(Main.java:284)




[jira] Assigned: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments

2009-10-22 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair reassigned PIG-598:
-

Assignee: Thejas M Nair

> Parameter substitution ($PARAMETER) should not be performed in comments
> ---
>
> Key: PIG-598
> URL: https://issues.apache.org/jira/browse/PIG-598
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: David Ciemiewicz
>Assignee: Thejas M Nair
>
> Compiling the following code example will generate an error that 
> $NOT_A_PARAMETER is an Undefined Parameter.
> This is problematic as sometimes you want to comment out parts of your code, 
> including parameters so that you don't have to define them.
> Thus, I think it would be really good if parameter substitution were not 
> performed in comments.
> {code}
> -- $NOT_A_PARAMETER
> {code}
> {code}
> -bash-3.00$ pig -exectype local -latest comment.pig
> USING: /grid/0/gs/pig/current
> java.lang.RuntimeException: Undefined parameter : NOT_A_PARAMETER
> at 
> org.apache.pig.tools.parameters.PreprocessorContext.substitute(PreprocessorContext.java:221)
> at 
> org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.parsePigFile(ParameterSubstitutionPreprocessor.java:106)
> at 
> org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.genSubstitutedFile(ParameterSubstitutionPreprocessor.java:86)
> at org.apache.pig.Main.runParamPreprocessor(Main.java:394)
> at org.apache.pig.Main.main(Main.java:296)
> {code}




[jira] Updated: (PIG-1041) javac warnings: cast, fallthrough, serial

2009-10-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1041:


Attachment: PIG-1041-1.patch

The attached patch addresses the targeted javac warnings at the code level. A 
couple of warnings still come from javacc-generated code, which is out of our 
control; we will suppress those in build.xml. The build.xml change will be 
included in PIG-1033. 

> javac warnings: cast, fallthrough, serial
> -
>
> Key: PIG-1041
> URL: https://issues.apache.org/jira/browse/PIG-1041
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: PIG-1041-1.patch
>
>
> Pig has javac warnings when you build it with the option "-Dall.warnings=1". 
> We need to suppress all of them. This issue tracks the javac warnings in 
> the following categories:
> cast (49)
> fallthrough (1)
> serial (19)
> The number in parentheses is the number of occurrences of each javac 
> warning.




[jira] Updated: (PIG-1041) javac warnings: cast, fallthrough, serial

2009-10-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1041:


Status: Patch Available  (was: Open)

> javac warnings: cast, fallthrough, serial
> -
>
> Key: PIG-1041
> URL: https://issues.apache.org/jira/browse/PIG-1041
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: PIG-1041-1.patch
>
>
> Pig has javac warnings when you build it with the option "-Dall.warnings=1". 
> We need to suppress all of them. This issue tracks the javac warnings in 
> the following categories:
> cast (49)
> fallthrough (1)
> serial (19)
> The number in parentheses is the number of occurrences of each javac 
> warning.




[jira] Commented: (PIG-1041) javac warnings: cast, fallthrough, serial

2009-10-22 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768771#action_12768771
 ] 

Daniel Dai commented on PIG-1041:
-

The plan is: 
1. Suppress warnings at the code level as much as possible.
2. Some "cast" warnings come from code generated by javacc, which is out of 
our control. As a result, we will suppress all "cast" warnings in build.xml.
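
A minimal sketch of what handling these three warning categories at the code 
level might look like is shown below. The class and method names are 
hypothetical and do not come from PIG-1041-1.patch; "fallthrough", "serial" 
and "cast" are the javac lint categories named in the issue.

```java
import java.io.Serializable;

// Hypothetical illustration; class and method names are not from Pig.
class WarningFixSketch implements Serializable {
    // "serial": declaring serialVersionUID removes the warning at its source.
    private static final long serialVersionUID = 1L;

    // "fallthrough": suppress only where falling through is intentional.
    @SuppressWarnings("fallthrough")
    static int weight(int level) {
        int w = 0;
        switch (level) {
            case 2:
                w += 10;
                // intentional fall through into case 1
            case 1:
                w += 1;
                break;
            default:
                break;
        }
        return w;
    }

    // "cast": dropping a redundant cast is preferable to suppressing the warning.
    static String describe(Object o) {
        String s = String.valueOf(o); // valueOf already returns String, no cast needed
        return s;
    }

    public static void main(String[] args) {
        System.out.println(weight(2));
        System.out.println(describe(42));
    }
}
```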

> javac warnings: cast, fallthrough, serial
> -
>
> Key: PIG-1041
> URL: https://issues.apache.org/jira/browse/PIG-1041
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.6.0
>
>
> Pig has javac warnings when you build it with the option "-Dall.warnings=1". 
> We need to suppress all of them. This issue tracks the javac warnings in 
> the following categories:
> cast (49)
> fallthrough (1)
> serial (19)
> The number in parentheses is the number of occurrences of each javac 
> warning.




Hudson testing of patches

2009-10-22 Thread Alan Gates
We've had many questions on this, so I'm sending this to everyone on  
the dev list in hopes of clarifying the situation.  Our Hudson setup  
for testing patches is falsely returning failures on all or most unit  
tests for all patches.  So if you submit a patch and all the unit  
tests fail, don't worry.  We are working on getting Hudson fixed.  We  
committers are working through the patch queue manually, running the  
unit tests ourselves.  As we don't work all night like Hudson and each  
run of the unit tests takes about 3 hours, this is going slowly.  But  
please know we will get to your patches, even if it takes us a day or  
two.


Alan.


[jira] Created: (PIG-1042) javac warnings: unchecked

2009-10-22 Thread Daniel Dai (JIRA)
 javac warnings: unchecked
--

 Key: PIG-1042
 URL: https://issues.apache.org/jira/browse/PIG-1042
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.4.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0


Pig has 164 javac warnings in the "unchecked" category when you build it with 
the option "-Dall.warnings=1". We need to suppress all of them.
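
As an illustration only, "unchecked" warnings are usually handled in one of 
two ways: by using generic types so the unchecked operation disappears, or by 
applying @SuppressWarnings("unchecked") on the narrowest possible scope. The 
names in the sketch below are hypothetical and not taken from Pig.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; class and method names are not from Pig.
class UncheckedSketch {
    // Preferred: use parameterized types so no unchecked operation occurs.
    static List<String> names() {
        List<String> out = new ArrayList<String>();
        out.add("a");
        return out;
    }

    // When an unchecked cast is unavoidable, suppress it on the single declaration.
    static List<String> fromLegacy(Object legacy) {
        @SuppressWarnings("unchecked")
        List<String> typed = (List<String>) legacy;
        return typed;
    }

    public static void main(String[] args) {
        System.out.println(names());
        System.out.println(fromLegacy(names()));
    }
}
```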




[jira] Created: (PIG-1041) javac warnings: cast, fallthrough, serial

2009-10-22 Thread Daniel Dai (JIRA)
javac warnings: cast, fallthrough, serial
-

 Key: PIG-1041
 URL: https://issues.apache.org/jira/browse/PIG-1041
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0


Pig has javac warnings when you build it with the option "-Dall.warnings=1". 
We need to suppress all of them. This issue tracks the javac warnings in 
the following categories:

cast (49)
fallthrough (1)
serial (19)

The number in parentheses is the number of occurrences of each javac 
warning.




[jira] Updated: (PIG-1025) Should be able to set job priority through Pig Latin

2009-10-22 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1025:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch checked in.  Thanks Kevin for the contribution and for being patient with 
the build system.

> Should be able to set job priority through Pig Latin
> 
>
> Key: PIG-1025
> URL: https://issues.apache.org/jira/browse/PIG-1025
> Project: Pig
>  Issue Type: New Feature
>  Components: grunt
>Affects Versions: 0.4.0
>Reporter: Kevin Weil
>Assignee: Kevin Weil
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: PIG-1025.patch, PIG-1025_2.patch, 
> TEST-org.apache.pig.test.TestFRJoin.txt
>
>
> Currently users can set the job name through Pig Latin by saying
> set job.name 'my job name'
> The ability to set the priority would also be nice, and the patch should be 
> small.  The goal is to be able to say
> set job.priority 'high'
> and throw a JobCreationException in the JobControlCompiler if the priority is 
> not one of the allowed string values from the o.a.h.mapred.JobPriority enum: 
> very_low, low, normal, high, very_high.   Case insensitivity makes this a 
> little nicer.
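
The validation described in the issue can be sketched as below. This is not 
the committed patch: the enum is a local stand-in for 
o.a.h.mapred.JobPriority that uses the values listed above, and 
IllegalArgumentException stands in for the JobCreationException mentioned in 
the description. All names here are hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch; not the code from PIG-1025.patch.
class JobPrioritySketch {
    // Local stand-in for the Hadoop JobPriority enum, using the values from the issue text.
    enum Priority { VERY_LOW, LOW, NORMAL, HIGH, VERY_HIGH }

    // Case-insensitive lookup; rejects anything that is not one of the allowed values.
    static Priority parse(String value) {
        try {
            return Priority.valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Invalid job priority: " + value, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("high"));
        System.out.println(parse("Very_Low"));
    }
}
```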




[jira] Commented: (PIG-1033) javac warnings: deprecated hadoop APIs

2009-10-22 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768741#action_12768741
 ] 

Olga Natkovich commented on PIG-1033:
-

I think it is OK to go with option 1 for now, with the understanding that we 
will clean things up as part of the transition to Hadoop 21.

> javac warnings: deprecated hadoop APIs
> --
>
> Key: PIG-1033
> URL: https://issues.apache.org/jira/browse/PIG-1033
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Daniel Dai
> Fix For: 0.6.0
>
>
> Suppress javac warnings related to deprecated hadoop APIs.




[jira] Created: (PIG-1040) FINDBUGS: MS_SHOULD_BE_FINAL: Field isn't final but should be

2009-10-22 Thread Olga Natkovich (JIRA)
FINDBUGS: MS_SHOULD_BE_FINAL: Field isn't final but should be
-

 Key: PIG-1040
 URL: https://issues.apache.org/jira/browse/PIG-1040
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Olga Natkovich


MS  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.USER_COMPARATOR_MARKER
 isn't final but should be
MS  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.weightedParts
 isn't final but should be
MS  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce.sJobConf
 isn't final and can't be protected from malicious code
MS  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.bagFactory
 isn't final but should be
MS  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.reporter
 isn't final and can't be protected from malicious code
MS  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.pigLogger
 should be package protected
MS  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.dummyBag
 isn't final but should be
MS  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.dummyBool
 isn't final but should be
MS  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.dummyDBA
 isn't final but should be
MS  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.dummyDouble
 isn't final but should be
MS  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.dummyFloat
 isn't final but should be
MS  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.dummyInt
 isn't final but should be
MS  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.dummyLong
 isn't final but should be
MS  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.dummyMap
 isn't final but should be
MS  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.dummyString
 isn't final but should be
MS  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.dummyTuple
 isn't final but should be
MS  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.mTupleFactory
 isn't final but should be
MS  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.mTupleFactory
 isn't final but should be
MS  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.mBagFactory
 isn't final but should be
MS  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.mTupleFactory
 isn't final but should be
MS  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPreCombinerLocalRearrange.mTupleFactory
 isn't final but should be
MS  org.apache.pig.builtin.PigDump.recordDelimiter isn't final but should be
MS  org.apache.pig.impl.builtin.GFCross.DEFAULT_PARALLELISM isn't final but 
should be
MS  org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.classloader isn't 
final and can't be protected from malicious code
MS  org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.mOpToCloneMap 
should be package protected
MS  
org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.canonicalNamer isn't 
final but should be
MS  
org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.castLookup 
isn't final but should be
MS  org.apache.pig.impl.plan.OperatorPlan.log isn't final but should be
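
For context, the three kinds of "MS" findings listed above typically map to 
the fixes sketched below. The class, field names and values are hypothetical 
and are not taken from the Pig source.

```java
// Hypothetical illustration; field names and values are not from Pig.
class StaticFieldSketch {
    // Flagged form: a public static field that any code can reassign.
    public static String mutableDelimiter = "\n";

    // "isn't final but should be": declare the field final.
    public static final String RECORD_DELIMITER = "\n";

    // "isn't final and can't be protected from malicious code": when the field
    // must remain mutable, reduce its visibility and go through accessors.
    private static int parallelism = 1;

    static synchronized int getParallelism() {
        return parallelism;
    }

    static synchronized void setParallelism(int value) {
        if (value < 1) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        parallelism = value;
    }

    public static void main(String[] args) {
        setParallelism(4);
        System.out.println(getParallelism());
    }
}
```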




[jira] Assigned: (PIG-1025) Should be able to set job priority through Pig Latin

2009-10-22 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-1025:
---

Assignee: Kevin Weil

> Should be able to set job priority through Pig Latin
> 
>
> Key: PIG-1025
> URL: https://issues.apache.org/jira/browse/PIG-1025
> Project: Pig
>  Issue Type: New Feature
>  Components: grunt
>Affects Versions: 0.4.0
>Reporter: Kevin Weil
>Assignee: Kevin Weil
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: PIG-1025.patch, PIG-1025_2.patch, 
> TEST-org.apache.pig.test.TestFRJoin.txt
>
>
> Currently users can set the job name through Pig Latin by saying
> set job.name 'my job name'
> The ability to set the priority would also be nice, and the patch should be 
> small.  The goal is to be able to say
> set job.priority 'high'
> and throw a JobCreationException in the JobControlCompiler if the priority is 
> not one of the allowed string values from the o.a.h.mapred.JobPriority enum: 
> very_low, low, normal, high, very_high.   Case insensitivity makes this a 
> little nicer.




[jira] Updated: (PIG-1039) Pig 0.5 Doc Updates

2009-10-22 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1039:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to both trunk and branch-05. Thanks, Corinne!

> Pig 0.5 Doc Updates
> ---
>
> Key: PIG-1039
> URL: https://issues.apache.org/jira/browse/PIG-1039
> Project: Pig
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.5.0
>Reporter: Corinne Chandel
>Priority: Blocker
> Fix For: 0.5.0
>
> Attachments: branch-0.5.patch, trunk.patch
>
>
> Pig 0.5 doc updates (to be applied to Trunk and branch-0.5)
> 1. updates to tutorial
> 2. updates to pig latin reference manual
> 3. updated doc tab to 0.5.0




[jira] Commented: (PIG-1039) Pig 0.5 Doc Updates

2009-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768625#action_12768625
 ] 

Hadoop QA commented on PIG-1039:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422861/branch-0.5.patch
  against trunk revision 828213.

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/109/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/109/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/109/console

This message is automatically generated.

> Pig 0.5 Doc Updates
> ---
>
> Key: PIG-1039
> URL: https://issues.apache.org/jira/browse/PIG-1039
> Project: Pig
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.5.0
>Reporter: Corinne Chandel
>Priority: Blocker
> Fix For: 0.5.0
>
> Attachments: branch-0.5.patch, trunk.patch
>
>
> Pig 0.5 doc updates (to be applied to Trunk and branch-0.5)
> 1. updates to tutorial
> 2. updates to pig latin reference manual
> 3. updated doc tab to 0.5.0
