[jira] [Commented] (PIG-4939) QueryParserUtils.setHdfsServers(QueryParserUtils.java:104) should not be called for non-dfs methods

2016-06-28 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15353380#comment-15353380
 ] 

Prashant Kommireddi commented on PIG-4939:
--

There are utils that check for the filesystem (s3 vs hdfs vs ... ) but I don't 
think there is an interface for non-fs based loaders. I'd be in favor of option 
2, a marker interface that defines the behavior for non-direct-fs loaders. That 
could evolve with a base class and loaders being able to implement custom 
behaviors, but a marker interface might suffice for now.

[~daijy] thoughts?
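
For illustration, a rough sketch of what option 2 could look like (the interface 
name and the wiring below are assumptions, not a final API):

{code}
/**
 * Hypothetical marker interface: a LoadFunc implementing this declares that
 * its location string is not a filesystem path (e.g. an hbase:// query),
 * so the front end can skip HDFS-specific handling for it.
 */
public interface NonFSLoadFunc {
}

// In buildLoadOp(), setHdfsServers would then only run for fs-backed loaders:
// if (absolutePath != null && !(loFunc instanceof NonFSLoadFunc)) {
//     QueryParserUtils.setHdfsServers(absolutePath, pigContext);
// }
{code}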

> QueryParserUtils.setHdfsServers(QueryParserUtils.java:104) should not be 
> called for non-dfs methods
> ---
>
> Key: PIG-4939
> URL: https://issues.apache.org/jira/browse/PIG-4939
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Siddhi Mehta
>Priority: Minor
>
> {code}
> A = load 'hbase://query/SELECT ID,NAME,DATE FROM HIRES WHERE DATE > 
> TO_DATE('1990-12-21 05:55:00.000');
> STORE A into 'output';
> {code}
> The above script throws an exception because it treats the location as an fs 
> path and tries to convert it to a URI after splitting it based on comma.
> The code that does this is 
> {code}
> String buildLoadOp(SourceLocation loc, String alias, String filename,
>         FuncSpec funcSpec, LogicalSchema schema)
>         throws ParserValidationException {
>     String absolutePath;
>     LoadFunc loFunc;
>     try {
>         // Load LoadFunc class from default properties if funcSpec is null.
>         // Fallback on PigStorage if LoadFunc is not specified in properties.
>         funcSpec = funcSpec == null ? new FuncSpec(
>                 pigContext.getProperties().getProperty(
>                         PigConfiguration.PIG_DEFAULT_LOAD_FUNC,
>                         PigStorage.class.getName())) : funcSpec;
>         loFunc = (LoadFunc)PigContext.instantiateFuncFromSpec(funcSpec);
>         ...
>         if (absolutePath == null) {
>             absolutePath = loFunc.relativeToAbsolutePath( filename,
>                     QueryParserUtils.getCurrentDir( pigContext ) );
>             if (absolutePath!=null) {
>                 QueryParserUtils.setHdfsServers( absolutePath, pigContext );
>             }
>             ...
>         }
> {code}
> We should not be calling 
> QueryParserUtils.setHdfsServers(QueryParserUtils.java:104) for non-dfs 
> methods.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4724) GROUP ALL must create an output record in case there is no input

2016-05-05 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-4724:
-
Issue Type: Improvement  (was: Bug)

> GROUP ALL must create an output record in case there is no input
> 
>
> Key: PIG-4724
> URL: https://issues.apache.org/jira/browse/PIG-4724
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.16.0
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
>
> {code}
> A = load 'data';
> B = filter A by $0 == 'THIS_DOES_NOT_EXIST';
> C = group B ALL;
> D = foreach C generate group, COUNT(B);
> {code}
> Even if the filter did not output any rows, since we are grouping on ALL the 
> expected output should probably be (ALL, 0). The implementation generates a 
> pseudo key “all” for every input on map side, thus reduce side we can combine 
> all input together. However, this does not work for 0 input since the reduce 
> side does not get any input. If the input is empty, yield a pseudo “all, 0” 
> to reduce



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (PIG-4724) GROUP ALL must create an output record in case there is no input

2016-04-21 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi reassigned PIG-4724:


Assignee: Prashant Kommireddi

> GROUP ALL must create an output record in case there is no input
> 
>
> Key: PIG-4724
> URL: https://issues.apache.org/jira/browse/PIG-4724
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.16.0
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
>
> {code}
> A = load 'data';
> B = filter A by $0 == 'THIS_DOES_NOT_EXIST';
> C = group B ALL;
> D = foreach C generate group, COUNT(B);
> {code}
> Even if the filter did not output any rows, since we are grouping on ALL the 
> expected output should probably be (ALL, 0). The implementation generates a 
> pseudo key “all” for every input on map side, thus reduce side we can combine 
> all input together. However, this does not work for 0 input since the reduce 
> side does not get any input. If the input is empty, yield a pseudo “all, 0” 
> to reduce



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4724) GROUP ALL must create an output record in case there is no input

2016-04-21 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-4724:
-
Affects Version/s: (was: 0.15.0)
   0.16.0

> GROUP ALL must create an output record in case there is no input
> 
>
> Key: PIG-4724
> URL: https://issues.apache.org/jira/browse/PIG-4724
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.16.0
>Reporter: Prashant Kommireddi
>
> {code}
> A = load 'data';
> B = filter A by $0 == 'THIS_DOES_NOT_EXIST';
> C = group B ALL;
> D = foreach C generate group, COUNT(B);
> {code}
> Even if the filter did not output any rows, since we are grouping on ALL the 
> expected output should probably be (ALL, 0). The implementation generates a 
> pseudo key “all” for every input on map side, thus reduce side we can combine 
> all input together. However, this does not work for 0 input since the reduce 
> side does not get any input. If the input is empty, yield a pseudo “all, 0” 
> to reduce



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-3126) Problem in STORE

2015-11-11 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001693#comment-15001693
 ] 

Prashant Kommireddi commented on PIG-3126:
--

checkOutputSpecs checks whether a file already exists, depending on whether the 
StoreFunc is an OverwritableStoreFunc. Should we also check for permissions on 
the frontend instead of having to wait until it fails on the backend?
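
As a rough sketch of the kind of front-end check meant here (names and wiring 
are assumptions; Pig's actual validation hook may differ, and FileSystem#access 
requires Hadoop 2.6+):

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.security.AccessControlException;

public class OutputPermissionCheck {
    // Fail fast on the client if the parent of the output location is not
    // writable by the current user, instead of waiting for the MR job to die.
    public static void checkWritable(FileSystem fs, Path outLocation)
            throws IOException {
        Path parent = outLocation.getParent();
        try {
            fs.access(parent, FsAction.WRITE);
        } catch (AccessControlException e) {
            throw new IOException("Permission denied: cannot write to " + parent, e);
        }
    }
}
{code}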

> Problem in STORE
> 
>
> Key: PIG-3126
> URL: https://issues.apache.org/jira/browse/PIG-3126
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.2
> Environment: CentOS 5.7
>Reporter: Vishnu Ganth
>Priority: Blocker
> Attachments: log.txt
>
>
> A = Load 'sample';
> store A into '/user/xyz/sample-out';
> When this pig script is run using abc user who does not have write permission 
> in '/user/xyz', PIG is unable to create the directory sample-out and the 
> map-reduce job gets killed ultimately without any log. PIG should throw some 
> error log saying permission denied.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-3126) Problem in STORE

2015-11-11 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001733#comment-15001733
 ] 

Prashant Kommireddi commented on PIG-3126:
--

Interesting. How is it different from checking whether a file exists? Maybe I 
am missing something obvious here.

> Problem in STORE
> 
>
> Key: PIG-3126
> URL: https://issues.apache.org/jira/browse/PIG-3126
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.2
> Environment: CentOS 5.7
>Reporter: Vishnu Ganth
>Priority: Blocker
> Attachments: log.txt
>
>
> A = Load 'sample';
> store A into '/user/xyz/sample-out';
> When this pig script is run using abc user who does not have write permission 
> in '/user/xyz', PIG is unable to create the directory sample-out and the 
> map-reduce job gets killed ultimately without any log. PIG should throw some 
> error log saying permission denied.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-3126) Problem in STORE

2015-11-11 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001752#comment-15001752
 ] 

Prashant Kommireddi commented on PIG-3126:
--

I see what you are saying. Could we have PigTextOutputFormat include this check 
in that case? We could cover PigStorage with it, and a few others.

> Problem in STORE
> 
>
> Key: PIG-3126
> URL: https://issues.apache.org/jira/browse/PIG-3126
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.2
> Environment: CentOS 5.7
>Reporter: Vishnu Ganth
> Attachments: log.txt
>
>
> A = Load 'sample';
> store A into '/user/xyz/sample-out';
> When this pig script is run using abc user who does not have write permission 
> in '/user/xyz', PIG is unable to create the directory sample-out and the 
> map-reduce job gets killed ultimately without any log. PIG should throw some 
> error log saying permission denied.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4724) GROUP ALL must create an output record in case there is no input

2015-11-03 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988012#comment-14988012
 ] 

Prashant Kommireddi commented on PIG-4724:
--

[~rohini] what are your thoughts on this? I'm not sure if this breaks what 
users expect from the current behavior, but it sounds like the right thing to 
do?

cc [~daijy]

> GROUP ALL must create an output record in case there is no input
> 
>
> Key: PIG-4724
> URL: https://issues.apache.org/jira/browse/PIG-4724
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Prashant Kommireddi
>
> {code}
> A = load 'data';
> B = filter A by $0 == 'THIS_DOES_NOT_EXIST';
> C = group B ALL;
> D = foreach C generate group, COUNT(B);
> {code}
> Even if the filter did not output any rows, since we are grouping on ALL the 
> expected output should probably be (ALL, 0). The implementation generates a 
> pseudo key “all” for every input on map side, thus reduce side we can combine 
> all input together. However, this does not work for 0 input since the reduce 
> side does not get any input. If the input is empty, yield a pseudo “all, 0” 
> to reduce



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PIG-4724) GROUP ALL must create an output record in case there is no input

2015-11-03 Thread Prashant Kommireddi (JIRA)
Prashant Kommireddi created PIG-4724:


 Summary: GROUP ALL must create an output record in case there is 
no input
 Key: PIG-4724
 URL: https://issues.apache.org/jira/browse/PIG-4724
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Prashant Kommireddi


{code}
A = load 'data';

B = filter A by $0 == 'THIS_DOES_NOT_EXIST';

C = group A ALL;

D = foreach C generate group, COUNT(B);
{code}

Even if the filter did not output any rows, since we are grouping on ALL the 
expected output should probably be (ALL, 0). The implementation generates a 
pseudo key “all” for every input on map side, thus reduce side we can combine 
all input together. However, this does not work for 0 input since the reduce 
side does not get any input. If the input is empty, yield a pseudo “all, 0” to 
reduce
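
(Not Pig's actual map-plan code - just a hedged illustration of the pseudo-key 
idea above, written as a plain Hadoop mapper with made-up types.)

{code}
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class GroupAllMapper extends Mapper<LongWritable, Text, Text, Text> {
    private static final Text ALL = new Text("all");
    private boolean emittedSomething = false;

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        if (!value.toString().isEmpty()) {   // stands in for the FILTER above
            context.write(ALL, value);       // every row gets the pseudo key "all"
            emittedSomething = true;
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        if (!emittedSomething) {
            // Pseudo record for the empty case: the reduce side would recognize
            // this zero-length marker and still produce an ("all", 0) output.
            context.write(ALL, new Text());
        }
    }
}
{code}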




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4724) GROUP ALL must create an output record in case there is no input

2015-11-03 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-4724:
-
Description: 
{code}
A = load 'data';

B = filter A by $0 == 'THIS_DOES_NOT_EXIST';

C = group B ALL;

D = foreach C generate group, COUNT(B);
{code}

Even if the filter did not output any rows, since we are grouping on ALL the 
expected output should probably be (ALL, 0). The implementation generates a 
pseudo key “all” for every input on map side, thus reduce side we can combine 
all input together. However, this does not work for 0 input since the reduce 
side does not get any input. If the input is empty, yield a pseudo “all, 0” to 
reduce


  was:
{code}
A = load 'data';

B = filter A by $0 == 'THIS_DOES_NOT_EXIST';

C = group A ALL;

D = foreach C generate group, COUNT(B);
{code}

Even if the filter did not output any rows, since we are grouping on ALL the 
expected output should probably be (ALL, 0). The implementation generates a 
pseudo key “all” for every input on map side, thus reduce side we can combine 
all input together. However, this does not work for 0 input since the reduce 
side does not get any input. If the input is empty, yield a pseudo “all, 0” to 
reduce



> GROUP ALL must create an output record in case there is no input
> 
>
> Key: PIG-4724
> URL: https://issues.apache.org/jira/browse/PIG-4724
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Prashant Kommireddi
>
> {code}
> A = load 'data';
> B = filter A by $0 == 'THIS_DOES_NOT_EXIST';
> C = group B ALL;
> D = foreach C generate group, COUNT(B);
> {code}
> Even if the filter did not output any rows, since we are grouping on ALL the 
> expected output should probably be (ALL, 0). The implementation generates a 
> pseudo key “all” for every input on map side, thus reduce side we can combine 
> all input together. However, this does not work for 0 input since the reduce 
> side does not get any input. If the input is empty, yield a pseudo “all, 0” 
> to reduce



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4704) Customizable Error Handling for Storers in Pig

2015-10-28 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979245#comment-14979245
 ] 

Prashant Kommireddi commented on PIG-4704:
--

Last few comments :)

* pig.allow.store.errors - can you add this to the pig.properties file?
* Formatting looks off in PigOutputFormat - {code} private StoreFuncDecorator 
storeDecorator; {code}. A couple more places have formatting off.

Looks good overall. [~daijy] any comments?
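
A very rough sketch of the decorator idea under review (class and method names 
here are illustrative, not the exact ones in the patch):

{code}
import java.io.IOException;

import org.apache.pig.StoreFuncInterface;
import org.apache.pig.data.Tuple;

// Illustrative only - not the actual StoreFuncDecorator from the patch.
public class ErrorTolerantStore {
    private final StoreFuncInterface delegate;   // the real storer
    private final long minErrors;                // errors always tolerated
    private final double errorThreshold;         // max tolerated error fraction
    private long records = 0;
    private long errors = 0;

    public ErrorTolerantStore(StoreFuncInterface delegate,
                              long minErrors, double errorThreshold) {
        this.delegate = delegate;
        this.minErrors = minErrors;
        this.errorThreshold = errorThreshold;
    }

    public void putNext(Tuple t) throws IOException {
        records++;
        try {
            delegate.putNext(t);
        } catch (Exception e) {
            errors++;
            // Swallow (and count) the error while under the threshold; once we
            // are past the tolerated minimum and the error rate is too high,
            // fail the task as before.
            if (errors > minErrors && (double) errors / records > errorThreshold) {
                throw new IOException("Store error threshold exceeded", e);
            }
        }
    }
}
{code}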


> Customizable Error Handling for Storers in Pig 
> ---
>
> Key: PIG-4704
> URL: https://issues.apache.org/jira/browse/PIG-4704
> Project: Pig
>  Issue Type: Improvement
>Reporter: Siddhi Mehta
>Assignee: Siddhi Mehta
> Attachments: PIG-4704.patch, PIG-4704.patch, PIG-4704_3.patch
>
>
> On Thu, Oct 15, 2015 at 4:06 AM, Saggi Neumann  wrote:
> You may also check these for ideas. It would be good to have them
> implemented:
> https://wiki.apache.org/pig/PigErrorHandlingInScripts
> https://issues.apache.org/jira/browse/PIG-2620
> --
> Saggi Neumann
> Co-founder and CTO, Xplenty
> M: +972-544-546102
> On Thu, Oct 15, 2015 at 12:17 AM, Siddhi Mehta  wrote:
> > Hello Everyone,
> >
> > Just wanted to follow up on the my earlier post and see if there are any
> > thoughts around the same.
> > I was planning to take a stab to implement the same.
> >
> > The approach I was planning to use for the same is
> > 1. Make the storer that wants error handling capability implement an
> > interface(ErrorHandlingStoreFunc).
> > 2. Using this interface the storer can define the thresholds for
> > errors. Each store func can determine what the threshold should be. For
> > example HbaseStorage can have a different threshold from ParquetStorage.
> > 3. Whenever the storer gets created in
> >
> > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getStoreFunc()
> > we intercept the call and give it a wrappedStoreFunc
> > 4. Every putNext() call now gets delegated to the actual storer via the
> > delegate and we can listen in for errors on putNext() and take care of
> > allowing the error if within threshold or re-throwing from there.
> > 5. The client can get information about the threshold value from  the
> > counters to know if there was any data dropped.
> >
> > Thoughts?
> >
> > Thanks,
> > Siddhi
> >
> >
> > On Mon, Oct 12, 2015 at 1:49 PM, Siddhi Mehta 
> > wrote:
> >
> > > Hey Guys,
> > >
> > > Currently a Pig job fails when one record out of the billions records
> > > fails on STORE.
> > > This is not always desirable behavior when you are dealing with millions
> > > of records and only few fail.
> > > In certain use-cases its desirable to know how many such errors and have
> > > an accounting for the same.
> > > Is there a configurable limits that we can set for pig so that we can
> > > allow a threshold for bad records on STORE similar to the lines of the
> > JIRA
> > > for LOAD PIG-3059 
> > >
> > > Thanks,
> > > Siddhi
> > >
> >



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4704) Customizable Error Handling for Storers in Pig

2015-10-22 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14969646#comment-14969646
 ] 

Prashant Kommireddi commented on PIG-4704:
--

Thanks [~siddhimehta]

* Where are the deprecation warnings in *CounterBasedOutputErrorTracker* coming 
from?
* The reason I prefer an interface is that currently the base impl takes 
"minErrors" and "errorThreshold" as the 2 variables to decide error handling. 
Another impl might have a different combination of parameters to look at, and 
choose not to look at one of these. That makes it hard to work with. I am okay 
with this currently, and we can change that as a follow-up when needed. Maybe 
I am thinking too far ahead :)
* Can you make "minErrors" and "errorThreshold" private? I am expecting these 
to be final values?
* *private StoreFuncInterface wrappedStoreFunc* could be made final in 
*WrappedErrorHandlingFunc*. *errorHandler* as well
* Can you please add comments to getWrappedStorer(..) method in *POStore*
* Also, can you append a <_patch_number> as a suffix to your uploads. That 
makes it easier to look at the right patch. In this case, it would be 
PIG-4704_2.patch

Thanks Siddhi, looks good overall. 



> Customizable Error Handling for Storers in Pig 
> ---
>
> Key: PIG-4704
> URL: https://issues.apache.org/jira/browse/PIG-4704
> Project: Pig
>  Issue Type: Improvement
>Reporter: Siddhi Mehta
>Assignee: Siddhi Mehta
> Attachments: PIG-4704.patch, PIG-4704.patch
>
>
> On Thu, Oct 15, 2015 at 4:06 AM, Saggi Neumann  wrote:
> You may also check these for ideas. It would be good to have them
> implemented:
> https://wiki.apache.org/pig/PigErrorHandlingInScripts
> https://issues.apache.org/jira/browse/PIG-2620
> --
> Saggi Neumann
> Co-founder and CTO, Xplenty
> M: +972-544-546102
> On Thu, Oct 15, 2015 at 12:17 AM, Siddhi Mehta  wrote:
> > Hello Everyone,
> >
> > Just wanted to follow up on the my earlier post and see if there are any
> > thoughts around the same.
> > I was planning to take a stab to implement the same.
> >
> > The approach I was planning to use for the same is
> > 1. Make the storer that wants error handling capability implement an
> > interface(ErrorHandlingStoreFunc).
> > 2. Using this interface the storer can define the thresholds for
> > errors. Each store func can determine what the threshold should be. For
> > example HbaseStorage can have a different threshold from ParquetStorage.
> > 3. Whenever the storer gets created in
> >
> > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getStoreFunc()
> > we intercept the call and give it a wrappedStoreFunc
> > 4. Every putNext() call now gets delegated to the actual storer via the
> > delegate and we can listen in for errors on putNext() and take care of
> > allowing the error if within threshold or re-throwing from there.
> > 5. The client can get information about the threshold value from  the
> > counters to know if there was any data dropped.
> >
> > Thoughts?
> >
> > Thanks,
> > Siddhi
> >
> >
> > On Mon, Oct 12, 2015 at 1:49 PM, Siddhi Mehta 
> > wrote:
> >
> > > Hey Guys,
> > >
> > > Currently a Pig job fails when one record out of the billions records
> > > fails on STORE.
> > > This is not always desirable behavior when you are dealing with millions
> > > of records and only few fail.
> > > In certain use-cases its desirable to know how many such errors and have
> > > an accounting for the same.
> > > Is there a configurable limits that we can set for pig so that we can
> > > allow a threshold for bad records on STORE similar to the lines of the
> > JIRA
> > > for LOAD PIG-3059 
> > >
> > > Thanks,
> > > Siddhi
> > >
> >



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4704) Customizable Error Handling for Storers in Pig

2015-10-20 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964650#comment-14964650
 ] 

Prashant Kommireddi commented on PIG-4704:
--

Thanks [~siddhimehta] for the contribution. A few comments:

* Can you please add Apache license headers to all files? A few seem to be 
missing it
* How does a user query for counters that are reported by the ErrorHandler? 
Based on "storer_Error_Handler" I believe.
* {code}
private Counter getRecordCountCounter(String storeSignature) {
    PigStatusReporter reporter = PigStatusReporter.getInstance();
    Counter counter = reporter.getCounter(STORER_ERROR_COUNT_GROUP,
            getCounterNameForStore(STORER_RECORD_COUNT, storeSignature));
    return counter;
}
{code}

We could probably change this method to take another argument - String 
counterName - and have "incAndGetErrorCount" re-use it instead of duplicating 
the counter lookup (see the sketch after this list).

* You would have to remove @author tags *smile*
* The method *handle* in *OutputErrorHandler* could be made non-final. Classes 
extending from it might want to do something specific to their needs
* We probably don't want to LOG for every error case; a bad job could hose 
disks on cluster nodes. That could be removed from the *handle* method in 
*OutputErrorHandler* {code} Log.debug("Handling error " + cause); {code}
* The logic in *OutputErrorHandler* appears to be custom to a single use-case, 
which is handling errors based on BOTH minErrors and errorThreshold. Do you 
think we should keep this logic in a base abstract class, or move it to an impl 
instead?
* In *WrappedErrorHandlingFunc* you are using the instance var "errorHandler" 
at one place and the method getErrorHandler at another. Can we make it 
consistent using either the var or method at both places?
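
The counter-helper refactor suggested above might look roughly like this (the 
generalized signature is an assumption, not the patch itself):

{code}
// Sketch: both the record counter and the error counter can be looked up
// through one method by passing the counter name in, so incAndGetErrorCount()
// can re-use it instead of duplicating the lookup.
private Counter getCounter(String counterName, String storeSignature) {
    PigStatusReporter reporter = PigStatusReporter.getInstance();
    return reporter.getCounter(STORER_ERROR_COUNT_GROUP,
            getCounterNameForStore(counterName, storeSignature));
}
{code}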

  


> Customizable Error Handling for Storers in Pig 
> ---
>
> Key: PIG-4704
> URL: https://issues.apache.org/jira/browse/PIG-4704
> Project: Pig
>  Issue Type: Improvement
>Reporter: Siddhi Mehta
>Assignee: Siddhi Mehta
> Attachments: PIG-4704.patch
>
>
> On Thu, Oct 15, 2015 at 4:06 AM, Saggi Neumann  wrote:
> You may also check these for ideas. It would be good to have them
> implemented:
> https://wiki.apache.org/pig/PigErrorHandlingInScripts
> https://issues.apache.org/jira/browse/PIG-2620
> --
> Saggi Neumann
> Co-founder and CTO, Xplenty
> M: +972-544-546102
> On Thu, Oct 15, 2015 at 12:17 AM, Siddhi Mehta  wrote:
> > Hello Everyone,
> >
> > Just wanted to follow up on the my earlier post and see if there are any
> > thoughts around the same.
> > I was planning to take a stab to implement the same.
> >
> > The approach I was planning to use for the same is
> > 1. Make the storer that wants error handling capability implement an
> > interface(ErrorHandlingStoreFunc).
> > 2. Using this interface the storer can define the thresholds for
> > errors. Each store func can determine what the threshold should be. For
> > example HbaseStorage can have a different threshold from ParquetStorage.
> > 3. Whenever the storer gets created in
> >
> > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getStoreFunc()
> > we intercept the call and give it a wrappedStoreFunc
> > 4. Every putNext() call now gets delegated to the actual storer via the
> > delegate and we can listen in for errors on putNext() and take care of
> > allowing the error if within threshold or re-throwing from there.
> > 5. The client can get information about the threshold value from  the
> > counters to know if there was any data dropped.
> >
> > Thoughts?
> >
> > Thanks,
> > Siddhi
> >
> >
> > On Mon, Oct 12, 2015 at 1:49 PM, Siddhi Mehta 
> > wrote:
> >
> > > Hey Guys,
> > >
> > > Currently a Pig job fails when one record out of the billions records
> > > fails on STORE.
> > > This is not always desirable behavior when you are dealing with millions
> > > of records and only few fail.
> > > In certain use-cases its desirable to know how many such errors and have
> > > an accounting for the same.
> > > Is there a configurable limits that we can set for pig so that we can
> > > allow a threshold for bad records on STORE similar to the lines of the
> > JIRA
> > > for LOAD PIG-3059 
> > >
> > > Thanks,
> > > Siddhi
> > >
> >



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4704) Customizable Error Handling for Storers in Pig

2015-10-19 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-4704:
-
Patch Info: Patch Available

> Customizable Error Handling for Storers in Pig 
> ---
>
> Key: PIG-4704
> URL: https://issues.apache.org/jira/browse/PIG-4704
> Project: Pig
>  Issue Type: Improvement
>Reporter: Siddhi Mehta
>Assignee: Siddhi Mehta
> Attachments: PIG-4704.patch
>
>
> On Thu, Oct 15, 2015 at 4:06 AM, Saggi Neumann  wrote:
> You may also check these for ideas. It would be good to have them
> implemented:
> https://wiki.apache.org/pig/PigErrorHandlingInScripts
> https://issues.apache.org/jira/browse/PIG-2620
> --
> Saggi Neumann
> Co-founder and CTO, Xplenty
> M: +972-544-546102
> On Thu, Oct 15, 2015 at 12:17 AM, Siddhi Mehta  wrote:
> > Hello Everyone,
> >
> > Just wanted to follow up on the my earlier post and see if there are any
> > thoughts around the same.
> > I was planning to take a stab to implement the same.
> >
> > The approach I was planning to use for the same is
> > 1. Make the storer that wants error handling capability implement an
> > interface(ErrorHandlingStoreFunc).
> > 2. Using this interface the storer can define the thresholds for
> > errors. Each store func can determine what the threshold should be. For
> > example HbaseStorage can have a different threshold from ParquetStorage.
> > 3. Whenever the storer gets created in
> >
> > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getStoreFunc()
> > we intercept the call and give it a wrappedStoreFunc
> > 4. Every putNext() call now gets delegated to the actual storer via the
> > delegate and we can listen in for errors on putNext() and take care of
> > allowing the error if within threshold or re-throwing from there.
> > 5. The client can get information about the threshold value from  the
> > counters to know if there was any data dropped.
> >
> > Thoughts?
> >
> > Thanks,
> > Siddhi
> >
> >
> > On Mon, Oct 12, 2015 at 1:49 PM, Siddhi Mehta 
> > wrote:
> >
> > > Hey Guys,
> > >
> > > Currently a Pig job fails when one record out of the billions records
> > > fails on STORE.
> > > This is not always desirable behavior when you are dealing with millions
> > > of records and only few fail.
> > > In certain use-cases its desirable to know how many such errors and have
> > > an accounting for the same.
> > > Is there a configurable limits that we can set for pig so that we can
> > > allow a threshold for bad records on STORE similar to the lines of the
> > JIRA
> > > for LOAD PIG-3059 
> > >
> > > Thanks,
> > > Siddhi
> > >
> >



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4679) Performance degradation due to InputSizeReducerEstimator since PIG-3754

2015-09-23 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904026#comment-14904026
 ] 

Prashant Kommireddi commented on PIG-4679:
--

+1

> Performance degradation due to InputSizeReducerEstimator since PIG-3754
> ---
>
> Key: PIG-4679
> URL: https://issues.apache.org/jira/browse/PIG-4679
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.16.0
>
> Attachments: PIG-4679-0.patch, PIG-4679-1.patch, 
> PIG-4679-fixtest.patch
>
>
> On encountering a non-HDFS location in the input (for example a JOIN 
> involving both HBase tables and intermediate temp files), Pig 0.14 
> ReducerEstimator returns the total input size as -1 (unknown), whereas in 
> Pig 0.12.1 it was returning the sum of temp file sizes as the total size. 
> Since -1 is returned as the input size, Pig ends up using only one reducer for 
> the job.
> STEPS TO REPRODUCE:
> 1.Create an HBase table with enough data.  Using PerformanceEvaluation 
> tool to generate data
> {code:java}
> hbase org.apache.hadoop.hbase.PerformanceEvaluation --presplit=20 
> --rows=100 sequentialWrite 10
> {code}
> 2.Dump the table data into a file which we can then use in a Pig JOIN.  
> Following Pig script generates the data file
> {code:java}
> $ pig
> A = LOAD 'hbase://TestTable' USING 
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:data', '-loadKey') AS 
> (row_key: chararray, data: chararray);
> STORE A INTO 'hdfs:///tmp/re_test/test_table_data' USING PigStorage('|');
> {code}
> 3.Check file size to make sure that it is more than 1,000,000,000 which 
> is the default bytes per reducer Pig configuration
> {code:java}
> $ hdfs dfs -count hdfs:///tmp/re_test/test_table_data
> QA:   1   411028000 
> hdfs:///tmp/re_test/test_table_data
> PROD: 1   571028000 
> hdfs:///tmp/re_test/test_table_data
> {code}
> 4.Run a Pig script that joins the HBase table with the data file.  QA and 
> PROD will use different number of reducers.  QA (176243) should run 1 reducer 
> and PROD (176258) should run 11 reducers (10,280,000,000 / 1,000,000,000)
> {code:java}
> $ pig
> A = LOAD 'hbase://TestTable' USING 
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:data', '-loadKey') AS 
> (row_key: chararray, data: chararray);
> B = LOAD 'hdfs:///tmp/re_test/test_table_data' USING PigStorage('|') AS 
> (row_key: chararray, data: chararray);
> C = JOIN A BY row_key, B BY row_key;
> STORE C INTO 'hdfs:///tmp/re_test/test_table_data_join' USING PigStorage('|');
> {code}
> Pig 0.12.1 ran 11 reduce, Pig 0.13+ run only 1 reduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4674) TOMAP should infer schema

2015-09-23 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904030#comment-14904030
 ] 

Prashant Kommireddi commented on PIG-4674:
--

+1

> TOMAP should infer schema
> -
>
> Key: PIG-4674
> URL: https://issues.apache.org/jira/browse/PIG-4674
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.16.0
>
> Attachments: PIG-4674-1.patch, PIG-4674-2.patch, PIG-4674-3.patch, 
> PIG-4674-fixtest.patch, PIG-4674-fixtest2.patch
>
>
> TOMAP schema is map only without map value schema. This should be inferred if 
> available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PIG-4567) Allow UDFs to specify a counter increment other than default of 1

2015-05-21 Thread Prashant Kommireddi (JIRA)
Prashant Kommireddi created PIG-4567:


 Summary: Allow UDFs to specify a counter increment other than 
default of 1
 Key: PIG-4567
 URL: https://issues.apache.org/jira/browse/PIG-4567
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.15.0
Reporter: Prashant Kommireddi
 Fix For: 0.15.0


Current APIs (EvalFunc, LoadFunc and StoreFunc) have a default *warn* method to 
report counters which increments by 1. 
{code}
public final void warn(String msg, Enum warningEnum)
{code}

It would be more flexible to have an additional method that takes in an 
argument to increment the counter by.
{code}
public final void warn(String msg, Enum warningEnum, long incr)
{code}

This will be useful when you might have, for instance, several fields within 
the same row that are bad and you want the counter to reflect that. Making 
repetitive warn calls is not ideal.
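
A hedged usage sketch of the proposed three-argument warn (the long-increment 
overload is the proposal above, not an existing API; the UDF and enum are made 
up):

{code}
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical UDF: count all bad fields in the row, then report them with a
// single warn() call (using the proposed long increment) instead of one per field.
public class BadFieldCheck extends EvalFunc<Tuple> {
    public enum MyWarnings { BAD_FIELD }

    @Override
    public Tuple exec(Tuple input) throws IOException {
        long badFields = 0;
        for (Object field : input.getAll()) {
            if (field == null) {        // stand-in for whatever "bad" means here
                badFields++;
            }
        }
        if (badFields > 0) {
            warn("Row contained bad fields", MyWarnings.BAD_FIELD, badFields);
        }
        return input;
    }
}
{code}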




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4567) Allow UDFs to specify a counter increment other than default of 1

2015-05-21 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-4567:
-
Fix Version/s: (was: 0.15.0)
   0.16.0

 Allow UDFs to specify a counter increment other than default of 1
 -

 Key: PIG-4567
 URL: https://issues.apache.org/jira/browse/PIG-4567
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.15.0
Reporter: Prashant Kommireddi
 Fix For: 0.16.0


 Current APIs (EvalFunc, LoadFunc and StoreFunc) have a default *warn* method 
 to report counters which increments by 1. 
 {code}
 public final void warn(String msg, Enum warningEnum)
 {code}
 It would be more flexible to have an additional method that takes in an 
 argument to increment the counter by.
 {code}
 public final void warn(String msg, Enum warningEnum, long incr)
 {code}
 This will be useful when you might have, for instance, several fields within 
 the same row that are bad and you want the counter to reflect that. Making 
 repetitive warn calls is not ideal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4567) Allow UDFs to specify a counter increment other than default of 1

2015-05-21 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1462#comment-1462
 ] 

Prashant Kommireddi commented on PIG-4567:
--

Hi [~daijy], I changed it to 0.16. When are you planning to roll 0.15 RC? I 
will try and make it before then, if not I will aim for 0.16.

 Allow UDFs to specify a counter increment other than default of 1
 -

 Key: PIG-4567
 URL: https://issues.apache.org/jira/browse/PIG-4567
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.15.0
Reporter: Prashant Kommireddi
 Fix For: 0.16.0


 Current APIs (EvalFunc, LoadFunc and StoreFunc) have a default *warn* method 
 to report counters which increments by 1. 
 {code}
 public final void warn(String msg, Enum warningEnum)
 {code}
 It would be more flexible to have an additional method that takes in an 
 argument to increment the counter by.
 {code}
 public final void warn(String msg, Enum warningEnum, long incr)
 {code}
 This will be useful when you might have, for instance, several fields within 
 the same row that are bad and you want the counter to reflect that. Making 
 repetitive warn calls is not ideal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4449) Optimize the case of Order by + Limit in nested foreach

2015-03-11 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357882#comment-14357882
 ] 

Prashant Kommireddi commented on PIG-4449:
--

[~rohini] would that mean we try and hold a sorted list (rather a heap) in 
memory before spills?
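
Roughly, the bounded structure being discussed would behave like a fixed-size 
min-heap over the sort key - a generic sketch, not InternalSortedBag's actual 
code:

{code}
import java.util.Comparator;
import java.util.PriorityQueue;

// Keep only the top `limit` items: the heap never holds more than `limit`
// entries, so memory stays bounded no matter how many rows one group key has.
class BoundedTopN<T> {
    private final int limit;
    private final PriorityQueue<T> heap;   // min-heap: smallest kept item on top

    BoundedTopN(int limit, Comparator<T> order) {
        this.limit = limit;
        this.heap = new PriorityQueue<>(limit, order);
    }

    void add(T item) {
        heap.offer(item);
        if (heap.size() > limit) {
            heap.poll();                   // evict the current smallest
        }
    }
}
{code}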

 Optimize the case of Order by + Limit in nested foreach
 ---

 Key: PIG-4449
 URL: https://issues.apache.org/jira/browse/PIG-4449
 Project: Pig
  Issue Type: Improvement
Reporter: Rohini Palaniswamy

 This is one of the very frequently used patterns
 {code}
 grouped_data_set = group data_set by id;
 capped_data_set = foreach grouped_data_set
 {
   ordered = order joined_data_set by timestamp desc;
   capped = limit ordered $num;
  generate flatten(capped);
 };
 {code}
 But this performs very poorly when there are millions of rows for a key in 
 the groupby, with a lot of spills.  This can be easily optimized by pushing the 
 limit into the InternalSortedBag, maintaining only $num records at any time and 
 avoiding memory pressure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4442) Eliminate redundant RPC call to get file information in HPath.

2015-03-03 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346251#comment-14346251
 ] 

Prashant Kommireddi commented on PIG-4442:
--

Schumi, I shall get you one day!

 Eliminate redundant RPC call to get file information in HPath.
 --

 Key: PIG-4442
 URL: https://issues.apache.org/jira/browse/PIG-4442
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.13.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Fix For: 0.15.0

 Attachments: PIG-4442.001.patch


 The {{HPath}} class makes 2 separate calls to {{FileSystem#getFileStatus}} to 
 get the block size and replication.  In the case of HDFS, this results in 2 
 separate but identical RPC transactions with the NameNode.  The situation is 
 the same for many other alternative {{FileSystem}} implementations too.  We 
 can get a minor latency improvement and lighten some RPC load on the remote 
 services by using a single call and getting the block size and replication 
 from the same response.
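
In other words (a sketch of the idea, not the attached patch itself):

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HPathStatusSketch {
    // One getFileStatus() call means one RPC; block size and replication both
    // come out of the same response.
    public static long[] blockSizeAndReplication(FileSystem fs, Path path)
            throws IOException {
        FileStatus status = fs.getFileStatus(path);
        return new long[] { status.getBlockSize(), status.getReplication() };
    }
}
{code}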



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4442) Eliminate redundant RPC call to get file information in HPath.

2015-03-02 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344248#comment-14344248
 ] 

Prashant Kommireddi commented on PIG-4442:
--

+1. Will commit this shortly.

 Eliminate redundant RPC call to get file information in HPath.
 --

 Key: PIG-4442
 URL: https://issues.apache.org/jira/browse/PIG-4442
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.13.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Fix For: 0.15.0

 Attachments: PIG-4442.001.patch


 The {{HPath}} class makes 2 separate calls to {{FileSystem#getFileStatus}} to 
 get the block size and replication.  In the case of HDFS, this results in 2 
 separate but identical RPC transactions with the NameNode.  The situation is 
 the same for many other alternative {{FileSystem}} implementations too.  We 
 can get a minor latency improvement and lighten some RPC load on the remote 
 services by using a single call and getting the block size and replication 
 from the same response.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4442) Eliminate redundant RPC call to get file information in HPath.

2015-03-02 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344249#comment-14344249
 ] 

Prashant Kommireddi commented on PIG-4442:
--

[~daijy] beat me to it :)

 Eliminate redundant RPC call to get file information in HPath.
 --

 Key: PIG-4442
 URL: https://issues.apache.org/jira/browse/PIG-4442
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.13.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Fix For: 0.15.0

 Attachments: PIG-4442.001.patch


 The {{HPath}} class makes 2 separate calls to {{FileSystem#getFileStatus}} to 
 get the block size and replication.  In the case of HDFS, this results in 2 
 separate but identical RPC transactions with the NameNode.  The situation is 
 the same for many other alternative {{FileSystem}} implementations too.  We 
 can get a minor latency improvement and lighten some RPC load on the remote 
 services by using a single call and getting the block size and replication 
 from the same response.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4299) SpillableMemoryManager assumes tenured heap incorrectly

2014-11-07 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202951#comment-14202951
 ] 

Prashant Kommireddi commented on PIG-4299:
--

Thanks Daniel! Seems like our commits to branch 0.14 went in around the same 
time and mine messed up the earlier commit. I just did a fresh commit once 
again. I didn't touch trunk so your commit was fine.

 SpillableMemoryManager assumes tenured heap incorrectly
 ---

 Key: PIG-4299
 URL: https://issues.apache.org/jira/browse/PIG-4299
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.14.0

 Attachments: PIG-4299_1.patch


 {code}
 for (MemoryPoolMXBean b: mpbeans) {
     log.debug("Found heap (" + b.getName() +
             ") of type " + b.getType());
     if (b.getType() == MemoryType.HEAP) {
         /* Here we are making the leap of faith that the biggest
          * heap is the tenured heap
          */
         long size = b.getUsage().getMax();
         totalSize += size;
         if (size > biggestSize) {
             biggestSize = size;
             biggestHeap = b;
         }
     }
 }
 {code}
 A memory pool being the biggest MemoryType.HEAP does not guarantee it being 
 tenured. Moreover, we must check whether usage threshold is supported by heap 
 before trying to set usage threshold on it.
 Here is the stacktrace that resulted from this bug
 java.lang.UnsupportedOperationException: Usage threshold is not supported
 at sun.management.MemoryPoolImpl.setUsageThreshold(MemoryPoolImpl.java:114)
 at org.apache.pig.impl.util.SpillableMemoryManager.<init>(SpillableMemoryManager.java:130)
 at org.apache.pig.impl.util.SpillableMemoryManager.getInstance(SpillableMemoryManager.java:135)
 at org.apache.pig.data.BagFactory.<init>(BagFactory.java:123)
 at org.apache.pig.data.DefaultBagFactory.<init>(DefaultBagFactory.java:69)
 at org.apache.pig.data.BagFactory.getInstance(BagFactory.java:81)
 at search.dashboard.VariableLengthTupleToBag.<clinit>(VariableLengthTupleToBag.java:27)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4299) SpillableMemoryManager assumes tenured heap incorrectly

2014-11-06 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-4299:
-
Attachment: PIG-4299_1.patch

Tenured (oldgen) is not always the largest.
In an environment running with a larger young gen (to increase the chances 
that objects die young), this will be a problem given the current code 
assumes that the largest pool is tenured.

Changed the way a pool is determined as tenured. Old gen or tenured is the 
only memory pool that allows setting usage threshold.

Unit tests pass around SpillableMemoryManager, namely TestDataBag and 
TestPOPartialAgg.
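
Concretely, the selection could look something like this (a sketch of the 
approach, not the exact patch):

{code}
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class TenuredPoolSketch {
    // Pick the heap pool that supports a usage threshold (the tenured/old
    // generation) instead of assuming the biggest heap pool is tenured.
    static MemoryPoolMXBean findTenuredPool() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP
                    && pool.isUsageThresholdSupported()) {
                return pool;
            }
        }
        return null;
    }
}
{code}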

 SpillableMemoryManager assumes tenured heap incorrectly
 ---

 Key: PIG-4299
 URL: https://issues.apache.org/jira/browse/PIG-4299
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.14.0

 Attachments: PIG-4299_1.patch


 {code}
 for (MemoryPoolMXBean b: mpbeans) {
     log.debug("Found heap (" + b.getName() +
             ") of type " + b.getType());
     if (b.getType() == MemoryType.HEAP) {
         /* Here we are making the leap of faith that the biggest
          * heap is the tenured heap
          */
         long size = b.getUsage().getMax();
         totalSize += size;
         if (size > biggestSize) {
             biggestSize = size;
             biggestHeap = b;
         }
     }
 }
 {code}
 A memory pool being the biggest MemoryType.HEAP does not guarantee it being 
 tenured? Moreover, we must check whether usage threshold is supported by heap 
 before trying to set usage threshold on it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4299) SpillableMemoryManager assumes tenured heap incorrectly

2014-11-06 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-4299:
-
Description: 
{code}
for (MemoryPoolMXBean b: mpbeans) {
    log.debug("Found heap (" + b.getName() +
            ") of type " + b.getType());
    if (b.getType() == MemoryType.HEAP) {
        /* Here we are making the leap of faith that the biggest
         * heap is the tenured heap
         */
        long size = b.getUsage().getMax();
        totalSize += size;
        if (size > biggestSize) {
            biggestSize = size;
            biggestHeap = b;
        }
    }
}
{code}

A memory pool being the biggest MemoryType.HEAP does not guarantee it being 
tenured. Moreover, we must check whether usage threshold is supported by heap 
before trying to set usage threshold on it.

Here is the stacktrace that resulted from this bug

java.lang.UnsupportedOperationException: Usage threshold is not supported
at sun.management.MemoryPoolImpl.setUsageThreshold(MemoryPoolImpl.java:114)
at org.apache.pig.impl.util.SpillableMemoryManager.<init>(SpillableMemoryManager.java:130)
at org.apache.pig.impl.util.SpillableMemoryManager.getInstance(SpillableMemoryManager.java:135)
at org.apache.pig.data.BagFactory.<init>(BagFactory.java:123)
at org.apache.pig.data.DefaultBagFactory.<init>(DefaultBagFactory.java:69)
at org.apache.pig.data.BagFactory.getInstance(BagFactory.java:81)
at search.dashboard.VariableLengthTupleToBag.<clinit>(VariableLengthTupleToBag.java:27)

  was:
{code}
for (MemoryPoolMXBean b: mpbeans) {
    log.debug("Found heap (" + b.getName() +
            ") of type " + b.getType());
    if (b.getType() == MemoryType.HEAP) {
        /* Here we are making the leap of faith that the biggest
         * heap is the tenured heap
         */
        long size = b.getUsage().getMax();
        totalSize += size;
        if (size > biggestSize) {
            biggestSize = size;
            biggestHeap = b;
        }
    }
}
{code}

A memory pool being the biggest MemoryType.HEAP does not guarantee it being 
tenured? Moreover, we must check whether usage threshold is supported by heap 
before trying to set usage threshold on it. 


 SpillableMemoryManager assumes tenured heap incorrectly
 ---

 Key: PIG-4299
 URL: https://issues.apache.org/jira/browse/PIG-4299
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.14.0

 Attachments: PIG-4299_1.patch


 {code}
 for (MemoryPoolMXBean b: mpbeans) {
     log.debug("Found heap (" + b.getName() +
             ") of type " + b.getType());
     if (b.getType() == MemoryType.HEAP) {
         /* Here we are making the leap of faith that the biggest
          * heap is the tenured heap
          */
         long size = b.getUsage().getMax();
         totalSize += size;
         if (size > biggestSize) {
             biggestSize = size;
             biggestHeap = b;
         }
     }
 }
 {code}
 A memory pool being the biggest MemoryType.HEAP does not guarantee it being 
 tenured. Moreover, we must check whether usage threshold is supported by heap 
 before trying to set usage threshold on it.
 Here is the stacktrace that resulted from this bug
 java.lang.UnsupportedOperationException: Usage threshold is not supported
 at sun.management.MemoryPoolImpl.setUsageThreshold(MemoryPoolImpl.java:114)
 at org.apache.pig.impl.util.SpillableMemoryManager.<init>(SpillableMemoryManager.java:130)
 at org.apache.pig.impl.util.SpillableMemoryManager.getInstance(SpillableMemoryManager.java:135)
 at org.apache.pig.data.BagFactory.<init>(BagFactory.java:123)
 at org.apache.pig.data.DefaultBagFactory.<init>(DefaultBagFactory.java:69)
 at org.apache.pig.data.BagFactory.getInstance(BagFactory.java:81)
 at search.dashboard.VariableLengthTupleToBag.<clinit>(VariableLengthTupleToBag.java:27)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4299) SpillableMemoryManager assumes tenured heap incorrectly

2014-11-06 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-4299:
-
Patch Info: Patch Available

 SpillableMemoryManager assumes tenured heap incorrectly
 ---

 Key: PIG-4299
 URL: https://issues.apache.org/jira/browse/PIG-4299
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.14.0

 Attachments: PIG-4299_1.patch


 {code}
 for (MemoryPoolMXBean b: mpbeans) {
     log.debug("Found heap (" + b.getName() +
             ") of type " + b.getType());
     if (b.getType() == MemoryType.HEAP) {
         /* Here we are making the leap of faith that the biggest
          * heap is the tenured heap
          */
         long size = b.getUsage().getMax();
         totalSize += size;
         if (size > biggestSize) {
             biggestSize = size;
             biggestHeap = b;
         }
     }
 }
 {code}
 A memory pool being the biggest MemoryType.HEAP does not guarantee it being 
 tenured. Moreover, we must check whether usage threshold is supported by heap 
 before trying to set usage threshold on it.
 Here is the stacktrace that resulted from this bug
 java.lang.UnsupportedOperationException: Usage threshold is not supported
 at sun.management.MemoryPoolImpl.setUsageThreshold(MemoryPoolImpl.java:114)
 at org.apache.pig.impl.util.SpillableMemoryManager.<init>(SpillableMemoryManager.java:130)
 at org.apache.pig.impl.util.SpillableMemoryManager.getInstance(SpillableMemoryManager.java:135)
 at org.apache.pig.data.BagFactory.<init>(BagFactory.java:123)
 at org.apache.pig.data.DefaultBagFactory.<init>(DefaultBagFactory.java:69)
 at org.apache.pig.data.BagFactory.getInstance(BagFactory.java:81)
 at search.dashboard.VariableLengthTupleToBag.<clinit>(VariableLengthTupleToBag.java:27)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4299) SpillableMemoryManager assumes tenured heap incorrectly

2014-11-06 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-4299:
-
Status: Patch Available  (was: Open)

 SpillableMemoryManager assumes tenured heap incorrectly
 ---

 Key: PIG-4299
 URL: https://issues.apache.org/jira/browse/PIG-4299
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.14.0

 Attachments: PIG-4299_1.patch


 {code}
 for (MemoryPoolMXBean b: mpbeans) {
     log.debug("Found heap (" + b.getName() +
             ") of type " + b.getType());
     if (b.getType() == MemoryType.HEAP) {
         /* Here we are making the leap of faith that the biggest
          * heap is the tenured heap
          */
         long size = b.getUsage().getMax();
         totalSize += size;
         if (size > biggestSize) {
             biggestSize = size;
             biggestHeap = b;
         }
     }
 }
 {code}
 A memory pool being the biggest MemoryType.HEAP does not guarantee it being 
 tenured. Moreover, we must check whether usage threshold is supported by heap 
 before trying to set usage threshold on it.
 Here is the stacktrace that resulted from this bug
 java.lang.UnsupportedOperationException: Usage threshold is not supported
 at sun.management.MemoryPoolImpl.setUsageThreshold(MemoryPoolImpl.java:114)
 at org.apache.pig.impl.util.SpillableMemoryManager.<init>(SpillableMemoryManager.java:130)
 at org.apache.pig.impl.util.SpillableMemoryManager.getInstance(SpillableMemoryManager.java:135)
 at org.apache.pig.data.BagFactory.<init>(BagFactory.java:123)
 at org.apache.pig.data.DefaultBagFactory.<init>(DefaultBagFactory.java:69)
 at org.apache.pig.data.BagFactory.getInstance(BagFactory.java:81)
 at search.dashboard.VariableLengthTupleToBag.<clinit>(VariableLengthTupleToBag.java:27)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PIG-4299) SpillableMemoryManager assumes tenured heap incorrectly

2014-11-05 Thread Prashant Kommireddi (JIRA)
Prashant Kommireddi created PIG-4299:


 Summary: SpillableMemoryManager assumes tenured heap incorrectly
 Key: PIG-4299
 URL: https://issues.apache.org/jira/browse/PIG-4299
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
 Fix For: 0.14.0


{code}
for (MemoryPoolMXBean b: mpbeans) {
    log.debug("Found heap (" + b.getName() +
            ") of type " + b.getType());
    if (b.getType() == MemoryType.HEAP) {
        /* Here we are making the leap of faith that the biggest
         * heap is the tenured heap
         */
        long size = b.getUsage().getMax();
        totalSize += size;
        if (size > biggestSize) {
            biggestSize = size;
            biggestHeap = b;
        }
    }
}
{code}

A memory pool being MemoryType.HEAP does not guarantee it being tenured. 
Moreover, we must check whether usage threshold is supported by heap before 
trying to set usage threshold on it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4299) SpillableMemoryManager assumes tenured heap incorrectly

2014-11-05 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-4299:
-
Description: 
{code}
for (MemoryPoolMXBean b: mpbeans) {
    log.debug("Found heap (" + b.getName() +
            ") of type " + b.getType());
    if (b.getType() == MemoryType.HEAP) {
        /* Here we are making the leap of faith that the biggest
         * heap is the tenured heap
         */
        long size = b.getUsage().getMax();
        totalSize += size;
        if (size > biggestSize) {
            biggestSize = size;
            biggestHeap = b;
        }
    }
}
{code}

A memory pool being the biggest MemoryType.HEAP does not guarantee it being 
tenured? Moreover, we must check whether usage threshold is supported by heap 
before trying to set usage threshold on it. 

  was:
{code}
for (MemoryPoolMXBean b: mpbeans) {
    log.debug("Found heap (" + b.getName() +
            ") of type " + b.getType());
    if (b.getType() == MemoryType.HEAP) {
        /* Here we are making the leap of faith that the biggest
         * heap is the tenured heap
         */
        long size = b.getUsage().getMax();
        totalSize += size;
        if (size > biggestSize) {
            biggestSize = size;
            biggestHeap = b;
        }
    }
}
{code}

A memory pool being MemoryType.HEAP does not guarantee it being tenured. 
Moreover, we must check whether usage threshold is supported by heap before 
trying to set usage threshold on it. 


 SpillableMemoryManager assumes tenured heap incorrectly
 ---

 Key: PIG-4299
 URL: https://issues.apache.org/jira/browse/PIG-4299
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
 Fix For: 0.14.0


 {code}
 for (MemoryPoolMXBean b: mpbeans) {
     log.debug("Found heap (" + b.getName() +
             ") of type " + b.getType());
     if (b.getType() == MemoryType.HEAP) {
         /* Here we are making the leap of faith that the biggest
          * heap is the tenured heap
          */
         long size = b.getUsage().getMax();
         totalSize += size;
         if (size > biggestSize) {
             biggestSize = size;
             biggestHeap = b;
         }
     }
 }
 {code}
 A memory pool being the biggest MemoryType.HEAP does not guarantee it being 
 tenured? Moreover, we must check whether usage threshold is supported by heap 
 before trying to set usage threshold on it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PIG-4223) PigStorage using OptionBuilder is not thread-safe

2014-10-08 Thread Prashant Kommireddi (JIRA)
Prashant Kommireddi created PIG-4223:


 Summary: PigStorage using OptionBuilder is not thread-safe
 Key: PIG-4223
 URL: https://issues.apache.org/jira/browse/PIG-4223
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.14.0


While creating Options in the PigStorage constructor via 
populateValidOptions(), we started using OptionBuilder to create the option 
corresponding to the "overwrite" feature. OptionBuilder overuses static 
variables whose state can be manipulated by multiple threads. So when 
PigStorage tries to create a "longOpt", some other thread might be resetting 
"longOpt" to null on the shared static state. This does not seem to be safe 
for use within a multithreaded context.

Here is the stack trace:

Cause5:
java.lang.IllegalArgumentException: must specify longopt

Cause5-StackTrace:
at org.apache.commons.cli.OptionBuilder.create(OptionBuilder.java:330)
at org.apache.pig.builtin.PigStorage.populateValidOptions(PigStorage.java:172)
at org.apache.pig.builtin.PigStorage.<init>(PigStorage.java:207)
... 36 shared with parent
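
For illustration, a minimal sketch of building the same kind of Options without touching OptionBuilder's static state, by constructing Option instances directly (the thread-safe route in commons-cli 1.x). The option names and descriptions here are placeholders, not the exact PigStorage option set.

{code}
import org.apache.commons.cli.Option;
import org.apache.commons.cli.Options;

public class ThreadSafeOptionsSketch {
    public static Options buildValidOptions() {
        Options validOptions = new Options();
        // Plain Option instances carry no shared static state, so concurrent
        // construction from multiple threads is safe.
        validOptions.addOption("schema", false, "Loads / stores the schema of the relation");
        // Option(shortOpt, longOpt, hasArg, description); shortOpt may be null
        // when only a long option such as --overwrite is wanted.
        validOptions.addOption(new Option(null, "overwrite", true,
                "Overwrite the output location if it already exists"));
        return validOptions;
    }
}
{code}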



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (PIG-4223) PigStorage using OptionBuilder is not thread-safe

2014-10-08 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi resolved PIG-4223.
--
Resolution: Duplicate

Dup of PIG-3988

 PigStorage using OptionBuilder is not thread-safe
 -

 Key: PIG-4223
 URL: https://issues.apache.org/jira/browse/PIG-4223
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.14.0


 While creating Options in the PigStorage constructor via 
 populateValidOptions(), we started using OptionBuilder to create the option 
 corresponding to the overwrite feature. OptionBuilder relies heavily on static 
 variables whose state can be manipulated by multiple threads. So when PigStorage 
 tries to create a longOpt, another thread might reset those static variables 
 and update longOpt back to null underneath it. This is not safe for use within 
 a multithreaded context.
 Here is the stack trace:
 Cause5:
 java.lang.IllegalArgumentException: must specify longopt
 Cause5-StackTrace:
 at org.apache.commons.cli.OptionBuilder.create(OptionBuilder.java:330)
 at org.apache.pig.builtin.PigStorage.populateValidOptions(PigStorage.java:172)
 at org.apache.pig.builtin.PigStorage.<init>(PigStorage.java:207)
 ... 36 shared with parent



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4080) Add Preprocessor commands and more to the black/whitelisting feature

2014-09-25 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-4080:
-
Attachment: PIG-4080_2.patch

Addressed Daniel's comments

 Add Preprocessor commands and more to the black/whitelisting feature
 

 Key: PIG-4080
 URL: https://issues.apache.org/jira/browse/PIG-4080
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.14.0, 0.13.1
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Attachments: PIG-4080.patch, PIG-4080_2.patch


 Executing Pig scripts in a multi-tenant environment within the context of the 
 app server requires disabling certain other commands that could be dangerous. 
 For eg, the shell commands contained within the pig script will be executed 
 as the user (possibly superuser) on the app server. The following is an 
 example
 %declare X `id`;
 An admin might want to disable certain features, such as either disabling 
 shell entirely or even being extra cautious by disabling declare. Some 
 more commands that could be disabled are - default, define (dynamic 
 invokers), run and exec.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4080) Add Preprocessor commands and more to the black/whitelisting feature

2014-09-25 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14148363#comment-14148363
 ] 

Prashant Kommireddi commented on PIG-4080:
--

Thanks Daniel!

 Add Preprocessor commands and more to the black/whitelisting feature
 

 Key: PIG-4080
 URL: https://issues.apache.org/jira/browse/PIG-4080
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.14.0, 0.13.1
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.14.0

 Attachments: PIG-4080.patch, PIG-4080_2.patch


 Executing Pig scripts in a multi-tenant environment within the context of the 
 app server requires disabling certain other commands that could be dangerous. 
 For eg, the shell commands contained within the pig script will be executed 
 as the user (possibly superuser) on the app server. The following is an 
 example
 %declare X `id`;
 An admin might want to disable certain features, such as either disabling 
 shell entirely or even being extra cautious by disabling declare. Some 
 more commands that could be disabled are - default, define (dynamic 
 invokers), run and exec.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4080) Add Preprocessor commands and more to the black/whitelisting feature

2014-09-24 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-4080:
-
Patch Info: Patch Available

 Add Preprocessor commands and more to the black/whitelisting feature
 

 Key: PIG-4080
 URL: https://issues.apache.org/jira/browse/PIG-4080
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.14.0, 0.13.1
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Attachments: PIG-4080.patch


 Executing Pig scripts in a multi-tenant environment within the context of the 
 app server requires disabling certain other commands that could be dangerous. 
 For eg, the shell commands contained within the pig script will be executed 
 as the user (possibly superuser) on the app server. The following is an 
 example
 %declare X `id`;
 An admin might want to disable certain features, such as either disabling 
 shell entirely or even being extra cautious by disabling declare. Some 
 more commands that could be disabled are - default, define (dynamic 
 invokers), run and exec.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4080) Add Preprocessor commands and more to the black/whitelisting feature

2014-07-29 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-4080:
-

Attachment: PIG-4080.patch

This patch provides a way to disable exec, run, import, define, default, 
declare.

 Add Preprocessor commands and more to the black/whitelisting feature
 

 Key: PIG-4080
 URL: https://issues.apache.org/jira/browse/PIG-4080
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.14.0, 0.13.1
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Attachments: PIG-4080.patch


 Executing Pig scripts in a multi-tenant environment within the context of the 
 app server requires disabling certain other commands that could be dangerous. 
 For eg, the shell commands contained within the pig script will be executed 
 as the user (possibly superuser) on the app server. The following is an 
 example
 %declare X `id`;
 An admin might want to disable certain features, such as either disabling 
 shell entirely or even being extra cautious by disabling declare. Some 
 more commands that could be disabled are - default, define (dynamic 
 invokers), run and exec.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-4080) Add Preprocessor commands and more to the black/whitelisting feature

2014-07-29 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14078437#comment-14078437
 ] 

Prashant Kommireddi commented on PIG-4080:
--

RB https://reviews.apache.org/r/24071/

Sorry about the whitespace issues, I will fix them.

 Add Preprocessor commands and more to the black/whitelisting feature
 

 Key: PIG-4080
 URL: https://issues.apache.org/jira/browse/PIG-4080
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.14.0, 0.13.1
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Attachments: PIG-4080.patch


 Executing Pig scripts in a multi-tenant environment within the context of the 
 app server requires disabling certain other commands that could be dangerous. 
 For eg, the shell commands contained within the pig script will be executed 
 as the user (possibly superuser) on the app server. The following is an 
 example
 %declare X `id`;
 An admin might want to disable certain features, such as either disabling 
 shell entirely or even being extra cautious by disabling declare. Some 
 more commands that could be disabled are - default, define (dynamic 
 invokers), run and exec.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (PIG-4080) Add Preprocessor commands and more to the black/whitelisting feature

2014-07-28 Thread Prashant Kommireddi (JIRA)
Prashant Kommireddi created PIG-4080:


 Summary: Add Preprocessor commands and more to the 
black/whitelisting feature
 Key: PIG-4080
 URL: https://issues.apache.org/jira/browse/PIG-4080
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.14.0, 0.13.1
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi


Executing Pig scripts in a multi-tenant environment within the context of the 
app server requires disabling certain commands that could be dangerous. 
For example, shell commands contained within the Pig script will be executed as 
the user (possibly a superuser) on the app server. The following is an example:

%declare X `id`;

An admin might want to disable certain features, such as disabling shell access 
entirely or, being extra cautious, disabling declare. Some more commands that 
could be disabled are default, define (dynamic invokers), run and exec.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-4080) Add Preprocessor commands and more to the black/whitelisting feature

2014-07-28 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14077384#comment-14077384
 ] 

Prashant Kommireddi commented on PIG-4080:
--

This will again be config driven via pig.whitelist or pig.blacklist. This 
JIRA aims to make PIG-3765 more comprehensive.
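
For illustration, a sketch of how an admin embedding Pig might supply such a blacklist through properties when constructing a PigServer. The property name pig.blacklist comes from this JIRA; the particular command list and the use of local mode are assumptions for the example.

{code}
import java.util.Properties;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class BlacklistedPigServerSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Comma-separated commands to disable (illustrative list).
        props.setProperty("pig.blacklist", "declare,default,define,run,exec");

        PigServer pigServer = new PigServer(ExecType.LOCAL, props);
        // Scripts registered through this server that use a blacklisted
        // command are expected to be rejected during validation.
        pigServer.registerQuery("A = LOAD 'input' USING PigStorage(',');");
    }
}
{code}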

 Add Preprocessor commands and more to the black/whitelisting feature
 

 Key: PIG-4080
 URL: https://issues.apache.org/jira/browse/PIG-4080
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.14.0, 0.13.1
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi

 Executing Pig scripts in a multi-tenant environment within the context of the 
 app server requires disabling certain other commands that could be dangerous. 
 For eg, the shell commands contained within the pig script will be executed 
 as the user (possibly superuser) on the app server. The following is an 
 example
 %declare X `id`;
 An admin might want to disable certain features, such as either disabling 
 shell entirely or even being extra cautious by disabling declare. Some 
 more commands that could be disabled are - default, define (dynamic 
 invokers), run and exec.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3800) Documentation for Pig whitelist and blacklist features

2014-05-29 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012624#comment-14012624
 ] 

Prashant Kommireddi commented on PIG-3800:
--

Thanks for the review Aniket. Committed to trunk and 0.13 branch

 Documentation for Pig whitelist and blacklist features
 --

 Key: PIG-3800
 URL: https://issues.apache.org/jira/browse/PIG-3800
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
  Labels: documentaion
 Fix For: 0.13.0

 Attachments: PIG-3800.patch


 Documentation for PIG-3765



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3800) Documentation for Pig whitelist and blacklist features

2014-05-28 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011575#comment-14011575
 ] 

Prashant Kommireddi commented on PIG-3800:
--

[~aniket486] I can take this up and let you focus on the release. Should I go 
ahead?

 Documentation for Pig whitelist and blacklist features
 --

 Key: PIG-3800
 URL: https://issues.apache.org/jira/browse/PIG-3800
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
Assignee: Aniket Mokashi
  Labels: documentaion
 Fix For: 0.13.0


 Documentation for PIG-3765



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3800) Documentation for Pig whitelist and blacklist features

2014-05-28 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3800:
-

Attachment: PIG-3800.patch

[~aniket486] you're welcome. Thank you for driving the release :)

 Documentation for Pig whitelist and blacklist features
 --

 Key: PIG-3800
 URL: https://issues.apache.org/jira/browse/PIG-3800
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
  Labels: documentaion
 Fix For: 0.13.0

 Attachments: PIG-3800.patch


 Documentation for PIG-3765



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3897) very generic stack trace and root cause missing

2014-04-16 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971831#comment-13971831
 ] 

Prashant Kommireddi commented on PIG-3897:
--

[~shriny] you would need to recompile Pig to work with hadoop 2. Can you please 
try after running this ant target?
ant clean jar -Dhadoopversion=23

 very generic stack trace and root cause missing
 ---

 Key: PIG-3897
 URL: https://issues.apache.org/jira/browse/PIG-3897
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.12.0
 Environment: linux
Reporter: srinivas

 We noticed the following error randomly while upgrading to YARN; we use Pig 0.12 
 and Hadoop 2.3.
 The stack trace is generic and missing the root cause:
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException:
  ERROR 2017: Internal error creating job configuration.
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:873)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:298)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:190)
 at org.apache.pig.PigServer.launchPlan(PigServer.java:1322)
 at 
 org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1307)
 at org.apache.pig.PigServer.execute(PigServer.java:1297)
 at org.apache.pig.PigServer.executeBatch(PigServer.java:375)
 at 
 com.shopzilla.fasttrack.sessions.PigExecutor.executePigScriptBatchMode(PigExecutor.java:40)
 at 
 com.shopzilla.fasttrack.sessions.SessionFileterRulesSyncTasklet.execute(SessionFileterRulesSyncTasklet.java:49)
 at sun.reflect.GeneratedMethodAccessor75.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:318)
 at 
 org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
 at 
 org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
 at 
 org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:131)
 at 
 org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:119)
 at 
 org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
 at 
 org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
 at $Proxy28.execute(Unknown Source)
 at 
 org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:386)
 at 
 org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:130)
 at 
 org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:264)
 at 
 org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:76)
 at org.springframework.batch.r



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3892) Pig distribution for hadoop 2

2014-04-16 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971834#comment-13971834
 ] 

Prashant Kommireddi commented on PIG-3892:
--

Great, thanks. No use case, was wondering what the approach might be. Sounds 
good.

 Pig distribution for hadoop 2
 -

 Key: PIG-3892
 URL: https://issues.apache.org/jira/browse/PIG-3892
 Project: Pig
  Issue Type: Bug
  Components: build
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.13.0


 Currently the Pig distribution only bundles pig.jar for Hadoop 1. Hadoop 2 
 users need to compile again using the -Dhadoopversion=23 flag, which is a 
 quite confusing process. We need to make Pig work with Hadoop 2 out of the box. 
 I am thinking of two approaches:
 1. Bundle both pig-h1.jar and pig-h2.jar in the distribution, and have bin/pig 
 choose the right pig.jar to run
 2. Make two Pig distributions, one for Hadoop 1 and one for Hadoop 2
 Any opinion?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3898) Refactor PPNL for non-MR execution engine

2014-04-16 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971931#comment-13971931
 ] 

Prashant Kommireddi commented on PIG-3898:
--

I'm a +1 on the first approach. It's the right thing to do in this case, in 
terms of the method signature taking a base interface rather than an implementation 
(OperatorPlan vs MROperPlan). PPNL is marked evolving, which means 
compatibility can be broken between minor releases (though not desired). It's 
unfortunate that we need to do this and users need to adjust, but the sooner we 
make the change the better IMHO.

Let's also send out a note to Pig user mailing list in case we all agree to go 
ahead with approach 1.
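
For illustration, a rough sketch of how a downstream listener could handle the proposed signature by inspecting the concrete plan type, which is what tools like Ambrose / Lipstick would have to do. The fragment below is not the full PigProgressNotificationListener interface, and the package paths are assumptions based on current trunk.

{code}
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.MROperPlan;
import org.apache.pig.impl.plan.OperatorPlan;

public class PlanDispatchSketch {
    // Only the plan-handling method of a listener is sketched here.
    public void initialPlanNotification(String scriptId, OperatorPlan<?> plan) {
        if (plan instanceof MROperPlan) {
            MROperPlan mrPlan = (MROperPlan) plan;
            // MR-specific handling (walk MapReduce operators, jobs, aliases, ...).
            System.out.println("Script " + scriptId + " compiled to an MR plan: " + mrPlan);
        } else {
            // A Tez (or future engine) plan would be handled in its own branch.
            System.out.println("Script " + scriptId + " compiled to "
                    + plan.getClass().getSimpleName());
        }
    }
}
{code}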

 Refactor PPNL for non-MR execution engine
 -

 Key: PIG-3898
 URL: https://issues.apache.org/jira/browse/PIG-3898
 Project: Pig
  Issue Type: Task
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.13.0


 Currently, PPNL assumes the MR plan, and thus, it's not compatible with 
 non-MR execution engines. To support non-MR execution engines, I propose we 
 change the initialPlanNotification() method as follows:
 {code:title=from}
 public void initialPlanNotification(String scriptId, MROperPlan plan);
 {code}
 {code:title=to}
 public void initialPlanNotification(String scriptId, OperatorPlan<?> plan);
 {code}
 Since MROperPlan and TezOperPlan are subclasses of OperatorPlan, this method 
 can take both plans. In addition, if we add a new execution engine in the 
 future, it won't break the interface again as long as we build the operator 
 plan as a subclass of OperatorPlan.
 With this approach, applications such as Ambrose / Lipstick should be able to 
 dynamically cast OperatorPlan to a concrete subclass depending on the 
 ExecType.
 One disadvantage is that this isn't backward compatible with Pig 0.12 and 
 older. But it only requires minor changes, and backward compatibility will be 
 broken one time only.
 I also considered an alternative approach, for example, adding a new PPNL for 
 Tez. But this approach has two problems.
 # Pig registers PPNL via the Main function, and right now, only one PPNL can 
 be registered. So having more than one PPNL requires quite a few code 
 changes in Main, ScriptState, and so on.
 # Multiple PPNL interfaces mean multiple PPNL implementations. This results 
 in more (duplicate) code in applications such as Ambrose / Lipstick.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3897) very generic stack trace and root cause missing

2014-04-16 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971932#comment-13971932
 ] 

Prashant Kommireddi commented on PIG-3897:
--

What were your versions of hadoop and pig when it worked before?

 very generic stack trace and root cause missing
 ---

 Key: PIG-3897
 URL: https://issues.apache.org/jira/browse/PIG-3897
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.12.0
 Environment: linux
Reporter: srinivas

 We noticed the following error randomly while upgrading to YARN; we use Pig 0.12 
 and Hadoop 2.3.
 The stack trace is generic and missing the root cause:
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException:
  ERROR 2017: Internal error creating job configuration.
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:873)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:298)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:190)
 at org.apache.pig.PigServer.launchPlan(PigServer.java:1322)
 at 
 org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1307)
 at org.apache.pig.PigServer.execute(PigServer.java:1297)
 at org.apache.pig.PigServer.executeBatch(PigServer.java:375)
 at 
 com.shopzilla.fasttrack.sessions.PigExecutor.executePigScriptBatchMode(PigExecutor.java:40)
 at 
 com.shopzilla.fasttrack.sessions.SessionFileterRulesSyncTasklet.execute(SessionFileterRulesSyncTasklet.java:49)
 at sun.reflect.GeneratedMethodAccessor75.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:318)
 at 
 org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
 at 
 org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
 at 
 org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:131)
 at 
 org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:119)
 at 
 org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
 at 
 org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
 at $Proxy28.execute(Unknown Source)
 at 
 org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:386)
 at 
 org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:130)
 at 
 org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:264)
 at 
 org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:76)
 at org.springframework.batch.r



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3898) Refactor PPNL for non-MR execution engine

2014-04-16 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971956#comment-13971956
 ] 

Prashant Kommireddi commented on PIG-3898:
--

PPNL is a public interface, so we need to assume it's used by folks in ways we 
aren't aware of. We use it at Salesforce. I remember another user having filed a 
JIRA around an issue with ScriptState that he discovered with the use of PPNL.

 Refactor PPNL for non-MR execution engine
 -

 Key: PIG-3898
 URL: https://issues.apache.org/jira/browse/PIG-3898
 Project: Pig
  Issue Type: Task
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.13.0


 Currently, PPNL assumes the MR plan, and thus, it's not compatible with 
 non-MR execution engines. To support non-MR execution engines, I propose we 
 change the initialPlanNotification() method as follows:
 {code:title=from}
 public void initialPlanNotification(String scriptId, MROperPlan plan);
 {code}
 {code:title=to}
 public void initialPlanNotification(String scriptId, OperatorPlan<?> plan);
 {code}
 Since MROperPlan and TezOperPlan are subclasses of OperatorPlan, this method 
 can take both plans. In addition, if we add a new execution engine in the 
 future, it won't break the interface again as long as we build the operator 
 plan as a subclass of OperatorPlan.
 With this approach, applications such as Ambrose / Lipstick should be able to 
 dynamically cast OperatorPlan to a concrete subclass depending on the 
 ExecType.
 One disadvantage is that this isn't backward compatible with Pig 0.12 and 
 older. But it only requires minor changes, and backward compatibility will be 
 broken one time only.
 I also considered an alternative approach, for example, adding a new PPNL for 
 Tez. But this approach has two problems.
 # Pig registers PPNL via the Main function, and right now, only one PPNL can 
 be registered. So having more than one PPNL requires quite a few code 
 changes in Main, ScriptState, and so on.
 # Multiple PPNL interfaces mean multiple PPNL implementations. This results 
 in more (duplicate) code in applications such as Ambrose / Lipstick.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3898) Refactor PPNL for non-MR execution engine

2014-04-16 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971994#comment-13971994
 ] 

Prashant Kommireddi commented on PIG-3898:
--

[~cheolsoo] [~rohini] on second thoughts, do you feel we should use this as an 
opportunity to not expose OperatorPlan and instead expose a trimmed down 
version of it? Exposing OperatorPlan itself seems a bit risky (if we ever plan 
to change things around later).

Not sure how it affects the downstream users in terms of the extent to which 
one would need to make changes in PPNL custom impls.

 Refactor PPNL for non-MR execution engine
 -

 Key: PIG-3898
 URL: https://issues.apache.org/jira/browse/PIG-3898
 Project: Pig
  Issue Type: Task
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.13.0


 Currently, PPNL assumes the MR plan, and thus, it's not compatible with 
 non-MR execution engines. To support non-MR execution engines, I propose we 
 change the initialPlanNotification() method as follows:
 {code:title=from}
 public void initialPlanNotification(String scriptId, MROperPlan plan);
 {code}
 {code:title=to}
 public void initialPlanNotification(String scriptId, OperatorPlan<?> plan);
 {code}
 Since MROperPlan and TezOperPlan are subclasses of OperatorPlan, this method 
 can take both plans. In addition, if we add a new execution engine in the 
 future, it won't break the interface again as long as we build the operator 
 plan as a subclass of OperatorPlan.
 With this approach, applications such as Ambrose / Lipstick should be able to 
 dynamically cast OperatorPlan to a concrete subclass depending on the 
 ExecType.
 One disadvantage is that this isn't backward compatible with Pig 0.12 and 
 older. But it only requires minor changes, and backward compatibility will be 
 broken one time only.
 I also considered an alternative approach, for example, adding a new PPNL for 
 Tez. But this approach has two problems.
 # Pig registers PPNL via the Main function, and right now, only one PPNL can 
 be registered. So having more than one PPNL requires quite a few code 
 changes in Main, ScriptState, and so on.
 # Multiple PPNL interfaces mean multiple PPNL implementations. This results 
 in more (duplicate) code in applications such as Ambrose / Lipstick.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3892) Pig distribution for hadoop 2

2014-04-15 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13970018#comment-13970018
 ] 

Prashant Kommireddi commented on PIG-3892:
--

+1 for 1. 

[~daijy] - would the way to invoke a certain version be passed as an argument 
to bin/pig, an env variable, both, something else?

 Pig distribution for hadoop 2
 -

 Key: PIG-3892
 URL: https://issues.apache.org/jira/browse/PIG-3892
 Project: Pig
  Issue Type: Bug
  Components: build
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.13.0


 Currently the Pig distribution only bundles pig.jar for Hadoop 1. Hadoop 2 
 users need to compile again using the -Dhadoopversion=23 flag, which is a 
 quite confusing process. We need to make Pig work with Hadoop 2 out of the box. 
 I am thinking of two approaches:
 1. Bundle both pig-h1.jar and pig-h2.jar in the distribution, and have bin/pig 
 choose the right pig.jar to run
 2. Make two Pig distributions, one for Hadoop 1 and one for Hadoop 2
 Any opinion?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3478) Make StreamingUDF work for Hadoop 2

2014-04-02 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13958121#comment-13958121
 ] 

Prashant Kommireddi commented on PIG-3478:
--

Thanks for getting back. I've moved this to 0.13, we could possibly get it in 
by then.

 Make StreamingUDF work for Hadoop 2
 ---

 Key: PIG-3478
 URL: https://issues.apache.org/jira/browse/PIG-3478
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Daniel Dai
Assignee: Jeremy Karn
 Fix For: 0.13.0

 Attachments: PIG-3478.patch


 PIG-2417 introduced Streaming UDF. However, it does not work under Hadoop 2. 
 Both unit tests and e2e tests fail under Hadoop 2. We need to fix it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3473) org.apache.pig.Expression should support is null and not operations

2014-04-01 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13957009#comment-13957009
 ] 

Prashant Kommireddi commented on PIG-3473:
--

[~aniket486] can we move this to 0.13?

 org.apache.pig.Expression should support is null and not operations
 ---

 Key: PIG-3473
 URL: https://issues.apache.org/jira/browse/PIG-3473
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.11.1
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
 Fix For: 0.12.1


 Currently Expression only supports BinaryExpressions and Constants. Most of 
 the other logical expressions (cast, udf) need not be pushed down. But it 
 would make sense to be able to push down is null and not operations (possibly 
 a negative expression).
 This change would have an impact on LoadFuncs (hcatloader); we need to be 
 careful and make sure we do this in a backwards-compatible way.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3478) Make StreamingUDF work for Hadoop 2

2014-04-01 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13957008#comment-13957008
 ] 

Prashant Kommireddi commented on PIG-3478:
--

[~russell.jurney] [~jeremykarn] anyone looking into this?

 Make StreamingUDF work for Hadoop 2
 ---

 Key: PIG-3478
 URL: https://issues.apache.org/jira/browse/PIG-3478
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Daniel Dai
Assignee: Jeremy Karn
 Fix For: 0.12.1

 Attachments: PIG-3478.patch


 PIG-2417 introduced Streaming UDF. However, it does not work under Hadoop 2. 
 Both unit tests and e2e tests fail under Hadoop 2. We need to fix it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3570) Rollback PIG-3060

2014-04-01 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13957006#comment-13957006
 ] 

Prashant Kommireddi commented on PIG-3570:
--

[~cheolsoo] can we mark this resolved?

 Rollback PIG-3060
 -

 Key: PIG-3570
 URL: https://issues.apache.org/jira/browse/PIG-3570
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.1

 Attachments: PIG-3570-1.patch


 Will result in missing records in some cases. One case is when we have two 
 flattens in a single pipeline: when the first flatten still holds some records, 
 the second flatten cannot return EOP just because of an empty bag. Here is the 
 test script:
 {code}
 a = load '1.txt' as (bag1:bag{(t:int)});
 b = foreach a generate flatten(bag1) as field1;
 c = foreach b generate flatten(GenBag(field1));
 dump c;
 {code}
 GenBag:
 {code}
 public class GenBag extends EvalFunc<DataBag> {
     @Override
     public DataBag exec(Tuple input) throws IOException {
         Integer content = (Integer) input.get(0);
         DataBag bag = BagFactory.getInstance().newDefaultBag();
         if (content < 10) {
             Tuple t = TupleFactory.getInstance().newTuple();
             t.append(content);
             bag.add(t);
         }
         return bag;
     }
 }
 {code}
 Input:
 {code}
 {(1),(12),(9)}
 {(15),(2)}
 {code}
 The test case in PIG-3060 fails if rollback, need to fix it when rollback.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3749) PigPerformance - data in the map gets lost during parsing

2014-04-01 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3749:
-

Fix Version/s: (was: 0.12.1)
   0.13.0

 PigPerformance - data in the map gets lost during parsing
 -

 Key: PIG-3749
 URL: https://issues.apache.org/jira/browse/PIG-3749
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Keren Ouaknine
Assignee: Keren Ouaknine
 Fix For: 0.13.0

 Attachments: PIG-3749.patch


 Create a Pigmix sample dataset which looks as follows:
 keren 1   2   qt  3   4   5.0 aaaabbbb 
 mccccddddeeeedmffffgggghhhh
 Launch the following query:
 A = load 'page_views_sample.txt' using 
 org.apache.pig.test.pigmix.udf.PigPerformanceLoader()
 as (user, action, timespent, query_term, ip_addr, timestamp, 
 estimated_revenue, page_info, page_links);
 store A into 'L1out_A';
 B = foreach A generate user, (int)action as action, (map[])page_info as 
 page_info, flatten((bag{tuple(map[])})page_links) as page_links;
 store B into 'L1out_B';
 The result looks like this: 
 keren 1   [b#bbb,a#aaa]   [d#,e#eee,c#ccc]
 keren 1   [b#bbb,a#aaa]   [f#fff,g#ggg,h#hhh
 It is missing the 'ddd' value and a closing bracket.
 Thanks,
 Keren



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3749) PigPerformance - data in the map gets lost during parsing

2014-04-01 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13957012#comment-13957012
 ] 

Prashant Kommireddi commented on PIG-3749:
--

[~kereno] moving this to 0.13, let me know if you have concerns with that. 
Also, can you please answer Cheolsoo's question above.

 PigPerformance - data in the map gets lost during parsing
 -

 Key: PIG-3749
 URL: https://issues.apache.org/jira/browse/PIG-3749
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Keren Ouaknine
Assignee: Keren Ouaknine
 Fix For: 0.13.0

 Attachments: PIG-3749.patch


 Create a Pigmix sample dataset which looks as follows:
 keren 1   2   qt  3   4   5.0 aaaabbbb 
 mccccddddeeeedmffffgggghhhh
 Launch the following query:
 A = load 'page_views_sample.txt' using 
 org.apache.pig.test.pigmix.udf.PigPerformanceLoader()
 as (user, action, timespent, query_term, ip_addr, timestamp, 
 estimated_revenue, page_info, page_links);
 store A into 'L1out_A';
 B = foreach A generate user, (int)action as action, (map[])page_info as 
 page_info, flatten((bag{tuple(map[])})page_links) as page_links;
 store B into 'L1out_B';
 The result looks like this: 
 keren 1   [b#bbb,a#aaa]   [d#,e#eee,c#ccc]
 keren 1   [b#bbb,a#aaa]   [f#fff,g#ggg,h#hhh
 It is missing the 'ddd' value and a closing bracket.
 Thanks,
 Keren



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3482) Mapper only Jobs are not creating intermediate files in /tmp/, instead of creating in user directory.

2014-04-01 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3482:
-

Fix Version/s: (was: 0.12.1)
   0.13.0

 Mapper only Jobs are not creating intermediate files in /tmp/, instead of 
 creating in user directory. 
 --

 Key: PIG-3482
 URL: https://issues.apache.org/jira/browse/PIG-3482
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.11.1
 Environment: RHEL 6.0
Reporter: Raviteja Chirala
Priority: Minor
 Fix For: 0.13.0


 When we run mapper-only jobs, all the intermediate outputs (compressed) go to 
 the user directory instead of going to tmp. If we run on small datasets, it 
 shouldn't create a problem. But when I run on large datasets, say more than 
 100TB, it takes up so much disk space that it even exceeds the disk space 
 quota (setSpaceQuota) of 100GB. The problem happens before clean up. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3844) Make ScriptState InheritableThreadLocal for threads that need it

2014-04-01 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13957130#comment-13957130
 ] 

Prashant Kommireddi commented on PIG-3844:
--

Sure, works in this case.

I just want us to make sure API changes have corresponding tests somewhere or 
backward-incompatibility might go unnoticed in future modifications.

 Make ScriptState InheritableThreadLocal for threads that need it
 

 Key: PIG-3844
 URL: https://issues.apache.org/jira/browse/PIG-3844
 Project: Pig
  Issue Type: Improvement
Reporter: Akihiro Matsukawa
Assignee: Akihiro Matsukawa
 Attachments: PIG-3844.patch


 I have a PPNL that is forked off the main thread so that its operations do 
 not block pig from continuing to run. This PPNL needs the ScriptState, but is 
 not able to get it in pig13 because the ScriptState is ThreadLocal.
 In pig12, this worked because there was logic to start a new ScriptState on 
 ScriptState.get() which was removed in PIG-3525 
 (https://reviews.apache.org/r/15634).
 My proposal is to change ScriptState to be InheritableThreadLocal, so that 
 any new child threads that are spawned will have a copy of it. I have a 
 pretty limited understanding of pig, but I do not see anything that this 
 proposed change would break.
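
For anyone unfamiliar with the distinction, here is a self-contained Java sketch (generic, not the ScriptState code) showing why the change matters: a child thread started after the value is set sees a copy of an InheritableThreadLocal but not of a plain ThreadLocal.

{code}
public class InheritableThreadLocalDemo {
    private static final ThreadLocal<String> plain = new ThreadLocal<String>();
    private static final ThreadLocal<String> inheritable = new InheritableThreadLocal<String>();

    public static void main(String[] args) throws InterruptedException {
        plain.set("script-state");
        inheritable.set("script-state");

        Thread listener = new Thread(new Runnable() {
            public void run() {
                // The forked "listener" thread sees null for the plain ThreadLocal
                // but inherits a copy of the InheritableThreadLocal value.
                System.out.println("plain       = " + plain.get());
                System.out.println("inheritable = " + inheritable.get());
            }
        });
        listener.start();
        listener.join();
    }
}
{code}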



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3844) Make ScriptState InheritableThreadLocal for threads that need it

2014-04-01 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13957149#comment-13957149
 ] 

Prashant Kommireddi commented on PIG-3844:
--

Thanks [~amatsukawa] for the contribution!

 Make ScriptState InheritableThreadLocal for threads that need it
 

 Key: PIG-3844
 URL: https://issues.apache.org/jira/browse/PIG-3844
 Project: Pig
  Issue Type: Improvement
Reporter: Akihiro Matsukawa
Assignee: Akihiro Matsukawa
 Fix For: 0.13.0

 Attachments: PIG-3844.patch


 I have a PPNL that is forked off the main thread so that its operations do 
 not block pig from continuing to run. This PPNL needs the ScriptState, but is 
 not able to get it in pig13 because the ScriptState is ThreadLocal.
 In pig12, this worked because there was logic to start a new ScriptState on 
 ScriptState.get() which was removed in PIG-3525 
 (https://reviews.apache.org/r/15634).
 My proposal is to change ScriptState to be InheritableThreadLocal, so that 
 any new child threads that are spawned will have a copy of it. I have a 
 pretty limited understanding of pig, but I do not see anything that this 
 proposed change would break.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3478) Make StreamingUDF work for Hadoop 2

2014-04-01 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3478:
-

Fix Version/s: (was: 0.12.1)
   0.13.0

 Make StreamingUDF work for Hadoop 2
 ---

 Key: PIG-3478
 URL: https://issues.apache.org/jira/browse/PIG-3478
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Daniel Dai
Assignee: Jeremy Karn
 Fix For: 0.13.0

 Attachments: PIG-3478.patch


 PIG-2417 introduced Streaming UDF. However, it does not work under Hadoop 2. 
 Both unit tests and e2e tests fail under Hadoop 2. We need to fix it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3813) Rank column is assigned different uids everytime when schema is reset

2014-03-27 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13949666#comment-13949666
 ] 

Prashant Kommireddi commented on PIG-3813:
--

+1 to backport

 Rank column is assigned different uids everytime when schema is reset
 -

 Key: PIG-3813
 URL: https://issues.apache.org/jira/browse/PIG-3813
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.12.0
Reporter: Suhas Satish
Assignee: Cheolsoo Park
Priority: Critical
 Fix For: 0.13.0

 Attachments: PIG-3813-1.patch, PIG-3813-2.patch, test_data.txt


 When the following script is run, pig goes into an infinite loop. This was 
 reproduced on pig trunk as of March 12, 2014 on apache hadoop 1.2. 
 test_data.txt has been attached. 
 test.pig
 tWeek = LOAD '/tmp/test_data.txt' USING PigStorage ('|') AS (WEEK:int, 
 DESCRIPTION:chararray, END_DATE:chararray, PERIOD:int);
 gTWeek = FOREACH tWeek GENERATE WEEK AS WEEK, PERIOD AS PERIOD;
 pWeek = FILTER gTWeek BY PERIOD == 201312;
 pWeekRanked = RANK pWeek BY WEEK ASC DENSE;
 gpWeekRanked = FOREACH pWeekRanked GENERATE $0;
 store gpWeekRanked into 'gpWeekRanked';
 describe gpWeekRanked;
 ---
 The res object of class Result gets its value from leaf.getNextTuple().
 This gets an empty tuple 
 () 
 with STATUS_OK.
 So the while(true) condition never gets an End of Processing (EOP) and so 
 does not exit. 
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3844) Make ScriptState InheritableThreadLocal for threads that need it

2014-03-27 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13950097#comment-13950097
 ] 

Prashant Kommireddi commented on PIG-3844:
--

[~amatsukawa] can you also please add a test case that simulates the custom 
PPNL behavior? This will ensure any future changes do not break this.

 Make ScriptState InheritableThreadLocal for threads that need it
 

 Key: PIG-3844
 URL: https://issues.apache.org/jira/browse/PIG-3844
 Project: Pig
  Issue Type: Improvement
Reporter: Akihiro Matsukawa
Assignee: Akihiro Matsukawa
 Attachments: PIG-3844.patch


 I have a PPNL that is forked off the main thread so that its operations do 
 not block pig from continuing to run. This PPNL needs the ScriptState, but is 
 not able to get it in pig13 because the ScriptState is ThreadLocal.
 In pig12, this worked because there was logic to start a new ScriptState on 
 ScriptState.get() which was removed in PIG-3525 
 (https://reviews.apache.org/r/15634).
 My proposal is to change ScriptState to be InheritableThreadLocal, so that 
 any new child threads that are spawned will have a copy of it. I have a 
 pretty limited understanding of pig, but I do not see anything that this 
 proposed change would break.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (PIG-3827) Custom partitioner is not picked up with secondary sort optimization

2014-03-21 Thread Prashant Kommireddi (JIRA)
Prashant Kommireddi created PIG-3827:


 Summary: Custom partitioner is not picked up with secondary sort 
optimization
 Key: PIG-3827
 URL: https://issues.apache.org/jira/browse/PIG-3827
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Prashant Kommireddi


Custom partitioner is ignored currently in case of secondary sort optimization.
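
For context, this is the kind of class the report refers to: a Hadoop Partitioner referenced from a Pig script through the PARTITION BY clause. The class below is an illustrative sketch, not code from the report; the class name, package, and routing rule are made up for the example.

{code}
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.pig.impl.io.PigNullableWritable;

// Used from a script as, for example:
//   B = GROUP A BY user PARTITION BY org.example.SimpleCustomPartitioner PARALLEL 4;
public class SimpleCustomPartitioner extends Partitioner<PigNullableWritable, Writable> {
    @Override
    public int getPartition(PigNullableWritable key, Writable value, int numPartitions) {
        Object k = key.getValueAsPigType();
        // Route nulls to partition 0, everything else by hash (illustrative rule).
        if (k == null) {
            return 0;
        }
        return (k.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
{code}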





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3827) Custom partitioner is not picked up with secondary sort optimization

2014-03-21 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943752#comment-13943752
 ] 

Prashant Kommireddi commented on PIG-3827:
--

+1

Verified patch against the script that was broken due to this issue. Ran 
test-commit, LGTM

 Custom partitioner is not picked up with secondary sort optimization
 

 Key: PIG-3827
 URL: https://issues.apache.org/jira/browse/PIG-3827
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Prashant Kommireddi
Assignee: Daniel Dai
 Fix For: 0.12.1, 0.13.0

 Attachments: PIG-3827-1.patch, PIG-3827-2.patch


 Custom partitioner is ignored currently in case of secondary sort 
 optimization.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3827) Custom partitioner is not picked up with secondary sort optimization

2014-03-21 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943757#comment-13943757
 ] 

Prashant Kommireddi commented on PIG-3827:
--

[~daijy] can you please add comments to the testcase before committing?

 Custom partitioner is not picked up with secondary sort optimization
 

 Key: PIG-3827
 URL: https://issues.apache.org/jira/browse/PIG-3827
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Prashant Kommireddi
Assignee: Daniel Dai
 Fix For: 0.12.1, 0.13.0

 Attachments: PIG-3827-1.patch, PIG-3827-2.patch


 Custom partitioner is ignored currently in case of secondary sort 
 optimization.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3827) Custom partitioner is not picked up with secondary sort optimization

2014-03-21 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943768#comment-13943768
 ] 

Prashant Kommireddi commented on PIG-3827:
--

Perfect, thanks Daniel.

 Custom partitioner is not picked up with secondary sort optimization
 

 Key: PIG-3827
 URL: https://issues.apache.org/jira/browse/PIG-3827
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Prashant Kommireddi
Assignee: Daniel Dai
 Fix For: 0.12.1, 0.13.0

 Attachments: PIG-3827-1.patch, PIG-3827-2.patch, PIG-3827-3.patch


 Custom partitioner is ignored currently in case of secondary sort 
 optimization.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3816) Incorrect Javadoc for launchPlan() method

2014-03-17 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3816:
-

Summary: Incorrect Javadoc for launchPlan() method  (was: Incorrect Javadoc 
for launchPlan() methoid)

 Incorrect Javadoc for launchPlan() method
 -

 Key: PIG-3816
 URL: https://issues.apache.org/jira/browse/PIG-3816
 Project: Pig
  Issue Type: Bug
  Components: documentation
Reporter: Kyungho Jeon
Priority: Trivial
 Attachments: PIG-3816.patch


 Javadoc of {{protected PigStats launchPlan(LogicalPlan lp, String jobName)}} 
 incorrectly describes that the method takes a physical plan as an argument.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3816) Incorrect Javadoc for launchPlan() methoid

2014-03-17 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13938857#comment-13938857
 ] 

Prashant Kommireddi commented on PIG-3816:
--

Thanks for the contribution Kyungho. Patch committed to trunk.

[~daijy] [~cheolsoo] how can I add Kyungho to contributors list? This JIRA 
needs to be assigned to him before marking it resolved.

 Incorrect Javadoc for launchPlan() methoid
 --

 Key: PIG-3816
 URL: https://issues.apache.org/jira/browse/PIG-3816
 Project: Pig
  Issue Type: Bug
  Components: documentation
Reporter: Kyungho Jeon
Priority: Trivial
 Attachments: PIG-3816.patch


 Javadoc of {{protected PigStats launchPlan(LogicalPlan lp, String jobName)}} 
 incorrectly describes that the method takes a physical plan as an argument.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (PIG-3811) PigServer.registerScript() wraps exception incorrectly on parsing errors

2014-03-13 Thread Prashant Kommireddi (JIRA)
Prashant Kommireddi created PIG-3811:


 Summary: PigServer.registerScript() wraps exception incorrectly on 
parsing errors
 Key: PIG-3811
 URL: https://issues.apache.org/jira/browse/PIG-3811
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.13.0


PigServer.registerScript throws an IOException that is wrapped on the cause of 
ParseException in case there were parsing errors in the script.

{code}
public void registerScript(InputStream in, Map<String,String> params,
        List<String> paramsFiles) throws IOException {
    try {
        String substituted = pigContext.doParamSubstitution(in,
                paramMapToList(params), paramsFiles);
        GruntParser grunt = new GruntParser(new StringReader(substituted));
        grunt.setInteractive(false);
        grunt.setParams(this);
        grunt.parseStopOnError(true);
    } catch (org.apache.pig.tools.pigscript.parser.ParseException e) {
        log.error(e.getLocalizedMessage());
        throw new IOException(e.getCause());
    }
}
{code}

{{e.getCause()}} however would be null and the IOException returned is actually 
an empty exception with null contents.
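
A minimal sketch of the likely fix, showing only the catch block of the method above with the throw changed so the message and stack trace survive; the committed patch may differ in detail.

{code}
catch (org.apache.pig.tools.pigscript.parser.ParseException e) {
    log.error(e.getLocalizedMessage());
    // Wrap the ParseException itself (not its null cause) so callers of
    // registerScript() see the real parse error message and stack trace.
    throw new IOException(e.getLocalizedMessage(), e);
}
{code}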



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3811) PigServer.registerScript() wraps exception incorrectly on parsing errors

2014-03-13 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3811:
-

Status: Patch Available  (was: Open)

 PigServer.registerScript() wraps exception incorrectly on parsing errors
 

 Key: PIG-3811
 URL: https://issues.apache.org/jira/browse/PIG-3811
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.13.0

 Attachments: PIG-3811.patch


 PigServer.registerScript throws an IOException that is wrapped on the cause 
 of ParseException in case there were parsing errors in the script.
 {code}
 public void registerScript(InputStream in, Map<String,String> params,
         List<String> paramsFiles) throws IOException {
     try {
         String substituted = pigContext.doParamSubstitution(in,
                 paramMapToList(params), paramsFiles);
         GruntParser grunt = new GruntParser(new StringReader(substituted));
         grunt.setInteractive(false);
         grunt.setParams(this);
         grunt.parseStopOnError(true);
     } catch (org.apache.pig.tools.pigscript.parser.ParseException e) {
         log.error(e.getLocalizedMessage());
         throw new IOException(e.getCause());
     }
 }
 {code}
 {{e.getCause()}} however would be null and the IOException returned is 
 actually an empty exception with null contents.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3811) PigServer.registerScript() wraps exception incorrectly on parsing errors

2014-03-13 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3811:
-

Attachment: PIG-3811.patch

Straightforward fix.

 PigServer.registerScript() wraps exception incorrectly on parsing errors
 

 Key: PIG-3811
 URL: https://issues.apache.org/jira/browse/PIG-3811
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.13.0

 Attachments: PIG-3811.patch


 PigServer.registerScript throws an IOException that is wrapped on the cause 
 of ParseException in case there were parsing errors in the script.
 {code}
 public void registerScript(InputStream in, Map<String,String> params,
         List<String> paramsFiles) throws IOException {
     try {
         String substituted = pigContext.doParamSubstitution(in,
                 paramMapToList(params), paramsFiles);
         GruntParser grunt = new GruntParser(new StringReader(substituted));
         grunt.setInteractive(false);
         grunt.setParams(this);
         grunt.parseStopOnError(true);
     } catch (org.apache.pig.tools.pigscript.parser.ParseException e) {
         log.error(e.getLocalizedMessage());
         throw new IOException(e.getCause());
     }
 }
 {code}
 {{e.getCause()}} however would be null and the IOException returned is 
 actually an empty exception with null contents.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3811) PigServer.registerScript() wraps exception incorrectly on parsing errors

2014-03-13 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3811:
-

Attachment: PIG-3811_2.patch

We should, thanks for that. Attached a new patch

 PigServer.registerScript() wraps exception incorrectly on parsing errors
 

 Key: PIG-3811
 URL: https://issues.apache.org/jira/browse/PIG-3811
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.13.0

 Attachments: PIG-3811.patch, PIG-3811_2.patch


 PigServer.registerScript throws an IOException that is wrapped on the cause 
 of ParseException in case there were parsing errors in the script.
 {code}
 public void registerScript(InputStream in, Map<String,String> params,
         List<String> paramsFiles) throws IOException {
     try {
         String substituted = pigContext.doParamSubstitution(in,
                 paramMapToList(params), paramsFiles);
         GruntParser grunt = new GruntParser(new StringReader(substituted));
         grunt.setInteractive(false);
         grunt.setParams(this);
         grunt.parseStopOnError(true);
     } catch (org.apache.pig.tools.pigscript.parser.ParseException e) {
         log.error(e.getLocalizedMessage());
         throw new IOException(e.getCause());
     }
 }
 {code}
 {{e.getCause()}} however would be null and the IOException returned is 
 actually an empty exception with null contents.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3811) PigServer.registerScript() wraps exception incorrectly on parsing errors

2014-03-13 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3811:
-

Attachment: PIG-3811_3.patch

Also removed printStackTrace in the code, did not make sense to have that in 
there.

 PigServer.registerScript() wraps exception incorrectly on parsing errors
 

 Key: PIG-3811
 URL: https://issues.apache.org/jira/browse/PIG-3811
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.13.0

 Attachments: PIG-3811.patch, PIG-3811_2.patch, PIG-3811_3.patch


 PigServer.registerScript throws an IOException that wraps the cause of the 
 ParseException when there are parsing errors in the script.
 {code}
 public void registerScript(InputStream in, Map<String,String> params,
         List<String> paramsFiles) throws IOException {
     try {
         String substituted = pigContext.doParamSubstitution(in,
                 paramMapToList(params), paramsFiles);
         GruntParser grunt = new GruntParser(new StringReader(substituted));
         grunt.setInteractive(false);
         grunt.setParams(this);
         grunt.parseStopOnError(true);
     } catch (org.apache.pig.tools.pigscript.parser.ParseException e) {
         log.error(e.getLocalizedMessage());
         throw new IOException(e.getCause());
     }
 }
 {code}
 {{e.getCause()}}, however, is null, so the IOException that comes back is 
 effectively empty, with no message or stack trace.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3806) PigServer constructor throws NPE after PIG-3765

2014-03-11 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931166#comment-13931166
 ] 

Prashant Kommireddi commented on PIG-3806:
--

+1

Thanks for the fix [~aniket486]

 PigServer constructor throws NPE after PIG-3765
 ---

 Key: PIG-3806
 URL: https://issues.apache.org/jira/browse/PIG-3806
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
 Fix For: 0.13.0

 Attachments: PIG-3806.patch


 PigServer constructor throws NPE because filter is not initialized at the 
 right place in PIG-3765.
 {noformat}
 java.lang.NullPointerException
  at org.apache.pig.PigServer.registerJar(PigServer.java:540)
  at org.apache.pig.PigServer.addJarsFromProperties(PigServer.java:261)
  at org.apache.pig.PigServer.init(PigServer.java:237)
  at org.apache.pig.PigServer.init(PigServer.java:219)
  at org.apache.pig.tools.grunt.Grunt.init(Grunt.java:46)
  at org.apache.pig.Main.run(Main.java:600)
  at org.apache.pig.Main.main(Main.java:156)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
 =
 {noformat}
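For illustration only, a self-contained sketch of the initialization-order 
pattern behind this kind of NPE (class and field names are invented and do not 
reflect Pig's actual internals): a field used by registerJar() is assigned only 
after init() has already called it.

{code}
public class InitOrderSketch {
    private Object filter;                   // still null while init() runs

    public InitOrderSketch() {
        init();                              // indirectly dereferences 'filter' -> NPE
        filter = new Object();               // assigned too late; must happen before init()
    }

    private void init() {
        registerJar("udfs.jar");
    }

    private void registerJar(String jar) {
        System.out.println(filter.toString() + " " + jar);   // NullPointerException here
    }

    public static void main(String[] args) {
        new InitOrderSketch();
    }
}
{code}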



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3802) Fix TestBlackAndWhitelistValidator failures

2014-03-09 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925395#comment-13925395
 ] 

Prashant Kommireddi commented on PIG-3802:
--

Thanks for the review [~cheolsoo]. Committed to trunk.

 Fix TestBlackAndWhitelistValidator failures
 ---

 Key: PIG-3802
 URL: https://issues.apache.org/jira/browse/PIG-3802
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.13.0

 Attachments: PIG-3802.patch


 From [~cheolsoo]
 Prashant Kommireddi, I am reopening this jira. Your new tests fail with the 
 following error-
 Testcase: testWhitelist2 took 0.046 sec 
 Caused an ERROR
 org.hamcrest.Matcher.describeMismatch(Ljava/lang/Object;Lorg/hamcrest/Description;)V
 java.lang.NoSuchMethodError: 
 org.hamcrest.Matcher.describeMismatch(Ljava/lang/Object;Lorg/hamcrest/Description;)V
 at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
 I think we're hitting this problem-
 https://tedvinke.wordpress.com/2013/12/17/mixing-junit-hamcrest-and-mockito-explaining-nosuchmethoderror/



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3802) Fix TestBlackAndWhitelistValidator failures

2014-03-09 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3802:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Fix TestBlackAndWhitelistValidator failures
 ---

 Key: PIG-3802
 URL: https://issues.apache.org/jira/browse/PIG-3802
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.13.0

 Attachments: PIG-3802.patch


 From [~cheolsoo]
 Prashant Kommireddi, I am reopening this jira. Your new tests fail with the 
 following error-
 Testcase: testWhitelist2 took 0.046 sec 
 Caused an ERROR
 org.hamcrest.Matcher.describeMismatch(Ljava/lang/Object;Lorg/hamcrest/Description;)V
 java.lang.NoSuchMethodError: 
 org.hamcrest.Matcher.describeMismatch(Ljava/lang/Object;Lorg/hamcrest/Description;)V
 at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
 I think we're hitting this problem-
 https://tedvinke.wordpress.com/2013/12/17/mixing-junit-hamcrest-and-mockito-explaining-nosuchmethoderror/



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3765) Ability to disable Pig commands and operators

2014-03-09 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925396#comment-13925396
 ] 

Prashant Kommireddi commented on PIG-3765:
--

[~cheolsoo] - would you mind closing this out if there are no more failures? 
PIG-3802 fixes it.

 Ability to disable Pig commands and operators
 -

 Key: PIG-3765
 URL: https://issues.apache.org/jira/browse/PIG-3765
 Project: Pig
  Issue Type: New Feature
  Components: documentation, grunt
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.13.0

 Attachments: PIG-3765.patch, PIG-3765_2.patch, PIG-3765_3.patch, 
 PIG-3765_4.patch, PIG-3765_5.patch


 This is an admin feature providing the ability to blacklist and/or whitelist 
 certain commands and operations. Pig exposes a few of these that may not be 
 safe in a multitenant environment. For example, sh invokes shell commands and 
 set allows users to change non-final configs. While these are tremendously 
 useful in general, the ability to disable them would make Pig a safer 
 platform. The goal is to give administrators more control over user scripts. 
 The default behaviour stays the same - no filters applied on commands and 
 operators.
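For illustration, how an administrator might turn the filter on from code (a 
hedged sketch: the {{pig.blacklist}} property name and the exact rejection 
behaviour are assumptions here; see the PIG-3800 documentation for the 
authoritative details):

{code}
import java.util.Properties;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class BlacklistExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed property name: disable shell access and config changes from scripts.
        props.setProperty("pig.blacklist", "sh,set");

        PigServer pig = new PigServer(ExecType.LOCAL, props);
        // Queries registered through this server should now be rejected if they
        // use a blacklisted command or operator.
        pig.registerQuery("A = load 'input' as (f1:chararray);");
    }
}
{code}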



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3259) Optimize byte to Long/Integer conversions

2014-03-09 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3259:
-

Issue Type: Improvement  (was: Bug)

 Optimize byte to Long/Integer conversions
 -

 Key: PIG-3259
 URL: https://issues.apache.org/jira/browse/PIG-3259
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.11, 0.11.1
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.13.0

 Attachments: byteToLong.xlsx


 These conversions could perform better. If the input is not numeric 
 (1234abcd), the code calls Double.valueOf(String) regardless before finally 
 returning null. Any script that inadvertently (user's mistake or not) tries 
 to cast a non-numeric column to int or long results in many wasteful calls. 
 We can avoid this by only falling back to Double.valueOf(String) when the 
 input looks like a decimal number (1234.56), and returning null otherwise 
 without ever calling it.
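For illustration, a minimal sketch of the proposed shortcut (an assumption 
about the approach, not the actual Pig cast code): scan the string once and 
only fall back to {{Double.valueOf(String)}} when the input at least looks 
like a decimal number.

{code}
import java.nio.charset.StandardCharsets;

public class ByteToLongSketch {

    // Returns null quickly for clearly non-numeric input such as "1234abcd",
    // instead of paying for a Double parse on every bad value.
    public static Long bytesToLong(byte[] b) {
        String s = new String(b, StandardCharsets.UTF_8).trim();
        try {
            return Long.valueOf(s);
        } catch (NumberFormatException nfe) {
            // Only retry via Double for inputs that look like decimals, e.g. "1234.56".
            return looksLikeDecimal(s) ? (long) Double.parseDouble(s) : null;
        }
    }

    private static boolean looksLikeDecimal(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int i = (s.charAt(0) == '+' || s.charAt(0) == '-') ? 1 : 0;
        boolean sawDigit = false;
        boolean sawDot = false;
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isDigit(c)) {
                sawDigit = true;
            } else if (c == '.' && !sawDot) {
                sawDot = true;
            } else {
                return false;
            }
        }
        return sawDigit;
    }

    public static void main(String[] args) {
        System.out.println(bytesToLong("1234.56".getBytes(StandardCharsets.UTF_8)));   // 1234
        System.out.println(bytesToLong("1234abcd".getBytes(StandardCharsets.UTF_8)));  // null
    }
}
{code}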



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (PIG-3802) Fix TestBlackAndWhitelistValidator failures

2014-03-08 Thread Prashant Kommireddi (JIRA)
Prashant Kommireddi created PIG-3802:


 Summary: Fix TestBlackAndWhitelistValidator failures
 Key: PIG-3802
 URL: https://issues.apache.org/jira/browse/PIG-3802
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.13.0


From [~cheolsoo]

Prashant Kommireddi, I am reopening this jira. Your new tests fail with the 
following error-
Testcase: testWhitelist2 took 0.046 sec 
Caused an ERROR
org.hamcrest.Matcher.describeMismatch(Ljava/lang/Object;Lorg/hamcrest/Description;)V
java.lang.NoSuchMethodError: 
org.hamcrest.Matcher.describeMismatch(Ljava/lang/Object;Lorg/hamcrest/Description;)V
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
I think we're hitting this problem-
https://tedvinke.wordpress.com/2013/12/17/mixing-junit-hamcrest-and-mockito-explaining-nosuchmethoderror/




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3802) Fix TestBlackAndWhitelistValidator failures

2014-03-08 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3802:
-

Attachment: PIG-3802.patch

Fixing errors from PIG-3765 check-in
[~cheolsoo]

 Fix TestBlackAndWhitelistValidator failures
 ---

 Key: PIG-3802
 URL: https://issues.apache.org/jira/browse/PIG-3802
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.13.0

 Attachments: PIG-3802.patch


 From [~cheolsoo]
 Prashant Kommireddi, I am reopening this jira. Your new tests fail with the 
 following error-
 Testcase: testWhitelist2 took 0.046 sec 
 Caused an ERROR
 org.hamcrest.Matcher.describeMismatch(Ljava/lang/Object;Lorg/hamcrest/Description;)V
 java.lang.NoSuchMethodError: 
 org.hamcrest.Matcher.describeMismatch(Ljava/lang/Object;Lorg/hamcrest/Description;)V
 at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
 I think we're hitting this problem-
 https://tedvinke.wordpress.com/2013/12/17/mixing-junit-hamcrest-and-mockito-explaining-nosuchmethoderror/



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3765) Ability to disable Pig commands and operators

2014-03-08 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925093#comment-13925093
 ] 

Prashant Kommireddi commented on PIG-3765:
--

Opened PIG-3802 to fix this.

 Ability to disable Pig commands and operators
 -

 Key: PIG-3765
 URL: https://issues.apache.org/jira/browse/PIG-3765
 Project: Pig
  Issue Type: New Feature
  Components: documentation, grunt
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.13.0

 Attachments: PIG-3765.patch, PIG-3765_2.patch, PIG-3765_3.patch, 
 PIG-3765_4.patch, PIG-3765_5.patch


 This is an admin feature providing the ability to blacklist and/or whitelist 
 certain commands and operations. Pig exposes a few of these that may not be 
 safe in a multitenant environment. For example, sh invokes shell commands and 
 set allows users to change non-final configs. While these are tremendously 
 useful in general, the ability to disable them would make Pig a safer 
 platform. The goal is to give administrators more control over user scripts. 
 The default behaviour stays the same - no filters applied on commands and 
 operators.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3802) Fix TestBlackAndWhitelistValidator failures

2014-03-08 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3802:
-

Status: Patch Available  (was: Open)

 Fix TestBlackAndWhitelistValidator failures
 ---

 Key: PIG-3802
 URL: https://issues.apache.org/jira/browse/PIG-3802
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.13.0

 Attachments: PIG-3802.patch


 From [~cheolsoo]
 Prashant Kommireddi, I am reopening this jira. Your new tests fail with the 
 following error-
 Testcase: testWhitelist2 took 0.046 sec 
 Caused an ERROR
 org.hamcrest.Matcher.describeMismatch(Ljava/lang/Object;Lorg/hamcrest/Description;)V
 java.lang.NoSuchMethodError: 
 org.hamcrest.Matcher.describeMismatch(Ljava/lang/Object;Lorg/hamcrest/Description;)V
 at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
 I think we're hitting this problem-
 https://tedvinke.wordpress.com/2013/12/17/mixing-junit-hamcrest-and-mockito-explaining-nosuchmethoderror/



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3765) Ability to disable Pig commands and operators

2014-03-07 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13924574#comment-13924574
 ] 

Prashant Kommireddi commented on PIG-3765:
--

Sure, I will take a look at it during the weekend. It probably has to do with 
the use of {{@Rule}} for checking exception message contents. I will probably 
eliminate that.
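For illustration, one way to assert on an exception message with plain JUnit 4 
assertions instead of the Hamcrest matchers that {{@Rule ExpectedException}} 
pulls in (a hedged sketch of the general approach, not the actual patch; the 
test class and the command under test are placeholders):

{code}
import static org.junit.Assert.assertTrue;
import static org.junit.Assert.fail;

import org.junit.Test;

public class BlacklistMessageSketchTest {

    @Test
    public void rejectsBlacklistedCommand() {
        try {
            runBlacklistedCommand();                  // placeholder for the real call
            fail("expected the command to be rejected");
        } catch (RuntimeException e) {
            // Plain JUnit assertion; avoids hamcrest's Matcher.describeMismatch,
            // which is where the NoSuchMethodError comes from.
            assertTrue(e.getMessage().contains("disabled"));
        }
    }

    private void runBlacklistedCommand() {
        throw new RuntimeException("sh command is disabled");   // stand-in behaviour
    }
}
{code}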

 Ability to disable Pig commands and operators
 -

 Key: PIG-3765
 URL: https://issues.apache.org/jira/browse/PIG-3765
 Project: Pig
  Issue Type: New Feature
  Components: documentation, grunt
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.13.0

 Attachments: PIG-3765.patch, PIG-3765_2.patch, PIG-3765_3.patch, 
 PIG-3765_4.patch, PIG-3765_5.patch


 This is an admin feature providing the ability to blacklist and/or whitelist 
 certain commands and operations. Pig exposes a few of these that may not be 
 safe in a multitenant environment. For example, sh invokes shell commands and 
 set allows users to change non-final configs. While these are tremendously 
 useful in general, the ability to disable them would make Pig a safer 
 platform. The goal is to give administrators more control over user scripts. 
 The default behaviour stays the same - no filters applied on commands and 
 operators.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3793) Provide info on number of LogicalRelationalOperator(s) used in the script through LogicalPlanData

2014-03-06 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922950#comment-13922950
 ] 

Prashant Kommireddi commented on PIG-3793:
--

It's cool, that's how the project evolves. Ideas are always welcome :)

Sounds like you have a use-case where {{PigServer.registerQuery(String)}} could 
be followed by a call to {{LogicalPlanData.getNumLogicalRelationOperators()}} 
- both interleaved and multiple times. Basically, registering queries and 
seeing how the LogicalPlan changes across the various registers. Would it 
suffice to have a {{resetLogicalPlanData()}} method on PigServer that 
recomputes the {{LogicalPlanData}}?
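For illustration, the kind of calling pattern being discussed (a hedged sketch: 
{{getLogicalPlanData()}} and the proposed {{resetLogicalPlanData()}} accessors 
on PigServer are assumptions made for this example, not a confirmed API; only 
the LogicalPlanData getters named elsewhere in this thread are real):

{code}
// Hypothetical accessors: getLogicalPlanData() / resetLogicalPlanData() are assumed here.
PigServer pig = new PigServer(ExecType.LOCAL);
pig.registerQuery("A = load 'input' as (f1:chararray);");
pig.registerQuery("B = filter A by f1 is not null;");

LogicalPlanData data = pig.getLogicalPlanData();             // assumed accessor
System.out.println(data.getNumLogicalRelationOperators());   // e.g. 2

pig.registerQuery("C = group B by f1;");
pig.resetLogicalPlanData();                                   // proposed recompute hook
System.out.println(pig.getLogicalPlanData().getNumLogicalRelationOperators());
{code}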

 Provide info on number of LogicalRelationalOperator(s) used in the script 
 through LogicalPlanData
 -

 Key: PIG-3793
 URL: https://issues.apache.org/jira/browse/PIG-3793
 Project: Pig
  Issue Type: Improvement
  Components: data
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.13.0

 Attachments: PIG-3793.patch, PIG-3793_2.patch


 It's useful to have an understanding of how many operators are being used in 
 the script via the API. This could allow admins to enforce 
 checks/restrictions on the length/complexity of the plan in user scripts.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3765) Ability to disable Pig commands and operators

2014-03-06 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3765:
-

Attachment: PIG-3765_4.patch

Hi [~cheolsoo], I have uploaded a new patch to RB.

 Ability to disable Pig commands and operators
 -

 Key: PIG-3765
 URL: https://issues.apache.org/jira/browse/PIG-3765
 Project: Pig
  Issue Type: New Feature
  Components: documentation, grunt
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Attachments: PIG-3765.patch, PIG-3765_2.patch, PIG-3765_3.patch, 
 PIG-3765_4.patch


 This is an admin feature providing the ability to blacklist and/or whitelist 
 certain commands and operations. Pig exposes a few of these that may not be 
 safe in a multitenant environment. For example, sh invokes shell commands and 
 set allows users to change non-final configs. While these are tremendously 
 useful in general, the ability to disable them would make Pig a safer 
 platform. The goal is to give administrators more control over user scripts. 
 The default behaviour stays the same - no filters applied on commands and 
 operators.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (PIG-3800) Documentation for Pig whitelist and blacklist features

2014-03-06 Thread Prashant Kommireddi (JIRA)
Prashant Kommireddi created PIG-3800:


 Summary: Documentation for Pig whitelist and blacklist features
 Key: PIG-3800
 URL: https://issues.apache.org/jira/browse/PIG-3800
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi


Documentation for PIG-3765



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3765) Ability to disable Pig commands and operators

2014-03-06 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3765:
-

Attachment: PIG-3765_5.patch

Verified test-commit passes with the changes.

However noticed {{test/org/apache/pig/test/pigunit/pig/TestGruntParser.java}} 
failed as PigServer was being set. Made a minor change towards fixing that.

 Ability to disable Pig commands and operators
 -

 Key: PIG-3765
 URL: https://issues.apache.org/jira/browse/PIG-3765
 Project: Pig
  Issue Type: New Feature
  Components: documentation, grunt
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Attachments: PIG-3765.patch, PIG-3765_2.patch, PIG-3765_3.patch, 
 PIG-3765_4.patch, PIG-3765_5.patch


 This is an admin feature providing the ability to blacklist and/or whitelist 
 certain commands and operations. Pig exposes a few of these that may not be 
 safe in a multitenant environment. For example, sh invokes shell commands and 
 set allows users to change non-final configs. While these are tremendously 
 useful in general, the ability to disable them would make Pig a safer 
 platform. The goal is to give administrators more control over user scripts. 
 The default behaviour stays the same - no filters applied on commands and 
 operators.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3765) Ability to disable Pig commands and operators

2014-03-06 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923648#comment-13923648
 ] 

Prashant Kommireddi commented on PIG-3765:
--

Committed to trunk. Thanks [~cheolsoo] for the review!

Note: typo in the above comment - failed as PigServer was NOT being set

 Ability to disable Pig commands and operators
 -

 Key: PIG-3765
 URL: https://issues.apache.org/jira/browse/PIG-3765
 Project: Pig
  Issue Type: New Feature
  Components: documentation, grunt
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Attachments: PIG-3765.patch, PIG-3765_2.patch, PIG-3765_3.patch, 
 PIG-3765_4.patch, PIG-3765_5.patch


 This is an admin feature providing the ability to blacklist and/or whitelist 
 certain commands and operations. Pig exposes a few of these that may not be 
 safe in a multitenant environment. For example, sh invokes shell commands and 
 set allows users to change non-final configs. While these are tremendously 
 useful in general, the ability to disable them would make Pig a safer 
 platform. The goal is to give administrators more control over user scripts. 
 The default behaviour stays the same - no filters applied on commands and 
 operators.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3765) Ability to disable Pig commands and operators

2014-03-06 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3765:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Ability to disable Pig commands and operators
 -

 Key: PIG-3765
 URL: https://issues.apache.org/jira/browse/PIG-3765
 Project: Pig
  Issue Type: New Feature
  Components: documentation, grunt
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.13.0

 Attachments: PIG-3765.patch, PIG-3765_2.patch, PIG-3765_3.patch, 
 PIG-3765_4.patch, PIG-3765_5.patch


 This is an admin feature providing the ability to blacklist and/or whitelist 
 certain commands and operations. Pig exposes a few of these that may not be 
 safe in a multitenant environment. For example, sh invokes shell commands and 
 set allows users to change non-final configs. While these are tremendously 
 useful in general, the ability to disable them would make Pig a safer 
 platform. The goal is to give administrators more control over user scripts. 
 The default behaviour stays the same - no filters applied on commands and 
 operators.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3765) Ability to disable Pig commands and operators

2014-03-06 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3765:
-

Fix Version/s: 0.13.0

 Ability to disable Pig commands and operators
 -

 Key: PIG-3765
 URL: https://issues.apache.org/jira/browse/PIG-3765
 Project: Pig
  Issue Type: New Feature
  Components: documentation, grunt
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.13.0

 Attachments: PIG-3765.patch, PIG-3765_2.patch, PIG-3765_3.patch, 
 PIG-3765_4.patch, PIG-3765_5.patch


 This is an admin feature providing the ability to blacklist and/or whitelist 
 certain commands and operations. Pig exposes a few of these that may not be 
 safe in a multitenant environment. For example, sh invokes shell commands and 
 set allows users to change non-final configs. While these are tremendously 
 useful in general, the ability to disable them would make Pig a safer 
 platform. The goal is to give administrators more control over user scripts. 
 The default behaviour stays the same - no filters applied on commands and 
 operators.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3765) Ability to disable Pig commands and operators

2014-03-05 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13921170#comment-13921170
 ] 

Prashant Kommireddi commented on PIG-3765:
--

Sure, https://reviews.apache.org/r/18779/

 Ability to disable Pig commands and operators
 -

 Key: PIG-3765
 URL: https://issues.apache.org/jira/browse/PIG-3765
 Project: Pig
  Issue Type: New Feature
  Components: documentation, grunt
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Attachments: PIG-3765.patch, PIG-3765_2.patch, PIG-3765_3.patch


 This is an admin feature providing the ability to blacklist and/or whitelist 
 certain commands and operations. Pig exposes a few of these that may not be 
 safe in a multitenant environment. For example, sh invokes shell commands and 
 set allows users to change non-final configs. While these are tremendously 
 useful in general, the ability to disable them would make Pig a safer 
 platform. The goal is to give administrators more control over user scripts. 
 The default behaviour stays the same - no filters applied on commands and 
 operators.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3793) Provide info on number of LogicalRelationalOperator(s) used in the script through LogicalPlanData

2014-03-05 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3793:
-

Attachment: PIG-3793_2.patch

Made the changes and committed to trunk. Thanks [~cheolsoo]

 Provide info on number of LogicalRelationalOperator(s) used in the script 
 through LogicalPlanData
 -

 Key: PIG-3793
 URL: https://issues.apache.org/jira/browse/PIG-3793
 Project: Pig
  Issue Type: Improvement
  Components: data
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.13.0

 Attachments: PIG-3793.patch, PIG-3793_2.patch


 It's useful to have an understanding of how many operators are being used in 
 the script via the API. This could allow admins to enforce 
 checks/restrictions on the length/complexity of the plan in user scripts.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3793) Provide info on number of LogicalRelationalOperator(s) used in the script through LogicalPlanData

2014-03-05 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3793:
-

  Resolution: Fixed
Release Note: Exposes methods getNumLogicalRelationOperators(), 
getNumSources() and getNumSinks() on LogicalPlanData. 
  Status: Resolved  (was: Patch Available)

 Provide info on number of LogicalRelationalOperator(s) used in the script 
 through LogicalPlanData
 -

 Key: PIG-3793
 URL: https://issues.apache.org/jira/browse/PIG-3793
 Project: Pig
  Issue Type: Improvement
  Components: data
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.13.0

 Attachments: PIG-3793.patch, PIG-3793_2.patch


 It's useful to have an understanding of how many operators are being used in 
 the script via the API. This could allow admins to enforce 
 checks/restrictions on the length/complexity of the plan in user scripts.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3793) Provide info on number of LogicalRelationalOperator(s) used in the script through LogicalPlanData

2014-03-05 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922116#comment-13922116
 ] 

Prashant Kommireddi commented on PIG-3793:
--

[~kyunghoj] thanks for bringing this up. I did think about this and went with 
the approach of walking the LogicalPlan at the time of LogicalPlanData's 
creation. The alternative would require us to walk the LogicalPlan for each of 
the method calls, which would be expensive. I agree it could be documented 
better :)
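For illustration, a self-contained sketch of that design choice (names and the 
plan representation are stand-ins, not Pig's actual LogicalPlanData): walk the 
plan once in the constructor and serve counts from cached fields, so repeated 
getter calls stay cheap.

{code}
import java.util.Arrays;
import java.util.List;

public class PlanDataSketch {
    private final int numOperators;
    private final int numSources;
    private final int numSinks;

    // Traverse the "plan" once at creation time and cache the counts.
    public PlanDataSketch(List<String> operators) {          // stand-in for a LogicalPlan
        int sources = 0;
        int sinks = 0;
        for (String op : operators) {
            if (op.equals("LOLoad")) {
                sources++;
            } else if (op.equals("LOStore")) {
                sinks++;
            }
        }
        this.numOperators = operators.size();
        this.numSources = sources;
        this.numSinks = sinks;
    }

    public int getNumLogicalRelationOperators() { return numOperators; }
    public int getNumSources() { return numSources; }
    public int getNumSinks() { return numSinks; }

    public static void main(String[] args) {
        PlanDataSketch d = new PlanDataSketch(Arrays.asList("LOLoad", "LOFilter", "LOStore"));
        System.out.println(d.getNumLogicalRelationOperators() + " operators, "
                + d.getNumSources() + " source(s), " + d.getNumSinks() + " sink(s)");
    }
}
{code}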

 Provide info on number of LogicalRelationalOperator(s) used in the script 
 through LogicalPlanData
 -

 Key: PIG-3793
 URL: https://issues.apache.org/jira/browse/PIG-3793
 Project: Pig
  Issue Type: Improvement
  Components: data
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.13.0

 Attachments: PIG-3793.patch, PIG-3793_2.patch


 It's useful to have an understanding of how many operators are being used in 
 the script via the API. This could allow admins to enforce 
 checks/restrictions on the length/complexity of the plan in user scripts.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3765) Ability to disable Pig commands and operators

2014-03-03 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3765:
-

Status: Patch Available  (was: Open)

 Ability to disable Pig commands and operators
 -

 Key: PIG-3765
 URL: https://issues.apache.org/jira/browse/PIG-3765
 Project: Pig
  Issue Type: New Feature
  Components: documentation, grunt
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Attachments: PIG-3765.patch, PIG-3765_2.patch, PIG-3765_3.patch


 This is an admin feature providing the ability to blacklist and/or whitelist 
 certain commands and operations. Pig exposes a few of these that may not be 
 safe in a multitenant environment. For example, sh invokes shell commands and 
 set allows users to change non-final configs. While these are tremendously 
 useful in general, the ability to disable them would make Pig a safer 
 platform. The goal is to give administrators more control over user scripts. 
 The default behaviour stays the same - no filters applied on commands and 
 operators.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (PIG-3793) Provide info on number of LogicalRelationalOperator(s) used in the script through LogicalPlanData

2014-03-03 Thread Prashant Kommireddi (JIRA)
Prashant Kommireddi created PIG-3793:


 Summary: Provide info on number of LogicalRelationalOperator(s) 
used in the script through LogicalPlanData
 Key: PIG-3793
 URL: https://issues.apache.org/jira/browse/PIG-3793
 Project: Pig
  Issue Type: Improvement
  Components: data
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.13.0


It's useful to have an understanding of how many operators are being used in the 
script via the API. This could allow admins to enforce checks/restrictions on 
the length/complexity of the plan in user scripts.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3793) Provide info on number of LogicalRelationalOperator(s) used in the script through LogicalPlanData

2014-03-03 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3793:
-

Status: Patch Available  (was: Open)

 Provide info on number of LogicalRelationalOperator(s) used in the script 
 through LogicalPlanData
 -

 Key: PIG-3793
 URL: https://issues.apache.org/jira/browse/PIG-3793
 Project: Pig
  Issue Type: Improvement
  Components: data
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.13.0

 Attachments: PIG-3793.patch


 It's useful to have an understanding of how many operators are being used in 
 the script via the API. This could allow admins to enforce 
 checks/restrictions on the length/complexity of the plan in user scripts.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3793) Provide info on number of LogicalRelationalOperator(s) used in the script through LogicalPlanData

2014-03-03 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3793:
-

Attachment: PIG-3793.patch

 Provide info on number of LogicalRelationalOperator(s) used in the script 
 through LogicalPlanData
 -

 Key: PIG-3793
 URL: https://issues.apache.org/jira/browse/PIG-3793
 Project: Pig
  Issue Type: Improvement
  Components: data
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.13.0

 Attachments: PIG-3793.patch


 It's useful to have an understanding of how many operators are being used in 
 the script via the API. This could allow admins to enforce 
 checks/restrictions on the length/complexity of the plan in user scripts.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3765) Ability to disable Pig commands and operators

2014-02-25 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911952#comment-13911952
 ] 

Prashant Kommireddi commented on PIG-3765:
--

1. You are right, PigServer must handle this as well.
2. I felt it looked cleaner and more intuitive to change the signature. Having 
a setter for the PigServer object seemed unnecessary after looking at the way 
it has been used in the codebase so far. It's mostly 1-line changes in test 
cases; it wasn't too hard to do. What do you think?

 Ability to disable Pig commands and operators
 -

 Key: PIG-3765
 URL: https://issues.apache.org/jira/browse/PIG-3765
 Project: Pig
  Issue Type: New Feature
  Components: documentation, grunt
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Attachments: PIG-3765.patch, PIG-3765_2.patch


 This is an admin feature providing the ability to blacklist and/or whitelist 
 certain commands and operations. Pig exposes a few of these that may not be 
 safe in a multitenant environment. For example, sh invokes shell commands and 
 set allows users to change non-final configs. While these are tremendously 
 useful in general, the ability to disable them would make Pig a safer 
 platform. The goal is to give administrators more control over user scripts. 
 The default behaviour stays the same - no filters applied on commands and 
 operators.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (PIG-3765) Ability to disable Pig commands and operators

2014-02-25 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3765:
-

Attachment: PIG-3765_3.patch

[~daijy] the new patch includes validations on PigServer as well. I've added 
test cases towards the same.

 Ability to disable Pig commands and operators
 -

 Key: PIG-3765
 URL: https://issues.apache.org/jira/browse/PIG-3765
 Project: Pig
  Issue Type: New Feature
  Components: documentation, grunt
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Attachments: PIG-3765.patch, PIG-3765_2.patch, PIG-3765_3.patch


 This is an admin feature providing the ability to blacklist and/or whitelist 
 certain commands and operations. Pig exposes a few of these that may not be 
 safe in a multitenant environment. For example, sh invokes shell commands and 
 set allows users to change non-final configs. While these are tremendously 
 useful in general, the ability to disable them would make Pig a safer 
 platform. The goal is to give administrators more control over user scripts. 
 The default behaviour stays the same - no filters applied on commands and 
 operators.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (PIG-3765) Ability to disable Pig commands and operators

2014-02-18 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3765:
-

Attachment: PIG-3765_2.patch

Includes test cases

 Ability to disable Pig commands and operators
 -

 Key: PIG-3765
 URL: https://issues.apache.org/jira/browse/PIG-3765
 Project: Pig
  Issue Type: New Feature
  Components: documentation, grunt
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Attachments: PIG-3765.patch, PIG-3765_2.patch


 This is an admin feature providing the ability to blacklist and/or whitelist 
 certain commands and operations. Pig exposes a few of these that may not be 
 safe in a multitenant environment. For example, sh invokes shell commands and 
 set allows users to change non-final configs. While these are tremendously 
 useful in general, the ability to disable them would make Pig a safer 
 platform. The goal is to give administrators more control over user scripts. 
 The default behaviour stays the same - no filters applied on commands and 
 operators.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

