[jira] Updated: (PIG-513) PERFORMANCE: optimize some of the code in DefaultTuple

2009-07-29 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-513:
-

Status: Patch Available  (was: Reopened)

 PERFORMANCE: optimize some of the code in DefaultTuple
 --

 Key: PIG-513
 URL: https://issues.apache.org/jira/browse/PIG-513
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-513.patch, pig-513_2.patch


 The following areas in DefaultTuple.java can be changed:
 The member methods get(), set(), getType() and isNull() all call 
 checkBounds(), which is redundant since all four methods already throw 
 ExecException. Instead of doing a bounds check, we can catch the 
 IndexOutOfBoundsException in a try-catch and rethrow it as an ExecException.
 The write() method has the following unused object (d in the code below):
 {code}
 for (int i = 0; i < sz; i++) {
     try {
         Object d = get(i);
     } catch (ExecException ee) {
         throw new RuntimeException(ee);
     }
     DataReaderWriter.writeDatum(out, mFields.get(i));
 }
 {code}
 {noformat}
 The get(i) call in the try block should be replaced by the writeDatum call 
 directly, since d is never used and the call to get() is unnecessary.
 {noformat}
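
The two changes proposed above can be sketched roughly as follows. This is a
minimal illustration, not the attached patch: SketchTuple and
SketchExecException are hypothetical stand-ins for DefaultTuple and
ExecException, and writeAll() appends to a string instead of calling
DataReaderWriter.writeDatum.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Pig's checked ExecException. */
class SketchExecException extends Exception {
    SketchExecException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Hypothetical, simplified tuple; not the real DefaultTuple. */
class SketchTuple {
    private final List<Object> mFields = new ArrayList<>();

    void append(Object value) {
        mFields.add(value);
    }

    // No explicit bounds check: the list's own range check is relied on,
    // and its exception is translated into the checked exception type.
    Object get(int fieldNum) throws SketchExecException {
        try {
            return mFields.get(fieldNum);
        } catch (IndexOutOfBoundsException e) {
            throw new SketchExecException(
                "Field " + fieldNum + " requested from a tuple of size "
                    + mFields.size(), e);
        }
    }

    // write()-style loop: each field is read once, with no unused local
    // and no extra call to get().
    String writeAll() {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < mFields.size(); i++) {
            out.append(mFields.get(i)).append('\n');
        }
        return out.toString();
    }
}
```

The loop index is always in range inside writeAll(), so the try-catch from the
original loop is not needed there.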

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-882) log level not propagated to loggers

2009-07-29 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-882:
---

Attachment: PIG-882-4.patch

Sync with latest trunk

 log level not propagated to loggers 
 

 Key: PIG-882
 URL: https://issues.apache.org/jira/browse/PIG-882
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Thejas M Nair
 Attachments: PIG-882-1.patch, PIG-882-2.patch, PIG-882-3.patch, 
 PIG-882-4.patch


 Pig accepts a log level as a parameter, but the level it captures is not 
 applied properly, so loggers in different classes do not log at the specified 
 level.
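
As a rough illustration of what "propagating" a level means, the sketch below
sets the level once on a parent logger so that loggers created under it
inherit it. It uses java.util.logging from the JDK purely so the example is
self-contained; it is not the logging setup Pig uses, and the namespace
"sketch.pig" is made up.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LogLevelSketch {
    private LogLevelSketch() {}

    /**
     * Parses the user-supplied level name and sets it on the logger that
     * owns the given namespace. Child loggers with no level of their own
     * inherit the effective level from this parent.
     */
    static Logger applyLevel(String namespace, String levelName) {
        Logger parent = Logger.getLogger(namespace);
        parent.setLevel(Level.parse(levelName));
        return parent;
    }

    public static void main(String[] args) {
        // Keep a reference to the parent so it is not garbage collected.
        Logger parent = applyLevel("sketch.pig", "FINE");
        Logger child = Logger.getLogger("sketch.pig.impl.Loader");
        System.out.println(parent.getName() + " -> " + parent.getLevel());
        System.out.println("child logs FINE:   " + child.isLoggable(Level.FINE));
        System.out.println("child logs FINEST: " + child.isLoggable(Level.FINEST));
    }
}
```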




[jira] Updated: (PIG-885) New UDFs for piggybank (Bin, Decode, LookupInFiles, RegexExtract, RegexMatch, HashFNV, DiffDate)

2009-07-29 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-885:
---

Attachment: PIG-885-7.patch

Add null checking to all applicable UDFs

 New UDFs for piggybank (Bin, Decode, LookupInFiles, RegexExtract, RegexMatch, 
 HashFNV, DiffDate)
 

 Key: PIG-885
 URL: https://issues.apache.org/jira/browse/PIG-885
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.3.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Priority: Minor
 Fix For: 0.4.0

 Attachments: PIG-885-2.patch, PIG-885-3.patch, PIG-885-4.patch, 
 PIG-885-5.patch, PIG-885-6.patch, PIG-885-7.patch, PIG-885.patch


 A bunch of UDFs:
 1. Bin -- Converts a continuous value into discrete values
 2. Decode -- Converts a given attribute or expression into another string 
 value, based on the value of the source attribute
 3. LookupInFiles -- Checks for the existence of an expression in a series of 
 text files
 4. RegexExtract and RegexMatch -- Similar to Perl regexes
 5. HashFNV -- An implementation of the FNV hash
 6. DiffDate -- Calculates the number of days between two dates
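
For readers unfamiliar with FNV, the hash in item 5 can be sketched as below.
This is an illustration under stated assumptions, not the piggybank code: the
issue does not say which FNV variant or width HashFNV uses, so the sketch
picks 32-bit FNV-1a, and the class name Fnv1aSketch is made up. Returning null
for a null input mirrors the null checking mentioned in the update.

```java
import java.nio.charset.StandardCharsets;

final class Fnv1aSketch {
    // Standard 32-bit FNV parameters.
    private static final int OFFSET_BASIS = 0x811c9dc5;
    private static final int PRIME = 0x01000193;

    private Fnv1aSketch() {}

    /** Returns the 32-bit FNV-1a hash of the UTF-8 bytes, or null for null input. */
    static Integer hash(String input) {
        if (input == null) {
            return null;
        }
        int h = OFFSET_BASIS;
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);   // FNV-1a: XOR the byte in first...
            h *= PRIME;        // ...then multiply (int arithmetic wraps mod 2^32)
        }
        return h;
    }
}
```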




[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-29 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-792:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

The code has been committed. Thanks, Sri and Ying, for this important 
contribution.

 PERFORMANCE: Support skewed join in pig
 ---

 Key: PIG-792
 URL: https://issues.apache.org/jira/browse/PIG-792
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath
 Attachments: skewedjoin.patch


 Fragmented replicated join has a few limitations:
  - One of the tables needs to be loaded into memory
  - Join is limited to two tables
 Skewed join partitions the table and joins the records in the reduce phase. 
 It computes a histogram of the key space to account for skewing in the input 
 records. Further, it adjusts the number of reducers depending on the key 
 distribution.
 We need to implement the skewed join in pig.
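
The histogram idea described above can be illustrated with a small sketch. It
is only a toy model of the concept, not the committed implementation: the
class SkewedJoinSketch, the string keys, and the maxRecordsPerReducer
threshold are all assumptions made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SkewedJoinSketch {
    private SkewedJoinSketch() {}

    /** Counts how often each join key occurs in a sample of the input. */
    static Map<String, Integer> histogram(List<String> sampledKeys) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : sampledKeys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Assigns each key enough reducers that no reducer receives more than
     * maxRecordsPerReducer records for that key (ceiling division).
     */
    static Map<String, Integer> reducersPerKey(Map<String, Integer> histogram,
                                               int maxRecordsPerReducer) {
        if (maxRecordsPerReducer <= 0) {
            throw new IllegalArgumentException("maxRecordsPerReducer must be positive");
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Integer> e : histogram.entrySet()) {
            int reducers = (e.getValue() + maxRecordsPerReducer - 1) / maxRecordsPerReducer;
            result.put(e.getKey(), reducers);
        }
        return result;
    }
}
```

A heavily repeated key is spread over several reducers, while rare keys keep a
single reducer each.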




[jira] Updated: (PIG-892) Make COUNT and AVG deal with nulls according to the SQL standard

2009-07-29 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-892:
---

Attachment: PIG-892_v3.patch

Patch addressing comments from Santhosh

 Make COUNT and AVG deal with nulls according to the SQL standard
 ---

 Key: PIG-892
 URL: https://issues.apache.org/jira/browse/PIG-892
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.3.0
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.4.0

 Attachments: PIG-892.patch, PIG-892_v2.patch, PIG-892_v3.patch


 Both COUNT and AVG need to ignore nulls. Also add COUNT_STAR to match 
 COUNT(*) in SQL.
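
The intended semantics can be sketched as follows. This is an illustration of
the SQL-style behavior only, not the patch: NullAwareAggregates is a made-up
class operating on a plain list rather than a Pig bag, and returning null from
avg() when every value is null is an assumption based on SQL's AVG.

```java
import java.util.List;

final class NullAwareAggregates {
    private NullAwareAggregates() {}

    /** COUNT: counts only the non-null values. */
    static long count(List<Double> values) {
        long n = 0;
        for (Double v : values) {
            if (v != null) {
                n++;
            }
        }
        return n;
    }

    /** COUNT_STAR: counts every row, null or not, like COUNT(*) in SQL. */
    static long countStar(List<Double> values) {
        return values.size();
    }

    /** AVG: averages the non-null values; null if there are none. */
    static Double avg(List<Double> values) {
        double sum = 0.0;
        long n = 0;
        for (Double v : values) {
            if (v != null) {
                sum += v;
                n++;
            }
        }
        if (n == 0) {
            return null;
        }
        return sum / n;
    }
}
```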




[jira] Updated: (PIG-885) New UDFs for piggybank (Bin, Decode, LookupInFiles, RegexExtract, RegexMatch, HashFNV, DiffDate)

2009-07-29 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-885:
---

Attachment: PIG-885-8.patch

Add NullPointerException check

 New UDFs for piggybank (Bin, Decode, LookupInFiles, RegexExtract, RegexMatch, 
 HashFNV, DiffDate)
 

 Key: PIG-885
 URL: https://issues.apache.org/jira/browse/PIG-885
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.3.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Priority: Minor
 Fix For: 0.4.0

 Attachments: PIG-885-2.patch, PIG-885-3.patch, PIG-885-4.patch, 
 PIG-885-5.patch, PIG-885-6.patch, PIG-885-7.patch, PIG-885-8.patch, 
 PIG-885.patch


 A bunch of UDFs:
 1. Bin -- Converts a continuous value into discrete values
 2. Decode -- Converts a given attribute or expression into another string 
 value, based on the value of the source attribute
 3. LookupInFiles -- Checks for the existence of an expression in a series of 
 text files
 4. RegexExtract and RegexMatch -- Similar to Perl regexes
 5. HashFNV -- An implementation of the FNV hash
 6. DiffDate -- Calculates the number of days between two dates
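
A day-difference function with the kind of null check this update mentions
might look like the sketch below. It is not the piggybank DiffDate code: the
class name DiffDateSketch, the ISO yyyy-MM-dd input format, and returning an
absolute (unsigned) difference are all assumptions made for illustration.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

final class DiffDateSketch {
    private DiffDateSketch() {}

    /**
     * Returns the number of days between two ISO dates (yyyy-MM-dd),
     * regardless of argument order, or null if either input is null.
     */
    static Long daysBetween(String first, String second) {
        if (first == null || second == null) {
            return null; // propagate null instead of throwing NullPointerException
        }
        LocalDate a = LocalDate.parse(first);
        LocalDate b = LocalDate.parse(second);
        return Math.abs(ChronoUnit.DAYS.between(a, b));
    }
}
```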




[jira] Commented: (PIG-833) Storage access layer

2009-07-29 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736964#action_12736964
 ] 

Jeff Hammerbacher commented on PIG-833:
---

Hey Raghu,

Good stuff! Do you guys have any internal benchmarks that you could add to the 
docs on design and usage?

Thanks,
Jeff

 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, zebra-javadoc.tgz


 A layer is needed to provide a high level data access abstraction and a 
 tabular view of data in Hadoop, and could free Pig users from implementing 
 their own data storage/retrieval code.  This layer should also include a 
 columnar storage format in order to provide fast data projection, 
 CPU/space-efficient data serialization, and a schema language to manage 
 physical storage metadata.  Eventually it could also support predicate 
 pushdown for further performance improvement.  Initially, this layer could be 
 a contrib project in Pig and become a hadoop subproject later on.




[jira] Commented: (PIG-889) Pig can not access reporter of PigHadoopLog in Load Func

2009-07-29 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736990#action_12736990
 ] 

Santhosh Srinivasan commented on PIG-889:
-

PigHadoopLogger implements the PigLogger interface. As part of the 
implementation it uses the Hadoop reporter for aggregating the warning messages.

 Pig can not access reporter of PigHadoopLog in Load Func
 

 Key: PIG-889
 URL: https://issues.apache.org/jira/browse/PIG-889
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.4.0
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Fix For: 0.4.0

 Attachments: Pig_889_Patch.txt


 I'd like to increment a Counter in my own LoadFunc, but it throws a 
 NullPointerException. It seems that the reporter is not initialized.  
 I looked into this problem and found that PigInputFormat needs to call 
 PigHadoopLogger.getInstance().setReporter(reporter).
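
The failure mode described above is the usual one for a singleton whose
collaborator is injected late. The sketch below reproduces that pattern with
made-up classes: SketchLogger and SketchReporter are hypothetical stand-ins,
not PigHadoopLogger or the Hadoop reporter, and their methods are invented for
the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a Hadoop-style reporter that holds counters. */
final class SketchReporter {
    private final Map<String, Long> counters = new HashMap<>();

    void incrCounter(String name, long amount) {
        counters.merge(name, amount, Long::sum);
    }

    long get(String name) {
        return counters.getOrDefault(name, 0L);
    }
}

/** Hypothetical singleton logger that delegates counting to a reporter. */
final class SketchLogger {
    private static final SketchLogger INSTANCE = new SketchLogger();
    private SketchReporter reporter; // stays null until setReporter() is called

    private SketchLogger() {}

    static SketchLogger getInstance() {
        return INSTANCE;
    }

    void setReporter(SketchReporter reporter) {
        this.reporter = reporter;
    }

    // Throws NullPointerException if no reporter has been set yet.
    void count(String name) {
        reporter.incrCounter(name, 1L);
    }
}
```

Calling count() before setReporter() fails with a NullPointerException; once
the framework code that owns the reporter hands it over, the same call works.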




[jira] Created: (PIG-897) Pig should support counters

2009-07-29 Thread Santhosh Srinivasan (JIRA)
Pig should support counters
---

 Key: PIG-897
 URL: https://issues.apache.org/jira/browse/PIG-897
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.4.0
Reporter: Santhosh Srinivasan
 Fix For: 0.4.0


Pig should support the use of counters. Counters could be made available 
through the script or through Java APIs.




[jira] Assigned: (PIG-880) Order by is broken with complex fields

2009-07-29 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan reassigned PIG-880:
---

Assignee: Santhosh Srinivasan

 Order by is broken with complex fields
 --

 Key: PIG-880
 URL: https://issues.apache.org/jira/browse/PIG-880
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Olga Natkovich
Assignee: Santhosh Srinivasan
 Fix For: 0.4.0

 Attachments: PIG-880-bytearray-mapvalue-code-without-tests.patch


 Pig script:
 a = load 'studentcomplextab10k' as (smap:map[],c2,c3);
 f = foreach a generate smap#'name', smap#'age', smap#'gpa';
 s = order f by $0;
 store s into 'sc.out';
 Stack:
 Caused by: java.lang.ArrayStoreException
 at java.lang.System.arraycopy(Native Method)
 at java.util.Arrays.copyOf(Arrays.java:2763)
 at java.util.ArrayList.toArray(ArrayList.java:305)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.convertToArray(WeightedRangePartitioner.java:154)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:96)
 ... 5 more
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
 at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
 at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:769)
 at org.apache.pig.PigServer.execute(PigServer.java:762)
 at org.apache.pig.PigServer.access$100(PigServer.java:91)
 at org.apache.pig.PigServer$Graph.execute(PigServer.java:933)
 at org.apache.pig.PigServer.executeBatch(PigServer.java:245)
 at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
 at org.apache.pig.Main.main(Main.java:389)
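
The top frames of the trace (ArrayList.toArray, Arrays.copyOf,
System.arraycopy) match what plain Java does when a list is copied into an
array whose element type does not fit every element. The sketch below
reproduces that exception class with ordinary collections. It says nothing
about what WeightedRangePartitioner actually stores; the class
ArrayStoreSketch and the choice of a String array are assumptions for
illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

final class ArrayStoreSketch {
    private ArrayStoreSketch() {}

    /**
     * Copies a list holding a String and a Map into a String[].
     * Returns true if the copy fails with ArrayStoreException.
     */
    static boolean mixedListFailsToConvert() {
        List<Object> values = new ArrayList<>();
        values.add("alice");
        values.add(new HashMap<String, Object>()); // not a String
        try {
            // Compiles because toArray(T[]) is generic in the array type,
            // but the runtime copy rejects the Map element.
            values.toArray(new String[0]);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }
}
```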




[jira] Updated: (PIG-880) Order by is broken with complex fields

2009-07-29 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-880:


Status: Patch Available  (was: Open)

 Order by is broken with complex fields
 --

 Key: PIG-880
 URL: https://issues.apache.org/jira/browse/PIG-880
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Olga Natkovich
Assignee: Santhosh Srinivasan
 Fix For: 0.4.0

 Attachments: PIG-880-bytearray-mapvalue-code-without-tests.patch, 
 PIG-880.patch


 Pig script:
 a = load 'studentcomplextab10k' as (smap:map[],c2,c3);
 f = foreach a generate smap#'name', smap#'age', smap#'gpa';
 s = order f by $0;
 store s into 'sc.out';
 Stack:
 Caused by: java.lang.ArrayStoreException
 at java.lang.System.arraycopy(Native Method)
 at java.util.Arrays.copyOf(Arrays.java:2763)
 at java.util.ArrayList.toArray(ArrayList.java:305)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.convertToArray(WeightedRangePartitioner.java:154)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:96)
 ... 5 more
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
 at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
 at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:769)
 at org.apache.pig.PigServer.execute(PigServer.java:762)
 at org.apache.pig.PigServer.access$100(PigServer.java:91)
 at org.apache.pig.PigServer$Graph.execute(PigServer.java:933)
 at org.apache.pig.PigServer.executeBatch(PigServer.java:245)
 at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
 at org.apache.pig.Main.main(Main.java:389)




[jira] Commented: (PIG-833) Storage access layer

2009-07-29 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736998#action_12736998
 ] 

Raghu Angadi commented on PIG-833:
--

There will be benchmark results either attached to this jira or to a subsequent 
jira.

I would like to compare it to SequenceFiles and the new format in Hive. We 
should see on-par performance.

Major performance benefits come from commonly used projections (through column 
groups) and map-side joins of sorted tables. An important part of the motivation 
is features such as column security and the ability to delete entire columns.

We are running some larger-scale benchmarks internally, but these run on 
Yahoo's internal data sources.


 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, zebra-javadoc.tgz


 A layer is needed to provide a high level data access abstraction and a 
 tabular view of data in Hadoop, and could free Pig users from implementing 
 their own data storage/retrieval code.  This layer should also include a 
 columnar storage format in order to provide fast data projection, 
 CPU/space-efficient data serialization, and a schema language to manage 
 physical storage metadata.  Eventually it could also support predicate 
 pushdown for further performance improvement.  Initially, this layer could be 
 a contrib project in Pig and become a hadoop subproject later on.




[jira] Commented: (PIG-882) log level not propagated to loggers

2009-07-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737006#action_12737006
 ] 

Hadoop QA commented on PIG-882:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12414928/PIG-882-4.patch
  against trunk revision 799141.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to cause Findbugs to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/145/testReport/
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/145/console

This message is automatically generated.

 log level not propagated to loggers 
 

 Key: PIG-882
 URL: https://issues.apache.org/jira/browse/PIG-882
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Thejas M Nair
 Attachments: PIG-882-1.patch, PIG-882-2.patch, PIG-882-3.patch, 
 PIG-882-4.patch


 Pig accepts a log level as a parameter, but the level it captures is not 
 applied properly, so loggers in different classes do not log at the specified 
 level.




[jira] Updated: (PIG-882) log level not propagated to loggers

2009-07-29 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-882:
---

Attachment: PIG-882-5.patch

 log level not propagated to loggers 
 

 Key: PIG-882
 URL: https://issues.apache.org/jira/browse/PIG-882
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Thejas M Nair
 Attachments: PIG-882-1.patch, PIG-882-2.patch, PIG-882-3.patch, 
 PIG-882-4.patch, PIG-882-5.patch


 Pig accepts a log level as a parameter, but the level it captures is not 
 applied properly, so loggers in different classes do not log at the specified 
 level.




zookeeper patch builds

2009-07-29 Thread Giridharan Kesavan
It looks like the Hudson space issue is resolved; I've restarted the ZooKeeper 
patch build jobs.

-Giri