[jira] Commented: (PIG-911) [Piggybank] SequenceFileLoader

2009-08-11 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742239#action_12742239
 ] 

Alan Gates commented on PIG-911:


Dmitry,

First, this is great.  We've had requests to read SequenceFiles.  Being able to 
write them as well would be great.

A few thoughts:

1) This should not extend UTF8StorageConverter.  This loader will be returning 
actual data types, not bytes that need to be interpreted.  I would think 
instead that it should implement the bytesToX() methods itself and just throw 
an exception saying it didn't expect to do any conversion.

2) getSampledTuple looks fine, provided skip positions the stream at a point 
where the next tuple can be read.

3) In the bindTo call, where you obtain the key and value by reflection, should 
there be a try/catch block in case the cast to Writable fails?  Similarly, in 
describe schema you're asking how to suppress warnings from the cast in 
reader.getKeyClass().  But don't you want to check that what you got really is 
a Writable, since there is no guarantee?
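
A minimal sketch of the check raised in point 3, under stated assumptions: the class is tested with isAssignableFrom before it is instantiated, and Class.asSubclass is used so that no unchecked cast (and no warning suppression) is needed. The Writable interface and the TextLike class below are local stand-ins so the sketch compiles without Hadoop on the classpath; in the loader the interface would be org.apache.hadoop.io.Writable and the class would come from reader.getKeyClass(). The exception types are illustrative choices, not taken from the patch.

```java
// Local stand-in for org.apache.hadoop.io.Writable (assumption: present only
// so that this sketch is self-contained).
interface Writable {}

// Hypothetical key class, standing in for whatever class the file names.
final class TextLike implements Writable {}

final class WritableCheckSketch {
    /** Instantiates cls only after confirming that it implements Writable. */
    static Writable newWritable(Class<?> cls) {
        if (!Writable.class.isAssignableFrom(cls)) {
            throw new IllegalArgumentException(cls.getName() + " is not a Writable");
        }
        try {
            // asSubclass is a checked narrowing, so no @SuppressWarnings is needed.
            return cls.asSubclass(Writable.class).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + cls.getName(), e);
        }
    }
}
```

With this shape, a file whose key class is not a Writable fails with a clear message at bind time instead of a ClassCastException later on.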



> [Piggybank] SequenceFileLoader 
> ---
>
> Key: PIG-911
> URL: https://issues.apache.org/jira/browse/PIG-911
> Project: Pig
> Issue Type: New Feature
> Reporter: Dmitriy V. Ryaboy
> Attachments: pig_sequencefile.patch
>
>
> The proposed piggybank contribution adds a SequenceFileLoader to the 
> piggybank.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-907) Provide multiple version of HashFNV (Piggybank)

2009-08-11 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-907:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed

> Provide multiple version of HashFNV (Piggybank)
> ---
>
> Key: PIG-907
> URL: https://issues.apache.org/jira/browse/PIG-907
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Affects Versions: 0.3.0
> Reporter: Daniel Dai
> Priority: Minor
> Fix For: 0.4.0
>
> Attachments: PIG-907-1.patch, PIG-907-2.patch
>
>
> HashFNV takes 1 or 2 parameters. Until PIG-902 is resolved, it is better to 
> create 2 versions of HashFNV so that Pig can pick the right version and do 
> the type cast. Otherwise, the user has to do the explicit cast. 
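
The idea of offering one version per argument count can be sketched with two plain Java overloads. This is only an illustration: the message does not say which FNV variant the Piggybank function uses or what the second parameter means, so the 32-bit FNV-1a variant and the reading of the second argument as a modulus are assumptions, and the class name HashFnvSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class HashFnvSketch {
    // Standard 32-bit FNV offset basis and prime.
    private static final int FNV_OFFSET_BASIS = 0x811c9dc5;
    private static final int FNV_PRIME = 0x01000193;

    /** One-argument version: 32-bit FNV-1a of the UTF-8 bytes, as an unsigned value. */
    static long hash(String s) {
        int h = FNV_OFFSET_BASIS;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);   // FNV-1a: xor the byte first ...
            h *= FNV_PRIME;    // ... then multiply (wraps modulo 2^32)
        }
        return h & 0xffffffffL;
    }

    /** Two-argument version (assumed meaning): the same hash reduced modulo mod. */
    static long hash(String s, long mod) {
        return hash(s) % mod;
    }
}
```

Because each version has a fixed signature, the caller never has to cast an argument just to select the right one.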




[jira] Updated: (PIG-893) support cast of chararray to other simple types

2009-08-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-893:
---

Resolution: Fixed
Release Note: PIG-893: Added casts from chararray to int, long, float, and double.
Status: Resolved  (was: Patch Available)

Patch checked in.  Thanks Jeff for your work on this.

> support cast of chararray to other simple types
> ---
>
> Key: PIG-893
> URL: https://issues.apache.org/jira/browse/PIG-893
> Project: Pig
> Issue Type: New Feature
> Affects Versions: 0.4.0
> Reporter: Thejas M Nair
> Assignee: Jeff Zhang
> Fix For: 0.4.0
>
> Attachments: Pig_893.Patch
>
>
> Pig should support casting of chararray to integer, long, float, double, and 
> bytearray. If the conversion fails for reasons such as overflow, the cast 
> should return null and log a warning.
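
The requested behaviour (return null and log a warning when the conversion fails) can be sketched in Java as follows. This is not the code from the patch: the class name CharArrayCastSketch, the use of java.util.logging, and the exact warning text are assumptions made for illustration.

```java
import java.util.logging.Logger;

final class CharArrayCastSketch {
    private static final Logger LOG =
            Logger.getLogger(CharArrayCastSketch.class.getName());

    /** Returns the int value of s, or null (with a warning) if s is not a valid int. */
    static Integer toInteger(String s) {
        if (s == null) {
            return null;
        }
        try {
            // Throws NumberFormatException on malformed input and on overflow.
            return Integer.valueOf(s);
        } catch (NumberFormatException e) {
            LOG.warning("Unable to cast '" + s + "' to int; returning null");
            return null;
        }
    }

    /** Returns the long value of s, or null (with a warning) if s is not a valid long. */
    static Long toLong(String s) {
        if (s == null) {
            return null;
        }
        try {
            return Long.valueOf(s);
        } catch (NumberFormatException e) {
            LOG.warning("Unable to cast '" + s + "' to long; returning null");
            return null;
        }
    }
}
```

A value such as "99999999999" overflows int, so toInteger yields null for it while toLong still succeeds.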




[jira] Commented: (PIG-845) PERFORMANCE: Merge Join

2009-08-11 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742231#action_12742231
 ] 

Ashutosh Chauhan commented on PIG-845:
--

Hi Pradeep,

Thanks for the review. Please find my comments inline.

1) In LogicalPlanTester.java, why is the following change required?
Typically when PigContext is constructed in Map-reduce mode, the properties 
should correspond to the cluster configuration. So the above initialization 
seems odd because the Properties object is an empty object in the constructor 
call above.

>> This is required because in local mode merge join gets rewritten as a 
>> regular join. So, if we had exec type as local, the plan I get in 
>> MRCompiler corresponds to the regular join plan, against which we can't test 
>> the merge join plan. The Properties object has no bearing here, because 
>> LogicalPlanTester is used only for testing logical plans. Further, I think 
>> all our tests should have exec type as MapReduce because we want to test 
>> correctness in MapReduce mode.

2) In PigMapBase.java:
public static final String END_OF_INP_IN_MAP = "pig.stream.in.map";
can change to
public static final String END_OF_INP_IN_MAP = "pig.blocking.operator.in.map"; 
and this should be put as a public static member of JobControlCompiler.
In JobControlCompiler.java,
jobConf.set("pig.stream.in.map", "true"); should change to use the above public 
static String.
>> Will update this in new patch.

3) Remove the following comment in QueryParser.jjt (line 302):
* Join parser. Currently can only handle skewed joins.
>> Will be removed in next patch.

4) In QueryParser.jjt the joinPlans passed to the LOJoin constructor is not a 
LinkedMultiMap, but in LogToPhyTranslationVisitor the join plans are put in a 
LinkedMultiMap. If order is important, shouldn't QueryParser.jjt also change?
>> Good catch. Order is indeed important. Will fix this in next patch.

5) Some comments in LogToPhyTranslationVisitor about the different lists and 
maps would help.
>> Those lists and maps were there earlier too; I didn't introduce anything 
>> new. I just moved them around :) But I agree that section needs to be 
>> documented better. It also took me a while to get my head around it. Will 
>> include a comment about the purpose of each in the next patch.

6) In validateMergeJoin() - the code only considers direct successors and 
predecessors of LOJoin. It should check the entire plan and ensure that 
predecessors of LOJoin all the way to the LOLoad are only LOForEach and 
LOFilter. Strictly we should not allow LOForEach since it could change sort 
order or position of join keys and hence invalidate the index - but we need it 
so that the Foreach introduced by the TypeCastInserter when there is a schema 
for either of the inputs remains. You should note in the documentation that 
only order- and join-key-position-preserving Foreachs and Filters are allowed 
as predecessors to merge join, and check the same in validateMergeJoin(). It 
is better to use a whitelist of allowed operators than a blacklist of 
disallowed ones (since the blacklist would need to be updated any time a new 
operator comes along). The exception source here is not really a bug but a 
user input error, since merge join really does not support other ops.

Again for the successor, all successors from mergejoin down to map leaf should 
be checked to ensure stream is absent (really there should be no restriction on 
stream being present after the join - if there is an issue currently with this, 
it is fine to not allow stream but eventually it would be good to not have any 
restriction on what follows the merge join). You can just use a visitor to 
check presence of stream in the plan - this should be done after complete 
LogToPhyTranslation is done - in visit() so that the whole plan can be looked 
at.

>> Agreed. I fixed the bug for Streaming. Now there is no restriction on what 
>> follows Merge Join. For predecessors, I included a new function which walks 
>> all the way up to make sure the operators preceding merge join are only 
>> those on the whitelist: LOLoad, LOForEach, or LOFilter.
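
The whitelist walk described in point 6 can be sketched as below. This is a simplified illustration, not the function from the patch: the Op class is a hypothetical stand-in for Pig's logical operators (it only records an operator kind and its inputs), and the method name predecessorsAllowed is made up for this example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MergeJoinValidationSketch {
    /** Hypothetical stand-in for a logical operator: a kind name plus its inputs. */
    static final class Op {
        final String kind;
        final List<Op> predecessors;

        Op(String kind, Op... predecessors) {
            this.kind = kind;
            this.predecessors = Arrays.asList(predecessors);
        }
    }

    // Whitelist: anything not listed here is rejected, including future operators.
    private static final Set<String> ALLOWED =
            new HashSet<>(Arrays.asList("LOLoad", "LOForEach", "LOFilter"));

    /** Walks every transitive predecessor of join and checks it against the whitelist. */
    static boolean predecessorsAllowed(Op join) {
        Deque<Op> pending = new ArrayDeque<>(join.predecessors);
        while (!pending.isEmpty()) {
            Op op = pending.pop();
            if (!ALLOWED.contains(op.kind)) {
                return false;
            }
            pending.addAll(op.predecessors);
        }
        return true;
    }
}
```

Note that this sketch checks operator kinds only; it does not verify that a given Foreach preserves sort order or join key position, which the review also asks for.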
 
7) Is MRStreamHandler.java now replaced by 
/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/EndOfAllInputSetter.java
 ?
>> Yes.

8) Some of MRCompilerExceptions do not follow the Error handling spec - 
errcode, errMsg, Src
>> Will update them.

9) Should assert() statements in MRCompiler be replaced with Exceptions, since 
assertions are disabled by default in Java?
>> Will update them.
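
To illustrate point 9: a Java assert is skipped unless the JVM is started with -ea, whereas an explicit check always runs. The sketch below uses hypothetical method names and IllegalStateException as a placeholder; the actual code would presumably throw an MRCompilerException with an error code, which is not reproduced here.

```java
final class AssertVsExceptionSketch {
    /** Relies on assert: the check is skipped unless the JVM runs with -ea. */
    static int sizeWithAssert(int size) {
        assert size >= 0 : "negative size";
        return size;
    }

    /** Explicit check: always enforced, regardless of JVM flags. */
    static int sizeWithException(int size) {
        if (size < 0) {
            throw new IllegalStateException("negative size: " + size);
        }
        return size;
    }
}
```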

10) In MRCompiler.java I wonder if you should change
    rightMapPlan.disconnect(rightLoader, loadSucc);
    rightMapPlan.remove(loadSucc);
to
    rightMapPlan.trimBelow(rightLoader);
We really want to remove all operators in rightMapPlan other than the loader.
>> Didn't know about this function. This indeed is the one which is needed here.

11) We should note in d

Build failed in Hudson: Pig-Patch-minerva.apache.org #159

2009-08-11 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/159/

--
[...truncated 102729 lines...]
 [exec] [junit] 09/08/12 03:47:23 INFO dfs.DataNode: PacketResponder 1 
for block blk_1014328216252725665_1010 terminating
 [exec] [junit] 09/08/12 03:47:23 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:38309 is added to 
blk_1014328216252725665_1010 size 6
 [exec] [junit] 09/08/12 03:47:23 INFO dfs.DataNode: Received block 
blk_1014328216252725665_1010 of size 6 from /127.0.0.1
 [exec] [junit] 09/08/12 03:47:23 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40130 is added to 
blk_1014328216252725665_1010 size 6
 [exec] [junit] 09/08/12 03:47:23 INFO dfs.DataNode: PacketResponder 2 
for block blk_1014328216252725665_1010 terminating
 [exec] [junit] 09/08/12 03:47:23 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: /user/hudson/input2.txt. blk_-2485811821289249348_1011
 [exec] [junit] 09/08/12 03:47:23 INFO dfs.DataNode: Receiving block 
blk_-2485811821289249348_1011 src: /127.0.0.1:41887 dest: /127.0.0.1:38309
 [exec] [junit] 09/08/12 03:47:23 INFO dfs.DataNode: Receiving block 
blk_-2485811821289249348_1011 src: /127.0.0.1:59131 dest: /127.0.0.1:57055
 [exec] [junit] 09/08/12 03:47:23 INFO dfs.DataNode: Receiving block 
blk_-2485811821289249348_1011 src: /127.0.0.1:37202 dest: /127.0.0.1:41872
 [exec] [junit] 09/08/12 03:47:23 INFO dfs.DataNode: Received block 
blk_-2485811821289249348_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/08/12 03:47:23 INFO dfs.DataNode: PacketResponder 0 
for block blk_-2485811821289249348_1011 terminating
 [exec] [junit] 09/08/12 03:47:23 INFO dfs.DataNode: Received block 
blk_-2485811821289249348_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/08/12 03:47:23 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:41872 is added to 
blk_-2485811821289249348_1011 size 6
 [exec] [junit] 09/08/12 03:47:23 INFO dfs.DataNode: PacketResponder 1 
for block blk_-2485811821289249348_1011 terminating
 [exec] [junit] 09/08/12 03:47:23 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:57055 is added to 
blk_-2485811821289249348_1011 size 6
 [exec] [junit] 09/08/12 03:47:23 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:38309 is added to 
blk_-2485811821289249348_1011 size 6
 [exec] [junit] 09/08/12 03:47:23 INFO dfs.DataNode: Received block 
blk_-2485811821289249348_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/08/12 03:47:23 INFO dfs.DataNode: PacketResponder 2 
for block blk_-2485811821289249348_1011 terminating
 [exec] [junit] 09/08/12 03:47:23 INFO 
executionengine.HExecutionEngine: Connecting to hadoop file system at: 
hdfs://localhost:47237
 [exec] [junit] 09/08/12 03:47:23 INFO 
executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: 
localhost:49986
 [exec] [junit] 09/08/12 03:47:23 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
 [exec] [junit] 09/08/12 03:47:23 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
 [exec] [junit] 09/08/12 03:47:23 INFO dfs.DataNode: Deleting block 
blk_-6228595752809205073_1004 file 
dfs/data/data2/current/blk_-6228595752809205073
 [exec] [junit] 09/08/12 03:47:23 INFO dfs.DataNode: Deleting block 
blk_579166466543892216_1006 file dfs/data/data1/current/blk_579166466543892216
 [exec] [junit] 09/08/12 03:47:24 INFO 
mapReduceLayer.JobControlCompiler: Setting up single store job
 [exec] [junit] 09/08/12 03:47:24 WARN mapred.JobClient: Use 
GenericOptionsParser for parsing the arguments. Applications should implement 
Tool for the same.
 [exec] [junit] 09/08/12 03:47:24 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200908120346_0002/job.jar. 
blk_-1819354404747182069_1012
 [exec] [junit] 09/08/12 03:47:24 INFO dfs.DataNode: Receiving block 
blk_-1819354404747182069_1012 src: /127.0.0.1:41890 dest: /127.0.0.1:38309
 [exec] [junit] 09/08/12 03:47:24 INFO dfs.DataNode: Receiving block 
blk_-1819354404747182069_1012 src: /127.0.0.1:59134 dest: /127.0.0.1:57055
 [exec] [junit] 09/08/12 03:47:24 INFO dfs.DataNode: Receiving block 
blk_-1819354404747182069_1012 src: /127.0.0.1:33222 dest: /127.0.0.1:40130
 [exec] [junit] 09/08/12 03:47:24 INFO dfs.DataNode: Received block 
blk_-1819354404747182069_1012 of size 1480653 from /127.0.0.1
 [exec] [junit] 09/08/12 03:47:24 INFO dfs.DataNode: PacketResponder 0 
for block blk_-1819354404747182069_1012 terminating
 [exec] [junit] 09/08/12 03:47:24 INFO dfs.DataNode: Received block 
blk_-181935

[jira] Commented: (PIG-890) Create a sampler interface and improve the skewed join sampler

2009-08-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742203#action_12742203
 ] 

Hadoop QA commented on PIG-890:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12416267/sampler.patch
  against trunk revision 803312.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/159/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/159/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/159/console

This message is automatically generated.

> Create a sampler interface and improve the skewed join sampler
> --
>
> Key: PIG-890
> URL: https://issues.apache.org/jira/browse/PIG-890
> Project: Pig
> Issue Type: Improvement
> Reporter: Sriranjan Manjunath
> Attachments: sampler.patch
>
>
> We need a different sampler for order by and skewed join. We thus need a 
> better sampling interface. The design of the same is described here: 
> http://wiki.apache.org/pig/PigSampler




[jira] Commented: (PIG-833) Storage access layer

2009-08-11 Thread Jay Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742201#action_12742201
 ] 

Jay Tang commented on PIG-833:
--

Zebra has a dependency on TFile that is available in Hadoop 20; that's why the 
compilation instruction is more complicated.  A new wiki at 
http://wiki.apache.org/pig/zebra will provide more information on Zebra.

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
> Issue Type: New Feature
> Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
> PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
> TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.




[jira] Commented: (PIG-833) Storage access layer

2009-08-11 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742170#action_12742170
 ] 

Dmitriy V. Ryaboy commented on PIG-833:
---

Alan, this means Pig contrib/ is no longer compatible with Hadoop 18.
That probably means you need to either roll this back or roll 660 in 
(and add the hadoop20.jar file to lib/).
Otherwise the build is broken.




[jira] Commented: (PIG-913) Error in Pig script when grouping on chararray column

2009-08-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742168#action_12742168
 ] 

Hadoop QA commented on PIG-913:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12416263/PIG-913.patch
  against trunk revision 803312.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/158/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/158/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/158/console

This message is automatically generated.

> Error in Pig script when grouping on chararray column
> -
>
> Key: PIG-913
> URL: https://issues.apache.org/jira/browse/PIG-913
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.4.0
> Reporter: Viraj Bhat
> Priority: Critical
> Fix For: 0.4.0
>
> Attachments: PIG-913.patch
>
>
> I have a very simple script which fails at parse time due to the schema I 
> specified in the loader.
> {code}
> data = LOAD '/user/viraj/studenttab10k' AS (s:chararray);
> dataSmall = limit data 100;
> bb = GROUP dataSmall by $0;
> dump bb;
> {code}
> =
> 2009-08-06 18:47:56,297 [main] INFO  org.apache.pig.Main - Logging error 
> messages to: /homes/viraj/pig-svn/trunk/pig_1249609676296.log
> 09/08/06 18:47:56 INFO pig.Main: Logging error messages to: 
> /homes/viraj/pig-svn/trunk/pig_1249609676296.log
> 2009-08-06 18:47:56,459 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: hdfs://localhost:9000
> 09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to hadoop 
> file system at: hdfs://localhost:9000
> 2009-08-06 18:47:56,694 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to map-reduce job tracker at: localhost:9001
> 09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to 
> map-reduce job tracker at: localhost:9001
> 2009-08-06 18:47:57,008 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1002: Unable to store alias bb
> 09/08/06 18:47:57 ERROR grunt.Grunt: ERROR 1002: Unable to store alias bb
> Details at logfile: /homes/viraj/pig-svn/trunk/pig_1249609676296.log
> =
> =
> Pig Stack Trace
> ---
> ERROR 1002: Unable to store alias bb
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias bb
> at org.apache.pig.PigServer.openIterator(PigServer.java:481)
> at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:531)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
> at org.apache.pig.Main.main(Main.java:397)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: 
> Unable to store alias bb
> at org.apache.pig.PigServer.store(PigServer.java:536)
> at org.apache.pig.PigServer.openIterator(PigServer.java:464)
> ... 6 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.pig.impl.logicalLayer.LOCogroup.unsetSchema(LOCogroup.java:359)
> at 
> org.apache.pig.impl.logicalLayer.optimizer.SchemaRemover.visit(SchemaRemover.java:64)
> at 
> org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:335)
> at or

Build failed in Hudson: Pig-Patch-minerva.apache.org #158

2009-08-11 Thread Apache Hudson Server
See 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/158/changes

Changes:

[gates] PIG-833: Added Zebra, new columnar storage mechanism for HDFS.

--
[...truncated 103108 lines...]
 [exec] [junit] 09/08/12 01:19:32 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
 [exec] [junit] 09/08/12 01:19:32 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
 [exec] [junit] 09/08/12 01:19:32 WARN dfs.DataNode: Unexpected error 
trying to delete block blk_-1535404250649000663_1004. BlockInfo not found in 
volumeMap.
 [exec] [junit] 09/08/12 01:19:32 INFO dfs.DataNode: Deleting block 
blk_4954179736192186775_1006 file dfs/data/data8/current/blk_4954179736192186775
 [exec] [junit] 09/08/12 01:19:32 WARN dfs.DataNode: 
java.io.IOException: Error in deleting blocks.
 [exec] [junit] at 
org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:1146)
 [exec] [junit] at 
org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:793)
 [exec] [junit] at 
org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:663)
 [exec] [junit] at 
org.apache.hadoop.dfs.DataNode.run(DataNode.java:2888)
 [exec] [junit] at java.lang.Thread.run(Thread.java:619)
 [exec] [junit] 
 [exec] [junit] 09/08/12 01:19:33 INFO 
mapReduceLayer.JobControlCompiler: Setting up single store job
 [exec] [junit] 09/08/12 01:19:33 WARN mapred.JobClient: Use 
GenericOptionsParser for parsing the arguments. Applications should implement 
Tool for the same.
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200908120118_0002/job.jar. 
blk_2669403222345271811_1012
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Receiving block 
blk_2669403222345271811_1012 src: /127.0.0.1:58050 dest: /127.0.0.1:40049
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Receiving block 
blk_2669403222345271811_1012 src: /127.0.0.1:38276 dest: /127.0.0.1:54901
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Receiving block 
blk_2669403222345271811_1012 src: /127.0.0.1:48397 dest: /127.0.0.1:34055
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Received block 
blk_2669403222345271811_1012 of size 1476187 from /127.0.0.1
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: PacketResponder 0 
for block blk_2669403222345271811_1012 terminating
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:34055 is added to 
blk_2669403222345271811_1012 size 1476187
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Received block 
blk_2669403222345271811_1012 of size 1476187 from /127.0.0.1
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:54901 is added to 
blk_2669403222345271811_1012 size 1476187
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: PacketResponder 1 
for block blk_2669403222345271811_1012 terminating
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Received block 
blk_2669403222345271811_1012 of size 1476187 from /127.0.0.1
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40049 is added to 
blk_2669403222345271811_1012 size 1476187
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: PacketResponder 2 
for block blk_2669403222345271811_1012 terminating
 [exec] [junit] 09/08/12 01:19:33 INFO fs.FSNamesystem: Increasing 
replication for file 
/tmp/hadoop-hudson/mapred/system/job_200908120118_0002/job.jar. New replication 
is 2
 [exec] [junit] 09/08/12 01:19:33 INFO fs.FSNamesystem: Reducing 
replication for file 
/tmp/hadoop-hudson/mapred/system/job_200908120118_0002/job.jar. New replication 
is 2
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200908120118_0002/job.split. 
blk_-777871427035102840_1013
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Receiving block 
blk_-777871427035102840_1013 src: /127.0.0.1:48398 dest: /127.0.0.1:34055
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Receiving block 
blk_-777871427035102840_1013 src: /127.0.0.1:58054 dest: /127.0.0.1:40049
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Receiving block 
blk_-777871427035102840_1013 src: /127.0.0.1:38280 dest: /127.0.0.1:54901
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Received block 
blk_-777871427035102840_1013 of size 1837 from /127.0.0.1
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: PacketResponder 0 
for block blk_-777871427035102840_1013 terminating
 [exe

[jira] Commented: (PIG-913) Error in Pig script when grouping on chararray column

2009-08-11 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742150#action_12742150
 ] 

Santhosh Srinivasan commented on PIG-913:
-

+1 for the fix. As Dmitriy indicates, we need new unit test cases after Hudson 
verifies the patch.

> Error in Pig script when grouping on chararray column
> -
>
> Key: PIG-913
> URL: https://issues.apache.org/jira/browse/PIG-913
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.4.0
> Reporter: Viraj Bhat
> Priority: Critical
> Fix For: 0.4.0
>
> Attachments: PIG-913.patch
>
>
> I have a very simple script which fails at parse time due to the schema I 
> specified in the loader.
> {code}
> data = LOAD '/user/viraj/studenttab10k' AS (s:chararray);
> dataSmall = limit data 100;
> bb = GROUP dataSmall by $0;
> dump bb;
> {code}
> =
> 2009-08-06 18:47:56,297 [main] INFO  org.apache.pig.Main - Logging error 
> messages to: /homes/viraj/pig-svn/trunk/pig_1249609676296.log
> 09/08/06 18:47:56 INFO pig.Main: Logging error messages to: 
> /homes/viraj/pig-svn/trunk/pig_1249609676296.log
> 2009-08-06 18:47:56,459 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: hdfs://localhost:9000
> 09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to hadoop 
> file system at: hdfs://localhost:9000
> 2009-08-06 18:47:56,694 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to map-reduce job tracker at: localhost:9001
> 09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to 
> map-reduce job tracker at: localhost:9001
> 2009-08-06 18:47:57,008 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1002: Unable to store alias bb
> 09/08/06 18:47:57 ERROR grunt.Grunt: ERROR 1002: Unable to store alias bb
> Details at logfile: /homes/viraj/pig-svn/trunk/pig_1249609676296.log
> =
> =
> Pig Stack Trace
> ---
> ERROR 1002: Unable to store alias bb
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias bb
> at org.apache.pig.PigServer.openIterator(PigServer.java:481)
> at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:531)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
> at org.apache.pig.Main.main(Main.java:397)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: 
> Unable to store alias bb
> at org.apache.pig.PigServer.store(PigServer.java:536)
> at org.apache.pig.PigServer.openIterator(PigServer.java:464)
> ... 6 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.pig.impl.logicalLayer.LOCogroup.unsetSchema(LOCogroup.java:359)
> at 
> org.apache.pig.impl.logicalLayer.optimizer.SchemaRemover.visit(SchemaRemover.java:64)
> at 
> org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:335)
> at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:46)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
> at 
> org.apache.pig.impl.logicalLayer.optimizer.LogicalTransformer.rebuildSchemas(LogicalTransformer.java:67)
> at 
> org.apache.pig.impl.logicalLayer.optimizer.LogicalOptimizer.optimize(LogicalOptimizer.java:187)
> at org.apache.pig.PigServer.compileLp(PigServer.java:854)
> at org.apache.pig.PigServer.compileLp(PigServer.java:791)
> at org.apache.pig.PigServer.store(PigServer.java:509)
> ... 7 more
> =




[jira] Commented: (PIG-893) support cast of chararray to other simple types

2009-08-11 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742144#action_12742144
 ] 

Alan Gates commented on PIG-893:


I'm reviewing this patch.

> support cast of chararray to other simple types
> ---
>
> Key: PIG-893
> URL: https://issues.apache.org/jira/browse/PIG-893
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.4.0
>Reporter: Thejas M Nair
>Assignee: Jeff Zhang
> Fix For: 0.4.0
>
> Attachments: Pig_893.Patch
>
>
> Pig should support casting of chararray to 
> integer, long, float, double, and bytearray. If the conversion fails for 
> reasons such as overflow, the cast should return null and log a warning.
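
The behavior requested above (return null and log a warning when a conversion fails) can be illustrated with a minimal Java sketch. This is not the code from Pig_893.Patch; the class name ChararrayCastSketch and its methods are hypothetical, and it relies only on the standard parsing methods in java.lang. The bytearray case is omitted.

```java
import java.util.logging.Logger;

// Hypothetical illustration only; not taken from the PIG-893 patch.
final class ChararrayCastSketch {
    private static final Logger LOG = Logger.getLogger("ChararrayCastSketch");

    // Returns the parsed int, or null (with a warning) if the text is not a
    // valid int, including values that overflow the 32-bit range.
    static Integer toInteger(String s) {
        if (s == null) {
            return null;
        }
        try {
            return Integer.valueOf(s.trim());
        } catch (NumberFormatException e) {
            LOG.warning("Unable to cast chararray '" + s + "' to int; returning null");
            return null;
        }
    }

    // Same contract for long.
    static Long toLong(String s) {
        if (s == null) {
            return null;
        }
        try {
            return Long.valueOf(s.trim());
        } catch (NumberFormatException e) {
            LOG.warning("Unable to cast chararray '" + s + "' to long; returning null");
            return null;
        }
    }

    // Same contract for double; malformed text yields null.
    static Double toDouble(String s) {
        if (s == null) {
            return null;
        }
        try {
            return Double.valueOf(s.trim());
        } catch (NumberFormatException e) {
            LOG.warning("Unable to cast chararray '" + s + "' to double; returning null");
            return null;
        }
    }
}
```

With this sketch, a value such as "99999999999" becomes null when cast to int but parses normally as a long, which is the overflow case the description mentions.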




[jira] Commented: (PIG-907) Provide multiple version of HashFNV (Piggybank)

2009-08-11 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742137#action_12742137
 ] 

Olga Natkovich commented on PIG-907:


+1

> Provide multiple version of HashFNV (Piggybank)
> ---
>
> Key: PIG-907
> URL: https://issues.apache.org/jira/browse/PIG-907
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Daniel Dai
>Priority: Minor
> Fix For: 0.4.0
>
> Attachments: PIG-907-1.patch, PIG-907-2.patch
>
>
> HashFNV takes 1 or 2 parameters. Until PIG-902 is resolved, it is better to 
> provide 2 versions of HashFNV so that Pig can pick the right version and do 
> the type cast. Otherwise, users have to cast explicitly.
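
The idea of offering one version per parameter count can be sketched in plain Java with two overloads. This is an assumption-laden illustration, not the Piggybank implementation: the class name HashFnvSketch is hypothetical, the 32-bit FNV-1a variant is chosen only for the example, and the meaning of the second parameter (a modulus) is assumed.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the Piggybank HashFNV code.
final class HashFnvSketch {
    private static final int FNV_OFFSET_BASIS = 0x811c9dc5;
    private static final int FNV_PRIME = 0x01000193;

    // One-parameter version: 32-bit FNV-1a over the UTF-8 bytes of s.
    static int hash(String s) {
        int h = FNV_OFFSET_BASIS;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);   // mix in the next byte
            h *= FNV_PRIME;    // int multiplication wraps modulo 2^32
        }
        return h;
    }

    // Two-parameter version: the unsigned hash reduced modulo mod
    // (the modulus semantics are an assumption made for this sketch).
    static long hash(String s, int mod) {
        if (mod <= 0) {
            throw new IllegalArgumentException("mod must be positive");
        }
        return (hash(s) & 0xffffffffL) % mod;
    }
}
```

Because each overload declares its own parameter types, the caller never has to cast arguments to select a version, which is the convenience the description asks for.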




[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler

2009-08-11 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-890:


Attachment: sampler.patch

Made some constants static to clear the Findbugs warnings. This patch does not 
warrant a new test case, since it only affects the performance of the skewed 
join sampler, and the SkewedJoin test case already covers the correctness of the 
join.

> Create a sampler interface and improve the skewed join sampler
> --
>
> Key: PIG-890
> URL: https://issues.apache.org/jira/browse/PIG-890
> Project: Pig
>  Issue Type: Improvement
>Reporter: Sriranjan Manjunath
> Attachments: sampler.patch
>
>
> We need a different sampler for order by and skewed join. We thus need a 
> better sampling interface. The design of the same is described here: 
> http://wiki.apache.org/pig/PigSampler




[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler

2009-08-11 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-890:


Status: Patch Available  (was: Open)

> Create a sampler interface and improve the skewed join sampler
> --
>
> Key: PIG-890
> URL: https://issues.apache.org/jira/browse/PIG-890
> Project: Pig
>  Issue Type: Improvement
>Reporter: Sriranjan Manjunath
> Attachments: sampler.patch
>
>
> We need a different sampler for order by and skewed join. We thus need a 
> better sampling interface. The design of the same is described here: 
> http://wiki.apache.org/pig/PigSampler




[jira] Commented: (PIG-890) Create a sampler interface and improve the skewed join sampler

2009-08-11 Thread Sriranjan Manjunath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742136#action_12742136
 ] 

Sriranjan Manjunath commented on PIG-890:
-

Let me know if you think that this requires a test case and I will be happy to 
include it.

> Create a sampler interface and improve the skewed join sampler
> --
>
> Key: PIG-890
> URL: https://issues.apache.org/jira/browse/PIG-890
> Project: Pig
>  Issue Type: Improvement
>Reporter: Sriranjan Manjunath
> Attachments: sampler.patch
>
>
> We need a different sampler for order by and skewed join. We thus need a 
> better sampling interface. The design of the same is described here: 
> http://wiki.apache.org/pig/PigSampler




[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler

2009-08-11 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-890:


Attachment: (was: sampler.patch)

> Create a sampler interface and improve the skewed join sampler
> --
>
> Key: PIG-890
> URL: https://issues.apache.org/jira/browse/PIG-890
> Project: Pig
>  Issue Type: Improvement
>Reporter: Sriranjan Manjunath
> Attachments: sampler.patch
>
>
> We need a different sampler for order by and skewed join. We thus need a 
> better sampling interface. The design of the same is described here: 
> http://wiki.apache.org/pig/PigSampler




[jira] Commented: (PIG-913) Error in Pig script when grouping on chararray column

2009-08-11 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742124#action_12742124
 ] 

Daniel Dai commented on PIG-913:


Thanks, Dmitriy. 
I will add a unit test. I submitted the patch first to see whether it breaks 
any existing unit tests.

> Error in Pig script when grouping on chararray column
> -
>
> Key: PIG-913
> URL: https://issues.apache.org/jira/browse/PIG-913
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Viraj Bhat
>Priority: Critical
> Fix For: 0.4.0
>
> Attachments: PIG-913.patch
>
>
> I have a very simple script which fails at parse time due to the schema I 
> specified in the loader.
> {code}
> data = LOAD '/user/viraj/studenttab10k' AS (s:chararray);
> dataSmall = limit data 100;
> bb = GROUP dataSmall by $0;
> dump bb;
> {code}
> =
> 2009-08-06 18:47:56,297 [main] INFO  org.apache.pig.Main - Logging error 
> messages to: /homes/viraj/pig-svn/trunk/pig_1249609676296.log
> 09/08/06 18:47:56 INFO pig.Main: Logging error messages to: 
> /homes/viraj/pig-svn/trunk/pig_1249609676296.log
> 2009-08-06 18:47:56,459 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: hdfs://localhost:9000
> 09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to hadoop 
> file system at: hdfs://localhost:9000
> 2009-08-06 18:47:56,694 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to map-reduce job tracker at: localhost:9001
> 09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to 
> map-reduce job tracker at: localhost:9001
> 2009-08-06 18:47:57,008 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1002: Unable to store alias bb
> 09/08/06 18:47:57 ERROR grunt.Grunt: ERROR 1002: Unable to store alias bb
> Details at logfile: /homes/viraj/pig-svn/trunk/pig_1249609676296.log
> =
> =
> Pig Stack Trace
> ---
> ERROR 1002: Unable to store alias bb
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias bb
> at org.apache.pig.PigServer.openIterator(PigServer.java:481)
> at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:531)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
> at org.apache.pig.Main.main(Main.java:397)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: 
> Unable to store alias bb
> at org.apache.pig.PigServer.store(PigServer.java:536)
> at org.apache.pig.PigServer.openIterator(PigServer.java:464)
> ... 6 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.pig.impl.logicalLayer.LOCogroup.unsetSchema(LOCogroup.java:359)
> at 
> org.apache.pig.impl.logicalLayer.optimizer.SchemaRemover.visit(SchemaRemover.java:64)
> at 
> org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:335)
> at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:46)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
> at 
> org.apache.pig.impl.logicalLayer.optimizer.LogicalTransformer.rebuildSchemas(LogicalTransformer.java:67)
> at 
> org.apache.pig.impl.logicalLayer.optimizer.LogicalOptimizer.optimize(LogicalOptimizer.java:187)
> at org.apache.pig.PigServer.compileLp(PigServer.java:854)
> at org.apache.pig.PigServer.compileLp(PigServer.java:791)
> at org.apache.pig.PigServer.store(PigServer.java:509)
> ... 7 more
> =




[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler

2009-08-11 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-890:


Status: Open  (was: Patch Available)

> Create a sampler interface and improve the skewed join sampler
> --
>
> Key: PIG-890
> URL: https://issues.apache.org/jira/browse/PIG-890
> Project: Pig
>  Issue Type: Improvement
>Reporter: Sriranjan Manjunath
> Attachments: sampler.patch
>
>
> We need a different sampler for order by and skewed join. We thus need a 
> better sampling interface. The design of the same is described here: 
> http://wiki.apache.org/pig/PigSampler




[jira] Commented: (PIG-913) Error in Pig script when grouping on chararray column

2009-08-11 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742120#action_12742120
 ] 

Dmitriy V. Ryaboy commented on PIG-913:
---

Daniel -- throw in a test to check for optimizer regressions in the future?

> Error in Pig script when grouping on chararray column
> -
>
> Key: PIG-913
> URL: https://issues.apache.org/jira/browse/PIG-913
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Viraj Bhat
>Priority: Critical
> Fix For: 0.4.0
>
> Attachments: PIG-913.patch
>
>



[jira] Commented: (PIG-890) Create a sampler interface and improve the skewed join sampler

2009-08-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742118#action_12742118
 ] 

Hadoop QA commented on PIG-890:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12416250/sampler.patch
  against trunk revision 801865.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 6 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/157/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/157/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/157/console

This message is automatically generated.

> Create a sampler interface and improve the skewed join sampler
> --
>
> Key: PIG-890
> URL: https://issues.apache.org/jira/browse/PIG-890
> Project: Pig
>  Issue Type: Improvement
>Reporter: Sriranjan Manjunath
> Attachments: sampler.patch
>
>
> We need a different sampler for order by and skewed join. We thus need a 
> better sampling interface. The design of the same is described here: 
> http://wiki.apache.org/pig/PigSampler




Build failed in Hudson: Pig-Patch-minerva.apache.org #157

2009-08-11 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/157/

--
[...truncated 103063 lines...]
 [exec] [junit] 09/08/11 23:36:15 INFO dfs.DataNode: Received block 
blk_-6509224781215538639_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/08/11 23:36:15 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40940 is added to 
blk_-6509224781215538639_1011 size 6
 [exec] [junit] 09/08/11 23:36:15 INFO dfs.DataNode: PacketResponder 1 
for block blk_-6509224781215538639_1011 terminating
 [exec] [junit] 09/08/11 23:36:15 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:38934 is added to 
blk_-6509224781215538639_1011 size 6
 [exec] [junit] 09/08/11 23:36:15 INFO dfs.DataNode: Received block 
blk_-6509224781215538639_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/08/11 23:36:15 INFO dfs.DataNode: PacketResponder 2 
for block blk_-6509224781215538639_1011 terminating
 [exec] [junit] 09/08/11 23:36:15 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:56715 is added to 
blk_-6509224781215538639_1011 size 6
 [exec] [junit] 09/08/11 23:36:15 INFO 
executionengine.HExecutionEngine: Connecting to hadoop file system at: 
hdfs://localhost:40772
 [exec] [junit] 09/08/11 23:36:15 INFO 
executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: 
localhost:42304
 [exec] [junit] 09/08/11 23:36:15 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
 [exec] [junit] 09/08/11 23:36:15 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
 [exec] [junit] 09/08/11 23:36:16 WARN dfs.DataNode: Unexpected error 
trying to delete block blk_-7801099502017534561_1004. BlockInfo not found in 
volumeMap.
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Deleting block 
blk_-7252209396593481868_1006 file 
dfs/data/data7/current/blk_-7252209396593481868
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Deleting block 
blk_-1800239565210147527_1005 file 
dfs/data/data8/current/blk_-1800239565210147527
 [exec] [junit] 09/08/11 23:36:16 WARN dfs.DataNode: 
java.io.IOException: Error in deleting blocks.
 [exec] [junit] at 
org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:1146)
 [exec] [junit] at 
org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:793)
 [exec] [junit] at 
org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:663)
 [exec] [junit] at 
org.apache.hadoop.dfs.DataNode.run(DataNode.java:2888)
 [exec] [junit] at java.lang.Thread.run(Thread.java:619)
 [exec] [junit] 
 [exec] [junit] 09/08/11 23:36:16 INFO 
mapReduceLayer.JobControlCompiler: Setting up single store job
 [exec] [junit] 09/08/11 23:36:16 WARN mapred.JobClient: Use 
GenericOptionsParser for parsing the arguments. Applications should implement 
Tool for the same.
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200908112335_0002/job.jar. 
blk_5812011963372313027_1012
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Receiving block 
blk_5812011963372313027_1012 src: /127.0.0.1:56518 dest: /127.0.0.1:37446
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Receiving block 
blk_5812011963372313027_1012 src: /127.0.0.1:53963 dest: /127.0.0.1:40940
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Receiving block 
blk_5812011963372313027_1012 src: /127.0.0.1:36671 dest: /127.0.0.1:56715
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Received block 
blk_5812011963372313027_1012 of size 1480752 from /127.0.0.1
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: PacketResponder 0 
for block blk_5812011963372313027_1012 terminating
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Received block 
blk_5812011963372313027_1012 of size 1480752 from /127.0.0.1
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: PacketResponder 1 
for block blk_5812011963372313027_1012 terminating
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:56715 is added to 
blk_5812011963372313027_1012 size 1480752
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Received block 
blk_5812011963372313027_1012 of size 1480752 from /127.0.0.1
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40940 is added to 
blk_5812011963372313027_1012 size 1480752
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: PacketResponder 2 
for block blk_5812011963372313027_1012 terminating
 [exec] [junit] 09/08/11 23:36:16 IN

[jira] Commented: (PIG-913) Error in Pig script when grouping on chararray column

2009-08-11 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742117#action_12742117
 ] 

Daniel Dai commented on PIG-913:


The problem is caused by OpLimitOptimizer, which needs to rewire operators 
correctly.

> Error in Pig script when grouping on chararray column
> -
>
> Key: PIG-913
> URL: https://issues.apache.org/jira/browse/PIG-913
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Viraj Bhat
>Priority: Critical
> Fix For: 0.4.0
>
> Attachments: PIG-913.patch
>
>



[jira] Updated: (PIG-913) Error in Pig script when grouping on chararray column

2009-08-11 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-913:
---

Status: Patch Available  (was: Open)

> Error in Pig script when grouping on chararray column
> -
>
> Key: PIG-913
> URL: https://issues.apache.org/jira/browse/PIG-913
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Viraj Bhat
>Priority: Critical
> Fix For: 0.4.0
>
> Attachments: PIG-913.patch
>
>



[jira] Updated: (PIG-913) Error in Pig script when grouping on chararray column

2009-08-11 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-913:
---

Attachment: PIG-913.patch

> Error in Pig script when grouping on chararray column
> -
>
> Key: PIG-913
> URL: https://issues.apache.org/jira/browse/PIG-913
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Viraj Bhat
>Priority: Critical
> Fix For: 0.4.0
>
> Attachments: PIG-913.patch
>
>



[jira] Commented: (PIG-833) Storage access layer

2009-08-11 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742100#action_12742100
 ] 

Alan Gates commented on PIG-833:


Patch checked in.  All the unit tests passed.

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
> PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
> TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.




[jira] Commented: (PIG-833) Storage access layer

2009-08-11 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742093#action_12742093
 ] 

Alan Gates commented on PIG-833:


My bad.  I missed the line in the instructions where it said to apply the 
PIG-660 patch.  I applied that and am trying again.

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
> PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
> TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.




[jira] Commented: (PIG-833) Storage access layer

2009-08-11 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742083#action_12742083
 ] 

Dmitriy V. Ryaboy commented on PIG-833:
---

Alan -- if it's not finding .dfs , it's probably not linking hadoop20.jar

Try my patch in 660 :-)

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
> PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
> TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.




[jira] Updated: (PIG-833) Storage access layer

2009-08-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-833:
---

Attachment: TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt

Okay, now that I've built Pig's tests first, I ran the zebra tests and got:

{code}
 [delete] Deleting directory 
/Users/gates/src/pig/apache/top/zebra/trunk/build/contrib/zebra/test/logs
[mkdir] Created dir: 
/Users/gates/src/pig/apache/top/zebra/trunk/build/contrib/zebra/test/logs
[junit] Running org.apache.hadoop.zebra.io.TestCheckin
[junit] Tests run: 125, Failures: 0, Errors: 0, Time elapsed: 16.894 sec
[junit] Running org.apache.hadoop.zebra.mapred.TestCheckin
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 158.741 sec
[junit] Running org.apache.hadoop.zebra.pig.TestCheckin1
[junit] Tests run: 0, Failures: 0, Errors: 2, Time elapsed: 0.13 sec
[junit] Test org.apache.hadoop.zebra.pig.TestCheckin1 FAILED
[junit] Running org.apache.hadoop.zebra.pig.TestCheckin2
[junit] Tests run: 0, Failures: 0, Errors: 2, Time elapsed: 0.131 sec
[junit] Test org.apache.hadoop.zebra.pig.TestCheckin2 FAILED
[junit] Running org.apache.hadoop.zebra.pig.TestCheckin3
[junit] Tests run: 0, Failures: 0, Errors: 2, Time elapsed: 0.133 sec
[junit] Test org.apache.hadoop.zebra.pig.TestCheckin3 FAILED
[junit] Running org.apache.hadoop.zebra.pig.TestCheckin4
[junit] Tests run: 0, Failures: 0, Errors: 2, Time elapsed: 0.128 sec
[junit] Test org.apache.hadoop.zebra.pig.TestCheckin4 FAILED
[junit] Running org.apache.hadoop.zebra.pig.TestCheckin5
[junit] Tests run: 0, Failures: 0, Errors: 2, Time elapsed: 0.128 sec
[junit] Test org.apache.hadoop.zebra.pig.TestCheckin5 FAILED
[junit] Running org.apache.hadoop.zebra.types.TestCheckin
[junit] Tests run: 45, Failures: 0, Errors: 0, Time elapsed: 0.253 sec
{code}

I've attached the output from one of the tests.

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
> PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
> TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.




[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler

2009-08-11 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-890:


Attachment: sampler.patch

The attached file has the redesigned sampler interface. Skewed join now uses a 
trivial implementation of the Poisson sampling mechanism.

> Create a sampler interface and improve the skewed join sampler
> --
>
> Key: PIG-890
> URL: https://issues.apache.org/jira/browse/PIG-890
> Project: Pig
>  Issue Type: Improvement
>Reporter: Sriranjan Manjunath
> Attachments: sampler.patch
>
>
> We need a different sampler for order by and skewed join. We thus need a 
> better sampling interface. The design of the same is described here: 
> http://wiki.apache.org/pig/PigSampler




[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler

2009-08-11 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-890:


Status: Patch Available  (was: Open)

> Create a sampler interface and improve the skewed join sampler
> --
>
> Key: PIG-890
> URL: https://issues.apache.org/jira/browse/PIG-890
> Project: Pig
>  Issue Type: Improvement
>Reporter: Sriranjan Manjunath
> Attachments: sampler.patch
>
>
> We need a different sampler for order by and skewed join. We thus need a 
> better sampling interface. The design of the same is described here: 
> http://wiki.apache.org/pig/PigSampler




[jira] Commented: (PIG-833) Storage access layer

2009-08-11 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742069#action_12742069
 ] 

Raghu Angadi commented on PIG-833:
--

Alan, in order to run unit tests you need to build pig test-core.

As mentioned in the instructions above please run {{'ant -Dtestcase=none 
test-core'}} under top level directory before running 'ant test' under 
contrib/zebra.


> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
> PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, test.out, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.




[jira] Updated: (PIG-833) Storage access layer

2009-08-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-833:
---

Attachment: test.out

When I run ant test in contrib/zebra, I get failures.  I've attached the output 
of the command.

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
> PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, test.out, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.




[jira] Updated: (PIG-833) Storage access layer

2009-08-11 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-833:
-

Attachment: PIG-833-zebra.patch.bz2

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
> PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.




[jira] Updated: (PIG-833) Storage access layer

2009-08-11 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-833:
-

Attachment: PIG-833-zebra.patch.bz2

Updated patch. The only change is that ant prints a descriptive error to the 
user if hadoop20.jar does not exist in the top-level lib directory. It lists the 
basic steps to get this built until PIG-660 is committed.


> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
> PIG-833-zebra.patch.bz2, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.




[jira] Created: (PIG-916) Change the pig hbase interface to get more than one row at a time when scanning

2009-08-11 Thread Alex Newman (JIRA)
Change the pig hbase interface to get more than one row at a time when scanning
---

 Key: PIG-916
 URL: https://issues.apache.org/jira/browse/PIG-916
 Project: Pig
  Issue Type: Improvement
Reporter: Alex Newman
Priority: Trivial


It should be significantly faster to get numerous rows at the same time rather 
than one row at a time for large table extraction processes.
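
The idea of fetching several rows per request can be illustrated with a minimal sketch. Everything here is hypothetical: RowSource and BatchingScanner are made-up names for illustration, and fetch() merely stands in for one round trip to a server; this is not the HBase client API or the Pig HBase loader.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical row source: one call to fetch() models one round trip.
// An empty list signals that no rows are left.
interface RowSource {
    List<String> fetch(int maxRows);
}

// Hypothetical scanner that hands out rows one at a time but pulls them
// from the source in batches, so the number of round trips shrinks.
class BatchingScanner implements Iterator<String> {
    private final RowSource source;
    private final int batchSize;
    private Iterator<String> current = new ArrayList<String>().iterator();
    private boolean exhausted = false;

    BatchingScanner(RowSource source, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.source = source;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        if (current.hasNext()) {
            return true;
        }
        if (exhausted) {
            return false;
        }
        // Local buffer is empty: do one round trip for the next batch.
        List<String> batch = source.fetch(batchSize);
        if (batch.isEmpty()) {
            exhausted = true;
            return false;
        }
        current = batch.iterator();
        return true;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

With a batch size of 1 this degenerates to one round trip per row, which is the behavior the issue wants to move away from.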




[jira] Commented: (PIG-916) Change the pig hbase interface to get more than one row at a time when scanning

2009-08-11 Thread Alex Newman (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742008#action_12742008
 ] 

Alex Newman commented on PIG-916:
-

Feel free to assign this to me.

> Change the pig hbase interface to get more than one row at a time when 
> scanning
> ---
>
> Key: PIG-916
> URL: https://issues.apache.org/jira/browse/PIG-916
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alex Newman
>Priority: Trivial
>
> It should be significantly faster to get numerous rows at the same time 
> rather than one row at a time for large table extraction processes.




[jira] Created: (PIG-915) Pig HBase

2009-08-11 Thread Alex Newman (JIRA)
Pig HBase
-

 Key: PIG-915
 URL: https://issues.apache.org/jira/browse/PIG-915
 Project: Pig
  Issue Type: Improvement
Reporter: Alex Newman
Priority: Minor


Currently there is no way to get the row names when querying HBase. We 
should probably remedy this, as important data may be stored there.




[jira] Commented: (PIG-914) Change the PIG hbase interface to use bytes along with strings

2009-08-11 Thread Alex Newman (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741997#action_12741997
 ] 

Alex Newman commented on PIG-914:
-

Someone should assign this to me.

> Change the PIG hbase interface to use bytes along with strings
> --
>
> Key: PIG-914
> URL: https://issues.apache.org/jira/browse/PIG-914
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alex Newman
>Priority: Minor
>
> Currently start rows, table names, and column names are all strings, but 
> HBase supports bytes. We might want to change the Pig interface to support 
> bytes along with strings.




[jira] Commented: (PIG-915) Pig HBase

2009-08-11 Thread Alex Newman (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741998#action_12741998
 ] 

Alex Newman commented on PIG-915:
-

Feel free to assign this to me.

> Pig HBase
> -
>
> Key: PIG-915
> URL: https://issues.apache.org/jira/browse/PIG-915
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alex Newman
>Priority: Minor
>
> Currently there is no way to get the row names when querying HBase. We 
> should probably remedy this, as important data may be stored there.




[jira] Created: (PIG-914) Change the PIG hbase interface to use bytes along with strings

2009-08-11 Thread Alex Newman (JIRA)
Change the PIG hbase interface to use bytes along with strings
--

 Key: PIG-914
 URL: https://issues.apache.org/jira/browse/PIG-914
 Project: Pig
  Issue Type: Improvement
Reporter: Alex Newman
Priority: Minor


Currently start rows, table names, and column names are all strings, but HBase 
supports bytes. We might want to change the Pig interface to support bytes along 
with strings.




[jira] Commented: (PIG-759) HBaseStorage scheme for Load/Slice function

2009-08-11 Thread Alex Newman (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741986#action_12741986
 ] 

Alex Newman commented on PIG-759:
-

Someone can feel free to assign this to me. I will fix up the syntax also. 
Should we switch everything from Strings to bytes? Is it even possible to 
pass bytes with Pig?

> HBaseStorage scheme for Load/Slice function
> ---
>
> Key: PIG-759
> URL: https://issues.apache.org/jira/browse/PIG-759
> Project: Pig
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
> Attachments: patch.p1
>
>
> We would like to change the HBaseStorage function to use a scheme when 
> loading a table in pig. The scheme we are thinking of is: "hbase". So in 
> order to load an hbase table in a pig script the statement should read:
> {noformat}
> table = load 'hbase://' using HBaseStorage();
> {noformat}
> If the scheme is omitted pig would assume the tablename to be an hdfs path 
> and the storage function would use the last component of the path as a table 
> name and output a warning.
> For details on why see jira issue: PIG-758
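
The lookup rule described in the quoted issue can be sketched roughly as follows. This is only an illustration under stated assumptions: HBaseLocation and tableName() are hypothetical names, the warning is written to stderr rather than Pig's logger, and the actual patch may resolve the location differently.

```java
// Hypothetical helper illustrating the proposed "hbase" scheme handling.
class HBaseLocation {
    static final String SCHEME = "hbase://";

    // Returns the table name for a load location.
    // With the scheme, everything after "hbase://" is the table name.
    // Without it, the location is treated as an HDFS-style path and the
    // last path component is used, with a warning.
    static String tableName(String location) {
        if (location.startsWith(SCHEME)) {
            return location.substring(SCHEME.length());
        }
        System.err.println("WARNING: no hbase:// scheme in '" + location
                + "'; using the last path component as the table name");
        // Drop a single trailing slash so "/a/b/" yields "b".
        String trimmed = location.endsWith("/")
                ? location.substring(0, location.length() - 1)
                : location;
        return trimmed.substring(trimmed.lastIndexOf('/') + 1);
    }
}
```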




[jira] Updated: (PIG-759) HBaseStorage scheme for Load/Slice function

2009-08-11 Thread Alex Newman (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Newman updated PIG-759:


Attachment: patch.p1

This allows you to select start rows and end rows to filter the table.

> HBaseStorage scheme for Load/Slice function
> ---
>
> Key: PIG-759
> URL: https://issues.apache.org/jira/browse/PIG-759
> Project: Pig
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
> Attachments: patch.p1
>
>
> We would like to change the HBaseStorage function to use a scheme when 
> loading a table in pig. The scheme we are thinking of is: "hbase". So in 
> order to load an hbase table in a pig script the statement should read:
> {noformat}
> table = load 'hbase://' using HBaseStorage();
> {noformat}
> If the scheme is omitted pig would assume the tablename to be an hdfs path 
> and the storage function would use the last component of the path as a table 
> name and output a warning.
> For details on why see jira issue: PIG-758




[jira] Commented: (PIG-893) support cast of chararray to other simple types

2009-08-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741963#action_12741963
 ] 

Hadoop QA commented on PIG-893:
---

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12416201/Pig_893.Patch
  against trunk revision 801865.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/156/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/156/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/156/console

This message is automatically generated.

> support cast of chararray to other simple types
> ---
>
> Key: PIG-893
> URL: https://issues.apache.org/jira/browse/PIG-893
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.4.0
>Reporter: Thejas M Nair
>Assignee: Jeff Zhang
> Fix For: 0.4.0
>
> Attachments: Pig_893.Patch
>
>
> Pig should support casting of chararray to 
> integer,long,float,double,bytearray. If the conversion fails for reasons such 
> as overflow, cast should return null and log a warning.
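
The "return null and log a warning" behavior requested above might look roughly like this in plain Java. This is a sketch, not the code in Pig_893.Patch: ChararrayCast and its methods are hypothetical names, and java.util.logging is used only to keep the example self-contained.

```java
import java.util.logging.Logger;

// Hypothetical helper showing null-on-failure casts from a string.
class ChararrayCast {
    private static final Logger LOG =
            Logger.getLogger(ChararrayCast.class.getName());

    // Returns the parsed Integer, or null if the text is not a valid
    // integer or does not fit in 32 bits (Integer.valueOf throws
    // NumberFormatException in both cases).
    static Integer toInteger(String s) {
        if (s == null) {
            return null;
        }
        try {
            return Integer.valueOf(s.trim());
        } catch (NumberFormatException e) {
            LOG.warning("Unable to cast '" + s + "' to integer; returning null");
            return null;
        }
    }

    // Same pattern for 64-bit values.
    static Long toLong(String s) {
        if (s == null) {
            return null;
        }
        try {
            return Long.valueOf(s.trim());
        } catch (NumberFormatException e) {
            LOG.warning("Unable to cast '" + s + "' to long; returning null");
            return null;
        }
    }
}
```

Note that floating-point parsing behaves differently: Double.valueOf does not throw on overflow but yields an infinite value, so a double cast would need its own policy.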




Hudson build is back to normal: Pig-Patch-minerva.apache.org #156

2009-08-11 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/156/




[jira] Updated: (PIG-893) support cast of chararray to other simple types

2009-08-11 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-893:
---

Attachment: Pig_893.Patch

Updated the patch.
1. Add license header (for the release audit warning).
2. Change new Long(long) to Long.valueOf(long) to address the FindBugs warning.
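
For context on the second item: FindBugs flags the boxing constructor because Long.valueOf may return a cached instance instead of always allocating. A tiny illustration follows; BoxingExample is a made-up class name and not part of the patch.

```java
// Hypothetical example of the boxing change described above.
class BoxingExample {
    // Preferred form: valueOf may reuse cached Long instances
    // (values from -128 to 127 are always cached), whereas
    // "new Long(value)" allocates a fresh object on every call.
    static Long box(long value) {
        return Long.valueOf(value);
    }
}
```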

> support cast of chararray to other simple types
> ---
>
> Key: PIG-893
> URL: https://issues.apache.org/jira/browse/PIG-893
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.4.0
>Reporter: Thejas M Nair
>Assignee: Jeff Zhang
> Fix For: 0.4.0
>
> Attachments: Pig_893.Patch
>
>
> Pig should support casting of chararray to 
> integer,long,float,double,bytearray. If the conversion fails for reasons such 
> as overflow, cast should return null and log a warning.




[jira] Updated: (PIG-893) support cast of chararray to other simple types

2009-08-11 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-893:
---

Status: Patch Available  (was: Open)

> support cast of chararray to other simple types
> ---
>
> Key: PIG-893
> URL: https://issues.apache.org/jira/browse/PIG-893
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.4.0
>Reporter: Thejas M Nair
>Assignee: Jeff Zhang
> Fix For: 0.4.0
>
> Attachments: Pig_893.Patch
>
>
> Pig should support casting of chararray to 
> integer,long,float,double,bytearray. If the conversion fails for reasons such 
> as overflow, cast should return null and log a warning.




[jira] Updated: (PIG-893) support cast of chararray to other simple types

2009-08-11 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-893:
---

Attachment: (was: Pig_893.Patch)

> support cast of chararray to other simple types
> ---
>
> Key: PIG-893
> URL: https://issues.apache.org/jira/browse/PIG-893
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.4.0
>Reporter: Thejas M Nair
>Assignee: Jeff Zhang
> Fix For: 0.4.0
>
>
> Pig should support casting of chararray to 
> integer,long,float,double,bytearray. If the conversion fails for reasons such 
> as overflow, cast should return null and log a warning.




[jira] Updated: (PIG-893) support cast of chararray to other simple types

2009-08-11 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-893:
---

Status: Open  (was: Patch Available)

> support cast of chararray to other simple types
> ---
>
> Key: PIG-893
> URL: https://issues.apache.org/jira/browse/PIG-893
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.4.0
>Reporter: Thejas M Nair
>Assignee: Jeff Zhang
> Fix For: 0.4.0
>
>
> Pig should support casting of chararray to 
> integer,long,float,double,bytearray. If the conversion fails for reasons such 
> as overflow, cast should return null and log a warning.




Hudson build is back to normal: Pig-trunk #519

2009-08-11 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-trunk/519/




[jira] Commented: (PIG-845) PERFORMANCE: Merge Join

2009-08-11 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741733#action_12741733
 ] 

Ashutosh Chauhan commented on PIG-845:
--

Hi Dmitriy,

Thanks for review. Please find my comments inline.

1.
EndOfAllInput flags - could you add comments here about what the point of this 
flag is? You explain what EndOfAllInputSetter does (which is actually rather 
self-explanatory) but not what the meaning of the flag is and how it's used. 
There is a bit of an explanation in PigMapBase, but it really belongs here.
>> The EndOfAllInput flag indicates that, on the close() call of a map/reduce 
>> task, the pipeline should be run once more. Until now it was used only by 
>> POStream, but now POMergeJoin also makes use of it.
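
As a rough illustration of that description (not Pig's actual classes): BufferingOperator and SketchMapTask below are hypothetical stand-ins, assuming an operator that can only emit its buffered records once it is told that no more input will arrive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical operator that holds records until end of input is signalled.
class BufferingOperator {
    private final List<String> buffered = new ArrayList<>();
    private boolean endOfAllInput = false;

    void setEndOfAllInput(boolean value) {
        endOfAllInput = value;
    }

    // One pass of the pipeline: buffers the record if one is given, and
    // emits everything buffered only once the end-of-input flag is set.
    List<String> run(String record) {
        if (record != null) {
            buffered.add(record);
        }
        List<String> out = new ArrayList<>();
        if (endOfAllInput) {
            out.addAll(buffered);
            buffered.clear();
        }
        return out;
    }
}

// Hypothetical task wrapper: map() is called per record, close() once.
class SketchMapTask {
    private final BufferingOperator op = new BufferingOperator();
    private final boolean needsFinalRun; // assumed to be decided at plan time
    final List<String> emitted = new ArrayList<>();

    SketchMapTask(boolean needsFinalRun) {
        this.needsFinalRun = needsFinalRun;
    }

    void map(String record) {
        emitted.addAll(op.run(record));
    }

    // On close, set the flag and run the pipeline one more time so the
    // operator can flush whatever it was holding back.
    void close() {
        if (needsFinalRun) {
            op.setEndOfAllInput(true);
            emitted.addAll(op.run(null));
        }
    }
}
```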

2.
Could you explain the relationship between EndOfAllInput and (deleted) POStream?
>> POStream is still there; I guess you are referring to MRStreamHandler, which 
>> is deleted. It is a renaming of the class. Now that POMergeJoin also makes 
>> use of it, it is better to give it a generic name like EndOfAllInput instead 
>> of MRStreamHandler.

3.
Comments in MRCompiler alternate between referring to the left MROp as 
LeftMROper and curMROper. Choose one.
>> Ya, will update the comments.

4.
I am curious about the decision to throw compiler exceptions if MergeJoin 
requirements re number of inputs, etc, aren't satisfied. It seems like a better 
user experience would be to log a warning and fall back to a regular join.
>> Ya, a good suggestion. It would be straightforward to do it while parsing 
>> (e.g. when there are more than two inputs), though it is not straightforward 
>> to do at logical-to-physical plan and physical-to-MRJobs translation time. 

5.
Style notes for visitMergeJoin:

It's a 200-line method. Any way you can break it up into smaller components? As 
is, it's hard to follow.
>> I can break it up, but that will bloat the MRCompiler class size. A better 
>> idea is to have an MRCompilerHelper or some such class where all the 
>> low-level helper functions live, so that MRCompiler itself is small and thus 
>> easier to read. 

The if statements should be broken up into multiple lines to agree with the 
style guides.

Variable naming: you've got topPrj, prj, pkg, lr, ce, nig.. one at a time they 
are fine, but together in a 200-line method they are unreadable. Please 
consider more descriptive names.
>> Will use more descriptive names in next patch.

6.
Kind of a global comment, since it applies to more than just MergeJoin:

It seems to me like we need a Builder for operators to clean up some of the 
new, set, set, set stuff.

Having the setters return this and a Plan's add() method return the plan, would 
let us replace this:

POProject topPrj = new POProject(new 
OperatorKey(scope,nig.getNextNodeId(scope)));
topPrj.setColumn(1);
topPrj.setResultType(DataType.TUPLE);
topPrj.setOverloaded(true);
rightMROpr.reducePlan.add(topPrj);
rightMROpr.reducePlan.connect(pkg, topPrj);

with this:

POProject topPrj = new POProject(new 
OperatorKey(scope,nig.getNextNodeId(scope)))
.setColumn(1).setResultType(DataType.TUPLE)
.setOverloaded(true);

rightMROpr.reducePlan.add(topPrj).connect(pkg, topPrj);

>> I agree. In many places there are too many parameters to set. Setters should 
>> be smart and return the object instead of being void; then this idea of 
>> chaining will help cut down the number of lines. 
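To make the chaining idea concrete, here is a self-contained sketch. SketchProject and SketchPlan are hypothetical stand-ins for POProject and a physical plan; the real Pig setters are void, so this shows the proposed shape, not the existing API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical operator whose setters return this so calls can be chained.
class SketchProject {
    int column;
    byte resultType;
    boolean overloaded;

    SketchProject setColumn(int column) { this.column = column; return this; }
    SketchProject setResultType(byte type) { this.resultType = type; return this; }
    SketchProject setOverloaded(boolean overloaded) { this.overloaded = overloaded; return this; }
}

// Hypothetical plan whose add() and connect() return the plan itself.
class SketchPlan {
    final List<Object> nodes = new ArrayList<>();
    final List<Object[]> edges = new ArrayList<>();

    SketchPlan add(Object node) { nodes.add(node); return this; }

    SketchPlan connect(Object from, Object to) {
        edges.add(new Object[] { from, to });
        return this;
    }
}

class FluentSketchDemo {
    // Builds one projection and wires it below an existing node, in two statements.
    static SketchPlan build(Object pkg, byte tupleType) {
        SketchProject topPrj = new SketchProject()
                .setColumn(1)
                .setResultType(tupleType)
                .setOverloaded(true);
        return new SketchPlan().add(pkg).add(topPrj).connect(pkg, topPrj);
    }
}
```

One trade-off: setters that return this no longer follow the strict JavaBeans convention, which matters only if some tool relies on void setters.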

7.
Is the change to List<List<Byte>> keyTypes in POFRJoin related to MergeJoin or 
just rolled in?
>> POFRJoin can do without this change, but to avoid code duplication I updated 
>> POFRJoin to use List<List<Byte>> keyTypes.

8. MergeJoin

break getNext() into components.
>> I don't want to do that because it already has lots of class members which 
>> are updated in various places. Spreading those variables across multiple 
>> functions will make the logic even harder to follow. Also, I am not sure the 
>> Java compiler can always inline the private methods.

I don't see you supporting Left outer joins. Plans for that? At least document 
the planned approach.
>> Ya, outer joins are currently not supported. It's documented in the 
>> specification. Will include a comment in the code also.

Error codes being declared deep inside classes, and documented on the wiki, is 
a poor practice, imo. They should be pulled out into PigErrors (as lightweight 
final objects that have an error code, a name, and a description..) I thought 
Santhosh made progress on this already, no?
>> Not sure if I understand you completely. I am using ExecException, 
>> FrontEndException etc. Aren't these the lightweight final objects you are 
>> referring to?
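For clarity, one possible reading of the PigErrors suggestion is a central catalogue of constants, separate from the exception classes. The sketch below is an assumption about what was meant; SketchPigError and the codes and messages in it are invented placeholders, not real Pig error codes.

```java
// Hypothetical central catalogue of errors; codes and texts are made up.
enum SketchPigError {
    MERGE_JOIN_TOO_MANY_INPUTS(9001, "Merge join supports exactly two inputs"),
    MERGE_JOIN_OUTER_UNSUPPORTED(9002, "Outer merge join is not supported");

    private final int code;
    private final String description;

    SketchPigError(int code, String description) {
        this.code = code;
        this.description = description;
    }

    int code() { return code; }

    String description() { return description; }

    // Single place that decides how an error is rendered for the user.
    String format() {
        return "ERROR " + code + " (" + name() + "): " + description;
    }
}
```

An exception such as ExecException would then be constructed from one of these constants, so the code, name and description live in one file instead of being scattered through the operators and the wiki.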

Could you explain the problem with splits and streams? Why can't this work for 
them?
>> Streaming after the join will be supported. There was a bug, which I have 
>> fixed; the fix will be part of the next patch. Streaming before Join will not be 
>> supported because in endOfAllInput case, str