[jira] [Commented] (PIG-3231) Problems with pig (TRUNK, 0.11) after upgrading to CDH4.2(yarn) using avro input

2013-03-04 Thread Tobias Schlottke (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592072#comment-13592072
 ] 

Tobias Schlottke commented on PIG-3231:
---

We've already been using the current trunk version of AvroStorage. I switched on 
ignoreBadFiles, and now I get a similar exception elsewhere:

{code}
2013-03-04 09:40:03,341 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.pig.backend.executionengine.ExecException: ERROR 2135: Received error from store function.Filesystem closed
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:165)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.runPipeline(POSplit.java:254)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:236)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:241)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:228)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:338)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
Caused by: java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:552)
    at org.apache.hadoop.hdfs.DFSOutputStream.writeChunk(DFSOutputStream.java:1406)
    at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:161)
    at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:136)
    at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:125)
    at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:116)
    at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:90)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:54)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at java.io.DataOutputStream.writeUTF(DataOutputStream.java:401)
    at java.io.DataOutputStream.writeUTF(DataOutputStream.java:323)
    at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
    at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:543)
    at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
    at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
    at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
    at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
    at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
    at org.apache.pig.impl.io.InterRecordWriter.write(InterRecordWriter.java:73)
    at org.apache.pig.impl.io.InterStorage.putNext(InterStorage.java:87)
{code}

Any ideas?



[jira] [Commented] (PIG-3231) Problems with pig (TRUNK, 0.11) after upgrading to CDH4.2(yarn) using avro input

2013-03-04 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592081#comment-13592081
 ] 

Cheolsoo Park commented on PIG-3231:


Enabling 'ignoreBadFiles' doesn't help even if you use the trunk version, 
because that option doesn't handle all the possible IOExceptions in AvroStorage.

The patch I worked on in PIG-3059 is meant to catch all possible IOExceptions 
in any LoadFunc implementation, but it is NOT committed to trunk/CDH4.2.
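The skip-bad-input behavior under discussion (ignoreBadFiles, and the broader PIG-3059-style catch of IOExceptions around a loader) can be modeled in a few lines. This is a self-contained sketch, not Pig's actual LoadFunc API; RecordSource, ListSource, and SkippingReader are hypothetical names:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Simplified model of "ignore bad input" handling. Names are made up for
// illustration; Pig's real LoadFunc/getNext() contract differs.
interface RecordSource {
    String next() throws IOException; // null signals end of input
}

// A source that yields records until it hits a simulated corrupt block.
class ListSource implements RecordSource {
    private final String[] records;
    private final int failAt; // index at which next() starts throwing
    private int i = 0;
    ListSource(String[] records, int failAt) { this.records = records; this.failAt = failAt; }
    public String next() throws IOException {
        if (i == failAt) throw new IOException("corrupt block");
        return i < records.length ? records[i++] : null;
    }
}

class SkippingReader {
    // With ignoreBadInput=true, a read error quietly ends this source and the
    // job moves on; with false (the default behavior), the task fails.
    static List<String> readAll(RecordSource source, boolean ignoreBadInput) {
        List<String> out = new ArrayList<>();
        while (true) {
            try {
                String rec = source.next();
                if (rec == null) return out;
                out.add(rec);
            } catch (IOException e) {
                if (!ignoreBadInput) throw new UncheckedIOException(e);
                return out; // give up on this source, keep what we read so far
            }
        }
    }
}
```

The real patch also has to decide how much input to skip (a record, a split, or a whole file); the sketch simply abandons the failing source.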

 Problems with pig (TRUNK, 0.11) after upgrading to CDH4.2(yarn) using avro input
 --------------------------------------------------------------------------------

 Key: PIG-3231
 URL: https://issues.apache.org/jira/browse/PIG-3231
 Project: Pig
 Issue Type: Bug
 Affects Versions: 0.11
 Environment: CDH4.2, yarn, avro
 Reporter: Tobias Schlottke

 Hi there,
 we've got a strange issue after switching to a new cluster with cdh4.2 (from cdh3):
 Pig seems to create temporary avro files for its map reduce jobs, which it either deletes or never creates.
 Pig fails with the "no error returned by hadoop" message, but in the namenode logs I found something interesting.
 The actual exception from the namenode log is:
 {code}
 2013-03-01 12:59:30,858 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 192.168.1.28:37814: error: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/metrigo/event_logger/compact_log/2013/01/14/_temporary/1/_temporary/attempt_1362133122980_0017_m_07_0/part-m-7.avro File does not exist. Holder DFSClient_attempt_1362133122980_0017_m_07_0_1992466008_1 does not have any open files.
 org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/metrigo/event_logger/compact_log/2013/01/14/_temporary/1/_temporary/attempt_1362133122980_0017_m_07_0/part-m-7.avro File does not exist. Holder DFSClient_attempt_1362133122980_0017_m_07_0_1992466008_1 does not have any open files.
     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2396)
     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2387)
     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2183)
     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:481)
     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:416)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
 {code}
 Please note that we're analyzing a bunch of files (~200 files, using glob matchers), and some of them are small.
 We made it work once without the small files.
 *Update*
 I found the following exception deep in the logs that seems to make the job fail:
 {code}
 2013-03-03 19:51:06,169 ERROR [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:metrigo (auth:SIMPLE) cause:java.io.IOException: org.apache.avro.AvroRuntimeException: java.io.IOException: Filesystem closed
 2013-03-03 19:51:06,170 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: org.apache.avro.AvroRuntimeException: java.io.IOException: Filesystem closed
     at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:357)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:526)
     at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
     at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:338)
     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
     at java.security.AccessController.doPrivileged(Native Method)
 {code}

[jira] [Resolved] (PIG-3148) OutOfMemory exception while spilling stale DefaultDataBag. Extra option to gc() before spilling large bag.

2013-03-04 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy resolved PIG-3148.
-

   Resolution: Fixed
Fix Version/s: 0.11.1
   0.12

Committed to 0.11.1 and trunk. Thanks Koji and Dmitriy.

 OutOfMemory exception while spilling stale DefaultDataBag. Extra option to 
 gc() before spilling large bag.
 --

 Key: PIG-3148
 URL: https://issues.apache.org/jira/browse/PIG-3148
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Koji Noguchi
Assignee: Koji Noguchi
 Fix For: 0.12, 0.11.1

 Attachments: pig-3148-v01.patch, pig-3148-v02.patch


 Our user reported that one of their jobs on Pig 0.10 occasionally failed with 
 'Error: GC overhead limit exceeded' or 'Error: Java heap space', but 
 rerunning it sometimes finished successfully.
 For a reducer with a 1 GB heap, a heap dump taken when it failed with OOM 
 showed two huge DefaultDataBags of 300-400 MB each.
 A jstack at the time of the OOM always showed that a spill was running.
 {noformat}
 "Low Memory Detector" daemon prio=10 tid=0xb9c11800 nid=0xa52 runnable [0xb9afc000]
    java.lang.Thread.State: RUNNABLE
     at java.io.FileOutputStream.writeBytes(Native Method)
     at java.io.FileOutputStream.write(FileOutputStream.java:260)
     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
     - locked <0xe57c6390> (a java.io.BufferedOutputStream)
     at java.io.DataOutputStream.write(DataOutputStream.java:90)
     - locked <0xe57c60b8> (a java.io.DataOutputStream)
     at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
     at org.apache.pig.data.utils.SedesHelper.writeBytes(SedesHelper.java:46)
     at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:537)
     at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
     at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
     at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
     at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
     at org.apache.pig.data.DefaultDataBag.spill(DefaultDataBag.java:106)
     - locked <0xceb16190> (a java.util.ArrayList)
     at org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(SpillableMemoryManager.java:243)
     - locked <0xbeb86318> (a java.util.LinkedList)
     at sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:138)
     at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
     at sun.management.MemoryPoolImpl$PoolSensor.triggerAction(MemoryPoolImpl.java:272)
     at sun.management.Sensor.trigger(Sensor.java:120)
 {noformat}
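The improvement resolved here boils down to: before spilling a large bag, optionally trigger a garbage collection, since a stale (already unreachable) bag may simply be collected, avoiding the spill and the OOM during it. A minimal sketch of that decision logic follows; the threshold value and all names are illustrative, not Pig's actual code:

```java
// Sketch of the "gc before spilling a large bag" option (PIG-3148 idea).
// The constant and method names here are made up for illustration.
class SpillPolicy {
    static final long GC_BEFORE_SPILL_THRESHOLD = 40L * 1024 * 1024; // 40 MB, illustrative

    // Returns true if we chose to gc() first: a stale bag may be freed by
    // the collector, making the expensive spill unnecessary.
    static boolean maybeGcBeforeSpill(long estimatedBagBytes, boolean gcEnabled) {
        if (gcEnabled && estimatedBagBytes > GC_BEFORE_SPILL_THRESHOLD) {
            System.gc(); // a hint only; the JVM may ignore it
            return true;
        }
        return false;
    }
}
```

The option is worth gating behind a flag and a size threshold, as the sketch does, because a forced gc() during every spill would be costly.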

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3231) Problems with pig (TRUNK, 0.11) after upgrading to CDH4.2(yarn) using avro input

2013-03-04 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592105#comment-13592105
 ] 

Rohini Palaniswamy commented on PIG-3231:
-

bq. It looks like the Avro file that Pig is trying to load doesn't exist on 
hdfs (or is corrupted). I have seen a similar issue when the filename is 
changed between when Pig launches a job on the front-end and when the job runs 
on the back-end.

You will not get a "Filesystem closed" exception for that; it will be a 
FileNotFoundException. If the file was renamed while it was being accessed, 
you will get a lease exception.

This exception happens when some code has closed the FileSystem while another 
piece of code still holds a reference to the same FileSystem object (because 
of the FileSystem cache). A quick glance at AvroStorage does not show an 
fs.close(), though. I suspect an fs.close() has been introduced somewhere in 
the Pig code, or, most likely, the user's UDF is calling fs.close().

Tobias,
If you are using a UDF, please check whether it calls fs.close() anywhere. If 
not, you can work around this until it is fixed by setting 
fs.hdfs.impl.disable.cache to true.
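The hazard described here comes from Hadoop's FileSystem cache: FileSystem.get() returns a shared, cached instance, so one caller's fs.close() closes it for everyone holding the same reference. The following is a tiny stand-alone model of that failure mode, not the real Hadoop classes (CachedFs and its methods are invented for illustration):

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Stand-alone model of the FileSystem cache hazard. get() hands every caller
// the same cached instance per URI, so close() by one component (e.g. a UDF)
// breaks all the others (e.g. Pig's own store path).
class CachedFs {
    private static final Map<String, CachedFs> CACHE = new HashMap<>();
    private boolean closed = false;

    static synchronized CachedFs get(String uri) {
        return CACHE.computeIfAbsent(uri, u -> new CachedFs());
    }

    void close() { closed = true; }

    void write(String path) throws IOException {
        // Same message the real DFSClient throws after a premature close
        if (closed) throw new IOException("Filesystem closed");
    }
}
```

As far as I understand the workaround, setting fs.hdfs.impl.disable.cache=true makes Hadoop construct a fresh FileSystem on each get() call, so a stray close() in user code no longer affects Pig's own handle.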


[jira] [Commented] (PIG-3231) Problems with pig (TRUNK, 0.11) after upgrading to CDH4.2(yarn) using avro input

2013-03-04 Thread Tobias Schlottke (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592132#comment-13592132
 ] 

Tobias Schlottke commented on PIG-3231:
---

Could it be something like filesystem limits as well? 
We rebooted the whole cluster for the first time since the installation. Now it 
seems to fail in reducers with this exception:

{code}
2013-03-04 11:57:02,214 ERROR [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:metrigo (auth:SIMPLE) cause:org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in InMemoryMerger - Thread to merge in-memory shuffled map-outputs
2013-03-04 11:57:02,215 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in InMemoryMerger - Thread to merge in-memory shuffled map-outputs
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:121)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:379)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
Caused by: java.lang.ClassCastException: org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl$CompressAwarePath cannot be cast to java.lang.Comparable
    at java.util.TreeMap.put(TreeMap.java:559)
    at java.util.TreeSet.add(TreeSet.java:255)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.closeOnDiskFile(MergeManagerImpl.java:340)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl$InMemoryMerger.merge(MergeManagerImpl.java:495)
    at org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)
{code}

Which leads us to this issue: 
https://issues.apache.org/jira/browse/MAPREDUCE-4965

CDH 4.2.0 seems to have introduced this; we're now patching MapReduce and 
giving it another spin.
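For reference, the MAPREDUCE-4965 failure mode is easy to reproduce in isolation: TreeSet is backed by TreeMap, and TreeMap.put() casts keys to Comparable when no Comparator is supplied, so adding a non-Comparable element throws exactly the ClassCastException seen above. A minimal stand-alone demo (NotComparablePath is a made-up stand-in for CompressAwarePath):

```java
import java.util.TreeSet;

// Minimal reproduction of the MAPREDUCE-4965 failure mode. Per the linked
// issue, CompressAwarePath implemented neither Comparable nor supplied a
// Comparator, so the first add() in closeOnDiskFile() threw.
class NotComparablePath {
    final String path;
    NotComparablePath(String path) { this.path = path; }
}

class MergeBugDemo {
    static boolean addThrowsClassCast() {
        TreeSet<NotComparablePath> onDiskFiles = new TreeSet<>();
        try {
            // TreeMap.put casts the key to Comparable internally, which
            // fails here (current JDKs check even the first insertion)
            onDiskFiles.add(new NotComparablePath("/local/map_0.out"));
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }
}
```

The fix is either to make the element type implement Comparable or to construct the TreeSet with an explicit Comparator.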




[jira] [Commented] (PIG-3231) Problems with pig (TRUNK, 0.11) after upgrading to CDH4.2(yarn) using avro input

2013-03-04 Thread Tobias Schlottke (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592199#comment-13592199
 ] 

Tobias Schlottke commented on PIG-3231:
---

Patching that worked like a charm. We'll see if the error still persists for 
any of our workflows.


[jira] [Updated] (PIG-3136) Introduce a syntax making declared aliases optional

2013-03-04 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-3136:
--

Attachment: PIG-3136-3.patch

I've updated the RB and this patch with your comments, Cheolsoo.

Let me know what you think.

 Introduce a syntax making declared aliases optional
 ---

 Key: PIG-3136
 URL: https://issues.apache.org/jira/browse/PIG-3136
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-3136-0.patch, PIG-3136-1.patch, PIG-3136-2.patch, 
 PIG-3136-3.patch


 This is something Daniel and I have talked about before, and now that we have 
 the @ syntax, it is easy to implement. The idea is that relation names are 
 no longer required; you can instead use a fat arrow (obviously that can 
 be changed) to signify this. The benefit is not having to engage in the 
 mental load of naming everything.
 One other possibility is just making the "alias =" optional. I fear that 
 could be a little TOO magical, but I welcome opinions.



[jira] [Commented] (PIG-3211) Allow default Load/Store funcs to be configurable

2013-03-04 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592257#comment-13592257
 ] 

Jonathan Coveney commented on PIG-3211:
---

I'll take a closer look in a little bit, though I will say that there is a 
PigConfiguration singleton I'd like to see catch on. Ideally, it should be the 
central place for configurations like this.
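The idea of centralizing such settings can be sketched as a single class that owns the property key and the fallback. The key name and fallback below are assumptions for illustration, not Pig's actual constants:

```java
import java.util.Properties;

// Sketch of resolving a configurable default LoadFunc (the PIG-3211 idea)
// through one central constants holder, in the spirit of PigConfiguration.
// The property key is hypothetical; only the PigStorage fallback mirrors
// today's default behavior.
class DefaultFuncs {
    static final String DEFAULT_LOAD_KEY = "pig.default.load.func"; // assumed key name

    static String defaultLoadFunc(Properties conf) {
        // Fall back to PigStorage when nothing is configured, as today
        return conf.getProperty(DEFAULT_LOAD_KEY, "org.apache.pig.builtin.PigStorage");
    }
}
```

Keeping the key as a shared constant means callers never scatter raw property strings through the codebase, which is the point of a central configuration class.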

 Allow default Load/Store funcs to be configurable
 -

 Key: PIG-3211
 URL: https://issues.apache.org/jira/browse/PIG-3211
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.12
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.12

 Attachments: PIG-3211.patch


 PigStorage is used by default when a Load/StoreFunc is not specified. It 
 would be useful to make this configurable.



[jira] [Created] (PIG-3232) Refactor Pig so that configurations use PigConfiguration wherever possible

2013-03-04 Thread Jonathan Coveney (JIRA)
Jonathan Coveney created PIG-3232:
-

 Summary: Refactor Pig so that configurations use PigConfiguration 
wherever possible
 Key: PIG-3232
 URL: https://issues.apache.org/jira/browse/PIG-3232
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
 Fix For: 0.12






[jira] [Updated] (PIG-3144) Erroneous map entry alias resolution leading to Duplicate schema alias errors

2013-03-04 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-3144:
--

Attachment: PIG-3144-1.patch

Updated. Let me know how the tests come back. Thanks, Cheolsoo!

 Erroneous map entry alias resolution leading to Duplicate schema alias 
 errors
 ---

 Key: PIG-3144
 URL: https://issues.apache.org/jira/browse/PIG-3144
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11, 0.10.1
Reporter: Kai Londenberg
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-3144-0.patch, PIG-3144-1.patch


 The following code illustrates a problem concerning alias resolution in Pig. 
 The schema of D2 will incorrectly be described as containing two "age" 
 fields, and the last step in the following script will lead to a "Duplicate 
 schema alias" error message.
 I only encountered this bug when using aliases for map fields. 
 {code}
 DATA = LOAD 'file:///whatever' as (a:map[chararray], b:chararray);
 D1 = FOREACH DATA GENERATE a#'name' as name, a#'age' as age, b;
 D2 = FOREACH D1 GENERATE name, age, b;
 DESCRIBE D2;
 {code}
 Output:
 {code}
 D2: {
 age: chararray,
 age: chararray,
 b: chararray
 }
 {code}
 {code}
 D3 = FOREACH D2 GENERATE *;
 DESCRIBE D3;
 {code}
 Output:
 {code}
 file file:///.../pig-bug-example.pig, line 20, column 16 Duplicate schema 
 alias: age
 {code}
 This error occurs in this form in Apache Pig 0.11.0-SNAPSHOT (r6408). 
 A less severe variant of this bug is also present in Pig 0.10.1: the 
 "Duplicate schema alias" error message won't occur, but the schema of D2 
 (see above) will still have the wrong duplicate alias entries.



[jira] [Updated] (PIG-2988) start deploying pigunit maven artifact part of Pig release process

2013-03-04 Thread Nick White (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick White updated PIG-2988:


Attachment: PIG-2988.0.patch

The attached patch uploads the pigunit and pigsmoke jars when a release is 
deployed. Currently the pigunit and pigsmoke artifacts are uploaded as 
snapshots to the apache repo (e.g. 
https://repository.apache.org/content/repositories/snapshots/org/apache/pig/pigunit/0.12.0-SNAPSHOT),
 so this is a fairly small change. 

 start deploying pigunit maven artifact part of Pig release process
 --

 Key: PIG-2988
 URL: https://issues.apache.org/jira/browse/PIG-2988
 Project: Pig
  Issue Type: New Feature
  Components: build
Affects Versions: 0.11, 0.10.1
Reporter: Johnny Zhang
 Attachments: PIG-2988.0.patch


 right now the Pig project doesn't publish a pigunit Maven artifact, so things like
 {noformat}
 <dependency>
   <groupId>org.apache.pig</groupId>
   <artifactId>pigunit</artifactId>
   <version>0.10.0</version>
 </dependency>
 {noformat}
 don't work. Can we start deploying pigunit Maven artifacts as part of the 
 release process? Thanks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3231) Problems with pig (TRUNK, 0.11) after upgrading to CDH4.2(yarn) using avro input

2013-03-04 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592386#comment-13592386
 ] 

Cheolsoo Park commented on PIG-3231:


[~rohini], thank you very much for correcting me! You're absolutely right. In 
fact, I should have said this:

The case that I have seen before is that Flume AvroSinks randomly die while 
writing Avro files, so the files that they were writing to are not properly 
closed. This leaves corrupted files in a directory. When Pig then launches a job 
on that directory, the job fails during execution since the files can be neither 
opened nor read.

I found that it is common among my customers to load files onto HDFS with 
one tool and run Pig jobs on them at the same time. Apparently, this often 
leads to what you're saying:
{quote}
This exception happens when some code has closed the filesystem and another 
piece of code has reference to the same FileSystem object (because of the 
FileSystem cache). A quick glance at AvroStorage does not have a fs.close() 
though.
{quote}
The way I dealt with it was ignoring bad files. In AvroStorage, there are several 
places (about 6, IIRC) that can throw an IOException, so I caught them in 
PigRecordReader instead, dropped the input split entirely, and moved on. Obviously, 
this is not a perfect solution, but in the short term it seems to work so far.
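As a rough illustration (not Pig's actual classes), the skip-bad-splits idea described here boils down to catching the IOException at the reader level rather than inside the storage function. The `Split` interface and class names below are hypothetical stand-ins:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SkipBadSplits {

    // Hypothetical stand-in for an input split; readAll() throws when the
    // underlying file is corrupt (e.g. an Avro file a dying Flume sink never closed).
    interface Split {
        List<String> readAll() throws IOException;
    }

    // Read every split, dropping any split that throws instead of failing the
    // whole job. The catch sits at the reader level, mirroring the idea of
    // catching AvroStorage's IOExceptions in PigRecordReader.
    static List<String> readIgnoringBadSplits(List<Split> splits) {
        List<String> records = new ArrayList<>();
        for (Split s : splits) {
            try {
                records.addAll(s.readAll());
            } catch (IOException e) {
                // Skipping means silently accepting data loss for this split.
                System.err.println("Skipping bad split: " + e.getMessage());
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<Split> splits = new ArrayList<>();
        splits.add(() -> List.of("a", "b"));
        splits.add(() -> { throw new IOException("corrupt Avro block"); });
        splits.add(() -> List.of("c"));
        System.out.println(readIgnoringBadSplits(splits)); // prints [a, b, c]
    }
}
```

The trade-off is exactly the one acknowledged above: a whole split's worth of data is dropped whenever any read within it fails.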


 Problems with pig (TRUNK, 0.11) after upgrading to CDH4.2(yarn) using avro 
 input
 

 Key: PIG-3231
 URL: https://issues.apache.org/jira/browse/PIG-3231
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
 Environment: CDH4.2, yarn, avro
Reporter: Tobias Schlottke

 Hi there,
 we've got a strange issue after switching to a new cluster with CDH4.2 (from 
 CDH3):
 Pig seems to create temporary Avro files for its map-reduce jobs, which it 
 either deletes or never creates.
 Pig fails with the "no error returned by hadoop" message, but in the NameNode 
 logs I found something interesting.
 The actual exception from the NameNode log is:
 {code}
 2013-03-01 12:59:30,858 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
 0 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 
 192.168.1.28:37814: error: 
 org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on 
 /user/metrigo/event_logger/compact_log/2013/01/14/_temporary/1/_temporary/attempt_1362133122980_0017_m_07_0/part-m-7.avro
  File does not exist. Holder 
 DFSClient_attempt_1362133122980_0017_m_07_0_1992466008_1 does not have 
 any open files.
 org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on 
 /user/metrigo/event_logger/compact_log/2013/01/14/_temporary/1/_temporary/attempt_1362133122980_0017_m_07_0/part-m-7.avro
  File does not exist. Holder 
 DFSClient_attempt_1362133122980_0017_m_07_0_1992466008_1 does not have 
 any open files.
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2396)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2387)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2183)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:481)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
 at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:416)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
 {code}
 Please note that we're analyzing a bunch of files (~200 files, using 
 glob matchers), some of which are small.
 We once made it work without the small files.
 *Update*
 I found the following exception deep in the logs that seems to make the job 
 fail:
 {code}
 2013-03-03 19:51:06,169 ERROR [main] 
 org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
 as:metrigo (auth:SIMPLE) cause:java.io.IOException: 
 org.apache.avro.AvroRuntimeException: java.io.IOException: Filesystem 

[jira] [Updated] (PIG-2988) start deploying pigunit maven artifact part of Pig release process

2013-03-04 Thread Nick White (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick White updated PIG-2988:


Assignee: Nick White
  Status: Patch Available  (was: Open)

 start deploying pigunit maven artifact part of Pig release process
 --

 Key: PIG-2988
 URL: https://issues.apache.org/jira/browse/PIG-2988
 Project: Pig
  Issue Type: New Feature
  Components: build
Affects Versions: 0.11, 0.10.1
Reporter: Johnny Zhang
Assignee: Nick White
 Attachments: PIG-2988.0.patch


 right now the Pig project doesn't publish a pigunit Maven artifact, so things like
 {noformat}
 <dependency>
   <groupId>org.apache.pig</groupId>
   <artifactId>pigunit</artifactId>
   <version>0.10.0</version>
 </dependency>
 {noformat}
 don't work. Can we start deploying pigunit Maven artifacts as part of the 
 release process? Thanks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (PIG-3233) Deploy a Piggybank Jar

2013-03-04 Thread Nick White (JIRA)
Nick White created PIG-3233:
---

 Summary: Deploy a Piggybank Jar
 Key: PIG-3233
 URL: https://issues.apache.org/jira/browse/PIG-3233
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.11, 0.10.0
Reporter: Nick White
Assignee: Nick White
 Fix For: 0.10.1, 0.11.1
 Attachments: PIG-3233.0.patch

The attached patch adds the piggybank contrib jar to the mvn-install and 
mvn-deploy ant targets in the same way as the pigunit & pigsmoke artifacts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3233) Deploy a Piggybank Jar

2013-03-04 Thread Nick White (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick White updated PIG-3233:


Attachment: PIG-3233.0.patch

 Deploy a Piggybank Jar
 --

 Key: PIG-3233
 URL: https://issues.apache.org/jira/browse/PIG-3233
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.10.0, 0.11
Reporter: Nick White
Assignee: Nick White
 Fix For: 0.10.1, 0.11.1

 Attachments: PIG-3233.0.patch


 The attached patch adds the piggybank contrib jar to the mvn-install and 
 mvn-deploy ant targets in the same way as the pigunit & pigsmoke artifacts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3233) Deploy a Piggybank Jar

2013-03-04 Thread Nick White (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick White updated PIG-3233:


Status: Patch Available  (was: Open)

 Deploy a Piggybank Jar
 --

 Key: PIG-3233
 URL: https://issues.apache.org/jira/browse/PIG-3233
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.11, 0.10.0
Reporter: Nick White
Assignee: Nick White
 Fix For: 0.10.1, 0.11.1

 Attachments: PIG-3233.0.patch


 The attached patch adds the piggybank contrib jar to the mvn-install and 
 mvn-deploy ant targets in the same way as the pigunit & pigsmoke artifacts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3136) Introduce a syntax making declared aliases optional

2013-03-04 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592432#comment-13592432
 ] 

Cheolsoo Park commented on PIG-3136:


+1.

I noticed that you modified dumpSchema() and dumpSchemaNested() in PigServer. 
The unit tests fully passed with the last patch, but let me do another run of the 
unit tests with the new patch. I am almost certain that no test will fail; 
nevertheless, it's always good to verify. :-)


 Introduce a syntax making declared aliases optional
 ---

 Key: PIG-3136
 URL: https://issues.apache.org/jira/browse/PIG-3136
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-3136-0.patch, PIG-3136-1.patch, PIG-3136-2.patch, 
 PIG-3136-3.patch


 This is something Daniel and I have talked about before, and now that we have 
 the @ syntax, this is easy to implement. The idea is that relation names are 
 no longer required, and you can instead use a fat arrow (obviously that can 
 be changed) to signify this. The benefit is not having to engage in the 
 mental load of having to name everything.
 One other possibility is just making "alias =" optional. I fear that that 
 could be a little TOO magical, but I welcome opinions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3211) Allow default Load/Store funcs to be configurable

2013-03-04 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592486#comment-13592486
 ] 

Cheolsoo Park commented on PIG-3211:


[~prkommireddi], I have two comments:
# I think we can simplify the code. Why don't we do this?
{code:title=buildLoadOp}
-FuncSpec instantiatedFuncSpec =
-        funcSpec == null ?
-                new FuncSpec(PigStorage.class.getName()) :
-                funcSpec;
-loFunc = (LoadFunc)PigContext.instantiateFuncFromSpec(instantiatedFuncSpec);
-String fileNameKey = QueryParserUtils.constructFileNameSignature(filename, instantiatedFuncSpec) + "_" + (loadIndex++);
+String defaultLoadFuncName = pigContext.getProperties().getProperty("pig.default.load.func", PigStorage.class.getName());
+funcSpec = funcSpec == null ? new FuncSpec(defaultLoadFuncName) : funcSpec;
+loFunc = (LoadFunc)PigContext.instantiateFuncFromSpec(funcSpec);
+String fileNameKey = QueryParserUtils.constructFileNameSignature(filename, funcSpec) + "_" + (loadIndex++);
{code}
{code:title=buildStoreOp}
-FuncSpec instantiatedFuncSpec =
-        funcSpec == null ?
-                new FuncSpec(PigStorage.class.getName()) :
-                funcSpec;
-
-StoreFuncInterface stoFunc = (StoreFuncInterface)PigContext.instantiateFuncFromSpec(instantiatedFuncSpec);
+String defaultStoreFuncName = pigContext.getProperties().getProperty("pig.default.store.func", PigStorage.class.getName());
+funcSpec = funcSpec == null ? new FuncSpec(defaultStoreFuncName) : funcSpec;
+StoreFuncInterface stoFunc = (StoreFuncInterface)PigContext.instantiateFuncFromSpec(funcSpec);
{code}
I can confirm that your test cases pass with this, so I don't think we need the 
helper functions in Utils.java.
# In addition, we should change the following in QueryParserUtils.java:
{code:title=QueryParserUtils.java}
 public static void attachStorePlan(String scope, LogicalPlan lp, String fileName, String func,
         Operator input, String alias, PigContext pigContext) throws FrontendException {
     if( func == null ) {
-        func = PigStorage.class.getName();
+        func = pigContext.getProperties().getProperty("pig.default.store.func", PigStorage.class.getName());
     }
{code}

Let me know what you think.
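Independently of Pig's internals, the ternary-with-getProperty pattern in the diffs above can be sketched as a tiny standalone class. The class, method, and the `com.example.*` names are illustrative only; the property key and the PigStorage fallback follow the discussion:

```java
import java.util.Properties;

public class DefaultFuncLookup {

    // Property key and fallback per the discussion; the real constants live
    // in Pig's configuration and PigStorage classes.
    static final String DEFAULT_LOAD_FUNC_KEY = "pig.default.load.func";
    static final String PIG_STORAGE = "org.apache.pig.builtin.PigStorage";

    // An explicit func spec wins; otherwise the configured default; otherwise
    // PigStorage -- the same resolution order as the patched buildLoadOp.
    static String resolveLoadFunc(String explicitSpec, Properties props) {
        if (explicitSpec != null) {
            return explicitSpec;
        }
        return props.getProperty(DEFAULT_LOAD_FUNC_KEY, PIG_STORAGE);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(resolveLoadFunc(null, props));                // PigStorage fallback
        props.setProperty(DEFAULT_LOAD_FUNC_KEY, "com.example.MyLoader");
        System.out.println(resolveLoadFunc(null, props));                // configured default
        System.out.println(resolveLoadFunc("com.example.Other", props)); // explicit spec wins
    }
}
```

Properties.getProperty(key, default) already encodes the fallback, which is why the extra helper functions in Utils.java are unnecessary.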

 Allow default Load/Store funcs to be configurable
 -

 Key: PIG-3211
 URL: https://issues.apache.org/jira/browse/PIG-3211
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.12
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.12

 Attachments: PIG-3211.patch


 PigStorage is used by default when a Load/StoreFunc is not specified. It 
 would be useful to make this configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3136) Introduce a syntax making declared aliases optional

2013-03-04 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592518#comment-13592518
 ] 

Jonathan Coveney commented on PIG-3136:
---

Cheolsoo, can you update this when the tests pass?

 Introduce a syntax making declared aliases optional
 ---

 Key: PIG-3136
 URL: https://issues.apache.org/jira/browse/PIG-3136
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-3136-0.patch, PIG-3136-1.patch, PIG-3136-2.patch, 
 PIG-3136-3.patch


 This is something Daniel and I have talked about before, and now that we have 
 the @ syntax, this is easy to implement. The idea is that relation names are 
 no longer required, and you can instead use a fat arrow (obviously that can 
 be changed) to signify this. The benefit is not having to engage in the 
 mental load of having to name everything.
 One other possibility is just making "alias =" optional. I fear that that 
 could be a little TOO magical, but I welcome opinions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3136) Introduce a syntax making declared aliases optional

2013-03-04 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592534#comment-13592534
 ] 

Cheolsoo Park commented on PIG-3136:


Will do. It's likely that you will find your patch committed tomorrow morning.

 Introduce a syntax making declared aliases optional
 ---

 Key: PIG-3136
 URL: https://issues.apache.org/jira/browse/PIG-3136
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-3136-0.patch, PIG-3136-1.patch, PIG-3136-2.patch, 
 PIG-3136-3.patch


 This is something Daniel and I have talked about before, and now that we have 
 the @ syntax, this is easy to implement. The idea is that relation names are 
 no longer required, and you can instead use a fat arrow (obviously that can 
 be changed) to signify this. The benefit is not having to engage in the 
 mental load of having to name everything.
 One other possibility is just making "alias =" optional. I fear that that 
 could be a little TOO magical, but I welcome opinions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3144) Erroneous map entry alias resolution leading to Duplicate schema alias errors

2013-03-04 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3144:
---

   Resolution: Fixed
Fix Version/s: 0.11.1
   Status: Resolved  (was: Patch Available)

Committed to trunk and 0.11.

Note that I replaced the @'s with relation names in the new test case for 0.11, 
because the @ syntax isn't supported in 0.11.

 Erroneous map entry alias resolution leading to Duplicate schema alias 
 errors
 ---

 Key: PIG-3144
 URL: https://issues.apache.org/jira/browse/PIG-3144
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11, 0.10.1
Reporter: Kai Londenberg
Assignee: Jonathan Coveney
 Fix For: 0.12, 0.11.1

 Attachments: PIG-3144-0.patch, PIG-3144-1-branch-0.11.patch, 
 PIG-3144-1.patch


 The following code illustrates a problem concerning alias resolution in Pig. 
 The schema of D2 will incorrectly be described as containing two "age" 
 fields, and the last step in the following script will lead to a "Duplicate 
 schema alias" error message.
 I only encountered this bug when using aliases for map fields. 
 {code}
 DATA = LOAD 'file:///whatever' as (a:map[chararray], b:chararray);
 D1 = FOREACH DATA GENERATE a#'name' as name, a#'age' as age, b;
 D2 = FOREACH D1 GENERATE name, age, b;
 DESCRIBE D2;
 {code}
 Output:
 {code}
 D2: {
 age: chararray,
 age: chararray,
 b: chararray
 }
 {code}
 {code}
 D3 = FOREACH D2 GENERATE *;
 DESCRIBE D3;
 {code}
 Output:
 {code}
 file file:///.../pig-bug-example.pig, line 20, column 16 Duplicate schema 
 alias: age
 {code}
 This error occurs in this form in Apache Pig version 0.11.0-SNAPSHOT (r6408). 
 A less severe variant of this bug is also present in pig 0.10.1. In 0.10.1, 
 the "Duplicate schema alias" error message won't occur, but the schema of D2 
 (see above) will still have wrong duplicate alias entries.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3144) Erroneous map entry alias resolution leading to Duplicate schema alias errors

2013-03-04 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3144:
---

Attachment: PIG-3144-1-branch-0.11.patch

Attaching the 0.11 patch for the record.

 Erroneous map entry alias resolution leading to Duplicate schema alias 
 errors
 ---

 Key: PIG-3144
 URL: https://issues.apache.org/jira/browse/PIG-3144
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11, 0.10.1
Reporter: Kai Londenberg
Assignee: Jonathan Coveney
 Fix For: 0.12, 0.11.1

 Attachments: PIG-3144-0.patch, PIG-3144-1-branch-0.11.patch, 
 PIG-3144-1.patch


 The following code illustrates a problem concerning alias resolution in Pig. 
 The schema of D2 will incorrectly be described as containing two "age" 
 fields, and the last step in the following script will lead to a "Duplicate 
 schema alias" error message.
 I only encountered this bug when using aliases for map fields. 
 {code}
 DATA = LOAD 'file:///whatever' as (a:map[chararray], b:chararray);
 D1 = FOREACH DATA GENERATE a#'name' as name, a#'age' as age, b;
 D2 = FOREACH D1 GENERATE name, age, b;
 DESCRIBE D2;
 {code}
 Output:
 {code}
 D2: {
 age: chararray,
 age: chararray,
 b: chararray
 }
 {code}
 {code}
 D3 = FOREACH D2 GENERATE *;
 DESCRIBE D3;
 {code}
 Output:
 {code}
 file file:///.../pig-bug-example.pig, line 20, column 16 Duplicate schema 
 alias: age
 {code}
 This error occurs in this form in Apache Pig version 0.11.0-SNAPSHOT (r6408). 
 A less severe variant of this bug is also present in pig 0.10.1. In 0.10.1, 
 the "Duplicate schema alias" error message won't occur, but the schema of D2 
 (see above) will still have wrong duplicate alias entries.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3211) Allow default Load/Store funcs to be configurable

2013-03-04 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3211:
-

Attachment: PIG-3211_2.patch

Thanks [~cheolsoo]. I have updated the patch (it may be simplified a bit 
further). I also took Jon's suggestion of using PigConfiguration to define the new 
properties. The same has been documented in pig.properties for users.
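For reference, a sketch of what the documented pig.properties entries might look like (the loader/storer class names are hypothetical placeholders):

```properties
# Use a custom default LoadFunc/StoreFunc when LOAD/STORE names none
pig.default.load.func=com.example.MyCustomLoader
pig.default.store.func=com.example.MyCustomStorer
```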

 Allow default Load/Store funcs to be configurable
 -

 Key: PIG-3211
 URL: https://issues.apache.org/jira/browse/PIG-3211
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.12
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.12

 Attachments: PIG-3211_2.patch, PIG-3211.patch


 PigStorage is used by default when a Load/StoreFunc is not specified. It 
 would be useful to make this configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3211) Allow default Load/Store funcs to be configurable

2013-03-04 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592660#comment-13592660
 ] 

Cheolsoo Park commented on PIG-3211:


+1. I will wait a day before committing, just to see if Jonathan has more 
suggestions. Thank you, Prashant!

 Allow default Load/Store funcs to be configurable
 -

 Key: PIG-3211
 URL: https://issues.apache.org/jira/browse/PIG-3211
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.12
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.12

 Attachments: PIG-3211_2.patch, PIG-3211.patch


 PigStorage is used by default when a Load/StoreFunc is not specified. It 
 would be useful to make this configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3211) Allow default Load/Store funcs to be configurable

2013-03-04 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592710#comment-13592710
 ] 

Jonathan Coveney commented on PIG-3211:
---

Looks fine to me. Go ahead and commit it, Cheolsoo :)

 Allow default Load/Store funcs to be configurable
 -

 Key: PIG-3211
 URL: https://issues.apache.org/jira/browse/PIG-3211
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.12
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.12

 Attachments: PIG-3211_2.patch, PIG-3211.patch


 PigStorage is used by default when a Load/StoreFunc is not specified. It 
 would be useful to make this configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3233) Deploy a Piggybank Jar

2013-03-04 Thread Bill Graham (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592712#comment-13592712
 ] 

Bill Graham commented on PIG-3233:
--

Thanks for tackling this one Nick!

The mechanics of the patch look good, but where did you get the deps that 
you included in {{ivy/piggybank-template.xml}}?

 Deploy a Piggybank Jar
 --

 Key: PIG-3233
 URL: https://issues.apache.org/jira/browse/PIG-3233
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.10.0, 0.11
Reporter: Nick White
Assignee: Nick White
 Fix For: 0.10.1, 0.11.1

 Attachments: PIG-3233.0.patch


 The attached patch adds the piggybank contrib jar to the mvn-install and 
 mvn-deploy ant targets in the same way as the pigunit & pigsmoke artifacts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3211) Allow default Load/Store funcs to be configurable

2013-03-04 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592716#comment-13592716
 ] 

Cheolsoo Park commented on PIG-3211:


OK, I will run unit test now.

 Allow default Load/Store funcs to be configurable
 -

 Key: PIG-3211
 URL: https://issues.apache.org/jira/browse/PIG-3211
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.12
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.12

 Attachments: PIG-3211_2.patch, PIG-3211.patch


 PigStorage is used by default when a Load/StoreFunc is not specified. It 
 would be useful to make this configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3183) rm or rmf commands should respect globbing/regex of path

2013-03-04 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592762#comment-13592762
 ] 

Cheolsoo Park commented on PIG-3183:


[~prkommireddi], please correct me if I am wrong.
* As of now, these limitations can be easily worked around by using fs 
commands (i.e. "fs -rm *" and "fs -ls *").
* Given these commands are not documented (and thus not official), I would 
encourage users to use fs commands.

I do not know why these commands were added in the first place, and we should 
keep them for backward compatibility for a while. But eventually I would like 
to get rid of them since they're duplicates, IMO.

 rm or rmf commands should respect globbing/regex of path
 

 Key: PIG-3183
 URL: https://issues.apache.org/jira/browse/PIG-3183
 Project: Pig
  Issue Type: Improvement
  Components: grunt
Affects Versions: 0.10.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.12

 Attachments: PIG-3183.patch


 Hadoop fs commands support globbing when deleting files/dirs. Pig is not 
 consistent with this behavior, and it seems like we could change the rm/rmf 
 commands to do the same.
 For eg:
 {code}
 localhost:pig pkommireddi$ ls -ld out*
 drwxr-xr-x  12 pkommireddi  SF\domain users  408 Feb 13 01:09 out
 drwxr-xr-x   2 pkommireddi  SF\domain users   68 Feb 13 01:16 out1
 drwxr-xr-x   2 pkommireddi  SF\domain users   68 Feb 13 01:16 out2
 localhost:pig pkommireddi$ bin/pig -x local
 grunt> rmf out*
 grunt> quit
 localhost:pig pkommireddi$ ls -ld out*
 drwxr-xr-x  12 pkommireddi  SF\domain users  408 Feb 13 01:09 out
 drwxr-xr-x   2 pkommireddi  SF\domain users   68 Feb 13 01:16 out1
 drwxr-xr-x   2 pkommireddi  SF\domain users   68 Feb 13 01:16 out2
 {code}
 Ideally, the user would expect "rmf out*" to delete all of the above dirs.
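The glob expansion rm/rmf would need can be sketched with a minimal glob-to-regex translation. This is only an illustration of the semantics: Hadoop's real matcher (FileSystem#globStatus) also handles {a,b} alternation and [] ranges, which are omitted here, and the class and method names are made up for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class GlobRm {

    // Translate a simple glob ("out*", "part-?") into a regex. Only * and ?
    // are handled in this sketch.
    static Pattern globToRegex(String glob) {
        StringBuilder re = new StringBuilder();
        for (char c : glob.toCharArray()) {
            switch (c) {
                case '*': re.append(".*"); break;
                case '?': re.append('.'); break;
                default:  re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(re.toString());
    }

    // Return the names a glob-aware "rmf <glob>" would delete.
    static List<String> expand(String glob, List<String> names) {
        Pattern p = globToRegex(glob);
        List<String> hits = new ArrayList<>();
        for (String n : names) {
            if (p.matcher(n).matches()) {
                hits.add(n);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> dirs = List.of("out", "out1", "out2", "input");
        System.out.println(expand("out*", dirs)); // prints [out, out1, out2]
    }
}
```

With this, the directories from the listing above (out, out1, out2) would all match "out*" while unrelated names would not.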

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-2988) start deploying pigunit maven artifact part of Pig release process

2013-03-04 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-2988:


Attachment: PIG-2988.0-branch11.patch

Attaching patch for branch-0.11. The patch for trunk did not apply due to 
whitespace issues. 

 start deploying pigunit maven artifact part of Pig release process
 --

 Key: PIG-2988
 URL: https://issues.apache.org/jira/browse/PIG-2988
 Project: Pig
  Issue Type: New Feature
  Components: build
Affects Versions: 0.11, 0.10.1
Reporter: Johnny Zhang
Assignee: Nick White
 Attachments: PIG-2988.0-branch11.patch, PIG-2988.0.patch


 right now the Pig project doesn't publish a pigunit Maven artifact, so things like
 {noformat}
 <dependency>
   <groupId>org.apache.pig</groupId>
   <artifactId>pigunit</artifactId>
   <version>0.10.0</version>
 </dependency>
 {noformat}
 don't work. Can we start deploying pigunit Maven artifacts as part of the 
 release process? Thanks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3081) Pig progress stays at 0% for the first job in hadoop 23

2013-03-04 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592783#comment-13592783
 ] 

Cheolsoo Park commented on PIG-3081:


+1. LGTM.

 Pig progress stays at 0% for the first job in hadoop 23
 ---

 Key: PIG-3081
 URL: https://issues.apache.org/jira/browse/PIG-3081
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.12

 Attachments: PIG-3081-1.patch, PIG-3081.patch


   We are seeing that for many scripts if there are multiple jobs in the job 
 graph, progress stays at 0% for the first job and jumps to 33% when the first 
 job completes. There is no intermediate progress. After that intermediate 
 progress gets reported for the subsequent jobs. Noticed this with jobs that 
 do filtering and order by. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3183) rm or rmf commands should respect globbing/regex of path

2013-03-04 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592808#comment-13592808
 ] 

Prashant Kommireddi commented on PIG-3183:
--

Right. I receive a lot of questions regarding rm/ls behavior, and I point people 
to the fs commands. But it's a pain for users to start using rm/ls and then 
realize globbing doesn't work. I would be in favor of deprecating these, or 
maybe even translating them to fs under the hood. 

 rm or rmf commands should respect globbing/regex of path
 

 Key: PIG-3183
 URL: https://issues.apache.org/jira/browse/PIG-3183
 Project: Pig
  Issue Type: Improvement
  Components: grunt
Affects Versions: 0.10.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.12

 Attachments: PIG-3183.patch


 Hadoop fs commands support globbing during deleting files/dirs. Pig is not 
 consistent with this behavior and seems like we could change rm/rmf commands 
 to do the same.
 For eg:
 {code}
 localhost:pig pkommireddi$ ls -ld out*
 drwxr-xr-x  12 pkommireddi  SF\domain users  408 Feb 13 01:09 out
 drwxr-xr-x   2 pkommireddi  SF\domain users   68 Feb 13 01:16 out1
 drwxr-xr-x   2 pkommireddi  SF\domain users   68 Feb 13 01:16 out2
 localhost:pig pkommireddi$ bin/pig -x local
 grunt> rmf out*
 grunt> quit
 localhost:pig pkommireddi$ ls -ld out*
 drwxr-xr-x  12 pkommireddi  SF\domain users  408 Feb 13 01:09 out
 drwxr-xr-x   2 pkommireddi  SF\domain users   68 Feb 13 01:16 out1
 drwxr-xr-x   2 pkommireddi  SF\domain users   68 Feb 13 01:16 out2
 {code}
 Ideally, the user would expect rmf out* to delete all of the above dirs.
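To make the expected behavior concrete: glob expansion of this kind is what the Hadoop fs shell does before deleting. Below is a minimal, self-contained sketch in plain Java (using java.nio's glob matcher instead of Hadoop's FileSystem#globStatus; the class and method names are made up for illustration) of how a glob-aware rm could expand a pattern like out* before deleting:

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class GlobRm {
    // Expand a glob pattern against candidate names, mirroring the
    // expansion step a glob-aware rm/rmf would run before deleting.
    static List<String> expand(String pattern, List<String> names) {
        PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:" + pattern);
        List<String> matched = new ArrayList<>();
        for (String name : names) {
            if (matcher.matches(Paths.get(name))) {
                matched.add(name);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // The directories from the example above: out, out1, out2
        System.out.println(expand("out*", List.of("out", "out1", "out2", "input")));
        // prints [out, out1, out2]
    }
}
```

In Pig itself the deletion would then go through the usual FileSystem delete call per matched path; the sketch only shows the matching step.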



[jira] Subscription: PIG patch available

2013-03-04 Thread jira
Issue Subscription
Filter: PIG patch available (33 issues)

Subscriber: pigdaily

Key Summary
PIG-3233 Deploy a Piggybank Jar
https://issues.apache.org/jira/browse/PIG-3233
PIG-3215 [piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated 
Values) files
https://issues.apache.org/jira/browse/PIG-3215
PIG-3211 Allow default Load/Store funcs to be configurable
https://issues.apache.org/jira/browse/PIG-3211
PIG-3210 Pig fails to start when it cannot write log to log files
https://issues.apache.org/jira/browse/PIG-3210
PIG-3208 [zebra] TFile should not set io.compression.codec.lzo.buffersize
https://issues.apache.org/jira/browse/PIG-3208
PIG-3205 Passing arguments to python script does not work with -f option
https://issues.apache.org/jira/browse/PIG-3205
PIG-3198 Let users use any function from PigType -> PigType as if it were 
builtin
https://issues.apache.org/jira/browse/PIG-3198
PIG-3183 rm or rmf commands should respect globbing/regex of path
https://issues.apache.org/jira/browse/PIG-3183
PIG-3172 Partition filter push down does not happen when there is a non 
partition key map column filter
https://issues.apache.org/jira/browse/PIG-3172
PIG-3166 Update eclipse .classpath according to ivy library.properties
https://issues.apache.org/jira/browse/PIG-3166
PIG-3164 Pig current releases lack a UDF endsWith. This UDF tests if a given 
string ends with the specified suffix.
https://issues.apache.org/jira/browse/PIG-3164
PIG-3141 Giving CSVExcelStorage an option to handle header rows
https://issues.apache.org/jira/browse/PIG-3141
PIG-3136 Introduce a syntax making declared aliases optional
https://issues.apache.org/jira/browse/PIG-3136
PIG-3123 Simplify Logical Plans By Removing Unnecessary Identity Projections
https://issues.apache.org/jira/browse/PIG-3123
PIG-3122 Operators should not implicitly become reserved keywords
https://issues.apache.org/jira/browse/PIG-3122
PIG-3114 Duplicated macro name error when using pigunit
https://issues.apache.org/jira/browse/PIG-3114
PIG-3105 Fix TestJobSubmission unit test failure.
https://issues.apache.org/jira/browse/PIG-3105
PIG-3088 Add a builtin udf which removes prefixes
https://issues.apache.org/jira/browse/PIG-3088
PIG-3081 Pig progress stays at 0% for the first job in hadoop 23
https://issues.apache.org/jira/browse/PIG-3081
PIG-3069 Native Windows Compatibility for Pig E2E Tests and Harness
https://issues.apache.org/jira/browse/PIG-3069
PIG-3028 testGrunt dev test needs some command filters to run correctly 
without cygwin
https://issues.apache.org/jira/browse/PIG-3028
PIG-3027 pigTest unit test needs a newline filter for comparisons of golden 
multi-line
https://issues.apache.org/jira/browse/PIG-3027
PIG-3026 Pig checked-in baseline comparisons need a pre-filter to address 
OS-specific newline differences
https://issues.apache.org/jira/browse/PIG-3026
PIG-3024 TestEmptyInputDir unit test - hadoop version detection logic is 
brittle
https://issues.apache.org/jira/browse/PIG-3024
PIG-3015 Rewrite of AvroStorage
https://issues.apache.org/jira/browse/PIG-3015
PIG-3010 Allow UDF's to flatten themselves
https://issues.apache.org/jira/browse/PIG-3010
PIG-2988 start deploying pigunit maven artifact part of Pig release process
https://issues.apache.org/jira/browse/PIG-2988
PIG-2959 Add a pig.cmd for Pig to run under Windows
https://issues.apache.org/jira/browse/PIG-2959
PIG-2955 Fix bunch of Pig e2e tests on Windows
https://issues.apache.org/jira/browse/PIG-2955
PIG-2643 Use bytecode generation to make a performance replacement for 
InvokeForLong, InvokeForString, etc
https://issues.apache.org/jira/browse/PIG-2643
PIG-2641 Create toJSON function for all complex types: tuples, bags and maps
https://issues.apache.org/jira/browse/PIG-2641
PIG-2591 Unit tests should not write to /tmp but respect java.io.tmpdir
https://issues.apache.org/jira/browse/PIG-2591
PIG-1914 Support load/store JSON data in Pig
https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Commented] (PIG-3233) Deploy a Piggybank Jar

2013-03-04 Thread Nick White (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592893#comment-13592893
 ] 

Nick White commented on PIG-3233:
-

I just used the imports -

grep -hr '^import' contrib/piggybank/java/src/main | sort | uniq

for the compile-time dependencies, and:

grep -hr '^import' contrib/piggybank/java/src/test | sort | uniq

for anything else for the test scope. I'm not sure there's any better way of 
keeping the ivy and maven dependencies in sync (especially as the template poms 
don't use the ivy.properties file to pick up their versions).

 Deploy a Piggybank Jar
 --

 Key: PIG-3233
 URL: https://issues.apache.org/jira/browse/PIG-3233
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.10.0, 0.11
Reporter: Nick White
Assignee: Nick White
 Fix For: 0.10.1, 0.11.1

 Attachments: PIG-3233.0.patch


 The attached patch adds the piggybank contrib jar to the mvn-install and 
 mvn-deploy ant targets in the same way as the pigunit & pigsmoke artifacts.



[jira] [Commented] (PIG-3194) Changes to ObjectSerializer.java break compatibility with Hadoop 0.20.2

2013-03-04 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592896#comment-13592896
 ] 

Prashant Kommireddi commented on PIG-3194:
--

Would be great to have some ideas from others and discuss :)

 Changes to ObjectSerializer.java break compatibility with Hadoop 0.20.2
 ---

 Key: PIG-3194
 URL: https://issues.apache.org/jira/browse/PIG-3194
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Kai Londenberg

 The changes to ObjectSerializer.java in the following commit
 http://svn.apache.org/viewvc?view=revision&revision=1403934 break 
 compatibility with Hadoop 0.20.2 Clusters.
 The reason is that the code uses methods from Apache Commons Codec 1.4 
 which are not available in Apache Commons Codec 1.3, the version that ships 
 with Hadoop 0.20.2.
 The offending methods are Base64.decodeBase64(String) and 
 Base64.encodeBase64URLSafeString(byte[])
 If I revert these changes, Pig 0.11.0 candidate 2 works well with our Hadoop 
 0.20.2 Clusters.
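An alternative to reverting would be a compatibility shim that avoids the Codec 1.4-only methods altogether. The sketch below is illustrative only: it uses the JDK's java.util.Base64 for the plain encoding step (Commons Codec 1.3's byte[]-based Base64.encodeBase64/decodeBase64 would play the same role), and the class name Base64Compat is made up. The point is that URL-safe output is just plain base64 with '+' mapped to '-', '/' mapped to '_', and padding stripped:

```java
import java.util.Base64;

public class Base64Compat {
    // Mimics Codec 1.4's Base64.encodeBase64URLSafeString(byte[]) using
    // only plain base64 plus manual character substitution.
    static String encodeUrlSafe(byte[] data) {
        String std = Base64.getEncoder().encodeToString(data);
        return std.replace('+', '-').replace('/', '_').replace("=", "");
    }

    // Mimics Codec 1.4's Base64.decodeBase64(String): reverse the
    // substitution, restore the stripped padding, then decode normally.
    static byte[] decodeUrlSafe(String s) {
        StringBuilder std = new StringBuilder(s.replace('-', '+').replace('_', '/'));
        while (std.length() % 4 != 0) {
            std.append('=');
        }
        return Base64.getDecoder().decode(std.toString());
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xfb, (byte) 0xff};
        System.out.println(encodeUrlSafe(data)); // prints -_8 (plain base64 is +/8=)
    }
}
```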



[jira] [Updated] (PIG-3214) New/improved mascot

2013-03-04 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated PIG-3214:


Attachment: newlogo1.png
newlogo2.png
newlogo3.png
newlogo4.png

 New/improved mascot
 ---

 Key: PIG-3214
 URL: https://issues.apache.org/jira/browse/PIG-3214
 Project: Pig
  Issue Type: Wish
  Components: site
Affects Versions: 0.11
Reporter: Andrew Musselman
Priority: Minor
 Fix For: 0.12

 Attachments: newlogo1.png, newlogo2.png, newlogo3.png, newlogo4.png


 Request to change pig mascot to something more graphically appealing.



[jira] [Commented] (PIG-3214) New/improved mascot

2013-03-04 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593065#comment-13593065
 ] 

Prashant Kommireddi commented on PIG-3214:
--

Thanks Prasanth, these look great. My vote would go to newlogo2!

 New/improved mascot
 ---

 Key: PIG-3214
 URL: https://issues.apache.org/jira/browse/PIG-3214
 Project: Pig
  Issue Type: Wish
  Components: site
Affects Versions: 0.11
Reporter: Andrew Musselman
Priority: Minor
 Fix For: 0.12

 Attachments: newlogo1.png, newlogo2.png, newlogo3.png, newlogo4.png


 Request to change pig mascot to something more graphically appealing.



[jira] [Commented] (PIG-3214) New/improved mascot

2013-03-04 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593082#comment-13593082
 ] 

Cheolsoo Park commented on PIG-3214:


My +1 to #2 as well. Thank you Prasanth for the hard work!


 New/improved mascot
 ---

 Key: PIG-3214
 URL: https://issues.apache.org/jira/browse/PIG-3214
 Project: Pig
  Issue Type: Wish
  Components: site
Affects Versions: 0.11
Reporter: Andrew Musselman
Priority: Minor
 Fix For: 0.12

 Attachments: newlogo1.png, newlogo2.png, newlogo3.png, newlogo4.png


 Request to change pig mascot to something more graphically appealing.



[jira] [Commented] (PIG-3214) New/improved mascot

2013-03-04 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593122#comment-13593122
 ] 

Prasanth J commented on PIG-3214:
-

Adding one more similar to #2. 

 New/improved mascot
 ---

 Key: PIG-3214
 URL: https://issues.apache.org/jira/browse/PIG-3214
 Project: Pig
  Issue Type: Wish
  Components: site
Affects Versions: 0.11
Reporter: Andrew Musselman
Priority: Minor
 Fix For: 0.12

 Attachments: newlogo1.png, newlogo2.png, newlogo3.png, newlogo4.png, 
 newlogo5.png


 Request to change pig mascot to something more graphically appealing.



[jira] [Updated] (PIG-3214) New/improved mascot

2013-03-04 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated PIG-3214:


Attachment: newlogo5.png

 New/improved mascot
 ---

 Key: PIG-3214
 URL: https://issues.apache.org/jira/browse/PIG-3214
 Project: Pig
  Issue Type: Wish
  Components: site
Affects Versions: 0.11
Reporter: Andrew Musselman
Priority: Minor
 Fix For: 0.12

 Attachments: newlogo1.png, newlogo2.png, newlogo3.png, newlogo4.png, 
 newlogo5.png


 Request to change pig mascot to something more graphically appealing.



[jira] [Updated] (PIG-2507) Semicolon in parameters for UDF results in parsing error

2013-03-04 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated PIG-2507:
--

Affects Version/s: 0.11.1
   0.11

 Semicolon in parameters for UDF results in parsing error
 -

 Key: PIG-2507
 URL: https://issues.apache.org/jira/browse/PIG-2507
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0, 0.9.1, 0.10.0, 0.11, 0.11.1
Reporter: Vivek Padmanabhan
Assignee: Timothy Chen
 Attachments: PIG_2507.patch


 If I have a semicolon in the parameter passed to a udf, the script execution 
 will fail with a parsing error.
 a = load 'i1' as (f1:chararray);
 c = foreach a generate REGEX_EXTRACT(f1, '.;' ,1);
 dump c;
 The above script fails with the below error:
 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: file test.pig, 
 line 3, column 0  mismatched character '<EOF>' expecting '''
 Even replacing the semicolon with Unicode \u003B results in the same error.
 c = foreach a generate REGEX_EXTRACT(f1, '.\u003B',1);
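Worth noting: the regex itself is unproblematic once it reaches the matcher, so this is purely a grunt/parser issue with the quoted literal. A rough approximation of what REGEX_EXTRACT computes on the UDF side (hypothetical helper names; Pig's builtin may differ in detail, e.g. in how non-matches are handled):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtractSketch {
    // Approximates REGEX_EXTRACT(input, regex, group): return the given
    // capture group of the first match, or null when nothing matches.
    static String extract(String input, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        // A semicolon inside the pattern is unremarkable to the regex engine.
        System.out.println(extract("a;b", "(.);", 1)); // prints a
    }
}
```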
