[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (32 issues)
Subscriber: pigdaily

Key         Summary
PIG-5246    Modify bin/pig about SPARK_HOME, SPARK_ASSEMBLY_JAR after upgrading spark to 2
            https://issues.apache.org/jira/browse/PIG-5246
PIG-5160    SchemaTupleFrontend.java is not thread safe, cause PigServer thrown NPE in multithread env
            https://issues.apache.org/jira/browse/PIG-5160
PIG-5157    Upgrade to Spark 2.0
            https://issues.apache.org/jira/browse/PIG-5157
PIG-5115    Builtin AvroStorage generates incorrect avro schema when the same pig field name appears in the alias
            https://issues.apache.org/jira/browse/PIG-5115
PIG-5106    Optimize when mapreduce.input.fileinputformat.input.dir.recursive set to true
            https://issues.apache.org/jira/browse/PIG-5106
PIG-5081    Can not run pig on spark source code distribution
            https://issues.apache.org/jira/browse/PIG-5081
PIG-5080    Support store alias as spark table
            https://issues.apache.org/jira/browse/PIG-5080
PIG-5057    IndexOutOfBoundsException when pig reducer processOnePackageOutput
            https://issues.apache.org/jira/browse/PIG-5057
PIG-5029    Optimize sort case when data is skewed
            https://issues.apache.org/jira/browse/PIG-5029
PIG-4926    Modify the content of start.xml for spark mode
            https://issues.apache.org/jira/browse/PIG-4926
PIG-4913    Reduce jython function initiation during compilation
            https://issues.apache.org/jira/browse/PIG-4913
PIG-4849    pig on tez will cause tez-ui to crash,because the content from timeline server is too long.
            https://issues.apache.org/jira/browse/PIG-4849
PIG-4750    REPLACE_MULTI should compile Pattern once and reuse it
            https://issues.apache.org/jira/browse/PIG-4750
PIG-4684    Exception should be changed to warning when job diagnostics cannot be fetched
            https://issues.apache.org/jira/browse/PIG-4684
PIG-4656    Improve String serialization and comparator performance in BinInterSedes
            https://issues.apache.org/jira/browse/PIG-4656
PIG-4598    Allow user defined plan optimizer rules
            https://issues.apache.org/jira/browse/PIG-4598
PIG-4551    Partition filter is not pushed down in case of SPLIT
            https://issues.apache.org/jira/browse/PIG-4551
PIG-4548    Records Lost With Specific Combination of Commands and Streaming Function
            https://issues.apache.org/jira/browse/PIG-4548
PIG-4539    New PigUnit
            https://issues.apache.org/jira/browse/PIG-4539
PIG-4515    org.apache.pig.builtin.Distinct throws ClassCastException
            https://issues.apache.org/jira/browse/PIG-4515
PIG-4323    PackageConverter hanging in Spark
            https://issues.apache.org/jira/browse/PIG-4323
PIG-4313    StackOverflowError in LIMIT operation on Spark
            https://issues.apache.org/jira/browse/PIG-4313
PIG-4251    Pig on Storm
            https://issues.apache.org/jira/browse/PIG-4251
PIG-4002    Disable combiner when map-side aggregation is used
            https://issues.apache.org/jira/browse/PIG-4002
PIG-3952    PigStorage accepts '-tagSplit' to return full split information
            https://issues.apache.org/jira/browse/PIG-3952
PIG-3911    Define unique fields with @OutputSchema
            https://issues.apache.org/jira/browse/PIG-3911
PIG-3877    Getting Geo Latitude/Longitude from Address Lines
            https://issues.apache.org/jira/browse/PIG-3877
PIG-3873    Geo distance calculation using Haversine
            https://issues.apache.org/jira/browse/PIG-3873
PIG-3864    ToDate(userstring, format, timezone) computes DateTime with strange handling of Daylight Saving Time with location based timezones
            https://issues.apache.org/jira/browse/PIG-3864
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587
PIG-1804    Alow Jython function to implement Algebraic and/or Accumulator interfaces
            https://issues.apache.org/jira/browse/PIG-1804

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328=12322384
Re: Review Request 59530: PIG-5157 Upgrade to Spark 2.0
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59530/
-----------------------------------------------------------

(Updated June 22, 2017, 2:07 p.m.)

Review request for pig, liyun zhang, Rohini Palaniswamy, and Adam Szita.

Changes
-------

Fix for java.lang.ClassNotFoundException: org.apache.spark.scheduler.SparkListenerInterface

Repository: pig-git

Description
-------

Upgrade to Spark 2.1 API using shims.

Diffs (updated)
-----

  build.xml bba2b52d9354ab909ad26f969480806f6d91911c
  ivy.xml 3f2c94373ba9455bbb6a3c96bfd61fc6cfaab588
  ivy/libraries.properties c2aed45a3244dfd108a255c7308a7dcb0dabd3b5
  src/org/apache/pig/backend/hadoop/executionengine/spark/FlatMapFunctionAdapter.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/executionengine/spark/JobMetricsListener.java f81341233447203abc4800cc7b22a4f419e10262
  src/org/apache/pig/backend/hadoop/executionengine/spark/PairFlatMapFunctionAdapter.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/executionengine/spark/Spark1Shims.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/executionengine/spark/Spark2Shims.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java 237fd9431a16226234d91059088f91aab346b83c
  src/org/apache/pig/backend/hadoop/executionengine/spark/SparkShims.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/CollectedGroupConverter.java 83311dfa5bb25209a5366c2db7e8d483c31d94cd
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/FRJoinConverter.java 382258e7ff9105aa397c5a2888df0c11e9562ec9
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ForEachConverter.java b58415e7e18ca4cf1331beef06e9214600a51424
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/GlobalRearrangeConverter.java 130c8b9a747b176ce2b649ca6d5260527595fb76
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LimitConverter.java fe1b54c8f128661d7d19c276d3bb2de7874d3086
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeCogroupConverter.java adf78ecab0da10d3b1a7fdde8af2b42dd899810f
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeJoinConverter.java d1c43b1e06adc4c9fe45a83b8110402e3756
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/PoissonSampleConverter.java e003bbd95763b2d189ff9ec540c89abe52592420
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SecondaryKeySortUtil.java 00d29b44848546ed16dde2baa8c61b36939971b2
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SkewedJoinConverter.java c55ba3145495a53d69db2dd56434dcc9b3bf8ed5
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SortConverter.java baabfa090323e3bef087e259ce19df2e4c34dd63
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SparkSampleSortConverter.java 3166fdc31745c013380492e089c83f3e853a3e6e
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/StreamConverter.java 3a50d485cfd54b9f3b9c1a982e6c30497a4c85fc
  src/org/apache/pig/tools/pigstats/spark/Spark1JobStats.java PRE-CREATION
  src/org/apache/pig/tools/pigstats/spark/Spark2JobStats.java PRE-CREATION
  src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java c8cc03194b223d2ee181d73c6b651a6872cac6b6
  src/org/apache/pig/tools/pigstats/spark/SparkPigStats.java 61ccbcc9fd723f6e2e578a8476230c42d5587dfe
  test/org/apache/pig/test/TestPigRunner.java ec08417f2f71deec514cab5cfb9d2f99520ad641

Diff: https://reviews.apache.org/r/59530/diff/8/

Changes: https://reviews.apache.org/r/59530/diff/7-8/

Testing
-------

Thanks,

Nandor Kollar
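
The review description above says only "Upgrade to Spark 2.1 API using shims", so here is a minimal, standalone sketch of what a version shim generally looks like. It is not the code from the patch: every name in it (ShimSketch, EngineShims, V1Shims, V2Shims, FlatMapAdapter, forVersion) is hypothetical, and it uses no Spark classes. It assumes one well-known difference between the Spark 1.x and 2.x Java APIs, namely that flat-map functions return an Iterable in 1.x and an Iterator in 2.x, and shows calling code written once against a neutral adapter while the shim bridges the gap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of a version shim. None of these names are Pig's or Spark's real classes.
class ShimSketch {

    /** Version-neutral callback that calling code implements once. */
    interface FlatMapAdapter<T, R> {
        Iterator<R> call(T input);
    }

    /** Hides the API differences between two major versions of an engine. */
    interface EngineShims {
        String name();

        /** Runs the callback the way this engine version would and collects the output. */
        <T, R> List<R> runFlatMap(FlatMapAdapter<T, R> fn, T input);
    }

    /** Stand-in for a 1.x-style engine whose flat-map callbacks return an Iterable. */
    static final class V1Shims implements EngineShims {
        public String name() { return "v1"; }

        public <T, R> List<R> runFlatMap(FlatMapAdapter<T, R> fn, T input) {
            Iterable<R> iterable = () -> fn.call(input); // wrap the Iterator in an Iterable
            List<R> out = new ArrayList<>();
            for (R r : iterable) {
                out.add(r);
            }
            return out;
        }
    }

    /** Stand-in for a 2.x-style engine whose flat-map callbacks return an Iterator. */
    static final class V2Shims implements EngineShims {
        public String name() { return "v2"; }

        public <T, R> List<R> runFlatMap(FlatMapAdapter<T, R> fn, T input) {
            Iterator<R> it = fn.call(input); // pass the Iterator through unchanged
            List<R> out = new ArrayList<>();
            while (it.hasNext()) {
                out.add(it.next());
            }
            return out;
        }
    }

    /** Chooses a shim from a version string such as "2.1.0". */
    static EngineShims forVersion(String version) {
        int major = Integer.parseInt(version.split("\\.")[0]);
        switch (major) {
            case 1: return new V1Shims();
            case 2: return new V2Shims();
            default: throw new IllegalArgumentException("Unsupported version: " + version);
        }
    }

    public static void main(String[] args) {
        FlatMapAdapter<String, String> splitWords =
                line -> Arrays.asList(line.split(" ")).iterator();
        for (String version : new String[] {"1.6.1", "2.1.0"}) {
            EngineShims shims = forVersion(version);
            System.out.println(shims.name() + " -> " + shims.runFlatMap(splitWords, "a b c"));
        }
    }
}
```

The point of the design is that version detection happens in one place (forVersion here), so the converters that call the shim do not need per-version branches.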
[jira] [Updated] (PIG-5263) Using wildcard doesn't work with OrcStorage
[ https://issues.apache.org/jira/browse/PIG-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Satish Subhashrao Saley updated PIG-5263:
-----------------------------------------

Attachment: PIG-5263-1.patch

> Using wildcard doesn't work with OrcStorage
> -------------------------------------------
>
> Key: PIG-5263
> URL: https://issues.apache.org/jira/browse/PIG-5263
> Project: Pig
> Issue Type: Bug
> Reporter: Satish Subhashrao Saley
> Priority: Minor
> Attachments: PIG-5263-1.patch
>
>
> myinput = LOAD '/user/saley/data/datestamp=20170301*' USING OrcStorage();
> It's throwing an exception {{Caused by: java.io.FileNotFoundException: File hdfs://localhost:8020/user/saley/data/datestamp=20170301* does not exist.}}
> Full stack trace
> {code}
> 2017-03-03 18:50:12,651 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
> org.apache.pig.backend.executionengine.ExecException: ERROR 2118: serious problem
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
>     at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateNewSplits(MRInputHelpers.java:411)
>     at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:292)
>     at org.apache.pig.backend.hadoop.executionengine.tez.plan.optimizer.LoaderProcessor.processLoads(LoaderProcessor.java:169)
>     at org.apache.pig.backend.hadoop.executionengine.tez.plan.optimizer.LoaderProcessor.visitTezOp(LoaderProcessor.java:182)
>     at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:259)
>     at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:56)
>     at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
>     at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
>     at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.processLoadAndParallelism(TezLauncher.java:503)
>     at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:187)
>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:286)
>     at org.apache.pig.PigServer.launchPlan(PigServer.java:1401)
>     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1386)
>     at org.apache.pig.PigServer.storeEx(PigServer.java:1045)
>     at org.apache.pig.PigServer.store(PigServer.java:1008)
>     at org.apache.pig.PigServer.openIterator(PigServer.java:921)
>     at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:762)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:376)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
>     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
>     at org.apache.pig.Main.run(Main.java:630)
>     at org.apache.pig.Main.main(Main.java:176)
> Caused by: java.lang.RuntimeException: serious problem
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1119)
>     at org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat.getSplits(OrcNewInputFormat.java:121)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:265)
>     ... 23 more
> Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://localhost:8020/user/saley/data/datestamp=20170301* does not exist.
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1087)
>     ... 25 more
> Caused by: java.io.FileNotFoundException: File hdfs://localhost:8020/user/saley/data/datestamp=20170301* does not exist.
>     at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:948)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:927)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:872)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:868)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:886)
>     at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1697)
>     at org.apache.hadoop.hive.shims.Hadoop23Shims.listLocatedStatus(Hadoop23Shims.java:665)
>     at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:361)
>     at
[jira] [Commented] (PIG-5157) Upgrade to Spark 2.0
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059727#comment-16059727 ]

Rohini Palaniswamy commented on PIG-5157:
-----------------------------------------

+1 for https://reviews.apache.org/r/59530/diff/7/ .

[~nkollar], can you upload the final patch here?
[~kellyzly], can you retry with the new patch and verify that it works for you? Please go ahead and commit it if you are +1 on the patch.

> Upgrade to Spark 2.0
> --------------------
>
> Key: PIG-5157
> URL: https://issues.apache.org/jira/browse/PIG-5157
> Project: Pig
> Issue Type: Improvement
> Components: spark
> Reporter: Nandor Kollar
> Assignee: Nandor Kollar
> Fix For: 0.18.0
>
> Attachments: PIG-5157.patch
>
>
> Upgrade to Spark 2.0 (or latest)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Created] (PIG-5263) Using wildcard doesn't work with OrcStorage
Satish Subhashrao Saley created PIG-5263:
-----------------------------------------

Summary: Using wildcard doesn't work with OrcStorage
Key: PIG-5263
URL: https://issues.apache.org/jira/browse/PIG-5263
Project: Pig
Issue Type: Bug
Reporter: Satish Subhashrao Saley
Priority: Minor

myinput = LOAD '/user/saley/data/datestamp=20170301*' USING OrcStorage();
It's throwing an exception {{Caused by: java.io.FileNotFoundException: File hdfs://localhost:8020/user/saley/data/datestamp=20170301* does not exist.}}
Full stack trace
{code}
2017-03-03 18:50:12,651 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: serious problem
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
    at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateNewSplits(MRInputHelpers.java:411)
    at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:292)
    at org.apache.pig.backend.hadoop.executionengine.tez.plan.optimizer.LoaderProcessor.processLoads(LoaderProcessor.java:169)
    at org.apache.pig.backend.hadoop.executionengine.tez.plan.optimizer.LoaderProcessor.visitTezOp(LoaderProcessor.java:182)
    at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:259)
    at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:56)
    at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
    at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.processLoadAndParallelism(TezLauncher.java:503)
    at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:187)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:286)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1401)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1386)
    at org.apache.pig.PigServer.storeEx(PigServer.java:1045)
    at org.apache.pig.PigServer.store(PigServer.java:1008)
    at org.apache.pig.PigServer.openIterator(PigServer.java:921)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:762)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:376)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
    at org.apache.pig.Main.run(Main.java:630)
    at org.apache.pig.Main.main(Main.java:176)
Caused by: java.lang.RuntimeException: serious problem
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1119)
    at org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat.getSplits(OrcNewInputFormat.java:121)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:265)
    ... 23 more
Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://localhost:8020/user/saley/data/datestamp=20170301* does not exist.
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1087)
    ... 25 more
Caused by: java.io.FileNotFoundException: File hdfs://localhost:8020/user/saley/data/datestamp=20170301* does not exist.
    at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:948)
    at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:927)
    at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:872)
    at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:868)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:886)
    at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1697)
    at org.apache.hadoop.hive.shims.Hadoop23Shims.listLocatedStatus(Hadoop23Shims.java:665)
    at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:361)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.callInternal(OrcInputFormat.java:692)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.access$600(OrcInputFormat.java:659)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:682)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:679)
    at
[jira] [Assigned] (PIG-5263) Using wildcard doesn't work with OrcStorage
[ https://issues.apache.org/jira/browse/PIG-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Satish Subhashrao Saley reassigned PIG-5263:
--------------------------------------------

Assignee: Satish Subhashrao Saley

> Using wildcard doesn't work with OrcStorage
> -------------------------------------------
>
> Key: PIG-5263
> URL: https://issues.apache.org/jira/browse/PIG-5263
> Project: Pig
> Issue Type: Bug
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Priority: Minor
> Attachments: PIG-5263-1.patch
>
>
> myinput = LOAD '/user/saley/data/datestamp=20170301*' USING OrcStorage();
> It's throwing an exception {{Caused by: java.io.FileNotFoundException: File hdfs://localhost:8020/user/saley/data/datestamp=20170301* does not exist.}}
[jira] [Updated] (PIG-5263) Using wildcard doesn't work with OrcStorage
[ https://issues.apache.org/jira/browse/PIG-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Satish Subhashrao Saley updated PIG-5263:
-----------------------------------------

Status: Patch Available (was: Open)

> Using wildcard doesn't work with OrcStorage
> -------------------------------------------
>
> Key: PIG-5263
> URL: https://issues.apache.org/jira/browse/PIG-5263
> Project: Pig
> Issue Type: Bug
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Priority: Minor
> Attachments: PIG-5263-1.patch
>
>
> myinput = LOAD '/user/saley/data/datestamp=20170301*' USING OrcStorage();
> It's throwing an exception {{Caused by: java.io.FileNotFoundException: File hdfs://localhost:8020/user/saley/data/datestamp=20170301* does not exist.}}
[jira] [Commented] (PIG-5263) Using wildcard doesn't work with OrcStorage
[ https://issues.apache.org/jira/browse/PIG-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059749#comment-16059749 ]

Satish Subhashrao Saley commented on PIG-5263:
----------------------------------------------

The approach to solve this issue is to glob paths using {{FileStatus[] org.apache.hadoop.fs.FileSystem.globStatus(Path pathPattern, PathFilter filter) throws IOException}} for input paths.

> Using wildcard doesn't work with OrcStorage
> -------------------------------------------
>
> Key: PIG-5263
> URL: https://issues.apache.org/jira/browse/PIG-5263
> Project: Pig
> Issue Type: Bug
> Reporter: Satish Subhashrao Saley
> Priority: Minor
> Attachments: PIG-5263-1.patch
>
>
> myinput = LOAD '/user/saley/data/datestamp=20170301*' USING OrcStorage();
> It's throwing an exception {{Caused by: java.io.FileNotFoundException: File hdfs://localhost:8020/user/saley/data/datestamp=20170301* does not exist.}}
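
The comment on PIG-5263 proposes expanding the wildcard with FileSystem.globStatus before the input paths are used. As a rough illustration of that idea only, the sketch below expands a glob into concrete paths on the local filesystem with java.nio, so that it runs without Hadoop on the classpath. It is not the attached patch: GlobSketch and expand are hypothetical names, and the directory names are made up to resemble the datestamp=20170301* pattern from the report.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Local-filesystem analogue of glob expansion; hypothetical helper, not the PIG-5263 patch.
class GlobSketch {

    /** Expands a glob on the entry names of dir into the concrete paths that match it. */
    static List<Path> expand(Path dir, String glob) throws IOException {
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                matches.add(p);
            }
        }
        Collections.sort(matches); // iteration order is unspecified, so sort for stable output
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Made-up layout: two partitions for 2017-03-01 and one for 2017-03-02.
        Path data = Files.createTempDirectory("data");
        Files.createDirectory(data.resolve("datestamp=2017030100"));
        Files.createDirectory(data.resolve("datestamp=2017030101"));
        Files.createDirectory(data.resolve("datestamp=2017030200"));

        // Only the concrete, matching paths would be handed on to a reader.
        for (Path p : expand(data, "datestamp=20170301*")) {
            System.out.println(p.getFileName());
        }
    }
}
```

In this sketch the reader never sees the literal pattern, which is the failure in the stack trace: the path ending in * was listed as if it were a real directory.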
Re: [ANNOUNCE] Apache Pig 0.17.0 released
Thanks Adam for being the Release Manager and getting this important release out. Pig on Spark is another milestone that will benefit users looking for improved execution times and migrating away from MapReduce.

Regards,
Rohini

On Wed, Jun 21, 2017 at 2:05 AM, Adam Szita wrote:
> The Pig team is happy to announce the Pig 0.17.0 release.
>
> Apache Pig provides a high-level data-flow language and execution framework
> for parallel computation on Hadoop clusters.
> More details about Pig can be found at http://pig.apache.org/.
>
> The highlight of this release is the introduction of the Spark execution
> engine. The details of the release can be found at
> http://pig.apache.org/releases.html.
Re: Review Request 59530: PIG-5157 Upgrade to Spark 2.0
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59530/#review178716
---

Ship it!

Ship It!

- Rohini Palaniswamy

On June 22, 2017, 2:38 p.m., Nandor Kollar wrote:
>
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59530/
> ---
>
> (Updated June 22, 2017, 2:38 p.m.)
>
> Review request for pig, liyun zhang, Rohini Palaniswamy, and Adam Szita.
>
> Repository: pig-git
>
> Description
> ---
>
> Upgrade to Spark 2.1 API using shims.
>
> Diffs
> -
>
>   build.xml bba2b52d9354ab909ad26f969480806f6d91911c
>   ivy.xml 3f2c94373ba9455bbb6a3c96bfd61fc6cfaab588
>   ivy/libraries.properties c2aed45a3244dfd108a255c7308a7dcb0dabd3b5
>   src/org/apache/pig/backend/hadoop/executionengine/spark/FlatMapFunctionAdapter.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/spark/JobMetricsListener.java f81341233447203abc4800cc7b22a4f419e10262
>   src/org/apache/pig/backend/hadoop/executionengine/spark/PairFlatMapFunctionAdapter.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/spark/Spark1Shims.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/spark/Spark2Shims.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java 237fd9431a16226234d91059088f91aab346b83c
>   src/org/apache/pig/backend/hadoop/executionengine/spark/SparkShims.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/CollectedGroupConverter.java 83311dfa5bb25209a5366c2db7e8d483c31d94cd
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/FRJoinConverter.java 382258e7ff9105aa397c5a2888df0c11e9562ec9
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ForEachConverter.java b58415e7e18ca4cf1331beef06e9214600a51424
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/GlobalRearrangeConverter.java 130c8b9a747b176ce2b649ca6d5260527595fb76
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LimitConverter.java fe1b54c8f128661d7d19c276d3bb2de7874d3086
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeCogroupConverter.java adf78ecab0da10d3b1a7fdde8af2b42dd899810f
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeJoinConverter.java d1c43b1e06adc4c9fe45a83b8110402e3756
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/PoissonSampleConverter.java e003bbd95763b2d189ff9ec540c89abe52592420
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SecondaryKeySortUtil.java 00d29b44848546ed16dde2baa8c61b36939971b2
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SkewedJoinConverter.java c55ba3145495a53d69db2dd56434dcc9b3bf8ed5
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SortConverter.java baabfa090323e3bef087e259ce19df2e4c34dd63
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SparkSampleSortConverter.java 3166fdc31745c013380492e089c83f3e853a3e6e
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/StreamConverter.java 3a50d485cfd54b9f3b9c1a982e6c30497a4c85fc
>   src/org/apache/pig/tools/pigstats/spark/Spark1JobStats.java PRE-CREATION
>   src/org/apache/pig/tools/pigstats/spark/Spark2JobStats.java PRE-CREATION
>   src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java c8cc03194b223d2ee181d73c6b651a6872cac6b6
>   src/org/apache/pig/tools/pigstats/spark/SparkPigStats.java 61ccbcc9fd723f6e2e578a8476230c42d5587dfe
>   test/org/apache/pig/test/TestPigRunner.java ec08417f2f71deec514cab5cfb9d2f99520ad641
>
> Diff: https://reviews.apache.org/r/59530/diff/9/
>
> Testing
> ---
>
> Thanks,
>
> Nandor Kollar
Re: Review Request 59530: PIG-5157 Upgrade to Spark 2.0
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59530/
---

(Updated June 22, 2017, 2:38 p.m.)

Review request for pig, liyun zhang, Rohini Palaniswamy, and Adam Szita.

Changes
---

fix serialization problem for SparkShims

Repository: pig-git

Description
---

Upgrade to Spark 2.1 API using shims.

Diffs (updated)
-

  build.xml bba2b52d9354ab909ad26f969480806f6d91911c
  ivy.xml 3f2c94373ba9455bbb6a3c96bfd61fc6cfaab588
  ivy/libraries.properties c2aed45a3244dfd108a255c7308a7dcb0dabd3b5
  src/org/apache/pig/backend/hadoop/executionengine/spark/FlatMapFunctionAdapter.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/executionengine/spark/JobMetricsListener.java f81341233447203abc4800cc7b22a4f419e10262
  src/org/apache/pig/backend/hadoop/executionengine/spark/PairFlatMapFunctionAdapter.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/executionengine/spark/Spark1Shims.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/executionengine/spark/Spark2Shims.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java 237fd9431a16226234d91059088f91aab346b83c
  src/org/apache/pig/backend/hadoop/executionengine/spark/SparkShims.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/CollectedGroupConverter.java 83311dfa5bb25209a5366c2db7e8d483c31d94cd
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/FRJoinConverter.java 382258e7ff9105aa397c5a2888df0c11e9562ec9
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ForEachConverter.java b58415e7e18ca4cf1331beef06e9214600a51424
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/GlobalRearrangeConverter.java 130c8b9a747b176ce2b649ca6d5260527595fb76
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LimitConverter.java fe1b54c8f128661d7d19c276d3bb2de7874d3086
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeCogroupConverter.java adf78ecab0da10d3b1a7fdde8af2b42dd899810f
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeJoinConverter.java d1c43b1e06adc4c9fe45a83b8110402e3756
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/PoissonSampleConverter.java e003bbd95763b2d189ff9ec540c89abe52592420
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SecondaryKeySortUtil.java 00d29b44848546ed16dde2baa8c61b36939971b2
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SkewedJoinConverter.java c55ba3145495a53d69db2dd56434dcc9b3bf8ed5
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SortConverter.java baabfa090323e3bef087e259ce19df2e4c34dd63
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SparkSampleSortConverter.java 3166fdc31745c013380492e089c83f3e853a3e6e
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/StreamConverter.java 3a50d485cfd54b9f3b9c1a982e6c30497a4c85fc
  src/org/apache/pig/tools/pigstats/spark/Spark1JobStats.java PRE-CREATION
  src/org/apache/pig/tools/pigstats/spark/Spark2JobStats.java PRE-CREATION
  src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java c8cc03194b223d2ee181d73c6b651a6872cac6b6
  src/org/apache/pig/tools/pigstats/spark/SparkPigStats.java 61ccbcc9fd723f6e2e578a8476230c42d5587dfe
  test/org/apache/pig/test/TestPigRunner.java ec08417f2f71deec514cab5cfb9d2f99520ad641

Diff: https://reviews.apache.org/r/59530/diff/9/

Changes: https://reviews.apache.org/r/59530/diff/8-9/

Testing
---

Thanks,

Nandor Kollar
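
For readers unfamiliar with the "shims" approach named in the description, the following is a minimal, dependency-free sketch of the general pattern, not the code from this patch. Every name in it (ShimSketch, FlatMapAdapter, Shims, V1Shims, V2Shims) is hypothetical and chosen for illustration only. It assumes the kind of difference a shim typically hides: for instance, Spark's Java FlatMapFunction.call returns an Iterable in 1.x and an Iterator in 2.x. The base class is Serializable because such objects are usually shipped to executors, which is presumably why the update above mentions a serialization fix.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical names throughout; this sketch depends on neither Spark nor Pig.
class ShimSketch {

    /** Version-neutral function type that calling code is written against. */
    interface FlatMapAdapter<T, R> extends Serializable {
        Iterator<R> call(T input);
    }

    /** One subclass per supported engine major version. */
    abstract static class Shims implements Serializable {
        private static final long serialVersionUID = 1L;

        abstract String name();

        /** Returns the adapter's result in the shape this engine version expects. */
        abstract <T, R> Object flatMapResult(FlatMapAdapter<T, R> f, T input);

        /** Picks the shim from the major component of a version string. */
        static Shims forVersion(String version) {
            int major = Integer.parseInt(version.split("\\.")[0]);
            return major >= 2 ? new V2Shims() : new V1Shims();
        }
    }

    static final class V1Shims extends Shims {
        String name() { return "v1"; }

        <T, R> Object flatMapResult(FlatMapAdapter<T, R> f, T input) {
            // 1.x-style contract: materialize the iterator into an Iterable.
            List<R> out = new ArrayList<>();
            f.call(input).forEachRemaining(out::add);
            return out;
        }
    }

    static final class V2Shims extends Shims {
        String name() { return "v2"; }

        <T, R> Object flatMapResult(FlatMapAdapter<T, R> f, T input) {
            // 2.x-style contract: pass the Iterator through unchanged.
            return f.call(input);
        }
    }

    public static void main(String[] args) {
        FlatMapAdapter<String, String> splitWords =
                line -> Arrays.asList(line.split(" ")).iterator();
        for (String v : new String[] {"1.6.1", "2.1.0"}) {
            Shims s = Shims.forVersion(v);
            Object r = s.flatMapResult(splitWords, "a b c");
            System.out.println(v + " -> " + s.name()
                    + ", Iterable=" + (r instanceof Iterable)
                    + ", Iterator=" + (r instanceof Iterator));
        }
    }
}
```

The point of the design is that callers write a single adapter and only the small shim classes differ per version, so the rest of the code base compiles against one interface.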