Meanwhile the process crashed. The Log file shows an out of memory exception ================================================================================ Pig Stack Trace --------------- ERROR 1200: Failed to create DataStorage
Failed to parse: Failed to create DataStorage at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:198) at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1650) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1623) at org.apache.pig.PigServer.registerQuery(PigServer.java:577) at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1093) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69) at org.apache.pig.Main.run(Main.java:541) at org.apache.pig.Main.main(Main.java:156) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: java.lang.RuntimeException: Failed to create DataStorage at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75) at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:53) at org.apache.pig.builtin.JsonMetadata.findMetaFile(JsonMetadata.java:109) at org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:191) at org.apache.pig.builtin.PigStorage.getSchema(PigStorage.java:517) at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:175) at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:89) at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:886) at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3568) at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1625) at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102) at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560) at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421) at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188) ... 15 more Caused by: java.net.ConnectException: Call From ........ failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1414) at org.apache.hadoop.ipc.Client.call(Client.java:1363) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:699) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1762) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398) at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:147) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368) at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:70) ... 28 more Caused by: java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493) at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699) at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462) at org.apache.hadoop.ipc.Client.call(Client.java:1381) ... 52 more ================================================================================ Pig Stack Trace --------------- ERROR 2998: Unhandled internal error. Java heap space java.lang.OutOfMemoryError: Java heap space at java.net.URLDecoder.decode(URLDecoder.java:169) at org.apache.hadoop.fs.HarFileSystem.decodeString(HarFileSystem.java:257) at org.apache.hadoop.fs.HarFileSystem.decodeFileName(HarFileSystem.java:264) at org.apache.hadoop.fs.HarFileSystem.access$200(HarFileSystem.java:50) at org.apache.hadoop.fs.HarFileSystem$HarMetaData.parseMetaData(HarFileSystem.java:1200) at org.apache.hadoop.fs.HarFileSystem$HarMetaData.access$000(HarFileSystem.java:1098) at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:166) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368) at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:70) at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:53) at org.apache.pig.builtin.JsonMetadata.findMetaFile(JsonMetadata.java:109) at org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:191) at org.apache.pig.builtin.PigStorage.getSchema(PigStorage.java:517) at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:175) at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:89) at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:886) at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3568) at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1625) at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102) at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560) at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421) at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188) at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1650) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1623) at org.apache.pig.PigServer.registerQuery(PigServer.java:577) at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1093) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198) ================================================================================ -----Ursprüngliche Nachricht----- Von: Thon, Ingo [mailto:ingo.t...@siemens.com] Gesendet: Freitag, 6. Februar 2015 15:54 An: user@pig.apache.org Betreff: Performance with har files Hello, I have a pretty huge har (Hadoop Archive) file containing roughly 600k files. I want to convert the files in this archive, but strangely when I do something like: RAW_LOGS = LOAD 'har://hdfs-192.....:8020/user/thon/logs.har/somfile.txt.gz'; Basically nothing happens and the grunt shell is blocking. I was wondering whether this is a (known) performance issue or I do something wrong? Thanks in advance Ingo