[jira] [Commented] (PARQUET-514) Automate coveralls.io updates in Travis CI
[ https://issues.apache.org/jira/browse/PARQUET-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153679#comment-15153679 ]

Wes McKinney commented on PARQUET-514:
--------------------------------------

see patch https://github.com/apache/parquet-cpp/pull/57

> Automate coveralls.io updates in Travis CI
> -------------------------------------------
>
>                 Key: PARQUET-514
>                 URL: https://issues.apache.org/jira/browse/PARQUET-514
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>            Priority: Minor
>
> The repo has been enabled in INFRA-11273, so all that's left is to work on
> the Travis CI build matrix and add coveralls to one of the builds (rather
> than running it for all of them)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
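The shape of that change is easy to sketch: build and test across the whole Travis matrix, but upload coverage from only one matrix entry. Below is a hypothetical {{.travis.yml}} fragment, not the contents of pull request 57; the {{PARQUET_COVERAGE}} variable and script name are made up for illustration.

{code}
# Sketch only -- not the actual patch in PR 57.
language: cpp
matrix:
  include:
    - compiler: gcc
      env: PARQUET_COVERAGE=1   # illustrative flag, not a real parquet-cpp variable
    - compiler: clang

script:
  - ./ci/build_and_test.sh      # hypothetical build script

after_success:
  # cpp-coveralls uploads gcov results; only do so in the coverage build.
  - if [ "$PARQUET_COVERAGE" = "1" ]; then coveralls --gcov-options '\-lp'; fi
{code}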
[jira] [Resolved] (PARQUET-470) Thrift 0.9.3 cannot be used in conjunction with googletest and C++11 on Linux
[ https://issues.apache.org/jira/browse/PARQUET-470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved PARQUET-470.
----------------------------------
    Resolution: Fixed

Resolved in PARQUET-468

> Thrift 0.9.3 cannot be used in conjunction with googletest and C++11 on Linux
> ------------------------------------------------------------------------------
>
>                 Key: PARQUET-470
>                 URL: https://issues.apache.org/jira/browse/PARQUET-470
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>
> Thrift 0.9.3 introduces an {{#include}} which causes {{tr1/functional}} to
> be included, causing a compiler conflict with googletest, which has its own
> portability macros surrounding its use of {{std::tr1::tuple}}. I spent a
> bunch of time twiddling compiler flags to try to resolve this conflict, but
> wasn't able to figure it out.
> If this is a Thrift bug, we should report it to Thrift. If it's fixable by
> compiler flags, then we should figure that out and track the issue here;
> otherwise users with the latest version of Thrift will be unable to compile
> the parquet-cpp test suite.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
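For anyone hitting the same clash on an older checkout: a commonly used workaround for this family of googletest/Thrift conflicts -- offered here as a sketch, not necessarily what PARQUET-468 actually did -- is to tell googletest to use the toolchain's {{std::tr1::tuple}} instead of its own bundled implementation.

{code}
// Sketch of a common workaround for the gtest/tr1 tuple clash -- not
// necessarily the fix that landed via PARQUET-468. GTEST_USE_OWN_TR1_TUPLE
// must be defined before any gtest header is included; with the value 0,
// googletest uses std::tr1::tuple directly, avoiding the redefinition
// conflict once Thrift pulls in tr1/functional.
#define GTEST_USE_OWN_TR1_TUPLE 0
#include <gtest/gtest.h>

TEST(Sanity, Compiles) { ASSERT_TRUE(true); }
{code}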
[jira] [Commented] (PARQUET-531) Can't read past first page in a column
[ https://issues.apache.org/jira/browse/PARQUET-531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153527#comment-15153527 ]

Deepak Majeti commented on PARQUET-531:
---------------------------------------

I will work with [~asandryh] and try to push them by tomorrow or this weekend.

> Can't read past first page in a column
> ---------------------------------------
>
>                 Key: PARQUET-531
>                 URL: https://issues.apache.org/jira/browse/PARQUET-531
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>         Environment: Ubuntu Linux 14.04 (no obvious platform dependence),
> Parquet file created by Apache Spark 1.5.0 on the same platform.
>            Reporter: Spiro Michaylov
>            Assignee: Deepak Majeti
>         Attachments: part-r-00031-e5d9a4ef-d73e-406c-8c2f-9ad1f20ebf8e.gz.parquet
>
> Building the code as of 2/14/2015 and adding the obvious three lines of code
> to serialized-page.cc to enable the newly added CompressionCodec::GZIP:
> {code}
> case parquet::CompressionCodec::GZIP:
>   decompressor_.reset(new GZipCodec());
>   break;
> {code}
> I try to run the parquet_reader example on the column I'm about to attach,
> which was created by Apache Spark 1.5.0. It works surprisingly well until it
> hits the end of the first page, where it dies with
> {quote}
> Parquet error: Value was non-null, but has not been buffered
> {quote}
> I realize you may be reluctant to look at this because (a) the GZip support
> is new and (b) I had to modify the code to enable it, but actually things
> seem to decompress just fine (congratulations: this is awesome!): looking at
> the problem in the debugger and tracing through a bit, it seems to me like
> the buffering is a bit screwed up in general -- some kind of confusion
> between the buffering at the Scanner and Reader levels. I can reproduce the
> problem by reading through just a single column too.
> It fails after 128 rows, which is suspicious given this line in
> column/scanner.h:
> {code}
> DEFAULT_SCANNER_BATCH_SIZE = 128;
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (PARQUET-531) Can't read past first page in a column
[ https://issues.apache.org/jira/browse/PARQUET-531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153506#comment-15153506 ]

Wes McKinney commented on PARQUET-531:
--------------------------------------

Do you have a time estimate for these patches (for my own planning)?

> Can't read past first page in a column
> ---------------------------------------
>
>                 Key: PARQUET-531
>                 URL: https://issues.apache.org/jira/browse/PARQUET-531
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>         Environment: Ubuntu Linux 14.04 (no obvious platform dependence),
> Parquet file created by Apache Spark 1.5.0 on the same platform.
>            Reporter: Spiro Michaylov
>            Assignee: Deepak Majeti
>         Attachments: part-r-00031-e5d9a4ef-d73e-406c-8c2f-9ad1f20ebf8e.gz.parquet
>
> Building the code as of 2/14/2015 and adding the obvious three lines of code
> to serialized-page.cc to enable the newly added CompressionCodec::GZIP:
> {code}
> case parquet::CompressionCodec::GZIP:
>   decompressor_.reset(new GZipCodec());
>   break;
> {code}
> I try to run the parquet_reader example on the column I'm about to attach,
> which was created by Apache Spark 1.5.0. It works surprisingly well until it
> hits the end of the first page, where it dies with
> {quote}
> Parquet error: Value was non-null, but has not been buffered
> {quote}
> I realize you may be reluctant to look at this because (a) the GZip support
> is new and (b) I had to modify the code to enable it, but actually things
> seem to decompress just fine (congratulations: this is awesome!): looking at
> the problem in the debugger and tracing through a bit, it seems to me like
> the buffering is a bit screwed up in general -- some kind of confusion
> between the buffering at the Scanner and Reader levels. I can reproduce the
> problem by reading through just a single column too.
> It fails after 128 rows, which is suspicious given this line in
> column/scanner.h:
> {code}
> DEFAULT_SCANNER_BATCH_SIZE = 128;
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (PARQUET-531) Can't read past first page in a column
[ https://issues.apache.org/jira/browse/PARQUET-531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153496#comment-15153496 ]

Deepak Majeti commented on PARQUET-531:
---------------------------------------

This will be resolved by the upcoming patches for PARQUET-526 and PARQUET-532.

> Can't read past first page in a column
> ---------------------------------------
>
>                 Key: PARQUET-531
>                 URL: https://issues.apache.org/jira/browse/PARQUET-531
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>         Environment: Ubuntu Linux 14.04 (no obvious platform dependence),
> Parquet file created by Apache Spark 1.5.0 on the same platform.
>            Reporter: Spiro Michaylov
>            Assignee: Deepak Majeti
>         Attachments: part-r-00031-e5d9a4ef-d73e-406c-8c2f-9ad1f20ebf8e.gz.parquet
>
> Building the code as of 2/14/2015 and adding the obvious three lines of code
> to serialized-page.cc to enable the newly added CompressionCodec::GZIP:
> {code}
> case parquet::CompressionCodec::GZIP:
>   decompressor_.reset(new GZipCodec());
>   break;
> {code}
> I try to run the parquet_reader example on the column I'm about to attach,
> which was created by Apache Spark 1.5.0. It works surprisingly well until it
> hits the end of the first page, where it dies with
> {quote}
> Parquet error: Value was non-null, but has not been buffered
> {quote}
> I realize you may be reluctant to look at this because (a) the GZip support
> is new and (b) I had to modify the code to enable it, but actually things
> seem to decompress just fine (congratulations: this is awesome!): looking at
> the problem in the debugger and tracing through a bit, it seems to me like
> the buffering is a bit screwed up in general -- some kind of confusion
> between the buffering at the Scanner and Reader levels. I can reproduce the
> problem by reading through just a single column too.
> It fails after 128 rows, which is suspicious given this line in
> column/scanner.h:
> {code}
> DEFAULT_SCANNER_BATCH_SIZE = 128;
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
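A minimal sketch of the refill pattern under suspicion (class and method names here are illustrative, not the parquet-cpp API): a scanner that buffers values in 128-value batches has to hand control back to the column reader at every batch boundary, and a mistake in that hand-off fails after exactly {{DEFAULT_SCANNER_BATCH_SIZE}} values, matching the report above.

{code}
#include <algorithm>
#include <cstdint>
#include <vector>

constexpr int64_t kBatchSize = 128;  // mirrors DEFAULT_SCANNER_BATCH_SIZE

// Stand-in for the page-level reader: produces `total` sequential values.
class ColumnReaderStub {
 public:
  explicit ColumnReaderStub(int64_t total) : total_(total) {}
  int64_t ReadBatch(int64_t max_values, int32_t* out) {
    int64_t n = std::min(max_values, total_ - produced_);
    for (int64_t i = 0; i < n; ++i) out[i] = static_cast<int32_t>(produced_ + i);
    produced_ += n;
    return n;
  }
 private:
  int64_t total_;
  int64_t produced_ = 0;
};

class Scanner {
 public:
  explicit Scanner(ColumnReaderStub* reader)
      : reader_(reader), buffer_(kBatchSize) {}

  // Returns false once the column is exhausted.
  bool NextValue(int32_t* out) {
    if (pos_ == filled_) {
      // The crucial step: refill on *every* batch boundary. Getting this
      // hand-off wrong is the kind of Scanner/Reader buffering confusion
      // that would fail after precisely 128 values.
      filled_ = reader_->ReadBatch(kBatchSize, buffer_.data());
      pos_ = 0;
      if (filled_ == 0) return false;
    }
    *out = buffer_[pos_++];
    return true;
  }

 private:
  ColumnReaderStub* reader_;
  std::vector<int32_t> buffer_;
  int64_t pos_ = 0;
  int64_t filled_ = 0;
};

int main() {
  ColumnReaderStub reader(1000);  // more than one 128-value batch
  Scanner scanner(&reader);
  int32_t v;
  int64_t count = 0;
  while (scanner.NextValue(&v)) ++count;
  return count == 1000 ? 0 : 1;  // must read past the first batch boundary
}
{code}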
[jira] [Resolved] (PARQUET-471) Use the same environment setup script for Travis CI as local sandbox development
[ https://issues.apache.org/jira/browse/PARQUET-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Le Dem resolved PARQUET-471.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: cpp-0.1

Issue resolved by pull request 54
[https://github.com/apache/parquet-cpp/pull/54]

> Use the same environment setup script for Travis CI as local sandbox
> development
> ---------------------------------------------------------------------
>
>                 Key: PARQUET-471
>                 URL: https://issues.apache.org/jira/browse/PARQUET-471
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>             Fix For: cpp-0.1
>
> Currently the environment setups are slightly different, and so a passing
> Travis CI build might have a problem with the sandbox build and vice versa.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Resolved] (PARQUET-499) Complete PlainEncoder implementation for all primitive types and test end to end
[ https://issues.apache.org/jira/browse/PARQUET-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Le Dem resolved PARQUET-499.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: cpp-0.1

resolved by: https://github.com/apache/parquet-cpp/pull/52

> Complete PlainEncoder implementation for all primitive types and test end
> to end
> --------------------------------------------------------------------------
>
>                 Key: PARQUET-499
>                 URL: https://issues.apache.org/jira/browse/PARQUET-499
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>            Assignee: Deepak Majeti
>             Fix For: cpp-0.1
>
> As part of PARQUET-485, I added a partial {{Encoding::PLAIN}} encoder
> implementation. This needs to be finished, with a test suite that validates
> data round-trips across all primitive types.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
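For fixed-width primitive types, {{Encoding::PLAIN}} is just the values' little-endian bytes laid out back to back, so the round-trip property such a test suite validates can be sketched in a few lines. The helper names below are illustrative, not the parquet-cpp encoder API, and a little-endian host is assumed (which matches Parquet's on-disk layout); booleans (bit-packed) and byte arrays (length-prefixed) are encoded differently and would need their own cases.

{code}
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// PLAIN-encode fixed-width values: raw little-endian bytes, back to back.
template <typename T>
std::vector<char> PlainEncode(const std::vector<T>& values) {
  std::vector<char> out(values.size() * sizeof(T));
  std::memcpy(out.data(), values.data(), out.size());
  return out;
}

template <typename T>
std::vector<T> PlainDecode(const std::vector<char>& bytes, size_t count) {
  std::vector<T> out(count);
  std::memcpy(out.data(), bytes.data(), count * sizeof(T));
  return out;
}

int main() {
  std::vector<int64_t> input = {1, -2, 3000000000LL};
  // The property the test suite needs, per type: decode(encode(x)) == x.
  assert(PlainDecode<int64_t>(PlainEncode(input), input.size()) == input);
}
{code}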
[jira] [Commented] (PARQUET-516) Add better error handling for reading local files
[ https://issues.apache.org/jira/browse/PARQUET-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153264#comment-15153264 ]

Aliaksei Sandryhaila commented on PARQUET-516:
----------------------------------------------

Initial PR is available: https://github.com/apache/parquet-cpp/pull/56

> Add better error handling for reading local files
> --------------------------------------------------
>
>                 Key: PARQUET-516
>                 URL: https://issues.apache.org/jira/browse/PARQUET-516
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>            Assignee: Aliaksei Sandryhaila
>            Priority: Minor
>
> The {{LocalFile}} reader class does not handle the various failure modes
> for the cstdio system calls.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
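A sketch of the kind of checks the issue asks for: every cstdio call has a failure mode that should surface as an error rather than being silently ignored. The function shape and exception type below are illustrative, not the actual {{LocalFile}} implementation or whatever PR 56 does.

{code}
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <stdexcept>
#include <string>

// Illustrative only: each cstdio call's result is checked and turned into
// an exception carrying errno's description.
void ReadFileChunk(const std::string& path, char* out, size_t nbytes) {
  std::FILE* f = std::fopen(path.c_str(), "rb");
  if (f == nullptr) {
    throw std::runtime_error("fopen failed: " + std::string(std::strerror(errno)));
  }
  size_t nread = std::fread(out, 1, nbytes, f);
  if (nread < nbytes && std::ferror(f)) {  // short read at EOF is not an error here
    std::fclose(f);
    throw std::runtime_error("fread failed: " + std::string(std::strerror(errno)));
  }
  if (std::fclose(f) != 0) {
    throw std::runtime_error("fclose failed: " + std::string(std::strerror(errno)));
  }
}
{code}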
[jira] [Updated] (PARQUET-478) Reassembly algorithms for Arrow in-memory columnar memory layout
[ https://issues.apache.org/jira/browse/PARQUET-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated PARQUET-478:
---------------------------------
    Description: 
I plan to use parquet-cpp primarily in conjunction with columnar data structures (http://arrow.apache.org).

Specifically, this requires the interpretation of repetition / definition levels:

* Computing null bits / bytes for each logical level of the nested tree (group, array, primitive leaf)
* Computing implied array sizes for each repeated group (according to the 1-, 2-, or 3-level array encoding)

The results of this reconstruction will simply be C arrays accompanied by the parquet-cpp logical schema; this way we can make it easy to adapt to different in-memory columnar memory schemes.

As far as implementation, it would make sense to proceed first with functional unit tests of the reassembly algorithms, using repetition / definition levels declared in the test suite as C++ vectors -- otherwise it's going to be too tedious trying to produce valid Parquet test data files which explore all of the different edge cases.

Several other teams (Spark, Drill, Parquet-Java) are currently working on related efforts along these lines, so we can engage when appropriate to collaborate on algorithms and nuances of this approach to avoid unnecessary code churn / bugs.

  was:
I plan to use parquet-cpp primarily in conjunction with columnar data structures.

Specifically, this requires the interpretation of repetition / definition levels:

* Computing null bits / bytes for each logical level of the nested tree (group, array, primitive leaf)
* Computing implied array sizes for each repeated group (according to the 1-, 2-, or 3-level array encoding)

The results of this reconstruction will simply be C arrays accompanied by the parquet-cpp logical schema; this way we can make it easy to adapt to different in-memory columnar memory schemes.

As far as implementation, it would make sense to proceed first with functional unit tests of the reassembly algorithms, using repetition / definition levels declared in the test suite as C++ vectors -- otherwise it's going to be too tedious trying to produce valid Parquet test data files which explore all of the different edge cases.

Several other teams (Spark, Drill, Parquet-Java) are currently working on related efforts along these lines, so we can engage when appropriate to collaborate on algorithms and nuances of this approach to avoid unnecessary code churn / bugs.

        Summary: Reassembly algorithms for Arrow in-memory columnar memory layout  (was: Reassembly algorithms for nested in-memory columnar memory layout)

> Reassembly algorithms for Arrow in-memory columnar memory layout
> -----------------------------------------------------------------
>
>                 Key: PARQUET-478
>                 URL: https://issues.apache.org/jira/browse/PARQUET-478
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>
> I plan to use parquet-cpp primarily in conjunction with columnar data
> structures (http://arrow.apache.org).
> Specifically, this requires the interpretation of repetition / definition
> levels:
> * Computing null bits / bytes for each logical level of the nested tree
> (group, array, primitive leaf)
> * Computing implied array sizes for each repeated group (according to the
> 1-, 2-, or 3-level array encoding)
> The results of this reconstruction will simply be C arrays accompanied by
> the parquet-cpp logical schema; this way we can make it easy to adapt to
> different in-memory columnar memory schemes.
> As far as implementation, it would make sense to proceed first with
> functional unit tests of the reassembly algorithms, using repetition /
> definition levels declared in the test suite as C++ vectors -- otherwise
> it's going to be too tedious trying to produce valid Parquet test data
> files which explore all of the different edge cases.
> Several other teams (Spark, Drill, Parquet-Java) are currently working on
> related efforts along these lines, so we can engage when appropriate to
> collaborate on algorithms and nuances of this approach to avoid unnecessary
> code churn / bugs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
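The first of the two primitives above can be sketched concisely (illustrative code, not parquet-cpp): for a flat nullable column, a value is null exactly when its definition level is below the column's maximum definition level. For nested schemas the same comparison is applied per level of the tree, which is where the 1-, 2-, and 3-level list encodings mentioned above come in.

{code}
#include <cstdint>
#include <vector>

// Flat-column case: def_levels come from the decoded column chunk, and
// max_def_level comes from the column's position in the schema.
std::vector<bool> ComputeNullMask(const std::vector<int16_t>& def_levels,
                                  int16_t max_def_level) {
  std::vector<bool> is_null(def_levels.size());
  for (size_t i = 0; i < def_levels.size(); ++i) {
    // A definition level below the maximum means the value is undefined
    // (null) at some level of the path.
    is_null[i] = def_levels[i] < max_def_level;
  }
  return is_null;
}
{code}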
[jira] [Commented] (PARQUET-537) LocalFileSource leaks resources
[ https://issues.apache.org/jira/browse/PARQUET-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153255#comment-15153255 ]

Aliaksei Sandryhaila commented on PARQUET-537:
----------------------------------------------

Yes, that's what it looks like. It's strange, since {{LocalFileSource}} should be taken care of during the destruction of the {{unique_ptr}} holding it. I'll post a PR shortly.

> LocalFileSource leaks resources
> -------------------------------
>
>                 Key: PARQUET-537
>                 URL: https://issues.apache.org/jira/browse/PARQUET-537
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>    Affects Versions: cpp-0.1
>            Reporter: Aliaksei Sandryhaila
>
> As a result of modifications introduced in PARQUET-497, LocalFileSource
> never gets deleted and the associated memory and file handle are leaked.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
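A self-contained illustration of the failure mode being discussed (class names are hypothetical, not the parquet-cpp types): if the base class destructor is not declared {{virtual}}, destroying a derived object through a {{unique_ptr}} to the base never runs the derived destructor, so the file handle leaks even though the {{unique_ptr}} itself is destroyed correctly.

{code}
#include <cstdio>
#include <memory>

struct FileSource {
  // Without 'virtual' here, deleting through unique_ptr<FileSource> would
  // skip ~LocalFile entirely and leak the handle.
  virtual ~FileSource() = default;
};

struct LocalFile : FileSource {
  explicit LocalFile(const char* path) : handle_(std::fopen(path, "rb")) {}
  ~LocalFile() override {
    if (handle_ != nullptr) std::fclose(handle_);
  }
  std::FILE* handle_;
};

int main() {
  std::unique_ptr<FileSource> src(new LocalFile("/tmp/example.parquet"));
  // When src goes out of scope, ~LocalFile runs and the handle is closed --
  // but only because ~FileSource is declared virtual.
}
{code}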
[jira] [Assigned] (PARQUET-470) Thrift 0.9.3 cannot be used in conjunction with googletest and C++11 on Linux
[ https://issues.apache.org/jira/browse/PARQUET-470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney reassigned PARQUET-470:
------------------------------------
    Assignee: Wes McKinney

> Thrift 0.9.3 cannot be used in conjunction with googletest and C++11 on Linux
> ------------------------------------------------------------------------------
>
>                 Key: PARQUET-470
>                 URL: https://issues.apache.org/jira/browse/PARQUET-470
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>
> Thrift 0.9.3 introduces an {{#include}} which causes {{tr1/functional}} to
> be included, causing a compiler conflict with googletest, which has its own
> portability macros surrounding its use of {{std::tr1::tuple}}. I spent a
> bunch of time twiddling compiler flags to try to resolve this conflict, but
> wasn't able to figure it out.
> If this is a Thrift bug, we should report it to Thrift. If it's fixable by
> compiler flags, then we should figure that out and track the issue here;
> otherwise users with the latest version of Thrift will be unable to compile
> the parquet-cpp test suite.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (PARQUET-470) Thrift 0.9.3 cannot be used in conjunction with googletest and C++11 on Linux
[ https://issues.apache.org/jira/browse/PARQUET-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153252#comment-15153252 ]

Wes McKinney commented on PARQUET-470:
--------------------------------------

this is fixed in https://github.com/apache/parquet-cpp/pull/55

> Thrift 0.9.3 cannot be used in conjunction with googletest and C++11 on Linux
> ------------------------------------------------------------------------------
>
>                 Key: PARQUET-470
>                 URL: https://issues.apache.org/jira/browse/PARQUET-470
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>
> Thrift 0.9.3 introduces an {{#include}} which causes {{tr1/functional}} to
> be included, causing a compiler conflict with googletest, which has its own
> portability macros surrounding its use of {{std::tr1::tuple}}. I spent a
> bunch of time twiddling compiler flags to try to resolve this conflict, but
> wasn't able to figure it out.
> If this is a Thrift bug, we should report it to Thrift. If it's fixable by
> compiler flags, then we should figure that out and track the issue here;
> otherwise users with the latest version of Thrift will be unable to compile
> the parquet-cpp test suite.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (PARQUET-537) LocalFileSource leaks resources
[ https://issues.apache.org/jira/browse/PARQUET-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153246#comment-15153246 ]

Wes McKinney commented on PARQUET-537:
--------------------------------------

Could you clarify how to reproduce this problem? The file's lifetime is currently tied to the {{ParquetFileReader}} -- are you saying that when {{ParquetFileReader}} is deleted that {{LocalFileSource}}'s virtual dtor is not called?

> LocalFileSource leaks resources
> -------------------------------
>
>                 Key: PARQUET-537
>                 URL: https://issues.apache.org/jira/browse/PARQUET-537
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>    Affects Versions: cpp-0.1
>            Reporter: Aliaksei Sandryhaila
>
> As a result of modifications introduced in PARQUET-497, LocalFileSource
> never gets deleted and the associated memory and file handle are leaked.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (PARQUET-537) LocalFileSource leaks resources
Aliaksei Sandryhaila created PARQUET-537:
--------------------------------------------

             Summary: LocalFileSource leaks resources
                 Key: PARQUET-537
                 URL: https://issues.apache.org/jira/browse/PARQUET-537
             Project: Parquet
          Issue Type: Bug
          Components: parquet-cpp
    Affects Versions: cpp-0.1
            Reporter: Aliaksei Sandryhaila

As a result of modifications introduced in PARQUET-497, LocalFileSource never
gets deleted and the associated memory and file handle are leaked.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (PARQUET-471) Use the same environment setup script for Travis CI as local sandbox development
[ https://issues.apache.org/jira/browse/PARQUET-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153049#comment-15153049 ]

Wes McKinney commented on PARQUET-471:
--------------------------------------

See patch https://github.com/apache/parquet-cpp/pull/54

> Use the same environment setup script for Travis CI as local sandbox
> development
> ---------------------------------------------------------------------
>
>                 Key: PARQUET-471
>                 URL: https://issues.apache.org/jira/browse/PARQUET-471
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>
> Currently the environment setups are slightly different, and so a passing
> Travis CI build might have a problem with the sandbox build and vice versa.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Assigned] (PARQUET-471) Use the same environment setup script for Travis CI as local sandbox development
[ https://issues.apache.org/jira/browse/PARQUET-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney reassigned PARQUET-471:
------------------------------------
    Assignee: Wes McKinney

> Use the same environment setup script for Travis CI as local sandbox
> development
> ---------------------------------------------------------------------
>
>                 Key: PARQUET-471
>                 URL: https://issues.apache.org/jira/browse/PARQUET-471
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>
> Currently the environment setups are slightly different, and so a passing
> Travis CI build might have a problem with the sandbox build and vice versa.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (PARQUET-536) Configure Travis CI caching to preserve built thirdparty in between builds
Wes McKinney created PARQUET-536:
------------------------------------

             Summary: Configure Travis CI caching to preserve built thirdparty in between builds
                 Key: PARQUET-536
                 URL: https://issues.apache.org/jira/browse/PARQUET-536
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-cpp
            Reporter: Wes McKinney

Follow up to PARQUET-471. Will speed up builds.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
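A hypothetical fragment of what that could look like (the directory path is illustrative; this is not the eventual patch): Travis CI's built-in directory caching can persist the compiled thirdparty toolchain between builds.

{code}
# Sketch only: persist the built thirdparty toolchain between Travis builds.
cache:
  directories:
    - $HOME/build/parquet-cpp/thirdparty   # illustrative path
{code}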
Re: HashJoin throws ParquetDecodingException with input as ParquetTupleScheme
Santlal,

What version of Parquet are you using? I think this was recently fixed by
Reuben.

rb

On Tue, Feb 16, 2016 at 5:16 AM, Santlal J Gupta <
santlal.gu...@bitwiseglobal.com> wrote:

> Hi,
>
> I am facing a problem while using *HashJoin* with input using
> *ParquetTupleScheme*. I have two source taps, of which one uses the
> *TextDelimited* scheme and the other uses *ParquetTupleScheme*. I am
> performing a *HashJoin* and writing the data as a delimited file. The
> program runs successfully in local mode, but when I try to run it on the
> cluster, it gives the following error:
>
> parquet.io.ParquetDecodingException: Can not read value at 0 in block -1
> in file hdfs://Hostname:8020/user/username/testData/lookup-file.parquet
>         at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:211)
>         at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:144)
>         at parquet.hadoop.mapred.DeprecatedParquetInputFormat$RecordReaderWrapper.<init>(DeprecatedParquetInputFormat.java:91)
>         at parquet.hadoop.mapred.DeprecatedParquetInputFormat.getRecordReader(DeprecatedParquetInputFormat.java:42)
>         at cascading.tap.hadoop.io.MultiRecordReaderIterator.makeReader(MultiRecordReaderIterator.java:123)
>         at cascading.tap.hadoop.io.MultiRecordReaderIterator.getNextReader(MultiRecordReaderIterator.java:172)
>         at cascading.tap.hadoop.io.MultiRecordReaderIterator.hasNext(MultiRecordReaderIterator.java:133)
>         at cascading.tuple.TupleEntrySchemeIterator.<init>(TupleEntrySchemeIterator.java:94)
>         at cascading.tap.hadoop.io.HadoopTupleEntrySchemeIterator.<init>(HadoopTupleEntrySchemeIterator.java:49)
>         at cascading.tap.hadoop.io.HadoopTupleEntrySchemeIterator.<init>(HadoopTupleEntrySchemeIterator.java:44)
>         at cascading.tap.hadoop.Hfs.openForRead(Hfs.java:439)
>         at cascading.tap.hadoop.Hfs.openForRead(Hfs.java:108)
>         at cascading.flow.stream.element.SourceStage.map(SourceStage.java:82)
>         at cascading.flow.stream.element.SourceStage.run(SourceStage.java:66)
>         at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:139)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.NullPointerException
>         at parquet.hadoop.util.counters.mapred.MapRedCounterAdapter.increment(MapRedCounterAdapter.java:34)
>         at parquet.hadoop.util.counters.BenchmarkCounter.incrementTotalBytes(BenchmarkCounter.java:75)
>         at parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:349)
>         at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:114)
>         at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:191)
>         ... 21 more
>
> *Below is the use case:*
>
> public static void main(String[] args) throws IOException {
>
>     Configuration conf = new Configuration();
>     String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
>
>     String argsString = "";
>     for (String arg : otherArgs) {
>         argsString = argsString + " " + arg;
>     }
>     System.out.println("After processing, arguments are:" + argsString);
>
>     Properties properties = new Properties();
>     properties.putAll(conf.getValByRegex(".*"));
>
>     String outputPath = "testData/BasicEx_Output";
>     Class types1[] = { String.class, String.class, String.class };
>     Fields f1 = new Fields("id1", "city1", "state");
>
>     Tap source = new Hfs(new TextDelimited(f1, "|", "", types1, false), "main-txt-file.dat");
>     Pipe pipe = new Pipe("ReadWrite");
>
>     Scheme pScheme = new ParquetTupleScheme();
>     Tap source2 = new Hfs(pScheme, "testData/lookup-file.parquet");
>     Pipe pipe2 = new Pipe("ReadWrite2");
>
>     Pipe tokenPipe = new HashJoin(pipe, new Fields("id1"), pipe2, new Fields("id"), new LeftJoin());
>
>     Tap sink = new Hfs(new TextDelimited(f1, true, "|"), outputPath, SinkMode.REPLACE);
>
>     FlowDef flowDef1 = FlowDef.flowDef().addSource(pipe, source).addSource(pipe2, source2).addTailSink(tokenPipe, sink);
>     new Hadoop2MR1FlowConnector(properties).connect(flowDef1).complete();
> }
>
> I have attached the input files for reference. Please help me in solving
> this issue.
>
> I have