Thanks, Bill and Josh! rfile-info was the tool I was looking for. It definitely seems related to HDFS encryption. I put a gist here showing the results:
https://gist.github.com/anonymous/13a8ba2c5f6794dc6207f1ae09b12825

The two files contain exactly the same key-value pairs. They differ by 2
bytes in the footer of the RFile. The file written to the encrypted HDFS
directory is consistently corrupt - I'm not confident yet that it's always
corrupt in the same place, because I see several different errors, but in
this case those 2 bytes were wrong.

-Russ

On Fri, Jul 8, 2016 at 12:30 PM Josh Elser <[email protected]> wrote:

> Yeah, I'd lean towards something corrupting the file as well. We
> presently have two BCFile versions: 2.0 and 1.0. Both are supported by
> the code, so it should not be possible to create a bad RFile using our
> APIs (assuming correctness from the filesystem, anyway).
>
> I'm reminded of HADOOP-11674, but a quick check does show that it is
> fixed in your HDP-2.3.4 version (sorry for injecting $vendor here).
>
> Some other thoughts on how you could proceed:
>
> * Can Spark write the file to the local fs? Maybe you can rule out HDFS
>   with encryption as a contributing issue by just writing directly to
>   local disk and then uploading the files to HDFS after the fact (as a
>   test).
> * `accumulo rfile-info` should fail in the same way if the metadata is
>   busted, as a way to verify things.
> * You can use rfile-info on both files, in HDFS and on the local fs
>   (tying into the first point).
> * If you can share one of these files that is invalid, we can rip it
>   apart and see what's going on.
>
> William Slacum wrote:
> > I wonder if the file isn't being decrypted properly. I don't see why
> > it would write out incompatible file versions.
> >
> > On Fri, Jul 8, 2016 at 3:02 PM, Josh Elser <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> >     Interesting! I have not run into this one before.
> >
> >     You could use `accumulo rfile-info`, but I'd guess that would net
> >     the same exception you see below.
> >
> >     Let me see if I can dig a little into the code and come up with a
> >     plausible explanation.
> >
> >     Russ Weeks wrote:
> >
> >         Hi, folks,
> >
> >         Has anybody ever encountered a problem where the RFiles that
> >         are generated by AccumuloFileOutputFormat can't be imported
> >         using TableOperations.importDirectory?
> >
> >         I'm seeing this problem very frequently for small RFiles and
> >         occasionally for larger RFiles. The errors shown in the
> >         monitor's log UI suggest a corrupt file, to me. For instance,
> >         the stack trace below shows a case where the BCFile version
> >         was incorrect, but sometimes it will complain about an invalid
> >         length, a negative offset, or an invalid codec.
> >
> >         I'm using HDP Accumulo 1.7.0 (1.7.0.2.3.4.12-1) on an encrypted
> >         HDFS volume, with Kerberos turned on. The RFiles are generated
> >         by AccumuloFileOutputFormat from a Spark job.
> >
> >         A very small RFile that exhibits this problem is available here:
> >         http://firebar.newbrightidea.com/downloads/bad_rfiles/I0000waz.rf
> >
> >         I'm pretty confident that the keys are being written to the
> >         RFile in order. Are there any tools I could use to inspect the
> >         internal structure of the RFile?
> >
> >         Thanks,
> >         -Russ
> >
> >         Unable to find tablets that overlap file
> >         hdfs://[redacted]/accumulo/data/tables/f/b-0000ze9/I0000zeb.rf
> >         java.lang.RuntimeException: Incompatible BCFile fileBCFileVersion.
> >         at org.apache.accumulo.core.file.rfile.bcfile.BCFile$Reader.<init>(BCFile.java:828)
> >         at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.init(CachableBlockFile.java:246)
> >         at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBCFile(CachableBlockFile.java:257)
> >         at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.access$100(CachableBlockFile.java:137)
> >         at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader$MetaBlockLoader.get(CachableBlockFile.java:209)
> >         at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBlock(CachableBlockFile.java:313)
> >         at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:368)
> >         at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:137)
> >         at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:843)
> >         at org.apache.accumulo.core.file.rfile.RFileOperations.openReader(RFileOperations.java:79)
> >         at org.apache.accumulo.core.file.DispatchingFileFactory.openReader(DispatchingFileFactory.java:69)
> >         at org.apache.accumulo.server.client.BulkImporter.findOverlappingTablets(BulkImporter.java:644)
> >         at org.apache.accumulo.server.client.BulkImporter.findOverlappingTablets(BulkImporter.java:615)
> >         at org.apache.accumulo.server.client.BulkImporter$1.run(BulkImporter.java:146)
> >         at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> >         at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >         at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> >         at java.lang.Thread.run(Thread.java:745)
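A quick way to pin down the 2-byte difference Russ mentions is a plain byte-by-byte comparison of the two files once both copies are on the local filesystem. A minimal sketch in Java (the class name and the input paths are placeholders, not the files from the gist):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    // Print every byte offset where the two RFiles differ.
    public class RFileDiff {
        public static void main(String[] args) throws IOException {
            byte[] a = Files.readAllBytes(Paths.get(args[0])); // e.g. copy from the unencrypted dir
            byte[] b = Files.readAllBytes(Paths.get(args[1])); // e.g. copy from the encrypted dir
            int len = Math.min(a.length, b.length);
            for (int i = 0; i < len; i++) {
                if (a[i] != b[i]) {
                    System.out.printf("offset %d: %02x vs %02x%n", i, a[i] & 0xff, b[i] & 0xff);
                }
            }
            if (a.length != b.length) {
                System.out.printf("lengths differ: %d vs %d%n", a.length, b.length);
            }
        }
    }

Differences near the very end of the file would sit in the BCFile footer, which is where the version check in BCFile$Reader (the exception in the stack trace above) reads from.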

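For the first of Josh's suggestions, the write-to-local-disk test can be run from the existing Spark job just by pointing the output at a file:// path. A rough sketch, assuming local mode (or a single node) so the output lands on one machine; the class name, method, output path, and `sortedKvs` (the job's sorted JavaPairRDD<Key, Value>) are made up for illustration:

    import org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.spark.api.java.JavaPairRDD;

    public class WriteLocalRFiles {
        // Write the RFiles to local disk instead of the encrypted HDFS directory (test only).
        static void writeLocal(JavaPairRDD<Key, Value> sortedKvs) {
            Configuration conf = new Configuration(sortedKvs.context().hadoopConfiguration());
            sortedKvs.saveAsNewAPIHadoopFile(
                "file:///tmp/rfile-test",   // local fs output, keeping the encryption zone out of the write path
                Key.class, Value.class,
                AccumuloFileOutputFormat.class, conf);
        }
    }

If the files written this way pass `accumulo rfile-info` locally, they can be copied into HDFS with `hdfs dfs -put` and bulk-imported from there, which should help separate an HDFS-encryption problem from a problem in the file-writing path itself.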