Is Accumulo writing RFiles to the encrypted HDFS instance, and are those OK? If only the Spark job is having issues, maybe it's using a different Hadoop client library or a different Hadoop configuration when it writes files.
On Fri, Jul 8, 2016 at 5:29 PM, Russ Weeks <[email protected]> wrote:
> Thanks, Bill and Josh! rfile-info was the tool I was looking for.
>
> It definitely seems related to HDFS encryption. I put a gist here showing
> the results:
>
> https://gist.github.com/anonymous/13a8ba2c5f6794dc6207f1ae09b12825
>
> The two files contain exactly the same key-value pairs. They differ by 2
> bytes in the footer of the RFile. The file written to the encrypted HDFS
> directory is consistently corrupt - I'm not confident yet that it's always
> corrupt in the same place because I see several different errors, but in
> this case those 2 bytes were wrong.
>
> -Russ
>
> On Fri, Jul 8, 2016 at 12:30 PM Josh Elser <[email protected]> wrote:
>
>> Yeah, I'd lean towards something corrupting the file as well. We
>> presently have two BCFile versions: 2.0 and 1.0. Both are presently
>> supported by the code, so it should not be possible to create a bad RFile
>> using our APIs (assuming correctness from the filesystem, anyway).
>>
>> I'm reminded of HADOOP-11674, but a quick check does show that is fixed
>> in your HDP-2.3.4 version (sorry for injecting $vendor here).
>>
>> Some other thoughts on how you could proceed:
>>
>> * Can Spark write the file to the local fs? Maybe you can rule out HDFS
>>   with encryption as a contributing issue by writing directly to local
>>   disk and then uploading the files to HDFS after the fact (as a test).
>> * `accumulo rfile-info` should fail in the same way if the metadata is
>>   busted, as a way to verify things.
>> * You can use rfile-info on both files, in HDFS and on the local fs
>>   (tying into the first point).
>> * If you can share one of these files that is invalid, we can rip it
>>   apart and see what's going on.
>>
>> William Slacum wrote:
>> > I wonder if the file isn't being decrypted properly. I don't see why it
>> > would write out incompatible file versions.
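[Aside for anyone reproducing the comparison in the gist above: locating exactly where two supposedly-identical RFiles diverge can be done with a few lines of Python. This is a minimal sketch, not an Accumulo tool; the file names in the usage line are hypothetical.]

```python
def byte_diffs(path_a, path_b):
    """Return (offset, byte_a, byte_b) for every position where the files differ.

    Also reports a length mismatch, since a truncated file would otherwise
    compare equal over its common prefix.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    if len(a) != len(b):
        print("length mismatch: %d vs %d bytes" % (len(a), len(b)))
    # Iterating bytes yields ints, so each entry is (offset, int, int).
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

# Hypothetical usage: byte_diffs("good/I0000waz.rf", "bad/I0000waz.rf")
# A result clustered near the end of the file points at the footer/trailer.
```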
>> >
>> > On Fri, Jul 8, 2016 at 3:02 PM, Josh Elser <[email protected]> wrote:
>> >
>> > Interesting! I have not run into this one before.
>> >
>> > You could use `accumulo rfile-info`, but I'd guess that would net
>> > the same exception you see below.
>> >
>> > Let me see if I can dig a little into the code and come up with a
>> > plausible explanation.
>> >
>> > Russ Weeks wrote:
>> >
>> > Hi, folks,
>> >
>> > Has anybody ever encountered a problem where the RFiles that are
>> > generated by AccumuloFileOutputFormat can't be imported using
>> > TableOperations.importDirectory?
>> >
>> > I'm seeing this problem very frequently for small RFiles and
>> > occasionally for larger RFiles. The errors shown in the monitor's
>> > log UI suggest a corrupt file, to me. For instance, the stack trace
>> > below shows a case where the BCFileVersion was incorrect, but
>> > sometimes it will complain about an invalid length, negative offset,
>> > or invalid codec.
>> >
>> > I'm using HDP Accumulo 1.7.0 (1.7.0.2.3.4.12-1) on an encrypted HDFS
>> > volume, with Kerberos turned on. The RFiles are generated by
>> > AccumuloFileOutputFormat from a Spark job.
>> >
>> > A very small RFile that exhibits this problem is available here:
>> > http://firebar.newbrightidea.com/downloads/bad_rfiles/I0000waz.rf
>> >
>> > I'm pretty confident that the keys are being written to the RFile in
>> > order. Are there any tools I could use to inspect the internal
>> > structure of the RFile?
>> >
>> > Thanks,
>> > -Russ
>> >
>> > Unable to find tablets that overlap file
>> > hdfs://[redacted]/accumulo/data/tables/f/b-0000ze9/I0000zeb.rf
>> > java.lang.RuntimeException: Incompatible BCFile fileBCFileVersion.
>> >     at org.apache.accumulo.core.file.rfile.bcfile.BCFile$Reader.<init>(BCFile.java:828)
>> >     at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.init(CachableBlockFile.java:246)
>> >     at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBCFile(CachableBlockFile.java:257)
>> >     at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.access$100(CachableBlockFile.java:137)
>> >     at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader$MetaBlockLoader.get(CachableBlockFile.java:209)
>> >     at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBlock(CachableBlockFile.java:313)
>> >     at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:368)
>> >     at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:137)
>> >     at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:843)
>> >     at org.apache.accumulo.core.file.rfile.RFileOperations.openReader(RFileOperations.java:79)
>> >     at org.apache.accumulo.core.file.DispatchingFileFactory.openReader(DispatchingFileFactory.java:69)
>> >     at org.apache.accumulo.server.client.BulkImporter.findOverlappingTablets(BulkImporter.java:644)
>> >     at org.apache.accumulo.server.client.BulkImporter.findOverlappingTablets(BulkImporter.java:615)
>> >     at org.apache.accumulo.server.client.BulkImporter$1.run(BulkImporter.java:146)
>> >     at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>> >     at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
>> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> >     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> >     at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>> >     at java.lang.Thread.run(Thread.java:745)
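[Aside on the "tools to inspect the internal structure" question: besides `accumulo rfile-info`, a raw look at the trailing bytes of a suspect file, compared against a known-good copy, is a quick sanity check, since the version fields the exception complains about live in the file's trailer. A minimal sketch; it deliberately does not attempt to decode the BCFile trailer layout, and the path in the usage line is hypothetical.]

```python
def tail_hex(path, n=32):
    """Hex-dump the last n bytes of a file for eyeball comparison.

    Useful for spotting a flipped byte near the end of an RFile without
    parsing the BCFile trailer format.
    """
    with open(path, "rb") as f:
        data = f.read()
    return data[-n:].hex()

# Hypothetical usage, comparing a good and a bad copy side by side:
#   print(tail_hex("I0000waz.rf"))
#   print(tail_hex("I0000waz.rf.good"))
```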
