Re: TableSnapshotInputFormat Behavior In HBase 1.4.0

2018-03-12 Thread Saad Mufti
Thanks, will do that.

Saad

On Mon, Mar 12, 2018 at 12:14 PM, Ted Yu wrote:
> Saad:
> I encourage you to open an HBase JIRA outlining your use case and the
> config knobs you added through a patch.
>
> We can see the details for each config and make recommendation

Re: TableSnapshotInputFormat Behavior In HBase 1.4.0

2018-03-12 Thread Ted Yu
Saad:

I encourage you to open an HBase JIRA outlining your use case and the config knobs you added through a patch.

We can see the details for each config and make recommendations accordingly.

Thanks

On Mon, Mar 12, 2018 at 8:43 AM, Saad Mufti wrote:
> I have created a

Re: TableSnapshotInputFormat Behavior In HBase 1.4.0

2018-03-12 Thread Saad Mufti
I have created a company-specific branch and added 4 new flags to control this behavior. These gave us a huge performance boost when running Spark jobs on snapshots of very large tables in S3. I tried to do everything cleanly, but a) not being familiar with the whole test strategy, I haven't had

Re: TableSnapshotInputFormat Behavior In HBase 1.4.0

2018-03-10 Thread Saad Mufti
The question remains, though, of why it is accessing the files of a column family that the Scan should exclude at all. That column family does NOT specify prefetch on open in its schema. Only the family we want to read specifies prefetch on open, which we want to override if possible for the

Re: TableSnapshotInputFormat Behavior In HBase 1.4.0

2018-03-10 Thread Saad Mufti
See below for more I found on item 3.

Cheers.

Saad

On Sat, Mar 10, 2018 at 7:17 PM, Saad Mufti wrote:
> Hi,
>
> I am running a Spark job (Spark 2.2.1) on an EMR cluster in AWS. There is
> no HBase installed on the cluster, only HBase libs linked to my Spark app.
> We