Re: HBaseCon 2017

2017-02-17 Thread Stack
Thanks for asking, Zach.

June 12th, the day before the DataWorks Summit in San Jose. Google are
graciously hosting. It is looking like San Francisco but may be on the
Mountain View campus. CfP should go out this weekend. In general more
details to follow.

St.Ack



On Fri, Feb 17, 2017 at 5:51 PM, Zach York  wrote:

> Hello,
>
> Does anyone know if there will be an HBaseCon conference this year (and the
> relative timeline)?
> I'm trying to plan out different conferences that I want to attend this
> year and this information would help.
>
> Sorry if this is not the correct place to ask, just thought I'd try here!
>
> Thanks,
> Zach
>


HBaseCon 2017

2017-02-17 Thread Zach York
Hello,

Does anyone know if there will be an HBaseCon conference this year (and the
relative timeline)?
I'm trying to plan out different conferences that I want to attend this
year and this information would help.

Sorry if this is not the correct place to ask, just thought I'd try here!

Thanks,
Zach


[jira] [Resolved] (HBASE-17577) Optimize file copying during backup restore operation

2017-02-17 Thread Vladimir Rodionov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov resolved HBASE-17577.
---
Resolution: Duplicate

Duplicate of HBASE-17150

> Optimize file copying during backup restore operation
> -
>
> Key: HBASE-17577
> URL: https://issues.apache.org/jira/browse/HBASE-17577
> Project: HBase
>  Issue Type: Task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: HBASE-7912
>
>
> Currently, we copy files into a TMP directory if the backup destination is on the 
> same cluster as the source. This is because DistCp deletes src files, by 
> default, when copying within the same cluster. This should be avoided.
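
One way the "same cluster" condition described above could be expressed, as a rough sketch only: compare the scheme and authority of the source and destination URIs. This is an assumption for illustration, not the logic in the backup/restore code. SameClusterCheck and sameCluster are hypothetical names, the host names are made up, and the check ignores HA nameservices and default ports.

```java
import java.net.URI;

// Hypothetical helper; not part of HBase or Hadoop.
final class SameClusterCheck {
    // Treats two locations as the same cluster when scheme and authority
    // (host:port) match, ignoring case.
    static boolean sameCluster(URI source, URI destination) {
        return equalsIgnoreCase(source.getScheme(), destination.getScheme())
            && equalsIgnoreCase(source.getAuthority(), destination.getAuthority());
    }

    // Null-safe, case-insensitive comparison (a URI may lack scheme or authority).
    private static boolean equalsIgnoreCase(String x, String y) {
        return x == null ? y == null : x.equalsIgnoreCase(y);
    }

    public static void main(String[] args) {
        URI source = URI.create("hdfs://nn1.example.com:8020/hbase");
        URI local = URI.create("hdfs://nn1.example.com:8020/backup");
        URI remote = URI.create("hdfs://nn2.example.com:8020/backup");
        System.out.println("local destination:  " + sameCluster(source, local));
        System.out.println("remote destination: " + sameCluster(source, remote));
    }
}
```

A check like this would only decide whether the extra copy into a temporary directory is being considered at all; whether the copy is needed is the question the issue raises.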



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HBASE-17150) Verify restore logic (remote/local cluster)

2017-02-17 Thread Vladimir Rodionov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov resolved HBASE-17150.
---
Resolution: Invalid

After the HBASE-17660 patch, this code is obsolete.

> Verify restore logic (remote/local cluster)
> ---
>
> Key: HBASE-17150
> URL: https://issues.apache.org/jira/browse/HBASE-17150
> Project: HBase
>  Issue Type: Bug
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: HBASE-7912
>
>
> This part of the application is legacy code from the first version. 
> If the backup destination is the local cluster, then during restore we copy HFiles 
> into a local temp dir first. For a remote cluster we do not do this. It seems it 
> should be the other way around.
> {quote}
> What does this mean?
> 253 2016-11-17 14:13:39,782 DEBUG [main] util.RestoreServerUtil: File 
> hdfs://ve0524.halxg.cloudera.com:8020/user/stack/backup/backup_1479419995738/default/x_1/archive/data/default/x_1
>  on local cluster, back it up before restore
> Is this a full copy of the backup to elsewhere?
> 296 2016-11-17 14:13:47,907 DEBUG [main] util.RestoreServerUtil: Copied to 
> temporary path on local cluster: /user/stack/hbase-staging/restore
> {quote}





[jira] [Created] (HBASE-17660) HFileSplitter is not being applied during full table restore

2017-02-17 Thread Vladimir Rodionov (JIRA)
Vladimir Rodionov created HBASE-17660:
-

 Summary: HFileSplitter is not being applied during full table 
restore
 Key: HBASE-17660
 URL: https://issues.apache.org/jira/browse/HBASE-17660
 Project: HBase
  Issue Type: Bug
Reporter: Vladimir Rodionov
Assignee: Vladimir Rodionov
 Fix For: HBASE-7912


The HFileSplitter M/R job splits snapshot files along the given region boundaries 
before moving them with the bulk load tool. 

The current code for restoring a full table backup does not use this job. 
This should be fixed.
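
To illustrate what splitting along region boundaries means, here is a minimal sketch under stated assumptions. It is not the HFileSplitter job: SplitByRegion and its split method are hypothetical, rows are plain Java Strings compared with String.compareTo (HBase compares row keys as bytes), and regions are identified only by their start keys, with the empty string as the first region's start key. Each row goes to the region with the greatest start key that is less than or equal to the row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration; not the HFileSplitter implementation.
final class SplitByRegion {
    // Groups row keys by the start key of the region that contains them.
    static Map<String, List<String>> split(List<String> rowKeys,
                                           Collection<String> regionStartKeys) {
        TreeSet<String> starts = new TreeSet<>(regionStartKeys);
        Map<String, List<String>> byRegion = new TreeMap<>();
        for (String row : rowKeys) {
            // Greatest start key <= row identifies the containing region.
            String start = starts.floor(row);
            if (start == null) {
                throw new IllegalArgumentException("no region covers row " + row);
            }
            byRegion.computeIfAbsent(start, k -> new ArrayList<>()).add(row);
        }
        return byRegion;
    }

    public static void main(String[] args) {
        // Three regions: ["", "k"), ["k", "n"), ["n", end).
        List<String> rows = Arrays.asList("apple", "kiwi", "mango", "zebra");
        System.out.println(split(rows, Arrays.asList("", "k", "n")));
    }
}
```

Once data is grouped this way, each group fits inside a single region, which is the property a bulk load step relies on.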



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Online Meeting on embryonic FS Redo Project (HBASE-14439)

2017-02-17 Thread York, Zach
Thanks for the updates! I will review when I have time.

On 2/17/17, 4:16 PM, "Umesh Agashe"  wrote:

Hi,

Here is the doc that summarizes our discussion about why we think a top-down
approach requiring radical code changes, rather than an incremental, phased
(bottom-up) approach, will help us REDO the FS directory layout.



https://docs.google.com/document/d/128Q0BqJY7OvHMUpEpZWKCaBrH1qDjpxxOVkX2KM46No/edit#heading=h.iyja9q78fh2j

Thanks,
Umesh


On Fri, Feb 17, 2017 at 12:57 PM, Stack  wrote:

> Notes from this morning's online meeting @10AM PST (please fill in any
> detail I missed):
>
> IN ATTENDANCE:
> Aman Poonia
> Umesh Agashe, Cloudera
> Stephen Tak, AMZ
> Zach York, AMZ
> Francis Liu, Yahoo!
> Ben Mau, Yahoo!
> Sean Busbey, Cloudera
> Ted Yu, HWX
> Appy (Apekshit Sharma), Cloudera
>
>
> BACKGROUND (St.Ack)
> Y! want to do millions of regions in a Cluster.
> Our current FS Layout heavily dependent on HDFS semantics (e.g. we depend
> heavily on HDFS rename doing atomic file and directory swaps); complicates
> being able to run on another FS.
> HBase is bound to a particular physical layout in the FS.
> Matteo Bertozzi's experience with HDFS/HBase on S3 and a general irritation
> with how FS ops are distributed all about the codebase had him propose a
> logical tier with a radically simplified set of requirements of underlying
> FS (block store?); atomic operations would be done by HBase rather than
> farmed out to the FS.
> Matteo not w/ us anymore but he passed on the vision to Umesh
>
> CURRENT STATE OF FS REDO PROJECT (Umesh)
> Currently it is shelved but hope to get back to it 'soon'.
> Spent a few months on FS REDO at end of last year.
> Initial approach was to abstract out three Interfaces (original sketched by
> Matteo in [1]).
> Idea was to centralize all FS use in a few well-known locations.
> Then refactor all FS usage.
> Keep all meta data about tables, files, etc., in hbase:meta
> Idea was to slowly migrate over ops, tools etc., to the new Interface.
> This was a bottom-up approach, finding FS references, and moving references
> to one place.
> Soon found too many refs all over the code.
> Found that we might not get to desired simple Interface because API had to
> carry around baggage.
> Matteo had tried this approach in [1] and started to argue this stepped
> migration would never arrive.
>
> So started over w/ the ideal Simple FS Interface and the implementation
> seemed to flow smoothly.
> An in-memory POC that did simple file ops was posted a while back here [2].
>
> Given the two approaches taken above, experience indicates that the
> radical, top-down approach is more likely to succeed.
>
> WHY ARE PEOPLE INTERESTED IN FS REDO?
> Francis and Ben Mau, we want to be able to do 1M regions.
> St.Ack suggested that even small installs need to be able to do more,
> smaller regions.
> Zach is interested because wants to optimize HBase over S3 (rename,
> consistency issues). Liked the idea of metadata up in hbase:meta table and
> avoiding renames, etc.
>
> WHAT SHOULD WE DO?
> We have few resources. It is a big job (We've been talking about it a good
> while now). All docs are stale, missing the benefit of Umesh's recent
> explorations.
> Sean pointed out that before shelving, the idea was to try the PoC
> Interface against a new hbase operation other than simple file reading and
> writing (compactions?). If the PoC Interface survived in the new context,
> we'd then step back and write up a design.
> Seemed like as good a plan as any. Plan should talk about all the ways in
> which ops can go wrong.
> Thereafter, split up the work and bring over subsystems.
> It is looking like hbase3 rather than hbase2 project (though all hoped it
> could make an hbase2).
>
> TODOs
> We agreed to post these notes with pointers to current state of FS REDO
> (See below).
> Umesh and Stack to do up a one-pager on current PoC to be posted on this
> thread or up in the FS REDO issue (HBASE-14090).
> Keep up macro status on this thread.
>
> What else?
> Thanks,
> S
>
> 1. Matteo's original FS REDO suggested plan:
> https://docs.google.com/document/d/1fMDanYiDAWpfKLcKUBb1Ff0BwB7zeGqzbyTFMvSOacQ/edit#
> 2. Umesh's PoC: https://reviews.apache.org/r/55200/
> 3. HBASE-14090 is the parent issue for this project?
> 4. An old doc. to evangelize the idea of an FS REDO (mostly upstreaming
> Matteo's ideas):
> https://docs.google.com/document/d/10tSCSSWPwdFqOLLYtY2aVFe6iCIrsBk4Vqm8LSGUfhQ/edit#
>
>
> On Fri, Feb 17, 2017 at 9:53 AM, Stack  

Re: Successful: HBase Generate Website

2017-02-17 Thread Stack
I pushed the website with the patch below.
FYI,
S

On Fri, Feb 17, 2017 at 7:02 AM, Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> Build status: Successful
>
> If successful, the website and docs have been generated. To update the
> live site, follow the instructions below. If failed, skip to the bottom of
> this email.
>
> Use the following commands to download the patch and apply it to a clean
> branch based on origin/asf-site. If you prefer to keep the hbase-site repo
> around permanently, you can skip the clone step.
>
>   git clone https://git-wip-us.apache.org/repos/asf/hbase-site.git
>
>   cd hbase-site
>   wget -O- https://builds.apache.org/job/hbase_generate_website/491/artifact/website.patch.zip | funzip > 7763dd6688254d37ad611f5d290db47c83cf93d3.patch
>   git fetch
>   git checkout -b asf-site-7763dd6688254d37ad611f5d290db47c83cf93d3 origin/asf-site
>   git am --whitespace=fix 7763dd6688254d37ad611f5d290db47c83cf93d3.patch
>
> At this point, you can preview the changes by opening index.html or any of
> the other HTML pages in your local
> asf-site-7763dd6688254d37ad611f5d290db47c83cf93d3 branch.
>
> There are lots of spurious changes, such as timestamps and CSS styles in
> tables, so a generic git diff is not very useful. To see a list of files
> that have been added, deleted, renamed, changed type, or are otherwise
> interesting, use the following command:
>
>   git diff --name-status --diff-filter=ADCRTXUB origin/asf-site
>
> To see only files that had 100 or more lines changed:
>
>   git diff --stat origin/asf-site | grep -E '[1-9][0-9]{2,}'
>
> When you are satisfied, publish your changes to origin/asf-site using
> these commands:
>
>   git commit --allow-empty -m "Empty commit" # to work around a current ASF INFRA bug
>   git push origin asf-site-7763dd6688254d37ad611f5d290db47c83cf93d3:asf-site
>   git checkout asf-site
>   git branch -D asf-site-7763dd6688254d37ad611f5d290db47c83cf93d3
>
> Changes take a couple of minutes to be propagated. You can verify whether
> they have been propagated by looking at the Last Published date at the
> bottom of http://hbase.apache.org/. It should match the date in the
> index.html on the asf-site branch in Git.
>
> As a courtesy, reply-all to this email to let other committers know you
> pushed the site.
>
>
>
> If failed, see https://builds.apache.org/job/hbase_generate_website/491/console


Re: Online Meeting on embryonic FS Redo Project (HBASE-14439)

2017-02-17 Thread Stack
Notes from this morning's online meeting @10AM PST (please fill in any
detail I missed):

IN ATTENDANCE:
Aman Poonia
Umesh Agashe, Cloudera
Stephen Tak, AMZ
Zach York, AMZ
Francis Liu, Yahoo!
Ben Mau, Yahoo!
Sean Busbey, Cloudera
Ted Yu, HWX
Appy (Apekshit Sharma), Cloudera


BACKGROUND (St.Ack)
Y! want to do millions of regions in a Cluster.
Our current FS Layout heavily dependent on HDFS semantics (e.g. we depend
heavily on HDFS rename doing atomic file and directory swaps); complicates
being able to run on another FS.
HBase is bound to a particular physical layout in the FS.
Matteo Bertozzi's experience with HDFS/HBase on S3 and a general irritation
with how FS ops are distributed all about the codebase had him propose a
logical tier with a radically simplified set of requirements of underlying
FS (block store?); atomic operations would be done by HBase rather than
farmed out to the FS.
Matteo not w/ us anymore but he passed on the vision to Umesh

CURRENT STATE OF FS REDO PROJECT (Umesh)
Currently it is shelved but hope to get back to it 'soon'.
Spent a few months on FS REDO at end of last year.
Initial approach was to abstract out three Interfaces (original sketched by
Matteo in [1]).
Idea was to centralize all FS use in a few well-known locations.
Then refactor all FS usage.
Keep all meta data about tables, files, etc., in hbase:meta
Idea was to slowly migrate over ops, tools etc., to the new Interface.
This was a bottom-up approach, finding FS references, and moving references
to one place.
Soon found too many refs all over the code.
Found that we might not get to desired simple Interface because API had to
carry around baggage.
Matteo had tried this approach in [1] and started to argue this stepped
migration would never arrive.

So started over w/ the ideal Simple FS Interface and the implementation
seemed to flow smoothly.
An in-memory POC that did simple file ops was posted a while back here [2].

Given the two approaches taken above, experience indicates that the
radical, top-down approach is more likely to succeed.
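
A minimal sketch of the "metadata instead of renames" idea from the notes above, for illustration only. It is not the PoC interface from [2]: the class and method names (MetaTrackedStore, write, read, addFile, commitCompaction, liveFiles) are hypothetical, a ConcurrentHashMap stands in for hbase:meta, and an in-memory map stands in for the underlying store. Files are written once under unique ids, and a compaction is committed by a single atomic metadata update, so nothing depends on the FS providing an atomic rename.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration; none of these names come from HBase or the PoC.
final class MetaTrackedStore {
    // Stand-in for the underlying store: write-once blobs, never renamed.
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();
    // Stand-in for hbase:meta: region name -> immutable list of live file ids.
    private final Map<String, List<String>> meta = new ConcurrentHashMap<>();

    // Writes a new blob under a fresh id; it is not live until added to meta.
    String write(byte[] data) {
        String id = UUID.randomUUID().toString();
        blobs.put(id, data.clone());
        return id;
    }

    // Returns a copy of the blob's bytes, or null if the id is unknown.
    byte[] read(String fileId) {
        byte[] data = blobs.get(fileId);
        return data == null ? null : data.clone();
    }

    // Makes a written file live for a region (for example, after a flush).
    void addFile(String region, String fileId) {
        meta.compute(region, (r, live) -> {
            List<String> next = new ArrayList<>();
            if (live != null) {
                next.addAll(live);
            }
            next.add(fileId);
            return Collections.unmodifiableList(next);
        });
    }

    // Commits a compaction by swapping the inputs for the output in one atomic
    // per-region metadata update. Returns false if any input is no longer live.
    boolean commitCompaction(String region, Collection<String> inputs, String output) {
        boolean[] committed = {false};
        meta.computeIfPresent(region, (r, live) -> {
            if (!live.containsAll(inputs)) {
                return live; // stale request: leave the metadata untouched
            }
            List<String> next = new ArrayList<>(live);
            next.removeAll(inputs);
            next.add(output);
            committed[0] = true;
            return Collections.unmodifiableList(next);
        });
        return committed[0];
    }

    List<String> liveFiles(String region) {
        return meta.getOrDefault(region, Collections.emptyList());
    }

    public static void main(String[] args) {
        MetaTrackedStore store = new MetaTrackedStore();
        String a = store.write(new byte[] {1});
        String b = store.write(new byte[] {2});
        store.addFile("region-1", a);
        store.addFile("region-1", b);
        String merged = store.write(new byte[] {1, 2});
        System.out.println("committed: "
            + store.commitCompaction("region-1", Arrays.asList(a, b), merged));
        System.out.println("live files: " + store.liveFiles("region-1").size());
    }
}
```

The point of the sketch is where atomicity lives: in the metadata update done by the caller, not in a rename performed by the file system.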

WHY ARE PEOPLE INTERESTED IN FS REDO?
Francis and Ben Mau, we want to be able to do 1M regions.
St.Ack suggested that even small installs need to be able to do more,
smaller regions.
Zach is interested because wants to optimize HBase over S3 (rename,
consistency issues). Liked the idea of metadata up in hbase:meta table and
avoiding renames, etc.

WHAT SHOULD WE DO?
We have few resources. It is a big job (We've been talking about it a good
while now). All docs are stale, missing the benefit of Umesh's recent explorations.
Sean pointed out that before shelving, the idea was to try the PoC
Interface against a new hbase operation other than simple file reading and
writing (compactions?). If the PoC Interface survived in the new context,
we'd then step back and write up a design.
Seemed like as good a plan as any. Plan should talk about all the ways in
which ops can go wrong.
Thereafter, split up the work and bring over subsystems.
It is looking like hbase3 rather than hbase2 project (though all hoped it
could make an hbase2).

TODOs
We agreed to post these notes with pointers to current state of FS REDO
(See below).
Umesh and Stack to do up a one-pager on current PoC to be posted on this
thread or up in the FS REDO issue (HBASE-14090).
Keep up macro status on this thread.

What else?
Thanks,
S

1. Matteo's original FS REDO suggested plan:
https://docs.google.com/document/d/1fMDanYiDAWpfKLcKUBb1Ff0BwB7zeGqzbyTFMvSOacQ/edit#
2. Umesh's PoC: https://reviews.apache.org/r/55200/
3. HBASE-14090 is the parent issue for this project?
4. An old doc. to evangelize the idea of an FS REDO (mostly upstreaming
Matteo's ideas):
https://docs.google.com/document/d/10tSCSSWPwdFqOLLYtY2aVFe6iCIrsBk4Vqm8LSGUfhQ/edit#


On Fri, Feb 17, 2017 at 9:53 AM, Stack  wrote:

> I put up a hangout. If above link doesn't work, try this
> https://hangouts.google.com/call/aaahkufdurgctflufw4ivhsngue and write
> here if can't get in.
>
> St.Ack
>
> On Tue, Feb 14, 2017 at 12:36 PM, Stack  wrote:
>
>> A few folks want to have a quick chat about the state of the proposed FS
>> redo project. The proposal is for 10AM, this Friday morning, PST. All
>> interested parties are invited to join (shout if 10AM PST is untenable and
>> suggest an alternative). Below is a google hangout link that comes alive
>> friday morning [1].
>>
>> One of us will keep notes and post synopsis of discussion back here and
>> in issue after the meeting is done.
>>
>> Suggest those who join try to do some background reading -- see
>> HBASE-14439 -- so we are all around the same level of understanding when
>> the meeting starts. Agenda will be basic intros, current state of the
>> project (with update on most recent effort), and then expectations. Basic.
>>
>> Thanks,
>> S
>>
>> 1. https://plus.google.com/hangouts/_/calendar/c2FpbnQuYWNrQGdtYWlsLmNvbQ.1oaqlr00ru20s1hqrsq1q05j3k?authuser=0
>>
>
>


Re: looking for reviews on small security patches

2017-02-17 Thread Esteban Gutierrez
done.

--
Cloudera, Inc.


On Fri, Feb 17, 2017 at 5:50 AM, Sean Busbey  wrote:

> Hi folks!
>
> I'm hoping to get reviews on these two issues:
>
> Unvalidated Redirect in HMaster
> https://issues.apache.org/jira/browse/HBASE-15328
>
> table status page should escape values that may contain arbitrary
> characters.
> https://issues.apache.org/jira/browse/HBASE-17561
>
>
> I'd like to use some time over the long weekend to get a new 1.2.5
> release candidate posted, but I'd like to see these two issues closed
> out first.
>


Re: Online Meeting on embryonic FS Redo Project (HBASE-14439)

2017-02-17 Thread Stack
I put up a hangout. If above link doesn't work, try this
https://hangouts.google.com/call/aaahkufdurgctflufw4ivhsngue and write here
if can't get in.

St.Ack

On Tue, Feb 14, 2017 at 12:36 PM, Stack  wrote:

> A few folks want to have a quick chat about the state of the proposed FS
> redo project. The proposal is for 10AM, this Friday morning, PST. All
> interested parties are invited to join (shout if 10AM PST is untenable and
> suggest an alternative). Below is a google hangout link that comes alive
> friday morning [1].
>
> One of us will keep notes and post synopsis of discussion back here and in
> issue after the meeting is done.
>
> Suggest those who join try to do some background reading -- see
> HBASE-14439 -- so we are all around the same level of understanding when
> the meeting starts. Agenda will be basic intros, current state of the
> project (with update on most recent effort), and then expectations. Basic.
>
> Thanks,
> S
>
> 1. https://plus.google.com/hangouts/_/calendar/c2FpbnQuYWNrQGdtYWlsLmNvbQ.1oaqlr00ru20s1hqrsq1q05j3k?authuser=0
>


[jira] [Resolved] (HBASE-17659) How to connect to hbase hdfs filesystem

2017-02-17 Thread Dima Spivak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dima Spivak resolved HBASE-17659.
-
Resolution: Not A Problem

Please use the user mailing list for questions about getting HBase up and 
running. 

> How to connect to hbase hdfs filesystem
> ---
>
> Key: HBASE-17659
> URL: https://issues.apache.org/jira/browse/HBASE-17659
> Project: HBase
>  Issue Type: Task
>  Components: API
>Affects Versions: 0.94.7
>Reporter: Jenson Luke
>Priority: Blocker
>
> I am not able to connect to the HBase HDFS file system. When I run my Java 
> program through the server, it picks up the local file system instead of the 
> HDFS file system.
> When running through the server, I am passing only
> conf = HBaseConfiguration.create();
> fs = FileSystem.get(this.conf);
> Path tabledir = new Path(fs.makeQualified(new 
> Path(conf.get(HConstants.HBASE_DIR))), tableName);
> It is giving the value of tabledir as 
> "/tmp/hbase-hbase/hbase/tsdb-uid_jentab_bkp1_scen06"
> My actual hdfs path is "hdfs://ibdash-.xx.xx..net:8020/hbase".





Successful: HBase Generate Website

2017-02-17 Thread Apache Jenkins Server
Build status: Successful

If successful, the website and docs have been generated. To update the live 
site, follow the instructions below. If failed, skip to the bottom of this 
email.

Use the following commands to download the patch and apply it to a clean branch 
based on origin/asf-site. If you prefer to keep the hbase-site repo around 
permanently, you can skip the clone step.

  git clone https://git-wip-us.apache.org/repos/asf/hbase-site.git

  cd hbase-site
  wget -O- https://builds.apache.org/job/hbase_generate_website/491/artifact/website.patch.zip | funzip > 7763dd6688254d37ad611f5d290db47c83cf93d3.patch
  git fetch
  git checkout -b asf-site-7763dd6688254d37ad611f5d290db47c83cf93d3 origin/asf-site
  git am --whitespace=fix 7763dd6688254d37ad611f5d290db47c83cf93d3.patch

At this point, you can preview the changes by opening index.html or any of the 
other HTML pages in your local 
asf-site-7763dd6688254d37ad611f5d290db47c83cf93d3 branch.

There are lots of spurious changes, such as timestamps and CSS styles in 
tables, so a generic git diff is not very useful. To see a list of files that 
have been added, deleted, renamed, changed type, or are otherwise interesting, 
use the following command:

  git diff --name-status --diff-filter=ADCRTXUB origin/asf-site

To see only files that had 100 or more lines changed:

  git diff --stat origin/asf-site | grep -E '[1-9][0-9]{2,}'
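
To make the grep filter above concrete, the sketch below applies the same regular expression, [1-9][0-9]{2,}, to a few made-up git diff --stat lines. The class name StatFilter and the sample file names are assumptions for illustration. Like grep, it keeps any line containing a run of three or more digits that starts with a non-zero digit, so a file whose name contains such a number is also kept even if only a few lines changed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of what the grep -E filter keeps.
final class StatFilter {
    // Same expression as in the grep command: a non-zero digit followed by
    // at least two more digits, anywhere in the line.
    private static final Pattern THREE_PLUS_DIGITS = Pattern.compile("[1-9][0-9]{2,}");

    // Keeps lines that contain a match, mirroring grep's line-based filtering.
    static List<String> filter(List<String> statLines) {
        List<String> kept = new ArrayList<>();
        for (String line : statLines) {
            if (THREE_PLUS_DIGITS.matcher(line).find()) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the shape of `git diff --stat` output.
        List<String> sample = Arrays.asList(
            " index.html | 12 +-",
            " book.html | 1042 ++++----",
            " apidocs/page404.html | 3 +");
        for (String line : filter(sample)) {
            System.out.println(line);
        }
    }
}
```

With these sample lines the filter keeps book.html (1042 lines changed) and also page404.html, because "404" in the file name matches. The filter is a quick heuristic rather than an exact count.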

When you are satisfied, publish your changes to origin/asf-site using these 
commands:

  git commit --allow-empty -m "Empty commit" # to work around a current ASF INFRA bug
  git push origin asf-site-7763dd6688254d37ad611f5d290db47c83cf93d3:asf-site
  git checkout asf-site
  git branch -D asf-site-7763dd6688254d37ad611f5d290db47c83cf93d3

Changes take a couple of minutes to be propagated. You can verify whether they 
have been propagated by looking at the Last Published date at the bottom of 
http://hbase.apache.org/. It should match the date in the index.html on the 
asf-site branch in Git.

As a courtesy, reply-all to this email to let other committers know you pushed 
the site.



If failed, see https://builds.apache.org/job/hbase_generate_website/491/console

[jira] [Created] (HBASE-17659) How to connect to hbase hdfs filesystem

2017-02-17 Thread Jenson Luke (JIRA)
Jenson Luke created HBASE-17659:
---

 Summary: How to connect to hbase hdfs filesystem
 Key: HBASE-17659
 URL: https://issues.apache.org/jira/browse/HBASE-17659
 Project: HBase
  Issue Type: Task
  Components: API
Affects Versions: 0.94.7
Reporter: Jenson Luke
Priority: Blocker


I am not able to connect to the HBase HDFS file system. When I run my Java program 
through the server, it picks up the local file system instead of the HDFS file 
system.

When running through the server, I am passing only
conf = HBaseConfiguration.create();
fs = FileSystem.get(this.conf);
Path tabledir = new Path(fs.makeQualified(new 
Path(conf.get(HConstants.HBASE_DIR))), tableName);

It is giving the value of tabledir as 
"/tmp/hbase-hbase/hbase/tsdb-uid_jentab_bkp1_scen06"

My actual hdfs path is "hdfs://ibdash-.xx.xx..net:8020/hbase".





looking for reviews on small security patches

2017-02-17 Thread Sean Busbey
Hi folks!

I'm hoping to get reviews on these two issues:

Unvalidated Redirect in HMaster
https://issues.apache.org/jira/browse/HBASE-15328

table status page should escape values that may contain arbitrary characters.
https://issues.apache.org/jira/browse/HBASE-17561


I'd like to use some time over the long weekend to get a new 1.2.5
release candidate posted, but I'd like to see these two issues closed
out first.