[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2019-10-02 Thread Jean-Marc Spaggiari (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943237#comment-16943237
 ] 

Jean-Marc Spaggiari commented on HBASE-16213:
-

Is there any follow-up JIRA for V2 and V3?

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: Lijin Bin
>Assignee: Lijin Bin
>Priority: Major
> Fix For: 1.4.0, 2.0.0
>
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.branch-1.v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock stores cells sequentially; currently, to get a row from the 
> block, it scans from the first cell until it reaches the row's cell.
> The new structure stores every row's start offset with the data, so it can 
> find the exact row with a binary search.
> I used EncodedSeekPerformanceTest to test the performance.
> First, I used YCSB to write 1M (100w) rows, where every row has only one 
> qualifier and valueLength=16B/64B/256B/1KB.
> Then I used EncodedSeekPerformanceTest to test random reads of 10k (1w) or 
> 1M (100w) rows, and also recorded the HFileBlock's dataSize/dataWithMetaSize 
> for the encoding.
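
To make the idea concrete, here is a minimal, hypothetical sketch of the 
lookup (the row-offset encoding below is made up; the patch's actual v2/v3 
block layout differs):

{code:java}
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical block that records each row's start offset so a get can
// binary-search for the row instead of scanning cell by cell.
public class OffsetIndexedBlock {
  private final ByteBuffer data;   // serialized cells, rows in sorted order
  private final int[] rowOffsets;  // start offset of each row within 'data'

  public OffsetIndexedBlock(ByteBuffer data, int[] rowOffsets) {
    this.data = data;
    this.rowOffsets = rowOffsets;
  }

  /** Returns the offset of the row, or -1 if it is not in this block. */
  public int seekRow(byte[] rowKey) {
    int lo = 0, hi = rowOffsets.length - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      int cmp = compareRowAt(rowOffsets[mid], rowKey);
      if (cmp == 0) {
        return rowOffsets[mid];  // exact row found
      } else if (cmp < 0) {
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    return -1;
  }

  private int compareRowAt(int offset, byte[] rowKey) {
    // Assume each row starts with a 2-byte row key length, then the key bytes.
    short len = data.getShort(offset);
    byte[] key = new byte[len];
    ByteBuffer dup = data.duplicate();
    dup.position(offset + 2);
    dup.get(key);
    return Arrays.compareUnsigned(key, rowKey);
  }
}
{code}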



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-22600) Document that LoadIncrementalHFiles will be removed in 3.0.0

2019-06-20 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868719#comment-16868719
 ] 

Jean-Marc Spaggiari commented on HBASE-22600:
-

Might be good to add a pointer in the right direction? Users might wonder what 
this is replaced by.

> Document that LoadIncrementalHFiles will be removed in 3.0.0
> 
>
> Key: HBASE-22600
> URL: https://issues.apache.org/jira/browse/HBASE-22600
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 2.3.0, 2.2.1
>
>
> Here we break the rule: by default it should be removed in 4.0.0. So we need 
> to document clearly that it will be removed in 3.0.0, and also explain the 
> reason.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22566) Call out default compaction and flush throttling for 2.x in Book

2019-06-12 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862239#comment-16862239
 ] 

Jean-Marc Spaggiari commented on HBASE-22566:
-

Any reason to limit the flush speed by default? That's surprising, no?
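
For reference, here is a sketch of how the 2.x defaults can be relaxed in 
hbase-site.xml. The controller keys and the NoLimitThroughputController class 
exist in 2.x, but treat exact key names and defaults as version-dependent and 
verify against your release:

{code:xml}
<!-- Remove the default compaction throughput cap. -->
<property>
  <name>hbase.regionserver.throughput.controller</name>
  <value>org.apache.hadoop.hbase.regionserver.throttle.NoLimitThroughputController</value>
</property>
<!-- Same idea for flush throttling. -->
<property>
  <name>hbase.regionserver.flush.throughput.controller</name>
  <value>org.apache.hadoop.hbase.regionserver.throttle.NoLimitThroughputController</value>
</property>
{code}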

> Call out default compaction and flush throttling for 2.x in Book
> 
>
> Key: HBASE-22566
> URL: https://issues.apache.org/jira/browse/HBASE-22566
> Project: HBase
>  Issue Type: Task
>  Components: documentation
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>
> Attachments: HBASE-22566-updated-content.png, HBASE-22566.001.patch
>
>
> We had it in the release notes, but it didn't make it into the upgrade path 
> chapter of the Book ([https://hbase.apache.org/book.html#upgrade2.0]) that 
> compactions and flushes are now limited in throughput per RegionServer.
> Came as a surprise to me the other day - I'm sure I won't be the last.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-9272) A parallel, unordered scanner

2019-05-13 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-9272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838521#comment-16838521
 ] 

Jean-Marc Spaggiari commented on HBASE-9272:


[~lhofhansl] stop ignoring me ;)

> A parallel, unordered scanner
> -
>
> Key: HBASE-9272
> URL: https://issues.apache.org/jira/browse/HBASE-9272
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Minor
> Attachments: 9272-0.94-v2.txt, 9272-0.94-v3.txt, 9272-0.94-v4.txt, 
> 9272-0.94.txt, 9272-trunk-v2.txt, 9272-trunk-v3.txt, 9272-trunk-v3.txt, 
> 9272-trunk-v4.txt, 9272-trunk.txt, ParallelClientScanner.java, 
> ParallelClientScanner.java
>
>
> The contract of ClientScanner is to return rows in sort order. That limits 
> the order in which regions can be scanned.
> I propose a simple ParallelScanner that does not have this requirement and 
> queries regions in parallel, returning whatever gets returned first.
> This is generally useful for scans that filter a lot of data on the server, 
> or in cases where the client can very quickly react to the returned data.
> I have a simple prototype (it doesn't do error handling right, and might be 
> a bit heavy on the synchronization side - it uses a BlockingQueue to hand 
> data between the client using the scanner and the threads doing the 
> scanning, and it could potentially starve some scanners long enough to time 
> out at the server).
> On the plus side, it's only about 130 lines of code. :)
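
A minimal sketch of that hand-off (this is not the attached 
ParallelClientScanner; it just illustrates one scanner thread per region 
feeding a shared BlockingQueue, with the caller assumed to have split the 
scan on region boundaries):

{code:java}
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class UnorderedParallelScan {

  /** Runs all per-region Scans in parallel; Results land on 'out' in arrival order. */
  public static ExecutorService start(Connection conn, TableName table,
      List<Scan> regionScans, BlockingQueue<Result> out) {
    ExecutorService pool = Executors.newFixedThreadPool(regionScans.size());
    for (Scan scan : regionScans) {
      pool.execute(() -> {
        try (Table t = conn.getTable(table);
            ResultScanner scanner = t.getScanner(scan)) {
          for (Result r : scanner) {
            // Blocks while the consumer is slow - this is the starvation /
            // server-side timeout risk mentioned in the description.
            out.put(r);
          }
        } catch (Exception e) {
          e.printStackTrace();  // prototype-level error handling, as admitted
        }
      });
    }
    pool.shutdown();  // the consumer drains 'out' until the pool terminates
    return pool;
  }
}
{code}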



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HBASE-7297) Allow load balancer to accommodate different region server configurations

2019-04-11 Thread Jean-Marc Spaggiari (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari resolved HBASE-7297.

Resolution: Duplicate

Duplicate of HBASE-11780.

> Allow load balancer to accommodate different region server configurations
> -
>
> Key: HBASE-7297
> URL: https://issues.apache.org/jira/browse/HBASE-7297
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Reporter: Ted Yu
>Priority: Major
>
> Robert Dyer raised the following scenario under the thread of 'Multiple 
> regionservers on a single node':
> {quote}
> I have a very small cluster where all nodes are identical.  However, I was
> just given a very powerful node to add into this cluster which effectively
> doubles the total CPUs, RAM, and HDDs in the cluster.
> As such, when I run a MR job half the jobs go to this single, new node yet
> most of the data is not local due to HBase balancing the regions.
> {quote}
> Load balancer should take region server config (total heap in the above case) 
> into account when allocating regions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HBASE-22213) Create a Java based BulkLoadPartitioner

2019-04-11 Thread Jean-Marc Spaggiari (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari resolved HBASE-22213.
-
Resolution: Later

> Create a Java based BulkLoadPartitioner
> ---
>
> Key: HBASE-22213
> URL: https://issues.apache.org/jira/browse/HBASE-22213
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.1.4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Minor
>
> We have a Scala-based partitioner, but not all projects are built in Scala. 
> We should provide a Java-based version of it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22213) Create a Java based BulkLoadPartitioner

2019-04-11 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815744#comment-16815744
 ] 

Jean-Marc Spaggiari commented on HBASE-22213:
-

3 things:
1) I'm unable to build hbase-connectors.
2) It might be doable to call the Scala BulkLoadPartitioner directly from 
Java code. The constructor is not useful, but it should work.
3) Below is a working Java version of it, with a useful, easy-to-use 
constructor.

Closing this Jira for now.


{code:java}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more contributor
 * license agreements. See the NOTICE file distributed with this work for
 * additional information regarding copyright ownership. The ASF licenses this
 * file to You under the Apache License, Version 2.0 (the "License"); you may not
 * use this file except in compliance with the License. You may obtain a copy of
 * the License at http://www.apache.org/licenses/LICENSE-2.0
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
 * License for the specific language governing permissions and limitations
 * under the License.
 */

package org.spaggiari.othello.spark;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.spark.ByteArrayWrapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.Partitioner;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BulkLoadPartitioner extends Partitioner {

  private static final long serialVersionUID = 1994698119904772184L;
  private static final Logger LOG =
      LoggerFactory.getLogger(BulkLoadPartitioner.class);

  /** Start key of each region, fetched once at construction time. */
  private byte[][] startKeys = new byte[0][];

  public BulkLoadPartitioner(TableName tableName, Configuration configuration) {
    // Fetch the region start keys up front so getPartition() needs no I/O,
    // and close the short-lived connection when done.
    try (Connection connection =
        ConnectionFactory.createConnection(HBaseConfiguration.create(configuration))) {
      this.startKeys = connection.getRegionLocator(tableName).getStartKeys();
    } catch (IOException e) {
      LOG.error(e.toString(), e);
    }
  }

  @Override
  public int getPartition(Object key) {
    byte[] keyBytes = null;
    if (key instanceof ImmutableBytesWritable) {
      keyBytes = ((ImmutableBytesWritable) key).get();
    } else if (key instanceof byte[]) {
      keyBytes = (byte[]) key;
    } else if (key instanceof ByteArrayWrapper) {
      keyBytes = ((ByteArrayWrapper) key).value();
    }
    // Unsupported key type, or a single region: everything goes to partition 0.
    if (keyBytes == null || startKeys.length <= 1) {
      return 0;
    }
    // Walk the start keys from the end; the first one <= key owns the row.
    for (int i = startKeys.length - 1; i >= 0; i--) {
      if (Bytes.compareTo(startKeys[i], keyBytes) <= 0) {
        return i;
      }
    }
    // If no start key matched, fall back to the first partition.
    return 0;
  }

  @Override
  public int numPartitions() {
    // When the table does not exist, startKeys is empty; report one partition.
    return Math.max(1, startKeys.length);
  }
}
{code}
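
For context, a hypothetical usage sketch (the table name and row key are made 
up):

{code:java}
Configuration conf = HBaseConfiguration.create();
BulkLoadPartitioner partitioner =
    new BulkLoadPartitioner(TableName.valueOf("t1"), conf);
int partitions = partitioner.numPartitions();                   // one per region
int partition = partitioner.getPartition(Bytes.toBytes("row-0042"));
{code}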


> Create a Java based BulkLoadPartitioner
> ---
>
> Key: HBASE-22213
> URL: https://issues.apache.org/jira/browse/HBASE-22213
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.1.4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Minor
>
> We have a Scala-based partitioner, but not all projects are built in Scala. 
> We should provide a Java-based version of it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-22213) Create a Java based BulkLoadPartitioner

2019-04-11 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815744#comment-16815744
 ] 

Jean-Marc Spaggiari edited comment on HBASE-22213 at 4/11/19 7:53 PM:
--

3 things:
1) I'm unable to build hbase-connectors.
2) It might be doable to call the Scala BulkLoadPartitioner directly from 
Java code. The constructor is not useful, but it should work.
3) Below is a working Java version of it, with a useful, easy-to-use 
constructor.

Closing this Jira for now.


{code:java}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more contributor
 * license agreements. See the NOTICE file distributed with this work for
 * additional information regarding copyright ownership. The ASF licenses this
 * file to You under the Apache License, Version 2.0 (the "License"); you may not
 * use this file except in compliance with the License. You may obtain a copy of
 * the License at http://www.apache.org/licenses/LICENSE-2.0
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
 * License for the specific language governing permissions and limitations
 * under the License.
 */

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.spark.ByteArrayWrapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.Partitioner;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BulkLoadPartitioner extends Partitioner {

  private static final long serialVersionUID = 1994698119904772184L;
  private static final Logger LOG =
      LoggerFactory.getLogger(BulkLoadPartitioner.class);

  /** Start key of each region, fetched once at construction time. */
  private byte[][] startKeys = new byte[0][];

  public BulkLoadPartitioner(TableName tableName, Configuration configuration) {
    // Fetch the region start keys up front so getPartition() needs no I/O,
    // and close the short-lived connection when done.
    try (Connection connection =
        ConnectionFactory.createConnection(HBaseConfiguration.create(configuration))) {
      this.startKeys = connection.getRegionLocator(tableName).getStartKeys();
    } catch (IOException e) {
      LOG.error(e.toString(), e);
    }
  }

  @Override
  public int getPartition(Object key) {
    byte[] keyBytes = null;
    if (key instanceof ImmutableBytesWritable) {
      keyBytes = ((ImmutableBytesWritable) key).get();
    } else if (key instanceof byte[]) {
      keyBytes = (byte[]) key;
    } else if (key instanceof ByteArrayWrapper) {
      keyBytes = ((ByteArrayWrapper) key).value();
    }
    // Unsupported key type, or a single region: everything goes to partition 0.
    if (keyBytes == null || startKeys.length <= 1) {
      return 0;
    }
    // Walk the start keys from the end; the first one <= key owns the row.
    for (int i = startKeys.length - 1; i >= 0; i--) {
      if (Bytes.compareTo(startKeys[i], keyBytes) <= 0) {
        return i;
      }
    }
    // If no start key matched, fall back to the first partition.
    return 0;
  }

  @Override
  public int numPartitions() {
    // When the table does not exist, startKeys is empty; report one partition.
    return Math.max(1, startKeys.length);
  }
}
{code}




[jira] [Created] (HBASE-22213) Create a Java based BulkLoadPartitioner

2019-04-11 Thread Jean-Marc Spaggiari (JIRA)
Jean-Marc Spaggiari created HBASE-22213:
---

 Summary: Create a Java based BulkLoadPartitioner
 Key: HBASE-22213
 URL: https://issues.apache.org/jira/browse/HBASE-22213
 Project: HBase
  Issue Type: New Feature
Affects Versions: 2.1.4
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari


We have a Scala-based partitioner, but not all projects are built in Scala. We 
should provide a Java-based version of it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HBASE-22209) sdf

2019-04-11 Thread Jean-Marc Spaggiari (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari resolved HBASE-22209.
-
Resolution: Invalid

> sdf
> ---
>
> Key: HBASE-22209
> URL: https://issues.apache.org/jira/browse/HBASE-22209
> Project: HBase
>  Issue Type: Bug
>  Components: Admin
>Affects Versions: 2.1.4
>Reporter: leonjoe
>Priority: Major
> Fix For: hbase-6055
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-7297) Allow load balancer to accommodate different region server configurations

2019-04-09 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813746#comment-16813746
 ] 

Jean-Marc Spaggiari commented on HBASE-7297:


Yes

> Allow load balancer to accommodate different region server configurations
> -
>
> Key: HBASE-7297
> URL: https://issues.apache.org/jira/browse/HBASE-7297
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Reporter: Ted Yu
>Priority: Major
>
> Robert Dyer raised the following scenario under the thread of 'Multiple 
> regionservers on a single node':
> {quote}
> I have a very small cluster where all nodes are identical.  However, I was
> just given a very powerful node to add into this cluster which effectively
> doubles the total CPUs, RAM, and HDDs in the cluster.
> As such, when I run a MR job half the jobs go to this single, new node yet
> most of the data is not local due to HBase balancing the regions.
> {quote}
> Load balancer should take region server config (total heap in the above case) 
> into account when allocating regions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21768) list_quota_table_sizes/list_quota_snapshots should print human readable values for size

2019-01-24 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16751109#comment-16751109
 ] 

Jean-Marc Spaggiari commented on HBASE-21768:
-

It's easier to use in scripts when it's not human readable. Should that not be 
a parameter of the function instead?
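
For illustration, a hypothetical shell interaction if it were a parameter (the 
HUMAN_READABLE option below does not exist; it is only a sketch, with sizes 
converted from the bytes shown in the description):

{code}
hbase(main):001:0> list_quota_table_sizes HUMAN_READABLE => true
TABLE     SIZE
TestTable 107.8 K
t1        5.1 K
{code}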

> list_quota_table_sizes/list_quota_snapshots should print human readable 
> values for size
> ---
>
> Key: HBASE-21768
> URL: https://issues.apache.org/jira/browse/HBASE-21768
> Project: HBase
>  Issue Type: Improvement
>  Components: shell
>Reporter: xuqinya
>Assignee: xuqinya
>Priority: Minor
> Attachments: HBASE-21768.master.0001.patch
>
>
> When using space quotas, list_quota_table_sizes/list_quota_snapshots should 
> print human-readable values for size.
> {code:java}
> hbase(main):001:0> list_quota_table_sizes
> TABLE SIZE 
> TestTable 110399 
> t1 5211 
> hbase(main):002:0> list_quota_snapshots
> TABLE USAGE LIMIT IN_VIOLATION POLICY
> t1 5211 1073741824 false None
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20105) Allow flushes to target SSD storage

2018-03-16 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402355#comment-16402355
 ] 

Jean-Marc Spaggiari commented on HBASE-20105:
-

Yes, sorry. A bit overwhelmed. Just received the SSDs. I will have to install 
them to test it, and I will have to take a deeper look at Anoop's comment. Most 
probably over the weekend.

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch, HBASE-20105-v4.patch, 
> HBASE-20105-v5.patch, HBASE-20105-v6.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20105) Allow flushes to target SSD storage

2018-03-14 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16399735#comment-16399735
 ] 

Jean-Marc Spaggiari commented on HBASE-20105:
-

[~anoop.hbase] storagePolicy contains the flush-related policy. 
this.conf.get(ColumnFamilyDescriptorBuilder.STORAGE_POLICY) contains the 
CF-related policy. They can be different. And the one after is indeed the 
fallback if there is nothing specific for this CF. So this section does:
1) Use the flush policy
2) If none, use the CF policy (HBASE-14061)
3) If none, use the global config

Regarding HConstants, I have put it there because there are other flush-related 
constants. Do you prefer it somewhere else? What do you suggest?

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch, HBASE-20105-v4.patch, 
> HBASE-20105-v5.patch, HBASE-20105-v6.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-14 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Release Note: 
Introducing hbase.hstore.flush.storagepolicy column family parameter.
public static final String FLUSH_STORAGE_POLICY = 
"hbase.hstore.flush.storagepolicy"; 

This parameter allows the user to target a specific storage policy for flushes. 

There can be 3 storage policy settings. HBase will use the first one configured:
1) Use the column family flush policy
2) If none, use the column family storage policy
3) If none, use the globally configured storage policy

The following table creation command will instruct HBase to redirect all 
memstore flushes to SSD drives:
create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.flush.storagepolicy' => 'ALL_SSD'}} 



  was:
Introducing hbase.hstore.flush.storagepolicy column family parameter.
public static final String FLUSH_STORAGE_POLICY = 
"hbase.hstore.flush.storagepolicy"; 

This parameter allows the user to target a specific storage policy for flushes. 

There can be 3 storage policy settings. HBase will use the first one configured.

1) 

The following table creation command will instruct HBase to redirect all 
memstore flushes to SSD drives:
create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.flush.storagepolicy' => 'ALL_SSD'}} 




> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch, HBASE-20105-v4.patch, 
> HBASE-20105-v5.patch, HBASE-20105-v6.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-14 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Release Note: 
Introducing hbase.hstore.flush.storagepolicy column family parameter.
public static final String FLUSH_STORAGE_POLICY = 
"hbase.hstore.flush.storagepolicy"; 

This parameter allows the user to target a specific storage policy for flushes. 

There can be 3 storage policy settings. HBase will use the first one configured.

1) 

The following table creation command will instruct HBase to redirect all 
memstore flushes to SSD drives:
create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.flush.storagepolicy' => 'ALL_SSD'}} 



  was:
Introducing hbase.hstore.flush.storagepolicy column family parameter.
public static final String FLUSH_STORAGE_POLICY = 
"hbase.hstore.flush.storagepolicy"; 

This parameter allows the user to target a specific storage policy for flushes. 

The following table creation command will instruct HBase to redirect all 
memstore flushes to SSD drives:
create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.flush.storagepolicy' => 'ALL_SSD'}} 




> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch, HBASE-20105-v4.patch, 
> HBASE-20105-v5.patch, HBASE-20105-v6.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20105) Allow flushes to target SSD storage

2018-03-14 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398411#comment-16398411
 ] 

Jean-Marc Spaggiari commented on HBASE-20105:
-

Hi [~anoop.hbase], indeed, it will first prefer the flush policy, then the CF 
policy (if any).

{code} 
  String policyName = storagePolicy;
  if (null == policyName) {
    policyName = this.conf.get(ColumnFamilyDescriptorBuilder.STORAGE_POLICY);
  }
{code}

storagePolicy is the flush policy. If there is none, it will look at the CF 
policy.
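
A sketch of the full three-level fallback, extending the snippet above. The 
global key in step 3 is my assumption (HStore's default block storage policy 
key), so verify it against the actual patch:

{code}
String policyName = storagePolicy;  // 1) flush-specific policy, if configured
if (null == policyName) {
  // 2) fall back to the column family's storage policy (HBASE-14061)
  policyName = this.conf.get(ColumnFamilyDescriptorBuilder.STORAGE_POLICY);
}
if (null == policyName) {
  // 3) fall back to the globally configured storage policy (assumed key)
  policyName = this.conf.get("hbase.hstore.block.storage.policy", "HOT");
}
{code}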

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch, HBASE-20105-v4.patch, 
> HBASE-20105-v5.patch, HBASE-20105-v6.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-14 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Attachment: HBASE-20105-v6.patch

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch, HBASE-20105-v4.patch, 
> HBASE-20105-v5.patch, HBASE-20105-v6.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-14 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Status: Patch Available  (was: Open)

Applied Stack's recommendations. New patch attached.

Anywhere we can add that in the book? I have not been able to find a suitable 
place for it...

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch, HBASE-20105-v4.patch, 
> HBASE-20105-v5.patch, HBASE-20105-v6.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-14 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Status: Open  (was: Patch Available)

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch, HBASE-20105-v4.patch, 
> HBASE-20105-v5.patch, HBASE-20105-v6.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-14 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Release Note: 
Introducing hbase.hstore.flush.storagepolicy column family parameter.
public static final String FLUSH_STORAGE_POLICY = 
"hbase.hstore.flush.storagepolicy"; 

This parameter allows the user to target a specific storage policy for flushes. 

The following table creation command will instruct HBase to redirect all 
memstore flushes to SSD drives:
create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.flush.storagepolicy' => 'ALL_SSD'}} 



> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch, HBASE-20105-v4.patch, 
> HBASE-20105-v5.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20105) Allow flushes to target SSD storage

2018-03-14 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398383#comment-16398383
 ] 

Jean-Marc Spaggiari commented on HBASE-20105:
-

Hi [~stack], thanks a lot for the review!

If 
store.getColumnFamilyDescriptor().getConfigurationValue(HConstants.FLUSH_STORAGE_POLICY)
 returns null, it's ok and expected. If store.getColumnFamilyDescriptor() 
returns null, it will already crash 2 lines above. So I think this is fine. If 
it returns gobble-de-gook, FSUtils.setStoragePolicy will log a WARN that HDFS 
is not able to set this storage policy. So I think we are fine too.

{quote}
Should there be an assign here?
{quote}
Ha! Shame! :( Yes it should. That's very bad :(




> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch, HBASE-20105-v4.patch, 
> HBASE-20105-v5.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20195) happybase connection password

2018-03-14 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398354#comment-16398354
 ] 

Jean-Marc Spaggiari commented on HBASE-20195:
-

u...@hbase.apache.org
https://hbase.apache.org/mail-lists.html

> happybase connection password
> -
>
> Key: HBASE-20195
> URL: https://issues.apache.org/jira/browse/HBASE-20195
> Project: HBase
>  Issue Type: Task
>  Components: Client
>Affects Versions: 1.2.6
>Reporter: sanghyunkwon
>Priority: Major
>
> Hi, I have a question about the happybase connection.
> I built the HBase table through happybase.
> I will use HBase through happybase.
> I am wondering if there is a way to require a password when someone 
> connects to the HBase table with my IP and port.
> Thank you.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-11 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Status: Patch Available  (was: Open)

Tested a backport on 1.2.0 and it works well. It passed the tests locally too. 
Looks done to me.

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch, HBASE-20105-v4.patch, 
> HBASE-20105-v5.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-11 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Attachment: HBASE-20105-v5.patch

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch, HBASE-20105-v4.patch, 
> HBASE-20105-v5.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-11 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Status: Open  (was: Patch Available)

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch, HBASE-20105-v4.patch, 
> HBASE-20105-v5.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20105) Allow flushes to target SSD storage

2018-03-11 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394684#comment-16394684
 ] 

Jean-Marc Spaggiari commented on HBASE-20105:
-

Made some small changes and tested it locally:
{code:java}
root@hbasetest1:~# hdfs fsck 
/hbase/data/default/t1/1c925d870fcf663dd3f48d31bf2b98d8/f1/f22416c954df4e24b499e5fc707cb029
 -files -blocks -locations

Connecting to namenode via 
http://hbasetest1.distparser.com:50070/fsck?ugi=root&files=1&blocks=1&locations=1&path=%2Fhbase%2Fdata%2Fdefault%2Ft1%2F1c925d870fcf663dd3f48d31bf2b98d8%2Ff1%2Ff22416c954df4e24b499e5fc707cb029
FSCK started by root (auth:SIMPLE) from /192.168.23.51 for path 
/hbase/data/default/t1/1c925d870fcf663dd3f48d31bf2b98d8/f1/f22416c954df4e24b499e5fc707cb029
 at Sun Mar 11 18:44:49 EDT 2018
/hbase/data/default/t1/1c925d870fcf663dd3f48d31bf2b98d8/f1/f22416c954df4e24b499e5fc707cb029
 4908 bytes, 1 block(s): OK
0. BP-2069742952-192.168.23.51-1431229364576:blk_1074774898_1034473 len=4908 
Live_repl=3 
[DatanodeInfoWithStorage[192.168.23.54:50010,DS-6c810995-115c-42cd-af32-c34f5095e45c,SSD],
 
DatanodeInfoWithStorage[192.168.23.52:50010,DS-d4e1790e-b7d3-492f-bb4f-7fb11b7ceff4,SSD],
 
DatanodeInfoWithStorage[192.168.23.53:50010,DS-04dac874-1eaf-43b5-ac1d-2768572b7a36,SSD]]

Status: HEALTHY
Total size: 4908 B
Total dirs: 0
Total files:1
Total symlinks: 0
Total blocks (validated):   1 (avg. block size 4908 B)
Minimally replicated blocks:1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks:0 (0.0 %)
Mis-replicated blocks:  0 (0.0 %)
Default replication factor: 3
Average block replication:  3.0
Corrupt blocks: 0
Missing replicas:   0 (0.0 %)
Number of data-nodes:   3
Number of racks:1
FSCK ended at Sun Mar 11 18:44:49 EDT 2018 in 0 milliseconds


The filesystem under path 
'/hbase/data/default/t1/1c925d870fcf663dd3f48d31bf2b98d8/f1/f22416c954df4e24b499e5fc707cb029'
 is HEALTHY
{code}

Sounds like it works. Updated patch coming soon.

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch, HBASE-20105-v4.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-09 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Status: Patch Available  (was: Open)

All tests passed. Fixed JavaDoc and CheckStyle. Will try this in the next 10 
days and report my results...

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch, HBASE-20105-v4.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-09 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Status: Open  (was: Patch Available)

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch, HBASE-20105-v4.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-09 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Attachment: HBASE-20105-v4.patch

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch, HBASE-20105-v4.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-08 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Attachment: (was: HBASE-20105-v2.patch)

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-08 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Status: Patch Available  (was: Open)

Cleaned up formatting and removed unneeded imports. Will test this soon on a 
real cluster. If it works, I will add documentation too...

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-08 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Attachment: HBASE-20105-v2.patch

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-08 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Attachment: HBASE-20105-v3.patch

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-08 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Status: Open  (was: Patch Available)

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HBASE-20105) Allow flushes to target SSD storage

2018-03-08 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari reassigned HBASE-20105:
---

Assignee: Jean-Marc Spaggiari

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-08 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Attachment: HBASE-20105-v2.patch

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-08 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Status: Patch Available  (was: Open)

Cleaner version of it. Might still need some touch-up.

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-08 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Status: Open  (was: Patch Available)

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go on SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going on regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-08 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Status: Patch Available  (was: Open)

Rebased.

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go to SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going to regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-08 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Attachment: HBASE-20105-v1.patch

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go to SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going to regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-08 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Status: Open  (was: Patch Available)

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go to SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going to regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-08 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Status: Patch Available  (was: Open)

First draft attached, to see how Jenkins likes it. Not final.

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go to SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going to regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-08 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Attachment: HBASE-20105-v0.patch

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch
>
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go to SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going to regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20105) Allow flushes to target SSD storage

2018-03-08 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391051#comment-16391051
 ] 

Jean-Marc Spaggiari commented on HBASE-20105:
-

Oh! I missed his reply! Nice. Looks like we have an option... I will try to see 
what I can come up with. How do we see this? An HBase-level setting? Or a 
table-level setting? Would it make sense to have this only for one table and keep 
other tables on spinning disks? Maybe, if the SSDs are too small? But if they are, 
HDFS will put that somewhere else anyway, right? Any recommendation?
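
To make the discussion concrete, a minimal sketch of what a table-level (per-CF) 
setting could look like with the HBase 2.0 descriptor builders. The configuration 
key "hbase.hstore.flush.storagepolicy" is hypothetical, just an illustration of 
the table-level option:
{code}
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class SsdFlushTableSketch {
  public static TableDescriptor ssdFlushTable() {
    ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder
        .newBuilder(Bytes.toBytes("f"))
        // Hypothetical key: ask the region server to flush this CF to SSD.
        .setConfiguration("hbase.hstore.flush.storagepolicy", "ONE_SSD")
        .build();
    return TableDescriptorBuilder.newBuilder(TableName.valueOf("t1"))
        .setColumnFamily(cf)
        .build();
  }
}
{code}
A table-level knob would keep small SSDs usable: only the hot tables pay the SSD 
cost, everything else stays on spinning disks.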

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Priority: Major
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go to SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going to regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20045) When running compaction, cache recent blocks.

2018-03-07 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389624#comment-16389624
 ] 

Jean-Marc Spaggiari commented on HBASE-20045:
-

I agree with [~anoop.hbase]. By default we should NOT cache all the blocks of 
major compactions. If the file is 30GB, it will indeed exhaust most of the BC. 
Not good. The idea is to cache only blocks containing cells younger than a 
given timestamp. So on major compactions, most probably most of the blocks will 
not go into the cache. But when you have a table with a TTL of a day, and very 
fast ingestion, then most of them will stay in the cache. It should be a CF-level 
parameter. Exactly like the TTL...
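
A minimal sketch of that eligibility rule, assuming the compaction writer can see 
the newest cell timestamp of each block it just wrote (names are illustrative, 
not an existing HBase API):
{code}
public class CompactionBlockCachePolicy {
  // CF-level parameter, configured like the TTL: cache window in milliseconds.
  private final long cacheWindowMs;

  public CompactionBlockCachePolicy(long cacheWindowMs) {
    this.cacheWindowMs = cacheWindowMs;
  }

  /**
   * A block written by a compaction goes to the block cache only if it holds
   * at least one cell younger than the window.
   */
  public boolean shouldCacheCompactedBlock(long maxCellTimestamp, long nowMs) {
    return maxCellTimestamp >= nowMs - cacheWindowMs;
  }
}
{code}
On a major compaction of a 30GB file, most blocks would fail this check and the 
block cache would be left alone; on a fast-ingest table with a one-day TTL, 
almost every block would pass.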

> When running compaction, cache recent blocks.
> -
>
> Key: HBASE-20045
> URL: https://issues.apache.org/jira/browse/HBASE-20045
> Project: HBase
>  Issue Type: New Feature
>  Components: BlockCache, Compaction
>Affects Versions: 2.0.0-beta-1
>Reporter: Jean-Marc Spaggiari
>Priority: Major
>
> HBase already allows caching blocks on flush. This is very useful for 
> use cases where most queries are against recent data. However, as soon as 
> there is a compaction, those blocks are evicted. It would be interesting to 
> have a table-level parameter to say "When compacting, cache blocks less than 
> 24 hours old". That way, when running a compaction, all blocks where some data 
> is less than 24h old will be automatically cached. 
>  
> Very useful for table designs where there is a TS in the key but a long history 
> (like a year of sensor data).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20101) HBase should provide a way to re-validate locality

2018-03-06 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389003#comment-16389003
 ] 

Jean-Marc Spaggiari commented on HBASE-20101:
-

I built a small improvement which in some use cases reduces compaction IO 
by 80%... I will for sure share that very soon, once it's cleaned up a bit. I 
don't think it will ever make it into HBase. It needs HDFS admin rights to be 
able to move the blocks, and HBase doesn't have those rights. I will provide more 
details soon. Tested it on 3 clusters right now: one 8-node dev cluster, one 
20-node test cluster and one 60-node production cluster. Working well. More 
to come. 

> HBase should provide a way to re-validate locality
> --
>
> Key: HBASE-20101
> URL: https://issues.apache.org/jira/browse/HBASE-20101
> Project: HBase
>  Issue Type: New Feature
>Reporter: Jean-Marc Spaggiari
>Assignee: huaxiang sun
>Priority: Major
>
> HDFS blocks can move for many reasons: HDFS balancing, loss of a disk or of a 
> node, etc. However, today, locality seems to be calculated only when the files 
> are opened for the first time. Even disabling and re-enabling the regions 
> doesn't trigger a re-calculation of the locality. 
> We should provide a way to let the user ask for this number to be 
> re-calculated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20045) When running compaction, cache recent blocks.

2018-03-06 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388663#comment-16388663
 ] 

Jean-Marc Spaggiari commented on HBASE-20045:
-

Thanks for sharing [~vrodionov]. Sad to hear about the Cassandra on-heap memory 
impact. Wondering how this applies to HBase's off-heap memory cache. I guess at 
some point we will figure it out.

[~anoop.hbase] I highly doubt I will have any time soon to work on that :(

> When running compaction, cache recent blocks.
> -
>
> Key: HBASE-20045
> URL: https://issues.apache.org/jira/browse/HBASE-20045
> Project: HBase
>  Issue Type: New Feature
>  Components: BlockCache, Compaction
>Affects Versions: 2.0.0-beta-1
>Reporter: Jean-Marc Spaggiari
>Priority: Major
>
> HBase already allows caching blocks on flush. This is very useful for 
> use cases where most queries are against recent data. However, as soon as 
> there is a compaction, those blocks are evicted. It would be interesting to 
> have a table-level parameter to say "When compacting, cache blocks less than 
> 24 hours old". That way, when running a compaction, all blocks where some data 
> is less than 24h old will be automatically cached. 
>  
> Very useful for table designs where there is a TS in the key but a long history 
> (like a year of sensor data).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20058) improper quoting in presplitting command docs

2018-03-02 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383900#comment-16383900
 ] 

Jean-Marc Spaggiari commented on HBASE-20058:
-

Trivial. +1

> improper quoting in presplitting command docs
> -
>
> Key: HBASE-20058
> URL: https://issues.apache.org/jira/browse/HBASE-20058
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Reporter: Mike Drob
>Assignee: maoling
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-20058-master-v0.patch
>
>
> http://hbase.apache.org/book.html#tricks.pre-split
> {code}
> hbase>create 't1','f',SPLITS => ['10','20',30']
> {code}
> Missing a quote before the 30.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20060) Add details of off heap memstore into book.

2018-03-02 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383898#comment-16383898
 ] 

Jean-Marc Spaggiari commented on HBASE-20060:
-

Should we not just, by default, assign 40% of off-heap to memstore and 40% to 
block cache, like we do for the heap, so that if someone doesn't configure 
anything they still get a correct configuration?
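
For reference, a sketch of the off-heap sizing the user has to do by hand today; 
the keys are my understanding of the HBase 2.0 configuration, the values are 
illustrative only:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class OffheapSizingSketch {
  public static Configuration sized() {
    Configuration conf = HBaseConfiguration.create();
    // Off-heap block cache (BucketCache), sized in MB.
    conf.set("hbase.bucketcache.ioengine", "offheap");
    conf.setInt("hbase.bucketcache.size", 4096);
    // Off-heap memstore global size, in MB.
    conf.setInt("hbase.regionserver.offheap.global.memstore.size", 4096);
    return conf;
  }
}
{code}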

> Add details of off heap memstore into book.
> ---
>
> Key: HBASE-20060
> URL: https://issues.apache.org/jira/browse/HBASE-20060
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
>Priority: Critical
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19586) Figure how to enable compression by default (fallbacks if native is missing, etc.)

2018-03-02 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383852#comment-16383852
 ] 

Jean-Marc Spaggiari commented on HBASE-19586:
-

No, but we have to ;)

> Figure how to enable compression by default (fallbacks if native is missing, 
> etc.)
> --
>
> Key: HBASE-19586
> URL: https://issues.apache.org/jira/browse/HBASE-19586
> Project: HBase
>  Issue Type: Sub-task
>  Components: defaults
>Reporter: stack
>Priority: Major
>
> See parent issue where the benefits of enabling compression are brought up 
> (again!). Figure how we can make it work out of the box rather than expect 
> user set it up. Parking this issue to look at it before we release 2.0.0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20101) HBase should provide a way to re-validate locality

2018-03-02 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383851#comment-16383851
 ] 

Jean-Marc Spaggiari commented on HBASE-20101:
-

awesome! Thanks [~huaxiang]!

> HBase should provide a way to re-validate locality
> --
>
> Key: HBASE-20101
> URL: https://issues.apache.org/jira/browse/HBASE-20101
> Project: HBase
>  Issue Type: New Feature
>Reporter: Jean-Marc Spaggiari
>Assignee: huaxiang sun
>Priority: Major
>
> HDFS blocks can move for many reasons: HDFS balancing, loss of a disk or of a 
> node, etc. However, today, locality seems to be calculated only when the files 
> are opened for the first time. Even disabling and re-enabling the regions 
> doesn't trigger a re-calculation of the locality. 
> We should provide a way to let the user ask for this number to be 
> re-calculated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20101) HBase should provide a way to re-validate locality

2018-02-28 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381294#comment-16381294
 ] 

Jean-Marc Spaggiari commented on HBASE-20101:
-

Kind of! 

There are 2 things.

One Chore that can check, periodically, at a specified frequency.

One API call that can trigger a reload for a specific region/CF...

 

I need this 2nd option because I manually move some blocks to restore locality. 
I don't want to force HBase to re-check everything. Since I know which file the 
blocks belong to, I just need to tell HBase that this specific 
Region/CF changed...
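
Purely as a sketch, the two pieces could look like this (a hypothetical API, 
nothing like it exists in HBase today):
{code}
import org.apache.hadoop.hbase.client.RegionInfo;

public interface LocalityRefresh {
  /** Chore entry point: re-check locality for all regions, on a schedule. */
  void recomputeAllLocality();

  /** On-demand call: re-check one region/CF whose blocks were just moved. */
  void recomputeLocality(RegionInfo region, byte[] family);
}
{code}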

 

> HBase should provide a way to re-validate locality
> --
>
> Key: HBASE-20101
> URL: https://issues.apache.org/jira/browse/HBASE-20101
> Project: HBase
>  Issue Type: New Feature
>Reporter: Jean-Marc Spaggiari
>Priority: Major
>
> HDFS blocks can move for many reasons: HDFS balancing, loss of a disk or of a 
> node, etc. However, today, locality seems to be calculated only when the files 
> are opened for the first time. Even disabling and re-enabling the regions 
> doesn't trigger a re-calculation of the locality. 
> We should provide a way to let the user ask for this number to be 
> re-calculated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20105) Allow flushes to target SSD storage

2018-02-28 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381240#comment-16381240
 ] 

Jean-Marc Spaggiari commented on HBASE-20105:
-

Blocked by HDFS-13209

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Priority: Major
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go to SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going to regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20105) Allow flushes to target SSD storage

2018-02-28 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380914#comment-16380914
 ] 

Jean-Marc Spaggiari commented on HBASE-20105:
-

Sounds like StoragePolicy can be configured only at the directory level 
(Region/CF), and not for a specific file (flushed file). It might require a new 
feature on the HDFS side. Investigating.
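
A sketch of what HDFS does offer today: tagging a whole Region/CF directory, 
which every file written there (flushes and compactions alike) then inherits:
{code}
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class CfStoragePolicySketch {
  // ONE_SSD / ALL_SSD are standard HDFS storage policies.
  public static void pinCfDirToSsd(DistributedFileSystem fs, Path cfDir)
      throws IOException {
    fs.setStoragePolicy(cfDir, "ONE_SSD");
  }
}
{code}
The missing piece is a per-file policy, so that only the flushed files land on 
SSD while compacted files written into the same directory go to regular storage.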

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Priority: Major
>
> On heavy-write use cases, flushes are compacted together pretty quickly. 
> Allowing flushes to go to SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going to regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20105) Allow flushes to target SSD storage

2018-02-28 Thread Jean-Marc Spaggiari (JIRA)
Jean-Marc Spaggiari created HBASE-20105:
---

 Summary: Allow flushes to target SSD storage
 Key: HBASE-20105
 URL: https://issues.apache.org/jira/browse/HBASE-20105
 Project: HBase
  Issue Type: New Feature
Affects Versions: hbase-2.0.0-alpha-4
Reporter: Jean-Marc Spaggiari


On heavy-write use cases, flushes are compacted together pretty quickly. 
Allowing flushes to go to SSD allows faster flushes and faster first compactions, 
with subsequent compactions going to regular storage.

 

It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20101) HBase should provide a way to re-validate locality

2018-02-27 Thread Jean-Marc Spaggiari (JIRA)
Jean-Marc Spaggiari created HBASE-20101:
---

 Summary: HBase should provide a way to re-validate locality
 Key: HBASE-20101
 URL: https://issues.apache.org/jira/browse/HBASE-20101
 Project: HBase
  Issue Type: New Feature
Reporter: Jean-Marc Spaggiari


HDFS blocks can move for many reasons: HDFS balancing, loss of a disk or of a 
node, etc. However, today, locality seems to be calculated only when the files 
are opened for the first time. Even disabling and re-enabling the regions 
doesn't trigger a re-calculation of the locality. 

We should provide a way to let the user ask for this number to be re-calculated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20045) When running compaction, cache recent blocks.

2018-02-27 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379420#comment-16379420
 ] 

Jean-Marc Spaggiari commented on HBASE-20045:
-

Ok. Just ran a small test.

PE sequentialWrites 1 rows. Flush -> 1500 blocks in offheap

PE sequentialWrites 1 rows. Flush -> +1500 blocks in offheap

PE sequentialWrites 1 rows. Flush -> +1500 blocks in offheap

At that point I have about 4700 blocks in offheap.

major_compact TestTable -> 0 blocks in memory. So on compactions, blocks are not 
loaded back into memory. Which reduces a lot the benefit of caching on 
writes/flushes.

> When running compaction, cache recent blocks.
> -
>
> Key: HBASE-20045
> URL: https://issues.apache.org/jira/browse/HBASE-20045
> Project: HBase
>  Issue Type: New Feature
>  Components: BlockCache, Compaction
>Affects Versions: 2.0.0-beta-1
>Reporter: Jean-Marc Spaggiari
>Priority: Major
>
> HBase already allows caching blocks on flush. This is very useful for 
> use cases where most queries are against recent data. However, as soon as 
> there is a compaction, those blocks are evicted. It would be interesting to 
> have a table-level parameter to say "When compacting, cache blocks less than 
> 24 hours old". That way, when running a compaction, all blocks where some data 
> is less than 24h old will be automatically cached. 
>  
> Very useful for table designs where there is a TS in the key but a long history 
> (like a year of sensor data).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-02-27 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379372#comment-16379372
 ] 

Jean-Marc Spaggiari commented on HBASE-18451:
-

Ping. Can we get this re-based and committed somehow?
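
For whoever rebases it, a minimal sketch of the guard the summary asks for, 
applied to the chore body quoted below; the hasPendingDelayedFlush() check is 
hypothetical and the actual patch may do it differently:
{code}
// Inspect the queue before adding another delayed request, so each region
// has at most one pending delayed flush at a time.
if (((HRegion) r).shouldFlush(whyFlush)) {
  FlushRequester requester = server.getFlushRequester();
  if (requester != null && !requester.hasPendingDelayedFlush(r)) { // hypothetical
    long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
    requester.requestDelayedFlush(r, randomDelay, false);
  }
}
{code}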

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>Assignee: nihed mbarek
>Priority: Major
> Attachments: HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours, impacting many tables (they have 150 
> regions per server), at the end all the regions might have some data to be 
> flushed, and we want, after one hour, to trigger a periodic flush. That's 
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
> "randomDelay" to the delayed flush; that way we spread them out.
> RANGE_OF_DELAY is 5 minutes. So we spread the flushes over the next 5 minutes, 
> which is very good.
> However, because we don't check if there is already a request in the queue, 
> 10 seconds later we create a new request, with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point you will end 
> up getting a small one, and the flush will be triggered almost immediately.
> As a result, instead of spreading all the flushes over the next 5 minutes, 
> you end up getting them all much more quickly, like within the first minute, 
> which not only feeds the queue with too many flush requests, but also defeats 
> the purpose of the randomDelay.
> {code}
> @Override
> protected void chore() {
>   final StringBuffer whyFlush = new StringBuffer();
>   for (Region r : this.server.onlineRegions.values()) {
>     if (r == null) continue;
>     if (((HRegion) r).shouldFlush(whyFlush)) {
>       FlushRequester requester = server.getFlushRequester();
>       if (requester != null) {
>         long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
>         LOG.info(getName() + " requesting flush of " +
>           r.getRegionInfo().getRegionNameAsString() + " because " +
>           whyFlush.toString() +
>           " after random delay " + randomDelay + "ms");
>         // Throttle the flushes by putting a delay. If we don't throttle, and there
>         // is a balanced write-load on the regions in a table, we might end up
>         // overwhelming the filesystem with too many flushes at once.
>         requester.requestDelayedFlush(r, randomDelay, false);
>       }
>     }
>   }
> }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 92532ms
> 2017-07-24 18:45:14,201 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 238780ms
> 2017-07-24 18:45:24,195 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 35390ms
> 2017-07-24 18:45:33,362 INFO 
> 

[jira] [Commented] (HBASE-20045) When running compaction, cache recent blocks.

2018-02-27 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379019#comment-16379019
 ] 

Jean-Marc Spaggiari commented on HBASE-20045:
-

In our case it does help. Also, my understanding was that offheap was not 
impacted by the GC.

I will run a small test to figure out whether compaction adds those blocks 
to the cache...

> When running compaction, cache recent blocks.
> -
>
> Key: HBASE-20045
> URL: https://issues.apache.org/jira/browse/HBASE-20045
> Project: HBase
>  Issue Type: New Feature
>  Components: BlockCache, Compaction
>Affects Versions: 2.0.0-beta-1
>Reporter: Jean-Marc Spaggiari
>Priority: Major
>
> HBase already allows caching blocks on flush. This is very useful for 
> use cases where most queries are against recent data. However, as soon as 
> there is a compaction, those blocks are evicted. It would be interesting to 
> have a table-level parameter to say "When compacting, cache blocks less than 
> 24 hours old". That way, when running a compaction, all blocks where some data 
> is less than 24h old will be automatically cached. 
>  
> Very useful for table designs where there is a TS in the key but a long history 
> (like a year of sensor data).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20045) When running compaction, cache recent blocks.

2018-02-23 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374887#comment-16374887
 ] 

Jean-Marc Spaggiari commented on HBASE-20045:
-

Interesting. My thought was that compactions are not loading the blocks back 
into memory. Only the flush. Which would make sense, because if you compact a 
20GB file, you will just blow up the cache, no? I will give that a try.

 

What I had in mind was a table parameter where for a given CF we say 
"KEEP_IN_MEMORY => 604800" (where 604800 represents 7 days in seconds). Which 
means: every block we write to disk when doing compactions, that contains a cell 
which is less than 7 days old, has to be stored into memory too.

Yes, it will store some "older" values. But look at the compaction life.

 

We flush F1. Then F2, then F3. There is already a huge 10GB F0 file. We run a 
minor compaction. F1+F2+F3 become F4, but nothing is put back into the cache. 
F0 is left over, because it is too big. I would have liked all of F4 to go into 
memory... Then I continue. F4, F5, F6. I compact that into F7, keeping F0 and F4. 
Here again, I would have liked all of F7 to go into memory. When I compact F0, 
F4 and F7, then yes, I might have some blocks with some older data, but 
depending on the key design, it might not be the case. If your key is 
sensorid+ts, then F0 holds old data, and when merging with F4 and F7, most 
probably old data and most recent data will be in different blocks, so only 
what is recent goes back into memory. This is to guarantee a better 99.999% 
latency, since that way recent data will ALWAYS be in memory...

 

 

> When running compaction, cache recent blocks.
> -
>
> Key: HBASE-20045
> URL: https://issues.apache.org/jira/browse/HBASE-20045
> Project: HBase
>  Issue Type: New Feature
>  Components: BlockCache, Compaction
>Affects Versions: 2.0.0-beta-1
>Reporter: Jean-Marc Spaggiari
>Priority: Major
>
> HBase already allows caching blocks on flush. This is very useful for 
> use cases where most queries are against recent data. However, as soon as 
> there is a compaction, those blocks are evicted. It would be interesting to 
> have a table-level parameter to say "When compacting, cache blocks less than 
> 24 hours old". That way, when running a compaction, all blocks where some data 
> is less than 24h old will be automatically cached. 
>  
> Very useful for table designs where there is a TS in the key but a long history 
> (like a year of sensor data).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20045) When running compaction, cache recent blocks.

2018-02-21 Thread Jean-Marc Spaggiari (JIRA)
Jean-Marc Spaggiari created HBASE-20045:
---

 Summary: When running compaction, cache recent blocks.
 Key: HBASE-20045
 URL: https://issues.apache.org/jira/browse/HBASE-20045
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0-beta-1
Reporter: Jean-Marc Spaggiari


HBase already allows caching blocks on flush. This is very useful for use cases 
where most queries are against recent data. However, as soon as there is a 
compaction, those blocks are evicted. It would be interesting to have a 
table-level parameter to say "When compacting, cache blocks less than 24 hours 
old". That way, when running a compaction, all blocks where some data is less 
than 24h old will be automatically cached. 

 

Very useful for table designs where there is a TS in the key but a long history 
(like a year of sensor data).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19658) Fix and reenable TestCompactingToCellFlatMapMemStore#testFlatteningToJumboCellChunkMap

2018-01-30 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345225#comment-16345225
 ] 

Jean-Marc Spaggiari commented on HBASE-19658:
-

Guys, ping by mail if you want/need me to give this patch a try...

> Fix and reenable 
> TestCompactingToCellFlatMapMemStore#testFlatteningToJumboCellChunkMap
> --
>
> Key: HBASE-19658
> URL: https://issues.apache.org/jira/browse/HBASE-19658
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-beta-1
>Reporter: stack
>Assignee: Anastasia Braginsky
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19658-V01.patch, HBASE-19658-V02.patch, 
> HBASE-19658-V03.patch, HBASE-19658-V04.patch, HBASE-19658-V05.patch, 
> HBASE-19658.8.patch, HBASE-19658.0007.patch, HBASE-19658.006.patch, 
> HBASE-19658.05.patch, 
> org.apache.hadoop.hbase.regionserver.TestCompactingToCellFlatMapMemStore-output.txt
>
>
> testFlatteningToJumboCellChunkMap was disabled so could commit HBASE-19282 on 
> branch-2. This test is failing reliably. Assigned to [~anastas]. This issue 
> is about fixing the failing test and reenabling it in time for beta-2. Thanks 
> A.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-8963) Add configuration option to skip HFile archiving

2018-01-25 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338985#comment-16338985
 ] 

Jean-Marc Spaggiari commented on HBASE-8963:


Folks, any idea if this will one day be done? Just had to drop a table with 
400,000 HFiles, and it takes a while ;) I just want to skip this move...

> Add configuration option to skip HFile archiving
> 
>
> Key: HBASE-8963
> URL: https://issues.apache.org/jira/browse/HBASE-8963
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: 8963-v10.txt, HBASE-8963.trunk.v1.patch, 
> HBASE-8963.trunk.v2.patch, HBASE-8963.trunk.v3.patch, 
> HBASE-8963.trunk.v4.patch, HBASE-8963.trunk.v5.patch, 
> HBASE-8963.trunk.v6.patch, HBASE-8963.trunk.v7.patch, 
> HBASE-8963.trunk.v8.patch, HBASE-8963.trunk.v9.patch
>
>
> Currently HFileArchiver is always called when a table is dropped or compacted.
> A configuration option (either global or per table) should be provided so 
> that archiving can be skipped when table is deleted or compacted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-15381) Implement a distributed MOB compaction by procedure

2018-01-25 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338934#comment-16338934
 ] 

Jean-Marc Spaggiari commented on HBASE-15381:
-

[~jmhsieh] [~stack] [~jingcheng...@intel.com] guys, any chance to see that done 
soon? Just looking at a cluster where 90% of the workload is MOBs, and a 
compaction design centered on the master sounds a bit strange, e.g. when you add 
servers or lose servers and need to get locality back.

> Implement a distributed MOB compaction by procedure
> ---
>
> Key: HBASE-15381
> URL: https://issues.apache.org/jira/browse/HBASE-15381
> Project: HBase
>  Issue Type: Improvement
>  Components: mob
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
>Priority: Major
> Attachments: HBASE-15381-v2.patch, HBASE-15381-v3.patch, 
> HBASE-15381-v4.patch, HBASE-15381-v5.patch, HBASE-15381-v6.patch, 
> HBASE-15381.patch, mob distributed compaction design-v2.pdf, mob distributed 
> compaction design.pdf
>
>
> In MOB, there is a periodical compaction which runs in HMaster (It can be 
> disabled by configuration), some small mob files are merged into bigger ones. 
> Now the compaction only runs in HMaster which is not efficient and might 
> impact the running of HMaster. In this JIRA, a distributed MOB compaction is 
> introduced, it is triggered by HMaster, but all the compaction jobs are 
> distributed to HRegionServers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19658) Fix and reenable TestCompactingToCellFlatMapMemStore#testFlatteningToJumboCellChunkMap

2018-01-13 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-19658:

Attachment: 
org.apache.hadoop.hbase.regionserver.TestCompactingToCellFlatMapMemStore-output.txt

> Fix and reenable 
> TestCompactingToCellFlatMapMemStore#testFlatteningToJumboCellChunkMap
> --
>
> Key: HBASE-19658
> URL: https://issues.apache.org/jira/browse/HBASE-19658
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-beta-1
>Reporter: stack
>Assignee: Anastasia Braginsky
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19658-V01.patch, HBASE-19658-V02.patch, 
> HBASE-19658-V03.patch, 
> org.apache.hadoop.hbase.regionserver.TestCompactingToCellFlatMapMemStore-output.txt
>
>
> testFlatteningToJumboCellChunkMap was disabled so could commit HBASE-19282 on 
> branch-2. This test is failing reliably. Assigned to [~anastas]. This issue 
> is about fixing the failing test and reenabling it in time for beta-2. Thanks 
> A.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19658) Fix and reenable TestCompactingToCellFlatMapMemStore#testFlatteningToJumboCellChunkMap

2018-01-13 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325416#comment-16325416
 ] 

Jean-Marc Spaggiari commented on HBASE-19658:
-

[~stack]

{code}
---
Test set: 
org.apache.hadoop.hbase.regionserver.TestCompactingToCellFlatMapMemStore
---
Tests run: 76, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 18.04 s <<< 
FAILURE! - in 
org.apache.hadoop.hbase.regionserver.TestCompactingToCellFlatMapMemStore
testFlatteningToJumboCellChunkMap[0](org.apache.hadoop.hbase.regionserver.TestCompactingToCellFlatMapMemStore)
  Time elapsed: 0.472 s  <<< FAILURE!
java.lang.AssertionError: expected:<6292152> but was:<6292084>
at 
org.apache.hadoop.hbase.regionserver.TestCompactingToCellFlatMapMemStore.testFlatteningToJumboCellChunkMap(TestCompactingToCellFlatMapMemStore.java:815)
{code}

See attached.

> Fix and reenable 
> TestCompactingToCellFlatMapMemStore#testFlatteningToJumboCellChunkMap
> --
>
> Key: HBASE-19658
> URL: https://issues.apache.org/jira/browse/HBASE-19658
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-beta-1
>Reporter: stack
>Assignee: Anastasia Braginsky
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19658-V01.patch, HBASE-19658-V02.patch, 
> HBASE-19658-V03.patch, 
> org.apache.hadoop.hbase.regionserver.TestCompactingToCellFlatMapMemStore-output.txt
>
>
> testFlatteningToJumboCellChunkMap was disabled so could commit HBASE-19282 on 
> branch-2. This test is failing reliably. Assigned to [~anastas]. This issue 
> is about fixing the failing test and reenabling it in time for beta-2. Thanks 
> A.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19658) Fix and reenable TestCompactingToCellFlatMapMemStore#testFlatteningToJumboCellChunkMap

2018-01-12 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16323975#comment-16323975
 ] 

Jean-Marc Spaggiari commented on HBASE-19658:
-

I didn't keep the logs for that one :(

I will re-run the tests. It failed 3 times out of 6 runs, so I might be 
able to reproduce. I will provide the details as soon as I have them.

> Fix and reenable 
> TestCompactingToCellFlatMapMemStore#testFlatteningToJumboCellChunkMap
> --
>
> Key: HBASE-19658
> URL: https://issues.apache.org/jira/browse/HBASE-19658
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-beta-1
>Reporter: stack
>Assignee: Anastasia Braginsky
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19658-V01.patch, HBASE-19658-V02.patch, 
> HBASE-19658-V03.patch
>
>
> testFlatteningToJumboCellChunkMap was disabled so could commit HBASE-19282 on 
> branch-2. This test is failing reliably. Assigned to [~anastas]. This issue 
> is about fixing the failing test and reenabling it in time for beta-2. Thanks 
> A.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19658) Fix and reenable TestCompactingToCellFlatMapMemStore#testFlatteningToJumboCellChunkMap

2018-01-11 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16322895#comment-16322895
 ] 

Jean-Marc Spaggiari commented on HBASE-19658:
-

This keeps failing for me with 2.0.0...

> Fix and reenable 
> TestCompactingToCellFlatMapMemStore#testFlatteningToJumboCellChunkMap
> --
>
> Key: HBASE-19658
> URL: https://issues.apache.org/jira/browse/HBASE-19658
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-beta-1
>Reporter: stack
>Assignee: Anastasia Braginsky
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19658-V01.patch, HBASE-19658-V02.patch, 
> HBASE-19658-V03.patch
>
>
> testFlatteningToJumboCellChunkMap was disabled so could commit HBASE-19282 on 
> branch-2. This test is failing reliably. Assigned to [~anastas]. This issue 
> is about fixing the failing test and reenabling it in time for beta-2. Thanks 
> A.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19772) ReadOnlyZKClient improvements

2018-01-11 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16322833#comment-16322833
 ] 

Jean-Marc Spaggiari commented on HBASE-19772:
-

Are you talking about this?

{code}
18/01/11 14:39:45 INFO zookeeper.ReadOnlyZKClient: 0x4685fae6 no activities for 
6 ms, close active connection. Will reconnect next time when there are new 
requests.
{code}

Will be happy to see them going away ;) Maybe debug?

> ReadOnlyZKClient improvements
> -
>
> Key: HBASE-19772
> URL: https://issues.apache.org/jira/browse/HBASE-19772
> Project: HBase
>  Issue Type: Sub-task
>  Components: Zookeeper
>Reporter: stack
>Assignee: Duo Zhang
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19772.master.001.patch
>
>
> Here is [~Apache9] 's patch from the parent so it applies on top of what was 
> committed in the parent.
> Patch makes it so we we don't close out zk if available Tasks to run and 
> nicer logging.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-19768) RegionServer startup failing when DN is dead

2018-01-11 Thread Jean-Marc Spaggiari (JIRA)
Jean-Marc Spaggiari created HBASE-19768:
---

 Summary: RegionServer startup failing when DN is dead
 Key: HBASE-19768
 URL: https://issues.apache.org/jira/browse/HBASE-19768
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Marc Spaggiari


When starting HBase, if the datanode hosted on the same host is dead but not 
yet detected by the namenode, HBase will fail to start:

{code}
515691223393/node8.distparser.com%2C16020%2C1515691223393.1515691238778 failed, 
retry = 7
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
 syscall:getsockopt(..) failed: Connexion refusée: /192.168.23.2:50010
at 
org.apache.hbase.thirdparty.io.netty.channel.unix.Socket.finishConnect(..)(Unknown
 Source)
Caused by: 
org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeConnectException:
 syscall:getsockopt(..) failed: Connexion refusée
... 1 more
{code}

and will also get stuck when stopping:
{code}
hbase@node2:~/hbase-2.0.0-beta-1$ bin/stop-hbase.sh 
stopping 
hbase^C
hbase@node2:~/hbase-2.0.0-beta-1$ bin/stop-hbase.sh 
stopping 
hbase..
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/home/hbase/hbase-2.0.0-beta-1/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/home/hbase/hbase-2.0.0-beta-1/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
{code}

The most interesting part is that it seems to fail the same way even if the DN is 
declared dead on the HDFS side:

{code}
515692041367/node8.distparser.com%2C16020%2C1515692041367.1515692057716 failed, 
retry = 4
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
 syscall:getsockopt(..) failed: Connexion refusée: /192.168.23.2:50010
at 
org.apache.hbase.thirdparty.io.netty.channel.unix.Socket.finishConnect(..)(Unknown
 Source)
Caused by: 
org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeConnectException:
 syscall:getsockopt(..) failed: Connexion refusée
... 1 more
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-19767) Master web UI shows negative values for Remaining KVs

2018-01-11 Thread Jean-Marc Spaggiari (JIRA)
Jean-Marc Spaggiari created HBASE-19767:
---

 Summary: Master web UI shows negative values for Remaining KVs
 Key: HBASE-19767
 URL: https://issues.apache.org/jira/browse/HBASE-19767
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0-alpha-4
Reporter: Jean-Marc Spaggiari


In the Master Web UI, under the compaction tab, the Remaining KVs sometimes 
shows negative values.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19721) Unnecessary stubbings detected in test class: TestReversedScannerCallable

2018-01-06 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314679#comment-16314679
 ] 

Jean-Marc Spaggiari commented on HBASE-19721:
-

FYI

jmspaggi@node8:~/hbase-test$ java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)


> Unnecessary stubbings detected in test class: TestReversedScannerCallable
> -
>
> Key: HBASE-19721
> URL: https://issues.apache.org/jira/browse/HBASE-19721
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Jean-Marc Spaggiari
>Assignee: Mike Drob
> Fix For: 2.0.0-beta-2
>
>
> Found by JMS on the mailing list:
> {noformat}
> ---
> Test set: org.apache.hadoop.hbase.client.TestReversedScannerCallable
> ---
> Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.515 s <<<
> FAILURE! - in org.apache.hadoop.hbase.client.TestReversedScannerCallable
> unnecessary Mockito
> stubbings(org.apache.hadoop.hbase.client.TestReversedScannerCallable)  Time
> elapsed: 0.014 s  <<< ERROR!
> org.mockito.exceptions.misusing.UnnecessaryStubbingException:
> Unnecessary stubbings detected in test class: TestReversedScannerCallable
> Clean & maintainable test code requires zero unnecessary code.
> Following stubbings are unnecessary (click to navigate to relevant line of
> code):
>   1. -> at
> org.apache.hadoop.hbase.client.TestReversedScannerCallable.setUp(TestReversedScannerCallable.java:66)
>   2. -> at
> org.apache.hadoop.hbase.client.TestReversedScannerCallable.setUp(TestReversedScannerCallable.java:68)
> Please remove unnecessary stubbings. More info: javadoc for
> UnnecessaryStubbingException class.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-19582) Tags on append doesn't behave like expected

2017-12-21 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-19582:

Description: 
When appending a tag to an HBase cell, the tag seems to not really be appended 
but to live its own life. In the example below, I put a cell, append the TTL, 
and we can see between the 2 scans that only the TTL-appended cell expires. I 
was expecting those 2 cells to become one and expire together. This can easily 
be seen by looking at the timestamp returned by the scan.
{code}
hbase(main):082:0> put 't1', 'r1', 'f1:c1', 'value'
0 row(s) in 0.1350 seconds

hbase(main):083:0> append 't1', 'r1', 'f1:c1', '', { TTL => 5000 }
0 row(s) in 0.0080 seconds

hbase(main):084:0> scan 't1'
ROW    COLUMN+CELL
 r1    column=f1:c1, timestamp=1513879615014, value=value
1 row(s) in 0.0730 seconds

hbase(main):085:0> scan 't1'
ROW    COLUMN+CELL
 r1    column=f1:c1, timestamp=1513879599375, value=value
1 row(s) in 0.0500 seconds
{code}


  was:
When appending a tag to an HBase cell, the tag seems to not really be appended 
but to live its own life. In the example below, I put a cell, append the TTL, and 
we can see between the 2 scans that only the TTL-appended cell expires. I was 
expecting those 2 cells to become one and expire together.
{code}
hbase(main):082:0> put 't1', 'r1', 'f1:c1', 'value'
0 row(s) in 0.1350 seconds

hbase(main):083:0> append 't1', 'r1', 'f1:c1', '', { TTL => 5000 }
0 row(s) in 0.0080 seconds

hbase(main):084:0> scan 't1'
ROW    COLUMN+CELL
 r1    column=f1:c1, timestamp=1513879615014, value=value
1 row(s) in 0.0730 seconds

hbase(main):085:0> scan 't1'
ROW    COLUMN+CELL
 r1    column=f1:c1, timestamp=1513879599375, value=value
1 row(s) in 0.0500 seconds
{code}



> Tags on append doesn't behave like expected
> ---
>
> Key: HBASE-19582
> URL: https://issues.apache.org/jira/browse/HBASE-19582
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>
> When appending a tag to an HBase cell, the tag seems to not really be appended 
> but to live its own life. In the example below, I put a cell, append the TTL, 
> and we can see between the 2 scans that only the TTL-appended cell expires. I 
> was expecting those 2 cells to become one and expire together. This can easily 
> be seen by looking at the timestamp returned by the scan.
> {code}
> hbase(main):082:0> put 't1', 'r1', 'f1:c1', 'value'
> 0 row(s) in 0.1350 seconds
> hbase(main):083:0> append 't1', 'r1', 'f1:c1', '', { TTL => 5000 }
> 0 row(s) in 0.0080 seconds
> hbase(main):084:0> scan 't1'
> ROW    COLUMN+CELL
>  r1    column=f1:c1, timestamp=1513879615014, value=value
> 1 row(s) in 0.0730 seconds
> hbase(main):085:0> scan 't1'
> ROW    COLUMN+CELL
>  r1    column=f1:c1, timestamp=1513879599375, value=value
> 1 row(s) in 0.0500 seconds
> {code}



--
This message was sent 

[jira] [Updated] (HBASE-19582) Tags on append doesn't behave like expected

2017-12-21 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-19582:

Description: 
When appending a tag to an HBase cell, the tag seems to not really be appended 
but to live its own life. In the example below, I put a cell, append the TTL, and 
we can see between the 2 scans that only the TTL-appended cell expires. I was 
expecting those 2 cells to become one and expire together.
{code}
hbase(main):082:0> put 't1', 'r1', 'f1:c1', 'value'
0 row(s) in 0.1350 seconds

hbase(main):083:0> append 't1', 'r1', 'f1:c1', '', { TTL => 5000 }
0 row(s) in 0.0080 seconds

hbase(main):084:0> scan 't1'
ROW    COLUMN+CELL
 r1    column=f1:c1, timestamp=1513879615014, value=value
1 row(s) in 0.0730 seconds

hbase(main):085:0> scan 't1'
ROW    COLUMN+CELL
 r1    column=f1:c1, timestamp=1513879599375, value=value
1 row(s) in 0.0500 seconds
{code}


  was:
When appending a tag to an HBase cell, the tag seems to not really be appended 
but to live its own life. In the example below, I put a cell, append the TTL, and 
we can see between the 2 scans that only the TTL-appended cell expires. I was 
expecting those 2 cells to become one and expire together.
{[code}
hbase(main):082:0> put 't1', 'r1', 'f1:c1', 'value'
0 row(s) in 0.1350 seconds

hbase(main):083:0> append 't1', 'r1', 'f1:c1', '', { TTL => 5000 }
0 row(s) in 0.0080 seconds

hbase(main):084:0> scan 't1'
ROW    COLUMN+CELL
 r1    column=f1:c1, timestamp=1513879615014, value=value
1 row(s) in 0.0730 seconds

hbase(main):085:0> scan 't1'
ROW    COLUMN+CELL
 r1    column=f1:c1, timestamp=1513879599375, value=value
1 row(s) in 0.0500 seconds
{code}



> Tags on append doesn't behave like expected
> ---
>
> Key: HBASE-19582
> URL: https://issues.apache.org/jira/browse/HBASE-19582
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>
> When appending a tag to an HBase cell, the tag seems to not really be appended 
> but to live its own life. In the example below, I put a cell, append the TTL, 
> and we can see between the 2 scans that only the TTL-appended cell expires. I 
> was expecting those 2 cells to become one and expire together.
> {code}
> hbase(main):082:0> put 't1', 'r1', 'f1:c1', 'value'
> 0 row(s) in 0.1350 seconds
> hbase(main):083:0> append 't1', 'r1', 'f1:c1', '', { TTL => 5000 }
> 0 row(s) in 0.0080 seconds
> hbase(main):084:0> scan 't1'
> ROW    COLUMN+CELL
>  r1    column=f1:c1, timestamp=1513879615014, value=value
> 1 row(s) in 0.0730 seconds
> hbase(main):085:0> scan 't1'
> ROW    COLUMN+CELL
>  r1    column=f1:c1, timestamp=1513879599375, value=value
> 1 row(s) in 0.0500 seconds
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-19582) Tags on append doesn't behave like expected

2017-12-21 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-19582:

Description: 
When appending a tag to an HBase cell, the tag seems to not really be appended 
but to live its own life. In the example below, I put a cell, append the TTL, and 
we can see between the 2 scans that only the TTL-appended cell expires. I was 
expecting those 2 cells to become one and expire together.
{[code}
hbase(main):082:0> put 't1', 'r1', 'f1:c1', 'value'
0 row(s) in 0.1350 seconds

hbase(main):083:0> append 't1', 'r1', 'f1:c1', '', { TTL => 5000 }
0 row(s) in 0.0080 seconds

hbase(main):084:0> scan 't1'
ROW    COLUMN+CELL
 r1    column=f1:c1, timestamp=1513879615014, value=value
1 row(s) in 0.0730 seconds

hbase(main):085:0> scan 't1'
ROW    COLUMN+CELL
 r1    column=f1:c1, timestamp=1513879599375, value=value
1 row(s) in 0.0500 seconds
{code}


  was:
When appending a tag to an HBase cell, the tag seems to not really be appended 
but to live its own life. In the example below, I put a cell, append the TTL, and 
we can see between the 2 scans that only the TTL-appended cell expires. I was 
expecting those 2 cells to become one and expire together.

[code]
hbase(main):082:0> put 't1', 'r1', 'f1:c1', 'value'
0 row(s) in 0.1350 seconds

hbase(main):083:0> append 't1', 'r1', 'f1:c1', '', { TTL => 5000 }
0 row(s) in 0.0080 seconds

hbase(main):084:0> scan 't1'
ROW    COLUMN+CELL
 r1    column=f1:c1, timestamp=1513879615014, value=value
1 row(s) in 0.0730 seconds

hbase(main):085:0> scan 't1'
ROW    COLUMN+CELL
 r1    column=f1:c1, timestamp=1513879599375, value=value
1 row(s) in 0.0500 seconds
[code]



> Tags on append doesn't behave like expected
> ---
>
> Key: HBASE-19582
> URL: https://issues.apache.org/jira/browse/HBASE-19582
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>
> When appending a tag to an HBase cell, the tag seems to not really be appended 
> but to live its own life. In the example below, I put a cell, append the TTL, 
> and we can see between the 2 scans that only the TTL-appended cell expires. I 
> was expecting those 2 cells to become one and expire together.
> {[code}
> hbase(main):082:0> put 't1', 'r1', 'f1:c1', 'value'
> 0 row(s) in 0.1350 seconds
> hbase(main):083:0> append 't1', 'r1', 'f1:c1', '', { TTL => 5000 }
> 0 row(s) in 0.0080 seconds
> hbase(main):084:0> scan 't1'
> ROW    COLUMN+CELL
>  r1    column=f1:c1, timestamp=1513879615014, value=value
> 1 row(s) in 0.0730 seconds
> hbase(main):085:0> scan 't1'
> ROW    COLUMN+CELL
>  r1    column=f1:c1, timestamp=1513879599375, value=value
> 1 row(s) in 0.0500 seconds
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-19582) Tags on append doesn't behave like expected

2017-12-21 Thread Jean-Marc Spaggiari (JIRA)
Jean-Marc Spaggiari created HBASE-19582:
---

 Summary: Tags on append doesn't behave like expected
 Key: HBASE-19582
 URL: https://issues.apache.org/jira/browse/HBASE-19582
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 2.0.0-alpha-4
Reporter: Jean-Marc Spaggiari


When appending a tag to an HBase cell, the tag seems to not really be appended 
but to live its own life. In the example below, I put a cell, append the TTL, and 
we can see between the 2 scans that only the TTL-appended cell expires. I was 
expecting those 2 cells to become one and expire together.

[code]
hbase(main):082:0> put 't1', 'r1', 'f1:c1', 'value'
0 row(s) in 0.1350 seconds

hbase(main):083:0> append 't1', 'r1', 'f1:c1', '', { TTL => 5000 }
0 row(s) in 0.0080 seconds

hbase(main):084:0> scan 't1'
ROW    COLUMN+CELL
 r1    column=f1:c1, timestamp=1513879615014, value=value
1 row(s) in 0.0730 seconds

hbase(main):085:0> scan 't1'
ROW    COLUMN+CELL
 r1    column=f1:c1, timestamp=1513879599375, value=value
1 row(s) in 0.0500 seconds
[code]




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18294) Reduce global heap pressure: flush based on heap occupancy

2017-12-21 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1638#comment-1638
 ] 

Jean-Marc Spaggiari commented on HBASE-18294:
-

[~eshcar] trying to understand here. So the 128MB limit will not be used 
anymore, and we might flush MemStores as big as memory can hold?

> Reduce global heap pressure: flush based on heap occupancy
> --
>
> Key: HBASE-18294
> URL: https://issues.apache.org/jira/browse/HBASE-18294
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-18294.01.patch, HBASE-18294.02.patch, 
> HBASE-18294.03.patch, HBASE-18294.04.patch, HBASE-18294.05.patch, 
> HBASE-18294.06.patch, HBASE-18294.07.patch, HBASE-18294.07.patch
>
>
> A region is flushed if its memory component exceeds a threshold (default size 
> is 128MB).
> A flush policy decides whether to flush a store by comparing the size of the 
> store to another threshold (that can be configured with 
> hbase.hregion.percolumnfamilyflush.size.lower.bound).
> Currently the implementation (in both cases) compares the data size 
> (key-value only) to the threshold, whereas it should compare the heap size 
> (which includes index size and metadata).
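
A sketch of the distinction the issue describes (illustrative only, not the 
patch; all names and numbers are assumptions):

{code}
public final class FlushDecisionSketch {
  // The proposal: trigger the flush on heap occupancy (data + index +
  // metadata), not on key-value data size alone.
  static boolean shouldFlush(long heapSize, long threshold) {
    return heapSize > threshold;
  }

  public static void main(String[] args) {
    long dataSize = 100L << 20;   // 100MB of key-values (what is compared today)
    long heapSize = 140L << 20;   // same data plus index/metadata overhead
    long threshold = 128L << 20;  // default flush size, 128MB
    // Comparing dataSize would not flush yet; comparing heapSize does.
    System.out.println(shouldFlush(heapSize, threshold));  // true
  }
}
{code}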



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19148) Reevaluate default values of configurations

2017-12-21 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16299986#comment-16299986
 ] 

Jean-Marc Spaggiari commented on HBASE-19148:
-

For compression, maybe we can loop over the available codecs and apply the 
first one we find? It's on table creation only, so it's not that important if 
we take a few more ms. That way most people will get Snappy enabled by default, 
and some will get other settings. I worked with someone yesterday who had way 
too many regions on HBase. By activating FAST_DIFF (because of the use case) 
and Snappy, their biggest table went from 860GB to 70GB... It really makes a 
big difference.

> Reevaluate default values of configurations
> ---
>
> Key: HBASE-19148
> URL: https://issues.apache.org/jira/browse/HBASE-19148
> Project: HBase
>  Issue Type: Bug
>  Components: defaults
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0-beta-1
>
> Attachments: 
> 0002-HBASE-19148-Reevaluate-default-values-of-configurati.patch, 
> HBASE-19148.master.001.patch, HBASE-19148.master.002.patch, 
> HBASE-19148.master.003.patch, HBASE-19148.master.003.patch, 
> HBASE-19148.master.004 (1).patch, HBASE-19148.master.004.patch, 
> HBASE-19148.master.005.patch
>
>
> Remove cruft and mythologies. Make descriptions more digestible. Change 
> defaults given experience.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HBASE-11817) HTable.batch() loses operations when region is splited

2017-12-06 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari resolved HBASE-11817.
-
  Resolution: Fixed
Release Note: Was able to reproduce on 0.98, but not anymore on 1.0.2. Most 
probably solved by another JIRA.

> HTable.batch() loses operations when region is splited
> --
>
> Key: HBASE-11817
> URL: https://issues.apache.org/jira/browse/HBASE-11817
> Project: HBase
>  Issue Type: Bug
>  Components: Admin, Client
>Affects Versions: 0.98.4
> Environment: 0.98.4+hadoop 2.4.1, 0.98.4 stand-alone, jdk1.6
>Reporter: t samkawa
>
> Using HTable.batch() often loses increment operations when a region split runs.
> A test code snippet is below.
> Running these 2 code blocks concurrently, different values were often 
> recorded although all values should be the same 0x.
> {code}
> // --- code 1 ---
> HTable table = new HTable(CONF);
> byte[] rowKey = new byte[1];
> for (int i = 0; i < 0x; i++) {
>   List<Row> operations = new ArrayList<>();
>   for (byte c1 = (byte) 'A'; c1 <= (byte) 'Z'; c1++) {
>     rowKey[0] = c1;
>     Increment opIncr = new Increment(rowKey);
>     opIncr.addColumn(FAM, HConstants.EMPTY_BYTE_ARRAY, 1);
>     operations.add(opIncr);
>   }
>   table.batch(operations, null);
> }
>
> // --- code 2 ---
> HBaseAdmin admin = new HBaseAdmin(CONF);
> byte[] rowKey = new byte[1];
> for (byte c1 = (byte) 'A'; c1 <= (byte) 'Z'; c1++) {
>   try { Thread.sleep(2000L); } catch (InterruptedException iex) {}
>   rowKey[0] = c1;
>   admin.split(TABLE_NAME, rowKey);
> }
> {code}
> Using table.increment() instead of table.batch() works fine. But that change 
> is slower.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-11817) HTable.batch() loses operations when region is splited

2017-12-06 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280791#comment-16280791
 ] 

Jean-Marc Spaggiari commented on HBASE-11817:
-

0.98 is old, tried on 1.0.2 and it works well now. Closing.

> HTable.batch() loses operations when region is splited
> --
>
> Key: HBASE-11817
> URL: https://issues.apache.org/jira/browse/HBASE-11817
> Project: HBase
>  Issue Type: Bug
>  Components: Admin, Client
>Affects Versions: 0.98.4
> Environment: 0.98.4+hadoop 2.4.1, 0.98.4 stand-alone, jdk1.6
>Reporter: t samkawa
>
> Using HTable.batch() often loses increment operations when a region split runs.
> A test code snippet is below.
> Running these 2 code blocks concurrently, different values were often 
> recorded although all values should be the same 0x.
> {code}
> // --- code 1 ---
> HTable table = new HTable(CONF);
> byte[] rowKey = new byte[1];
> for (int i = 0; i < 0x; i++) {
>   List<Row> operations = new ArrayList<>();
>   for (byte c1 = (byte) 'A'; c1 <= (byte) 'Z'; c1++) {
>     rowKey[0] = c1;
>     Increment opIncr = new Increment(rowKey);
>     opIncr.addColumn(FAM, HConstants.EMPTY_BYTE_ARRAY, 1);
>     operations.add(opIncr);
>   }
>   table.batch(operations, null);
> }
>
> // --- code 2 ---
> HBaseAdmin admin = new HBaseAdmin(CONF);
> byte[] rowKey = new byte[1];
> for (byte c1 = (byte) 'A'; c1 <= (byte) 'Z'; c1++) {
>   try { Thread.sleep(2000L); } catch (InterruptedException iex) {}
>   rowKey[0] = c1;
>   admin.split(TABLE_NAME, rowKey);
> }
> {code}
> Using table.increment() instead of table.batch() works fine. But that change 
> is slower.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-19418) RANGE_OF_DELAY in PeriodicMemstoreFlusher should be configurable.

2017-12-04 Thread Jean-Marc Spaggiari (JIRA)
Jean-Marc Spaggiari created HBASE-19418:
---

 Summary: RANGE_OF_DELAY in PeriodicMemstoreFlusher should be 
configurable.
 Key: HBASE-19418
 URL: https://issues.apache.org/jira/browse/HBASE-19418
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0-alpha-4
Reporter: Jean-Marc Spaggiari


When RSs have a LOT of regions and CFs, flushing everything within 5 minutes is 
not always doable. It might be interesting to be able to increase the 
RANGE_OF_DELAY. 
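
A sketch of what making it configurable could look like (the property name is 
invented for illustration; only the 5-minute default is real):

{code}
import org.apache.hadoop.conf.Configuration;

public class ConfigurableDelaySketch {
  // Hypothetical property; defaults to today's hard-coded 5 minutes.
  static int rangeOfDelayMillis(Configuration conf) {
    return conf.getInt("hbase.regionserver.periodicflusher.rangeofdelay",
        5 * 60 * 1000);
  }
}
{code}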



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2017-12-04 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277188#comment-16277188
 ] 

Jean-Marc Spaggiari commented on HBASE-18451:
-

bump

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>Assignee: nihed mbarek
> Attachments: HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours, impacting many tables (they have 150 
> regions per server), at the end all the regions might have some data to be 
> flushed, and we want to trigger a periodic flush after one hour. That's 
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
> "randomDelay" to the delayed flush; that way we spread them out.
> RANGE_OF_DELAY is 5 minutes. So we spread the flushes over the next 5 
> minutes, which is very good.
> However, because we don't check if there is already a request in the queue, 
> 10 seconds later we create a new request, with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point you will end 
> up with a small one, and the flush will be triggered almost immediately.
> As a result, instead of spreading all the flushes over the next 5 minutes, 
> you end up getting them all much more quickly, like within the first minute. 
> This not only floods the queue with too many flush requests, but also 
> defeats the purpose of the randomDelay.
> {code}
> @Override
> protected void chore() {
>   final StringBuffer whyFlush = new StringBuffer();
>   for (Region r : this.server.onlineRegions.values()) {
>     if (r == null) continue;
>     if (((HRegion) r).shouldFlush(whyFlush)) {
>       FlushRequester requester = server.getFlushRequester();
>       if (requester != null) {
>         long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
>         LOG.info(getName() + " requesting flush of " +
>             r.getRegionInfo().getRegionNameAsString() + " because " +
>             whyFlush.toString() + " after random delay " + randomDelay + "ms");
>         // Throttle the flushes by putting a delay. If we don't throttle, and
>         // there is a balanced write-load on the regions in a table, we might
>         // end up overwhelming the filesystem with too many flushes at once.
>         requester.requestDelayedFlush(r, randomDelay, false);
>       }
>     }
>   }
> }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 92532ms
> 2017-07-24 18:45:14,201 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 238780ms
> 2017-07-24 18:45:24,195 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 35390ms
> 2017-07-24 18:45:33,362 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> 
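
A sketch of the guard this issue asks for, based on the chore() body quoted 
above (isFlushPending is an assumed new lookup on FlushRequester, not an 
existing API):

{code}
if (((HRegion) r).shouldFlush(whyFlush)) {
  FlushRequester requester = server.getFlushRequester();
  // Only enqueue a new delayed flush if none is already pending for this
  // region; otherwise the repeated re-rolls of randomDelay defeat the spread.
  if (requester != null && !requester.isFlushPending(r)) {
    long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
    requester.requestDelayedFlush(r, randomDelay, false);
  }
}
{code}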

[jira] [Commented] (HBASE-1935) Scan in parallel

2017-11-21 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260997#comment-16260997
 ] 

Jean-Marc Spaggiari commented on HBASE-1935:


Ping ;) [~lhofhansl] [~stack]

It would be nice to get 9272 or this in at some point, or maybe close one of the 2?

> Scan in parallel
> 
>
> Key: HBASE-1935
> URL: https://issues.apache.org/jira/browse/HBASE-1935
> Project: HBase
>  Issue Type: New Feature
>  Components: Coprocessors
>Reporter: stack
> Attachments: 1935-idea.txt, pscanner-v2.patch, pscanner-v3.patch, 
> pscanner-v4.patch, pscanner.patch
>
>
> A scanner that, rather than scanning in series, scanned multiple regions in 
> parallel would be more involved but could complete much faster, particularly 
> if results are sparse.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-9272) A parallel, unordered scanner

2017-11-21 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260988#comment-16260988
 ] 

Jean-Marc Spaggiari commented on HBASE-9272:


[~lhofhansl] any motivation to finish that? ;)

> A parallel, unordered scanner
> -
>
> Key: HBASE-9272
> URL: https://issues.apache.org/jira/browse/HBASE-9272
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Minor
> Attachments: 9272-0.94-v2.txt, 9272-0.94-v3.txt, 9272-0.94-v4.txt, 
> 9272-0.94.txt, 9272-trunk-v2.txt, 9272-trunk-v3.txt, 9272-trunk-v3.txt, 
> 9272-trunk-v4.txt, 9272-trunk.txt, ParallelClientScanner.java, 
> ParallelClientScanner.java
>
>
> The contract of ClientScanner is to return rows in sort order. That limits 
> the order in which regions can be scanned.
> I propose a simple ParallelScanner that does not have this requirement and 
> queries regions in parallel, returning whatever gets returned first.
> This is generally useful for scans that filter a lot of data on the server, 
> or in cases where the client can very quickly react to the returned data.
> I have a simple prototype (it doesn't do error handling right, and might be a 
> bit heavy on the synchronization side - it uses a BlockingQueue to hand data 
> between the client using the scanner and the threads doing the scanning; it 
> also could potentially starve some scanners long enough to time out at the 
> server).
> On the plus side, it's only 130 lines of code. :)
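
Not the attached patch, just a minimal sketch of the idea: scan several row 
ranges in parallel and hand results to the consumer in arrival order. Range 
splitting and error propagation are elided; class and variable names are 
illustrative.

{code}
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class ParallelUnorderedScan {
  // One scanner thread per range; rows reach "out" unordered.
  public static void scan(Connection conn, TableName name, List<Scan> ranges,
      BlockingQueue<Result> out) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(ranges.size());
    CountDownLatch done = new CountDownLatch(ranges.size());
    for (Scan range : ranges) {
      pool.execute(() -> {
        try (Table table = conn.getTable(name);
             ResultScanner scanner = table.getScanner(range)) {
          for (Result r : scanner) {
            out.put(r);  // consumer sees arrival order, not sort order
          }
        } catch (Exception e) {
          e.printStackTrace();  // real code needs proper error propagation
        } finally {
          done.countDown();
        }
      });
    }
    done.await();
    pool.shutdown();
  }
}
{code}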



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-8458) Support for batch version of checkAndPut() and checkAndDelete()

2017-11-20 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259470#comment-16259470
 ] 

Jean-Marc Spaggiari commented on HBASE-8458:


Ping?

> Support for batch version of checkAndPut() and checkAndDelete()
> ---
>
> Key: HBASE-8458
> URL: https://issues.apache.org/jira/browse/HBASE-8458
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, regionserver
>Affects Versions: 0.95.0
>Reporter: Hari Mankude
>
> The use case is that the user has multiple threads loading hundreds of keys 
> into an HBase table. Occasionally there are collisions in the keys being 
> uploaded by different threads. So for correctness, it is required to do a 
> checkAndPut() instead of a put(). However, doing a checkAndPut() RPC for 
> every key update is suboptimal. It would be good to have a batch version of 
> checkAndPut() similar to the batch put(). The client can partition the keys 
> on region boundaries.
> The jira is NOT looking for any type of cross-row locking or multi-row 
> atomicity with checkAndPut().
> A batch version of checkAndDelete() is a similar requirement.
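
For context, here is what callers must do today, one RPC per key (a sketch; 
the family, qualifier, and puts list are placeholders):

{code}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckAndPutLoop {
  private static final byte[] FAM = Bytes.toBytes("f");   // placeholder
  private static final byte[] QUAL = Bytes.toBytes("q");  // placeholder

  // One checkAndPut RPC per key; this per-key round trip is the cost a
  // batched checkAndPut() would amortize.
  static void load(Table table, List<Put> puts) throws IOException {
    for (Put put : puts) {
      // Expected value null means "only write if the cell does not exist yet".
      boolean applied = table.checkAndPut(put.getRow(), FAM, QUAL, null, put);
      if (!applied) {
        // Collision: another thread already wrote this key.
      }
    }
  }
}
{code}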



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-15320) HBase connector for Kafka Connect

2017-11-07 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242784#comment-16242784
 ] 

Jean-Marc Spaggiari commented on HBASE-15320:
-

Ping ;)

> HBase connector for Kafka Connect
> -
>
> Key: HBASE-15320
> URL: https://issues.apache.org/jira/browse/HBASE-15320
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Reporter: Andrew Purtell
>Assignee: Mike Wingert
>  Labels: beginner
> Fix For: 3.0.0
>
> Attachments: HBASE-15320.patch
>
>
> Implement an HBase connector with source and sink tasks for the Connect 
> framework (http://docs.confluent.io/2.0.0/connect/index.html) available in 
> Kafka 0.9 and later.
> See also: 
> http://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-low-latency-data-pipelines
> An HBase source 
> (http://docs.confluent.io/2.0.0/connect/devguide.html#task-example-source-task)
>  could be implemented as a replication endpoint or WALObserver, publishing 
> cluster wide change streams from the WAL to one or more topics, with 
> configurable mapping and partitioning of table changes to topics.  
> An HBase sink task 
> (http://docs.confluent.io/2.0.0/connect/devguide.html#sink-tasks) would 
> persist, with optional transformation (JSON? Avro?, map fields to native 
> schema?), Kafka SinkRecords into HBase tables.
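
A bare-bones sketch of the sink direction (not the attached patch): map Kafka 
Connect SinkRecords to HBase Puts. Configuration, schema mapping, and batching 
are elided; the column family and qualifier are assumptions.

{code}
import java.io.IOException;
import java.util.Collection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.kafka.connect.sink.SinkRecord;

public class HBaseSinkSketch {
  private static final byte[] FAM = Bytes.toBytes("d");   // assumed family
  private static final byte[] QUAL = Bytes.toBytes("v");  // assumed qualifier

  // In a real SinkTask this would live in put(Collection<SinkRecord>), with
  // the Table opened in start() and closed in stop().
  static void write(Table table, Collection<SinkRecord> records) throws IOException {
    for (SinkRecord record : records) {
      Put put = new Put(Bytes.toBytes(record.key().toString()));
      put.addColumn(FAM, QUAL, Bytes.toBytes(record.value().toString()));
      table.put(put);
    }
  }
}
{code}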



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-14217) Add Java access to Spark bulk load functionality

2017-09-22 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176454#comment-16176454
 ] 

Jean-Marc Spaggiari commented on HBASE-14217:
-

No update?

> Add Java access to Spark bulk load functionality
> 
>
> Key: HBASE-14217
> URL: https://issues.apache.org/jira/browse/HBASE-14217
> Project: HBase
>  Issue Type: Improvement
>Reporter: Theodore michael Malaska
>Assignee: Theodore michael Malaska
>Priority: Minor
>
> HBASE-14150 added bulk load functionality for Scala users.  This jira will 
> add the Java layer that will make this functionality accessible to Java 
> developers.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-13788) Shell commands do not support column qualifiers containing colon (:)

2017-09-11 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16161636#comment-16161636
 ] 

Jean-Marc Spaggiari commented on HBASE-13788:
-

[~stack][~eclark][~busbey] any update guys? [~mmaharana] is wondering if we can 
close on that... Thanks.

> Shell commands do not support column qualifiers containing colon (:)
> 
>
> Key: HBASE-13788
> URL: https://issues.apache.org/jira/browse/HBASE-13788
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 0.98.0, 0.96.0, 1.0.0, 1.1.0
>Reporter: Dave Latham
>Assignee: Manaswini
> Attachments: Hbase-13788-testcases.docx, hbase-13788-v1.patch
>
>
> The shell interprets the colon within the qualifier as a delimiter to a 
> FORMATTER instead of part of the qualifier itself.
> Example from the mailing list:
> Hmph, I may have spoken too soon. I know I tested this at one point and
> it worked, but now I'm getting different results:
> On the new cluster, I created a duplicate test table:
> hbase(main):043:0> create 'content3', {NAME => 'x', BLOOMFILTER =>
> 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION =>
> 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536',
> IN_MEMORY => 'false', BLOCKCACHE => 'true'}
> Then I pull some data from the imported table:
> hbase(main):045:0> scan 'content', {LIMIT=>1,
> STARTROW=>'A:9223370612089311807:twtr:57013379'}
> ROW  COLUMN+CELL
> 
> A:9223370612089311807:twtr:570133798827921408
> column=x:twitter:username, timestamp=1424775595345, value=BERITA &
> INFORMASI!
> Then put it:
> hbase(main):046:0> put
> 'content3','A:9223370612089311807:twtr:570133798827921408',
> 'x:twitter:username', 'BERITA & INFORMASI!'
> But then when I query it, I see that I've lost the column qualifier
> ":username":
> hbase(main):046:0> scan 'content3'
> ROW  COLUMN+CELL
>  A:9223370612089311807:twtr:570133798827921408 column=x:twitter,
>  timestamp=1432745301788, value=BERITA & INFORMASI!
> Even though I'm missing one of the qualifiers, I can at least filter on
> columns in this sample table.
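
Until the shell parsing is sorted out, note that the Java client has no such 
ambiguity, since family and qualifier are separate arguments (a sketch 
mirroring the example above):

{code}
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ColonQualifierPut {
  // Family and qualifier are separate byte[] arguments here, so a qualifier
  // containing ':' (as in "twitter:username") round-trips intact.
  static void put(Table table) throws IOException {
    Put p = new Put(Bytes.toBytes("A:9223370612089311807:twtr:570133798827921408"));
    p.addColumn(Bytes.toBytes("x"), Bytes.toBytes("twitter:username"),
        Bytes.toBytes("BERITA & INFORMASI!"));
    table.put(p);
  }
}
{code}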



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-13788) Shell commands do not support column qualifiers containing colon (:)

2017-09-11 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-13788:

Status: Patch Available  (was: Open)

> Shell commands do not support column qualifiers containing colon (:)
> 
>
> Key: HBASE-13788
> URL: https://issues.apache.org/jira/browse/HBASE-13788
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 1.1.0, 1.0.0, 0.96.0, 0.98.0
>Reporter: Dave Latham
>Assignee: Manaswini
> Attachments: Hbase-13788-testcases.docx, hbase-13788-v1.patch
>
>
> The shell interprets the colon within the qualifier as a delimiter to a 
> FORMATTER instead of part of the qualifier itself.
> Example from the mailing list:
> Hmph, I may have spoken too soon. I know I tested this at one point and
> it worked, but now I'm getting different results:
> On the new cluster, I created a duplicate test table:
> hbase(main):043:0> create 'content3', {NAME => 'x', BLOOMFILTER =>
> 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION =>
> 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536',
> IN_MEMORY => 'false', BLOCKCACHE => 'true'}
> Then I pull some data from the imported table:
> hbase(main):045:0> scan 'content', {LIMIT=>1,
> STARTROW=>'A:9223370612089311807:twtr:57013379'}
> ROW  COLUMN+CELL
> 
> A:9223370612089311807:twtr:570133798827921408
> column=x:twitter:username, timestamp=1424775595345, value=BERITA &
> INFORMASI!
> Then put it:
> hbase(main):046:0> put
> 'content3','A:9223370612089311807:twtr:570133798827921408',
> 'x:twitter:username', 'BERITA & INFORMASI!'
> But then when I query it, I see that I've lost the column qualifier
> ":username":
> hbase(main):046:0> scan 'content3'
> ROW  COLUMN+CELL
>  A:9223370612089311807:twtr:570133798827921408 column=x:twitter,
>  timestamp=1432745301788, value=BERITA & INFORMASI!
> Even though I'm missing one of the qualifiers, I can at least filter on
> columns in this sample table.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2017-07-28 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104888#comment-16104888
 ] 

Jean-Marc Spaggiari commented on HBASE-18451:
-

Thanks Anoop.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>Assignee: nihed mbarek
> Attachments: HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours, impacting many tables (they have 150 
> regions per server), at the end all the regions might have some data to be 
> flushed, and we want to trigger a periodic flush after one hour. That's 
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
> "randomDelay" to the delayed flush; that way we spread them out.
> RANGE_OF_DELAY is 5 minutes. So we spread the flushes over the next 5 
> minutes, which is very good.
> However, because we don't check if there is already a request in the queue, 
> 10 seconds later we create a new request, with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point you will end 
> up with a small one, and the flush will be triggered almost immediately.
> As a result, instead of spreading all the flushes over the next 5 minutes, 
> you end up getting them all much more quickly, like within the first minute. 
> This not only floods the queue with too many flush requests, but also 
> defeats the purpose of the randomDelay.
> {code}
> @Override
> protected void chore() {
>   final StringBuffer whyFlush = new StringBuffer();
>   for (Region r : this.server.onlineRegions.values()) {
>     if (r == null) continue;
>     if (((HRegion) r).shouldFlush(whyFlush)) {
>       FlushRequester requester = server.getFlushRequester();
>       if (requester != null) {
>         long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
>         LOG.info(getName() + " requesting flush of " +
>             r.getRegionInfo().getRegionNameAsString() + " because " +
>             whyFlush.toString() + " after random delay " + randomDelay + "ms");
>         // Throttle the flushes by putting a delay. If we don't throttle, and
>         // there is a balanced write-load on the regions in a table, we might
>         // end up overwhelming the filesystem with too many flushes at once.
>         requester.requestDelayedFlush(r, randomDelay, false);
>       }
>     }
>   }
> }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 92532ms
> 2017-07-24 18:45:14,201 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 238780ms
> 2017-07-24 18:45:24,195 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 35390ms
> 2017-07-24 18:45:33,362 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> 

[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2017-07-28 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104884#comment-16104884
 ] 

Jean-Marc Spaggiari commented on HBASE-18451:
-

You got it! ;)

LGTM.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>Assignee: nihed mbarek
> Attachments: HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours, impacting many tables (they have 150 
> regions per server), at the end all the regions might have some data to be 
> flushed, and we want to trigger a periodic flush after one hour. That's 
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
> "randomDelay" to the delayed flush; that way we spread them out.
> RANGE_OF_DELAY is 5 minutes. So we spread the flushes over the next 5 
> minutes, which is very good.
> However, because we don't check if there is already a request in the queue, 
> 10 seconds later we create a new request, with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point you will end 
> up with a small one, and the flush will be triggered almost immediately.
> As a result, instead of spreading all the flushes over the next 5 minutes, 
> you end up getting them all much more quickly, like within the first minute. 
> This not only floods the queue with too many flush requests, but also 
> defeats the purpose of the randomDelay.
> {code}
> @Override
> protected void chore() {
>   final StringBuffer whyFlush = new StringBuffer();
>   for (Region r : this.server.onlineRegions.values()) {
>     if (r == null) continue;
>     if (((HRegion) r).shouldFlush(whyFlush)) {
>       FlushRequester requester = server.getFlushRequester();
>       if (requester != null) {
>         long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
>         LOG.info(getName() + " requesting flush of " +
>             r.getRegionInfo().getRegionNameAsString() + " because " +
>             whyFlush.toString() + " after random delay " + randomDelay + "ms");
>         // Throttle the flushes by putting a delay. If we don't throttle, and
>         // there is a balanced write-load on the regions in a table, we might
>         // end up overwhelming the filesystem with too many flushes at once.
>         requester.requestDelayedFlush(r, randomDelay, false);
>       }
>     }
>   }
> }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 92532ms
> 2017-07-24 18:45:14,201 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 238780ms
> 2017-07-24 18:45:24,195 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 35390ms
> 2017-07-24 18:45:33,362 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> 

[jira] [Comment Edited] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2017-07-27 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103472#comment-16103472
 ] 

Jean-Marc Spaggiari edited comment on HBASE-18451 at 7/27/17 4:46 PM:
--

@stack Can you please assign this JIRA to Nihed?

Thanks.


was (Author: jmspaggi):
@stack Can you please assign this JIRA to Nihed?

Thanks.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>
> If you run a big job every 4 hours, impacting many tables (they have 150 
> regions per server), at the end all the regions might have some data to be 
> flushed, and we want to trigger a periodic flush after one hour. That's 
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
> "randomDelay" to the delayed flush; that way we spread them out.
> RANGE_OF_DELAY is 5 minutes. So we spread the flushes over the next 5 
> minutes, which is very good.
> However, because we don't check if there is already a request in the queue, 
> 10 seconds later we create a new request, with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point you will end 
> up with a small one, and the flush will be triggered almost immediately.
> As a result, instead of spreading all the flushes over the next 5 minutes, 
> you end up getting them all much more quickly, like within the first minute. 
> This not only floods the queue with too many flush requests, but also 
> defeats the purpose of the randomDelay.
> {code}
> @Override
> protected void chore() {
>   final StringBuffer whyFlush = new StringBuffer();
>   for (Region r : this.server.onlineRegions.values()) {
>     if (r == null) continue;
>     if (((HRegion) r).shouldFlush(whyFlush)) {
>       FlushRequester requester = server.getFlushRequester();
>       if (requester != null) {
>         long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
>         LOG.info(getName() + " requesting flush of " +
>             r.getRegionInfo().getRegionNameAsString() + " because " +
>             whyFlush.toString() + " after random delay " + randomDelay + "ms");
>         // Throttle the flushes by putting a delay. If we don't throttle, and
>         // there is a balanced write-load on the regions in a table, we might
>         // end up overwhelming the filesystem with too many flushes at once.
>         requester.requestDelayedFlush(r, randomDelay, false);
>       }
>     }
>   }
> }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 92532ms
> 2017-07-24 18:45:14,201 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 238780ms
> 2017-07-24 18:45:24,195 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 35390ms
> 2017-07-24 18:45:33,362 INFO 
> 

[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2017-07-27 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103472#comment-16103472
 ] 

Jean-Marc Spaggiari commented on HBASE-18451:
-

@stack Can you please assign this JIRA to Nihed?

Thanks.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>
> If you run a big job every 4 hours, impacting many tables (they have 150 
> regions per server), at the end all the regions might have some data to be 
> flushed, and we want to trigger a periodic flush after one hour. That's 
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
> "randomDelay" to the delayed flush; that way we spread them out.
> RANGE_OF_DELAY is 5 minutes. So we spread the flushes over the next 5 
> minutes, which is very good.
> However, because we don't check if there is already a request in the queue, 
> 10 seconds later we create a new request, with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point you will end 
> up with a small one, and the flush will be triggered almost immediately.
> As a result, instead of spreading all the flushes over the next 5 minutes, 
> you end up getting them all much more quickly, like within the first minute. 
> This not only floods the queue with too many flush requests, but also 
> defeats the purpose of the randomDelay.
> {code}
> @Override
> protected void chore() {
>   final StringBuffer whyFlush = new StringBuffer();
>   for (Region r : this.server.onlineRegions.values()) {
>     if (r == null) continue;
>     if (((HRegion) r).shouldFlush(whyFlush)) {
>       FlushRequester requester = server.getFlushRequester();
>       if (requester != null) {
>         long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
>         LOG.info(getName() + " requesting flush of " +
>             r.getRegionInfo().getRegionNameAsString() + " because " +
>             whyFlush.toString() + " after random delay " + randomDelay + "ms");
>         // Throttle the flushes by putting a delay. If we don't throttle, and
>         // there is a balanced write-load on the regions in a table, we might
>         // end up overwhelming the filesystem with too many flushes at once.
>         requester.requestDelayedFlush(r, randomDelay, false);
>       }
>     }
>   }
> }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 92532ms
> 2017-07-24 18:45:14,201 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 238780ms
> 2017-07-24 18:45:24,195 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 35390ms
> 2017-07-24 18:45:33,362 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of 

[jira] [Commented] (HBASE-9272) A parallel, unordered scanner

2017-07-26 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102414#comment-16102414
 ] 

Jean-Marc Spaggiari commented on HBASE-9272:


Guys,

Is there still any work on that? 

JMS

> A parallel, unordered scanner
> -
>
> Key: HBASE-9272
> URL: https://issues.apache.org/jira/browse/HBASE-9272
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Minor
> Attachments: 9272-0.94.txt, 9272-0.94-v2.txt, 9272-0.94-v3.txt, 
> 9272-0.94-v4.txt, 9272-trunk.txt, 9272-trunk-v2.txt, 9272-trunk-v3.txt, 
> 9272-trunk-v3.txt, 9272-trunk-v4.txt, ParallelClientScanner.java, 
> ParallelClientScanner.java
>
>
> The contract of ClientScanner is to return rows in sort order. That limits 
> the order in which regions can be scanned.
> I propose a simple ParallelScanner that does not have this requirement and 
> queries regions in parallel, returning whatever gets returned first.
> This is generally useful for scans that filter a lot of data on the server, 
> or in cases where the client can very quickly react to the returned data.
> I have a simple prototype (it doesn't do error handling right, and might be a 
> bit heavy on the synchronization side - it uses a BlockingQueue to hand data 
> between the client using the scanner and the threads doing the scanning; it 
> also could potentially starve some scanners long enough to time out at the 
> server).
> On the plus side, it's only 130 lines of code. :)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2017-07-25 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-18451:

Description: 
If you run a big job every 4 hours, impacting many tables (they have 150 
regions per server), at the end all the regions might have some data to be 
flushed, and we want to trigger a periodic flush after one hour. That's totally 
fine.

Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
"randomDelay" to the delayed flush; that way we spread them out.

RANGE_OF_DELAY is 5 minutes. So we spread the flushes over the next 5 minutes, 
which is very good.

However, because we don't check if there is already a request in the queue, 10 
seconds later we create a new request, with a new randomDelay.

If you generate a randomDelay every 10 seconds, at some point you will end up 
with a small one, and the flush will be triggered almost immediately.

As a result, instead of spreading all the flushes over the next 5 minutes, you 
end up getting them all much more quickly, like within the first minute. This 
not only floods the queue with too many flush requests, but also defeats the 
purpose of the randomDelay.

{code}
@Override
protected void chore() {
  final StringBuffer whyFlush = new StringBuffer();
  for (Region r : this.server.onlineRegions.values()) {
    if (r == null) continue;
    if (((HRegion) r).shouldFlush(whyFlush)) {
      FlushRequester requester = server.getFlushRequester();
      if (requester != null) {
        long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
        LOG.info(getName() + " requesting flush of " +
            r.getRegionInfo().getRegionNameAsString() + " because " +
            whyFlush.toString() + " after random delay " + randomDelay + "ms");
        // Throttle the flushes by putting a delay. If we don't throttle, and
        // there is a balanced write-load on the regions in a table, we might
        // end up overwhelming the filesystem with too many flushes at once.
        requester.requestDelayedFlush(r, randomDelay, false);
      }
    }
  }
}
{code}


{code}
2017-07-24 18:44:33,338 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 270785ms
2017-07-24 18:44:43,328 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 200143ms
2017-07-24 18:44:53,954 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 191082ms
2017-07-24 18:45:03,528 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 92532ms
2017-07-24 18:45:14,201 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 238780ms
2017-07-24 18:45:24,195 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 35390ms
2017-07-24 18:45:33,362 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 283034ms
2017-07-24 18:45:43,933 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 84328ms
2017-07-24 18:45:53,866 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old 

[jira] [Updated] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2017-07-25 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-18451:

Description: 
If you run a big job every 4 hours, impacting many tables (they have 150 
regions per server), at the end all the regions might have some data to be 
flushed, and we want to trigger a periodic flush after one hour. That's totally 
fine.

Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
"randomDelay" to the delayed flush; that way we spread them out.

RANGE_OF_DELAY is 5 minutes. So we spread the flushes over the next 5 minutes, 
which is very good.

However, because we don't check if there is already a request in the queue, 10 
seconds later we create a new request, with a new randomDelay.

If you generate a randomDelay every 10 seconds, at some point you will end up 
with a small one, and the flush will be triggered almost immediately.

As a result, instead of spreading all the flushes over the next 5 minutes, you 
end up getting them all much more quickly, like within the first minute, which 
defeats the purpose of the randomDelay.

{code}
@Override
protected void chore() {
  final StringBuffer whyFlush = new StringBuffer();
  for (Region r : this.server.onlineRegions.values()) {
    if (r == null) continue;
    if (((HRegion) r).shouldFlush(whyFlush)) {
      FlushRequester requester = server.getFlushRequester();
      if (requester != null) {
        long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
        LOG.info(getName() + " requesting flush of " +
            r.getRegionInfo().getRegionNameAsString() + " because " +
            whyFlush.toString() + " after random delay " + randomDelay + "ms");
        // Throttle the flushes by putting a delay. If we don't throttle, and
        // there is a balanced write-load on the regions in a table, we might
        // end up overwhelming the filesystem with too many flushes at once.
        requester.requestDelayedFlush(r, randomDelay, false);
      }
    }
  }
}
{code}


{code}
2017-07-24 18:44:33,338 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 270785ms
2017-07-24 18:44:43,328 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 200143ms
2017-07-24 18:44:53,954 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 191082ms
2017-07-24 18:45:03,528 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 92532ms
2017-07-24 18:45:14,201 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 238780ms
2017-07-24 18:45:24,195 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 35390ms
2017-07-24 18:45:33,362 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 283034ms
2017-07-24 18:45:43,933 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 84328ms
2017-07-24 18:45:53,866 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 72291ms

[jira] [Created] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2017-07-25 Thread Jean-Marc Spaggiari (JIRA)
Jean-Marc Spaggiari created HBASE-18451:
---

 Summary: PeriodicMemstoreFlusher should inspect the queue before 
adding a delayed flush request
 Key: HBASE-18451
 URL: https://issues.apache.org/jira/browse/HBASE-18451
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 2.0.0-alpha-1
Reporter: Jean-Marc Spaggiari


If you run a big job every 4 hours, impacting many tables (they have 150 
regions per server), at the end all the regions might have some data to be 
flushed, and we want to trigger a periodic flush after one hour. That's totally 
fine.

Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
"randomDelay" to the delayed flush; that way we spread them out.

RANGE_OF_DELAY is 5 minutes. So we spread the flushes over the next 5 minutes, 
which is very good.

However, because we don't check if there is already a request in the queue, 10 
seconds later we create a new request, with a new randomDelay.

If you generate a randomDelay every 10 seconds, at some point you will end up 
with a small one, and the flush will be triggered almost immediately.

As a result, instead of spreading all the flushes over the next 5 minutes, you 
end up getting them all much more quickly, like within the first minute, which 
defeats the purpose of the randomDelay.

{code}
@Override
protected void chore() {
  final StringBuffer whyFlush = new StringBuffer();
  for (Region r : this.server.onlineRegions.values()) {
    if (r == null) continue;
    if (((HRegion) r).shouldFlush(whyFlush)) {
      FlushRequester requester = server.getFlushRequester();
      if (requester != null) {
        long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
        LOG.info(getName() + " requesting flush of " +
            r.getRegionInfo().getRegionNameAsString() + " because " +
            whyFlush.toString() + " after random delay " + randomDelay + "ms");
        // Throttle the flushes by putting a delay. If we don't throttle, and
        // there is a balanced write-load on the regions in a table, we might
        // end up overwhelming the filesystem with too many flushes at once.
        requester.requestDelayedFlush(r, randomDelay, false);
      }
    }
  }
}
{code}


{code}
2017-07-24 18:44:33,338 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 270785ms
2017-07-24 18:44:43,328 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 200143ms
2017-07-24 18:44:53,954 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 191082ms
2017-07-24 18:45:03,528 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 92532ms
2017-07-24 18:45:14,201 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 238780ms
2017-07-24 18:45:24,195 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 35390ms
2017-07-24 18:45:33,362 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 283034ms
2017-07-24 18:45:43,933 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
has an old edit so flush to free WALs after random delay 84328ms
2017-07-24 18:45:53,866 INFO 

[jira] [Commented] (HBASE-16415) Replication in different namespace

2017-05-25 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025323#comment-16025323
 ] 

Jean-Marc Spaggiari commented on HBASE-16415:
-

Indeed, it would be very nice! Does anyone want to take this?
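
One possible shape for this would be to remap the destination namespace in 
ReplicationSink before batching. A minimal sketch, assuming a hypothetical 
mapping loaded from the peer configuration (the nsMap field and its config key 
are illustrative, not an existing HBase API):

[code]
// Hypothetical mapping, e.g. "default" -> "dr"; neither the field nor a
// configuration key for it exists in HBase today.
private Map<String, String> nsMap = new HashMap<>();

private TableName remapNamespace(TableName source) {
  String targetNs = nsMap.get(source.getNamespaceAsString());
  if (targetNs == null) {
    return source; // no mapping configured, keep the original namespace
  }
  return TableName.valueOf(targetNs, source.getQualifierAsString());
}

// In replicateEntries() below, the remapped name would then be used:
//   TableName table = remapNamespace(
//       TableName.valueOf(entry.getKey().getTableName().toByteArray()));
[code]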

> Replication in different namespace
> --
>
> Key: HBASE-16415
> URL: https://issues.apache.org/jira/browse/HBASE-16415
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Reporter: Christian Guegi
>
> It would be nice to replicate tables from one namespace to another namespace.
> Example:
> Master cluster, namespace=default, table=bar
> Slave cluster, namespace=dr, table=bar
> Replication happens in class ReplicationSink:
>   public void replicateEntries(List<WALEntry> entries, final CellScanner cells, ...) {
>     ...
>     TableName table = TableName.valueOf(entry.getKey().getTableName().toByteArray());
>     ...
>     addToHashMultiMap(rowMap, table, clusterIds, m);
>     ...
>     for (Entry<TableName, Map<List<UUID>, List<Row>>> entry : rowMap.entrySet()) {
>       batch(entry.getKey(), entry.getValue().values());
>     }
>   }



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18032) Hbase MOB

2017-05-12 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16008262#comment-16008262
 ] 

Jean-Marc Spaggiari commented on HBASE-18032:
-

It went through this time. We can drop the conversation here.

> Hbase MOB
> -
>
> Key: HBASE-18032
> URL: https://issues.apache.org/jira/browse/HBASE-18032
> Project: HBase
>  Issue Type: Task
>  Components: mob
>Affects Versions: hbase-11339
> Environment: debian
>Reporter: Fred T.
>
> Hi all,
> I spent a lot of time trying to use MOB in HBase (1.2.3). I read everywhere 
> that the HBASE-11339 patch can fix it, but I can't find help on how to install 
> this patch. Also, the official Apache web site refers to an HBASE 2.0.0 
> version, which I can't find anywhere. So please help me to be able to use MOB.
> Thanks for your help
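
For reference, on releases that ship MOB (2.0.0 and later), enabling it is a 
column-family attribute. A minimal sketch using the HBase 2.x Java admin API 
(the table/family names and the 100KB threshold are illustrative values):

[code]
// Sketch: enable MOB on an example table/family with the HBase 2.x client API.
TableDescriptor td = TableDescriptorBuilder.newBuilder(TableName.valueOf("t1"))
  .setColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("f1"))
    .setMobEnabled(true)        // store large values as MOB files
    .setMobThreshold(102400L)   // values above 100KB take the MOB path
    .build())
  .build();
admin.createTable(td);
[code]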



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

