Re: Scan problem

2018-03-21 Thread Yang Zhang
Thank you all; your answers helped me a lot.



Re: Scan problem

2018-03-19 Thread Saad Mufti
Another option, if you have enough disk space / off-heap memory, is to enable
the bucket cache to cache even more of your data, and set the
PREFETCH_ON_OPEN => true option on the column families you want always
cached. That way HBase will prefetch your data into the bucket cache and
your scan won't have that initial slowdown. Or, if you want to do it
globally for all column families, set the configuration flag
"hbase.rs.prefetchblocksonopen" to "true". Keep in mind though that if you
do this, you should have enough bucket cache space for all your data;
otherwise there will be a lot of useless eviction activity at HBase
startup and even later.
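
A minimal sketch of enabling this per column family through the Java Admin API
(HBase 2.x-style names; the table "mytable" and family "d" are assumptions, and
the shell-level attribute is PREFETCH_BLOCKS_ON_OPEN):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class EnablePrefetch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder
          .newBuilder(Bytes.toBytes("d"))    // hypothetical family name
          .setPrefetchBlocksOnOpen(true)     // prefetch HFile blocks into the cache on region open
          .build();
      admin.modifyColumnFamily(TableName.valueOf("mytable"), cf);
    }
  }
}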

Also, where a region is located is heavily impacted by which region balancer
you have chosen and how you have tuned it, in terms of how often it runs and
other parameters. A split region will initially stay at least on the same
region server, but your balancer, if and when it runs, can move it (and indeed
any region) elsewhere to satisfy its criteria.

Cheers.


Saad




Re: Scan problem

2018-03-18 Thread ramkrishna vasudevan
Hi

First, regarding the scans:

Generally the data resides in the store files, which are in HDFS. So probably
the first scan that you are doing is reading from HDFS, which involves disk
reads. Once the blocks are read, they are cached in the block cache of
HBase. So your further reads go through that, and hence you see a further
speed-up in the scans.
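
A quick way to see this effect from the client side is to time two identical
scans back to back; a hedged sketch (the table name "mytable" is an assumption):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class WarmCacheScan {
  static long timeScan(Table table) throws Exception {
    Scan scan = new Scan();
    scan.setCacheBlocks(true);   // let the blocks this scan reads populate the block cache
    long start = System.currentTimeMillis();
    try (ResultScanner rs = table.getScanner(scan)) {
      for (Result r : rs) { /* drain the scanner */ }
    }
    return System.currentTimeMillis() - start;
  }

  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("mytable"))) {
      System.out.println("cold scan: " + timeScan(table) + " ms");   // mostly HDFS/disk reads
      System.out.println("warm scan: " + timeScan(table) + " ms");   // mostly block cache hits
    }
  }
}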

>> And another question about region split, I want to know which RegionServer
>> will load the new region after it is split. Will it be the same one as the
>> old region?
Yes. Generally the same region server hosts it.

In master, the code is here:
https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/SplitTableRegionProcedure.java

You may need to understand the entire flow to know how the regions are
opened after a split.

Regards
Ram



Scan problem

2018-03-17 Thread Yang Zhang
Hello everyone

I am trying to do many scans using RegionScanner in a coprocessor, and every
time the first scan costs about 10 times more than the others.
I don't know why this happens:

OneBucket Scan cost is : 8794 ms Num is : 710
OneBucket Scan cost is : 91 ms Num is : 776
OneBucket Scan cost is : 87 ms Num is : 808
OneBucket Scan cost is : 105 ms Num is : 748
OneBucket Scan cost is : 68 ms Num is : 200
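
For reference, a hedged sketch of the pattern being described: opening a
RegionScanner inside a coprocessor and timing one bucket (HBase 2.x-style API;
the class, method, and row-key range are assumptions):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.RegionScanner;

public class BucketScanTimer {
  // Scans one "bucket" (a row-key range) with a RegionScanner and reports the cost.
  static int scanBucket(RegionCoprocessorEnvironment env, byte[] start, byte[] stop)
      throws IOException {
    Scan scan = new Scan().withStartRow(start).withStopRow(stop);
    long begin = System.currentTimeMillis();
    int num = 0;
    RegionScanner scanner = env.getRegion().getScanner(scan);
    try {
      List<Cell> cells = new ArrayList<>();
      boolean more;
      do {
        more = scanner.next(cells);   // fills `cells` with the next row's cells
        if (!cells.isEmpty()) num++;
        cells.clear();
      } while (more);
    } finally {
      scanner.close();
    }
    System.out.println("OneBucket Scan cost is : "
        + (System.currentTimeMillis() - begin) + " ms Num is : " + num);
    return num;
  }
}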


And another question about region split: I want to know which RegionServer
will load the new region after it is split.
Will it be the same one as the old region? Does anyone know where I can
find the code to learn about that?


Thanks for your help


M/R scan problem

2011-07-04 Thread Lior Schachter
Hi all,
I'm running a scan using the M/R framework.
My table contains hundreds of millions of rows, and I'm scanning about 50
million rows using a start/stop key.

The problem is that some map tasks get stuck, and the task manager kills
these maps after 600 seconds. When retrying the task, everything works fine
(sometimes).

To verify that the problem is in HBase (and not in the map code), I removed
all the code from my map function, so it looks like this:

public void map(ImmutableBytesWritable key, Result value, Context context)
    throws IOException, InterruptedException {
  // intentionally empty: rules out the map logic as the cause of the stalls
}

Also, when the map got stuck on a region, I tried to scan this region (using
a simple scan from a Java main) and it worked fine.

Any ideas?

Thanks,
Lior


Re: M/R scan problem

2011-07-04 Thread Ted Yu
Do you use TableInputFormat?
To scan a large number of rows, it would be better to produce one split per
region.

What HBase version do you use?
Do you find any exceptions in the master / region server logs around the
moment of the timeout?

Cheers




Re: M/R scan problem

2011-07-04 Thread Lior Schachter
1. Yes - I configure my job using this line, which internally uses
TableInputFormat.class (see the sketch after this list):

TableMapReduceUtil.initTableMapperJob(HBaseConsts.URLS_TABLE_NAME, scan,
ScanMapper.class, Text.class, MapWritable.class, job)

2. One split per region? What do you mean? How do I do that?

3. HBase version 0.90.2.

4. No exceptions; the logs are very clean.
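
A hedged sketch of the full job setup being described, using the poster's
ScanMapper and HBaseConsts names (0.90-era API; the scan bounds and caching
values are illustrative assumptions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class ScanJobSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "urls-scan");          // 0.90-era constructor
    job.setJarByClass(ScanMapper.class);

    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes("startKey"));   // hypothetical start key
    scan.setStopRow(Bytes.toBytes("stopKey"));     // hypothetical stop key
    scan.setCaching(500);                          // rows fetched per RPC
    scan.setCacheBlocks(false);                    // don't churn the block cache from MR scans

    TableMapReduceUtil.initTableMapperJob(
        HBaseConsts.URLS_TABLE_NAME, scan,
        ScanMapper.class, Text.class, MapWritable.class, job);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}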






Re: M/R scan problem

2011-07-04 Thread Ted Yu
For #2, see TableInputFormatBase.getSplits():
   * Calculates the splits that will serve as input for the map tasks. The
   * number of splits matches the number of regions in a table.





Re: M/R scan problem

2011-07-04 Thread Lior Schachter
1. Currently every map gets one region, so I don't understand what difference
using the splits will make.
2. How should I use TableInputFormatBase.getSplits()? I could not find
examples for that.

Thanks,
Lior





Re: M/R scan problem

2011-07-04 Thread Ted Yu
I wasn't clear in my previous email.
It was not an answer to why the map tasks got stuck;
TableInputFormatBase.getSplits() is being called already.

Can you try getting a jstack of one of the map tasks before the task tracker
kills it?

Thanks

 



Re: M/R scan problem

2011-07-04 Thread Lior Schachter
I used kill -3; the thread dump follows:

Full thread dump Java HotSpot(TM) 64-Bit Server VM (19.1-b02 mixed mode):

"IPC Client (47) connection to /127.0.0.1:59759 from hadoop" daemon
prio=10 tid=0x2aaab05ca800 nid=0x4eaf in Object.wait()
[0x403c1000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0xf9dba860> (a
org.apache.hadoop.ipc.Client$Connection)
at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:403)
- locked <0xf9dba860> (a
org.apache.hadoop.ipc.Client$Connection)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:445)

"SpillThread" daemon prio=10 tid=0x2aaab0585000 nid=0x4c99 waiting
on condition [0x404c2000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0xf9af0c38> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1169)

"main-EventThread" daemon prio=10 tid=0x2aaab035d000 nid=0x4c95
waiting on condition [0x41207000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0xf9af5f58> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
at
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)

"main-SendThread(hadoop09.infolinks.local:2181)" daemon prio=10
tid=0x2aaab035c000 nid=0x4c94 runnable [0x40815000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked <0xf9af61a8> (a sun.nio.ch.Util$2)
- locked <0xf9af61b8> (a java.util.Collections$UnmodifiableSet)
- locked <0xf9af6160> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107)

"communication thread" daemon prio=10 tid=0x4d02
nid=0x4c93 waiting on condition [0x42497000]
   java.lang.Thread.State: RUNNABLE
at java.util.Hashtable.put(Hashtable.java:420)
- locked <0xf9dbaa58> (a java.util.Hashtable)
at org.apache.hadoop.ipc.Client$Connection.addCall(Client.java:225)
- locked <0xf9dba860> (a
org.apache.hadoop.ipc.Client$Connection)
at org.apache.hadoop.ipc.Client$Connection.access$1600(Client.java:176)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:854)
at org.apache.hadoop.ipc.Client.call(Client.java:720)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at org.apache.hadoop.mapred.$Proxy0.ping(Unknown Source)
at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:548)
at java.lang.Thread.run(Thread.java:662)

"Thread for syncLogs" daemon prio=10 tid=0x2aaab02e9800 nid=0x4c90
runnable [0x40714000]
   java.lang.Thread.State: RUNNABLE
at java.util.Arrays.copyOf(Arrays.java:2882)
at
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
at java.lang.StringBuilder.append(StringBuilder.java:119)
at java.io.UnixFileSystem.resolve(UnixFileSystem.java:93)
at java.io.File.<init>(File.java:312)
at org.apache.hadoop.mapred.TaskLog.getTaskLogFile(TaskLog.java:72)
at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:180)
at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:230)
- locked <0xeea92fc0> (a java.lang.Class for
org.apache.hadoop.mapred.TaskLog)
at org.apache.hadoop.mapred.Child$2.run(Child.java:89)

"Low Memory Detector" daemon prio=10 tid=0x2aaab0001800 nid=0x4c86
runnable [0x]
   java.lang.Thread.State: RUNNABLE

"CompilerThread1" daemon prio=10 tid=0x4cb4e800 nid=0x4c85
waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x4cb4b000 nid=0x4c84
waiting on condition 

Re: M/R scan problem

2011-07-04 Thread Ted Yu
In the future, please provide the full dump using pastebin.com and write a
snippet of the log in the email.

Can you tell us what the following lines are about?
HBaseURLsDaysAggregator.java:124
HBaseURLsDaysAggregator.java:131

How many mappers were launched?

What value is used for hbase.zookeeper.property.maxClientCnxns?
You may need to increase the value of the above setting.
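
For reference, this limit lives in hbase-site.xml when HBase manages ZooKeeper
(hbase.zookeeper.property.* keys are passed through to the ZooKeeper server);
a hedged snippet with an illustrative value:

<property>
  <name>hbase.zookeeper.property.maxClientCnxns</name>
  <value>1000</value>
</property>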

 



Re: M/R scan problem

2011-07-04 Thread Lior Schachter
1. HBaseURLsDaysAggregator.java:124 and HBaseURLsDaysAggregator.java:131 are
not important, since even when I removed all my map code the tasks got stuck
(but the thread dumps were generated after I restored the code). If you think
it's important, I'll remove the map code again and re-generate the thread
dumps...

2. 82 maps were launched, but only 36 ran simultaneously.

3. hbase.zookeeper.property.maxClientCnxns = 300. Should I increase it?

Thanks,
Lior





Re: M/R scan problem

2011-07-04 Thread Ted Yu
The reason I asked about HBaseURLsDaysAggregator.java was that I see no
HBase (client) code in the call stack.
I have little clue about the problem you experienced.

There may be more than one connection to ZooKeeper from one map task,
so it doesn't hurt if you increase hbase.zookeeper.property.maxClientCnxns.

Cheers



Re: M/R scan problem

2011-07-04 Thread Ted Yu
From the master UI, click 'zk dump'.
:60010/zk.jsp would show you the active connections. See if the count
reaches 300 when the map tasks run.





Re: M/R scan problem

2011-07-04 Thread Lior Schachter
I will increase the number of connections to 1000.

Thanks!

Lior







Re: M/R scan problem

2011-07-04 Thread Ted Yu
Although connection count may not be the root cause, please read
http://zhihongyu.blogspot.com/2011/04/managing-connections-in-hbase-090-and.html
if you have time.
0.92.0 would do a much better job of managing connections.


Re: M/R scan problem

2011-07-04 Thread Michel Segel
Did a quick trim...

Sorry to jump in on the tail end of this...
Two things you may want to look at...

Are you timing out because you haven't updated your status within the task, or
are you taking 600 seconds to complete a single map() iteration?

You can test this by tracking how long you spend in each map iteration and
printing out the result if it is longer than 2 mins...

Also try updating your status in each iteration by sending a unique status
update, like the current system time...
...
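
A hedged sketch of both suggestions inside map() (the 2-minute threshold and
status text are illustrative; context.setStatus()/context.progress() are the
standard mapreduce progress hooks):

public void map(ImmutableBytesWritable key, Result value, Context context)
    throws IOException, InterruptedException {
  long start = System.currentTimeMillis();
  // ... the real per-row work goes here ...
  long elapsed = System.currentTimeMillis() - start;
  if (elapsed > 2 * 60 * 1000) {
    System.err.println("slow map iteration: " + elapsed + " ms");   // longer than 2 mins
  }
  context.setStatus("alive at " + System.currentTimeMillis());      // unique status update
  context.progress();   // tell the task tracker the task is still making progress
}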


Sent from a remote device. Please excuse any typos...

Mike Segel



Re: HBase filtered scan problem

2011-05-23 Thread Iulia Zidaru

Thank you very much, St.Ack.
It sounds like we have to create another filter.
Iulia

On 05/12/2011 08:07 PM, Stack wrote:

On Thu, May 12, 2011 at 6:42 AM, Iulia Zidaru iulia.zid...@1and1.ro wrote:

  Hi,

Thank you for your answer, St.Ack.
Yes, both coordinates are the same. It is impossible for the filter to
decide that a value is old. I still don't understand why the HBase server
has both values or how long it keeps both.

Well, it's hard to 'overwrite' if one value is in the memstore and the
other is out on the filesystem.

It'll do the clean-up on major compaction.

The filter should be able to pick up ordering hints from its context;
it's just not doing it.


The same thing happens if
puts have different timestamps.


With the filter, you mean? I'd think the filter should distinguish these.
St.Ack
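
A hedged sketch of the clean-up Stack mentions, forcing a major compaction so
superseded cell versions are physically removed (0.90-era HBaseAdmin API; the
table name is hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ForceMajorCompaction {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.majorCompact("mytable");   // hypothetical table name; the request is asynchronous
  }
}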




Re: HBase filtered scan problem

2011-05-11 Thread Iulia Zidaru

 Hi,
I'll try to rephrase the problem...
We have a table where we add an empty value. (The same thing happens also
if we have a value.)
Afterward we put a value inside (same put, just another value). When
scanning for empty values (the values inserted first), the result is wrong
because the filter gets called for both values (the empty one, which matches,
and the non-empty one, which doesn't match). The table has only one version.
It looks like the heap object in StoreScanner holds both objects. Do you
have any idea whether this is normal behavior and whether we can avoid it somehow?


Thank you,
Iulia




--
Iulia Zidaru
Java Developer

1&1 Internet AG - Bucharest/Romania - Web Components Romania
18 Mircea Eliade St
Sect 1, Bucharest
RO Bucharest, 012015
iulia.zid...@1and1.ro
0040 31 223 9153

 



HBase filtered scan problem

2011-05-10 Thread Stefan Comanita
Hi all,

I want to do a scan on a number of rows, each row having multiple columns, and
I want to filter out some of these columns based on their values. For example,
if I have the following rows:

plainRow:col:value1   column=T:19, timestamp=19, value=
plainRow:col:value1   column=T:2,  timestamp=2,  value=U
plainRow:col:value1   column=T:3,  timestamp=3,  value=U
plainRow:col:value1   column=T:4,  timestamp=4,  value=

and

secondRow:col:value1  column=T:1,  timestamp=1,  value=
secondRow:col:value1  column=T:2,  timestamp=2,  value=
secondRow:col:value1  column=T:3,  timestamp=3,  value=U
secondRow:col:value1  column=T:4,  timestamp=4,  value=

and I want to select all the rows, but just with the columns that don't have
the value "U", something like:

plainRow:col:value1   column=T:19, timestamp=19, value=
plainRow:col:value1   column=T:4,  timestamp=4,  value=
secondRow:col:value1  column=T:1,  timestamp=1,  value=
secondRow:col:value1  column=T:2,  timestamp=2,  value=
secondRow:col:value1  column=T:4,  timestamp=4,  value=

and to achieve this, I try the following:

Scan scan = new Scan();
scan.setStartRow(stringToBytes(rowIdentifier));
scan.setStopRow(stringToBytes(rowIdentifier + Constants.MAX_CHAR));
scan.addFamily(Constants.TERM_VECT_COLUMN_FAMILY);

if (includeFilter) {
    Filter filter = new ValueFilter(CompareOp.EQUAL,
        new BinaryComparator(stringToBytes("U")));
    scan.setFilter(filter);
}
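
For reference, a hedged sketch of executing this scan and printing what comes
back; `table` is an assumed HTable handle for this table (0.90-era API):

ResultScanner scanner = table.getScanner(scan);
try {
    for (Result result : scanner) {
        for (KeyValue kv : result.raw()) {   // one KeyValue per column that survived the filter
            System.out.println(kv + " value=" + Bytes.toString(kv.getValue()));
        }
    }
} finally {
    scanner.close();
}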

and if I execute this scan I get the rows with the columns having the value
"U", which is correct. But when I set CompareOp.NOT_EQUAL and I expect to get
the other columns, it doesn't work the way I want: it gives me back all the
rows, including the ones which have the value "U". The same happens when I use:

Filter filter = new ValueFilter(CompareOp.EQUAL, new
    BinaryComparator(stringToBytes("")));

I mention that the columns have the values "U" and "" (empty string), and that
I also saw the same behavior with the RegexComparator and SubstringComparator.

Any idea would be very much appreciated, sorry for the long mail, thank you.

Stefan Comanita