[jira] [Created] (HBASE-7129) Need documentation for REST atomic operations (HBASE-4720)

2012-11-08 Thread Joe Pallas (JIRA)
Joe Pallas created HBASE-7129:
-

 Summary: Need documentation for REST atomic operations (HBASE-4720)
 Key: HBASE-7129
 URL: https://issues.apache.org/jira/browse/HBASE-7129
 Project: HBase
 Issue Type: Bug
 Components: REST
 Reporter: Joe Pallas
 Priority: Minor


HBASE-4720 added checkAndPut/checkAndDelete capability to the REST interface, 
but the REST documentation (in the package summary) needs to be updated so 
people know that this feature exists and how to use it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-4335) Splits can create temporary holes in .META. that confuse clients and regionservers

2011-09-24 Thread Joe Pallas (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114111#comment-13114111
 ] 

Joe Pallas commented on HBASE-4335:
---

That looks like the sort of solution I envisioned.  I am not sure that changing 
the order of starting/joining the threads actually helps; I think it may just 
be a distraction for the reader.

> Splits can create temporary holes in .META. that confuse clients and 
> regionservers
> --
>
> Key: HBASE-4335
> URL: https://issues.apache.org/jira/browse/HBASE-4335
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.90.4
> Reporter: Joe Pallas
> Assignee: Lars Hofhansl
> Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 4335.txt
>
>
> When a SplitTransaction is performed, three updates are done to .META.:
> 1. The parent region is marked as splitting (and hence offline)
> 2. The first daughter region is added (same start key as parent)
> 3. The second daughter region is added (split key is start key)
> (later, the original parent region is deleted, but that's not important to 
> this discussion)
> Steps 2 and 3 are actually done concurrently by 
> SplitTransaction.DaughterOpener threads.  While the master is notified when a 
> split is complete, the only visibility that clients have is whether the 
> daughter regions have appeared in .META.
> If the second daughter is added to .META. first, then .META. will contain the 
> (offline) parent region followed by the second daughter region.  If the 
> client looks up a key that is greater than or equal to the split key, the 
> client will find the second daughter region and use it.  If the key is less 
> than the split key, the client will find the parent region and see that it is 
> offline, triggering a retry.
> If the first daughter is added to .META. before the second daughter, there is 
> a window during which .META. has a hole: the first daughter effectively hides 
> the parent region (same start key), but there is no entry for the second 
> daughter.  A region lookup will find the first daughter for all keys in the 
> parent's range, but the first daughter does not include keys at or beyond the 
> split key.
> See HBASE-4333 and HBASE-4334 for details on how this causes problems and 
> suggestions for mitigating this in the client and regionserver.





[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid

2011-09-12 Thread Joe Pallas (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102929#comment-13102929
 ] 

Joe Pallas commented on HBASE-2600:
---

How would a table name have 0x00 in it?  HTableDescriptor says it will throw 
IllegalArgumentException "if passed a table name that is made of other than 
'word' characters or underscores: i.e. [a-zA-Z_0-9]."
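
A minimal sketch of the rule as that Javadoc sentence words it, in Python for brevity. The pattern is copied from the quoted text; the helper name is hypothetical, and the actual check in HTableDescriptor may differ in detail.

```python
import re

# Character class taken from the Javadoc wording quoted above.
LEGAL_NAME = re.compile(r"[a-zA-Z_0-9]+")

def is_legal_table_name(name: bytes) -> bool:
    """Hypothetical helper: True if every byte is a 'word' character."""
    # latin-1 maps each byte to one character, so decoding never fails.
    return LEGAL_NAME.fullmatch(name.decode("latin-1")) is not None

assert is_legal_table_name(b"my_table1")
assert not is_legal_table_name(b"my\x00table")  # 0x00 is rejected
```

Under that rule a 0x00 byte can never appear in a table name, which is the point of the question.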


> Change how we do meta tables; from tablename+STARTROW+randomid to instead, 
> tablename+ENDROW+randomid
> 
>
> Key: HBASE-2600
> URL: https://issues.apache.org/jira/browse/HBASE-2600
> Project: HBase
> Issue Type: Improvement
> Reporter: stack
>
> This is an idea that Ryan and I have been kicking around on and off for a 
> while now.
> If regionnames were made of tablename+endrow instead of tablename+startrow, 
> then in the metatables, doing a search for the region that contains the 
> wanted row, we'd just have to open a scanner using passed row and the first 
> row found by the scan would be that of the region we need (If offlined 
> parent, we'd have to scan to the next row).
> If we redid the meta tables in this format, we'd be using an access that is 
> natural to hbase, a scan as opposed to the perverse, expensive 
> getClosestRowBefore we currently have that has to walk backward in meta 
> finding a containing region.
> This issue is about changing the way we name regions.
> If we were using scans, prewarming client cache would be near costless (as 
> opposed to what we'll currently have to do which is first a 
> getClosestRowBefore and then a scan from the closestrowbefore forward).
> Converting to the new method, we'd have to run a migration on startup 
> changing the content in meta.
> Until now, the randomid component of a region name has been the timestamp of 
> region creation.  HBASE-2531 "32-bit encoding of regionnames waaay too 
> susceptible to hash clashes" proposes changing the randomid so that it 
> contains the actual name of the directory in the filesystem that hosts the 
> region.  If we had this in place, I think it would help with the migration to 
> this new way of doing the meta because, as is, the region name in the 
> filesystem is a hash of the regionname; changing the format of the regionname 
> would mean we generate a different hash.  So we'd need HBASE-2531 to be in 
> place before we could make this change.
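
The lookup difference in the quoted proposal can be illustrated with a toy model in Python. The region boundaries, the "\uffff" sentinel for "no end row", and both function names are assumptions made for this sketch; real region names also carry the table name and an id.

```python
from bisect import bisect_right

# Three regions of one table: ["", "g"), ["g", "p"), ["p", no end).
starts = ["", "g", "p"]        # meta keyed by start row
ends = ["g", "p", "\uffff"]    # meta keyed by end row; "\uffff" means "no end"

def locate_by_start(row):
    """Start-row keys: find the closest entry at or before `row`."""
    return bisect_right(starts, row) - 1

def locate_by_end(row):
    """End-row keys: the first entry past `row` is the containing region."""
    return bisect_right(ends, row)

# Both schemes agree on which region holds each row.
for row in ["a", "g", "k", "p", "zz"]:
    assert locate_by_start(row) == locate_by_end(row)
```

With start-row keys the answer lies behind the probe position, which is what getClosestRowBefore has to walk back to; with end-row keys it is the first entry a forward scan would return.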





[jira] [Created] (HBASE-4334) HRegion.get never validates row

2011-09-06 Thread Joe Pallas (JIRA)
HRegion.get never validates row
---

 Key: HBASE-4334
 URL: https://issues.apache.org/jira/browse/HBASE-4334
 Project: HBase
 Issue Type: Bug
 Components: regionserver
 Affects Versions: 0.90.4
 Reporter: Joe Pallas


If a client gets confused (possibly by a hole in .META., see HBASE-4333), it 
may send a request to the wrong region.  Paths through put, delete, 
incrementColumnValue, and checkAndMutate all call checkRow either directly or 
indirectly (through getLock).  But get apparently does not.  This can result in 
returning an incorrect empty result instead of a WrongRegionException.
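
A toy model of the difference, in Python. ToyRegion, WrongRegionError, and get_unchecked are hypothetical names for this sketch and are not HRegion's actual code; only the idea of a checkRow-style range test comes from the description above.

```python
class WrongRegionError(Exception):
    """Stand-in for HBase's WrongRegionException."""

class ToyRegion:
    def __init__(self, start, end):
        self.start, self.end = start, end   # covers [start, end)
        self.rows = {}

    def check_row(self, row):
        if not (self.start <= row < self.end):
            raise WrongRegionError(f"row {row!r} is outside this region")

    def put(self, row, value):
        self.check_row(row)                 # writes validate the row
        self.rows[row] = value

    def get_unchecked(self, row):
        # Reported behaviour: a misrouted get looks like an ordinary miss.
        return self.rows.get(row, {})

    def get(self, row):
        self.check_row(row)                 # proposed: validate reads too
        return self.rows.get(row, {})

region = ToyRegion("a", "m")
assert region.get_unchecked("q") == {}      # silently empty
```

In this sketch, calling region.get("q") raises WrongRegionError instead, which gives the client a signal to refresh its region location.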





[jira] [Created] (HBASE-4333) Client does not check for holes in .META.

2011-09-06 Thread Joe Pallas (JIRA)
Client does not check for holes in .META.
-

 Key: HBASE-4333
 URL: https://issues.apache.org/jira/browse/HBASE-4333
 Project: HBase
 Issue Type: Bug
 Components: client
 Affects Versions: 0.90.4
 Reporter: Joe Pallas


If there is a temporary hole in .META., the client may get the wrong region 
from HConnection.locateRegion.  
HConnectionManager.HConnectionImplementation.locateRegionInMeta should check 
the end key of the region found with getClosestRowBefore, just as it checks the 
offline status, when it looks at the region info.
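
A sketch of the suggested check, in Python. RegionInfo and usable_for are hypothetical names, and treating an empty end key as "no upper bound" is a convention assumed for this example; this is not the code in locateRegionInMeta.

```python
from collections import namedtuple

# Hypothetical region record; field names are illustrative only.
RegionInfo = namedtuple("RegionInfo", "start end offline")

def usable_for(region, row):
    """Accept a located region only if it is online and covers `row`."""
    if region.offline:
        return False                 # existing check: offline means retry
    if row < region.start:
        return False
    # Suggested extra check: the row must fall before the end key.
    return region.end == "" or row < region.end

first_daughter = RegionInfo("a", "m", False)
assert usable_for(first_daughter, "c")
assert not usable_for(first_daughter, "q")   # beyond the end key: retry
```

A client using such a test would handle a hole the same way it already handles an offline region, by retrying the lookup.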





[jira] [Created] (HBASE-4335) Splits can create temporary holes in .META. that confuse clients and regionservers

2011-09-06 Thread Joe Pallas (JIRA)
Splits can create temporary holes in .META. that confuse clients and 
regionservers
--

 Key: HBASE-4335
 URL: https://issues.apache.org/jira/browse/HBASE-4335
 Project: HBase
 Issue Type: Bug
 Components: regionserver
 Affects Versions: 0.90.4
 Reporter: Joe Pallas


When a SplitTransaction is performed, three updates are done to .META.:
1. The parent region is marked as splitting (and hence offline)
2. The first daughter region is added (same start key as parent)
3. The second daughter region is added (split key is start key)
(later, the original parent region is deleted, but that's not important to this 
discussion)

Steps 2 and 3 are actually done concurrently by SplitTransaction.DaughterOpener 
threads.  While the master is notified when a split is complete, the only 
visibility that clients have is whether the daughter regions have appeared in 
.META.

If the second daughter is added to .META. first, then .META. will contain the 
(offline) parent region followed by the second daughter region.  If the client 
looks up a key that is greater than or equal to the split key, the client will 
find the second daughter region and use it.  If the key is less than the split 
key, the client will find the parent region and see that it is offline, 
triggering a retry.

If the first daughter is added to .META. before the second daughter, there is a 
window during which .META. has a hole: the first daughter effectively hides the 
parent region (same start key), but there is no entry for the second daughter.  
A region lookup will find the first daughter for all keys in the parent's 
range, but the first daughter does not include keys at or beyond the split key.

See HBASE-4333 and HBASE-4334 for details on how this causes problems and 
suggestions for mitigating this in the client and regionserver.
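
The two orderings can be reproduced with a toy model of .META. in Python. The Region tuple, the keys "a", "m", "z", and the region ids are all assumptions for illustration; rows sort by start key and then by id, so a daughter sorts after a parent with the same start key.

```python
from bisect import bisect_right
from collections import namedtuple

# Hypothetical stand-in for a .META. row; not HBase's actual schema.
Region = namedtuple("Region", "start region_id end offline")

def lookup(meta, key):
    """Return the last row whose start key is <= key (closest row before)."""
    rows = sorted(meta)                      # by start key, then region_id
    idx = bisect_right([r.start for r in rows], key) - 1
    if idx < 0:
        raise KeyError(key)
    return rows[idx]

def contains(region, key):
    return region.start <= key < region.end

parent = Region("a", 1, "z", True)     # marked offline by the split
first = Region("a", 2, "m", False)     # same start key as the parent
second = Region("m", 3, "z", False)    # starts at the split key "m"

# Second daughter added first: every lookup is safe.
safe = [parent, second]
assert lookup(safe, "q") == second     # usable daughter
assert lookup(safe, "c").offline       # offline parent, so the client retries

# First daughter added first: the hole.
hole = [parent, first]
found = lookup(hole, "q")
assert found == first and not contains(found, "q")
```

In the second case the lookup returns an online region that does not contain the key, so nothing tells the client to retry.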






[jira] [Commented] (HBASE-3484) Replace memstore's ConcurrentSkipListMap with our own implementation

2011-06-10 Thread Joe Pallas (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047342#comment-13047342
 ] 

Joe Pallas commented on HBASE-3484:
---

I think the performance issue I mentioned above may actually be HBASE-3855.

> Replace memstore's ConcurrentSkipListMap with our own implementation
> 
>
> Key: HBASE-3484
> URL: https://issues.apache.org/jira/browse/HBASE-3484
> Project: HBase
> Issue Type: Improvement
> Components: performance
> Affects Versions: 0.92.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Critical
> Fix For: 0.92.0
>
>
> By copy-pasting ConcurrentSkipListMap into HBase we can make two improvements 
> to it for our use case in MemStore:
> - add an iterator.replace() method which should allow us to do upsert much 
> more cheaply
> - implement a Set directly, without having to wrap a Map, to save one 
> reference per entry
> It turns out CSLM is in public domain from its development as part of JSR 
> 166, so we should be OK with licenses.



[jira] [Commented] (HBASE-3484) Replace memstore's ConcurrentSkipListMap with our own implementation

2011-04-26 Thread Joe Pallas (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025463#comment-13025463
 ] 

Joe Pallas commented on HBASE-3484:
---

This issue was cited by jdcryans as related to unfortunate performance seen in 
the following case:

A test program fills a single row of a family with tens of thousands of 
sequentially increasing qualifiers.  Then it performs random gets (or exists) 
of those qualifiers.  The response time seen is (on average) proportional to 
the ordinal position of the qualifier.  If the table is flushed before the 
random tests begin, then the average response time is basically constant, 
independent of the qualifier's ordinal position.

I'm not sure that either of the two points in the description actually covers 
this case, but I don't know enough to say.


> Replace memstore's ConcurrentSkipListMap with our own implementation
> 
>
> Key: HBASE-3484
> URL: https://issues.apache.org/jira/browse/HBASE-3484
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 0.92.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: 0.92.0
>
>
> By copy-pasting ConcurrentSkipListMap into HBase we can make two improvements 
> to it for our use case in MemStore:
> - add an iterator.replace() method which should allow us to do upsert much 
> more cheaply
> - implement a Set directly, without having to wrap a Map, to save one 
> reference per entry
> It turns out CSLM is in public domain from its development as part of JSR 
> 166, so we should be OK with licenses.



[jira] [Created] (HBASE-3730) DEFAULT_VERSIONS should be 1

2011-04-04 Thread Joe Pallas (JIRA)
DEFAULT_VERSIONS should be 1


 Key: HBASE-3730
 URL: https://issues.apache.org/jira/browse/HBASE-3730
 Project: HBase
 Issue Type: Improvement
 Reporter: Joe Pallas
 Priority: Minor


The current DEFAULT_VERSIONS (in HColumnDescriptor) is 3, but there is no 
particular reason for this.  Many uses require only 1, and a default that 
differs from that confuses people (e.g., "Do I need multiple versions to 
support deletes properly?").

Reasonable values for the default are 1 and max int.  1 is the better choice.

Discussion on the mailing list suggests that the current value of 3 may have 
been derived from an example in the Bigtable paper.  The example does not 
suggest that there is anything special about 3; it is just an illustration.



[jira] [Commented] (HBASE-1837) Fix results contract (If row has no results, return null, if Result has no results return null or empty Sets and Arrays?)

2011-03-23 Thread Joe Pallas (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010235#comment-13010235
 ] 

Joe Pallas commented on HBASE-1837:
---

Current behavior is both confusing and not documented.

"Effective Java" recommends returning empty collections instead of null, in 
general.  I understand the argument about distinguishing between empty and 
missing from HBASE-1028, but with a Bigtable-style dynamic schema, there are 
cases where there simply is no difference.  In those cases, an empty collection 
seems a much better choice for clients.
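
A small illustration of why the empty-collection contract is easier on clients, in Python. Both function names and the row layout are hypothetical; this does not model the actual Result class.

```python
def columns_or_none(row_data, family):
    """Null-style contract: callers must test for None before iterating."""
    return row_data.get(family)

def columns_or_empty(row_data, family):
    """Empty-collection contract: a missing family yields an empty dict."""
    return row_data.get(family, {})

row = {"info": {"name": "x"}}

# With the None contract, every caller needs a guard.
cols = columns_or_none(row, "stats")
count_guarded = len(cols) if cols is not None else 0

# With the empty contract, the same code handles present and missing data.
count_plain = len(columns_or_empty(row, "stats"))
assert count_guarded == count_plain == 0
```

When "empty" and "missing" mean the same thing to the caller, the guard in the first variant is pure overhead.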

> Fix results contract (If row has no results, return null, if Result has no 
> results return null or empty Sets and Arrays?)
> -
>
> Key: HBASE-1837
> URL: https://issues.apache.org/jira/browse/HBASE-1837
> Project: HBase
> Issue Type: Task
> Reporter: stack
> Fix For: 0.92.0
>
>
> Make sure we are consistent regarding the results contract.  As jgray says:
> {code}
> 17:47 < jgray> decisions are things like, if the result is empty do we return nulls or do we return empty lists/0-length arrays
> 17:47 < jgray> if result is empty, do we return null for row?
> 17:47 < jgray> and if row is the null row, we then return zero-length byte[0]
> 17:48 < St^Ack_> So, if row is empty, we return null (I believe)
> 17:48 < jgray> yes
> 17:49 < St^Ack_> If you have a result, up to this, if empty, it would not return null stuff.
> 17:49 < jgray> no it did return null stuff
> 17:49 < jgray> at least many of them did
> 17:49 < St^Ack_> oh.. ok.
> 17:49 < jgray> but then my result delayed deserialization broke that on one case
> 17:49 < St^Ack_> I thought I'd added it w/ 1836?
> 17:49 < jgray> yeah u fixed what i broke, i think
> 17:50 < jgray> but we should nail down the contract, specify what it is in javadoc, and add unit tests to verify such
> ...
> {code}



[jira] Commented: (HBASE-3494) checkAndPut implementation doesnt verify row param and writable row are the same

2011-02-02 Thread Joe Pallas (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989772#comment-12989772
 ] 

Joe Pallas commented on HBASE-3494:
---

It would be nice if the API Javadoc mentioned this restriction.

> checkAndPut implementation doesnt verify row param and writable row are the 
> same
> 
>
> Key: HBASE-3494
> URL: https://issues.apache.org/jira/browse/HBASE-3494
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.0
> Reporter: ryan rawson
> Assignee: ryan rawson
> Fix For: 0.90.1
>
> Attachments: HBASE-3494.txt
>
>
> The checkAndPut API, and checkAndMutate on the server side, do not enforce 
> that the row in the API call and the row in the passed writable (the 
> operation executed if the check passes) are the same row!  Looking at the 
> code, if someone were to 'fool' us, we'd probably end up with rows in the 
> wrong region in the worst case.  Or we'd end up with non-locked puts/deletes 
> to different rows, since checkAndMutate grabs the row lock and then calls 
> put/delete methods that do not grab row locks.
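
The restriction can be sketched with an in-memory toy in Python. ToyTable and its method signature are hypothetical, and raising ValueError on a mismatch is just one possible way to enforce the rule; this is not how HBase implements it.

```python
class ToyTable:
    """In-memory stand-in for one region; not HBase's implementation."""

    def __init__(self):
        self.cells = {}   # (row, column) -> value

    def check_and_put(self, row, column, expected, put_row, put_column, value):
        # Restriction discussed in this issue: the checked row and the
        # row being written must be the same row.
        if put_row != row:
            raise ValueError("row of the Put must match the checked row")
        if self.cells.get((row, column)) != expected:
            return False                      # check failed, nothing written
        self.cells[(put_row, put_column)] = value
        return True

table = ToyTable()
assert table.check_and_put("r1", "c", None, "r1", "c", "v1")
assert not table.check_and_put("r1", "c", "stale", "r1", "c", "v2")
```

Because the only lock held covers the checked row, a write to any other row would go through unprotected, which is why the sketch rejects it up front.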
