Re: Hash indexing of HFiles

2011-07-19 Thread Claudio Martella
On Mon, Jul 18, 2011 at 12:32 PM, Stack st...@duboce.net wrote: On Mon, Jul 18, 2011 at 9:22 AM, Claudio Martella claudio.marte...@tis.bz.it wrote: Yes, I had a look at it a while ago. As far as I know, perfect hashing doesn't work that well for many elements. With millions of items it should

Re: Hash indexing of HFiles

2011-07-18 Thread Claudio Martella
On 7/16/11 10:08 PM, Stack wrote: On Fri, Jul 15, 2011 at 10:06 AM, Claudio Martella claudio.marte...@tis.bz.it wrote: On 7/15/11 6:24 PM, Stack wrote: How do you figure the N in the below, Claudio? N is the total number of pairs in the sequence file. You know that when you finish flushing

HBase and Hadoop 0.20-security-append

2011-07-18 Thread Claudio Martella
0.20-security-append to get both these features, allowing both systems to be deployed on the same cluster. I'm wondering how HBase behaves with 0.20-security-append. Can I run it on this Hadoop version? Can anybody quickly report on that? Thanks Claudio -- Claudio Martella Free Software Open

Re: HBase and Hadoop 0.20-security-append

2011-07-18 Thread Claudio Martella
On 7/18/11 5:50 PM, Stack wrote: On Mon, Jul 18, 2011 at 6:01 AM, Claudio Martella claudio.marte...@tis.bz.it wrote: I'm wondering how HBase behaves with 0.20-security-append. Can I run it on this Hadoop version? My guess is that it will work (where'd you find this branch?). Will Giraph

Re: Hash indexing of HFiles

2011-07-18 Thread Claudio Martella
On 7/18/11 6:05 PM, Stack wrote: On Mon, Jul 18, 2011 at 4:04 AM, Claudio Martella claudio.marte...@tis.bz.it wrote: No, you can have collisions, so the index is not perfect (which means you can have buckets for colliding keys and empty unused entries in the hashtable directory). Well
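The hash-directory scheme described in this thread (a directory sized from N, the number of pairs in the file, where colliding keys share a bucket and some slots stay empty) can be illustrated with a small in-memory Java sketch. It is not HBase or HFile code: `HashOffsetIndex` is a hypothetical class, and it assumes each key maps to the byte offset of its record in a file that is written once and never modified.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hash index from keys to byte offsets in an immutable data file. */
final class HashOffsetIndex {
    /** One bucket entry: a key and the offset of its record in the file. */
    private static final class Entry {
        final String key;
        final long offset;

        Entry(String key, long offset) {
            this.key = key;
            this.offset = offset;
        }
    }

    private final List<List<Entry>> directory;

    /** Sizes the directory from n, the number of key/value pairs in the file. */
    HashOffsetIndex(int n) {
        int slots = Math.max(1, n);
        directory = new ArrayList<>(slots);
        for (int i = 0; i < slots; i++) {
            directory.add(null); // slot stays empty until some key hashes to it
        }
    }

    private int slotFor(String key) {
        return Math.floorMod(key.hashCode(), directory.size());
    }

    /** Records where a key's pair starts; colliding keys share a bucket. */
    void put(String key, long offset) {
        int slot = slotFor(key);
        List<Entry> bucket = directory.get(slot);
        if (bucket == null) {
            bucket = new ArrayList<>();
            directory.set(slot, bucket);
        }
        bucket.add(new Entry(key, offset));
    }

    /** Returns the offset of the key's record, or -1 if the key is not in the file. */
    long lookup(String key) {
        List<Entry> bucket = directory.get(slotFor(key));
        if (bucket != null) {
            for (Entry e : bucket) {
                if (e.key.equals(key)) {
                    return e.offset;
                }
            }
        }
        return -1L;
    }

    /** Counts directory slots that never received a key. */
    int emptySlots() {
        int empty = 0;
        for (List<Entry> bucket : directory) {
            if (bucket == null) {
                empty++;
            }
        }
        return empty;
    }
}
```

A lookup costs one hash plus a scan of a usually short bucket, which is the random-read advantage argued for here; the trade-off is that the directory is unordered, so it gives no help with range scans.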

Re: HBase and Hadoop 0.20-security-append

2011-07-18 Thread Claudio Martella
things set up, let us know and we'll try to help out. Gary On Mon, Jul 18, 2011 at 9:13 AM, Claudio Martella claudio.marte...@tis.bz.it wrote: On 7/18/11 5:50 PM, Stack wrote: On Mon, Jul 18, 2011 at 6:01 AM, Claudio Martella claudio.marte...@tis.bz.it wrote: I'm wondering how HBase

Hash indexing of HFiles

2011-07-15 Thread Claudio Martella
anybody know if anybody has developed other indexing techniques for sequence files other than Btrees? Thanks!

Re: data structure

2011-07-15 Thread Claudio Martella
no chance to put a date into the key andre

Re: data structure

2011-07-15 Thread Claudio Martella
From: Claudio Martella claudio.marte...@tis.bz.it Sent: Fri Jul 15 2011 14:40:38 GMT+0200 (CET) Subject: Re: data structure Supposing you want per-hour granularity, you could use a key like userID_YYMMDDHH, where HH is the hour of the day (0-23), DD the day of the month, MM the month
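A minimal Java sketch of the per-hour key layout suggested above. `HourlyRowKey` is a hypothetical helper, and it assumes the user ID never contains the `_` separator. In `java.time.format.DateTimeFormatter` the pattern letters are case-sensitive: `yy` and `dd` are year and day of month, whereas `YY` is week-based year and `DD` is day of year, so the pattern is written `yyMMddHH` rather than literally `YYMMDDHH`.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper building row keys of the form userID_YYMMDDHH. */
final class HourlyRowKey {
    // yy = two-digit year, MM = month, dd = day of month, HH = hour of day (00-23)
    private static final DateTimeFormatter HOUR_BUCKET = DateTimeFormatter.ofPattern("yyMMddHH");

    private HourlyRowKey() {}

    /** Assumes userId does not itself contain the '_' separator. */
    static String rowKey(String userId, LocalDateTime time) {
        return userId + "_" + HOUR_BUCKET.format(time);
    }
}
```

Because every field is zero-padded, one user's keys sort chronologically, so all hours of 15 July 2011 for user42 fall between user42_11071500 (inclusive) and user42_11071600 (exclusive), which maps directly onto a scan's start and stop rows.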

Re: Hash indexing of HFiles

2011-07-15 Thread Claudio Martella
really require range queries, so I thought I'd take advantage of the even faster random I/O that hash-indexing the data in each sequence file would give. Does anybody know if anybody has developed other indexing techniques for sequence files other than Btrees? Thanks!

Re: Hash indexing of HFiles

2011-07-15 Thread Claudio Martella
Bautin of an hfile v2). I'd be interested in that, do you have a reference to it? St.Ack On Fri, Jul 15, 2011 at 7:58 AM, Claudio Martella claudio.marte...@tis.bz.it wrote: Hi Michal, what I was talking about is more of a vector-of-offsets kind of approach instead of the Btree created

Re: client-side caching

2011-07-05 Thread Claudio Martella
it through memcache. On 7/4/11 7:03 PM, Ted Yu wrote: See HBASE-4018 On Mon, Jul 4, 2011 at 7:33 AM, Claudio Martella claudio.marte...@tis.bz.it wrote: Hello list, I'm using HBase 0.90.3 on a 5-node cluster. I'm using a table as a string-long map. As I'm using this map a lot, I was thinking about

Re: client-side caching

2011-07-05 Thread Claudio Martella
until you need to worry about invalidation. It's hard to build efficient and correct invalidation. On Jul 5, 2011 2:13 AM, Claudio Martella claudio.marte...@tis.bz.it wrote: I've seen that. But that's about caching on the regionserver side through memcache. You still have the network round trip
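The trade-off discussed here (skipping the network round trip versus keeping cached values correct) can be sketched as a bounded LRU cache in front of the string-long table. This is a hedged illustration, not an HBase feature: `ClientSideCache` is hypothetical, and the `loader` function stands in for whatever Get the client would otherwise issue. `invalidate` only helps if this client learns about an update; writes made by other clients can still leave stale entries, which is the hard part raised in the thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical bounded client-side cache for a string-to-long lookup table. */
final class ClientSideCache {
    private final Function<String, Long> loader; // stands in for a remote Get; may return null
    private final Map<String, Long> lru;

    ClientSideCache(int capacity, Function<String, Long> loader) {
        this.loader = loader;
        // accessOrder = true keeps entries ordered by last access, giving LRU eviction
        this.lru = new LinkedHashMap<String, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Serves from memory when possible; pays the round trip only on a miss. */
    synchronized Long get(String key) {
        Long cached = lru.get(key);
        if (cached != null) {
            return cached;
        }
        Long loaded = loader.apply(key);
        if (loaded != null) {
            lru.put(key, loaded);
        }
        return loaded;
    }

    /** Drops a key this client knows to be stale; cannot see other clients' writes. */
    synchronized void invalidate(String key) {
        lru.remove(key);
    }
}
```

For a dictionary whose entries never change once written, the invalidation problem largely disappears and only the capacity bound matters; for mutable values a time-to-live would be the usual next step.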

client-side caching

2011-07-04 Thread Claudio Martella
? some client-side caching already in hbase? Best, Claudio

Re: on the impact of incremental counters

2011-06-20 Thread Claudio Martella
(or other data) may become imprecise. For some use cases that is fine. /wizard - Andy --- On Sat, 6/18/11, Claudio Martella claudio.marte...@tis.bz.it wrote: From: Claudio Martella claudio.marte...@tis.bz.it Subject: on the impact of incremental counters To: user@hbase.apache.org

on the impact of incremental counters

2011-06-18 Thread Claudio Martella
filters). Thanks! Claudio

Re: is there an atomic checkAndPut in hbase?

2011-01-31 Thread Claudio Martella

Re: I give up, help please

2010-12-21 Thread Claudio Martella
) at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:170) at org.apache.hadoop.hbase.master.HMaster.checkRootDir(HMaster.java:304) at org.apache.hadoop.hbase.master.HMaster.init(HMaster.java:221) ... 15 more

the semantics of HTable.put()

2010-12-18 Thread Claudio Martella
Hello list, just two lines for a proposal. Wouldn't it make more sense if put returned the old value when the put turns out to be an update rather than an insert? This would mimic HashMap's behavior and would be very useful. What do you think?
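For reference, the `java.util.Map` contract the proposal alludes to: `put` returns the previous value for the key, or `null` if there was none. In HBase, `HTable.put(Put)` returns `void`, so a client that needs the old value has to read it with a separate Get, which is not atomic with the write. `PutSemantics` is a hypothetical helper that only demonstrates the `HashMap` side.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper showing what java.util.Map.put reports about the previous mapping. */
final class PutSemantics {
    private PutSemantics() {}

    /** Returns "insert" if the key was new, otherwise "update" plus the replaced value. */
    static String putAndDescribe(Map<String, Long> map, String key, long value) {
        Long previous = map.put(key, value); // null when no mapping existed (or it mapped to null)
        return previous == null ? "insert" : "update, old value " + previous;
    }

    public static void main(String[] args) {
        Map<String, Long> table = new HashMap<>();
        System.out.println(putAndDescribe(table, "row1", 1L)); // insert
        System.out.println(putAndDescribe(table, "row1", 2L)); // update, old value 1
    }
}
```

Note the ambiguity `HashMap` itself has: a `null` return cannot distinguish "no previous mapping" from "previous value was null", which a store-side equivalent would also have to settle.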

GraphDB over HBase or Columnstore in general

2010-12-08 Thread Claudio Martella
on the same regionserver. 3) I guess that for the scanning I'd make extensive use of Filters. I guess regexp Filter will be my friend. Do you have concerns about performance of filters applied to this data model? Thank you very much Claudio

Re: incremental counters and a global String-Long Dictionary

2010-12-02 Thread Claudio Martella
Hi Todd, you're right, there's no need to be purists in this case. Thanks On 12/1/10 9:24 AM, Todd Lipcon wrote: On Tue, Nov 30, 2010 at 6:02 AM, Claudio Martella claudio.marte...@tis.bz.it wrote: Lars, yes, that's exactly the problem, I also considered checkAndPut() but that wouldn't

Re: incremental counters and a global String-Long Dictionary

2010-12-02 Thread Claudio Martella
Would that help? -ryan On Tue, Nov 30, 2010 at 6:07 AM, Claudio Martella claudio.marte...@tis.bz.it wrote: Hi Dave, thanks for your idea. I also considered this possibility. Although the probability of a collision is very small, what scares me is the fact that I don't think

Re: incremental counters and a global String-Long Dictionary

2010-12-02 Thread Claudio Martella
[], org.apache.hadoop.hbase.client.Put) St.Ack On Thu, Dec 2, 2010 at 7:42 AM, Claudio Martella claudio.marte...@tis.bz.it wrote: Hi Ryan, yes that would help for sure. Shouldn't this feature be documented? Thanks On 12/1/10 4:03 AM, Ryan Rawson wrote: CheckAndPut interprets a 'null' value
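The dictionary pattern discussed in this thread pairs an incremented counter with a "write only if absent" step; in HBase, passing `null` as the expected value to `checkAndPut` makes the put conditional on the cell not existing yet. The sketch below is an in-memory Java analog of that pattern, not HBase client code: `StringLongDictionary` is hypothetical, an `AtomicLong` stands in for the counter cell, and `ConcurrentHashMap.putIfAbsent` stands in for the conditional put.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-memory analog of a counter-backed global String-to-Long dictionary. */
final class StringLongDictionary {
    private final AtomicLong counter = new AtomicLong();                      // plays the counter cell
    private final ConcurrentMap<String, Long> ids = new ConcurrentHashMap<>(); // plays the dictionary table

    /** Returns the term's ID, assigning a new one if the term has not been seen. */
    long idFor(String term) {
        Long existing = ids.get(term);
        if (existing != null) {
            return existing;
        }
        long candidate = counter.incrementAndGet();
        // Conditional write: succeeds only if no ID has been stored for the term yet
        Long winner = ids.putIfAbsent(term, candidate);
        return winner == null ? candidate : winner; // a losing racer's candidate is discarded
    }
}
```

If two writers race on a new term, both draw a number but only one conditional write wins, so IDs stay unique per term but the sequence may contain gaps.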

Re: incremental counters and a global String-Long Dictionary

2010-11-30 Thread Claudio Martella
This should be a good deal simpler than trying to keep around an order-dependent integer mapping for your dictionary. And, it is somewhat recoverable if you ever lose your dictionary for some reason. Dave -----Original Message----- From: Claudio Martella [mailto:claudio.marte
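The alternative suggested by Dave, deriving the ID from a hash of the string itself, needs no counter and can be recomputed if the dictionary is lost. A hedged Java sketch follows; MD5 truncated to its first 8 bytes is an assumed choice, not something stated in the thread, and `HashedTermId` is a hypothetical name. With 64-bit IDs, the birthday approximation puts the chance of any collision among n distinct strings at roughly n^2 / 2^65, which quantifies the collision risk raised in this thread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical order-independent ID: the first 8 bytes of the term's MD5 digest. */
final class HashedTermId {
    private HashedTermId() {}

    static long idFor(String term) {
        try {
            // A new MessageDigest per call keeps this thread-safe without locking
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(term.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest, 0, Long.BYTES).getLong(); // big-endian, 8 of 16 bytes
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 must be available on every Java platform", e);
        }
    }
}
```

The same string always yields the same ID on every client, so no coordination is needed; what is given up is the compact, dense numbering a counter provides, plus a small but nonzero collision probability.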

incremental counters and a global String-Long Dictionary

2010-11-29 Thread Claudio Martella
would also be a possibility, but again, what about speed and the actual issues with the package (like recovering in the face of hregion failure). Thank you, Claudio

Re: incremental counters and a global String-Long Dictionary

2010-11-29 Thread Claudio Martella
it and then releasing the lock. Or some such. Lars On Nov 29, 2010, at 16:12, Claudio Martella claudio.marte...@tis.bz.it wrote: Hello list, I'm kind of new to HBase, so I'll post this email with a request for comment. Very briefly, I do a lot of text processing with mapreduce, so it's very