AWS cluster script - abnormal behavior

2015-06-02 Thread roman.drap...@baesystems.com
Hi there, I was going to do some performance tests to evaluate the technology on AWS. Followed https://blogs.aws.amazon.com/bigdata/post/Tx15973X6QHUM43/Running-Apache-Accumulo-on-Amazon-EMR The exact command I am using: aws emr create-cluster --name Accumulo --no-auto-terminate --bootstrap-ac

micro compaction

2015-06-09 Thread roman.drap...@baesystems.com
Hi guys, While doing pre-analytics we generate hundreds of millions of mutations that result in 1-100 megabytes of useful data after major compaction. We ingest into Accumulo using MR from Mapper job. We found that performance degrades badly as the number of mutations increases. The o

RE: micro compaction

2015-06-09 Thread roman.drap...@baesystems.com
aring apples to oranges :) roman.drap...@baesystems.com wrote: > Hi guys, > > While doing pre-analytics we generate hundreds of millions of > mutations that result in 1-100 megabytes of useful data after major > compaction. We ingest into Accumulo using MR from Mapper job. We > i

RE: micro compaction

2015-06-09 Thread roman.drap...@baesystems.com
2015 at 9:08 AM roman.drap...@baesystems.com wrote: Aggregated output is tiny, so if I do same calculations in memory (instead of sending mutations to Accumulo), I can reduce overall number of mutations by 1000x or

RE: micro compaction

2015-06-09 Thread roman.drap...@baesystems.com
to handle the in-memory aggregation before giving the data to the BatchWriter. Why would any part of Accumulo code be responsible for this kind of application-specific data handling? On Tue, Jun 9, 2015 at 3:17 PM, roman.drap...@baesystems.com
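A minimal sketch of the in-memory aggregation discussed in this thread, assuming the per-key values can simply be summed before they are written. The `PreAggregator` class, its `write_batch` callback and the `max_keys` threshold are hypothetical names for illustration; none of this is Accumulo API, and in a real job the callback would turn each aggregated entry into a mutation for the BatchWriter.

```python
from collections import defaultdict


class PreAggregator:
    """Sums values per (row, column) key in memory and flushes them in batches.

    Hypothetical helper: write_batch stands in for whatever code turns the
    aggregated entries into mutations and hands them to the writer.
    """

    def __init__(self, write_batch, max_keys=100_000):
        self._write_batch = write_batch
        self._max_keys = max_keys
        self._sums = defaultdict(int)

    def add(self, row, column, value):
        # Combine with any value already buffered for this key.
        self._sums[(row, column)] += value
        # Bound memory use by flushing once too many distinct keys are held.
        if len(self._sums) >= self._max_keys:
            self.flush()

    def flush(self):
        # Emit one aggregated entry per key, then start a fresh buffer.
        if self._sums:
            self._write_batch(dict(self._sums))
            self._sums.clear()
```

With this shape, a thousand `add("r1", "count", 1)` calls followed by `flush()` produce a single entry for `("r1", "count")` rather than a thousand separate writes. Note that `flush()` must also be called once at the end of the task so the last buffer is not lost.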

RE: micro compaction

2015-06-09 Thread roman.drap...@baesystems.com
compaction On Tue, Jun 9, 2015 at 4:06 PM, roman.drap...@baesystems.com wrote: My view is that introduction of ingest-time iterators would be quite a useful feature. Anyway. ☺ Also, could anyone exactly explain w

RE: micro compaction

2015-06-09 Thread roman.drap...@baesystems.com
Thanks a lot, will give a try! From: Keith Turner [mailto:ke...@deenlo.com] Sent: 09 June 2015 22:28 To: user@accumulo.apache.org Subject: Re: micro compaction On Tue, Jun 9, 2015 at 5:10 PM, roman.drap...@baesystems.com

visibility expression & column compression

2015-08-24 Thread roman.drap...@baesystems.com
Hi there, My question is how Accumulo compression works in regards to visibility labels. Is there any difference between "VeryLargeLargeLarge & AlsoLargeLargeLarge" and "A&B" expressions? Will it be internally compiled to a low data consuming structure? Same question applies to column and qual

RowID design and Hive push down

2015-09-14 Thread roman.drap...@baesystems.com
Hi there, Our current rowid format is MMdd_payload_sha256(raw data). It works nicely as we have a date and uniqueness guaranteed by hash, however unfortunately, rowid is around 50-60 bytes per record. Requirements are the following: 1) Support Hive on top of Accumulo for ad-hoc querie

RE: RowID design and Hive push down

2015-09-14 Thread roman.drap...@baesystems.com
rk to work with. Referencing some of the HBaseStorageHandler code might also be worthwhile (as the two are very similar). - Josh roman.drap...@baesystems.com wrote: > Hi there, > > Our current rowid format is MMdd_payload_sha256(raw data). It works > nicely as we have a date and uniquen

RE: RowID design and Hive push down

2015-09-14 Thread roman.drap...@baesystems.com
LazyObject type system is not my favorite framework to work with. Referencing some of the HBaseStorageHandler code might also be worthwhile (as the two are very similar). - Josh roman.drap...@baesystems.com wrote: > Hi there, > > Our current rowid format is MMdd_payload_sha256(raw d

RE: RowID design and Hive push down

2015-09-14 Thread roman.drap...@baesystems.com
hould bridge the gap https://github.com/apache/hive/blob/release-1.2.1/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/predicate/AccumuloRangeGenerator.java#L277 roman.drap...@baesystems.com wrote: > Hi Josh, > > Thanks for response. > > Well, I am not an expert in Accumulo (s

RE: RowID design and Hive push down

2015-09-14 Thread roman.drap...@baesystems.com
ing that repetitious prefix. You sure it wasn't the "payload_sha256" you had as a suffix that was problematic? Human readable data (that doesn't sacrifice performance terribly) is always more pleasant to work with. Just a thought. roman.drap...@baesystems.com wrote: > So

RE: RowID design and Hive push down

2015-09-14 Thread roman.drap...@baesystems.com
'd have to write some custom code to use the Lexicoders (an extension to the AccumuloRowSerializer). roman.drap...@baesystems.com wrote: > Yes, payload + sha256 adds 35 more bytes, so we want to use 4 bytes instead > of 32 for hash but we need second precision (instead of day). > >
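A rough sketch of the compact row ID described in the quoted message: a second-precision timestamp followed by 4 bytes of hash instead of the full 32. The exact layout is an assumption (4-byte big-endian Unix seconds, then the first 4 bytes of the SHA-256 digest), and the function name `compact_row_id` is hypothetical. It does not reproduce the byte format of Accumulo's Lexicoders.

```python
import hashlib
import struct


def compact_row_id(epoch_seconds: int, raw: bytes) -> bytes:
    """Build an 8-byte row ID: 4-byte timestamp + 4-byte truncated SHA-256.

    Assumed layout for illustration only. Unsigned big-endian encoding keeps
    byte-wise (lexicographic) order equal to chronological order.
    """
    timestamp = struct.pack(">I", epoch_seconds)  # valid for 0 <= t < 2**32
    short_hash = hashlib.sha256(raw).digest()[:4]  # truncated, so collisions are possible
    return timestamp + short_hash
```

Because the timestamp comes first and is big-endian, a range scan over a time window maps to a contiguous range of row IDs. The trade-off is that the truncated hash no longer guarantees uniqueness within a single second.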

RE: RowID design and Hive push down

2015-09-14 Thread roman.drap...@baesystems.com
erformance timestamp oracles for transactions in their Percolator paper [3]. Cheers, Adam [1] https://en.wikipedia.org/wiki/Birthday_problem [2] https://github.com/twitter/snowflake [3] http://research.google.com/pubs/pub36726.html On Mon, Sep 14, 2015 at 2:47 PM, roman.drap...@baesy
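The birthday problem cited in [1] can be worked through numerically. The sketch below assumes hash values are uniformly distributed and that collisions only matter among records sharing the same timestamp prefix; the function name `collision_probability` is hypothetical.

```python
import math


def collision_probability(n: int, space: int) -> float:
    """Probability that at least two of n uniform random values collide.

    space is the number of possible values, e.g. 2**32 for a 4-byte hash.
    Computes 1 - prod_{k=0}^{n-1} (1 - k/space) using logarithms so that
    very small probabilities are not lost to rounding.
    """
    if n > space:
        return 1.0  # pigeonhole: more values than slots
    log_no_collision = sum(math.log1p(-k / space) for k in range(n))
    return -math.expm1(log_no_collision)
```

For example, `collision_probability(10_000, 2**32)` comes out at roughly 1%, so a 4-byte hash is only comfortable if far fewer than ten thousand records share one second-precision timestamp.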

Accumulo and Kerberos

2016-01-26 Thread roman.drap...@baesystems.com
Hi there, Trying to set up Accumulo 1.7 on a Kerberized cluster. Only interested in master/tablets to be kerberized (not end-users). Configured everything as per manual: 1) Created principals 2) Generated glob keytab 3) Modified accumulo-site.xml providing general.kerberos.keytab

RE: Accumulo and Kerberos

2016-01-26 Thread roman.drap...@baesystems.com
collect the output specifying -Dsun.security.krb5.debug=true in accumulo-env.sh (per the instructions) and try enabling log4j DEBUG on org.apache.hadoop.security.UserGroupInformation. - Josh [1] https://issues.apache.org/jira/browse/ACCUMULO-4069 [2] http://accumulo.apache.org/1.7/accumulo_user_manu

RE: Accumulo and Kerberos

2016-01-26 Thread roman.drap...@baesystems.com
e fix from the JIRA case and rebuild Accumulo yourself, or build 1.7.1-SNAPSHOT from our codebase. I would recommend using 1.7.1-SNAPSHOT as it should be the least painful (1.7.1-SNAPSHOT now is likely to not change significantly from what is ultimately released as 1.7.1) roman.drap...@baesystem

RE: Accumulo and Kerberos

2016-01-26 Thread roman.drap...@baesystems.com
- From: roman.drap...@baesystems.com [mailto:roman.drap...@baesystems.com] Sent: 26 January 2016 19:43 To: user@accumulo.apache.org Subject: RE: Accumulo and Kerberos Hi Josh, Two quick questions. 1) What should I use instead of HDFS classloader? All examples seem to be from hdfs. 2) Whan

RE: Accumulo and Kerberos

2016-01-26 Thread roman.drap...@baesystems.com
y thoughts please? -Original Message- From: Josh Elser [mailto:josh.el...@gmail.com] Sent: 26 January 2016 20:08 To: user@accumulo.apache.org Subject: Re: Accumulo and Kerberos The normal classloader (on the local filesystem) which is configured out of the box. roman.drap...@baesystems

RE: Accumulo and Kerberos

2016-01-26 Thread roman.drap...@baesystems.com
tating that the Kerberos login happened (or didn't). The server should exit if it fails to log in (but I don't know if I've actively tested that). Do you see this message? Does it say you successfully logged in (and the principal you logged in as)? roman.drap...@baesystems.com wrote:

RE: Accumulo and Kerberos

2016-01-26 Thread roman.drap...@baesystems.com
you share logs? Try enabling -Dsun.security.krb5.debug=true in the appropriate environment variable (for the service you want to turn it on for) in accumulo-env.sh and then start the services again (hopefully, sharing that too if the problem isn't obvious). roman.drap...@baesystems.com wr

RE: Accumulo and Kerberos

2016-01-27 Thread roman.drap...@baesystems.com
(or "token"). If I had to venture a guess, it would be that you have Accumulo configured to use the wrong Hadoop configuration files, notably core-site.xml and hdfs-site.xml. Try the `accumulo classpath` command and verify that the Hadoop configuration files included th

RE: Accumulo and Kerberos

2016-01-27 Thread roman.drap...@baesystems.com
files into the Accumulo installation (making upgrades less error prone). Glad to hear you got it working. roman.drap...@baesystems.com wrote: > Hi Josh, > > Thanks a lot for your guess. Classpath did not help, however symlinks from > Hadoop conf directory to Accumulo conf directory worked

RE: Accumulo and Kerberos

2016-01-27 Thread roman.drap...@baesystems.com
classpaths ...elided... $HADOOP_CONF_DIR, ...elided... Classpaths that accumulo checks for updates and class files. This is all you should need to get the necessary Hadoop configuration files made available to the Accumulo services. roman.drap...@baesystems.com wrote