Re: Merging smaller/empty tablets [SEC=UNOFFICIAL]

2017-01-16 Thread Yamini Joshi
Just a thought, will forcing a major compaction take care of this? Merging smaller tablets and deleting empty ones? Best regards, Yamini Joshi On Mon, Jan 16, 2017 at 4:31 PM, Dickson, Matt MR < matt.dick...@defence.gov.au> wrote: > *UNOFFICIAL* > I have a table that has evolved t

Re: Accumulo Working

2016-11-22 Thread Yamini Joshi
, after any record is passed through the > complete Iterator "stack" (never only some of the iterators), may choose to > flush the queue entries back to the client. > > Yamini Joshi wrote: > >> I see. So, for a scan operation that spans 2 tservers: the client knows >>

Re: Accumulo Working

2016-11-22 Thread Yamini Joshi
-> it2 -> it3 -> client (if the max limit is reached) or is it always at the end of the pipeline? Best regards, Yamini Joshi On Tue, Nov 22, 2016 at 12:36 PM, Josh Elser <josh.el...@gmail.com> wrote: > Scanners are sequentially communicating with TabletServers, as opposed to >

Re: Accumulo Working

2016-11-22 Thread Yamini Joshi
-> it2 on tserver -> it3 on tserver -> client The processing is done in batches? Data is returned to the client when it reaches the max limit for table.scan.max.memory even if it is in the middle of the pipeline above? Best regards, Yamini Joshi On Tue, Nov 22, 2016 at 11:56 AM, Christ

Accumulo Working

2016-11-22 Thread Yamini Joshi
have some doubts: 1. Where is the data from tserver1 and tserver2 merged? 2. when and how are custom iterators applied? Also, if there is any resource explaining this, please point me to it. I've found some slides but no detailed explanation. Best regards, Yamini Joshi

Re: Clear Cache

2016-11-19 Thread Yamini Joshi
I figured the same but forgot to factor in the HDFS part. Best regards, Yamini Joshi On Sat, Nov 19, 2016 at 6:13 PM, Josh Elser <josh.el...@gmail.com> wrote: > Hi Yamini, > > I'd just add one word of caution about knowing exactly what you're > trying to measure. For example,

Clear Cache

2016-11-19 Thread Yamini Joshi
Hello all, I am trying to track the performance of my queries on Accumulo. I need to clear the cache before every query in order to get clean time values. Could anyone please tell me how this could be achieved? Best regards, Yamini Joshi

HDFS Replication of data

2016-11-10 Thread Yamini Joshi
Hello all, Does HDFS replication improve the performance of queries on Accumulo, or is it transparent to the Accumulo system? If it does improve performance through some notion of load balancing, is there a Read Only or Write Only copy of data on HDFS for Accumulo? Best regards, Yamini Joshi

Write data to a file from inside of an iterator

2016-11-05 Thread Yamini Joshi
way to go about it? Best regards, Yamini Joshi

CountingIterator in pyaccumulo

2016-10-25 Thread Yamini Joshi
regards, Yamini Joshi

MultiIterator Class

2016-10-21 Thread Yamini Joshi
from batch_scan before passing the data to other iterators? Best regards, Yamini Joshi

Re: Iterator as a Filter

2016-10-21 Thread Yamini Joshi
Thank you for the reply! I'll try this and get back to you. Also, I found a MultiIterator Class. Any ideas on how it works? Will it work with batch scan and sort data before passing it to other iterators? Best regards, Yamini Joshi On Fri, Oct 21, 2016 at 6:35 AM, <dlmar...@comcast.net>

Re: Iterator as a Filter

2016-10-20 Thread Yamini Joshi
in which belong the list cardinality(Y intersection C) Best regards, Yamini Joshi On Thu, Oct 20, 2016 at 7:16 PM, Dave <dlmar...@comcast.net> wrote: > I'm a little confused to the use case here. Are you trying to find courses > that students are taking where the students are in a part

Re: Iterator as a Filter

2016-10-20 Thread Yamini Joshi
ored as rows, or otherwise moving the columns into the > rows. > > Regards, Dylan > > On Thu, Oct 20, 2016 at 3:45 PM, Yamini Joshi <yamini.1...@gmail.com> > wrote: > >> Hello all >> >> Is it possible to configure an iterator that works as a filter? As per >

Re: Net ColumnFamily Count

2016-10-20 Thread Yamini Joshi
I will take a look at it. Thanks Josh :) Best regards, Yamini Joshi On Thu, Oct 20, 2016 at 5:30 PM, Josh Elser <josh.el...@gmail.com> wrote: > You can do a partial summation in an Iterator, but managing memory > pressure (like you originally pointed out) would require

Iterator as a Filter

2016-10-20 Thread Yamini Joshi
in iterator and go to the range in the list of cf and check if it exists. I am not sure if this will work or if it is a good approach. Any feedback is much appreciated. Best regards, Yamini Joshi

Re: Net ColumnFamily Count

2016-10-20 Thread Yamini Joshi
of parameters. I am back to square one. But I guess if there is no other option, I will try to benchmark and keep you guys in the loop :) Best regards, Yamini Joshi On Thu, Oct 20, 2016 at 4:22 PM, Josh Elser <josh.el...@gmail.com> wrote: > I would like to inject some hesitation here. This i

Re: Net ColumnFamily Count

2016-10-20 Thread Yamini Joshi
Alright! Do you happen to have some reference code that I can refer to? I am a newbie and I am not sure whether by caching, aggregating and merge sort you mean using some Accumulo wrapper or writing simple Java code. Best regards, Yamini Joshi On Thu, Oct 20, 2016 at 2:49 PM, ivan bella <

Net ColumnFamily Count

2016-10-20 Thread Yamini Joshi
might need to generate new keys with columnfamily name as the key and count as the value. Best regards, Yamini Joshi

Re: Accumulo Equivalent of Mongo Aggr Query

2016-10-20 Thread Yamini Joshi
not return records in a sorted manner hence step 4 does not give me the required results :\ I am not sure how to proceed now. Best regards, Yamini Joshi On Mon, Sep 26, 2016 at 8:28 AM, Josh Elser <josh.el...@gmail.com> wrote: > I think I can understand what your query is doing, but, I'm just

Count RowIDs with a common Prefix

2016-10-17 Thread Yamini Joshi
Hello all, My keys are of the form rowID:otherID, where there are multiple otherIDs for each rowID. I want to know the count of all the otherIDs within a rowID. What would be the most efficient way to implement this? Best regards, Yamini Joshi
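
The replies to this question are not shown in this listing. As a rough illustration of the counting itself, the sketch below groups keys of the form rowID:otherID by their rowID prefix on the client side. It uses only the Java standard library; PrefixCounter and countByRow are hypothetical names, not Accumulo API, and it assumes a rowID never contains ':' and each key carries a distinct otherID.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Accumulo API.
class PrefixCounter {
    // Counts how many keys share each rowID, for keys shaped like "rowID:otherID".
    // Assumption: the rowID is everything before the first ':'.
    static Map<String, Integer> countByRow(List<String> keys) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String key : keys) {
            int sep = key.indexOf(':');
            // A key without a separator is treated as a bare rowID.
            String rowId = sep < 0 ? key : key.substring(0, sep);
            counts.merge(rowId, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("r1:a", "r1:b", "r2:a");
        System.out.println(countByRow(keys));
    }
}
```

On a real table the same grouping would more likely run server side, for example in an iterator, so that only the counts travel to the client; that part is not sketched here.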

Re: Data Replication

2016-10-16 Thread Yamini Joshi
In other words, what helps in load balancing? HDFS replication or Data center replication? Best regards, Yamini Joshi On Sat, Oct 15, 2016 at 10:44 PM, Yamini Joshi <yamini.1...@gmail.com> wrote: > So HDFS is for durability while replication is for availability? I'm > assuming tha

Re: Data Replication

2016-10-15 Thread Yamini Joshi
So HDFS is for durability while replication is for availability? I'm assuming that the client is unaware of the replicated instance and queries the DB with no knowledge of which instance/table will return the result. Best regards, Yamini Joshi On Thu, Oct 13, 2016 at 11:46 AM, Josh Elser

Re: Data Replication

2016-10-13 Thread Yamini Joshi
So, can I say that if I have a table split across nodes (i.e. num tablets > 1) and HDFS replication in my system, it is sort of equivalent to a sharded and replicated mongo architecture? Best regards, Yamini Joshi On Thu, Oct 13, 2016 at 11:06 AM, Josh Elser <josh.el...@gmail.com>

Data Replication

2016-10-13 Thread Yamini Joshi
this replication conf and the replication on HDFS level. What exactly is the use case for replication? Are the replicated instances visible to the clients? Best regards, Yamini Joshi

Re: Bulk import

2016-10-11 Thread Yamini Joshi
Alright. I'll keep that in mind. The next step for me will be to import data from 90G Bson files. I think that'll be a good start for bulk import. Best regards, Yamini Joshi On Tue, Oct 11, 2016 at 10:14 PM, Josh Elser <josh.el...@gmail.com> wrote: > Even 10G is a rather small amoun

Bulk import

2016-10-11 Thread Yamini Joshi
, Yamini Joshi

Re: Installing Accumulo in multinode setup

2016-10-10 Thread Yamini Joshi
Thanks everyone for the help. It is working now. I had to edit some memory confs and do a clean install. Also, the /tracers znode is created after Accumulo is started (i.e. start-all.sh) and not at init. Best regards, Yamini Joshi On Fri, Oct 7, 2016 at 12:12 PM, Josh Elser <josh.el...@gmail.

Re: Indexing Column Values in Accumulo

2016-10-10 Thread Yamini Joshi
, Yamini Joshi On Mon, Oct 10, 2016 at 5:09 AM, vaibhav thapliyal < vaibhav.thapliyal...@gmail.com> wrote: > Creating an Inverted Index could serve your use case. You can store the > column family and column qualifier both in the row of the index table > separated by a delimiter. >
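
The suggestion quoted above (column family and column qualifier stored together in the index row, separated by a delimiter) can be sketched roughly as follows. This is a standard-library-only illustration: IndexRowBuilder, its method names, and the choice of a null character as delimiter are assumptions, not Accumulo API. The rest of the reply is truncated in this listing, so what the other fields of the index entry hold is not shown.

```java
// Hypothetical helper for illustration only; not part of the Accumulo API.
class IndexRowBuilder {
    // Assumed delimiter: a null character, picked because it is unlikely
    // to occur inside a column family or qualifier.
    static final String DELIMITER = "\u0000";

    // Builds the index-table row from a column family and qualifier.
    static String toIndexRow(String columnFamily, String columnQualifier) {
        return columnFamily + DELIMITER + columnQualifier;
    }

    // Splits an index row back into { columnFamily, columnQualifier }.
    static String[] fromIndexRow(String indexRow) {
        int sep = indexRow.indexOf(DELIMITER);
        if (sep < 0) {
            throw new IllegalArgumentException("index row has no delimiter");
        }
        return new String[] {
            indexRow.substring(0, sep),
            indexRow.substring(sep + DELIMITER.length())
        };
    }

    public static void main(String[] args) {
        String row = toIndexRow("course", "math101");
        String[] parts = fromIndexRow(row);
        System.out.println(parts[0] + " / " + parts[1]);
    }
}
```

Because rows sort lexicographically, all index entries for one column family end up adjacent, so a lookup by family and qualifier becomes a range scan over a single prefix instead of a filter over the whole table.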

Re: Installing Accumulo in multinode setup

2016-10-07 Thread Yamini Joshi
I can see that in my local setup on my laptop. But, I can't see it here somehow. Idk what exactly is wrong. Best regards, Yamini Joshi On Fri, Oct 7, 2016 at 11:00 AM, Josh Elser <josh.el...@gmail.com> wrote: > It should be generated at /tracers when the Accumulo Tracer i

Re: Installing Accumulo in multinode setup

2016-10-07 Thread Yamini Joshi
I don't understand why the tracer node is not generated at all. Best regards, Yamini Joshi On Fri, Oct 7, 2016 at 10:19 AM, Yamini Joshi <yamini.1...@gmail.com> wrote: > So the file structure inside zookeeper (now after formatting zookeepers) is: > Accumulo > >- d61d7

Re: Installing Accumulo in multinode setup

2016-10-07 Thread Yamini Joshi
- fate - tservers - tables - replication - next_file - config - bulk_failed_copyq - dead - masters - instances - test test is the name of my new instance. Yes I reinitialized accumulo using /bin/accumulo init Best regards, Yamini Joshi On Fri

Re: Installing Accumulo in multinode setup

2016-10-07 Thread Yamini Joshi
) at org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:80) at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:151) ... 6 more Best regards, Yamini Joshi On Fri, Oct 7, 2016 at 10:08 AM, Sean Busbey <bus...@cloudera.com> wrote: > tracers used to be under the in

Re: Installing Accumulo in multinode setup

2016-10-06 Thread Yamini Joshi
1.7.2 Best regards, Yamini Joshi On Thu, Oct 6, 2016 at 4:17 PM, Josh Elser <josh.el...@gmail.com> wrote: > Hrm, maybe I am looking at a newer version of Accumulo than what you're > using. What version are you on? > > Yamini Joshi wrote: > >> Thank you for rep

Installing Accumulo in multinode setup

2016-10-06 Thread Yamini Joshi
this up? I am attaching my config files here (Rest all the same generated as a result of bin_config file). Best regards, Yamini Joshi accumulo-env.sh Description: Bourne shell script instance.volumes hdfs://m4:9000/accumulo comma separated list of URIs for volumes. example

Re: Modify Keys within iterator

2016-09-30 Thread Yamini Joshi
Alright. Thanks :) Best regards, Yamini Joshi On Fri, Sep 30, 2016 at 1:10 PM, Brian Loss <bfl...@praxiseng.com> wrote: > That’s true for the row. For the other parts of the key, it can be done > under the right circumstances. > > On Sep 30, 2016, at 2:05 PM, Yamini Joshi <

Re: Modify Keys within iterator

2016-09-30 Thread Yamini Joshi
If I give it an empty range, it gives me the output of a simple scan (without the iterator applied, even though the iterator is working). I guess it's bad to modify keys within an iterator. Best regards, Yamini Joshi On Fri, Sep 30, 2016 at 12:51 PM, Dan Blum <db...@bbn.com> wrote: > Wha

Re: Modify Keys within iterator

2016-09-30 Thread Yamini Joshi
regards, Yamini Joshi On Fri, Sep 30, 2016 at 12:31 PM, Dan Blum <db...@bbn.com> wrote: > What code are you using to test the iterator, where you see no output? > > > > *From:* Yamini Joshi [mailto:yamini.1...@gmail.com] > *Sent:* Friday, September 30, 2016 1:26 PM > *To

Modify Keys within iterator

2016-09-30 Thread Yamini Joshi
ange, columnFamilies, inclusive); next(); } @Override public Key getTopKey() { return key; } @Override public Value getTopValue() { return value; } @Override public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env) { return null; } } Best regards, Yamini Joshi

Indexing Column Values in Accumulo

2016-09-28 Thread Yamini Joshi
looking for an optimal solution (since filter might scan the entire database). Best regards, Yamini Joshi

Re: Accumulo Equivalent of Mongo Aggr Query

2016-09-26 Thread Yamini Joshi
o's built-in Combiner iterators > <https://accumulo.apache.org/1.8/accumulo_user_manual#_combiners>. They > seem more relevant than Filters. > > I don't know what you mean when you write that your output is not visible > to "the complete Database". > > Regards