Re: Hector Problem Basic one

2011-10-12 Thread Wangpei (Peter)
I have only seen this error message when all Cassandra nodes are down.
How do you get the Cluster, and how do you set the hosts?

From: CASSANDRA learner [mailto:cassandralear...@gmail.com]
Sent: October 12, 2011 14:30
To: user@cassandra.apache.org
Subject: Re: Hector Problem Basic one

Thanks for the reply, Ben.

Actually, the problem is that I am not able to run a basic Hector example from
Eclipse. It throws me.prettyprint.hector.api.exceptions.HectorException: All
host pools marked down. Retry burden pushed out to client.

Can you please let me know why I am getting this?

On Tue, Oct 11, 2011 at 3:54 PM, Ben Ashton b...@bossastudios.com wrote:
Hey,

We had this one. Even though the Hector documentation says that it
retries failed servers every 30 seconds by default, it doesn't.

Once we explicitly set it to X seconds, whenever there is a failure,
e.g. with the network (AWS), it will retry and add the host back into the pool.

Ben
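The behavior Ben describes can be modeled as a minimal sketch. This is not
Hector's actual implementation (Hector exposes retry settings on its host
configurator); the class and its delay parameter here are made up to illustrate
the idea of marking a host down on failure and re-adding it after a delay:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a client-side host pool: failed hosts are marked
// down and retried after a fixed delay, as Ben describes. Not Hector's code.
class DownedHostRetryPool {
    private final long retryDelayMillis;
    private final List<String> upHosts = new ArrayList<>();
    private final Map<String, Long> downSince = new HashMap<>();

    DownedHostRetryPool(long retryDelayMillis, List<String> hosts) {
        this.retryDelayMillis = retryDelayMillis;
        upHosts.addAll(hosts);
    }

    void markDown(String host, long nowMillis) {
        if (upHosts.remove(host)) {
            downSince.put(host, nowMillis);
        }
    }

    // Called periodically: re-add any host whose retry delay has elapsed.
    void retryDownedHosts(long nowMillis) {
        downSince.entrySet().removeIf(e -> {
            if (nowMillis - e.getValue() >= retryDelayMillis) {
                upHosts.add(e.getKey());
                return true;
            }
            return false;
        });
    }

    List<String> availableHosts() {
        return new ArrayList<>(upHosts);
    }
}
```

If no retry delay is configured (or the retry task never runs), every failed
host stays out of the pool forever, which is exactly the "All host pools marked
down" state the original poster hit.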

On 11 October 2011 11:09, CASSANDRA learner cassandralear...@gmail.com wrote:
 Hi Everyone,

 Actually, I was using Cassandra a long time back, and when I tried today I am
 getting a problem from Eclipse. When I try to run a basic Hector (Java)
 example, I get an exception:
 me.prettyprint.hector.api.exceptions.HectorException: All host pools marked
 down. Retry burden pushed out to client. But my server is up, and nodetool
 also shows that it is up. I don't know what is happening.

 1.) Does this have anything to do with the JMX port?
 2.) What are the storage port in cassandra.yaml and the JMX port in
 cassandra-env.sh?






Re: Indexes on heterogeneous rows

2011-04-15 Thread Wangpei (Peter)
Does get_indexed_slices in 0.7.4 already do things that way?
It seems to always take the first indexed column with EQ.
Or is this a new feature of the coming 0.7.5 or 0.8?

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: April 15, 2011 0:21
To: user@cassandra.apache.org
Cc: David Boxenhorn; aaron morton
Subject: Re: Indexes on heterogeneous rows

This should work reasonably well w/ 0.7 indexes. Cassandra tracks
statistics on index selectivity, so it would plan that query as index
lookup on e=5, then iterate over those results and return only rows
that also have type=2.
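Jonathan's description of the query plan can be sketched as a toy model (this
is illustrative Java, not Cassandra's code: pick the EQ clause with the
smallest index bucket, scan those candidates, and post-filter the rest):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of planned index usage: look up the most selective EQ clause in
// its index, then post-filter the candidate rows against the other clauses.
class IndexPlanSketch {
    final Map<String, Map<String, String>> rows = new HashMap<>();          // key -> columns
    final Map<String, Map<String, Set<String>>> indexes = new HashMap<>();  // col -> value -> keys

    void insert(String key, Map<String, String> columns) {
        rows.put(key, columns);
        for (Map.Entry<String, String> c : columns.entrySet()) {
            indexes.computeIfAbsent(c.getKey(), k -> new HashMap<>())
                   .computeIfAbsent(c.getValue(), v -> new TreeSet<>())
                   .add(key);
        }
    }

    // Assumes at least one EQ clause, mirroring get_indexed_slices' requirement.
    List<String> query(Map<String, String> eqClauses) {
        // Plan: start from the clause whose index bucket is smallest.
        Set<String> candidates = null;
        for (Map.Entry<String, String> c : eqClauses.entrySet()) {
            Set<String> bucket = indexes
                    .getOrDefault(c.getKey(), Collections.emptyMap())
                    .getOrDefault(c.getValue(), Collections.emptySet());
            if (candidates == null || bucket.size() < candidates.size()) candidates = bucket;
        }
        // Post-filter: a candidate must satisfy every clause.
        List<String> result = new ArrayList<>();
        for (String key : candidates) {
            boolean ok = true;
            for (Map.Entry<String, String> c : eqClauses.entrySet())
                ok &= c.getValue().equals(rows.get(key).get(c.getKey()));
            if (ok) result.add(key);
        }
        return result;
    }
}
```

So even with billions of type=2 rows, the work done is proportional to the
e=5 bucket, provided the planner picks the more selective index first.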

On Thu, Apr 14, 2011 at 5:33 AM, David Boxenhorn da...@taotown.com wrote:
 Thank you for your answer, and sorry about the sloppy terminology.

 I'm thinking of the scenario where there are a small number of results in
 the result set, but there are billions of rows in the first of your
 secondary indexes.

 That is, I want to do something like (not sure of the CQL syntax):

 select * where type=2 and e=5

 where there are billions of rows of type 2, but some manageable number of
 those rows have e=5.

 As I understand it, secondary indexes are like column families, where each
 value is a column. So the billions of rows where type=2 would go into a
 single row of the secondary index. This sounds like a problem to me, is it?

 I'm assuming that the billions of rows that don't have column e at all
 (those rows of other types) are not a problem at all...

 On Thu, Apr 14, 2011 at 12:12 PM, aaron morton aa...@thelastpickle.com
 wrote:

 Need to clear up some terminology here.
 Rows have a key and can be retrieved by key. This is *sort of* the primary
 index, but not primary in the normal RDBMS sense.
 Rows can have different columns and the column names are sorted and can be
 efficiently selected.
 There are secondary indexes in cassandra 0.7 based on column
 values http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes
 So you could create secondary indexes on the a,e, and h columns and get
 rows that have specific values. There are some limitations to secondary
 indexes, read the linked article.
 Or you can make your own secondary indexes using row keys as the index
 values.
 If you have billions of rows, how many do you need to read back at once?
 Hope that helps
 Aaron

 On 14 Apr 2011, at 04:23, David Boxenhorn wrote:

 Is it possible in 0.7.x to have indexes on heterogeneous rows, which have
 different sets of columns?

 For example, let's say you have three types of objects (1, 2, 3) which
 each had three members. If your rows had the following pattern

 type=1 a=? b=? c=?
 type=2 d=? e=? f=?
 type=3 g=? h=? i=?

 could you index type as your primary index, and also index a, e, h
 as secondary indexes, to get the objects of that type that you are looking
 for?

 Would it work if you had billions of rows of each type?






-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: result of get_indexed_slices() seems wrong

2011-03-24 Thread Wangpei (Peter)
Thanks aaron.
Maybe we need to do more checking in ThriftValidation.validateIndexClauses();
add this:
Map<ByteBuffer, ColumnDefinition> colDefs =
    DatabaseDescriptor.getTableDefinition(keyspace).cfMetaData().get(columnFamily).getColumn_metadata();
for (IndexExpression expression : index_clause.expressions)
{
    if (!colDefs.containsKey(expression.column_name))
        throw new InvalidRequestException("No column definition for " + expression.column_name);
}


From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: March 24, 2011 12:24
To: user@cassandra.apache.org
Subject: Re: result of get_indexed_slices() seems wrong

Looks like this https://issues.apache.org/jira/browse/CASSANDRA-2347

From this discussion 
http://www.mail-archive.com/user@cassandra.apache.org/msg11291.html


Aaron

On 24 Mar 2011, at 17:17, Wangpei (Peter) wrote:


Hi,

This problem occurs when the clause has multiple expressions and an expression
with an operator other than EQ.
Has anyone met the same problem?

I traced the code and saw this in the ColumnFamilyStore.satisfies() method:
    int v = data.getComparator().compare(column.value(), expression.value);
It seems that where the type of the column value is needed, it uses the
comparator of my column names, which is UTF8Type, so it gives the wrong result.
To fix it, the expression needs an optional comparator_type attribute; then
satisfies() can get the correct type for the comparison.
Please point out if I am wrong.





result of get_indexed_slices() seems wrong

2011-03-23 Thread Wangpei (Peter)
Hi,

This problem occurs when the clause has multiple expressions and an expression
with an operator other than EQ.
Has anyone met the same problem?

I traced the code and saw this in the ColumnFamilyStore.satisfies() method:
    int v = data.getComparator().compare(column.value(), expression.value);
It seems that where the type of the column value is needed, it uses the
comparator of my column names, which is UTF8Type, so it gives the wrong result.
To fix it, the expression needs an optional comparator_type attribute; then
satisfies() can get the correct type for the comparison.
Please point out if I am wrong.




Re: understanding tombstones

2011-03-10 Thread Wangpei (Peter)
My question:
What would the client get when the following happens? (RF=3, N=3)
1. Write with timestamp T; succeeds on all nodes.
2. Delete with timestamp T+1 at CL=QUORUM; succeeds on node1 and node2 but
fails on node3.
3. Force flush + compaction.
4. Read at CL=QUORUM.

Will the client get the row back, and will read repair fix the data?
If not, how does Cassandra prevent this?
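For what it's worth, the coordinator's reconciliation step can be sketched in
plain Java (a simplification, not Cassandra's read path; it assumes the
tombstone has not yet been purged by gc_grace compaction). A QUORUM read
contacts 2 of the 3 replicas, so at least one reply carries the T+1 tombstone,
and the highest timestamp wins:

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Toy reconciliation: each replica returns (value, timestamp, isTombstone);
// the coordinator keeps the version with the highest timestamp.
class ReconcileSketch {
    static final class Version {
        final String value; final long timestamp; final boolean tombstone;
        Version(String value, long timestamp, boolean tombstone) {
            this.value = value; this.timestamp = timestamp; this.tombstone = tombstone;
        }
    }

    // Returns null if the winning version is a tombstone (row is deleted).
    static String reconcile(List<Version> replies) {
        Version winner = Collections.max(replies, Comparator.comparingLong(v -> v.timestamp));
        return winner.tombstone ? null : winner.value;
    }
}
```

The caveat is the tombstone's lifetime: once gc_grace_seconds pass and the
tombstone is compacted away on node1/node2 while node3 still holds the live
row, this reconciliation can no longer see the delete, which is why repair
needs to run within gc_grace.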

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: March 10, 2011 10:19
To: user@cassandra.apache.org
Subject: Re: understanding tombstones

On Wed, Mar 9, 2011 at 4:54 PM, Jeffrey Wang jw...@palantir.com wrote:
 insert row X with timestamp T
 delete row X with timestamp T+1
 force flush + compaction
 insert row X with timestamp T

 My understanding is that the tombstone created by the delete (and row X)
 will disappear with the flush + compaction which means the last insertion
 should show up.

Right.

 I believe I have traced this to the fact that the markedForDeleteAt field on
 the ColumnFamily does not get reset after a compaction (after
 gc_grace_seconds has passed); is this desirable? I think it introduces an
 inconsistency in how tombstoned columns work versus tombstoned CFs. Thanks.

That does sound like a bug.  Can you create a ticket?

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: managing a limited-length list as a value

2011-02-19 Thread Wangpei (Peter)
Maybe you can try this: use (MAX_TIME - time) as your column name, then get the
first N columns.
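Peter's trick, sketched in Java with a TreeMap standing in for a sorted
Cassandra row (names are illustrative): naming each column (MAX - timestamp)
makes the newest item sort first, so "first N columns" is exactly the front of
the list, with no read-modify-write:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Sketch of the reversed-timestamp column-name trick: newest-first ordering
// falls out of the comparator, so a slice of the first N column names
// returns the N most recent items.
class ReversedTimeList {
    private final TreeMap<Long, String> columns = new TreeMap<>();

    void push(long timestampMillis, String item) {
        columns.put(Long.MAX_VALUE - timestampMillis, item);
    }

    // Equivalent of a column slice of the first `limit` columns.
    List<String> head(int limit) {
        List<String> out = new ArrayList<>();
        for (String v : columns.values()) {
            if (out.size() == limit) break;
            out.add(v);
        }
        return out;
    }
}
```

Trimming the tail (discarding items beyond the threshold) would still need a
periodic delete of the columns past position N, but pushes and reads of the
front stay write-only and slice-only.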

-Original Message-
From: Benson Margulies [mailto:bimargul...@gmail.com]
Sent: February 19, 2011 2:11
To: user@cassandra.apache.org
Subject: managing a limited-length list as a value
主题: managing a limited-length list as a value

The following is derived from the redis list operations.

The data model is that a key maps to a list of items. The operation
is to push a new item onto the front and discard any items at the
end beyond a threshold number of items.

Of course, this can be done by reading the value, fiddling with it, and
writing it back. I write this email to ask whether there is any native
trickery to avoid having to read the value, instead permitting some
sort of 'push' operation.


Re: Partitioning

2011-02-16 Thread Wangpei (Peter)
I have the same question.
I read the source code of NetworkTopologyStrategy, and it seems it always puts
replicas on the first nodes on the ring of the DC.
If I am not misunderstanding, those nodes will become hot spots.
Why does NetworkTopologyStrategy work that way? Is there an alternative that
can avoid this shortcoming?

Thanks in advance.

Peter

From: Aaron Morton [mailto:aa...@thelastpickle.com]
Sent: February 16, 2011 3:56
To: user@cassandra.apache.org
Subject: Re: Partitioning

You can, using the NetworkTopologyStrategy; see
http://wiki.apache.org/cassandra/Operations?highlight=(topology)|(network)#Network_topology

and NetworkTopologyStrategy in the conf/cassandra.yaml file.

You can control the number of replicas to each DC.

Also look at conf/cassandra-topology.properties for information on how to tell 
cassandra about your network topology.

Aaron
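As a rough sketch of what Aaron describes (keyspace and DC names are
illustrative, and in later 0.7.x releases keyspace definitions are loaded via
the CLI or API rather than read live from cassandra.yaml), a keyspace holding
one replica in each of four data centers would look something like:

```yaml
# Keyspace definition sketch: one replica in each of four DCs (N=4).
# The DC names must match those assigned to node IPs in
# conf/cassandra-topology.properties, e.g. lines like "10.0.1.1=DC1:RAC1".
- name: MyKeyspace
  replica_placement_strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  strategy_options:
    DC1: 1
    DC2: 1
    DC3: 1
    DC4: 1
```

With per-DC replica counts in place, clients can then choose consistency
levels such as LOCAL_QUORUM or EACH_QUORUM to control how many DCs must
acknowledge each operation.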


On 16 Feb 2011, at 05:10 AM, RWN s5a...@gmail.com wrote:

Hi,
I am new to Cassandra and am evaluating it.

The following diagram shows how my setup will be: http://bit.ly/gJZlhw
Here each oval represents one data center. I want to keep N=4, i.e. four
copies of every column family, with one copy in each data center. In
other words, the COMPLETE database must be contained in each of the data
centers.

Question:
1. Is this possible? If so, how do I configure it (partitioner, replicas, etc.)?

Thanks

AJ

P.S. Excuse my multiple postings of the same message. I am unable to subscribe
for some reason.


Re: time to live rows

2011-02-09 Thread Wangpei (Peter)
AFAIK, a secondary index only works with the EQ operator.

-Original Message-
From: Kallin Nagelberg [mailto:kallin.nagelb...@gmail.com]
Sent: February 9, 2011 3:36
To: user@cassandra.apache.org
Subject: Re: time to live rows
主题: Re: time to live rows

I'm thinking that if this row-expiry notion doesn't pan out, I might
create a 'lastAccessed' column with a secondary index (I think that's
right) on it. Then I can periodically run a query to find all
lastAccessed columns less than a certain value and manually delete
them. Sound reasonable?

-Kal


Re: Row Key Types

2011-02-09 Thread Wangpei (Peter)
Did you set the compare_with attribute of your ColumnFamily to TimeUUIDType?

-Original Message-
From: Bill Speirs [mailto:bill.spe...@gmail.com]
Sent: February 2, 2011 0:47
To: Cassandra Usergroup
Subject: Row Key Types
主题: Row Key Types

What is the type of a Row Key? Can you define how they are compared?

I ask because I'm using TimeUUIDs as my row keys, but when I make a
call to get a range of row keys (get_range in phpcassa) I have to
specify the UTF8 range of '' to '----'
instead of the TimeUUID range of
'----' to
'----'.

This works, but feels wrong/inefficient... thoughts?

Thanks...

Bill-
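One common workaround is to construct the range endpoints as real version-1
UUIDs rather than guessing at string ranges. The sketch below (in Java for
illustration; the helper name is made up, and the bit layout follows the
RFC 4122 version-1 format) builds the smallest TimeUUID for a given wall-clock
time, usable as the low end of a key range:

```java
import java.util.UUID;

// Build the smallest RFC 4122 version-1 (time-based) UUID for a given
// Unix-epoch millisecond timestamp, usable as the low end of a key range.
class TimeUuidBounds {
    // 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
    private static final long UUID_EPOCH_OFFSET = 0x01b21dd213814000L;

    static UUID minTimeUUID(long unixMillis) {
        long ts = unixMillis * 10_000 + UUID_EPOCH_OFFSET; // 100-ns units
        long msb = (ts << 32)                    // time_low
                 | ((ts >>> 16) & 0xFFFF0000L)   // time_mid
                 | ((ts >>> 48) & 0x0FFFL)       // time_hi
                 | 0x1000L;                      // version 1
        // Smallest legal IETF-variant clock-seq/node for the low end.
        long lsb = 0x8000000000000000L;
        return new UUID(msb, lsb);
    }
}
```

Note that with RandomPartitioner a get_range over keys walks token order, not
time order, so range scans over TimeUUID keys only behave as expected with an
order-preserving partitioner.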


Re: Cassandra + Thrift on RedHat Enterprise 5

2011-01-30 Thread Wangpei (Peter)
Hector document:
http://www.riptano.com/sites/default/files/hector-v2-client-doc.pdf

From: Vedarth Kulkarni [mailto:vedar...@gmail.com]
Sent: January 30, 2011 14:03
To: user@cassandra.apache.org
Subject: Re: Cassandra + Thrift on RedHat Enterprise 5

How can I use Hector? Please can you explain it to me in detail?
I am new to these things.

-
Vedarth Kulkarni,
TYBSc (Computer Science).


On Sun, Jan 30, 2011 at 11:20 AM, Andrey V. Panov panov.a...@gmail.com wrote:
Use Hector instead of pure Thrift. https://github.com/rantav/hector/
And check out the wiki.



Re: Schema Design

2011-01-26 Thread Wangpei (Peter)
I am also working on a system that stores logs from hundreds of systems.
In my scenario, most queries will look like this: show the login logs (category
EQ) of a given proxy (host EQ) between this Monday and Wednesday (time range).
My data model looks like this:
. Only 1 CF; that's enough for this scenario.
. Group the logs from each host and day into one row. The key format is
hostname.category.date
. Store each log entry as a super column; the super column name is the TimeUUID
of the log entry, with each attribute as a column.

Then this query can be done as 3 GETs, with no need for a key-range scan,
so I can use RP instead of OPP. If I used OPP, I would have to worry about
load balancing myself; I hate that.
However, if I need time-range access within a day, I can still use a column
slice.

An additional benefit is that I can clean out old logs very easily. We only
keep logs for 1 year, so deleting by key does this job well.

I think storing all logs for a host in a single row is not a good choice, for
2 reasons:
1. Too few keys, so your data will not distribute well.
2. The data under a key will always grow, so Cassandra has to do more SSTable
compaction.
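The "3 GETs" point falls out of the key format: a host+category query over a
date range is just one key per day. A sketch (key format as Peter describes;
names and the ISO date rendering are illustrative):

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Build the row keys for a host+category query over a date range, one key
// per day in the "hostname.category.date" scheme: each key is one GET.
class LogKeyRange {
    static List<String> keysFor(String host, String category,
                                LocalDate from, LocalDate toInclusive) {
        List<String> keys = new ArrayList<>();
        for (LocalDate d = from; !d.isAfter(toInclusive); d = d.plusDays(1)) {
            keys.add(host + "." + category + "." + d); // e.g. proxy1.login.2011-01-24
        }
        return keys;
    }
}
```

A Monday-to-Wednesday query yields exactly three keys, so the client issues
three GETs (or one multiget) and never needs a key-range scan, which is what
makes RandomPartitioner workable here.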

-Original Message-
From: William R Speirs [mailto:bill.spe...@gmail.com]
Sent: January 27, 2011 9:15
To: user@cassandra.apache.org
Subject: Re: Schema Design

It makes sense that the single row for a system (with a growing number of
columns) will reside on a single machine.

With that in mind, here is my updated schema:

- A single column family for all the messages. The row keys will be the
TimeUUID of the message, with the following columns: date/time (in UTC POSIX),
system name/id (with an index for fast/easy gets), and the actual message
payload.

- A column family for each system. The row keys will be UTC POSIX time with
1-second (maybe 1-minute) bucketing, and the column names will be the TimeUUIDs
of any messages that were logged during that time bucket.

My only hesitation with this design is that buddhasystem warned that each
column family is allocated a piece of memory on the server. I'm not sure what
the implications of this are and/or whether this would be a problem if I had a
number of systems on the order of hundreds.

Thanks...

Bill-

On 01/26/2011 06:51 PM, Shu Zhang wrote:
 Each row can have a maximum of 2 billion columns, which a logging system will
 probably hit eventually.

 More importantly, you'll only have 1 row per set of system logs. Every row is
 stored on the same machine(s), which means you'll definitely not be able
 to distribute your load very well.
 
 From: Bill Speirs [bill.spe...@gmail.com]
 Sent: Wednesday, January 26, 2011 1:23 PM
 To: user@cassandra.apache.org
 Subject: Re: Schema Design

 I like this approach, but I have 2 questions:

 1) What are the implications of continually adding columns to a single
 row? I'm unsure how Cassandra is able to grow. I realize you can have
 a virtually infinite number of columns, but what are the implications
 of growing the number of columns over time?

 2) Maybe it's just a restriction of the CLI, but how do I issue a
 slice request? Also, what if the start (or end) columns don't exist? I'm
 guessing it's smart enough to get the columns in that range.

 Thanks!

 Bill-

 On Wed, Jan 26, 2011 at 4:12 PM, David McNelis
 dmcne...@agentisenergy.com wrote:
 I would say in that case you might want to try a single column family
 where the row key is the system name.
 Then you could name your columns with the timestamp. When retrieving
 information from the data store you can, in your slice request, specify
 your start column as X and your end column as Y.
 Then you can use the stored column name to know when an event occurred.

 On Wed, Jan 26, 2011 at 2:56 PM, Bill Speirs bill.spe...@gmail.com wrote:

 I'm looking to use Cassandra to store log messages from various
 systems. A log message only has a message (UTF8Type) and a date/time.
 My thought is to create a column family for each system. The row key
 will be a TimeUUIDType. Each row will have 7 columns: year, month,
 day, hour, minute, second, and message. I then have indexes set up for
 each of the date/time columns.

 I was hoping this would allow me to answer queries like: What are all
 the log messages that were generated between X and Y? The problem is
 that I can ONLY use the equals operator on these column values. For
 example, I cannot issue: get system_x where month > 1; it gives me this
 error: No indexed columns present in index clause with operator EQ.
 The equals operator works as expected, though: get system_x where month
 = 1;

 What schema would allow me to get date ranges?

 Thanks in advance...

 Bill-

 * ColumnFamily description *
 ColumnFamily: system_x_msg
   Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
   Row cache size / save period: 0.0/0
   Key cache size / save period: 20.0/3600
   Memtable thresholds: 1.1671875/249/60
   GC grace seconds: 864000
   Compaction min/max 

Re: Basic question on a write operation immediately followed by a read

2011-01-24 Thread Wangpei (Peter)
What is the ConsistencyLevel value? Is it ConsistencyLevel.ANY?

Javadoc:
* Write consistency levels make the following guarantees before reporting
success to the client:
*   ANY          Ensure that the write has been written once somewhere,
including possibly being hinted on a non-target node.
*   ONE          Ensure that the write has been written to at least 1 node's
commit log and memory table.
*   QUORUM       Ensure that the write has been written to ReplicationFactor
/ 2 + 1 nodes.
*   LOCAL_QUORUM Ensure that the write has been written to ReplicationFactor
/ 2 + 1 nodes within the local datacenter (requires NetworkTopologyStrategy).
*   EACH_QUORUM  Ensure that the write has been written to ReplicationFactor
/ 2 + 1 nodes in each datacenter (requires NetworkTopologyStrategy).
*   ALL          Ensure that the write is written to <ReplicationFactor>
nodes before responding to the client.
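The quorum arithmetic in those guarantees is plain integer division, which is
easy to check for a few replication factors:

```java
// QUORUM as defined in the javadoc above: more than half the replicas,
// computed with integer division.
class QuorumMath {
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }
}
```

Note that for RF=1 a quorum is the single node itself, so on a 1-node cluster
QUORUM and ONE behave identically.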



From: Roshan Dawrani [mailto:roshandawr...@gmail.com]
Sent: January 25, 2011 10:57
To: user@cassandra.apache.org; hector-us...@googlegroups.com
Subject: Basic question on a write operation immediately followed by a read

Hi,

I have a basic question - maybe silly too.

Say I have a 1-node Cassandra setup (no replication, eventual consistency,
etc.), and I do an insert into a column family and then, very close in time to
the insert, a read of the same data.

Is there a possibility that my read operation may miss the data that just got
inserted?

Since there are no DB transactions in Cassandra, are writes immediately visible
to readers, even partially, as they get written?

Or can there sometimes be a delay due to flushing to SSTables, etc.?

Or are writes first in memory and immediately visible to readers, with
flushing etc. independent of all this, happening in the background?
Thanks.

--
Roshan
Blog: http://roshandawrani.wordpress.com/
Twitter: @roshandawrani (http://twitter.com/roshandawrani)
Skype: roshandawrani