2012/9/24 Hiller, Dean <dean.hil...@nrel.gov>

> I am confused. In this email you say you want "get all requests for a
> user" and in a previous one you said "Select all the users which has new
> requests, since date D" so let me answer both…

I have both needs. These are the two queries I need to perform on the
model.

> For the latter, you make ONE query into the latest partition (ONE
> partition) of the GlobalRequestsCF which gives you the most recent
> requests ALONG with the user ids of those requests. If you queried all
> partitions, you would most likely blow out your JVM memory.
>
> For the former, you make ONE query to the UserRequestsCF with userid =
> <your user id> to get all the requests for that user

Now I think I got the main idea! This answered a lot!

> Sorry, I was skipping some context. A lot of the backing indexing
> sometimes is done as a long row so in playOrm, too many rows in a
> partition means == too many columns in the indexing row for that
> partition. I believe the same is true in Cassandra for their indexing.

Oh, ok, you were talking about the wide row pattern, right? But playORM
is compatible with Aaron's model, isn't it? Can I map exactly this using
playORM? The hardest part of using playORM for me right now is that I
don't know Cassandra well yet, and I know playORM even less. Can I ask
playOrm questions on this list?

I will try to create a POC here! Only now am I starting to understand
what it does ;-) The examples directory is empty for now, and I would
like to see how to set up the connection with it.

> Cassandra spreads all your data out on all nodes with or without
> partitions. A single partition does have its data co-located though.

Now I see. The main advantage of using partitions is keeping the indexes
small enough. It has nothing to do with the nodes. Thanks!

> If you are at 100k (and the requests are rather small), you could embed
> all the requests in the user or go with Aaron's below suggestion of a
> UserRequestsCF. If your requests are rather large, you probably don't
> want to embed them in the User. Either way, it's one query or one row
> key lookup.

I see it now.

> Multiget ignores partitions… you feed it a LIST of keys and it gets them.
> It just so happens that partitionId had to be part of your row key.

Do you mean I need to load all the keys in memory to do a multiget?

> I have used Hector and now use Astyanax, I don't worry much about that
> layer, but I feed astyanax 3 nodes and I believe it discovers some of
> the other ones. I believe the latter is true but am not 100% sure as I
> have not looked at that code.

Why did you move? Hector is being considered for being the "official"
client for Cassandra, isn't it? I looked at the Astyanax API and it
seemed much more high level, though.

> As an analogy on the above, if you happen to have used PlayOrm, you
> would ONLY need one Requests table and you partition by user AND time
> (two views into the same data partitioned two different ways) and you
> can do exactly the same thing as Aaron's example. PlayOrm doesn't embed
> the partition ids in the key leaving it free to partition twice like in
> your case… and in a refactor, you have to map/reduce A LOT more rows
> because of rows having the FK of <partitionid><subrowkey> whereas if you
> don't have partition id in the key, you only map/reduce the partitioned
> table in a redesign/refactor. That said, we will be adding support for
> CQL partitioning in addition to PlayOrm partitioning even though it can
> be a little less flexible sometimes.

I am not sure I understood this part. If I need to refactor, would
having the partition id in the key be a bad thing? What would be the
alternative? In my case, as I use userId : partitionId as row key, this
might be a problem, right?

> Also, CQL locates all the data on one node for a partition. We have
> found it can be faster "sometimes" with the parallelized disks that the
> partitions are NOT all on one node so PlayOrm partitions are virtual
> only and do not relate to where the rows are stored. An example on our
> 6 nodes was a join query on a partition with 1,000,000 rows took 60ms
> (of course I can't compare to CQL here since it doesn't do joins).
> It really depends how much data is going to come back in the query
> though too? There are tradeoffs between disk parallel nodes and having
> your data all on one node of course.

I guess I am still not ready for this level of info. :D

In the playORM readme, we have the following:

    @NoSqlQuery(name="findWithJoinQuery", query="PARTITIONS t(:partId) SELECT t FROM TABLE as t "+
        "INNER JOIN t.activityTypeInfo as i WHERE i.type = :type and t.numShares < :shares"),

What would happen behind the scenes when I execute this query? You can
only use joins with partition keys, right? In this case, is partId the
row id of TABLE CF?

Thanks a lot for the answers

--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr
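
The two-column-family layout discussed in this thread (a time-partitioned GlobalRequestsCF plus a per-user UserRequestsCF) can be sketched with plain in-memory maps. This is only an illustration under assumptions, not playORM or Cassandra client code: the class `RequestModelSketch`, the `Request` fields, the one-day partition size, and the method names are all hypothetical, and the "since date D" query assumes D falls inside the latest partition, as in Dean's description of reading ONE partition.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory illustration only: no Cassandra or playORM calls are made here.
class RequestModelSketch {
    // Assumed partition size: one partition (row) per day.
    static final long DAY_MILLIS = 24L * 60 * 60 * 1000;

    // Hypothetical request shape; the field names are assumptions.
    static final class Request {
        final String userId;
        final String requestId;
        final long timestampMillis;

        Request(String userId, String requestId, long timestampMillis) {
            this.userId = userId;
            this.requestId = requestId;
            this.timestampMillis = timestampMillis;
        }
    }

    // Stand-in for GlobalRequestsCF: row key = day bucket,
    // columns sorted by timestamp (like a wide row).
    private final Map<Long, TreeMap<Long, List<Request>>> globalRequests = new HashMap<>();

    // Stand-in for UserRequestsCF: row key = user id.
    private final Map<String, List<Request>> userRequests = new HashMap<>();

    // Every write goes to both "column families" (denormalized).
    void save(Request r) {
        long bucket = r.timestampMillis / DAY_MILLIS;
        globalRequests.computeIfAbsent(bucket, b -> new TreeMap<>())
                .computeIfAbsent(r.timestampMillis, t -> new ArrayList<>())
                .add(r);
        userRequests.computeIfAbsent(r.userId, u -> new ArrayList<>()).add(r);
    }

    // "Users with new requests since D": reads ONE partition, the one
    // containing sinceMillis, and slices its columns from that timestamp on.
    Set<String> usersWithRequestsSince(long sinceMillis) {
        Set<String> users = new TreeSet<>();
        TreeMap<Long, List<Request>> row = globalRequests.get(sinceMillis / DAY_MILLIS);
        if (row == null) {
            return users;
        }
        for (List<Request> sameInstant : row.tailMap(sinceMillis, true).values()) {
            for (Request r : sameInstant) {
                users.add(r.userId);
            }
        }
        return users;
    }

    // "All requests for a user": a single row-key lookup.
    List<Request> requestsForUser(String userId) {
        List<Request> row = userRequests.get(userId);
        return row == null ? Collections.<Request>emptyList() : row;
    }

    public static void main(String[] args) {
        RequestModelSketch model = new RequestModelSketch();
        long day10 = 10 * DAY_MILLIS;
        model.save(new Request("alice", "r1", day10 + 100));
        model.save(new Request("bob", "r2", day10 + 200));
        model.save(new Request("alice", "r3", day10 + 300));
        System.out.println(model.usersWithRequestsSince(day10 + 250));
        System.out.println(model.requestsForUser("alice").size());
    }
}
```

Each query touches exactly one row, which is the point made in the quoted replies: the time bucket keeps any single index row from growing without bound, while the per-user row answers the second query without scanning partitions.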