Re: A question to 'paging' support in DataStax java driver

2016-05-09 Thread DuyHai Doan
jumped to that. > User will see inconsistent page in that case as well, right? Also in such cases how would you design a user facing application (Cache previous pages at app level?) > Regards, Bhuvan > On Mon, May 9, 2016 at 4:18 PM, DuyHai Doan <doanduy...@gmail.com>

Re: A question to 'paging' support in DataStax java driver

2016-05-09 Thread DuyHai Doan
"Is it possible to just return PagingState object without returning data?" --> No. Simply because, before reading the actual data for each page of N rows, you cannot know at which token value a page of data starts... And it is worse than that: with paging you don't have any isolation. Let's
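A toy model in plain Java (no driver involved) may help picture why. Here the "paging state" is simply the last key returned, which is an assumption for illustration only (the driver's real PagingState is an opaque value), and all class and method names are hypothetical. The only way to obtain the state for page n is to actually read pages 1 to n-1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Toy model of paging: the "paging state" is the last key returned (an assumption;
// the driver's real PagingState is opaque). Keys are kept sorted, like a token range.
final class PagingSketch {
    /** One page of rows plus the cursor needed to fetch the next page. */
    static final class Page {
        final List<String> rows;
        final String pagingState; // null when there is no next page

        Page(List<String> rows, String pagingState) {
            this.rows = rows;
            this.pagingState = pagingState;
        }
    }

    private final NavigableMap<String, String> table;

    PagingSketch(NavigableMap<String, String> table) {
        this.table = table;
    }

    /** Reads up to fetchSize keys strictly after the given paging state. */
    Page fetch(String pagingState, int fetchSize) {
        NavigableMap<String, String> remaining =
            pagingState == null ? table : table.tailMap(pagingState, false);
        List<String> rows = new ArrayList<>();
        for (String key : remaining.keySet()) {
            if (rows.size() == fetchSize) break;
            rows.add(key);
        }
        String last = rows.isEmpty() ? null : rows.get(rows.size() - 1);
        // A next page exists only if some key sorts after the last one read.
        boolean more = last != null && table.higherKey(last) != null;
        return new Page(rows, more ? last : null);
    }

    /** The state for page n (1-based) can only be reached by reading pages 1..n-1. */
    String pagingStateForPage(int n, int fetchSize) {
        String state = null;
        for (int i = 1; i < n; i++) {
            state = fetch(state, fetchSize).pagingState;
            if (state == null) break; // fewer than n pages exist
        }
        return state;
    }
}
```

Nothing in the model stops the underlying data from changing between two fetch calls, which matches the lack of isolation mentioned above.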

[Announcement] Achilles 4.2.0 released

2016-04-28 Thread DuyHai Doan
Hello all, I am pleased to announce the release of Achilles 4.2.0. The biggest change is the support for type-safe function calls in the SELECT DSL as well as UDF/UDA declaration in Achilles. The generated DSL code enforces the type of each function call so that the parameter types/return

Re: Is this type of counter table definition valid?

2016-03-24 Thread DuyHai Doan
Just tested against C* 3.4 CREATE TABLE IF NOT EXISTS test_table ( part timestamp, clust timestamp, count counter static, PRIMARY KEY(part, clust)); and it just works. "However, I'm not sure how that is possible, given that the updates to partitionRowCountCol would require use

Re: Transaction Support in Cassandra

2016-03-22 Thread DuyHai Doan

Re: Transaction Support in Cassandra

2016-03-22 Thread DuyHai Doan
First, Ok.ru's support for ACID transactions uses a complete FORK of Apache Cassandra, so people using it need to maintain the fork themselves. Second, as far as I know, they don't intend to publish the source code, so it's not really available in the public space either. The reason is that their

Re: Achilles not picking cassandra ConsistencyLevel

2016-03-12 Thread DuyHai Doan

Re: Achilles not picking cassandra ConsistencyLevel

2016-03-12 Thread DuyHai Doan
It's better to post your question on the dedicated mailing list for Achilles (https://groups.google.com/forum/?hl=fr#!forum/cassandra-achilles). To help you further, please provide some DML logs at runtime:

Re: Using User Defined Functions in UPDATE queries

2016-03-10 Thread DuyHai Doan
orks in cqlsh (presuming max_int() is a UDF): > UPDATE test_table SET data=max_int(3,4) WHERE idx='abc’; > > So, if the grammar is not supposed to allow this, then there is a bug > somewhere because in 3.3 it certainly seems to be parsed and executed > without complaint. >

Re: Using User Defined Functions in UPDATE queries

2016-03-10 Thread DuyHai Doan
You have misread the CQL doc given in the link. According to the CQL UPDATE grammar, it's not possible to use a UDF there; I only see UDFs allowed in the SELECT clause... On 10 March 2016 22:07, "Kim Liu" wrote: > Hello - > I am experimenting with User Defined Functions in Cassandra

Re: Unexplainably large reported partition sizes

2016-03-05 Thread DuyHai Doan
Maybe tombstones? Do you issue a lot of DELETE statements? Or do you re-insert into the same partition with different TTL values? On Sat, Mar 5, 2016 at 7:16 PM, Tom van den Berge wrote: > I don't think compression can be the cause of the difference, because of two

Re: Updating secondary index options

2016-03-04 Thread DuyHai Doan
> It’s not possible, it’s a PerRowSecondary index, potentially as big as the table itself (a few TBs); it will take a very long time to drop and re-create. > Jacques-Henri Berthemet

Re: Updating secondary index options

2016-03-04 Thread DuyHai Doan
DROP and re-create the index with the new options. On Fri, Mar 4, 2016 at 3:45 PM, Jacques-Henri Berthemet <jacques-henri.berthe...@genesys.com> wrote: > Hi, > I’m using Cassandra 2.2.5 with a custom secondary index. It’s created with the below syntax:

Re: Not able to insert data through achilles.

2016-03-02 Thread DuyHai Doan
You're right, it's a bug. I have created issue https://github.com/doanduyhai/Achilles/issues/240 to fix it. Fortunately, you can use the query API right now to insert the static columns:
PreparedStatement ps = INSERT INTO
BoundStatement bs = ps.bind(...)

[Announcement] Achilles 4.1.0 released

2016-02-22 Thread DuyHai Doan
Hello all, I am pleased to announce the release of Achilles 4.1.0. The biggest change is the support for the new Cassandra 3.x Materialized Views by annotation. Achilles also enforces constraints on your views (all primary key columns of the base table must be in the view, etc.) at compile time

Re: ORM layer for cassandra-java?

2016-02-12 Thread DuyHai Doan
tity and update the change only through the DynamicUpdate annotation. Or is there something I am missing here? Thanks, a reply will be highly appreciated. > Atul Saroha

Re: ORM layer for cassandra-java?

2016-02-12 Thread DuyHai Doan
@Column("c1")
@Static
private String c1;
@Column("c2")
@Static
private Boolean c2;
@Column("c3")
@ClusteringColumn(1)
private String c3;
@Column("c4")

Re: best ORM for cassandra

2016-02-10 Thread DuyHai Doan
For advanced object mapping, you can also look at Achilles: www.achilles.io On Wed, Feb 10, 2016 at 3:21 PM, Jim Ancona wrote: > Recent versions of the Datastax Java Driver include an object mapping API that might work for you:

Re: Select values from map with multiple key values in where clause

2016-02-10 Thread DuyHai Doan
It's not possible to have multiple keys in the CONTAINS KEY clause. Right now it is not possible to use a UDF in the WHERE clause; it may eventually be possible one day. But you can use a UDF in the SELECT clause to filter out data. In this case, you'll need to wait for JIRA
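As a hedged illustration of the "UDF in the SELECT clause" approach, here is the plain-Java logic such a function body could run: keep only the map entries whose key belongs to a requested set. The map<text, double> column type, the set-of-keys parameter and the class name are assumptions, and the CREATE FUNCTION wrapper is not shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class MapKeyFilter {
    // Logic a hypothetical UDF body could run: keep only the entries
    // whose key is in the requested set.
    static Map<String, Double> filterKeys(Map<String, Double> values, Set<String> wanted) {
        Map<String, Double> out = new HashMap<>();
        if (values == null) return out; // a null map column means no entries
        for (Map.Entry<String, Double> e : values.entrySet()) {
            if (wanted.contains(e.getKey())) out.put(e.getKey(), e.getValue());
        }
        return out;
    }
}
```

Note that the filtering only shapes what is returned; the whole map is still read for every row.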

Re: ORM layer for cassandra-java?

2016-02-09 Thread DuyHai Doan
Look at Achilles and how it models partition key and clustering columns: https://github.com/doanduyhai/Achilles/wiki/5-minutes-Tutorial#clustered-entities On Tue, Feb 9, 2016 at 12:48 PM, Atul Saroha wrote: > I know the most popular ORM APIs: 1. Kundera:

Re: Can we set TTL on individual fields (columns) using the Datastax java-driver

2016-02-08 Thread DuyHai Doan
I think you should direct your request to the Java driver mailing list: https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user To answer your question: no, there is no @Ttl annotation in the driver-mapping module, even in the latest release:

Re: Moving Away from Compact Storage

2016-02-02 Thread DuyHai Doan
a non-compact CF with static columns and collections? > Thanks, Anuj > On Tue, 2/2/16, DuyHai Doan <doanduy...@gmail.com> wrote:

Re: Moving Away from Compact Storage

2016-02-02 Thread DuyHai Doan
ave some dynamic text data which we are planning to add in collections. Please let me know if you need more details. Thanks, Anuj > On Wed, 3 Feb, 201

Re: Moving Away from Compact Storage

2016-02-02 Thread DuyHai Doan
.co.in> wrote: > Will it be possible to read dynamic column data from compact storage and transform it into a collection, e.g. a map, in the new table? Thanks, Anuj > On Wed, 3 Feb

Re: Missing rows while scanning table using java driver

2016-02-02 Thread DuyHai Doan
Why don't you use the server-side paging feature instead of messing with tokens? http://datastax.github.io/java-driver/manual/paging/ On Wed, Feb 3, 2016 at 7:36 AM, Priyanka Gugale wrote: > Hi, I am using Cassandra 2.2.0 and Cassandra driver 2.1.8. I am trying to scan a

Re: Moving Away from Compact Storage

2016-02-01 Thread DuyHai Doan
Use Apache Spark to parallelize the data migration. Look at this piece of code: https://github.com/doanduyhai/Cassandra-Spark-Demo/blob/master/src/main/scala/usecases/MigrateAlbumsData.scala#L58-L60 If your source and target tables have the SAME structure (except for the COMPACT STORAGE clause),

Re: Problem while migrating a single node cluster from 2.1 to 3.2

2016-01-30 Thread DuyHai Doan
You need to upgrade to C* 2.2 first before migrating to C* 3.x. For each version, read the NEWS.txt file and follow the procedure:
From 2.1.x to 2.2.x: https://github.com/apache/cassandra/blob/cassandra-2.2/NEWS.txt
From 2.2.x to 3.x:

Re: Wide row in Cassandra

2016-01-28 Thread DuyHai Doan
This data model should do the job:
CREATE TABLE data (
    key text,
    value1 text static,
    value2 text static,
    ...
    valueN text static,
    mapKey text,
    mapValue double,
    PRIMARY KEY (key, mapKey)
);
Warning: value1 ... valueN being static, there will be a 1:1 relationship between

Re: Are aggregate functions done in parallel?

2016-01-28 Thread DuyHai Doan
You can read this: http://www.doanduyhai.com/blog/?p=1876 and this: http://www.doanduyhai.com/blog/?p=2015 Long story short, UDF and UDA computation in Cassandra is not distributed. All the values are first retrieved on the coordinator node (to apply the last-write-wins reconciliation logic)
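To make "not distributed" concrete, here is a hypothetical average aggregate written in plain Java. The state function folds the already reconciled values one at a time, then the final function runs once, all on a single node. It only mirrors the state-function/final-function shape of a UDA; the class and method names are illustrative, not Cassandra's API.

```java
import java.util.List;

// Hypothetical average aggregate evaluated on one node (the coordinator):
// the state function folds rows one by one, then the final function runs once.
final class AvgUdaSketch {
    /** Aggregate state: number of values seen and their running sum. */
    static final class State {
        final long count;
        final double sum;

        State(long count, double sum) {
            this.count = count;
            this.sum = sum;
        }
    }

    /** State function: called once per row, in the order rows reach the coordinator. */
    static State accumulate(State s, Double value) {
        if (value == null) return s; // skip null cells
        return new State(s.count + 1, s.sum + value);
    }

    /** Final function: turns the accumulated state into the result. */
    static Double finish(State s) {
        return s.count == 0 ? null : s.sum / s.count;
    }

    /** Runs sequentially over values already reconciled from the replicas. */
    static Double aggregate(List<Double> reconciledValues) {
        State s = new State(0, 0.0);
        for (Double v : reconciledValues) s = accumulate(s, v);
        return finish(s);
    }
}
```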

Re: opscenter doesn't work with cassandra 3.0

2016-01-26 Thread DuyHai Doan
Hello Otis, are the Sematext tools free or not? And if not, is there a "limited" open-source version? On Tue, Jan 26, 2016 at 3:39 PM, Otis Gospodnetić <otis.gospodne...@gmail.com> wrote: > Hi, > As Julien pointed out, there is a good OpsCenter alternative at

Re: Cassandra start Warning

2016-01-23 Thread DuyHai Doan
For the warning about system conf, look at the official production settings recommendations: http://docs.datastax.com/en/cassandra/3.x/cassandra/install/installRecommendSettings.html On Sat, Jan 23, 2016 at 10:30 AM, Arthur Chan wrote: > HI, > > My version is 3.2.1,

Re: Cassandra 3.1 - Aggregation query failure

2016-01-18 Thread DuyHai Doan
A quick update on this issue. Today, when playing with UDAs, I also got the exception: java.security.AccessControlException: access denied ("java.io.FilePermission" "/x/logback.xml" "read"). What is definitely strange is that re-executing the same query works. I

Re: Too many compactions, maybe keyspace system?

2016-01-16 Thread DuyHai Doan
Interesting, maybe it's worth filing a JIRA. Empty tables should not slow down compaction of other tables. On Sat, Jan 16, 2016 at 10:33 AM, Shuo Chen wrote: > Hi, Robert, > I think I found the cause of the too many compactions. I used jmap to dump the heap and used

Re: endless full gc on one node

2016-01-16 Thread DuyHai Doan
"As soon as inserting started, one node started non-stop full GC. The other two nodes were totally fine" --> Just a guess: how did you insert data? Did you use batch statements? On Sat, Jan 16, 2016 at 10:12 PM, Kai Wang wrote: > Hi, > Recently I saw some strange behavior on

Re: Cassandra DSE Solr - search JSON content in column

2016-01-13 Thread DuyHai Doan
Try SELECT * FROM your_table WHERE solr_query='json:"*100 ABC Street*"'; Warning: since you're storing data in JSON format, searching inside a JSON document is equivalent to a wildcard search *xxx*, and it is quite expensive, even for full-text search engines like Solr. On Wed, Jan 13, 2016 at 2:50 PM,

Re: In UJ status for over a week trying to rejoin cluster in Cassandra 3.0.1

2016-01-12 Thread DuyHai Doan
What is your Cassandra version? In earlier versions there were some issues with streaming that could leave the joining process stuck. On Mon, Jan 11, 2016 at 6:57 AM, Carlos A wrote: > Hello all, > I have a small dev environment with 4 machines. One of them, I had it

Re: what consistency level should I set when using IF NOT EXIST or UPDATE IF statements ?

2016-01-12 Thread DuyHai Doan
There are 2 consistency levels you can define on your query when using lightweight transactions:
- one for the Paxos round: SERIAL or LOCAL_SERIAL (which indeed correspond to QUORUM/LOCAL_QUORUM, but are named differently so people do not get confused)
- one for the consistency of the
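The replica counts behind these levels are plain majority arithmetic. The sketch below assumes replication factors given per data center, NetworkTopologyStrategy style (DC names are hypothetical): SERIAL, like QUORUM, needs a majority of all replicas, while LOCAL_SERIAL, like LOCAL_QUORUM, needs a majority in the local DC only. In the DataStax Java driver the two levels are set separately on a Statement, with setSerialConsistencyLevel for the Paxos round and setConsistencyLevel for the commit.

```java
import java.util.Map;

// Sketch of the replica counts behind the Paxos consistency levels.
final class QuorumSketch {
    /** Majority of n replicas: floor(n / 2) + 1. */
    static int majority(int replicas) {
        return replicas / 2 + 1;
    }

    /** SERIAL (like QUORUM): a majority of all replicas, summed over every DC. */
    static int serial(Map<String, Integer> rfPerDc) {
        int total = 0;
        for (int rf : rfPerDc.values()) total += rf;
        return majority(total);
    }

    /** LOCAL_SERIAL (like LOCAL_QUORUM): a majority in the coordinator's DC only. */
    static int localSerial(Map<String, Integer> rfPerDc, String localDc) {
        return majority(rfPerDc.get(localDc));
    }
}
```

With RF=3 in each of two DCs, SERIAL waits for 4 replicas across both DCs, whereas LOCAL_SERIAL only needs 2 in the local DC.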

Re: Modeling contact list, plain table or List

2016-01-12 Thread DuyHai Doan
rom the base table as the MV PK. But still working fine. That is my first Cassandra use case and the guidance provided by you guys is pretty important. Thanks very much for the answers, questions and suggestions. > IPVP

Re: Modeling contact list, plain table or List

2016-01-12 Thread DuyHai Doan
ion.user_contact (
    userid int,
    contactname text,
    contactid int,
    createdat timeuuid,
    favoriteat timestamp,
    isfavorite boolean,
    objectid timeuuid,
    PRIMARY KEY (userid, contactname)
) WITH C

Re: In UJ status for over a week trying to rejoin cluster in Cassandra 3.0.1

2016-01-12 Thread DuyHai Doan
Oh, sorry, I did not notice the version in the title. Did you check system.log to verify that there isn't any exception related to data streaming? What is the output of "nodetool tpstats"? On Tue, Jan 12, 2016 at 1:00 PM, DuyHai Doan <doanduy...@gmail.com> wrote: > Wha

Re: Recommendations for an embedded Cassandra and Unit Tests

2016-01-12 Thread DuyHai Doan
t contains multiple statements. > On Mon, Jan 11, 2016 at 3:06 PM, DuyHai Doan <doanduy...@gmail.com> wrote: >> Achilles 4.x does offer embedded Cassandra server support with some utility classes like ScriptExecutor. It currently supports C* 2.2:

Re: Modeling contact list, plain table or List

2016-01-11 Thread DuyHai Doan
> is_favourite IS TRUE instead of is_favourite IS NOT NULL? >> Carlos Alonso >> On 10 January 2016 at 09:59, DuyHai Doan <doanduy...@gmail.com> wrote:

Re: Recommendations for an embedded Cassandra and Unit Tests

2016-01-11 Thread DuyHai Doan
Achilles 4.x does offer embedded Cassandra server support with some utility classes like ScriptExecutor. It currently supports C* 2.2: https://github.com/doanduyhai/Achilles/wiki/CQL-embedded-cassandra-server On 11 Jan 2016 20:47, "Richard L. Burton III" wrote: >

Re: Modeling contact list, plain table or List

2016-01-10 Thread DuyHai Doan
Try this:
CREATE TABLE communication.user_contact_list (
    user_id uuid,
    contact_id uuid,
    contact_name text,
    created_at timeuuid,
    is_favorite boolean,
    favorite_at timestamp,
    PRIMARY KEY (user_id, contact_name, contact_id)
);
CREATE MATERIALIZED VIEW

Re: Cassandra Java Driver

2015-12-30 Thread DuyHai Doan
Check the protocol version when you create your Cluster object on the client side. On 30 Dec 2015 13:33, "ssiv...@gmail.com" wrote: > I've just tried to use cassandra-driver-core-3.0.0_rc1 and cassandra-driver-core-3.0.0_beta1 with C* 2.2.4 (cassandra-all-2.2.4). And

Re: Cassandra 3.1 - Aggregation query failure

2015-12-24 Thread DuyHai Doan
> The behavior is similar with Cassandra 3.0 as well: on the same set of days, the query sometimes succeeds, but fails most times. Would trying the Datastax distribution offer any better chances? > Thanks, Dinesh. > On 12/24/2015 2:59 AM, DuyHai Doan wrote:

Re: Cassandra 3.1 - Aggregation query failure

2015-12-23 Thread DuyHai Doan
reconcile over QUORUM replicas, the query may time out very quickly. On Fri, Dec 18, 2015 at 5:26 PM, Tyler Hobbs <ty...@datastax.com> wrote: > On Fri, Dec 18, 2015 at 9:17 AM, DuyHai Doan <doanduy...@gmail.com> wrote: >> Cassandra will perform a full table

Re: What is the ideal way to merge two Cassandra clusters with same keyspace into one?

2015-12-21 Thread DuyHai Doan
For cross-cluster operation with the Spark/Cassandra connector, you can look at this trick: http://www.slideshare.net/doanduyhai/fast-track-to-getting-started-with-dse-max-ing/64 On Mon, Dec 21, 2015 at 1:14 PM, George Sigletos wrote: > Roughly half TB of data. > > There

Re: What are the best ways to learn Apache Cassandra

2015-12-19 Thread DuyHai Doan
You can have a look at academy.datastax.com. There are tons of videos, courses and materials to learn Cassandra. On 19 Dec 2015 10:21, "Akhil Mehra" wrote: > What are some things you wish you knew when you started learning Apache Cassandra. > What are some of the

Re: Cassandra 3.1 - Aggregation query failure

2015-12-18 Thread DuyHai Doan
Hello
There are 2 details that are important here:
1. The node has only 4GB of RAM
2. "However, the aggregation on all ~45 rows always fails, sometimes immediately, sometimes after 30-60 seconds"
The consequence of point 1 is that the JVM heap size is small: 1GB. The formula to compute the max
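For reference, the default max-heap rule in cassandra-env.sh of that era is, as far as I recall it, max(min(RAM/2, 1024 MB), min(RAM/4, 8192 MB)); check the script shipped with your version before relying on it. A direct Java transcription shows why a 4GB node ends up with a 1GB heap:

```java
// Transcription of the default max-heap rule (2.x/3.x-era cassandra-env.sh, as recalled):
// max(min(RAM / 2, 1 GB), min(RAM / 4, 8 GB)).
final class HeapSizing {
    static long defaultMaxHeapMb(long systemMemoryMb) {
        long half = Math.min(systemMemoryMb / 2, 1024);    // half of RAM, capped at 1 GB
        long quarter = Math.min(systemMemoryMb / 4, 8192); // quarter of RAM, capped at 8 GB
        return Math.max(half, quarter);
    }
}
```

With 4096 MB of RAM, half is capped at 1024 and quarter is also 1024, so the heap is 1024 MB. A 16 GB node would get 4 GB.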

Re: read time coprocessor?

2015-12-11 Thread DuyHai Doan
The UDF (User Defined Function) and UDA (User Defined Aggregate) features introduced in Cassandra 2.2 are the closest thing to HBase coprocessors: 1. They are real time, in the sense that they are applied on the fly right after fetching data from C*. 2. The computation is done on the

Re: cassandra reads are unbalanced

2015-12-02 Thread DuyHai Doan
) = 1.45k
Node 1 (DC2) = 2.06k (seeder)
Node 2 (DC2) = 1.38k
Node 3 (DC2) = 1.43k

Re: cassandra reads are unbalanced

2015-12-02 Thread DuyHai Doan
Which consistency level do you use for reads? ONE? Are you reading only from DC1 or from both DCs? What LoadBalancingPolicy have you configured for your driver? TokenAware wrapped around DCAwareRoundRobin? On Wed, Dec 2, 2015 at 3:36 PM, Walsh, Stephen

Re: list data value multiplied x2 in multi-datacenter environment

2015-11-26 Thread DuyHai Doan
ixed or planned to be fixed? We are using C* 2.0.14. I didn't find any JIRA ticket concerning the issue. Regards,

Re: list data value multiplied x2 in multi-datacenter environment

2015-11-25 Thread DuyHai Doan
There were several bugs in the past related to lists in CQL. Indeed, the timestamps used for list columns are computed server-side using a special algorithm. I wonder whether, in case of read repair and/or hinted handoff, the original timestamp (the timestamp generated by the coordinator at the first

Re: Hotspots on Time Series based Model

2015-11-17 Thread DuyHai Doan
"Will the partition on PRIMARY KEY ((YEAR, MONTH, DAY, HOUR) cause any hotspot issues on a node given the hourly data size is ~13MB?" --> 13MB/partition is quite small, you should be fine. One thing to be careful about is the memtable flush frequency and appropriate compaction tuning to avoid having one
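A small sketch of how such an hourly partition key might be derived from an event timestamp. Bucketing in UTC is an assumption (any fixed zone works as long as every writer uses the same one), and the class name is hypothetical. At ~13MB per hour, a day of data is spread over 24 partitions, about 312MB, while all writes for the current hour go to the same partition, hence to the same replicas.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Sketch: derive the (year, month, day, hour) partition key from an event time.
final class HourBucket {
    final int year, month, day, hour;

    HourBucket(int year, int month, int day, int hour) {
        this.year = year;
        this.month = month;
        this.day = day;
        this.hour = hour;
    }

    /** Buckets in UTC so that every writer computes the same key (an assumption). */
    static HourBucket of(Instant eventTime) {
        ZonedDateTime t = eventTime.atZone(ZoneOffset.UTC);
        return new HourBucket(t.getYear(), t.getMonthValue(), t.getDayOfMonth(), t.getHour());
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof HourBucket)) return false;
        HourBucket b = (HourBucket) o;
        return year == b.year && month == b.month && day == b.day && hour == b.hour;
    }

    @Override
    public int hashCode() {
        return ((year * 13 + month) * 32 + day) * 24 + hour;
    }

    @Override
    public String toString() {
        return String.format("(%d, %d, %d, %d)", year, month, day, hour);
    }
}
```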

Re: Spark on cassandra

2015-11-12 Thread DuyHai Doan
Hello Prem, I believe it's better to ask your question on the mailing list of the Spark Cassandra connector: http://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user Second, regarding "we need to join multiple table from multiple keyspaces. How can we do that?", the answer is given in your

Re: Cassandra 2.0 Batch Statement for timeseries schema

2015-11-05 Thread DuyHai Doan
"Get me the count of orders changed in a given sequence-id range" --> Can you give an example SELECT statement for this query? Because given the table structure, you have to provide the shard-and-date partition key, and I don't see how you can know this value unless you create as many SELECT
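To illustrate the fan-out, here is a sketch that enumerates every (shard, date) partition a range query would have to hit, one SELECT per partition. The shard numbering 0..N-1, the "shard:date" key rendering and the class name are hypothetical; the real table's key columns are not shown in the thread.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Sketch: with a (shard, date) partition key, a range query fans out to
// one SELECT per shard per day. Shard numbering 0..shardCount-1 is an assumption.
final class ShardDateFanOut {
    static List<String> partitionKeys(int shardCount, LocalDate from, LocalDate to) {
        List<String> keys = new ArrayList<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            for (int shard = 0; shard < shardCount; shard++) {
                keys.add(shard + ":" + d); // one SELECT per key
            }
        }
        return keys;
    }
}
```

Four shards over three days already means 12 SELECTs, and the count grows linearly with both the shard count and the date range.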

[Announcement]: Achilles 4 released

2015-10-30 Thread DuyHai Doan
Hello all, I am pleased to announce the release of Achilles 4.0.0. After 6 months of incubation, the latest version is out. This new version is a *complete rewrite* of the framework, using *compile-time code generation* to provide *better type-safety*, a fluent *DSL for queries* as well as latest

Re: Find partition row of Compacted partition maximum bytes

2015-10-26 Thread DuyHai Doan
From C* 2.2.x:
> nodetool help toppartitions
NAME
        nodetool toppartitions - Sample and print the most active partitions for a given column family
On Mon, Oct 26, 2015 at 7:54 AM, qihuang.zheng wrote: > I use nodetool cfstats to see table’s

Re: Object Mapping VS Direct Queries

2015-10-22 Thread DuyHai Doan
Cons:
- depending on the object mapper and the features, you may (or may not) have some slight overhead at runtime
- the CQL query may be "hidden" from the developer, though some object mappers like Achilles have an option to display DML statements in the logs
Pros:
- make your life easier by

Re: C* Table Changed and Data Migration with new primary key

2015-10-22 Thread DuyHai Doan
Use Spark to distribute the job of copying data across the cluster and help accelerate the migration. The Spark connector does automatic paging in the background with the Java driver. On 22 Oct 2015 11:03, "qihuang.zheng" wrote: > I tried using the Java driver with

Re: Data visualization tools for Cassandra

2015-10-20 Thread DuyHai Doan
For more info about Zeppelin, look at my recent presentation slides at Apache Big Data Europe: http://events.linuxfoundation.org/sites/events/files/slides/Apache%20Zeppelin%20-%20The%20missing%20component%20for%20the%20BigData%20ecosystem.pdf The most up-to-date branch to play with

Re: Data visualization tools for Cassandra

2015-10-20 Thread DuyHai Doan
zeppelin node. Is that true? > We already have a DSE Cassandra cluster deployed in 2 DCs with Spark enabled on all of the Cassandra nodes (via DSE configuration). Can I install Zeppelin on a different cluster and connect it to Spark and Cassandra on the remote cluster (vi

Re: Some love for multi-partition LWT?

2015-09-08 Thread DuyHai Doan
pic. Look at what is in acmqueue and you'll see what others are saying is accurate. >>> To guarantee it, you need a distributed lock or a different design like Datomic. Look at what Rich Hickey has done with Datomic

Re: Some love for multi-partition LWT?

2015-09-08 Thread DuyHai Doan
o you see such need for lock? > On 8 Sep 2015 00:05, "DuyHai Doan" <doanduy...@gmail.com> wrote: >> Multi partitions LWT is not supported currently on purpose. To support it, we would have to emulate a distributed lock which is pretty bad for perf

Re: Some love for multi-partition LWT?

2015-09-07 Thread DuyHai Doan
Multi-partition LWT is currently not supported, on purpose. To support it, we would have to emulate a distributed lock, which is pretty bad for performance. On Mon, Sep 7, 2015 at 10:38 PM, Marek Lewandowski <marekmlewandow...@gmail.com> wrote: > Hello there, > would you be interested in

Re: Order By limitation or bug?

2015-09-03 Thread DuyHai Doan
Limitation, not a bug. The reason? On disk, data are sorted by type first, and FOR EACH type value, the data are sorted by id. So to ORDER BY id, C* would need to perform an in-memory re-ordering; not sure how bad that would be for performance. In any case it's currently not possible; maybe you

Re: Order By limitation or bug?

2015-09-03 Thread DuyHai Doan
It's normal: type is the FIRST clustering column, so on disk data are naturally sorted first by "type". C* does not have to perform any sorting in memory. And when you use "ORDER BY type DESC", it's still not sorted in memory; C* just does a backward scan on disk starting from the
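A small in-memory model of one partition clustered by (type, id) shows the three cases. Rows are kept in on-disk order; ORDER BY type DESC is served by reading that sequence backwards (which also reverses id within each type), while an ORDER BY on id alone would need a full re-sort in memory, which C* does not do. Class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Sketch of one partition whose clustering columns are (type, id).
final class ClusteringOrderSketch {
    static final class Row {
        final String type;
        final int id;

        Row(String type, int id) {
            this.type = type;
            this.id = id;
        }

        @Override
        public String toString() {
            return type + "/" + id;
        }
    }

    /** On-disk order: by type first, then by id within each type. */
    static final Comparator<Row> ON_DISK =
        Comparator.<Row, String>comparing(r -> r.type).thenComparingInt(r -> r.id);

    static List<Row> onDisk(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(ON_DISK);
        return sorted;
    }

    /** ORDER BY type DESC: read the on-disk sequence backwards, no re-sort needed. */
    static List<Row> backwardScan(List<Row> onDisk) {
        List<Row> reversed = new ArrayList<>(onDisk);
        Collections.reverse(reversed);
        return reversed;
    }

    /** ORDER BY id alone: needs a full in-memory re-sort (what C* does not do). */
    static List<Row> reSortById(List<Row> onDisk) {
        List<Row> sorted = new ArrayList<>(onDisk);
        sorted.sort(Comparator.comparingInt(r -> r.id));
        return sorted;
    }
}
```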

Re: lightweight transactions with potential problem?

2015-08-27 Thread DuyHai Doan
Step 10 is wrong: N3, N4 and N5 have accepted the ballot from N2, which is higher than the ballot from N1, so N3, N4 and N5 should reject the request at step 9). The fact that there is an extra round trip for Read/Result at steps 5) to 8) does not matter and does not interfere with the correctness of the Paxos
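The rule applied at step 9) is the standard acceptor rule of single-decree Paxos. The sketch below is a textbook acceptor, not Cassandra's implementation: ballots are plain longs (Cassandra uses timeuuid ballots), and the previously accepted value that a real promise carries back to the proposer is omitted for brevity.

```java
// Minimal single-decree Paxos acceptor (textbook sketch, not Cassandra's code).
final class PaxosAcceptor {
    private long promisedBallot = -1; // highest ballot this acceptor has promised
    private long acceptedBallot = -1; // ballot of the last accepted proposal
    private String acceptedValue;     // value of the last accepted proposal

    /** Phase 1 (prepare/promise): promise only ballots strictly higher than any seen. */
    synchronized boolean prepare(long ballot) {
        if (ballot <= promisedBallot) return false;
        promisedBallot = ballot;
        return true;
    }

    /** Phase 2 (propose/accept): reject any ballot lower than the one promised. */
    synchronized boolean accept(long ballot, String value) {
        if (ballot < promisedBallot) return false;
        promisedBallot = ballot;
        acceptedBallot = ballot;
        acceptedValue = value;
        return true;
    }

    synchronized long acceptedBallot() {
        return acceptedBallot;
    }

    synchronized String acceptedValue() {
        return acceptedValue;
    }
}
```

Replaying the scenario on one acceptor: it promises N1's ballot 1, then N2's ballot 2, accepts N2's proposal, and must refuse N1's late proposal with ballot 1.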

Re: lightweight transactions with potential problem?

2015-08-25 Thread DuyHai Doan
The rationale for the last commit/ack phase is to persist the chosen value (here, the mutation) in durable storage (here, Cassandra) and to reset this value to allow another round of Paxos. More explanation in this blog post: http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0

Re: Null pointer exception after delete in a table with statics

2015-08-18 Thread DuyHai Doan
Weird, your issue reminds me of https://issues.apache.org/jira/browse/CASSANDRA-8502 but it seems that it has been fixed since 2.1.6 and you're using 2.1.8. Can you try to reproduce it using a small page size with Spark (spark.cassandra.input.fetch.size_in_rows)? On Tue, Aug 18, 2015 at 11:50 AM,

Re: linearizable consistency / Paxos ?

2015-08-03 Thread DuyHai Doan
what is the fundamental difference between the standard replication protocol and Paxos that prevents us from implementing a 2-pc on top of the standard protocol? -- for a more detailed description of Paxos, look here:

Re: phi_convict_threshold

2015-08-03 Thread DuyHai Doan
You can read this paper, which explains the Phi Accrual Failure Detection algorithm: http://vsedach.googlepages.com/HDY04.pdf The phi value is the level of suspicion that a server might be down; phi_convict_threshold is the suspicion level above which a server is marked down. On Mon, Aug 3, 2015 at 1:29 PM,
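A minimal sketch of the computation, under one stated simplification: heartbeat intervals are treated as exponentially distributed, so the probability of still waiting after t ms is exp(-t/mean) and phi = -log10(exp(-t/mean)) = t / (mean * ln 10). As far as I know Cassandra's detector uses this same simplification, whereas the paper models intervals with a normal distribution. The class name, window size and timings are illustrative; the default phi_convict_threshold is 8.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Phi accrual sketch assuming exponentially distributed heartbeat intervals:
// P(no heartbeat yet after t) = exp(-t / mean), so phi = -log10(exp(-t / mean)).
final class PhiAccrualSketch {
    private final Deque<Long> intervals = new ArrayDeque<>();
    private final int windowSize;
    private long lastHeartbeatMillis = -1;

    PhiAccrualSketch(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Records a gossip heartbeat arrival and the interval since the previous one. */
    void heartbeat(long nowMillis) {
        if (lastHeartbeatMillis >= 0) {
            intervals.addLast(nowMillis - lastHeartbeatMillis);
            if (intervals.size() > windowSize) intervals.removeFirst();
        }
        lastHeartbeatMillis = nowMillis;
    }

    /** Suspicion level: grows linearly with the time elapsed since the last heartbeat. */
    double phi(long nowMillis) {
        if (intervals.isEmpty()) return 0.0; // no history yet, nothing to suspect
        double mean = 0;
        for (long i : intervals) mean += i;
        mean /= intervals.size();
        double t = nowMillis - lastHeartbeatMillis;
        return (t / mean) / Math.log(10.0); // equals -log10(exp(-t / mean))
    }

    /** A node is convicted (marked down) once phi exceeds the threshold. */
    boolean isDown(long nowMillis, double phiConvictThreshold) {
        return phi(nowMillis) > phiConvictThreshold;
    }
}
```

With heartbeats every second, phi crosses 8 after roughly 8 * ln 10 ≈ 18.4 seconds of silence; raising phi_convict_threshold lengthens that delay proportionally.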

Re: linearizable consistency / Paxos ?

2015-08-03 Thread DuyHai Doan
work with the standard replication protocol, don't we also get transaction behavior? On Mon, Aug 3, 2015 at 12:14 AM, DuyHai Doan doanduy...@gmail.com wrote: what is the fundamental difference between the standard replication protocol and Paxos that prevents us from implementing a 2-pc on top

Re: phi_convict_threshold

2015-08-03 Thread DuyHai Doan
Yes, it is used by gossip. On Mon, Aug 3, 2015 at 3:33 PM, Thouraya TH thouray...@gmail.com wrote: OK, I will look at this reference. Thank you so much for the answer. So I understand that this value is used by the gossip protocol? Is that it? Best regards. 2015-08-03 13:41 GMT+01:00 DuyHai Doan doanduy

[Ann] Cassandra Interpreter for Zeppelin

2015-07-22 Thread DuyHai Doan
Hello, I'm pleased to announce a Cassandra interpreter for Apache Zeppelin. For those who don't know, Apache Zeppelin[1] is a web-based notebook that enables interactive data analytics. It is similar to IPython/Jupyter but is JVM-based, and its architecture is modular enough to allow various

Re: Batch isolation within a single partition

2015-05-19 Thread DuyHai Doan
, thanks for your answer. What if I set RF > 1 and the consistency level for reads and writes to QUORUM? Would that isolate the single-partition batch update from reads? (I do not consider node failures here between the write and the read(s)). On 19.05.15 07:50, DuyHai Doan wrote: Hello Martin

Re: cqlsh ValueError: Don't know how to parse type string

2015-05-19 Thread DuyHai Doan
Hello Kaushal, hmm, your schema is using the ancient DynamicCompositeType. Can you just send a describe of the table from cassandra-cli? On Tue, May 19, 2015 at 9:44 AM, Kaushal Shriyan kaushalshri...@gmail.com wrote: Hi, I am running cassandra version 1.2.19 and cqlsh version 3.1.8 in my

Re: Batch isolation within a single partition

2015-05-18 Thread DuyHai Doan
Hello Martin. If, and only if, you have RF=1, single-partition mutations (including batches) are isolated. Otherwise, with RF > 1, even a simple UPDATE is not isolated, because one client can read the updated value on one replica while another client reads the old value on another replica. On Mon, May

Re: Viewing Cassandra's Internal table Structure in a CQL world

2015-05-13 Thread DuyHai Doan
I think you can still use cassandra-cli from 2.0.x to look into the internal table structure. Of course, you will see bytes instead of readable values, but it's better than nothing. It's already the case for CQL collections when you try to decode them using cassandra-cli. On Wed, May 13, 2015

Re: Inserting null values

2015-04-29 Thread DuyHai Doan
The problem of NULL inserts was solved a long time ago with the Insert Strategy in Achilles: https://github.com/doanduyhai/Achilles/wiki/Insert-Strategy However, it's nice to see there will be a flag on the protocol side to handle this problem. On Wed,

Re: Can cqlsh COPY command be run through

2015-04-07 Thread DuyHai Doan
Short answer is no. Whenever you access the session object of the Java driver directly (using withSessionDo{...}), you bypass the data locality optimisation made by the connector On Sun, Apr 5, 2015 at 9:53 AM, Tiwari, Tarun tarun.tiw...@kronos.com wrote: Hi, I am looking for, if the

Re: DSE 4.6 with OpsCenter 5.1.1, agent can't start, port 9042 is occupied by DSE

2015-04-07 Thread DuyHai Doan
I think the problem is with the IP address. Cassandra does listen on 192.168.56.30, and your agent log complains about not being able to connect to 127.0.0.1. Worth investigating there. On Sun, Apr 5, 2015 at 3:47 PM, Serega Sheypak serega.shey...@gmail.com wrote: Hi, getting weird problem when

Re: How much disk is needed to compact Leveled compaction?

2015-04-06 Thread DuyHai Doan
If you have SSDs, you can afford to switch to the leveled compaction strategy, which requires much less than 50% of the current dataset size as free space. On 5 Apr 2015 19:04, daemeon reiydelle daeme...@gmail.com wrote: You appear to have multiple java binaries in your path. That needs to be

Re: Delayed events processing / queue (anti-)pattern

2015-03-24 Thread DuyHai Doan
Some ideas I throw in here: "The delay Y will be at least 1 minute, and at most 90 days with a resolution per minute" --> Use the delay (with format YYYYMMDDHHMM as a bigint, since 12 digits do not fit in a 32-bit int) as your partition key. Example: today, March 24th at 12:00 (201503241200), you need to delay 3 actions, action A in exact 3
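A sketch of the bucket computation: the partition key is the minute at which the action must fire, encoded as yyyyMMddHHmm in a Java long (CQL bigint). It assumes all writers share one time zone, and the class name is hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Sketch: partition key = minute at which the delayed action must fire, as yyyyMMddHHmm.
// Twelve digits overflow a 32-bit int, hence a long. Assumes one shared time zone.
final class DelayBucket {
    private static final DateTimeFormatter MINUTE = DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    static long bucketFor(LocalDateTime now, Duration delay) {
        LocalDateTime fireAt = now.plus(delay).withSecond(0).withNano(0); // per-minute resolution
        return Long.parseLong(fireAt.format(MINUTE));
    }
}
```

A worker then only needs to poll the partition for the current minute; a 3-minute delay from March 24th 12:00 lands in partition 201503241203, and the 90-day maximum lands in 201506221200.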

Re: Writing to multiple tables

2015-03-16 Thread DuyHai Doan
"Is BATCH the recommended way of updating all three tables at one go so that the information between the three tables is consistent?" --> If you're thinking about atomicity, no, it's not atomic. Indeed, with logged batches, what you gain is automatic retries done for you by the coordinator in case of

Re: Inconsistent count(*) and distinct results from Cassandra

2015-03-12 Thread DuyHai Doan
First idea to rule out any issue with stale data: issue the same count query with CL=QUORUM and check whether there are still inconsistencies. On Tue, Mar 10, 2015 at 9:13 AM, Rumph, Frens Jan m...@frensjan.nl wrote: Hi Jens, Mikhail, Daemeon, Thanks for your replies. Sorry for my

Re: Inconsistent count(*) and distinct results from Cassandra

2015-03-04 Thread DuyHai Doan
"Is it to be expected that select count(*) from ... and select distinct partition-key-columns from ... yield inconsistent results between executions even though the table at hand isn't written to?" --> Actually, depending on the definition of your primary key, select count(*) and select distinct

Re: Running Cassandra + Spark on AWS - architecture questions

2015-02-20 Thread DuyHai Doan
"Cassandra would take care of keeping the data synced between the two sets of five nodes. Is that correct?" --> Correct. "But doing so means that we need 2x as many nodes as we need for the real-time cluster alone" --> Not necessarily. With multi-DC you can configure the replication factor value per DC,

Re: Storing bi-temporal data in Cassandra

2015-02-14 Thread DuyHai Doan
"I am trying to get the state as of a particular transaction_time" --> In that case, you should probably define your primary key with the clustering columns in another order: PRIMARY KEY (weatherstation_id, transaction_time, event_time). Then, select * from temperatures where weatherstation_id = 'foo' and
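An in-memory sketch of one partition (one weatherstation_id) clustered by (transaction_time, event_time). The "as of T" state is obtained by reading every row with transaction_time <= T in clustering order, later transactions overwriting earlier readings for the same event_time. That replay rule is an assumption about the intended bi-temporal semantics, and the names are hypothetical.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch of one partition clustered by (transaction_time, event_time).
final class BiTemporalSketch {
    // transaction_time -> (event_time -> temperature)
    private final NavigableMap<Long, NavigableMap<Long, Double>> rows = new TreeMap<>();

    void record(long transactionTime, long eventTime, double temperature) {
        rows.computeIfAbsent(transactionTime, k -> new TreeMap<>()).put(eventTime, temperature);
    }

    /** Same slice as "transaction_time <= asOf", folded in clustering order. */
    NavigableMap<Long, Double> stateAsOf(long asOfTransactionTime) {
        NavigableMap<Long, Double> state = new TreeMap<>();
        for (Map.Entry<Long, NavigableMap<Long, Double>> tx :
                rows.headMap(asOfTransactionTime, true).entrySet()) {
            state.putAll(tx.getValue()); // later transactions correct earlier readings
        }
        return state;
    }
}
```

Because transaction_time is the first clustering column, the "<= T" restriction is a single contiguous slice of the partition, which is what makes this ordering suitable for as-of queries.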

Re: How to speed up SELECT * query in Cassandra

2015-02-11 Thread DuyHai Doan
in production at my company) -- Colin Clark On Feb 11, 2015, at 6:51 AM, DuyHai Doan doanduy...@gmail.com wrote: The very nature of cassandra's distributed nature vs partitioning data on hadoop makes spark on hdfs actually faster than on cassandra

Re: How to speed up SELECT * query in Cassandra

2015-02-11 Thread DuyHai Doan
"The very nature of cassandra's distributed nature vs partitioning data on hadoop makes spark on hdfs actually faster than on cassandra" --> Prove it. Did you ever look into the source code of the Spark/Cassandra connector to see how data locality is achieved before throwing out such

Re: road map for Cassandra 3.0

2015-02-11 Thread DuyHai Doan
Look at JIRA and filter by 3.0, but it's not very accurate. There are a lot of new features scheduled for 3.0. Some of them, like User Defined Functions I guess, will make it in time for 3.0.0. Other features will be shipped with later 3.x minor versions. On Wed, Feb 11, 2015 at 1:25 PM,

Re: best supported spark connector for Cassandra

2015-02-11 Thread DuyHai Doan
Start looking at the Spark/Cassandra connector here (in Scala): https://github.com/datastax/spark-cassandra-connector/tree/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector Data locality is provided by this method:

Re: full-tabe scan - extracting all data from C*

2015-01-28 Thread DuyHai Doan
Hint: using the Java driver, you can set the fetchSize to tell the driver how many CQL rows to fetch for each page. Depending on the size (in bytes) of each CQL row, it would be useful to tune this fetchSize value to avoid loading too much data into memory for each page On Wed, Jan 28, 2015 at
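As a rough sizing aid, page memory is about fetch size times average row size, so a fetch size can be derived from a per-page memory budget. In the Java driver the value is applied with Statement.setFetchSize (the default is 5000 rows); the class name, the budget, the average row size and the cap below are illustrative inputs you would measure yourself.

```java
// Sketch: pick a fetch size so that one page stays within a memory budget.
// Pure arithmetic: nothing here talks to Cassandra.
final class FetchSizeSketch {
    static int fetchSizeFor(long pageBudgetBytes, long avgRowBytes, int maxFetchSize) {
        if (avgRowBytes <= 0) throw new IllegalArgumentException("avgRowBytes must be > 0");
        long rows = pageBudgetBytes / avgRowBytes;        // rows that fit in the budget
        return (int) Math.max(1, Math.min(rows, maxFetchSize)); // at least 1, at most the cap
    }
}
```

With a 10MB page budget, 2KB rows allow about 5120 rows (capped at 5000 here), while 100KB rows bring the page down to about 102 rows.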

Re: Cassandra row ordering best practice Modeling

2015-01-22 Thread DuyHai Doan
Hello Morgan, the data model looks reasonable. Bucketing by day will help you to scale. The only issue I can see is how to go back in time to fetch articles from previous buckets (previous days). Is it possible to have 0 articles for a country on a given day? On Thu, Jan 22, 2015 at 8:23 PM, SEGALIS

Re: Cassandra row ordering best practice Modeling

2015-01-22 Thread DuyHai Doan
:33 GMT+01:00 DuyHai Doan doanduy...@gmail.com: Hello Morgan, the data model looks reasonable. Bucketing by day will help you to scale. The only issue I can see is how to go back in time to fetch articles from previous buckets (previous days). Is it possible to have 0 articles for a country

Re: Cassandra row ordering best practice Modeling

2015-01-22 Thread DuyHai Doan
- If there is at most 1 article per day, and some days have 0, I will have to do 100+ queries to get all the posts; won't it be a little too much? 2015-01-22 20:47 GMT+01:00 DuyHai Doan doanduy...@gmail.com: well, if the current day bucket does not contain enough articles, you may need to search back
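A sketch of that backward search, with a cap on how many days to look back so that sparse days cannot trigger an unbounded number of queries. fetchDay stands in for one SELECT on a (country, day) partition and is hypothetical, as are the other names.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch: walk day buckets backwards until enough articles are collected
// or the look-back cap is reached. fetchDay is a hypothetical per-day query.
final class DayBucketWalker {
    static List<String> latestArticles(LocalDate today, int wanted, int maxDaysBack,
                                       Function<LocalDate, List<String>> fetchDay) {
        List<String> result = new ArrayList<>();
        for (int back = 0; back <= maxDaysBack && result.size() < wanted; back++) {
            for (String article : fetchDay.apply(today.minusDays(back))) { // one query per day
                if (result.size() == wanted) break;
                result.add(article);
            }
        }
        return result;
    }
}
```

The cap turns the worst case into at most maxDaysBack + 1 queries; if that is still too many for very sparse countries, a coarser bucket (week or month) is the usual alternative.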

Re: Script to count tombstones by partition key

2015-01-14 Thread DuyHai Doan
Nice, thanks Jens On Wed, Jan 14, 2015 at 4:21 PM, Jens Rantil jens.ran...@tink.se wrote: Hi all I just recently put together a small script to count the number of tombstones grouped by partition id, for one or multiple sstables: https://gist.github.com/JensRantil/063b7c56ca4a8dfe1c50 I
