jumped to that.
> User will see inconsistent page in that case as well, right? Also in such
> cases how would you design a user facing application (Cache previous pages
> at app level?)
>
> Regards,
> Bhuvan
>
> On Mon, May 9, 2016 at 4:18 PM, DuyHai Doan <doanduy...@gmail.com>
"Is it possible to just return PagingState object without returning data?"
--> No
Simply because before reading the actual data for each page of N rows, you
cannot know at which token value a page of data starts...
And it is worse than that, with paging you don't have any isolation. Let's
Hello all
I am pleased to announce the release of Achilles 4.2.0.
The biggest change is the support for type-safe function calls in the
SELECT DSL as well as UDF/UDA declaration in Achilles.
The generated DSL code enforces the type of each function call so that the
parameter types/return
Just tested against C* 3.4
CREATE TABLE IF NOT EXISTS test_table (
part timestamp,
clust timestamp,
count counter static,
PRIMARY KEY(part, clust));
and it just works.
"However, I'm not sure how that is possible, given that the updates to
partitionRowCountCol would require use
upendra.bara...@continuum.net
>
> w: continuum.net
>
>
>
>
> *From:* DuyHai Doan [mailto:doanduy...@gmail.com]
>
First, Ok.ru's support for ACID transactions uses a complete FORK of
Apache Cassandra so people using it need to maintain the fork themselves.
Second, as far as I know, they don't intend to publish the source code so
it's not really available in the public space either. The reason is that
their
E
>
> Thanks
>
>
> -
> Atul Saroha
> *Sr. Software Engineer*
> *M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
> Plot # 362, ASF Centre - Tower A, Udyog Vihar,
> Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
>
> On Sa
It's better to post your question at the dedicated mailing list for
Achilles: https://groups.google.com/forum/?hl=fr#!forum/cassandra-achilles
To enable us to help you further, please provide some DML logs at runtime:
orks in cqlsh (presuming max_int() is a UDF):
> UPDATE test_table SET data=max_int(3,4) WHERE idx='abc';
>
> So, if the grammar is not supposed to allow this, then there is a bug
> somewhere because in 3.3 it certainly seems to be parsed and executed
> without complaint.
>
You have misread the CQL doc given in the link. According to CQL update
grammar it's not possible to use UDF. I see UDF only allowed in select
clause...
On 10 March 2016 22:07, "Kim Liu" wrote:
> Hello -
> I am experimenting with User Defined Functions in Cassandra
Maybe tombstones ? Do you issue a lot of DELETE statements ? Or do you
re-insert in the same partition with different TTL values ?
On Sat, Mar 5, 2016 at 7:16 PM, Tom van den Berge wrote:
> I don't think compression can be the cause of the difference, because of
> two
> It’s not possible, it’s a PerRowSecondary index, potentially as big as the
> table itself (few TBs) it will take a very long time to drop and re-create.
>
>
>
> *--*
>
> *Jacques-Henri Berthemet*
>
>
>
> *From:* DuyHai Doan [mailto:doanduy...@gmail.com]
> *S
DROP and re-create the index with the new options
On Fri, Mar 4, 2016 at 3:45 PM, Jacques-Henri Berthemet <
jacques-henri.berthe...@genesys.com> wrote:
> Hi,
>
>
>
> I’m using Cassandra 2.2.5 with a custom secondary index. It’s created with
> the below syntax:
>
>
You're right, it's a bug
I have created an issue here to fix it here:
https://github.com/doanduyhai/Achilles/issues/240
Fortunately you can use the query API right now to insert the static
columns:
PreparedStatement ps = INSERT INTO
BoundStatement bs = ps.bind(...)
Hello all
I am pleased to announce the release of Achilles 4.1.0.
The biggest change is the support for new Cassandra 3.x Materialized View
by annotation.
Achilles also enforces constraints on your views (all primary key columns
of the base table should be in the view etc..) at compile time
tity and update
> the change only through the DynamicUpdate annotation.
> Or there is something I am missing here?.
>
> Thanks , reply will be highly appreciated
>
>
>
> -
> Atul Saroha
> *Sr. Software Engineer*
> *M*: +91 8447784271 *T*: +91 124-415-606
@Column("c1")
>>> @Static
>>> private String c1;
>>>
>>> @Column("c2")
>>> @Static
>>> private Boolean c2;
>>>
>>>
>>> @Column("c3")
>>> @ClusteringColumn(1)
>>> private String c3;
>>>
>>> @Column("c4")
>>>
For advanced object mapping you can look also at Achilles: www.achilles.io
On Wed, Feb 10, 2016 at 3:21 PM, Jim Ancona wrote:
> Recent versions of the Datastax Java Driver include an object mapping API
> that might work for you:
>
>
It's not possible to have multiple keys in the CONTAINS KEY clause
Right now it is not possible to use UDF in WHERE clause, it may eventually
be possible one day
But you can use UDF in the Select clause to filter out data. In this case,
you'll need to wait for JIRA
Look at Achilles and how it models Partition key & clustering columns:
https://github.com/doanduyhai/Achilles/wiki/5-minutes-Tutorial#clustered-entities
On Tue, Feb 9, 2016 at 12:48 PM, Atul Saroha
wrote:
> I know the most popular ORM api
>
>1. Kundera :
>
>
I think you should direct your request to the java driver mailing list:
https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user
To answer your question, no, there is no @Ttl annotation on the
driver-mapping module, even in the latest release:
a Non-Compact CF with static columns and collections ?
>
> Thanks
> Anuj
>
> --------
> On Tue, 2/2/16, DuyHai Doan <doanduy...@gmail.com> wrote:
>
> Subject: Re: Moving Away from Compact Storage
> To: user@cassandra.apache.org
ave some dynamic text data which we are planning to add in
> collections..
>
> Please let me know if you need more details..
>
>
> Thanks
> Anuj
> Sent from Yahoo Mail on Android
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>
> On Wed, 3 Feb, 201
.co.in> wrote:
> Will it be possible to read dynamic columns data from compact storage and
> transform them as collection e.g. map in new table?
>
>
> Thanks
> Anuj
>
> Sent from Yahoo Mail on Android
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>
> On Wed, 3 Feb
Why don't you use server-side paging feature instead of messing with tokens
?
http://datastax.github.io/java-driver/manual/paging/
On Wed, Feb 3, 2016 at 7:36 AM, Priyanka Gugale wrote:
> Hi,
>
> I am using Cassandra 2.2.0 and cassandra driver 2.1.8. I am trying to scan
> a
Use Apache Spark to parallelize the data migration. Look at this piece of
code
https://github.com/doanduyhai/Cassandra-Spark-Demo/blob/master/src/main/scala/usecases/MigrateAlbumsData.scala#L58-L60
If your source and target tables have the SAME structure (except for the
COMPACT STORAGE clause),
You need to upgrade first to C* 2.2 before migrating to C* 3.x
For each version, read the NEWS.txt file and follow the procedure:
From 2.1.x to 2.2.x:
https://github.com/apache/cassandra/blob/cassandra-2.2/NEWS.txt
From 2.2.x to 3.x:
This data model should do the job
CREATE TABLE data (
    key uuid,
    value1 text static,
    value2 text static,
    ...
    valueN text static,
    mapKey text,
    mapValue double,
    PRIMARY KEY (key, mapKey)
);
Warning, value1... valueN being static, there will be a 1:1 relationship
between
You can read this: http://www.doanduyhai.com/blog/?p=1876 and this:
http://www.doanduyhai.com/blog/?p=2015
Long story short, UDF and UDA computation in Cassandra is not distributed.
All the values are retrieved first on the coordinator node (to apply the
last write win reconciliation logic)
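
To make the point concrete, here is a toy sketch in Python (not Cassandra code). The replica responses, timestamps and the function names `reconcile` and `coordinator_sum` are all made up for illustration. It only shows the order of operations: the coordinator first keeps the most recent value per row, and only then applies the aggregate.

```python
def reconcile(replica_responses):
    """Last-write-wins: for each row key keep the value with the highest timestamp."""
    winners = {}
    for response in replica_responses:
        for key, (value, timestamp) in response.items():
            if key not in winners or timestamp > winners[key][1]:
                winners[key] = (value, timestamp)
    return {key: value for key, (value, _) in winners.items()}


def coordinator_sum(replica_responses):
    # The aggregate runs only after reconciliation, on the coordinator.
    return sum(reconcile(replica_responses).values())


# Hypothetical responses: row key -> (value, write timestamp).
replica_1 = {"row1": (10, 100), "row2": (5, 100)}
replica_2 = {"row1": (12, 200), "row2": (5, 100)}  # newer write for row1
print(coordinator_sum([replica_1, replica_2]))  # 17
```

Because the aggregate needs the reconciled values, every raw value has to travel to the coordinator before anything is summed.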
Hello Otis
The Sematext tools, is it free or not ? And if not free, is there a
"limited" open-source version ?
On Tue, Jan 26, 2016 at 3:39 PM, Otis Gospodnetić <
otis.gospodne...@gmail.com> wrote:
> Hi,
>
> As Julien pointed out, there is a good OpsCenter alternative at
>
For the warning about system conf, look at the official production settings
recommendations:
http://docs.datastax.com/en/cassandra/3.x/cassandra/install/installRecommendSettings.html
On Sat, Jan 23, 2016 at 10:30 AM, Arthur Chan
wrote:
> HI,
>
> My version is 3.2.1,
A quick update on this issue.
Today, when playing with UDA, I had also the exception:
java.security.AccessControlException: access denied
("java.io.FilePermission" "/x/logback.xml" "read")"
What is definitely strange is that by re-executing again the query, same
query, it works. I
Interesting, maybe it is worth filing a JIRA. Empty tables should not slow
down compaction of other tables
On Sat, Jan 16, 2016 at 10:33 AM, Shuo Chen wrote:
> Hi, Robert,
>
> I think I found the cause of the too many compactions. I used jmap to dump
> the heap and used
"As soon as inserting started, one node started non-stop full GC. The other
two nodes were totally fine"
Just a guess, how did you insert data ? Did you use Batch statements ?
On Sat, Jan 16, 2016 at 10:12 PM, Kai Wang wrote:
> Hi,
>
> Recently I saw some strange behavior on
Try
SELECT * FROM your_table WHERE solr_query='json:"*100 ABC Street*"';
Warning: since you're storing in JSON format, searching data inside a JSON
is equivalent to a wildcard search *xxx* and it is quite expensive, even for
full text search engines like Solr
On Wed, Jan 13, 2016 at 2:50 PM,
What is your Cassandra version ? In earlier versions there were some issues
with streaming that can make the joining process stuck.
On Mon, Jan 11, 2016 at 6:57 AM, Carlos A wrote:
> Hello all,
>
> I have a small dev environment with 4 machines. One of them, I had it
>
There are 2 consistency levels you can define on your query when
using Lightweight Transaction:
- one for the Paxos round: SERIAL or LOCAL_SERIAL (which indeed corresponds
to QUORUM/LOCAL_QUORUM but named differently so people do not get confused)
- one for the consistency of the
rom the base table as the MV
> PK. But still working fine.
>
>
> That is my first Cassandra use case and the guidance provided by you guys
> pretty important.
>
> Thanks very much for the answers, questions and suggestions.
>
>
> --
> IPVP
>
>
> From:
ion.user_contact (
>>> userid int,
>>> contactname text,
>>> contactid int,
>>> createdat timeuuid,
>>> favoriteat timestamp,
>>> isfavorite boolean,
>>> objectid timeuuid,
>>> PRIMARY KEY (userid, contactname)
>>> ) WITH C
Oh, sorry, did not notice the version in the title. Did you check the
system.log to verify if there isn't any Exception related to data streaming
? What is the output of "nodetool tpstats" ?
On Tue, Jan 12, 2016 at 1:00 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
> Wha
t contains multiple statements.
>
>
> On Mon, Jan 11, 2016 at 3:06 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> Achilles 4.x does offer an embedded Cassandra server support with some
>> utility classes like ScriptExecutor. It supports C* 2.2 currently :
>>
> is_favourite IS TRUE
>> instead of
>> is_favourite IS NOT NULL?
>>
>> Carlos Alonso | Software Engineer | @calonso
>> <https://twitter.com/calonso>
>>
>> On 10 January 2016 at 09:59, DuyHai Doan <doanduy...@gmail.com> wrote:
>>
>
Achilles 4.x does offer an embedded Cassandra server support with some
utility classes like ScriptExecutor. It supports C* 2.2 currently :
https://github.com/doanduyhai/Achilles/wiki/CQL-embedded-cassandra-server
On 11 Jan 2016 20:47, "Richard L. Burton III" wrote:
>
Try this
CREATE TABLE communication.user_contact_list (
user_id uuid,
contact_id uuid,
contact_name text,
created_at timeuuid,
is_favorite boolean,
favorite_at timestamp,
PRIMARY KEY (user_id, contact_name, contact_id)
);
CREATE MATERIALIZED VIEW
Check protocol version when you create your Cluster object on the client
side
On 30 Dec 2015 13:33, "ssiv...@gmail.com" wrote:
> I've just tried to use cassandra-driver-core-3.0.0_rc1 and
> cassandra-driver-core-3.0.0_beta1 with C* 2.2.4 (cassandra-all-2.2.4). And
>
>
> The behavior is similar with Cassandra 3.0 as well: on the same set of
> days, the query sometimes succeeds, fails most times. Would trying the
> Datastax distribution offer any better chances?
>
> Thanks,
> Dinesh.
>
>
> On 12/24/2015 2:59 AM, DuyHai Doan wrote:
>
reconcile over QUORUM replicas, the query may timeout very quickly.
On Fri, Dec 18, 2015 at 5:26 PM, Tyler Hobbs <ty...@datastax.com> wrote:
>
> On Fri, Dec 18, 2015 at 9:17 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> Cassandra will perform a full table
For cross-cluster operation with the Spark/Cassandra connector, you can
look at this trick:
http://www.slideshare.net/doanduyhai/fast-track-to-getting-started-with-dse-max-ing/64
On Mon, Dec 21, 2015 at 1:14 PM, George Sigletos
wrote:
> Roughly half TB of data.
>
> There
You can have a look at academy.datastax.com. There are tons of videos,
courses and materials to learn Cassandra.
On 19 Dec 2015 10:21, "Akhil Mehra" wrote:
> What are some things you wish you knew when you started learning Apache
> Cassandra.
>
> What are some of the
Hello
There are 2 details that are important here:
1. The node has only 4Gb of RAM
2. However, the aggregation on all ~45 rows always fails, sometimes
immediately, sometimes after 30-60 seconds:
The consequence of point 1 is that the JVM Heap size is small: 1Gb
The formulae to compute max
The new UDF (User Defined Function) and UDA (User Defined Aggregate)
introduced in Cassandra 2.2 are the features closest to HBase co-processors.
1. They are real time, in the sense that they are applied right away on the
fly after fetching data from C*
2. The computation is done on the
) = 1.45k
>
>
>
> Node 1 (DC2) = 2.06k (seeder)
>
> Node 2 (DC2) = 1.38k
>
> Node 3 (DC2) = 1.43k
>
>
>
>
>
> *From:* DuyHai Doan [mailto:doanduy...@gmail.com]
> *Sent:* 02 December 2015 14:22
> *To:* user@cassandra.apache.org
> *Subjec
Which Consistency level do you use for reads ? ONE ? Are you reading from
only DC1 or from both DC ?
What is the LoadBalancingStrategy you have configured for your driver ?
TokenAware wrapped on DCAwareRoundRobin ?
On Wed, Dec 2, 2015 at 3:36 PM, Walsh, Stephen
ixed or planned to be fixed?
> We are using C* 2.0.14.
>
> I didn't find any jira ticket concerning the issue.
> Regards,
>
>
> ------
> *From:* DuyHai Doan
> *Sent:* Wednesday, November 25, 2015 9:39:40 PM
>
> *To:* user@cassandra.apache.or
There were several bugs in the past related to lists in CQL.
Indeed the timestamps used for list columns are computed server side using a
special algorithm. I wonder if in case of read-repair or/and
hinted-handoff, would the original timestamp (the timestamp generated by
the coordinator at the first
"Will the partition on PRIMARY KEY ((YEAR, MONTH, DAY, HOUR) cause any
hotspot issues on a node given the hourly data size is ~13MB ?"
13MB/partition is quite small, you should be fine. One thing to be careful
about is the memtable flush frequency and appropriate compaction tuning to avoid
having one
Hello Prem
I believe it's better to ask your question on the ML of the Spark Cassandra
connector:
http://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user
Second "we need to join multiple table from multiple keyspaces. How can we
do that?", the response is given in your
""Get me the count of orders changed in a given sequence-id range"" --> Can
you give an example of SELECT statement for this query ?
Because given the table structure, you have to provide the shard-and-date
partition key and I don't see how you can know this value unless you create
as many SELECT
Hello all
I am pleased to announce the release of Achilles 4.0.0
After 6 months of incubation, the latest version is out. This new version
is a *complete rewrite* of the framework, using *compile-time code
generation* to provide *better type-safety*, fluent *DSL for query* as well
as latest
From C* 2.2.x
> nodetool help toppartitions
NAME
nodetool toppartitions - Sample and print the most active
partitions for
a given column family
On Mon, Oct 26, 2015 at 7:54 AM, qihuang.zheng wrote:
> I use nodetool cfstats to see table’s
Cons:
- depending on the object mapper and the features, you may (or may not)
have some slight overhead at runtime
- the CQL query may be "hidden" from the developer, though some object
mappers like Achilles have an option to display DML statements in the logs
Pros:
- make your life easier by
Use Spark to distribute the job of copying data all over the cluster and
help accelerate the migration. The Spark connector does auto paging in
the background with the Java Driver
On 22 Oct 2015 11:03, "qihuang.zheng" wrote:
> I tried using java driver with
For more info about Zeppelin, look at my recent presentation slides at
Apache Big Data Europe:
http://events.linuxfoundation.org/sites/events/files/slides/Apache%20Zeppelin%20-%20The%20missing%20component%20for%20the%20BigData%20ecosystem.pdf
The most up-to-date branch to play with
zeppelin node. Is that true?
> We already have DSE cassandra cluster deployed in 2 DCs with spark enabled
> on all of the cassandra nodes (via DSE configuration).
> Can I install zeppelin on a different cluster and connect it to spark and
> cassandra on the remote cluster (vi
pic. Look at what is in
>>> acmqueue and you'll see what others are saying is accurate.
>>>
>>> To guarantee you need a distributed lock or a different design like
>>> datomic. Look at what rich hickey has done with datomic
>>>
>>>
>>
o you see such need for
> lock?
> On 8 Sep 2015 00:05, "DuyHai Doan" <doanduy...@gmail.com> wrote:
>
>> Multi partitions LWT is not supported currently on purpose. To support
>> it, we would have to emulate a distributed lock which is pretty bad for
>> perf
Multi partitions LWT is not supported currently on purpose. To support it,
we would have to emulate a distributed lock which is pretty bad for
performance.
On Mon, Sep 7, 2015 at 10:38 PM, Marek Lewandowski <
marekmlewandow...@gmail.com> wrote:
> Hello there,
>
> would you be interested in
Limitation, not bug. The reason ?
On disk, data are sorted by type first, and FOR EACH type value, the data
are sorted by id.
So to do an order by Id, C* will need to perform an in-memory re-ordering,
not sure how bad it is for performance. In any case currently it's not
possible, maybe you
It's normal, type is the FIRST clustering column so on disk, data are
sorted first by "type" naturally. C* does not have to perform any sorting
in memory.
And when you're using "order by type DESC", it's still not sorted in
memory, C* is just doing a backward-scan on disk starting from the
Step 10 is wrong
N3, N4 and N5 have accepted ballot from N2 which is higher than ballot from
N1 so N3, N4 and N5 should reject request at step 9)
The fact that there is an extra round trip for Read/Result at steps 5) to
8) does not matter and does not interfere with the correctness of the Paxos
The rationale of the last commit/ack phase is to set the chosen value (here
the mutation) in a durable storage (here into Cassandra) and reset this
value to allow another round of Paxos.
More explanation in this blog post:
http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
Weird, your issue reminds me of
https://issues.apache.org/jira/browse/CASSANDRA-8502 but it seems that it
has been fixed since 2.1.6 and you're using 2.1.8
Can you try to reproduce it using small page with Spark
(spark.cassandra.input.fetch.size_in_rows)
?
On Tue, Aug 18, 2015 at 11:50 AM,
what is the fundamental difference between the standard replication
protocol and Paxos that prevents us from implementing a 2-pc on top of the
standard protocol?
-- for a more detailed description of Paxos, look here:
You can read this paper that explains the Phi Accrual Failure detection
algo: http://vsedach.googlepages.com/HDY04.pdf
The phi value is the level of suspicion that a server might be down.
phi_convict_threshold = suspicion level above which a server is marked down
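
A rough way to picture this is the small Python sketch below. It is not the code Cassandra runs: it assumes heartbeat intervals follow an exponential distribution (a simplification compared to the paper), and the function names and the threshold value 8.0 are only illustrative. Under that assumption, the probability that the next heartbeat is still to come after t seconds is exp(-t / mean), and phi is minus the log10 of that probability.

```python
import math


def phi(elapsed_since_last_heartbeat, mean_interval):
    """Suspicion level, assuming exponentially distributed heartbeat intervals.

    P(next heartbeat arrives later than t) = exp(-t / mean)
    phi = -log10(P) = t / (mean * ln(10))
    """
    return elapsed_since_last_heartbeat / (mean_interval * math.log(10))


def is_convicted(elapsed, mean_interval, phi_convict_threshold=8.0):
    # The node is marked down once suspicion exceeds the threshold.
    return phi(elapsed, mean_interval) > phi_convict_threshold


# Hypothetical numbers: heartbeats normally arrive every second.
print(round(phi(1.0, 1.0), 3))   # 0.434
print(is_convicted(5.0, 1.0))    # False
print(is_convicted(20.0, 1.0))   # True
```

The longer a node stays silent relative to its usual heartbeat interval, the higher phi climbs, so a higher threshold means waiting longer before marking the node down.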
On Mon, Aug 3, 2015 at 1:29 PM,
work with the
standard replication protocol, don't we also get transaction behavior?
On Mon, Aug 3, 2015 at 12:14 AM, DuyHai Doan doanduy...@gmail.com wrote:
what is the fundamental difference between the standard replication
protocol and Paxos that prevents us from implementing a 2-pc on top
Yes it is used by Gossip
On Mon, Aug 3, 2015 at 3:33 PM, Thouraya TH thouray...@gmail.com wrote:
Ok, i will see this reference. thank you so much for answer.
So i understand that this value is used by gossip protocol ? that's it ?
Best Regards.
2015-08-03 13:41 GMT+01:00 DuyHai Doan doanduy
Hello
I'm pleased to announce a Cassandra interpreter for Apache Zeppelin. For
those who don't know, Apache Zeppelin[1] is a web-based notebook that
enables interactive data analytics. It is similar to IPython/Jupyter but is
JVM-based and its architecture is modular enough to allow various
,
thanks for your answer. What if I set RF > 1 and the consistency level for
reads and writes to QUORUM? Would that isolate the single-partition batch
update from reads? (I do not consider node failures here between the write
and the read(s)).
On 19.05.15 07:50, DuyHai Doan wrote:
Hello Martin
Hello Kaushal
Humm your schema is using the ancient DynamicCompositeType. Can you just
send a describe of the table in cassandra-cli ?
On Tue, May 19, 2015 at 9:44 AM, Kaushal Shriyan kaushalshri...@gmail.com
wrote:
Hi,
I am running cassandra version 1.2.19 and cqlsh version 3.1.8 in my
Hello Martin
If, and only if you have RF=1, single partition mutations (including
batches) are isolated.
Otherwise, with RF > 1, even a simple UPDATE is not isolated because one
client can read the updated value on one replica and another client reads
the old value on another replica
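
The situation can be pictured with a toy Python simulation (in-memory objects, nothing to do with the real storage engine). The `Replica` class and the assumption that each client reads from a single replica are made up for the example.

```python
class Replica:
    """A toy replica holding one value per key (hypothetical, in-memory)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


replica_a, replica_b = Replica("A"), Replica("B")
for replica in (replica_a, replica_b):
    replica.write("k", "old")

# An UPDATE reaches replica A first; replica B has not applied it yet.
replica_a.write("k", "new")

# Two clients, each reading from a single, different replica.
client_1_sees = replica_a.read("k")
client_2_sees = replica_b.read("k")
print(client_1_sees, client_2_sees)  # new old

# The write eventually reaches replica B and both reads converge.
replica_b.write("k", "new")
```

With a single replica there is only one copy to read from, so this window between the two writes cannot be observed.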
On Mon, May
I think that you can still use cassandra-cli from 2.0.x to look into
internal table structure. Of course you will see bytes instead of
readable values but it's better than nothing. It's already the case for
CQL collections when you're trying to decode them using cassandra-cli
On Wed, May 13, 2015
auto promotion mode on
The problem of NULL inserts was already solved a long time ago with Insert
Strategy in Achilles:
https://github.com/doanduyhai/Achilles/wiki/Insert-Strategy
/auto promotion off
However, it's nice to see there will be a flag on the protocol side to
handle this problem
On Wed,
Short answer is no.
Whenever you access the session object of the Java driver directly (using
withSessionDo{...}), you bypass the data locality optimisation made by the
connector
On Sun, Apr 5, 2015 at 9:53 AM, Tiwari, Tarun tarun.tiw...@kronos.com
wrote:
Hi,
I am looking for, if the
I think the problem is with the IP address. Cassandra does listen on
192.168.56.30 and your agent log complains about not being able to connect
to 127.0.0.1. Worth investigating there
On Sun, Apr 5, 2015 at 3:47 PM, Serega Sheypak serega.shey...@gmail.com
wrote:
Hi, getting weird problem when
If you have SSD, you may afford switching to leveled compaction strategy,
which requires much less than 50% of the current dataset for free space
On 5 Apr 2015 19:04, daemeon reiydelle daeme...@gmail.com wrote:
You appear to have multiple java binaries in your path. That needs to be
Some ideas I throw in here:
The delay Y will be at least 1 minute, and at most 90 days with a
resolution per minute -- Use the delay (with format MMDDHHMM as
integer) as your partition key.
Example: today March 24th at 12:00 (201503241200) you need to delay 3
actions, action A in exact 3
Is BATCH the recommended way of updating all three tables at one go so
that the information between the three tables is consistent ?
If you're thinking about atomicity, no it's not atomic. Indeed with
logged batches, what you gain is automatic retry done for you by the
coordinator in case of
First idea to eliminate any issue with regards to stale data: issue the
same count query with CL=QUORUM and check whether there are still
inconsistencies
On Tue, Mar 10, 2015 at 9:13 AM, Rumph, Frens Jan m...@frensjan.nl wrote:
Hi Jens, Mikhail, Daemeon,
Thanks for your replies. Sorry for my
Is it to be expected that select count(*) from ... and select distinct
partition-key-columns from ... to yield inconsistent results between
executions even though the table at hand isn't written to?
Actually, depending on the definition of your primary key, select count(*)
and select distinct
Cassandra would take care of keeping the data synced between the two sets
of five nodes. Is that correct?
Correct
But doing so means that we need 2x as many nodes as we need for the
real-time cluster alone
Not necessarily. With multi DC you can configure the replication factor
value per DC,
I am trying to get the state as of a particular transaction_time
-- In that case you should probably define your primary key in another
order for clustering columns
PRIMARY KEY (weatherstation_id,transaction_time,event_time)
Then, select * from temperatures where weatherstation_id = 'foo' and
in production at my company)
--
*Colin Clark*
+1 612 859 6129
Skype colin.p.clark
On Feb 11, 2015, at 6:51 AM, DuyHai Doan doanduy...@gmail.com wrote:
The very nature of cassandra's distributed nature vs partitioning data
on hadoop makes spark on hdfs actually faster than on cassandra
The very nature of cassandra's distributed nature vs partitioning data on
hadoop makes spark on hdfs actually faster than on cassandra
Prove it. Did you ever have a look into the source code of the
Spark/Cassandra connector to see how data locality is achieved before
throwing out such
Look at the JIRA, filter by 3.0. But it's not very accurate. There are a lot
of new features scheduled for 3.0. Some of them will make it on time for
3.0.0 like User Defined Functions I guess. Some other features will be
shipped with future 3 middle/minor versions.
On Wed, Feb 11, 2015 at 1:25 PM,
Start looking at the Spark/Cassandra connector here (in Scala):
https://github.com/datastax/spark-cassandra-connector/tree/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector
Data locality is provided by this method:
Hint: using the Java driver, you can set the fetchSize to tell the driver
how many CQL rows to fetch for each page.
Depending on the size (in bytes) of each CQL row, it would be useful to
tune this fetchSize value to avoid loading too much data into memory for
each page
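
As a back-of-the-envelope illustration, the Python helper below picks a page size from an average row size and a memory budget per page. The name `suggest_fetch_size`, the budget and the `max_rows` cap are all hypothetical, not driver settings; the result is simply the number you would then pass as the fetch size.

```python
def suggest_fetch_size(avg_row_bytes, page_budget_bytes, max_rows=5000):
    """Pick a rows-per-page value so one page stays within a memory budget."""
    if avg_row_bytes <= 0:
        raise ValueError("avg_row_bytes must be positive")
    rows = page_budget_bytes // avg_row_bytes
    # Always fetch at least one row, and never more than the cap.
    return max(1, min(max_rows, rows))


# Hypothetical figures: about 1 MB of row data per page.
print(suggest_fetch_size(200, 1_000_000))     # 5000
print(suggest_fetch_size(50_000, 1_000_000))  # 20
```

Small rows can use a large page, while wide rows (big blobs or collections) call for a much smaller one.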
On Wed, Jan 28, 2015 at
Hello Morgan
The data model looks reasonable. Bucketing by day will help you to scale.
The only thing I can see is how to go back in time to fetch articles from
previous buckets (previous days). Is it possible to have 0 articles for a
country for a day ?
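
To show what going back in time looks like, here is a small Python sketch with a dict standing in for the table. The `(country, day)` key, the `fetch_latest` name and the `max_days_back` cap are assumptions made for the example, and each dict lookup represents one query on a day bucket.

```python
from datetime import date, timedelta


def fetch_latest(articles_by_bucket, country, today, wanted, max_days_back=30):
    """Walk day buckets backwards until `wanted` articles are collected."""
    found = []
    for offset in range(max_days_back + 1):
        day = today - timedelta(days=offset)
        # One lookup per (country, day) bucket, i.e. one partition read.
        found.extend(articles_by_bucket.get((country, day), []))
        if len(found) >= wanted:
            break
    return found[:wanted]


store = {
    ("FR", date(2015, 1, 22)): ["a3"],
    ("FR", date(2015, 1, 20)): ["a2", "a1"],  # nothing on Jan 21
}
print(fetch_latest(store, "FR", date(2015, 1, 22), wanted=2))  # ['a3', 'a2']
```

Empty days cost one wasted lookup each, which is why it matters whether a country can have days with no article at all.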
On Thu, Jan 22, 2015 at 8:23 PM, SEGALIS
:33 GMT+01:00 DuyHai Doan doanduy...@gmail.com:
Hello Morgan
The data model looks reasonable. Bucketing by day will help you to
scale. The only thing I can see is how to go back in time to fetch articles
from previous buckets (previous days). Is it possible to have 0 articles for
a country
- If there is at most 1 article per day, and some days have 0, I will have to do
100+ queries to get all the posts, won't that be a little too much ?
2015-01-22 20:47 GMT+01:00 DuyHai Doan doanduy...@gmail.com:
well, if the current day bucket does not contain enough articles, you may
need to search back
Nice, thanks Jens
On Wed, Jan 14, 2015 at 4:21 PM, Jens Rantil jens.ran...@tink.se wrote:
Hi all
I just recently put together a small script to count the number of
tombstones grouped by partition id, for one or multiple sstables:
https://gist.github.com/JensRantil/063b7c56ca4a8dfe1c50
I