Re: CQL data type compatibility between ascii and text
Thanks Yoshi! That explains it a lot :)

On Fri, 10 Aug 2018 18:30:25 +1000, Y K wrote: [snip]
Re: CQL data type compatibility between ascii and text
Hi Thira,

First, there's the 3.0 branch of versions and the 3.x branch of versions:
http://cassandra.apache.org/doc/latest/development/patches.html#choosing-the-right-branches-to-work-on

3.0.16 belongs to the 3.0 branch; 3.9 and 3.11.2 belong to the 3.x branch.

I believe the change was made by "Remove alter type support" (https://issues.apache.org/jira/browse/CASSANDRA-12443), which was marked "Fixed" in version 3.0.11 on the 3.0 branch and in version 3.10 on the 3.x branch. So 3.0.16 has the fix, 3.9 doesn't have it, but 3.11.2 has it.

Best regards,
Yoshi

On Fri, 10 Aug 2018 at 17:10, thiranjith wrote: [snip]
CQL data type compatibility between ascii and text
Hi,

According to the documentation at https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cql_data_types_c.html#cql_data_types_c__cql_data_type_compatibility we should not be able to change a column type from ascii to text.

I have had mixed experiences with conversions between data types on different versions of Cassandra. For example, given the following table definition:

    CREATE TABLE changelog (
        sequence int,
        description ascii,
        createdby ascii,
        executedon timestamp,
        PRIMARY KEY (sequence, description)
    );

attempting to change the data type of the column 'createdby' with the following CQL

    alter table changelog alter createdby TYPE text;

gives the behaviour outlined below, depending on the version of Cassandra:

- With [cqlsh 5.0.1 | Cassandra 3.0.16 | CQL spec 3.4.0 | Native protocol v4]: InvalidRequest: Error from server: code=2200 [Invalid query] message="Altering of types is not allowed" (expected, per documentation)
- With [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]: the query succeeds and changes the column type to 'text' (verified by running describe changelog and also by inserting data with non-ASCII chars into the column)
- With Cassandra 3.11.2: InvalidRequest: Error from server: code=2200 [Invalid query] message="Altering of types is not allowed" (expected, per documentation)

Can anyone please explain why it works on 3.9 and not on the others?

Thanks!
Thira
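Independent of which version enforces the check, the reason the compatibility table only ever allowed the ascii -> text direction is an encoding fact: the ascii type holds ASCII bytes, text holds UTF-8, and every ASCII byte sequence is already valid UTF-8, while the reverse does not hold. A minimal, Cassandra-free Python sketch of that invariant (the sample strings are illustrative, not from the thread):

```python
# Why ascii -> text is the byte-compatible direction while text -> ascii
# is not: all ASCII bytes decode as UTF-8, but not all UTF-8 bytes decode
# as ASCII.
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

ascii_bytes = "changelog entry".encode("ascii")  # what an ascii column holds
utf8_bytes = "résumé".encode("utf-8")            # what a text column may hold

assert decodes_as(ascii_bytes, "utf-8")      # existing ascii data reads fine as text
assert not decodes_as(utf8_bytes, "ascii")   # text data need not be valid ascii
```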
Re: [EXTERNAL] full text search on some text columns
Does anyone know if you can do an online upgrade of Elassandra? With the Lucene plugin you cannot, really, because you need to drop and recreate the indexes if Lucene has been updated.

Hannu

> On 1 Aug 2018 at 12:49, Octavian Rinciog wrote: [snip]
Re: [EXTERNAL] full text search on some text columns
Hello! Maybe this will work? https://github.com/strapdata/elassandra (I haven't tested this plugin)

> On 2018-08-01 12:17 GMT+03:00, Hannu Kröger wrote: [snip]

--
Octavian Rinciog
Re: [EXTERNAL] full text search on some text columns
The 3.11.1 plugin works with 3.11.2. But yes, the original maintainer is not maintaining the project anymore. At least not actively.

Hannu

> On 1 Aug 2018 at 7:16, Ben Slater wrote: [snip]
Re: Re: [EXTERNAL] full text search on some text columns
We (Instaclustr) will be submitting a PR for 3.11.3 support for cassandra-lucene-index once 3.11.3 is officially released, as we offer it as part of our service and have customers using it.

Cheers
Ben

> On Wed, 1 Aug 2018 at 14:06, onmstester onmstester wrote: [snip]

--
Ben Slater
Chief Product Officer, Instaclustr

Read our latest technical blog posts here: https://www.instaclustr.com/blog/

This email has been sent on behalf of Instaclustr Pty. Limited (Australia) and Instaclustr Inc (USA). This email and any attachments may contain confidential and legally privileged information. If you are not the intended recipient, do not copy or disclose its content, but please reply to this email immediately and highlight the error to the sender and then immediately delete the message.
Fwd: Re: [EXTERNAL] full text search on some text columns
It seems to be an interesting project but sort of abandoned: no update in the last 8 months, and it doesn't support Cassandra 3.11.2 (the version I currently use).

Sent using Zoho Mail

---------- Forwarded message ----------
From: Andrzej Śliwiński
Date: Wed, 01 Aug 2018 08:16:06 +0430
Subject: Re: [EXTERNAL] full text search on some text columns

Maybe this plugin could do the job: https://github.com/Stratio/cassandra-lucene-index
Re: [EXTERNAL] full text search on some text columns
Maybe this plugin could do the job: https://github.com/Stratio/cassandra-lucene-index

> On Tue, 31 Jul 2018 at 22:37, onmstester onmstester wrote: [snip]
RE: [EXTERNAL] full text search on some text columns
Actually, we can't afford to buy DataStax Search.

Sent using Zoho Mail

On Tue, 31 Jul 2018 19:38:28 +0430, Durity, Sean R wrote: [snip]
Re: full text search on some text columns
Thanks Jordan. There would be millions of rows per day; is SASI capable of sustaining such a rate?

Sent using Zoho Mail

On Tue, 31 Jul 2018 19:47:55 +0430, Jordan West wrote: [snip]
Re: full text search on some text columns
I had SASI in mind before stopping myself from replying to this thread. Actually, the OP needs to index a clustering column and the partition key, and as far as I remember I opened a JIRA and pushed a patch for SASI to support indexing composite partition keys, but so far there are some issues preventing it from being merged into trunk:

https://issues.apache.org/jira/browse/CASSANDRA-11734
https://issues.apache.org/jira/browse/CASSANDRA-13228

On Tue, Jul 31, 2018 at 5:17 PM, Jordan West wrote: [snip]
Re: full text search on some text columns
On Tue, Jul 31, 2018 at 7:45 AM, onmstester onmstester wrote:
> I need to do a full text search (like) on one of my clustering keys and one of my partition keys (it uses text as the data type).

For simple LIKE queries on existing columns you could give SASI (https://docs.datastax.com/en/dse/5.1/cql/cql/cql_using/useSASIIndex.html) a try without having to stand up a separate piece of software. It's relatively new and isn't as battle-tested as other parts of Cassandra, but it has been used in production. There are some performance issues with wide CQL partitions if you have those (https://issues.apache.org/jira/browse/CASSANDRA-11990); I hope to address that for 4.0, time permitting.

Full disclosure: I was one of the original SASI authors.
RE: [EXTERNAL] full text search on some text columns
That sounds like a problem tailor-made for the DataStax Search (embedded Solr) solution. I think that would be the fastest path to success.

Sean Durity

From: onmstester onmstester
Sent: Tuesday, July 31, 2018 10:46 AM
To: user
Subject: [EXTERNAL] full text search on some text columns

> [snip]

The information in this Internet Email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this Email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. When addressed to our clients any opinions or advice contained in this Email are subject to the terms and conditions expressed in any applicable governing The Home Depot terms of business or client engagement letter. The Home Depot disclaims all responsibility and liability for the accuracy and content of this attachment and for any damages or losses arising from any inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other items of a destructive nature, which may be contained in this attachment and shall not be liable for direct, indirect, consequential or special damages in connection with this e-mail message or its attachment.
full text search on some text columns
I need to do a full text search (LIKE) on one of my clustering keys and one of my partition keys (both use text as the data type). The input rate is high, so only Cassandra can handle it. Is there any open source project which helps with using Cassandra + Solr or Cassandra + Elasticsearch? Any recommendation on doing this with home-made solutions would be appreciated.

Sent using Zoho Mail
Re: Text or....
Depending on the compression rate, I think it would generate less garbage on the Cassandra side if you compressed it client side. Something to test out.

> On Apr 4, 2018, at 7:19 AM, Jeff Jirsa wrote: [snip]
Re: Text or....
Compressing server side and validating checksums is hugely important in the more frequently used versions of Cassandra, so since you probably want to run compression on the server anyway, I'm not sure why you'd compress it twice.

--
Jeff Jirsa

> On Apr 4, 2018, at 6:23 AM, DuyHai Doan wrote: [snip]
Re: Text or....
Compressing client-side is better because it will save:

1) a lot of bandwidth on the network
2) a lot of Cassandra CPU, because there is no decompression server-side
3) a lot of Cassandra heap, because the compressed blob should be relatively small (text data compresses very well) compared to the raw size

On Wed, Apr 4, 2018 at 2:59 PM, Jeronimo de A. Barros wrote: [snip]
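A minimal sketch of the client-side approach described above, using Python's zlib; the driver calls that would actually write the blob to Cassandra are omitted, and the payload is illustrative:

```python
import zlib

def compress_text(value: str) -> bytes:
    """Compress a large text value client-side before storing it in a blob column."""
    return zlib.compress(value.encode("utf-8"), 6)

def decompress_text(blob: bytes) -> str:
    """Reverse of compress_text, applied after reading the blob back."""
    return zlib.decompress(blob).decode("utf-8")

# ~57,000 characters, roughly the row size discussed in this thread.
payload = "some long log line " * 3000
blob = compress_text(payload)

assert decompress_text(blob) == payload        # lossless round trip
assert len(blob) < len(payload.encode("utf-8"))  # smaller on the wire and in heap
```

The trade-off Jeff raises still applies: the server-side table compression and checksumming run regardless, so measure whether the double compression is worth it for your data.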
Re: Text or....
Hi,

We use a pseudo file-system table where the chunks are blobs of 64 KB, and we have never had any performance issue.

The primary-key structure is ((file-uuid), chunk-id).

Jero

On Wed, Apr 4, 2018 at 9:25 AM, shalom sagges wrote: [snip]
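A hedged sketch of the chunking scheme Jero describes, assuming 64 KB chunks and the ((file-uuid), chunk-id) key layout; the helper name and sample data are illustrative, not from the thread:

```python
import uuid

CHUNK_SIZE = 64 * 1024  # 64 KB blobs, as in the pseudo file-system table

def make_chunks(file_id: uuid.UUID, data: bytes):
    """Produce one (file_uuid, chunk_id, blob) row per chunk for a table with
    PRIMARY KEY ((file_uuid), chunk_id): the partition holds the file, the
    clustering column keeps the chunks in order."""
    return [(file_id, chunk_id, data[off:off + CHUNK_SIZE])
            for chunk_id, off in enumerate(range(0, len(data), CHUNK_SIZE))]

file_id = uuid.uuid4()
data = bytes(range(256)) * 1000          # 256,000 bytes of sample data
rows = make_chunks(file_id, data)

assert len(rows) == 4                                    # 256,000 bytes -> 4 chunks
assert all(len(blob) <= CHUNK_SIZE for _, _, blob in rows)
assert b"".join(blob for _, _, blob in rows) == data     # reassembly is lossless
```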
Re: Text or....
Hi Shalom,

You might want to compress on the application side before inserting into Cassandra, using the algorithm of your choice, based on the compression ratio and speed you find acceptable for your use case.

On 4 April 2018 at 14:38, shalom sagges wrote: [snip]
Re: Text or....
Thanks DuyHai!

I'm using the default table compression. Is there anything else I should look into? Regarding table compression, I understand that for write-heavy tables it's best to keep the default and not compress further. Have I understood correctly?

On Wed, Apr 4, 2018 at 3:28 PM, DuyHai Doan wrote: [snip]
Re: Text or....
Compress it and store it as a blob. Unless you ever need to index it, but I guess even with SASI, indexing such a huge text block is not a good idea.

On Wed, Apr 4, 2018 at 2:25 PM, shalom sagges wrote: [snip]
Text or....
Hi All,

A certain application is writing ~55,000 characters for a single row. Most of these characters go into one column with the "text" data type.

This looks insanely large for one row. Would you suggest changing the data type from "text" to blob, or is there any other option that might fit this scenario?

Thanks!
Re: How to Parse raw CQL text?
Yes, ideally. I've been spending a bit of time in the parser the last week. There's a lot of internals which are still using old terminology and are pretty confusing. I'm doing a little investigation into exposing some of the information while also modernizing it.

> On Feb 26, 2018, at 10:02 AM, Hannu Kröger wrote:
> If this is needed functionality, shouldn't it be available as a public method or something? Maybe write a patch, etc.?
>
>> On 26.2.2018 at 18:47, Ariel Weisberg wrote:
>> Hi,
>> I took a similar approach and it worked fine. I was able to build a tool that parsed production query logs. I used a helper method that would just grab a private field out of an object by name using reflection.
>> Ariel
>>
>>> On Sun, Feb 25, 2018, at 11:58 PM, Jonathan Haddad wrote:
>>> I had to do something similar recently. Take a look at org.apache.cassandra.cql3.QueryProcessor.parseStatement(). I've got some sample code here [1] as well as a blog post [2] that explains how to access the private variables, since there's no access provided. It wasn't really designed to be used as a library, so YMMV with future changes.
>>>
>>> [1] https://github.com/rustyrazorblade/rustyrazorblade-examples/blob/master/privatevaraccess/src/main/kotlin/com/rustyrazorblade/privatevaraccess/CreateTableParser.kt
>>> [2] http://rustyrazorblade.com/post/2018/2018-02-25-accessing-private-variables-in-jvm/
>>>
>>>> On Mon, Feb 5, 2018 at 2:27 PM, Kant Kodali wrote:
>>>> I just did some trial and error. Looks like this would work:
>>>>
>>>> public class Test {
>>>>     public static void main(String[] args) throws Exception {
>>>>         String stmt = "create table if not exists test_keyspace.my_table (field1 text, field2 int, field3 set, field4 map<ascii, text>, primary key (field1) );";
>>>>         ANTLRStringStream stringStream = new ANTLRStringStream(stmt);
>>>>         CqlLexer cqlLexer = new CqlLexer(stringStream);
>>>>         CommonTokenStream token = new CommonTokenStream(cqlLexer);
>>>>         CqlParser parser = new CqlParser(token);
>>>>         ParsedStatement query = parser.cqlStatement();
>>>>
>>>>         if (query.getClass().getDeclaringClass() == CreateTableStatement.class) {
>>>>             CreateTableStatement.RawStatement cts = (CreateTableStatement.RawStatement) query;
>>>>             CFMetaData
>>>>                 .compile(stmt, cts.keyspace())
>>>>                 .getColumnMetadata()
>>>>                 .values()
>>>>                 .stream()
>>>>                 .forEach(cd -> System.out.println(cd));
>>>>         }
>>>>     }
>>>> }
>>>>
>>>> On Mon, Feb 5, 2018 at 2:13 PM, Kant Kodali wrote:
>>>> Hi Anant, I just have a CQL CREATE TABLE statement as a string and I want to extract all the parts: table name, keyspace name, regular columns, partition key, clustering key, clustering order, and so on. That's really it! Thanks!
>>>>
>>>> On Mon, Feb 5, 2018 at 1:50 PM, Rahul Singh wrote:
>>>> I think I understand what you are trying to do, but what is your goal? What do you mean "use it for different" queries? Maybe you want to do an event and have an event processor? Seems like you are trying to basically bypass that pattern and parse a query and split it into several actions? Did you look into this unit test folder?
Re: How to Parse raw CQL text?
If this is needed functionality, shouldn’t that be available as a public method or something? Maybe write a patch etc. ? > Ariel Weisberg <ar...@weisberg.ws> kirjoitti 26.2.2018 kello 18.47: > > Hi, > > I took a similar approach and it worked fine. I was able to build a tool that > parsed production query logs. > > I used a helper method that would just grab a private field out of an object > by name using reflection. > > Ariel > >> On Sun, Feb 25, 2018, at 11:58 PM, Jonathan Haddad wrote: >> I had to do something similar recently. Take a look at >> org.apache.cassandra.cql3.QueryProcessor.parseStatement(). I've got some >> sample code here [1] as well as a blog post [2] that explains how to access >> the private variables, since there's no access provided. It wasn't really >> designed to be used as a library, so YMMV with future changes. >> >> [1] >> https://github.com/rustyrazorblade/rustyrazorblade-examples/blob/master/privatevaraccess/src/main/kotlin/com/rustyrazorblade/privatevaraccess/CreateTableParser.kt >> [2] >> http://rustyrazorblade.com/post/2018/2018-02-25-accessing-private-variables-in-jvm/ >> >> On Mon, Feb 5, 2018 at 2:27 PM Kant Kodali <k...@peernova.com> wrote: >> I just did some trial and error. 
Looks like this would work >> >> public class Test { >> >> >> >> public static void main(String[] args) throws Exception { >> >> String stmt = "create table if not exists test_keyspace.my_table >> (field1 text, field2 int, field3 set, field4 map<ascii, text>, >> primary key (field1) );"; >> >> ANTLRStringStream stringStream = new ANTLRStringStream(stmt); >> >> CqlLexer cqlLexer = new CqlLexer(stringStream); >> >> CommonTokenStream token = new CommonTokenStream(cqlLexer); >> >> CqlParser parser = new CqlParser(token); >> >> ParsedStatement query = parser.cqlStatement(); >> >> >> if (query.getClass().getDeclaringClass() == >> CreateTableStatement.class) { >> >> CreateTableStatement.RawStatement cts = >> (CreateTableStatement.RawStatement) query; >> >> CFMetaData >> >> .compile(stmt, cts.keyspace()) >> >> >> >> .getColumnMetadata() >> >> .values() >> >> .stream() >> >> .forEach(cd -> System.out.println(cd)); >> >> >> } >>} >> } >> >> On Mon, Feb 5, 2018 at 2:13 PM, Kant Kodali <k...@peernova.com> wrote: >> Hi Anant, >> >> I just have CQL create table statement as a string I want to extract all the >> parts like, tableName, KeySpaceName, regular Columns, partitionKey, >> ClusteringKey, Clustering Order and so on. Thats really it! >> >> Thanks! >> >> On Mon, Feb 5, 2018 at 1:50 PM, Rahul Singh <rahul.xavier.si...@gmail.com> >> wrote: >> I think I understand what you are trying to do … but what is your goal? What >> do you mean “use it for different” queries… Maybe you want to do an event >> and have an event processor? Seems like you are trying to basically by pass >> that pattern and parse a query and split it into several actions? >> >> Did you look into this unit test folder? 
>> >> https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/cql3/CQLTester.java >> >> -- >> Rahul Singh >> rahul.si...@anant.us >> >> Anant Corporation >> >> On Feb 5, 2018, 4:06 PM -0500, Kant Kodali <k...@peernova.com>, wrote: >> >>> Hi All, >>> >>> I have a need where I get a raw CQL create table statement as a String and >>> I need to parse the keyspace, tablename, columns and so on..so I can use it >>> for various queries and send it to C*. I used the example below from this >>> link. I get the following error. And I thought maybe someone in this >>> mailing list will be more familiar with internals. >>> >>> Exception in thread "main" >>> org.apache.cassandra.exceptions.ConfigurationException: Keyspace >>> test_keyspace doesn't exist >>> at >>> org.apache.cassandra.cql3.statements.CreateTableStatement$RawStatement.prepare(CreateTableStatement.java:200) >>> at com.hello.world.Test.main(Test.java:23) >>> >>> >>> Here is my code. >>> >>> package com.hello.wo
Re: How to Parse raw CQL text?
wouldn't it make sense to expose the parser at some point? On Mon, Feb 26, 2018 at 9:47 AM, Ariel Weisberg <ar...@weisberg.ws> wrote: > Hi, > > I took a similar approach and it worked fine. I was able to build a tool > that parsed production query logs. > > I used a helper method that would just grab a private field out of an > object by name using reflection. > > Ariel > > On Sun, Feb 25, 2018, at 11:58 PM, Jonathan Haddad wrote: > > I had to do something similar recently. Take a look at > org.apache.cassandra.cql3.QueryProcessor.parseStatement(). I've got some > sample code here [1] as well as a blog post [2] that explains how to access > the private variables, since there's no access provided. It wasn't really > designed to be used as a library, so YMMV with future changes. > > [1] https://github.com/rustyrazorblade/rustyrazorblade-examples/blob/ > master/privatevaraccess/src/main/kotlin/com/rustyrazorblade/ > privatevaraccess/CreateTableParser.kt > [2] http://rustyrazorblade.com/post/2018/2018-02-25- > accessing-private-variables-in-jvm/ > > On Mon, Feb 5, 2018 at 2:27 PM Kant Kodali <k...@peernova.com> wrote: > > I just did some trial and error. 
Looks like this would work > > *public class *Test { > > *public static void *main(String[] args) *throws *Exception { > > String stmt = *"create table if not exists test_keyspace.my_table > (field1 text, field2 int, field3 set, field4 map<ascii, text>, primary > key (field1) );"*; > ANTLRStringStream stringStream = *new *ANTLRStringStream(stmt); > CqlLexer cqlLexer = *new *CqlLexer(stringStream); > CommonTokenStream token = *new *CommonTokenStream(cqlLexer); > CqlParser parser = *new *CqlParser(token); > > ParsedStatement query = parser.cqlStatement(); > > > *if *(query.getClass().getDeclaringClass() == > CreateTableStatement.*class*) { > CreateTableStatement.RawStatement cts = > (CreateTableStatement.RawStatement) query; > > CFMetaData > .*compile*(stmt, cts.keyspace()) > > > .getColumnMetadata() > .values() > .stream() > .forEach(cd -> System.*out*.println(cd)); > > > } > >} > > } > > > On Mon, Feb 5, 2018 at 2:13 PM, Kant Kodali <k...@peernova.com> wrote: > > Hi Anant, > > I just have CQL create table statement as a string I want to extract all > the parts like, tableName, KeySpaceName, regular Columns, partitionKey, > ClusteringKey, Clustering Order and so on. Thats really it! > > Thanks! > > On Mon, Feb 5, 2018 at 1:50 PM, Rahul Singh <rahul.xavier.si...@gmail.com> > wrote: > > I think I understand what you are trying to do … but what is your goal? > What do you mean “use it for different” queries… Maybe you want to do an > event and have an event processor? Seems like you are trying to basically > by pass that pattern and parse a query and split it into several actions? > > Did you look into this unit test folder? 
> > https://github.com/apache/cassandra/blob/trunk/test/ > unit/org/apache/cassandra/cql3/CQLTester.java > > -- > Rahul Singh > rahul.si...@anant.us > > Anant Corporation > > On Feb 5, 2018, 4:06 PM -0500, Kant Kodali <k...@peernova.com>, wrote: > > Hi All, > > I have a need where I get a raw CQL create table statement as a String and > I need to parse the keyspace, tablename, columns and so on..so I can use it > for various queries and send it to C*. I used the example below from this > link <https://github.com/tacoo/cassandra-antlr-sample>. I get the > following error. And I thought maybe someone in this mailing list will be > more familiar with internals. > > Exception in thread "main" > org.apache.cassandra.exceptions.ConfigurationException: > Keyspace test_keyspace doesn't exist > at org.apache.cassandra.cql3.statements.CreateTableStatement$ > RawStatement.prepare(CreateTableStatement.java:200) > at com.hello.world.Test.main(Test.java:23) > > > Here is my code. > > *package *com.hello.world; > > *import *org.antlr.runtime.ANTLRStringStream; > *import *org.antlr.runtime.CommonTokenStream; > *import *org.apache.cassandra.cql3.CqlLexer; > *import *org.apache.cassandra.cql3.CqlParser; > *import *org.apache.cassandra.cql3.statements.CreateTableStatement; > *import *org.apache.cassandra.cql3.statements.ParsedStatement; > > *public class *Test { > > *public static void *main(String[] args) *throws *Exception { > String stmt = *"create table if not exists test_keyspace**.my_table
Re: How to Parse raw CQL text?
Hi, I took a similar approach and it worked fine. I was able to build a tool that parsed production query logs. I used a helper method that would just grab a private field out of an object by name using reflection. Ariel On Sun, Feb 25, 2018, at 11:58 PM, Jonathan Haddad wrote: > I had to do something similar recently. Take a look at > org.apache.cassandra.cql3.QueryProcessor.parseStatement(). I've got > some sample code here [1] as well as a blog post [2] that explains how > to access the private variables, since there's no access provided. It > wasn't really designed to be used as a library, so YMMV with future > changes.> > [1] > https://github.com/rustyrazorblade/rustyrazorblade-examples/blob/master/privatevaraccess/src/main/kotlin/com/rustyrazorblade/privatevaraccess/CreateTableParser.kt> > [2] > http://rustyrazorblade.com/post/2018/2018-02-25-accessing-private-variables-in-jvm/> > > On Mon, Feb 5, 2018 at 2:27 PM Kant Kodali <k...@peernova.com> wrote:>> I > just did some trial and error. 
Looks like this would work >> >> *public class *Test { >> >> >> >> *public static void *main(String[] args) *throws *Exception { >>>> String stmt = *"create table if not exists >> test_keyspace.my_table (field1 text, field2 int, field3 >> set, field4 map<ascii, text>, primary key (field1) >> );"*; >>>> ANTLRStringStream stringStream = *new >> *ANTLRStringStream(stmt); >>>> CqlLexer cqlLexer = *new *CqlLexer(stringStream); >> >> CommonTokenStream token = *new *CommonTokenStream(cqlLexer); >>>> CqlParser parser = *new *CqlParser(token); >> >> ParsedStatement query = parser.cqlStatement(); >> >> >> *if *(query.getClass().getDeclaringClass() == >> CreateTableStatement.*class*) { >>>> CreateTableStatement.RawStatement cts = >> (CreateTableStatement.RawStatement) query; >>>> CFMetaData >> >> .*compile*(stmt, cts.keyspace()) >> >> >> >> .getColumnMetadata() >> >> .values() >> >> .stream() >> >> .forEach(cd -> System.**out**.println(cd)); >> >> >> } >>} >> } >> >> On Mon, Feb 5, 2018 at 2:13 PM, Kant Kodali >> <k...@peernova.com> wrote:>>> Hi Anant, >>> >>> I just have CQL create table statement as a string I want to extract >>> all the parts like, tableName, KeySpaceName, regular Columns, >>> partitionKey, ClusteringKey, Clustering Order and so on. Thats >>> really it!>>> >>> Thanks! >>> >>> On Mon, Feb 5, 2018 at 1:50 PM, Rahul Singh >>> <rahul.xavier.si...@gmail.com> wrote:>>>> I think I understand what you are >>> trying to do … but what is your >>>> goal? What do you mean “use it for different” queries… Maybe you >>>> want to do an event and have an event processor? Seems like you are >>>> trying to basically by pass that pattern and parse a query and >>>> split it into several actions?>>>> >>>> Did you look into this unit test folder? 
>>>> >>>> https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/cql3/CQLTester.java>>>> >>>> >>>> -- >>>> Rahul Singh >>>> rahul.si...@anant.us >>>> >>>> Anant Corporation >>>> >>>> On Feb 5, 2018, 4:06 PM -0500, Kant Kodali <k...@peernova.com>, >>>> wrote:>>>> >>>>> Hi All, >>>>> >>>>> I have a need where I get a raw CQL create table statement as a >>>>> String and I need to parse the keyspace, tablename, columns and so >>>>> on..so I can use it for various queries and send it to C*. I used >>>>> the example below from this link[1]. I get the following error. >>>>> And I thought maybe someone in this mailing list will be more >>>>> familiar with internals.>>>>> >>>>> Exception in thread "main" >>>>> org.apache.cassandra.exceptions.ConfigurationException: Keyspace >>>>> test_keyspace doesn't exist>>>>> at >>>>> org.apache.cassandra.cql3.statements.CreateTableStatement$RawS- >>>>> tatement.prepare(Cr
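The reflection helper Ariel describes can be sketched without any Cassandra dependency. `ParsedStatementStub` below is a hypothetical stand-in for the real parsed-statement object (e.g. `CreateTableStatement.RawStatement`); the helper itself is just plain `java.lang.reflect`:

```java
import java.lang.reflect.Field;

public class PrivateFieldAccess {

    // Hypothetical stand-in for a parsed statement whose fields are private.
    static class ParsedStatementStub {
        private final String keyspace = "test_keyspace";
    }

    // Grab a private field out of an object by name, as described in the thread.
    @SuppressWarnings("unchecked")
    static <T> T getPrivateField(Object target, String name) {
        try {
            Field f = target.getClass().getDeclaredField(name);
            f.setAccessible(true);
            return (T) f.get(target);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String ks = getPrivateField(new ParsedStatementStub(), "keyspace");
        System.out.println(ks); // prints "test_keyspace"
    }
}
```

As noted in the thread, this leans on internals that were never meant as a library, so it can break on any Cassandra upgrade.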
Re: How to Parse raw CQL text?
I had to do something similar recently. Take a look at org.apache.cassandra.cql3.QueryProcessor.parseStatement(). I've got some sample code here [1] as well as a blog post [2] that explains how to access the private variables, since there's no access provided. It wasn't really designed to be used as a library, so YMMV with future changes. [1] https://github.com/rustyrazorblade/rustyrazorblade-examples/blob/master/privatevaraccess/src/main/kotlin/com/rustyrazorblade/privatevaraccess/CreateTableParser.kt [2] http://rustyrazorblade.com/post/2018/2018-02-25-accessing-private-variables-in-jvm/ On Mon, Feb 5, 2018 at 2:27 PM Kant Kodali <k...@peernova.com> wrote: > I just did some trial and error. Looks like this would work > > public class Test { > > public static void main(String[] args) throws Exception { > > String stmt = "create table if not exists test_keyspace.my_table > (field1 text, field2 int, field3 set, field4 map<ascii, text>, primary > key (field1) );"; > ANTLRStringStream stringStream = new ANTLRStringStream(stmt); > CqlLexer cqlLexer = new CqlLexer(stringStream); > CommonTokenStream token = new CommonTokenStream(cqlLexer); > CqlParser parser = new CqlParser(token); > > ParsedStatement query = parser.cqlStatement(); > > > if (query.getClass().getDeclaringClass() == > CreateTableStatement.class) { > CreateTableStatement.RawStatement cts = > (CreateTableStatement.RawStatement) query; > > CFMetaData > .compile(stmt, cts.keyspace()) > > > .getColumnMetadata() > .values() > .stream() > .forEach(cd -> System.out.println(cd)); > > } > >} > > } > > > On Mon, Feb 5, 2018 at 2:13 PM, Kant Kodali <k...@peernova.com> wrote: > >> Hi Anant, >> >> I just have CQL create table statement as a string I want to extract all >> the parts like, tableName, KeySpaceName, regular Columns, partitionKey, >> ClusteringKey, Clustering Order and so on. Thats really it! >> >> Thanks! 
>> >> On Mon, Feb 5, 2018 at 1:50 PM, Rahul Singh <rahul.xavier.si...@gmail.com >> > wrote: >> >>> I think I understand what you are trying to do … but what is your goal? >>> What do you mean “use it for different” queries… Maybe you want to do an >>> event and have an event processor? Seems like you are trying to basically >>> by pass that pattern and parse a query and split it into several actions? >>> >>> Did you look into this unit test folder? >>> >>> >>> https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/cql3/CQLTester.java >>> >>> -- >>> Rahul Singh >>> rahul.si...@anant.us >>> >>> Anant Corporation >>> >>> On Feb 5, 2018, 4:06 PM -0500, Kant Kodali <k...@peernova.com>, wrote: >>> >>> Hi All, >>> >>> I have a need where I get a raw CQL create table statement as a String >>> and I need to parse the keyspace, tablename, columns and so on..so I can >>> use it for various queries and send it to C*. I used the example below >>> from this link <https://github.com/tacoo/cassandra-antlr-sample>. I get >>> the following error. And I thought maybe someone in this mailing list will >>> be more familiar with internals. >>> >>> Exception in thread "main" >>> org.apache.cassandra.exceptions.ConfigurationException: Keyspace >>> test_keyspace doesn't exist >>> at >>> org.apache.cassandra.cql3.statements.CreateTableStatement$RawStatement.prepare(CreateTableStatement.java:200) >>> at com.hello.world.Test.main(Test.java:23) >>> >>> >>> Here is my code. 
>>> >>> package com.hello.world; >>> >>> import org.antlr.runtime.ANTLRStringStream; >>> import org.antlr.runtime.CommonTokenStream; >>> import org.apache.cassandra.cql3.CqlLexer; >>> import org.apache.cassandra.cql3.CqlParser; >>> import org.apache.cassandra.cql3.statements.CreateTableStatement; >>> import org.apache.cassandra.cql3.statements.ParsedStatement; >>> >>> public class Test { >>> >>> public static void main(String[] args) throws Exception { >>> String stmt = "create table if not exists test_keyspace.my_table >>> (field1 text, field2 int, field3 set, field4 map<ascii, text>, >>> primary key (field1) );"; >>> ANTLRStringStream st
Re: How to Parse raw CQL text?
I just did some trial and error. Looks like this would work public class Test { public static void main(String[] args) throws Exception { String stmt = "create table if not exists test_keyspace.my_table (field1 text, field2 int, field3 set, field4 map<ascii, text>, primary key (field1) );"; ANTLRStringStream stringStream = new ANTLRStringStream(stmt); CqlLexer cqlLexer = new CqlLexer(stringStream); CommonTokenStream token = new CommonTokenStream(cqlLexer); CqlParser parser = new CqlParser(token); ParsedStatement query = parser.cqlStatement(); if (query.getClass().getDeclaringClass() == CreateTableStatement.class) { CreateTableStatement.RawStatement cts = (CreateTableStatement.RawStatement) query; CFMetaData .compile(stmt, cts.keyspace()) .getColumnMetadata() .values() .stream() .forEach(cd -> System.out.println(cd)); } } } On Mon, Feb 5, 2018 at 2:13 PM, Kant Kodali <k...@peernova.com> wrote: > Hi Anant, > > I just have CQL create table statement as a string I want to extract all > the parts like, tableName, KeySpaceName, regular Columns, partitionKey, > ClusteringKey, Clustering Order and so on. Thats really it! > > Thanks! > > On Mon, Feb 5, 2018 at 1:50 PM, Rahul Singh <rahul.xavier.si...@gmail.com> > wrote: > >> I think I understand what you are trying to do … but what is your goal? >> What do you mean “use it for different” queries… Maybe you want to do an >> event and have an event processor? Seems like you are trying to basically >> by pass that pattern and parse a query and split it into several actions? >> >> Did you look into this unit test folder? 
>> >> https://github.com/apache/cassandra/blob/trunk/test/unit/ >> org/apache/cassandra/cql3/CQLTester.java >> >> -- >> Rahul Singh >> rahul.si...@anant.us >> >> Anant Corporation >> >> On Feb 5, 2018, 4:06 PM -0500, Kant Kodali <k...@peernova.com>, wrote: >> >> Hi All, >> >> I have a need where I get a raw CQL create table statement as a String >> and I need to parse the keyspace, tablename, columns and so on..so I can >> use it for various queries and send it to C*. I used the example below >> from this link <https://github.com/tacoo/cassandra-antlr-sample>. I get >> the following error. And I thought maybe someone in this mailing list will >> be more familiar with internals. >> >> Exception in thread "main" >> org.apache.cassandra.exceptions.ConfigurationException: >> Keyspace test_keyspace doesn't exist >> at org.apache.cassandra.cql3.statements.CreateTableStatement$Ra >> wStatement.prepare(CreateTableStatement.java:200) >> at com.hello.world.Test.main(Test.java:23) >> >> >> Here is my code. 
>> >> package com.hello.world; >> >> import org.antlr.runtime.ANTLRStringStream; >> import org.antlr.runtime.CommonTokenStream; >> import org.apache.cassandra.cql3.CqlLexer; >> import org.apache.cassandra.cql3.CqlParser; >> import org.apache.cassandra.cql3.statements.CreateTableStatement; >> import org.apache.cassandra.cql3.statements.ParsedStatement; >> >> public class Test { >> >> public static void main(String[] args) throws Exception { >> String stmt = "create table if not exists test_keyspace.my_table >> (field1 text, field2 int, field3 set, field4 map<ascii, text>, >> primary key (field1) );"; >> ANTLRStringStream stringStream = new ANTLRStringStream(stmt); >> CqlLexer cqlLexer = new CqlLexer(stringStream); >> CommonTokenStream token = new CommonTokenStream(cqlLexer); >> CqlParser parser = new CqlParser(token); >> ParsedStatement query = parser.query(); >> if (query.getClass().getDeclaringClass() == >> CreateTableStatement.class) { >> CreateTableStatement.RawStatement cts = >> (CreateTableStatement.RawStatement) query; >> System.out.println(cts.keyspace()); >> System.out.println(cts.columnFamily()); >> ParsedStatement.Prepared prepared = cts.prepare(); >> CreateTableStatement cts2 = (CreateTableStatement) >> prepared.statement; >> cts2.getCFMetaData() >> .getColumnMetadata() >> .values() >> .stream() >> .forEach(cd -> System.out.println(cd)); >> } >> } >> } >> >> Thanks! >> >> >
Re: How to Parse raw CQL text?
Hi Anant, I just have CQL create table statement as a string I want to extract all the parts like, tableName, KeySpaceName, regular Columns, partitionKey, ClusteringKey, Clustering Order and so on. Thats really it! Thanks! On Mon, Feb 5, 2018 at 1:50 PM, Rahul Singh <rahul.xavier.si...@gmail.com> wrote: > I think I understand what you are trying to do … but what is your goal? > What do you mean “use it for different” queries… Maybe you want to do an > event and have an event processor? Seems like you are trying to basically > by pass that pattern and parse a query and split it into several actions? > > Did you look into this unit test folder? > > https://github.com/apache/cassandra/blob/trunk/test/ > unit/org/apache/cassandra/cql3/CQLTester.java > > -- > Rahul Singh > rahul.si...@anant.us > > Anant Corporation > > On Feb 5, 2018, 4:06 PM -0500, Kant Kodali <k...@peernova.com>, wrote: > > Hi All, > > I have a need where I get a raw CQL create table statement as a String and > I need to parse the keyspace, tablename, columns and so on..so I can use it > for various queries and send it to C*. I used the example below from this > link <https://github.com/tacoo/cassandra-antlr-sample>. I get the > following error. And I thought maybe someone in this mailing list will be > more familiar with internals. > > Exception in thread "main" > org.apache.cassandra.exceptions.ConfigurationException: > Keyspace test_keyspace doesn't exist > at org.apache.cassandra.cql3.statements.CreateTableStatement$Ra > wStatement.prepare(CreateTableStatement.java:200) > at com.hello.world.Test.main(Test.java:23) > > > Here is my code. 
> > package com.hello.world; > > import org.antlr.runtime.ANTLRStringStream; > import org.antlr.runtime.CommonTokenStream; > import org.apache.cassandra.cql3.CqlLexer; > import org.apache.cassandra.cql3.CqlParser; > import org.apache.cassandra.cql3.statements.CreateTableStatement; > import org.apache.cassandra.cql3.statements.ParsedStatement; > > public class Test { > > public static void main(String[] args) throws Exception { > String stmt = "create table if not exists test_keyspace.my_table > (field1 text, field2 int, field3 set, field4 map<ascii, text>, primary > key (field1) );"; > ANTLRStringStream stringStream = new ANTLRStringStream(stmt); > CqlLexer cqlLexer = new CqlLexer(stringStream); > CommonTokenStream token = new CommonTokenStream(cqlLexer); > CqlParser parser = new CqlParser(token); > ParsedStatement query = parser.query(); > if (query.getClass().getDeclaringClass() == > CreateTableStatement.class) { > CreateTableStatement.RawStatement cts = > (CreateTableStatement.RawStatement) query; > System.out.println(cts.keyspace()); > System.out.println(cts.columnFamily()); > ParsedStatement.Prepared prepared = cts.prepare(); > CreateTableStatement cts2 = (CreateTableStatement) > prepared.statement; > cts2.getCFMetaData() > .getColumnMetadata() > .values() > .stream() > .forEach(cd -> System.out.println(cd)); > } > } > } > > Thanks! > >
Re: How to Parse raw CQL text?
I think I understand what you are trying to do … but what is your goal? What do you mean “use it for different” queries… Maybe you want to do an event and have an event processor? Seems like you are trying to basically by pass that pattern and parse a query and split it into several actions? Did you look into this unit test folder? https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/cql3/CQLTester.java -- Rahul Singh rahul.si...@anant.us Anant Corporation On Feb 5, 2018, 4:06 PM -0500, Kant Kodali <k...@peernova.com>, wrote: > Hi All, > > I have a need where I get a raw CQL create table statement as a String and I > need to parse the keyspace, tablename, columns and so on..so I can use it for > various queries and send it to C*. I used the example below from this link. I > get the following error. And I thought maybe someone in this mailing list > will be more familiar with internals. > > Exception in thread "main" > org.apache.cassandra.exceptions.ConfigurationException: Keyspace > test_keyspace doesn't exist > at > org.apache.cassandra.cql3.statements.CreateTableStatement$RawStatement.prepare(CreateTableStatement.java:200) > at com.hello.world.Test.main(Test.java:23) > > > Here is my code. 
> > package com.hello.world; > > import org.antlr.runtime.ANTLRStringStream; > import org.antlr.runtime.CommonTokenStream; > import org.apache.cassandra.cql3.CqlLexer; > import org.apache.cassandra.cql3.CqlParser; > import org.apache.cassandra.cql3.statements.CreateTableStatement; > import org.apache.cassandra.cql3.statements.ParsedStatement; > > public class Test { > >public static void main(String[] args) throws Exception { >String stmt = "create table if not exists test_keyspace.my_table > (field1 text, field2 int, field3 set, field4 map<ascii, text>, primary > key (field1) );"; >ANTLRStringStream stringStream = new ANTLRStringStream(stmt); >CqlLexer cqlLexer = new CqlLexer(stringStream); >CommonTokenStream token = new CommonTokenStream(cqlLexer); >CqlParser parser = new CqlParser(token); >ParsedStatement query = parser.query(); >if (query.getClass().getDeclaringClass() == > CreateTableStatement.class) { >CreateTableStatement.RawStatement cts = > (CreateTableStatement.RawStatement) query; >System.out.println(cts.keyspace()); >System.out.println(cts.columnFamily()); >ParsedStatement.Prepared prepared = cts.prepare(); >CreateTableStatement cts2 = (CreateTableStatement) > prepared.statement; >cts2.getCFMetaData() >.getColumnMetadata() >.values() >.stream() >.forEach(cd -> System.out.println(cd)); >} >} > } > Thanks!
How to Parse raw CQL text?
Hi All, I have a need where I get a raw CQL create table statement as a String and I need to parse the keyspace, tablename, columns and so on..so I can use it for various queries and send it to C*. I used the example below from this link <https://github.com/tacoo/cassandra-antlr-sample>. I get the following error. And I thought maybe someone in this mailing list will be more familiar with internals. Exception in thread "main" org.apache.cassandra.exceptions.ConfigurationException: Keyspace test_keyspace doesn't exist at org.apache.cassandra.cql3.statements.CreateTableStatement$ RawStatement.prepare(CreateTableStatement.java:200) at com.hello.world.Test.main(Test.java:23) Here is my code. package com.hello.world; import org.antlr.runtime.ANTLRStringStream; import org.antlr.runtime.CommonTokenStream; import org.apache.cassandra.cql3.CqlLexer; import org.apache.cassandra.cql3.CqlParser; import org.apache.cassandra.cql3.statements.CreateTableStatement; import org.apache.cassandra.cql3.statements.ParsedStatement; public class Test { public static void main(String[] args) throws Exception { String stmt = "create table if not exists test_keyspace.my_table (field1 text, field2 int, field3 set, field4 map<ascii, text>, primary key (field1) );"; ANTLRStringStream stringStream = new ANTLRStringStream(stmt); CqlLexer cqlLexer = new CqlLexer(stringStream); CommonTokenStream token = new CommonTokenStream(cqlLexer); CqlParser parser = new CqlParser(token); ParsedStatement query = parser.query(); if (query.getClass().getDeclaringClass() == CreateTableStatement.class) { CreateTableStatement.RawStatement cts = (CreateTableStatement.RawStatement) query; System.out.println(cts.keyspace()); System.out.println(cts.columnFamily()); ParsedStatement.Prepared prepared = cts.prepare(); CreateTableStatement cts2 = (CreateTableStatement) prepared.statement; cts2.getCFMetaData() .getColumnMetadata() .values() .stream() .forEach(cd -> System.out.println(cd)); } } } Thanks!
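For reference, if all that is needed is the keyspace and table name, a rough regex sketch avoids the Cassandra internals (and the `prepare()`-time keyspace check) entirely. This is emphatically not the real CQL grammar: it ignores quoted identifiers, unqualified names, and many valid statements, and the class and method names below are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CreateTableSketch {

    // Very rough pattern: "create table [if not exists] keyspace.table (..."
    private static final Pattern CREATE_TABLE = Pattern.compile(
        "(?i)create\\s+table\\s+(?:if\\s+not\\s+exists\\s+)?(\\w+)\\.(\\w+)\\s*\\(");

    // Returns {keyspace, table} or throws if the statement doesn't match.
    public static String[] keyspaceAndTable(String cql) {
        Matcher m = CREATE_TABLE.matcher(cql);
        if (!m.find()) {
            throw new IllegalArgumentException("not a qualified CREATE TABLE: " + cql);
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String stmt = "create table if not exists test_keyspace.my_table "
            + "(field1 text, field2 int, primary key (field1));";
        String[] kt = keyspaceAndTable(stmt);
        System.out.println(kt[0] + " / " + kt[1]); // prints "test_keyspace / my_table"
    }
}
```

Anything beyond names (column types, primary key structure, clustering order) really does need a proper parser, which is why the thread keeps coming back to `QueryProcessor.parseStatement()`.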
Re: Golang + Cassandra + Text Search
https://github.com/Stratio/cassandra-lucene-index is another option - it plugs a full Lucene engine into Cassandra's custom secondary index interface. If you only need text prefix/postfix/substring matching or basic tokenization there is SASI. On Wed, 25 Oct 2017 at 03:50 Who Dadddy <qwerty15...@gmail.com> wrote: > Ridley - have a look at Elassandra > https://github.com/strapdata/elassandra > > > On 24 Oct 2017, at 06:50, Ridley Submission < > ridley.submission2...@gmail.com> wrote: > > Hi, > > Quick question, I am wondering if anyone here who works with Go has > specific recommendations for as simple framework to add text search on top > of cassandra? > > (Apologies if this is off topic—I am not quite sure what forum in the > cassandra community would be best for this type of question) > > Thanks, > Riley > > > -- *Justin Cameron*Senior Software Engineer <https://www.instaclustr.com/> This email has been sent on behalf of Instaclustr Pty. Limited (Australia) and Instaclustr Inc (USA). This email and any attachments may contain confidential and legally privileged information. If you are not the intended recipient, do not copy or disclose its content, but please reply to this email immediately and highlight the error to the sender and then immediately delete the message.
Re: Golang + Cassandra + Text Search
Ridley - have a look at Elassandra https://github.com/strapdata/elassandra <https://github.com/strapdata/elassandra> > On 24 Oct 2017, at 06:50, Ridley Submission <ridley.submission2...@gmail.com> > wrote: > > Hi, > > Quick question, I am wondering if anyone here who works with Go has specific > recommendations for as simple framework to add text search on top of > cassandra? > > (Apologies if this is off topic—I am not quite sure what forum in the > cassandra community would be best for this type of question) > > Thanks, > Riley
Re: Golang + Cassandra + Text Search
When someone talks about full text search, I usually assume there's more required than keyword search (i.e. simple tokenization and a little stemming):

* Term Vectors, commonly used for a "more like this" feature
* Ranking of search results
* Facets
* More complex tokenization, like trigrams

So anyway, I don't know if the OP had those requirements, but it's important to keep them in mind. > On Oct 24, 2017, at 1:33 AM, DuyHai Doan <doanduy...@gmail.com> wrote: > > There is already a full text search index in Cassandra called SASI > > On Tue, Oct 24, 2017 at 6:50 AM, Ridley Submission > <ridley.submission2...@gmail.com <mailto:ridley.submission2...@gmail.com>> > wrote: > Hi, > > Quick question, I am wondering if anyone here who works with Go has specific > recommendations for as simple framework to add text search on top of > cassandra? > > (Apologies if this is off topic—I am not quite sure what forum in the > cassandra community would be best for this type of question) > > Thanks, > Riley >
Re: Golang + Cassandra + Text Search
There is already a full text search index in Cassandra called SASI On Tue, Oct 24, 2017 at 6:50 AM, Ridley Submission < ridley.submission2...@gmail.com> wrote: > Hi, > > Quick question, I am wondering if anyone here who works with Go has > specific recommendations for as simple framework to add text search on top > of cassandra? > > (Apologies if this is off topic—I am not quite sure what forum in the > cassandra community would be best for this type of question) > > Thanks, > Riley >
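For anyone landing on this thread later: a SASI index is created through CQL's custom index syntax. The sketch below is illustrative only; the table and index names are made up, and the options shown are the commonly documented ones for tokenized substring matching (verify against your Cassandra version; SASI shipped with 3.4+).

```sql
-- Hypothetical table; names are made up for illustration.
CREATE TABLE posts (id uuid PRIMARY KEY, body text);

-- CONTAINS mode plus an analyzer enables tokenized LIKE '%term%' queries.
CREATE CUSTOM INDEX posts_body_idx ON posts (body)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
    'mode': 'CONTAINS',
    'analyzed': 'true',
    'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer'
};

SELECT id FROM posts WHERE body LIKE '%cassandra%';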
Golang + Cassandra + Text Search
Hi, Quick question, I am wondering if anyone here who works with Go has specific recommendations for a simple framework to add text search on top of Cassandra? (Apologies if this is off topic—I am not quite sure what forum in the Cassandra community would be best for this type of question) Thanks, Riley
Re: Cassandra blob vs base64 text
You could save space when storing your data (base64-)decoded as blobs. 2017-02-20 13:38 GMT+01:00 Oskar Kjellin <oskar.kjel...@gmail.com>: > We currently have some cases where we store base64 as a text field instead > of a blob (running version 2.0.17). > I would like to move these to blob but wondering what benefits and > optimizations there are? The possible ones I can think of is (but there's > probably more): > > * blob is stored as off heap ByteBuffers? > * blob won't be decompressed server side? > > Are there any other reasons to switch to blobs? Or are we not going to see > any difference? > > Thanks! >
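To put a rough number on the space point (a standalone sketch, not Cassandra-specific): base64 emits 4 output bytes for every 3 input bytes, so a text column holding base64 is about a third larger than a blob holding the decoded bytes, before any compression.

```python
import base64
import os

raw = os.urandom(3000)           # arbitrary binary payload
encoded = base64.b64encode(raw)  # what a base64 text column would store

# 4 output bytes per 3 input bytes: 3000 raw bytes -> 4000 encoded bytes
overhead = len(encoded) / len(raw)
print(len(raw), len(encoded), overhead)
```

This is the storage argument only; the off-heap and server-side decompression questions from the original post are separate concerns.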
Cassandra blob vs base64 text
We currently have some cases where we store base64 as a text field instead of a blob (running version 2.0.17). I would like to move these to blob but am wondering what benefits and optimizations there are? The possible ones I can think of are (but there are probably more): * blob is stored as off-heap ByteBuffers? * blob won't be decompressed server side? Are there any other reasons to switch to blobs? Or are we not going to see any difference? Thanks!
Re: UDA can't use int or text as state_type
Problem solved! INITCOND {} should be INITCOND 0. ---- Original message ---- From: lowping <lowp...@163.com> To: user <u...@cassandra.apache.org> Sent: Mon, 27 Jun 2016 16:03 Subject: UDA can't use int or text as state_type Hi all, I hit a problem today when I created a UDA like this; I hope you can help me solve it: CREATE OR REPLACE FUNCTION sum_fun(state int, type text) // if the state type is SET or MAP, this works CALLED ON NULL INPUT RETURNS int LANGUAGE java AS 'return Integer.parseInt(type)+state;' ; CREATE OR REPLACE AGGREGATE aggr_sum(text) SFUNC sum_fun STYPE int INITCOND {}; error message: InvalidRequest: code=2200 [Invalid query] message="Invalid set literal for (aggregate_initcond) of type int" cassandra version: 2.2
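Spelled out, the working definitions from this thread look like the following (the function body is unchanged from the original post; only the INITCOND differs):

```sql
CREATE OR REPLACE FUNCTION sum_fun(state int, type text)
    CALLED ON NULL INPUT
    RETURNS int
    LANGUAGE java
    AS 'return Integer.parseInt(type) + state;';

-- STYPE is int, so INITCOND must be an int literal.
-- {} is a collection literal, which only type-checks when the state
-- type is a SET or MAP, hence the "Invalid set literal" error.
CREATE OR REPLACE AGGREGATE aggr_sum(text)
    SFUNC sum_fun
    STYPE int
    INITCOND 0;
```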
UDA can't use int or text as state_type
Hi all, I hit a problem today when I created a UDA like this; I hope you can help me solve it: CREATE OR REPLACE FUNCTION sum_fun(state int, type text) // if the state type is SET or MAP, this works CALLED ON NULL INPUT RETURNS int LANGUAGE java AS 'return Integer.parseInt(type)+state;' ; CREATE OR REPLACE AGGREGATE aggr_sum(text) SFUNC sum_fun STYPE int INITCOND {}; error message: InvalidRequest: code=2200 [Invalid query] message="Invalid set literal for (aggregate_initcond) of type int" cassandra version: 2.2
Store JSON as text or UTF-8 encoded blobs?
Hey. I'm considering migrating my DB from using multiple columns to just 2 columns, with the second one being a JSON object. Is there going to be any real difference between TEXT and a UTF-8 encoded BLOB? I guess it would probably be easier to get tools like Spark to parse the object as JSON if it's represented as a BLOB. -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts
text partition key Bloom filters fp is 1 always, why?
Hello, I have a text partition key for one of the CFs. cfstats on that table seems to show that the bloom filter false positive ratio is always 1. The bloom filter is also using very little space. Do bloom filters not work well with text partition keys? I assume this because the filter has no way to detect the length of the text and hence would have a very high false-positive rate. The text partition key is built as long + '_' + epoch_time_in_hours; would it be better to have a composite partition key of (long, epoch_time_in_hours) rather than combining them into a text key? Thanks, anishek
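No answer was posted to this thread, but for context on what drives the ratio: a Bloom filter hashes the serialized partition key to fixed-size values, so the key's type and length don't enter into it; the expected false-positive rate depends only on the number of inserted keys, the bitmap size, and the hash count. A minimal sketch of the standard estimate (the parameter values below are illustrative, not Cassandra defaults):

```python
import math

def bloom_fp(n_keys: int, m_bits: int, k_hashes: int) -> float:
    """Expected false-positive probability: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k_hashes * n_keys / m_bits)) ** k_hashes

# ~10 bits per key with k = 7 hashes keeps the fp rate under 1%,
# regardless of whether the keys are text, longs, or composites.
fp = bloom_fp(n_keys=1_000_000, m_bits=10_000_000, k_hashes=7)
print(f"{fp:.4f}")
```

A ratio of exactly 1 with very little space used therefore points at the filter's sizing or metrics, not at the key being text.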
efficiently generate complete database dump in text format
Hi, We have a Cassandra column family containing 320 million rows, and each row contains about 15 columns. We want to take a monthly dump of this single column family in text format. We are planning the following approach: 1. Take a snapshot of the Cassandra database using the nodetool utility, specifying the -cf flag so that the snapshot contains data for a single column family. 2. Take a backup of this snapshot and move it to a separate physical machine. 3. Use the SSTable-to-JSON conversion utility to convert all the data files into JSON format. We have the following questions/doubts regarding this approach: a) Generated JSON records contain a d (IS_MARKED_FOR_DELETE) flag; can I safely ignore all such records? b) If I ignore all records marked by the d flag, can the JSON files generated in step 3 still contain duplicate records, i.e. multiple entries for the same key? Is there any better approach to generate data dumps in text format? Regards, Gaurav
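On questions a) and b): a deletion marker can shadow an older live value in a different SSTable, so tombstoned records can't simply be dropped before duplicates are merged. The sketch below shows the merge order on a made-up JSON shape loosely modeled on sstable2json's [name, value, timestamp, flags] column arrays; the exact output format varies by Cassandra version, so treat the field layout as an assumption and check it against your dumps.

```python
def merge_rows(rows):
    """Merge rows that share a key across SSTable dumps: the newest
    timestamp wins per column, and tombstones ('d' marker) are applied
    during the merge, then stripped from the output."""
    merged = {}  # key -> {column_name: (timestamp, value_or_None)}
    for row in rows:
        cols = merged.setdefault(row["key"], {})
        for col in row["columns"]:
            name, value, ts = col[0], col[1], col[2]
            deleted = len(col) > 3 and col[3] == "d"
            if name not in cols or ts > cols[name][0]:
                cols[name] = (ts, None if deleted else value)
    # Strip tombstones only after the newest-wins merge, not before:
    # a tombstone may shadow an older live value in another SSTable.
    return {
        key: {n: v for n, (ts, v) in cols.items() if v is not None}
        for key, cols in merged.items()
    }

# Two dumps of the same key: column 'a' was updated, 'b' was deleted later.
rows = [
    {"key": "k1", "columns": [["a", "old", 1], ["b", "x", 1]]},
    {"key": "k1", "columns": [["a", "new", 2], ["b", "", 3, "d"]]},
]
print(merge_rows(rows))  # {'k1': {'a': 'new'}}
```

So: records with the d flag can be ignored in the final output, but only after they have been allowed to suppress the older duplicates they shadow.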
Re: efficiently generate complete database dump in text format
The best way to generate dumps from Cassandra is via Hadoop integration (or spark). You can find more info here: http://www.datastax.com/documentation/cassandra/2.1/cassandra/configuration/configHadoop.html http://wiki.apache.org/cassandra/HadoopSupport On Thu, Oct 9, 2014 at 4:19 AM, Gaurav Bhatnagar gbhatna...@gmail.com wrote: Hi, We have a Cassandra database column family containing 320 millions rows and each row contains about 15 columns. We want to take monthly dump of this single column family contained in this database in text format. We are planning to take following approach to implement this functionality 1. Take a snapshot of Cassandra database using nodetool utility. We specify -cf flag to specify column family name so that snapshot contains data corresponding to a single column family. 2. We take backup of this snapshot and move this backup to a separate physical machine. 3. We using SStable to json conversion utility to json convert all the data files into json format. We have following questions/doubts regarding the above approach a) Generated json records contains d (IS_MARKED_FOR_DELETE) flag in json record and can I safely ignore all such json records? b) If I ignore all records marked by d flag, than can generated json files in step 3, contain duplicate records? I mean do multiple entries for same key. Do there can be any other better approach to generate data dumps in text format. Regards, Gaurav -- *Paulo Motta* Chaordic | *Platform* *www.chaordic.com.br http://www.chaordic.com.br/* +55 48 3232.3200
Re: efficiently generate complete database dump in text format
You might also want to consider tools like https://github.com/Netflix/aegisthus for the last step, which can help you deal with tombstones and de-duplicate data. Thanks, Daniel On Thu, Oct 9, 2014 at 12:19 AM, Gaurav Bhatnagar gbhatna...@gmail.com wrote: Hi, We have a Cassandra database column family containing 320 millions rows and each row contains about 15 columns. We want to take monthly dump of this single column family contained in this database in text format. We are planning to take following approach to implement this functionality 1. Take a snapshot of Cassandra database using nodetool utility. We specify -cf flag to specify column family name so that snapshot contains data corresponding to a single column family. 2. We take backup of this snapshot and move this backup to a separate physical machine. 3. We using SStable to json conversion utility to json convert all the data files into json format. We have following questions/doubts regarding the above approach a) Generated json records contains d (IS_MARKED_FOR_DELETE) flag in json record and can I safely ignore all such json records? b) If I ignore all records marked by d flag, than can generated json files in step 3, contain duplicate records? I mean do multiple entries for same key. Do there can be any other better approach to generate data dumps in text format. Regards, Gaurav
Re: is lack of full text search hurting cassandra and datastax?
There are some options around for full text search integration with C*. Google for Stratio Deep and Stargate. Both are open source. On 3 Oct 2014 06:31, Kevin Burton bur...@spinn3r.com wrote: So right now I have plenty of quality and robust full text search systems I can use. Solr cloud, elastic search. They all also have very robust UIs on top of them… kibana, banana, etc. and my alternative for cassandra is… paying for a proprietary database. Which might be fine for some parties… but I want something that is documented and supported by the community and all the advantages of open source. So is DSE really giving Datastax that much of a win? I'm sure they are making money off it… and I hope they're successful of course. But I can't help feeling that cassandra as an open source project is being hindered by lack of a full text option. Additionally, some people can get away with storing the content directly in a full text system and skipping the cassandra route altogether. Seems like a situation without many winners… -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com
Re: is lack of full text search hurting cassandra and datastax?
You can use also Stratio Cassandra https://github.com/Stratio/stratio-cassandra, which is an open source fork of Cassandra with Lucene based full text search capabilities. -- Andrés de la Peña http://www.stratio.com/ Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón, Madrid Tel: +34 91 352 59 42 // *@stratiobd https://twitter.com/StratioBD*
Re: is lack of full text search hurting cassandra and datastax?
And meanwhile, DataStax will continue to invest in and promote and support full text search of your Cassandra data with our tight integration of Solr in DataStax Enterprise. BTW, there is in fact very strong interest in DataStax Enterprise, and not just as "support" for raw Cassandra, so I'm not so sure that I'd worry too much that DataStax is "hurting" in any way! And to be clear, even with DataStax Enterprise, your data is still stored in the same sstables that you've come to love in Cassandra, and with the same CQL API as well, so your data is in no way trapped in... a "proprietary database." Our Solr indexing is in addition to the storage of your data in Cassandra. Sure, there will be a few losers in any significantly large and complex activity, but rest assured that the vast majority will be winners here. And as DuyHai notes, there are indeed open source options available as well. -- Jack Krupansky From: DuyHai Doan Sent: Friday, October 3, 2014 3:54 AM To: user@cassandra.apache.org Subject: Re: is lack of full text search hurting cassandra and datastax? There are some options around for full text search integration with C*. Google for Stratio Deep and Stargate. Both are open source. On 3 Oct 2014 06:31, Kevin Burton bur...@spinn3r.com wrote: So right now I have plenty of quality and robust full text search systems I can use. Solr cloud, elastic search. They all also have very robust UIs on top of them… kibana, banana, etc. and my alternative for cassandra is… paying for a proprietary database. Which might be fine for some parties… but I want something that is documented and supported by the community and all the advantages of open source. So is DSE really giving Datastax that much of a win? I'm sure they are making money off it… and I hope they're successful of course. But I can't help feeling that cassandra as an open source project is being hindered by lack of a full text option.
Additionally, some people can get away with storing the content directly in a full text system and skipping the cassandra route altogether. Seems like a situation without many winners… -- Founder/CEO Spinn3r.com Location: San Francisco, CA blog: http://burtonator.wordpress.com … or check out my Google+ profile
is lack of full text search hurting cassandra and datastax?
So right now I have plenty of quality and robust full text search systems I can use: Solr Cloud, Elasticsearch. They all also have very robust UIs on top of them… Kibana, Banana, etc. And my alternative for Cassandra is… paying for a proprietary database. Which might be fine for some parties… but I want something that is documented and supported by the community, with all the advantages of open source. So is DSE really giving DataStax that much of a win? I'm sure they are making money off it… and I hope they're successful, of course. But I can't help feeling that Cassandra as an open source project is being hindered by the lack of a full text option. Additionally, some people can get away with storing the content directly in a full text system and skipping the Cassandra route altogether. Seems like a situation without many winners… -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com
Re: Adding large text blob causes read timeout...
oh.. the difference between the ONE field and the remaining 29 is massive. It's like 200ms for just the 29 columns.. adding the extra one causes it to time out at 5000ms... On Mon, Jun 23, 2014 at 10:30 PM, DuyHai Doan doanduy...@gmail.com wrote: Don't forget that when you do the Select with limit set to 1000, Cassandra is actually fetching 1000 * 29 physical columns (29 fields per logical row). Adding one extra big html column may be too much and cause a timeout. Try to: 1. Select only the big html column 2. Or reduce the limit incrementally until there is no timeout On 24 Jun 2014 06:22, Kevin Burton bur...@spinn3r.com wrote: I have a table with a schema mostly of small fields. About 30 of them. The primary key is: primary key( bucket, sequence ) … I have 100 buckets and the idea is that sequence is ever increasing. This way I can read from bucket zero, and everything after sequence N, and get all the writes ordered by time. I'm running SELECT ... FROM content WHERE bucket=0 AND sequence > 0 ORDER BY sequence ASC LIMIT 1000; … using the Java driver. If I add ALL the fields except one, so 29 fields, the query is fast. Only 129ms…. However, if I add the 'html' field, which is a snapshot of HTML obviously, the query times out… I'm going to add tracing and try to track it down further, but I suspect I'm doing something stupid. Is it going to burn me that the data is UTF8 encoded? I can't imagine decoding UTF8 is going to be THAT slow, but perhaps cassandra is doing something silly under the covers?
cqlsh doesn't time out … it actually works fine, but it uses 100% CPU while writing out the data, so it's not a good comparison unfortunately. Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: ...:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout during read)) at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65) at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:256) at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:172) at com.datastax.driver.core.SessionManager.execute(SessionManager.java:92) at com.spinn3r.artemis.robot.console.BenchmarkContentStream.main(BenchmarkContentStream.java:100) Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: dev4.wdc.sl.spinn3r.com/10.24.23.94:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout during read)) at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103) at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:175) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* Skype: *burtonator* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Re: Adding large text blob causes read timeout...
Yes, but the extra column ends up multiplied by 1000: the LIMIT in CQL3 specifies the number of logical rows, not the number of physical columns in the storage engine. On 24 Jun 2014 08:30, Kevin Burton bur...@spinn3r.com wrote: oh.. the difference between the ONE field and the remaining 29 is massive. It's like 200ms for just the 29 columns.. adding the extra one causes it to time out at 5000ms... On Mon, Jun 23, 2014 at 10:30 PM, DuyHai Doan doanduy...@gmail.com wrote: Don't forget that when you do the Select with limit set to 1000, Cassandra is actually fetching 1000 * 29 physical columns (29 fields per logical row). Adding one extra big html column may be too much and cause a timeout. Try to: 1. Select only the big html column 2. Or reduce the limit incrementally until there is no timeout On 24 Jun 2014 06:22, Kevin Burton bur...@spinn3r.com wrote: I have a table with a schema mostly of small fields. About 30 of them. The primary key is: primary key( bucket, sequence ) … I have 100 buckets and the idea is that sequence is ever increasing. This way I can read from bucket zero, and everything after sequence N, and get all the writes ordered by time. I'm running SELECT ... FROM content WHERE bucket=0 AND sequence > 0 ORDER BY sequence ASC LIMIT 1000; … using the Java driver. If I add ALL the fields except one, so 29 fields, the query is fast. Only 129ms…. However, if I add the 'html' field, which is a snapshot of HTML obviously, the query times out… I'm going to add tracing and try to track it down further, but I suspect I'm doing something stupid. Is it going to burn me that the data is UTF8 encoded? I can't imagine decoding UTF8 is going to be THAT slow, but perhaps cassandra is doing something silly under the covers?
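DuyHai's arithmetic explains the timeout: LIMIT counts logical rows, so every selected column is fetched 1000 times. With assumed sizes (the ~100 KB html figure is a guess, not from the thread), the payload jumps by roughly two orders of magnitude:

```python
rows = 1000            # LIMIT 1000 -> 1000 logical rows
small_cols = 29
avg_small = 50         # assumed bytes per small column
avg_html = 100 * 1024  # assumed bytes per html snapshot

without_html = rows * small_cols * avg_small  # ~1.4 MB for 29 small columns
with_html = without_html + rows * avg_html    # ~104 MB once html is selected
print(without_html, with_html, with_html / without_html)
```

A ~70x larger response on the same timeout budget is enough to push a 200ms query past 5000ms, independent of any UTF-8 decoding cost.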
Re: Can I call getBytes on a text column to get the raw (already encoded UTF8)
Assuming we're talking about the DataStax Java driver: getBytes will throw an exception, because it validates that the column is of type BLOB. But you can use getBytesUnsafe: ByteBuffer b = row.getBytesUnsafe("aTextColumn"); // if you want to check it: Charset.forName("UTF-8").decode(b); Regarding whether this will continue working in the future: from the driver's perspective, the fact that the native protocol uses UTF-8 is an implementation detail, but I doubt this will change any time soon. On Tue, Jun 24, 2014 at 7:23 AM, DuyHai Doan doanduy...@gmail.com wrote: Good idea, bytes are barely processed by the server, so you're saving a lot of CPU. AFAIK getBytes should work fine. On 24 Jun 2014 05:50, Kevin Burton bur...@spinn3r.com wrote: I'm building a webservice whereby I read the data from cassandra, then write it over the wire. It's going to push LOTS of content, and encoding/decoding performance has really bitten us in the past. So I try to avoid transparent encoding/decoding if I can avoid it. So right now, I have a huge blob of text that's a 'text' column. Logically it *should* be text, because that's what it is... Can I just keep it as text so our normal tools work on it, but get it as raw UTF8 if I call getBytes? This way I can call getBytes and then send it right over the wire as pre-encoded UTF8 data. ... and of course the question is whether it will continue working in the future :-P I'll write a test of it of course but I wanted to see what you guys thought of this idea. -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* Skype: *burtonator* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Re: Can I call getBytes on a text column to get the raw (already encoded UTF8)
You can use getBytesUnsafe on the UTF8 column -- Sent from my iPhone On 24.06.2014 at 09:13, Olivier Michallat olivier.michal...@datastax.com wrote: Assuming we're talking about the DataStax Java driver: getBytes will throw an exception, because it validates that the column is of type BLOB. But you can use getBytesUnsafe: ByteBuffer b = row.getBytesUnsafe("aTextColumn"); // if you want to check it: Charset.forName("UTF-8").decode(b); Regarding whether this will continue working in the future: from the driver's perspective, the fact that the native protocol uses UTF-8 is an implementation detail, but I doubt this will change any time soon. On Tue, Jun 24, 2014 at 7:23 AM, DuyHai Doan doanduy...@gmail.com wrote: Good idea, bytes are barely processed by the server, so you're saving a lot of CPU. AFAIK getBytes should work fine. On 24 Jun 2014 05:50, Kevin Burton bur...@spinn3r.com wrote: I'm building a webservice whereby I read the data from cassandra, then write it over the wire. It's going to push LOTS of content, and encoding/decoding performance has really bitten us in the past. So I try to avoid transparent encoding/decoding if I can avoid it. So right now, I have a huge blob of text that's a 'text' column. Logically it *should* be text, because that's what it is... Can I just keep it as text so our normal tools work on it, but get it as raw UTF8 if I call getBytes? This way I can call getBytes and then send it right over the wire as pre-encoded UTF8 data. ... and of course the question is whether it will continue working in the future :-P I'll write a test of it of course but I wanted to see what you guys thought of this idea. -- Founder/CEO Spinn3r.com Location: San Francisco, CA Skype: burtonator blog: http://burtonator.wordpress.com … or check out my Google+ profile War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Re: Adding large text blob causes read timeout...
Can you do your query in the CLI after setting tracing on? On Mon, Jun 23, 2014 at 11:32 PM, DuyHai Doan doanduy...@gmail.com wrote: Yes, but the extra column ends up multiplied by 1000: the LIMIT in CQL3 specifies the number of logical rows, not the number of physical columns in the storage engine. On 24 Jun 2014 08:30, Kevin Burton bur...@spinn3r.com wrote: oh.. the difference between the ONE field and the remaining 29 is massive. It's like 200ms for just the 29 columns.. adding the extra one causes it to time out at 5000ms... On Mon, Jun 23, 2014 at 10:30 PM, DuyHai Doan doanduy...@gmail.com wrote: Don't forget that when you do the Select with limit set to 1000, Cassandra is actually fetching 1000 * 29 physical columns (29 fields per logical row). Adding one extra big html column may be too much and cause a timeout. Try to: 1. Select only the big html column 2. Or reduce the limit incrementally until there is no timeout On 24 Jun 2014 06:22, Kevin Burton bur...@spinn3r.com wrote: I have a table with a schema mostly of small fields. About 30 of them. The primary key is: primary key( bucket, sequence ) … I have 100 buckets and the idea is that sequence is ever increasing. This way I can read from bucket zero, and everything after sequence N, and get all the writes ordered by time. I'm running SELECT ... FROM content WHERE bucket=0 AND sequence > 0 ORDER BY sequence ASC LIMIT 1000; … using the Java driver. If I add ALL the fields except one, so 29 fields, the query is fast. Only 129ms…. However, if I add the 'html' field, which is a snapshot of HTML obviously, the query times out… I'm going to add tracing and try to track it down further, but I suspect I'm doing something stupid. Is it going to burn me that the data is UTF8 encoded? I can't imagine decoding UTF8 is going to be THAT slow, but perhaps cassandra is doing something silly under the covers?
-- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: Can I call getBytes on a text column to get the raw (already encoded UTF8)
Yes… I confirmed that getBytesUnsafe works… I also have a unit test for it so if cassandra ever changes anything we'll pick it up. One point in your above code. I still think charsets are behind a synchronized code block. So your above code wouldn't be super fast on multi-core machines. I usually use guava's Charsets class since they have static references to all of them. … just wanted to point that out since it could bite someone :-P … On Tue, Jun 24, 2014 at 12:13 AM, Olivier Michallat olivier.michal...@datastax.com wrote: Assuming we're talking about the DataStax Java driver: getBytes will throw an exception, because it validates that the column is of type BLOB. But you can use getBytesUnsafe: ByteBuffer b = row.getBytesUnsafe(aTextColumn); // if you want to check it: Charset.forName(UTF-8).decode(b); Regarding whether this will continue working in the future: from the driver's perspective, the fact that the native protocol uses UTF-8 is an implementation detail, but I doubt this will change any time soon. On Tue, Jun 24, 2014 at 7:23 AM, DuyHai Doan doanduy...@gmail.com wrote: Good idea, bytes are merely processed by the server so you're saving a lot of Cpu. AFAIK getBytes should work fine. Le 24 juin 2014 05:50, Kevin Burton bur...@spinn3r.com a écrit : I'm building a webservice whereby I read the data from cassandra, then write it over the wire. It's going to push LOTS of content, and encoding/decoding performance has really bitten us in the future. So I try to avoid transparent encoding/decoding if I can avoid it. So right now, I have a huge blob of text that's a 'text' column. Logically it *should* be text, because that's what it is... Can I just keep it as text so our normal tools work on it, but get it as raw UTF8 if I call getBytes? This way I can call getBytes and then send it right over the wire as pre-encoded UTF8 data. ... 
and of course the question is whether it will continue working in the future :-P I'll write a test of it of course but I wanted to see what you guys thought of this idea. -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* Skype: *burtonator* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Can I call getBytes on a text column to get the raw (already encoded UTF8)
I'm building a webservice whereby I read the data from cassandra, then write it over the wire. It's going to push LOTS of content, and encoding/decoding performance has really bitten us in the past. So I try to avoid transparent encoding/decoding when I can. So right now, I have a huge blob of text that's a 'text' column. Logically it *should* be text, because that's what it is... Can I just keep it as text so our normal tools work on it, but get it as raw UTF8 if I call getBytes? This way I can call getBytes and then send it right over the wire as pre-encoded UTF8 data. ... and of course the question is whether it will continue working in the future :-P I'll write a test of it of course, but I wanted to see what you guys thought of this idea. -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* Skype: *burtonator* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Adding large text blob causes read timeout...
I have a table with a schema mostly of small fields. About 30 of them. The primary key is:

    primary key( bucket, sequence )

… I have 100 buckets and the idea is that sequence is ever increasing. This way I can read from bucket zero, and everything after sequence N, and get all the writes ordered by time. I'm running:

    SELECT ... FROM content WHERE bucket=0 AND sequence > 0 ORDER BY sequence ASC LIMIT 1000;

… using the Java driver. If I add ALL the fields, except one, so 29 fields, the query is fast. Only 129ms…. However, if I add the 'html' field, which is a snapshot of HTML obviously, the query times out… I'm going to add tracing and try to track it down further, but I suspect I'm doing something stupid. Is it going to burn me that the data is UTF-8 encoded? I can't imagine decoding UTF-8 is going to be THAT slow, but perhaps cassandra is doing something silly under the covers? cqlsh doesn't time out … it actually works fine, but it uses 100% CPU while writing out the data, so it's not a good comparison unfortunately.

    Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: ...:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout during read))
        at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
        at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:256)
        at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:172)
        at com.datastax.driver.core.SessionManager.execute(SessionManager.java:92)
        at com.spinn3r.artemis.robot.console.BenchmarkContentStream.main(BenchmarkContentStream.java:100)
    Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: dev4.wdc.sl.spinn3r.com/10.24.23.94:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout during read))
        at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
        at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:175)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
Re: Can I call getBytes on a text column to get the raw (already encoded UTF8)
Good idea, bytes are merely processed by the server so you're saving a lot of CPU. AFAIK getBytes should work fine.

On 24 June 2014 05:50, Kevin Burton bur...@spinn3r.com wrote: …
Re: Adding large text blob causes read timeout...
Don't forget that when you do the SELECT with LIMIT set to 1000, Cassandra is actually fetching 1000 * 29 physical columns (29 fields per logical row). Adding one extra big html column may be too much and cause the timeout. Try to:

1. Select only the big html column
2. Or reduce the limit incrementally until there is no timeout

On 24 June 2014 06:22, Kevin Burton bur...@spinn3r.com wrote: …
com.datastax.driver.core.exceptions.InvalidTypeException: Invalid type for value 1 of CQL type text, expecting class java.lang.String but class [Ljava.lang.Object; provided
I am trying to insert into a Cassandra database using the Datastax Java driver, but every time I get the below exception at the `prBatchInsert.bind` line:

    com.datastax.driver.core.exceptions.InvalidTypeException: Invalid type for value 1 of CQL type text, expecting class java.lang.String but class [Ljava.lang.Object; provided

Below is my method, which accepts `userId` as the input and `attributes` as the `Map` which contains `key` as my `Column Name` and value as the actual value of that column:

    public void upsertAttributes(final String userId, final Map<String, String> attributes, final String columnFamily) {
        try {
            Set<String> keys = attributes.keySet();
            StringBuilder sqlPart1 = new StringBuilder(); // StringBuilder.append() is faster than concatenating Strings in a loop
            StringBuilder sqlPart2 = new StringBuilder();
            sqlPart1.append("INSERT INTO " + columnFamily + " (USER_ID");
            sqlPart2.append(") VALUES ( ?");
            for (String k : keys) {
                sqlPart1.append(", " + k);  // append each key
                sqlPart2.append(", ?");     // append an unknown value for each key
            }
            sqlPart2.append(") ");          // Last parenthesis (and space?)
            String sql = sqlPart1.toString() + sqlPart2.toString();
            CassandraDatastaxConnection.getInstance();
            PreparedStatement prBatchInsert = CassandraDatastaxConnection.getSession().prepare(sql);
            prBatchInsert.setConsistencyLevel(ConsistencyLevel.ONE);
            // this line is giving me an exception
            BoundStatement query = prBatchInsert.bind(userId, attributes.values().toArray(new Object[attributes.size()])); // Vararg methods can take an array (might need to cast it to String[]?)
            CassandraDatastaxConnection.getSession().executeAsync(query);
        } catch (InvalidQueryException e) {
            LOG.error("Invalid Query Exception in CassandraDatastaxClient::upsertAttributes " + e);
        } catch (Exception e) {
            LOG.error("Exception in CassandraDatastaxClient::upsertAttributes " + e);
        }
    }

What am I doing wrong here? Any thoughts?
Re: com.datastax.driver.core.exceptions.InvalidTypeException: Invalid type for value 1 of CQL type text, expecting class java.lang.String but class [Ljava.lang.Object; provided
As the comment in your code suggests, you need to cast the array passed to the bind method to Object[]. This is true any time you pass an array to a varargs method.

On Dec 7, 2013 4:01 PM, Techy Teck comptechge...@gmail.com wrote: …
Re: com.datastax.driver.core.exceptions.InvalidTypeException: Invalid type for value 1 of CQL type text, expecting class java.lang.String but class [Ljava.lang.Object; provided
    BoundStatement query = prBatchInsert.bind(userId, attributes.values().toArray(new *String*[attributes.size()]));

On 12/07/2013 03:59 PM, Techy Teck wrote: …
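A side note on the varargs mechanics, since the two suggested fixes look similar: when a varargs method is called with a leading scalar plus an array, the Java compiler wraps the array as a *single* element of the varargs array, which is exactly why the driver reports an Object[] where it expects a String. A self-contained sketch (the bind method below is a stand-in for illustration, not the driver's) showing the wrapping and one reliable fix, flattening everything into one array:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class VarargsBindDemo {
    // Stand-in for PreparedStatement.bind(Object... values) -- hypothetical,
    // just to show how the compiler packages the arguments.
    static int bind(Object... values) {
        return values.length;
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("first", "a");
        attributes.put("last", "b");

        // Buggy shape: (String, Object[]) -> the array becomes ONE element,
        // so "value 1" is an Object[] where the driver expects a String.
        int buggy = bind("user1", attributes.values().toArray(new Object[0]));
        System.out.println(buggy); // 2, not 3

        // Fix: flatten userId and the values into a single Object[] and pass
        // it alone, so it is used directly as the varargs array.
        Object[] flat = new Object[attributes.size() + 1];
        flat[0] = "user1";
        int i = 1;
        for (String v : attributes.values()) {
            flat[i++] = v;
        }
        System.out.println(bind(flat)); // 3: user1, a, b
    }
}
```

Note that casting to Object[] (or String[]) only helps when the array is the *only* argument; with a leading userId the flattening above is needed.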
RE: cassandra hadoop reducer writing to CQL3 - primary key - must it be text type?
From: johnlu...@hotmail.com To: user@cassandra.apache.org Subject: RE: cassandra hadoop reducer writing to CQL3 - primary key - must it be text type? Date: Wed, 9 Oct 2013 18:33:13 -0400

reduce method:

    public void reduce(LongWritable writableRecid, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        Long sum = 0L;
        Long recordid = writableRecid.get();
        List<ByteBuffer> vbles = null;
        byte[] longByterray = new byte[8];
        for (int i = 0; i < 8; i++) {
            longByterray[i] = (byte) (recordid >> (i * 8));
        }
        ByteBuffer recordIdByteBuf = ByteBuffer.allocate(8);
        recordIdByteBuf.wrap(longByterray);
        keys.put("recordid", recordIdByteBuf);
        ...
        context.write(keys, vbles);
    }

I finally got it working after finding the LongSerializer class source in cassandra; I see that the correct way to build a ByteBuffer key from a Long is:

    public ByteBuffer serialize(Long value) {
        return value == null ? ByteBufferUtil.EMPTY_BYTE_BUFFER : ByteBufferUtil.bytes(value);
    }

John
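For anyone hitting the same problem: ByteBufferUtil.bytes(long) is just an 8-byte big-endian encoding, and the original reducer's recordIdByteBuf.wrap(longByterray) discards its result because wrap is a *static* factory method, leaving the key buffer all zeros — which would explain the single row with key 0 seen later in the thread. A plain-java.nio sketch of both points (no Cassandra dependency; the behaviour of ByteBufferUtil is inferred from the serializer John quotes):

```java
import java.nio.ByteBuffer;

public class BigintKey {
    // Equivalent of Cassandra's ByteBufferUtil.bytes(long): 8 bytes, big-endian.
    static ByteBuffer bytes(long value) {
        ByteBuffer b = ByteBuffer.allocate(8); // ByteBuffer defaults to big-endian
        b.putLong(0, value);                   // absolute put: position stays 0
        return b;
    }

    public static void main(String[] args) {
        ByteBuffer key = bytes(0x47407826L);
        // Prints 0000000047407826: four zero bytes, then 0x47 0x40 0x78 0x26.
        for (int i = 0; i < 8; i++) {
            System.out.printf("%02x", key.get(i));
        }
        System.out.println();

        // Pitfall from the original reducer: ByteBuffer.wrap is static, so
        // calling it on an instance creates (and discards) a NEW buffer --
        // the original buffer stays all zeros.
        ByteBuffer wrong = ByteBuffer.allocate(8);
        wrong.wrap(new byte[] {1, 2, 3, 4, 5, 6, 7, 8}); // result ignored!
        System.out.println(wrong.getLong(0)); // still 0
    }
}
```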
RE: cassandra hadoop reducer writing to CQL3 - primary key - must it be text type?
I don't know what happened to my original post but it got truncated. Let me try again:

software versions: apache-cassandra-2.0.1, hadoop-2.1.0-beta

I have been experimenting with using hadoop for a map/reduce operation on cassandra, outputting to the CqlOutputFormat.class. I based my first program fairly closely on the famous WordCount example in examples/hadoop_cql3_word_count except --- I set my output colfamily to have a bigint primary key:

    CREATE TABLE archive_recordids ( recordid bigint, count_num bigint, PRIMARY KEY (recordid))

and simply tried setting this key as one of the keys in the output map:

    keys.put("recordid", ByteBufferUtil.bytes(recordid.longValue()));

but it always failed with a strange error:

    java.io.IOException: InvalidRequestException(why:Key may not be empty)

After trying to make it more similar to WordCount, I eventually realized the one difference was the datatype of the primary key of the output colfamily: WordCount has text; I had bigint. I changed mine to text:

    CREATE TABLE archive_recordids ( recordid text, count_num bigint, PRIMARY KEY (recordid))

and set the primary key *twice* in the reducer:

    keys.put("recordid", ByteBufferUtil.bytes(String.valueOf(recordid)));
    context.getConfiguration().set(PRIMARY_KEY, String.valueOf(recordid));

and it then worked perfectly. Is there a restriction in cassandra-hadoop-cql support that the output colfamily's primary key(s) must be text? And does that also apply to DELETE? Or am I doing it wrong? Or maybe there is some other OutputFormatter that I could use that would work? Cheers, John
RE: cassandra hadoop reducer writing to CQL3 - primary key - must it be text type?
From: johnlu...@hotmail.com To: user@cassandra.apache.org Subject: RE: cassandra hadoop reducer writing to CQL3 - primary key - must it be text type? Date: Wed, 9 Oct 2013 09:40:06 -0400

software versions: apache-cassandra-2.0.1, hadoop-2.1.0-beta

I have been experimenting with using hadoop for a map/reduce operation on cassandra, outputting to the CqlOutputFormat.class. I based my first program fairly closely on the famous WordCount example in examples/hadoop_cql3_word_count except --- I set my output colfamily to have a bigint primary key:

    CREATE TABLE archive_recordids ( recordid bigint, count_num bigint, PRIMARY KEY (recordid))

and simply tried setting this key as one of the keys in the output map:

    keys.put("recordid", ByteBufferUtil.bytes(recordid.longValue()));

but it always failed with a strange error:

    java.io.IOException: InvalidRequestException(why:Key may not be empty)

I managed to get a little bit further and my M/R program now runs to completion with output to the colfamily with bigint primary key, and actually does manage to UPDATE a row.

query:

    String query = "UPDATE " + keyspace + "." + OUTPUT_COLUMN_FAMILY + " SET count_num = ? ";

reduce method:

    public void reduce(LongWritable writableRecid, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        Long sum = 0L;
        Long recordid = writableRecid.get();
        List<ByteBuffer> vbles = null;
        byte[] longByterray = new byte[8];
        for (int i = 0; i < 8; i++) {
            longByterray[i] = (byte) (recordid >> (i * 8));
        }
        ByteBuffer recordIdByteBuf = ByteBuffer.allocate(8);
        recordIdByteBuf.wrap(longByterray);
        keys.put("recordid", recordIdByteBuf);
        ...
        context.write(keys, vbles);
    }

and my logger output does show it outputting maps containing what appear to be valid keys, e.g.

    writing key : 0x47407826 , hasarray ? : Y

there are about 74 mappings in the final reducer output, each with a different numeric record key.
but after the program completes, there is just one single row in the columnfamily, with a rowkey of 0 (zero):

    SELECT * FROM archive_recordids LIMIT 9;

     recordid | count_num
    ----------+-----------
            0 |         2

    (1 rows)

I guess it is something relating to the way my code is wrapping a long value into the ByteBuffer, or maybe the way the ByteBuffer is being allocated. As far as I can tell, the ByteBuffer needs to be populated in exactly the same way as a thrift application would populate a ByteBuffer for a bigint key -- does anyone know how to do that, or point me to an example that works? Thanks, John
cassandra hadoop reducer writing to CQL3 - primary key - must it be text type?
I have been experimenting with using hadoop for a map/reduce operation on cassandra, outputting to the CqlOutputFormat.class. I based my first program fairly closely on the famous WordCount example in examples/hadoop_cql3_word_count except --- I set my output colfamily to have a bigint primary key:

    CREATE TABLE archive_recordids ( recordid bigint, count_num bigint, PRIMARY KEY (recordid))

and simply tried setting this key as one of the keys in the output:

    keys.put("recordid", ByteBufferUtil.bytes(recordid.longValue()));

but it always failed with a strange error:

    java.io.IOException: InvalidRequestException(why:Key may not be empty)
Keystore password in yaml is in plain text
Hi, is there a way to obfuscate the keystore/truststore password? Thanks, Shahryar
Problem with sstableloader from text data
Hi, following the article at http://www.datastax.com/dev/blog/bulk-loading , I developed a custom builder app to serialize a text file with rows in json format to an sstable. I managed to get the tool running and building the tables, however when I try to load them I get this error:

    sstableloader -d localhost demodb/
    Exception in thread "main" java.lang.NullPointerException
        at org.apache.cassandra.io.sstable.SSTableLoader.<init>(SSTableLoader.java:64)
        at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:64)

and when I try to decode the sstables to json I get this one:

    sstable2json demodb/demodb-positions8-jb-1-Data.db
    [
    {"key": "000800bae94e08013f188b9bd00400","columns": [Exception in thread "main" java.lang.IllegalArgumentException
        at java.nio.Buffer.limit(Buffer.java:267)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:55)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:64)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.getString(AbstractCompositeType.java:230)
        at org.apache.cassandra.tools.SSTableExport.serializeColumn(SSTableExport.java:183)
        at org.apache.cassandra.tools.SSTableExport.serializeAtom(SSTableExport.java:152)
        at org.apache.cassandra.tools.SSTableExport.serializeAtoms(SSTableExport.java:140)
        at org.apache.cassandra.tools.SSTableExport.serializeRow(SSTableExport.java:238)
        at org.apache.cassandra.tools.SSTableExport.serializeRow(SSTableExport.java:223)
        at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:360)
        at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:382)
        at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:394)
        at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:477)

So it seems something is wrong with how I am streaming the data.
These are the relevant parts of the code. This is the pojo to deserialize the json:

    public class PositionJsonModel {
        @JsonProperty("iD") private Long idDevice;
        @JsonProperty("iU") private Long idUnit;
        @JsonProperty("iE") private Integer idEvent;
        @JsonProperty("iTE") private Integer idTypeEvent;
        @JsonProperty("tEv") private String timestampEvent;
        @JsonProperty("tRx") private String timestampRx;
        @JsonProperty("mi") private Long mileage;
        private Long lat;
        private Long lng;
        @JsonProperty("A1") private String country;
        @JsonProperty("A2") private String state;
        @JsonProperty("A3") private String county;
        @JsonProperty("A4") private String city;
        @JsonProperty("A5") private String locality;
        @JsonProperty("st") private String street;
        @JsonProperty("cn") private String civnum;
        @JsonProperty("in") private String info;
        @JsonProperty("sp") private Integer speed;
        // getters, setters, toString ...

And this is the main class:

    BufferedReader reader = new BufferedReader(new FileReader(filename));
    String keyspace = "demodb";
    String columnFamily = "positions8";
    File directory = new File(keyspace);
    if (!directory.exists()) {
        directory.mkdir();
    }
    Murmur3Partitioner partitioner = new Murmur3Partitioner();
    SSTableSimpleUnsortedWriter positionsWriter = new SSTableSimpleUnsortedWriter(directory, partitioner, keyspace, columnFamily, UTF8Type.instance, null, 64);
    String line = "";
    ObjectMapper mapper = new ObjectMapper();
    while ((line = reader.readLine()) != null) {
        long timestamp = System.currentTimeMillis() * 1000;
        System.out.println("timestamp: " + timestamp);
        PositionJsonModel model = mapper.readValue(line, PositionJsonModel.class);
        //CREATE TABLE positions8 (
        //    iddevice bigint,
        //    timestampevent timestamp,
        //    idevent int,
        //    idunit bigint,
        //    status text,
        //    value text,
        //    PRIMARY KEY (iddevice, timestampevent, idevent)
        //) WITH CLUSTERING ORDER BY (timestampevent DESC, idevent ASC)
        List<AbstractType<?>> typeList = new ArrayList<AbstractType<?>>();
        typeList.add(LongType.instance);
        typeList.add(DateType.instance);
        typeList.add(IntegerType.instance);
        CompositeType compositeKeyTypes = CompositeType.getInstance(typeList);
        Builder cpBuilder = new Builder(compositeKeyTypes);
        System.out.println("getIdDevice: " + model.getIdDevice());
        System.out.println("getTimestampEvent: " + model.getTimestampEvent());
        System.out.println("getIdEvent: " + model.getIdEvent());
        cpBuilder.add(bytes(model.getIdDevice()));
        cpBuilder.add(bytes(DateType.dateStringToTimestamp(model.getTimestampEvent
Re: Text searches and free form queries
It works pretty fast. Cool. Just keep an eye out for how big the lucene token row gets. Cheers

Indeed, it may get out of hand, but for now we are OK -- for the foreseeable future, I would say. Should it get larger, I can split it up into rows -- i.e. all tokens that start with a, all tokens that start with b, etc.
Re: Text searches and free form queries
It works pretty fast. Cool. Just keep an eye out for how big the lucene token row gets. Cheers

- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 7/10/2012, at 2:57 AM, Oleg Dulin oleg.du...@gmail.com wrote: …
Re: Text searches and free form queries
So, what I ended up doing is this -- as I write my records into the main CF, I tokenize some fields that I want to search on using Lucene and write an index into a separate CF, such that my columns are a composite of:

    luceneToken:recordKey

I can then search my records by doing a slice for each lucene token in the search query and then do an intersection of the sets. It works pretty fast.

Regards, Oleg

On 2012-09-05 01:28:44 +0000, aaron morton said: …
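Oleg's scheme — slice the index row once per query token, then intersect the resulting record-key sets — can be sketched with plain collections standing in for the Cassandra reads. All names below are hypothetical; the in-memory map simulates the index CF whose column names are luceneToken:recordKey composites:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TokenIndexSketch {
    // token -> set of record keys; stands in for a slice over the index CF
    // where each column name is the composite luceneToken:recordKey.
    static final Map<String, Set<String>> INDEX = new HashMap<>();

    static void index(String recordKey, String... tokens) {
        for (String t : tokens) {
            INDEX.computeIfAbsent(t, k -> new HashSet<>()).add(recordKey);
        }
    }

    // AND-search: "slice" once per query token, then intersect the key sets.
    static Set<String> search(String... tokens) {
        Set<String> result = null;
        for (String t : tokens) {
            Set<String> keys = INDEX.getOrDefault(t, Collections.emptySet());
            if (result == null) {
                result = new HashSet<>(keys);
            } else {
                result.retainAll(keys); // set intersection
            }
        }
        return result == null ? Collections.emptySet() : result;
    }

    public static void main(String[] args) {
        index("rec1", "fast", "cassandra", "search");
        index("rec2", "fast", "lucene");
        index("rec3", "cassandra", "search");
        System.out.println(search("cassandra", "search")); // rec1 and rec3
    }
}
```

Splitting the index by first letter, as Oleg suggests for growth, would just shard this map across multiple rows without changing the intersection logic.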
Re: Text searches and free form queries
AFAIK if you want to keep it inside cassandra then DSE, roll your own from scratch, or start with https://github.com/tjake/Solandra . Outside of Cassandra I've heard of people using Elastic Search or Solr, which I *think* is now faster at updating the index. Hope that helps.

- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 4/09/2012, at 3:00 AM, Andrey V. Panov panov.a...@gmail.com wrote: …
Text searches and free form queries
Dear Distinguished Colleagues:

I need to add full-text search and somewhat free-form queries to my application. Our data is made up of items that are stored in a single column family, and we have a bunch of secondary indices for look-ups. An item has header fields and data fields, and the structure of the items CF is a super column family with the row key being the item's natural ID, a super column for the header, and a super column for the data. Our application is made up of several redundant/load-balanced servers all pointing at a Cassandra cluster. Our servers run embedded Jetty.

I need to be able to find items by a combination of field values. Currently I have an index for items by field value which works reasonably well. I could also add support for data types and index items by fields of appropriate types, so we can do range queries on items. Ultimately, though, what we want is full-text search with suggestions and human-language sensitivity. We want to search by date ranges, by field values, etc. I did some homework on this topic, and here is what I see as options:

1) Use an SQL database as a helper. This is rather clunky; I'm not sure what it gets us, since just about anything that can be done in SQL can be done in Cassandra with proper structures. The problem here also is where am I going to get an open-source database that can handle the workload? Probably nowhere, nor do I get natural-language support.

2) Each of our servers can index data using Lucene, but again we have to come up with a clunky mechanism where either one of the servers does the indexing and results are replicated, or each server does its own indexing.

3) We can use Solr as is; perhaps with some small modifications it can run within our server JVM -- since we already run embedded Jetty. I like this idea, actually, but I know that Solr indexing doesn't take advantage of Cassandra.
4) Datastax Enterprise with search, presumably, supports Solr indexing of existing column families -- but for the life of me I couldn't figure out how exactly it does that. The Wikipedia example shows that Solr can create column families based on Solr schemas that I can then query using Cassandra itself (which is great), and supposedly I can modify those column families directly and Solr will reindex them (which is even better), but I am not sure how that fits into our server design. The other concern is locking in to a commercial product, something I am very much worried about.

So, one possibility I can see is using Solr embedded within our own server solution but storing its indexes in the file system outside of Cassandra. This is not optimal, and maybe over time I can add my own support for storing the Solr index in Cassandra without relying on the Datastax solution.

In any case, what are your thoughts and experiences?

Regards, Oleg
Re: Text searches and free form queries
Someone did search on Lucene, but for very fresh data they build the search index in memory, so data becomes available for search without delays. On 3 September 2012 22:25, Oleg Dulin oleg.du...@gmail.com wrote: Dear Distinguished Colleagues:
Online text search with Hadoop/Brisk
I keep reading that Hadoop/Brisk is not suitable for online querying, only for offline/batch processing. What exactly are the reasons it is unsuitable? My use case is a fairly high query load, and each query ideally would return within about 20 seconds. The queries will use indexes to narrow down the result set first, but they also need to support text search on one of the fields. I was thinking of simulating the SQL LIKE statement by running each query as a MapReduce job, so that the text search gets distributed between nodes. I know the recommended approach is to keep a separate full-text index, but that could be quite space-intensive, and also means you can only search on complete words. Any thoughts on this approach? Thanks, Ben
Re: Online text search with Hadoop/Brisk
On Wed, May 11, 2011 at 11:19 AM, Ben Scholl brsch...@gmail.com wrote: …

Brisk was made to be a tight integration of Cassandra, Hadoop and Hive. If you are looking for full-text searches you should look at Solandra, https://github.com/tjake/Solandra, which is a Cassandra backend for the Solr/Lucene indexes. Edward
Re: What would be a good strategy for Storing the large text contents like blog posts in Cassandra.
On 03/07/2011 10:08 PM, Aaron Morton wrote: You can fill your boots. So long as your boots have a capacity of 2 billion. Background ... http://wiki.apache.org/cassandra/LargeDataSetConsiderations http://wiki.apache.org/cassandra/CassandraLimitations http://www.pcworld.idg.com.au/article/373483/new_cassandra_can_pack_two_billion_columns_into_row/ Thx, I haven't seen these wiki pages. -- Jean-Christophe Sirot
Re: What would be a good strategy for Storing the large text contents like blog posts in Cassandra.
Hello, On 03/06/2011 06:35 PM, Aditya Narayan wrote: Next, I also need to store the blogComments, which I am planning to store all in another single row, 1 comment per column. Thus the entire information about a single comment, like commentBody and commenter, would be serialized (using Google Protocol Buffers) and stored in a single column. Is there any limitation/issue in having a single row with a lot of columns? For instance, can I have millions of columns in a single row? -- Jean-Christophe Sirot
Re: What would be a good strategy for Storing the large text contents like blog posts in Cassandra.
You can fill your boots. So long as your boots have a capacity of 2 billion. Background ... http://wiki.apache.org/cassandra/LargeDataSetConsiderations http://wiki.apache.org/cassandra/CassandraLimitations http://www.pcworld.idg.com.au/article/373483/new_cassandra_can_pack_two_billion_columns_into_row/ aaron On 8/03/2011, at 4:57 AM, Jean-Christophe Sirot jean-christophe.si...@cryptolog.com wrote: Hello, On 03/06/2011 06:35 PM, Aditya Narayan wrote: Next, I also need to store the blogComments, which I am planning to store all in another single row, 1 comment per column. Thus the entire information about a single comment, like commentBody and commenter, would be serialized (using Google Protocol Buffers) and stored in a single column. Is there any limitation/issue in having a single row with a lot of columns? For instance, can I have millions of columns in a single row? -- Jean-Christophe Sirot
What would be a good strategy for Storing the large text contents like blog posts in Cassandra.
What would be a good strategy to store large text content (blog posts of around 1500-3000 characters) in Cassandra? I need to store these blog posts along with their metadata like bloggerId and blogTags. I am planning to store this data in a single row, giving each attribute a single column. So one blog per row. Is using a single column for a large blog post like this a good strategy? Next, I also need to store the blogComments, which I am planning to store all in another single row, 1 comment per column. Thus the entire information about a single comment, like commentBody and commenter, would be serialized (using Google Protocol Buffers) and stored in a single column. For storing the number of likes of each comment, I am planning to keep a counter_column in the same row for each comment that will hold a number specifying the number of 'likes' of that comment. Any suggestions on the above design are highly appreciated. Thanks.
Re: What would be a good strategy for Storing the large text contents like blog posts in Cassandra.
Sounds reasonable, one CF for the blog post and one CF for the comments. You could also use a single CF if you will often read the blog and the comments at the same time. The best design is the one that suits how your app works, try one and be prepared to change. Note that counters are only in the 0.8 trunk and are still under development; they are not going to be released for a couple of months. Your per-column data size is nothing to be concerned about. Hope that helps. Aaron On 7/03/2011, at 6:35 AM, Aditya Narayan ady...@gmail.com wrote: What would be a good strategy to store large text content (blog posts of around 1500-3000 characters) in Cassandra? I need to store these blog posts along with their metadata like bloggerId and blogTags. I am planning to store this data in a single row, giving each attribute a single column. So one blog per row. Is using a single column for a large blog post like this a good strategy? Next, I also need to store the blogComments, which I am planning to store all in another single row, 1 comment per column. Thus the entire information about a single comment, like commentBody and commenter, would be serialized (using Google Protocol Buffers) and stored in a single column. For storing the number of likes of each comment, I am planning to keep a counter_column in the same row for each comment that will hold a number specifying the number of 'likes' of that comment. Any suggestions on the above design are highly appreciated. Thanks.
Re: What would be a good strategy for Storing the large text contents like blog posts in Cassandra.
Thanks Aaron!! I didn't know about the upcoming facility for inbuilt counters. This sounds really great for my use case!! Could you let me know where I can read more about this, if it has been blogged about somewhere? I'll go forward with the one (entire) blog per column design. Thanks On Mon, Mar 7, 2011 at 5:10 AM, Aaron Morton aa...@thelastpickle.com wrote: Sounds reasonable, one CF for the blog post and one CF for the comments. You could also use a single CF if you will often read the blog and the comments at the same time. The best design is the one that suits how your app works, try one and be prepared to change. Note that counters are only in the 0.8 trunk and are still under development; they are not going to be released for a couple of months. Your per-column data size is nothing to be concerned about. Hope that helps. Aaron On 7/03/2011, at 6:35 AM, Aditya Narayan ady...@gmail.com wrote: What would be a good strategy to store large text content (blog posts of around 1500-3000 characters) in Cassandra? I need to store these blog posts along with their metadata like bloggerId and blogTags. I am planning to store this data in a single row, giving each attribute a single column. So one blog per row. Is using a single column for a large blog post like this a good strategy? Next, I also need to store the blogComments, which I am planning to store all in another single row, 1 comment per column. Thus the entire information about a single comment, like commentBody and commenter, would be serialized (using Google Protocol Buffers) and stored in a single column. For storing the number of likes of each comment, I am planning to keep a counter_column in the same row for each comment that will hold a number specifying the number of 'likes' of that comment. Any suggestions on the above design are highly appreciated. Thanks.
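The layout this thread converges on (one row per blog post with one column per attribute, plus a separate comments row per post with one serialized comment per column) can be mocked up in memory as follows. All names here are hypothetical; an integer counter stands in for the timestamp/TimeUUID column names a real design would use for chronological ordering, and JSON stands in for Protocol Buffers:

```python
# In-memory mock of the two-CF layout discussed: a "Blogs" map (one entry
# per post, one column per attribute) and a "Comments" map (one entry per
# post, one column per comment, value = serialized blob).
import itertools
import json

blogs, comments = {}, {}
_seq = itertools.count()  # stand-in for a timestamp/TimeUUID column name

def store_post(post_id, body, blogger_id, tags):
    # One row per blog post; each attribute gets its own column.
    blogs[post_id] = {"body": body, "bloggerId": blogger_id,
                      "blogTags": ",".join(tags)}

def add_comment(post_id, commenter, body):
    # One row per post's comments, one column per comment; the value is a
    # serialized blob (JSON here, standing in for Protocol Buffers).
    comments.setdefault(post_id, {})[next(_seq)] = json.dumps(
        {"commenter": commenter, "body": body})

store_post("post1", "A blog post of 1500-3000 characters...", "user42",
           ["cassandra", "nosql"])
add_comment("post1", "alice", "Great write-up!")
add_comment("post1", "bob", "Thanks for sharing.")
print(len(comments["post1"]))  # 2
```

As Aaron notes, per-'like' counters would need the counter support that at the time only existed in the 0.8 trunk; this sketch deliberately leaves them out.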
RE: How can I implement text based searching for the data/entities/items stored in Cassandra ?
You can use: http://code.google.com/p/kundera/ to search text. It provides a way to search by any key over Cassandra. I guess nothing inbuilt is in place for this. Vivek From: rajkumar@gmail.com [rajkumar@gmail.com] on behalf of Aklin_81 [asdk...@gmail.com] Sent: 12 February 2011 17:27 To: user Subject: How can I implement text based searching for the data/entities/items stored in Cassandra ? I would like to text search for some of the Entities/items stored in the database through an AJAX-powered application, such that the user starts typing and gets suggested items as hints. This is implemented in SQL databases using LIKE; is it possible to implement this in an application powered by Cassandra? How do I go forward to implement this feature, which is very much required in my case? Would I have to consider a MySQL DB for implementing this particular feature and keep the rest in Cassandra? Thanks -Asil Impetus to Present Big Data -- Analytics Solutions and Strategies at TDWI World Conference (Feb 13-18) in Las Vegas. We are also bringing cloud experts together at CloudCamp, Delhi on Feb 12. CloudCamp is an unconference where early adopters of Cloud Computing technologies exchange ideas. Click http://www.impetus.com to know more. NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.
RE: How can I implement text based searching for the data/entities/items stored in Cassandra ?
Additionally, you can use Cassandra secondary indexes for specific searches. From: Vivek Mishra [vivek.mis...@impetus.co.in] Sent: 12 February 2011 17:38 To: user@cassandra.apache.org Subject: RE: How can I implement text based searching for the data/entities/items stored in Cassandra ? You can use: http://code.google.com/p/kundera/ to search text. It provides a way to search by any key over Cassandra. I guess nothing inbuilt is in place for this. Vivek From: rajkumar@gmail.com [rajkumar@gmail.com] on behalf of Aklin_81 [asdk...@gmail.com] Sent: 12 February 2011 17:27 To: user Subject: How can I implement text based searching for the data/entities/items stored in Cassandra ? I would like to text search for some of the Entities/items stored in the database through an AJAX-powered application, such that the user starts typing and gets suggested items as hints. This is implemented in SQL databases using LIKE; is it possible to implement this in an application powered by Cassandra? How do I go forward to implement this feature, which is very much required in my case? Would I have to consider a MySQL DB for implementing this particular feature and keep the rest in Cassandra? Thanks -Asil
Re: How can I implement text based searching for the data/entities/items stored in Cassandra ?
There is/are Lucandra/Solandra: https://github.com/tjake/Lucandra -- Shaun On Feb 12, 2011, at 6:57 AM, Aklin_81 wrote: I would like to text search for some of the Entities/items stored in the database through an AJAX-powered application, such that the user starts typing and gets suggested items as hints. This is implemented in SQL databases using LIKE; is it possible to implement this in an application powered by Cassandra? How do I go forward to implement this feature, which is very much required in my case? Would I have to consider a MySQL DB for implementing this particular feature and keep the rest in Cassandra? Thanks -Asil
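For the type-ahead use case in this thread, a common Cassandra-era pattern (beyond the Kundera and Lucandra/Solandra suggestions) was to keep the searchable names under an ordered comparator and emulate LIKE 'prefix%' with a range scan over the sorted names. A minimal in-memory sketch of that idea, with Python's sorted list standing in for an ordered column family (all data hypothetical):

```python
# Type-ahead sketch: keep names sorted (as an ordered comparator keeps
# columns sorted on disk) and answer prefix queries with a range scan,
# which is how LIKE 'abc%' is usually emulated without full-text search.
import bisect

names = sorted(["alice", "albert", "bob", "brian", "carol"])

def suggest(prefix, limit=5):
    """Return up to `limit` names starting with `prefix`, in order."""
    lo = bisect.bisect_left(names, prefix)  # first name >= prefix
    out = []
    for name in names[lo:lo + limit]:
        if not name.startswith(prefix):
            break  # left the prefix range; sorted order guarantees no more
        out.append(name)
    return out

print(suggest("al"))  # ['albert', 'alice']
print(suggest("b"))   # ['bob', 'brian']
```

This handles the "user starts typing, gets suggested items" case with prefix matching only; arbitrary mid-word substring search still needs a full-text index such as Lucandra/Solandra.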